Re: [Intel-gfx] [RFC PATCH 00/20] Initial Xe driver submission

2023-01-17 Thread Jason Ekstrand
On Thu, Jan 12, 2023 at 11:17 AM Matthew Brost 
wrote:

> On Thu, Jan 12, 2023 at 10:54:25AM +0100, Lucas De Marchi wrote:
> > On Thu, Jan 05, 2023 at 09:27:57PM +, Matthew Brost wrote:
> > > On Tue, Jan 03, 2023 at 12:21:08PM +, Tvrtko Ursulin wrote:
> > > >
> > > > On 22/12/2022 22:21, Matthew Brost wrote:
> > > > > Hello,
> > > > >
> > > > > This is a submission for Xe, a new driver for Intel GPUs that
> supports both
> > > > > integrated and discrete platforms starting with Tiger Lake (first
> platform with
> > > > > Intel Xe Architecture). The intention of this new driver is to
> have a fresh base
> > > > > to work from that is unencumbered by older platforms, whilst also
> taking the
> > > > > opportunity to rearchitect our driver to increase sharing across
> the drm
> > > > > subsystem, both leveraging and allowing us to contribute more
> towards other
> > > > > shared components like TTM and drm/scheduler. The memory model is
> based on VM
> > > > > bind which is similar to the i915 implementation. Likewise the
> execbuf
> > > > > implementation for Xe is very similar to execbuf3 in the i915 [1].
> > > > >
> > > > > The code is at a stage where it is already functional and has
> experimental
> > > > > support for multiple platforms starting from Tiger Lake, with
> initial support
> > > > > implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan
> drivers), as well
> > > > > as in NEO (for OpenCL and Level0). A Mesa MR has been posted [2]
> and NEO
> > > > > implementation will be released publicly early next year. We also
> have a suite
> > > > > of IGTs for XE that will appear on the IGT list shortly.
> > > > >
> > > > > It has been built with the assumption of supporting multiple
> architectures from
> > > > > the get-go, right now with tests running both on X86 and ARM
> hosts. And we
> > > > > intend to continue working on it and improving on it as part of
> the kernel
> > > > > community upstream.
> > > > >
> > > > > The new Xe driver leverages a lot from i915 and work on i915
> continues as we
> > > > > ready Xe for production throughout 2023.
> > > > >
> > > > > As for display, the intent is to share the display code with the
> i915 driver so
> > > > > that there is maximum reuse there. Currently this is being done by
> compiling the
> > > > > display code twice, but alternatives to that are under
> consideration and we want
> > > > > to have more discussion on what the best final solution will look
> like over the
> > > > > next few months. Right now, work is ongoing in refactoring the
> display codebase
> > > > > to remove as much as possible any unnecessary dependencies on i915
> specific data
> > > > > structures there.
> > > > >
> > > > > We currently have 2 submission backends, execlists and GuC. The
> execlist is
> > > > > meant mostly for testing and is not fully functional while GuC
> backend is fully
> > > > > functional. As with the i915 and GuC submission, in Xe the GuC
> firmware is
> > > > > required and should be placed in /lib/firmware/xe.
> > > >
> > > > What is the plan going forward for the execlists backend? I think it
> would
> > > > be preferable to not upstream something semi-functional and so to
> carry
> > > > technical debt in the brand new code base, from the very start. If
> it is for
> > > > Tigerlake, which is the starting platform for Xe, could it be made
> GuC only
> > > > Tigerlake for instance?
> > > >
> > >
> > > A little background here. In the original PoC written by Jason and
> Dave,
> > > the execlist backend was the only one present and it was in a
> semi-working
> > > state. As soon as myself and a few others started working on Xe, we went
> > > all in on the GuC backend. We left the execlist backend basically in
> > > the state it was in. We left it in place for 2 reasons.
> > >
> > > 1. Having 2 backends from the start ensured we layered our code
> > > correctly. The layering was a complete disaster in the i915, so we really
> > > wanted to avoid that.
> > > 2. The thought was it might be needed for early product bring up one
> > > day.
> > >
> > > As I think about this a bit more, we will likely just delete the execlist
> backend
> > > before merging this upstream and perhaps carry 1 large patch
> > > internally with this implementation that we can use as needed. Final
> > > decision TBD though.
> >
> > but after some time that might regress on the "let's keep 2 backends so we
> > layer the code correctly" goal. Leaving the additional backend behind
> > CONFIG_BROKEN or XE_EXPERIMENTAL, or something like that (not
> > enabled by distros, but enabled in CI) would be a good idea IMO.
> >
> > Carrying a large patch out of tree would make things harder for new
> > platforms. A perfect backend split would make it possible, but like I
> > said, we are likely not to have it if we delete the second backend.
> >
>
> Good points here Lucas. One thing that we absolutely have wrong is
> falling back to execlists if GuC firmware is missing. We def should not
> be 
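A minimal sketch of the fail-instead-of-fallback behaviour being argued for
here; the function name and firmware path are hypothetical, not actual Xe
code:

    #include <linux/firmware.h>

    static int toy_guc_fw_load(struct device *dev)
    {
            const struct firmware *fw;
            int err;

            /* "xe/guc.bin" is a made-up path for illustration. */
            err = request_firmware(&fw, "xe/guc.bin", dev);
            if (err) {
                    dev_err(dev, "GuC firmware missing, aborting probe\n");
                    return err; /* fail loudly, no silent execlist fallback */
            }

            /* ... hand fw->data / fw->size to the GuC loader ... */
            release_firmware(fw);
            return 0;
    }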

Re: [Intel-gfx] [RFC PATCH 00/20] Initial Xe driver submission

2023-01-17 Thread Jason Ekstrand
On Thu, Dec 22, 2022 at 4:29 PM Matthew Brost 
wrote:

> Hello,
>
> This is a submission for Xe, a new driver for Intel GPUs that supports both
> integrated and discrete platforms starting with Tiger Lake (first platform
> with
> Intel Xe Architecture). The intention of this new driver is to have a
> fresh base
> to work from that is unencumbered by older platforms, whilst also taking
> the
> opportunity to rearchitect our driver to increase sharing across the drm
> subsystem, both leveraging and allowing us to contribute more towards other
> shared components like TTM and drm/scheduler. The memory model is based on
> VM
> bind which is similar to the i915 implementation. Likewise the execbuf
> implementation for Xe is very similar to execbuf3 in the i915 [1].
>
> The code is at a stage where it is already functional and has experimental
> support for multiple platforms starting from Tiger Lake, with initial
> support
> implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as
> well
> as in NEO (for OpenCL and Level0). A Mesa MR has been posted [2] and NEO
> implementation will be released publicly early next year. We also have a
> suite
> of IGTs for XE that will appear on the IGT list shortly.
>
> It has been built with the assumption of supporting multiple architectures
> from
> the get-go, right now with tests running both on X86 and ARM hosts. And we
> intend to continue working on it and improving on it as part of the kernel
> community upstream.
>
> The new Xe driver leverages a lot from i915 and work on i915 continues as
> we
> ready Xe for production throughout 2023.
>
> As for display, the intent is to share the display code with the i915
> driver so
> that there is maximum reuse there. Currently this is being done by
> compiling the
> display code twice, but alternatives to that are under consideration and
> we want
> to have more discussion on what the best final solution will look like
> over the
> next few months. Right now, work is ongoing in refactoring the display
> codebase
> to remove as much as possible any unnecessary dependencies on i915
> specific data
> structures there.
>
> We currently have 2 submission backends, execlists and GuC. The execlist is
> meant mostly for testing and is not fully functional while GuC backend is
> fully
> functional. As with the i915 and GuC submission, in Xe the GuC firmware is
> required and should be placed in /lib/firmware/xe.
>
> The GuC firmware can be found in the below location:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
>
> The easiest way to setup firmware is:
> cp -r /lib/firmware/i915 /lib/firmware/xe
>
> The code has been organized such that we have all patches that touch areas
> outside of drm/xe first for review, and then the actual new driver in a
> separate
> commit. The code which is outside of drm/xe is included in this RFC while
> drm/xe is not due to the size of the commit. The drm/xe code is
> available in
> a public repo listed below.
>
> Xe driver commit:
>
> https://cgit.freedesktop.org/drm/drm-xe/commit/?h=drm-xe-next&id=9cb016ebbb6a275f57b1cb512b95d5a842391ad7


Drive-by comment here because I don't see any actual xe patches on the list:

You probably want to drop DRM_XE_SYNC_DMA_BUF from the uAPI.  Now that
we've landed the new dma-buf ioctls for sync_file import/export, there's
really no reason to have it as part of submit.  Dropping it should also
make locking a tiny bit easier.
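
For reference, a minimal userspace sketch of the sync_file route being
suggested here, using the dma-buf export/import ioctls that landed in
mainline (error handling trimmed; an illustration, not Xe code):

    #include <sys/ioctl.h>
    #include <linux/dma-buf.h>

    /* Pull the current fences on a dma-buf out as a sync_file fd. */
    static int export_sync_file(int dmabuf_fd)
    {
            struct dma_buf_export_sync_file args = {
                    .flags = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE,
                    .fd = -1,
            };

            if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args))
                    return -1;
            return args.fd; /* wait on this instead of a submit-time flag */
    }

    /* Attach a sync_file back onto the dma-buf after submitting. */
    static int import_sync_file(int dmabuf_fd, int sync_file_fd)
    {
            struct dma_buf_import_sync_file args = {
                    .flags = DMA_BUF_SYNC_WRITE,
                    .fd = sync_file_fd,
            };

            return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
    }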

--Jason



> Xe kernel repo:
> https://cgit.freedesktop.org/drm/drm-xe/
>
> There's a lot of work still to happen on Xe but we're very excited about
> it and
> wanted to share it early and welcome feedback and discussion.
>
> Cheers,
> Matthew Brost
>
> [1] https://patchwork.freedesktop.org/series/105879/
> [2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20418
>
> Maarten Lankhorst (12):
>   drm/amd: Convert amdgpu to use suballocation helper.
>   drm/radeon: Use the drm suballocation manager implementation.
>   drm/i915: Remove gem and overlay frontbuffer tracking
>   drm/i915/display: Neuter frontbuffer tracking harder
>   drm/i915/display: Add more macros to remove all direct calls to uncore
>   drm/i915/display: Remove all uncore mmio accesses in favor of intel_de
>   drm/i915: Rename find_section to find_bdb_section
>   drm/i915/regs: Set DISPLAY_MMIO_BASE to 0 for xe
>   drm/i915/display: Fix a use-after-free when intel_edp_init_connector
> fails
>   drm/i915/display: Remaining changes to make xe compile
>   sound/hda: Allow XE as i915 replacement for sound
>   mei/hdcp: Also enable for XE
>
> Matthew Brost (5):
>   drm/sched: Convert drm scheduler to use a work queue rather than
> kthread
>   drm/sched: Add generic scheduler message interface
>   drm/sched: Start run wq before TDR in drm_sched_start
>   drm/sched: Submit job before starting TDR
>   drm/sched: Add helper to set TDR timeout
>
> Thomas Hellström (3):
>   drm/suballoc: Introduce a generic 

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2023 at 4:32 PM Matthew Brost 
wrote:

> On Wed, Jan 11, 2023 at 04:18:01PM -0600, Jason Ekstrand wrote:
> > On Wed, Jan 11, 2023 at 2:50 AM Tvrtko Ursulin <
> > tvrtko.ursu...@linux.intel.com> wrote:
> >
> > >
> > > On 10/01/2023 14:08, Jason Ekstrand wrote:
> > > > On Tue, Jan 10, 2023 at 5:28 AM Tvrtko Ursulin
> > > > <tvrtko.ursu...@linux.intel.com>
> > >
> > > > wrote:
> > > >
> > > >
> > > >
> > > > On 09/01/2023 17:27, Jason Ekstrand wrote:
> > > >
> > > > [snip]
> > > >
> > > >  >  >>> AFAICT it proposes to have 1:1 between *userspace*
> > > created
> > > >  > contexts (per
> > > >  >  >>> context _and_ engine) and drm_sched. I am not sure
> > > avoiding
> > > >  > invasive changes
> > > >  >  >>> to the shared code is in the spirit of the overall
> idea
> > > > and instead
> > > >  >  >>> opportunity should be used to look at ways to
> > > > refactor/improve
> > > >  > drm_sched.
> > > >  >
> > > >  >
> > > >  > Maybe?  I'm not convinced that what Xe is doing is an abuse at
> > > > all or
> > > >  > really needs to drive a re-factor.  (More on that later.)
> > > > There's only
> > > >  > one real issue which is that it fires off potentially a lot of
> > > > kthreads.
> > > >  > Even that's not that bad given that kthreads are pretty light
> and
> > > > you're
> > > >  > not likely to have more kthreads than userspace threads which
> are
> > > > much
> > > >  > heavier.  Not ideal, but not the end of the world either.
> > > > Definitely
> > > >  > something we can/should optimize but if we went through with
> Xe
> > > > without
> > > >  > this patch, it would probably be mostly ok.
> > > >  >
> > > >  >  >> Yes, it is 1:1 *userspace* engines and drm_sched.
> > > >  >  >>
> > > >  >  >> I'm not really prepared to make large changes to DRM
> > > > scheduler
> > > >  > at the
> > > >  >  >> moment for Xe as they are not really required nor does
> > > Boris
> > > >  > seem they
> > > >  >  >> will be required for his work either. I am interested
> to
> > > see
> > > >  > what Boris
> > > >  >  >> comes up with.
> > > >  >  >>
> > > >  >  >>> Even on the low level, the idea to replace drm_sched
> > > threads
> > > >  > with workers
> > > >  >  >>> has a few problems.
> > > >  >  >>>
> > > >  >  >>> To start with, the pattern of:
> > > >  >  >>>
> > > >  >  >>>while (not_stopped) {
> > > >  >  >>> keep picking jobs
> > > >  >  >>>}
> > > >  >  >>>
> > > >  >  >>> Feels fundamentally in disagreement with workers
> (while
> > > >  > obviously fits
> > > >  >  >>> perfectly with the current kthread design).
> > > >  >  >>
> > > >  >  >> The while loop breaks and worker exits if no jobs are
> > > ready.
> > > >  >
> > > >  >
> > > >  > I'm not very familiar with workqueues. What are you saying
> would
> > > fit
> > > >  > better? One scheduling job per work item rather than one big
> work
> > > > item
> > > >  > which handles all available jobs?
> > > >
> > > > Yes and no, it indeed IMO does not fit to have a work item which
> is
> > > > potentially unbound in runtime. But it is a bit moot conceptual
> > > > mismatch
> > > > because it is a worst case / theoretical, and I think due to more
> > > > fundamental concerns.
> > > >

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2023 at 2:50 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:

>
> On 10/01/2023 14:08, Jason Ekstrand wrote:
> > On Tue, Jan 10, 2023 at 5:28 AM Tvrtko Ursulin
> > <tvrtko.ursu...@linux.intel.com>
>
> > wrote:
> >
> >
> >
> > On 09/01/2023 17:27, Jason Ekstrand wrote:
> >
> > [snip]
> >
> >  >  >>> AFAICT it proposes to have 1:1 between *userspace*
> created
> >  > contexts (per
> >  >  >>> context _and_ engine) and drm_sched. I am not sure
> avoiding
> >  > invasive changes
> >  >  >>> to the shared code is in the spirit of the overall idea
> > and instead
> >  >  >>> opportunity should be used to look at ways to
> > refactor/improve
> >  > drm_sched.
> >  >
> >  >
> >  > Maybe?  I'm not convinced that what Xe is doing is an abuse at
> > all or
> >  > really needs to drive a re-factor.  (More on that later.)
> > There's only
> >  > one real issue which is that it fires off potentially a lot of
> > kthreads.
> >  > Even that's not that bad given that kthreads are pretty light and
> > you're
> >  > not likely to have more kthreads than userspace threads which are
> > much
> >  > heavier.  Not ideal, but not the end of the world either.
> > Definitely
> >  > something we can/should optimize but if we went through with Xe
> > without
> >  > this patch, it would probably be mostly ok.
> >  >
> >  >  >> Yes, it is 1:1 *userspace* engines and drm_sched.
> >  >  >>
> >  >  >> I'm not really prepared to make large changes to DRM
> > scheduler
> >  > at the
> >  >  >> moment for Xe as they are not really required nor does
> Boris
> >  > seem they
> >  >  >> will be required for his work either. I am interested to
> see
> >  > what Boris
> >  >  >> comes up with.
> >  >  >>
> >  >  >>> Even on the low level, the idea to replace drm_sched
> threads
> >  > with workers
> >  >  >>> has a few problems.
> >  >  >>>
> >  >  >>> To start with, the pattern of:
> >  >  >>>
> >  >  >>>while (not_stopped) {
> >  >  >>> keep picking jobs
> >  >  >>>}
> >  >  >>>
> >  >  >>> Feels fundamentally in disagreement with workers (while
> >  > obviously fits
> >  >  >>> perfectly with the current kthread design).
> >  >  >>
> >  >  >> The while loop breaks and worker exits if no jobs are
> ready.
> >  >
> >  >
> >  > I'm not very familiar with workqueues. What are you saying would
> fit
> >  > better? One scheduling job per work item rather than one big work
> > item
> >  > which handles all available jobs?
> >
> > Yes and no, it indeed IMO does not fit to have a work item which is
> > potentially unbound in runtime. But it is a bit moot conceptual
> > mismatch
> > because it is a worst case / theoretical, and I think due to more
> > fundamental concerns.
> >
> > If we have to go back to the low level side of things, I've picked
> this
> > random spot to consolidate what I have already mentioned and perhaps
> > expand.
> >
> > To start with, let me pull out some thoughts from workqueue.rst:
> >
> > """
> > Generally, work items are not expected to hog a CPU and consume many
> > cycles. That means maintaining just enough concurrency to prevent
> work
> > processing from stalling should be optimal.
> > """
> >
> > For unbound queues:
> > """
> > The responsibility of regulating concurrency level is on the users.
> > """
> >
> > Given the unbound queues will be spawned on demand to service all
> > queued
> > work items (more interesting when mixing up with the
> > system_unbound_wq),
> > in the proposed design the number of in

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-10 Thread Jason Ekstrand
On Tue, Jan 10, 2023 at 5:28 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:

>
>
> On 09/01/2023 17:27, Jason Ekstrand wrote:
>
> [snip]
>
> >  >>> AFAICT it proposes to have 1:1 between *userspace* created
> > contexts (per
> >  >>> context _and_ engine) and drm_sched. I am not sure avoiding
> > invasive changes
> >  >>> to the shared code is in the spirit of the overall idea and
> instead
> >  >>> opportunity should be used to look at ways to refactor/improve
> > drm_sched.
> >
> >
> > Maybe?  I'm not convinced that what Xe is doing is an abuse at all or
> > really needs to drive a re-factor.  (More on that later.)  There's only
> > one real issue which is that it fires off potentially a lot of kthreads.
> > Even that's not that bad given that kthreads are pretty light and you're
> > not likely to have more kthreads than userspace threads which are much
> > heavier.  Not ideal, but not the end of the world either.  Definitely
> > something we can/should optimize but if we went through with Xe without
> > this patch, it would probably be mostly ok.
> >
> >  >> Yes, it is 1:1 *userspace* engines and drm_sched.
> >  >>
> >  >> I'm not really prepared to make large changes to DRM scheduler
> > at the
> >  >> moment for Xe as they are not really required nor does Boris
> > seem they
> >  >> will be required for his work either. I am interested to see
> > what Boris
> >  >> comes up with.
> >  >>
> >  >>> Even on the low level, the idea to replace drm_sched threads
> > with workers
> >  >>> has a few problems.
> >  >>>
> >  >>> To start with, the pattern of:
> >  >>>
> >  >>>while (not_stopped) {
> >  >>> keep picking jobs
> >  >>>}
> >  >>>
> >  >>> Feels fundamentally in disagreement with workers (while
> > obviously fits
> >  >>> perfectly with the current kthread design).
> >  >>
> >  >> The while loop breaks and worker exits if no jobs are ready.
> >
> >
> > I'm not very familiar with workqueues. What are you saying would fit
> > better? One scheduling job per work item rather than one big work item
> > which handles all available jobs?
>
> Yes and no, it indeed IMO does not fit to have a work item which is
> potentially unbound in runtime. But it is a bit moot conceptual mismatch
> because it is a worst case / theoretical, and I think due to more
> fundamental concerns.
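
A concrete sketch of the "one scheduling pass per work item" shape being
discussed, with hypothetical toy_* names; each pass handles at most one job
and re-arms itself, so no work item ever loops unbounded:

    #include <linux/workqueue.h>

    struct toy_job;
    struct toy_sched {
            struct workqueue_struct *wq; /* e.g. alloc_ordered_workqueue() */
            struct work_struct work;
            /* job queue and locking elided */
    };

    static struct toy_job *toy_sched_pick_job(struct toy_sched *s);
    static void toy_sched_run_job(struct toy_sched *s, struct toy_job *job);

    static void toy_sched_work(struct work_struct *w)
    {
            struct toy_sched *s = container_of(w, struct toy_sched, work);
            struct toy_job *job = toy_sched_pick_job(s);

            if (!job)
                    return; /* nothing ready: the worker simply exits */

            toy_sched_run_job(s, job);

            /* Each further job gets its own, bounded work item. */
            queue_work(s->wq, &s->work);
    }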
>
> If we have to go back to the low level side of things, I've picked this
> random spot to consolidate what I have already mentioned and perhaps
> expand.
>
> To start with, let me pull out some thoughts from workqueue.rst:
>
> """
> Generally, work items are not expected to hog a CPU and consume many
> cycles. That means maintaining just enough concurrency to prevent work
> processing from stalling should be optimal.
> """
>
> For unbound queues:
> """
> The responsibility of regulating concurrency level is on the users.
> """
>
> Given the unbound queues will be spawned on demand to service all queued
> work items (more interesting when mixing up with the system_unbound_wq),
> in the proposed design the number of instantiated worker threads does
> not correspond to the number of user threads (as you have elsewhere
> stated), but pessimistically to the number of active user contexts.


Those are pretty much the same in practice.  Rather, the number of user
threads is typically an upper bound on the number of contexts.  Yes, a single user
thread could have a bunch of contexts but basically nothing does that
except IGT.  In real-world usage, it's at most one context per user thread.


> That
> is the number which drives the maximum number of not-runnable jobs that
> can become runnable at once, and hence spawn that many work items, and
> in turn unbound worker threads.
>
> Several problems there.
>
> It is fundamentally pointless to have potentially that many more threads
> than the number of CPU cores - it simply creates a scheduling storm.
>
> Unbound workers have no CPU / cache locality either and no connection
> with the CPU scheduler to optimize scheduling patterns. This may matter
> either on large systems or on small ones. Whereas the current design
> allows for scheduler to notice userspace CPU thread keeps waking up th

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-09 Thread Jason Ekstrand
On Mon, Jan 9, 2023 at 7:46 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:

>
> On 06/01/2023 23:52, Matthew Brost wrote:
> > On Thu, Jan 05, 2023 at 09:43:41PM +, Matthew Brost wrote:
> >> On Tue, Jan 03, 2023 at 01:02:15PM +, Tvrtko Ursulin wrote:
> >>>
> >>> On 02/01/2023 07:30, Boris Brezillon wrote:
>  On Fri, 30 Dec 2022 12:55:08 +0100
>  Boris Brezillon  wrote:
> 
> > On Fri, 30 Dec 2022 11:20:42 +0100
> > Boris Brezillon  wrote:
> >
> >> Hello Matthew,
> >>
> >> On Thu, 22 Dec 2022 14:21:11 -0800
> >> Matthew Brost  wrote:
> >>> In XE, the new Intel GPU driver, a choice has been made to have a 1 to 1
> >>> mapping between a drm_gpu_scheduler and drm_sched_entity. At first
> >>> this seems a bit odd but let us explain the reasoning below.
> >>>
> >>> 1. In XE the submission order from multiple drm_sched_entity is not
> >>> guaranteed to be the same as the completion order even if targeting
> >>> the same hardware engine. This is because in XE we have a firmware
> >>> scheduler, the GuC, which is allowed to reorder, timeslice, and
> >>> preempt submissions. If using a shared drm_gpu_scheduler across
> >>> multiple drm_sched_entity, the TDR falls apart as the TDR expects
> >>> submission order == completion order. Using a dedicated
> >>> drm_gpu_scheduler per drm_sched_entity solves this problem.
> >>
> >> Oh, that's interesting. I've been trying to solve the same sort of
> >> issues to support Arm's new Mali GPU which is relying on a
> FW-assisted
> >> scheduling scheme (you give the FW N streams to execute, and it does
> >> the scheduling between those N command streams, the kernel driver
> >> does timeslice scheduling to update the command streams passed to
> the
> >> FW). I must admit I gave up on using drm_sched at some point, mostly
> >> because the integration with drm_sched was painful, but also
> because I
> >> felt trying to bend drm_sched to make it interact with a
> >> timeslice-oriented scheduling model wasn't really future proof.
> Giving
> >> drm_sched_entity exclusive access to a drm_gpu_scheduler probably
> might
> >> help for a few things (didn't think it through yet), but I feel it's
> >> coming short on other aspects we have to deal with on Arm GPUs.
> >
> > Ok, so I just had a quick look at the Xe driver and how it
> > instantiates the drm_sched_entity and drm_gpu_scheduler, and I think
> I
> > have a better understanding of how you get away with using drm_sched
> > while still controlling how scheduling is really done. Here
> > drm_gpu_scheduler is just a dummy abstraction that lets you use the
> > drm_sched job queuing/dep/tracking mechanism. The whole run-queue
> > selection is dumb because there's only one entity ever bound to the
> > scheduler (the one that's part of the xe_guc_engine object which also
> > contains the drm_gpu_scheduler instance). I guess the main issue we'd
> > have on Arm is the fact that the stream doesn't necessarily get
> > scheduled when ->run_job() is called, it can be placed in the
> runnable
> > queue and be picked later by the kernel-side scheduler when a FW slot
> > gets released. That can probably be sorted out by manually disabling
> the
> > job timer and re-enabling it when the stream gets picked by the
> > scheduler. But my main concern remains, we're basically abusing
> > drm_sched here.
> >
> > For the Arm driver, that means turning the following sequence
> >
> > 1. wait for job deps
> > 2. queue job to ringbuf and push the stream to the runnable
> >  queue (if it wasn't queued already). Wakeup the timeslice
> scheduler
> >  to re-evaluate (if the stream is not on a FW slot already)
> > 3. stream gets picked by the timeslice scheduler and sent to the FW
> for
> >  execution
> >
> > into
> >
> > 1. queue job to entity which takes care of waiting for job deps for
> >  us
> > 2. schedule a drm_sched_main iteration
> > 3. the only available entity is picked, and the first job from this
> >  entity is dequeued. ->run_job() is called: the job is queued to
> the
> >  ringbuf and the stream is pushed to the runnable queue (if it
> wasn't
> >  queued already). Wakeup the timeslice scheduler to re-evaluate
> (if
> >  the stream is not on a FW slot already)
> > 4. stream gets picked by the timeslice scheduler and sent to the FW
> for
> >  execution
> >
> > That's one extra step we don't really need. To sum-up, yes, all the
> > job/entity tracking might be interesting to share/re-use, but I
> wonder
> > if we couldn't have that without pulling out the scheduling part of
> > drm_sched, or maybe I'm missing something, and there's something in
> > drm_gpu_scheduler you really need.
> 
>  On second 

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-09 Thread Jason Ekstrand
On Thu, Jan 5, 2023 at 1:40 PM Matthew Brost 
wrote:

> On Mon, Jan 02, 2023 at 08:30:19AM +0100, Boris Brezillon wrote:
> > On Fri, 30 Dec 2022 12:55:08 +0100
> > Boris Brezillon  wrote:
> >
> > > On Fri, 30 Dec 2022 11:20:42 +0100
> > > Boris Brezillon  wrote:
> > >
> > > > Hello Matthew,
> > > >
> > > > On Thu, 22 Dec 2022 14:21:11 -0800
> > > > Matthew Brost  wrote:
> > > >
> > > > > In XE, the new Intel GPU driver, a choice has been made to have a 1 to 1
> > > > > mapping between a drm_gpu_scheduler and drm_sched_entity. At first
> > > > > this seems a bit odd but let us explain the reasoning below.
> > > > >
> > > > > 1. In XE the submission order from multiple drm_sched_entity is not
> > > > > guaranteed to be the same as the completion order even if targeting
> > > > > the same hardware engine. This is because in XE we have a firmware
> > > > > scheduler, the GuC, which is allowed to reorder, timeslice, and
> > > > > preempt submissions. If using a shared drm_gpu_scheduler across
> > > > > multiple drm_sched_entity, the TDR falls apart as the TDR expects
> > > > > submission order == completion order. Using a dedicated
> > > > > drm_gpu_scheduler per drm_sched_entity solves this problem.
> > > >
> > > > Oh, that's interesting. I've been trying to solve the same sort of
> > > > issues to support Arm's new Mali GPU which is relying on a
> FW-assisted
> > > > scheduling scheme (you give the FW N streams to execute, and it does
> > > > the scheduling between those N command streams, the kernel driver
> > > > does timeslice scheduling to update the command streams passed to the
> > > > FW). I must admit I gave up on using drm_sched at some point, mostly
> > > > because the integration with drm_sched was painful, but also because
> I
> > > > felt trying to bend drm_sched to make it interact with a
> > > > timeslice-oriented scheduling model wasn't really future proof.
> Giving
> > > > drm_sched_entity exclusive access to a drm_gpu_scheduler probably
> might
> > > > help for a few things (didn't think it through yet), but I feel it's
> > > > coming short on other aspects we have to deal with on Arm GPUs.
> > >
> > > Ok, so I just had a quick look at the Xe driver and how it
> > > instantiates the drm_sched_entity and drm_gpu_scheduler, and I think I
> > > have a better understanding of how you get away with using drm_sched
> > > while still controlling how scheduling is really done. Here
> > > drm_gpu_scheduler is just a dummy abstraction that lets you use the
> > > drm_sched job queuing/dep/tracking mechanism. The whole run-queue
>
> You nailed it here, we use the DRM scheduler for queuing jobs,
> dependency tracking and releasing jobs to be scheduled when dependencies
> are met, and lastly a tracking mechanism of in-flight jobs that need to
> be cleaned up if an error occurs. It doesn't actually do any scheduling
> aside from the most basic level of not overflowing the submission ring
> buffer. In this sense, a 1 to 1 relationship between entity and
> scheduler fits quite well.
>

Yeah, I think there's an annoying difference between what AMD/NVIDIA/Intel
want here and what you need for Arm thanks to the number of FW queues
available. I don't remember the exact number of GuC queues but it's at
least 1k. This puts it in an entirely different class from what you have on
Mali. Roughly, there are three categories here:

 1. Hardware where the kernel is placing jobs on actual HW rings. This is
old Mali, Intel Haswell and earlier, and probably a bunch of others.
(Intel BDW+ with execlists is a weird case that doesn't fit in this
categorization.)

 2. Hardware (or firmware) with a very limited number of queues where
you're going to have to juggle in the kernel in order to run desktop Linux.

 3. Firmware scheduling with a high queue count. In this case, you don't
want the kernel scheduling anything. Just throw it at the firmware and let
it go brrr.  If we ever run out of queues (unlikely), the kernel can
temporarily pause some low-priority contexts and do some juggling or,
frankly, just fail userspace queue creation and tell the user to close some
windows.

The existence of this 2nd class is a bit annoying but it's where we are. I
think it's worth recognizing that Xe and panfrost are in different places
here and will require different designs. For Xe, we really are just using
drm/scheduler as a front-end and the firmware does all the real scheduling.
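
To make that 1:1 shape concrete, here's a hedged sketch of per-engine
scheduler setup using the drm_sched API roughly as it looked around v6.1;
the toy_* names and the ops/timeout values are made up, not Xe's:

    #include <drm/gpu_scheduler.h>

    static const struct drm_sched_backend_ops toy_sched_ops; /* elided */

    struct toy_engine {
            struct drm_gpu_scheduler sched;
            struct drm_sched_entity entity;
    };

    static int toy_engine_init(struct toy_engine *e, struct device *dev)
    {
            struct drm_gpu_scheduler *list[] = { &e->sched };
            int err;

            err = drm_sched_init(&e->sched, &toy_sched_ops,
                                 64,   /* hw_submission: ring depth only */
                                 0,    /* hang_limit */
                                 msecs_to_jiffies(5000), /* job timeout */
                                 NULL, NULL, "toy", dev);
            if (err)
                    return err;

            /* The one and only entity ever bound to this scheduler. */
            return drm_sched_entity_init(&e->entity,
                                         DRM_SCHED_PRIORITY_NORMAL,
                                         list, 1, NULL);
    }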

How do we deal with class 2? That's an interesting question.  We may
eventually want to break that off into a separate discussion and not litter
the Xe thread but let's keep going here for a bit.  I think there are some
pretty reasonable solutions but they're going to look a bit different.

The way I did this for Xe with execlists was to keep the 1:1:1 mapping
between drm_gpu_scheduler, drm_sched_entity, and userspace xe_engine.
Instead of feeding a GuC ring, though, it would feed a fixed-size execlist
ring and then there was a tiny kernel which operated entirely in IRQ

Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-30 Thread Jason Ekstrand
On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> VM_BIND and related uapi definitions
>
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
> I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
> Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
> all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> Update documentation on async vm_bind/unbind and versioning.
> Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> batch_count field and I915_EXEC3_SECURE flag.
>
> Signed-off-by: Niranjana Vishwanathapura <
> niranjana.vishwanathap...@intel.com>
> Reviewed-by: Daniel Vetter 
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index ..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings are
> + *lifted. Similarly, binding will replace any mappings in the given
> range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION 57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND   (1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND   0x3d
> +#define DRM_I915_GEM_VM_UNBIND 0x3e
> +#define DRM_I915_GEM_EXECBUFFER3   0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND   DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +   /** @handle: User's handle for a drm_syncobj to wait on or signal.
> */
> +   __u32 handle;
> +
> +   /**
> +* @flags: Supported flags are:
> +*
> +* I915_TIMELINE_FENCE_WAIT:
> +* Wait for the input fence before the operation.
> +*
> +* I915_TIMELINE_FENCE_SIGNAL:
> +* Return operation completion fence as output.
> +*/
> +   __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT(1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL  (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL
> << 1))
> +
> +   /**
> +* @value: A point in the timeline.
> +* Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +* timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +* binary one.
> +*/
> +   __u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not 
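
For a sense of how userspace would opt in, a hedged sketch against the
proposal above: I915_PARAM_VM_BIND_VERSION and I915_VM_CREATE_FLAGS_USE_VM_BIND
come from this RFC and are not in mainline, while the getparam and VM_CREATE
ioctls themselves are existing i915 uAPI:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    static int create_vm_bind_vm(int fd, uint32_t *vm_id)
    {
            int version = 0;
            struct drm_i915_getparam gp = {
                    .param = I915_PARAM_VM_BIND_VERSION,
                    .value = &version,
            };
            struct drm_i915_gem_vm_control vm = {
                    .flags = I915_VM_CREATE_FLAGS_USE_VM_BIND,
            };

            if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) || version < 1)
                    return -1; /* no VM_BIND support */

            if (ioctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &vm))
                    return -1;

            *vm_id = vm.vm_id;
            return 0;
    }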

Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-30 Thread Jason Ekstrand
On Thu, Jun 30, 2022 at 10:14 AM Matthew Auld 
wrote:

> On 30/06/2022 06:11, Jason Ekstrand wrote:
> > On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
> > <niranjana.vishwanathap...@intel.com> wrote:
> >
> > VM_BIND and related uapi definitions
> >
> > v2: Reduce the scope to simple Mesa use case.
> > v3: Expand VM_UNBIND documentation and add
> >  I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> >  and I915_GEM_VM_BIND_TLB_FLUSH flags.
> > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> >  documentation for vm_bind/unbind.
> > v5: Remove TLB flush requirement on VM_UNBIND.
> >  Add version support to stage implementation.
> > v6: Define and use drm_i915_gem_timeline_fence structure for
> >  all timeline fences.
> > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> >  Update documentation on async vm_bind/unbind and versioning.
> >  Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> >  batch_count field and I915_EXEC3_SECURE flag.
> >
> > Signed-off-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathap...@intel.com>
> > Reviewed-by: Daniel Vetter <daniel.vet...@ffwll.ch>
> > ---
> >   Documentation/gpu/rfc/i915_vm_bind.h | 280
> +++
> >   1 file changed, 280 insertions(+)
> >   create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> >
> > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> > b/Documentation/gpu/rfc/i915_vm_bind.h
> > new file mode 100644
> > index ..a93e08bceee6
> > --- /dev/null
> > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> > @@ -0,0 +1,280 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2022 Intel Corporation
> > + */
> > +
> > +/**
> > + * DOC: I915_PARAM_VM_BIND_VERSION
> > + *
> > + * VM_BIND feature version supported.
> > + * See typedef drm_i915_getparam_t param.
> > + *
> > + * Specifies the VM_BIND feature version supported.
> > + * The following versions of VM_BIND have been defined:
> > + *
> > + * 0: No VM_BIND support.
> > + *
> > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
> > created
> > + *previously with VM_BIND, the ioctl will not support unbinding
> > multiple
> > + *mappings or splitting them. Similarly, VM_BIND calls will not
> > replace
> > + *any existing mappings.
> > + *
> > + * 2: The restrictions on unbinding partial or multiple mappings are
> > + *lifted. Similarly, binding will replace any mappings in the
> > given range.
> > + *
> > + * See struct drm_i915_gem_vm_bind and struct
> drm_i915_gem_vm_unbind.
> > + */
> > +#define I915_PARAM_VM_BIND_VERSION 57
> > +
> > +/**
> > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> > + *
> > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> > + * See struct drm_i915_gem_vm_control flags.
> > + *
> > + * The older execbuf2 ioctl will not support VM_BIND mode of
> operation.
> > + * For VM_BIND mode, we have new execbuf3 ioctl which will not
> > accept any
> > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> > + */
> > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND   (1 << 0)
> > +
> > +/* VM_BIND related ioctls */
> > +#define DRM_I915_GEM_VM_BIND   0x3d
> > +#define DRM_I915_GEM_VM_UNBIND 0x3e
> > +#define DRM_I915_GEM_EXECBUFFER3   0x3f
> > +
> > +#define DRM_IOCTL_I915_GEM_VM_BIND
> >   DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
> > drm_i915_gem_vm_bind)
> > +#define DRM_IOCTL_I915_GEM_VM_UNBIND
> >   DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
> > drm_i915_gem_vm_bind)
> > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
> >   DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
> > drm_i915_gem_execbuffer3)
> > +
> > +/**
> > + * struct drm_i915_gem_timeline_fence - An input or output timeline
> > fence.
> > + *
> > + * The operation will wait for input fence to

Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-29 Thread Jason Ekstrand
On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> VM_BIND and related uapi definitions
>
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
> I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
> Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
> all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> Update documentation on async vm_bind/unbind and versioning.
> Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> batch_count field and I915_EXEC3_SECURE flag.
>
> Signed-off-by: Niranjana Vishwanathapura <
> niranjana.vishwanathap...@intel.com>
> Reviewed-by: Daniel Vetter 
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index ..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings are
> + *lifted. Similarly, binding will replace any mappings in the given
> range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION 57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND   (1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND   0x3d
> +#define DRM_I915_GEM_VM_UNBIND 0x3e
> +#define DRM_I915_GEM_EXECBUFFER3   0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND   DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +   /** @handle: User's handle for a drm_syncobj to wait on or signal.
> */
> +   __u32 handle;
> +
> +   /**
> +* @flags: Supported flags are:
> +*
> +* I915_TIMELINE_FENCE_WAIT:
> +* Wait for the input fence before the operation.
> +*
> +* I915_TIMELINE_FENCE_SIGNAL:
> +* Return operation completion fence as output.
> +*/
> +   __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT(1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL  (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL
> << 1))
> +
> +   /**
> +* @value: A point in the timeline.
> +* Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +* timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +* binary one.
> +*/
> +   __u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not 

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-08 Thread Jason Ekstrand
On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
> >
> >
> >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
> >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura
> wrote:
> >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
> >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura
> >>>>  wrote:
> >>>>
> >>>>   On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
> >>>>   >   On 02/06/2022 23:35, Jason Ekstrand wrote:
> >>>>   >
> >>>>   > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura
> >>>>   >  wrote:
> >>>>   >
> >>>>   >   On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew
> >>>>Brost wrote:
> >>>>   >   >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin
> >>>>   wrote:
> >>>>   >   >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
> >>>>   >   >> > +VM_BIND/UNBIND ioctl will immediately start
> >>>>   binding/unbinding
> >>>>   >   the mapping in an
> >>>>   >   >> > +async worker. The binding and unbinding will
> >>>>work like a
> >>>>   special
> >>>>   >   GPU engine.
> >>>>   >   >> > +The binding and unbinding operations are serialized
> and
> >>>>   will
> >>>>   >   wait on specified
> >>>>   >   >> > +input fences before the operation and will signal the
> >>>>   output
> >>>>   >   fences upon the
> >>>>   >   >> > +completion of the operation. Due to serialization,
> >>>>   completion of
> >>>>   >   an operation
> >>>>   >   >> > +will also indicate that all previous operations
> >>>>are also
> >>>>   >   complete.
> >>>>   >   >>
> >>>>   >   >> I guess we should avoid saying "will immediately start
> >>>>   >   binding/unbinding" if
> >>>>   >   >> there are fences involved.
> >>>>   >   >>
> >>>>   >   >> And the fact that it's happening in an async
> >>>>worker seem to
> >>>>   imply
> >>>>   >   it's not
> >>>>   >   >> immediate.
> >>>>   >   >>
> >>>>   >
> >>>>   >   Ok, will fix.
> >>>>   >   This was added because in earlier design binding was
> deferred
> >>>>   until
> >>>>   >   next execbuff.
> >>>>   >   But now it is non-deferred (immediate in that sense).
> >>>>But yah,
> >>>>   this is
> >>>>   >   confusing
> >>>>   >   and will fix it.
> >>>>   >
> >>>>   >   >>
> >>>>   >   >> I have a question on the behavior of the bind
> >>>>operation when
> >>>>   no
> >>>>   >   input fence
> >>>>   >   >> is provided. Let say I do :
> >>>>   >   >>
> >>>>   >   >> VM_BIND (out_fence=fence1)
> >>>>   >   >>
> >>>>   >   >> VM_BIND (out_fence=fence2)
> >>>>   >   >>
> >>>>   >   >> VM_BIND (out_fence=fence3)
> >>>>   >   >>
> >>>>   >   >>
> >>>>   >   >> In what order are the fences going to be signaled?
> >>>>   >   >>
> >>>>   >   >> In the order of VM_BIND ioctls? Or out of order?
> >>>>   >   >>
> >>>>   >   >> Because you wrote "serialized" I assume it's: in order.
> >>>>   >   >>
> >>>>   >
> >>>>   >   Yes, in the order of VM_BIND/UNBIND ioctls. Note that
> >>>>bind and
> >>>>   unbind
> >>>>   >   will use
> >>>>   >   

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-07 Thread Jason Ekstrand
On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
> >   On 02/06/2022 23:35, Jason Ekstrand wrote:
> >
> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura
> >  wrote:
> >
> >   On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
> >   >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
> >   >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
> >   >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding
> >   the mapping in an
> >   >> > +async worker. The binding and unbinding will work like a
> special
> >   GPU engine.
> >   >> > +The binding and unbinding operations are serialized and will
> >   wait on specified
> >   >> > +input fences before the operation and will signal the output
> >   fences upon the
> >   >> > +completion of the operation. Due to serialization,
> completion of
> >   an operation
> >   >> > +will also indicate that all previous operations are also
> >   complete.
> >   >>
> >   >> I guess we should avoid saying "will immediately start
> >   binding/unbinding" if
> >   >> there are fences involved.
> >   >>
> >   >> And the fact that it's happening in an async worker seem to
> imply
> >   it's not
> >   >> immediate.
> >   >>
> >
> >   Ok, will fix.
> >   This was added because in earlier design binding was deferred until
> >   next execbuff.
> >   But now it is non-deferred (immediate in that sense). But yah,
> this is
> >   confusing
> >   and will fix it.
> >
> >   >>
> >   >> I have a question on the behavior of the bind operation when no
> >   input fence
> >   >> is provided. Let say I do :
> >   >>
> >   >> VM_BIND (out_fence=fence1)
> >   >>
> >   >> VM_BIND (out_fence=fence2)
> >   >>
> >   >> VM_BIND (out_fence=fence3)
> >   >>
> >   >>
> >   >> In what order are the fences going to be signaled?
> >   >>
> >   >> In the order of VM_BIND ioctls? Or out of order?
> >   >>
> >   >> Because you wrote "serialized" I assume it's: in order.
> >   >>
> >
> >   Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and
> unbind
> >   will use
> >   the same queue and hence are ordered.
> >
> >   >>
> >   >> One thing I didn't realize is that because we only get one
> >   "VM_BIND" engine,
> >   >> there is a disconnect from the Vulkan specification.
> >   >>
> >   >> In Vulkan VM_BIND operations are serialized but per engine.
> >   >>
> >   >> So you could have something like this :
> >   >>
> >   >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
> >   >>
> >   >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)
> >   >>
> >   >>
> >   >> fence1 is not signaled
> >   >>
> >   >> fence3 is signaled
> >   >>
> >   >> So the second VM_BIND will proceed before the first VM_BIND.
> >   >>
> >   >>
> >   >> I guess we can deal with that scenario in userspace by doing the
> >   wait
> >   >> ourselves in one thread per engines.
> >   >>
> >   >> But then it makes the VM_BIND input fences useless.
> >   >>
> >   >>
> >   >> Daniel : what do you think? Should be rework this or just deal
> with
> >   wait
> >   >> fences in userspace?
> >   >>
> >   >
> >   >My opinion is rework this but make the ordering via an engine
> param
> >   optional.
> >   >
> >   >e.g. A VM can be configured so all binds are ordered within the VM
> >   >
> >   >e.g. A VM can be configured so all binds accept an engine argument
> >   (in
> >   >the case of the i915 likely this is a gem context h

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-02 Thread Jason Ekstrand
On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
> >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
> >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
> >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the
> mapping in an
> >> > +async worker. The binding and unbinding will work like a special GPU
> engine.
> >> > +The binding and unbinding operations are serialized and will wait on
> specified
> >> > +input fences before the operation and will signal the output fences
> upon the
> >> > +completion of the operation. Due to serialization, completion of an
> operation
> >> > +will also indicate that all previous operations are also complete.
> >>
> >> I guess we should avoid saying "will immediately start
> binding/unbinding" if
> >> there are fences involved.
> >>
> >> And the fact that it's happening in an async worker seem to imply it's
> not
> >> immediate.
> >>
>
> Ok, will fix.
> This was added because in the earlier design binding was deferred until the
> next execbuf.
> But now it is non-deferred (immediate in that sense). But yah, this is
> confusing and will fix it.
>
> >>
> >> I have a question on the behavior of the bind operation when no input
> fence
> >> is provided. Let say I do :
> >>
> >> VM_BIND (out_fence=fence1)
> >>
> >> VM_BIND (out_fence=fence2)
> >>
> >> VM_BIND (out_fence=fence3)
> >>
> >>
> >> In what order are the fences going to be signaled?
> >>
> >> In the order of VM_BIND ioctls? Or out of order?
> >>
> >> Because you wrote "serialized" I assume it's: in order.
> >>
>
> Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and unbind will
> use
> the same queue and hence are ordered.
>
> >>
> >> One thing I didn't realize is that because we only get one "VM_BIND"
> engine,
> >> there is a disconnect from the Vulkan specification.
> >>
> >> In Vulkan VM_BIND operations are serialized but per engine.
> >>
> >> So you could have something like this :
> >>
> >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
> >>
> >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)
> >>
> >>
> >> fence1 is not signaled
> >>
> >> fence3 is signaled
> >>
> >> So the second VM_BIND will proceed before the first VM_BIND.
> >>
> >>
> >> I guess we can deal with that scenario in userspace by doing the wait
> >> ourselves in one thread per engines.
> >>
> >> But then it makes the VM_BIND input fences useless.
> >>
> >>
> >> Daniel : what do you think? Should be rework this or just deal with wait
> >> fences in userspace?
> >>
> >
> >My opinion is rework this but make the ordering via an engine param
> optional.
> >
> >e.g. A VM can be configured so all binds are ordered within the VM
> >
> >e.g. A VM can be configured so all binds accept an engine argument (in
> >the case of the i915 likely this is a gem context handle) and binds
> >ordered with respect to that engine.
> >
> >This gives UMDs options as the later likely consumes more KMD resources
> >so if a different UMD can live with binds being ordered within the VM
> >they can use a mode consuming less resources.
> >
>
> I think we need to be careful here if we are looking for some out of
> (submission) order completion of vm_bind/unbind.
> In-order completion means, in a batch of binds and unbinds to be
> completed in-order, user only needs to specify in-fence for the
> first bind/unbind call and the out-fence for the last bind/unbind
> call. Also, the VA released by an unbind call can be re-used by
> any subsequent bind call in that in-order batch.
>
> These things will break if binding/unbinding were to be allowed to
> go out of order (of submission), and the user needs to be extra careful
> not to run into premature triggering of the out-fence and binds failing
> as the VA is still in use, etc.
>
> Also, VM_BIND binds the provided mapping on the specified address space
> (VM). So, the uapi is not engine/context specific.
>
> We can however add a 'queue' to the uapi which can be one from the
> pre-defined queues,
> I915_VM_BIND_QUEUE_0
> I915_VM_BIND_QUEUE_1
> ...
> I915_VM_BIND_QUEUE_(N-1)
>
> KMD will spawn an async work queue for each queue which will only
> bind the mappings on that queue in the order of submission.
> Users can assign a queue per engine or anything like that.
>
> But again here, user need to be careful and not deadlock these
> queues with circular dependency of fences.
>
> I prefer adding this later as an extension based on whether it
> is really helping with the implementation.
>

I can tell you right now that having everything on a single in-order queue
will not get us the perf we want.  What Vulkan really wants is one of two
things:

 1. No implicit ordering of VM_BIND ops.  They just happen in whatever order
their dependencies are resolved, and we ensure ordering ourselves by having
a syncobj in the VkQueue.

 2. The ability to create multiple VM_BIND queues.  
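
For context, a rough sketch of the Vulkan side driving this: sparse binds
are submitted to a VkQueue and ordered via semaphores, so ordering is per
queue rather than one global in-order stream (handles assumed created
elsewhere; an illustration, not a full implementation):

    #include <vulkan/vulkan.h>

    static void bind_on_queue(VkQueue queue, VkSemaphore wait,
                              VkSemaphore signal,
                              const VkSparseBufferMemoryBindInfo *buf_bind)
    {
            VkBindSparseInfo info = {
                    .sType = VK_STRUCTURE_TYPE_BIND_SPARSE_INFO,
                    .waitSemaphoreCount = 1,
                    .pWaitSemaphores = &wait,
                    .bufferBindCount = 1,
                    .pBufferBinds = buf_bind,
                    .signalSemaphoreCount = 1,
                    .pSignalSemaphores = &signal,
            };

            /* Binds on this queue are ordered against each other; binds
             * submitted to a different VkQueue proceed independently. */
            vkQueueBindSparse(queue, 1, &info, VK_NULL_HANDLE);
    }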

Re: [Intel-gfx] [PATCH 07/27] Revert "drm/i915/gt: Propagate change in error status to children on unhold"

2021-08-20 Thread Jason Ekstrand
On Thu, Aug 19, 2021 at 1:22 AM Matthew Brost  wrote:
>
> Propagating errors to dependent fences is wrong, don't do it. A selftest
> in the following exposed the propagation of an error to a dependent
> fence after an engine reset.

I feel like we could still have a bit of a better message.  Maybe
something like this:

Propagating errors to dependent fences is broken and can lead to
errors from one client ending up in another.  In 3761baae908a (Revert
"drm/i915: Propagate errors on awaiting already signaled fences"), we
attempted to get rid of fence error propagation but missed the case
added in 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status
to children on unhold").  Revert that one too.  This error was found
by an up-and-coming selftest which .

Otherwise, looks good to me.

--Jason

>
> This reverts commit 8e9f84cf5cac248a1c6a5daa4942879c8b765058.
>
> v2:
>  (Daniel Vetter)
>   - Use revert
>
> References: 3761baae908a (Revert "drm/i915: Propagate errors on awaiting 
> already signaled fences")
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de5f9c86b9a4..cafb0608ffb4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -2140,10 +2140,6 @@ static void __execlists_unhold(struct i915_request *rq)
> if (p->flags & I915_DEPENDENCY_WEAK)
> continue;
>
> -   /* Propagate any change in error status */
> -   if (rq->fence.error)
> -   i915_request_set_error_once(w, 
> rq->fence.error);
> -
> if (w->engine != rq->engine)
> continue;
>
> --
> 2.32.0
>


Re: [Intel-gfx] [PATCH 2/2] drm/i915: Add pci ids and uapi for DG1

2021-08-12 Thread Jason Ekstrand
On Thu, Aug 12, 2021 at 9:49 AM Daniel Vetter  wrote:
>
> On Thu, Aug 12, 2021 at 2:44 PM Maarten Lankhorst
>  wrote:
> >
> > DG1 has support for local memory, which requires the usage of the
> > lmem placement extension for creating bo's, and memregion queries
> > to obtain the size. Because of this, those parts of the uapi are
> > no longer guarded behind FAKE_LMEM.
> >
> > According to the pull request referenced below, mesa should be mostly
> > ready for DG1. VK_EXT_memory_budget is not hooked up yet, but we
> > should definitely just enable the uapi parts by default.
> >
> > Signed-off-by: Maarten Lankhorst 
> > References: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11584
> > Cc: Jordan Justen jordan.l.jus...@intel.com
> > Cc: Jason Ekstrand ja...@jlekstrand.net
>
> Acked-by: Daniel Vetter 

Acked-by: Jason Ekstrand 

>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_create.c | 3 ---
> >  drivers/gpu/drm/i915/i915_pci.c| 1 +
> >  drivers/gpu/drm/i915/i915_query.c  | 3 ---
> >  3 files changed, 1 insertion(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > index 23fee13a3384..1d341b8c47c0 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > @@ -347,9 +347,6 @@ static int ext_set_placements(struct 
> > i915_user_extension __user *base,
> >  {
> > struct drm_i915_gem_create_ext_memory_regions ext;
> >
> > -   if (!IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM))
> > -   return -ENODEV;
> > -
> > if (copy_from_user(, base, sizeof(ext)))
> > return -EFAULT;
> >
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c 
> > b/drivers/gpu/drm/i915/i915_pci.c
> > index 1bbd09ad5287..93ccdc6bbd03 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -1115,6 +1115,7 @@ static const struct pci_device_id pciidlist[] = {
> > INTEL_RKL_IDS(_info),
> > INTEL_ADLS_IDS(_s_info),
> > INTEL_ADLP_IDS(_p_info),
> > +   INTEL_DG1_IDS(_info),
> > {0, 0, 0}
> >  };
> >  MODULE_DEVICE_TABLE(pci, pciidlist);
> > diff --git a/drivers/gpu/drm/i915/i915_query.c 
> > b/drivers/gpu/drm/i915/i915_query.c
> > index e49da36c62fb..5e2b909827f4 100644
> > --- a/drivers/gpu/drm/i915/i915_query.c
> > +++ b/drivers/gpu/drm/i915/i915_query.c
> > @@ -432,9 +432,6 @@ static int query_memregion_info(struct drm_i915_private 
> > *i915,
> > u32 total_length;
> > int ret, id, i;
> >
> > -   if (!IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM))
> > -   return -ENODEV;
> > -
> > if (query_item->flags != 0)
> > return -EINVAL;
> >
> > --
> > 2.32.0
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] drm/i915: Use locked access to ctx->engines in set_priority

2021-08-12 Thread Jason Ekstrand
On Tue, Aug 10, 2021 at 8:05 AM Daniel Vetter  wrote:
>
> This essentially reverts
>
> commit 89ff76bf9b3b0b86e6bbe344bd6378d8661303fc
> Author: Chris Wilson 
> Date:   Thu Apr 2 13:42:18 2020 +0100
>
> drm/i915/gem: Utilize rcu iteration of context engines
>
> Note that the other uses of __context_engines_await have disappeared in
> the following commits:
>
> ccbc1b97948a ("drm/i915/gem: Don't allow changing the VM on running contexts 
> (v4)")
> c7a71fc8ee04 ("drm/i915: Drop getparam support for 
> I915_CONTEXT_PARAM_ENGINES")
> 4a766ae40ec8 ("drm/i915: Drop the CONTEXT_CLONE API (v2)")
>
> None of these have any business optimizing their engine lookup with
> rcu, absent extremely convincing benchmark data and a solid analysis
> of why we can't make that workload (whatever it is) faster with
> a proper design fix.
>
> Also, since there's only one caller of context_apply_all left and it's
> really just a loop, inline it and then inline the loop body too. This
> is how all other callers that take the engine lock loop over engines;
> it's much simpler.
>
> Signed-off-by: Daniel Vetter 
> Cc: Chris Wilson 
> Cc: Mika Kuoppala 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Cc: Tvrtko Ursulin 
> Cc: Joonas Lahtinen 
> Cc: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 72 -
>  1 file changed, 14 insertions(+), 58 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index dbaeb924a437..fd169cf2f75a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1284,49 +1284,6 @@ static int __context_set_persistence(struct 
> i915_gem_context *ctx, bool state)
> return 0;
>  }
>
> -static inline struct i915_gem_engines *
> -__context_engines_await(const struct i915_gem_context *ctx,
> -   bool *user_engines)
> -{
> -   struct i915_gem_engines *engines;
> -
> -   rcu_read_lock();
> -   do {
> -   engines = rcu_dereference(ctx->engines);
> -   GEM_BUG_ON(!engines);
> -
> -   if (user_engines)
> -   *user_engines = i915_gem_context_user_engines(ctx);
> -
> -   /* successful await => strong mb */
> -   if (unlikely(!i915_sw_fence_await(>fence)))

Ugh... The first time I looked at this I thought the SW fence meant it
was actually waiting on something.  But, no, it's just making sure the
engines object still exists.  *sigh*  Burn it!

Reviewed-by: Jason Ekstrand 

> -   continue;
> -
> -   if (likely(engines == rcu_access_pointer(ctx->engines)))
> -   break;
> -
> -   i915_sw_fence_complete(>fence);
> -   } while (1);
> -   rcu_read_unlock();
> -
> -   return engines;
> -}
> -
> -static void
> -context_apply_all(struct i915_gem_context *ctx,
> - void (*fn)(struct intel_context *ce, void *data),
> - void *data)
> -{
> -   struct i915_gem_engines_iter it;
> -   struct i915_gem_engines *e;
> -   struct intel_context *ce;
> -
> -   e = __context_engines_await(ctx, NULL);
> -   for_each_gem_engine(ce, e, it)
> -   fn(ce, data);
> -   i915_sw_fence_complete(>fence);
> -}
> -
>  static struct i915_gem_context *
>  i915_gem_create_context(struct drm_i915_private *i915,
> const struct i915_gem_proto_context *pc)
> @@ -1776,23 +1733,11 @@ set_persistence(struct i915_gem_context *ctx,
> return __context_set_persistence(ctx, args->value);
>  }
>
> -static void __apply_priority(struct intel_context *ce, void *arg)
> -{
> -   struct i915_gem_context *ctx = arg;
> -
> -   if (!intel_engine_has_timeslices(ce->engine))
> -   return;
> -
> -   if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> -   intel_engine_has_semaphores(ce->engine))
> -   intel_context_set_use_semaphores(ce);
> -   else
> -   intel_context_clear_use_semaphores(ce);
> -}
> -
>  static int set_priority(struct i915_gem_context *ctx,
> const struct drm_i915_gem_context_param *args)
>  {
> +   struct i915_gem_engines_iter it;
> +   struct intel_context *ce;
> int err;
>
> err = validate_priority(ctx->i915, args);
> @@ -1800,7 +1745,18 @@ static int set_priority(struct i915_gem_context *ctx,
> return err;
>
> ctx->sched.pri
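
The archived message is truncated above.  For context, the locked
iteration Daniel describes ("this is how all other callers that take
the engine lock loop over engines") has roughly the following shape.
This is a sketch, not the literal patch body, though
i915_gem_context_lock_engines()/i915_gem_context_unlock_engines() and
for_each_gem_engine() are the real i915 helpers:

	struct i915_gem_engines_iter it;
	struct intel_context *ce;

	/* Holds ctx->engines_mutex for the walk, so the engines list
	 * cannot change underneath us -- no RCU retry dance needed. */
	for_each_gem_engine(ce, i915_gem_context_lock_engines(ctx), it) {
		if (!intel_engine_has_timeslices(ce->engine))
			continue;
		/* apply the priority-dependent semaphore policy to ce */
	}
	i915_gem_context_unlock_engines(ctx);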

Re: [Intel-gfx] [PATCH] drm/doc/rfc: drop lmem uapi section

2021-08-10 Thread Jason Ekstrand
Acked-by: Jason Ekstrand 

On Tue, Aug 10, 2021 at 7:34 AM Daniel Vetter  wrote:
>
> We still have quite a bit more work to do with overall reworking of
> the ttm-based dg1 code, but the uapi stuff is now finalized with the
> latest pull. So remove that.
>
> This also fixes kerneldoc build warnings because we've included the
> same headers in two places, resulting in sphinx complaining about
> duplicated symbols. This regression has been created when we moved the
> uapi definitions to the real include/uapi/ folder in 727ecd99a4c9
> ("drm/doc/rfc: drop the i915_gem_lmem.h header")
>
> Reported-by: Stephen Rothwell 
> Cc: Stephen Rothwell 
> Fixes: 727ecd99a4c9 ("drm/doc/rfc: drop the i915_gem_lmem.h header")
> Cc: Matthew Auld 
> Signed-off-by: Daniel Vetter 
> ---
>  Documentation/gpu/rfc/i915_gem_lmem.rst | 107 
>  1 file changed, 107 deletions(-)
>
> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst 
> b/Documentation/gpu/rfc/i915_gem_lmem.rst
> index 675ba8620d66..91be041e68cc 100644
> --- a/Documentation/gpu/rfc/i915_gem_lmem.rst
> +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
> @@ -22,110 +22,3 @@ real, with all the uAPI bits is:
>  * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
>  * Send RFC(with mesa-dev on cc) for final sign off on the uAPI
>  * Add pciid for DG1 and turn on uAPI for real
> -
> -New object placement and region query uAPI
> -==
> -Starting from DG1 we need to give userspace the ability to allocate buffers 
> from
> -device local-memory. Currently the driver supports gem_create, which can 
> place
> -buffers in system memory via shmem, and the usual assortment of other
> -interfaces, like dumb buffers and userptr.
> -
> -To support this new capability, while also providing a uAPI which will work
> -beyond just DG1, we propose to offer three new bits of uAPI:
> -
> -DRM_I915_QUERY_MEMORY_REGIONS
> --
> -New query ID which allows userspace to discover the list of supported memory
> -regions(like system-memory and local-memory) for a given device. We identify
> -each region with a class and instance pair, which should be unique. The class
> -here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
> -like DG1.
> -
> -Side note: The class/instance design is borrowed from our existing engine 
> uAPI,
> -where we describe every physical engine in terms of its class, and the
> -particular instance, since we can have more than one per class.
> -
> -In the future we also want to expose more information which can further
> -describe the capabilities of a region.
> -
> -.. kernel-doc:: include/uapi/drm/i915_drm.h
> -:functions: drm_i915_gem_memory_class 
> drm_i915_gem_memory_class_instance drm_i915_memory_region_info 
> drm_i915_query_memory_regions
> -
> -GEM_CREATE_EXT
> ---
> -New ioctl which is basically just gem_create but now allows userspace to 
> provide
> -a chain of possible extensions. Note that if we don't provide any extensions 
> and
> -set flags=0 then we get the exact same behaviour as gem_create.
> -
> -Side note: We also need to support PXP[1] in the near future, which is also
> -applicable to integrated platforms, and adds its own gem_create_ext 
> extension,
> -which basically lets userspace mark a buffer as "protected".
> -
> -.. kernel-doc:: include/uapi/drm/i915_drm.h
> -:functions: drm_i915_gem_create_ext
> -
> -I915_GEM_CREATE_EXT_MEMORY_REGIONS
> ---
> -Implemented as an extension for gem_create_ext, we would now allow userspace 
> to
> -optionally provide an immutable list of preferred placements at creation 
> time,
> -in priority order, for a given buffer object.  For the placements we expect
> -them each to use the class/instance encoding, as per the output of the 
> regions
> -query. Having the list in priority order will be useful in the future when
> -placing an object, say during eviction.
> -
> -.. kernel-doc:: include/uapi/drm/i915_drm.h
> -:functions: drm_i915_gem_create_ext_memory_regions
> -
> -One fair criticism here is that this seems a little over-engineered[2]. If we
> -just consider DG1 then yes, a simple gem_create.flags or something is totally
> -all that's needed to tell the kernel to allocate the buffer in local-memory 
> or
> -whatever. However looking to the future we need uAPI which can also support
> -upcoming Xe HP multi-tile architecture in a sane way, where there can be
> -multiple local-memory instances for a given device, and so using both class 
> and
> -instance in our uAPI to describe regions is desirable, a
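
Since the uAPI described in the deleted document is now final, a short
userspace sketch of how the pieces fit together may be useful.  This is
an illustration only -- error handling is elided, and it assumes an
open DRM fd; the ioctl and structs are the ones from
include/uapi/drm/i915_drm.h:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Create a BO preferring device local-memory, falling back to system
 * memory, via gem_create_ext and the memory-regions extension. */
static uint32_t create_lmem_bo(int fd, uint64_t size)
{
	struct drm_i915_gem_memory_class_instance regions[] = {
		{ .memory_class = I915_MEMORY_CLASS_DEVICE, .memory_instance = 0 },
		{ .memory_class = I915_MEMORY_CLASS_SYSTEM, .memory_instance = 0 },
	};
	struct drm_i915_gem_create_ext_memory_regions ext = {
		.base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
		.num_regions = 2,
		.regions = (uintptr_t)regions,
	};
	struct drm_i915_gem_create_ext create = {
		.size = size,
		.extensions = (uintptr_t)&ext,
	};

	ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);
	return create.handle; /* stays 0 if the ioctl failed */
}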

Re: [Intel-gfx] [PATCH] drm/i915: Release ctx->syncobj on final put, not on ctx close

2021-08-07 Thread Jason Ekstrand

On August 6, 2021 15:18:59 Daniel Vetter  wrote:


gem context refcounting is another exercise in least locking design it
seems, where most things get destroyed upon context closure (which can
race with anything really). Only the actual memory allocation and the
locks survive while holding a reference.

This tripped up Jason when reimplementing the single timeline feature
in

commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d
Author: Jason Ekstrand 
Date:   Thu Jul 8 10:48:12 2021 -0500

   drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)

We could fix the bug by holding ctx->mutex, but it's cleaner to just


What bug is this fixing, exactly?

--Jason



make the context object actually invariant over its _entire_ lifetime.

Signed-off-by: Daniel Vetter 
Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
Cc: Jason Ekstrand 
Cc: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: Joonas Lahtinen 
Cc: Matthew Brost 
Cc: Matthew Auld 
Cc: Maarten Lankhorst 
Cc: "Thomas Hellström" 
Cc: Lionel Landwerlin 
Cc: Dave Airlie 
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c

index 754b9b8d4981..93ba0197d70a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref)
 trace_i915_context_free(ctx);
 GEM_BUG_ON(!i915_gem_context_is_closed(ctx));

+ if (ctx->syncobj)
+ drm_syncobj_put(ctx->syncobj);
+
 mutex_destroy(>engines_mutex);
 mutex_destroy(>lut_mutex);

@@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx)
 if (vm)
 i915_vm_close(vm);

- if (ctx->syncobj)
- drm_syncobj_put(ctx->syncobj);
-
 ctx->file_priv = ERR_PTR(-EBADF);

 /*
--
2.32.0




Re: [Intel-gfx] [PATCH -next] drm/i915: fix i915_globals_exit() section mismatch error

2021-08-04 Thread Jason Ekstrand
On Wed, Aug 4, 2021 at 3:41 PM Randy Dunlap  wrote:
>
> Fix modpost Section mismatch error in i915_globals_exit().
> Since both an __init function and an __exit function can call
> i915_globals_exit(), any function that i915_globals_exit() calls
> should not be marked as __init or __exit. I.e., it needs to be
> available for either of them.
>
> WARNING: modpost: vmlinux.o(.text+0x8b796a): Section mismatch in reference 
> from the function i915_globals_exit() to the function 
> .exit.text:__i915_globals_flush()
> The function i915_globals_exit() references a function in an exit section.
> Often the function __i915_globals_flush() has valid usage outside the exit 
> section
> and the fix is to remove the __exit annotation of __i915_globals_flush.
>
> ERROR: modpost: Section mismatches detected.
> Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.

My gut says we actually want to back-port
https://lore.kernel.org/dri-devel/YPk3OCMrhg7UlU6T@phenom.ffwll.local/
instead.  Daniel, thoughts?

--Jason

>
> Fixes: 1354d830cb8f ("drm/i915: Call i915_globals_exit() if 
> pci_register_device() fails")
> Signed-off-by: Randy Dunlap 
> Cc: Jason Ekstrand 
> Cc: Daniel Vetter 
> Cc: Rodrigo Vivi 
> Cc: Jani Nikula 
> Cc: Joonas Lahtinen 
> Cc: intel-gfx@lists.freedesktop.org
> Cc: dri-de...@lists.freedesktop.org
> ---
>  drivers/gpu/drm/i915/i915_globals.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- linext-2021-0804.orig/drivers/gpu/drm/i915/i915_globals.c
> +++ linext-2021-0804/drivers/gpu/drm/i915/i915_globals.c
> @@ -138,7 +138,7 @@ void i915_globals_unpark(void)
> atomic_inc();
>  }
>
> -static void __exit __i915_globals_flush(void)
> +static void  __i915_globals_flush(void)
>  {
> atomic_inc(); /* skip shrinking */
>
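
For readers unfamiliar with this class of bug, here is a minimal sketch
of the pattern modpost is warning about -- illustrative only, not the
i915 code:

#include <linux/init.h>
#include <linux/module.h>

/* BAD: placed in .exit.text, yet reachable from the init error path. */
static void __exit flush(void)
{
}

static void common_teardown(void)
{
	flush(); /* called from both the init error path and module exit */
}

static int __init my_init(void)
{
	if (1 /* pretend setup failed */) {
		/* An __init function now references .exit.text: exactly
		 * the mismatch modpost reports.  For built-in code,
		 * .exit.text may be discarded entirely.  The fix is to
		 * drop the __exit annotation from flush(). */
		common_teardown();
		return -ENODEV;
	}
	return 0;
}

static void __exit my_exit(void)
{
	common_teardown();
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");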


Re: [Intel-gfx] [PATCH 2/2] drm/i915: delete gpu reloc code

2021-08-03 Thread Jason Ekstrand
Both are

Reviewed-by: Jason Ekstrand 

On Tue, Aug 3, 2021 at 7:49 AM Daniel Vetter  wrote:
>
> It's already removed, this just garbage collects it all.
>
> v2: Rebase over s/GEN/GRAPHICS_VER/
>
> v3: Also ditch eb.reloc_pool and eb.reloc_context (Maarten)
>
> Signed-off-by: Daniel Vetter 
> Cc: Jon Bloomfield 
> Cc: Chris Wilson 
> Cc: Maarten Lankhorst 
> Cc: Daniel Vetter 
> Cc: Joonas Lahtinen 
> Cc: "Thomas Hellström" 
> Cc: Matthew Auld 
> Cc: Lionel Landwerlin 
> Cc: Dave Airlie 
> Cc: Jason Ekstrand 
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 360 +-
>  .../drm/i915/selftests/i915_live_selftests.h  |   1 -
>  2 files changed, 1 insertion(+), 360 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index e4dc4c3b4df3..98e25efffb59 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -277,16 +277,8 @@ struct i915_execbuffer {
> bool has_llc : 1;
> bool has_fence : 1;
> bool needs_unfenced : 1;
> -
> -   struct i915_request *rq;
> -   u32 *rq_cmd;
> -   unsigned int rq_size;
> -   struct intel_gt_buffer_pool_node *pool;
> } reloc_cache;
>
> -   struct intel_gt_buffer_pool_node *reloc_pool; /** relocation pool for 
> -EDEADLK handling */
> -   struct intel_context *reloc_context;
> -
> u64 invalid_flags; /** Set of execobj.flags that are invalid */
>
> u64 batch_len; /** Length of batch within object */
> @@ -1035,8 +1027,6 @@ static void eb_release_vmas(struct i915_execbuffer *eb, 
> bool final)
>
>  static void eb_destroy(const struct i915_execbuffer *eb)
>  {
> -   GEM_BUG_ON(eb->reloc_cache.rq);
> -
> if (eb->lut_size > 0)
> kfree(eb->buckets);
>  }
> @@ -1048,14 +1038,6 @@ relocation_target(const struct 
> drm_i915_gem_relocation_entry *reloc,
> return gen8_canonical_addr((int)reloc->delta + target->node.start);
>  }
>
> -static void reloc_cache_clear(struct reloc_cache *cache)
> -{
> -   cache->rq = NULL;
> -   cache->rq_cmd = NULL;
> -   cache->pool = NULL;
> -   cache->rq_size = 0;
> -}
> -
>  static void reloc_cache_init(struct reloc_cache *cache,
>  struct drm_i915_private *i915)
>  {
> @@ -1068,7 +1050,6 @@ static void reloc_cache_init(struct reloc_cache *cache,
> cache->has_fence = cache->graphics_ver < 4;
> cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
> cache->node.flags = 0;
> -   reloc_cache_clear(cache);
>  }
>
>  static inline void *unmask_page(unsigned long p)
> @@ -1090,48 +1071,10 @@ static inline struct i915_ggtt *cache_to_ggtt(struct 
> reloc_cache *cache)
> return >ggtt;
>  }
>
> -static void reloc_cache_put_pool(struct i915_execbuffer *eb, struct 
> reloc_cache *cache)
> -{
> -   if (!cache->pool)
> -   return;
> -
> -   /*
> -* This is a bit nasty, normally we keep objects locked until the end
> -* of execbuffer, but we already submit this, and have to unlock 
> before
> -* dropping the reference. Fortunately we can only hold 1 pool node at
> -* a time, so this should be harmless.
> -*/
> -   i915_gem_ww_unlock_single(cache->pool->obj);
> -   intel_gt_buffer_pool_put(cache->pool);
> -   cache->pool = NULL;
> -}
> -
> -static void reloc_gpu_flush(struct i915_execbuffer *eb, struct reloc_cache 
> *cache)
> -{
> -   struct drm_i915_gem_object *obj = cache->rq->batch->obj;
> -
> -   GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
> -   cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
> -
> -   i915_gem_object_flush_map(obj);
> -   i915_gem_object_unpin_map(obj);
> -
> -   intel_gt_chipset_flush(cache->rq->engine->gt);
> -
> -   i915_request_add(cache->rq);
> -   reloc_cache_put_pool(eb, cache);
> -   reloc_cache_clear(cache);
> -
> -   eb->reloc_pool = NULL;
> -}
> -
>  static void reloc_cache_reset(struct reloc_cache *cache, struct 
> i915_execbuffer *eb)
>  {
> void *vaddr;
>
> -   if (cache->rq)
> -   reloc_gpu_flush(eb, cache);
> -
> if (!cache->vaddr)
> return;
>
> @@ -1313,295 +1256,6 @@ static void clflush_write32(u32 *addr, u32 value, 
> unsigne

Re: [Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-08-03 Thread Jason Ekstrand
On Tue, Aug 3, 2021 at 10:09 AM Daniel Vetter  wrote:
> On Wed, Jul 28, 2021 at 4:22 PM Matthew Auld
>  wrote:
> >
> > On Mon, 26 Jul 2021 at 17:10, Tvrtko Ursulin
> >  wrote:
> > >
> > >
> > > On 26/07/2021 16:14, Jason Ekstrand wrote:
> > > > On Mon, Jul 26, 2021 at 3:31 AM Maarten Lankhorst
> > > >  wrote:
> > > >>
> > > >> Op 23-07-2021 om 13:34 schreef Matthew Auld:
> > > >>> From: Chris Wilson 
> > > >>>
> > > >>> Jason Ekstrand requested a more efficient method than 
> > > >>> userptr+set-domain
> > > >>> to determine if the userptr object was backed by a complete set of 
> > > >>> pages
> > > >>> upon creation. To be more efficient than simply populating the userptr
> > > >>> using get_user_pages() (as done by the call to set-domain or execbuf),
> > > >>> we can walk the tree of vm_area_struct and check for gaps or vma not
> > > >>> backed by struct page (VM_PFNMAP). The question is how to handle
> > > >>> VM_MIXEDMAP which may be either struct page or pfn backed...
> > > >>>
> > > >>> With discrete we are going to drop support for set_domain(), so 
> > > >>> offering
> > > >>> a way to probe the pages, without having to resort to dummy batches 
> > > >>> has
> > > >>> been requested.
> > > >>>
> > > >>> v2:
> > > >>> - add new query param for the PROBE flag, so userspace can easily
> > > >>>check if the kernel supports it(Jason).
> > > >>> - use mmap_read_{lock, unlock}.
> > > >>> - add some kernel-doc.
> > > >>> v3:
> > > >>> - In the docs also mention that PROBE doesn't guarantee that the pages
> > > >>>will remain valid by the time they are actually used(Tvrtko).
> > > >>> - Add a small comment for the hole finding logic(Jason).
> > > >>> - Move the param next to all the other params which just return true.
> > > >>>
> > > >>> Testcase: igt/gem_userptr_blits/probe
> > > >>> Signed-off-by: Chris Wilson 
> > > >>> Signed-off-by: Matthew Auld 
> > > >>> Cc: Thomas Hellström 
> > > >>> Cc: Maarten Lankhorst 
> > > >>> Cc: Tvrtko Ursulin 
> > > >>> Cc: Jordan Justen 
> > > >>> Cc: Kenneth Graunke 
> > > >>> Cc: Jason Ekstrand 
> > > >>> Cc: Daniel Vetter 
> > > >>> Cc: Ramalingam C 
> > > >>> Reviewed-by: Tvrtko Ursulin 
> > > >>> Acked-by: Kenneth Graunke 
> > > >>> Reviewed-by: Jason Ekstrand 
> > > >>> ---
> > > >>>   drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 
> > > >>> -
> > > >>>   drivers/gpu/drm/i915/i915_getparam.c|  1 +
> > > >>>   include/uapi/drm/i915_drm.h | 20 ++
> > > >>>   3 files changed, 61 insertions(+), 1 deletion(-)
> > > >>>
> > > >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > > >>> b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > >>> index 56edfeff8c02..468a7a617fbf 100644
> > > >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > >>> @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> > > >>> i915_gem_userptr_ops = {
> > > >>>
> > > >>>   #endif
> > > >>>
> > > >>> +static int
> > > >>> +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long 
> > > >>> len)
> > > >>> +{
> > > >>> + const unsigned long end = addr + len;
> > > >>> + struct vm_area_struct *vma;
> > > >>> + int ret = -EFAULT;
> > > >>> +
> > > >>> + mmap_read_lock(mm);
> > > >>> + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > > >>> + /* Check for holes, note that we also update the addr 
> > > >>> below */
> > > >>> + if (vma->vm_start > addr)
> > >
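
The quoted patch is cut off above.  The overall shape of the walk, as a
sketch based on the description in this thread (hole detection plus
rejection of vmas not backed by struct pages) rather than the literal
committed code:

static int
probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
{
	const unsigned long end = addr + len;
	struct vm_area_struct *vma;
	int ret = -EFAULT;

	mmap_read_lock(mm);
	for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
		/* a gap in front of this vma means the range has a hole */
		if (vma->vm_start > addr)
			break;
		/* not backed by struct page (pfn or mixed mappings) */
		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
			break;
		if (vma->vm_end >= end) {
			ret = 0; /* the whole range is covered */
			break;
		}
		addr = vma->vm_end; /* advance past this vma */
	}
	mmap_read_unlock(mm);

	return ret;
}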

Re: [Intel-gfx] [PATCH] drm/i915/selftests: prefer the create_user helper

2021-07-28 Thread Jason Ekstrand

On July 28, 2021 10:57:23 Matthew Auld  wrote:


No need to hand roll the set_placements stuff, now that that we have a
helper for this. Also no need to handle the -ENODEV case here, since
NULL mr implies missing device support, where the for_each_memory_region
helper will always skip over such regions.

Signed-off-by: Matthew Auld 
Cc: Jason Ekstrand 


Reviewed-by: Jason Ekstrand 



---
.../drm/i915/gem/selftests/i915_gem_mman.c| 46 ++-
1 file changed, 4 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c

index 0b2b73d8a364..eed1c2c64e75 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -860,24 +860,6 @@ static bool can_mmap(struct drm_i915_gem_object *obj, 
enum i915_mmap_type type)

 return !no_map;
}

-static void object_set_placements(struct drm_i915_gem_object *obj,
-  struct intel_memory_region **placements,
-  unsigned int n_placements)
-{
- GEM_BUG_ON(!n_placements);
-
- if (n_placements == 1) {
- struct drm_i915_private *i915 = to_i915(obj->base.dev);
- struct intel_memory_region *mr = placements[0];
-
- obj->mm.placements = >mm.regions[mr->id];
- obj->mm.n_placements = 1;
- } else {
- obj->mm.placements = placements;
- obj->mm.n_placements = n_placements;
- }
-}
-
#define expand32(x) (((x) << 0) | ((x) << 8) | ((x) << 16) | ((x) << 24))
static int __igt_mmap(struct drm_i915_private *i915,
  struct drm_i915_gem_object *obj,
@@ -972,15 +954,10 @@ static int igt_mmap(void *arg)
 struct drm_i915_gem_object *obj;
 int err;

- obj = i915_gem_object_create_region(mr, sizes[i], 0, I915_BO_ALLOC_USER);
- if (obj == ERR_PTR(-ENODEV))
- continue;
-
+ obj = __i915_gem_object_create_user(i915, sizes[i], , 1);
 if (IS_ERR(obj))
 return PTR_ERR(obj);

- object_set_placements(obj, , 1);
-
 err = __igt_mmap(i915, obj, I915_MMAP_TYPE_GTT);
 if (err == 0)
 err = __igt_mmap(i915, obj, I915_MMAP_TYPE_WC);
@@ -1101,15 +1078,10 @@ static int igt_mmap_access(void *arg)
 struct drm_i915_gem_object *obj;
 int err;

- obj = i915_gem_object_create_region(mr, PAGE_SIZE, 0, I915_BO_ALLOC_USER);
- if (obj == ERR_PTR(-ENODEV))
- continue;
-
+ obj = __i915_gem_object_create_user(i915, PAGE_SIZE, , 1);
 if (IS_ERR(obj))
 return PTR_ERR(obj);

- object_set_placements(obj, , 1);
-
 err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_GTT);
 if (err == 0)
 err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_WB);
@@ -1248,15 +1220,10 @@ static int igt_mmap_gpu(void *arg)
 struct drm_i915_gem_object *obj;
 int err;

- obj = i915_gem_object_create_region(mr, PAGE_SIZE, 0, I915_BO_ALLOC_USER);
- if (obj == ERR_PTR(-ENODEV))
- continue;
-
+ obj = __i915_gem_object_create_user(i915, PAGE_SIZE, , 1);
 if (IS_ERR(obj))
 return PTR_ERR(obj);

- object_set_placements(obj, , 1);
-
 err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_GTT);
 if (err == 0)
 err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_WC);
@@ -1405,15 +1372,10 @@ static int igt_mmap_revoke(void *arg)
 struct drm_i915_gem_object *obj;
 int err;

- obj = i915_gem_object_create_region(mr, PAGE_SIZE, 0, I915_BO_ALLOC_USER);
- if (obj == ERR_PTR(-ENODEV))
- continue;
-
+ obj = __i915_gem_object_create_user(i915, PAGE_SIZE, , 1);
 if (IS_ERR(obj))
 return PTR_ERR(obj);

- object_set_placements(obj, , 1);
-
 err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_GTT);
 if (err == 0)
 err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_WC);
--
2.26.3




Re: [Intel-gfx] [PATCH v2 11/11] drm/i915: Extract i915_module.c

2021-07-27 Thread Jason Ekstrand
On Tue, Jul 27, 2021 at 9:44 AM Tvrtko Ursulin
 wrote:
>
>
> On 27/07/2021 13:10, Daniel Vetter wrote:
> > The module init code is somewhat misplaced in i915_pci.c, since it
> > needs to pull in init/exit functions from every part of the driver and
> > pollutes the include list a lot.
> >
> > Extract an i915_module.c file which pulls all the bits together, and
> > allows us to massively trim the include list of i915_pci.c.
> >
> > The downside is that we have to drop the error path check Jason added to
> > catch when we set up the pci driver too early. I think that risk is
> > acceptable for this pretty nice include.
>
> i915_module.c is an improvement and the rest for me is not extremely
> objectionable by the end of this incarnation, but I also do not see it
> as an improvement really.

It's not a big improvement to be sure, but I think there are a few
ways this is nicer:

 1. One less level of indirection to sort through.
 2. The init/exit table is generally simpler than the i915_global interface.
 3. It's easy to forget i915_global_register but forgetting to put an
_exit function in the module init table is a lot more obvious.

None of those are deal-breakers but they're kind-of nice.  Anyway,
this one is also

Reviewed-by: Jason Ekstrand 

--Jason

> There was a bug to fix relating to mock tests, but that is where the
> exercise should have stopped for now. After that it IMHO spiraled out of
> control, not least the unjustifiably expedited removal of cache
> shrinking. On balance for me it is too churny and boils down to two
> extremely capable people spending time on kind of really unimportant
> side fiddles. And I do not intend to prescribe you what to do, just
> expressing my bewilderment. FWIW... I can only state my opinion as it
> is, not that it matters a lot.
>
> Regards,
>
> Tvrtko
>
> > Cc: Jason Ekstrand 
> > Cc: Tvrtko Ursulin 
> > Signed-off-by: Daniel Vetter 
> > ---
> >   drivers/gpu/drm/i915/Makefile  |   1 +
> >   drivers/gpu/drm/i915/i915_module.c | 113 
> >   drivers/gpu/drm/i915/i915_pci.c| 117 +
> >   drivers/gpu/drm/i915/i915_pci.h|   8 ++
> >   4 files changed, 125 insertions(+), 114 deletions(-)
> >   create mode 100644 drivers/gpu/drm/i915/i915_module.c
> >   create mode 100644 drivers/gpu/drm/i915/i915_pci.h
> >
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 9022dc638ed6..4ebd9f417ddb 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -38,6 +38,7 @@ i915-y += i915_drv.o \
> > i915_irq.o \
> > i915_getparam.o \
> > i915_mitigations.o \
> > +   i915_module.o \
> > i915_params.o \
> > i915_pci.o \
> > i915_scatterlist.o \
> > diff --git a/drivers/gpu/drm/i915/i915_module.c 
> > b/drivers/gpu/drm/i915/i915_module.c
> > new file mode 100644
> > index ..c578ea8f56a0
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/i915_module.c
> > @@ -0,0 +1,113 @@
> > +/*
> > + * SPDX-License-Identifier: MIT
> > + *
> > + * Copyright © 2021 Intel Corporation
> > + */
> > +
> > +#include 
> > +
> > +#include "gem/i915_gem_context.h"
> > +#include "gem/i915_gem_object.h"
> > +#include "i915_active.h"
> > +#include "i915_buddy.h"
> > +#include "i915_params.h"
> > +#include "i915_pci.h"
> > +#include "i915_perf.h"
> > +#include "i915_request.h"
> > +#include "i915_scheduler.h"
> > +#include "i915_selftest.h"
> > +#include "i915_vma.h"
> > +
> > +static int i915_check_nomodeset(void)
> > +{
> > + bool use_kms = true;
> > +
> > + /*
> > +  * Enable KMS by default, unless explicitly overriden by
> > +  * either the i915.modeset prarameter or by the
> > +  * vga_text_mode_force boot option.
> > +  */
> > +
> > + if (i915_modparams.modeset == 0)
> > + use_kms = false;
> > +
> > + if (vgacon_text_force() && i915_modparams.modeset == -1)
> > + use_kms = false;
> > +
> > + if (!use_kms) {
> > + /* Silently fail loading to not upset userspace. */
> > + DRM_DEBUG_DRIVER("KMS disabled.\n");
> > + return 1;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct {
> > +   int 
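
The archived patch is truncated above.  The table it introduces follows
the same { init, exit } pairing visible in the i915_pci.c hunks quoted
elsewhere in this archive.  As a generic sketch -- subsys_a/b are
hypothetical names, not the literal i915_module.c contents -- the
pattern is:

static const struct {
	int (*init)(void);
	void (*exit)(void);
} init_funcs[] = {
	{ subsys_a_init, subsys_a_exit },
	{ subsys_b_init, subsys_b_exit },
};

static int __init my_module_init(void)
{
	int i, err;

	/* Run the init functions in order; on failure, unwind whatever
	 * already initialized by calling its exit hook, in reverse. */
	for (i = 0; i < ARRAY_SIZE(init_funcs); i++) {
		err = init_funcs[i].init();
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	while (i--)
		if (init_funcs[i].exit)
			init_funcs[i].exit();
	return err;
}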

Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 11:31 AM Tvrtko Ursulin
 wrote:
>
>
> On 26/07/2021 17:20, Jason Ekstrand wrote:
> > On Mon, Jul 26, 2021 at 11:08 AM Tvrtko Ursulin
> >  wrote:
> >> On 26/07/2021 16:42, Jason Ekstrand wrote:
> >>> On Mon, Jul 26, 2021 at 10:30 AM Jason Ekstrand  
> >>> wrote:
> >>>>
> >>>> On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin
> >>>>  wrote:
> >>>>>
> >>>>>
> >>>>> On 23/07/2021 20:29, Daniel Vetter wrote:
> >>>>>> With the global kmem_cache shrink infrastructure gone there's nothing
> >>>>>> special and we can convert them over.
> >>>>>>
> >>>>>> I'm doing this split up into each patch because there's quite a bit of
> >>>>>> noise with removing the static global.slab_ce to just a
> >>>>>> slab_ce.
> >>>>>>
> >>>>>> Cc: Jason Ekstrand 
> >>>>>> Signed-off-by: Daniel Vetter 
> >>>>>> ---
> >>>>>> drivers/gpu/drm/i915/gt/intel_context.c | 25 
> >>>>>> -
> >>>>>> drivers/gpu/drm/i915/gt/intel_context.h |  3 +++
> >>>>>> drivers/gpu/drm/i915/i915_globals.c |  2 --
> >>>>>> drivers/gpu/drm/i915/i915_globals.h |  1 -
> >>>>>> drivers/gpu/drm/i915/i915_pci.c |  2 ++
> >>>>>> 5 files changed, 13 insertions(+), 20 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> >>>>>> b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>>>> index baa05fddd690..283382549a6f 100644
> >>>>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> >>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>>>> @@ -7,7 +7,6 @@
> >>>>>> #include "gem/i915_gem_pm.h"
> >>>>>>
> >>>>>> #include "i915_drv.h"
> >>>>>> -#include "i915_globals.h"
> >>>>>> #include "i915_trace.h"
> >>>>>>
> >>>>>> #include "intel_context.h"
> >>>>>> @@ -15,14 +14,11 @@
> >>>>>> #include "intel_engine_pm.h"
> >>>>>> #include "intel_ring.h"
> >>>>>>
> >>>>>> -static struct i915_global_context {
> >>>>>> - struct i915_global base;
> >>>>>> - struct kmem_cache *slab_ce;
> >>>>>> -} global;
> >>>>>> +struct kmem_cache *slab_ce;
> >>>>
> >>>> Static?  With that,
> >>>>
> >>>> Reviewed-by: Jason Ekstrand 
> >>>>
> >>>>>>
> >>>>>> static struct intel_context *intel_context_alloc(void)
> >>>>>> {
> >>>>>> - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
> >>>>>> + return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
> >>>>>> }
> >>>>>>
> >>>>>> static void rcu_context_free(struct rcu_head *rcu)
> >>>>>> @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu)
> >>>>>> struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
> >>>>>>
> >>>>>> trace_intel_context_free(ce);
> >>>>>> - kmem_cache_free(global.slab_ce, ce);
> >>>>>> + kmem_cache_free(slab_ce, ce);
> >>>>>> }
> >>>>>>
> >>>>>> void intel_context_free(struct intel_context *ce)
> >>>>>> @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce)
> >>>>>> i915_active_fini(>active);
> >>>>>> }
> >>>>>>
> >>>>>> -static void i915_global_context_exit(void)
> >>>>>> +void i915_context_module_exit(void)
> >>>>>> {
> >>>>>> - kmem_cache_destroy(global.slab_ce);
> >>>>>> + kmem_cache_destroy(slab_ce);
> >>>>>> }
> >>>>>>
> >>>>>> -static struct i9

Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 11:08 AM Tvrtko Ursulin
 wrote:
> On 26/07/2021 16:42, Jason Ekstrand wrote:
> > On Mon, Jul 26, 2021 at 10:30 AM Jason Ekstrand  
> > wrote:
> >>
> >> On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin
> >>  wrote:
> >>>
> >>>
> >>> On 23/07/2021 20:29, Daniel Vetter wrote:
> >>>> With the global kmem_cache shrink infrastructure gone there's nothing
> >>>> special and we can convert them over.
> >>>>
> >>>> I'm doing this split up into each patch because there's quite a bit of
> >>>> noise with removing the static global.slab_ce to just a
> >>>> slab_ce.
> >>>>
> >>>> Cc: Jason Ekstrand 
> >>>> Signed-off-by: Daniel Vetter 
> >>>> ---
> >>>>drivers/gpu/drm/i915/gt/intel_context.c | 25 -
> >>>>drivers/gpu/drm/i915/gt/intel_context.h |  3 +++
> >>>>drivers/gpu/drm/i915/i915_globals.c |  2 --
> >>>>drivers/gpu/drm/i915/i915_globals.h |  1 -
> >>>>drivers/gpu/drm/i915/i915_pci.c |  2 ++
> >>>>5 files changed, 13 insertions(+), 20 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> >>>> b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> index baa05fddd690..283382549a6f 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> @@ -7,7 +7,6 @@
> >>>>#include "gem/i915_gem_pm.h"
> >>>>
> >>>>    #include "i915_drv.h"
> >>>> -#include "i915_globals.h"
> >>>>#include "i915_trace.h"
> >>>>
> >>>>#include "intel_context.h"
> >>>> @@ -15,14 +14,11 @@
> >>>>#include "intel_engine_pm.h"
> >>>>#include "intel_ring.h"
> >>>>
> >>>> -static struct i915_global_context {
> >>>> - struct i915_global base;
> >>>> - struct kmem_cache *slab_ce;
> >>>> -} global;
> >>>> +struct kmem_cache *slab_ce;
> >>
> >> Static?  With that,
> >>
> >> Reviewed-by: Jason Ekstrand 
> >>
> >>>>
> >>>>static struct intel_context *intel_context_alloc(void)
> >>>>{
> >>>> - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
> >>>> + return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
> >>>>}
> >>>>
> >>>>static void rcu_context_free(struct rcu_head *rcu)
> >>>> @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu)
> >>>>struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
> >>>>
> >>>>trace_intel_context_free(ce);
> >>>> - kmem_cache_free(global.slab_ce, ce);
> >>>> + kmem_cache_free(slab_ce, ce);
> >>>>}
> >>>>
> >>>>void intel_context_free(struct intel_context *ce)
> >>>> @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce)
> >>>>i915_active_fini(>active);
> >>>>}
> >>>>
> >>>> -static void i915_global_context_exit(void)
> >>>> +void i915_context_module_exit(void)
> >>>>{
> >>>> - kmem_cache_destroy(global.slab_ce);
> >>>> + kmem_cache_destroy(slab_ce);
> >>>>}
> >>>>
> >>>> -static struct i915_global_context global = { {
> >>>> - .exit = i915_global_context_exit,
> >>>> -} };
> >>>> -
> >>>> -int __init i915_global_context_init(void)
> >>>> +int __init i915_context_module_init(void)
> >>>>{
> >>>> - global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> >>>> - if (!global.slab_ce)
> >>>> + slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> >>>> + if (!slab_ce)
> >>>>return -ENOMEM;
> >>>>
> >>>> - i915_global_register();
> >>>>return 0;
> >>>>}
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel

Re: [Intel-gfx] [PATCH 10/10] drm/i915: Remove i915_globals

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> No longer used.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Jason Ekstrand 

But, also, tvrtko is right that dumping all that stuff in i915_pci.c
isn't great.  Mind typing a quick follow-on that moves i915_init/exit
to i915_drv.c?

--Jason

> ---
>  drivers/gpu/drm/i915/Makefile |  1 -
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c |  1 -
>  drivers/gpu/drm/i915/i915_globals.c   | 53 ---
>  drivers/gpu/drm/i915/i915_globals.h   | 25 -
>  drivers/gpu/drm/i915/i915_pci.c   |  2 -
>  5 files changed, 82 deletions(-)
>  delete mode 100644 drivers/gpu/drm/i915/i915_globals.c
>  delete mode 100644 drivers/gpu/drm/i915/i915_globals.h
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 10b3bb6207ba..9022dc638ed6 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -166,7 +166,6 @@ i915-y += \
>   i915_gem_gtt.o \
>   i915_gem_ww.o \
>   i915_gem.o \
> - i915_globals.o \
>   i915_query.o \
>   i915_request.o \
>   i915_scheduler.o \
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index d86825437516..943c1d416ec0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -6,7 +6,6 @@
>  #include 
>
>  #include "i915_drv.h"
> -#include "i915_globals.h"
>  #include "i915_params.h"
>  #include "intel_context.h"
>  #include "intel_engine_pm.h"
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> deleted file mode 100644
> index 04979789e7be..
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ /dev/null
> @@ -1,53 +0,0 @@
> -/*
> - * SPDX-License-Identifier: MIT
> - *
> - * Copyright © 2019 Intel Corporation
> - */
> -
> -#include 
> -#include 
> -
> -#include "i915_globals.h"
> -#include "i915_drv.h"
> -
> -static LIST_HEAD(globals);
> -
> -void __init i915_global_register(struct i915_global *global)
> -{
> -   GEM_BUG_ON(!global->exit);
> -
> -   list_add_tail(>link, );
> -}
> -
> -static void __i915_globals_cleanup(void)
> -{
> -   struct i915_global *global, *next;
> -
> -   list_for_each_entry_safe_reverse(global, next, , link)
> -   global->exit();
> -}
> -
> -static __initconst int (* const initfn[])(void) = {
> -};
> -
> -int __init i915_globals_init(void)
> -{
> -   int i;
> -
> -   for (i = 0; i < ARRAY_SIZE(initfn); i++) {
> -   int err;
> -
> -   err = initfn[i]();
> -   if (err) {
> -   __i915_globals_cleanup();
> -   return err;
> -   }
> -   }
> -
> -   return 0;
> -}
> -
> -void i915_globals_exit(void)
> -{
> -   __i915_globals_cleanup();
> -}
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> deleted file mode 100644
> index 57d2998bba45..
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ /dev/null
> @@ -1,25 +0,0 @@
> -/*
> - * SPDX-License-Identifier: MIT
> - *
> - * Copyright © 2019 Intel Corporation
> - */
> -
> -#ifndef _I915_GLOBALS_H_
> -#define _I915_GLOBALS_H_
> -
> -#include 
> -
> -typedef void (*i915_global_func_t)(void);
> -
> -struct i915_global {
> -   struct list_head link;
> -
> -   i915_global_func_t exit;
> -};
> -
> -void i915_global_register(struct i915_global *global);
> -
> -int i915_globals_init(void);
> -void i915_globals_exit(void);
> -
> -#endif /* _I915_GLOBALS_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 0affcf33a211..ed72bcb58331 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -37,7 +37,6 @@
>  #include "gem/i915_gem_object.h"
>  #include "i915_request.h"
>  #include "i915_perf.h"
> -#include "i915_globals.h"
>  #include "i915_selftest.h"
>  #include "i915_scheduler.h"
>  #include "i915_vma.h"
> @@ -1308,7 +1307,6 @@ static const struct {
> { i915_request_module_init, i915_request_module_exit },
> { i915_scheduler_module_init, i915_scheduler_module_exit },
> { i915_vma_module_init, i915_vma_module_exit },
> -   { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> { i915_pmu_init, i915_pmu_exit },
> { i915_register_pci_driver, i915_unregister_pci_driver },
> --
> 2.32.0
>


Re: [Intel-gfx] [PATCH 09/10] drm/i915: move vma slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_vmas to just a
> slab_vmas.
>
> We have to keep i915_drv.h include in i915_globals otherwise there's
> nothing anymore that pulls in GEM_BUG_ON.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_globals.c |  3 +--
>  drivers/gpu/drm/i915/i915_globals.h |  3 ---
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  drivers/gpu/drm/i915/i915_vma.c | 25 -
>  drivers/gpu/drm/i915/i915_vma.h |  3 +++
>  5 files changed, 14 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index 8923589057ab..04979789e7be 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -8,7 +8,7 @@
>  #include 
>
>  #include "i915_globals.h"
> -#include "i915_vma.h"
> +#include "i915_drv.h"
>
>  static LIST_HEAD(globals);
>
> @@ -28,7 +28,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_vma_init,
>  };
>
>  int __init i915_globals_init(void)
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> index 7a57bce1da05..57d2998bba45 100644
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -22,7 +22,4 @@ void i915_global_register(struct i915_global *global);
>  int i915_globals_init(void);
>  void i915_globals_exit(void);
>
> -/* constructors */
> -int i915_global_vma_init(void);
> -
>  #endif /* _I915_GLOBALS_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index a44318519977..0affcf33a211 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -40,6 +40,7 @@
>  #include "i915_globals.h"
>  #include "i915_selftest.h"
>  #include "i915_scheduler.h"
> +#include "i915_vma.h"
>
>  #define PLATFORM(x) .platform = (x)
>  #define GEN(x) \
> @@ -1306,6 +1307,7 @@ static const struct {
> { i915_objects_module_init, i915_objects_module_exit },
> { i915_request_module_init, i915_request_module_exit },
> { i915_scheduler_module_init, i915_scheduler_module_exit },
> +   { i915_vma_module_init, i915_vma_module_exit },
> { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> { i915_pmu_init, i915_pmu_exit },
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 09a7c47926f7..d094e2016b93 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -34,24 +34,20 @@
>  #include "gt/intel_gt_requests.h"
>
>  #include "i915_drv.h"
> -#include "i915_globals.h"
>  #include "i915_sw_fence_work.h"
>  #include "i915_trace.h"
>  #include "i915_vma.h"
>
> -static struct i915_global_vma {
> -   struct i915_global base;
> -   struct kmem_cache *slab_vmas;
> -} global;
> +struct kmem_cache *slab_vmas;

static.  With that,

Reviewed-by: Jason Ekstrand 

>
>  struct i915_vma *i915_vma_alloc(void)
>  {
> -   return kmem_cache_zalloc(global.slab_vmas, GFP_KERNEL);
> +   return kmem_cache_zalloc(slab_vmas, GFP_KERNEL);
>  }
>
>  void i915_vma_free(struct i915_vma *vma)
>  {
> -   return kmem_cache_free(global.slab_vmas, vma);
> +   return kmem_cache_free(slab_vmas, vma);
>  }
>
>  #if IS_ENABLED(CONFIG_DRM_I915_ERRLOG_GEM) && IS_ENABLED(CONFIG_DRM_DEBUG_MM)
> @@ -1414,21 +1410,16 @@ void i915_vma_make_purgeable(struct i915_vma *vma)
>  #include "selftests/i915_vma.c"
>  #endif
>
> -static void i915_global_vma_exit(void)
> +void i915_vma_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_vmas);
> +   kmem_cache_destroy(slab_vmas);
>  }
>
> -static struct i915_global_vma global = { {
> -   .exit = i915_global_vma_exit,
> -} };
> -
> -int __init i915_global_vma_init(void)
> +int __init i915_vma_module_init(void)
>  {
> -   global.slab_vmas = KMEM_CACHE(i915_vma, SLAB_HWCACHE_ALIGN);
> -   if (!global.slab_vmas)
> +   slab_vmas = KMEM_CACHE(i915_vma, SLAB_HWCACHE_ALIGN);
> +   if (!slab_vmas)
> return -ENOMEM;
>
> -   i91

Re: [Intel-gfx] [PATCH 08/10] drm/i915: move scheduler slabs to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_dependencies|priorities to just a
> slab_dependencies|priorities.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_globals.c   |  2 --
>  drivers/gpu/drm/i915/i915_globals.h   |  2 --
>  drivers/gpu/drm/i915/i915_pci.c   |  2 ++
>  drivers/gpu/drm/i915/i915_scheduler.c | 39 +++
>  drivers/gpu/drm/i915/i915_scheduler.h |  3 +++
>  5 files changed, 20 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index 8fffa8d93bc5..8923589057ab 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -8,7 +8,6 @@
>  #include 
>
>  #include "i915_globals.h"
> -#include "i915_scheduler.h"
>  #include "i915_vma.h"
>
>  static LIST_HEAD(globals);
> @@ -29,7 +28,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_scheduler_init,
> i915_global_vma_init,
>  };
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> index 9734740708f4..7a57bce1da05 100644
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -23,8 +23,6 @@ int i915_globals_init(void);
>  void i915_globals_exit(void);
>
>  /* constructors */
> -int i915_global_request_init(void);
> -int i915_global_scheduler_init(void);
>  int i915_global_vma_init(void);
>
>  #endif /* _I915_GLOBALS_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index bb2bd12fb8c2..a44318519977 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -39,6 +39,7 @@
>  #include "i915_perf.h"
>  #include "i915_globals.h"
>  #include "i915_selftest.h"
> +#include "i915_scheduler.h"
>
>  #define PLATFORM(x) .platform = (x)
>  #define GEN(x) \
> @@ -1304,6 +1305,7 @@ static const struct {
> { i915_gem_context_module_init, i915_gem_context_module_exit },
> { i915_objects_module_init, i915_objects_module_exit },
> { i915_request_module_init, i915_request_module_exit },
> +   { i915_scheduler_module_init, i915_scheduler_module_exit },
> { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> { i915_pmu_init, i915_pmu_exit },
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
> b/drivers/gpu/drm/i915/i915_scheduler.c
> index 561c649e59f7..02d90d239ff5 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -7,15 +7,11 @@
>  #include 
>
>  #include "i915_drv.h"
> -#include "i915_globals.h"
>  #include "i915_request.h"
>  #include "i915_scheduler.h"
>
> -static struct i915_global_scheduler {
> -   struct i915_global base;
> -   struct kmem_cache *slab_dependencies;
> -   struct kmem_cache *slab_priorities;
> -} global;
> +struct kmem_cache *slab_dependencies;

static

> +struct kmem_cache *slab_priorities;

static

>
>  static DEFINE_SPINLOCK(schedule_lock);
>
> @@ -93,7 +89,7 @@ i915_sched_lookup_priolist(struct i915_sched_engine 
> *sched_engine, int prio)
> if (prio == I915_PRIORITY_NORMAL) {
> p = _engine->default_priolist;
> } else {
> -   p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
> +   p = kmem_cache_alloc(slab_priorities, GFP_ATOMIC);
> /* Convert an allocation failure to a priority bump */
> if (unlikely(!p)) {
> prio = I915_PRIORITY_NORMAL; /* recurses just once */
> @@ -122,7 +118,7 @@ i915_sched_lookup_priolist(struct i915_sched_engine 
> *sched_engine, int prio)
>
>  void __i915_priolist_free(struct i915_priolist *p)
>  {
> -   kmem_cache_free(global.slab_priorities, p);
> +   kmem_cache_free(slab_priorities, p);
>  }
>
>  struct sched_cache {
> @@ -313,13 +309,13 @@ void i915_sched_node_reinit(struct i915_sched_node 
> *node)
>  static struct i915_dependency *
>  i915_dependency_alloc(void)
>  {
> -   return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL);
> +   return kmem_cache_alloc(slab_dependenc

Re: [Intel-gfx] [PATCH 07/10] drm/i915: move request slabs to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_requests|execute_cbs to just a
> slab_requests|execute_cbs.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_globals.c |  2 --
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  drivers/gpu/drm/i915/i915_request.c | 47 -
>  drivers/gpu/drm/i915/i915_request.h |  3 ++
>  4 files changed, 24 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index 40a592fbc3e0..8fffa8d93bc5 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -8,7 +8,6 @@
>  #include 
>
>  #include "i915_globals.h"
> -#include "i915_request.h"
>  #include "i915_scheduler.h"
>  #include "i915_vma.h"
>
> @@ -30,7 +29,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_request_init,
> i915_global_scheduler_init,
> i915_global_vma_init,
>  };
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 2334eb3e9abb..bb2bd12fb8c2 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -35,6 +35,7 @@
>  #include "i915_drv.h"
>  #include "gem/i915_gem_context.h"
>  #include "gem/i915_gem_object.h"
> +#include "i915_request.h"
>  #include "i915_perf.h"
>  #include "i915_globals.h"
>  #include "i915_selftest.h"
> @@ -1302,6 +1303,7 @@ static const struct {
> { i915_context_module_init, i915_context_module_exit },
> { i915_gem_context_module_init, i915_gem_context_module_exit },
> { i915_objects_module_init, i915_objects_module_exit },
> +   { i915_request_module_init, i915_request_module_exit },
> { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> { i915_pmu_init, i915_pmu_exit },
> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> b/drivers/gpu/drm/i915/i915_request.c
> index 6594cb2f8ebd..69152369ea00 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -42,7 +42,6 @@
>
>  #include "i915_active.h"
>  #include "i915_drv.h"
> -#include "i915_globals.h"
>  #include "i915_trace.h"
>  #include "intel_pm.h"
>
> @@ -52,11 +51,8 @@ struct execute_cb {
> struct i915_request *signal;
>  };
>
> -static struct i915_global_request {
> -   struct i915_global base;
> -   struct kmem_cache *slab_requests;
> -   struct kmem_cache *slab_execute_cbs;
> -} global;
> +struct kmem_cache *slab_requests;

static

> +struct kmem_cache *slab_execute_cbs;

static

Am I tired of typing this?  Yes, I am!  Will I keep typing it?  Probably. :-P

>
>  static const char *i915_fence_get_driver_name(struct dma_fence *fence)
>  {
> @@ -107,7 +103,7 @@ static signed long i915_fence_wait(struct dma_fence 
> *fence,
>
>  struct kmem_cache *i915_request_slab_cache(void)
>  {
> -   return global.slab_requests;
> +   return slab_requests;
>  }
>
>  static void i915_fence_release(struct dma_fence *fence)
> @@ -159,7 +155,7 @@ static void i915_fence_release(struct dma_fence *fence)
> !cmpxchg(>engine->request_pool, NULL, rq))
> return;
>
> -   kmem_cache_free(global.slab_requests, rq);
> +   kmem_cache_free(slab_requests, rq);
>  }
>
>  const struct dma_fence_ops i915_fence_ops = {
> @@ -176,7 +172,7 @@ static void irq_execute_cb(struct irq_work *wrk)
> struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
>
> i915_sw_fence_complete(cb->fence);
> -   kmem_cache_free(global.slab_execute_cbs, cb);
> +   kmem_cache_free(slab_execute_cbs, cb);
>  }
>
>  static __always_inline void
> @@ -514,7 +510,7 @@ __await_execution(struct i915_request *rq,
> if (i915_request_is_active(signal))
> return 0;
>
> -   cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
> +   cb = kmem_cache_alloc(slab_execute_cbs, gfp);
> if (!cb)
> return -ENOMEM;
>
> @@ -868,7 +864,7 @@ request_alloc_slow(struct intel_timeline *tl,
> rq = list_first_entry(&tl->requests, typeof(*rq), 

Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 10:30 AM Jason Ekstrand  wrote:
>
> On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin
>  wrote:
> >
> >
> > On 23/07/2021 20:29, Daniel Vetter wrote:
> > > With the global kmem_cache shrink infrastructure gone there's nothing
> > > special and we can convert them over.
> > >
> > > I'm doing this split up into each patch because there's quite a bit of
> > > noise with removing the static global.slab_ce to just a
> > > slab_ce.
> > >
> > > Cc: Jason Ekstrand 
> > > Signed-off-by: Daniel Vetter 
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_context.c | 25 -
> > >   drivers/gpu/drm/i915/gt/intel_context.h |  3 +++
> > >   drivers/gpu/drm/i915/i915_globals.c |  2 --
> > >   drivers/gpu/drm/i915/i915_globals.h |  1 -
> > >   drivers/gpu/drm/i915/i915_pci.c |  2 ++
> > >   5 files changed, 13 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > > b/drivers/gpu/drm/i915/gt/intel_context.c
> > > index baa05fddd690..283382549a6f 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > @@ -7,7 +7,6 @@
> > >   #include "gem/i915_gem_pm.h"
> > >
> > >   #include "i915_drv.h"
> > > -#include "i915_globals.h"
> > >   #include "i915_trace.h"
> > >
> > >   #include "intel_context.h"
> > > @@ -15,14 +14,11 @@
> > >   #include "intel_engine_pm.h"
> > >   #include "intel_ring.h"
> > >
> > > -static struct i915_global_context {
> > > - struct i915_global base;
> > > - struct kmem_cache *slab_ce;
> > > -} global;
> > > +struct kmem_cache *slab_ce;
>
> Static?  With that,
>
> Reviewed-by: Jason Ekstrand 
>
> > >
> > >   static struct intel_context *intel_context_alloc(void)
> > >   {
> > > - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
> > > + return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
> > >   }
> > >
> > >   static void rcu_context_free(struct rcu_head *rcu)
> > > @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu)
> > >   struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
> > >
> > >   trace_intel_context_free(ce);
> > > - kmem_cache_free(global.slab_ce, ce);
> > > + kmem_cache_free(slab_ce, ce);
> > >   }
> > >
> > >   void intel_context_free(struct intel_context *ce)
> > > @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce)
> > >   i915_active_fini(&ce->active);
> > >   }
> > >
> > > -static void i915_global_context_exit(void)
> > > +void i915_context_module_exit(void)
> > >   {
> > > - kmem_cache_destroy(global.slab_ce);
> > > + kmem_cache_destroy(slab_ce);
> > >   }
> > >
> > > -static struct i915_global_context global = { {
> > > - .exit = i915_global_context_exit,
> > > -} };
> > > -
> > > -int __init i915_global_context_init(void)
> > > +int __init i915_context_module_init(void)
> > >   {
> > > - global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> > > - if (!global.slab_ce)
> > > + slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> > > + if (!slab_ce)
> > >   return -ENOMEM;
> > >
> > > - i915_global_register(&global.base);
> > >   return 0;
> > >   }
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > > b/drivers/gpu/drm/i915/gt/intel_context.h
> > > index 974ef85320c2..a0ca82e3c40d 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > > @@ -30,6 +30,9 @@ void intel_context_init(struct intel_context *ce,
> > >   struct intel_engine_cs *engine);
> > >   void intel_context_fini(struct intel_context *ce);
> > >
> > > +void i915_context_module_exit(void);
> > > +int i915_context_module_init(void);
> > > +
> > >   struct intel_context *
> > >   intel_context_create(struct intel_engine_cs *engine);
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> > > b/drivers/gpu/drm/i915/i915_g

Re: [Intel-gfx] [PATCH 06/10] drm/i915: move gem_objects slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_objects to just a
> slab_objects.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 26 +++---
>  drivers/gpu/drm/i915/gem/i915_gem_object.h |  3 +++
>  drivers/gpu/drm/i915/i915_globals.c|  1 -
>  drivers/gpu/drm/i915/i915_globals.h|  1 -
>  drivers/gpu/drm/i915/i915_pci.c|  1 +
>  5 files changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 5c21cff33199..53156250d283 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -30,14 +30,10 @@
>  #include "i915_gem_context.h"
>  #include "i915_gem_mman.h"
>  #include "i915_gem_object.h"
> -#include "i915_globals.h"
>  #include "i915_memcpy.h"
>  #include "i915_trace.h"
>
> -static struct i915_global_object {
> -   struct i915_global base;
> -   struct kmem_cache *slab_objects;
> -} global;
> +struct kmem_cache *slab_objects;

static

With that,

Reviewed-by: Jason Ekstrand 

>  static const struct drm_gem_object_funcs i915_gem_object_funcs;
>
> @@ -45,7 +41,7 @@ struct drm_i915_gem_object *i915_gem_object_alloc(void)
>  {
> struct drm_i915_gem_object *obj;
>
> -   obj = kmem_cache_zalloc(global.slab_objects, GFP_KERNEL);
> +   obj = kmem_cache_zalloc(slab_objects, GFP_KERNEL);
> if (!obj)
> return NULL;
> obj->base.funcs = &i915_gem_object_funcs;
> @@ -55,7 +51,7 @@ struct drm_i915_gem_object *i915_gem_object_alloc(void)
>
>  void i915_gem_object_free(struct drm_i915_gem_object *obj)
>  {
> -   return kmem_cache_free(global.slab_objects, obj);
> +   return kmem_cache_free(slab_objects, obj);
>  }
>
>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
> @@ -664,23 +660,17 @@ void i915_gem_init__objects(struct drm_i915_private 
> *i915)
> INIT_WORK(>mm.free_work, __i915_gem_free_work);
>  }
>
> -static void i915_global_objects_exit(void)
> +void i915_objects_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_objects);
> +   kmem_cache_destroy(slab_objects);
>  }
>
> -static struct i915_global_object global = { {
> -   .exit = i915_global_objects_exit,
> -} };
> -
> -int __init i915_global_objects_init(void)
> +int __init i915_objects_module_init(void)
>  {
> -   global.slab_objects =
> -   KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN);
> -   if (!global.slab_objects)
> +   slab_objects = KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN);
> +   if (!slab_objects)
> return -ENOMEM;
>
> -   i915_global_register(&global.base);
> return 0;
>  }
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index f3ede43282dc..6d8ea62a372f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -48,6 +48,9 @@ static inline bool i915_gem_object_size_2big(u64 size)
>
>  void i915_gem_init__objects(struct drm_i915_private *i915);
>
> +void i915_objects_module_exit(void);
> +int i915_objects_module_init(void);
> +
>  struct drm_i915_gem_object *i915_gem_object_alloc(void);
>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index dbb3d81eeea7..40a592fbc3e0 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -30,7 +30,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_objects_init,
> i915_global_request_init,
> i915_global_scheduler_init,
> i915_global_vma_init,
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> index f16752dbbdbf..9734740708f4 100644
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -23,7 +23,6 @@ int i915_globals_init(void);
>  void i915_globals_exit(void);
>
>  /* constructors */
> -int i915_global_objects_init(void);
>  int i915_global_request_init(void);
>  in

Re: [Intel-gfx] [PATCH 05/10] drm/i915: move gem_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_luts to just a
> slab_luts.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 25 +++--
>  drivers/gpu/drm/i915/gem/i915_gem_context.h |  3 +++
>  drivers/gpu/drm/i915/i915_globals.c |  2 --
>  drivers/gpu/drm/i915/i915_globals.h |  1 -
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  5 files changed, 13 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 89ca401bf9ae..c17c28af1e57 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -79,25 +79,21 @@
>  #include "gt/intel_ring.h"
>
>  #include "i915_gem_context.h"
> -#include "i915_globals.h"
>  #include "i915_trace.h"
>  #include "i915_user_extensions.h"
>
>  #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
>
> -static struct i915_global_gem_context {
> -   struct i915_global base;
> -   struct kmem_cache *slab_luts;
> -} global;
> +struct kmem_cache *slab_luts;

static.

With that,

Reviewed-by: Jason Ekstrand 

>  struct i915_lut_handle *i915_lut_handle_alloc(void)
>  {
> -   return kmem_cache_alloc(global.slab_luts, GFP_KERNEL);
> +   return kmem_cache_alloc(slab_luts, GFP_KERNEL);
>  }
>
>  void i915_lut_handle_free(struct i915_lut_handle *lut)
>  {
> -   return kmem_cache_free(global.slab_luts, lut);
> +   return kmem_cache_free(slab_luts, lut);
>  }
>
>  static void lut_close(struct i915_gem_context *ctx)
> @@ -2282,21 +2278,16 @@ i915_gem_engines_iter_next(struct 
> i915_gem_engines_iter *it)
>  #include "selftests/i915_gem_context.c"
>  #endif
>
> -static void i915_global_gem_context_exit(void)
> +void i915_gem_context_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_luts);
> +   kmem_cache_destroy(slab_luts);
>  }
>
> -static struct i915_global_gem_context global = { {
> -   .exit = i915_global_gem_context_exit,
> -} };
> -
> -int __init i915_global_gem_context_init(void)
> +int __init i915_gem_context_module_init(void)
>  {
> -   global.slab_luts = KMEM_CACHE(i915_lut_handle, 0);
> -   if (!global.slab_luts)
> +   slab_luts = KMEM_CACHE(i915_lut_handle, 0);
> +   if (!slab_luts)
> return -ENOMEM;
>
> -   i915_global_register(&global.base);
> return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> index 20411db84914..18060536b0c2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> @@ -224,6 +224,9 @@ i915_gem_engines_iter_next(struct i915_gem_engines_iter 
> *it);
> for (i915_gem_engines_iter_init(&(it), (engines)); \
>  ((ce) = i915_gem_engines_iter_next(&(it)));)
>
> +void i915_gem_context_module_exit(void);
> +int i915_gem_context_module_init(void);
> +
>  struct i915_lut_handle *i915_lut_handle_alloc(void);
>  void i915_lut_handle_free(struct i915_lut_handle *lut);
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index d36eb7dc40aa..dbb3d81eeea7 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -7,7 +7,6 @@
>  #include 
>  #include 
>
> -#include "gem/i915_gem_object.h"
>  #include "i915_globals.h"
>  #include "i915_request.h"
>  #include "i915_scheduler.h"
> @@ -31,7 +30,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_gem_context_init,
> i915_global_objects_init,
> i915_global_request_init,
> i915_global_scheduler_init,
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> index 60daa738a188..f16752dbbdbf 100644
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -23,7 +23,6 @@ int i915_globals_init(void);
>  void i915_globals_exit(void);
>
>  /* constructors */
> -int i915_global_gem_context_init(void);
>  int i915_global_objects_init(void);
>  int i915_global_request_init(void);
>  int i915_glob

Re: [Intel-gfx] [PATCH 0/8] drm/i915: Migrate memory to SMEM when imported cross-device (v8)

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 10:29 AM Matthew Auld
 wrote:
>
> On Mon, 26 Jul 2021 at 16:11, Jason Ekstrand  wrote:
> >
> > On Mon, Jul 26, 2021 at 3:12 AM Matthew Auld
> >  wrote:
> > >
> > > On Fri, 23 Jul 2021 at 18:21, Jason Ekstrand  wrote:
> > > >
> > > > This patch series fixes an issue with discrete graphics on Intel where 
> > > > we
> > > > allowed dma-buf import while leaving the object in local memory.  This
> > > > breaks down pretty badly if the import happened on a different physical
> > > > device.
> > > >
> > > > v7:
> > > >  - Drop "drm/i915/gem/ttm: Place new BOs in the requested region"
> > > >  - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in 
> > > > i915_gem_dumb_create()"
> > > >  - Misc. review feedback from Matthew Auld
> > > > v8:
> > > >  - Misc. review feedback from Matthew Auld
> > > > v9:
> > > >  - Replace the i915/ttm patch with two that are hopefully more correct
> > > >
> > > > Jason Ekstrand (6):
> > > >   drm/i915/gem: Check object_can_migrate from object_migrate
> > > >   drm/i915/gem: Refactor placement setup for i915_gem_object_create*
> > > > (v2)
> > > >   drm/i915/gem: Call i915_gem_flush_free_objects() in
> > > > i915_gem_dumb_create()
> > > >   drm/i915/gem: Unify user object creation (v3)
> > > >   drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed
> > > >   drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails
> > > >
> > > > Thomas Hellström (2):
> > > >   drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
> > > >   drm/i915/gem: Migrate to system at dma-buf attach time (v7)
> > >
> > > Should I push the series?
> >
> > Yes, please.  Do we have a solid testing plan for things like this
> > that touch discrete?  I tested with mesa+glxgears on my DG1 but
> > haven't run anything more stressful.
>
> I think all we really have are the migration related selftests, and CI
> is not even running them on DG1 due to other breakage. Assuming you
> ran these locally, I think we just merge the series?

Works for me.  Yes, I ran them on my TGL+DG1 box.  I've also tested
both GL and Vulkan PRIME support with the client running on DG1 and
the compositor running on TGL with this series and everything works
smoothly.

--Jason


> >
> > --Jason
> >
> >
> > > >
> > > >  drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 
> > > >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  58 --
> > > >  drivers/gpu/drm/i915/gem/i915_gem_object.c|  20 +-
> > > >  drivers/gpu/drm/i915/gem/i915_gem_object.h|   4 +
> > > >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  13 +-
> > > >  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 190 +-
> > > >  .../drm/i915/gem/selftests/i915_gem_migrate.c |  15 --
> > > >  7 files changed, 341 insertions(+), 136 deletions(-)
> > > >
> > > > --
> > > > 2.31.1
> > > >


Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin
 wrote:
>
>
> On 23/07/2021 20:29, Daniel Vetter wrote:
> > With the global kmem_cache shrink infrastructure gone there's nothing
> > special and we can convert them over.
> >
> > I'm doing this split up into each patch because there's quite a bit of
> > noise with removing the static global.slab_ce to just a
> > slab_ce.
> >
> > Cc: Jason Ekstrand 
> > Signed-off-by: Daniel Vetter 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c | 25 -
> >   drivers/gpu/drm/i915/gt/intel_context.h |  3 +++
> >   drivers/gpu/drm/i915/i915_globals.c |  2 --
> >   drivers/gpu/drm/i915/i915_globals.h |  1 -
> >   drivers/gpu/drm/i915/i915_pci.c |  2 ++
> >   5 files changed, 13 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index baa05fddd690..283382549a6f 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -7,7 +7,6 @@
> >   #include "gem/i915_gem_pm.h"
> >
> >   #include "i915_drv.h"
> > -#include "i915_globals.h"
> >   #include "i915_trace.h"
> >
> >   #include "intel_context.h"
> > @@ -15,14 +14,11 @@
> >   #include "intel_engine_pm.h"
> >   #include "intel_ring.h"
> >
> > -static struct i915_global_context {
> > - struct i915_global base;
> > - struct kmem_cache *slab_ce;
> > -} global;
> > +struct kmem_cache *slab_ce;

Static?  With that,

Reviewed-by: Jason Ekstrand 

> >
> >   static struct intel_context *intel_context_alloc(void)
> >   {
> > - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
> > + return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
> >   }
> >
> >   static void rcu_context_free(struct rcu_head *rcu)
> > @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu)
> >   struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
> >
> >   trace_intel_context_free(ce);
> > - kmem_cache_free(global.slab_ce, ce);
> > + kmem_cache_free(slab_ce, ce);
> >   }
> >
> >   void intel_context_free(struct intel_context *ce)
> > @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce)
> >   i915_active_fini(&ce->active);
> >   }
> >
> > -static void i915_global_context_exit(void)
> > +void i915_context_module_exit(void)
> >   {
> > - kmem_cache_destroy(global.slab_ce);
> > + kmem_cache_destroy(slab_ce);
> >   }
> >
> > -static struct i915_global_context global = { {
> > - .exit = i915_global_context_exit,
> > -} };
> > -
> > -int __init i915_global_context_init(void)
> > +int __init i915_context_module_init(void)
> >   {
> > - global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> > - if (!global.slab_ce)
> > + slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> > + if (!slab_ce)
> >   return -ENOMEM;
> >
> > - i915_global_register(&global.base);
> >   return 0;
> >   }
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > b/drivers/gpu/drm/i915/gt/intel_context.h
> > index 974ef85320c2..a0ca82e3c40d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -30,6 +30,9 @@ void intel_context_init(struct intel_context *ce,
> >   struct intel_engine_cs *engine);
> >   void intel_context_fini(struct intel_context *ce);
> >
> > +void i915_context_module_exit(void);
> > +int i915_context_module_init(void);
> > +
> >   struct intel_context *
> >   intel_context_create(struct intel_engine_cs *engine);
> >
> > diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> > b/drivers/gpu/drm/i915/i915_globals.c
> > index 3de7cf22ec76..d36eb7dc40aa 100644
> > --- a/drivers/gpu/drm/i915/i915_globals.c
> > +++ b/drivers/gpu/drm/i915/i915_globals.c
> > @@ -7,7 +7,6 @@
> >   #include 
> >   #include 
> >
> > -#include "gem/i915_gem_context.h"
> >   #include "gem/i915_gem_object.h"
> >   #include "i915_globals.h"
> >   #include "i915_request.h"
> > @@ -32,7 +31,6 @@ static void __i915_globals_cleanup(void)
> >   }
> >
> >   static __initconst int (* const initfn[])(void) = {
> > -

Re: [Intel-gfx] [PATCH 03/10] drm/i915: move i915_buddy slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_blocks to just a
> slab_blocks.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_buddy.c   | 25 -
>  drivers/gpu/drm/i915/i915_buddy.h   |  3 ++-
>  drivers/gpu/drm/i915/i915_globals.c |  2 --
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  4 files changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_buddy.c 
> b/drivers/gpu/drm/i915/i915_buddy.c
> index caabcaea3be7..045d00c43b4c 100644
> --- a/drivers/gpu/drm/i915/i915_buddy.c
> +++ b/drivers/gpu/drm/i915/i915_buddy.c
> @@ -8,13 +8,9 @@
>  #include "i915_buddy.h"
>
>  #include "i915_gem.h"
> -#include "i915_globals.h"
>  #include "i915_utils.h"
>
> -static struct i915_global_buddy {
> -   struct i915_global base;
> -   struct kmem_cache *slab_blocks;
> -} global;
> +struct kmem_cache *slab_blocks;

static?  With that fixed,

Reviewed-by: Jason Ekstrand 

>
>  static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
>  struct i915_buddy_block 
> *parent,
> @@ -25,7 +21,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
> i915_buddy_mm *mm,
>
> GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER);
>
> -   block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL);
> +   block = kmem_cache_zalloc(slab_blocks, GFP_KERNEL);
> if (!block)
> return NULL;
>
> @@ -40,7 +36,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
> i915_buddy_mm *mm,
>  static void i915_block_free(struct i915_buddy_mm *mm,
> struct i915_buddy_block *block)
>  {
> -   kmem_cache_free(global.slab_blocks, block);
> +   kmem_cache_free(slab_blocks, block);
>  }
>
>  static void mark_allocated(struct i915_buddy_block *block)
> @@ -410,21 +406,16 @@ int i915_buddy_alloc_range(struct i915_buddy_mm *mm,
>  #include "selftests/i915_buddy.c"
>  #endif
>
> -static void i915_global_buddy_exit(void)
> +void i915_buddy_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_blocks);
> +   kmem_cache_destroy(slab_blocks);
>  }
>
> -static struct i915_global_buddy global = { {
> -   .exit = i915_global_buddy_exit,
> -} };
> -
> -int __init i915_global_buddy_init(void)
> +int __init i915_buddy_module_init(void)
>  {
> -   global.slab_blocks = KMEM_CACHE(i915_buddy_block, 0);
> -   if (!global.slab_blocks)
> +   slab_blocks = KMEM_CACHE(i915_buddy_block, 0);
> +   if (!slab_blocks)
> return -ENOMEM;
>
> -   i915_global_register(&global.base);
> return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_buddy.h 
> b/drivers/gpu/drm/i915/i915_buddy.h
> index d8f26706de52..3940d632f208 100644
> --- a/drivers/gpu/drm/i915/i915_buddy.h
> +++ b/drivers/gpu/drm/i915/i915_buddy.h
> @@ -129,6 +129,7 @@ void i915_buddy_free(struct i915_buddy_mm *mm, struct 
> i915_buddy_block *block);
>
>  void i915_buddy_free_list(struct i915_buddy_mm *mm, struct list_head 
> *objects);
>
> -int i915_global_buddy_init(void);
> +void i915_buddy_module_exit(void);
> +int i915_buddy_module_init(void);
>
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index a53135ee831d..3de7cf22ec76 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -7,7 +7,6 @@
>  #include 
>  #include 
>
> -#include "i915_buddy.h"
>  #include "gem/i915_gem_context.h"
>  #include "gem/i915_gem_object.h"
>  #include "i915_globals.h"
> @@ -33,7 +32,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_buddy_init,
> i915_global_context_init,
> i915_global_gem_context_init,
> i915_global_objects_init,
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 6ee77a8f43d6..f9527269e30a 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -31,6 +31,7 @@
>  #include "display/intel_fbdev.h"
>
>  #include "i915_active.h"
> +#include "i915_buddy.h"
>  #include "i915_drv.h"
>  #include "i9

Re: [Intel-gfx] [PATCH 02/10] drm/i915: move i915_active slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_cache to just a slab_cache.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_active.c  | 31 ++---
>  drivers/gpu/drm/i915/i915_active.h  |  3 +++
>  drivers/gpu/drm/i915/i915_globals.c |  2 --
>  drivers/gpu/drm/i915/i915_globals.h |  1 -
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  5 files changed, 16 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_active.c 
> b/drivers/gpu/drm/i915/i915_active.c
> index 91723123ae9f..9ffeb77eb5bb 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -13,7 +13,6 @@
>
>  #include "i915_drv.h"
>  #include "i915_active.h"
> -#include "i915_globals.h"
>
>  /*
>   * Active refs memory management
> @@ -22,10 +21,7 @@
>   * they idle (when we know the active requests are inactive) and allocate the
>   * nodes from a local slab cache to hopefully reduce the fragmentation.
>   */
> -static struct i915_global_active {
> -   struct i915_global base;
> -   struct kmem_cache *slab_cache;
> -} global;
> +struct kmem_cache *slab_cache;

static?  Or were you planning to expose it somehow?  With that fixed,

Reviewed-by: Jason Ekstrand 

>
>  struct active_node {
> struct rb_node node;
> @@ -174,7 +170,7 @@ __active_retire(struct i915_active *ref)
> /* Finally free the discarded timeline tree  */
> rbtree_postorder_for_each_entry_safe(it, n, &root, node) {
> GEM_BUG_ON(i915_active_fence_isset(&it->base));
> -   kmem_cache_free(global.slab_cache, it);
> +   kmem_cache_free(slab_cache, it);
> }
>  }
>
> @@ -322,7 +318,7 @@ active_instance(struct i915_active *ref, u64 idx)
>  * XXX: We should preallocate this before i915_active_ref() is ever
>  *  called, but we cannot call into fs_reclaim() anyway, so use 
> GFP_ATOMIC.
>  */
> -   node = kmem_cache_alloc(global.slab_cache, GFP_ATOMIC);
> +   node = kmem_cache_alloc(slab_cache, GFP_ATOMIC);
> if (!node)
> goto out;
>
> @@ -788,7 +784,7 @@ void i915_active_fini(struct i915_active *ref)
> mutex_destroy(&ref->mutex);
>
> if (ref->cache)
> -   kmem_cache_free(global.slab_cache, ref->cache);
> +   kmem_cache_free(slab_cache, ref->cache);
>  }
>
>  static inline bool is_idle_barrier(struct active_node *node, u64 idx)
> @@ -908,7 +904,7 @@ int i915_active_acquire_preallocate_barrier(struct 
> i915_active *ref,
> node = reuse_idle_barrier(ref, idx);
> rcu_read_unlock();
> if (!node) {
> -   node = kmem_cache_alloc(global.slab_cache, 
> GFP_KERNEL);
> +   node = kmem_cache_alloc(slab_cache, GFP_KERNEL);
> if (!node)
> goto unwind;
>
> @@ -956,7 +952,7 @@ int i915_active_acquire_preallocate_barrier(struct 
> i915_active *ref,
> atomic_dec(&ref->count);
> intel_engine_pm_put(barrier_to_engine(node));
>
> -   kmem_cache_free(global.slab_cache, node);
> +   kmem_cache_free(slab_cache, node);
> }
> return -ENOMEM;
>  }
> @@ -1176,21 +1172,16 @@ struct i915_active *i915_active_create(void)
>  #include "selftests/i915_active.c"
>  #endif
>
> -static void i915_global_active_exit(void)
> +void i915_active_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_cache);
> +   kmem_cache_destroy(slab_cache);
>  }
>
> -static struct i915_global_active global = { {
> -   .exit = i915_global_active_exit,
> -} };
> -
> -int __init i915_global_active_init(void)
> +int __init i915_active_module_init(void)
>  {
> -   global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
> -   if (!global.slab_cache)
> +   slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
> +   if (!slab_cache)
> return -ENOMEM;
>
> -   i915_global_register(&global.base);
> return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_active.h 
> b/drivers/gpu/drm/i915/i915_active.h
> index d0feda68b874..5fcdb0e2bc9e 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -247,4 +247,

Re: [Intel-gfx] [PATCH 01/10] drm/i915: Check for nomodeset in i915_init() first

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> When modesetting (aka the full pci driver, which has nothing to do
> with the disable_display option, which just gives you the full pci driver
> without the display driver) is disabled, we load nothing and do
> nothing.
>
> So move that check first, for a bit of orderliness. With Jason's
> module init/exit table this now becomes trivial.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Jason Ekstrand 

> ---
>  drivers/gpu/drm/i915/i915_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 48ea23dd3b5b..0deaeeba2347 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -1292,9 +1292,9 @@ static const struct {
> int (*init)(void);
> void (*exit)(void);
>  } init_funcs[] = {
> +   { i915_check_nomodeset, NULL },
> { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> -   { i915_check_nomodeset, NULL },
> { i915_pmu_init, i915_pmu_exit },
> { i915_register_pci_driver, i915_unregister_pci_driver },
> { i915_perf_sysctl_register, i915_perf_sysctl_unregister },
> --
> 2.32.0
>
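
For context, the init_funcs[] table above is consumed by the module
entry point with a loop roughly like this sketch (simplified; in the
real driver a positive return value, e.g. from i915_check_nomodeset()
or i915_mock_selftests(), means "stop loading early but report
success", which is exactly why the ordering fixed here matters):

    static int __init i915_init(void)
    {
        int err, i;

        for (i = 0; i < ARRAY_SIZE(init_funcs); i++) {
            err = init_funcs[i].init();
            if (err < 0) {
                /* Hard failure: unwind the steps that succeeded. */
                while (i--) {
                    if (init_funcs[i].exit)
                        init_funcs[i].exit();
                }
                return err;
            } else if (err > 0) {
                /* e.g. nomodeset: skip the rest, module still loads. */
                break;
            }
        }
        return 0;
    }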


Re: [Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 3:31 AM Maarten Lankhorst
 wrote:
>
> Op 23-07-2021 om 13:34 schreef Matthew Auld:
> > From: Chris Wilson 
> >
> > Jason Ekstrand requested a more efficient method than userptr+set-domain
> > to determine if the userptr object was backed by a complete set of pages
> > upon creation. To be more efficient than simply populating the userptr
> > using get_user_pages() (as done by the call to set-domain or execbuf),
> > we can walk the tree of vm_area_struct and check for gaps or vma not
> > backed by struct page (VM_PFNMAP). The question is how to handle
> > VM_MIXEDMAP which may be either struct page or pfn backed...
> >
> > With discrete we are going to drop support for set_domain(), so offering
> > a way to probe the pages, without having to resort to dummy batches has
> > been requested.
> >
> > v2:
> > - add new query param for the PROBE flag, so userspace can easily
> > >   check if the kernel supports it (Jason).
> > - use mmap_read_{lock, unlock}.
> > - add some kernel-doc.
> > v3:
> > - In the docs also mention that PROBE doesn't guarantee that the pages
> > >   will remain valid by the time they are actually used (Tvrtko).
> > > - Add a small comment for the hole finding logic (Jason).
> > - Move the param next to all the other params which just return true.
> >
> > Testcase: igt/gem_userptr_blits/probe
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Matthew Auld 
> > Cc: Thomas Hellström 
> > Cc: Maarten Lankhorst 
> > Cc: Tvrtko Ursulin 
> > Cc: Jordan Justen 
> > Cc: Kenneth Graunke 
> > Cc: Jason Ekstrand 
> > Cc: Daniel Vetter 
> > Cc: Ramalingam C 
> > Reviewed-by: Tvrtko Ursulin 
> > Acked-by: Kenneth Graunke 
> > Reviewed-by: Jason Ekstrand 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 -
> >  drivers/gpu/drm/i915/i915_getparam.c|  1 +
> >  include/uapi/drm/i915_drm.h | 20 ++
> >  3 files changed, 61 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > index 56edfeff8c02..468a7a617fbf 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> > i915_gem_userptr_ops = {
> >
> >  #endif
> >
> > +static int
> > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> > +{
> > + const unsigned long end = addr + len;
> > + struct vm_area_struct *vma;
> > + int ret = -EFAULT;
> > +
> > + mmap_read_lock(mm);
> > + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > + /* Check for holes, note that we also update the addr below */
> > + if (vma->vm_start > addr)
> > + break;
> > +
> > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> > + break;
> > +
> > + if (vma->vm_end >= end) {
> > + ret = 0;
> > + break;
> > + }
> > +
> > + addr = vma->vm_end;
> > + }
> > + mmap_read_unlock(mm);
> > +
> > + return ret;
> > +}
> > +
> >  /*
> >   * Creates a new mm object that wraps some normal memory from the process
> >   * context - user memory.
> > @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> >   }
> >
> >   if (args->flags & ~(I915_USERPTR_READ_ONLY |
> > - I915_USERPTR_UNSYNCHRONIZED))
> > + I915_USERPTR_UNSYNCHRONIZED |
> > + I915_USERPTR_PROBE))
> >   return -EINVAL;
> >
> >   if (i915_gem_object_size_2big(args->user_size))
> > @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> >   return -ENODEV;
> >   }
> >
> > + if (args->flags & I915_USERPTR_PROBE) {
> > + /*
> > +  * Check that the range pointed to represents real struct
> > +  * pages and not iomappings (at this moment in time!)
> > +  */
> > + ret = probe_range(current->mm, args->user_ptr, 
> > args->user_size);
> > + if (ret)
> > + return ret;
> > + }
> > +
>
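
On the userspace side, the PROBE flag is meant to be used at creation
time, roughly like the sketch below (illustrative only, assuming the
I915_USERPTR_PROBE flag and uapi additions from this patch; a real
caller would use a drmIoctl() wrapper that restarts on EINTR):

    #include <errno.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /*
     * Create a userptr BO, failing with -EFAULT up front if the range
     * is not fully backed by struct pages.  Note the probe is only a
     * snapshot: the pages may still disappear before they are used.
     */
    static int userptr_create_probed(int fd, void *ptr, uint64_t size,
                                     uint32_t *handle)
    {
        struct drm_i915_gem_userptr arg = {
            .user_ptr = (uint64_t)(uintptr_t)ptr,
            .user_size = size,
            .flags = I915_USERPTR_PROBE,
        };

        if (ioctl(fd, DRM_IOCTL_I915_GEM_USERPTR, &arg))
            return -errno;

        *handle = arg.handle;
        return 0;
    }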

Re: [Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 3:06 AM Matthew Auld
 wrote:
>
> On Fri, 23 Jul 2021 at 18:48, Jason Ekstrand  wrote:
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044
>
> Cool, is that ready to go? i.e can we start merging the kernel + IGT side.

Yes, it's all reviewed.  Though, it sounds like Maarten had a comment
so we should settle on that before landing.

> >
> > On Fri, Jul 23, 2021 at 6:35 AM Matthew Auld  wrote:
> > >
> > > From: Chris Wilson 
> > >
> > > Jason Ekstrand requested a more efficient method than userptr+set-domain
> > > to determine if the userptr object was backed by a complete set of pages
> > > upon creation. To be more efficient than simply populating the userptr
> > > using get_user_pages() (as done by the call to set-domain or execbuf),
> > > we can walk the tree of vm_area_struct and check for gaps or vma not
> > > backed by struct page (VM_PFNMAP). The question is how to handle
> > > VM_MIXEDMAP which may be either struct page or pfn backed...
> > >
> > > With discrete we are going to drop support for set_domain(), so offering
> > > a way to probe the pages, without having to resort to dummy batches has
> > > been requested.
> > >
> > > v2:
> > > - add new query param for the PROBE flag, so userspace can easily
> > >   check if the kernel supports it (Jason).
> > > - use mmap_read_{lock, unlock}.
> > > - add some kernel-doc.
> > > v3:
> > > - In the docs also mention that PROBE doesn't guarantee that the pages
> > >   will remain valid by the time they are actually used (Tvrtko).
> > > - Add a small comment for the hole finding logic (Jason).
> > > - Move the param next to all the other params which just return true.
> > >
> > > Testcase: igt/gem_userptr_blits/probe
> > > Signed-off-by: Chris Wilson 
> > > Signed-off-by: Matthew Auld 
> > > Cc: Thomas Hellström 
> > > Cc: Maarten Lankhorst 
> > > Cc: Tvrtko Ursulin 
> > > Cc: Jordan Justen 
> > > Cc: Kenneth Graunke 
> > > Cc: Jason Ekstrand 
> > > Cc: Daniel Vetter 
> > > Cc: Ramalingam C 
> > > Reviewed-by: Tvrtko Ursulin 
> > > Acked-by: Kenneth Graunke 
> > > Reviewed-by: Jason Ekstrand 
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 -
> > >  drivers/gpu/drm/i915/i915_getparam.c|  1 +
> > >  include/uapi/drm/i915_drm.h | 20 ++
> > >  3 files changed, 61 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > index 56edfeff8c02..468a7a617fbf 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> > > i915_gem_userptr_ops = {
> > >
> > >  #endif
> > >
> > > +static int
> > > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> > > +{
> > > +   const unsigned long end = addr + len;
> > > +   struct vm_area_struct *vma;
> > > +   int ret = -EFAULT;
> > > +
> > > +   mmap_read_lock(mm);
> > > +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > > +   /* Check for holes, note that we also update the addr 
> > > below */
> > > +   if (vma->vm_start > addr)
> > > +   break;
> > > +
> > > +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> > > +   break;
> > > +
> > > +   if (vma->vm_end >= end) {
> > > +   ret = 0;
> > > +   break;
> > > +   }
> > > +
> > > +   addr = vma->vm_end;
> > > +   }
> > > +   mmap_read_unlock(mm);
> > > +
> > > +   return ret;
> > > +}
> > > +
> > >  /*
> > >   * Creates a new mm object that wraps some normal memory from the process
> > >   * context - user memory.
> > > @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > > }
> > >
> > > if (args->flags & ~(I915_USERPTR_READ_ONLY |
> > > -   I915_USERPTR_UNSYNC

Re: [Intel-gfx] [PATCH 0/8] drm/i915: Migrate memory to SMEM when imported cross-device (v8)

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 3:12 AM Matthew Auld
 wrote:
>
> On Fri, 23 Jul 2021 at 18:21, Jason Ekstrand  wrote:
> >
> > This patch series fixes an issue with discrete graphics on Intel where we
> > allowed dma-buf import while leaving the object in local memory.  This
> > breaks down pretty badly if the import happened on a different physical
> > device.
> >
> > v7:
> >  - Drop "drm/i915/gem/ttm: Place new BOs in the requested region"
> >  - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in 
> > i915_gem_dumb_create()"
> >  - Misc. review feedback from Matthew Auld
> > v8:
> >  - Misc. review feedback from Matthew Auld
> > v9:
> >  - Replace the i915/ttm patch with two that are hopefully more correct
> >
> > Jason Ekstrand (6):
> >   drm/i915/gem: Check object_can_migrate from object_migrate
> >   drm/i915/gem: Refactor placement setup for i915_gem_object_create*
> > (v2)
> >   drm/i915/gem: Call i915_gem_flush_free_objects() in
> > i915_gem_dumb_create()
> >   drm/i915/gem: Unify user object creation (v3)
> >   drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed
> >   drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails
> >
> > Thomas Hellström (2):
> >   drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
> >   drm/i915/gem: Migrate to system at dma-buf attach time (v7)
>
> Should I push the series?

Yes, please.  Do we have a solid testing plan for things like this
that touch discrete?  I tested with mesa+glxgears on my DG1 but
haven't run anything more stressful.

--Jason


> >
> >  drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 
> >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  58 --
> >  drivers/gpu/drm/i915/gem/i915_gem_object.c|  20 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h|   4 +
> >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  13 +-
> >  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 190 +-
> >  .../drm/i915/gem/selftests/i915_gem_migrate.c |  15 --
> >  7 files changed, 341 insertions(+), 136 deletions(-)
> >
> > --
> > 2.31.1
> >


Re: [Intel-gfx] [PATCH 00/30] Remove CNL support

2021-07-23 Thread Jason Ekstrand

Generally a big fan. 

--Jason

On July 23, 2021 19:11:34 Lucas De Marchi  wrote:


Patches 1 and 2 are already being reviewed elsewhere. Discussion on the 2nd
patch made me revive something I started after a comment from Ville
at 
https://patchwork.freedesktop.org/patch/428168/?series=88988&rev=1#comment_768918


This removes CNL completely from the driver, while trying to rename
functions and macros where appropriate (usually to GLK when dealing with
display or with ICL otherwise). It starts with display, which is more
straightforward, and then proceeds to the rest of i915.

The diff stat removing 1600 lines of dead code seems to make the pain of
doing this worthwhile.


Lucas De Marchi (30):
 drm/i915: fix not reading DSC disable fuse in GLK
 drm/i915/display: split DISPLAY_VER 9 and 10 in intel_setup_outputs()
 drm/i915/display: remove PORT_F workaround for CNL
 drm/i915/display: remove explicit CNL handling from intel_cdclk.c
 drm/i915/display: remove explicit CNL handling from intel_color.c
 drm/i915/display: remove explicit CNL handling from intel_combo_phy.c
 drm/i915/display: remove explicit CNL handling from intel_crtc.c
 drm/i915/display: remove explicit CNL handling from intel_ddi.c
 drm/i915/display: remove explicit CNL handling from
   intel_display_debugfs.c
 drm/i915/display: remove explicit CNL handling from intel_dmc.c
 drm/i915/display: remove explicit CNL handling from intel_dp.c
 drm/i915/display: remove explicit CNL handling from intel_dpll_mgr.c
 drm/i915/display: remove explicit CNL handling from intel_vdsc.c
 drm/i915/display: remove explicit CNL handling from
   skl_universal_plane.c
 drm/i915/display: remove explicit CNL handling from
   intel_display_power.c
 drm/i915/display: remove CNL ddi buf translation tables
 drm/i915/display: rename CNL references in skl_scaler.c
 drm/i915: remove explicit CNL handling from i915_irq.c
 drm/i915: remove explicit CNL handling from intel_pm.c
 drm/i915: remove explicit CNL handling from intel_mocs.c
 drm/i915: remove explicit CNL handling from intel_pch.c
 drm/i915: remove explicit CNL handling from intel_wopcm.c
 drm/i915/gt: remove explicit CNL handling from intel_sseu.c
 drm/i915: rename CNL references in intel_dram.c
 drm/i915/gt: rename CNL references in intel_engine.h
 drm/i915: finish removal of CNL
 drm/i915: remove GRAPHICS_VER == 10
 drm/i915: rename/remove CNL registers
 drm/i915: replace random CNL comments
 drm/i915: switch num_scalers/num_sprites to consider DISPLAY_VER

drivers/gpu/drm/i915/display/intel_bios.c |   8 +-
drivers/gpu/drm/i915/display/intel_cdclk.c|  72 +-
drivers/gpu/drm/i915/display/intel_color.c|   5 +-
.../gpu/drm/i915/display/intel_combo_phy.c| 106 +--
drivers/gpu/drm/i915/display/intel_crtc.c |   2 +-
drivers/gpu/drm/i915/display/intel_ddi.c  | 266 +---
.../drm/i915/display/intel_ddi_buf_trans.c| 616 +-
.../drm/i915/display/intel_ddi_buf_trans.h|   4 +-
drivers/gpu/drm/i915/display/intel_display.c  |   3 +-
.../drm/i915/display/intel_display_debugfs.c  |   2 +-
.../drm/i915/display/intel_display_power.c| 289 
.../drm/i915/display/intel_display_power.h|   2 -
drivers/gpu/drm/i915/display/intel_dmc.c  |   9 -
drivers/gpu/drm/i915/display/intel_dp.c   |  35 +-
drivers/gpu/drm/i915/display/intel_dp_aux.c   |   1 -
drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 586 +++--
drivers/gpu/drm/i915/display/intel_dpll_mgr.h |   1 -
drivers/gpu/drm/i915/display/intel_vbt_defs.h |   2 +-
drivers/gpu/drm/i915/display/intel_vdsc.c |   5 +-
drivers/gpu/drm/i915/display/skl_scaler.c |  10 +-
.../drm/i915/display/skl_universal_plane.c|  14 +-
drivers/gpu/drm/i915/gem/i915_gem_stolen.c|   1 -
drivers/gpu/drm/i915/gt/debugfs_gt_pm.c   |  10 +-
drivers/gpu/drm/i915/gt/intel_engine.h|   2 +-
drivers/gpu/drm/i915/gt/intel_engine_cs.c |   3 -
drivers/gpu/drm/i915/gt/intel_ggtt.c  |   4 +-
.../gpu/drm/i915/gt/intel_gt_clock_utils.c|  10 +-
drivers/gpu/drm/i915/gt/intel_gtt.c   |   6 +-
drivers/gpu/drm/i915/gt/intel_lrc.c   |  42 +-
drivers/gpu/drm/i915/gt/intel_mocs.c  |   2 +-
drivers/gpu/drm/i915/gt/intel_rc6.c   |   2 +-
drivers/gpu/drm/i915/gt/intel_rps.c   |   4 +-
drivers/gpu/drm/i915/gt/intel_sseu.c  |  79 ---
drivers/gpu/drm/i915/gt/intel_sseu.h  |   2 +-
drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c  |   6 +-
drivers/gpu/drm/i915/gvt/gtt.c|   2 +-
drivers/gpu/drm/i915/i915_debugfs.c   |   6 +-
drivers/gpu/drm/i915/i915_drv.h   |  13 +-
drivers/gpu/drm/i915/i915_irq.c   |   7 +-
drivers/gpu/drm/i915/i915_pci.c   |  23 +-
drivers/gpu/drm/i915/i915_perf.c  |  22 +-
drivers/gpu/drm/i915/i915_reg.h   | 245 ++-
drivers/gpu/drm/i915/intel_device_info.c  |  23 +-
drivers/gpu/drm/i915/intel_device_info.h  |   4 +-
drivers/gpu/drm/i915/intel_dram.c |  32 +-

Re: [Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-23 Thread Jason Ekstrand
Are there IGTs for this anywhere?

On Fri, Jul 23, 2021 at 12:47 PM Jason Ekstrand  wrote:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044
>
> On Fri, Jul 23, 2021 at 6:35 AM Matthew Auld  wrote:
> >
> > From: Chris Wilson 
> >
> > Jason Ekstrand requested a more efficient method than userptr+set-domain
> > to determine if the userptr object was backed by a complete set of pages
> > upon creation. To be more efficient than simply populating the userptr
> > using get_user_pages() (as done by the call to set-domain or execbuf),
> > we can walk the tree of vm_area_struct and check for gaps or vma not
> > backed by struct page (VM_PFNMAP). The question is how to handle
> > VM_MIXEDMAP which may be either struct page or pfn backed...
> >
> > With discrete we are going to drop support for set_domain(), so offering
> > a way to probe the pages, without having to resort to dummy batches has
> > been requested.
> >
> > v2:
> > - add new query param for the PROBE flag, so userspace can easily
> >   check if the kernel supports it (Jason).
> > - use mmap_read_{lock, unlock}.
> > - add some kernel-doc.
> > v3:
> > - In the docs also mention that PROBE doesn't guarantee that the pages
> >   will remain valid by the time they are actually used (Tvrtko).
> > - Add a small comment for the hole finding logic (Jason).
> > - Move the param next to all the other params which just return true.
> >
> > Testcase: igt/gem_userptr_blits/probe
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Matthew Auld 
> > Cc: Thomas Hellström 
> > Cc: Maarten Lankhorst 
> > Cc: Tvrtko Ursulin 
> > Cc: Jordan Justen 
> > Cc: Kenneth Graunke 
> > Cc: Jason Ekstrand 
> > Cc: Daniel Vetter 
> > Cc: Ramalingam C 
> > Reviewed-by: Tvrtko Ursulin 
> > Acked-by: Kenneth Graunke 
> > Reviewed-by: Jason Ekstrand 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 -
> >  drivers/gpu/drm/i915/i915_getparam.c|  1 +
> >  include/uapi/drm/i915_drm.h | 20 ++
> >  3 files changed, 61 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > index 56edfeff8c02..468a7a617fbf 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> > i915_gem_userptr_ops = {
> >
> >  #endif
> >
> > +static int
> > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> > +{
> > +   const unsigned long end = addr + len;
> > +   struct vm_area_struct *vma;
> > +   int ret = -EFAULT;
> > +
> > +   mmap_read_lock(mm);
> > +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > +   /* Check for holes, note that we also update the addr below 
> > */
> > +   if (vma->vm_start > addr)
> > +   break;
> > +
> > +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> > +   break;
> > +
> > +   if (vma->vm_end >= end) {
> > +   ret = 0;
> > +   break;
> > +   }
> > +
> > +   addr = vma->vm_end;
> > +   }
> > +   mmap_read_unlock(mm);
> > +
> > +   return ret;
> > +}
> > +
> >  /*
> >   * Creates a new mm object that wraps some normal memory from the process
> >   * context - user memory.
> > @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > }
> >
> > if (args->flags & ~(I915_USERPTR_READ_ONLY |
> > -   I915_USERPTR_UNSYNCHRONIZED))
> > +   I915_USERPTR_UNSYNCHRONIZED |
> > +   I915_USERPTR_PROBE))
> > return -EINVAL;
> >
> > if (i915_gem_object_size_2big(args->user_size))
> > @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > return -ENODEV;
> > }
> >
> > +   if (args->flags & I915_USERPTR_PROBE) {
> > +   /*
> > +* Check that the range pointed to represents real struct
> > +* pages and not iomappings (at this moment in time!)
> > +*/
> > +

Re: [Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-23 Thread Jason Ekstrand
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044

On Fri, Jul 23, 2021 at 6:35 AM Matthew Auld  wrote:
>
> From: Chris Wilson 
>
> Jason Ekstrand requested a more efficient method than userptr+set-domain
> to determine if the userptr object was backed by a complete set of pages
> upon creation. To be more efficient than simply populating the userptr
> using get_user_pages() (as done by the call to set-domain or execbuf),
> we can walk the tree of vm_area_struct and check for gaps or vma not
> backed by struct page (VM_PFNMAP). The question is how to handle
> VM_MIXEDMAP which may be either struct page or pfn backed...
>
> With discrete we are going to drop support for set_domain(), so offering
> a way to probe the pages, without having to resort to dummy batches has
> been requested.
>
> v2:
> - add new query param for the PROBE flag, so userspace can easily
>   check if the kernel supports it (Jason).
> - use mmap_read_{lock, unlock}.
> - add some kernel-doc.
> v3:
> - In the docs also mention that PROBE doesn't guarantee that the pages
>   will remain valid by the time they are actually used (Tvrtko).
> - Add a small comment for the hole finding logic (Jason).
> - Move the param next to all the other params which just return true.
>
> Testcase: igt/gem_userptr_blits/probe
> Signed-off-by: Chris Wilson 
> Signed-off-by: Matthew Auld 
> Cc: Thomas Hellström 
> Cc: Maarten Lankhorst 
> Cc: Tvrtko Ursulin 
> Cc: Jordan Justen 
> Cc: Kenneth Graunke 
> Cc: Jason Ekstrand 
> Cc: Daniel Vetter 
> Cc: Ramalingam C 
> Reviewed-by: Tvrtko Ursulin 
> Acked-by: Kenneth Graunke 
> Reviewed-by: Jason Ekstrand 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 -
>  drivers/gpu/drm/i915/i915_getparam.c|  1 +
>  include/uapi/drm/i915_drm.h | 20 ++
>  3 files changed, 61 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> index 56edfeff8c02..468a7a617fbf 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> i915_gem_userptr_ops = {
>
>  #endif
>
> +static int
> +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> +{
> +   const unsigned long end = addr + len;
> +   struct vm_area_struct *vma;
> +   int ret = -EFAULT;
> +
> +   mmap_read_lock(mm);
> +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> +   /* Check for holes, note that we also update the addr below */
> +   if (vma->vm_start > addr)
> +   break;
> +
> +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +   break;
> +
> +   if (vma->vm_end >= end) {
> +   ret = 0;
> +   break;
> +   }
> +
> +   addr = vma->vm_end;
> +   }
> +   mmap_read_unlock(mm);
> +
> +   return ret;
> +}
> +
>  /*
>   * Creates a new mm object that wraps some normal memory from the process
>   * context - user memory.
> @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> }
>
> if (args->flags & ~(I915_USERPTR_READ_ONLY |
> -   I915_USERPTR_UNSYNCHRONIZED))
> +   I915_USERPTR_UNSYNCHRONIZED |
> +   I915_USERPTR_PROBE))
> return -EINVAL;
>
> if (i915_gem_object_size_2big(args->user_size))
> @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> return -ENODEV;
> }
>
> +   if (args->flags & I915_USERPTR_PROBE) {
> +   /*
> +* Check that the range pointed to represents real struct
> +* pages and not iomappings (at this moment in time!)
> +*/
> +   ret = probe_range(current->mm, args->user_ptr, 
> args->user_size);
> +   if (ret)
> +   return ret;
> +   }
> +
>  #ifdef CONFIG_MMU_NOTIFIER
> obj = i915_gem_object_alloc();
> if (obj == NULL)
> diff --git a/drivers/gpu/drm/i915/i915_getparam.c 
> b/drivers/gpu/drm/i915/i915_getparam.c
> index 24e18219eb50..bbb7cac43eb4 100644
> --- a/drivers/gpu/drm/i915/i915_getparam.c
> +++ b/drivers/gpu/drm/i915/i915_getparam.c
> @@ -134,6 +134,7 @@ int i915_getparam_ioctl(struct drm_device *dev, void 
> *data,
> case I915_PARAM_HAS_EXEC_FEN

[Intel-gfx] [PATCH 6/8] drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails

2021-07-23 Thread Jason Ekstrand
Without TTM, we have no such hook, so we exit early, but this is fine
because we use TTM on all LMEM platforms and, on integrated platforms,
there is no real migration.  If we do have the hook, it's better to just
let TTM handle the migration because it knows where things are actually
placed.

This fixes a bug where i915_gem_object_migrate fails to migrate newly
created LMEM objects.  In that scenario, the object has obj->mm.region
set to LMEM but TTM has it in SMEM because that's where all new objects
are placed prior to getting actual pages.  When we invoke
i915_gem_object_migrate, it exits early because, from the point of view
of the GEM object, it's already in LMEM and no migration is needed.
Then, when we try to pin the pages, __i915_ttm_get_pages is called
which, unaware of our failed attempt at a migration, places the object
in SMEM.  This only happens on newly created objects because they have
this weird state where TTM thinks they're in SMEM, GEM thinks they're in
LMEM, and the reality is that they don't exist at all.

It's better if GEM just always calls into TTM and lets TTM handle
things.  That way the lies stay better contained.  Once the migration is
complete, the object will have pages, obj->mm.region will be correct,
and we're done lying.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index d09bd9bdb38ac..9d3497e1235a0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -607,12 +607,15 @@ int i915_gem_object_migrate(struct drm_i915_gem_object 
*obj,
mr = i915->mm.regions[id];
GEM_BUG_ON(!mr);
 
-   if (obj->mm.region == mr)
-   return 0;
-
if (!i915_gem_object_can_migrate(obj, id))
return -EINVAL;
 
+   if (!obj->ops->migrate) {
+   if (GEM_WARN_ON(obj->mm.region != mr))
+   return -EINVAL;
+   return 0;
+   }
+
return obj->ops->migrate(obj, mr);
 }
 
-- 
2.31.1
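
To make the intended ordering concrete, the caller-side flow with this
fix applied looks roughly like the sketch below (simplified from the
selftest usage; obj comes from surrounding code, and INTEL_REGION_LMEM
is the region name in use at the time):

    struct i915_gem_ww_ctx ww;
    int err;

    for_i915_gem_ww(&ww, err, true) {
        err = i915_gem_object_lock(obj, &ww);
        if (err)
            continue;

        /*
         * With the fix, this always reaches obj->ops->migrate() (TTM)
         * when the hook exists, even if obj->mm.region already claims
         * the destination region.
         */
        err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_LMEM);
        if (err)
            continue;

        /* Only here are the backing pages actually allocated/moved. */
        err = i915_gem_object_pin_pages(obj);
    }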



[Intel-gfx] [PATCH 7/8] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)

2021-07-23 Thread Jason Ekstrand
From: Thomas Hellström 

If our exported dma-bufs are imported by another instance of our driver,
that instance will typically have the imported dma-bufs locked during
dma_buf_map_attachment(). But the exporter also locks the same reservation
object in the map_dma_buf() callback, which leads to recursive locking.

So taking the lock inside _pin_pages_unlocked() is incorrect.

Additionally, the current pinning code path is contrary to the defined
way that pinning should occur.

Remove the explicit pin/unpin from the map/umap functions and move them
to the attach/detach allowing correct locking to occur, and to match
the static dma-buf drm_prime pattern.

Add a live selftest to exercise both dynamic and non-dynamic
exports.

v2:
- Extend the selftest with a fake dynamic importer.
- Provide real pin and unpin callbacks to not abuse the interface.
v3: (ruhl)
- Remove the dynamic export support and move the pinning into the
  attach/detach path.
v4: (ruhl)
- Put pages does not need to assert on the dma-resv
v5: (jason)
- Lock around dma_buf_unmap_attachment() when emulating a dynamic
  importer in the subtests.
- Use pin_pages_unlocked
v6: (jason)
- Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests
v7: (mauld)
- Use __i915_gem_object_get_pages (2 __underscores) instead of the
  4 underscore version in the selftests
v8: (mauld)
- Drop the kernel doc from the static i915_gem_dmabuf_attach function
- Add missing "err = PTR_ERR()" to a bunch of selftest error cases

Reported-by: Michael J. Ruhl 
Signed-off-by: Thomas Hellström 
Signed-off-by: Michael J. Ruhl 
Signed-off-by: Jason Ekstrand 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  37 --
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 109 +-
 2 files changed, 132 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 616c3a2f1baf0..59dc56ae14d6b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -12,6 +12,8 @@
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 
+I915_SELFTEST_DECLARE(static bool force_different_devices;)
+
 static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf)
 {
return to_intel_bo(buf->priv);
@@ -25,15 +27,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme
struct scatterlist *src, *dst;
int ret, i;
 
-   ret = i915_gem_object_pin_pages_unlocked(obj);
-   if (ret)
-   goto err;
-
/* Copy sg so that we make an independent mapping */
st = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
if (st == NULL) {
ret = -ENOMEM;
-   goto err_unpin_pages;
+   goto err;
}
 
ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
@@ -58,8 +56,6 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme
sg_free_table(st);
 err_free:
kfree(st);
-err_unpin_pages:
-   i915_gem_object_unpin_pages(obj);
 err:
return ERR_PTR(ret);
 }
@@ -68,13 +64,9 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment,
   struct sg_table *sg,
   enum dma_data_direction dir)
 {
-   struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
-
dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sg);
kfree(sg);
-
-   i915_gem_object_unpin_pages(obj);
 }
 
static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map *map)
@@ -168,7 +160,25 @@ static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direct
return err;
 }
 
+static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attach)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+
+   return i915_gem_object_pin_pages_unlocked(obj);
+}
+
+static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
+  struct dma_buf_attachment *attach)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+
+   i915_gem_object_unpin_pages(obj);
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops =  {
+   .attach = i915_gem_dmabuf_attach,
+   .detach = i915_gem_dmabuf_detach,
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
@@ -204,6 +214,8 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
struct sg_table *pages;
unsigned int sg_page_sizes;
 
+   assert_object_held(obj);
+
pages = dma_buf_map_attachment(obj->base.import_attach,
 

[Intel-gfx] [PATCH 8/8] drm/i915/gem: Migrate to system at dma-buf attach time (v7)

2021-07-23 Thread Jason Ekstrand
From: Thomas Hellström 

Until we support p2p dma or as a complement to that, migrate data
to system memory at dma-buf attach time if possible.
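
The attach-time policy then becomes (sketch; the names are the ones used
in the diff below):

  if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
          return -EOPNOTSUPP;     /* LMEM-only object: refuse the attach */

  /* otherwise: take the ww lock, migrate to SMEM, wait for the
   * migration to complete, then pin the pages for the attachment */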

v2:
- Rebase on dynamic exporter. Update the igt_dmabuf_import_same_driver
  selftest to migrate if we are LMEM capable.
v3:
- Migrate also in the pin() callback.
v4:
- Migrate in attach
v5: (jason)
- Lock around the migration
v6: (jason)
- Move the can_migrate check outside the lock
- Rework the selftests to test more migration conditions.  In
  particular, SMEM, LMEM, and LMEM+SMEM are all checked.
v7: (mauld)
- Misc style nits

Signed-off-by: Thomas Hellström 
Signed-off-by: Michael J. Ruhl 
Reported-by: kernel test robot 
Signed-off-by: Jason Ekstrand 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 23 -
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 87 ++-
 2 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 59dc56ae14d6b..afa34111de02e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -164,8 +164,29 @@ static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
  struct dma_buf_attachment *attach)
 {
struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+   struct i915_gem_ww_ctx ww;
+   int err;
+
+   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
+   return -EOPNOTSUPP;
+
+   for_i915_gem_ww(&ww, err, true) {
+   err = i915_gem_object_lock(obj, &ww);
+   if (err)
+   continue;
+
+   err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM);
+   if (err)
+   continue;
 
-   return i915_gem_object_pin_pages_unlocked(obj);
+   err = i915_gem_object_wait_migration(obj, 0);
+   if (err)
+   continue;
+
+   err = i915_gem_object_pin_pages(obj);
+   }
+
+   return err;
 }
 
 static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
index d4ce01e6ee854..ffae7df5e4d7d 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
@@ -85,9 +85,63 @@ static int igt_dmabuf_import_self(void *arg)
return err;
 }
 
-static int igt_dmabuf_import_same_driver(void *arg)
+static int igt_dmabuf_import_same_driver_lmem(void *arg)
 {
struct drm_i915_private *i915 = arg;
+   struct intel_memory_region *lmem = i915->mm.regions[INTEL_REGION_LMEM];
+   struct drm_i915_gem_object *obj;
+   struct drm_gem_object *import;
+   struct dma_buf *dmabuf;
+   int err;
+
+   if (!lmem)
+   return 0;
+
+   force_different_devices = true;
+
+   obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &lmem, 1);
+   if (IS_ERR(obj)) {
+   pr_err("__i915_gem_object_create_user failed with err=%ld\n",
+  PTR_ERR(obj));
+   err = PTR_ERR(obj);
+   goto out_ret;
+   }
+
+   dmabuf = i915_gem_prime_export(&obj->base, 0);
+   if (IS_ERR(dmabuf)) {
+   pr_err("i915_gem_prime_export failed with err=%ld\n",
+  PTR_ERR(dmabuf));
+   err = PTR_ERR(dmabuf);
+   goto out;
+   }
+
+   /*
+* We expect an import of an LMEM-only object to fail with
+* -EOPNOTSUPP because it can't be migrated to SMEM.
+*/
+   import = i915_gem_prime_import(&i915->drm, dmabuf);
+   if (!IS_ERR(import)) {
+   drm_gem_object_put(import);
+   pr_err("i915_gem_prime_import succeeded when it shouldn't 
have\n");
+   err = -EINVAL;
+   } else if (PTR_ERR(import) != -EOPNOTSUPP) {
+   pr_err("i915_gem_prime_import failed with the wrong err=%ld\n",
+  PTR_ERR(import));
+   err = PTR_ERR(import);
+   }
+
+   dma_buf_put(dmabuf);
+out:
+   i915_gem_object_put(obj);
+out_ret:
+   force_different_devices = false;
+   return err;
+}
+
+static int igt_dmabuf_import_same_driver(struct drm_i915_private *i915,
+struct intel_memory_region **regions,
+unsigned int num_regions)
+{
struct drm_i915_gem_object *obj, *import_obj;
struct drm_gem_object *import;
struct dma_buf *dmabuf;
@@ -97,8 +151,12 @@ static int igt_dmabuf_import_same_driver(void *arg)
int err;
 
force_different_devices = true;
-   obj = i915_gem_object_create_shmem(i915, PAGE_SIZE);
+
+   obj = __i915_gem_object_create_user(i915, PAGE_SIZE,
+   

[Intel-gfx] [PATCH 3/8] drm/i915/gem: Call i915_gem_flush_free_objects() in i915_gem_dumb_create()

2021-07-23 Thread Jason Ekstrand
This doesn't really fix anything serious since the chances of a client
creating and destroying a mass of dumb BOs is pretty low.  However, it
is called by the other two create IOCTLs to garbage collect old objects.
Call it here too for consistency.
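
For reference, the helper just drains the deferred-free machinery (a
hedged summary, not code from this patch):

  /*
   * i915_gem_flush_free_objects(i915): process objects still queued
   * for destruction so their backing pages are actually released
   * before the new allocation is attempted.
   */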

Signed-off-by: Jason Ekstrand 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index aa687b10dcd45..adcce37c04b8d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -151,6 +151,8 @@ i915_gem_dumb_create(struct drm_file *file,
if (args->pitch < args->width)
return -EINVAL;
 
+   i915_gem_flush_free_objects(i915);
+
args->size = mul_u32_u32(args->pitch, args->height);
 
mem_type = INTEL_MEMORY_SYSTEM;
-- 
2.31.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 4/8] drm/i915/gem: Unify user object creation (v3)

2021-07-23 Thread Jason Ekstrand
Instead of hand-rolling the same three calls in each function, pull them
into an i915_gem_object_create_user helper.  Apart from re-ordering the
placements-array ENOMEM check, there should be no functional change.
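
A hypothetical caller then looks like this (selftest-style sketch; the
region choice is illustrative):

  struct intel_memory_region *smem = i915->mm.regions[INTEL_REGION_SMEM];
  struct drm_i915_gem_object *obj;

  obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &smem, 1);
  if (IS_ERR(obj))
          return PTR_ERR(obj);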

v2 (Matthew Auld):
 - Add the call to i915_gem_flush_free_objects() from
   i915_gem_dumb_create() in a separate patch
 - Move i915_gem_object_alloc() below the simple error checks
v3 (Matthew Auld):
 - Add __ to i915_gem_object_create_user and kerneldoc which warns the
   caller that it's not validating anything.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 119 ++---
 drivers/gpu/drm/i915/gem/i915_gem_object.h |   4 +
 2 files changed, 58 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index adcce37c04b8d..23fee13a33844 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -11,13 +11,14 @@
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 
-static u32 object_max_page_size(struct drm_i915_gem_object *obj)
+static u32 object_max_page_size(struct intel_memory_region **placements,
+   unsigned int n_placements)
 {
u32 max_page_size = 0;
int i;
 
-   for (i = 0; i < obj->mm.n_placements; i++) {
-   struct intel_memory_region *mr = obj->mm.placements[i];
+   for (i = 0; i < n_placements; i++) {
+   struct intel_memory_region *mr = placements[i];
 
GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
max_page_size = max_t(u32, max_page_size, mr->min_page_size);
@@ -81,22 +82,46 @@ static int i915_gem_publish(struct drm_i915_gem_object *obj,
return 0;
 }
 
-static int
-i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
+/**
+ * Creates a new object using the same path as DRM_I915_GEM_CREATE_EXT
+ * @i915: i915 private
+ * @size: size of the buffer, in bytes
+ * @placements: possible placement regions, in priority order
+ * @n_placements: number of possible placement regions
+ *
+ * This function is exposed primarily for selftests and does very little
+ * error checking.  It is assumed that the set of placement regions has
+ * already been verified to be valid.
+ */
+struct drm_i915_gem_object *
+__i915_gem_object_create_user(struct drm_i915_private *i915, u64 size,
+ struct intel_memory_region **placements,
+ unsigned int n_placements)
 {
-   struct intel_memory_region *mr = obj->mm.placements[0];
+   struct intel_memory_region *mr = placements[0];
+   struct drm_i915_gem_object *obj;
unsigned int flags;
int ret;
 
-   size = round_up(size, object_max_page_size(obj));
+   i915_gem_flush_free_objects(i915);
+
+   size = round_up(size, object_max_page_size(placements, n_placements));
if (size == 0)
-   return -EINVAL;
+   return ERR_PTR(-EINVAL);
 
/* For most of the ABI (e.g. mmap) we think in system pages */
GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
 
if (i915_gem_object_size_2big(size))
-   return -E2BIG;
+   return ERR_PTR(-E2BIG);
+
+   obj = i915_gem_object_alloc();
+   if (!obj)
+   return ERR_PTR(-ENOMEM);
+
+   ret = object_set_placements(obj, placements, n_placements);
+   if (ret)
+   goto object_free;
 
/*
 * I915_BO_ALLOC_USER will make sure the object is cleared before
@@ -106,12 +131,18 @@ i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
 
ret = mr->ops->init_object(mr, obj, size, 0, flags);
if (ret)
-   return ret;
+   goto object_free;
 
GEM_BUG_ON(size != obj->base.size);
 
trace_i915_gem_object_create(obj);
-   return 0;
+   return obj;
+
+object_free:
+   if (obj->mm.n_placements > 1)
+   kfree(obj->mm.placements);
+   i915_gem_object_free(obj);
+   return ERR_PTR(ret);
 }
 
 int
@@ -124,7 +155,6 @@ i915_gem_dumb_create(struct drm_file *file,
enum intel_memory_type mem_type;
int cpp = DIV_ROUND_UP(args->bpp, 8);
u32 format;
-   int ret;
 
switch (cpp) {
case 1:
@@ -151,32 +181,19 @@ i915_gem_dumb_create(struct drm_file *file,
if (args->pitch < args->width)
return -EINVAL;
 
-   i915_gem_flush_free_objects(i915);
-
args->size = mul_u32_u32(args->pitch, args->height);
 
mem_type = INTEL_MEMORY_SYSTEM;
if (HAS_LMEM(to_i915(dev)))
mem_type = INTEL_MEMORY_LOCAL;
 
-   obj = i915_gem_object_alloc();
-   if (!obj)
-   return -ENOMEM;
-
mr = intel_memory_region_by_type(to_i915(dev), mem_type);
-   

[Intel-gfx] [PATCH 2/8] drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)

2021-07-23 Thread Jason Ekstrand
Since we don't allow changing the set of regions after creation, we can
make ext_set_placements() build up the region set directly in the
create_ext and assign it to the object later.  This is similar to what
we did for contexts with the proto-context only simpler because there's
no funny object shuffling.  This will be used in the next patch to allow
us to de-duplicate a bunch of code.  Also, since we know the maximum
number of regions up-front, we can use a fixed-size temporary array for
the regions.  This simplifies memory management a bit for this new
delayed approach.
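
The resulting shape (mirroring the diff below):

  struct create_ext {
          struct drm_i915_private *i915;
          struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
          unsigned int n_placements;
  };

  /* extensions fill ext_data->placements[] in place; only
   * object_set_placements() copies them into a kmalloc'ed array
   * owned by the object */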

v2 (Matthew Auld):
 - Get rid of MAX_N_PLACEMENTS
 - Drop kfree(placements) from set_placements()
v3 (Matthew Auld):
 - Properly set ext_data->n_placements

Signed-off-by: Jason Ekstrand 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 82 --
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 51f92e4b1a69d..aa687b10dcd45 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -27,10 +27,13 @@ static u32 object_max_page_size(struct drm_i915_gem_object *obj)
return max_page_size;
 }
 
-static void object_set_placements(struct drm_i915_gem_object *obj,
- struct intel_memory_region **placements,
- unsigned int n_placements)
+static int object_set_placements(struct drm_i915_gem_object *obj,
+struct intel_memory_region **placements,
+unsigned int n_placements)
 {
+   struct intel_memory_region **arr;
+   unsigned int i;
+
GEM_BUG_ON(!n_placements);
 
/*
@@ -44,9 +47,20 @@ static void object_set_placements(struct drm_i915_gem_object *obj,
obj->mm.placements = &i915->mm.regions[mr->id];
obj->mm.n_placements = 1;
} else {
-   obj->mm.placements = placements;
+   arr = kmalloc_array(n_placements,
+   sizeof(struct intel_memory_region *),
+   GFP_KERNEL);
+   if (!arr)
+   return -ENOMEM;
+
+   for (i = 0; i < n_placements; i++)
+   arr[i] = placements[i];
+
+   obj->mm.placements = arr;
obj->mm.n_placements = n_placements;
}
+
+   return 0;
 }
 
 static int i915_gem_publish(struct drm_i915_gem_object *obj,
@@ -148,7 +162,9 @@ i915_gem_dumb_create(struct drm_file *file,
return -ENOMEM;
 
mr = intel_memory_region_by_type(to_i915(dev), mem_type);
-   object_set_placements(obj, &mr, 1);
+   ret = object_set_placements(obj, &mr, 1);
+   if (ret)
+   goto object_free;
 
ret = i915_gem_setup(obj, args->size);
if (ret)
@@ -184,7 +200,9 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
return -ENOMEM;
 
mr = intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
-   object_set_placements(obj, &mr, 1);
+   ret = object_set_placements(obj, &mr, 1);
+   if (ret)
+   goto object_free;
 
ret = i915_gem_setup(obj, args->size);
if (ret)
@@ -199,7 +217,8 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 
 struct create_ext {
struct drm_i915_private *i915;
-   struct drm_i915_gem_object *vanilla_object;
+   struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
+   unsigned int n_placements;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -230,8 +249,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
struct drm_i915_private *i915 = ext_data->i915;
struct drm_i915_gem_memory_class_instance __user *uregions =
u64_to_user_ptr(args->regions);
-   struct drm_i915_gem_object *obj = ext_data->vanilla_object;
-   struct intel_memory_region **placements;
+   struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
u32 mask;
int i, ret = 0;
 
@@ -245,6 +263,8 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
ret = -EINVAL;
}
 
+   BUILD_BUG_ON(ARRAY_SIZE(i915->mm.regions) != ARRAY_SIZE(placements));
+   BUILD_BUG_ON(ARRAY_SIZE(ext_data->placements) != ARRAY_SIZE(placements));
if (args->num_regions > ARRAY_SIZE(i915->mm.regions)) {
drm_dbg(&i915->drm, "num_regions is too large\n");
ret = -EINVAL;
@@ -253,21 +273,13 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
if (ret)
return ret;
 
-   placements = kmalloc_array(args->num_regions,
-  sizeof(str

[Intel-gfx] [PATCH 5/8] drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed

2021-07-23 Thread Jason Ekstrand
__i915_ttm_get_pages does two things.  First, it calls ttm_bo_validate()
to check the given placement and migrate the BO if needed.  Then, it
updates the GEM object to match, in case the object was migrated.  If
no migration occurred, however, we might still have pages on the GEM
object in which case we don't need to fetch them from TTM and call
__i915_gem_object_set_pages.  This hasn't been a problem before because
the primary user of __i915_ttm_get_pages is __i915_gem_object_get_pages
which only calls it if the GEM object doesn't have pages.

However, i915_ttm_migrate also uses __i915_ttm_get_pages to do the
migration so this meant it was unsafe to call on an already populated
object.  This patch checks i915_gem_object_has_pages() before trying to
__i915_gem_object_set_pages so i915_ttm_migrate is safe to call, even on
populated objects.
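
After this patch the two callers behave as follows (sketch):

  /*
   * __i915_gem_object_get_pages(): still only calls
   * __i915_ttm_get_pages() when the object has no pages, so the
   * set_pages path runs exactly as before.
   *
   * i915_ttm_migrate(): may call it on a populated object; the new
   * i915_gem_object_has_pages() check skips the redundant set_pages
   * in that case instead of tripping GEM_WARN_ON(obj->mm.pages).
   */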

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index f253b11e9e367..771eb2963123f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -662,13 +662,14 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
i915_ttm_adjust_gem_after_move(obj);
}
 
-   GEM_WARN_ON(obj->mm.pages);
-   /* Object either has a page vector or is an iomem object */
-   st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
-   if (IS_ERR(st))
-   return PTR_ERR(st);
+   if (!i915_gem_object_has_pages(obj)) {
+   /* Object either has a page vector or is an iomem object */
+   st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
+   if (IS_ERR(st))
+   return PTR_ERR(st);
 
-   __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
+   __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
+   }
 
return ret;
 }
-- 
2.31.1



[Intel-gfx] [PATCH 1/8] drm/i915/gem: Check object_can_migrate from object_migrate

2021-07-23 Thread Jason Ekstrand
We don't roll them together entirely because there are still a couple
cases where we want a separate can_migrate check.  For instance, the
display code checks that you can migrate a buffer to LMEM before it
accepts it in fb_create.  The dma-buf import code also uses it to do an
early check and return a different error code if someone tries to attach
an LMEM-only dma-buf to another driver.

However, no one actually wants to call object_migrate when can_migrate
has failed.  The stated intention is for self-tests but none of those
actually take advantage of this unsafe migration.
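
For example, the display-side check stays a separate call (a hypothetical
sketch of the fb_create test described above, not code from this patch):

  if (HAS_LMEM(i915) &&
      !i915_gem_object_can_migrate(obj, INTEL_REGION_LMEM))
          return ERR_PTR(-EINVAL);  /* scanout needs an LMEM-capable BO */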

Signed-off-by: Jason Ekstrand 
Cc: Daniel Vetter 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c| 13 ++---
 .../gpu/drm/i915/gem/selftests/i915_gem_migrate.c | 15 ---
 2 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 5c21cff33199e..d09bd9bdb38ac 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -584,12 +584,6 @@ bool i915_gem_object_can_migrate(struct drm_i915_gem_object *obj,
  * completed yet, and to accomplish that, i915_gem_object_wait_migration()
  * must be called.
  *
- * This function is a bit more permissive than i915_gem_object_can_migrate()
- * to allow for migrating objects where the caller knows exactly what is
- * happening. For example within selftests. More specifically this
- * function allows migrating I915_BO_ALLOC_USER objects to regions
- * that are not in the list of allowable regions.
- *
  * Note: the @ww parameter is not used yet, but included to make sure
  * callers put some effort into obtaining a valid ww ctx if one is
  * available.
@@ -616,11 +610,8 @@ int i915_gem_object_migrate(struct drm_i915_gem_object *obj,
if (obj->mm.region == mr)
return 0;
 
-   if (!i915_gem_object_evictable(obj))
-   return -EBUSY;
-
-   if (!obj->ops->migrate)
-   return -EOPNOTSUPP;
+   if (!i915_gem_object_can_migrate(obj, id))
+   return -EINVAL;
 
return obj->ops->migrate(obj, mr);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index 0b7144d2991ca..28a700f08b49a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -61,11 +61,6 @@ static int igt_create_migrate(struct intel_gt *gt, enum intel_region_id src,
if (err)
continue;
 
-   if (!i915_gem_object_can_migrate(obj, dst)) {
-   err = -EINVAL;
-   continue;
-   }
-
err = i915_gem_object_migrate(obj, &ww, dst);
if (err)
continue;
@@ -114,11 +109,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
return err;
 
if (i915_gem_object_is_lmem(obj)) {
-   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM)) {
-   pr_err("object can't migrate to smem.\n");
-   return -EINVAL;
-   }
-
err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
if (err) {
pr_err("Object failed migration to smem\n");
@@ -137,11 +127,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
}
 
} else {
-   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_LMEM)) {
-   pr_err("object can't migrate to lmem.\n");
-   return -EINVAL;
-   }
-
err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM);
if (err) {
pr_err("Object failed migration to lmem\n");
-- 
2.31.1



[Intel-gfx] [PATCH 0/8] drm/i915: Migrate memory to SMEM when imported cross-device (v8)

2021-07-23 Thread Jason Ekstrand
This patch series fixes an issue with discrete graphics on Intel where we
allowed dma-buf import while leaving the object in local memory.  This
breaks down pretty badly if the import happened on a different physical
device.
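
The underlying problem, sketched (assumed topology; we have no p2p
support yet):

  /*
   * device A (exporter): object resident in A's local memory (VRAM)
   * device B (importer): dma_buf_map_attachment() would hand B an
   * sg_table pointing into A's VRAM, which B cannot DMA to.
   *
   * Migrating to system memory at attach time keeps the pages
   * reachable by both devices.
   */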

v7:
 - Drop "drm/i915/gem/ttm: Place new BOs in the requested region"
 - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in 
i915_gem_dumb_create()"
 - Misc. review feedback from Matthew Auld
v8:
 - Misc. review feedback from Matthew Auld
v9:
 - Replace the i915/ttm patch with two that are hopefully more correct

Jason Ekstrand (6):
  drm/i915/gem: Check object_can_migrate from object_migrate
  drm/i915/gem: Refactor placement setup for i915_gem_object_create*
(v2)
  drm/i915/gem: Call i915_gem_flush_free_objects() in
i915_gem_dumb_create()
  drm/i915/gem: Unify user object creation (v3)
  drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed
  drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails

Thomas Hellström (2):
  drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
  drm/i915/gem: Migrate to system at dma-buf attach time (v7)

 drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  58 --
 drivers/gpu/drm/i915/gem/i915_gem_object.c|  20 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|   4 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  13 +-
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 190 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  15 --
 7 files changed, 341 insertions(+), 136 deletions(-)

-- 
2.31.1



Re: [Intel-gfx] [PATCH] drm/i915: Ditch i915 globals shrink infrastructure

2021-07-22 Thread Jason Ekstrand
On Thu, Jul 22, 2021 at 5:34 AM Tvrtko Ursulin
 wrote:
> On 22/07/2021 11:16, Daniel Vetter wrote:
> > On Thu, Jul 22, 2021 at 11:02:55AM +0100, Tvrtko Ursulin wrote:
> >> On 21/07/2021 19:32, Daniel Vetter wrote:
> >>> This essentially reverts
> >>>
> >>> commit 84a1074920523430f9dc30ff907f4801b4820072
> >>> Author: Chris Wilson 
> >>> Date:   Wed Jan 24 11:36:08 2018 +
> >>>
> >>>   drm/i915: Shrink the GEM kmem_caches upon idling
> >>>
> >>> mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it
> >>> then we need to fix that there, not hand-roll our own slab shrinking
> >>> code in i915.
> >>
> >> This is somewhat incomplete statement which ignores a couple of angles so I
> >> wish there was a bit more time to respond before steam rolling it in. :(
> >>
> >> The removed code was not a hand rolled shrinker, but about managing slab
> >> sizes in face of bursty workloads. Core code does not know when i915 is
> >> active and when it is idle, so calling kmem_cache_shrink() after going idle
> >> was supposed to help with house keeping by doing house keeping work 
> >> outside
> >> of the latency sensitive phase.
> >>
> >> To "fix" (improve really) it in core as you suggest, would need some method
> >> of signaling when a slab user feels it is an opportune moment to do this 
> >> house
> >> keeping. And kmem_cache_shrink is just that so I don't see the problem.
> >>
> >> Granted, the argument that kmem_cache_shrink is not much used is a valid one so
> >> discussion overall is definitely valid. Because on the higher level we 
> >> could
> >> definitely talk about which workloads actually benefit from this code and
> >> how much which probably no one knows at this point.

Pardon me for being a bit curt here, but that discussion should have
happened 3.5 years ago when this landed.  The entire justification we
have on record for this change is, "When we finally decide the gpu is
idle, that is a good time to shrink our kmem_caches."  We have no
record of any workloads which benefit from this and no recorded way to
reproduce any supposed benefits, even if it requires a microbenchmark.
But we added over 100 lines of code for it anyway, including a bunch
of hand-rolled RCU juggling.  Ripping out unjustified complexity is
almost always justified, IMO.  The burden of proof here isn't on
Daniel to show he isn't regressing anything but it was on you and
Chris to show that complexity was worth something back in 2018 when
this landed.

--Jason


> >> But in general I think you needed to leave more time for discussion. 12
> >> hours is way too short.
> >
> > It's 500+ users of kmem_cache_create vs i915 doing kmem_cache_shrink. And
>
> There are two other callers for the record. ;)
>
> > I guarantee you there's slab users that churn through more allocations
> > than we do, and are more bursty.
>
> I wasn't disputing that.
>
> > An extraordinary claim like this needs extraordinary evidence. And then a
> > discussion with core mm/ folks so that we can figure out how to solve the
> > discovered problem best for the other 500+ users of slabs in-tree, so that
> > everyone benefits. Not just i915 gpu workloads.
>
> Yep, not disputing that either. Noticed I wrote it was a valid argument?
>
> But discussion with mm folks could also have happened before you steam
> rolled the "revert" in though. Perhaps tey would have said
> kmem_cache_shrink is the way. Or maybe it isn't. Or maybe they would
> have said meh. I just don't see how the rush was justified given the
> code in question.
>
> Regards,
>
> Tvrtko
>
> > -Daniel
> >
> >>> Noticed while reviewing a patch set from Jason to fix up some issues
> >>> in our i915_init() and i915_exit() module load/cleanup code. Now that
> >>> i915_globals.c isn't any different than normal init/exit functions, we
> >>> should convert them over to one unified table and remove
> >>> i915_globals.[hc] entirely.
> >>>
> >>> Cc: David Airlie 
> >>> Cc: Jason Ekstrand 
> >>> Signed-off-by: Daniel Vetter 
> >>> ---
> >>>drivers/gpu/drm/i915/gem/i915_gem_context.c |  6 --
> >>>drivers/gpu/drm/i915/gem/i915_gem_object.c  |  6 --
> >>>drivers/gpu/drm/i915/gt/intel_context.c |  6 --
> >>>drivers/gpu/drm/i915/gt/intel_gt_pm.c   |  4 -
> >>>drivers/gpu/drm/i915/i915_active.c   

Re: [Intel-gfx] [PATCH 3/4] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-22 Thread Jason Ekstrand
On Thu, Jul 22, 2021 at 3:44 AM Matthew Auld
 wrote:
>
> On Wed, 21 Jul 2021 at 21:28, Jason Ekstrand  wrote:
> >
> > On Thu, Jul 15, 2021 at 5:16 AM Matthew Auld  wrote:
> > >
> > > From: Chris Wilson 
> > >
> > > Jason Ekstrand requested a more efficient method than userptr+set-domain
> > > to determine if the userptr object was backed by a complete set of pages
> > > upon creation. To be more efficient than simply populating the userptr
> > > using get_user_pages() (as done by the call to set-domain or execbuf),
> > > we can walk the tree of vm_area_struct and check for gaps or vma not
> > > backed by struct page (VM_PFNMAP). The question is how to handle
> > > VM_MIXEDMAP which may be either struct page or pfn backed...
> > >
> > > With discrete we are going to drop support for set_domain(), so offering a
> > > way to probe the pages, without having to resort to dummy batches has
> > > been requested.
> > >
> > > v2:
> > > - add new query param for the PROBE flag, so userspace can easily
> > >   check if the kernel supports it (Jason).
> > > - use mmap_read_{lock, unlock}.
> > > - add some kernel-doc.
> > >
> > > Testcase: igt/gem_userptr_blits/probe
> > > Signed-off-by: Chris Wilson 
> > > Signed-off-by: Matthew Auld 
> > > Cc: Thomas Hellström 
> > > Cc: Maarten Lankhorst 
> > > Cc: Tvrtko Ursulin 
> > > Cc: Jordan Justen 
> > > Cc: Kenneth Graunke 
> > > Cc: Jason Ekstrand 
> > > Cc: Daniel Vetter 
> > > Cc: Ramalingam C 
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 -
> > >  drivers/gpu/drm/i915/i915_getparam.c|  3 ++
> > >  include/uapi/drm/i915_drm.h | 18 ++
> > >  3 files changed, 60 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > index 56edfeff8c02..fd6880328596 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops 
> > > i915_gem_userptr_ops = {
> > >
> > >  #endif
> > >
> > > +static int
> > > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> > > +{
> > > +   const unsigned long end = addr + len;
> > > +   struct vm_area_struct *vma;
> > > +   int ret = -EFAULT;
> > > +
> > > +   mmap_read_lock(mm);
> > > +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > > +   if (vma->vm_start > addr)
> >
> > Why isn't this > end?  Are we somehow guaranteed that one vma covers
> > the entire range?
>
> AFAIK we are just making sure we don't have a hole (note that we also
> update addr below), for example the user might have done a partial
> munmap. There could be multiple vma's if the kernel was unable to
> merge them. If we reach the vm_end >= end, then we know we have a
> "valid" range.

Ok.  That wasn't obvious to me but I see the addr update now.  Makes
sense.  Might be worth a one-line comment for the next guy.  Either
way,

Reviewed-by: Jason Ekstrand 

Thanks for wiring this up!

--Jason

> >
> > > +   break;
> > > +
> > > +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> > > +   break;
> > > +
> > > +   if (vma->vm_end >= end) {
> > > +   ret = 0;
> > > +   break;
> > > +   }
> > > +
> > > +   addr = vma->vm_end;
> > > +   }
> > > +   mmap_read_unlock(mm);
> > > +
> > > +   return ret;
> > > +}
> > > +
> > >  /*
> > >   * Creates a new mm object that wraps some normal memory from the process
> > >   * context - user memory.
> > > @@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > > }
> > >
> > > if (args->flags & ~(I915_USERPTR_READ_ONLY |
> > > -   I915_USERPTR_UNSYNCHRONIZED))
> > > +   I915_USERPTR_UNSYNCHRONIZED |
> > > +   I915_USERPTR_PROBE))
> > > return -EINVAL;
> > >
&

Re: [Intel-gfx] [PATCH 3/4] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-21 Thread Jason Ekstrand
On Thu, Jul 15, 2021 at 5:16 AM Matthew Auld  wrote:
>
> From: Chris Wilson 
>
> Jason Ekstrand requested a more efficient method than userptr+set-domain
> to determine if the userptr object was backed by a complete set of pages
> upon creation. To be more efficient than simply populating the userptr
> using get_user_pages() (as done by the call to set-domain or execbuf),
> we can walk the tree of vm_area_struct and check for gaps or vma not
> backed by struct page (VM_PFNMAP). The question is how to handle
> VM_MIXEDMAP which may be either struct page or pfn backed...
>
> With discrete we are going to drop support for set_domain(), so offering a
> way to probe the pages, without having to resort to dummy batches has
> been requested.
>
> v2:
> - add new query param for the PROBE flag, so userspace can easily
>   check if the kernel supports it (Jason).
> - use mmap_read_{lock, unlock}.
> - add some kernel-doc.
>
> Testcase: igt/gem_userptr_blits/probe
> Signed-off-by: Chris Wilson 
> Signed-off-by: Matthew Auld 
> Cc: Thomas Hellström 
> Cc: Maarten Lankhorst 
> Cc: Tvrtko Ursulin 
> Cc: Jordan Justen 
> Cc: Kenneth Graunke 
> Cc: Jason Ekstrand 
> Cc: Daniel Vetter 
> Cc: Ramalingam C 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 -
>  drivers/gpu/drm/i915/i915_getparam.c|  3 ++
>  include/uapi/drm/i915_drm.h | 18 ++
>  3 files changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> index 56edfeff8c02..fd6880328596 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops 
> i915_gem_userptr_ops = {
>
>  #endif
>
> +static int
> +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> +{
> +   const unsigned long end = addr + len;
> +   struct vm_area_struct *vma;
> +   int ret = -EFAULT;
> +
> +   mmap_read_lock(mm);
> +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> +   if (vma->vm_start > addr)

Why isn't this > end?  Are we somehow guaranteed that one vma covers
the entire range?

> +   break;
> +
> +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +   break;
> +
> +   if (vma->vm_end >= end) {
> +   ret = 0;
> +   break;
> +   }
> +
> +   addr = vma->vm_end;
> +   }
> +   mmap_read_unlock(mm);
> +
> +   return ret;
> +}
> +
>  /*
>   * Creates a new mm object that wraps some normal memory from the process
>   * context - user memory.
> @@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> }
>
> if (args->flags & ~(I915_USERPTR_READ_ONLY |
> -   I915_USERPTR_UNSYNCHRONIZED))
> +   I915_USERPTR_UNSYNCHRONIZED |
> +   I915_USERPTR_PROBE))
> return -EINVAL;
>
> if (i915_gem_object_size_2big(args->user_size))
> @@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> return -ENODEV;
> }
>
> +   if (args->flags & I915_USERPTR_PROBE) {
> +   /*
> +* Check that the range pointed to represents real struct
> +* pages and not iomappings (at this moment in time!)
> +*/
> +   ret = probe_range(current->mm, args->user_ptr, 
> args->user_size);
> +   if (ret)
> +   return ret;
> +   }
> +
>  #ifdef CONFIG_MMU_NOTIFIER
> obj = i915_gem_object_alloc();
> if (obj == NULL)
> diff --git a/drivers/gpu/drm/i915/i915_getparam.c 
> b/drivers/gpu/drm/i915/i915_getparam.c
> index 24e18219eb50..d6d2e1a10d14 100644
> --- a/drivers/gpu/drm/i915/i915_getparam.c
> +++ b/drivers/gpu/drm/i915/i915_getparam.c
> @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void 
> *data,
> case I915_PARAM_PERF_REVISION:
> value = i915_perf_ioctl_version();
> break;
> +   case I915_PARAM_HAS_USERPTR_PROBE:
> +   value = true;
> +   break;
> default:
> DRM_DEBUG("Unknown parameter %d\n", param->param);
> return -EINVAL;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index e20eeeca7a1c..2e4112b

Re: [Intel-gfx] [PATCH] drm/i915: Ditch i915 globals shrink infrastructure

2021-07-21 Thread Jason Ekstrand
On Wed, Jul 21, 2021 at 1:32 PM Daniel Vetter  wrote:
>
> This essentially reverts
>
> commit 84a1074920523430f9dc30ff907f4801b4820072
> Author: Chris Wilson 
> Date:   Wed Jan 24 11:36:08 2018 +
>
> drm/i915: Shrink the GEM kmem_caches upon idling
>
> mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it
> then we need to fix that there, not hand-roll our own slab shrinking
> code in i915.
>
> Noticed while reviewing a patch set from Jason to fix up some issues
> in our i915_init() and i915_exit() module load/cleanup code. Now that
> i915_globals.c isn't any different than normal init/exit functions, we
> should convert them over to one unified table and remove
> i915_globals.[hc] entirely.

Mind throwing in a comment somewhere about how i915 is one of only two
users of kmem_cache_shrink() in the entire kernel?  That also seems to
be pretty good evidence that it's not useful.

Reviewed-by: Jason Ekstrand 

Feel free to land at-will and I'll deal with merge conflicts on my end.

> Cc: David Airlie 
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c |  6 --
>  drivers/gpu/drm/i915/gem/i915_gem_object.c  |  6 --
>  drivers/gpu/drm/i915/gt/intel_context.c |  6 --
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c   |  4 -
>  drivers/gpu/drm/i915/i915_active.c  |  6 --
>  drivers/gpu/drm/i915/i915_globals.c | 95 -
>  drivers/gpu/drm/i915/i915_globals.h |  3 -
>  drivers/gpu/drm/i915/i915_request.c |  7 --
>  drivers/gpu/drm/i915/i915_scheduler.c   |  7 --
>  drivers/gpu/drm/i915/i915_vma.c |  6 --
>  10 files changed, 146 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 7d6f52d8a801..bf2a2319353a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -2280,18 +2280,12 @@ i915_gem_engines_iter_next(struct 
> i915_gem_engines_iter *it)
>  #include "selftests/i915_gem_context.c"
>  #endif
>
> -static void i915_global_gem_context_shrink(void)
> -{
> -   kmem_cache_shrink(global.slab_luts);
> -}
> -
>  static void i915_global_gem_context_exit(void)
>  {
> kmem_cache_destroy(global.slab_luts);
>  }
>
>  static struct i915_global_gem_context global = { {
> -   .shrink = i915_global_gem_context_shrink,
> .exit = i915_global_gem_context_exit,
>  } };
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 9da7b288b7ed..5c21cff33199 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -664,18 +664,12 @@ void i915_gem_init__objects(struct drm_i915_private 
> *i915)
> INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
>  }
>
> -static void i915_global_objects_shrink(void)
> -{
> -   kmem_cache_shrink(global.slab_objects);
> -}
> -
>  static void i915_global_objects_exit(void)
>  {
> kmem_cache_destroy(global.slab_objects);
>  }
>
>  static struct i915_global_object global = { {
> -   .shrink = i915_global_objects_shrink,
> .exit = i915_global_objects_exit,
>  } };
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> b/drivers/gpu/drm/i915/gt/intel_context.c
> index bd63813c8a80..c1338441cc1d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -398,18 +398,12 @@ void intel_context_fini(struct intel_context *ce)
> i915_active_fini(&ce->active);
>  }
>
> -static void i915_global_context_shrink(void)
> -{
> -   kmem_cache_shrink(global.slab_ce);
> -}
> -
>  static void i915_global_context_exit(void)
>  {
> kmem_cache_destroy(global.slab_ce);
>  }
>
>  static struct i915_global_context global = { {
> -   .shrink = i915_global_context_shrink,
> .exit = i915_global_context_exit,
>  } };
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index aef3084e8b16..d86825437516 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -67,8 +67,6 @@ static int __gt_unpark(struct intel_wakeref *wf)
>
> GT_TRACE(gt, "\n");
>
> -   i915_globals_unpark();
> -
> /*
>  * It seems that the DMC likes to transition between the DC states a 
> lot
>  * when there are no connected displays (no active power domains) 
> during
> @@ -116,8 +114,6 @@ static int __gt_park(struct intel_wa

Re: [Intel-gfx] [PATCH 6/6] drm/i915: Make the kmem slab for i915_buddy_block a global

2021-07-21 Thread Jason Ekstrand
On Wed, Jul 21, 2021 at 1:56 PM Daniel Vetter  wrote:
>
> On Wed, Jul 21, 2021 at 05:25:41PM +0100, Matthew Auld wrote:
> > On 21/07/2021 16:23, Jason Ekstrand wrote:
> > > There's no reason that I can tell why this should be per-i915_buddy_mm
> > > and doing so causes KMEM_CACHE to throw dmesg warnings because it tries
> > > to create a debugfs entry with the name i915_buddy_block multiple times.
> > > We could handle this by carefully giving each slab its own name but that
> > > brings its own pain because then we have to store that string somewhere
> > > and manage the lifetimes of the different slabs.  The most likely
> > > outcome would be a global atomic which we increment to get a new name or
> > > something like that.
> > >
> > > The much easier solution is to use the i915_globals system like we do
> > > for every other slab in i915.  This ensures that we have exactly one of
> > > them for each i915 driver load and it gets neatly created on module load
> > > and destroyed on module unload.  Using the globals system also means
> > > that its now tied into the shrink handler so we can properly respond to
> > > low-memory situations.
> > >
> > > Signed-off-by: Jason Ekstrand 
> > > Fixes: 88be9a0a06b7 ("drm/i915/ttm: add ttm_buddy_man")
> > > Cc: Matthew Auld 
> > > Cc: Christian König 
> >
> > It was intentionally ripped out with the idea that we would be moving the
> > buddy stuff into ttm, and so part of that was trying to get rid of some
> > of the i915 specifics, like this globals thing.
> >
> > Reviewed-by: Matthew Auld 
>
> I just sent out a patch to put i915_globals on a diet, so maybe we can
> hold this patch here a bit when there's other reasons for why this is
> special?

This is required to get rid of the dmesg warnings.

> Or at least not make this use the i915_globals stuff and instead just link
> up the init/exit function calls directly into Jason's new table, so that
> we don't have a merge conflict here?

I'm happy to deal with merge conflicts however they land.

--Jason

> -Daniel
>
> >
> > > ---
> > >   drivers/gpu/drm/i915/i915_buddy.c   | 44 ++---
> > >   drivers/gpu/drm/i915/i915_buddy.h   |  3 +-
> > >   drivers/gpu/drm/i915/i915_globals.c |  2 ++
> > >   3 files changed, 38 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_buddy.c 
> > > b/drivers/gpu/drm/i915/i915_buddy.c
> > > index 29dd7d0310c1f..911feedad4513 100644
> > > --- a/drivers/gpu/drm/i915/i915_buddy.c
> > > +++ b/drivers/gpu/drm/i915/i915_buddy.c
> > > @@ -8,8 +8,14 @@
> > >   #include "i915_buddy.h"
> > >   #include "i915_gem.h"
> > > +#include "i915_globals.h"
> > >   #include "i915_utils.h"
> > > +static struct i915_global_buddy {
> > > +   struct i915_global base;
> > > +   struct kmem_cache *slab_blocks;
> > > +} global;
> > > +
> > >   static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm 
> > > *mm,
> > >  struct i915_buddy_block 
> > > *parent,
> > >  unsigned int order,
> > > @@ -19,7 +25,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
> > > i915_buddy_mm *mm,
> > > GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER);
> > > -   block = kmem_cache_zalloc(mm->slab_blocks, GFP_KERNEL);
> > > +   block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL);
> > > if (!block)
> > > return NULL;
> > > @@ -34,7 +40,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
> > > i915_buddy_mm *mm,
> > >   static void i915_block_free(struct i915_buddy_mm *mm,
> > > struct i915_buddy_block *block)
> > >   {
> > > -   kmem_cache_free(mm->slab_blocks, block);
> > > +   kmem_cache_free(global.slab_blocks, block);
> > >   }
> > >   static void mark_allocated(struct i915_buddy_block *block)
> > > @@ -85,15 +91,11 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 
> > > size, u64 chunk_size)
> > > GEM_BUG_ON(mm->max_order > I915_BUDDY_MAX_ORDER);
> > > -   mm->slab_blocks = KMEM_CACHE(i915_buddy_block, SLAB_HWCACHE_ALIGN);
> > > -   if (!mm->slab_blocks)
> > > -   return -ENOMEM;
> > > -
> > > mm->free_l

[Intel-gfx] [PATCH 7/7] drm/i915/gem: Migrate to system at dma-buf attach time (v7)

2021-07-21 Thread Jason Ekstrand
From: Thomas Hellström 

Until we support p2p dma or as a complement to that, migrate data
to system memory at dma-buf attach time if possible.

v2:
- Rebase on dynamic exporter. Update the igt_dmabuf_import_same_driver
  selftest to migrate if we are LMEM capable.
v3:
- Migrate also in the pin() callback.
v4:
- Migrate in attach
v5: (jason)
- Lock around the migration
v6: (jason)
- Move the can_migrate check outside the lock
- Rework the selftests to test more migration conditions.  In
  particular, SMEM, LMEM, and LMEM+SMEM are all checked.
v7: (mauld)
- Misc style nits

Signed-off-by: Thomas Hellström 
Signed-off-by: Michael J. Ruhl 
Reported-by: kernel test robot 
Signed-off-by: Jason Ekstrand 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 23 -
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 87 ++-
 2 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 59dc56ae14d6b..afa34111de02e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -164,8 +164,29 @@ static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
  struct dma_buf_attachment *attach)
 {
struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+   struct i915_gem_ww_ctx ww;
+   int err;
+
+   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
+   return -EOPNOTSUPP;
+
+   for_i915_gem_ww(&ww, err, true) {
+   err = i915_gem_object_lock(obj, &ww);
+   if (err)
+   continue;
+
+   err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM);
+   if (err)
+   continue;
 
-   return i915_gem_object_pin_pages_unlocked(obj);
+   err = i915_gem_object_wait_migration(obj, 0);
+   if (err)
+   continue;
+
+   err = i915_gem_object_pin_pages(obj);
+   }
+
+   return err;
 }
 
 static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
index d4ce01e6ee854..ffae7df5e4d7d 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
@@ -85,9 +85,63 @@ static int igt_dmabuf_import_self(void *arg)
return err;
 }
 
-static int igt_dmabuf_import_same_driver(void *arg)
+static int igt_dmabuf_import_same_driver_lmem(void *arg)
 {
struct drm_i915_private *i915 = arg;
+   struct intel_memory_region *lmem = i915->mm.regions[INTEL_REGION_LMEM];
+   struct drm_i915_gem_object *obj;
+   struct drm_gem_object *import;
+   struct dma_buf *dmabuf;
+   int err;
+
+   if (!lmem)
+   return 0;
+
+   force_different_devices = true;
+
+   obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &lmem, 1);
+   if (IS_ERR(obj)) {
+   pr_err("__i915_gem_object_create_user failed with err=%ld\n",
+  PTR_ERR(obj));
+   err = PTR_ERR(obj);
+   goto out_ret;
+   }
+
+   dmabuf = i915_gem_prime_export(&obj->base, 0);
+   if (IS_ERR(dmabuf)) {
+   pr_err("i915_gem_prime_export failed with err=%ld\n",
+  PTR_ERR(dmabuf));
+   err = PTR_ERR(dmabuf);
+   goto out;
+   }
+
+   /*
+* We expect an import of an LMEM-only object to fail with
+* -EOPNOTSUPP because it can't be migrated to SMEM.
+*/
+   import = i915_gem_prime_import(&i915->drm, dmabuf);
+   if (!IS_ERR(import)) {
+   drm_gem_object_put(import);
+   pr_err("i915_gem_prime_import succeeded when it shouldn't 
have\n");
+   err = -EINVAL;
+   } else if (PTR_ERR(import) != -EOPNOTSUPP) {
+   pr_err("i915_gem_prime_import failed with the wrong err=%ld\n",
+  PTR_ERR(import));
+   err = PTR_ERR(import);
+   }
+
+   dma_buf_put(dmabuf);
+out:
+   i915_gem_object_put(obj);
+out_ret:
+   force_different_devices = false;
+   return err;
+}
+
+static int igt_dmabuf_import_same_driver(struct drm_i915_private *i915,
+struct intel_memory_region **regions,
+unsigned int num_regions)
+{
struct drm_i915_gem_object *obj, *import_obj;
struct drm_gem_object *import;
struct dma_buf *dmabuf;
@@ -97,8 +151,12 @@ static int igt_dmabuf_import_same_driver(void *arg)
int err;
 
force_different_devices = true;
-   obj = i915_gem_object_create_shmem(i915, PAGE_SIZE);
+
+   obj = __i915_gem_object_create_user(i915, PAGE_SIZE,
+   

[Intel-gfx] [PATCH 5/7] drm/i915/gem/ttm: Respect the objection region in placement_from_obj

2021-07-21 Thread Jason Ekstrand
Whenever we had a user object (n_placements > 0), we were ignoring
obj->mm.region and always putting obj->mm.placements[0] as the requested
region.  For LMEM+SMEM objects, this was causing them to get shoved into
LMEM on every i915_ttm_get_pages() even when SMEM was requested by, say,
i915_gem_object_migrate().
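
Sketched, the bad state for an LMEM+SMEM object right after a migration
to SMEM (names as in the code below):

  /*
   * obj->mm.region        == SMEM  <- what i915_gem_object_migrate() set
   * obj->mm.placements[0] == LMEM  <- creation-time priority order
   *
   * Using placements[0] as the requested TTM placement sends the
   * object straight back to LMEM on the next __i915_ttm_get_pages().
   */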

Signed-off-by: Jason Ekstrand 
Cc: Thomas Hellström 
Cc: Matthew Auld 
Cc: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index f253b11e9e367..b76bdd978a5cc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -150,8 +150,7 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj,
unsigned int i;
 
placement->num_placement = 1;
-   i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] :
-  obj->mm.region, requested, flags);
+   i915_ttm_place_from_region(obj->mm.region, requested, flags);
 
/* Cache this on object? */
placement->num_busy_placement = num_allowed;
-- 
2.31.1



[Intel-gfx] [PATCH 6/7] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)

2021-07-21 Thread Jason Ekstrand
From: Thomas Hellström 

If our exported dma-bufs are imported by another instance of our driver,
that instance will typically have the imported dma-bufs locked during
dma_buf_map_attachment(). But the exporter also locks the same reservation
object in the map_dma_buf() callback, which leads to recursive locking.

So taking the lock inside _pin_pages_unlocked() is incorrect.

Additionally, the current pinning code path is contrary to the defined
way that pinning should occur.

Remove the explicit pin/unpin from the map/umap functions and move them
to the attach/detach allowing correct locking to occur, and to match
the static dma-buf drm_prime pattern.

Add a live selftest to exercise both dynamic and non-dynamic
exports.

v2:
- Extend the selftest with a fake dynamic importer.
- Provide real pin and unpin callbacks to not abuse the interface.
v3: (ruhl)
- Remove the dynamic export support and move the pinning into the
  attach/detach path.
v4: (ruhl)
- Put pages does not need to assert on the dma-resv
v5: (jason)
- Lock around dma_buf_unmap_attachment() when emulating a dynamic
  importer in the subtests.
- Use pin_pages_unlocked
v6: (jason)
- Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests
v7: (mauld)
- Use __i915_gem_object_get_pages (2 __underscores) instead of the
  4 underscore version in the selftests
v8: (mauld)
- Drop the kernel doc from the static i915_gem_dmabuf_attach function
- Add missing "err = PTR_ERR()" to a bunch of selftest error cases

Reported-by: Michael J. Ruhl 
Signed-off-by: Thomas Hellström 
Signed-off-by: Michael J. Ruhl 
Signed-off-by: Jason Ekstrand 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  37 --
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 109 +-
 2 files changed, 132 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 616c3a2f1baf0..59dc56ae14d6b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -12,6 +12,8 @@
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 
+I915_SELFTEST_DECLARE(static bool force_different_devices;)
+
 static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf)
 {
return to_intel_bo(buf->priv);
@@ -25,15 +27,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme
struct scatterlist *src, *dst;
int ret, i;
 
-   ret = i915_gem_object_pin_pages_unlocked(obj);
-   if (ret)
-   goto err;
-
/* Copy sg so that we make an independent mapping */
st = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
if (st == NULL) {
ret = -ENOMEM;
-   goto err_unpin_pages;
+   goto err;
}
 
ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
@@ -58,8 +56,6 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme
sg_free_table(st);
 err_free:
kfree(st);
-err_unpin_pages:
-   i915_gem_object_unpin_pages(obj);
 err:
return ERR_PTR(ret);
 }
@@ -68,13 +64,9 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment,
   struct sg_table *sg,
   enum dma_data_direction dir)
 {
-   struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
-
dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sg);
kfree(sg);
-
-   i915_gem_object_unpin_pages(obj);
 }
 
static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map *map)
@@ -168,7 +160,25 @@ static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direct
return err;
 }
 
+static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attach)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+
+   return i915_gem_object_pin_pages_unlocked(obj);
+}
+
+static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
+  struct dma_buf_attachment *attach)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+
+   i915_gem_object_unpin_pages(obj);
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops =  {
+   .attach = i915_gem_dmabuf_attach,
+   .detach = i915_gem_dmabuf_detach,
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
@@ -204,6 +214,8 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
struct sg_table *pages;
unsigned int sg_page_sizes;
 
+   assert_object_held(obj);
+
pages = dma_buf_map_attachment(obj->base.import_attach,
 

[Intel-gfx] [PATCH 4/7] drm/i915/gem: Unify user object creation (v3)

2021-07-21 Thread Jason Ekstrand
Instead of hand-rolling the same three calls in each function, pull them
into an i915_gem_object_create_user helper.  Apart from re-ordering the
placements-array ENOMEM check, there should be no functional change.
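
For context, a caller of the new helper ends up looking roughly like this
(illustrative sketch only; the region lookup mirrors i915_gem_dumb_create
below, and i915_gem_publish's signature is assumed from this series):

	struct intel_memory_region *mr =
		intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
	struct drm_i915_gem_object *obj;

	/* one placement (system memory); size is rounded up internally */
	obj = __i915_gem_object_create_user(i915, args->size, &mr, 1);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	/* publish a handle for the new object to userspace */
	return i915_gem_publish(obj, file, &args->size, &args->handle);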

v2 (Matthew Auld):
 - Add the call to i915_gem_flush_free_objects() from
   i915_gem_dumb_create() in a separate patch
 - Move i915_gem_object_alloc() below the simple error checks
v3 (Matthew Auld):
 - Add __ to i915_gem_object_create_user and kerneldoc which warns the
   caller that it's not validating anything.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 119 ++---
 drivers/gpu/drm/i915/gem/i915_gem_object.h |   4 +
 2 files changed, 58 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index adcce37c04b8d..23fee13a33844 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -11,13 +11,14 @@
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 
-static u32 object_max_page_size(struct drm_i915_gem_object *obj)
+static u32 object_max_page_size(struct intel_memory_region **placements,
+   unsigned int n_placements)
 {
u32 max_page_size = 0;
int i;
 
-   for (i = 0; i < obj->mm.n_placements; i++) {
-   struct intel_memory_region *mr = obj->mm.placements[i];
+   for (i = 0; i < n_placements; i++) {
+   struct intel_memory_region *mr = placements[i];
 
GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
max_page_size = max_t(u32, max_page_size, mr->min_page_size);
@@ -81,22 +82,46 @@ static int i915_gem_publish(struct drm_i915_gem_object *obj,
return 0;
 }
 
-static int
-i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
+/**
+ * Creates a new object using the same path as DRM_I915_GEM_CREATE_EXT
+ * @i915: i915 private
+ * @size: size of the buffer, in bytes
+ * @placements: possible placement regions, in priority order
+ * @n_placements: number of possible placement regions
+ *
+ * This function is exposed primarily for selftests and does very little
+ * error checking.  It is assumed that the set of placement regions has
+ * already been verified to be valid.
+ */
+struct drm_i915_gem_object *
+__i915_gem_object_create_user(struct drm_i915_private *i915, u64 size,
+ struct intel_memory_region **placements,
+ unsigned int n_placements)
 {
-   struct intel_memory_region *mr = obj->mm.placements[0];
+   struct intel_memory_region *mr = placements[0];
+   struct drm_i915_gem_object *obj;
unsigned int flags;
int ret;
 
-   size = round_up(size, object_max_page_size(obj));
+   i915_gem_flush_free_objects(i915);
+
+   size = round_up(size, object_max_page_size(placements, n_placements));
if (size == 0)
-   return -EINVAL;
+   return ERR_PTR(-EINVAL);
 
/* For most of the ABI (e.g. mmap) we think in system pages */
GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
 
if (i915_gem_object_size_2big(size))
-   return -E2BIG;
+   return ERR_PTR(-E2BIG);
+
+   obj = i915_gem_object_alloc();
+   if (!obj)
+   return ERR_PTR(-ENOMEM);
+
+   ret = object_set_placements(obj, placements, n_placements);
+   if (ret)
+   goto object_free;
 
/*
 * I915_BO_ALLOC_USER will make sure the object is cleared before
@@ -106,12 +131,18 @@ i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
 
ret = mr->ops->init_object(mr, obj, size, 0, flags);
if (ret)
-   return ret;
+   goto object_free;
 
GEM_BUG_ON(size != obj->base.size);
 
trace_i915_gem_object_create(obj);
-   return 0;
+   return obj;
+
+object_free:
+   if (obj->mm.n_placements > 1)
+   kfree(obj->mm.placements);
+   i915_gem_object_free(obj);
+   return ERR_PTR(ret);
 }
 
 int
@@ -124,7 +155,6 @@ i915_gem_dumb_create(struct drm_file *file,
enum intel_memory_type mem_type;
int cpp = DIV_ROUND_UP(args->bpp, 8);
u32 format;
-   int ret;
 
switch (cpp) {
case 1:
@@ -151,32 +181,19 @@ i915_gem_dumb_create(struct drm_file *file,
if (args->pitch < args->width)
return -EINVAL;
 
-   i915_gem_flush_free_objects(i915);
-
args->size = mul_u32_u32(args->pitch, args->height);
 
mem_type = INTEL_MEMORY_SYSTEM;
if (HAS_LMEM(to_i915(dev)))
mem_type = INTEL_MEMORY_LOCAL;
 
-   obj = i915_gem_object_alloc();
-   if (!obj)
-   return -ENOMEM;
-
mr = intel_memory_region_by_type(to_i915(dev), mem_type);
-   

[Intel-gfx] [PATCH 3/7] drm/i915/gem: Call i915_gem_flush_free_objects() in i915_gem_dumb_create()

2021-07-21 Thread Jason Ekstrand
This doesn't really fix anything serious since the chances of a client
creating and destroying a mass of dumb BOs are pretty low.  However,
i915_gem_flush_free_objects() is called by the other two create IOCTLs
to garbage collect old objects.  Call it here too for consistency.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index aa687b10dcd45..adcce37c04b8d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -151,6 +151,8 @@ i915_gem_dumb_create(struct drm_file *file,
if (args->pitch < args->width)
return -EINVAL;
 
+   i915_gem_flush_free_objects(i915);
+
args->size = mul_u32_u32(args->pitch, args->height);
 
mem_type = INTEL_MEMORY_SYSTEM;
-- 
2.31.1



[Intel-gfx] [PATCH 2/7] drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)

2021-07-21 Thread Jason Ekstrand
Since we don't allow changing the set of regions after creation, we can
make ext_set_placements() build up the region set directly in the
create_ext and assign it to the object later.  This is similar to what
we did for contexts with the proto-context, only simpler because there's
no funny object shuffling.  This will be used in the next patch to allow
us to de-duplicate a bunch of code.  Also, since we know the maximum
number of regions up-front, we can use a fixed-size temporary array for
the regions.  This simplifies memory management a bit for this new
delayed approach.
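
As a sketch of the resulting flow in the create_ext ioctl (assembled from
the hunks below; the extension-handling call is elided and its arguments
are not shown):

	struct create_ext ext_data = { .i915 = i915 };
	int ret;

	/* extensions fill ext_data.placements[] on the stack first */
	ret = i915_user_extensions(/* ... */);
	if (ret)
		return ret;

	obj = i915_gem_object_alloc();
	if (!obj)
		return -ENOMEM;

	/* only now is the validated set copied into the object */
	ret = object_set_placements(obj, ext_data.placements,
				    ext_data.n_placements);
	if (ret)
		goto object_free;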

v2 (Matthew Auld):
 - Get rid of MAX_N_PLACEMENTS
 - Drop kfree(placements) from set_placements()
v3 (Matthew Auld):
 - Properly set ext_data->n_placements

Signed-off-by: Jason Ekstrand 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 82 --
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 51f92e4b1a69d..aa687b10dcd45 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -27,10 +27,13 @@ static u32 object_max_page_size(struct drm_i915_gem_object 
*obj)
return max_page_size;
 }
 
-static void object_set_placements(struct drm_i915_gem_object *obj,
- struct intel_memory_region **placements,
- unsigned int n_placements)
+static int object_set_placements(struct drm_i915_gem_object *obj,
+struct intel_memory_region **placements,
+unsigned int n_placements)
 {
+   struct intel_memory_region **arr;
+   unsigned int i;
+
GEM_BUG_ON(!n_placements);
 
/*
@@ -44,9 +47,20 @@ static void object_set_placements(struct drm_i915_gem_object 
*obj,
obj->mm.placements = &i915->mm.regions[mr->id];
obj->mm.n_placements = 1;
} else {
-   obj->mm.placements = placements;
+   arr = kmalloc_array(n_placements,
+   sizeof(struct intel_memory_region *),
+   GFP_KERNEL);
+   if (!arr)
+   return -ENOMEM;
+
+   for (i = 0; i < n_placements; i++)
+   arr[i] = placements[i];
+
+   obj->mm.placements = arr;
obj->mm.n_placements = n_placements;
}
+
+   return 0;
 }
 
 static int i915_gem_publish(struct drm_i915_gem_object *obj,
@@ -148,7 +162,9 @@ i915_gem_dumb_create(struct drm_file *file,
return -ENOMEM;
 
mr = intel_memory_region_by_type(to_i915(dev), mem_type);
-   object_set_placements(obj, &mr, 1);
+   ret = object_set_placements(obj, &mr, 1);
+   if (ret)
+   goto object_free;
 
ret = i915_gem_setup(obj, args->size);
if (ret)
@@ -184,7 +200,9 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
return -ENOMEM;
 
mr = intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
-   object_set_placements(obj, &mr, 1);
+   ret = object_set_placements(obj, &mr, 1);
+   if (ret)
+   goto object_free;
 
ret = i915_gem_setup(obj, args->size);
if (ret)
@@ -199,7 +217,8 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 
 struct create_ext {
struct drm_i915_private *i915;
-   struct drm_i915_gem_object *vanilla_object;
+   struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
+   unsigned int n_placements;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -230,8 +249,7 @@ static int set_placements(struct 
drm_i915_gem_create_ext_memory_regions *args,
struct drm_i915_private *i915 = ext_data->i915;
struct drm_i915_gem_memory_class_instance __user *uregions =
u64_to_user_ptr(args->regions);
-   struct drm_i915_gem_object *obj = ext_data->vanilla_object;
-   struct intel_memory_region **placements;
+   struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
u32 mask;
int i, ret = 0;
 
@@ -245,6 +263,8 @@ static int set_placements(struct 
drm_i915_gem_create_ext_memory_regions *args,
ret = -EINVAL;
}
 
+   BUILD_BUG_ON(ARRAY_SIZE(i915->mm.regions) != ARRAY_SIZE(placements));
+   BUILD_BUG_ON(ARRAY_SIZE(ext_data->placements) != 
ARRAY_SIZE(placements));
if (args->num_regions > ARRAY_SIZE(i915->mm.regions)) {
drm_dbg(&i915->drm, "num_regions is too large\n");
ret = -EINVAL;
@@ -253,21 +273,13 @@ static int set_placements(struct 
drm_i915_gem_create_ext_memory_regions *args,
if (ret)
return ret;
 
-   placements = kmalloc_array(args->num_regions,
-  sizeof(str

[Intel-gfx] [PATCH 1/7] drm/i915/gem: Check object_can_migrate from object_migrate

2021-07-21 Thread Jason Ekstrand
We don't roll them together entirely because there are still a couple of
cases where we want a separate can_migrate check.  For instance, the
display code checks that you can migrate a buffer to LMEM before it
accepts it in fb_create.  The dma-buf import code also uses it to do an
early check and return a different error code if someone tries to attach
an LMEM-only dma-buf to another driver.

However, no one actually wants to call object_migrate when can_migrate
has failed.  The stated intention is for self-tests, but none of those
actually take advantage of this unsafe migration.
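
To illustrate the split that survives this patch, a caller that wants its
own error code can still do the early check itself (hypothetical sketch,
not code from the patch; ww is the caller's i915_gem_ww_ctx):

	/* dma-buf import: refuse LMEM-only objects with a distinctive errno */
	if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
		return -EOPNOTSUPP;

	/* i915_gem_object_migrate() now re-checks and fails with -EINVAL */
	err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM);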

Signed-off-by: Jason Ekstrand 
Cc: Daniel Vetter 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c| 13 ++---
 .../gpu/drm/i915/gem/selftests/i915_gem_migrate.c | 15 ---
 2 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 9da7b288b7ede..f2244ae09a613 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -584,12 +584,6 @@ bool i915_gem_object_can_migrate(struct 
drm_i915_gem_object *obj,
  * completed yet, and to accomplish that, i915_gem_object_wait_migration()
  * must be called.
  *
- * This function is a bit more permissive than i915_gem_object_can_migrate()
- * to allow for migrating objects where the caller knows exactly what is
- * happening. For example within selftests. More specifically this
- * function allows migrating I915_BO_ALLOC_USER objects to regions
- * that are not in the list of allowable regions.
- *
  * Note: the @ww parameter is not used yet, but included to make sure
  * callers put some effort into obtaining a valid ww ctx if one is
  * available.
@@ -616,11 +610,8 @@ int i915_gem_object_migrate(struct drm_i915_gem_object 
*obj,
if (obj->mm.region == mr)
return 0;
 
-   if (!i915_gem_object_evictable(obj))
-   return -EBUSY;
-
-   if (!obj->ops->migrate)
-   return -EOPNOTSUPP;
+   if (!i915_gem_object_can_migrate(obj, id))
+   return -EINVAL;
 
return obj->ops->migrate(obj, mr);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index 0b7144d2991ca..28a700f08b49a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -61,11 +61,6 @@ static int igt_create_migrate(struct intel_gt *gt, enum 
intel_region_id src,
if (err)
continue;
 
-   if (!i915_gem_object_can_migrate(obj, dst)) {
-   err = -EINVAL;
-   continue;
-   }
-
err = i915_gem_object_migrate(obj, &ww, dst);
if (err)
continue;
@@ -114,11 +109,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx 
*ww,
return err;
 
if (i915_gem_object_is_lmem(obj)) {
-   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM)) {
-   pr_err("object can't migrate to smem.\n");
-   return -EINVAL;
-   }
-
err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
if (err) {
pr_err("Object failed migration to smem\n");
@@ -137,11 +127,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx 
*ww,
}
 
} else {
-   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_LMEM)) {
-   pr_err("object can't migrate to lmem.\n");
-   return -EINVAL;
-   }
-
err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM);
if (err) {
pr_err("Object failed migration to lmem\n");
-- 
2.31.1



[Intel-gfx] [PATCH 0/7] drm/i915: Migrate memory to SMEM when imported cross-device (v8)

2021-07-21 Thread Jason Ekstrand
This patch series fixes an issue with discrete graphics on Intel where we
allowed dma-buf import while leaving the object in local memory.  This
breaks down pretty badly if the import happened on a different physical
device.
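
The heart of the fix (the dma-buf attach patch in this series, whose diff
is not quoted here) is to migrate and pin such objects to system memory
when another device attaches.  Roughly, assuming the final version follows
the names used elsewhere in the series:

	static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
					  struct dma_buf_attachment *attach)
	{
		struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
		struct i915_gem_ww_ctx ww;
		int err;

		/* LMEM-only objects cannot be shared with another device */
		if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
			return -EOPNOTSUPP;

		for_i915_gem_ww(&ww, err, true) {
			err = i915_gem_object_lock(obj, &ww);
			if (err)
				continue;

			err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM);
			if (err)
				continue;

			err = i915_gem_object_wait_migration(obj, 0);
			if (err)
				continue;

			/* pin so it cannot wander back to LMEM while attached */
			err = i915_gem_object_pin_pages(obj);
		}

		return err;
	}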

v7:
 - Drop "drm/i915/gem/ttm: Place new BOs in the requested region"
 - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in 
i915_gem_dumb_create()"
 - Misc. review feedback from Matthew Auld
v8:
 - Misc. review feedback from Matthew Auld

Jason Ekstrand (5):
  drm/i915/gem: Check object_can_migrate from object_migrate
  drm/i915/gem: Refactor placement setup for i915_gem_object_create*
(v2)
  drm/i915/gem: Call i915_gem_flush_free_objects() in
i915_gem_dumb_create()
  drm/i915/gem: Unify user object creation (v3)
  drm/i915/gem/ttm: Respect the objection region in placement_from_obj

Thomas Hellström (2):
  drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
  drm/i915/gem: Migrate to system at dma-buf attach time (v7)

 drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  58 --
 drivers/gpu/drm/i915/gem/i915_gem_object.c|  13 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|   4 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   3 +-
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 190 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  15 --
 7 files changed, 330 insertions(+), 130 deletions(-)

-- 
2.31.1



Re: [Intel-gfx] [PATCH 5/7] drm/i915/gem/ttm: Respect the objection region in placement_from_obj

2021-07-21 Thread Jason Ekstrand
On Mon, Jul 19, 2021 at 8:35 AM Matthew Auld
 wrote:
>
> On Fri, 16 Jul 2021 at 20:49, Jason Ekstrand  wrote:
> >
> > On Fri, Jul 16, 2021 at 1:45 PM Matthew Auld
> >  wrote:
> > >
> > > On Fri, 16 Jul 2021 at 18:39, Jason Ekstrand  wrote:
> > > >
> > > > On Fri, Jul 16, 2021 at 11:00 AM Matthew Auld
> > > >  wrote:
> > > > >
> > > > > On Fri, 16 Jul 2021 at 16:52, Matthew Auld
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, 16 Jul 2021 at 15:10, Jason Ekstrand  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Fri, Jul 16, 2021 at 8:54 AM Matthew Auld
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Thu, 15 Jul 2021 at 23:39, Jason Ekstrand 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Whenever we had a user object (n_placements > 0), we were 
> > > > > > > > > ignoring
> > > > > > > > > obj->mm.region and always putting obj->placements[0] as the 
> > > > > > > > > requested
> > > > > > > > > region.  For LMEM+SMEM objects, this was causing them to get 
> > > > > > > > > shoved into
> > > > > > > > > LMEM on every i915_ttm_get_pages() even when SMEM was 
> > > > > > > > > requested by, say,
> > > > > > > > > i915_gem_object_migrate().
> > > > > > > >
> > > > > > > > i915_ttm_migrate calls i915_ttm_place_from_region() directly 
> > > > > > > > with the
> > > > > > > > requested region, so there shouldn't be an issue with migration 
> > > > > > > > right?
> > > > > > > > Do you have some more details?
> > > > > > >
> > > > > > > With i915_ttm_migrate directly, no.  But, in the last patch in the
> > > > > > > series, we're trying to migrate LMEM+SMEM buffers into SMEM on
> > > > > > > attach() and pin it there.  This blows up in a very unexpected 
> > > > > > > (IMO)
> > > > > > > way.  The flow goes something like this:
> > > > > > >
> > > > > > >  - Client attempts a dma-buf import from another device
> > > > > > >  - In attach() we call i915_gem_object_migrate() which calls
> > > > > > > i915_ttm_migrate() which migrates as requested.
> > > > > > >  - Once the migration is complete, we call 
> > > > > > > i915_gem_object_pin_pages()
> > > > > > > which calls i915_ttm_get_pages() which depends on
> > > > > > > i915_ttm_placement_from_obj() and so migrates it right back to 
> > > > > > > LMEM.
> > > > > >
> > > > > > The mm.pages must be NULL here, otherwise it would just increment 
> > > > > > the
> > > > > > pages_pin_count?
> > > >
> > > > Given that the test is using the four_underscores version, it
> > > > doesn't have that check.  However, this executes after we've done the
> > > > dma-buf import which pinned pages.  So we should definitely have
> > > > pages.
> > >
> > > We shouldn't call four_underscores() if we might already have
> > > pages though. Under non-TTM that would leak the pages, and in TTM we
> > > might hit the WARN_ON(mm->pages) in __i915_ttm_get_pages(), if for
> > > example nothing was moved. I take it we can't just call pin_pages()?
> > > Four scary underscores usually means "don't call this in normal code".
> >
> > I've switched the four_underscores call to a __two_underscores in
> > the selftests and it had no effect, good or bad.  But, still, probably
> > better to call that one.
> >
> > > >
> > > > > > >
> > > > > > > Maybe the problem here is actually that our TTM code isn't 
> > > > > > > respecting
> > > > > > > obj->mm.pages_pin_count?
> > > > > >
> > > > > > I think if the resource is moved, we always nuke the mm.pages after
> > > > > > being notified of the move. Also TTM is also not allowed to move
> > > > > &

Re: [Intel-gfx] [PATCH] drm/i915/gvt: Fix cached atomics setting for Windows VM

2021-07-21 Thread Jason Ekstrand
On Wed, Jul 21, 2021 at 4:08 AM Daniel Vetter  wrote:
>
> On Wed, Jul 21, 2021 at 8:21 AM Zhenyu Wang  wrote:
> > We've seen a recent regression with a host and a Windows VM running
> > simultaneously that causes GPU hangs or even crashes. Bisection finally
> > landed on 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"); the
> > difference in cached-atomics behavior appears to be what caused the
> > regression.
> >
> > This adds a new scratch register handler and adds those registers to the
> > MMIO save/restore list for context switches. No GPU hang was produced
> > with this change.
> >
> > Cc: sta...@vger.kernel.org # 5.12+
> > Cc: "Xu, Terrence" 
> > Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9")
> > Signed-off-by: Zhenyu Wang 
>
> Adding Jon Bloomfield, since different settings between linux and
> windows for something that can hard-hang the machine on gen9 sounds
> ... not good.

The difference there is legit and intentional.

As far as what we do about it for GVT, if we can safely smash L3
atomics off underneath Windows without causing problems for the VM, we
should do that.  If not, we need to discuss this internally before
proceeding.

--Jason

> -Daniel
>
> > ---
> >  drivers/gpu/drm/i915/gvt/handlers.c | 1 +
> >  drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++
> >  2 files changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gvt/handlers.c 
> > b/drivers/gpu/drm/i915/gvt/handlers.c
> > index 98eb48c24c46..345b4be5ebad 100644
> > --- a/drivers/gpu/drm/i915/gvt/handlers.c
> > +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> > @@ -3134,6 +3134,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt)
> > MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL);
> > MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL);
> > MMIO_D(_MMIO(0xb110), D_BDW);
> > +   MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
> >
> > MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0,
> > D_BDW_PLUS, NULL, force_nonpriv_write);
> > diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c 
> > b/drivers/gpu/drm/i915/gvt/mmio_context.c
> > index b8ac80765461..f776c470914d 100644
> > --- a/drivers/gpu/drm/i915/gvt/mmio_context.c
> > +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
> > @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] 
> > __cacheline_aligned = {
> > {RCS0, COMMON_SLICE_CHICKEN2, 0x, true}, /* 0x7014 */
> > {RCS0, GEN9_CS_DEBUG_MODE1, 0x, false}, /* 0x20ec */
> > {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
> > +   {RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
> > +   {RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */
> > {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0x, true}, /* 0xe100 */
> > {RCS0, HALF_SLICE_CHICKEN2, 0x, true}, /* 0xe180 */
> > {RCS0, HALF_SLICE_CHICKEN3, 0x, true}, /* 0xe184 */
> > --
> > 2.32.0.rc2
> >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 2/7] drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)

2021-07-21 Thread Jason Ekstrand
On Wed, Jul 21, 2021 at 3:32 AM Matthew Auld
 wrote:
>
> On Tue, 20 Jul 2021 at 23:07, Jason Ekstrand  wrote:
> >
> > On Mon, Jul 19, 2021 at 3:18 AM Matthew Auld
> >  wrote:
> > >
> > > On Fri, 16 Jul 2021 at 15:14, Jason Ekstrand  wrote:
> > > >
> > > > Since we don't allow changing the set of regions after creation, we can
> > > > make ext_set_placements() build up the region set directly in the
> > > > create_ext and assign it to the object later.  This is similar to what
> > > > we did for contexts with the proto-context only simpler because there's
> > > > no funny object shuffling.  This will be used in the next patch to allow
> > > > us to de-duplicate a bunch of code.  Also, since we know the maximum
> > > > number of regions up-front, we can use a fixed-size temporary array for
> > > > the regions.  This simplifies memory management a bit for this new
> > > > delayed approach.
> > > >
> > > > v2 (Matthew Auld):
> > > >  - Get rid of MAX_N_PLACEMENTS
> > > >  - Drop kfree(placements) from set_placements()
> > > >
> > > > Signed-off-by: Jason Ekstrand 
> > > > Cc: Matthew Auld 
> > > > ---
> > > >  drivers/gpu/drm/i915/gem/i915_gem_create.c | 81 --
> > > >  1 file changed, 45 insertions(+), 36 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> > > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > > index 51f92e4b1a69d..5766749a449c0 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > > @@ -27,10 +27,13 @@ static u32 object_max_page_size(struct 
> > > > drm_i915_gem_object *obj)
> > > > return max_page_size;
> > > >  }
> > > >
> > > > -static void object_set_placements(struct drm_i915_gem_object *obj,
> > > > - struct intel_memory_region 
> > > > **placements,
> > > > - unsigned int n_placements)
> > > > +static int object_set_placements(struct drm_i915_gem_object *obj,
> > > > +struct intel_memory_region 
> > > > **placements,
> > > > +unsigned int n_placements)
> > > >  {
> > > > +   struct intel_memory_region **arr;
> > > > +   unsigned int i;
> > > > +
> > > > GEM_BUG_ON(!n_placements);
> > > >
> > > > /*
> > > > @@ -44,9 +47,20 @@ static void object_set_placements(struct 
> > > > drm_i915_gem_object *obj,
> > > > obj->mm.placements = &i915->mm.regions[mr->id];
> > > > obj->mm.n_placements = 1;
> > > > } else {
> > > > -   obj->mm.placements = placements;
> > > > +   arr = kmalloc_array(n_placements,
> > > > +   sizeof(struct intel_memory_region 
> > > > *),
> > > > +   GFP_KERNEL);
> > > > +   if (!arr)
> > > > +   return -ENOMEM;
> > > > +
> > > > +   for (i = 0; i < n_placements; i++)
> > > > +   arr[i] = placements[i];
> > > > +
> > > > +   obj->mm.placements = arr;
> > > > obj->mm.n_placements = n_placements;
> > > > }
> > > > +
> > > > +   return 0;
> > > >  }
> > > >
> > > >  static int i915_gem_publish(struct drm_i915_gem_object *obj,
> > > > @@ -148,7 +162,9 @@ i915_gem_dumb_create(struct drm_file *file,
> > > > return -ENOMEM;
> > > >
> > > > mr = intel_memory_region_by_type(to_i915(dev), mem_type);
> > > > -   object_set_placements(obj, &mr, 1);
> > > > +   ret = object_set_placements(obj, &mr, 1);
> > > > +   if (ret)
> > > > +   goto object_free;
> > > >
> > > > ret = i915_gem_setup(obj, args->size);
> > > > if (ret)
> > > > @@ -184,7 +200,9 @@ i915_gem_create_ioctl(struct drm_device *dev, void 
> > > > *data,
> > > > return -ENOMEM;
>

Re: [Intel-gfx] [PATCH 3/7] drm/i915/gem: Unify user object creation

2021-07-21 Thread Jason Ekstrand
On Wed, Jul 21, 2021 at 3:25 AM Matthew Auld
 wrote:
>
> On Tue, 20 Jul 2021 at 23:04, Jason Ekstrand  wrote:
> >
> > On Tue, Jul 20, 2021 at 4:35 AM Matthew Auld
> >  wrote:
> > >
> > > On Thu, 15 Jul 2021 at 23:39, Jason Ekstrand  wrote:
> > > >
> > > > Instead of hand-rolling the same three calls in each function, pull them
> > > > into an i915_gem_object_create_user helper.  Apart from re-ordering of
> > > > the placements array ENOMEM check, the only functional change here
> > > > should be that i915_gem_dumb_create now calls 
> > > > i915_gem_flush_free_objects
> > > > which it probably should have been calling all along.
> > > >
> > > > Signed-off-by: Jason Ekstrand 
> > > > ---
> > > >  drivers/gpu/drm/i915/gem/i915_gem_create.c | 106 +
> > > >  1 file changed, 43 insertions(+), 63 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> > > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > > index 391c8c4a12172..69bf9ec777642 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > > > @@ -11,13 +11,14 @@
> > > >  #include "i915_trace.h"
> > > >  #include "i915_user_extensions.h"
> > > >
> > > > -static u32 object_max_page_size(struct drm_i915_gem_object *obj)
> > > > +static u32 object_max_page_size(struct intel_memory_region 
> > > > **placements,
> > > > +   unsigned int n_placements)
> > > >  {
> > > > u32 max_page_size = 0;
> > > > int i;
> > > >
> > > > -   for (i = 0; i < obj->mm.n_placements; i++) {
> > > > -   struct intel_memory_region *mr = obj->mm.placements[i];
> > > > +   for (i = 0; i < n_placements; i++) {
> > > > +   struct intel_memory_region *mr = placements[i];
> > > >
> > > > GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
> > > > max_page_size = max_t(u32, max_page_size, 
> > > > mr->min_page_size);
> > > > @@ -81,22 +82,35 @@ static int i915_gem_publish(struct 
> > > > drm_i915_gem_object *obj,
> > > > return 0;
> > > >  }
> > > >
> > > > -static int
> > > > -i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
> > > > +static struct drm_i915_gem_object *
> > > > +i915_gem_object_create_user(struct drm_i915_private *i915, u64 size,
> > > > +   struct intel_memory_region **placements,
> > > > +   unsigned int n_placements)
> > > >  {
> > > > -   struct intel_memory_region *mr = obj->mm.placements[0];
> > > > +   struct intel_memory_region *mr = placements[0];
> > > > +   struct drm_i915_gem_object *obj;
> > > > unsigned int flags;
> > > > int ret;
> > > >
> > > > -   size = round_up(size, object_max_page_size(obj));
> > > > +   i915_gem_flush_free_objects(i915);
> > > > +
> > > > +   obj = i915_gem_object_alloc();
> > > > +   if (!obj)
> > > > +   return ERR_PTR(-ENOMEM);
> > > > +
> > > > +   size = round_up(size, object_max_page_size(placements, 
> > > > n_placements));
> > > > if (size == 0)
> > > > -   return -EINVAL;
> > > > +   return ERR_PTR(-EINVAL);
> > > >
> > > > /* For most of the ABI (e.g. mmap) we think in system pages */
> > > > GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
> > > >
> > > > if (i915_gem_object_size_2big(size))
> > > > -   return -E2BIG;
> > > > +   return ERR_PTR(-E2BIG);
> > > > +
> > > > +   ret = object_set_placements(obj, placements, n_placements);
> > > > +   if (ret)
> > > > +   goto object_free;
> > >
> > > Thinking on this again, it might be way too thorny to expose
> > > create_user as-is to other parts of i915, like we do in the last
> > > patch. Since the caller will be expected to manually validate the
> > > placements, otherwise we migh

Re: [Intel-gfx] [PATCH] drm/i915: Correct the docs for intel_engine_cmd_parser

2021-07-21 Thread Jason Ekstrand
Would you mind pushing?  I still don't have those magic powers. :-)

--Jason

On Wed, Jul 21, 2021 at 5:05 AM Rodrigo Vivi  wrote:
>
> On Tue, Jul 20, 2021 at 04:04:59PM -0500, Jason Ekstrand wrote:
> > On Tue, Jul 20, 2021 at 3:26 PM Rodrigo Vivi  wrote:
> > >
> > > On Tue, Jul 20, 2021 at 04:25:21PM -0400, Rodrigo Vivi wrote:
> > > > On Tue, Jul 20, 2021 at 01:21:08PM -0500, Jason Ekstrand wrote:
> > > > > In c9d9fdbc108a ("drm/i915: Revert "drm/i915/gem: Asynchronous
> > > > > cmdparser""), the parameters to intel_engine_cmd_parser() were altered
> > > > > without updating the docs, causing Fi.CI.DOCS to start failing.
> > > > >
> > > > > Signed-off-by: Jason Ekstrand 
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_cmd_parser.c | 4 +---
> > > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
> > > > > b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > > > index 322f4d5955a4f..e0403ce9ce692 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > > > @@ -1416,9 +1416,7 @@ static unsigned long *alloc_whitelist(u32 
> > > > > batch_length)
> > > > >   * @batch_offset: byte offset in the batch at which execution starts
> > > > >   * @batch_length: length of the commands in batch_obj
> > > > >   * @shadow: validated copy of the batch buffer in question
> > > > > - * @jump_whitelist: buffer preallocated with 
> > > > > intel_engine_cmd_parser_alloc_jump_whitelist()
> > > > > - * @shadow_map: mapping to @shadow vma
> > > > > - * @batch_map: mapping to @batch vma
> > > > > + * @trampoline: true if we need to trampoline into privileged 
> > > > > execution
> > > >
> > > > I was wondering if we should also return the original text, but this one
> > > > here looks better.
> > > >
> > > >
> > > > Reviewed-by: Rodrigo Vivi 
> > >
> > > btw, while on it, I wouldn't mind if you squash some english fixes to
> > > the trampoline documentation block inside this function ;)
> >
> > I don't mind at all but I'm not sure what changes you're suggesting.
>
> nevermind...
> It was just my broke english that didn't know the inversion on the "only if"
>
>
> >
> > > >
> > > >
> > > > >   *
> > > > >   * Parses the specified batch buffer looking for privilege 
> > > > > violations as
> > > > >   * described in the overview.
> > > > > --
> > > > > 2.31.1
> > > > >


[Intel-gfx] [PATCH 6/6] drm/i915: Make the kmem slab for i915_buddy_block a global

2021-07-21 Thread Jason Ekstrand
There's no reason, as far as I can tell, for this to be per-i915_buddy_mm,
and doing so causes KMEM_CACHE to throw dmesg warnings because it tries
to create a debugfs entry with the name i915_buddy_block multiple times.
We could handle this by carefully giving each slab its own name but that
brings its own pain because then we have to store that string somewhere
and manage the lifetimes of the different slabs.  The most likely
outcome would be a global atomic which we increment to get a new name or
something like that.

The much easier solution is to use the i915_globals system like we do
for every other slab in i915.  This ensures that we have exactly one of
them for each i915 driver load and it gets neatly created on module load
and destroyed on module unload.  Using the globals system also means
that it's now tied into the shrink handler so we can properly respond to
low-memory situations.

Signed-off-by: Jason Ekstrand 
Fixes: 88be9a0a06b7 ("drm/i915/ttm: add ttm_buddy_man")
Cc: Matthew Auld 
Cc: Christian König 
---
 drivers/gpu/drm/i915/i915_buddy.c   | 44 ++---
 drivers/gpu/drm/i915/i915_buddy.h   |  3 +-
 drivers/gpu/drm/i915/i915_globals.c |  2 ++
 3 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_buddy.c 
b/drivers/gpu/drm/i915/i915_buddy.c
index 29dd7d0310c1f..911feedad4513 100644
--- a/drivers/gpu/drm/i915/i915_buddy.c
+++ b/drivers/gpu/drm/i915/i915_buddy.c
@@ -8,8 +8,14 @@
 #include "i915_buddy.h"
 
 #include "i915_gem.h"
+#include "i915_globals.h"
 #include "i915_utils.h"
 
+static struct i915_global_buddy {
+   struct i915_global base;
+   struct kmem_cache *slab_blocks;
+} global;
+
 static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
 struct i915_buddy_block 
*parent,
 unsigned int order,
@@ -19,7 +25,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
i915_buddy_mm *mm,
 
GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER);
 
-   block = kmem_cache_zalloc(mm->slab_blocks, GFP_KERNEL);
+   block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL);
if (!block)
return NULL;
 
@@ -34,7 +40,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
i915_buddy_mm *mm,
 static void i915_block_free(struct i915_buddy_mm *mm,
struct i915_buddy_block *block)
 {
-   kmem_cache_free(mm->slab_blocks, block);
+   kmem_cache_free(global.slab_blocks, block);
 }
 
 static void mark_allocated(struct i915_buddy_block *block)
@@ -85,15 +91,11 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 
chunk_size)
 
GEM_BUG_ON(mm->max_order > I915_BUDDY_MAX_ORDER);
 
-   mm->slab_blocks = KMEM_CACHE(i915_buddy_block, SLAB_HWCACHE_ALIGN);
-   if (!mm->slab_blocks)
-   return -ENOMEM;
-
mm->free_list = kmalloc_array(mm->max_order + 1,
  sizeof(struct list_head),
  GFP_KERNEL);
if (!mm->free_list)
-   goto out_destroy_slab;
+   return -ENOMEM;
 
for (i = 0; i <= mm->max_order; ++i)
INIT_LIST_HEAD(&mm->free_list[i]);
@@ -145,8 +147,6 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 
chunk_size)
kfree(mm->roots);
 out_free_list:
kfree(mm->free_list);
-out_destroy_slab:
-   kmem_cache_destroy(mm->slab_blocks);
return -ENOMEM;
 }
 
@@ -161,7 +161,6 @@ void i915_buddy_fini(struct i915_buddy_mm *mm)
 
kfree(mm->roots);
kfree(mm->free_list);
-   kmem_cache_destroy(mm->slab_blocks);
 }
 
 static int split_block(struct i915_buddy_mm *mm,
@@ -410,3 +409,28 @@ int i915_buddy_alloc_range(struct i915_buddy_mm *mm,
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_buddy.c"
 #endif
+
+static void i915_global_buddy_shrink(void)
+{
+   kmem_cache_shrink(global.slab_blocks);
+}
+
+static void i915_global_buddy_exit(void)
+{
+   kmem_cache_destroy(global.slab_blocks);
+}
+
+static struct i915_global_buddy global = { {
+   .shrink = i915_global_buddy_shrink,
+   .exit = i915_global_buddy_exit,
+} };
+
+int __init i915_global_buddy_init(void)
+{
+   global.slab_blocks = KMEM_CACHE(i915_buddy_block, 0);
+   if (!global.slab_blocks)
+   return -ENOMEM;
+
+   i915_global_register(&global.base);
+   return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_buddy.h 
b/drivers/gpu/drm/i915/i915_buddy.h
index 37f8c42071d12..d8f26706de52f 100644
--- a/drivers/gpu/drm/i915/i915_buddy.h
+++ b/drivers/gpu/drm/i915/i915_buddy.h
@@ -47,7 +47,6 @@ struct i915_buddy_block {
  * i915_buddy_alloc* and i915_buddy_free* should suffice.
  */
 struct i915_buddy_mm {
-   struct kmem_cache *slab_block

[Intel-gfx] [PATCH 5/6] drm/ttm: Initialize debugfs from ttm_global_init()

2021-07-21 Thread Jason Ekstrand
We create a bunch of debugfs entries as a side-effect of
ttm_global_init() and then never clean them up.  This isn't usually a
problem because we free the whole debugfs directory on module unload.
However, if the global reference count ever goes to zero and then
ttm_global_init() is called again, we'll re-create those debugfs entries
and debugfs will complain in dmesg that we're creating entries that
already exist.  This patch fixes this problem by changing the lifetime
of the whole TTM debugfs directory to match that of the TTM global
state.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/ttm/ttm_device.c | 12 
 drivers/gpu/drm/ttm/ttm_module.c | 16 
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 519deea8e39b7..74e3b460132b3 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -44,6 +44,8 @@ static unsigned ttm_glob_use_count;
 struct ttm_global ttm_glob;
 EXPORT_SYMBOL(ttm_glob);
 
+struct dentry *ttm_debugfs_root;
+
 static void ttm_global_release(void)
 {
struct ttm_global *glob = &ttm_glob;
@@ -53,6 +55,7 @@ static void ttm_global_release(void)
goto out;
 
ttm_pool_mgr_fini();
+   debugfs_remove(ttm_debugfs_root);
 
__free_page(glob->dummy_read_page);
memset(glob, 0, sizeof(*glob));
@@ -73,6 +76,13 @@ static int ttm_global_init(void)
 
si_meminfo(&si);
 
+   ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
+   if (IS_ERR(ttm_debugfs_root)) {
+   ret = PTR_ERR(ttm_debugfs_root);
+   ttm_debugfs_root = NULL;
+   goto out;
+   }
+
/* Limit the number of pages in the pool to about 50% of the total
 * system memory.
 */
@@ -100,6 +110,8 @@ static int ttm_global_init(void)
debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
&glob->bo_count);
 out:
+   if (ret && ttm_debugfs_root)
+   debugfs_remove(ttm_debugfs_root);
if (ret)
--ttm_glob_use_count;
mutex_unlock(&ttm_global_mutex);
diff --git a/drivers/gpu/drm/ttm/ttm_module.c b/drivers/gpu/drm/ttm/ttm_module.c
index 997c458f68a9a..7fcdef278c742 100644
--- a/drivers/gpu/drm/ttm/ttm_module.c
+++ b/drivers/gpu/drm/ttm/ttm_module.c
@@ -72,22 +72,6 @@ pgprot_t ttm_prot_from_caching(enum ttm_caching caching, 
pgprot_t tmp)
return tmp;
 }
 
-struct dentry *ttm_debugfs_root;
-
-static int __init ttm_init(void)
-{
-   ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
-   return 0;
-}
-
-static void __exit ttm_exit(void)
-{
-   debugfs_remove(ttm_debugfs_root);
-}
-
-module_init(ttm_init);
-module_exit(ttm_exit);
-
 MODULE_AUTHOR("Thomas Hellstrom, Jerome Glisse");
 MODULE_DESCRIPTION("TTM memory manager subsystem (for DRM device)");
 MODULE_LICENSE("GPL and additional rights");
-- 
2.31.1



[Intel-gfx] [PATCH 4/6] drm/ttm: Force re-init if ttm_global_init() fails

2021-07-21 Thread Jason Ekstrand
If we have a failure, decrement the reference count so that the next
call to ttm_global_init() will actually do something instead of assuming
everything is already set up.

Signed-off-by: Jason Ekstrand 
Fixes: 62b53b37e4b1 ("drm/ttm: use a static ttm_bo_global instance")
Reviewed-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 5f31acec3ad76..519deea8e39b7 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -100,6 +100,8 @@ static int ttm_global_init(void)
debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
&glob->bo_count);
 out:
+   if (ret)
+   --ttm_glob_use_count;
mutex_unlock(&ttm_global_mutex);
return ret;
 }
-- 
2.31.1



[Intel-gfx] [PATCH 3/6] drm/i915: Use a table for i915_init/exit (v2)

2021-07-21 Thread Jason Ekstrand
If the driver was not fully loaded, we may still have globals lying
around.  If we don't tear those down in i915_exit(), we'll leak a bunch
of memory slabs.  This can happen two ways: use_kms = false and if we've
run mock selftests.  In either case, we have an early exit from
i915_init which happens after i915_globals_init() and we need to clean
up those globals.

The mock selftests case is especially sticky.  The load isn't entirely
a no-op.  We actually do quite a bit inside those selftests including
allocating a bunch of mock objects and running tests on them.  Once all
those tests are complete, we exit early from i915_init().  Previously,
i915_init() would return a non-zero error code on failure and a zero
error code on success.  In the success case, we would get to i915_exit()
and check i915_pci_driver.driver.owner to detect if i915_init exited early
and do nothing.  In the failure case, we would fail i915_init() but
there would be no opportunity to clean up globals.

The most annoying part is that you don't actually notice the failure as
part of the self-tests since leaking a bit of memory, while bad, doesn't
result in anything observable from userspace.  Instead, the next time we
load the driver (usually for next IGT test), i915_globals_init() gets
invoked again, we go to allocate a bunch of new memory slabs; those
implicitly create debugfs entries, and debugfs warns that we're trying
to create directories and files that already exist.  Since this all
happens as part of the next driver load, it shows up in the dmesg-warn
of whatever IGT test ran after the mock selftests.

While the obvious thing to do here might be to call i915_globals_exit()
after selftests, that's not actually safe.  The dma-buf selftests call
i915_gem_prime_export which creates a file.  We call dma_buf_put() on
the resulting dmabuf which calls fput() on the file.  However, fput()
isn't immediate and gets flushed right before syscall returns.  This
means that all the fput()s from the selftests don't happen until right
before the module load syscall used to fire off the selftests returns,
which is after i915_init().  If we call i915_globals_exit() in
i915_init() after selftests, we end up freeing slabs out from under
objects which won't get released until fput() is flushed at the end of
the module load syscall.

The solution here is to let i915_init() return success early and detect
the early success in i915_exit() and only tear down globals and nothing
else.  This way the module loads successfully, regardless of the success
or failure of the tests.  Because we've not enumerated any PCI devices,
no device nodes are created and it's entirely useless from userspace.
The only thing the module does at that point is hold on to a bit of
memory until we unload it and i915_exit() is called.  Importantly, this
means that everything from our selftests has the ability to properly
flush out between i915_init() and i915_exit() because there is at least
one syscall boundary in between.

In order to handle all the delicate init/exit cases, we convert the
whole thing to a table of init/exit pairs and track the init status in
the new init_progress global.  This allows us to ensure that i915_exit()
always tears down exactly the things that i915_init() successfully
initialized.  We also allow early-exit of i915_init() without failure by
an init function returning > 0.  This is useful for nomodeset and
selftests.  For the mock selftests, we convert them to always return 1
so we get the desired behavior: the module always loads successfully and
then properly tears down the partially loaded driver.
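
In code, the table pattern reads roughly like this (condensed sketch; the
init/exit function names are assumed except i915_check_nomodeset, which
appears in the diff below):

	static const struct {
		int (*init)(void); /* < 0: fail; 0: ok; > 0: ok, but stop early */
		void (*exit)(void);
	} init_funcs[] = {
		{ i915_check_nomodeset, NULL },
		{ i915_globals_init, i915_globals_exit },
		{ i915_mock_selftests, NULL },
		{ i915_pmu_init, i915_pmu_exit },
		{ i915_register_pci_driver, i915_unregister_pci_driver },
	};
	static int init_progress; /* how many init() calls have succeeded */

	static void __exit i915_exit(void)
	{
		int i;

		/* unwind exactly what i915_init() managed to set up */
		for (i = init_progress - 1; i >= 0; i--) {
			GEM_BUG_ON(i >= ARRAY_SIZE(init_funcs));
			if (init_funcs[i].exit)
				init_funcs[i].exit();
		}
	}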

v2 (Tvrtko Ursulin):
 - Guard init_funcs[i].exit with GEM_BUG_ON(i >= ARRAY_SIZE(init_funcs))
v2 (Daniel Vetter):
 - Update the docstring for i915.mock_selftests

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_pci.c   | 105 --
 drivers/gpu/drm/i915/i915_perf.c  |   3 +-
 drivers/gpu/drm/i915/i915_perf.h  |   2 +-
 drivers/gpu/drm/i915/i915_pmu.c   |   4 +-
 drivers/gpu/drm/i915/i915_pmu.h   |   4 +-
 .../gpu/drm/i915/selftests/i915_selftest.c|   4 +-
 6 files changed, 82 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 4e627b57d31a2..5f05cb1b5ac6b 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1185,27 +1185,9 @@ static void i915_pci_shutdown(struct pci_dev *pdev)
i915_driver_shutdown(i915);
 }
 
-static struct pci_driver i915_pci_driver = {
-   .name = DRIVER_NAME,
-   .id_table = pciidlist,
-   .probe = i915_pci_probe,
-   .remove = i915_pci_remove,
-   .shutdown = i915_pci_shutdown,
.driver.pm = &i915_pm_ops,
-};
-
-static int __init i915_init(void)
+static int i915_check_nomodeset(void)
 {
bool use_kms = true;
-   int err;
-
-

[Intel-gfx] [PATCH 2/6] drm/i915: Call i915_globals_exit() if pci_register_device() fails

2021-07-21 Thread Jason Ekstrand
In the unlikely event that pci_register_driver() fails, we were tearing
down our PMU setup but not globals.  This leaves a bunch of memory slabs
lying around.

Signed-off-by: Jason Ekstrand 
Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global")
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_globals.c | 4 ++--
 drivers/gpu/drm/i915/i915_pci.c | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_globals.c 
b/drivers/gpu/drm/i915/i915_globals.c
index 77f1911c463b8..87267e1d2ad92 100644
--- a/drivers/gpu/drm/i915/i915_globals.c
+++ b/drivers/gpu/drm/i915/i915_globals.c
@@ -138,7 +138,7 @@ void i915_globals_unpark(void)
atomic_inc(&active);
 }
 
-static void __exit __i915_globals_flush(void)
+static void __i915_globals_flush(void)
 {
atomic_inc(&active); /* skip shrinking */
 
@@ -148,7 +148,7 @@ static void __exit __i915_globals_flush(void)
atomic_dec(&active);
 }
 
-void __exit i915_globals_exit(void)
+void i915_globals_exit(void)
 {
GEM_BUG_ON(atomic_read(&active));
 
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 50ed93b03e582..4e627b57d31a2 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1230,6 +1230,7 @@ static int __init i915_init(void)
err = pci_register_driver(&i915_pci_driver);
if (err) {
i915_pmu_exit();
+   i915_globals_exit();
return err;
}
 
-- 
2.31.1



[Intel-gfx] [PATCH 0/6] Fix the debugfs splat from mock selftests (v3)

2021-07-21 Thread Jason Ekstrand
This patch series fixes a miscellaneous collection of bugs that all add up
to all our mock selftests throwing dmesg warnings in CI.  As can be seen
from "drm/i915: Use a table for i915_init/exit", it's especially fun since
those warnings don't always show up in the selftests but can show up in
other random IGTs depending on test execution order.

Jason Ekstrand (6):
  drm/i915: Call i915_globals_exit() after i915_pmu_exit()
  drm/i915: Call i915_globals_exit() if pci_register_device() fails
  drm/i915: Use a table for i915_init/exit (v2)
  drm/ttm: Force re-init if ttm_global_init() fails
  drm/ttm: Initialize debugfs from ttm_global_init()
  drm/i915: Make the kmem slab for i915_buddy_block a global

 drivers/gpu/drm/i915/i915_buddy.c |  44 ++--
 drivers/gpu/drm/i915/i915_buddy.h |   3 +-
 drivers/gpu/drm/i915/i915_globals.c   |   6 +-
 drivers/gpu/drm/i915/i915_pci.c   | 104 --
 drivers/gpu/drm/i915/i915_perf.c  |   3 +-
 drivers/gpu/drm/i915/i915_perf.h  |   2 +-
 drivers/gpu/drm/i915/i915_pmu.c   |   4 +-
 drivers/gpu/drm/i915/i915_pmu.h   |   4 +-
 .../gpu/drm/i915/selftests/i915_selftest.c|   4 +-
 drivers/gpu/drm/ttm/ttm_device.c  |  14 +++
 drivers/gpu/drm/ttm/ttm_module.c  |  16 ---
 11 files changed, 136 insertions(+), 68 deletions(-)

-- 
2.31.1



[Intel-gfx] [PATCH 1/6] drm/i915: Call i915_globals_exit() after i915_pmu_exit()

2021-07-21 Thread Jason Ekstrand
We should tear down in the opposite order we set up.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 67696d7522718..50ed93b03e582 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1244,8 +1244,8 @@ static void __exit i915_exit(void)
 
i915_perf_sysctl_unregister();
pci_unregister_driver(&i915_pci_driver);
-   i915_globals_exit();
i915_pmu_exit();
+   i915_globals_exit();
 }
 
 module_init(i915_init);
-- 
2.31.1



Re: [Intel-gfx] [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit()

2021-07-21 Thread Jason Ekstrand
On Wed, Jul 21, 2021 at 6:26 AM Daniel Vetter  wrote:
>
> On Tue, Jul 20, 2021 at 09:55:22AM -0500, Jason Ekstrand wrote:
> > On Tue, Jul 20, 2021 at 9:18 AM Daniel Vetter  wrote:
> > >
> > > On Mon, Jul 19, 2021 at 01:30:44PM -0500, Jason Ekstrand wrote:
> > > > If the driver was not fully loaded, we may still have globals lying
> > > > around.  If we don't tear those down in i915_exit(), we'll leak a bunch
> > > > of memory slabs.  This can happen two ways: use_kms = false and if we've
> > > > run mock selftests.  In either case, we have an early exit from
> > > > i915_init which happens after i915_globals_init() and we need to clean
> > > > up those globals.  While we're here, add an explicit boolean instead of
> > > > using a random field from i915_pci_device to detect partial loads.
> > > >
> > > > The mock selftests case gets especially sticky.  The load isn't entirely
> > > > a no-op.  We actually do quite a bit inside those selftests including
> > > > allocating a bunch of mock objects and running tests on them.  Once all
> > > > those tests are complete, we exit early from i915_init().  Previously,
> > > > i915_init() would return a non-zero error code on failure and a zero
> > > > error code on success.  In the success case, we would get to i915_exit()
> > > > and check i915_pci_driver.driver.owner to detect if i915_init exited 
> > > > early
> > > > and do nothing.  In the failure case, we would fail i915_init() but
> > > > there would be no opportunity to clean up globals.
> > > >
> > > > The most annoying part is that you don't actually notice the failure as
> > > > part of the self-tests since leaking a bit of memory, while bad, doesn't
> > > > result in anything observable from userspace.  Instead, the next time we
> > > > load the driver (usually for next IGT test), i915_globals_init() gets
> > > > invoked again, we go to allocate a bunch of new memory slabs, those
> > > > implicitly create debugfs entries, and debugfs warns that we're trying
> > > > to create directories and files that already exist.  Since this all
> > > > happens as part of the next driver load, it shows up in the dmesg-warn
> > > > of whatever IGT test ran after the mock selftests.
> > > >
> > > > While the obvious thing to do here might be to call i915_globals_exit()
> > > > after selftests, that's not actually safe.  The dma-buf selftests call
> > > > i915_gem_prime_export which creates a file.  We call dma_buf_put() on
> > > > the resulting dmabuf which calls fput() on the file.  However, fput()
> > > > isn't immediate and gets flushed right before syscall returns.  This
> > > > means that all the fput()s from the selftests don't happen until right
> > > > before the module load syscall used to fire off the selftests returns
> > > > which is after i915_init().  If we call i915_globals_exit() in
> > > > i915_init() after selftests, we end up freeing slabs out from under
> > > > objects which won't get released until fput() is flushed at the end of
> > > > the module load.
> > > >
> > > > The solution here is to let i915_init() return success early and detect
> > > > the early success in i915_exit() and only tear down globals and nothing
> > > > else.  This way the module loads successfully, regardless of the success
> > > > or failure of the tests.  Because we've not enumerated any PCI devices,
> > > > no device nodes are created and it's entirely useless from userspace.
> > > > The only thing the module does at that point is hold on to a bit of
> > > > memory until we unload it and i915_exit() is called.  Importantly, this
> > > > means that everything from our selftests has the ability to properly
> > > > flush out between i915_init() and i915_exit() because there are a couple
> > > > syscall boundaries in between.
> > > >
> > > > Signed-off-by: Jason Ekstrand 
> > > > Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global")
> > > > Cc: Daniel Vetter 
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_pci.c | 32 +---
> > > >  1 file changed, 25 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/i915_pci.c 
> > > > b/drivers/gpu/drm/i915/i915_pci.c
> > > > index 4e627b57d31a2..2

Re: [Intel-gfx] [PATCH 3/6] drm/i915: Use a table for i915_init/exit

2021-07-21 Thread Jason Ekstrand
On Wed, Jul 21, 2021 at 4:06 AM Tvrtko Ursulin
 wrote:
>
>
> On 20/07/2021 19:13, Jason Ekstrand wrote:
> > If the driver was not fully loaded, we may still have globals lying
> > around.  If we don't tear those down in i915_exit(), we'll leak a bunch
> > of memory slabs.  This can happen two ways: use_kms = false and if we've
> > run mock selftests.  In either case, we have an early exit from
> > i915_init which happens after i915_globals_init() and we need to clean
> > up those globals.
> >
> > The mock selftests case is especially sticky.  The load isn't entirely
> > a no-op.  We actually do quite a bit inside those selftests including
> > allocating a bunch of mock objects and running tests on them.  Once all
> > those tests are complete, we exit early from i915_init().  Previously,
> > i915_init() would return a non-zero error code on failure and a zero
> > error code on success.  In the success case, we would get to i915_exit()
> > and check i915_pci_driver.driver.owner to detect if i915_init exited early
> > and do nothing.  In the failure case, we would fail i915_init() but
> > there would be no opportunity to clean up globals.
> >
> > The most annoying part is that you don't actually notice the failure as
> > part of the self-tests since leaking a bit of memory, while bad, doesn't
> > result in anything observable from userspace.  Instead, the next time we
> > load the driver (usually for next IGT test), i915_globals_init() gets
> > invoked again, we go to allocate a bunch of new memory slabs, those
> > implicitly create debugfs entries, and debugfs warns that we're trying
> > to create directories and files that already exist.  Since this all
> > happens as part of the next driver load, it shows up in the dmesg-warn
> > of whatever IGT test ran after the mock selftests.
> >
> > While the obvious thing to do here might be to call i915_globals_exit()
> > after selftests, that's not actually safe.  The dma-buf selftests call
> > i915_gem_prime_export which creates a file.  We call dma_buf_put() on
> > the resulting dmabuf which calls fput() on the file.  However, fput()
> > isn't immediate and gets flushed right before syscall returns.  This
> > means that all the fput()s from the selftests don't happen until right
> > before the module load syscall used to fire off the selftests returns
> > which is after i915_init().  If we call i915_globals_exit() in
> > i915_init() after selftests, we end up freeing slabs out from under
> > objects which won't get released until fput() is flushed at the end of
> > the module load syscall.
> >
> > The solution here is to let i915_init() return success early and detect
> > the early success in i915_exit() and only tear down globals and nothing
> > else.  This way the module loads successfully, regardless of the success
> > or failure of the tests.  Because we've not enumerated any PCI devices,
> > no device nodes are created and it's entirely useless from userspace.
> > The only thing the module does at that point is hold on to a bit of
> > memory until we unload it and i915_exit() is called.  Importantly, this
> > means that everything from our selftests has the ability to properly
> > flush out between i915_init() and i915_exit() because there is at least
> > one syscall boundary in between.
> >
> > In order to handle all the delicate init/exit cases, we convert the
> > whole thing to a table of init/exit pairs and track the init status in
> > the new init_progress global.  This allows us to ensure that i915_exit()
> > always tears down exactly the things that i915_init() successfully
> > initialized.  We also allow early-exit of i915_init() without failure by
> > an init function returning > 0.  This is useful for nomodeset, and
> > selftests.  For the mock selftests, we convert them to always return 1
> > so we get the desired behavior of the driver always succeeding to load
> > the driver and then properly tearing down the partially loaded driver.
> >
> > Signed-off-by: Jason Ekstrand 
> > Cc: Daniel Vetter 
> > Cc: Tvrtko Ursulin 
> > ---
> >   drivers/gpu/drm/i915/i915_pci.c   | 104 --
> >   drivers/gpu/drm/i915/i915_perf.c  |   3 +-
> >   drivers/gpu/drm/i915/i915_perf.h  |   2 +-
> >   drivers/gpu/drm/i915/i915_pmu.c   |   4 +-
> >   drivers/gpu/drm/i915/i915_pmu.h   |   4 +-
> >   .../gpu/drm/i915/selftests/i915_selftest.c|   2 +-
> >   6 files changed, 80 insertions(+), 39 deletions(-)
> >
> > diff --git a/drivers/gp

Re: [Intel-gfx] [PATCH 2/7] drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)

2021-07-20 Thread Jason Ekstrand
On Mon, Jul 19, 2021 at 3:18 AM Matthew Auld
 wrote:
>
> On Fri, 16 Jul 2021 at 15:14, Jason Ekstrand  wrote:
> >
> > Since we don't allow changing the set of regions after creation, we can
> > make ext_set_placements() build up the region set directly in the
> > create_ext and assign it to the object later.  This is similar to what
> > we did for contexts with the proto-context, only simpler because there's
> > no funny object shuffling.  This will be used in the next patch to allow
> > us to de-duplicate a bunch of code.  Also, since we know the maximum
> > number of regions up-front, we can use a fixed-size temporary array for
> > the regions.  This simplifies memory management a bit for this new
> > delayed approach.
> >
> > v2 (Matthew Auld):
> >  - Get rid of MAX_N_PLACEMENTS
> >  - Drop kfree(placements) from set_placements()
> >
> > Signed-off-by: Jason Ekstrand 
> > Cc: Matthew Auld 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_create.c | 81 --
> >  1 file changed, 45 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > index 51f92e4b1a69d..5766749a449c0 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > @@ -27,10 +27,13 @@ static u32 object_max_page_size(struct 
> > drm_i915_gem_object *obj)
> > return max_page_size;
> >  }
> >
> > -static void object_set_placements(struct drm_i915_gem_object *obj,
> > - struct intel_memory_region **placements,
> > - unsigned int n_placements)
> > +static int object_set_placements(struct drm_i915_gem_object *obj,
> > +struct intel_memory_region **placements,
> > +unsigned int n_placements)
> >  {
> > +   struct intel_memory_region **arr;
> > +   unsigned int i;
> > +
> > GEM_BUG_ON(!n_placements);
> >
> > /*
> > @@ -44,9 +47,20 @@ static void object_set_placements(struct 
> > drm_i915_gem_object *obj,
> > obj->mm.placements = >mm.regions[mr->id];
> > obj->mm.n_placements = 1;
> > } else {
> > -   obj->mm.placements = placements;
> > +   arr = kmalloc_array(n_placements,
> > +   sizeof(struct intel_memory_region *),
> > +   GFP_KERNEL);
> > +   if (!arr)
> > +   return -ENOMEM;
> > +
> > +   for (i = 0; i < n_placements; i++)
> > +   arr[i] = placements[i];
> > +
> > +   obj->mm.placements = arr;
> > obj->mm.n_placements = n_placements;
> > }
> > +
> > +   return 0;
> >  }
> >
> >  static int i915_gem_publish(struct drm_i915_gem_object *obj,
> > @@ -148,7 +162,9 @@ i915_gem_dumb_create(struct drm_file *file,
> > return -ENOMEM;
> >
> > mr = intel_memory_region_by_type(to_i915(dev), mem_type);
> > -   object_set_placements(obj, , 1);
> > +   ret = object_set_placements(obj, , 1);
> > +   if (ret)
> > +   goto object_free;
> >
> > ret = i915_gem_setup(obj, args->size);
> > if (ret)
> > @@ -184,7 +200,9 @@ i915_gem_create_ioctl(struct drm_device *dev, void 
> > *data,
> > return -ENOMEM;
> >
> > mr = intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
> > -   object_set_placements(obj, , 1);
> > +   ret = object_set_placements(obj, , 1);
> > +   if (ret)
> > +   goto object_free;
> >
> > ret = i915_gem_setup(obj, args->size);
> > if (ret)
> > @@ -199,7 +217,8 @@ i915_gem_create_ioctl(struct drm_device *dev, void 
> > *data,
> >
> >  struct create_ext {
> > struct drm_i915_private *i915;
> > -   struct drm_i915_gem_object *vanilla_object;
> > +   struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
> > +   unsigned int n_placements;
> >  };
> >
> >  static void repr_placements(char *buf, size_t size,
> > @@ -230,8 +249,7 @@ static int set_placements(struct 
> > drm_i915_gem_create_ext_memory_regions *args,
> > struct drm_i915_private *i915 = ext_data->i915;
> > s

Re: [Intel-gfx] [PATCH 3/7] drm/i915/gem: Unify user object creation

2021-07-20 Thread Jason Ekstrand
On Tue, Jul 20, 2021 at 4:35 AM Matthew Auld
 wrote:
>
> On Thu, 15 Jul 2021 at 23:39, Jason Ekstrand  wrote:
> >
> > Instead of hand-rolling the same three calls in each function, pull them
> > into an i915_gem_object_create_user helper.  Apart from re-ordering of
> > the placements array ENOMEM check, the only functional change here
> > should be that i915_gem_dumb_create now calls i915_gem_flush_free_objects
> > which it probably should have been calling all along.
> >
> > Signed-off-by: Jason Ekstrand 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_create.c | 106 +
> >  1 file changed, 43 insertions(+), 63 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > index 391c8c4a12172..69bf9ec777642 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > @@ -11,13 +11,14 @@
> >  #include "i915_trace.h"
> >  #include "i915_user_extensions.h"
> >
> > -static u32 object_max_page_size(struct drm_i915_gem_object *obj)
> > +static u32 object_max_page_size(struct intel_memory_region **placements,
> > +   unsigned int n_placements)
> >  {
> > u32 max_page_size = 0;
> > int i;
> >
> > -   for (i = 0; i < obj->mm.n_placements; i++) {
> > -   struct intel_memory_region *mr = obj->mm.placements[i];
> > +   for (i = 0; i < n_placements; i++) {
> > +   struct intel_memory_region *mr = placements[i];
> >
> > GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
> > max_page_size = max_t(u32, max_page_size, 
> > mr->min_page_size);
> > @@ -81,22 +82,35 @@ static int i915_gem_publish(struct drm_i915_gem_object 
> > *obj,
> > return 0;
> >  }
> >
> > -static int
> > -i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
> > +static struct drm_i915_gem_object *
> > +i915_gem_object_create_user(struct drm_i915_private *i915, u64 size,
> > +   struct intel_memory_region **placements,
> > +   unsigned int n_placements)
> >  {
> > -   struct intel_memory_region *mr = obj->mm.placements[0];
> > +   struct intel_memory_region *mr = placements[0];
> > +   struct drm_i915_gem_object *obj;
> > unsigned int flags;
> > int ret;
> >
> > -   size = round_up(size, object_max_page_size(obj));
> > +   i915_gem_flush_free_objects(i915);
> > +
> > +   obj = i915_gem_object_alloc();
> > +   if (!obj)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   size = round_up(size, object_max_page_size(placements, 
> > n_placements));
> > if (size == 0)
> > -   return -EINVAL;
> > +   return ERR_PTR(-EINVAL);
> >
> > /* For most of the ABI (e.g. mmap) we think in system pages */
> > GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
> >
> > if (i915_gem_object_size_2big(size))
> > -   return -E2BIG;
> > +   return ERR_PTR(-E2BIG);
> > +
> > +   ret = object_set_placements(obj, placements, n_placements);
> > +   if (ret)
> > +   goto object_free;
>
> Thinking on this again, it might be way too thorny to expose
> create_user as-is to other parts of i915, like we do in the last
> patch, since the caller will be expected to manually validate the
> placements; otherwise we might crash and burn in weird ways as new
> users pop up, i.e. it needs the same validation that happens as part of
> the extension. Also as new extensions arrive, like with PXP, that also
> has to get bolted onto create_user, which might have its own hidden
> constraints.

Perhaps.  Do you have a suggestion for how to make it available to
selftests without exposing it to "the rest of i915"?  If you want, I
can make create_user duplicate the placements uniqueness check.
That's really the only validation currently in the ioctl besides all
the stuff for making sure that the class/instance provided by the user
isn't bogus.  But if we've got real i915_memory_region pointers, we
don't need that.

--Jason
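
As an aside for readers skimming the archive: the placement-based rounding in
the diff above is easy to see with concrete numbers.  Below is a small
standalone sketch (not i915 code; the min_page_size values are assumptions,
not values queried from real hardware):

#include <stdio.h>

struct region {
        const char *name;
        unsigned int min_page_size;
};

/* Mirrors the logic of object_max_page_size() above: the object must be
 * backable by whole pages in every placement, so take the largest. */
static unsigned int object_max_page_size(const struct region *r, unsigned int n)
{
        unsigned int i, max = 0;

        for (i = 0; i < n; i++)
                if (r[i].min_page_size > max)
                        max = r[i].min_page_size;
        return max;
}

int main(void)
{
        const struct region placements[] = {
                { "SMEM",  4096 },      /* 4 KiB system pages */
                { "LMEM", 65536 },      /* 64 KiB device pages (assumed) */
        };
        unsigned long size = 70 * 1024; /* userspace asks for 70 KiB */
        unsigned int page = object_max_page_size(placements, 2);

        /* round_up(size, page), with page a power of two */
        size = (size + page - 1) & ~(unsigned long)(page - 1);
        printf("allocated size: %lu KiB\n", size / 1024); /* -> 128 KiB */
        return 0;
}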


Re: [Intel-gfx] [PATCH 6/7] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v6)

2021-07-20 Thread Jason Ekstrand
On Tue, Jul 20, 2021 at 4:07 AM Matthew Auld
 wrote:
>
> On Fri, 16 Jul 2021 at 15:14, Jason Ekstrand  wrote:
> >
> > From: Thomas Hellström 
> >
> > If our exported dma-bufs are imported by another instance of our driver,
> > that instance will typically have the imported dma-bufs locked during
> > dma_buf_map_attachment(). But the exporter also locks the same reservation
> > object in the map_dma_buf() callback, which leads to recursive locking.
> >
> > So taking the lock inside _pin_pages_unlocked() is incorrect.
> >
> > Additionally, the current pinning code path is contrary to the defined
> > way that pinning should occur.
> >
> > Remove the explicit pin/unpin from the map/unmap functions and move them
> > to the attach/detach allowing correct locking to occur, and to match
> > the static dma-buf drm_prime pattern.
> >
> > Add a live selftest to exercise both dynamic and non-dynamic
> > exports.
> >
> > v2:
> > - Extend the selftest with a fake dynamic importer.
> > - Provide real pin and unpin callbacks to not abuse the interface.
> > v3: (ruhl)
> > - Remove the dynamic export support and move the pinning into the
> >   attach/detach path.
> > v4: (ruhl)
> > - Put pages does not need to assert on the dma-resv
> > v5: (jason)
> > - Lock around dma_buf_unmap_attachment() when emulating a dynamic
> >   importer in the subtests.
> > - Use pin_pages_unlocked
> > v6: (jason)
> > - Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests
> >
> > Reported-by: Michael J. Ruhl 
> > Signed-off-by: Thomas Hellström 
> > Signed-off-by: Michael J. Ruhl 
> > Signed-off-by: Jason Ekstrand 
> > Reviewed-by: Jason Ekstrand 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  43 ++--
> >  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 103 +-
> >  2 files changed, 132 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > index 616c3a2f1baf0..9a655f69a0671 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > @@ -12,6 +12,8 @@
> >  #include "i915_gem_object.h"
> >  #include "i915_scatterlist.h"
> >
> > +I915_SELFTEST_DECLARE(static bool force_different_devices;)
> > +
> >  static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf)
> >  {
> > return to_intel_bo(buf->priv);
> > @@ -25,15 +27,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
> > dma_buf_attachment *attachme
> > struct scatterlist *src, *dst;
> > int ret, i;
> >
> > -   ret = i915_gem_object_pin_pages_unlocked(obj);
> > -   if (ret)
> > -   goto err;
> > -
> > /* Copy sg so that we make an independent mapping */
> > st = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
> > if (st == NULL) {
> > ret = -ENOMEM;
> > -   goto err_unpin_pages;
> > +   goto err;
> > }
> >
> > ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
> > @@ -58,8 +56,6 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
> > dma_buf_attachment *attachme
> > sg_free_table(st);
> >  err_free:
> > kfree(st);
> > -err_unpin_pages:
> > -   i915_gem_object_unpin_pages(obj);
> >  err:
> > return ERR_PTR(ret);
> >  }
> > @@ -68,13 +64,9 @@ static void i915_gem_unmap_dma_buf(struct 
> > dma_buf_attachment *attachment,
> >struct sg_table *sg,
> >enum dma_data_direction dir)
> >  {
> > -   struct drm_i915_gem_object *obj = 
> > dma_buf_to_obj(attachment->dmabuf);
> > -
> > dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC);
> > sg_free_table(sg);
> > kfree(sg);
> > -
> > -   i915_gem_object_unpin_pages(obj);
> >  }
> >
> >  static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct 
> > dma_buf_map *map)
> > @@ -168,7 +160,31 @@ static int i915_gem_end_cpu_access(struct dma_buf 
> > *dma_buf, enum dma_data_direct
> > return err;
> >  }
> >
> > +/**
> > + * i915_gem_dmabuf_attach - Do any extra attach work necessary
> > + * @dmabuf: imported dma-buf
> > + * @attach: new attach to do 
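
For readers following the locking argument above, the recursive-lock hazard
can be sketched as a call flow.  This is an illustration of the described
failure mode, not the literal call chain of any one kernel version:

/* Same-driver import: importer and exporter share one reservation object.
 *
 *   importer:  dma_resv_lock(obj->base.resv)        // importer locks the BO
 *   importer:  dma_buf_map_attachment(attach, dir)
 *     exporter callback: i915_gem_map_dma_buf()
 *       i915_gem_object_pin_pages_unlocked(obj)
 *         dma_resv_lock(obj->base.resv)             // same lock again: deadlock
 *
 * Moving the pin/unpin into attach/detach, as this patch does, means the
 * map/unmap callbacks no longer take the lock themselves, matching the
 * static dma-buf drm_prime pattern.
 */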

Re: [Intel-gfx] [PATCH 7/7] drm/i915/gem: Migrate to system at dma-buf attach time (v6)

2021-07-20 Thread Jason Ekstrand
Fixed all the nits below locally.  It'll be in the next send.

On Tue, Jul 20, 2021 at 5:53 AM Matthew Auld
 wrote:
>
> On Fri, 16 Jul 2021 at 15:14, Jason Ekstrand  wrote:
> >
> > From: Thomas Hellström 
> >
> > Until we support p2p dma or as a complement to that, migrate data
> > to system memory at dma-buf attach time if possible.
> >
> > v2:
> > - Rebase on dynamic exporter. Update the igt_dmabuf_import_same_driver
> >   selftest to migrate if we are LMEM capable.
> > v3:
> > - Migrate also in the pin() callback.
> > v4:
> > - Migrate in attach
> > v5: (jason)
> > - Lock around the migration
> > v6: (jason)
> > - Move the can_migrate check outside the lock
> > - Rework the selftests to test more migration conditions.  In
> >   particular, SMEM, LMEM, and LMEM+SMEM are all checked.
> >
> > Signed-off-by: Thomas Hellström 
> > Signed-off-by: Michael J. Ruhl 
> > Reported-by: kernel test robot 
> > Signed-off-by: Jason Ekstrand 
> > Reviewed-by: Jason Ekstrand 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_create.c|  2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 23 -
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
> >  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 89 ++-
> >  4 files changed, 112 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > index 039e4f3b39c79..41c4cd3e1ea01 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > @@ -82,7 +82,7 @@ static int i915_gem_publish(struct drm_i915_gem_object 
> > *obj,
> > return 0;
> >  }
> >
> > -static struct drm_i915_gem_object *
> > +struct drm_i915_gem_object *
> >  i915_gem_object_create_user(struct drm_i915_private *i915, u64 size,
> > struct intel_memory_region **placements,
> > unsigned int n_placements)
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > index 9a655f69a0671..5d438b95826b9 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > @@ -170,8 +170,29 @@ static int i915_gem_dmabuf_attach(struct dma_buf 
> > *dmabuf,
> >   struct dma_buf_attachment *attach)
> >  {
> > struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
> > +   struct i915_gem_ww_ctx ww;
> > +   int err;
> > +
> > +   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
> > +   return -EOPNOTSUPP;
> > +
> > +   for_i915_gem_ww(, err, true) {
> > +   err = i915_gem_object_lock(obj, );
> > +   if (err)
> > +   continue;
> > +
> > +   err = i915_gem_object_migrate(obj, , INTEL_REGION_SMEM);
> > +   if (err)
> > +   continue;
> >
> > -   return i915_gem_object_pin_pages_unlocked(obj);
> > +   err = i915_gem_object_wait_migration(obj, 0);
> > +   if (err)
> > +   continue;
> > +
> > +   err = i915_gem_object_pin_pages(obj);
> > +   }
> > +
> > +   return err;
> >  }
> >
> >  static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
> > b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > index 8be4fadeee487..fbae53bd46384 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > @@ -61,6 +61,10 @@ i915_gem_object_create_shmem(struct drm_i915_private 
> > *i915,
> >  struct drm_i915_gem_object *
> >  i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915,
> >const void *data, resource_size_t 
> > size);
> > +struct drm_i915_gem_object *
> > +i915_gem_object_create_user(struct drm_i915_private *i915, u64 size,
> > +   struct intel_memory_region **placements,
> > +   unsigned int n_placements);
> >
> >  extern const struct drm_i915_gem_object_ops i915_gem_shmem_ops;
> >
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c 
> > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
> > index 4451bbb4917e4..
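
As a side note, the for_i915_gem_ww() construct used in the attach hook above
behaves roughly like the following hand-rolled retry loop.  This is a
simplified sketch; the real macro lives in the i915 ww headers and its exact
expansion differs:

static int attach_sketch(struct drm_i915_gem_object *obj)
{
        struct i915_gem_ww_ctx ww;
        int err;

        i915_gem_ww_ctx_init(&ww, true);        /* interruptible waits */
retry:
        err = i915_gem_object_lock(obj, &ww);
        if (!err)
                err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM);
        if (!err)
                err = i915_gem_object_wait_migration(obj, 0);
        if (!err)
                err = i915_gem_object_pin_pages(obj);

        if (err == -EDEADLK) {
                /* ww-mutex deadlock: drop the locks we hold and retry */
                err = i915_gem_ww_ctx_backoff(&ww);
                if (!err)
                        goto retry;
        }
        i915_gem_ww_ctx_fini(&ww);
        return err;
}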

Re: [Intel-gfx] [PATCH] drm/i915: Correct the docs for intel_engine_cmd_parser

2021-07-20 Thread Jason Ekstrand
On Tue, Jul 20, 2021 at 3:26 PM Rodrigo Vivi  wrote:
>
> On Tue, Jul 20, 2021 at 04:25:21PM -0400, Rodrigo Vivi wrote:
> > On Tue, Jul 20, 2021 at 01:21:08PM -0500, Jason Ekstrand wrote:
> > > In c9d9fdbc108a ("drm/i915: Revert "drm/i915/gem: Asynchronous
> > > cmdparser""), the parameters to intel_engine_cmd_parser() were altered
> > > without updating the docs, causing Fi.CI.DOCS to start failing.
> > >
> > > Signed-off-by: Jason Ekstrand 
> > > ---
> > >  drivers/gpu/drm/i915/i915_cmd_parser.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
> > > b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > index 322f4d5955a4f..e0403ce9ce692 100644
> > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > @@ -1416,9 +1416,7 @@ static unsigned long *alloc_whitelist(u32 
> > > batch_length)
> > >   * @batch_offset: byte offset in the batch at which execution starts
> > >   * @batch_length: length of the commands in batch_obj
> > >   * @shadow: validated copy of the batch buffer in question
> > > - * @jump_whitelist: buffer preallocated with 
> > > intel_engine_cmd_parser_alloc_jump_whitelist()
> > > - * @shadow_map: mapping to @shadow vma
> > > - * @batch_map: mapping to @batch vma
> > > + * @trampoline: true if we need to trampoline into privileged execution
> >
> > I was wondering if we should also return the original text, but this one
> > here looks better.
> >
> >
> > Reviewed-by: Rodrigo Vivi 
>
> btw, while on it, I wouldn't mind if you squash some english fixes to
> the trampoline documentation block inside this function ;)

I don't mind at all but I'm not sure what changes you're suggesting.

> >
> >
> > >   *
> > >   * Parses the specified batch buffer looking for privilege violations as
> > >   * described in the overview.
> > > --
> > > 2.31.1
> > >


[Intel-gfx] [PATCH] drm/i915: Correct the docs for intel_engine_cmd_parser

2021-07-20 Thread Jason Ekstrand
In c9d9fdbc108a ("drm/i915: Revert "drm/i915/gem: Asynchronous
cmdparser""), the parameters to intel_engine_cmd_parser() were altered
without updating the docs, causing Fi.CI.DOCS to start failing.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 322f4d5955a4f..e0403ce9ce692 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1416,9 +1416,7 @@ static unsigned long *alloc_whitelist(u32 batch_length)
  * @batch_offset: byte offset in the batch at which execution starts
  * @batch_length: length of the commands in batch_obj
  * @shadow: validated copy of the batch buffer in question
- * @jump_whitelist: buffer preallocated with 
intel_engine_cmd_parser_alloc_jump_whitelist()
- * @shadow_map: mapping to @shadow vma
- * @batch_map: mapping to @batch vma
+ * @trampoline: true if we need to trampoline into privileged execution
  *
  * Parses the specified batch buffer looking for privilege violations as
  * described in the overview.
-- 
2.31.1



[Intel-gfx] [PATCH 6/6] drm/i915: Make the kmem slab for i915_buddy_block a global

2021-07-20 Thread Jason Ekstrand
There's no reason, as far as I can tell, why this should be per-i915_buddy_mm
and doing so causes KMEM_CACHE to throw dmesg warnings because it tries
to create a debugfs entry with the name i915_buddy_block multiple times.
We could handle this by carefully giving each slab its own name but that
brings its own pain because then we have to store that string somewhere
and manage the lifetimes of the different slabs.  The most likely
outcome would be a global atomic which we increment to get a new name or
something like that.

The much easier solution is to use the i915_globals system like we do
for every other slab in i915.  This ensures that we have exactly one of
them for each i915 driver load and it gets neatly created on module load
and destroyed on module unload.  Using the globals system also means
that it's now tied into the shrink handler so we can properly respond to
low-memory situations.

Signed-off-by: Jason Ekstrand 
Fixes: 88be9a0a06b7 ("drm/i915/ttm: add ttm_buddy_man")
Cc: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_buddy.c   | 44 ++---
 drivers/gpu/drm/i915/i915_buddy.h   |  3 +-
 drivers/gpu/drm/i915/i915_globals.c |  2 ++
 3 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_buddy.c 
b/drivers/gpu/drm/i915/i915_buddy.c
index 29dd7d0310c1f..911feedad4513 100644
--- a/drivers/gpu/drm/i915/i915_buddy.c
+++ b/drivers/gpu/drm/i915/i915_buddy.c
@@ -8,8 +8,14 @@
 #include "i915_buddy.h"
 
 #include "i915_gem.h"
+#include "i915_globals.h"
 #include "i915_utils.h"
 
+static struct i915_global_buddy {
+   struct i915_global base;
+   struct kmem_cache *slab_blocks;
+} global;
+
 static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
 struct i915_buddy_block 
*parent,
 unsigned int order,
@@ -19,7 +25,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
i915_buddy_mm *mm,
 
GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER);
 
-   block = kmem_cache_zalloc(mm->slab_blocks, GFP_KERNEL);
+   block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL);
if (!block)
return NULL;
 
@@ -34,7 +40,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
i915_buddy_mm *mm,
 static void i915_block_free(struct i915_buddy_mm *mm,
struct i915_buddy_block *block)
 {
-   kmem_cache_free(mm->slab_blocks, block);
+   kmem_cache_free(global.slab_blocks, block);
 }
 
 static void mark_allocated(struct i915_buddy_block *block)
@@ -85,15 +91,11 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 
chunk_size)
 
GEM_BUG_ON(mm->max_order > I915_BUDDY_MAX_ORDER);
 
-   mm->slab_blocks = KMEM_CACHE(i915_buddy_block, SLAB_HWCACHE_ALIGN);
-   if (!mm->slab_blocks)
-   return -ENOMEM;
-
mm->free_list = kmalloc_array(mm->max_order + 1,
  sizeof(struct list_head),
  GFP_KERNEL);
if (!mm->free_list)
-   goto out_destroy_slab;
+   return -ENOMEM;
 
for (i = 0; i <= mm->max_order; ++i)
INIT_LIST_HEAD(>free_list[i]);
@@ -145,8 +147,6 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 
chunk_size)
kfree(mm->roots);
 out_free_list:
kfree(mm->free_list);
-out_destroy_slab:
-   kmem_cache_destroy(mm->slab_blocks);
return -ENOMEM;
 }
 
@@ -161,7 +161,6 @@ void i915_buddy_fini(struct i915_buddy_mm *mm)
 
kfree(mm->roots);
kfree(mm->free_list);
-   kmem_cache_destroy(mm->slab_blocks);
 }
 
 static int split_block(struct i915_buddy_mm *mm,
@@ -410,3 +409,28 @@ int i915_buddy_alloc_range(struct i915_buddy_mm *mm,
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_buddy.c"
 #endif
+
+static void i915_global_buddy_shrink(void)
+{
+   kmem_cache_shrink(global.slab_blocks);
+}
+
+static void i915_global_buddy_exit(void)
+{
+   kmem_cache_destroy(global.slab_blocks);
+}
+
+static struct i915_global_buddy global = { {
+   .shrink = i915_global_buddy_shrink,
+   .exit = i915_global_buddy_exit,
+} };
+
+int __init i915_global_buddy_init(void)
+{
+   global.slab_blocks = KMEM_CACHE(i915_buddy_block, 0);
+   if (!global.slab_blocks)
+   return -ENOMEM;
+
+   i915_global_register();
+   return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_buddy.h 
b/drivers/gpu/drm/i915/i915_buddy.h
index 37f8c42071d12..d8f26706de52f 100644
--- a/drivers/gpu/drm/i915/i915_buddy.h
+++ b/drivers/gpu/drm/i915/i915_buddy.h
@@ -47,7 +47,6 @@ struct i915_buddy_block {
  * i915_buddy_alloc* and i915_buddy_free* should suffice.
  */
 struct i915_buddy_mm {
-   struct kmem_cache *slab_blocks;
/* Maintain 

[Intel-gfx] [PATCH 5/6] drm/ttm: Initialize debugfs from ttm_global_init()

2021-07-20 Thread Jason Ekstrand
We create a bunch of debugfs entries as a side-effect of
ttm_global_init() and then never clean them up.  This isn't usually a
problem because we free the whole debugfs directory on module unload.
However, if the global reference count ever goes to zero and then
ttm_global_init() is called again, we'll re-create those debugfs entries
and debugfs will complain in dmesg that we're creating entries that
already exist.  This patch fixes this problem by changing the lifetime
of the whole TTM debugfs directory to match that of the TTM global
state.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/ttm/ttm_device.c | 12 
 drivers/gpu/drm/ttm/ttm_module.c | 16 
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 519deea8e39b7..74e3b460132b3 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -44,6 +44,8 @@ static unsigned ttm_glob_use_count;
 struct ttm_global ttm_glob;
 EXPORT_SYMBOL(ttm_glob);
 
+struct dentry *ttm_debugfs_root;
+
 static void ttm_global_release(void)
 {
struct ttm_global *glob = _glob;
@@ -53,6 +55,7 @@ static void ttm_global_release(void)
goto out;
 
ttm_pool_mgr_fini();
+   debugfs_remove(ttm_debugfs_root);
 
__free_page(glob->dummy_read_page);
memset(glob, 0, sizeof(*glob));
@@ -73,6 +76,13 @@ static int ttm_global_init(void)
 
si_meminfo();
 
+   ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
+   if (IS_ERR(ttm_debugfs_root)) {
+   ret = PTR_ERR(ttm_debugfs_root);
+   ttm_debugfs_root = NULL;
+   goto out;
+   }
+
/* Limit the number of pages in the pool to about 50% of the total
 * system memory.
 */
@@ -100,6 +110,8 @@ static int ttm_global_init(void)
debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
>bo_count);
 out:
+   if (ret && ttm_debugfs_root)
+   debugfs_remove(ttm_debugfs_root);
if (ret)
--ttm_glob_use_count;
mutex_unlock(_global_mutex);
diff --git a/drivers/gpu/drm/ttm/ttm_module.c b/drivers/gpu/drm/ttm/ttm_module.c
index 997c458f68a9a..7fcdef278c742 100644
--- a/drivers/gpu/drm/ttm/ttm_module.c
+++ b/drivers/gpu/drm/ttm/ttm_module.c
@@ -72,22 +72,6 @@ pgprot_t ttm_prot_from_caching(enum ttm_caching caching, 
pgprot_t tmp)
return tmp;
 }
 
-struct dentry *ttm_debugfs_root;
-
-static int __init ttm_init(void)
-{
-   ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
-   return 0;
-}
-
-static void __exit ttm_exit(void)
-{
-   debugfs_remove(ttm_debugfs_root);
-}
-
-module_init(ttm_init);
-module_exit(ttm_exit);
-
 MODULE_AUTHOR("Thomas Hellstrom, Jerome Glisse");
 MODULE_DESCRIPTION("TTM memory manager subsystem (for DRM device)");
 MODULE_LICENSE("GPL and additional rights");
-- 
2.31.1



[Intel-gfx] [PATCH 4/6] drm/ttm: Force re-init if ttm_global_init() fails

2021-07-20 Thread Jason Ekstrand
If we have a failure, decrement the reference count so that the next
call to ttm_global_init() will actually do something instead of assume
everything is all set up.

Signed-off-by: Jason Ekstrand 
Fixes: 62b53b37e4b1 ("drm/ttm: use a static ttm_bo_global instance")
Reviewed-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 5f31acec3ad76..519deea8e39b7 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -100,6 +100,8 @@ static int ttm_global_init(void)
debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
>bo_count);
 out:
+   if (ret)
+   --ttm_glob_use_count;
mutex_unlock(_global_mutex);
return ret;
 }
-- 
2.31.1



[Intel-gfx] [PATCH 3/6] drm/i915: Use a table for i915_init/exit

2021-07-20 Thread Jason Ekstrand
If the driver was not fully loaded, we may still have globals lying
around.  If we don't tear those down in i915_exit(), we'll leak a bunch
of memory slabs.  This can happen in two ways: if use_kms = false, or if we've
run mock selftests.  In either case, we have an early exit from
i915_init which happens after i915_globals_init() and we need to clean
up those globals.

The mock selftests case is especially sticky.  The load isn't entirely
a no-op.  We actually do quite a bit inside those selftests including
allocating a bunch of mock objects and running tests on them.  Once all
those tests are complete, we exit early from i915_init().  Previously,
i915_init() would return a non-zero error code on failure and a zero
error code on success.  In the success case, we would get to i915_exit()
and check i915_pci_driver.driver.owner to detect if i915_init exited early
and do nothing.  In the failure case, we would fail i915_init() but
there would be no opportunity to clean up globals.

The most annoying part is that you don't actually notice the failure as
part of the self-tests since leaking a bit of memory, while bad, doesn't
result in anything observable from userspace.  Instead, the next time we
load the driver (usually for next IGT test), i915_globals_init() gets
invoked again, we go to allocate a bunch of new memory slabs, those
implicitly create debugfs entries, and debugfs warns that we're trying
to create directories and files that already exist.  Since this all
happens as part of the next driver load, it shows up in the dmesg-warn
of whatever IGT test ran after the mock selftests.

While the obvious thing to do here might be to call i915_globals_exit()
after selftests, that's not actually safe.  The dma-buf selftests call
i915_gem_prime_export which creates a file.  We call dma_buf_put() on
the resulting dmabuf which calls fput() on the file.  However, fput()
isn't immediate and gets flushed right before syscall returns.  This
means that all the fput()s from the selftests don't happen until right
before the module load syscall used to fire off the selftests returns
which is after i915_init().  If we call i915_globals_exit() in
i915_init() after selftests, we end up freeing slabs out from under
objects which won't get released until fput() is flushed at the end of
the module load syscall.

The solution here is to let i915_init() return success early and detect
the early success in i915_exit() and only tear down globals and nothing
else.  This way the module loads successfully, regardless of the success
or failure of the tests.  Because we've not enumerated any PCI devices,
no device nodes are created and it's entirely useless from userspace.
The only thing the module does at that point is hold on to a bit of
memory until we unload it and i915_exit() is called.  Importantly, this
means that everything from our selftests has the ability to properly
flush out between i915_init() and i915_exit() because there is at least
one syscall boundary in between.

In order to handle all the delicate init/exit cases, we convert the
whole thing to a table of init/exit pairs and track the init status in
the new init_progress global.  This allows us to ensure that i915_exit()
always tears down exactly the things that i915_init() successfully
initialized.  We also allow early-exit of i915_init() without failure by
an init function returning > 0.  This is useful for nomodeset, and
selftests.  For the mock selftests, we convert them to always return 1
so we get the desired behavior of the driver always succeeding to load
the driver and then properly tearing down the partially loaded driver.

Signed-off-by: Jason Ekstrand 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_pci.c   | 104 --
 drivers/gpu/drm/i915/i915_perf.c  |   3 +-
 drivers/gpu/drm/i915/i915_perf.h  |   2 +-
 drivers/gpu/drm/i915/i915_pmu.c   |   4 +-
 drivers/gpu/drm/i915/i915_pmu.h   |   4 +-
 .../gpu/drm/i915/selftests/i915_selftest.c|   2 +-
 6 files changed, 80 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 4e627b57d31a2..64ebd89eae6ce 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1185,27 +1185,9 @@ static void i915_pci_shutdown(struct pci_dev *pdev)
i915_driver_shutdown(i915);
 }
 
-static struct pci_driver i915_pci_driver = {
-   .name = DRIVER_NAME,
-   .id_table = pciidlist,
-   .probe = i915_pci_probe,
-   .remove = i915_pci_remove,
-   .shutdown = i915_pci_shutdown,
-   .driver.pm = _pm_ops,
-};
-
-static int __init i915_init(void)
+static int i915_check_nomodeset(void)
 {
bool use_kms = true;
-   int err;
-
-   err = i915_globals_init();
-   if (err)
-   return err;
-
-   err = i915_mock_selftests();
-   if (err)
-   return err > 0 ? 0
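
Since the diff above is cut off in the archive, here is a compact sketch of
the init/exit-table shape the commit message describes.  It is illustrative
only; the actual table, entry list, and helpers live in i915_pci.c:

static const struct {
        int (*init)(void);      /* <0: fail, 0: continue, >0: early success */
        void (*exit)(void);
} init_funcs[] = {
        { i915_globals_init, i915_globals_exit },
        { i915_mock_selftests, NULL },
        { i915_check_nomodeset, NULL },
        /* ... the remaining subsystems, in load order ... */
};

static int init_progress;

static int __init i915_init(void)
{
        int err, i;

        for (i = 0; i < ARRAY_SIZE(init_funcs); i++) {
                err = init_funcs[i].init();
                if (err < 0) {
                        /* unwind exactly what succeeded so far */
                        while (i--)
                                if (init_funcs[i].exit)
                                        init_funcs[i].exit();
                        return err;
                } else if (err > 0) {
                        /* early success (nomodeset, mock selftests): stop
                         * here but still report a successful module load */
                        break;
                }
        }

        init_progress = i;
        return 0;
}

static void __exit i915_exit(void)
{
        while (init_progress--)
                if (init_funcs[init_progress].exit)
                        init_funcs[init_progress].exit();
}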

[Intel-gfx] [PATCH 2/6] drm/i915: Call i915_globals_exit() if pci_register_device() fails

2021-07-20 Thread Jason Ekstrand
In the unlikely event that pci_register_device() fails, we were tearing
down our PMU setup but not globals.  This leaves a bunch of memory slabs
lying around.

Signed-off-by: Jason Ekstrand 
Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global")
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_globals.c | 4 ++--
 drivers/gpu/drm/i915/i915_pci.c | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_globals.c 
b/drivers/gpu/drm/i915/i915_globals.c
index 77f1911c463b8..87267e1d2ad92 100644
--- a/drivers/gpu/drm/i915/i915_globals.c
+++ b/drivers/gpu/drm/i915/i915_globals.c
@@ -138,7 +138,7 @@ void i915_globals_unpark(void)
atomic_inc();
 }
 
-static void __exit __i915_globals_flush(void)
+static void __i915_globals_flush(void)
 {
atomic_inc(); /* skip shrinking */
 
@@ -148,7 +148,7 @@ static void __exit __i915_globals_flush(void)
atomic_dec();
 }
 
-void __exit i915_globals_exit(void)
+void i915_globals_exit(void)
 {
GEM_BUG_ON(atomic_read());
 
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 50ed93b03e582..4e627b57d31a2 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1230,6 +1230,7 @@ static int __init i915_init(void)
err = pci_register_driver(_pci_driver);
if (err) {
i915_pmu_exit();
+   i915_globals_exit();
return err;
}
 
-- 
2.31.1



[Intel-gfx] [PATCH 1/6] drm/i915: Call i915_globals_exit() after i915_pmu_exit()

2021-07-20 Thread Jason Ekstrand
We should tear down in the opposite order we set up.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 67696d7522718..50ed93b03e582 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1244,8 +1244,8 @@ static void __exit i915_exit(void)
 
i915_perf_sysctl_unregister();
pci_unregister_driver(_pci_driver);
-   i915_globals_exit();
i915_pmu_exit();
+   i915_globals_exit();
 }
 
 module_init(i915_init);
-- 
2.31.1



[Intel-gfx] [PATCH 0/6] Fix the debugfs splat from mock selftests

2021-07-20 Thread Jason Ekstrand
This patch series fixes a miscellaneous collection of bugs that together
cause all our mock selftests to throw dmesg warnings in CI.  As can be seen
from "drm/i915: Use a table for i915_init/exit", it's especially fun since
those warnings don't always show up in the selftests but can show up in
other random IGTs depending on test execution order.

Jason Ekstrand (6):
  drm/i915: Call i915_globals_exit() after i915_pmu_exit()
  drm/i915: Call i915_globals_exit() if pci_register_device() fails
  drm/i915: Use a table for i915_init/exit
  drm/ttm: Force re-init if ttm_global_init() fails
  drm/ttm: Initialize debugfs from ttm_global_init()
  drm/i915: Make the kmem slab for i915_buddy_block a global

 drivers/gpu/drm/i915/i915_buddy.c |  44 ++--
 drivers/gpu/drm/i915/i915_buddy.h |   3 +-
 drivers/gpu/drm/i915/i915_globals.c   |   6 +-
 drivers/gpu/drm/i915/i915_pci.c   | 103 --
 drivers/gpu/drm/i915/i915_perf.c  |   3 +-
 drivers/gpu/drm/i915/i915_perf.h  |   2 +-
 drivers/gpu/drm/i915/i915_pmu.c   |   4 +-
 drivers/gpu/drm/i915/i915_pmu.h   |   4 +-
 .../gpu/drm/i915/selftests/i915_selftest.c|   2 +-
 drivers/gpu/drm/ttm/ttm_device.c  |  14 +++
 drivers/gpu/drm/ttm/ttm_module.c  |  16 ---
 11 files changed, 134 insertions(+), 67 deletions(-)

-- 
2.31.1



Re: [Intel-gfx] [PATCH] drm/i915: Check for nomodeset in i915_init() first

2021-07-20 Thread Jason Ekstrand
On Mon, Jul 19, 2021 at 3:35 AM Daniel Vetter  wrote:
>
> Jason is trying to sort out the unwinding in i915_init and i915_exit;
> while reviewing those patches I noticed that we also have the
> nomodeset handling now in the middle of things.
>
> Pull that out for simplicity in unwinding - if you run selftests with
> nomodeset you get nothing, *shrug*.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_pci.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 67696d752271..6fe709ac1b4b 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -1199,14 +1199,6 @@ static int __init i915_init(void)
> bool use_kms = true;
> int err;
>
> -   err = i915_globals_init();
> -   if (err)
> -   return err;
> -
> -   err = i915_mock_selftests();
> -   if (err)
> -   return err > 0 ? 0 : err;
> -
> /*
>  * Enable KMS by default, unless explicitly overriden by
>  * either the i915.modeset prarameter or by the
> @@ -1225,6 +1217,14 @@ static int __init i915_init(void)
> return 0;
> }
>
> +   err = i915_globals_init();
> +   if (err)
> +   return err;
> +
> +   err = i915_mock_selftests();
> +   if (err)
> +   return err > 0 ? 0 : err;
> +

Annoyingly, this actually makes i915_exit() harder because now we need
two conditionals: one for "do you have globals?" and one for "do you
have anything at all?".  It's actually easier to get right if we have

i915_globals_init()

/* Everything that can return 0 early */

fully_loaded = true

/* Everything that can fail */

> i915_pmu_init();
>
> err = pci_register_driver(_pci_driver);
> --
> 2.32.0
>
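
Spelled out, the structure Jason is suggesting above would look something
like the sketch below.  This is illustrative only, not the patch that was
eventually merged (the series ended up using an init/exit table instead):

static bool i915_fully_loaded;

static int __init i915_init(void)
{
        int err;

        err = i915_globals_init();
        if (err)
                return err;

        /* Everything that can return 0 early: mock selftests, nomodeset */
        err = i915_mock_selftests();
        if (err < 0) {
                i915_globals_exit();
                return err;
        }
        if (err > 0)
                return 0;       /* early success: only globals are live */

        i915_fully_loaded = true;

        /* Everything that can fail: PMU, PCI driver registration, ... */
        return 0;
}

static void __exit i915_exit(void)
{
        if (i915_fully_loaded) {
                /* tear down everything set up after the early exits */
        }
        i915_globals_exit();
}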


Re: [Intel-gfx] [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit()

2021-07-20 Thread Jason Ekstrand
Sorry... didn't reply to everything the first time

On Tue, Jul 20, 2021 at 3:25 AM Tvrtko Ursulin
 wrote:
>
>
> On 19/07/2021 19:30, Jason Ekstrand wrote:
> > If the driver was not fully loaded, we may still have globals lying
> > around.  If we don't tear those down in i915_exit(), we'll leak a bunch
> > of memory slabs.  This can happen in two ways: if use_kms = false, or if we've
> > run mock selftests.  In either case, we have an early exit from
> > i915_init which happens after i915_globals_init() and we need to clean
> > up those globals.  While we're here, add an explicit boolean instead of
> > using a random field from i915_pci_device to detect partial loads.
> >
> > The mock selftests case gets especially sticky.  The load isn't entirely
> > a no-op.  We actually do quite a bit inside those selftests including
> > allocating a bunch of mock objects and running tests on them.  Once all
> > those tests are complete, we exit early from i915_init().  Previously,
> > i915_init() would return a non-zero error code on failure and a zero
> > error code on success.  In the success case, we would get to i915_exit()
> > and check i915_pci_driver.driver.owner to detect if i915_init exited early
> > and do nothing.  In the failure case, we would fail i915_init() but
> > there would be no opportunity to clean up globals.
> >
> > The most annoying part is that you don't actually notice the failure as
> > part of the self-tests since leaking a bit of memory, while bad, doesn't
> > result in anything observable from userspace.  Instead, the next time we
> > load the driver (usually for next IGT test), i915_globals_init() gets
> > invoked again, we go to allocate a bunch of new memory slabs, those
> > implicitly create debugfs entries, and debugfs warns that we're trying
> > to create directories and files that already exist.  Since this all
> > happens as part of the next driver load, it shows up in the dmesg-warn
> > of whatever IGT test ran after the mock selftests.
>
> Story checks out but I totally don't get why it wouldn't be noticed
> until now. Was it perhaps part of the selftests contract that a reboot
> is required after failure?

If there is such a contract, CI doesn't follow it.  We unload the
driver after selftests but that's it.

> > While the obvious thing to do here might be to call i915_globals_exit()
> > after selftests, that's not actually safe.  The dma-buf selftests call
> > i915_gem_prime_export which creates a file.  We call dma_buf_put() on
> > the resulting dmabuf which calls fput() on the file.  However, fput()
> > isn't immediate and gets flushed right before syscall returns.  This
> > means that all the fput()s from the selftests don't happen until right
> > before the module load syscall used to fire off the selftests returns
> > which is after i915_init().  If we call i915_globals_exit() in
> > i915_init() after selftests, we end up freeing slabs out from under
> > objects which won't get released until fput() is flushed at the end of
> > the module load.
>
> Nasty. Wasn't visible while globals memory leak was "in place". :I
>
> > The solution here is to let i915_init() return success early and detect
> > the early success in i915_exit() and only tear down globals and nothing
> > else.  This way the module loads successfully, regardless of the success
> > or failure of the tests.  Because we've not enumerated any PCI devices,
> > no device nodes are created and it's entirely useless from userspace.
> > The only thing the module does at that point is hold on to a bit of
> > memory until we unload it and i915_exit() is called.  Importantly, this
> > means that everything from our selftests has the ability to properly
> > flush out between i915_init() and i915_exit() because there are a couple
> > syscall boundaries in between.
>
> When you say "couple of syscall boundaries" do you mean exactly two (module
> init/unload) or is there more to it? Like why "couple" is needed and not
> just that the module load syscall has exited? That part sounds
> potentially dodgy. What mechanism is used by the delayed flush?

It only needs the one syscall.  I've changed the text to say "at least
one syscall boundary".  I think that's more clear without providing an
exact count which may not be tractable.

> Have you checked how this change interacts with the test runner and CI?

As far as I know, there's no interesting interaction here.  That said,
I did just find that the live selftests fail the modprobe on selftest
failure which means they're tearing down globals before a full syscall
boundary which may be sketchy.  Fortunately, n
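
For the delayed-flush mechanism being asked about above: fput() does not
release a file synchronously.  When the last reference is dropped, it queues
the final __fput() as task_work, which only runs once the current task
returns to userspace.  A rough sketch of that flow follows (simplified; the
real code is in fs/file_table.c and the struct file field names differ
across kernel versions):

void fput(struct file *file)
{
        if (atomic_long_dec_and_test(&file->f_count)) {
                /* Final reference: the release is NOT done here.  It is
                 * queued on the current task and runs on return to
                 * userspace -- i.e. after the module-load syscall that
                 * drove the selftests, and thus after i915_init(), has
                 * already returned. */
                init_task_work(&file->f_rcuhead, ____fput);
                task_work_add(current, &file->f_rcuhead, TWA_RESUME);
        }
}

This is why a syscall boundary between i915_init() and i915_exit() is enough
for all the selftests' file references to drain.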

Re: [Intel-gfx] [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit()

2021-07-20 Thread Jason Ekstrand
On Tue, Jul 20, 2021 at 9:18 AM Daniel Vetter  wrote:
>
> On Mon, Jul 19, 2021 at 01:30:44PM -0500, Jason Ekstrand wrote:
> > If the driver was not fully loaded, we may still have globals lying
> > around.  If we don't tear those down in i915_exit(), we'll leak a bunch
> > of memory slabs.  This can happen in two ways: if use_kms = false, or if we've
> > run mock selftests.  In either case, we have an early exit from
> > i915_init which happens after i915_globals_init() and we need to clean
> > up those globals.  While we're here, add an explicit boolean instead of
> > using a random field from i915_pci_device to detect partial loads.
> >
> > The mock selftests case gets especially sticky.  The load isn't entirely
> > a no-op.  We actually do quite a bit inside those selftests including
> > allocating a bunch of mock objects and running tests on them.  Once all
> > those tests are complete, we exit early from i915_init().  Previously,
> > i915_init() would return a non-zero error code on failure and a zero
> > error code on success.  In the success case, we would get to i915_exit()
> > and check i915_pci_driver.driver.owner to detect if i915_init exited early
> > and do nothing.  In the failure case, we would fail i915_init() but
> > there would be no opportunity to clean up globals.
> >
> > The most annoying part is that you don't actually notice the failure as
> > part of the self-tests since leaking a bit of memory, while bad, doesn't
> > result in anything observable from userspace.  Instead, the next time we
> > load the driver (usually for next IGT test), i915_globals_init() gets
> > invoked again, we go to allocate a bunch of new memory slabs, those
> > implicitly create debugfs entries, and debugfs warns that we're trying
> > to create directories and files that already exist.  Since this all
> > happens as part of the next driver load, it shows up in the dmesg-warn
> > of whatever IGT test ran after the mock selftests.
> >
> > While the obvious thing to do here might be to call i915_globals_exit()
> > after selftests, that's not actually safe.  The dma-buf selftests call
> > i915_gem_prime_export which creates a file.  We call dma_buf_put() on
> > the resulting dmabuf which calls fput() on the file.  However, fput()
> > isn't immediate and gets flushed right before syscall returns.  This
> > means that all the fput()s from the selftests don't happen until right
> > before the module load syscall used to fire off the selftests returns
> > which is after i915_init().  If we call i915_globals_exit() in
> > i915_init() after selftests, we end up freeing slabs out from under
> > objects which won't get released until fput() is flushed at the end of
> > the module load.
> >
> > The solution here is to let i915_init() return success early and detect
> > the early success in i915_exit() and only tear down globals and nothing
> > else.  This way the module loads successfully, regardless of the success
> > or failure of the tests.  Because we've not enumerated any PCI devices,
> > no device nodes are created and it's entirely useless from userspace.
> > The only thing the module does at that point is hold on to a bit of
> > memory until we unload it and i915_exit() is called.  Importantly, this
> > means that everything from our selftests has the ability to properly
> > flush out between i915_init() and i915_exit() because there are a couple
> > syscall boundaries in between.
> >
> > Signed-off-by: Jason Ekstrand 
> > Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global")
> > Cc: Daniel Vetter 
> > ---
> >  drivers/gpu/drm/i915/i915_pci.c | 32 +---
> >  1 file changed, 25 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c 
> > b/drivers/gpu/drm/i915/i915_pci.c
> > index 4e627b57d31a2..24e4e54516936 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -1194,18 +1194,31 @@ static struct pci_driver i915_pci_driver = {
> >   .driver.pm = _pm_ops,
> >  };
> >
> > +static bool i915_fully_loaded = false;
> > +
> >  static int __init i915_init(void)
> >  {
> >   bool use_kms = true;
> >   int err;
> >
> > + i915_fully_loaded = false;
> > +
> >   err = i915_globals_init();
> >   if (err)
> >   return err;
> >
> > + /* i915_mock_selftests() only returns zero if no mock subtests were
> > +  * run.  If we get any non-zero error code, we ret

Re: [Intel-gfx] [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit()

2021-07-20 Thread Jason Ekstrand
On Tue, Jul 20, 2021 at 3:25 AM Tvrtko Ursulin
 wrote:
>
>
> On 19/07/2021 19:30, Jason Ekstrand wrote:
> > If the driver was not fully loaded, we may still have globals lying
> > around.  If we don't tear those down in i915_exit(), we'll leak a bunch
> > of memory slabs.  This can happen in two ways: if use_kms = false, or if we've
> > run mock selftests.  In either case, we have an early exit from
> > i915_init which happens after i915_globals_init() and we need to clean
> > up those globals.  While we're here, add an explicit boolean instead of
> > using a random field from i915_pci_device to detect partial loads.
> >
> > The mock selftests case gets especially sticky.  The load isn't entirely
> > a no-op.  We actually do quite a bit inside those selftests including
> > allocating a bunch of mock objects and running tests on them.  Once all
> > those tests are complete, we exit early from i915_init().  Previously,
> > i915_init() would return a non-zero error code on failure and a zero
> > error code on success.  In the success case, we would get to i915_exit()
> > and check i915_pci_driver.driver.owner to detect if i915_init exited early
> > and do nothing.  In the failure case, we would fail i915_init() but
> > there would be no opportunity to clean up globals.
> >
> > The most annoying part is that you don't actually notice the failure as
> > part of the self-tests since leaking a bit of memory, while bad, doesn't
> > result in anything observable from userspace.  Instead, the next time we
> > load the driver (usually for next IGT test), i915_globals_init() gets
> > invoked again, we go to allocate a bunch of new memory slabs, those
> > implicitly create debugfs entries, and debugfs warns that we're trying
> > to create directories and files that already exist.  Since this all
> > happens as part of the next driver load, it shows up in the dmesg-warn
> > of whatever IGT test ran after the mock selftests.
>
> Story checks out but I totally don't get why it wouldn't be noticed
> until now. Was it perhaps part of the selftests contract that a reboot
> is required after failure?

No.  They do unload the driver, though.  They just don't re-load it.

> > While the obvious thing to do here might be to call i915_globals_exit()
> > after selftests, that's not actually safe.  The dma-buf selftests call
> > i915_gem_prime_export which creates a file.  We call dma_buf_put() on
> > the resulting dmabuf which calls fput() on the file.  However, fput()
> > isn't immediate and gets flushed right before syscall returns.  This
> > means that all the fput()s from the selftests don't happen until right
> > before the module load syscall used to fire off the selftests returns
> > which is after i915_init().  If we call i915_globals_exit() in
> > i915_init() after selftests, we end up freeing slabs out from under
> > objects which won't get released until fput() is flushed at the end of
> > the module load.
>
> Nasty. Wasn't visible while globals memory leak was "in place". :I
>
> > The solution here is to let i915_init() return success early and detect
> > the early success in i915_exit() and only tear down globals and nothing
> > else.  This way the module loads successfully, regardless of the success
> > or failure of the tests.  Because we've not enumerated any PCI devices,
> > no device nodes are created and it's entirely useless from userspace.
> > The only thing the module does at that point is hold on to a bit of
> > memory until we unload it and i915_exit() is called.  Importantly, this
> > means that everything from our selftests has the ability to properly
> > flush out between i915_init() and i915_exit() because there are a couple
> > syscall boundaries in between.
>
> When you say "couple of syscall boundaries" you mean exactly two (module
> init/unload) or there is more to it? Like why "couple" is needed and not
> just that the module load syscall has exited? That part sounds
> potentially dodgy. What mechanism is used by the delayed flush?
>
> Have you checked how this change interacts with the test runner and CI?

By the end of the series, a bunch of tests are fixed.  In particular,
https://gitlab.freedesktop.org/drm/intel/-/issues/3746

> >
> > Signed-off-by: Jason Ekstrand 
> > Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global")
> > Cc: Daniel Vetter 
> > ---
> >   drivers/gpu/drm/i915/i915_pci.c | 32 +---
> >   1 file changed, 25 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c 
> > b/drivers

Re: [Intel-gfx] [PATCH 4/4] drm/i915/uapi: reject set_domain for discrete

2021-07-19 Thread Jason Ekstrand
On Mon, Jul 19, 2021 at 4:10 AM Matthew Auld
 wrote:
>
> On Fri, 16 Jul 2021 at 16:23, Jason Ekstrand  wrote:
> >
> > On Fri, Jul 16, 2021 at 9:52 AM Tvrtko Ursulin
> >  wrote:
> > >
> > >
> > > On 15/07/2021 11:15, Matthew Auld wrote:
> > > > The CPU domain should be static for discrete, and on DG1 we don't need
> > > > any flushing since everything is already coherent, so really all this
> > > > does is an object wait, for which we have an ioctl. Longer term the
> > > > desired caching should be an immutable creation time property for the
> > > > BO, which can be set with something like gem_create_ext.
> > > >
> > > > One other user is iris + userptr, which uses the set_domain to probe all
> > > > the pages to check if the GUP succeeds; however, we now have a PROBE
> > > > flag for this purpose.
> > > >
> > > > v2: add some more kernel doc, also add the implicit rules with caching
> > > >
> > > > Suggested-by: Daniel Vetter 
> > > > Signed-off-by: Matthew Auld 
> > > > Cc: Thomas Hellström 
> > > > Cc: Maarten Lankhorst 
> > > > Cc: Tvrtko Ursulin 
> > > > Cc: Jordan Justen 
> > > > Cc: Kenneth Graunke 
> > > > Cc: Jason Ekstrand 
> > > > Cc: Daniel Vetter 
> > > > Cc: Ramalingam C 
> > > > Reviewed-by: Ramalingam C 
> > > > ---
> > > >   drivers/gpu/drm/i915/gem/i915_gem_domain.c |  3 +++
> > > >   include/uapi/drm/i915_drm.h| 19 +++
> > > >   2 files changed, 22 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
> > > > b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > > > index 43004bef55cb..b684a62bf3b0 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > > > @@ -490,6 +490,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, 
> > > > void *data,
> > > >   u32 write_domain = args->write_domain;
> > > >   int err;
> > > >
> > > > + if (IS_DGFX(to_i915(dev)))
> > > > + return -ENODEV;
> > > > +
> > > >   /* Only handle setting domains to types used by the CPU. */
> > > >   if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS)
> > > >   return -EINVAL;
> > > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > > > index 2e4112bf4d38..04ce310e7ee6 100644
> > > > --- a/include/uapi/drm/i915_drm.h
> > > > +++ b/include/uapi/drm/i915_drm.h
> > > > @@ -901,6 +901,25 @@ struct drm_i915_gem_mmap_offset {
> > > >*  - I915_GEM_DOMAIN_GTT: Mappable aperture domain
> > > >*
> > > >* All other domains are rejected.
> > > > + *
> > > > + * Note that for discrete, starting from DG1, this is no longer supported,
> > > > + * and is instead rejected. On such platforms the CPU domain is effectively
> > > > + * static, where we also only support a single &drm_i915_gem_mmap_offset
> > > > + * cache mode, which can't be set explicitly and instead depends on the
> > > > + * object placements, as per the below.
> > > > + *
> > > > + * Implicit caching rules, starting from DG1:
> > > > + *
> > > > + *   - If any of the object placements (see &drm_i915_gem_create_ext_memory_regions)
> > > > + *     contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and
> > > > + *     mapped as write-combined only.
> >
> > Is this accurate?  I thought they got WB when living in SMEM and WC
> > when on the device.  But, since both are coherent, it's safe to lie to
> > userspace and say it's all WC.  Is that correct or am I missing
> > something?
>
> Yes, it's accurate: it will be allocated and mapped as WC. I think we
> can just make select_tt_caching always return cached if we want, and
> TTM looks to be fine with having different caching values for the tt
> vs the io resource. Daniel, should we adjust this?
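
To make the documented rule concrete, here is a userspace sketch (an
illustration only, not iris/anv code; region instance 0 and the omitted
error handling are assumptions) of creating an object whose only
placement is device memory, which per the kernel doc above will be
mapped write-combined:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static uint32_t create_lmem_bo(int fd, uint64_t size)
{
	struct drm_i915_gem_memory_class_instance region = {
		.memory_class = I915_MEMORY_CLASS_DEVICE,
		.memory_instance = 0, /* assumed region instance */
	};
	struct drm_i915_gem_create_ext_memory_regions regions = {
		.base.name = I915_GEM_CREATE_EXT_MEMORY_REGIONS,
		.num_regions = 1,
		.regions = (uintptr_t)&region,
	};
	struct drm_i915_gem_create_ext create = {
		.size = size,
		.extensions = (uintptr_t)&regions,
	};

	/*
	 * With a DEVICE placement the object is allocated and mapped WC;
	 * set_domain on such an object now fails with -ENODEV.
	 */
	ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);
	return create.handle;
}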

Mildly related: we had an issue some time back with i915+amdgpu where
we were choosing different caching settings for SMEM shared BOs, and
the fallout was that we had all sorts of caching trouble when running
an integrated + discrete setup.
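
Likewise, the PROBE flag mentioned in the commit message replaces the
old set_domain-based userptr probing; a sketch under the same caveats
(illustrative helper name, trimmed error handling):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int create_userptr_probed(int fd, void *ptr, uint64_t size,
				 uint32_t *handle)
{
	struct drm_i915_gem_userptr arg = {
		.user_ptr = (uintptr_t)ptr,
		.user_size = size,
		.flags = I915_USERPTR_PROBE, /* GUP the pages at creation */
	};

	/* Fails (e.g. -EFAULT) if the pages cannot be grabbed. */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_USERPTR, &arg))
		return -1;

	*handle = arg.handle;
	return 0;
}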

[Intel-gfx] [PATCH 5/6] drm/ttm: Initialize debugfs from ttm_global_init()

2021-07-19 Thread Jason Ekstrand
We create a bunch of debugfs entries as a side-effect of
ttm_global_init() and then never clean them up.  This isn't usually a
problem because we free the whole debugfs directory on module unload.
However, if the global reference count ever goes to zero and then
ttm_global_init() is called again, we'll re-create those debugfs entries
and debugfs will complain in dmesg that we're creating entries that
already exist.  Fix this by changing the lifetime of the whole TTM
debugfs directory to match that of the TTM global state.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/ttm/ttm_device.c | 12 ++++++++++++
 drivers/gpu/drm/ttm/ttm_module.c |  4 ----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 519deea8e39b7..74e3b460132b3 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -44,6 +44,8 @@ static unsigned ttm_glob_use_count;
 struct ttm_global ttm_glob;
 EXPORT_SYMBOL(ttm_glob);
 
+struct dentry *ttm_debugfs_root;
+
 static void ttm_global_release(void)
 {
	struct ttm_global *glob = &ttm_glob;
@@ -53,6 +55,7 @@ static void ttm_global_release(void)
goto out;
 
ttm_pool_mgr_fini();
+   debugfs_remove(ttm_debugfs_root);
 
__free_page(glob->dummy_read_page);
memset(glob, 0, sizeof(*glob));
@@ -73,6 +76,13 @@ static int ttm_global_init(void)
 
	si_meminfo(&si);
 
+   ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
+   if (IS_ERR(ttm_debugfs_root)) {
+   ret = PTR_ERR(ttm_debugfs_root);
+   ttm_debugfs_root = NULL;
+   goto out;
+   }
+
/* Limit the number of pages in the pool to about 50% of the total
 * system memory.
 */
@@ -100,6 +110,8 @@ static int ttm_global_init(void)
debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
				&glob->bo_count);
 out:
+	if (ret && ttm_debugfs_root)
+		debugfs_remove(ttm_debugfs_root);
 	if (ret)
 		--ttm_glob_use_count;
 	mutex_unlock(&ttm_global_mutex);
diff --git a/drivers/gpu/drm/ttm/ttm_module.c b/drivers/gpu/drm/ttm/ttm_module.c
index 997c458f68a9a..88554f2db11fe 100644
--- a/drivers/gpu/drm/ttm/ttm_module.c
+++ b/drivers/gpu/drm/ttm/ttm_module.c
@@ -72,17 +72,13 @@ pgprot_t ttm_prot_from_caching(enum ttm_caching caching, pgprot_t tmp)
return tmp;
 }
 
-struct dentry *ttm_debugfs_root;
-
 static int __init ttm_init(void)
 {
-   ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
return 0;
 }
 
 static void __exit ttm_exit(void)
 {
-   debugfs_remove(ttm_debugfs_root);
 }
 
 module_init(ttm_init);
-- 
2.31.1



[Intel-gfx] [PATCH 6/6] drm/i915: Make the kmem slab for i915_buddy_block a global

2021-07-19 Thread Jason Ekstrand
There's no reason, as far as I can tell, for this to be per-i915_buddy_mm,
and doing so causes KMEM_CACHE to throw dmesg warnings because it tries
to create a debugfs entry with the name i915_buddy_block multiple times.
We could handle this by carefully giving each slab its own name but that
brings its own pain because then we have to store that string somewhere
and manage the lifetimes of the different slabs.  The most likely
outcome would be a global atomic which we increment to get a new name or
something like that.

The much easier solution is to use the i915_globals system like we do
for every other slab in i915.  This ensures that we have exactly one of
them for each i915 driver load and it gets neatly created on module load
and destroyed on module unload.  Using the globals system also means
that it's now tied into the shrink handler so we can properly respond to
low-memory situations.

Signed-off-by: Jason Ekstrand 
Fixes: 88be9a0a06b7 ("drm/i915/ttm: add ttm_buddy_man")
Cc: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_buddy.c   | 44 ++++++++++++++++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_buddy.h   |  3 ++-
 drivers/gpu/drm/i915/i915_globals.c |  2 ++
 3 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_buddy.c b/drivers/gpu/drm/i915/i915_buddy.c
index 29dd7d0310c1f..911feedad4513 100644
--- a/drivers/gpu/drm/i915/i915_buddy.c
+++ b/drivers/gpu/drm/i915/i915_buddy.c
@@ -8,8 +8,14 @@
 #include "i915_buddy.h"
 
 #include "i915_gem.h"
+#include "i915_globals.h"
 #include "i915_utils.h"
 
+static struct i915_global_buddy {
+   struct i915_global base;
+   struct kmem_cache *slab_blocks;
+} global;
+
 static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
						 struct i915_buddy_block *parent,
						 unsigned int order,
@@ -19,7 +25,7 @@ static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
 
GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER);
 
-   block = kmem_cache_zalloc(mm->slab_blocks, GFP_KERNEL);
+   block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL);
if (!block)
return NULL;
 
@@ -34,7 +40,7 @@ static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
 static void i915_block_free(struct i915_buddy_mm *mm,
struct i915_buddy_block *block)
 {
-   kmem_cache_free(mm->slab_blocks, block);
+   kmem_cache_free(global.slab_blocks, block);
 }
 
 static void mark_allocated(struct i915_buddy_block *block)
@@ -85,15 +91,11 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 chunk_size)
 
GEM_BUG_ON(mm->max_order > I915_BUDDY_MAX_ORDER);
 
-   mm->slab_blocks = KMEM_CACHE(i915_buddy_block, SLAB_HWCACHE_ALIGN);
-   if (!mm->slab_blocks)
-   return -ENOMEM;
-
mm->free_list = kmalloc_array(mm->max_order + 1,
  sizeof(struct list_head),
  GFP_KERNEL);
if (!mm->free_list)
-   goto out_destroy_slab;
+   return -ENOMEM;
 
for (i = 0; i <= mm->max_order; ++i)
		INIT_LIST_HEAD(&mm->free_list[i]);
@@ -145,8 +147,6 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 chunk_size)
kfree(mm->roots);
 out_free_list:
kfree(mm->free_list);
-out_destroy_slab:
-   kmem_cache_destroy(mm->slab_blocks);
return -ENOMEM;
 }
 
@@ -161,7 +161,6 @@ void i915_buddy_fini(struct i915_buddy_mm *mm)
 
kfree(mm->roots);
kfree(mm->free_list);
-   kmem_cache_destroy(mm->slab_blocks);
 }
 
 static int split_block(struct i915_buddy_mm *mm,
@@ -410,3 +409,28 @@ int i915_buddy_alloc_range(struct i915_buddy_mm *mm,
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_buddy.c"
 #endif
+
+static void i915_global_buddy_shrink(void)
+{
+   kmem_cache_shrink(global.slab_blocks);
+}
+
+static void i915_global_buddy_exit(void)
+{
+   kmem_cache_destroy(global.slab_blocks);
+}
+
+static struct i915_global_buddy global = { {
+   .shrink = i915_global_buddy_shrink,
+   .exit = i915_global_buddy_exit,
+} };
+
+int __init i915_global_buddy_init(void)
+{
+   global.slab_blocks = KMEM_CACHE(i915_buddy_block, 0);
+   if (!global.slab_blocks)
+   return -ENOMEM;
+
+	i915_global_register(&global.base);
+   return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_buddy.h b/drivers/gpu/drm/i915/i915_buddy.h
index 37f8c42071d12..d8f26706de52f 100644
--- a/drivers/gpu/drm/i915/i915_buddy.h
+++ b/drivers/gpu/drm/i915/i915_buddy.h
@@ -47,7 +47,6 @@ struct i915_buddy_block {
  * i915_buddy_alloc* and i915_buddy_free* should suffice.
  */
 struct i915_buddy_mm {
-   struct kmem_cache *slab_blocks;
/* Maintain 

[Intel-gfx] [PATCH 4/6] drm/ttm: Force re-init if ttm_global_init() fails

2021-07-19 Thread Jason Ekstrand
If we have a failure, decrement the reference count so that the next
call to ttm_global_init() will actually do something instead of assuming
everything is already set up.

Signed-off-by: Jason Ekstrand 
Fixes: 62b53b37e4b1 ("drm/ttm: use a static ttm_bo_global instance")
Cc: Christian König 
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 5f31acec3ad76..519deea8e39b7 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -100,6 +100,8 @@ static int ttm_global_init(void)
debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
				&glob->bo_count);
 out:
+	if (ret)
+		--ttm_glob_use_count;
 	mutex_unlock(&ttm_global_mutex);
return ret;
 }
-- 
2.31.1


