Re: [Intel-gfx] [PATCH 6/7] drm/i915/perf: add interrupt enabling parameter

2020-03-03 Thread Dixit, Ashutosh
On Tue, 03 Mar 2020 14:19:04 -0800, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> This let's the application choose to be driven by the interrupt
> mechanism of the HW. In conjuction with long periods for checks for
> the availability of data on the CPU, this can reduce the CPU load when
> doing capture of OA data.
>
> v2: Version the new parameter (Joonas)
> v3: Rebase (Umesh)
>
> Signed-off-by: Lionel Landwerlin 
> Signed-off-by: Umesh Nerlige Ramappa 

[snip]

> + /**
> +  * Specifying this property sets up the interrupt mechanism for the OA
> +  * buffer in i915. This option in conjuction with a long polling delay
> +  * for avaibility of OA data can reduce CPU load significantly if you
> +  * do not care about OA data being read as soon as it's available.
> +  *
> +  * This property is available in perf revision 5.
> +  */
> + DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT,

What if we do not expose this parameter in the uapi at all and internally
decide in i915 whether to leave the interrupt either always enabled or
always disabled (and in that case always use the hrtimer)? This way we
retain flexibility in i915 if hardware evolves in the future e.g. to use
watermarks for the interrupt, without yielding control to userspace.

Overall I feel we should avoid exposing unnecessary details of the internal
implemenation to userspace, they would be neither interested in knowing
internal details nor know how to properly use these parameters. Shouldn't
the driver be able to make these kinds of decisions internally?

At this point the only parameter which implicitly exposed to userspace is
the hrtimer poll period, so perhaps all we need to do is to expose that in
the uapi? Thoughts?
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 7/7] drm/i915/perf: add flushing ioctl

2020-03-03 Thread Dixit, Ashutosh
On Tue, 03 Mar 2020 14:19:05 -0800, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> With the currently available parameters for the i915-perf stream,
> there are still situations that are not well covered :
>
> If an application opens the stream with polling disable or at very low
> frequency and OA interrupt enabled, no data will be available even
> though somewhere between nothing and half of the OA buffer worth of
> data might have landed in memory.
>
> To solve this issue we have a new flush ioctl on the perf stream that
> forces the i915-perf driver to look at the state of the buffer when
> called and makes any data available through both poll() & read() type
> syscalls.
>
> v2: Version the ioctl (Joonas)
> v3: Rebase (Umesh)
>
> Signed-off-by: Lionel Landwerlin 
> Signed-off-by: Umesh Nerlige Ramappa 

[snip]

> +/**
> + * i915_perf_flush_data - handle `I915_PERF_IOCTL_FLUSH_DATA` ioctl
> + * @stream: An enabled i915 perf stream
> + *
> + * The intention is to flush all the data available for reading from the OA
> + * buffer
> + */
> +static void i915_perf_flush_data(struct i915_perf_stream *stream)
> +{
> + stream->pollin = oa_buffer_check(stream, true);
> +}

Since this function doesn't actually wake up any thread (which anyway can
be done by sending a signal to the blocked thread), is the only purpose of
this function to update OA buffer head/tail? But in that it is not clear
why a separate ioctl should be created for this, can't the read() call
itself call oa_buffer_check() to update the OA buffer head/tail?

Again just trying to minimize uapi changes if possible.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 7/7] drm/i915/perf: add flushing ioctl

2020-03-04 Thread Dixit, Ashutosh
On Wed, 04 Mar 2020 00:52:34 -0800, Lionel Landwerlin wrote:
>
> On 04/03/2020 07:48, Dixit, Ashutosh wrote:
> > On Tue, 03 Mar 2020 14:19:05 -0800, Umesh Nerlige Ramappa wrote:
> >> From: Lionel Landwerlin 
> >>
> >> With the currently available parameters for the i915-perf stream,
> >> there are still situations that are not well covered :
> >>
> >> If an application opens the stream with polling disable or at very low
> >> frequency and OA interrupt enabled, no data will be available even
> >> though somewhere between nothing and half of the OA buffer worth of
> >> data might have landed in memory.
> >>
> >> To solve this issue we have a new flush ioctl on the perf stream that
> >> forces the i915-perf driver to look at the state of the buffer when
> >> called and makes any data available through both poll() & read() type
> >> syscalls.
> >>
> >> v2: Version the ioctl (Joonas)
> >> v3: Rebase (Umesh)
> >>
> >> Signed-off-by: Lionel Landwerlin 
> >> Signed-off-by: Umesh Nerlige Ramappa 
> > [snip]
> >
> >> +/**
> >> + * i915_perf_flush_data - handle `I915_PERF_IOCTL_FLUSH_DATA` ioctl
> >> + * @stream: An enabled i915 perf stream
> >> + *
> >> + * The intention is to flush all the data available for reading from the 
> >> OA
> >> + * buffer
> >> + */
> >> +static void i915_perf_flush_data(struct i915_perf_stream *stream)
> >> +{
> >> +  stream->pollin = oa_buffer_check(stream, true);
> >> +}
> > Since this function doesn't actually wake up any thread (which anyway can
> > be done by sending a signal to the blocked thread), is the only purpose of
> > this function to update OA buffer head/tail? But in that it is not clear
> > why a separate ioctl should be created for this, can't the read() call
> > itself call oa_buffer_check() to update the OA buffer head/tail?
> >
> > Again just trying to minimize uapi changes if possible.
>
> Most applications will call read() after being notified by poll()/select()
> that some data is available.

Correct this is the standard non blocking read behavior.

> Changing that behavior will break some of the existing perf tests .

I am not suggesting changing that (that standard non blocking read
behavior).

> If any data is available, this new ioctl will wake up existing waiters on
> poll()/select().

The issue is we are not calling wake_up() in the above function to wake up
any blocked waiters. The ioctl will just update the OA buffer head/tail so
that (a) a subsequent blocking read will not block, or (b) a subsequent non
blocking read will return valid data (not -EAGAIN), or (c) a poll/select
will not block but return immediately saying data is available.

That is why it seems to me the ioctl is not required, updating the OA
buffer head/tail can be done as part of the read() (and the poll/select)
calls themselves.

We will investigate if this can be done and update the patches in the next
revision accordingly. Thanks!
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 7/7] drm/i915/perf: add flushing ioctl

2020-03-10 Thread Dixit, Ashutosh
On Tue, 10 Mar 2020 13:44:30 -0700, Lionel Landwerlin wrote:
>
> On 09/03/2020 21:51, Umesh Nerlige Ramappa wrote:
> > On Wed, Mar 04, 2020 at 09:56:28PM -0800, Dixit, Ashutosh wrote:
> >> On Wed, 04 Mar 2020 00:52:34 -0800, Lionel Landwerlin wrote:
> >>>
> >>> On 04/03/2020 07:48, Dixit, Ashutosh wrote:
> >>> > On Tue, 03 Mar 2020 14:19:05 -0800, Umesh Nerlige Ramappa wrote:
> >>> >> From: Lionel Landwerlin 
> >>> >>
> >>> >> With the currently available parameters for the i915-perf stream,
> >>> >> there are still situations that are not well covered :
> >>> >>
> >>> >> If an application opens the stream with polling disable or at very
> >>> low
> >>> >> frequency and OA interrupt enabled, no data will be available even
> >>> >> though somewhere between nothing and half of the OA buffer worth of
> >>> >> data might have landed in memory.
> >>> >>
> >>> >> To solve this issue we have a new flush ioctl on the perf stream
> >>> that
> >>> >> forces the i915-perf driver to look at the state of the buffer when
> >>> >> called and makes any data available through both poll() & read()
> >>> type
> >>> >> syscalls.
> >>> >>
> >>> >> v2: Version the ioctl (Joonas)
> >>> >> v3: Rebase (Umesh)
> >>> >>
> >>> >> Signed-off-by: Lionel Landwerlin 
> >>> >> Signed-off-by: Umesh Nerlige Ramappa
> >>> 
> >>> > [snip]
> >>> >
> >>> >> +/**
> >>> >> + * i915_perf_flush_data - handle `I915_PERF_IOCTL_FLUSH_DATA` ioctl
> >>> >> + * @stream: An enabled i915 perf stream
> >>> >> + *
> >>> >> + * The intention is to flush all the data available for reading
> >>> from the OA
> >>> >> + * buffer
> >>> >> + */
> >>> >> +static void i915_perf_flush_data(struct i915_perf_stream *stream)
> >>> >> +{
> >>> >> +    stream->pollin = oa_buffer_check(stream, true);
> >>> >> +}
> >>> > Since this function doesn't actually wake up any thread (which anyway
> >>> can
> >>> > be done by sending a signal to the blocked thread), is the only
> >>> purpose of
> >>> > this function to update OA buffer head/tail? But in that it is not
> >>> clear
> >>> > why a separate ioctl should be created for this, can't the read()
> >>> call
> >>> > itself call oa_buffer_check() to update the OA buffer head/tail?
> >>> >
> >>> > Again just trying to minimize uapi changes if possible.
> >>>
> >>> Most applications will call read() after being notified by
> >>> poll()/select()
> >>> that some data is available.
> >>
> >> Correct this is the standard non blocking read behavior.
> >>
> >>> Changing that behavior will break some of the existing perf tests .
> >>
> >> I am not suggesting changing that (that standard non blocking read
> >> behavior).
> >>
> >>> If any data is available, this new ioctl will wake up existing waiters
> >>> on
> >>> poll()/select().
> >>
> >> The issue is we are not calling wake_up() in the above function to wake
> >> up
> >> any blocked waiters. The ioctl will just update the OA buffer head/tail
> >> so
> >> that (a) a subsequent blocking read will not block, or (b) a subsequent
> >> non
> >> blocking read will return valid data (not -EAGAIN), or (c) a poll/select
> >> will not block but return immediately saying data is available.
> >>
> >> That is why it seems to me the ioctl is not required, updating the OA
> >> buffer head/tail can be done as part of the read() (and the poll/select)
> >> calls themselves.
> >>
> >> We will investigate if this can be done and update the patches in the
> >> next
> >> revision accordingly. Thanks!
> >
> > In this case, where we are trying to determine if there is any data in
> > the oa buffer before the next interrupt has fired, user could call poll
> > with a reasonable timeout to determine if data is available or not.  That
> > would eliminate the need for the flush ioctl. Thoughts?
> >
> > Thanks,
> > Umesh
>
>
> I almost forgot why this would cause problem.
>
> Checking the state of the buffer every time you call poll() will pretty
> much guarantee you have at least one report to read every time.
>
> So that would lead to lot more wakeups :(
>
> The whole system has to stay "unidirectional" with either interrupts or
> timeout driving the wakeups.
>
> This additional ioctl is the only solution I could find to add one more
> input to the wakeup mechanism.

Well, aren't we asking the app to sleep for time T and then call flush
(followed by read)? Then we might as well ask them to sleep for time T and
call poll? Or we can ask them set the hrtimer to T, skip the sleep and call
poll (followed by read)? Aren't these 3 mechanisms equivalent? To me the
last option seems to be the cleanest. Thanks!
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 4/7] drm/i915/perf: add new open param to configure polling of OA buffer

2020-03-12 Thread Dixit, Ashutosh
On Tue, 03 Mar 2020 14:19:02 -0800, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> This new parameter let's the application choose how often the OA
> buffer should be checked on the CPU side for data availability. Longer
> polling period tend to reduce CPU overhead if the application does not
> care about somewhat real time data collection.
>
> v2: Allow disabling polling completely with 0 value (Lionel)
> v3: Version the new parameter (Joonas)
> v4: Rebase (Umesh)

Hi Lionel, I was thinking that one way to keep things simple for now (and
fix the high cpu usage issue) would be to expose _ONLY_ this OA poll period
parameter to user space. That is we don't expose the interrupt or the flush
ioctl to user space at this time. This way the user will be able to
configure the hrtimer frequency to a lower value to bring down the cpu
usage.

Also we would disallow disabling the timer (and internally also not use the
interrupt). So everything will happen in exactly the same way as it used to
(no other changes needed) but at a lower rate if the user so chooses.

What do you think about this?

Thanks!
--
Ashutosh
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 4/7] drm/i915/perf: add new open param to configure polling of OA buffer

2020-03-12 Thread Dixit, Ashutosh
On Thu, 12 Mar 2020 13:37:12 -0700, Lionel Landwerlin wrote:
> On 12/03/2020 21:27, Dixit, Ashutosh wrote:
> > On Tue, 03 Mar 2020 14:19:02 -0800, Umesh Nerlige Ramappa wrote:
> >> From: Lionel Landwerlin 
> >>
> >> This new parameter let's the application choose how often the OA
> >> buffer should be checked on the CPU side for data availability. Longer
> >> polling period tend to reduce CPU overhead if the application does not
> >> care about somewhat real time data collection.
> >>
> >> v2: Allow disabling polling completely with 0 value (Lionel)
> >> v3: Version the new parameter (Joonas)
> >> v4: Rebase (Umesh)
> > Hi Lionel, I was thinking that one way to keep things simple for now (and
> > fix the high cpu usage issue) would be to expose _ONLY_ this OA poll period
> > parameter to user space. That is we don't expose the interrupt or the flush
> > ioctl to user space at this time. This way the user will be able to
> > configure the hrtimer frequency to a lower value to bring down the cpu
> > usage.
> >
> > Also we would disallow disabling the timer (and internally also not use the
> > interrupt). So everything will happen in exactly the same way as it used to
> > (no other changes needed) but at a lower rate if the user so chooses.
> >
> > What do you think about this?
> >
> > Thanks!
> > --
> > Ashutosh
>
> Sure, just make sure the users know about this.

Ok, so the plan now is to just post and review/merge the first 4 patches
mostly as is, except that the poll timer cannot be disabled. IMO this
should solve the cpu usage issue. Then we can take up the remaining 3
interrupt and flush patches and see if they are really needed and move them
forward if they are.

> The fact that they can now select timer values that will potentially lead
> to the loss of the buffer's data because it was overridden.

I think you mean over-written. You are right but I think there is way
around this and we can post that patch soon which I think will avoid this
issue too.

Thanks!
--
Ashutosh
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] Patches that didn't applied cleanly on drm-intel-next-fixes

2020-04-08 Thread Dixit, Ashutosh
On Wed, 08 Apr 2020 13:59:38 -0700, Rodrigo Vivi wrote:
>
> Hi Ashutosh and Chris,
>
> these patches seems needed for 5.7 but didn't applied cleanly on dinf:
>
> Failed to cherry-pick:
> 6352219c39c0 ("drm/i915/perf: Do not clear pollin for small user read 
> buffers")
> 614654abe847 ("drm/i915: Check current i915_vma.pin_count status first on 
> unbind")
> c4e8ba739034 ("drm/i915/gt: Yield the timeslice if caught waiting on a user 
> semaphore")
>
> If they are critical for 5.7 please provide a version that applies or list 
> the dependencies.

I have sent a version of 6352219c39c0 to intel-gfx which should apply on dinf:

https://patchwork.freedesktop.org/series/75085/#rev8

Thanks!
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] lib: Use read() for timerfd timeout detection

2020-04-14 Thread Dixit, Ashutosh
On Tue, 14 Apr 2020 12:05:09 -0700, Chris Wilson wrote:
>
> The poll() is proving unreliable, where our tests timeout without the
> spinner being terminated. Let's try a blocking read instead!

Weird, wondering if all we need to do is set TFD_NONBLOCK on the fd?

>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1676
> Signed-off-by: Chris Wilson 
> ---
>  lib/igt_dummyload.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/lib/igt_dummyload.c b/lib/igt_dummyload.c
> index 99ca84ad8..a59afd45b 100644
> --- a/lib/igt_dummyload.c
> +++ b/lib/igt_dummyload.c
> @@ -399,12 +399,13 @@ igt_spin_factory(int fd, const struct igt_spin_factory 
> *opts)
>  static void *timer_thread(void *data)
>  {
>   igt_spin_t *spin = data;
> - struct pollfd pfd = {
> - .fd = spin->timerfd,
> - .events = POLLIN,
> - };
> + uint64_t overruns = 0;
> + int ret;
>
> - if (poll(, 1, -1) >= 0)
> + do {
> + ret = read(spin->timerfd, , sizeof(overruns));
> + } while (ret == -1 && errno == EINTR);

do {} while (!overruns) and skip if () below?

> + if (overruns)
>   igt_spin_end(spin);
>
>   return NULL;
> --
> 2.26.0
>
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads

2020-04-14 Thread Dixit, Ashutosh
On Tue, 14 Apr 2020 09:59:48 -0700, Umesh Nerlige Ramappa wrote:
> On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
> > On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
> >>
> >> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> >> >
> >> > A condition in wait_event_interruptible seems to be checked twice before
> >> > waiting on the event to occur. These checks are redundant when hrtimer
> >> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> >> > pointers. The redundant checks add cpu overhead. Simplify the check
> >> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> >> >
> >> > Signed-off-by: Umesh Nerlige Ramappa 
> >> > ---
> >> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++-
> >> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> >> > b/drivers/gpu/drm/i915/i915_perf.c
> >> > index 5cde3e4e7be6..e28a3ab83fde 100644
> >> > --- a/drivers/gpu/drm/i915/i915_perf.c
> >> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct 
> >> > i915_perf_stream *stream)
> >> >  return pollin;
> >> >  }
> >> >
> >> > +/**
> >> > + * oa_buffer_check_reports - quick check if reports are available
> >> > + * @stream: i915 stream instance
> >> > + *
> >> > + * The return from this function is used as a condition for
> >> > + * wait_event_interruptible in blocking read. This is used to detect
> >> > + * available reports.
> >> > + *
> >> > + * A condition in wait_event_interruptible seems to be checked twice 
> >> > before
> >> > + * waiting on an event to occur. These checks are redundant when 
> >> > hrtimer events
> >> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail 
> >> > pointers. The
> >> > + * redundant checks add cpu overhead. We simplify the check to reduce 
> >> > cpu
> >> > + * overhead.
> >> > + */
> >> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> >> > +{
> >> > +unsigned long flags;
> >> > +bool available;
> >> > +
> >> > +spin_lock_irqsave(>oa_buffer.ptr_lock, flags);
> >> > +available = stream->oa_buffer.tail != stream->oa_buffer.head;
> >> > +spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> >> > +
> >> > +return available;
> >> > +}
> >> > +
> >> >  /**
> >> >   * append_oa_status - Appends a status record to a userspace read() 
> >> > buffer.
> >> >   * @stream: An i915-perf stream opened for OA metrics
> >> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct 
> >> > i915_perf_stream *stream)
> >> >  return -EIO;
> >> >
> >> >  return wait_event_interruptible(stream->poll_wq,
> >> > -
> >> > oa_buffer_check_unlocked(stream));
> >> > +
> >> > oa_buffer_check_reports(stream));
> >> >  }
> >>
> >> I think the problem with this patch is that the original code had the
> >> property that the condition for data availability is checked (and the OA
> >> tail advanced) /before/ entering the wait. So the tail was advanced and if
> >> there was data there was no need to wait at all. This change has lost that
> >> property. With this change we will first always enter the wait and then get
> >> unblocked after the timer interrupt which will advance the tail and wake us
> >> up.
> >>
> >> I think if we want to do this, it is possible to do without losing the
> >> original property. Just call oa_buffer_check_unlocked() first (outside
> >> wait_event) and if there is data just return. If not, put yourself on
> >> stream->poll_wq from which the timer callback will wake us up. I think this
> >> is going to be something like prepare_to_wait() / schedule() /
> >> finish_wait() pattern of which there are numerous examples in the
> >> kernel. So in this case we won't call wait_event_interruptible() which
> >> checks

Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads

2020-04-14 Thread Dixit, Ashutosh
On Tue, 14 Apr 2020 12:09:42 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 14 Apr 2020 09:59:48 -0700, Umesh Nerlige Ramappa wrote:
> > On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
> > > On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
> > >>
> > >> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> > >> >
> > >> > A condition in wait_event_interruptible seems to be checked twice 
> > >> > before
> > >> > waiting on the event to occur. These checks are redundant when hrtimer
> > >> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> > >> > pointers. The redundant checks add cpu overhead. Simplify the check
> > >> > to reduce cpu overhead when using blocking io to read oa buffer 
> > >> > reports.
> > >> >
> > >> > Signed-off-by: Umesh Nerlige Ramappa 
> > >> > ---
> > >> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++-
> > >> >  1 file changed, 27 insertions(+), 1 deletion(-)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> > >> > b/drivers/gpu/drm/i915/i915_perf.c
> > >> > index 5cde3e4e7be6..e28a3ab83fde 100644
> > >> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > >> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > >> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct 
> > >> > i915_perf_stream *stream)
> > >> >return pollin;
> > >> >  }
> > >> >
> > >> > +/**
> > >> > + * oa_buffer_check_reports - quick check if reports are available
> > >> > + * @stream: i915 stream instance
> > >> > + *
> > >> > + * The return from this function is used as a condition for
> > >> > + * wait_event_interruptible in blocking read. This is used to detect
> > >> > + * available reports.
> > >> > + *
> > >> > + * A condition in wait_event_interruptible seems to be checked twice 
> > >> > before
> > >> > + * waiting on an event to occur. These checks are redundant when 
> > >> > hrtimer events
> > >> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail 
> > >> > pointers. The
> > >> > + * redundant checks add cpu overhead. We simplify the check to reduce 
> > >> > cpu
> > >> > + * overhead.
> > >> > + */
> > >> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> > >> > +{
> > >> > +  unsigned long flags;
> > >> > +  bool available;
> > >> > +
> > >> > +  spin_lock_irqsave(>oa_buffer.ptr_lock, flags);
> > >> > +  available = stream->oa_buffer.tail != stream->oa_buffer.head;
> > >> > +  spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> > >> > +
> > >> > +  return available;
> > >> > +}
> > >> > +
> > >> >  /**
> > >> >   * append_oa_status - Appends a status record to a userspace read() 
> > >> > buffer.
> > >> >   * @stream: An i915-perf stream opened for OA metrics
> > >> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct 
> > >> > i915_perf_stream *stream)
> > >> >return -EIO;
> > >> >
> > >> >return wait_event_interruptible(stream->poll_wq,
> > >> > -  
> > >> > oa_buffer_check_unlocked(stream));
> > >> > +  
> > >> > oa_buffer_check_reports(stream));
> > >> >  }
> > >>
> > >> I think the problem with this patch is that the original code had the
> > >> property that the condition for data availability is checked (and the OA
> > >> tail advanced) /before/ entering the wait. So the tail was advanced and 
> > >> if
> > >> there was data there was no need to wait at all. This change has lost 
> > >> that
> > >> property. With this change we will first always enter the wait and then 
> > >> get
> > >> unblocked after the timer interrupt which will advance the tail and wake 
> > >> us
> > >> up.
> > >>
> > >

Re: [Intel-gfx] [PATCH 3/3] drm/i915/perf: add interrupt enabling parameter

2020-04-17 Thread Dixit, Ashutosh
On Mon, 13 Apr 2020 08:48:22 -0700, Umesh Nerlige Ramappa wrote:
>
> @@ -556,16 +559,23 @@ static bool oa_buffer_check_unlocked(struct 
> i915_perf_stream *stream)
>   * waiting on an event to occur. These checks are redundant when hrtimer 
> events
>   * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. 
> The
>   * redundant checks add cpu overhead. We simplify the check to reduce cpu
> - * overhead.
> + * overhead. For interrupt events, we still need to make sure that
> + * oa_buffer_check_unlocked is called when an interrupt occurs.
>   */
>  static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>  {
>   unsigned long flags;
>   bool available;
>
> - spin_lock_irqsave(>oa_buffer.ptr_lock, flags);
> - available = stream->oa_buffer.tail != stream->oa_buffer.head;
> - spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> + if (!stream->oa_interrupt_monitor) {
> + spin_lock_irqsave(>oa_buffer.ptr_lock, flags);
> + available = stream->oa_buffer.tail != stream->oa_buffer.head;
> + spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> + } else {
> + if (stream->half_full_count_last !=
> + atomic64_read(>half_full_count))
> + available = oa_buffer_check_unlocked(stream);
> + }

This logic is broken if we have both the interrupt and the timer enabled?
Anyway as I said for Patch 1, oa_buffer_check_reports() should not be
needed (and hopefully neither the half_full_count's as per the comment in
Patch 2).

> @@ -3710,13 +3736,16 @@ static int read_properties_unlocked(struct i915_perf 
> *perf,
>   break;
>   }
>   case DRM_I915_PERF_PROP_POLL_OA_PERIOD:
> - if (value < 10 /* 100us */) {
> + if (value > 0 && value < 10 /* 100us */) {
>   DRM_DEBUG("OA availability timer too small 
> (%lluns < 100us)\n",
> value);
>   return -EINVAL;
>   }
>   props->poll_oa_period = value;
>   break;
> + case DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT:
> + props->oa_interrupt_monitor = value != 0;

At one point it was fashionable to write this as '= !!value' but I believe
even that is not needed now and this can be written as '= value'.

> @@ -3725,6 +3754,19 @@ static int read_properties_unlocked(struct i915_perf 
> *perf,
>   uprop += 2;
>   }
>
> + /*
> +  * Blocking read need to be waken up by some mechanism. If no polling
> +  * of the HEAD/TAIL register is done by the kernel and no interrupt is
> +  * enabled, we'll never be able to wake up.
> +  */
> + if ((open_flags & I915_PERF_FLAG_FD_NONBLOCK) == 0 &&
> + !props->poll_oa_period &&
> + !props->oa_interrupt_monitor) {
> + DRM_DEBUG("Requesting a blocking stream with no polling period "
> +   "& no interrupt.\n");
> + return -EINVAL;
> + }

Shouldn't this be true for non-blocking reads too since the poll() can
block indefinitely if both timer and interrupt are disabled? I think we
should disallow disabling both in either case.

> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 14b67cd6b54b..947e65b937eb 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1987,12 +1987,23 @@ enum drm_i915_perf_property_id {
>* the driver if this parameter is not specified. Note that larger timer
>* values will reduce cpu consumption during OA perf captures. However,
>* excessively large values would potentially result in OA buffer
> -  * overwrites as captures reach end of the OA buffer.
> +  * overwrites as captures reach end of the OA buffer. A value of 0 means
> +  * no hrtimer will be started.
>*
>* This property is available in perf revision 5.
>*/
>   DRM_I915_PERF_PROP_POLL_OA_PERIOD,
>
> + /**
> +  * Specifying this property sets up the interrupt mechanism for the OA
> +  * buffer in i915. This option in conjunction with a long polling period
> +  * for avaibility of OA data can reduce CPU load significantly if you
> +  * do not care about OA data being read as soon as it's available.
> +  *
> +  * This property is available in perf revision 6.
> +  */
> + DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT,

I am still torn about exposing this new control to userspace. If we don't
we can have the interrupt always enabled and just document that the timer
can be disabled and when the timer is disabled the the OA sampling will be
driven off the interrupt. Anyway, if we decide that exposing this to user
space is better or more explicit, I'll go with it.

In either case, we need to add to the documentation above 

Re: [Intel-gfx] [PATCH 2/3] drm/i915: handle interrupts from the OA unit

2020-04-17 Thread Dixit, Ashutosh
On Mon, 13 Apr 2020 08:48:21 -0700, Umesh Nerlige Ramappa wrote:
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index 0cc7dd54f4f9..61eee9fb8872 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -94,6 +94,18 @@ gen11_gt_engine_identity(struct intel_gt *gt,
>   return ident;
>  }
>
> +static void notify_perfmon_buffer_half_full(struct drm_i915_private *i915)
> +{
> + atomic64_inc(>perf.exclusive_stream->half_full_count);
> + wake_up_all(>perf.exclusive_stream->poll_wq);
> +}
> +

I was expecting this function to be almost the same as the timer
oa_poll_check_timer_cb(), something like, maybe with minor variations:

static void notify_perfmon_buffer_half_full(struct drm_i915_private *i915)
{
struct i915_perf_stream *stream = i915->perf.exclusive_stream;

if (oa_buffer_check_unlocked(stream)) {
stream->pollin = true;
wake_up(>poll_wq);
}
}

And consequently I was expecting to see zero changes to functions such as
oa_buffer_check_unlocked() and i915_perf_poll_locked() since everything
else is driven off stream->pollin as it is in case the timer callback.

So my question is why is notify_perfmon_buffer_half_full() not as I've
written above and what purpose are these new members half_full_count and
half_full_count_last serving? If it is to save a few cycles to adjust the
tail in oa_buffer_check_unlocked() (and I am not even sure of that) for an
interrupt which fires when half the buffer is full imo it is not worth it.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads

2020-04-14 Thread Dixit, Ashutosh
On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
>
> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> >
> > A condition in wait_event_interruptible seems to be checked twice before
> > waiting on the event to occur. These checks are redundant when hrtimer
> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> > pointers. The redundant checks add cpu overhead. Simplify the check
> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> >
> > Signed-off-by: Umesh Nerlige Ramappa 
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++-
> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> > b/drivers/gpu/drm/i915/i915_perf.c
> > index 5cde3e4e7be6..e28a3ab83fde 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct 
> > i915_perf_stream *stream)
> > return pollin;
> >  }
> >
> > +/**
> > + * oa_buffer_check_reports - quick check if reports are available
> > + * @stream: i915 stream instance
> > + *
> > + * The return from this function is used as a condition for
> > + * wait_event_interruptible in blocking read. This is used to detect
> > + * available reports.
> > + *
> > + * A condition in wait_event_interruptible seems to be checked twice before
> > + * waiting on an event to occur. These checks are redundant when hrtimer 
> > events
> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail 
> > pointers. The
> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> > + * overhead.
> > + */
> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> > +{
> > +   unsigned long flags;
> > +   bool available;
> > +
> > +   spin_lock_irqsave(>oa_buffer.ptr_lock, flags);
> > +   available = stream->oa_buffer.tail != stream->oa_buffer.head;
> > +   spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> > +
> > +   return available;
> > +}
> > +
> >  /**
> >   * append_oa_status - Appends a status record to a userspace read() buffer.
> >   * @stream: An i915-perf stream opened for OA metrics
> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct 
> > i915_perf_stream *stream)
> > return -EIO;
> >
> > return wait_event_interruptible(stream->poll_wq,
> > -   oa_buffer_check_unlocked(stream));
> > +   oa_buffer_check_reports(stream));
> >  }
>
> I think the problem with this patch is that the original code had the
> property that the condition for data availability is checked (and the OA
> tail advanced) /before/ entering the wait. So the tail was advanced and if
> there was data there was no need to wait at all. This change has lost that
> property. With this change we will first always enter the wait and then get
> unblocked after the timer interrupt which will advance the tail and wake us
> up.
>
> I think if we want to do this, it is possible to do without losing the
> original property. Just call oa_buffer_check_unlocked() first (outside
> wait_event) and if there is data just return. If not, put yourself on
> stream->poll_wq from which the timer callback will wake us up. I think this
> is going to be something like prepare_to_wait() / schedule() /
> finish_wait() pattern of which there are numerous examples in the
> kernel. So in this case we won't call wait_event_interruptible() which
> checks the condition upon waking up. No need to define
> oa_buffer_check_reports() either.

If on the other hand we say that this should actually be the desired
behavior of the blocking read, that it should not unblock immediately but
only after the next timer interrupt (so that an app can call the blocking
read repeatedly without worrying about it will return a little bit of data
in each call at a considerable amount of CPU load), then we may be able to
do something like this, but even then we may have to think about what would
be the correct way to do that. Though this may be that correct way and we
may just need to change the commit message, but is that what we want?
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads

2020-04-13 Thread Dixit, Ashutosh
On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
>
> A condition in wait_event_interruptible seems to be checked twice before
> waiting on the event to occur. These checks are redundant when hrtimer
> events will call oa_buffer_check_unlocked to update the oa_buffer tail
> pointers. The redundant checks add cpu overhead. Simplify the check
> to reduce cpu overhead when using blocking io to read oa buffer reports.
>
> Signed-off-by: Umesh Nerlige Ramappa 
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 28 +++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index 5cde3e4e7be6..e28a3ab83fde 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct 
> i915_perf_stream *stream)
>   return pollin;
>  }
>
> +/**
> + * oa_buffer_check_reports - quick check if reports are available
> + * @stream: i915 stream instance
> + *
> + * The return from this function is used as a condition for
> + * wait_event_interruptible in blocking read. This is used to detect
> + * available reports.
> + *
> + * A condition in wait_event_interruptible seems to be checked twice before
> + * waiting on an event to occur. These checks are redundant when hrtimer 
> events
> + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. 
> The
> + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> + * overhead.
> + */
> +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> +{
> + unsigned long flags;
> + bool available;
> +
> + spin_lock_irqsave(>oa_buffer.ptr_lock, flags);
> + available = stream->oa_buffer.tail != stream->oa_buffer.head;
> + spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> +
> + return available;
> +}
> +
>  /**
>   * append_oa_status - Appends a status record to a userspace read() buffer.
>   * @stream: An i915-perf stream opened for OA metrics
> @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct 
> i915_perf_stream *stream)
>   return -EIO;
>
>   return wait_event_interruptible(stream->poll_wq,
> - oa_buffer_check_unlocked(stream));
> + oa_buffer_check_reports(stream));
>  }

I think the problem with this patch is that the original code had the
property that the condition for data availability is checked (and the OA
tail advanced) /before/ entering the wait. So the tail was advanced and if
there was data there was no need to wait at all. This change has lost that
property. With this change we will first always enter the wait and then get
unblocked after the timer interrupt which will advance the tail and wake us
up.

I think if we want to do this, it is possible to do without losing the
original property. Just call oa_buffer_check_unlocked() first (outside
wait_event) and if there is data just return. If not, put yourself on
stream->poll_wq from which the timer callback will wake us up. I think this
is going to be something like prepare_to_wait() / schedule() /
finish_wait() pattern of which there are numerous examples in the
kernel. So in this case we won't call wait_event_interruptible() which
checks the condition upon waking up. No need to define
oa_buffer_check_reports() either.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/perf: Do not clear pollin for small user read buffers

2020-03-25 Thread Dixit, Ashutosh
On Wed, 25 Mar 2020 12:25:59 -0700, Lionel Landwerlin wrote:
>
> On 25/03/2020 20:20, Ashutosh Dixit wrote:
> > It is wrong to block the user thread in the next poll when OA data is
> > already available which could not fit in the user buffer provided in
> > the previous read. In several cases the exact user buffer size is not
> > known. Blocking user space in poll can lead to data loss when the
> > buffer size used is smaller than the available data.
> >
> > This change fixes this issue and allows user space to read all OA data
> > even when using a buffer size smaller than the available data using
> > multiple non-blocking reads rather than staying blocked in poll till
> > the next timer interrupt.
>
> Looks like you found a pretty important issue.
>
> Can you write an IGT test case so that we don't run into it again?

OK I'll see if I can come up with an IGT.

Thanks!
--
Ashutosh
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/perf: Do not clear pollin for small user read buffers

2020-03-25 Thread Dixit, Ashutosh
On Wed, 25 Mar 2020 17:32:35 -0700, Umesh Nerlige Ramappa wrote:
>
> On Wed, Mar 25, 2020 at 11:20:19AM -0700, Ashutosh Dixit wrote:
> > It is wrong to block the user thread in the next poll when OA data is
> > already available which could not fit in the user buffer provided in
> > the previous read. In several cases the exact user buffer size is not
> > known. Blocking user space in poll can lead to data loss when the
> > buffer size used is smaller than the available data.
> >
> > This change fixes this issue and allows user space to read all OA data
> > even when using a buffer size smaller than the available data using
> > multiple non-blocking reads rather than staying blocked in poll till
> > the next timer interrupt.
> >
> > Cc: Umesh Nerlige Ramappa 
> > Cc: Lionel Landwerlin 
> > Signed-off-by: Ashutosh Dixit 
> > ---
> > drivers/gpu/drm/i915/i915_perf.c | 62 ++--
> > 1 file changed, 11 insertions(+), 51 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> > b/drivers/gpu/drm/i915/i915_perf.c
> > index 3222f6cd8255..c1a47c030941 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -2957,49 +2957,6 @@ void i915_oa_init_reg_state(const struct 
> > intel_context *ce,
> > gen8_update_reg_state_unlocked(ce, stream);
> > }
> >
> > -/**
> > - * i915_perf_read_locked - _perf_stream_ops->read with error 
> > normalisation
> > - * @stream: An i915 perf stream
> > - * @file: An i915 perf stream file
> > - * @buf: destination buffer given by userspace
> > - * @count: the number of bytes userspace wants to read
> > - * @ppos: (inout) file seek position (unused)
> > - *
> > - * Besides wrapping _perf_stream_ops->read this provides a common 
> > place to
> > - * ensure that if we've successfully copied any data then reporting that 
> > takes
> > - * precedence over any internal error status, so the data isn't lost.
> > - *
> > - * For example ret will be -ENOSPC whenever there is more buffered data 
> > than
> > - * can be copied to userspace, but that's only interesting if we weren't 
> > able
> > - * to copy some data because it implies the userspace buffer is too small 
> > to
> > - * receive a single record (and we never split records).
> > - *
> > - * Another case with ret == -EFAULT is more of a grey area since it would 
> > seem
> > - * like bad form for userspace to ask us to overrun its buffer, but the 
> > user
> > - * knows best:
> > - *
> > - *   http://yarchive.net/comp/linux/partial_reads_writes.html
> > - *
> > - * Returns: The number of bytes copied or a negative error code on failure.
> > - */
> > -static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
> > -struct file *file,
> > -char __user *buf,
> > -size_t count,
> > -loff_t *ppos)
> > -{
> > -   /* Note we keep the offset (aka bytes read) separate from any
> > -* error status so that the final check for whether we return
> > -* the bytes read with a higher precedence than any error (see
> > -* comment below) doesn't need to be handled/duplicated in
> > -* stream->ops->read() implementations.
> > -*/
> > -   size_t offset = 0;
> > -   int ret = stream->ops->read(stream, buf, count, );
> > -
> > -   return offset ?: (ret ?: -EAGAIN);
> > -}
> > -
> > /**
> >  * i915_perf_read - handles read() FOP for i915 perf stream FDs
> >  * @file: An i915 perf stream file
> > @@ -3025,6 +2982,8 @@ static ssize_t i915_perf_read(struct file *file,
> > {
> > struct i915_perf_stream *stream = file->private_data;
> > struct i915_perf *perf = stream->perf;
> > +   size_t offset = 0;
> > +   int __ret;
> > ssize_t ret;
> >
> > /* To ensure it's handled consistently we simply treat all reads of a
> > @@ -3048,16 +3007,18 @@ static ssize_t i915_perf_read(struct file *file,
> > return ret;
> >
> > mutex_lock(>lock);
> > -   ret = i915_perf_read_locked(stream, file,
> > -   buf, count, ppos);
> > +   __ret = stream->ops->read(stream, buf, count, );
> > mutex_unlock(>lock);
> > } while (ret == -EAGAIN);
>
> ret will never be EAGAIN here in the while. EAGAIN was returned by the
> deleted function in this patch if offset and ret are both 0.

Good catch, I was so focussed on the non-blocking case that I missed the
blocking case.

> Although I don't see how that would be true.

As you say above, the old function i915_perf_read_locked() was doing this:

return offset ?: (__ret ?: -EAGAIN);

So -EAGAIN is returned from i915_perf_read_locked() when there is no data
to read but otherwise there is no other error. Since this is blocking read
we cannot return -EAGAIN to user space (since there is no data to read), we
must go back and 

Re: [Intel-gfx] [PATCH] drm/i915/perf: Do not clear pollin for small user read buffers

2020-03-26 Thread Dixit, Ashutosh
On Thu, 26 Mar 2020 02:09:34 -0700, Lionel Landwerlin wrote:
>
> On 26/03/2020 06:43, Ashutosh Dixit wrote:
> > It is wrong to block the user thread in the next poll when OA data is
> > already available which could not fit in the user buffer provided in
> > the previous read. In several cases the exact user buffer size is not
> > known. Blocking user space in poll can lead to data loss when the
> > buffer size used is smaller than the available data.
> >
> > This change fixes this issue and allows user space to read all OA data
> > even when using a buffer size smaller than the available data using
> > multiple non-blocking reads rather than staying blocked in poll till
> > the next timer interrupt.
> >
> > v2: Fix ret value for blocking reads (Umesh)
> >
> > Cc: Umesh Nerlige Ramappa 
> > Cc: Lionel Landwerlin 
> > Signed-off-by: Ashutosh Dixit 
> > ---
> >   drivers/gpu/drm/i915/i915_perf.c | 63 ++--
> >   1 file changed, 12 insertions(+), 51 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> > b/drivers/gpu/drm/i915/i915_perf.c
> > index 3222f6cd8255..e2d083efba6d 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -2957,49 +2957,6 @@ void i915_oa_init_reg_state(const struct 
> > intel_context *ce,
> > gen8_update_reg_state_unlocked(ce, stream);
> >   }
> >   -/**
> > - * i915_perf_read_locked - _perf_stream_ops->read with error 
> > normalisation
> > - * @stream: An i915 perf stream
> > - * @file: An i915 perf stream file
> > - * @buf: destination buffer given by userspace
> > - * @count: the number of bytes userspace wants to read
> > - * @ppos: (inout) file seek position (unused)
> > - *
> > - * Besides wrapping _perf_stream_ops->read this provides a common 
> > place to
> > - * ensure that if we've successfully copied any data then reporting that 
> > takes
> > - * precedence over any internal error status, so the data isn't lost.
> > - *
> > - * For example ret will be -ENOSPC whenever there is more buffered data 
> > than
> > - * can be copied to userspace, but that's only interesting if we weren't 
> > able
> > - * to copy some data because it implies the userspace buffer is too small 
> > to
> > - * receive a single record (and we never split records).
> > - *
> > - * Another case with ret == -EFAULT is more of a grey area since it would 
> > seem
> > - * like bad form for userspace to ask us to overrun its buffer, but the 
> > user
> > - * knows best:
> > - *
> > - *   http://yarchive.net/comp/linux/partial_reads_writes.html
> > - *
> > - * Returns: The number of bytes copied or a negative error code on failure.
> > - */
> > -static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
> > -struct file *file,
> > -char __user *buf,
> > -size_t count,
> > -loff_t *ppos)
> > -{
> > -   /* Note we keep the offset (aka bytes read) separate from any
> > -* error status so that the final check for whether we return
> > -* the bytes read with a higher precedence than any error (see
> > -* comment below) doesn't need to be handled/duplicated in
> > -* stream->ops->read() implementations.
> > -*/
> > -   size_t offset = 0;
> > -   int ret = stream->ops->read(stream, buf, count, );
> > -
> > -   return offset ?: (ret ?: -EAGAIN);
> > -}
> > -
> >   /**
> >* i915_perf_read - handles read() FOP for i915 perf stream FDs
> >* @file: An i915 perf stream file
> > @@ -3025,6 +2982,8 @@ static ssize_t i915_perf_read(struct file *file,
> >   {
> > struct i915_perf_stream *stream = file->private_data;
> > struct i915_perf *perf = stream->perf;
> > +   size_t offset = 0;
> > +   int __ret;
> > ssize_t ret;
> > /* To ensure it's handled consistently we simply treat all 
> > reads of
> > a
> > @@ -3048,16 +3007,19 @@ static ssize_t i915_perf_read(struct file *file,
> > return ret;
> > mutex_lock(>lock);
> > -   ret = i915_perf_read_locked(stream, file,
> > -   buf, count, ppos);
> > +   __ret = stream->ops->read(stream, buf, count, );
> > +   ret = offset ?: (__ret ?: -EAGAIN);
> > mutex_unlock(>lock);
> > } while (ret == -EAGAIN);
> > } else {
> > mutex_lock(>lock);
> > -   ret = i915_perf_read_locked(stream, file, buf, count, ppos);
> > +   __ret = stream->ops->read(stream, buf, count, );
> > +   ret = offset ?: (__ret ?: -EAGAIN);
> > mutex_unlock(>lock);
> > }
> >   + /* Possible values for __ret are 0, -EFAULT, -ENOSPC, -EAGAIN,
> > ... */
> > +
> > /* We allow the poll checking to sometimes report false positive EPOLLIN
> >  * events where we might actually report EAGAIN on 

Re: [Intel-gfx] [PATCH] drm/i915/perf: Do not clear pollin for small user read buffers

2020-03-26 Thread Dixit, Ashutosh
On Thu, 26 Mar 2020 11:02:46 -0700, Umesh Nerlige Ramappa wrote:
> On Wed, Mar 25, 2020 at 06:52:52PM -0700, Dixit, Ashutosh wrote:
> > On Wed, 25 Mar 2020 17:32:35 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >> On Wed, Mar 25, 2020 at 11:20:19AM -0700, Ashutosh Dixit wrote:
> >> >
> >> > +/* Possible values for __ret are 0, -EFAULT, -ENOSPC, -EAGAIN, 
> >> > ... */
> >>
> >> __ret may never be EAGAIN either (comment^). I don't see EAGAIN in the read
> >> path.
> >
> > It's here:
> >
> > gen8_append_oa_reports()
> > {
> >
> >/*
> > * An invalid tail pointer here means we're still waiting for the 
> > poll
> > * hrtimer callback to give us a pointer
> > */
> >if (tail == INVALID_TAIL_PTR)
> >return -EAGAIN;
> > }
>
> Oh, you are right, EAGAIN is returned here. I was looking for it with the
> poll period patch series applied and these references are removed in that
> series.

No, you are right, since that series is likely to be merged first, it is
better to remove it from this patch.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/perf: Do not clear pollin for small user read buffers

2020-04-03 Thread Dixit, Ashutosh
On Fri, 03 Apr 2020 09:17:14 -0700, Chris Wilson wrote:
>
> Quoting Ashutosh Dixit (2020-04-03 02:01:20)
> > It is wrong to block the user thread in the next poll when OA data is
> > already available which could not fit in the user buffer provided in
> > the previous read. In several cases the exact user buffer size is not
> > known. Blocking user space in poll can lead to data loss when the
> > buffer size used is smaller than the available data.
> >
> > This change fixes this issue and allows user space to read all OA data
> > even when using a buffer size smaller than the available data using
> > multiple non-blocking reads rather than staying blocked in poll till
> > the next timer interrupt.
> >
> > v2: Fix ret value for blocking reads (Umesh)
> > v3: Mistake during patch send (Ashutosh)
> > v4: Remove -EAGAIN from comment (Umesh)
> > v5: Improve condition for clearing pollin and return (Lionel)
> > v6: Improve blocking read loop and other cleanups (Lionel)
> > v7: Added Cc stable
> >
> > Cc: Umesh Nerlige Ramappa 
> > Cc: 
> > Reviewed-by: Lionel Landwerlin 
> > Signed-off-by: Ashutosh Dixit 
>
> Did you manage to devise a test case? It is nice (some might say
> important) to pair a patch for stable with its regression test.

Yes there is a test case here:

https://patchwork.freedesktop.org/series/75100/#rev3

Lionel verified that it is fails on stable kernels here:

https://patchwork.freedesktop.org/patch/358873/?series=75100=1

Thanks!
--
Ashutosh
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/perf: Do not clear pollin for small user read buffers

2020-03-30 Thread Dixit, Ashutosh
On Mon, 30 Mar 2020 01:23:29 -0700, Lionel Landwerlin wrote:
>
> On 28/03/2020 01:16, Ashutosh Dixit wrote:
> > It is wrong to block the user thread in the next poll when OA data is
> > already available which could not fit in the user buffer provided in
> > the previous read. In several cases the exact user buffer size is not
> > known. Blocking user space in poll can lead to data loss when the
> > buffer size used is smaller than the available data.
> >
> > This change fixes this issue and allows user space to read all OA data
> > even when using a buffer size smaller than the available data using
> > multiple non-blocking reads rather than staying blocked in poll till
> > the next timer interrupt.
> >
> > v2: Fix ret value for blocking reads (Umesh)
> > v3: Mistake during patch send (Ashutosh)
> > v4: Remove -EAGAIN from comment (Umesh)
> >
> > Cc: Umesh Nerlige Ramappa 
> > Cc: Lionel Landwerlin 
> > Signed-off-by: Ashutosh Dixit 
>
> Looks like you change makes more sense than what I suggested.
>
> I have a few nits below.

Thanks Lionel, most of what you suggested made sense so I have made those
changes and submitted a v5. Please take a look. More comments below.

>
> Thanks,
>
> -Lionel
>
>
> > ---
> >   drivers/gpu/drm/i915/i915_perf.c | 59 +++-
> >   1 file changed, 12 insertions(+), 47 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> > b/drivers/gpu/drm/i915/i915_perf.c
> > index c74ebac50015..5f6d9bff99c8 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -2914,49 +2914,6 @@ void i915_oa_init_reg_state(const struct 
> > intel_context *ce,
> > gen8_update_reg_state_unlocked(ce, stream);
> >   }
> >   -/**
> > - * i915_perf_read_locked - _perf_stream_ops->read with error 
> > normalisation
> > - * @stream: An i915 perf stream
> > - * @file: An i915 perf stream file
> > - * @buf: destination buffer given by userspace
> > - * @count: the number of bytes userspace wants to read
> > - * @ppos: (inout) file seek position (unused)
> > - *
> > - * Besides wrapping _perf_stream_ops->read this provides a common 
> > place to
> > - * ensure that if we've successfully copied any data then reporting that 
> > takes
> > - * precedence over any internal error status, so the data isn't lost.
> > - *
> > - * For example ret will be -ENOSPC whenever there is more buffered data 
> > than
> > - * can be copied to userspace, but that's only interesting if we weren't 
> > able
> > - * to copy some data because it implies the userspace buffer is too small 
> > to
> > - * receive a single record (and we never split records).
> > - *
> > - * Another case with ret == -EFAULT is more of a grey area since it would 
> > seem
> > - * like bad form for userspace to ask us to overrun its buffer, but the 
> > user
> > - * knows best:
> > - *
> > - *   http://yarchive.net/comp/linux/partial_reads_writes.html
> > - *
> > - * Returns: The number of bytes copied or a negative error code on failure.
> > - */
> > -static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
> > -struct file *file,
> > -char __user *buf,
> > -size_t count,
> > -loff_t *ppos)
> > -{
> > -   /* Note we keep the offset (aka bytes read) separate from any
> > -* error status so that the final check for whether we return
> > -* the bytes read with a higher precedence than any error (see
> > -* comment below) doesn't need to be handled/duplicated in
> > -* stream->ops->read() implementations.
> > -*/
> > -   size_t offset = 0;
> > -   int ret = stream->ops->read(stream, buf, count, );
> > -
> > -   return offset ?: (ret ?: -EAGAIN);
> > -}
> > -
> >   /**
> >* i915_perf_read - handles read() FOP for i915 perf stream FDs
> >* @file: An i915 perf stream file
> > @@ -2982,6 +2939,8 @@ static ssize_t i915_perf_read(struct file *file,
> >   {
> > struct i915_perf_stream *stream = file->private_data;
> > struct i915_perf *perf = stream->perf;
> > +   size_t offset = 0;
> > +   int __ret;
> > ssize_t ret;
> > /* To ensure it's handled consistently we simply treat all 
> > reads of
> > a
> > @@ -3005,16 +2964,19 @@ static ssize_t i915_perf_read(struct file *file,
> > return ret;
> > mutex_lock(>lock);
> > -   ret = i915_perf_read_locked(stream, file,
> > -   buf, count, ppos);
> > +   __ret = stream->ops->read(stream, buf, count, );
> > +   ret = offset ?: (__ret ?: -EAGAIN);

> I would drop this line above and move it to the end of the function.

Unfortunately, Umesh pointed this out, that can't be done because ret is
used in the loop (do { } while (ret == -EAGAIN); ).

> > mutex_unlock(>lock);
> >  

Re: [Intel-gfx] [PATCH] drm/i915/perf: don't read head/tail pointers outside critical section

2020-03-30 Thread Dixit, Ashutosh
On Mon, 30 Mar 2020 09:38:23 -0700, Chris Wilson wrote:
>
> Quoting Dixit, Ashutosh (2020-03-30 16:55:32)
> > On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote:
> > >
> > > Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> > > > Reading or writing those fields should only happen under
> > > > stream->oa_buffer.ptr_lock.
> > >
> > > Writing, yes. Reading as a pair, sure. There are other ways you can
> > > ensure that the tail/head are read as one, but fair enough.
> >
> > Sorry but I am trying to understand exactly what the purpose of
> > stream->oa_buffer.ptr_lock is? This is a classic ring buffer producer
> > consumer situation where producer updates tail and consumer updates
> > head. Since both are u32's can't those operations be done without requiring
> > a lock?
>
> > > > spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> > > >
> > > > -   return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > > > -   stream->oa_buffer.head - gtt_offset) >= 
> > > > report_size;
> > > > +   return pollin;
> >
> > In what way is the original code incorrect? As I mentioned head is u32 and
> > can be read atomically without requiring a lock? We had deliberately moved
> > this code outside the lock so as to pick up the the latest value of head if
> > it had been updated in the consumer (read).
>
> It's the pair of reads here. What's the synchronisation between the read
> of tail/head with the update? There's no sync between the reads so
> order is not determined here.
>
> So we may see the head updated for an old tail, and so think we have
> plenty to report, when in fact there's none (or someother convolution).
>
> Normal ringbuffer is to sample the head/tail pointers, smp_rmb(), then
> consume the data between head/tail (with the write doing the smp_wmb()
> after updating the data and before moving the tail). [So the normal
> usage of barriers is around access to one of tail/head (the other is
> under your control) and the shared contents.]

Ok, thanks for explanantion Chris, I think I understand how barriers are
used with ring buffers but I am still not sure if the previous code was
incorrect. It is almost consumer side code running in the producer
thread. Anyway, let's just go with this patch for now:

Acked-by: Ashutosh Dixit 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/perf: Do not clear pollin for small user read buffers

2020-03-31 Thread Dixit, Ashutosh
On Tue, 31 Mar 2020 00:34:10 -0700, Lionel Landwerlin wrote:
>
> On 31/03/2020 08:22, Ashutosh Dixit wrote:
> > It is wrong to block the user thread in the next poll when OA data is
> > already available which could not fit in the user buffer provided in
> > the previous read. In several cases the exact user buffer size is not
> > known. Blocking user space in poll can lead to data loss when the
> > buffer size used is smaller than the available data.
> >
> > This change fixes this issue and allows user space to read all OA data
> > even when using a buffer size smaller than the available data using
> > multiple non-blocking reads rather than staying blocked in poll till
> > the next timer interrupt.
> >
> > v2: Fix ret value for blocking reads (Umesh)
> > v3: Mistake during patch send (Ashutosh)
> > v4: Remove -EAGAIN from comment (Umesh)
> > v5: Improve condition for clearing pollin and return (Lionel)
> >
> > Cc: Umesh Nerlige Ramappa 
> > Cc: Lionel Landwerlin 
> > Signed-off-by: Ashutosh Dixit 
>
> I forgot to mention this needs to be Cc: stable.

I will Cc stable or send them the patch after it's finalized, hope that
will be ok?

>
> Still one nit below which should make the remaining function a bit simpler.
>
> Thanks for your time.
>
> -Lionel
>
>
> > ---
> >   drivers/gpu/drm/i915/i915_perf.c | 62 +++-
> >   1 file changed, 13 insertions(+), 49 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> > b/drivers/gpu/drm/i915/i915_perf.c
> > index c74ebac50015..9c21f28f89a7 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -2914,49 +2914,6 @@ void i915_oa_init_reg_state(const struct 
> > intel_context *ce,
> > gen8_update_reg_state_unlocked(ce, stream);
> >   }
> >   -/**
> > - * i915_perf_read_locked - _perf_stream_ops->read with error 
> > normalisation
> > - * @stream: An i915 perf stream
> > - * @file: An i915 perf stream file
> > - * @buf: destination buffer given by userspace
> > - * @count: the number of bytes userspace wants to read
> > - * @ppos: (inout) file seek position (unused)
> > - *
> > - * Besides wrapping _perf_stream_ops->read this provides a common 
> > place to
> > - * ensure that if we've successfully copied any data then reporting that 
> > takes
> > - * precedence over any internal error status, so the data isn't lost.
> > - *
> > - * For example ret will be -ENOSPC whenever there is more buffered data 
> > than
> > - * can be copied to userspace, but that's only interesting if we weren't 
> > able
> > - * to copy some data because it implies the userspace buffer is too small 
> > to
> > - * receive a single record (and we never split records).
> > - *
> > - * Another case with ret == -EFAULT is more of a grey area since it would 
> > seem
> > - * like bad form for userspace to ask us to overrun its buffer, but the 
> > user
> > - * knows best:
> > - *
> > - *   http://yarchive.net/comp/linux/partial_reads_writes.html
> > - *
> > - * Returns: The number of bytes copied or a negative error code on failure.
> > - */
> > -static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
> > -struct file *file,
> > -char __user *buf,
> > -size_t count,
> > -loff_t *ppos)
> > -{
> > -   /* Note we keep the offset (aka bytes read) separate from any
> > -* error status so that the final check for whether we return
> > -* the bytes read with a higher precedence than any error (see
> > -* comment below) doesn't need to be handled/duplicated in
> > -* stream->ops->read() implementations.
> > -*/
> > -   size_t offset = 0;
> > -   int ret = stream->ops->read(stream, buf, count, );
> > -
> > -   return offset ?: (ret ?: -EAGAIN);
> > -}
> > -
> >   /**
> >* i915_perf_read - handles read() FOP for i915 perf stream FDs
> >* @file: An i915 perf stream file
> > @@ -2982,7 +2939,8 @@ static ssize_t i915_perf_read(struct file *file,
> >   {
> > struct i915_perf_stream *stream = file->private_data;
> > struct i915_perf *perf = stream->perf;
> > -   ssize_t ret;
> > +   size_t offset = 0;
> > +   int __ret;
> > /* To ensure it's handled consistently we simply treat all 
> > reads of
> > a
> >  * disabled stream as an error. In particular it might otherwise lead
> > @@ -2992,6 +2950,8 @@ static ssize_t i915_perf_read(struct file *file,
> > return -EIO;
> > if (!(file->f_flags & O_NONBLOCK)) {
> > +   ssize_t ret;
> > +
> > /* There's the small chance of false positives from
> >  * stream->ops->wait_unlocked.
> >  *
> > @@ -3005,13 +2965,13 @@ static ssize_t i915_perf_read(struct file *file,
> > return ret;
> > mutex_lock(>lock);
> > -   ret = i915_perf_read_locked(stream, 

Re: [Intel-gfx] [PATCH] drm/i915/perf: Do not clear pollin for small user read buffers

2020-04-01 Thread Dixit, Ashutosh
On Tue, 31 Mar 2020 23:57:57 -0700, Lionel Landwerlin wrote:
>
> On 01/04/2020 02:14, Ashutosh Dixit wrote:
> > It is wrong to block the user thread in the next poll when OA data is
> > already available which could not fit in the user buffer provided in
> > the previous read. In several cases the exact user buffer size is not
> > known. Blocking user space in poll can lead to data loss when the
> > buffer size used is smaller than the available data.
> >
> > This change fixes this issue and allows user space to read all OA data
> > even when using a buffer size smaller than the available data using
> > multiple non-blocking reads rather than staying blocked in poll till
> > the next timer interrupt.
> >
> > v2: Fix ret value for blocking reads (Umesh)
> > v3: Mistake during patch send (Ashutosh)
> > v4: Remove -EAGAIN from comment (Umesh)
> > v5: Improve condition for clearing pollin and return (Lionel)
> > v6: Improve blocking read loop and other cleanups (Lionel)
> >
> > Cc: Umesh Nerlige Ramappa 
> > Cc: Lionel Landwerlin 
> > Signed-off-by: Ashutosh Dixit 
> > ---
> >   drivers/gpu/drm/i915/i915_perf.c | 61 ++--
> >   1 file changed, 11 insertions(+), 50 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> > b/drivers/gpu/drm/i915/i915_perf.c
> > index 28e3d76fa2e6..2f78b147bb2d 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -2963,49 +2963,6 @@ void i915_oa_init_reg_state(const struct 
> > intel_context *ce,
> > gen8_update_reg_state_unlocked(ce, stream);
> >   }
> >   -/**
> > - * i915_perf_read_locked - _perf_stream_ops->read with error 
> > normalisation
> > - * @stream: An i915 perf stream
> > - * @file: An i915 perf stream file
> > - * @buf: destination buffer given by userspace
> > - * @count: the number of bytes userspace wants to read
> > - * @ppos: (inout) file seek position (unused)
> > - *
> > - * Besides wrapping _perf_stream_ops->read this provides a common 
> > place to
> > - * ensure that if we've successfully copied any data then reporting that 
> > takes
> > - * precedence over any internal error status, so the data isn't lost.
> > - *
> > - * For example ret will be -ENOSPC whenever there is more buffered data 
> > than
> > - * can be copied to userspace, but that's only interesting if we weren't 
> > able
> > - * to copy some data because it implies the userspace buffer is too small 
> > to
> > - * receive a single record (and we never split records).
> > - *
> > - * Another case with ret == -EFAULT is more of a grey area since it would 
> > seem
> > - * like bad form for userspace to ask us to overrun its buffer, but the 
> > user
> > - * knows best:
> > - *
> > - *   http://yarchive.net/comp/linux/partial_reads_writes.html
> > - *
> > - * Returns: The number of bytes copied or a negative error code on failure.
> > - */
> > -static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
> > -struct file *file,
> > -char __user *buf,
> > -size_t count,
> > -loff_t *ppos)
> > -{
> > -   /* Note we keep the offset (aka bytes read) separate from any
> > -* error status so that the final check for whether we return
> > -* the bytes read with a higher precedence than any error (see
> > -* comment below) doesn't need to be handled/duplicated in
> > -* stream->ops->read() implementations.
> > -*/
> > -   size_t offset = 0;
> > -   int ret = stream->ops->read(stream, buf, count, );
> > -
> > -   return offset ?: (ret ?: -EAGAIN);
> > -}
> > -
> >   /**
> >* i915_perf_read - handles read() FOP for i915 perf stream FDs
> >* @file: An i915 perf stream file
> > @@ -3031,7 +2988,8 @@ static ssize_t i915_perf_read(struct file *file,
> >   {
> > struct i915_perf_stream *stream = file->private_data;
> > struct i915_perf *perf = stream->perf;
> > -   ssize_t ret;
> > +   size_t offset = 0;
> > +   int ret;
> > /* To ensure it's handled consistently we simply treat all 
> > reads of
> > a
> >  * disabled stream as an error. In particular it might otherwise lead
> > @@ -3054,13 +3012,12 @@ static ssize_t i915_perf_read(struct file *file,
> > return ret;
> > mutex_lock(>lock);
> > -   ret = i915_perf_read_locked(stream, file,
> > -   buf, count, ppos);
> > +   ret = stream->ops->read(stream, buf, count, );
> > mutex_unlock(>lock);
> > -   } while (ret == -EAGAIN);
> > +   } while (!offset && !ret);
>
> This doesn't sound right, !offset means it will stop as soon as some data
> was written.
>
> But for the blocking read we want to fill the buffer up to -ENOSPC.

I don't think that's true. Here's 'man 2 read': "read() attempts to read
/up to/ 

Re: [Intel-gfx] [PATCH] drm/i915/perf: don't read head/tail pointers outside critical section

2020-03-30 Thread Dixit, Ashutosh
On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote:
>
> Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> > Reading or writing those fields should only happen under
> > stream->oa_buffer.ptr_lock.
>
> Writing, yes. Reading as a pair, sure. There are other ways you can
> ensure that the tail/head are read as one, but fair enough.

Sorry but I am trying to understand exactly what the purpose of
stream->oa_buffer.ptr_lock is? This is a classic ring buffer producer
consumer situation where producer updates tail and consumer updates
head. Since both are u32's can't those operations be done without requiring
a lock?

>
> > Signed-off-by: Lionel Landwerlin 
> > Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround")
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> > b/drivers/gpu/drm/i915/i915_perf.c
> > index c74ebac50015..ec9421f02ebd 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -463,6 +463,7 @@ static bool oa_buffer_check_unlocked(struct 
> > i915_perf_stream *stream)
> > u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> > int report_size = stream->oa_buffer.format_size;
> > unsigned long flags;
> > +   bool pollin;
> > u32 hw_tail;
> > u64 now;
> >
> > @@ -532,10 +533,13 @@ static bool oa_buffer_check_unlocked(struct 
> > i915_perf_stream *stream)
> > stream->oa_buffer.aging_timestamp = now;
> > }
> >
> > +   pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > + stream->oa_buffer.head - gtt_offset) >= 
> > report_size;
> > +
> > +
>
> Bonus \n
>
> Reviewed-by: Chris Wilson 
>
> > spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> >
> > -   return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > -   stream->oa_buffer.head - gtt_offset) >= report_size;
> > +   return pollin;

In what way is the original code incorrect? As I mentioned head is u32 and
can be read atomically without requiring a lock? We had deliberately moved
this code outside the lock so as to pick up the the latest value of head if
it had been updated in the consumer (read).
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] lib: Use read() for timerfd timeout detection

2020-04-15 Thread Dixit, Ashutosh
On Wed, 15 Apr 2020 07:39:00 -0700, Chris Wilson wrote:
>
> The poll() is proving unreliable, where our tests timeout without the
> spinner being terminated. Let's try a blocking read instead!
>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1676
> Signed-off-by: Chris Wilson 
> Cc: "Dixit, Ashutosh" 


Reviewed-by: Ashutosh Dixit 


> ---
>  lib/igt_dummyload.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/lib/igt_dummyload.c b/lib/igt_dummyload.c
> index 99ca84ad8..ae0fb9378 100644
> --- a/lib/igt_dummyload.c
> +++ b/lib/igt_dummyload.c
> @@ -399,14 +399,14 @@ igt_spin_factory(int fd, const struct igt_spin_factory 
> *opts)
>  static void *timer_thread(void *data)
>  {
>   igt_spin_t *spin = data;
> - struct pollfd pfd = {
> - .fd = spin->timerfd,
> - .events = POLLIN,
> - };
> + uint64_t overruns = 0;
>
> - if (poll(, 1, -1) >= 0)
> - igt_spin_end(spin);
> + /* Wait until we see the timer fire, or we get cancelled */
> + do {
> + read(spin->timerfd, , sizeof(overruns));
> + } while (!overruns);
>
> + igt_spin_end(spin);
>   return NULL;
>  }
>
> --
> 2.26.0
>
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround

2020-03-19 Thread Dixit, Ashutosh
Discussed with Umesh today. Below is what we came up with.

On Tue, 17 Mar 2020 17:03:30 -0700, Lionel Landwerlin wrote:
> On 16/03/2020 21:23, Dixit, Ashutosh wrote:
> > On Thu, 12 Mar 2020 16:04:59 -0700, Umesh Nerlige Ramappa wrote:
> >> From: Lionel Landwerlin 
> >>
> >> We're about to introduce an options to open the perf stream, giving
> >> the user ability to configure how often it wants the kernel to poll
> >> the OA registers for available data.
> >>
> >> Right now the workaround against the OA tail pointer race condition
> >> requires at least twice the internal kernel polling timer to make any
> >> data available.
> >>
> >> This changes introduce checks on the OA data written into the circular
> >> buffer to make as much data as possible available on the first
> >> iteration of the polling timer.
> > Excellent, this absolutely needs to be done, I was thinking it may be
> > possible even with the approach taken in the original code but the approach
> > here looks better. It is also a nice cleanup.
> >
> >> @@ -507,64 +487,76 @@ static bool oa_buffer_check_unlocked(struct 
> >> i915_perf_stream *stream)
> >>
> >>now = ktime_get_mono_fast_ns();
> >>
> >> -  /* Update the aged tail
> >> -   *
> >> -   * Flip the tail pointer available for read()s once the aging tail is
> >> -   * old enough to trust that the corresponding data will be visible to
> >> -   * the CPU...
> >> -   *
> >> -   * Do this before updating the aging pointer in case we may be able to
> >> -   * immediately start aging a new pointer too (if new data has become
> >> -   * available) without needing to wait for a later hrtimer callback.
> >> -   */
> >> -  if (aging_tail != INVALID_TAIL_PTR &&
> >> -  ((now - stream->oa_buffer.aging_timestamp) >
> >> -   OA_TAIL_MARGIN_NSEC)) {
> >> -
> >> -  aged_idx ^= 1;
> >> -  stream->oa_buffer.aged_tail_idx = aged_idx;
> >> +  if (hw_tail == stream->oa_buffer.aging_tail) {
> >> +  /* If the HW tail hasn't move since the last check and the HW
> >> +   * tail has been aging for long enough, declare it the new
> >> +   * tail.
> >> +   */
> > Is this really needed? True we will never return the last report but maybe
> > it's ok, save some code?
>
> It doesn't look like a lot of code :)

OK, leave as is in the patch.

>
> >
> >> +  if ((now - stream->oa_buffer.aging_timestamp) >
> > Do we need to initialize 'aging_timestamp = ktime_get_mono_fast_ns()' when
> > the stream is enabled?
>
>
> Aye... I think aging_tail should probably be initialized to
> INVALID_TAIL_PTR in init_oa_buffer() vfuncs.
>
> So that on the first call it never matches : if (hw_tail ==
> stream->oa_buffer.aging_tail)
>
> And that aging_timestamp is set properly.

OK, let's do this.

> >
> >> +  OA_TAIL_MARGIN_NSEC) {
> >> +  stream->oa_buffer.tail =
> >> +  stream->oa_buffer.aging_tail;
> >> +  }
> >> +  } else {
> >> +  u32 head, tail, landed_report_heads;
> >>
> >> -  aged_tail = aging_tail;
> >> +  /* NB: The head we observe here might effectively be a little 
> >> out of
> >> +   * date (between head and tails[aged_idx].offset if there is 
> >> currently
> >> +   * a read() in progress.
> >> +   */
> > Is this comment correct? Aren't we are taking the same lock when updating
> > head after the read?
>
>
> It seems the mention of tails[aged_idx].offset is definitely out of date.
>
>
> That being said, the read is concurrent to this function (only the update
> of head is locked).
>
> That means the read code can be going through the reports while we're going
> through a different set of reports in the same OA buffer.

OK, reword the comment so that it doesn't look like it's a locking problem.

>
>
> >
> >> +  head = stream->oa_buffer.head - gtt_offset;
> >>
> >> -  /* Mark that we need a new pointer to start aging... */
> >> -  stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
> >> -  aging_tail = INVALID_TAIL_PTR;
> >> -  }
> >> +  hw_tail -= gtt_offset;
> >>
> >> -  /* Update the aging tail
> >> -   *
> >> -   * We throttle 

Re: [Intel-gfx] [PATCH 4/4] drm/i915/perf: add new open param to configure polling of OA buffer

2020-03-16 Thread Dixit, Ashutosh
On Thu, 12 Mar 2020 16:05:02 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> This new parameter let's the application choose how often the OA
> buffer should be checked on the CPU side for data availability. Longer
> polling period tend to reduce CPU overhead if the application does not
> care about somewhat real time data collection.

Lionely already has comments on this so skipping those.

> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index 21a63644846f..ca139ac31b11 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -262,11 +262,11 @@
>   */
>  #define OA_TAIL_MARGIN_NSEC  10ULL
>
> -/* frequency for checking whether the OA unit has written new reports to the
> - * circular OA buffer...
> +/* The default frequency for checking whether the OA unit has written new
> + * reports to the circular OA buffer...
>   */
> -#define POLL_FREQUENCY 200
> -#define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)
> +#define DEFAULT_POLL_FREQUENCY 200
> +#define DEFAULT_POLL_PERIOD (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY)

nit, but I mostly like to have "units" as part of the name, so I'd recommed
calling this DEFAULT_POLL_PERIOD_NS.

> @@ -349,6 +349,8 @@ static const struct i915_oa_format 
> gen12_oa_formats[I915_OA_FORMAT_MAX] = {
>   * @oa_periodic: Whether to enable periodic OA unit sampling
>   * @oa_period_exponent: The OA unit sampling period is derived from this
>   * @engine: The engine (typically rcs0) being monitored by the OA unit
> + * @poll_oa_period: The period at which the CPU will check for OA data
> + * availability
>   *
>   * As read_properties_unlocked() enumerates and validates the properties 
> given
>   * to open a stream of metrics the configuration is built up in the structure
> @@ -368,6 +370,7 @@ struct perf_open_properties {
>   int oa_period_exponent;
>
>   struct intel_engine_cs *engine;
> + u64 poll_oa_period;

poll_oa_period_ns?

> @@ -2642,9 +2645,9 @@ static void i915_oa_stream_enable(struct 
> i915_perf_stream *stream)
>
>   stream->perf->ops.oa_enable(stream);
>
> - if (stream->periodic)
> + if (stream->periodic && stream->poll_oa_period)

poll_oa_period check is not required, it's always present.

> @@ -3617,6 +3625,14 @@ static int read_properties_unlocked(struct i915_perf 
> *perf,
>   case DRM_I915_PERF_PROP_HOLD_PREEMPTION:
>   props->hold_preemption = !!value;
>   break;
> + case DRM_I915_PERF_PROP_POLL_OA_DELAY:
> + if (value < 10 /* 100us */) {

So no maximum? That's probably ok, reap what you sow. Have we tested 100
us, seems low, anyway.

> diff --git a/drivers/gpu/drm/i915/i915_perf_types.h 
> b/drivers/gpu/drm/i915/i915_perf_types.h
> index 9ee7c58e70d5..01559ead22e2 100644
> --- a/drivers/gpu/drm/i915/i915_perf_types.h
> +++ b/drivers/gpu/drm/i915/i915_perf_types.h
> @@ -304,6 +304,12 @@ struct i915_perf_stream {
>* reprogrammed.
>*/
>   struct i915_vma *noa_wait;
> +
> + /**
> +  * @poll_oa_period: The period in nanoseconds at which the OA
> +  * buffer should be checked for available data.
> +  */
> + u64 poll_oa_period;

poll_oa_period_ns?

>  };
>
>  /**
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 2813e579b480..dd511e7f795d 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1969,6 +1969,19 @@ enum drm_i915_perf_property_id {
>*/
>   DRM_I915_PERF_PROP_HOLD_PREEMPTION,
>
> + /**
> +  * This optional parameter specifies the timer interval in nanoseconds
> +  * at which the i915 driver will check the OA buffer for available data.
> +  * Minimum allowed value is 100 microseconds. A default value is used by
> +  * the driver if this parameter is not specified. Note that a large
> +  * value may reduce cpu consumption during OA perf captures, but it
> +  * would also potentially result in OA buffer overwrite as the captures
> +  * reach end of the OA buffer.
> +  *
> +  * This property is available in perf revision 4.
> +  */
> + DRM_I915_PERF_PROP_POLL_OA_DELAY,

Is DRM_I915_PERF_PROP_POLL_OA_PERIOD_NS a better name? At least PERIOD
instead of DELAY.

Thanks!
--
Ashutosh
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/3] drm/i915/perf: add new open param to configure polling of OA buffer

2020-03-21 Thread Dixit, Ashutosh
On Thu, 19 Mar 2020 15:52:03 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> This new parameter let's the application choose how often the OA
> buffer should be checked on the CPU side for data availability. Longer
> polling period tend to reduce CPU overhead if the application does not
> care about somewhat real time data collection.

/snip/

> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index db649d03ab52..c781b9f31b3c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1980,6 +1980,19 @@ enum drm_i915_perf_property_id {
>*/
>   DRM_I915_PERF_PROP_GLOBAL_SSEU,
>
> + /**
> +  * This optional parameter specifies the timer interval in nanoseconds
> +  * at which the i915 driver will check the OA buffer for available data.
> +  * Minimum allowed value is 100 microseconds. A default value is used by
> +  * the driver if this parameter is not specified. Note that a large
> +  * value may reduce cpu consumption during OA perf captures, but it
> +  * would also potentially result in OA buffer overwrite as the captures
> +  * reach end of the OA buffer.

I would like to change the wording here a bit because the wording above
seems to say that the /same/ timer period which reduces cpu consumption
will also potentially result in OA buffer overwrites. This is misleading
and not true. Instead we should say something like: "Note that larger timer
values will reduce cpu consumption during OA perf captures. However,
excessively large values would potentially result in OA buffer overwrites
as captures reach end of the OA buffer".

Maybe some guidance to selecting the value, say based on sampling rate and
OA buffer size, could also be added here but at the minimum we should fix
up the wording as indicated above before pushing. Otherwise:

Reviewed-by: Ashutosh Dixit 

> +  *
> +  * This property is available in perf revision 5.
> +  */
> + DRM_I915_PERF_PROP_POLL_OA_PERIOD,
> +
>   DRM_I915_PERF_PROP_MAX /* non-ABI */
>  };
>
> --
> 2.20.1
>
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround

2020-03-21 Thread Dixit, Ashutosh
On Thu, 19 Mar 2020 15:52:01 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> We're about to introduce an options to open the perf stream, giving
> the user ability to configure how often it wants the kernel to poll
> the OA registers for available data.
>
> Right now the workaround against the OA tail pointer race condition
> requires at least twice the internal kernel polling timer to make any
> data available.
>
> This changes introduce checks on the OA data written into the circular
> buffer to make as much data as possible available on the first
> iteration of the polling timer.

/snip/

> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index 3222f6cd8255..c1429d3acaf9 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -223,26 +223,17 @@
>   *
>   * Although this can be observed explicitly while copying reports to 
> userspace
>   * by checking for a zeroed report-id field in tail reports, we want to 
> account
> - * for this earlier, as part of the oa_buffer_check to avoid lots of 
> redundant
> - * read() attempts.
> - *
> - * In effect we define a tail pointer for reading that lags the real tail
> - * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough
> - * time for the corresponding reports to become visible to the CPU.
> - *
> - * To manage this we actually track two tail pointers:
> - *  1) An 'aging' tail with an associated timestamp that is tracked until we
> - * can trust the corresponding data is visible to the CPU; at which point
> - * it is considered 'aged'.
> - *  2) An 'aged' tail that can be used for read()ing.
> - *
> - * The two separate pointers let us decouple read()s from tail pointer aging.
> - *
> - * The tail pointers are checked and updated at a limited rate within a 
> hrtimer
> - * callback (the same callback that is used for delivering EPOLLIN events)
> - *
> - * Initially the tails are marked invalid with %INVALID_TAIL_PTR which
> - * indicates that an updated tail pointer is needed.
> + * for this earlier, as part of the oa_buffer_check_unlocked to avoid lots of
> + * redundant read() attempts.
> + *
> + * We workaround this issue in oa_buffer_check_unlocked() by reading the 
> reports
> + * in the OA buffer, starting from the tail reported by the HW until we find 
> 2
> + * consecutive reports with their first 2 dwords of not at 0. Those dwords 
> are

until we find a report with its first 2 dwords not 0 meaning its previous
report is completely in memory and ready to be read.

> + * also set to 0 once read and the whole buffer is cleared upon OA buffer
> + * initialization. The first dword is the reason for this report while the
> + * second is the timestamp, making the chances of having those 2 fields at 0
> + * fairly unlikely. A more detailed explanation is available in
> + * oa_buffer_check_unlocked().

> @@ -477,16 +468,6 @@ static bool oa_buffer_check_unlocked(struct 
> i915_perf_stream *stream)
>*/
>   spin_lock_irqsave(>oa_buffer.ptr_lock, flags);
>
>   hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>
>   hw_tail &= ~(report_size - 1);
>
> @@ -496,64 +477,64 @@ static bool oa_buffer_check_unlocked(struct 
> i915_perf_stream *stream)
>
>   now = ktime_get_mono_fast_ns();
>
> + if (hw_tail == stream->oa_buffer.aging_tail &&
> +(now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) {
> + /* If the HW tail hasn't move since the last check and the HW
> +  * tail has been aging for long enough, declare it the new
> +  * tail.
> +  */
> + stream->oa_buffer.tail = stream->oa_buffer.aging_tail;
> + } else {
> + u32 head, tail;
>
> + /* NB: The head we observe here might effectively be a little
> +  * out of date. If a read() is in progress, the head could be
> +  * anywhere between this head and stream->oa_buffer.tail.
> +  */
> + head = stream->oa_buffer.head - gtt_offset;
>
> + hw_tail -= gtt_offset;
> + tail = hw_tail;
>
> + /* Walk the stream backward until we find a report with dword 0
> +  * & 1 not at 0. Since the circular buffer pointers progress by
> +  * increments of 64 bytes and that reports can be up to 256
> +  * bytes long, we can't tell whether a report has fully landed
> +  * in memory before the first 2 dwords of the following report
> +  * have effectively landed.
> +  *
> +  * This is assuming that the writes of the OA unit land in
> +  * memory in the order they were written to.
> +  * If not : (╯°□°)╯︵ ┻━┻
>*/
> - if (hw_tail >= gtt_offset &&
> - hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
> - 

Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround

2020-03-21 Thread Dixit, Ashutosh
On Sat, 21 Mar 2020 16:26:42 -0700, Dixit, Ashutosh wrote:
>
> On Thu, 19 Mar 2020 15:52:01 -0700, Umesh Nerlige Ramappa wrote:
> >
> > From: Lionel Landwerlin 
>
> > @@ -477,16 +468,6 @@ static bool oa_buffer_check_unlocked(struct 
> > i915_perf_stream *stream)
> >  */
> > spin_lock_irqsave(>oa_buffer.ptr_lock, flags);
> >
> > hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
> >
> > hw_tail &= ~(report_size - 1);
> >
> > @@ -496,64 +477,64 @@ static bool oa_buffer_check_unlocked(struct 
> > i915_perf_stream *stream)
> >
> > now = ktime_get_mono_fast_ns();
> >
> > +   if (hw_tail == stream->oa_buffer.aging_tail &&
> > +  (now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) {
> > +   /* If the HW tail hasn't move since the last check and the HW
> > +* tail has been aging for long enough, declare it the new
> > +* tail.
> > +*/
> > +   stream->oa_buffer.tail = stream->oa_buffer.aging_tail;
> > +   } else {
> > +   u32 head, tail;
> >
> > +   /* NB: The head we observe here might effectively be a little
> > +* out of date. If a read() is in progress, the head could be
> > +* anywhere between this head and stream->oa_buffer.tail.
> > +*/
> > +   head = stream->oa_buffer.head - gtt_offset;
> >
> > +   hw_tail -= gtt_offset;
> > +   tail = hw_tail;
> >
> > +   /* Walk the stream backward until we find a report with dword 0
> > +* & 1 not at 0. Since the circular buffer pointers progress by
> > +* increments of 64 bytes and that reports can be up to 256
> > +* bytes long, we can't tell whether a report has fully landed
> > +* in memory before the first 2 dwords of the following report
> > +* have effectively landed.
> > +*
> > +* This is assuming that the writes of the OA unit land in
> > +* memory in the order they were written to.
> > +* If not : (╯°□°)╯︵ ┻━┻
> >  */
> > +   while (OA_TAKEN(tail, head) >= report_size) {
> > +   u32 previous_tail = (tail - report_size) & 
> > (OA_BUFFER_SIZE - 1);
> > +   u32 *report32 = (void *)(stream->oa_buffer.vaddr + 
> > previous_tail);
>
> Sorry, this is wrong. This should just be:
>
>   tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
>   report32 = (void *)(stream->oa_buffer.vaddr + tail);
>
> Otherwise when we break out of the loop below tail is still set one
> report_size ahead. previous_tail is not needed. (In the previous version of
> the patch this used to work out correctly).
>
> > +
> > +   /* Head of the report indicated by the HW tail register 
> > has
> > +* indeed landed into memory.
> > +*/
> > +   if (report32[0] != 0 || report32[1] != 0)
> > +   break;
> > +
> > +   tail = previous_tail;
> > }

Actually a couple of further improvements to the loop above are
possible. First there is no reason to start at previous_tail, we can just
start at the aligned hw_tail itself. Therefore the loop becomes:

while (OA_TAKEN(tail, head) >= report_size) {
u32 *report32 = (void *)(stream->oa_buffer.vaddr + 
tail);

if (report32[0] != 0 || report32[1] != 0)
break;

tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
}

Further, there is no reason to go back to the head but only to the old
tail. Therefore:

head = stream->oa_buffer.head - gtt_offset;
old_tail = stream->oa_buffer.tail - gtt_offset;

hw_tail -= gtt_offset;
tail = hw_tail;

while (OA_TAKEN(tail, old_tail) >= report_size) {
u32 *report32 = (void *)(stream->oa_buffer.vaddr + 
tail);

if (report32[0] != 0 || report32[1] != 0)
break;

tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
}

Please review and see if these two improvements are possible. Thanks!
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround

2020-03-22 Thread Dixit, Ashutosh
On Sat, 21 Mar 2020 21:44:43 -0700, Dixit, Ashutosh wrote:
>
> Actually a couple of further improvements to the loop above are
> possible. First there is no reason to start at previous_tail, we can just
> start at the aligned hw_tail itself.

Unless we deliberately want to wait and delay declaring the data as valid,
i.e. we are really not sure what is happening and it seems "safer" to wait
for an extra report.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/4] drm/i915/perf: move pollin setup to non hw specific code

2020-03-16 Thread Dixit, Ashutosh
On Thu, 12 Mar 2020 16:05:00 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> This isn't really gen specific stuff, so just move it to the common
> code.

It seems pollin is not the only member which is not gen specific but is
initialized in gen specific code. Anyway any other such moves are for
future patches.

> @@ -2626,6 +2620,12 @@ static void gen12_oa_enable(struct i915_perf_stream 
> *stream)
>   */
>  static void i915_oa_stream_enable(struct i915_perf_stream *stream)
>  {
> + /*
> +  * Maybe make ->pollin per-stream state if we support multiple
> +  * concurrent streams in the future.
> +  */
> + stream->pollin = false;

What about the comment above? Isn't pollin already per-stream?

Please fix if needed. Otherwise this is:

Reviewed-by: Ashutosh Dixit 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround

2020-03-16 Thread Dixit, Ashutosh
On Thu, 12 Mar 2020 16:04:59 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> We're about to introduce an options to open the perf stream, giving
> the user ability to configure how often it wants the kernel to poll
> the OA registers for available data.
>
> Right now the workaround against the OA tail pointer race condition
> requires at least twice the internal kernel polling timer to make any
> data available.
>
> This changes introduce checks on the OA data written into the circular
> buffer to make as much data as possible available on the first
> iteration of the polling timer.

Excellent, this absolutely needs to be done, I was thinking it may be
possible even with the approach taken in the original code but the approach
here looks better. It is also a nice cleanup.

> @@ -507,64 +487,76 @@ static bool oa_buffer_check_unlocked(struct 
> i915_perf_stream *stream)
>
>   now = ktime_get_mono_fast_ns();
>
> - /* Update the aged tail
> -  *
> -  * Flip the tail pointer available for read()s once the aging tail is
> -  * old enough to trust that the corresponding data will be visible to
> -  * the CPU...
> -  *
> -  * Do this before updating the aging pointer in case we may be able to
> -  * immediately start aging a new pointer too (if new data has become
> -  * available) without needing to wait for a later hrtimer callback.
> -  */
> - if (aging_tail != INVALID_TAIL_PTR &&
> - ((now - stream->oa_buffer.aging_timestamp) >
> -  OA_TAIL_MARGIN_NSEC)) {
> -
> - aged_idx ^= 1;
> - stream->oa_buffer.aged_tail_idx = aged_idx;
> + if (hw_tail == stream->oa_buffer.aging_tail) {
> + /* If the HW tail hasn't move since the last check and the HW
> +  * tail has been aging for long enough, declare it the new
> +  * tail.
> +  */

Is this really needed? True we will never return the last report but maybe
it's ok, save some code?

> + if ((now - stream->oa_buffer.aging_timestamp) >

Do we need to initialize 'aging_timestamp = ktime_get_mono_fast_ns()' when
the stream is enabled?

> + OA_TAIL_MARGIN_NSEC) {
> + stream->oa_buffer.tail =
> + stream->oa_buffer.aging_tail;
> + }
> + } else {
> + u32 head, tail, landed_report_heads;
>
> - aged_tail = aging_tail;
> + /* NB: The head we observe here might effectively be a little 
> out of
> +  * date (between head and tails[aged_idx].offset if there is 
> currently
> +  * a read() in progress.
> +  */

Is this comment correct? Aren't we are taking the same lock when updating
head after the read?

> + head = stream->oa_buffer.head - gtt_offset;
>
> - /* Mark that we need a new pointer to start aging... */
> - stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
> - aging_tail = INVALID_TAIL_PTR;
> - }
> + hw_tail -= gtt_offset;
>
> - /* Update the aging tail
> -  *
> -  * We throttle aging tail updates until we have a new tail that
> -  * represents >= one report more data than is already available for
> -  * reading. This ensures there will be enough data for a successful
> -  * read once this new pointer has aged and ensures we will give the new
> -  * pointer time to age.
> -  */
> - if (aging_tail == INVALID_TAIL_PTR &&
> - (aged_tail == INVALID_TAIL_PTR ||
> -  OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
> - struct i915_vma *vma = stream->oa_buffer.vma;
> - u32 gtt_offset = i915_ggtt_offset(vma);
> -
> - /* Be paranoid and do a bounds check on the pointer read back
> -  * from hardware, just in case some spurious hardware condition
> -  * could put the tail out of bounds...
> + /* Walk the stream backward until we find at least 2 reports
> +  * with dword 0 & 1 not at 0. Since the circular buffer
> +  * pointers progress by increments of 64 bytes and that
> +  * reports can be up to 256 bytes long, we can't tell whether
> +  * a report has fully landed in memory before the first 2
> +  * dwords of the following report have effectively landed.
> +  *
> +  * This is assuming that the writes of the OA unit land in
> +  * memory in the order they were written to.
> +  * If not : (╯°□°)╯︵ ┻━┻
>*/
> - if (hw_tail >= gtt_offset &&
> - hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
> - stream->oa_buffer.tails[!aged_idx].offset =
> - aging_tail = hw_tail;
> - stream->oa_buffer.aging_timestamp = now;
> - } else {
> - 

Re: [Intel-gfx] [PATCH 3/4] drm/i915/perf: only append status when data is available

2020-03-16 Thread Dixit, Ashutosh
On Thu, 12 Mar 2020 16:05:01 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> The only bit of the status register we currently report in the
> i915-perf stream is the "report loss" bit. Only report this when we
> have some data to report with it. There was a kind of inconsistency
> here in that we could report report loss without appending the reports
> associated with the loss.

Splitting hair a bit, but I am wondering if this is realistic? If reports
have been lost in the middle of a OA buffer then there /will/ be some data
from the hardware so head != tail. So is the situation which this patch is
fixing ever been observed in practice?

Also, if we are doing this, how about moving the entire status handling
here, including intel_uncore_read() and OABUFFER_OVERFLOW handling (which I
understand resets the stream so probably doesn't have associated data).

In any case, since these are just random questions, this is:

Reviewed-by: Ashutosh Dixit 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround

2020-03-24 Thread Dixit, Ashutosh
On Tue, 24 Mar 2020 11:54:55 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> We're about to introduce an options to open the perf stream, giving
> the user ability to configure how often it wants the kernel to poll
> the OA registers for available data.
>
> Right now the workaround against the OA tail pointer race condition
> requires at least twice the internal kernel polling timer to make any
> data available.
>
> This changes introduce checks on the OA data written into the circular
> buffer to make as much data as possible available on the first
> iteration of the polling timer.

Reviewed-by: Ashutosh Dixit 

>
> v2: Use OA_TAKEN macro without the gtt_offset (Lionel)
> v3: (Umesh)
> - Rebase
> - Change report to report32 from below review
>   https://patchwork.freedesktop.org/patch/330704/?series=66697=1
> v4: (Ashutosh, Lionel)
> - Fix checkpatch errors
> - Fix aging_timestamp initialization
> - Check for only one valid landed report
> - Fix check for unlanded report
> v5: (Ashutosh)
> - Fix bug in accurately determining landed report.
> - Optimize the check for landed reports by going as far as the
>   previously determined aged tail.
>
> Signed-off-by: Lionel Landwerlin 
> Signed-off-by: Umesh Nerlige Ramappa 
> ---
>  drivers/gpu/drm/i915/i915_perf.c   | 204 ++---
>  drivers/gpu/drm/i915/i915_perf_types.h |  28 ++--
>  2 files changed, 97 insertions(+), 135 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index 3222f6cd8255..4583ae9b77be 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -223,26 +223,17 @@
>   *
>   * Although this can be observed explicitly while copying reports to 
> userspace
>   * by checking for a zeroed report-id field in tail reports, we want to 
> account
> - * for this earlier, as part of the oa_buffer_check to avoid lots of 
> redundant
> - * read() attempts.
> - *
> - * In effect we define a tail pointer for reading that lags the real tail
> - * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough
> - * time for the corresponding reports to become visible to the CPU.
> - *
> - * To manage this we actually track two tail pointers:
> - *  1) An 'aging' tail with an associated timestamp that is tracked until we
> - * can trust the corresponding data is visible to the CPU; at which point
> - * it is considered 'aged'.
> - *  2) An 'aged' tail that can be used for read()ing.
> - *
> - * The two separate pointers let us decouple read()s from tail pointer aging.
> - *
> - * The tail pointers are checked and updated at a limited rate within a 
> hrtimer
> - * callback (the same callback that is used for delivering EPOLLIN events)
> - *
> - * Initially the tails are marked invalid with %INVALID_TAIL_PTR which
> - * indicates that an updated tail pointer is needed.
> + * for this earlier, as part of the oa_buffer_check_unlocked to avoid lots of
> + * redundant read() attempts.
> + *
> + * We workaround this issue in oa_buffer_check_unlocked() by reading the 
> reports
> + * in the OA buffer, starting from the tail reported by the HW until we find 
> a
> + * report with its first 2 dwords not 0 meaning its previous report is
> + * completely in memory and ready to be read. Those dwords are also set to 0
> + * once read and the whole buffer is cleared upon OA buffer initialization. 
> The
> + * first dword is the reason for this report while the second is the 
> timestamp,
> + * making the chances of having those 2 fields at 0 fairly unlikely. A more
> + * detailed explanation is available in oa_buffer_check_unlocked().
>   *
>   * Most of the implementation details for this workaround are in
>   * oa_buffer_check_unlocked() and _append_oa_reports()
> @@ -454,8 +445,8 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream 
> *stream)
>   * (See description of OA_TAIL_MARGIN_NSEC above for further details.)
>   *
>   * Besides returning true when there is data available to read() this 
> function
> - * also has the side effect of updating the oa_buffer.tails[], 
> .aging_timestamp
> - * and .aged_tail_idx state used for reading.
> + * also updates the tail, aging_tail and aging_timestamp in the oa_buffer
> + * object.
>   *
>   * Note: It's safe to read OA config state here unlocked, assuming that this 
> is
>   * only called while the stream is enabled, while the global OA configuration
> @@ -465,28 +456,18 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream 
> *stream)
>   */
>  static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>  {
> + u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>   int report_size = stream->oa_buffer.format_size;
>   unsigned long flags;
> - unsigned int aged_idx;
> - u32 head, hw_tail, aged_tail, aging_tail;
> + u32 hw_tail;
>   u64 now;
>
>   /* We have to consider the (unlikely) possibility that read() errors
> - 

Re: [Intel-gfx] [PATCH] drm/i915: break TGL pci-ids in GT 1 & 2

2020-08-28 Thread Dixit, Ashutosh
On Fri, 28 Aug 2020 06:31:25 -0700, Lionel Landwerlin wrote:
>
> I'll need this in IGT to identify the different kind of GTs and apply
> the right performance query configuration.

Reviewed-by: Ashutosh Dixit 

> Signed-off-by: Lionel Landwerlin 
> ---
>  include/drm/i915_pciids.h | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/include/drm/i915_pciids.h b/include/drm/i915_pciids.h
> index 8e7ae30ebcbb..7eeecb07c9a1 100644
> --- a/include/drm/i915_pciids.h
> +++ b/include/drm/i915_pciids.h
> @@ -594,19 +594,25 @@
>   INTEL_VGA_DEVICE(0x4E51, info)
>
>  /* TGL */
> -#define INTEL_TGL_12_IDS(info) \
> +#define INTEL_TGL_12_GT1_IDS(info) \
> + INTEL_VGA_DEVICE(0x9A60, info), \
> + INTEL_VGA_DEVICE(0x9A68, info), \
> + INTEL_VGA_DEVICE(0x9A70, info)
> +
> +#define INTEL_TGL_12_GT2_IDS(info) \
>   INTEL_VGA_DEVICE(0x9A40, info), \
>   INTEL_VGA_DEVICE(0x9A49, info), \
>   INTEL_VGA_DEVICE(0x9A59, info), \
> - INTEL_VGA_DEVICE(0x9A60, info), \
> - INTEL_VGA_DEVICE(0x9A68, info), \
> - INTEL_VGA_DEVICE(0x9A70, info), \
>   INTEL_VGA_DEVICE(0x9A78, info), \
>   INTEL_VGA_DEVICE(0x9AC0, info), \
>   INTEL_VGA_DEVICE(0x9AC9, info), \
>   INTEL_VGA_DEVICE(0x9AD9, info), \
>   INTEL_VGA_DEVICE(0x9AF8, info)
>
> +#define INTEL_TGL_12_IDS(info) \
> + INTEL_TGL_12_GT1_IDS(info), \
> + INTEL_TGL_12_GT2_IDS(info)
> +
>  /* RKL */
>  #define INTEL_RKL_IDS(info) \
>   INTEL_VGA_DEVICE(0x4C80, info), \
> --
> 2.28.0
>
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 4/7] drm/i915/perf: Whitelist OA report trigger registers

2020-11-13 Thread Dixit, Ashutosh
On Fri, 13 Nov 2020 14:12:09 -0800, Umesh Nerlige Ramappa wrote:
>
> > +   if (wal->engine)
> > +   spin_lock_irqsave(>engine->uncore->lock, flags);
> > +
> > +   kfree(wal->list);
>
> if (wal->list)
>   kfree(wal->list);

void kfree(const void *objp)
{
...

if (unlikely(ZERO_OR_NULL_PTR(objp)))
return;
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t 2/2] i915/gem_ctx_thrash: Reopen the same device

2020-11-04 Thread Dixit, Ashutosh
On Wed, 04 Nov 2020 16:21:24 -0800, Chris Wilson wrote:
>
> Use gem_reopen_driver() to always reopen the same device without relying
> on the filtering in drm_open_driver().

Reviewed-by: Ashutosh Dixit 

> Signed-off-by: Chris Wilson 
> ---
>  tests/i915/gem_ctx_thrash.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/i915/gem_ctx_thrash.c b/tests/i915/gem_ctx_thrash.c
> index d32619d5d..d9ddd6689 100644
> --- a/tests/i915/gem_ctx_thrash.c
> +++ b/tests/i915/gem_ctx_thrash.c
> @@ -250,7 +250,7 @@ static void processes(void)
>   fds = malloc(num_ctx * sizeof(int));
>   igt_assert(fds);
>   for (unsigned n = 0; n < num_ctx; n++) {
> - fds[n] = drm_open_driver(DRIVER_INTEL);
> + fds[n] = gem_reopen_driver(fd);
>   if (fds[n] == -1) {
>   int err = errno;
>   for (unsigned i = n; i--; )
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t 1/2] i915/gem_exec_whisper: Reopen existing device

2020-11-04 Thread Dixit, Ashutosh
On Wed, 04 Nov 2020 16:21:23 -0800, Chris Wilson wrote:
>
> Reopen the existing device, rather than relying on the filtering in
> drm_open_driver().

Reviewed-by: Ashutosh Dixit 

> Signed-off-by: Chris Wilson 
> ---
>  tests/i915/gem_exec_whisper.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/tests/i915/gem_exec_whisper.c b/tests/i915/gem_exec_whisper.c
> index 9acf6c306..b63d791d1 100644
> --- a/tests/i915/gem_exec_whisper.c
> +++ b/tests/i915/gem_exec_whisper.c
> @@ -87,12 +87,12 @@ struct hang {
>   int fd;
>  };
>
> -static void init_hang(struct hang *h)
> +static void init_hang(struct hang *h, int fd)
>  {
>   uint32_t *batch;
>   int i, gen;
>
> - h->fd = drm_open_driver(DRIVER_INTEL);
> + h->fd = gem_reopen_driver(fd);
>   igt_allow_hang(h->fd, 0, 0);
>
>   gen = intel_gen(intel_get_drm_devid(h->fd));
> @@ -224,7 +224,7 @@ static void whisper(int fd, unsigned engine, unsigned 
> flags)
>   igt_require(gem_has_queues(fd));
>
>   if (flags & HANG)
> - init_hang();
> + init_hang(, fd);
>
>   nchild = 1;
>   if (flags & FORKED)
> @@ -304,7 +304,7 @@ static void whisper(int fd, unsigned engine, unsigned 
> flags)
>   }
>   if (flags & FDS) {
>   for (n = 0; n < 64; n++)
> - fds[n] = drm_open_driver(DRIVER_INTEL);
> + fds[n] = gem_reopen_driver(fd);
>   }
>
>   memset(batches, 0, sizeof(batches));
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] i915/gem_exec_parallel: Reopen the existing device

2020-11-04 Thread Dixit, Ashutosh
On Wed, 04 Nov 2020 14:23:21 -0800, Chris Wilson wrote:
>
> Avoid any unnecessary filtering inside drm_open_driver() by explicitly
> reopening the same device.
>
> Signed-off-by: Chris Wilson 
> ---
>  tests/i915/gem_exec_parallel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/i915/gem_exec_parallel.c b/tests/i915/gem_exec_parallel.c
> index bdb8e3e90..1a988b957 100644
> --- a/tests/i915/gem_exec_parallel.c
> +++ b/tests/i915/gem_exec_parallel.c
> @@ -77,7 +77,7 @@ static void *thread(void *data)
>   pthread_mutex_unlock(t->mutex);
>
>   if (t->flags & FDS) {
> - fd = drm_open_driver(DRIVER_INTEL);
> + fd = gem_reopen_driver(t->fd);

Reviewed-by: Ashutosh Dixit 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915

2021-05-10 Thread Dixit, Ashutosh
On Sun, 09 May 2021 16:11:43 -0700, Jason Ekstrand wrote:
>
> Yes, landing GuC support may be the first step in removing execlist
> support. The inevitable reality is that GPU scheduling is coming and
> likely to be there only path in the not-too-distant future.  (See also
> the ongoing thread with AMD about fences.) I'm not going to pass
> judgement on whether or not this is a good thing.  I'm just reading the
> winds and, in my view, this is where things are headed for good or ill.
>
> In answer to the question above, the answer to "what do we gain from
> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
> again, I'm not necessarily advocating for it, but that is likely where
> things are headed.
>
> A firmware-based submission model isn't a bad design IMO and, aside from
> the firmware freedom issues, I think there are actual advantages to the
> model. Immediately, it'll unlock a few features like parallel submission
> (more on that in a bit) and long-running compute because they're
> implemented in GuC and the work to implement them properly in the
> execlist scheduler is highly non-trivial.  Longer term, it may (no
> guarantees) unlock some performance by getting the kernel out of the way.

I believe another main reason for GuC is support for HW based
virtualization like SRIOV. The only way to support SRIOV with execlists
would be to statically partition the GPU between VM's, any dynamic
partitioning needs something in HW.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v2] gem_watchdog: Skip test if default request expiry is not compiled in

2021-06-29 Thread Dixit, Ashutosh
On Mon, 24 May 2021 07:38:01 -0700, Tvrtko Ursulin wrote:
>
> From: Tvrtko Ursulin 
>
> Test incorrectly assumes no modparam means default expiry, while in
> reality no modparam means old kernel / de-selected feature in which
> case test should skip.
>
> v2:
>  * New line. (Petri)

Reviewed-by: Ashutosh Dixit 

> Signed-off-by: Tvrtko Ursulin 
> ---
>  tests/i915/gem_watchdog.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/tests/i915/gem_watchdog.c b/tests/i915/gem_watchdog.c
> index 8f9fb17750fb..67fddac74bc1 100644
> --- a/tests/i915/gem_watchdog.c
> +++ b/tests/i915/gem_watchdog.c
> @@ -630,6 +630,7 @@ igt_main
>
>   igt_fixture {
>   struct drm_i915_query_item item;
> + const unsigned int timeout = 1;
>   char *tmp;
>
>   i915 = drm_open_driver_master(DRIVER_INTEL);
> @@ -639,16 +640,13 @@ igt_main
>   igt_require_gem(i915);
>
>   tmp = __igt_params_get(i915, "request_timeout_ms");
> - if (tmp) {
> - const unsigned int timeout = 1;
> + igt_skip_on_f(!tmp || !atoi(tmp),
> +   "Request expiry not supported!\n");
> + free(tmp);
>
> - igt_params_save_and_set(i915, "request_timeout_ms",
> - "%u", timeout * 1000);
> - default_timeout_wait_s = timeout * 5;
> - free(tmp);
> - } else {
> - default_timeout_wait_s = 12;
> - }
> + igt_params_save_and_set(i915, "request_timeout_ms", "%u",
> + timeout * 1000);
> + default_timeout_wait_s = timeout * 5;
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-04-30 Thread Dixit, Ashutosh
On Fri, 30 Apr 2021 16:00:46 -0700, Dixit, Ashutosh wrote:
>
> On Fri, 30 Apr 2021 15:26:09 -0700, Umesh Nerlige Ramappa wrote:
> >
> > Looks like the engine can be dropped since all timestamps are in sync. I
> > just have one more question here. The timestamp itself is 36 bits.  Should
> > the uapi also report the timestamp width to the user OR should I just
> > return the lower 32 bits of the timestamp?
>
> How would exposing only the lower 32 bits of the timestamp work?

It would work I guess but overflow every few seconds. So if the counters
are sampled at a low frequency (once every few seconds) it would yield
misleading timestamps.

> The way to avoid exposing the width would be to expose the timestamp as a
> regular 64 bit value. In the kernel engine state, have a variable for the
> counter and keep on accumulating that (on each query) to full 64 bits in
> spite of the 36 bit HW counter overflow.
>
> So not exposing the width (or exposing a 64 bit timestamp) is a cleaner
> interface but also more work in the kernel.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-04-30 Thread Dixit, Ashutosh
On Fri, 30 Apr 2021 15:26:09 -0700, Umesh Nerlige Ramappa wrote:
>
> Looks like the engine can be dropped since all timestamps are in sync. I
> just have one more question here. The timestamp itself is 36 bits.  Should
> the uapi also report the timestamp width to the user OR should I just
> return the lower 32 bits of the timestamp?

How would exposing only the lower 32 bits of the timestamp work?

The way to avoid exposing the width would be to expose the timestamp as a
regular 64 bit value. In the kernel engine state, have a variable for the
counter and keep on accumulating that (on each query) to full 64 bits in
spite of the 36 bit HW counter overflow.

So not exposing the width (or exposing a 64 bit timestamp) is a cleaner
interface but also more work in the kernel.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-04-30 Thread Dixit, Ashutosh
On Fri, 30 Apr 2021 19:19:59 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, Apr 30, 2021 at 07:35:41PM -0500, Jason Ekstrand wrote:
> >   On April 30, 2021 18:00:58 "Dixit, Ashutosh" 
> >   wrote:
> >
> > On Fri, 30 Apr 2021 15:26:09 -0700, Umesh Nerlige Ramappa wrote:
> >
> >   Looks like the engine can be dropped since all timestamps are in sync.
> >   I
> >   just have one more question here. The timestamp itself is 36 bits.
> >    Should
> >   the uapi also report the timestamp width to the user OR should I just
> >   return the lower 32 bits of the timestamp?
> >
> >   Yeah, I think reporting the timestamp width is a good idea since we're
> >   reporting the period/frequency here.
>
> Actually, I forgot that we are handling the overflow before returning the
> cs_cycles to the user and overflow handling was the only reason I thought
> user should know the width. Would you stil recommend returning the width in
> the uapi?

The width is needed for userspace to figure out if overflow has occured
between two successive query calls. I don't think I see this happening in
the code.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: Start disabling pread/pwrite ioctl's for future platforms

2021-01-24 Thread Dixit, Ashutosh
On Sat, 23 Jan 2021 05:05:02 -0800, Patchwork wrote:
>
> [1  ]
> [2  ]
> Project List - Patchwork
>
> Patch Details
>
>  Series:  drm/i915: Start disabling pread/pwrite ioctl's for future platforms
>  URL:  https://patchwork.freedesktop.org/series/86199/
>  State:  failure
>  Details:  https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19470/index.html
>
> CI Bug Log - changes from CI_DRM_9671_full -> Patchwork_19470_full
>
> Summary
>
> FAILURE
>
> Serious unknown changes coming with Patchwork_19470_full absolutely need to be
> verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_19470_full, please notify your bug team to allow them
> to document this new failure mode, which will reduce false positives in CI.
>
> Possible new issues
>
> Here are the unknown changes that may have been introduced in 
> Patchwork_19470_full:
>
> IGT changes
>
> Possible regressions
>
> * igt@kms_sequence@queue-busy:
>
>  * shard-skl: PASS -> FAIL

This is a unrelated failure and a false positive, the patch should not have
any effect on this platform.

Seeing a few igt@kms_sequence bugs in gitlab but maybe the signature for
this one is a little different. Thanks.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 55/56] drm/i915/perf: Enable OA formats for ADL_P

2021-03-12 Thread Dixit, Ashutosh
On Thu, 11 Mar 2021 14:36:31 -0800, Matt Roper wrote:
>
> From: Umesh Nerlige Ramappa 
>
> Enable relevant OA formats for ADL_P.

Reviewed-by: Ashutosh Dixit 

> Cc: Ashutosh Dixit 
> Signed-off-by: Umesh Nerlige Ramappa 
> Signed-off-by: Clinton Taylor 
> Signed-off-by: Matt Roper 
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index c15bead2dac7..e52b54ff0999 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -4291,6 +4291,7 @@ static void oa_init_supported_formats(struct i915_perf 
> *perf)
>   case INTEL_ROCKETLAKE:
>   case INTEL_DG1:
>   case INTEL_ALDERLAKE_S:
> + case INTEL_ALDERLAKE_P:
>   oa_format_add(perf, I915_OA_FORMAT_A12);
>   oa_format_add(perf, I915_OA_FORMAT_A12_B8_C8);
>   oa_format_add(perf, I915_OA_FORMAT_A32u40_A4u32_B8_C8);
> --
> 2.25.4
>
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 0/3] drm/i915: Drop legacy IOCTLs on new HW

2021-03-15 Thread Dixit, Ashutosh
On Mon, 15 Mar 2021 07:34:25 -0700, Jason Ekstrand wrote:
>
> These three patches exist to clean up some of our IOCTL mess in i915.
> We've got more clean-up we should do eventually, but these are some of the
> easiest to drop and most egregious cases.
>
> Test-with: 20210121083742.46592-1-ashutosh.di...@intel.com

Hi Jason,

Sorry the above IGT build is too old and has been discarded so no tests
were running on actual HW as is mentioned here:

https://intel-gfx-ci.01.org/test-with.html

I resubmitted the IGT patch today so we have a newer IGT build and have
just resubmitted this series with that IGT build. There are no other
changes with the actual patches themselves.

Thanks.
--
Ashutosh



>
> Ashutosh Dixit (1):
>   drm/i915: Disable pread/pwrite ioctl's for future platforms (v3)
>
> Jason Ekstrand (2):
>   drm/i915/gem: Drop legacy execbuffer support (v2)
>   drm/i915/gem: Drop relocation support on all new hardware (v5)
>
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 113 ++
>  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h|   2 -
>  drivers/gpu/drm/i915/i915_drv.c   |   2 +-
>  drivers/gpu/drm/i915/i915_gem.c   |  14 +++
>  include/uapi/drm/i915_drm.h   |   1 +
>  5 files changed, 26 insertions(+), 106 deletions(-)
>
> --
> 2.29.2
>
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/1] drm/i915: Disable pread/pwrite ioctl's for future platforms (v2)

2021-03-11 Thread Dixit, Ashutosh
On Thu, 11 Mar 2021 12:20:17 -0800, Jason Ekstrand wrote:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b2e3b5cfccb4a..78ad5a9dd4784 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -374,10 +374,19 @@ int
>  i915_gem_pread_ioctl(struct drm_device *dev, void *data,
>struct drm_file *file)
>  {
> + struct drm_i915_private *i915 = to_i915(dev);
>   struct drm_i915_gem_pread *args = data;
>   struct drm_i915_gem_object *obj;
>   int ret;
>
> + /* Pread is disallowed for all platforms after TGL-LP */
> + if (INTEL_GEN(i915) >= 12 && !IS_TIGERLAKE(i915))
> + return -EOPNOTSUPP;
> +
> + /* All discrete memory platforms are Gen12 or above */
> + if (WARN_ON(HAS_LMEM(i915)))
> + return -EOPNOTSUPP;

Not sure but you are probably trying to make it explicit that pread/pwrite
are truly gone on dGfx? Because real dGfx are Gen12+ the code will return
from the first if statement and never get to the second if statement. And
there's talk on the relocation thread about tripping fake LMEM here for
platforms prior to Gen12.

So I'd suggest get rid of this second if statement and only retain the
first (for both pread and pwrite) since that seems to be entirely
sufficient.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] i915: Drop legacy execbuffer support

2021-03-11 Thread Dixit, Ashutosh
On Wed, 10 Mar 2021 13:00:49 -0800, Jason Ekstrand wrote:
>
> libdrm has supported the newer execbuffer2 ioctl and using it by default
> when it exists since libdrm commit b50964027bef249a0cc3d511de05c2464e0a1e22
> which landed Mar 2, 2010.  The i915 and i965 drivers in Mesa at the time
> both used libdrm and so did the Intel X11 back-end.  The SNA back-end
> for X11 has always used execbuffer2.
>
> Signed-off-by: Jason Ekstrand 
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 100 --
>  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h|   2 -
>  drivers/gpu/drm/i915/i915_drv.c   |   2 +-
>  3 files changed, 1 insertion(+), 103 deletions(-)

Don't we want to clean up references to legacy execbuffer in
include/uapi/drm/i915_drm.h too?
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] i915: Drop legacy execbuffer support

2021-03-11 Thread Dixit, Ashutosh
On Thu, 11 Mar 2021 20:31:33 -0800, Jason Ekstrand wrote:
> On March 11, 2021 20:26:06 "Dixit, Ashutosh"  wrote:
>  On Wed, 10 Mar 2021 13:00:49 -0800, Jason Ekstrand wrote:
>
>  libdrm has supported the newer execbuffer2 ioctl and using it by default
>  when it exists since libdrm commit b50964027bef249a0cc3d511de05c2464e0a1e22
>  which landed Mar 2, 2010.  The i915 and i965 drivers in Mesa at the time
>  both used libdrm and so did the Intel X11 back-end.  The SNA back-end
>  for X11 has always used execbuffer2.
>
>  Signed-off-by: Jason Ekstrand 
>  ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 100 --
>  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h|   2 -
>  drivers/gpu/drm/i915/i915_drv.c   |   2 +-
>  3 files changed, 1 insertion(+), 103 deletions(-)
>
>  Don't we want to clean up references to legacy execbuffer in
>  include/uapi/drm/i915_drm.h too?
>
> I thought about that but Daniel said we should leave them. Maybe a
> comment is in order?

No, should be ok since we are using drm_invalid_op(). If we want to delete
the unused 'struct drm_i915_gem_execbuffer' we can do that by converting
from DRM_IOW to DRM_IO in the DRM_IOCTL_I915_GEM_EXECBUFFER #define.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] i915/perf: Start hrtimer only if sampling the OA buffer

2021-03-01 Thread Dixit, Ashutosh
On Mon, 01 Mar 2021 16:01:41 -0800, Nerlige Ramappa, Umesh wrote:
>
> SAMPLE_OA parameter enables sampling of OA buffer and results in a call
> to init the OA buffer which initializes the OA unit head/tail pointers.
> The OA_EXPONENT parameter controls the periodicity of the OA reports in
> the OA buffer and results in starting a hrtimer.
>
> Before gen12, all use cases required the use of the OA buffer and i915
> enforced this setting when vetting out the parameters passed. In these
> platforms the hrtimer was enabled if OA_EXPONENT was passed. This worked
> fine since it was implied that SAMPLE_OA is always passed.
>
> With gen12, this changed. Users can use perf without enabling the OA
> buffer as in OAR use cases. While an OAR use case should ideally not
> start the hrtimer, we see that passing an OA_EXPONENT parameter will
> start the hrtimer even though SAMPLE_OA is not specified. This results
> in an uninitialized OA buffer, so the head/tail pointers used to track
> the buffer are zero.
>
> This itself does not fail, but if we ran a use-case that SAMPLED the OA
> buffer previously, then the OA_TAIL register is still pointing to an old
> value. When the timer callback runs, it ends up calculating a
> wrong/large number of available reports. Since we do a spinlock_irq_save
> and start processing a large number of reports, NMI watchdog fires and
> causes a crash.
>
> Start the timer only if SAMPLE_OA is specified.
> v2:
> - Drop SAMPLE OA check when appending samples (Ashutosh)
> - Prevent read if OA buffer is not being sampled

Reviewed-by: Ashutosh Dixit 

> Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
> Signed-off-by: Umesh Nerlige Ramappa 
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index c15bead2dac7..2fd2c13b76ac 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -595,7 +595,6 @@ static int append_oa_sample(struct i915_perf_stream 
> *stream,
>  {
>   int report_size = stream->oa_buffer.format_size;
>   struct drm_i915_perf_record_header header;
> - u32 sample_flags = stream->sample_flags;
>
>   header.type = DRM_I915_PERF_RECORD_SAMPLE;
>   header.pad = 0;
> @@ -609,10 +608,8 @@ static int append_oa_sample(struct i915_perf_stream 
> *stream,
>   return -EFAULT;
>   buf += sizeof(header);
>
> - if (sample_flags & SAMPLE_OA_REPORT) {
> - if (copy_to_user(buf, report, report_size))
> - return -EFAULT;
> - }
> + if (copy_to_user(buf, report, report_size))
> + return -EFAULT;
>
>   (*offset) += header.size;
>
> @@ -2669,7 +2666,7 @@ static void i915_oa_stream_enable(struct 
> i915_perf_stream *stream)
>
>   stream->perf->ops.oa_enable(stream);
>
> - if (stream->periodic)
> + if (stream->sample_flags & SAMPLE_OA_REPORT)
>   hrtimer_start(>poll_check_timer,
> ns_to_ktime(stream->poll_oa_period),
> HRTIMER_MODE_REL_PINNED);
> @@ -2732,7 +2729,7 @@ static void i915_oa_stream_disable(struct 
> i915_perf_stream *stream)
>  {
>   stream->perf->ops.oa_disable(stream);
>
> - if (stream->periodic)
> + if (stream->sample_flags & SAMPLE_OA_REPORT)
>   hrtimer_cancel(>poll_check_timer);
>  }
>
> @@ -3015,7 +3012,7 @@ static ssize_t i915_perf_read(struct file *file,
>* disabled stream as an error. In particular it might otherwise lead
>* to a deadlock for blocking file descriptors...
>*/
> - if (!stream->enabled)
> + if (!stream->enabled || !(stream->sample_flags & SAMPLE_OA_REPORT))
>   return -EIO;
>
>   if (!(file->f_flags & O_NONBLOCK)) {
> --
> 2.20.1
>
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v2 01/11] lib/i915/gem_mman: add FIXED mmap mode

2021-07-29 Thread Dixit, Ashutosh
On Wed, 28 Jul 2021 15:20:15 -0700, Dixit, Ashutosh wrote:
>
> On Wed, 28 Jul 2021 03:30:31 -0700, Matthew Auld wrote:
> >
> > diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> > index 4b4f2114..e2514f0c 100644
> > --- a/lib/i915/gem_mman.c
> > +++ b/lib/i915/gem_mman.c
> > @@ -497,6 +497,43 @@ void *gem_mmap_offset__cpu(int fd, uint32_t handle, 
> > uint64_t offset,
> > return ptr;
> >  }
> >
> > +#define LOCAL_I915_MMAP_OFFSET_FIXED 4
>
> At the minimum for now let's get rid of the LOCAL_ prefix here. Even if
> this appears in i915_drm.h later, gcc will only complain if the value
> diverges.

We just merged lib/i915/i915_drm_local.h so any declarations which we are
expecting appear to later appear in i915_drm.h should be added there,
without the LOCAL_ prefix. So they should be added just as we are expecting
them to appear later in i915_drm.h. Thanks.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v2 03/11] lib/i915/gem_mman: add fixed mode to mmap__cpu_coherent

2021-07-28 Thread Dixit, Ashutosh
On Wed, 28 Jul 2021 03:30:33 -0700, Matthew Auld wrote:
>
> diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> index 222e8896..337d28fb 100644
> --- a/lib/i915/gem_mman.c
> +++ b/lib/i915/gem_mman.c
> @@ -580,6 +580,8 @@ void *gem_mmap__cpu_coherent(int fd, uint32_t handle, 
> uint64_t offset,
>   igt_assert(offset == 0);
>
>   ptr = __gem_mmap__cpu_coherent(fd, handle, offset, size, prot);
> + if (!ptr)
> + ptr = __gem_mmap_offset__fixed(fd, handle, offset, size, prot);
>   igt_assert(ptr);

Shouldn't this go in __gem_mmap__cpu_coherent instead? Maybe right after
__gem_mmap_offset__cpu.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v2 01/11] lib/i915/gem_mman: add FIXED mmap mode

2021-07-28 Thread Dixit, Ashutosh
On Wed, 28 Jul 2021 03:30:31 -0700, Matthew Auld wrote:
>
> diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> index 4b4f2114..e2514f0c 100644
> --- a/lib/i915/gem_mman.c
> +++ b/lib/i915/gem_mman.c
> @@ -497,6 +497,43 @@ void *gem_mmap_offset__cpu(int fd, uint32_t handle, 
> uint64_t offset,
>   return ptr;
>  }
>
> +#define LOCAL_I915_MMAP_OFFSET_FIXED 4

At the minimum for now let's get rid of the LOCAL_ prefix here. Even if
this appears in i915_drm.h later, gcc will only complain if the value
diverges.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v2 02/11] lib/i915/gem_mman: add fixed mode to mmap__device_coherent

2021-07-28 Thread Dixit, Ashutosh
On Wed, 28 Jul 2021 03:30:32 -0700, Matthew Auld wrote:
>
> diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> index e2514f0c..222e8896 100644
> --- a/lib/i915/gem_mman.c
> +++ b/lib/i915/gem_mman.c
> @@ -383,9 +383,10 @@ void *__gem_mmap__device_coherent(int fd, uint32_t 
> handle, uint64_t offset,
> I915_MMAP_OFFSET_WC);
>   if (!ptr)
>   ptr = __gem_mmap__wc(fd, handle, offset, size, prot);
> -
>   if (!ptr)
>   ptr = __gem_mmap__gtt(fd, handle, size, prot);
> + if (!ptr)
> + ptr = __gem_mmap_offset__fixed(fd, handle, offset, size, prot);

Wondering if we really want 4 system calls for discrete. Maybe we can move
this up to 2nd place right after __gem_mmap_offset?
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v2 04/11] lib/i915/gem_mman: add fixed mode to gem_mmap__cpu

2021-07-28 Thread Dixit, Ashutosh
On Wed, 28 Jul 2021 03:30:34 -0700, Matthew Auld wrote:
>
> diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> index 337d28fb..6f5e6d72 100644
> --- a/lib/i915/gem_mman.c
> +++ b/lib/i915/gem_mman.c
> @@ -434,7 +434,13 @@ void *gem_mmap__device_coherent(int fd, uint32_t handle, 
> uint64_t offset,
>   */
>  void *__gem_mmap__cpu(int fd, uint32_t handle, uint64_t offset, uint64_t 
> size, unsigned prot)
>  {
> - return __gem_mmap(fd, handle, offset, size, prot, 0);
> + void *ptr;
> +
> + ptr = __gem_mmap(fd, handle, offset, size, prot, 0);
> + if (!ptr)
> + ptr = __gem_mmap_offset__fixed(fd, handle, offset, size, prot);
> +
> + return ptr;

What about __gem_mmap__wc? Also shouldn't we just fix the __gem_mmap_offset
fallback in __gem_mmap and that will take care of both __gem_mmap__cpu and
__gem_mmap__wc?

(I think it will actually also fix __gem_mmap__device_coherent and
__gem_mmap__cpu_coherent but maybe we can still have those patches in this
series especially if they save a couple of system calls).
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t v3 09/11] tests/i915/module_load: update for discrete

2021-08-02 Thread Dixit, Ashutosh
On Fri, 30 Jul 2021 01:53:46 -0700, Matthew Auld wrote:
>
> The set_caching ioctl is gone for discrete, and now just returns
> -ENODEV. Update the gem_sanitycheck to account for that. After this we
> should be back to just having the breakage caused by missing reloc
> support for the reload testcase.
>
> Signed-off-by: Matthew Auld 
> Cc: Maarten Lankhorst 
> Cc: Ashutosh Dixit 
> Cc: Daniel Vetter 
> Cc: Ramalingam C 
> ---
>  tests/i915/i915_module_load.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/tests/i915/i915_module_load.c b/tests/i915/i915_module_load.c
> index 98ceb5d8..4b42fe3e 100644
> --- a/tests/i915/i915_module_load.c
> +++ b/tests/i915/i915_module_load.c
> @@ -172,17 +172,22 @@ static void gem_sanitycheck(void)
>  {
>   struct drm_i915_gem_caching args = {};
>   int i915 = __drm_open_driver(DRIVER_INTEL);
> + int expected;

If we want to reduce a couple of line of code this is:

int expected = gem_has_lmem(i915) ? -ENODEV : -ENOENT;

Otherwise this is:

Reviewed-by: Ashutosh Dixit 

>   int err;
>
> + expected = -ENOENT;
> + if (gem_has_lmem(i915))
> + expected = -ENODEV;
> +
>   err = 0;
>   if (ioctl(i915, DRM_IOCTL_I915_GEM_SET_CACHING, ))
>   err = -errno;
> - if (err == -ENOENT)
> + if (err == expected)
>   store_all(i915);
>   errno = 0;
>
>   close(i915);
> - igt_assert_eq(err, -ENOENT);
> + igt_assert_eq(err, expected);
>  }
>
>  static void
> --
> 2.26.3
>


Re: [Intel-gfx] [PATCH i-g-t v3 08/11] lib/ioctl_wrappers: update set_domain for discrete

2021-08-02 Thread Dixit, Ashutosh
On Fri, 30 Jul 2021 01:53:45 -0700, Matthew Auld wrote:
>
> On discrete set_domain is now gone, instead we just need to add the
> wait.

Reviewed-by: Ashutosh Dixit 

> Signed-off-by: Matthew Auld 
> Cc: Maarten Lankhorst 
> Cc: Ashutosh Dixit 
> Cc: Daniel Vetter 
> Cc: Ramalingam C 
> ---
>  lib/ioctl_wrappers.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
> index 7e27a1b3..09eb3ce7 100644
> --- a/lib/ioctl_wrappers.c
> +++ b/lib/ioctl_wrappers.c
> @@ -565,7 +565,12 @@ int __gem_set_domain(int fd, uint32_t handle, uint32_t 
> read, uint32_t write)
>   */
>  void gem_set_domain(int fd, uint32_t handle, uint32_t read, uint32_t write)
>  {
> - igt_assert_eq(__gem_set_domain(fd, handle, read, write), 0);
> + int ret = __gem_set_domain(fd, handle, read, write);
> +
> + if (ret == -ENODEV && gem_has_lmem(fd))
> + igt_assert_eq(gem_wait(fd, handle, 0), 0);
> + else
> + igt_assert_eq(ret, 0);
>  }
>
>  /**
> --
> 2.26.3
>


Re: [Intel-gfx] [PATCH i-g-t v2 04/11] lib/i915/gem_mman: add fixed mode to gem_mmap__cpu

2021-08-02 Thread Dixit, Ashutosh
On Thu, 29 Jul 2021 01:50:45 -0700, Matthew Auld wrote:
>

Hi Matt,

> On 29/07/2021 00:07, Dixit, Ashutosh wrote:
> > On Wed, 28 Jul 2021 03:30:34 -0700, Matthew Auld wrote:
> >>
> >> diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> >> index 337d28fb..6f5e6d72 100644
> >> --- a/lib/i915/gem_mman.c
> >> +++ b/lib/i915/gem_mman.c
> >> @@ -434,7 +434,13 @@ void *gem_mmap__device_coherent(int fd, uint32_t 
> >> handle, uint64_t offset,
> >>*/
> >>   void *__gem_mmap__cpu(int fd, uint32_t handle, uint64_t offset, uint64_t 
> >> size, unsigned prot)
> >>   {
> >> -  return __gem_mmap(fd, handle, offset, size, prot, 0);
> >> +  void *ptr;
> >> +
> >> +  ptr = __gem_mmap(fd, handle, offset, size, prot, 0);
> >> +  if (!ptr)
> >> +  ptr = __gem_mmap_offset__fixed(fd, handle, offset, size, prot);
> >> +
> >> +  return ptr;
> >
> > What about __gem_mmap__wc? Also shouldn't we just fix the __gem_mmap_offset
> > fallback in __gem_mmap and that will take care of both __gem_mmap__cpu and
> > __gem_mmap__wc?
>
> For gem_mmap__wc it felt like slightly too much lying, since on discrete
> smem-only buffers are always wb, and so the __wc here is not what the user
> gets with the new FIXED mode.

Correct.

> gem_mmap__device_coherent() I think matches this new behaviour well,
> where we don't explicitly state what the mapping type is, but instead
> just guarantee that the returned mapping is device coherent. My rough
> thinking was to convert most users of __wc over to __device_coherent(),
> at least in the tests that we care about for discrete?

Correct, though note that ALL tests will run on discrete (though with
smem-only buffers with today's IGT since I don't believe we have added any
tests which create lmem buffers yet).

> On the other hand if we are happy with the lie, I don't think anything will
> break, and pretty much all testscases using mmap I think should just
> magically work on discrete, and it does mean a less less work vs converting
> to __device_coherent?

I lost you here since I really don't know what you mean by not changing to
__device_coherent because afaiu gem_mmap__wc and gem_mmap_offset__wc will
fail on discrete so we can't just not do anything.

Next, I have no idea what is working and what is not working with the
present series on DG1 (since the CI is showing all red on DG1 and I don't
believe DG1 is included in showing regressions) and whether or not we have
completed these FIXED related changes, so is this series complete or not?

Afais since this series hasn't done anything about wc (except
__device_coherent), any test which does wc will fail on discrete.

> >
> > (I think it will actually also fix __gem_mmap__device_coherent and
> > __gem_mmap__cpu_coherent but maybe we can still have those patches in this
> > series especially if they save a couple of system calls).

Last, I believe I have a very simple solution to this FIXED
conundrum. Which I believe works (you guys can tell me if it doesn't) but
what I am not sure is whether it is an acceptable programming paradigm.

Let us say, on discrete, we decide that we won't make any changes to any of
the tests (we'll change on the library), and the tests themselves will not
even be aware of the FIXED mode. The tests always ask for WC or WB but
under the hood we will convert those calls to FIXED on discrete. So the
tests ask for WC or WB but that will mean FIXED on discrete.

Then I believe all we need to do is make a single very simple change to the
function __gem_mmap_offset(). In this functions, if we see WC or WB we will
convert that to FIXED before issuing the ioctl (or alternative issue the
ioctl but if it returns error we will retry with FIXED).

This works even for __gem_mmap() because __gem_mmap() itself falls back to
__gem_mmap_offset(). So afais we don't need to make ANY changes to ANY of
the higher level functions because we have fixed it in the lowest level
function. If we do it this way we won't need most of the first 7 patches in
the series. (This is what I was saying in the previous reply too).

But like I said whether we want to do this or not we have to decide and I
want other CC'd reviewers to chime in on this. It doesn't mean that we
cannot make changes to higher level functions if it makes things more
efficient or if we want to proliferate the FIXED mode throughout IGT but I
think changing the lowest level function is sufficient to fix failing
tests.

Thanks.
--
Ashutosh


Re: [Intel-gfx] [PATCH i-g-t v3 03/11] lib/i915/gem_mman: add fixed mode to gem_mmap_offset__cpu

2021-08-02 Thread Dixit, Ashutosh
On Fri, 30 Jul 2021 01:53:40 -0700, Matthew Auld wrote:
>
> On discrete we only support the new fixed mode.
>
> Signed-off-by: Matthew Auld 
> Cc: Maarten Lankhorst 
> Cc: Ashutosh Dixit 
> Cc: Daniel Vetter 
> Cc: Ramalingam C 
> ---
>  lib/i915/gem_mman.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> index c432bb16..563a7ccf 100644
> --- a/lib/i915/gem_mman.c
> +++ b/lib/i915/gem_mman.c
> @@ -474,8 +474,14 @@ void *gem_mmap__cpu(int fd, uint32_t handle, uint64_t 
> offset, uint64_t size, uns
>  void *__gem_mmap_offset__cpu(int fd, uint32_t handle, uint64_t offset,
>uint64_t size, unsigned prot)
>  {
> - return __gem_mmap_offset(fd, handle, offset, size, prot,
> + void *ptr;
> +
> + ptr = __gem_mmap_offset(fd, handle, offset, size, prot,
>I915_MMAP_OFFSET_WB);
> + if (!ptr)
> + ptr = __gem_mmap_offset__fixed(fd, handle, offset, size, prot);
> +
> + return ptr;

Imo there's some asymmetry here. If we are adding fixed mode to
mmap__device_coherent (in the previous patch) then we should also be adding
it to mmap__cpu_coherent (as before). Or, if we are adding fixed mode to
__gem_mmap_offset__cpu we should also be adding it to
__gem_mmap_offset__wc. Thanks.


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 1/7] lib/i915/gem_mman: add FIXED mmap mode

2021-07-28 Thread Dixit, Ashutosh
On Tue, 27 Jul 2021 23:08:40 -0700, Petri Latvala wrote:
>
> On Tue, Jul 27, 2021 at 07:01:24PM -0700, Dixit, Ashutosh wrote:
> > On Mon, 26 Jul 2021 05:03:04 -0700, Matthew Auld wrote:
> > >
> > > diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> > > index 4b4f2114..e2514f0c 100644
> > > --- a/lib/i915/gem_mman.c
> > > +++ b/lib/i915/gem_mman.c
> > > @@ -497,6 +497,43 @@ void *gem_mmap_offset__cpu(int fd, uint32_t handle, 
> > > uint64_t offset,
> > >   return ptr;
> > >  }
> > >
> > > +#define LOCAL_I915_MMAP_OFFSET_FIXED 4
> >
> > Cc: @Petri Latvala
> >
> > This use of LOCAL declarations is more related to the methodology we follow
> > in IGT rather than this patch. We have seen in the past that such LOCAL's
> > linger on in the code for months and years till someone decides to clean
> > them so the question is can we prevent these LOCAL's from getting
> > introduced in the first place.
> >
> > One reason for these is that we sync IGT headers with drm-next whereas IGT
> > is used to test drm-tip. So the delta between the two results in such
> > LOCAL's as in this case.
> >
> > My proposal is that even if we don't start sync'ing IGT headers with
> > drm-tip (instead of drm-next) we allow direct modification of the headers
> > when needed to avoid introducing such LOCAL's. So in the above case we
> > would add:
> >
> > #define I915_MMAP_OFFSET_FIXED 4
> >
> > to i915_drm.h as part of this patch and then just use
> > I915_MMAP_OFFSET_FIXED. If another sync happens from drm-next before this
> > #define has appeared there, the compile will break and whoever syncs will
> > need to add this again to i915_drm.h.
>
> I don't like that kind of a breakage at all. That enforces mandatory
> fixups to some poor developer working on unrelated code who doesn't
> necessarily know how to even fix it easily.
>
> Of course an argument can be made that it's an i915 token in an i915
> header so it will be the i915 people doing it, but for a general case
> that's going to cause more harm than it solves problems, I feel.

OK, let's not change anything with the headers we import for now.

> > What do people think about a scheme such as this? The other, perhaps
> > better, option of course is to sync the headers directly with drm-tip
> > (whenever and as often as needed). But the goal in both cases is to avoid
> > LOCAL's, or other things like #ifndef's distributed throughout multiple
> > source files which we also do in such cases. A centralized internal header
> > to contain such declarations might not be so bad. Thanks.
>
> A separate manually written header for new tokens that are not yet in
> drm-next might be the least bad of all options. Although now that I've
> said it, the perfect world would have new tokens done like this:
>
> #ifndef I915_MMAP_OFFSET_FIXED
> #define I915_MMAP_OFFSET_FIXED 4
> #else
> _Static_assert(I915_MMAP_OFFSET_FIXED == 4, "ABI broken, yikes");
> #endif
>
> In a different language wrapping all that in a
>
> MAYBE_DECLARE(I915_MMAP_OFFSET_FIXED, 4)
>
> might be easier but with C preprocessor it's a bit more... involved. A
> separate build-time script to generate that header maybe? Such a
> script could also just completely omit the definition if header copies
> already introduce the token.

IMO all this wouldn't do much more that what gcc already does. For example,
for this:

#define I915_MMAP_OFFSET_FIXED 4
#define I915_MMAP_OFFSET_FIXED 4

gcc silently ignores the second #define, whereas for:

#define I915_MMAP_OFFSET_FIXED 4
#define I915_MMAP_OFFSET_FIXED 5

gcc will warn that second I915_MMAP_OFFSET_FIXED is redefined.

And gcc will error out on things like struct redeclaration.

> Recap:
>
> 1) We have kernel headers copied into IGT to ensure it builds fine
> without latest-and-greatest headers installed on the system.
>
> 2) Copies are from drm-next to ensure the next person to copy the
> headers doesn't accidentally drop definitions that originate from a
> vendor-specific tree. (That same reason is also for why one shouldn't
> edit the headers manually)
>
> 3) To get to drm-next, the kernel code needs to be tested with IGT
> first, so we need new definitions to test that kernel code in some
> form.

I guess it is possible to test with "Test-with:" and merge the kernel
changes first and the IGT changes later with the correct headers but maybe
it's inconvenient? But don't we merge the kernel changes before IGT?

> 4) LOCAL_* definitions that are cleaned up later when actual
> definitions reach drm-next sounds good in theory but in practice the
> clea

Re: [Intel-gfx] [PATCH i-g-t] tests/i915: Skip gem_exec_fair on GuC based platforms

2021-10-13 Thread Dixit, Ashutosh
On Wed, 13 Oct 2021 15:43:17 -0700,  wrote:
>
> From: John Harrison 
>
> The gem_exec_fair test is specifically testing scheduler algorithm
> performance. However, GuC does not implement the same algorithm as
> execlist mode and this test is not applicable. So, until sw arch
> approves a new algorithm and it is implemented in GuC, stop running
> the test.
>
> Signed-off-by: John Harrison 
> ---
>  tests/i915/gem_exec_fair.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/tests/i915/gem_exec_fair.c b/tests/i915/gem_exec_fair.c
> index ef5a450f6..ca9c73c6e 100644
> --- a/tests/i915/gem_exec_fair.c
> +++ b/tests/i915/gem_exec_fair.c
> @@ -1314,6 +1314,12 @@ igt_main
>   igt_require(gem_scheduler_enabled(i915));
>   igt_require(gem_scheduler_has_ctx_priority(i915));
>
> + /*
> +  * These tests are for a specific scheduling model which is
> +  * not currently implemented by GuC. So skip on GuC platforms.
> +  */
> + igt_require(intel_gen(intel_get_drm_devid(i915)) < 12);

Probably a feature check rather than a version check is better? Can we use
say gem_has_guc_submission() instead?

Though appears gem_has_guc_submission() only checks if guc submission is
available, not if it is actually in use (unless guc will used when
available automatically)? Is it possible to add the check if guc submission
is actually in use? Or a check for guc scheduler?

> +
>   cfg = intel_ctx_cfg_all_physical(i915);
>
>   igt_info("CS timestamp frequency: %d\n",
> --
> 2.25.1
>


Re: [Intel-gfx] [PATCH i-g-t] tests/i915: Skip gem_exec_fair on GuC based platforms

2021-10-13 Thread Dixit, Ashutosh
On Wed, 13 Oct 2021 18:07:05 -0700, John Harrison wrote:
>
> On 10/13/2021 15:53, Dixit, Ashutosh wrote:
> >> diff --git a/tests/i915/gem_exec_fair.c b/tests/i915/gem_exec_fair.c
> >> index ef5a450f6..ca9c73c6e 100644
> >> --- a/tests/i915/gem_exec_fair.c
> >> +++ b/tests/i915/gem_exec_fair.c
> >> @@ -1314,6 +1314,12 @@ igt_main
> >>igt_require(gem_scheduler_enabled(i915));
> >>igt_require(gem_scheduler_has_ctx_priority(i915));
> >>
> >> +  /*
> >> +   * These tests are for a specific scheduling model which is
> >> +   * not currently implemented by GuC. So skip on GuC platforms.
> >> +   */
> >> +  igt_require(intel_gen(intel_get_drm_devid(i915)) < 12);
> > Probably a feature check rather than a version check is better? Can we use
> > say gem_has_guc_submission() instead?
> >
> > Though appears gem_has_guc_submission() only checks if guc submission is
> > available, not if it is actually in use (unless guc will used when
> > available automatically)? Is it possible to add the check if guc submission
> > is actually in use? Or a check for guc scheduler?
>
> I believe this has come up a few times before. My understanding is that no,
> there is no current official/safe way for userland to check if GuC
> submission is enabled (you can read some of the debugfs files and make an
> educated guess but that isn't exactly an official interface). And the
> answer was that it isn't worth adding a UAPI specifically for it. Not least
> because it would be a UAPI solely for use by IGT which is not allowed.

Hmm, so kernel will use GuC submission if bit 0 of enable_guc module param
is 1, correct? Which is what gem_has_guc_submission() checks, though I
guess we can also add a function gem_using_guc_submission() which is
basically an alias for gem_has_guc_submission(). So we can't do this? Or
the module param is not an acceptable uapi? But we already introduced
gem_has_guc_submission()?

I think this kind of a generation/version check should be implemented in
the kernel. If kernel wants to turn on GuC submission by default let it do
that and set enable_guc. In IGT we just check enable_guc. No? Thanks.


Re: [Intel-gfx] [PATCH 1/2] drm/i915/dmabuf: fix broken build

2021-10-21 Thread Dixit, Ashutosh
On Thu, 21 Oct 2021 05:53:31 -0700, Matthew Auld wrote:
>
> wbinvd_on_all_cpus() is only defined on x86 it seems, plus we need to
> include asm/smp.h here.

Reviewed-by: Ashutosh Dixit 

> Reported-by: kernel test robot 
> Signed-off-by: Matthew Auld 
> Cc: Thomas Hellström 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index 1adcd8e02d29..a45d0ec2c5b6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -12,6 +12,13 @@
>  #include "i915_gem_object.h"
>  #include "i915_scatterlist.h"
>
> +#if defined(CONFIG_X86)
> +#include 
> +#else
> +#define wbinvd_on_all_cpus() \
> + pr_warn(DRIVER_NAME ": Missing cache flush in %s\n", __func__)
> +#endif
> +
>  I915_SELFTEST_DECLARE(static bool force_different_devices;)
>
>  static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf)
> --
> 2.26.3
>


Re: [Intel-gfx] [PATCH v3 i-g-t] tests/i915: Skip gem_exec_fair on GuC based platforms

2021-10-15 Thread Dixit, Ashutosh
On Fri, 15 Oct 2021 14:54:34 -0700,  wrote:
>
> diff --git a/tests/i915/gem_exec_fair.c b/tests/i915/gem_exec_fair.c
> index ef5a450f6..5cbb48f5a 100644
> --- a/tests/i915/gem_exec_fair.c
> +++ b/tests/i915/gem_exec_fair.c
> @@ -1314,6 +1314,12 @@ igt_main
>   igt_require(gem_scheduler_enabled(i915));
>   igt_require(gem_scheduler_has_ctx_priority(i915));
>
> + /*
> +  * These tests are for a specific scheduling model which is
> +  * not currently implemented by GuC. So skip on GuC platforms.
> +  */
> + igt_require(!gem_has_guc_submission(i915));

Reviewed-by: Ashutosh Dixit 


Re: [Intel-gfx] [igt-dev] [PATCH v2 i-g-t] tests/i915: Skip gem_exec_fair on GuC based platforms

2021-10-15 Thread Dixit, Ashutosh
On Fri, 15 Oct 2021 14:46:20 -0700, John Harrison wrote:
>
> On 10/15/2021 07:52, Dixit, Ashutosh wrote:
> > On Thu, 14 Oct 2021 12:42:38 -0700,  wrote:
> >> +  /*
> >> +   * These tests are for a specific scheduling model which is
> >> +   * not currently implemented by GuC. So skip on GuC platforms.
> >> +   */
> >> +  devid = intel_get_drm_devid(i915);
> >> +  igt_require((intel_gen(devid) < 12) || IS_TIGERLAKE(devid) ||
> >> +  IS_ROCKETLAKE(devid) || IS_ALDERLAKE_S(devid));
> > As I hinted on v1 let's just do this here:
> >
> > igt_require(gem_has_guc_submission(i915));
> >
> > So that we can can have a single unified way of detecting if GuC is being
> > used throughout IGT. Today it is gem_has_guc_submission() and it works with
> > the current kernel.
>
> Earlier, you were saying that 'has' was only checking for capability not
> usage. Which would be pretty useless for this situation. Looking at the
> code, though it sort of does work. It checks the live value of the
> enable_guc module parameter. If that says that GuC submission is enabled
> then either we are using GuC submission or we have no engines (because a
> failure to start the submission backend is terminal, there is no fallback
> to execlist mode if GuC didn't work). So it can be used.
>
> I say sort of, though, because the code also sets 'has_execlists' when it
> sets 'has_guc'. Which means that the gem_has_execlists() test is not
> useable as an indication that the execlist back end is being used. So
> gem_has_execlists() and gem_has_guc_submission() are basically very
> non-orthogonal. One is a test of hardware presence irrespective of use, the
> other is a test of usage irrespective of presence. The comment in the code
> is 'query whether the driver is using execlists as a hardware submission
> method'. So it seems like that was the original intention. Whether it has
> been broken since or was just broken from the beginning is unclear.

I completely agree, there is all this confusion in the code, it needs to be
cleaned up. For now I thought it was better to use a centralized way to
detect GuC rather than have different ways in different files. Fortunately
gem_has_guc_submission() is doing that though gem_has_execlists() is
broken in detecting if execlists are being used.


Re: [Intel-gfx] [igt-dev] [PATCH v2 i-g-t] tests/i915: Skip gem_exec_fair on GuC based platforms

2021-10-15 Thread Dixit, Ashutosh
On Thu, 14 Oct 2021 12:42:38 -0700,  wrote:
>
> + /*
> +  * These tests are for a specific scheduling model which is
> +  * not currently implemented by GuC. So skip on GuC platforms.
> +  */
> + devid = intel_get_drm_devid(i915);
> + igt_require((intel_gen(devid) < 12) || IS_TIGERLAKE(devid) ||
> + IS_ROCKETLAKE(devid) || IS_ALDERLAKE_S(devid));

As I hinted on v1 let's just do this here:

igt_require(gem_has_guc_submission(i915));

So that we can can have a single unified way of detecting if GuC is being
used throughout IGT. Today it is gem_has_guc_submission() and it works with
the current kernel.


Re: [Intel-gfx] [PATCH i-g-t] intel-gpu-top: Add support for per client stats

2021-12-03 Thread Dixit, Ashutosh
On Fri, 03 Dec 2021 07:54:56 -0800, Tvrtko Ursulin wrote:
>
> From: Tvrtko Ursulin 
>
> Use the i915 exported data in /proc//fdinfo to show GPU utilization
> per DRM client.

Didn't we just remove it? Adding it back now? Sorry for the probably dumb
question :/


Re: [Intel-gfx] [PATCH] drm/i915/pmu: Fix wakeref leak in PMU busyness during reset

2021-12-06 Thread Dixit, Ashutosh
On Mon, 06 Dec 2021 16:45:42 -0800, Umesh Nerlige Ramappa wrote:
>
> GuC PMU busyness gets gt wakeref if awake, but fails to release the
> wakeref if a reset is in progress. Release the wakeref if it was
> acquried successfully.
>
> Signed-off-by: Umesh Nerlige Ramappa 
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 1f9d4fde421f..a243304a2db1 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1181,6 +1181,7 @@ static ktime_t guc_engine_busyness(struct 
> intel_engine_cs *engine, ktime_t *now)
>   struct intel_gt *gt = engine->gt;
>   struct intel_guc *guc = >uc.guc;
>   u64 total, gt_stamp_saved;
> + intel_wakeref_t wakeref;

Should be bool.

>   unsigned long flags;
>   u32 reset_count;
>   bool in_reset;
> @@ -1206,18 +1207,21 @@ static ktime_t guc_engine_busyness(struct 
> intel_engine_cs *engine, ktime_t *now)
>* start_gt_clk is derived from GuC state. To get a consistent
>* view of activity, we query the GuC state only if gt is awake.
>*/
> - if (intel_gt_pm_get_if_awake(gt) && !in_reset) {

Also how about just switch them around:

if (!in_reset && intel_gt_pm_get_if_awake(gt))

Then you don't need any other changes I think.

> + wakeref = intel_gt_pm_get_if_awake(gt);
> + if (wakeref && !in_reset) {
>   stats_saved = *stats;
>   gt_stamp_saved = guc->timestamp.gt_stamp;
>   guc_update_engine_gt_clks(engine);
>   guc_update_pm_timestamp(guc, engine, now);
> - intel_gt_pm_put_async(gt);
>   if (i915_reset_count(gpu_error) != reset_count) {
>   *stats = stats_saved;
>   guc->timestamp.gt_stamp = gt_stamp_saved;
>   }
>   }
>
> + if (wakeref)
> + intel_gt_pm_put_async(gt);
> +
>   total = intel_gt_clock_interval_to_ns(gt, stats->total_gt_clks);
>   if (stats->running) {
>   u64 clk = guc->timestamp.gt_stamp - stats->start_gt_clk;
> --
> 2.20.1
>


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t] tests/i915/userptr: fix mapping type

2022-01-05 Thread Dixit, Ashutosh
On Wed, 05 Jan 2022 09:21:06 -0800, Matthew Auld wrote:
>
> We need to use the FIXED mapping type on discrete platforms.

Reviewed-by: Ashutosh Dixit 

> Signed-off-by: Matthew Auld 
> Cc: Thomas Hellström 
> Cc: Priyanka Dandamudi 
> ---
>  tests/i915/gem_userptr_blits.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tests/i915/gem_userptr_blits.c b/tests/i915/gem_userptr_blits.c
> index 3464db66..a4dca4c0 100644
> --- a/tests/i915/gem_userptr_blits.c
> +++ b/tests/i915/gem_userptr_blits.c
> @@ -2185,7 +2185,10 @@ static void test_probe(int fd)
>*/
>   memset(_offset, 0, sizeof(mmap_offset));
>   mmap_offset.handle = gem_create(fd, PAGE_SIZE);
> - mmap_offset.flags = I915_MMAP_OFFSET_WB;
> + if (gem_has_lmem(fd))
> + mmap_offset.flags = I915_MMAP_OFFSET_FIXED;
> + else
> + mmap_offset.flags = I915_MMAP_OFFSET_WB;
>   igt_assert_eq(igt_ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, 
> _offset), 0);
>
>   for (unsigned long pass = 0; pass < 4 * 4 * 4 * 4 * 4; pass++) {
> --
> 2.31.1
>


Re: [Intel-gfx] [PATCH] drm/i915/guc/slpc: Check GuC status before freq boost

2021-11-12 Thread Dixit, Ashutosh
On Thu, 11 Nov 2021 23:10:16 -0800, Vinay Belgaumkar wrote:
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> index 4e1d3cd29164..22c1c12369f2 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> @@ -183,11 +183,15 @@ static int slpc_unset_param(struct intel_guc_slpc *slpc,
>  static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
>  {
>   struct drm_i915_private *i915 = slpc_to_i915(slpc);
> + struct intel_guc *guc = slpc_to_guc(slpc);
>   intel_wakeref_t wakeref;
>   int ret = 0;
>
>   lockdep_assert_held(>lock);
>
> + if (!intel_guc_is_ready(guc))
> + return -ENODEV;
> +

Reviewed-by: Ashutosh Dixit 

The test wedges/resets the GPU after a request is queued but before it is
retired.



Re: [Intel-gfx] [PATCH 2/3] drm/i915/guc/slpc: Add waitboost functionality for SLPC

2021-11-01 Thread Dixit, Ashutosh
On Sun, 31 Oct 2021 21:39:36 -0700, Belgaumkar, Vinay wrote:
>
> @@ -945,6 +960,17 @@ void intel_rps_boost(struct i915_request *rq)
>   if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, >fence.flags)) {
>   struct intel_rps *rps = _ONCE(rq->engine)->gt->rps;
>
> + if (rps_uses_slpc(rps)) {
> + slpc = rps_to_slpc(rps);
> +
> + /* Return if old value is non zero */
> + if (atomic_fetch_inc(>num_waiters))
> + return;
> +
> + if (intel_rps_get_requested_frequency(rps) < 
> slpc->boost_freq)

I think this check is not needed because:

a. The waitboost code only changes min_freq. i915 code should not depend on
   how GuC changes requested_freq in response to change in min_freq.

b. What is more worrisome is that when we "de-boost" we set min_freq to
   min_freq_softlimit. If GuC e.g. has a delay in bringing requested_freq
   down and intel_rps_boost() gets called meanwhile we will miss the one
   opportunity we have to boost the freq (when num_waiters goes from 0 to
   1. Asking GuC to boost when actual_freq is already boost_freq is
   harmless in comparison). So to avoid this risk of missing the chance to
   boost I think we should delete this check and replace the code above
   with something like:

if (rps_uses_slpc(rps)) {
struct intel_guc_slpc *slpc = rps_to_slpc(rps);

if (slpc->boost_freq <= slpc->min_freq_softlimit)
return;

if (!atomic_fetch_inc(>num_waiters))
schedule_work(>boost_work);

return;
}

Note that this check:

if (slpc->boost_freq <= slpc->min_freq_softlimit)
return;

(which is basically a degenerate case in which we don't have to do
anything), can be probably be implemented when boost_freq is set in sysfs,
or may already be encompassed in "val < slpc->min_freq" in
intel_guc_slpc_set_boost_freq() in which case this check can also be
skipped from this function.

> +void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc)
> +{
> + /* Return min back to the softlimit.
> +  * This is called during request retire,
> +  * so we don't need to fail that if the
> +  * set_param fails.
> +  */

nit: maybe follow kernel multi-line comment format.


Re: [Intel-gfx] [PATCH v2 0/3] drm/i915/guc/slpc: Implement waitboost for SLPC

2021-11-01 Thread Dixit, Ashutosh
On Sun, 31 Oct 2021 21:39:34 -0700, Belgaumkar, Vinay wrote:
>
> Waitboost is a legacy feature implemented in the Host Turbo algorithm. This
> patch set implements it for the SLPC path. A "boost" happens when user
> calls gem_wait ioctl on a submission that has not landed on HW yet.

Afaiu user doesn't have to call gem_wait, the boost will happen whenever a
request waits to be submitted to GuC because of an unmet depedency. This
has to be done from i915 because GuC has not yet seen the request.

Rest of the cover letter is fine.


Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc/slpc: Define and initialize boost frequency

2021-11-01 Thread Dixit, Ashutosh
On Sun, 31 Oct 2021 21:39:35 -0700, Belgaumkar, Vinay wrote:
>
> Define helpers and struct members required to record boost info.
> Boost frequency is initialized to RP0 at SLPC init. Also define num_waiters
> which can track the pending boost requests.
>
> Boost will be done by scheduling a worker thread. This will allow
> us to make H2G calls inside an interrupt context. Initialize the

"to not make H2G calls from interrupt context" is probably better.

> +static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
> +{
> + struct drm_i915_private *i915 = slpc_to_i915(slpc);
> + intel_wakeref_t wakeref;
> + int ret = 0;
> +
> + lockdep_assert_held(>lock);
> +
> + /**

nit: this I believe should just be

/*

/** I believe shows up in kerneldoc so shouldn't be used unless we want
something in kerneldoc.

> +  * This function is a little different as compared to
> +  * intel_guc_slpc_set_min_freq(). Softlimit will not be updated
> +  * here since this is used to temporarily change min freq,
> +  * for example, during a waitboost. Caller is responsible for
> +  * checking bounds.
> +  */
> +
> + with_intel_runtime_pm(>runtime_pm, wakeref) {
> + ret = slpc_set_param(slpc,
> +  SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
> +  freq);
> + if (ret)
> + drm_err(>drm, "Unable to force min freq to %u: 
> %d",

Probably drm_err_ratelimited since it's called at run time not only at
init? Not sure if drm_err_once suffizes, probably not.

> + freq, ret);
> + }
> +
> + return ret;
> +}
> +
> +static void slpc_boost_work(struct work_struct *work)
> +{
> + struct intel_guc_slpc *slpc = container_of(work, typeof(*slpc), 
> boost_work);
> +
> + /* Raise min freq to boost. It's possible that
> +  * this is greater than current max. But it will
> +  * certainly be limited by RP0. An error setting
> +  * the min param is not fatal.
> +  */

nit: do we follow the following format for multi-line comments,
Documentation/process/coding-style.rst mentions this:

/*
 * Line 1
 * Line 2
 */


Re: [Intel-gfx] [PATCH 3/3] drm/i915/guc/slpc: Update boost sysfs hooks for SLPC

2021-11-01 Thread Dixit, Ashutosh
On Sun, 31 Oct 2021 21:39:37 -0700, Belgaumkar, Vinay wrote:
>
> +static int set_boost_freq(struct intel_rps *rps, u32 val)

Since this is legacy rps code path maybe change function name to 
rps_set_boost_freq?


Re: [Intel-gfx] [PATCH v3 0/3] drm/i915/guc/slpc: Implement waitboost for SLPC

2021-11-01 Thread Dixit, Ashutosh
On Mon, 01 Nov 2021 18:26:05 -0700, Vinay Belgaumkar wrote:
>
> Waitboost is a legacy feature implemented in the Host Turbo algorithm. This
> patch set implements it for the SLPC path. A boost can happen when a request
> is waiting for an unmet dependency. GT frequency gets temporarily bumped to
> boost freq to allow the previous request to finish quickly. We achieve this
> on SLPC by setting the min frequency, SLPC will set that as the requested
> frequency.
>
> The boost will occur through a worker thread that will be scheduled
> when the required conditions are met.
>
> Like before, boost frequency is configurable through sysfs, so we can
> adjust it to any specific value as long as it is between [min, RP0].
>
> v2: Add a worker thread to perform freq boost.
> v3: Address comments (Ashutosh)

For the series:

Reviewed-by: Ashutosh Dixit 

> Cc: Ashutosh Dixit 
> Signed-off-by: Vinay Belgaumkar 
>
> Vinay Belgaumkar (3):
>   drm/i915/guc/slpc: Define and initialize boost frequency
>   drm/i915/guc/slpc: Add waitboost functionality for SLPC
>   drm/i915/guc/slpc: Update boost sysfs hooks for SLPC
>
>  drivers/gpu/drm/i915/gt/intel_rps.c   |  72 +
>  drivers/gpu/drm/i915/gt/intel_rps.h   |   3 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 151 +++---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |   3 +
>  .../gpu/drm/i915/gt/uc/intel_guc_slpc_types.h |  13 ++
>  drivers/gpu/drm/i915/i915_request.c   |   2 +-
>  drivers/gpu/drm/i915/i915_sysfs.c |  19 +--
>  7 files changed, 223 insertions(+), 40 deletions(-)
>
> --
> 2.25.0
>


Re: [Intel-gfx] [PATCH 3/3] drm/i915/guc/slpc: Update boost sysfs hooks for SLPC

2021-11-03 Thread Dixit, Ashutosh
On Mon, 01 Nov 2021 13:28:14 -0700, Dixit, Ashutosh wrote:
>
> On Sun, 31 Oct 2021 21:39:37 -0700, Belgaumkar, Vinay wrote:
> >
> > +static int set_boost_freq(struct intel_rps *rps, u32 val)
>
> Since this is legacy rps code path maybe change function name to
> rps_set_boost_freq?

Not being able to find v3 of this patch so giving a R-b on v2 but the R-b
applies to v3:

Reviewed-by: Ashutosh Dixit 


Re: [Intel-gfx] [PATCH 3/3] drm/i915/guc/slpc: Update boost sysfs hooks for SLPC

2021-11-03 Thread Dixit, Ashutosh
On Mon, 01 Nov 2021 18:26:08 -0700, Vinay Belgaumkar wrote:
>
> Add a helper to sort through the SLPC/RPS paths of get/set methods.
> Boost frequency will be modified as long as it is within the constraints
> of RP0 and if it is different from the existing one. We will set min
> freq to boost only if there is at least one active waiter.
>
> v2: Add num_boosts to guc_slpc_info and changes for worker function
> v3: Review comments (Ashutosh)

Reviewed-by: Ashutosh Dixit 


Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc/slpc: Define and initialize boost frequency

2021-11-03 Thread Dixit, Ashutosh
On Mon, 01 Nov 2021 18:26:06 -0700, Vinay Belgaumkar wrote:
>
> Define helpers and struct members required to record boost info.
> Boost frequency is initialized to RP0 at SLPC init. Also define num_waiters
> which can track the pending boost requests.
>
> Boost will be done by scheduling a worker thread. This will avoid
> the need to make H2G calls inside an interrupt context. Initialize the
> worker function during SLPC init as well. Had to move intel_guc_slpc_init
> a few lines below to accomodate this.
>
> v2: Add a workqueue to handle waitboost
> v3: Code review comments (Ashutosh)

Reviewed-by: Ashutosh Dixit 


Re: [Intel-gfx] [PATCH 2/3] drm/i915/guc/slpc: Add waitboost functionality for SLPC

2021-11-03 Thread Dixit, Ashutosh
On Mon, 01 Nov 2021 18:26:07 -0700, Vinay Belgaumkar wrote:
>
> Add helper in RPS code for handling SLPC and non-SLPC paths.
> When boost is requested in the SLPC path, we can ask GuC to ramp
> up the frequency req by setting the minimum frequency to boost freq.
> Reset freq back to the min softlimit when there are no more waiters.
>
> v2: Schedule a worker thread which can boost freq from within
> an interrupt context as well.
>
> v3: No need to check against requested freq before scheduling boost
> work (Ashutosh)

Reviewed-by: Ashutosh Dixit 


Re: [Intel-gfx] [PATCH] drm/i915/selftest: Disable IRQ for timestamp calculation

2021-12-01 Thread Dixit, Ashutosh
On Tue, 30 Nov 2021 05:20:05 -0800, Anshuman Gupta wrote:
>
> gt_pm selftest calculates engine ticks cycles and wall time
> cycles by delta of respective engine elapsed TIMESTAMP and ktime
> for period of 1000us.
> It compares the engine ticks cycles with wall time cycles.
>
> Disable local cpu interrupt so that interrupt handler does not
> switch out the thread during measure_clocks() and prevent
> miscalculation of engine tick cycles.

Reviewed-by: Ashutosh Dixit 

> v2:
> - nuke preempt_{disable,enable}, as disable_local_irq()
>   disable the preemption. (Chris)
>
> Cc: Chris P Wilson 
> Cc: Badal Nilawar 
> Cc: Ashutosh Dixit 
> Signed-off-by: Anshuman Gupta 
> ---
>  drivers/gpu/drm/i915/gt/selftest_gt_pm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_gt_pm.c 
> b/drivers/gpu/drm/i915/gt/selftest_gt_pm.c
> index b9441217ca3d..55c5cdb99f45 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_gt_pm.c
> @@ -43,7 +43,7 @@ static void measure_clocks(struct intel_engine_cs *engine,
>   int i;
>
>   for (i = 0; i < 5; i++) {
> - preempt_disable();
> + local_irq_disable();
>   cycles[i] = -ENGINE_READ_FW(engine, RING_TIMESTAMP);
>   dt[i] = ktime_get();
>
> @@ -51,7 +51,7 @@ static void measure_clocks(struct intel_engine_cs *engine,
>
>   dt[i] = ktime_sub(ktime_get(), dt[i]);
>   cycles[i] += ENGINE_READ_FW(engine, RING_TIMESTAMP);
> - preempt_enable();
> + local_irq_enable();
>   }
>
>   /* Use the median of both cycle/dt; close enough */
> --
> 2.26.2
>


Re: [Intel-gfx] [PATCH i-g-t 5/7] lib/ioctl_wrappers: update mmap_{read, write} for discrete

2021-07-27 Thread Dixit, Ashutosh
On Mon, 26 Jul 2021 05:03:08 -0700, Matthew Auld wrote:
>
> We can no longer just call get_caching or set_domain, and the mmap mode
> must be FIXED. This should bring back gem_exec_basic and a few others in
> CI on DG1.

We should probably also similarly update mmap_{read, write} in
lib/intel_bufops.c.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 1/7] lib/i915/gem_mman: add FIXED mmap mode

2021-07-27 Thread Dixit, Ashutosh
On Mon, 26 Jul 2021 05:03:04 -0700, Matthew Auld wrote:
>
> diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> index 4b4f2114..e2514f0c 100644
> --- a/lib/i915/gem_mman.c
> +++ b/lib/i915/gem_mman.c
> @@ -497,6 +497,43 @@ void *gem_mmap_offset__cpu(int fd, uint32_t handle, 
> uint64_t offset,
>   return ptr;
>  }
>
> +#define LOCAL_I915_MMAP_OFFSET_FIXED 4

Cc: @Petri Latvala

This use of LOCAL declarations is more related to the methodology we follow
in IGT rather than this patch. We have seen in the past that such LOCAL's
linger on in the code for months and years till someone decides to clean
them so the question is can we prevent these LOCAL's from getting
introduced in the first place.

One reason for these is that we sync IGT headers with drm-next whereas IGT
is used to test drm-tip. So the delta between the two results in such
LOCAL's as in this case.

My proposal is that even if we don't start sync'ing IGT headers with
drm-tip (instead of drm-next) we allow direct modification of the headers
when needed to avoid introducing such LOCAL's. So in the above case we
would add:

#define I915_MMAP_OFFSET_FIXED 4

to i915_drm.h as part of this patch and then just use
I915_MMAP_OFFSET_FIXED. If another sync happens from drm-next before this
#define has appeared there, the compile will break and whoever syncs will
need to add this again to i915_drm.h.

What do people think about a scheme such as this? The other, perhaps
better, option of course is to sync the headers directly with drm-tip
(whenever and as often as needed). But the goal in both cases is to avoid
LOCAL's, or other things like #ifndef's distributed throughout multiple
source files which we also do in such cases. A centralized internal header
to contain such declarations might not be so bad. Thanks.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 1/7] lib/i915/gem_mman: add FIXED mmap mode

2021-07-27 Thread Dixit, Ashutosh
On Tue, 27 Jul 2021 19:01:24 -0700, Dixit, Ashutosh wrote:
>
> On Mon, 26 Jul 2021 05:03:04 -0700, Matthew Auld wrote:
> >
> > diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c
> > index 4b4f2114..e2514f0c 100644
> > --- a/lib/i915/gem_mman.c
> > +++ b/lib/i915/gem_mman.c
> > @@ -497,6 +497,43 @@ void *gem_mmap_offset__cpu(int fd, uint32_t handle, 
> > uint64_t offset,
> > return ptr;
> >  }
> >
> > +#define LOCAL_I915_MMAP_OFFSET_FIXED 4
>
> Cc: @Petri Latvala
>
> This use of LOCAL declarations is more related to the methodology we follow
> in IGT rather than this patch. We have seen in the past that such LOCAL's
> linger on in the code for months and years till someone decides to clean
> them so the question is can we prevent these LOCAL's from getting
> introduced in the first place.
>
> One reason for these is that we sync IGT headers with drm-next whereas IGT
> is used to test drm-tip. So the delta between the two results in such
> LOCAL's as in this case.
>
> My proposal is that even if we don't start sync'ing IGT headers with
> drm-tip (instead of drm-next) we allow direct modification of the headers
> when needed to avoid introducing such LOCAL's. So in the above case we
> would add:
>
> #define I915_MMAP_OFFSET_FIXED 4
>
> to i915_drm.h as part of this patch and then just use
> I915_MMAP_OFFSET_FIXED. If another sync happens from drm-next before this
> #define has appeared there, the compile will break and whoever syncs will
> need to add this again to i915_drm.h.
>
> What do people think about a scheme such as this? The other, perhaps
> better, option of course is to sync the headers directly with drm-tip
> (whenever and as often as needed). But the goal in both cases is to avoid
> LOCAL's, or other things like #ifndef's distributed throughout multiple
> source files which we also do in such cases. A centralized internal header
> to contain such declarations might not be so bad. Thanks.

I have submitted a patch based on this last statement here:

https://patchwork.freedesktop.org/series/93096/
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/uapi: Add DRM_I915_QUERY_GEOMETRY_SUBSLICES

2022-03-14 Thread Dixit, Ashutosh
On Mon, 14 Mar 2022 08:35:17 -0700, Tvrtko Ursulin wrote:
>
> >> Alternatively, all other uapi uses struct i915_engine_class_instance to
> >> address engines which uses u16:u16.
> >>
> >> How ugly it is to stuff a struct into u32 flags is the question... But you
> >> could at least use u16:u16 for consistency. Unless you wanted to leave some
> >> bits free for the future?
> > Originally when I wrote this I was wanting to leave space in case it was
> > ever needed. I'm not particularly for or against keeping the space now.
>
> Yes, shrug... Neither I can't guess if we are ever likely to hit a problem
> by having fewer bits for class:instance here compared to other uapi, or if
> stuffing struct i915_engine_class_instance into flags would just be too
> ugly. I mean there is option to define a new struct and not use flags at
> all but that's probably to complicated for what it is.
>
> Anyone else with an opinion? Consistency or should be fine even like it is?

Consistency. I'd prefer to stuff struct i915_engine_class_instance into
flags, fwiw.


Re: [Intel-gfx] [PATCH topic/core-for-CI] drm/i915/dg2: Add relocation exception

2022-02-18 Thread Dixit, Ashutosh
On Fri, 18 Feb 2022 14:38:53 -0800, Lucas De Marchi wrote:
>
> The move to softpin in igt is ongoing and should land soon.
> Meanwhile, like was done for ADL and RKL, add an exception to allow
> running the igt display tests before that conversion is complete
> so we can unblock CI.

One example failure we see on DG2 if we don't do this (reported by Lucas):

IGT-Version: 1.26-g9cb64a75 (x86_64) (Linux: 5.17.0-rc4-demarchi+ x86_64)
(testdisplay:10068) ioctl_wrappers-CRITICAL: Test assertion failure function 
gem_execbuf, file ../lib/ioctl_wrappers.c:674:
(testdisplay:10068) ioctl_wrappers-CRITICAL: Failed assertion: 
__gem_execbuf(fd, execbuf) == 0
(testdisplay:10068) ioctl_wrappers-CRITICAL: error: -22 != 0
Stack trace:
#0 ../lib/igt_core.c:1754 __igt_fail_assert()
#1 [gem_execbuf+0x48]
#2 ../lib/intel_batchbuffer.c:1053 igt_blitter_fast_copy__raw()
#3 ../lib/igt_fb.c:2497 blitcopy()
#4 ../lib/igt_fb.c:2646 setup_linear_mapping()
#5 ../lib/igt_fb.c:2671 create_cairo_surface__gpu()
#6 ../lib/igt_fb.c:3959 igt_get_cairo_surface()
#7 ../lib/igt_fb.c:3987 igt_get_cairo_ctx()
#8 ../lib/igt_fb.c:1980 igt_create_pattern_fb()
#9 ../tests/testdisplay.c:271 set_mode()
#10 ../tests/testdisplay.c:511 update_display()
#11 ../tests/testdisplay.c:763 main()
#12 ../csu/libc-start.c:342 __libc_start_main()
#13 [_start+0x2e]
Test testdisplay failed


Re: [Intel-gfx] [PATCH i-g-t] tests/i915/gem_exec_params: check available memory earlier

2022-03-01 Thread Dixit, Ashutosh
On Tue, 01 Mar 2022 03:03:59 -0800, Matthew Auld wrote:
>
> The shmem mmap and pwrite interfaces conveniently let us probe just a
> few pages, without needing to populate the entire object. On discrete
> and newer platforms the kernel has dropped support for both, leaving us
> with MMAP_OFFSET, which will always populate the entire object, for now
> at least. Luckily we can just move the batch creation to after checking
> the available memory to ensure we don't hit -ENOMEM on such platforms.
>
> Also it seems that doing a massive allocation(filling much of system
> memory) and then calling intel_purge_vm_caches() seems to take 40+
> seconds, like when calling intel_require_memory(). Hence switching the
> ordering here should also help with that.
>
> For reference the larger-than-life test is just a simple regression test
> to ensure that some very large batch buffer(greater than ~4G) can't
> overflow the batch_len, causing all kinds of issues. See 57b2d834bf23
> ("drm/i915/gem: Support parsing of oversize batches").

I believe we tried this identical patch some time but don't remember why it
was never merged. Maybe it was not helping or more likely we weren't sure
it wouldn't break the test somehow (considering all the shmfs/swap
stuff). But if we think it's ok to merge this and may actually resolve the
-ENOMEM issue, this is:

Reviewed-by: Ashutosh Dixit 


Re: [Intel-gfx] [PATCH] drm/i915/rps: Centralize computation of freq caps

2022-03-22 Thread Dixit, Ashutosh
On Mon, 21 Mar 2022 11:17:46 -0700, Lucas De Marchi wrote:
>
> On Mon, Mar 21, 2022 at 10:56:04AM -0700, Ashutosh Dixit wrote:
> > diff --git a/drivers/gpu/drm/i915/gt/intel_rps_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_rps_types.h
> > index 3941d8551f52..5990df35b393 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_rps_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_rps_types.h
> > @@ -37,6 +37,13 @@ enum {
> > INTEL_RPS_TIMER,
> > };
> >
> > +/* Freq caps exposed by HW, in units of 16.67 MHz for recent gens */
>
> "recent gens" is probably too broad. Since we are exporting this struct
> to other parts of the driver and we are already abstracting the
> register location and bit position, maybe we should also already
> abstract the unit in the same place? i.e. just make it always be
> "units of 16.67 MHz", or even just make it a standard Hz unit.
>
> If this would rather make things more complicated for places that expect
> "hw units", then maybe just say in this comment the value is in "HW
> units" and intel_gpu_freq() should be used to convert to hz.

Fixed in v2. In v2 I've changed the comment to say values are in "hw units"
and intel_gpu_freq() should be used to convert to MHz.

I have also added a couple of hopefully clarifying comments to
intel_rps_get_freq_caps() in v2. Some of the history of this code was not
clear to me previously and I had to trace all the way back to cee991cb9323
to figure out what is happening.

Thanks.
--
Ashutosh

> > +struct intel_rps_freq_caps {
> > +   u8 rp0_freq;/* Non-overclocked max frequency. */
> > +   u8 rp1_freq;/* "less than" RP0 power/freqency */
> > +   u8 min_freq;/* AKA RPn. Minimum frequency */
> > +};
> > +
> > struct intel_rps {
> > struct mutex lock; /* protects enabling and the worker */
> >


Re: [Intel-gfx] [PATCH i-g-t 1/4] test/gem_lmem_swapping: account for object rounding

2022-03-24 Thread Dixit, Ashutosh
On Thu, 24 Mar 2022 07:26:18 -0700, Matthew Auld wrote:
>
> On DG2 the object size might be rounded when allocating lmem. Make sure
> we account for any rounding up.

Reviewed-by: Ashutosh Dixit 


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 3/4] tests/gem_lmem_swapping: limit lmem to 4G

2022-03-24 Thread Dixit, Ashutosh
On Thu, 24 Mar 2022 07:26:20 -0700, Matthew Auld wrote:
>
> @@ -554,6 +560,7 @@ igt_main_args("", long_options, help_str, opt_handler, 
> NULL)
>   igt_fixture {
>   free(regions);
>   close(i915);
> + igt_i915_driver_unload();

I thought we'd reload the module with default params here but when the next
test runs the module gets loaded automatically so maybe this is ok?


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 2/4] test/gem_lmem_swapping: fix physical engine usage

2022-03-24 Thread Dixit, Ashutosh
On Thu, 24 Mar 2022 07:26:19 -0700, Matthew Auld wrote:
>
> @@ -353,14 +356,17 @@ static void test_evict(int i915,
>   if (flags & TEST_PARALLEL) {
>   int fd = gem_reopen_driver(i915);
>
> + ctx = intel_ctx_create_all_physical(fd);
> + __gem_context_set_persistence(i915, ctx->id, false);
> +
>   igt_fork(child, nproc)
> - __do_evict(fd, >region, ,
> + __do_evict(fd, ctx, >region, ,
>  params.seed + child + 1);
>
>   igt_waitchildren();
>   close(fd);

We might introduce some memory leak tests some time so it would be good to
destroy any contexts we create with intel_ctx_destroy().

> @@ -528,17 +536,20 @@ igt_main_args("", long_options, help_str, opt_handler, 
> NULL)
>   for_each_physical_engine(i915, e)
>   __num_engines__++;
>   igt_require(__num_engines__);
> + ctx = intel_ctx_create_all_physical(i915);
> + __gem_context_set_persistence(i915, ctx->id, false);
> +
>   }
>
>   for (test = tests; test->name; test++) {
>   igt_describe("Exercise local memory swapping to system memory");
>   dynamic_lmem_subtest(region, regions, test->name)
> - test_evict(i915, region, test->flags);
> + test_evict(i915, ctx, region, test->flags);
>   }
>
>   igt_describe("Exercise local memory swapping during exhausting system 
> memory");
>   dynamic_lmem_subtest(region, regions, "smem-oom")
> - test_smem_oom(i915, region);
> + test_smem_oom(i915, ctx, region);
>
>   igt_fixture {
>   free(regions);

Here too.


[Intel-gfx] drm-tip compile break

2022-03-30 Thread Dixit, Ashutosh
Is anyone looking into fixing this:

drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c: In function 
‘amdgpu_gtt_mgr_recover’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:200:31: error: ‘struct 
ttm_range_mgr_node’ has no member named ‘tbo’
   amdgpu_ttm_recover_gart(node->tbo);
   ^~
make[4]: *** [scripts/Makefile.build:288: 
drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.o] Error 1

Thanks.


Re: [Intel-gfx] [PATCH] drm/i915/debugfs: Dump i915 children runtime status

2022-03-29 Thread Dixit, Ashutosh
On Mon, 28 Mar 2022 03:22:27 -0700, Anshuman Gupta wrote:
>
> +#ifdef CONFIG_PM
> +static int i915_runtime_dump_child_status(struct device *dev, void *data)
> +{
> + struct seq_file *m = data;
> + const char *rpm_status;
> +
> + /* Early return if runtime_pm is disabled */
> + if (dev->power.disable_depth)
> + return 0;
> +
> + switch (dev->power.runtime_status) {
> + case RPM_SUSPENDED:
> + rpm_status = "suspended";
> + break;
> + case RPM_SUSPENDING:
> + rpm_status = "suspending";
> + break;
> + case RPM_RESUMING:
> + rpm_status = "resuming";
> + break;
> + case RPM_ACTIVE:
> + rpm_status = "active";
> + break;
> + default:
> + rpm_status = "unknown";
> + }
> +
> + seq_printf(m, "\t%s %s: Runtime status: %s\n", dev_driver_string(dev),
> +dev_name(dev), rpm_status);
> +
> + return 0;
> +}
> +#endif

Maybe a nit, but perhaps defining a const array is better than having a
switch statement? Similar to what is done in rtpm_status_str(). The
function itself is very similar to rtpm_status_str() so can probably
benefit from that similarity. Can perhaps even be nearly identical to
rtpm_status_str() (since that is static in the genpd (generic power domain)
code).

See also 2bd5306a8764 ("PM / Domains: add debugfs listing of struct
generic_pm_domain-s"), though I am not sure if genpd's are applicable in
our case and certainly look way out of scope for now. Thanks.

> +
>  static int i915_runtime_pm_status(struct seq_file *m, void *unused)
>  {
>   struct drm_i915_private *dev_priv = node_to_i915(m->private);
> @@ -500,6 +534,10 @@ static int i915_runtime_pm_status(struct seq_file *m, 
> void *unused)
>  #ifdef CONFIG_PM
>   seq_printf(m, "Usage count: %d\n",
>  atomic_read(_priv->drm.dev->power.usage_count));
> + seq_printf(m, "Runtime active children: %d\n",
> +atomic_read(_priv->drm.dev->power.child_count));
> + device_for_each_child(>dev, m, i915_runtime_dump_child_status);
> +
>  #else
>   seq_printf(m, "Device Power Management (CONFIG_PM) disabled\n");
>  #endif
> --
> 2.26.2
>


Re: [Intel-gfx] [PATCH] drm/i915/rps: Centralize computation of freq caps

2022-03-21 Thread Dixit, Ashutosh
On Mon, 21 Mar 2022 11:17:46 -0700, Lucas De Marchi wrote:
>
> On Mon, Mar 21, 2022 at 10:56:04AM -0700, Ashutosh Dixit wrote:
> > diff --git a/drivers/gpu/drm/i915/gt/intel_rps_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_rps_types.h
> > index 3941d8551f52..5990df35b393 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_rps_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_rps_types.h
> > @@ -37,6 +37,13 @@ enum {
> > INTEL_RPS_TIMER,
> > };
> >
> > +/* Freq caps exposed by HW, in units of 16.67 MHz for recent gens */
>
> "recent gens" is probably too broad. Since we are exporting this struct
> to other parts of the driver and we are already abstracting the
> register location and bit position, maybe we should also already
> abstract the unit in the same place? i.e. just make it always be
> "units of 16.67 MHz", or even just make it a standard Hz unit.
>
> If this would rather make things more complicated for places that expect
> "hw units", then maybe just say in this comment the value is in "HW
> units" and intel_gpu_freq() should be used to convert to hz.

OK, let me investigate some more, see what's possible and then get back
regarding this. Thanks for reviewing.


Re: [Intel-gfx] [PATCH] drm/i915/rps: Centralize computation of freq caps

2022-03-23 Thread Dixit, Ashutosh
On Wed, 23 Mar 2022 00:03:23 -0700, Nilawar, Badal wrote:
>
> > +/* "Caps" frequencies should be converted to MHz using intel_gpu_freq() */
> > +void intel_rps_get_freq_caps(struct intel_rps *rps, struct 
> > intel_rps_freq_caps *capSis)
>
> Since this function is covering gen6 and above it would be good to rename
> it as gen6_rps_get_freq_caps.

I've made this change in v4. Thanks.


Re: [Intel-gfx] [PATCH i-g-t] intel_gpu_top: Improve error message when insufficient privilege

2022-02-01 Thread Dixit, Ashutosh
On Tue, 01 Feb 2022 07:19:46 -0800, Tvrtko Ursulin wrote:
>
> From: Tvrtko Ursulin 
>
> Print out end user friendly help text when the running user has
> insufficient privilege for accessing system wide performance counters.

Reviewed-by: Ashutosh Dixit 

> Signed-off-by: Tvrtko Ursulin 
> Issue: https://gitlab.freedesktop.org/drm/intel/-/issues/5018
> ---
>  tools/intel_gpu_top.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
> index 81c724d1fe1c..0404a5881b40 100644
> --- a/tools/intel_gpu_top.c
> +++ b/tools/intel_gpu_top.c
> @@ -1761,6 +1761,15 @@ int main(int argc, char **argv)
>   if (ret) {
>   fprintf(stderr,
>   "Failed to initialize PMU! (%s)\n", strerror(errno));
> + if (errno == EACCES && geteuid())
> + fprintf(stderr,
> +"\n"
> +"When running as a normal user CAP_PERFMON is required to access 
> performance\n"
> +"monitoring. See \"man 7 capabilities\", \"man 8 setcap\", or contact your\n"
> +"distribution vendor for assistance.\n"
> +"\n"
> +"More information can be found at 'Perf events and tool security' 
> document:\n"
> +"https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html\n;);
>   ret = EXIT_FAILURE;
>   goto err;
>   }
> --
> 2.32.0
>


Re: [Intel-gfx] [PATCH i-g-t] tests/i915/gem_exec_capture: Fix memory object size in gem_exec_capture

2022-01-20 Thread Dixit, Ashutosh
On Thu, 20 Jan 2022 17:09:28 -0800, john.c.harri...@intel.com wrote:
>
> From: John Harrison 
>
> The capture tests require knowing exactly how big the test allocation
> is. Part of the test is to compare the captured size against the
> allocated size to make sure they match. That doesn't work if the
> allocator creates an object of a different size than was requested
> without reporting the larger size.

Reviewed-by: Ashutosh Dixit 

> Fixes: 85a593809 ("tests/i915/gem_exec_capture: Add support for local memory")
> Signed-off-by: John Harrison 
> ---
>  tests/i915/gem_exec_capture.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index 5b2482518..60f8df04c 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -387,10 +387,9 @@ static void capture(int fd, int dir, const intel_ctx_t 
> *ctx,
>   const struct intel_execution_engine2 *e, uint32_t region)
>  {
>   uint32_t handle;
> - uint64_t ahnd;
> - int obj_size = 4096;
> + uint64_t ahnd, obj_size = 4096;
>
> - handle = gem_create_in_memory_regions(fd, obj_size, region);
> + igt_assert_eq(__gem_create_in_memory_regions(fd, , _size, 
> region), 0);
>   ahnd = get_reloc_ahnd(fd, ctx->id);
>
>   __capture1(fd, dir, ahnd, ctx, e, handle, obj_size, region);
> --
> 2.25.1
>


Re: [Intel-gfx] [PATCH i-g-t] tests/i915/perf_pmu: Fix allow hang handling

2022-01-06 Thread Dixit, Ashutosh
On Thu, 06 Jan 2022 08:42:58 -0800, Tvrtko Ursulin wrote:
>
> From: Tvrtko Ursulin 
>
> Commit d7a74b959eea ("tests/i915/perf_pmu: Convert to intel_ctx_t (v3)")
> broke the test when it is run in its entirety.
>
> Correct context id needs to be used with igt_allow_hang to avoid context
> ban preventing the test to run to completion.

Reviewed-by: Ashutosh Dixit 

> Signed-off-by: Tvrtko Ursulin 
> Fixes: d7a74b959eea ("tests/i915/perf_pmu: Convert to intel_ctx_t (v3)")
> Cc: Ashutosh Dixit 
> Cc: Maarten Lankhorst 
> ---
> I haven't audited more tests apart from grep showing a lot of
> igt_allow_hang with the default context so someone needs to go through
> them and see if more are affected.

Thanks, will do.

> ---
>  tests/i915/perf_pmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/i915/perf_pmu.c b/tests/i915/perf_pmu.c
> index 581da8483f0c..1665dc5e6317 100644
> --- a/tests/i915/perf_pmu.c
> +++ b/tests/i915/perf_pmu.c
> @@ -2319,7 +2319,7 @@ igt_main
>   }
>
>   test_each_engine("busy-hang", fd, ctx, e) {
> - igt_hang_t hang = igt_allow_hang(fd, 0, 0);
> + igt_hang_t hang = igt_allow_hang(fd, ctx->id, 0);
>
>   single(fd, ctx, e, TEST_BUSY | FLAG_HANG);
>
> --
> 2.32.0
>


Re: [Intel-gfx] [PATCH] drm/i915/rps: Centralize computation of freq caps

2022-04-06 Thread Dixit, Ashutosh
On Wed, 06 Apr 2022 03:09:45 -0700, Anshuman Gupta wrote:
> On 2022-03-24 at 01:24:35 +0530, Ashutosh Dixit wrote:
> > +/* "Caps" frequencies should be converted to MHz using intel_gpu_freq() */
> IMHO, if this exported function deserves a comment, it should Kernel Doc 
> comment.
> for an example see Doc comment of intel_runtime_pm_get_raw().
> Thanks,
> Anshuman Gupta.
> > +void gen6_rps_get_freq_caps(struct intel_rps *rps, struct 
> > intel_rps_freq_caps *caps)

I have changed the comments to kernel doc for both gen6_rps_get_freq_caps()
and 'struct intel_rps_freq_caps' in v5 (v6 in Patchwork). Please take a
look. Thanks.


  1   2   3   4   5   6   >