Re: [Intel-gfx] [PATCH i-g-t v3 1/1] tests: Exercise remote request vs barrier handling race

2023-02-14 Thread Janusz Krzysztofik
Hi Kamil,

Thanks for review.

On Tuesday, 14 February 2023 22:20:10 CET Kamil Konieczny wrote:
> Hi Janusz,
> 
> On 2023-02-13 at 10:31:32 +0100, Janusz Krzysztofik wrote:
> > Users reported oopses on list corruptions when using i915 perf with a
> > number of concurrently running graphics applications.  That indicates we
> > are currently missing some important tests for such scenarios.  Cover
> > that gap.
> > 
> > Root cause analysis pointed out to an issue in barrier processing code and
> > its interaction with perf replacing kernel contexts' active barriers with
> > its own requests.
> > 
> > Add a new test intended for exercising intentionally racy barrier tasks
> > list processing and its interaction with other i915 subsystems.  As a
> > first subtest, add one that exercises the interaction of remote requests
> > with barrier tasks list handling, especially barrier preallocate / acquire
> > operations performed during context first pin / last unpin.
> > 
> > The code is partially inspired by Chris Wilson's igt@perf@open-race
> > subtest, which I was not able to get an Ack for from upstream.
> > 
> > v3: don't add the new subtest to gem_ctx_exec which occurred blocklisted,
> > create a new test hosting the new subtest, update commit descripion,
> >   - prepare parameters for perf open still in the main thread to avoid
> > test failures on platforms with no perf support (will skip now),
> >   - call perf open with OA buffer reports disabled, this will make sure
> > that the perf API doesn't unnecessarily enable the OA unit, while the
> > test still runs the targeted code (Umesh),
> >   - replace additional code for OA exponent calculations with a reasonable
> > hardcoded value (Umesh).
> > v2: convert to a separate subtest, not a variant of another one (that has
> > been dropped from the series),
> >   - move the subtest out of tests/i915/perf.c (Ashutosh), add it to
> > tests/i915/gem_ctx_exec.c,
> >   - don't touch lib/i915/perf.c (Umesh, Ashutosh), duplicate reused code
> > from tests/i915/perf.c in tests/i915/gem_ctx_exec.c.
> > 
> > References: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
> > Signed-off-by: Janusz Krzysztofik 
> > Cc: Chris Wilson 
> > Cc: Kamil Konieczny 
> > Cc: Ashutosh Dixit 
> > Cc: Umesh Nerlige Ramappa 
> > ---
> >  tests/i915/gem_barrier_race.c | 159 ++
> >  tests/meson.build |   8 ++
> >  2 files changed, 167 insertions(+)
> >  create mode 100644 tests/i915/gem_barrier_race.c
> > 
> > diff --git a/tests/i915/gem_barrier_race.c b/tests/i915/gem_barrier_race.c
> > new file mode 100644
> > index 00..fd0c7bdf1c
> > --- /dev/null
> > +++ b/tests/i915/gem_barrier_race.c
> > @@ -0,0 +1,159 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright(c) 2023 Intel Corporation. All rights reserved.
> > + */
> > +
> > +#include 
> > +
> > +#include "drmtest.h"
> > +#include "igt_aux.h"
> > +#include "igt_core.h"
> > +#include "igt_gt.h"
> > +#include "intel_chipset.h"
> > +#include "intel_reg.h"
> > +#include "ioctl_wrappers.h"
> > +
> > +#include "i915/gem.h"
> > +#include "i915/gem_create.h"
> > +#include "i915/gem_engine_topology.h"
> > +#include "i915/perf.h"
> > +
> > +IGT_TEST_DESCRIPTION("Exercise barrier tasks list handling and its 
> > interation with other i915 subsystems");
> --- ^ 
> --^-- ^
> s/interation/interaction/

Thanks.

> Please make it generic so it will not need to be changed soon,
> for example:
> IGT_TEST_DESCRIPTION("Exercise barrier tasks and its interaction with other 
> subsystems");

Since we are not exercising barrier tasks only barriers (the list name is 
barrier_tasks, while another list is called preallocated_barriers, then 
"tasks" without "list" may be confusing, I believe), I'll use:

IGT_TEST_DESCRIPTION("Exercise barriers and their interaction with other 
subsystems");

OK?

> 
> > +
> > +/* Based on code patterns found in tests/i915/perf.c */
> > +static void perf_open_close_workload(int fd, int *done)
> > +{
> > +   struct intel_perf_metric_set *metric_set = NULL, *metric_set_iter;
> > +   struct intel_perf *intel_perf = intel_perf_for_fd(fd);
> > +   uint64_t properties[] = {
> > +   DRM_I915_PERF_PROP_SAMPLE_OA, true,
> > +   DRM_I915_PERF_PROP_OA_METRICS_SET, 0,
> > +   DRM_I915_PERF_PROP_OA_FORMAT, 0,
> > +   DRM_I915_PERF_PROP_OA_EXPONENT, 5,
> > +   };
> > +   struct drm_i915_perf_open_param param = {
> > +   .flags = I915_PERF_FLAG_FD_CLOEXEC | I915_PERF_FLAG_DISABLED,
> > +   .num_properties = sizeof(properties) / 16,
> > +   .properties_ptr = to_user_pointer(properties),
> > +   };
> > +   uint32_t devid = intel_get_drm_devid(fd);
> > +
> > +   igt_require(intel_perf);
> > +   intel_perf_load_perf_configs(intel_perf, fd);
> > +
> > +   igt_require(devid);
> > +   

Re: [Intel-gfx] [PATCH i-g-t v3 1/1] tests: Exercise remote request vs barrier handling race

2023-02-14 Thread Kamil Konieczny
Hi Janusz,

On 2023-02-13 at 10:31:32 +0100, Janusz Krzysztofik wrote:
> Users reported oopses on list corruptions when using i915 perf with a
> number of concurrently running graphics applications.  That indicates we
> are currently missing some important tests for such scenarios.  Cover
> that gap.
> 
> Root cause analysis pointed out to an issue in barrier processing code and
> its interaction with perf replacing kernel contexts' active barriers with
> its own requests.
> 
> Add a new test intended for exercising intentionally racy barrier tasks
> list processing and its interaction with other i915 subsystems.  As a
> first subtest, add one that exercises the interaction of remote requests
> with barrier tasks list handling, especially barrier preallocate / acquire
> operations performed during context first pin / last unpin.
> 
> The code is partially inspired by Chris Wilson's igt@perf@open-race
> subtest, which I was not able to get an Ack for from upstream.
> 
> v3: don't add the new subtest to gem_ctx_exec which occurred blocklisted,
> create a new test hosting the new subtest, update commit descripion,
>   - prepare parameters for perf open still in the main thread to avoid
> test failures on platforms with no perf support (will skip now),
>   - call perf open with OA buffer reports disabled, this will make sure
> that the perf API doesn't unnecessarily enable the OA unit, while the
> test still runs the targeted code (Umesh),
>   - replace additional code for OA exponent calculations with a reasonable
> hardcoded value (Umesh).
> v2: convert to a separate subtest, not a variant of another one (that has
> been dropped from the series),
>   - move the subtest out of tests/i915/perf.c (Ashutosh), add it to
> tests/i915/gem_ctx_exec.c,
>   - don't touch lib/i915/perf.c (Umesh, Ashutosh), duplicate reused code
> from tests/i915/perf.c in tests/i915/gem_ctx_exec.c.
> 
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
> Signed-off-by: Janusz Krzysztofik 
> Cc: Chris Wilson 
> Cc: Kamil Konieczny 
> Cc: Ashutosh Dixit 
> Cc: Umesh Nerlige Ramappa 
> ---
>  tests/i915/gem_barrier_race.c | 159 ++
>  tests/meson.build |   8 ++
>  2 files changed, 167 insertions(+)
>  create mode 100644 tests/i915/gem_barrier_race.c
> 
> diff --git a/tests/i915/gem_barrier_race.c b/tests/i915/gem_barrier_race.c
> new file mode 100644
> index 00..fd0c7bdf1c
> --- /dev/null
> +++ b/tests/i915/gem_barrier_race.c
> @@ -0,0 +1,159 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright(c) 2023 Intel Corporation. All rights reserved.
> + */
> +
> +#include 
> +
> +#include "drmtest.h"
> +#include "igt_aux.h"
> +#include "igt_core.h"
> +#include "igt_gt.h"
> +#include "intel_chipset.h"
> +#include "intel_reg.h"
> +#include "ioctl_wrappers.h"
> +
> +#include "i915/gem.h"
> +#include "i915/gem_create.h"
> +#include "i915/gem_engine_topology.h"
> +#include "i915/perf.h"
> +
> +IGT_TEST_DESCRIPTION("Exercise barrier tasks list handling and its 
> interation with other i915 subsystems");
--- ^ 
--^-- ^
s/interation/interaction/

Please make it generic so it will not need to be changed soon,
for example:
IGT_TEST_DESCRIPTION("Exercise barrier tasks and its interaction with other 
subsystems");

> +
> +/* Based on code patterns found in tests/i915/perf.c */
> +static void perf_open_close_workload(int fd, int *done)
> +{
> + struct intel_perf_metric_set *metric_set = NULL, *metric_set_iter;
> + struct intel_perf *intel_perf = intel_perf_for_fd(fd);
> + uint64_t properties[] = {
> + DRM_I915_PERF_PROP_SAMPLE_OA, true,
> + DRM_I915_PERF_PROP_OA_METRICS_SET, 0,
> + DRM_I915_PERF_PROP_OA_FORMAT, 0,
> + DRM_I915_PERF_PROP_OA_EXPONENT, 5,
> + };
> + struct drm_i915_perf_open_param param = {
> + .flags = I915_PERF_FLAG_FD_CLOEXEC | I915_PERF_FLAG_DISABLED,
> + .num_properties = sizeof(properties) / 16,
> + .properties_ptr = to_user_pointer(properties),
> + };
> + uint32_t devid = intel_get_drm_devid(fd);
> +
> + igt_require(intel_perf);
> + intel_perf_load_perf_configs(intel_perf, fd);
> +
> + igt_require(devid);
> + igt_list_for_each_entry(metric_set_iter, _perf->metric_sets, 
> link) {
> + if (!strcmp(metric_set_iter->symbol_name,
> + IS_HASWELL(devid) ? "RenderBasic" : "TestOa")) {
> + metric_set = metric_set_iter;
> + break;
> + }
> + }
> + igt_require(metric_set);
> + igt_require(metric_set->perf_oa_metrics_set);
> + properties[3] = metric_set->perf_oa_metrics_set;
> + properties[5] = metric_set->perf_oa_format;
> +
> + intel_perf_free(intel_perf);
> +
> + igt_fork(child, 1) {
> + do {
> +  

[Intel-gfx] [PATCH i-g-t v3 1/1] tests: Exercise remote request vs barrier handling race

2023-02-13 Thread Janusz Krzysztofik
Users reported oopses on list corruptions when using i915 perf with a
number of concurrently running graphics applications.  That indicates we
are currently missing some important tests for such scenarios.  Cover
that gap.

Root cause analysis pointed out to an issue in barrier processing code and
its interaction with perf replacing kernel contexts' active barriers with
its own requests.

Add a new test intended for exercising intentionally racy barrier tasks
list processing and its interaction with other i915 subsystems.  As a
first subtest, add one that exercises the interaction of remote requests
with barrier tasks list handling, especially barrier preallocate / acquire
operations performed during context first pin / last unpin.

The code is partially inspired by Chris Wilson's igt@perf@open-race
subtest, which I was not able to get an Ack for from upstream.

v3: don't add the new subtest to gem_ctx_exec which occurred blocklisted,
create a new test hosting the new subtest, update commit descripion,
  - prepare parameters for perf open still in the main thread to avoid
test failures on platforms with no perf support (will skip now),
  - call perf open with OA buffer reports disabled, this will make sure
that the perf API doesn't unnecessarily enable the OA unit, while the
test still runs the targeted code (Umesh),
  - replace additional code for OA exponent calculations with a reasonable
hardcoded value (Umesh).
v2: convert to a separate subtest, not a variant of another one (that has
been dropped from the series),
  - move the subtest out of tests/i915/perf.c (Ashutosh), add it to
tests/i915/gem_ctx_exec.c,
  - don't touch lib/i915/perf.c (Umesh, Ashutosh), duplicate reused code
from tests/i915/perf.c in tests/i915/gem_ctx_exec.c.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
Signed-off-by: Janusz Krzysztofik 
Cc: Chris Wilson 
Cc: Kamil Konieczny 
Cc: Ashutosh Dixit 
Cc: Umesh Nerlige Ramappa 
---
 tests/i915/gem_barrier_race.c | 159 ++
 tests/meson.build |   8 ++
 2 files changed, 167 insertions(+)
 create mode 100644 tests/i915/gem_barrier_race.c

diff --git a/tests/i915/gem_barrier_race.c b/tests/i915/gem_barrier_race.c
new file mode 100644
index 00..fd0c7bdf1c
--- /dev/null
+++ b/tests/i915/gem_barrier_race.c
@@ -0,0 +1,159 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright(c) 2023 Intel Corporation. All rights reserved.
+ */
+
+#include 
+
+#include "drmtest.h"
+#include "igt_aux.h"
+#include "igt_core.h"
+#include "igt_gt.h"
+#include "intel_chipset.h"
+#include "intel_reg.h"
+#include "ioctl_wrappers.h"
+
+#include "i915/gem.h"
+#include "i915/gem_create.h"
+#include "i915/gem_engine_topology.h"
+#include "i915/perf.h"
+
+IGT_TEST_DESCRIPTION("Exercise barrier tasks list handling and its interation 
with other i915 subsystems");
+
+/* Based on code patterns found in tests/i915/perf.c */
+static void perf_open_close_workload(int fd, int *done)
+{
+   struct intel_perf_metric_set *metric_set = NULL, *metric_set_iter;
+   struct intel_perf *intel_perf = intel_perf_for_fd(fd);
+   uint64_t properties[] = {
+   DRM_I915_PERF_PROP_SAMPLE_OA, true,
+   DRM_I915_PERF_PROP_OA_METRICS_SET, 0,
+   DRM_I915_PERF_PROP_OA_FORMAT, 0,
+   DRM_I915_PERF_PROP_OA_EXPONENT, 5,
+   };
+   struct drm_i915_perf_open_param param = {
+   .flags = I915_PERF_FLAG_FD_CLOEXEC | I915_PERF_FLAG_DISABLED,
+   .num_properties = sizeof(properties) / 16,
+   .properties_ptr = to_user_pointer(properties),
+   };
+   uint32_t devid = intel_get_drm_devid(fd);
+
+   igt_require(intel_perf);
+   intel_perf_load_perf_configs(intel_perf, fd);
+
+   igt_require(devid);
+   igt_list_for_each_entry(metric_set_iter, _perf->metric_sets, 
link) {
+   if (!strcmp(metric_set_iter->symbol_name,
+   IS_HASWELL(devid) ? "RenderBasic" : "TestOa")) {
+   metric_set = metric_set_iter;
+   break;
+   }
+   }
+   igt_require(metric_set);
+   igt_require(metric_set->perf_oa_metrics_set);
+   properties[3] = metric_set->perf_oa_metrics_set;
+   properties[5] = metric_set->perf_oa_format;
+
+   intel_perf_free(intel_perf);
+
+   igt_fork(child, 1) {
+   do {
+   int stream = igt_ioctl(fd, DRM_IOCTL_I915_PERF_OPEN, 
);
+
+   igt_assert_fd(stream);
+   close(stream);
+
+   } while (!READ_ONCE(*done));
+   }
+}
+
+static void remote_request_workload(int fd, int *done)
+{
+   /*
+* Use DRM_IOCTL_I915_PERF_OPEN / close as
+* intel_context_prepare_remote_request() workload
+*/
+   perf_open_close_workload(fd, done);
+}
+
+/* Copied from tests/i915/gem_ctx_exec.c */
+static int exec(int fd,