Re: [PATCH] drm/ttm/tests: Let ttm_bo_test consider different ww_mutex implementation.

2024-05-21 Thread Sebastian Andrzej Siewior
On 2024-04-26 09:01:37 [+0200], To dri-devel@lists.freedesktop.org wrote:
> On 2024-04-04 12:25:36 [+0200], To dri-devel@lists.freedesktop.org wrote:
> > PREEMPT_RT has a different locking implementation for ww_mutex. The
> > base mutex of struct ww_mutex is declared as struct WW_MUTEX_BASE. The
> > latter is defined as `mutex' for non-PREEMPT_RT builds and `rt_mutex'
> > for PREEMPT_RT builds.
> > 
> > Using mutex_lock() directly on the base mutex in
> > ttm_bo_reserve_deadlock() leads to compile error on PREEMPT_RT.
> > 
> > The locking-selftest has its own defines to deal with this and it is
> > probably best to define the needed one within the test program since
> > their usefulness is limited outside of well-known selftests.
> > 
> > Provide ww_mutex_base_lock() which points to the correct function for
> > PREEMPT_RT and non-PREEMPT_RT builds.
> > 
> > Fixes: 995279d280d1e ("drm/ttm/tests: Add tests for ttm_bo functions")
> > Signed-off-by: Sebastian Andrzej Siewior 
> > ---
> > 
> > For the record, testing led to
> > | WARNING: CPU: 5 PID: 2089 at include/drm/ttm/ttm_bo.h:247 
> > ttm_bo_reserve_no_wait_ticket+0xe8/0x150 [ttm_bo_test]
> > 
> > but this occurred with and without RT on v6.9-rc2.
> 
> a friendly ping.

ping ;)

Sebastian


Re: [PATCH] drm/ttm/tests: Let ttm_bo_test consider different ww_mutex implementation.

2024-04-26 Thread Sebastian Andrzej Siewior
On 2024-04-04 12:25:36 [+0200], To dri-devel@lists.freedesktop.org wrote:
> PREEMPT_RT has a different locking implementation for ww_mutex. The
> base mutex of struct ww_mutex is declared as struct WW_MUTEX_BASE. The
> latter is defined as `mutex' for non-PREEMPT_RT builds and `rt_mutex'
> for PREEMPT_RT builds.
> 
> Using mutex_lock() directly on the base mutex in
> ttm_bo_reserve_deadlock() leads to compile error on PREEMPT_RT.
> 
> The locking-selftest has its own defines to deal with this and it is
> probably best to define the needed one within the test program since
> their usefulness is limited outside of well-known selftests.
> 
> Provide ww_mutex_base_lock() which points to the correct function for
> PREEMPT_RT and non-PREEMPT_RT builds.
> 
> Fixes: 995279d280d1e ("drm/ttm/tests: Add tests for ttm_bo functions")
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
> 
> For the record, testing led to
> | WARNING: CPU: 5 PID: 2089 at include/drm/ttm/ttm_bo.h:247 
> ttm_bo_reserve_no_wait_ticket+0xe8/0x150 [ttm_bo_test]
> 
> but this occurred with and without RT on v6.9-rc2.

a friendly ping.

Sebastian


[PATCH] drm/ttm/tests: Let ttm_bo_test consider different ww_mutex implementation.

2024-04-04 Thread Sebastian Andrzej Siewior
PREEMPT_RT has a different locking implementation for ww_mutex. The
base mutex of struct ww_mutex is declared as struct WW_MUTEX_BASE. The
latter is defined as `mutex' for non-PREEMPT_RT builds and `rt_mutex'
for PREEMPT_RT builds.

Using mutex_lock() directly on the base mutex in
ttm_bo_reserve_deadlock() leads to compile error on PREEMPT_RT.

The locking-selftest has its own defines to deal with this and it is
probably best to define the needed one within the test program since
their usefulness is limited outside of well-known selftests.

Provide ww_mutex_base_lock() which points to the correct function for
PREEMPT_RT and non-PREEMPT_RT builds.

Fixes: 995279d280d1e ("drm/ttm/tests: Add tests for ttm_bo functions")
Signed-off-by: Sebastian Andrzej Siewior 
---

For the record, testing led to
| WARNING: CPU: 5 PID: 2089 at include/drm/ttm/ttm_bo.h:247 
ttm_bo_reserve_no_wait_ticket+0xe8/0x150 [ttm_bo_test]

but this occurred with and without RT on v6.9-rc2.
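
For readers not familiar with the RT locking bits: the relevant part of
include/linux/ww_mutex.h looks roughly like this (paraphrased, details
vary by kernel version), which is why the test needs its own
ww_mutex_base_lock() helper:

  #ifdef CONFIG_PREEMPT_RT
  #define WW_MUTEX_BASE	rt_mutex
  #else
  #define WW_MUTEX_BASE	mutex
  #endif

  struct ww_mutex {
	struct WW_MUTEX_BASE base;	/* struct mutex or struct rt_mutex */
	struct ww_acquire_ctx *ctx;
	/* debug fields omitted */
  };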

 drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c 
b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
index 1f8a4f8adc929..9cc367a795341 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
@@ -18,6 +18,12 @@
 
 #define BO_SIZE		SZ_8K
 
+#ifdef CONFIG_PREEMPT_RT
+#define ww_mutex_base_lock(b)  rt_mutex_lock(b)
+#else
+#define ww_mutex_base_lock(b)  mutex_lock(b)
+#endif
+
 struct ttm_bo_test_case {
const char *description;
bool interruptible;
@@ -142,7 +148,7 @@ static void ttm_bo_reserve_deadlock(struct kunit *test)
bo2 = ttm_bo_kunit_init(test, test->priv, BO_SIZE);
 
	ww_acquire_init(&ctx1, &reservation_ww_class);
-   mutex_lock(&bo2->base.resv->lock.base);
+   ww_mutex_base_lock(&bo2->base.resv->lock.base);
 
/* The deadlock will be caught by WW mutex, don't warn about it */
	lock_release(&bo2->base.resv->lock.base.dep_map, 1);
-- 
2.43.0



Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.

2023-10-05 Thread Sebastian Andrzej Siewior
On 2023-10-04 08:44:58 [-0400], Harry Wentland wrote:
> CI passed.
> 
> Series is
> Acked-by: Harry Wentland 

Thank you.

> Harry

Sebastian


Re: [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin().

2023-10-05 Thread Sebastian Andrzej Siewior
On 2023-10-04 08:10:35 [-0400], Hamza Mahfooz wrote:
> I did some digging, and it seems like the intention of that patch was to
> fix the following splat:
> 
> WARNING: CPU: 5 PID: 1062 at
> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/dc_fpu.c:71
> dc_assert_fp_enabled+0x1a/0x30 [amdgpu]
> [...]
> CPU: 5 PID: 1062 Comm: Xorg Tainted: G   OE 5.15.0-56-generic

So it is hard to look this up with an upstream v5.15 kernel since
dcn32_populate_dml_pipes_from_context() was introduced in v6.0-rc1.
Judging from v6.0-rc1 I don't see how that warning could occur other
than by using dc_assert_fp_enabled() without invoking DC_FP_START first.

> Hamza

Sebastian


Re: [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin().

2023-10-04 Thread Sebastian Andrzej Siewior
On 2023-10-03 15:53:41 [-0400], Harry Wentland wrote:
> On 2023-09-21 10:15, Sebastian Andrzej Siewior wrote:
> > This is a revert of the commit mentioned below. While the commit is not
> > wrong, as in nothing in the kernel will explode, having migrate_disable()
> > here is a complete waste of resources.
> > 
> > Additionally, the commit message is plain wrong and the review tag does not make
> 
> Not sure I follow what's unhelpful about the review tag with
> 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of 
> per-CPU variable")

I explained it below with two points that the reviewer should have
noticed while reading the commit message, even without knowing what
migrate_disable() itself does.

> I do wish the original patch showed the splat it's attempting
> to fix. It apparently made a difference for something, whether
> inadvertently or not. I wish I knew what that "something" was.

As far as I can tell the patch does make a difference.

> Harry

Sebastian


Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.

2023-10-04 Thread Sebastian Andrzej Siewior
On 2023-10-03 15:54:58 [-0400], Harry Wentland wrote:
> On 2023-10-02 06:58, Sebastian Andrzej Siewior wrote:
> > On 2023-09-22 07:33:26 [+0200], Christian König wrote:
> >> Am 21.09.23 um 16:15 schrieb Sebastian Andrzej Siewior:
> >>> Hi,
> >>>
> >>> I stumbled upon the amdgpu driver via a bugzilla report. The actual fix
> >>> is #4 + #5 and the rest was made while looking at the code.
> >>
> >> Oh, yes please :)
> >>
> >> Rodrigo and I have been trying to sort those things out previously, but
> >> that's Sisyphean work.
> >>
> >> In general the DC team needs to judge, but off hand it looks good to me.
> > 
> > Any way to get this merged? There was no reply from the DC team… No
> > reply from the person breaking it either. The bugzilla reporter stated
> > that it solves his trouble. He didn't report anything new ;)
> > 
> 
> Apologies for the slow progress. We're feeding it through our CI and
> will let you know the verdict soon.
> 
> Do you happen to have the bugzilla link that this is fixing? It would
> be helpful to include that as a link in the patches as well, to give
> them context.
The bugzilla report is at
  https://bugzilla.kernel.org/show_bug.cgi?id=217928

but the patches explain the situation, too. Even more verbose than the
report…

> Harry

Sebastian


Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.

2023-10-02 Thread Sebastian Andrzej Siewior
On 2023-09-22 07:33:26 [+0200], Christian König wrote:
> Am 21.09.23 um 16:15 schrieb Sebastian Andrzej Siewior:
> > Hi,
> > 
> > I stumbled upon the amdgpu driver via a bugzilla report. The actual fix
> > is #4 + #5 and the rest was made while looking at the code.
> 
> Oh, yes please :)
> 
> Rodrigo and I have been trying to sort those things out previously, but
> that's Sisyphean work.
> 
> In general the DC team needs to judge, but off hand it looks good to me.

Any way to get this merged? There was no reply from the DC team… No
reply from the person breaking it either. The bugzilla reporter stated
that it solves his trouble. He didn't report anything new ;)

> Christian.

Sebastian


[PATCH 5/5] drm/amd/display: Move the memory allocation out of dcn20_validate_bandwidth_fp().

2023-09-21 Thread Sebastian Andrzej Siewior
dcn20_validate_bandwidth_fp() is invoked while FPU access has been
enabled. FPU access requires disabling preemption even on PREEMPT_RT.
It is not possible to allocate memory with disabled preemption even with
GFP_ATOMIC on PREEMPT_RT.

Move the memory allocation before FPU access is enabled.
To preserve the previous "clean" state of "pipes", add a memset() before the
second invocation of dcn20_validate_bandwidth_internal() where the
variable is used.

Signed-off-by: Sebastian Andrzej Siewior 
---
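Schematically (illustration only, not part of the diff below) the change
turns

  DC_FP_START();			/* disables preemption */
  pipes = kzalloc(..., GFP_ATOMIC);	/* not allowed with preemption disabled on PREEMPT_RT */
  ...
  DC_FP_END();

into

  pipes = kcalloc(dc->res_pool->pipe_count,
		  sizeof(display_e2e_pipe_params_st), GFP_KERNEL);
  if (!pipes)
	return false;
  DC_FP_START();
  voltage_supported = dcn20_validate_bandwidth_fp(dc, context,
						  fast_validate, pipes);
  DC_FP_END();
  kfree(pipes);
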
 .../drm/amd/display/dc/dcn20/dcn20_resource.c| 10 +-
 .../gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c | 16 +++-
 .../gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h |  5 ++---
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index d587f807dfd7c..5036a3e608324 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -2141,9 +2141,17 @@ bool dcn20_validate_bandwidth(struct dc *dc, struct 
dc_state *context,
bool fast_validate)
 {
bool voltage_supported;
+   display_e2e_pipe_params_st *pipes;
+
+   pipes = kcalloc(dc->res_pool->pipe_count, 
sizeof(display_e2e_pipe_params_st), GFP_KERNEL);
+   if (!pipes)
+   return false;
+
DC_FP_START();
-   voltage_supported = dcn20_validate_bandwidth_fp(dc, context, 
fast_validate);
+   voltage_supported = dcn20_validate_bandwidth_fp(dc, context, 
fast_validate, pipes);
DC_FP_END();
+
+   kfree(pipes);
return voltage_supported;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index 8b2038162a7e1..2ad92497b9bf2 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -1923,7 +1923,7 @@ void dcn20_patch_bounding_box(struct dc *dc, struct 
_vcs_dpi_soc_bounding_box_st
 }
 
 static bool dcn20_validate_bandwidth_internal(struct dc *dc, struct dc_state 
*context,
-   bool fast_validate)
+   bool fast_validate, display_e2e_pipe_params_st *pipes)
 {
bool out = false;
 
@@ -1932,7 +1932,6 @@ static bool dcn20_validate_bandwidth_internal(struct dc 
*dc, struct dc_state *co
int vlevel = 0;
int pipe_split_from[MAX_PIPES];
int pipe_cnt = 0;
-   display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * 
sizeof(display_e2e_pipe_params_st), GFP_ATOMIC);
DC_LOGGER_INIT(dc->ctx->logger);
 
BW_VAL_TRACE_COUNT();
@@ -1967,16 +1966,14 @@ static bool dcn20_validate_bandwidth_internal(struct dc 
*dc, struct dc_state *co
out = false;
 
 validate_out:
-   kfree(pipes);
 
BW_VAL_TRACE_FINISH();
 
return out;
 }
 
-bool dcn20_validate_bandwidth_fp(struct dc *dc,
-struct dc_state *context,
-bool fast_validate)
+bool dcn20_validate_bandwidth_fp(struct dc *dc, struct dc_state *context,
+bool fast_validate, display_e2e_pipe_params_st 
*pipes)
 {
bool voltage_supported = false;
bool full_pstate_supported = false;
@@ -1995,11 +1992,11 @@ bool dcn20_validate_bandwidth_fp(struct dc *dc,
ASSERT(context != dc->current_state);
 
if (fast_validate) {
-   return dcn20_validate_bandwidth_internal(dc, context, true);
+   return dcn20_validate_bandwidth_internal(dc, context, true, 
pipes);
}
 
// Best case, we support full UCLK switch latency
-   voltage_supported = dcn20_validate_bandwidth_internal(dc, context, 
false);
+   voltage_supported = dcn20_validate_bandwidth_internal(dc, context, 
false, pipes);
full_pstate_supported = 
context->bw_ctx.bw.dcn.clk.p_state_change_support;
 
if (context->bw_ctx.dml.soc.dummy_pstate_latency_us == 0 ||
@@ -2011,7 +2008,8 @@ bool dcn20_validate_bandwidth_fp(struct dc *dc,
// Fallback: Try to only support G6 temperature read latency
context->bw_ctx.dml.soc.dram_clock_change_latency_us = 
context->bw_ctx.dml.soc.dummy_pstate_latency_us;
 
-   voltage_supported = dcn20_validate_bandwidth_internal(dc, context, 
false);
+   memset(pipes, 0, dc->res_pool->pipe_count * 
sizeof(display_e2e_pipe_params_st));
+   voltage_supported = dcn20_validate_bandwidth_internal(dc, context, 
false, pipes);
dummy_pstate_supported = 
context->bw_ctx.bw.dcn.clk.p_state_change_support;
 
if (voltage_supported && (dummy_pstate_supported || 
!(context->stream_count))) {
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
index a81a0b9e68842..b6c34198ddc86 100644
--- a/drive

[PATCH 4/5] drm/amd/display: Move the memory allocation out of dcn21_validate_bandwidth_fp().

2023-09-21 Thread Sebastian Andrzej Siewior
dcn21_validate_bandwidth_fp() is invoked while FPU access has been
enabled. FPU access requires disabling preemption even on PREEMPT_RT.
It is not possible to allocate memory with disabled preemption even with
GFP_ATOMIC on PREEMPT_RT.

Move the memory allocation before FPU access is enabled.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217928
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 10 +-
 drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  |  7 ++-
 drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h  |  5 ++---
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index d1a25fe6c44fa..5674c3450fc36 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -953,9 +953,17 @@ static bool dcn21_validate_bandwidth(struct dc *dc, struct 
dc_state *context,
bool fast_validate)
 {
bool voltage_supported;
+   display_e2e_pipe_params_st *pipes;
+
+   pipes = kcalloc(dc->res_pool->pipe_count, 
sizeof(display_e2e_pipe_params_st), GFP_KERNEL);
+   if (!pipes)
+   return false;
+
DC_FP_START();
-   voltage_supported = dcn21_validate_bandwidth_fp(dc, context, 
fast_validate);
+   voltage_supported = dcn21_validate_bandwidth_fp(dc, context, 
fast_validate, pipes);
DC_FP_END();
+
+   kfree(pipes);
return voltage_supported;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index 5805fb02af14e..8b2038162a7e1 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -2216,9 +2216,8 @@ static void dcn21_calculate_wm(struct dc *dc, struct 
dc_state *context,
		&context->bw_ctx.dml, pipes, 
pipe_cnt);
 }
 
-bool dcn21_validate_bandwidth_fp(struct dc *dc,
-struct dc_state *context,
-bool fast_validate)
+bool dcn21_validate_bandwidth_fp(struct dc *dc, struct dc_state *context,
+bool fast_validate, display_e2e_pipe_params_st 
*pipes)
 {
bool out = false;
 
@@ -2227,7 +2226,6 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc,
int vlevel = 0;
int pipe_split_from[MAX_PIPES];
int pipe_cnt = 0;
-   display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * 
sizeof(display_e2e_pipe_params_st), GFP_ATOMIC);
DC_LOGGER_INIT(dc->ctx->logger);
 
BW_VAL_TRACE_COUNT();
@@ -2267,7 +2265,6 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc,
out = false;
 
 validate_out:
-   kfree(pipes);
 
BW_VAL_TRACE_FINISH();
 
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
index c51badf7b68a9..a81a0b9e68842 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
@@ -77,9 +77,8 @@ int dcn21_populate_dml_pipes_from_context(struct dc *dc,
  struct dc_state *context,
  display_e2e_pipe_params_st *pipes,
  bool fast_validate);
-bool dcn21_validate_bandwidth_fp(struct dc *dc,
-struct dc_state *context,
-bool fast_validate);
+bool dcn21_validate_bandwidth_fp(struct dc *dc, struct dc_state *context, bool
+fast_validate, display_e2e_pipe_params_st 
*pipes);
 void dcn21_update_bw_bounding_box(struct dc *dc, struct clk_bw_params 
*bw_params);
 
 void dcn21_clk_mgr_set_bw_params_wm_table(struct clk_bw_params *bw_params);
-- 
2.40.1



[PATCH 3/5] drm/amd/display: Add a warning if the FPU is used outside from task context.

2023-09-21 Thread Sebastian Andrzej Siewior
Add a warning if the FPU is used from any context other than task
context. This is only a precaution since the code can not be used from
softirq even though the API allows it on x86 for instance.
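
For reference, in_task() is roughly (paraphrased from
include/linux/preempt.h):

  /* true only in plain task/process context */
  #define in_task()	(!(in_nmi() | in_hardirq() | in_serving_softirq()))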

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 8bd5926b47e06..4ae4720535a56 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -84,6 +84,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 {
int depth;
 
+   WARN_ON_ONCE(!in_task());
preempt_disable();
depth = __this_cpu_inc_return(fpu_recursion_depth);
 
-- 
2.40.1



[PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.

2023-09-21 Thread Sebastian Andrzej Siewior
Hi,

I stumbled upon the amdgpu driver via a bugzilla report. The actual fix
is #4 + #5 and the rest was made while looking at the code.

Sebastian




[PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin().

2023-09-21 Thread Sebastian Andrzej Siewior
This is a revert of the commit mentioned below. While the commit is not
wrong, as in nothing in the kernel will explode, having migrate_disable()
here is a complete waste of resources.

Additionally, the commit message is plain wrong and the review tag does
not make it any better. The migrate_disable() interface has a fat comment
describing it and it includes the word "undesired" in the headline which
should tickle people to read it before using it.
Initially I assumed it is worded too harsh but now I beg to differ.

The reviewer of the original commit, even without understanding what
migrate_disable() does, should have asked the following:

- migrate_disable() is added only to the CONFIG_X86 block and it claims
  to protect fpu_recursion_depth. Why are the other architectures
  excluded?

- migrate_disable() is added after fpu_recursion_depth was modified.
  Shouldn't it be added before the modification or referencing takes
  place?

Moving on.
Disabling preemption DOES prevent CPU migration. A task that can not be
pushed away from the CPU by the scheduler (due to disabled preemption)
can not be pushed or migrated to another CPU.

Disabling migration DOES NOT ensure consistency of per-CPU variables. It
only ensures that the task acts always on the same per-CPU variable. The
task remains preemptible meaning multiple tasks can access the same
per-CPU variable. This in turn leads to inconsistency for the statement

  *pcpu += 1;

with two tasks on one CPU and a preemption point during the RMW
operation:

 Task A   Task B
 read pcpu to reg  # 0
 inc reg   # 0 -> 1
  read pcpu to reg  # 0
  inc reg   # 0 -> 1
  write reg to pcpu # 1
 write reg to pcpu # 1

At the end pcpu reads 1 but should read 2 instead. Boom.

get_cpu_ptr() already contains a preempt_disable() statement. That means
that the per-CPU variable can only be referenced by a single task which
is currently running. The only inconsistency that can occur is if the
variable is additionally accessed from an interrupt.
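
For reference, get_cpu_ptr()/put_cpu_ptr() expand to roughly the
following (paraphrased from include/linux/percpu-defs.h), which is why
the counter is protected against preemption but not against interrupts:

  #define get_cpu_ptr(var)		\
  ({					\
	preempt_disable();		\
	this_cpu_ptr(var);		\
  })

  #define put_cpu_ptr(var)		\
  do {					\
	(void)(var);			\
	preempt_enable();		\
  } while (0)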

Remove migrate_disable/enable() from dc_fpu_begin/end().

Cc: Tianci Yin 
Cc: Aurabindo Pillai 
Fixes: 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency 
of per-CPU variable")
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 172aa10a8800f..86f4c0e046548 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -91,7 +91,6 @@ void dc_fpu_begin(const char *function_name, const int line)
 
if (*pcpu == 1) {
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
-   migrate_disable();
kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
@@ -132,7 +131,6 @@ void dc_fpu_end(const char *function_name, const int line)
if (*pcpu <= 0) {
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_end();
-   migrate_enable();
 #elif defined(CONFIG_PPC64)
if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
disable_kernel_vsx();
-- 
2.40.1



[PATCH 2/5] drm/amd/display: Simplify the per-CPU usage.

2023-09-21 Thread Sebastian Andrzej Siewior
The fpu_recursion_depth counter is used to ensure that dc_fpu_begin()
can be invoked multiple times while the FPU-disable function itself is
only invoked once. Also the counterpart (dc_fpu_end()) is balanced
properly.

Instead of using the get_cpu_ptr() dance around the increment it is
simpler to increment the per-CPU variable directly. Also the per-CPU
variable has to be incremented and decremented on the same CPU. This is
ensured by the inner part which disables preemption. This is somewhat
non-obvious; it works, but the preempt counter is touched a few times
for no reason.

Disable preemption before incrementing fpu_recursion_depth for the first
time. Keep preemption disabled until dc_fpu_end() where the counter is
decremented, making it obvious that preemption has to stay disabled
while the counter is non-zero.
Use simple inc/dec functions.
Remove the nested preempt_disable/enable functions which are now not
needed.
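
With this patch the nesting behaves as sketched below (x86 path shown,
illustration only):

  dc_fpu_begin();	/* depth 0 -> 1: preempt_disable() + kernel_fpu_begin() */
    dc_fpu_begin();	/* depth 1 -> 2: preempt_disable() nests, no FPU call */
    ...
    dc_fpu_end();	/* depth 2 -> 1: preempt_enable() nests, no FPU call */
  dc_fpu_end();		/* depth 1 -> 0: kernel_fpu_end() + preempt_enable() */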

Signed-off-by: Sebastian Andrzej Siewior 
---
 .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 50 ---
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 86f4c0e046548..8bd5926b47e06 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -60,11 +60,9 @@ static DEFINE_PER_CPU(int, fpu_recursion_depth);
  */
 inline void dc_assert_fp_enabled(void)
 {
-   int *pcpu, depth = 0;
+   int depth;
 
-   pcpu = get_cpu_ptr(&fpu_recursion_depth);
-   depth = *pcpu;
-   put_cpu_ptr(&fpu_recursion_depth);
+   depth = __this_cpu_read(fpu_recursion_depth);
 
ASSERT(depth >= 1);
 }
@@ -84,32 +82,27 @@ inline void dc_assert_fp_enabled(void)
  */
 void dc_fpu_begin(const char *function_name, const int line)
 {
-   int *pcpu;
+   int depth;
 
-   pcpu = get_cpu_ptr(&fpu_recursion_depth);
-   *pcpu += 1;
+   preempt_disable();
+   depth = __this_cpu_inc_return(fpu_recursion_depth);
 
-   if (*pcpu == 1) {
+   if (depth == 1) {
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
-   if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
-   preempt_disable();
+   if (cpu_has_feature(CPU_FTR_VSX_COMP))
enable_kernel_vsx();
-   } else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) {
-   preempt_disable();
+   else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
enable_kernel_altivec();
-   } else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) {
-   preempt_disable();
+   else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
enable_kernel_fp();
-   }
 #elif defined(CONFIG_ARM64)
kernel_neon_begin();
 #endif
}
 
-   TRACE_DCN_FPU(true, function_name, line, *pcpu);
-   put_cpu_ptr(&fpu_recursion_depth);
+   TRACE_DCN_FPU(true, function_name, line, depth);
 }
 
 /**
@@ -124,29 +117,26 @@ void dc_fpu_begin(const char *function_name, const int 
line)
  */
 void dc_fpu_end(const char *function_name, const int line)
 {
-   int *pcpu;
+   int depth;
 
-   pcpu = get_cpu_ptr(&fpu_recursion_depth);
-   *pcpu -= 1;
-   if (*pcpu <= 0) {
+   depth = __this_cpu_dec_return(fpu_recursion_depth);
+   if (depth == 0) {
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
-   if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
+   if (cpu_has_feature(CPU_FTR_VSX_COMP))
disable_kernel_vsx();
-   preempt_enable();
-   } else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) {
+   else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
disable_kernel_altivec();
-   preempt_enable();
-   } else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) {
+   else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
disable_kernel_fp();
-   preempt_enable();
-   }
 #elif defined(CONFIG_ARM64)
kernel_neon_end();
 #endif
+   } else {
+   WARN_ON_ONCE(depth < 0);
}
 
-   TRACE_DCN_FPU(false, function_name, line, *pcpu);
-   put_cpu_ptr(&fpu_recursion_depth);
+   TRACE_DCN_FPU(false, function_name, line, depth);
+   preempt_enable();
 }
-- 
2.40.1



Re: [PATCH] drm/i915: Do not disable preemption for resets

2023-09-13 Thread Sebastian Andrzej Siewior
On 2023-07-05 10:30:25 [+0100], Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
> preempt disable section over the hardware reset callback to prepare the
> driver for being able to reset from atomic contexts.
…

This missed the v6.6 merge window. Has this been dropped for some reason
or just missed by chance? Can this still be applied, please?
 
Sebastian


Re: [PATCH] drm/i915: Do not disable preemption for resets

2023-07-26 Thread Sebastian Andrzej Siewior
On 2023-07-05 10:30:25 [+0100], Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
> preempt disable section over the hardware reset callback to prepare the
> driver for being able to reset from atomic contexts.
> 
…

This looks like what I tested previously. So

Acked-by: Sebastian Andrzej Siewior 

Thank you.

> Signed-off-by: Tvrtko Ursulin 
> Cc: Chris Wilson 
> Cc: Paul Gortmaker 
> Cc: Sebastian Andrzej Siewior 

Sebastian


Re: [PATCH] kernel/panic: Drop unblank_screen call

2022-08-31 Thread Sebastian Andrzej Siewior
On 2022-08-30 16:50:04 [+0200], Daniel Vetter wrote:
> Long story short, I have no idea why the direct call to unblank_screen
> survived for so long (the infrastructure to do it properly existed for
> years), nor why it wasn't removed when the console_unblank() call was
> finally added. But it makes a ton more sense to finally do that than
> not - it's just better encapsulation to go through the console
> functions instead of doing a direct call, so let's dare. Plus it
> really does not make much sense to call the only unblank
> implementation there is twice, once without, and once with appropriate
> locking.

Yup, calling it twice is redundant. 
The only difference I see is that the console implementation relies on
CONFIG_VT_CONSOLE while the former relied only on CONFIG_VT. There
should be no console output without CONFIG_VT_CONSOLE so no need to
unblank it.

Acked-by: Sebastian Andrzej Siewior 

Sebastian


[PATCH 2/2] drm/i915: Drop the irqs_disabled() check

2022-03-11 Thread Sebastian Andrzej Siewior
The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.

There is no need to check for disabled interrupts. The lockdep
annotation below already checks if the lock has been acquired by the
caller and will yell if it is not.

Remove the !irqs_disabled() check.

Reported-by: Maarten Lankhorst 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_request.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 76cf5ac91e946..41d7c1071ab52 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -583,7 +583,6 @@ bool __i915_request_submit(struct i915_request *request)
 
RQ_TRACE(request, "\n");
 
-   GEM_BUG_ON(!irqs_disabled());
lockdep_assert_held(>sched_engine->lock);
 
/*
@@ -692,7 +691,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 */
RQ_TRACE(request, "\n");
 
-   GEM_BUG_ON(!irqs_disabled());
lockdep_assert_held(>sched_engine->lock);
 
/*
-- 
2.35.1



[PATCH 1/2] drm/i915/gt: Queue and wait for the irq_work item.

2022-03-11 Thread Sebastian Andrzej Siewior
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which can not be
acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Maarten Lankhorst 
---
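Schematically (not part of the diff) the change is:

  /* old: direct invocation, invalid on PREEMPT_RT because signal_irq_work()
   * may take a spinlock_t which is a sleeping lock there */
  local_irq_disable();
  signal_irq_work(&b->irq_work);
  local_irq_enable();

  /* new: let the callback run in its usual irq_work context, then wait */
  irq_work_queue(&b->irq_work);
  irq_work_sync(&b->irq_work);
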
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 209cf265bf746..98efeb97a6ba6 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -311,9 +311,8 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
/* Kick the work once more to drain the signalers, and disarm the irq */
	irq_work_sync(&b->irq_work);
	while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-   local_irq_disable();
-   signal_irq_work(&b->irq_work);
-   local_irq_enable();
+   irq_work_queue(&b->irq_work);
+   irq_work_sync(&b->irq_work);
cond_resched();
}
 }
-- 
2.35.1



[PATCH 0/2] drm/i915: Avoid explicit IRQ-off sections.

2022-03-11 Thread Sebastian Andrzej Siewior
Hi,

these two patches are from the RT queue. They avoid IRQ-off checks and
IRQ-off regions which are not valid/ possible on PREEMPT_RT and are not
needed on !PREEMPT_RT.

Sebastian




Re: [PATCH] drm/i915: Depend on !PREEMPT_RT.

2022-03-02 Thread Sebastian Andrzej Siewior
On 2022-03-02 11:42:35 [+], Tvrtko Ursulin wrote:
> > > >  0005-drm-i915-Don-t-check-for-atomic-context-on-PREEMPT_R.patch
> > > 
> > > What do preempt_disable/enable do on PREEMPT_RT? Thinking if instead the
> > > solution could be to always force the !ATOMIC path (for the whole
> > > _wait_for_atomic macro) on PREEMPT_RT.
> > 
> > Could be one way to handle it. But please don't disable preemption and
> > or interrupts for longer period of time as all of it increases the
> > overall latency.
> 
> I am looking for your guidance of what is the correct thing here.
> 
> Main purpose of this macro on the i915 side is to do short waits on GPU
> registers changing post write from spin-locked sections. But there were rare
> cases when very short waits were needed from unlocked sections, shorter than
> 10us (which is AFAIR what usleep_range documents should be a lower limit).
> Which is why non-atomic path was added to the macro. That path uses
> preempt_disable/enable so it can use local_clock().
>
> All this may, or may not be, compatible with PREEMPT_RT to start with?

Your assumption about atomic is not correct and that is why I aim to
ignore it for RT. Or maybe alter it so it fits.
It is assumed that in_atomic() is true in an interrupt handler or with
an acquired spinlock_t, right? On PREEMPT_RT both conditions keep the
context preemptible so the atomic check triggers. However, both (the
force-threaded interrupt handler and the spinlock_t) ensure that the
task is stuck on the CPU.

So maybe your _WAIT_FOR_ATOMIC_CHECK() could point to cant_migrate().
It looks like you try to ensure that local_clock() is from the same CPU.

> Or question phrased differently, how we should implement the <10us waits
> from non-atomic sections under PREEMPT_RT?

I think if you swap the check in _WAIT_FOR_ATOMIC_CHECK() it should be good.
After all the task remains preemptible during the condition polls so it
should work.
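
i.e. roughly something like this (hypothetical sketch, untested):

  /* current debug check, requires atomic context: */
  # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())

  /* possible RT-friendly variant: only require that the task can not
   * migrate, so that local_clock() keeps reading the same CPU: */
  # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { if (ATOMIC) cant_migrate(); } while (0)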

> > The problem is that you can't acquire that lock from within that
> > trace-point on PREEMPT_RT. On !RT it is possible but it is also
> > problematic because LOCKDEP does not see possible deadlocks unless that
> > trace-point is enabled.
> 
> Oh I meant could the include ordering problem be fixed differently?
> 
> """
> [PATCH 07/10] drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with
>  NOTRACE
> 
> The order of the header files is important. If this header file is
> included after tracepoint.h was included then the NOTRACE here becomes a
> nop. Currently this happens for two .c files which use the tracepoitns
> behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
> """
> 
> Like these two .c files - can order of includes just be changed in them?

Maybe. Let me check and get back to you.

> > I've been talking to Steven (after
> > https://lkml.kernel.org/r/20211214115837.6f33a...@gandalf.local.home)
> > and he wants to come up with something where you can pass a lock as
> > argument to the tracing-API. That way the lock can be acquired before
> > the trace event is invoked and lockdep will see it even if the trace
> > event is disabled.
> > So there is an idea how to get it to work eventually without disabling
> > it in the long term.
> > 
> > Making the register lock a raw_spinlock_t would solve the problem immediately but
> > I am a little worried given the increased latency in a quick test:
> > 
> > https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfy...@linutronix.de/
> > 
> > also, this is one single piece of hardware but the upper limit for the atomic polls is high.
> > 
> > > >  0008-drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch
> > > 
> > > Not sure about why cond_resched was put between irq_work_queue and
> > > irq_work_sync - would it not be like-for-like change to have the two
> > > together?
> > 
> > maybe it loops for a while and an additional scheduling would be nice.
> > 
> > > Commit message makes me think _queue already starts the handler on
> > > x86 at least.
> > 
> > Yes, irq_work_queue() triggers the IRQ right away on x86,
> > irq_work_sync() would wait for it to happen in case it did not happen.
> > On architectures which don't provide an IRQ-work interrupt, it is
> > delayed to the HZ tick timer interrupt. So this serves also as an
> > example in case someone want to copy the code ;)
> 
> My question wasn't why is there a need_resched() in there, but why is the
> patch:
> 
> + irq_work_queue(&b->irq_work);
>   cond_resched();
> + irq_work_sync(&b->irq_work);
> 
> And not:
> 
> + irq_work_queue(&b->irq_work);
> + irq_work_sync(&b->irq_work);
>   cond_resched();
> 
> To preserve like for like, if my understanding of the commit message was
> correct.

No strong need, it can be put as you suggest.
Should someone else schedule b->irq_work from another CPU then you
could first attempt to cond_resched() and then wait for b->irq_work's
completion. Assuming that this does not happen (because the irq_work was
previously queued and invoked immediately) irq_work_sync() will just 

Re: [PATCH] drm/i915: Depend on !PREEMPT_RT.

2022-03-01 Thread Sebastian Andrzej Siewior
On 2022-03-01 14:27:18 [+], Tvrtko Ursulin wrote:
> > you see:
> > 0003-drm-i915-Use-preempt_disable-enable_rt-where-recomme.patch
> > 0004-drm-i915-Don-t-disable-interrupts-on-PREEMPT_RT-duri.patch
> 
> Two for the display folks.
> 
> > 0005-drm-i915-Don-t-check-for-atomic-context-on-PREEMPT_R.patch
> 
> What do preempt_disable/enable do on PREEMPT_RT? Thinking if instead the
> solution could be to always force the !ATOMIC path (for the whole
> _wait_for_atomic macro) on PREEMPT_RT.

Could be one way to handle it. But please don't disable preemption and
or interrupts for longer period of time as all of it increases the
overall latency.

Side note: All of these patches is a collection over time. I personally
have only a single i7-sandybridge with i915 and here I don't really
enter all the possible paths here. People report, I patch and look
around and then they are quiet so I assume that it is working.

> > 0006-drm-i915-Disable-tracing-points-on-PREEMPT_RT.patch
> 
> If the issue is only with certain trace points why disable all?

It is a class and it is easier that way.

> > 0007-drm-i915-skip-DRM_I915_LOW_LEVEL_TRACEPOINTS-with-NO.patch
> 
> Didn't quite fully understand, why is this not fixable? Especially thinking
> if the option of not blanket disabling all tracepoints in the previous
> patch.

The problem is that you can't acquire that lock from within that
trace-point on PREEMPT_RT. On !RT it is possible but it is also
problematic because LOCKDEP does not see possible deadlocks unless that
trace-point is enabled.

I've been talking to Steven (after
https://lkml.kernel.org/r/20211214115837.6f33a...@gandalf.local.home)
and he wants to come up with something where you can pass a lock as
argument to the tracing-API. That way the lock can be acquired before
the trace event is invoked and lockdep will see it even if the trace
event is disabled.
So there is an idea how to get it to work eventually without disabling
it in the long term.

Making the register lock a raw_spinlock_t would solve the problem immediately but
I am a little worried given the increased latency in a quick test:
   https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfy...@linutronix.de/

also, this is one single piece of hardware but the upper limit for the atomic polls is high.

> > 0008-drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch
> 
> Not sure about why cond_resched was put between irq_work_queue and
> irq_work_sync - would it not be like-for-like change to have the two
> together? 

maybe it loops for a while and an additional scheduling would be nice.

> Commit message makes me think _queue already starts the handler on
> x86 at least.

Yes, irq_work_queue() triggers the IRQ right away on x86,
irq_work_sync() would wait for it to happen in case it did not happen.
On architectures which don't provide an IRQ-work interrupt, it is
delayed to the HZ tick timer interrupt. So this serves also as an
example in case someone want to copy the code ;)

> > 0009-drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch
> 
> I think this is okay. The part after the unlock is serialized by the tasklet
> already.
> 
> Slight doubt due the comment:
> 
>   local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
> 
> Makes me want to think about it harder but not now.

Clark reported it and confirmed that the warning is gone on RT and
everything appears to work ;)
PREEMPT_RT wise there is no synchronisation vs irq_work other than an
actual lock (in case it is needed).

> Another thing to check is if execlists_context_status_change still needs the
> atomic notifier chain with this change.
> 
> > 0010-drm-i915-Drop-the-irqs_disabled-check.patch
> 
> LGTM.

Do you want me to repost that one?

> > Revert-drm-i915-Depend-on-PREEMPT_RT.patch
> 
> Okay.
> 
> And finally for this very patch (the thread I am replying to):
> 
> Acked-by: Tvrtko Ursulin 

Thanks.

> Regards,
> 
> Tvrtko

Sebastian


Re: [PATCH] drm/i915: Depend on !PREEMPT_RT.

2022-02-28 Thread Sebastian Andrzej Siewior
On 2022-02-28 10:10:48 [+], Tvrtko Ursulin wrote:
> Hi,
Hi,

> Could you paste a link to the queue of i915 patches pending for a quick
> overview of how much work there is and in what areas?

Last post to the list:
  
https://lkml.kernel.org/r/20211214140301.520464-1-bige...@linutronix.de/

or if you look at the DRM section in 
   
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/series?h=v5.17-rc6-rt10-patches#n156

you see:
   0003-drm-i915-Use-preempt_disable-enable_rt-where-recomme.patch
   0004-drm-i915-Don-t-disable-interrupts-on-PREEMPT_RT-duri.patch
   0005-drm-i915-Don-t-check-for-atomic-context-on-PREEMPT_R.patch
   0006-drm-i915-Disable-tracing-points-on-PREEMPT_RT.patch
   0007-drm-i915-skip-DRM_I915_LOW_LEVEL_TRACEPOINTS-with-NO.patch
   0008-drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch
   0009-drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch
   0010-drm-i915-Drop-the-irqs_disabled-check.patch
   Revert-drm-i915-Depend-on-PREEMPT_RT.patch

and you could view them from
   
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches?h=v5.17-rc6-rt10-patches

> Also, I assume due absence of ARCH_SUPPORTS_RT being defined by any arch,
> that something more is not yet ready?

Correct. Looking at what I have queued for the next merge window I have
less than 20 patches (excluding i915 and printk) before ARCH_SUPPORTS_RT
can be enabled for x86-64.
 
> Regards,
> 
> Tvrtko

Sebastian


Re: [PATCH] drm/i915: Depend on !PREEMPT_RT.

2022-02-25 Thread Sebastian Andrzej Siewior
On 2022-02-14 19:59:08 [+0100], To intel-...@lists.freedesktop.org wrote:
> There are a few sections in the driver which are not compatible with
> PREEMPT_RT. They trigger warnings and can lead to deadlocks at runtime.
> 
> Disable the i915 driver on a PREEMPT_RT enabled kernel. This way
> PREEMPT_RT itself can be enabled without needing to address the i915
> issues first. The RT related patches are still in RT queue and will be
> handled later.
> 
> Signed-off-by: Sebastian Andrzej Siewior 

A gentle ping ;)

> ---
>  drivers/gpu/drm/i915/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index a4c94dc2e2164..3aa719d5a0f0d 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -3,6 +3,7 @@ config DRM_I915
>   tristate "Intel 8xx/9xx/G3x/G4x/HD Graphics"
>   depends on DRM
>   depends on X86 && PCI
> + depends on !PREEMPT_RT
>   select INTEL_GTT
>   select INTERVAL_TREE
>   # we need shmfs for the swappable backing store, and in particular

Sebastian


[PATCH] drm/i915: Depend on !PREEMPT_RT.

2022-02-14 Thread Sebastian Andrzej Siewior
There are a few sections in the driver which are not compatible with
PREEMPT_RT. They trigger warnings and can lead to deadlocks at runtime.

Disable the i915 driver on a PREEMPT_RT enabled kernel. This way
PREEMPT_RT itself can be enabled without needing to address the i915
issues first. The RT related patches are still in RT queue and will be
handled later.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index a4c94dc2e2164..3aa719d5a0f0d 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -3,6 +3,7 @@ config DRM_I915
tristate "Intel 8xx/9xx/G3x/G4x/HD Graphics"
depends on DRM
depends on X86 && PCI
+   depends on !PREEMPT_RT
select INTEL_GTT
select INTERVAL_TREE
# we need shmfs for the swappable backing store, and in particular
-- 
2.34.1



Re: [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended

2022-02-11 Thread Sebastian Andrzej Siewior
On 2022-01-27 00:29:37 [+0100], Mario Kleiner wrote:
> Hi, first thank you for implementing these preempt disables according to
Hi Mario,

> the markers i left long ago. And sorry for the rather late reply.
> 
> I had a look at the code, as of Linux 5.16, and did also a little test run
> (of a standard kernel, not with PREEMPT_RT, only
> CONFIG_PREEMPT_VOLUNTARY=y) on my Intel Kabylake GT2, so some thoughts:
> 
> The area covers only register reads and writes. The part that worries me
> > is:
> > - __intel_get_crtc_scanline() the worst case is 100us if no match is
> >   found.
> >
> 
> This one can be a problem indeed on (maybe all?) modern Intel gpu's since
> Haswell, ie. the last ~10 years. I was able to reproduce it on my Kabylake
> Intel gpu.
> 
> Most of the time that for-loop with up to 100 repetitions (~ 100
> udelay(1) + one mmio register read) (cfe.
> https://elixir.bootlin.com/linux/v5.17-rc1/source/drivers/gpu/drm/i915/i915_irq.c#L856)
> will not execute, because most of the time that function gets called from
> the vblank irq handler and then that trigger condition (if
> (HAS_DDI(dev_priv) && !position)) is not true. However, it also gets called
> as part of power-saving on behalf of userspace context, whenever the
> desktop graphics goes idle for two video refresh cycles. If the desktop
> shows graphics activity again, and vblank interrupts need to get reenabled,
> the probability of hitting that case is then ~1-4% depending on video mode.
> How many loops it runs also varies.
> 
> On my little Intel(R) Core(TM) i5-8250U CPU machine with a mostly idle
> desktop, I observed about one hit every couple of seconds of regular use,
> and each hit took between 125 usecs and almost 250 usecs. I guess udelay(1)
> can take a bit longer than 1 usec?

it should get very close to this. Maybe something else extended the time
depending on what you observe.

> So that's too much for preempt-rt. What one could do is the following:
> 
> 1. In the for-loop in __intel_get_crtc_scanline(), add a preempt_enable()
> before the udelay(1); and a preempt_disable() again after it. Or
> potentially around the whole for-loop if the overhead of
> preempt_en/disable() is significant?

It is very optimized on x86 ;)

> 2. In intel_get_crtc_scanline() also wrap the call to
> __intel_get_crtc_scanline() into a preempt_disable() and preempt_enable(),
> so we can be sure that __intel_get_crtc_scanline() always gets called with
> preemption disabled.
> 
> Why should this work ok'ish? The point of the original preempt disable
> inside i915_get_crtc_scanoutpos
> 
> is that those two *stime = ktime_get() and *etime = ktime_get() clock
> queries happen as close to the scanout position query as possible to get a
> small confidence interval for when exactly the scanoutpos was
> read/determined from the display hardware. error = (etime - stime) is the
> error margin. If that margin becomes greater than 20 usecs, then the
> higher-level code will consider the measurement invalid and repeat the
> whole procedure up to 3 times before giving up.

The preempt-disable is needed then? The task is preemptible here on
PREEMPT_RT but it _may_ not come to this. The difference vs !RT is that
an interrupt will preempt this code without it.

> Normally, in my experience with different graphics chips, one would observe
> error < 3 usecs, so the measurement almost always succeeds at first try,
> only very rarely takes two attempts. The preempt disable is meant to make
> sure that this stays the case on a PREEMPT_RT kernel.

Was it needed?

> The problem here are the relatively rare cases where we hit that up to 100
> iterations for-loop. Here even on a regular kernel, due to hardware quirks,
> we already exceed the 20 usecs tolerance by a huge amount of more than 100
> usecs, leading to a retry of the measurement. And my tests showed that
> often the two succeeding retries also fail, because of hardware quirks can
> apparently create a blackout situation approaching 1 msec, so we lose
> anyway, regardless if we get preempted on a RT kernel or not. That's why
> enabling preemption on RT again during that for-loop should not make the
> situation worse and at least keep RT as real-time as intended.
> 
> In practice I would also expect that this failure case is the one least
> likely to impair userspace applications greatly in practice. The cases that
> mostly matter are the ones executed during vblank hardware irq, where the
> for-loop never executes and error margin and preempt off time is only about
> 1 usec. My own software which depends on very precise timestamps from the
> mechanism never reported >> 20 usecs errors during startup tests or runtime
> tests.

That is without RT I assume?

> > - intel_crtc_scanlines_since_frame_timestamp() not sure how long this
> >   may take in the worst case.
> >
> >
> intel_crtc_scanlines_since_frame_timestamp() should be harmless. That
> 

Re: [Intel-gfx] [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT

2022-02-08 Thread Sebastian Andrzej Siewior
On 2021-12-14 11:58:37 [-0500], Steven Rostedt wrote:
> On Tue, 14 Dec 2021 18:34:50 +0200
> Ville Syrjälä  wrote:
> 
> > Looks lightly tedious. Can't we have "slow" (or whatever) versions of
> > the trace macros so we could just declare these the same was as before
> > without having to manually write that wrapper for every event?
> 
> That would be quite tedious as well ;-)
…
> It may be possible to do, but it will be far from trivial, and I'm not sure
> I want this to be an easy option. Locks should not be taken from trace
> events in general, as they are not tested with lockdep when the trace
> points are not enabled, and could hide deadlocks that may not appear until
> running on production.

So we disable the tracing points or try what Steven suggested?

> -- Steve

Sebastian


Re: [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT

2021-12-14 Thread Sebastian Andrzej Siewior
On 2021-12-14 09:36:52 [-0500], Steven Rostedt wrote:
> Another way around this that I can see is if the data for the tracepoints
> can fit on the stack and add wrappers around the tracepoints. For example,
> looking at the first tracepoint in i915_trace.h:
…

Nice.

> We could modify this to be:
…
> static inline void do_trace_intel_pipe(struct intel_crtc *crtc)
> {
>   u32 frame[3];
>   u32 scanline[3];
>   enum pipe pipe;
> 
>   if (!trace_intel_pipe_enable_enabled())
>   return;
> 
>   struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
>   struct intel_crtc *it__;
>   for_each_intel_crtc(&dev_priv->drm, it__) {
>   frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
>   scanline[it__->pipe] = intel_get_crtc_scanline(it__);
>   }
> 
>   trace_intel_pipe(frame, scanline, crtc->pipe);
> }
…

> Then have the code call do_trace_intel_pipe() instead of trace_intel_pipe()
> and this should fix the issue with preempt rt.

Is this something that the i915 devs would accept?

> -- Steve

Sebastian


[PATCH 6/8] drm/i915: Don't check for atomic context on PREEMPT_RT

2021-12-14 Thread Sebastian Andrzej Siewior
The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
because the uncore::lock is a spinlock_t and does not disable
preemption or interrupts.

Changing the uncore::lock to a raw_spinlock_t doubles the worst case
latency on an otherwise idle testbox during testing. Therefore I'm
currently unsure about changing this.

Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfy...@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_utils.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.h 
b/drivers/gpu/drm/i915/i915_utils.h
index 7a5925072466a..b7b56fb1e2fc7 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -344,7 +344,7 @@ wait_remaining_ms_from_jiffies(unsigned long 
timestamp_jiffies, int to_wait_ms)
 #define wait_for(COND, MS) _wait_for((COND), (MS) * 1000, 10, 1000)
 
 /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
-#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
+#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT) && 
!defined(CONFIG_PREEMPT_RT)
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
 #else
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
-- 
2.34.1



[PATCH 8/8] drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE

2021-12-14 Thread Sebastian Andrzej Siewior
The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoints
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.

Cc: Steven Rostedt 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
 drivers/gpu/drm/i915/i915_trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 64c3fa7cc05df..89a4089bc4baf 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -823,7 +823,7 @@ DEFINE_EVENT(i915_request, i915_request_add,
 TP_ARGS(rq)
 );
 
-#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) && !defined(NOTRACE)
 DEFINE_EVENT(i915_request, i915_request_guc_submit,
 TP_PROTO(struct i915_request *rq),
 TP_ARGS(rq)
-- 
2.34.1



[PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT

2021-12-14 Thread Sebastian Andrzej Siewior
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x0003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
|  rt_spin_lock+0x3f/0x50
|  gen6_read32+0x45/0x1d0 [i915]
|  g4x_get_vblank_counter+0x36/0x40 [i915]
|  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]

The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spinlock_t locks, and those are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() which also
might acquire a sleeping lock on PREEMPT_RT.
At the time the arguments are evaluated within the trace point,
preemption is disabled and so the locks must not be acquired on
PREEMPT_RT.

Based on this I don't see any other way than to disable the trace points
on PREEMPT_RT.

Reported-by: Luca Abeni 
Cc: Steven Rostedt 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_trace.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 8104981a66044..64c3fa7cc05df 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -2,6 +2,10 @@
 #if !defined(_I915_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
 #define _I915_TRACE_H_
 
+#ifdef CONFIG_PREEMPT_RT
+#define NOTRACE
+#endif
+
 #include 
 #include 
 #include 
-- 
2.34.1



[PATCH 5/8] drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates

2021-12-14 Thread Sebastian Andrzej Siewior
From: Mike Galbraith 

Commit
   8d7849db3eab7 ("drm/i915: Make sprite updates atomic")

started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempts to acquire spinlock_t locks which
are sleeping locks on PREEMPT_RT.

According to the comment the interrupts are disabled to avoid random delays
and are not required for protection or synchronisation.
If this needs to happen with disabled interrupts on PREEMPT_RT, and the
whole section is restricted to register access, then all sleeping locks
need to be acquired before interrupts are disabled and some functions may
be moved after enabling interrupts again.
This includes:
- prepare_to_wait() + finish_wait() due its wake queue.
- drm_crtc_vblank_put() -> vblank_disable_fn() drm_device::vbl_lock.
- skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and
  maybe others due to intel_uncore::lock
- drm_crtc_arm_vblank_event() due to drm_device::event_lock and
  drm_device::vblank_time_lock.

Don't disable interrupts on PREEMPT_RT during atomic updates.

[bigeasy: drop local locks, commit message]

Signed-off-by: Mike Galbraith 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/display/intel_crtc.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_crtc.c 
b/drivers/gpu/drm/i915/display/intel_crtc.c
index 254e67141a776..7a39029b083f4 100644
--- a/drivers/gpu/drm/i915/display/intel_crtc.c
+++ b/drivers/gpu/drm/i915/display/intel_crtc.c
@@ -425,7 +425,8 @@ void intel_pipe_update_start(const struct intel_crtc_state 
*new_crtc_state)
 */
intel_psr_wait_for_idle(new_crtc_state);
 
-   local_irq_disable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_disable();
 
crtc->debug.min_vbl = min;
crtc->debug.max_vbl = max;
@@ -450,11 +451,13 @@ void intel_pipe_update_start(const struct 
intel_crtc_state *new_crtc_state)
break;
}
 
-   local_irq_enable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_enable();
 
timeout = schedule_timeout(timeout);
 
-   local_irq_disable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_disable();
}
 
finish_wait(wq, &wait);
@@ -487,7 +490,8 @@ void intel_pipe_update_start(const struct intel_crtc_state 
*new_crtc_state)
return;
 
 irq_disable:
-   local_irq_disable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_disable();
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_VBLANK_EVADE)
@@ -566,7 +570,8 @@ void intel_pipe_update_end(struct intel_crtc_state 
*new_crtc_state)
new_crtc_state->uapi.event = NULL;
}
 
-   local_irq_enable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_enable();
 
/* Send VRR Push to terminate Vblank */
intel_vrr_send_push(new_crtc_state);
-- 
2.34.1
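
The conversion in this patch is mechanical: every local_irq_disable()/
local_irq_enable() pair in the update path is made conditional on
!PREEMPT_RT. Reduced to a self-contained sketch (hypothetical helper names,
not part of the patch), the pattern looks like this; IS_ENABLED() is a
compile-time constant, so the non-RT build keeps exactly the code it had
before:

	#include <linux/irqflags.h>
	#include <linux/kconfig.h>

	/* hypothetical helpers mirroring the pattern used in the patch above */
	static inline void example_pipe_update_irq_off(void)
	{
		/*
		 * On PREEMPT_RT the section stays preemptible: the spinlock_t
		 * locks taken inside it are sleeping locks there and must not
		 * be acquired with interrupts disabled.
		 */
		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
			local_irq_disable();
	}

	static inline void example_pipe_update_irq_on(void)
	{
		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
			local_irq_enable();
	}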



[PATCH 1/8] drm/i915: Drop the irqs_disabled() check

2021-12-14 Thread Sebastian Andrzej Siewior
The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.

There is no need to check for disabled interrupts. The lockdep
annotation below already checks if the lock has been acquired by the
caller and will yell if the interrupts are not disabled.

Remove the !irqs_disabled() check.

Reported-by: Maarten Lankhorst 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_request.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index fe682b6902aae..304565d567a1a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -581,7 +581,6 @@ bool __i915_request_submit(struct i915_request *request)
 
RQ_TRACE(request, "\n");
 
-   GEM_BUG_ON(!irqs_disabled());
lockdep_assert_held(&engine->sched_engine->lock);
 
/*
@@ -690,7 +689,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 */
RQ_TRACE(request, "\n");
 
-   GEM_BUG_ON(!irqs_disabled());
lockdep_assert_held(&engine->sched_engine->lock);
 
/*
-- 
2.34.1



[PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended

2021-12-14 Thread Sebastian Andrzej Siewior
From: Mike Galbraith 

Mario Kleiner suggested in commit
  ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into kms 
driver.")

spots where preemption should be disabled on PREEMPT_RT. The
difference is that on PREEMPT_RT the intel_uncore::lock disables neither
preemption nor interrupts and so the region remains preemptible.

The area covers only register reads and writes. The part that worries me
is:
- __intel_get_crtc_scanline() the worst case is 100us if no match is
  found.

- intel_crtc_scanlines_since_frame_timestamp() not sure how long this
  may take in the worst case.

It was in the RT queue for a while and nobody complained.
Disable preemption on PREEMPT_RT during timestamping.

[bigeasy: patch description.]

Cc: Mario Kleiner 
Signed-off-by: Mike Galbraith 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_irq.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 038a9ec563c10..8e9ff0bcbc7e4 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -916,7 +916,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
 */
spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
-   /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+   if (IS_ENABLED(CONFIG_PREEMPT_RT))
+   preempt_disable();
 
/* Get optional system timestamp before query. */
if (stime)
@@ -980,7 +981,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
if (etime)
*etime = ktime_get();
 
-   /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */
+   if (IS_ENABLED(CONFIG_PREEMPT_RT))
+   preempt_enable();
 
spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 
-- 
2.34.1



[PATCH 2/8] drm/i915/gt: Queue and wait for the irq_work item.

2021-12-14 Thread Sebastian Andrzej Siewior
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which cannot be
acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 209cf265bf746..6e1b9068d944c 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -311,10 +311,9 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
/* Kick the work once more to drain the signalers, and disarm the irq */
irq_work_sync(&b->irq_work);
while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-   local_irq_disable();
-   signal_irq_work(&b->irq_work);
-   local_irq_enable();
+   irq_work_queue(&b->irq_work);
cond_resched();
+   irq_work_sync(&b->irq_work);
}
 }
 
-- 
2.34.1
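
The pattern the patch switches to, reduced to a self-contained sketch
(made-up names, not the breadcrumbs code): instead of calling the callback
with interrupts disabled, the work is queued and then synced on, so the
callback runs in whatever context the irq_work subsystem chooses, which on
PREEMPT_RT may be a preemptible one:

	#include <linux/irq_work.h>

	static void example_signal_cb(struct irq_work *work)
	{
		/* may take spinlock_t locks; fine on PREEMPT_RT in this context */
	}

	static struct irq_work example_work;

	static void example_park(void)
	{
		init_irq_work(&example_work, example_signal_cb);

		/*
		 * Old pattern (breaks on PREEMPT_RT):
		 *	local_irq_disable();
		 *	example_signal_cb(&example_work);
		 *	local_irq_enable();
		 */
		irq_work_queue(&example_work);	/* raise the irq_work */
		irq_work_sync(&example_work);	/* wait until the callback has run */
	}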



[PATCH 0/8] drm/i915: PREEMPT_RT related fixups.

2021-12-14 Thread Sebastian Andrzej Siewior


Hi,

The following patches are from the PREEMPT_RT queue. One patch was
applied and one was added, so here are eight again. I can post them in
smaller batches if that is preferred.
It is mostly about disabling interrupts/preemption, which leads to
problems. Unfortunately DRM_I915_LOW_LEVEL_TRACEPOINTS had to be
disabled because it acquires locks from within trace points. Making the
lock a raw_spinlock_t led to higher latencies during video playback:
  https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfy...@linutronix.de/

and I'm not sure if I hit the worst case here.
I tested it on a SandyBridge with built-in i915 by using X, OpenGL and
playing videos without noticing any warnings. However, some code paths
were not entered.

Sebastian



[PATCH 3/8] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()

2021-12-14 Thread Sebastian Andrzej Siewior
execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().

execlists_dequeue_irq() and execlists_dequeue() each have only one
caller. If intel_engine_cs::active::lock is acquired and released with
the _irq suffix then it behaves almost as if execlists_dequeue() were
invoked with disabled interrupts. The difference is the last part of the
function, which is then invoked with interrupts enabled.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock to the end of the function as I didn't find
anything that would acquire the lock again.

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Maarten Lankhorst 
---
 .../drm/i915/gt/intel_execlists_submission.c| 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index a69df5e9e77af..2d5f0c226ad66 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1284,7 +1284,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * and context switches) submission.
 */
 
-   spin_lock(&sched_engine->lock);
+   spin_lock_irq(&sched_engine->lock);
 
/*
 * If the queue is higher priority than the last
@@ -1384,7 +1384,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * Even if ELSP[1] is occupied and not worthy
 * of timeslices, our queue might be.
 */
-   spin_unlock(&sched_engine->lock);
+   spin_unlock_irq(&sched_engine->lock);
return;
}
}
@@ -1410,7 +1410,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 
if (last && !can_merge_rq(last, rq)) {
spin_unlock(&ve->base.sched_engine->lock);
-   spin_unlock(&engine->sched_engine->lock);
+   spin_unlock_irq(&engine->sched_engine->lock);
return; /* leave this for another sibling */
}
 
@@ -1572,7 +1572,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 */
sched_engine->queue_priority_hint = queue_prio(sched_engine);
i915_sched_engine_reset_on_empty(sched_engine);
-   spin_unlock(&sched_engine->lock);
+   spin_unlock_irq(&sched_engine->lock);
 
/*
 * We can skip poking the HW if we ended up with exactly the same set
@@ -1598,13 +1598,6 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
}
 }
 
-static void execlists_dequeue_irq(struct intel_engine_cs *engine)
-{
-   local_irq_disable(); /* Suspend interrupts across request submission */
-   execlists_dequeue(engine);
-   local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
-}
-
 static void clear_ports(struct i915_request **ports, int count)
 {
memset_p((void **)ports, NULL, count);
@@ -2425,7 +2418,7 @@ static void execlists_submission_tasklet(struct 
tasklet_struct *t)
}
 
if (!engine->execlists.pending[0]) {
-   execlists_dequeue_irq(engine);
+   execlists_dequeue(engine);
start_timeslice(engine);
}
 
-- 
2.34.1
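
The distinction the patch relies on, as a self-contained sketch with a
generic lock (not the i915 structures): on PREEMPT_RT, spin_lock() on a
spinlock_t may sleep and does not disable interrupts, so only the combined
spin_lock_irq()/spin_unlock_irq() form is correct on both configurations:

	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(example_lock);

	static void example_broken_on_rt(void)
	{
		local_irq_disable();		/* IRQs are now hard-disabled ... */
		spin_lock(&example_lock);	/* ... but this may sleep on RT: bug */
		/* ... */
		spin_unlock(&example_lock);
		local_irq_enable();
	}

	static void example_fine_everywhere(void)
	{
		/*
		 * !RT: lock acquired with interrupts disabled.
		 * RT:  only the sleeping lock is taken, IRQs stay enabled.
		 */
		spin_lock_irq(&example_lock);
		/* ... */
		spin_unlock_irq(&example_lock);
	}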



[PATCH v2] drm/i915: Don't disable interrupts and pretend a lock as been acquired in __timeline_mark_lock().

2021-12-10 Thread Sebastian Andrzej Siewior
This is a revert of commits
   d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as 
irqsafe")
   6c69a45445af9 ("drm/i915/gt: Mark context->active_count as protected by 
timeline->mutex")
   6dcb85a0ad990 ("drm/i915: Hold irq-off for the entire fake lock period")

The existing code leads to a different behaviour depending on whether
lockdep is enabled or not. Any following lock that is acquired without
disabling interrupts (but needs to) will not be noticed by lockdep.

This is not just a lockdep annotation: there is an actual mutex that is
otherwise properly used as a lock, but in the case of
__timeline_mark_lock() lockdep is only told that it is acquired while no
lock has actually been acquired.

It appears that its purpose is just to satisfy the lockdep_assert_held()
check in intel_context_mark_active(). The other problem with disabling
interrupts is that on PREEMPT_RT interrupts are also disabled, which
leads to problems, for instance later during memory allocation.

Add a CONTEXT_IS_PARKING bit to intel_engine_cs and set_bit/clear_bit it
instead of mutex_acquire/mutex_release. Use test_bit in the two
identified spots which relied on the lockdep annotation.

Cc: Peter Zijlstra 
Signed-off-by: Sebastian Andrzej Siewior 
Acked-by: Daniel Vetter 
---
v1…v2:
 - Add commit 6dcb85a0ad990 as reference.
 - Name the bit CONTEXT_IS_PARKING.

 drivers/gpu/drm/i915/gt/intel_context.h   |  3 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  1 +
 drivers/gpu/drm/i915/gt/intel_engine_pm.c | 38 +--
 drivers/gpu/drm/i915/i915_request.h   |  3 +-
 4 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index 246c37d72cd73..d8c74bbf9aae2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -211,7 +211,8 @@ static inline void intel_context_enter(struct intel_context 
*ce)
 
 static inline void intel_context_mark_active(struct intel_context *ce)
 {
-   lockdep_assert_held(&ce->timeline->mutex);
+   lockdep_assert(lockdep_is_held(&ce->timeline->mutex) ||
+  test_bit(CONTEXT_IS_PARKING, &ce->flags));
++ce->active_count;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 9e0177dc5484e..30cd81ad8911a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -118,6 +118,7 @@ struct intel_context {
 #define CONTEXT_LRCA_DIRTY 9
 #define CONTEXT_GUC_INIT   10
 #define CONTEXT_PERMA_PIN  11
+#define CONTEXT_IS_PARKING 12
 
struct {
u64 timeout_us;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index a1334b48dde7b..a8a2ad44b7e39 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -80,39 +80,6 @@ static int __engine_unpark(struct intel_wakeref *wf)
return 0;
 }
 
-#if IS_ENABLED(CONFIG_LOCKDEP)
-
-static unsigned long __timeline_mark_lock(struct intel_context *ce)
-{
-   unsigned long flags;
-
-   local_irq_save(flags);
-   mutex_acquire(&ce->timeline->mutex.dep_map, 2, 0, _THIS_IP_);
-
-   return flags;
-}
-
-static void __timeline_mark_unlock(struct intel_context *ce,
-  unsigned long flags)
-{
-   mutex_release(&ce->timeline->mutex.dep_map, _THIS_IP_);
-   local_irq_restore(flags);
-}
-
-#else
-
-static unsigned long __timeline_mark_lock(struct intel_context *ce)
-{
-   return 0;
-}
-
-static void __timeline_mark_unlock(struct intel_context *ce,
-  unsigned long flags)
-{
-}
-
-#endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
-
 static void duration(struct dma_fence *fence, struct dma_fence_cb *cb)
 {
struct i915_request *rq = to_request(fence);
@@ -159,7 +126,6 @@ static bool switch_to_kernel_context(struct intel_engine_cs 
*engine)
 {
struct intel_context *ce = engine->kernel_context;
struct i915_request *rq;
-   unsigned long flags;
bool result = true;
 
/*
@@ -214,7 +180,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs 
*engine)
 * engine->wakeref.count, we may see the request completion and retire
 * it causing an underflow of the engine->wakeref.
 */
-   flags = __timeline_mark_lock(ce);
+   set_bit(CONTEXT_IS_PARKING, &ce->flags);
GEM_BUG_ON(atomic_read(&ce->timeline->active_count) < 0);
 
rq = __i915_request_create(ce, GFP_NOWAIT);
@@ -246,7 +212,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs 
*engine)
 
result = false;
 out_unlock:
-   __timeline_mark_unlock(ce, flags);
+   clear_bit(CONTEXT_IS_PARKING, &ce->flags);
re

Re: [PATCH] drm/i915: Don't disable interrupts and pretend a lock as been acquired in __timeline_mark_lock().

2021-11-30 Thread Sebastian Andrzej Siewior
On 2021-11-19 17:04:00 [+0100], Daniel Vetter wrote:
> Yeah if we can simplify this with reverts then I'm all for this.
> 
> Acked-by: Daniel Vetter 
> 
> I've asked drm/i915 maintainers to check

Thanks. Should I repost my queue (excluding this one) or should wait
until this one has been taken care?

> -Daniel

Sebastian


[PATCH] drm/i915: Don't disable interrupts and pretend a lock as been acquired in __timeline_mark_lock().

2021-11-18 Thread Sebastian Andrzej Siewior
This is a revert of commits
   d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as 
irqsafe")
   6c69a45445af9 ("drm/i915/gt: Mark context->active_count as protected by 
timeline->mutex")

The existing code leads to a different behaviour depending on whether
lockdep is enabled or not. Any following lock that is acquired without
disabling interrupts (but needs to) will not be noticed by lockdep.

This is not just a lockdep annotation: there is an actual mutex that is
otherwise properly used as a lock, but in the case of
__timeline_mark_lock() lockdep is only told that it is acquired while no
lock has actually been acquired.

It appears that its purpose is just to satisfy the lockdep_assert_held()
check in intel_context_mark_active(). The other problem with disabling
interrupts is that on PREEMPT_RT interrupts are also disabled, which
leads to problems, for instance later during memory allocation.

Add a CONTEXT_IS_PARKED bit to intel_engine_cs and set_bit/clear_bit it
instead of mutex_acquire/mutex_release. Use test_bit in the two
identified spots which relied on the lockdep annotation.

Cc: Peter Zijlstra 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/gt/intel_context.h   |3 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |1 
 drivers/gpu/drm/i915/gt/intel_engine_pm.c |   38 +-
 drivers/gpu/drm/i915/i915_request.h   |3 +-
 4 files changed, 7 insertions(+), 38 deletions(-)

--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -211,7 +211,8 @@ static inline void intel_context_enter(s
 
 static inline void intel_context_mark_active(struct intel_context *ce)
 {
-   lockdep_assert_held(&ce->timeline->mutex);
+   lockdep_assert(lockdep_is_held(&ce->timeline->mutex) ||
+  test_bit(CONTEXT_IS_PARKED, &ce->flags));
++ce->active_count;
 }
 
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -118,6 +118,7 @@ struct intel_context {
 #define CONTEXT_LRCA_DIRTY 9
 #define CONTEXT_GUC_INIT   10
 #define CONTEXT_PERMA_PIN  11
+#define CONTEXT_IS_PARKED  12
 
struct {
u64 timeout_us;
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -80,39 +80,6 @@ static int __engine_unpark(struct intel_
return 0;
 }
 
-#if IS_ENABLED(CONFIG_LOCKDEP)
-
-static unsigned long __timeline_mark_lock(struct intel_context *ce)
-{
-   unsigned long flags;
-
-   local_irq_save(flags);
-   mutex_acquire(&ce->timeline->mutex.dep_map, 2, 0, _THIS_IP_);
-
-   return flags;
-}
-
-static void __timeline_mark_unlock(struct intel_context *ce,
-  unsigned long flags)
-{
-   mutex_release(&ce->timeline->mutex.dep_map, _THIS_IP_);
-   local_irq_restore(flags);
-}
-
-#else
-
-static unsigned long __timeline_mark_lock(struct intel_context *ce)
-{
-   return 0;
-}
-
-static void __timeline_mark_unlock(struct intel_context *ce,
-  unsigned long flags)
-{
-}
-
-#endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
-
 static void duration(struct dma_fence *fence, struct dma_fence_cb *cb)
 {
struct i915_request *rq = to_request(fence);
@@ -159,7 +126,6 @@ static bool switch_to_kernel_context(str
 {
struct intel_context *ce = engine->kernel_context;
struct i915_request *rq;
-   unsigned long flags;
bool result = true;
 
/*
@@ -214,7 +180,7 @@ static bool switch_to_kernel_context(str
 * engine->wakeref.count, we may see the request completion and retire
 * it causing an underflow of the engine->wakeref.
 */
-   flags = __timeline_mark_lock(ce);
+   set_bit(CONTEXT_IS_PARKED, &ce->flags);
GEM_BUG_ON(atomic_read(&ce->timeline->active_count) < 0);
 
rq = __i915_request_create(ce, GFP_NOWAIT);
@@ -246,7 +212,7 @@ static bool switch_to_kernel_context(str
 
result = false;
 out_unlock:
-   __timeline_mark_unlock(ce, flags);
+   clear_bit(CONTEXT_IS_PARKED, &ce->flags);
return result;
 }
 
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -642,7 +642,8 @@ i915_request_timeline(const struct i915_
 {
/* Valid only while the request is being constructed (or retired). */
return rcu_dereference_protected(rq->timeline,
-   lockdep_is_held(&rcu_access_pointer(rq->timeline)->mutex));
+   lockdep_is_held(&rcu_access_pointer(rq->timeline)->mutex) ||
+   test_bit(CONTEXT_IS_PARKED, &rq->context->flags));
 }
 
 static inline struct i915_gem_context *


[PATCH] kernel/locking: Use a pointer in ww_mutex_trylock().

2021-11-04 Thread Sebastian Andrzej Siewior
mutex_acquire_nest() expects a pointer, pass the pointer.

Fixes: 12235da8c80a1 ("kernel/locking: Add context to ww_mutex_trylock()")
Signed-off-by: Sebastian Andrzej Siewior 
---

Not sure why I haven't seen this earlier…

 kernel/locking/ww_rt_mutex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c
index 0e00205cf467a..d1473c624105c 100644
--- a/kernel/locking/ww_rt_mutex.c
+++ b/kernel/locking/ww_rt_mutex.c
@@ -26,7 +26,7 @@ int ww_mutex_trylock(struct ww_mutex *lock, struct 
ww_acquire_ctx *ww_ctx)
 
if (__rt_mutex_trylock(&lock->rtmutex)) {
ww_mutex_set_context_fastpath(lock, ww_ctx);
-   mutex_acquire_nest(&lock->dep_map, 0, 1, ww_ctx->dep_map, _RET_IP_);
+   mutex_acquire_nest(&lock->dep_map, 0, 1, &ww_ctx->dep_map, _RET_IP_);
return 1;
}
 
-- 
2.33.1



[PATCH 7/9] drm/i915: Don't check for atomic context on PREEMPT_RT

2021-10-26 Thread Sebastian Andrzej Siewior
The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
because the uncore::lock is a spinlock_t and does not disable
preemption or interrupts.

Changing the uncore:lock to a raw_spinlock_t doubles the worst case
latency on an otherwise idle testbox during testing. Therefore I'm
currently unsure about changing this.

Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfy...@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_utils.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.h 
b/drivers/gpu/drm/i915/i915_utils.h
index 7a5925072466a..b7b56fb1e2fc7 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -344,7 +344,7 @@ wait_remaining_ms_from_jiffies(unsigned long 
timestamp_jiffies, int to_wait_ms)
 #define wait_for(COND, MS) _wait_for((COND), (MS) * 1000, 10, 1000)
 
 /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
-#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
+#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT) && 
!defined(CONFIG_PREEMPT_RT)
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
 #else
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
-- 
2.33.1



[PATCH 5/9] drm/i915: Use preempt_disable/enable_rt() where recommended

2021-10-26 Thread Sebastian Andrzej Siewior
From: Mike Galbraith 

Mario Kleiner suggested in commit
  ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into kms 
driver.")

spots where preemption should be disabled on PREEMPT_RT. The
difference is that on PREEMPT_RT the intel_uncore::lock disables neither
preemption nor interrupts and so the region remains preemptible.

The area covers only register reads and writes. The part that worries me
is:
- __intel_get_crtc_scanline() the worst case is 100us if no match is
  found.

- intel_crtc_scanlines_since_frame_timestamp() not sure how long this
  may take in the worst case.

It was in the RT queue for a while and nobody complained.
Disable preemption on PREEMPT_RT during timestamping.

[bigeasy: patch description.]

Cc: Mario Kleiner 
Signed-off-by: Mike Galbraith 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_irq.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 77680bca46eec..be8f60226 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -916,7 +916,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
 */
spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
-   /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+   if (IS_ENABLED(CONFIG_PREEMPT_RT))
+   preempt_disable();
 
/* Get optional system timestamp before query. */
if (stime)
@@ -980,7 +981,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
if (etime)
*etime = ktime_get();
 
-   /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */
+   if (IS_ENABLED(CONFIG_PREEMPT_RT))
+   preempt_enable();
 
spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 
-- 
2.33.1



[PATCH 6/9] drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates

2021-10-26 Thread Sebastian Andrzej Siewior
From: Mike Galbraith 

Commit
   8d7849db3eab7 ("drm/i915: Make sprite updates atomic")

started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempts to acquire spinlock_t locks,
which are sleeping locks on PREEMPT_RT.

According to the comment the interrupts are disabled to avoid random delays
and are not required for protection or synchronisation.
If this needs to happen with disabled interrupts on PREEMPT_RT, and the
whole section is restricted to register access, then all sleeping locks
need to be acquired before interrupts are disabled and some functions
may have to be moved after interrupts are enabled again.
This includes:
- prepare_to_wait() + finish_wait() due to its wake queue.
- drm_crtc_vblank_put() -> vblank_disable_fn() due to drm_device::vbl_lock.
- skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and
  maybe others due to intel_uncore::lock.
- drm_crtc_arm_vblank_event() due to drm_device::event_lock and
  drm_device::vblank_time_lock.

Don't disable interrupts on PREEMPT_RT during atomic updates.

[bigeasy: drop local locks, commit message]

Signed-off-by: Mike Galbraith 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/display/intel_crtc.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_crtc.c 
b/drivers/gpu/drm/i915/display/intel_crtc.c
index 254e67141a776..7a39029b083f4 100644
--- a/drivers/gpu/drm/i915/display/intel_crtc.c
+++ b/drivers/gpu/drm/i915/display/intel_crtc.c
@@ -425,7 +425,8 @@ void intel_pipe_update_start(const struct intel_crtc_state 
*new_crtc_state)
 */
intel_psr_wait_for_idle(new_crtc_state);
 
-   local_irq_disable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_disable();
 
crtc->debug.min_vbl = min;
crtc->debug.max_vbl = max;
@@ -450,11 +451,13 @@ void intel_pipe_update_start(const struct 
intel_crtc_state *new_crtc_state)
break;
}
 
-   local_irq_enable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_enable();
 
timeout = schedule_timeout(timeout);
 
-   local_irq_disable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_disable();
}
 
finish_wait(wq, &wait);
@@ -487,7 +490,8 @@ void intel_pipe_update_start(const struct intel_crtc_state 
*new_crtc_state)
return;
 
 irq_disable:
-   local_irq_disable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_disable();
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_VBLANK_EVADE)
@@ -566,7 +570,8 @@ void intel_pipe_update_end(struct intel_crtc_state 
*new_crtc_state)
new_crtc_state->uapi.event = NULL;
}
 
-   local_irq_enable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_enable();
 
/* Send VRR Push to terminate Vblank */
intel_vrr_send_push(new_crtc_state);
-- 
2.33.1



[PATCH 2/9] drm/i915/gt: Queue and wait for the irq_work item.

2021-10-26 Thread Sebastian Andrzej Siewior
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which cannot be
acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 209cf265bf746..6e1b9068d944c 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -311,10 +311,9 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
/* Kick the work once more to drain the signalers, and disarm the irq */
irq_work_sync(&b->irq_work);
while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-   local_irq_disable();
-   signal_irq_work(&b->irq_work);
-   local_irq_enable();
+   irq_work_queue(&b->irq_work);
cond_resched();
+   irq_work_sync(&b->irq_work);
}
 }
 
-- 
2.33.1



[PATCH 1/9] drm/i915: Don't disable interrupts and pretend a lock as been acquired in __timeline_mark_lock().

2021-10-26 Thread Sebastian Andrzej Siewior
This is a revert of commits
   d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as 
irqsafe")
   6c69a45445af9 ("drm/i915/gt: Mark context->active_count as protected by 
timeline->mutex")

The existing code leads to a different behaviour depending on whether
lockdep is enabled or not. Any following lock that is acquired without
disabling interrupts (but needs to) will not be noticed by lockdep.

This is not just a lockdep annotation: there is an actual mutex that is
otherwise properly used as a lock, but in the case of
__timeline_mark_lock() lockdep is only told that it is acquired while no
lock has actually been acquired.

It appears that its purpose is just to satisfy the lockdep_assert_held()
check in intel_context_mark_active(). The other problem with disabling
interrupts is that on PREEMPT_RT interrupts are also disabled, which
leads to problems, for instance later during memory allocation.

Add a CONTEXT_IS_PARKED bit to intel_engine_cs and set_bit/clear_bit it
instead of mutex_acquire/mutex_release. Use test_bit in the two
identified spots which relied on the lockdep annotation.

Acked-by: Peter Zijlstra (Intel) 
Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/gt/intel_context.h   |  3 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  1 +
 drivers/gpu/drm/i915/gt/intel_engine_pm.c | 38 +--
 drivers/gpu/drm/i915/i915_request.h   |  3 +-
 4 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index 246c37d72cd73..8a575629ef1b6 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -211,7 +211,8 @@ static inline void intel_context_enter(struct intel_context 
*ce)
 
 static inline void intel_context_mark_active(struct intel_context *ce)
 {
-   lockdep_assert_held(&ce->timeline->mutex);
+   lockdep_assert(lockdep_is_held(&ce->timeline->mutex) ||
+  test_bit(CONTEXT_IS_PARKED, &ce->flags));
++ce->active_count;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 9e0177dc5484e..329f470d125f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -118,6 +118,7 @@ struct intel_context {
 #define CONTEXT_LRCA_DIRTY 9
 #define CONTEXT_GUC_INIT   10
 #define CONTEXT_PERMA_PIN  11
+#define CONTEXT_IS_PARKED  12
 
struct {
u64 timeout_us;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index a1334b48dde7b..456f1f5d0c04e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -80,39 +80,6 @@ static int __engine_unpark(struct intel_wakeref *wf)
return 0;
 }
 
-#if IS_ENABLED(CONFIG_LOCKDEP)
-
-static unsigned long __timeline_mark_lock(struct intel_context *ce)
-{
-   unsigned long flags;
-
-   local_irq_save(flags);
-   mutex_acquire(&ce->timeline->mutex.dep_map, 2, 0, _THIS_IP_);
-
-   return flags;
-}
-
-static void __timeline_mark_unlock(struct intel_context *ce,
-  unsigned long flags)
-{
-   mutex_release(&ce->timeline->mutex.dep_map, _THIS_IP_);
-   local_irq_restore(flags);
-}
-
-#else
-
-static unsigned long __timeline_mark_lock(struct intel_context *ce)
-{
-   return 0;
-}
-
-static void __timeline_mark_unlock(struct intel_context *ce,
-  unsigned long flags)
-{
-}
-
-#endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
-
 static void duration(struct dma_fence *fence, struct dma_fence_cb *cb)
 {
struct i915_request *rq = to_request(fence);
@@ -159,7 +126,6 @@ static bool switch_to_kernel_context(struct intel_engine_cs 
*engine)
 {
struct intel_context *ce = engine->kernel_context;
struct i915_request *rq;
-   unsigned long flags;
bool result = true;
 
/*
@@ -214,7 +180,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs 
*engine)
 * engine->wakeref.count, we may see the request completion and retire
 * it causing an underflow of the engine->wakeref.
 */
-   flags = __timeline_mark_lock(ce);
+   set_bit(CONTEXT_IS_PARKED, &ce->flags);
GEM_BUG_ON(atomic_read(&ce->timeline->active_count) < 0);
 
rq = __i915_request_create(ce, GFP_NOWAIT);
@@ -246,7 +212,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs 
*engine)
 
result = false;
 out_unlock:
-   __timeline_mark_unlock(ce, flags);
+   clear_bit(CONTEXT_IS_PARKED, &ce->flags);
return result;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_request.h 
b/drivers/gpu/drm/i915/i915_request.h
index dc359242d1aec..c5898882bb27a 100644
---

[PATCH 9/9] drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE

2021-10-26 Thread Sebastian Andrzej Siewior
The order of the header files is important. If this header file is
included after tracepoint.h has been included then the NOTRACE here
becomes a nop. Currently this happens for two .c files which use the
tracepoints behind DRM_I915_LOW_LEVEL_TRACEPOINTS.

Cc: Steven Rostedt 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
 drivers/gpu/drm/i915/i915_trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 0f8341ca67385..0aefb14c638ae 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -826,7 +826,7 @@ DEFINE_EVENT(i915_request, i915_request_add,
 TP_ARGS(rq)
 );
 
-#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) && !defined(NOTRACE)
 DEFINE_EVENT(i915_request, i915_request_guc_submit,
 TP_PROTO(struct i915_request *rq),
 TP_ARGS(rq)
-- 
2.33.1
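
A self-contained sketch of the ordering requirement (hypothetical trace
header, assuming linux/tracepoint.h only looks at NOTRACE while it is being
parsed): the define has to be visible before the tracepoint header is pulled
in, and the !defined(NOTRACE) guard added above keeps the low-level events
out even when another header won the include race:

	/* hypothetical my_trace.h */
	#if !defined(_MY_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
	#define _MY_TRACE_H_

	#ifdef CONFIG_PREEMPT_RT
	#define NOTRACE				/* must precede linux/tracepoint.h */
	#endif

	#include <linux/tracepoint.h>		/* evaluates NOTRACE at this point */

	#if defined(CONFIG_MY_LOW_LEVEL_TRACEPOINTS) && !defined(NOTRACE)
	/* low-level events would be defined here; compiled out on PREEMPT_RT */
	#endif

	#endif /* _MY_TRACE_H_ */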



[PATCH 4/9] drm/i915: Drop the irqs_disabled() check

2021-10-26 Thread Sebastian Andrzej Siewior
The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.

There is no need to check for disabled interrupts. The lockdep
annotation below already checks if the lock has been acquired by the
caller and will yell if the interrupts are not disabled.

Remove the !irqs_disabled() check.

Reported-by: Maarten Lankhorst 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_request.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 820a1f38b271e..3bbe34ff3b15a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -559,7 +559,6 @@ bool __i915_request_submit(struct i915_request *request)
 
RQ_TRACE(request, "\n");
 
-   GEM_BUG_ON(!irqs_disabled());
lockdep_assert_held(&engine->sched_engine->lock);
 
/*
@@ -668,7 +667,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 */
RQ_TRACE(request, "\n");
 
-   GEM_BUG_ON(!irqs_disabled());
lockdep_assert_held(&engine->sched_engine->lock);
 
/*
-- 
2.33.1



[PATCH 8/9] drm/i915: Disable tracing points on PREEMPT_RT

2021-10-26 Thread Sebastian Andrzej Siewior
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x0003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
|  rt_spin_lock+0x3f/0x50
|  gen6_read32+0x45/0x1d0 [i915]
|  g4x_get_vblank_counter+0x36/0x40 [i915]
|  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]

The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spinlock_t locks; these are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter(), which may
also acquire sleeping locks on PREEMPT_RT.
At the time the arguments are evaluated within the trace point,
preemption is disabled, so these locks must not be acquired on PREEMPT_RT.

Based on this I don't see any other way than to disable the trace points
on PREEMPT_RT.

Reported-by: Luca Abeni 
Cc: Steven Rostedt 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_trace.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 9795f456cccfc..0f8341ca67385 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -2,6 +2,10 @@
 #if !defined(_I915_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
 #define _I915_TRACE_H_
 
+#ifdef CONFIG_PREEMPT_RT
+#define NOTRACE
+#endif
+
 #include 
 #include 
 #include 
-- 
2.33.1



[PATCH 3/9] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()

2021-10-26 Thread Sebastian Andrzej Siewior
execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().

execlists_dequeue_irq() and execlists_dequeue() each have only one
caller. If intel_engine_cs::active::lock is acquired and released with
the _irq suffix then it behaves almost as if execlists_dequeue() were
invoked with disabled interrupts. The difference is the last part of the
function, which is then invoked with interrupts enabled.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock to the end of the function as I didn't find
anything that would acquire the lock again.

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Maarten Lankhorst 
---
 .../drm/i915/gt/intel_execlists_submission.c| 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index bedb80057046a..1dbcac05f44eb 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1284,7 +1284,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * and context switches) submission.
 */
 
-   spin_lock(&sched_engine->lock);
+   spin_lock_irq(&sched_engine->lock);
 
/*
 * If the queue is higher priority than the last
@@ -1384,7 +1384,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * Even if ELSP[1] is occupied and not worthy
 * of timeslices, our queue might be.
 */
-   spin_unlock(&sched_engine->lock);
+   spin_unlock_irq(&sched_engine->lock);
return;
}
}
@@ -1410,7 +1410,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 
if (last && !can_merge_rq(last, rq)) {
spin_unlock(&ve->base.sched_engine->lock);
-   spin_unlock(&engine->sched_engine->lock);
+   spin_unlock_irq(&engine->sched_engine->lock);
return; /* leave this for another sibling */
}
 
@@ -1572,7 +1572,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 */
sched_engine->queue_priority_hint = queue_prio(sched_engine);
i915_sched_engine_reset_on_empty(sched_engine);
-   spin_unlock(&sched_engine->lock);
+   spin_unlock_irq(&sched_engine->lock);
 
/*
 * We can skip poking the HW if we ended up with exactly the same set
@@ -1598,13 +1598,6 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
}
 }
 
-static void execlists_dequeue_irq(struct intel_engine_cs *engine)
-{
-   local_irq_disable(); /* Suspend interrupts across request submission */
-   execlists_dequeue(engine);
-   local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
-}
-
 static void clear_ports(struct i915_request **ports, int count)
 {
memset_p((void **)ports, NULL, count);
@@ -2424,7 +2417,7 @@ static void execlists_submission_tasklet(struct 
tasklet_struct *t)
}
 
if (!engine->execlists.pending[0]) {
-   execlists_dequeue_irq(engine);
+   execlists_dequeue(engine);
start_timeslice(engine);
}
 
-- 
2.33.1



[PATCH v2 0/9] drm/i915: PREEMPT_RT related fixups.

2021-10-26 Thread Sebastian Andrzej Siewior
the following patches are from the PREEMPT_RT queue. It is mostly about
disabling interrupts/preemption which leads to problems. Patches 1-4 are
independent of PREEMPT_RT.

Unfortunately DRM_I915_LOW_LEVEL_TRACEPOINTS had to be disabled because
it acquires locks from within trace points. It has been pointed out in
the previous post that this makes any kind of debugging impossible.
Making the uncore.lock a raw_spinlock_t doubles the latencies in my
limited testing [0]. I don't know the difference on other hardware. Also
it worries me a little that wait_for_atomic() has 50ms as the upper
spin/wait limit if the condition does not become true.

I tested it on a SandyBridge with built-in i915 by using X, OpenGL and
playing videos without noticing any warnings. However, some code paths
were not entered.

[0] https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfy...@linutronix.de/

Sebastian




Re: [RFC PATCH] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18.

2021-10-19 Thread Sebastian Andrzej Siewior
On 2021-10-19 14:24:29 [+0200], Daniel Vetter wrote:
> 
> Ah dmesg help me understand what's going on. Does the below patch help? If
> it's this one that would also explain why intel CI hasn't hit it - it's a
> leak between tests and we run them all individually instead of once at
> boot-up.

Yes, it does. Thank you.

Tested-by: Sebastian Andrzej Siewior 

> Cheers, Daniel

Sebastian


Re: [RFC PATCH] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18.

2021-10-14 Thread Sebastian Andrzej Siewior
On 2021-10-14 15:21:22 [+0200], Daniel Vetter wrote:
> On Wed, Oct 13, 2021 at 07:35:48PM +0200, Sebastian Andrzej Siewior wrote:
> > c7fcbf2513973 -> does not boot
> > c7fcbf2513973 + 2f425cf5242a0 -> boots, 18 x DRM_OBJECT_MAX_PROPERTY
> > 6f11f37459d8f -> boots, 0 x DRM_OBJECT_MAX_PROPERTY
> > 6f11f37459d8f + 2f425cf5242a0 -> boots, 18 x DRM_OBJECT_MAX_PROPERTY
> 
> Just to check, you've built 6f11f37459d8f, and then you cherry-picked
> 2f425cf5242a0 on top (not merged), and that already got you the warning
> flood?

Correct.

> I'm probably blind, but I'm really not seeing where this pile of
> properties is coming from. Can you pls also boot with drm.debug=0xe and
> attach full dmesg? Plus your .config please.

attached. dmesg.txt is 6f11f37459d8f and the other is 6f11f37459d8f +
2f425cf5242a0.

> Thanks, Daniel

Sebastian


config.xz
Description: application/xz


dmesg.txt.xz
Description: application/xz


dmesg-with-2f425cf5242a0.txt.xz
Description: application/xz


Re: [RFC PATCH] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18.

2021-10-13 Thread Sebastian Andrzej Siewior
On 2021-10-13 14:57:34 [+0200], Daniel Vetter wrote:
> Hm there's a pile of commits there, and nothing immediately jumps to
> light. The thing is, 18 is likely way too much, since if e.g. we have a
> single new property on a plane and that pushes over the limit on all of
> them, you get iirc 3x4 already simply because we have that many planes.
> 
> So would be good to know the actual culprit.
> 
> Can you pls try to bisect the above range, applying the patch as a fixup
> locally (without commit, that will confuse git bisect a bit I think), so
> we know what/where went wrong?

c7fcbf2513973 -> does not boot
c7fcbf2513973 + 2f425cf5242a0 -> boots, 18 x DRM_OBJECT_MAX_PROPERTY
6f11f37459d8f -> boots, 0 x DRM_OBJECT_MAX_PROPERTY
6f11f37459d8f + 2f425cf5242a0 -> boots, 18 x DRM_OBJECT_MAX_PROPERTY

> I'm still confused why this isn't showing up anywhere in our intel ci ...
> 
> Thanks, Daniel

Sebastian


Re: [RFC PATCH] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18.

2021-10-13 Thread Sebastian Andrzej Siewior
On 2021-10-13 14:02:59 [+0200], Daniel Vetter wrote:
> On Tue, Oct 05, 2021 at 08:51:51AM +0200, Sebastian Andrzej Siewior wrote:
> > The warning popped up; it says to increase it by the number of occurrences.
> > I saw it 18 times so here it is.
> > It started to show up since commit
> >2f425cf5242a0 ("drm: Fix oops in damage self-tests by mocking damage 
> > property")
> > 
> > Increase DRM_OBJECT_MAX_PROPERTY by 18.
> > 
> > Signed-off-by: Sebastian Andrzej Siewior 
> 
> Which driver where? Whomever added that into upstream should also have
> realized this (things will just not work) and include it in there. So if
> things are tested correctly this should be part of a larger series to add
> these 18 props somewhere.

This is on i915 with full debug. If I remember correctly, it wasn't
there before commit
   c7fcbf2513973 ("drm/plane: check that fb_damage is set up when used")

With that commit the box crashed until commit 
   2f425cf5242a0 ("drm: Fix oops in damage self-tests by mocking damage 
property")

where I then observed this.

> Also maybe we should just dynamically allocate this array if people have
> this many properties on their objects.
> -Daniel

Sebastian


Re: [PATCH 3/8] drm/i915: Disable tracing points on PREEMPT_RT

2021-10-06 Thread Sebastian Andrzej Siewior
On 2021-10-06 12:15:21 [+0200], To Ville Syrjälä wrote:
> On 2021-10-06 12:34:19 [+0300], Ville Syrjälä wrote:
> > I think the correct answer is to make uncore.lock a raw_spinlock.
> > Without the tracepoints deubgging any of this is stuff pretty much
> > impossible. We also take that lock a lot.
> 
> Let me check if that works.

Turned the uncore.lock into raw_spinlock_t, the debug.lock as well because it
nests inside the former. Also intel_uncore_forcewake_domain::timer fires
now in hardirq since it almost only acquires the uncore.lock.
What worries me a little is the fw_domain_wait_ack_clear() /
wait_ack_clear() which spin-waits for up to a whole ms.
__gen6_gt_wait_for_thread_c0() has a 5ms limit and wait_ack_clear() has
50ms as its upper limit. I know that it does not usually take long, and if
it takes that long then it is most likely an error, but still...
The full patch at the end of the email.
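
The conversion itself is mechanical; a minimal sketch with a generic lock
(not the uncore structure): a raw_spinlock_t always busy-waits with
preemption (and, with the *_irqsave variants, interrupts) disabled, on
PREEMPT_RT and !PREEMPT_RT alike, so everything done under it adds directly
to the worst-case latency:

	#include <linux/spinlock.h>

	static DEFINE_RAW_SPINLOCK(example_hw_lock);

	static void example_hw_access(void)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(&example_hw_lock, flags);
		/* register access only; keep this section as short as possible */
		raw_spin_unlock_irqrestore(&example_hw_lock, flags);
	}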

Now. Before that patch:

| T: 0 ( 1179) P:90 I:250 C: 182450 Min:  2 Act:6 Avg:5 Max:  21
| T: 1 ( 1180) P:90 I:250 C: 182437 Min:  2 Act:6 Avg:5 Max:  22
| T: 2 ( 1181) P:90 I:250 C: 182423 Min:  2 Act:6 Avg:5 Max:  23
| T: 3 ( 1182) P:90 I:250 C: 182401 Min:  2 Act:6 Avg:5 Max:  21
| T: 4 ( 1183) P:90 I:250 C: 182394 Min:  2 Act:7 Avg:5 Max:  27
| T: 5 ( 1184) P:90 I:250 C: 182381 Min:  2 Act:6 Avg:5 Max:  22
| T: 6 ( 1185) P:90 I:250 C: 182368 Min:  3 Act:6 Avg:5 Max:  23
| T: 7 ( 1186) P:90 I:250 C: 182356 Min:  2 Act:8 Avg:5 Max:  22

after:

| T: 0 ( 1183) P:90 I:250 C:1022143 Min:  3 Act:3 Avg:4 Max:  47
| T: 1 ( 1184) P:90 I:250 C:1022125 Min:  2 Act:3 Avg:4 Max:  51
| T: 2 ( 1185) P:90 I:250 C:1022110 Min:  2 Act:5 Avg:4 Max:  52
| T: 3 ( 1186) P:90 I:250 C:1022089 Min:  2 Act:3 Avg:4 Max:  52
| T: 4 ( 1187) P:90 I:250 C:1022083 Min:  2 Act:3 Avg:4 Max:  51
| T: 5 ( 1188) P:90 I:250 C:1022070 Min:  3 Act:3 Avg:4 Max:  50
| T: 6 ( 1189) P:90 I:250 C:1022058 Min:  3 Act:5 Avg:4 Max:  46
| T: 7 ( 1190) P:90 I:250 C:1022045 Min:  2 Act:3 Avg:4 Max:  51

This is cyclictest output. A little explanation:
T: means thread number. There is one thread pinned to each CPU so I have
8 CPUs. This is an i7 SandyBridge so 4 cores + hyperthread with a
built-in i915 (INTEL_SNB_D_GT1_IDS).

Max: is the maximal observed latency in us. The program fires a timer
every 250us and measures the latency between the programmed time and
now. Since it is the thread with the highest priority in the system it
should be scheduled right away. Unless there is a
preempt-disable/IRQ-off section somewhere.

I did the same test in both cases: started a video hoping for some HW
acceleration and skipped forwards / backwards in the clip a few times. As
soon as I start jumping around in the video the latencies rise; I don't
observe this without the patch.
The system is otherwise idle. If you have other tests in mind which put
more / different load on the system, I'm all yours. I'm mostly curious
how bad it can get.

Sebastian

diff --git a/drivers/gpu/drm/i915/display/i9xx_plane.c 
b/drivers/gpu/drm/i915/display/i9xx_plane.c
index b1439ba78f67b..6f2cab95d8646 100644
--- a/drivers/gpu/drm/i915/display/i9xx_plane.c
+++ b/drivers/gpu/drm/i915/display/i9xx_plane.c
@@ -444,7 +444,7 @@ static void i9xx_update_plane(struct intel_plane *plane,
else
dspaddr_offset = linear_offset;
 
-   spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+   raw_spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
intel_de_write_fw(dev_priv, DSPSTRIDE(i9xx_plane),
  plane_state->view.color_plane[0].stride);
@@ -490,7 +490,7 @@ static void i9xx_update_plane(struct intel_plane *plane,
intel_de_write_fw(dev_priv, DSPADDR(i9xx_plane),
  intel_plane_ggtt_offset(plane_state) + 
dspaddr_offset);
 
-   spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+   raw_spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
 static void i9xx_disable_plane(struct intel_plane *plane,
@@ -513,7 +513,7 @@ static void i9xx_disable_plane(struct intel_plane *plane,
 */
dspcntr = i9xx_plane_ctl_crtc(crtc_state);
 
-   spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+   raw_spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
intel_de_write_fw(dev_priv, DSPCNTR(i9xx_plane), dspcntr);
if (DISPLAY_VER(dev_priv) >= 4)
@@ -521,7 +521,7 @@ static void i9xx_disable_plane(struct intel_plane *plane,
else
intel_de_write_fw(dev_priv, DSPADDR(i9xx_plane), 0);
 
-   spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+   raw_spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
 static void
@@ -539,11 +539,11 @@ g4x_primary_async_flip(struct intel_plane *plane,
if 

Re: [PATCH 3/8] drm/i915: Disable tracing points on PREEMPT_RT

2021-10-06 Thread Sebastian Andrzej Siewior
On 2021-10-06 12:34:19 [+0300], Ville Syrjälä wrote:
> I think the correct answer is to make uncore.lock a raw_spinlock.
> Without the tracepoints deubgging any of this is stuff pretty much
> impossible. We also take that lock a lot.

Let me check if that works.

Sebastian


Re: [PATCH 8/8] drm/i915: Don't disable interrupts and pretend a lock as been acquired in __timeline_mark_lock().

2021-10-06 Thread Sebastian Andrzej Siewior
On 2021-10-05 21:16:17 [+0200], Peter Zijlstra wrote:
> > -static inline void intel_context_mark_active(struct intel_context *ce)
> > +static inline void intel_context_mark_active(struct intel_context *ce,
> > +bool timeline_mutex_needed)
> >  {
> > -   lockdep_assert_held(&ce->timeline->mutex);
> > +   if (timeline_mutex_needed)
> > +   lockdep_assert_held(&ce->timeline->mutex);
> > ++ce->active_count;
> >  }
> 
> Chris, might it be possible to write that something like:
> 
>   lockdep_assert(lockdep_is_held(&ce->timeline->mutex) ||
>  engine_is_parked(ce));
> 
> instead?

This indeed looks way better, given Torvalds' yelling in similar cases.

> Otherwise,
> 
> Acked-by: Peter Zijlstra (Intel) 

Sebastian
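
For reference, the construct suggested above boils down to the following
self-contained sketch (made-up structure and flag, not the i915 code):
lockdep_assert() takes an arbitrary condition, so "mutex held" and "we are
on the parking path" can simply be OR-ed instead of threading a boolean
through the function:

	#include <linux/bitops.h>
	#include <linux/lockdep.h>
	#include <linux/mutex.h>

	struct example_ctx {
		struct mutex	mutex;
		unsigned long	flags;
	#define EXAMPLE_IS_PARKED	0
		unsigned int	active_count;
	};

	static inline void example_mark_active(struct example_ctx *ctx)
	{
		/* either the caller holds the mutex or this is the parking path */
		lockdep_assert(lockdep_is_held(&ctx->mutex) ||
			       test_bit(EXAMPLE_IS_PARKED, &ctx->flags));
		ctx->active_count++;
	}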


[PATCH 7/8] drm/i915: Drop the irqs_disabled() check

2021-10-05 Thread Sebastian Andrzej Siewior
The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.

There is no need to check for disabled interrupts. The lockdep
annotation below already checks if the lock has been acquired by the
caller and will yell if the interrupts are not disabled.

Remove the !irqs_disabled() check.

Reported-by: Maarten Lankhorst 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_request.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 79da5eca60af5..b9dd6100c6d17 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -559,7 +559,6 @@ bool __i915_request_submit(struct i915_request *request)
 
RQ_TRACE(request, "\n");
 
-   GEM_BUG_ON(!irqs_disabled());
lockdep_assert_held(&engine->sched_engine->lock);
 
/*
@@ -668,7 +667,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 */
RQ_TRACE(request, "\n");
 
-   GEM_BUG_ON(!irqs_disabled());
lockdep_assert_held(&engine->sched_engine->lock);
 
/*
-- 
2.33.0



[PATCH 6/8] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()

2021-10-05 Thread Sebastian Andrzej Siewior
execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().

execlists_dequeue_irq() and execlists_dequeue() each have only one
caller. If intel_engine_cs::active::lock is acquired and released with
the _irq suffix then it behaves almost as if execlists_dequeue() were
invoked with disabled interrupts. The difference is the last part of the
function, which is then invoked with interrupts enabled.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock to the end of the function as I didn't find
anything that would acquire the lock again.

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Maarten Lankhorst 
---
 .../drm/i915/gt/intel_execlists_submission.c| 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de5f9c86b9a44..dbf44f9567449 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1283,7 +1283,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * and context switches) submission.
 */
 
-   spin_lock(&sched_engine->lock);
+   spin_lock_irq(&sched_engine->lock);
 
/*
 * If the queue is higher priority than the last
@@ -1383,7 +1383,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * Even if ELSP[1] is occupied and not worthy
 * of timeslices, our queue might be.
 */
-   spin_unlock(&sched_engine->lock);
+   spin_unlock_irq(&sched_engine->lock);
return;
}
}
@@ -1409,7 +1409,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 
if (last && !can_merge_rq(last, rq)) {
spin_unlock(&ve->base.sched_engine->lock);
-   spin_unlock(&engine->sched_engine->lock);
+   spin_unlock_irq(&engine->sched_engine->lock);
return; /* leave this for another sibling */
}
 
@@ -1571,7 +1571,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 */
sched_engine->queue_priority_hint = queue_prio(sched_engine);
i915_sched_engine_reset_on_empty(sched_engine);
-   spin_unlock(&sched_engine->lock);
+   spin_unlock_irq(&sched_engine->lock);
 
/*
 * We can skip poking the HW if we ended up with exactly the same set
@@ -1597,13 +1597,6 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
}
 }
 
-static void execlists_dequeue_irq(struct intel_engine_cs *engine)
-{
-   local_irq_disable(); /* Suspend interrupts across request submission */
-   execlists_dequeue(engine);
-   local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
-}
-
 static void clear_ports(struct i915_request **ports, int count)
 {
memset_p((void **)ports, NULL, count);
@@ -2427,7 +2420,7 @@ static void execlists_submission_tasklet(struct 
tasklet_struct *t)
}
 
if (!engine->execlists.pending[0]) {
-   execlists_dequeue_irq(engine);
+   execlists_dequeue(engine);
start_timeslice(engine);
}
 
-- 
2.33.0



[PATCH 0/8] drm/i915: PREEMPT_RT related fixups.

2021-10-05 Thread Sebastian Andrzej Siewior
Hi,

the following patches are from the PREEMPT_RT queue. It is mostly about
disabling interrupts/preemption which leads to problems.
Unfortunately DRM_I915_LOW_LEVEL_TRACEPOINTS had to be disabled because
it acquires locks from within trace points.
I tested it on a SandyBridge with built-in i915 by using X, OpenGL and
playing videos without noticing any warnings. However, some code paths
were not entered.

Sebastian




[PATCH 5/8] drm/i915/gt: Queue and wait for the irq_work item.

2021-10-05 Thread Sebastian Andrzej Siewior
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which cannot be
acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.
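
For illustration only, a minimal sketch of the two patterns; the example_*
names are made up and not part of the patch, while irq_work_queue(),
irq_work_sync() and init_irq_work() are the real kernel API:

#include <linux/irq_work.h>
#include <linux/irqflags.h>

static void example_irq_work_fn(struct irq_work *work)
{
        /* may take spinlock_t locks, which are sleeping locks on PREEMPT_RT */
}

static struct irq_work example_work;

static void old_pattern(void)
{
        /* direct invocation with hard-disabled interrupts; breaks on PREEMPT_RT */
        local_irq_disable();
        example_irq_work_fn(&example_work);
        local_irq_enable();
}

static void new_pattern(void)
{
        /* let the irq_work core run the callback in the proper context,
         * then wait until it has completed
         */
        irq_work_queue(&example_work);
        irq_work_sync(&example_work);
}

static void example_init(void)
{
        init_irq_work(&example_work, example_irq_work_fn);
}

The point of the queue + sync pair is that it keeps the "the callback has
completed before we continue" guarantee of the direct call.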

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
Reviewed-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 209cf265bf746..6e1b9068d944c 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -311,10 +311,9 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
/* Kick the work once more to drain the signalers, and disarm the irq */
        irq_work_sync(&b->irq_work);
        while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-               local_irq_disable();
-               signal_irq_work(&b->irq_work);
-               local_irq_enable();
+               irq_work_queue(&b->irq_work);
                cond_resched();
+               irq_work_sync(&b->irq_work);
}
 }
 
-- 
2.33.0



[PATCH 8/8] drm/i915: Don't disable interrupts and pretend a lock as been acquired in __timeline_mark_lock().

2021-10-05 Thread Sebastian Andrzej Siewior
This is a revert of commits
   d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as 
irqsafe")
   6c69a45445af9 ("drm/i915/gt: Mark context->active_count as protected by 
timeline->mutex")

The existing code leads to a different behaviour depending on whether
lockdep is enabled or not. Any following lock that is acquired without
disabling interrupts (but needs to) will not be noticed by lockdep.

This is not just a lockdep annotation: there is an actual mutex_t
that is properly used as a lock, but in the case of __timeline_mark_lock()
lockdep is only told that it is acquired while no lock has been acquired.

It appears that its purpose is just to satisfy the lockdep_assert_held()
check in intel_context_mark_active(). The other problem with disabling
interrupts is that on PREEMPT_RT interrupts are also disabled which
leads to problems for instance later during memory allocation.

Add an argument to intel_context_mark_active() which is true if the lock
must have been acquired, false if other magic is involved and the lock
is not needed. Use the `false' argument only from within
switch_to_kernel_context() and remove __timeline_mark_lock().

Cc: Peter Zijlstra 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/gt/intel_context.h   |  6 ++-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c | 38 +--
 drivers/gpu/drm/i915/i915_request.c   |  7 ++--
 drivers/gpu/drm/i915/i915_request.h   |  3 +-
 5 files changed, 12 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index c410989507466..bed60dff93eff 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -161,9 +161,11 @@ static inline void intel_context_enter(struct 
intel_context *ce)
ce->ops->enter(ce);
 }
 
-static inline void intel_context_mark_active(struct intel_context *ce)
+static inline void intel_context_mark_active(struct intel_context *ce,
+bool timeline_mutex_needed)
 {
-       lockdep_assert_held(&ce->timeline->mutex);
+       if (timeline_mutex_needed)
+               lockdep_assert_held(&ce->timeline->mutex);
++ce->active_count;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 74775ae961b2b..02c0ab9fbb4b8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -42,7 +42,7 @@ heartbeat_create(struct intel_context *ce, gfp_t gfp)
struct i915_request *rq;
 
intel_context_enter(ce);
-   rq = __i915_request_create(ce, gfp);
+   rq = __i915_request_create(ce, gfp, true);
intel_context_exit(ce);
 
return rq;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 1f07ac4e0672a..d75638d1d561e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -80,39 +80,6 @@ static int __engine_unpark(struct intel_wakeref *wf)
return 0;
 }
 
-#if IS_ENABLED(CONFIG_LOCKDEP)
-
-static unsigned long __timeline_mark_lock(struct intel_context *ce)
-{
-   unsigned long flags;
-
-   local_irq_save(flags);
-       mutex_acquire(&ce->timeline->mutex.dep_map, 2, 0, _THIS_IP_);
-
-   return flags;
-}
-
-static void __timeline_mark_unlock(struct intel_context *ce,
-  unsigned long flags)
-{
-       mutex_release(&ce->timeline->mutex.dep_map, _THIS_IP_);
-   local_irq_restore(flags);
-}
-
-#else
-
-static unsigned long __timeline_mark_lock(struct intel_context *ce)
-{
-   return 0;
-}
-
-static void __timeline_mark_unlock(struct intel_context *ce,
-  unsigned long flags)
-{
-}
-
-#endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
-
 static void duration(struct dma_fence *fence, struct dma_fence_cb *cb)
 {
struct i915_request *rq = to_request(fence);
@@ -159,7 +126,6 @@ static bool switch_to_kernel_context(struct intel_engine_cs 
*engine)
 {
struct intel_context *ce = engine->kernel_context;
struct i915_request *rq;
-   unsigned long flags;
bool result = true;
 
/* GPU is pointing to the void, as good as in the kernel context. */
@@ -201,10 +167,9 @@ static bool switch_to_kernel_context(struct 
intel_engine_cs *engine)
 * engine->wakeref.count, we may see the request completion and retire
 * it causing an underflow of the engine->wakeref.
 */
-   flags = __timeline_mark_lock(ce);
        GEM_BUG_ON(atomic_read(&ce->timeline->active_count) < 0);
 
-   rq = __i915_request_create(ce, GFP_NOWAIT);
+   rq = __i915_request_create(ce, GFP_NOWAIT, false);
if (IS_ERR(

[PATCH 1/8] drm/i915: Use preempt_disable/enable_rt() where recommended

2021-10-05 Thread Sebastian Andrzej Siewior
From: Mike Galbraith 

Mario Kleiner suggested in commit
  ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into kms 
driver.")

spots where preemption should be disabled on PREEMPT_RT. The
difference is that on PREEMPT_RT the intel_uncore::lock disables neither
preemption nor interrupts and so the region remains preemptible.

The area covers only register reads and writes. The parts that worry me
are:
- __intel_get_crtc_scanline(): the worst case is 100us if no match is
  found.

- intel_crtc_scanlines_since_frame_timestamp(): not sure how long this
  may take in the worst case.

It was in the RT queue for a while and nobody complained.
Disable preemption on PREEMPT_RT during timestamping.

[bigeasy: patch description.]

Cc: Mario Kleiner 
Signed-off-by: Mike Galbraith 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_irq.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9bc4f4a8e12ec..547347241a47c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -886,7 +886,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
 */
        spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
-   /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+   if (IS_ENABLED(CONFIG_PREEMPT_RT))
+   preempt_disable();
 
/* Get optional system timestamp before query. */
if (stime)
@@ -950,7 +951,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
if (etime)
*etime = ktime_get();
 
-   /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */
+   if (IS_ENABLED(CONFIG_PREEMPT_RT))
+   preempt_enable();
 
        spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 
-- 
2.33.0



[PATCH 4/8] drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE

2021-10-05 Thread Sebastian Andrzej Siewior
The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoints
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.

Cc: Steven Rostedt 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
 drivers/gpu/drm/i915/i915_trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 773e7362c4442..5ff6c0a37fdfa 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -826,7 +826,7 @@ DEFINE_EVENT(i915_request, i915_request_add,
 TP_ARGS(rq)
 );
 
-#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) && !defined(NOTRACE)
 DEFINE_EVENT(i915_request, i915_request_guc_submit,
 TP_PROTO(struct i915_request *rq),
 TP_ARGS(rq)
-- 
2.33.0



[PATCH 3/8] drm/i915: Disable tracing points on PREEMPT_RT

2021-10-05 Thread Sebastian Andrzej Siewior
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x0003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
|  rt_spin_lock+0x3f/0x50
|  gen6_read32+0x45/0x1d0 [i915]
|  g4x_get_vblank_counter+0x36/0x40 [i915]
|  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]

The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spinlock_t locks; these are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() which also
might acquire sleeping locks on PREEMPT_RT.
At the time the arguments are evaluated within a trace point, preemption
is disabled and so the locks must not be acquired on PREEMPT_RT.

Based on this I don't see any other way than to disable the trace points
on PREEMPT_RT.

Reported-by: Luca Abeni 
Cc: Steven Rostedt 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/i915_trace.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 806ad688274bf..773e7362c4442 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -2,6 +2,10 @@
 #if !defined(_I915_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
 #define _I915_TRACE_H_
 
+#ifdef CONFIG_PREEMPT_RT
+#define NOTRACE
+#endif
+
 #include 
 #include 
 #include 
-- 
2.33.0



[PATCH 2/8] drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates

2021-10-05 Thread Sebastian Andrzej Siewior
From: Mike Galbraith 

Commit
   8d7849db3eab7 ("drm/i915: Make sprite updates atomic")

started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempts to acquire spinlock_t locks which
are sleeping locks on PREEMPT_RT.

According to the comment the interrupts are disabled to avoid random delays and
are not required for protection or synchronisation.
If this needs to happen with disabled interrupts on PREEMPT_RT, and the
whole section is restricted to register access, then all sleeping locks
need to be acquired before interrupts are disabled and some functions
may be moved after enabling interrupts again.
This includes:
- prepare_to_wait() + finish_wait() due its wake queue.
- drm_crtc_vblank_put() -> vblank_disable_fn() drm_device::vbl_lock.
- skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and
  maybe others due to intel_uncore::lock
- drm_crtc_arm_vblank_event() due to drm_device::event_lock and
  drm_device::vblank_time_lock.

Don't disable interrupts on PREEMPT_RT during atomic updates.

[bigeasy: drop local locks, commit message]

Signed-off-by: Mike Galbraith 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/display/intel_crtc.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_crtc.c 
b/drivers/gpu/drm/i915/display/intel_crtc.c
index 254e67141a776..7a39029b083f4 100644
--- a/drivers/gpu/drm/i915/display/intel_crtc.c
+++ b/drivers/gpu/drm/i915/display/intel_crtc.c
@@ -425,7 +425,8 @@ void intel_pipe_update_start(const struct intel_crtc_state 
*new_crtc_state)
 */
intel_psr_wait_for_idle(new_crtc_state);
 
-   local_irq_disable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_disable();
 
crtc->debug.min_vbl = min;
crtc->debug.max_vbl = max;
@@ -450,11 +451,13 @@ void intel_pipe_update_start(const struct 
intel_crtc_state *new_crtc_state)
break;
}
 
-   local_irq_enable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_enable();
 
timeout = schedule_timeout(timeout);
 
-   local_irq_disable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_disable();
}
 
        finish_wait(wq, &wait);
@@ -487,7 +490,8 @@ void intel_pipe_update_start(const struct intel_crtc_state 
*new_crtc_state)
return;
 
 irq_disable:
-   local_irq_disable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_disable();
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_VBLANK_EVADE)
@@ -566,7 +570,8 @@ void intel_pipe_update_end(struct intel_crtc_state 
*new_crtc_state)
new_crtc_state->uapi.event = NULL;
}
 
-   local_irq_enable();
+   if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+   local_irq_enable();
 
/* Send VRR Push to terminate Vblank */
intel_vrr_send_push(new_crtc_state);
-- 
2.33.0



Re: [Intel-gfx] [PATCH 25/33] drm/i915/guc: Support request cancellation

2021-10-05 Thread Sebastian Andrzej Siewior
On 2021-10-05 11:13:16 [+0100], Tvrtko Ursulin wrote:
> Needs this fix:
> 
> commit d576b31bdece7b5034047cbe21170e948198d32f
> Author: Matthew Auld 
> Date:   Fri Sep 24 15:46:46 2021 +0100
> 
> drm/i915: remember to call i915_sw_fence_fini

Thanks, works. Needed a tweak since it does not apply as-is.

> But in the fix we forgot to add:
> 
> Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> 
> So not sure if it will appear on its own. Adding Joonas and Rodrigo for
> maintainer level help.
> 
> Regards,
> 
> Tvrtko

Sebastian


Re: [PATCH 25/33] drm/i915/guc: Support request cancellation

2021-10-05 Thread Sebastian Andrzej Siewior
On 2021-07-27 12:15:59 [-0700], Daniele Ceraolo Spurio wrote:
> On 7/26/2021 5:23 PM, Matthew Brost wrote:
> > This adds GuC backend support for i915_request_cancel(), which in turn
> > makes CONFIG_DRM_I915_REQUEST_TIMEOUT work.
> > 
> Reviewed-by: Daniele Ceraolo Spurio 

I have a few instances of ODEBUG warnings since this commit
   62eaf0ae217d4 ("drm/i915/guc: Support request cancellation")

like:

| [ cut here ]
| ODEBUG: init destroyed (active state 0) object type: i915_sw_fence hint: 
sw_fence_dummy_notify+0x0/0x10
| WARNING: CPU: 0 PID: 987 at lib/debugobjects.c:505 
debug_print_object+0x6e/0x90
| Modules linked in:
| CPU: 0 PID: 987 Comm: Xorg Not tainted 5.15.0-rc4+ #67
| Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3-M, BIOS 
P2.10 04/24/2012
| RIP: 0010:debug_print_object+0x6e/0x90
…
| Call Trace:
|  i915_sw_fence_reinit+0x10/0x40
|  intel_context_init+0x185/0x1e0
|  intel_context_create+0x2e/0x100
|  default_engines+0x9d/0x120
|  i915_gem_create_context+0x40a/0x5d0
|  ? trace_kmalloc+0x29/0xd0
|  ? kmem_cache_alloc_trace+0xdd/0x190
|  i915_gem_context_open+0x140/0x1c0
|  i915_gem_open+0x70/0xa0
|  drm_file_alloc+0x1af/0x270
|  drm_open+0xdc/0x270
|  drm_stub_open+0xa6/0x130
|  chrdev_open+0xbe/0x250
|  ? cdev_device_add+0x80/0x80
|  do_dentry_open+0x15e/0x390
|  path_openat+0x76b/0xa60
|  do_filp_open+0xa4/0x150
|  ? lock_release+0x149/0x2f0
|  ? _raw_spin_unlock+0x24/0x40
|  do_sys_openat2+0x92/0x160
|  __x64_sys_openat+0x4f/0x90
|  do_syscall_64+0x3b/0xc0
|  entry_SYSCALL_64_after_hwframe+0x44/0xae
| RIP: 0033:0x7f91b5cfdf07

and:
| ODEBUG: activate destroyed (active state 0) object type: i915_sw_fence hint: 
sw_fence_dummy_notify+0x0/0x10
| WARNING: CPU: 0 PID: 987 at lib/debugobjects.c:505 
debug_print_object+0x6e/0x90
| 
| Call Trace:
|  debug_object_activate+0x174/0x200
|  i915_sw_fence_commit+0x10/0x20
|  intel_context_init+0x18d/0x1e0
|  intel_context_create+0x2e/0x100
|  default_engines+0x9d/0x120

---

| ODEBUG: active_state destroyed (active state 0) object type: i915_sw_fence 
hint: sw_fence_dummy_notify+0x0/0x10
| WARNING: CPU: 0 PID: 987 at lib/debugobjects.c:505 
debug_print_object+0x6e/0x90
| Call Trace:
|  __i915_sw_fence_complete+0x6f/0x280
|  intel_context_init+0x18d/0x1e0
|  intel_context_create+0x2e/0x100
|  default_engines+0x9d/0x120

Is this known? This is yesterday's -rc4, I first noticed it in -rc3.

> Daniele

Sebastian


[RFC PATCH] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18.

2021-10-05 Thread Sebastian Andrzej Siewior
The warning popped up; it says to increase the limit by the number of
occurrences. I saw it 18 times, so here it is.
It started to show up since commit
   2f425cf5242a0 ("drm: Fix oops in damage self-tests by mocking damage 
property")

Increase DRM_OBJECT_MAX_PROPERTY by 18.

Signed-off-by: Sebastian Andrzej Siewior 
---

I have no idea whether this is correct or just a symptom of another
problem. This has been observed with i915 and full debug.

 include/drm/drm_mode_object.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/drm_mode_object.h b/include/drm/drm_mode_object.h
index c34a3e8030e12..1e5399e47c3a5 100644
--- a/include/drm/drm_mode_object.h
+++ b/include/drm/drm_mode_object.h
@@ -60,7 +60,7 @@ struct drm_mode_object {
void (*free_cb)(struct kref *kref);
 };
 
-#define DRM_OBJECT_MAX_PROPERTY 24
+#define DRM_OBJECT_MAX_PROPERTY 42
 /**
  * struct drm_object_properties - property tracking for _mode_object
  */
-- 
2.33.0



Re: [Intel-gfx] [RFC PATCH 2/2] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()

2021-10-01 Thread Sebastian Andrzej Siewior
On 2021-09-16 11:38:55 [+0200], Maarten Lankhorst wrote:
> Patches look good.
Thank you for looking.

> For both patches:
> 
> Reviewed-by: Maarten Lankhorst 
> 
> I've been looking at running i915 with the -rt patch series, and
> noticed i915_request_submit fails with GEM_BUG_ON(!irqs_disabled());
> presumably same failure exists for i915_request_unsubmit().
> 
> Might be worth removing those checks as well? Seems double with
> lockdep_assert_held on an irq lock anyway.

yes, let me prepare something in a few.

> I've also noticed the local_irq_disable/enable is removed from
> intel_pipe_update_(start/end) in the rt series. It might make sense
> from a -rt point of view, but that code needs to run without
> interruptions, or i915 may show visual glitches or even locks up the
> system.
>
> It should just be a set of registers hammered in, but the code might
> needs to be fixed to take the mmio lock as outer lock, and become a
> strict set of register read/writes only.

Let me see. So Anton Lundin (Cc:) reported glitches due to _this_ patch
on -RT. I have just a Sandybridge around with an i915 and it does not get
near that code here. 

> ~Maarten

Sebastian


[PATCH 1/2] drm/i915/gt: Queue and wait for the irq_work item.

2021-09-08 Thread Sebastian Andrzej Siewior
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which can not be
acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb2..594dec2f76954 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -318,10 +318,9 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
/* Kick the work once more to drain the signalers, and disarm the irq */
        irq_work_sync(&b->irq_work);
        while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-               local_irq_disable();
-               signal_irq_work(&b->irq_work);
-               local_irq_enable();
+               irq_work_queue(&b->irq_work);
                cond_resched();
+               irq_work_sync(&b->irq_work);
}
 }
 
-- 
2.33.0



[RFC PATCH 2/2] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()

2021-09-08 Thread Sebastian Andrzej Siewior
execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().

execlists_dequeue_irq() and execlists_dequeue() each have only one
caller. If intel_engine_cs::active::lock is acquired and released with the
_irq suffix then it behaves almost as if execlists_dequeue() were
invoked with disabled interrupts. The difference is the last part of the
function which is then invoked with enabled interrupts.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock to the end of the function, as I didn't find
anything that would acquire the lock again.
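
For illustration only, a minimal sketch of why the two patterns differ on
PREEMPT_RT; the example_lock name is made up and not part of the patch, the
locking calls are the real kernel API:

#include <linux/spinlock.h>
#include <linux/irqflags.h>

static DEFINE_SPINLOCK(example_lock);

static void broken_on_rt(void)
{
        /* On PREEMPT_RT spinlock_t is a sleeping lock, so spin_lock() may
         * sleep, which is invalid once interrupts have been hard-disabled.
         */
        local_irq_disable();
        spin_lock(&example_lock);
        /* ... critical section ... */
        spin_unlock(&example_lock);
        local_irq_enable();
}

static void fine_on_rt(void)
{
        /* spin_lock_irq() lets the lock implementation decide: it disables
         * interrupts on !PREEMPT_RT and leaves them enabled on PREEMPT_RT.
         */
        spin_lock_irq(&example_lock);
        /* ... critical section ... */
        spin_unlock_irq(&example_lock);
}

On !PREEMPT_RT both variants end up with interrupts disabled around the
critical section; only on PREEMPT_RT do they diverge.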

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
---
 .../drm/i915/gt/intel_execlists_submission.c| 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index fc77592d88a96..2ec1dd352960b 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1265,7 +1265,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * and context switches) submission.
 */
 
-       spin_lock(&engine->active.lock);
+       spin_lock_irq(&engine->active.lock);
 
/*
 * If the queue is higher priority than the last
@@ -1365,7 +1365,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * Even if ELSP[1] is occupied and not worthy
 * of timeslices, our queue might be.
 */
-       spin_unlock(&engine->active.lock);
+       spin_unlock_irq(&engine->active.lock);
return;
}
}
@@ -1391,7 +1391,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 
if (last && !can_merge_rq(last, rq)) {
        spin_unlock(&ve->base.active.lock);
-       spin_unlock(&engine->active.lock);
+       spin_unlock_irq(&engine->active.lock);
return; /* leave this for another sibling */
}
 
@@ -1552,7 +1552,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * interrupt for secondary ports).
 */
execlists->queue_priority_hint = queue_prio(execlists);
-       spin_unlock(&engine->active.lock);
+       spin_unlock_irq(&engine->active.lock);
 
/*
 * We can skip poking the HW if we ended up with exactly the same set
@@ -1578,13 +1578,6 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
}
 }
 
-static void execlists_dequeue_irq(struct intel_engine_cs *engine)
-{
-   local_irq_disable(); /* Suspend interrupts across request submission */
-   execlists_dequeue(engine);
-   local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
-}
-
 static void clear_ports(struct i915_request **ports, int count)
 {
memset_p((void **)ports, NULL, count);
@@ -2377,7 +2370,7 @@ static void execlists_submission_tasklet(struct 
tasklet_struct *t)
}
 
if (!engine->execlists.pending[0]) {
-   execlists_dequeue_irq(engine);
+   execlists_dequeue(engine);
start_timeslice(engine);
}
 
-- 
2.33.0



[PATCH 0/2] drm/i915/gt: Locking splats PREEMPT_RT

2021-09-08 Thread Sebastian Andrzej Siewior
Clark Williams reported two issues with the i915 driver running on
PREEMPT_RT. While #1 looks simple, I have no idea about #2, hence the RFC.

Sebastian



Re: [PATCH v2 0/3] drm/amdgpu: Remove in_interrupt() usage.

2021-03-10 Thread Sebastian Andrzej Siewior
On 2021-02-09 18:43:54 [+0100], Christian König wrote:
> Hi Sebastian,
Hi Christian,

> to be honest I'm thinking about that for quite some time now and I don't
> think that this is possible without a severe rewrite of the driver.
> 
> The problem is simply that we have a lot of functions which deal with
> hardware handling independent of the context. But how registers are accessed
> needs to be different depending on whether you are in the interrupt handler or not.
> 
> You would need to push the information if we are coming in from the
> interrupt handler through more than 10 function calls.
> 
> I don't think that this is feasible nor good design.

Yeah, that is what I saw and didn't even try.

The possible backtrace is at the bottom of this email; is this a correct
assumption?

Another quick question: You acked my three-patch series. I don't see it
in the next tree as of today. Is there anything for me to do?

> Regards,
> Christian.
> 
> Am 09.02.21 um 17:53 schrieb Sebastian Andrzej Siewior:
> > On 2021-02-09 13:50:31 [+0100], Christian König wrote:
> > > Reviewed-by: Christian König  for the series.
> > Thank you.
> > Any chance you could give me a hand with the remaining three users
> > within the amdgpu driver? I don't know if the in_interrupt() check can
> > be limited to certain callers.
> > What I noticed while tracing v5.10 is this:
> > 
> > | Xorg-2257[007] d... 57261.620043: amdgpu_device_wreg: 
> > 0x699f, 0x1bcf, 0x0100
> > |  => trace_event_raw_event_amdgpu_device_wreg
> > |  => amdgpu_device_wreg.part.0
> > |  => dce110_arm_vert_intr
> > |  => dce110_vblank_set
> > |  => dm_enable_vblank
> > |  => drm_vblank_enable
> > |  => drm_vblank_get
> > |  => drm_wait_vblank_ioctl
> > |  => drm_ioctl_kernel
> > |  => drm_ioctl
> > |  => amdgpu_drm_ioctl
> > |  => __x64_sys_ioctl
> > |  => do_syscall_64
> > |  => entry_SYSCALL_64_after_hwframe
> > 
> > I think that amdgpu_device_wreg() -> amdgpu_kiq_wreg() could be invoked.
> > It doesn't here because amdgpu_sriov_runtime() is false.
> > The trace says `d' which means interrupts are disabled but
> > in_interrupt() will return false in this case (no IRQ/softirq).
> > 
> > Sebastian

Sebastian


Re: [PATCH 0/3] video: Remove in_interrupt() usage.

2021-02-16 Thread Sebastian Andrzej Siewior
On 2021-02-08 23:38:07 [+0100], To linux-fb...@vger.kernel.org wrote:
> Folks,
> 
> in the discussion about preempt count consistency across kernel
> configurations:
> 
>  https://lore.kernel.org/r/20200914204209.256266...@linutronix.de/
> 
> it was concluded that the usage of in_interrupt() and related context
> checks should be removed from non-core code.
> 
> In the long run, usage of 'preemptible, in_*irq etc.' should be banned from
> driver code completely.
> 
> This series targets the video subsystem. The omap patches are a repost
> of [0], the amba-clcd is new after I received no feedback on my analysis
> [1].
> 
> [0] https://lkml.kernel.org/r/20210127172902.145335-1-bige...@linutronix.de
> [1] https://lkml.kernel.org/r/20210127174408.ududpwfrbg3dh...@linutronix.de

Could someone please apply the series? Video seems unmaintained.

Sebastian


Re: [PATCH v2 0/3] drm/amdgpu: Remove in_interrupt() usage.

2021-02-09 Thread Sebastian Andrzej Siewior
On 2021-02-09 13:50:31 [+0100], Christian König wrote:
> Reviewed-by: Christian König  for the series.

Thank you.
Any chance you could give me a hand with the remaining three users
within the amdgpu driver? I don't know if the in_interrupt() check can
be limited to certain callers.
What I noticed while tracing v5.10 is this:

| Xorg-2257[007] d... 57261.620043: amdgpu_device_wreg: 0x699f, 
0x1bcf, 0x0100
|  => trace_event_raw_event_amdgpu_device_wreg
|  => amdgpu_device_wreg.part.0
|  => dce110_arm_vert_intr
|  => dce110_vblank_set
|  => dm_enable_vblank
|  => drm_vblank_enable
|  => drm_vblank_get
|  => drm_wait_vblank_ioctl
|  => drm_ioctl_kernel
|  => drm_ioctl
|  => amdgpu_drm_ioctl
|  => __x64_sys_ioctl
|  => do_syscall_64
|  => entry_SYSCALL_64_after_hwframe

I think that amdgpu_device_wreg() -> amdgpu_kiq_wreg() could be invoked.
It doesn't here because amdgpu_sriov_runtime() is false.
The trace says `d' which means interrupts are disabled but
in_interrupt() will return false in this case (no IRQ/softirq).

Sebastian


[PATCH v2 0/3] drm/amdgpu: Remove in_interrupt() usage.

2021-02-09 Thread Sebastian Andrzej Siewior
Folks,

in the discussion about preempt count consistency across kernel
configurations:

 https://lore.kernel.org/r/20200914204209.256266...@linutronix.de/

it was concluded that the usage of in_interrupt() and related context
checks should be removed from non-core code.

In the long run, usage of 'preemptible, in_*irq etc.' should be banned from
driver code completely.

This series addresses parts of the amdgpu driver.  There are still call sites
left in the amdgpu driver.

v1…v2:
   - Limit to admgpu only
   - use "bool" instead of "bool == true"

Sebastian




[PATCH 2/3] drm/amdgpu: Remove in_interrupt() usage in gfx_v9_0_kiq_read_clock()

2021-02-09 Thread Sebastian Andrzej Siewior
gfx_v9_0_get_gpu_clock_counter() acquires a mutex_t lock and is the only
caller of gfx_v9_0_kiq_read_clock().
If it is safe to acquire a mutex_t then gfx_v9_0_get_gpu_clock_counter() is
always invoked from preemptible context.

Remove in_interrupt() because it is superfluous as it will always return
false.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index ca7e7264926e6..72c319b860a33 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4105,7 +4105,7 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct 
amdgpu_device *adev)
 *
 * also don't wait anymore for IRQ context
 * */
-   if (r < 1 && (amdgpu_in_reset(adev) || in_interrupt()))
+   if (r < 1 && (amdgpu_in_reset(adev)))
goto failed_kiq_read;
 
might_sleep();
-- 
2.30.0



[PATCH 3/3] drm/amdgpu: Replace in_task() in gfx_v8_0_parse_sq_irq()

2021-02-09 Thread Sebastian Andrzej Siewior
gfx_v8_0_parse_sq_irq() is using in_task() to distinguish if it is
invoked from a workqueue worker or directly from the interrupt handler.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

gfx_v8_0_parse_sq_irq() is invoked directly either from a worker or from
the interrupt service routine. The worker is only bypassed if it is
already busy.

Add an argument `from_wq' to gfx_v8_0_parse_sq_irq() which is true if
invoked from the worker.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 37639214cbbbd..f346afce82ea0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6719,7 +6719,8 @@ static int gfx_v8_0_cp_ecc_error_irq(struct amdgpu_device 
*adev,
return 0;
 }
 
-static void gfx_v8_0_parse_sq_irq(struct amdgpu_device *adev, unsigned ih_data)
+static void gfx_v8_0_parse_sq_irq(struct amdgpu_device *adev, unsigned ih_data,
+ bool from_wq)
 {
u32 enc, se_id, sh_id, cu_id;
char type[20];
@@ -6757,7 +6758,7 @@ static void gfx_v8_0_parse_sq_irq(struct amdgpu_device 
*adev, unsigned ih_data)
 * or from BH in which case we can access SQ_EDC_INFO
 * instance
 */
-   if (in_task()) {
+   if (from_wq) {
        mutex_lock(&adev->grbm_idx_mutex);
gfx_v8_0_select_se_sh(adev, se_id, sh_id, 
cu_id);
 
@@ -6795,7 +6796,7 @@ static void gfx_v8_0_sq_irq_work_func(struct work_struct 
*work)
struct amdgpu_device *adev = container_of(work, struct amdgpu_device, 
gfx.sq_work.work);
struct sq_work *sq_work = container_of(work, struct sq_work, work);
 
-   gfx_v8_0_parse_sq_irq(adev, sq_work->ih_data);
+   gfx_v8_0_parse_sq_irq(adev, sq_work->ih_data, true);
 }
 
 static int gfx_v8_0_sq_irq(struct amdgpu_device *adev,
@@ -6810,7 +6811,7 @@ static int gfx_v8_0_sq_irq(struct amdgpu_device *adev,
 * just print whatever info is possible directly from the ISR.
 */
        if (work_pending(&adev->gfx.sq_work.work)) {
-   gfx_v8_0_parse_sq_irq(adev, ih_data);
+   gfx_v8_0_parse_sq_irq(adev, ih_data, false);
} else {
adev->gfx.sq_work.ih_data = ih_data;
        schedule_work(&adev->gfx.sq_work.work);
-- 
2.30.0



[PATCH 1/3] drm/amdgpu: Replace in_interrupt() usage in gmc_v*_process_interrupt()

2021-02-09 Thread Sebastian Andrzej Siewior
The usage of in_interrupt() in gmc_v*_process_interrupt() is intended to
use a different code path if invoked from the interrupt handler vs
invoked from the workqueue.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

gmc_v*_process_interrupt() is invoked via the ->process() callback
from amdgpu_ih_process() which in turn is invoked either from
amdgpu_irq_handler() (the interrupt handler) or from
amdgpu_irq_handle_*() which is a workqueue.

amdgpu_irq::ih is always processed from the interrupt handler, the other
three struct amdgpu_ih_ring members are processed from a workqueue.

Replace the in_interrupt() check with a comparison against adev->irq.ih.
A similar check is already done to check if the ih pointer is from
ih_soft.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 3b7c6c31fce1f..7b6791d699e27 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -113,7 +113,7 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device 
*adev,
/* Delegate it to a different ring if the hardware hasn't
 * already done it.
 */
-   if (in_interrupt()) {
+   if (entry->ih == >irq.ih) {
amdgpu_irq_delegate(adev, entry, 8);
return 1;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index aedef9017c4c2..266296be7302d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -486,7 +486,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device 
*adev,
/* Delegate it to a different ring if the hardware hasn't
 * already done it.
 */
-   if (in_interrupt()) {
+       if (entry->ih == &adev->irq.ih) {
amdgpu_irq_delegate(adev, entry, 8);
return 1;
}
-- 
2.30.0



[PATCH 3/4] drm/amdgpu: Remove in_interrupt() usage in gfx_v9_0_kiq_read_clock()

2021-02-08 Thread Sebastian Andrzej Siewior
gfx_v9_0_get_gpu_clock_counter() acquires a mutex_t lock and is the only
caller of gfx_v9_0_kiq_read_clock().
If it is safe to acquire a mutex_t then gfx_v9_0_get_gpu_clock_counter() is
always invoked from preemptible context.

Remove in_interrupt() because it is superfluous as it will always return
false.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index ca7e7264926e6..72c319b860a33 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4105,7 +4105,7 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct 
amdgpu_device *adev)
 *
 * also don't wait anymore for IRQ context
 * */
-   if (r < 1 && (amdgpu_in_reset(adev) || in_interrupt()))
+   if (r < 1 && (amdgpu_in_reset(adev)))
goto failed_kiq_read;
 
might_sleep();
-- 
2.30.0



[PATCH 4/4] drm/amdgpu: Replace in_task() in gfx_v8_0_parse_sq_irq()

2021-02-08 Thread Sebastian Andrzej Siewior
gfx_v8_0_parse_sq_irq() is using in_task() to distinguish if it is
invoked from a workqueue worker or directly from the interrupt handler.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

gfx_v8_0_parse_sq_irq() is invoked directly either from a worker or from
the interrupt service routine. The worker is only bypassed if it is
already busy.

Add an argument `from_wq' to gfx_v8_0_parse_sq_irq() which is true if
invoked from the worker.

Signed-off-by: Sebastian Andrzej Siewior 
---

Side note: work_pending() will return false _before_ the callback
function (gfx_v8_0_sq_irq_work_func() in this case) is invoked. That means if
the interrupt can fire again before the work item has completed then it is
possible, with the right timing, to have `gfx.sq_work.ih_data' from the
previous invocation overwritten.

 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 37639214cbbbd..8a5a7ecb9fa2b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6719,7 +6719,8 @@ static int gfx_v8_0_cp_ecc_error_irq(struct amdgpu_device 
*adev,
return 0;
 }
 
-static void gfx_v8_0_parse_sq_irq(struct amdgpu_device *adev, unsigned ih_data)
+static void gfx_v8_0_parse_sq_irq(struct amdgpu_device *adev, unsigned ih_data,
+ bool from_wq)
 {
u32 enc, se_id, sh_id, cu_id;
char type[20];
@@ -6757,7 +6758,7 @@ static void gfx_v8_0_parse_sq_irq(struct amdgpu_device 
*adev, unsigned ih_data)
 * or from BH in which case we can access SQ_EDC_INFO
 * instance
 */
-   if (in_task()) {
+   if (from_wq == true) {
        mutex_lock(&adev->grbm_idx_mutex);
gfx_v8_0_select_se_sh(adev, se_id, sh_id, 
cu_id);
 
@@ -6795,7 +6796,7 @@ static void gfx_v8_0_sq_irq_work_func(struct work_struct 
*work)
struct amdgpu_device *adev = container_of(work, struct amdgpu_device, 
gfx.sq_work.work);
struct sq_work *sq_work = container_of(work, struct sq_work, work);
 
-   gfx_v8_0_parse_sq_irq(adev, sq_work->ih_data);
+   gfx_v8_0_parse_sq_irq(adev, sq_work->ih_data, true);
 }
 
 static int gfx_v8_0_sq_irq(struct amdgpu_device *adev,
@@ -6810,7 +6811,7 @@ static int gfx_v8_0_sq_irq(struct amdgpu_device *adev,
 * just print whatever info is possible directly from the ISR.
 */
        if (work_pending(&adev->gfx.sq_work.work)) {
-   gfx_v8_0_parse_sq_irq(adev, ih_data);
+   gfx_v8_0_parse_sq_irq(adev, ih_data, false);
} else {
adev->gfx.sq_work.ih_data = ih_data;
        schedule_work(&adev->gfx.sq_work.work);
-- 
2.30.0



[PATCH 2/4] drm/amdgpu: Replace in_interrupt() usage in gmc_v*_process_interrupt()

2021-02-08 Thread Sebastian Andrzej Siewior
The usage of in_interrupt() in gmc_v*_process_interrupt() is intended to
use a different code path if invoked from the interrupt handler vs
invoked from the workqueue.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

gmc_v*_process_interrupt() is invoked via the ->process() callback
from amdgpu_ih_process() which in turn is invoked either from
amdgpu_irq_handler() (the interrupt handler) or from
amdgpu_irq_handle_*() which is a workqueue.

amdgpu_irq::ih is always processed from the interrupt handler, the other
three struct amdgpu_ih_ring members are processed from a workqueue.

Replace the in_interrupt() check with a comparison against adev->irq.ih.
A similar check is already done to check if the ih pointer is from
ih_soft.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 3b7c6c31fce1f..7b6791d699e27 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -113,7 +113,7 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device 
*adev,
/* Delegate it to a different ring if the hardware hasn't
 * already done it.
 */
-   if (in_interrupt()) {
+       if (entry->ih == &adev->irq.ih) {
amdgpu_irq_delegate(adev, entry, 8);
return 1;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index aedef9017c4c2..266296be7302d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -486,7 +486,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device 
*adev,
/* Delegate it to a different ring if the hardware hasn't
 * already done it.
 */
-   if (in_interrupt()) {
+       if (entry->ih == &adev->irq.ih) {
amdgpu_irq_delegate(adev, entry, 8);
return 1;
}
-- 
2.30.0



[PATCH 1/4] drm/gma500: Remove in_atomic() usage.

2021-02-08 Thread Sebastian Andrzej Siewior
The driver only uses msleep() if in_atomic() says it is safe to do so.
This is not needed: this macro is only used from
i2c_algorithm::master_xfer() which is always invoked from preemptible
context.

Remove in_atomic() because it is superfluous. Remove wait_for_atomic()
because it has no users.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/gma500/intel_gmbus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/gma500/intel_gmbus.c 
b/drivers/gpu/drm/gma500/intel_gmbus.c
index 370bd6451bd9b..eb0924473a218 100644
--- a/drivers/gpu/drm/gma500/intel_gmbus.c
+++ b/drivers/gpu/drm/gma500/intel_gmbus.c
@@ -44,13 +44,13 @@
ret__ = -ETIMEDOUT; \
break;  \
}   \
-   if (W && !(in_atomic() || in_dbg_master())) msleep(W);  \
+   if (W && !(in_dbg_master()))\
+   msleep(W);  \
}   \
ret__;  \
 })
 
 #define wait_for(COND, MS) _wait_for(COND, MS, 1)
-#define wait_for_atomic(COND, MS) _wait_for(COND, MS, 0)
 
 #define GMBUS_REG_READ(reg) ioread32(dev_priv->gmbus_reg + (reg))
 #define GMBUS_REG_WRITE(reg, val) iowrite32((val), dev_priv->gmbus_reg + (reg))
-- 
2.30.0



[PATCH 0/4] drm: Replace in_interrupt() usage.

2021-02-08 Thread Sebastian Andrzej Siewior
Folks,

in the discussion about preempt count consistency across kernel
configurations:

 https://lore.kernel.org/r/20200914204209.256266...@linutronix.de/

it was concluded that the usage of in_interrupt() and related context
checks should be removed from non-core code.

In the long run, usage of 'preemptible, in_*irq etc.' should be banned from
driver code completely.

This series addresses the gma500 driver and parts of the amdgpu driver.
There are still call sites left in the amdgpu driver.

Sebastian




[PATCH 1/3] video: omap: Remove in_interrupt() usage.

2021-02-08 Thread Sebastian Andrzej Siewior
From: "Ahmed S. Darwish" 

alloc_req() uses in_interrupt() to detect if it is safe to use down().

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

The semaphore is used as a counting semaphore, initialized with the
number of slots in the request pool minus IRQ_REQ_POOL_SIZE - the latter
are reserved for the in_interrupt() user to ensure that a request is
always available. The preemptible user will block on the semaphore,
waiting for a request to become available, in case none is free.

Replace in_interrupt() with a `can_sleep' argument to indicate if it is
safe to block on the semaphore.
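
For illustration only, a rough sketch of the reservation scheme described
above; the pool_* names and sizes are made up and not from the driver, while
sema_init()/down() and the spinlock calls are the real kernel API:

#include <linux/types.h>
#include <linux/semaphore.h>
#include <linux/spinlock.h>

#define POOL_SIZE       32      /* made-up numbers */
#define IRQ_POOL_SIZE   8

static struct semaphore pool_sema;
static DEFINE_SPINLOCK(pool_lock);
static void *free_slots[POOL_SIZE];
static int nr_free = POOL_SIZE;

static void pool_init(void)
{
        /* Task-context allocators may consume at most
         * POOL_SIZE - IRQ_POOL_SIZE slots; the rest stays reserved for
         * atomic (interrupt) context which cannot block.
         */
        sema_init(&pool_sema, POOL_SIZE - IRQ_POOL_SIZE);
}

static void *pool_alloc(bool can_sleep)
{
        unsigned long flags;
        void *slot = NULL;

        if (can_sleep)
                down(&pool_sema);       /* may block until a slot is free */

        spin_lock_irqsave(&pool_lock, flags);
        if (nr_free > 0)
                slot = free_slots[--nr_free];
        spin_unlock_irqrestore(&pool_lock, flags);

        return slot;
}

The real driver additionally tags atomic-context requests
(REQ_FROM_IRQ_POOL) so they are returned to the reserved part of the pool;
that detail is omitted here.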

Cc: linux-o...@vger.kernel.org
Signed-off-by: Ahmed S. Darwish 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/video/fbdev/omap/hwa742.c | 42 ---
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/video/fbdev/omap/hwa742.c 
b/drivers/video/fbdev/omap/hwa742.c
index cfe63932f8255..b191bef22d984 100644
--- a/drivers/video/fbdev/omap/hwa742.c
+++ b/drivers/video/fbdev/omap/hwa742.c
@@ -100,6 +100,14 @@ struct {
struct hwa742_request   req_pool[REQ_POOL_SIZE];
struct list_headpending_req_list;
struct list_headfree_req_list;
+
+   /*
+* @req_lock: protect request slots pool and its tracking lists
+* @req_sema: counter; slot allocators from task contexts must
+*push it down before acquiring a slot. This
+*guarantees that atomic contexts will always have
+*a minimum of IRQ_REQ_POOL_SIZE slots available.
+*/
struct semaphorereq_sema;
spinlock_t  req_lock;
 
@@ -224,13 +232,13 @@ static void disable_tearsync(void)
hwa742_write_reg(HWA742_NDP_CTRL, b);
 }
 
-static inline struct hwa742_request *alloc_req(void)
+static inline struct hwa742_request *alloc_req(bool can_sleep)
 {
unsigned long flags;
struct hwa742_request *req;
int req_flags = 0;
 
-   if (!in_interrupt())
+   if (can_sleep)
                down(&hwa742.req_sema);
else
req_flags = REQ_FROM_IRQ_POOL;
@@ -399,8 +407,8 @@ static void send_frame_complete(void *data)
hwa742.int_ctrl->enable_plane(OMAPFB_PLANE_GFX, 0);
 }
 
-#define ADD_PREQ(_x, _y, _w, _h) do {  \
-   req = alloc_req();  \
+#define ADD_PREQ(_x, _y, _w, _h, can_sleep) do {\
+   req = alloc_req(can_sleep); \
req->handler= send_frame_handler;   \
req->complete   = send_frame_complete;  \
req->par.update.x = _x; \
@@ -413,7 +421,8 @@ static void send_frame_complete(void *data)
 } while(0)
 
 static void create_req_list(struct omapfb_update_window *win,
-   struct list_head *req_head)
+   struct list_head *req_head,
+   bool can_sleep)
 {
struct hwa742_request *req;
int x = win->x;
@@ -427,7 +436,7 @@ static void create_req_list(struct omapfb_update_window 
*win,
color_mode = win->format & OMAPFB_FORMAT_MASK;
 
if (x & 1) {
-   ADD_PREQ(x, y, 1, height);
+   ADD_PREQ(x, y, 1, height, can_sleep);
width--;
x++;
flags &= ~OMAPFB_FORMAT_FLAG_TEARSYNC;
@@ -439,19 +448,19 @@ static void create_req_list(struct omapfb_update_window 
*win,
 
if (xspan * height * 2 > hwa742.max_transmit_size) {
yspan = hwa742.max_transmit_size / (xspan * 2);
-   ADD_PREQ(x, ystart, xspan, yspan);
+   ADD_PREQ(x, ystart, xspan, yspan, can_sleep);
ystart += yspan;
yspan = height - yspan;
flags &= ~OMAPFB_FORMAT_FLAG_TEARSYNC;
}
 
-   ADD_PREQ(x, ystart, xspan, yspan);
+   ADD_PREQ(x, ystart, xspan, yspan, can_sleep);
x += xspan;
width -= xspan;
flags &= ~OMAPFB_FORMAT_FLAG_TEARSYNC;
}
if (width)
-   ADD_PREQ(x, y, 1, height);
+   ADD_PREQ(x, y, 1, height, can_sleep);
 }
 
 static void auto_update_complete(void *data)
@@ -461,12 +470,12 @@ static void auto_update_complete(void *data)
  jiffies + HWA742_AUTO_UPDATE_TIME);
 }
 
-static void hwa742_update_window_auto(struct timer_list *unused)
+static void __hwa742_update_window_auto(bool can_sleep)
 {
LIST_HEAD(req_list);
struct hwa742_request *last;
 
-       create_req_list(&hwa742.auto_update_window, &req_list);
+       create_req_list(&hwa742.auto_update_window, &req_list, can_sleep);
las

[PATCH 0/3] video: Remove in_interrupt() usage.

2021-02-08 Thread Sebastian Andrzej Siewior
Folks,

in the discussion about preempt count consistency across kernel
configurations:

 https://lore.kernel.org/r/20200914204209.256266...@linutronix.de/

it was concluded that the usage of in_interrupt() and related context
checks should be removed from non-core code.

In the long run, usage of 'preemptible, in_*irq etc.' should be banned from
driver code completely.

This series targets the video subsystem. The omap patches are a repost
of [0], the amba-clcd is new after I received no feedback on my analysis
[1].

[0] https://lkml.kernel.org/r/20210127172902.145335-1-bige...@linutronix.de
[1] https://lkml.kernel.org/r/20210127174408.ududpwfrbg3dh...@linutronix.de

Sebastian




[PATCH 3/3] video: fbdev: amba-clcd: Always use msleep() for waiting

2021-02-08 Thread Sebastian Andrzej Siewior
The driver uses in_atomic() to distinguish between mdelay() and msleep().

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

I traced the usage of in_atomic() back to its initial merge:
bfe694f833643 ("[ARM] Add ARM AMBA CLCD framebuffer driver.")
https://git.kernel.org/history/history/c/bfe694f833643

The driver has been removed and added back in the meantime.
I've been looking for the IRQ context as described in the comment and
couldn't find it. The functions calling clcdfb_sleep() also conditionally
call backlight_update_status() which acquires a mutex. If it is
okay to acquire a mutex then it is okay to use msleep() since both
functions must be used in preemptible context.

Replace clcdfb_sleep() with msleep().

Cc: Peter Collingbourne 
Cc: Russell King 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/video/fbdev/amba-clcd.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/video/fbdev/amba-clcd.c b/drivers/video/fbdev/amba-clcd.c
index 33595cc4778e9..9ec969e136bfd 100644
--- a/drivers/video/fbdev/amba-clcd.c
+++ b/drivers/video/fbdev/amba-clcd.c
@@ -35,19 +35,6 @@
 /* This is limited to 16 characters when displayed by X startup */
 static const char *clcd_name = "CLCD FB";
 
-/*
- * Unfortunately, the enable/disable functions may be called either from
- * process or IRQ context, and we _need_ to delay.  This is _not_ good.
- */
-static inline void clcdfb_sleep(unsigned int ms)
-{
-   if (in_atomic()) {
-   mdelay(ms);
-   } else {
-   msleep(ms);
-   }
-}
-
 static inline void clcdfb_set_start(struct clcd_fb *fb)
 {
unsigned long ustart = fb->fb.fix.smem_start;
@@ -77,7 +64,7 @@ static void clcdfb_disable(struct clcd_fb *fb)
val &= ~CNTL_LCDPWR;
writel(val, fb->regs + fb->off_cntl);
 
-   clcdfb_sleep(20);
+   msleep(20);
}
if (val & CNTL_LCDEN) {
val &= ~CNTL_LCDEN;
@@ -109,7 +96,7 @@ static void clcdfb_enable(struct clcd_fb *fb, u32 cntl)
cntl |= CNTL_LCDEN;
writel(cntl, fb->regs + fb->off_cntl);
 
-   clcdfb_sleep(20);
+   msleep(20);
 
/*
 * and now apply power.
-- 
2.30.0



[PATCH 2/3] video: omapfb: Remove WARN_ON(in_interrupt()).

2021-02-08 Thread Sebastian Andrzej Siewior
From: "Ahmed S. Darwish" 

dsi_sync_vc() uses in_interrupt() to create a warning if the function is
used in non-preemptible context.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

The wait_for_completion() function (used in dsi_sync_vc_vp() and
dsi_sync_vc_l4()) already has a check that it is invoked from proper
context.

Remove WARN_ON(in_interrupt()) from the driver.

Cc: linux-o...@vger.kernel.org
Signed-off-by: Ahmed S. Darwish 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/video/fbdev/omap2/omapfb/dss/dsi.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/video/fbdev/omap2/omapfb/dss/dsi.c 
b/drivers/video/fbdev/omap2/omapfb/dss/dsi.c
index dc34bb04b865c..df90091de75f8 100644
--- a/drivers/video/fbdev/omap2/omapfb/dss/dsi.c
+++ b/drivers/video/fbdev/omap2/omapfb/dss/dsi.c
@@ -2373,8 +2373,6 @@ static int dsi_sync_vc(struct platform_device *dsidev, 
int channel)
 
WARN_ON_ONCE(!dsi_bus_is_locked(dsidev));
 
-   WARN_ON(in_interrupt());
-
if (!dsi_vc_is_enabled(dsidev, channel))
return 0;
 
-- 
2.30.0



[PATCH 2/2] video: omapfb: Remove WARN_ON(in_interrupt()).

2021-01-28 Thread Sebastian Andrzej Siewior
From: "Ahmed S. Darwish" 

dsi_sync_vc() uses in_interrupt() to create a warning if the function is
used in non-preemptible context.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

The wait_for_completion() function (used in dsi_sync_vc_vp() and
dsi_sync_vc_l4()) already has a check that it is invoked from proper
context.

Remove WARN_ON(in_interrupt()) from the driver.

Signed-off-by: Ahmed S. Darwish 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/video/fbdev/omap2/omapfb/dss/dsi.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/video/fbdev/omap2/omapfb/dss/dsi.c 
b/drivers/video/fbdev/omap2/omapfb/dss/dsi.c
index dc34bb04b865c..df90091de75f8 100644
--- a/drivers/video/fbdev/omap2/omapfb/dss/dsi.c
+++ b/drivers/video/fbdev/omap2/omapfb/dss/dsi.c
@@ -2373,8 +2373,6 @@ static int dsi_sync_vc(struct platform_device *dsidev, 
int channel)
 
WARN_ON_ONCE(!dsi_bus_is_locked(dsidev));
 
-   WARN_ON(in_interrupt());
-
if (!dsi_vc_is_enabled(dsidev, channel))
return 0;
 
-- 
2.30.0



[RFC] in_atomic() usage in amba-clcd.c / FB_ARMCLCD

2021-01-28 Thread Sebastian Andrzej Siewior
The amba-clcd driver uses in_atomic() in clcdfb_sleep():
| /*
|  * Unfortunately, the enable/disable functions may be called either from
|  * process or IRQ context, and we _need_ to delay.  This is _not_ good.
|  */
| static inline void clcdfb_sleep(unsigned int ms)
| {
| if (in_atomic()) {
| mdelay(ms);
| } else {
| msleep(ms);
| }
| }

I traced it back to its initial merge:
bfe694f833643 ("[ARM] Add ARM AMBA CLCD framebuffer driver.")
https://git.kernel.org/history/history/c/bfe694f833643

The driver has been removed and added back in the meantime. 
I've been looking for the IRQ context as described in the comment and
couldn't find it. The functions calling clcdfb_sleep() also conditionally
call backlight_update_status() which acquires a mutex.

Is this part really outdated, so that msleep() could now be used
unconditionally?

Sebastian


[PATCH 1/2] video: omap: Remove in_interrupt() usage.

2021-01-28 Thread Sebastian Andrzej Siewior
From: "Ahmed S. Darwish" 

alloc_req() uses in_interrupt() to detect if it is safe to use down().

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

The semaphore is used as a counting semaphore, initialized with the
number of slots in the request pool minus IRQ_REQ_POOL_SIZE - the latter
are reserved for the in_interrupt() user to ensure that a request is
always available. The preemptible user will block on the semaphore,
waiting for a request to become available, in case none is free.

Replace in_interrupt() with a `can_sleep' argument to indicate if it is
safe to block on the semaphore.

Signed-off-by: Ahmed S. Darwish 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/video/fbdev/omap/hwa742.c | 42 ---
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/video/fbdev/omap/hwa742.c b/drivers/video/fbdev/omap/hwa742.c
index cfe63932f8255..b191bef22d984 100644
--- a/drivers/video/fbdev/omap/hwa742.c
+++ b/drivers/video/fbdev/omap/hwa742.c
@@ -100,6 +100,14 @@ struct {
struct hwa742_request   req_pool[REQ_POOL_SIZE];
struct list_headpending_req_list;
struct list_headfree_req_list;
+
+   /*
+* @req_lock: protect request slots pool and its tracking lists
+* @req_sema: counter; slot allocators from task contexts must
+*push it down before acquiring a slot. This
+*guarantees that atomic contexts will always have
+*a minimum of IRQ_REQ_POOL_SIZE slots available.
+*/
struct semaphorereq_sema;
spinlock_t  req_lock;
 
@@ -224,13 +232,13 @@ static void disable_tearsync(void)
hwa742_write_reg(HWA742_NDP_CTRL, b);
 }
 
-static inline struct hwa742_request *alloc_req(void)
+static inline struct hwa742_request *alloc_req(bool can_sleep)
 {
unsigned long flags;
struct hwa742_request *req;
int req_flags = 0;
 
-   if (!in_interrupt())
+   if (can_sleep)
		down(&hwa742.req_sema);
else
req_flags = REQ_FROM_IRQ_POOL;
@@ -399,8 +407,8 @@ static void send_frame_complete(void *data)
hwa742.int_ctrl->enable_plane(OMAPFB_PLANE_GFX, 0);
 }
 
-#define ADD_PREQ(_x, _y, _w, _h) do {  \
-   req = alloc_req();  \
+#define ADD_PREQ(_x, _y, _w, _h, can_sleep) do {\
+   req = alloc_req(can_sleep); \
req->handler= send_frame_handler;   \
req->complete   = send_frame_complete;  \
req->par.update.x = _x; \
@@ -413,7 +421,8 @@ static void send_frame_complete(void *data)
 } while(0)
 
 static void create_req_list(struct omapfb_update_window *win,
-   struct list_head *req_head)
+   struct list_head *req_head,
+   bool can_sleep)
 {
struct hwa742_request *req;
int x = win->x;
@@ -427,7 +436,7 @@ static void create_req_list(struct omapfb_update_window *win,
color_mode = win->format & OMAPFB_FORMAT_MASK;
 
if (x & 1) {
-   ADD_PREQ(x, y, 1, height);
+   ADD_PREQ(x, y, 1, height, can_sleep);
width--;
x++;
flags &= ~OMAPFB_FORMAT_FLAG_TEARSYNC;
@@ -439,19 +448,19 @@ static void create_req_list(struct omapfb_update_window *win,
 
if (xspan * height * 2 > hwa742.max_transmit_size) {
yspan = hwa742.max_transmit_size / (xspan * 2);
-   ADD_PREQ(x, ystart, xspan, yspan);
+   ADD_PREQ(x, ystart, xspan, yspan, can_sleep);
ystart += yspan;
yspan = height - yspan;
flags &= ~OMAPFB_FORMAT_FLAG_TEARSYNC;
}
 
-   ADD_PREQ(x, ystart, xspan, yspan);
+   ADD_PREQ(x, ystart, xspan, yspan, can_sleep);
x += xspan;
width -= xspan;
flags &= ~OMAPFB_FORMAT_FLAG_TEARSYNC;
}
if (width)
-   ADD_PREQ(x, y, 1, height);
+   ADD_PREQ(x, y, 1, height, can_sleep);
 }
 
 static void auto_update_complete(void *data)
@@ -461,12 +470,12 @@ static void auto_update_complete(void *data)
  jiffies + HWA742_AUTO_UPDATE_TIME);
 }
 
-static void hwa742_update_window_auto(struct timer_list *unused)
+static void __hwa742_update_window_auto(bool can_sleep)
 {
LIST_HEAD(req_list);
struct hwa742_request *last;
 
-   create_req_list(&hwa742.auto_update_window, &req_list);
+   create_req_list(&hwa742.auto_update_window, &req_list, can_sleep);
last = list_entry(req_list.

[PATCH 0/2] video: omap*: Remove in_interrupt() usage.

2021-01-28 Thread Sebastian Andrzej Siewior
Folks,

in the discussion about preempt count consistency across kernel
configurations:

 https://lore.kernel.org/r/20200914204209.256266...@linutronix.de/

it was concluded that the usage of in_interrupt() and related context
checks should be removed from non-core code.

In the long run, usage of 'preemptible, in_*irq etc.' should be banned from
driver code completely.

This series targets the video subsystem.

Sebastian



Re: [patch V3 13/37] mips/mm/highmem: Switch to generic kmap atomic

2021-01-11 Thread Sebastian Andrzej Siewior
On 2021-01-09 01:33:52 [+0100], Thomas Bogendoerfer wrote:
> On Sat, Jan 09, 2021 at 12:58:05AM +0100, Thomas Bogendoerfer wrote:
> > On Fri, Jan 08, 2021 at 08:20:43PM +, Paul Cercueil wrote:
> > > Hi Thomas,
> > > 
> > > 5.11 does not boot anymore on Ingenic SoCs, I bisected it to this commit.
> > > 
> > > Any idea what could be happening?
> > 
> > not yet, kernel crash log of a Malta QEMU is below.
> 
> update:
> 
> This dirty hack lets the Malta QEMU boot again:
> 
> diff --git a/mm/highmem.c b/mm/highmem.c
> index c3a9ea7875ef..190cdda1149d 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -515,7 +515,7 @@ void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot)
>   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
>   BUG_ON(!pte_none(*(kmap_pte - idx)));
>   pteval = pfn_pte(pfn, prot);
> - set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval);
> + set_pte(kmap_pte - idx, pteval);
>   arch_kmap_local_post_map(vaddr, pteval);
>   current->kmap_ctrl.pteval[kmap_local_idx()] = pteval;
>   preempt_enable();
> 
> set_pte_at() tries to update the cache and could do a kmap_atomic() there.
So the old implementation used set_pte() while the new one uses
set_pte_at().

> Not sure, if this is allowed at this point.
The problem is the recursion
  kmap_atomic() -> __update_cache() -> kmap_atomic()

and kmap_local_idx_push() runs out of index space before stack space.

I'm not sure if __update_cache() ever worked for highmem. Highmem handling
was added in commit
  f4281bba81810 ("MIPS: Handle highmem pages in __update_cache")

but it assumes that the address returned by kmap_atomic() is the same, or
related enough, for flush_data_cache_page() to work.

> Thomas.
> 

Sebastian


Re: [patch V3 10/37] ARM: highmem: Switch to generic kmap atomic

2020-11-13 Thread Sebastian Andrzej Siewior
On 2020-11-12 09:10:34 [+0100], Marek Szyprowski wrote:
> I can do more tests to help fixing this issue. Just let me know what to do.

-> https://lkml.kernel.org/r/87y2j6n8mj@nanos.tec.linutronix.de

Sebastian


Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-02 Thread Sebastian Andrzej Siewior
On 2020-04-30 16:10:16 [-0600], Jason A. Donenfeld wrote:
> Sometimes it's not okay to use SIMD registers, the conditions for which
> have changed subtly from kernel release to kernel release. Usually the
> pattern is to check for may_use_simd() and then fallback to using
> something slower in the unlikely case SIMD registers aren't available.
> So, this patch fixes up i915's accelerated memcpy routines to fallback
> to boring memcpy if may_use_simd() is false.

That would indicate that these functions are used from IRQ/softirq
context, where they would otherwise break if the kernel is also using the
SIMD registers. The crypto code uses may_use_simd() for exactly that
purpose.

So
   Reviewed-by: Sebastian Andrzej Siewior 

May I ask how large the memcpy can be? I'm asking because, if it is
large, an explicit rescheduling point might be needed.
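
Something along the lines of the sketch below is what I have in mind. It
illustrates the guard-plus-fallback pattern only and is not the actual
i915 helper; simd_copy_chunk() is a made-up stand-in for the accelerated
routine:

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/string.h>
#include <asm/fpu/api.h>	/* kernel_fpu_begin()/kernel_fpu_end() (x86) */
#include <asm/simd.h>		/* may_use_simd() */

/* Hypothetical accelerated copy, stands in for the driver's routine. */
void simd_copy_chunk(void *dst, const void *src, size_t len);

static void copy_maybe_simd(void *dst, const void *src, size_t len)
{
	if (!may_use_simd()) {
		memcpy(dst, src, len);
		return;
	}

	while (len) {
		size_t chunk = len > 4096 ? 4096 : len;

		kernel_fpu_begin();
		simd_copy_chunk(dst, src, chunk);
		kernel_fpu_end();

		dst += chunk;
		src += chunk;
		len -= chunk;

		/* Rescheduling point for large copies; only valid when the
		 * caller is known to be in process context. */
		cond_resched();
	}
}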

> Cc: sta...@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld 

Sebastian


[PATCH 2/2 v2] drm/vmwgfx: Remove a few unused functions

2020-03-03 Thread Sebastian Andrzej Siewior
I noticed that there is a prototype for vmw_fifo_ping_host_locked() but
no function body. Looking further, I noticed more functions which are no
longer used, as well as function prototypes which remained after the
function itself was removed.

Remove the unused functions and prototypes.

Signed-off-by: Sebastian Andrzej Siewior 
---
v1…v2: Also remove the "VGA registers" section in struct vmw_private, as
   suggested by Emil Velikov. It has been unused since commit a278724aa23c
   ("drm/vmwgfx: Implement fbdev on kms v2").

On 2020-02-28 11:42:46 [+], Emil Velikov wrote:
> Hi Sebastian,
Hi Emil,

> Seems like the last user was removed in 2015 with commit a278724aa23c
> ("drm/vmwgfx: Implement fbdev on kms v2").
> 
> With this patch, the section of "VGA registers" in struct vmw_private
> becomes dead code.
> IMHO it would make sense to also remove those with this patch.

Okay, thanks, removed.

> HTH
> Emil
   
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 28 -
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 81 -
 drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c | 31 --
 3 files changed, 140 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 86b69397d166e..5c8d643d99c0f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -479,19 +479,6 @@ struct vmw_private {
bool assume_16bpp;
bool has_sm4_1;
 
-   /*
-* VGA registers.
-*/
-
-   struct vmw_vga_topology_state vga_save[VMWGFX_MAX_DISPLAYS];
-   uint32_t vga_width;
-   uint32_t vga_height;
-   uint32_t vga_bpp;
-   uint32_t vga_bpl;
-   uint32_t vga_pitchlock;
-
-   uint32_t num_displays;
-
/*
 * Framebuffer info.
 */
@@ -900,7 +887,6 @@ extern void vmw_fifo_commit(struct vmw_private *dev_priv, uint32_t bytes);
 extern void vmw_fifo_commit_flush(struct vmw_private *dev_priv, uint32_t bytes);
 extern int vmw_fifo_send_fence(struct vmw_private *dev_priv,
   uint32_t *seqno);
-extern void vmw_fifo_ping_host_locked(struct vmw_private *, uint32_t reason);
 extern void vmw_fifo_ping_host(struct vmw_private *dev_priv, uint32_t reason);
 extern bool vmw_fifo_have_3d(struct vmw_private *dev_priv);
 extern bool vmw_fifo_have_pitchlock(struct vmw_private *dev_priv);
@@ -947,7 +933,6 @@ extern struct ttm_placement vmw_mob_placement;
 extern struct ttm_placement vmw_mob_ne_placement;
 extern struct ttm_placement vmw_nonfixed_placement;
 extern struct ttm_bo_driver vmw_bo_driver;
-extern int vmw_dma_quiescent(struct drm_device *dev);
 extern int vmw_bo_map_dma(struct ttm_buffer_object *bo);
 extern void vmw_bo_unmap_dma(struct ttm_buffer_object *bo);
 extern const struct vmw_sg_table *
@@ -1085,8 +1070,6 @@ int vmw_fb_on(struct vmw_private *vmw_priv);
 
 int vmw_kms_init(struct vmw_private *dev_priv);
 int vmw_kms_close(struct vmw_private *dev_priv);
-int vmw_kms_save_vga(struct vmw_private *vmw_priv);
-int vmw_kms_restore_vga(struct vmw_private *vmw_priv);
 int vmw_kms_cursor_bypass_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv);
 void vmw_kms_cursor_post_execbuf(struct vmw_private *dev_priv);
@@ -1139,7 +1122,6 @@ int vmw_overlay_init(struct vmw_private *dev_priv);
 int vmw_overlay_close(struct vmw_private *dev_priv);
 int vmw_overlay_ioctl(struct drm_device *dev, void *data,
  struct drm_file *file_priv);
-int vmw_overlay_stop_all(struct vmw_private *dev_priv);
 int vmw_overlay_resume_all(struct vmw_private *dev_priv);
 int vmw_overlay_pause_all(struct vmw_private *dev_priv);
 int vmw_overlay_claim(struct vmw_private *dev_priv, uint32_t *out);
@@ -1186,10 +1168,6 @@ extern void vmw_otables_takedown(struct vmw_private *dev_priv);
 
 extern const struct vmw_user_resource_conv *user_context_converter;
 
-extern int vmw_context_check(struct vmw_private *dev_priv,
-struct ttm_object_file *tfile,
-int id,
-struct vmw_resource **p_res);
 extern int vmw_context_define_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv);
 extern int vmw_extended_context_define_ioctl(struct drm_device *dev, void *data,
@@ -1219,7 +1197,6 @@ vmw_context_get_dx_query_mob(struct vmw_resource *ctx_res);
 
 extern const struct vmw_user_resource_conv *user_surface_converter;
 
-extern void vmw_surface_res_free(struct vmw_resource *res);
 extern int vmw_surface_destroy_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv);
 extern int vmw_surface_define_ioctl(struct drm_device *dev, void *data,
@@ -1230,11 +1207,6 @@ extern int vmw_gb_surface_define_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)

[PATCH 2/2] drm/vmwgfx: Remove a few unused functions

2020-02-24 Thread Sebastian Andrzej Siewior
I noticed that there is a prototype for vmw_fifo_ping_host_locked() but
no function body. Looking further, I noticed more functions which are no
longer used, as well as function prototypes which remained after the
function itself was removed.

Remove the unused functions and prototypes.

Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 15 -
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 81 -
 drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c | 31 --
 3 files changed, 127 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 86b69397d166e..0df2f6bc00a9c 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -900,7 +900,6 @@ extern void vmw_fifo_commit(struct vmw_private *dev_priv, uint32_t bytes);
 extern void vmw_fifo_commit_flush(struct vmw_private *dev_priv, uint32_t bytes);
 extern int vmw_fifo_send_fence(struct vmw_private *dev_priv,
   uint32_t *seqno);
-extern void vmw_fifo_ping_host_locked(struct vmw_private *, uint32_t reason);
 extern void vmw_fifo_ping_host(struct vmw_private *dev_priv, uint32_t reason);
 extern bool vmw_fifo_have_3d(struct vmw_private *dev_priv);
 extern bool vmw_fifo_have_pitchlock(struct vmw_private *dev_priv);
@@ -947,7 +946,6 @@ extern struct ttm_placement vmw_mob_placement;
 extern struct ttm_placement vmw_mob_ne_placement;
 extern struct ttm_placement vmw_nonfixed_placement;
 extern struct ttm_bo_driver vmw_bo_driver;
-extern int vmw_dma_quiescent(struct drm_device *dev);
 extern int vmw_bo_map_dma(struct ttm_buffer_object *bo);
 extern void vmw_bo_unmap_dma(struct ttm_buffer_object *bo);
 extern const struct vmw_sg_table *
@@ -1085,8 +1083,6 @@ int vmw_fb_on(struct vmw_private *vmw_priv);
 
 int vmw_kms_init(struct vmw_private *dev_priv);
 int vmw_kms_close(struct vmw_private *dev_priv);
-int vmw_kms_save_vga(struct vmw_private *vmw_priv);
-int vmw_kms_restore_vga(struct vmw_private *vmw_priv);
 int vmw_kms_cursor_bypass_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv);
 void vmw_kms_cursor_post_execbuf(struct vmw_private *dev_priv);
@@ -1139,7 +1135,6 @@ int vmw_overlay_init(struct vmw_private *dev_priv);
 int vmw_overlay_close(struct vmw_private *dev_priv);
 int vmw_overlay_ioctl(struct drm_device *dev, void *data,
  struct drm_file *file_priv);
-int vmw_overlay_stop_all(struct vmw_private *dev_priv);
 int vmw_overlay_resume_all(struct vmw_private *dev_priv);
 int vmw_overlay_pause_all(struct vmw_private *dev_priv);
 int vmw_overlay_claim(struct vmw_private *dev_priv, uint32_t *out);
@@ -1186,10 +1181,6 @@ extern void vmw_otables_takedown(struct vmw_private *dev_priv);
 
 extern const struct vmw_user_resource_conv *user_context_converter;
 
-extern int vmw_context_check(struct vmw_private *dev_priv,
-struct ttm_object_file *tfile,
-int id,
-struct vmw_resource **p_res);
 extern int vmw_context_define_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv);
 extern int vmw_extended_context_define_ioctl(struct drm_device *dev, void *data,
@@ -1219,7 +1210,6 @@ vmw_context_get_dx_query_mob(struct vmw_resource *ctx_res);
 
 extern const struct vmw_user_resource_conv *user_surface_converter;
 
-extern void vmw_surface_res_free(struct vmw_resource *res);
 extern int vmw_surface_destroy_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv);
 extern int vmw_surface_define_ioctl(struct drm_device *dev, void *data,
@@ -1230,11 +1220,6 @@ extern int vmw_gb_surface_define_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv);
 extern int vmw_gb_surface_reference_ioctl(struct drm_device *dev, void *data,
  struct drm_file *file_priv);
-extern int vmw_surface_check(struct vmw_private *dev_priv,
-struct ttm_object_file *tfile,
-uint32_t handle, int *id);
-extern int vmw_surface_validate(struct vmw_private *dev_priv,
-   struct vmw_surface *srf);
 int vmw_surface_gb_priv_define(struct drm_device *dev,
   uint32_t user_accounting_size,
   SVGA3dSurfaceAllFlags svga3d_flags,
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index f47d5710cc951..dbd80fcf0c6aa 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -1897,87 +1897,6 @@ int vmw_kms_write_svga(struct vmw_private *vmw_priv,
return 0;
 }
 
-int vmw_kms_save_vga(struct vmw_private *vmw_priv)
-{
-   struct vmw_vga_topology_state *save;
-   uint32_t i;
-
-   vmw_priv->vga_wi

[PATCH 1/2] drm/vmwgfx: Drop preempt_disable() in vmw_fifo_ping_host()

2020-02-24 Thread Sebastian Andrzej Siewior
vmw_fifo_ping_host() disables preemption around a test and a register
write via vmw_write(). The write function acquires a spinlock_t-typed
lock, which is not allowed inside a preempt_disable()ed section on
PREEMPT_RT. This has been reported in bugzilla.

It has been explained by Thomas Hellstrom that this preempt_disable()ed
section is not required for correctness.

Remove the preempt_disable() section.
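
For reference, the RT constraint in its smallest form (illustrative code
only, not vmwgfx code): on PREEMPT_RT a spinlock_t is backed by an
rt_mutex and may sleep, so acquiring it with preemption disabled triggers
a "sleeping function called from invalid context" splat:

#include <linux/preempt.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

static void demo_invalid_on_rt(void)
{
	preempt_disable();
	spin_lock(&demo_lock);	/* may sleep on PREEMPT_RT -> invalid here */
	spin_unlock(&demo_lock);
	preempt_enable();
}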

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206591
Link: 
https://lkml.kernel.org/r/0b5e1c65d89951de993deab06d1d197b40fd67aa.ca...@vmware.com
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
index e5252ef3812f0..6941689085ed3 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
@@ -169,10 +169,8 @@ void vmw_fifo_ping_host(struct vmw_private *dev_priv, uint32_t reason)
 {
u32 *fifo_mem = dev_priv->mmio_virt;
 
-   preempt_disable();
if (cmpxchg(fifo_mem + SVGA_FIFO_BUSY, 0, 1) == 0)
vmw_write(dev_priv, SVGA_REG_SYNC, reason);
-   preempt_enable();
 }
 
 void vmw_fifo_release(struct vmw_private *dev_priv, struct vmw_fifo_state *fifo)
-- 
2.25.1


