Re: [Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init

2020-07-01 Thread Michal Wajdeczko


On 01.07.2020 17:17, Chris Wilson wrote:
> Quoting Michał Winiarski (2020-07-01 16:07:21)
>> From: Michał Winiarski 
>>
>> Getting wedged device on driver init is pretty much unrecoverable.
>> Since we're running verious scenarios that may potentially hit this in

typo

>> CI (module reload / selftests / hotunplug), and if it happens, it means
>> that we can't trust any subsequent CI results, we should just apply the
>> taint to let the CI know that it should reboot (CI checks taint between
>> test runs).
>>
>> Signed-off-by: Michał Winiarski 
>> Cc: Chris Wilson 
>> Cc: Petri Latvala 
>> ---
>>  drivers/gpu/drm/i915/gt/intel_reset.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
>> index 0156f1f5c736..d27e8bb7d550 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>> @@ -1360,6 +1360,8 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt)
>>  I915_WEDGED_ON_INIT);
>> intel_gt_set_wedged(gt);
>> set_bit(I915_WEDGED_ON_INIT, &gt->reset.flags);
>> +
> 
> Ah, we don't say around here that this WEDGED_ON_INIT is non-recoverable,
> could you please add a comment to that effect?
> 

Such a comment is already in the WEDGED_ON_INIT description, but
repeating it here will definitely help.

>> +   add_taint_for_CI(TAINT_WARN);

By the way, today we taint the kernel for CI silently and from several
different places, so maybe it is worth adding a debug log there, using
__builtin_return_address(), to make it easier to diagnose why we
stopped CI?

with typo/comment fixed,
Reviewed-by: Michal Wajdeczko 
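
The logging suggestion above could look roughly like the following
userspace sketch. Everything in it is an assumption made for
illustration: add_taint_logged(), taint_mask and last_taint_caller are
hypothetical names, and TAINT_WARN is a local stand-in for the kernel's
bit number. It is not the driver's add_taint_for_CI(). Only
__builtin_return_address() itself, a GCC/Clang builtin, is taken from
the suggestion.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for the kernel's TAINT_WARN bit number (assumption). */
#define TAINT_WARN 9

/* Hypothetical userspace model of the kernel taint mask. */
static unsigned long taint_mask;
/* Address of the most recent caller that tainted, kept for diagnosis. */
static void *last_taint_caller;

/*
 * Hypothetical helper: record the taint and log who asked for it.
 * noinline keeps __builtin_return_address(0) pointing at the real caller
 * instead of at whatever function this one got inlined into.
 */
static __attribute__((noinline)) void add_taint_logged(unsigned int flag)
{
	void *caller = __builtin_return_address(0);

	assert(flag < 8 * sizeof(taint_mask));
	fprintf(stderr, "CI taint: flag %u set from %p\n", flag, caller);
	taint_mask |= 1UL << flag;
	last_taint_caller = caller;
}
```

Calling add_taint_logged(TAINT_WARN) sets the bit and prints the raw
return address. In the kernel the address would more likely be printed
with the %pS format so that it resolves to a symbol name, but that part
is not shown here.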

>>  }
>>  
>>  void intel_gt_init_reset(struct intel_gt *gt)
>> -- 
>> 2.27.0
>>
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init

2020-07-01 Thread Chris Wilson
Quoting Michał Winiarski (2020-07-01 16:07:21)
> From: Michał Winiarski 
> 
> Getting wedged device on driver init is pretty much unrecoverable.
> Since we're running verious scenarios that may potentially hit this in
> CI (module reload / selftests / hotunplug), and if it happens, it means
> that we can't trust any subsequent CI results, we should just apply the
> taint to let the CI know that it should reboot (CI checks taint between
> test runs).
> 
> Signed-off-by: Michał Winiarski 
> Cc: Chris Wilson 
> Cc: Petri Latvala 
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 0156f1f5c736..d27e8bb7d550 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -1360,6 +1360,8 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt)
>  I915_WEDGED_ON_INIT);
> intel_gt_set_wedged(gt);
> set_bit(I915_WEDGED_ON_INIT, &gt->reset.flags);
> +

Ah, we don't say around here that this WEDGED_ON_INIT is non-recoverable,
could you please add a comment to that effect?

> +   add_taint_for_CI(TAINT_WARN);
>  }
>  
>  void intel_gt_init_reset(struct intel_gt *gt)
> -- 
> 2.27.0
>


Re: [Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init

2020-07-01 Thread Chris Wilson
Quoting Michał Winiarski (2020-07-01 16:07:21)
> From: Michał Winiarski 
> 
> Getting wedged device on driver init is pretty much unrecoverable.
> Since we're running verious scenarios that may potentially hit this in
> CI (module reload / selftests / hotunplug), and if it happens, it means
> that we can't trust any subsequent CI results, we should just apply the
> taint to let the CI know that it should reboot (CI checks taint between
> test runs).

Ok, we treat WEDGED_ON_INIT as non-recoverable [as opposed to the less
wedged WEDGED].
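
The distinction drawn above can be illustrated with a toy model. This is
a sketch under stated assumptions, not the driver's code: struct
gt_model, the bit positions and all three functions are hypothetical
stand-ins, chosen only to show that a plain WEDGED state can be cleared
while WEDGED_ON_INIT is treated as terminal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the driver uses different values. */
enum { WEDGED = 0, WEDGED_ON_INIT = 1 };

/* Hypothetical stand-in for the reset state kept in struct intel_gt. */
struct gt_model {
	unsigned long reset_flags;
};

static void set_wedged(struct gt_model *gt)
{
	gt->reset_flags |= 1UL << WEDGED;
}

/* Wedging during init marks the device wedged and flags it as terminal. */
static void set_wedged_on_init(struct gt_model *gt)
{
	set_wedged(gt);
	gt->reset_flags |= 1UL << WEDGED_ON_INIT;
}

/* Returns true if the model could be unwedged, false if it is terminal. */
static bool try_unset_wedged(struct gt_model *gt)
{
	if (gt->reset_flags & (1UL << WEDGED_ON_INIT))
		return false;	/* never fully initialised, no recovery */

	gt->reset_flags &= ~(1UL << WEDGED);
	return true;
}
```

In this model a device wedged at runtime recovers through
try_unset_wedged(), whereas one wedged during init stays wedged, which
is the case the patch reports to CI through the taint.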
 
> Signed-off-by: Michał Winiarski 
> Cc: Chris Wilson 
> Cc: Petri Latvala 
Reviewed-by: Chris Wilson 
-Chris


[Intel-gfx] [PATCH] drm/i915: Reboot CI if we get wedged during driver init

2020-07-01 Thread Michał Winiarski
From: Michał Winiarski 

Getting a wedged device on driver init is pretty much unrecoverable.
Since we're running various scenarios that may potentially hit this in
CI (module reload / selftests / hotunplug), and if it happens, it means
that we can't trust any subsequent CI results, we should just apply the
taint to let the CI know that it should reboot (CI checks taint between
test runs).

Signed-off-by: Michał Winiarski 
Cc: Chris Wilson 
Cc: Petri Latvala 
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 0156f1f5c736..d27e8bb7d550 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1360,6 +1360,8 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt)
 		     I915_WEDGED_ON_INIT);
 	intel_gt_set_wedged(gt);
 	set_bit(I915_WEDGED_ON_INIT, &gt->reset.flags);
+
+   add_taint_for_CI(TAINT_WARN);
 }
 
 void intel_gt_init_reset(struct intel_gt *gt)
-- 
2.27.0
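
The commit message says that CI checks the taint between test runs. A
minimal sketch of such a check follows, assuming the runner reads the
decimal bitmask that Linux exposes in /proc/sys/kernel/tainted. The
helper names are hypothetical, TAINT_WARN is a local stand-in for the
kernel's bit number, and the actual CI tooling may look at other bits
or use a different mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Local stand-in for the kernel's TAINT_WARN bit number (assumption). */
#define TAINT_WARN 9

/*
 * Parse the decimal bitmask text, as read from /proc/sys/kernel/tainted,
 * and report whether the given taint bit is set.
 */
static bool taint_bit_set(const char *tainted_text, unsigned int flag)
{
	unsigned long mask;

	assert(flag < 8 * sizeof(mask));
	mask = strtoul(tainted_text, NULL, 10);
	return (mask >> flag) & 1UL;
}

/* Hypothetical runner policy: reboot once TAINT_WARN has been set. */
static bool ci_should_reboot(const char *tainted_text)
{
	return taint_bit_set(tainted_text, TAINT_WARN);
}
```

A runner would read the file contents into a buffer after each test and
pass them to ci_should_reboot(). Reading the file is left out so that
the sketch does not depend on the host's taint state.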
