Re: [PATCH] drm/i915/selftests: Use preemption timeout on cleanup
Hi Janusz,
> > > > > > +
> > > > > > cond_resched();
> > > > > >
> > > > > > - if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
> > > > > > + if (intel_gt_wait_for_idle(gt, HZ * timeout_ms / 500)
> > > > > > == -ETIME) {
> > > > >
> > > > > where is this 500 coming from?
> > > >
> > > > / 1000 would convert it to seconds as needed, and / 500 used instead
> > > > was
> > > > supposed to to mean that we are willing to wait for preempt_timeout_ms
> > > > *
> > 2.
> > > > Sorry for that shortcut. Would you like me to provide a clarifying
> > > > comment,
> > > > or maybe better use explicit 2 * preempt_timeout / 1000 ?
> > >
> > > It was clear that you were doubling it, but what's more
> > > interesting to know (perhaps in a comment) is why you are
> > > choosing to use the double of the timeout_ms instead of other
> > > values.
> > >
> > > Makes sense?
> >
> > Yes, good question.
> >
> > Is it possible for more than one bb to hang? If yes then should we wait
> > longer than the longest preemption timeout? Before I assumed that maybe we
> > should, just in case, but now, having that revisited and reconsidered, I
> > tend
> > to agree that the longest preempt timeout, perhaps with a small margin
> > (let's
> > say +100ms) should be enough to recover from a single failing test case.
> > Let
> > me verify if that works for the linked case.
>
> I've done some testing and got a confirmation that the issue I'm trying to
> address in the first place requires a timeout almost twice as long as the
> longest preemption timeout.
>
> I propose the following correction:
>
> - if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
> + /* 2 x longest preempt timeout, experimentally determined */
> + if (intel_gt_wait_for_idle(gt, 2 * timeout_ms * HZ / 1000) == -ETIME) {
with this change, I merge your patch to drm-intel-next.
Thanks,
Andi
Re: [PATCH] drm/i915/selftests: Use preemption timeout on cleanup
Hi Janusz,
> > > > > +
> > > > > cond_resched();
> > > > >
> > > > > - if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
> > > > > + if (intel_gt_wait_for_idle(gt, HZ * timeout_ms / 500)
> > > > > == -
> > > ETIME) {
> > > >
> > > > where is this 500 coming from?
> > >
> > > / 1000 would convert it to seconds as needed, and / 500 used instead was
> > > supposed to to mean that we are willing to wait for preempt_timeout_ms *
> 2.
> > > Sorry for that shortcut. Would you like me to provide a clarifying
> comment,
> > > or maybe better use explicit 2 * preempt_timeout / 1000 ?
> >
> > It was clear that you were doubling it, but what's more
> > interesting to know (perhaps in a comment) is why you are
> > choosing to use the double of the timeout_ms instead of other
> > values.
> >
> > Makes sense?
>
> Yes, good question.
>
> Is it possible for more than one bb to hang? If yes then should we wait
> longer than the longest preemption timeout? Before I assumed that maybe we
> should, just in case, but now, having that revisited and reconsidered, I tend
> to agree that the longest preempt timeout, perhaps with a small margin (let's
> say +100ms) should be enough to recover from a single failing test case. Let
> me verify if that works for the linked case.
As we agreed offline, I'm going to add this comment you suggested
to your change as a justification to the "/ 500":
/* 2x longest preempt timeout, experimentally determined */
With this:
Reviewed-by: Andi Shyti
Thanks,
Andi
Re: [PATCH] drm/i915/selftests: Use preemption timeout on cleanup
Hi Andi,
On Tuesday, 17 December 2024 19:00:40 CET Janusz Krzysztofik wrote:
> Hi Andi,
>
> On Tuesday, 17 December 2024 18:12:08 CET Andi Shyti wrote:
> > Hi Janusz,
> >
> > ...
> >
> > > > > +
> > > > > cond_resched();
> > > > >
> > > > > - if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
> > > > > + if (intel_gt_wait_for_idle(gt, HZ * timeout_ms / 500)
> > > > > == -ETIME) {
> > > >
> > > > where is this 500 coming from?
> > >
> > > / 1000 would convert it to seconds as needed, and / 500 used instead was
> > > supposed to to mean that we are willing to wait for preempt_timeout_ms *
> 2.
> > > Sorry for that shortcut. Would you like me to provide a clarifying
> > > comment,
> > > or maybe better use explicit 2 * preempt_timeout / 1000 ?
> >
> > It was clear that you were doubling it, but what's more
> > interesting to know (perhaps in a comment) is why you are
> > choosing to use the double of the timeout_ms instead of other
> > values.
> >
> > Makes sense?
>
> Yes, good question.
>
> Is it possible for more than one bb to hang? If yes then should we wait
> longer than the longest preemption timeout? Before I assumed that maybe we
> should, just in case, but now, having that revisited and reconsidered, I tend
> to agree that the longest preempt timeout, perhaps with a small margin (let's
> say +100ms) should be enough to recover from a single failing test case. Let
> me verify if that works for the linked case.
I've done some testing and got a confirmation that the issue I'm trying to
address in the first place requires a timeout almost twice as long as the
longest preemption timeout.
I propose the following correction:
- if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
+ /* 2 x longest preempt timeout, experimentally determined */
+ if (intel_gt_wait_for_idle(gt, 2 * timeout_ms * HZ / 1000) == -ETIME) {
Thanks,
Janusz
>
> Thanks,
> Janusz
>
> >
> > Thanks,
> > Andi
> >
>
>
>
>
>
Re: [PATCH] drm/i915/selftests: Use preemption timeout on cleanup
Hi Andi,
On Tuesday, 17 December 2024 18:12:08 CET Andi Shyti wrote:
> Hi Janusz,
>
> ...
>
> > > > +
> > > > cond_resched();
> > > >
> > > > - if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
> > > > + if (intel_gt_wait_for_idle(gt, HZ * timeout_ms / 500)
> > > > == -
> > ETIME) {
> > >
> > > where is this 500 coming from?
> >
> > / 1000 would convert it to seconds as needed, and / 500 used instead was
> > supposed to to mean that we are willing to wait for preempt_timeout_ms *
2.
> > Sorry for that shortcut. Would you like me to provide a clarifying
comment,
> > or maybe better use explicit 2 * preempt_timeout / 1000 ?
>
> It was clear that you were doubling it, but what's more
> interesting to know (perhaps in a comment) is why you are
> choosing to use the double of the timeout_ms instead of other
> values.
>
> Makes sense?
Yes, good question.
Is it possible for more than one bb to hang? If yes then should we wait
longer than the longest preemption timeout? Before I assumed that maybe we
should, just in case, but now, having that revisited and reconsidered, I tend
to agree that the longest preempt timeout, perhaps with a small margin (let's
say +100ms) should be enough to recover from a single failing test case. Let
me verify if that works for the linked case.
Thanks,
Janusz
>
> Thanks,
> Andi
>
Re: [PATCH] drm/i915/selftests: Use preemption timeout on cleanup
Hi Janusz,
...
> > > +
> > > cond_resched();
> > >
> > > - if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
> > > + if (intel_gt_wait_for_idle(gt, HZ * timeout_ms / 500) == -
> ETIME) {
> >
> > where is this 500 coming from?
>
> / 1000 would convert it to seconds as needed, and / 500 used instead was
> supposed to to mean that we are willing to wait for preempt_timeout_ms * 2.
> Sorry for that shortcut. Would you like me to provide a clarifying comment,
> or maybe better use explicit 2 * preempt_timeout / 1000 ?
It was clear that you were doubling it, but what's more
interesting to know (perhaps in a comment) is why you are
choosing to use the double of the timeout_ms instead of other
values.
Makes sense?
Thanks,
Andi
Re: [PATCH] drm/i915/selftests: Use preemption timeout on cleanup
Hi Andi,
Thanks for review.
On Monday, 16 December 2024 14:26:58 CET Andi Shyti wrote:
> Hi Janusz,
>
> ...
>
> > for_each_gt(gt, i915, i) {
> > + struct intel_engine_cs *engine;
> > + unsigned long timeout_ms = 0;
> > + unsigned int id;
> > +
> > if (intel_gt_is_wedged(gt))
> > ret = -EIO;
> >
> > + for_each_engine(engine, gt, id) {
> > + if (engine->props.preempt_timeout_ms >
timeout_ms)
> > + timeout_ms = engine-
>props.preempt_timeout_ms;
> > + }
>
>
> the brackets are not really required here.
OK, I was not sure if for_each_if used inside for_each_engine is supposed to
resolve potential issues with potentially confusing if nesting, but from your
comment I conclude it does. I'll fix it.
>
> > +
> > cond_resched();
> >
> > - if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
> > + if (intel_gt_wait_for_idle(gt, HZ * timeout_ms / 500) == -
ETIME) {
>
> where is this 500 coming from?
/ 1000 would convert it to seconds as needed, and / 500 used instead was
supposed to to mean that we are willing to wait for preempt_timeout_ms * 2.
Sorry for that shortcut. Would you like me to provide a clarifying comment,
or maybe better use explicit 2 * preempt_timeout / 1000 ?
Thanks,
Janusz
>
> Thanks,
> Andi
>
> > pr_err("%pS timed out, cancelling all further
testing.\n",
> >__builtin_return_address(0));
> >
>
Re: [PATCH] drm/i915/selftests: Use preemption timeout on cleanup
Hi Janusz,
...
> for_each_gt(gt, i915, i) {
> + struct intel_engine_cs *engine;
> + unsigned long timeout_ms = 0;
> + unsigned int id;
> +
> if (intel_gt_is_wedged(gt))
> ret = -EIO;
>
> + for_each_engine(engine, gt, id) {
> + if (engine->props.preempt_timeout_ms > timeout_ms)
> + timeout_ms = engine->props.preempt_timeout_ms;
> + }
the brackets are not really required here.
> +
> cond_resched();
>
> - if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
> + if (intel_gt_wait_for_idle(gt, HZ * timeout_ms / 500) ==
> -ETIME) {
where is this 500 coming from?
Thanks,
Andi
> pr_err("%pS timed out, cancelling all further
> testing.\n",
> __builtin_return_address(0));
>
> --
> 2.47.1
