Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

2023-10-09 Thread Gustavo Sousa
Quoting Rodrigo Vivi (2023-10-05 12:13:34-03:00)
>On Thu, Oct 05, 2023 at 03:05:31AM -0400, Kahola, Mika wrote:
>> > -Original Message-
>> > From: Vivi, Rodrigo 
>> > Sent: Wednesday, October 4, 2023 3:56 PM
>> > To: Kahola, Mika 
>> > Cc: intel-gfx@lists.freedesktop.org
>> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after 
>> > each read/write operation
>> > 
>> > On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
>> > > Every know and then we receive the following error when running for
>> > > example IGT test kms_flip.
>> > >
>> > > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
>> > > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
>> > >
>> > > Since the error is sporadic in nature, the patch proposes to reset the
>> > > message bus after every successful or unsuccessful read or write
>> > > operation. However, testing revealed that this alone is not sufficient
>> > > method an additiona delay is also introduces anything from 200us to
>> > > 300us. This delay is experimental value and has no specification to
>> > > back it up.
>> > 
>> > have you tried the delays without the bus_reset?
>> Yes, we have bumped up the delay, first from 0x100 to 0x200 and then as per 
>> BSpec change 0xa000 and I have tried 0xf000. Increasing the timeout reduces
>> the frequency of this error but doesn't solve this issue.
>
>what is exactly this BSPec's 0xa000? where can I see it? So maybe you can
>update the message above removing the 'no specification to back it up'.

I think we are confusing "delay" with the "timeout parameter" of the msgbus.

The PHY has a register to control the timeout parameter of msgbus transactions
(BSpec 65156). Its default value is 0x100. With commit e028d7a4235d
("drm/i915/cx0: Check and increase msgbus timeout threshold"), we had integrated
a workaround that bumped the timeout value to 0x200 in case timeouts were
observed. Later on, there was a BSpec update with the formal timeout value to be
programmed to 0xa000, which was incorporated with commit e35628968032
("drm/i915/cx0: Add step for programming msgbus timer").

I *believe* Rodrigo was asking about the usleep_range() calls added with this
patch: whether we tried keeping only the usleep_range() without the bus reset.

--
Gustavo Sousa

>
>Oh, and my english is bad, but it looks to me that 'empirical' might
>sound better than 'experimental' for this case, since you really did
>a lot of experiments before coming to this final conclusion.
>
>> 
>> > have you talked to hw architects about this?
>> Yes, HW guys requested traces which I provided but based on these the 
>> sequence we use in i915
>> is correct.
>> 
>> > 
>> > I wonder if we should add the delay inside the bus_reset itself?
>> > although the bit 15 clear check should be enough by itself and it doesn't 
>> > look like it is a hw/fw reset involved to justify the extra
>> > delay.
>> That should be enough. To me, it looks like when reading/writing to the bus 
>> maybe too fast, the hw cannot handle that and we need
>> to reset and let things settle down before trying again.
>> 
>> > 
>> > well, at least some /* FIXME: */ or /* XXX: */ comments is desired along 
>> > with the messages if we are going with this hack without
>> > understanding why...
>> True, I will add these the the patch.
>> 
>> Thanks for review!
>> 
>> -Mika-
>> > 
>> > >
>> > > Signed-off-by: Mika Kahola 
>> > > ---
>> > >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 6 ++
>> > >  1 file changed, 6 insertions(+)
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > > index abd607b564f1..a71b8a29d6b0 100644
>> > > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > > @@ -220,9 +220,12 @@ static u8 __intel_cx0_read(struct drm_i915_private 
>> > > *i915, enum port port,
>> > >  /* 3 tries is assumed to be enough to read successfully */
>> > >  for (i = 0; i < 3; i++) {
>> > >  status = __intel_cx0_read_once(i915, port, lane, addr);
>> > > +intel_cx0_bus_reset(i915, port, lane);
>> > >
>> > >  if (status >= 0)
>>

Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

2023-10-06 Thread Kahola, Mika
> -Original Message-
> From: Sousa, Gustavo 
> Sent: Friday, October 6, 2023 2:57 PM
> To: Kahola, Mika ; Vivi, Rodrigo 
> 
> Cc: intel-gfx@lists.freedesktop.org
> Subject: RE: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after 
> each read/write operation
> 
> Quoting Kahola, Mika (2023-10-06 03:49:15-03:00)
> >> -Original Message-
> >> From: Vivi, Rodrigo 
> >> Sent: Thursday, October 5, 2023 7:10 PM
> >> To: Sousa, Gustavo 
> >> Cc: Kahola, Mika ;
> >> intel-gfx@lists.freedesktop.org
> >> Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus
> >> after each read/write operation
> >>
> >> On Thu, Oct 05, 2023 at 12:40:35PM -0300, Gustavo Sousa wrote:
> >> > Quoting Rodrigo Vivi (2023-10-05 12:13:34-03:00)
> >> > >On Thu, Oct 05, 2023 at 03:05:31AM -0400, Kahola, Mika wrote:
> >> > >> > -Original Message-
> >> > >> > From: Vivi, Rodrigo 
> >> > >> > Sent: Wednesday, October 4, 2023 3:56 PM
> >> > >> > To: Kahola, Mika 
> >> > >> > Cc: intel-gfx@lists.freedesktop.org
> >> > >> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset
> >> > >> > message bus after each read/write operation
> >> > >> >
> >> > >> > On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
> >> > >> > > Every know and then we receive the following error when
> >> > >> > > running for example IGT test kms_flip.
> >> > >> > >
> >> > >> > > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> >> > >> > > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> >> > >> > >
> >> > >> > > Since the error is sporadic in nature, the patch proposes to
> >> > >> > > reset the message bus after every successful or unsuccessful
> >> > >> > > read or write operation. However, testing revealed that this
> >> > >> > > alone is not sufficient method an additiona delay is also
> >> > >> > > introduces anything from 200us to 300us. This delay is
> >> > >> > > experimental value and has no specification to back it up.
> >> > >> >
> >> > >> > have you tried the delays without the bus_reset?
> >> > >> Yes, we have bumped up the delay, first from 0x100 to 0x200 and
> >> > >> then as per BSpec change 0xa000 and I have tried 0xf000.
> >> > >> Increasing the timeout reduces the frequency of this error but 
> >> > >> doesn't solve this issue.
> >> > >
> >> > >what is exactly this BSPec's 0xa000? where can I see it? So maybe
> >> > >you can update the message above removing the 'no specification to back 
> >> > >it up'.
> >> >
> >> > (Resending this because I got a delivery failure notification)
> >> >
> >> > I think we are confusing "delay" with the "timeout parameter" of the 
> >> > msgbus.
> >> >
> >> > The PHY has a register to control the timeout parameter of msgbus
> >> > transactions (BSpec 65156). It's default value is 0x100. With
> >> > commit e028d7a4235d
> >> > ("drm/i915/cx0: Check and increase msgbus timeout threshold"), we
> >> > had integrated a workaround that bumped the timeout value to 0x200
> >> > in case timeouts were observed. Later on, there was a BSpec update
> >> > with the formal timeout value to be programmed to 0xa000, which was
> >> > incorporated with commit e35628968032
> >> > ("drm/i915/cx0: Add step for programming msgbus timer").
> >> >
> >> > I *believe* what Rodrigo has asked was about the usleep_range()
> >> > calls added with this patch, if we tried to only keep the usleed_range() 
> >> > without the bus reset.
> >>
> >> yes, that was my original question.
> >
> >I have no good explanation why usleep_range() is needed. Without it,
> >the kms_flip test eventually throws these read/write failures. As these
> >are a bit sporadic in nature, it takes some time to catch these errors.
> 
> I think the question is whether the bus reset is really necessary. Maybe only 
> the usleep_range() hack would be "enough" to
> mitigate the issue?

I have been scrat

Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

2023-10-06 Thread Gustavo Sousa
Quoting Kahola, Mika (2023-10-06 03:49:15-03:00)
>> -Original Message-
>> From: Vivi, Rodrigo 
>> Sent: Thursday, October 5, 2023 7:10 PM
>> To: Sousa, Gustavo 
>> Cc: Kahola, Mika ; intel-gfx@lists.freedesktop.org
>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after 
>> each read/write operation
>> 
>> On Thu, Oct 05, 2023 at 12:40:35PM -0300, Gustavo Sousa wrote:
>> > Quoting Rodrigo Vivi (2023-10-05 12:13:34-03:00)
>> > >On Thu, Oct 05, 2023 at 03:05:31AM -0400, Kahola, Mika wrote:
>> > >> > -Original Message-
>> > >> > From: Vivi, Rodrigo 
>> > >> > Sent: Wednesday, October 4, 2023 3:56 PM
>> > >> > To: Kahola, Mika 
>> > >> > Cc: intel-gfx@lists.freedesktop.org
>> > >> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message
>> > >> > bus after each read/write operation
>> > >> >
>> > >> > On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
>> > >> > > Every know and then we receive the following error when running
>> > >> > > for example IGT test kms_flip.
>> > >> > >
>> > >> > > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
>> > >> > > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
>> > >> > >
>> > >> > > Since the error is sporadic in nature, the patch proposes to
>> > >> > > reset the message bus after every successful or unsuccessful
>> > >> > > read or write operation. However, testing revealed that this
>> > >> > > alone is not sufficient method an additiona delay is also
>> > >> > > introduces anything from 200us to 300us. This delay is
>> > >> > > experimental value and has no specification to back it up.
>> > >> >
>> > >> > have you tried the delays without the bus_reset?
>> > >> Yes, we have bumped up the delay, first from 0x100 to 0x200 and
>> > >> then as per BSpec change 0xa000 and I have tried 0xf000. Increasing
>> > >> the timeout reduces the frequency of this error but doesn't solve this 
>> > >> issue.
>> > >
>> > >what is exactly this BSPec's 0xa000? where can I see it? So maybe you
>> > >can update the message above removing the 'no specification to back it 
>> > >up'.
>> >
>> > (Resending this because I got a delivery failure notification)
>> >
>> > I think we are confusing "delay" with the "timeout parameter" of the 
>> > msgbus.
>> >
>> > The PHY has a register to control the timeout parameter of msgbus
>> > transactions (BSpec 65156). It's default value is 0x100. With commit
>> > e028d7a4235d
>> > ("drm/i915/cx0: Check and increase msgbus timeout threshold"), we had
>> > integrated a workaround that bumped the timeout value to 0x200 in case
>> > timeouts were observed. Later on, there was a BSpec update with the
>> > formal timeout value to be programmed to 0xa000, which was
>> > incorporated with commit e35628968032
>> > ("drm/i915/cx0: Add step for programming msgbus timer").
>> >
>> > I *believe* what Rodrigo has asked was about the usleep_range() calls
>> > added with this patch, if we tried to only keep the usleed_range() without 
>> > the bus reset.
>> 
>> yes, that was my original question.
>
>I have no good explanation why usleep_range() is needed. Without it, the 
>kms_flip test eventually
>throws these read/write failures. As these are a bit sporadic in nature, it 
>takes some time to catch
>these errors.

I think the question is whether the bus reset is really necessary. Maybe only
the usleep_range() hack would be "enough" to mitigate the issue?

--
Gustavo Sousa

>
>The patch is a hack and my idea was to set message bus at reset state after 
>each read/write operation.
>Unfortunately, this alone is not enough to pass kms_flip without these dmesg 
>errors on read/write.
>However, the kms_flip test itself, which triggers these, passes without issues.
>  
>And I missed to mention that these errors show up (at least more frequently) 
>when 2x 4k monitors are
>connected. These may not be visible with only one monitor connected. For such 
>a system, I haven't
>been testing that much.
>
>-Mika-
>
>> 
>> >
>> > --
>> > Gustavo Sousa
>> >
>>

Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

2023-10-06 Thread Kahola, Mika
> -Original Message-
> From: Vivi, Rodrigo 
> Sent: Thursday, October 5, 2023 7:10 PM
> To: Sousa, Gustavo 
> Cc: Kahola, Mika ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after 
> each read/write operation
> 
> On Thu, Oct 05, 2023 at 12:40:35PM -0300, Gustavo Sousa wrote:
> > Quoting Rodrigo Vivi (2023-10-05 12:13:34-03:00)
> > >On Thu, Oct 05, 2023 at 03:05:31AM -0400, Kahola, Mika wrote:
> > >> > -Original Message-
> > >> > From: Vivi, Rodrigo 
> > >> > Sent: Wednesday, October 4, 2023 3:56 PM
> > >> > To: Kahola, Mika 
> > >> > Cc: intel-gfx@lists.freedesktop.org
> > >> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message
> > >> > bus after each read/write operation
> > >> >
> > >> > On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
> > >> > > Every know and then we receive the following error when running
> > >> > > for example IGT test kms_flip.
> > >> > >
> > >> > > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> > >> > > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> > >> > >
> > >> > > Since the error is sporadic in nature, the patch proposes to
> > >> > > reset the message bus after every successful or unsuccessful
> > >> > > read or write operation. However, testing revealed that this
> > >> > > alone is not sufficient method an additiona delay is also
> > >> > > introduces anything from 200us to 300us. This delay is
> > >> > > experimental value and has no specification to back it up.
> > >> >
> > >> > have you tried the delays without the bus_reset?
> > >> Yes, we have bumped up the delay, first from 0x100 to 0x200 and
> > >> then as per BSpec change 0xa000 and I have tried 0xf000. Increasing
> > >> the timeout reduces the frequency of this error but doesn't solve this 
> > >> issue.
> > >
> > >what is exactly this BSPec's 0xa000? where can I see it? So maybe you
> > >can update the message above removing the 'no specification to back it up'.
> >
> > (Resending this because I got a delivery failure notification)
> >
> > I think we are confusing "delay" with the "timeout parameter" of the msgbus.
> >
> > The PHY has a register to control the timeout parameter of msgbus
> > transactions (BSpec 65156). It's default value is 0x100. With commit
> > e028d7a4235d
> > ("drm/i915/cx0: Check and increase msgbus timeout threshold"), we had
> > integrated a workaround that bumped the timeout value to 0x200 in case
> > timeouts were observed. Later on, there was a BSpec update with the
> > formal timeout value to be programmed to 0xa000, which was
> > incorporated with commit e35628968032
> > ("drm/i915/cx0: Add step for programming msgbus timer").
> >
> > I *believe* what Rodrigo has asked was about the usleep_range() calls
> > added with this patch, if we tried to only keep the usleed_range() without 
> > the bus reset.
> 
> yes, that was my original question.

I have no good explanation for why usleep_range() is needed. Without it, the
kms_flip test eventually throws these read/write failures. As these are a bit
sporadic in nature, it takes some time to catch these errors.

The patch is a hack, and my idea was to put the message bus into its reset state
after each read/write operation. Unfortunately, this alone is not enough to pass
kms_flip without these dmesg errors on read/write. However, the kms_flip test
itself, which triggers these, passes without issues.

And I forgot to mention that these errors show up (at least more frequently)
when 2x 4k monitors are connected. They may not be visible with only one monitor
connected. I haven't tested that kind of system much.

-Mika-

> 
> >
> > --
> > Gustavo Sousa
> >
> > >
> > >Oh, and my english is bad, but it looks to me that 'empirical' might
> > >sound better than 'experimental' for this case, since you really did
> > >a lot of experiments before coming to this final conclusion.
> > >
> > >>
> > >> > have you talked to hw architects about this?
> > >> Yes, HW guys requested traces which I provided but based on these
> > >> the sequence we use in i915 is correct.
> > >>
> > >> >
> > >> > I wonder if we should add the dela

Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

2023-10-05 Thread Rodrigo Vivi
On Thu, Oct 05, 2023 at 12:40:35PM -0300, Gustavo Sousa wrote:
> Quoting Rodrigo Vivi (2023-10-05 12:13:34-03:00)
> >On Thu, Oct 05, 2023 at 03:05:31AM -0400, Kahola, Mika wrote:
> >> > -Original Message-
> >> > From: Vivi, Rodrigo 
> >> > Sent: Wednesday, October 4, 2023 3:56 PM
> >> > To: Kahola, Mika 
> >> > Cc: intel-gfx@lists.freedesktop.org
> >> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus 
> >> > after each read/write operation
> >> > 
> >> > On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
> >> > > Every know and then we receive the following error when running for
> >> > > example IGT test kms_flip.
> >> > >
> >> > > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> >> > > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> >> > >
> >> > > Since the error is sporadic in nature, the patch proposes to reset the
> >> > > message bus after every successful or unsuccessful read or write
> >> > > operation. However, testing revealed that this alone is not sufficient
> >> > > method an additiona delay is also introduces anything from 200us to
> >> > > 300us. This delay is experimental value and has no specification to
> >> > > back it up.
> >> > 
> >> > have you tried the delays without the bus_reset?
> >> Yes, we have bumped up the delay, first from 0x100 to 0x200 and then as 
> >> per 
> >> BSpec change 0xa000 and I have tried 0xf000. Increasing the timeout reduces
> >> the frequency of this error but doesn't solve this issue.
> >
> >what is exactly this BSPec's 0xa000? where can I see it? So maybe you can
> >update the message above removing the 'no specification to back it up'.
> 
> (Resending this because I got a delivery failure notification)
> 
> I think we are confusing "delay" with the "timeout parameter" of the msgbus.
> 
> The PHY has a register to control the timeout parameter of msgbus transactions
> (BSpec 65156). It's default value is 0x100. With commit e028d7a4235d
> ("drm/i915/cx0: Check and increase msgbus timeout threshold"), we had 
> integrated
> a workaround that bumped the timeout value to 0x200 in case timeouts were
> observed. Later on, there was a BSpec update with the formal timeout value to 
> be
> programmed to 0xa000, which was incorporated with commit e35628968032
> ("drm/i915/cx0: Add step for programming msgbus timer").
> 
> I *believe* what Rodrigo has asked was about the usleep_range() calls added 
> with
> this patch, if we tried to only keep the usleed_range() without the bus reset.

yes, that was my original question.

> 
> --
> Gustavo Sousa
> 
> >
> >Oh, and my english is bad, but it looks to me that 'empirical' might
> >sound better than 'experimental' for this case, since you really did
> >a lot of experiments before coming to this final conclusion.
> >
> >> 
> >> > have you talked to hw architects about this?
> >> Yes, HW guys requested traces which I provided but based on these the 
> >> sequence we use in i915
> >> is correct.
> >> 
> >> > 
> >> > I wonder if we should add the delay inside the bus_reset itself?
> >> > although the bit 15 clear check should be enough by itself and it 
> >> > doesn't look like it is a hw/fw reset involved to justify the extra
> >> > delay.
> >> That should be enough. To me, it looks like when reading/writing to the 
> >> bus maybe too fast, the hw cannot handle that and we need
> >> to reset and let things settle down before trying again.
> >> 
> >> > 
> >> > well, at least some /* FIXME: */ or /* XXX: */ comments is desired along 
> >> > with the messages if we are going with this hack without
> >> > understanding why...
> >> True, I will add these the the patch.
> >> 
> >> Thanks for review!
> >> 
> >> -Mika-
> >> > 
> >> > >
> >> > > Signed-off-by: Mika Kahola 
> >> > > ---
> >> > >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 6 ++
> >> > >  1 file changed, 6 insertions(+)
> >> > >
> >> > > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >> > > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >> > > index abd607b564f1..a71b8a29d6b0 100644
> >> > >

Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

2023-10-05 Thread Gustavo Sousa
Quoting Rodrigo Vivi (2023-10-05 12:13:34-03:00)
>On Thu, Oct 05, 2023 at 03:05:31AM -0400, Kahola, Mika wrote:
>> > -Original Message-
>> > From: Vivi, Rodrigo 
>> > Sent: Wednesday, October 4, 2023 3:56 PM
>> > To: Kahola, Mika 
>> > Cc: intel-gfx@lists.freedesktop.org
>> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after 
>> > each read/write operation
>> > 
>> > On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
>> > > Every know and then we receive the following error when running for
>> > > example IGT test kms_flip.
>> > >
>> > > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
>> > > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
>> > >
>> > > Since the error is sporadic in nature, the patch proposes to reset the
>> > > message bus after every successful or unsuccessful read or write
>> > > operation. However, testing revealed that this alone is not sufficient
>> > > method an additiona delay is also introduces anything from 200us to
>> > > 300us. This delay is experimental value and has no specification to
>> > > back it up.
>> > 
>> > have you tried the delays without the bus_reset?
>> Yes, we have bumped up the delay, first from 0x100 to 0x200 and then as per 
>> BSpec change 0xa000 and I have tried 0xf000. Increasing the timeout reduces
>> the frequency of this error but doesn't solve this issue.
>
>what is exactly this BSPec's 0xa000? where can I see it? So maybe you can
>update the message above removing the 'no specification to back it up'.

(Resending this because I got a delivery failure notification)

I think we are confusing "delay" with the "timeout parameter" of the msgbus.

The PHY has a register to control the timeout parameter of msgbus transactions
(BSpec 65156). Its default value is 0x100. With commit e028d7a4235d
("drm/i915/cx0: Check and increase msgbus timeout threshold"), we had integrated
a workaround that bumped the timeout value to 0x200 in case timeouts were
observed. Later on, there was a BSpec update with the formal timeout value to be
programmed to 0xa000, which was incorporated with commit e35628968032
("drm/i915/cx0: Add step for programming msgbus timer").

I *believe* Rodrigo was asking about the usleep_range() calls added with this
patch: whether we tried keeping only the usleep_range() without the bus reset.

--
Gustavo Sousa

>
>Oh, and my english is bad, but it looks to me that 'empirical' might
>sound better than 'experimental' for this case, since you really did
>a lot of experiments before coming to this final conclusion.
>
>> 
>> > have you talked to hw architects about this?
>> Yes, HW guys requested traces which I provided but based on these the 
>> sequence we use in i915
>> is correct.
>> 
>> > 
>> > I wonder if we should add the delay inside the bus_reset itself?
>> > although the bit 15 clear check should be enough by itself and it doesn't 
>> > look like it is a hw/fw reset involved to justify the extra
>> > delay.
>> That should be enough. To me, it looks like when reading/writing to the bus 
>> maybe too fast, the hw cannot handle that and we need
>> to reset and let things settle down before trying again.
>> 
>> > 
>> > well, at least some /* FIXME: */ or /* XXX: */ comments is desired along 
>> > with the messages if we are going with this hack without
>> > understanding why...
>> True, I will add these the the patch.
>> 
>> Thanks for review!
>> 
>> -Mika-
>> > 
>> > >
>> > > Signed-off-by: Mika Kahola 
>> > > ---
>> > >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 6 ++
>> > >  1 file changed, 6 insertions(+)
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > > index abd607b564f1..a71b8a29d6b0 100644
>> > > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > > @@ -220,9 +220,12 @@ static u8 __intel_cx0_read(struct drm_i915_private 
>> > > *i915, enum port port,
>> > >  /* 3 tries is assumed to be enough to read successfully */
>> > >  for (i = 0; i < 3; i++) {
>> > >  status = __intel_cx0_read_once(i915, port, lane, addr);
>> > > +intel_cx0_bus_reset(i915, port, l

Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

2023-10-05 Thread Rodrigo Vivi
On Thu, Oct 05, 2023 at 03:05:31AM -0400, Kahola, Mika wrote:
> > -Original Message-
> > From: Vivi, Rodrigo 
> > Sent: Wednesday, October 4, 2023 3:56 PM
> > To: Kahola, Mika 
> > Cc: intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after 
> > each read/write operation
> > 
> > On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
> > > Every know and then we receive the following error when running for
> > > example IGT test kms_flip.
> > >
> > > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> > > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> > >
> > > Since the error is sporadic in nature, the patch proposes to reset the
> > > message bus after every successful or unsuccessful read or write
> > > operation. However, testing revealed that this alone is not sufficient
> > > method an additiona delay is also introduces anything from 200us to
> > > 300us. This delay is experimental value and has no specification to
> > > back it up.
> > 
> > have you tried the delays without the bus_reset?
> Yes, we have bumped up the delay, first from 0x100 to 0x200 and then as per 
> BSpec change 0xa000 and I have tried 0xf000. Increasing the timeout reduces
> the frequency of this error but doesn't solve this issue.

What exactly is this BSpec 0xa000? Where can I see it? Then maybe you can
update the message above and remove the 'no specification to back it up' part.

Oh, and my English is bad, but it looks to me that 'empirical' might
sound better than 'experimental' for this case, since you really did
a lot of experiments before coming to this final conclusion.

> 
> > have you talked to hw architects about this?
> Yes, HW guys requested traces which I provided but based on these the 
> sequence we use in i915
> is correct.
> 
> > 
> > I wonder if we should add the delay inside the bus_reset itself?
> > although the bit 15 clear check should be enough by itself and it doesn't 
> > look like it is a hw/fw reset involved to justify the extra
> > delay.
> That should be enough. To me, it looks like when reading/writing to the bus 
> maybe too fast, the hw cannot handle that and we need
> to reset and let things settle down before trying again.
> 
> > 
> > well, at least some /* FIXME: */ or /* XXX: */ comments is desired along 
> > with the messages if we are going with this hack without
> > understanding why...
> True, I will add these the the patch.
> 
> Thanks for review!
> 
> -Mika-
> > 
> > >
> > > Signed-off-by: Mika Kahola 
> > > ---
> > >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > index abd607b564f1..a71b8a29d6b0 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > @@ -220,9 +220,12 @@ static u8 __intel_cx0_read(struct drm_i915_private 
> > > *i915, enum port port,
> > >   /* 3 tries is assumed to be enough to read successfully */
> > >   for (i = 0; i < 3; i++) {
> > >   status = __intel_cx0_read_once(i915, port, lane, addr);
> > > + intel_cx0_bus_reset(i915, port, lane);
> > >
> > >   if (status >= 0)
> > >   return status;
> > > +
> > > + usleep_range(200, 300);
> > >   }
> > >
> > >   drm_err_once(>drm, "PHY %c Read %04x failed after %d
> > > retries.\n", @@ -299,9 +302,12 @@ static void __intel_cx0_write(struct 
> > > drm_i915_private *i915, enum port port,
> > >   /* 3 tries is assumed to be enough to write successfully */
> > >   for (i = 0; i < 3; i++) {
> > >   status = __intel_cx0_write_once(i915, port, lane, addr, data,
> > > committed);
> > > + intel_cx0_bus_reset(i915, port, lane);
> > >
> > >   if (status == 0)
> > >   return;
> > > +
> > > + usleep_range(200, 300);
> > >   }
> > >
> > >   drm_err_once(>drm,
> > > --
> > > 2.34.1
> > >


Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

2023-10-05 Thread Kahola, Mika
> -Original Message-
> From: Vivi, Rodrigo 
> Sent: Wednesday, October 4, 2023 3:56 PM
> To: Kahola, Mika 
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after 
> each read/write operation
> 
> On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
> > Every know and then we receive the following error when running for
> > example IGT test kms_flip.
> >
> > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> >
> > Since the error is sporadic in nature, the patch proposes to reset the
> > message bus after every successful or unsuccessful read or write
> > operation. However, testing revealed that this alone is not sufficient
> > method an additiona delay is also introduces anything from 200us to
> > 300us. This delay is experimental value and has no specification to
> > back it up.
> 
> have you tried the delays without the bus_reset?
Yes, we have bumped up the delay, first from 0x100 to 0x200, then to 0xa000 as
per the BSpec change, and I have also tried 0xf000. Increasing the timeout
reduces the frequency of this error but doesn't solve the issue.

> have you talked to hw architects about this?
Yes, the HW guys requested traces, which I provided, but based on these the
sequence we use in i915 is correct.

> 
> I wonder if we should add the delay inside the bus_reset itself?
> although the bit 15 clear check should be enough by itself and it doesn't 
> look like it is a hw/fw reset involved to justify the extra
> delay.
That should be enough. To me, it looks like we may be reading/writing to the
bus too fast for the hw to handle, and we need to reset and let things settle
down before trying again.

> 
> well, at least some /* FIXME: */ or /* XXX: */ comments is desired along with 
> the messages if we are going with this hack without
> understanding why...
True, I will add these to the patch.

Thanks for the review!

-Mika-
> 
> >
> > Signed-off-by: Mika Kahola 
> > ---
> >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > index abd607b564f1..a71b8a29d6b0 100644
> > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > @@ -220,9 +220,12 @@ static u8 __intel_cx0_read(struct drm_i915_private 
> > *i915, enum port port,
> > /* 3 tries is assumed to be enough to read successfully */
> > for (i = 0; i < 3; i++) {
> > status = __intel_cx0_read_once(i915, port, lane, addr);
> > +   intel_cx0_bus_reset(i915, port, lane);
> >
> > if (status >= 0)
> > return status;
> > +
> > +   usleep_range(200, 300);
> > }
> >
> > drm_err_once(>drm, "PHY %c Read %04x failed after %d
> > retries.\n", @@ -299,9 +302,12 @@ static void __intel_cx0_write(struct 
> > drm_i915_private *i915, enum port port,
> > /* 3 tries is assumed to be enough to write successfully */
> > for (i = 0; i < 3; i++) {
> > status = __intel_cx0_write_once(i915, port, lane, addr, data,
> > committed);
> > +   intel_cx0_bus_reset(i915, port, lane);
> >
> > if (status == 0)
> > return;
> > +
> > +   usleep_range(200, 300);
> > }
> >
> > drm_err_once(>drm,
> > --
> > 2.34.1
> >


Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

2023-10-04 Thread Rodrigo Vivi
On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
> Every now and then we receive the following error when running,
> for example, the IGT test kms_flip.
> 
> [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> 
> Since the error is sporadic in nature, the patch proposes
> to reset the message bus after every successful or unsuccessful
> read or write operation. However, testing revealed that this
> alone is not sufficient, so an additional delay of anywhere
> from 200us to 300us is also introduced. This delay is an experimental
> value and has no specification to back it up.

Have you tried the delays without the bus_reset?
Have you talked to the hw architects about this?

I wonder if we should add the delay inside the bus_reset itself?
Although the bit 15 clear check should be enough by itself, and
it doesn't look like there is a hw/fw reset involved to justify the
extra delay.

Well, at least some /* FIXME: */ or /* XXX: */ comments are
desired along with the messages if we are going with this
hack without understanding why...

> 
> Signed-off-by: Mika Kahola 
> ---
>  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c 
> b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> index abd607b564f1..a71b8a29d6b0 100644
> --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> @@ -220,9 +220,12 @@ static u8 __intel_cx0_read(struct drm_i915_private 
> *i915, enum port port,
>   /* 3 tries is assumed to be enough to read successfully */
>   for (i = 0; i < 3; i++) {
>   status = __intel_cx0_read_once(i915, port, lane, addr);
> + intel_cx0_bus_reset(i915, port, lane);
>  
>   if (status >= 0)
>   return status;
> +
> + usleep_range(200, 300);
>   }
>  
>   drm_err_once(&i915->drm, "PHY %c Read %04x failed after %d retries.\n",
> @@ -299,9 +302,12 @@ static void __intel_cx0_write(struct drm_i915_private 
> *i915, enum port port,
>   /* 3 tries is assumed to be enough to write successfully */
>   for (i = 0; i < 3; i++) {
>   status = __intel_cx0_write_once(i915, port, lane, addr, data, 
> committed);
> + intel_cx0_bus_reset(i915, port, lane);
>  
>   if (status == 0)
>   return;
> +
> + usleep_range(200, 300);
>   }
>  
>   drm_err_once(&i915->drm,
> -- 
> 2.34.1
>
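
For readers following the thread, the retry pattern proposed in the patch above
can be sketched outside the kernel roughly as follows. This is only a userspace
simulation under stated assumptions, not the i915 code: struct fake_phy,
fake_read_once(), fake_bus_reset(), fake_delay() and read_with_retry() are
hypothetical stand-ins for the PHY, __intel_cx0_read_once(),
intel_cx0_bus_reset(), usleep_range() and __intel_cx0_read() respectively, and
the delay is only accounted for, not actually slept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TRIES 3

/* Hypothetical simulated PHY: the first `fail_count` transactions time out. */
struct fake_phy {
	int fail_count;	/* transactions that will still fail */
	int resets;	/* bus resets issued so far */
	int delay_us;	/* total minimum delay requested, in microseconds */
};

/* Stand-in for __intel_cx0_read_once(): value on success, negative on error. */
static int fake_read_once(struct fake_phy *phy, unsigned int addr)
{
	if (phy->fail_count > 0) {
		phy->fail_count--;
		return -1;
	}
	return (int)(addr & 0xff);
}

/* Stand-in for intel_cx0_bus_reset(): only counts the resets. */
static void fake_bus_reset(struct fake_phy *phy)
{
	phy->resets++;
}

/* Stand-in for usleep_range(min, max): records the lower bound only. */
static void fake_delay(struct fake_phy *phy, int min_us)
{
	phy->delay_us += min_us;
}

/* Retry loop shaped like the patched read path. */
static int read_with_retry(struct fake_phy *phy, unsigned int addr)
{
	int i, status;

	assert(phy != NULL);

	for (i = 0; i < MAX_TRIES; i++) {
		status = fake_read_once(phy, addr);
		/* Reset the bus after every attempt, successful or not. */
		fake_bus_reset(phy);

		if (status >= 0)
			return status;

		/* XXX: empirical delay; no specification backs this value. */
		fake_delay(phy, 200);
	}

	fprintf(stderr, "Read %04x failed after %d retries.\n", addr, i);
	return -1;
}
```

With a fake PHY that fails twice, read_with_retry() succeeds on the third
attempt after three resets and two delays. The variant discussed in the thread
(keeping only the delay, without the bus reset) would correspond to dropping
the fake_bus_reset() call from the loop.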