Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Dave Airlie
On Thu, 4 Aug 2022 at 15:25, Dave Airlie wrote:
>
> On Thu, 4 Aug 2022 at 14:46, Linus Torvalds wrote:
> >
> > On Wed, Aug 3, 2022 at 9:27 PM Linus Torvalds wrote:
> > >
> > > I'll do a few more. It's close enough already that it should be just
> > > four more reboots to pinpoint exactly which commit breaks.
> >
> > commit 5d945cbcd4b16a29d6470a80dfb19738f9a4319f is the first bad commit.
> >
> > I think it's supposed to make no semantic changes, but it clearly does.
> >
> > What a pain to figure out what's wrong in there, and I assume it
> > doesn't revert cleanly either.
> >
> > Bringing in the guilty parties. See
> >
> >   https://lore.kernel.org/all/CAHk-=wj+yzaunxiewhfcrkbdlsqkizdr1q3yjlaqpo6avq2...@mail.gmail.com/
> >
> > for the beginning of this thread.
>
> I think I've tracked it down. It looks like it would only affect GFX8
> cards, which might explain why you and I have seen it and why I haven't
> seen any other reports.
>
> Pretty sure you have an RX 580, and I just happen to have a Fiji card
> in this machine right now.
>
> I'll retest on master and send you a fixup patch.

To close the loop:

https://lore.kernel.org/all/20220804055036.691670-1-airl...@redhat.com/T/#u

Seems to fix it here.

Dave.


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Dave Airlie
On Thu, 4 Aug 2022 at 14:46, Linus Torvalds wrote:
>
> On Wed, Aug 3, 2022 at 9:27 PM Linus Torvalds wrote:
> >
> > I'll do a few more. It's close enough already that it should be just
> > four more reboots to pinpoint exactly which commit breaks.
>
> commit 5d945cbcd4b16a29d6470a80dfb19738f9a4319f is the first bad commit.
>
> I think it's supposed to make no semantic changes, but it clearly does.
>
> What a pain to figure out what's wrong in there, and I assume it
> doesn't revert cleanly either.
>
> Bringing in the guilty parties. See
>
>   https://lore.kernel.org/all/CAHk-=wj+yzaunxiewhfcrkbdlsqkizdr1q3yjlaqpo6avq2...@mail.gmail.com/
>
> for the beginning of this thread.

I think I've tracked it down. It looks like it would only affect GFX8
cards, which might explain why you and I have seen it and why I haven't
seen any other reports.

Pretty sure you have an RX 580, and I just happen to have a Fiji card
in this machine right now.

I'll retest on master and send you a fixup patch.

Dave.


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 9:27 PM Linus Torvalds wrote:
>
> I'll do a few more. It's close enough already that it should be just
> four more reboots to pinpoint exactly which commit breaks.

commit 5d945cbcd4b16a29d6470a80dfb19738f9a4319f is the first bad commit.

I think it's supposed to make no semantic changes, but it clearly does.

What a pain to figure out what's wrong in there, and I assume it
doesn't revert cleanly either.

Bringing in the guilty parties. See

  https://lore.kernel.org/all/CAHk-=wj+yzaunxiewhfcrkbdlsqkizdr1q3yjlaqpo6avq2...@mail.gmail.com/

for the beginning of this thread.

Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 9:24 PM Dave Airlie wrote:
>
> I've reproduced it, I'll send you a revert pile when I confirm it is
> the buddy allocator.

I've bisected it to 86bd6706c404..074293dd9f61 and don't see "buddy"
in any of those commits.

I'll do a few more. It's close enough already that it should be just
four more reboots to pinpoint exactly which commit breaks.
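A minimal Python sketch of the arithmetic behind "four more reboots": bisection (what `git bisect` automates) halves the candidate range with each good/bad test, so isolating one commit among n candidates takes at most ceil(log2 n) tests, i.e. four for up to 16 commits. The function names and the fake commit list are hypothetical stand-ins for real commits and test boots, and the model assumes the last candidate is already known to be bad.

```python
import math
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search for the first bad commit (hypothetical helper).

    Assumes commits are in history order, the last one is known bad, and
    every commit after the first bad one is also bad.
    Returns (first_bad_commit, number_of_tests_performed).
    """
    lo, hi = 0, len(commits) - 1  # commits[hi] is known bad
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one test boot ("reboot") per probed commit
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo], tests


def max_tests(n_candidates: int) -> int:
    """Upper bound on tests needed to isolate one commit among n candidates."""
    return math.ceil(math.log2(n_candidates)) if n_candidates > 1 else 0


# 16 fake commits c00..c15; pretend c11 introduced the breakage.
commits = [f"c{i:02d}" for i in range(16)]
culprit, n = first_bad(commits, lambda c: int(c[1:]) >= 11)
print(culprit, n)  # c11 4
```

Each probe discards half of the remaining range regardless of the outcome, which is why the step count depends only on the range size, not on where the bad commit sits.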

  Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Dave Airlie
On Thu, 4 Aug 2022 at 14:02, Linus Torvalds wrote:
>
> On Wed, Aug 3, 2022 at 8:53 PM Dave Airlie wrote:
> >
> > > It works on my Intel laptop, so it's amdgpu somewhere.
> >
> > I'll spin my Ryzen up to see if I can reproduce, and test against the
> > drm-next pre-merge tree as well.
>
> So it's not my merge - I've had a bad result in the middle of the DRM
> history too.
>
> On a positive note, my arm64 machine works fine, but that's just using
> fbdev so ...
>
> But another datapoint to say that it's amdgpu-specific. Not that that
> was really in doubt.

I've reproduced it, I'll send you a revert pile when I confirm it is
the buddy allocator.

Dave.


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 8:53 PM Dave Airlie wrote:
>
> > It works on my Intel laptop, so it's amdgpu somewhere.
>
> I'll spin my Ryzen up to see if I can reproduce, and test against the
> drm-next pre-merge tree as well.

So it's not my merge - I've had a bad result in the middle of the DRM
history too.

On a positive note, my arm64 machine works fine, but that's just using
fbdev so ...

But another datapoint to say that it's amdgpu-specific. Not that that
was really in doubt.

Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Dave Airlie
On Thu, 4 Aug 2022 at 13:47, Linus Torvalds wrote:
>
> On Wed, Aug 3, 2022 at 8:37 PM Dave Airlie wrote:
> >
> > Actually I did miss that so that looks good.
>
> .. I wish it did, but I just actually test-booted my desktop with the
> result, and it crashes the X server.  This seems to be the splat in
> Xorg.0.log:
>
>   (II) Initializing extension DRI2
>   (II) AMDGPU(0): Setting screen physical size to 2032 x 571
>   (EE)
>   (EE) Backtrace:
>   (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x13d) [0x55b1dc61258d]
>   (EE) 1: /lib64/libc.so.6 (__sigaction+0x50) [0x7f7972a3ea70]
>   (EE) 2: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so (AMDGPUCreateWindow_oneshot+0x101) [0x7f797207ddd1]
>   (EE) 3: /usr/libexec/Xorg (compIsAlternateVisual+0xdc4) [0x55b1dc545fa4]
>   (EE) 4: /usr/libexec/Xorg (InitRootWindow+0x17) [0x55b1dc4e0047]
>   (EE) 5: /usr/libexec/Xorg (miPutImage+0xd4c) [0x55b1dc49e60b]
>   (EE) 6: /lib64/libc.so.6 (__libc_start_call_main+0x80) [0x7f7972a29550]
>   (EE) 7: /lib64/libc.so.6 (__libc_start_main+0x89) [0x7f7972a29609]
>   (EE) 8: /usr/libexec/Xorg (_start+0x25) [0x55b1dc49f2c5]
>   (EE)
>   (EE) Segmentation fault at address 0x4
>   (EE)
> Fatal server error:
>   (EE) Caught signal 11 (Segmentation fault). Server aborting
>
> so something is going horribly wrong. No kernel oops, though.
>
> It works on my Intel laptop, so it's amdgpu somewhere.

I'll spin my Ryzen up to see if I can reproduce, and test against the
drm-next pre-merge tree as well.

Dave.


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 8:37 PM Dave Airlie wrote:
>
> Actually I did miss that so that looks good.

.. I wish it did, but I just actually test-booted my desktop with the
result, and it crashes the X server.  This seems to be the splat in
Xorg.0.log:

  (II) Initializing extension DRI2
  (II) AMDGPU(0): Setting screen physical size to 2032 x 571
  (EE)
  (EE) Backtrace:
  (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x13d) [0x55b1dc61258d]
  (EE) 1: /lib64/libc.so.6 (__sigaction+0x50) [0x7f7972a3ea70]
  (EE) 2: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so (AMDGPUCreateWindow_oneshot+0x101) [0x7f797207ddd1]
  (EE) 3: /usr/libexec/Xorg (compIsAlternateVisual+0xdc4) [0x55b1dc545fa4]
  (EE) 4: /usr/libexec/Xorg (InitRootWindow+0x17) [0x55b1dc4e0047]
  (EE) 5: /usr/libexec/Xorg (miPutImage+0xd4c) [0x55b1dc49e60b]
  (EE) 6: /lib64/libc.so.6 (__libc_start_call_main+0x80) [0x7f7972a29550]
  (EE) 7: /lib64/libc.so.6 (__libc_start_main+0x89) [0x7f7972a29609]
  (EE) 8: /usr/libexec/Xorg (_start+0x25) [0x55b1dc49f2c5]
  (EE)
  (EE) Segmentation fault at address 0x4
  (EE)
Fatal server error:
  (EE) Caught signal 11 (Segmentation fault). Server aborting

so something is going horribly wrong. No kernel oops, though.

It works on my Intel laptop, so it's amdgpu somewhere.

I guess I will start bisecting. Oy vey.

 Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Dave Airlie
On Thu, 4 Aug 2022 at 13:16, Linus Torvalds wrote:
>
> On Wed, Aug 3, 2022 at 7:46 PM Linus Torvalds wrote:
> >
> > I think I have it resolved, am still doing a full build test, and will
> > then compare against what your suggested merge is.
>
> Hmm.
>
> I end up with *almost* the same thing.
>
> Except I ended up with a
>
> select DRM_BUDDY
>
> for the DRM_AMDGPU config entry, and you don't have that.
>
> I *think* my version is correct, in that clearly the amdgpu driver now
> uses that buddy logic (just doing a random "grep drm_buddy_block" to
> see).

Actually I did miss that so that looks good.

>
> But this was messy enough to resolve that I think people should
> double-check my end, and maybe I just got confused at some point in
> the process.
>
> And while I seem to have gotten the same result as you did on the i915
> firmware side too, again, I'd like people to re-verify.

I'll pull it down and look over it.

Dave.


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread pr-tracker-bot
The pull request you sent on Wed, 3 Aug 2022 15:37:43 +1000:

> git://anongit.freedesktop.org/drm/drm tags/drm-next-2022-08-03

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Wed, Aug 3, 2022 at 7:46 PM Linus Torvalds wrote:
>
> I think I have it resolved, am still doing a full build test, and will
> then compare against what your suggested merge is.

Hmm.

I end up with *almost* the same thing.

Except I ended up with a

select DRM_BUDDY

for the DRM_AMDGPU config entry, and you don't have that.

I *think* my version is correct, in that clearly the amdgpu driver now
uses that buddy logic (just doing a random "grep drm_buddy_block" to
see).
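The reasoning above (the amdgpu code now uses the buddy allocator, so its Kconfig entry needs `select DRM_BUDDY`) can be sketched as a simple consistency check in Python. Everything here is hypothetical and illustrative: `missing_select` is not kernel tooling, the substring match is a crude stand-in for the `grep drm_buddy_block` check mentioned above, and the sample Kconfig text is an abbreviated example rather than the real `DRM_AMDGPU` entry.

```python
import re
from typing import Iterable


def missing_select(sources: Iterable[str], kconfig_entry: str,
                   prefix: str = "drm_buddy_",
                   symbol: str = "DRM_BUDDY") -> bool:
    """Return True if any source text mentions `prefix` but the Kconfig
    entry text has no `select <symbol>` line (hypothetical helper)."""
    # Crude usage check, similar in spirit to grepping for drm_buddy_block.
    uses_symbol = any(prefix in text for text in sources)
    # Match a line like "\tselect DRM_BUDDY", but not "select DRM_BUDDY_FOO".
    pattern = rf"^\s*select\s+{re.escape(symbol)}\b"
    has_select = re.search(pattern, kconfig_entry, re.MULTILINE) is not None
    return uses_symbol and not has_select


# Illustrative, abbreviated inputs (not the real amdgpu sources or Kconfig).
driver_src = ["struct drm_buddy_block *block = NULL;"]
entry_without = 'config DRM_AMDGPU\n\ttristate "AMD GPU"\n\tselect DRM_TTM\n'
entry_with = entry_without + "\tselect DRM_BUDDY\n"

print(missing_select(driver_src, entry_without))  # True
print(missing_select(driver_src, entry_with))     # False
```

A `select` forces the selected symbol on whenever the selecting driver is enabled, so a driver that calls into the buddy allocator without selecting it can fail to link in configurations where nothing else enables `DRM_BUDDY`.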

But this was messy enough to resolve that I think people should
double-check my end, and maybe I just got confused at some point in
the process.

And while I seem to have gotten the same result as you did on the i915
firmware side too, again, I'd like people to re-verify.

   Linus


Re: [git pull] drm for 5.20/6.0

2022-08-03 Thread Linus Torvalds
On Tue, Aug 2, 2022 at 10:38 PM Dave Airlie wrote:
>
> This is a conflicty one. The late revert in 5.19 of the amdgpu buddy
> allocator causes major conflict fallout. The buddy allocator code in
> this one works, so the resolutions are usually just to take stuff from
> this. It might actually be cleaner if you revert
> 925b6e59138cefa47275c67891c65d48d3266d57 (Revert "drm/amdgpu: add drm
> buddy support to amdgpu") first in your tree then merge this.

Ugh, what a pain. The other conflicts are also due to just randomly
duplicated commits, with *usually* your drm tree having the superset
(so "just take yours" is the easy resolution), but not always (i.e. the
Intel firmware-69 mess was apparently not dealt with in the
development tree).

Nasty.

I think I have it resolved, am still doing a full build test, and will
then compare against what your suggested merge is.

  Linus