Re: [Nouveau] 6.2 still cannot get hdmi display out on Thinkpad P73 Quadro RTX 4000 Mobile/TU104

2023-05-04 Thread Marc MERLIN
On Thu, May 04, 2023 at 10:43:21PM -0500, Steven Kucharzyk wrote:
> On Thu, 4 May 2023 16:32:16 -0700
> Marc MERLIN  wrote:
> 
> > Hi again, I just saw a bunch of commits from all of you (thanks), but
> > still can't find info if my thinkpad P73 with Quadro RTX 4000
> > Mobile/TU104 is meant to be supported, or not, and if so, how I can
> > best report issues beyond what I've already sent.
> > 
> > The intel graphics works great thankfully, but I do need to use HDMI
> > out from time to time, which is only wired to the nvidia chip
> > unfortunately.
> > 
> > Guidance would be much appreciated.
> 
> I'm going to take a leap here ...
> 
> any UEFI ? TSM ?
 
Yes, I boot with UEFI. I'm not sure what TSM means.

> In the specs that I looked at for Lenovo's ThinkPad P73 FHD / 4K UHD,
> I personally found it interesting that the "up to" Nvidia Quadro RTX
> 5000 was listed as "Discrete" vs. the UHD Graphics 620 (24 EUs) as
> "Integrated".  Are you 4K?

4K correct. As far as I understand, I have integrated intel graphics,
which is what I use every day, and that nvidia chip I never use and have
no real need for, except that external display ports are only connected
to that chip, so I have to use it in that case.
I had a P70 with the same config and was able to get nouveau working on
it with HDMI out, but the P73 uses different chips and I never fully got
it working (well, the monitor turns on and I see a mouse cursor, so
something works).
https://docs.google.com/document/d/1GnyBE1xc4qx3EF-IcUOwr7d9D8Npzy63Pwj-joOw86o/view#heading=h.tmm3ssfqplva
explains how I got it to work on the P70.

> HDMI ... I have had issues with laptops + HDMI when plugging the cable
> into an already turned-on monitor. I have tried DVI-I > DP cables just
> to see ... Next, I didn't see any reference to Nvidia drivers; is that
> your option? (I know, I live with the bane of a "tainted kernel"
> because of them and flip back and forth to see how Nouveau is
> progressing)
 
I do not plan to use the nvidia binary drivers, and I want my nvidia
chip to be turned off all the time except when I need video out (for
battery reasons).

If you wanted context/more info:
https://www.spinics.net/lists/nouveau/msg11393.html
https://www.spinics.net/lists/nouveau/msg11394.html

and older from 2020:
https://www.spinics.net/lists/nouveau/msg05361.html

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


Re: [Nouveau] 6.2 still cannot get hdmi display out on Thinkpad P73 Quadro RTX 4000 Mobile/TU104

2023-05-04 Thread Steven Kucharzyk
On Thu, 4 May 2023 16:32:16 -0700
Marc MERLIN  wrote:

> Hi again, I just saw a bunch of commits from all of you (thanks), but
> still can't find info if my thinkpad P73 with Quadro RTX 4000
> Mobile/TU104 is meant to be supported, or not, and if so, how I can
> best report issues beyond what I've already sent.
> 
> The intel graphics works great thankfully, but I do need to use HDMI
> out from time to time, which is only wired to the nvidia chip
> unfortunately.
> 
> Guidance would be much appreciated.
> 
> Thanks,
> Marc

I'm going to take a leap here ...

any UEFI ? TSM ?

In the specs that I looked at for Lenovo's ThinkPad P73 FHD / 4K UHD,
I personally found it interesting that the "up to" Nvidia Quadro RTX
5000 was listed as "Discrete" vs. the UHD Graphics 620 (24 EUs) as
"Integrated".  Are you 4K?

I see it has TB3 ports, and that could mean that if you're going 4K
you'd want to be using those ... a guess, since I have neither.

HDMI ... I have had issues with laptops + HDMI when plugging the cable
into an already turned-on monitor. I have tried DVI-I > DP cables just
to see ... Next, I didn't see any reference to Nvidia drivers; is that
your option? (I know, I live with the bane of a "tainted kernel"
because of them and flip back and forth to see how Nouveau is
progressing)

Either my eyes are getting worse or I missed what OS you're running ...
I'm not sure whether the issue is that, the drivers, or possibly HDMI &
TB.




Re: [Nouveau] 6.2 still cannot get hdmi display out on Thinkpad P73 Quadro RTX 4000 Mobile/TU104

2023-05-04 Thread Marc MERLIN
Hi again, I just saw a bunch of commits from all of you (thanks), but
still can't find info if my thinkpad P73 with Quadro RTX 4000 Mobile/TU104 
is meant to be supported, or not, and if so, how I can best report
issues beyond what I've already sent.

The intel graphics works great thankfully, but I do need to use HDMI out
from time to time, which is only wired to the nvidia chip unfortunately.

Guidance would be much appreciated.

Thanks,
Marc

On Thu, Apr 20, 2023 at 10:46:20PM -0700, Marc MERLIN wrote:
> Tested with 6.2.8 and still nothing.  Is it meant to work at all?
> 
> Intel graphics works, but as soon as I plug in external HDMI, nouveau
> outputs a huge amount of log spam, but nothing seems to work
> 
> nouveau: detected PR support, will not use DSM
> nouveau :01:00.0: enabling device ( -> 0003)
> Console: switching to colour dummy device 80x25
> nouveau :01:00.0: NVIDIA TU104 (164000a1)
> nouveau :01:00.0: bios: version 90.04.4d.00.2c
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/nvdec/scrubber.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/acr/bl.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/acr/ucode_ahesasc.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/acr/bl.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/acr/ucode_asb.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/acr/unload_bl.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/acr/ucode_unload.bin
> nouveau :01:00.0: pmu: firmware unavailable
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/fecs_bl.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/fecs_inst.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/fecs_data.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/fecs_sig.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/gpccs_bl.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/gpccs_inst.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/gpccs_data.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/gpccs_sig.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/sw_nonctx.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/sw_ctx.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/sw_bundle_init.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/gr/sw_method_init.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/sec2/sig.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/sec2/image.bin
> nouveau :01:00.0: firmware: direct-loading firmware 
> nvidia/tu104/sec2/desc.bin
> nouveau :01:00.0: disp: preinit running...
> nouveau :01:00.0: disp: preinit completed in 0us
> nouveau :01:00.0: disp: fini running...
> nouveau :01:00.0: disp: fini completed in 0us
> nouveau :01:00.0: fb: 8192 MiB GDDR6
> nouveau :01:00.0: disp: init running...
> nouveau :01:00.0: disp: init skipped, engine has no users
> nouveau :01:00.0: disp: init completed in 2us
> nouveau :01:00.0: DRM: VRAM: 8192 MiB
> nouveau :01:00.0: DRM: GART: 536870912 MiB
> nouveau :01:00.0: DRM: BIT table 'A' not found
> nouveau :01:00.0: DRM: BIT table 'L' not found
> nouveau :01:00.0: DRM: TMDS table version 2.0
> nouveau :01:00.0: DRM: DCB version 4.1
> nouveau :01:00.0: DRM: DCB outp 00: 02800f66 04600020
> nouveau :01:00.0: DRM: DCB outp 01: 02011f52 00020010
> nouveau :01:00.0: DRM: DCB outp 02: 01022f36 04600010
> nouveau :01:00.0: DRM: DCB outp 03: 04033f76 04600010
> nouveau :01:00.0: DRM: DCB outp 04: 04044f86 04600020
> nouveau :01:00.0: DRM: DCB conn 00: 00020047
> nouveau :01:00.0: DRM: DCB conn 01: 00010161
> nouveau :01:00.0: DRM: DCB conn 02: 1248
> nouveau :01:00.0: DRM: DCB conn 03: 01000348
> nouveau :01:00.0: DRM: DCB conn 04: 02000471
> nouveau :01:00.0: DRM: MM: using COPY for buffer copies
> nouveau :01:00.0: disp: init running...
> nouveau :01:00.0: disp: one-time init running...
> nouveau :01:00.0: disp: outp 00:0006:0f82: type 06 loc 0 or 2 link 2 con 
> 0 edid 6 bus 0 head f
> nouveau :01:00.0: disp: outp 00:0006:0f82: bios dp 42 13 00 00
> nouveau :01:00.0: disp: outp 01:0002:0f42: type 02 loc 0 or 2 link 1 con 
> 1 edid 5 bus 1 head f
> nouveau :01:00.0: disp: outp 02:0006:0f41: type 06 loc 0 or 1 link 1 con 
> 2 edid 3 bus 2 head f
> nouveau :01:00.0: disp: outp 02:0006:0f41: bios dp 42 13 00 00
> nouveau :01:00.0: disp: outp 03:0006:0f44: type 06 loc 0 or 4 link 1 con 
> 3 edid 7 bus 3 head 

Re: [Nouveau] Disabling -Warray-bounds for gcc-13 too

2023-05-04 Thread Kees Cook
On April 27, 2023 3:50:06 PM PDT, Karol Herbst  wrote:
>On Fri, Apr 28, 2023 at 12:46 AM Lyude Paul  wrote:
>>
>> Hey Linus, Kees. Responses below
>>
>> On Sun, 2023-04-23 at 13:23 -0700, Kees Cook wrote:
>> > On April 23, 2023 10:36:24 AM PDT, Linus Torvalds 
>> >  wrote:
>> > > Kees,
>> > >  I made the mistake of upgrading my M2 Macbook Air to Fedora-38, and
>> > > in the process I got gcc-13 which is not WERROR-clean because we only
>> > > limited the 'array-bounds' warning to gcc-11 and gcc-12. But gcc-13
>> > > has all the same issues.
>> > >
>> > > And I want to be able to do my arm64 builds with WERROR on still...
>> > >
>> > > I guess it never made much sense to hope it was going to go away
>> > > without having a confirmation, so I just changed it to be gcc-11+.
>> >
>> > Yeah, that's fine. GCC 13 released without having a fix for at least one 
>> > (hopefully last) known array-bounds vs jump threading bug:
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071
>> >
>> > > And one of them is from you.
>> > >
>> > > In particular, commit 4076ea2419cf ("drm/nouveau/disp: Fix
> > > > nvif_outp_acquire_dp() argument size") cannot possibly be right. It
>> > > changes
>> > >
>> > > nvif_outp_acquire_dp(struct nvif_outp *outp, u8 dpcd[16],
>> > >
>> > > to
>> > >
>> > > nvif_outp_acquire_dp(struct nvif_outp *outp, u8 
>> > > dpcd[DP_RECEIVER_CAP_SIZE],
>> > >
>> > > and then does
>> > >
>> > >memcpy(args.dp.dpcd, dpcd, sizeof(args.dp.dpcd));
>> > >
>> > > where that 'args.dp.dpcd' is a 16-byte array, and DP_RECEIVER_CAP_SIZE 
>> > > is 15.
>> >
>> > Yeah, it was an incomplete fix. I sent the other half here, but it fell 
>> > through the cracks:
>> > https://lore.kernel.org/lkml/20230204184307.never.825-k...@kernel.org/
>>
>> Thanks for bringing this to our attention, yeah this definitely just looks
>> like it got missed somewhere down the line. It looks like Karol responded
>> already so I assume the patch is in the pipeline now, but let me know if
>> there's anything else you need.
>>
>
>uhm, I didn't push anything, but I can push it through drm-misc asap,
>just wanted to ask if somebody wants to pick a quicker route. But I
>guess not?

If you can pick it up, that would be great. There's no rush. :)



-- 
Kees Cook
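For anyone skimming the thread, the size mismatch Linus describes can be sketched in a few lines of standalone C. The struct and names below are illustrative stand-ins, not the actual nouveau definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins: DP_RECEIVER_CAP_SIZE is 15, but the argument
 * struct still carries a 16-byte array, so a memcpy of
 * sizeof(args.dpcd) bytes reads one byte past a caller buffer that is
 * only DP_RECEIVER_CAP_SIZE bytes long. */
#define DP_RECEIVER_CAP_SIZE 15

struct dp_args {
	unsigned char dpcd[16]; /* kernel-side struct, still 16 bytes */
};

/* How many bytes a sizeof(args.dpcd)-sized copy would read beyond a
 * DP_RECEIVER_CAP_SIZE-sized source buffer. */
size_t overread_bytes(void)
{
	struct dp_args args;

	return sizeof(args.dpcd) - DP_RECEIVER_CAP_SIZE;
}
```

As I understand the follow-up patch Kees links, it shrinks the remaining 16-byte member to DP_RECEIVER_CAP_SIZE so the two sizes agree.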


Re: [Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-05-04 Thread Diogo Ivo
On Mon, Jan 30, 2023 at 08:36:06AM +1000, Ben Skeggs wrote:
> On Fri, 27 Jan 2023 at 20:42, Diogo Ivo  wrote:
> >
> > On Fri, Jan 27, 2023 at 04:00:59PM +1000, Ben Skeggs wrote:
> > > On Fri, 20 Jan 2023 at 21:37, Diogo Ivo  
> > > wrote:
> > > >
> > > > On Wed, Jan 18, 2023 at 11:28:49AM +1000, Ben Skeggs wrote:
> > > > > On Mon, 16 Jan 2023 at 22:27, Diogo Ivo 
> > > > >  wrote:
> > > > > > On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > > > > > > As a quick check can you try changing
> > > > > > >
> > > > > > > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > > > > > > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
> > > >
> > > > > In addition to Dave's change, can you try changing the
> > > > > nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:
> > > > >
> > > > > nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
> > > > > sizeof(args), 0, false);
> > > >
> > > > Chiming in just to say that with this change I see the same as Nicolas
> > > > except that the init message size is 255 instead of 0:
> > > >
> > > > [2.196934] nouveau 5700.gpu: pmu: unexpected init message size 
> > > > 255 vs 42
> > > I've attached an entirely untested patch (to go on top of the other
> > > hacks/fixes so far), that will hopefully get us a little further.
> >
> > Hello,
> >
> > Thank you for the patch! I can confirm that it fixes the problem
> > on the Pixel C, and everything works as before the regression.
> > With this, for the combination of patches
> >
> > Tested-by: Diogo Ivo 
> >
> > which I can resend after testing the final patch version.
> Thank you (both!) for testing!
> 
> I've attached a "final" version of a patch that I'll send (assuming it
> still works ;)) after re-testing.  There's only a minor change to
> avoid breaking the non-Tegra path, so I expect it should be fine.

Hello!

I have tested this new version and everything is working as before, so

Tested-by: Diogo Ivo 

Thank you,
Diogo


Re: [Nouveau] 2023 X.Org Foundation Membership deadline for voting in the election

2023-05-04 Thread Ricardo Garcia
This is a reminder that the deadline for new memberships and renewals
is in a couple of weeks. The original email follows.

Thanks for your attention.

On Wed, 2023-02-15 at 16:58 +0100, Ricardo Garcia wrote:
> The 2023 X.Org Foundation elections are rapidly approaching. We will be
> forwarding the election schedule and nominating process to the
> membership shortly.
> 
> Please note that only current members can vote in the upcoming election,
> and that the deadline for new memberships or renewals to vote in the
> upcoming election is 26 March 2023 at 23:59 UTC.
> 
> If you are interested in joining the X.Org Foundation or in renewing
> your membership, please visit the membership system site at:
> https://members.x.org/
> 
> Ricardo Garcia, on behalf of the X.Org elections committee



[Nouveau] [PATCH 4/6] drm/ttm: Change the parameters of ttm_range_man_init() from pages to bytes

2023-05-04 Thread Somalapuram Amaranath
Change the size parameter of ttm_range_man_init_nocheck() from a page
count to a byte count, and clean up the PAGE_SHIFT operations in the
affected caller functions.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 ++--
 drivers/gpu/drm/drm_gem_vram_helper.c   | 2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c | 4 ++--
 drivers/gpu/drm/ttm/ttm_range_manager.c | 8 
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +-
 include/drm/ttm/ttm_range_manager.h | 6 +++---
 6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 6b270d4662a3..f0dabdfd3780 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -75,10 +75,10 @@ static void amdgpu_ttm_backend_unbind(struct ttm_device 
*bdev,
 
 static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
unsigned int type,
-   uint64_t size_in_page)
+   uint64_t size)
 {
	return ttm_range_man_init(&adev->mman.bdev, type,
- false, size_in_page);
+ false, size << PAGE_SHIFT);
 }
 
 /**
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c 
b/drivers/gpu/drm/drm_gem_vram_helper.c
index e7be562790de..db1915414e4a 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -999,7 +999,7 @@ static int drm_vram_mm_init(struct drm_vram_mm *vmm, struct 
drm_device *dev,
return ret;
 
	ret = ttm_range_man_init(&vmm->bdev, TTM_PL_VRAM,
-false, vram_size >> PAGE_SHIFT);
+false, vram_size);
if (ret)
return ret;
 
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 777d38b211d2..aa8785b6b1e8 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -70,13 +70,13 @@ struct radeon_device *radeon_get_rdev(struct ttm_device 
*bdev)
 static int radeon_ttm_init_vram(struct radeon_device *rdev)
 {
	return ttm_range_man_init(&rdev->mman.bdev, TTM_PL_VRAM,
- false, rdev->mc.real_vram_size >> PAGE_SHIFT);
+ false, rdev->mc.real_vram_size);
 }
 
 static int radeon_ttm_init_gtt(struct radeon_device *rdev)
 {
	return ttm_range_man_init(&rdev->mman.bdev, TTM_PL_TT,
- true, rdev->mc.gtt_size >> PAGE_SHIFT);
+ true, rdev->mc.gtt_size);
 }
 
 static void radeon_evict_flags(struct ttm_buffer_object *bo,
diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c
index ae11d07eb63a..62fddcc59f02 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -169,7 +169,7 @@ static const struct ttm_resource_manager_func 
ttm_range_manager_func = {
  * @bdev: ttm device
  * @type: memory manager type
  * @use_tt: if the memory manager uses tt
- * @p_size: size of area to be managed in pages.
+ * @size: size of area to be managed in bytes.
  *
  * The range manager is installed for this device in the type slot.
  *
@@ -177,7 +177,7 @@ static const struct ttm_resource_manager_func 
ttm_range_manager_func = {
  */
 int ttm_range_man_init_nocheck(struct ttm_device *bdev,
   unsigned type, bool use_tt,
-  unsigned long p_size)
+  u64 size)
 {
struct ttm_resource_manager *man;
struct ttm_range_manager *rman;
@@ -191,9 +191,9 @@ int ttm_range_man_init_nocheck(struct ttm_device *bdev,
 
	man->func = &ttm_range_manager_func;
 
-   ttm_resource_manager_init(man, bdev, p_size);
+   ttm_resource_manager_init(man, bdev, size);
 
-	drm_mm_init(&rman->mm, 0, p_size);
+	drm_mm_init(&rman->mm, 0, size);
+   drm_mm_init(>mm, 0, size);
	spin_lock_init(&rman->lock);
 
	ttm_set_driver_manager(bdev, type, &rman->manager);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 9ad28346aff7..4926e7c73e75 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -700,7 +700,7 @@ static int vmw_vram_manager_init(struct vmw_private 
*dev_priv)
 {
int ret;
	ret = ttm_range_man_init(&dev_priv->bdev, TTM_PL_VRAM, false,
-dev_priv->vram_size >> PAGE_SHIFT);
+dev_priv->vram_size);
	ttm_resource_manager_set_used(ttm_manager_type(&dev_priv->bdev, 
TTM_PL_VRAM), false);
return ret;
 }
diff --git a/include/drm/ttm/ttm_range_manager.h 
b/include/drm/ttm/ttm_range_manager.h
index 7963b957e9ef..05bffded1b53 100644
--- a/include/drm/ttm/ttm_range_manager.h
+++ b/include/drm/ttm/ttm_range_manager.h
@@ -36,15 +36,15 @@ to_ttm_range_mgr_node(struct ttm_resource *res)
 
 int 
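To make the unit change concrete, here is a minimal caller-side sketch. The PAGE_SHIFT value is assumed (4 KiB pages) for illustration and is not taken from the kernel headers:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumed 4 KiB pages, for illustration only */

/* After this patch ttm_range_man_init() expects bytes, so a caller that
 * tracked its size in pages shifts up at the call site
 * (as amdgpu_ttm_init_on_chip() now does with size << PAGE_SHIFT)... */
uint64_t pages_to_bytes(uint64_t npages)
{
	return npages << PAGE_SHIFT;
}

/* ...while callers that already had a byte count (radeon, vmwgfx,
 * drm_gem_vram_helper) simply drop their old ">> PAGE_SHIFT". */
uint64_t bytes_to_pages(uint64_t nbytes)
{
	return nbytes >> PAGE_SHIFT;
}
```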

Re: [Nouveau] [Intel-gfx] [PATCH v2 2/3] drm/fb-helper: Set framebuffer for vga-switcheroo clients

2023-05-04 Thread Rodrigo Vivi
On Thu, Jan 12, 2023 at 09:11:55PM +0100, Thomas Zimmermann wrote:
> Set the framebuffer info for drivers that support VGA switcheroo. Only
> affects the amdgpu and nouveau drivers, which use VGA switcheroo and
> generic fbdev emulation. For other drivers, this does nothing.
> 
> This fixes a potential regression in the console code. Both, amdgpu and
> nouveau, invoked vga_switcheroo_client_fb_set() from their internal fbdev
> code. But the call got lost when the drivers switched to the generic
> emulation.
> 
> Fixes: 087451f372bf ("drm/amdgpu: use generic fb helpers instead of setting 
> up AMD own's.")
> Fixes: 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers")
> Signed-off-by: Thomas Zimmermann 
> Reviewed-by: Daniel Vetter 
> Reviewed-by: Alex Deucher 
> Cc: Ben Skeggs 
> Cc: Karol Herbst 
> Cc: Lyude Paul 
> Cc: Thomas Zimmermann 
> Cc: Javier Martinez Canillas 
> Cc: Laurent Pinchart 
> Cc: Jani Nikula 
> Cc: Dave Airlie 
> Cc: Evan Quan 
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Hawking Zhang 
> Cc: Likun Gao 
> Cc: "Christian König" 
> Cc: Stanley Yang 
> Cc: "Tianci.Yin" 
> Cc: Xiaojian Du 
> Cc: Andrey Grodzovsky 
> Cc: YiPeng Chai 
> Cc: Somalapuram Amaranath 
> Cc: Bokun Zhang 
> Cc: Guchun Chen 
> Cc: Hamza Mahfooz 
> Cc: Aurabindo Pillai 
> Cc: Mario Limonciello 
> Cc: Solomon Chiu 
> Cc: Kai-Heng Feng 
> Cc: Felix Kuehling 
> Cc: Daniel Vetter 
> Cc: "Marek Olšák" 
> Cc: Sam Ravnborg 
> Cc: Hans de Goede 
> Cc: "Ville Syrjälä" 
> Cc: dri-de...@lists.freedesktop.org
> Cc: nouveau@lists.freedesktop.org
> Cc:  # v5.17+
> ---
>  drivers/gpu/drm/drm_fb_helper.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> index 427631706128..5e445c61252d 100644
> --- a/drivers/gpu/drm/drm_fb_helper.c
> +++ b/drivers/gpu/drm/drm_fb_helper.c
> @@ -30,7 +30,9 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include <linux/console.h>
> +#include <linux/pci.h>
>  #include <linux/sysrq.h>
> +#include <linux/vga_switcheroo.h>
>  
>  #include <drm/drm_atomic.h>
>  #include <drm/drm_drv.h>
> @@ -1940,6 +1942,7 @@ static int drm_fb_helper_single_fb_probe(struct 
> drm_fb_helper *fb_helper,
>int preferred_bpp)
>  {
>  	struct drm_client_dev *client = &fb_helper->client;
> + struct drm_device *dev = fb_helper->dev;

On drm-tip, this commit has a silent conflict with
cff84bac9922 ("drm/fh-helper: Split fbdev single-probe helper")
that's already in drm-next.

I had created a fix-up patch in drm-tip re-introducing this line.

We probably need a backmerge from drm-next into drm-misc-fixes with
the resolution applied there, and that resolution will probably need
to be propagated later...

>   struct drm_fb_helper_surface_size sizes;
>   int ret;
>  
> @@ -1961,6 +1964,11 @@ static int drm_fb_helper_single_fb_probe(struct 
> drm_fb_helper *fb_helper,
>   return ret;
>  
>   strcpy(fb_helper->fb->comm, "[fbcon]");
> +
> + /* Set the fb info for vgaswitcheroo clients. Does nothing otherwise. */
> + if (dev_is_pci(dev->dev))
> + vga_switcheroo_client_fb_set(to_pci_dev(dev->dev), 
> fb_helper->info);
> +
>   return 0;
>  }
>  
> -- 
> 2.39.0
> 


[Nouveau] [PATCH v2 2/2] drm/nouveau/clk: avoid usage of list iterator after loop

2023-05-04 Thread Jakob Koschel
If potentially no valid element is found, 'pstate' would contain an
invalid pointer past the iterator loop. To ensure 'pstate' is always
valid, we only set it if the correct element was found. That allows
adding a WARN_ON() in case the code works incorrectly, exposing
currently undetectable potential bugs.

Additionally, Linus proposed to avoid any use of the list iterator
variable after the loop, in the attempt to move the list iterator
variable declaration into the macro to avoid any potential misuse after
the loop [1].

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c
index da07a2fbef06..d914cce6d0b8 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c
@@ -269,14 +269,18 @@ nvkm_pstate_prog(struct nvkm_clk *clk, int pstatei)
	struct nvkm_subdev *subdev = &clk->subdev;
struct nvkm_fb *fb = subdev->device->fb;
struct nvkm_pci *pci = subdev->device->pci;
-   struct nvkm_pstate *pstate;
+   struct nvkm_pstate *pstate = NULL, *iter;
int ret, idx = 0;
 
-	list_for_each_entry(pstate, &clk->states, head) {
-   if (idx++ == pstatei)
+	list_for_each_entry(iter, &clk->states, head) {
+   if (idx++ == pstatei) {
+   pstate = iter;
break;
+   }
}
 
+   if (WARN_ON(!pstate))
+   return -EINVAL;
nvkm_debug(subdev, "setting performance state %d\n", pstatei);
clk->pstate = pstatei;
 

-- 
2.34.1
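The idea in the patch above generalizes beyond the kernel's list macros. Here is a standalone C sketch of the same pattern with a hand-rolled list and illustrative names:

```c
#include <assert.h>
#include <stddef.h>

struct pstate {
	int id;
	struct pstate *next;
};

/* Mirrors the fixed nvkm_pstate_prog() logic: keep the loop cursor
 * ('iter') separate from the result ('pstate'), so the result is
 * provably NULL when nothing matched, instead of pointing at whatever
 * the iterator held after the loop ran off the end. */
struct pstate *find_pstate(struct pstate *head, int wanted_idx)
{
	struct pstate *pstate = NULL, *iter;
	int idx = 0;

	for (iter = head; iter != NULL; iter = iter->next) {
		if (idx++ == wanted_idx) {
			pstate = iter; /* only set on a confirmed match */
			break;
		}
	}
	return pstate; /* NULL if wanted_idx was out of range */
}

/* Small self-check: index 1 finds the second node, index 5 finds none. */
int find_pstate_demo(void)
{
	struct pstate c = {2, NULL}, b = {1, &c}, a = {0, &b};

	return find_pstate(&a, 1) == &b && find_pstate(&a, 5) == NULL;
}
```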



[Nouveau] [PATCH] drm/nouveau: dispnv50: fix missing-prototypes warning

2023-05-04 Thread Arnd Bergmann
From: Arnd Bergmann 

nv50_display_create() is declared in another header, along with
a couple of declarations that are now outdated:

drivers/gpu/drm/nouveau/dispnv50/disp.c:2517:1: error: no previous prototype 
for 'nv50_display_create'

Fixes: ba801ef068c1 ("drm/nouveau/kms: display destroy/init/fini hooks can be 
static")
Signed-off-by: Arnd Bergmann 
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 1 +
 drivers/gpu/drm/nouveau/nv50_display.h  | 4 +---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 5bb777ff1313..9b6824f6b9e4 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -64,6 +64,7 @@
 #include "nouveau_connector.h"
 #include "nouveau_encoder.h"
 #include "nouveau_fence.h"
+#include "nv50_display.h"
 
 #include 
 
diff --git a/drivers/gpu/drm/nouveau/nv50_display.h 
b/drivers/gpu/drm/nouveau/nv50_display.h
index fbd3b15583bc..60f77766766e 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.h
+++ b/drivers/gpu/drm/nouveau/nv50_display.h
@@ -31,7 +31,5 @@
 #include "nouveau_reg.h"
 
 int  nv50_display_create(struct drm_device *);
-void nv50_display_destroy(struct drm_device *);
-int  nv50_display_init(struct drm_device *);
-void nv50_display_fini(struct drm_device *);
+
 #endif /* __NV50_DISPLAY_H__ */
-- 
2.39.2
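As a minimal illustration of what -Wmissing-prototypes checks (the names below are made up, not the real nv50 declarations): the warning fires when a function with external linkage is defined without a prior declaration in scope, and including the header that declares it, as this patch does, is the usual fix.

```c
#include <assert.h>

/* Stand-in for what nv50_display.h provides: a prior declaration that
 * satisfies -Wmissing-prototypes for the definition below. */
int display_create_demo(int have_device);

/* With the declaration visible, this definition has a prototype. If the
 * "header" line above were missing, gcc -Wmissing-prototypes would warn
 * here, unless the function were made static instead. */
int display_create_demo(int have_device)
{
	return have_device ? 0 : -1;
}
```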



[Nouveau] [PATCH v3 1/4] drm/amdgpu: Use cursor start instead of ttm resource start

2023-05-04 Thread Somalapuram Amaranath
Clean up the PAGE_SHIFT operations, replacing ttm_resource
resource->start with the cursor start obtained via the
amdgpu_res_first() API.
v1 -> v2: reorder patch sequence
v2 -> v3: address review comments on v2

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 10 +++---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 25a68de0..2a74039c82eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1491,9 +1491,11 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
 u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+   struct amdgpu_res_cursor cursor;
uint64_t offset;
 
-   offset = (bo->tbo.resource->start << PAGE_SHIFT) +
+	amdgpu_res_first(bo->tbo.resource, 0, bo->tbo.resource->size, &cursor);
+   offset = cursor.start +
 amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
 
return amdgpu_gmc_sign_extend(offset);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c5ef7f7bdc15..ffe6a1ab7f9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -849,6 +849,7 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(bdev);
struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(ttm);
+   struct amdgpu_res_cursor cursor;
uint64_t flags;
int r;
 
@@ -896,7 +897,8 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
flags = amdgpu_ttm_tt_pte_flags(adev, ttm, bo_mem);
 
/* bind pages into GART page tables */
-   gtt->offset = (u64)bo_mem->start << PAGE_SHIFT;
+	amdgpu_res_first(bo_mem, 0, bo_mem->size, &cursor);
+   gtt->offset = cursor.start;
amdgpu_gart_bind(adev, gtt->offset, ttm->num_pages,
 gtt->ttm.dma_address, flags);
gtt->bound = true;
@@ -916,6 +918,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
struct ttm_operation_ctx ctx = { false, false };
struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(bo->ttm);
+   struct amdgpu_res_cursor cursor;
struct ttm_placement placement;
struct ttm_place placements;
struct ttm_resource *tmp;
@@ -927,7 +930,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
 
addr = amdgpu_gmc_agp_addr(bo);
if (addr != AMDGPU_BO_INVALID_OFFSET) {
-   bo->resource->start = addr >> PAGE_SHIFT;
+   bo->resource->start = addr;
return 0;
}
 
@@ -949,7 +952,8 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, tmp);
 
/* Bind pages */
-   gtt->offset = (u64)tmp->start << PAGE_SHIFT;
+	amdgpu_res_first(tmp, 0, tmp->size, &cursor);
+   gtt->offset = cursor.start;
amdgpu_ttm_gart_bind(adev, bo, flags);
amdgpu_gart_invalidate_tlb(adev);
	ttm_resource_free(bo, &bo->resource);
-- 
2.32.0



Re: [Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-05-04 Thread Diogo Ivo
On Fri, Jan 27, 2023 at 04:00:59PM +1000, Ben Skeggs wrote:
> On Fri, 20 Jan 2023 at 21:37, Diogo Ivo  wrote:
> >
> > On Wed, Jan 18, 2023 at 11:28:49AM +1000, Ben Skeggs wrote:
> > > On Mon, 16 Jan 2023 at 22:27, Diogo Ivo  
> > > wrote:
> > > > On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> > > > > As a quick check can you try changing
> > > > >
> > > > > drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> > > > > from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?
> >
> > > In addition to Dave's change, can you try changing the
> > > nvkm_falcon_load_dmem() call in gm20b_pmu_init() to:
> > >
> > > nvkm_falcon_pio_wr(falcon, (u8 *)&args, 0, 0, DMEM, addr_args,
> > > sizeof(args), 0, false);
> >
> > Chiming in just to say that with this change I see the same as Nicolas
> > except that the init message size is 255 instead of 0:
> >
> > [2.196934] nouveau 5700.gpu: pmu: unexpected init message size 255 
> > vs 42
> I've attached an entirely untested patch (to go on top of the other
> hacks/fixes so far), that will hopefully get us a little further.

Hello,

Thank you for the patch! I can confirm that it fixes the problem
on the Pixel C, and everything works as before the regression.
With this, for the combination of patches

Tested-by: Diogo Ivo  

which I can resend after testing the final patch version.

Thanks,
Diogo


[Nouveau] [PATCH 4/5] drm/nouveau/fifo/gf100-: make gf100_fifo_nonstall_block() static

2023-05-04 Thread Jiapeng Chong
This symbol is not used outside of gf100.c, so mark it static.

drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: no previous 
prototype for ‘gf100_fifo_nonstall_block’.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3021
Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c
index 5bb65258c36d..6c94451d0faa 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c
@@ -447,7 +447,7 @@ gf100_fifo_nonstall_allow(struct nvkm_event *event, int 
type, int index)
	spin_unlock_irqrestore(&fifo->lock, flags);
 }
 
-void
+static void
 gf100_fifo_nonstall_block(struct nvkm_event *event, int type, int index)
 {
struct nvkm_fifo *fifo = container_of(event, typeof(*fifo), 
nonstall.event);
-- 
2.20.1.7.g153144c



Re: [Nouveau] 2023 X.Org Board of Directors Elections Nomination period is NOW

2023-05-04 Thread Ricardo Garcia
This is a reminder that the nomination period for the X.Org Board of
Director elections finishes in a week, on March 19th.

If you would like to nominate yourself please send email to the election
committee electi...@x.org, giving your

name
current professional affiliation
a statement of contribution to X.Org or related technologies
a personal statement.

To vote or to be elected to the Board you need to be a Member of the
X.Org Foundation. To become a Member of the X.Org Foundation you need
to apply or renew your membership before the end of the nomination
period.

Original email follows below. Thanks for your attention.

On Wed, 2023-02-15 at 21:53 +0100, Ricardo Garcia wrote:
> We are seeking nominations for candidates for election to the X.Org
> Foundation Board of Directors. All X.Org Foundation members are eligible
> for election to the board.
> 
> Nominations for the 2023 election are now open and will remain open
> until 23:59 UTC on 19 March 2023.
> 
> The Board consists of directors elected from the membership. Each year,
> an election is held to bring the total number of directors to eight. The
> four members receiving the highest vote totals will serve as directors
> for two year terms.
> 
> The directors who received two year terms starting in 2022 were Emma
> Anholt, Mark Filion, Alyssa Rosenzweig and Ricardo Garcia. They will
> continue to serve until their term ends in 2024. Current directors whose
> term expires in 2023 are Samuel Iglesias Gonsálvez, Manasi D Navare,
> Lyude Paul and Daniel Vetter.
> 
> A director is expected to participate in the fortnightly IRC meeting to
> discuss current business and to attend the annual meeting of the X.Org
> Foundation, which will be held at a location determined in advance by
> the Board of Directors.
> 
> A member may nominate themselves or any other member they feel is
> qualified. Nominations should be sent to the Election Committee at
> elections at x.org.
> 
> Nominees shall be required to be current members of the X.Org
> Foundation, and submit a personal statement of up to 200 words that will
> be provided to prospective voters. The collected statements, along with
> the statement of contribution to the X.Org Foundation in the member's
> account page on http://members.x.org, will be made available to all
> voters to help them make their voting decisions.
> 
> Nominations, membership applications or renewals and completed personal
> statements must be received no later than 23:59 UTC on 19 March 2023.
> 
> The slate of candidates will be published 26 March 2023 and candidate
> Q&A will begin then. The deadline for X.Org membership applications and
> renewals is 26 March 2023.
> 
> Cheers,
> Ricardo Garcia, on behalf of the X.Org BoD
> 



Re: [Nouveau] [PATCH] drm/nouveau/mmu: fix use-after-free bug in nvkm_vmm_pfn_map

2023-05-04 Thread Zheng Hacker
Hi,

This bug has been proven to be a false positive, so there is no need
for the patch.

Thanks,
Zheng

Lyude Paul  于2023年3月7日周二 08:11写道:
>
> Actually - could you resend this with dri-de...@lists.freedesktop.org added to
> the cc list just to make patchwork happy?
>
> On Sat, 2022-10-29 at 15:46 +0800, Zheng Wang wrote:
> > If kzalloc fails, vma will be freed in nvkm_vmm_node_merge.
> > The later use of vma will cause a use-after-free.
> >
> > Reported-by: Zheng Wang 
> > Reported-by: Zhuorao Yang 
> >
> > Fix it by returning to upper caller as soon as error occurs.
> >
> > Signed-off-by: Zheng Wang 
> > ---
> >  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c 
> > b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
> > index ae793f400ba1..04befd28f80b 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
> > @@ -1272,8 +1272,7 @@ nvkm_vmm_pfn_map(struct nvkm_vmm *vmm, u8 shift, u64 
> > addr, u64 size, u64 *pfn)
> >  page -
> >  vmm->func->page, map);
> >   if (WARN_ON(!tmp)) {
> > - ret = -ENOMEM;
> > - goto next;
> > + return -ENOMEM;
> >   }
> >
> >   if ((tmp->mapped = map))
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
>


Re: [Nouveau] [PATCH 2/2] drm/nouveau/kms: Add INHERIT ioctl to nvkm/nvif for reading IOR state

2023-05-04 Thread Dan Carpenter
Hi Lyude,

kernel test robot noticed the following build warnings:

[When submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Lyude-Paul/drm-nouveau-kms-Add-INHERIT-ioctl-to-nvkm-nvif-for-reading-IOR-state/20230408-062329
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:
https://lore.kernel.org/r/20230407222133.1425969-2-lyude%40redhat.com
patch subject: [PATCH 2/2] drm/nouveau/kms: Add INHERIT ioctl to nvkm/nvif for 
reading IOR state
config: csky-randconfig-m031-20230409 
(https://download.01.org/0day-ci/archive/20230409/202304091929.sr0cfhln-...@intel.com/config)
compiler: csky-linux-gcc (GCC) 12.1.0

If you fix the issue, kindly add the following tags where applicable
| Reported-by: kernel test robot 
| Reported-by: Dan Carpenter 
| Link: https://lore.kernel.org/r/202304091929.sr0cfhln-...@intel.com/

New smatch warnings:
drivers/gpu/drm/nouveau/dispnv50/disp.c:2518 nv50_display_read_hw_or_state() 
error: uninitialized symbol 'head_idx'.

vim +/head_idx +2518 drivers/gpu/drm/nouveau/dispnv50/disp.c

a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2477  static inline void
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2478  nv50_display_read_hw_or_state(struct drm_device *dev, struct nv50_disp 
*disp,
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2479struct nouveau_encoder *outp)
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2480  {
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2481  struct drm_crtc *crtc;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2482  struct drm_connector_list_iter conn_iter;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2483  struct drm_connector *conn;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2484  struct nv50_head_atom *armh;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2485  const u32 encoder_mask = drm_encoder_mask(&outp->base.base);
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2486  bool found_conn = false, found_head = false;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2487  u8 proto;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2488  int head_idx;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2489  int ret;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2490  
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2491  switch (outp->dcb->type) {
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2492  case DCB_OUTPUT_TMDS:
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2493  ret = nvif_outp_inherit_tmds(&outp->outp, &proto);
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2494  break;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2495  case DCB_OUTPUT_DP:
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2496  ret = nvif_outp_inherit_dp(&outp->outp, &proto);
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2497  break;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2498  case DCB_OUTPUT_LVDS:
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2499  ret = nvif_outp_inherit_lvds(&outp->outp, &proto);
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2500  break;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2501  case DCB_OUTPUT_ANALOG:
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2502  ret = nvif_outp_inherit_rgb_crt(&outp->outp, &proto);
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2503  break;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2504  default:
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2505  drm_dbg_kms(dev, "Readback for %s not implemented yet, 
skipping\n",
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2506  outp->base.base.name);
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2507  drm_WARN_ON(dev, true);
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2508  return;
a3d963915cf6f2 drivers/gpu/drm/nouveau/dispnv50/disp.c Lyude Paul 2023-04-07  
2509  }
a3d963915cf6f2 

Re: [Nouveau] [PATCH] Change the meaning of the fields in the ttm_place structure from pfn to bytes

2023-05-04 Thread Stanislaw Gruszka
On Fri, Mar 03, 2023 at 03:55:56PM +0100, Michel Dänzer wrote:
> On 3/3/23 08:16, Somalapuram Amaranath wrote:
> > Change the ttm_place structure member fpfn, lpfn, mem_type to
> > res_start, res_end, res_type.
> > Change the unsigned to u64.
> > Fix the dependence in all the DRM drivers and
> > clean up PAGE_SHIFT operation.
> > 
> > Signed-off-by: Somalapuram Amaranath 
> > 
> > [...]
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > index 44367f03316f..5b5104e724e3 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > @@ -131,11 +131,12 @@ static int amdgpu_gtt_mgr_new(struct 
> > ttm_resource_manager *man,
> > goto err_free;
> > }
> >  
> > -   if (place->lpfn) {
> > +   if (place->res_end) {
> > spin_lock(&mgr->lock);
> > r = drm_mm_insert_node_in_range(&mgr->mm, &node->mm_nodes[0],
> > -   num_pages, tbo->page_alignment,
> > -   0, place->fpfn, place->lpfn,
> > +   num_pages, tbo->page_alignment, 
> > 0,
> > +   place->res_start << PAGE_SHIFT,
> > +   place->res_end << PAGE_SHIFT,
> > DRM_MM_INSERT_BEST);
> 
> This should be >> or no shift instead of <<, shouldn't it? Multiplying a 
> value in bytes by the page size doesn't make sense.
> 
> 
> I didn't check the rest of the patch in detail, but it's easy introduce 
> subtle regressions with this kind of change. It'll require a lot of review & 
> testing scrutiny.

A good justification is also needed. The changelog says only what is done,
nothing about why the change is needed.

Regards
Stanislaw


[Nouveau] [PATCH 2/2] drm/nouveau: constify pointers to hwmon_channel_info

2023-05-04 Thread Krzysztof Kozlowski
Statically allocated arrays of pointers to hwmon_channel_info can be made
const for safety.

Signed-off-by: Krzysztof Kozlowski 

---

This depends on hwmon core patch:
https://lore.kernel.org/all/20230406203103.3011503-2-krzysztof.kozlow...@linaro.org/

Therefore I propose this should also go via hwmon tree.

Cc: Jean Delvare 
Cc: Guenter Roeck 
Cc: linux-hw...@vger.kernel.org
---
 drivers/gpu/drm/nouveau/nouveau_hwmon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_hwmon.c 
b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
index e844be49e11e..db30a4c2cd4d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_hwmon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
@@ -211,7 +211,7 @@ static const struct attribute_group 
temp1_auto_point_sensor_group = {
 
 #define N_ATTR_GROUPS   3
 
-static const struct hwmon_channel_info *nouveau_info[] = {
+static const struct hwmon_channel_info * const nouveau_info[] = {
HWMON_CHANNEL_INFO(chip,
   HWMON_C_UPDATE_INTERVAL),
HWMON_CHANNEL_INFO(temp,
-- 
2.34.1



Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton
Thanks, Ben.

On 30/01/2023 01:09, Ben Skeggs wrote:
> On Sat, 28 Jan 2023 at 21:29, Chris Clayton  wrote:
>>
>>
>>
>> On 28/01/2023 05:42, Linux kernel regression tracking (Thorsten Leemhuis) 
>> wrote:
>>> On 27.01.23 20:46, Chris Clayton wrote:
 [Resend because the mail client on my phone decided to turn HTML on behind 
 my back, so my reply got bounced.]

 Thanks Thorsten.

 I did try to revert but it didnt revert cleanly and I don't have the 
 knowledge to fix it up.

 The patch was part of a merge that included a number of related patches. 
 Tomorrow, I'll try to revert the lot and report
 back.
>>>
>>> You are free to do so, but there is no need for that from my side. I
>>> only wanted to know if a simple revert would do the trick; if it
>>> doesn't, it in my experience often is best to leave things to the
>>> developers of the code in question,
>>
>> Sound advice, Thorsten. Way to many conflicts for me to resolve.
> Hey,
> 
> This is a complete shot-in-the-dark, as I don't see this behaviour on
> *any* of my boards.  Could you try the attached patch please?

Unfortunately, the patch made no difference.

I've been looking at how the graphics on my laptop is set up, and have a bit of 
a worry about whether the firmware might
be playing a part in this problem. In order to offload video decoding to the 
NVidia TU117 GPU, it seems the scrubber
firmware must be available, but as far as I know, that has not been released by
NVidia. To get it to work, I followed what Ubuntu have done, and the scrubber in
/lib/firmware/nvidia/tu117/nvdec/ is a symlink to ../../tu116/nvdev/scrubber.bin.
That, of course, means that some of the firmware being loaded is for a
different card. I note that firmware-related processing is being changed in the
patch. Might my setup be at the root of my problem?

I'll have a fiddle and see what I can work out.

Chris

> 
> Thanks,
> Ben.
> 
>>
>> as they know it best and thus have a
>>> better idea which hidden side effect a more complex revert might have.
>>>
>>> Ciao, Thorsten
>>>
 On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) 
 wrote:
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> @nouveau-maintainers, did anyone take a look at this? The report is
> already 8 days old and I don't see a single reply. Sure, we'll likely
> get a -rc8, but still it would be good to not fix this on the finish line.
>
> Chris, btw, did you try if you can revert the commit on top of latest
> mainline? And if so, does it fix the problem?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> wrote:
>> [adding various lists and the two other nouveau maintainers to the list
>> of recipients]
>
>> On 18.01.23 21:59, Chris Clayton wrote:
>>> Hi.
>>>
>>> I built and installed the latest development kernel earlier this week.
>>> I've found that when I try to shut the laptop down (or
>>> reboot it), it hangs right at the end of closing the current session.
>>> The last line I see on the screen when rebooting is:
>>>
>>>   sd 4:0:0:0: [sda] Synchronising SCSI cache
>>>
>>> when closing down I see one additional line:
>>>
>>>   sd 4:0:0:0: [sda] Stopping disk
>>>
>>> In both cases the machine then hangs and I have to hold down the power 
>>> button fot a few seconds to switch it off.
>>>
>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
>>> landed on:
>>>
>>>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
>>> drm/nouveau/flcn: new code to load+boot simple HS FWs
>>> (VPR scrubber)
>>>
>>> I built and installed a kernel with 
>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit) 
>>> checked out
>>> and that shuts down and reboots fine. It the did the same with the bad 
>>> commit checked out and that does indeed hang, so
>>> I'm confident the bisect outcome is OK.
>>>
>>> Kernels 6.1.6 and 5.15.88 are also OK.
>>>
>>> My system has dual GPUs - one Intel and one NVidia. Related extracts
>>> from 'lspci -v' are:
>>>
>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 
>>> [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
>>> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>>>
>>> Flags: bus master, fast devsel, latency 0, IRQ 142
>>>
>>> Memory at c200 

[Nouveau] [PATCH 2/2] drm/nouveau/clk: avoid usage of list iterator after loop

2023-05-04 Thread Jakob Koschel
If no valid element is found, 'pstate' would contain an
invalid pointer past the iterator loop. To ensure 'pstate' is always
valid, we only set it if the correct element was found. That allows
adding a BUG_ON in case the code works incorrectly, exposing currently
undetectable potential bugs.

Additionally, Linus proposed to avoid any use of the list iterator
variable after the loop, in an attempt to move the list iterator
variable declaration into the macro to avoid any potential misuse after
the loop [1].

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c
index da07a2fbef06..871127dfe1d7 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c
@@ -269,14 +269,17 @@ nvkm_pstate_prog(struct nvkm_clk *clk, int pstatei)
struct nvkm_subdev *subdev = &clk->subdev;
struct nvkm_fb *fb = subdev->device->fb;
struct nvkm_pci *pci = subdev->device->pci;
-   struct nvkm_pstate *pstate;
+   struct nvkm_pstate *pstate = NULL, *iter;
int ret, idx = 0;
 
-   list_for_each_entry(pstate, &clk->states, head) {
-   if (idx++ == pstatei)
+   list_for_each_entry(iter, &clk->states, head) {
+   if (idx++ == pstatei) {
+   pstate = iter;
break;
+   }
}
 
+   BUG_ON(!pstate);
nvkm_debug(subdev, "setting performance state %d\n", pstatei);
clk->pstate = pstatei;
 

-- 
2.34.1



Re: [Nouveau] [PATCH v2] mm: Take a page reference when removing device exclusive entries

2023-05-04 Thread David Hildenbrand

On 30.03.23 03:25, Alistair Popple wrote:

Device exclusive page table entries are used to prevent CPU access to
a page whilst it is being accessed from a device. Typically this is
used to implement atomic operations when the underlying bus does not
support atomic access. When a CPU thread encounters a device exclusive
entry it locks the page and restores the original entry after calling
mmu notifiers to signal drivers that exclusive access is no longer
available.

The device exclusive entry holds a reference to the page making it
safe to access the struct page whilst the entry is present. However
the fault handling code does not hold the PTL when taking the page
lock. This means if there are multiple threads faulting concurrently
on the device exclusive entry one will remove the entry whilst others
will wait on the page lock without holding a reference.

This can lead to threads locking or waiting on a folio with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via
munmap() they may still be freed by a migration. This leads to
warnings such as PAGE_FLAGS_CHECK_AT_FREE due to the page being locked
when the refcount drops to zero.

Fix this by trying to take a reference on the folio before locking
it. The code already checks the PTE under the PTL and aborts if the
entry is no longer there. It is also possible the folio has been
unmapped, freed and re-allocated allowing a reference to be taken on
an unrelated folio. This case is also detected by the PTE check and
the folio is unlocked without further changes.

Signed-off-by: Alistair Popple 
Reviewed-by: Ralph Campbell 
Reviewed-by: John Hubbard 
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Cc: sta...@vger.kernel.org


Acked-by: David Hildenbrand 

--
Thanks,

David / dhildenb



Re: [Nouveau] [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-05-04 Thread Boris Brezillon
On Thu, 19 Jan 2023 04:58:48 +
Matthew Brost  wrote:

> > For the ops structures the drm_gpuva_manager allocates for reporting the
> > split/merge steps back to the driver I have ideas to entirely avoid
> > allocations, which also is a good thing in respect of Christians feedback
> > regarding the huge amount of mapping requests some applications seem to
> > generate.
> >  
> 
> It should be fine to have allocations to report the split/merge step as
> this step should be before a dma-fence is published, but yea if possible
> to avoid extra allocs as that is always better.
> 
> Also BTW, great work on drm_gpuva_manager too. We will almost likely
> pick this up in Xe rather than open coding all of this as we currently
> do. We should probably start the port to this soon so we can contribute
> to the implementation and get both of our drivers upstream sooner.

Also quite interested in using this drm_gpuva_manager for pancsf, since
I've been open-coding something similar. Didn't have the
gpuva_region concept to make sure VA mapping/unmapping requests
don't go outside a pre-reserved region, but it seems to automate some
of the stuff I've been doing quite nicely.


Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton



On 18/02/2023 12:25, Karol Herbst wrote:
> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton  
> wrote:
>>
>>
>>
>> On 15/02/2023 11:09, Karol Herbst wrote:
>>> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
>>> (Thorsten Leemhuis)  wrote:

 On 13.02.23 10:14, Chris Clayton wrote:
> On 13/02/2023 02:57, Dave Airlie wrote:
>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  
>> wrote:
>>>
>>>
>>>
>>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) 
>>> wrote:
 On 10.02.23 20:01, Karol Herbst wrote:
> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>>
>> On 08.02.23 09:48, Chris Clayton wrote:
>>>
>>> I'm assuming  that we are not going to see a fix for this 
>>> regression before 6.2 is released.
>>
>> Yeah, looks like it. That's unfortunate, but happens. But there is 
>> still
>> time to fix it and there is one thing I wonder:
>>
>> Did any of the nouveau developers look at the netconsole captures 
>> Chris
>> posted more than a week ago to check if they somehow help to track 
>> down
>> the root of this problem?
>
> I did now and I can't spot anything. I think at this point it would
> make sense to dump the active tasks/threads via sqsrq keys to see if
> any is in a weird state preventing the machine from shutting down.

 Many thx for looking into it!
>>>
>>> Yes, thanks Karol.
>>>
>>> Attached is the output from dmesg when this block of code:
>>>
>>> /bin/mount /dev/sda7 /mnt/sda7
>>> /bin/mountpoint /proc || /bin/mount /proc
>>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>>> /bin/echo t > /proc/sysrq-trigger
>>> /bin/sleep 1
>>> /bin/sync
>>> /bin/sleep 1
>>> kill $(pidof dmesg)
>>> /bin/umount /mnt/sda7
>>>
>>> is executed immediately before /sbin/reboot is called as the final step 
>>> of rebooting my system.
>>>
>>> I hope this is what you were looking for, but if not, please let me 
>>> know what you need
>
> Thanks Dave. [...]
 FWIW, in case anyone strands here in the archives: the msg was
 truncated. The full post can be found in a new thread:

 https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/

 Sadly it seems the info "With runpm=0, both reboot and poweroff work on
 my laptop." didn't bring us much further to a solution. :-/ I don't
 really like it, but for regression tracking I'm now putting this on the
 back-burner, as a fix is not in sight.

 #regzbot monitor:
 https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
 #regzbot backburner: hard to debug and apparently rare
 #regzbot ignore-activity

>>>
>>> yeah.. this bug looks a little annoying. Sadly the only Turing based
>>> laptop I got doesn't work on Nouveau because of firmware related
>>> issues and we probably need to get updated ones from Nvidia here :(
>>>
>>> But it's a bit weird that the kernel doesn't shutdown, because I don't
>>> see anything in the logs which would prevent that from happening.
>>> Unless it's waiting on one of the tasks to complete, but none of them
>>> looked in any way nouveau related.
>>>
>>> If somebody else has any fancy kernel debugging tips here to figure
>>> out why it hangs, that would be very helpful...
>>>
>>
>> I think I've figured this out. It's to do with how my system is configured. 
>> I do have an initrd, but the only thing on
>> it is the cpu microcode which, it is recommended, should be loaded early. 
>> The absence of the NVidia firmare from an
>> initrd doesn't matter because the drivers for the hardware that need to load 
>> firmware are all built as modules, So, by
>> the time the devices are configured via udev, the root partition is mounted 
>> and the drivers can get at the firmware.
>>
>> I've found, by turning on nouveau debug and taking a video of the screen as 
>> the system shuts down, that nouveau seems to
>> be trying to run the scrubber very very late in the shutdown process. The 
>> problem is that by this time, I think the root
>> partition, and thus the scrubber binary, have become inaccessible.
>>
>> I seem to have two choices - either make the firmware accessible on an 
>> initrd or unload the module in a shutdown script
>> before the scrubber binary becomes inaccessible. The latter of these is the 
>> workaround I have implemented whilst the
>> problem I reported has been under investigation. For simplicity, I think 
>> I'll promote my workaround to being the
>> permanent solution.
>>
>> So, apologies (and thanks) to everyone whose time I have taken up with this 
>> non-bug.
>>
> 
> Well.. nouveau 

Re: [Nouveau] [PATCH v2 05/10] iommufd: Use GFP_KERNEL_ACCOUNT for iommu_map()

2023-05-04 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, January 19, 2023 2:01 AM
> 
> iommufd follows the same design as KVM and uses memory cgroups to limit
> the amount of kernel memory a iommufd file descriptor can pin down. The
> various internal data structures already use GFP_KERNEL_ACCOUNT.
> 
> However, one of the biggest consumers of kernel memory is the IOPTEs
> stored under the iommu_domain. Many drivers will allocate these at
> iommu_map() time and will trivially do the right thing if we pass in
> GFP_KERNEL_ACCOUNT.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


Re: [Nouveau] [PATCH] drm/nouveau: Adding support to control backlight using bl_power for nva3.

2023-05-04 Thread Bagas Sanjaya
On Sat, Oct 29, 2022 at 03:48:50PM -0300, antoniospg wrote:
> Test plan:
> 
> * Turn off:
> echo 1 > /sys/class/backlight/nv_backlight/bl_power
> 
> * Turn on:
> echo 0 > /sys/class/backlight/nv_backlight/bl_power
> 

You sent this patch twice, so I reply to the latest one.

What is it doing? Please describe the patch. Remember to write the
description in imperative mood.

-- 
An old man doll... just what I always wanted! - Clara


signature.asc
Description: PGP signature


Re: [Nouveau] [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings

2023-05-04 Thread Liam R. Howlett
* Danilo Krummrich  [230217 08:45]:
> Add infrastructure to keep track of GPU virtual address (VA) mappings
> with a dedicated VA space manager implementation.
> 
> New UAPIs, motivated by the Vulkan sparse memory bindings that graphics
> drivers are starting to implement, allow userspace applications to request
> multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> intended to serve the following purposes in this context.
> 
> 1) Provide infrastructure to track GPU VA allocations and mappings,
>making use of the maple_tree.
> 
> 2) Generically connect GPU VA mappings to their backing buffers, in
>particular DRM GEM objects.
> 
> 3) Provide a common implementation to perform more complex mapping
>operations on the GPU VA space. In particular splitting and merging
>of GPU VA mappings, e.g. for intersecting mapping requests or partial
>unmap requests.
> 
> Suggested-by: Dave Airlie 
> Signed-off-by: Danilo Krummrich 
> ---
>  Documentation/gpu/drm-mm.rst|   31 +
>  drivers/gpu/drm/Makefile|1 +
>  drivers/gpu/drm/drm_gem.c   |3 +
>  drivers/gpu/drm/drm_gpuva_mgr.c | 1704 +++
>  include/drm/drm_drv.h   |6 +
>  include/drm/drm_gem.h   |   75 ++
>  include/drm/drm_gpuva_mgr.h |  714 +
>  7 files changed, 2534 insertions(+)
>  create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>  create mode 100644 include/drm/drm_gpuva_mgr.h
> 
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index a52e6f4117d6..c9f120cfe730 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>  .. kernel-doc:: drivers/gpu/drm/drm_mm.c
> :export:
>  
...

> +
> +/**
> + * drm_gpuva_remove_iter - removes the iterator's current element
> + * @it: the &drm_gpuva_iterator
> + *
> + * This removes the element the iterator currently points to.
> + */
> +void
> +drm_gpuva_iter_remove(struct drm_gpuva_iterator *it)
> +{
> + mas_erase(&it->mas);
> +}
> +EXPORT_SYMBOL(drm_gpuva_iter_remove);
> +
> +/**
> + * drm_gpuva_insert - insert a &drm_gpuva
> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> + * @va: the &drm_gpuva to insert
> + * @addr: the start address of the GPU VA
> + * @range: the range of the GPU VA
> + *
> + * Insert a &drm_gpuva with a given address and range into a
> + * &drm_gpuva_manager.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
> +  struct drm_gpuva *va)
> +{
> + u64 addr = va->va.addr;
> + u64 range = va->va.range;
> + MA_STATE(mas, &mgr->va_mt, addr, addr + range - 1);
> + struct drm_gpuva_region *reg = NULL;
> + int ret;
> +
> + if (unlikely(!drm_gpuva_in_mm_range(mgr, addr, range)))
> + return -EINVAL;
> +
> + if (unlikely(drm_gpuva_in_kernel_region(mgr, addr, range)))
> + return -EINVAL;
> +
> + if (mgr->flags & DRM_GPUVA_MANAGER_REGIONS) {
> + reg = drm_gpuva_in_region(mgr, addr, range);
> + if (unlikely(!reg))
> + return -EINVAL;
> + }
> +

-

> + if (unlikely(drm_gpuva_find_first(mgr, addr, range)))
> + return -EEXIST;
> +
> + ret = mas_store_gfp(&mas, va, GFP_KERNEL);

mas_walk() will set the internal maple state to the limits to what it
finds.  So, instead of an iterator, you can use the walk function and
ensure there is a large enough area in the existing NULL:

/*
 * Nothing at addr, mas now points to the location where the store would
 * happen
 */
if (mas_walk(&mas))
return -EEXIST;

/* The NULL entry ends at mas.last, make sure there is room */
if (mas.last < (addr + range - 1))
return -EEXIST;

/* Limit the store size to the correct end address, and store */
 mas.last = addr + range - 1;
 ret = mas_store_gfp(&mas, va, GFP_KERNEL);

> + if (unlikely(ret))
> + return ret;
> +
> + va->mgr = mgr;
> + va->region = reg;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_gpuva_insert);
> +
> +/**
> + * drm_gpuva_remove - remove a &drm_gpuva
> + * @va: the &drm_gpuva to remove
> + *
> + * This removes the given &drm_gpuva from the underlying tree.
> + */
> +void
> +drm_gpuva_remove(struct drm_gpuva *va)
> +{
> + MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
> +
> + mas_erase(&mas);
> +}
> +EXPORT_SYMBOL(drm_gpuva_remove);
> +
...

> +/**
> + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuvas address
> + * @range: the &drm_gpuvas range
> + *
> + * Returns: the first &drm_gpuva within the given range
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
> +  u64 addr, u64 range)
> +{
> + MA_STATE(mas, &mgr->va_mt, addr, 0);
> +
> + return mas_find(&mas, addr + range - 1);
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_first);
> +
> +/**
> + * drm_gpuva_find 

Re: [Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-05-04 Thread Diogo Ivo
On Sat, Jan 14, 2023 at 04:27:38AM +0100, Karol Herbst wrote:
> I tried to look into it, but my Jetson Nano just constantly behaves
> in very strange ways. I tried to compile and install a 6.1 kernel onto
> it, but any kernel just refuses to boot and I have no idea what's up
> with that device. The kernel starts to boot and it just stops in the
> middle. From what I can tell is that most of the tegra devices never
> worked reliably in the first place and there are a couple of random
> and strange bugs around. I've attached my dmesg, so if anybody has any
> clues why the kernel just stops doing anything, it would really help
> me.

Hello,

Thank you for looking into this! I have seen this type of hang in
mainline on this SoC, and it was due to a reset not being deasserted.
Would you mind getting a log with initcall_debug enabled to pinpoint
where the hang occurs? I would be happy to help if I can.

> But maybe it would be for the best to just pull tegra support out of
> nouveau, because in the current situation we really can't spare much
> time dealing with them and we are already busy enough just dealing
> with the desktop GPUs. And the firmware we got from Nvidia is so
> ancient and different from the desktop GPU ones, that without actually
> having all those boards available and properly tested, we can't be
> sure to not break them.
> 
> And afaik there are almost no _actual_ users, just distribution folks
> wanting to claim "support" for those devices, but then ending up using
> Nvidia's out of tree Tegra driver in deployments anyway.

> If there are actual users using them for their daily life, I'd like to
> know, because I'm aware of none.

For what it's worth, I consider myself a user of nouveau. Granted, I'm
using it as a hobby project, but in its current state it is not far from
a usable desktop experience on the Pixel C.

Diogo


[Nouveau] [PATCH v3 2/4] drm/amdkfd: Use cursor start instead of ttm resource start

2023-05-04 Thread Somalapuram Amaranath
Clean up the PAGE_SHIFT operation and replace
ttm_resource resource->start with the cursor start
obtained via the amdgpu_res_first() API.
v1 -> v2: reorder patch sequence
v2 -> v3: address review comments on v2

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c06ada0844ba..9114393d2ee6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -200,8 +200,12 @@ static int add_queue_mes(struct device_queue_manager *dqm, 
struct queue *q,
queue_input.wptr_addr = (uint64_t)q->properties.write_ptr;
 
if (q->wptr_bo) {
+   struct amdgpu_res_cursor cursor;
+
wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE 
- 1);
-   queue_input.wptr_mc_addr = 
((uint64_t)q->wptr_bo->tbo.resource->start << PAGE_SHIFT) + wptr_addr_off;
+   amdgpu_res_first(q->wptr_bo->tbo.resource, 0,
+q->wptr_bo->tbo.resource->size, );
+   queue_input.wptr_mc_addr = cursor.start + wptr_addr_off;
}
 
queue_input.is_kfd_process = 1;
-- 
2.32.0



Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton



On 28/01/2023 05:42, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
> On 27.01.23 20:46, Chris Clayton wrote:
>> [Resend because the mail client on my phone decided to turn HTML on behind 
>> my back, so my reply got bounced.]
>>
>> Thanks Thorsten.
>>
>> I did try to revert but it didn't revert cleanly and I don't have the 
>> knowledge to fix it up.
>>
>> The patch was part of a merge that included a number of related patches. 
>> Tomorrow, I'll try to revert the lot and report
>> back.
> 
> You are free to do so, but there is no need for that from my side. I
> only wanted to know if a simple revert would do the trick; if it
> doesn't, it in my experience often is best to leave things to the
> developers of the code in question, 

Sound advice, Thorsten. Way too many conflicts for me to resolve.

> as they know it best and thus have a
> better idea which hidden side effect a more complex revert might have.
> 
> Ciao, Thorsten
> 
>> On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) 
>> wrote:
>>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>>> to make this easily accessible to everyone.
>>>
>>> @nouveau-maintainers, did anyone take a look at this? The report is
>>> already 8 days old and I don't see a single reply. Sure, we'll likely
>>> get a -rc8, but still it would be good to not fix this on the finish line.
>>>
>>> Chris, btw, did you try if you can revert the commit on top of latest
>>> mainline? And if so, does it fix the problem?
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>> --
>>> Everything you wanna know about Linux kernel regression tracking:
>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>> If I did something stupid, please tell me, as explained on that page.
>>>
>>> #regzbot poke
>>>
>>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
>>> wrote:
 [adding various lists and the two other nouveau maintainers to the list
 of recipients]
>>>
 On 18.01.23 21:59, Chris Clayton wrote:
> Hi.
>
> I built and installed the latest development kernel earlier this week. 
> I've found that when I try to shut the laptop down (or
> reboot it), it hangs right at the end of closing the current session. The 
> last line I see on the screen when rebooting is:
>
>   sd 4:0:0:0: [sda] Synchronising SCSI cache
>
> when closing down I see one additional line:
>
>   sd 4:0:0:0 [sda]Stopping disk
>
> In both cases the machine then hangs and I have to hold down the power 
> button for a few seconds to switch it off.
>
> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and 
> landed on:
>
>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
> drm/nouveau/flcn: new code to load+boot simple HS FWs
> (VPR scrubber)
>
> I built and installed a kernel with 
> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit) 
> checked out
> and that shuts down and reboots fine. I then did the same with the bad 
> commit checked out and that does indeed hang, so
> I'm confident the bisect outcome is OK.
>
> Kernels 6.1.6 and 5.15.88 are also OK.
>
> My system has dual GPUs - one Intel and one NVidia. Related extracts from 
> 'lspci -v' are:
>
> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
> Graphics] (rev 05) (prog-if 00 [VGA controller])
> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
>
> Flags: bus master, fast devsel, latency 0, IRQ 142
>
> Memory at c200 (64-bit, non-prefetchable) [size=16M]
>
> Memory at a000 (64-bit, prefetchable) [size=256M]
>
> I/O ports at 5000 [size=64]
>
> Expansion ROM at 000c [virtual] [disabled] [size=128K]
>
> Capabilities: [40] Vendor Specific Information: Len=0c 
>
> Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 
> 00
>
> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>
> Capabilities: [d0] Power Management version 2
>
> Kernel driver in use: i915
>
> Kernel modules: i915
>
>
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 
> 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
> controller])
> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti 
> Mobile]
> Flags: bus master, fast devsel, latency 0, IRQ 141
> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> Memory at b000 (64-bit, prefetchable) [size=256M]
> Memory at c000 (64-bit, prefetchable) [size=32M]
> I/O ports at 4000 [size=128]
> Expansion ROM at c300 [disabled] [size=512K]
>

[Nouveau] [PATCH 5/6] drm/ttm: Change the meaning of the fields in the drm_mm_nodes structure from pfn to bytes

2023-05-04 Thread Somalapuram Amaranath
Change the ttm_range_man_alloc() allocation from pages to a size in bytes.
Fix the dependent drm_mm_node start and size from pages to bytes.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/i915/i915_scatterlist.c |  6 +++---
 drivers/gpu/drm/ttm/ttm_range_manager.c | 15 +++
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c 
b/drivers/gpu/drm/i915/i915_scatterlist.c
index 756289e43dff..7defda1219d0 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.c
+++ b/drivers/gpu/drm/i915/i915_scatterlist.c
@@ -94,7 +94,7 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct 
drm_mm_node *node,
if (!rsgt)
return ERR_PTR(-ENOMEM);
 
-   i915_refct_sgt_init(rsgt, node->size << PAGE_SHIFT);
+   i915_refct_sgt_init(rsgt, node->size);
st = >table;
/* restricted by sg_alloc_table */
if (WARN_ON(overflows_type(DIV_ROUND_UP_ULL(node->size, segment_pages),
@@ -110,8 +110,8 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct 
drm_mm_node *node,
sg = st->sgl;
st->nents = 0;
prev_end = (resource_size_t)-1;
-   block_size = node->size << PAGE_SHIFT;
-   offset = node->start << PAGE_SHIFT;
+   block_size = node->size;
+   offset = node->start;
 
while (block_size) {
u64 len;
diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c
index 62fddcc59f02..ff9962f7f81d 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -83,9 +83,10 @@ static int ttm_range_man_alloc(struct ttm_resource_manager 
*man,
 
spin_lock(>lock);
ret = drm_mm_insert_node_in_range(mm, >mm_nodes[0],
- PFN_UP(node->base.size),
- bo->page_alignment, 0,
- place->fpfn, lpfn, mode);
+ node->base.size,
+ bo->page_alignment << PAGE_SHIFT, 0,
+ place->fpfn << PAGE_SHIFT,
+ lpfn << PAGE_SHIFT, mode);
spin_unlock(>lock);
 
if (unlikely(ret)) {
@@ -119,11 +120,10 @@ static bool ttm_range_man_intersects(struct 
ttm_resource_manager *man,
 size_t size)
 {
struct drm_mm_node *node = _ttm_range_mgr_node(res)->mm_nodes[0];
-   u32 num_pages = PFN_UP(size);
 
/* Don't evict BOs outside of the requested placement range */
-   if (place->fpfn >= (node->start + num_pages) ||
-   (place->lpfn && place->lpfn <= node->start))
+   if ((place->fpfn << PAGE_SHIFT) >= (node->start + size) ||
+   (place->lpfn && (place->lpfn << PAGE_SHIFT) <= node->start))
return false;
 
return true;
@@ -135,10 +135,9 @@ static bool ttm_range_man_compatible(struct 
ttm_resource_manager *man,
 size_t size)
 {
struct drm_mm_node *node = _ttm_range_mgr_node(res)->mm_nodes[0];
-   u32 num_pages = PFN_UP(size);
 
if (node->start < place->fpfn ||
-   (place->lpfn && (node->start + num_pages) > place->lpfn))
+   (place->lpfn && (node->start + size) > place->lpfn << PAGE_SHIFT))
return false;
 
return true;
-- 
2.32.0



[Nouveau] [PATCH v4 4/4] drm/amdgpu: Cleanup PAGE_SHIFT operation

2023-05-04 Thread Somalapuram Amaranath
Cleaning up page shift operations.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index a97e8236bde9..ffe6a1ab7f9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -930,7 +930,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
 
addr = amdgpu_gmc_agp_addr(bo);
if (addr != AMDGPU_BO_INVALID_OFFSET) {
-   bo->resource->start = addr >> PAGE_SHIFT;
+   bo->resource->start = addr;
return 0;
}
 
-- 
2.32.0



Re: [Nouveau] [PATCH] drm/gem: Expose the buffer object handle to userspace last

2023-05-04 Thread Steven Price
On 14/02/2023 12:50, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Currently drm_gem_handle_create_tail exposes the handle to userspace
> before the buffer object construction is complete. Allowing userspace
> to work against a partially constructed object, which may also be in
> the process of having its creation fail, can have a range of negative
> outcomes.
> 
> A lot of those will depend on what the individual drivers are doing in
> their obj->funcs->open() callbacks, and also with a common failure mode
> being -ENOMEM from drm_vma_node_allow.
> 
> We can make sure none of this can happen by allocating a handle last,
> although with a downside that more of the function now runs under the
> dev->object_name_lock.
> 
> Looking into the individual drivers open() hooks, we have
> amdgpu_gem_object_open which seems like it could have a potential security
> issue without this change.
> 
> A couple drivers like qxl_gem_object_open and vmw_gem_object_open
> implement no-op hooks so no impact for them.
> 
> A bunch of others require a deeper look by individual owners to assess the
> impact. Those are lima_gem_object_open, nouveau_gem_object_open,
> panfrost_gem_open, radeon_gem_object_open and virtio_gpu_gem_object_open.

I've looked over the panfrost code, and I can't see how this could
create a security hole there. It looks like there's a path which can
confuse the shrinker (so objects might not be purged when they could
be[1]) but they would be freed properly in the normal path - so no worse
than user space could already do.

[1] gpu_usecount is incremented in panfrost_lookup_bos() per bo, but not
decremented on failure.

> Putting aside the risk assessment of the above, some common scenarios to
> think about are along these lines:
> 
> 1)
> Userspace closes a handle by speculatively "guessing" it from a second
> thread.
> 
> This results in an unreachable buffer object so, a memory leak.
> 
> 2)
> Same as 1), but object is in the process of getting closed (failed
> creation).
> 
> The second thread is then able to re-cycle the handle, and idr_remove in
> the first thread would then remove a handle it does not own from the
> idr.

This, however, looks plausible - and I can see how this could
potentially trigger a security hole in user space.

> 3)
> Going back to the earlier per driver problem space - individual impact
> assessment of allowing a second thread to access and operate on a partially
> constructed handle / object. (Can something crash? Leak information?)
> 
> In terms of identifying when the problem started I will tag some patches
> as references, but not all, if even any, of them actually point to a
> broken state. I am just identifying points at which more opportunity for
> issues to arise was added.
> 
> References: 304eda32920b ("drm/gem: add hooks to notify driver when object 
> handle is created/destroyed")
> References: ca481c9b2a3a ("drm/gem: implement vma access management")
> References: b39b5394fabc ("drm/gem: Add drm_gem_object_funcs")
> Cc: dri-de...@lists.freedesktop.org
> Cc: Rob Clark 
> Cc: Ben Skeggs 
> Cc: David Herrmann 
> Cc: Noralf Trønnes 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: amd-...@lists.freedesktop.org
> Cc: l...@lists.freedesktop.org
> Cc: nouveau@lists.freedesktop.org
> Cc: Steven Price 
> Cc: virtualizat...@lists.linux-foundation.org
> Cc: spice-de...@lists.freedesktop.org
> Cc: Zack Rusin 

FWIW I think this makes the code easier to reason about, so

Reviewed-by: Steven Price 

> ---
>  drivers/gpu/drm/drm_gem.c | 48 +++
>  1 file changed, 24 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index aa15c52ae182..e3d897bca0f2 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -356,52 +356,52 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
>  u32 *handlep)
>  {
>   struct drm_device *dev = obj->dev;
> - u32 handle;
>   int ret;
>  
>   WARN_ON(!mutex_is_locked(>object_name_lock));
>   if (obj->handle_count++ == 0)
>   drm_gem_object_get(obj);
>  
> + ret = drm_vma_node_allow(>vma_node, file_priv);
> + if (ret)
> + goto err_put;
> +
> + if (obj->funcs->open) {
> + ret = obj->funcs->open(obj, file_priv);
> + if (ret)
> + goto err_revoke;
> + }
> +
>   /*
> -  * Get the user-visible handle using idr.  Preload and perform
> -  * allocation under our spinlock.
> +  * Get the user-visible handle using idr as the _last_ step.
> +  * Preload and perform allocation under our spinlock.
>*/
>   idr_preload(GFP_KERNEL);
>   spin_lock(_priv->table_lock);
> -
>   ret = idr_alloc(_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
> -
>   spin_unlock(_priv->table_lock);
>   idr_preload_end();
>  
> - mutex_unlock(>object_name_lock);
>   if (ret < 0)
> - 

Re: [Nouveau] [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro

2023-05-04 Thread Liam R. Howlett
* Danilo Krummrich  [230217 08:44]:
> Split up the MA_STATE() macro such that components using the maple tree
> can easily inherit from struct ma_state and build custom tree walk
> macros to hide their internals from users.
> 
> Example:
> 
> struct sample_iter {
>   struct ma_state mas;
>   struct sample_mgr *mgr;
>   struct sample_entry *entry;
> };
> 
> \#define SAMPLE_ITER(name, __mgr) \
>   struct sample_iter name = { \
>   .mas = __MA_STATE(&(__mgr)->mt, 0, 0),
>   .mgr = __mgr,
>   .entry = NULL,
>   }

I see this patch is to allow for anonymous maple states, this looks
good.

I've a lengthy comment about the iterator that I'm adding here to head
off anyone that may copy your example below.

> 
> \#define sample_iter_for_each_range(it__, start__, end__) \
>   for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, 
> end__ - 1); \
>(it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))

I see you've added something like the above in your patch set as well.
I'd like to point out that the index isn't the only state information
that needs to be altered here, and in fact, this could go very wrong.

The maple state has a node and an offset within that node.  If you set
the index to lower than the current position of your iterator and call
mas_find() then what happens is somewhat undefined.  I expect you will
get the wrong value (most likely either the current value or the very
next one that the iterator is already pointing to).  I believe you have
been using a fresh maple state for each iterator in your patches, but I
haven't had a deep look into your code yet.

We have methods of resetting the iterator and setting the range
(mas_set() and mas_set_range()) which are safe for what you are doing,
but they will start the walk from the root node to the index again.

So, if you know what you are doing is safe, then the way you have
written it will work, but it's worth mentioning that this could occur.

It is also worth pointing out that it would be much safer to use a
function to do the above so you get type safety.. and I was asked to add
this to the VMA interface by Linus [1], which is on its way upstream [2].

1. 
https://lore.kernel.org/linux-mm/CAHk-=wg9wqxbgkndkd2bqocnn73rdswuwsavbb7t-tekyke...@mail.gmail.com/
2. 
https://lore.kernel.org/linux-mm/20230120162650.984577-1-liam.howl...@oracle.com/

> 
> Signed-off-by: Danilo Krummrich 
> ---
>  include/linux/maple_tree.h | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index e594db58a0f1..ca04c900e51a 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -424,8 +424,8 @@ struct ma_wr_state {
>  #define MA_ERROR(err) \
>   ((struct maple_enode *)(((unsigned long)err << 2) | 2UL))
>  
> -#define MA_STATE(name, mt, first, end)   
> \
> - struct ma_state name = {\
> +#define __MA_STATE(mt, first, end)   \
> + {   \
>   .tree = mt, \
>   .index = first, \
>   .last = end,\
> @@ -435,6 +435,9 @@ struct ma_wr_state {
>   .alloc = NULL,  \
>   }
>  
> +#define MA_STATE(name, mt, first, end)   
> \
> + struct ma_state name = __MA_STATE(mt, first, end)
> +
>  #define MA_WR_STATE(name, ma_state, wr_entry)
> \
>   struct ma_wr_state name = { \
>   .mas = ma_state,\
> -- 
> 2.39.1
> 


[Nouveau] [PATCH v2 2/4] drm/amdkfd: Use cursor start instead of ttm resource start

2023-05-04 Thread Somalapuram Amaranath
Clean up the PAGE_SHIFT operation and replace
ttm_resource resource->start with the cursor start
obtained via the amdgpu_res_first() API.
v1 -> v2: reorder patch sequence

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c06ada0844ba..f87ce4f1cb93 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -200,8 +200,11 @@ static int add_queue_mes(struct device_queue_manager *dqm, 
struct queue *q,
queue_input.wptr_addr = (uint64_t)q->properties.write_ptr;
 
if (q->wptr_bo) {
+   struct amdgpu_res_cursor cursor;
wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE 
- 1);
-   queue_input.wptr_mc_addr = 
((uint64_t)q->wptr_bo->tbo.resource->start << PAGE_SHIFT) + wptr_addr_off;
+   amdgpu_res_first(q->wptr_bo->tbo.resource, 0,
+q->wptr_bo->tbo.resource->size, );
+   queue_input.wptr_mc_addr = cursor.start + wptr_addr_off;
}
 
queue_input.is_kfd_process = 1;
-- 
2.32.0



[Nouveau] [PATCH v3 2/3] drm/fb-helper: Set framebuffer for vga-switcheroo clients

2023-05-04 Thread Thomas Zimmermann
Set the framebuffer info for drivers that support VGA switcheroo. Only
affects the amdgpu and nouveau drivers, which use VGA switcheroo and
generic fbdev emulation. For other drivers, this does nothing.

This fixes a potential regression in the console code. Both amdgpu and
nouveau invoked vga_switcheroo_client_fb_set() from their internal fbdev
code. But the call got lost when the drivers switched to the generic
emulation.

Fixes: 087451f372bf ("drm/amdgpu: use generic fb helpers instead of setting up 
AMD own's.")
Fixes: 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers")
Signed-off-by: Thomas Zimmermann 
Reviewed-by: Daniel Vetter 
Reviewed-by: Alex Deucher 
Cc: Ben Skeggs 
Cc: Karol Herbst 
Cc: Lyude Paul 
Cc: Thomas Zimmermann 
Cc: Javier Martinez Canillas 
Cc: Laurent Pinchart 
Cc: Jani Nikula 
Cc: Dave Airlie 
Cc: Evan Quan 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Hawking Zhang 
Cc: Likun Gao 
Cc: "Christian König" 
Cc: Stanley Yang 
Cc: "Tianci.Yin" 
Cc: Xiaojian Du 
Cc: Andrey Grodzovsky 
Cc: YiPeng Chai 
Cc: Somalapuram Amaranath 
Cc: Bokun Zhang 
Cc: Guchun Chen 
Cc: Hamza Mahfooz 
Cc: Aurabindo Pillai 
Cc: Mario Limonciello 
Cc: Solomon Chiu 
Cc: Kai-Heng Feng 
Cc: Felix Kuehling 
Cc: Daniel Vetter 
Cc: "Marek Olšák" 
Cc: Sam Ravnborg 
Cc: Hans de Goede 
Cc: "Ville Syrjälä" 
Cc: dri-de...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc:  # v5.17+
---
 drivers/gpu/drm/drm_fb_helper.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 367fb8b2d5fa..c5c13e192b64 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -30,7 +30,9 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include 
+#include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1924,6 +1926,7 @@ static int drm_fb_helper_single_fb_probe(struct 
drm_fb_helper *fb_helper,
 int preferred_bpp)
 {
struct drm_client_dev *client = _helper->client;
+   struct drm_device *dev = fb_helper->dev;
struct drm_fb_helper_surface_size sizes;
int ret;
 
@@ -1945,6 +1948,11 @@ static int drm_fb_helper_single_fb_probe(struct 
drm_fb_helper *fb_helper,
return ret;
 
strcpy(fb_helper->fb->comm, "[fbcon]");
+
+   /* Set the fb info for vgaswitcheroo clients. Does nothing otherwise. */
+   if (dev_is_pci(dev->dev))
+   vga_switcheroo_client_fb_set(to_pci_dev(dev->dev), 
fb_helper->info);
+
return 0;
 }
 
-- 
2.39.0



Re: [Nouveau] [PATCH v2 06/10] iommu/intel: Add a gfp parameter to alloc_pgtable_page()

2023-05-04 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, January 19, 2023 2:01 AM
> 
> This is eventually called by iommufd through intel_iommu_map_pages() and
> it should not be forced to atomic. Push the GFP_ATOMIC to all callers.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


Re: [Nouveau] [PATCH v2 06/10] iommu/intel: Add a gfp parameter to alloc_pgtable_page()

2023-05-04 Thread Baolu Lu

On 2023/1/19 2:00, Jason Gunthorpe wrote:

This is eventually called by iommufd through intel_iommu_map_pages() and
it should not be forced to atomic. Push the GFP_ATOMIC to all callers.

Signed-off-by: Jason Gunthorpe


Reviewed-by: Lu Baolu 

Best regards,
baolu


[Nouveau] [PATCH v4 1/4] drm/gem: Remove BUG_ON in drm_gem_private_object_init

2023-05-04 Thread Somalapuram Amaranath
ttm_resource can now allocate a size in bytes, to support allocations smaller than a page.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/drm_gem.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 59a0bb5ebd85..ee8b5c2b6c60 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -152,8 +152,6 @@ EXPORT_SYMBOL(drm_gem_object_init);
 void drm_gem_private_object_init(struct drm_device *dev,
 struct drm_gem_object *obj, size_t size)
 {
-   BUG_ON((size & (PAGE_SIZE - 1)) != 0);
-
obj->dev = dev;
obj->filp = NULL;
 
-- 
2.32.0



[Nouveau] [PATCH -resend] drm/nouveau/kms/nv50- (gcc13): fix nv50_wndw_new_ prototype

2023-05-04 Thread Jiri Slaby (SUSE)
gcc-13 warns about mismatching types for enums. That revealed switched
arguments of nv50_wndw_new_():
  drivers/gpu/drm/nouveau/dispnv50/wndw.c:696:1: error: conflicting types for 
'nv50_wndw_new_' due to enum/integer mismatch; have 'int(const struct 
nv50_wndw_func *, struct drm_device *, enum drm_plane_type,  const char *, int, 
 const u32 *, u32,  enum nv50_disp_interlock_type,  u32,  struct nv50_wndw **)'
  drivers/gpu/drm/nouveau/dispnv50/wndw.h:36:5: note: previous declaration of 
'nv50_wndw_new_' with type 'int(const struct nv50_wndw_func *, struct 
drm_device *, enum drm_plane_type,  const char *, int,  const u32 *, enum 
nv50_disp_interlock_type,  u32,  u32,  struct nv50_wndw **)'

It can be hard to spot, but the declaration specifies the parameters
in the middle as:
  enum nv50_disp_interlock_type,
  u32 interlock_data,
  u32 heads,

While the definition orders them differently:
  u32 heads,
  enum nv50_disp_interlock_type interlock_type,
  u32 interlock_data,

Unify/fix the declaration to match the definition.

Cc: Martin Liska 
Cc: Ben Skeggs 
Cc: Karol Herbst 
Cc: Lyude Paul 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Jiri Slaby (SUSE) 
---

Notes:
[v2] switch to uint instead of to enum

 drivers/gpu/drm/nouveau/dispnv50/wndw.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.h 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.h
index 591c852f326b..76a6ae5d5652 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.h
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.h
@@ -35,8 +35,9 @@ struct nv50_wndw {
 
 int nv50_wndw_new_(const struct nv50_wndw_func *, struct drm_device *,
   enum drm_plane_type, const char *name, int index,
-  const u32 *format, enum nv50_disp_interlock_type,
-  u32 interlock_data, u32 heads, struct nv50_wndw **);
+  const u32 *format, u32 heads,
+  enum nv50_disp_interlock_type, u32 interlock_data,
+  struct nv50_wndw **);
 void nv50_wndw_flush_set(struct nv50_wndw *, u32 *interlock,
 struct nv50_wndw_atom *);
 void nv50_wndw_flush_clr(struct nv50_wndw *, u32 *interlock, bool flush,
-- 
2.39.0



[Nouveau] [PATCH linux-next] drm/nouveau/fifo: remove duplicated included chid.h

2023-05-04 Thread yang.yang29
From: Xu Panda 

The chid.h is included more than once.

Signed-off-by: Xu Panda 
Signed-off-by: Yang Yang 
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c
index b7c9d6115bce..b19a3612b62e 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c
@@ -24,7 +24,6 @@
 #include "chan.h"
 #include "chid.h"
 #include "cgrp.h"
-#include "chid.h"
 #include "runl.h"
 #include "priv.h"

-- 
2.15.2


[Nouveau] [PATCH 2/6] drm/amd: fix’s on ttm_resource rework to use size_t type

2023-05-04 Thread Somalapuram Amaranath
Fix up ttm_resource users, replacing num_pages with the size_t size field.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c| 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c   | 8 
 6 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 1f3302aebeff..44367f03316f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -144,7 +144,7 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
node->base.start = node->mm_nodes[0].start;
} else {
node->mm_nodes[0].start = 0;
-   node->mm_nodes[0].size = node->base.num_pages;
+   node->mm_nodes[0].size = PFN_UP(node->base.size);
node->base.start = AMDGPU_BO_INVALID_OFFSET;
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2e8f6cd7a729..e51f80bb1d07 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -542,6 +542,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
/* GWS and OA don't need any alignment. */
page_align = bp->byte_align;
size <<= PAGE_SHIFT;
+
} else if (bp->domain & AMDGPU_GEM_DOMAIN_GDS) {
/* Both size and alignment must be a multiple of 4. */
page_align = ALIGN(bp->byte_align, 4);
@@ -776,7 +777,7 @@ int amdgpu_bo_kmap(struct amdgpu_bo *bo, void **ptr)
return 0;
}
 
-   r = ttm_bo_kmap(>tbo, 0, bo->tbo.resource->num_pages, >kmap);
+   r = ttm_bo_kmap(>tbo, 0, PFN_UP(bo->tbo.resource->size), >kmap);
if (r)
return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 6546552e596c..5c4f93ee0c57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -62,7 +62,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
if (!res)
goto fallback;
 
-   BUG_ON(start + size > res->num_pages << PAGE_SHIFT);
+   BUG_ON(start + size > res->size);
 
cur->mem_type = res->mem_type;
 
@@ -110,7 +110,7 @@ static inline void amdgpu_res_first(struct ttm_resource 
*res,
cur->size = size;
cur->remaining = size;
cur->node = NULL;
-   WARN_ON(res && start + size > res->num_pages << PAGE_SHIFT);
+   WARN_ON(res && start + size > res->size);
return;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 5e6ddc7e101c..677ad2016976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -127,7 +127,7 @@ TRACE_EVENT(amdgpu_bo_create,
 
TP_fast_assign(
   __entry->bo = bo;
-  __entry->pages = bo->tbo.resource->num_pages;
+  __entry->pages = PFN_UP(bo->tbo.resource->size);
   __entry->type = bo->tbo.resource->mem_type;
   __entry->prefer = bo->preferred_domains;
   __entry->allow = bo->allowed_domains;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index dc262d2c2925..36066965346f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -381,7 +381,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
dst.offset = 0;
 
r = amdgpu_ttm_copy_mem_to_mem(adev, , ,
-  new_mem->num_pages << PAGE_SHIFT,
+  new_mem->size,
   amdgpu_bo_encrypted(abo),
   bo->base.resv, );
if (r)
@@ -424,7 +424,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
 static bool amdgpu_mem_visible(struct amdgpu_device *adev,
   struct ttm_resource *mem)
 {
-   u64 mem_size = (u64)mem->num_pages << PAGE_SHIFT;
+   u64 mem_size = (u64)mem->size;
struct amdgpu_res_cursor cursor;
u64 end;
 
@@ -568,7 +568,7 @@ static int amdgpu_ttm_io_mem_reserve(struct ttm_device *bdev,
 struct ttm_resource *mem)
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(bdev);
-   size_t bus_size = (size_t)mem->num_pages << PAGE_SHIFT;
+   size_t bus_size = (size_t)mem->size;
 
switch (mem->mem_type) {
case 

[Nouveau] [PATCH 4/6] drm/nouveau: fixes on ttm_resource rework to use size_t type

2023-05-04 Thread Somalapuram Amaranath
Convert ttm_resource users from the num_pages field to the new size field, which is a size_t in bytes.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++--
 drivers/gpu/drm/nouveau/nouveau_bo0039.c | 4 ++--
 drivers/gpu/drm/nouveau/nouveau_bo5039.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_bo74c1.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_bo85b5.c | 4 ++--
 drivers/gpu/drm/nouveau/nouveau_bo9039.c | 4 ++--
 drivers/gpu/drm/nouveau/nouveau_bo90b5.c | 4 ++--
 drivers/gpu/drm/nouveau/nouveau_boa0b5.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c| 5 ++---
 drivers/gpu/drm/nouveau/nouveau_mem.c| 4 ++--
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 11 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 126b3c6e12f9..16ca4a141866 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -532,7 +532,7 @@ nouveau_bo_map(struct nouveau_bo *nvbo)
if (ret)
return ret;
 
-   ret = ttm_bo_kmap(&nvbo->bo, 0, nvbo->bo.resource->num_pages, &nvbo->kmap);
+   ret = ttm_bo_kmap(&nvbo->bo, 0, PFN_UP(nvbo->bo.resource->size), &nvbo->kmap);
 
ttm_bo_unreserve(>bo);
return ret;
@@ -1236,7 +1236,7 @@ vm_fault_t nouveau_ttm_fault_reserve_notify(struct ttm_buffer_object *bo)
} else {
/* make sure bo is in mappable vram */
if (drm->client.device.info.family >= NV_DEVICE_INFO_V0_TESLA ||
-   bo->resource->start + bo->resource->num_pages < mappable)
+   bo->resource->start + PFN_UP(bo->resource->size) < mappable)
return 0;
 
for (i = 0; i < nvbo->placement.num_placement; ++i) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo0039.c b/drivers/gpu/drm/nouveau/nouveau_bo0039.c
index 7390132129fe..e2ce44adaa5c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo0039.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo0039.c
@@ -52,7 +52,7 @@ nv04_bo_move_m2mf(struct nouveau_channel *chan, struct ttm_buffer_object *bo,
u32 src_offset = old_reg->start << PAGE_SHIFT;
u32 dst_ctxdma = nouveau_bo_mem_ctxdma(bo, chan, new_reg);
u32 dst_offset = new_reg->start << PAGE_SHIFT;
-   u32 page_count = new_reg->num_pages;
+   u32 page_count = PFN_UP(new_reg->size);
int ret;
 
ret = PUSH_WAIT(push, 3);
@@ -62,7 +62,7 @@ nv04_bo_move_m2mf(struct nouveau_channel *chan, struct ttm_buffer_object *bo,
PUSH_MTHD(push, NV039, SET_CONTEXT_DMA_BUFFER_IN, src_ctxdma,
   SET_CONTEXT_DMA_BUFFER_OUT, dst_ctxdma);
 
-   page_count = new_reg->num_pages;
+   page_count = PFN_UP(new_reg->size);
while (page_count) {
int line_count = (page_count > 2047) ? 2047 : page_count;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo5039.c b/drivers/gpu/drm/nouveau/nouveau_bo5039.c
index 4c75c7b3804c..c6cf3629a9f9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo5039.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo5039.c
@@ -41,7 +41,7 @@ nv50_bo_move_m2mf(struct nouveau_channel *chan, struct ttm_buffer_object *bo,
 {
struct nouveau_mem *mem = nouveau_mem(old_reg);
struct nvif_push *push = chan->chan.push;
-   u64 length = (new_reg->num_pages << PAGE_SHIFT);
+   u64 length = new_reg->size;
u64 src_offset = mem->vma[0].addr;
u64 dst_offset = mem->vma[1].addr;
int src_tiled = !!mem->kind;
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo74c1.c b/drivers/gpu/drm/nouveau/nouveau_bo74c1.c
index ed6c09d67840..9b7ba31fae13 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo74c1.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo74c1.c
@@ -44,7 +44,7 @@ nv84_bo_move_exec(struct nouveau_channel *chan, struct ttm_buffer_object *bo,
if (ret)
return ret;
 
-   PUSH_NVSQ(push, NV74C1, 0x0304, new_reg->num_pages << PAGE_SHIFT,
+   PUSH_NVSQ(push, NV74C1, 0x0304, new_reg->size,
0x0308, upper_32_bits(mem->vma[0].addr),
0x030c, lower_32_bits(mem->vma[0].addr),
0x0310, upper_32_bits(mem->vma[1].addr),
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo85b5.c b/drivers/gpu/drm/nouveau/nouveau_bo85b5.c
index dec29b2d8bb2..a15a38a87a95 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo85b5.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo85b5.c
@@ -44,10 +44,10 @@ nva3_bo_move_copy(struct nouveau_channel *chan, struct ttm_buffer_object *bo,
struct nvif_push *push = chan->chan.push;
u64 src_offset = mem->vma[0].addr;
u64 dst_offset = mem->vma[1].addr;
-   u32 page_count = new_reg->num_pages;
+   u32 page_count = PFN_UP(new_reg->size);
int ret;
 
-   page_count = new_reg->num_pages;
+   page_count = PFN_UP(new_reg->size);
while (page_count) {
int line_count = (page_count > 8191) ? 8191 : page_count;
 
diff --git 

Re: [Nouveau] Disabling -Warray-bounds for gcc-13 too

2023-05-04 Thread Kees Cook
On April 23, 2023 10:36:24 AM PDT, Linus Torvalds 
 wrote:
>Kees,
>  I made the mistake of upgrading my M2 Macbook Air to Fedora-38, and
>in the process I got gcc-13 which is not WERROR-clean because we only
>limited the 'array-bounds' warning to gcc-11 and gcc-12. But gcc-13
>has all the same issues.
>
>And I want to be able to do my arm64 builds with WERROR on still...
>
>I guess it never made much sense to hope it was going to go away
>without having a confirmation, so I just changed it to be gcc-11+.

Yeah, that's fine. GCC 13 released without having a fix for at least one 
(hopefully last) known array-bounds vs jump threading bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071

>And one of them is from you.
>
>In particular, commit 4076ea2419cf ("drm/nouveau/disp: Fix
>nvif_outp_acquire_dp() argument size") cannot possibly be right. It
>changes
>
> nvif_outp_acquire_dp(struct nvif_outp *outp, u8 dpcd[16],
>
>to
>
> nvif_outp_acquire_dp(struct nvif_outp *outp, u8 dpcd[DP_RECEIVER_CAP_SIZE],
>
>and then does
>
>memcpy(args.dp.dpcd, dpcd, sizeof(args.dp.dpcd));
>
>where that 'args.dp.dpcd' is a 16-byte array, and DP_RECEIVER_CAP_SIZE is 15.

Yeah, it was an incomplete fix. I sent the other half here, but it fell through 
the cracks:
https://lore.kernel.org/lkml/20230204184307.never.825-k...@kernel.org/



>

>I think it's all entirely harmless from a code generation standpoint,
>because the 15-byte field will be padded out to 16 bytes in the
>structure that contains it, but it's most definitely buggy.

Right; between this, GCC 13 not having been released yet, and no feedback
from the NV folks, I didn't chase down landing that fix.

>
>So that warning does find real cases of wrong code. But when those
>real cases are hidden by hundreds of lines of unfixable false
>positives, we don't have much choice.

Yup, totally agreed. The false positives I've looked at all seem to be similar 
to the outstanding jump threading bug, so I'm hoping once that gets fixed we'll 
finally have a good signal with that warning enabled. :)

-Kees


-- 
Kees Cook


[Nouveau] [PATCH 2/5] drm/nouveau/acr: remove the unused variable loc

2023-05-04 Thread Jiapeng Chong
The variable loc is assigned but never used in the function, so delete it.

drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c:221:7: warning: variable ‘loc’ set but not used.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3024
Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c b/drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c
index f36a359d4531..bd104a030243 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c
@@ -218,7 +218,7 @@ nvkm_acr_lsfw_load_sig_image_desc_v2(struct nvkm_subdev *subdev,
const struct firmware *hsbl;
const struct nvfw_ls_hsbl_bin_hdr *hdr;
const struct nvfw_ls_hsbl_hdr *hshdr;
-   u32 loc, sig, cnt, *meta;
+   u32 sig, cnt, *meta;
 
ret = nvkm_firmware_load_name(subdev, path, "hs_bl_sig", ver, &hsbl);
if (ret)
@@ -227,7 +227,6 @@ nvkm_acr_lsfw_load_sig_image_desc_v2(struct nvkm_subdev *subdev,
hdr = nvfw_ls_hsbl_bin_hdr(subdev, hsbl->data);
hshdr = nvfw_ls_hsbl_hdr(subdev, hsbl->data + hdr->header_offset);
meta = (u32 *)(hsbl->data + hshdr->meta_data_offset);
-   loc = *(u32 *)(hsbl->data + hshdr->patch_loc);
sig = *(u32 *)(hsbl->data + hshdr->patch_sig);
cnt = *(u32 *)(hsbl->data + hshdr->num_sig);
 
-- 
2.20.1.7.g153144c



Re: [Nouveau] [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-05-04 Thread Intel



On 1/19/23 05:58, Matthew Brost wrote:

On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:

On 1/18/23 21:37, Thomas Hellström (Intel) wrote:

On 1/18/23 07:12, Danilo Krummrich wrote:

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
     DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
     space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
     to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
     UMDs can request the named operations to be processed either
     synchronously or asynchronously. It supports DRM syncobjs
     (incl. timelines) as synchronization mechanism. The management of the
     GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
     execution happens asynchronously. It supports DRM syncobj (incl.
     timelines) as synchronization mechanism. DRM GEM object locking is
     handled with drm_exec.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
GPU scheduler for the asynchronous paths.

Signed-off-by: Danilo Krummrich 
---
   Documentation/gpu/driver-uapi.rst   |   3 +
   drivers/gpu/drm/nouveau/Kbuild  |   2 +
   drivers/gpu/drm/nouveau/Kconfig |   2 +
   drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
   drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
   drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
   drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
   drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
   drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
   drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
   drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
   11 files changed, 1295 insertions(+), 4 deletions(-)
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

...

+static struct dma_fence *
+nouveau_bind_job_run(struct nouveau_job *job)
+{
+    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+    struct bind_job_op *op;
+    int ret = 0;
+

I was looking at how nouveau does the async binding compared to how xe
does it.
It looks to me that this function being a scheduler run_job callback is
the main part of the VM_BIND dma-fence signalling critical section for
the job's done_fence and if so, needs to be annotated as such?

Yes, that's the case.


For example nouveau_uvma_region_new allocates memory, which is not
allowed if in a dma_fence signalling critical section and the locking
also looks suspicious?

Thanks for pointing this out, I missed that somehow.

I will change it to pre-allocate new regions, mappings and page tables
within the job's submit() function.


Yea, that's basically what we do in Xe: in the IOCTL step allocate all the
backing store for new page tables, populate the new page tables (these are
not yet visible in the page table structure), and in the last step, which is
executed after all the dependencies are satisfied, program all the leaf
entries, making the new binding visible.

We screwed this up by deferring most of the IOCTL to a worker but
will fix this one way or another soon - get rid of the worker or
introduce a type of sync that is signaled after the worker + publish the
dma-fence in the worker. I'd like to close on this one soon.
  

For the ops structures the drm_gpuva_manager allocates for reporting the
split/merge steps back to the driver I have ideas to entirely avoid
allocations, which also is a good thing in respect of Christians feedback
regarding the huge amount of mapping requests some applications seem to
generate.


It should be fine to have allocations to report the split/merge step as
this step should be before a dma-fence is published, but yea if possible
to avoid extra allocs as that is always better.

Also BTW, great work on drm_gpuva_manager too. We will most likely
pick this up in Xe rather than open coding all of this as we currently
do. We should probably start the port to this soon so we can contribute
to the implementation and get both of our drivers upstream sooner.
  

Regarding the locking, anything specific that makes it look suspicious to
you?


I haven't looked into this too but almost certainly Thomas is suggesting
that if you allocate memory anywhere under the nouveau_uvmm_lock then
you can't use this lock in the run_job() callback as this in the
dma-fencing path.


Yes, that was what looked suspicious to me, although I haven't either 
looked at the code in detail to say for sure.


But starting by annotating this with dma_fence_[begin | 

Re: [Nouveau] [PATCH drm-next v3 04/15] drm: manager to keep track of GPUs VA mappings

2023-05-04 Thread Boris Brezillon
On Tue,  4 Apr 2023 03:27:30 +0200
Danilo Krummrich  wrote:

> +struct drm_gpuva_manager {
> + /**
> +  * @name: the name of the DRM GPU VA space
> +  */
> + const char *name;
> +
> + /**
> +  * @mm_start: start of the VA space
> +  */
> + u64 mm_start;
> +
> + /**
> +  * @mm_range: length of the VA space
> +  */
> + u64 mm_range;
> +
> + /**
> +  * @mtree: the &maple_tree to track GPU VA mappings
> +  */
> + struct maple_tree mtree;
> +
> + /**
> +  * @kernel_alloc_node:
> +  *
> +  * &drm_gpuva representing the address space cutout reserved for
> +  * the kernel
> +  */
> + struct drm_gpuva kernel_alloc_node;
> +
> + /**
> +  * @ops: &drm_gpuva_fn_ops providing the split/merge steps to drivers
> +  */
> + struct drm_gpuva_fn_ops *ops;

Any reason for not making that a const object (same goes for all the
functions being passed a drm_gpuva_fn_ops)?

> +};


Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton



On 13/02/2023 02:57, Dave Airlie wrote:
> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  wrote:
>>
>>
>>
>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 10.02.23 20:01, Karol Herbst wrote:
 On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
 Leemhuis)  wrote:
>
> On 08.02.23 09:48, Chris Clayton wrote:
>>
>> I'm assuming  that we are not going to see a fix for this regression 
>> before 6.2 is released.
>
> Yeah, looks like it. That's unfortunate, but happens. But there is still
> time to fix it and there is one thing I wonder:
>
> Did any of the nouveau developers look at the netconsole captures Chris
> posted more than a week ago to check if they somehow help to track down
> the root of this problem?

 I did now and I can't spot anything. I think at this point it would
 make sense to dump the active tasks/threads via sqsrq keys to see if
 any is in a weird state preventing the machine from shutting down.
>>>
>>> Many thx for looking into it!
>>
>> Yes, thanks Karol.
>>
>> Attached is the output from dmesg when this block of code:
>>
>> /bin/mount /dev/sda7 /mnt/sda7
>> /bin/mountpoint /proc || /bin/mount /proc
>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>> /bin/echo t > /proc/sysrq-trigger
>> /bin/sleep 1
>> /bin/sync
>> /bin/sleep 1
>> kill $(pidof dmesg)
>> /bin/umount /mnt/sda7
>>
>> is executed immediately before /sbin/reboot is called as the final step of 
>> rebooting my system.
>>
>> I hope this is what you were looking for, but if not, please let me know 
>> what you need
> 

Thanks Dave.
> Another shot in the dark, but does nouveau.runpm=0 help at all?
> 
> Dave.


Re: [Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-05-04 Thread Diogo Ivo
On Fri, Jan 27, 2023 at 10:03:17AM +0100, Nicolas Chauvet wrote:
> I've tried to run glmark2-wayland under weston with DRI_PRIME=1, it
> seems to work at the beginning, but then I have the following error:
> 
> [ 1510.861730] nouveau 5700.gpu: gr: DATA_ERROR 0003
> [INVALID_OPERATION] ch 3 [04002a2000 glmark2-wayland[2753]] subc 0
> class b197 mthd 19d0 data 003d
> [ 1510.952000] nouveau 5700.gpu: gr: DATA_ERROR 0003
> [INVALID_OPERATION] ch 3 [04002a2000 glmark2-wayland[2753]] subc 0
> class b197 mthd 19d0 data 003d
> [ 1510.952060] nouveau 5700.gpu: gr: DATA_ERROR 009c [] ch 3
> [04002a2000 glmark2-wayland[2753]] subc 0 class b197 mthd 0d78 data
> 0006
> I think it's a separate error as I think I can reproduce on kernel
> 6.1x (I will open a separate thread).

Hello,

Would you mind testing this Mesa merge request (and the kernel patches
mentioned there) to see if it fixes this error:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20811

Thanks,
Diogo


[Nouveau] [PATCH v2] drm/nouveau: fix incorrect conversion to dma_resv_wait_timeout()

2023-05-04 Thread John Ogness
Commit 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
converted from ttm_bo_wait_ctx() to dma_resv_wait_timeout().
However, dma_resv_wait_timeout() returns greater than zero on
success as opposed to ttm_bo_wait_ctx(). As a result, relocs
will fail and log errors even when it was a success.

Change the return code handling to match that of
nouveau_gem_ioctl_cpu_prep(), which was already using
dma_resv_wait_timeout() correctly.

Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
Reported-by: Tanmay Bhushan <0070472...@gmail.com>
Link: https://lore.kernel.org/lkml/20230119225351.71657-1-0070472...@gmail.com
Signed-off-by: John Ogness 
---
 The original report was actually a patch that needed fixing.
 Since nobody has stepped up to fix this regression correctly,
 I'm posting the v2.

 This is a real regression introduced in 6.3-rc1.

 drivers/gpu/drm/nouveau/nouveau_gem.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index f77e44958037..346839c24273 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -645,8 +645,9 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
struct drm_nouveau_gem_pushbuf_reloc *reloc,
struct drm_nouveau_gem_pushbuf_bo *bo)
 {
-   long ret = 0;
+   int ret = 0;
unsigned i;
+   long lret;
 
for (i = 0; i < req->nr_relocs; i++) {
struct drm_nouveau_gem_pushbuf_reloc *r = &reloc[i];
@@ -703,13 +704,18 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
data |= r->vor;
}
 
-   ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
-   DMA_RESV_USAGE_BOOKKEEP,
-   false, 15 * HZ);
-   if (ret == 0)
+   lret = dma_resv_wait_timeout(nvbo->bo.base.resv,
+DMA_RESV_USAGE_BOOKKEEP,
+false, 15 * HZ);
+   if (!lret)
ret = -EBUSY;
+   else if (lret > 0)
+   ret = 0;
+   else
+   ret = lret;
+
if (ret) {
-   NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
+   NV_PRINTK(err, cli, "reloc wait_idle failed: %d\n",
  ret);
break;
}

base-commit: 09a9639e56c01c7a00d6c0ca63f4c7c41abe075d
-- 
2.30.2



[Nouveau] [PATCH] Change the meaning of the fields in the ttm_place structure from pfn to bytes

2023-05-04 Thread Somalapuram Amaranath
Rename the ttm_place structure members fpfn, lpfn and mem_type to
res_start, res_end and res_type.
Change their type from unsigned to u64.
Fix the dependent code in all the DRM drivers and
clean up the PAGE_SHIFT operations.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|  66 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  22 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c   |  17 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |  40 ---
 drivers/gpu/drm/drm_gem_vram_helper.c |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  22 ++--
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |   2 +-
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 102 --
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.h |   2 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   |  12 +--
 drivers/gpu/drm/nouveau/nouveau_bo.c  |  41 +++
 drivers/gpu/drm/nouveau/nouveau_mem.c |  10 +-
 drivers/gpu/drm/qxl/qxl_object.c  |  14 +--
 drivers/gpu/drm/qxl/qxl_ttm.c |   8 +-
 drivers/gpu/drm/radeon/radeon_object.c|  50 -
 drivers/gpu/drm/radeon/radeon_ttm.c   |  20 ++--
 drivers/gpu/drm/radeon/radeon_uvd.c   |   8 +-
 drivers/gpu/drm/ttm/ttm_bo.c  |  20 ++--
 drivers/gpu/drm/ttm/ttm_range_manager.c   |  21 ++--
 drivers/gpu/drm/ttm/ttm_resource.c|   8 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c|  46 
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c|  30 +++---
 include/drm/ttm/ttm_placement.h   |  12 +--
 25 files changed, 293 insertions(+), 305 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 44367f03316f..5b5104e724e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -131,11 +131,12 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
goto err_free;
}
 
-   if (place->lpfn) {
+   if (place->res_end) {
spin_lock(&mgr->lock);
r = drm_mm_insert_node_in_range(&mgr->mm, &node->mm_nodes[0],
-   num_pages, tbo->page_alignment,
-   0, place->fpfn, place->lpfn,
+   num_pages, tbo->page_alignment, 0,
+   place->res_start << PAGE_SHIFT,
+   place->res_end << PAGE_SHIFT,
DRM_MM_INSERT_BEST);
spin_unlock(&mgr->lock);
if (unlikely(r))
@@ -219,7 +220,7 @@ static bool amdgpu_gtt_mgr_intersects(struct ttm_resource_manager *man,
  const struct ttm_place *place,
  size_t size)
 {
-   return !place->lpfn || amdgpu_gtt_mgr_has_gart_addr(res);
+   return !place->res_end || amdgpu_gtt_mgr_has_gart_addr(res);
 }
 
 /**
@@ -237,7 +238,7 @@ static bool amdgpu_gtt_mgr_compatible(struct ttm_resource_manager *man,
  const struct ttm_place *place,
  size_t size)
 {
-   return !place->lpfn || amdgpu_gtt_mgr_has_gart_addr(res);
+   return !place->res_end || amdgpu_gtt_mgr_has_gart_addr(res);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 283e8fe608ce..2926389e21d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -130,15 +130,15 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
u32 c = 0;
 
if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
-   unsigned visible_pfn = adev->gmc.visible_vram_size >> PAGE_SHIFT;
+   u64 visible_pfn = adev->gmc.visible_vram_size;
 
-   places[c].fpfn = 0;
-   places[c].lpfn = 0;
-   places[c].mem_type = TTM_PL_VRAM;
+   places[c].res_start = 0;
+   places[c].res_end = 0;
+   places[c].res_type = TTM_PL_VRAM;
places[c].flags = 0;
 
if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
-   places[c].lpfn = visible_pfn;
+   places[c].res_end = visible_pfn;
else
places[c].flags |= TTM_PL_FLAG_TOPDOWN;
 
@@ -148,9 +148,9 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
}
 
if (domain & AMDGPU_GEM_DOMAIN_GTT) {
-   places[c].fpfn = 0;
-   places[c].lpfn = 0;
-   places[c].mem_type =
+   places[c].res_start = 0;
+   places[c].res_end = 

[Nouveau] [PATCH 6/6] drm/amdgpu: Cleanup the GDS, GWS and OA allocations

2023-05-04 Thread Somalapuram Amaranath
Change the size of GDS, GWS and OA from pages to bytes.
Initialize gds_size, gws_size and oa_size in bytes, and
remove the PAGE_SHIFT shift in amdgpu_ttm_init_on_chip().
Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 12 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  3 +--
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index c3d9d75143f4..4641b25956fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -142,16 +142,16 @@ void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds,
  struct amdgpu_bo *gws, struct amdgpu_bo *oa)
 {
if (gds) {
-   job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
-   job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
+   job->gds_base = amdgpu_bo_gpu_offset(gds);
+   job->gds_size = amdgpu_bo_size(gds);
}
if (gws) {
-   job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT;
-   job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT;
+   job->gws_base = amdgpu_bo_gpu_offset(gws);
+   job->gws_size = amdgpu_bo_size(gws);
}
if (oa) {
-   job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT;
-   job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT;
+   job->oa_base = amdgpu_bo_gpu_offset(oa);
+   job->oa_size = amdgpu_bo_size(oa);
}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index f5d5eee09cea..9285037d6d88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -541,12 +541,11 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA)) {
/* GWS and OA don't need any alignment. */
page_align = bp->byte_align;
-   size <<= PAGE_SHIFT;
 
} else if (bp->domain & AMDGPU_GEM_DOMAIN_GDS) {
/* Both size and alignment must be a multiple of 4. */
page_align = ALIGN(bp->byte_align, 4);
-   size = ALIGN(size, 4) << PAGE_SHIFT;
+   size = ALIGN(size, 4);
} else {
/* Memory should be aligned at least to a page size. */
page_align = ALIGN(bp->byte_align, PAGE_SIZE) >> PAGE_SHIFT;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index f0dabdfd3780..a8e444a31d8f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -77,8 +77,7 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
unsigned int type,
uint64_t size)
 {
-   return ttm_range_man_init(&adev->mman.bdev, type,
- false, size << PAGE_SHIFT);
+   return ttm_range_man_init(&adev->mman.bdev, type, false, size);
 }
 
 /**
-- 
2.32.0



[Nouveau] [PATCH RESEND] drm/nouveau/hwmon: Use sysfs_emit in show function callbacks

2023-05-04 Thread Deepak R Varma
According to Documentation/filesystems/sysfs.rst, the show() callback
function of kobject attributes should strictly use sysfs_emit() instead
of sprintf() family functions. So, make this change.
Issue identified using the coccinelle device_attr_show.cocci script.

Signed-off-by: Deepak R Varma 
---
Note:
   Resending the patch for review and feedback. No functional changes.


 drivers/gpu/drm/nouveau/nouveau_hwmon.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_hwmon.c b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
index a7db7c31064b..e844be49e11e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_hwmon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
@@ -41,7 +41,7 @@ static ssize_t
 nouveau_hwmon_show_temp1_auto_point1_pwm(struct device *d,
 struct device_attribute *a, char *buf)
 {
-   return snprintf(buf, PAGE_SIZE, "%d\n", 100);
+   return sysfs_emit(buf, "%d\n", 100);
 }
 static SENSOR_DEVICE_ATTR(temp1_auto_point1_pwm, 0444,
  nouveau_hwmon_show_temp1_auto_point1_pwm, NULL, 0);
@@ -54,8 +54,8 @@ nouveau_hwmon_temp1_auto_point1_temp(struct device *d,
struct nouveau_drm *drm = nouveau_drm(dev);
struct nvkm_therm *therm = nvxx_therm(&drm->client.device);
 
-   return snprintf(buf, PAGE_SIZE, "%d\n",
- therm->attr_get(therm, NVKM_THERM_ATTR_THRS_FAN_BOOST) * 1000);
+   return sysfs_emit(buf, "%d\n",
+ therm->attr_get(therm, NVKM_THERM_ATTR_THRS_FAN_BOOST) * 1000);
 }
 static ssize_t
 nouveau_hwmon_set_temp1_auto_point1_temp(struct device *d,
@@ -87,8 +87,8 @@ nouveau_hwmon_temp1_auto_point1_temp_hyst(struct device *d,
struct nouveau_drm *drm = nouveau_drm(dev);
struct nvkm_therm *therm = nvxx_therm(&drm->client.device);
 
-   return snprintf(buf, PAGE_SIZE, "%d\n",
-therm->attr_get(therm, NVKM_THERM_ATTR_THRS_FAN_BOOST_HYST) * 1000);
+   return sysfs_emit(buf, "%d\n",
+ therm->attr_get(therm, NVKM_THERM_ATTR_THRS_FAN_BOOST_HYST) * 1000);
 }
 static ssize_t
 nouveau_hwmon_set_temp1_auto_point1_temp_hyst(struct device *d,
-- 
2.34.1





Re: [Nouveau] [PATCH drm-next v2 04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE

2023-05-04 Thread Liam R. Howlett
* Danilo Krummrich  [230217 08:44]:
> Generic components making use of the maple tree (such as the
> DRM GPUVA Manager) delegate the responsibility of ensuring mutual
> exclusion to their users.
> 
> While such components could inherit the concept of an external lock,
> some users might just serialize the access to the component and hence to
> the internal maple tree.
> 
> In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
> indicate not to do any internal lockdep checks.
> 
> Signed-off-by: Danilo Krummrich 
> ---
>  include/linux/maple_tree.h | 20 +++-
>  lib/maple_tree.c   |  7 ---
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index ca04c900e51a..f795e5def8d0 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -170,10 +170,11 @@ enum maple_type {
>  #define MT_FLAGS_USE_RCU 0x02
>  #define MT_FLAGS_HEIGHT_OFFSET   0x02
>  #define MT_FLAGS_HEIGHT_MASK 0x7C
> -#define MT_FLAGS_LOCK_MASK   0x300
> +#define MT_FLAGS_LOCK_MASK   0x700
>  #define MT_FLAGS_LOCK_IRQ0x100
>  #define MT_FLAGS_LOCK_BH 0x200
>  #define MT_FLAGS_LOCK_EXTERN 0x300
> +#define MT_FLAGS_LOCK_NONE   0x400

Please add this to the documentation above the flags as well.  We should
probably add enough context so that users don't just set this and then
use multiple writers.

>  
>  #define MAPLE_HEIGHT_MAX 31
>  
> @@ -559,11 +560,16 @@ static inline void mas_set(struct ma_state *mas, 
> unsigned long index)
>   mas_set_range(mas, index, index);
>  }
>  
> -static inline bool mt_external_lock(const struct maple_tree *mt)
> +static inline bool mt_lock_external(const struct maple_tree *mt)
>  {
>   return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_EXTERN;
>  }
>  
> +static inline bool mt_lock_none(const struct maple_tree *mt)
> +{
> + return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_NONE;
> +}
> +
>  /**
>   * mt_init_flags() - Initialise an empty maple tree with flags.
>   * @mt: Maple Tree
> @@ -577,7 +583,7 @@ static inline bool mt_external_lock(const struct 
> maple_tree *mt)
>  static inline void mt_init_flags(struct maple_tree *mt, unsigned int flags)
>  {
>   mt->ma_flags = flags;
> - if (!mt_external_lock(mt))
> + if (!mt_lock_external(mt) && !mt_lock_none(mt))
>   spin_lock_init(>ma_lock);
>   rcu_assign_pointer(mt->ma_root, NULL);
>  }
> @@ -612,9 +618,11 @@ static inline void mt_clear_in_rcu(struct maple_tree *mt)
>   if (!mt_in_rcu(mt))
>   return;
>  
> - if (mt_external_lock(mt)) {
> + if (mt_lock_external(mt)) {
>   BUG_ON(!mt_lock_is_held(mt));
>   mt->ma_flags &= ~MT_FLAGS_USE_RCU;
> + } else if (mt_lock_none(mt)) {
> + mt->ma_flags &= ~MT_FLAGS_USE_RCU;
>   } else {
>   mtree_lock(mt);
>   mt->ma_flags &= ~MT_FLAGS_USE_RCU;
> @@ -631,9 +639,11 @@ static inline void mt_set_in_rcu(struct maple_tree *mt)
>   if (mt_in_rcu(mt))
>   return;
>  
> - if (mt_external_lock(mt)) {
> + if (mt_lock_external(mt)) {
>   BUG_ON(!mt_lock_is_held(mt));
>   mt->ma_flags |= MT_FLAGS_USE_RCU;
> + } else if (mt_lock_none(mt)) {
> + mt->ma_flags |= MT_FLAGS_USE_RCU;
>   } else {
>   mtree_lock(mt);
>   mt->ma_flags |= MT_FLAGS_USE_RCU;
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 26e2045d3cda..f51c0fd4eaad 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -802,8 +802,8 @@ static inline void __rcu **ma_slots(struct maple_node 
> *mn, enum maple_type mt)
>  
>  static inline bool mt_locked(const struct maple_tree *mt)
>  {
> - return mt_external_lock(mt) ? mt_lock_is_held(mt) :
> - lockdep_is_held(&mt->ma_lock);
> + return mt_lock_external(mt) ? mt_lock_is_held(mt) :
> + mt_lock_none(mt) ? true : lockdep_is_held(&mt->ma_lock);

It might be better to just make this two return statements for clarity.

>  }
>  
>  static inline void *mt_slot(const struct maple_tree *mt,
> @@ -6120,7 +6120,8 @@ bool mas_nomem(struct ma_state *mas, gfp_t gfp)
>   return false;
>   }
>  
> - if (gfpflags_allow_blocking(gfp) && !mt_external_lock(mas->tree)) {
> + if (gfpflags_allow_blocking(gfp) &&
> + !mt_lock_external(mas->tree) && !mt_lock_none(mas->tree)) {
>   mtree_unlock(mas->tree);
>   mas_alloc_nodes(mas, gfp);
>   mtree_lock(mas->tree);
> -- 
> 2.39.1
> 


Re: [Nouveau] [PATCH drm-next v3 04/15] drm: manager to keep track of GPUs VA mappings

2023-05-04 Thread Boris Brezillon
On Tue,  4 Apr 2023 03:27:30 +0200
Danilo Krummrich  wrote:

> +/**
> + * drm_gpuva_prealloc_create - creates a preallocated node to store a
> + * &drm_gpuva entry.
> + *
> + * Returns: the &drm_gpuva_prealloc object on success, NULL on failure
> + */
> +struct drm_gpuva_prealloc *
> +drm_gpuva_prealloc_create(void)
> +{
> + struct drm_gpuva_prealloc *pa;
> +
> + pa = kzalloc(sizeof(*pa), GFP_KERNEL);
> + if (!pa)
> + return NULL;
> +
> + if (mas_preallocate(&pa->mas, GFP_KERNEL)) {

mas_preallocate() needs a valid tree field to calculate the number
of nodes to pre-allocate. I guess we're missing a MA_STATE_INIT() here,
and we need to pass a gpuva_mgr object to this helper.

> + kfree(pa);
> + return NULL;
> + }
> +
> + return pa;
> +}
> +EXPORT_SYMBOL(drm_gpuva_prealloc_create);


[Nouveau] [PATCH v2 0/2] drm/nouveau: avoid usage of list iterator after loop

2023-05-04 Thread Jakob Koschel
This patch set includes two instances where the list iterator variable
'pstate' is implicitly assumed to be valid after the iterator loop.
While in practice that is most likely the case (if
'pstatei'/'args->v0.state' is <= the number of elements in clk->states),
we should explicitly only allow 'pstate' to point to valid 'nvkm_pstate'
structs.

That allows catching potential bugs with WARN_ON(!pstate) that otherwise
would be completely undetectable.

It also helps the greater mission to hopefully move the list iterator
variable into the iterating macro directly [1].

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
Changes in v2:
- convert BUG_ON() into WARN_ON()
- Link to v1: 
https://lore.kernel.org/r/20230301-drm-nouveau-avoid-iter-after-loop-v1-0-0702ec23f...@gmail.com

---
Jakob Koschel (2):
  drm/nouveau/device: avoid usage of list iterator after loop
  drm/nouveau/clk: avoid usage of list iterator after loop

 drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c | 11 ---
 drivers/gpu/drm/nouveau/nvkm/subdev/clk/base.c| 10 +++---
 2 files changed, 15 insertions(+), 6 deletions(-)
---
base-commit: c0927a7a5391f7d8e593e5e50ead7505a23cadf9
change-id: 20230301-drm-nouveau-avoid-iter-after-loop-4bff97166efa

Best regards,
-- 
Jakob Koschel 



[Nouveau] [PATCH 1/2] drm/i915: constify pointers to hwmon_channel_info

2023-05-04 Thread Krzysztof Kozlowski
Statically allocated arrays of pointers to hwmon_channel_info can be made
const for safety.

Signed-off-by: Krzysztof Kozlowski 

---

This depends on hwmon core patch:
https://lore.kernel.org/all/20230406203103.3011503-2-krzysztof.kozlow...@linaro.org/

Therefore I propose this should also go via hwmon tree.

Cc: Jean Delvare 
Cc: Guenter Roeck 
Cc: linux-hw...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_hwmon.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_hwmon.c 
b/drivers/gpu/drm/i915/i915_hwmon.c
index 596dd2c07010..87b527a54272 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -267,7 +267,7 @@ static const struct attribute_group *hwm_groups[] = {
NULL
 };
 
-static const struct hwmon_channel_info *hwm_info[] = {
+static const struct hwmon_channel_info * const hwm_info[] = {
HWMON_CHANNEL_INFO(in, HWMON_I_INPUT),
HWMON_CHANNEL_INFO(power, HWMON_P_MAX | HWMON_P_RATED_MAX | 
HWMON_P_CRIT),
HWMON_CHANNEL_INFO(energy, HWMON_E_INPUT),
@@ -275,7 +275,7 @@ static const struct hwmon_channel_info *hwm_info[] = {
NULL
 };
 
-static const struct hwmon_channel_info *hwm_gt_info[] = {
+static const struct hwmon_channel_info * const hwm_gt_info[] = {
HWMON_CHANNEL_INFO(energy, HWMON_E_INPUT),
NULL
 };
-- 
2.34.1



[Nouveau] [PATCH 34/37] drm/nouveau/nvkm/engine/gr/tu102: Completely remove unused function ‘tu102_gr_load’

2023-05-04 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c:210:1: warning: ‘tu102_gr_load’ 
defined but not used [-Wunused-function]

Cc: Ben Skeggs 
Cc: Karol Herbst 
Cc: Lyude Paul 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c
index 10a7e59482a6f..a7775aa185415 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c
@@ -206,19 +206,6 @@ tu102_gr_av_to_init_veid(struct nvkm_blob *blob, struct 
gf100_gr_pack **ppack)
return gk20a_gr_av_to_init_(blob, 64, 0x0010, ppack);
 }
 
-static int
-tu102_gr_load(struct gf100_gr *gr, int ver, const struct gf100_gr_fwif *fwif)
-{
-   int ret;
-
-   ret = gm200_gr_load(gr, ver, fwif);
-   if (ret)
-   return ret;
-
-   return gk20a_gr_load_net(gr, "gr/", "sw_veid_bundle_init", ver, 
tu102_gr_av_to_init_veid,
> - &gr->bundle_veid);
-}
-
 static const struct gf100_gr_fwif
 tu102_gr_fwif[] = {
{  0, gm200_gr_load, &tu102_gr, &gm200_gr_fecs_acr, &gm200_gr_gpccs_acr 
},
-- 
2.40.0.rc1.284.g88254d51c5-goog



Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton
Hi Karol.

I sent the original report to Ben and LKML. Thorsten then added you, Lyude
Paul and the dri-devel and nouveau lists. So you should have received this
report on or about January 19.

Chris

On Fri, 27 Jan 2023, 11:35 Karol Herbst,  wrote:

> Where was the original email sent to anyway, because I don't have it at
> all.
>
> Anyhow, I suspect we want to fetch logs to see what's happening, but
> due to the nature of this bug it might get difficult.
>
> I'm checking out the laptops I have here if I can reproduce this
> issue, but I think all mine with Turing GPUs are fine.
>
> Maybe Ben has any idea what might be wrong with
> 0e44c21708761977dcbea9b846b51a6fb684907a or if that's an issue which
> is already fixed by not upstreamed patches as I think I remember Ben
> to talk about something like that recently.
>
> Karol
>
> On Fri, Jan 27, 2023 at 12:20 PM Linux kernel regression tracking
> (Thorsten Leemhuis)  wrote:
> >
> > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > to make this easily accessible to everyone.
> >
> > @nouveau-maintainers, did anyone take a look at this? The report is
> > already 8 days old and I don't see a single reply. Sure, we'll likely
> > get a -rc8, but still it would be good to not fix this on the finish
> line.
> >
> > Chris, btw, did you try if you can revert the commit on top of latest
> > mainline? And if so, does it fix the problem?
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > --
> > Everything you wanna know about Linux kernel regression tracking:
> > https://linux-regtracking.leemhuis.info/about/#tldr
> > If I did something stupid, please tell me, as explained on that page.
> >
> > #regzbot poke
> >
> > On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> > wrote:
> > > [adding various lists and the two other nouveau maintainers to the list
> > > of recipients]
> >
> > > On 18.01.23 21:59, Chris Clayton wrote:
> > >> Hi.
> > >>
> > >> I built and installed the latest development kernel earlier this
> week. I've found that when I try to shut the laptop down (or
> > >> reboot it), it hangs right at the end of closing the current session.
> The last line I see on the screen when rebooting is:
> > >>
> > >>  sd 4:0:0:0: [sda] Synchronising SCSI cache
> > >>
> > >> when closing down I see one additional line:
> > >>
> > >>  sd 4:0:0:0 [sda]Stopping disk
> > >>
> > >> In both cases the machine then hangs and I have to hold down the
> power button for a few seconds to switch it off.
> > >>
> > >> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
> landed on:
> > >>
> > >>  # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a]
> drm/nouveau/flcn: new code to load+boot simple HS FWs
> > >> (VPR scrubber)
> > >>
> > >> I built and installed a kernel with
> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
> checked out
> > >> and that shuts down and reboots fine. I then did the same with the
> bad commit checked out and that does indeed hang, so
> > >> I'm confident the bisect outcome is OK.
> > >>
> > >> Kernels 6.1.6 and 5.15.88 are also OK.
> > >>
> > >> My system has dual GPUs - one Intel and one NVidia. Related extracts
> from 'lspci -v' are:
> > >>
> > >> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2
> [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
> > >> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> > >>
> > >> Flags: bus master, fast devsel, latency 0, IRQ 142
> > >>
> > >> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> > >>
> > >> Memory at a000 (64-bit, prefetchable) [size=256M]
> > >>
> > >> I/O ports at 5000 [size=64]
> > >>
> > >> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> > >>
> > >> Capabilities: [40] Vendor Specific Information: Len=0c 
> > >>
> > >> Capabilities: [70] Express Root Complex Integrated Endpoint,
> MSI 00
> > >>
> > >> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> > >>
> > >> Capabilities: [d0] Power Management version 2
> > >>
> > >> Kernel driver in use: i915
> > >>
> > >> Kernel modules: i915
> > >>
> > >>
> > >> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce
> GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
> > >> controller])
> > >> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti
> Mobile]
> > >> Flags: bus master, fast devsel, latency 0, IRQ 141
> > >> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> > >> Memory at b000 (64-bit, prefetchable) [size=256M]
> > >> Memory at c000 (64-bit, prefetchable) [size=32M]
> > >> I/O ports at 4000 [size=128]
> > >> Expansion ROM at c300 [disabled] [size=512K]
> > >> Capabilities: [60] Power Management version 3
> > >> Capabilities: [68] MSI: 

Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton
Thanks Thorsten.

I did try to revert but it didn't revert cleanly and I don't have the
knowledge to fix it up.

The patch was part of a merge that included a number of related patches.
I'll try to revert the lot and report back.

Chris


On Fri, 27 Jan 2023, 11:20 Linux kernel regression tracking (Thorsten
Leemhuis),  wrote:

> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> @nouveau-maintainers, did anyone take a look at this? The report is
> already 8 days old and I don't see a single reply. Sure, we'll likely
> get a -rc8, but still it would be good to not fix this on the finish line.
>
> Chris, btw, did you try if you can revert the commit on top of latest
> mainline? And if so, does it fix the problem?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> wrote:
> > [adding various lists and the two other nouveau maintainers to the list
> > of recipients]
>
> > On 18.01.23 21:59, Chris Clayton wrote:
> >> Hi.
> >>
> >> I built and installed the latest development kernel earlier this week.
> I've found that when I try to shut the laptop down (or
> >> reboot it), it hangs right at the end of closing the current session.
> The last line I see on the screen when rebooting is:
> >>
> >>  sd 4:0:0:0: [sda] Synchronising SCSI cache
> >>
> >> when closing down I see one additional line:
> >>
> >>  sd 4:0:0:0 [sda]Stopping disk
> >>
> >> In both cases the machine then hangs and I have to hold down the power
> button for a few seconds to switch it off.
> >>
> >> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
> landed on:
> >>
> >>  # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a]
> drm/nouveau/flcn: new code to load+boot simple HS FWs
> >> (VPR scrubber)
> >>
> >> I built and installed a kernel with
> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
> checked out
> >> and that shuts down and reboots fine. I then did the same with the bad
> commit checked out and that does indeed hang, so
> >> I'm confident the bisect outcome is OK.
> >>
> >> Kernels 6.1.6 and 5.15.88 are also OK.
> >>
> >> My system has dual GPUs - one Intel and one NVidia. Related extracts
> from 'lspci -v' are:
> >>
> >> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2
> [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
> >> Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> >>
> >> Flags: bus master, fast devsel, latency 0, IRQ 142
> >>
> >> Memory at c200 (64-bit, non-prefetchable) [size=16M]
> >>
> >> Memory at a000 (64-bit, prefetchable) [size=256M]
> >>
> >> I/O ports at 5000 [size=64]
> >>
> >> Expansion ROM at 000c [virtual] [disabled] [size=128K]
> >>
> >> Capabilities: [40] Vendor Specific Information: Len=0c 
> >>
> >> Capabilities: [70] Express Root Complex Integrated Endpoint,
> MSI 00
> >>
> >> Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >>
> >> Capabilities: [d0] Power Management version 2
> >>
> >> Kernel driver in use: i915
> >>
> >> Kernel modules: i915
> >>
> >>
> >> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce
> GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
> >> controller])
> >> Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti
> Mobile]
> >> Flags: bus master, fast devsel, latency 0, IRQ 141
> >> Memory at c400 (32-bit, non-prefetchable) [size=16M]
> >> Memory at b000 (64-bit, prefetchable) [size=256M]
> >> Memory at c000 (64-bit, prefetchable) [size=32M]
> >> I/O ports at 4000 [size=128]
> >> Expansion ROM at c300 [disabled] [size=512K]
> >> Capabilities: [60] Power Management version 3
> >> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> >> Capabilities: [78] Express Legacy Endpoint, MSI 00
> >> Kernel driver in use: nouveau
> >> Kernel modules: nouveau
> >>
> >> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still
> using sysvinit).
> >>
> >> I've attached the bisect.log, but please let me know if I can provide
> any other diagnostics. Please cc me as I'm not
> >> subscribed.
> >
> > Thanks for the report. To be sure the issue doesn't fall through the
> > cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> > tracking bot:
> >
> > #regzbot ^introduced e44c2170876197
> > #regzbot title drm: nouveau: hangs on poweroff/reboot
> > #regzbot ignore-activity
> >
> > This isn't a regression? This issue or a fix for it are 

[Nouveau] Fwd: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton
Proof, if any were needed, that I should consume more coffee before dealing 
with email...

Adding cc recipients that were dropped in my message this morning.


 Forwarded Message 
Subject: Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected
Date: Mon, 13 Feb 2023 09:21:10 +
From: Chris Clayton 
To: Dave Airlie 

[ Apologies for the incomplete message I sent a few minutes ago. I should have 
had more coffee before I started dealing
with email. ]

On 13/02/2023 02:57, Dave Airlie wrote:
> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  wrote:
>>
>>
>>
>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 10.02.23 20:01, Karol Herbst wrote:
 On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
 Leemhuis)  wrote:
>
> On 08.02.23 09:48, Chris Clayton wrote:
>>
>> I'm assuming  that we are not going to see a fix for this regression 
>> before 6.2 is released.
>
> Yeah, looks like it. That's unfortunate, but happens. But there is still
> time to fix it and there is one thing I wonder:
>
> Did any of the nouveau developers look at the netconsole captures Chris
> posted more than a week ago to check if they somehow help to track down
> the root of this problem?

 I did now and I can't spot anything. I think at this point it would
 make sense to dump the active tasks/threads via sysrq keys to see if
 any is in a weird state preventing the machine from shutting down.
>>>
>>> Many thx for looking into it!
>>
>> Yes, thanks Karol.
>>
>> Attached is the output from dmesg when this block of code:
>>
>> /bin/mount /dev/sda7 /mnt/sda7
>> /bin/mountpoint /proc || /bin/mount /proc
>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>> /bin/echo t > /proc/sysrq-trigger
>> /bin/sleep 1
>> /bin/sync
>> /bin/sleep 1
>> kill $(pidof dmesg)
>> /bin/umount /mnt/sda7
>>
>> is executed immediately before /sbin/reboot is called as the final step of 
>> rebooting my system.
>>
>> I hope this is what you were looking for, but if not, please let me know 
>> what you need
> 

Thanks, Dave.

> Another shot in the dark, but does nouveau.runpm=0 help at all?
> 
Yes, it does. With runpm=0, both reboot and poweroff work on my laptop. Of 
course, it also means that the discrete
(NVidia) GPU is now powered on permanently.
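For anyone who needs the same workaround persistently, runtime PM for nouveau can be disabled on the kernel command line (`nouveau.runpm=0`) or via a modprobe options file; a sketch (the file name is conventional, adjust for your distribution):

```
# /etc/modprobe.d/nouveau.conf
# Keep the discrete GPU powered at all times; works around the
# poweroff/reboot hang at the cost of higher power draw.
options nouveau runpm=0
```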

Chris


> Dave.


Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton



On 30/01/2023 23:27, Ben Skeggs wrote:
> On Tue, 31 Jan 2023 at 09:09, Chris Clayton  wrote:
>>
>> Hi again.
>>
>> On 30/01/2023 20:19, Chris Clayton wrote:
>>> Thanks, Ben.
>>
>> 
>>
 Hey,

 This is a complete shot-in-the-dark, as I don't see this behaviour on
 *any* of my boards.  Could you try the attached patch please?
>>>
>>> Unfortunately, the patch made no difference.
>>>
>>> I've been looking at how the graphics on my laptop is set up, and have a 
>>> bit of a worry about whether the firmware might
>>> be playing a part in this problem. In order to offload video decoding to 
>>> the NVidia TU117 GPU, it seems the scrubber
>>> firmware must be available, but as far as I know, that has not been released 
>>> by NVidia. To get it to work, I followed
>>> what ubuntu have done and the scrubber in /lib/firmware/nvidia/tu117/nvdec/ 
>>> is a symlink to
>>> ../../tu116/nvdev/scrubber.bin. That, of course, means that some of the 
>>> firmware being loaded is for a different
>>> card. I note that processing related to firmware is being changed in the 
>>> patch. Might my set up be at the root of my
>>> problem?
>>>
>>> I'll have a fiddle an see what I can work out.
>>>
>>> Chris
>>>

 Thanks,
 Ben.

>
>>
>> Well, my fiddling has got my system rebooting and shutting down successfully 
>> again. I found that if I delete the symlink
>> to the scrubber firmware, reboot and shutdown work again. There are however, 
>> a number of other files in the tu117
>> firmware directory tree that that are symlinks to actual files in its tu116 
>> counterpart. So I deleted all of those too.
>> Unfortunately, the absence of one or more of those symlinks causes Xorg to 
>> fail to start. I've reinstated all the links
>> except scrubber and I now have a system that works as it did until I tried 
>> to run a kernel that includes the bad commit
>> I identified in my bisection. That includes offloading video decoding to the 
>> NVidia card, so what ever I read that said
>> the scrubber firmware was needed seems to have been wrong. I get a new 
>> message that (nouveau :01:00.0: fb: VPR
>> locked, but no scrubber binary!), but, hey, we can't have everything.
>>
>> If you still want to get to the bottom of this, let me know what you need me 
>> to provide and I'll do my best. I suspect
>> you might want to because there will be an awful lot of Ubuntu-based systems 
>> out there with that scrubber.bin symlink in
>> place. On the other hand, it could be quite a while before Ubuntu is 
>> deploying 6.2 or later kernels.
> The symlinks are correct - whole groups of GPUs share the same FW, and
> we use symlinks in linux-firmware to represent this.
> 
> I don't really have any ideas how/why this patch causes issues with
> shutdown - it's a path that only gets executed during initialisation.
> Can you try and capture the kernel log during shutdown ("dmesg -w"
> over ssh? netconsole?), and see if there's any relevant messages
> providing a hint at what's going on?  Alternatively, you could try
> unloading the module (you will have to stop X/wayland/gdm/etc/etc
> first) and seeing if that hangs too.
> 
> Ben.

Sorry for the delay - I've been learning about netconsole and netcat. However, 
I had no success with ssh and netconsole
produced a log with nothing unusual in it.

Simply stopping Xorg and removing the nouveau module succeeds.

So, I rebuilt rc6+ after a pull from Linus' tree this morning and set the 
nouveau debug level to 7. I then booted to a
console before doing a reboot (with Ctl+Alt+Del). As expected the machine 
locked up just before it would ordinarily
restart. The last few lines on the console might be helpful:

...
nouveau 0000:01:00.0: fifo: preinit running...
nouveau 0000:01:00.0: fifo: preinit completed in 4us
nouveau 0000:01:00.0: gr: preinit running...
nouveau 0000:01:00.0: gr: preinit completed in 0us
nouveau 0000:01:00.0: nvdec0: preinit running...
nouveau 0000:01:00.0: nvdec0: preinit completed in 0us
nouveau 0000:01:00.0: nvdec0: preinit running...
nouveau 0000:01:00.0: nvdec0: preinit completed in 0us
nouveau 0000:01:00.0: sec2: preinit running...
nouveau 0000:01:00.0: sec2: preinit completed in 0us
nouveau 0000:01:00.0: fb: VPR locked, running scrubber binary

These messages appear after the "sd 4:0:0:0 [sda] Stopping disk" I reported in 
my initial email.

After the "running scrubber" line appears the machine is locked and I have to 
hold down the power button to recover. I
get the same outcome from running "halt -dip", "poweroff -di" and "shutdown -h 
-P now". I guess it's no surprise that
all three result in the same outcome because invocations of halt, poweroff and 
reboot (without the -f argument) from a
runlevel other than 0 result in shutdown being run. Switching to runlevel 0 
with "telinit 0" results in the same
messages from nouveau followed by the lockup.

Let me know if you need any additional diagnostics.

Chris

> 
>>
>> Thanks,
>>
>> Chris
>>
>> 


[Nouveau] [PATCH v4 3/4] drm/amdgpu: Move the amdgpu_gtt_mgr start and size from pages to bytes

2023-05-04 Thread Somalapuram Amaranath
Switch the GTT manager helpers amdgpu_res_first and amdgpu_res_next
from pages to bytes and clean up the PAGE_SHIFT operations.
Change the GTT manager init and allocation from pages to bytes.
v1 -> v2: reorder patch sequence
v3 -> v4: reorder patch sequence

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c| 13 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h |  8 
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 44367f03316f..a1fbfc5984d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -116,7 +116,6 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
  struct ttm_resource **res)
 {
struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   uint32_t num_pages = PFN_UP(tbo->base.size);
struct ttm_range_mgr_node *node;
int r;
 
@@ -134,8 +133,10 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
if (place->lpfn) {
spin_lock(&mgr->lock);
r = drm_mm_insert_node_in_range(&mgr->mm, &node->mm_nodes[0],
-   num_pages, tbo->page_alignment,
-   0, place->fpfn, place->lpfn,
+   tbo->base.size,
+   tbo->page_alignment << 
PAGE_SHIFT, 0,
+   place->fpfn << PAGE_SHIFT,
+   place->lpfn << PAGE_SHIFT,
DRM_MM_INSERT_BEST);
spin_unlock(&mgr->lock);
if (unlikely(r))
@@ -144,7 +145,7 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
node->base.start = node->mm_nodes[0].start;
} else {
node->mm_nodes[0].start = 0;
-   node->mm_nodes[0].size = PFN_UP(node->base.size);
+   node->mm_nodes[0].size = node->base.size;
node->base.start = AMDGPU_BO_INVALID_OFFSET;
}
 
@@ -285,8 +286,8 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, 
uint64_t gtt_size)
 
ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);
 
-   start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
-   size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
+   start = (AMDGPU_GTT_MAX_TRANSFER_SIZE * 
AMDGPU_GTT_NUM_TRANSFER_WINDOWS) << PAGE_SHIFT;
+   size = adev->gmc.gart_size - start;
drm_mm_init(&mgr->mm, start, size);
spin_lock_init(&mgr->lock);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 5c4f93ee0c57..5c78f0b09351 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -94,8 +94,8 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
while (start >= node->size << PAGE_SHIFT)
start -= node++->size << PAGE_SHIFT;
 
-   cur->start = (node->start << PAGE_SHIFT) + start;
-   cur->size = min((node->size << PAGE_SHIFT) - start, size);
+   cur->start = node->start + start;
+   cur->size = min(node->size - start, size);
cur->remaining = size;
cur->node = node;
break;
@@ -155,8 +155,8 @@ static inline void amdgpu_res_next(struct amdgpu_res_cursor 
*cur, uint64_t size)
node = cur->node;
 
cur->node = ++node;
-   cur->start = node->start << PAGE_SHIFT;
-   cur->size = min(node->size << PAGE_SHIFT, cur->remaining);
+   cur->start = node->start;
+   cur->size = min(node->size, cur->remaining);
break;
default:
return;
-- 
2.32.0



Re: [Nouveau] [PATCH drm-next v2 03/16] maple_tree: split up MA_STATE() macro

2023-05-04 Thread Liam R. Howlett
* Danilo Krummrich  [230220 09:38]:
> On 2/17/23 19:34, Liam R. Howlett wrote:
> > * Danilo Krummrich  [230217 08:44]:
> > > Split up the MA_STATE() macro such that components using the maple tree
> > > can easily inherit from struct ma_state and build custom tree walk
> > > macros to hide their internals from users.
> > > 
> > > Example:
> > > 
> > > struct sample_iter {
> > >   struct ma_state mas;
> > >   struct sample_mgr *mgr;
> > >   struct sample_entry *entry;
> > > };
> > > 
> > > \#define SAMPLE_ITER(name, __mgr) \
> > >   struct sample_iter name = { \
> > >   .mas = __MA_STATE(&(__mgr)->mt, 0, 0),
> > >   .mgr = __mgr,
> > >   .entry = NULL,
> > >   }
> > 
> > I see this patch is to allow for anonymous maple states, this looks
> > good.
> > 
> > I've a lengthy comment about the iterator that I'm adding here to head
> > off anyone that may copy your example below.
> > 
> > > 
> > > \#define sample_iter_for_each_range(it__, start__, end__) \
> > >   for ((it__).mas.index = start__, (it__).entry = mas_find(&(it__).mas, 
> > > end__ - 1); \
> > >(it__).entry; (it__).entry = mas_find(&(it__).mas, end__ - 1))
> > 
> > I see you've added something like the above in your patch set as well.
> > I'd like to point out that the index isn't the only state information
> > that needs to be altered here, and in fact, this could go very wrong.
> > 
> > The maple state has a node and an offset within that node.  If you set
> > the index to lower than the current position of your iterator and call
> > mas_find() then what happens is somewhat undefined.  I expect you will
> > get the wrong value (most likely either the current value or the very
> > next one that the iterator is already pointing to).  I believe you have
> > been using a fresh maple state for each iterator in your patches, but I
> > haven't had a deep look into your code yet.
> 
> Yes, I'm aware that I'd need to reset the whole iterator in order to re-use
> it.

Okay, good.  The way you have it written makes it unsafe to just call
without knowledge of the state and that will probably end poorly over
the long run.  If it's always starting from MAS_START then it's probably
safer to just initialize when you want to use it to the correct start
address.

> 
> Regarding the other considerations of the iterator design please see my
> answer to Matthew.
> 
> > 
> > We have methods of resetting the iterator and set the range (mas_set()
> > and mas_set_range()) which are safe for what you are doing, but they
> > will start the walk from the root node to the index again.
> > 
> > So, if you know what you are doing is safe, then the way you have
> > written it will work, but it's worth mentioning that this could occur.
> > 
> > It is also worth pointing out that it would be much safer to use a
> > function to do the above so you get type safety.. and I was asked to add
> > this to the VMA interface by Linus [1], which is on its way upstream [2].
> > 
> > 1. 
> > https://lore.kernel.org/linux-mm/CAHk-=wg9wqxbgkndkd2bqocnn73rdswuwsavbb7t-tekyke...@mail.gmail.com/
> > 2. 
> > https://lore.kernel.org/linux-mm/20230120162650.984577-1-liam.howl...@oracle.com/
> 
> You mean having wrappers like sample_find() instead of directly using
> mas_find()?

I'm not sure you need to go that low level, but I would ensure there is a
store/load function that checks at compile time that the correct type is
being put in/read out - especially since you seem to have two trees to
track two different sets of things.  That iterator is probably safe
since the type is defined within itself.

> 
> > 
> > > 
> > > Signed-off-by: Danilo Krummrich 
> > > ---
> > >   include/linux/maple_tree.h | 7 +--
> > >   1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> > > index e594db58a0f1..ca04c900e51a 100644
> > > --- a/include/linux/maple_tree.h
> > > +++ b/include/linux/maple_tree.h
> > > @@ -424,8 +424,8 @@ struct ma_wr_state {
> > >   #define MA_ERROR(err) \
> > >   ((struct maple_enode *)(((unsigned long)err << 2) | 
> > > 2UL))
> > > -#define MA_STATE(name, mt, first, end)   
> > > \
> > > - struct ma_state name = {\
> > > +#define __MA_STATE(mt, first, end)   
> > > \
> > > + {   \
> > >   .tree = mt, 
> > > \
> > >   .index = first, 
> > > \
> > >   .last = end,
> > > \
> > > @@ -435,6 +435,9 @@ struct ma_wr_state {
> > >   .alloc = NULL,  
> > > \
> > >   }
> > > +#define MA_STATE(name, mt, first, end)   
> > > \
> > > + struct 

[Nouveau] [PATCH v4 4/4] drm/i915: Clean up page shift operation

2023-05-04 Thread Somalapuram Amaranath
Remove the page shift operations, as ttm_resource has moved
from num_pages to a size_t size in bytes.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/i915/i915_scatterlist.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c 
b/drivers/gpu/drm/i915/i915_scatterlist.c
index 114e5e39aa72..bd7aaf7738f4 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.c
+++ b/drivers/gpu/drm/i915/i915_scatterlist.c
@@ -94,7 +94,7 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct 
drm_mm_node *node,
if (!rsgt)
return ERR_PTR(-ENOMEM);
 
-   i915_refct_sgt_init(rsgt, node->size << PAGE_SHIFT);
+   i915_refct_sgt_init(rsgt, node->size);
st = &rsgt->table;
if (sg_alloc_table(st, DIV_ROUND_UP_ULL(node->size, segment_pages),
   GFP_KERNEL)) {
@@ -105,8 +105,8 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct 
drm_mm_node *node,
sg = st->sgl;
st->nents = 0;
prev_end = (resource_size_t)-1;
-   block_size = node->size << PAGE_SHIFT;
-   offset = node->start << PAGE_SHIFT;
+   block_size = node->size;
+   offset = node->start;
 
while (block_size) {
u64 len;
-- 
2.32.0



Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton



On 20/02/2023 05:35, Ben Skeggs wrote:
> On Sun, 19 Feb 2023 at 04:55, Chris Clayton  wrote:
>>
>>
>>
>> On 18/02/2023 15:19, Chris Clayton wrote:
>>>
>>>
>>> On 18/02/2023 12:25, Karol Herbst wrote:
 On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton  
 wrote:
>
>
>
> On 15/02/2023 11:09, Karol Herbst wrote:
>> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
>> (Thorsten Leemhuis)  wrote:
>>>
>>> On 13.02.23 10:14, Chris Clayton wrote:
 On 13/02/2023 02:57, Dave Airlie wrote:
> On Sun, 12 Feb 2023 at 00:43, Chris Clayton 
>  wrote:
>>
>>
>>
>> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) 
>> wrote:
>>> On 10.02.23 20:01, Karol Herbst wrote:
 On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
 Leemhuis)  wrote:
>
> On 08.02.23 09:48, Chris Clayton wrote:
>>
>> I'm assuming  that we are not going to see a fix for this 
>> regression before 6.2 is released.
>
> Yeah, looks like it. That's unfortunate, but happens. But there 
> is still
> time to fix it and there is one thing I wonder:
>
> Did any of the nouveau developers look at the netconsole captures 
> Chris
> posted more than a week ago to check if they somehow help to 
> track down
> the root of this problem?

 I did now and I can't spot anything. I think at this point it would
 make sense to dump the active tasks/threads via sysrq keys to see 
 if
 any is in a weird state preventing the machine from shutting down.
>>>
>>> Many thx for looking into it!
>>
>> Yes, thanks Karol.
>>
>> Attached is the output from dmesg when this block of code:
>>
>> /bin/mount /dev/sda7 /mnt/sda7
>> /bin/mountpoint /proc || /bin/mount /proc
>> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
>> /bin/echo t > /proc/sysrq-trigger
>> /bin/sleep 1
>> /bin/sync
>> /bin/sleep 1
>> kill $(pidof dmesg)
>> /bin/umount /mnt/sda7
>>
>> is executed immediately before /sbin/reboot is called as the final 
>> step of rebooting my system.
>>
>> I hope this is what you were looking for, but if not, please let me 
>> know what you need

 Thanks Dave. [...]
>>> FWIW, in case anyone strands here in the archives: the msg was
>>> truncated. The full post can be found in a new thread:
>>>
>>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>>>
>>> Sadly it seems the info "With runpm=0, both reboot and poweroff work on
>>> my laptop." didn't bring us much further to a solution. :-/ I don't
>>> really like it, but for regression tracking I'm now putting this on the
>>> back-burner, as a fix is not in sight.
>>>
>>> #regzbot monitor:
>>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>>> #regzbot backburner: hard to debug and apparently rare
>>> #regzbot ignore-activity
>>>
>>
>> yeah.. this bug looks a little annoying. Sadly the only Turing based
>> laptop I got doesn't work on Nouveau because of firmware related
>> issues and we probably need to get updated ones from Nvidia here :(
>>
>> But it's a bit weird that the kernel doesn't shutdown, because I don't
>> see anything in the logs which would prevent that from happening.
>> Unless it's waiting on one of the tasks to complete, but none of them
>> looked in any way nouveau related.
>>
>> If somebody else has any fancy kernel debugging tips here to figure
>> out why it hangs, that would be very helpful...
>>
>
> I think I've figured this out. It's to do with how my system is 
> configured. I do have an initrd, but the only thing on
> it is the cpu microcode which, it is recommended, should be loaded early. 
> The absence of the NVidia firmware from an
> initrd doesn't matter because the drivers for the hardware that need to 
> load firmware are all built as modules, So, by
> the time the devices are configured via udev, the root partition is 
> mounted and the drivers can get at the firmware.
>
> I've found, by turning on nouveau debug and taking a video of the screen 
> as the system shuts down, that nouveau seems to
> be trying to run the scrubber very very late in the shutdown process. The 
> problem is that by this time, I think the root
> partition, and thus the scrubber binary, have become inaccessible.
>
> I seem to 

Re: [Nouveau] [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings

2023-05-04 Thread Liam R. Howlett
* Danilo Krummrich  [230227 08:17]:

...
> > > Would this variant be significantly more efficient?
> > 
> > Well, what you are doing is walking the tree to see if there's anything
> > there... then re-walking the tree to store it.  So, yes, it's much more
> > efficient..  However, writing is heavier.  How much of the time is spent
> > walking vs writing depends on the size of the tree, but it's rather easy
> > to do this in a single walk of the tree so why wouldn't you?
> 
> I will, I was just curious about how much of an impact it has.
> 
> > 
> > > 
> > > Also, would this also work while already walking the tree?
> > 
> > Yes, to an extent.  If you are at the correct location in the tree, you
> > can write to that location.  If you are not in the correct location and
> > try to write to the tree then things will go poorly..  In this scenario,
> > we are very much walking the tree and writing to it in two steps.
> > 
> > > 
> > > To remove an entry while walking the tree I have a separate function
> > > drm_gpuva_iter_remove(). Would I need something similar for inserting
> > > entries?
> > 
> > I saw that.  Your remove function uses the erase operation which is
> > implemented as a walk to that location and a store of a null over the
> > range that is returned.  You do not need a function to insert an entry
> > if the maple state is at the correct location, and that doesn't just
> > mean setting mas.index/mas.last to the correct value.  There is a node &
> > offset saved in the maple state that needs to be in the correct
> > location.  If you store to that node then the node may be replaced, so
> > other iterators that you have may become stale, but the one you used
> > to execute the store operation will now point to the new node with the new
> > entry.
> > 
> > > 
> > > I already provided this example in a separate mail thread, but it may 
> > > make
> > > sense to move this to the mailing list:
> > > 
> > > In __drm_gpuva_sm_map() we're iterating a given range of the tree, where 
> > > the
> > > given range is the size of the newly requested mapping. 
> > > __drm_gpuva_sm_map()
> > > invokes a callback for each sub-operation that needs to be taken in order 
> > > to
> > > fulfill this mapping request. In most cases such a callback just creates a
> > > drm_gpuva_op object and stores it in a list.
> > > 
> > > However, drivers can also implement the callback, such that they directly
> > > execute this operation within the callback.
> > > 
> > > Let's have a look at the following example:
> > > 
> > >   0 a 2
> > > old: |---|   (bo_offset=n)
> > > 
> > > 1 b 3
> > > req:   |---| (bo_offset=m)
> > > 
> > >   0  a' 1 b 3
> > > new: |-|---| (a.bo_offset=n,b.bo_offset=m)
> > > 
> > > This would result in the following operations.
> > > 
> > > __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
> > > suggesting to re-map "a" with the new size. The driver removes entry "a"
> > > from the tree and adds "a'"
> > 
> > What you have here won't work.  The driver will cause your iterators
> > maple state to point to memory that is freed.  You will either need to
> > pass through your iterator so that the modifications can occur with that
> > maple state so it remains valid, or you will need to invalidate the
> > iterator on every modification by the driver.
> > 
> > I'm sure the first idea you have will be to invalidate the iterator, but
> > that is probably not the way to proceed.  Even ignoring the unclear
> > locking of two maple states trying to modify the tree, this is rather
> > inefficient - each invalidation means a re-walk of the tree.  You may as
> > well not use an iterator in this case.
> > 
> > Depending on how/when the lookups occur, you could still iterate over
> > the tree and let the driver modify the ending of "a", but leave the tree
> > alone and just store b over whatever - but the failure scenarios may
> > cause you grief.
> > 
> > If you pass the iterator through, then you can just use it to do your
> > writes and keep iterating as if nothing changed.
> 
> Passing through the iterater clearly seems to be the way to go.
> 
> I assume that if the entry to insert isn't at the location of the iterator
> (as in the following example) we can just keep walking to this location by
> changing the index of the mas and calling mas_walk()?

no.  You have to mas_set() to the value and walk from the top of the
tree.  mas_walk() walks down, not from side to side - well, it does go
forward within a node (increasing offset), but if you hit the node limit
then you have gotten yourself in trouble.

> This would also imply
> that the "outer" tree walk continues after the entry we just inserted,
> right?

I don't understand the "outer" tree walk statement.

> 
>1 a 3
> old:   |---| (bo_offset=n)
> 
>  0 b 2
> req: |---|   (bo_offset=m)
> 
>  0 b 2  a' 3
> 

[Nouveau] [PATCH 1/5] drm/nouveau/nvfw/acr: make wpr_generic_header_dump() static

2023-05-04 Thread Jiapeng Chong
This symbol is not used outside of acr.c, so mark it static.

drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: warning: no previous prototype 
for ‘wpr_generic_header_dump’.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3023
Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c 
b/drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c
index 83a9c48bc58c..7ac90c495737 100644
--- a/drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c
+++ b/drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c
@@ -45,7 +45,7 @@ wpr_header_v1_dump(struct nvkm_subdev *subdev, const struct 
wpr_header_v1 *hdr)
nvkm_debug(subdev, "\tstatus: %d\n", hdr->status);
 }
 
-void
+static void
 wpr_generic_header_dump(struct nvkm_subdev *subdev, const struct 
wpr_generic_header *hdr)
 {
nvkm_debug(subdev, "wprGenericHeader\n");
-- 
2.20.1.7.g153144c



Re: [Nouveau] [PATCH v2 07/10] iommu/intel: Support the gfp argument to the map_pages op

2023-05-04 Thread Baolu Lu

On 2023/1/19 19:57, Baolu Lu wrote:

On 2023/1/19 2:00, Jason Gunthorpe wrote:

Flow it down to alloc_pgtable_page() via pfn_to_dma_pte() and
__domain_mapping().

Signed-off-by: Jason Gunthorpe


Irrelevant to this patch, GFP_ATOMIC could be changed to GFP_KERNEL in
some places. I will follow up further to clean it up.


It has been done in the next patch. Sorry for the noise.

Best regards,
baolu


[Nouveau] [PATCH] drm/nouveau/fifo: make gf100_fifo_nonstall_block static

2023-05-04 Thread Ben Dooks
Make gf100_fifo_nonstall_block static as it isn't exported, to
silence the following sparse warning:

drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: symbol 
'gf100_fifo_nonstall_block' was not declared. Should it be static?

Signed-off-by: Ben Dooks 
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c
index 5bb65258c36d..6c94451d0faa 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c
@@ -447,7 +447,7 @@ gf100_fifo_nonstall_allow(struct nvkm_event *event, int 
type, int index)
spin_unlock_irqrestore(&fifo->lock, flags);
 }
 
-void
+static void
 gf100_fifo_nonstall_block(struct nvkm_event *event, int type, int index)
 {
struct nvkm_fifo *fifo = container_of(event, typeof(*fifo), 
nonstall.event);
-- 
2.39.0



Re: [Nouveau] [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

2023-05-04 Thread Boris Brezillon
Hi Danilo,

On Fri, 10 Mar 2023 17:45:58 +0100
Danilo Krummrich  wrote:

> Hi Boris,
> 
> On 3/9/23 10:48, Boris Brezillon wrote:
> > On Thu, 9 Mar 2023 10:12:43 +0100
> > Boris Brezillon  wrote:
> >   
> >> Hi Danilo,
> >>
> >> On Fri, 17 Feb 2023 14:44:06 +0100
> >> Danilo Krummrich  wrote:
> >>  
> >>> Changes in V2:
> >>> ==
> >>>Nouveau:
> >>>  - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in 
> >>> fence
> >>>signalling critical sections. Updates to the VA space are split up 
> >>> in three
> >>>separate stages, where only the 2. stage executes in a fence 
> >>> signalling
> >>>critical section:
> >>>
> >>>  1. update the VA space, allocate new structures and page tables  
> >>
> >> Sorry for the silly question, but I didn't find where the page tables
> >> pre-allocation happens. Mind pointing it to me? It's also unclear when
> >> this step happens. Is this at bind-job submission time, when the job is
> >> not necessarily ready to run, potentially waiting for other deps to be
> >> signaled. Or is it done when all deps are met, as an extra step before
> >> jumping to step 2. If that's the former, then I don't see how the VA
> >> space update can happen, since the bind-job might depend on other
> >> bind-jobs modifying the same portion of the VA space (unbind ops might
> >> lead to intermediate page table levels disappearing while we were
> >> waiting for deps). If it's the latter, I wonder why this is not
> >> considered as an allocation in the fence signaling path (for the
> >> bind-job out-fence to be signaled, you need these allocations to
> >> succeed, unless failing to allocate page-tables is considered like a HW
> >> misbehavior and the fence is signaled with an error in that case).  
> > 
> > Ok, so I just noticed you only have one bind queue per drm_file
> > (cli->sched_entity), and jobs are executed in-order on a given queue,
> > so I guess that allows you to modify the VA space at submit time
> > without risking any modifications to the VA space coming from other
> > bind-queues targeting the same VM. And, if I'm correct, synchronous
> > bind/unbind ops take the same path, so no risk for those to modify the
> > VA space either (just wonder if it's a good thing to have sync
> > bind/unbind operations waiting on async ones, but that's a different
> > topic).  
> 
> Yes, that's all correct.
> 
> The page table allocation happens through nouveau_uvmm_vmm_get() which 
> either allocates the corresponding page tables or increases the 
> reference count, in case they already exist, accordingly.
> The call goes all the way through nvif into the nvkm layer (not the 
> easiest call chain to follow) and ends up in nvkm_vmm_ptes_get().
> 
> There are multiple reasons for updating the VA space at submit time in 
> Nouveau.
> 
> 1) Subsequent EXEC ioctl() calls would need to wait for the bind jobs 
> they depend on within the ioctl() rather than in the scheduler queue, 
> because at the point of time where the ioctl() happens the VA space 
> wouldn't be up-to-date.

Hm, actually that's what explicit sync is all about, isn't it? If you
have async binding ops, you should retrieve the bind-op out-fences and
pass them back as in-fences to the EXEC call, so you're sure all the
memory mappings you depend on are active when you execute those GPU
jobs. And if you're using sync binds, the changes are guaranteed to be
applied before the ioctl() returns. Am I missing something?

> 
> 2) Let's assume a new mapping is requested and within its range other 
> mappings already exist. Let's also assume that those existing mappings 
> aren't contiguous, such that there are gaps between them. In such a case 
> I need to allocate page tables only for the gaps between the existing 
> mappings, or alternatively, allocate them for the whole range of the new 
> mapping, but free / decrease the reference count of the page tables for 
> the ranges of the previously existing mappings afterwards.
> In the first case I need to know the gaps to allocate page tables for 
> when submitting the job, which means the VA space must be up-to-date. In 
> the latter one I must save the ranges of the previously existing 
> mappings somewhere in order to clean them up, hence I need to allocate 
> memory to store this information. Since I can't allocate this memory in 
> the jobs run() callback (fence signalling critical section) I need to do 
> it when submitting the job already and hence the VA space must be 
> up-to-date again.

Yep that makes perfect sense, and that explains how the whole thing can
work. When I initially read the patch series, I had more complex use
cases in mind, with multiple bind queues targeting the same VM, and
synchronous bind taking a fast path (so they don't have to wait on
async binds which can in turn wait on external deps). This model makes
it hard to predict what the VA space will look like when an async bind
operation gets to be 

Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton
Hi again.

On 30/01/2023 20:19, Chris Clayton wrote:
> Thanks, Ben.



>> Hey,
>>
>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>> *any* of my boards.  Could you try the attached patch please?
> 
> Unfortunately, the patch made no difference.
> 
> I've been looking at how the graphics on my laptop is set up, and have a bit 
> of a worry about whether the firmware might
> be playing a part in this problem. In order to offload video decoding to the 
> NVidia TU117 GPU, it seems the scrubber
> firmware must be available, but as far as I know, that has not been released 
> by NVidia. To get it to work, I followed
> what ubuntu have done and the scrubber in /lib/firmware/nvidia/tu117/nvdec/ 
> is a symlink to
> ../../tu116/nvdev/scrubber.bin. That, of course, means that some of the 
> firmware being loaded is for a different card.
> I note that processing related to firmware is being changed in the 
> patch. Might my set up be at the root of my
> problem?
> 
> I'll have a fiddle an see what I can work out.
> 
> Chris
> 
>>
>> Thanks,
>> Ben.
>>
>>>

Well, my fiddling has got my system rebooting and shutting down successfully 
again. I found that if I delete the symlink
to the scrubber firmware, reboot and shutdown work again. There are, however, a 
number of other files in the tu117
firmware directory tree that are symlinks to actual files in its tu116 
counterpart. So I deleted all of those too.
Unfortunately, the absence of one or more of those symlinks causes Xorg to fail 
to start. I've reinstated all the links
except scrubber and I now have a system that works as it did until I tried to 
run a kernel that includes the bad commit
I identified in my bisection. That includes offloading video decoding to the 
NVidia card, so whatever I read that said
the scrubber firmware was needed seems to have been wrong. I get a new message 
that (nouveau :01:00.0: fb: VPR
locked, but no scrubber binary!), but, hey, we can't have everything.

If you still want to get to the bottom of this, let me know what you need me to 
provide and I'll do my best. I suspect
you might want to because there will be an awful lot of Ubuntu-based systems out 
there with that scrubber.bin symlink in
place. On the other hand, it could be quite a while before Ubuntu is 
deploying 6.2 or later kernels.

Thanks,

Chris




Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton



On 18/02/2023 15:19, Chris Clayton wrote:
> 
> 
> On 18/02/2023 12:25, Karol Herbst wrote:
>> On Sat, Feb 18, 2023 at 1:22 PM Chris Clayton  
>> wrote:
>>>
>>>
>>>
>>> On 15/02/2023 11:09, Karol Herbst wrote:
 On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
 (Thorsten Leemhuis)  wrote:
>
> On 13.02.23 10:14, Chris Clayton wrote:
>> On 13/02/2023 02:57, Dave Airlie wrote:
>>> On Sun, 12 Feb 2023 at 00:43, Chris Clayton  
>>> wrote:



 On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) 
 wrote:
> On 10.02.23 20:01, Karol Herbst wrote:
>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
>> Leemhuis)  wrote:
>>>
>>> On 08.02.23 09:48, Chris Clayton wrote:

 I'm assuming  that we are not going to see a fix for this 
 regression before 6.2 is released.
>>>
>>> Yeah, looks like it. That's unfortunate, but happens. But there is 
>>> still
>>> time to fix it and there is one thing I wonder:
>>>
>>> Did any of the nouveau developers look at the netconsole captures 
>>> Chris
>>> posted more than a week ago to check if they somehow help to track 
>>> down
>>> the root of this problem?
>>
>> I did now and I can't spot anything. I think at this point it would
>> make sense to dump the active tasks/threads via sysrq keys to see if
>> any is in a weird state preventing the machine from shutting down.
>
> Many thx for looking into it!

 Yes, thanks Karol.

 Attached is the output from dmesg when this block of code:

 /bin/mount /dev/sda7 /mnt/sda7
 /bin/mountpoint /proc || /bin/mount /proc
 /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
 /bin/echo t > /proc/sysrq-trigger
 /bin/sleep 1
 /bin/sync
 /bin/sleep 1
 kill $(pidof dmesg)
 /bin/umount /mnt/sda7

 is executed immediately before /sbin/reboot is called as the final 
 step of rebooting my system.

 I hope this is what you were looking for, but if not, please let me 
 know what you need
>>
>> Thanks Dave. [...]
> FWIW, in case anyone strands here in the archives: the msg was
> truncated. The full post can be found in a new thread:
>
> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>
> Sadly it seems the info "With runpm=0, both reboot and poweroff work on
> my laptop." didn't bring us much further to a solution. :-/ I don't
> really like it, but for regression tracking I'm now putting this on the
> back-burner, as a fix is not in sight.
>
> #regzbot monitor:
> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
> #regzbot backburner: hard to debug and apparently rare
> #regzbot ignore-activity
>

 yeah.. this bug looks a little annoying. Sadly the only Turing based
 laptop I got doesn't work on Nouveau because of firmware related
 issues and we probably need to get updated ones from Nvidia here :(

 But it's a bit weird that the kernel doesn't shutdown, because I don't
 see anything in the logs which would prevent that from happening.
 Unless it's waiting on one of the tasks to complete, but none of them
 looked in any way nouveau related.

 If somebody else has any fancy kernel debugging tips here to figure
 out why it hangs, that would be very helpful...

>>>
>>> I think I've figured this out. It's to do with how my system is configured. 
>>> I do have an initrd, but the only thing on
>>> it is the cpu microcode which, it is recommended, should be loaded early. 
>>> The absence of the NVidia firmware from an
>>> initrd doesn't matter because the drivers for the hardware that need to 
>>> load firmware are all built as modules, So, by
>>> the time the devices are configured via udev, the root partition is mounted 
>>> and the drivers can get at the firmware.
>>>
>>> I've found, by turning on nouveau debug and taking a video of the screen as 
>>> the system shuts down, that nouveau seems to
>>> be trying to run the scrubber very very late in the shutdown process. The 
>>> problem is that by this time, I think the root
>>> partition, and thus the scrubber binary, have become inaccessible.
>>>
>>> I seem to have two choices - either make the firmware accessible on an 
>>> initrd or unload the module in a shutdown script
>>> before the scrubber binary becomes inaccessible. The latter of these is the 
>>> workaround I have implemented whilst the
>>> problem I reported has been under investigation. For simplicity, I think 
>>> I'll promote my 

[Nouveau] [PATCH 2/6] drm/amdgpu: Remove TTM resource->start visible VRAM condition

2023-05-04 Thread Somalapuram Amaranath
Use amdgpu_bo_in_cpu_visible_vram() instead.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 981010de0a28..d835ee2131d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -600,7 +600,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 
if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
bo->tbo.resource->mem_type == TTM_PL_VRAM &&
-   bo->tbo.resource->start < adev->gmc.visible_vram_size >> PAGE_SHIFT)
+   amdgpu_bo_in_cpu_visible_vram(bo))
amdgpu_cs_report_moved_bytes(adev, ctx.bytes_moved,
 ctx.bytes_moved);
else
@@ -1346,7 +1346,6 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
struct ttm_operation_ctx ctx = { false, false };
struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
-   unsigned long offset;
int r;
 
/* Remember that this BO was accessed by the CPU */
@@ -1355,8 +1354,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
if (bo->resource->mem_type != TTM_PL_VRAM)
return 0;
 
-   offset = bo->resource->start << PAGE_SHIFT;
-   if ((offset + bo->base.size) <= adev->gmc.visible_vram_size)
+   if (amdgpu_bo_in_cpu_visible_vram(abo))
return 0;
 
/* Can't move a pinned BO to visible VRAM */
@@ -1378,10 +1376,9 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
else if (unlikely(r))
return VM_FAULT_SIGBUS;
 
-   offset = bo->resource->start << PAGE_SHIFT;
/* this should never happen */
if (bo->resource->mem_type == TTM_PL_VRAM &&
-   (offset + bo->base.size) > adev->gmc.visible_vram_size)
+   !amdgpu_bo_in_cpu_visible_vram(abo))
return VM_FAULT_SIGBUS;
 
ttm_bo_move_to_lru_tail_unlocked(bo);
-- 
2.32.0



[Nouveau] [PATCH 4/4] drm/amdkfd: Use cursor start instead of ttm resource start

2023-05-04 Thread Somalapuram Amaranath
Clean up the PAGE_SHIFT operation and replace
ttm_resource resource->start with the cursor start
obtained via the amdgpu_res_first() API.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c06ada0844ba..f87ce4f1cb93 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -200,8 +200,11 @@ static int add_queue_mes(struct device_queue_manager *dqm, 
struct queue *q,
queue_input.wptr_addr = (uint64_t)q->properties.write_ptr;
 
if (q->wptr_bo) {
+   struct amdgpu_res_cursor cursor;
wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE 
- 1);
-   queue_input.wptr_mc_addr = 
((uint64_t)q->wptr_bo->tbo.resource->start << PAGE_SHIFT) + wptr_addr_off;
+   amdgpu_res_first(q->wptr_bo->tbo.resource, 0,
+q->wptr_bo->tbo.resource->size, &cursor);
+   queue_input.wptr_mc_addr = cursor.start + wptr_addr_off;
}
 
queue_input.is_kfd_process = 1;
-- 
2.32.0



[Nouveau] [PATCH 3/6] drm/ttm: Change the meaning of resource->start from pfn to bytes

2023-05-04 Thread Somalapuram Amaranath
Change resource->start from pfn to bytes to
allow allocating objects smaller than a page.
Change all DRM drivers using ttm_resource start and size pfn to bytes.
Change amdgpu_res_first() cur->start, cur->size from pfn to bytes.
Replace the ttm_resource resource->start field with cursor.start.
Change amdgpu_gtt_mgr_new() allocation from pfn to bytes.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 13 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h  |  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 +++---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   |  6 +-
 drivers/gpu/drm/drm_gem_vram_helper.c   |  2 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c| 13 ++---
 drivers/gpu/drm/nouveau/nouveau_bo0039.c|  4 ++--
 drivers/gpu/drm/nouveau/nouveau_mem.c   | 10 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c   |  2 +-
 drivers/gpu/drm/nouveau/nv17_fence.c|  2 +-
 drivers/gpu/drm/nouveau/nv50_fence.c|  2 +-
 drivers/gpu/drm/qxl/qxl_drv.h   |  2 +-
 drivers/gpu/drm/qxl/qxl_object.c|  2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c   |  5 ++---
 drivers/gpu/drm/radeon/radeon_object.c  |  6 +++---
 drivers/gpu/drm/radeon/radeon_object.h  |  2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c | 13 ++---
 drivers/gpu/drm/radeon/radeon_vm.c  |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c  |  4 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  3 +--
 23 files changed, 63 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 44367f03316f..a1fbfc5984d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -116,7 +116,6 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
  struct ttm_resource **res)
 {
struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   uint32_t num_pages = PFN_UP(tbo->base.size);
struct ttm_range_mgr_node *node;
int r;
 
@@ -134,8 +133,10 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
if (place->lpfn) {
spin_lock(&mgr->lock);
r = drm_mm_insert_node_in_range(&mgr->mm, &node->mm_nodes[0],
-   num_pages, tbo->page_alignment,
-   0, place->fpfn, place->lpfn,
+   tbo->base.size,
+   tbo->page_alignment << 
PAGE_SHIFT, 0,
+   place->fpfn << PAGE_SHIFT,
+   place->lpfn << PAGE_SHIFT,
DRM_MM_INSERT_BEST);
spin_unlock(&mgr->lock);
if (unlikely(r))
@@ -144,7 +145,7 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
node->base.start = node->mm_nodes[0].start;
} else {
node->mm_nodes[0].start = 0;
-   node->mm_nodes[0].size = PFN_UP(node->base.size);
+   node->mm_nodes[0].size = node->base.size;
node->base.start = AMDGPU_BO_INVALID_OFFSET;
}
 
@@ -285,8 +286,8 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, 
uint64_t gtt_size)
 
ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);
 
-   start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
-   size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
+   start = (AMDGPU_GTT_MAX_TRANSFER_SIZE * 
AMDGPU_GTT_NUM_TRANSFER_WINDOWS) << PAGE_SHIFT;
+   size = adev->gmc.gart_size - start;
drm_mm_init(&mgr->mm, start, size);
spin_lock_init(&mgr->lock);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d835ee2131d2..f5d5eee09cea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1488,9 +1488,11 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
 u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+   struct amdgpu_res_cursor cursor;
uint64_t offset;
 
-   offset = (bo->tbo.resource->start << PAGE_SHIFT) +
amdgpu_res_first(bo->tbo.resource, 0, bo->tbo.resource->size, &cursor);
+   offset = cursor.start +
 amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
 
return amdgpu_gmc_sign_extend(offset);

[Nouveau] 2023 X.Org Board of Directors Elections timeline extended, request for nominations

2023-05-04 Thread Ricardo Garcia
We are seeking nominations for candidates for election to the X.org
Foundation Board of Directors. However, as we presently do not have
enough nominations to start the election - the decision has been made to
extend the timeline by 2 weeks. Note this is a fairly regular part of
the elections process.

The new deadline for nominations to the X.org Board of Directors is
23:59 UTC on April 2nd, 2023.

The new deadline for membership application or renewals is April 9th,
2023. Membership is required to vote on the elections.

The Board consists of directors elected from the membership. Each year,
an election is held to bring the total number of directors to eight. The
four members receiving the highest vote totals will serve as directors
for two year terms.

The directors who received two year terms starting in 2022 were Emma
Anholt, Mark Filion, Alyssa Rosenzweig and Ricardo Garcia. They will
continue to serve until their term ends in 2024. Current directors whose
term expires in 2023 are Samuel Iglesias Gonsálvez, Manasi D Navare,
Lyude Paul and Daniel Vetter.

A director is expected to participate in the fortnightly IRC meeting to
discuss current business and to attend the annual meeting of the X.Org
Foundation, which will be held at a location determined in advance by
the Board of Directors.

A member may nominate themselves or any other member they feel is
qualified. Nominations should be sent to the Election Committee at
electi...@x.org.

Nominees shall be required to be current members of the X.Org
Foundation, and submit a personal statement of up to 200 words that will
be provided to prospective voters. The collected statements, along with
the statement of contribution to the X.Org Foundation in the member's
account page on http://members.x.org, will be made available to all
voters to help them make their voting decisions.

Nominations, membership applications or renewals and completed personal
statements must be received no later than 23:59 UTC on April 2nd, 2023.

The slate of candidates will be published April 10th 2023 and candidate
Q&A will begin then. The deadline for Xorg membership applications and
renewals is April 9th, 2023.

Cheers,
Ricardo Garcia, on behalf of the X.Org BoD



Re: [Nouveau] nvkm_devinit_func.disable() to be made void

2023-05-04 Thread Deepak R Varma
On Sat, Jan 14, 2023 at 08:10:43PM +0530, Deepak R Varma wrote:
> Hello,
> It appears that the callback function disable() of struct nvkm_devinit_func
> does not need to return u64 and can be transformed to be void. This will
> impact a few drivers that currently implement this callback, since those
> always return 0ULL. So,
> 
> Change from
> 8 struct nvkm_devinit_func {
>   ... ...
>   15  u64  (*disable)(struct nvkm_devinit *);
> 1 };
> 
> Change to
> 8 struct nvkm_devinit_func {
>   ... ...
>   15  void  (*disable)(struct nvkm_devinit *);
> 1 };
> 
> 
> I am unsure if this change will have any UAPI impact. Hence wanted to confirm
> with you if you think this transformation is useful. If yes, I will be happy 
> to
> submit a patch for your consideration.

Hello,
May I request a response on my query? Shall I proceed with submitting a patch
proposal for consideration?

Thank you,
./drv

> 
> Please let me know.
> 
> Thank you,
> ./drv
> 
> 




Re: [Nouveau] [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

2023-05-04 Thread Boris Brezillon
Hi Danilo,

On Fri, 17 Feb 2023 14:44:06 +0100
Danilo Krummrich  wrote:

> Changes in V2:
> ==
>   Nouveau:
> - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
>   signalling critical sections. Updates to the VA space are split up in 
> three
>   separate stages, where only the 2. stage executes in a fence signalling
>   critical section:
> 
> 1. update the VA space, allocate new structures and page tables

Sorry for the silly question, but I didn't find where the page tables
pre-allocation happens. Mind pointing it to me? It's also unclear when
this step happens. Is this at bind-job submission time, when the job is
not necessarily ready to run, potentially waiting for other deps to be
signaled. Or is it done when all deps are met, as an extra step before
jumping to step 2. If that's the former, then I don't see how the VA
space update can happen, since the bind-job might depend on other
bind-jobs modifying the same portion of the VA space (unbind ops might
lead to intermediate page table levels disappearing while we were
waiting for deps). If it's the latter, I wonder why this is not
considered as an allocation in the fence signaling path (for the
bind-job out-fence to be signaled, you need these allocations to
succeed, unless failing to allocate page-tables is considered like a HW
misbehavior and the fence is signaled with an error in that case).

Note that I'm not familiar at all with Nouveau or TTM, and it might
be something that's solved by another component, or I'm just
misunderstanding how the whole thing is supposed to work. This being
said, I'd really like to implement a VM_BIND-like uAPI in pancsf using
the gpuva_manager infra you're proposing here, so please bear with me
:-).

> 2. (un-)map the requested memory bindings
> 3. free structures and page tables
> 
> - Separated generic job scheduler code from specific job implementations.
> - Separated the EXEC and VM_BIND implementation of the UAPI.
> - Reworked the locking parts of the nvkm/vmm RAW interface, such that
>   (un-)map operations can be executed in fence signalling critical 
> sections.
> 

Regards,

Boris



[Nouveau] [PATCH] drm/nouveau/mc/ga100: make ga100_mc_device static

2023-05-04 Thread Ben Dooks
Make ga100_mc_device static as it isn't exported, to
fix the following sparse warning:

drivers/gpu/drm/nouveau/nvkm/subdev/mc/ga100.c:51:1: warning: symbol 'ga100_mc_device' was not declared. Should it be static?

Signed-off-by: Ben Dooks 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/mc/ga100.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mc/ga100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mc/ga100.c
index 1e2eabec1a76..5d28d30d09d5 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mc/ga100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mc/ga100.c
@@ -47,7 +47,7 @@ ga100_mc_device_enabled(struct nvkm_mc *mc, u32 mask)
return (nvkm_rd32(mc->subdev.device, 0x000600) & mask) == mask;
 }
 
-const struct nvkm_mc_device_func
+static const struct nvkm_mc_device_func
 ga100_mc_device = {
.enabled = ga100_mc_device_enabled,
.enable = ga100_mc_device_enable,
-- 
2.39.0



Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton
[Resend because the mail client on my phone decided to turn HTML on behind my
back, so my reply got bounced.]

Thanks Karol.

I sent the original report to Ben and LKML. Thorsten then added you, Lyude Paul
and the dri-devel and nouveau mailing lists. So you should have received this
report on or about January 19.

Chris

On 27/01/2023 11:35, Karol Herbst wrote:
> Where was the original email sent to anyway, because I don't have it at all.
> 
> Anyhow, I suspect we want to fetch logs to see what's happening, but
> due to the nature of this bug it might get difficult.
> 
> I'm checking out the laptops I have here if I can reproduce this
> issue, but I think all mine with Turing GPUs are fine.
> 
> Maybe Ben has any idea what might be wrong with
> 0e44c21708761977dcbea9b846b51a6fb684907a or if that's an issue which
> is already fixed by not upstreamed patches as I think I remember Ben
> to talk about something like that recently.
> 
> Karol
> 
> On Fri, Jan 27, 2023 at 12:20 PM Linux kernel regression tracking
> (Thorsten Leemhuis)  wrote:
>>
>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>> to make this easily accessible to everyone.
>>
>> @nouveau-maintainers, did anyone take a look at this? The report is
>> already 8 days old and I don't see a single reply. Sure, we'll likely
>> get a -rc8, but still it would be good to not fix this on the finish line.
>>
>> Chris, btw, did you try if you can revert the commit on top of latest
>> mainline? And if so, does it fix the problem?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> #regzbot poke
>>
>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
>> wrote:
>>> [adding various lists and the two other nouveau maintainers to the list
>>> of recipients]
>>
>>> On 18.01.23 21:59, Chris Clayton wrote:
 Hi.

 I built and installed the latest development kernel earlier this week. I've
 found that when I try to shut the laptop down (or reboot it), it hangs right
 at the end of closing the current session. The last line I see on the screen
 when rebooting is:

  sd 4:0:0:0: [sda] Synchronising SCSI cache

 when closing down I see one additional line:

 sd 4:0:0:0: [sda] Stopping disk

 In both cases the machine then hangs and I have to hold down the power
 button for a few seconds to switch it off.

 Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
 landed on:

  # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a] 
 drm/nouveau/flcn: new code to load+boot simple HS FWs
 (VPR scrubber)

 I built and installed a kernel with f15cde64b66161bfa74fb58f4e5697d8265b802e
 (the parent of the bad commit) checked out and that shuts down and reboots
 fine. I then did the same with the bad commit checked out and that does
 indeed hang, so I'm confident the bisect outcome is OK.

 Kernels 6.1.6 and 5.15.88 are also OK.

 My system has dual GPUs - one Intel and one NVidia. Related extracts from
 'lspci -v' are:

 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD 
 Graphics] (rev 05) (prog-if 00 [VGA controller])
 Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]

 Flags: bus master, fast devsel, latency 0, IRQ 142

 Memory at c200 (64-bit, non-prefetchable) [size=16M]

 Memory at a000 (64-bit, prefetchable) [size=256M]

 I/O ports at 5000 [size=64]

 Expansion ROM at 000c [virtual] [disabled] [size=128K]

 Capabilities: [40] Vendor Specific Information: Len=0c 

 Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00

 Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-

 Capabilities: [d0] Power Management version 2

 Kernel driver in use: i915

 Kernel modules: i915


 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 
 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA
 controller])
 Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
 Flags: bus master, fast devsel, latency 0, IRQ 141
 Memory at c400 (32-bit, non-prefetchable) [size=16M]
 Memory at b000 (64-bit, prefetchable) [size=256M]
 Memory at c000 (64-bit, prefetchable) [size=32M]
 I/O ports at 4000 [size=128]
 Expansion ROM at c300 [disabled] [size=512K]
 Capabilities: [60] Power Management version 3
 Capabilities: [68] MSI: 

Re: [Nouveau] [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

2023-05-04 Thread Boris Brezillon
On Thu, 9 Mar 2023 10:12:43 +0100
Boris Brezillon  wrote:

> Hi Danilo,
> 
> On Fri, 17 Feb 2023 14:44:06 +0100
> Danilo Krummrich  wrote:
> 
> > Changes in V2:
> > ==
> >   Nouveau:
> > - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
> >   signalling critical sections. Updates to the VA space are split up in 
> > three
> >   separate stages, where only the 2. stage executes in a fence 
> > signalling
> >   critical section:
> > 
> > 1. update the VA space, allocate new structures and page tables  
> 
> Sorry for the silly question, but I didn't find where the page tables
> pre-allocation happens. Mind pointing it to me? It's also unclear when
> this step happens. Is this at bind-job submission time, when the job is
> not necessarily ready to run, potentially waiting for other deps to be
> signaled. Or is it done when all deps are met, as an extra step before
> jumping to step 2. If that's the former, then I don't see how the VA
> space update can happen, since the bind-job might depend on other
> bind-jobs modifying the same portion of the VA space (unbind ops might
> lead to intermediate page table levels disappearing while we were
> waiting for deps). If it's the latter, I wonder why this is not
> considered as an allocation in the fence signaling path (for the
> bind-job out-fence to be signaled, you need these allocations to
> succeed, unless failing to allocate page-tables is considered like a HW
> misbehavior and the fence is signaled with an error in that case).

Ok, so I just noticed you only have one bind queue per drm_file
(cli->sched_entity), and jobs are executed in-order on a given queue,
so I guess that allows you to modify the VA space at submit time
without risking any modifications to the VA space coming from other
bind-queues targeting the same VM. And, if I'm correct, synchronous
bind/unbind ops take the same path, so no risk for those to modify the
VA space either (just wonder if it's a good thing to have to sync
bind/unbind operations waiting on async ones, but that's a different
topic).

> 
> Note that I'm not familiar at all with Nouveau or TTM, and it might
> be something that's solved by another component, or I'm just
> misunderstanding how the whole thing is supposed to work. This being
> said, I'd really like to implement a VM_BIND-like uAPI in pancsf using
> the gpuva_manager infra you're proposing here, so please bear with me
> :-).
> 
> > 2. (un-)map the requested memory bindings
> > 3. free structures and page tables
> > 
> > - Separated generic job scheduler code from specific job 
> > implementations.
> > - Separated the EXEC and VM_BIND implementation of the UAPI.
> > - Reworked the locking parts of the nvkm/vmm RAW interface, such that
> >   (un-)map operations can be executed in fence signalling critical 
> > sections.
> >   
> 
> Regards,
> 
> Boris
> 



[Nouveau] [PATCH 1/4] drm/amdgpu: Move the amdgpu_gtt_mgr start and size from pages to bytes

2023-05-04 Thread Somalapuram Amaranath
Convert amdgpu_res_first and amdgpu_res_next from pages to bytes to
support the GTT manager, and clean up the PAGE_SHIFT operations.

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 5c4f93ee0c57..5c78f0b09351 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -94,8 +94,8 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
while (start >= node->size << PAGE_SHIFT)
start -= node++->size << PAGE_SHIFT;
 
-   cur->start = (node->start << PAGE_SHIFT) + start;
-   cur->size = min((node->size << PAGE_SHIFT) - start, size);
+   cur->start = node->start + start;
+   cur->size = min(node->size - start, size);
cur->remaining = size;
cur->node = node;
break;
@@ -155,8 +155,8 @@ static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size)
node = cur->node;
 
cur->node = ++node;
-   cur->start = node->start << PAGE_SHIFT;
-   cur->size = min(node->size << PAGE_SHIFT, cur->remaining);
+   cur->start = node->start;
+   cur->size = min(node->size, cur->remaining);
break;
default:
return;
-- 
2.32.0



[Nouveau] [PATCH v2 1/2] drm/nouveau/device: avoid usage of list iterator after loop

2023-05-04 Thread Jakob Koschel
If potentially no valid element is found, 'pstate' would contain an
invalid pointer past the iterator loop. To ensure 'pstate' is always
valid, we only set it if the correct element was found. That allows
adding a WARN_ON() in case the code works incorrectly, exposing
currently undetectable potential bugs.

Additionally, Linus proposed to avoid any use of the list iterator
variable after the loop, in the attempt to move the list iterator
variable declaration into the macro to avoid any potential misuse after
the loop [1].

Link: 
https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
 [1]
Signed-off-by: Jakob Koschel 
---
 drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c
index ce774579c89d..8ae14ab8f88e 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c
@@ -72,7 +72,7 @@ nvkm_control_mthd_pstate_attr(struct nvkm_control *ctrl, void *data, u32 size)
} *args = data;
struct nvkm_clk *clk = ctrl->device->clk;
const struct nvkm_domain *domain;
-   struct nvkm_pstate *pstate;
+   struct nvkm_pstate *pstate = NULL, *iter;
struct nvkm_cstate *cstate;
int i = 0, j = -1;
u32 lo, hi;
@@ -103,11 +103,16 @@ nvkm_control_mthd_pstate_attr(struct nvkm_control *ctrl, void *data, u32 size)
return -EINVAL;
 
if (args->v0.state != NVIF_CONTROL_PSTATE_ATTR_V0_STATE_CURRENT) {
-   list_for_each_entry(pstate, &clk->states, head) {
-   if (i++ == args->v0.state)
+   list_for_each_entry(iter, &clk->states, head) {
+   if (i++ == args->v0.state) {
+   pstate = iter;
break;
+   }
}
 
+   if (WARN_ON_ONCE(!pstate))
+   return -EINVAL;
+
lo = pstate->base.domain[domain->name];
hi = lo;
list_for_each_entry(cstate, &pstate->list, head) {

-- 
2.34.1



Re: [Nouveau] [PATCH v2 0/8] Fix several device private page reference counting issues

2023-05-04 Thread Vlastimil Babka (SUSE)
On 9/28/22 14:01, Alistair Popple wrote:
> This series aims to fix a number of page reference counting issues in
> drivers dealing with device private ZONE_DEVICE pages. These result in
> use-after-free type bugs, either from accessing a struct page which no
> longer exists because it has been removed or accessing fields within the
> struct page which are no longer valid because the page has been freed.
> 
> During normal usage it is unlikely these will cause any problems. However
> without these fixes it is possible to crash the kernel from userspace.
> These crashes can be triggered either by unloading the kernel module or
> unbinding the device from the driver prior to a userspace task exiting. In
> modules such as Nouveau it is also possible to trigger some of these issues
> by explicitly closing the device file-descriptor prior to the task exiting
> and then accessing device private memory.

Hi, as this series was noticed to create a CVE [1], do you think a stable
backport is warranted? I think the "It is possible to launch the attack
remotely." in [1] is incorrect though, right?

It looks to me that patch 1 would be needed since the CONFIG_DEVICE_PRIVATE
introduction, while the following few only to kernels with 27674ef6c73f
(probably not so critical as that includes no LTS)?

Thanks,
Vlastimil

[1] https://nvd.nist.gov/vuln/detail/CVE-2022-3523

> This involves some minor changes to both PowerPC and AMD GPU code.
> Unfortunately I lack hardware to test either of those so any help there
> would be appreciated. The changes mimic what is done in for both Nouveau
> and hmm-tests though so I doubt they will cause problems.
> 
> To: Andrew Morton 
> To: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Cc: amd-...@lists.freedesktop.org
> Cc: nouveau@lists.freedesktop.org
> Cc: dri-de...@lists.freedesktop.org
> 
> Alistair Popple (8):
>   mm/memory.c: Fix race when faulting a device private page
>   mm: Free device private pages have zero refcount
>   mm/memremap.c: Take a pgmap reference on page allocation
>   mm/migrate_device.c: Refactor migrate_vma and migrate_deivce_coherent_page()
>   mm/migrate_device.c: Add migrate_device_range()
>   nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()
>   nouveau/dmem: Evict device private memory during release
>   hmm-tests: Add test for migrate_device_range()
> 
>  arch/powerpc/kvm/book3s_hv_uvmem.c   |  17 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  19 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  11 +-
>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 108 +++
>  include/linux/memremap.h |   1 +-
>  include/linux/migrate.h  |  15 ++-
>  lib/test_hmm.c   | 129 ++---
>  lib/test_hmm_uapi.h  |   1 +-
>  mm/memory.c  |  16 +-
>  mm/memremap.c|  30 ++-
>  mm/migrate.c |  34 +--
>  mm/migrate_device.c  | 239 +---
>  mm/page_alloc.c  |   8 +-
>  tools/testing/selftests/vm/hmm-tests.c   |  49 +-
>  15 files changed, 516 insertions(+), 163 deletions(-)
> 
> base-commit: 088b8aa537c2c767765f1c19b555f21ffe555786



Re: [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-05-04 Thread Chris Clayton



On 15/02/2023 11:09, Karol Herbst wrote:
> On Wed, Feb 15, 2023 at 11:36 AM Linux regression tracking #update
> (Thorsten Leemhuis)  wrote:
>>
>> On 13.02.23 10:14, Chris Clayton wrote:
>>> On 13/02/2023 02:57, Dave Airlie wrote:
 On Sun, 12 Feb 2023 at 00:43, Chris Clayton  
 wrote:
>
>
>
> On 10/02/2023 19:33, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 10.02.23 20:01, Karol Herbst wrote:
>>> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
>>> Leemhuis)  wrote:

 On 08.02.23 09:48, Chris Clayton wrote:
>
> I'm assuming  that we are not going to see a fix for this regression 
> before 6.2 is released.

 Yeah, looks like it. That's unfortunate, but happens. But there is 
 still
 time to fix it and there is one thing I wonder:

 Did any of the nouveau developers look at the netconsole captures Chris
 posted more than a week ago to check if they somehow help to track down
 the root of this problem?
>>>
>>> I did now and I can't spot anything. I think at this point it would
>>> make sense to dump the active tasks/threads via sqsrq keys to see if
>>> any is in a weird state preventing the machine from shutting down.
>>
>> Many thx for looking into it!
>
> Yes, thanks Karol.
>
> Attached is the output from dmesg when this block of code:
>
> /bin/mount /dev/sda7 /mnt/sda7
> /bin/mountpoint /proc || /bin/mount /proc
> /bin/dmesg -w > /mnt/sda7/sysrq.dmesg.log &
> /bin/echo t > /proc/sysrq-trigger
> /bin/sleep 1
> /bin/sync
> /bin/sleep 1
> kill $(pidof dmesg)
> /bin/umount /mnt/sda7
>
> is executed immediately before /sbin/reboot is called as the final step 
> of rebooting my system.
>
> I hope this is what you were looking for, but if not, please let me know 
> what you need
>>>
>>> Thanks Dave. [...]
>> FWIW, in case anyone strands here in the archives: the msg was
>> truncated. The full post can be found in a new thread:
>>
>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>>
>> Sadly it seems the info "With runpm=0, both reboot and poweroff work on
>> my laptop." didn't bring us much further to a solution. :-/ I don't
>> really like it, but for regression tracking I'm now putting this on the
>> back-burner, as a fix is not in sight.
>>
>> #regzbot monitor:
>> https://lore.kernel.org/lkml/e0b80506-b3cf-315b-4327-1b988d860...@googlemail.com/
>> #regzbot backburner: hard to debug and apparently rare
>> #regzbot ignore-activity
>>
> 
> yeah.. this bug looks a little annoying. Sadly the only Turing based
> laptop I got doesn't work on Nouveau because of firmware related
> issues and we probably need to get updated ones from Nvidia here :(
> 
> But it's a bit weird that the kernel doesn't shutdown, because I don't
> see anything in the logs which would prevent that from happening.
> Unless it's waiting on one of the tasks to complete, but none of them
> looked in any way nouveau related.
> 
> If somebody else has any fancy kernel debugging tips here to figure
> out why it hangs, that would be very helpful...
> 

I think I've figured this out. It's to do with how my system is configured. I 
do have an initrd, but the only thing on
it is the cpu microcode which, it is recommended, should be loaded early. The 
absence of the NVidia firmare from an
initrd doesn't matter because the drivers for the hardware that need to load 
firmware are all built as modules, So, by
the time the devices are configured via udev, the root partition is mounted and 
the drivers can get at the firmware.

I've found, by turning on nouveau debug and taking a video of the screen as the 
system shuts down, that nouveau seems to
be trying to run the scrubber very very late in the shutdown process. The 
problem is that by this time, I think the root
partition, and thus the scrubber binary, have become inaccessible.

I seem to have two choices - either make the firmware accessible on an initrd 
or unload the module in a shutdown script
before the scrubber binary becomes inaccessible. The latter of these is the 
workaround I have implemented whilst the
problem I reported has been under investigation. For simplicity, I think I'll 
promote my workaround to being the
permanent solution.

So, apologies (and thanks) to everyone whose time I have taken up with this 
non-bug.

Chris

>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> That page also explains what to do if mails like this annoy you.
>>
>> #regzbot ignore-activity
>>
> 


[Nouveau] [PATCH] drm/nouveau/fifo: small cleanup in nvkm_chan_cctx_get()

2023-05-04 Thread Dan Carpenter
The "&chan->cgrp->mutex" and "&chan->mutex" variables refer to the same
thing.  Use "&chan->mutex" consistently.

Signed-off-by: Dan Carpenter 
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c
index b7c9d6115bce..790b73ee5272 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c
@@ -105,7 +105,7 @@ nvkm_chan_cctx_get(struct nvkm_chan *chan, struct nvkm_engn *engn, struct nvkm_c
	if (cctx) {
		refcount_inc(&cctx->refs);
		*pcctx = cctx;
-		mutex_unlock(&chan->cgrp->mutex);
+		mutex_unlock(&chan->mutex);
return 0;
}
 
-- 
2.35.1



[Nouveau] [PATCH v3 3/4] drm/amdgpu: Move the amdgpu_gtt_mgr start and size from pages to bytes

2023-05-04 Thread Somalapuram Amaranath
Convert amdgpu_res_first and amdgpu_res_next from pages to bytes to
support the GTT manager, and clean up the PAGE_SHIFT operations.
v1 -> v2: reorder patch sequence

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 5c4f93ee0c57..5c78f0b09351 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -94,8 +94,8 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
while (start >= node->size << PAGE_SHIFT)
start -= node++->size << PAGE_SHIFT;
 
-   cur->start = (node->start << PAGE_SHIFT) + start;
-   cur->size = min((node->size << PAGE_SHIFT) - start, size);
+   cur->start = node->start + start;
+   cur->size = min(node->size - start, size);
cur->remaining = size;
cur->node = node;
break;
@@ -155,8 +155,8 @@ static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size)
node = cur->node;
 
cur->node = ++node;
-   cur->start = node->start << PAGE_SHIFT;
-   cur->size = min(node->size << PAGE_SHIFT, cur->remaining);
+   cur->start = node->start;
+   cur->size = min(node->size, cur->remaining);
break;
default:
return;
-- 
2.32.0



[Nouveau] [PATCH v2 4/4] drm/amdgpu: Support allocate of amdgpu_gtt_mgr from pages to bytes

2023-05-04 Thread Somalapuram Amaranath
Change the GTT manager init and allocate from pages to bytes
v1 -> v2: reorder patch sequence

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 44367f03316f..a1fbfc5984d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -116,7 +116,6 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
  struct ttm_resource **res)
 {
struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   uint32_t num_pages = PFN_UP(tbo->base.size);
struct ttm_range_mgr_node *node;
int r;
 
@@ -134,8 +133,10 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
	if (place->lpfn) {
		spin_lock(&mgr->lock);
		r = drm_mm_insert_node_in_range(&mgr->mm, &node->mm_nodes[0],
-						num_pages, tbo->page_alignment,
-						0, place->fpfn, place->lpfn,
+						tbo->base.size,
+						tbo->page_alignment << PAGE_SHIFT, 0,
+						place->fpfn << PAGE_SHIFT,
+						place->lpfn << PAGE_SHIFT,
						DRM_MM_INSERT_BEST);
		spin_unlock(&mgr->lock);
		if (unlikely(r))
@@ -144,7 +145,7 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
node->base.start = node->mm_nodes[0].start;
} else {
node->mm_nodes[0].start = 0;
-   node->mm_nodes[0].size = PFN_UP(node->base.size);
+   node->mm_nodes[0].size = node->base.size;
node->base.start = AMDGPU_BO_INVALID_OFFSET;
}
 
@@ -285,8 +286,8 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)

	ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);

-	start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
-	size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
+	start = (AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS) << PAGE_SHIFT;
+	size = adev->gmc.gart_size - start;
	drm_mm_init(&mgr->mm, start, size);
	spin_lock_init(&mgr->lock);
 
-- 
2.32.0



[Nouveau] [PATCH 2/4] drm/amdgpu: Support allocate of amdgpu_gtt_mgr from pages to bytes

2023-05-04 Thread Somalapuram Amaranath
Change the GTT manager init and allocate from pages to bytes

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 44367f03316f..a1fbfc5984d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -116,7 +116,6 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
  struct ttm_resource **res)
 {
struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-   uint32_t num_pages = PFN_UP(tbo->base.size);
struct ttm_range_mgr_node *node;
int r;
 
@@ -134,8 +133,10 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
	if (place->lpfn) {
		spin_lock(&mgr->lock);
		r = drm_mm_insert_node_in_range(&mgr->mm, &node->mm_nodes[0],
-						num_pages, tbo->page_alignment,
-						0, place->fpfn, place->lpfn,
+						tbo->base.size,
+						tbo->page_alignment << PAGE_SHIFT, 0,
+						place->fpfn << PAGE_SHIFT,
+						place->lpfn << PAGE_SHIFT,
						DRM_MM_INSERT_BEST);
		spin_unlock(&mgr->lock);
		if (unlikely(r))
@@ -144,7 +145,7 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
node->base.start = node->mm_nodes[0].start;
} else {
node->mm_nodes[0].start = 0;
-   node->mm_nodes[0].size = PFN_UP(node->base.size);
+   node->mm_nodes[0].size = node->base.size;
node->base.start = AMDGPU_BO_INVALID_OFFSET;
}
 
@@ -285,8 +286,8 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)

	ttm_resource_manager_init(man, &adev->mman.bdev, gtt_size);

-	start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
-	size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
+	start = (AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS) << PAGE_SHIFT;
+	size = adev->gmc.gart_size - start;
	drm_mm_init(&mgr->mm, start, size);
	spin_lock_init(&mgr->lock);
 
-- 
2.32.0



Re: [Nouveau] [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings

2023-05-04 Thread Bagas Sanjaya
On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
> This adds the infrastructure for a manager implementation to keep track
> of GPU virtual address (VA) mappings.

"Add infrastructure for ..."

> + * Analogue to drm_gpuva_sm_map_ops_create() drm_gpuva_sm_unmap_ops_create()
> + * provides drivers a the list of operations to be executed in order to unmap
> + * a range of GPU VA space. The logic behind this functions is way simpler
> + * though: For all existent mappings enclosed by the given range unmap
> + * operations are created. For mappings which are only partically located 
> within
> + * the given range, remap operations are created such that those mappings are
> + * split up and re-mapped partically.

"Analogous to ..."

> + *
> + * The following paragraph depicts the basic constellations of existent GPU 
> VA
> + * mappings, a newly requested mapping and the resulting mappings as 
> implemented
> + * by drm_gpuva_sm_map_ops_create()  - it doesn't cover arbitrary 
> combinations
> + * of those constellations.
> + *
> + * ::
> + *
> + *   1) Existent mapping is kept.
> + *   ----------------------------
> + *
> + *0 a 1
> + *   old: |---| (bo_offset=n)
> + *
> + *0 a 1
> + *   req: |---| (bo_offset=n)
> + *
> + *0 a 1
> + *   new: |---| (bo_offset=n)
> + *
> + *
> + *   2) Existent mapping is replaced.
> + *   --------------------------------
> + *
> + *0 a 1
> + *   old: |---| (bo_offset=n)
> + *
> + *0 a 1
> + *   req: |---| (bo_offset=m)
> + *
> + *0 a 1
> + *   new: |---| (bo_offset=m)
> + *
> + *
> + *   3) Existent mapping is replaced.
> + *   --------------------------------
> + *
> + *0 a 1
> + *   old: |---| (bo_offset=n)
> + *
> + *0 b 1
> + *   req: |---| (bo_offset=n)
> + *
> + *0 b 1
> + *   new: |---| (bo_offset=n)
> + *
> + *
> + *   4) Existent mapping is replaced.
> + *   --------------------------------
> + *
> + *0  a  1
> + *   old: |-|   (bo_offset=n)
> + *
> + *0 a 2
> + *   req: |---| (bo_offset=n)
> + *
> + *0 a 2
> + *   new: |---| (bo_offset=n)
> + *
> + *   Note: We expect to see the same result for a request with a different bo
> + * and/or bo_offset.
> + *
> + *
> + *   5) Existent mapping is split.
> + *   -----------------------------
> + *
> + *0 a 2
> + *   old: |---| (bo_offset=n)
> + *
> + *0  b  1
> + *   req: |-|   (bo_offset=n)
> + *
> + *0  b  1  a' 2
> + *   new: |-|-| (b.bo_offset=n, a.bo_offset=n+1)
> + *
> + *   Note: We expect to see the same result for a request with a different bo
> + * and/or non-contiguous bo_offset.
> + *
> + *
> + *   6) Existent mapping is kept.
> + *   ----------------------------
> + *
> + *0 a 2
> + *   old: |---| (bo_offset=n)
> + *
> + *0  a  1
> + *   req: |-|   (bo_offset=n)
> + *
> + *0 a 2
> + *   new: |---| (bo_offset=n)
> + *
> + *
> + *   7) Existent mapping is split.
> + *   -----------------------------
> + *
> + *0 a 2
> + *   old: |---| (bo_offset=n)
> + *
> + *  1  b  2
> + *   req:   |-| (bo_offset=m)
> + *
> + *0  a  1  b  2
> + *   new: |-|-| (a.bo_offset=n,b.bo_offset=m)
> + *
> + *
> + *   8) Existent mapping is kept.
> + *   ----------------------------
> + *
> + * 0 a 2
> + *   old: |---| (bo_offset=n)
> + *
> + *  1  a  2
> + *   req:   |-| (bo_offset=n+1)
> + *
> + *0 a 2
> + *   new: |---| (bo_offset=n)
> + *
> + *
> + *   9) Existent mapping is split.
> + *   -----------------------------
> + *
> + *0 a 2
> + *   old: |---|   (bo_offset=n)
> + *
> + *  1 b 3
> + *   req:   |---| (bo_offset=m)
> + *
> + *0  a  1 b 3
> + *   new: |-|---| (a.bo_offset=n,b.bo_offset=m)
> + *
> + *
> + *   10) Existent mapping is merged.
> + *   -------------------------------
> + *
> + *0 a 2
> + *   old: |---|   (bo_offset=n)
> + *
> + *  1 a 3
> + *   req:   |---| (bo_offset=n+1)
> + *
> + *0a3
> + *   new: |-| (bo_offset=n)
> + *
> + *
> + *   11) Existent mapping is split.
> + *   ------------------------------
> + *
> + *0a3
> + *   old: |-| (bo_offset=n)
> + *
> + *  1  b  2
> + *   req:   |-|   (bo_offset=m)
> + *
> + *0  a  1  b  2  a' 3
> + *   new: |-|-|-| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
> + *
> + *
> + *   12) Existent mapping is kept.
> + *   -----------------------------
> + *
> + * 

[Nouveau] [PATCH v4 3/4] drm/amdgpu: GDS/GWS/OA cleanup the page shift operation

2023-05-04 Thread Somalapuram Amaranath
Remove the page shift operations, as ttm_resource moved
from num_pages to a size_t size in bytes.
v1 -> v4: add missing changes related to amdgpu_ttm_init_on_chip

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 12 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  6 +++---
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 9e549923622b..2732d89c8468 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -142,16 +142,16 @@ void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds,
  struct amdgpu_bo *gws, struct amdgpu_bo *oa)
 {
if (gds) {
-   job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
-   job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
+   job->gds_base = amdgpu_bo_gpu_offset(gds);
+   job->gds_size = amdgpu_bo_size(gds);
}
if (gws) {
-   job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT;
-   job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT;
+   job->gws_base = amdgpu_bo_gpu_offset(gws);
+   job->gws_size = amdgpu_bo_size(gws);
}
if (oa) {
-   job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT;
-   job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT;
+   job->oa_base = amdgpu_bo_gpu_offset(oa);
+   job->oa_size = amdgpu_bo_size(oa);
}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2ab67ab204df..bbd0a4550fbf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -541,12 +541,11 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA)) {
/* GWS and OA don't need any alignment. */
page_align = bp->byte_align;
-   size <<= PAGE_SHIFT;
 
} else if (bp->domain & AMDGPU_GEM_DOMAIN_GDS) {
/* Both size and alignment must be a multiple of 4. */
page_align = ALIGN(bp->byte_align, 4);
-   size = ALIGN(size, 4) << PAGE_SHIFT;
+   size = ALIGN(size, 4);
} else {
/* Memory should be aligned at least to a page size. */
page_align = ALIGN(bp->byte_align, PAGE_SIZE) >> PAGE_SHIFT;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index ffe6a1ab7f9a..c1500875b4ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1849,19 +1849,19 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
}
 
/* Initialize various on-chip memory pools */
-   r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GDS, adev->gds.gds_size);
+	r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GDS, adev->gds.gds_size << PAGE_SHIFT);
if (r) {
DRM_ERROR("Failed initializing GDS heap.\n");
return r;
}
 
-   r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GWS, adev->gds.gws_size);
+	r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_GWS, adev->gds.gws_size << PAGE_SHIFT);
if (r) {
DRM_ERROR("Failed initializing gws heap.\n");
return r;
}
 
-   r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_OA, adev->gds.oa_size);
+	r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_OA, adev->gds.oa_size << PAGE_SHIFT);
if (r) {
DRM_ERROR("Failed initializing oa heap.\n");
return r;
-- 
2.32.0



Re: [Nouveau] [PATCH v2 01/10] iommu: Add a gfp parameter to iommu_map()

2023-05-04 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, January 19, 2023 2:01 AM
> 
> The internal mechanisms support this, but instead of exposing the gfp to
> the caller it wraps it into iommu_map() and iommu_map_atomic()
> 
> Fix this instead of adding more variants for GFP_KERNEL_ACCOUNT.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


[Nouveau] [PATCH v4 2/4] drm/amdkfd: Use cursor start instead of ttm resource start

2023-05-04 Thread Somalapuram Amaranath
Clean up the PAGE_SHIFT operation and replace
ttm_resource resource->start with the cursor start,
using the amdgpu_res_first API.
v1 -> v2: reorder patch sequence
v2 -> v3: addressing review comment v2

Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c06ada0844ba..9114393d2ee6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -200,8 +200,12 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q,
queue_input.wptr_addr = (uint64_t)q->properties.write_ptr;
 
if (q->wptr_bo) {
+   struct amdgpu_res_cursor cursor;
+
 		wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE - 1);
-		queue_input.wptr_mc_addr = ((uint64_t)q->wptr_bo->tbo.resource->start << PAGE_SHIFT) + wptr_addr_off;
+		amdgpu_res_first(q->wptr_bo->tbo.resource, 0,
+				 q->wptr_bo->tbo.resource->size, &cursor);
+		queue_input.wptr_mc_addr = cursor.start + wptr_addr_off;
}
 
queue_input.is_kfd_process = 1;
-- 
2.32.0



Re: [Nouveau] [PATCH v2 08/10] iommu/intel: Use GFP_KERNEL in sleepable contexts

2023-05-04 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, January 19, 2023 2:01 AM
> 
> These contexts are sleepable, so use the proper annotation. The
> GFP_ATOMIC
> was added mechanically in the prior patches.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


Re: [Nouveau] [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-05-04 Thread Intel



On 1/18/23 07:12, Danilo Krummrich wrote:

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
UMDs can request the named operations to be processed either
synchronously or asynchronously. It supports DRM syncobjs
(incl. timelines) as synchronization mechanism. The management of the
GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
execution happens asynchronously. It supports DRM syncobj (incl.
timelines) as synchronization mechanism. DRM GEM object locking is
handled with drm_exec.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
GPU scheduler for the asynchronous paths.

Signed-off-by: Danilo Krummrich 
---
  Documentation/gpu/driver-uapi.rst   |   3 +
  drivers/gpu/drm/nouveau/Kbuild  |   2 +
  drivers/gpu/drm/nouveau/Kconfig |   2 +
  drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
  drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
  drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
  drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
  drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
  drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
  drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
  drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
  11 files changed, 1295 insertions(+), 4 deletions(-)
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

...


+static struct dma_fence *
+nouveau_bind_job_run(struct nouveau_job *job)
+{
+   struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+   struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+   struct bind_job_op *op;
+   int ret = 0;
+


I was looking at how nouveau does the async binding compared to how xe 
does it.
It looks to me that this function being a scheduler run_job callback is 
the main part of the VM_BIND dma-fence signalling critical section for 
the job's done_fence and if so, needs to be annotated as such?


For example nouveau_uvma_region_new allocates memory, which is not 
allowed if in a dma_fence signalling critical section and the locking 
also looks suspicious?


Thanks,

Thomas



+   nouveau_uvmm_lock(uvmm);
+   list_for_each_op(op, &bind_job->ops) {
+   switch (op->op) {
+   case OP_ALLOC: {
+   bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE;
+
+   ret = nouveau_uvma_region_new(uvmm,
+ op->va.addr,
+ op->va.range,
+ sparse);
+   if (ret)
+   goto out_unlock;
+   break;
+   }
+   case OP_FREE:
+   ret = nouveau_uvma_region_destroy(uvmm,
+ op->va.addr,
+ op->va.range);
+   if (ret)
+   goto out_unlock;
+   break;
+   case OP_MAP:
+   ret = nouveau_uvmm_sm_map(uvmm,
+ op->va.addr, op->va.range,
+ op->gem.obj, op->gem.offset,
+ op->flags && 0xff);
+   if (ret)
+   goto out_unlock;
+   break;
+   case OP_UNMAP:
+   ret = nouveau_uvmm_sm_unmap(uvmm,
+   op->va.addr,
+   op->va.range);
+   if (ret)
+   goto out_unlock;
+   break;
+   }
+   }
+
+out_unlock:
+   nouveau_uvmm_unlock(uvmm);
+   if (ret)
+   NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret);
+   return ERR_PTR(ret);
+}
+
+static void
+nouveau_bind_job_free(struct nouveau_job *job)
+{
+   struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+   struct bind_job_op *op, *next;
+
+   list_for_each_op_safe(op, next, &bind_job->ops) {
+   struct drm_gem_object *obj = 

[Nouveau] [PATCH] drm/nouveau/gr/gv100-: unlock on error in gf100_gr_chan_new()

2023-05-04 Thread Dan Carpenter
Drop the "gr->fecs.mutex" lock before returning on this error path.

Fixes: ca081fff6ecc ("drm/nouveau/gr/gf100-: generate golden context during 
first object alloc")
Signed-off-by: Dan Carpenter 
---
 drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
index 5f20079c3660..24bec8f8f83e 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
@@ -442,6 +442,7 @@ gf100_gr_chan_new(struct nvkm_gr *base, struct nvkm_fifo_chan *fifoch,
if (gr->data == NULL) {
ret = gf100_grctx_generate(gr, chan, fifoch->inst);
if (ret) {
+			mutex_unlock(&gr->fecs.mutex);
 			nvkm_error(&gr->engine.subdev, "failed to construct context\n");
return ret;
}
-- 
2.35.1



[Nouveau] [PATCH] drm/nouveau/fifo: make nvkm_runl_new() return error pointers

2023-05-04 Thread Dan Carpenter
All six callers expect error pointers instead of NULL, so make
nvkm_runl_new() return error pointers as expected.

Fixes: d94470e9d150 ("drm/nouveau/fifo: add common runlist/engine topology")
Signed-off-by: Dan Carpenter 
---
 drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c
index b5836cbc29aa..adc4a9544ebc 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c
@@ -399,7 +399,7 @@ nvkm_runl_new(struct nvkm_fifo *fifo, int runi, u32 addr, int id_nr)
int ret;
 
if (!(runl = kzalloc(sizeof(*runl), GFP_KERNEL)))
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 
runl->func = fifo->func->runl;
runl->fifo = fifo;
@@ -419,7 +419,7 @@ nvkm_runl_new(struct nvkm_fifo *fifo, int runi, u32 addr, int id_nr)
 		    (ret = nvkm_chid_new(&nvkm_chan_event, subdev, id_nr, 0, id_nr, &runl->chid))) {
RUNL_ERROR(runl, "cgid/chid: %d", ret);
nvkm_runl_del(runl);
-   return NULL;
+   return ERR_PTR(ret);
}
} else {
runl->cgid = nvkm_chid_ref(fifo->cgid);
-- 
2.35.1



Re: [Nouveau] [PATCH v2 07/10] iommu/intel: Support the gfp argument to the map_pages op

2023-05-04 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, January 19, 2023 2:01 AM
> 
> Flow it down to alloc_pgtable_page() via pfn_to_dma_pte() and
> __domain_mapping().
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


Re: [Nouveau] [REGRESSION] GM20B probe fails after commit 2541626cfb79

2023-05-04 Thread Diogo Ivo
On Mon, Jan 16, 2023 at 07:45:05AM +1000, David Airlie wrote:
> On Thu, Dec 29, 2022 at 12:58 AM Diogo Ivo  wrote:
> As a quick check can you try changing
> 
> drivers/gpu/drm/nouveau/nvkm/core/firmware.c:nvkm_firmware_mem_target
> from NVKM_MEM_TARGET_HOST to NVKM_MEM_TARGET_NCOH ?

Hello!

Applying this change breaks probing in a different way, with a
bad PC=0x0. From a quick look at nvkm_falcon_load_dmem it looks like this
could happen due to the .load_dmem() callback not being properly
initialized. This is the kernel log I got:

[2.010601] Unable to handle kernel NULL pointer dereference at virtual address 
[2.019436] Mem abort info:
[2.022273]   ESR = 0x8605
[2.026066]   EC = 0x21: IABT (current EL), IL = 32 bits
[2.031429]   SET = 0, FnV = 0
[2.034528]   EA = 0, S1PTW = 0
[2.037694]   FSC = 0x05: level 1 translation fault
[2.042572] [] user address but active_mm is swapper
[2.048961] Internal error: Oops: 8605 [#1] SMP
[2.054529] Modules linked in:
[2.057582] CPU: 0 PID: 36 Comm: kworker/u8:1 Not tainted 6.2.0-rc3+ #2
[2.064190] Hardware name: Google Pixel C (DT)
[2.068628] Workqueue: events_unbound deferred_probe_work_func
[2.074463] pstate: 4005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[2.081417] pc : 0x0
[2.083600] lr : nvkm_falcon_load_dmem+0x58/0x80
[2.088218] sp : ffc009ddb6f0
[2.091526] x29: ffc009ddb6f0 x28: ff808028a008 x27: ff8081e43c38
[2.098658] x26: 00ff x25: ff808028a0a0 x24: 
[2.105788] x23: ff8080c328f8 x22: 002c x21: 5fd4
[2.112917] x20: ffc009ddb76c x19: ff8080c328b8 x18: 
[2.120047] x17: 2e74696e695f646f x16: 6874656d5f77732f x15: 
[2.127176] x14: 02f546c2 x13:  x12: 01ce
[2.134306] x11: 0001 x10: 0a90 x9 : ffc009ddb600
[2.141436] x8 : ff80803d19f0 x7 : ff80bf971180 x6 : 01b9
[2.148565] x5 :  x4 :  x3 : 002c
[2.155693] x2 : 5fd4 x1 : ffc009ddb76c x0 : ff8080c328b8
[2.162822] Call trace:
[2.165264]  0x0
[2.167099]  gm20b_pmu_init+0x78/0xb4
[2.170762]  nvkm_pmu_init+0x20/0x34
[2.174334]  nvkm_subdev_init_+0x60/0x12c
[2.178339]  nvkm_subdev_init+0x60/0xa0
[2.182171]  nvkm_device_init+0x14c/0x2a0
[2.186178]  nvkm_udevice_init+0x60/0x9c
[2.190097]  nvkm_object_init+0x48/0x1b0
[2.194013]  nvkm_ioctl_new+0x168/0x254
[2.197843]  nvkm_ioctl+0xd0/0x220
[2.201239]  nvkm_client_ioctl+0x10/0x1c
[2.205160]  nvif_object_ctor+0xf4/0x22c
[2.209079]  nvif_device_ctor+0x28/0x70
[2.212910]  nouveau_cli_init+0x150/0x590
[2.216916]  nouveau_drm_device_init+0x60/0x2a0
[2.221442]  nouveau_platform_device_create+0x90/0xd0
[2.226489]  nouveau_platform_probe+0x3c/0x9c
[2.230841]  platform_probe+0x68/0xc0
[2.234500]  really_probe+0xbc/0x2dc
[2.238070]  __driver_probe_device+0x78/0xe0
[2.242334]  driver_probe_device+0xd8/0x160
[2.246511]  __device_attach_driver+0xb8/0x134
[2.250948]  bus_for_each_drv+0x78/0xd0
[2.254782]  __device_attach+0x9c/0x1a0
[2.258612]  device_initial_probe+0x14/0x20
[2.262789]  bus_probe_device+0x98/0xa0
[2.266619]  deferred_probe_work_func+0x88/0xc0
[2.271142]  process_one_work+0x204/0x40c
[2.275150]  worker_thread+0x230/0x450
[2.278894]  kthread+0xc8/0xcc
[2.281946]  ret_from_fork+0x10/0x20
[2.285525] Code: bad PC value
[2.288576] ---[ end trace  ]---

Diogo


Re: [Nouveau] 2023 X.Org Foundation Membership deadline for voting in the election

2023-05-04 Thread Harald Koenig
On Apr 17, Laurent Pinchart wrote:

> I don't know if I'm the only one affected by this issue, but I've just
> received today two months of e-mails from x.org, including all the
> reminders aboud membership renewal and election nomination period. This
> isn't the first time this happens, and the last time I was told there
> was no automated process to quick the mail queues when errors happen,
> making mails pile up forever on x.org's side until someone handles it
> manually. This is something you really want to automate, or at least
> monitored.

same here for me: looking into the mail header,
both mails were stuck on server "gabe.freedesktop.org" 

Received: from gabe.freedesktop.org (localhost [127.0.0.1])
by gabe.freedesktop.org (Postfix) with ESMTP id BD01310E459;
Mon, 17 Apr 2023 11:42:45 + (UTC)
X-Original-To: eve...@lists.x.org
Delivered-To: eve...@lists.x.org
Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 6C54510E162;
 Wed, 15 Feb 2023 15:58:10 + (UTC)

and 

Received: from gabe.freedesktop.org (localhost [127.0.0.1])
by gabe.freedesktop.org (Postfix) with ESMTP id 6735010E46D;
Mon, 17 Apr 2023 11:42:45 + (UTC)
X-Original-To: eve...@lists.x.org
Delivered-To: eve...@lists.x.org
Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 98DB48953E;
 Mon, 13 Mar 2023 15:23:02 + (UTC)



Harald
-- 
"I hope to die  ___   _
before I *have* to use Microsoft Word.",   0--,|/OOO\
Donald E. Knuth, 02-Oct-2001 in Tuebingen.<_/  /  /OOO\
\  \/OOO\
  \ O|//
   \/\/\/\/\/\/\/\/\/
Harald Koenig   //  / \\  \
harald.koe...@mailbox.org  ^   ^
