Re: [ANNOUNCE] mesa 22.0.0-rc2
On 2/9/22 20:56, Bas Nieuwenhuizen wrote: On Wed, Feb 9, 2022 at 7:10 PM Dylan Baker wrote: Hi all, I'd like to announce the availability of Mesa 22.0.0-rc2, the second release candidate for mesa 22.0.0. We have lots of fixes here, including a good deal of zink fixes, and some changes for shared microsoft, egl, core mesa, crocus, broadcom, iris, core intel, anv, llvmpipe, xvga, radeonsi, aco, and radv. Cheers, Dylan shortlog Charmaine Lee (1): mesa: fix misaligned pointer returned by dlist_alloc Daniel Stone (1): egl/wayland: Reset buffer age when destroying buffers Danylo Piliaiev (1): turnip: Unconditionaly remove descriptor set from pool's list on free Dave Airlie (1): crocus: find correct relocation target for the bo. Dylan Baker (4): .pick_status.json: Update to 0447a2303fb06d6ad1f64e5f079a74bf2cf540da .pick_status.json: Update to 8335fdfeafbe1fd14cb65f9088bbba15d9eb00dc .pick_status.json: Update to 5e9df85b1a4504c5b4162e77e139056dc80accc6 VERSION: bump version for 22.0.0-rc2 Iago Toral Quiroga (1): broadcom/compiler: fix offset alignment for ldunifa when skipping Jesse Natalie (2): microsoft/compiler: Only prep phis for the current function microsoft/compiler: Only treat tess level location as special if it's a patch constant Kenneth Graunke (1): iris: Make an iris_foreach_batch macro that skips unsupported batches Lionel Landwerlin (3): intel/fs: don't set allow_sample_mask for CS intrinsics intel/nir: fix shader call lowering anv: fix conditional render for vkCmdDrawIndirectByteCountEXT Mike Blumenkrantz (7): zink: disable PIPE_SHADER_CAP_FP16_CONST_BUFFERS llvmpipe: disable PIPE_SHADER_CAP_FP16_CONST_BUFFERS zink: add VK_BUFFER_USAGE_CONDITIONAL_RENDERING_BIT_EXT for query binds zink: use scanout obj when returning resource param info zink: fix PIPE_CAP_TGSI_BALLOT export conditional zink: reject invalid draws zink: min/max blit region in coverage functions Neha Bhende (1): svga: store shared_mem_size in svga_compute_shader instead of svga_context Pierre-Eric Pelloux-Prayer (1): radeonsi: limit loop unrolling for LLVM < 13 Rhys Perry (2): aco: don't encode src2 for v_writelane_b32_e64 radv: fix R_02881C_PA_CL_VS_OUT_CNTL with mixed cull/clip distances Samuel Pitoiset (1): Revert "radv: re-apply "Do not access set layout during vkCmdBindDescriptorSets."" Hi Dylan, can we add commit 66f7289d568db8711adb885acc56622e2aff252a Author: Samuel Pitoiset Date: Wed Jan 19 16:15:33 2022 +0100 radv: add reference counting for descriptor set layouts If we take that revert? The revert wasn't because the patch was bad but because we had a better patch. Yes, we need that. git tag: mesa-22.0.0-rc2 https://mesa.freedesktop.org/archive/mesa-22.0.0-rc2.tar.xz SHA256: 14d6478ad367b22fbb24251f3282d98ba9b8c7758dcd416b33353e1387fd57f7 mesa-22.0.0-rc2.tar.xz SHA512: 9e05355a31f1640df6e800ccdf3150720d1a54aa21d9eb748d567b2b64090b09b6bc54318f2f72644b48c8d08f9db0f7ab3d35c9e1b629ded932fd9ed2e87630 mesa-22.0.0-rc2.tar.xz PGP: https://mesa.freedesktop.org/archive/mesa-22.0.0-rc2.tar.xz.sig
Re: [Mesa-dev] Outstanding Mesa 21.0 patches
For RADV: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10174 On 4/9/21 6:39 PM, Dylan Baker wrote: Hi all, I've been a little behind on release work recently, and I'm tryinng to cleanup the backlog of patches against the 21.0 branch that haven't been applied but have been nominated. Below is the list of outstanding patches that either don't apply, or cause regressions. If you'd like to have these applied please provide a backport, otherwise I'm going to mark all of these as "denominated". Cheers, Dylan 6a29632dd2 Revert "glcpp: disable 'windows' tests" bd1705a480 vulkan: Make vk_debug_report_callback derive from vk_object_base 366fb28dac ci: Fix MESA_TEMPLATES_COMMIT value 0464117ad9 ci: remove nouveau from shader-db runs 0f7379e308 ci: tracie dashboard URLs only in the failure after the testcase cff5c40fc3 pan/bi: Fix blend shaders using LD_TILE with MRT c0c03f29e0 lavapipe: implement physical device group enumeration b6b3b38434 turnip: consider HW limit on number of views when apply multipos opt 5a340c0929 vulkan/util: add api to reset object magic + private data. 226c7ae2a8 lavapipe: reset object base on recycled command buffers 8b44e45347 intel/perf: fix roll over PERF_CNT counter accumulation 3d3f21f0be ci: add libdrm to the x86_test-vk container 0a939e788f lavapipe: reorder descriptor set stages to get correct binding abc724e440 lavapipe: sort bindings before creating descriptor set 3436e5295b pan/bi: Treat +DISCARD.f32 as message-passing 2c02740a8c intel/mi_builder: Use AddCSMMIOStartOffset for LRI d4f21b53f2 nir/range_analysis: Add "is finite" range analysis tracking aa5d38decd nir/range_analysis: Add "is a number" range analysis tracking f4a7dbc58f nir/range_analysis: Fix analysis of fmin, fmax, or fsat with NaN source 30cf07cc8a lavapipe: fix primitive-restart for uint8 indices 32eb74e1e1 ac/gpu_info: fix more non-coherent RB and GL2 combinations 799a931d12 anv/apply_pipeline_layout: Rework the early pass index/offset helpers 3257ab9f23 radv: Dedupe winsyses per device. 90632ae7b3 lavapipe: stop tracking draw start/count on rendering state f7acdb1d1d st/glthread: allow for invalid L3 cache id. a5d5cbdf08 freedreno: Fix file descriptor leak. 61cf77583a lavapipe: Free sorted descriptor array. 33d87eeb5a gallium: add PIPE_CAP_ALLOW_DYNAMIC_VAO_FASTPATH 1df3a00dcd iris: disable dynamic VAO fastpath on GFX version 9 9413c6aec3 mesa: Add anything dynamically indexed before any non-dynamically indexed fe53c22294 lavapipe: fix only clearing depth or stencil paths. fe5349f70c freedreno/a6xx: Fix alpha tests. e4ef5f0433 mesa/st: ignore texture_index if tex_instr has deref src 363c1ef0c0 gallium/u_threaded: split draws that don't fit in a batch 961361cdc9 aco: ensure loops nested in a WQM loop are in WQM 0845cabc72 vulkan: Track dependencies of Python imports ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Request for developer access for Tony Wasserka
+1 On 11/20/20 10:44 AM, Daniel Schürmann wrote: Hi, I would like to request developer access for our new team member, Tony Wasserka. He has proven himself capable with a number of MRs and works actively on the ACO backend. His gitlab nick is neobrain. Thanks in advance, Daniel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] mediump support: future work
On RADV, we already support fp16 with the LLVM backend (not saying it's always optimized though) and with ACO it should be mostly working but not yet enabled because I think we would like fast packed math support first and I'm not sure if fp16 I/O are implemented. There is also some missing fp16 optimizations in the ACO backend which makes some games (eg. Youngblood) a bit slower if fp16 features/extensions are exposed but the gap should be filled soon hopefully. I think the most important thing missing is a SLP vectorizer in NIR for us. On 5/4/20 8:43 PM, Marek Olšák wrote: Hi, This is the status of mediump support in Mesa. What I listed is what AMD GPUs can do. "Yes" means what Mesa supports. *Feature* *FP16 support* *Int16 support* ALU Yes No UniformsNo No VS in No No VS out / FS in No No FS out No No TCS, TES, GS out / in No No Sampler coordinates (only coord, derivs, lod, bias; not offset and compare) No --- Image coordinates --- No Return value from samplers (incl. sampler buffers) Yes No Return value from image loads (incl. image buffers) No No Data source for image stores (incl. image buffers) No No If 16-bit sampler/image instructions are surrounded by conversions, promote them to 32 bits No No Please let me know if you don't see the table correctly. I'd like to know if I can enable some of them using the existing FP16 CAP. The only drivers supporting FP16 are currently Freedreno and Panfrost. Thanks, Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [ANNOUNCE] mesa 20.0.3
Good catch! Yes, please revert it asap, it breaks a bunch of things ... :( On 4/2/20 11:11 AM, Danylo Piliaiev wrote: "spirv: Implement OpCopyObject and OpCopyLogical as blind copies" was reverted yesterday due to the failures in several dEQP-VK tests, see: https://gitlab.freedesktop.org/mesa/mesa/-/commit/68f325b256d96dca923f6c7d84bc6faf43911245 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4375 I'm not sure if it's already known or how important it is, but I'd better say it than not. On 02.04.20 00:52, Eric Engestrom wrote: Hi all, I'd like to announce the release of Mesa 20.0.3. Quite a busy cycle again, with fixes all over the tree, but nothing extraordinary; mostly AMD (radv, aco), NIR and Intel (isl, anv), as expected. Cheers, Eric --- Git shortlog Caio Marcelo de Oliveira Filho (1): mesa/main: Fix overflow in validation of DispatchComputeGroupSizeARB Dylan Baker (6): docs/relnotes: Add sha256 sums for 20.0.2 .pick_status.json: Update to cf62c2b2ac69637785f55b790fdd601c17e7e9d5 .pick_status.json: Mark 672d10619980687acec329742f055f7f3796c1b8 as backported .pick_status.json: Mark c923de68dd0ab10a5a5fb3196f539707d046d897 as backported .pick_status.json: Mark 56de6f698e3f164d97f132203e8159ef0b8e9bb8 as denominated .pick_status.json: Update to aee004a7c8900938d1c17f0ac299d40001b383b0 Eric Engestrom (8): .pick_status.json: Update to 3252041a7872c49e53bb02ffe8b079b5fc43f15e .pick_status.json: Update to 12711939320e4fcd3a0d86af22da1042ad92035f .pick_status.json: Update to 05069e1f0794aadd40ce9269f858e50c64254388 .pick_status.json: Update to 8970b7839aebefa7207c9535ac34ab4e8cc0ae25 .pick_status.json: Update to 5f4d9b419a1c931ad468b8b22b8a95b1216891e4 .pick_status.json: Update to 70ac7f5b0c46370075a35067c9f7dfe78e84b16d docs: add release notes for 20.0.3 VERSION: bump to 20.0.3 Erik Faye-Lund (3): rbug: do not return void-value pipebuffer: clean up cast-warnings vtn/opencl: fully enable OpenCLstd_Clz Francisco Jerez (1): intel/fs/gen12: Fix interaction of SWSB dependency combination with EU fusion workaround. Greg V (1): amd/addrlib: fix build on non-x86 platforms Ian Romanick (2): soft-fp64/fsat: Correctly handle NaN soft-fp64: Split a block that was missing a cast on a comparison Jason Ekstrand (5): intel/blorp: Add support for swizzling fast-clear colors anv: Swizzle fast-clear values nir/lower_int64: Lower 8 and 16-bit downcasts with nir_lower_mov64 anv: Account for the header in anv_state_stream_alloc spirv: Implement OpCopyObject and OpCopyLogical as blind copies John Stultz (2): gallium: hud_context: Fix scalar initializer warning. vc4_bufmgr: Remove duplicative VC definition Jordan Justen (2): intel: Update TGL PCI strings intel: Add TGL PCI ID Lionel Landwerlin (5): isl: implement linear tiling row pitch requirement for display isl: properly filter supported display modifiers on Gen9+ isl: only apply main surface ccs pitch constraint with CCS isl: drop min row pitch alignment when set by the driver intel: add new TGL pci ids Marek Olšák (3): nir: fix clip/cull_distance_array_size in nir_lower_clip_cull_distance_arrays ac: fix fast division st/mesa: fix use of uninitialized memory due to st_nir_lower_builtin Marek Vasut (1): etnaviv: Emit PE.ALPHA_COLOR_EXT* on GPUs with half-float support Neil Armstrong (1): Revert "ci: Remove T820 from CI temporarily" Pierre-Eric Pelloux-Prayer (1): st/mesa: disallow deferred flush if there are multiple contexts Rhys Perry (11): nir/gather_info: handle emit_vertex_with_counter aco: set has_divergent_branch for discards in loops aco: handle missing second predecessors at merge block phis aco: skip NIR in unreachable merge blocks aco: improve check for unreachable loop continue blocks aco: emit IR in IF's merge block instead if the other side ends in a jump aco: fix boolean undef regclass nir/gather_info: fix per-vertex handling in try_mask_partial_io aco: implement 64-bit VGPR constant copies in handle_operands() glsl: fix race in instance getters util/u_queue: fix race in total_jobs_size access Rob Clark (2): freedreno/ir3/ra: fix array liveranges util: fix u_fifo_pop() Samuel Pitoiset (7): radv/gfx10: fix required subgroup size with VK_EXT_subgroup_size_control radv/gfx10: fix required ballot size with VK_EXT_subgroup_size_control radv: fix optional pSizes parameter when binding streamout buffers radv: enable VK_KHR_8bit_storage on GFX6-GFX7 ac/nir: use llvm.amdgcn.rcp for nir_op_frcp ac/nir: use llvm.amdgcn.rsq for nir_op_frsq ac/nir:
Re: [Mesa-dev] [ANNOUNCE] Mesa 20.0 branchpoint planned for 2020/01/29, Milestone opened
On 1/28/20 8:46 PM, Dylan Baker wrote: Quoting Dylan Baker (2020-01-22 10:27:05) Hi list, due to some last minute changes in plan I'll be managing the 20.0 release. The release calendar has been updated, but the gitlab milestone wasn't opened. That has been corrected, and is here https://gitlab.freedesktop.org/mesa/mesa/-/milestones/9, please add any issues or MRs you would like to land before the branchpoint to the milestone. Thanks, Dylan Hi list, There are still a fair number of issues and MRs opened for the 20.0 branch point, should we postpone the branch point? Everything we wanted to merge to 20.0 is now upstream. No deadline extension request from the RADV team, at least. :-) Dylan ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] 12.0 release?
I should have said that in my previous reply: We would like to have ACO/GFX6 support and VK_AMD_shader_explicit_vertex_parameter in Mesa 20.0. I think it's doable if branchpoint is in two weeks or something like that. FWIW, 19.0 branchpoint was on January 29 last year. On 1/15/20 5:50 PM, Samuel Pitoiset wrote: If we can wait end of january that would be highly appreciated :-) On 1/15/20 5:13 PM, Jason Ekstrand wrote: When were we planning to cut the 20.0 release? We just landed Vulkan 1.2 support for ANV and RADV this morning so it seems like a good time to me. The release calendar has nothing for 2020: https://www.mesa3d.org/release-calendar.html --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] 12.0 release?
If we can wait end of january that would be highly appreciated :-) On 1/15/20 5:13 PM, Jason Ekstrand wrote: When were we planning to cut the 20.0 release? We just landed Vulkan 1.2 support for ANV and RADV this morning so it seems like a good time to me. The release calendar has nothing for 2020: https://www.mesa3d.org/release-calendar.html --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] libdrm versioning - switch to 19.0.0?
LGTM It's easier to figure how old is a release with year based versioning. On 10/10/19 10:14 PM, Marek Olšák wrote: Hi, I expect to make a new libdrm release soon. Any objections to changing the versioning scheme? Current: 2.4.n n = starts from 0, incremented per release New proposals: year.n.0 (19.0.0) year.month.n (19.10.0) year.month.day (19.10.10) Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Switching to Gitlab Issues instead of Bugzilla?
+1 On 8/29/19 8:52 PM, Kenneth Graunke wrote: Hi all, As a lot of you have probably noticed, Bugzilla seems to be getting a lot of spam these days - several of us have been disabling a bunch of accounts per day, sweeping new reports under the rug, hiding comments, etc. This bug spam causes emails to be sent (more spam!) and then us to have to look at ancient bugs that suddenly have updates. I think it's probably time to consider switching away from Bugzilla. We are one of the few projects remaining - Mesa, DRM, and a few DDX drivers are still there, but almost all other projects are gone: https://bugs.freedesktop.org/enter_bug.cgi Originally, I was in favor of retaining Bugzilla just to not change too many processes all at once. But we've been using Gitlab a while now, and several of us have been using Gitlab issues in our personal repos; it's actually pretty nice. Some niceities: - Bug reporters don't necessarily need to sign up for an account anymore. They can sign in with their Gitlab.com, Github, Google, or Twitter accounts. Or make one as before. This may be nicer for reporters that don't want to open yet another account just to report an issue to us. - Anti-spam support is actually maintained. Bugzilla makes it near impossible to actually delete garbage, Gitlab makes it easier. It has a better account creation hurdle than Bugzilla's ancient captcha, and Akismet plug-ins for handling spam. - The search interface is more modern and easier to use IMO. - Permissions & accounts are easier - it's the same unified system. - Easy linking between issues and MRs - mention one in the other, and both get updated with cross-links so you don't miss any discussion. - Milestone tracking - This could be handy for release trackers - both features people want to land, and bugs blocking the release. - We could also use it for big efforts like direct state access, getting feature parity with fglrx, or whatnot. - Khronos switched a while ago as well, so a number of us are already familiar with using it there. Some cons: - Moving bug reports between the kernel and Mesa would be harder. We would have to open a bug in the other system. (Then again, moving bugs between Mesa and X or Wayland would be easier...) What do people think? If folks are in favor, Daniel can migrate everything for us, like he did with the other projects. If not, I'd like to hear what people's concerns are. --Ken ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Navi14 for 19.2
ACK. On 8/26/19 9:05 PM, Marek Olšák wrote: Hi, I'd like to push the Navi14 merge request to 19.2 no later than Tuesday August 27. https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1726 Please ack if it's OK with you, Thanks, Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] ac: fix exclusive scans on GFX8-GFX9
This fixes a regression introduced with scan operations on GFX10. Note that some subgroups CTS still fail on GFX10 but I assume it's a different issue. This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive*. v2: - move the logic back to ac_build_scan() Fixes: 227c29a80de "amd/common/gfx10: implement scan & reduce operations" Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 05871f5ea98..5abae00d8f6 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -4221,10 +4221,9 @@ ac_build_scan(struct ac_llvm_context *ctx, nir_op op, LLVMValueRef src, LLVMValu if (ctx->chip_class >= GFX10) { result = inclusive ? src : identity; } else { - if (inclusive) - result = src; - else - result = ac_build_dpp(ctx, identity, src, dpp_wf_sr1, 0xf, 0xf, false); + if (!inclusive) + src = ac_build_dpp(ctx, identity, src, dpp_wf_sr1, 0xf, 0xf, false); + result = src; } if (maxprefix <= 1) return result; -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] ac: fix exclusive scans on GFX8-GFX9
On 8/21/19 3:59 PM, Bas Nieuwenhuizen wrote: On Wed, Aug 21, 2019 at 3:45 PM Samuel Pitoiset wrote: This fixes a regression introduced with scan operations on GFX10. Note that some subgroups CTS still fail on GFX10 but I assume it's a different issue. This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive*. Fixes: 227c29a80de "amd/common/gfx10: implement scan & reduce operations" Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 05871f5ea98..d72eaa2db46 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -4221,10 +4221,7 @@ ac_build_scan(struct ac_llvm_context *ctx, nir_op op, LLVMValueRef src, LLVMValu if (ctx->chip_class >= GFX10) { result = inclusive ? src : identity; } else { - if (inclusive) - result = src; - else - result = ac_build_dpp(ctx, identity, src, dpp_wf_sr1, 0xf, 0xf, false); + result = src; } if (maxprefix <= 1) return result; @@ -4333,6 +4330,8 @@ ac_build_exclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op op get_reduction_identity(ctx, op, ac_get_type_size(LLVMTypeOf(src))); result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, src, identity), LLVMTypeOf(identity), ""); + if (ctx->chip_class <= GFX9) + result = ac_build_dpp(ctx, identity, result, dpp_wf_sr1, 0xf, 0xf, false); Kinda annoying that we still do the inclusive/exclusive logic for gfx10 inside ac_build_scan. Can we keep this inside the function by using a intermediate src? Looks better indeed, I will make that change. result = ac_build_scan(ctx, op, result, identity, ctx->wave_size, false); return ac_build_wwm(ctx, result); -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] ac: fix exclusive scans on GFX8-GFX9
This fixes a regression introduced with scan operations on GFX10. Note that some subgroups CTS still fail on GFX10 but I assume it's a different issue. This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive*. Fixes: 227c29a80de "amd/common/gfx10: implement scan & reduce operations" Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 05871f5ea98..d72eaa2db46 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -4221,10 +4221,7 @@ ac_build_scan(struct ac_llvm_context *ctx, nir_op op, LLVMValueRef src, LLVMValu if (ctx->chip_class >= GFX10) { result = inclusive ? src : identity; } else { - if (inclusive) - result = src; - else - result = ac_build_dpp(ctx, identity, src, dpp_wf_sr1, 0xf, 0xf, false); + result = src; } if (maxprefix <= 1) return result; @@ -4333,6 +4330,8 @@ ac_build_exclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op op get_reduction_identity(ctx, op, ac_get_type_size(LLVMTypeOf(src))); result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, src, identity), LLVMTypeOf(identity), ""); + if (ctx->chip_class <= GFX9) + result = ac_build_dpp(ctx, identity, result, dpp_wf_sr1, 0xf, 0xf, false); result = ac_build_scan(ctx, op, result, identity, ctx->wave_size, false); return ac_build_wwm(ctx, result); -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv/gfx10: do not use NGG with NAVI14
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 64bd0d64401..c049a2844b8 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -2320,6 +2320,7 @@ radv_fill_shader_keys(struct radv_device *device, } if (device->physical_device->rad_info.chip_class >= GFX10 && + device->physical_device->rad_info.family != CHIP_NAVI14 && !(device->instance->debug_flags & RADV_DEBUG_NO_NGG)) { if (nir[MESA_SHADER_TESS_CTRL]) { keys[MESA_SHADER_TESS_EVAL].vs_common_out.as_ngg = true; -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radv/gfx10: don't initialize VGT_INSTANCE_STEP_RATE_0
Only gfx9 and older use it to get InstanceID in VGPR1. Ported from RadeonSI. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/si_cmd_buffer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index e37b9498a71..32674d38bb9 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -192,7 +192,8 @@ si_emit_graphics(struct radv_physical_device *physical_device, radeon_set_context_reg(cs, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0); } - radeon_set_context_reg(cs, R_028AA0_VGT_INSTANCE_STEP_RATE_0, 1); + if (physical_device->rad_info.chip_class <= GFX9) + radeon_set_context_reg(cs, R_028AA0_VGT_INSTANCE_STEP_RATE_0, 1); if (!physical_device->has_clear_state) radeon_set_context_reg(cs, R_028AB8_VGT_VTX_CNT_EN, 0x0); if (physical_device->rad_info.chip_class < GFX7) -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: implement VK_AMD_shader_core_properties2
Trivial extension that matches PAL. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 9 + src/amd/vulkan/radv_extensions.py | 1 + 2 files changed, 10 insertions(+) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index cc45ac95c08..5fde4577e4e 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -1300,6 +1300,15 @@ void radv_GetPhysicalDeviceProperties2( properties->vgprAllocationGranularity = 4; break; } + case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_CORE_PROPERTIES_2_AMD: { + VkPhysicalDeviceShaderCoreProperties2AMD *properties = + (VkPhysicalDeviceShaderCoreProperties2AMD *)ext; + + properties->shaderCoreFeatures = 0; + properties->activeComputeUnitCount = + pdevice->rad_info.num_good_compute_units; + break; + } case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VERTEX_ATTRIBUTE_DIVISOR_PROPERTIES_EXT: { VkPhysicalDeviceVertexAttributeDivisorPropertiesEXT *properties = (VkPhysicalDeviceVertexAttributeDivisorPropertiesEXT *)ext; diff --git a/src/amd/vulkan/radv_extensions.py b/src/amd/vulkan/radv_extensions.py index 3624970dd37..b28d74f5746 100644 --- a/src/amd/vulkan/radv_extensions.py +++ b/src/amd/vulkan/radv_extensions.py @@ -143,6 +143,7 @@ EXTENSIONS = [ Extension('VK_AMD_rasterization_order', 1, 'device->has_out_of_order_rast'), Extension('VK_AMD_shader_ballot', 1, 'device->use_shader_ballot'), Extension('VK_AMD_shader_core_properties',1, True), +Extension('VK_AMD_shader_core_properties2', 1, True), Extension('VK_AMD_shader_info', 1, True), Extension('VK_AMD_shader_trinary_minmax', 1, True), Extension('VK_GOOGLE_decorate_string',1, True), -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: allow to enable VK_AMD_shader_ballot only on GFX8+
Scans aren't implemented on SI/CIK. Cc: 19.2 Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 3 ++- src/amd/vulkan/radv_shader.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index cc45ac95c08..4aafe6e78aa 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -383,7 +383,8 @@ radv_physical_device_init(struct radv_physical_device *device, device->rad_info.family == CHIP_RENOIR || device->rad_info.chip_class >= GFX10; - device->use_shader_ballot = device->instance->perftest_flags & RADV_PERFTEST_SHADER_BALLOT; + device->use_shader_ballot = device->rad_info.chip_class >= GFX8 && + device->instance->perftest_flags & RADV_PERFTEST_SHADER_BALLOT; /* Determine the number of threads per wave for all stages. */ device->cs_wave_size = 64; diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 1e6a9a950d8..f2a8ac8abe3 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -297,7 +297,7 @@ radv_shader_compile_to_nir(struct radv_device *device, .lower_ubo_ssbo_access_to_offsets = true, .caps = { .amd_gcn_shader = true, - .amd_shader_ballot = device->instance->perftest_flags & RADV_PERFTEST_SHADER_BALLOT, + .amd_shader_ballot = device->physical_device->use_shader_ballot, .amd_trinary_minmax = true, .derivative_group = true, .descriptor_array_dynamic_indexing = true, -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radv: add a new debug option called RADV_DEBUG=noshaderballot
Shader ballot will be enabled by default for Wolfenstein Youngblood. This follows what we did for sisched. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_debug.h | 1 + src/amd/vulkan/radv_device.c | 1 + 2 files changed, 2 insertions(+) diff --git a/src/amd/vulkan/radv_debug.h b/src/amd/vulkan/radv_debug.h index ef5b331d188..1a8b9a42c20 100644 --- a/src/amd/vulkan/radv_debug.h +++ b/src/amd/vulkan/radv_debug.h @@ -53,6 +53,7 @@ enum { RADV_DEBUG_NOBINNING = 0x80, RADV_DEBUG_NO_LOAD_STORE_OPT = 0x100, RADV_DEBUG_NO_NGG= 0x200, + RADV_DEBUG_NO_SHADER_BALLOT = 0x400, }; enum { diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index cc45ac95c08..49518d43218 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -495,6 +495,7 @@ static const struct debug_control radv_debug_options[] = { {"nobinning", RADV_DEBUG_NOBINNING}, {"noloadstoreopt", RADV_DEBUG_NO_LOAD_STORE_OPT}, {"nongg", RADV_DEBUG_NO_NGG}, + {"noshaderballot", RADV_DEBUG_NO_SHADER_BALLOT}, {NULL, 0} }; -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv: force enable VK_AMD_shader_ballot for Wolfenstein Youngblood
This gives a nice boost, +20% at this time on my Vega 56. Shader ballot should be enabled by default at some point but it reduces performance a bit (-6%) with Wolfeinstein II. Enable it only for Youngblood at the moment, like what we did for Talos in the past. As a bonus point, it gets rid of some minor artifacts that only happens when ballot is disabled for some reasons. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 8 1 file changed, 8 insertions(+) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 49518d43218..c04f6a27e82 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -554,6 +554,14 @@ radv_handle_per_app_options(struct radv_instance *instance, */ if (HAVE_LLVM < 0x900) instance->debug_flags |= RADV_DEBUG_NO_LOAD_STORE_OPT; + } else if (!strcmp(name, "Wolfenstein: Youngblood")) { + if (!(instance->debug_flags & RADV_DEBUG_NO_SHADER_BALLOT)) { + /* Force enable VK_AMD_shader_ballot because it looks +* safe and it gives a nice boost (+20% on Vega 56 at +* this time). +*/ + instance->perftest_flags |= RADV_PERFTEST_SHADER_BALLOT; + } } } -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv/gfx10: hardcode some depth+stencil formats in the format table
The script doesn't handle them correctly and D16_UNORM_S8_UINT isn't supported by the hardware, mark it as invalid. This fixes warning when generating gfx10_format_table.h. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111393 Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/gfx10_format_table.py | 5 + 1 file changed, 5 insertions(+) diff --git a/src/amd/vulkan/gfx10_format_table.py b/src/amd/vulkan/gfx10_format_table.py index 81b0bed92aa..f55b302bf82 100644 --- a/src/amd/vulkan/gfx10_format_table.py +++ b/src/amd/vulkan/gfx10_format_table.py @@ -66,6 +66,11 @@ HARDCODED = { 'VK_FORMAT_BC6H_SFLOAT_BLOCK': hardcoded_format('BC6_SFLOAT'), 'VK_FORMAT_BC7_UNORM_BLOCK': hardcoded_format('BC7_UNORM'), 'VK_FORMAT_BC7_SRGB_BLOCK': hardcoded_format('BC7_SRGB'), + +# DS +'VK_FORMAT_D16_UNORM_S8_UINT': hardcoded_format('INVALID'), +'VK_FORMAT_D24_UNORM_S8_UINT': hardcoded_format('8_24_UNORM'), +'VK_FORMAT_D32_SFLOAT_S8_UINT': hardcoded_format('X24_8_32_FLOAT'), } -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radv/gfx10: tidy up gfx10_format_table.py
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/gfx10_format_table.py | 20 +--- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/src/amd/vulkan/gfx10_format_table.py b/src/amd/vulkan/gfx10_format_table.py index 34ad5f6cdf2..81b0bed92aa 100644 --- a/src/amd/vulkan/gfx10_format_table.py +++ b/src/amd/vulkan/gfx10_format_table.py @@ -21,7 +21,7 @@ # USE OR OTHER DEALINGS IN THE SOFTWARE. # """ -Script that generates the mapping from Gallium PIPE_FORMAT_xxx to gfx10 +Script that generates the mapping from Vulkan VK_FORMAT_xxx to gfx10 IMG_FORMAT_xxx enums. """ @@ -34,12 +34,10 @@ import re import sys AMD_REGISTERS = os.path.abspath(os.path.join(os.path.dirname(sys.argv[0]), "../registers")) -#GALLIUM_UTIL = os.path.abspath(os.path.join(os.path.dirname(sys.argv[0]), "../../auxiliary/util")) sys.path.extend([AMD_REGISTERS]) from regdb import Object, RegisterDatabase from vk_format_parse import * -#from u_format_parse import * # # Hard-coded mappings @@ -82,11 +80,11 @@ header_template = mako.template.Template("""\ ##__VA_ARGS__ } static const struct gfx10_format gfx10_format_table[VK_FORMAT_RANGE_SIZE] = { -% for pipe_format, args in formats: +% for vk_format, args in formats: % if args is not None: - [${pipe_format}] = FMT(${args}), + [${vk_format}] = FMT(${args}), % else: -/* ${pipe_format} is not supported */ +/* ${vk_format} is not supported */ % endif % endfor }; @@ -114,8 +112,8 @@ class Gfx10Format(object): class Gfx10FormatMapping(object): -def __init__(self, pipe_formats, gfx10_formats): -self.pipe_formats = pipe_formats +def __init__(self, vk_formats, gfx10_formats): +self.vk_formats = vk_formats self.gfx10_formats = gfx10_formats self.plain_gfx10_formats = dict( @@ -219,17 +217,17 @@ class Gfx10FormatMapping(object): if __name__ == '__main__': -pipe_formats = parse(sys.argv[1]) +vk_formats = parse(sys.argv[1]) with open(sys.argv[2], 'r') as filp: db = RegisterDatabase.from_json(json.load(filp)) gfx10_formats = [Gfx10Format(entry) for entry in db.enum('IMG_FORMAT').entries] -mapping = Gfx10FormatMapping(pipe_formats, gfx10_formats) +mapping = Gfx10FormatMapping(vk_formats, gfx10_formats) formats = [] -for fmt in pipe_formats: +for fmt in vk_formats: if fmt.name in HARDCODED: obj = HARDCODED[fmt.name] else: -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv/gfx10: do not emit PA_SC_TILE_STEERING_OVERRIDE twice
CLEAR_STATE emits it for us. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/si_cmd_buffer.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index a5057fe25a2..68ec925f2b5 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -366,8 +366,6 @@ si_emit_graphics(struct radv_physical_device *physical_device, radeon_set_context_reg(cs, R_028C50_PA_SC_NGG_MODE_CNTL, S_028C50_MAX_DEALLOCS_IN_WAVE(512)); radeon_set_context_reg(cs, R_028C58_VGT_VERTEX_REUSE_BLOCK_CNTL, 14); - radeon_set_context_reg(cs, R_02835C_PA_SC_TILE_STEERING_OVERRIDE, - physical_device->rad_info.pa_sc_tile_steering_override); radeon_set_context_reg(cs, R_02807C_DB_RMI_L2_CACHE_CONTROL, S_02807C_Z_WR_POLICY(V_02807C_CACHE_STREAM_WR) | S_02807C_S_WR_POLICY(V_02807C_CACHE_STREAM_WR) | -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radv: do not emit PKT3_CONTEXT_CONTROL with AMDGPU 3.6.0+
It's emitted by the kernel. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 9 ++--- src/amd/vulkan/si_cmd_buffer.c | 9 ++--- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 05d09bb08eb..110808fb98d 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -2008,9 +2008,12 @@ VkResult radv_CreateDevice( device->empty_cs[family] = device->ws->cs_create(device->ws, family); switch (family) { case RADV_QUEUE_GENERAL: - radeon_emit(device->empty_cs[family], PKT3(PKT3_CONTEXT_CONTROL, 1, 0)); - radeon_emit(device->empty_cs[family], CONTEXT_CONTROL_LOAD_ENABLE(1)); - radeon_emit(device->empty_cs[family], CONTEXT_CONTROL_SHADOW_ENABLE(1)); + /* Since amdgpu version 3.6.0, CONTEXT_CONTROL is emitted by the kernel */ + if (device->physical_device->rad_info.drm_minor < 6) { + radeon_emit(device->empty_cs[family], PKT3(PKT3_CONTEXT_CONTROL, 1, 0)); + radeon_emit(device->empty_cs[family], CONTEXT_CONTROL_LOAD_ENABLE(1)); + radeon_emit(device->empty_cs[family], CONTEXT_CONTROL_SHADOW_ENABLE(1)); + } break; case RADV_QUEUE_COMPUTE: radeon_emit(device->empty_cs[family], PKT3(PKT3_NOP, 0, 0)); diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index 701b2398b50..a5057fe25a2 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -161,9 +161,12 @@ si_emit_graphics(struct radv_physical_device *physical_device, { int i; - radeon_emit(cs, PKT3(PKT3_CONTEXT_CONTROL, 1, 0)); - radeon_emit(cs, CONTEXT_CONTROL_LOAD_ENABLE(1)); - radeon_emit(cs, CONTEXT_CONTROL_SHADOW_ENABLE(1)); + /* Since amdgpu version 3.6.0, CONTEXT_CONTROL is emitted by the kernel */ + if (physical_device->rad_info.drm_minor < 6) { + radeon_emit(cs, PKT3(PKT3_CONTEXT_CONTROL, 1, 0)); + radeon_emit(cs, CONTEXT_CONTROL_LOAD_ENABLE(1)); + radeon_emit(cs, CONTEXT_CONTROL_SHADOW_ENABLE(1)); + } if (physical_device->has_clear_state) { radeon_emit(cs, PKT3(PKT3_CLEAR_STATE, 0, 0)); -- 2.22.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: fix image_has_{cmask,fmask}() helpers
The driver should now rely on cmask_offset because CMASK can be disabled by the driver for some reasons (eg. mipmaps). Apply the same change for FMASK, although it should be useless. Fixes: ad1bc8621df ("radv: remove radv_get_image_fmask_info()") Fixes: 10d08da52c6 ("radv/gfx10: add missing dcc_tile_swizzle tweak") Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_private.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h index 49d3c78db98..ee0761e69fe 100644 --- a/src/amd/vulkan/radv_private.h +++ b/src/amd/vulkan/radv_private.h @@ -1633,7 +1633,7 @@ bool radv_layout_dcc_compressed(const struct radv_image *image, static inline bool radv_image_has_cmask(const struct radv_image *image) { - return image->planes[0].surface.cmask_size; + return image->cmask_offset; } /** @@ -1642,7 +1642,7 @@ radv_image_has_cmask(const struct radv_image *image) static inline bool radv_image_has_fmask(const struct radv_image *image) { - return image->planes[0].surface.fmask_size; + return image->fmask_offset; } /** -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv: remove radv_get_image_fmask_info()
It's unnecessary to duplicate fields in another struct. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 12 - src/amd/vulkan/radv_image.c | 44 +++- src/amd/vulkan/radv_meta_clear.c | 12 ++--- src/amd/vulkan/radv_private.h| 16 ++-- 4 files changed, 25 insertions(+), 59 deletions(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 9aa731a252c..b9db931c309 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -4406,9 +4406,9 @@ radv_initialise_color_surface(struct radv_device *device, if (radv_image_has_fmask(iview->image)) { if (device->physical_device->rad_info.chip_class >= GFX7) - cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(iview->image->fmask.pitch_in_pixels / 8 - 1); - cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(iview->image->fmask.tile_mode_index); - cb->cb_color_fmask_slice = S_028C88_TILE_MAX(iview->image->fmask.slice_tile_max); + cb->cb_color_pitch |= S_028C64_FMASK_TILE_MAX(surf->u.legacy.fmask.pitch_in_pixels / 8 - 1); + cb->cb_color_attrib |= S_028C74_FMASK_TILE_MODE_INDEX(surf->u.legacy.fmask.tiling_index); + cb->cb_color_fmask_slice = S_028C88_TILE_MAX(surf->u.legacy.fmask.slice_tile_max); } else { /* This must be set for fast clear to work without FMASK. */ if (device->physical_device->rad_info.chip_class >= GFX7) @@ -4449,9 +4449,9 @@ radv_initialise_color_surface(struct radv_device *device, } if (radv_image_has_fmask(iview->image)) { - va = radv_buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset; + va = radv_buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask_offset; cb->cb_color_fmask = va >> 8; - cb->cb_color_fmask |= iview->image->fmask.tile_swizzle; + cb->cb_color_fmask |= surf->fmask_tile_swizzle; } else { cb->cb_color_fmask = cb->cb_color_base; } @@ -4501,7 +4501,7 @@ radv_initialise_color_surface(struct radv_device *device, if (radv_image_has_fmask(iview->image)) { cb->cb_color_info |= S_028C70_COMPRESSION(1); if (device->physical_device->rad_info.chip_class == GFX6) { - unsigned fmask_bankh = util_logbase2(iview->image->fmask.bank_height); + unsigned fmask_bankh = util_logbase2(surf->u.legacy.fmask.bankh); cb->cb_color_attrib |= S_028C74_FMASK_BANK_HEIGHT(fmask_bankh); } diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index aaaf15ec8dc..efbb9de96b7 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -715,7 +715,7 @@ gfx10_make_texture_descriptor(struct radv_device *device, assert(image->plane_count == 1); - va = gpu_address + image->offset + image->fmask.offset; + va = gpu_address + image->offset + image->fmask_offset; switch (image->info.samples) { case 2: @@ -879,7 +879,7 @@ si_make_texture_descriptor(struct radv_device *device, assert(image->plane_count == 1); - va = gpu_address + image->offset + image->fmask.offset; + va = gpu_address + image->offset + image->fmask_offset; if (device->physical_device->rad_info.chip_class == GFX9) { fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK; @@ -915,7 +915,7 @@ si_make_texture_descriptor(struct radv_device *device, } fmask_state[0] = va >> 8; - fmask_state[0] |= image->fmask.tile_swizzle; + fmask_state[0] |= image->planes[0].surface.fmask_tile_swizzle; fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va >> 40) | S_008F14_DATA_FORMAT(fmask_format) | S_008F14_NUM_FORMAT(num_format); @@ -946,9 +946,9 @@ si_make_texture_descriptor(struct radv_device *device, fmask_state[7] |= va >> 8; } } else { - fmask_state[3] |= S_008F1C_TILING_INDEX(image->fmask.tile_mode_index); + fmask_state[3] |= S_008F1C_TILING_INDEX(image->planes[0].surface.u.legacy.fmask.tiling_index); fmask_state[4] |= S_008F20_DEPTH(depth - 1) | - S_008F20_PI
[Mesa-dev] [PATCH 1/2] radv: remove radv_get_image_cmask_info()
It's unnecessary to duplicate fields in another struct. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 4 ++-- src/amd/vulkan/radv_image.c | 38 +--- src/amd/vulkan/radv_meta_clear.c | 11 + src/amd/vulkan/radv_private.h| 13 ++- 4 files changed, 21 insertions(+), 45 deletions(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 29be192443a..9aa731a252c 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -4400,7 +4400,7 @@ radv_initialise_color_surface(struct radv_device *device, cb->cb_color_pitch = S_028C64_TILE_MAX(pitch_tile_max); cb->cb_color_slice = S_028C68_TILE_MAX(slice_tile_max); - cb->cb_color_cmask_slice = iview->image->cmask.slice_tile_max; + cb->cb_color_cmask_slice = surf->u.legacy.cmask_slice_tile_max; cb->cb_color_attrib |= S_028C74_TILE_MODE_INDEX(tile_mode_index); @@ -4420,7 +4420,7 @@ radv_initialise_color_surface(struct radv_device *device, /* CMASK variables */ va = radv_buffer_get_va(iview->bo) + iview->image->offset; - va += iview->image->cmask.offset; + va += iview->image->cmask_offset; cb->cb_color_cmask = va >> 8; va = radv_buffer_get_va(iview->bo) + iview->image->offset; diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index 8ff93e4344c..aaaf15ec8dc 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -939,7 +939,7 @@ si_make_texture_descriptor(struct radv_device *device, S_008F24_META_RB_ALIGNED(image->planes[0].surface.u.gfx9.cmask.rb_aligned); if (radv_image_is_tc_compat_cmask(image)) { - va = gpu_address + image->offset + image->cmask.offset; + va = gpu_address + image->offset + image->cmask_offset; fmask_state[5] |= S_008F24_META_DATA_ADDRESS(va >> 40); fmask_state[6] |= S_008F28_COMPRESSION_EN(1); @@ -952,7 +952,7 @@ si_make_texture_descriptor(struct radv_device *device, fmask_state[5] |= S_008F24_LAST_ARRAY(last_layer); if (radv_image_is_tc_compat_cmask(image)) { - va = gpu_address + image->offset + image->cmask.offset; + va = gpu_address + image->offset + image->cmask_offset; fmask_state[6] |= S_008F28_COMPRESSION_EN(1); fmask_state[7] |= va >> 8; @@ -1138,45 +1138,27 @@ radv_image_alloc_fmask(struct radv_device *device, image->alignment = MAX2(image->alignment, image->fmask.alignment); } -static void -radv_image_get_cmask_info(struct radv_device *device, - struct radv_image *image, - struct radv_cmask_info *out) -{ - assert(image->plane_count == 1); - - if (device->physical_device->rad_info.chip_class >= GFX9) { - out->alignment = image->planes[0].surface.cmask_alignment; - out->size = image->planes[0].surface.cmask_size; - return; - } - - out->slice_tile_max = image->planes[0].surface.u.legacy.cmask_slice_tile_max; - out->alignment = image->planes[0].surface.cmask_alignment; - out->slice_size = image->planes[0].surface.cmask_slice_size; - out->size = image->planes[0].surface.cmask_size; -} - static void radv_image_alloc_cmask(struct radv_device *device, struct radv_image *image) { + unsigned cmask_alignment = image->planes[0].surface.cmask_alignment; + unsigned cmask_size = image->planes[0].surface.cmask_size; uint32_t clear_value_size = 0; - radv_image_get_cmask_info(device, image, >cmask); - if (!image->cmask.size) + if (!cmask_size) return; - assert(image->cmask.alignment); + assert(cmask_alignment); - image->cmask.offset = align64(image->size, image->cmask.alignment); + image->cmask_offset = align64(image->size, cmask_alignment); /* + 8 for storing the clear values */ if (!image->clear_value_offset) { - image->clear_value_offset = image->cmask.offset + image->cmask.size; + image->clear_value_offset = image->cmask_offset + cmask_size; clear_value_size = 8; } - image->size = image->cmask.offset + image->cmask.size + clear_value_size; - image->alignment = MAX2(image->alignment, image->cmask.alignment); + image->size = image->
[Mesa-dev] [PATCH 1/2] radv: only account for tile_swizzle for color surfaces with DCC
It's 0 for depth surfaces with TC compat HTILE enabled. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_image.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index f3237dd5985..221b554e73e 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -483,6 +483,8 @@ si_set_mutable_tex_desc_fields(struct radv_device *device, meta_va = gpu_address + image->dcc_offset; if (chip_class <= GFX8) meta_va += base_level_info->dcc_offset; + + meta_va |= (uint32_t)plane->surface.tile_swizzle << 8; } else if (!is_storage_image && radv_image_is_tc_compat_htile(image)) { meta_va = gpu_address + image->htile_offset; @@ -490,10 +492,8 @@ si_set_mutable_tex_desc_fields(struct radv_device *device, if (meta_va) { state[6] |= S_008F28_COMPRESSION_EN(1); - if (chip_class <= GFX9) { + if (chip_class <= GFX9) state[7] = meta_va >> 8; - state[7] |= plane->surface.tile_swizzle; - } } } -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv/gfx10: add missing dcc_tile_swizzle tweak
Fixes: c90f46700dd ("radv/gfx10: mask DCC tile swizzle by alignment") Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_image.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index 221b554e73e..8ff93e4344c 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -484,7 +484,9 @@ si_set_mutable_tex_desc_fields(struct radv_device *device, if (chip_class <= GFX8) meta_va += base_level_info->dcc_offset; - meta_va |= (uint32_t)plane->surface.tile_swizzle << 8; + unsigned dcc_tile_swizzle = plane->surface.tile_swizzle << 8; + dcc_tile_swizzle &= plane->surface.dcc_alignment - 1; + meta_va |= dcc_tile_swizzle; } else if (!is_storage_image && radv_image_is_tc_compat_htile(image)) { meta_va = gpu_address + image->htile_offset; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] radv/gfx10: determine correct wave size when lowering subgroups
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_shader.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 97fa80b348c..f0ab2d5e467 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -124,6 +124,17 @@ unsigned shader_io_get_unique_index(gl_varying_slot slot) unreachable("illegal slot in get unique index\n"); } +static uint8_t +radv_get_shader_wave_size(const struct radv_physical_device *pdevice, + gl_shader_stage stage) +{ + if (stage == MESA_SHADER_COMPUTE) + return pdevice->cs_wave_size; + else if (stage == MESA_SHADER_FRAGMENT) + return pdevice->ps_wave_size; + return pdevice->ge_wave_size; +} + VkResult radv_CreateShaderModule( VkDevice_device, const VkShaderModuleCreateInfo* pCreateInfo, @@ -422,9 +433,13 @@ radv_shader_compile_to_nir(struct radv_device *device, nir_lower_global_vars_to_local(nir); nir_remove_dead_variables(nir, nir_var_function_temp); + + uint8_t wave_size = radv_get_shader_wave_size(device->physical_device, + nir->info.stage); + nir_lower_subgroups(nir, &(struct nir_lower_subgroups_options) { - .subgroup_size = 64, - .ballot_bit_size = 64, + .subgroup_size = wave_size, + .ballot_bit_size = wave_size, .lower_to_scalar = 1, .lower_subgroup_masks = 1, .lower_shuffle = 1, @@ -667,17 +682,6 @@ radv_get_shader_binary_size(size_t code_size) return code_size + DEBUGGER_NUM_MARKERS * 4; } -static uint8_t -radv_get_shader_wave_size(const struct radv_physical_device *pdevice, - gl_shader_stage stage) -{ - if (stage == MESA_SHADER_COMPUTE) - return pdevice->cs_wave_size; - else if (stage == MESA_SHADER_FRAGMENT) - return pdevice->ps_wave_size; - return pdevice->ge_wave_size; -} - static void radv_postprocess_config(const struct radv_physical_device *pdevice, const struct ac_shader_config *config_in, const struct radv_shader_variant_info *info, -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] radv/gfx10: use the correct target machine for Wave32
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_llvm_helper.cpp | 30 + src/amd/vulkan/radv_shader.c| 3 ++- src/amd/vulkan/radv_shader_helper.h | 3 ++- 3 files changed, 26 insertions(+), 10 deletions(-) diff --git a/src/amd/vulkan/radv_llvm_helper.cpp b/src/amd/vulkan/radv_llvm_helper.cpp index 2b14ddcf184..612548e4219 100644 --- a/src/amd/vulkan/radv_llvm_helper.cpp +++ b/src/amd/vulkan/radv_llvm_helper.cpp @@ -28,8 +28,10 @@ class radv_llvm_per_thread_info { public: radv_llvm_per_thread_info(enum radeon_family arg_family, - enum ac_target_machine_options arg_tm_options) - : family(arg_family), tm_options(arg_tm_options), passes(NULL) {} + enum ac_target_machine_options arg_tm_options, + unsigned arg_wave_size) + : family(arg_family), tm_options(arg_tm_options), + wave_size(arg_wave_size), passes(NULL), passes_wave32(NULL) {} ~radv_llvm_per_thread_info() { @@ -47,19 +49,28 @@ public: if (!passes) return false; + if (llvm_info.tm_wave32) { + passes_wave32 = ac_create_llvm_passes(llvm_info.tm_wave32); + if (!passes_wave32) + return false; + } + return true; } bool compile_to_memory_buffer(LLVMModuleRef module, char **pelf_buffer, size_t *pelf_size) { - return ac_compile_module_to_elf(passes, module, pelf_buffer, pelf_size); + struct ac_compiler_passes *p = wave_size == 32 ? passes_wave32 : passes; + return ac_compile_module_to_elf(p, module, pelf_buffer, pelf_size); } bool is_same(enum radeon_family arg_family, -enum ac_target_machine_options arg_tm_options) { +enum ac_target_machine_options arg_tm_options, +unsigned arg_wave_size) { if (arg_family == family && - arg_tm_options == tm_options) + arg_tm_options == tm_options && + arg_wave_size == wave_size) return true; return false; } @@ -67,7 +78,9 @@ public: private: enum radeon_family family; enum ac_target_machine_options tm_options; + unsigned wave_size; struct ac_compiler_passes *passes; + struct ac_compiler_passes *passes_wave32; }; /* we have to store a linked list per thread due to the possiblity of multiple gpus being required */ @@ -99,17 +112,18 @@ bool radv_compile_to_elf(struct ac_llvm_compiler *info, bool radv_init_llvm_compiler(struct ac_llvm_compiler *info, bool thread_compiler, enum radeon_family family, -enum ac_target_machine_options tm_options) +enum ac_target_machine_options tm_options, +unsigned wave_size) { if (thread_compiler) { for (auto : radv_llvm_per_thread_list) { - if (I.is_same(family, tm_options)) { + if (I.is_same(family, tm_options, wave_size)) { *info = I.llvm_info; return true; } } - radv_llvm_per_thread_list.emplace_back(family, tm_options); + radv_llvm_per_thread_list.emplace_back(family, tm_options, wave_size); radv_llvm_per_thread_info = radv_llvm_per_thread_list.back(); if (!tinfo.init()) { diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index f0ab2d5e467..5e3b1378a14 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -1163,7 +1163,8 @@ shader_variant_compile(struct radv_device *device, radv_init_llvm_once(); radv_init_llvm_compiler(_llvm, thread_compiler, - chip_family, tm_options); + chip_family, tm_options, + radv_get_shader_wave_size(device->physical_device, stage)); if (gs_copy_shader) { assert(shader_count == 1); radv_compile_gs_copy_shader(_llvm, *shaders, , diff --git a/src/amd/vulkan/radv_shader_helper.h b/src/amd/vulkan/radv_shader_helper.h index d9dace0b495..c64d2df676b 100644 --- a/src/amd/vulkan/radv_shader_helper.h +++ b/src/amd/vulkan/radv_shader_helper.h @@ -29,7 +29,8 @@ extern "C" { bool radv_init_llvm_compiler(struct ac_llvm_compiler *info, bool thread_compiler, enum radeon_family family, -
[Mesa-dev] [PATCH 1/4] radv/gfx10: add Wave32 support for fragment shaders
It can be enabled with RADV_PERFTEST=pswave32. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_debug.h | 1 + src/amd/vulkan/radv_device.c | 6 ++ src/amd/vulkan/radv_nir_to_llvm.c | 2 ++ src/amd/vulkan/radv_pipeline.c| 3 ++- src/amd/vulkan/radv_private.h | 1 + src/amd/vulkan/radv_shader.c | 4 +++- src/amd/vulkan/radv_shader.h | 1 + 7 files changed, 16 insertions(+), 2 deletions(-) diff --git a/src/amd/vulkan/radv_debug.h b/src/amd/vulkan/radv_debug.h index 6414e882676..65dbec6e90d 100644 --- a/src/amd/vulkan/radv_debug.h +++ b/src/amd/vulkan/radv_debug.h @@ -65,6 +65,7 @@ enum { RADV_PERFTEST_SHADER_BALLOT = 0x40, RADV_PERFTEST_TC_COMPAT_CMASK = 0x80, RADV_PERFTEST_CS_WAVE_32 = 0x100, + RADV_PERFTEST_PS_WAVE_32 = 0x200, }; bool diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 29be192443a..b66b15edf73 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -385,10 +385,15 @@ radv_physical_device_init(struct radv_physical_device *device, /* Determine the number of threads per wave for all stages. */ device->cs_wave_size = 64; + device->ps_wave_size = 64; if (device->rad_info.chip_class >= GFX10) { if (device->instance->perftest_flags & RADV_PERFTEST_CS_WAVE_32) device->cs_wave_size = 32; + + /* For pixel shaders, wave64 is recommanded. */ + if (device->instance->perftest_flags & RADV_PERFTEST_PS_WAVE_32) + device->ps_wave_size = 32; } radv_physical_device_init_mem_types(device); @@ -503,6 +508,7 @@ static const struct debug_control radv_perftest_options[] = { {"shader_ballot", RADV_PERFTEST_SHADER_BALLOT}, {"tccompatcmask", RADV_PERFTEST_TC_COMPAT_CMASK}, {"cswave32", RADV_PERFTEST_CS_WAVE_32}, + {"pswave32", RADV_PERFTEST_PS_WAVE_32}, {NULL, 0} }; diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index bb78bcccf0e..bba5849b152 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -4323,6 +4323,8 @@ radv_nir_shader_wave_size(struct nir_shader *const *shaders, int shader_count, { if (shaders[0]->info.stage == MESA_SHADER_COMPUTE) return options->cs_wave_size; + else if (shaders[0]->info.stage == MESA_SHADER_FRAGMENT) + return options->ps_wave_size; return 64; } diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index d62066cbee4..dbfe261c982 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -4060,7 +4060,8 @@ radv_pipeline_generate_fragment_shader(struct radeon_cmdbuf *ctx_cs, ps->config.spi_ps_input_addr); radeon_set_context_reg(ctx_cs, R_0286D8_SPI_PS_IN_CONTROL, - S_0286D8_NUM_INTERP(ps->info.fs.num_interp)); + S_0286D8_NUM_INTERP(ps->info.fs.num_interp) | + S_0286D8_PS_W32_EN(pipeline->device->physical_device->ps_wave_size == 32)); radeon_set_context_reg(ctx_cs, R_0286E0_SPI_BARYC_CNTL, pipeline->graphics.spi_baryc_cntl); diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h index 143c09811c8..a1347060190 100644 --- a/src/amd/vulkan/radv_private.h +++ b/src/amd/vulkan/radv_private.h @@ -302,6 +302,7 @@ struct radv_physical_device { bool has_dcc_constant_encode; /* Number of threads per wave. */ + uint8_t ps_wave_size; uint8_t cs_wave_size; /* This is the drivers on-disk cache used as a fallback as opposed to diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 9c88ab551bb..48ed86c99b1 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -673,7 +673,8 @@ radv_get_shader_wave_size(const struct radv_physical_device *pdevice, { if (stage == MESA_SHADER_COMPUTE) return pdevice->cs_wave_size; - + else if (stage == MESA_SHADER_FRAGMENT) + return pdevice->ps_wave_size; return 64; } @@ -1142,6 +1143,7 @@ shader_variant_compile(struct radv_device *device, options->tess_offchip_block_dw_size = device->tess_offchip_block_dw_size; options->address32_hi = device->physical_device->rad_info.address32_hi; options->cs_wave_size = device->physical_device->cs_wave_size; + options->ps_wave_size = device->physical_device->ps_wave_size; if (options->supports_spill) tm_options |= AC_TM_SUPPORTS_SPILL; diff --git a/src/amd/vulkan/radv_shader.h b/src/amd/vulkan/radv_shader.h index 92ae2a7259d..0ef4962
[Mesa-dev] [PATCH 2/4] radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders
It can be enabled with RADV_PERFTEST=gewave32. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_debug.h | 1 + src/amd/vulkan/radv_device.c | 5 + src/amd/vulkan/radv_nir_to_llvm.c | 13 +++-- src/amd/vulkan/radv_pipeline.c| 10 +- src/amd/vulkan/radv_private.h | 1 + src/amd/vulkan/radv_shader.c | 3 ++- src/amd/vulkan/radv_shader.h | 1 + 7 files changed, 26 insertions(+), 8 deletions(-) diff --git a/src/amd/vulkan/radv_debug.h b/src/amd/vulkan/radv_debug.h index 65dbec6e90d..ef5b331d188 100644 --- a/src/amd/vulkan/radv_debug.h +++ b/src/amd/vulkan/radv_debug.h @@ -66,6 +66,7 @@ enum { RADV_PERFTEST_TC_COMPAT_CMASK = 0x80, RADV_PERFTEST_CS_WAVE_32 = 0x100, RADV_PERFTEST_PS_WAVE_32 = 0x200, + RADV_PERFTEST_GE_WAVE_32 = 0x400, }; bool diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index b66b15edf73..fc961040b6e 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -386,6 +386,7 @@ radv_physical_device_init(struct radv_physical_device *device, /* Determine the number of threads per wave for all stages. */ device->cs_wave_size = 64; device->ps_wave_size = 64; + device->ge_wave_size = 64; if (device->rad_info.chip_class >= GFX10) { if (device->instance->perftest_flags & RADV_PERFTEST_CS_WAVE_32) @@ -394,6 +395,9 @@ radv_physical_device_init(struct radv_physical_device *device, /* For pixel shaders, wave64 is recommanded. */ if (device->instance->perftest_flags & RADV_PERFTEST_PS_WAVE_32) device->ps_wave_size = 32; + + if (device->instance->perftest_flags & RADV_PERFTEST_GE_WAVE_32) + device->ge_wave_size = 32; } radv_physical_device_init_mem_types(device); @@ -509,6 +513,7 @@ static const struct debug_control radv_perftest_options[] = { {"tccompatcmask", RADV_PERFTEST_TC_COMPAT_CMASK}, {"cswave32", RADV_PERFTEST_CS_WAVE_32}, {"pswave32", RADV_PERFTEST_PS_WAVE_32}, + {"gewave32", RADV_PERFTEST_GE_WAVE_32}, {NULL, 0} }; diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index bba5849b152..91251aa69bd 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -295,7 +295,7 @@ get_tcs_num_patches(struct radv_shader_context *ctx) /* GFX6 bug workaround - limit LS-HS threadgroups to only one wave. */ if (ctx->options->chip_class == GFX6) { - unsigned one_wave = 64 / MAX2(num_tcs_input_cp, num_tcs_output_cp); + unsigned one_wave = ctx->options->ge_wave_size / MAX2(num_tcs_input_cp, num_tcs_output_cp); num_patches = MIN2(num_patches, one_wave); } return num_patches; @@ -3038,7 +3038,8 @@ handle_es_outputs_post(struct radv_shader_context *ctx, LLVMValueRef wave_idx = ac_unpack_param(>ac, ctx->merged_wave_info, 24, 4); vertex_idx = LLVMBuildOr(ctx->ac.builder, vertex_idx, LLVMBuildMul(ctx->ac.builder, wave_idx, - LLVMConstInt(ctx->ac.i32, 64, false), ""), ""); + LLVMConstInt(ctx->ac.i32, + ctx->ac.wave_size, false), ""), ""); lds_base = LLVMBuildMul(ctx->ac.builder, vertex_idx, LLVMConstInt(ctx->ac.i32, itemsize_dw, 0), ""); } @@ -3140,7 +3141,7 @@ static LLVMValueRef get_thread_id_in_tg(struct radv_shader_context *ctx) LLVMBuilderRef builder = ctx->ac.builder; LLVMValueRef tmp; tmp = LLVMBuildMul(builder, get_wave_id_in_tg(ctx), - LLVMConstInt(ctx->ac.i32, 64, false), ""); + LLVMConstInt(ctx->ac.i32, ctx->ac.wave_size, false), ""); return LLVMBuildAdd(builder, tmp, ac_get_thread_id(>ac), ""); } @@ -4190,7 +4191,7 @@ ac_setup_rings(struct radv_shader_context *ctx) */ LLVMTypeRef v2i64 = LLVMVectorType(ctx->ac.i64, 2); uint64_t stream_offset = 0; - unsigned num_records = 64; + unsigned num_records = ctx->ac.wave_size; LLVMValueRef base_ring; base_ring = @@ -4223,7 +4224,7 @@ ac_setup_rings(struct radv_shader_context *ctx) ring = LLVMBuildInsertElement(ctx->ac.builder,
[Mesa-dev] [PATCH 2/6] radv/gfx10: implement a bug workaround for NGG -> legacy transitions
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_cmd_buffer.c | 14 ++ src/amd/vulkan/si_cmd_buffer.c | 9 +++-- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c index 37026246aa9..cf3f81b2031 100644 --- a/src/amd/vulkan/radv_cmd_buffer.c +++ b/src/amd/vulkan/radv_cmd_buffer.c @@ -3626,6 +3626,20 @@ void radv_CmdBindPipeline( /* Prefetch all pipeline shaders at first draw time. */ cmd_buffer->state.prefetch_L2_mask |= RADV_PREFETCH_SHADERS; + if ((cmd_buffer->device->physical_device->rad_info.family == CHIP_NAVI10 || +cmd_buffer->device->physical_device->rad_info.family == CHIP_NAVI12 || +cmd_buffer->device->physical_device->rad_info.family == CHIP_NAVI14) && + cmd_buffer->state.emitted_pipeline && + radv_pipeline_has_ngg(cmd_buffer->state.emitted_pipeline) && + !radv_pipeline_has_ngg(cmd_buffer->state.pipeline)) { + /* Transitioning from NGG to legacy GS requires +* VGT_FLUSH on Navi10-14. VGT_FLUSH is also emitted +* at the beginning of IBs when legacy GS ring pointers +* are set. +*/ + cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_VGT_FLUSH; + } + radv_bind_dynamic_state(cmd_buffer, >dynamic_state); radv_bind_streamout_state(cmd_buffer, pipeline); diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index 94f759139ee..18b2236e54b 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -878,8 +878,7 @@ gfx10_cs_emit_cache_flush(struct radeon_cmdbuf *cs, unsigned cb_db_event = 0; /* We don't need these. */ - assert(!(flush_bits & (RADV_CMD_FLAG_VGT_FLUSH | - RADV_CMD_FLAG_VGT_STREAMOUT_SYNC))); + assert(!(flush_bits & (RADV_CMD_FLAG_VGT_STREAMOUT_SYNC))); if (flush_bits & RADV_CMD_FLAG_INV_ICACHE) gcr_cntl |= S_586_GLI_INV(V_586_GLI_ALL); @@ -998,6 +997,12 @@ gfx10_cs_emit_cache_flush(struct radeon_cmdbuf *cs, *flush_cnt, 0x); } + /* VGT state sync */ + if (flush_bits & RADV_CMD_FLAG_VGT_FLUSH) { + radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); + radeon_emit(cs, EVENT_TYPE(V_028A90_VGT_FLUSH) | EVENT_INDEX(0)); + } + /* Ignore fields that only modify the behavior of other fields. */ if (gcr_cntl & C_586_GL1_RANGE & C_586_GL2_RANGE & C_586_SEQ) { /* Flush caches and wait for the caches to assert idle. -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] radv/gfx10: remove an obsolete VGT_REUSE_OFF workaround
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 4d4f86a7e24..b3952846f43 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3641,12 +3641,6 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, S_028A84_PRIMITIVEID_EN(es_enable_prim_id) | S_028A84_NGG_DISABLE_PROVOK_REUSE(es_enable_prim_id)); - bool vgt_reuse_off = pipeline->device->physical_device->rad_info.family == CHIP_NAVI10 && - pipeline->device->physical_device->rad_info.chip_external_rev == 0x1 && -es_type == MESA_SHADER_TESS_EVAL; - - radeon_set_context_reg(ctx_cs, R_028AB4_VGT_REUSE_OFF, - S_028AB4_REUSE_OFF(vgt_reuse_off)); radeon_set_context_reg(ctx_cs, R_028AAC_VGT_ESGS_RING_ITEMSIZE, ngg_state->vgt_esgs_ring_itemsize); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] radv/gfx10: disable LATE_ALLOC_GS on Navi14
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/si_cmd_buffer.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index 3d6c672dd0f..d48ed804e63 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -311,6 +311,7 @@ si_emit_graphics(struct radv_physical_device *physical_device, late_alloc_limit = (num_cu_per_sh - 2) * 4; } + unsigned late_alloc_limit_gs = late_alloc_limit; unsigned cu_mask_vs = 0x; unsigned cu_mask_gs = 0x; @@ -324,6 +325,12 @@ si_emit_graphics(struct radv_physical_device *physical_device, } } + /* Don't use late alloc for NGG on Navi14 due to a hw bug. */ + if (physical_device->rad_info.family == CHIP_NAVI14) { + late_alloc_limit_gs = 0; + cu_mask_gs = 0x; + } + radeon_set_sh_reg_idx(physical_device, cs, R_00B118_SPI_SHADER_PGM_RSRC3_VS, 3, S_00B118_CU_EN(cu_mask_vs) | S_00B118_WAVE_LIMIT(0x3F)); @@ -336,7 +343,7 @@ si_emit_graphics(struct radv_physical_device *physical_device, if (physical_device->rad_info.chip_class >= GFX10) { radeon_set_sh_reg_idx(physical_device, cs, R_00B204_SPI_SHADER_PGM_RSRC4_GS, 3, S_00B204_CU_EN(0x) | - S_00B204_SPI_SHADER_LATE_ALLOC_GS_GFX10(late_alloc_limit)); + S_00B204_SPI_SHADER_LATE_ALLOC_GS_GFX10(late_alloc_limit_gs)); } radeon_set_sh_reg_idx(physical_device, cs, R_00B01C_SPI_SHADER_PGM_RSRC3_PS, -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] radv/gfx10: implement a bug workaround for GE_PC_ALLOC
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 17 - src/amd/vulkan/si_cmd_buffer.c | 13 + 2 files changed, 13 insertions(+), 17 deletions(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 5c913f29a5a..4d4f86a7e24 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3456,18 +3456,6 @@ radv_pipeline_generate_vgt_gs_mode(struct radeon_cmdbuf *ctx_cs, radeon_set_context_reg(ctx_cs, R_028A40_VGT_GS_MODE, vgt_gs_mode); } -static void -gfx10_set_ge_pc_alloc(struct radeon_cmdbuf *ctx_cs, - struct radv_pipeline *pipeline, - bool culling) -{ - struct radeon_info *info = >device->physical_device->rad_info; - - radeon_set_uconfig_reg(ctx_cs, R_030980_GE_PC_ALLOC, - S_030980_OVERSUB_EN(1) | - S_030980_NUM_PC_LINES((culling ? 256 : 128) * info->max_se - 1)); -} - static void radv_pipeline_generate_hw_vs(struct radeon_cmdbuf *ctx_cs, struct radeon_cmdbuf *cs, @@ -3534,9 +3522,6 @@ radv_pipeline_generate_hw_vs(struct radeon_cmdbuf *ctx_cs, if (pipeline->device->physical_device->rad_info.chip_class <= GFX8) radeon_set_context_reg(ctx_cs, R_028AB4_VGT_REUSE_OFF, outinfo->writes_viewport_index); - - if (pipeline->device->physical_device->rad_info.chip_class >= GFX10) - gfx10_set_ge_pc_alloc(ctx_cs, pipeline, false); } static void @@ -3699,8 +3684,6 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, S_03096C_PRIM_GRP_SIZE(ngg_state->max_gsprims) | S_03096C_VERT_GRP_SIZE(ngg_state->hw_max_esverts) | S_03096C_BREAK_WAVE_AT_EOI(break_wave_at_eoi)); - - gfx10_set_ge_pc_alloc(ctx_cs, pipeline, false); } static void diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index 18b2236e54b..3d6c672dd0f 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -382,6 +382,19 @@ si_emit_graphics(struct radv_physical_device *physical_device, S_00B0C0_SOFT_GROUPING_EN(1) | S_00B0C0_NUMBER_OF_REQUESTS_PER_CU(4 - 1)); radeon_set_sh_reg(cs, R_00B1C0_SPI_SHADER_REQ_CTRL_VS, 0); + + if (physical_device->rad_info.family == CHIP_NAVI10 || + physical_device->rad_info.family == CHIP_NAVI12 || + physical_device->rad_info.family == CHIP_NAVI14) { + /* SQ_NON_EVENT must be emitted before GE_PC_ALLOC is written. */ + radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); + radeon_emit(cs, EVENT_TYPE(V_028A90_SQ_NON_EVENT) | EVENT_INDEX(0)); + } + + /* TODO: For culling, replace 128 with 256. */ + radeon_set_uconfig_reg(cs, R_030980_GE_PC_ALLOC, + S_030980_OVERSUB_EN(1) | + S_030980_NUM_PC_LINES(128 * physical_device->rad_info.max_se - 1)); } if (physical_device->rad_info.chip_class >= GFX8) { -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] radv: skip draw calls with 0-sized index buffers
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_cmd_buffer.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c index e0ea47b5745..37026246aa9 100644 --- a/src/amd/vulkan/radv_cmd_buffer.c +++ b/src/amd/vulkan/radv_cmd_buffer.c @@ -4323,6 +4323,12 @@ radv_emit_draw_packets(struct radv_cmd_buffer *cmd_buffer, int index_size = radv_get_vgt_index_size(state->index_type); uint64_t index_va; + /* Skip draw calls with 0-sized index buffers. They +* cause a hang on some chips, like Navi10-14. +*/ + if (!cmd_buffer->state.max_index_count) + return; + index_va = state->index_va; index_va += info->first_index * index_size; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] radv/gfx10: implement a GE bug workaround
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 27 +++ 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index b3952846f43..d62066cbee4 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3592,6 +3592,7 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, bool es_enable_prim_id = outinfo->export_prim_id || (es && es->info.info.uses_prim_id); bool break_wave_at_eoi = false; + unsigned ge_cntl; unsigned nparams; if (es_type == MESA_SHADER_TESS_EVAL) { @@ -3674,10 +3675,28 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, S_028838_INDEX_BUF_EDGE_FLAG_ENA(!radv_pipeline_has_tess(pipeline) && !radv_pipeline_has_gs(pipeline))); - radeon_set_uconfig_reg(ctx_cs, R_03096C_GE_CNTL, - S_03096C_PRIM_GRP_SIZE(ngg_state->max_gsprims) | - S_03096C_VERT_GRP_SIZE(ngg_state->hw_max_esverts) | - S_03096C_BREAK_WAVE_AT_EOI(break_wave_at_eoi)); + ge_cntl = S_03096C_PRIM_GRP_SIZE(ngg_state->max_gsprims) | + S_03096C_VERT_GRP_SIZE(ngg_state->hw_max_esverts) | + S_03096C_BREAK_WAVE_AT_EOI(break_wave_at_eoi); + + /* Bug workaround for a possible hang with non-tessellation cases. +* Tessellation always sets GE_CNTL.VERT_GRP_SIZE = 0 +* +* Requirement: GE_CNTL.VERT_GRP_SIZE = VGT_GS_ONCHIP_CNTL.ES_VERTS_PER_SUBGRP - 5 +*/ + if ((pipeline->device->physical_device->rad_info.family == CHIP_NAVI10 || +pipeline->device->physical_device->rad_info.family == CHIP_NAVI12 || +pipeline->device->physical_device->rad_info.family == CHIP_NAVI14) && + !radv_pipeline_has_tess(pipeline) && + ngg_state->hw_max_esverts != 256) { + ge_cntl &= C_03096C_VERT_GRP_SIZE; + + if (ngg_state->hw_max_esverts > 5) { + ge_cntl |= S_03096C_VERT_GRP_SIZE(ngg_state->hw_max_esverts - 5); + } + } + + radeon_set_uconfig_reg(ctx_cs, R_03096C_GE_CNTL, ge_cntl); } static void -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv/gfx10: add Wave32 support for compute shaders
It can be enabled with RADV_PERFTEST=cswave32. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_debug.h | 1 + src/amd/vulkan/radv_device.c | 12 +++- src/amd/vulkan/radv_nir_to_llvm.c | 14 +- src/amd/vulkan/radv_pipeline.c| 3 ++- src/amd/vulkan/radv_private.h | 3 +++ src/amd/vulkan/radv_shader.c | 25 ++--- src/amd/vulkan/radv_shader.h | 1 + 7 files changed, 53 insertions(+), 6 deletions(-) diff --git a/src/amd/vulkan/radv_debug.h b/src/amd/vulkan/radv_debug.h index 723fabda57f..6414e882676 100644 --- a/src/amd/vulkan/radv_debug.h +++ b/src/amd/vulkan/radv_debug.h @@ -64,6 +64,7 @@ enum { RADV_PERFTEST_BO_LIST= 0x20, RADV_PERFTEST_SHADER_BALLOT = 0x40, RADV_PERFTEST_TC_COMPAT_CMASK = 0x80, + RADV_PERFTEST_CS_WAVE_32 = 0x100, }; bool diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 65e3ccf91ad..29be192443a 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -383,6 +383,14 @@ radv_physical_device_init(struct radv_physical_device *device, device->use_shader_ballot = device->instance->perftest_flags & RADV_PERFTEST_SHADER_BALLOT; + /* Determine the number of threads per wave for all stages. */ + device->cs_wave_size = 64; + + if (device->rad_info.chip_class >= GFX10) { + if (device->instance->perftest_flags & RADV_PERFTEST_CS_WAVE_32) + device->cs_wave_size = 32; + } + radv_physical_device_init_mem_types(device); radv_fill_device_extension_table(device, >supported_extensions); @@ -494,6 +502,7 @@ static const struct debug_control radv_perftest_options[] = { {"bolist", RADV_PERFTEST_BO_LIST}, {"shader_ballot", RADV_PERFTEST_SHADER_BALLOT}, {"tccompatcmask", RADV_PERFTEST_TC_COMPAT_CMASK}, + {"cswave32", RADV_PERFTEST_CS_WAVE_32}, {NULL, 0} }; @@ -1930,7 +1939,8 @@ VkResult radv_CreateDevice( device->scratch_waves = MAX2(32 * physical_device->rad_info.num_good_compute_units, max_threads_per_block / 64); - device->dispatch_initiator = S_00B800_COMPUTE_SHADER_EN(1); + device->dispatch_initiator = S_00B800_COMPUTE_SHADER_EN(1) | + S_00B800_CS_W32_EN(device->physical_device->cs_wave_size == 32); if (device->physical_device->rad_info.chip_class >= GFX7) { /* If the KMD allows it (there is a KMD hw register for it), diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 020c6d17771..feaab8f6370 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -4317,6 +4317,15 @@ static void declare_esgs_ring(struct radv_shader_context *ctx) LLVMSetAlignment(ctx->esgs_ring, 64 * 1024); } +static uint8_t +radv_nir_shader_wave_size(struct nir_shader *const *shaders, int shader_count, + const struct radv_nir_compiler_options *options) +{ + if (shaders[0]->info.stage == MESA_SHADER_COMPUTE) + return options->cs_wave_size; + return 64; +} + static LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm, struct nir_shader *const *shaders, @@ -4333,8 +4342,11 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm, options->unsafe_math ? AC_FLOAT_MODE_UNSAFE_FP_MATH : AC_FLOAT_MODE_DEFAULT; + uint8_t wave_size = radv_nir_shader_wave_size(shaders, + shader_count, options); + ac_llvm_context_init(, ac_llvm, options->chip_class, -options->family, float_mode, 64); +options->family, float_mode, wave_size); ctx.context = ctx.ac.context; radv_nir_shader_info_init(_info->info); diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 583b600dfdd..6b8b7bbe25a 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -4648,7 +4648,8 @@ radv_compute_generate_pm4(struct radv_pipeline *pipeline) threads_per_threadgroup = compute_shader->info.cs.block_size[0] * compute_shader->info.cs.block_size[1] * compute_shader->info.cs.block_size[2]; - waves_per_threadgroup = DIV_ROUND_UP(threads_per_threadgroup, 64); + waves_per_threadgroup = DIV_ROUND_UP(threads_per_threadgroup, + device->physical_device->cs_wave_size); if (device->physical_device->rad_info.chip_class >= G
[Mesa-dev] [PATCH] radv/gfx10: only compile the GS copy shader on-demand
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 583b600dfdd..e11196bd82e 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -2626,7 +2626,8 @@ void radv_create_shaders(struct radv_pipeline *pipeline, if(modules[MESA_SHADER_GEOMETRY]) { struct radv_shader_binary *gs_copy_binary = NULL; - if (!pipeline->gs_copy_shader) { + if (!pipeline->gs_copy_shader && + !radv_pipeline_has_ngg(pipeline)) { pipeline->gs_copy_shader = radv_create_gs_copy_shader( device, nir[MESA_SHADER_GEOMETRY], _copy_binary, keys[MESA_SHADER_GEOMETRY].has_multiview_view_index); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv/gfx10: do not use the fast depth or stencil clear bytes path
On 7/29/19 2:30 PM, Bas Nieuwenhuizen wrote: On Mon, Jul 29, 2019 at 2:20 PM Samuel Pitoiset wrote: On 7/29/19 2:15 PM, Bas Nieuwenhuizen wrote: On Mon, Jul 29, 2019 at 2:11 PM Samuel Pitoiset wrote: The HTILE masks seem to be different and so we need to rework that path. Just disabled for now and implement later. The HTILE masks are not different per amdvlk? Can you at least rework the commit message to reflect that? "It needs to be reworked on GFX10, so just disable it for now." ? How about just "It causes issues on GFX10"? We don't know it needs to be reworked either? Looks like it needs but whatever, I'm fine with that, so Rb? This fixes rendering issues with vkmark and Wreckfest at least. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_meta_clear.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/amd/vulkan/radv_meta_clear.c b/src/amd/vulkan/radv_meta_clear.c index b93ba3e0b29..8ddc2e38cd4 100644 --- a/src/amd/vulkan/radv_meta_clear.c +++ b/src/amd/vulkan/radv_meta_clear.c @@ -1005,7 +1005,7 @@ radv_can_fast_clear_depth(struct radv_cmd_buffer *cmd_buffer, if (!view_mask && clear_rect->layerCount != iview->image->info.array_size) return false; - if (cmd_buffer->device->physical_device->rad_info.chip_class < GFX9 && + if (cmd_buffer->device->physical_device->rad_info.chip_class != GFX9 && (!(aspects & VK_IMAGE_ASPECT_DEPTH_BIT) || ((vk_format_aspects(iview->image->vk_format) & VK_IMAGE_ASPECT_STENCIL_BIT) && !(aspects & VK_IMAGE_ASPECT_STENCIL_BIT @@ -1048,7 +1048,8 @@ radv_fast_clear_depth(struct radv_cmd_buffer *cmd_buffer, iview->image->planes[0].surface.htile_size, clear_word); } else { /* Only clear depth or stencil bytes in the HTILE buffer. */ - assert(cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9); + /* TODO: Implement that path for GFX10. */ + assert(cmd_buffer->device->physical_device->rad_info.chip_class == GFX9); flush_bits = clear_htile_mask(cmd_buffer, iview->image->bo, iview->image->offset + iview->image->htile_offset, iview->image->planes[0].surface.htile_size, clear_word, -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv/gfx10: do not use the fast depth or stencil clear bytes path
On 7/29/19 2:15 PM, Bas Nieuwenhuizen wrote: On Mon, Jul 29, 2019 at 2:11 PM Samuel Pitoiset wrote: The HTILE masks seem to be different and so we need to rework that path. Just disabled for now and implement later. The HTILE masks are not different per amdvlk? Can you at least rework the commit message to reflect that? "It needs to be reworked on GFX10, so just disable it for now." ? This fixes rendering issues with vkmark and Wreckfest at least. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_meta_clear.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/amd/vulkan/radv_meta_clear.c b/src/amd/vulkan/radv_meta_clear.c index b93ba3e0b29..8ddc2e38cd4 100644 --- a/src/amd/vulkan/radv_meta_clear.c +++ b/src/amd/vulkan/radv_meta_clear.c @@ -1005,7 +1005,7 @@ radv_can_fast_clear_depth(struct radv_cmd_buffer *cmd_buffer, if (!view_mask && clear_rect->layerCount != iview->image->info.array_size) return false; - if (cmd_buffer->device->physical_device->rad_info.chip_class < GFX9 && + if (cmd_buffer->device->physical_device->rad_info.chip_class != GFX9 && (!(aspects & VK_IMAGE_ASPECT_DEPTH_BIT) || ((vk_format_aspects(iview->image->vk_format) & VK_IMAGE_ASPECT_STENCIL_BIT) && !(aspects & VK_IMAGE_ASPECT_STENCIL_BIT @@ -1048,7 +1048,8 @@ radv_fast_clear_depth(struct radv_cmd_buffer *cmd_buffer, iview->image->planes[0].surface.htile_size, clear_word); } else { /* Only clear depth or stencil bytes in the HTILE buffer. */ - assert(cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9); + /* TODO: Implement that path for GFX10. */ + assert(cmd_buffer->device->physical_device->rad_info.chip_class == GFX9); flush_bits = clear_htile_mask(cmd_buffer, iview->image->bo, iview->image->offset + iview->image->htile_offset, iview->image->planes[0].surface.htile_size, clear_word, -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv/gfx10: do not use the fast depth or stencil clear bytes path
The HTILE masks seem to be different and so we need to rework that path. Just disabled for now and implement later. This fixes rendering issues with vkmark and Wreckfest at least. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_meta_clear.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/amd/vulkan/radv_meta_clear.c b/src/amd/vulkan/radv_meta_clear.c index b93ba3e0b29..8ddc2e38cd4 100644 --- a/src/amd/vulkan/radv_meta_clear.c +++ b/src/amd/vulkan/radv_meta_clear.c @@ -1005,7 +1005,7 @@ radv_can_fast_clear_depth(struct radv_cmd_buffer *cmd_buffer, if (!view_mask && clear_rect->layerCount != iview->image->info.array_size) return false; - if (cmd_buffer->device->physical_device->rad_info.chip_class < GFX9 && + if (cmd_buffer->device->physical_device->rad_info.chip_class != GFX9 && (!(aspects & VK_IMAGE_ASPECT_DEPTH_BIT) || ((vk_format_aspects(iview->image->vk_format) & VK_IMAGE_ASPECT_STENCIL_BIT) && !(aspects & VK_IMAGE_ASPECT_STENCIL_BIT @@ -1048,7 +1048,8 @@ radv_fast_clear_depth(struct radv_cmd_buffer *cmd_buffer, iview->image->planes[0].surface.htile_size, clear_word); } else { /* Only clear depth or stencil bytes in the HTILE buffer. */ - assert(cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9); + /* TODO: Implement that path for GFX10. */ + assert(cmd_buffer->device->physical_device->rad_info.chip_class == GFX9); flush_bits = clear_htile_mask(cmd_buffer, iview->image->bo, iview->image->offset + iview->image->htile_offset, iview->image->planes[0].surface.htile_size, clear_word, -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] ac: do not crash when the buffer data format is invalid
This might happen when a pipeline doesn't define the vertex input state, so the buffer data format is 0 (aka INVALID). This fixes crashes when compiling some shaders on GFX10. Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 250bfc5229e..278f8893432 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1508,6 +1508,7 @@ ac_get_tbuffer_format(struct ac_llvm_context *ctx, unsigned format; switch (dfmt) { default: unreachable("bad dfmt"); + case V_008F0C_BUF_DATA_FORMAT_INVALID: format = V_008F0C_IMG_FORMAT_INVALID; break; case V_008F0C_BUF_DATA_FORMAT_8: format = V_008F0C_IMG_FORMAT_8_UINT; break; case V_008F0C_BUF_DATA_FORMAT_8_8: format = V_008F0C_IMG_FORMAT_8_8_UINT; break; case V_008F0C_BUF_DATA_FORMAT_8_8_8_8: format = V_008F0C_IMG_FORMAT_8_8_8_8_UINT; break; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: implement VK_EXT_index_type_uint8
Natively supported on VI+. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_cmd_buffer.c | 60 +++ src/amd/vulkan/radv_device.c | 6 src/amd/vulkan/radv_extensions.py | 1 + 3 files changed, 61 insertions(+), 6 deletions(-) diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c index d9783e6ca8a..e0ea47b5745 100644 --- a/src/amd/vulkan/radv_cmd_buffer.c +++ b/src/amd/vulkan/radv_cmd_buffer.c @@ -2541,6 +2541,21 @@ struct radv_draw_info { uint64_t strmout_buffer_offset; }; +static uint32_t +radv_get_primitive_reset_index(struct radv_cmd_buffer *cmd_buffer) +{ + switch (cmd_buffer->state.index_type) { + case V_028A7C_VGT_INDEX_8: + return 0xffu; + case V_028A7C_VGT_INDEX_16: + return 0xu; + case V_028A7C_VGT_INDEX_32: + return 0xu; + default: + unreachable("invalid index type"); + } +} + static void si_emit_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer, bool instanced_draw, bool indirect_draw, @@ -2612,7 +2627,7 @@ radv_emit_draw_registers(struct radv_cmd_buffer *cmd_buffer, if (primitive_reset_en) { uint32_t primitive_reset_index = - state->index_type ? 0xu : 0xu; + radv_get_primitive_reset_index(cmd_buffer); if (primitive_reset_index != state->last_primitive_reset_index) { radeon_set_context_reg(cs, @@ -3233,6 +3248,36 @@ void radv_CmdBindVertexBuffers( cmd_buffer->state.dirty |= RADV_CMD_DIRTY_VERTEX_BUFFER; } +static uint32_t +vk_to_index_type(VkIndexType type) +{ + switch (type) { + case VK_INDEX_TYPE_UINT8_EXT: + return V_028A7C_VGT_INDEX_8; + case VK_INDEX_TYPE_UINT16: + return V_028A7C_VGT_INDEX_16; + case VK_INDEX_TYPE_UINT32: + return V_028A7C_VGT_INDEX_32; + default: + unreachable("invalid index type"); + } +} + +static uint32_t +radv_get_vgt_index_size(uint32_t type) +{ + switch (type) { + case V_028A7C_VGT_INDEX_8: + return 1; + case V_028A7C_VGT_INDEX_16: + return 2; + case V_028A7C_VGT_INDEX_32: + return 4; + default: + unreachable("invalid index type"); + } +} + void radv_CmdBindIndexBuffer( VkCommandBuffer commandBuffer, VkBuffer buffer, @@ -3251,12 +3296,12 @@ void radv_CmdBindIndexBuffer( cmd_buffer->state.index_buffer = index_buffer; cmd_buffer->state.index_offset = offset; - cmd_buffer->state.index_type = indexType; /* vk matches hw */ + cmd_buffer->state.index_type = vk_to_index_type(indexType); cmd_buffer->state.index_va = radv_buffer_get_va(index_buffer->bo); cmd_buffer->state.index_va += index_buffer->offset + offset; - int index_size_shift = cmd_buffer->state.index_type ? 2 : 1; - cmd_buffer->state.max_index_count = (index_buffer->size - offset) >> index_size_shift; + int index_size = radv_get_vgt_index_size(indexType); + cmd_buffer->state.max_index_count = (index_buffer->size - offset) / index_size; cmd_buffer->state.dirty |= RADV_CMD_DIRTY_INDEX_BUFFER; radv_cs_add_buffer(cmd_buffer->device->ws, cmd_buffer->cs, index_buffer->bo); } @@ -4275,7 +4320,7 @@ radv_emit_draw_packets(struct radv_cmd_buffer *cmd_buffer, } if (info->indexed) { - int index_size = state->index_type ? 4 : 2; + int index_size = radv_get_vgt_index_size(state->index_type); uint64_t index_va; index_va = state->index_va; @@ -4354,8 +4399,11 @@ static bool radv_need_late_scissor_emission(struct radv_cmd_buffer *cmd_buffer, if (cmd_buffer->state.dirty & used_states) return true; + uint32_t primitive_reset_index = + radv_get_primitive_reset_index(cmd_buffer); + if (info->indexed && state->pipeline->graphics.prim_restart_enable && - (state->index_type ? 0xu : 0xu) != state->last_primitive_reset_index) + primitive_reset_index != state->last_primitive_reset_index) return true; return false; diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 9ba100df6e8..65e3ccf91ad 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -987,6 +987,12 @@ void radv_GetPhysicalDeviceFeatures2( features->uniformBufferStandardLayout = true; break; } +
Re: [Mesa-dev] [PATCH] radv: Set correct metadata size for GFX9+.
Reviewed-by: Samuel Pitoiset On 7/25/19 4:55 PM, Bas Nieuwenhuizen wrote: Without correct size, radeonsi assumes the metadata is incorrect, which can and will cause issues. Since the metadata is really incorrect without the size, let us fix that. Fixes: e43cc3e3afc "radv/gfx9: handle GFX9 opaque metadata" --- src/amd/vulkan/radv_image.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index 0941cbb..541ff4086f4 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -1034,7 +1034,8 @@ radv_query_opaque_metadata(struct radv_device *device, for (i = 0; i <= image->info.levels - 1; i++) md->metadata[10+i] = image->planes[0].surface.u.legacy.level[i].offset >> 8; md->size_metadata = (11 + image->info.levels - 1) * 4; - } + } else + md->size_metadata = 10 * 4; } void ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv/gfx10: Disable DCC with scanout.
It's already disabled later in this function? On 7/25/19 4:34 PM, Bas Nieuwenhuizen wrote: (a) radv does not set the DCC fields required yet. (b) radeonsi just broke their DCC metadata. Fixes: f8b6c5a1a63 "radeonsi: rewrite si_get_opaque_metadata, also for gfx10 support" --- src/amd/vulkan/radv_image.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index 0941cbb..4bcdb70214a 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -161,6 +161,9 @@ radv_use_dcc_for_image(struct radv_device *device, if (image->shareable) return false; + if (radv_surface_has_scanout(device, create_info)) + return false; + /* TODO: Enable DCC for storage images. */ if ((pCreateInfo->usage & VK_IMAGE_USAGE_STORAGE_BIT) || (pCreateInfo->flags & VK_IMAGE_CREATE_EXTENDED_USAGE_BIT)) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv/gfx10: use L2 for DMA copy/fill operations
On 7/25/19 3:39 PM, Bas Nieuwenhuizen wrote: r-b though it sounds like some of our cache flushes might be not ideal. Yes. On Thu, Jul 25, 2019 at 3:35 PM Samuel Pitoiset wrote: It's coherent and faster. GFX7-GFX9 should also support this but for now only uses L2 for GFX10 because it's untested on previous gens. This fixes dEQP-VK.memory.pipeline_barrier.transfer_* This also fixes some missing geometry in Dawn Of War III because VBOs weren't updated correctly. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/si_cmd_buffer.c | 16 1 file changed, 16 insertions(+) diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index 21a90cb2514..94f759139ee 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -1501,6 +1501,14 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer, unsigned dma_flags = 0; unsigned byte_count = MIN2(size, cp_dma_max_byte_count(cmd_buffer)); + if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX10) { + /* DMA operations via L2 are coherent and faster. +* TODO: GFX7-GFX9 should also support this but it +* requires tests/benchmarks. +*/ + dma_flags |= CP_DMA_USE_L2; + } + si_cp_dma_prepare(cmd_buffer, byte_count, size + skipped_size + realign_size, _flags); @@ -1545,6 +1553,14 @@ void si_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, uint64_t va, unsigned byte_count = MIN2(size, cp_dma_max_byte_count(cmd_buffer)); unsigned dma_flags = CP_DMA_CLEAR; + if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX10) { + /* DMA operations via L2 are coherent and faster. +* TODO: GFX7-GFX9 should also support this but it +* requires tests/benchmarks. +*/ + dma_flags |= CP_DMA_USE_L2; + } + si_cp_dma_prepare(cmd_buffer, byte_count, size, _flags); /* Emit the clear packet. */ -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv/gfx10: use L2 for DMA copy/fill operations
It's coherent and faster. GFX7-GFX9 should also support this but for now only uses L2 for GFX10 because it's untested on previous gens. This fixes dEQP-VK.memory.pipeline_barrier.transfer_* This also fixes some missing geometry in Dawn Of War III because VBOs weren't updated correctly. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/si_cmd_buffer.c | 16 1 file changed, 16 insertions(+) diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index 21a90cb2514..94f759139ee 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -1501,6 +1501,14 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer, unsigned dma_flags = 0; unsigned byte_count = MIN2(size, cp_dma_max_byte_count(cmd_buffer)); + if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX10) { + /* DMA operations via L2 are coherent and faster. +* TODO: GFX7-GFX9 should also support this but it +* requires tests/benchmarks. +*/ + dma_flags |= CP_DMA_USE_L2; + } + si_cp_dma_prepare(cmd_buffer, byte_count, size + skipped_size + realign_size, _flags); @@ -1545,6 +1553,14 @@ void si_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, uint64_t va, unsigned byte_count = MIN2(size, cp_dma_max_byte_count(cmd_buffer)); unsigned dma_flags = CP_DMA_CLEAR; + if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX10) { + /* DMA operations via L2 are coherent and faster. +* TODO: GFX7-GFX9 should also support this but it +* requires tests/benchmarks. +*/ + dma_flags |= CP_DMA_USE_L2; + } + si_cp_dma_prepare(cmd_buffer, byte_count, size, _flags); /* Emit the clear packet. */ -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] radv/gfx10: fix intensity formats by setting ALPHA_IS_ON_MSB
This fixes dEQP-VK.rasterization.primitive_size.points.point_size_* This also fixes some black squares with the Sascha SSAO demo. v2: - do not set for multiple channels - call vi_alpha_is_on_msb() for pre-GFX10 - remove unused 'swap' Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_image.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index 0941cbb..d46946269e6 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -617,6 +617,15 @@ static unsigned gfx9_border_color_swizzle(const enum vk_swizzle swizzle[4]) return bc_swizzle; } +static bool vi_alpha_is_on_msb(struct radv_device *device, VkFormat format) +{ + const struct vk_format_description *desc = vk_format_description(format); + + if (device->physical_device->rad_info.chip_class >= GFX10 && desc->nr_channels == 1) + return desc->swizzle[3] == VK_SWIZZLE_X; + + return radv_translate_colorswap(format, false) <= 1; +} /** * Build the sampler view descriptor for a texture (GFX10). */ @@ -691,11 +700,9 @@ gfx10_make_texture_descriptor(struct radv_device *device, state[7] = 0; if (radv_dcc_enabled(image, first_level)) { - unsigned swap = radv_translate_colorswap(vk_format, FALSE); - state[6] |= S_00A018_MAX_UNCOMPRESSED_BLOCK_SIZE(V_028C78_MAX_BLOCK_SIZE_256B) | S_00A018_MAX_COMPRESSED_BLOCK_SIZE(V_028C78_MAX_BLOCK_SIZE_128B) | - S_00A018_ALPHA_IS_ON_MSB(swap <= 1); + S_00A018_ALPHA_IS_ON_MSB(vi_alpha_is_on_msb(device, vk_format)); } /* Initialize the sampler view for FMASK. */ @@ -849,9 +856,7 @@ si_make_texture_descriptor(struct radv_device *device, state[5] |= S_008F24_LAST_ARRAY(last_layer); } if (image->dcc_offset) { - unsigned swap = radv_translate_colorswap(vk_format, FALSE); - - state[6] = S_008F28_ALPHA_IS_ON_MSB(swap <= 1); + state[6] = S_008F28_ALPHA_IS_ON_MSB(vi_alpha_is_on_msb(device, vk_format)); } else { /* The last dword is unused by hw. The shader uses it to clear * bits in the first dword of sampler state. -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv/gfx10: fix intensity formats by setting ALPHA_IS_ON_MSB
This fixes dEQP-VK.rasterization.primitive_size.points.point_size_* This also fixes some black squares with the Sascha SSAO demo. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_image.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index 0941cbb..59d6d0ced78 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -617,6 +617,19 @@ static unsigned gfx9_border_color_swizzle(const enum vk_swizzle swizzle[4]) return bc_swizzle; } +static bool vi_alpha_is_on_msb(struct radv_device *device, VkFormat format) +{ + const struct vk_format_description *desc = vk_format_description(format); + + /* Formats with 3 channels can't have alpha. */ + if (desc->nr_channels == 3) + return true; /* same as xxxA; is any value OK here? */ + + if (device->physical_device->rad_info.chip_class >= GFX10 && desc->nr_channels == 1) + return desc->swizzle[3] == VK_SWIZZLE_X; + + return radv_translate_colorswap(format, false) <= 1; +} /** * Build the sampler view descriptor for a texture (GFX10). */ @@ -695,7 +708,7 @@ gfx10_make_texture_descriptor(struct radv_device *device, state[6] |= S_00A018_MAX_UNCOMPRESSED_BLOCK_SIZE(V_028C78_MAX_BLOCK_SIZE_256B) | S_00A018_MAX_COMPRESSED_BLOCK_SIZE(V_028C78_MAX_BLOCK_SIZE_128B) | - S_00A018_ALPHA_IS_ON_MSB(swap <= 1); + S_00A018_ALPHA_IS_ON_MSB(vi_alpha_is_on_msb(device, vk_format)); } /* Initialize the sampler view for FMASK. */ -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/5] radv/gfx10: do not enable NGG if a pipeline uses XFB
On 7/23/19 9:31 PM, Bas Nieuwenhuizen wrote: On Tue, Jul 23, 2019 at 3:21 PM Samuel Pitoiset wrote: NGG GS for streamout requires a bunch of work, so enable it with the legacy path only for now. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 28 1 file changed, 28 insertions(+) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index a7ff0e2d139..0903e5abf37 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -33,6 +33,7 @@ #include "radv_shader.h" #include "nir/nir.h" #include "nir/nir_builder.h" +#include "nir/nir_xfb_info.h" #include "spirv/nir_spirv.h" #include "vk_util.h" @@ -2269,6 +2270,16 @@ radv_generate_graphics_pipeline_key(struct radv_pipeline *pipeline, return key; } +static bool +radv_nir_stage_uses_xfb(const nir_shader *nir) +{ + nir_xfb_info *xfb = nir_gather_xfb_info(nir, NULL); + bool uses_xfb = !!xfb; + + ralloc_free(xfb); + return uses_xfb; +} + static void radv_fill_shader_keys(struct radv_device *device, struct radv_shader_variant_key *keys, @@ -2321,6 +2332,23 @@ radv_fill_shader_keys(struct radv_device *device, */ keys[MESA_SHADER_TESS_EVAL].vs_common_out.as_ngg = false; } + + /* TODO: Implement streamout support for NGG. */ + bool uses_xfb = false; + if ((nir[MESA_SHADER_VERTEX] && +radv_nir_stage_uses_xfb(nir[MESA_SHADER_VERTEX])) || + (nir[MESA_SHADER_TESS_EVAL] && +radv_nir_stage_uses_xfb(nir[MESA_SHADER_TESS_EVAL])) || + (nir[MESA_SHADER_GEOMETRY] && +radv_nir_stage_uses_xfb(nir[MESA_SHADER_GEOMETRY]))) + uses_xfb = true; transform feedback can only happen on the last stage before PS right? Can we first determine what the last shader is and only then check for xfb? That way we don't have to scan 3 shaders. Yes. Pushed with this slightly improved. + + if (uses_xfb) { + if (nir[MESA_SHADER_TESS_CTRL]) + keys[MESA_SHADER_TESS_EVAL].vs_common_out.as_ngg = false; + else + keys[MESA_SHADER_VERTEX].vs_common_out.as_ngg = false; + } } for(int i = 0; i < MESA_SHADER_STAGES; ++i) -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/5] radv/gfx10: update streamout descriptors
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_cmd_buffer.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c index 84d627340e9..c2e3f3b5fd0 100644 --- a/src/amd/vulkan/radv_cmd_buffer.c +++ b/src/amd/vulkan/radv_cmd_buffer.c @@ -2419,8 +2419,15 @@ radv_flush_streamout_descriptors(struct radv_cmd_buffer *cmd_buffer) desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) | S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) | S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) | - S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) | - S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32); + S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W); + + if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX10) { + desc[3] |= S_008F0C_FORMAT(V_008F0C_IMG_FORMAT_32_FLOAT) | + S_008F0C_OOB_SELECT(3) | + S_008F0C_RESOURCE_LEVEL(1); + } else { + desc[3] |= S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32); + } } va = radv_buffer_get_va(cmd_buffer->upload.upload_bo); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/5] radv/gfx10: emit streamout shader config
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_shader.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 5fd1022b05a..56f421026b7 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -690,7 +690,12 @@ static void radv_postprocess_config(const struct radv_physical_device *pdevice, config_out->float_mode |= V_00B028_FP_64_DENORMS; config_out->rsrc2 = S_00B12C_USER_SGPR(info->num_user_sgprs) | - S_00B12C_SCRATCH_EN(scratch_enabled); + S_00B12C_SCRATCH_EN(scratch_enabled) | + S_00B12C_SO_BASE0_EN(!!info->info.so.strides[0]) | + S_00B12C_SO_BASE1_EN(!!info->info.so.strides[1]) | + S_00B12C_SO_BASE2_EN(!!info->info.so.strides[2]) | + S_00B12C_SO_BASE3_EN(!!info->info.so.strides[3]) | + S_00B12C_SO_EN(!!info->info.so.num_outputs); config_out->rsrc1 = S_00B848_VGPRS((num_vgprs - 1) / 4) | S_00B848_DX10_CLAMP(1) | @@ -700,12 +705,7 @@ static void radv_postprocess_config(const struct radv_physical_device *pdevice, config_out->rsrc2 |= S_00B22C_USER_SGPR_MSB_GFX10(info->num_user_sgprs >> 5); } else { config_out->rsrc1 |= S_00B228_SGPRS((num_sgprs - 1) / 8); - config_out->rsrc2 |= S_00B22C_USER_SGPR_MSB_GFX9(info->num_user_sgprs >> 5) | - S_00B12C_SO_BASE0_EN(!!info->info.so.strides[0]) | - S_00B12C_SO_BASE1_EN(!!info->info.so.strides[1]) | - S_00B12C_SO_BASE2_EN(!!info->info.so.strides[2]) | - S_00B12C_SO_BASE3_EN(!!info->info.so.strides[3]) | - S_00B12C_SO_EN(!!info->info.so.num_outputs); + config_out->rsrc2 |= S_00B22C_USER_SGPR_MSB_GFX9(info->num_user_sgprs >> 5); } switch (stage) { -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/5] radv/gfx10: do not enable NGG if a pipeline uses XFB
NGG GS for streamout requires a bunch of work, so enable it with the legacy path only for now. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 28 1 file changed, 28 insertions(+) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index a7ff0e2d139..0903e5abf37 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -33,6 +33,7 @@ #include "radv_shader.h" #include "nir/nir.h" #include "nir/nir_builder.h" +#include "nir/nir_xfb_info.h" #include "spirv/nir_spirv.h" #include "vk_util.h" @@ -2269,6 +2270,16 @@ radv_generate_graphics_pipeline_key(struct radv_pipeline *pipeline, return key; } +static bool +radv_nir_stage_uses_xfb(const nir_shader *nir) +{ + nir_xfb_info *xfb = nir_gather_xfb_info(nir, NULL); + bool uses_xfb = !!xfb; + + ralloc_free(xfb); + return uses_xfb; +} + static void radv_fill_shader_keys(struct radv_device *device, struct radv_shader_variant_key *keys, @@ -2321,6 +2332,23 @@ radv_fill_shader_keys(struct radv_device *device, */ keys[MESA_SHADER_TESS_EVAL].vs_common_out.as_ngg = false; } + + /* TODO: Implement streamout support for NGG. */ + bool uses_xfb = false; + if ((nir[MESA_SHADER_VERTEX] && +radv_nir_stage_uses_xfb(nir[MESA_SHADER_VERTEX])) || + (nir[MESA_SHADER_TESS_EVAL] && +radv_nir_stage_uses_xfb(nir[MESA_SHADER_TESS_EVAL])) || + (nir[MESA_SHADER_GEOMETRY] && +radv_nir_stage_uses_xfb(nir[MESA_SHADER_GEOMETRY]))) + uses_xfb = true; + + if (uses_xfb) { + if (nir[MESA_SHADER_TESS_CTRL]) + keys[MESA_SHADER_TESS_EVAL].vs_common_out.as_ngg = false; + else + keys[MESA_SHADER_VERTEX].vs_common_out.as_ngg = false; + } } for(int i = 0; i < MESA_SHADER_STAGES; ++i) -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/5] radv/gfx10: enable VK_EXT_transform_feedback
When a pipeline uses transform feedback, the driver fallbacks to the legacy path because NGG support for streamout is a non-trivial amount of work. AMDVLK also uses the legacy path for streamout, while RadeonSI uses the new NGG path. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_extensions.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_extensions.py b/src/amd/vulkan/radv_extensions.py index e9addad0035..8e1d61dfaaf 100644 --- a/src/amd/vulkan/radv_extensions.py +++ b/src/amd/vulkan/radv_extensions.py @@ -129,7 +129,7 @@ EXTENSIONS = [ Extension('VK_EXT_shader_stencil_export', 1, True), Extension('VK_EXT_shader_subgroup_ballot',1, True), Extension('VK_EXT_shader_subgroup_vote', 1, True), -Extension('VK_EXT_transform_feedback',1, 'device->rad_info.chip_class < GFX10'), +Extension('VK_EXT_transform_feedback',1, True), Extension('VK_EXT_vertex_attribute_divisor', 3, True), Extension('VK_EXT_ycbcr_image_arrays',1, True), Extension('VK_AMD_buffer_marker', 1, True), -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/5] radv/gfx10: declare streamout user SGPRs
Required for legacy streamout. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index cf73cdc692b..020c6d17771 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -876,9 +876,6 @@ declare_streamout_sgprs(struct radv_shader_context *ctx, gl_shader_stage stage, { int i; - if (ctx->ac.chip_class >= GFX10) - return; - /* Streamout SGPRs. */ if (ctx->shader_info->info.so.num_outputs) { assert(stage == MESA_SHADER_VERTEX || -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3] radv/gfx10: fix VS input VGPRs with the legacy path
For some reasons, InstanceID is VGPR3 although StepRate0 is set to 1. v3: fix instanceID input VGPR for geometry v2: fix instanceID Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 12 +--- src/amd/vulkan/radv_shader.c | 8 ++-- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 336bae28614..cf73cdc692b 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -852,9 +852,15 @@ declare_vs_input_vgprs(struct radv_shader_context *ctx, struct arg_info *args) } } else { if (ctx->ac.chip_class >= GFX10) { - add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ - add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ - add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); + if (ctx->options->key.vs_common_out.as_ngg) { + add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ + add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ + add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); + } else { + add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* unused */ + add_arg(args, ARG_VGPR, ctx->ac.i32, >vs_prim_id); + add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); + } } else { add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); add_arg(args, ARG_VGPR, ctx->ac.i32, >vs_prim_id); diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 3adaf52e152..06122664a13 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -765,7 +765,7 @@ static void radv_postprocess_config(const struct radv_physical_device *pdevice, if (info->vs.export_prim_id) { vgpr_comp_cnt = 2; } else if (info->info.vs.needs_instance_id) { - vgpr_comp_cnt = 1; + vgpr_comp_cnt = pdevice->rad_info.chip_class >= GFX10 ? 3 : 1; } else { vgpr_comp_cnt = 0; } @@ -837,7 +837,11 @@ static void radv_postprocess_config(const struct radv_physical_device *pdevice, if (es_type == MESA_SHADER_VERTEX) { /* VGPR0-3: (VertexID, InstanceID / StepRate0, ...) */ - es_vgpr_comp_cnt = info->info.vs.needs_instance_id ? 1 : 0; + if (info->info.vs.needs_instance_id) { + es_vgpr_comp_cnt = pdevice->rad_info.chip_class >= GFX10 ? 3 : 1; + } else { + es_vgpr_comp_cnt = 0; + } } else if (es_type == MESA_SHADER_TESS_EVAL) { es_vgpr_comp_cnt = info->info.uses_prim_id ? 3 : 2; } else { -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] radv/gfx10: fix VS input VGPRs with the legacy path
For some reasons, InstanceID is VGPR3 although StepRate0 is set to 1. v2: fix instanceID Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 12 +--- src/amd/vulkan/radv_shader.c | 2 +- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 336bae28614..cf73cdc692b 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -852,9 +852,15 @@ declare_vs_input_vgprs(struct radv_shader_context *ctx, struct arg_info *args) } } else { if (ctx->ac.chip_class >= GFX10) { - add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ - add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ - add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); + if (ctx->options->key.vs_common_out.as_ngg) { + add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ + add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ + add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); + } else { + add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* unused */ + add_arg(args, ARG_VGPR, ctx->ac.i32, >vs_prim_id); + add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); + } } else { add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); add_arg(args, ARG_VGPR, ctx->ac.i32, >vs_prim_id); diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 3adaf52e152..3d1b56e7f60 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -765,7 +765,7 @@ static void radv_postprocess_config(const struct radv_physical_device *pdevice, if (info->vs.export_prim_id) { vgpr_comp_cnt = 2; } else if (info->info.vs.needs_instance_id) { - vgpr_comp_cnt = 1; + vgpr_comp_cnt = pdevice->rad_info.chip_class >= GFX10 ? 3 : 1; } else { vgpr_comp_cnt = 0; } -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv/gfx10: fix VS input VGPRs with the legacy path
On 7/23/19 1:37 PM, Bas Nieuwenhuizen wrote: So does this work with tests that use multiple instances? Apparently no. If so, r-b. On Tue, Jul 23, 2019 at 1:29 PM Samuel Pitoiset wrote: Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 336bae28614..9cea92e8a69 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -851,7 +851,8 @@ declare_vs_input_vgprs(struct radv_shader_context *ctx, struct arg_info *args) add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* unused */ } } else { - if (ctx->ac.chip_class >= GFX10) { + if (ctx->ac.chip_class >= GFX10 && + ctx->options->key.vs_common_out.as_ngg) { add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv/gfx10: fix VS input VGPRs with the legacy path
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 336bae28614..9cea92e8a69 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -851,7 +851,8 @@ declare_vs_input_vgprs(struct radv_shader_context *ctx, struct arg_info *args) add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* unused */ } } else { - if (ctx->ac.chip_class >= GFX10) { + if (ctx->ac.chip_class >= GFX10 && + ctx->options->key.vs_common_out.as_ngg) { add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ add_arg(args, ARG_VGPR, ctx->ac.i32, NULL); /* user vgpr */ add_arg(args, ARG_VGPR, ctx->ac.i32, >abi.instance_id); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: fix dumping disassembly with RADV_DEBUG=shaders
Fixes: a20a9d0c5e7 ("radv: dont store disasm string unless keep_shader_info flag set") Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_shader.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 3adaf52e152..736388c555c 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -1013,7 +1013,8 @@ radv_shader_variant_create(struct radv_device *device, return NULL; } - if (device->keep_shader_info) { + if (device->keep_shader_info || + (device->instance->debug_flags & RADV_DEBUG_DUMP_SHADERS)) { const char *disasm_data; size_t disasm_size; if (!ac_rtld_get_section_by_name(_binary, ".AMDGPU.disasm", _data, _size)) { -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv/gfx10: enable CLEAR_state
It actually works. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 992e12840f7..93b03afda22 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -354,8 +354,7 @@ radv_physical_device_init(struct radv_physical_device *device, /* The mere presence of CLEAR_STATE in the IB causes random GPU hangs * on GFX6. */ - device->has_clear_state = device->rad_info.chip_class >= GFX7 && - device->rad_info.chip_class <= GFX9; + device->has_clear_state = device->rad_info.chip_class >= GFX7; device->cpdma_prefetch_writes_memory = device->rad_info.chip_class <= GFX8; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] radv/gfx10: correctly determine the number of vertices per primitive
On 7/22/19 6:01 PM, Ilia Mirkin wrote: On Mon, Jul 22, 2019 at 11:49 AM Samuel Pitoiset wrote: For TES as NGG. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 336bae28614..6e5a283f923 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -112,6 +112,7 @@ struct radv_shader_context { unsigned gs_max_out_vertices; unsigned gs_output_prim; + unsigned tes_point_mode; unsigned tes_primitive_mode; uint32_t tcs_patch_outputs_read; @@ -3304,7 +3305,6 @@ handle_ngg_outputs_post(struct radv_shader_context *ctx) { LLVMBuilderRef builder = ctx->ac.builder; struct ac_build_if_state if_state; - unsigned num_vertices = 3; LLVMValueRef tmp; assert((ctx->stage == MESA_SHADER_VERTEX || @@ -3322,6 +3322,20 @@ handle_ngg_outputs_post(struct radv_shader_context *ctx) ac_unpack_param(>ac, ctx->gs_vtx_offset[2], 0, 16), }; + /* Determine the number of vertices per primitive. */ + unsigned num_vertices; + + if (ctx->stage == MESA_SHADER_VERTEX) { + num_vertices = 3; /* TODO: optimize for points & lines */ + } else { + if (ctx->tes_point_mode) + num_vertices = 1; + else if (ctx->tes_primitive_mode == GL_LINES) + num_vertices = 2; + else + num_vertices = 3; + } + /* TODO: streamout */ /* Copy Primitive IDs from GS threads to the LDS address corresponding @@ -4435,6 +4449,7 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm, ctx.tcs_num_inputs = util_last_bit64(shader_info->info.vs.ls_outputs_written); ctx.tcs_num_patches = get_tcs_num_patches(); } else if (shaders[i]->info.stage == MESA_SHADER_TESS_EVAL) { + ctx.tes_point_mode = shaders[i]->info.tess.point_mode; Drive-by-comment without reading the full context... What if there's e.g. a GS which produces not-points? This bool will be set, and the logic above will say num_vertices = 1, which presumably is bad. With GS, TES is emitted as ES and this function isn't called because it's for NGG only. -ilia ctx.tes_primitive_mode = shaders[i]->info.tess.primitive_mode; ctx.abi.load_tess_varyings = load_tes_input; ctx.abi.load_tess_coord = load_tess_coord; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv/gfx10: correctly determine the number of vertices per primitive
For TES as NGG. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 336bae28614..6e5a283f923 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -112,6 +112,7 @@ struct radv_shader_context { unsigned gs_max_out_vertices; unsigned gs_output_prim; + unsigned tes_point_mode; unsigned tes_primitive_mode; uint32_t tcs_patch_outputs_read; @@ -3304,7 +3305,6 @@ handle_ngg_outputs_post(struct radv_shader_context *ctx) { LLVMBuilderRef builder = ctx->ac.builder; struct ac_build_if_state if_state; - unsigned num_vertices = 3; LLVMValueRef tmp; assert((ctx->stage == MESA_SHADER_VERTEX || @@ -3322,6 +3322,20 @@ handle_ngg_outputs_post(struct radv_shader_context *ctx) ac_unpack_param(>ac, ctx->gs_vtx_offset[2], 0, 16), }; + /* Determine the number of vertices per primitive. */ + unsigned num_vertices; + + if (ctx->stage == MESA_SHADER_VERTEX) { + num_vertices = 3; /* TODO: optimize for points & lines */ + } else { + if (ctx->tes_point_mode) + num_vertices = 1; + else if (ctx->tes_primitive_mode == GL_LINES) + num_vertices = 2; + else + num_vertices = 3; + } + /* TODO: streamout */ /* Copy Primitive IDs from GS threads to the LDS address corresponding @@ -4435,6 +4449,7 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm, ctx.tcs_num_inputs = util_last_bit64(shader_info->info.vs.ls_outputs_written); ctx.tcs_num_patches = get_tcs_num_patches(); } else if (shaders[i]->info.stage == MESA_SHADER_TESS_EVAL) { + ctx.tes_point_mode = shaders[i]->info.tess.point_mode; ctx.tes_primitive_mode = shaders[i]->info.tess.primitive_mode; ctx.abi.load_tess_varyings = load_tes_input; ctx.abi.load_tess_coord = load_tess_coord; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radv/gfx10: reduce max_esverts_base to 128
Same as RadeonSI. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index a7ff0e2d139..fce60a62ee9 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -1704,7 +1704,7 @@ calculate_ngg_info(const VkGraphicsPipelineCreateInfo *pCreateInfo, /* All these are per subgroup: */ bool max_vert_out_per_gs_instance = false; - unsigned max_esverts_base = 256; + unsigned max_esverts_base = 128; unsigned max_gsprims_base = 128; /* default prim group size clamp */ /* Hardware has the following non-natural restrictions on the value -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: fix crash in vkCmdClearAttachments with unused attachment
depth_stencil_attachment and/or ds_resolve attachment can be NULL. This fixes crashes with dEQP-VK.renderpass.suballocation.unused_clear_attachments.* Cc: 19.1 Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_meta_clear.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_meta_clear.c b/src/amd/vulkan/radv_meta_clear.c index dd2ba402f40..b93ba3e0b29 100644 --- a/src/amd/vulkan/radv_meta_clear.c +++ b/src/amd/vulkan/radv_meta_clear.c @@ -1688,7 +1688,7 @@ emit_clear(struct radv_cmd_buffer *cmd_buffer, if (ds_resolve_clear) ds_att = subpass->ds_resolve_attachment; - if (ds_att->attachment == VK_ATTACHMENT_UNUSED) + if (!ds_att || ds_att->attachment == VK_ATTACHMENT_UNUSED) return; VkImageLayout image_layout = ds_att->layout; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] ac/nir: fix txf_ms with an offset
Reviewed-by: Samuel Pitoiset On 7/19/19 9:17 PM, Rhys Perry wrote: Seems to fix some hair artifacts in Max Payne 3: https://github.com/daniel-schuermann/mesa/issues/76 Signed-off-by: Rhys Perry Fixes: f4e499ec791 ('radv: add initial non-conformant radv vulkan driver') --- src/amd/common/ac_nir_to_llvm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index 96bf89a8bf9..549a26ea243 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -3784,7 +3784,7 @@ static void visit_tex(struct ac_nir_context *ctx, nir_tex_instr *instr) goto write_result; } - if (args.offset && instr->op != nir_texop_txf) { + if (args.offset && instr->op != nir_texop_txf && instr->op != nir_texop_txf_ms) { LLVMValueRef offset[3], pack; for (unsigned chan = 0; chan < 3; ++chan) offset[chan] = ctx->ac.i32_0; @@ -3919,7 +3919,7 @@ static void visit_tex(struct ac_nir_context *ctx, nir_tex_instr *instr) args.coords[sample_chan], fmask_ptr); } - if (args.offset && instr->op == nir_texop_txf) { + if (args.offset && (instr->op == nir_texop_txf || instr->op == nir_texop_txf_ms)) { int num_offsets = instr->src[offset_src].src.ssa->num_components; num_offsets = MIN2(num_offsets, instr->coord_components); for (unsigned i = 0; i < num_offsets; ++i) { ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] radv/gfx10: do not set ELEMENT_SIZE for buffer descriptors
This field doesn't exist. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 6e313aa9aa1..3c553cb93e7 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -2164,7 +2164,6 @@ fill_geom_tess_rings(struct radv_queue *queue, S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) | S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) | S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) | - S_008F0C_ELEMENT_SIZE(1) | S_008F0C_INDEX_STRIDE(3) | S_008F0C_ADD_TID_ENABLE(1); @@ -2174,7 +2173,8 @@ fill_geom_tess_rings(struct radv_queue *queue, S_008F0C_RESOURCE_LEVEL(1); } else { desc[3] |= S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) | - S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32); + S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32) | + S_008F0C_ELEMENT_SIZE(1); } /* GS entry for ES->GS ring */ @@ -2234,7 +2234,6 @@ fill_geom_tess_rings(struct radv_queue *queue, S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) | S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) | S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) | - S_008F0C_ELEMENT_SIZE(1) | S_008F0C_INDEX_STRIDE(1) | S_008F0C_ADD_TID_ENABLE(true); @@ -2244,7 +2243,8 @@ fill_geom_tess_rings(struct radv_queue *queue, S_008F0C_RESOURCE_LEVEL(1); } else { desc[7] |= S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) | - S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32); + S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32) | + S_008F0C_ELEMENT_SIZE(1); } } -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/7] radv/gfx10: emit the GS NGG prologue before the nested barrier
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 7e623414adc..6feb55e3916 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -4453,6 +4453,7 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm, if (i) { if (shaders[i]->info.stage == MESA_SHADER_GEOMETRY && ctx.options->key.vs_common_out.as_ngg) { + gfx10_ngg_gs_emit_prologue(); nested_barrier = false; } else { nested_barrier = true; @@ -4495,12 +4496,6 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm, LLVMBasicBlockRef merge_block; if (shader_count >= 2 || is_ngg) { - - if (shaders[i]->info.stage == MESA_SHADER_GEOMETRY && - ctx.options->key.vs_common_out.as_ngg) { - gfx10_ngg_gs_emit_prologue(); - } - LLVMValueRef fn = LLVMGetBasicBlockParent(LLVMGetInsertBlock(ctx.ac.builder)); LLVMBasicBlockRef then_block = LLVMAppendBasicBlockInContext(ctx.ac.context, fn, ""); merge_block = LLVMAppendBasicBlockInContext(ctx.ac.context, fn, ""); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/7] radv/gfx10: do not allocate space for the ZPASS_DONE bug
GFX10 isn't affected. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_cmd_buffer.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c index b6ac14f63a9..84d627340e9 100644 --- a/src/amd/vulkan/radv_cmd_buffer.c +++ b/src/amd/vulkan/radv_cmd_buffer.c @@ -364,12 +364,14 @@ radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer) radv_buffer_get_va(cmd_buffer->upload.upload_bo); cmd_buffer->gfx9_fence_va += fence_offset; - /* Allocate a buffer for the EOP bug on GFX9. */ - radv_cmd_buffer_upload_alloc(cmd_buffer, 16 * num_db, 8, -_bug_offset, _ptr); - cmd_buffer->gfx9_eop_bug_va = - radv_buffer_get_va(cmd_buffer->upload.upload_bo); - cmd_buffer->gfx9_eop_bug_va += eop_bug_offset; + if (cmd_buffer->device->physical_device->rad_info.chip_class == GFX9) { + /* Allocate a buffer for the EOP bug on GFX9. */ + radv_cmd_buffer_upload_alloc(cmd_buffer, 16 * num_db, 8, +_bug_offset, _ptr); + cmd_buffer->gfx9_eop_bug_va = + radv_buffer_get_va(cmd_buffer->upload.upload_bo); + cmd_buffer->gfx9_eop_bug_va += eop_bug_offset; + } } cmd_buffer->status = RADV_CMD_BUFFER_STATUS_INITIAL; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/7] radv: change a bunch of >= GFX9 to == GFX9
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_cmd_buffer.c | 6 +++--- src/amd/vulkan/radv_device.c | 2 +- src/amd/vulkan/radv_image.c | 10 +- src/amd/vulkan/si_cmd_buffer.c | 12 ++-- 4 files changed, 15 insertions(+), 15 deletions(-) diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c index b4301c0da15..b6ac14f63a9 100644 --- a/src/amd/vulkan/radv_cmd_buffer.c +++ b/src/amd/vulkan/radv_cmd_buffer.c @@ -1294,7 +1294,7 @@ radv_emit_fb_color_state(struct radv_cmd_buffer *cmd_buffer, cb->cb_color_attrib2); radeon_set_context_reg(cmd_buffer->cs, R_028EE0_CB_COLOR0_ATTRIB3 + index * 4, cb->cb_color_attrib3); - } else if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) { + } else if (cmd_buffer->device->physical_device->rad_info.chip_class == GFX9) { radeon_set_context_reg_seq(cmd_buffer->cs, R_028C60_CB_COLOR0_BASE + index * 0x3c, 11); radeon_emit(cmd_buffer->cs, cb->cb_color_base); radeon_emit(cmd_buffer->cs, S_028C64_BASE_256B(cb->cb_color_base >> 32)); @@ -1432,7 +1432,7 @@ radv_emit_fb_ds_state(struct radv_cmd_buffer *cmd_buffer, radeon_emit(cmd_buffer->cs, ds->db_z_read_base >> 32); radeon_emit(cmd_buffer->cs, ds->db_stencil_read_base >> 32); radeon_emit(cmd_buffer->cs, ds->db_htile_data_base >> 32); - } else if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) { + } else if (cmd_buffer->device->physical_device->rad_info.chip_class == GFX9) { radeon_set_context_reg_seq(cmd_buffer->cs, R_028014_DB_HTILE_DATA_BASE, 3); radeon_emit(cmd_buffer->cs, ds->db_htile_data_base); radeon_emit(cmd_buffer->cs, S_028018_BASE_HI(ds->db_htile_data_base >> 32)); @@ -2508,7 +2508,7 @@ si_emit_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer, draw_vertex_count); if (state->last_ia_multi_vgt_param != ia_multi_vgt_param) { - if (info->chip_class >= GFX9) { + if (info->chip_class == GFX9) { radeon_set_uconfig_reg_idx(cmd_buffer->device->physical_device, cs, R_030960_IA_MULTI_VGT_PARAM, diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 8dd24cb8192..15bda6822e8 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -2502,7 +2502,7 @@ radv_emit_global_shader_pointers(struct radv_queue *queue, radv_emit_shader_pointer(queue->device, cs, regs[i], va, true); } - } else if (queue->device->physical_device->rad_info.chip_class >= GFX9) { + } else if (queue->device->physical_device->rad_info.chip_class == GFX9) { uint32_t regs[] = {R_00B030_SPI_SHADER_USER_DATA_PS_0, R_00B130_SPI_SHADER_USER_DATA_VS_0, R_00B208_SPI_SHADER_USER_DATA_ADDR_LO_GS, diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index 4d3ed71c23c..0941cbb 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -522,7 +522,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device, } state[7] = meta_va >> 16; - } else if (chip_class >= GFX9) { + } else if (chip_class == GFX9) { state[3] &= C_008F1C_SW_MODE; state[4] &= C_008F20_PITCH; @@ -787,7 +787,7 @@ si_make_texture_descriptor(struct radv_device *device, } /* S8 with either Z16 or Z32 HTILE need a special format. */ - if (device->physical_device->rad_info.chip_class >= GFX9 && + if (device->physical_device->rad_info.chip_class == GFX9 && vk_format == VK_FORMAT_S8_UINT && radv_image_is_tc_compat_htile(image)) { if (image->vk_format == VK_FORMAT_D32_SFLOAT_S8_UINT) @@ -828,7 +828,7 @@ si_make_texture_descriptor(struct radv_device *device, state[6] = 0; state[7] = 0; - if (device->physical_device->rad_info.chip_class >= GFX9) { + if (device->physical_device->rad_info.chip_class == GFX9) { unsigned bc_swizzle = gfx9_border_color_swizzle(swizzle); /* Depth is the last accessible layer on Gfx9. @@ -874,7 +874,7 @@ si_make_texture_descriptor(struct
[Mesa-dev] [PATCH 3/7] radv: clean up fill_geom_tess_rings()
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 34 +- 1 file changed, 9 insertions(+), 25 deletions(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 15bda6822e8..6e313aa9aa1 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -2158,7 +2158,6 @@ fill_geom_tess_rings(struct radv_queue *queue, index stride 64 */ desc[0] = esgs_va; desc[1] = S_008F04_BASE_ADDRESS_HI(esgs_va >> 32) | - S_008F04_STRIDE(0) | S_008F04_SWIZZLE_ENABLE(true); desc[2] = esgs_ring_size; desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) | @@ -2167,7 +2166,7 @@ fill_geom_tess_rings(struct radv_queue *queue, S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) | S_008F0C_ELEMENT_SIZE(1) | S_008F0C_INDEX_STRIDE(3) | - S_008F0C_ADD_TID_ENABLE(true); + S_008F0C_ADD_TID_ENABLE(1); if (queue->device->physical_device->rad_info.chip_class >= GFX10) { desc[3] |= S_008F0C_FORMAT(V_008F0C_IMG_FORMAT_32_FLOAT) | @@ -2182,17 +2181,12 @@ fill_geom_tess_rings(struct radv_queue *queue, /* stride 0, num records - size, elsize0, index stride 0 */ desc[4] = esgs_va; - desc[5] = S_008F04_BASE_ADDRESS_HI(esgs_va >> 32)| - S_008F04_STRIDE(0) | - S_008F04_SWIZZLE_ENABLE(false); + desc[5] = S_008F04_BASE_ADDRESS_HI(esgs_va >> 32); desc[6] = esgs_ring_size; desc[7] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) | S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) | S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) | - S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) | - S_008F0C_ELEMENT_SIZE(0) | - S_008F0C_INDEX_STRIDE(0) | - S_008F0C_ADD_TID_ENABLE(false); + S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W); if (queue->device->physical_device->rad_info.chip_class >= GFX10) { desc[7] |= S_008F0C_FORMAT(V_008F0C_IMG_FORMAT_32_FLOAT) | @@ -2213,17 +2207,12 @@ fill_geom_tess_rings(struct radv_queue *queue, /* stride 0, num records - size, elsize0, index stride 0 */ desc[0] = gsvs_va; - desc[1] = S_008F04_BASE_ADDRESS_HI(gsvs_va >> 32)| - S_008F04_STRIDE(0) | - S_008F04_SWIZZLE_ENABLE(false); + desc[1] = S_008F04_BASE_ADDRESS_HI(gsvs_va >> 32); desc[2] = gsvs_ring_size; desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) | S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) | S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) | - S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) | - S_008F0C_ELEMENT_SIZE(0) | - S_008F0C_INDEX_STRIDE(0) | - S_008F0C_ADD_TID_ENABLE(false); + S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W); if (queue->device->physical_device->rad_info.chip_class >= GFX10) { desc[3] |= S_008F0C_FORMAT(V_008F0C_IMG_FORMAT_32_FLOAT) | @@ -2238,9 +2227,8 @@ fill_geom_tess_rings(struct radv_queue *queue, elsize 4, index stride 16 */ /* shader will patch stride and desc[2] */ desc[4] = gsvs_va; - desc[5] = S_008F04_BASE_ADDRESS_HI(gsvs_va >> 32)| - S_008F04_STRIDE(0) | - S_008F04_SWIZZLE_ENABLE(true); + desc[5] = S_008F04_BASE_ADDRESS_HI(gsvs_va >> 32) | + S_008F04_SWIZZLE_ENABLE(1); desc[6] = 0; desc[7] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) | S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) | @@ -2268,9 +2256,7 @@ fill_geom_tess_rings(struct radv_queue *queue, uint64_t tess_offchip_va = tess_va + tess_offchip_ring_offset; desc[0] = tess_va; - desc[1] = S_008F04_BASE_ADDRESS_HI(tess_va >> 32) | - S_008F04_STRIDE(0) | - S_008F04_SWIZZLE_ENABLE(false); + desc[1] = S_008F04_BASE_ADDRESS_HI(tess_va >> 32); desc[2] = tess_factor_ring_size; desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) | S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) | @@ -2287,9 +2273,7 @@ fill_geom_tess_rings(struct radv_queue *
[Mesa-dev] [PATCH 7/7] radv/gfx10: update descriptors for inline uniform blocks
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 6feb55e3916..19dcae3a476 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -1373,9 +1373,16 @@ radv_load_resource(struct ac_shader_abi *abi, LLVMValueRef index, uint32_t desc_type = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) | S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) | S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) | - S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) | - S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) | - S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32); + S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W); + + if (ctx->ac.chip_class >= GFX10) { + desc_type |= S_008F0C_FORMAT(V_008F0C_IMG_FORMAT_32_FLOAT) | +S_008F0C_OOB_SELECT(3) | +S_008F0C_RESOURCE_LEVEL(1); + } else { + desc_type |= S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) | + S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32); + } LLVMValueRef desc_components[4] = { LLVMBuildPtrToInt(ctx->ac.builder, desc_ptr, ctx->ac.intptr, ""), -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/7] ac/nir: do not clamp shadow reference on GFX10
RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures on GFX10 and RADV doesn't promotes Z16 or Z24. Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_nir_to_llvm.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index 96bf89a8bf9..75ee534eb3e 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -3805,12 +3805,16 @@ static void visit_tex(struct ac_nir_context *ctx, nir_tex_instr *instr) /* TC-compatible HTILE on radeonsi promotes Z16 and Z24 to Z32_FLOAT, * so the depth comparison value isn't clamped for Z16 and -* Z24 anymore. Do it manually here. +* Z24 anymore. Do it manually here for GFX8-9; GFX10 has an explicitly +* clamped 32-bit float format. * * It's unnecessary if the original texture format was * Z32_FLOAT, but we don't know that here. */ - if (args.compare && ctx->ac.chip_class >= GFX8 && ctx->abi->clamp_shadow_reference) + if (args.compare && + ctx->ac.chip_class >= GFX8 && + ctx->ac.chip_class <= GFX9 && + ctx->abi->clamp_shadow_reference) args.compare = ac_build_clamp(>ac, ac_to_float(>ac, args.compare)); /* pack derivatives */ -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 1/3] radv/gfx10: move emitting VGT_PRIMITIVEID_EN into the NGG path
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 9338fcd550a..bcb7ccc803d 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3280,12 +3280,6 @@ radv_pipeline_generate_vgt_gs_mode(struct radeon_cmdbuf *ctx_cs, vgt_gs_mode = ac_vgt_gs_mode(gs->info.gs.vertices_out, pipeline->device->physical_device->rad_info.chip_class); - } else if (radv_pipeline_has_ngg(pipeline)) { - bool enable_prim_id = - outinfo->export_prim_id || vs->info.info.uses_prim_id; - - vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(enable_prim_id) | - S_028A84_NGG_DISABLE_PROVOK_REUSE(enable_prim_id); } else if (outinfo->export_prim_id || vs->info.info.uses_prim_id) { vgt_gs_mode = S_028A40_MODE(V_028A40_GS_SCENARIO_A); vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(1); @@ -3425,6 +3419,8 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, uint64_t va = radv_buffer_get_va(shader->bo) + shader->bo_offset; gl_shader_stage es_type = radv_pipeline_has_tess(pipeline) ? MESA_SHADER_TESS_EVAL : MESA_SHADER_VERTEX; + struct radv_shader_variant *es = + es_type == MESA_SHADER_TESS_EVAL ? pipeline->shaders[MESA_SHADER_TESS_EVAL] : pipeline->shaders[MESA_SHADER_VERTEX]; radeon_set_sh_reg_seq(cs, R_00B320_SPI_SHADER_PGM_LO_ES, 2); radeon_emit(cs, va >> 8); @@ -3441,6 +3437,8 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, bool misc_vec_ena = outinfo->writes_pointsize || outinfo->writes_layer || outinfo->writes_viewport_index; + bool es_enable_prim_id = outinfo->export_prim_id || +(es && es->info.info.uses_prim_id); bool break_wave_at_eoi = false; unsigned nparams; @@ -3479,6 +3477,10 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, cull_dist_mask << 8 | clip_dist_mask); + radeon_set_context_reg(ctx_cs, R_028A84_VGT_PRIMITIVEID_EN, + S_028A84_PRIMITIVEID_EN(es_enable_prim_id) | + S_028A84_NGG_DISABLE_PROVOK_REUSE(es_enable_prim_id)); + bool vgt_reuse_off = pipeline->device->physical_device->rad_info.family == CHIP_NAVI10 && pipeline->device->physical_device->rad_info.chip_external_rev == 0x1 && es_type == MESA_SHADER_TESS_EVAL; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 3/3] radv/gfx10: set BREAK_WAVE_AT_EOI if TES or GS enable the primitive ID
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 8 1 file changed, 8 insertions(+) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index b11d79f4811..a7ff0e2d139 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3445,6 +3445,14 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, bool break_wave_at_eoi = false; unsigned nparams; + if (es_type == MESA_SHADER_TESS_EVAL) { + struct radv_shader_variant *gs = + pipeline->shaders[MESA_SHADER_GEOMETRY]; + + if (es_enable_prim_id || (gs && gs->info.info.uses_prim_id)) + break_wave_at_eoi = true; + } + nparams = MAX2(outinfo->param_exports, 1); radeon_set_context_reg(ctx_cs, R_0286C4_SPI_VS_OUT_CONFIG, S_0286C4_VS_EXPORT_COUNT(nparams - 1) | -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 2/3] radv/gfx10: do not emit VGT_GS_MODE
Unnecessary. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index bcb7ccc803d..b11d79f4811 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3274,6 +3274,9 @@ radv_pipeline_generate_vgt_gs_mode(struct radeon_cmdbuf *ctx_cs, unsigned vgt_primitiveid_en = 0; uint32_t vgt_gs_mode = 0; + if (radv_pipeline_has_ngg(pipeline)) + return; + if (radv_pipeline_has_gs(pipeline)) { const struct radv_shader_variant *gs = pipeline->shaders[MESA_SHADER_GEOMETRY]; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] radv/gfx10: do not always execute a barrier before the second shader
On 7/18/19 2:29 AM, Bas Nieuwenhuizen wrote: On Wed, Jul 17, 2019 at 3:44 PM Samuel Pitoiset wrote: With NGG, empty waves may still be required to export data. This fixes dEQP-VK.ycbcr.format.*_unorm.geometry_*. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 31 ++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 3e18303879e..7e623414adc 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -4448,8 +4448,37 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm, declare_esgs_ring(); } - if (i) + bool nested_barrier = false; + + if (i) { + if (shaders[i]->info.stage == MESA_SHADER_GEOMETRY && + ctx.options->key.vs_common_out.as_ngg) { + nested_barrier = false; + } else { + nested_barrier = true; + } + } We can simplify this to nested_barrier = i && (shaders[i]->info.stage != MESA_SHADER_GEOMETRY || !ctx.options->key.vs_common_out.as_ngg); Otherwise r-b, I'm just surprised an s_barrier is okay. I'm going to move the NGG GS prologue into that inner if, so I would prefer to keep this way. + + if (nested_barrier) { + /* Execute a barrier before the second shader in +* a merged shader. +* +* Execute the barrier inside the conditional block, +* so that empty waves can jump directly to s_endpgm, +* which will also signal the barrier. +* +* This is possible in gfx9, because an empty wave +* for the second shader does not participate in +* the epilogue. With NGG, empty waves may still +* be required to export data (e.g. GS output vertices), +* so we cannot let them exit early. +* +* If the shader is TCS and the TCS epilog is present +* and contains a barrier, it will wait there and then +* reach s_endpgm. + */ ac_emit_barrier(, ctx.stage); + } nir_foreach_variable(variable, [i]->outputs) scan_shader_output_decl(, variable, shaders[i], shaders[i]->info.stage); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] radv: move emitting VGT_GS_MODE into the HW VS path
On 7/18/19 2:10 AM, Bas Nieuwenhuizen wrote: On Thu, Jul 18, 2019 at 2:05 AM Bas Nieuwenhuizen wrote: On Wed, Jul 17, 2019 at 3:44 PM Samuel Pitoiset wrote: It's useless for NGG anyways. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 43 ++ 1 file changed, 33 insertions(+), 10 deletions(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index fdeb31c453e..686fd371f0f 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3272,27 +3272,18 @@ radv_pipeline_generate_vgt_gs_mode(struct radeon_cmdbuf *ctx_cs, Can you rename the function? Actually now that I see your later patches, how about we keep this function, return immediately if ngg, and then move the primitive id stuff for ngg to ngg? Yes, looks better. pipeline->shaders[MESA_SHADER_TESS_EVAL] : pipeline->shaders[MESA_SHADER_VERTEX]; unsigned vgt_primitiveid_en = 0; - uint32_t vgt_gs_mode = 0; - if (radv_pipeline_has_gs(pipeline)) { - const struct radv_shader_variant *gs = - pipeline->shaders[MESA_SHADER_GEOMETRY]; - - vgt_gs_mode = ac_vgt_gs_mode(gs->info.gs.vertices_out, - pipeline->device->physical_device->rad_info.chip_class); - } else if (radv_pipeline_has_ngg(pipeline)) { + if (radv_pipeline_has_ngg(pipeline)) { bool enable_prim_id = outinfo->export_prim_id || vs->info.info.uses_prim_id; vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(enable_prim_id) | S_028A84_NGG_DISABLE_PROVOK_REUSE(enable_prim_id); } else if (outinfo->export_prim_id || vs->info.info.uses_prim_id) { - vgt_gs_mode = S_028A40_MODE(V_028A40_GS_SCENARIO_A); vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(1); } radeon_set_context_reg(ctx_cs, R_028A84_VGT_PRIMITIVEID_EN, vgt_primitiveid_en); - radeon_set_context_reg(ctx_cs, R_028A40_VGT_GS_MODE, vgt_gs_mode); } static void @@ -3370,6 +3361,38 @@ radv_pipeline_generate_hw_vs(struct radeon_cmdbuf *ctx_cs, cull_dist_mask << 8 | clip_dist_mask); + /* We always write VGT_GS_MODE in the VS state, because every switch +* between different shader pipelines involving a different GS or no GS +* at all involves a switch of the VS (different GS use different copy +* shaders). On the other hand, when the API switches from a GS to no +* GS and then back to the same GS used originally, the GS state is not +* sent again. +*/ + unsigned vgt_gs_mode; + if (!radv_pipeline_has_gs(pipeline)) { + const struct radv_vs_output_info *outinfo = + get_vs_output_info(pipeline); + const struct radv_shader_variant *vs = + pipeline->shaders[MESA_SHADER_TESS_EVAL] ? + pipeline->shaders[MESA_SHADER_TESS_EVAL] : + pipeline->shaders[MESA_SHADER_VERTEX]; + unsigned mode = V_028A40_GS_OFF; + + /* PrimID needs GS scenario A. */ + if (outinfo->export_prim_id || vs->info.info.uses_prim_id) + mode = V_028A40_GS_SCENARIO_A; + + vgt_gs_mode = S_028A40_MODE(mode); + } else { + const struct radv_shader_variant *gs = + pipeline->shaders[MESA_SHADER_GEOMETRY]; + + vgt_gs_mode = ac_vgt_gs_mode(gs->info.gs.vertices_out, + pipeline->device->physical_device->rad_info.chip_class); + } + + radeon_set_context_reg(ctx_cs, R_028A40_VGT_GS_MODE, vgt_gs_mode); + Can you keep this in a separate function (possibly with the name radv_pipeline_generate_vgt_gs_mode)? if (pipeline->device->physical_device->rad_info.chip_class <= GFX8) radeon_set_context_reg(ctx_cs, R_028AB4_VGT_REUSE_OFF, outinfo->writes_viewport_index); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: fix crash in shader tracing.
Reviewed-by: Samuel Pitoiset On 7/18/19 2:51 AM, Dave Airlie wrote: From: Dave Airlie Enabling tracing, and then having a vmfault, can leads to a segfault before we print out the traces, as if a meta shader is executing and we don't have the NIR for it. Just pass the stage and give back a default. Fixes: 9b9ccee4d64 ("radv: take LDS into account for compute shader occupancy stats") --- src/amd/vulkan/radv_nir_to_llvm.c | 8 ++-- src/amd/vulkan/radv_private.h | 1 + src/amd/vulkan/radv_shader.c | 2 +- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 3e18303879e..c08789a4361 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -4244,9 +4244,10 @@ ac_setup_rings(struct radv_shader_context *ctx) unsigned radv_nir_get_max_workgroup_size(enum chip_class chip_class, + gl_shader_stage stage, const struct nir_shader *nir) { - switch (nir->info.stage) { + switch (stage) { case MESA_SHADER_TESS_CTRL: return chip_class >= GFX7 ? 128 : 64; case MESA_SHADER_GEOMETRY: @@ -4257,6 +4258,8 @@ radv_nir_get_max_workgroup_size(enum chip_class chip_class, return 0; } + if (!nir) + return chip_class >= GFX9 ? 128 : 64; unsigned max_workgroup_size = nir->info.cs.local_size[0] * nir->info.cs.local_size[1] * nir->info.cs.local_size[2]; @@ -4340,7 +4343,8 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm, for (int i = 0; i < shader_count; ++i) { ctx.max_workgroup_size = MAX2(ctx.max_workgroup_size, radv_nir_get_max_workgroup_size(ctx.options->chip_class, - shaders[i])); + shaders[i]->info.stage, + shaders[i])); } if (ctx.ac.chip_class >= GFX10) { diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h index 931d4039397..f1f30887e01 100644 --- a/src/amd/vulkan/radv_private.h +++ b/src/amd/vulkan/radv_private.h @@ -2138,6 +2138,7 @@ void radv_compile_nir_shader(struct ac_llvm_compiler *ac_llvm, const struct radv_nir_compiler_options *options); unsigned radv_nir_get_max_workgroup_size(enum chip_class chip_class, +gl_shader_stage stage, const struct nir_shader *nir); /* radv_shader_info.h */ diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index bcc050a86cc..8f24a6d72b0 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -1232,7 +1232,7 @@ generate_shader_stats(struct radv_device *device, lds_increment); } else if (stage == MESA_SHADER_COMPUTE) { unsigned max_workgroup_size = - radv_nir_get_max_workgroup_size(chip_class, variant->nir); + radv_nir_get_max_workgroup_size(chip_class, stage, variant->nir); lds_per_wave = (conf->lds_size * lds_increment) / DIV_ROUND_UP(max_workgroup_size, 64); } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: reset the window scissor with no clear state.
Reviewed-by: Samuel Pitoiset On 7/18/19 3:20 AM, Dave Airlie wrote: From: Dave Airlie IF we don't have clear state (which gfx10 doesn't currently) we will fix to reset the scissor. AMDVLK will leave it set to something else. Marek also has this fix for radeonsi pending. --- src/amd/vulkan/si_cmd_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index 6fe447ef2e9..0efa169d674 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -202,7 +202,7 @@ si_emit_graphics(struct radv_physical_device *physical_device, /* CLEAR_STATE doesn't clear these correctly on certain generations. * I don't know why. Deduced by trial and error. */ - if (physical_device->rad_info.chip_class <= GFX7) { + if (physical_device->rad_info.chip_class <= GFX7 || !physical_device->has_clear_state) { radeon_set_context_reg(cs, R_028B28_VGT_STRMOUT_DRAW_OPAQUE_OFFSET, 0); radeon_set_context_reg(cs, R_028204_PA_SC_WINDOW_SCISSOR_TL, S_028204_WINDOW_OFFSET_DISABLE(1)); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: put back VGT_FLUSH at ring init on gfx10
Reviewed-by: Samuel Pitoiset On 7/18/19 8:14 AM, Dave Airlie wrote: From: Dave Airlie I can find no evidence that removing this is a good idea. Fixes: 9b116173b6a ("radv: do not emit VGT_FLUSH on GFX10") --- src/amd/vulkan/radv_device.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index b397a9a8aa0..8dd24cb8192 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -2753,10 +2753,8 @@ radv_get_preamble_cs(struct radv_queue *queue, radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); radeon_emit(cs, EVENT_TYPE(V_028A90_VS_PARTIAL_FLUSH) | EVENT_INDEX(4)); - if (queue->device->physical_device->rad_info.chip_class < GFX10) { - radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); - radeon_emit(cs, EVENT_TYPE(V_028A90_VGT_FLUSH) | EVENT_INDEX(0)); - } + radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); + radeon_emit(cs, EVENT_TYPE(V_028A90_VGT_FLUSH) | EVENT_INDEX(0)); } radv_emit_gs_ring_sizes(queue, cs, esgs_ring_bo, esgs_ring_size, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] radv/gfx10: do not always execute a barrier before the second shader
With NGG, empty waves may still be required to export data. This fixes dEQP-VK.ycbcr.format.*_unorm.geometry_*. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 31 ++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 3e18303879e..7e623414adc 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -4448,8 +4448,37 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm, declare_esgs_ring(); } - if (i) + bool nested_barrier = false; + + if (i) { + if (shaders[i]->info.stage == MESA_SHADER_GEOMETRY && + ctx.options->key.vs_common_out.as_ngg) { + nested_barrier = false; + } else { + nested_barrier = true; + } + } + + if (nested_barrier) { + /* Execute a barrier before the second shader in +* a merged shader. +* +* Execute the barrier inside the conditional block, +* so that empty waves can jump directly to s_endpgm, +* which will also signal the barrier. +* +* This is possible in gfx9, because an empty wave +* for the second shader does not participate in +* the epilogue. With NGG, empty waves may still +* be required to export data (e.g. GS output vertices), +* so we cannot let them exit early. +* +* If the shader is TCS and the TCS epilog is present +* and contains a barrier, it will wait there and then +* reach s_endpgm. + */ ac_emit_barrier(, ctx.stage); + } nir_foreach_variable(variable, [i]->outputs) scan_shader_output_decl(, variable, shaders[i], shaders[i]->info.stage); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] radv: move emitting VGT_GS_MODE into the HW VS path
It's useless for NGG anyways. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 43 ++ 1 file changed, 33 insertions(+), 10 deletions(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index fdeb31c453e..686fd371f0f 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3272,27 +3272,18 @@ radv_pipeline_generate_vgt_gs_mode(struct radeon_cmdbuf *ctx_cs, pipeline->shaders[MESA_SHADER_TESS_EVAL] : pipeline->shaders[MESA_SHADER_VERTEX]; unsigned vgt_primitiveid_en = 0; - uint32_t vgt_gs_mode = 0; - if (radv_pipeline_has_gs(pipeline)) { - const struct radv_shader_variant *gs = - pipeline->shaders[MESA_SHADER_GEOMETRY]; - - vgt_gs_mode = ac_vgt_gs_mode(gs->info.gs.vertices_out, - pipeline->device->physical_device->rad_info.chip_class); - } else if (radv_pipeline_has_ngg(pipeline)) { + if (radv_pipeline_has_ngg(pipeline)) { bool enable_prim_id = outinfo->export_prim_id || vs->info.info.uses_prim_id; vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(enable_prim_id) | S_028A84_NGG_DISABLE_PROVOK_REUSE(enable_prim_id); } else if (outinfo->export_prim_id || vs->info.info.uses_prim_id) { - vgt_gs_mode = S_028A40_MODE(V_028A40_GS_SCENARIO_A); vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(1); } radeon_set_context_reg(ctx_cs, R_028A84_VGT_PRIMITIVEID_EN, vgt_primitiveid_en); - radeon_set_context_reg(ctx_cs, R_028A40_VGT_GS_MODE, vgt_gs_mode); } static void @@ -3370,6 +3361,38 @@ radv_pipeline_generate_hw_vs(struct radeon_cmdbuf *ctx_cs, cull_dist_mask << 8 | clip_dist_mask); + /* We always write VGT_GS_MODE in the VS state, because every switch +* between different shader pipelines involving a different GS or no GS +* at all involves a switch of the VS (different GS use different copy +* shaders). On the other hand, when the API switches from a GS to no +* GS and then back to the same GS used originally, the GS state is not +* sent again. +*/ + unsigned vgt_gs_mode; + if (!radv_pipeline_has_gs(pipeline)) { + const struct radv_vs_output_info *outinfo = + get_vs_output_info(pipeline); + const struct radv_shader_variant *vs = + pipeline->shaders[MESA_SHADER_TESS_EVAL] ? + pipeline->shaders[MESA_SHADER_TESS_EVAL] : + pipeline->shaders[MESA_SHADER_VERTEX]; + unsigned mode = V_028A40_GS_OFF; + + /* PrimID needs GS scenario A. */ + if (outinfo->export_prim_id || vs->info.info.uses_prim_id) + mode = V_028A40_GS_SCENARIO_A; + + vgt_gs_mode = S_028A40_MODE(mode); + } else { + const struct radv_shader_variant *gs = + pipeline->shaders[MESA_SHADER_GEOMETRY]; + + vgt_gs_mode = ac_vgt_gs_mode(gs->info.gs.vertices_out, + pipeline->device->physical_device->rad_info.chip_class); + } + + radeon_set_context_reg(ctx_cs, R_028A40_VGT_GS_MODE, vgt_gs_mode); + if (pipeline->device->physical_device->rad_info.chip_class <= GFX8) radeon_set_context_reg(ctx_cs, R_028AB4_VGT_REUSE_OFF, outinfo->writes_viewport_index); -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] radv/gfx10: set BREAK_WAVE_AT_EOI if TES or GS enable the primitive ID
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 8 1 file changed, 8 insertions(+) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index de933937f03..8b6e62a75f5 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3452,6 +3452,14 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, bool break_wave_at_eoi = false; unsigned nparams; + if (es_type == MESA_SHADER_TESS_EVAL) { + struct radv_shader_variant *gs = + pipeline->shaders[MESA_SHADER_GEOMETRY]; + + if (es_enable_prim_id || (gs && gs->info.info.uses_prim_id)) + break_wave_at_eoi = true; + } + nparams = MAX2(outinfo->param_exports, 1); radeon_set_context_reg(ctx_cs, R_0286C4_SPI_VS_OUT_CONFIG, S_0286C4_VS_EXPORT_COUNT(nparams - 1) | -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] radv: move emitting VGT_PRIMITIVEID_EN into the HW VS and NGG paths
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 42 -- 1 file changed, 15 insertions(+), 27 deletions(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 686fd371f0f..de933937f03 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3262,30 +3262,6 @@ radv_pipeline_generate_multisample_state(struct radeon_cmdbuf *ctx_cs, S_02882C_YMAX_BOTTOM_EXCLUSION(exclusion)); } -static void -radv_pipeline_generate_vgt_gs_mode(struct radeon_cmdbuf *ctx_cs, - struct radv_pipeline *pipeline) -{ - const struct radv_vs_output_info *outinfo = get_vs_output_info(pipeline); - const struct radv_shader_variant *vs = - pipeline->shaders[MESA_SHADER_TESS_EVAL] ? - pipeline->shaders[MESA_SHADER_TESS_EVAL] : - pipeline->shaders[MESA_SHADER_VERTEX]; - unsigned vgt_primitiveid_en = 0; - - if (radv_pipeline_has_ngg(pipeline)) { - bool enable_prim_id = - outinfo->export_prim_id || vs->info.info.uses_prim_id; - - vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(enable_prim_id) | - S_028A84_NGG_DISABLE_PROVOK_REUSE(enable_prim_id); - } else if (outinfo->export_prim_id || vs->info.info.uses_prim_id) { - vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(1); - } - - radeon_set_context_reg(ctx_cs, R_028A84_VGT_PRIMITIVEID_EN, vgt_primitiveid_en); -} - static void gfx10_set_ge_pc_alloc(struct radeon_cmdbuf *ctx_cs, struct radv_pipeline *pipeline, @@ -3368,7 +3344,7 @@ radv_pipeline_generate_hw_vs(struct radeon_cmdbuf *ctx_cs, * GS and then back to the same GS used originally, the GS state is not * sent again. */ - unsigned vgt_gs_mode; + unsigned vgt_primitiveid_en, vgt_gs_mode; if (!radv_pipeline_has_gs(pipeline)) { const struct radv_vs_output_info *outinfo = get_vs_output_info(pipeline); @@ -3376,22 +3352,27 @@ radv_pipeline_generate_hw_vs(struct radeon_cmdbuf *ctx_cs, pipeline->shaders[MESA_SHADER_TESS_EVAL] ? pipeline->shaders[MESA_SHADER_TESS_EVAL] : pipeline->shaders[MESA_SHADER_VERTEX]; + bool enable_prim_id = outinfo->export_prim_id || + vs->info.info.uses_prim_id; unsigned mode = V_028A40_GS_OFF; /* PrimID needs GS scenario A. */ - if (outinfo->export_prim_id || vs->info.info.uses_prim_id) + if (enable_prim_id) mode = V_028A40_GS_SCENARIO_A; vgt_gs_mode = S_028A40_MODE(mode); + vgt_primitiveid_en = enable_prim_id; } else { const struct radv_shader_variant *gs = pipeline->shaders[MESA_SHADER_GEOMETRY]; vgt_gs_mode = ac_vgt_gs_mode(gs->info.gs.vertices_out, pipeline->device->physical_device->rad_info.chip_class); + vgt_primitiveid_en = 0; } radeon_set_context_reg(ctx_cs, R_028A40_VGT_GS_MODE, vgt_gs_mode); + radeon_set_context_reg(ctx_cs, R_028A84_VGT_PRIMITIVEID_EN, vgt_primitiveid_en); if (pipeline->device->physical_device->rad_info.chip_class <= GFX8) radeon_set_context_reg(ctx_cs, R_028AB4_VGT_REUSE_OFF, @@ -3448,6 +3429,8 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, uint64_t va = radv_buffer_get_va(shader->bo) + shader->bo_offset; gl_shader_stage es_type = radv_pipeline_has_tess(pipeline) ? MESA_SHADER_TESS_EVAL : MESA_SHADER_VERTEX; + struct radv_shader_variant *es = + es_type == MESA_SHADER_TESS_EVAL ? pipeline->shaders[MESA_SHADER_TESS_EVAL] : pipeline->shaders[MESA_SHADER_VERTEX]; radeon_set_sh_reg_seq(cs, R_00B320_SPI_SHADER_PGM_LO_ES, 2); radeon_emit(cs, va >> 8); @@ -3464,6 +3447,8 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, bool misc_vec_ena = outinfo->writes_pointsize || outinfo->writes_layer || outinfo->writes_viewport_index; + bool es_enable_prim_id = outinfo->export_prim_id || +(es && es->info.info.uses_prim_id); bool break_wave_at_eoi = false; unsigned nparams; @@ -3502,6 +3487,10 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, cull_dist_mask << 8 | clip_dist_mask); + radeon_set_conte
[Mesa-dev] [PATCH] radv: fix VGT_GS_MODE if VS uses the primitive ID
Found by inspection. Cc: Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index a3323ae8135..f6cb3611c9d 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3264,6 +3264,10 @@ radv_pipeline_generate_vgt_gs_mode(struct radeon_cmdbuf *ctx_cs, struct radv_pipeline *pipeline) { const struct radv_vs_output_info *outinfo = get_vs_output_info(pipeline); + const struct radv_shader_variant *vs = + pipeline->shaders[MESA_SHADER_TESS_EVAL] ? + pipeline->shaders[MESA_SHADER_TESS_EVAL] : + pipeline->shaders[MESA_SHADER_VERTEX]; unsigned vgt_primitiveid_en = 0; uint32_t vgt_gs_mode = 0; @@ -3274,16 +3278,12 @@ radv_pipeline_generate_vgt_gs_mode(struct radeon_cmdbuf *ctx_cs, vgt_gs_mode = ac_vgt_gs_mode(gs->info.gs.vertices_out, pipeline->device->physical_device->rad_info.chip_class); } else if (radv_pipeline_has_ngg(pipeline)) { - const struct radv_shader_variant *vs = - pipeline->shaders[MESA_SHADER_TESS_EVAL] ? - pipeline->shaders[MESA_SHADER_TESS_EVAL] : - pipeline->shaders[MESA_SHADER_VERTEX]; bool enable_prim_id = outinfo->export_prim_id || vs->info.info.uses_prim_id; vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(enable_prim_id) | S_028A84_NGG_DISABLE_PROVOK_REUSE(enable_prim_id); - } else if (outinfo->export_prim_id) { + } else if (outinfo->export_prim_id || vs->info.info.uses_prim_id) { vgt_gs_mode = S_028A40_MODE(V_028A40_GS_SCENARIO_A); vgt_primitiveid_en |= S_028A84_PRIMITIVEID_EN(1); } -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: fix gathering clip/cull distance masks for GS
On 7/17/19 10:25 AM, Juan A. Suarez Romero wrote: On Tue, 2019-07-16 at 08:37 +0200, Samuel Pitoiset wrote: For NGG, the driver relies on the VS outinfo struct. This fixes dEQP-VK.clipping.user_defined.clip_*_vert_tess_geom_* Should this be included in 19.1 stable branch? No, it's GFX10 specific. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 76d784b3374..b890ce56f16 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -2407,6 +2407,11 @@ scan_shader_output_decl(struct radv_shader_context *ctx, ctx->shader_info->tes.outinfo.cull_dist_mask = (1 << shader->info.cull_distance_array_size) - 1; ctx->shader_info->tes.outinfo.cull_dist_mask <<= shader->info.clip_distance_array_size; } + if (stage == MESA_SHADER_GEOMETRY) { + ctx->shader_info->vs.outinfo.clip_dist_mask = (1 << shader->info.clip_distance_array_size) - 1; + ctx->shader_info->vs.outinfo.cull_dist_mask = (1 << shader->info.cull_distance_array_size) - 1; + ctx->shader_info->vs.outinfo.cull_dist_mask <<= shader->info.clip_distance_array_size; + } } } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: use correct register setter for ngg hw addr
Reviewed-by: Samuel Pitoiset On 7/17/19 6:59 AM, Dave Airlie wrote: From: Dave Airlie this shouldn't matter, but it's good to be correct. --- src/amd/vulkan/radv_pipeline.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 5cdfe6d24eb..c7660c2900c 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3408,7 +3408,7 @@ radv_pipeline_generate_hw_ngg(struct radeon_cmdbuf *ctx_cs, radeon_set_sh_reg_seq(cs, R_00B320_SPI_SHADER_PGM_LO_ES, 2); radeon_emit(cs, va >> 8); - radeon_emit(cs, va >> 40); + radeon_emit(cs, S_00B324_MEM_BASE(va >> 40)); radeon_set_sh_reg_seq(cs, R_00B228_SPI_SHADER_PGM_RSRC1_GS, 2); radeon_emit(cs, shader->config.rsrc1); radeon_emit(cs, shader->config.rsrc2); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv/gfx10: disable the TC compat zrange workaround
Unnecessary. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_cmd_buffer.c | 7 ++- src/amd/vulkan/radv_device.c | 2 ++ src/amd/vulkan/radv_image.c | 7 --- src/amd/vulkan/radv_private.h| 1 + 4 files changed, 13 insertions(+), 4 deletions(-) diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c index a6d4e0d0e21..b4301c0da15 100644 --- a/src/amd/vulkan/radv_cmd_buffer.c +++ b/src/amd/vulkan/radv_cmd_buffer.c @@ -1356,7 +1356,8 @@ radv_update_zrange_precision(struct radv_cmd_buffer *cmd_buffer, uint32_t db_z_info = ds->db_z_info; uint32_t db_z_info_reg; - if (!radv_image_is_tc_compat_htile(image)) + if (!cmd_buffer->device->physical_device->has_tc_compat_zrange_bug || + !radv_image_is_tc_compat_htile(image)) return; if (!radv_layout_has_htile(image, layout, @@ -1566,6 +1567,10 @@ radv_set_tc_compat_zrange_metadata(struct radv_cmd_buffer *cmd_buffer, { struct radeon_cmdbuf *cs = cmd_buffer->cs; uint64_t va = radv_buffer_get_va(image->bo); + + if (!cmd_buffer->device->physical_device->has_tc_compat_zrange_bug) + return; + va += image->offset + image->tc_compat_zrange_offset; radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, cmd_buffer->state.predicating)); diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 9d75305fc2b..b397a9a8aa0 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -363,6 +363,8 @@ radv_physical_device_init(struct radv_physical_device *device, device->has_scissor_bug = device->rad_info.family == CHIP_VEGA10 || device->rad_info.family == CHIP_RAVEN; + device->has_tc_compat_zrange_bug = device->rad_info.chip_class < GFX10; + /* Out-of-order primitive rasterization. */ device->has_out_of_order_rast = device->rad_info.chip_class >= GFX8 && device->rad_info.max_se >= 2; diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c index ccbec36849e..4d3ed71c23c 100644 --- a/src/amd/vulkan/radv_image.c +++ b/src/amd/vulkan/radv_image.c @@ -1186,14 +1186,15 @@ radv_image_alloc_dcc(struct radv_image *image) } static void -radv_image_alloc_htile(struct radv_image *image) +radv_image_alloc_htile(struct radv_device *device, struct radv_image *image) { image->htile_offset = align64(image->size, image->planes[0].surface.htile_alignment); /* + 8 for storing the clear values */ image->clear_value_offset = image->htile_offset + image->planes[0].surface.htile_size; image->size = image->clear_value_offset + 8; - if (radv_image_is_tc_compat_htile(image)) { + if (radv_image_is_tc_compat_htile(image) && + device->physical_device->has_tc_compat_zrange_bug) { /* Metadata for the TC-compatible HTILE hardware bug which * have to be fixed by updating ZRANGE_PRECISION when doing * fast depth clears to 0.0f. @@ -1402,7 +1403,7 @@ radv_image_create(VkDevice _device, if (radv_image_can_enable_htile(image) && !(device->instance->debug_flags & RADV_DEBUG_NO_HIZ)) { image->tc_compatible_htile = image->planes[0].surface.flags & RADEON_SURF_TC_COMPATIBLE_HTILE; - radv_image_alloc_htile(image); + radv_image_alloc_htile(device, image); } else { radv_image_disable_htile(image); } diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h index e1b5b456ef3..931d4039397 100644 --- a/src/amd/vulkan/radv_private.h +++ b/src/amd/vulkan/radv_private.h @@ -317,6 +317,7 @@ struct radv_physical_device { bool has_clear_state; bool cpdma_prefetch_writes_memory; bool has_scissor_bug; + bool has_tc_compat_zrange_bug; bool has_out_of_order_rast; bool out_of_order_rast_allowed; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv/gfx10: implement VK_EXT_post_depth_coverage
I did implement this extension a while ago but it didn't work on pre GFX10 for some reasons. Now all CTS pass. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_extensions.py | 1 + src/amd/vulkan/radv_nir_to_llvm.c | 1 + src/amd/vulkan/radv_pipeline.c| 1 + src/amd/vulkan/radv_shader.c | 1 + src/amd/vulkan/radv_shader.h | 1 + 5 files changed, 5 insertions(+) diff --git a/src/amd/vulkan/radv_extensions.py b/src/amd/vulkan/radv_extensions.py index 8b6ba6a4df0..e9addad0035 100644 --- a/src/amd/vulkan/radv_extensions.py +++ b/src/amd/vulkan/radv_extensions.py @@ -120,6 +120,7 @@ EXTENSIONS = [ Extension('VK_EXT_memory_priority', 1, True), Extension('VK_EXT_pci_bus_info', 2, True), Extension('VK_EXT_pipeline_creation_feedback',1, True), +Extension('VK_EXT_post_depth_coverage', 1, 'device->rad_info.chip_class >= GFX10'), Extension('VK_EXT_queue_family_foreign', 1, True), Extension('VK_EXT_sample_locations', 1, True), Extension('VK_EXT_sampler_filter_minmax', 1, 'device->rad_info.chip_class >= GFX7'), diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index a689003d473..3e18303879e 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -4637,6 +4637,7 @@ ac_fill_shader_info(struct radv_shader_variant_info *shader_info, struct nir_sha break; case MESA_SHADER_FRAGMENT: shader_info->fs.early_fragment_test = nir->info.fs.early_fragment_tests; +shader_info->fs.post_depth_coverage = nir->info.fs.post_depth_coverage; break; case MESA_SHADER_GEOMETRY: shader_info->gs.vertices_in = nir->info.gs.vertices_in; diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 31495ec078d..7056ac8ca60 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -3822,6 +3822,7 @@ radv_compute_db_shader_control(const struct radv_device *device, S_02880C_MASK_EXPORT_ENABLE(mask_export_enable) | S_02880C_Z_ORDER(z_order) | S_02880C_DEPTH_BEFORE_SHADER(ps->info.fs.early_fragment_test) | + S_02880C_PRE_SHADER_DEPTH_COVERAGE_ENABLE(ps->info.fs.post_depth_coverage) | S_02880C_EXEC_ON_HIER_FAIL(ps->info.info.ps.writes_memory) | S_02880C_EXEC_ON_NOOP(ps->info.info.ps.writes_memory) | S_02880C_DUAL_QUAD_DISABLE(disable_rbplus); diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 1e9399de193..75f1ce3e869 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -270,6 +270,7 @@ radv_shader_compile_to_nir(struct radv_device *device, .int64_atomics = true, .multiview = true, .physical_storage_buffer_address = true, + .post_depth_coverage = true, .runtime_descriptor_array = true, .shader_viewport_index_layer = true, .stencil_export = true, diff --git a/src/amd/vulkan/radv_shader.h b/src/amd/vulkan/radv_shader.h index 360591349a8..fea0d1c8df1 100644 --- a/src/amd/vulkan/radv_shader.h +++ b/src/amd/vulkan/radv_shader.h @@ -283,6 +283,7 @@ struct radv_shader_variant_info { uint32_t float16_shaded_mask; bool can_discard; bool early_fragment_test; + bool post_depth_coverage; } fs; struct { unsigned block_size[3]; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv/gfx10: fallback to the legacy path if tess and extreme geometry
This is unsupported and hangs. This fixes GPU hangs with dEQP-VK.tessellation.geometry_interaction.limits.output_required_*. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 12 src/amd/vulkan/radv_shader.c | 2 +- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index d1eede172dc..a22e605ca1c 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -2306,6 +2306,18 @@ radv_fill_shader_keys(struct radv_device *device, } else { keys[MESA_SHADER_VERTEX].vs_common_out.as_ngg = true; } + + if (nir[MESA_SHADER_TESS_CTRL] && + nir[MESA_SHADER_GEOMETRY] && + nir[MESA_SHADER_GEOMETRY]->info.gs.invocations * + nir[MESA_SHADER_GEOMETRY]->info.gs.vertices_out > 256) { + /* Fallback to the legacy path if tessellation is +* enabled with extreme geometry because +* EN_MAX_VERT_OUT_PER_GS_INSTANCE doesn't work and it +* might hang. +*/ + keys[MESA_SHADER_TESS_EVAL].vs_common_out.as_ngg = false; + } } for(int i = 0; i < MESA_SHADER_STAGES; ++i) diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 1e9399de193..6bafcb2f869 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -796,7 +796,7 @@ static void radv_postprocess_config(const struct radv_physical_device *pdevice, break; } - if (pdevice->rad_info.chip_class >= GFX10 && + if (pdevice->rad_info.chip_class >= GFX10 && info->is_ngg && (stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_TESS_EVAL || stage == MESA_SHADER_GEOMETRY)) { unsigned gs_vgpr_comp_cnt, es_vgpr_comp_cnt; gl_shader_stage es_stage = stage; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radv/gfx10: always build the GS copy shader but uses it on-demand
It should be possible to build it on-demand too but it requires more work. On GFX10, the GS copy shader is required when tess is enabled with extreme geometry. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_cmd_buffer.c | 8 src/amd/vulkan/radv_pipeline.c | 21 ++--- src/amd/vulkan/radv_private.h| 2 ++ 3 files changed, 24 insertions(+), 7 deletions(-) diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c index 6a0db2b67e9..a6d4e0d0e21 100644 --- a/src/amd/vulkan/radv_cmd_buffer.c +++ b/src/amd/vulkan/radv_cmd_buffer.c @@ -929,7 +929,7 @@ radv_emit_prefetch_L2(struct radv_cmd_buffer *cmd_buffer, if (mask & RADV_PREFETCH_GS) { radv_emit_shader_prefetch(cmd_buffer, pipeline->shaders[MESA_SHADER_GEOMETRY]); - if (pipeline->gs_copy_shader) + if (radv_pipeline_has_gs_copy_shader(pipeline)) radv_emit_shader_prefetch(cmd_buffer, pipeline->gs_copy_shader); } @@ -1124,7 +1124,7 @@ radv_emit_graphics_pipeline(struct radv_cmd_buffer *cmd_buffer) pipeline->shaders[i]->bo); } - if (radv_pipeline_has_gs(pipeline) && pipeline->gs_copy_shader) + if (radv_pipeline_has_gs_copy_shader(pipeline)) radv_cs_add_buffer(cmd_buffer->device->ws, cmd_buffer->cs, pipeline->gs_copy_shader->bo); @@ -2362,7 +2362,7 @@ radv_emit_streamout_buffers(struct radv_cmd_buffer *cmd_buffer, uint64_t va) base_reg + loc->sgpr_idx * 4, va, false); } - if (pipeline->gs_copy_shader) { + if (radv_pipeline_has_gs_copy_shader(pipeline)) { loc = >gs_copy_shader->info.user_sgprs_locs.shader_data[AC_UD_STREAMOUT_BUFFERS]; if (loc->sgpr_idx != -1) { base_reg = R_00B130_SPI_SHADER_USER_DATA_VS_0; @@ -4071,7 +4071,7 @@ static void radv_emit_view_index(struct radv_cmd_buffer *cmd_buffer, unsigned in radeon_set_sh_reg(cmd_buffer->cs, base_reg + loc->sgpr_idx * 4, index); } - if (pipeline->gs_copy_shader) { + if (radv_pipeline_has_gs_copy_shader(pipeline)) { struct radv_userdata_info *loc = >gs_copy_shader->info.user_sgprs_locs.shader_data[AC_UD_VIEW_INDEX]; if (loc->sgpr_idx != -1) { uint32_t base_reg = R_00B130_SPI_SHADER_USER_DATA_VS_0; diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index 31495ec078d..d1eede172dc 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -120,6 +120,22 @@ bool radv_pipeline_has_ngg(const struct radv_pipeline *pipeline) return variant->info.is_ngg; } +bool radv_pipeline_has_gs_copy_shader(const struct radv_pipeline *pipeline) +{ + if (!radv_pipeline_has_gs(pipeline)) + return false; + + /* The GS copy shader is required if the pipeline has GS on GFX6-GFX9. +* On GFX10, it might be required in rare cases if it's not possible to +* enable NGG. +*/ + if (radv_pipeline_has_ngg(pipeline)) + return false; + + assert(pipeline->gs_copy_shader); + return true; +} + static void radv_pipeline_destroy(struct radv_device *device, struct radv_pipeline *pipeline, @@ -2395,7 +2411,6 @@ void radv_create_shaders(struct radv_pipeline *pipeline, struct radv_shader_binary *binaries[MESA_SHADER_STAGES] = {NULL}; struct radv_shader_variant_key keys[MESA_SHADER_STAGES] = {0}; unsigned char hash[20], gs_copy_hash[20]; - bool use_ngg = device->physical_device->rad_info.chip_class >= GFX10; radv_start_feedback(pipeline_feedback); @@ -2416,7 +2431,7 @@ void radv_create_shaders(struct radv_pipeline *pipeline, gs_copy_hash[0] ^= 1; bool found_in_application_cache = true; - if (modules[MESA_SHADER_GEOMETRY] && !use_ngg) { + if (modules[MESA_SHADER_GEOMETRY]) { struct radv_shader_variant *variants[MESA_SHADER_STAGES] = {0}; radv_create_shader_variants_from_pipeline_cache(device, cache, gs_copy_hash, variants, _in_application_cache); @@ -2567,7 +2582,7 @@ void radv_create_shaders(struct radv_pipeline *pipeline, } } - if(modules[MESA_SHADER_GEOMETRY] && !use_ngg) { + if(modules[MESA_SHADER_GEOMETRY]) { struct radv_shader_binary *gs_copy_binary = NULL; if (!pipeline->gs_copy_shader) { pipeline->gs_copy_shader = radv_create_gs_copy_shader( diff --git a/src/amd/vulkan/radv_private.h b/sr
Re: [Mesa-dev] [PATCH] android: radv/gfx10: generate gfx10_format_table.h
Acked-by: Samuel Pitoiset On 7/10/19 9:13 AM, Mauro Rossi wrote: This patch adds gfx10_format_table.h in Makefile.sources and the rules for Android to fix following building errors: In file included from external/mesa/src/amd/vulkan/radv_debug.c:35: In file included from external/mesa/src/amd/vulkan/radv_debug.h:27: external/mesa/src/amd/vulkan/radv_private.h:95:10: fatal error: 'gfx10_format_table.h' file not found ^~ 1 error generated. In file included from external/mesa/src/amd/vulkan/radv_android.c:31: external/mesa/src/amd/vulkan/radv_private.h:95:10: fatal error: 'gfx10_format_table.h' file not found ^~ 1 error generated. Fixes: 3dc5ec5d16 ("radv/gfx10: generate gfx10_format_table.h") Signed-off-by: Mauro Rossi --- src/amd/vulkan/Android.mk | 15 +++ src/amd/vulkan/Makefile.sources | 3 ++- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/Android.mk b/src/amd/vulkan/Android.mk index 0725feacb5..23cebb1ec8 100644 --- a/src/amd/vulkan/Android.mk +++ b/src/amd/vulkan/Android.mk @@ -83,6 +83,7 @@ LOCAL_GENERATED_SOURCES += $(intermediates)/radv_entrypoints.h LOCAL_GENERATED_SOURCES += $(intermediates)/radv_extensions.c LOCAL_GENERATED_SOURCES += $(intermediates)/radv_extensions.h LOCAL_GENERATED_SOURCES += $(intermediates)/vk_format_table.c +LOCAL_GENERATED_SOURCES += $(intermediates)/gfx10_format_table.h RADV_ENTRYPOINTS_SCRIPT := $(MESA_TOP)/src/amd/vulkan/radv_entrypoints_gen.py RADV_EXTENSIONS_SCRIPT := $(MESA_TOP)/src/amd/vulkan/radv_extensions.py @@ -117,6 +118,20 @@ $(intermediates)/vk_format_table.c: $(VK_FORMAT_TABLE_SCRIPT) \ @mkdir -p $(dir $@) $(MESA_PYTHON2) $(VK_FORMAT_TABLE_SCRIPT) $(vk_format_layout_csv) > $@ +RADV_GEN10_FORMAT_TABLE_INPUTS := \ + $(MESA_TOP)/src/amd/vulkan/vk_format_layout.csv \ + $(MESA_TOP)/src/amd/registers/gfx10-rsrc.json + +RADV_GEN10_FORMAT_TABLE_DEP := \ + $(MESA_TOP)/src/amd/registers/regdb.py + +RADV_GEN10_FORMAT_TABLE := $(LOCAL_PATH)/gfx10_format_table.py + +$(intermediates)/gfx10_format_table.h: $(RADV_GEN10_FORMAT_TABLE) $(RADV_GEN10_FORMAT_TABLE_INPUTS) $(RADV_GEN10_FORMAT_TABLE_DEP) + @mkdir -p $(dir $@) + @echo "Gen Header: $(PRIVATE_MODULE) <= $(notdir $(@))" + $(hide) $(MESA_PYTHON2) $(RADV_GEN10_FORMAT_TABLE) $(RADV_GEN10_FORMAT_TABLE_INPUTS) > $@ || ($(RM) $@; false) + LOCAL_SHARED_LIBRARIES += $(RADV_SHARED_LIBRARIES) LOCAL_EXPORT_C_INCLUDE_DIRS := \ diff --git a/src/amd/vulkan/Makefile.sources b/src/amd/vulkan/Makefile.sources index df90c1150a..312cd0b1e9 100644 --- a/src/amd/vulkan/Makefile.sources +++ b/src/amd/vulkan/Makefile.sources @@ -91,5 +91,6 @@ VULKAN_GENERATED_FILES := \ radv_entrypoints.h \ radv_extensions.c \ radv_extensions.h \ - vk_format_table.c + vk_format_table.c \ + gfx10_format_table.h ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv: update LATE_ALLOC_VS.LIMIT
Mirror RadeonSI. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/si_cmd_buffer.c | 60 -- 1 file changed, 42 insertions(+), 18 deletions(-) diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c index a832dbd89eb..e996fa250a9 100644 --- a/src/amd/vulkan/si_cmd_buffer.c +++ b/src/amd/vulkan/si_cmd_buffer.c @@ -264,9 +264,6 @@ si_emit_graphics(struct radv_physical_device *physical_device, /* Logical CUs 16 - 31 */ radeon_set_sh_reg(cs, R_00B404_SPI_SHADER_PGM_RSRC4_HS, S_00B404_CU_EN(0x)); - radeon_set_sh_reg(cs, R_00B204_SPI_SHADER_PGM_RSRC4_GS, - S_00B204_CU_EN(0x) | - S_00B204_SPI_SHADER_LATE_ALLOC_GS_GFX10(0)); radeon_set_sh_reg(cs, R_00B104_SPI_SHADER_PGM_RSRC4_VS, S_00B104_CU_EN(0x)); radeon_set_sh_reg(cs, R_00B004_SPI_SHADER_PGM_RSRC4_PS, @@ -291,28 +288,55 @@ si_emit_graphics(struct radv_physical_device *physical_device, S_028A44_ES_VERTS_PER_SUBGRP(64) | S_028A44_GS_PRIMS_PER_SUBGRP(4)); } - radeon_set_sh_reg(cs, R_00B21C_SPI_SHADER_PGM_RSRC3_GS, - S_00B21C_CU_EN(0x) | S_00B21C_WAVE_LIMIT(0x3F)); - if (physical_device->rad_info.num_good_cu_per_sh <= 4) { + /* Compute LATE_ALLOC_VS.LIMIT. */ + unsigned num_cu_per_sh = physical_device->rad_info.num_good_cu_per_sh; + unsigned late_alloc_limit; /* The limit is per SH. */ + + if (physical_device->rad_info.family == CHIP_KABINI) { + late_alloc_limit = 0; /* Potential hang on Kabini. */ + } else if (num_cu_per_sh <= 4) { /* Too few available compute units per SH. Disallowing -* VS to run on CU0 could hurt us more than late VS +* VS to run on one CU could hurt us more than late VS * allocation would help. * -* LATE_ALLOC_VS = 2 is the highest safe number. +* 2 is the highest safe number that allows us to keep +* all CUs enabled. */ - radeon_set_sh_reg(cs, R_00B118_SPI_SHADER_PGM_RSRC3_VS, - S_00B118_CU_EN(0x) | S_00B118_WAVE_LIMIT(0x3F) ); - radeon_set_sh_reg(cs, R_00B11C_SPI_SHADER_LATE_ALLOC_VS, S_00B11C_LIMIT(2)); + late_alloc_limit = 2; } else { - /* Set LATE_ALLOC_VS == 31. It should be less than -* the number of scratch waves. Limitations: -* - VS can't execute on CU0. -* - If HS writes outputs to LDS, LS can't execute on CU0. + /* This is a good initial value, allowing 1 late_alloc +* wave per SIMD on num_cu - 2. */ - radeon_set_sh_reg(cs, R_00B118_SPI_SHADER_PGM_RSRC3_VS, - S_00B118_CU_EN(0xfffe) | S_00B118_WAVE_LIMIT(0x3F)); - radeon_set_sh_reg(cs, R_00B11C_SPI_SHADER_LATE_ALLOC_VS, S_00B11C_LIMIT(31)); + late_alloc_limit = (num_cu_per_sh - 2) * 4; + } + + unsigned cu_mask_vs = 0x; + unsigned cu_mask_gs = 0x; + + if (late_alloc_limit > 2) { + if (physical_device->rad_info.chip_class >= GFX10) { + /* CU2 & CU3 disabled because of the dual CU design */ + cu_mask_vs = 0xfff3; + cu_mask_gs = 0xfff3; /* NGG only */ + } else { + cu_mask_vs = 0xfffe; /* 1 CU disabled */ + } + } + + radeon_set_sh_reg(cs, R_00B118_SPI_SHADER_PGM_RSRC3_VS, + S_00B118_CU_EN(cu_mask_vs) | + S_00B118_WAVE_LIMIT(0x3F)); + radeon_set_sh_reg(cs, R_00B11C_SPI_SHADER_LATE_ALLOC_VS, + S_00B11C_LIMIT(late_alloc_limit)); + + radeon_set_sh_reg(cs, R_00B21C_SPI_SHADER_PGM_RSRC3_GS, + S_00B21C_CU_EN(cu_mask_gs) | S_00B21C_WAVE_LIMIT(0x3F)); + + if (physical_device->rad_info.chip_class >= GFX10) { + radeon_set_sh_reg(cs
[Mesa-dev] [PATCH 1/2] radv/gfx10: support pixel shaders without exports
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_pipeline.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c index fdb0ed29ea4..31495ec078d 100644 --- a/src/amd/vulkan/radv_pipeline.c +++ b/src/amd/vulkan/radv_pipeline.c @@ -4283,9 +4283,15 @@ radv_pipeline_init(struct radv_pipeline *pipeline, *stalls without this setting. * * Don't add this to CB_SHADER_MASK. +* +* GFX10 supports pixel shaders without exports by setting both the +* color and Z formats to SPI_SHADER_ZERO. The hw will skip export +* instructions if any are present. */ struct radv_shader_variant *ps = pipeline->shaders[MESA_SHADER_FRAGMENT]; - if (!blend.spi_shader_col_format) { + if ((pipeline->device->physical_device->rad_info.chip_class <= GFX9 || +ps->info.fs.can_discard) && + !blend.spi_shader_col_format) { if (!ps->info.info.ps.writes_z && !ps->info.info.ps.writes_stencil && !ps->info.info.ps.writes_sample_mask) -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: fix gathering clip/cull distance masks for GS
For NGG, the driver relies on the VS outinfo struct. This fixes dEQP-VK.clipping.user_defined.clip_*_vert_tess_geom_* Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_nir_to_llvm.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index 76d784b3374..b890ce56f16 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -2407,6 +2407,11 @@ scan_shader_output_decl(struct radv_shader_context *ctx, ctx->shader_info->tes.outinfo.cull_dist_mask = (1 << shader->info.cull_distance_array_size) - 1; ctx->shader_info->tes.outinfo.cull_dist_mask <<= shader->info.clip_distance_array_size; } + if (stage == MESA_SHADER_GEOMETRY) { + ctx->shader_info->vs.outinfo.clip_dist_mask = (1 << shader->info.clip_distance_array_size) - 1; + ctx->shader_info->vs.outinfo.cull_dist_mask = (1 << shader->info.cull_distance_array_size) - 1; + ctx->shader_info->vs.outinfo.cull_dist_mask <<= shader->info.clip_distance_array_size; + } } } -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv/gfx10: enable OC_LDS_EN for NGG GS if the ES stage is TES
Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_shader.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index f6b0297d4a3..1e9399de193 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -826,7 +826,8 @@ static void radv_postprocess_config(const struct radv_physical_device *pdevice, config_out->rsrc1 |= S_00B228_GS_VGPR_COMP_CNT(gs_vgpr_comp_cnt) | S_00B228_WGP_MODE(1); config_out->rsrc2 |= S_00B22C_ES_VGPR_COMP_CNT(es_vgpr_comp_cnt) | -S_00B22C_LDS_SIZE(config_in->lds_size); +S_00B22C_LDS_SIZE(config_in->lds_size) | +S_00B22C_OC_LDS_EN(es_stage == MESA_SHADER_TESS_EVAL); } else if (pdevice->rad_info.chip_class >= GFX9 && stage == MESA_SHADER_GEOMETRY) { unsigned es_type = info->gs.es_type; -- 2.22.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev