Re: [Mesa-dev] [PATCH 2/2] i965/tcs: support gl_PatchVerticesIn as a system value from SPIR-V
On Saturday, January 6, 2018 9:07:44 PM PST Jason Ekstrand wrote:
> On Sat, Jan 6, 2018 at 5:12 PM, Kenneth Graunke wrote:
>
> > On Wednesday, November 15, 2017 11:53:08 PM PST Iago Toral Quiroga wrote:
> > > We currently handle this by lowering it to a uniform for gen8+ but
> > > the SPIR-V path generates this as a system value, so handle that
> > > case as well.
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_tcs.c | 9 -
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c b/src/mesa/drivers/dri/i965/brw_tcs.c
> > > index 4424efea4f0..b07b11f485d 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_tcs.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_tcs.c
> > > @@ -296,7 +296,14 @@ brw_tcs_populate_key(struct brw_context *brw,
> > >        per_patch_slots |= prog->info.patch_outputs_written;
> > >     }
> > >
> > > -   if (devinfo->gen < 8 || !tcp)
> > > +   /* For GLSL, gen8+ lowers gl_PatchVerticesIn to a uniform, however
> > > +    * the SPIR-V path always lowers it to a system value.
> > > +    */
> > > +   bool reads_patch_vertices_as_system_value =
> > > +      tcp && (tcp->program.nir->info.system_values_read &
> > > +              BITFIELD64_BIT(SYSTEM_VALUE_VERTICES_IN));
> > > +
> > > +   if (devinfo->gen < 8 || !tcp || reads_patch_vertices_as_system_value)
> > >        key->input_vertices = brw->ctx.TessCtrlProgram.patch_vertices;
> > >     key->outputs_written = per_vertex_slots;
> > >     key->patch_outputs_written = per_patch_slots;
> > >
> >
> > I guess this is okay, and it's better than nothing. I'd really rather
> > see it converted to a uniform, like it is in the normal GLSL paths. If
> > you're going to add recompiles based on the key like this, it might be
> > nice to at least update the brw_tcs_precompile function to guess, so we
> > at least attempt to avoid a recompile.
> >
>
> Ugh... I'm happy to give a stronger "I don't like this". In Vulkan, this
> is part of the pipeline state so we just pass it in through the shader
> key. With GL, ugh... Personally, I think I'd be ok with just making it
> state based all the time but we already have the infrastructure to pass it
> through as a uniform so we may as well. I think the better thing to do
> would be to add a quick little pass that moves VERTICES_IN to a uniform and
> call that on gen8+ brw_link.cpp. Then we can delete
> LowerTESPatchVerticesIn as i965 is the only user. The "pass" would be
> really easy:

LowerTCSPatchVerticesIn rather. I like this plan.

>
> void
> brw_nir_lower_tcs_vertices_in_to_uniform(nir_shader *nir, const struct
> gl_program *prog, brw_tcs_prog_data *prog_data)
> {
>    int uniform = -1;
>    nir_foreach_var_safe(var, &nir->system_values) {
>       if (var->data.location != SYSTEM_VALUE_VERTICES_IN)
>          continue;
>
>       if (uniform < -1) {
>          gl_state_index tokens[5] = {
>             STATE_INTERNAL,
>             STATE_TESS_PATCH_VERTICES_IN,
>          };
>          int index = _mesa_add_state_reference(prog->Parameters, tokens);
>
>          uniform = prog_data->nr_params;
>          uint32_t *param =
>             brw_stage_prog_data_add_params(&prog_data->base->base, 1);
>          *param = BRW_PARAM_PARAMETER(index, SWIZZLE_);
>       }
>
>       var->mode = nir_var_uniform;
>       var->data.location = uniform;
>       exec_node_remove(&var->node);
>       exec_list_push_tail(&nir->uniforms, &var->node);
>    }
> }
>
> I may not have gotten my state referencing quite right there, but I think
> it's close. I'd probably put the pas in brw_nir_uniforms.cpp if I was
> writing it.
>
> --Jason
>

signature.asc
Description: This is a digitally signed message part.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965/tcs: support gl_PatchVerticesIn as a system value from SPIR-V
On Sat, Jan 6, 2018 at 5:12 PM, Kenneth Graunke wrote:

> On Wednesday, November 15, 2017 11:53:08 PM PST Iago Toral Quiroga wrote:
> > We currently handle this by lowering it to a uniform for gen8+ but
> > the SPIR-V path generates this as a system value, so handle that
> > case as well.
> > ---
> >  src/mesa/drivers/dri/i965/brw_tcs.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c b/src/mesa/drivers/dri/i965/brw_tcs.c
> > index 4424efea4f0..b07b11f485d 100644
> > --- a/src/mesa/drivers/dri/i965/brw_tcs.c
> > +++ b/src/mesa/drivers/dri/i965/brw_tcs.c
> > @@ -296,7 +296,14 @@ brw_tcs_populate_key(struct brw_context *brw,
> >        per_patch_slots |= prog->info.patch_outputs_written;
> >     }
> >
> > -   if (devinfo->gen < 8 || !tcp)
> > +   /* For GLSL, gen8+ lowers gl_PatchVerticesIn to a uniform, however
> > +    * the SPIR-V path always lowers it to a system value.
> > +    */
> > +   bool reads_patch_vertices_as_system_value =
> > +      tcp && (tcp->program.nir->info.system_values_read &
> > +              BITFIELD64_BIT(SYSTEM_VALUE_VERTICES_IN));
> > +
> > +   if (devinfo->gen < 8 || !tcp || reads_patch_vertices_as_system_value)
> >        key->input_vertices = brw->ctx.TessCtrlProgram.patch_vertices;
> >     key->outputs_written = per_vertex_slots;
> >     key->patch_outputs_written = per_patch_slots;
> >
>
> I guess this is okay, and it's better than nothing. I'd really rather
> see it converted to a uniform, like it is in the normal GLSL paths. If
> you're going to add recompiles based on the key like this, it might be
> nice to at least update the brw_tcs_precompile function to guess, so we
> at least attempt to avoid a recompile.
>

Ugh... I'm happy to give a stronger "I don't like this". In Vulkan, this
is part of the pipeline state so we just pass it in through the shader
key. With GL, ugh... Personally, I think I'd be ok with just making it
state based all the time but we already have the infrastructure to pass it
through as a uniform so we may as well. I think the better thing to do
would be to add a quick little pass that moves VERTICES_IN to a uniform and
call that on gen8+ brw_link.cpp. Then we can delete
LowerTESPatchVerticesIn as i965 is the only user. The "pass" would be
really easy:

void
brw_nir_lower_tcs_vertices_in_to_uniform(nir_shader *nir,
                                         const struct gl_program *prog,
                                         brw_tcs_prog_data *prog_data)
{
   int uniform = -1;
   nir_foreach_var_safe(var, &nir->system_values) {
      if (var->data.location != SYSTEM_VALUE_VERTICES_IN)
         continue;

      if (uniform < -1) {
         gl_state_index tokens[5] = {
            STATE_INTERNAL,
            STATE_TESS_PATCH_VERTICES_IN,
         };
         int index = _mesa_add_state_reference(prog->Parameters, tokens);

         uniform = prog_data->nr_params;
         uint32_t *param =
            brw_stage_prog_data_add_params(&prog_data->base->base, 1);
         *param = BRW_PARAM_PARAMETER(index, SWIZZLE_);
      }

      var->mode = nir_var_uniform;
      var->data.location = uniform;
      exec_node_remove(&var->node);
      exec_list_push_tail(&nir->uniforms, &var->node);
   }
}

I may not have gotten my state referencing quite right there, but I think
it's close. I'd probably put the pas in brw_nir_uniforms.cpp if I was
writing it.

--Jason

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
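
For readers following along, here is a minimal, self-contained model of the
"allocate the uniform once, then retag every matching variable" idea from the
pass above. It is only a sketch: the types, enum values and the function name
lower_vertices_in_to_uniform are hypothetical stand-ins, not the NIR or i965
API, and state-reference setup is left out. It also assumes the intent of the
guard was "no slot allocated yet"; as posted, `uniform < -1` can never be true
when uniform starts at -1, so the sketch uses `uniform < 0`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the NIR/i965 types; not the real Mesa API. */
enum var_mode { VAR_SYSTEM_VALUE, VAR_UNIFORM };
enum { SYSVAL_VERTICES_IN = 1, SYSVAL_OTHER = 2 }; /* illustrative values */

struct var {
   enum var_mode mode;
   int location;
};

struct prog_data {
   unsigned nr_params; /* number of uniform slots allocated so far */
};

/* Retag every VERTICES_IN system value as a uniform.  The uniform slot is
 * allocated on the first match only and reused for later matches.  Returns
 * the slot index, or -1 if no variable reads the value.
 */
static int
lower_vertices_in_to_uniform(struct var *vars, size_t count,
                             struct prog_data *prog_data)
{
   int uniform = -1;

   for (size_t i = 0; i < count; i++) {
      if (vars[i].mode != VAR_SYSTEM_VALUE ||
          vars[i].location != SYSVAL_VERTICES_IN)
         continue;

      /* Assumed intent: allocate exactly once, on the first match. */
      if (uniform < 0)
         uniform = (int)prog_data->nr_params++;

      vars[i].mode = VAR_UNIFORM;
      vars[i].location = uniform;
   }

   return uniform;
}
```

In the real pass the variable would also be moved from the system-value list
to the uniform list, which is why the original uses a "safe" iterator.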
Re: [Mesa-dev] [PATCH] nir: fix st_nir_assign_var_locations for patch variables
Thanks.

Reviewed-by: Timothy Arceri

On 06/01/18 20:01, Karol Herbst wrote:

Signed-off-by: Karol Herbst
Reviewed-by: Kenneth Graunke
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index 5683df..1c5de3d5de 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -139,8 +139,12 @@ st_nir_assign_var_locations(struct exec_list *var_list, unsigned *size,
       }
 
       bool processed = false;
-      if (var->data.patch) {
-         unsigned patch_loc = var->data.location - VARYING_SLOT_VAR0;
+      if (var->data.patch &&
+          var->data.location != VARYING_SLOT_TESS_LEVEL_INNER &&
+          var->data.location != VARYING_SLOT_TESS_LEVEL_OUTER &&
+          var->data.location != VARYING_SLOT_BOUNDING_BOX0 &&
+          var->data.location != VARYING_SLOT_BOUNDING_BOX1) {
+         unsigned patch_loc = var->data.location - VARYING_SLOT_PATCH0;
 
          if (processed_patch_locs & (1 << patch_loc))
             processed = true;

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
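
The check in the patch above can be modelled in isolation. This is a sketch
under assumptions: the SLOT_* values are illustrative and are not taken from
Mesa's headers, and generic_patch_index is a hypothetical helper name. It
shows the two things the fix appears to rely on: the built-in per-patch slots
are skipped, and the index is taken relative to the first generic patch slot
so it is small enough to use as a bit position.

```c
#include <stdbool.h>

/* Illustrative slot numbering only; the real values are defined by Mesa
 * and may differ.
 */
enum {
   SLOT_TESS_LEVEL_OUTER = 24,
   SLOT_TESS_LEVEL_INNER = 25,
   SLOT_BOUNDING_BOX0 = 26,
   SLOT_BOUNDING_BOX1 = 27,
   SLOT_PATCH0 = 64,
};

/* Returns true and writes the zero-based patch index when the variable is a
 * generic per-patch varying.  Returns false for non-patch variables and for
 * the built-in per-patch slots, which sit below SLOT_PATCH0 here and would
 * make the subtraction go negative.
 */
static bool
generic_patch_index(bool is_patch, int location, unsigned *patch_loc)
{
   if (!is_patch ||
       location == SLOT_TESS_LEVEL_INNER ||
       location == SLOT_TESS_LEVEL_OUTER ||
       location == SLOT_BOUNDING_BOX0 ||
       location == SLOT_BOUNDING_BOX1)
      return false;

   *patch_loc = (unsigned)(location - SLOT_PATCH0);
   return true;
}
```

The output parameter is left untouched on a false return, so a caller can
keep a previous value or a sentinel there.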
Re: [Mesa-dev] [PATCH 2/2] i965/tcs: support gl_PatchVerticesIn as a system value from SPIR-V
On Wednesday, November 15, 2017 11:53:08 PM PST Iago Toral Quiroga wrote:
> We currently handle this by lowering it to a uniform for gen8+ but
> the SPIR-V path generates this as a system value, so handle that
> case as well.
> ---
>  src/mesa/drivers/dri/i965/brw_tcs.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c b/src/mesa/drivers/dri/i965/brw_tcs.c
> index 4424efea4f0..b07b11f485d 100644
> --- a/src/mesa/drivers/dri/i965/brw_tcs.c
> +++ b/src/mesa/drivers/dri/i965/brw_tcs.c
> @@ -296,7 +296,14 @@ brw_tcs_populate_key(struct brw_context *brw,
>        per_patch_slots |= prog->info.patch_outputs_written;
>     }
>
> -   if (devinfo->gen < 8 || !tcp)
> +   /* For GLSL, gen8+ lowers gl_PatchVerticesIn to a uniform, however
> +    * the SPIR-V path always lowers it to a system value.
> +    */
> +   bool reads_patch_vertices_as_system_value =
> +      tcp && (tcp->program.nir->info.system_values_read &
> +              BITFIELD64_BIT(SYSTEM_VALUE_VERTICES_IN));
> +
> +   if (devinfo->gen < 8 || !tcp || reads_patch_vertices_as_system_value)
>        key->input_vertices = brw->ctx.TessCtrlProgram.patch_vertices;
>     key->outputs_written = per_vertex_slots;
>     key->patch_outputs_written = per_patch_slots;
>

I guess this is okay, and it's better than nothing. I'd really rather
see it converted to a uniform, like it is in the normal GLSL paths. If
you're going to add recompiles based on the key like this, it might be
nice to at least update the brw_tcs_precompile function to guess, so we
at least attempt to avoid a recompile.

As a stop-gap measure,
Reviewed-by: Kenneth Graunke

signature.asc
Description: This is a digitally signed message part.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
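
The condition the patch adds to brw_tcs_populate_key can be written as a
small standalone predicate, which makes the recompile concern easier to see:
whenever it returns true, the patch vertex count becomes part of the shader
key. This is a sketch, not driver code. SYSVAL_VERTICES_IN is an illustrative
bit index and key_needs_input_vertices is a hypothetical name; only the shape
of the test mirrors the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define BITFIELD64_BIT(b) (UINT64_C(1) << (b))

/* Illustrative bit index; the real enum value is defined by Mesa. */
#define SYSVAL_VERTICES_IN 20

/* True when the patch vertex count has to be baked into the shader key:
 * before gen8, when there is no TCS program, or when the TCS reads the
 * value as a system value (the SPIR-V path in the patch).
 */
static bool
key_needs_input_vertices(int gen, bool has_tcs, uint64_t system_values_read)
{
   bool reads_as_sysval =
      has_tcs && (system_values_read & BITFIELD64_BIT(SYSVAL_VERTICES_IN));

   return gen < 8 || !has_tcs || reads_as_sysval;
}
```

On gen8+ with a GLSL-style TCS that reads the value through a uniform, the
predicate is false and the key stays independent of the patch size.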
Re: [Mesa-dev] [PATCH 00/15] RadeonSI 32-bit GPU pointers
On Sat, Jan 6, 2018 at 5:51 PM, Christian König wrote:
> Hi Marek,
>
> actually I was on the verge to remove the 32bit VM support in libdrm because
> it clashes with HMM and SVM in general.
>
> Is it possible to set the upper 32bit of the 64bit address to some fixed
> value instead?

Yes, but not on radeon. radeon only has 8GB of virtual address space
and 4GB on older kernels. I would have to change LLVM to set the high
bits differently on amdgpu but keep the high bits 0 on radeon.

Marek

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
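
To make the "fixed upper 32 bits" question concrete, here is a tiny sketch of
how a full 64-bit address would be rebuilt from a 32-bit pointer plus a
constant high half. The function name expand_gpu_pointer is hypothetical and
this is plain integer arithmetic, not what LLVM or the kernel actually emits;
a high half of 0 corresponds to the "keep the high bits 0" case mentioned for
radeon.

```c
#include <stdint.h>

/* Rebuild a 64-bit GPU address from a 32-bit pointer and a fixed upper
 * half.  hi32 == 0 models keeping every such pointer in the low 4GB.
 */
static uint64_t
expand_gpu_pointer(uint32_t lo32, uint32_t hi32)
{
   return ((uint64_t)hi32 << 32) | lo32;
}
```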
Re: [Mesa-dev] [PATCH v3 3/4] meson: build clover
Quoting Jan Vesely (2018-01-06 15:18:54) > On Fri, 2018-01-05 at 15:26 -0800, Dylan Baker wrote: > > Quoting Jan Vesely (2018-01-05 14:16:41) > > > Hi, > > > > > > > > > sorry for the delay. I was mostly traveling during the holidays. > > > > No worries, I was also away over the holidays and didn't look at this until > > today. > > > > > > > > On Fri, 2017-12-15 at 10:54 -0800, Dylan Baker wrote: > > > > This has only been compile tested. > > > > > > > > v2: - Have a single option for opencl (Eric E) > > > > - fix typo "tgis" -> "tgsi" (Curro) > > > > - Don't add "lib" to pipe loader libraries, which matches the > > > > autotools behavior > > > > v3: - Remove trailing whitespace > > > > - Make PIPE_SEARCH_DIR an absolute path > > > > > > > > cc: Curro Jerez> > > > cc: Jan Vesely > > > > cc: Aaron Watry > > > > Signed-off-by: Dylan Baker > > > > --- > > > > include/meson.build | 19 > > > > meson.build | 29 +- > > > > meson_options.txt | 7 ++ > > > > src/gallium/auxiliary/pipe-loader/meson.build | 3 +- > > > > src/gallium/meson.build | 12 ++- > > > > src/gallium/state_trackers/clover/meson.build | 122 > > > > ++ > > > > src/gallium/targets/opencl/meson.build| 73 +++ > > > > src/gallium/targets/pipe-loader/meson.build | 77 > > > > 8 files changed, 336 insertions(+), 6 deletions(-) > > > > create mode 100644 src/gallium/state_trackers/clover/meson.build > > > > create mode 100644 src/gallium/targets/opencl/meson.build > > > > create mode 100644 src/gallium/targets/pipe-loader/meson.build > > > > > > > > diff --git a/include/meson.build b/include/meson.build > > > > index e4dae91cede..a2e7ce6580e 100644 > > > > --- a/include/meson.build > > > > +++ b/include/meson.build > > > > @@ -78,3 +78,22 @@ if with_gallium_st_nine > > > > subdir : 'd3dadapter', > > > >) > > > > endif > > > > + > > > > +# Only install the headers if we are building a stand alone > > > > implementation and > > > > +# not an ICD enabled implementation > > > > +if with_gallium_opencl and not 
with_opencl_icd > > > > + install_headers( > > > > +'CL/cl.h', > > > > +'CL/cl.hpp', > > > > +'CL/cl_d3d10.h', > > > > +'CL/cl_d3d11.h', > > > > +'CL/cl_dx9_media_sharing.h', > > > > +'CL/cl_egl.h', > > > > +'CL/cl_ext.h', > > > > +'CL/cl_gl.h', > > > > +'CL/cl_gl_ext.h', > > > > +'CL/cl_platform.h', > > > > +'CL/opencl.h', > > > > +subdir: 'CL' > > > > + ) > > > > +endif > > > > diff --git a/meson.build b/meson.build > > > > index 842d441199e..74b2d5c49dc 100644 > > > > --- a/meson.build > > > > +++ b/meson.build > > > > @@ -583,6 +583,22 @@ if with_gallium_st_nine > > > >endif > > > > endif > > > > > > > > +_opencl = get_option('gallium-opencl') > > > > +if _opencl !=' disabled' > > > > + if not with_gallium > > > > +error('OpenCL Clover implementation requires at least one gallium > > > > driver.') > > > > + endif > > > > + > > > > + # TODO: alitvec? > > > > + dep_clc = dependency('libclc') > > > > + with_gallium_opencl = true > > > > + with_opencl_icd = _opencl == 'icd' > > > > +else > > > > + dep_clc = [] > > > > + with_gallium_opencl = false > > > > + with_gallium_icd = false > > > > +endif > > > > + > > > > gl_pkgconfig_c_flags = [] > > > > if with_platform_x11 > > > >if with_any_vk or (with_glx == 'dri' and with_dri_platform == 'drm') > > > > @@ -930,7 +946,7 @@ dep_thread = dependency('threads') > > > > if dep_thread.found() and host_machine.system() != 'windows' > > > >pre_args += '-DHAVE_PTHREAD' > > > > endif > > > > -if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 # TODO: > > > > clover > > > > +if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 or > > > > with_gallium_opencl > > > >dep_elf = dependency('libelf', required : false) > > > >if not dep_elf.found() > > > > dep_elf = cc.find_library('elf') > > > > @@ -972,12 +988,19 @@ if with_amd_vk or with_gallium_radeonsi or > > > > with_gallium_r600 > > > > llvm_modules += 'asmparser' > > > >endif > > > > endif > > > > +if with_gallium_opencl > > > > + llvm_modules += [ > > > > 
+'all-targets', 'linker', 'coverage', 'instrumentation', 'ipo', > > > > 'irreader', > > > > +'lto', 'option', 'objcarcopts', 'profiledata', > > > > + ] > > > > + # TODO: optional modules > > > > +endif > > > > > > > > _llvm = get_option('llvm') > > > > if _llvm == 'auto' > > > >dep_llvm = dependency( > > > > 'llvm', version : '>= 3.9.0', modules : llvm_modules, > > > > -required : with_amd_vk or with_gallium_radeonsi or > > > > with_gallium_swr, > > > > +required : with_amd_vk or with_gallium_radeonsi or > > > >
Re: [Mesa-dev] [PATCH v3 3/4] meson: build clover
On Fri, 2018-01-05 at 15:26 -0800, Dylan Baker wrote: > Quoting Jan Vesely (2018-01-05 14:16:41) > > Hi, > > > > > > sorry for the delay. I was mostly traveling during the holidays. > > No worries, I was also away over the holidays and didn't look at this until > today. > > > > > On Fri, 2017-12-15 at 10:54 -0800, Dylan Baker wrote: > > > This has only been compile tested. > > > > > > v2: - Have a single option for opencl (Eric E) > > > - fix typo "tgis" -> "tgsi" (Curro) > > > - Don't add "lib" to pipe loader libraries, which matches the > > > autotools behavior > > > v3: - Remove trailing whitespace > > > - Make PIPE_SEARCH_DIR an absolute path > > > > > > cc: Curro Jerez> > > cc: Jan Vesely > > > cc: Aaron Watry > > > Signed-off-by: Dylan Baker > > > --- > > > include/meson.build | 19 > > > meson.build | 29 +- > > > meson_options.txt | 7 ++ > > > src/gallium/auxiliary/pipe-loader/meson.build | 3 +- > > > src/gallium/meson.build | 12 ++- > > > src/gallium/state_trackers/clover/meson.build | 122 > > > ++ > > > src/gallium/targets/opencl/meson.build| 73 +++ > > > src/gallium/targets/pipe-loader/meson.build | 77 > > > 8 files changed, 336 insertions(+), 6 deletions(-) > > > create mode 100644 src/gallium/state_trackers/clover/meson.build > > > create mode 100644 src/gallium/targets/opencl/meson.build > > > create mode 100644 src/gallium/targets/pipe-loader/meson.build > > > > > > diff --git a/include/meson.build b/include/meson.build > > > index e4dae91cede..a2e7ce6580e 100644 > > > --- a/include/meson.build > > > +++ b/include/meson.build > > > @@ -78,3 +78,22 @@ if with_gallium_st_nine > > > subdir : 'd3dadapter', > > >) > > > endif > > > + > > > +# Only install the headers if we are building a stand alone > > > implementation and > > > +# not an ICD enabled implementation > > > +if with_gallium_opencl and not with_opencl_icd > > > + install_headers( > > > +'CL/cl.h', > > > +'CL/cl.hpp', > > > +'CL/cl_d3d10.h', > > > +'CL/cl_d3d11.h', > > > 
+'CL/cl_dx9_media_sharing.h', > > > +'CL/cl_egl.h', > > > +'CL/cl_ext.h', > > > +'CL/cl_gl.h', > > > +'CL/cl_gl_ext.h', > > > +'CL/cl_platform.h', > > > +'CL/opencl.h', > > > +subdir: 'CL' > > > + ) > > > +endif > > > diff --git a/meson.build b/meson.build > > > index 842d441199e..74b2d5c49dc 100644 > > > --- a/meson.build > > > +++ b/meson.build > > > @@ -583,6 +583,22 @@ if with_gallium_st_nine > > >endif > > > endif > > > > > > +_opencl = get_option('gallium-opencl') > > > +if _opencl !=' disabled' > > > + if not with_gallium > > > +error('OpenCL Clover implementation requires at least one gallium > > > driver.') > > > + endif > > > + > > > + # TODO: alitvec? > > > + dep_clc = dependency('libclc') > > > + with_gallium_opencl = true > > > + with_opencl_icd = _opencl == 'icd' > > > +else > > > + dep_clc = [] > > > + with_gallium_opencl = false > > > + with_gallium_icd = false > > > +endif > > > + > > > gl_pkgconfig_c_flags = [] > > > if with_platform_x11 > > >if with_any_vk or (with_glx == 'dri' and with_dri_platform == 'drm') > > > @@ -930,7 +946,7 @@ dep_thread = dependency('threads') > > > if dep_thread.found() and host_machine.system() != 'windows' > > >pre_args += '-DHAVE_PTHREAD' > > > endif > > > -if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 # TODO: > > > clover > > > +if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 or > > > with_gallium_opencl > > >dep_elf = dependency('libelf', required : false) > > >if not dep_elf.found() > > > dep_elf = cc.find_library('elf') > > > @@ -972,12 +988,19 @@ if with_amd_vk or with_gallium_radeonsi or > > > with_gallium_r600 > > > llvm_modules += 'asmparser' > > >endif > > > endif > > > +if with_gallium_opencl > > > + llvm_modules += [ > > > +'all-targets', 'linker', 'coverage', 'instrumentation', 'ipo', > > > 'irreader', > > > +'lto', 'option', 'objcarcopts', 'profiledata', > > > + ] > > > + # TODO: optional modules > > > +endif > > > > > > _llvm = get_option('llvm') > > > if _llvm == 'auto' > > 
>dep_llvm = dependency( > > > 'llvm', version : '>= 3.9.0', modules : llvm_modules, > > > -required : with_amd_vk or with_gallium_radeonsi or with_gallium_swr, > > > +required : with_amd_vk or with_gallium_radeonsi or with_gallium_swr > > > or with_gallium_opencl, > > >) > > >with_llvm = dep_llvm.found() > > > elif _llvm == 'true' > > > @@ -1154,8 +1177,6 @@ else > > >dep_lmsensors = [] > > > endif > > > > > > -# TODO: clover > > > - > > > # TODO: gallium tests > > > > > > # TODO: various libdirs > > > diff --git
Re: [Mesa-dev] [PATCH 2/2] i965: Torch public intel_batchbuffer_emit_dword/float helpers.
Both are

Reviewed-by: Jason Ekstrand

There is a part of me that has been tempted for some time to try and make
some sort of generic batch buffer structure and share it between GL and
Vulkan. Getting this stuff right is hard and a good set of unified helpers
may help. I'm not sure how good of an idea that would be but it's a
thought. Also, not all that applicable to this patch, it just got me
thinking about it again. :-)

On January 5, 2018 20:04:47 Kenneth Graunke wrote:

intel_batchbuffer_emit_float is dead code, it should go.

intel_batchbuffer_emit_dword only had one user, which had bungled using
them by forgetting to call intel_batchbuffer_require_space first. So it
seems wise to delete these unsafe helpers.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  4 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 13 -
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 3fd8e05d3dc..a17e1699254 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -692,9 +692,9 @@ brw_finish_batch(struct brw_context *brw)
     * necessary by emitting an extra MI_NOOP after the end.
     */
    intel_batchbuffer_require_space(brw, 8, brw->batch.ring);
-   intel_batchbuffer_emit_dword(&brw->batch, MI_BATCH_BUFFER_END);
+   *brw->batch.map_next++ = MI_BATCH_BUFFER_END;
    if (USED_BATCH(brw->batch) & 1) {
-      intel_batchbuffer_emit_dword(&brw->batch, MI_NOOP);
+      *brw->batch.map_next++ = MI_NOOP;
    }
 
    brw->batch.no_wrap = false;
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index a927fe7e09e..a9a34600ad1 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -78,19 +78,6 @@ static inline uint32_t float_as_int(float f)
    return fi.d;
 }
 
-static inline void
-intel_batchbuffer_emit_dword(struct intel_batchbuffer *batch, GLuint dword)
-{
-   *batch->map_next++ = dword;
-   assert(batch->ring != UNKNOWN_RING);
-}
-
-static inline void
-intel_batchbuffer_emit_float(struct intel_batchbuffer *batch, float f)
-{
-   intel_batchbuffer_emit_dword(batch, float_as_int(f));
-}
-
 static inline void
 intel_batchbuffer_begin(struct brw_context *brw, int n, enum brw_gpu_ring ring)
 {
--
2.15.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
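
The hazard described in the commit message (writing through map_next without
reserving space first) can be illustrated with a toy batch buffer. This is a
sketch only: struct batch and the batch_* helpers are hypothetical and far
simpler than intel_batchbuffer, and where the real driver would flush or grow
the buffer, this model just reports failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_DWORDS 4 /* deliberately tiny for the example */

struct batch {
   uint32_t map[BATCH_DWORDS];
   uint32_t *map_next; /* write cursor, like batch.map_next in the patch */
};

static void
batch_init(struct batch *b)
{
   b->map_next = b->map;
}

static size_t
batch_used(const struct batch *b)
{
   return (size_t)(b->map_next - b->map);
}

/* True if `dwords` more entries fit.  A real driver would flush and start
 * a new batch here instead of failing.
 */
static bool
batch_require_space(const struct batch *b, size_t dwords)
{
   return batch_used(b) + dwords <= BATCH_DWORDS;
}

/* Reserve first, then write through the cursor.  Skipping the reservation
 * is what makes a bare "*map_next++ = x" helper unsafe.
 */
static bool
batch_emit(struct batch *b, const uint32_t *data, size_t dwords)
{
   if (!batch_require_space(b, dwords))
      return false;

   for (size_t i = 0; i < dwords; i++)
      *b->map_next++ = data[i];
   return true;
}
```

Folding the space check into the only emit entry point is one way to make the
mistake impossible; the patch instead removes the unchecked helper and keeps
the explicit require_space call at the single remaining call site.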
Re: [Mesa-dev] [PATCH 00/15] RadeonSI 32-bit GPU pointers
Hi Marek,

actually I was on the verge to remove the 32bit VM support in libdrm
because it clashes with HMM and SVM in general.

Is it possible to set the upper 32bit of the 64bit address to some fixed
value instead?

Regards,
Christian.

On 06.01.2018 at 12:12, Marek Olšák wrote:

Hi,

This series:
- increases the number of buckets in pb_cache
- adds 32-bit heaps: GTT WC, VRAM, and read-only versions of those
- adds a 32-bit VM allocator into winsys/radeon and enables 32-bit VM
  allocations in both winsyses
- moves all const_uploader allocations to 32-bit address space
- puts "amdgpu.uniform" LLVM metadata on loads instead of GEPs, so that
  InstCombine doesn't remove it
- switches shader pointers in user SGPRs to 32 bits

Dependencies:
- https://reviews.llvm.org/D41715
- https://reviews.llvm.org/D41651

This frees up to 7 user SGPRs in merged shaders, 5 user SGPRs in vertex
shaders, and 4 user SGPRs in other shaders.

Please review.

Thanks,
Marek

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 00/31] Nir support for Nouveau
On Sat, Jan 6, 2018 at 1:34 AM, Kenneth Graunke wrote:
> On Thursday, January 4, 2018 11:56:44 AM PST Jason Ekstrand wrote:
>> On January 4, 2018 12:51:15 Karol Herbst wrote:
>>
>> > On Thu, Jan 4, 2018 at 7:06 PM, Ilia Mirkin wrote:
>> >> On Thu, Jan 4, 2018 at 10:01 AM, Karol Herbst wrote:
>> >>> significant changes to last series:
>> >>> * arb_gpu_shader5 interpolateat* (those nir ops don't map well to nvir)
>> >>>   no good plan on how to properly implement those
>> >>
>> >> What's the issue? They should map as well as the TGSI ones. (Since the
>> >> TGSI ones are just the GLSL ones.)
>> >>
>> >
>> > it is a bit ugly, because usually all inputs vars are lowered away, so
>> > that they are inputs. So they need special handling;
>> >
>> > lowered (input is centroid):
>> > vec1 32 ssa_25 = intrinsic load_input (ssa_24) () (0, 0) /* base=0 */ /* component=0 */ /* packed:centroid_qualified */
>> > vec1 32 ssa_27 = intrinsic load_input (ssa_26) () (0, 1) /* base=0 */ /* component=1 */ /* packed:centroid_qualified */
>> >
>> > not lowered:
>> > decl_var INTERP_MODE_NONE vec2 in@unqualified-temp
>> > vec2 32 ssa_11 = intrinsic interp_var_at_centroid () (in@unqualified-temp) ()
>> >
>> > I kind of wished I could have a load_input intrinsic with a flag or
>> > load_input_at_centroid, so that I end up with the same code in the
>> > end.
>>
>> In i965, we use the NIR explicit input interpolation intrinsics. I'm on my
>> phone so I can't give more details easily.
>
> Setting nir_shader_compiler_options::use_interpolated_input_intrinsics
> will eliminate the need to look at variables. Instead, you'll get these
> intrinsics:
>
> - load_input (for flat shaded inputs)
> - load_interpolated_input (for non-flat shaded inputs)
> - load_barycentric_pixel
> - load_barycentric_centroid
> - load_barycentric_sample
> - load_barycentric_at_sample (+ sample ID source)
> - load_barycentric_at_offset (+ offset.xy source)
>
> The load_interpolated_input intrinsic takes an extra source, which
> should always be one of the load_barycentric_* intrinsics. That way,
> from the intrinsic, you can see exactly how to interpolate it.
>
> I highly recommend using these. They're much nicer to work with.

well I already implemented looking at the variables and this is no
issue really. What I am more concerned about are local variables used
in interp_var_at_* intrinsics and those didn't get converted with
use_interpolated_input_intrinsics. I will take a deeper look, maybe
there is some weird condition to actually convert those as well.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 14/15] ac: place amdgpu.uniform on loads instead of GEPs
From: Marek Olšák

---
 src/amd/common/ac_llvm_build.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 164f310..ed00d20 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -775,25 +775,28 @@ ac_build_indexed_store(struct ac_llvm_context *ctx,
  *                  dynamically uniform (i.e. load to an SGPR)
  * \param invariant Whether the load is invariant (no other opcodes affect it)
  */
 static LLVMValueRef
 ac_build_load_custom(struct ac_llvm_context *ctx, LLVMValueRef base_ptr,
                      LLVMValueRef index, bool uniform, bool invariant)
 {
         LLVMValueRef pointer, result;
 
         pointer = ac_build_gep0(ctx, base_ptr, index);
-        if (uniform)
+        /* This will be removed by InstCombine if index == 0. */
+        if (HAVE_LLVM < 0x0600 && uniform)
                 LLVMSetMetadata(pointer, ctx->uniform_md_kind, ctx->empty_md);
         result = LLVMBuildLoad(ctx->builder, pointer, "");
         if (invariant)
                 LLVMSetMetadata(result, ctx->invariant_load_md_kind, ctx->empty_md);
+        if (HAVE_LLVM >= 0x0600 && uniform)
+                LLVMSetMetadata(result, ctx->uniform_md_kind, ctx->empty_md);
         return result;
 }
 
 LLVMValueRef ac_build_load(struct ac_llvm_context *ctx, LLVMValueRef base_ptr,
                            LLVMValueRef index)
 {
         return ac_build_load_custom(ctx, base_ptr, index, false, false);
 }
 
 LLVMValueRef ac_build_load_invariant(struct ac_llvm_context *ctx,
--
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 15/15] radeonsi: implement 32-bit pointers in user data SGPRs
From: Marek OlšákSGPRS: 2170102 -> 2158430 (-0.54 %) VGPRS: 1645656 -> 1641516 (-0.25 %) Spilled SGPRs: 9078 -> 8810 (-2.95 %) Spilled VGPRs: 130 -> 114 (-12.31 %) Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread Code Size: 52094872 -> 52692540 (1.15 %) bytes --- src/amd/common/ac_llvm_build.c| 13 +++ src/amd/common/ac_llvm_build.h| 5 + src/gallium/drivers/radeonsi/si_descriptors.c | 10 +- src/gallium/drivers/radeonsi/si_shader.c | 115 +- src/gallium/drivers/radeonsi/si_shader.h | 23 - src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c | 6 +- 6 files changed, 122 insertions(+), 50 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index ed00d20..02d1b39 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -57,20 +57,21 @@ ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context, ctx->context = context; ctx->module = NULL; ctx->builder = NULL; ctx->voidt = LLVMVoidTypeInContext(ctx->context); ctx->i1 = LLVMInt1TypeInContext(ctx->context); ctx->i8 = LLVMInt8TypeInContext(ctx->context); ctx->i16 = LLVMIntTypeInContext(ctx->context, 16); ctx->i32 = LLVMIntTypeInContext(ctx->context, 32); ctx->i64 = LLVMIntTypeInContext(ctx->context, 64); + ctx->intptr = HAVE_32BIT_POINTERS ? 
ctx->i32 : ctx->i64; ctx->f16 = LLVMHalfTypeInContext(ctx->context); ctx->f32 = LLVMFloatTypeInContext(ctx->context); ctx->f64 = LLVMDoubleTypeInContext(ctx->context); ctx->v2i16 = LLVMVectorType(ctx->i16, 2); ctx->v2i32 = LLVMVectorType(ctx->i32, 2); ctx->v3i32 = LLVMVectorType(ctx->i32, 3); ctx->v4i32 = LLVMVectorType(ctx->i32, 4); ctx->v2f32 = LLVMVectorType(ctx->f32, 2); ctx->v4f32 = LLVMVectorType(ctx->f32, 4); ctx->v8i32 = LLVMVectorType(ctx->i32, 8); @@ -128,21 +129,24 @@ unsigned ac_get_type_size(LLVMTypeRef type) { LLVMTypeKind kind = LLVMGetTypeKind(type); switch (kind) { case LLVMIntegerTypeKind: return LLVMGetIntTypeWidth(type) / 8; case LLVMFloatTypeKind: return 4; case LLVMDoubleTypeKind: + return 8; case LLVMPointerTypeKind: + if (LLVMGetPointerAddressSpace(type) == AC_CONST_32BIT_ADDR_SPACE) + return 4; return 8; case LLVMVectorTypeKind: return LLVMGetVectorSize(type) * ac_get_type_size(LLVMGetElementType(type)); case LLVMArrayTypeKind: return LLVMGetArrayLength(type) * ac_get_type_size(LLVMGetElementType(type)); default: assert(0); return 0; @@ -2035,10 +2039,19 @@ LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx, LLVMIntEQ, src0, ctx->i32_0, ""), LLVMConstInt(ctx->i32, -1, 0), lsb, ""); } LLVMTypeRef ac_array_in_const_addr_space(LLVMTypeRef elem_type) { return LLVMPointerType(LLVMArrayType(elem_type, 0), AC_CONST_ADDR_SPACE); } + +LLVMTypeRef ac_array_in_const32_addr_space(LLVMTypeRef elem_type) +{ + if (!HAVE_32BIT_POINTERS) + return ac_array_in_const_addr_space(elem_type); + + return LLVMPointerType(LLVMArrayType(elem_type, 0), + AC_CONST_32BIT_ADDR_SPACE); +} diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h index b1c4737..5235664 100644 --- a/src/amd/common/ac_llvm_build.h +++ b/src/amd/common/ac_llvm_build.h @@ -27,36 +27,40 @@ #include #include #include "amd_family.h" #ifdef __cplusplus extern "C" { #endif +#define HAVE_32BIT_POINTERS (HAVE_LLVM >= 0x0600) + enum { AC_CONST_ADDR_SPACE = 2, /* CONST is the only 
address space that selects SMEM loads */ AC_LOCAL_ADDR_SPACE = 3, + AC_CONST_32BIT_ADDR_SPACE = 6, /* same as CONST, but the pointer type has 32 bits */ }; struct ac_llvm_context { LLVMContextRef context; LLVMModuleRef module; LLVMBuilderRef builder; LLVMTypeRef voidt; LLVMTypeRef i1; LLVMTypeRef i8; LLVMTypeRef i16; LLVMTypeRef i32; LLVMTypeRef i64; + LLVMTypeRef intptr; LLVMTypeRef f16; LLVMTypeRef f32; LLVMTypeRef f64; LLVMTypeRef v2i16; LLVMTypeRef v2i32; LLVMTypeRef v3i32; LLVMTypeRef v4i32; LLVMTypeRef v2f32; LLVMTypeRef v4f32; LLVMTypeRef v8i32; @@ -331,16 +335,17 @@ void ac_declare_lds_as_pointer(struct ac_llvm_context *ac); LLVMValueRef ac_lds_load(struct ac_llvm_context
[Mesa-dev] [PATCH 13/15] ac: rename and move si_const_array into common code
From: Marek Olšák--- src/amd/common/ac_llvm_build.c| 6 ++ src/amd/common/ac_llvm_build.h| 3 +++ src/amd/common/ac_nir_to_llvm.c | 20 +++- src/gallium/drivers/radeonsi/si_shader.c | 19 ++- src/gallium/drivers/radeonsi/si_shader_internal.h | 2 -- src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c | 6 +++--- 6 files changed, 25 insertions(+), 31 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index a3af204..164f310 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -2026,10 +2026,16 @@ LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx, params, 2, AC_FUNC_ATTR_READNONE); /* TODO: We need an intrinsic to skip this conditional. */ /* Check for zero: */ return LLVMBuildSelect(ctx->builder, LLVMBuildICmp(ctx->builder, LLVMIntEQ, src0, ctx->i32_0, ""), LLVMConstInt(ctx->i32, -1, 0), lsb, ""); } + +LLVMTypeRef ac_array_in_const_addr_space(LLVMTypeRef elem_type) +{ + return LLVMPointerType(LLVMArrayType(elem_type, 0), + AC_CONST_ADDR_SPACE); +} diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h index 2d6efb5..b1c4737 100644 --- a/src/amd/common/ac_llvm_build.h +++ b/src/amd/common/ac_llvm_build.h @@ -329,15 +329,18 @@ void ac_init_exec_full_mask(struct ac_llvm_context *ctx); void ac_declare_lds_as_pointer(struct ac_llvm_context *ac); LLVMValueRef ac_lds_load(struct ac_llvm_context *ctx, LLVMValueRef dw_addr); void ac_lds_store(struct ac_llvm_context *ctx, LLVMValueRef dw_addr, LLVMValueRef value); LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx, LLVMTypeRef dst_type, LLVMValueRef src0); + +LLVMTypeRef ac_array_in_const_addr_space(LLVMTypeRef elem_type); + #ifdef __cplusplus } #endif #endif diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index 0445d27..bc5b140 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -344,26 +344,20 @@ create_llvm_function(LLVMContextRef ctx, LLVMModuleRef module, 
LLVMAddTargetDependentFunctionAttr(main_function, "no-nans-fp-math", "true"); LLVMAddTargetDependentFunctionAttr(main_function, "unsafe-fp-math", "true"); } return main_function; } -static LLVMTypeRef const_array(LLVMTypeRef elem_type, int num_elements) -{ - return LLVMPointerType(LLVMArrayType(elem_type, num_elements), - AC_CONST_ADDR_SPACE); -} - static int get_elem_bits(struct ac_llvm_context *ctx, LLVMTypeRef type) { if (LLVMGetTypeKind(type) == LLVMVectorTypeKind) type = LLVMGetElementType(type); if (LLVMGetTypeKind(type) == LLVMIntegerTypeKind) return LLVMGetIntTypeWidth(type); if (type == ctx->f16) return 16; @@ -606,58 +600,58 @@ static void allocate_user_sgprs(struct nir_to_llvm_context *ctx, static void declare_global_input_sgprs(struct nir_to_llvm_context *ctx, gl_shader_stage stage, bool has_previous_stage, gl_shader_stage previous_stage, const struct user_sgpr_info *user_sgpr_info, struct arg_info *args, LLVMValueRef *desc_sets) { - LLVMTypeRef type = const_array(ctx->ac.i8, 1024 * 1024); + LLVMTypeRef type = ac_array_in_const_addr_space(ctx->ac.i8); unsigned num_sets = ctx->options->layout ? ctx->options->layout->num_sets : 0; unsigned stage_mask = 1 << stage; if (has_previous_stage) stage_mask |= 1 << previous_stage; /* 1 for each descriptor set */ if (!user_sgpr_info->indirect_all_descriptor_sets) { for (unsigned i = 0; i < num_sets; ++i) { if (ctx->options->layout->set[i].layout->shader_stages & stage_mask) { add_array_arg(args, type, >descriptor_sets[i]); } } } else { - add_array_arg(args, const_array(type, 32), desc_sets); + add_array_arg(args, ac_array_in_const_addr_space(type),
[Mesa-dev] [PATCH 10/15] radeonsi: disallow constant buffers with a 64-bit address in slot 0
From: Marek Olšák

State trackers must use a user buffer or const_uploader, or set
pipe_resource::flags same as const_uploader->flags.
---
 src/gallium/drivers/radeonsi/si_descriptors.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
index 17115e1..b372090 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -1207,20 +1207,26 @@ void si_set_rw_buffer(struct si_context *sctx,
 
 static void si_pipe_set_constant_buffer(struct pipe_context *ctx,
 					enum pipe_shader_type shader, uint slot,
 					const struct pipe_constant_buffer *input)
 {
 	struct si_context *sctx = (struct si_context *)ctx;
 
 	if (shader >= SI_NUM_SHADERS)
 		return;
 
+	if (slot == 0 && input && input->buffer &&
+	    !(r600_resource(input->buffer)->flags & RADEON_FLAG_32BIT)) {
+		assert(!"constant buffer 0 must have a 32-bit VM address, use const_uploader");
+		return;
+	}
+
 	slot = si_get_constbuf_slot(slot);
 	si_set_constant_buffer(sctx, &sctx->const_and_shader_buffers[shader],
 			       si_const_and_shader_buffer_descriptors_idx(shader),
 			       slot, input);
 }
 
 void si_get_pipe_constant_buffer(struct si_context *sctx, uint shader,
 				 uint slot, struct pipe_constant_buffer *cbuf)
 {
 	cbuf->user_buffer = NULL;
-- 
2.7.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
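The rule this patch enforces can be tried out without the driver. The sketch below is a rough model only: `toy_buffer`, `toy_constant_buffer`, `TOY_FLAG_32BIT` and `toy_constbuf_allowed` are hypothetical names, and the flag value is made up (the real `RADEON_FLAG_32BIT` is defined in radeon_winsys.h). It returns a bool where the patch asserts and returns early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; not the real RADEON_FLAG_32BIT value. */
#define TOY_FLAG_32BIT (1u << 0)

/* Hypothetical stand-in for a winsys-backed resource. */
struct toy_buffer {
    unsigned flags;
};

/* Hypothetical stand-in for pipe_constant_buffer. */
struct toy_constant_buffer {
    const struct toy_buffer *buffer; /* NULL when a user buffer is used */
    const void *user_buffer;
};

/* Mirrors the condition added to si_pipe_set_constant_buffer: only slot 0
 * with a real buffer that lacks the 32-bit flag is rejected. */
static bool toy_constbuf_allowed(unsigned slot,
                                 const struct toy_constant_buffer *input)
{
    if (slot == 0 && input && input->buffer &&
        !(input->buffer->flags & TOY_FLAG_32BIT))
        return false;

    return true;
}
```

Note that unbinding (a NULL input) and user buffers still pass the check, which matches the commit message: a user buffer is uploaded by the driver, so its address is under the driver's control.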
[Mesa-dev] [PATCH 05/15] gallium/radeon: set number of pb_cache buckets = number of heaps
From: Marek Olšák--- src/gallium/drivers/radeon/radeon_winsys.h| 24 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 27 +-- src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 2 +- src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 27 ++- src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 2 +- 5 files changed, 25 insertions(+), 57 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h index 9f274b4..7914170 100644 --- a/src/gallium/drivers/radeon/radeon_winsys.h +++ b/src/gallium/drivers/radeon/radeon_winsys.h @@ -716,44 +716,20 @@ static inline unsigned radeon_flags_from_heap(enum radeon_heap heap) RADEON_FLAG_32BIT; case RADEON_HEAP_VRAM: case RADEON_HEAP_GTT_WC: case RADEON_HEAP_GTT: default: return flags; } } -/* The pb cache bucket is chosen to minimize pb_cache misses. - * It must be between 0 and 3 inclusive. - */ -static inline unsigned radeon_get_pb_cache_bucket_index(enum radeon_heap heap) -{ -switch (heap) { -case RADEON_HEAP_VRAM_NO_CPU_ACCESS: -return 0; -case RADEON_HEAP_VRAM_READ_ONLY: -case RADEON_HEAP_VRAM_READ_ONLY_32BIT: -case RADEON_HEAP_VRAM_32BIT: -case RADEON_HEAP_VRAM: -return 1; -case RADEON_HEAP_GTT_WC: -case RADEON_HEAP_GTT_WC_READ_ONLY: -case RADEON_HEAP_GTT_WC_READ_ONLY_32BIT: -case RADEON_HEAP_GTT_WC_32BIT: -return 2; -case RADEON_HEAP_GTT: -default: -return 3; -} -} - /* Return the heap index for winsys allocators, or -1 on failure. */ static inline int radeon_get_heap_index(enum radeon_bo_domain domain, enum radeon_bo_flag flags) { /* VRAM implies WC (write combining) */ assert(!(domain & RADEON_DOMAIN_VRAM) || flags & RADEON_FLAG_GTT_WC); /* NO_CPU_ACCESS implies VRAM only. */ assert(!(flags & RADEON_FLAG_NO_CPU_ACCESS) || domain == RADEON_DOMAIN_VRAM); /* Resources with interprocess sharing don't use any winsys allocators. 
*/ diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c index 92c314e..5d565ff 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c @@ -366,43 +366,44 @@ static void amdgpu_add_buffer_to_global_list(struct amdgpu_winsys_bo *bo) simple_mtx_lock(>global_bo_list_lock); LIST_ADDTAIL(>u.real.global_list_item, >global_bo_list); ws->num_buffers++; simple_mtx_unlock(>global_bo_list_lock); } } static struct amdgpu_winsys_bo *amdgpu_create_bo(struct amdgpu_winsys *ws, uint64_t size, unsigned alignment, - unsigned usage, enum radeon_bo_domain initial_domain, unsigned flags, - unsigned pb_cache_bucket) + int heap) { struct amdgpu_bo_alloc_request request = {0}; amdgpu_bo_handle buf_handle; uint64_t va = 0; struct amdgpu_winsys_bo *bo; amdgpu_va_handle va_handle; unsigned va_gap_size; int r; /* VRAM or GTT must be specified, but not both at the same time. */ assert(util_bitcount(initial_domain & RADEON_DOMAIN_VRAM_GTT) == 1); bo = CALLOC_STRUCT(amdgpu_winsys_bo); if (!bo) { return NULL; } - pb_cache_init_entry(>bo_cache, >u.real.cache_entry, >base, - pb_cache_bucket); + if (heap >= 0) { + pb_cache_init_entry(>bo_cache, >u.real.cache_entry, >base, + heap); + } request.alloc_size = size; request.phys_alignment = alignment; if (initial_domain & RADEON_DOMAIN_VRAM) request.preferred_heap |= AMDGPU_GEM_DOMAIN_VRAM; if (initial_domain & RADEON_DOMAIN_GTT) request.preferred_heap |= AMDGPU_GEM_DOMAIN_GTT; /* If VRAM is just stolen system memory, allow both VRAM and * GTT, whichever has free space. 
If a buffer is evicted from @@ -446,21 +447,21 @@ static struct amdgpu_winsys_bo *amdgpu_create_bo(struct amdgpu_winsys *ws, if (!(flags & RADEON_FLAG_READ_ONLY)) vm_flags |= AMDGPU_VM_PAGE_WRITEABLE; r = amdgpu_bo_va_op_raw(ws->dev, buf_handle, 0, size, va, vm_flags, AMDGPU_VA_OP_MAP); if (r) goto error_va_map; pipe_reference_init(>base.reference, 1); bo->base.alignment = alignment; - bo->base.usage = usage; + bo->base.usage = 0; bo->base.size = size; bo->base.vtbl = _winsys_bo_vtbl; bo->ws = ws; bo->bo = buf_handle; bo->va = va; bo->u.real.va_handle = va_handle; bo->initial_domain
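The heap-as-bucket idea in this patch (together with the `num_heaps` assertions from patch 04/15) can be modelled in a few lines. This is a sketch under assumptions, not the pb_cache code: `toy_cache`, `toy_bo`, `TOY_MAX_HEAPS` and the helper functions are hypothetical, and the reading that a negative heap index means "this buffer is never put in the cache" is my interpretation of the `if (heap >= 0)` guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HEAPS 16 /* hypothetical upper bound, not from the patch */

/* Hypothetical stand-in for pb_cache: one bucket per heap. */
struct toy_cache {
    unsigned num_heaps;
    unsigned entries_per_bucket[TOY_MAX_HEAPS];
};

/* Hypothetical stand-in for a buffer object with a cache entry. */
struct toy_bo {
    struct toy_cache *mgr; /* NULL when the buffer bypasses the cache */
    unsigned bucket_index;
};

static void toy_cache_init(struct toy_cache *cache, unsigned num_heaps)
{
    assert(num_heaps <= TOY_MAX_HEAPS);
    cache->num_heaps = num_heaps;
    for (unsigned i = 0; i < TOY_MAX_HEAPS; i++)
        cache->entries_per_bucket[i] = 0;
}

/* Like pb_cache_init_entry: the bucket must exist. */
static void toy_cache_init_entry(struct toy_cache *cache, struct toy_bo *bo,
                                 unsigned bucket_index)
{
    assert(bucket_index < cache->num_heaps);
    bo->mgr = cache;
    bo->bucket_index = bucket_index;
}

/* Like the patched amdgpu_create_bo: the heap index doubles as the bucket
 * index, and a negative heap skips cache-entry setup entirely. */
static void toy_create_bo(struct toy_cache *cache, struct toy_bo *bo, int heap)
{
    bo->mgr = NULL;
    bo->bucket_index = 0;
    if (heap >= 0)
        toy_cache_init_entry(cache, bo, (unsigned)heap);
}

/* Returns true if the buffer was returned to its bucket. */
static bool toy_cache_add(struct toy_bo *bo)
{
    if (!bo->mgr)
        return false;
    bo->mgr->entries_per_bucket[bo->bucket_index]++;
    return true;
}
```

With one bucket per heap, the hand-written `radeon_get_pb_cache_bucket_index` mapping removed above is no longer needed, and buffers from different heaps (including the new 32-bit ones) never share a bucket.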
[Mesa-dev] [PATCH 06/15] winsys/amdgpu: enable 32-bit VM allocations
From: Marek Olšák

---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
index 5d565ff..8ce131c 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -430,21 +430,22 @@ static struct amdgpu_winsys_bo *amdgpu_create_bo(struct amdgpu_winsys *ws,
       fprintf(stderr, "amdgpu:size : %"PRIu64" bytes\n", size);
       fprintf(stderr, "amdgpu:alignment : %u bytes\n", alignment);
       fprintf(stderr, "amdgpu:domains : %u\n", initial_domain);
       goto error_bo_alloc;
    }
 
    va_gap_size = ws->check_vm ? MAX2(4 * alignment, 64 * 1024) : 0;
    if (size > ws->info.pte_fragment_size)
       alignment = MAX2(alignment, ws->info.pte_fragment_size);
    r = amdgpu_va_range_alloc(ws->dev, amdgpu_gpu_va_range_general,
-                             size + va_gap_size, alignment, 0, &va, &va_handle, 0);
+                             size + va_gap_size, alignment, 0, &va, &va_handle,
+                             flags & RADEON_FLAG_32BIT ? AMDGPU_VA_RANGE_32_BIT : 0);
    if (r)
       goto error_va_alloc;
 
    unsigned vm_flags = AMDGPU_VM_PAGE_READABLE |
                        AMDGPU_VM_PAGE_EXECUTABLE;
 
    if (!(flags & RADEON_FLAG_READ_ONLY))
       vm_flags |= AMDGPU_VM_PAGE_WRITEABLE;
 
    r = amdgpu_bo_va_op_raw(ws->dev, buf_handle, 0, size, va, vm_flags,
-- 
2.7.4
[Mesa-dev] [PATCH 11/15] ac: don't use byval LLVM qualifier in shaders
From: Marek Olšákshader-db doesn't show any regression and 32-bit pointers with byval are declared as VGPRs for some reason. --- src/amd/common/ac_llvm_helper.cpp | 3 +-- src/amd/common/ac_llvm_util.c | 2 -- src/amd/common/ac_llvm_util.h | 1 - src/amd/common/ac_nir_to_llvm.c | 6 ++ src/gallium/auxiliary/gallivm/lp_bld_intr.c | 2 -- src/gallium/auxiliary/gallivm/lp_bld_intr.h | 1 - src/gallium/drivers/radeonsi/si_shader.c| 17 + 7 files changed, 8 insertions(+), 24 deletions(-) diff --git a/src/amd/common/ac_llvm_helper.cpp b/src/amd/common/ac_llvm_helper.cpp index 4db7036..54562cc 100644 --- a/src/amd/common/ac_llvm_helper.cpp +++ b/src/amd/common/ac_llvm_helper.cpp @@ -52,22 +52,21 @@ void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes) #else A->addAttr(llvm::Attribute::getWithDereferenceableBytes(A->getContext(), bytes)); #endif } bool ac_is_sgpr_param(LLVMValueRef arg) { llvm::Argument *A = llvm::unwrap(arg); llvm::AttributeList AS = A->getParent()->getAttributes(); unsigned ArgNo = A->getArgNo(); - return AS.hasAttribute(ArgNo + 1, llvm::Attribute::ByVal) || - AS.hasAttribute(ArgNo + 1, llvm::Attribute::InReg); + return AS.hasAttribute(ArgNo + 1, llvm::Attribute::InReg); } LLVMValueRef ac_llvm_get_called_value(LLVMValueRef call) { #if HAVE_LLVM >= 0x0309 return LLVMGetCalledValue(call); #else return llvm::wrap(llvm::CallSite(llvm::unwrap(call)).getCalledValue()); #endif } diff --git a/src/amd/common/ac_llvm_util.c b/src/amd/common/ac_llvm_util.c index 429904c..5fd785a 100644 --- a/src/amd/common/ac_llvm_util.c +++ b/src/amd/common/ac_llvm_util.c @@ -145,39 +145,37 @@ LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family, enum ac return tm; } #if HAVE_LLVM < 0x0400 static LLVMAttribute ac_attr_to_llvm_attr(enum ac_func_attr attr) { switch (attr) { case AC_FUNC_ATTR_ALWAYSINLINE: return LLVMAlwaysInlineAttribute; - case AC_FUNC_ATTR_BYVAL: return LLVMByValAttribute; case AC_FUNC_ATTR_INREG: return LLVMInRegAttribute; case 
AC_FUNC_ATTR_NOALIAS: return LLVMNoAliasAttribute; case AC_FUNC_ATTR_NOUNWIND: return LLVMNoUnwindAttribute; case AC_FUNC_ATTR_READNONE: return LLVMReadNoneAttribute; case AC_FUNC_ATTR_READONLY: return LLVMReadOnlyAttribute; default: fprintf(stderr, "Unhandled function attribute: %x\n", attr); return 0; } } #else static const char *attr_to_str(enum ac_func_attr attr) { switch (attr) { case AC_FUNC_ATTR_ALWAYSINLINE: return "alwaysinline"; - case AC_FUNC_ATTR_BYVAL: return "byval"; case AC_FUNC_ATTR_INREG: return "inreg"; case AC_FUNC_ATTR_NOALIAS: return "noalias"; case AC_FUNC_ATTR_NOUNWIND: return "nounwind"; case AC_FUNC_ATTR_READNONE: return "readnone"; case AC_FUNC_ATTR_READONLY: return "readonly"; case AC_FUNC_ATTR_WRITEONLY: return "writeonly"; case AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY: return "inaccessiblememonly"; case AC_FUNC_ATTR_CONVERGENT: return "convergent"; default: fprintf(stderr, "Unhandled function attribute: %x\n", attr); diff --git a/src/amd/common/ac_llvm_util.h b/src/amd/common/ac_llvm_util.h index 7c8b6b0..26b0959 100644 --- a/src/amd/common/ac_llvm_util.h +++ b/src/amd/common/ac_llvm_util.h @@ -30,21 +30,20 @@ #include #include "amd_family.h" #ifdef __cplusplus extern "C" { #endif enum ac_func_attr { AC_FUNC_ATTR_ALWAYSINLINE = (1 << 0), - AC_FUNC_ATTR_BYVAL= (1 << 1), AC_FUNC_ATTR_INREG= (1 << 2), AC_FUNC_ATTR_NOALIAS = (1 << 3), AC_FUNC_ATTR_NOUNWIND = (1 << 4), AC_FUNC_ATTR_READNONE = (1 << 5), AC_FUNC_ATTR_READONLY = (1 << 6), AC_FUNC_ATTR_WRITEONLY= HAVE_LLVM >= 0x0400 ? (1 << 7) : 0, AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY = HAVE_LLVM >= 0x0400 ? (1 << 8) : 0, AC_FUNC_ATTR_CONVERGENT = HAVE_LLVM >= 0x0400 ? 
(1 << 9) : 0, /* Legacy intrinsic that needs attributes on function declarations diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index 48e2920..187fdfb 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -316,28 +316,26 @@ create_llvm_function(LLVMContextRef ctx, LLVMModuleRef module, main_function_type = LLVMFunctionType(ret_type, args->types, args->count, 0); LLVMValueRef main_function = LLVMAddFunction(module, "main", main_function_type); main_function_body = LLVMAppendBasicBlockInContext(ctx, main_function, "main_body"); LLVMPositionBuilderAtEnd(builder, main_function_body); LLVMSetFunctionCallConv(main_function, RADEON_LLVM_AMDGPU_CS); for (unsigned i = 0; i < args->sgpr_count; ++i) { +
[Mesa-dev] [PATCH 12/15] ac: move address space definitions to common code
From: Marek Olšák--- src/amd/common/ac_llvm_build.h | 1 + src/amd/common/ac_nir_to_llvm.c | 9 +++-- src/gallium/drivers/radeonsi/si_shader.c | 11 +++ 3 files changed, 7 insertions(+), 14 deletions(-) diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h index 5d39458..2d6efb5 100644 --- a/src/amd/common/ac_llvm_build.h +++ b/src/amd/common/ac_llvm_build.h @@ -28,20 +28,21 @@ #include #include #include "amd_family.h" #ifdef __cplusplus extern "C" { #endif enum { + AC_CONST_ADDR_SPACE = 2, /* CONST is the only address space that selects SMEM loads */ AC_LOCAL_ADDR_SPACE = 3, }; struct ac_llvm_context { LLVMContextRef context; LLVMModuleRef module; LLVMBuilderRef builder; LLVMTypeRef voidt; LLVMTypeRef i1; diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index 187fdfb..0445d27 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -36,23 +36,20 @@ #include "ac_exp_param.h" enum radeon_llvm_calling_convention { RADEON_LLVM_AMDGPU_VS = 87, RADEON_LLVM_AMDGPU_GS = 88, RADEON_LLVM_AMDGPU_PS = 89, RADEON_LLVM_AMDGPU_CS = 90, RADEON_LLVM_AMDGPU_HS = 93, }; -#define CONST_ADDR_SPACE 2 -#define LOCAL_ADDR_SPACE 3 - #define RADEON_LLVM_MAX_INPUTS (VARYING_SLOT_VAR31 + 1) #define RADEON_LLVM_MAX_OUTPUTS (VARYING_SLOT_VAR31 + 1) struct nir_to_llvm_context; struct ac_nir_context { struct ac_llvm_context ac; struct ac_shader_abi *abi; gl_shader_stage stage; @@ -350,21 +347,21 @@ create_llvm_function(LLVMContextRef ctx, LLVMModuleRef module, LLVMAddTargetDependentFunctionAttr(main_function, "unsafe-fp-math", "true"); } return main_function; } static LLVMTypeRef const_array(LLVMTypeRef elem_type, int num_elements) { return LLVMPointerType(LLVMArrayType(elem_type, num_elements), - CONST_ADDR_SPACE); + AC_CONST_ADDR_SPACE); } static int get_elem_bits(struct ac_llvm_context *ctx, LLVMTypeRef type) { if (LLVMGetTypeKind(type) == LLVMVectorTypeKind) type = LLVMGetElementType(type); if 
(LLVMGetTypeKind(type) == LLVMIntegerTypeKind) return LLVMGetIntTypeWidth(type); @@ -1036,21 +1033,21 @@ static void create_function(struct nir_to_llvm_context *ctx, assign_arguments(ctx->main_function, ); user_sgpr_idx = 0; if (ctx->options->supports_spill || user_sgpr_info.need_ring_offsets) { set_loc_shader(ctx, AC_UD_SCRATCH_RING_OFFSETS, _sgpr_idx, 2); if (ctx->options->supports_spill) { ctx->ring_offsets = ac_build_intrinsic(>ac, "llvm.amdgcn.implicit.buffer.ptr", - LLVMPointerType(ctx->ac.i8, CONST_ADDR_SPACE), + LLVMPointerType(ctx->ac.i8, AC_CONST_ADDR_SPACE), NULL, 0, AC_FUNC_ATTR_READNONE); ctx->ring_offsets = LLVMBuildBitCast(ctx->builder, ctx->ring_offsets, const_array(ctx->ac.v4i32, 16), ""); } } /* For merged shaders the user SGPRs start at 8, with 8 system SGPRs in front (including * the rw_buffers at s0/s1. With user SGPR0 = s8, lets restart the count from 0 */ if (has_previous_stage) user_sgpr_idx = 0; @@ -5564,21 +5561,21 @@ setup_locals(struct ac_nir_context *ctx, static void setup_shared(struct ac_nir_context *ctx, struct nir_shader *nir) { nir_foreach_variable(variable, >shared) { LLVMValueRef shared = LLVMAddGlobalInAddressSpace( ctx->ac.module, glsl_to_llvm_type(ctx->nctx, variable->type), variable->name ? variable->name : "", - LOCAL_ADDR_SPACE); + AC_LOCAL_ADDR_SPACE); _mesa_hash_table_insert(ctx->vars, variable, shared); } } static LLVMValueRef emit_float_saturate(struct ac_llvm_context *ctx, LLVMValueRef v, float lo, float hi) { v = ac_to_float(ctx, v); v = emit_intrin_2f_param(ctx, "llvm.maxnum", ctx->f32, v, LLVMConstReal(ctx->f32, lo)); return emit_intrin_2f_param(ctx, "llvm.minnum", ctx->f32, v, LLVMConstReal(ctx->f32, hi)); diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 708da13..a1cc6e1 100644 ---
[Mesa-dev] [PATCH 08/15] winsys/radeon: implement and enable 32-bit VM allocations
From: Marek Olšák--- src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 42 +++ src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 28 ++- src/gallium/winsys/radeon/drm/radeon_drm_winsys.h | 2 ++ 3 files changed, 64 insertions(+), 8 deletions(-) diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c index bbfe5cc..06842a4 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c @@ -242,32 +242,54 @@ static uint64_t radeon_bomgr_find_va(const struct radeon_info *info, if ((hole->size - waste) == size) { hole->size = waste; mtx_unlock(>mutex); return offset; } } offset = heap->start; waste = offset % alignment; waste = waste ? alignment - waste : 0; + +if (offset + waste + size > heap->end) { +mtx_unlock(>mutex); +return 0; +} + if (waste) { n = CALLOC_STRUCT(radeon_bo_va_hole); n->size = waste; n->offset = offset; list_add(>list, >holes); } offset += waste; heap->start += size + waste; mtx_unlock(>mutex); return offset; } +static uint64_t radeon_bomgr_find_va64(struct radeon_drm_winsys *ws, + uint64_t size, uint64_t alignment) +{ +uint64_t va = 0; + +/* Try to allocate from the 64-bit address space first. + * If it doesn't exist (start = 0) or if it doesn't have enough space, + * fall back to the 32-bit address space. 
+ */ +if (ws->vm64.start) +va = radeon_bomgr_find_va(>info, >vm64, size, alignment); +if (!va) +va = radeon_bomgr_find_va(>info, >vm32, size, alignment); +return va; +} + static void radeon_bomgr_free_va(const struct radeon_info *info, struct radeon_vm_heap *heap, uint64_t va, uint64_t size) { struct radeon_bo_va_hole *hole = NULL; size = align(size, info->gart_page_size); mtx_lock(>mutex); if ((va + size) == heap->start) { @@ -363,21 +385,23 @@ void radeon_bo_destroy(struct pb_buffer *_buf) if (drmCommandWriteRead(rws->fd, DRM_RADEON_GEM_VA, , sizeof(va)) != 0 && va.operation == RADEON_VA_RESULT_ERROR) { fprintf(stderr, "radeon: Failed to deallocate virtual address for buffer:\n"); fprintf(stderr, "radeon:size : %"PRIu64" bytes\n", bo->base.size); fprintf(stderr, "radeon:va: 0x%"PRIx64"\n", bo->va); } } - radeon_bomgr_free_va(>info, >vm64, bo->va, bo->base.size); + radeon_bomgr_free_va(>info, + bo->va < rws->vm32.end ? >vm32 : >vm64, + bo->va, bo->base.size); } /* Close object. */ args.handle = bo->handle; drmIoctl(rws->fd, DRM_IOCTL_GEM_CLOSE, ); mtx_destroy(>u.real.map_mutex); if (bo->initial_domain & RADEON_DOMAIN_VRAM) rws->allocated_vram -= align(bo->base.size, rws->info.gart_page_size); @@ -653,22 +677,28 @@ static struct radeon_bo *radeon_create_bo(struct radeon_drm_winsys *rws, if (heap >= 0) { pb_cache_init_entry(>bo_cache, >u.real.cache_entry, >base, heap); } if (rws->info.has_virtual_memory) { struct drm_radeon_gem_va va; unsigned va_gap_size; va_gap_size = rws->check_vm ? 
MAX2(4 * alignment, 64 * 1024) : 0; -bo->va = radeon_bomgr_find_va(>info, >vm64, - size + va_gap_size, alignment); + +if (flags & RADEON_FLAG_32BIT) { +bo->va = radeon_bomgr_find_va(>info, >vm32, + size + va_gap_size, alignment); +assert(bo->va + size < rws->vm32.end); +} else { +bo->va = radeon_bomgr_find_va64(rws, size + va_gap_size, alignment); +} va.handle = bo->handle; va.vm_id = 0; va.operation = RADEON_VA_MAP; va.flags = RADEON_VM_PAGE_READABLE | RADEON_VM_PAGE_WRITEABLE | RADEON_VM_PAGE_SNOOPED; va.offset = bo->va; r = drmCommandWriteRead(rws->fd, DRM_RADEON_GEM_VA, , sizeof(va)); if (r && va.operation == RADEON_VA_RESULT_ERROR) { @@ -1055,22 +1085,21 @@ static struct pb_buffer *radeon_winsys_bo_from_ptr(struct radeon_winsys *rws, bo->hash = __sync_fetch_and_add(>next_bo_hash, 1); (void) mtx_init(>u.real.map_mutex, mtx_plain); util_hash_table_set(ws->bo_handles, (void*)(uintptr_t)bo->handle, bo); mtx_unlock(>bo_handles_mutex); if (ws->info.has_virtual_memory) { struct drm_radeon_gem_va va; -bo->va = radeon_bomgr_find_va(>info, >vm64, -
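The allocation policy in this patch is easier to follow with the hole list and locking stripped away. The sketch below is a simplified model, assuming only the bump-allocation path: `toy_vm_heap`, `toy_winsys` and the `toy_*` functions are hypothetical names, there is no mutex, and freed ranges are not tracked. What it keeps from the patch is the new end-of-heap check, the 64-bit-first fallback of `radeon_bomgr_find_va64`, and the address test that `radeon_bo_destroy` uses to pick the heap to free into.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down radeon_vm_heap: no hole list, no mutex. */
struct toy_vm_heap {
    uint64_t start; /* next unallocated address */
    uint64_t end;   /* one past the last usable address */
};

/* Hypothetical, trimmed-down winsys holding both address ranges. */
struct toy_winsys {
    struct toy_vm_heap vm32;
    struct toy_vm_heap vm64;
};

/* Bump allocation with alignment padding. Returns 0 when the heap is
 * exhausted, which is the check the patch adds to radeon_bomgr_find_va. */
static uint64_t toy_find_va(struct toy_vm_heap *heap,
                            uint64_t size, uint64_t alignment)
{
    uint64_t offset = heap->start;
    uint64_t waste = offset % alignment;

    waste = waste ? alignment - waste : 0;

    if (offset + waste + size > heap->end)
        return 0;

    offset += waste;
    heap->start += size + waste;
    return offset;
}

/* Try the 64-bit range first; fall back to the 32-bit range if the 64-bit
 * range does not exist (start == 0) or has no room left. */
static uint64_t toy_find_va64(struct toy_winsys *ws,
                              uint64_t size, uint64_t alignment)
{
    uint64_t va = 0;

    if (ws->vm64.start)
        va = toy_find_va(&ws->vm64, size, alignment);
    if (!va)
        va = toy_find_va(&ws->vm32, size, alignment);
    return va;
}

/* On free, the owning heap is derived from the address alone. */
static struct toy_vm_heap *toy_heap_for_va(struct toy_winsys *ws, uint64_t va)
{
    return va < ws->vm32.end ? &ws->vm32 : &ws->vm64;
}
```

Because ordinary buffers can end up in the 32-bit range through the fallback, the heap cannot be remembered from the allocation flags alone; deriving it from `va < vm32.end` at free time, as the patch does, works for both paths.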
[Mesa-dev] [PATCH 09/15] radeonsi: move const_uploader allocations to 32-bit address space
From: Marek Olšák

---
 src/gallium/drivers/radeon/r600_buffer_common.c | 3 +++
 src/gallium/drivers/radeon/r600_pipe_common.c   | 5 +++--
 src/gallium/drivers/radeon/r600_pipe_common.h   | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index aca536d..2d64eed 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -170,20 +170,23 @@ void si_init_resource_fields(struct si_screen *sscreen,
 		res->flags |= RADEON_FLAG_NO_SUBALLOC; /* shareable */
 	else
 		res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
 
 	if (sscreen->debug_flags & DBG(NO_WC))
 		res->flags &= ~RADEON_FLAG_GTT_WC;
 
 	if (res->b.b.flags & R600_RESOURCE_FLAG_READ_ONLY)
 		res->flags |= RADEON_FLAG_READ_ONLY;
 
+	if (res->b.b.flags & R600_RESOURCE_FLAG_32BIT)
+		res->flags |= RADEON_FLAG_32BIT;
+
 	/* Set expected VRAM and GART usage for the buffer. */
 	res->vram_usage = 0;
 	res->gart_usage = 0;
 	res->max_forced_staging_uploads = 0;
 	res->b.max_forced_staging_uploads = 0;
 
 	if (res->domains & RADEON_DOMAIN_VRAM) {
 		res->vram_usage = size;
 
 		res->max_forced_staging_uploads =
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index 9e45a9f..d46cb64 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -445,22 +445,23 @@ bool si_common_context_init(struct r600_common_context *rctx,
 		return false;
 
 	rctx->b.stream_uploader = u_upload_create(&rctx->b, 1024 * 1024,
 						  0, PIPE_USAGE_STREAM,
 						  R600_RESOURCE_FLAG_READ_ONLY);
 	if (!rctx->b.stream_uploader)
 		return false;
 
 	rctx->b.const_uploader = u_upload_create(&rctx->b, 128 * 1024,
 						 0, PIPE_USAGE_DEFAULT,
-						 sscreen->cpdma_prefetch_writes_memory ?
-							0 : R600_RESOURCE_FLAG_READ_ONLY);
+						 R600_RESOURCE_FLAG_32BIT |
+						 (sscreen->cpdma_prefetch_writes_memory ?
+							0 : R600_RESOURCE_FLAG_READ_ONLY));
 	if (!rctx->b.const_uploader)
 		return false;
 
 	rctx->cached_gtt_allocator = u_upload_create(&rctx->b, 16 * 1024,
 						     0, PIPE_USAGE_STAGING, 0);
 	if (!rctx->cached_gtt_allocator)
 		return false;
 
 	rctx->ctx = rctx->ws->ctx_create(rctx->ws);
 	if (!rctx->ctx)
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
index a8e632c..fcba228 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -47,20 +47,21 @@ struct u_log_context;
 
 struct si_screen;
 struct si_context;
 
 #define R600_RESOURCE_FLAG_TRANSFER		(PIPE_RESOURCE_FLAG_DRV_PRIV << 0)
 #define R600_RESOURCE_FLAG_FLUSHED_DEPTH	(PIPE_RESOURCE_FLAG_DRV_PRIV << 1)
 #define R600_RESOURCE_FLAG_FORCE_TILING		(PIPE_RESOURCE_FLAG_DRV_PRIV << 2)
 #define R600_RESOURCE_FLAG_DISABLE_DCC		(PIPE_RESOURCE_FLAG_DRV_PRIV << 3)
 #define R600_RESOURCE_FLAG_UNMAPPABLE		(PIPE_RESOURCE_FLAG_DRV_PRIV << 4)
 #define R600_RESOURCE_FLAG_READ_ONLY		(PIPE_RESOURCE_FLAG_DRV_PRIV << 5)
+#define R600_RESOURCE_FLAG_32BIT		(PIPE_RESOURCE_FLAG_DRV_PRIV << 6)
 
 /* Debug flags. */
 enum {
 	/* Shader logging options: */
 	DBG_VS = PIPE_SHADER_VERTEX,
 	DBG_PS = PIPE_SHADER_FRAGMENT,
 	DBG_GS = PIPE_SHADER_GEOMETRY,
 	DBG_TCS = PIPE_SHADER_TESS_CTRL,
 	DBG_TES = PIPE_SHADER_TESS_EVAL,
 	DBG_CS = PIPE_SHADER_COMPUTE,
-- 
2.7.4
[Mesa-dev] [PATCH 07/15] winsys/radeon: add struct radeon_vm_heap
From: Marek Olšák--- src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 63 --- src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 9 ++-- src/gallium/winsys/radeon/drm/radeon_drm_winsys.h | 11 ++-- 3 files changed, 47 insertions(+), 36 deletions(-) diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c index 7aef238..bbfe5cc 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c @@ -191,146 +191,148 @@ static enum radeon_bo_domain radeon_bo_get_initial_domain( fprintf(stderr, "radeon: failed to get initial domain: %p 0x%08X\n", bo, bo->handle); /* Default domain as returned by get_valid_domain. */ return RADEON_DOMAIN_VRAM_GTT; } /* GEM domains and winsys domains are defined the same. */ return get_valid_domain(args.value); } -static uint64_t radeon_bomgr_find_va(struct radeon_drm_winsys *rws, +static uint64_t radeon_bomgr_find_va(const struct radeon_info *info, + struct radeon_vm_heap *heap, uint64_t size, uint64_t alignment) { struct radeon_bo_va_hole *hole, *n; uint64_t offset = 0, waste = 0; /* All VM address space holes will implicitly start aligned to the * size alignment, so we don't need to sanitize the alignment here */ -size = align(size, rws->info.gart_page_size); +size = align(size, info->gart_page_size); -mtx_lock(>bo_va_mutex); +mtx_lock(>mutex); /* first look for a hole */ -LIST_FOR_EACH_ENTRY_SAFE(hole, n, >va_holes, list) { +LIST_FOR_EACH_ENTRY_SAFE(hole, n, >holes, list) { offset = hole->offset; waste = offset % alignment; waste = waste ? 
alignment - waste : 0; offset += waste; if (offset >= (hole->offset + hole->size)) { continue; } if (!waste && hole->size == size) { offset = hole->offset; list_del(>list); FREE(hole); -mtx_unlock(>bo_va_mutex); +mtx_unlock(>mutex); return offset; } if ((hole->size - waste) > size) { if (waste) { n = CALLOC_STRUCT(radeon_bo_va_hole); n->size = waste; n->offset = hole->offset; list_add(>list, >list); } hole->size -= (size + waste); hole->offset += size + waste; -mtx_unlock(>bo_va_mutex); +mtx_unlock(>mutex); return offset; } if ((hole->size - waste) == size) { hole->size = waste; -mtx_unlock(>bo_va_mutex); +mtx_unlock(>mutex); return offset; } } -offset = rws->va_offset; +offset = heap->start; waste = offset % alignment; waste = waste ? alignment - waste : 0; if (waste) { n = CALLOC_STRUCT(radeon_bo_va_hole); n->size = waste; n->offset = offset; -list_add(>list, >va_holes); +list_add(>list, >holes); } offset += waste; -rws->va_offset += size + waste; -mtx_unlock(>bo_va_mutex); +heap->start += size + waste; +mtx_unlock(>mutex); return offset; } -static void radeon_bomgr_free_va(struct radeon_drm_winsys *rws, +static void radeon_bomgr_free_va(const struct radeon_info *info, + struct radeon_vm_heap *heap, uint64_t va, uint64_t size) { struct radeon_bo_va_hole *hole = NULL; -size = align(size, rws->info.gart_page_size); +size = align(size, info->gart_page_size); -mtx_lock(>bo_va_mutex); -if ((va + size) == rws->va_offset) { -rws->va_offset = va; +mtx_lock(>mutex); +if ((va + size) == heap->start) { +heap->start = va; /* Delete uppermost hole if it reaches the new top */ -if (!LIST_IS_EMPTY(>va_holes)) { -hole = container_of(rws->va_holes.next, hole, list); +if (!LIST_IS_EMPTY(>holes)) { +hole = container_of(heap->holes.next, hole, list); if ((hole->offset + hole->size) == va) { -rws->va_offset = hole->offset; +heap->start = hole->offset; list_del(>list); FREE(hole); } } } else { struct radeon_bo_va_hole *next; -hole = container_of(>va_holes, hole, list); 
-LIST_FOR_EACH_ENTRY(next, >va_holes, list) { +hole = container_of(>holes, hole, list); +LIST_FOR_EACH_ENTRY(next, >holes, list) { if (next->offset < va) break; hole = next; } -if (>list != >va_holes) { +if (>list != >holes) { /* Grow upper hole if it's adjacent */
[Mesa-dev] [PATCH 04/15] pb_cache: let drivers choose the number of buckets
From: Marek Olšák

---
 src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c |  2 +-
 src/gallium/auxiliary/pipebuffer/pb_cache.c        | 20
 src/gallium/auxiliary/pipebuffer/pb_cache.h        |  6 --
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c          |  1 -
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c      |  3 ++-
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c      |  1 -
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c  |  3 ++-
 7 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c b/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
index 24831f6..4e70048 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
@@ -297,16 +297,16 @@ pb_cache_manager_create(struct pb_manager *provider,
       return NULL;

    mgr = CALLOC_STRUCT(pb_cache_manager);
    if (!mgr)
       return NULL;

    mgr->base.destroy = pb_cache_manager_destroy;
    mgr->base.create_buffer = pb_cache_manager_create_buffer;
    mgr->base.flush = pb_cache_manager_flush;
    mgr->provider = provider;
-   pb_cache_init(&mgr->cache, usecs, size_factor, bypass_usage,
+   pb_cache_init(&mgr->cache, 1, usecs, size_factor, bypass_usage,
                  maximum_cache_size,
                  _pb_cache_buffer_destroy,
                  pb_cache_can_reclaim_buffer);
    return &mgr->base;
 }
diff --git a/src/gallium/auxiliary/pipebuffer/pb_cache.c b/src/gallium/auxiliary/pipebuffer/pb_cache.c
index dd479ae..af899a2 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_cache.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_cache.c
@@ -85,21 +85,21 @@ pb_cache_add_buffer(struct pb_cache_entry *entry)
    struct pb_cache *mgr = entry->mgr;
    struct list_head *cache = &mgr->buckets[entry->bucket_index];
    struct pb_buffer *buf = entry->buffer;
    unsigned i;

    mtx_lock(&mgr->mutex);
    assert(!pipe_is_referenced(&buf->reference));

    int64_t current_time = os_time_get();

-   for (i = 0; i < ARRAY_SIZE(mgr->buckets); i++)
+   for (i = 0; i < mgr->num_heaps; i++)
       release_expired_buffers_locked(&mgr->buckets[i], current_time);

    /* Directly release any buffer that exceeds the limit. */
    if (mgr->cache_size + buf->size > mgr->max_cache_size) {
       mgr->destroy_buffer(buf);
       mtx_unlock(&mgr->mutex);
       return;
    }

    entry->start = os_time_get();
@@ -146,20 +146,22 @@ pb_cache_is_buffer_compat(struct pb_cache_entry *entry,
 struct pb_buffer *
 pb_cache_reclaim_buffer(struct pb_cache *mgr, pb_size size,
                         unsigned alignment, unsigned usage,
                         unsigned bucket_index)
 {
    struct pb_cache_entry *entry;
    struct pb_cache_entry *cur_entry;
    struct list_head *cur, *next;
    int64_t now;
    int ret = 0;
+
+   assert(bucket_index < mgr->num_heaps);
    struct list_head *cache = &mgr->buckets[bucket_index];

    mtx_lock(&mgr->mutex);

    entry = NULL;
    cur = cache->next;
    next = cur->next;

    /* search in the expired buffers, freeing them in the process */
    now = os_time_get();
@@ -222,39 +224,41 @@ pb_cache_reclaim_buffer(struct pb_cache *mgr, pb_size size,
  * Empty the cache. Useful when there is not enough memory.
  */
 void
 pb_cache_release_all_buffers(struct pb_cache *mgr)
 {
    struct list_head *curr, *next;
    struct pb_cache_entry *buf;
    unsigned i;

    mtx_lock(&mgr->mutex);
-   for (i = 0; i < ARRAY_SIZE(mgr->buckets); i++) {
+   for (i = 0; i < mgr->num_heaps; i++) {
       struct list_head *cache = &mgr->buckets[i];

       curr = cache->next;
       next = curr->next;
       while (curr != cache) {
          buf = LIST_ENTRY(struct pb_cache_entry, curr, head);
          destroy_buffer_locked(buf);
          curr = next;
          next = curr->next;
       }
    }
    mtx_unlock(&mgr->mutex);
 }

 void
 pb_cache_init_entry(struct pb_cache *mgr, struct pb_cache_entry *entry,
                     struct pb_buffer *buf, unsigned bucket_index)
 {
+   assert(bucket_index < mgr->num_heaps);
+
    memset(entry, 0, sizeof(*entry));
    entry->buffer = buf;
    entry->mgr = mgr;
    entry->bucket_index = bucket_index;
 }

 /**
  * Initialize a caching buffer manager.
  *
  * @param mgr     The cache buffer manager
@@ -263,40 +267,48 @@ pb_cache_init_entry(struct pb_cache *mgr, struct pb_cache_entry *entry,
  * @param size_factor   Declare buffers that are size_factor times bigger than
  *                      the requested size as cache hits.
  * @param bypass_usage  Bitmask. If (requested usage & bypass_usage) != 0,
  *                      buffer allocation requests are rejected.
  * @param maximum_cache_size  Maximum size of all unused buffers the cache can
  *                            hold.
  * @param destroy_buffer  Function that destroys a buffer for good.
  * @param can_reclaim     Whether a buffer can be reclaimed (e.g. is not busy)
  */
 void
-pb_cache_init(struct pb_cache *mgr, uint usecs, float size_factor,
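For readers following the idea rather than the exact code: the patch replaces the compile-time loop bound ARRAY_SIZE(mgr->buckets) with a runtime field, num_heaps, and asserts that bucket indices stay below it. The sketch below illustrates that pattern only. All demo_* names are hypothetical, and the fixed-capacity array is an assumption made to keep the example short; the truncated diff above does not show how pb_cache actually stores its buckets.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical upper bound, used only to keep this sketch allocation-free. */
#define DEMO_MAX_BUCKETS 8

struct demo_cache {
    unsigned num_heaps;                  /* buckets chosen by the driver */
    unsigned buffers[DEMO_MAX_BUCKETS];  /* cached buffer count per bucket */
};

/* Returns 1 on success, 0 if the requested bucket count is out of range. */
static int demo_cache_init(struct demo_cache *c, unsigned num_heaps)
{
    if (num_heaps == 0 || num_heaps > DEMO_MAX_BUCKETS)
        return 0;
    memset(c, 0, sizeof(*c));
    c->num_heaps = num_heaps;
    return 1;
}

/* Every entry point that takes a bucket index checks it against the
 * runtime count, mirroring the asserts added by the patch. */
static void demo_cache_add(struct demo_cache *c, unsigned bucket_index)
{
    assert(bucket_index < c->num_heaps);
    c->buffers[bucket_index]++;
}

/* Loops walk only the buckets in use, not the whole array. */
static unsigned demo_cache_release_all(struct demo_cache *c)
{
    unsigned released = 0;
    for (unsigned i = 0; i < c->num_heaps; i++) {
        released += c->buffers[i];
        c->buffers[i] = 0;
    }
    return released;
}
```

A caller that only needs one bucket, as pb_bufmgr_cache.c does in the first hunk, would pass 1 at init time.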
[Mesa-dev] [PATCH 01/15] gallium/radeon: simplify radeon_flags_from_heap
From: Marek Olšák

---
 src/gallium/drivers/radeon/radeon_winsys.h | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h
index d1c761f..49ef83b 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -675,44 +675,38 @@ static inline enum radeon_bo_domain radeon_domain_from_heap(enum radeon_heap hea
     case RADEON_HEAP_GTT:
         return RADEON_DOMAIN_GTT;
     default:
         assert(0);
         return (enum radeon_bo_domain)0;
     }
 }

 static inline unsigned radeon_flags_from_heap(enum radeon_heap heap)
 {
+    unsigned flags = RADEON_FLAG_NO_INTERPROCESS_SHARING |
+                     (heap != RADEON_HEAP_GTT ? RADEON_FLAG_GTT_WC : 0);
+
     switch (heap) {
     case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
-        return RADEON_FLAG_GTT_WC |
-               RADEON_FLAG_NO_CPU_ACCESS |
-               RADEON_FLAG_NO_INTERPROCESS_SHARING;
+        return flags |
+               RADEON_FLAG_NO_CPU_ACCESS;

     case RADEON_HEAP_VRAM_READ_ONLY:
-        return RADEON_FLAG_GTT_WC |
-               RADEON_FLAG_NO_INTERPROCESS_SHARING |
+    case RADEON_HEAP_GTT_WC_READ_ONLY:
+        return flags |
                RADEON_FLAG_READ_ONLY;

     case RADEON_HEAP_VRAM:
     case RADEON_HEAP_GTT_WC:
-        return RADEON_FLAG_GTT_WC |
-               RADEON_FLAG_NO_INTERPROCESS_SHARING;
-
-    case RADEON_HEAP_GTT_WC_READ_ONLY:
-        return RADEON_FLAG_GTT_WC |
-               RADEON_FLAG_NO_INTERPROCESS_SHARING |
-               RADEON_FLAG_READ_ONLY;
-
     case RADEON_HEAP_GTT:
     default:
-        return RADEON_FLAG_NO_INTERPROCESS_SHARING;
+        return flags;
     }
 }

 /* The pb cache bucket is chosen to minimize pb_cache misses.
  * It must be between 0 and 3 inclusive.
  */
 static inline unsigned radeon_get_pb_cache_bucket_index(enum radeon_heap heap)
 {
     switch (heap) {
     case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
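The simplification in this patch can be checked mechanically: compute the flags shared by every heap once, then OR in only the per-heap extras. The sketch below is a standalone illustration, not the Mesa header. The demo_* enums are hypothetical stand-ins whose names and bit values merely echo the ones in the diff, and demo_flags_equivalent is a helper invented here to compare the two shapes.

```c
#include <assert.h>

/* Hypothetical stand-ins for the radeon_winsys.h enums (illustrative values). */
enum demo_flag {
    DEMO_FLAG_GTT_WC                  = 1 << 0,
    DEMO_FLAG_NO_CPU_ACCESS           = 1 << 1,
    DEMO_FLAG_NO_INTERPROCESS_SHARING = 1 << 4,
    DEMO_FLAG_READ_ONLY               = 1 << 5,
};

enum demo_heap {
    DEMO_HEAP_VRAM_NO_CPU_ACCESS,
    DEMO_HEAP_VRAM_READ_ONLY,
    DEMO_HEAP_VRAM,
    DEMO_HEAP_GTT_WC,
    DEMO_HEAP_GTT_WC_READ_ONLY,
    DEMO_HEAP_GTT,
};

/* Pre-patch shape: every case spells out its complete flag set. */
static unsigned demo_flags_old(enum demo_heap heap)
{
    switch (heap) {
    case DEMO_HEAP_VRAM_NO_CPU_ACCESS:
        return DEMO_FLAG_GTT_WC | DEMO_FLAG_NO_CPU_ACCESS |
               DEMO_FLAG_NO_INTERPROCESS_SHARING;
    case DEMO_HEAP_VRAM_READ_ONLY:
    case DEMO_HEAP_GTT_WC_READ_ONLY:
        return DEMO_FLAG_GTT_WC | DEMO_FLAG_NO_INTERPROCESS_SHARING |
               DEMO_FLAG_READ_ONLY;
    case DEMO_HEAP_VRAM:
    case DEMO_HEAP_GTT_WC:
        return DEMO_FLAG_GTT_WC | DEMO_FLAG_NO_INTERPROCESS_SHARING;
    case DEMO_HEAP_GTT:
    default:
        return DEMO_FLAG_NO_INTERPROCESS_SHARING;
    }
}

/* Post-patch shape: shared flags first, per-heap extras in the switch. */
static unsigned demo_flags_new(enum demo_heap heap)
{
    unsigned flags = DEMO_FLAG_NO_INTERPROCESS_SHARING |
                     (heap != DEMO_HEAP_GTT ? DEMO_FLAG_GTT_WC : 0);

    switch (heap) {
    case DEMO_HEAP_VRAM_NO_CPU_ACCESS:
        return flags | DEMO_FLAG_NO_CPU_ACCESS;
    case DEMO_HEAP_VRAM_READ_ONLY:
    case DEMO_HEAP_GTT_WC_READ_ONLY:
        return flags | DEMO_FLAG_READ_ONLY;
    default:
        return flags;
    }
}

/* Returns 1 when both versions agree for every heap in the enum. */
static int demo_flags_equivalent(void)
{
    for (int h = DEMO_HEAP_VRAM_NO_CPU_ACCESS; h <= DEMO_HEAP_GTT; h++) {
        if (demo_flags_old((enum demo_heap)h) != demo_flags_new((enum demo_heap)h))
            return 0;
    }
    return 1;
}
```

Factoring out the shared flags is what lets a later patch add heaps with a single extra flag per case.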
[Mesa-dev] [PATCH 03/15] pb_cache: call os_time_get outside of the loop
From: Marek Olšák

---
 src/gallium/auxiliary/pipebuffer/pb_cache.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/pipebuffer/pb_cache.c b/src/gallium/auxiliary/pipebuffer/pb_cache.c
index b67e54b..dd479ae 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_cache.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_cache.c
@@ -47,34 +47,32 @@ destroy_buffer_locked(struct pb_cache_entry *entry)
       --mgr->num_buffers;
       mgr->cache_size -= buf->size;
    }
    mgr->destroy_buffer(buf);
 }

 /**
  * Free as many cache buffers from the list head as possible.
  */
 static void
-release_expired_buffers_locked(struct list_head *cache)
+release_expired_buffers_locked(struct list_head *cache,
+                               int64_t current_time)
 {
    struct list_head *curr, *next;
    struct pb_cache_entry *entry;
-   int64_t now;
-
-   now = os_time_get();

    curr = cache->next;
    next = curr->next;
    while (curr != cache) {
       entry = LIST_ENTRY(struct pb_cache_entry, curr, head);

-      if (!os_time_timeout(entry->start, entry->end, now))
+      if (!os_time_timeout(entry->start, entry->end, current_time))
          break;

       destroy_buffer_locked(entry);

       curr = next;
       next = curr->next;
    }
 }

 /**
@@ -85,22 +83,24 @@ void
 pb_cache_add_buffer(struct pb_cache_entry *entry)
 {
    struct pb_cache *mgr = entry->mgr;
    struct list_head *cache = &mgr->buckets[entry->bucket_index];
    struct pb_buffer *buf = entry->buffer;
    unsigned i;

    mtx_lock(&mgr->mutex);
    assert(!pipe_is_referenced(&buf->reference));

+   int64_t current_time = os_time_get();
+
    for (i = 0; i < ARRAY_SIZE(mgr->buckets); i++)
-      release_expired_buffers_locked(&mgr->buckets[i]);
+      release_expired_buffers_locked(&mgr->buckets[i], current_time);

    /* Directly release any buffer that exceeds the limit. */
    if (mgr->cache_size + buf->size > mgr->max_cache_size) {
       mgr->destroy_buffer(buf);
       mtx_unlock(&mgr->mutex);
       return;
    }

    entry->start = os_time_get();
    entry->end = entry->start + mgr->usecs;
-- 
2.7.4
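The effect of this patch is that the clock is read once per pb_cache_add_buffer call instead of once per bucket, and every bucket is judged against the same timestamp. A minimal sketch of that hoisting follows. Everything prefixed demo_ is hypothetical, the fake clock exists only so the number of reads can be counted, and a plain `<=` comparison stands in for os_time_timeout, whose exact semantics are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_BUCKETS 4
#define DEMO_BUCKET_CAP  8

static int demo_clock_calls;           /* number of clock reads so far */
static int64_t demo_fake_time = 100;   /* value the fake clock reports */

/* Hypothetical clock; a real one would query the OS. */
static int64_t demo_time_get(void)
{
    demo_clock_calls++;
    return demo_fake_time;
}

struct demo_bucket {
    int64_t expiry[DEMO_BUCKET_CAP];   /* expiry times, oldest first */
    unsigned count;
};

/* Drops entries from the front of the bucket while they have expired
 * at `now`, and returns how many were dropped. */
static unsigned demo_release_expired(struct demo_bucket *b, int64_t now)
{
    unsigned released = 0;

    while (released < b->count && b->expiry[released] <= now)
        released++;

    /* Shift the surviving entries to the front. */
    for (unsigned i = released; i < b->count; i++)
        b->expiry[i - released] = b->expiry[i];
    b->count -= released;
    return released;
}

/* Reads the clock once and reuses the value for every bucket. */
static unsigned demo_release_all(struct demo_bucket *buckets, unsigned n)
{
    int64_t now = demo_time_get();
    unsigned total = 0;

    for (unsigned i = 0; i < n; i++)
        total += demo_release_expired(&buckets[i], now);
    return total;
}
```

With the timestamp passed in as a parameter, the per-bucket helper also becomes easier to test, since the caller controls what "now" means.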
[Mesa-dev] [PATCH 02/15] gallium/radeon: add 32-bit address space heaps
From: Marek Olšák

---
 src/gallium/drivers/radeon/radeon_winsys.h | 51 --
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h
index 49ef83b..9f274b4 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -46,20 +46,21 @@ enum radeon_bo_domain { /* bitfield */
     RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT
 };

 enum radeon_bo_flag { /* bitfield */
     RADEON_FLAG_GTT_WC =        (1 << 0),
     RADEON_FLAG_NO_CPU_ACCESS = (1 << 1),
     RADEON_FLAG_NO_SUBALLOC =   (1 << 2),
     RADEON_FLAG_SPARSE =        (1 << 3),
     RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 4),
     RADEON_FLAG_READ_ONLY =     (1 << 5),
+    RADEON_FLAG_32BIT =         (1 << 6),
 };

 enum radeon_bo_usage { /* bitfield */
     RADEON_USAGE_READ = 2,
     RADEON_USAGE_WRITE = 4,
     RADEON_USAGE_READWRITE = RADEON_USAGE_READ | RADEON_USAGE_WRITE,

     /* The winsys ensures that the CS submission will be scheduled after
      * previously flushed CSs referencing this BO in a conflicting way.
      */
@@ -648,37 +649,45 @@ static inline void radeon_emit(struct radeon_winsys_cs *cs, uint32_t value)
 static inline void radeon_emit_array(struct radeon_winsys_cs *cs,
                                      const uint32_t *values, unsigned count)
 {
     memcpy(cs->current.buf + cs->current.cdw, values, count * 4);
     cs->current.cdw += count;
 }

 enum radeon_heap {
     RADEON_HEAP_VRAM_NO_CPU_ACCESS,
     RADEON_HEAP_VRAM_READ_ONLY,
+    RADEON_HEAP_VRAM_READ_ONLY_32BIT,
+    RADEON_HEAP_VRAM_32BIT,
     RADEON_HEAP_VRAM,
     RADEON_HEAP_GTT_WC,
     RADEON_HEAP_GTT_WC_READ_ONLY,
+    RADEON_HEAP_GTT_WC_READ_ONLY_32BIT,
+    RADEON_HEAP_GTT_WC_32BIT,
     RADEON_HEAP_GTT,
     RADEON_MAX_SLAB_HEAPS,
     RADEON_MAX_CACHED_HEAPS = RADEON_MAX_SLAB_HEAPS,
 };

 static inline enum radeon_bo_domain radeon_domain_from_heap(enum radeon_heap heap)
 {
     switch (heap) {
     case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
     case RADEON_HEAP_VRAM_READ_ONLY:
+    case RADEON_HEAP_VRAM_READ_ONLY_32BIT:
+    case RADEON_HEAP_VRAM_32BIT:
     case RADEON_HEAP_VRAM:
         return RADEON_DOMAIN_VRAM;
     case RADEON_HEAP_GTT_WC:
     case RADEON_HEAP_GTT_WC_READ_ONLY:
+    case RADEON_HEAP_GTT_WC_READ_ONLY_32BIT:
+    case RADEON_HEAP_GTT_WC_32BIT:
     case RADEON_HEAP_GTT:
         return RADEON_DOMAIN_GTT;
     default:
         assert(0);
         return (enum radeon_bo_domain)0;
     }
 }

 static inline unsigned radeon_flags_from_heap(enum radeon_heap heap)
 {
@@ -688,41 +697,56 @@ static inline unsigned radeon_flags_from_heap(enum radeon_heap heap)
     switch (heap) {
     case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
         return flags |
                RADEON_FLAG_NO_CPU_ACCESS;

     case RADEON_HEAP_VRAM_READ_ONLY:
     case RADEON_HEAP_GTT_WC_READ_ONLY:
         return flags |
                RADEON_FLAG_READ_ONLY;

+    case RADEON_HEAP_VRAM_READ_ONLY_32BIT:
+    case RADEON_HEAP_GTT_WC_READ_ONLY_32BIT:
+        return flags |
+               RADEON_FLAG_READ_ONLY |
+               RADEON_FLAG_32BIT;
+
+    case RADEON_HEAP_VRAM_32BIT:
+    case RADEON_HEAP_GTT_WC_32BIT:
+        return flags |
+               RADEON_FLAG_32BIT;
+
     case RADEON_HEAP_VRAM:
     case RADEON_HEAP_GTT_WC:
     case RADEON_HEAP_GTT:
     default:
         return flags;
     }
 }

 /* The pb cache bucket is chosen to minimize pb_cache misses.
  * It must be between 0 and 3 inclusive.
  */
 static inline unsigned radeon_get_pb_cache_bucket_index(enum radeon_heap heap)
 {
     switch (heap) {
     case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
         return 0;
     case RADEON_HEAP_VRAM_READ_ONLY:
+    case RADEON_HEAP_VRAM_READ_ONLY_32BIT:
+    case RADEON_HEAP_VRAM_32BIT:
     case RADEON_HEAP_VRAM:
         return 1;
     case RADEON_HEAP_GTT_WC:
     case RADEON_HEAP_GTT_WC_READ_ONLY:
+    case RADEON_HEAP_GTT_WC_READ_ONLY_32BIT:
+    case RADEON_HEAP_GTT_WC_32BIT:
         return 2;
     case RADEON_HEAP_GTT:
     default:
         return 3;
     }
 }

 /* Return the heap index for winsys allocators, or -1 on failure. */
 static inline int radeon_get_heap_index(enum radeon_bo_domain domain,
                                         enum radeon_bo_flag flags)
@@ -733,46 +757,67 @@ static inline int radeon_get_heap_index(enum radeon_bo_domain domain,
     assert(!(flags & RADEON_FLAG_NO_CPU_ACCESS) || domain == RADEON_DOMAIN_VRAM);

     /* Resources with interprocess sharing don't use any winsys allocators. */
     if (!(flags & RADEON_FLAG_NO_INTERPROCESS_SHARING))
         return -1;

     /* Unsupported flags: NO_SUBALLOC, SPARSE. */
     if (flags & ~(RADEON_FLAG_GTT_WC |
                   RADEON_FLAG_NO_CPU_ACCESS |
                   RADEON_FLAG_NO_INTERPROCESS_SHARING |
-                  RADEON_FLAG_READ_ONLY))
+
[Mesa-dev] [PATCH 00/15] RadeonSI 32-bit GPU pointers
Hi,

This series:
- increases the number of buckets in pb_cache
- adds 32-bit heaps: GTT WC, VRAM, and read-only versions of those
- adds a 32-bit VM allocator into winsys/radeon and enables 32-bit VM
  allocations in both winsyses
- moves all const_uploader allocations to 32-bit address space
- puts "amdgpu.uniform" LLVM metadata on loads instead of GEPs, so that
  InstCombine doesn't remove it
- switches shader pointers in user SGPRs to 32 bits

Dependencies:
- https://reviews.llvm.org/D41715
- https://reviews.llvm.org/D41651

This frees up to 7 user SGPRs in merged shaders, 5 user SGPRs in vertex
shaders, and 4 user SGPRs in other shaders.

Please review.

Thanks,
Marek
[Mesa-dev] [Bug 104381] swr fails to build since llvm-svn r321257
https://bugs.freedesktop.org/show_bug.cgi?id=104381

Laurent carlier changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Laurent carlier ---
Fixed in trunk with ad218754c79e0af61d5ba225a4b195cb55c2cac9

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
Re: [Mesa-dev] [PATCH] spirv: Use correct type for sampled images
On 6 January 2018 at 01:03, Jason Ekstrand wrote:

> On Tue, Nov 7, 2017 at 3:08 AM, Alex Smith wrote:
>
>> Thanks Jason. Can someone push this?
>>
>
> Did you never get push access?
>

I did - this is commit e9eb3c4753e4f56b03d16d8d6f71d49f1e7b97db.

Thanks,
Alex

> --Jason
>
>
>> On 6 November 2017 at 16:21, Jason Ekstrand wrote:
>>
>>> On Mon, Nov 6, 2017 at 2:37 AM, Alex Smith wrote:
>>>
>>>> We should use the result type of the OpSampledImage opcode, rather than
>>>> the type of the underlying image/samplers.
>>>>
>>>> This resolves an issue when using separate images and shadow samplers
>>>> with glslang. Example:
>>>>
>>>>     layout (...) uniform samplerShadow s0;
>>>>     layout (...) uniform texture2D res0;
>>>>     ...
>>>>     float result = textureLod(sampler2DShadow(res0, s0), uv, 0);
>>>>
>>>> For this, for the combined OpSampledImage, the type of the base image
>>>> was being used (which does not have the Depth flag set, whereas the
>>>> result type does), therefore it was not being recognised as a shadow
>>>> sampler. This led to the wrong LLVM intrinsics being emitted by RADV.
>>>>
>>>
>>> Reviewed-by: Jason Ekstrand
>>>
>>>
>>>> Signed-off-by: Alex Smith
>>>> Cc: "17.2 17.3"
>>>> ---
>>>>  src/compiler/spirv/spirv_to_nir.c  | 10 --
>>>>  src/compiler/spirv/vtn_private.h   |  1 +
>>>>  src/compiler/spirv/vtn_variables.c |  1 +
>>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
>>>> index 6825e0d6a8..93a515d731 100644
>>>> --- a/src/compiler/spirv/spirv_to_nir.c
>>>> +++ b/src/compiler/spirv/spirv_to_nir.c
>>>> @@ -1490,6 +1490,8 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
>>>>        struct vtn_value *val =
>>>>           vtn_push_value(b, w[2], vtn_value_type_sampled_image);
>>>>        val->sampled_image = ralloc(b, struct vtn_sampled_image);
>>>> +      val->sampled_image->type =
>>>> +         vtn_value(b, w[1], vtn_value_type_type)->type;
>>>>        val->sampled_image->image =
>>>>           vtn_value(b, w[3], vtn_value_type_pointer)->pointer;
>>>>        val->sampled_image->sampler =
>>>> @@ -1516,16 +1518,12 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
>>>>        sampled = *sampled_val->sampled_image;
>>>>     } else {
>>>>        assert(sampled_val->value_type == vtn_value_type_pointer);
>>>> +      sampled.type = sampled_val->pointer->type;
>>>>        sampled.image = NULL;
>>>>        sampled.sampler = sampled_val->pointer;
>>>>     }
>>>>
>>>> -   const struct glsl_type *image_type;
>>>> -   if (sampled.image) {
>>>> -      image_type = sampled.image->var->var->interface_type;
>>>> -   } else {
>>>> -      image_type = sampled.sampler->var->var->interface_type;
>>>> -   }
>>>> +   const struct glsl_type *image_type = sampled.type->type;
>>>>     const enum glsl_sampler_dim sampler_dim = glsl_get_sampler_dim(image_type);
>>>>     const bool is_array = glsl_sampler_type_is_array(image_type);
>>>>     const bool is_shadow = glsl_sampler_type_is_shadow(image_type);
>>>> diff --git a/src/compiler/spirv/vtn_private.h b/src/compiler/spirv/vtn_private.h
>>>> index 84584620fc..6b4645acc8 100644
>>>> --- a/src/compiler/spirv/vtn_private.h
>>>> +++ b/src/compiler/spirv/vtn_private.h
>>>> @@ -411,6 +411,7 @@ struct vtn_image_pointer {
>>>>  };
>>>>
>>>>  struct vtn_sampled_image {
>>>> +   struct vtn_type *type;
>>>>     struct vtn_pointer *image; /* Image or array of images */
>>>>     struct vtn_pointer *sampler; /* Sampler */
>>>>  };
>>>> diff --git a/src/compiler/spirv/vtn_variables.c b/src/compiler/spirv/vtn_variables.c
>>>> index 1cf9d597cf..9a69b4f6fc 100644
>>>> --- a/src/compiler/spirv/vtn_variables.c
>>>> +++ b/src/compiler/spirv/vtn_variables.c
>>>> @@ -1805,6 +1805,7 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp opcode,
>>>>           struct vtn_value *val =
>>>>              vtn_push_value(b, w[2], vtn_value_type_sampled_image);
>>>>           val->sampled_image = ralloc(b, struct vtn_sampled_image);
>>>> +         val->sampled_image->type = base_val->sampled_image->type;
>>>>           val->sampled_image->image =
>>>>              vtn_pointer_dereference(b, base_val->sampled_image->image, chain);
>>>>           val->sampled_image->sampler = base_val->sampled_image->sampler;
>>>> --
>>>> 2.13.6
>>>>
>>>
>>
>
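The commit message in the quoted patch describes a lookup that consulted the wrong type: the base image's type has no Depth flag, while the result type of the combined sampled image does. The sketch below reduces that to two small structs. The demo_* types and functions are hypothetical and are not the vtn_* structures from spirv_to_nir; they only show why carrying the result type alongside the image changes the answer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced type record: only the shadow bit matters. */
struct demo_type {
    bool is_shadow;
};

/* Hypothetical combined image+sampler value. */
struct demo_sampled_image {
    const struct demo_type *type;        /* result type of the combining op */
    const struct demo_type *image_type;  /* type of the underlying image */
};

/* Old behaviour in this sketch: look at the underlying image's type. */
static bool demo_is_shadow_old(const struct demo_sampled_image *s)
{
    return s->image_type->is_shadow;
}

/* New behaviour in this sketch: look at the stored result type. */
static bool demo_is_shadow_new(const struct demo_sampled_image *s)
{
    return s->type->is_shadow;
}
```

For a separate texture2D combined with a samplerShadow, the two functions disagree, which is the situation the patch addresses.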
[Mesa-dev] [PATCH] nir: fix st_nir_assign_var_locations for patch variables
Signed-off-by: Karol Herbst
Reviewed-by: Kenneth Graunke
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index 5683df..1c5de3d5de 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -139,8 +139,12 @@ st_nir_assign_var_locations(struct exec_list *var_list, unsigned *size,
       }

       bool processed = false;
-      if (var->data.patch) {
-         unsigned patch_loc = var->data.location - VARYING_SLOT_VAR0;
+      if (var->data.patch &&
+          var->data.location != VARYING_SLOT_TESS_LEVEL_INNER &&
+          var->data.location != VARYING_SLOT_TESS_LEVEL_OUTER &&
+          var->data.location != VARYING_SLOT_BOUNDING_BOX0 &&
+          var->data.location != VARYING_SLOT_BOUNDING_BOX1) {
+         unsigned patch_loc = var->data.location - VARYING_SLOT_PATCH0;

          if (processed_patch_locs & (1 << patch_loc))
             processed = true;
-- 
2.14.3
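The fix above changes which base the patch location is measured from, so that the bit index used in processed_patch_locs stays small, and it skips the built-in per-patch slots entirely. A rough sketch of that bookkeeping follows. The DEMO_SLOT_* values are illustrative assumptions, not the real VARYING_SLOT_* numbers, and the range check in demo_is_generic_patch is a simplification swapped in for the four explicit comparisons the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative slot numbers only; the real enum values are not assumed. */
enum {
    DEMO_SLOT_TESS_LEVEL_OUTER = 26,
    DEMO_SLOT_TESS_LEVEL_INNER = 27,
    DEMO_SLOT_VAR0             = 32,
    DEMO_SLOT_PATCH0           = 64,
};

/* True for generic per-patch slots that fit in a 32-bit mask. */
static bool demo_is_generic_patch(int location)
{
    return location >= DEMO_SLOT_PATCH0 && location < DEMO_SLOT_PATCH0 + 32;
}

/* Returns true if this generic patch slot was already recorded in *mask;
 * otherwise records it and returns false. Built-in slots are never tracked. */
static bool demo_patch_seen(uint32_t *mask, int location)
{
    if (!demo_is_generic_patch(location))
        return false;

    /* Offset from the first patch slot, so the shift stays below 32. */
    uint32_t bit = UINT32_C(1) << (location - DEMO_SLOT_PATCH0);
    bool seen = (*mask & bit) != 0;

    *mask |= bit;
    return seen;
}
```

With these illustrative numbers, subtracting DEMO_SLOT_VAR0 from a patch slot would give an offset of 32 or more, which is too large a shift for a 32-bit mask.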
Re: [Mesa-dev] [PATCH 1/5] nir: Silence unused parameter warnings
Series:
Reviewed-by: Alejandro Piñeiro

On 06/01/18 06:40, Ian Romanick wrote:
> From: Ian Romanick
>
> In file included from src/compiler/nir/nir_opt_algebraic.c:4:0:
> src/compiler/nir/nir_search_helpers.h: In function ‘is_not_const’:
> src/compiler/nir/nir_search_helpers.h:118:59: warning: unused parameter ‘num_components’ [-Wunused-parameter]
>  is_not_const(nir_alu_instr *instr, unsigned src, unsigned num_components,
>                                                            ^~
> src/compiler/nir/nir_search_helpers.h:119:29: warning: unused parameter ‘swizzle’ [-Wunused-parameter]
>               const uint8_t *swizzle)
>                              ^~~
>
> Signed-off-by: Ian Romanick
> ---
>  src/compiler/nir/nir_search_helpers.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/nir/nir_search_helpers.h b/src/compiler/nir/nir_search_helpers.h
> index 200f247..2e3bd13 100644
> --- a/src/compiler/nir/nir_search_helpers.h
> +++ b/src/compiler/nir/nir_search_helpers.h
> @@ -115,8 +115,8 @@ is_zero_to_one(nir_alu_instr *instr, unsigned src, unsigned num_components,
>  }
>
>  static inline bool
> -is_not_const(nir_alu_instr *instr, unsigned src, unsigned num_components,
> -             const uint8_t *swizzle)
> +is_not_const(nir_alu_instr *instr, unsigned src, UNUSED unsigned num_components,
> +             UNUSED const uint8_t *swizzle)
>  {
>     nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
>