Re: [Mesa-dev] [PATCH 2/2] i965/tcs: support gl_PatchVerticesIn as a system value from SPIR-V

2018-01-06 Thread Kenneth Graunke
On Saturday, January 6, 2018 9:07:44 PM PST Jason Ekstrand wrote:
> On Sat, Jan 6, 2018 at 5:12 PM, Kenneth Graunke 
> wrote:
> 
> > On Wednesday, November 15, 2017 11:53:08 PM PST Iago Toral Quiroga wrote:
> > > We currently handle this by lowering it to a uniform for gen8+ but
> > > the SPIR-V path generates this as a system value, so handle that
> > > case as well.
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_tcs.c | 9 ++++++++-
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c
> > b/src/mesa/drivers/dri/i965/brw_tcs.c
> > > index 4424efea4f0..b07b11f485d 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_tcs.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_tcs.c
> > > @@ -296,7 +296,14 @@ brw_tcs_populate_key(struct brw_context *brw,
> > >per_patch_slots |= prog->info.patch_outputs_written;
> > > }
> > >
> > > -   if (devinfo->gen < 8 || !tcp)
> > > +   /* For GLSL, gen8+ lowers gl_PatchVerticesIn to a uniform, however
> > > +* the SPIR-V path always lowers it to a system value.
> > > +*/
> > > +   bool reads_patch_vertices_as_system_value =
> > > +  tcp && (tcp->program.nir->info.system_values_read &
> > > +  BITFIELD64_BIT(SYSTEM_VALUE_VERTICES_IN));
> > > +
> > > +   if (devinfo->gen < 8 || !tcp || reads_patch_vertices_as_system_value)
> > >key->input_vertices = brw->ctx.TessCtrlProgram.patch_vertices;
> > > key->outputs_written = per_vertex_slots;
> > > key->patch_outputs_written = per_patch_slots;
> > >
> >
> > I guess this is okay, and it's better than nothing.  I'd really rather
> > see it converted to a uniform, like it is in the normal GLSL paths.  If
> > you're going to add recompiles based on the key like this, it might be
> > nice to at least update the brw_tcs_precompile function to guess, so we
> > at least attempt to avoid a recompile.
> >
> 
> Ugh... I'm happy to give a stronger "I don't like this".  In Vulkan, this
> is part of the pipeline state so we just pass it in through the shader
> key.  With GL, ugh... Personally, I think I'd be ok with just making it
> state based all the time but we already have the infrastructure to pass it
> through as a uniform so we may as well.  I think the better thing to do
> would be to add a quick little pass that moves VERTICES_IN to a uniform and
> call that on gen8+ in brw_link.cpp.  Then we can delete
> LowerTESPatchVerticesIn as i965 is the only user.  The "pass" would be
> really easy:

LowerTCSPatchVerticesIn rather.  I like this plan.

> 
> void
> brw_nir_lower_tcs_vertices_in_to_uniform(nir_shader *nir,
>                                          const struct gl_program *prog,
>                                          brw_tcs_prog_data *prog_data)
> {
>    int uniform = -1;
>    nir_foreach_var_safe(var, &nir->system_values) {
>       if (var->data.location != SYSTEM_VALUE_VERTICES_IN)
>          continue;
> 
>       /* Allocate the uniform slot once, on the first matching variable. */
>       if (uniform == -1) {
>          gl_state_index tokens[5] = {
>             STATE_INTERNAL,
>             STATE_TESS_PATCH_VERTICES_IN,
>          };
>          int index = _mesa_add_state_reference(prog->Parameters, tokens);
> 
>          uniform = prog_data->base.base.nr_params;
>          uint32_t *param =
>             brw_stage_prog_data_add_params(&prog_data->base.base, 1);
>          *param = BRW_PARAM_PARAMETER(index, SWIZZLE_);
>       }
> 
>       /* Turn the system value into a uniform and move it to that list. */
>       var->mode = nir_var_uniform;
>       var->data.location = uniform;
>       exec_node_remove(&var->node);
>       exec_list_push_tail(&nir->uniforms, &var->node);
>    }
> }
> 
> I may not have gotten my state referencing quite right there, but I think
> it's close.  I'd probably put the pass in brw_nir_uniforms.cpp if I was
> writing it.
> 
> --Jason
> 



signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/tcs: support gl_PatchVerticesIn as a system value from SPIR-V

2018-01-06 Thread Jason Ekstrand
On Sat, Jan 6, 2018 at 5:12 PM, Kenneth Graunke 
wrote:

> On Wednesday, November 15, 2017 11:53:08 PM PST Iago Toral Quiroga wrote:
> > We currently handle this by lowering it to a uniform for gen8+ but
> > the SPIR-V path generates this as a system value, so handle that
> > case as well.
> > ---
> >  src/mesa/drivers/dri/i965/brw_tcs.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c
> b/src/mesa/drivers/dri/i965/brw_tcs.c
> > index 4424efea4f0..b07b11f485d 100644
> > --- a/src/mesa/drivers/dri/i965/brw_tcs.c
> > +++ b/src/mesa/drivers/dri/i965/brw_tcs.c
> > @@ -296,7 +296,14 @@ brw_tcs_populate_key(struct brw_context *brw,
> >per_patch_slots |= prog->info.patch_outputs_written;
> > }
> >
> > -   if (devinfo->gen < 8 || !tcp)
> > +   /* For GLSL, gen8+ lowers gl_PatchVerticesIn to a uniform, however
> > +* the SPIR-V path always lowers it to a system value.
> > +*/
> > +   bool reads_patch_vertices_as_system_value =
> > +  tcp && (tcp->program.nir->info.system_values_read &
> > +  BITFIELD64_BIT(SYSTEM_VALUE_VERTICES_IN));
> > +
> > +   if (devinfo->gen < 8 || !tcp || reads_patch_vertices_as_system_value)
> >key->input_vertices = brw->ctx.TessCtrlProgram.patch_vertices;
> > key->outputs_written = per_vertex_slots;
> > key->patch_outputs_written = per_patch_slots;
> >
>
> I guess this is okay, and it's better than nothing.  I'd really rather
> see it converted to a uniform, like it is in the normal GLSL paths.  If
> you're going to add recompiles based on the key like this, it might be
> nice to at least update the brw_tcs_precompile function to guess, so we
> at least attempt to avoid a recompile.
>

Ugh... I'm happy to give a stronger "I don't like this".  In Vulkan, this
is part of the pipeline state so we just pass it in through the shader
key.  With GL, ugh... Personally, I think I'd be ok with just making it
state based all the time but we already have the infrastructure to pass it
through as a uniform so we may as well.  I think the better thing to do
would be to add a quick little pass that moves VERTICES_IN to a uniform and
call that on gen8+ in brw_link.cpp.  Then we can delete
LowerTESPatchVerticesIn as i965 is the only user.  The "pass" would be
really easy:

void
brw_nir_lower_tcs_vertices_in_to_uniform(nir_shader *nir,
                                         const struct gl_program *prog,
                                         brw_tcs_prog_data *prog_data)
{
   int uniform = -1;
   nir_foreach_var_safe(var, &nir->system_values) {
      if (var->data.location != SYSTEM_VALUE_VERTICES_IN)
         continue;

      /* Allocate the uniform slot once, on the first matching variable. */
      if (uniform == -1) {
         gl_state_index tokens[5] = {
            STATE_INTERNAL,
            STATE_TESS_PATCH_VERTICES_IN,
         };
         int index = _mesa_add_state_reference(prog->Parameters, tokens);

         uniform = prog_data->base.base.nr_params;
         uint32_t *param =
            brw_stage_prog_data_add_params(&prog_data->base.base, 1);
         *param = BRW_PARAM_PARAMETER(index, SWIZZLE_);
      }

      /* Turn the system value into a uniform and move it to that list. */
      var->mode = nir_var_uniform;
      var->data.location = uniform;
      exec_node_remove(&var->node);
      exec_list_push_tail(&nir->uniforms, &var->node);
   }
}

I may not have gotten my state referencing quite right there, but I think
it's close.  I'd probably put the pass in brw_nir_uniforms.cpp if I was
writing it.

--Jason
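
For readers who want to see the shape of that pass without the NIR and i965
plumbing, here is a minimal, self-contained sketch in plain C.  Everything in
it (toy_var, toy_shader, toy_lower_vertices_in_to_uniform, the array-based
lists, the enum values) is hypothetical and only stands in for nir_variable,
nir_shader and the exec_list calls above; none of it is Mesa API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for NIR types; not the real Mesa structures. */
enum toy_mode { TOY_SYSTEM_VALUE, TOY_UNIFORM };
enum { TOY_SYSTEM_VALUE_VERTICES_IN = 7 };   /* arbitrary illustrative value */

struct toy_var {
   enum toy_mode mode;
   int location;
};

#define TOY_MAX_VARS 8

struct toy_shader {
   struct toy_var *system_values[TOY_MAX_VARS];
   size_t num_system_values;
   struct toy_var *uniforms[TOY_MAX_VARS];
   size_t num_uniforms;
   int nr_params;   /* next free uniform slot */
};

/* Move every VERTICES_IN system value to the uniform list.  The uniform
 * slot is allocated once, on the first match, and shared by later matches.
 * Returns the slot, or -1 if the shader never read VERTICES_IN.
 */
static int
toy_lower_vertices_in_to_uniform(struct toy_shader *s)
{
   int uniform = -1;
   size_t kept = 0;

   for (size_t i = 0; i < s->num_system_values; i++) {
      struct toy_var *var = s->system_values[i];

      if (var->location != TOY_SYSTEM_VALUE_VERTICES_IN) {
         s->system_values[kept++] = var;   /* unrelated: stays in place */
         continue;
      }

      if (uniform == -1)
         uniform = s->nr_params++;

      assert(s->num_uniforms < TOY_MAX_VARS);
      var->mode = TOY_UNIFORM;
      var->location = uniform;
      s->uniforms[s->num_uniforms++] = var;
   }

   s->num_system_values = kept;
   return uniform;
}
```

The point of the single `uniform` variable is that the slot is only reserved
when the shader actually reads the value, so shaders that never touch
gl_PatchVerticesIn pay nothing.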


Re: [Mesa-dev] [PATCH] nir: fix st_nir_assign_var_locations for patch variables

2018-01-06 Thread Timothy Arceri

Thanks.

Reviewed-by: Timothy Arceri 

On 06/01/18 20:01, Karol Herbst wrote:

Signed-off-by: Karol Herbst 
Reviewed-by: Kenneth Graunke 
---
  src/mesa/state_tracker/st_glsl_to_nir.cpp | 8 ++++++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index 5683df..1c5de3d5de 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -139,8 +139,12 @@ st_nir_assign_var_locations(struct exec_list *var_list, 
unsigned *size,
}
  
bool processed = false;

-  if (var->data.patch) {
- unsigned patch_loc = var->data.location - VARYING_SLOT_VAR0;
+  if (var->data.patch &&
+  var->data.location != VARYING_SLOT_TESS_LEVEL_INNER &&
+  var->data.location != VARYING_SLOT_TESS_LEVEL_OUTER &&
+  var->data.location != VARYING_SLOT_BOUNDING_BOX0 &&
+  var->data.location != VARYING_SLOT_BOUNDING_BOX1) {
+ unsigned patch_loc = var->data.location - VARYING_SLOT_PATCH0;
   if (processed_patch_locs & (1 << patch_loc))
  processed = true;
  




Re: [Mesa-dev] [PATCH 2/2] i965/tcs: support gl_PatchVerticesIn as a system value from SPIR-V

2018-01-06 Thread Kenneth Graunke
On Wednesday, November 15, 2017 11:53:08 PM PST Iago Toral Quiroga wrote:
> We currently handle this by lowering it to a uniform for gen8+ but
> the SPIR-V path generates this as a system value, so handle that
> case as well.
> ---
>  src/mesa/drivers/dri/i965/brw_tcs.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c 
> b/src/mesa/drivers/dri/i965/brw_tcs.c
> index 4424efea4f0..b07b11f485d 100644
> --- a/src/mesa/drivers/dri/i965/brw_tcs.c
> +++ b/src/mesa/drivers/dri/i965/brw_tcs.c
> @@ -296,7 +296,14 @@ brw_tcs_populate_key(struct brw_context *brw,
>per_patch_slots |= prog->info.patch_outputs_written;
> }
>  
> -   if (devinfo->gen < 8 || !tcp)
> +   /* For GLSL, gen8+ lowers gl_PatchVerticesIn to a uniform, however
> +* the SPIR-V path always lowers it to a system value.
> +*/
> +   bool reads_patch_vertices_as_system_value =
> +  tcp && (tcp->program.nir->info.system_values_read &
> +  BITFIELD64_BIT(SYSTEM_VALUE_VERTICES_IN));
> +
> +   if (devinfo->gen < 8 || !tcp || reads_patch_vertices_as_system_value)
>key->input_vertices = brw->ctx.TessCtrlProgram.patch_vertices;
> key->outputs_written = per_vertex_slots;
> key->patch_outputs_written = per_patch_slots;
> 

I guess this is okay, and it's better than nothing.  I'd really rather
see it converted to a uniform, like it is in the normal GLSL paths.  If
you're going to add recompiles based on the key like this, it might be
nice to at least update the brw_tcs_precompile function to guess, so we
at least attempt to avoid a recompile.

As a stop-gap measure,
Reviewed-by: Kenneth Graunke 
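
To make the recompile concern concrete, here is a small self-contained sketch
of a key-indexed program cache.  It is a toy model, not i965 code:
toy_tcs_key, toy_cache and toy_get_program are hypothetical names, and the
"compile" is just a counter.  It shows that once input_vertices is part of
the key, every new patch size costs a compile unless a precompile already
guessed that value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down program key. */
struct toy_tcs_key {
   unsigned input_vertices;   /* 0 would mean "not baked into the shader" */
};

#define TOY_CACHE_SIZE 16

struct toy_cache {
   struct toy_tcs_key keys[TOY_CACHE_SIZE];
   size_t count;
   unsigned compiles;   /* how many times we had to "compile" */
};

static bool
toy_cache_lookup(const struct toy_cache *c, const struct toy_tcs_key *key)
{
   for (size_t i = 0; i < c->count; i++) {
      if (c->keys[i].input_vertices == key->input_vertices)
         return true;
   }
   return false;
}

/* Make sure a program exists for this key, compiling on a cache miss.
 * Returns true when a compile was needed.
 */
static bool
toy_get_program(struct toy_cache *c, const struct toy_tcs_key *key)
{
   if (toy_cache_lookup(c, key))
      return false;

   assert(c->count < TOY_CACHE_SIZE);
   c->keys[c->count++] = *key;
   c->compiles++;
   return true;
}
```

In this model a precompile is simply an early toy_get_program() call with a
guessed key; a later draw with the same input_vertices hits the cache, while
a draw with a different patch size still compiles.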




Re: [Mesa-dev] [PATCH 00/15] RadeonSI 32-bit GPU pointers

2018-01-06 Thread Marek Olšák
On Sat, Jan 6, 2018 at 5:51 PM, Christian König
 wrote:
> Hi Marek,
>
> actually, I was on the verge of removing the 32-bit VM support in libdrm because
> it clashes with HMM and SVM in general.
>
> Is it possible to set the upper 32 bits of the 64-bit address to some fixed
> value instead?

Yes, but not on radeon. radeon only has 8GB of virtual address space and
4GB on older kernels. I would have to change LLVM to set the high bits
differently on amdgpu but keep the high bits 0 on radeon.

Marek
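
As a rough illustration of the idea being discussed, the sketch below
rebuilds a 64-bit GPU address from a 32-bit pointer plus a fixed upper half.
The function names are made up for this example, and the 0x8000 value used in
the usage note is an arbitrary placeholder, not anything amdgpu or radeon
actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: fixed_high is the upper 32 bits shared by every address in
 * the 32-bit heap.  Keeping the high bits 0, as on radeon above, is the
 * special case fixed_high == 0.
 */
static uint64_t
toy_expand_pointer(uint32_t low, uint32_t fixed_high)
{
   return ((uint64_t)fixed_high << 32) | low;
}

/* A 64-bit address can be stored as 32 bits only if its upper half matches
 * the fixed value.
 */
static int
toy_fits_in_32bit_heap(uint64_t addr, uint32_t fixed_high)
{
   return (uint32_t)(addr >> 32) == fixed_high;
}
```

For example, toy_expand_pointer(0xdeadbeef, 0x8000) gives
0x00008000deadbeef, and with fixed_high == 0 the function is a plain zero
extension.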


Re: [Mesa-dev] [PATCH v3 3/4] meson: build clover

2018-01-06 Thread Dylan Baker
Quoting Jan Vesely (2018-01-06 15:18:54)
> On Fri, 2018-01-05 at 15:26 -0800, Dylan Baker wrote:
> > Quoting Jan Vesely (2018-01-05 14:16:41)
> > > Hi,
> > > 
> > > 
> > > sorry for the delay. I was mostly traveling during the holidays.
> > 
> > No worries, I was also away over the holidays and didn't look at this until
> > today.
> > 
> > > 
> > > On Fri, 2017-12-15 at 10:54 -0800, Dylan Baker wrote:
> > > > This has only been compile tested.
> > > > 
> > > > v2: - Have a single option for opencl (Eric E)
> > > > - fix typo "tgis" -> "tgsi" (Curro)
> > > > - Don't add "lib" to pipe loader libraries, which matches the
> > > >   autotools behavior
> > > > v3: - Remove trailing whitespace
> > > > - Make PIPE_SEARCH_DIR an absolute path
> > > > 
> > > > cc: Curro Jerez 
> > > > cc: Jan Vesely 
> > > > cc: Aaron Watry 
> > > > Signed-off-by: Dylan Baker 
> > > > ---
> > > >  include/meson.build   |  19 
> > > >  meson.build   |  29 +-
> > > >  meson_options.txt |   7 ++
> > > >  src/gallium/auxiliary/pipe-loader/meson.build |   3 +-
> > > >  src/gallium/meson.build   |  12 ++-
> > > >  src/gallium/state_trackers/clover/meson.build | 122 
> > > > ++
> > > >  src/gallium/targets/opencl/meson.build|  73 +++
> > > >  src/gallium/targets/pipe-loader/meson.build   |  77 
> > > >  8 files changed, 336 insertions(+), 6 deletions(-)
> > > >  create mode 100644 src/gallium/state_trackers/clover/meson.build
> > > >  create mode 100644 src/gallium/targets/opencl/meson.build
> > > >  create mode 100644 src/gallium/targets/pipe-loader/meson.build
> > > > 
> > > > diff --git a/include/meson.build b/include/meson.build
> > > > index e4dae91cede..a2e7ce6580e 100644
> > > > --- a/include/meson.build
> > > > +++ b/include/meson.build
> > > > @@ -78,3 +78,22 @@ if with_gallium_st_nine
> > > >  subdir : 'd3dadapter',
> > > >)
> > > >  endif
> > > > +
> > > > +# Only install the headers if we are building a stand alone 
> > > > implementation and
> > > > +# not an ICD enabled implementation
> > > > +if with_gallium_opencl and not with_opencl_icd
> > > > +  install_headers(
> > > > +'CL/cl.h',
> > > > +'CL/cl.hpp',
> > > > +'CL/cl_d3d10.h',
> > > > +'CL/cl_d3d11.h',
> > > > +'CL/cl_dx9_media_sharing.h',
> > > > +'CL/cl_egl.h',
> > > > +'CL/cl_ext.h',
> > > > +'CL/cl_gl.h',
> > > > +'CL/cl_gl_ext.h',
> > > > +'CL/cl_platform.h',
> > > > +'CL/opencl.h',
> > > > +subdir: 'CL'
> > > > +  )
> > > > +endif
> > > > diff --git a/meson.build b/meson.build
> > > > index 842d441199e..74b2d5c49dc 100644
> > > > --- a/meson.build
> > > > +++ b/meson.build
> > > > @@ -583,6 +583,22 @@ if with_gallium_st_nine
> > > >endif
> > > >  endif
> > > >  
> > > > +_opencl = get_option('gallium-opencl')
> > > > > +if _opencl != 'disabled'
> > > > +  if not with_gallium
> > > > +error('OpenCL Clover implementation requires at least one gallium 
> > > > driver.')
> > > > +  endif
> > > > +
> > > > +  # TODO: alitvec?
> > > > +  dep_clc = dependency('libclc')
> > > > +  with_gallium_opencl = true
> > > > +  with_opencl_icd = _opencl == 'icd'
> > > > +else
> > > > +  dep_clc = []
> > > > +  with_gallium_opencl = false
> > > > +  with_gallium_icd = false
> > > > +endif
> > > > +
> > > >  gl_pkgconfig_c_flags = []
> > > >  if with_platform_x11
> > > >if with_any_vk or (with_glx == 'dri' and with_dri_platform == 'drm')
> > > > @@ -930,7 +946,7 @@ dep_thread = dependency('threads')
> > > >  if dep_thread.found() and host_machine.system() != 'windows'
> > > >pre_args += '-DHAVE_PTHREAD'
> > > >  endif
> > > > -if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 # TODO: 
> > > > clover
> > > > +if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 or 
> > > > with_gallium_opencl
> > > >dep_elf = dependency('libelf', required : false)
> > > >if not dep_elf.found()
> > > >  dep_elf = cc.find_library('elf')
> > > > @@ -972,12 +988,19 @@ if with_amd_vk or with_gallium_radeonsi or 
> > > > with_gallium_r600
> > > >  llvm_modules += 'asmparser'
> > > >endif
> > > >  endif
> > > > +if with_gallium_opencl
> > > > +  llvm_modules += [
> > > > +'all-targets', 'linker', 'coverage', 'instrumentation', 'ipo', 
> > > > 'irreader',
> > > > +'lto', 'option', 'objcarcopts', 'profiledata',
> > > > +  ]
> > > > +  # TODO: optional modules
> > > > +endif
> > > >  
> > > >  _llvm = get_option('llvm')
> > > >  if _llvm == 'auto'
> > > >dep_llvm = dependency(
> > > >  'llvm', version : '>= 3.9.0', modules : llvm_modules,
> > > > -required : with_amd_vk or with_gallium_radeonsi or 
> > > > with_gallium_swr,
> > > > +required : with_amd_vk or with_gallium_radeonsi or 
> > > > 

Re: [Mesa-dev] [PATCH v3 3/4] meson: build clover

2018-01-06 Thread Jan Vesely
On Fri, 2018-01-05 at 15:26 -0800, Dylan Baker wrote:
> Quoting Jan Vesely (2018-01-05 14:16:41)
> > Hi,
> > 
> > 
> > sorry for the delay. I was mostly traveling during the holidays.
> 
> No worries, I was also away over the holidays and didn't look at this until
> today.
> 
> > 
> > On Fri, 2017-12-15 at 10:54 -0800, Dylan Baker wrote:
> > > This has only been compile tested.
> > > 
> > > v2: - Have a single option for opencl (Eric E)
> > > - fix typo "tgis" -> "tgsi" (Curro)
> > > - Don't add "lib" to pipe loader libraries, which matches the
> > >   autotools behavior
> > > v3: - Remove trailing whitespace
> > > - Make PIPE_SEARCH_DIR an absolute path
> > > 
> > > cc: Curro Jerez 
> > > cc: Jan Vesely 
> > > cc: Aaron Watry 
> > > Signed-off-by: Dylan Baker 
> > > ---
> > >  include/meson.build   |  19 
> > >  meson.build   |  29 +-
> > >  meson_options.txt |   7 ++
> > >  src/gallium/auxiliary/pipe-loader/meson.build |   3 +-
> > >  src/gallium/meson.build   |  12 ++-
> > >  src/gallium/state_trackers/clover/meson.build | 122 
> > > ++
> > >  src/gallium/targets/opencl/meson.build|  73 +++
> > >  src/gallium/targets/pipe-loader/meson.build   |  77 
> > >  8 files changed, 336 insertions(+), 6 deletions(-)
> > >  create mode 100644 src/gallium/state_trackers/clover/meson.build
> > >  create mode 100644 src/gallium/targets/opencl/meson.build
> > >  create mode 100644 src/gallium/targets/pipe-loader/meson.build
> > > 
> > > diff --git a/include/meson.build b/include/meson.build
> > > index e4dae91cede..a2e7ce6580e 100644
> > > --- a/include/meson.build
> > > +++ b/include/meson.build
> > > @@ -78,3 +78,22 @@ if with_gallium_st_nine
> > >  subdir : 'd3dadapter',
> > >)
> > >  endif
> > > +
> > > +# Only install the headers if we are building a stand alone 
> > > implementation and
> > > +# not an ICD enabled implementation
> > > +if with_gallium_opencl and not with_opencl_icd
> > > +  install_headers(
> > > +'CL/cl.h',
> > > +'CL/cl.hpp',
> > > +'CL/cl_d3d10.h',
> > > +'CL/cl_d3d11.h',
> > > +'CL/cl_dx9_media_sharing.h',
> > > +'CL/cl_egl.h',
> > > +'CL/cl_ext.h',
> > > +'CL/cl_gl.h',
> > > +'CL/cl_gl_ext.h',
> > > +'CL/cl_platform.h',
> > > +'CL/opencl.h',
> > > +subdir: 'CL'
> > > +  )
> > > +endif
> > > diff --git a/meson.build b/meson.build
> > > index 842d441199e..74b2d5c49dc 100644
> > > --- a/meson.build
> > > +++ b/meson.build
> > > @@ -583,6 +583,22 @@ if with_gallium_st_nine
> > >endif
> > >  endif
> > >  
> > > +_opencl = get_option('gallium-opencl')
> > > > +if _opencl != 'disabled'
> > > +  if not with_gallium
> > > +error('OpenCL Clover implementation requires at least one gallium 
> > > driver.')
> > > +  endif
> > > +
> > > +  # TODO: alitvec?
> > > +  dep_clc = dependency('libclc')
> > > +  with_gallium_opencl = true
> > > +  with_opencl_icd = _opencl == 'icd'
> > > +else
> > > +  dep_clc = []
> > > +  with_gallium_opencl = false
> > > +  with_gallium_icd = false
> > > +endif
> > > +
> > >  gl_pkgconfig_c_flags = []
> > >  if with_platform_x11
> > >if with_any_vk or (with_glx == 'dri' and with_dri_platform == 'drm')
> > > @@ -930,7 +946,7 @@ dep_thread = dependency('threads')
> > >  if dep_thread.found() and host_machine.system() != 'windows'
> > >pre_args += '-DHAVE_PTHREAD'
> > >  endif
> > > -if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 # TODO: 
> > > clover
> > > +if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 or 
> > > with_gallium_opencl
> > >dep_elf = dependency('libelf', required : false)
> > >if not dep_elf.found()
> > >  dep_elf = cc.find_library('elf')
> > > @@ -972,12 +988,19 @@ if with_amd_vk or with_gallium_radeonsi or 
> > > with_gallium_r600
> > >  llvm_modules += 'asmparser'
> > >endif
> > >  endif
> > > +if with_gallium_opencl
> > > +  llvm_modules += [
> > > +'all-targets', 'linker', 'coverage', 'instrumentation', 'ipo', 
> > > 'irreader',
> > > +'lto', 'option', 'objcarcopts', 'profiledata',
> > > +  ]
> > > +  # TODO: optional modules
> > > +endif
> > >  
> > >  _llvm = get_option('llvm')
> > >  if _llvm == 'auto'
> > >dep_llvm = dependency(
> > >  'llvm', version : '>= 3.9.0', modules : llvm_modules,
> > > -required : with_amd_vk or with_gallium_radeonsi or with_gallium_swr,
> > > +required : with_amd_vk or with_gallium_radeonsi or with_gallium_swr 
> > > or with_gallium_opencl,
> > >)
> > >with_llvm = dep_llvm.found()
> > >  elif _llvm == 'true'
> > > @@ -1154,8 +1177,6 @@ else
> > >dep_lmsensors = []
> > >  endif
> > >  
> > > -# TODO: clover
> > > -
> > >  # TODO: gallium tests
> > >  
> > >  # TODO: various libdirs
> > > diff --git 

Re: [Mesa-dev] [PATCH 2/2] i965: Torch public intel_batchbuffer_emit_dword/float helpers.

2018-01-06 Thread Jason Ekstrand

Both are

Reviewed-by: Jason Ekstrand 

There is a part of me that has been tempted for some time to try and make 
some sort of generic batch buffer structure and share it between GL and 
Vulkan.  Getting this stuff right is hard and a good set of unified helpers 
may help.  I'm not sure how good of an idea that would be but it's a 
thought.  Also, not all that applicable to this patch, it just got me 
thinking about it again. :-)



On January 5, 2018 20:04:47 Kenneth Graunke  wrote:


intel_batchbuffer_emit_float is dead code; it should go.

intel_batchbuffer_emit_dword only had one user, which had bungled using
it by forgetting to call intel_batchbuffer_require_space first.  So it
seems wise to delete these unsafe helpers.
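
To illustrate why an emit helper that does not reserve space is easy to
misuse, here is a small self-contained toy batch buffer.  All names
(toy_batch, toy_batch_require_space, toy_batch_emit_dword) are hypothetical.
Note that it shows a different design from this patch: the patch removes the
helper and writes through map_next after an explicit require_space call,
whereas the sketch keeps a helper that reserves space itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BATCH_DWORDS 4

struct toy_batch {
   uint32_t map[TOY_BATCH_DWORDS];
   uint32_t *map_next;   /* next free dword */
   unsigned flushes;     /* how often we ran out of room */
};

static void
toy_batch_init(struct toy_batch *batch)
{
   batch->map_next = batch->map;
   batch->flushes = 0;
}

static size_t
toy_batch_used(const struct toy_batch *batch)
{
   return (size_t)(batch->map_next - batch->map);
}

/* Make room for `dwords` more entries, "flushing" (here: just resetting)
 * when they would not fit.
 */
static void
toy_batch_require_space(struct toy_batch *batch, size_t dwords)
{
   assert(dwords <= TOY_BATCH_DWORDS);
   if (toy_batch_used(batch) + dwords > TOY_BATCH_DWORDS) {
      batch->map_next = batch->map;
      batch->flushes++;
   }
}

/* Emit that reserves space itself, so callers cannot forget to. */
static void
toy_batch_emit_dword(struct toy_batch *batch, uint32_t dword)
{
   toy_batch_require_space(batch, 1);
   *batch->map_next++ = dword;
}
```

Without the require_space call inside toy_batch_emit_dword, the fifth emit
into this four-dword buffer would write past the end of map.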

Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  4 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 13 -------------
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c

index 3fd8e05d3dc..a17e1699254 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -692,9 +692,9 @@ brw_finish_batch(struct brw_context *brw)
 * necessary by emitting an extra MI_NOOP after the end.
 */
intel_batchbuffer_require_space(brw, 8, brw->batch.ring);
-   intel_batchbuffer_emit_dword(&brw->batch, MI_BATCH_BUFFER_END);
+   *brw->batch.map_next++ = MI_BATCH_BUFFER_END;
    if (USED_BATCH(brw->batch) & 1) {
-      intel_batchbuffer_emit_dword(&brw->batch, MI_NOOP);
+  *brw->batch.map_next++ = MI_NOOP;
}

brw->batch.no_wrap = false;
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h

index a927fe7e09e..a9a34600ad1 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -78,19 +78,6 @@ static inline uint32_t float_as_int(float f)
return fi.d;
 }

-static inline void
-intel_batchbuffer_emit_dword(struct intel_batchbuffer *batch, GLuint dword)
-{
-   *batch->map_next++ = dword;
-   assert(batch->ring != UNKNOWN_RING);
-}
-
-static inline void
-intel_batchbuffer_emit_float(struct intel_batchbuffer *batch, float f)
-{
-   intel_batchbuffer_emit_dword(batch, float_as_int(f));
-}
-
 static inline void
 intel_batchbuffer_begin(struct brw_context *brw, int n, enum brw_gpu_ring ring)
 {
--
2.15.1



Re: [Mesa-dev] [PATCH 00/15] RadeonSI 32-bit GPU pointers

2018-01-06 Thread Christian König

Hi Marek,

actually, I was on the verge of removing the 32-bit VM support in libdrm
because it clashes with HMM and SVM in general.


Is it possible to set the upper 32 bits of the 64-bit address to some fixed
value instead?


Regards,
Christian.

Am 06.01.2018 um 12:12 schrieb Marek Olšák:

Hi,

This series:
- increases the number of buckets in pb_cache
- adds 32-bit heaps: GTT WC, VRAM, and read-only versions of those
- adds a 32-bit VM allocator into winsys/radeon and enables 32-bit VM
   allocations in both winsyses
- moves all const_uploader allocations to 32-bit address space
- puts "amdgpu.uniform" LLVM metadata on loads instead of GEPs,
   so that InstCombine doesn't remove it
- switches shader pointers in user SGPRs to 32 bits

Dependencies:
- https://reviews.llvm.org/D41715
- https://reviews.llvm.org/D41651

This frees up to 7 user SGPRs in merged shaders, 5 user SGPRs
in vertex shaders, and 4 user SGPRs in other shaders.

Please review.

Thanks,
Marek


Re: [Mesa-dev] [PATCH v2 00/31] Nir support for Nouveau

2018-01-06 Thread Karol Herbst
On Sat, Jan 6, 2018 at 1:34 AM, Kenneth Graunke  wrote:
> On Thursday, January 4, 2018 11:56:44 AM PST Jason Ekstrand wrote:
>> On January 4, 2018 12:51:15 Karol Herbst  wrote:
>>
>> > On Thu, Jan 4, 2018 at 7:06 PM, Ilia Mirkin  wrote:
>> >> On Thu, Jan 4, 2018 at 10:01 AM, Karol Herbst  wrote:
>> >>> significant changes to last series:
>> >>> * arb_gpu_shader5 interpolateat* (those nir ops don't map well to nvir)
>> >>>   no good plan on how to properly implement those
>> >>
>> >> What's the issue? They should map as well as the TGSI ones. (Since the
>> >> TGSI ones are just the GLSL ones.)
>> >>
>> >
>> > it is a bit ugly, because usually all inputs vars are lowered away, so
>> > that they are inputs. So they need special handling;
>> >
>> > lowered (input is centroid):
>> > vec1 32 ssa_25 = intrinsic load_input (ssa_24) () (0, 0) /* base=0 */
>> > /* component=0 */ /* packed:centroid_qualified */
>> > vec1 32 ssa_27 = intrinsic load_input (ssa_26) () (0, 1) /* base=0 */
>> > /* component=1 */ /* packed:centroid_qualified */
>> >
>> > not lowered:
>> > decl_var  INTERP_MODE_NONE vec2 in@unqualified-temp
>> > vec2 32 ssa_11 = intrinsic interp_var_at_centroid () (in@unqualified-temp) 
>> > ()
>> >
>> > I kind of wished I could have a load_input intrinsic with a flag or
>> > load_input_at_centroid, so that I end up with the same code in the
>> > end.
>>
>> In i965, we use the NIR explicit input interpolation intrinsics.  I'm on my
>> phone so I can't give more details easily.
>
> Setting nir_shader_compiler_options::use_interpolated_input_intrinsics
> will eliminate the need to look at variables.  Instead, you'll get these
> intrinsics:
>
> - load_input (for flat shaded inputs)
> - load_interpolated_input (for non-flat shaded inputs)
>   - load_barycentric_pixel
>   - load_barycentric_centroid
>   - load_barycentric_sample
>   - load_barycentric_at_sample (+ sample ID source)
>   - load_barycentric_at_offset (+ offset.xy source)
>
> The load_interpolated_input intrinsic takes an extra source, which
> should always be one of the load_barycentric_* intrinsics.  That way,
> from the intrinsic, you can see exactly how to interpolate it.
>
> I highly recommend using these.  They're much nicer to work with.

Well, I already implemented looking at the variables, and that is not really
an issue. What concerns me more are the local variables used in
interp_var_at_* intrinsics, which did not get converted with
use_interpolated_input_intrinsics. I will take a deeper look; maybe some
odd condition has to be met for those to be converted as well.


[Mesa-dev] [PATCH 14/15] ac: place amdgpu.uniform on loads instead of GEPs

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/amd/common/ac_llvm_build.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 164f310..ed00d20 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -775,25 +775,28 @@ ac_build_indexed_store(struct ac_llvm_context *ctx,
  *  dynamically uniform (i.e. load to an SGPR)
  * \param invariant Whether the load is invariant (no other opcodes affect it)
  */
 static LLVMValueRef
 ac_build_load_custom(struct ac_llvm_context *ctx, LLVMValueRef base_ptr,
 LLVMValueRef index, bool uniform, bool invariant)
 {
LLVMValueRef pointer, result;
 
pointer = ac_build_gep0(ctx, base_ptr, index);
-   if (uniform)
+   /* This will be removed by InstCombine if index == 0. */
+   if (HAVE_LLVM < 0x0600 && uniform)
LLVMSetMetadata(pointer, ctx->uniform_md_kind, ctx->empty_md);
result = LLVMBuildLoad(ctx->builder, pointer, "");
if (invariant)
 LLVMSetMetadata(result, ctx->invariant_load_md_kind, ctx->empty_md);
+   if (HAVE_LLVM >= 0x0600 && uniform)
+   LLVMSetMetadata(result, ctx->uniform_md_kind, ctx->empty_md);
return result;
 }
 
 LLVMValueRef ac_build_load(struct ac_llvm_context *ctx, LLVMValueRef base_ptr,
   LLVMValueRef index)
 {
return ac_build_load_custom(ctx, base_ptr, index, false, false);
 }
 
 LLVMValueRef ac_build_load_invariant(struct ac_llvm_context *ctx,
-- 
2.7.4



[Mesa-dev] [PATCH 15/15] radeonsi: implement 32-bit pointers in user data SGPRs

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

SGPRS: 2170102 -> 2158430 (-0.54 %)
VGPRS: 1645656 -> 1641516 (-0.25 %)
Spilled SGPRs: 9078 -> 8810 (-2.95 %)
Spilled VGPRs: 130 -> 114 (-12.31 %)
Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread
Code Size: 52094872 -> 52692540 (1.15 %) bytes
---
 src/amd/common/ac_llvm_build.c|  13 +++
 src/amd/common/ac_llvm_build.h|   5 +
 src/gallium/drivers/radeonsi/si_descriptors.c |  10 +-
 src/gallium/drivers/radeonsi/si_shader.c  | 115 +-
 src/gallium/drivers/radeonsi/si_shader.h  |  23 -
 src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c |   6 +-
 6 files changed, 122 insertions(+), 50 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index ed00d20..02d1b39 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -57,20 +57,21 @@ ac_llvm_context_init(struct ac_llvm_context *ctx, 
LLVMContextRef context,
ctx->context = context;
ctx->module = NULL;
ctx->builder = NULL;
 
ctx->voidt = LLVMVoidTypeInContext(ctx->context);
ctx->i1 = LLVMInt1TypeInContext(ctx->context);
ctx->i8 = LLVMInt8TypeInContext(ctx->context);
ctx->i16 = LLVMIntTypeInContext(ctx->context, 16);
ctx->i32 = LLVMIntTypeInContext(ctx->context, 32);
ctx->i64 = LLVMIntTypeInContext(ctx->context, 64);
+   ctx->intptr = HAVE_32BIT_POINTERS ? ctx->i32 : ctx->i64;
ctx->f16 = LLVMHalfTypeInContext(ctx->context);
ctx->f32 = LLVMFloatTypeInContext(ctx->context);
ctx->f64 = LLVMDoubleTypeInContext(ctx->context);
ctx->v2i16 = LLVMVectorType(ctx->i16, 2);
ctx->v2i32 = LLVMVectorType(ctx->i32, 2);
ctx->v3i32 = LLVMVectorType(ctx->i32, 3);
ctx->v4i32 = LLVMVectorType(ctx->i32, 4);
ctx->v2f32 = LLVMVectorType(ctx->f32, 2);
ctx->v4f32 = LLVMVectorType(ctx->f32, 4);
ctx->v8i32 = LLVMVectorType(ctx->i32, 8);
@@ -128,21 +129,24 @@ unsigned
 ac_get_type_size(LLVMTypeRef type)
 {
LLVMTypeKind kind = LLVMGetTypeKind(type);
 
switch (kind) {
case LLVMIntegerTypeKind:
return LLVMGetIntTypeWidth(type) / 8;
case LLVMFloatTypeKind:
return 4;
case LLVMDoubleTypeKind:
+   return 8;
case LLVMPointerTypeKind:
+   if (LLVMGetPointerAddressSpace(type) == AC_CONST_32BIT_ADDR_SPACE)
+   return 4;
return 8;
case LLVMVectorTypeKind:
return LLVMGetVectorSize(type) *
   ac_get_type_size(LLVMGetElementType(type));
case LLVMArrayTypeKind:
return LLVMGetArrayLength(type) *
   ac_get_type_size(LLVMGetElementType(type));
default:
assert(0);
return 0;
@@ -2035,10 +2039,19 @@ LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx,
   LLVMIntEQ, src0,
   ctx->i32_0, ""),
   LLVMConstInt(ctx->i32, -1, 0), lsb, "");
 }
 
 LLVMTypeRef ac_array_in_const_addr_space(LLVMTypeRef elem_type)
 {
return LLVMPointerType(LLVMArrayType(elem_type, 0),
   AC_CONST_ADDR_SPACE);
 }
+
+LLVMTypeRef ac_array_in_const32_addr_space(LLVMTypeRef elem_type)
+{
+   if (!HAVE_32BIT_POINTERS)
+   return ac_array_in_const_addr_space(elem_type);
+
+   return LLVMPointerType(LLVMArrayType(elem_type, 0),
+  AC_CONST_32BIT_ADDR_SPACE);
+}
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index b1c4737..5235664 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -27,36 +27,40 @@
 
 #include 
 #include 
 
 #include "amd_family.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+#define HAVE_32BIT_POINTERS (HAVE_LLVM >= 0x0600)
+
 enum {
AC_CONST_ADDR_SPACE = 2, /* CONST is the only address space that selects SMEM loads */
AC_LOCAL_ADDR_SPACE = 3,
+   AC_CONST_32BIT_ADDR_SPACE = 6, /* same as CONST, but the pointer type has 32 bits */
 };
 
 struct ac_llvm_context {
LLVMContextRef context;
LLVMModuleRef module;
LLVMBuilderRef builder;
 
LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
LLVMTypeRef i16;
LLVMTypeRef i32;
LLVMTypeRef i64;
+   LLVMTypeRef intptr;
LLVMTypeRef f16;
LLVMTypeRef f32;
LLVMTypeRef f64;
LLVMTypeRef v2i16;
LLVMTypeRef v2i32;
LLVMTypeRef v3i32;
LLVMTypeRef v4i32;
LLVMTypeRef v2f32;
LLVMTypeRef v4f32;
LLVMTypeRef v8i32;
@@ -331,16 +335,17 @@ void ac_declare_lds_as_pointer(struct ac_llvm_context *ac);
 LLVMValueRef ac_lds_load(struct ac_llvm_context 

[Mesa-dev] [PATCH 13/15] ac: rename and move si_const_array into common code

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/amd/common/ac_llvm_build.c|  6 ++
 src/amd/common/ac_llvm_build.h|  3 +++
 src/amd/common/ac_nir_to_llvm.c   | 20 +++-
 src/gallium/drivers/radeonsi/si_shader.c  | 19 ++-
 src/gallium/drivers/radeonsi/si_shader_internal.h |  2 --
 src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c |  6 +++---
 6 files changed, 25 insertions(+), 31 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index a3af204..164f310 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -2026,10 +2026,16 @@ LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx,
  params, 2,
  AC_FUNC_ATTR_READNONE);
 
/* TODO: We need an intrinsic to skip this conditional. */
/* Check for zero: */
return LLVMBuildSelect(ctx->builder, LLVMBuildICmp(ctx->builder,
   LLVMIntEQ, src0,
   ctx->i32_0, ""),
   LLVMConstInt(ctx->i32, -1, 0), lsb, "");
 }
+
+LLVMTypeRef ac_array_in_const_addr_space(LLVMTypeRef elem_type)
+{
+   return LLVMPointerType(LLVMArrayType(elem_type, 0),
+  AC_CONST_ADDR_SPACE);
+}
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index 2d6efb5..b1c4737 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -329,15 +329,18 @@ void ac_init_exec_full_mask(struct ac_llvm_context *ctx);
 
 void ac_declare_lds_as_pointer(struct ac_llvm_context *ac);
 LLVMValueRef ac_lds_load(struct ac_llvm_context *ctx,
 LLVMValueRef dw_addr);
 void ac_lds_store(struct ac_llvm_context *ctx,
  LLVMValueRef dw_addr, LLVMValueRef value);
 
 LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx,
 LLVMTypeRef dst_type,
 LLVMValueRef src0);
+
+LLVMTypeRef ac_array_in_const_addr_space(LLVMTypeRef elem_type);
+
 #ifdef __cplusplus
 }
 #endif
 
 #endif
diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 0445d27..bc5b140 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -344,26 +344,20 @@ create_llvm_function(LLVMContextRef ctx, LLVMModuleRef module,
LLVMAddTargetDependentFunctionAttr(main_function,
   "no-nans-fp-math",
   "true");
LLVMAddTargetDependentFunctionAttr(main_function,
   "unsafe-fp-math",
   "true");
}
return main_function;
 }
 
-static LLVMTypeRef const_array(LLVMTypeRef elem_type, int num_elements)
-{
-   return LLVMPointerType(LLVMArrayType(elem_type, num_elements),
-  AC_CONST_ADDR_SPACE);
-}
-
 static int get_elem_bits(struct ac_llvm_context *ctx, LLVMTypeRef type)
 {
if (LLVMGetTypeKind(type) == LLVMVectorTypeKind)
type = LLVMGetElementType(type);
 
if (LLVMGetTypeKind(type) == LLVMIntegerTypeKind)
return LLVMGetIntTypeWidth(type);
 
if (type == ctx->f16)
return 16;
@@ -606,58 +600,58 @@ static void allocate_user_sgprs(struct nir_to_llvm_context *ctx,
 
 static void
 declare_global_input_sgprs(struct nir_to_llvm_context *ctx,
   gl_shader_stage stage,
   bool has_previous_stage,
   gl_shader_stage previous_stage,
   const struct user_sgpr_info *user_sgpr_info,
   struct arg_info *args,
   LLVMValueRef *desc_sets)
 {
-   LLVMTypeRef type = const_array(ctx->ac.i8, 1024 * 1024);
+   LLVMTypeRef type = ac_array_in_const_addr_space(ctx->ac.i8);
unsigned num_sets = ctx->options->layout ?
ctx->options->layout->num_sets : 0;
unsigned stage_mask = 1 << stage;
 
if (has_previous_stage)
stage_mask |= 1 << previous_stage;
 
/* 1 for each descriptor set */
if (!user_sgpr_info->indirect_all_descriptor_sets) {
for (unsigned i = 0; i < num_sets; ++i) {
if (ctx->options->layout->set[i].layout->shader_stages & stage_mask) {
add_array_arg(args, type,
  &ctx->descriptor_sets[i]);
}
}
} else {
-   add_array_arg(args, const_array(type, 32), desc_sets);
+   add_array_arg(args, ac_array_in_const_addr_space(type), 

[Mesa-dev] [PATCH 10/15] radeonsi: disallow constant buffers with a 64-bit address in slot 0

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

State trackers must use a user buffer or const_uploader,
or set pipe_resource::flags to the same value as const_uploader->flags.
---
 src/gallium/drivers/radeonsi/si_descriptors.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
index 17115e1..b372090 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -1207,20 +1207,26 @@ void si_set_rw_buffer(struct si_context *sctx,
 
 static void si_pipe_set_constant_buffer(struct pipe_context *ctx,
enum pipe_shader_type shader, uint slot,
const struct pipe_constant_buffer *input)
 {
struct si_context *sctx = (struct si_context *)ctx;
 
if (shader >= SI_NUM_SHADERS)
return;
 
+   if (slot == 0 && input && input->buffer &&
+   !(r600_resource(input->buffer)->flags & RADEON_FLAG_32BIT)) {
+   assert(!"constant buffer 0 must have a 32-bit VM address, use const_uploader");
+   return;
+   }
+
slot = si_get_constbuf_slot(slot);
si_set_constant_buffer(sctx, &sctx->const_and_shader_buffers[shader],
   si_const_and_shader_buffer_descriptors_idx(shader),
   slot, input);
 }
 
 void si_get_pipe_constant_buffer(struct si_context *sctx, uint shader,
 uint slot, struct pipe_constant_buffer *cbuf)
 {
cbuf->user_buffer = NULL;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
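
For anyone skimming the series, the slot 0 rule added in patch 10/15 can be read as a standalone predicate. What follows is only a minimal sketch, not radeonsi code: sketch_resource, SKETCH_FLAG_32BIT and sketch_constbuf_allowed are hypothetical stand-ins for r600_resource, RADEON_FLAG_32BIT and the check inside si_pipe_set_constant_buffer, and the flag's bit value is arbitrary. The sketch also folds "no input" and "input without a buffer" (a user buffer) into a single NULL case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RADEON_FLAG_32BIT; the bit value is arbitrary. */
#define SKETCH_FLAG_32BIT (1u << 0)

/* Hypothetical stand-in for the driver's resource type. */
struct sketch_resource {
    unsigned flags;
};

/* Mirrors the condition in the patch: only slot 0 is restricted, and only
 * when a real buffer is bound. NULL models "no input" or a user buffer. */
static bool sketch_constbuf_allowed(unsigned slot,
                                    const struct sketch_resource *buffer)
{
    if (slot != 0 || buffer == NULL)
        return true;

    /* Slot 0 with a real buffer: it must live in the 32-bit VM range. */
    return (buffer->flags & SKETCH_FLAG_32BIT) != 0;
}
```

Under these assumptions a buffer without the 32-bit flag is rejected only in slot 0; every other slot, and user buffers in any slot, pass through unchanged.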


[Mesa-dev] [PATCH 05/15] gallium/radeon: set number of pb_cache buckets = number of heaps

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/radeon_winsys.h| 24 
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 27 +--
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c |  2 +-
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 27 ++-
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c |  2 +-
 5 files changed, 25 insertions(+), 57 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h
index 9f274b4..7914170 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -716,44 +716,20 @@ static inline unsigned radeon_flags_from_heap(enum radeon_heap heap)
RADEON_FLAG_32BIT;
 
 case RADEON_HEAP_VRAM:
 case RADEON_HEAP_GTT_WC:
 case RADEON_HEAP_GTT:
 default:
 return flags;
 }
 }
 
-/* The pb cache bucket is chosen to minimize pb_cache misses.
- * It must be between 0 and 3 inclusive.
- */
-static inline unsigned radeon_get_pb_cache_bucket_index(enum radeon_heap heap)
-{
-switch (heap) {
-case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
-return 0;
-case RADEON_HEAP_VRAM_READ_ONLY:
-case RADEON_HEAP_VRAM_READ_ONLY_32BIT:
-case RADEON_HEAP_VRAM_32BIT:
-case RADEON_HEAP_VRAM:
-return 1;
-case RADEON_HEAP_GTT_WC:
-case RADEON_HEAP_GTT_WC_READ_ONLY:
-case RADEON_HEAP_GTT_WC_READ_ONLY_32BIT:
-case RADEON_HEAP_GTT_WC_32BIT:
-return 2;
-case RADEON_HEAP_GTT:
-default:
-return 3;
-}
-}
-
 /* Return the heap index for winsys allocators, or -1 on failure. */
 static inline int radeon_get_heap_index(enum radeon_bo_domain domain,
 enum radeon_bo_flag flags)
 {
 /* VRAM implies WC (write combining) */
 assert(!(domain & RADEON_DOMAIN_VRAM) || flags & RADEON_FLAG_GTT_WC);
 /* NO_CPU_ACCESS implies VRAM only. */
 assert(!(flags & RADEON_FLAG_NO_CPU_ACCESS) || domain == RADEON_DOMAIN_VRAM);
 
 /* Resources with interprocess sharing don't use any winsys allocators. */
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
index 92c314e..5d565ff 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -366,43 +366,44 @@ static void amdgpu_add_buffer_to_global_list(struct amdgpu_winsys_bo *bo)
   simple_mtx_lock(&ws->global_bo_list_lock);
   LIST_ADDTAIL(&bo->u.real.global_list_item, &ws->global_bo_list);
   ws->num_buffers++;
   simple_mtx_unlock(&ws->global_bo_list_lock);
}
 }
 
 static struct amdgpu_winsys_bo *amdgpu_create_bo(struct amdgpu_winsys *ws,
  uint64_t size,
  unsigned alignment,
- unsigned usage,
  enum radeon_bo_domain initial_domain,
  unsigned flags,
- unsigned pb_cache_bucket)
+ int heap)
 {
struct amdgpu_bo_alloc_request request = {0};
amdgpu_bo_handle buf_handle;
uint64_t va = 0;
struct amdgpu_winsys_bo *bo;
amdgpu_va_handle va_handle;
unsigned va_gap_size;
int r;
 
/* VRAM or GTT must be specified, but not both at the same time. */
assert(util_bitcount(initial_domain & RADEON_DOMAIN_VRAM_GTT) == 1);
 
bo = CALLOC_STRUCT(amdgpu_winsys_bo);
if (!bo) {
   return NULL;
}
 
-   pb_cache_init_entry(&ws->bo_cache, &bo->u.real.cache_entry, &bo->base,
-   pb_cache_bucket);
+   if (heap >= 0) {
+  pb_cache_init_entry(&ws->bo_cache, &bo->u.real.cache_entry, &bo->base,
+  heap);
+   }
request.alloc_size = size;
request.phys_alignment = alignment;
 
if (initial_domain & RADEON_DOMAIN_VRAM)
   request.preferred_heap |= AMDGPU_GEM_DOMAIN_VRAM;
if (initial_domain & RADEON_DOMAIN_GTT)
   request.preferred_heap |= AMDGPU_GEM_DOMAIN_GTT;
 
/* If VRAM is just stolen system memory, allow both VRAM and
 * GTT, whichever has free space. If a buffer is evicted from
@@ -446,21 +447,21 @@ static struct amdgpu_winsys_bo *amdgpu_create_bo(struct amdgpu_winsys *ws,
if (!(flags & RADEON_FLAG_READ_ONLY))
vm_flags |= AMDGPU_VM_PAGE_WRITEABLE;
 
r = amdgpu_bo_va_op_raw(ws->dev, buf_handle, 0, size, va, vm_flags,
   AMDGPU_VA_OP_MAP);
if (r)
   goto error_va_map;
 
pipe_reference_init(&bo->base.reference, 1);
bo->base.alignment = alignment;
-   bo->base.usage = usage;
+   bo->base.usage = 0;
bo->base.size = size;
bo->base.vtbl = &amdgpu_winsys_bo_vtbl;
bo->ws = ws;
bo->bo = buf_handle;
bo->va = va;
bo->u.real.va_handle = va_handle;
bo->initial_domain 

[Mesa-dev] [PATCH 06/15] winsys/amdgpu: enable 32-bit VM allocations

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
index 5d565ff..8ce131c 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -430,21 +430,22 @@ static struct amdgpu_winsys_bo *amdgpu_create_bo(struct amdgpu_winsys *ws,
   fprintf(stderr, "amdgpu:size  : %"PRIu64" bytes\n", size);
   fprintf(stderr, "amdgpu:alignment : %u bytes\n", alignment);
   fprintf(stderr, "amdgpu:domains   : %u\n", initial_domain);
   goto error_bo_alloc;
}
 
va_gap_size = ws->check_vm ? MAX2(4 * alignment, 64 * 1024) : 0;
if (size > ws->info.pte_fragment_size)
   alignment = MAX2(alignment, ws->info.pte_fragment_size);
r = amdgpu_va_range_alloc(ws->dev, amdgpu_gpu_va_range_general,
- size + va_gap_size, alignment, 0, &va, &va_handle, 0);
+ size + va_gap_size, alignment, 0, &va, &va_handle,
+ flags & RADEON_FLAG_32BIT ? AMDGPU_VA_RANGE_32_BIT : 0);
if (r)
   goto error_va_alloc;
 
unsigned vm_flags = AMDGPU_VM_PAGE_READABLE |
AMDGPU_VM_PAGE_EXECUTABLE;
 
if (!(flags & RADEON_FLAG_READ_ONLY))
vm_flags |= AMDGPU_VM_PAGE_WRITEABLE;
 
r = amdgpu_bo_va_op_raw(ws->dev, buf_handle, 0, size, va, vm_flags,
-- 
2.7.4



[Mesa-dev] [PATCH 11/15] ac: don't use byval LLVM qualifier in shaders

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

shader-db doesn't show any regression and 32-bit pointers with byval
are declared as VGPRs for some reason.
---
 src/amd/common/ac_llvm_helper.cpp   |  3 +--
 src/amd/common/ac_llvm_util.c   |  2 --
 src/amd/common/ac_llvm_util.h   |  1 -
 src/amd/common/ac_nir_to_llvm.c |  6 ++
 src/gallium/auxiliary/gallivm/lp_bld_intr.c |  2 --
 src/gallium/auxiliary/gallivm/lp_bld_intr.h |  1 -
 src/gallium/drivers/radeonsi/si_shader.c| 17 +
 7 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/src/amd/common/ac_llvm_helper.cpp b/src/amd/common/ac_llvm_helper.cpp
index 4db7036..54562cc 100644
--- a/src/amd/common/ac_llvm_helper.cpp
+++ b/src/amd/common/ac_llvm_helper.cpp
@@ -52,22 +52,21 @@ void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes)
 #else
A->addAttr(llvm::Attribute::getWithDereferenceableBytes(A->getContext(), bytes));
 #endif
 }
 
 bool ac_is_sgpr_param(LLVMValueRef arg)
 {
llvm::Argument *A = llvm::unwrap<llvm::Argument>(arg);
llvm::AttributeList AS = A->getParent()->getAttributes();
unsigned ArgNo = A->getArgNo();
-   return AS.hasAttribute(ArgNo + 1, llvm::Attribute::ByVal) ||
-  AS.hasAttribute(ArgNo + 1, llvm::Attribute::InReg);
+   return AS.hasAttribute(ArgNo + 1, llvm::Attribute::InReg);
 }
 
 LLVMValueRef ac_llvm_get_called_value(LLVMValueRef call)
 {
 #if HAVE_LLVM >= 0x0309
return LLVMGetCalledValue(call);
 #else
return 
llvm::wrap(llvm::CallSite(llvm::unwrap(call)).getCalledValue());
 #endif
 }
diff --git a/src/amd/common/ac_llvm_util.c b/src/amd/common/ac_llvm_util.c
index 429904c..5fd785a 100644
--- a/src/amd/common/ac_llvm_util.c
+++ b/src/amd/common/ac_llvm_util.c
@@ -145,39 +145,37 @@ LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family, enum ac
 
return tm;
 }
 
 
 #if HAVE_LLVM < 0x0400
 static LLVMAttribute ac_attr_to_llvm_attr(enum ac_func_attr attr)
 {
switch (attr) {
case AC_FUNC_ATTR_ALWAYSINLINE: return LLVMAlwaysInlineAttribute;
-   case AC_FUNC_ATTR_BYVAL: return LLVMByValAttribute;
case AC_FUNC_ATTR_INREG: return LLVMInRegAttribute;
case AC_FUNC_ATTR_NOALIAS: return LLVMNoAliasAttribute;
case AC_FUNC_ATTR_NOUNWIND: return LLVMNoUnwindAttribute;
case AC_FUNC_ATTR_READNONE: return LLVMReadNoneAttribute;
case AC_FUNC_ATTR_READONLY: return LLVMReadOnlyAttribute;
default:
   fprintf(stderr, "Unhandled function attribute: %x\n", attr);
   return 0;
}
 }
 
 #else
 
 static const char *attr_to_str(enum ac_func_attr attr)
 {
switch (attr) {
case AC_FUNC_ATTR_ALWAYSINLINE: return "alwaysinline";
-   case AC_FUNC_ATTR_BYVAL: return "byval";
case AC_FUNC_ATTR_INREG: return "inreg";
case AC_FUNC_ATTR_NOALIAS: return "noalias";
case AC_FUNC_ATTR_NOUNWIND: return "nounwind";
case AC_FUNC_ATTR_READNONE: return "readnone";
case AC_FUNC_ATTR_READONLY: return "readonly";
case AC_FUNC_ATTR_WRITEONLY: return "writeonly";
case AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY: return "inaccessiblememonly";
case AC_FUNC_ATTR_CONVERGENT: return "convergent";
default:
   fprintf(stderr, "Unhandled function attribute: %x\n", attr);
diff --git a/src/amd/common/ac_llvm_util.h b/src/amd/common/ac_llvm_util.h
index 7c8b6b0..26b0959 100644
--- a/src/amd/common/ac_llvm_util.h
+++ b/src/amd/common/ac_llvm_util.h
@@ -30,21 +30,20 @@
 #include 
 
 #include "amd_family.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
 enum ac_func_attr {
AC_FUNC_ATTR_ALWAYSINLINE = (1 << 0),
-   AC_FUNC_ATTR_BYVAL= (1 << 1),
AC_FUNC_ATTR_INREG= (1 << 2),
AC_FUNC_ATTR_NOALIAS  = (1 << 3),
AC_FUNC_ATTR_NOUNWIND = (1 << 4),
AC_FUNC_ATTR_READNONE = (1 << 5),
AC_FUNC_ATTR_READONLY = (1 << 6),
AC_FUNC_ATTR_WRITEONLY= HAVE_LLVM >= 0x0400 ? (1 << 7) : 0,
AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY = HAVE_LLVM >= 0x0400 ? (1 << 8) : 0,
AC_FUNC_ATTR_CONVERGENT = HAVE_LLVM >= 0x0400 ? (1 << 9) : 0,
 
/* Legacy intrinsic that needs attributes on function declarations
diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 48e2920..187fdfb 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -316,28 +316,26 @@ create_llvm_function(LLVMContextRef ctx, LLVMModuleRef module,
main_function_type =
LLVMFunctionType(ret_type, args->types, args->count, 0);
LLVMValueRef main_function =
LLVMAddFunction(module, "main", main_function_type);
main_function_body =
LLVMAppendBasicBlockInContext(ctx, main_function, "main_body");
LLVMPositionBuilderAtEnd(builder, main_function_body);
 
LLVMSetFunctionCallConv(main_function, RADEON_LLVM_AMDGPU_CS);
for (unsigned i = 0; i < args->sgpr_count; ++i) {
+   

[Mesa-dev] [PATCH 12/15] ac: move address space definitions to common code

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/amd/common/ac_llvm_build.h   |  1 +
 src/amd/common/ac_nir_to_llvm.c  |  9 +++--
 src/gallium/drivers/radeonsi/si_shader.c | 11 +++
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index 5d39458..2d6efb5 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -28,20 +28,21 @@
 #include 
 #include 
 
 #include "amd_family.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
 enum {
+   AC_CONST_ADDR_SPACE = 2, /* CONST is the only address space that selects SMEM loads */
AC_LOCAL_ADDR_SPACE = 3,
 };
 
 struct ac_llvm_context {
LLVMContextRef context;
LLVMModuleRef module;
LLVMBuilderRef builder;
 
LLVMTypeRef voidt;
LLVMTypeRef i1;
diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 187fdfb..0445d27 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -36,23 +36,20 @@
 #include "ac_exp_param.h"
 
 enum radeon_llvm_calling_convention {
RADEON_LLVM_AMDGPU_VS = 87,
RADEON_LLVM_AMDGPU_GS = 88,
RADEON_LLVM_AMDGPU_PS = 89,
RADEON_LLVM_AMDGPU_CS = 90,
RADEON_LLVM_AMDGPU_HS = 93,
 };
 
-#define CONST_ADDR_SPACE 2
-#define LOCAL_ADDR_SPACE 3
-
 #define RADEON_LLVM_MAX_INPUTS (VARYING_SLOT_VAR31 + 1)
 #define RADEON_LLVM_MAX_OUTPUTS (VARYING_SLOT_VAR31 + 1)
 
 struct nir_to_llvm_context;
 
 struct ac_nir_context {
struct ac_llvm_context ac;
struct ac_shader_abi *abi;
 
gl_shader_stage stage;
@@ -350,21 +347,21 @@ create_llvm_function(LLVMContextRef ctx, LLVMModuleRef module,
LLVMAddTargetDependentFunctionAttr(main_function,
   "unsafe-fp-math",
   "true");
}
return main_function;
 }
 
 static LLVMTypeRef const_array(LLVMTypeRef elem_type, int num_elements)
 {
return LLVMPointerType(LLVMArrayType(elem_type, num_elements),
-  CONST_ADDR_SPACE);
+  AC_CONST_ADDR_SPACE);
 }
 
 static int get_elem_bits(struct ac_llvm_context *ctx, LLVMTypeRef type)
 {
if (LLVMGetTypeKind(type) == LLVMVectorTypeKind)
type = LLVMGetElementType(type);
 
if (LLVMGetTypeKind(type) == LLVMIntegerTypeKind)
return LLVMGetIntTypeWidth(type);
 
@@ -1036,21 +1033,21 @@ static void create_function(struct nir_to_llvm_context *ctx,
 
assign_arguments(ctx->main_function, &args);
 
user_sgpr_idx = 0;
 
if (ctx->options->supports_spill || user_sgpr_info.need_ring_offsets) {
set_loc_shader(ctx, AC_UD_SCRATCH_RING_OFFSETS,
   &user_sgpr_idx, 2);
if (ctx->options->supports_spill) {
ctx->ring_offsets = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.implicit.buffer.ptr",
-  LLVMPointerType(ctx->ac.i8, CONST_ADDR_SPACE),
+  LLVMPointerType(ctx->ac.i8, AC_CONST_ADDR_SPACE),
   NULL, 0, AC_FUNC_ATTR_READNONE);
ctx->ring_offsets = LLVMBuildBitCast(ctx->builder, ctx->ring_offsets,
 const_array(ctx->ac.v4i32, 16), "");
}
}

/* For merged shaders the user SGPRs start at 8, with 8 system SGPRs in front (including
 * the rw_buffers at s0/s1. With user SGPR0 = s8, lets restart the count from 0 */
if (has_previous_stage)
user_sgpr_idx = 0;
@@ -5564,21 +5561,21 @@ setup_locals(struct ac_nir_context *ctx,
 
 static void
 setup_shared(struct ac_nir_context *ctx,
 struct nir_shader *nir)
 {
nir_foreach_variable(variable, &nir->shared) {
LLVMValueRef shared =
LLVMAddGlobalInAddressSpace(
   ctx->ac.module, glsl_to_llvm_type(ctx->nctx, variable->type),
   variable->name ? variable->name : "",
-  LOCAL_ADDR_SPACE);
+  AC_LOCAL_ADDR_SPACE);
_mesa_hash_table_insert(ctx->vars, variable, shared);
}
 }
 
 static LLVMValueRef
 emit_float_saturate(struct ac_llvm_context *ctx, LLVMValueRef v, float lo, float hi)
 {
v = ac_to_float(ctx, v);
v = emit_intrin_2f_param(ctx, "llvm.maxnum", ctx->f32, v, LLVMConstReal(ctx->f32, lo));
return emit_intrin_2f_param(ctx, "llvm.minnum", ctx->f32, v, LLVMConstReal(ctx->f32, hi));
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 708da13..a1cc6e1 100644
--- 

[Mesa-dev] [PATCH 08/15] winsys/radeon: implement and enable 32-bit VM allocations

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 42 +++
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 28 ++-
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.h |  2 ++
 3 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
index bbfe5cc..06842a4 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
@@ -242,32 +242,54 @@ static uint64_t radeon_bomgr_find_va(const struct radeon_info *info,
 if ((hole->size - waste) == size) {
 hole->size = waste;
 mtx_unlock(&heap->mutex);
 return offset;
 }
 }
 
 offset = heap->start;
 waste = offset % alignment;
 waste = waste ? alignment - waste : 0;
+
+if (offset + waste + size > heap->end) {
+mtx_unlock(&heap->mutex);
+return 0;
+}
+
 if (waste) {
 n = CALLOC_STRUCT(radeon_bo_va_hole);
 n->size = waste;
 n->offset = offset;
 list_add(&n->list, &heap->holes);
 }
 offset += waste;
 heap->start += size + waste;
 mtx_unlock(&heap->mutex);
 return offset;
 }
 
+static uint64_t radeon_bomgr_find_va64(struct radeon_drm_winsys *ws,
+   uint64_t size, uint64_t alignment)
+{
+uint64_t va = 0;
+
+/* Try to allocate from the 64-bit address space first.
+ * If it doesn't exist (start = 0) or if it doesn't have enough space,
+ * fall back to the 32-bit address space.
+ */
+if (ws->vm64.start)
+va = radeon_bomgr_find_va(&ws->info, &ws->vm64, size, alignment);
+if (!va)
+va = radeon_bomgr_find_va(&ws->info, &ws->vm32, size, alignment);
+return va;
+}
+
 static void radeon_bomgr_free_va(const struct radeon_info *info,
  struct radeon_vm_heap *heap,
  uint64_t va, uint64_t size)
 {
 struct radeon_bo_va_hole *hole = NULL;
 
 size = align(size, info->gart_page_size);
 
 mtx_lock(&heap->mutex);
 if ((va + size) == heap->start) {
@@ -363,21 +385,23 @@ void radeon_bo_destroy(struct pb_buffer *_buf)
 
 if (drmCommandWriteRead(rws->fd, DRM_RADEON_GEM_VA, &va,
sizeof(va)) != 0 &&
va.operation == RADEON_VA_RESULT_ERROR) {
 fprintf(stderr, "radeon: Failed to deallocate virtual address for buffer:\n");
 fprintf(stderr, "radeon:size  : %"PRIu64" bytes\n", bo->base.size);
 fprintf(stderr, "radeon:va: 0x%"PRIx64"\n", bo->va);
 }
}
 
-   radeon_bomgr_free_va(&rws->info, &rws->vm64, bo->va, bo->base.size);
+   radeon_bomgr_free_va(&rws->info,
+ bo->va < rws->vm32.end ? &rws->vm32 : &rws->vm64,
+ bo->va, bo->base.size);
 }
 
 /* Close object. */
 args.handle = bo->handle;
 drmIoctl(rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
 
 mtx_destroy(&bo->u.real.map_mutex);
 
 if (bo->initial_domain & RADEON_DOMAIN_VRAM)
 rws->allocated_vram -= align(bo->base.size, rws->info.gart_page_size);
@@ -653,22 +677,28 @@ static struct radeon_bo *radeon_create_bo(struct radeon_drm_winsys *rws,
 if (heap >= 0) {
 pb_cache_init_entry(&rws->bo_cache, &bo->u.real.cache_entry, &bo->base,
 heap);
 }
 
 if (rws->info.has_virtual_memory) {
 struct drm_radeon_gem_va va;
 unsigned va_gap_size;
 
 va_gap_size = rws->check_vm ? MAX2(4 * alignment, 64 * 1024) : 0;
-bo->va = radeon_bomgr_find_va(&rws->info, &rws->vm64,
-  size + va_gap_size, alignment);
+
+if (flags & RADEON_FLAG_32BIT) {
+bo->va = radeon_bomgr_find_va(&rws->info, &rws->vm32,
+  size + va_gap_size, alignment);
+assert(bo->va + size < rws->vm32.end);
+} else {
+bo->va = radeon_bomgr_find_va64(rws, size + va_gap_size, alignment);
+}
 
 va.handle = bo->handle;
 va.vm_id = 0;
 va.operation = RADEON_VA_MAP;
 va.flags = RADEON_VM_PAGE_READABLE |
RADEON_VM_PAGE_WRITEABLE |
RADEON_VM_PAGE_SNOOPED;
 va.offset = bo->va;
 r = drmCommandWriteRead(rws->fd, DRM_RADEON_GEM_VA, &va, sizeof(va));
 if (r && va.operation == RADEON_VA_RESULT_ERROR) {
@@ -1055,22 +1085,21 @@ static struct pb_buffer *radeon_winsys_bo_from_ptr(struct radeon_winsys *rws,
 bo->hash = __sync_fetch_and_add(&ws->next_bo_hash, 1);
 (void) mtx_init(&bo->u.real.map_mutex, mtx_plain);
 
 util_hash_table_set(ws->bo_handles, (void*)(uintptr_t)bo->handle, bo);
 
 mtx_unlock(&ws->bo_handles_mutex);
 
 if (ws->info.has_virtual_memory) {
 struct drm_radeon_gem_va va;
 
-bo->va = radeon_bomgr_find_va(&ws->info, &ws->vm64,
- 

[Mesa-dev] [PATCH 09/15] radeonsi: move const_uploader allocations to 32-bit address space

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_buffer_common.c | 3 +++
 src/gallium/drivers/radeon/r600_pipe_common.c   | 5 +++--
 src/gallium/drivers/radeon/r600_pipe_common.h   | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index aca536d..2d64eed 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -170,20 +170,23 @@ void si_init_resource_fields(struct si_screen *sscreen,
res->flags |= RADEON_FLAG_NO_SUBALLOC; /* shareable */
else
res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
 
if (sscreen->debug_flags & DBG(NO_WC))
res->flags &= ~RADEON_FLAG_GTT_WC;
 
if (res->b.b.flags & R600_RESOURCE_FLAG_READ_ONLY)
res->flags |= RADEON_FLAG_READ_ONLY;
 
+   if (res->b.b.flags & R600_RESOURCE_FLAG_32BIT)
+   res->flags |= RADEON_FLAG_32BIT;
+
/* Set expected VRAM and GART usage for the buffer. */
res->vram_usage = 0;
res->gart_usage = 0;
res->max_forced_staging_uploads = 0;
res->b.max_forced_staging_uploads = 0;
 
if (res->domains & RADEON_DOMAIN_VRAM) {
res->vram_usage = size;
 
res->max_forced_staging_uploads =
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index 9e45a9f..d46cb64 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -445,22 +445,23 @@ bool si_common_context_init(struct r600_common_context *rctx,
return false;
 
rctx->b.stream_uploader = u_upload_create(&rctx->b, 1024 * 1024,
  0, PIPE_USAGE_STREAM,
  R600_RESOURCE_FLAG_READ_ONLY);
if (!rctx->b.stream_uploader)
return false;
 
rctx->b.const_uploader = u_upload_create(&rctx->b, 128 * 1024,
 0, PIPE_USAGE_DEFAULT,
-sscreen->cpdma_prefetch_writes_memory ?
-   0 : R600_RESOURCE_FLAG_READ_ONLY);
+R600_RESOURCE_FLAG_32BIT |
+(sscreen->cpdma_prefetch_writes_memory ?
+   0 : R600_RESOURCE_FLAG_READ_ONLY));
if (!rctx->b.const_uploader)
return false;
 
rctx->cached_gtt_allocator = u_upload_create(&rctx->b, 16 * 1024,
 0, PIPE_USAGE_STAGING, 0);
if (!rctx->cached_gtt_allocator)
return false;
 
rctx->ctx = rctx->ws->ctx_create(rctx->ws);
if (!rctx->ctx)
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
index a8e632c..fcba228 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -47,20 +47,21 @@
 struct u_log_context;
 struct si_screen;
 struct si_context;
 
 #define R600_RESOURCE_FLAG_TRANSFER        (PIPE_RESOURCE_FLAG_DRV_PRIV << 0)
 #define R600_RESOURCE_FLAG_FLUSHED_DEPTH   (PIPE_RESOURCE_FLAG_DRV_PRIV << 1)
 #define R600_RESOURCE_FLAG_FORCE_TILING    (PIPE_RESOURCE_FLAG_DRV_PRIV << 2)
 #define R600_RESOURCE_FLAG_DISABLE_DCC     (PIPE_RESOURCE_FLAG_DRV_PRIV << 3)
 #define R600_RESOURCE_FLAG_UNMAPPABLE      (PIPE_RESOURCE_FLAG_DRV_PRIV << 4)
 #define R600_RESOURCE_FLAG_READ_ONLY       (PIPE_RESOURCE_FLAG_DRV_PRIV << 5)
+#define R600_RESOURCE_FLAG_32BIT           (PIPE_RESOURCE_FLAG_DRV_PRIV << 6)
 
 /* Debug flags. */
 enum {
/* Shader logging options: */
DBG_VS = PIPE_SHADER_VERTEX,
DBG_PS = PIPE_SHADER_FRAGMENT,
DBG_GS = PIPE_SHADER_GEOMETRY,
DBG_TCS = PIPE_SHADER_TESS_CTRL,
DBG_TES = PIPE_SHADER_TESS_EVAL,
DBG_CS = PIPE_SHADER_COMPUTE,
-- 
2.7.4
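
With patch 09/15 the const_uploader buffers carry the 32-bit flag, and the radeon winsys change in patch 08/15 turns that flag into an allocation from the 32-bit VA heap, while everything else prefers the 64-bit heap and falls back. The following is a simplified sketch of that selection under stated assumptions, not the winsys code: sketch_heap is a plain bump allocator (the real radeon_vm_heap also has a hole list and a mutex), and all sketch_* names and the want_32bit parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified heap: a bump pointer over [start, end). */
struct sketch_heap {
    uint64_t start;
    uint64_t end;
};

/* Returns an aligned address, or 0 when the range does not fit. */
static uint64_t sketch_find_va(struct sketch_heap *heap,
                               uint64_t size, uint64_t alignment)
{
    assert(alignment != 0);

    uint64_t waste = heap->start % alignment;
    waste = waste ? alignment - waste : 0;

    /* Same overflow check the patch adds before bumping heap->start. */
    if (heap->start + waste + size > heap->end)
        return 0;

    uint64_t va = heap->start + waste;
    heap->start = va + size;
    return va;
}

/* want_32bit plays the role of RADEON_FLAG_32BIT. */
static uint64_t sketch_alloc_va(struct sketch_heap *vm32,
                                struct sketch_heap *vm64,
                                uint64_t size, uint64_t alignment,
                                bool want_32bit)
{
    uint64_t va = 0;

    if (want_32bit)
        return sketch_find_va(vm32, size, alignment);

    /* Prefer the 64-bit range; start == 0 means it does not exist. */
    if (vm64->start)
        va = sketch_find_va(vm64, size, alignment);
    if (!va)
        va = sketch_find_va(vm32, size, alignment);
    return va;
}

/* On free, the owning heap is recovered from the address alone. */
static struct sketch_heap *sketch_heap_of(struct sketch_heap *vm32,
                                          struct sketch_heap *vm64,
                                          uint64_t va)
{
    return va < vm32->end ? vm32 : vm64;
}
```

The last helper reflects the design choice visible in radeon_bo_destroy: no per-buffer heap pointer is stored, so the free path relies on the two ranges not overlapping and on vm32.end staying fixed.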



[Mesa-dev] [PATCH 07/15] winsys/radeon: add struct radeon_vm_heap

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 63 ---
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c |  9 ++--
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.h | 11 ++--
 3 files changed, 47 insertions(+), 36 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
index 7aef238..bbfe5cc 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
@@ -191,146 +191,148 @@ static enum radeon_bo_domain radeon_bo_get_initial_domain(
 fprintf(stderr, "radeon: failed to get initial domain: %p 0x%08X\n",
 bo, bo->handle);
 /* Default domain as returned by get_valid_domain. */
 return RADEON_DOMAIN_VRAM_GTT;
 }
 
 /* GEM domains and winsys domains are defined the same. */
 return get_valid_domain(args.value);
 }
 
-static uint64_t radeon_bomgr_find_va(struct radeon_drm_winsys *rws,
+static uint64_t radeon_bomgr_find_va(const struct radeon_info *info,
+ struct radeon_vm_heap *heap,
  uint64_t size, uint64_t alignment)
 {
 struct radeon_bo_va_hole *hole, *n;
 uint64_t offset = 0, waste = 0;
 
 /* All VM address space holes will implicitly start aligned to the
  * size alignment, so we don't need to sanitize the alignment here
  */
-size = align(size, rws->info.gart_page_size);
+size = align(size, info->gart_page_size);
 
-mtx_lock(&rws->bo_va_mutex);
+mtx_lock(&heap->mutex);
 /* first look for a hole */
-LIST_FOR_EACH_ENTRY_SAFE(hole, n, &rws->va_holes, list) {
+LIST_FOR_EACH_ENTRY_SAFE(hole, n, &heap->holes, list) {
 offset = hole->offset;
 waste = offset % alignment;
 waste = waste ? alignment - waste : 0;
 offset += waste;
 if (offset >= (hole->offset + hole->size)) {
 continue;
 }
 if (!waste && hole->size == size) {
 offset = hole->offset;
 list_del(&hole->list);
 FREE(hole);
-mtx_unlock(&rws->bo_va_mutex);
+mtx_unlock(&heap->mutex);
 return offset;
 }
 if ((hole->size - waste) > size) {
 if (waste) {
 n = CALLOC_STRUCT(radeon_bo_va_hole);
 n->size = waste;
 n->offset = hole->offset;
 list_add(&n->list, &hole->list);
 }
 hole->size -= (size + waste);
 hole->offset += size + waste;
-mtx_unlock(&rws->bo_va_mutex);
+mtx_unlock(&heap->mutex);
 return offset;
 }
 if ((hole->size - waste) == size) {
 hole->size = waste;
-mtx_unlock(&rws->bo_va_mutex);
+mtx_unlock(&heap->mutex);
 return offset;
 }
 }
 
-offset = rws->va_offset;
+offset = heap->start;
 waste = offset % alignment;
 waste = waste ? alignment - waste : 0;
 if (waste) {
 n = CALLOC_STRUCT(radeon_bo_va_hole);
 n->size = waste;
 n->offset = offset;
-list_add(&n->list, &rws->va_holes);
+list_add(&n->list, &heap->holes);
 }
 offset += waste;
-rws->va_offset += size + waste;
-mtx_unlock(&rws->bo_va_mutex);
+heap->start += size + waste;
+mtx_unlock(&heap->mutex);
 return offset;
 }
 
-static void radeon_bomgr_free_va(struct radeon_drm_winsys *rws,
+static void radeon_bomgr_free_va(const struct radeon_info *info,
+ struct radeon_vm_heap *heap,
  uint64_t va, uint64_t size)
 {
 struct radeon_bo_va_hole *hole = NULL;
 
-size = align(size, rws->info.gart_page_size);
+size = align(size, info->gart_page_size);
 
-mtx_lock(>bo_va_mutex);
-if ((va + size) == rws->va_offset) {
-rws->va_offset = va;
+mtx_lock(>mutex);
+if ((va + size) == heap->start) {
+heap->start = va;
 /* Delete uppermost hole if it reaches the new top */
-if (!LIST_IS_EMPTY(&rws->va_holes)) {
-hole = container_of(rws->va_holes.next, hole, list);
+if (!LIST_IS_EMPTY(&heap->holes)) {
+hole = container_of(heap->holes.next, hole, list);
 if ((hole->offset + hole->size) == va) {
-rws->va_offset = hole->offset;
+heap->start = hole->offset;
 list_del(&hole->list);
 FREE(hole);
 }
 }
 } else {
 struct radeon_bo_va_hole *next;
 
-hole = container_of(&rws->va_holes, hole, list);
-LIST_FOR_EACH_ENTRY(next, &rws->va_holes, list) {
+hole = container_of(&heap->holes, hole, list);
+LIST_FOR_EACH_ENTRY(next, &heap->holes, list) {
if (next->offset < va)
break;
 hole = next;
 }
 
-if (&hole->list != &rws->va_holes) {
+if (&hole->list != &heap->holes) {
 /* Grow upper hole if it's adjacent */
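
A note for readers following the hole search in the hunk above: each hole is tested by first computing how many bytes must be skipped to reach the requested alignment ("waste"), then checking whether the remainder of the hole still covers the request. The sketch below isolates just that arithmetic. The helper names va_align_waste and va_hole_fits are hypothetical and do not appear in the patch; the list handling, locking and hole splitting are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): bytes to skip so that `offset`
 * becomes a multiple of `alignment`. `alignment` must be non-zero. */
static uint64_t va_align_waste(uint64_t offset, uint64_t alignment)
{
    uint64_t waste = offset % alignment;
    return waste ? alignment - waste : 0;
}

/* Hypothetical helper (not in the patch): returns 1 and stores the aligned
 * start address in *out if `size` bytes fit into the hole, 0 otherwise. */
static int va_hole_fits(uint64_t hole_offset, uint64_t hole_size,
                        uint64_t size, uint64_t alignment, uint64_t *out)
{
    uint64_t waste = va_align_waste(hole_offset, alignment);

    /* The aligned start lies at or past the end of the hole. */
    if (waste >= hole_size)
        return 0;
    /* What remains after alignment is too small for the request. */
    if (hole_size - waste < size)
        return 0;

    *out = hole_offset + waste;
    return 1;
}
```

In the patch itself the "fits" case is further split into an exact match, a larger hole (which is shrunk) and a hole whose tail matches exactly, but all three rely on the same waste computation.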
  

[Mesa-dev] [PATCH 04/15] pb_cache: let drivers choose the number of buckets

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c |  2 +-
 src/gallium/auxiliary/pipebuffer/pb_cache.c| 20 
 src/gallium/auxiliary/pipebuffer/pb_cache.h|  6 --
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c  |  1 -
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c  |  3 ++-
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c  |  1 -
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c  |  3 ++-
 7 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c 
b/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
index 24831f6..4e70048 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
@@ -297,16 +297,16 @@ pb_cache_manager_create(struct pb_manager *provider,
   return NULL;

mgr = CALLOC_STRUCT(pb_cache_manager);
if (!mgr)
   return NULL;
 
mgr->base.destroy = pb_cache_manager_destroy;
mgr->base.create_buffer = pb_cache_manager_create_buffer;
mgr->base.flush = pb_cache_manager_flush;
mgr->provider = provider;
-   pb_cache_init(&mgr->cache, usecs, size_factor, bypass_usage,
+   pb_cache_init(&mgr->cache, 1, usecs, size_factor, bypass_usage,
  maximum_cache_size,
  _pb_cache_buffer_destroy,
  pb_cache_can_reclaim_buffer);
return &mgr->base;
 }
diff --git a/src/gallium/auxiliary/pipebuffer/pb_cache.c 
b/src/gallium/auxiliary/pipebuffer/pb_cache.c
index dd479ae..af899a2 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_cache.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_cache.c
@@ -85,21 +85,21 @@ pb_cache_add_buffer(struct pb_cache_entry *entry)
struct pb_cache *mgr = entry->mgr;
struct list_head *cache = &mgr->buckets[entry->bucket_index];
struct pb_buffer *buf = entry->buffer;
unsigned i;
 
mtx_lock(&mgr->mutex);
assert(!pipe_is_referenced(&buf->reference));
 
int64_t current_time = os_time_get();
 
-   for (i = 0; i < ARRAY_SIZE(mgr->buckets); i++)
+   for (i = 0; i < mgr->num_heaps; i++)
   release_expired_buffers_locked(&mgr->buckets[i], current_time);
 
/* Directly release any buffer that exceeds the limit. */
if (mgr->cache_size + buf->size > mgr->max_cache_size) {
   mgr->destroy_buffer(buf);
   mtx_unlock(&mgr->mutex);
   return;
}
 
entry->start = os_time_get();
@@ -146,20 +146,22 @@ pb_cache_is_buffer_compat(struct pb_cache_entry *entry,
 struct pb_buffer *
 pb_cache_reclaim_buffer(struct pb_cache *mgr, pb_size size,
 unsigned alignment, unsigned usage,
 unsigned bucket_index)
 {
struct pb_cache_entry *entry;
struct pb_cache_entry *cur_entry;
struct list_head *cur, *next;
int64_t now;
int ret = 0;
+
+   assert(bucket_index < mgr->num_heaps);
struct list_head *cache = &mgr->buckets[bucket_index];
 
mtx_lock(&mgr->mutex);
 
entry = NULL;
cur = cache->next;
next = cur->next;
 
/* search in the expired buffers, freeing them in the process */
now = os_time_get();
@@ -222,39 +224,41 @@ pb_cache_reclaim_buffer(struct pb_cache *mgr, pb_size 
size,
  * Empty the cache. Useful when there is not enough memory.
  */
 void
 pb_cache_release_all_buffers(struct pb_cache *mgr)
 {
struct list_head *curr, *next;
struct pb_cache_entry *buf;
unsigned i;
 
mtx_lock(&mgr->mutex);
-   for (i = 0; i < ARRAY_SIZE(mgr->buckets); i++) {
+   for (i = 0; i < mgr->num_heaps; i++) {
   struct list_head *cache = &mgr->buckets[i];
 
   curr = cache->next;
   next = curr->next;
   while (curr != cache) {
  buf = LIST_ENTRY(struct pb_cache_entry, curr, head);
  destroy_buffer_locked(buf);
  curr = next;
  next = curr->next;
   }
}
mtx_unlock(&mgr->mutex);
 }
 
 void
 pb_cache_init_entry(struct pb_cache *mgr, struct pb_cache_entry *entry,
 struct pb_buffer *buf, unsigned bucket_index)
 {
+   assert(bucket_index < mgr->num_heaps);
+
memset(entry, 0, sizeof(*entry));
entry->buffer = buf;
entry->mgr = mgr;
entry->bucket_index = bucket_index;
 }
 
 /**
  * Initialize a caching buffer manager.
  *
  * @param mgr The cache buffer manager
@@ -263,40 +267,48 @@ pb_cache_init_entry(struct pb_cache *mgr, struct 
pb_cache_entry *entry,
  * @param size_factor  Declare buffers that are size_factor times bigger than
  * the requested size as cache hits.
  * @param bypass_usage  Bitmask. If (requested usage & bypass_usage) != 0,
  *  buffer allocation requests are rejected.
  * @param maximum_cache_size  Maximum size of all unused buffers the cache can
  *hold.
  * @param destroy_buffer  Function that destroys a buffer for good.
  * @param can_reclaim Whether a buffer can be reclaimed (e.g. is not busy)
  */
 void
-pb_cache_init(struct pb_cache *mgr, uint usecs, float size_factor,

[Mesa-dev] [PATCH 01/15] gallium/radeon: simplify radeon_flags_from_heap

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/radeon_winsys.h | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
b/src/gallium/drivers/radeon/radeon_winsys.h
index d1c761f..49ef83b 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -675,44 +675,38 @@ static inline enum radeon_bo_domain 
radeon_domain_from_heap(enum radeon_heap hea
 case RADEON_HEAP_GTT:
 return RADEON_DOMAIN_GTT;
 default:
 assert(0);
 return (enum radeon_bo_domain)0;
 }
 }
 
 static inline unsigned radeon_flags_from_heap(enum radeon_heap heap)
 {
+unsigned flags = RADEON_FLAG_NO_INTERPROCESS_SHARING |
+ (heap != RADEON_HEAP_GTT ? RADEON_FLAG_GTT_WC : 0);
+
 switch (heap) {
 case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
-return RADEON_FLAG_GTT_WC |
-   RADEON_FLAG_NO_CPU_ACCESS |
-   RADEON_FLAG_NO_INTERPROCESS_SHARING;
+return flags |
+   RADEON_FLAG_NO_CPU_ACCESS;
 
 case RADEON_HEAP_VRAM_READ_ONLY:
-return RADEON_FLAG_GTT_WC |
-   RADEON_FLAG_NO_INTERPROCESS_SHARING |
+case RADEON_HEAP_GTT_WC_READ_ONLY:
+return flags |
RADEON_FLAG_READ_ONLY;
 
 case RADEON_HEAP_VRAM:
 case RADEON_HEAP_GTT_WC:
-return RADEON_FLAG_GTT_WC |
-   RADEON_FLAG_NO_INTERPROCESS_SHARING;
-
-case RADEON_HEAP_GTT_WC_READ_ONLY:
-return RADEON_FLAG_GTT_WC |
-   RADEON_FLAG_NO_INTERPROCESS_SHARING |
-   RADEON_FLAG_READ_ONLY;
-
 case RADEON_HEAP_GTT:
 default:
-return RADEON_FLAG_NO_INTERPROCESS_SHARING;
+return flags;
 }
 }
 
 /* The pb cache bucket is chosen to minimize pb_cache misses.
  * It must be between 0 and 3 inclusive.
  */
 static inline unsigned radeon_get_pb_cache_bucket_index(enum radeon_heap heap)
 {
 switch (heap) {
 case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
-- 
2.7.4
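
For anyone checking that this refactor does not change behaviour, here is a minimal standalone sketch that compares the old and new flag computation for every heap. The enums are local stand-ins (the RADEON_ prefix is dropped on purpose); the bit positions are the ones shown in radeon_winsys.h in patch 02 of this series, and flags_before, flags_after and flags_agree are hypothetical names used only here.

```c
#include <assert.h>

/* Local stand-ins for enum radeon_bo_flag. */
enum {
    FLAG_GTT_WC                  = 1 << 0,
    FLAG_NO_CPU_ACCESS           = 1 << 1,
    FLAG_NO_INTERPROCESS_SHARING = 1 << 4,
    FLAG_READ_ONLY               = 1 << 5,
};

/* Local stand-ins for enum radeon_heap as it exists before patch 02. */
enum heap {
    HEAP_VRAM_NO_CPU_ACCESS,
    HEAP_VRAM_READ_ONLY,
    HEAP_VRAM,
    HEAP_GTT_WC,
    HEAP_GTT_WC_READ_ONLY,
    HEAP_GTT,
    HEAP_COUNT,
};

/* Flag computation as written before this patch. */
static unsigned flags_before(enum heap heap)
{
    switch (heap) {
    case HEAP_VRAM_NO_CPU_ACCESS:
        return FLAG_GTT_WC | FLAG_NO_CPU_ACCESS | FLAG_NO_INTERPROCESS_SHARING;
    case HEAP_VRAM_READ_ONLY:
    case HEAP_GTT_WC_READ_ONLY:
        return FLAG_GTT_WC | FLAG_NO_INTERPROCESS_SHARING | FLAG_READ_ONLY;
    case HEAP_VRAM:
    case HEAP_GTT_WC:
        return FLAG_GTT_WC | FLAG_NO_INTERPROCESS_SHARING;
    case HEAP_GTT:
    default:
        return FLAG_NO_INTERPROCESS_SHARING;
    }
}

/* Flag computation after this patch: common bits are hoisted out. */
static unsigned flags_after(enum heap heap)
{
    unsigned flags = FLAG_NO_INTERPROCESS_SHARING |
                     (heap != HEAP_GTT ? FLAG_GTT_WC : 0);

    switch (heap) {
    case HEAP_VRAM_NO_CPU_ACCESS:
        return flags | FLAG_NO_CPU_ACCESS;
    case HEAP_VRAM_READ_ONLY:
    case HEAP_GTT_WC_READ_ONLY:
        return flags | FLAG_READ_ONLY;
    default:
        return flags;
    }
}

/* Returns 1 if both versions agree for every heap, 0 otherwise. */
static int flags_agree(void)
{
    for (int h = 0; h < HEAP_COUNT; h++) {
        if (flags_before((enum heap)h) != flags_after((enum heap)h))
            return 0;
    }
    return 1;
}
```

The only heap that does not get the GTT_WC bit is plain GTT, which is why a single conditional on the shared prefix is enough.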

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/15] pb_cache: call os_time_get outside of the loop

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/auxiliary/pipebuffer/pb_cache.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/pipebuffer/pb_cache.c 
b/src/gallium/auxiliary/pipebuffer/pb_cache.c
index b67e54b..dd479ae 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_cache.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_cache.c
@@ -47,34 +47,32 @@ destroy_buffer_locked(struct pb_cache_entry *entry)
   --mgr->num_buffers;
   mgr->cache_size -= buf->size;
}
mgr->destroy_buffer(buf);
 }
 
 /**
  * Free as many cache buffers from the list head as possible.
  */
 static void
-release_expired_buffers_locked(struct list_head *cache)
+release_expired_buffers_locked(struct list_head *cache,
+   int64_t current_time)
 {
struct list_head *curr, *next;
struct pb_cache_entry *entry;
-   int64_t now;
-
-   now = os_time_get();
 
curr = cache->next;
next = curr->next;
while (curr != cache) {
   entry = LIST_ENTRY(struct pb_cache_entry, curr, head);
 
-  if (!os_time_timeout(entry->start, entry->end, now))
+  if (!os_time_timeout(entry->start, entry->end, current_time))
  break;
 
   destroy_buffer_locked(entry);
 
   curr = next;
   next = curr->next;
}
 }
 
 /**
@@ -85,22 +83,24 @@ void
 pb_cache_add_buffer(struct pb_cache_entry *entry)
 {
struct pb_cache *mgr = entry->mgr;
struct list_head *cache = &mgr->buckets[entry->bucket_index];
struct pb_buffer *buf = entry->buffer;
unsigned i;
 
mtx_lock(&mgr->mutex);
assert(!pipe_is_referenced(&buf->reference));
 
+   int64_t current_time = os_time_get();
+
for (i = 0; i < ARRAY_SIZE(mgr->buckets); i++)
-  release_expired_buffers_locked(&mgr->buckets[i]);
+  release_expired_buffers_locked(&mgr->buckets[i], current_time);
 
/* Directly release any buffer that exceeds the limit. */
if (mgr->cache_size + buf->size > mgr->max_cache_size) {
   mgr->destroy_buffer(buf);
   mtx_unlock(&mgr->mutex);
   return;
}
 
entry->start = os_time_get();
entry->end = entry->start + mgr->usecs;
-- 
2.7.4
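
To illustrate the point of the change for readers skimming the thread: the clock is read once by the caller and the same timestamp is handed to every per-bucket sweep, instead of each sweep reading the clock itself. The sketch below is a simplified model, not the pb_cache code. fake_time_get stands in for os_time_get, the expiry test is reduced to a plain comparison rather than os_time_timeout, and arrays replace the linked lists; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Counts clock reads so the effect of hoisting is observable. */
static int clock_reads;

/* Stand-in for os_time_get(); always reports the same time. */
static int64_t fake_time_get(void)
{
    clock_reads++;
    return 1000;
}

struct entry {
    int64_t end; /* expiry time of the cached buffer */
};

/* Counts expired entries from the head of one bucket, stopping at the
 * first entry that is still alive (entries are assumed time-ordered). */
static unsigned count_expired(const struct entry *e, unsigned n, int64_t now)
{
    unsigned i;

    for (i = 0; i < n && e[i].end <= now; i++) {
    }
    return i;
}

/* Sweeps all buckets with a single clock read taken up front. */
static unsigned sweep_all(const struct entry buckets[][2], unsigned num_buckets)
{
    int64_t now = fake_time_get();
    unsigned total = 0;

    for (unsigned b = 0; b < num_buckets; b++)
        total += count_expired(buckets[b], 2, now);
    return total;
}
```

With N buckets this does one clock read per sweep instead of N, and every bucket is judged against the same timestamp.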



[Mesa-dev] [PATCH 02/15] gallium/radeon: add 32-bit address space heaps

2018-01-06 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/radeon_winsys.h | 51 --
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
b/src/gallium/drivers/radeon/radeon_winsys.h
index 49ef83b..9f274b4 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -46,20 +46,21 @@ enum radeon_bo_domain { /* bitfield */
 RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT
 };
 
 enum radeon_bo_flag { /* bitfield */
 RADEON_FLAG_GTT_WC =(1 << 0),
 RADEON_FLAG_NO_CPU_ACCESS = (1 << 1),
 RADEON_FLAG_NO_SUBALLOC =   (1 << 2),
 RADEON_FLAG_SPARSE =(1 << 3),
 RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 4),
 RADEON_FLAG_READ_ONLY = (1 << 5),
+RADEON_FLAG_32BIT =(1 << 6),
 };
 
 enum radeon_bo_usage { /* bitfield */
 RADEON_USAGE_READ = 2,
 RADEON_USAGE_WRITE = 4,
 RADEON_USAGE_READWRITE = RADEON_USAGE_READ | RADEON_USAGE_WRITE,
 
 /* The winsys ensures that the CS submission will be scheduled after
  * previously flushed CSs referencing this BO in a conflicting way.
  */
@@ -648,37 +649,45 @@ static inline void radeon_emit(struct radeon_winsys_cs 
*cs, uint32_t value)
 static inline void radeon_emit_array(struct radeon_winsys_cs *cs,
 const uint32_t *values, unsigned count)
 {
 memcpy(cs->current.buf + cs->current.cdw, values, count * 4);
 cs->current.cdw += count;
 }
 
 enum radeon_heap {
 RADEON_HEAP_VRAM_NO_CPU_ACCESS,
 RADEON_HEAP_VRAM_READ_ONLY,
+RADEON_HEAP_VRAM_READ_ONLY_32BIT,
+RADEON_HEAP_VRAM_32BIT,
 RADEON_HEAP_VRAM,
 RADEON_HEAP_GTT_WC,
 RADEON_HEAP_GTT_WC_READ_ONLY,
+RADEON_HEAP_GTT_WC_READ_ONLY_32BIT,
+RADEON_HEAP_GTT_WC_32BIT,
 RADEON_HEAP_GTT,
 RADEON_MAX_SLAB_HEAPS,
 RADEON_MAX_CACHED_HEAPS = RADEON_MAX_SLAB_HEAPS,
 };
 
 static inline enum radeon_bo_domain radeon_domain_from_heap(enum radeon_heap 
heap)
 {
 switch (heap) {
 case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
 case RADEON_HEAP_VRAM_READ_ONLY:
+case RADEON_HEAP_VRAM_READ_ONLY_32BIT:
+case RADEON_HEAP_VRAM_32BIT:
 case RADEON_HEAP_VRAM:
 return RADEON_DOMAIN_VRAM;
 case RADEON_HEAP_GTT_WC:
 case RADEON_HEAP_GTT_WC_READ_ONLY:
+case RADEON_HEAP_GTT_WC_READ_ONLY_32BIT:
+case RADEON_HEAP_GTT_WC_32BIT:
 case RADEON_HEAP_GTT:
 return RADEON_DOMAIN_GTT;
 default:
 assert(0);
 return (enum radeon_bo_domain)0;
 }
 }
 
 static inline unsigned radeon_flags_from_heap(enum radeon_heap heap)
 {
@@ -688,41 +697,56 @@ static inline unsigned radeon_flags_from_heap(enum 
radeon_heap heap)
 switch (heap) {
 case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
 return flags |
RADEON_FLAG_NO_CPU_ACCESS;
 
 case RADEON_HEAP_VRAM_READ_ONLY:
 case RADEON_HEAP_GTT_WC_READ_ONLY:
 return flags |
RADEON_FLAG_READ_ONLY;
 
+case RADEON_HEAP_VRAM_READ_ONLY_32BIT:
+case RADEON_HEAP_GTT_WC_READ_ONLY_32BIT:
+return flags |
+   RADEON_FLAG_READ_ONLY |
+   RADEON_FLAG_32BIT;
+
+case RADEON_HEAP_VRAM_32BIT:
+case RADEON_HEAP_GTT_WC_32BIT:
+return flags |
+   RADEON_FLAG_32BIT;
+
 case RADEON_HEAP_VRAM:
 case RADEON_HEAP_GTT_WC:
 case RADEON_HEAP_GTT:
 default:
 return flags;
 }
 }
 
 /* The pb cache bucket is chosen to minimize pb_cache misses.
  * It must be between 0 and 3 inclusive.
  */
 static inline unsigned radeon_get_pb_cache_bucket_index(enum radeon_heap heap)
 {
 switch (heap) {
 case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
 return 0;
 case RADEON_HEAP_VRAM_READ_ONLY:
+case RADEON_HEAP_VRAM_READ_ONLY_32BIT:
+case RADEON_HEAP_VRAM_32BIT:
 case RADEON_HEAP_VRAM:
 return 1;
 case RADEON_HEAP_GTT_WC:
 case RADEON_HEAP_GTT_WC_READ_ONLY:
+case RADEON_HEAP_GTT_WC_READ_ONLY_32BIT:
+case RADEON_HEAP_GTT_WC_32BIT:
 return 2;
 case RADEON_HEAP_GTT:
 default:
 return 3;
 }
 }
 
 /* Return the heap index for winsys allocators, or -1 on failure. */
 static inline int radeon_get_heap_index(enum radeon_bo_domain domain,
 enum radeon_bo_flag flags)
@@ -733,46 +757,67 @@ static inline int radeon_get_heap_index(enum 
radeon_bo_domain domain,
 assert(!(flags & RADEON_FLAG_NO_CPU_ACCESS) || domain == 
RADEON_DOMAIN_VRAM);
 
 /* Resources with interprocess sharing don't use any winsys allocators. */
 if (!(flags & RADEON_FLAG_NO_INTERPROCESS_SHARING))
 return -1;
 
 /* Unsupported flags: NO_SUBALLOC, SPARSE. */
 if (flags & ~(RADEON_FLAG_GTT_WC |
   RADEON_FLAG_NO_CPU_ACCESS |
   RADEON_FLAG_NO_INTERPROCESS_SHARING |
-  RADEON_FLAG_READ_ONLY))
+ 

[Mesa-dev] [PATCH 00/15] RadeonSI 32-bit GPU pointers

2018-01-06 Thread Marek Olšák
Hi,

This series:
- increases the number of buckets in pb_cache
- adds 32-bit heaps: GTT WC, VRAM, and read-only versions of those
- adds a 32-bit VM allocator into winsys/radeon and enables 32-bit VM
  allocations in both winsyses
- moves all const_uploader allocations to 32-bit address space
- puts "amdgpu.uniform" LLVM metadata on loads instead of GEPs,
  so that InstCombine doesn't remove it
- switches shader pointers in user SGPRs to 32 bits

Dependencies:
- https://reviews.llvm.org/D41715
- https://reviews.llvm.org/D41651

This frees up to 7 user SGPRs in merged shaders, 5 user SGPRs
in vertex shaders, and 4 user SGPRs in other shaders.

Please review.

Thanks,
Marek
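
A rough back-of-the-envelope for the SGPR numbers above, under the assumption that the savings come purely from pointer width: an SGPR holds 32 bits, so a 64-bit pointer occupies two user SGPRs and a 32-bit pointer occupies one. The per-stage pointer counts are not stated in this cover letter, so the figures in the sketch are only consistent with it, not taken from it. The function names are hypothetical.

```c
#include <assert.h>

/* Number of 32-bit SGPRs needed to hold `num_pointers` pointers of
 * `pointer_bits` bits each (pointer_bits is assumed to be 32 or 64). */
static unsigned sgprs_for_pointers(unsigned num_pointers, unsigned pointer_bits)
{
    return num_pointers * (pointer_bits / 32);
}

/* SGPRs freed by shrinking every pointer from 64 to 32 bits. */
static unsigned sgprs_freed(unsigned num_pointers)
{
    return sgprs_for_pointers(num_pointers, 64) -
           sgprs_for_pointers(num_pointers, 32);
}
```

Under that assumption each converted pointer frees exactly one user SGPR, so "up to 7" would correspond to up to seven pointers passed in user SGPRs.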


[Mesa-dev] [Bug 104381] swr fails to build since llvm-svn r321257

2018-01-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104381

Laurent carlier  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Laurent carlier  ---
Fixed in trunk with ad218754c79e0af61d5ba225a4b195cb55c2cac9

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH] spirv: Use correct type for sampled images

2018-01-06 Thread Alex Smith
On 6 January 2018 at 01:03, Jason Ekstrand  wrote:

> On Tue, Nov 7, 2017 at 3:08 AM, Alex Smith 
> wrote:
>
>> Thanks Jason. Can someone push this?
>>
>
> Did you never get push access?
>

I did - this is commit e9eb3c4753e4f56b03d16d8d6f71d49f1e7b97db.

Thanks,
Alex


> --Jason
>
>
>> On 6 November 2017 at 16:21, Jason Ekstrand  wrote:
>>
>>> On Mon, Nov 6, 2017 at 2:37 AM, Alex Smith 
>>> wrote:
>>>
>>>> We should use the result type of the OpSampledImage opcode, rather than
>>>> the type of the underlying image/samplers.
>>>>
>>>> This resolves an issue when using separate images and shadow samplers
>>>> with glslang. Example:
>>>>
>>>> layout (...) uniform samplerShadow s0;
>>>> layout (...) uniform texture2D res0;
>>>> ...
>>>> float result = textureLod(sampler2DShadow(res0, s0), uv, 0);
>>>>
>>>> For this, for the combined OpSampledImage, the type of the base image
>>>> was being used (which does not have the Depth flag set, whereas the
>>>> result type does), therefore it was not being recognised as a shadow
>>>> sampler. This led to the wrong LLVM intrinsics being emitted by RADV.
>>>>
>>>
>>> Reviewed-by: Jason Ekstrand 
>>>
>>>
>>>> Signed-off-by: Alex Smith
>>>> Cc: "17.2 17.3"
>>>> ---
>>>>  src/compiler/spirv/spirv_to_nir.c  | 10 --
>>>>  src/compiler/spirv/vtn_private.h   |  1 +
>>>>  src/compiler/spirv/vtn_variables.c |  1 +
>>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/src/compiler/spirv/spirv_to_nir.c
>>>> b/src/compiler/spirv/spirv_to_nir.c
>>>> index 6825e0d6a8..93a515d731 100644
>>>> --- a/src/compiler/spirv/spirv_to_nir.c
>>>> +++ b/src/compiler/spirv/spirv_to_nir.c
>>>> @@ -1490,6 +1490,8 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp
>>>> opcode,
>>>> struct vtn_value *val =
>>>>   vtn_push_value(b, w[2], vtn_value_type_sampled_image);
>>>> val->sampled_image = ralloc(b, struct vtn_sampled_image);
>>>> +  val->sampled_image->type =
>>>> + vtn_value(b, w[1], vtn_value_type_type)->type;
>>>> val->sampled_image->image =
>>>>   vtn_value(b, w[3], vtn_value_type_pointer)->pointer;
>>>> val->sampled_image->sampler =
>>>> @@ -1516,16 +1518,12 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp
>>>> opcode,
>>>> sampled = *sampled_val->sampled_image;
>>>> } else {
>>>> assert(sampled_val->value_type == vtn_value_type_pointer);
>>>> +  sampled.type = sampled_val->pointer->type;
>>>> sampled.image = NULL;
>>>> sampled.sampler = sampled_val->pointer;
>>>> }
>>>>
>>>> -   const struct glsl_type *image_type;
>>>> -   if (sampled.image) {
>>>> -  image_type = sampled.image->var->var->interface_type;
>>>> -   } else {
>>>> -  image_type = sampled.sampler->var->var->interface_type;
>>>> -   }
>>>> +   const struct glsl_type *image_type = sampled.type->type;
>>>> const enum glsl_sampler_dim sampler_dim =
>>>> glsl_get_sampler_dim(image_type);
>>>> const bool is_array = glsl_sampler_type_is_array(image_type);
>>>> const bool is_shadow = glsl_sampler_type_is_shadow(image_type);
>>>> diff --git a/src/compiler/spirv/vtn_private.h
>>>> b/src/compiler/spirv/vtn_private.h
>>>> index 84584620fc..6b4645acc8 100644
>>>> --- a/src/compiler/spirv/vtn_private.h
>>>> +++ b/src/compiler/spirv/vtn_private.h
>>>> @@ -411,6 +411,7 @@ struct vtn_image_pointer {
>>>>  };
>>>>
>>>>  struct vtn_sampled_image {
>>>> +   struct vtn_type *type;
>>>> struct vtn_pointer *image; /* Image or array of images */
>>>> struct vtn_pointer *sampler; /* Sampler */
>>>>  };
>>>> diff --git a/src/compiler/spirv/vtn_variables.c
>>>> b/src/compiler/spirv/vtn_variables.c
>>>> index 1cf9d597cf..9a69b4f6fc 100644
>>>> --- a/src/compiler/spirv/vtn_variables.c
>>>> +++ b/src/compiler/spirv/vtn_variables.c
>>>> @@ -1805,6 +1805,7 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp
>>>> opcode,
>>>>   struct vtn_value *val =
>>>>  vtn_push_value(b, w[2], vtn_value_type_sampled_image);
>>>>   val->sampled_image = ralloc(b, struct vtn_sampled_image);
>>>> + val->sampled_image->type = base_val->sampled_image->type;
>>>>   val->sampled_image->image =
>>>>  vtn_pointer_dereference(b, base_val->sampled_image->image,
>>>> chain);
>>>>   val->sampled_image->sampler = base_val->sampled_image->sampler;
>>>> --
>>>> 2.13.6


>>>
>>
>


[Mesa-dev] [PATCH] nir: fix st_nir_assign_var_locations for patch variables

2018-01-06 Thread Karol Herbst
Signed-off-by: Karol Herbst 
Reviewed-by: Kenneth Graunke 
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index 5683df..1c5de3d5de 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -139,8 +139,12 @@ st_nir_assign_var_locations(struct exec_list *var_list, 
unsigned *size,
   }
 
   bool processed = false;
-  if (var->data.patch) {
- unsigned patch_loc = var->data.location - VARYING_SLOT_VAR0;
+  if (var->data.patch &&
+  var->data.location != VARYING_SLOT_TESS_LEVEL_INNER &&
+  var->data.location != VARYING_SLOT_TESS_LEVEL_OUTER &&
+  var->data.location != VARYING_SLOT_BOUNDING_BOX0 &&
+  var->data.location != VARYING_SLOT_BOUNDING_BOX1) {
+ unsigned patch_loc = var->data.location - VARYING_SLOT_PATCH0;
  if (processed_patch_locs & (1 << patch_loc))
 processed = true;
 
-- 
2.14.3
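
As I read the patch, the old code treated every variable with data.patch set as a generic per-patch varying and computed its bit index relative to VARYING_SLOT_VAR0. The fix excludes the tessellation levels and bounding-box slots and indexes relative to VARYING_SLOT_PATCH0 instead. The sketch below models only that index computation. The slot numbers are placeholders chosen for illustration (the real values are defined in Mesa's shader enums), and is_generic_patch and mark_patch_slot are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder slot numbers for illustration only, not Mesa's real values. */
enum {
    SLOT_TESS_LEVEL_OUTER = 10,
    SLOT_TESS_LEVEL_INNER = 11,
    SLOT_BOUNDING_BOX0    = 12,
    SLOT_BOUNDING_BOX1    = 13,
    SLOT_PATCH0           = 52,
};

/* True for per-patch variables that live in the generic PATCHn range,
 * i.e. everything flagged as patch except the special built-in slots. */
static bool is_generic_patch(bool patch, int location)
{
    return patch &&
           location != SLOT_TESS_LEVEL_INNER &&
           location != SLOT_TESS_LEVEL_OUTER &&
           location != SLOT_BOUNDING_BOX0 &&
           location != SLOT_BOUNDING_BOX1;
}

/* Marks a generic patch slot in *seen_mask and reports whether it had
 * already been marked. Callers must check is_generic_patch() first so
 * that location - SLOT_PATCH0 is a small non-negative bit index. */
static bool mark_patch_slot(unsigned *seen_mask, int location)
{
    unsigned bit = 1u << (location - SLOT_PATCH0);
    bool already_seen = (*seen_mask & bit) != 0;

    *seen_mask |= bit;
    return already_seen;
}
```

Subtracting the wrong base would give a shift count outside the mask width for PATCHn locations, which is the kind of problem the guard and the new base avoid.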



Re: [Mesa-dev] [PATCH 1/5] nir: Silence unused parameter warnings

2018-01-06 Thread Alejandro Piñeiro
Series:
Reviewed-by: Alejandro Piñeiro 

On 06/01/18 06:40, Ian Romanick wrote:
> From: Ian Romanick 
>
> In file included from src/compiler/nir/nir_opt_algebraic.c:4:0:
> src/compiler/nir/nir_search_helpers.h: In function ‘is_not_const’:
> src/compiler/nir/nir_search_helpers.h:118:59: warning: unused parameter
> ‘num_components’ [-Wunused-parameter]
>  is_not_const(nir_alu_instr *instr, unsigned src, unsigned num_components,
>^~
> src/compiler/nir/nir_search_helpers.h:119:29: warning: unused parameter
> ‘swizzle’ [-Wunused-parameter]
>   const uint8_t *swizzle)
>  ^~~
>
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/nir/nir_search_helpers.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/nir/nir_search_helpers.h 
> b/src/compiler/nir/nir_search_helpers.h
> index 200f247..2e3bd13 100644
> --- a/src/compiler/nir/nir_search_helpers.h
> +++ b/src/compiler/nir/nir_search_helpers.h
> @@ -115,8 +115,8 @@ is_zero_to_one(nir_alu_instr *instr, unsigned src, 
> unsigned num_components,
>  }
>  
>  static inline bool
> -is_not_const(nir_alu_instr *instr, unsigned src, unsigned num_components,
> - const uint8_t *swizzle)
> +is_not_const(nir_alu_instr *instr, unsigned src, UNUSED unsigned 
> num_components,
> + UNUSED const uint8_t *swizzle)
>  {
> nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
>  
