[Mesa-dev] [PATCH] i965: Use the buffer object size for VERTEX_BUFFER_STATE's size field.

2016-05-26 Thread Kenneth Graunke
commit 7c8dfa78b98a12c1c5 (i965/draw: Use the real size for vertex
buffers) changed how we programmed the VERTEX_BUFFER_STATE size field.

Previously, we programmed it to the size of the actual underlying BO,
which is page-aligned, and potentially much larger than the GL buffer
object.  This violated the ARB_robust_buffer_access spec.

With that change, we started programming it based on the range of data
we expect the draw call to actually access - which is based on the
min_index and max_index information provided to glDrawRangeElements().

Unfortunately, applications often provide inaccurate range information
to glDrawRangeElements().  For example, all the Unreal demos appear to
draw using a range of [0, 3] when the index buffer's actual index range
is [0, 5].  Such results are undefined, and we are absolutely allowed
to restrict access to the range they specified.  However, the failure
mode is usually that nothing draws, or that it misrenders with wild geometry,
which is kind of bad for a common mistake.  And people tend to assume
the range information isn't that important when data is in VBOs.

There's no real advantage, either.  ARB_robust_buffer_access only
requires us to restrict access to the GL buffer object size, not
the range of data we think they should access.  Doing that allows
buggy applications to still function.  (Note that we still use this
information for busy-tracking, so if they try to overwrite the data
with glBufferSubData, they'll still hit a bug.)  This seems to be
safer.

We may want to provide the more strict range as a debug option,
or scan the VBO and warn about bogus glDrawRangeElements ranges in
debug contexts.  That can be done in a later patch, though.

Makes Unreal demos draw again.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_draw_upload.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/dri/i965/brw_draw_upload.c
index f4d1b2c..bfe20c5 100644
--- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
@@ -543,6 +543,7 @@ brw_prepare_vertices(struct brw_context *brw)
buffer->offset = offset;
buffer->stride = glarray->StrideB;
buffer->step_rate = glarray->InstanceDivisor;
+buffer->size = glarray->BufferObj->Size - offset;
 
 enabled_buffer[j] = intel_buffer;
 buffer_range_start[j] = start;
@@ -610,8 +611,6 @@ brw_prepare_vertices(struct brw_context *brw)
 
   buffer->bo = intel_bufferobj_buffer(brw, enabled_buffer[i], start, 
range);
   drm_intel_bo_reference(buffer->bo);
-
-  buffer->size = start + range;
}
 
/* If we need to upload all the arrays, then we can trim those arrays to
-- 
2.8.2
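To make the trade-off in the commit message above concrete, here is a simplified C model of the two ways of programming the size. It is a sketch under stated assumptions, not Mesa code: it assumes a single vertex array with a min_index of 0, treats the size as measured from the array's offset, and the struct and helper names (`vb_binding`, `size_from_range`, `size_from_bo`, `fetch_in_bounds`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one vertex buffer binding; these
 * names are illustrative and do not come from Mesa. */
struct vb_binding {
   size_t bo_size; /* size of the GL buffer object, in bytes */
   size_t offset;  /* byte offset of the vertex array within it */
   size_t stride;  /* bytes per vertex */
};

/* Range-based size: only the vertices the application claimed via
 * glDrawRangeElements() (assuming min_index == 0) are reachable. */
static size_t size_from_range(const struct vb_binding *vb, unsigned max_index)
{
   return ((size_t)max_index + 1) * vb->stride;
}

/* Buffer-object-based size: everything from the array's offset to the
 * end of the GL buffer object is reachable. */
static size_t size_from_bo(const struct vb_binding *vb)
{
   assert(vb->offset <= vb->bo_size);
   return vb->bo_size - vb->offset;
}

/* A fetch of vertex `idx` is in bounds if the whole vertex fits in `size`. */
static bool fetch_in_bounds(size_t size, size_t stride, unsigned idx)
{
   return ((size_t)idx + 1) * stride <= size;
}
```

With six 16-byte vertices and an application that claims [0, 3] but fetches index 5, the range-based size is 64 bytes and the fetch falls outside it, while the buffer-object-based size is 96 bytes and the fetch succeeds. Reads past the end of the GL buffer object are rejected in both models, which is the part ARB_robust_buffer_access actually requires.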

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] fix load of unpacked double vector input varyings

2016-05-26 Thread Timothy Arceri
On Thu, 2016-05-26 at 07:56 +0200, Samuel Iglesias Gonsálvez wrote:
> Hello,
> 
> Timothy found that tests with unpacked double vector input varyings
> were failing in the i965 driver. For example, this happens when
> using explicit locations because Mesa disables varying packing for
> that case.
> 
> These patches fix the following piglit test:
> 
> spec/arb_gpu_shader_fp64/execution/vs-fs-explicit-locations
> 
> Samuel Iglesias Gonsálvez (2):
>   i965/fs: fix offset when loading double vector input varyings
>   i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings
> 
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 -
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 22 +-
>  2 files changed, 33 insertions(+), 2 deletions(-)
> 

These are both:

Tested-by: Timothy Arceri 

The logic looks OK to me too, but you probably want someone
more familiar with the backend to take a quick look as well.


Re: [Mesa-dev] [PATCH 0/2] fix load of unpacked double vector input varyings

2016-05-26 Thread Timothy Arceri
On Thu, 2016-05-26 at 17:44 +1000, Timothy Arceri wrote:
> On Thu, 2016-05-26 at 07:56 +0200, Samuel Iglesias Gonsálvez wrote:
> > 
> > Hello,
> > 
> > Timothy found that tests with unpacked double vector input varyings
> > were failing in the i965 driver. For example, this happens when
> > using explicit locations because Mesa disables varying packing for
> > that case.
> > 
> > These patches fix the following piglit test:
> > 
> > spec/arb_gpu_shader_fp64/execution/vs-fs-explicit-locations
> > 
> > Samuel Iglesias Gonsálvez (2):
> >   i965/fs: fix offset when loading double vector input varyings
> >   i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings
> > 
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 -
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 22 +-
> >  2 files changed, 33 insertions(+), 2 deletions(-)
> > 
> These are both:
> 
> Tested-by: Timothy Arceri 
> 
> The logic looks OK to me too, but you probably want someone
> more familiar with the backend to take a quick look as well.

Forgot to ask: did you also check that the other stages work as expected
with explicit locations?


[Mesa-dev] [PATCH] i965: Fix the passthrough TCS for isolines.

2016-05-26 Thread Kenneth Graunke
We weren't setting up several of the uniform values for the patch
header, so we'd crash when uploading push constants.  We at least
need to initialize them to zero.  We also had the isoline parameters
reversed, so it would also render incorrectly (if it didn't crash).

Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in
GL44-CTS.tessellation_shader.single.max_patch_vertices.

(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html

Signed-off-by: Kenneth Graunke 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/brw_tcs.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c b/src/mesa/drivers/dri/i965/brw_tcs.c
index 9589fa5..9374a42 100644
--- a/src/mesa/drivers/dri/i965/brw_tcs.c
+++ b/src/mesa/drivers/dri/i965/brw_tcs.c
@@ -227,19 +227,24 @@ brw_codegen_tcs_prog(struct brw_context *brw,
*/
   const float **param = (const float **) prog_data.base.base.param;
   static float zero = 0.0f;
-  for (int i = 0; i < 4; i++) {
- param[7 - i] = &ctx->TessCtrlProgram.patch_default_outer_level[i];
-  }
+  for (int i = 0; i < 8; i++)
+ param[i] = &zero;
 
   if (key->tes_primitive_mode == GL_QUADS) {
+ for (int i = 0; i < 4; i++)
+param[7 - i] = &ctx->TessCtrlProgram.patch_default_outer_level[i];
+
  param[3] = &ctx->TessCtrlProgram.patch_default_inner_level[0];
  param[2] = &ctx->TessCtrlProgram.patch_default_inner_level[1];
- param[1] = &zero;
- param[0] = &zero;
   } else if (key->tes_primitive_mode == GL_TRIANGLES) {
+ for (int i = 0; i < 3; i++)
+param[7 - i] = &ctx->TessCtrlProgram.patch_default_outer_level[i];
+
  param[4] = &ctx->TessCtrlProgram.patch_default_inner_level[0];
- for (int i = 0; i < 4; i++)
-param[i] = &zero;
+  } else {
+ assert(key->tes_primitive_mode == GL_ISOLINES);
+ param[7] = &ctx->TessCtrlProgram.patch_default_outer_level[1];
+ param[6] = &ctx->TessCtrlProgram.patch_default_outer_level[0];
   }
}
 
-- 
2.8.2
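The slot layout this patch programs can be summarized with a small C sketch. The enum and function names are hypothetical stand-ins for key->tes_primitive_mode and the param setup in brw_codegen_tcs_prog(); only the slot assignments are taken from the diff: every slot first points at zero, quads fill the outer levels into slots 7..4 and the inner levels into 3 and 2, triangles fill 7..5 and 4, and isolines put outer level 1 in slot 7 and outer level 0 in slot 6.

```c
#include <assert.h>

/* Hypothetical stand-in for GL_QUADS / GL_TRIANGLES / GL_ISOLINES. */
enum prim_mode { PRIM_QUADS, PRIM_TRIANGLES, PRIM_ISOLINES };

static const float zero = 0.0f;

/* Fill the 8 patch-header uniform slots used by the passthrough TCS. */
static void setup_patch_header(const float *param[8], enum prim_mode mode,
                               const float outer[4], const float inner[2])
{
   /* Initialize every slot first so none is left dangling. */
   for (int i = 0; i < 8; i++)
      param[i] = &zero;

   if (mode == PRIM_QUADS) {
      for (int i = 0; i < 4; i++)
         param[7 - i] = &outer[i];
      param[3] = &inner[0];
      param[2] = &inner[1];
   } else if (mode == PRIM_TRIANGLES) {
      for (int i = 0; i < 3; i++)
         param[7 - i] = &outer[i];
      param[4] = &inner[0];
   } else {
      assert(mode == PRIM_ISOLINES);
      /* Unlike quads/triangles, outer[1] goes in slot 7, outer[0] in slot 6. */
      param[7] = &outer[1];
      param[6] = &outer[0];
   }
}
```

Initializing all eight slots up front is what removes the uninitialized-uniform crash; the per-mode branches then only overwrite the slots each primitive type actually uses.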



Re: [Mesa-dev] [PATCH v2] nvc0: enable 32 textures on kepler+

2016-05-26 Thread Samuel Pitoiset
I think you forgot to increase the array of commands from 16 to 32 in 
nvc0_validate_tsc() (you did it in v1).


With that addressed, this patch is:

Reviewed-by: Samuel Pitoiset 

On 05/26/2016 04:55 AM, Ilia Mirkin wrote:

For Fermi, this will likely require the use of linked TSC mode. However, on
bindless architectures we can have as many as we want. As it stands,
AUX_TEX_INFO has 32 texture handles reserved.

Signed-off-by: Ilia Mirkin 
---

v1 -> v2: drop all the 1 -> 1U changes. They don't matter in practice.

 src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index 436e912..5be78aa 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -197,7 +197,7 @@ struct nvc0_context {
uint32_t textures_coherent[6];
struct nv50_tsc_entry *samplers[6][PIPE_MAX_SAMPLERS];
unsigned num_samplers[6];
-   uint16_t samplers_dirty[6];
+   uint32_t samplers_dirty[6];
bool seamless_cube_map;

uint32_t tex_handles[6][PIPE_MAX_SAMPLERS]; /* for nve4 */
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 7d692ea..4c47503 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -369,9 +369,9 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader,
case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS:
   return NVC0_MAX_BUFFERS;
case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS:
-  return 16; /* would be 32 in linked (OpenGL-style) mode */
+  return (class_3d >= NVE4_3D_CLASS) ? 32 : 16;
case PIPE_SHADER_CAP_MAX_SAMPLER_VIEWS:
-  return 16; /* XXX not sure if more are really safe */
+  return (class_3d >= NVE4_3D_CLASS) ? 32 : 16;
case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
   return 32;
case PIPE_SHADER_CAP_MAX_SHADER_IMAGES:



--
-Samuel
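Widening samplers_dirty from uint16_t to uint32_t matters because the field acts as a per-stage bitmask with one bit per sampler slot (an assumption based on its name and this patch), so slots 16 to 31 simply cannot be represented in 16 bits. A minimal sketch with hypothetical helper names, not nouveau code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: track which of up to 32 sampler slots need
 * re-validation in a per-stage bitmask. */
static void mark_sampler_dirty(uint32_t *mask, unsigned slot)
{
   assert(slot < 32);
   *mask |= 1u << slot; /* unsigned literal: no shift into an int sign bit */
}

static bool sampler_is_dirty(uint32_t mask, unsigned slot)
{
   assert(slot < 32);
   return (mask >> slot) & 1u;
}

/* Emulate storing the mask in a 16-bit field, as before the patch. */
static uint16_t truncate_to_u16(uint32_t mask)
{
   return (uint16_t)mask;
}
```

Marking slot 20 dirty and then squeezing the mask into 16 bits loses that bit entirely. The `1u` literal keeps the slot-31 shift well-defined in standard C; v2 of the patch drops those changes because, as noted above, they don't matter in practice.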


Re: [Mesa-dev] [PATCH] st/dri: fix winsys handle stride calculation for block formats

2016-05-26 Thread Michel Dänzer
On 25.05.2016 22:20, Philipp Zabel wrote:
> This fixes the stride calculation for pipe formats with a block width
> larger than one.
> 
> Signed-off-by: Philipp Zabel 
> ---
>  src/gallium/state_trackers/dri/dri2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/state_trackers/dri/dri2.c b/src/gallium/state_trackers/dri/dri2.c
> index 0c84baf..c0b0d21 100644
> --- a/src/gallium/state_trackers/dri/dri2.c
> +++ b/src/gallium/state_trackers/dri/dri2.c
> @@ -804,7 +804,7 @@ dri2_create_image_from_name(__DRIscreen *_screen,
> if (pf == PIPE_FORMAT_NONE)
>return NULL;
>  
> -   whandle.stride = pitch * util_format_get_blocksize(pf);
> +   whandle.stride = util_format_get_stride(pf, pitch);
>  
> return dri2_create_image_from_winsys(_screen, width, height, format,
>  &whandle, loaderPrivate);
> 

Reviewed-by: Michel Dänzer 

Do you need somebody to push this patch for you?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
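For a format whose blocks span more than one pixel, multiplying the pitch in pixels by the block size overcounts. The sketch below assumes that util_format_get_stride() counts the blocks across the row (rounding up) and multiplies by the block size; the struct and function names are hypothetical, and a YUYV-like format is modelled as 2-pixel, 4-byte blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format descriptor; field names are illustrative. */
struct block_format {
   unsigned block_width; /* pixels per block, horizontally */
   unsigned block_bytes; /* bytes per block */
};

/* Old computation: treats every pixel as if it were a whole block. */
static size_t stride_per_pixel(const struct block_format *f, unsigned pitch)
{
   return (size_t)pitch * f->block_bytes;
}

/* Block-aware computation: count blocks across the row, rounding up,
 * then multiply by the block size. */
static size_t stride_per_block(const struct block_format *f, unsigned pitch)
{
   assert(f->block_width > 0);
   size_t nblocks = ((size_t)pitch + f->block_width - 1) / f->block_width;
   return nblocks * f->block_bytes;
}
```

For a 640-pixel-wide YUYV-like row, the per-pixel formula gives 2560 bytes while the block-aware one gives 1280; for a format with 1-pixel blocks the two agree, which is why the bug only shows up with block formats.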


Re: [Mesa-dev] [PATCH] radeon/uvd: fix the H264 level for Tonga

2016-05-26 Thread Andy Furniss

Alex Deucher wrote:

On Wed, May 25, 2016 at 10:57 AM, Christian König wrote:

From: Christian König 

We have supported 5.1 for a while now.


Resend as the last one didn't have the CCs.

I know (well, think) VDPAU doesn't really mention 5.2 anywhere, but for
ffmpeg I've been making this change for some time to say 5.2.

Tonga can easily do 5.2. Players don't seem to look at this field, but
the ffmpeg CLI now does, and it will refuse to use UVD for 5.2 videos.

In the past the ffmpeg CLI didn't look at this either, but they merged
something in from libav that changed things.

I have a Trac ticket open, but the developer who replied said to fix the
driver; he didn't reply further when I said I didn't think VDPAU went as
high as 5.2.
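The behaviour described above, where a player compares the stream's level against the level the driver reports, can be sketched as a simple comparison. This assumes the standard H.264 level_idc encoding (10 × major + minor, ignoring the special level 1b); the function names are hypothetical and this is not ffmpeg's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* H.264 level X.Y is signalled as level_idc = 10 * X + Y (5.1 -> 51).
 * Level 1b is a special case and is not handled here. */
static int h264_level_idc(int major, int minor)
{
   assert(minor >= 0 && minor <= 9);
   return 10 * major + minor;
}

/* Hypothetical player-side check: use the hardware decoder only when the
 * stream's level does not exceed the level the driver advertises. */
static bool can_use_hw_decoder(int stream_level_idc, int driver_max_level_idc)
{
   return stream_level_idc <= driver_max_level_idc;
}
```

Under this model, a driver that reports 5.1 sends 5.2 streams to software decoding even when the hardware could handle them, which matches the Tonga situation described above.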



Re: [Mesa-dev] [PATCH 1/2] gbm: Enable DRI2 fence extension in the GBM DRI backend

2016-05-26 Thread Philipp Zabel
Hi Marek,

On Wednesday, 25.05.2016, 20:52 +0200, Marek Olšák wrote:
> On Wed, May 25, 2016 at 4:46 PM, Philipp Zabel  wrote:
> > On Wednesday, 25.05.2016, 16:01 +0200, Marek Olšák wrote:
> >> On Wed, May 25, 2016 at 3:44 PM, Philipp Zabel wrote:
> >> > On Tuesday, 10.05.2016, 17:35 +0200, Philipp Zabel wrote:
> >> >> To support the EGL_KHR_fence_sync extension on the DRM EGL platform,
> >> >> add the DRI2 fence extension to the dri_core_extensions match table.
> >> >>
> >> >> Signed-off-by: Philipp Zabel 
> >> >
> >> > Gentle ping. Is this about the right way to enable the
> >> > EGL_KHR_fence_sync extension on DRM EGL platforms?
> >>
> >> Unlikely. Where are the __DRI2fenceExtension callbacks implemented?
> >
> > The callbacks are implemented and added to the dri_screen_extensions[]
> > array in src/gallium/state_trackers/dri/dri2.c. The array is assigned to
> > the __DRIscreen member "extensions" in dri2_init_screen().
> >
> > dri_screen_create_dri2() in src/gbm/backends/dri/gbm_dri.c
> > then obtains the extensions array via dri->core->getExtensions() and
> > binds selected extensions to the gbm_dri_device according to the
> > placement information in the dri_core_extensions[] array.
> > This was already done for the flush and image extensions, so I have
> > similarly added a fence extension pointer to the gbm_dri_device and an
> > entry to dri_core_extensions to have it initialized from the dri2
> > extension array that already contained the fence extension.
> >
> > dri2_initialize_drm() in src/egl/drivers/dri2/platform_drm.c
> > then copies the extension pointers from the gbm_dri_device
> > dri2_dpy->gbm_dri into the dri2_egl_display dri2_dpy proper.
> > This also was already done for a few other extensions, among them image
> > and flush, and the dri2_egl_display already has a fence pointer that I
> > used to assign to the gbm_dri_device's new fence pointer.
> >
> > dri2_setup_screen() in src/egl/drivers/dri2/egl_dri2.c later checks
> > dri2_dpy->fence to enable the extension.
> 
> As you can see, I'm not very familiar with libgbm. Hopefully somebody
> else will take a look.

OK, thanks. As you can see from my rambling reply versus Emil's succinct
summary, neither am I. I just followed the breadcrumbs and found you
two at the top of the get_reviewer.pl output.

regards
Philipp
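The binding step described above for gbm_dri.c, where a match table records which extensions to pick up and where in the device struct to store them, can be sketched in plain C. The struct names, extension name strings, and versions below are placeholders rather than the real __DRIextension definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for __DRIextension and gbm_dri_device. */
struct ext {
   const char *name;
   int version;
};

struct device {
   const struct ext *flush;
   const struct ext *image;
   const struct ext *fence;
};

/* One match-table entry: which extension to look for, and the offset of
 * the struct device field that receives the pointer. */
struct ext_match {
   const char *name;
   int min_version;
   size_t offset;
};

static const struct ext_match core_matches[] = {
   { "flush", 1, offsetof(struct device, flush) },
   { "image", 1, offsetof(struct device, image) },
   { "fence", 1, offsetof(struct device, fence) }, /* the newly added entry */
};

/* Walk the NULL-terminated list exported by the driver and bind every
 * matching extension into `dev`. Returns the number of bindings made. */
static int bind_extensions(struct device *dev, const struct ext *const *exts)
{
   int bound = 0;

   assert(dev != NULL && exts != NULL);
   for (size_t i = 0; exts[i] != NULL; i++) {
      for (size_t j = 0; j < sizeof(core_matches) / sizeof(core_matches[0]); j++) {
         const struct ext_match *m = &core_matches[j];

         if (strcmp(exts[i]->name, m->name) == 0 &&
             exts[i]->version >= m->min_version) {
            const struct ext **slot =
               (const struct ext **)((char *)dev + m->offset);
            *slot = exts[i];
            bound++;
         }
      }
   }
   return bound;
}
```

In this model, exposing one more extension costs one extra field and one extra table entry, which is the shape of the change described for gbm_dri_device and dri_core_extensions; unknown extensions in the driver's list are simply skipped.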



Re: [Mesa-dev] [PATCH 1/2] gbm: Enable DRI2 fence extension in the GBM DRI backend

2016-05-26 Thread Philipp Zabel
Hi Emil,

On Wednesday, 25.05.2016, 23:42 +0100, Emil Velikov wrote:
[...]
> Or in other words, in case of egl + gbm, egl inherits the screen from
> the gbm device. As such platform_gbm does not call the core egl setup
> function, dri2_create_screen (like everyone else does x11, wayland...)
> but only the follow-up dri2_setup_screen.

Thank you for the explanation. What is the reason for this indirection?

> That said this patch will break things when using old libgbm and new
> libEGL and vice-versa. Sadly there's no way around it atm.
> Thus can we get an ABI check so that in the future we printout a
> message and abort early, instead of crashing in spectacular ways down
> the line?

I didn't think of that. What do you envision this ABI check looking like?
gbm(_drm)_device currently don't have any version fields and I'm not
sure how a new gbm backend would check for an old libEGL.
The first thing that comes to mind is a simple ABI version number to be
incremented in lock-step between libgbm and libEGL.

regards
Philipp
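A lock-step ABI version, as proposed above, could look roughly like the following. Everything here is hypothetical: neither libgbm nor libEGL defines `GBM_EGL_ABI_VERSION` or `struct shared_device_header`, and a real version field would need to sit at a fixed position (for example, first) in the structure shared between the two libraries.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical constant, bumped in lock-step in libgbm and libEGL
 * whenever the shared device structure changes layout. */
#define GBM_EGL_ABI_VERSION 2u

/* Hypothetical header kept as the first member of the shared device
 * struct, so its position never moves between versions. */
struct shared_device_header {
   unsigned abi_version; /* filled in by libgbm when creating the device */
};

/* Returns 0 when both sides agree; otherwise prints a diagnostic and
 * returns -1 so libEGL can fail initialization early instead of crashing
 * later on a mismatched struct layout. */
static int check_shared_abi(const struct shared_device_header *hdr)
{
   assert(hdr != NULL);
   if (hdr->abi_version != GBM_EGL_ABI_VERSION) {
      fprintf(stderr, "libgbm/libEGL ABI mismatch: expected %u, got %u\n",
              GBM_EGL_ABI_VERSION, hdr->abi_version);
      return -1;
   }
   return 0;
}
```

This only helps once both libraries carry the field; it cannot detect today's unversioned libgbm and libEGL.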



Re: [Mesa-dev] [PATCH] st/dri: fix winsys handle stride calculation for block formats

2016-05-26 Thread Philipp Zabel
Hi Michel,

On Thursday, 26.05.2016, 17:59 +0900, Michel Dänzer wrote:
> On 25.05.2016 22:20, Philipp Zabel wrote:
> > This fixes the stride calculation for pipe formats with a block width
> > larger than one.
> > 
> > Signed-off-by: Philipp Zabel 
> > ---
> >  src/gallium/state_trackers/dri/dri2.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/gallium/state_trackers/dri/dri2.c b/src/gallium/state_trackers/dri/dri2.c
> > index 0c84baf..c0b0d21 100644
> > --- a/src/gallium/state_trackers/dri/dri2.c
> > +++ b/src/gallium/state_trackers/dri/dri2.c
> > @@ -804,7 +804,7 @@ dri2_create_image_from_name(__DRIscreen *_screen,
> > if (pf == PIPE_FORMAT_NONE)
> >return NULL;
> >  
> > -   whandle.stride = pitch * util_format_get_blocksize(pf);
> > +   whandle.stride = util_format_get_stride(pf, pitch);
> >  
> > return dri2_create_image_from_winsys(_screen, width, height, format,
> >  &whandle, loaderPrivate);
> > 
> 
> Reviewed-by: Michel Dänzer 
> 
> Do you need somebody to push this patch for you?

Yes, thank you.

regards
Philipp



Re: [Mesa-dev] [PATCH v2] nvc0: enable 32 textures on kepler+

2016-05-26 Thread Ilia Mirkin
Nope, it was one of the irrelevant changes. If I were enabling 32 on Fermi,
it'd matter, but I'm not.
On May 26, 2016 04:37, "Samuel Pitoiset"  wrote:

> I think you forgot to increase the array of commands from 16 to 32 in
> nvc0_validate_tsc() (you did it in v1).
>
> With that addressed, this patch is:
>
> Reviewed-by: Samuel Pitoiset 
>
> On 05/26/2016 04:55 AM, Ilia Mirkin wrote:
>
>> For fermi, this likely will require use of linked tsc mode. However on
>> bindless architectures, we can have as many as we want. As it stands,
>> the AUX_TEX_INFO has 32 texture handles reserved.
>>
>> Signed-off-by: Ilia Mirkin 
>> ---
>>
>> v1 -> v2: drop all the 1 -> 1U changes. They don't matter in practice.
>>
>>  src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 2 +-
>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c  | 4 ++--
>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
>> index 436e912..5be78aa 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
>> @@ -197,7 +197,7 @@ struct nvc0_context {
>> uint32_t textures_coherent[6];
>> struct nv50_tsc_entry *samplers[6][PIPE_MAX_SAMPLERS];
>> unsigned num_samplers[6];
>> -   uint16_t samplers_dirty[6];
>> +   uint32_t samplers_dirty[6];
>> bool seamless_cube_map;
>>
>> uint32_t tex_handles[6][PIPE_MAX_SAMPLERS]; /* for nve4 */
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> index 7d692ea..4c47503 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> @@ -369,9 +369,9 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader,
>> case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS:
>>return NVC0_MAX_BUFFERS;
>> case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS:
>> -  return 16; /* would be 32 in linked (OpenGL-style) mode */
>> +  return (class_3d >= NVE4_3D_CLASS) ? 32 : 16;
>> case PIPE_SHADER_CAP_MAX_SAMPLER_VIEWS:
>> -  return 16; /* XXX not sure if more are really safe */
>> +  return (class_3d >= NVE4_3D_CLASS) ? 32 : 16;
>> case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
>>return 32;
>> case PIPE_SHADER_CAP_MAX_SHADER_IMAGES:
>>
>>
> --
> -Samuel
>


Re: [Mesa-dev] [PATCH v2] nvc0: enable 32 textures on kepler+

2016-05-26 Thread Samuel Pitoiset



On 05/26/2016 01:22 PM, Ilia Mirkin wrote:

Nope, it was one of the irrelevant changes. If I were enabling 32 on
Fermi, it'd matter, but I'm not.


Right, this function is for fermi only.
Looks good.



On May 26, 2016 04:37, "Samuel Pitoiset" <samuel.pitoi...@gmail.com> wrote:

I think you forgot to increase the array of commands from 16 to 32
in nvc0_validate_tsc() (you did it in v1).

With that addressed, this patch is:

Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>

On 05/26/2016 04:55 AM, Ilia Mirkin wrote:

For fermi, this likely will require use of linked tsc mode. However on
bindless architectures, we can have as many as we want. As it stands,
the AUX_TEX_INFO has 32 texture handles reserved.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

v1 -> v2: drop all the 1 -> 1U changes. They don't matter in practice.

 src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index 436e912..5be78aa 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -197,7 +197,7 @@ struct nvc0_context {
uint32_t textures_coherent[6];
struct nv50_tsc_entry *samplers[6][PIPE_MAX_SAMPLERS];
unsigned num_samplers[6];
-   uint16_t samplers_dirty[6];
+   uint32_t samplers_dirty[6];
bool seamless_cube_map;

uint32_t tex_handles[6][PIPE_MAX_SAMPLERS]; /* for nve4 */
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 7d692ea..4c47503 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -369,9 +369,9 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader,
case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS:
   return NVC0_MAX_BUFFERS;
case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS:
-  return 16; /* would be 32 in linked (OpenGL-style) mode */
+  return (class_3d >= NVE4_3D_CLASS) ? 32 : 16;
case PIPE_SHADER_CAP_MAX_SAMPLER_VIEWS:
-  return 16; /* XXX not sure if more are really safe */
+  return (class_3d >= NVE4_3D_CLASS) ? 32 : 16;
case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
   return 32;
case PIPE_SHADER_CAP_MAX_SHADER_IMAGES:


--
-Samuel



--
-Samuel


Re: [Mesa-dev] [PATCH 1/2] gbm: Enable DRI2 fence extension in the GBM DRI backend

2016-05-26 Thread Emil Velikov
On 26 May 2016 at 11:28, Philipp Zabel  wrote:
> Hi Emil,
>
> On Wednesday, 25.05.2016, 23:42 +0100, Emil Velikov wrote:
> [...]
>> Or in other words, in case of egl + gbm, egl inherits the screen from
>> the gbm device. As such platform_gbm does not call the core egl setup
>> function, dri2_create_screen (like everyone else does x11, wayland...)
>> but only the follow-up dri2_setup_screen.
>
> Thank you for the explanation. What is the reason for this indirection?
>
The idea is that to use GBM on its own, you need a dri_screen at the
bare minimum; that's how you interact with the DRI module, probing and
querying various things. At the same time, when you use both APIs in
the same program you want the screen to be shared. Otherwise, from the
DRI module's point of view, you'd have two separate users/clients,
which makes it hard or impossible to implement some extensions such as
EGL_EXT_image_dma_buf_import.

^^ That is my non-expert take on it, so it might be a bit off.

>> That said this patch will break things when using old libgbm and new
>> libEGL and vice-versa. Sadly there's no way around it atm.
>> Thus can we get an ABI check so that in the future we printout a
>> message and abort early, instead of crashing in spectacular ways down
>> the line?
>
> I didn't think of that. What do you envision this ABI check looking like?
> gbm(_drm)_device currently don't have any version fields and I'm not
> sure how a new gbm backend would check for an old libEGL.
> The first thing that comes to mind is a simple ABI version number to be
> incremented in lock-step between libgbm and libEGL.
>
Sadly, there's nothing we can do for old libgbm/libEGL. The proposed
approach, on the other hand, sounds great IMHO.

Thanks
Emil


Re: [Mesa-dev] [PATCH] st/dri: fix winsys handle stride calculation for block formats

2016-05-26 Thread Emil Velikov
Hi gents,

On 26 May 2016 at 11:28, Philipp Zabel  wrote:
> Hi Michel,
>
> On Thursday, 26.05.2016, 17:59 +0900, Michel Dänzer wrote:
>> On 25.05.2016 22:20, Philipp Zabel wrote:
>> > This fixes the stride calculation for pipe formats with a block width
>> > larger than one.
>> >
>> > Signed-off-by: Philipp Zabel 
>> > ---
>> >  src/gallium/state_trackers/dri/dri2.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/src/gallium/state_trackers/dri/dri2.c b/src/gallium/state_trackers/dri/dri2.c
>> > index 0c84baf..c0b0d21 100644
>> > --- a/src/gallium/state_trackers/dri/dri2.c
>> > +++ b/src/gallium/state_trackers/dri/dri2.c
>> > @@ -804,7 +804,7 @@ dri2_create_image_from_name(__DRIscreen *_screen,
>> > if (pf == PIPE_FORMAT_NONE)
>> >return NULL;
>> >
>> > -   whandle.stride = pitch * util_format_get_blocksize(pf);
>> > +   whandle.stride = util_format_get_stride(pf, pitch);
>> >
>> > return dri2_create_image_from_winsys(_screen, width, height, format,
>> >  &whandle, loaderPrivate);
>> >
>>
>> Reviewed-by: Michel Dänzer 
>>
>> Do you need somebody to push this patch for you?
>
> Yes, thank you.
>
Can we add a note on whether this fixes a real-world case (on which driver
and/or format)? Is it worth adding this patch to stable releases?

Thanks
Emil


Re: [Mesa-dev] [PATCH 01/10] compiler: Move glsl_to_nir to libglsl.la

2016-05-26 Thread Emil Velikov
Hi Jason,

On 26 May 2016 at 02:52, Jason Ekstrand  wrote:
> Right now libglsl.la depends on libnir.la so putting it in libnir.la
> adds a dependency on libglsl.la that goes the wrong direction.
> ---
>  src/compiler/Makefile.am  |2 +
>  src/compiler/Makefile.nir.am  |5 -
>  src/compiler/Makefile.sources |4 +-
>  src/compiler/glsl/glsl_to_nir.cpp | 2026 +
>  src/compiler/glsl/glsl_to_nir.h   |   42 +
>  src/compiler/nir/glsl_to_nir.cpp  | 2026 -
>  src/compiler/nir/glsl_to_nir.h|   42 -
>  src/mesa/drivers/dri/i965/brw_nir.c   |2 +-
>  src/mesa/state_tracker/st_glsl_to_nir.cpp |2 +-
>  9 files changed, 2074 insertions(+), 2077 deletions(-)
>  create mode 100644 src/compiler/glsl/glsl_to_nir.cpp
>  create mode 100644 src/compiler/glsl/glsl_to_nir.h
>  delete mode 100644 src/compiler/nir/glsl_to_nir.cpp
>  delete mode 100644 src/compiler/nir/glsl_to_nir.h
>
> diff --git a/src/compiler/Makefile.am b/src/compiler/Makefile.am
> index dc30f90..710ac5a 100644
> --- a/src/compiler/Makefile.am
> +++ b/src/compiler/Makefile.am
> @@ -31,6 +31,8 @@ AM_CPPFLAGS = \
> -I$(top_builddir)/src/compiler/glsl\
> -I$(top_srcdir)/src/compiler/glsl\
> -I$(top_srcdir)/src/compiler/glsl/glcpp\
> +   -I$(top_builddir)/src/compiler/nir\
> +   -I$(top_srcdir)/src/compiler/nir\
These should be moved to and/or duplicated in Makefile.glsl.am, shouldn't
they? Please add a space before \, as Matt suggested.

On the SCons side you'll likely need something like the following. If
it doesn't work out, please cc Jose so that he's aware of things going
crazy.

--- a/src/compiler/SConscript
+++ b/src/compiler/SConscript
@@ -21,5 +21,5 @@ compiler = env.ConvenienceLibrary(
 )
 Export('compiler')

-SConscript('SConscript.glsl')
 SConscript('SConscript.nir')
+SConscript('SConscript.glsl')
diff --git a/src/compiler/SConscript.glsl b/src/compiler/SConscript.glsl
index 43a11d1..8d4f71e 100644
--- a/src/compiler/SConscript.glsl
+++ b/src/compiler/SConscript.glsl
@@ -17,12 +17,14 @@ env.Prepend(CPPPATH = [
 '#src/gallium/auxiliary',
 '#src/compiler/glsl',
 '#src/compiler/glsl/glcpp',
+'#src/compiler/nir',
 ])

 env.Prepend(LIBS = [mesautil])

 # Make glcpp-parse.h and glsl_parser.h reachable from the include path.
 env.Prepend(CPPPATH = [Dir('.').abspath, Dir('glsl').abspath])
+env.Prepend(CPPPATH = [Dir('.').abspath, Dir('nir').abspath])

 glcpp_env = env.Clone()
 glcpp_env.Append(YACCFLAGS = [


Feel free to set the following, either locally or globally.

$ git config diff.renames true

This way git will generate -M patches (moved/renamed file) by default.

Thanks
Emil


Re: [Mesa-dev] [PATCH] st/dri: fix winsys handle stride calculation for block formats

2016-05-26 Thread Philipp Zabel
On Thursday, 26.05.2016, 12:43 +0100, Emil Velikov wrote:
> Hi gents,
> 
> On 26 May 2016 at 11:28, Philipp Zabel  wrote:
> > Hi Michel,
> >
> > On Thursday, 26.05.2016, 17:59 +0900, Michel Dänzer wrote:
> >> On 25.05.2016 22:20, Philipp Zabel wrote:
> >> > This fixes the stride calculation for pipe formats with a block width
> >> > larger than one.
> >> >
> >> > Signed-off-by: Philipp Zabel 
> >> > ---
> >> >  src/gallium/state_trackers/dri/dri2.c | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/src/gallium/state_trackers/dri/dri2.c b/src/gallium/state_trackers/dri/dri2.c
> >> > index 0c84baf..c0b0d21 100644
> >> > --- a/src/gallium/state_trackers/dri/dri2.c
> >> > +++ b/src/gallium/state_trackers/dri/dri2.c
> >> > @@ -804,7 +804,7 @@ dri2_create_image_from_name(__DRIscreen *_screen,
> >> > if (pf == PIPE_FORMAT_NONE)
> >> >return NULL;
> >> >
> >> > -   whandle.stride = pitch * util_format_get_blocksize(pf);
> >> > +   whandle.stride = util_format_get_stride(pf, pitch);
> >> >
> >> > return dri2_create_image_from_winsys(_screen, width, height, format,
> >> >  &whandle, loaderPrivate);
> >> >
> >>
> >> Reviewed-by: Michel Dänzer 
> >>
> >> Do you need somebody to push this patch for you?
> >
> > Yes, thank you.
> >
> Can we add a note on whether this fixes a real-world case (on which driver
> and/or format)? Is it worth adding this patch to stable releases?

I encountered this when trying to import YUYV buffers via
EGL_EXT_image_dma_buf_import into the (still out of tree) etnaviv
gallium driver. Since I currently still have the following patch
applied, I don't think this is a stable issue, at least regarding YUYV:

--8<--
Subject: [PATCH] WIP: st/dri: Allow YUYV import

It is unclear whether this is the right way, but this allows importing
dma-buffers with the YUYV pixel format.

Signed-off-by: Philipp Zabel 

diff --git a/src/gallium/state_trackers/dri/dri2.c b/src/gallium/state_trackers/dri/dri2.c
index e07389c..bad1d90 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -70,6 +70,10 @@ static int convert_fourcc(int format, int *dri_components_p)
   format = __DRI_IMAGE_FORMAT_XBGR;
   dri_components = __DRI_IMAGE_COMPONENTS_RGB;
   break;
+   case __DRI_IMAGE_FOURCC_YUYV:
+  format = __DRI_IMAGE_FOURCC_YUYV;
+  dri_components = __DRI_IMAGE_COMPONENTS_Y_XUXV;
+  break;
default:
   return -1;
}
@@ -118,6 +122,9 @@ static enum pipe_format dri2_format_to_pipe_format(int format)
case __DRI_IMAGE_FORMAT_ABGR:
   pf = PIPE_FORMAT_RGBA_UNORM;
   break;
+   case __DRI_IMAGE_FOURCC_YUYV:
+  pf = PIPE_FORMAT_YUYV;
+  break;
default:
   pf = PIPE_FORMAT_NONE;
   break;
-->8--

While I have your attention, should the above be handled by adding a
__DRI_IMAGE_FORMAT_YUYV instead?

regards
Philipp



Re: [Mesa-dev] [PATCH 01/10] compiler: Move glsl_to_nir to libglsl.la

2016-05-26 Thread Rob Herring
On Thu, May 26, 2016 at 6:59 AM, Emil Velikov  wrote:
> Hi Jason,
>
> On 26 May 2016 at 02:52, Jason Ekstrand  wrote:
>> Right now libglsl.la depends on libnir.la so putting it in libnir.la
>> adds a dependency on libglsl.la that goes the wrong direction.
>> ---
>>  src/compiler/Makefile.am  |2 +
>>  src/compiler/Makefile.nir.am  |5 -
>>  src/compiler/Makefile.sources |4 +-
>>  src/compiler/glsl/glsl_to_nir.cpp | 2026 +
>>  src/compiler/glsl/glsl_to_nir.h   |   42 +
>>  src/compiler/nir/glsl_to_nir.cpp  | 2026 -
>>  src/compiler/nir/glsl_to_nir.h|   42 -
>>  src/mesa/drivers/dri/i965/brw_nir.c   |2 +-
>>  src/mesa/state_tracker/st_glsl_to_nir.cpp |2 +-
>>  9 files changed, 2074 insertions(+), 2077 deletions(-)
>>  create mode 100644 src/compiler/glsl/glsl_to_nir.cpp
>>  create mode 100644 src/compiler/glsl/glsl_to_nir.h
>>  delete mode 100644 src/compiler/nir/glsl_to_nir.cpp
>>  delete mode 100644 src/compiler/nir/glsl_to_nir.h
>>
>> diff --git a/src/compiler/Makefile.am b/src/compiler/Makefile.am
>> index dc30f90..710ac5a 100644
>> --- a/src/compiler/Makefile.am
>> +++ b/src/compiler/Makefile.am
>> @@ -31,6 +31,8 @@ AM_CPPFLAGS = \
>> -I$(top_builddir)/src/compiler/glsl\
>> -I$(top_srcdir)/src/compiler/glsl\
>> -I$(top_srcdir)/src/compiler/glsl/glcpp\
>> +   -I$(top_builddir)/src/compiler/nir\
>> +   -I$(top_srcdir)/src/compiler/nir\
> These should be moved and/or duplicated in Makefile.glsl.am, shouldn't
> they? Please add a space before \, as Matt suggested.
>
> On the SCons side you'll likely need something like the following. If
> it doesn't work out, please cc Jose so that he's aware of things going
> crazy.

Android too will need something similar.

Rob


Re: [Mesa-dev] [PATCH 1/2] nvc0/ir: avoid generating illegal instructions for compute constbuf loads

2016-05-26 Thread Hans de Goede

Hi,

On 26-05-16 04:44, Ilia Mirkin wrote:
> For user-supplied constbufs, fileIndex is 0. In that case, when we
> subtract 1, we'll end up loading from constbuf offset -16. This is
> illegal, and there are asserts to avoid it. Normally we'd just DCE it,
> but no point in generating the instructions if they're not going to be
> used.
>
> Signed-off-by: Ilia Mirkin 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> index 869040c..da2fa4b 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> @@ -2180,11 +2180,11 @@ NVC0LoweringPass::handleLDST(Instruction *i)
>   // memory.
>   int8_t fileIndex = i->getSrc(0)->reg.fileIndex - 1;
>   Value *ind = i->getIndirect(0, 1);
> - Value *ptr = loadUboInfo64(ind, fileIndex * 16);
>
>   // TODO: clamp the offset to the maximum number of const buf.
>   if (i->src(0).isIndirect(1)) {
>  Value *offset = bld.loadImm(NULL, i->getSrc(0)->reg.data.offset + typeSizeof(i->sType));
> +Value *ptr = loadUboInfo64(ind, fileIndex * 16);
>  Value *length = loadUboLength32(ind, fileIndex * 16);
>  Value *pred = new_LValue(func, FILE_PREDICATE);
>  if (i->src(0).isIndirect(0)) {
> @@ -2200,6 +2200,7 @@ NVC0LoweringPass::handleLDST(Instruction *i)
> bld.mkMov(i->getDef(0), bld.mkImm(0));
>  }
>   } else if (fileIndex >= 0) {
> +Value *ptr = loadUboInfo64(ind, fileIndex * 16);
>  if (i->src(0).isIndirect(0)) {
> bld.mkOp2(OP_ADD, TYPE_U64, ptr, ptr, i->getIndirect(0, 0));
>  }

This patch does not seem to actually change anything, you've just moved the
exact same declaration to 2 places ... ?

Regards,

Hans





Re: [Mesa-dev] [PATCH 1/2] nvc0/ir: avoid generating illegal instructions for compute constbuf loads

2016-05-26 Thread Ilia Mirkin
On Thu, May 26, 2016 at 8:41 AM, Hans de Goede  wrote:
> Hi,
>
>
> On 26-05-16 04:44, Ilia Mirkin wrote:
>>
>> For user-supplied constbufs, fileIndex is 0. In that case, when we
>> subtract 1, we'll end up loading from constbuf offset -16. This is
>> illegal, and there are asserts to avoid it. Normally we'd just DCE it,
>> but no point in generating the instructions if they're not going to be
>> used.
>>
>> Signed-off-by: Ilia Mirkin 
>> ---
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> index 869040c..da2fa4b 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>> @@ -2180,11 +2180,11 @@ NVC0LoweringPass::handleLDST(Instruction *i)
>>   // memory.
>>   int8_t fileIndex = i->getSrc(0)->reg.fileIndex - 1;
>>   Value *ind = i->getIndirect(0, 1);
>> - Value *ptr = loadUboInfo64(ind, fileIndex * 16);
>>
>>   // TODO: clamp the offset to the maximum number of const buf.
>>   if (i->src(0).isIndirect(1)) {
>>  Value *offset = bld.loadImm(NULL, i->getSrc(0)->reg.data.offset + typeSizeof(i->sType));
>> +Value *ptr = loadUboInfo64(ind, fileIndex * 16);
>>  Value *length = loadUboLength32(ind, fileIndex * 16);
>>  Value *pred = new_LValue(func, FILE_PREDICATE);
>>  if (i->src(0).isIndirect(0)) {
>> @@ -2200,6 +2200,7 @@ NVC0LoweringPass::handleLDST(Instruction *i)
>> bld.mkMov(i->getDef(0), bld.mkImm(0));
>>  }
>>   } else if (fileIndex >= 0) {
>> +Value *ptr = loadUboInfo64(ind, fileIndex * 16);
>>  if (i->src(0).isIndirect(0)) {
>> bld.mkOp2(OP_ADD, TYPE_U64, ptr, ptr, i->getIndirect(0, 0));
>>  }
>>
>
> This patch does not seem to actually change anything, you've just moved the
> exact same declaration to 2 places ... ?

If loadUboInfo64 had no side-effects you'd be right. However it
inserts instructions into the current (builder's) bb.

  -ilia
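To make the side-effect point concrete, here is a toy model. It is not nouveau code: `toy_bb`, `toy_load_ubo_info`, `lower_before` and `lower_after` are hypothetical, and only the `fileIndex >= 0` path is modelled. Emitting the load before the branch records an instruction at offset -16 for a user constbuf (fileIndex 0, minus 1, times 16), even though nothing uses it. Emitting it inside the branch records nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Toy basic block: every emitted load records its constbuf offset here. */
struct toy_bb {
   int offsets[16];
   int count;
};

/* Hypothetical stand-in for loadUboInfo64(): as described in this thread,
 * calling it has a side effect, appending an instruction to the current bb. */
static int toy_load_ubo_info(struct toy_bb *bb, int offset)
{
   assert(bb->count < 16);
   bb->offsets[bb->count++] = offset;
   return bb->count - 1; /* "value" = index of the emitted load */
}

/* Before the patch: the load is emitted unconditionally. */
static void lower_before(struct toy_bb *bb, int reg_file_index)
{
   int8_t file_index = (int8_t)(reg_file_index - 1);
   int ptr = toy_load_ubo_info(bb, file_index * 16);
   (void)ptr;
   if (file_index >= 0) {
      /* ... ptr would only be used here ... */
   }
}

/* After the patch: the load is emitted only on the path that uses it. */
static void lower_after(struct toy_bb *bb, int reg_file_index)
{
   int8_t file_index = (int8_t)(reg_file_index - 1);
   if (file_index >= 0) {
      int ptr = toy_load_ubo_info(bb, file_index * 16);
      (void)ptr;
   }
}
```

Dead-code elimination could remove the unused load later. However, as the commit message notes, the negative offset already trips asserts by then.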


Re: [Mesa-dev] [PATCH v4] glsl: enforce invariant conditions for built-in variables

2016-05-26 Thread Lars Hamre
Ping.

On Tue, May 17, 2016 at 10:49 AM, Lars Hamre  wrote:
> Gentle ping; if nobody has any issues, I would appreciate a push.
>
> Regards,
> Lars Hamre
>
> On Mon, May 9, 2016 at 7:00 PM, Lars Hamre  wrote:
>> v3/v4:
>>  - compare varying slot locations rather than names (Ilia Mirkin)
>> v2:
>>  - ES version check (Tapani Pälli)
>>
>> The conditions for which certain built-in special variables
>> can be declared invariant were not being checked.
>>
>> GLSL ES 1.00 specification, Section "Invariance and linkage" says:
>>
>> For the built-in special variables, gl_FragCoord can
>> only be declared invariant if and only if gl_Position is
>> declared invariant. Similarly gl_PointCoord can only be
>> declared invariant if and only if gl_PointSize is declared
>> invariant. It is an error to declare gl_FrontFacing as invariant.
>>
>> This fixes the following piglit tests in spec/glsl-es-1.00/linker:
>> glsl-fcoord-invariant
>> glsl-fface-invariant
>> glsl-pcoord-invariant
>>
>> Signed-off-by: Lars Hamre 
>>
>> ---
>>
>> CC: Ilia Mirkin 
>>
>> NOTE: Someone with access will need to commit this after the
>>   review process
>>
>>  src/compiler/glsl/link_varyings.cpp | 43 +++--
>>  1 file changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/compiler/glsl/link_varyings.cpp b/src/compiler/glsl/link_varyings.cpp
>> index 34e82c7..c7c9f5f 100644
>> --- a/src/compiler/glsl/link_varyings.cpp
>> +++ b/src/compiler/glsl/link_varyings.cpp
>> @@ -352,13 +352,23 @@ cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
>> glsl_symbol_table parameters;
>> ir_variable *explicit_locations[MAX_VARYING][4] = { {NULL, NULL} };
>>
>> +   bool is_gl_position_invariant = false;
>> +   bool is_gl_point_size_invariant = false;
>> +
>> /* Find all shader outputs in the "producer" stage.
>>  */
>> foreach_in_list(ir_instruction, node, producer->ir) {
>>ir_variable *const var = node->as_variable();
>>
>>if ((var == NULL) || (var->data.mode != ir_var_shader_out))
>> -continue;
>> + continue;
>> +
>> +  if (prog->IsES && prog->Version < 300) {
>> + if (var->data.location == VARYING_SLOT_POS)
>> +is_gl_position_invariant = var->data.invariant;
>> + if (var->data.location == VARYING_SLOT_PSIZ)
>> +is_gl_point_size_invariant = var->data.invariant;
>> +  }
>>
>>if (!var->data.explicit_location
>>|| var->data.location < VARYING_SLOT_VAR0)
>> @@ -442,7 +452,36 @@ cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
>>ir_variable *const input = node->as_variable();
>>
>>if ((input == NULL) || (input->data.mode != ir_var_shader_in))
>> -continue;
>> + continue;
>> +
>> +  /*
>> +   * GLSL ES 1.00 specification, Section "Invariance and linkage" says:
>> +   *
>> +   *  "For the built-in special variables, gl_FragCoord can
>> +   *  only be declared invariant if and only if gl_Position is
>> +   *  declared invariant. Similarly gl_PointCoord can only be
>> +   *  declared invariant if and only if gl_PointSize is declared
>> +   *  invariant. It is an error to declare gl_FrontFacing as invariant."
>> +   */
>> +  if (prog->IsES && prog->Version < 300 && input->data.invariant) {
>> + if (input->data.location == VARYING_SLOT_FACE) {
>> +linker_error(prog,
>> + "gl_FrontFacing cannot be declared invariant");
>> +return;
>> + } else if (!is_gl_position_invariant &&
>> +input->data.location == VARYING_SLOT_POS) {
>> +linker_error(prog,
>> + "gl_FragCoord cannot be declared invariant "
>> + "unless gl_Position is also invariant");
>> +return;
>> + } else if (!is_gl_point_size_invariant &&
>> +input->data.location == VARYING_SLOT_PNTC) {
>> +linker_error(prog,
>> + "gl_PointCoord cannot be declared invariant "
>> + "unless gl_PointSize is also invariant");
>> +return;
>> + }
>> +  }
>>
>>if (strcmp(input->name, "gl_Color") == 0 && input->data.used) {
>>   const ir_variable *const front_color =
>> --
>> 2.5.5
>>
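The quoted GLSL ES 1.00 rule reduces to a small decision table. What follows is a standalone sketch with hypothetical names (`es100_fs_builtin`, `check_es100_invariance`) rather than the linker's `ir_variable` types. Like the patch, it only enforces one direction: an invariant fragment input requires the matching vertex output to be invariant. The error strings are copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the fragment-shader built-ins the rule covers. */
enum es100_fs_builtin {
   ES100_FRAG_COORD,   /* gl_FragCoord */
   ES100_POINT_COORD,  /* gl_PointCoord */
   ES100_FRONT_FACING, /* gl_FrontFacing */
   ES100_OTHER,
};

/* Returns NULL if the declaration is allowed, otherwise an error message.
 * position_invariant / point_size_invariant describe the vertex shader's
 * gl_Position and gl_PointSize outputs. */
static const char *
check_es100_invariance(enum es100_fs_builtin input, bool input_invariant,
                       bool position_invariant, bool point_size_invariant)
{
   if (!input_invariant)
      return NULL;

   switch (input) {
   case ES100_FRONT_FACING:
      return "gl_FrontFacing cannot be declared invariant";
   case ES100_FRAG_COORD:
      return position_invariant ? NULL :
             "gl_FragCoord cannot be declared invariant "
             "unless gl_Position is also invariant";
   case ES100_POINT_COORD:
      return point_size_invariant ? NULL :
             "gl_PointCoord cannot be declared invariant "
             "unless gl_PointSize is also invariant";
   default:
      return NULL;
   }
}
```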


Re: [Mesa-dev] [PATCH] gallium/tgsi: use _mesa_roundevenf in micro_rnd

2016-05-26 Thread Lars Hamre
Gentle ping for a gallium developer.
If nobody has any issues I would appreciate a push.

Regards,
Lars Hamre

On Thu, May 19, 2016 at 6:16 PM, Matt Turner  wrote:
> On Thu, May 19, 2016 at 2:34 PM, Lars Hamre  wrote:
>> Fixes the following piglit tests (for softpipe):
>>
>> /spec/glsl-1.30/execution/built-in-functions/...
>> fs-roundeven-float
>> fs-roundeven-vec2
>> fs-roundeven-vec3
>> fs-roundeven-vec4
>> vs-roundeven-float
>> vs-roundeven-vec2
>> vs-roundeven-vec3
>> vs-roundeven-vec4
>>
>> /spec/glsl-1.50/execution/built-in-functions/...
>> gs-roundeven-float
>> gs-roundeven-vec2
>> gs-roundeven-vec3
>> gs-roundeven-vec4
>>
>> Signed-off-by: Lars Hamre 
>>
>> ---
>>
>> Note: someone with access will need to commit this
>>   after the review process.
>
> I'm not going to commit it myself because I don't work on a Gallium
> driver, but I'm very glad to see the patch.
>
> Reviewed-by: Matt Turner 
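GLSL `roundEven()` rounds halves to the nearest even integer (2.5 → 2, 3.5 → 4). C's `roundf()` instead rounds halves away from zero (`roundf(2.5f)` is 3.0f). A libm-free sketch of the roundEven semantics follows. `roundeven_sketch` is a hypothetical helper for illustration only; it is not the implementation of Mesa's `_mesa_roundevenf()`.

```c
#include <assert.h>

/* Round-half-to-even for float, written without libm so it is easy to test.
 * Note: the sign of a zero result is not preserved (-0.3f gives +0.0f). */
static float roundeven_sketch(float x)
{
   /* |x| >= 2^23 is already integral; NaN and inf fail both compares. */
   if (!(x > -8388608.0f && x < 8388608.0f))
      return x;

   long t = (long)x;          /* truncates toward zero */
   float frac = x - (float)t; /* exact here; in (-1, 1), same sign as x */
   int odd = (t % 2) != 0;

   if (frac > 0.5f || (frac == 0.5f && odd))
      t += 1;
   else if (frac < -0.5f || (frac == -0.5f && odd))
      t -= 1;
   return (float)t;
}
```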


Re: [Mesa-dev] [PATCH 1/2] nvc0/ir: avoid generating illegal instructions for compute constbuf loads

2016-05-26 Thread Hans de Goede

Hi,

On 26-05-16 14:43, Ilia Mirkin wrote:
> On Thu, May 26, 2016 at 8:41 AM, Hans de Goede  wrote:
>> Hi,
>>
>> On 26-05-16 04:44, Ilia Mirkin wrote:
>>> For user-supplied constbufs, fileIndex is 0. In that case, when we
>>> subtract 1, we'll end up loading from constbuf offset -16. This is
>>> illegal, and there are asserts to avoid it. Normally we'd just DCE it,
>>> but no point in generating the instructions if they're not going to be
>>> used.
>>>
>>> Signed-off-by: Ilia Mirkin 
>>> ---
>>>  src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>>> index 869040c..da2fa4b 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>>> @@ -2180,11 +2180,11 @@ NVC0LoweringPass::handleLDST(Instruction *i)
>>>   // memory.
>>>   int8_t fileIndex = i->getSrc(0)->reg.fileIndex - 1;
>>>   Value *ind = i->getIndirect(0, 1);
>>> - Value *ptr = loadUboInfo64(ind, fileIndex * 16);
>>>
>>>   // TODO: clamp the offset to the maximum number of const buf.
>>>   if (i->src(0).isIndirect(1)) {
>>>  Value *offset = bld.loadImm(NULL, i->getSrc(0)->reg.data.offset + typeSizeof(i->sType));
>>> +Value *ptr = loadUboInfo64(ind, fileIndex * 16);
>>>  Value *length = loadUboLength32(ind, fileIndex * 16);
>>>  Value *pred = new_LValue(func, FILE_PREDICATE);
>>>  if (i->src(0).isIndirect(0)) {
>>> @@ -2200,6 +2200,7 @@ NVC0LoweringPass::handleLDST(Instruction *i)
>>> bld.mkMov(i->getDef(0), bld.mkImm(0));
>>>  }
>>>   } else if (fileIndex >= 0) {
>>> +Value *ptr = loadUboInfo64(ind, fileIndex * 16);
>>>  if (i->src(0).isIndirect(0)) {
>>> bld.mkOp2(OP_ADD, TYPE_U64, ptr, ptr, i->getIndirect(0, 0));
>>>  }
>>
>> This patch does not seem to actually change anything, you've just moved the
>> exact same declaration to 2 places ... ?
>
> If loadUboInfo64 had no side-effects you'd be right. However it
> inserts instructions into the current (builder's) bb.

Ah, ok I see, fwiw this patch looks good to me then:

Acked-by: Hans de Goede 

Regards,

Hans

>   -ilia




Re: [Mesa-dev] [PATCH] gallium/tgsi: use _mesa_roundevenf in micro_rnd

2016-05-26 Thread Brian Paul

Will do.

-Brian

On 05/26/2016 06:51 AM, Lars Hamre wrote:
> Gentle ping for a gallium developer.
> If nobody has any issues I would appreciate a push.
>
> Regards,
> Lars Hamre
>
> On Thu, May 19, 2016 at 6:16 PM, Matt Turner  wrote:
>> On Thu, May 19, 2016 at 2:34 PM, Lars Hamre  wrote:
>>> Fixes the following piglit tests (for softpipe):
>>>
>>> /spec/glsl-1.30/execution/built-in-functions/...
>>> fs-roundeven-float
>>> fs-roundeven-vec2
>>> fs-roundeven-vec3
>>> fs-roundeven-vec4
>>> vs-roundeven-float
>>> vs-roundeven-vec2
>>> vs-roundeven-vec3
>>> vs-roundeven-vec4
>>>
>>> /spec/glsl-1.50/execution/built-in-functions/...
>>> gs-roundeven-float
>>> gs-roundeven-vec2
>>> gs-roundeven-vec3
>>> gs-roundeven-vec4
>>>
>>> Signed-off-by: Lars Hamre 
>>>
>>> ---
>>>
>>> Note: someone with access will need to commit this
>>>   after the review process.
>>
>> I'm not going to commit it myself because I don't work on a Gallium
>> driver, but I'm very glad to see the patch.
>>
>> Reviewed-by: Matt Turner 



Re: [Mesa-dev] [PATCH v3] swr: implement clipPlanes/clipVertex/clipDistance/cullDistance

2016-05-26 Thread Rowley, Timothy O

> On May 25, 2016, at 9:16 PM, Ilia Mirkin  wrote:
> 
> On Wed, May 25, 2016 at 10:03 PM, Tim Rowley  wrote:
>> v2: only load the clip vertex once
>> 
>> v3: fix clip enable logic, add cullDistance
>> ---
>> docs/GL3.txt   |  2 +-
>> src/gallium/drivers/swr/swr_context.h  |  2 ++
>> src/gallium/drivers/swr/swr_screen.cpp |  3 +-
>> src/gallium/drivers/swr/swr_shader.cpp | 63 ++
>> src/gallium/drivers/swr/swr_shader.h   |  4 +++
>> src/gallium/drivers/swr/swr_state.cpp  | 24 -
>> 6 files changed, 95 insertions(+), 3 deletions(-)
>> 
>> diff --git a/docs/GL3.txt b/docs/GL3.txt
>> index 555a9be..5965f25 100644
>> --- a/docs/GL3.txt
>> +++ b/docs/GL3.txt
>> @@ -211,7 +211,7 @@ GL 4.5, GLSL 4.50:
>>   GL_ARB_ES3_1_compatibilityDONE (nvc0, radeonsi)
>>   GL_ARB_clip_control   DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
>>   GL_ARB_conditional_render_invertedDONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
>> -  GL_ARB_cull_distance  DONE (i965, nv50, nvc0, llvmpipe, softpipe)
>> +  GL_ARB_cull_distance  DONE (i965, nv50, nvc0, llvmpipe, softpipe, swr)
>>   GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
>>   GL_ARB_direct_state_accessDONE (all drivers)
>>   GL_ARB_get_texture_sub_image  DONE (all drivers)
>> diff --git a/src/gallium/drivers/swr/swr_context.h b/src/gallium/drivers/swr/swr_context.h
>> index a7383bb..75ecae3 100644
>> --- a/src/gallium/drivers/swr/swr_context.h
>> +++ b/src/gallium/drivers/swr/swr_context.h
>> @@ -89,6 +89,8 @@ struct swr_draw_context {
>>swr_jit_texture texturesFS[PIPE_MAX_SHADER_SAMPLER_VIEWS];
>>swr_jit_sampler samplersFS[PIPE_MAX_SAMPLERS];
>> 
>> +   float userClipPlanes[PIPE_MAX_CLIP_PLANES][4];
>> +
>>SWR_SURFACE_STATE renderTargets[SWR_NUM_ATTACHMENTS];
>> };
>> 
>> diff --git a/src/gallium/drivers/swr/swr_screen.cpp b/src/gallium/drivers/swr/swr_screen.cpp
>> index 0772274..7851346 100644
>> --- a/src/gallium/drivers/swr/swr_screen.cpp
>> +++ b/src/gallium/drivers/swr/swr_screen.cpp
>> @@ -333,6 +333,8 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
>>case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
>>case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
>>   return 1;
>> +   case PIPE_CAP_CULL_DISTANCE:
>> +  return 1;
>>case PIPE_CAP_TGSI_TXQS:
>>case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
>>case PIPE_CAP_SHAREABLE_SHADERS:
>> @@ -358,7 +360,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
>>case PIPE_CAP_PCI_DEVICE:
>>case PIPE_CAP_PCI_FUNCTION:
>>case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
>> -   case PIPE_CAP_CULL_DISTANCE:
>>case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
>>   return 0;
>>}
>> diff --git a/src/gallium/drivers/swr/swr_shader.cpp b/src/gallium/drivers/swr/swr_shader.cpp
>> index f693f51..25ea7ae 100644
>> --- a/src/gallium/drivers/swr/swr_shader.cpp
>> +++ b/src/gallium/drivers/swr/swr_shader.cpp
>> @@ -40,6 +40,9 @@
>> #include "swr_state.h"
>> #include "swr_screen.h"
>> 
>> +static unsigned
>> +locate_linkage(ubyte name, ubyte index, struct tgsi_shader_info *info);
>> +
>> bool operator==(const swr_jit_fs_key &lhs, const swr_jit_fs_key &rhs)
>> {
>>return !memcmp(&lhs, &rhs, sizeof(lhs));
>> @@ -120,6 +123,11 @@ swr_generate_vs_key(struct swr_jit_vs_key &key,
>> {
>>memset(&key, 0, sizeof(key));
>> 
>> +   key.clip_plane_mask = ctx->rasterizer->clip_plane_enable;
>> +   key.clip_distance_mask = swr_vs->info.base.clipdist_writemask;
>> +   key.cull_distance_mask = swr_vs->info.base.culldist_writemask;
>> +   key.writes_clipvertex = swr_vs->info.base.writes_clipvertex;
>> +
>>swr_generate_sampler_key(swr_vs->info, ctx, PIPE_SHADER_VERTEX, key);
>> }
>> 
>> @@ -252,6 +260,61 @@ BuilderSWR::CompileVS(struct swr_context *ctx, swr_jit_vs_key &key)
>>   }
>>}
>> 
>> +   if (ctx->rasterizer->clip_plane_enable) {
> 
> I think you want if (ctx->rasterizer->clip_plane_enable &&
> (swr_vs->info.base.clipdist_writemask |
> swr_vs->info.base.culldist_writemask) == 0)
> 
> Note that for culling, clip_plane_enable won't be set. That's only for
> clip planes and cull distances.
> 

I think the test actually needs to be
"if (ctx->rasterizer->clip_plane_enable || swr_vs->info.base.culldist_writemask)",
since I need to do the output rewiring for clip and cull.

>> +  unsigned clip_mask = ctx->rasterizer->clip_plane_enable;
>> +
>> +  unsigned cv;
>> +  if (swr_vs->info.base.writes_clipvertex) {
>> + cv = 1 + locate_linkage(TGSI_SEMANTIC_CLIPVERTEX, 0,
>> + &swr_vs->info.base);
>> +  } else {
>> + for (int i = 0; i < PIPE_MAX_SHADER_

[Mesa-dev] [PATCH v4] swr: implement clipPlanes/clipVertex/clipDistance/cullDistance

2016-05-26 Thread Tim Rowley
v2: only load the clip vertex once

v3: fix clip enable logic, add cullDistance

v4: remove duplicate fields in vs jit key, fix test of clip fixup needed
---
 docs/GL3.txt   |  2 +-
 src/gallium/drivers/swr/swr_context.h  |  2 ++
 src/gallium/drivers/swr/swr_screen.cpp |  3 +-
 src/gallium/drivers/swr/swr_shader.cpp | 64 ++
 src/gallium/drivers/swr/swr_shader.h   |  1 +
 src/gallium/drivers/swr/swr_state.cpp  | 24 -
 6 files changed, 93 insertions(+), 3 deletions(-)
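The v4 mask logic can be read in isolation. Below is a minimal sketch with hypothetical helper names. `clip_plane_key_mask()` mirrors the new `key.clip_plane_mask` expression in `swr_generate_vs_key()`, and `needs_clip_fixup()` mirrors the new condition in `CompileVS`. `user_clip_distance()` uses the usual dot-product definition of a user clip plane distance. The hunk that actually computes the distances is truncated in this archive, so that last function is our assumption rather than the patch's code.

```c
#include <assert.h>

/* Mirrors the v4 key expression: when the VS writes clip distances, keep only
 * the enabled planes it writes; otherwise use every enabled user clip plane. */
static unsigned clip_plane_key_mask(unsigned clipdist_writemask,
                                    unsigned clip_plane_enable)
{
   return clipdist_writemask ? (clipdist_writemask & clip_plane_enable)
                             : clip_plane_enable;
}

/* Mirrors the v4 CompileVS condition: rewire outputs if any clip plane is
 * enabled or the VS writes any cull distance. */
static int needs_clip_fixup(unsigned clip_plane_enable,
                            unsigned culldist_writemask)
{
   return clip_plane_enable != 0 || culldist_writemask != 0;
}

/* Assumed, not taken from the truncated hunk: the signed distance of a
 * clip-space vertex to a user plane is dot(plane, vertex); negative values
 * lie on the clipped side. */
static float user_clip_distance(const float plane[4], const float v[4])
{
   return plane[0] * v[0] + plane[1] * v[1] + plane[2] * v[2] +
          plane[3] * v[3];
}
```

The first helper shows why v4 dropped the separate clip/cull key fields. A single mask that already accounts for shader-written clip distances is enough to select which planes to evaluate.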

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 555a9be..5965f25 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -211,7 +211,7 @@ GL 4.5, GLSL 4.50:
   GL_ARB_ES3_1_compatibilityDONE (nvc0, radeonsi)
   GL_ARB_clip_control   DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_conditional_render_invertedDONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
-  GL_ARB_cull_distance  DONE (i965, nv50, nvc0, llvmpipe, softpipe)
+  GL_ARB_cull_distance  DONE (i965, nv50, nvc0, llvmpipe, softpipe, swr)
   GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
   GL_ARB_direct_state_accessDONE (all drivers)
   GL_ARB_get_texture_sub_image  DONE (all drivers)
diff --git a/src/gallium/drivers/swr/swr_context.h b/src/gallium/drivers/swr/swr_context.h
index a7383bb..75ecae3 100644
--- a/src/gallium/drivers/swr/swr_context.h
+++ b/src/gallium/drivers/swr/swr_context.h
@@ -89,6 +89,8 @@ struct swr_draw_context {
swr_jit_texture texturesFS[PIPE_MAX_SHADER_SAMPLER_VIEWS];
swr_jit_sampler samplersFS[PIPE_MAX_SAMPLERS];
 
+   float userClipPlanes[PIPE_MAX_CLIP_PLANES][4];
+
SWR_SURFACE_STATE renderTargets[SWR_NUM_ATTACHMENTS];
 };
 
diff --git a/src/gallium/drivers/swr/swr_screen.cpp b/src/gallium/drivers/swr/swr_screen.cpp
index 0772274..7851346 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -333,6 +333,8 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
   return 1;
+   case PIPE_CAP_CULL_DISTANCE:
+  return 1;
case PIPE_CAP_TGSI_TXQS:
case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
case PIPE_CAP_SHAREABLE_SHADERS:
@@ -358,7 +360,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
case PIPE_CAP_PCI_DEVICE:
case PIPE_CAP_PCI_FUNCTION:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
-   case PIPE_CAP_CULL_DISTANCE:
case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
   return 0;
}
diff --git a/src/gallium/drivers/swr/swr_shader.cpp b/src/gallium/drivers/swr/swr_shader.cpp
index f693f51..5201c8f 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -40,6 +40,9 @@
 #include "swr_state.h"
 #include "swr_screen.h"
 
+static unsigned
+locate_linkage(ubyte name, ubyte index, struct tgsi_shader_info *info);
+
 bool operator==(const swr_jit_fs_key &lhs, const swr_jit_fs_key &rhs)
 {
return !memcmp(&lhs, &rhs, sizeof(lhs));
@@ -120,6 +123,11 @@ swr_generate_vs_key(struct swr_jit_vs_key &key,
 {
memset(&key, 0, sizeof(key));
 
+   key.clip_plane_mask =
+  swr_vs->info.base.clipdist_writemask ?
+  swr_vs->info.base.clipdist_writemask & ctx->rasterizer->clip_plane_enable :
+  ctx->rasterizer->clip_plane_enable;
+
swr_generate_sampler_key(swr_vs->info, ctx, PIPE_SHADER_VERTEX, key);
 }
 
@@ -252,6 +260,62 @@ BuilderSWR::CompileVS(struct swr_context *ctx, swr_jit_vs_key &key)
   }
}
 
+   if (ctx->rasterizer->clip_plane_enable ||
+   swr_vs->info.base.culldist_writemask) {
+  unsigned clip_mask = ctx->rasterizer->clip_plane_enable;
+
+  unsigned cv;
+  if (swr_vs->info.base.writes_clipvertex) {
+ cv = 1 + locate_linkage(TGSI_SEMANTIC_CLIPVERTEX, 0,
+ &swr_vs->info.base);
+  } else {
+ for (int i = 0; i < PIPE_MAX_SHADER_OUTPUTS; i++) {
+if (swr_vs->info.base.output_semantic_name[i] == TGSI_SEMANTIC_POSITION &&
+swr_vs->info.base.output_semantic_index[i] == 0) {
+   cv = i;
+   break;
+}
+ }
+  }
+  LLVMValueRef cx = LLVMBuildLoad(gallivm->builder, outputs[cv][0], "");
+  LLVMValueRef cy = LLVMBuildLoad(gallivm->builder, outputs[cv][1], "");
+  LLVMValueRef cz = LLVMBuildLoad(gallivm->builder, outputs[cv][2], "");
+  LLVMValueRef cw = LLVMBuildLoad(gallivm->builder, outputs[cv][3], "");
+
+  for (unsigned val = 0; val < PIPE_MAX_CLIP_PLANES; val++) {
+ // clip distance overrides user clip planes
+ if ((swr_vs->info.base.clipdist_writemask & clip_mask & (1 << val)) ||
+ (

Re: [Mesa-dev] [PATCH 01/10] compiler: Move glsl_to_nir to libglsl.la

2016-05-26 Thread Emil Velikov
On 26 May 2016 at 13:39, Rob Herring  wrote:
> On Thu, May 26, 2016 at 6:59 AM, Emil Velikov  wrote:
>> Hi Jason,
>>
>> On 26 May 2016 at 02:52, Jason Ekstrand  wrote:
>>> Right now libglsl.la depends on libnir.la so putting it in libnir.la
>>> adds a dependency on libglsl.la that goes the wrong direction.
>>> ---
>>>  src/compiler/Makefile.am  |2 +
>>>  src/compiler/Makefile.nir.am  |5 -
>>>  src/compiler/Makefile.sources |4 +-
>>>  src/compiler/glsl/glsl_to_nir.cpp | 2026 +
>>>  src/compiler/glsl/glsl_to_nir.h   |   42 +
>>>  src/compiler/nir/glsl_to_nir.cpp  | 2026 -
>>>  src/compiler/nir/glsl_to_nir.h|   42 -
>>>  src/mesa/drivers/dri/i965/brw_nir.c   |2 +-
>>>  src/mesa/state_tracker/st_glsl_to_nir.cpp |2 +-
>>>  9 files changed, 2074 insertions(+), 2077 deletions(-)
>>>  create mode 100644 src/compiler/glsl/glsl_to_nir.cpp
>>>  create mode 100644 src/compiler/glsl/glsl_to_nir.h
>>>  delete mode 100644 src/compiler/nir/glsl_to_nir.cpp
>>>  delete mode 100644 src/compiler/nir/glsl_to_nir.h
>>>
>>> diff --git a/src/compiler/Makefile.am b/src/compiler/Makefile.am
>>> index dc30f90..710ac5a 100644
>>> --- a/src/compiler/Makefile.am
>>> +++ b/src/compiler/Makefile.am
>>> @@ -31,6 +31,8 @@ AM_CPPFLAGS = \
>>> -I$(top_builddir)/src/compiler/glsl\
>>> -I$(top_srcdir)/src/compiler/glsl\
>>> -I$(top_srcdir)/src/compiler/glsl/glcpp\
>>> +   -I$(top_builddir)/src/compiler/nir\
>>> +   -I$(top_srcdir)/src/compiler/nir\
>> These should be moved and/or duplicated in Makefile.glsl.am, shouldn't
>> they? Please add a space before \, as Matt suggested.
>>
>> On the SCons side you'll likely need something like the following. If
>> it doesn't work out, please cc Jose so that he's aware of things going
>> crazy.
>
> Android too will need something similar.
>
The Android side is way too messy (from a quick thought) for the
same reasons affecting genxml file generation. The quick hack that
may or may not work is below, while a better one would be to create
a dummy target used solely for generating the headers and to use it
instead.

--- a/src/compiler/Android.glsl.mk
+++ b/src/compiler/Android.glsl.mk
@@ -43,7 +43,9 @@ LOCAL_C_INCLUDES := \
   $(MESA_TOP)/src/gallium/include \
   $(MESA_TOP)/src/gallium/auxiliary

-LOCAL_STATIC_LIBRARIES := libmesa_compiler
+# Adding nir, triggers generation of its headers (used by glsl_to_nir)
+# and expands the includes list, via its EXPORT_C_INCLUDES.
+LOCAL_STATIC_LIBRARIES := libmesa_compiler libmesa_nir

LOCAL_MODULE := libmesa_glsl

^^ Is obviously untested.

-Emil


[Mesa-dev] [PATCH] mesa: add warnings .out files on glsl/tests .gitignore

2016-05-26 Thread Alejandro Piñeiro
---
 src/compiler/glsl/tests/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/glsl/tests/.gitignore b/src/compiler/glsl/tests/.gitignore
index 13dcdc4..1c38cd2 100644
--- a/src/compiler/glsl/tests/.gitignore
+++ b/src/compiler/glsl/tests/.gitignore
@@ -3,3 +3,4 @@ ralloc-test
 uniform-initializer-test
 sampler-types-test
 general-ir-test
+warnings/*.out
\ No newline at end of file
-- 
2.7.4



[Mesa-dev] [PATCH v3 03/14] radeonsi: Define build_tbuffer_store_dwords earlier to support new users.

2016-05-26 Thread Bas Nieuwenhuizen
Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Nicolai Hähnle 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader.c | 138 +++
 1 file changed, 69 insertions(+), 69 deletions(-)
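The overload selection described in the comment on `build_tbuffer_store()` is easy to check on its own. This is a standalone sketch with hypothetical names. It reproduces three things from the patch: the suffix choice (`CLAMP(num_channels, 1, 3) - 1` indexing {"i32", "v2i32", "v4i32"}), the intrinsic-name formatting, and the 12-bit instruction-offset assertion. It does not make any LLVM calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Overload suffix for a channel count: 1 -> i32, 2 -> v2i32,
 * 3 or 4 -> v4i32 (a 3-channel store still passes a 4-wide vector). */
static const char *tbuffer_type_suffix(unsigned num_channels)
{
   static const char *const types[] = {"i32", "v2i32", "v4i32"};
   assert(num_channels >= 1 && num_channels <= 4);
   unsigned clamped = num_channels > 3 ? 3 : num_channels; /* CLAMP(n, 1, 3) */
   return types[clamped - 1];
}

/* Builds the overloaded intrinsic name; returns nonzero if it fit in buf. */
static int tbuffer_intrinsic_name(unsigned num_channels, char *buf, size_t size)
{
   int len = snprintf(buf, size, "llvm.SI.tbuffer.store.%s",
                      tbuffer_type_suffix(num_channels));
   return len >= 0 && (size_t)len < size;
}

/* Without OFFEN, the immediate instruction offset must fit in 12 bits. */
static int inst_offset_ok(unsigned offen, unsigned inst_offset)
{
   return offen || inst_offset < (1u << 12);
}
```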

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 1f162b5..6690f05 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -671,6 +671,75 @@ static LLVMValueRef get_dw_address(struct si_shader_context *ctx,
lp_build_const_int32(gallivm, param * 4), "");
 }
 
+/* TBUFFER_STORE_FORMAT_{X,XY,XYZ,XYZW} <- the suffix is selected by num_channels=1..4.
+ * The type of vdata must be one of i32 (num_channels=1), v2i32 (num_channels=2),
+ * or v4i32 (num_channels=3,4). */
+static void build_tbuffer_store(struct si_shader_context *ctx,
+   LLVMValueRef rsrc,
+   LLVMValueRef vdata,
+   unsigned num_channels,
+   LLVMValueRef vaddr,
+   LLVMValueRef soffset,
+   unsigned inst_offset,
+   unsigned dfmt,
+   unsigned nfmt,
+   unsigned offen,
+   unsigned idxen,
+   unsigned glc,
+   unsigned slc,
+   unsigned tfe)
+{
+   struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
+   LLVMValueRef args[] = {
+   rsrc,
+   vdata,
+   LLVMConstInt(ctx->i32, num_channels, 0),
+   vaddr,
+   soffset,
+   LLVMConstInt(ctx->i32, inst_offset, 0),
+   LLVMConstInt(ctx->i32, dfmt, 0),
+   LLVMConstInt(ctx->i32, nfmt, 0),
+   LLVMConstInt(ctx->i32, offen, 0),
+   LLVMConstInt(ctx->i32, idxen, 0),
+   LLVMConstInt(ctx->i32, glc, 0),
+   LLVMConstInt(ctx->i32, slc, 0),
+   LLVMConstInt(ctx->i32, tfe, 0)
+   };
+
+   /* The instruction offset field has 12 bits */
+   assert(offen || inst_offset < (1 << 12));
+
+   /* The intrinsic is overloaded, we need to add a type suffix for overloading to work. */
+   unsigned func = CLAMP(num_channels, 1, 3) - 1;
+   const char *types[] = {"i32", "v2i32", "v4i32"};
+   char name[256];
+   snprintf(name, sizeof(name), "llvm.SI.tbuffer.store.%s", types[func]);
+
+   lp_build_intrinsic(gallivm->builder, name, ctx->voidt,
+  args, ARRAY_SIZE(args), 0);
+}
+
+static void build_tbuffer_store_dwords(struct si_shader_context *ctx,
+LLVMValueRef rsrc,
+LLVMValueRef vdata,
+unsigned num_channels,
+LLVMValueRef vaddr,
+LLVMValueRef soffset,
+unsigned inst_offset)
+{
+   static unsigned dfmt[] = {
+   V_008F0C_BUF_DATA_FORMAT_32,
+   V_008F0C_BUF_DATA_FORMAT_32_32,
+   V_008F0C_BUF_DATA_FORMAT_32_32_32,
+   V_008F0C_BUF_DATA_FORMAT_32_32_32_32
+   };
+   assert(num_channels >= 1 && num_channels <= 4);
+
+   build_tbuffer_store(ctx, rsrc, vdata, num_channels, vaddr, soffset,
+   inst_offset, dfmt[num_channels-1],
+   V_008F0C_BUF_NUM_FORMAT_UINT, 1, 0, 1, 1, 0);
+}
+
 /**
  * Load from LDS.
  *
@@ -1844,75 +1913,6 @@ static void si_dump_streamout(struct pipe_stream_output_info *so)
}
 }
 
-/* TBUFFER_STORE_FORMAT_{X,XY,XYZ,XYZW} <- the suffix is selected by num_channels=1..4.
- * The type of vdata must be one of i32 (num_channels=1), v2i32 (num_channels=2),
- * or v4i32 (num_channels=3,4). */
-static void build_tbuffer_store(struct si_shader_context *ctx,
-   LLVMValueRef rsrc,
-   LLVMValueRef vdata,
-   unsigned num_channels,
-   LLVMValueRef vaddr,
-   LLVMValueRef soffset,
-   unsigned inst_offset,
-   unsigned dfmt,
-   unsigned nfmt,
-   unsigned offen,
-   unsigned idxen,
-   unsigned glc,
-   unsigned slc,
-   unsigned tfe)
-{
-   struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
-   LLVMValueRef args[] = {
-   rsrc,
-   vdata,
-   LLVMConstInt(ctx->i32, num_channels, 0),
-   vaddr,
-   soffset,
-

[Mesa-dev] [PATCH v3 08/14] radeonsi: Store inputs to memory when not using a TCS.

2016-05-26 Thread Bas Nieuwenhuizen
We need to copy the VS outputs to memory. I decided to do this
using a shader key, as the value depends on other shaders.

I also switch the fixed function TCS over to monolithic, as
otherwise many of the user SGPRs need to be passed to the
epilog, which increases register pressure, or complexity to
avoid that. The main body of the fixed function TCS is not
that interesting to precompile anyway, since we do it on
demand and it is very small.

v2: Use u_bit_scan64.

Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader.c| 45 +
 src/gallium/drivers/radeonsi/si_shader.h|  1 +
 src/gallium/drivers/radeonsi/si_state_shaders.c |  3 ++
 3 files changed, 49 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 83bcf5e..b04d0f7 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2444,6 +2444,48 @@ handle_semantic:
}
 }
 
+static void si_copy_tcs_inputs(struct lp_build_tgsi_context *bld_base)
+{
+   struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   LLVMValueRef invocation_id, rw_buffers, buffer, buffer_offset;
+   LLVMValueRef lds_vertex_stride, lds_vertex_offset, lds_base;
+   uint64_t inputs;
+
+   invocation_id = unpack_param(ctx, SI_PARAM_REL_IDS, 8, 5);
+
+   rw_buffers = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_RW_BUFFERS);
+   buffer = build_indexed_load_const(ctx, rw_buffers,
+   lp_build_const_int32(gallivm, SI_HS_RING_TESS_OFFCHIP));
+
+   buffer_offset = LLVMGetParam(ctx->radeon_bld.main_fn, ctx->param_oc_lds);
+
+   lds_vertex_stride = unpack_param(ctx, SI_PARAM_TCS_IN_LAYOUT, 13, 8);
+   lds_vertex_offset = LLVMBuildMul(gallivm->builder, invocation_id,
+lds_vertex_stride, "");
+   lds_base = get_tcs_in_current_patch_offset(ctx);
+   lds_base = LLVMBuildAdd(gallivm->builder, lds_base, lds_vertex_offset, "");
+
+   inputs = ctx->shader->key.tcs.epilog.inputs_to_copy;
+   while (inputs) {
+   unsigned i = u_bit_scan64(&inputs);
+
+   LLVMValueRef lds_ptr = LLVMBuildAdd(gallivm->builder, lds_base,
+   lp_build_const_int32(gallivm, 4 * i),
+"");
+
+   LLVMValueRef buffer_addr = get_tcs_tes_buffer_address(ctx,
+ invocation_id,
+ lp_build_const_int32(gallivm, i));
+
+   LLVMValueRef value = lds_load(bld_base, TGSI_TYPE_SIGNED, ~0,
+ lds_ptr);
+
+   build_tbuffer_store_dwords(ctx, buffer, value, 4, buffer_addr,
+  buffer_offset, 0);
+   }
+}
+
 static void si_write_tess_factors(struct lp_build_tgsi_context *bld_base,
  LLVMValueRef rel_patch_id,
  LLVMValueRef invocation_id,
@@ -2585,6 +2627,7 @@ static void si_llvm_emit_tcs_epilogue(struct lp_build_tgsi_context *bld_base)
return;
}
 
+   si_copy_tcs_inputs(bld_base);
si_write_tess_factors(bld_base, rel_patch_id, invocation_id, tf_lds_offset);
 }
 
@@ -7426,6 +7469,8 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
  shader->key.vs.as_ls != mainp->key.vs.as_ls)) ||
(shader->selector->type == PIPE_SHADER_TESS_EVAL &&
 shader->key.tes.as_es != mainp->key.tes.as_es) ||
+   (shader->selector->type == PIPE_SHADER_TESS_CTRL &&
+shader->key.tcs.epilog.inputs_to_copy) ||
shader->selector->type == PIPE_SHADER_COMPUTE) {
/* Monolithic shader (compiled as a whole, has many variants,
 * may take a long time to compile).
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index 26be25e..67b457b 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -304,6 +304,7 @@ struct si_vs_epilog_bits {
 /* Common TCS bits between the shader key and the epilog key. */
 struct si_tcs_epilog_bits {
unsignedprim_mode:3;
+   uint64_tinputs_to_copy;
 };
 
 /* Common PS bits between the shader key and the prolog key. */
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 968fc88..2aecfa3 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -846,6 +846,9 @@ static inline void si_shader_selector_key(struct pipe_context *ctx,
case PIPE_SHADER_TESS_CTRL:
key->tcs.epilo

[Mesa-dev] [PATCH v3 02/14] radeonsi: Add offchip tessellation parameters.

2016-05-26 Thread Bas Nieuwenhuizen
Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Nicolai Hähnle 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader.c| 28 -
 src/gallium/drivers/radeonsi/si_shader.h|  3 ++-
 src/gallium/drivers/radeonsi/si_state_shaders.c |  9 
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 3df7820..1f162b5 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -91,6 +91,12 @@ struct si_shader_context
int param_tes_rel_patch_id;
int param_tes_patch_id;
int param_es2gs_offset;
+   int param_oc_lds;
+
+   /* Sets a bit if the dynamic HS control word was 0x8000. The bit is
+* 0x80 for VS, 0x1 for ES.
+*/
+   int param_tess_offchip;
 
LLVMTargetMachineRef tm;
 
@@ -2326,14 +2332,14 @@ static void si_llvm_emit_tcs_epilogue(struct lp_build_tgsi_context *bld_base)
tf_soffset = LLVMGetParam(ctx->radeon_bld.main_fn,
  SI_PARAM_TESS_FACTOR_OFFSET);
ret = LLVMBuildInsertValue(builder, ret, tf_soffset,
-  SI_TCS_NUM_USER_SGPR, "");
+  SI_TCS_NUM_USER_SGPR + 1, "");
 
/* VGPRs */
rel_patch_id = bitcast(bld_base, TGSI_TYPE_FLOAT, rel_patch_id);
invocation_id = bitcast(bld_base, TGSI_TYPE_FLOAT, invocation_id);
tf_lds_offset = bitcast(bld_base, TGSI_TYPE_FLOAT, tf_lds_offset);
 
-   vgpr = SI_TCS_NUM_USER_SGPR + 1;
+   vgpr = SI_TCS_NUM_USER_SGPR + 2;
ret = LLVMBuildInsertValue(builder, ret, rel_patch_id, vgpr++, "");
ret = LLVMBuildInsertValue(builder, ret, invocation_id, vgpr++, "");
ret = LLVMBuildInsertValue(builder, ret, tf_lds_offset, vgpr++, "");
@@ -4945,7 +4951,11 @@ static void declare_streamout_params(struct si_shader_context *ctx,
 
/* Streamout SGPRs. */
if (so->num_outputs) {
-   params[ctx->param_streamout_config = (*num_params)++] = i32;
+   if (ctx->type != PIPE_SHADER_TESS_EVAL)
+   params[ctx->param_streamout_config = (*num_params)++] = i32;
+   else
+   ctx->param_streamout_config = ctx->param_tess_offchip;
+
params[ctx->param_streamout_write_index = (*num_params)++] = i32;
}
/* A streamout buffer offset is loaded if the stride is non-zero. */
@@ -5065,6 +5075,7 @@ static void create_function(struct si_shader_context *ctx)
params[SI_PARAM_TCS_OUT_OFFSETS] = ctx->i32;
params[SI_PARAM_TCS_OUT_LAYOUT] = ctx->i32;
params[SI_PARAM_TCS_IN_LAYOUT] = ctx->i32;
+   params[ctx->param_oc_lds = SI_PARAM_TCS_OC_LDS] = ctx->i32;
params[SI_PARAM_TESS_FACTOR_OFFSET] = ctx->i32;
last_sgpr = SI_PARAM_TESS_FACTOR_OFFSET;
 
@@ -5074,8 +5085,10 @@ static void create_function(struct si_shader_context *ctx)
num_params = SI_PARAM_REL_IDS+1;
 
if (!ctx->is_monolithic) {
-   /* PARAM_TESS_FACTOR_OFFSET is after user SGPRs. */
-   for (i = 0; i <= SI_TCS_NUM_USER_SGPR; i++)
+   /* SI_PARAM_TCS_OC_LDS and PARAM_TESS_FACTOR_OFFSET are
+* placed after the user SGPRs.
+*/
+   for (i = 0; i < SI_TCS_NUM_USER_SGPR + 2; i++)
returns[num_returns++] = ctx->i32; /* SGPRs */
 
for (i = 0; i < 3; i++)
@@ -5089,10 +5102,14 @@ static void create_function(struct si_shader_context *ctx)
num_params = SI_PARAM_TCS_OUT_LAYOUT+1;
 
if (shader->key.tes.as_es) {
+   params[ctx->param_oc_lds = num_params++] = ctx->i32;
+   params[ctx->param_tess_offchip = num_params++] = ctx->i32;
params[ctx->param_es2gs_offset = num_params++] = ctx->i32;
} else {
+   params[ctx->param_tess_offchip = num_params++] = ctx->i32;
declare_streamout_params(ctx, &shader->selector->so,
 params, ctx->i32, &num_params);
+   params[ctx->param_oc_lds = num_params++] = ctx->i32;
}
last_sgpr = num_params - 1;
 
@@ -6640,6 +6657,7 @@ static bool si_compile_tcs_epilog(struct si_screen *sscreen,
params[SI_PARAM_TCS_OUT_OFFSETS] = ctx.i32;
params[SI_PARAM_TCS_OUT_LAYOUT] = ctx.i32;
params[SI_PARAM_TCS_IN_LAYOUT] = ctx.i32;
+   params[ctx.param_oc_lds = SI_PARAM_TCS_OC_LDS] = ctx.i32;
params[SI_PARAM_TESS_FAC

[Mesa-dev] [PATCH v3 05/14] radeonsi: Use correct parameter index for LS_OUT_LAYOUT.

2016-05-26 Thread Bas Nieuwenhuizen
SI_PARAM_LS_OUT_LAYOUT currently happens to be in the right position,
but that changes when TCS/TES get new parameters.
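To illustrate why an explicit alias is safer, here is a simplified sketch with made-up enumerator names and a made-up EX_NUM_RESOURCE_PARAMS of 4 (these are not the real si_shader.h values). In the "old" enum, adding a parameter at the start of the TCS list, as the offchip-layout patch in this series does with SI_PARAM_TCS_OFFCHIP_LAYOUT, pushes LS_OUT_LAYOUT one slot past the VS slot it is supposed to share. The "new" enum pins it to START_INSTANCE + 1.

```c
#include <assert.h>

#define EX_NUM_RESOURCE_PARAMS 4 /* hypothetical value for illustration */

/* Before: LS_OUT_LAYOUT follows the TCS list and only "happens" to land
 * on the VS slot after START_INSTANCE. */
enum old_params {
   OLD_VERTEX_BUFFERS = EX_NUM_RESOURCE_PARAMS,
   OLD_BASE_VERTEX,
   OLD_START_INSTANCE,
   OLD_VS_STATE_BITS,

   OLD_TCS_OFFCHIP_LAYOUT = EX_NUM_RESOURCE_PARAMS, /* parameter added later */
   OLD_TCS_OUT_OFFSETS,
   OLD_TCS_OUT_LAYOUT,
   OLD_TCS_IN_LAYOUT,
   OLD_LS_OUT_LAYOUT, /* now one slot too far */
};

/* After: LS_OUT_LAYOUT is tied to the VS layout explicitly, so new TCS
 * parameters cannot move it. */
enum new_params {
   NEW_VERTEX_BUFFERS = EX_NUM_RESOURCE_PARAMS,
   NEW_BASE_VERTEX,
   NEW_START_INSTANCE,
   NEW_VS_STATE_BITS,
   NEW_LS_OUT_LAYOUT = NEW_START_INSTANCE + 1,

   NEW_TCS_OFFCHIP_LAYOUT = EX_NUM_RESOURCE_PARAMS,
   NEW_TCS_OUT_OFFSETS,
   NEW_TCS_OUT_LAYOUT,
   NEW_TCS_IN_LAYOUT,
};

static_assert(NEW_LS_OUT_LAYOUT == NEW_START_INSTANCE + 1,
              "LS_OUT_LAYOUT must follow START_INSTANCE");
```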

Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Nicolai Hähnle 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index 61ddcd1..7b1cbf9 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -141,8 +141,10 @@ enum {
SI_PARAM_VERTEX_BUFFERS = SI_NUM_RESOURCE_PARAMS,
SI_PARAM_BASE_VERTEX,
SI_PARAM_START_INSTANCE,
-   /* [0] = clamp vertex color */
+   /* [0] = clamp vertex color, VS as VS only */
SI_PARAM_VS_STATE_BITS,
+   /* same value as TCS_IN_LAYOUT, VS as LS only */
+   SI_PARAM_LS_OUT_LAYOUT = SI_PARAM_START_INSTANCE + 1,
/* the other VS parameters are assigned dynamically */
 
/* Offsets where TCS outputs and TCS patch outputs live in LDS:
@@ -163,10 +165,9 @@ enum {
 *   [13:20] = stride between vertices in dwords = num_inputs * 4, max = 32*4
 */
SI_PARAM_TCS_IN_LAYOUT,  /* TCS only */
-   SI_PARAM_LS_OUT_LAYOUT,  /* same value as TCS_IN_LAYOUT, LS only */
 
/* TCS only parameters. */
-   SI_PARAM_TCS_OC_LDS = SI_PARAM_TCS_IN_LAYOUT + 1,
+   SI_PARAM_TCS_OC_LDS,
SI_PARAM_TESS_FACTOR_OFFSET,
SI_PARAM_PATCH_ID,
SI_PARAM_REL_IDS,
-- 
2.8.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4] swr: implement clipPlanes/clipVertex/clipDistance/cullDistance

2016-05-26 Thread Ilia Mirkin
On Thu, May 26, 2016 at 9:11 AM, Tim Rowley  wrote:
> v2: only load the clip vertex once
>
> v3: fix clip enable logic, add cullDistance
>
> v4: remove duplicate fields in vs jit key, fix test of clip fixup needed
> ---
>  docs/GL3.txt   |  2 +-
>  src/gallium/drivers/swr/swr_context.h  |  2 ++
>  src/gallium/drivers/swr/swr_screen.cpp |  3 +-
>  src/gallium/drivers/swr/swr_shader.cpp | 64 ++
>  src/gallium/drivers/swr/swr_shader.h   |  1 +
>  src/gallium/drivers/swr/swr_state.cpp  | 24 -
>  6 files changed, 93 insertions(+), 3 deletions(-)
>
> diff --git a/docs/GL3.txt b/docs/GL3.txt
> index 555a9be..5965f25 100644
> --- a/docs/GL3.txt
> +++ b/docs/GL3.txt
> @@ -211,7 +211,7 @@ GL 4.5, GLSL 4.50:
>GL_ARB_ES3_1_compatibility DONE (nvc0, radeonsi)
>GL_ARB_clip_control   DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
>GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
> -  GL_ARB_cull_distance  DONE (i965, nv50, nvc0, llvmpipe, softpipe)
> +  GL_ARB_cull_distance  DONE (i965, nv50, nvc0, llvmpipe, softpipe, swr)
>GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
>GL_ARB_direct_state_access DONE (all drivers)
>GL_ARB_get_texture_sub_image  DONE (all drivers)
> diff --git a/src/gallium/drivers/swr/swr_context.h b/src/gallium/drivers/swr/swr_context.h
> index a7383bb..75ecae3 100644
> --- a/src/gallium/drivers/swr/swr_context.h
> +++ b/src/gallium/drivers/swr/swr_context.h
> @@ -89,6 +89,8 @@ struct swr_draw_context {
> swr_jit_texture texturesFS[PIPE_MAX_SHADER_SAMPLER_VIEWS];
> swr_jit_sampler samplersFS[PIPE_MAX_SAMPLERS];
>
> +   float userClipPlanes[PIPE_MAX_CLIP_PLANES][4];
> +
> SWR_SURFACE_STATE renderTargets[SWR_NUM_ATTACHMENTS];
>  };
>
> diff --git a/src/gallium/drivers/swr/swr_screen.cpp b/src/gallium/drivers/swr/swr_screen.cpp
> index 0772274..7851346 100644
> --- a/src/gallium/drivers/swr/swr_screen.cpp
> +++ b/src/gallium/drivers/swr/swr_screen.cpp
> @@ -333,6 +333,8 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
> case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
> case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
>return 1;
> +   case PIPE_CAP_CULL_DISTANCE:
> +  return 1;
> case PIPE_CAP_TGSI_TXQS:
> case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
> case PIPE_CAP_SHAREABLE_SHADERS:
> @@ -358,7 +360,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
> case PIPE_CAP_PCI_DEVICE:
> case PIPE_CAP_PCI_FUNCTION:
> case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
> -   case PIPE_CAP_CULL_DISTANCE:
> case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
>return 0;
> }
> diff --git a/src/gallium/drivers/swr/swr_shader.cpp b/src/gallium/drivers/swr/swr_shader.cpp
> index f693f51..5201c8f 100644
> --- a/src/gallium/drivers/swr/swr_shader.cpp
> +++ b/src/gallium/drivers/swr/swr_shader.cpp
> @@ -40,6 +40,9 @@
>  #include "swr_state.h"
>  #include "swr_screen.h"
>
> +static unsigned
> +locate_linkage(ubyte name, ubyte index, struct tgsi_shader_info *info);
> +
>  bool operator==(const swr_jit_fs_key &lhs, const swr_jit_fs_key &rhs)
>  {
> return !memcmp(&lhs, &rhs, sizeof(lhs));
> @@ -120,6 +123,11 @@ swr_generate_vs_key(struct swr_jit_vs_key &key,
>  {
> memset(&key, 0, sizeof(key));
>
> +   key.clip_plane_mask =
> +  swr_vs->info.base.clipdist_writemask ?
> +  swr_vs->info.base.clipdist_writemask & ctx->rasterizer->clip_plane_enable :
> +  ctx->rasterizer->clip_plane_enable;

What about cull planes? What does this key control exactly? If it's
clip | cull, then this needs to be more like

(swr_vs->info.base.clipdist_writemask &
ctx->rasterizer->clip_plane_enable) |
swr_vs->info.base.culldist_writemask

[if clipdist | culldist are written, otherwise just clip_plane_enable,
like you have it now]
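A plain-integer sketch of the expression suggested above, assuming (as the question implies) that the key is meant to cover both clip and cull distances. example_clip_plane_mask() and its parameters are hypothetical names for this example, not swr code.

```c
#include <assert.h>

/* If the VS writes clip or cull distances, keep only the enabled clip
 * distances and always include the written cull distances; otherwise
 * fall back to the rasterizer's clip_plane_enable mask. */
static unsigned
example_clip_plane_mask(unsigned clipdist_writemask,
                        unsigned culldist_writemask,
                        unsigned clip_plane_enable)
{
   if (clipdist_writemask | culldist_writemask)
      return (clipdist_writemask & clip_plane_enable) | culldist_writemask;
   return clip_plane_enable;
}
```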

> +
> swr_generate_sampler_key(swr_vs->info, ctx, PIPE_SHADER_VERTEX, key);
>  }
>
> @@ -252,6 +260,62 @@ BuilderSWR::CompileVS(struct swr_context *ctx, swr_jit_vs_key &key)
>}
> }
>
> +   if (ctx->rasterizer->clip_plane_enable ||
> +   swr_vs->info.base.culldist_writemask) {
> +  unsigned clip_mask = ctx->rasterizer->clip_plane_enable;
> +
> +  unsigned cv;
> +  if (swr_vs->info.base.writes_clipvertex) {
> + cv = 1 + locate_linkage(TGSI_SEMANTIC_CLIPVERTEX, 0,
> + &swr_vs->info.base);
> +  } else {
> + for (int i = 0; i < PIPE_MAX_SHADER_OUTPUTS; i++) {
> +if (swr_vs->info.base.output_semantic_name[i] == TGSI_SEMANTIC_POSITION &&
> +swr_vs->info.base.output_semantic_index[i] == 0) {
> +  

[Mesa-dev] [PATCH v3 07/14] radeonsi: Add offchip buffer address calculation.

2016-05-26 Thread Bas Nieuwenhuizen
Instead of creating a memory area per patch and per vertex, we put
the same attribute of every vertex and patch together. Most loads
and stores access the same attribute across all lanes, differing
only in the patch and vertex.

For the TCS this results in tightly packed data for 4-component
stores.

For the TES this is not the case, as within a patch the loads
often also access the same vertex. However, if there are fewer than
4 vertices per patch, this still reduces the number of cache lines
touched. In the LDS situation we only do better than the worst
case if the data per patch is under 64 bytes, which, due to the
tessellation factors, is pretty much never.

We do not use hardware swizzling for this. It would slightly reduce
the number of executed VALU instructions, but I had issues with
increased wait times that I haven't been able to solve yet.
Furthermore, the tbuffer_store intrinsic does not support both a
VGPR offset and an index, so we have a problem storing
indirectly indexed outputs. This can be solved by temporarily
storing arrays in LDS and then copying them, but I don't think
that is worth the effort. The VALU cycles saved by hardware
swizzling amount to about 0.2% of total busy cycles, and that is
without handling the array case.

I chose to group by attribute rather than by component, as the
components of an attribute are often accessed together, and software
swizzling costs VALU cycles for calculating offsets.
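To make the layout concrete, here is a plain-integer sketch of the arithmetic that get_tcs_tes_buffer_address() below builds as LLVM IR. It is an illustration under stated assumptions: example_offchip_address() is a hypothetical helper, a negative vertex_index stands in for the NULL vertex_index the real function uses for per-patch attributes, and patch_data_offset is passed in directly rather than unpacked from SI_PARAM_TCS_OFFCHIP_LAYOUT.

```c
#include <assert.h>

/* Attributes are the outermost dimension, then patches, then vertices;
 * every attribute slot is 16 bytes (4 x 32-bit components).
 * Returns a byte offset into the offchip buffer. */
static unsigned
example_offchip_address(unsigned num_patches, unsigned vertices_per_patch,
                        unsigned patch_data_offset, unsigned rel_patch_id,
                        int vertex_index, unsigned param_index)
{
   unsigned index, param_stride, addr;

   assert(rel_patch_id < num_patches);

   if (vertex_index >= 0) {
      /* Per-vertex attribute: all vertices of all patches per attribute. */
      index = rel_patch_id * vertices_per_patch + (unsigned)vertex_index;
      param_stride = num_patches * vertices_per_patch;
   } else {
      /* Per-patch attribute: one slot per patch per attribute. */
      index = rel_patch_id;
      param_stride = num_patches;
   }

   index += param_index * param_stride;

   addr = index * 16; /* scale to bytes as late as possible */
   if (vertex_index < 0)
      addr += patch_data_offset; /* per-patch data follows per-vertex data */
   return addr;
}
```

With 2 patches of 3 vertices, attribute 1 of patch 0 vertex 0 lands after all 6 copies of attribute 0, at byte 96.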

v2: - Rename functions to get_tcs_tes_buffer_address.
- Multiply by 16 as late as possible.
- Use tgsi_full_src_register_from_dst.
- Remove some bad comments.

Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader.c | 124 +++
 1 file changed, 124 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index ac42721..83bcf5e 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -671,6 +671,130 @@ static LLVMValueRef get_dw_address(struct si_shader_context *ctx,
lp_build_const_int32(gallivm, param * 4), "");
 }
 
+/* The offchip buffer layout for TCS->TES is
+ *
+ * - attribute 0 of patch 0 vertex 0
+ * - attribute 0 of patch 0 vertex 1
+ * - attribute 0 of patch 0 vertex 2
+ *   ...
+ * - attribute 0 of patch 1 vertex 0
+ * - attribute 0 of patch 1 vertex 1
+ *   ...
+ * - attribute 1 of patch 0 vertex 0
+ * - attribute 1 of patch 0 vertex 1
+ *   ...
+ * - per patch attribute 0 of patch 0
+ * - per patch attribute 0 of patch 1
+ *   ...
+ *
+ * Note that every attribute has 4 components.
+ */
+static LLVMValueRef get_tcs_tes_buffer_address(struct si_shader_context *ctx,
+   LLVMValueRef vertex_index,
+   LLVMValueRef param_index)
+{
+   struct gallivm_state *gallivm = ctx->radeon_bld.soa.bld_base.base.gallivm;
+   LLVMValueRef base_addr, vertices_per_patch, num_patches, total_vertices;
+   LLVMValueRef param_stride, constant16;
+
+   vertices_per_patch = unpack_param(ctx, SI_PARAM_TCS_OFFCHIP_LAYOUT, 9, 6);
+   num_patches = unpack_param(ctx, SI_PARAM_TCS_OFFCHIP_LAYOUT, 0, 9);
+   total_vertices = LLVMBuildMul(gallivm->builder, vertices_per_patch,
+ num_patches, "");
+
+   constant16 = lp_build_const_int32(gallivm, 16);
+   if (vertex_index) {
+   base_addr = LLVMBuildMul(gallivm->builder, get_rel_patch_id(ctx),
+vertices_per_patch, "");
+
+   base_addr = LLVMBuildAdd(gallivm->builder, base_addr,
+vertex_index, "");
+
+   param_stride = total_vertices;
+   } else {
+   base_addr = get_rel_patch_id(ctx);
+   param_stride = num_patches;
+   }
+
+   base_addr = LLVMBuildAdd(gallivm->builder, base_addr,
+LLVMBuildMul(gallivm->builder, param_index,
+ param_stride, ""), "");
+
+   base_addr = LLVMBuildMul(gallivm->builder, base_addr, constant16, "");
+
+   if (!vertex_index) {
+   LLVMValueRef patch_data_offset =
+  unpack_param(ctx, SI_PARAM_TCS_OFFCHIP_LAYOUT, 16, 16);
+
+   base_addr = LLVMBuildAdd(gallivm->builder, base_addr,
+patch_data_offset, "");
+   }
+   return base_addr;
+}
+
+static LLVMValueRef get_tcs_tes_buffer_address_from_reg(
+   struct si_shader_context *ctx,
+   const struct tgsi_full_dst_register *dst,
+   const struct tgsi_full_src_register *src)
+{
+   struct gallivm_state *gallivm = ctx->radeon_bld.soa.bld_base.base.gallivm;
+   struct tgsi_shader_info *info = &ctx->shader->selector->info;
+   

[Mesa-dev] [PATCH v3 06/14] radeonsi: Add user SGPR for the layout of the offchip buffer.

2016-05-26 Thread Bas Nieuwenhuizen
Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Nicolai Hähnle 
Reviewed-by: Marek Olšák 
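The new SGPR packs three fields into one dword, following the layout comment added to si_shader.h below. A minimal plain-C sketch of the packing done in si_emit_derived_tess_state() and of a shift-and-mask decode follows. The helper names are hypothetical, pervertex_output_patch_size is taken to be the per-vertex output size of one patch in bytes, and example_unpack() is only assumed to compute what unpack_param() computes on the shader side (shift right, then keep the field width).

```c
#include <assert.h>

/* Pack the offchip layout word per the header comment:
 *   [0:8]   patches per threadgroup
 *   [9:15]  output vertices per patch
 *   [16:31] byte offset of the per-patch attributes */
static unsigned
example_pack_offchip_layout(unsigned num_patches, unsigned num_tcs_output_cp,
                            unsigned pervertex_output_patch_size)
{
   unsigned patch_data_offset = pervertex_output_patch_size * num_patches;

   assert(num_patches < (1u << 9));
   assert(num_tcs_output_cp < (1u << 7));
   assert(patch_data_offset < (1u << 16));
   return (patch_data_offset << 16) | (num_tcs_output_cp << 9) | num_patches;
}

/* Shift right by rshift, then keep the low 'bitwidth' bits unless the
 * field already reaches bit 31. */
static unsigned
example_unpack(unsigned value, unsigned rshift, unsigned bitwidth)
{
   value >>= rshift;
   if (rshift + bitwidth < 32)
      value &= (1u << bitwidth) - 1;
   return value;
}
```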
---
 src/gallium/drivers/radeonsi/si_shader.c |  3 +++
 src/gallium/drivers/radeonsi/si_shader.h | 12 ++--
 src/gallium/drivers/radeonsi/si_state_draw.c |  9 +++--
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index eb57345..ac42721 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -5186,6 +5186,7 @@ static void create_function(struct si_shader_context *ctx)
break;
 
case PIPE_SHADER_TESS_CTRL:
+   params[SI_PARAM_TCS_OFFCHIP_LAYOUT] = ctx->i32;
params[SI_PARAM_TCS_OUT_OFFSETS] = ctx->i32;
params[SI_PARAM_TCS_OUT_LAYOUT] = ctx->i32;
params[SI_PARAM_TCS_IN_LAYOUT] = ctx->i32;
@@ -5211,6 +5212,7 @@ static void create_function(struct si_shader_context *ctx)
break;
 
case PIPE_SHADER_TESS_EVAL:
+   params[SI_PARAM_TCS_OFFCHIP_LAYOUT] = ctx->i32;
params[SI_PARAM_TCS_OUT_OFFSETS] = ctx->i32;
params[SI_PARAM_TCS_OUT_LAYOUT] = ctx->i32;
num_params = SI_PARAM_TCS_OUT_LAYOUT+1;
@@ -6768,6 +6770,7 @@ static bool si_compile_tcs_epilog(struct si_screen *sscreen,
params[SI_PARAM_SAMPLERS] = ctx.i64;
params[SI_PARAM_IMAGES] = ctx.i64;
params[SI_PARAM_SHADER_BUFFERS] = ctx.i64;
+   params[SI_PARAM_TCS_OFFCHIP_LAYOUT] = ctx.i32;
params[SI_PARAM_TCS_OUT_OFFSETS] = ctx.i32;
params[SI_PARAM_TCS_OUT_LAYOUT] = ctx.i32;
params[SI_PARAM_TCS_IN_LAYOUT] = ctx.i32;
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index 7b1cbf9..26be25e 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -107,7 +107,8 @@ enum {
SI_LS_NUM_USER_SGPR,
 
/* both TCS and TES */
-   SI_SGPR_TCS_OUT_OFFSETS = SI_NUM_RESOURCE_SGPRS,
+   SI_SGPR_TCS_OFFCHIP_LAYOUT = SI_NUM_RESOURCE_SGPRS,
+   SI_SGPR_TCS_OUT_OFFSETS,
SI_SGPR_TCS_OUT_LAYOUT,
SI_TES_NUM_USER_SGPR,
 
@@ -147,11 +148,18 @@ enum {
SI_PARAM_LS_OUT_LAYOUT = SI_PARAM_START_INSTANCE + 1,
/* the other VS parameters are assigned dynamically */
 
+   /* Layout of TCS outputs in the offchip buffer
+*   [0:8] = the number of patches per threadgroup.
+*   [9:15] = the number of output vertices per patch.
+*   [16:31] = the offset of per patch attributes in the buffer in bytes.
+*/
+   SI_PARAM_TCS_OFFCHIP_LAYOUT = SI_NUM_RESOURCE_PARAMS, /* for TCS & TES */
+
/* Offsets where TCS outputs and TCS patch outputs live in LDS:
 *   [0:15] = TCS output patch0 offset / 16, max = NUM_PATCHES * 32 * 32
 *   [16:31] = TCS output patch0 offset for per-patch / 16, max = NUM_PATCHES*32*32* + 32*32
 */
-   SI_PARAM_TCS_OUT_OFFSETS = SI_NUM_RESOURCE_PARAMS, /* for TCS & TES */
+   SI_PARAM_TCS_OUT_OFFSETS, /* for TCS & TES */
 
/* Layout of TCS outputs / TES inputs:
 *   [0:12] = stride between output patches in dwords, num_outputs * num_vertices * 4, max = 32*32*4
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index dab0dcc..e14a1c9 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -108,6 +108,7 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
unsigned input_patch_size, output_patch_size, output_patch0_offset;
unsigned perpatch_output_offset, lds_size, ls_rsrc2;
unsigned tcs_in_layout, tcs_out_layout, tcs_out_offsets;
+   unsigned offchip_layout;
 
*num_patches = 1; /* TODO: calculate this */
 
@@ -183,6 +184,8 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
 ((output_vertex_size / 4) << 13);
tcs_out_offsets = (output_patch0_offset / 16) |
  ((perpatch_output_offset / 16) << 16);
+   offchip_layout = (pervertex_output_patch_size * *num_patches << 16) |
+(num_tcs_output_cp << 9) | *num_patches;
 
/* Set them for LS. */
radeon_set_sh_reg(cs,
@@ -191,13 +194,15 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
 
/* Set them for TCS. */
radeon_set_sh_reg_seq(cs,
-   R_00B430_SPI_SHADER_USER_DATA_HS_0 + SI_SGPR_TCS_OUT_OFFSETS * 4, 3);
+   R_00B430_SPI_SHADER_USER_DATA_HS_0 + SI_SGPR_TCS_OFFCHIP_LAYOUT * 4, 4);
+   radeon_emit(cs, offchip_layout);
radeon_emit(cs, tcs_out_offsets);
radeon_emit(cs, tcs_out_layout | (num_tcs_input_cp << 26));
radeon_emit(cs, tcs_in_layout);
 
/* Set them for TES. */
-   ra

[Mesa-dev] [PATCH v3 04/14] radeonsi: Add buffer load functions.

2016-05-26 Thread Bas Nieuwenhuizen
v2: - Use llvm.amdgcn.buffer.load intrinsics for new LLVM.
- Code style fixes.

v3: - Code style fix.
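To show which overload the new path picks, here is a small sketch of the name selection in build_buffer_load() for the HAVE_LLVM >= 0x309 branch: the channel count is clamped to 1..3, so 3- and 4-channel loads both use the v4f32 overload. example_buffer_load_name() is a hypothetical helper written for this example; the real code passes the resulting name to lp_build_intrinsic().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 1 -> f32, 2 -> v2f32, 3 and 4 -> v4f32 (no 3-component overload). */
static void
example_buffer_load_name(int num_channels, char *name, size_t size)
{
   static const char *type_names[] = {"f32", "v2f32", "v4f32"};
   int func;

   assert(num_channels >= 1 && num_channels <= 4);
   func = (num_channels < 3 ? num_channels : 3) - 1; /* CLAMP(n, 1, 3) - 1 */
   snprintf(name, size, "llvm.amdgcn.buffer.load.%s", type_names[func]);
}
```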

Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader.c | 114 +++
 1 file changed, 114 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 6690f05..eb57345 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -740,6 +740,120 @@ static void build_tbuffer_store_dwords(struct si_shader_context *ctx,
V_008F0C_BUF_NUM_FORMAT_UINT, 1, 0, 1, 1, 0);
 }
 
+static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
+  LLVMValueRef rsrc,
+  int num_channels,
+  LLVMValueRef vindex,
+  LLVMValueRef voffset,
+  LLVMValueRef soffset,
+  unsigned inst_offset,
+  unsigned glc,
+  unsigned slc)
+{
+   struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
+   unsigned func = CLAMP(num_channels, 1, 3) - 1;
+
+   if (HAVE_LLVM >= 0x309) {
+   LLVMValueRef args[] = {
+   LLVMBuildBitCast(gallivm->builder, rsrc, ctx->v4i32, ""),
+   vindex ? vindex : LLVMConstInt(ctx->i32, 0, 0),
+   LLVMConstInt(ctx->i32, inst_offset, 0),
+   LLVMConstInt(ctx->i1, glc, 0),
+   LLVMConstInt(ctx->i1, slc, 0)
+   };
+
+   LLVMTypeRef types[] = {ctx->f32, LLVMVectorType(ctx->f32, 2),
+  ctx->v4f32};
+   const char *type_names[] = {"f32", "v2f32", "v4f32"};
+   char name[256];
+
+   if (voffset) {
+   args[2] = LLVMBuildAdd(gallivm->builder, args[2], voffset,
+  "");
+   }
+
+   if (soffset) {
+   args[2] = LLVMBuildAdd(gallivm->builder, args[2], soffset,
+  "");
+   }
+
+   snprintf(name, sizeof(name), "llvm.amdgcn.buffer.load.%s",
+type_names[func]);
+
+   return lp_build_intrinsic(gallivm->builder, name, types[func], args,
+ ARRAY_SIZE(args), LLVMReadOnlyAttribute |
+ LLVMNoUnwindAttribute);
+   } else {
+   LLVMValueRef args[] = {
+   LLVMBuildBitCast(gallivm->builder, rsrc, ctx->v16i8, ""),
+   voffset ? voffset : vindex,
+   soffset,
+   LLVMConstInt(ctx->i32, inst_offset, 0),
+   LLVMConstInt(ctx->i32, voffset ? 1 : 0, 0), // offen
+   LLVMConstInt(ctx->i32, vindex ? 1 : 0, 0), //idxen
+   LLVMConstInt(ctx->i32, glc, 0),
+   LLVMConstInt(ctx->i32, slc, 0),
+   LLVMConstInt(ctx->i32, 0, 0), // TFE
+   };
+
+   LLVMTypeRef types[] = {ctx->i32, LLVMVectorType(ctx->i32, 2),
+  ctx->v4i32};
+   const char *type_names[] = {"i32", "v2i32", "v4i32"};
+   const char *arg_type = "i32";
+   char name[256];
+
+   if (voffset && vindex) {
+   LLVMValueRef vaddr[] = {vindex, voffset};
+
+   arg_type = "v2i32";
+   args[1] = lp_build_gather_values(gallivm, vaddr, 2);
+   }
+
+   snprintf(name, sizeof(name), "llvm.SI.buffer.load.dword.%s.%s",
+type_names[func], arg_type);
+
+   return lp_build_intrinsic(gallivm->builder, name, types[func], args,
+ ARRAY_SIZE(args), LLVMReadOnlyAttribute |
+ LLVMNoUnwindAttribute);
+   }
+}
+
+static LLVMValueRef buffer_load(struct lp_build_tgsi_context *bld_base,
+enum tgsi_opcode_type type, unsigned swizzle,
+LLVMValueRef buffer, LLVMValueRef offset,
+LLVMValueRef base)
+{
+   struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   LLVMValueRef value, value2;
+   LLVMTypeRef llvm_type = tgsi2llvmtype(bld_base, type);
+   LLVMTypeRef vec_type = LLVMVectorType(llvm_type, 4);
+
+   if (swizzle == ~0) {
+   value = build_buffer_load(ctx, buffer, 4, NULL, base, offset,
+  

[Mesa-dev] [PATCH v3 09/14] radeonsi: Use buffer loads and stores for passing data from TCS to TES.

2016-05-26 Thread Bas Nieuwenhuizen
We always try to use 4-component loads, as LLVM does not combine loads
and they bypass the L1 cache.

We can't use a similar strategy for stores, and this is especially
noticeable with the tess factors, as they are often set with separate
MOVs per component in the TGSI.

We keep storing to LDS as well, and keep the LDS space, so we can load
the outputs later, either because the shader reads them or for writing
the tess factors.
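A simplified sketch of the store side of store_output_tcs() after this patch, using a plain uint32_t array as a stand-in for the offchip buffer: a full .xyzw write mask becomes a single 4-dword store, and a partial mask becomes one 1-dword store per enabled channel. example_store_output() is hypothetical; it indexes in dwords, whereas the real code passes a byte offset of 4 * chan_index, and it omits the LDS store that the patch keeps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the number of store operations issued. */
static unsigned
example_store_output(uint32_t *buffer, unsigned dword_addr, unsigned writemask,
                     const uint32_t values[4])
{
   unsigned stores = 0;

   assert(writemask != 0 && writemask <= 0xF);

   if (writemask == 0xF) {
      /* All four channels written: gather them into one 4-dword store. */
      memcpy(&buffer[dword_addr], values, 4 * sizeof(uint32_t));
      return 1;
   }

   /* Partial write mask: one 1-dword store per enabled channel. */
   for (unsigned chan = 0; chan < 4; chan++) {
      if (writemask & (1u << chan)) {
         buffer[dword_addr + chan] = values[chan];
         stores++;
      }
   }
   return stores;
}
```

This is why tess factors written with one MOV per component still cost one store each.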

Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Nicolai Hähnle 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader.c | 66 
 1 file changed, 50 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index b04d0f7..6694f00 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1082,18 +1082,18 @@ static LLVMValueRef fetch_input_tes(
enum tgsi_opcode_type type, unsigned swizzle)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
-   LLVMValueRef dw_addr, stride;
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   LLVMValueRef rw_buffers, buffer, base, addr;
 
-   if (reg->Register.Dimension) {
-   stride = unpack_param(ctx, SI_PARAM_TCS_OUT_LAYOUT, 13, 8);
-   dw_addr = get_tcs_out_current_patch_offset(ctx);
-   dw_addr = get_dw_address(ctx, NULL, reg, stride, dw_addr);
-   } else {
-   dw_addr = get_tcs_out_current_patch_data_offset(ctx);
-   dw_addr = get_dw_address(ctx, NULL, reg, NULL, dw_addr);
-   }
+   rw_buffers = LLVMGetParam(ctx->radeon_bld.main_fn,
+ SI_PARAM_RW_BUFFERS);
+   buffer = build_indexed_load_const(ctx, rw_buffers,
+   lp_build_const_int32(gallivm, SI_HS_RING_TESS_OFFCHIP));
 
-   return lds_load(bld_base, type, swizzle, dw_addr);
+   base = LLVMGetParam(ctx->radeon_bld.main_fn, ctx->param_oc_lds);
+   addr = get_tcs_tes_buffer_address_from_reg(ctx, NULL, reg);
+
+   return buffer_load(bld_base, type, swizzle, buffer, base, addr);
 }
 
 static void store_output_tcs(struct lp_build_tgsi_context *bld_base,
@@ -1102,9 +1102,12 @@ static void store_output_tcs(struct lp_build_tgsi_context *bld_base,
 LLVMValueRef dst[4])
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
const struct tgsi_full_dst_register *reg = &inst->Dst[0];
unsigned chan_index;
LLVMValueRef dw_addr, stride;
+   LLVMValueRef rw_buffers, buffer, base, buf_addr;
+   LLVMValueRef values[4];
 
/* Only handle per-patch and per-vertex outputs here.
 * Vectors will be lowered to scalars and this function will be called again.
@@ -1124,6 +1127,15 @@ static void store_output_tcs(struct lp_build_tgsi_context *bld_base,
dw_addr = get_dw_address(ctx, reg, NULL, NULL, dw_addr);
}
 
+   rw_buffers = LLVMGetParam(ctx->radeon_bld.main_fn,
+ SI_PARAM_RW_BUFFERS);
+   buffer = build_indexed_load_const(ctx, rw_buffers,
+   lp_build_const_int32(gallivm, SI_HS_RING_TESS_OFFCHIP));
+
+   base = LLVMGetParam(ctx->radeon_bld.main_fn, ctx->param_oc_lds);
+   buf_addr = get_tcs_tes_buffer_address_from_reg(ctx, reg, NULL);
+
+
TGSI_FOR_EACH_DST0_ENABLED_CHANNEL(inst, chan_index) {
LLVMValueRef value = dst[chan_index];
 
@@ -1131,6 +1143,22 @@ static void store_output_tcs(struct lp_build_tgsi_context *bld_base,
value = radeon_llvm_saturate(bld_base, value);
 
lds_store(bld_base, chan_index, dw_addr, value);
+
+   value = LLVMBuildBitCast(gallivm->builder, value, ctx->i32, "");
+   values[chan_index] = value;
+
+   if (inst->Dst[0].Register.WriteMask != 0xF) {
+   build_tbuffer_store_dwords(ctx, buffer, value, 1,
+  buf_addr, base,
+  4 * chan_index);
+   }
+   }
+
+   if (inst->Dst[0].Register.WriteMask == 0xF) {
+   LLVMValueRef value = lp_build_gather_values(bld_base->base.gallivm,
+   values, 4);
+   build_tbuffer_store_dwords(ctx, buffer, value, 4, buf_addr,
+  base, 0);
}
 }
 
@@ -1641,15 +1669,21 @@ static void declare_system_value(
case TGSI_SEMANTIC_TESSINNER:
case TGSI_SEMANTIC_TESSOUTER:
{
-   LLVMValueRef dw_addr;
+   LLVMValueRef rw_buffers, buffer, base, addr;
int param = si_shader_io_get_unique_index(decl->Semantic.Name, 0);
 
-   dw_addr = get_tcs_out_current_patch_data_offset(ctx);
- 

[Mesa-dev] [PATCH v3 12/14] radeonsi: Add barrier before writing the tess factors.

2016-05-26 Thread Bas Nieuwenhuizen
The factors may be stored to LDS by an invocation other than
the invocation for vertex 0, which is the one that writes them out.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/gallium/drivers/radeonsi/si_shader.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 166b2e8..5e5bf68 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -144,6 +144,10 @@ static void si_init_shader_ctx(struct si_shader_context *ctx,
   struct si_shader *shader,
   LLVMTargetMachineRef tm);
 
+static void si_llvm_emit_barrier(const struct lp_build_tgsi_action *action,
+struct lp_build_tgsi_context *bld_base,
+struct lp_build_emit_data *emit_data);
+
 /* Ideally pass the sample mask input to the PS epilog as v13, which
  * is its usual location, so that the shader doesn't have to add v_mov.
  */
@@ -2534,6 +2538,8 @@ static void si_write_tess_factors(struct lp_build_tgsi_context *bld_base,
unsigned stride, outer_comps, inner_comps, i;
struct lp_build_if_state if_ctx, inner_if_ctx;
 
+   si_llvm_emit_barrier(NULL, bld_base, NULL);
+
/* Do this only for invocation 0, because the tess levels are per-patch,
 * not per-vertex.
 *
-- 
2.8.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3 00/14] radeonsi: offchip tessellation

2016-05-26 Thread Bas Nieuwenhuizen
Addressed review comments by Marek.

As part of that, the maximum number of patches per threadgroup was reduced
from 64 to 40. This reduced unigine-heaven performance from 43.1 fps to
42.5 fps (the numbers vary a little, but the magnitude of the difference is
pretty constant).

However, the optimal value is likely to differ between applications,
and I don't have many applications to check against.

Any thoughts on the issue?

- Bas

Bas Nieuwenhuizen (14):
  radeonsi: Add buffer for offchip storage between TCS and TES.
  radeonsi: Add offchip tessellation parameters.
  radeonsi: Define build_tbuffer_store_dwords earlier to support new
users.
  radeonsi: Add buffer load functions.
  radeonsi: Use correct parameter index for LS_OUT_LAYOUT.
  radeonsi: Add user SGPR for the layout of the offchip buffer.
  radeonsi: Add offchip buffer address calculation.
  radeonsi: Store inputs to memory when not using a TCS.
  radeonsi: Use buffer loads and stores for passing data from TCS to
TES.
  radeonsi: Remove LDS layout user SGPR's from TES.
  radeonsi: Enable dynamic HS.
  radeonsi: Add barrier before writing the tess factors.
  radeonsi: Process multiple patches per threadgroup.
  radeonsi: Allow TES distribution between shader engines.

 src/gallium/drivers/radeonsi/si_pipe.c  |   1 +
 src/gallium/drivers/radeonsi/si_pipe.h  |   1 +
 src/gallium/drivers/radeonsi/si_shader.c| 547 +++-
 src/gallium/drivers/radeonsi/si_shader.h|  32 +-
 src/gallium/drivers/radeonsi/si_state.c |   5 +
 src/gallium/drivers/radeonsi/si_state.h |   3 +
 src/gallium/drivers/radeonsi/si_state_draw.c|  67 ++-
 src/gallium/drivers/radeonsi/si_state_shaders.c |  71 ++-
 src/gallium/drivers/radeonsi/sid.h  |   3 +
 9 files changed, 589 insertions(+), 141 deletions(-)

-- 
2.8.3



[Mesa-dev] [PATCH v3 10/14] radeonsi: Remove LDS layout user SGPR's from TES.

2016-05-26 Thread Bas Nieuwenhuizen
They are unused.
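As a rough illustration of the resulting user SGPR layout: after this patch, TES only needs the one SGPR it shares with TCS, so the draw code writes a single register for TES instead of three. The following is a minimal C sketch, not Mesa code. The SKETCH_* names and the base value 8 are hypothetical stand-ins, because SI_NUM_RESOURCE_SGPRS is defined elsewhere and its value is not shown here.

```c
/* Hypothetical base; the real SI_NUM_RESOURCE_SGPRS is not shown here. */
#define SKETCH_NUM_RESOURCE_SGPRS 8

/* Mirrors the reordered enum: the SGPR shared by TCS and TES comes
 * first, and the TCS-only SGPRs follow it. */
enum sketch_user_sgpr {
    SKETCH_SGPR_TCS_OFFCHIP_LAYOUT = SKETCH_NUM_RESOURCE_SGPRS, /* TCS & TES */
    SKETCH_TES_NUM_USER_SGPR,

    /* TCS only */
    SKETCH_SGPR_TCS_OUT_OFFSETS = SKETCH_TES_NUM_USER_SGPR,
    SKETCH_SGPR_TCS_OUT_LAYOUT,
    SKETCH_SGPR_TCS_IN_LAYOUT,
    SKETCH_TCS_NUM_USER_SGPR,
};

/* Number of consecutive SH registers the draw code must write for TES,
 * starting at SGPR_TCS_OFFCHIP_LAYOUT (3 before the patch, 1 after). */
static unsigned sketch_tes_sgprs_to_emit(void)
{
    return SKETCH_TES_NUM_USER_SGPR - SKETCH_SGPR_TCS_OFFCHIP_LAYOUT;
}
```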

Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Nicolai Hähnle 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader.c |  4 +---
 src/gallium/drivers/radeonsi/si_shader.h | 15 ---
 src/gallium/drivers/radeonsi/si_state_draw.c |  4 +---
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 6694f00..11c7c38 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -5414,9 +5414,7 @@ static void create_function(struct si_shader_context *ctx)
 
case PIPE_SHADER_TESS_EVAL:
params[SI_PARAM_TCS_OFFCHIP_LAYOUT] = ctx->i32;
-   params[SI_PARAM_TCS_OUT_OFFSETS] = ctx->i32;
-   params[SI_PARAM_TCS_OUT_LAYOUT] = ctx->i32;
-   num_params = SI_PARAM_TCS_OUT_LAYOUT+1;
+   num_params = SI_PARAM_TCS_OFFCHIP_LAYOUT+1;
 
if (shader->key.tes.as_es) {
params[ctx->param_oc_lds = num_params++] = ctx->i32;
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index 67b457b..9425b1e 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -108,12 +108,12 @@ enum {
 
/* both TCS and TES */
SI_SGPR_TCS_OFFCHIP_LAYOUT = SI_NUM_RESOURCE_SGPRS,
-   SI_SGPR_TCS_OUT_OFFSETS,
-   SI_SGPR_TCS_OUT_LAYOUT,
SI_TES_NUM_USER_SGPR,
 
/* TCS only */
-   SI_SGPR_TCS_IN_LAYOUT = SI_TES_NUM_USER_SGPR,
+   SI_SGPR_TCS_OUT_OFFSETS = SI_TES_NUM_USER_SGPR,
+   SI_SGPR_TCS_OUT_LAYOUT,
+   SI_SGPR_TCS_IN_LAYOUT,
SI_TCS_NUM_USER_SGPR,
 
/* GS limits */
@@ -155,26 +155,27 @@ enum {
 */
SI_PARAM_TCS_OFFCHIP_LAYOUT = SI_NUM_RESOURCE_PARAMS, /* for TCS & TES */
 
+   /* TCS only parameters. */
+
/* Offsets where TCS outputs and TCS patch outputs live in LDS:
 *   [0:15] = TCS output patch0 offset / 16, max = NUM_PATCHES * 32 * 32
 *   [16:31] = TCS output patch0 offset for per-patch / 16, max = NUM_PATCHES*32*32* + 32*32
 */
-   SI_PARAM_TCS_OUT_OFFSETS, /* for TCS & TES */
+   SI_PARAM_TCS_OUT_OFFSETS,
 
/* Layout of TCS outputs / TES inputs:
 *   [0:12] = stride between output patches in dwords, num_outputs * num_vertices * 4, max = 32*32*4
 *   [13:20] = stride between output vertices in dwords = num_inputs * 4, max = 32*4
 *   [26:31] = gl_PatchVerticesIn, max = 32
 */
-   SI_PARAM_TCS_OUT_LAYOUT, /* for TCS & TES */
+   SI_PARAM_TCS_OUT_LAYOUT,
 
/* Layout of LS outputs / TCS inputs
 *   [0:12] = stride between patches in dwords = num_inputs * num_vertices * 4, max = 32*32*4
 *   [13:20] = stride between vertices in dwords = num_inputs * 4, max = 32*4
 */
-   SI_PARAM_TCS_IN_LAYOUT,  /* TCS only */
+   SI_PARAM_TCS_IN_LAYOUT,
 
-   /* TCS only parameters. */
SI_PARAM_TCS_OC_LDS,
SI_PARAM_TESS_FACTOR_OFFSET,
SI_PARAM_PATCH_ID,
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index e14a1c9..6fe2619 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -201,10 +201,8 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
radeon_emit(cs, tcs_in_layout);
 
/* Set them for TES. */
-   radeon_set_sh_reg_seq(cs, tes_sh_base + SI_SGPR_TCS_OFFCHIP_LAYOUT * 4, 3);
+   radeon_set_sh_reg_seq(cs, tes_sh_base + SI_SGPR_TCS_OFFCHIP_LAYOUT * 4, 1);
radeon_emit(cs, offchip_layout);
-   radeon_emit(cs, tcs_out_offsets);
-   radeon_emit(cs, tcs_out_layout | (num_tcs_output_cp << 26));
 }
 
 static unsigned si_num_prims_for_vertices(const struct pipe_draw_info *info)
-- 
2.8.3



[Mesa-dev] [PATCH v3 13/14] radeonsi: Process multiple patches per threadgroup.

2016-05-26 Thread Bas Nieuwenhuizen
Using more than one wave per threadgroup generally increases
performance. Not using too many patches per threadgroup also
increases performance. Both Catalyst and amdgpu-pro seem to
use 40 patches as their maximum, but I haven't seen any
performance increase from limiting the number of patches
to 40 instead of 64.

Note that the trick where we overlap the input and output LDS
does not work anymore as the insertion of the tess factors
changes the patch stride.

v2: - Add comment about LDS assumptions.
- Add constant for buffer size.
- Fix code style.

v3: - Correct limits for not splitting patches between waves.
- Set max num_patches to 40 as in the proprietary driver.
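The limits the patch applies to the number of patches per threadgroup can be restated as a standalone function. The following is a minimal C sketch, not the driver code. It assumes sizes in bytes and a boolean in place of the sctx->b.chip_class >= CIK check; all sketch_/SKETCH_ names are hypothetical. In the patch itself, this computation runs before the "state unchanged" early return, so *num_patches is always set.

```c
#define SKETCH_MAX2(a, b) ((a) > (b) ? (a) : (b))
#define SKETCH_MIN2(a, b) ((a) < (b) ? (a) : (b))

/* 8192 dwords in bytes, matching SI_TESS_OFFCHIP_BLOCK_SIZE. */
#define SKETCH_OFFCHIP_BLOCK_SIZE (8192 * 4)

static unsigned sketch_num_patches(unsigned num_tcs_input_cp,
                                   unsigned num_tcs_output_cp,
                                   unsigned input_patch_size,
                                   unsigned output_patch_size,
                                   int is_cik_or_later)
{
    unsigned hardware_lds_size = is_cik_or_later ? 65536 : 32768;

    /* Keeps TCS in/out vertices per threadgroup at or below 256. */
    unsigned n = 64 / SKETCH_MAX2(num_tcs_input_cp, num_tcs_output_cp) * 4;

    /* Inputs and outputs of all patches must fit in LDS together. */
    n = SKETCH_MIN2(n, hardware_lds_size / (input_patch_size + output_patch_size));

    /* Outputs must fit in one offchip block. */
    n = SKETCH_MIN2(n, SKETCH_OFFCHIP_BLOCK_SIZE / output_patch_size);

    /* Performance cap taken from the proprietary driver. */
    return SKETCH_MIN2(n, 40);
}
```

For small patches, the cap of 40 is the binding limit. For large patches (for example, 16 control points at 512 bytes each), the LDS size is the binding limit, and it is halved on SI.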

Signed-off-by: Bas Nieuwenhuizen 
---
 src/gallium/drivers/radeonsi/si_state_draw.c | 50 +++-
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index 6fe2619..c8b87a9 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -108,20 +108,7 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
unsigned input_patch_size, output_patch_size, output_patch0_offset;
unsigned perpatch_output_offset, lds_size, ls_rsrc2;
unsigned tcs_in_layout, tcs_out_layout, tcs_out_offsets;
-   unsigned offchip_layout;
-
-   *num_patches = 1; /* TODO: calculate this */
-
-   if (sctx->last_ls == ls->current &&
-   sctx->last_tcs == tcs &&
-   sctx->last_tes_sh_base == tes_sh_base &&
-   sctx->last_num_tcs_input_cp == num_tcs_input_cp)
-   return;
-
-   sctx->last_ls = ls->current;
-   sctx->last_tcs = tcs;
-   sctx->last_tes_sh_base = tes_sh_base;
-   sctx->last_num_tcs_input_cp = num_tcs_input_cp;
+   unsigned offchip_layout, hardware_lds_size;
 
/* This calculates how shader inputs and outputs among VS, TCS, and TES
 * are laid out in LDS. */
@@ -146,7 +133,29 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
pervertex_output_patch_size = num_tcs_output_cp * output_vertex_size;
output_patch_size = pervertex_output_patch_size + num_tcs_patch_outputs * 16;
 
-   output_patch0_offset = sctx->tcs_shader.cso ? input_patch_size * *num_patches : 0;
+   /* Ensure that we only need one wave per SIMD so we don't need to check
+* resource usage. Also ensures that the number of tcs in and out
+* vertices per threadgroup is at most 256.
+*/
+   *num_patches = 64 / MAX2(num_tcs_input_cp, num_tcs_output_cp) * 4;
+
+   /* Make sure that the data fits in LDS. This assumes the shaders only
+* use LDS for the inputs and outputs.
+*/
+   hardware_lds_size = sctx->b.chip_class >= CIK ? 65536 : 32768;
+   *num_patches = MIN2(*num_patches, hardware_lds_size / (input_patch_size +
+  output_patch_size));
+
+   /* Make sure the output data fits in the offchip buffer */
+   *num_patches = MIN2(*num_patches, SI_TESS_OFFCHIP_BLOCK_SIZE /
+ output_patch_size);
+
+   /* Not necessary for correctness, but improves performance. The
+* specific value is taken from the proprietary driver.
+*/
+   *num_patches = MIN2(*num_patches, 40);
+
+   output_patch0_offset = input_patch_size * *num_patches;
perpatch_output_offset = output_patch0_offset + pervertex_output_patch_size;
 
lds_size = output_patch0_offset + output_patch_size * *num_patches;
@@ -160,6 +169,17 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
ls_rsrc2 |= S_00B52C_LDS_SIZE(align(lds_size, 256) / 256);
}
 
+   if (sctx->last_ls == ls->current &&
+   sctx->last_tcs == tcs &&
+   sctx->last_tes_sh_base == tes_sh_base &&
+   sctx->last_num_tcs_input_cp == num_tcs_input_cp)
+   return;
+
+   sctx->last_ls = ls->current;
+   sctx->last_tcs = tcs;
+   sctx->last_tes_sh_base = tes_sh_base;
+   sctx->last_num_tcs_input_cp = num_tcs_input_cp;
+
/* Due to a hw bug, RSRC2_LS must be written twice with another
 * LS register written in between. */
if (sctx->b.chip_class == CIK && sctx->b.family != CHIP_HAWAII)
-- 
2.8.3



[Mesa-dev] [PATCH v3 14/14] radeonsi: Allow TES distribution between shader engines.

2016-05-26 Thread Bas Nieuwenhuizen
The R_028B50_VGT_TESS_DISTRIBUTION value is copied from
amdgpu-pro. Smaller values in the ACCUM fields seem to
decrease the performance advantage of this patch; higher
values don't seem to matter.

v2: Add distribution mode field enums.
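The patch pairs a chip-class-dependent distribution mode with a partial-wave requirement. The following is a minimal C sketch of that selection, not the driver code. The enums are hypothetical stand-ins for enum chip_class and the V_028B6C_DISTRIBUTION_MODE_* values, whose numeric encodings are not shown. The sketch also models only the flags this patch adds; in si_get_ia_multi_vgt_param(), partial_vs_wave can already be set by other conditions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for enum chip_class and V_028B6C_DISTRIBUTION_MODE_*. */
enum sketch_chip_class { SKETCH_SI, SKETCH_CIK, SKETCH_VI };
enum sketch_dist_mode { SKETCH_DIST_NO_DIST, SKETCH_DIST_DONUTS };

struct sketch_tess_dist {
    enum sketch_dist_mode mode; /* goes into VGT_TF_PARAM */
    bool partial_es_wave;       /* required with distribution when a GS is bound */
    bool partial_vs_wave;       /* required with distribution when no GS is bound */
};

static struct sketch_tess_dist
sketch_pick_tess_distribution(enum sketch_chip_class chip, bool has_gs)
{
    struct sketch_tess_dist d = { SKETCH_DIST_NO_DIST, false, false };

    if (chip >= SKETCH_VI) {
        d.mode = SKETCH_DIST_DONUTS;
        if (has_gs)
            d.partial_es_wave = true;
        else
            d.partial_vs_wave = true;
    }
    return d;
}
```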

Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Nicolai Hähnle 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_state.c |  5 
 src/gallium/drivers/radeonsi/si_state_draw.c|  8 +
 src/gallium/drivers/radeonsi/si_state_shaders.c | 39 +++--
 src/gallium/drivers/radeonsi/sid.h  |  3 ++
 4 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index aefa336..ab321ef 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -3791,6 +3791,11 @@ static void si_init_config(struct si_context *sctx)
   S_028424_OVERWRITE_COMBINER_WATERMARK(4));
si_pm4_set_reg(pm4, R_028C58_VGT_VERTEX_REUSE_BLOCK_CNTL, 30);
si_pm4_set_reg(pm4, R_028C5C_VGT_OUT_DEALLOC_CNTL, 32);
+   si_pm4_set_reg(pm4, R_028B50_VGT_TESS_DISTRIBUTION,
+  S_028B50_ACCUM_ISOLINE(32) |
+  S_028B50_ACCUM_TRI(11) |
+  S_028B50_ACCUM_QUAD(11) |
+  S_028B50_DONUT_SPLIT(16));
}
 
if (sctx->b.family == CHIP_STONEY)
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index c8b87a9..788869e 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -279,6 +279,14 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx,
 sctx->b.family == CHIP_BONAIRE) &&
sctx->gs_shader.cso)
partial_vs_wave = true;
+
+   /* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
+   if (sctx->b.chip_class >= VI) {
+   if (sctx->gs_shader.cso)
+   partial_es_wave = true;
+   else
+   partial_vs_wave = true;
+   }
}
 
/* This is a hardware requirement. */
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 116bf27..c6f51ea 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -249,7 +249,8 @@ void si_destroy_shader_cache(struct si_screen *sscreen)
 
 /* SHADER STATES */
 
-static void si_set_tesseval_regs(struct si_shader *shader,
+static void si_set_tesseval_regs(struct si_screen *sscreen,
+struct si_shader *shader,
 struct si_pm4_state *pm4)
 {
struct tgsi_shader_info *info = &shader->selector->info;
@@ -257,7 +258,7 @@ static void si_set_tesseval_regs(struct si_shader *shader,
unsigned tes_spacing = info->properties[TGSI_PROPERTY_TES_SPACING];
bool tes_vertex_order_cw = info->properties[TGSI_PROPERTY_TES_VERTEX_ORDER_CW];
bool tes_point_mode = info->properties[TGSI_PROPERTY_TES_POINT_MODE];
-   unsigned type, partitioning, topology;
+   unsigned type, partitioning, topology, distribution_mode;
 
switch (tes_prim_mode) {
case PIPE_PRIM_LINES:
@@ -299,10 +300,16 @@ static void si_set_tesseval_regs(struct si_shader *shader,
else
topology = V_028B6C_OUTPUT_TRIANGLE_CW;
 
+   if (sscreen->b.chip_class >= VI)
+   distribution_mode = V_028B6C_DISTRIBUTION_MODE_DONUTS;
+   else
+   distribution_mode = V_028B6C_DISTRIBUTION_MODE_NO_DIST;
+
si_pm4_set_reg(pm4, R_028B6C_VGT_TF_PARAM,
   S_028B6C_TYPE(type) |
   S_028B6C_PARTITIONING(partitioning) |
-  S_028B6C_TOPOLOGY(topology));
+  S_028B6C_TOPOLOGY(topology) |
+  S_028B6C_DISTRIBUTION_MODE(distribution_mode));
 }
 
 static void si_shader_ls(struct si_shader *shader)
@@ -359,7 +366,7 @@ static void si_shader_hs(struct si_shader *shader)
   S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
 }
 
-static void si_shader_es(struct si_shader *shader)
+static void si_shader_es(struct si_screen *sscreen, struct si_shader *shader)
 {
struct si_pm4_state *pm4;
unsigned num_user_sgprs;
@@ -402,7 +409,7 @@ static void si_shader_es(struct si_shader *shader)
   S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
 
if (shader->selector->type == PIPE_SHADER_TESS_EVAL)
-   si_set_tesseval_regs(shader, pm4);
+   si_set_tesseval_regs(sscreen, shader, pm4);
 }
 
 /**
@@ -489,7 +496,8 @@ stat

[Mesa-dev] [PATCH v3 11/14] radeonsi: Enable dynamic HS.

2016-05-26 Thread Bas Nieuwenhuizen
This allows running the TES on different CUs than the
TCS, which improves performance.

v2: Only write the control word from one invocation.
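With dynamic HS enabled, the patch reserves the first dword at tf_base for a control word. Only the invocation with rel_patch_id == 0 writes it. The tess-factor stores move from offsets 0 and 16 to offsets 4 and 20. The following is a minimal C sketch of the resulting byte offsets, not driver code. It assumes `stride` is the number of tess-factor dwords per patch (as in si_write_tess_factors()), and sketch_tf_offset() is a hypothetical helper.

```c
/* Byte offset, relative to tf_base, of tess-factor dword `comp` for the
 * patch with index rel_patch_id. Offset 0 holds the dynamic HS control
 * word, so every factor is shifted up by one dword (4 bytes). */
static unsigned sketch_tf_offset(unsigned rel_patch_id, unsigned stride,
                                 unsigned comp)
{
    /* vec0 (comps 0..3) lands at +4 and vec1 (comps 4..) at +20,
     * matching the two build_tbuffer_store_dwords() calls. */
    return rel_patch_id * 4 * stride + 4 + 4 * comp;
}
```

For a quad domain (4 outer + 2 inner factors, so stride 6), patch 0 occupies bytes 4..27 and patch 1 starts at byte 28.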

Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader.c| 19 +++
 src/gallium/drivers/radeonsi/si_state_shaders.c |  2 +-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 11c7c38..166b2e8 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2532,7 +2532,7 @@ static void si_write_tess_factors(struct lp_build_tgsi_context *bld_base,
LLVMValueRef lds_base, lds_inner, lds_outer, byteoffset, buffer;
LLVMValueRef out[6], vec0, vec1, rw_buffers, tf_base;
unsigned stride, outer_comps, inner_comps, i;
-   struct lp_build_if_state if_ctx;
+   struct lp_build_if_state if_ctx, inner_if_ctx;
 
/* Do this only for invocation 0, because the tess levels are per-patch,
 * not per-vertex.
@@ -2604,12 +2604,23 @@ static void si_write_tess_factors(struct lp_build_tgsi_context *bld_base,
byteoffset = LLVMBuildMul(gallivm->builder, rel_patch_id,
  lp_build_const_int32(gallivm, 4 * stride), "");
 
-   /* Store the outputs. */
+   lp_build_if(&inner_if_ctx, gallivm,
+   LLVMBuildICmp(gallivm->builder, LLVMIntEQ,
+ rel_patch_id, bld_base->uint_bld.zero, ""));
+
+   /* Store the dynamic HS control word. */
+   build_tbuffer_store_dwords(ctx, buffer,
+  lp_build_const_int32(gallivm, 0x8000),
+  1, lp_build_const_int32(gallivm, 0), tf_base, 0);
+
+   lp_build_endif(&inner_if_ctx);
+
+   /* Store the tessellation factors. */
build_tbuffer_store_dwords(ctx, buffer, vec0,
-  MIN2(stride, 4), byteoffset, tf_base, 0);
+  MIN2(stride, 4), byteoffset, tf_base, 4);
if (vec1)
build_tbuffer_store_dwords(ctx, buffer, vec1,
-  stride - 4, byteoffset, tf_base, 16);
+  stride - 4, byteoffset, tf_base, 20);
lp_build_endif(&if_ctx);
 }
 
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 2aecfa3..116bf27 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1882,7 +1882,7 @@ static void si_update_vgt_shader_config(struct si_context *sctx)
 
if (sctx->tes_shader.cso) {
stages |= S_028B54_LS_EN(V_028B54_LS_STAGE_ON) |
- S_028B54_HS_EN(1);
+ S_028B54_HS_EN(1) | S_028B54_DYNAMIC_HS(1);
 
if (sctx->gs_shader.cso)
stages |= S_028B54_ES_EN(V_028B54_ES_STAGE_DS) |
-- 
2.8.3



[Mesa-dev] [PATCH v3 01/14] radeonsi: Add buffer for offchip storage between TCS and TES.

2016-05-26 Thread Bas Nieuwenhuizen
The buffer is quite large, but should only be allocated if the
application uses tessellation. Most non-games don't.

v2: - Use the correct register for SI.
- Add define for block size.

Signed-off-by: Bas Nieuwenhuizen 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_pipe.c  |  1 +
 src/gallium/drivers/radeonsi/si_pipe.h  |  1 +
 src/gallium/drivers/radeonsi/si_state.h |  3 +++
 src/gallium/drivers/radeonsi/si_state_shaders.c | 18 ++
 4 files changed, 23 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 6700590..eefc68a 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -48,6 +48,7 @@ static void si_destroy_context(struct pipe_context *context)
pipe_resource_reference(&sctx->esgs_ring, NULL);
pipe_resource_reference(&sctx->gsvs_ring, NULL);
pipe_resource_reference(&sctx->tf_ring, NULL);
+   pipe_resource_reference(&sctx->tess_offchip_ring, NULL);
pipe_resource_reference(&sctx->null_const_buf.buffer, NULL);
r600_resource_reference(&sctx->border_color_buffer, NULL);
free(sctx->border_color_table);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
index 33d3d25..e5b88c7 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -256,6 +256,7 @@ struct si_context {
struct pipe_resource*esgs_ring;
struct pipe_resource*gsvs_ring;
struct pipe_resource*tf_ring;
+   struct pipe_resource*tess_offchip_ring;
union pipe_color_union  *border_color_table; /* in CPU memory, any endian */
struct r600_resource*border_color_buffer;
union pipe_color_union  *border_color_map; /* in VRAM (slow access), little endian */
diff --git a/src/gallium/drivers/radeonsi/si_state.h b/src/gallium/drivers/radeonsi/si_state.h
index f2a3b03..a3589d4 100644
--- a/src/gallium/drivers/radeonsi/si_state.h
+++ b/src/gallium/drivers/radeonsi/si_state.h
@@ -40,6 +40,8 @@
 #define SI_NUM_IMAGES  16
 #define SI_NUM_SHADER_BUFFERS  16
 
+#define SI_TESS_OFFCHIP_BLOCK_SIZE (8192 * 4)
+
 struct si_screen;
 struct si_shader;
 
@@ -155,6 +157,7 @@ struct si_shader_data {
 /* Private read-write buffer slots. */
 enum {
SI_HS_RING_TESS_FACTOR,
+   SI_HS_RING_TESS_OFFCHIP,
 
SI_ES_RING_ESGS,
SI_GS_RING_ESGS,
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 13066ff..d8ae2b2 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1770,6 +1770,7 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx)
 
 static void si_init_tess_factor_ring(struct si_context *sctx)
 {
+   unsigned offchip_blocks = sctx->b.chip_class >= CIK ? 256 : 64;
assert(!sctx->tf_ring);
 
sctx->tf_ring = pipe_buffer_create(sctx->b.b.screen, PIPE_BIND_CUSTOM,
@@ -1780,6 +1781,14 @@ static void si_init_tess_factor_ring(struct si_context *sctx)
 
assert(((sctx->tf_ring->width0 / 4) & C_030938_SIZE) == 0);
 
+   sctx->tess_offchip_ring = pipe_buffer_create(sctx->b.b.screen,
+PIPE_BIND_CUSTOM,
+PIPE_USAGE_DEFAULT,
+offchip_blocks *
+SI_TESS_OFFCHIP_BLOCK_SIZE);
+   if (!sctx->tess_offchip_ring)
+   return;
+
si_init_config_add_vgt_flush(sctx);
 
/* Append these registers to the init config state. */
@@ -1788,11 +1797,16 @@ static void si_init_tess_factor_ring(struct si_context *sctx)
   S_030938_SIZE(sctx->tf_ring->width0 / 4));
si_pm4_set_reg(sctx->init_config, R_030940_VGT_TF_MEMORY_BASE,
   r600_resource(sctx->tf_ring)->gpu_address >> 8);
+   si_pm4_set_reg(sctx->init_config, R_03093C_VGT_HS_OFFCHIP_PARAM,
+S_03093C_OFFCHIP_BUFFERING(offchip_blocks - 1) |
+S_03093C_OFFCHIP_GRANULARITY(V_03093C_X_8K_DWORDS));
} else {
si_pm4_set_reg(sctx->init_config, R_008988_VGT_TF_RING_SIZE,
   S_008988_SIZE(sctx->tf_ring->width0 / 4));
si_pm4_set_reg(sctx->init_config, R_0089B8_VGT_TF_MEMORY_BASE,
   r600_resource(sctx->tf_ring)->gpu_address >> 8);
+   si_pm4_set_reg(sctx->init_config, R_0089B0_VGT_HS_OFFCHIP_PARAM,
+  S_0089B0_OFFCHIP_BUFFERING(offchip_blocks - 1));
}
 
/* Flush the context to re-emit the init_config sta

[Mesa-dev] [PATCH 2/5] util/indices: formatting, whitespace fixes in u_unfilled_indices.c

2016-05-26 Thread Brian Paul
---
 src/gallium/auxiliary/indices/u_unfilled_indices.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/src/gallium/auxiliary/indices/u_unfilled_indices.c b/src/gallium/auxiliary/indices/u_unfilled_indices.c
index fc974f8..fe57fd7 100644
--- a/src/gallium/auxiliary/indices/u_unfilled_indices.c
+++ b/src/gallium/auxiliary/indices/u_unfilled_indices.c
@@ -49,7 +49,7 @@ static void translate_memcpy_ushort( const void *in,
 {
memcpy(out, &((short *)in)[start], out_nr*sizeof(short));
 }
-  
+
 static void translate_memcpy_uint( const void *in,
unsigned start,
unsigned in_nr,
@@ -70,7 +70,7 @@ static void generate_linear_ushort( unsigned start,
for (i = 0; i < nr; i++)
   out_us[i] = (ushort)(i + start);
 }
-  
+
 static void generate_linear_uint( unsigned start,
   unsigned nr,
   void *out )
@@ -87,12 +87,12 @@ static void generate_linear_uint( unsigned start,
  * needed to draw the primitive with fill mode = PIPE_POLYGON_MODE_LINE using
  * separate lines (PIPE_PRIM_LINES).
  */
-static unsigned nr_lines( unsigned prim,
-  unsigned nr )
+static unsigned
+nr_lines(unsigned prim, unsigned nr)
 {
switch (prim) {
case PIPE_PRIM_TRIANGLES:
-  return (nr / 3) * 6; 
+  return (nr / 3) * 6;
case PIPE_PRIM_TRIANGLE_STRIP:
   return (nr - 2) * 6;
case PIPE_PRIM_TRIANGLE_FAN:
@@ -108,7 +108,6 @@ static unsigned nr_lines( unsigned prim,
   return 0;
}
 }
-  
 
 
 enum indices_mode
@@ -130,13 +129,11 @@ u_unfilled_translator(unsigned prim,
*out_index_size = (in_index_size == 4) ? 4 : 2;
out_idx = out_size_idx(*out_index_size);
 
-   if (unfilled_mode == PIPE_POLYGON_MODE_POINT) 
-   {
+   if (unfilled_mode == PIPE_POLYGON_MODE_POINT) {
   *out_prim = PIPE_PRIM_POINTS;
   *out_nr = nr;
 
-  switch (in_index_size)
-  {
+  switch (in_index_size) {
   case 1:
  *out_translate = translate_ubyte_ushort;
  return U_TRANSLATE_NORMAL;
@@ -189,7 +186,6 @@ u_unfilled_generator(unsigned prim,
out_idx = out_size_idx(*out_index_size);
 
if (unfilled_mode == PIPE_POLYGON_MODE_POINT) {
-
   if (*out_index_size == 4)
  *out_generate = generate_linear_uint;
   else
@@ -208,4 +204,3 @@ u_unfilled_generator(unsigned prim,
   return U_GENERATE_REUSABLE;
}
 }
-
-- 
1.9.1



[Mesa-dev] [PATCH 1/5] util/indices: improve comments in u_indices.h

2016-05-26 Thread Brian Paul
---
 src/gallium/auxiliary/indices/u_indices.h | 32 ---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/indices/u_indices.h b/src/gallium/auxiliary/indices/u_indices.h
index 4483eb8..7f696ab 100644
--- a/src/gallium/auxiliary/indices/u_indices.h
+++ b/src/gallium/auxiliary/indices/u_indices.h
@@ -27,6 +27,7 @@
 
 #include "pipe/p_compiler.h"
 
+/* First/last provoking vertex */
 #define PV_FIRST  0
 #define PV_LAST   1
 #define PV_COUNT  2
@@ -35,13 +36,15 @@
 #define PR_DISABLE 0
 #define PR_ENABLE 1
 #define PR_COUNT 2
+
+
 /**
  * Index translator function (for glDrawElements() case)
  *
  * \param in the input index buffer
  * \param start  the index of the first vertex (pipe_draw_info::start)
  * \param nr the number of vertices (pipe_draw_info::count)
- * \param outoutput buffer big enough or nr vertices (of
+ * \param outoutput buffer big enough for nr vertices (of
  *@out_index_size bytes each)
  */
 typedef void (*u_translate_func)( const void *in,
@@ -56,7 +59,7 @@ typedef void (*u_translate_func)( const void *in,
  *
  * \param start  the index of the first vertex (pipe_draw_info::start)
  * \param nr the number of vertices (pipe_draw_info::count)
- * \param outoutput buffer big enough or nr vertices (of
+ * \param outoutput buffer big enough for nr vertices (of
  *@out_index_size bytes each)
  */
 typedef void (*u_generate_func)( unsigned start,
@@ -78,6 +81,15 @@ enum indices_mode {
 
 void u_index_init( void );
 
+
+/**
+ * For indexed drawing, this function determines what kind of primitive
+ * transformation is needed (if any) for handling:
+ * - unsupported primitive types (such as PIPE_PRIM_POLYGON)
+ * - changing the provoking vertex
+ * - primitive restart
+ * - index size (1 byte, 2 byte or 4 byte indexes)
+ */
 enum indices_mode
 u_index_translator(unsigned hw_mask,
unsigned prim,
@@ -91,7 +103,12 @@ u_index_translator(unsigned hw_mask,
unsigned *out_nr,
u_translate_func *out_translate);
 
-/* Note that even when generating it is necessary to know what the
+
+/**
+ * For non-indexed drawing, this function determines what kind of primitive
+ * transformation is needed (see above).
+ *
+ * Note that even when generating it is necessary to know what the
  * API's PV is, as the indices generated will depend on whether it is
  * the same as hardware or not, and in the case of triangle strips,
  * whether it is first or last.
@@ -111,6 +128,12 @@ u_index_generator(unsigned hw_mask,
 
 void u_unfilled_init( void );
 
+/**
+ * If the driver can't handle "unfilled" primitives (i.e. drawing triangle
+ * primitives as 3 lines or 3 points) this function can be used to translate
+ * an indexed primitive into a new indexed primitive to draw as lines or
+ * points.
+ */
 enum indices_mode
 u_unfilled_translator(unsigned prim,
   unsigned in_index_size,
@@ -121,6 +144,9 @@ u_unfilled_translator(unsigned prim,
   unsigned *out_nr,
   u_translate_func *out_translate);
 
+/**
+ * As above, but for non-indexed (array) primitives.
+ */
 enum indices_mode
 u_unfilled_generator(unsigned prim,
  unsigned start,
-- 
1.9.1



[Mesa-dev] [PATCH 4/5] util/indices: implement provoking vertex conversion for adjacency primitives

2016-05-26 Thread Brian Paul
Tested with new piglit gl-3.2-adj-prims test.
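The strip variants are converted to their list forms, and the output counts follow from how many primitives a strip holds. A line strip with adjacency of n + 3 vertices has n segments, each becoming 4 list indices. A triangle strip with adjacency of 2n + 4 vertices has n triangles, each becoming 6 list indices. The following is a minimal C sketch of those counts, plus a straight (no provoking-vertex change) expansion of a non-indexed line strip with adjacency. It is an illustration, not the generated code in u_indices_gen.py, and the sketch_ names are hypothetical.

```c
#include <stdint.h>

/* out_nr formulas from the patch; they assume nr holds at least one
 * complete primitive. */
static unsigned sketch_linestripadj_out_nr(unsigned nr) { return (nr - 3) * 4; }
static unsigned sketch_tristripadj_out_nr(unsigned nr) { return ((nr - 4) / 2) * 6; }

/* Expand a non-indexed line strip with adjacency into lines with
 * adjacency: segment i uses vertices i..i+3, where i+1 and i+2 are the
 * endpoints and i, i+3 are the adjacent vertices. */
static void sketch_generate_linestripadj_ushort(unsigned start, unsigned out_nr,
                                                void *out)
{
    uint16_t *out_us = out;
    unsigned i, j;

    for (i = 0; i < out_nr / 4; i++)
        for (j = 0; j < 4; j++)
            out_us[i * 4 + j] = (uint16_t)(start + i + j);
}
```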
---
 src/gallium/auxiliary/indices/u_indices.c  | 52 
 src/gallium/auxiliary/indices/u_indices_gen.py | 83 +-
 src/gallium/auxiliary/indices/u_indices_priv.h |  2 +-
 3 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/indices/u_indices.c b/src/gallium/auxiliary/indices/u_indices.c
index 436f8f0..2b2d10c 100644
--- a/src/gallium/auxiliary/indices/u_indices.c
+++ b/src/gallium/auxiliary/indices/u_indices.c
@@ -55,6 +55,8 @@ static void translate_memcpy_uint( const void *in,
  * - Translate from first provoking vertex to last provoking vertex and
  *   vice versa.
  *
+ * Note that this function is used for indexed primitives.
+ *
  * \param hw_mask  mask of (1 << PIPE_PRIM_x) flags indicating which types
  * of primitives are supported by the hardware.
  * \param prim  incoming PIPE_PRIM_x
@@ -172,6 +174,30 @@ u_index_translator(unsigned hw_mask,
  *out_nr = (nr - 2) * 3;
  break;
 
+  case PIPE_PRIM_LINES_ADJACENCY:
 *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
+ *out_prim = PIPE_PRIM_LINES_ADJACENCY;
+ *out_nr = nr;
+ break;
+
+  case PIPE_PRIM_LINE_STRIP_ADJACENCY:
 *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
+ *out_prim = PIPE_PRIM_LINES_ADJACENCY;
+ *out_nr = (nr - 3) * 4;
+ break;
+
+  case PIPE_PRIM_TRIANGLES_ADJACENCY:
 *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
+ *out_prim = PIPE_PRIM_TRIANGLES_ADJACENCY;
+ *out_nr = nr;
+ break;
+
+  case PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY:
 *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
+ *out_prim = PIPE_PRIM_TRIANGLES_ADJACENCY;
+ *out_nr = ((nr - 4) / 2) * 6;
+ break;
+
   default:
  assert(0);
 *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
@@ -193,6 +219,8 @@ u_index_translator(unsigned hw_mask,
  * The generator functions generates a number of ushort or uint indexes
  * for drawing the new type of primitive.
  *
+ * Note that this function is used for non-indexed primitives.
+ *
  * \param hw_mask  a bitmask of (1 << PIPE_PRIM_x) values that indicates
  * kind of primitives are supported by the driver.
  * \param prim  the PIPE_PRIM_x that the user wants to draw
@@ -294,6 +322,30 @@ u_index_generator(unsigned hw_mask,
  *out_nr = (nr - 2) * 3;
  return U_GENERATE_REUSABLE;
 
+  case PIPE_PRIM_LINES_ADJACENCY:
+ *out_generate = generate[out_idx][in_pv][out_pv][prim];
+ *out_prim = PIPE_PRIM_LINES_ADJACENCY;
+ *out_nr = nr;
+ return U_GENERATE_REUSABLE;
+
+  case PIPE_PRIM_LINE_STRIP_ADJACENCY:
+ *out_generate = generate[out_idx][in_pv][out_pv][prim];
+ *out_prim = PIPE_PRIM_LINES_ADJACENCY;
+ *out_nr = (nr - 3) * 4;
+ return U_GENERATE_REUSABLE;
+
+  case PIPE_PRIM_TRIANGLES_ADJACENCY:
+ *out_generate = generate[out_idx][in_pv][out_pv][prim];
+ *out_prim = PIPE_PRIM_TRIANGLES_ADJACENCY;
+ *out_nr = nr;
+ return U_GENERATE_REUSABLE;
+
+  case PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY:
+ *out_generate = generate[out_idx][in_pv][out_pv][prim];
+ *out_prim = PIPE_PRIM_TRIANGLES_ADJACENCY;
+ *out_nr = ((nr - 4) / 2) * 6;
+ return U_GENERATE_REUSABLE;
+
   default:
  assert(0);
  *out_generate = generate[out_idx][in_pv][out_pv][PIPE_PRIM_POINTS];
diff --git a/src/gallium/auxiliary/indices/u_indices_gen.py b/src/gallium/auxiliary/indices/u_indices_gen.py
index 97c8e0d..fb6b310 100644
--- a/src/gallium/auxiliary/indices/u_indices_gen.py
+++ b/src/gallium/auxiliary/indices/u_indices_gen.py
@@ -42,7 +42,11 @@ PRIMS=('points',
'tristrip', 
'quads', 
'quadstrip', 
-   'polygon')
+   'polygon',
+   'linesadj',
+   'linestripadj',
+   'trisadj',
+   'tristripadj')
 
 LONGPRIMS=('PIPE_PRIM_POINTS', 
'PIPE_PRIM_LINES', 
@@ -53,7 +57,11 @@ LONGPRIMS=('PIPE_PRIM_POINTS',
'PIPE_PRIM_TRIANGLE_STRIP', 
'PIPE_PRIM_QUADS', 
'PIPE_PRIM_QUAD_STRIP', 
-   'PIPE_PRIM_POLYGON')
+   'PIPE_PRIM_POLYGON',
+   'PIPE_PRIM_LINES_ADJACENCY',
+   'PIPE_PRIM_LINE_STRIP_ADJACENCY',
+   'PIPE_PRIM_TRIANGLES_ADJACENCY',
+   'PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY')
 
 longprim = dict(zip(PRIMS, LONGPRIMS))
 intype_idx = dict(ubyte='IN_UBYTE', ushort='IN_USHORT', uint='IN_UINT')
@@ -123,6 +131,20 @@ def tri( intype, outtype, ptr, v0, v1, v2 ):
 print '  (' + ptr + ')[1] = ' + vert( intype, outtype, v1 ) + ';'
 print '  (' + ptr + ')[2] = ' + vert( intype, outtype

[Mesa-dev] [PATCH 3/5] util/indices: assert that the incoming primitive is a triangle type

2016-05-26 Thread Brian Paul
The unfilled index translator/generator functions should only be
called when the primitive mode is one of the triangle types.
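For a plain triangle list, "unfilled" line mode turns each triangle into three separate lines, which is where the (nr / 3) * 6 count in nr_lines() comes from. The following is a minimal C sketch of that translation for 16-bit indices. The real translators are generated by u_unfilled_gen.py, the sketch_ names are hypothetical, and the edge order shown is one plausible choice, not necessarily the one the generated code uses.

```c
#include <stdint.h>

/* Matches nr_lines() for PIPE_PRIM_TRIANGLES: 3 lines = 6 indices per triangle. */
static unsigned sketch_nr_lines_tris(unsigned nr)
{
    return (nr / 3) * 6;
}

static void sketch_tris_to_lines_ushort(const uint16_t *in, unsigned nr,
                                        uint16_t *out)
{
    unsigned t;

    for (t = 0; t < nr / 3; t++) {
        const uint16_t *v = &in[t * 3];
        uint16_t *o = &out[t * 6];

        o[0] = v[0]; o[1] = v[1];   /* edge v0-v1 */
        o[2] = v[1]; o[3] = v[2];   /* edge v1-v2 */
        o[4] = v[2]; o[5] = v[0];   /* edge v2-v0 */
    }
}
```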
---
 src/gallium/auxiliary/indices/u_unfilled_indices.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/auxiliary/indices/u_unfilled_indices.c b/src/gallium/auxiliary/indices/u_unfilled_indices.c
index fe57fd7..49fff6b 100644
--- a/src/gallium/auxiliary/indices/u_unfilled_indices.c
+++ b/src/gallium/auxiliary/indices/u_unfilled_indices.c
@@ -24,6 +24,7 @@
 
 #include "u_indices.h"
 #include "u_indices_priv.h"
+#include "util/u_prim.h"
 
 
 static void translate_ubyte_ushort( const void *in,
@@ -123,6 +124,8 @@ u_unfilled_translator(unsigned prim,
unsigned in_idx;
unsigned out_idx;
 
+   assert(u_reduced_prim(prim) == PIPE_PRIM_TRIANGLES);
+
u_unfilled_init();
 
in_idx = in_size_idx(in_index_size);
@@ -180,6 +183,8 @@ u_unfilled_generator(unsigned prim,
 {
unsigned out_idx;
 
+   assert(u_reduced_prim(prim) == PIPE_PRIM_TRIANGLES);
+
u_unfilled_init();
 
*out_index_size = ((start + nr) > 0xfffe) ? 4 : 2;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
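
The new assertion relies on `u_reduced_prim()` collapsing every primitive type to points, lines, or triangles. Below is a minimal sketch of that classification. It uses a hypothetical `enum prim` rather than `enum pipe_prim_type`, and it returns a `bool` instead of asserting so that the check can be exercised directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical primitive enum; the real code uses enum pipe_prim_type
 * and u_reduced_prim() from util/u_prim.h. */
enum prim {
   PRIM_POINTS, PRIM_LINES, PRIM_LINE_LOOP, PRIM_LINE_STRIP,
   PRIM_TRIANGLES, PRIM_TRIANGLE_STRIP, PRIM_TRIANGLE_FAN,
   PRIM_QUADS, PRIM_QUAD_STRIP, PRIM_POLYGON,
   PRIM_LINES_ADJ, PRIM_LINE_STRIP_ADJ,
   PRIM_TRIANGLES_ADJ, PRIM_TRIANGLE_STRIP_ADJ
};

/* Collapse a primitive type to points, lines, or triangles. */
static enum prim
reduced_prim(enum prim p)
{
   switch (p) {
   case PRIM_POINTS:
      return PRIM_POINTS;
   case PRIM_LINES:
   case PRIM_LINE_LOOP:
   case PRIM_LINE_STRIP:
   case PRIM_LINES_ADJ:
   case PRIM_LINE_STRIP_ADJ:
      return PRIM_LINES;
   default:
      /* triangles, strips, fans, quads, polygons, and triangle adjacency */
      return PRIM_TRIANGLES;
   }
}

/* Same condition the patch asserts on entry to the unfilled helpers. */
static bool
unfilled_applies(enum prim p)
{
   return reduced_prim(p) == PRIM_TRIANGLES;
}
```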


[Mesa-dev] [PATCH 5/5] util/indices: implement unfilled (tri->line) conversion for adjacency prims

2016-05-26 Thread Brian Paul
Tested with new piglit gl-3.2-adj-prims test.
---
 src/gallium/auxiliary/indices/u_unfilled_gen.py| 26 --
 src/gallium/auxiliary/indices/u_unfilled_indices.c | 14 
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/indices/u_unfilled_gen.py b/src/gallium/auxiliary/indices/u_unfilled_gen.py
index 873e781..18d9968 100644
--- a/src/gallium/auxiliary/indices/u_unfilled_gen.py
+++ b/src/gallium/auxiliary/indices/u_unfilled_gen.py
@@ -35,14 +35,18 @@ PRIMS=('tris',
'tristrip', 
'quads', 
'quadstrip', 
-   'polygon')
+   'polygon',
+   'tristripadj',
+   'trisadj')
 
 LONGPRIMS=('PIPE_PRIM_TRIANGLES', 
'PIPE_PRIM_TRIANGLE_FAN', 
'PIPE_PRIM_TRIANGLE_STRIP', 
'PIPE_PRIM_QUADS', 
'PIPE_PRIM_QUAD_STRIP', 
-   'PIPE_PRIM_POLYGON')
+   'PIPE_PRIM_POLYGON',
+   'PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY',
+   'PIPE_PRIM_TRIANGLES_ADJACENCY')
 
 longprim = dict(zip(PRIMS, LONGPRIMS))
 intype_idx = dict(ubyte='IN_UBYTE', ushort='IN_USHORT', uint='IN_UINT')
@@ -194,6 +198,22 @@ def quadstrip(intype, outtype):
     postamble()
 
 
+def trisadj(intype, outtype):
+    preamble(intype, outtype, prim='trisadj')
+    print '  for (i = start, j = 0; j < out_nr; j+=6, i+=6) { '
+    do_tri( intype, outtype, 'out+j',  'i', 'i+2', 'i+4' );
+    print '   }'
+    postamble()
+
+
+def tristripadj(intype, outtype):
+    preamble(intype, outtype, prim='tristripadj')
+    print '  for (i = start, j = 0; j < out_nr; j+=6, i+=2) { '
+    do_tri( intype, outtype, 'out+j',  'i', 'i+2', 'i+4' );
+    print '   }'
+    postamble()
+
+
 def emit_funcs():
     for intype in INTYPES:
         for outtype in OUTTYPES:
@@ -203,6 +223,8 @@ def emit_funcs():
             quads(intype, outtype)
             quadstrip(intype, outtype)
             polygon(intype, outtype)
+            trisadj(intype, outtype)
+            tristripadj(intype, outtype)
 
 def init(intype, outtype, prim):
     if intype == GENERATE:
diff --git a/src/gallium/auxiliary/indices/u_unfilled_indices.c b/src/gallium/auxiliary/indices/u_unfilled_indices.c
index 49fff6b..8cb5192 100644
--- a/src/gallium/auxiliary/indices/u_unfilled_indices.c
+++ b/src/gallium/auxiliary/indices/u_unfilled_indices.c
@@ -22,6 +22,12 @@
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 
+
+/*
+ * NOTE: This file is not compiled by itself.  It's actually #included
+ * by the generated u_unfilled_gen.c file!
+ */
+
 #include "u_indices.h"
 #include "u_indices_priv.h"
 #include "util/u_prim.h"
@@ -104,6 +110,14 @@ nr_lines(unsigned prim, unsigned nr)
   return (nr - 2) / 2 * 8;
case PIPE_PRIM_POLYGON:
   return 2 * nr; /* a line (two verts) for each polygon edge */
+   /* Note: these cases can't really be handled since drawing lines instead
+* of triangles would also require changing the GS.  But if there's no GS,
+* this should work.
+*/
+   case PIPE_PRIM_TRIANGLES_ADJACENCY:
+  return (nr / 6) * 6;
+   case PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY:
+  return ((nr - 4) / 2) * 6;
default:
   assert(0);
   return 0;
-- 
1.9.1
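
The two new generator functions turn each triangle with adjacency into three line segments by using only the even-offset vertices (`i`, `i+2`, `i+4`). The odd offsets hold adjacency vertices and are skipped. Below is a standalone C sketch of the same index pattern. It assumes `do_tri()` emits the edges v0-v1, v1-v2, v2-v0 and uses `unsigned` output indices; both are assumptions of this sketch.

```c
#include <assert.h>

/* Emit the three edges of triangle (v0, v1, v2) as 6 line-list indices.
 * Assumed to match the edge order produced by do_tri(). */
static void
emit_tri_edges(unsigned *out, unsigned v0, unsigned v1, unsigned v2)
{
   out[0] = v0; out[1] = v1;
   out[2] = v1; out[3] = v2;
   out[4] = v2; out[5] = v0;
}

/* Triangles with adjacency: 6 input vertices per triangle, and the
 * real corners sit at offsets 0, 2, 4. */
static unsigned
unfilled_trisadj(unsigned start, unsigned nr, unsigned *out)
{
   unsigned out_nr = (nr / 6) * 6;
   for (unsigned i = start, j = 0; j < out_nr; j += 6, i += 6)
      emit_tri_edges(out + j, i, i + 2, i + 4);
   return out_nr;
}

/* Triangle strip with adjacency: triangle k uses corners 2k, 2k+2,
 * 2k+4, so the input position advances by 2 per triangle. */
static unsigned
unfilled_tristripadj(unsigned start, unsigned nr, unsigned *out)
{
   unsigned out_nr = nr < 6 ? 0 : ((nr - 4) / 2) * 6;
   for (unsigned i = start, j = 0; j < out_nr; j += 6, i += 2)
      emit_tri_edges(out + j, i, i + 2, i + 4);
   return out_nr;
}
```

Winding is not tracked. The strip form alternates winding between triangles, but the resulting edge set is the same either way, so the line output is unaffected.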



[Mesa-dev] [PATCH 1/2] svga: clean up and improve comments in svga_draw_private.h

2016-05-26 Thread Brian Paul
---
 src/gallium/drivers/svga/svga_draw_private.h | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_draw_private.h b/src/gallium/drivers/svga/svga_draw_private.h
index c821742..48e0b60 100644
--- a/src/gallium/drivers/svga/svga_draw_private.h
+++ b/src/gallium/drivers/svga/svga_draw_private.h
@@ -157,13 +157,17 @@ struct svga_hwtnl {
 * This is compensated for in the offset associated with all
 * vertex buffers.
 */
-
int index_bias;

-   /* Flatshade information:
+   /* Provoking vertex information (for flat shading). */
+   unsigned api_pv;  /**< app-requested PV mode (PV_FIRST or PV_LAST) */
+   unsigned hw_pv;   /**< device-supported PV mode (PV_FIRST or PV_LAST) */
+
+   /* The triangle fillmode for the device (one of PIPE_POLYGON_MODE_{FILL,
+* LINE,POINT}).  If the polygon front mode matches the back mode,
+* api_fillmode will be that mode.  Otherwise, api_fillmode will be
+* PIPE_POLYGON_MODE_FILL.
 */
-   unsigned api_pv;
-   unsigned hw_pv;
unsigned api_fillmode;
 
/* Cache the results of running a particular generate func on each
-- 
1.9.1
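
The rule described in the new `api_fillmode` comment can be expressed as a one-line selection. Below is a minimal sketch that uses hypothetical enum names in place of `PIPE_POLYGON_MODE_{FILL,LINE,POINT}`. This patch only documents the rule. It does not show where the driver actually computes the value.

```c
#include <assert.h>

/* Hypothetical stand-ins for PIPE_POLYGON_MODE_{FILL,LINE,POINT}. */
enum fill_mode { MODE_FILL, MODE_LINE, MODE_POINT };

/* If the front and back modes agree, use that mode. Otherwise report
 * FILL, as the struct svga_hwtnl comment describes. */
static enum fill_mode
choose_api_fillmode(enum fill_mode front, enum fill_mode back)
{
   return front == back ? front : MODE_FILL;
}
```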



[Mesa-dev] [PATCH 2/2] svga: fix test for unfilled triangles fallback

2016-05-26 Thread Brian Paul
VGPU10 actually supports line-mode triangles.  We failed to make use of
that before.
---
 src/gallium/drivers/svga/svga_draw_arrays.c   |  8 --
 src/gallium/drivers/svga/svga_draw_elements.c |  3 +--
 src/gallium/drivers/svga/svga_draw_private.h  | 38 +--
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_draw_arrays.c b/src/gallium/drivers/svga/svga_draw_arrays.c
index c056772..43d7a97 100644
--- a/src/gallium/drivers/svga/svga_draw_arrays.c
+++ b/src/gallium/drivers/svga/svga_draw_arrays.c
@@ -212,6 +212,11 @@ svga_hwtnl_draw_arrays(struct svga_hwtnl *hwtnl,
unsigned api_pv = hwtnl->api_pv;
struct svga_context *svga = hwtnl->svga;
 
+   if (svga->curr.rast->templ.fill_front !=
+   svga->curr.rast->templ.fill_back) {
+  assert(hwtnl->api_fillmode == PIPE_POLYGON_MODE_FILL);
+   }
+
if (svga->curr.rast->templ.flatshade &&
svga->state.hw_draw.fs->constant_color_output) {
   /* The fragment color is a constant, not per-vertex so the whole
@@ -236,8 +241,7 @@ svga_hwtnl_draw_arrays(struct svga_hwtnl *hwtnl,
   }
}
 
-   if (hwtnl->api_fillmode != PIPE_POLYGON_MODE_FILL &&
-   u_reduced_prim(prim) == PIPE_PRIM_TRIANGLES) {
+   if (svga_need_unfilled_fallback(hwtnl, prim)) {
   /* Convert unfilled polygons into points, lines, triangles */
   gen_type = u_unfilled_generator(prim,
   start,
diff --git a/src/gallium/drivers/svga/svga_draw_elements.c b/src/gallium/drivers/svga/svga_draw_elements.c
index a987b92..b74c745 100644
--- a/src/gallium/drivers/svga/svga_draw_elements.c
+++ b/src/gallium/drivers/svga/svga_draw_elements.c
@@ -138,8 +138,7 @@ svga_hwtnl_draw_range_elements(struct svga_hwtnl *hwtnl,
u_translate_func gen_func;
enum pipe_error ret = PIPE_OK;
 
-   if (hwtnl->api_fillmode != PIPE_POLYGON_MODE_FILL &&
-   u_reduced_prim(prim) == PIPE_PRIM_TRIANGLES) {
+   if (svga_need_unfilled_fallback(hwtnl, prim)) {
   gen_type = u_unfilled_translator(prim,
index_size,
count,
diff --git a/src/gallium/drivers/svga/svga_draw_private.h b/src/gallium/drivers/svga/svga_draw_private.h
index 48e0b60..da5d60e 100644
--- a/src/gallium/drivers/svga/svga_draw_private.h
+++ b/src/gallium/drivers/svga/svga_draw_private.h
@@ -29,6 +29,8 @@
 #include "pipe/p_compiler.h"
 #include "pipe/p_defines.h"
 #include "indices/u_indices.h"
+#include "util/u_prim.h"
+#include "svga_context.h"
 #include "svga_hw_reg.h"
 #include "svga3d_shaderdefs.h"
 
@@ -182,9 +184,41 @@ struct svga_hwtnl {
 
 
 
-/***
- * Internal functions
+/**
+ * Do we need to use the gallium 'indices' helper to render unfilled
+ * triangles?
  */
+static inline boolean
+svga_need_unfilled_fallback(const struct svga_hwtnl *hwtnl, unsigned prim)
+{
+   const struct svga_context *svga = hwtnl->svga;
+
+   if (u_reduced_prim(prim) != PIPE_PRIM_TRIANGLES) {
+  /* if we're drawing points or lines, no fallback needed */
+  return FALSE;
+   }
+
+   if (svga_have_vgpu10(svga)) {
+  /* vgpu10 supports polygon fill and line modes */
+  if ((prim == PIPE_PRIM_QUADS ||
+   prim == PIPE_PRIM_QUAD_STRIP ||
+   prim == PIPE_PRIM_POLYGON) &&
+  hwtnl->api_fillmode == PIPE_POLYGON_MODE_LINE) {
+ /* VGPU10 doesn't directly render quads or polygons.  They're
+  * converted to triangles.  If we let the device draw the triangle
+  * outlines we'll get extra, stray lines in the interiors.
+  * So, to draw unfilled quads correctly, we need the fallback.
+  */
+ return true;
+  }
+  return hwtnl->api_fillmode == PIPE_POLYGON_MODE_POINT;
+   } else {
+  /* vgpu9 doesn't support line or point fill modes */
+  return hwtnl->api_fillmode != PIPE_POLYGON_MODE_FILL;
+   }
+}
+
+
 enum pipe_error 
 svga_hwtnl_prim( struct svga_hwtnl *hwtnl,
  const SVGA3dPrimitiveRange *range,
-- 
1.9.1
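
The comment in `svga_need_unfilled_fallback()` about stray interior lines can be checked with a small count. This sketch assumes a quad is split into triangles (v0,v1,v2) and (v0,v2,v3). That split is an illustrative choice, but any split into two triangles introduces one shared diagonal. The sketch counts how many outlined triangle edges are not edges of the original quad.

```c
#include <assert.h>
#include <stdbool.h>

/* True if (a0, a1) and (b0, b1) are the same undirected edge. */
static bool
same_edge(unsigned a0, unsigned a1, unsigned b0, unsigned b1)
{
   return (a0 == b0 && a1 == b1) || (a0 == b1 && a1 == b0);
}

/* Is (a, b) one of the four perimeter edges of quad q[0..3]? */
static bool
is_quad_edge(const unsigned q[4], unsigned a, unsigned b)
{
   for (int e = 0; e < 4; e++)
      if (same_edge(a, b, q[e], q[(e + 1) % 4]))
         return true;
   return false;
}

/* Split the quad into two triangles, outline each triangle as the device
 * would in line fill mode, and count segments that are not quad edges. */
static unsigned
stray_segments_after_split(const unsigned q[4])
{
   const unsigned tris[2][3] = { { q[0], q[1], q[2] }, { q[0], q[2], q[3] } };
   unsigned stray = 0;
   for (int t = 0; t < 2; t++)
      for (int e = 0; e < 3; e++)
         if (!is_quad_edge(q, tris[t][e], tris[t][(e + 1) % 3]))
            stray++;
   return stray;
}
```

Both stray segments are the diagonal v0-v2, drawn once by each triangle. Converting the quad to lines before triangulation, as the fallback does, avoids drawing it.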



[Mesa-dev] [PATCH 4/5] util/indices, svga: s/unsigned/enum pipe_prim_type/

2016-05-26 Thread Brian Paul
---
 src/gallium/auxiliary/indices/u_indices.c  |  6 +++---
 src/gallium/auxiliary/indices/u_indices.h  | 17 +
 src/gallium/auxiliary/indices/u_unfilled_indices.c | 10 +-
 src/gallium/drivers/svga/svga_draw_arrays.c|  3 ++-
 src/gallium/drivers/svga/svga_draw_elements.c  |  3 ++-
 5 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/src/gallium/auxiliary/indices/u_indices.c b/src/gallium/auxiliary/indices/u_indices.c
index 2b2d10c..91f00f2 100644
--- a/src/gallium/auxiliary/indices/u_indices.c
+++ b/src/gallium/auxiliary/indices/u_indices.c
@@ -72,7 +72,7 @@ static void translate_memcpy_uint( const void *in,
  */
 enum indices_mode
 u_index_translator(unsigned hw_mask,
-   unsigned prim,
+   enum pipe_prim_type prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv,
@@ -235,12 +235,12 @@ u_index_translator(unsigned hw_mask,
  */
 enum indices_mode
 u_index_generator(unsigned hw_mask,
-  unsigned prim,
+  enum pipe_prim_type prim,
   unsigned start,
   unsigned nr,
   unsigned in_pv,
   unsigned out_pv,
-  unsigned *out_prim,
+  enum pipe_prim_type *out_prim,
   unsigned *out_index_size,
   unsigned *out_nr,
   u_generate_func *out_generate)
diff --git a/src/gallium/auxiliary/indices/u_indices.h b/src/gallium/auxiliary/indices/u_indices.h
index 7f696ab..f160fcb 100644
--- a/src/gallium/auxiliary/indices/u_indices.h
+++ b/src/gallium/auxiliary/indices/u_indices.h
@@ -26,6 +26,7 @@
 #define U_INDICES_H
 
 #include "pipe/p_compiler.h"
+#include "pipe/p_defines.h"
 
 /* First/last provoking vertex */
 #define PV_FIRST  0
@@ -92,13 +93,13 @@ void u_index_init( void );
  */
 enum indices_mode
 u_index_translator(unsigned hw_mask,
-   unsigned prim,
+   enum pipe_prim_type prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv,   /* API */
unsigned out_pv,  /* hardware */
unsigned prim_restart,
-   unsigned *out_prim,
+   enum pipe_prim_type *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate);
@@ -115,12 +116,12 @@ u_index_translator(unsigned hw_mask,
  */
 enum indices_mode
 u_index_generator(unsigned hw_mask,
-  unsigned prim,
+  enum pipe_prim_type prim,
   unsigned start,
   unsigned nr,
   unsigned in_pv,   /* API */
   unsigned out_pv,  /* hardware */
-  unsigned *out_prim,
+  enum pipe_prim_type *out_prim,
   unsigned *out_index_size,
   unsigned *out_nr,
   u_generate_func *out_generate);
@@ -135,11 +136,11 @@ void u_unfilled_init( void );
  * points.
  */
 enum indices_mode
-u_unfilled_translator(unsigned prim,
+u_unfilled_translator(enum pipe_prim_type prim,
   unsigned in_index_size,
   unsigned nr,
   unsigned unfilled_mode,
-  unsigned *out_prim,
+  enum pipe_prim_type *out_prim,
   unsigned *out_index_size,
   unsigned *out_nr,
   u_translate_func *out_translate);
@@ -148,11 +149,11 @@ u_unfilled_translator(unsigned prim,
  * As above, but for non-indexed (array) primitives.
  */
 enum indices_mode
-u_unfilled_generator(unsigned prim,
+u_unfilled_generator(enum pipe_prim_type prim,
  unsigned start,
  unsigned nr,
  unsigned unfilled_mode,
- unsigned *out_prim,
+ enum pipe_prim_type *out_prim,
  unsigned *out_index_size,
  unsigned *out_nr,
  u_generate_func *out_generate);
diff --git a/src/gallium/auxiliary/indices/u_unfilled_indices.c b/src/gallium/auxiliary/indices/u_unfilled_indices.c
index 8cb5192..0ca1d04 100644
--- a/src/gallium/auxiliary/indices/u_unfilled_indices.c
+++ b/src/gallium/auxiliary/indices/u_unfilled_indices.c
@@ -95,7 +95,7 @@ static void generate_linear_uint( unsigned start,
  * separate lines (PIPE_PRIM_LINES).
  */
 static unsigned
-nr_lines(unsigned prim, unsigned nr)
+nr_lines(enum pipe_prim_type prim, unsigned nr)
 {
switch (prim) {
case PIPE_PRIM_TRIANGLES:
@@ -126,11 +126,11 @@ nr_lines(unsigned prim, unsigned nr)
 
 
 enum indices_mode
-u_unfilled_translator(unsigned prim,
+u_unfilled_translator(enum pipe_prim_type prim,
   unsigned in_index_size,

[Mesa-dev] [PATCH 5/5] gallium: change pipe_draw_info::mode to be pipe_prim_type

2016-05-26 Thread Brian Paul
Makes debugging with gdb a little nicer.
---
 src/gallium/include/pipe/p_state.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_state.h 
b/src/gallium/include/pipe/p_state.h
index eacf9bb..396f563 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -617,7 +617,7 @@ struct pipe_draw_info
 {
boolean indexed;  /**< use index buffer */
 
-   unsigned mode;  /**< the mode of the primitive */
+   enum pipe_prim_type mode;  /**< the mode of the primitive */
unsigned start;  /**< the index of the first vertex */
unsigned count;  /**< number of vertices */
 
-- 
1.9.1



[Mesa-dev] [PATCH 2/5] svga: s/unsigned/enum pipe_resource_usage/ for buffer usage variables

2016-05-26 Thread Brian Paul
---
 src/gallium/drivers/svga/svga_resource_buffer.c | 2 +-
 src/gallium/drivers/svga/svga_screen_cache.c| 2 +-
 src/gallium/drivers/svga/svga_screen_cache.h| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_resource_buffer.c b/src/gallium/drivers/svga/svga_resource_buffer.c
index 9ecb975..d91497c 100644
--- a/src/gallium/drivers/svga/svga_resource_buffer.c
+++ b/src/gallium/drivers/svga/svga_resource_buffer.c
@@ -69,7 +69,7 @@ static void *
 svga_buffer_transfer_map(struct pipe_context *pipe,
  struct pipe_resource *resource,
  unsigned level,
- unsigned usage,
+ enum pipe_resource_usage usage,
  const struct pipe_box *box,
  struct pipe_transfer **ptransfer)
 {
diff --git a/src/gallium/drivers/svga/svga_screen_cache.c b/src/gallium/drivers/svga/svga_screen_cache.c
index 321c564..eaa589c 100644
--- a/src/gallium/drivers/svga/svga_screen_cache.c
+++ b/src/gallium/drivers/svga/svga_screen_cache.c
@@ -413,7 +413,7 @@ svga_screen_cache_init(struct svga_screen *svgascreen)
  */
 struct svga_winsys_surface *
 svga_screen_surface_create(struct svga_screen *svgascreen,
-   unsigned bind_flags, unsigned usage,
+   unsigned bind_flags, enum pipe_resource_usage usage,
struct svga_host_surface_cache_key *key)
 {
struct svga_winsys_screen *sws = svgascreen->sws;
diff --git a/src/gallium/drivers/svga/svga_screen_cache.h b/src/gallium/drivers/svga/svga_screen_cache.h
index 424eb2c..05d8c56 100644
--- a/src/gallium/drivers/svga/svga_screen_cache.h
+++ b/src/gallium/drivers/svga/svga_screen_cache.h
@@ -140,7 +140,7 @@ svga_screen_cache_init(struct svga_screen *svgascreen);
 
 struct svga_winsys_surface *
 svga_screen_surface_create(struct svga_screen *svgascreen,
-   unsigned bind_flags, unsigned usage,
+   unsigned bind_flags, enum pipe_resource_usage usage,
struct svga_host_surface_cache_key *key);
 
 void
-- 
1.9.1



[Mesa-dev] [PATCH 1/5] svga: s/unsigned/enum pipe_prim_type/ for primitive type variables

2016-05-26 Thread Brian Paul
Proper enum types were only added recently.
---
 src/gallium/drivers/svga/svga_draw.h  | 4 ++--
 src/gallium/drivers/svga/svga_draw_arrays.c   | 6 +++---
 src/gallium/drivers/svga/svga_draw_elements.c | 6 +++---
 src/gallium/drivers/svga/svga_draw_private.h  | 5 +++--
 src/gallium/drivers/svga/svga_pipe_draw.c | 4 ++--
 src/gallium/drivers/svga/svga_swtnl_backend.c | 2 +-
 src/gallium/drivers/svga/svga_swtnl_private.h | 2 +-
 7 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_draw.h b/src/gallium/drivers/svga/svga_draw.h
index af8ecab..baefcd9 100644
--- a/src/gallium/drivers/svga/svga_draw.h
+++ b/src/gallium/drivers/svga/svga_draw.h
@@ -59,7 +59,7 @@ svga_hwtnl_vertex_buffers(struct svga_hwtnl *hwtnl,
 
 enum pipe_error
 svga_hwtnl_draw_arrays(struct svga_hwtnl *hwtnl,
-   unsigned prim, unsigned start, unsigned count,
+   enum pipe_prim_type prim, unsigned start, unsigned count,
unsigned start_instance, unsigned instance_count);
 
 enum pipe_error
@@ -69,7 +69,7 @@ svga_hwtnl_draw_range_elements(struct svga_hwtnl *hwtnl,
int index_bias,
unsigned min_index,
unsigned max_index,
-   unsigned prim, unsigned start, unsigned count,
+   enum pipe_prim_type prim, unsigned start, unsigned count,
unsigned start_instance, unsigned instance_count);
 
 boolean
diff --git a/src/gallium/drivers/svga/svga_draw_arrays.c b/src/gallium/drivers/svga/svga_draw_arrays.c
index 43d7a97..4bd1a33 100644
--- a/src/gallium/drivers/svga/svga_draw_arrays.c
+++ b/src/gallium/drivers/svga/svga_draw_arrays.c
@@ -90,7 +90,7 @@ compare(unsigned cached_nr, unsigned nr, unsigned type)
 
 static enum pipe_error
 retrieve_or_generate_indices(struct svga_hwtnl *hwtnl,
- unsigned prim,
+ enum pipe_prim_type prim,
  unsigned gen_type,
  unsigned gen_nr,
  unsigned gen_size,
@@ -170,7 +170,7 @@ retrieve_or_generate_indices(struct svga_hwtnl *hwtnl,
 
 static enum pipe_error
 simple_draw_arrays(struct svga_hwtnl *hwtnl,
-   unsigned prim, unsigned start, unsigned count,
+   enum pipe_prim_type prim, unsigned start, unsigned count,
unsigned start_instance, unsigned instance_count)
 {
SVGA3dPrimitiveRange range;
@@ -202,7 +202,7 @@ simple_draw_arrays(struct svga_hwtnl *hwtnl,
 
 enum pipe_error
 svga_hwtnl_draw_arrays(struct svga_hwtnl *hwtnl,
-   unsigned prim, unsigned start, unsigned count,
+   enum pipe_prim_type prim, unsigned start, unsigned count,
unsigned start_instance, unsigned instance_count)
 {
unsigned gen_prim, gen_size, gen_nr;
diff --git a/src/gallium/drivers/svga/svga_draw_elements.c b/src/gallium/drivers/svga/svga_draw_elements.c
index b74c745..6eb5067 100644
--- a/src/gallium/drivers/svga/svga_draw_elements.c
+++ b/src/gallium/drivers/svga/svga_draw_elements.c
@@ -39,7 +39,7 @@
 
 static enum pipe_error
 translate_indices(struct svga_hwtnl *hwtnl, struct pipe_resource *src,
-  unsigned offset, unsigned prim, unsigned nr,
+  unsigned offset, enum pipe_prim_type prim, unsigned nr,
   unsigned index_size,
   u_translate_func translate, struct pipe_resource **out_buf)
 {
@@ -98,7 +98,7 @@ svga_hwtnl_simple_draw_range_elements(struct svga_hwtnl *hwtnl,
   struct pipe_resource *index_buffer,
   unsigned index_size, int index_bias,
   unsigned min_index, unsigned max_index,
-  unsigned prim, unsigned start,
+  enum pipe_prim_type prim, unsigned start,
   unsigned count,
   unsigned start_instance,
   unsigned instance_count)
@@ -130,7 +130,7 @@ svga_hwtnl_draw_range_elements(struct svga_hwtnl *hwtnl,
struct pipe_resource *index_buffer,
unsigned index_size, int index_bias,
unsigned min_index, unsigned max_index,
-   unsigned prim, unsigned start, unsigned count,
+   enum pipe_prim_type prim, unsigned start, unsigned count,
unsigned start_instance, unsigned instance_count)
 {
unsigned gen_prim, gen_size, gen_nr;
diff --git a/src/gallium/drivers/svga/svga_draw_private.h b/src/gallium/drivers/svga/svga_draw_private.h
index da5d60e..38e5e66 100

Re: [Mesa-dev] [PATCH] nir: Use double-precision pow() when bit_size is 64, powf() otherwise

2016-05-26 Thread Emil Velikov
On 19 May 2016 at 10:00, Iago Toral  wrote:
> I have just noticed that this was never pushed, right? I noticed this
> while working on providing double-precision implementation for the other
> functions discussed in the thread.
>
You are correct, Iago. Gents, can anyone shed some light on the status
of this patch?

Has it been superseded by another, or is there some work outstanding on this one?

Thanks
Emil


[Mesa-dev] [PATCH 3/5] util: s/unsigned/enum pipe_resource_usage/ for buffer usage variables

2016-05-26 Thread Brian Paul
---
 src/gallium/auxiliary/util/u_debug.c  | 2 +-
 src/gallium/auxiliary/util/u_debug.h  | 2 +-
 src/gallium/auxiliary/util/u_inlines.h| 6 +++---
 src/gallium/auxiliary/util/u_staging.c| 2 +-
 src/gallium/auxiliary/util/u_staging.h| 2 +-
 src/gallium/auxiliary/util/u_suballoc.c   | 5 +++--
 src/gallium/auxiliary/util/u_suballoc.h   | 3 ++-
 src/gallium/auxiliary/util/u_upload_mgr.c | 4 ++--
 src/gallium/auxiliary/util/u_upload_mgr.h | 2 +-
 9 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_debug.c b/src/gallium/auxiliary/util/u_debug.c
index db66357..0d63cfe 100644
--- a/src/gallium/auxiliary/util/u_debug.c
+++ b/src/gallium/auxiliary/util/u_debug.c
@@ -550,7 +550,7 @@ debug_print_bind_flags(const char *msg, unsigned usage)
  * Print PIPE_USAGE_x enum values with a message.
  */
 void
-debug_print_usage_enum(const char *msg, unsigned usage)
+debug_print_usage_enum(const char *msg, enum pipe_resource_usage usage)
 {
static const struct debug_named_value names[] = {
   DEBUG_NAMED_VALUE(PIPE_USAGE_DEFAULT),
diff --git a/src/gallium/auxiliary/util/u_debug.h b/src/gallium/auxiliary/util/u_debug.h
index 85d0cb6..7da7f53 100644
--- a/src/gallium/auxiliary/util/u_debug.h
+++ b/src/gallium/auxiliary/util/u_debug.h
@@ -473,7 +473,7 @@ void
 debug_print_bind_flags(const char *msg, unsigned usage);
 
 void
-debug_print_usage_enum(const char *msg, unsigned usage);
+debug_print_usage_enum(const char *msg, enum pipe_resource_usage usage);
 
 
 #ifdef __cplusplus
diff --git a/src/gallium/auxiliary/util/u_inlines.h b/src/gallium/auxiliary/util/u_inlines.h
index 07c058a..a38223c 100644
--- a/src/gallium/auxiliary/util/u_inlines.h
+++ b/src/gallium/auxiliary/util/u_inlines.h
@@ -230,12 +230,12 @@ pipe_surface_equal(struct pipe_surface *s1, struct pipe_surface *s2)
 /**
  * Create a new resource.
  * \param bind  bitmask of PIPE_BIND_x flags
- * \param usage  bitmask of PIPE_USAGE_x flags
+ * \param usage  a PIPE_USAGE_x value
  */
 static inline struct pipe_resource *
 pipe_buffer_create( struct pipe_screen *screen,
unsigned bind,
-   unsigned usage,
+   enum pipe_resource_usage usage,
unsigned size )
 {
struct pipe_resource buffer;
@@ -395,7 +395,7 @@ pipe_buffer_write_nooverlap(struct pipe_context *pipe,
 static inline struct pipe_resource *
 pipe_buffer_create_with_data(struct pipe_context *pipe,
  unsigned bind,
- unsigned usage,
+ enum pipe_resource_usage usage,
  unsigned size,
  const void *ptr)
 {
diff --git a/src/gallium/auxiliary/util/u_staging.c b/src/gallium/auxiliary/util/u_staging.c
index caef2a8..5b61f5e 100644
--- a/src/gallium/auxiliary/util/u_staging.c
+++ b/src/gallium/auxiliary/util/u_staging.c
@@ -56,7 +56,7 @@ util_staging_resource_template(struct pipe_resource *pt, unsigned width,
 struct util_staging_transfer *
 util_staging_transfer_init(struct pipe_context *pipe,
struct pipe_resource *pt,
-   unsigned level, unsigned usage,
+   unsigned level, enum pipe_resource_usage usage,
const struct pipe_box *box,
boolean direct, struct util_staging_transfer *tx)
 {
diff --git a/src/gallium/auxiliary/util/u_staging.h b/src/gallium/auxiliary/util/u_staging.h
index 6c468aa..eed5584 100644
--- a/src/gallium/auxiliary/util/u_staging.h
+++ b/src/gallium/auxiliary/util/u_staging.h
@@ -56,7 +56,7 @@ struct util_staging_transfer {
 struct util_staging_transfer *
 util_staging_transfer_init(struct pipe_context *pipe,
struct pipe_resource *pt,
-   unsigned level, unsigned usage,
+   unsigned level, enum pipe_resource_usage usage,
const struct pipe_box *box,
boolean direct, struct util_staging_transfer *tx);
 
diff --git a/src/gallium/auxiliary/util/u_suballoc.c b/src/gallium/auxiliary/util/u_suballoc.c
index efa9a0c..3f9ede0 100644
--- a/src/gallium/auxiliary/util/u_suballoc.c
+++ b/src/gallium/auxiliary/util/u_suballoc.c
@@ -43,7 +43,7 @@ struct u_suballocator {
unsigned size;  /* Size of the whole buffer, in bytes. */
unsigned alignment; /* Alignment of each sub-allocation. */
unsigned bind;  /* Bitmask of PIPE_BIND_* flags. */
-   unsigned usage; /* One of PIPE_USAGE_* flags. */
+   enum pipe_resource_usage usage;
boolean zero_buffer_memory; /* If the buffer contents should be zeroed. */
 
struct pipe_resource *buffer;   /* The buffer we suballocate from. */
@@ -59,7 +59,8 @@ struct u_suballocator {
  */
 struct u_suballocator *
 u_suballocator_create(struct pipe_context *pipe, unsigned size,
-  unsig

Re: [Mesa-dev] [Mesa-stable] [V2 PATCH] meta: Fix the pbo usage in meta for GLES{1, 2} contexts

2016-05-26 Thread Emil Velikov
Hi all,

On 2 March 2016 at 03:22, Ian Romanick  wrote:
> Sorry for the delay.
>
> Reviewed-by: Ian Romanick 
>
>
> On 02/09/2016 03:28 PM, Anuj Phogat wrote:
>> OpenGL ES 1.0 doesn't support using GL_STREAM_DRAW and both
>> ES 1.0 and 2.0 don't support GL_STREAM_READ in glBufferData().
>> So, handle it correctly by calling the _mesa_meta_begin()
>> before create_texture_for_pbo().
>>
>> V2: Remove the changes related to allocate_storage. (Ian)
>>
>> Cc: Ian Romanick 
>> Cc: "11.1" 
>
>   "11.1 11.2"
>
>> Signed-off-by: Anuj Phogat 
>> ---
>>  src/mesa/drivers/common/meta_tex_subimage.c | 21 +
>>  1 file changed, 13 insertions(+), 8 deletions(-)
>>
It doesn't seem like this patch has landed yet, despite being
reviewed. Sadly it no longer applies cleanly to master, so I'm
wondering if it's still applicable or there's an alternative solution
(be that merged or not).

Thanks
Emil


Re: [Mesa-dev] [PATCH] egl: Check if API is supported when using eglBindAPI.

2016-05-26 Thread Brian Paul

On 05/24/2016 04:42 PM, Plamena Manolova wrote:

According to the EGL specification, before binding an API we must
check whether it is supported. If it is not, eglBindAPI should
return EGL_FALSE and generate an EGL_BAD_PARAMETER error.

Signed-off-by: Plamena Manolova 
---
  src/egl/main/eglapi.c | 65 +++
  src/egl/main/eglcurrent.h | 11 +---
  src/egl/main/egldisplay.c |  5 
  src/egl/main/egldisplay.h |  1 +
  4 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index be2c90f..2d03ab4 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -1196,6 +1196,61 @@ eglGetError(void)
  }


+static bool
+_eglDisplaySupportsApi(_EGLDisplay *dpy, EGLenum api)
+{
+   if (!dpy->Initialized) {
+  return false;
+   }
+
+   switch (api) {
+   case EGL_OPENGL_API:
+  return !!(dpy->ClientAPIs & EGL_OPENGL_BIT);
+   case EGL_OPENGL_ES_API:
+  return !!(dpy->ClientAPIs & EGL_OPENGL_ES_BIT) ||
+ !!(dpy->ClientAPIs & EGL_OPENGL_ES2_BIT) ||
+ !!(dpy->ClientAPIs & EGL_OPENGL_ES3_BIT_KHR);


I think I'd indent the 2nd and 3rd lines there to line things up:

  return !!(dpy->ClientAPIs & EGL_OPENGL_ES_BIT) ||
         !!(dpy->ClientAPIs & EGL_OPENGL_ES2_BIT) ||
         !!(dpy->ClientAPIs & EGL_OPENGL_ES3_BIT_KHR);

And actually, I don't think you need the !! operators since the 
logical-OR will implicitly convert its operands to bool, but either way 
is fine.
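
The point about the `!!` operators can be confirmed in isolation. In C, `||` compares each operand against zero and yields an `int` that is exactly 0 or 1, so pre-normalizing the operands changes nothing. The bit values below are hypothetical placeholders, not the real `EGL_*_BIT` constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholder bits; not the values from EGL/egl.h. */
#define DEMO_ES_BIT  0x01u
#define DEMO_ES2_BIT 0x04u
#define DEMO_ES3_BIT 0x40u

/* Form used in the patch: each operand normalized with !! first. */
static bool
es_supported_bangs(unsigned apis)
{
   return !!(apis & DEMO_ES_BIT) ||
          !!(apis & DEMO_ES2_BIT) ||
          !!(apis & DEMO_ES3_BIT);
}

/* Suggested form: || already treats any nonzero operand as true. */
static bool
es_supported_plain(unsigned apis)
{
   return (apis & DEMO_ES_BIT) ||
          (apis & DEMO_ES2_BIT) ||
          (apis & DEMO_ES3_BIT);
}

/* Raw value of an || expression: 1, even though the masked operand
 * itself evaluates to 0x40. */
static int
raw_or_value(unsigned apis)
{
   return (apis & DEMO_ES3_BIT) || (apis & DEMO_ES_BIT);
}
```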



+   case EGL_OPENVG_API:
+  return !!(dpy->ClientAPIs & EGL_OPENVG_BIT);
+   }
+
+   return false;
+}
+
+
+/**
+ * Return true if a client API enum is recognized.
+ */
+static bool
+_eglIsApiValid(EGLenum api)
+{
+   _EGLDisplay *dpy = _eglGlobal.DisplayList;
+   _EGLThreadInfo *current_thread = _eglGetCurrentThread();
+
+   if (api != EGL_OPENGL_API && api != EGL_OPENGL_ES_API &&
+   api != EGL_OPENVG_API) {
+  return false;
+   }
+
+   while (dpy) {
+  _EGLThreadInfo *thread = dpy->ThreadList;
+
+  while (thread) {
+ if (thread == current_thread) {
+if (_eglDisplaySupportsApi(dpy, api))
+   return true;
+ }
+
+ thread = thread->Next;
+  }
+
+  dpy = dpy->Next;
+   }
+
+   return false;
+}
+
+
  /**
   ** EGL 1.2
   **/
@@ -1211,6 +1266,16 @@ eglGetError(void)
   *  eglWaitNative()
   * See section 3.7 "Rendering Context" in the EGL specification for details.
   */
+
+ /**
+  * Section 3.7 (Rendering Contexts) of the EGL 1.5 spec says:
+  *
+  * "api must specify one of the supported client APIs, either
+  * EGL_OPENGL_API, EGL_OPENGL_ES_API, or EGL_OPENVG_API... If api
+  * is not one of the values specified above, or if the client API
+  * specified by api is not supported by the implementation, an
+  * EGL_BAD_PARAMETER error is generated."
+  */
  EGLBoolean EGLAPIENTRY
  eglBindAPI(EGLenum api)
  {
diff --git a/src/egl/main/eglcurrent.h b/src/egl/main/eglcurrent.h
index 1e386ac..6c203be 100644
--- a/src/egl/main/eglcurrent.h
+++ b/src/egl/main/eglcurrent.h
@@ -56,6 +56,7 @@ extern "C" {
   */
  struct _egl_thread_info
  {
+   _EGLThreadInfo *Next; /* used to link threads */
 EGLint LastError;
 _EGLContext *CurrentContexts[_EGL_API_NUM_APIS];
 /* use index for fast access to current context */
@@ -64,16 +65,6 @@ struct _egl_thread_info


  /**
- * Return true if a client API enum is recognized.
- */
-static inline EGLBoolean
-_eglIsApiValid(EGLenum api)
-{
-   return (api >= _EGL_API_FIRST_API && api <= _EGL_API_LAST_API);
-}
-
-
-/**
   * Convert a client API enum to an index, for use by thread info.
   * The client API enum is assumed to be valid.
   */
diff --git a/src/egl/main/egldisplay.c b/src/egl/main/egldisplay.c
index f6db03a..907a607 100644
--- a/src/egl/main/egldisplay.c
+++ b/src/egl/main/egldisplay.c
@@ -240,6 +240,7 @@ _EGLDisplay *
  _eglFindDisplay(_EGLPlatformType plat, void *plat_dpy)
  {
 _EGLDisplay *dpy;
+   _EGLThreadInfo *thread = _eglGetCurrentThread();

 if (plat == _EGL_INVALID_PLATFORM)
return NULL;
@@ -265,9 +266,13 @@ _eglFindDisplay(_EGLPlatformType plat, void *plat_dpy)
   /* add to the display list */
   dpy->Next = _eglGlobal.DisplayList;
   _eglGlobal.DisplayList = dpy;
+ dpy->ThreadList = NULL;
}
 }

+   thread->Next = dpy->ThreadList;
+   dpy->ThreadList = thread;
+
 mtx_unlock(_eglGlobal.Mutex);

 return dpy;
diff --git a/src/egl/main/egldisplay.h b/src/egl/main/egldisplay.h
index 6bfc858..8a730ed 100644
--- a/src/egl/main/egldisplay.h
+++ b/src/egl/main/egldisplay.h
@@ -140,6 +140,7 @@ struct _egl_display
 _EGLPlatformType Platform; /**< The type of the platform display */
 void *PlatformDisplay; /**< A pointer to the platform display */

+   _EGLThreadInfo *ThreadList;/**< A pointer to the thread the display was created from */
 _EGLDriver *Driver;/**< Matched driver of the display */
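
The spec text quoted above `eglBindAPI()` requires two checks. First, the enum must be one of the three client APIs. Second, the implementation must support that API. If either check fails, an `EGL_BAD_PARAMETER` error is generated and `EGL_FALSE` is returned. Below is a minimal sketch of that control flow. It uses hypothetical placeholder enums and a single support bitmask in place of the patch's per-display, per-thread lookup. Leaving the previous binding untouched on error is a choice made by this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholders; not the values from EGL/egl.h. */
enum demo_api   { DEMO_OPENGL_API, DEMO_OPENGL_ES_API, DEMO_OPENVG_API,
                  DEMO_BOGUS_API };
enum demo_error { DEMO_SUCCESS, DEMO_BAD_PARAMETER };

struct demo_thread {
   enum demo_api current_api;
   enum demo_error last_error;
};

/* Bit (1u << api) of supported_mask marks an API as supported. */
static bool
demo_bind_api(struct demo_thread *t, enum demo_api api,
              unsigned supported_mask)
{
   bool known = api == DEMO_OPENGL_API || api == DEMO_OPENGL_ES_API ||
                api == DEMO_OPENVG_API;

   if (!known || !(supported_mask & (1u << api))) {
      t->last_error = DEMO_BAD_PARAMETER;   /* stands in for EGL_BAD_PARAMETER */
      return false;                          /* stands in for EGL_FALSE */
   }
   t->current_api = api;
   t->last_error = DEMO_SUCCESS;
   return true;
}
```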

Re: [Mesa-dev] [Mesa-stable] [PATCH v2] scons: build osmesa swrast and gallium

2016-05-26 Thread Emil Velikov
On 27 April 2016 at 12:36, Emil Velikov  wrote:
> On 11 March 2016 at 08:43, Andreas Fänger  wrote:
>> This patch makes it possible to build classic osmesa/swrast on windows
>> again. Although there is a gallium version of osmesa now, the swrast version
>> still has more features lacking in llvmpipe, e.g. anisotropic filtering.
>>
>> This reverts commit 69db422218b0264b5b8eef45bd003a2544e9cbd6 and
>> c07df0f2014636b601cdbaff63214296599b1ad5 and adds "compiler" to
>> the LIBS in SConscript.
>>
> Brian, since you're one of the few people (the only person) looking
> after osmesa, are you OK with the patch ?
>
Brian, can we please hear your ack/nack on this patch?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 4/4] glx: fix error code when there is no context bound

2016-05-26 Thread Emil Velikov
Hi all,

On 29 February 2016 at 07:14, Tapani Pälli  wrote:
>
> On 02/22/2016 10:16 PM, Ian Romanick wrote:
>>
>> There are 17 total occurrences of
>>
>>  grep -r '[(]!gc[)]' src/glx/
>>
>> and
>>
>>  grep -r 'gc[[:space:]]*==[[:space:]]*NULL' src/glx/
>>
>> None of these check for dummyContext.  This is all very suspicious.
>> Looking at the implementation(s) of __glXGetCurrentContext, I don't
>> think it can ever return NULL.  Look in src/glx/glxcurrent.c.  It's
>> possible that __glXGetCurrentContext used to be able to return NULL, but
>> I find it unlikely.
>>
>> My guess is that all (or nearly all) of the !gc or gc == NULL checks are
>> wrong.  A bunch of them probably "just work" because they end up sending
>> protocol requests to the server, and the server sends back an error.
>
>
> I spent some time with this and it looks like some of these are correct as
> create_context (or indirect_create_context) can return NULL and also pointer
> given by client may be NULL (and can't be dummyContext). The places with
> explicit __glXGetCurrentContext call (9 of these) and a NULL check are
> incorrect. I can add these to the patch.
>
>> At the very least, I think these gc == NULL checks should be replaced by
>> asserts.  If the unit tests call these functions with
>> __glXGetCurrentContext returning NULL, the unit tests should be fixed to
>> return &dummyContext instead.
>
>
> Should it then be an 'own dummyContext' implemented by fake_glx_screen.cpp,
> something along the lines of this patch, rather than trying to link with
> glxcurrent.c?
>
>> I'd really like to see analysis of the other NULL checks and either have
>> justifications for no change or have changes.  I'd also really like to
>> see piglit tests that could hit some of these.
>
>
> It looks like glx-test is testing return value of __glXGetCurrentContext
> currently (which is why it breaks), wouldn't fixing glx-test be sufficient?
>
>
Any news on the status of this patch ? The Suse guys did bring some
fixes recently (check __glXGetCurrentContext() vs dummyContext as
opposed to NULL), although I think we still want something like the
proposed here. Correct ?

Thanks
Emil
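Ian's argument is that `__glXGetCurrentContext()` hands back a static `dummyContext` rather than NULL when nothing is bound. If so, a `!gc` test never fires, and callers have to compare against the sentinel instead. The sketch below shows that sentinel pattern in isolation. `ctx`, `dummy_context`, `get_current_context()`, `make_current()` and `have_current_context()` are hypothetical names, not the GLX code in src/glx/glxcurrent.c.

```c
#include <stdbool.h>
#include <stddef.h>

struct ctx {
   int id;
};

/* Sentinel used when no context is bound (stand-in for dummyContext). */
static struct ctx dummy_context;
static struct ctx *current = &dummy_context;

/* Never returns NULL: "no context" is represented by the sentinel. */
static struct ctx *
get_current_context(void)
{
   return current;
}

/* Binding NULL falls back to the sentinel. */
static void
make_current(struct ctx *c)
{
   current = c ? c : &dummy_context;
}

/* The check the thread argues for: compare against the sentinel, not NULL. */
static bool
have_current_context(void)
{
   return get_current_context() != &dummy_context;
}
```

Under this design, a NULL check such as `if (!gc)` is dead code, which matches the concern raised about the 17 existing checks.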


Re: [Mesa-dev] [PATCH 3/3] radeon/llvm: Use alloca instructions for larger arrays

2016-05-26 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Wed, May 25, 2016 at 2:35 PM, Tom Stellard  wrote:
> We were storing arrays in vectors, which was leading to some really bad
> spill code for large arrays.  alloca instructions are a better fit for
> arrays and LLVM optimizations are more geared toward dealing with
> allocas instead of vectors.
>
> For arrays that have 16 or fewer 32-bit elements, we will continue to use
> vectors, because this will force LLVM to store them in registers and
> use indirect registers, which is usually faster for small arrays.
>
> In the future we should use allocas for all arrays and teach LLVM
> how to store allocas in registers.
>
> This fixes the piglit test:
>
> spec/glsl-1.50/execution/geometry/max-input-component
> ---
>  src/gallium/drivers/radeon/radeon_llvm.h   |   7 +-
>  .../drivers/radeon/radeon_setup_tgsi_llvm.c| 169 ++---
>  2 files changed, 151 insertions(+), 25 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h
> index 3e11b36..5b524b6 100644
> --- a/src/gallium/drivers/radeon/radeon_llvm.h
> +++ b/src/gallium/drivers/radeon/radeon_llvm.h
> @@ -50,6 +50,11 @@ struct radeon_llvm_loop {
> LLVMBasicBlockRef endloop_block;
>  };
>
> +struct radeon_llvm_array {
> +   struct tgsi_declaration_range range;
> +   LLVMValueRef alloca;
> +};
> +
>  struct radeon_llvm_context {
> struct lp_build_tgsi_soa_context soa;
>
> @@ -96,7 +101,7 @@ struct radeon_llvm_context {
> unsigned loop_depth;
> unsigned loop_depth_max;
>
> -   struct tgsi_declaration_range *arrays;
> +   struct radeon_llvm_array *arrays;
>
> LLVMValueRef main_fn;
> LLVMTypeRef return_type;
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index 93bc307..cb35390 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -83,11 +83,25 @@ static LLVMValueRef emit_swizzle(
>
>  static struct tgsi_declaration_range
>  get_array_range(struct lp_build_tgsi_context *bld_base,
> -   unsigned File, const struct tgsi_ind_register *reg)
> +   unsigned File, unsigned reg_index,
> +   const struct tgsi_ind_register *reg)
>  {
> struct radeon_llvm_context * ctx = radeon_llvm_context(bld_base);
>
> -   if (File != TGSI_FILE_TEMPORARY || reg->ArrayID == 0 ||
> +   if (!reg) {
> +   unsigned i;
> +   unsigned num_arrays = bld_base->info->array_max[TGSI_FILE_TEMPORARY];
> +   for (i = 0; i < num_arrays; i++) {
> +   const struct tgsi_declaration_range *range =
> +   &ctx->arrays[i].range;
> +
> +   if (reg_index >= range->First && reg_index <= range->Last) {
> +   return ctx->arrays[i].range;
> +   }
> +   }
> +   }
> +
> +   if (File != TGSI_FILE_TEMPORARY || !reg || reg->ArrayID == 0 ||
> reg->ArrayID > bld_base->info->array_max[TGSI_FILE_TEMPORARY]) {
> struct tgsi_declaration_range range;
> range.First = 0;
> @@ -95,7 +109,32 @@ get_array_range(struct lp_build_tgsi_context *bld_base,
> return range;
> }
>
> -   return ctx->arrays[reg->ArrayID - 1];
> +   return ctx->arrays[reg->ArrayID - 1].range;
> +}
> +
> +static LLVMValueRef get_alloca_for_array(
> +   struct lp_build_tgsi_context *bld_base,
> +   unsigned file,
> +   unsigned index) {
> +
> +   unsigned i;
> +   unsigned num_arrays;
> +   struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
> +
> +   if (file != TGSI_FILE_TEMPORARY) {
> +   return NULL;
> +   }
> +
> +   num_arrays = bld_base->info->array_max[TGSI_FILE_TEMPORARY];
> +   for (i = 0; i < num_arrays; i++) {
> +   const struct tgsi_declaration_range *range =
> +   &ctx->arrays[i].range;
> +
> +   if (index >= range->First && index <= range->Last) {
> +   return ctx->arrays[i].alloca;
> +   }
> +   }
> +   return NULL;
>  }
>
>  static LLVMValueRef
> @@ -106,6 +145,9 @@ emit_array_index(
>  {
> struct gallivm_state * gallivm = bld->bld_base.base.gallivm;
>
> +   if (!reg) {
> +   return lp_build_const_int32(gallivm, offset);
> +   }
> LLVMValueRef addr = LLVMBuildLoad(gallivm->builder, bld->addr[reg->Index][reg->Swizzle], "");
> return LLVMBuildAdd(gallivm->builder, addr, lp_build_const_int32(gallivm, offset), "");
>  }
> @@ -154,7 +196,7 @@ emit_array_fetch(
> tmp_reg.Register.Index = i + range.First;
> LLVMValueRef temp = rade
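The new `get_alloca_for_array()` does a linear scan over the declared TEMP arrays and returns the one whose inclusive `[First, Last]` register range contains the index. The commit message also says arrays of 16 or fewer 32-bit elements stay in vectors. Below is a standalone sketch of both decisions. `array_range`, `find_array()` and `use_alloca()` are hypothetical, and how the patch counts elements is not visible in the quote, so the element count is simply a parameter here.

```c
#include <stdbool.h>

/* Stand-in for tgsi_declaration_range: an inclusive register range. */
struct array_range {
   unsigned first;
   unsigned last;
};

/* Returns the index of the array whose range contains reg_index, or -1.
 * Linear scan, like the loop in get_alloca_for_array(). */
static int
find_array(const struct array_range *arrays, unsigned num_arrays,
           unsigned reg_index)
{
   for (unsigned i = 0; i < num_arrays; i++) {
      if (reg_index >= arrays[i].first && reg_index <= arrays[i].last)
         return (int)i;
   }
   return -1;
}

/* Threshold from the commit message: arrays of 16 or fewer 32-bit elements
 * stay in vectors (registers); larger ones are backed by an alloca. */
static bool
use_alloca(unsigned num_32bit_elements)
{
   return num_32bit_elements > 16;
}
```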

Re: [Mesa-dev] [Mesa-stable] [PATCH] nv50/ir: fix texture barriers insertion with combined load/store

2016-05-26 Thread Emil Velikov
Hi guys,

Double-checking through the list and this patch never landed in
master. From a quick read through the log it's not immediately obvious
if an alternative did get pushed.

Can someone shed some light on the topic ?

Thanks
Emil


Re: [Mesa-dev] [PATCH 1/2] mesa: Allow to invalidate external textures when (re-)binding

2016-05-26 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Wed, May 25, 2016 at 3:17 PM, Philipp Zabel  wrote:
> To comply with the requirement from the GL_OES_EGL_image_external
> extension that a call to glBindTexture guarantees that all further
> sampling will return values that correspond to the values in the
> external texture at or after the time that glBindTexture was called,
> do not bail out early from mesa_BindTextures if the target is
> external.
>
> Signed-off-by: Philipp Zabel 
> ---
>  src/mesa/main/texobj.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
> index c9502bd..6219617 100644
> --- a/src/mesa/main/texobj.c
> +++ b/src/mesa/main/texobj.c
> @@ -1623,9 +1623,10 @@ bind_texture(struct gl_context *ctx,
> assert(targetIndex < NUM_TEXTURE_TARGETS);
>
> /* Check if this texture is only used by this context and is already bound.
> -* If so, just return.
> +* If so, just return. For GL_OES_EGL_image_external, rebinding the texture
> +* must always invalidate cached resources.
>  */
> -   {
> +   if (targetIndex != TEXTURE_EXTERNAL_INDEX) {
>bool early_out;
>mtx_lock(&ctx->Shared->Mutex);
>early_out = ((ctx->Shared->RefCount == 1)
> --
> 2.8.1
>
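With this hunk, the "already bound, nothing to do" shortcut in `bind_texture()` is only taken for non-external targets. Every `glBindTexture` on an external texture therefore goes through the full path. The sketch below expresses that decision as a predicate, following the comment in the hunk. The full `early_out` expression is cut off in the quote. `tex_target`, `TARGET_EXTERNAL` (standing in for `TEXTURE_EXTERNAL_INDEX`) and `can_skip_rebind()` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical target indices; TARGET_EXTERNAL stands in for
 * TEXTURE_EXTERNAL_INDEX. */
enum tex_target { TARGET_2D, TARGET_EXTERNAL };

/* May the bind be skipped? Per the patch comment: only if the shared state
 * is used by this context alone, the texture is already bound, and the
 * target is not external (external rebinds must invalidate). */
static bool
can_skip_rebind(enum tex_target target, unsigned shared_refcount,
                bool already_bound)
{
   if (target == TARGET_EXTERNAL)
      return false;
   return shared_refcount == 1 && already_bound;
}
```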


Re: [Mesa-dev] [PATCH 2/2] st/mesa: Invalidate external textures when (re-)binding

2016-05-26 Thread Marek Olšák
On Wed, May 25, 2016 at 3:34 PM, Philipp Zabel  wrote:
> On Wednesday, 25.05.2016, 09:23 -0400, Ilia Mirkin wrote:
>> Iirc invalidate_resource is to allow backend to discard the contents...
>
> Thanks, I didn't know that. So this would need a new callback then?
> Specifically I want to discard a copy in tiled layout that was derived
> from a linear external texture previously.

FWIW, radeon drivers don't change the tile mode after a texture has
been exported. When the texture is exported, the tile mode is set in
stone, be it linear or tiled. There is no second copy.

Marek


Re: [Mesa-dev] [Mesa-stable] [PATCH] nv50/ir: fix texture barriers insertion with combined load/store

2016-05-26 Thread Samuel Pitoiset



On 05/26/2016 04:29 PM, Emil Velikov wrote:
> Hi guys,
>
> Double-checking through the list and this patch never landed in
> master. From a quick read through the log it's not immediately obvious
> if an alternative did get pushed.
>
> Can someone shed some light on the topic ?

Hi Emil,

Oops! I just forgot to make a v2. :)

Will do before the release (or during the RC window).

Thanks!

> Thanks
> Emil

--
-Samuel


Re: [Mesa-dev] [PATCH] nir: Use double-precision pow() when bit_size is 64, powf() otherwise

2016-05-26 Thread Iago Toral
On Thu, 2016-05-26 at 15:13 +0100, Emil Velikov wrote:
> On 19 May 2016 at 10:00, Iago Toral  wrote:
> > I have just noticed that this was never pushed, right? I noticed this
> > while working on providing double-precision implementation for the other
> > functions discussed in the thread.
> >
> You are correct Iago. Gents, can anyone shed some light on the status
> of this patch?
> 
> Has it been superseded by another, or is there some work outstanding on this one?

Just to clarify the situation:

This particular patch fixes something that is wrong in master as of now
so I think it makes sense to push it. I don't think we have any bugs
related to this because GLSL only has a 32-bit pow and as it is now we
would simply use pow instead of powf for that, which is not what we want
but should be safe, so even if we don't push this I don't think we have
a problem.

Then we need to decide if we want to keep 64-bit implementations of pow
and other functions that don't really need 64-bit support in GLSL or
not. But this is not urgent because we are talking about versions of
functions that are not exposed by GLSL for now. If we want to do it,
then we can review and push [1] which adds support for a couple more and
if we don't then we have to remove the support for pow and sin/cos that
is in master right now.

[1] https://lists.freedesktop.org/archives/mesa-dev/2016-May/117521.html

Iago



Re: [Mesa-dev] [PATCH v3 00/14] radeonsi: offchip tessellation

2016-05-26 Thread Marek Olšák
On Thu, May 26, 2016 at 3:33 PM, Bas Nieuwenhuizen
 wrote:
> Addressed review comments by Marek.
>
> As part of that the max number of patches per threadgroup was reduced to 40
> from 64. This reduced unigine-heaven performance from 43.1 fps to 42.5 fps
> (the number varies a little but the magnitude of the difference is pretty constant)
>
> However it is likely that the optimal value for it differs between applications,
> and I don't have that many applications to check against.
>
> Any thoughts on the issue?

Feel free to increase the value as needed (without review).

Marek


Re: [Mesa-dev] [PATCH 2/2] st/mesa: Invalidate external textures when (re-)binding

2016-05-26 Thread Ilia Mirkin
On Thu, May 26, 2016 at 10:31 AM, Marek Olšák  wrote:
> On Wed, May 25, 2016 at 3:34 PM, Philipp Zabel  wrote:
>> On Wednesday, 25.05.2016, 09:23 -0400, Ilia Mirkin wrote:
>>> Iirc invalidate_resource is to allow backend to discard the contents...
>>
>> Thanks, I didn't know that. So this would need a new callback then?
>> Specifically I want to discard a copy in tiled layout that was derived
>> from a linear external texture previously.
>
> FWIW, radeon drivers don't change the tile mode after a texture has
> been exported. When the texture is exported, the tile mode is set in
> stone, be it linear or tiled. There is no second copy.

I think what he's saying is that they have a shadow copy of the
texture, and need to know when to update the shadow.

  -ilia


Re: [Mesa-dev] [Mesa-stable] [PATCH 7/8] glsl/linker: Don't include interface name for built-in blocks

2016-05-26 Thread Emil Velikov
Hi all,

On 18 May 2016 at 00:35, Timothy Arceri  wrote:
> On Tue, 2016-05-17 at 15:11 -0700, Ian Romanick wrote:
>> From: Ian Romanick 
>>
>> Commit 11096ec introduced a regression in some piglit tests (e.g.,
>> arb_program_interface_query-resource-query).  I did not notice this
>> regression because other (unrelated) problems caused failed assertions
>> in those same tests on my system... so they crashed before getting to
>> the new failure.
>>
>> Signed-off-by: Ian Romanick 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  src/compiler/glsl/linker.cpp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
>> index 7f54433..34b4a81 100644
>> --- a/src/compiler/glsl/linker.cpp
>> +++ b/src/compiler/glsl/linker.cpp
>> @@ -3664,7 +3664,8 @@ add_shader_variable(struct gl_shader_program *shProg, unsigned stage_mask,
>> *the name of the interface block (not the instance name) and
>> *"Member" is the name of the variable."
>> */
>> -  const char *prefixed_name = var->data.from_named_ifc_block
>> +  const char *prefixed_name = (var->data.from_named_ifc_block &&
>> +   strncmp(var->name, "gl_", 3) != 0)
>
> You could use !is_gl_identifier(var->name) which looks slightly nicer.
>
As with "glsl/linker: Include the interface name for input and
output blocks", add_shader_variable() is missing in 11.2. Can we have
a backport for the stable branch?

Thanks
Emil


Re: [Mesa-dev] [PATCH 2/2] st/mesa: Invalidate external textures when (re-)binding

2016-05-26 Thread Marek Olšák
On Thu, May 26, 2016 at 4:36 PM, Ilia Mirkin  wrote:
> On Thu, May 26, 2016 at 10:31 AM, Marek Olšák  wrote:
>> On Wed, May 25, 2016 at 3:34 PM, Philipp Zabel  wrote:
>>> On Wednesday, 25.05.2016, 09:23 -0400, Ilia Mirkin wrote:
>>>> Iirc invalidate_resource is to allow backend to discard the contents...
>>>
>>> Thanks, I didn't know that. So this would need a new callback then?
>>> Specifically I want to discard a copy in tiled layout that was derived
>>> from a linear external texture previously.
>>
>> FWIW, radeon drivers don't change the tile mode after a texture has
>> been exported. When the texture is exported, the tile mode is set in
>> stone, be it linear or tiled. There is no second copy.
>
> I think what he's saying is that they have a shadow copy of the
> texture, and need to know when to update the shadow.

Is it a tiler-specific thing?

Would pipe->memory_barrier be a good match for this?

Marek


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] glsl/linker: Include the interface name for input and output blocks

2016-05-26 Thread Emil Velikov
Hi gents,

On 14 May 2016 at 04:11, Kenneth Graunke  wrote:
> On Friday, May 13, 2016 6:42:54 PM PDT Ian Romanick wrote:
>> From: Ian Romanick 
>>
>> On my oes_shader_io_blocks branch, this fixes 71
>> dEQP-GLES31.functional.program_interface_query.* tests.
>>
>> Signed-off-by: Ian Romanick 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  src/compiler/glsl/linker.cpp | 17 -
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
>> index 41b43ab..3749585 100644
>> --- a/src/compiler/glsl/linker.cpp
>> +++ b/src/compiler/glsl/linker.cpp
>> @@ -3654,6 +3654,21 @@ add_shader_variable(struct gl_shader_program *shProg, unsigned stage_mask,
>> }
>>
>> default: {
>> +  /* Issue #16 of the ARB_program_interface_query spec says:
>> +   *
>> +   * "* If a variable is a member of an interface block without an
>> +   *instance name, it is enumerated using just the variable name.
>> +   *
>> +   *  * If a variable is a member of an interface block with an instance
>> +   *name, it is enumerated as "BlockName.Member", where "BlockName" is
>> +   *the name of the interface block (not the instance name) and
>> +   *"Member" is the name of the variable."
>
> lol..."if it's in a block with one kind of name, use the block's other
> name..."
>
Seems like I forgot to press 'send' a while back.

The function itself is missing in the 11.2 branch, and from a quick look
there isn't an easy backport. Can anyone prepare one, or does it not make
sense to have one at all in -stable?

Thanks
Emil
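Taken together, the two linker patches discussed here apply issue #16 of ARB_program_interface_query, under which a member of a block declared with an instance name is enumerated as "BlockName.Member". They also leave built-in `gl_` variables unprefixed. The sketch below combines the two rules. `resource_name()` is a hypothetical helper, not the linker's `add_shader_variable()`. The `strncmp` check mirrors the one in the 7/8 patch, and Timothy suggested `is_gl_identifier()` as a nicer spelling.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Builds the name a block member is enumerated under.
 * block_name: the interface block's name (not its instance name).
 * from_named_block: true if the block was declared with an instance name. */
static void
resource_name(char *out, size_t out_size, const char *block_name,
              const char *var_name, bool from_named_block)
{
   bool is_builtin = strncmp(var_name, "gl_", 3) == 0;

   if (from_named_block && !is_builtin)
      snprintf(out, out_size, "%s.%s", block_name, var_name);
   else
      snprintf(out, out_size, "%s", var_name);
}
```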


Re: [Mesa-dev] [Mesa-stable] [PATCH v2] scons: build osmesa swrast and gallium

2016-05-26 Thread Brian Paul

On 05/26/2016 08:23 AM, Emil Velikov wrote:
> On 27 April 2016 at 12:36, Emil Velikov  wrote:
>> On 11 March 2016 at 08:43, Andreas Fänger  wrote:
>>> This patch makes it possible to build classic osmesa/swrast on windows
>>> again. Although there is a gallium version of osmesa now, the swrast version
>>> still has more features lacking in llvmpipe, e.g. anisotropic filtering.
>>>
>>> This reverts commit 69db422218b0264b5b8eef45bd003a2544e9cbd6 and
>>> c07df0f2014636b601cdbaff63214296599b1ad5 and adds "compiler" to
>>> the LIBS in SConscript.
>>>
>> Brian, since you're one of the few people (the only person) looking
>> after osmesa, are you OK with the patch ?
>>
> Brian, can we please hear your ack/nack on this patch?

Thanks for the reminder. Too much stuff in my inbox.

Tested-by: Brian Paul 
Reviewed-by: Brian Paul 



Re: [Mesa-dev] [PATCH v3 13/14] radeonsi: Process multiple patches per threadgroup.

2016-05-26 Thread Marek Olšák
Patches 12-13:

Reviewed-by: Marek Olšák 

Marek

On Thu, May 26, 2016 at 3:33 PM, Bas Nieuwenhuizen
 wrote:
> Using more than 1 wave per threadgroup does increase performance
> generally.  Not using too many patches per threadgroup also
> increases performance. Both Catalyst and amdgpu-pro seem to
> use 40 patches as their maximum, but I haven't really seen
> any performance increase from limiting the number of patches
> to 40 instead of 64.
>
> Note that the trick where we overlap the input and output LDS
> does not work anymore as the insertion of the tess factors
> changes the patch stride.
>
> v2: - Add comment about LDS assumptions.
> - Add constant for buffer size.
> - Fix code style.
>
> v3: - Correct limits for not splitting patches between waves.
> - Set max num_patches to 40 as in the proprietary driver.
>
> Signed-off-by: Bas Nieuwenhuizen 
> ---
>  src/gallium/drivers/radeonsi/si_state_draw.c | 50 +++-
>  1 file changed, 35 insertions(+), 15 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
> index 6fe2619..c8b87a9 100644
> --- a/src/gallium/drivers/radeonsi/si_state_draw.c
> +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
> @@ -108,20 +108,7 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
> unsigned input_patch_size, output_patch_size, output_patch0_offset;
> unsigned perpatch_output_offset, lds_size, ls_rsrc2;
> unsigned tcs_in_layout, tcs_out_layout, tcs_out_offsets;
> -   unsigned offchip_layout;
> -
> -   *num_patches = 1; /* TODO: calculate this */
> -
> -   if (sctx->last_ls == ls->current &&
> -   sctx->last_tcs == tcs &&
> -   sctx->last_tes_sh_base == tes_sh_base &&
> -   sctx->last_num_tcs_input_cp == num_tcs_input_cp)
> -   return;
> -
> -   sctx->last_ls = ls->current;
> -   sctx->last_tcs = tcs;
> -   sctx->last_tes_sh_base = tes_sh_base;
> -   sctx->last_num_tcs_input_cp = num_tcs_input_cp;
> +   unsigned offchip_layout, hardware_lds_size;
>
> /* This calculates how shader inputs and outputs among VS, TCS, and TES
>  * are laid out in LDS. */
> @@ -146,7 +133,29 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
> pervertex_output_patch_size = num_tcs_output_cp * output_vertex_size;
> output_patch_size = pervertex_output_patch_size + num_tcs_patch_outputs * 16;
>
> -   output_patch0_offset = sctx->tcs_shader.cso ? input_patch_size * *num_patches : 0;
> +   /* Ensure that we only need one wave per SIMD so we don't need to check
> +* resource usage. Also ensures that the number of tcs in and out
> +* vertices per threadgroup is at most 256.
> +*/
> +   *num_patches = 64 / MAX2(num_tcs_input_cp, num_tcs_output_cp) * 4;
> +
> +   /* Make sure that the data fits in LDS. This assumes the shaders only
> +* use LDS for the inputs and outputs.
> +*/
> +   hardware_lds_size = sctx->b.chip_class >= CIK ? 65536 : 32768;
> +   *num_patches = MIN2(*num_patches, hardware_lds_size / (input_patch_size +
> +                                                          output_patch_size));
> +
> +   /* Make sure the output data fits in the offchip buffer */
> +   *num_patches = MIN2(*num_patches, SI_TESS_OFFCHIP_BLOCK_SIZE /
> + output_patch_size);
> +
> +   /* Not necessary for correctness, but improves performance. The
> +* specific value is taken from the proprietary driver.
> +*/
> +   *num_patches = MIN2(*num_patches, 40);
> +
> +   output_patch0_offset = input_patch_size * *num_patches;
> perpatch_output_offset = output_patch0_offset + pervertex_output_patch_size;
>
> lds_size = output_patch0_offset + output_patch_size * *num_patches;
> @@ -160,6 +169,17 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
> ls_rsrc2 |= S_00B52C_LDS_SIZE(align(lds_size, 256) / 256);
> }
>
> +   if (sctx->last_ls == ls->current &&
> +   sctx->last_tcs == tcs &&
> +   sctx->last_tes_sh_base == tes_sh_base &&
> +   sctx->last_num_tcs_input_cp == num_tcs_input_cp)
> +   return;
> +
> +   sctx->last_ls = ls->current;
> +   sctx->last_tcs = tcs;
> +   sctx->last_tes_sh_base = tes_sh_base;
> +   sctx->last_num_tcs_input_cp = num_tcs_input_cp;
> +
> /* Due to a hw bug, RSRC2_LS must be written twice with another
>  * LS register written in between. */
> if (sctx->b.chip_class == CIK && sctx->b.family != CHIP_HAWAII)
> --
> 2.8.3
>
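The v3 patch derives `*num_patches` through a chain of clamps. The sketch below replays that chain outside the driver so the bounds can be checked with concrete numbers. The starting value is 64 / max(control points) patches per wave times 4 waves, which keeps in/out vertices per threadgroup at or below 256. That value is then clamped so all inputs and outputs fit in LDS, clamped again so the outputs fit in the off-chip buffer, and finally capped at 40. `compute_num_patches()` is hypothetical. `offchip_block_size` stands in for `SI_TESS_OFFCHIP_BLOCK_SIZE`, whose value is not shown in the quote, and the sizes are assumed to be in bytes.

```c
/* Local copies of Mesa-style helper macros. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))
#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Mirrors the clamping sequence in the v3 patch. Control-point counts must
 * be at least 1 (a GL patch always has at least one vertex). */
static unsigned
compute_num_patches(unsigned num_tcs_input_cp, unsigned num_tcs_output_cp,
                    unsigned input_patch_size, unsigned output_patch_size,
                    unsigned hardware_lds_size, unsigned offchip_block_size)
{
   unsigned n;

   /* One wave per SIMD: 64 lanes / max control points, times 4 SIMDs. */
   n = 64 / MAX2(num_tcs_input_cp, num_tcs_output_cp) * 4;

   /* Inputs and outputs of every patch must fit in LDS
    * (65536 bytes on CIK and later, 32768 before). */
   n = MIN2(n, hardware_lds_size / (input_patch_size + output_patch_size));

   /* Per-patch outputs must fit in the off-chip buffer. */
   n = MIN2(n, offchip_block_size / output_patch_size);

   /* Performance cap taken from the proprietary driver. */
   return MIN2(n, 40);
}
```

For triangles (3 control points in and out) with small patches, the first bound gives 64 / 3 * 4 = 84. In that case the cap of 40 is what actually applies, which is consistent with Bas seeing a measurable change when moving from 64 to 40.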

Re: [Mesa-dev] [PATCH 3/5] util: s/unsigned/enum pipe_resource_usage/ for buffer usage variables

2016-05-26 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Thu, May 26, 2016 at 4:09 PM, Brian Paul  wrote:
> ---
>  src/gallium/auxiliary/util/u_debug.c  | 2 +-
>  src/gallium/auxiliary/util/u_debug.h  | 2 +-
>  src/gallium/auxiliary/util/u_inlines.h| 6 +++---
>  src/gallium/auxiliary/util/u_staging.c| 2 +-
>  src/gallium/auxiliary/util/u_staging.h| 2 +-
>  src/gallium/auxiliary/util/u_suballoc.c   | 5 +++--
>  src/gallium/auxiliary/util/u_suballoc.h   | 3 ++-
>  src/gallium/auxiliary/util/u_upload_mgr.c | 4 ++--
>  src/gallium/auxiliary/util/u_upload_mgr.h | 2 +-
>  9 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_debug.c b/src/gallium/auxiliary/util/u_debug.c
> index db66357..0d63cfe 100644
> --- a/src/gallium/auxiliary/util/u_debug.c
> +++ b/src/gallium/auxiliary/util/u_debug.c
> @@ -550,7 +550,7 @@ debug_print_bind_flags(const char *msg, unsigned usage)
>   * Print PIPE_USAGE_x enum values with a message.
>   */
>  void
> -debug_print_usage_enum(const char *msg, unsigned usage)
> +debug_print_usage_enum(const char *msg, enum pipe_resource_usage usage)
>  {
> static const struct debug_named_value names[] = {
>DEBUG_NAMED_VALUE(PIPE_USAGE_DEFAULT),
> diff --git a/src/gallium/auxiliary/util/u_debug.h b/src/gallium/auxiliary/util/u_debug.h
> index 85d0cb6..7da7f53 100644
> --- a/src/gallium/auxiliary/util/u_debug.h
> +++ b/src/gallium/auxiliary/util/u_debug.h
> @@ -473,7 +473,7 @@ void
>  debug_print_bind_flags(const char *msg, unsigned usage);
>
>  void
> -debug_print_usage_enum(const char *msg, unsigned usage);
> +debug_print_usage_enum(const char *msg, enum pipe_resource_usage usage);
>
>
>  #ifdef __cplusplus
> diff --git a/src/gallium/auxiliary/util/u_inlines.h b/src/gallium/auxiliary/util/u_inlines.h
> index 07c058a..a38223c 100644
> --- a/src/gallium/auxiliary/util/u_inlines.h
> +++ b/src/gallium/auxiliary/util/u_inlines.h
> @@ -230,12 +230,12 @@ pipe_surface_equal(struct pipe_surface *s1, struct pipe_surface *s2)
>  /**
>   * Create a new resource.
>   * \param bind  bitmask of PIPE_BIND_x flags
> - * \param usage  bitmask of PIPE_USAGE_x flags
> + * \param usage  a PIPE_USAGE_x value
>   */
>  static inline struct pipe_resource *
>  pipe_buffer_create( struct pipe_screen *screen,
> unsigned bind,
> -   unsigned usage,
> +   enum pipe_resource_usage usage,
> unsigned size )
>  {
> struct pipe_resource buffer;
> @@ -395,7 +395,7 @@ pipe_buffer_write_nooverlap(struct pipe_context *pipe,
>  static inline struct pipe_resource *
>  pipe_buffer_create_with_data(struct pipe_context *pipe,
>   unsigned bind,
> - unsigned usage,
> + enum pipe_resource_usage usage,
>   unsigned size,
>   const void *ptr)
>  {
> diff --git a/src/gallium/auxiliary/util/u_staging.c b/src/gallium/auxiliary/util/u_staging.c
> index caef2a8..5b61f5e 100644
> --- a/src/gallium/auxiliary/util/u_staging.c
> +++ b/src/gallium/auxiliary/util/u_staging.c
> @@ -56,7 +56,7 @@ util_staging_resource_template(struct pipe_resource *pt, unsigned width,
>  struct util_staging_transfer *
>  util_staging_transfer_init(struct pipe_context *pipe,
> struct pipe_resource *pt,
> -   unsigned level, unsigned usage,
> +   unsigned level, enum pipe_resource_usage usage,
> const struct pipe_box *box,
> boolean direct, struct util_staging_transfer *tx)
>  {
> diff --git a/src/gallium/auxiliary/util/u_staging.h b/src/gallium/auxiliary/util/u_staging.h
> index 6c468aa..eed5584 100644
> --- a/src/gallium/auxiliary/util/u_staging.h
> +++ b/src/gallium/auxiliary/util/u_staging.h
> @@ -56,7 +56,7 @@ struct util_staging_transfer {
>  struct util_staging_transfer *
>  util_staging_transfer_init(struct pipe_context *pipe,
> struct pipe_resource *pt,
> -   unsigned level, unsigned usage,
> +   unsigned level, enum pipe_resource_usage usage,
> const struct pipe_box *box,
> boolean direct, struct util_staging_transfer *tx);
>
> diff --git a/src/gallium/auxiliary/util/u_suballoc.c b/src/gallium/auxiliary/util/u_suballoc.c
> index efa9a0c..3f9ede0 100644
> --- a/src/gallium/auxiliary/util/u_suballoc.c
> +++ b/src/gallium/auxiliary/util/u_suballoc.c
> @@ -43,7 +43,7 @@ struct u_suballocator {
> unsigned size;  /* Size of the whole buffer, in bytes. */
> unsigned alignment; /* Alignment of each sub-allocation. */
> unsigned bind;  /* Bitmask of PIPE_BIND_* flags. */
> -   unsigned usage; /* One of PIPE_USAGE_* flags. */
> +   enum pipe_resource_usage usage;
> boolean zero_buff

Re: [Mesa-dev] [PATCH 5/5] gallium: change pipe_draw_info::mode to be pipe_prim_type

2016-05-26 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Thu, May 26, 2016 at 4:09 PM, Brian Paul  wrote:
> Makes debugging with gdb a little nicer.
> ---
>  src/gallium/include/pipe/p_state.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
> index eacf9bb..396f563 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -617,7 +617,7 @@ struct pipe_draw_info
>  {
> boolean indexed;  /**< use index buffer */
>
> -   unsigned mode;  /**< the mode of the primitive */
> +   enum pipe_prim_type mode;  /**< the mode of the primitive */
> unsigned start;  /**< the index of the first vertex */
> unsigned count;  /**< number of vertices */
>
> --
> 1.9.1
>


Re: [Mesa-dev] [PATCH 2/2] st/mesa: Invalidate external textures when (re-)binding

2016-05-26 Thread Philipp Zabel
On Thursday, 26.05.2016, 10:36 -0400, Ilia Mirkin wrote:
> On Thu, May 26, 2016 at 10:31 AM, Marek Olšák  wrote:
> > On Wed, May 25, 2016 at 3:34 PM, Philipp Zabel  wrote:
> >> On Wednesday, 25.05.2016, 09:23 -0400, Ilia Mirkin wrote:
> >>> Iirc invalidate_resource is to allow backend to discard the contents...
> >>
> >> Thanks, I didn't know that. So this would need a new callback then?
> >> Specifically I want to discard a copy in tiled layout that was derived
> >> from a linear external texture previously.
> >
> > FWIW, radeon drivers don't change the tile mode after a texture has
> > been exported. When the texture is exported, the tile mode is set in
> > stone, be it linear or tiled. There is no second copy.
> 
> I think what he's saying is that they have a shadow copy of the
> texture, and need to know when to update the shadow.

Yes, exactly. I'd like to import a linear dma-buf using
EGL_EXT_image_dma_buf_import and GL_OES_EGL_image_external with the
etnaviv gallium driver.

The linear source buffer needs to be transferred into a shadow copy in
tiled layout. The texture samplers can only read from the tiled copy.

After the linear source buffer has been modified, binding the texture
again must trigger a refresh of the tiled copy somehow.

regards
Philipp
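Philipp's case is a sampler that can only read a tiled shadow copy derived from an imported linear dma-buf, so a rebind must cause that copy to be refreshed. One simple way to express the rule is shown below. The driver typically cannot see writes made to the dma-buf, so on every (re)bind of an external texture it marks the shadow stale, and before sampling it refreshes the shadow at most once. Every name here is hypothetical and not part of the Gallium or etnaviv API. The `memcpy` stands in for the linear-to-tiled conversion. Which Gallium hook should carry the "rebound" signal is exactly what the thread is still deciding.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical external texture: a linear source plus a derived shadow. */
struct ext_texture {
   const unsigned char *linear;   /* imported buffer, may change externally */
   unsigned char *shadow;         /* copy in the layout the sampler reads */
   size_t size;
   bool shadow_stale;
};

/* Called on every (re)bind: assume the source may have changed. */
static void
ext_texture_bind(struct ext_texture *t)
{
   t->shadow_stale = true;
}

/* Called before sampling: refresh the shadow at most once per bind. */
static void
ext_texture_validate(struct ext_texture *t)
{
   if (!t->shadow_stale)
      return;
   memcpy(t->shadow, t->linear, t->size);  /* stand-in for linear->tiled */
   t->shadow_stale = false;
}
```

In this model, a write to the linear buffer between two draws without an intervening bind is not picked up. That matches the GL_OES_EGL_image_external guarantee, which only covers sampling after `glBindTexture`.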



Re: [Mesa-dev] [PATCH] gallium: detect avx512 cpu features

2016-05-26 Thread Roland Scheidegger
On 26.05.2016 at 02:10, Tim Rowley wrote:
> ---
>  src/gallium/auxiliary/util/u_cpu_detect.c | 17 +
>  src/gallium/auxiliary/util/u_cpu_detect.h | 10 ++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c
> index aa3c30a..0d4a7c4 100644
> --- a/src/gallium/auxiliary/util/u_cpu_detect.c
> +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
> @@ -387,6 +387,23 @@ util_cpu_detect(void)
>   util_cpu_caps.has_avx2 = (regs7[1] >> 5) & 1;
>}
>  
> +  // check for avx512
> +  if (((regs2[2] >> 27) & 1) && // OSXSAVE
> +  (xgetbv() & (0x7 << 5)) && // OPMASK: upper-256 enabled by OS
> +  (xgetbv() & (0x3 << 1))) { // XMM/YMM enabled by OS
> + uint32_t regs3[4];
> + cpuid(0x7, regs3);
> + util_cpu_caps.has_avx512f= (regs3[1] >> 16) & 1;
> + util_cpu_caps.has_avx512dq   = (regs3[1] >> 17) & 1;
> + util_cpu_caps.has_avx512ifma = (regs3[1] >> 21) & 1;
> + util_cpu_caps.has_avx512pf   = (regs3[1] >> 26) & 1;
> + util_cpu_caps.has_avx512er   = (regs3[1] >> 27) & 1;
> + util_cpu_caps.has_avx512cd   = (regs3[1] >> 28) & 1;
> + util_cpu_caps.has_avx512bw   = (regs3[1] >> 30) & 1;
> + util_cpu_caps.has_avx512vl   = (regs3[1] >> 31) & 1;
> + util_cpu_caps.has_avx512vbmi = (regs3[2] >>  1) & 1;
> +  }
> +
>if (regs[1] == 0x756e6547 && regs[2] == 0x6c65746e && regs[3] == 0x49656e69) {
>   /* GenuineIntel */
>   util_cpu_caps.has_intel = 1;
> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h b/src/gallium/auxiliary/util/u_cpu_detect.h
> index 5ccfc93..b612a2c 100644
> --- a/src/gallium/auxiliary/util/u_cpu_detect.h
> +++ b/src/gallium/auxiliary/util/u_cpu_detect.h
> @@ -71,6 +71,16 @@ struct util_cpu_caps {
> unsigned has_xop:1;
> unsigned has_altivec:1;
> unsigned has_daz:1;
> +
> +   unsigned has_avx512f:1;
> +   unsigned has_avx512dq:1;
> +   unsigned has_avx512ifma:1;
> +   unsigned has_avx512pf:1;
> +   unsigned has_avx512er:1;
> +   unsigned has_avx512cd:1;
> +   unsigned has_avx512bw:1;
> +   unsigned has_avx512vl:1;
> +   unsigned has_avx512vbmi:1;
>  };
>  
>  extern struct util_cpu_caps
> 

Reviewed-by: Roland Scheidegger 

(Albeit I didn't actually verify the bits...)
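Since the bits were not double-checked in review, here is a small C sketch that decodes the same feature bits from raw register values, so they can be compared with Intel's documentation without executing `cpuid`. It takes CPUID leaf 7 (subleaf 0) EBX/ECX and the XCR0 value as plain parameters; `struct avx512_caps` and `decode_avx512()` are hypothetical. One deliberate difference from the patch: Intel documents that AVX-512 needs XCR0[2:1] = 11b and XCR0[7:5] = 111b, so the sketch requires all of those bits rather than testing each mask for non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the avx512 bitfields added to struct util_cpu_caps. */
struct avx512_caps {
   bool f, dq, ifma, pf, er, cd, bw, vl, vbmi;
};

#define XCR0_XMM_YMM (UINT64_C(0x3) << 1) /* SSE and AVX register state */
#define XCR0_AVX512  (UINT64_C(0x7) << 5) /* opmask, ZMM_Hi256, Hi16_ZMM */

static bool bit(uint32_t reg, unsigned n)
{
   assert(n < 32);
   return (reg >> n) & 1;
}

/*
 * osxsave: CPUID.1:ECX bit 27.  xcr0: value of XCR0 (xgetbv with ECX = 0).
 * ebx, ecx: CPUID leaf 7, subleaf 0.  Every feature reads as false unless
 * the OS has enabled all the register state AVX-512 needs.
 */
static struct avx512_caps decode_avx512(bool osxsave, uint64_t xcr0,
                                        uint32_t ebx, uint32_t ecx)
{
   struct avx512_caps caps = {0};
   const uint64_t needed = XCR0_XMM_YMM | XCR0_AVX512;

   if (!osxsave || (xcr0 & needed) != needed)
      return caps;

   caps.f    = bit(ebx, 16);
   caps.dq   = bit(ebx, 17);
   caps.ifma = bit(ebx, 21);
   caps.pf   = bit(ebx, 26);
   caps.er   = bit(ebx, 27);
   caps.cd   = bit(ebx, 28);
   caps.bw   = bit(ebx, 30);
   caps.vl   = bit(ebx, 31);
   caps.vbmi = bit(ecx, 1);
   return caps;
}
```

The bit positions match the shifts used in the patch (`regs3[1]` is EBX, `regs3[2]` is ECX).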



[Mesa-dev] [PATCH] i965/bxt: Add 2x6 variant

2016-05-26 Thread Ben Widawsky
Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Ben Widawsky 
---
 include/pci_ids/i965_pci_ids.h  |  2 ++
 src/mesa/drivers/dri/i965/brw_device_info.c | 22 ++
 2 files changed, 24 insertions(+)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index bd645fa..fce00da 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -162,4 +162,6 @@ CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B3, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x0A84, bxt, "Intel(R) HD Graphics (Broxton)")
 CHIPSET(0x1A84, bxt, "Intel(R) HD Graphics (Broxton)")
+CHIPSET(0x1A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")
 CHIPSET(0x5A84, bxt, "Intel(R) HD Graphics (Broxton)")
+CHIPSET(0x5A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")
diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c b/src/mesa/drivers/dri/i965/brw_device_info.c
index 3666190..77bbe78 100644
--- a/src/mesa/drivers/dri/i965/brw_device_info.c
+++ b/src/mesa/drivers/dri/i965/brw_device_info.c
@@ -401,6 +401,28 @@ static const struct brw_device_info brw_device_info_bxt = {
}
 };
 
+static const struct brw_device_info brw_device_info_bxt_2x6 = {
+   GEN9_FEATURES,
+   .is_broxton = 1,
+   .gt = 1,
+   .has_llc = false,
+
+   .num_slices = 1,
+   .max_vs_threads = 56, /* XXX: guess */
+   .max_hs_threads = 56, /* XXX: guess */
+   .max_ds_threads = 56,
+   .max_gs_threads = 56,
+   .max_wm_threads = 64 * 2,
+   .max_cs_threads = 6 * 6,
+   .urb = {
+  .size = 128,
+  .min_vs_entries = 34,
+  .max_vs_entries = 352,
+  .max_hs_entries = 128,
+  .max_ds_entries = 208,
+  .max_gs_entries = 128,
+   }
+};
 /*
  * Note: for all KBL SKUs, the PRM says SKL for GS entries, not SKL+.
  * There's no KBL entry. Using the default SKL (GEN9) GS entries value.
-- 
2.8.3
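For readers unfamiliar with `i965_pci_ids.h`: it is a list of `CHIPSET(id, family, name)` entries that the including file expands by defining `CHIPSET` first (an X-macro). In `brw_device_info.c` the family token appears to be pasted onto `brw_device_info_` to select a struct, which would explain why the new struct is named `brw_device_info_bxt_2x6`. The C sketch below only illustrates the pattern: it copies the Broxton entries from this patch into a local, parameterized list and expands it into a lookup table. `BXT_IDS`, `struct pci_entry`, `lookup_family()` and `is_bxt_2x6()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the Broxton entries touched by this patch (X-macro list). */
#define BXT_IDS(CHIPSET)                                          \
   CHIPSET(0x0A84, bxt,     "Intel(R) HD Graphics (Broxton)")     \
   CHIPSET(0x1A84, bxt,     "Intel(R) HD Graphics (Broxton)")     \
   CHIPSET(0x1A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)") \
   CHIPSET(0x5A84, bxt,     "Intel(R) HD Graphics (Broxton)")     \
   CHIPSET(0x5A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")

struct pci_entry {
   unsigned id;
   const char *family; /* stringified family token */
   const char *name;
};

/* Expand each CHIPSET() entry into one table row. */
#define AS_ENTRY(id, family, name) { id, #family, name },
static const struct pci_entry bxt_table[] = { BXT_IDS(AS_ENTRY) };
#undef AS_ENTRY

/* Returns the family string for a PCI device ID, or NULL if unknown. */
static const char *lookup_family(unsigned id)
{
   for (size_t i = 0; i < sizeof(bxt_table) / sizeof(bxt_table[0]); i++)
      if (bxt_table[i].id == id)
         return bxt_table[i].family;
   return NULL;
}

static int is_bxt_2x6(unsigned id)
{
   const char *family = lookup_family(id);
   return family != NULL && strcmp(family, "bxt_2x6") == 0;
}
```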



[Mesa-dev] [PATCH 2/2] tgsi: fix coverity out-of-bounds warning

2016-05-26 Thread Rob Clark
From: Rob Clark 

CID 1271532 (#1 of 1): Out-of-bounds read (OVERRUN) 34. overrun-local:
Overrunning array of 2 16-byte elements at element index 2 (byte offset
32) by dereferencing pointer &inst.Dst[i].

Signed-off-by: Rob Clark 
---
 src/gallium/auxiliary/tgsi/tgsi_text.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 955d042..8bdec06 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -1081,6 +1081,9 @@ parse_instruction(
   inst.Memory.Qualifier = 0;
}
 
+   assume(info->num_dst <= TGSI_FULL_MAX_DST_REGISTERS);
+   assume(info->num_src <= TGSI_FULL_MAX_SRC_REGISTERS);
+
/* Parse instruction operands.
 */
for (i = 0; i < info->num_dst + info->num_src + info->is_tex; i++) {
-- 
2.5.5
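A sketch of what an `assume()`-style hint does, under the assumption that it combines a debug `assert` with the GCC/Clang builtin `__builtin_unreachable()`, so that the optimizer and analyzers that understand the builtin may treat the condition as always true. `MY_ASSUME`, `MAX_DST`, `MAX_SRC`, `struct fake_inst` and `fill_operands()` are hypothetical; Mesa's own `assume()` may differ in detail.

```c
#include <assert.h>

/* Debug builds check the condition; release builds tell the compiler
 * that it can never be false. */
#ifndef NDEBUG
#define MY_ASSUME(cond) assert(cond)
#else
#define MY_ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#endif

#define MAX_DST 2 /* hypothetical stand-ins for TGSI_FULL_MAX_*_REGISTERS */
#define MAX_SRC 4

struct fake_inst {
   int dst[MAX_DST];
   int src[MAX_SRC];
};

/* Fills dst registers first, then src registers, from one operand list,
 * in the same shape as the operand loop in parse_instruction().  Without
 * the hints, an analyzer cannot prove that i < num_dst keeps dst[i] in
 * bounds, because nothing bounds num_dst. */
static void fill_operands(struct fake_inst *inst, unsigned num_dst,
                          unsigned num_src, const int *operands)
{
   MY_ASSUME(num_dst <= MAX_DST);
   MY_ASSUME(num_src <= MAX_SRC);

   for (unsigned i = 0; i < num_dst + num_src; i++) {
      if (i < num_dst)
         inst->dst[i] = operands[i];
      else
         inst->src[i - num_dst] = operands[i];
   }
}
```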



[Mesa-dev] [PATCH 1/2] tgsi: fix out of bounds access

2016-05-26 Thread Rob Clark
From: Rob Clark 

Not sure why Coverity calls this an out-of-bounds read rather than an
out-of-bounds write.

CID 1358920 (#1 of 1): Out-of-bounds read (OVERRUN) 9. overrun-local:
Overrunning array r of 3 16-byte elements at element index 3 (byte
offset 48) using index chan (which evaluates to 3).

Signed-off-by: Rob Clark 
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index d483429..6a5e5df 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -3850,7 +3850,7 @@ static void
 exec_load_mem(struct tgsi_exec_machine *mach,
   const struct tgsi_full_instruction *inst)
 {
-   union tgsi_exec_channel r[3];
+   union tgsi_exec_channel r[4];
uint chan;
char *ptr = mach->LocalMem;
uint32_t offset;
-- 
2.5.5
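The overrun is easiest to see in a stripped-down per-channel loop: `chan` walks all four components of a vector (x, y, z, w), so the temporary needs four slots, and with `r[3]` the `chan == 3` iteration writes one element past the end. This C sketch uses hypothetical names (`union channel`, `load_vec4`, `QUAD_SIZE`) and a simplified memory layout; it is not the actual `exec_load_mem()` code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_CHANNELS 4 /* x, y, z, w */
#define QUAD_SIZE 4    /* hypothetical: one value per pixel of a 2x2 quad */

union channel {
   float f[QUAD_SIZE];
   uint32_t u[QUAD_SIZE];
};

/* Loads up to four 32-bit components starting at byte `offset` of `mem`
 * into r[0..3], honoring a 4-bit writemask and broadcasting each value
 * to every pixel of the quad. */
static void load_vec4(const char *mem, size_t mem_size, uint32_t offset,
                      unsigned writemask, union channel r[NUM_CHANNELS])
{
   for (unsigned chan = 0; chan < NUM_CHANNELS; chan++) {
      if (!(writemask & (1u << chan)))
         continue;
      assert(offset + (chan + 1) * sizeof(uint32_t) <= mem_size);
      uint32_t val;
      memcpy(&val, mem + offset + chan * sizeof(uint32_t), sizeof(val));
      for (unsigned q = 0; q < QUAD_SIZE; q++)
         r[chan].u[q] = val;
   }
}
```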



Re: [Mesa-dev] [PATCH 1/2] nvc0/ir: avoid generating illegal instructions for compute constbuf loads

2016-05-26 Thread Samuel Pitoiset

Makes sense.

We should also clamp the index for indirect access as part of the
robustness work, but this can be done later.


Reviewed-by: Samuel Pitoiset 

On 05/26/2016 04:44 AM, Ilia Mirkin wrote:

For user-supplied constbufs, fileIndex is 0. In that case, when we
subtract 1, we'll end up loading from constbuf offset -16. This is
illegal, and there are asserts to avoid it. Normally we'd just DCE it,
but no point in generating the instructions if they're not going to be
used.

Signed-off-by: Ilia Mirkin 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 869040c..da2fa4b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -2180,11 +2180,11 @@ NVC0LoweringPass::handleLDST(Instruction *i)
  // memory.
  int8_t fileIndex = i->getSrc(0)->reg.fileIndex - 1;
  Value *ind = i->getIndirect(0, 1);
- Value *ptr = loadUboInfo64(ind, fileIndex * 16);

  // TODO: clamp the offset to the maximum number of const buf.
  if (i->src(0).isIndirect(1)) {
 Value *offset = bld.loadImm(NULL, i->getSrc(0)->reg.data.offset + typeSizeof(i->sType));
+Value *ptr = loadUboInfo64(ind, fileIndex * 16);
 Value *length = loadUboLength32(ind, fileIndex * 16);
 Value *pred = new_LValue(func, FILE_PREDICATE);
 if (i->src(0).isIndirect(0)) {
@@ -2200,6 +2200,7 @@ NVC0LoweringPass::handleLDST(Instruction *i)
bld.mkMov(i->getDef(0), bld.mkImm(0));
 }
  } else if (fileIndex >= 0) {
+Value *ptr = loadUboInfo64(ind, fileIndex * 16);
 if (i->src(0).isIndirect(0)) {
bld.mkOp2(OP_ADD, TYPE_U64, ptr, ptr, i->getIndirect(0, 0));
 }



--
-Samuel
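The arithmetic behind Ilia's fix: the UBO info slot is computed from `fileIndex - 1`, so a user-supplied constbuf (`fileIndex == 0`) yields slot -1, i.e. byte offset -16. The C sketch below reproduces only that decision; `ubo_info_offset()` and `needs_ubo_info()` are hypothetical helpers, not nv50_ir code, and the 16-byte stride is taken from the `fileIndex * 16` in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UBO_INFO_STRIDE 16 /* bytes per UBO info entry, per `fileIndex * 16` */

/* Byte offset of the UBO info entry for register file index `reg_file_index`.
 * Index 0 (the user-supplied constbuf) maps to slot -1, i.e. offset -16. */
static int32_t ubo_info_offset(int reg_file_index)
{
   assert(reg_file_index >= 0);
   int8_t file_index = (int8_t)(reg_file_index - 1);
   return file_index * UBO_INFO_STRIDE;
}

/* After the patch, the pointer is only loaded on the paths that use it:
 * an indirectly indexed constbuf, or a direct access to a real UBO. */
static bool needs_ubo_info(int reg_file_index, bool indirect_buffer)
{
   int8_t file_index = (int8_t)(reg_file_index - 1);
   return indirect_buffer || file_index >= 0;
}
```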


Re: [Mesa-dev] [PATCH 2/2] tgsi: fix coverity out-of-bounds warning

2016-05-26 Thread Brian Paul

On 05/26/2016 09:25 AM, Rob Clark wrote:

From: Rob Clark 

CID 1271532 (#1 of 1): Out-of-bounds read (OVERRUN)34. overrun-local:
Overrunning array of 2 16-byte elements at element index 2 (byte offset
32) by dereferencing pointer &inst.Dst[i].

Signed-off-by: Rob Clark 
---
  src/gallium/auxiliary/tgsi/tgsi_text.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 955d042..8bdec06 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -1081,6 +1081,9 @@ parse_instruction(
inst.Memory.Qualifier = 0;
 }

+   assume(info->num_dst <= TGSI_FULL_MAX_DST_REGISTERS);
+   assume(info->num_src <= TGSI_FULL_MAX_SRC_REGISTERS);
+
 /* Parse instruction operands.
  */
 for (i = 0; i < info->num_dst + info->num_src + info->is_tex; i++) {



For both,
Reviewed-by: Brian Paul 

Should the first be cc'd for stable?



Re: [Mesa-dev] [PATCH 4/5] util/indices: implement provoking vertex conversion for adjacency primitives

2016-05-26 Thread Roland Scheidegger
On 26.05.2016 at 16:06, Brian Paul wrote:
> Tested with new piglit gl-3.2-adj-prims test.
> ---
>  src/gallium/auxiliary/indices/u_indices.c  | 52 
>  src/gallium/auxiliary/indices/u_indices_gen.py | 83 +-
>  src/gallium/auxiliary/indices/u_indices_priv.h |  2 +-
>  3 files changed, 134 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/indices/u_indices.c b/src/gallium/auxiliary/indices/u_indices.c
> index 436f8f0..2b2d10c 100644
> --- a/src/gallium/auxiliary/indices/u_indices.c
> +++ b/src/gallium/auxiliary/indices/u_indices.c
> @@ -55,6 +55,8 @@ static void translate_memcpy_uint( const void *in,
>   * - Translate from first provoking vertex to last provoking vertex and
>   *   vice versa.
>   *
> + * Note that this function is used for indexed primitives.
> + *
>   * \param hw_mask  mask of (1 << PIPE_PRIM_x) flags indicating which types
>   * of primitives are supported by the hardware.
>   * \param prim  incoming PIPE_PRIM_x
> @@ -172,6 +174,30 @@ u_index_translator(unsigned hw_mask,
>   *out_nr = (nr - 2) * 3;
>   break;
>  
> +  case PIPE_PRIM_LINES_ADJACENCY:
> + *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
Can't you get that line out of the switch? (Not that this is really new...)

Patch looks good though (albeit I can't quite verify the index numbers...)

Roland



> + *out_prim = PIPE_PRIM_LINES_ADJACENCY;
> + *out_nr = nr;
> + break;
> +
> +  case PIPE_PRIM_LINE_STRIP_ADJACENCY:
> + *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
> + *out_prim = PIPE_PRIM_LINES_ADJACENCY;
> + *out_nr = (nr - 3) * 4;
> + break;
> +
> +  case PIPE_PRIM_TRIANGLES_ADJACENCY:
> + *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
> + *out_prim = PIPE_PRIM_TRIANGLES_ADJACENCY;
> + *out_nr = nr;
> + break;
> +
> +  case PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY:
> + *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
> + *out_prim = PIPE_PRIM_TRIANGLES_ADJACENCY;
> + *out_nr = ((nr - 4) / 2) * 6;
> + break;
> +
>default:
>   assert(0);
>   *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
> @@ -193,6 +219,8 @@ u_index_translator(unsigned hw_mask,
>   * The generator functions generates a number of ushort or uint indexes
>   * for drawing the new type of primitive.
>   *
> + * Note that this function is used for non-indexed primitives.
> + *
>   * \param hw_mask  a bitmask of (1 << PIPE_PRIM_x) values that indicates
>   * kind of primitives are supported by the driver.
>   * \param prim  the PIPE_PRIM_x that the user wants to draw
> @@ -294,6 +322,30 @@ u_index_generator(unsigned hw_mask,
>   *out_nr = (nr - 2) * 3;
>   return U_GENERATE_REUSABLE;
>  
> +  case PIPE_PRIM_LINES_ADJACENCY:
> + *out_generate = generate[out_idx][in_pv][out_pv][prim];
> + *out_prim = PIPE_PRIM_LINES_ADJACENCY;
> + *out_nr = nr;
> + return U_GENERATE_REUSABLE;
> +
> +  case PIPE_PRIM_LINE_STRIP_ADJACENCY:
> + *out_generate = generate[out_idx][in_pv][out_pv][prim];
> + *out_prim = PIPE_PRIM_LINES_ADJACENCY;
> + *out_nr = (nr - 3) * 4;
> + return U_GENERATE_REUSABLE;
> +
> +  case PIPE_PRIM_TRIANGLES_ADJACENCY:
> + *out_generate = generate[out_idx][in_pv][out_pv][prim];
> + *out_prim = PIPE_PRIM_TRIANGLES_ADJACENCY;
> + *out_nr = nr;
> + return U_GENERATE_REUSABLE;
> +
> +  case PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY:
> + *out_generate = generate[out_idx][in_pv][out_pv][prim];
> + *out_prim = PIPE_PRIM_TRIANGLES_ADJACENCY;
> + *out_nr = ((nr - 4) / 2) * 6;
> + return U_GENERATE_REUSABLE;
> +
>default:
>   assert(0);
>   *out_generate = generate[out_idx][in_pv][out_pv][PIPE_PRIM_POINTS];
> diff --git a/src/gallium/auxiliary/indices/u_indices_gen.py b/src/gallium/auxiliary/indices/u_indices_gen.py
> index 97c8e0d..fb6b310 100644
> --- a/src/gallium/auxiliary/indices/u_indices_gen.py
> +++ b/src/gallium/auxiliary/indices/u_indices_gen.py
> @@ -42,7 +42,11 @@ PRIMS=('points',
> 'tristrip', 
> 'quads', 
> 'quadstrip', 
> -   'polygon')
> +   'polygon',
> +   'linesadj',
> +   'linestripadj',
> +   'trisadj',
> +   'tristripadj')
>  
>  LONGPRIMS=('PIPE_PRIM_POINTS', 
> 'PIPE_PRIM_LINES', 
> @@ -53,7 +57,11 @@ LONGPRIMS=('PIPE_PRIM_POINTS',
> 'PIPE_PRIM_TRIANGLE_STRIP', 
> 'PIPE_PRIM_QUADS', 
> 'PIPE_PRIM_QUAD_STRIP', 
> -   'PIPE_PRIM_POLYGON')
> +   'PIPE_PRIM_POLYGON',
> +   'PIPE_PRIM_LINES_ADJACENCY',
> +   

Re: [Mesa-dev] [PATCH 5/5] util/indices: implement unfilled (tri->line) conversion for adjacency prims

2016-05-26 Thread Roland Scheidegger
On 26.05.2016 at 16:06, Brian Paul wrote:
> Tested with new piglit gl-3.2-adj-prims test.
> ---
>  src/gallium/auxiliary/indices/u_unfilled_gen.py| 26 --
>  src/gallium/auxiliary/indices/u_unfilled_indices.c | 14 
>  2 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/indices/u_unfilled_gen.py b/src/gallium/auxiliary/indices/u_unfilled_gen.py
> index 873e781..18d9968 100644
> --- a/src/gallium/auxiliary/indices/u_unfilled_gen.py
> +++ b/src/gallium/auxiliary/indices/u_unfilled_gen.py
> @@ -35,14 +35,18 @@ PRIMS=('tris',
> 'tristrip', 
> 'quads', 
> 'quadstrip', 
> -   'polygon')
> +   'polygon',
> +   'tristripadj',
> +   'trisadj')
Maybe switch the order of tristripadj and trisadj (and just below too)
for consistency? No big deal though.

For the series:
Reviewed-by: Roland Scheidegger 


>  
>  LONGPRIMS=('PIPE_PRIM_TRIANGLES', 
> 'PIPE_PRIM_TRIANGLE_FAN', 
> 'PIPE_PRIM_TRIANGLE_STRIP', 
> 'PIPE_PRIM_QUADS', 
> 'PIPE_PRIM_QUAD_STRIP', 
> -   'PIPE_PRIM_POLYGON')
> +   'PIPE_PRIM_POLYGON',
> +   'PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY',
> +   'PIPE_PRIM_TRIANGLES_ADJACENCY')
>  
>  longprim = dict(zip(PRIMS, LONGPRIMS))
>  intype_idx = dict(ubyte='IN_UBYTE', ushort='IN_USHORT', uint='IN_UINT')
> @@ -194,6 +198,22 @@ def quadstrip(intype, outtype):
>  postamble()
>  
>  
> +def trisadj(intype, outtype):
> +preamble(intype, outtype, prim='trisadj')
> +print '  for (i = start, j = 0; j < out_nr; j+=6, i+=6) { '
> +do_tri( intype, outtype, 'out+j',  'i', 'i+2', 'i+4' );
> +print '   }'
> +postamble()
> +
> +
> +def tristripadj(intype, outtype):
> +preamble(intype, outtype, prim='tristripadj')
> +print '  for (i = start, j = 0; j < out_nr; j+=6, i+=2) { '
> +do_tri( intype, outtype, 'out+j',  'i', 'i+2', 'i+4' );
> +print '   }'
> +postamble()
> +
> +
>  def emit_funcs():
>  for intype in INTYPES:
>  for outtype in OUTTYPES:
> @@ -203,6 +223,8 @@ def emit_funcs():
>  quads(intype, outtype)
>  quadstrip(intype, outtype)
>  polygon(intype, outtype)
> +trisadj(intype, outtype)
> +tristripadj(intype, outtype)
>  
>  def init(intype, outtype, prim):
>  if intype == GENERATE:
> diff --git a/src/gallium/auxiliary/indices/u_unfilled_indices.c b/src/gallium/auxiliary/indices/u_unfilled_indices.c
> index 49fff6b..8cb5192 100644
> --- a/src/gallium/auxiliary/indices/u_unfilled_indices.c
> +++ b/src/gallium/auxiliary/indices/u_unfilled_indices.c
> @@ -22,6 +22,12 @@
>   * USE OR OTHER DEALINGS IN THE SOFTWARE.
>   */
>  
> +
> +/*
> + * NOTE: This file is not compiled by itself.  It's actually #included
> + * by the generated u_unfilled_gen.c file!
> + */
> +
>  #include "u_indices.h"
>  #include "u_indices_priv.h"
>  #include "util/u_prim.h"
> @@ -104,6 +110,14 @@ nr_lines(unsigned prim, unsigned nr)
>return (nr - 2) / 2 * 8;
> case PIPE_PRIM_POLYGON:
>return 2 * nr; /* a line (two verts) for each polygon edge */
> +   /* Note: these cases can't really be handled since drawing lines instead
> +* of triangles would also require changing the GS.  But if there's no GS,
> +* this should work.
> +*/
> +   case PIPE_PRIM_TRIANGLES_ADJACENCY:
> +  return (nr / 6) * 6;
> +   case PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY:
> +  return ((nr - 4) / 2) * 6;
> default:
>assert(0);
>return 0;
> 
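To see why `nr_lines()` returns six indices per triangle for the adjacency cases, here is a C sketch of what the generated `trisadj`/`tristripadj` converters do: the main triangle of each primitive uses vertices `i`, `i+2` and `i+4` (the odd vertices are adjacency vertices), and each triangle becomes three lines of two indices. The function names and the edge order inside `emit_tri_edges()` are assumptions standing in for `do_tri()`; as in the patch, strip winding is not alternated, which does not matter when only edges are emitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Emit the three edges of triangle (a, b, c) as line-list indices. */
static void emit_tri_edges(unsigned *out, unsigned a, unsigned b, unsigned c)
{
   out[0] = a; out[1] = b;
   out[2] = b; out[3] = c;
   out[4] = c; out[5] = a;
}

/* Hypothetical unfilled converter for triangles (strip = false) or
 * triangle strips (strip = true) with adjacency.  Writes out_nr indices;
 * the caller computes out_nr as nr_lines() does. */
static void unfill_trisadj(unsigned *out, unsigned start, unsigned out_nr,
                           bool strip)
{
   unsigned step = strip ? 2 : 6; /* input vertices consumed per triangle */
   assert(out_nr % 6 == 0);
   for (unsigned i = start, j = 0; j < out_nr; j += 6, i += step)
      emit_tri_edges(out + j, i, i + 2, i + 4);
}
```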



Re: [Mesa-dev] [PATCH 4/5] util/indices: implement provoking vertex conversion for adjacency primitives

2016-05-26 Thread Brian Paul

On 05/26/2016 09:35 AM, Roland Scheidegger wrote:

On 26.05.2016 at 16:06, Brian Paul wrote:

Tested with new piglit gl-3.2-adj-prims test.
---
  src/gallium/auxiliary/indices/u_indices.c  | 52 
  src/gallium/auxiliary/indices/u_indices_gen.py | 83 +-
  src/gallium/auxiliary/indices/u_indices_priv.h |  2 +-
  3 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/indices/u_indices.c b/src/gallium/auxiliary/indices/u_indices.c
index 436f8f0..2b2d10c 100644
--- a/src/gallium/auxiliary/indices/u_indices.c
+++ b/src/gallium/auxiliary/indices/u_indices.c
@@ -55,6 +55,8 @@ static void translate_memcpy_uint( const void *in,
   * - Translate from first provoking vertex to last provoking vertex and
   *   vice versa.
   *
+ * Note that this function is used for indexed primitives.
+ *
   * \param hw_mask  mask of (1 << PIPE_PRIM_x) flags indicating which types
   * of primitives are supported by the hardware.
   * \param prim  incoming PIPE_PRIM_x
@@ -172,6 +174,30 @@ u_index_translator(unsigned hw_mask,
   *out_nr = (nr - 2) * 3;
   break;

+  case PIPE_PRIM_LINES_ADJACENCY:
+ *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];

Can't you get that line out of the switch? (Not that this is really new...)


I didn't even notice that.  I'll do that in a later patch.



Patch looks good though (albeit I can't quite verify the index numbers...)


I'm pretty confident after testing with my new piglit test.

-Brian
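Since the index counts were hard to verify by eye, here is a small C sketch of the `out_nr` arithmetic from the patch as a standalone function. The enum and function names are hypothetical stand-ins for the `PIPE_PRIM_*` cases; the formulas are the ones in the diff: a line strip with adjacency of `nr` vertices has `nr - 3` segments of 4 vertices each, and a triangle strip with adjacency has `(nr - 4) / 2` triangles of 6 vertices each.

```c
#include <assert.h>

/* Hypothetical stand-ins for the PIPE_PRIM_* adjacency values. */
enum adj_prim {
   LINES_ADJ,
   LINE_STRIP_ADJ,
   TRIANGLES_ADJ,
   TRIANGLE_STRIP_ADJ,
};

/* Number of output indices when converting `nr` input vertices of `prim`
 * to the corresponding list primitive (lines_adj or triangles_adj). */
static unsigned adj_out_nr(enum adj_prim prim, unsigned nr)
{
   switch (prim) {
   case LINES_ADJ:
   case TRIANGLES_ADJ:
      return nr;                 /* already a list: 1:1 */
   case LINE_STRIP_ADJ:
      assert(nr >= 3);           /* avoid unsigned underflow */
      return (nr - 3) * 4;       /* nr - 3 segments, 4 verts each */
   case TRIANGLE_STRIP_ADJ:
      assert(nr >= 4);           /* avoid unsigned underflow */
      return ((nr - 4) / 2) * 6; /* (nr - 4) / 2 tris, 6 verts each */
   }
   assert(0);
   return 0;
}
```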



Re: [Mesa-dev] [PATCH 5/5] util/indices: implement unfilled (tri->line) conversion for adjacency prims

2016-05-26 Thread Brian Paul

On 05/26/2016 09:37 AM, Roland Scheidegger wrote:

On 26.05.2016 at 16:06, Brian Paul wrote:

Tested with new piglit gl-3.2-adj-prims test.
---
  src/gallium/auxiliary/indices/u_unfilled_gen.py| 26 --
  src/gallium/auxiliary/indices/u_unfilled_indices.c | 14 
  2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/indices/u_unfilled_gen.py b/src/gallium/auxiliary/indices/u_unfilled_gen.py
index 873e781..18d9968 100644
--- a/src/gallium/auxiliary/indices/u_unfilled_gen.py
+++ b/src/gallium/auxiliary/indices/u_unfilled_gen.py
@@ -35,14 +35,18 @@ PRIMS=('tris',
 'tristrip',
 'quads',
 'quadstrip',
-   'polygon')
+   'polygon',
+   'tristripadj',
+   'trisadj')

Maybe switch the order of tristripadj and trisadj (and just below too)
for consistency? No big deal though.


Will do.



For the series:
Reviewed-by: Roland Scheidegger 


Thanks!

-Brian




Re: [Mesa-dev] [PATCH 5/5] gallium: change pipe_draw_info::mode to be pipe_prim_type

2016-05-26 Thread Roland Scheidegger
On 26.05.2016 at 16:09, Brian Paul wrote:
> Makes debugging with gdb a little nicer.
> ---
>  src/gallium/include/pipe/p_state.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
> index eacf9bb..396f563 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -617,7 +617,7 @@ struct pipe_draw_info
>  {
> boolean indexed;  /**< use index buffer */
>  
> -   unsigned mode;  /**< the mode of the primitive */
> +   enum pipe_prim_type mode;  /**< the mode of the primitive */
> unsigned start;  /**< the index of the first vertex */
> unsigned count;  /**< number of vertices */
>  
> 

For the series: Reviewed-by: Roland Scheidegger 


Re: [Mesa-dev] [PATCH 4/5] util/indices: implement provoking vertex conversion for adjacency primitives

2016-05-26 Thread Roland Scheidegger
On 26.05.2016 at 17:39, Brian Paul wrote:
> On 05/26/2016 09:35 AM, Roland Scheidegger wrote:
>> On 26.05.2016 at 16:06, Brian Paul wrote:
>>> Tested with new piglit gl-3.2-adj-prims test.
>>> ---
>>>   src/gallium/auxiliary/indices/u_indices.c  | 52 
>>>   src/gallium/auxiliary/indices/u_indices_gen.py | 83 +-
>>>   src/gallium/auxiliary/indices/u_indices_priv.h |  2 +-
>>>   3 files changed, 134 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/indices/u_indices.c b/src/gallium/auxiliary/indices/u_indices.c
>>> index 436f8f0..2b2d10c 100644
>>> --- a/src/gallium/auxiliary/indices/u_indices.c
>>> +++ b/src/gallium/auxiliary/indices/u_indices.c
>>> @@ -55,6 +55,8 @@ static void translate_memcpy_uint( const void *in,
>>>* - Translate from first provoking vertex to last provoking vertex and
>>>*   vice versa.
>>>*
>>> + * Note that this function is used for indexed primitives.
>>> + *
>>>* \param hw_mask  mask of (1 << PIPE_PRIM_x) flags indicating which types
>>>* of primitives are supported by the hardware.
>>>* \param prim  incoming PIPE_PRIM_x
>>> @@ -172,6 +174,30 @@ u_index_translator(unsigned hw_mask,
>>>*out_nr = (nr - 2) * 3;
>>>break;
>>>
>>> +  case PIPE_PRIM_LINES_ADJACENCY:
>>> + *out_translate = translate[in_idx][out_idx][in_pv][out_pv][prim_restart][prim];
>> Can't you get that line out of the switch? (Not that this is really
>> new...)
> 
> I didn't even notice that.  I'll do that in a later patch.
> 

Actually I guess the default case was meant to have a different line,
like it does in u_index_generator (hardwiring prim), making this
slightly less silly. (But could still keep the line out of the switch,
no one cares if the error case has another assignment...)

Roland
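The suggestion amounts to doing the table lookup once, before the switch, so the switch only computes the output primitive and count, and the default case hardwires a fallback. A C sketch of that shape follows, with a hypothetical function-pointer table (`translate_fn`, `translators`) and a hypothetical primitive enum; the real `u_index_translator()` indexes a six-dimensional table and handles many more primitives.

```c
#include <assert.h>

enum prim { PRIM_LINES_ADJ, PRIM_LINE_STRIP_ADJ, PRIM_COUNT };

typedef void (*translate_fn)(const unsigned *in, unsigned nr, unsigned *out);

static void copy_indices(const unsigned *in, unsigned nr, unsigned *out)
{
   for (unsigned i = 0; i < nr; i++)
      out[i] = in[i];
}

/* Expands a line strip with adjacency into a lines-with-adjacency list:
 * segment k uses input vertices k .. k+3. */
static void linestripadj_to_linesadj(const unsigned *in, unsigned nr,
                                     unsigned *out)
{
   for (unsigned k = 0; k + 3 < nr; k++)
      for (unsigned v = 0; v < 4; v++)
         out[k * 4 + v] = in[k + v];
}

static const translate_fn translators[PRIM_COUNT] = {
   [PRIM_LINES_ADJ] = copy_indices,
   [PRIM_LINE_STRIP_ADJ] = linestripadj_to_linesadj,
};

static void index_translator(enum prim prim, unsigned nr,
                             translate_fn *out_translate,
                             enum prim *out_prim, unsigned *out_nr)
{
   assert(prim < PRIM_COUNT);
   *out_translate = translators[prim]; /* hoisted out of the switch */

   switch (prim) {
   case PRIM_LINES_ADJ:
      *out_prim = PRIM_LINES_ADJ;
      *out_nr = nr;
      break;
   case PRIM_LINE_STRIP_ADJ:
      assert(nr >= 3);
      *out_prim = PRIM_LINES_ADJ;
      *out_nr = (nr - 3) * 4;
      break;
   default:
      assert(0);
      *out_prim = prim;
      *out_nr = nr;
      break;
   }
}
```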



Re: [Mesa-dev] [PATCH 2/2] tgsi: fix coverity out-of-bounds warning

2016-05-26 Thread Rob Clark
On Thu, May 26, 2016 at 11:33 AM, Brian Paul  wrote:
> On 05/26/2016 09:25 AM, Rob Clark wrote:
>>
>> From: Rob Clark 
>>
>> CID 1271532 (#1 of 1): Out-of-bounds read (OVERRUN)34. overrun-local:
>> Overrunning array of 2 16-byte elements at element index 2 (byte offset
>> 32) by dereferencing pointer &inst.Dst[i].
>>
>> Signed-off-by: Rob Clark 
>> ---
>>   src/gallium/auxiliary/tgsi/tgsi_text.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
>> index 955d042..8bdec06 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
>> @@ -1081,6 +1081,9 @@ parse_instruction(
>> inst.Memory.Qualifier = 0;
>>  }
>>
>> +   assume(info->num_dst <= TGSI_FULL_MAX_DST_REGISTERS);
>> +   assume(info->num_src <= TGSI_FULL_MAX_SRC_REGISTERS);
>> +
>>  /* Parse instruction operands.
>>   */
>>  for (i = 0; i < info->num_dst + info->num_src + info->is_tex; i++) {
>>
>
> For both,
> Reviewed-by: Brian Paul 
>
> Should the first be cc'd for stable?
>

Yeah, the first was a real issue (the 2nd was just to give Coverity a hint).
The first should go to however many stable branches are still a going
concern, I think.

BR,
-R


Re: [Mesa-dev] [PATCH v4] swr: implement clipPlanes/clipVertex/clipDistance/cullDistance

2016-05-26 Thread Rowley, Timothy O

> On May 26, 2016, at 8:33 AM, Ilia Mirkin  wrote:
> 
> On Thu, May 26, 2016 at 9:11 AM, Tim Rowley wrote:
>> v2: only load the clip vertex once
>> 
>> v3: fix clip enable logic, add cullDistance
>> 
>> v4: remove duplicate fields in vs jit key, fix test of clip fixup needed
>> ---
>> docs/GL3.txt   |  2 +-
>> src/gallium/drivers/swr/swr_context.h  |  2 ++
>> src/gallium/drivers/swr/swr_screen.cpp |  3 +-
>> src/gallium/drivers/swr/swr_shader.cpp | 64 ++
>> src/gallium/drivers/swr/swr_shader.h   |  1 +
>> src/gallium/drivers/swr/swr_state.cpp  | 24 -
>> 6 files changed, 93 insertions(+), 3 deletions(-)
>> 
>> diff --git a/docs/GL3.txt b/docs/GL3.txt
>> index 555a9be..5965f25 100644
>> --- a/docs/GL3.txt
>> +++ b/docs/GL3.txt
>> @@ -211,7 +211,7 @@ GL 4.5, GLSL 4.50:
>>   GL_ARB_ES3_1_compatibility DONE (nvc0, radeonsi)
>>   GL_ARB_clip_control   DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
>>   GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
>> -  GL_ARB_cull_distance  DONE (i965, nv50, nvc0, llvmpipe, softpipe)
>> +  GL_ARB_cull_distance  DONE (i965, nv50, nvc0, llvmpipe, softpipe, swr)
>>   GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
>>   GL_ARB_direct_state_access DONE (all drivers)
>>   GL_ARB_get_texture_sub_image  DONE (all drivers)
>> diff --git a/src/gallium/drivers/swr/swr_context.h b/src/gallium/drivers/swr/swr_context.h
>> index a7383bb..75ecae3 100644
>> --- a/src/gallium/drivers/swr/swr_context.h
>> +++ b/src/gallium/drivers/swr/swr_context.h
>> @@ -89,6 +89,8 @@ struct swr_draw_context {
>>swr_jit_texture texturesFS[PIPE_MAX_SHADER_SAMPLER_VIEWS];
>>swr_jit_sampler samplersFS[PIPE_MAX_SAMPLERS];
>> 
>> +   float userClipPlanes[PIPE_MAX_CLIP_PLANES][4];
>> +
>>SWR_SURFACE_STATE renderTargets[SWR_NUM_ATTACHMENTS];
>> };
>> 
>> diff --git a/src/gallium/drivers/swr/swr_screen.cpp b/src/gallium/drivers/swr/swr_screen.cpp
>> index 0772274..7851346 100644
>> --- a/src/gallium/drivers/swr/swr_screen.cpp
>> +++ b/src/gallium/drivers/swr/swr_screen.cpp
>> @@ -333,6 +333,8 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
>>case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
>>case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
>>   return 1;
>> +   case PIPE_CAP_CULL_DISTANCE:
>> +  return 1;
>>case PIPE_CAP_TGSI_TXQS:
>>case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
>>case PIPE_CAP_SHAREABLE_SHADERS:
>> @@ -358,7 +360,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
>>case PIPE_CAP_PCI_DEVICE:
>>case PIPE_CAP_PCI_FUNCTION:
>>case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
>> -   case PIPE_CAP_CULL_DISTANCE:
>>case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
>>   return 0;
>>}
>> diff --git a/src/gallium/drivers/swr/swr_shader.cpp b/src/gallium/drivers/swr/swr_shader.cpp
>> index f693f51..5201c8f 100644
>> --- a/src/gallium/drivers/swr/swr_shader.cpp
>> +++ b/src/gallium/drivers/swr/swr_shader.cpp
>> @@ -40,6 +40,9 @@
>> #include "swr_state.h"
>> #include "swr_screen.h"
>> 
>> +static unsigned
>> +locate_linkage(ubyte name, ubyte index, struct tgsi_shader_info *info);
>> +
>> bool operator==(const swr_jit_fs_key &lhs, const swr_jit_fs_key &rhs)
>> {
>>return !memcmp(&lhs, &rhs, sizeof(lhs));
>> @@ -120,6 +123,11 @@ swr_generate_vs_key(struct swr_jit_vs_key &key,
>> {
>>memset(&key, 0, sizeof(key));
>> 
>> +   key.clip_plane_mask =
>> +  swr_vs->info.base.clipdist_writemask ?
>> +  swr_vs->info.base.clipdist_writemask & ctx->rasterizer->clip_plane_enable :
>> +  ctx->rasterizer->clip_plane_enable;
> 
> What about cull planes? What does this key control exactly? If it's
> clip | cull, then this needs to be more like
> 
> (swr_vs->info.base.clipdist_writemask &
> ctx->rasterizer->clip_plane_enable) |
> swr_vs->info.base.culldist_writemask
> 
> [if clipdist | culldist are written, otherwise just clip_plane_enable,
> like you have it now]

This key indexes a per-shader hash table of compiled variants. Since
culldist_writemask is intrinsic to the shader, it doesn’t need to be
duplicated in the key.

I’m going to take a closer look at piglit results for the next iteration of
this patch, and I’ll consider your other comments in the process. I might
address them in a follow-up patch that optimizes revalidation/recompilation.
Thanks.

-Tim
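A C sketch of why a shader-variant key like `swr_jit_vs_key` is zeroed with `memset()` before being filled in: equality is a whole-struct `memcmp()`, like the `operator==` in the quoted swr_shader.cpp hunk, so unused array slots must start out identical. The key layout, `make_vs_key()` and `vs_key_equal()` are hypothetical (all fields are `uint32_t` so there is no padding to worry about); the mask computation follows the `clip_plane_mask` expression in the patch and, as Tim notes, leaves out shader-intrinsic state such as `culldist_writemask`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_SAMPLERS 4

/* Hypothetical vertex-shader variant key: all uint32_t, so no padding. */
struct vs_key {
   uint32_t clip_plane_mask;
   uint32_t num_samplers;
   uint32_t sampler_state[MAX_SAMPLERS]; /* only [0, num_samplers) used */
};

static bool vs_key_equal(const struct vs_key *a, const struct vs_key *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}

/* If the shader writes clip distances, only the enabled planes it writes
 * matter; otherwise every enabled user clip plane does. */
static void make_vs_key(struct vs_key *key, uint32_t clipdist_writemask,
                        uint32_t clip_plane_enable,
                        const uint32_t *samplers, uint32_t num_samplers)
{
   assert(num_samplers <= MAX_SAMPLERS);
   memset(key, 0, sizeof(*key)); /* unused slots must compare equal */

   key->clip_plane_mask = clipdist_writemask ?
      clipdist_writemask & clip_plane_enable : clip_plane_enable;
   key->num_samplers = num_samplers;
   for (uint32_t i = 0; i < num_samplers; i++)
      key->sampler_state[i] = samplers[i];
}
```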

>> +
>>swr_generate_sampler_key(swr_vs->info, ctx, PIPE_SHADER_VERTEX, key);
>> }
>> 
>> @@ -252,6 +260,62 @@ BuilderSWR::CompileVS(struct swr_context *ctx, swr_jit_vs_key &key)
>>   }
>>}
>> 
>> +   if (ctx->rasterizer->clip_plane_enable ||
>> + 

Re: [Mesa-dev] [PATCH 2/2] nvc0/ir: handle a load's reg result not being used for locked variants

2016-05-26 Thread Samuel Pitoiset
What about the Maxwell logic? LDS expects a GPR at def(0) and in case 
the dst reg doesn't exist, you move the predicate to def(0).


Are you sure you don't need to check if dst(0) is not a predicate in 
emitLDS()?


On 05/26/2016 04:44 AM, Ilia Mirkin wrote:

For a load locked, we might not use the first result but the second
result is the predicate result of the locking. In that case the load
splitting logic doesn't apply (which is designed for splitting 128-bit
loads). Instead we take the predicate and move it into the first
position (as having a dead result in first def's position upsets all
sorts of things including RA). Update the emitters to deal with this as
well.
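The def-layout rule above can be summarized as a tiny C function mirroring the `r`/`p` selection the patch adds to both emitters: `r` is the def slot of the loaded value (or -1 if it was dropped), and `p` is the slot of the lock predicate. `enum file`, `struct load_defs` and `pick_load_locked_defs()` are hypothetical simplifications of the nv50_ir types.

```c
#include <assert.h>
#include <stdbool.h>

enum file { FILE_GPR, FILE_PREDICATE }; /* hypothetical, simplified */

struct load_defs {
   bool exists[2];
   enum file file[2];
};

/* Returns the def index of the value (r) and of the predicate (p),
 * with -1 meaning "not present". */
static void pick_load_locked_defs(const struct load_defs *d, int *r, int *p)
{
   *r = 0;
   *p = -1;
   if (d->exists[0] && d->file[0] == FILE_PREDICATE) {
      *r = -1; /* value unused: the predicate was moved to def(0) */
      *p = 0;
   } else if (d->exists[1]) {
      *p = 1;  /* usual layout: value in def(0), predicate in def(1) */
   } else {
      assert(!"Expected predicate dest for load locked");
   }
}
```

When `r` is -1, the emitters in the patch fill the destination register field with all ones instead (`255 << 2` for GK110, `63 << 14` for NVC0).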

Signed-off-by: Ilia Mirkin 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 20 ++---
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 26 +-
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 10 +++--
 3 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 6a5981d..27d9b8e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -2080,15 +2080,29 @@ CodeEmitterGK110::emitLOAD(const Instruction *i)
code[1] |= offset >> 9;

// Locked store on shared memory can fail.
+   int r = 0, p = -1;
if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
i->subOp == NV50_IR_SUBOP_LOAD_LOCKED) {
-  assert(i->defExists(1));
-  defId(i->def(1), 32 + 16);
+  if (i->def(0).getFile() == FILE_PREDICATE) { // p, #
+ r = -1;
+ p = 0;
+  } else if (i->defExists(1)) { // r, p
+ p = 1;
+  } else {
+ assert(!"Expected predicate dest for load locked");
+  }
}

emitPredicate(i);

-   defId(i->def(0), 2);
+   if (r >= 0)
+  defId(i->def(r), 2);
+   else
+  code[0] |= 255 << 2;
+
+   if (p >= 0)
+  defId(i->def(p), 32 + 16);
+
if (i->getIndirect(0, 0)) {
   srcId(i->src(0).getIndirect(0), 10);
   if (i->getIndirect(0, 0)->reg.size == 8)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 596293e..1bb962f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1874,17 +1874,31 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
    }
    code[1] = opc;
 
+   int r = 0, p = -1;
    if (i->src(0).getFile() == FILE_MEMORY_SHARED) {
       if (i->subOp == NV50_IR_SUBOP_LOAD_LOCKED) {
-         assert(i->defExists(1));
-         if (targ->getChipset() >= NVISA_GK104_CHIPSET)
-            defId(i->def(1), 8);
-         else
-            defId(i->def(1), 32 + 18);
+         if (i->def(0).getFile() == FILE_PREDICATE) { // p, #
+            r = -1;
+            p = 0;
+         } else if (i->defExists(1)) { // r, p
+            p = 1;
+         } else {
+            assert(!"Expected predicate dest for load locked");
+         }
       }
    }
 
-   defId(i->def(0), 14);
+   if (r >= 0)
+      defId(i->def(r), 14);
+   else
+      code[0] |= 63 << 14;
+
+   if (p >= 0) {
+      if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+         defId(i->def(p), 8);
+      else
+         defId(i->def(p), 32 + 18);
+   }
 
    setAddressByFile(i->src(0));
    srcId(i->src(0).getIndirect(0), 20);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index cd801f3..3213188 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -3265,14 +3265,20 @@ DeadCodeElim::visit(BasicBlock *bb)
          ++deadCount;
          delete_Instruction(prog, i);
       } else
-      if (i->defExists(1) && (i->op == OP_VFETCH || i->op == OP_LOAD)) {
+      if (i->defExists(1) &&
+          i->subOp == 0 &&
+          (i->op == OP_VFETCH || i->op == OP_LOAD)) {
          checkSplitLoad(i);
       } else
       if (i->defExists(0) && !i->getDef(0)->refCount()) {
          if (i->op == OP_ATOM ||
              i->op == OP_SUREDP ||
-             i->op == OP_SUREDB)
+             i->op == OP_SUREDB) {
             i->setDef(0, NULL);
+         } else if (i->op == OP_LOAD && i->subOp == NV50_IR_SUBOP_LOAD_LOCKED) {
+            i->setDef(0, i->getDef(1));
+            i->setDef(1, NULL);
+         }
       }
    }
    return true;
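Both emitters now share the same slot-selection step before encoding. The C sketch below restates it under simplifying assumptions: the two booleans stand in for `i->def(0).getFile() == FILE_PREDICATE` and `i->defExists(1)`, and `pick_locked_load_slots()` and `dst_field()` are hypothetical names. The filler values 255 (GK110, field at bit 2) and 63 (NVC0, field at bit 14) are what the patch writes when there is no data register; reading them as each ISA's zero register is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Def slot holding the data register (r) and the lock predicate (p);
 * -1 means that result is absent. */
struct def_slots {
   int r;
   int p;
};

static struct def_slots
pick_locked_load_slots(bool def0_is_predicate, bool def1_exists)
{
   struct def_slots s = { 0, -1 };  /* default: data in def(0), no predicate */

   if (def0_is_predicate) {         /* "p, #": data result was dropped by DCE */
      s.r = -1;
      s.p = 0;
   } else if (def1_exists) {        /* "r, p": both results are live */
      s.p = 1;
   } else {
      assert(!"Expected predicate dest for load locked");
   }
   return s;
}

/* Destination-register field: the real register id, or the filler value
 * (255 on GK110, 63 on NVC0) when there is no data register. */
static uint32_t
dst_field(int reg_id, bool has_reg, uint32_t filler, unsigned shift)
{
   return (has_reg ? (uint32_t)reg_id : filler) << shift;
}
```

In the patch, `p >= 0` then places the predicate at bit 32 + 16 on GK110, or on NVC0 at bit 8 for GK104 and later and at bit 32 + 18 otherwise.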



--
-Samuel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] nvc0/ir: handle a load's reg result not being used for locked variants

2016-05-26 Thread Samuel Pitoiset



On 05/26/2016 05:49 PM, Samuel Pitoiset wrote:

What about the Maxwell logic? LDS expects a GPR at def(0) and in case
the dst reg doesn't exist, you move the predicate to def(0).

Are you sure you don't need to check if dst(0) is not a predicate in
emitLDS()?


Oh, actually no, because we don't lower on GM107 and no predicate is
needed, so you are right.


Just wondering: how did you find the issue? :-)

Reviewed-by: Samuel Pitoiset 


--
-Samuel


Re: [Mesa-dev] [PATCH 2/2] nvc0/ir: handle a load's reg result not being used for locked variants

2016-05-26 Thread Samuel Pitoiset



On 05/26/2016 05:51 PM, Samuel Pitoiset wrote:



On 05/26/2016 05:49 PM, Samuel Pitoiset wrote:

What about the Maxwell logic? LDS expects a GPR at def(0) and in case
the dst reg doesn't exist, you move the predicate to def(0).

Are you sure you don't need to check if dst(0) is not a predicate in
emitLDS()?


Oh, actually no, because we don't lower on GM107 and no predicate is
needed, so you are right.


[...] and we don't need to add a predicate [...]



Just wondering: how did you find the issue? :-)

Reviewed-by: Samuel Pitoiset 


--
-Samuel

Re: [Mesa-dev] [PATCH] i965/bxt: Add 2x6 variant

2016-05-26 Thread Kristian Høgsberg
On Thu, May 26, 2016 at 8:08 AM, Ben Widawsky  wrote:
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Ben Widawsky 

The numbers check out.

Reviewed-by: Kristian Høgsberg 

> ---
>  include/pci_ids/i965_pci_ids.h  |  2 ++
>  src/mesa/drivers/dri/i965/brw_device_info.c | 22 ++
>  2 files changed, 24 insertions(+)
>
> diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
> index bd645fa..fce00da 100644
> --- a/include/pci_ids/i965_pci_ids.h
> +++ b/include/pci_ids/i965_pci_ids.h
> @@ -162,4 +162,6 @@ CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
>  CHIPSET(0x22B3, chv, "Intel(R) HD Graphics (Cherryview)")
>  CHIPSET(0x0A84, bxt, "Intel(R) HD Graphics (Broxton)")
>  CHIPSET(0x1A84, bxt, "Intel(R) HD Graphics (Broxton)")
> +CHIPSET(0x1A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")
>  CHIPSET(0x5A84, bxt, "Intel(R) HD Graphics (Broxton)")
> +CHIPSET(0x5A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")
> diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c b/src/mesa/drivers/dri/i965/brw_device_info.c
> index 3666190..77bbe78 100644
> --- a/src/mesa/drivers/dri/i965/brw_device_info.c
> +++ b/src/mesa/drivers/dri/i965/brw_device_info.c
> @@ -401,6 +401,28 @@ static const struct brw_device_info brw_device_info_bxt = {
>     }
>  };
>
> +static const struct brw_device_info brw_device_info_bxt_2x6 = {
> +   GEN9_FEATURES,
> +   .is_broxton = 1,
> +   .gt = 1,
> +   .has_llc = false,
> +
> +   .num_slices = 1,
> +   .max_vs_threads = 56, /* XXX: guess */
> +   .max_hs_threads = 56, /* XXX: guess */
> +   .max_ds_threads = 56,
> +   .max_gs_threads = 56,
> +   .max_wm_threads = 64 * 2,
> +   .max_cs_threads = 6 * 6,
> +   .urb = {
> +      .size = 128,
> +      .min_vs_entries = 34,
> +      .max_vs_entries = 352,
> +      .max_hs_entries = 128,
> +      .max_ds_entries = 208,
> +      .max_gs_entries = 128,
> +   }
> +};
>  /*
>   * Note: for all KBL SKUs, the PRM says SKL for GS entries, not SKL+.
>   * There's no KBL entry. Using the default SKL (GEN9) GS entries value.
> --
> 2.8.3
>


Re: [Mesa-dev] [PATCH 00/10] i965, anv: Stop linking the Vulkan driver to libmesa

2016-05-26 Thread Emil Velikov
On 26 May 2016 at 02:52, Jason Ekstrand  wrote:
> This little series reworks the build a bit so that we can stop linking the
> Vulkan driver to libmesa.  This lets us substantially cut down on the size
> of the final binary.  The whole series can be found in a branch here:
>
> https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/anv-no-libmesa
>
> Cc: Emil Velikov 
>
> Jason Ekstrand (1
>   compiler: Move glsl_to_nir to libglsl.la
>   ptn: Include nir.h
>   i965/nir: Move the type_size_*_bytes functions to brw_nir.h
>   i965: Move brw_create_nir to brw_program.c
>   i965: Move brw_nir_lower_uniforms.cpp to i965_FILES
>   i965: Move brw_new_shader to brw_link.cpp
>   i965/test: Remove the fragment/vertex_program field from test visitors
>   i965: Move compiler debug functions to intel_screen.c
>   i965: Don't link libmesa or libdri_test_stubs into tests
>   anv: Stop linking against libmesa.la and libdri_test_stubs.la
>
Barring the suggestion for patch 1/10, the series looks great.

For everything else
Reviewed-by: Emil Velikov 

The other cool thing is that now we can nuke libdri_test_stubs :-)
I'll send a patch for that as this hits master.

-Emil


Re: [Mesa-dev] [PATCH 10/10] anv: Stop linking against libmesa.la and libdri_test_stubs.la

2016-05-26 Thread Kristian Høgsberg
Update the TODO as well. With that, for the series:

Reviewed-by: Kristian Høgsberg 

On Wed, May 25, 2016 at 6:52 PM, Jason Ekstrand  wrote:
> This brings the final size of an optimized non-debug build of the Vulkan
> driver down to 2.9 MB as opposed to 8.7 MB for the dri driver.
> ---
>  src/intel/vulkan/Makefile.am | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/intel/vulkan/Makefile.am b/src/intel/vulkan/Makefile.am
> index 662d720..8c6ce18 100644
> --- a/src/intel/vulkan/Makefile.am
> +++ b/src/intel/vulkan/Makefile.am
> @@ -112,10 +112,10 @@ libvulkan_common_la_SOURCES = $(VULKAN_SOURCES)
>
>  VULKAN_LIB_DEPS += \
>     libvulkan_common.la \
> -   $(top_builddir)/src/intel/isl/libisl.la \
>     $(top_builddir)/src/mesa/drivers/dri/i965/libi965_compiler.la \
> -   $(top_builddir)/src/mesa/libmesa.la \
> -   $(top_builddir)/src/mesa/drivers/dri/common/libdri_test_stubs.la \
> +   $(top_builddir)/src/compiler/nir/libnir.la \
> +   $(top_builddir)/src/util/libmesautil.la \
> +   $(top_builddir)/src/intel/isl/libisl.la \
>     $(PER_GEN_LIBS) \
>     $(PTHREAD_LIBS) \
>     $(DLOPEN_LIBS) \
> --
> 2.5.0.400.gff86faf
>


Re: [Mesa-dev] [PATCH 2/2] nvc0/ir: handle a load's reg result not being used for locked variants

2016-05-26 Thread Ilia Mirkin
On Thu, May 26, 2016 at 11:51 AM, Samuel Pitoiset
 wrote:
>
>
> On 05/26/2016 05:49 PM, Samuel Pitoiset wrote:
>>
>> What about the Maxwell logic? LDS expects a GPR at def(0) and in case
>> the dst reg doesn't exist, you move the predicate to def(0).
>>
>> Are you sure you don't need to check if dst(0) is not a predicate in
>> emitLDS()?
>
>
> Oh, actually no, because we don't lower on GM107 and no predicate is
> needed, so you are right.
>
> Just wondering: how did you find the issue? :-)

Dave gave me a shader (generated by CTS, I think) that called
atomicExchange() without using its result. It hit an assert.

>
> Reviewed-by: Samuel Pitoiset 

Thanks! Will push this out tonight.

  -ilia


Re: [Mesa-dev] [PATCH 2/2] svga: fix test for unfilled triangles fallback

2016-05-26 Thread Charmaine Lee
This series looks good to me.

Reviewed-by: Charmaine Lee 

From: Brian Paul 
Sent: Thursday, May 26, 2016 7:09 AM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee; Roland Scheidegger
Subject: [PATCH 2/2] svga: fix test for unfilled triangles fallback

VGPU10 actually supports line-mode triangles.  We failed to make use of
that before.
---
 src/gallium/drivers/svga/svga_draw_arrays.c   |  8 --
 src/gallium/drivers/svga/svga_draw_elements.c |  3 +--
 src/gallium/drivers/svga/svga_draw_private.h  | 38 +--
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_draw_arrays.c b/src/gallium/drivers/svga/svga_draw_arrays.c
index c056772..43d7a97 100644
--- a/src/gallium/drivers/svga/svga_draw_arrays.c
+++ b/src/gallium/drivers/svga/svga_draw_arrays.c
@@ -212,6 +212,11 @@ svga_hwtnl_draw_arrays(struct svga_hwtnl *hwtnl,
    unsigned api_pv = hwtnl->api_pv;
    struct svga_context *svga = hwtnl->svga;
 
+   if (svga->curr.rast->templ.fill_front !=
+       svga->curr.rast->templ.fill_back) {
+      assert(hwtnl->api_fillmode == PIPE_POLYGON_MODE_FILL);
+   }
+
    if (svga->curr.rast->templ.flatshade &&
        svga->state.hw_draw.fs->constant_color_output) {
       /* The fragment color is a constant, not per-vertex so the whole
@@ -236,8 +241,7 @@ svga_hwtnl_draw_arrays(struct svga_hwtnl *hwtnl,
       }
    }
 
-   if (hwtnl->api_fillmode != PIPE_POLYGON_MODE_FILL &&
-       u_reduced_prim(prim) == PIPE_PRIM_TRIANGLES) {
+   if (svga_need_unfilled_fallback(hwtnl, prim)) {
       /* Convert unfilled polygons into points, lines, triangles */
       gen_type = u_unfilled_generator(prim,
                                       start,
diff --git a/src/gallium/drivers/svga/svga_draw_elements.c b/src/gallium/drivers/svga/svga_draw_elements.c
index a987b92..b74c745 100644
--- a/src/gallium/drivers/svga/svga_draw_elements.c
+++ b/src/gallium/drivers/svga/svga_draw_elements.c
@@ -138,8 +138,7 @@ svga_hwtnl_draw_range_elements(struct svga_hwtnl *hwtnl,
    u_translate_func gen_func;
    enum pipe_error ret = PIPE_OK;
 
-   if (hwtnl->api_fillmode != PIPE_POLYGON_MODE_FILL &&
-       u_reduced_prim(prim) == PIPE_PRIM_TRIANGLES) {
+   if (svga_need_unfilled_fallback(hwtnl, prim)) {
       gen_type = u_unfilled_translator(prim,
                                        index_size,
                                        count,
diff --git a/src/gallium/drivers/svga/svga_draw_private.h b/src/gallium/drivers/svga/svga_draw_private.h
index 48e0b60..da5d60e 100644
--- a/src/gallium/drivers/svga/svga_draw_private.h
+++ b/src/gallium/drivers/svga/svga_draw_private.h
@@ -29,6 +29,8 @@
 #include "pipe/p_compiler.h"
 #include "pipe/p_defines.h"
 #include "indices/u_indices.h"
+#include "util/u_prim.h"
+#include "svga_context.h"
 #include "svga_hw_reg.h"
 #include "svga3d_shaderdefs.h"

@@ -182,9 +184,41 @@ struct svga_hwtnl {



-/***
- * Internal functions
+/**
+ * Do we need to use the gallium 'indices' helper to render unfilled
+ * triangles?
  */
+static inline boolean
+svga_need_unfilled_fallback(const struct svga_hwtnl *hwtnl, unsigned prim)
+{
+   const struct svga_context *svga = hwtnl->svga;
+
+   if (u_reduced_prim(prim) != PIPE_PRIM_TRIANGLES) {
+      /* if we're drawing points or lines, no fallback needed */
+      return FALSE;
+   }
+
+   if (svga_have_vgpu10(svga)) {
+      /* vgpu10 supports polygon fill and line modes */
+      if ((prim == PIPE_PRIM_QUADS ||
+           prim == PIPE_PRIM_QUAD_STRIP ||
+           prim == PIPE_PRIM_POLYGON) &&
+          hwtnl->api_fillmode == PIPE_POLYGON_MODE_LINE) {
+         /* VGPU10 doesn't directly render quads or polygons.  They're
+          * converted to triangles.  If we let the device draw the triangle
+          * outlines we'll get an extra, stray lines in the interiors.
+          * So, to draw unfilled quads correctly, we need the fallback.
+          */
+         return true;
+      }
+      return hwtnl->api_fillmode == PIPE_POLYGON_MODE_POINT;
+   } else {
+      /* vgpu9 doesn't support line or point fill modes */
+      return hwtnl->api_fillmode != PIPE_POLYGON_MODE_FILL;
+   }
+}
+
+
 enum pipe_error
 svga_hwtnl_prim( struct svga_hwtnl *hwtnl,
                  const SVGA3dPrimitiveRange *range,
--
1.9.1
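The decision made by `svga_need_unfilled_fallback()` can be exercised in isolation with a small C model. The `enum prim` and `enum fill` values below are hypothetical local stand-ins for the gallium `PIPE_PRIM_*` and `PIPE_POLYGON_MODE_*` constants, and only a subset of primitives is modelled (strips, fans and adjacency types are left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a subset of PIPE_PRIM_* and PIPE_POLYGON_MODE_*. */
enum prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES,
            PRIM_QUADS, PRIM_QUAD_STRIP, PRIM_POLYGON };
enum fill { FILL_FILL, FILL_LINE, FILL_POINT };

/* Rough analogue of u_reduced_prim(prim) == PIPE_PRIM_TRIANGLES. */
static bool
reduces_to_triangles(enum prim p)
{
   switch (p) {
   case PRIM_POINTS:
   case PRIM_LINES:
      return false;
   case PRIM_TRIANGLES:
   case PRIM_QUADS:
   case PRIM_QUAD_STRIP:
   case PRIM_POLYGON:
      return true;
   }
   assert(!"unknown primitive");
   return false;
}

static bool
need_unfilled_fallback(bool have_vgpu10, enum prim p, enum fill mode)
{
   if (!reduces_to_triangles(p))
      return false;              /* points and lines never need it */

   if (have_vgpu10) {
      bool quad_like = p == PRIM_QUADS || p == PRIM_QUAD_STRIP ||
                       p == PRIM_POLYGON;
      /* Quads/polygons are split into triangles, so device-drawn
       * outlines would show the interior edges. */
      if (quad_like && mode == FILL_LINE)
         return true;
      return mode == FILL_POINT; /* VGPU10 handles fill and line itself */
   }

   return mode != FILL_FILL;     /* VGPU9: only solid fill in hardware */
}
```

With this model, `need_unfilled_fallback(true, PRIM_TRIANGLES, FILL_LINE)` is false while the VGPU9 case `need_unfilled_fallback(false, PRIM_TRIANGLES, FILL_LINE)` is true, which is the difference the patch exploits.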



Re: [Mesa-dev] [Mesa-stable] [V2 PATCH] meta: Fix the pbo usage in meta for GLES{1, 2} contexts

2016-05-26 Thread Anuj Phogat
On Thu, May 26, 2016 at 7:18 AM, Emil Velikov  wrote:
> Hi all,
>
> On 2 March 2016 at 03:22, Ian Romanick  wrote:
>> Sorry for the delay.
>>
>> Reviewed-by: Ian Romanick 
>>
>>
>> On 02/09/2016 03:28 PM, Anuj Phogat wrote:
>>> OpenGL ES 1.0 doesn't support using GL_STREAM_DRAW and both
>>> ES 1.0 and 2.0 don't support GL_STREAM_READ in glBufferData().
>>> So, handle it correctly by calling the _mesa_meta_begin()
>>> before create_texture_for_pbo().
>>>
>>> V2: Remove the changes related to allocate_storage. (Ian)
>>>
>>> Cc: Ian Romanick 
>>> Cc: "11.1" 
>>
>>   "11.1 11.2"
>>
>>> Signed-off-by: Anuj Phogat 
>>> ---
>>>  src/mesa/drivers/common/meta_tex_subimage.c | 21 +
>>>  1 file changed, 13 insertions(+), 8 deletions(-)
>>>
> It doesn't seem like this patch has landed yet, despite being
> reviewed. Sadly it no longer applies cleanly to master, so I'm
> wondering if it's still applicable or there's an alternative solution
> (be that merged or not).
>
> Thanks
> Emil

This patch has landed in commit 6d4ebbe.
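The usage restriction described in the quoted commit message can be modelled with a small C helper. `es_buffer_usage_ok()` is hypothetical and covers only four usage tokens; the hex values are the standard GL enum values, and the assumptions that ES 1.x accepts only `GL_STATIC_DRAW`/`GL_DYNAMIC_DRAW` and that ES 3.0 adds `GL_STREAM_READ` follow the respective ES specifications. The actual fix does not check usage at all; it moves the `_mesa_meta_begin()` call before `create_texture_for_pbo()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard GL enum values for the usage hints modelled here. */
#define GL_STREAM_DRAW  0x88E0
#define GL_STREAM_READ  0x88E1
#define GL_STATIC_DRAW  0x88E4
#define GL_DYNAMIC_DRAW 0x88E8

/* Hypothetical helper: would glBufferData() in an OpenGL ES context of
 * major version es_major accept this usage hint? Tokens not listed
 * above are not modelled and report false. */
static bool
es_buffer_usage_ok(int es_major, unsigned usage)
{
   assert(es_major >= 1 && es_major <= 3);

   switch (usage) {
   case GL_STATIC_DRAW:
   case GL_DYNAMIC_DRAW:
      return true;               /* available since ES 1.1 */
   case GL_STREAM_DRAW:
      return es_major >= 2;      /* not in ES 1.x */
   case GL_STREAM_READ:
      return es_major >= 3;      /* not in ES 1.x or 2.0 */
   default:
      return false;
   }
}
```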


[Mesa-dev] [PATCH 2/2] mesa: add support for GLSL ES 3.20 version string

2016-05-26 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/mesa/main/getstring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/main/getstring.c b/src/mesa/main/getstring.c
index c39a076..6e90511 100644
--- a/src/mesa/main/getstring.c
+++ b/src/mesa/main/getstring.c
@@ -80,6 +80,8 @@ shading_language_version(struct gl_context *ctx)
          return (const GLubyte *) "OpenGL ES GLSL ES 3.00";
       case 31:
          return (const GLubyte *) "OpenGL ES GLSL ES 3.10";
+      case 32:
+         return (const GLubyte *) "OpenGL ES GLSL ES 3.20";
       default:
          _mesa_problem(ctx,
                        "Invalid OpenGL ES version in shading_language_version()");
-- 
2.7.3
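The new case simply extends the version-to-string mapping. The C sketch below models only the ES 3.x branch of that switch; `es3_shading_language_version()` is a hypothetical standalone name, and `ctx_version` assumes Mesa's major * 10 + minor encoding (30, 31, 32). The ES 2.0 case is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the ES 3.x cases of shading_language_version(). */
static const char *
es3_shading_language_version(int ctx_version)
{
   assert(ctx_version >= 30 && "this sketch only models ES 3.x");

   switch (ctx_version) {
   case 30:
      return "OpenGL ES GLSL ES 3.00";
   case 31:
      return "OpenGL ES GLSL ES 3.10";
   case 32: /* the case added by this patch */
      return "OpenGL ES GLSL ES 3.20";
   default:
      return NULL; /* Mesa calls _mesa_problem() here instead */
   }
}
```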


