Re: [Mesa-dev] [PATCH v2 31/34] i965/state: Account for the element size in emit_buffer_surface_state

2016-06-28 Thread Pohjolainen, Topi
On Tue, Jun 28, 2016 at 09:22:49AM +0300, Pohjolainen, Topi wrote:
> On Thu, Jun 23, 2016 at 02:00:30PM -0700, Jason Ekstrand wrote:
> > ---
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 11 ++-
> >  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  9 +
> >  src/mesa/drivers/dri/i965/gen8_surface_state.c|  9 +
> >  3 files changed, 16 insertions(+), 13 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
> > b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > index 944d64d..29b8976 100644
> > --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > @@ -496,6 +496,7 @@ gen4_emit_buffer_surface_state(struct brw_context *brw,
> > unsigned pitch,
> > bool rw)
> >  {
> > +   unsigned elements = buffer_size / pitch;
> 
> Could be const as well as in the two other occurrences further down.

Otherwise patches 31-34 are also:

Reviewed-by: Topi Pohjolainen 
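
The const suggestion can be sketched in isolation. This is only a minimal
sketch, not the driver code: buffer_element_count() is a hypothetical helper
and the non-zero check on pitch is an added assumption; only buffer_size,
pitch and elements come from the patch.

```c
#include <assert.h>

/* Hypothetical helper, not part of i965: element count of a buffer surface. */
static unsigned
buffer_element_count(unsigned buffer_size, unsigned pitch)
{
   assert(pitch != 0);   /* assumption: callers never pass a zero pitch */

   /* const, as suggested in the review: the value is computed once and
    * never modified afterwards. Integer division rounds down. */
   const unsigned elements = buffer_size / pitch;
   return elements;
}
```

Marking the local const documents that it is a derived value and lets the
compiler reject accidental writes to it.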
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] glsl: add driconf to zero-init uninitialized vars

2016-06-28 Thread Eirik Byrkjeflot Anonsen
Rob Clark  writes:

> On Tue, Jun 28, 2016 at 11:28 AM, Marek Olšák  wrote:
>> On Mon, Jun 27, 2016 at 9:28 PM, Rob Clark  wrote:
>>> On Mon, Jun 27, 2016 at 3:06 PM, Kenneth Graunke  
>>> wrote:
>>>> On Monday, June 27, 2016 11:43:28 AM PDT Matt Turner wrote:
> On Mon, Jun 27, 2016 at 4:44 AM, Rob Clark  wrote:
> > On Mon, Jun 27, 2016 at 7:13 AM, Alan Swanson 
> >  wrote:
> >> On 2016-06-25 13:37, Rob Clark wrote:
> >>>
> >>> Some games are sloppy.. perhaps because it is defined behavior for DX 
> >>> or
> >>> perhaps because nv blob driver defaults things to zero.
> >>>
> >>> So add driconf param to force uninitialized variables to default to 
> >>> zero.
> >>>
> >>> This issue was observed with rust, from steam store.  But has surfaced
> >>> elsewhere in the past.
> >>>
> >>> Signed-off-by: Rob Clark 
> >>> ---
> >>> Note that I left out the drirc bit, since not entirely sure how to
> >>> identify this game.  (I don't actually have the game, just working off
> >>> of an apitrace)
> >>>
> >>> Possibly worth mentioning that for the shaders using uninitialized 
> >>> vars
> >>> having zero-initializers lets constant-propagation get rid of a whole
> >>> lot of instructions.  One shader I saw dropped to less than half of
> >>> its original instruction count.
> >>
> >>
> >> If the default for uninitialised variables is undefined, then with the
> >> reported shader optimisations why bother with the (DRI) option when
> >> zeroing could still essentially be classed as undefined?
> >>
> >> Cuts the patch down to just the src/compiler/glsl/ast_to_hir.cpp 
> >> change.
> >
> > I did suggest that on #dri-devel, but Jason had a theoretical example
> > where it would hurt.. iirc something like:
> >
> >   float maybe_undef;
> >   for (int i = 0; i < some_uniform_at_least_one; i++)
> >  maybe_undef = ...
> >
> > also, he didn't want to hide shader bugs that app should fix.
> >
> > It would be interesting to run shaderdb w/ glsl_zero_init=true and
> > see what happens, but I didn't get around to that yet.
>
> Here's what I get on i965. It's not a clear win.
>
> total instructions in shared programs: 5249030 -> 5249002 (-0.00%)
> instructions in affected programs: 28936 -> 28908 (-0.10%)
> helped: 66
> HURT: 132
>
> total cycles in shared programs: 57966694 -> 57956306 (-0.02%)
> cycles in affected programs: 1136118 -> 1125730 (-0.91%)
> helped: 78
> HURT: 106

>>>> I suspect most of the help is because we're missing undef optimizations,
>>>> such as CSE...while zero could be CSE'd.  (I have a patch, but it hurts
>>>> things too...)
>>>
>>> right, I was thinking that treating undef as zero in constant-folding
>>> would have the same effect.. ofc it might make shader bugs less
>>> obvious.
>>>
>>> Btw, does anyone know what fglrx does?  Afaiu nv blob treats undef as
>>> zero.  If fglrx does the same, I suppose that strengthens the argument
>>> for "just do this unconditionally".
>>
>> No idea what fglrx does, but LLVM does eliminate code with undefined
>> inputs. Initializing everything to 0 might make that worse.
>
> hmm, treating as zero does eliminate a lot.. anyway, I guess we'll
> stick w/ driconf.
>
> fwiw, with some help from the reporter, we figured out that this is
> the bit that I need to squash into drirc:
>
> 
> 
> 

Not knowing a lot about drirc, I suspect you should have a double quote
at the end of glsl_zero_init as well?

eirik

> now, if I could talk somebody into a r-b for this and the i965 fix? ;-)
>
> BR,
> -R
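
The theoretical case quoted earlier in this thread (a variable assigned only
inside a loop that runs at least once) can be modelled in plain C. This is a
minimal sketch under one stated assumption: C cannot legally read an
uninitialized variable, so the `initial` parameter stands in for whatever
the GLSL variable would hold. last_assigned() is a hypothetical name.

```c
#include <assert.h>

/* Models:  float maybe_undef;
 *          for (int i = 0; i < some_uniform_at_least_one; i++)
 *             maybe_undef = ...;
 * `initial` plays the role of the undefined (or zero-initialized) value. */
static float
last_assigned(const float *values, int count, float initial)
{
   assert(count >= 0);

   float result = initial;
   for (int i = 0; i < count; i++)
      result = values[i];   /* every iteration overwrites the old value */
   return result;
}
```

When count is at least one, the initial value never reaches the result, so
zero-initializing it is dead work. A compiler that cannot prove the trip
count still has to emit the zero, which is one way the option could hurt.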


[Mesa-dev] [PATCH 2/2] st/mesa: get max supported number of image samples from driver

2016-06-28 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/mesa/state_tracker/st_extensions.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index b87c9ce..b55b2c2 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -443,7 +443,6 @@ void st_init_limits(struct pipe_screen *screen,
  c->Program[MESA_SHADER_COMPUTE].MaxImageUniforms;
c->MaxCombinedShaderOutputResources += c->MaxCombinedImageUniforms;
c->MaxImageUnits = MAX_IMAGE_UNITS;
-   c->MaxImageSamples = 0; /* XXX */
if (c->MaxCombinedImageUniforms) {
   extensions->ARB_shader_image_load_store = GL_TRUE;
   extensions->ARB_shader_image_size = GL_TRUE;
@@ -988,6 +987,11 @@ void st_init_extensions(struct pipe_screen *screen,
  color_formats, 16,
  PIPE_BIND_RENDER_TARGET);
 
+  consts->MaxImageSamples =
+ get_max_samples_for_formats(screen, ARRAY_SIZE(color_formats),
+ color_formats, 16,
+ PIPE_BIND_SHADER_IMAGE);
+
   consts->MaxColorTextureSamples =
  get_max_samples_for_formats(screen, ARRAY_SIZE(color_formats),
  color_formats, consts->MaxSamples,
-- 
2.7.3
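
For readers unfamiliar with the helper this patch calls, here is a minimal
sketch of how such a query could work, assuming it walks sample counts
downward and returns the first count that any listed format supports. The
callback type, the integer format ids and toy_supported() are hypothetical
stand-ins, not the gallium interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's format-support query. */
typedef bool (*format_supported_fn)(int format, unsigned samples,
                                    unsigned bind);

/* Return the largest sample count <= max_samples that at least one of the
 * given formats supports for the given binding, or 0 if there is none. */
static unsigned
max_samples_for_formats(format_supported_fn supported,
                        const int *formats, size_t num_formats,
                        unsigned max_samples, unsigned bind)
{
   assert(supported != NULL);

   for (unsigned samples = max_samples; samples > 0; samples--) {
      for (size_t f = 0; f < num_formats; f++) {
         if (supported(formats[f], samples, bind))
            return samples;
      }
   }
   return 0;
}

/* Toy query for illustration only: format 1 supports up to 4 samples,
 * format 2 up to 8, everything else is unsupported. */
static bool
toy_supported(int format, unsigned samples, unsigned bind)
{
   (void)bind;
   if (format == 1)
      return samples <= 4;
   if (format == 2)
      return samples <= 8;
   return false;
}
```

With such a query, a driver that supports no multisampled shader images
simply yields 0, which matches the value the removed XXX line hard-coded.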



[Mesa-dev] [PATCH 1/2] nvc0: fix up image support for allowing multiple samples

2016-06-28 Thread Ilia Mirkin
Basically we just have to scale up the coordinates and then add the
relevant sample offset. The code to handle this was already largely
present from Christoph's earlier attempts to pipe images through back in
the dark ages, this just hooks it all up.

Signed-off-by: Ilia Mirkin 
---

Only tested on GK208... probably would be good for someone on GF1xx to give it 
a shot.
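
The "scale up the coordinates and then add the relevant sample offset" step
can be sketched on the CPU. This is a minimal sketch, not the codegen: the
offset table copies the eight pairs the patch uploads, while
adjust_coords_ms() and the log2 scale parameters ms_x and ms_y are
assumptions made for illustration.

```c
#include <assert.h>

struct ms_offset { unsigned dx, dy; };

/* Per-sample coordinate offsets, in the order the patch pushes them. */
static const struct ms_offset ms_offsets[8] = {
   { 0, 0 }, { 1, 0 }, { 0, 1 }, { 1, 1 },
   { 2, 0 }, { 3, 0 }, { 2, 1 }, { 3, 1 },
};

/* Convert a pixel coordinate plus sample index into a coordinate in the
 * underlying (wider and taller) sample grid.
 * ms_x / ms_y: assumed log2 of the per-pixel sample grid width / height. */
static void
adjust_coords_ms(unsigned ms_x, unsigned ms_y, unsigned sample,
                 unsigned *x, unsigned *y)
{
   assert(sample < 8);

   *x = (*x << ms_x) + ms_offsets[sample].dx;
   *y = (*y << ms_y) + ms_offsets[sample].dy;
}
```

For example, with a 4x2 sample grid per pixel (ms_x = 2, ms_y = 1), sample 5
of pixel (1, 1) lands at (7, 2) in the sample grid.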

 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  3 +++
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  |  4 
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c| 24 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h|  2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 20 --
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 20 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_tex.c|  6 --
 7 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 0fa5aa1..f3d7bee 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -2388,6 +2388,9 @@ Converter::getImageCoords(std::vector<Value *> &coords, int r, int s)
 
for (int c = 0; c < arg; ++c)
   coords.push_back(fetchSrc(s, c));
+
+   if (t.isMS())
+  coords.push_back(fetchSrc(s, 3));
 }
 
 // For raw loads, granularity is 4 byte.
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 67bd73b..73b680a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -2044,6 +2044,10 @@ NVC0LoweringPass::processSurfaceCoordsNVC0(TexInstruction *su)
Value *v;
Value *ind = NULL;
 
+   bld.setPosition(su, false);
+
+   adjustCoordinatesMS(su);
+
if (su->tex.rIndirectSrc >= 0) {
   ind = su->getIndirectR();
   if (su->tex.r > 0) {
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index 7511819..f5f7fd4 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -113,6 +113,30 @@ nvc0_screen_compute_setup(struct nvc0_screen *screen,
PUSH_DATA (push, screen->txc->offset + 65536);
PUSH_DATA (push, NVC0_TSC_MAX_ENTRIES - 1);
 
+   /* MS sample coordinate offsets */
+   BEGIN_NVC0(push, NVC0_CP(CB_SIZE), 3);
+   PUSH_DATA (push, 2048);
+   PUSH_DATAh(push, screen->uniform_bo->offset + NVC0_CB_AUX_INFO(5));
+   PUSH_DATA (push, screen->uniform_bo->offset + NVC0_CB_AUX_INFO(5));
+   BEGIN_1IC0(push, NVC0_CP(CB_POS), 1 + 2 * 8);
+   PUSH_DATA (push, NVC0_CB_AUX_MS_INFO);
+   PUSH_DATA (push, 0); /* 0 */
+   PUSH_DATA (push, 0);
+   PUSH_DATA (push, 1); /* 1 */
+   PUSH_DATA (push, 0);
+   PUSH_DATA (push, 0); /* 2 */
+   PUSH_DATA (push, 1);
+   PUSH_DATA (push, 1); /* 3 */
+   PUSH_DATA (push, 1);
+   PUSH_DATA (push, 2); /* 4 */
+   PUSH_DATA (push, 0);
+   PUSH_DATA (push, 3); /* 5 */
+   PUSH_DATA (push, 0);
+   PUSH_DATA (push, 2); /* 6 */
+   PUSH_DATA (push, 1);
+   PUSH_DATA (push, 3); /* 7 */
+   PUSH_DATA (push, 1);
+
return 0;
 }
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index 8a965fc..c633ccf 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -112,7 +112,7 @@
 #define NVC0_CB_AUX_TEX_INFO(i) 0x020 + (i) * 4
 #define NVC0_CB_AUX_TEX_SIZE(32 * 4)
 /* 8 sets of 32-bits coordinate offsets */
-#define NVC0_CB_AUX_MS_INFO 0x0a0 /* CP */
+#define NVC0_CB_AUX_MS_INFO 0x0a0
 #define NVC0_CB_AUX_MS_SIZE (8 * 2 * 4)
 /* block/grid size, at 3 32-bits integers each and gridid */
 #define NVC0_CB_AUX_GRID_INFO   0x0e0 /* CP */
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index aba9511..d75b702 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -555,29 +555,25 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t chipset,
 
info->io.genUserClip = prog->vp.num_ucps;
info->io.auxCBSlot = 15;
+   info->io.msInfoCBSlot = 15;
info->io.ucpBase = NVC0_CB_AUX_UCP_INFO;
info->io.drawInfoBase = NVC0_CB_AUX_DRAW_INFO;
+   info->io.msInfoBase = NVC0_CB_AUX_MS_INFO;
+   info->io.bufInfoBase = NVC0_CB_AUX_BUF_INFO(0);
+   info->io.suInfoBase = NVC0_CB_AUX_SU_INFO(0);
+   if (chipset >= NVISA_GK104_CHIPSET) {
+  info->io.texBindBase = NVC0_CB_AUX_TEX_INFO(0);
+   }
 
if (prog->type == PIPE_SHADER_COMPUTE) {
   if (chipset >= NVISA_GK104_CHIPSET) {
  info->io.auxCBSlot = 7;
- info->io.texBindBase = NVC0_CB_AUX_TEX_INFO(0);
+ 

[Mesa-dev] [PATCH v2] gallium: Force blend color to 16-byte alignment

2016-06-28 Thread Chuck Atkins
This aligns the 4-element color float array to 16 byte boundaries.  This
should allow compiler vectorizers to generate better optimizations.
Also fixes broken vectorization generated by Intel compiler.

v2: Fixed indentation and added a lengthy comment explaining the
reason for the alignment.

Reported-by: Tim Rowley 
Tested-by: Tim Rowley 
Signed-off-by: Chuck Atkins 
Acked-by: Roland Scheidegger 
---
 src/gallium/include/pipe/p_state.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
index 1543e90..5526c39 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -326,7 +326,17 @@ struct pipe_blend_state
 
 struct pipe_blend_color
 {
-   float color[4];
+   /**
+* Making the color array explicitly 16-byte aligned provides a hint to
+* compilers to make more efficient auto-vectorization optimizations.
+* The actual performance gains from vectorizing the blend color array are
+* fairly minimal, if any, but the alignment is necessary to work around
+* buggy vectorization in some compilers which fail to generate the correct
+* unaligned accessors resulting in a segfault.  Specifically several
+* versions of the Intel compiler are known to be affected but it's likely
+* others are as well.
+*/
+   PIPE_ALIGN_VAR(16) float color[4];
 };
 
 
-- 
2.5.5
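
The effect of the alignment request can be checked in standalone C11. This
is a minimal sketch that uses the standard alignas in place of Mesa's
PIPE_ALIGN_VAR macro; blend_color_sketch and is_16_byte_aligned() are
hypothetical names, and a 4-byte float is assumed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Stand-in for struct pipe_blend_color with the alignment applied. */
struct blend_color_sketch {
   alignas(16) float color[4];
};

/* Aligning the only member raises the alignment of the whole struct. */
static_assert(alignof(struct blend_color_sketch) == 16,
              "struct inherits the 16-byte alignment of its member");

/* Returns non-zero when ptr sits on a 16-byte boundary. */
static int
is_16_byte_aligned(const void *ptr)
{
   return ((uintptr_t)ptr & 15u) == 0;
}
```

Any object of the struct type, whether on the stack or embedded in another
struct, then starts on a 16-byte boundary, so aligned vector loads of
color[] are safe.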



Re: [Mesa-dev] [PATCH] gallium: Force blend color to 16-byte alignment

2016-06-28 Thread Chuck Atkins
Really it's a workaround to fix bad vectorization in the Intel compiler,
but it doesn't hurt for other compilers and could only help, even if the
performance difference is marginal, if any.  If it was problematic
otherwise I'd guard it with an #ifdef __INTEL_COMPILER.  I can update the
patch with a comment explaining why it's there in case other developers
stumble on it and think "wtf".
On Jun 28, 2016 6:38 PM, "Roland Scheidegger"  wrote:

Am 28.06.2016 um 22:45 schrieb Chuck Atkins:
> This aligns the 4-element color float array to 16 byte boundaries.  This
> should allow compiler vectorizers to generate better optimizations.
> Also fixes broken vectorization generated by Intel compiler.
>
> Reported-by: Tim Rowley 
> Signed-off-by: Chuck Atkins 
> ---
>  src/gallium/include/pipe/p_state.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
> index 1543e90..95f140f 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -326,7 +326,7 @@ struct pipe_blend_state
>
>  struct pipe_blend_color
>  {
> -   float color[4];
> +  PIPE_ALIGN_VAR(16) float color[4];
>  };
>

I'm wondering if that's really needed. I have a hard time imagining
that setting the blend color is performance critical. And internally a
driver can obviously still align pipe_blend_color structs itself.
But OTOH, why not...

Acked-by: Roland Scheidegger 


Re: [Mesa-dev] [Intel-gfx] [PATCH 2/2] drm/i915: Removing PCI IDs that are no longer listed as Kabylake.

2016-06-28 Thread Pandiyan, Dhinakaran
On Thu, 2016-06-23 at 14:50 -0700, Rodrigo Vivi wrote:
> -   INTEL_VGA_DEVICE(0x5932, info), /* DT  GT4 */ \
> -   INTEL_VGA_DEVICE(0x593B, info), /* Halo GT4 */ \
> -   INTEL_VGA_DEVICE(0x593A, info), /* SRV GT4 */ \
> -   INTEL_VGA_DEVICE(0x593D, info)  /* WKS GT4


Reviewed-by: Dhinakaran Pandiyan 


Re: [Mesa-dev] V3 On disk shader cache for i965 (Now with real world results!)

2016-06-28 Thread Grazvydas Ignotas
On Tue, Jun 28, 2016 at 10:53 AM, Timothy Arceri
 wrote:
> On Mon, 2016-06-27 at 00:46 +1000, Timothy Arceri wrote:
>> On Sun, 2016-06-26 at 16:15 +0300, Grazvydas Ignotas wrote:
>> > Tried this while playing with apitrace and am getting segfaults
>> > when
>> > running any trace with a cached (second) run. Not sure if it's
>> > "wrong"
>> > traces I've chosen or what, you can take one example from this bug:
>> > https://bugs.freedesktop.org/show_bug.cgi?id=96425
>>
>> Thanks for testing I'll take a look tomorrow.
>
> The problem is the shaders were being detached after linking so we had
> nothing to fall back to if we had a shader cache miss.
> I've hacked something up and pushed it to the shader-cache19 branch
> that allows the trace to run. Not sure how it relates to real game
> performance but the trace goes from 5FPS to 7FPS on the second run on
> my machine, which looks good :)

Seems to work now and makes things a good deal faster. nice!

However I have a case of one trace's cache seemingly affecting another
trace, are you interested in that? One of them (the one that gets
broken) is from this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=92229
Unfortunately the other "bad" one is my own and is over a gigabyte
(even compressed), I'll need to trim it I guess.

>> > It would also be good idea to hide the cache debug messages behind
>> > some env var, or at least send them to stderr and not stdout, as
>> > stdout breaks programs that pipe data through stdout like
>> > qapitrace.
>>
>> Right, that's my next task, I should get this done tomorrow also. As
>> stated below :) "For now I have left in some printf's as the feature
>> is
>> still disabled by default and they are useful for debugging. I intend
>> to fix this soon to hide them behind an environment var."

Yes I have read that (even used your wording in my comment), but
somehow managed to forget it while testing, sorry.

Gražvydas
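
The request to keep debug output off stdout could look roughly like this. It
is a minimal sketch under stated assumptions: the environment variable name
and both helpers are hypothetical, and nothing here is taken from the
shader-cache branch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable name, chosen only for this sketch. */
#define CACHE_DEBUG_ENV "SKETCH_SHADER_CACHE_DEBUG"

/* Decide from the raw variable value: unset, empty or "0..." means off. */
static bool
cache_debug_enabled_for(const char *value)
{
   return value != NULL && value[0] != '\0' && value[0] != '0';
}

/* printf-style debug output that goes to stderr, never stdout, and only
 * when the environment variable enables it. Returns characters written. */
static int
cache_debug(const char *fmt, ...)
{
   assert(fmt != NULL);

   if (!cache_debug_enabled_for(getenv(CACHE_DEBUG_ENV)))
      return 0;

   va_list args;
   va_start(args, fmt);
   int written = vfprintf(stderr, fmt, args);
   va_end(args);
   return written;
}
```

Because nothing is ever written to stdout, tools such as qapitrace that read
the traced program's stdout as a data stream are not disturbed.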


[Mesa-dev] [PATCH 09/16] svga: set render target flag for snorm surfaces

2016-06-28 Thread Brian Paul
We don't normally support rendering to SNORM surfaces, but with
GL_ARB_copy_image we can copy to them if we treat them as typeless
and use a UNORM surface view.
---
 src/gallium/drivers/svga/svga_resource_texture.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/gallium/drivers/svga/svga_resource_texture.c b/src/gallium/drivers/svga/svga_resource_texture.c
index 14fe220..aa7724a 100644
--- a/src/gallium/drivers/svga/svga_resource_texture.c
+++ b/src/gallium/drivers/svga/svga_resource_texture.c
@@ -963,6 +963,16 @@ svga_texture_create(struct pipe_screen *screen,
   svga_format_name(typeless),
   bindings);
   }
+
+  if (svga_format_is_uncompressed_snorm(tex->key.format)) {
+ /* We can't normally render to snorm surfaces, but once we
+  * substitute a typeless format, we can if the rendertarget view
+  * is unorm.  This can happen with GL_ARB_copy_image.
+  */
+ tex->key.flags |= SVGA3D_SURFACE_HINT_RENDERTARGET;
+ tex->key.flags |= SVGA3D_SURFACE_BIND_RENDER_TARGET;
+  }
+
   tex->key.format = typeless;
}
 
-- 
1.9.1



[Mesa-dev] [PATCH 10/16] svga: use vgpu10 CopyRegion command when possible

2016-06-28 Thread Brian Paul
From: Neha Bhende 

Do texture->texture copies host-side with this command when possible.
Use the previous software fallback otherwise.

Reviewed-by: Brian Paul 
---
 src/gallium/drivers/svga/svga_pipe_blit.c | 149 +-
 1 file changed, 147 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_blit.c b/src/gallium/drivers/svga/svga_pipe_blit.c
index 4eec927..564af51 100644
--- a/src/gallium/drivers/svga/svga_pipe_blit.c
+++ b/src/gallium/drivers/svga/svga_pipe_blit.c
@@ -36,12 +36,76 @@
 #define FILE_DEBUG_FLAG DEBUG_BLIT
 
 
+/**
+ * Copy an image between textures with the vgpu10 CopyRegion command.
+ */
+static void
+copy_region_vgpu10(struct svga_context *svga, struct pipe_resource *src_tex,
+unsigned src_x, unsigned src_y, unsigned src_z,
+unsigned src_level, unsigned src_face,
+struct pipe_resource *dst_tex,
+unsigned dst_x, unsigned dst_y, unsigned dst_z,
+unsigned dst_level, unsigned dst_face,
+unsigned width, unsigned height, unsigned depth)
+{
+   enum pipe_error ret;
+   uint32 srcSubResource, dstSubResource;
+   struct svga_texture *dtex, *stex;
+   SVGA3dCopyBox box;
+   int i, num_layers = 1;
+
+   stex = svga_texture(src_tex);
+   dtex = svga_texture(dst_tex);
+
+   box.x = dst_x;
+   box.y = dst_y;
+   box.z = dst_z;
+   box.w = width;
+   box.h = height;
+   box.d = depth;
+   box.srcx = src_x;
+   box.srcy = src_y;
+   box.srcz = src_z;
+
+   if (src_tex->target == PIPE_TEXTURE_1D_ARRAY ||
+   src_tex->target == PIPE_TEXTURE_2D_ARRAY) {
+  /* copy layer by layer */
+  box.z = 0;
+  box.d = 1;
+  box.srcz = 0;
+
+  num_layers = depth;
+  src_face = src_z;
+  dst_face = dst_z;
+   }
+
+   /* loop over array layers */
+   for (i = 0; i < num_layers; i++) {
+  srcSubResource = (src_face + i) * (src_tex->last_level + 1) + src_level;
+  dstSubResource = (dst_face + i) * (dst_tex->last_level + 1) + dst_level;
+
+  ret = SVGA3D_vgpu10_PredCopyRegion(svga->swc,
+ dtex->handle, dstSubResource,
+ stex->handle, srcSubResource, &box);
+  if (ret != PIPE_OK) {
+ svga_context_flush(svga, NULL);
+ ret = SVGA3D_vgpu10_PredCopyRegion(svga->swc,
+dtex->handle, dstSubResource,
+stex->handle, srcSubResource, &box);
+ assert(ret == PIPE_OK);
+  }
+
+  svga_define_texture_level(dtex, dst_face + i, dst_level);
+   }
+}
+
+
 static void
 svga_resource_copy_region(struct pipe_context *pipe,
-  struct pipe_resource* dst_tex,
+  struct pipe_resource *dst_tex,
   unsigned dst_level,
   unsigned dstx, unsigned dsty, unsigned dstz,
-  struct pipe_resource* src_tex,
+  struct pipe_resource *src_tex,
   unsigned src_level,
   const struct pipe_box *src_box)
 {
@@ -100,6 +164,52 @@ svga_resource_copy_region(struct pipe_context *pipe,
 }
 
 
+/**
+ * The state tracker implements some resource copies with blits (for
+ * GL_ARB_copy_image).  This function checks if we should really do the blit
+ * with a VGPU10 CopyRegion command or software fallback (for incompatible
+ * src/dst formats).
+ */
+static bool
+can_blit_via_copy_region_vgpu10(struct svga_context *svga,
+const struct pipe_blit_info *blit_info)
+{
+   struct svga_texture *dtex, *stex;
+
+   if (!svga_have_vgpu10(svga))
+  return false;
+
+   stex = svga_texture(blit_info->src.resource);
+   dtex = svga_texture(blit_info->dst.resource);
+
+   // can't copy within one resource
+   if (stex->handle == dtex->handle)
+  return false;
+
+   // can't copy between different resource types
+   if (blit_info->src.resource->target != blit_info->dst.resource->target)
+  return false;
+
+   // check that the blit src/dst regions are same size, no flipping, etc.
+   if (blit_info->src.box.width != blit_info->dst.box.width ||
+   blit_info->src.box.height != blit_info->dst.box.height)
+  return false;
+
+   // depth/stencil copies not supported at this time
+   if (blit_info->mask != PIPE_MASK_RGBA)
+  return false;
+
+   if (blit_info->alpha_blend || blit_info->render_condition_enable ||
+   blit_info->scissor_enable)
+  return false;
+
+   // check that src/dst surface formats are compatible for the VGPU device.
+   return util_is_format_compatible(
+util_format_description(blit_info->src.resource->format),
+util_format_description(blit_info->dst.resource->format));
+}
+
+
 static void
 svga_blit(struct pipe_context *pipe,
   const struct pipe_blit_info 
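
The subresource arithmetic used in copy_region_vgpu10() above can be read on
its own. A minimal sketch: subresource_index() is a hypothetical helper that
mirrors the expression in the patch, where each layer (or face) owns
last_level + 1 consecutive mipmap levels.

```c
#include <assert.h>

/* Flatten (layer, mip level) into a single subresource index, assuming
 * levels are stored contiguously per layer, as in the patch:
 *    (face + i) * (last_level + 1) + level                              */
static unsigned
subresource_index(unsigned layer, unsigned level, unsigned last_level)
{
   assert(level <= last_level);
   return layer * (last_level + 1) + level;
}
```

For a texture with four mipmap levels (last_level = 3), level 1 of layer 2
is subresource 2 * 4 + 1 = 9.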

[Mesa-dev] [PATCH 16/16] svga: use SVGA3D_vgpu10_BufferCopy() for buffer copies

2016-06-28 Thread Brian Paul
So that we do copies host-side rather than in the guest with map/memcpy.

Tested with piglit arb_copy_buffer-subdata-sync test and new
arb_copy_buffer-intra-buffer-copy test.

Reviewed-by: Charmaine Lee 
---
 src/gallium/drivers/svga/svga_pipe_blit.c | 32 +++
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_blit.c b/src/gallium/drivers/svga/svga_pipe_blit.c
index 4a01c8e..08aa30a 100644
--- a/src/gallium/drivers/svga/svga_pipe_blit.c
+++ b/src/gallium/drivers/svga/svga_pipe_blit.c
@@ -23,10 +23,11 @@
  *
  **/
 
-#include "svga_resource_texture.h"
 #include "svga_context.h"
 #include "svga_debug.h"
 #include "svga_cmd.h"
+#include "svga_resource_buffer.h"
+#include "svga_resource_texture.h"
 #include "svga_surface.h"
 
 //#include "util/u_blit_sw.h"
@@ -117,10 +118,33 @@ svga_resource_copy_region(struct pipe_context *pipe,
 */
svga_surfaces_flush( svga );
 
-   /* Fallback for buffers. */
if (dst_tex->target == PIPE_BUFFER && src_tex->target == PIPE_BUFFER) {
-  util_resource_copy_region(pipe, dst_tex, dst_level, dstx, dsty, dstz,
-src_tex, src_level, src_box);
+  /* can't copy within the same buffer, unfortunately */
+  if (svga_have_vgpu10(svga) && src_tex != dst_tex) {
+ enum pipe_error ret;
+ struct svga_winsys_surface *src_surf;
+ struct svga_winsys_surface *dst_surf;
+ struct svga_buffer *dbuffer = svga_buffer(dst_tex);
+
+ src_surf = svga_buffer_handle(svga, src_tex);
+ dst_surf = svga_buffer_handle(svga, dst_tex);
+
+ ret = SVGA3D_vgpu10_BufferCopy(svga->swc, src_surf, dst_surf,
+src_box->x, dstx, src_box->width);
+ if (ret != PIPE_OK) {
+svga_context_flush(svga, NULL);
+ret = SVGA3D_vgpu10_BufferCopy(svga->swc, src_surf, dst_surf,
+   src_box->x, dstx, src_box->width);
+assert(ret == PIPE_OK);
+ }
+
+ dbuffer->dirty = TRUE;
+  }
+  else {
+ /* use map/memcpy fallback */
+ util_resource_copy_region(pipe, dst_tex, dst_level, dstx,
+   dsty, dstz, src_tex, src_level, src_box);
+  }
   return;
}
 
-- 
1.9.1
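
The emit, flush and retry pattern in this patch can be modelled without the
driver. A minimal sketch, assuming a command only fails because the command
buffer is full: sketch_cmd_buffer, try_emit(), flush() and emit_with_retry()
are all hypothetical stand-ins for the winsys context, the command emitter
and svga_context_flush().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy command buffer: a fill level, a capacity and a flush counter. */
struct sketch_cmd_buffer {
   unsigned used;
   unsigned capacity;
   unsigned flushes;
};

/* Reserve space for one command; fails when the buffer is too full. */
static bool
try_emit(struct sketch_cmd_buffer *cb, unsigned size)
{
   if (size > cb->capacity - cb->used)
      return false;
   cb->used += size;
   return true;
}

/* Submit everything queued so far and start with an empty buffer. */
static void
flush(struct sketch_cmd_buffer *cb)
{
   cb->used = 0;
   cb->flushes++;
}

/* Try once; on failure flush and try exactly one more time. */
static bool
emit_with_retry(struct sketch_cmd_buffer *cb, unsigned size)
{
   assert(cb != NULL && cb->used <= cb->capacity);

   if (try_emit(cb, size))
      return true;

   flush(cb);
   return try_emit(cb, size);
}
```

The patch asserts that the second attempt succeeds; the sketch returns the
result instead so that an oversized command is visible to the caller.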



[Mesa-dev] [PATCH 07/16] svga: adjust sampler view format for RGBX

2016-06-28 Thread Brian Paul
We previously handled the case of a RGBX sampler view of a RGBA surface.
Add the reverse case too.  For GL_ARB_copy_image.
---
 src/gallium/drivers/svga/svga_state_sampler.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_state_sampler.c b/src/gallium/drivers/svga/svga_state_sampler.c
index 6e78825..00e8fc0 100644
--- a/src/gallium/drivers/svga/svga_state_sampler.c
+++ b/src/gallium/drivers/svga/svga_state_sampler.c
@@ -106,12 +106,16 @@ svga_validate_pipe_sampler_view(struct svga_context *svga,
   enum pipe_format pformat = sv->base.format;
 
   /* vgpu10 cannot create a BGRX view for a BGRA resource, so force it to
-   * create a BGRA view.
+   * create a BGRA view (and vice versa).
*/
   if (pformat == PIPE_FORMAT_B8G8R8X8_UNORM &&
   sv->base.texture->format == PIPE_FORMAT_B8G8R8A8_UNORM) {
  pformat = PIPE_FORMAT_B8G8R8A8_UNORM;
   }
+  else if (pformat == PIPE_FORMAT_B8G8R8A8_UNORM &&
+  sv->base.texture->format == PIPE_FORMAT_B8G8R8X8_UNORM) {
+ pformat = PIPE_FORMAT_B8G8R8X8_UNORM;
+  }
 
   format = svga_translate_format(ss, pformat,
  PIPE_BIND_SAMPLER_VIEW);
-- 
1.9.1
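
The view-format adjustment reduces to a small pure function. A minimal
sketch: the sketch_format enum and adjust_view_format() are hypothetical
stand-ins for enum pipe_format and the logic in
svga_validate_pipe_sampler_view().

```c
#include <assert.h>

/* Hypothetical stand-ins for the relevant pipe formats. */
enum sketch_format {
   SKETCH_B8G8R8A8_UNORM,
   SKETCH_B8G8R8X8_UNORM,
   SKETCH_R8G8B8A8_UNORM,
   SKETCH_FORMAT_COUNT
};

/* If the view and the resource differ only in BGRA versus BGRX, use the
 * resource's format for the view; otherwise keep the requested format. */
static enum sketch_format
adjust_view_format(enum sketch_format view, enum sketch_format resource)
{
   assert(view < SKETCH_FORMAT_COUNT && resource < SKETCH_FORMAT_COUNT);

   if (view == SKETCH_B8G8R8X8_UNORM && resource == SKETCH_B8G8R8A8_UNORM)
      return SKETCH_B8G8R8A8_UNORM;
   if (view == SKETCH_B8G8R8A8_UNORM && resource == SKETCH_B8G8R8X8_UNORM)
      return SKETCH_B8G8R8X8_UNORM;
   return view;
}
```

Both branches force the view to match the resource, which is what the
"and vice versa" in the updated comment refers to.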



[Mesa-dev] [PATCH 13/16] svga: enable ARB_copy_image extension in the driver

2016-06-28 Thread Brian Paul
From: Neha Bhende 

Reviewed-by: Brian Paul 
---
 src/gallium/drivers/svga/svga_screen.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_screen.c b/src/gallium/drivers/svga/svga_screen.c
index 4c2774d..359a159 100644
--- a/src/gallium/drivers/svga/svga_screen.c
+++ b/src/gallium/drivers/svga/svga_screen.c
@@ -388,6 +388,8 @@ svga_get_param(struct pipe_screen *screen, enum pipe_cap param)
case PIPE_CAP_VIDEO_MEMORY:
   /* XXX: Query the host ? */
   return 1;
+   case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
+  return sws->have_vgpu10;
case PIPE_CAP_UMA:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
@@ -398,7 +400,6 @@ svga_get_param(struct pipe_screen *screen, enum pipe_cap param)
case PIPE_CAP_TGSI_TXQS:
case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
case PIPE_CAP_SHAREABLE_SHADERS:
-   case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
case PIPE_CAP_CLEAR_TEXTURE:
case PIPE_CAP_DRAW_PARAMETERS:
case PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL:
-- 
1.9.1



[Mesa-dev] [PATCH 12/16] svga: try blitting with copy region in more cases

2016-06-28 Thread Brian Paul
We previously could do blits with util_resource_copy_region() when doing
'loose' format checking.  Also do blits with util_resource_copy_region()
when the blit src/dst formats (not the underlying resources) exactly
match.  Needed for GL_ARB_copy_image.
---
 src/gallium/drivers/svga/svga_pipe_blit.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_blit.c b/src/gallium/drivers/svga/svga_pipe_blit.c
index ad54dc5..4a01c8e 100644
--- a/src/gallium/drivers/svga/svga_pipe_blit.c
+++ b/src/gallium/drivers/svga/svga_pipe_blit.c
@@ -294,7 +294,13 @@ svga_blit(struct pipe_context *pipe,
   return;
}
 
-   if (util_try_blit_via_copy_region(pipe, blit_info)) {
+   if (util_can_blit_via_copy_region(blit_info, TRUE) ||
+   util_can_blit_via_copy_region(blit_info, FALSE)) {
+  util_resource_copy_region(pipe, blit_info->dst.resource,
+blit_info->dst.level,
+blit_info->dst.box.x, blit_info->dst.box.y,
+blit_info->dst.box.z, blit_info->src.resource,
+blit_info->src.level, &blit_info->src.box);
   return; /* done */
}
 
-- 
1.9.1



[Mesa-dev] [PATCH 11/16] svga: use copy_region_vgpu10() for region copies when possible

2016-06-28 Thread Brian Paul
---
 src/gallium/drivers/svga/svga_pipe_blit.c | 42 ---
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_blit.c b/src/gallium/drivers/svga/svga_pipe_blit.c
index 564af51..ad54dc5 100644
--- a/src/gallium/drivers/svga/svga_pipe_blit.c
+++ b/src/gallium/drivers/svga/svga_pipe_blit.c
@@ -138,7 +138,6 @@ svga_resource_copy_region(struct pipe_context *pipe,
   src_z = src_box->z;
}
 
-   /* different src/dst type???*/
if (dst_tex->target == PIPE_TEXTURE_CUBE ||
dst_tex->target == PIPE_TEXTURE_1D_ARRAY) {
   dst_face_layer = dstz;
@@ -150,14 +149,49 @@ svga_resource_copy_region(struct pipe_context *pipe,
   dst_z = dstz;
}
 
-   svga_texture_copy_handle(svga,
-stex->handle,
+   stex = svga_texture(src_tex);
+   dtex = svga_texture(dst_tex);
+
+   if (svga_have_vgpu10(svga)) {
+  /* vgpu10 */
+  if (util_format_is_compressed(src_tex->format) ==
+  util_format_is_compressed(dst_tex->format) &&
+  !util_format_is_depth_and_stencil(src_tex->format) &&
+  stex->handle != dtex->handle &&
+  src_tex->target == dst_tex->target) {
+ copy_region_vgpu10(svga,
+src_tex,
 src_box->x, src_box->y, src_z,
 src_level, src_face_layer,
-dtex->handle,
+dst_tex,
 dstx, dsty, dst_z,
 dst_level, dst_face_layer,
 src_box->width, src_box->height, src_box->depth);
+  }
+  else {
+ util_resource_copy_region(pipe, dst_tex, dst_level, dstx, dsty, dstz,
+   src_tex, src_level, src_box);
+  }
+   }
+   else {
+  /* vgpu9 */
+  if (src_tex->format == dst_tex->format) {
+ svga_texture_copy_handle(svga,
+  stex->handle,
+  src_box->x, src_box->y, src_z,
+  src_level, src_face_layer,
+  dtex->handle,
+  dstx, dsty, dst_z,
+   dst_level, dst_face_layer,
+  src_box->width, src_box->height,
+  src_box->depth);
+ svga_define_texture_level(dtex, dst_face_layer, dst_level);
+  }
+  else {
+ util_resource_copy_region(pipe, dst_tex, dst_level, dstx, dsty, dstz,
+   src_tex, src_level, src_box);
+  }
+   }
 
/* Mark the destination image as being defined */
svga_define_texture_level(dtex, dst_face_layer, dst_level);
-- 
1.9.1



[Mesa-dev] [PATCH 15/16] svga: add SVGA3D_vgpu10_BufferCopy()

2016-06-28 Thread Brian Paul
---
 src/gallium/drivers/svga/svga_cmd.h|  6 ++
 src/gallium/drivers/svga/svga_cmd_vgpu10.c | 24 
 2 files changed, 30 insertions(+)

diff --git a/src/gallium/drivers/svga/svga_cmd.h b/src/gallium/drivers/svga/svga_cmd.h
index 26e4690..06e1b4a 100644
--- a/src/gallium/drivers/svga/svga_cmd.h
+++ b/src/gallium/drivers/svga/svga_cmd.h
@@ -642,4 +642,10 @@ enum pipe_error
 SVGA3D_vgpu10_GenMips(struct svga_winsys_context *swc,
   const SVGA3dShaderResourceViewId shaderResourceViewId,
   struct svga_winsys_surface *view);
+
+enum pipe_error
+SVGA3D_vgpu10_BufferCopy(struct svga_winsys_context *swc,
+ struct svga_winsys_surface *src,
+ struct svga_winsys_surface *dst,
+ unsigned srcx, unsigned dstx, unsigned width);
 #endif /* __SVGA3D_H__ */
diff --git a/src/gallium/drivers/svga/svga_cmd_vgpu10.c 
b/src/gallium/drivers/svga/svga_cmd_vgpu10.c
index 2729655..1f13193 100644
--- a/src/gallium/drivers/svga/svga_cmd_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_cmd_vgpu10.c
@@ -1314,3 +1314,27 @@ SVGA3D_vgpu10_GenMips(struct svga_winsys_context *swc,
swc->commit(swc);
return PIPE_OK;
 }
+
+
+enum pipe_error
+SVGA3D_vgpu10_BufferCopy(struct svga_winsys_context *swc,
+  struct svga_winsys_surface *src,
+  struct svga_winsys_surface *dst,
+  unsigned srcx, unsigned dstx, unsigned width)
+{
+   SVGA3dCmdDXBufferCopy *cmd;
+
+   cmd = SVGA3D_FIFOReserve(swc, SVGA_3D_CMD_DX_BUFFER_COPY, sizeof *cmd, 2);
+
+   if (!cmd)
+  return PIPE_ERROR_OUT_OF_MEMORY;
+
+   swc->surface_relocation(swc, &cmd->dest, NULL, dst, SVGA_RELOC_WRITE);
+   swc->surface_relocation(swc, &cmd->src, NULL, src, SVGA_RELOC_READ);
+   cmd->destX = dstx;
+   cmd->srcX = srcx;
+   cmd->width = width;
+
+   swc->commit(swc);
+   return PIPE_OK;
+}
-- 
1.9.1


[Mesa-dev] [PATCH 08/16] svga: add new svga_format_is_uncompressed_snorm() helper

2016-06-28 Thread Brian Paul
---
 src/gallium/drivers/svga/svga_format.c | 20 
 src/gallium/drivers/svga/svga_format.h |  4 
 2 files changed, 24 insertions(+)

diff --git a/src/gallium/drivers/svga/svga_format.c 
b/src/gallium/drivers/svga/svga_format.c
index 17c3bf9..1b3cebe 100644
--- a/src/gallium/drivers/svga/svga_format.c
+++ b/src/gallium/drivers/svga/svga_format.c
@@ -2193,3 +2193,23 @@ svga_sampler_format(SVGA3dSurfaceFormat format)
   return format;
}
 }
+
+
+/**
+ * Is the given format an uncompressed snorm format?
+ */
+bool
+svga_format_is_uncompressed_snorm(SVGA3dSurfaceFormat format)
+{
+   switch (format) {
+   case SVGA3D_R8G8B8A8_SNORM:
+   case SVGA3D_R8G8_SNORM:
+   case SVGA3D_R8_SNORM:
+   case SVGA3D_R16G16B16A16_SNORM:
+   case SVGA3D_R16G16_SNORM:
+   case SVGA3D_R16_SNORM:
+  return true;
+   default:
+  return false;
+   }
+}
diff --git a/src/gallium/drivers/svga/svga_format.h 
b/src/gallium/drivers/svga/svga_format.h
index 630a86a..e6258179 100644
--- a/src/gallium/drivers/svga/svga_format.h
+++ b/src/gallium/drivers/svga/svga_format.h
@@ -104,4 +104,8 @@ SVGA3dSurfaceFormat
 svga_sampler_format(SVGA3dSurfaceFormat format);
 
 
+bool
+svga_format_is_uncompressed_snorm(SVGA3dSurfaceFormat format);
+
+
 #endif /* SVGA_FORMAT_H_ */
-- 
1.9.1


[Mesa-dev] [PATCH 06/16] svga: adjust render target view format for RGBX

2016-06-28 Thread Brian Paul
For GL_ARB_copy_image we may be asked to create an RGBA view of
an RGBX surface.  Use an RGBX view format for that case.
---
 src/gallium/drivers/svga/svga_surface.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_surface.c 
b/src/gallium/drivers/svga/svga_surface.c
index a0108d2..e5943cf 100644
--- a/src/gallium/drivers/svga/svga_surface.c
+++ b/src/gallium/drivers/svga/svga_surface.c
@@ -452,10 +452,22 @@ svga_validate_surface_view(struct svga_context *svga, 
struct svga_surface *s)
 );
   }
   else {
+ SVGA3dSurfaceFormat view_format = s->key.format;
+ const struct svga_texture *stex = svga_texture(s->base.texture);
+
+ /* Can't create RGBA render target view of a RGBX surface so adjust
+  * the view format.  We do something similar for texture samplers in
+  * svga_validate_pipe_sampler_view().
+  */
+ if (view_format == SVGA3D_B8G8R8A8_UNORM &&
+ stex->key.format == SVGA3D_B8G8R8X8_TYPELESS) {
+view_format = SVGA3D_B8G8R8X8_UNORM;
+ }
+
  ret = SVGA3D_vgpu10_DefineRenderTargetView(svga->swc,
 s->view_id,
 s->handle,
-s->key.format,
+view_format,
 resType,
 );
   }
-- 
1.9.1
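
The format adjustment in this patch can be shown in isolation. The sketch below is only an illustration, not driver code: `enum fmt` and `pick_view_format()` are hypothetical stand-ins for the SVGA3D surface formats and for the check added to svga_validate_surface_view().

```c
/* Hypothetical stand-ins for the SVGA3D surface formats named in the patch. */
enum fmt {
   FMT_B8G8R8A8_UNORM,
   FMT_B8G8R8X8_UNORM,
   FMT_B8G8R8X8_TYPELESS
};

/*
 * Choose the render target view format.  An RGBA view of an RGBX
 * (typeless) surface is replaced by an RGBX view; any other request
 * is returned unchanged.
 */
static enum fmt
pick_view_format(enum fmt requested, enum fmt surface)
{
   if (requested == FMT_B8G8R8A8_UNORM && surface == FMT_B8G8R8X8_TYPELESS)
      return FMT_B8G8R8X8_UNORM;

   return requested;
}
```

Only the one RGBA-on-RGBX combination is rewritten, which matches the narrow condition in the diff above.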


[Mesa-dev] [PATCH 14/16] svga: flush buffers when mapping for reading

2016-06-28 Thread Brian Paul
With host-side buffer copies (via SVGA3D_vgpu10_BufferCopy()) we have
to make sure any pending map-write operations are completed before reading.
Otherwise the ReadbackSubResource operation could get stale data from
the host buffer.

This allows the piglit arb_copy_buffer-subdata-sync test to pass when
we start using the SVGA3D_vgpu10_BufferCopy command.
---
 src/gallium/drivers/svga/svga_resource_buffer.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_resource_buffer.c 
b/src/gallium/drivers/svga/svga_resource_buffer.c
index 9ecb975..3c6ee20 100644
--- a/src/gallium/drivers/svga/svga_resource_buffer.c
+++ b/src/gallium/drivers/svga/svga_resource_buffer.c
@@ -95,13 +95,25 @@ svga_buffer_transfer_map(struct pipe_context *pipe,
transfer->usage = usage;
transfer->box = *box;
 
-   if ((usage & PIPE_TRANSFER_READ) && sbuf->dirty) {
-  /* Only need to test for vgpu10 since only vgpu10 features (streamout,
-   * buffer copy) can modify buffers on the device.
-   */
-  if (svga_have_vgpu10(svga)) {
+   if (usage & PIPE_TRANSFER_READ) {
+  if (!sbuf->user) {
+ (void) svga_buffer_handle(svga, resource);
+  }
+
+  if (sbuf->dma.pending > 0) {
+ svga_buffer_upload_flush(svga, sbuf);
+ svga_context_finish(svga);
+  }
+
+  if (sbuf->dirty) {
  enum pipe_error ret;
+
+ /* Host-side buffers can only be dirtied with vgpu10 features
+  * (streamout and buffer copy).
+  */
+ assert(svga_have_vgpu10(svga));
  assert(sbuf->handle);
+
  ret = SVGA3D_vgpu10_ReadbackSubResource(svga->swc, sbuf->handle, 0);
  if (ret != PIPE_OK) {
 svga_context_flush(svga, NULL);
-- 
1.9.1
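
The order of operations on a read mapping can be modelled as below. This is a rough sketch under stated assumptions: `struct mock_buffer`, the `STEP_*` flags and `steps_for_read_map()` are hypothetical names used for illustration, and the real code issues driver calls instead of returning a bitmask.

```c
#include <stdbool.h>

/* Hypothetical model of the buffer state consulted by the patch. */
struct mock_buffer {
   bool user;         /* user-memory buffer */
   int  dma_pending;  /* number of queued map-write uploads */
   bool dirty;        /* host copy modified by streamout or buffer copy */
};

enum read_step {
   STEP_ENSURE_HANDLE = 1 << 0,  /* make sure a host surface exists */
   STEP_FLUSH_UPLOADS = 1 << 1,  /* complete pending map-write uploads */
   STEP_READBACK      = 1 << 2   /* read the host copy back */
};

/* Return the steps a read mapping needs, as a bitmask of read_step. */
static unsigned
steps_for_read_map(const struct mock_buffer *buf)
{
   unsigned steps = 0;

   if (!buf->user)
      steps |= STEP_ENSURE_HANDLE;
   if (buf->dma_pending > 0)
      steps |= STEP_FLUSH_UPLOADS;
   if (buf->dirty)
      steps |= STEP_READBACK;

   return steps;
}
```

The point of the ordering is that pending uploads are flushed before any readback, so the readback cannot return stale host data.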


[Mesa-dev] [PATCH 04/16] svga: use untyped surface formats in most cases

2016-06-28 Thread Brian Paul
This allows us to do copies between different, but compatible, surface
formats such as RGBA8_UNORM, RGBA8_SINT, RGBA8_UINT, etc. for
GL_ARB_copy_image.
---
 src/gallium/drivers/svga/svga_resource_texture.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_resource_texture.c 
b/src/gallium/drivers/svga/svga_resource_texture.c
index 0e21f5e..14fe220 100644
--- a/src/gallium/drivers/svga/svga_resource_texture.c
+++ b/src/gallium/drivers/svga/svga_resource_texture.c
@@ -949,10 +949,13 @@ svga_texture_create(struct pipe_screen *screen,
 * formats can be reinterpreted as other formats.  For example,
 * SVGA3D_R8G8B8A8_UNORM_TYPELESS can be interpreted as
 * SVGA3D_R8G8B8A8_UNORM_SRGB or SVGA3D_R8G8B8A8_UNORM.
+* Do not use typeless formats for SHARED, DISPLAY_TARGET or SCANOUT
+* buffers.
 */
-   if (svgascreen->sws->have_vgpu10 &&
-   (util_format_is_srgb(template->format) ||
-format_has_depth(template->format))) {
+   if (svgascreen->sws->have_vgpu10
+   && ((bindings & (PIPE_BIND_SHARED |
+PIPE_BIND_DISPLAY_TARGET |
+PIPE_BIND_SCANOUT)) == 0)) {
   SVGA3dSurfaceFormat typeless = svga_typeless_format(tex->key.format);
   if (0) {
  debug_printf("Convert resource type %s -> %s (bind 0x%x)\n",
-- 
1.9.1
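
The new condition for choosing a typeless format reduces to a small predicate. The sketch below is an illustration only: the `BIND_*` values and `use_typeless_format()` are hypothetical stand-ins for the PIPE_BIND_* flags and the test in svga_texture_create().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PIPE_BIND_* flags named in the patch. */
#define BIND_SHARED         (1u << 0)
#define BIND_DISPLAY_TARGET (1u << 1)
#define BIND_SCANOUT        (1u << 2)
#define BIND_SAMPLER_VIEW   (1u << 3)

/*
 * Use a typeless surface format only on vgpu10 and only for resources
 * that are not shared, displayed or scanned out.
 */
static bool
use_typeless_format(bool have_vgpu10, unsigned bindings)
{
   const unsigned external = BIND_SHARED | BIND_DISPLAY_TARGET | BIND_SCANOUT;

   return have_vgpu10 && (bindings & external) == 0;
}
```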


[Mesa-dev] [PATCH 02/16] util: simplify a few things in util_can_blit_via_copy_region()

2016-06-28 Thread Brian Paul
Since only the src box can have negative dims for flipping, just
comparing the src/dst box sizes is enough to detect flips.
---
 src/gallium/auxiliary/util/u_surface.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_surface.c 
b/src/gallium/auxiliary/util/u_surface.c
index 8d22bcf..e2229bc 100644
--- a/src/gallium/auxiliary/util/u_surface.c
+++ b/src/gallium/auxiliary/util/u_surface.c
@@ -701,21 +701,20 @@ util_can_blit_via_copy_region(const struct pipe_blit_info 
*blit)
   return FALSE;
}
 
-   /* No masks, no filtering, no scissor. */
+   /* No masks, no filtering, no scissor, no blending */
if ((blit->mask & mask) != mask ||
blit->filter != PIPE_TEX_FILTER_NEAREST ||
-   blit->scissor_enable) {
+   blit->scissor_enable ||
+   blit->alpha_blend) {
   return FALSE;
}
 
-   /* No flipping. */
-   if (blit->src.box.width < 0 ||
-   blit->src.box.height < 0 ||
-   blit->src.box.depth < 0) {
-  return FALSE;
-   }
+   /* Only the src box can have negative dims for flipping */
+   assert(blit->dst.box.width >= 1);
+   assert(blit->dst.box.height >= 1);
+   assert(blit->dst.box.depth >= 1);
 
-   /* No scaling. */
+   /* No scaling or flipping */
if (blit->src.box.width != blit->dst.box.width ||
blit->src.box.height != blit->dst.box.height ||
blit->src.box.depth != blit->dst.box.depth) {
@@ -736,9 +735,6 @@ util_can_blit_via_copy_region(const struct pipe_blit_info 
*blit)
   return FALSE;
}
 
-   if (blit->alpha_blend)
-  return FALSE;
-
return TRUE;
 }
 
-- 
1.9.1
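
The reasoning in the commit message can be checked with a small stand-alone sketch. It assumes a hypothetical `struct box3` in place of struct pipe_box, and `same_size_no_flip()` is an illustrative name, not a Mesa function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the size fields of struct pipe_box. */
struct box3 {
   int width, height, depth;
};

/*
 * Return true if the copy is neither scaled nor flipped.  Only the
 * source box may carry negative (flipped) dimensions, so once the
 * destination dimensions are known to be >= 1, plain equality also
 * rejects every negative source dimension.
 */
static bool
same_size_no_flip(const struct box3 *src, const struct box3 *dst)
{
   assert(dst->width >= 1);
   assert(dst->height >= 1);
   assert(dst->depth >= 1);

   return src->width == dst->width &&
          src->height == dst->height &&
          src->depth == dst->depth;
}
```

A source width of -4 against a destination width of 4 fails the equality test, so no separate sign check is needed.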


[Mesa-dev] [PATCH 03/16] gallium/util: add tight_format_check param to util_can_blit_via_copy_region()

2016-06-28 Thread Brian Paul
The VMware driver will use this for implementing GL_ARB_copy_image.
---
 src/gallium/auxiliary/util/u_surface.c | 38 +-
 src/gallium/auxiliary/util/u_surface.h |  3 ++-
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_surface.c 
b/src/gallium/auxiliary/util/u_surface.c
index e2229bc..e0234f8 100644
--- a/src/gallium/auxiliary/util/u_surface.c
+++ b/src/gallium/auxiliary/util/u_surface.c
@@ -687,19 +687,37 @@ get_sample_count(const struct pipe_resource *res)
 }
 
 
+/**
+ * Check if a blit() command can be implemented with a resource_copy_region().
+ * If tight_format_check is true, only allow the resource_copy_region() if
+ * the blit src/dst formats are identical, ignoring the resource formats.
+ * Otherwise, check for format casting and compatibility.
+ */
 boolean
-util_can_blit_via_copy_region(const struct pipe_blit_info *blit)
+util_can_blit_via_copy_region(const struct pipe_blit_info *blit,
+  boolean tight_format_check)
 {
-   unsigned mask = util_format_get_mask(blit->dst.format);
+   const struct util_format_description *src_desc, *dst_desc;
 
-   /* No format conversions. */
-   if (blit->src.resource->format != blit->src.format ||
-   blit->dst.resource->format != blit->dst.format ||
-   !util_is_format_compatible(
-  util_format_description(blit->src.resource->format),
-  util_format_description(blit->dst.resource->format))) {
-  return FALSE;
+   src_desc = util_format_description(blit->src.resource->format);
+   dst_desc = util_format_description(blit->dst.resource->format);
+
+   if (tight_format_check) {
+  /* no format conversions allowed */
+  if (blit->src.format != blit->dst.format) {
+ return FALSE;
+  }
}
+   else {
+  /* do loose format compatibility checking */
+  if (blit->src.resource->format != blit->src.format ||
+  blit->dst.resource->format != blit->dst.format ||
+  !util_is_format_compatible(src_desc, dst_desc)) {
+ return FALSE;
+  }
+   }
+
+   unsigned mask = util_format_get_mask(blit->dst.format);
 
/* No masks, no filtering, no scissor, no blending */
if ((blit->mask & mask) != mask ||
@@ -752,7 +770,7 @@ boolean
 util_try_blit_via_copy_region(struct pipe_context *ctx,
   const struct pipe_blit_info *blit)
 {
-   if (util_can_blit_via_copy_region(blit)) {
+   if (util_can_blit_via_copy_region(blit, FALSE)) {
   ctx->resource_copy_region(ctx, blit->dst.resource, blit->dst.level,
 blit->dst.box.x, blit->dst.box.y,
 blit->dst.box.z,
diff --git a/src/gallium/auxiliary/util/u_surface.h 
b/src/gallium/auxiliary/util/u_surface.h
index bda2e1e..64a685b 100644
--- a/src/gallium/auxiliary/util/u_surface.h
+++ b/src/gallium/auxiliary/util/u_surface.h
@@ -99,7 +99,8 @@ util_clear_depth_stencil(struct pipe_context *pipe,
  unsigned width, unsigned height);
 
 boolean
-util_can_blit_via_copy_region(const struct pipe_blit_info *blit);
+util_can_blit_via_copy_region(const struct pipe_blit_info *blit,
+  boolean tight_format_check);
 
 extern boolean
 util_try_blit_via_copy_region(struct pipe_context *ctx,
-- 
1.9.1
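
The difference between the tight and the loose check can be sketched on its own. Everything here is an assumption made for illustration: formats are plain integers, `struct side` stands in for one half of a pipe_blit_info, and `formats_compatible()` is a deliberately simple placeholder for util_is_format_compatible().

```c
#include <stdbool.h>

/* One side of a blit: the resource's own format and the blit's view of it. */
struct side {
   int resource_format;
   int view_format;
};

/* Placeholder compatibility rule: only identical formats are compatible. */
static bool
formats_compatible(int a, int b)
{
   return a == b;
}

/*
 * Tight check: compare only the blit (view) formats and ignore the
 * resource formats.  Loose check: forbid format casting on either side
 * and require compatible resource formats.
 */
static bool
format_check(const struct side *src, const struct side *dst, bool tight)
{
   if (tight)
      return src->view_format == dst->view_format;

   return src->resource_format == src->view_format &&
          dst->resource_format == dst->view_format &&
          formats_compatible(src->resource_format, dst->resource_format);
}
```

With two resources of different formats that are both viewed through the same format, the tight check passes and the loose check fails, which is the case the copy_image path relies on.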


[Mesa-dev] [PATCH 01/16] util: new util_can_blit_via_copy_region() function

2016-06-28 Thread Brian Paul
Pulled out of the util_try_blit_via_copy_region() function.  Subsequent
changes build on this.
---
 src/gallium/auxiliary/util/u_surface.c | 44 ++
 src/gallium/auxiliary/util/u_surface.h |  3 +++
 2 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_surface.c 
b/src/gallium/auxiliary/util/u_surface.c
index 8408aa8..8d22bcf 100644
--- a/src/gallium/auxiliary/util/u_surface.c
+++ b/src/gallium/auxiliary/util/u_surface.c
@@ -686,18 +686,9 @@ get_sample_count(const struct pipe_resource *res)
return res->nr_samples ? res->nr_samples : 1;
 }
 
-/**
- * Try to do a blit using resource_copy_region. The function calls
- * resource_copy_region if the blit description is compatible with it.
- *
- * It returns TRUE if the blit was done using resource_copy_region.
- *
- * It returns FALSE otherwise and the caller must fall back to a more generic
- * codepath for the blit operation. (e.g. by using u_blitter)
- */
+
 boolean
-util_try_blit_via_copy_region(struct pipe_context *ctx,
-  const struct pipe_blit_info *blit)
+util_can_blit_via_copy_region(const struct pipe_blit_info *blit)
 {
unsigned mask = util_format_get_mask(blit->dst.format);
 
@@ -748,9 +739,32 @@ util_try_blit_via_copy_region(struct pipe_context *ctx,
if (blit->alpha_blend)
   return FALSE;
 
-   ctx->resource_copy_region(ctx, blit->dst.resource, blit->dst.level,
- blit->dst.box.x, blit->dst.box.y, blit->dst.box.z,
- blit->src.resource, blit->src.level,
- &blit->src.box);
return TRUE;
 }
+
+
+/**
+ * Try to do a blit using resource_copy_region. The function calls
+ * resource_copy_region if the blit description is compatible with it.
+ *
+ * It returns TRUE if the blit was done using resource_copy_region.
+ *
+ * It returns FALSE otherwise and the caller must fall back to a more generic
+ * codepath for the blit operation. (e.g. by using u_blitter)
+ */
+boolean
+util_try_blit_via_copy_region(struct pipe_context *ctx,
+  const struct pipe_blit_info *blit)
+{
+   if (util_can_blit_via_copy_region(blit)) {
+  ctx->resource_copy_region(ctx, blit->dst.resource, blit->dst.level,
+blit->dst.box.x, blit->dst.box.y,
+blit->dst.box.z,
+blit->src.resource, blit->src.level,
+&blit->src.box);
+  return TRUE;
+   }
+   else {
+  return FALSE;
+   }
+}
diff --git a/src/gallium/auxiliary/util/u_surface.h 
b/src/gallium/auxiliary/util/u_surface.h
index bfd8f40..bda2e1e 100644
--- a/src/gallium/auxiliary/util/u_surface.h
+++ b/src/gallium/auxiliary/util/u_surface.h
@@ -98,6 +98,9 @@ util_clear_depth_stencil(struct pipe_context *pipe,
  unsigned dstx, unsigned dsty,
  unsigned width, unsigned height);
 
+boolean
+util_can_blit_via_copy_region(const struct pipe_blit_info *blit);
+
 extern boolean
 util_try_blit_via_copy_region(struct pipe_context *ctx,
   const struct pipe_blit_info *blit);
-- 
1.9.1


[Mesa-dev] [PATCH 05/16] svga: don't advertise support for R32G32B32_UINT/SINT surface formats

2016-06-28 Thread Brian Paul
From: Neha Bhende 

We want to be able to copy between different 32-bit, 3-channel surface
formats for GL_ARB_copy_image but since we don't have a 3-channel float
format, we can't support 32-bit, 3-channel integer formats.

The state tracker will choose 4-channel formats instead.

Fixes the piglit arb_copy_image-format test for several cases.

Note: This change may need to be revisited if/when the texture_view extension
is enabled in the driver.

Reviewed-by: Brian Paul 
---
 src/gallium/drivers/svga/svga_format.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_format.c 
b/src/gallium/drivers/svga/svga_format.c
index 4662bef..17c3bf9 100644
--- a/src/gallium/drivers/svga/svga_format.c
+++ b/src/gallium/drivers/svga/svga_format.c
@@ -242,11 +242,11 @@ static const struct vgpu10_format_entry 
format_conversion_table[] =
{ PIPE_FORMAT_R16G16B16A16_SINT, SVGA3D_R16G16B16A16_SINT,   
SVGA3D_R16G16B16A16_SINT,0 },
{ PIPE_FORMAT_R32_UINT,  SVGA3D_R32_UINT,
SVGA3D_R32_UINT, 0 },
{ PIPE_FORMAT_R32G32_UINT,   SVGA3D_R32G32_UINT, 
SVGA3D_R32G32_UINT,  0 },
-   { PIPE_FORMAT_R32G32B32_UINT,SVGA3D_R32G32B32_UINT,  
SVGA3D_R32G32B32_UINT,   0 },
+   { PIPE_FORMAT_R32G32B32_UINT,SVGA3D_R32G32B32_UINT,  
SVGA3D_FORMAT_INVALID,   0 },
{ PIPE_FORMAT_R32G32B32A32_UINT, SVGA3D_R32G32B32A32_UINT,   
SVGA3D_R32G32B32A32_UINT,0 },
{ PIPE_FORMAT_R32_SINT,  SVGA3D_R32_SINT,
SVGA3D_R32_SINT, 0 },
{ PIPE_FORMAT_R32G32_SINT,   SVGA3D_R32G32_SINT, 
SVGA3D_R32G32_SINT,  0 },
-   { PIPE_FORMAT_R32G32B32_SINT,SVGA3D_R32G32B32_SINT,  
SVGA3D_R32G32B32_SINT,   0 },
+   { PIPE_FORMAT_R32G32B32_SINT,SVGA3D_R32G32B32_SINT,  
SVGA3D_FORMAT_INVALID,   0 },
{ PIPE_FORMAT_R32G32B32A32_SINT, SVGA3D_R32G32B32A32_SINT,   
SVGA3D_R32G32B32A32_SINT,0 },
{ PIPE_FORMAT_A8_UINT,   SVGA3D_FORMAT_INVALID,  
SVGA3D_FORMAT_INVALID,   0 },
{ PIPE_FORMAT_I8_UINT,   SVGA3D_FORMAT_INVALID,  
SVGA3D_FORMAT_INVALID,   0 },
-- 
1.9.1


Re: [Mesa-dev] [PATCH 1/4] radeonsi: use conformant line rasterization

2016-06-28 Thread Edward O'Callaghan
This series is,

Reviewed-by: Edward O'Callaghan 

On 06/29/2016 03:53 AM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> AA lines are not completely correct (see TODO), but everything else
> should be.
> 
> + 3 linestipple piglits
> ---
>  src/gallium/drivers/radeon/cayman_msaa.c | 12 ++--
>  src/gallium/drivers/radeon/r600d_common.h|  6 ++
>  src/gallium/drivers/radeonsi/si_state.c  | 10 +-
>  src/gallium/drivers/radeonsi/si_state_draw.c |  6 --
>  4 files changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/cayman_msaa.c 
> b/src/gallium/drivers/radeon/cayman_msaa.c
> index a9ec4c3..89c4937 100644
> --- a/src/gallium/drivers/radeon/cayman_msaa.c
> +++ b/src/gallium/drivers/radeon/cayman_msaa.c
> @@ -200,6 +200,14 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs 
> *cs, int nr_samples,
>  {
>   int setup_samples = nr_samples > 1 ? nr_samples :
>   overrast_samples > 1 ? overrast_samples : 0;
> + /* Required by OpenGL line rasterization.
> +  *
> +  * TODO: We should also enable perpendicular endcaps for AA lines,
> +  *   but that requires implementing line stippling in the pixel
> +  *   shader. SC can only do line stippling with axis-aligned
> +  *   endcaps.
> +  */
> + unsigned sc_line_cntl = S_028BDC_DX10_DIAMOND_TEST_ENA(1);
>  
>   if (setup_samples > 1) {
>   /* indexed by log2(nr_samples) */
> @@ -215,7 +223,7 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs, 
> int nr_samples,
>   util_logbase2(util_next_power_of_two(ps_iter_samples));
>  
>   radeon_set_context_reg_seq(cs, CM_R_028BDC_PA_SC_LINE_CNTL, 2);
> - radeon_emit(cs, S_028BDC_LAST_PIXEL(1) |
> + radeon_emit(cs, sc_line_cntl |
>   S_028BDC_EXPAND_LINE_WIDTH(1)); /* 
> CM_R_028BDC_PA_SC_LINE_CNTL */
>   radeon_emit(cs, S_028BE0_MSAA_NUM_SAMPLES(log_samples) |
>   S_028BE0_MAX_SAMPLE_DIST(max_dist[log_samples]) |
> @@ -242,7 +250,7 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs, 
> int nr_samples,
>   }
>   } else {
>   radeon_set_context_reg_seq(cs, CM_R_028BDC_PA_SC_LINE_CNTL, 2);
> - radeon_emit(cs, S_028BDC_LAST_PIXEL(1)); /* 
> CM_R_028BDC_PA_SC_LINE_CNTL */
> + radeon_emit(cs, sc_line_cntl); /* CM_R_028BDC_PA_SC_LINE_CNTL */
>   radeon_emit(cs, 0); /* CM_R_028BE0_PA_SC_AA_CONFIG */
>  
>   radeon_set_context_reg(cs, CM_R_028804_DB_EQAA,
> diff --git a/src/gallium/drivers/radeon/r600d_common.h 
> b/src/gallium/drivers/radeon/r600d_common.h
> index e50de96..6f534b3 100644
> --- a/src/gallium/drivers/radeon/r600d_common.h
> +++ b/src/gallium/drivers/radeon/r600d_common.h
> @@ -203,6 +203,12 @@
>  #define   S_028BDC_LAST_PIXEL(x)   (((unsigned)(x) & 
> 0x1) << 10)
>  #define   G_028BDC_LAST_PIXEL(x)   (((x) >> 10) & 0x1)
> #define   C_028BDC_LAST_PIXEL  0xFFFFFBFF
> +#define   S_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((unsigned)(x) & 
> 0x1) << 11)
> +#define   G_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((x) >> 11) & 0x1)
> #define   C_028BDC_PERPENDICULAR_ENDCAP_ENA 0xFFFFF7FF
> +#define   S_028BDC_DX10_DIAMOND_TEST_ENA(x)(((unsigned)(x) & 
> 0x1) << 12)
> +#define   G_028BDC_DX10_DIAMOND_TEST_ENA(x)(((x) >> 12) & 0x1)
> #define   C_028BDC_DX10_DIAMOND_TEST_ENA   0xFFFFEFFF
>  #define CM_R_028BE0_PA_SC_AA_CONFIG  0x28be0
>  #define   S_028BE0_MSAA_NUM_SAMPLES(x)  (((unsigned)(x) & 
> 0x7) << 0)
>  #define   S_028BE0_AA_MASK_CENTROID_DTMN(x)  (((unsigned)(x) & 0x1) 
> << 4)
> diff --git a/src/gallium/drivers/radeonsi/si_state.c 
> b/src/gallium/drivers/radeonsi/si_state.c
> index 0a2fdbf..b21fa5c 100644
> --- a/src/gallium/drivers/radeonsi/si_state.c
> +++ b/src/gallium/drivers/radeonsi/si_state.c
> @@ -3805,7 +3805,15 @@ static void si_init_config(struct si_context *sctx)
>  S_028034_BR_X(16384) | S_028034_BR_Y(16384));
>  
>   si_pm4_set_reg(pm4, R_02820C_PA_SC_CLIPRECT_RULE, 0x);
> - si_pm4_set_reg(pm4, R_028230_PA_SC_EDGERULE, 0x);
> + si_pm4_set_reg(pm4, R_028230_PA_SC_EDGERULE,
> +S_028230_ER_TRI(0xA) |
> +S_028230_ER_POINT(0xA) |
> +S_028230_ER_RECT(0xA) |
> +/* Required by DX10_DIAMOND_TEST_ENA: */
> +S_028230_ER_LINE_LR(0x1A) |
> +S_028230_ER_LINE_RL(0x26) |
> +S_028230_ER_LINE_TB(0xA) |
> +S_028230_ER_LINE_BT(0xA));
>   /* PA_SU_HARDWARE_SCREEN_OFFSET must be 0 due to hw bug on SI */
>   si_pm4_set_reg(pm4, 

Re: [Mesa-dev] [PATCH] gallium: Force blend color to 16-byte alignment

2016-06-28 Thread Roland Scheidegger
Am 28.06.2016 um 22:45 schrieb Chuck Atkins:
> This aligns the 4-element color float array to 16 byte boundaries.  This
> should allow compiler vectorizers to generate better optimizations.
> Also fixes broken vectorization generated by Intel compiler.
> 
> Reported-by: Tim Rowley 
> Signed-off-by: Chuck Atkins 
> ---
>  src/gallium/include/pipe/p_state.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/include/pipe/p_state.h 
> b/src/gallium/include/pipe/p_state.h
> index 1543e90..95f140f 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -326,7 +326,7 @@ struct pipe_blend_state
>  
>  struct pipe_blend_color
>  {
> -   float color[4];
> +  PIPE_ALIGN_VAR(16) float color[4];
>  };
>  

I'm wondering if that's really needed. I have a hard time imagining
that setting the blend color is performance critical. And inside a driver
you can obviously still align pipe_blend_color structs yourself.
But OTOH, why not...

Acked-by: Roland Scheidegger 
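
For reference, the effect of the alignment request can be reproduced outside of Mesa. The sketch below uses C11 `alignas` as a stand-in for PIPE_ALIGN_VAR(16), which is an assumption about equivalent behaviour rather than the macro's actual definition. `struct blend_color_aligned` and `is_aligned16()` are hypothetical names.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Same layout as pipe_blend_color, with a 16-byte alignment request. */
struct blend_color_aligned {
   alignas(16) float color[4];
};

/* Return true if the pointer is on a 16-byte boundary. */
static bool
is_aligned16(const void *p)
{
   return ((uintptr_t)p % 16) == 0;
}
```

Aligning the member raises the alignment of the whole struct, so every struct that embeds it is aligned (and possibly padded) accordingly.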


Re: [Mesa-dev] [Intel-gfx] [PATCH 2/2] i965: Removing PCI IDs that are no longer listed as Kabylake.

2016-06-28 Thread Pandiyan, Dhinakaran
On Thu, 2016-06-23 at 14:50 -0700, Rodrigo Vivi wrote:
> This is unusual. Usually IDs listed in the early stages of platform
> definition are kept as reserved for later use.
> 
> However, these IDs are no longer listed in any of the stepping
> and device ID tables for Kabylake in the configurations overview
> section of the BSpec.
> 
> So it is better to remove them before they are used by any
> other future platform.
> 
> Signed-off-by: Rodrigo Vivi 
> ---
>  include/pci_ids/i965_pci_ids.h | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
> index 7a7897f..1566afd 100644
> --- a/include/pci_ids/i965_pci_ids.h
> +++ b/include/pci_ids/i965_pci_ids.h
> @@ -153,12 +153,7 @@ CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")
>  CHIPSET(0x5923, kbl_gt3, "Intel(R) Kabylake GT3")
>  CHIPSET(0x5926, kbl_gt3, "Intel(R) Kabylake GT3")
>  CHIPSET(0x5927, kbl_gt3, "Intel(R) Kabylake GT3")
> -CHIPSET(0x592A, kbl_gt3, "Intel(R) Kabylake GT3")
> -CHIPSET(0x592B, kbl_gt3, "Intel(R) Kabylake GT3")
> -CHIPSET(0x5932, kbl_gt4, "Intel(R) Kabylake GT4")
> -CHIPSET(0x593A, kbl_gt4, "Intel(R) Kabylake GT4")
>  CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
> -CHIPSET(0x593D, kbl_gt4, "Intel(R) Kabylake GT4")
>  CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherrytrail)")
>  CHIPSET(0x22B1, chv, "Intel(R) HD Graphics XXX (Braswell)") /* 
> Overridden in brw_get_renderer_string */
>  CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")

Verified against the spec, lgtm.
Reviewed-by: Dhinakaran Pandiyan 


Re: [Mesa-dev] [Intel-gfx] [PATCH 1/2] i965: Add more Kabylake PCI IDs.

2016-06-28 Thread Pandiyan, Dhinakaran
On Thu, 2016-06-23 at 14:50 -0700, Rodrigo Vivi wrote:
> The spec has been updated adding new PCI IDs.
> 
> Signed-off-by: Rodrigo Vivi 
> ---
>  include/pci_ids/i965_pci_ids.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
> index fce00da..7a7897f 100644
> --- a/include/pci_ids/i965_pci_ids.h
> +++ b/include/pci_ids/i965_pci_ids.h
> @@ -137,6 +137,7 @@ CHIPSET(0x193D, skl_gt4, "Intel(R) Iris Pro Graphics P580 
> (Skylake GT4e)")
>  CHIPSET(0x5902, kbl_gt1, "Intel(R) Kabylake GT1")
>  CHIPSET(0x5906, kbl_gt1, "Intel(R) Kabylake GT1")
>  CHIPSET(0x590A, kbl_gt1, "Intel(R) Kabylake GT1")
> +CHIPSET(0x5908, kbl_gt1, "Intel(R) Kabylake GT1")
>  CHIPSET(0x590B, kbl_gt1, "Intel(R) Kabylake GT1")
>  CHIPSET(0x590E, kbl_gt1, "Intel(R) Kabylake GT1")
>  CHIPSET(0x5913, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
> @@ -149,7 +150,9 @@ CHIPSET(0x591B, kbl_gt2, "Intel(R) Kabylake GT2")
>  CHIPSET(0x591D, kbl_gt2, "Intel(R) Kabylake GT2")
>  CHIPSET(0x591E, kbl_gt2, "Intel(R) Kabylake GT2")
>  CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")
> +CHIPSET(0x5923, kbl_gt3, "Intel(R) Kabylake GT3")
>  CHIPSET(0x5926, kbl_gt3, "Intel(R) Kabylake GT3")
> +CHIPSET(0x5927, kbl_gt3, "Intel(R) Kabylake GT3")
>  CHIPSET(0x592A, kbl_gt3, "Intel(R) Kabylake GT3")
>  CHIPSET(0x592B, kbl_gt3, "Intel(R) Kabylake GT3")
>  CHIPSET(0x5932, kbl_gt4, "Intel(R) Kabylake GT4")

Verified against the spec. lgtm.
Reviewed-by: Dhinakaran Pandiyan 


[Mesa-dev] [PATCH 2/2 v2] mesa/st: Silence unused variable warning

2016-06-28 Thread Gurkirpal Singh
v2: Use MAYBE_UNUSED
Changed commit tag
(Suggested by Ian Romanick)

Signed-off-by: Gurkirpal Singh 
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index a880564..2cdb7b6 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -46,6 +46,8 @@
 #include "compiler/glsl/glsl_to_nir.h"
 #include "compiler/glsl/ir.h"
 
+#include "util/macros.h"
+
 
 /* Depending on PIPE_CAP_TGSI_TEXCOORD (st->needs_texcoord_semantic) we
  * may need to fix up varying slots so the glsl->nir path is aligned
@@ -169,7 +171,7 @@ st_nir_assign_uniform_locations(struct gl_program *prog,
 
   if (uniform->type->is_sampler()) {
  unsigned val;
- bool found = shader_program->UniformHash->get(val, uniform->name);
+ MAYBE_UNUSED bool found = shader_program->UniformHash->get(val, 
uniform->name);
  loc = shaderidx++;
  assert(found);
  /* this ensure that nir_lower_samplers looks at the correct
-- 
2.7.4
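
The warning being silenced comes from variables that are only read inside assert(). A minimal sketch of the pattern follows; `SKETCH_MAYBE_UNUSED`, `lookup()` and `get_value()` are hypothetical names, and the macro is assumed to behave like the MAYBE_UNUSED macro from util/macros.h on GCC-compatible compilers.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for MAYBE_UNUSED; expands to nothing on other compilers. */
#if defined(__GNUC__)
#define SKETCH_MAYBE_UNUSED __attribute__((unused))
#else
#define SKETCH_MAYBE_UNUSED
#endif

/* Hypothetical lookup: succeeds only for key 7. */
static bool
lookup(int key, int *val)
{
   if (key == 7) {
      *val = 42;
      return true;
   }
   return false;
}

static int
get_value(void)
{
   int val = 0;
   /* 'found' is read only by assert(), which is compiled out with NDEBUG. */
   SKETCH_MAYBE_UNUSED bool found = lookup(7, &val);

   assert(found);
   return val;
}
```

Without the attribute, a release build (NDEBUG defined) sees `found` assigned but never read and warns about it.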


[Mesa-dev] [PATCH 1/2 v2] gallium: Silence unused variable warnings

2016-06-28 Thread Gurkirpal Singh
v2: Use MAYBE_UNUSED as suggested by Ian Romanick

Signed-off-by: Gurkirpal Singh 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 4 +++-
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp  | 4 +++-
 src/gallium/drivers/nouveau/nv50/nv98_video.c | 4 +++-
 src/gallium/drivers/nouveau/nvc0/nvc0_video.c | 8 +---
 src/gallium/drivers/softpipe/sp_state_shader.c| 3 ++-
 src/gallium/state_trackers/xvmc/surface.c | 5 +++--
 src/gallium/state_trackers/xvmc/tests/xvmc_bench.c| 8 +---
 7 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index 0fe399b..9fc7c5a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -23,6 +23,8 @@
 #include "codegen/nv50_ir.h"
 #include "codegen/nv50_ir_target_nv50.h"
 
+#include "util/macros.h"
+
 namespace nv50_ir {
 
 #define NV50_OP_ENC_LONG 0
@@ -621,7 +623,7 @@ void
 CodeEmitterNV50::emitLOAD(const Instruction *i)
 {
DataFile sf = i->src(0).getFile();
-   int32_t offset = i->getSrc(0)->reg.data.offset;
+   MAYBE_UNUSED int32_t offset = i->getSrc(0)->reg.data.offset;
 
switch (sf) {
case FILE_SHADER_INPUT:
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3213188..6a60a7b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -24,6 +24,8 @@
 #include "codegen/nv50_ir_target.h"
 #include "codegen/nv50_ir_build_util.h"
 
+#include "util/macros.h"
+
 extern "C" {
 #include "util/u_math.h"
 }
@@ -2963,7 +2965,7 @@ NV50PostRaConstantFolding::visit(BasicBlock *bb)
i->setSrc(1, def->getSrc(0));
 } else {
ImmediateValue val;
-   bool ret = def->src(0).getImmediate(val);
+   MAYBE_UNUSED bool ret = def->src(0).getImmediate(val);
assert(ret);
if (i->getSrc(1)->reg.data.id & 1)
   val.reg.data.u32 >>= 16;
diff --git a/src/gallium/drivers/nouveau/nv50/nv98_video.c 
b/src/gallium/drivers/nouveau/nv50/nv98_video.c
index 177a7e0..d348807 100644
--- a/src/gallium/drivers/nouveau/nv50/nv98_video.c
+++ b/src/gallium/drivers/nouveau/nv50/nv98_video.c
@@ -24,6 +24,7 @@
 
 #include "util/u_sampler.h"
 #include "util/u_format.h"
+#include "util/macros.h"
 
 #include 
 
@@ -40,7 +41,8 @@ nv98_decoder_decode_bitstream(struct pipe_video_codec 
*decoder,
uint32_t comm_seq = ++dec->fence_seq;
union pipe_desc desc;
 
-   unsigned vp_caps, is_ref, ret;
+   unsigned vp_caps, is_ref;
+   MAYBE_UNUSED unsigned ret;
struct nouveau_vp3_video_buffer *refs[16] = {};
 
desc.base = picture;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_video.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_video.c
index a9fd1d2..10cb31e 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_video.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_video.c
@@ -24,6 +24,7 @@
 
 #include "util/u_sampler.h"
 #include "util/u_format.h"
+#include "util/macros.h"
 
 static void
 nvc0_decoder_begin_frame(struct pipe_video_codec *decoder,
@@ -32,7 +33,7 @@ nvc0_decoder_begin_frame(struct pipe_video_codec *decoder,
 {
struct nouveau_vp3_decoder *dec = (struct nouveau_vp3_decoder *)decoder;
uint32_t comm_seq = ++dec->fence_seq;
-   unsigned ret = 0;
+   MAYBE_UNUSED unsigned ret = 0;
 
assert(dec);
assert(target);
@@ -53,7 +54,7 @@ nvc0_decoder_decode_bitstream(struct pipe_video_codec 
*decoder,
 {
struct nouveau_vp3_decoder *dec = (struct nouveau_vp3_decoder *)decoder;
uint32_t comm_seq = dec->fence_seq;
-   unsigned ret = 0;
+   MAYBE_UNUSED unsigned ret = 0;
 
assert(decoder);
 
@@ -72,7 +73,8 @@ nvc0_decoder_end_frame(struct pipe_video_codec *decoder,
uint32_t comm_seq = dec->fence_seq;
union pipe_desc desc;
 
-   unsigned vp_caps, is_ref, ret;
+   unsigned vp_caps, is_ref;
+   MAYBE_UNUSED unsigned ret;
struct nouveau_vp3_video_buffer *refs[16] = {};
 
desc.base = picture;
diff --git a/src/gallium/drivers/softpipe/sp_state_shader.c 
b/src/gallium/drivers/softpipe/sp_state_shader.c
index a745662..d02727f 100644
--- a/src/gallium/drivers/softpipe/sp_state_shader.c
+++ b/src/gallium/drivers/softpipe/sp_state_shader.c
@@ -34,6 +34,7 @@
 #include "util/u_memory.h"
 #include "util/u_inlines.h"
 #include "util/u_pstipple.h"
+#include "util/macros.h"
 #include "draw/draw_context.h"
 #include "draw/draw_vs.h"
 #include "draw/draw_gs.h"
@@ -420,7 +421,7 @@ static void
 softpipe_delete_compute_state(struct pipe_context *pipe,
   void *cs)
 {
-   struct softpipe_context *softpipe = softpipe_context(pipe);
+   MAYBE_UNUSED struct softpipe_context *softpipe = 

Re: [Mesa-dev] [PATCH] gallium: Force blend color to 16-byte alignment

2016-06-28 Thread Matt Turner
On Tue, Jun 28, 2016 at 1:45 PM, Chuck Atkins  wrote:
> This aligns the 4-element color float array to 16 byte boundaries.  This
> should allow compiler vectorizers to generate better optimizations.
> Also fixes broken vectorization generated by Intel compiler.
>
> Reported-by: Tim Rowley 
> Signed-off-by: Chuck Atkins 
> ---
>  src/gallium/include/pipe/p_state.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/include/pipe/p_state.h 
> b/src/gallium/include/pipe/p_state.h
> index 1543e90..95f140f 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -326,7 +326,7 @@ struct pipe_blend_state
>
>  struct pipe_blend_color
>  {
> -   float color[4];
> +  PIPE_ALIGN_VAR(16) float color[4];

Looks like you lost a space of indentation. Whoever commits, please fix.


Re: [Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier

2016-06-28 Thread Francisco Jerez
Alejandro Piñeiro  writes:

> Fixes:
> GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>
> On Haswell, Broadwell and Skylake (note that in order to execute
> that test, it is needed to override GL and GLSL versions).
>
> I was not able to find a documentation reference that justifies it.

> ---
>
> Having said that, I didn't find a documentation reference explicitly
> mentioning that this is needed.
>
> Initially I thought that a flag was missing when calling
> emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case
> as far as I saw.  Then I noted that there is a gen6 workaround on that
> code:
>
>  if (brw->gen == 6) {
> /* Hardware workaround: SNB B-Spec says:
>  *
>  * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
>  * Flush Enable =1, a PIPE_CONTROL with any non-zero
>  * post-sync-op is required.
>  */
> brw_emit_post_sync_nonzero_flush(brw);
>  }
>
> I tested calling that method for any gen, guessing if the workaround
> was needed also for other gens, and the test got fixed. But looking at
> the documentation of other gens, I didn't find the need for this
> workaround. For that reason I moved to use gen7_emit_cs_stall, which is
> less aggressive and gets the test fixed too. It seems that in order to
> get a complete flush you need a cs stall flush with a
> pipe_control_write. But again, I didn't find any reference at the PRMs
> confirming it.
>
> Intuitively, this would be needed on brw_emit_mi_flush or even at
> brw_emit_pipe_control_flush (this one already include some
> gen-specific workarounds), but I preferred to keep it on the only place
> that seems to need it for now.
>
> In addition to fixing that CTS test, this also fixes the test I
> recently sent to the piglit list, which is not included on master
> yet (acked for now):
> https://lists.freedesktop.org/archives/piglit/2016-June/020055.html
>
> That piglit patch adds 48 parameter combinations for the basic
> test. Without this mesa patch 5-6 subtests fail. With this patch all
> of them pass. Tested on Haswell, Broadwell and Skylake too.
>
I believe this test is hitting the same hardware race condition that
most callers of brw_emit_mi_flush() suffer from: The problem of
brw_emit_mi_flush() is that, even though it is supposed to both
invalidate R/O caches (e.g. the sampler caches) and flush R/W caches
(e.g. the render cache), the former happens at the top of the pipeline
(i.e. as soon as the CS processor parses the PIPE_CONTROL command,
irrespective of whether a concurrent rendering workload could pollute a
R/O cache again in parallel), while the latter happens at the bottom of
the pipeline (i.e. after any concurrent rendering completes).

The gen7_emit_cs_stall_flush() call you have introduced seems to fix the
issue because it forces additional serialization with respect to
previous rendering commands before the R/O caches are invalidated, which
is a clear indication that you're hitting the same bug.  The right way
to fix it would be to remove the brw_emit_mi_flush() call for Gen6+ at
least (brw_emit_mi_flush() is BTW a pretty big hammer and causes a bunch
of other caches to be flushed which aren't necessarily relevant to
texture barrier), and instead call brw_emit_pipe_control_flush() twice:
The first PIPE_CONTROL command should have at least RENDER_TARGET_FLUSH
and CS_STALL set to initiate a render cache flush after any concurrent
rendering completes and cause the CS to stop parsing commands until the
render cache becomes coherent with memory (the DEPTH_CACHE_FLUSH bit may
also be necessary for some workloads using depth texturing).  The second
PIPE_CONTROL should have TEXTURE_CACHE_INVALIDATE set (and no CS stall)
to clean up any stale data from the sampler caches before rendering
continues.
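
A minimal sketch of that two-step sequence, using mock stand-ins rather than the real driver entry points. The sketch_* names and the flag values below are hypothetical and only model the ordering; in the driver, the corresponding PIPE_CONTROL_* bits would presumably be passed to brw_emit_pipe_control_flush():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PIPE_CONTROL_* bits.  The numeric values
 * are illustrative only and are NOT the hardware encoding. */
enum {
   SKETCH_RENDER_TARGET_FLUSH      = 1 << 0,
   SKETCH_DEPTH_CACHE_FLUSH        = 1 << 1,
   SKETCH_CS_STALL                 = 1 << 2,
   SKETCH_TEXTURE_CACHE_INVALIDATE = 1 << 3,
};

#define SKETCH_MAX_CMDS 8

/* Mock batch buffer: just records the flags of each emitted command. */
struct sketch_batch {
   uint32_t flags[SKETCH_MAX_CMDS];
   unsigned count;
};

/* Mock of "emit one PIPE_CONTROL with these flags". */
static void
sketch_emit_pipe_control(struct sketch_batch *batch, uint32_t flags)
{
   assert(batch->count < SKETCH_MAX_CMDS);
   batch->flags[batch->count++] = flags;
}

static void
sketch_texture_barrier(struct sketch_batch *batch)
{
   /* Step 1: flush the R/W caches at the bottom of the pipeline and stall
    * the command streamer until they are coherent with memory. */
   sketch_emit_pipe_control(batch,
                            SKETCH_RENDER_TARGET_FLUSH |
                            SKETCH_DEPTH_CACHE_FLUSH |
                            SKETCH_CS_STALL);

   /* Step 2: only now invalidate the R/O sampler caches, with no stall. */
   sketch_emit_pipe_control(batch, SKETCH_TEXTURE_CACHE_INVALIDATE);
}
```

The point of splitting the work into two commands is that the invalidation cannot be parsed before the stall in the first command has completed, so no concurrent rendering can re-pollute the sampler caches in between.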

See 0aa4f99f562a05880a779707cbcd46be459863bf for how I addressed the
same problem in the L3 cache partitioning code (where I noticed the
problem originally), or 72473658c51d5e074ce219c1e6385a4cce29f467 for how
Ken fixed the same issue in the draw-time surface validation path.
Incidentally I had written some code just a couple of days ago to
address the same issue in the implementation of glMemoryBarrier (I'll
send it for review soon-ish).  There are likely many more instances of
this race condition in the driver, most callers of brw_emit_mi_flush are
suspect...

>  src/mesa/drivers/dri/i965/intel_tex.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_tex.c 
> b/src/mesa/drivers/dri/i965/intel_tex.c
> index cac33ac..e7459cd 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex.c
> @@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx)
>  {
> struct brw_context *brw = brw_context(ctx);
>  
> +   gen7_emit_cs_stall_flush(brw);
> brw_emit_mi_flush(brw);
>  }
>  
> -- 
> 2.7.4
>
> 

Re: [Mesa-dev] [PATCH 1/2] intel: Add more Kabylake PCI IDs.

2016-06-28 Thread Pandiyan, Dhinakaran
On Mon, 2016-06-27 at 17:10 -0700, Rodrigo Vivi wrote:
> The spec has been updated adding new PCI IDs.
> 
> v2: Avoid using "H" instead of HALO to keep names uniform - DK.
> 
> Cc: Dhinakaran Pandiyan 
> Signed-off-by: Rodrigo Vivi 
> ---
>  intel/intel_chipset.h | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/intel/intel_chipset.h b/intel/intel_chipset.h
> index e2554c3..6b8d4e9 100644
> --- a/intel/intel_chipset.h
> +++ b/intel/intel_chipset.h
> @@ -194,7 +194,9 @@
>  #define PCI_CHIP_KABYLAKE_ULT_GT2    0x5916
>  #define PCI_CHIP_KABYLAKE_ULT_GT1_5  0x5913
>  #define PCI_CHIP_KABYLAKE_ULT_GT1    0x5906
> -#define PCI_CHIP_KABYLAKE_ULT_GT3    0x5926
> +#define PCI_CHIP_KABYLAKE_ULT_GT3_0  0x5923
> +#define PCI_CHIP_KABYLAKE_ULT_GT3_1  0x5926
> +#define PCI_CHIP_KABYLAKE_ULT_GT3_2  0x5927
>  #define PCI_CHIP_KABYLAKE_ULT_GT2F   0x5921
>  #define PCI_CHIP_KABYLAKE_ULX_GT1_5  0x5915
>  #define PCI_CHIP_KABYLAKE_ULX_GT1    0x590E
> @@ -206,7 +208,8 @@
>  #define PCI_CHIP_KABYLAKE_HALO_GT2   0x591B
>  #define PCI_CHIP_KABYLAKE_HALO_GT4   0x593B
>  #define PCI_CHIP_KABYLAKE_HALO_GT3   0x592B
> -#define PCI_CHIP_KABYLAKE_HALO_GT1   0x590B
> +#define PCI_CHIP_KABYLAKE_HALO_GT1_0 0x5908
> +#define PCI_CHIP_KABYLAKE_HALO_GT1_1 0x590B
>  #define PCI_CHIP_KABYLAKE_SRV_GT2    0x591A
>  #define PCI_CHIP_KABYLAKE_SRV_GT3    0x592A
>  #define PCI_CHIP_KABYLAKE_SRV_GT1    0x590A
> @@ -414,7 +417,8 @@
>(devid) == PCI_CHIP_KABYLAKE_ULT_GT1   || \
>(devid) == PCI_CHIP_KABYLAKE_ULX_GT1   || \
>(devid) == PCI_CHIP_KABYLAKE_DT_GT1    || \
> -  (devid) == PCI_CHIP_KABYLAKE_HALO_GT1  || \
> +  (devid) == PCI_CHIP_KABYLAKE_HALO_GT1_0 || \
> +  (devid) == PCI_CHIP_KABYLAKE_HALO_GT1_1 || \
>(devid) == PCI_CHIP_KABYLAKE_SRV_GT1)
>  
>  #define IS_KBL_GT2(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT2   || \
> @@ -425,7 +429,9 @@
>(devid) == PCI_CHIP_KABYLAKE_SRV_GT2   || \
>(devid) == PCI_CHIP_KABYLAKE_WKS_GT2)
>  
> -#define IS_KBL_GT3(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT3   || \
> +#define IS_KBL_GT3(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT3_0 || \
> +  (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_1 || \
> +  (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_2 || \
>(devid) == PCI_CHIP_KABYLAKE_HALO_GT3  || \
>(devid) == PCI_CHIP_KABYLAKE_SRV_GT3)
>  
Checked against the spec, lgtm.
Reviewed-by: Dhinakaran Pandiyan 
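
For reference, a small self-contained sketch of how the GT3 classification behaves once this patch is applied. The device IDs and the IS_KBL_GT3 macro body are copied from the diff above; the sketch_is_kbl_gt3() wrapper is a hypothetical helper added only so the macro can be exercised:

```c
#include <assert.h>
#include <stdbool.h>

/* Device IDs as listed in the patch above. */
#define PCI_CHIP_KABYLAKE_ULT_GT3_0  0x5923
#define PCI_CHIP_KABYLAKE_ULT_GT3_1  0x5926
#define PCI_CHIP_KABYLAKE_ULT_GT3_2  0x5927
#define PCI_CHIP_KABYLAKE_HALO_GT3   0x592B
#define PCI_CHIP_KABYLAKE_SRV_GT3    0x592A

/* Macro body as it reads after this patch (before patch 2/2). */
#define IS_KBL_GT3(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT3_0 || \
                           (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_1 || \
                           (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_2 || \
                           (devid) == PCI_CHIP_KABYLAKE_HALO_GT3  || \
                           (devid) == PCI_CHIP_KABYLAKE_SRV_GT3)

/* Hypothetical wrapper, not part of intel_chipset.h. */
static bool
sketch_is_kbl_gt3(unsigned devid)
{
   return IS_KBL_GT3(devid);
}
```

The old single ULT GT3 ID (0x5926) keeps matching under its new name, while the two newly added IDs (0x5923 and 0x5927) are now recognised as well.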


[Mesa-dev] [PATCH] mesa/main: handle gl_buffer_index correctly

2016-06-28 Thread Francesco Ansanelli
---
 src/mesa/main/buffers.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
index e8aedde..3ff6061 100644
--- a/src/mesa/main/buffers.c
+++ b/src/mesa/main/buffers.c
@@ -170,7 +170,7 @@ draw_buffer_enum_to_bitmask(const struct gl_context *ctx, 
GLenum buffer)
  * Helper routine used by glReadBuffer.
  * Given a GLenum naming a color buffer, return the index of the corresponding
  * renderbuffer (a BUFFER_* value).
- * return -1 for an invalid buffer.
+ * return ~0 for an invalid buffer.
  */
 static gl_buffer_index
 read_buffer_enum_to_index(GLenum buffer)
@@ -197,7 +197,7 @@ read_buffer_enum_to_index(GLenum buffer)
   case GL_AUX1:
   case GL_AUX2:
   case GL_AUX3:
- return BUFFER_COUNT; /* invalid, but not -1 */
+ return BUFFER_COUNT; /* invalid, but not ~0 */
   case GL_COLOR_ATTACHMENT0_EXT:
  return BUFFER_COLOR0;
   case GL_COLOR_ATTACHMENT1_EXT:
@@ -219,7 +219,7 @@ read_buffer_enum_to_index(GLenum buffer)
  if (buffer >= GL_COLOR_ATTACHMENT8 && buffer <= GL_COLOR_ATTACHMENT31)
 return BUFFER_COUNT;
  /* error */
- return -1;
+ return ~0;
}
 }
 
@@ -722,11 +722,11 @@ read_buffer(struct gl_context *ctx, struct gl_framebuffer 
*fb,
else {
   /* general case / window-system framebuffer */
   if (_mesa_is_gles3(ctx) && !is_legal_es3_readbuffer_enum(buffer))
- srcBuffer = -1;
+ srcBuffer = ~0;
   else
  srcBuffer = read_buffer_enum_to_index(buffer);
 
-  if (srcBuffer == -1) {
+  if (srcBuffer == ~0u) {
  _mesa_error(ctx, GL_INVALID_ENUM,
  "%s(invalid buffer %s)", caller,
  _mesa_enum_to_string(buffer));
-- 
1.7.9.5



Re: [Mesa-dev] [PATCH 2/2] intel: Removing PCI IDs that are no longer listed as Kabylake.

2016-06-28 Thread Pandiyan, Dhinakaran
On Mon, 2016-06-27 at 17:10 -0700, Rodrigo Vivi wrote:
> This is unusual. Usually IDs listed on early stages of platform
> definition are kept there as reserved for later use.
> 
> However, these IDs are no longer listed in any of the stepping
> and device ID tables for Kabylake in the configurations overview
> section of the BSpec.
> 
> So it is better to remove them before they get used by any
> other future platform.
> 
> v2: Rebase.
> 
> Cc: Dhinakaran Pandiyan 
> Signed-off-by: Rodrigo Vivi 
> ---
>  intel/intel_chipset.h | 16 +++-
>  1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/intel/intel_chipset.h b/intel/intel_chipset.h
> index 6b8d4e9..514f659 100644
> --- a/intel/intel_chipset.h
> +++ b/intel/intel_chipset.h
> @@ -204,18 +204,13 @@
>  #define PCI_CHIP_KABYLAKE_DT_GT2 0x5912
>  #define PCI_CHIP_KABYLAKE_DT_GT1_5   0x5917
>  #define PCI_CHIP_KABYLAKE_DT_GT1 0x5902
> -#define PCI_CHIP_KABYLAKE_DT_GT4 0x5932
>  #define PCI_CHIP_KABYLAKE_HALO_GT2   0x591B
>  #define PCI_CHIP_KABYLAKE_HALO_GT4   0x593B
> -#define PCI_CHIP_KABYLAKE_HALO_GT3   0x592B
>  #define PCI_CHIP_KABYLAKE_HALO_GT1_0 0x5908
>  #define PCI_CHIP_KABYLAKE_HALO_GT1_1 0x590B
>  #define PCI_CHIP_KABYLAKE_SRV_GT2    0x591A
> -#define PCI_CHIP_KABYLAKE_SRV_GT3    0x592A
>  #define PCI_CHIP_KABYLAKE_SRV_GT1    0x590A
> -#define PCI_CHIP_KABYLAKE_SRV_GT4    0x593A
>  #define PCI_CHIP_KABYLAKE_WKS_GT2    0x591D
> -#define PCI_CHIP_KABYLAKE_WKS_GT4    0x593D
>  
>  #define PCI_CHIP_BROXTON_0   0x0A84
>  #define PCI_CHIP_BROXTON_1   0x1A84
> @@ -431,14 +426,9 @@
>  
>  #define IS_KBL_GT3(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT3_0 || \
>(devid) == PCI_CHIP_KABYLAKE_ULT_GT3_1 || \
> -  (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_2 || \
> -  (devid) == PCI_CHIP_KABYLAKE_HALO_GT3  || \
> -  (devid) == PCI_CHIP_KABYLAKE_SRV_GT3)
> -
> -#define IS_KBL_GT4(devid) ((devid) == PCI_CHIP_KABYLAKE_DT_GT4    || \
> -  (devid) == PCI_CHIP_KABYLAKE_HALO_GT4  || \
> -  (devid) == PCI_CHIP_KABYLAKE_SRV_GT4   || \
> -  (devid) == PCI_CHIP_KABYLAKE_WKS_GT4)
> +  (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_2)
> +
> +#define IS_KBL_GT4(devid) ((devid) == PCI_CHIP_KABYLAKE_HALO_GT4)
>  
>  #define IS_KABYLAKE(devid)   (IS_KBL_GT1(devid) || \
>IS_KBL_GT2(devid) || \

Checked against the spec, lgtm.
Reviewed-by: Dhinakaran Pandiyan 



[Mesa-dev] [PATCH] gallium: Force blend color to 16-byte alignment

2016-06-28 Thread Chuck Atkins
This aligns the 4-element color float array to 16 byte boundaries.  This
should allow compiler vectorizers to generate better optimizations.
Also fixes broken vectorization generated by Intel compiler.

Reported-by: Tim Rowley 
Signed-off-by: Chuck Atkins 
---
 src/gallium/include/pipe/p_state.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_state.h 
b/src/gallium/include/pipe/p_state.h
index 1543e90..95f140f 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -326,7 +326,7 @@ struct pipe_blend_state
 
 struct pipe_blend_color
 {
-   float color[4];
+  PIPE_ALIGN_VAR(16) float color[4];
 };
 
 
-- 
2.5.5
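
As an illustration of what the alignment request changes, here is a minimal sketch. It uses C11 alignas as a stand-in, since the definition of PIPE_ALIGN_VAR is not shown in this thread; the sketch_* names are hypothetical:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct pipe_blend_color with the array forced to a
 * 16-byte boundary (alignas here plays the role of PIPE_ALIGN_VAR(16)). */
struct sketch_blend_color {
   alignas(16) float color[4];
};

/* The member alignment propagates to the struct itself. */
static_assert(alignof(struct sketch_blend_color) == 16,
              "struct must inherit the 16-byte alignment");

/* True if the pointer sits on a 16-byte boundary, which is what aligned
 * SIMD loads and stores require. */
static bool
sketch_is_16_byte_aligned(const void *ptr)
{
   return ((uintptr_t)ptr % 16) == 0;
}
```

Without the alignment request, a plain float[4] is only guaranteed the alignment of float, so a vectorizer either has to emit unaligned accesses or, if it wrongly assumes alignment, can fault at run time.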



Re: [Mesa-dev] [PATCH 1/3] i965: Refactor intel_get_param()

2016-06-28 Thread Ian Romanick
When I first saw spriv in the last hunk, my brain parsed it as spirv.
That was confusing. :)

Patches 1 and 3 are

Reviewed-by: Ian Romanick 

On 06/28/2016 10:07 AM, Chad Versace wrote:
> Replace the function's __DRIscreen parameter with struct intel_screen.
> The callsites feel more natural that way.
> ---
>  src/mesa/drivers/dri/i965/intel_screen.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
> b/src/mesa/drivers/dri/i965/intel_screen.c
> index 869119b..b693c45 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -970,7 +970,7 @@ static const __DRIextension 
> *intelRobustScreenExtensions[] = {
>  };
>  
>  static int
> -intel_get_param(__DRIscreen *psp, int param, int *value)
> +intel_get_param(struct intel_screen *screen, int param, int *value)
>  {
> int ret;
> struct drm_i915_getparam gp;
> @@ -979,7 +979,8 @@ intel_get_param(__DRIscreen *psp, int param, int *value)
> gp.param = param;
> gp.value = value;
>  
> -   ret = drmCommandWriteRead(psp->fd, DRM_I915_GETPARAM, &gp, sizeof(gp));
> +   ret = drmCommandWriteRead(screen->driScrnPriv->fd,
> + DRM_I915_GETPARAM, &gp, sizeof(gp));
> if (ret < 0 && ret != -EINVAL)
>_mesa_warning(NULL, "drm_i915_getparam: %d", ret);
>  
> @@ -987,10 +988,10 @@ intel_get_param(__DRIscreen *psp, int param, int *value)
>  }
>  
>  static bool
> -intel_get_boolean(__DRIscreen *psp, int param)
> +intel_get_boolean(struct intel_screen *screen, int param)
>  {
> int value = 0;
> -   return (intel_get_param(psp, param, &value) == 0) && value;
> +   return (intel_get_param(screen, param, &value) == 0) && value;
>  }
>  
>  static void
> @@ -1125,12 +1126,12 @@ intel_detect_sseu(struct intel_screen *intelScreen)
> intelScreen->subslice_total = -1;
> intelScreen->eu_total = -1;
>  
> -   ret = intel_get_param(intelScreen->driScrnPriv, I915_PARAM_SUBSLICE_TOTAL,
> +   ret = intel_get_param(intelScreen, I915_PARAM_SUBSLICE_TOTAL,
>   &intelScreen->subslice_total);
> if (ret < 0 && ret != -EINVAL)
>goto err_out;
>  
> -   ret = intel_get_param(intelScreen->driScrnPriv,
> +   ret = intel_get_param(intelScreen,
>   I915_PARAM_EU_TOTAL, &intelScreen->eu_total);
> if (ret < 0 && ret != -EINVAL)
>goto err_out;
> @@ -1167,7 +1168,7 @@ intel_init_bufmgr(struct intel_screen *intelScreen)
>  
> drm_intel_bufmgr_gem_enable_fenced_relocs(intelScreen->bufmgr);
>  
> -   if (!intel_get_boolean(spriv, I915_PARAM_HAS_RELAXED_DELTA)) {
> +   if (!intel_get_boolean(intelScreen, I915_PARAM_HAS_RELAXED_DELTA)) {
>fprintf(stderr, "[%s: %u] Kernel 2.6.39 required.\n", __func__, 
> __LINE__);
>return false;
> }
> 



Re: [Mesa-dev] [PATCH 1/2] glsl: add driconf to zero-init unintialized vars

2016-06-28 Thread Rob Clark
On Tue, Jun 28, 2016 at 11:28 AM, Marek Olšák  wrote:
> On Mon, Jun 27, 2016 at 9:28 PM, Rob Clark  wrote:
>> On Mon, Jun 27, 2016 at 3:06 PM, Kenneth Graunke  
>> wrote:
>>> On Monday, June 27, 2016 11:43:28 AM PDT Matt Turner wrote:
>>>> On Mon, Jun 27, 2016 at 4:44 AM, Rob Clark  wrote:
>>>> > On Mon, Jun 27, 2016 at 7:13 AM, Alan Swanson  
>>>> > wrote:
>>>> >> On 2016-06-25 13:37, Rob Clark wrote:
>>>> >>>
>>>> >>> Some games are sloppy.. perhaps because it is defined behavior for DX 
>>>> >>> or
>>>> >>> perhaps because nv blob driver defaults things to zero.
>>>> >>>
>>>> >>> So add driconf param to force uninitialized variables to default to 
>>>> >>> zero.
>>>> >>>
>>>> >>> This issue was observed with rust, from steam store.  But has surfaced
>>>> >>> elsewhere in the past.
>>>> >>>
>>>> >>> Signed-off-by: Rob Clark 
>>>> >>> ---
>>>> >>> Note that I left out the drirc bit, since not entirely sure how to
>>>> >>> identify this game.  (I don't actually have the game, just working off
>>>> >>> of an apitrace)
>>>> >>>
>>>> >>> Possibly worth mentioning that for the shaders using uninitialized vars
>>>> >>> having zero-initializers lets constant-propagation get rid of a whole
>>>> >>> lot of instructions.  One shader I saw dropped to less than half of
>>>> >>> its original instruction count.
>>>> >>
>>>> >>
>>>> >> If the default for uninitialised variables is undefined, then with the
>>>> >> reported shader optimisations why bother with the (DRI) option when
>>>> >> zeroing could still essentially be classed as undefined?
>>>> >>
>>>> >> Cuts the patch down to just the src/compiler/glsl/ast_to_hir.cpp change.
>>>> >
>>>> > I did suggest that on #dri-devel, but Jason had a theoretical example
>>>> > where it would hurt.. iirc something like:
>>>> >
>>>> >   float maybe_undef;
>>>> >   for (int i = 0; i < some_uniform_at_least_one; i++)
>>>> >  maybe_undef = ...
>>>> >
>>>> > also, he didn't want to hide shader bugs that app should fix.
>>>> >
>>>> > It would be interesting to run shaderdb w/ glsl_zero_init=true and
>>>> > see what happens, but I didn't get around to that yet.
>>>>
>>>> Here's what I get on i965. It's not a clear win.
>>>>
>>>> total instructions in shared programs: 5249030 -> 5249002 (-0.00%)
>>>> instructions in affected programs: 28936 -> 28908 (-0.10%)
>>>> helped: 66
>>>> HURT: 132
>>>>
>>>> total cycles in shared programs: 57966694 -> 57956306 (-0.02%)
>>>> cycles in affected programs: 1136118 -> 1125730 (-0.91%)
>>>> helped: 78
>>>> HURT: 106
>>>
>>> I suspect most of the help is because we're missing undef optimizations,
>>> such as CSE...while zero could be CSE'd.  (I have a patch, but it hurts
>>> things too...)
>>
>> right, I was thinking that treating undef as zero in constant-folding
>> would have the same effect.. ofc it might make shader bugs less
>> obvious.
>>
>> Btw, does anyone know what fglrx does?  Afaiu nv blob treats undef as
>> zero.  If fglrx does the same, I suppose that strengthens the argument
>> for "just do this unconditionally".
>
> No idea what fglrx does, but LLVM does eliminate code with undefined
> inputs. Initializing everything to 0 might make that worse.

hmm, treating as zero does eliminate a lot.. anyway, I guess we'll
stick w/ driconf.

fwiw, with some help from the reporter, we figured out that this is
the bit that I need to squash into drirc:





now, if I could talk somebody into a r-b for this and the i965 fix? ;-)

BR,
-R


Re: [Mesa-dev] [PATCH 2/2] mesa: Silence unused variable warning

2016-06-28 Thread Ian Romanick
On 06/28/2016 01:01 PM, Gurkirpal Singh wrote:
> Signed-off-by: Gurkirpal Singh 
> ---
>  src/mesa/state_tracker/st_glsl_to_nir.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
> b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> index a880564..a914c8d 100644
> --- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> @@ -172,6 +172,7 @@ st_nir_assign_uniform_locations(struct gl_program *prog,
>   bool found = shader_program->UniformHash->get(val, uniform->name);

There have been some similar patches recently that do

  MAYBE_UNUSED bool found = ...;

Also, the tag should be "mesa/st".
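
A minimal sketch of the difference, assuming a GCC/Clang-style attribute behind the macro. SKETCH_MAYBE_UNUSED and the sketch_* functions are hypothetical stand-ins, not the actual util/macros.h definition or state tracker code:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for MAYBE_UNUSED; assumed to expand to the
 * "unused" attribute where the compiler supports it. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_MAYBE_UNUSED __attribute__((unused))
#else
#define SKETCH_MAYBE_UNUSED
#endif

/* Toy lookup standing in for UniformHash->get(). */
static bool
sketch_lookup(const char *name, unsigned *val)
{
   if (strcmp(name, "color") == 0) {
      *val = 7;
      return true;
   }
   return false;
}

static unsigned
sketch_location(const char *name)
{
   unsigned val = 0;
   /* With NDEBUG defined, assert() expands to nothing and "found" would
    * otherwise trigger an unused-variable warning.  Annotating the
    * declaration avoids the separate "(void) found;" statement. */
   SKETCH_MAYBE_UNUSED bool found = sketch_lookup(name, &val);
   assert(found);
   return val;
}
```

Both forms silence the warning; the annotation keeps the intent next to the declaration instead of adding a second statement after each assert.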

>   loc = shaderidx++;
>   assert(found);
> + (void) found;
>   /* this ensure that nir_lower_samplers looks at the correct
>* shader_program->UniformStorage[location]:
>*/
> 



[Mesa-dev] [PATCH 2/2] mesa: Silence unused variable warning

2016-06-28 Thread Gurkirpal Singh
Signed-off-by: Gurkirpal Singh 
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index a880564..a914c8d 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -172,6 +172,7 @@ st_nir_assign_uniform_locations(struct gl_program *prog,
  bool found = shader_program->UniformHash->get(val, uniform->name);
  loc = shaderidx++;
  assert(found);
+ (void) found;
  /* this ensure that nir_lower_samplers looks at the correct
   * shader_program->UniformStorage[location]:
   */
-- 
2.7.4



[Mesa-dev] [PATCH 1/2] gallium: Silence unused variable warnings

2016-06-28 Thread Gurkirpal Singh
Signed-off-by: Gurkirpal Singh 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 2 ++
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp  | 1 +
 src/gallium/drivers/nouveau/nv50/nv98_video.c | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_video.c | 3 +++
 src/gallium/drivers/softpipe/sp_state_shader.c| 1 +
 src/gallium/state_trackers/xvmc/surface.c | 2 ++
 src/gallium/state_trackers/xvmc/tests/xvmc_bench.c| 4 
 7 files changed, 14 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index 0fe399b..d5479a7 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -637,6 +637,7 @@ CodeEmitterNV50::emitLOAD(const Instruction *i)
case FILE_MEMORY_SHARED:
   if (targ->getChipset() >= 0x84) {
  assert(offset <= (int32_t)(0x3fff * typeSizeof(i->sType)));
+ (void) offset;
  code[0] = 0x1001;
  code[1] = 0x4000;
 
@@ -646,6 +647,7 @@ CodeEmitterNV50::emitLOAD(const Instruction *i)
  emitLoadStoreSizeCS(i->sType);
   } else {
  assert(offset <= (int32_t)(0x1f * typeSizeof(i->sType)));
+ (void) offset;
  code[0] = 0x1001;
  code[1] = 0x0020 | (i->lanes << 14);
  emitLoadStoreSizeCS(i->sType);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3213188..e92cfea 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -2965,6 +2965,7 @@ NV50PostRaConstantFolding::visit(BasicBlock *bb)
ImmediateValue val;
bool ret = def->src(0).getImmediate(val);
assert(ret);
+   (void) ret;
if (i->getSrc(1)->reg.data.id & 1)
   val.reg.data.u32 >>= 16;
 val.reg.data.u32 &= 0xffff;
diff --git a/src/gallium/drivers/nouveau/nv50/nv98_video.c 
b/src/gallium/drivers/nouveau/nv50/nv98_video.c
index 177a7e0..ce86399 100644
--- a/src/gallium/drivers/nouveau/nv50/nv98_video.c
+++ b/src/gallium/drivers/nouveau/nv50/nv98_video.c
@@ -53,6 +53,7 @@ nv98_decoder_decode_bitstream(struct pipe_video_codec 
*decoder,
 
/* did we decode bitstream correctly? */
assert(ret == 2);
+   (void) ret;
 
nv98_decoder_vp(dec, desc, target, comm_seq, vp_caps, is_ref, refs);
nv98_decoder_ppp(dec, desc, target, comm_seq);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_video.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_video.c
index a9fd1d2..d83f2a9 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_video.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_video.c
@@ -41,6 +41,7 @@ nvc0_decoder_begin_frame(struct pipe_video_codec *decoder,
ret = nvc0_decoder_bsp_begin(dec, comm_seq);
 
assert(ret == 2);
+   (void) ret;
 }
 
 static void
@@ -60,6 +61,7 @@ nvc0_decoder_decode_bitstream(struct pipe_video_codec 
*decoder,
ret = nvc0_decoder_bsp_next(dec, comm_seq, num_buffers, data, num_bytes);
 
assert(ret == 2);
+   (void) ret;
 }
 
 static void
@@ -81,6 +83,7 @@ nvc0_decoder_end_frame(struct pipe_video_codec *decoder,
 
/* did we decode bitstream correctly? */
assert(ret == 2);
+   (void) ret;
 
nvc0_decoder_vp(dec, desc, target, comm_seq, vp_caps, is_ref, refs);
nvc0_decoder_ppp(dec, desc, target, comm_seq);
diff --git a/src/gallium/drivers/softpipe/sp_state_shader.c 
b/src/gallium/drivers/softpipe/sp_state_shader.c
index a745662..d3abd9d 100644
--- a/src/gallium/drivers/softpipe/sp_state_shader.c
+++ b/src/gallium/drivers/softpipe/sp_state_shader.c
@@ -424,6 +424,7 @@ softpipe_delete_compute_state(struct pipe_context *pipe,
struct sp_compute_shader *state = (struct sp_compute_shader *)cs;
 
assert(softpipe->cs != state);
+   (void) softpipe;
tgsi_free_tokens(state->tokens);
FREE(state);
 }
diff --git a/src/gallium/state_trackers/xvmc/surface.c 
b/src/gallium/state_trackers/xvmc/surface.c
index 199712b..8e9e079 100644
--- a/src/gallium/state_trackers/xvmc/surface.c
+++ b/src/gallium/state_trackers/xvmc/surface.c
@@ -270,6 +270,8 @@ Status XvMCRenderSurface(Display *dpy, XvMCContext 
*context, unsigned int pictur
assert(target_surface_priv->context == context);
assert(!past_surface || past_surface_priv->context == context);
assert(!future_surface || future_surface_priv->context == context);
+   (void) past_surface_priv;
+   (void) future_surface_priv;
 
// call end frame on all referenced frames
if (past_surface)
diff --git a/src/gallium/state_trackers/xvmc/tests/xvmc_bench.c 
b/src/gallium/state_trackers/xvmc/tests/xvmc_bench.c
index 4dc95ba..ec7ecc8 100644
--- a/src/gallium/state_trackers/xvmc/tests/xvmc_bench.c
+++ 

Re: [Mesa-dev] [PATCH 2/3] i965: Use drmIoctl for DRM_I915_GETPARAM

2016-06-28 Thread Chris Wilson
On Tue, Jun 28, 2016 at 10:07:10AM -0700, Chad Versace wrote:
> Stop using drmCommandWriteRead for such a simple ioctl.
> ---
>  src/mesa/drivers/dri/i965/intel_screen.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
> b/src/mesa/drivers/dri/i965/intel_screen.c
> index b693c45..f7f806e 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -979,8 +979,7 @@ intel_get_param(struct intel_screen *screen, int param, 
> int *value)
> gp.param = param;
> gp.value = value;
>  
> -   ret = drmCommandWriteRead(screen->driScrnPriv->fd,
> - DRM_I915_GETPARAM, &gp, sizeof(gp));
> +   ret = drmIoctl(screen->driScrnPriv->fd, DRM_IOCTL_I915_GETPARAM, &gp);
> if (ret < 0 && ret != -EINVAL)

drmIoctl() doesn't return -errno, just -1 and the error code in errno.
-Chris
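
To make the return-value convention concrete, here is a minimal sketch that wraps plain ioctl(2), which follows the same -1-plus-errno convention. The sketch_ioctl_neg_errno() helper is hypothetical and is not a libdrm function:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* Hypothetical helper: converts the "-1 and errno" convention into the
 * "negative errno" convention that the quoted check
 * (ret < 0 && ret != -EINVAL) expects. */
static int
sketch_ioctl_neg_errno(int fd, unsigned long request, void *arg)
{
   int ret = ioctl(fd, request, arg);

   if (ret == -1)
      return -errno;   /* e.g. -EINVAL, -EBADF */
   return ret;
}
```

With the raw call, a failure always yields ret == -1, so comparing ret against -EINVAL is not meaningful; either errno has to be inspected directly or the result converted as above.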

-- 
Chris Wilson, Intel Open Source Technology Centre


[Mesa-dev] [PATCH v4] swr: Refactor checks for compiler feature flags

2016-06-28 Thread Chuck Atkins
Encapsulate the test for which flags are needed to get a compiler to
support certain features.  Along with this, give various options to try
for AVX and AVX2 support.  Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different.  This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.

This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.
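
The selection logic itself lives in the shell function in the diff below; purely as an illustration, the same "first candidate that passes wins" idea can be sketched in C. All sketch_* names are hypothetical, and the probe functions only model a compiler that accepts or rejects a flag set:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A probe answers: does the compiler support the feature when given
 * exactly these flags?  In configure this is the compile test. */
typedef int (*sketch_probe_fn)(const char *flags, size_t len);

/* Models a compiler that only understands -march=... style options. */
static int
sketch_accepts_only_march(const char *flags, size_t len)
{
   static const char prefix[] = "-march=";
   const size_t n = sizeof(prefix) - 1;
   return len >= n && memcmp(flags, prefix, n) == 0;
}

/* Models a compiler where the feature is already on by default. */
static int
sketch_accepts_anything(const char *flags, size_t len)
{
   (void) flags;
   (void) len;
   return 1;
}

/* Walks a comma-separated list of candidate flag sets, in order, and
 * returns the index of the first one the probe accepts, or -1 if none.
 * An empty first entry means "try with no extra flags". */
static int
sketch_first_working(const char *option_list, sketch_probe_fn probe)
{
   const char *p = option_list;

   for (int idx = 0;; idx++) {
      const char *comma = strchr(p, ',');
      size_t len = comma ? (size_t)(comma - p) : strlen(p);

      if (probe(p, len))
         return idx;
      if (!comma)
         return -1;
      p = comma + 1;
   }
}
```

With the AVX2 list from this patch, a compiler that rejects the individual feature flags falls through to the third entry, -march=core-avx2, while one that needs nothing extra stops at the empty first entry.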

v2: Pass preprocessor-check argument as true-state instead of
false-state for clarity.
v3: Reduce AVX2 define test to just __AVX2__.  Additional defines such as
__FMA__, __BMI2__, and __F16C__ appear to be inconsistently defined
w.r.t. their availability.
v4: Fix C++11 flags being added globally and add more logic to
swr_require_cxx_feature_flags

Cc: Tim Rowley 
Signed-off-by: Chuck Atkins 
---
 configure.ac| 73 +
 src/gallium/drivers/swr/Makefile.am |  4 +-
 2 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/configure.ac b/configure.ac
index cc9bc47..8321e8e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2330,6 +2330,45 @@ swr_llvm_check() {
 fi
 }
 
+swr_require_cxx_feature_flags() {
+feature_name="$1"
+preprocessor_test="$2"
+option_list="$3"
+output_var="$4"
+
+AC_MSG_CHECKING([whether $CXX supports $feature_name])
+AC_LANG_PUSH([C++])
+save_CXXFLAGS="$CXXFLAGS"
+save_IFS="$IFS"
+IFS=","
+found=0
+for opts in $option_list
+do
+unset IFS
+CXXFLAGS="$opts $save_CXXFLAGS"
+AC_COMPILE_IFELSE(
+[AC_LANG_PROGRAM(
+[   #if !($preprocessor_test)
+#error
+#endif
+])],
+[found=1; break],
+[])
+IFS=","
+done
+IFS="$save_IFS"
+CXXFLAGS="$save_CXXFLAGS"
+AC_LANG_POP([C++])
+if test $found -eq 1; then
+AC_MSG_RESULT([$opts])
+eval "$output_var=\$opts"
+return 0
+fi
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires $feature_name support])
+return 1
+}
+
 dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this 
block
 if test -n "$with_gallium_drivers"; then
 gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
@@ -2399,31 +2438,19 @@ if test -n "$with_gallium_drivers"; then
 xswr)
 swr_llvm_check "swr"
 
-AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
-SWR_AVX_CXXFLAGS="-mavx"
-SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
-
-AC_LANG_PUSH([C++])
-save_CXXFLAGS="$CXXFLAGS"
-CXXFLAGS="-std=c++11 $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([c++11 compiler support not 
detected])])
-CXXFLAGS="$save_CXXFLAGS"
-
-save_CXXFLAGS="$CXXFLAGS"
-CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([AVX compiler support not 
detected])])
-CXXFLAGS="$save_CXXFLAGS"
-
-save_CFLAGS="$CXXFLAGS"
-CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([AVX2 compiler support not 
detected])])
-CXXFLAGS="$save_CXXFLAGS"
-AC_LANG_POP([C++])
+swr_require_cxx_feature_flags "C++11" "__cplusplus >= 201103L" \
+",-std=c++11" \
+SWR_CXX11_CXXFLAGS
+AC_SUBST([SWR_CXX11_CXXFLAGS])
 
+swr_require_cxx_feature_flags "AVX" "defined(__AVX__)" \
+",-mavx,-march=core-avx" \
+SWR_AVX_CXXFLAGS
 AC_SUBST([SWR_AVX_CXXFLAGS])
+
+swr_require_cxx_feature_flags "AVX2" "defined(__AVX2__)" \
+",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2" \
+SWR_AVX2_CXXFLAGS
 AC_SUBST([SWR_AVX2_CXXFLAGS])
 
 HAVE_GALLIUM_SWR=yes
diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index d896154..210b203 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -22,7 +22,7 @@
 include Makefile.sources
 include $(top_srcdir)/src/gallium/Automake.inc
 
-AM_CXXFLAGS = $(GALLIUM_DRIVER_CFLAGS) -std=c++11
+AM_CXXFLAGS = $(GALLIUM_DRIVER_CFLAGS) $(SWR_CXX11_CXXFLAGS)
 
 noinst_LTLIBRARIES = libmesaswr.la
 
@@ -31,7 +31,7 @@ libmesaswr_la_SOURCES = $(LOADER_SOURCES)
 COMMON_CXXFLAGS = \
$(GALLIUM_DRIVER_CFLAGS) \
$(LLVM_CXXFLAGS) \
- 

Re: [Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier

2016-06-28 Thread Alejandro Piñeiro
Hi,


On 28/06/16 18:00, Ilia Mirkin wrote:
> On Tue, Jun 28, 2016 at 11:46 AM, Alejandro Piñeiro
>  wrote:
>> Fixes:
>> GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>>
>> On Haswell, Broadwell and Skylake (note that in order to execute
>> that test, it is needed to override GL and GLSL versions).
>>
>> I was not able to find a documentation reference that justifies it.
>> ---
>>
>> Having said that, I didn't find a documentation reference explicitly
>> mentioning that this is needed.
>>
>> Initially I thought that a flag was missing when calling
>> emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case
>> as far as I saw.  Then I noted that there is a gen6 workaround on that
>> code:
>>
>>  if (brw->gen == 6) {
>> /* Hardware workaround: SNB B-Spec says:
>>  *
>>  * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
>>  * Flush Enable =1, a PIPE_CONTROL with any non-zero
>>  * post-sync-op is required.
>>  */
>> brw_emit_post_sync_nonzero_flush(brw);
>>  }
>>
>> I tested calling that method for any gen, guessing that the workaround
>> might also be needed for other gens, and the test got fixed. But looking at
>> the documentation of other gens, I didn't find the need for this
>> workaround. For that reason I moved to using gen7_emit_cs_stall_flush, which
>> is less aggressive and fixes the test too. It seems that in order to
>> get a complete flush you need a cs stall flush with a
>> pipe_control_write. But again, I didn't find any reference in the PRMs
>> confirming it.
>>
>> Intuitively, this would be needed in brw_emit_mi_flush or even in
>> brw_emit_pipe_control_flush (the latter already includes some
>> gen-specific workarounds), but I preferred to keep it in the only place
>> that seems to need it for now.
>>
>> In addition to fixing that CTS test, it also fixes the
>> test I recently sent to the piglit list, which is not included on master
>> yet (acked for now):
>> https://lists.freedesktop.org/archives/piglit/2016-June/020055.html
>>
>> That piglit patch adds 48 parameter combinations for the basic
>> test. Without this mesa patch 5-6 subtests fail. With this patch all
>> of them pass. Tested on Haswell, Broadwell and Skylake too.
>>
>>  src/mesa/drivers/dri/i965/intel_tex.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_tex.c 
>> b/src/mesa/drivers/dri/i965/intel_tex.c
>> index cac33ac..e7459cd 100644
>> --- a/src/mesa/drivers/dri/i965/intel_tex.c
>> +++ b/src/mesa/drivers/dri/i965/intel_tex.c
>> @@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx)
>>  {
>> struct brw_context *brw = brw_context(ctx);
>>
>> +   gen7_emit_cs_stall_flush(brw);
>> brw_emit_mi_flush(brw);
> Without commenting on exactly what these do, what texture barrier *should*
> do is
>
> (1) wait for all previous draws to complete (since they may be in the
> process of filling caches with "old" data)
> (2) flush texture caches
>
> If you flush caches without waiting first, then a draw currently in
> progress may continue dirtying them with the "bad" data.

Thanks for the detailed answer. It is true that I had forgotten about (1)
entirely. I focused solely on the cache flush, and assumed that
something was missing there.

> As I said, however, I have no idea what either of the above functions
> *really* do, or what forms of parallelism are possible on intel hw.
> Hopefully the above comments will help someone with the proper
> knowledge evaluate whether this or a different change is necessary.

I really think that brw_emit_mi_flush fits your (2) (and
should not be modified as I suggested in my previous email).
Whether gen7_emit_cs_stall_flush is the best option for (1) is debatable.
Tomorrow I will keep checking, either to confirm it or to find a
better option. Obviously, if someone with prior knowledge chimes in, that
will be welcome.

Again, thanks for all the feedback, and the patience.

Best regards






Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling

2016-06-28 Thread Andy Furniss

Christian König wrote:

On 28.06.2016 at 20:11, Nayan Deshmukh wrote:

Hi Andy,

Thanks for testing the patches.

On Tue, Jun 28, 2016 at 11:26 PM, Andy Furniss wrote:

Nayan Deshmukh wrote:

Hi Christian and Andy,

I have sent new series of patches which takes care of the points
Christian pointed out.

I have also made some changes to make it more efficient than before.

Also due to a wrong message id, I have sent the messages as a new thread
instead of replying to this thread.


With the latest patches the artifacts are gone.

Sounds great.


Indeed, if nobody has any more suggestions I'm going to push this
version upstream tomorrow.


One issue I just tested: it doesn't work with sharpen or denoise; the
output is corrupted compared to hqscale=0.

I didn't try this before, so don't know if it ever worked.

It's OK with deint.



Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling

2016-06-28 Thread Nayan Deshmukh
Hi Christian,

I will send a new patch in which the calculation is done before using the
constant buffer.

Also, Grigori suggested that I use gather4 instead of a sampler to get the
textures. I tried using it, but I could not find any existing code to use as
a reference.

Regards,
Nayan.

On Tue, Jun 28, 2016 at 11:51 PM, Christian König 
wrote:

> On 28.06.2016 at 20:11, Nayan Deshmukh wrote:
>
> Hi Andy,
>
> Thanks for testing the patches.
>
> On Tue, Jun 28, 2016 at 11:26 PM, Andy Furniss 
> wrote:
>
>> Nayan Deshmukh wrote:
>>
>>> Hi Christian and Andy,
>>>
>>> I have sent new series of patches which takes care of the points
>>> Christian
>>> pointed out.
>>>
>>> I have also made some changes to make it more efficient than before.
>>>
>>> Also due to a wrong message id, I have sent the messages as a new  thread
>>> instead of replying to this thread.
>>>
>>
>> With the latest patches the artifacts are gone.
>
>
> Sounds great.
>
>
> Indeed, if nobody has any more suggestions I'm going to push this version
> upstream tomorrow.
>
> Regards,
> Christian.
>
>
>
>> There is still a slight offset on scaled up vids, this is better than
>> before though, as now there is no offset on unscaled vids.
>>
>>
> I also see a slight offset, but I am not able to find the reason for it.
> I have set the viewport similarly to the hqscaling=0 case.
>
> Regards,
> Nayan.
>
>
>
>
>
>
>


[Mesa-dev] [PATCH v3] swr: Refactor checks for compiler feature flags

2016-06-28 Thread Chuck Atkins
Encapsulate the test for which flags are needed to get a compiler to
support certain features.  Along with this, give various options to try
for AVX and AVX2 support.  Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different.  This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.

This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.

v2: Pass preprocessor-check argument as true-state instead of
false-state for clarity.
v3: Reduce AVX2 define test to just __AVX2__.  Additional defines such as
__FMA__, __BMI2__, and __F16C__ appear to be inconsistently defined
w.r.t. their availability.

Cc: Tim Rowley 
Signed-off-by: Chuck Atkins 
---
 configure.ac | 86 +++-
 1 file changed, 62 insertions(+), 24 deletions(-)

diff --git a/configure.ac b/configure.ac
index cc9bc47..92c35e8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2330,6 +2330,39 @@ swr_llvm_check() {
 fi
 }
 
+swr_cxx_feature_flags_check() {
+preprocessor_test="$1"
+option_list="$2"
+unset SWR_CXX_FEATURE_FLAGS
+AC_LANG_PUSH([C++])
+save_CXXFLAGS="$CXXFLAGS"
+save_IFS="$IFS"
+IFS=","
+found=0
+for opts in $option_list
+do
+unset IFS
+CXXFLAGS="$opts $save_CXXFLAGS"
+AC_COMPILE_IFELSE(
+[AC_LANG_PROGRAM(
+[   #if !($preprocessor_test)
+#error
+#endif
+])],
+[found=1; break],
+[])
+IFS=","
+done
+IFS="$save_IFS"
+CXXFLAGS="$save_CXXFLAGS"
+AC_LANG_POP([C++])
+if test $found -eq 1; then
+SWR_CXX_FEATURE_FLAGS="$opts"
+return 0
+fi
+return 1
+}
+
 dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this 
block
 if test -n "$with_gallium_drivers"; then
 gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
@@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then
 xswr)
 swr_llvm_check "swr"
 
-AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
-SWR_AVX_CXXFLAGS="-mavx"
-SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
-
-AC_LANG_PUSH([C++])
-save_CXXFLAGS="$CXXFLAGS"
-CXXFLAGS="-std=c++11 $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([c++11 compiler support not 
detected])])
-CXXFLAGS="$save_CXXFLAGS"
-
-save_CXXFLAGS="$CXXFLAGS"
-CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([AVX compiler support not 
detected])])
-CXXFLAGS="$save_CXXFLAGS"
-
-save_CFLAGS="$CXXFLAGS"
-CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([AVX2 compiler support not 
detected])])
-CXXFLAGS="$save_CXXFLAGS"
-AC_LANG_POP([C++])
-
+AC_MSG_CHECKING([whether $CXX supports c++11])
+if ! swr_cxx_feature_flags_check \
+"__cplusplus >= 201103L" \
+",-std=c++11"; then
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires C++11 support])
+fi
+AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS"
+
+AC_MSG_CHECKING([whether $CXX supports AVX])
+if ! swr_cxx_feature_flags_check \
+"defined(__AVX__)" \
+",-mavx,-march=core-avx"; then
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires AVX compiler support])
+fi
+AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+SWR_AVX_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
 AC_SUBST([SWR_AVX_CXXFLAGS])
+
+AC_MSG_CHECKING([whether $CXX supports AVX2])
+if ! swr_cxx_feature_flags_check \
+"defined(__AVX2__)" \
+",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2"; then
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires AVX2 compiler support])
+fi
+AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+SWR_AVX2_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
 AC_SUBST([SWR_AVX2_CXXFLAGS])
 
 HAVE_GALLIUM_SWR=yes
-- 
2.5.5


Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling

2016-06-28 Thread Christian König

On 28.06.2016 at 20:11, Nayan Deshmukh wrote:

Hi Andy,

Thanks for testing the patches.

On Tue, Jun 28, 2016 at 11:26 PM, Andy Furniss wrote:


Nayan Deshmukh wrote:

Hi Christian and Andy,

I have sent new series of patches which takes care of the points
Christian pointed out.

I have also made some changes to make it more efficient than before.

Also due to a wrong message id, I have sent the messages as a new thread
instead of replying to this thread.


With the latest patches the artifacts are gone.

Sounds great.


Indeed, if nobody has any more suggestions I'm going to push this 
version upstream tomorrow.


Regards,
Christian.


There is still a slight offset on scaled up vids, this is better than
before though, as now there is no offset on unscaled vids.


I also see a slight offset, but I am not able to find the reason for it.
I have set the viewport similarly to the hqscaling=0 case.

Regards,
Nayan.





Re: [Mesa-dev] [PATCH] mesa/st: Include nir.h for nir_shader symbol.

2016-06-28 Thread Matt Turner
On Tue, Jun 28, 2016 at 7:51 AM, Rob Clark  wrote:
> Already half of the world gets recompiled when you touch nir.h, and
> I'd rather not make that worse..

Exactly my thoughts.


Re: [Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags

2016-06-28 Thread Chuck Atkins
The only guaranteed way I can think of to ensure compiler support is to try
compiling source that calls one intrinsic from each of the used groups.  I
can see that being "more correct", but I also can't really think of a
situation where checking just the __AVX2__ define would pass and the build
would then fail.

- Chuck

On Tue, Jun 28, 2016 at 2:10 PM, Chuck Atkins 
wrote:

> So this seems to be different across versions as well.  It looks like
> __AVX__ and __AVX2__ are the only ones we can really count on being there.
> I can drop the second check to just __AVX2__.  I think it's redundant by
> chance though that all CPUs that supported AVX2 also seem to support the
> additional 2 instructions.
>
> - Chuck
>
> On Tue, Jun 28, 2016 at 1:52 PM, Rowley, Timothy O <
> timothy.o.row...@intel.com> wrote:
>
>>
>> > On Jun 28, 2016, at 8:24 AM, Chuck Atkins 
>> wrote:
>> >
>> > Encapsulate the test for which flags are needed to get a compiler to
>> > support certain features.  Along with this, give various options to try
>> > for AVX and AVX2 support.  Ideally we want to use specific instruction
>> > set feature flags, like -mavx2 for instance instead of -march=haswell,
>> > but the flags required for certain compilers are different.  This
>> > allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
>> > while the Intel compiler which doesn't support those flags can fall
>> > back to using -march=core-avx2.
>> >
>> > This addresses a bug where the Intel compiler will silently ignore the
>> > AVX2 instruction feature flags and then potentially fail to build.
>> >
>> > Cc: Tim Rowley 
>> > Signed-off-by: Chuck Atkins 
>> > ---
>> > configure.ac | 86
>> +++-
>> > 1 file changed, 62 insertions(+), 24 deletions(-)
>> >
>> > diff --git a/configure.ac b/configure.ac
>> > index cc9bc47..806850e 100644
>> > --- a/configure.ac
>> > +++ b/configure.ac
>> > @@ -2330,6 +2330,39 @@ swr_llvm_check() {
>> > fi
>> > }
>> >
>> > +swr_cxx_feature_flags_check() {
>> > +ifndef_test=$1
>> > +option_list="$2"
>> > +unset SWR_CXX_FEATURE_FLAGS
>> > +AC_LANG_PUSH([C++])
>> > +save_CXXFLAGS="$CXXFLAGS"
>> > +save_IFS="$IFS"
>> > +IFS=","
>> > +found=0
>> > +for opts in $option_list
>> > +do
>> > +unset IFS
>> > +CXXFLAGS="$opts $save_CXXFLAGS"
>> > +AC_COMPILE_IFELSE(
>> > +[AC_LANG_PROGRAM(
>> > +[   $ifndef_test
>> > +#error
>> > +#endif
>> > +])],
>> > +[found=1; break],
>> > +[])
>> > +IFS=","
>> > +done
>> > +IFS="$save_IFS"
>> > +CXXFLAGS="$save_CXXFLAGS"
>> > +AC_LANG_POP([C++])
>> > +if test $found -eq 1; then
>> > +SWR_CXX_FEATURE_FLAGS="$opts"
>> > +return 0
>> > +fi
>> > +return 1
>> > +}
>> > +
>> > dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after
>> this block
>> > if test -n "$with_gallium_drivers"; then
>> > gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
>> > @@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then
>> > xswr)
>> > swr_llvm_check "swr"
>> >
>> > -AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
>> > -SWR_AVX_CXXFLAGS="-mavx"
>> > -SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
>> > -
>> > -AC_LANG_PUSH([C++])
>> > -save_CXXFLAGS="$CXXFLAGS"
>> > -CXXFLAGS="-std=c++11 $CXXFLAGS"
>> > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
>> > -  [AC_MSG_ERROR([c++11 compiler support
>> not detected])])
>> > -CXXFLAGS="$save_CXXFLAGS"
>> > -
>> > -save_CXXFLAGS="$CXXFLAGS"
>> > -CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
>> > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
>> > -  [AC_MSG_ERROR([AVX compiler support not
>> detected])])
>> > -CXXFLAGS="$save_CXXFLAGS"
>> > -
>> > -save_CFLAGS="$CXXFLAGS"
>> > -CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
>> > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
>> > -  [AC_MSG_ERROR([AVX2 compiler support not
>> detected])])
>> > -CXXFLAGS="$save_CXXFLAGS"
>> > -AC_LANG_POP([C++])
>> > -
>> > +AC_MSG_CHECKING([whether $CXX supports c++11])
>> > +if ! swr_cxx_feature_flags_check \
>> > +"#if __cplusplus < 201103L" \
>> > +",-std=c++11"; then
>> > +AC_MSG_RESULT([no])
>> > +AC_MSG_ERROR([swr requires C++11 support])
>> > +fi
>> > +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
>> > +CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS"
>>
>> We don’t want to 

Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling

2016-06-28 Thread Nayan Deshmukh
Hi Andy,

Thanks for testing the patches.

On Tue, Jun 28, 2016 at 11:26 PM, Andy Furniss  wrote:

> Nayan Deshmukh wrote:
>
>> Hi Christian and Andy,
>>
>> I have sent new series of patches which takes care of the points Christian
>> pointed out.
>>
>> I have also made some changes to make it more efficient than before.
>>
>> Also due to a wrong message id, I have sent the messages as a new  thread
>> instead of replying to this thread.
>>
>
> With the latest patches the artifacts are gone.


Sounds great.


> There is still a slight offset on scaled up vids, this is better than
> before though, as now there is no offset on unscaled vids.
>
>
I also see a slight offset, but I am not able to find the reason for it.
I have set the viewport similarly to the hqscaling=0 case.

Regards,
Nayan.


Re: [Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags

2016-06-28 Thread Chuck Atkins
So this seems to be different across versions as well.  It looks like
__AVX__ and __AVX2__ are the only ones we can really count on being there.
I can drop the second check to just __AVX2__.  I think it's redundant by
chance though that all CPUs that supported AVX2 also seem to support the
additional 2 instructions.

- Chuck

On Tue, Jun 28, 2016 at 1:52 PM, Rowley, Timothy O <
timothy.o.row...@intel.com> wrote:

>
> > On Jun 28, 2016, at 8:24 AM, Chuck Atkins 
> wrote:
> >
> > Encapsulate the test for which flags are needed to get a compiler to
> > support certain features.  Along with this, give various options to try
> > for AVX and AVX2 support.  Ideally we want to use specific instruction
> > set feature flags, like -mavx2 for instance instead of -march=haswell,
> > but the flags required for certain compilers are different.  This
> > allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
> > while the Intel compiler which doesn't support those flags can fall
> > back to using -march=core-avx2.
> >
> > This addresses a bug where the Intel compiler will silently ignore the
> > AVX2 instruction feature flags and then potentially fail to build.
> >
> > Cc: Tim Rowley 
> > Signed-off-by: Chuck Atkins 
> > ---
> > configure.ac | 86
> +++-
> > 1 file changed, 62 insertions(+), 24 deletions(-)
> >
> > diff --git a/configure.ac b/configure.ac
> > index cc9bc47..806850e 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -2330,6 +2330,39 @@ swr_llvm_check() {
> > fi
> > }
> >
> > +swr_cxx_feature_flags_check() {
> > +ifndef_test=$1
> > +option_list="$2"
> > +unset SWR_CXX_FEATURE_FLAGS
> > +AC_LANG_PUSH([C++])
> > +save_CXXFLAGS="$CXXFLAGS"
> > +save_IFS="$IFS"
> > +IFS=","
> > +found=0
> > +for opts in $option_list
> > +do
> > +unset IFS
> > +CXXFLAGS="$opts $save_CXXFLAGS"
> > +AC_COMPILE_IFELSE(
> > +[AC_LANG_PROGRAM(
> > +[   $ifndef_test
> > +#error
> > +#endif
> > +])],
> > +[found=1; break],
> > +[])
> > +IFS=","
> > +done
> > +IFS="$save_IFS"
> > +CXXFLAGS="$save_CXXFLAGS"
> > +AC_LANG_POP([C++])
> > +if test $found -eq 1; then
> > +SWR_CXX_FEATURE_FLAGS="$opts"
> > +return 0
> > +fi
> > +return 1
> > +}
> > +
> > dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after
> this block
> > if test -n "$with_gallium_drivers"; then
> > gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
> > @@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then
> > xswr)
> > swr_llvm_check "swr"
> >
> > -AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
> > -SWR_AVX_CXXFLAGS="-mavx"
> > -SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
> > -
> > -AC_LANG_PUSH([C++])
> > -save_CXXFLAGS="$CXXFLAGS"
> > -CXXFLAGS="-std=c++11 $CXXFLAGS"
> > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> > -  [AC_MSG_ERROR([c++11 compiler support not
> detected])])
> > -CXXFLAGS="$save_CXXFLAGS"
> > -
> > -save_CXXFLAGS="$CXXFLAGS"
> > -CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
> > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> > -  [AC_MSG_ERROR([AVX compiler support not
> detected])])
> > -CXXFLAGS="$save_CXXFLAGS"
> > -
> > -save_CFLAGS="$CXXFLAGS"
> > -CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
> > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> > -  [AC_MSG_ERROR([AVX2 compiler support not
> detected])])
> > -CXXFLAGS="$save_CXXFLAGS"
> > -AC_LANG_POP([C++])
> > -
> > +AC_MSG_CHECKING([whether $CXX supports c++11])
> > +if ! swr_cxx_feature_flags_check \
> > +"#if __cplusplus < 201103L" \
> > +",-std=c++11"; then
> > +AC_MSG_RESULT([no])
> > +AC_MSG_ERROR([swr requires C++11 support])
> > +fi
> > +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
> > +CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS"
>
> We don’t want to globally override CXXFLAGS; AC_SUBST on a SWR_CXXFLAGS
> and using that in swr’s Makefile.am would be better.
>
> > +
> > +AC_MSG_CHECKING([whether $CXX supports AVX])
> > +if ! swr_cxx_feature_flags_check \
> > +"#ifndef __AVX__" \
> > +",-mavx,-march=core-avx"; then
> > +AC_MSG_RESULT([no])
> > +AC_MSG_ERROR([swr requires AVX compiler support])
> > +fi
> > +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
> > +  

Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling

2016-06-28 Thread Andy Furniss

Nayan Deshmukh wrote:

Hi Christian and Andy,

I have sent new series of patches which takes care of the points Christian
pointed out.

I have also made some changes to make it more efficient than before.

Also due to a wrong message id, I have sent the messages as a new  thread
instead of replying to this thread.


With the latest patches the artifacts are gone.

There is still a slight offset on scaled up vids, this is better than
before though, as now there is no offset on unscaled vids.




[Mesa-dev] [PATCH 4/4] radeonsi: enable distributed tess on multi-SE parts only

2016-06-28 Thread Marek Olšák
From: Marek Olšák 

ported from Vulkan
---
 src/gallium/drivers/radeonsi/si_pipe.c  | 4 
 src/gallium/drivers/radeonsi/si_pipe.h  | 1 +
 src/gallium/drivers/radeonsi/si_state_draw.c| 2 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c | 2 +-
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index f38ecc1..633d4bb 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -712,6 +712,10 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws)
sscreen->tess_offchip_block_dw_size =
sscreen->b.family == CHIP_HAWAII ? 4096 : 8192;
 
+   sscreen->has_distributed_tess =
+   sscreen->b.chip_class >= VI &&
+   sscreen->b.info.max_se >= 2;
+
sscreen->b.has_cp_dma = true;
sscreen->b.has_streamout = true;
pipe_mutex_init(sscreen->shader_parts_mutex);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index ee64ecc..3aff0ac 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -83,6 +83,7 @@ struct si_screen {
struct r600_common_screen   b;
unsignedgs_table_depth;
unsignedtess_offchip_block_dw_size;
+   boolhas_distributed_tess;
 
/* Whether shaders are monolithic (1-part) or separate (3-part). */
booluse_monolithic_shaders;
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 3558510..ce8def4 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -274,7 +274,7 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context 
*sctx,
partial_vs_wave = true;
 
/* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
-   if (sctx->b.chip_class >= VI) {
+   if (sctx->screen->has_distributed_tess) {
if (sctx->gs_shader.cso)
partial_es_wave = true;
else
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 9aa4a7c..4bcdeb6 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -300,7 +300,7 @@ static void si_set_tesseval_regs(struct si_screen *sscreen,
else
topology = V_028B6C_OUTPUT_TRIANGLE_CW;
 
-   if (sscreen->b.chip_class >= VI) {
+   if (sscreen->has_distributed_tess) {
if (sscreen->b.family == CHIP_FIJI ||
sscreen->b.family >= CHIP_POLARIS10)
distribution_mode = 
V_028B6C_DISTRIBUTION_MODE_TRAPEZOIDS;
-- 
2.7.4



Re: [Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags

2016-06-28 Thread Rowley, Timothy O

> On Jun 28, 2016, at 8:24 AM, Chuck Atkins  wrote:
> 
> Encapsulate the test for which flags are needed to get a compiler to
> support certain features.  Along with this, give various options to try
> for AVX and AVX2 support.  Ideally we want to use specific instruction
> set feature flags, like -mavx2 for instance instead of -march=haswell,
> but the flags required for certain compilers are different.  This
> allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
> while the Intel compiler which doesn't support those flags can fall
> back to using -march=core-avx2.
> 
> This addresses a bug where the Intel compiler will silently ignore the
> AVX2 instruction feature flags and then potentially fail to build.
> 
> Cc: Tim Rowley 
> Signed-off-by: Chuck Atkins 
> ---
> configure.ac | 86 +++-
> 1 file changed, 62 insertions(+), 24 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index cc9bc47..806850e 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2330,6 +2330,39 @@ swr_llvm_check() {
> fi
> }
> 
> +swr_cxx_feature_flags_check() {
> +ifndef_test=$1
> +option_list="$2"
> +unset SWR_CXX_FEATURE_FLAGS
> +AC_LANG_PUSH([C++])
> +save_CXXFLAGS="$CXXFLAGS"
> +save_IFS="$IFS"
> +IFS=","
> +found=0
> +for opts in $option_list
> +do
> +unset IFS
> +CXXFLAGS="$opts $save_CXXFLAGS"
> +AC_COMPILE_IFELSE(
> +[AC_LANG_PROGRAM(
> +[   $ifndef_test
> +#error
> +#endif
> +])],
> +[found=1; break],
> +[])
> +IFS=","
> +done
> +IFS="$save_IFS"
> +CXXFLAGS="$save_CXXFLAGS"
> +AC_LANG_POP([C++])
> +if test $found -eq 1; then
> +SWR_CXX_FEATURE_FLAGS="$opts"
> +return 0
> +fi
> +return 1
> +}
> +
> dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this 
> block
> if test -n "$with_gallium_drivers"; then
> gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
> @@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then
> xswr)
> swr_llvm_check "swr"
> 
> -AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
> -SWR_AVX_CXXFLAGS="-mavx"
> -SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
> -
> -AC_LANG_PUSH([C++])
> -save_CXXFLAGS="$CXXFLAGS"
> -CXXFLAGS="-std=c++11 $CXXFLAGS"
> -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> -  [AC_MSG_ERROR([c++11 compiler support not 
> detected])])
> -CXXFLAGS="$save_CXXFLAGS"
> -
> -save_CXXFLAGS="$CXXFLAGS"
> -CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
> -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> -  [AC_MSG_ERROR([AVX compiler support not 
> detected])])
> -CXXFLAGS="$save_CXXFLAGS"
> -
> -save_CFLAGS="$CXXFLAGS"
> -CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
> -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> -  [AC_MSG_ERROR([AVX2 compiler support not 
> detected])])
> -CXXFLAGS="$save_CXXFLAGS"
> -AC_LANG_POP([C++])
> -
> +AC_MSG_CHECKING([whether $CXX supports c++11])
> +if ! swr_cxx_feature_flags_check \
> +"#if __cplusplus < 201103L" \
> +",-std=c++11"; then
> +AC_MSG_RESULT([no])
> +AC_MSG_ERROR([swr requires C++11 support])
> +fi
> +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
> +CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS"

We don’t want to globally override CXXFLAGS; AC_SUBST on a SWR_CXXFLAGS and 
using that in swr’s Makefile.am would be better.

> +
> +AC_MSG_CHECKING([whether $CXX supports AVX])
> +if ! swr_cxx_feature_flags_check \
> +"#ifndef __AVX__" \
> +",-mavx,-march=core-avx"; then
> +AC_MSG_RESULT([no])
> +AC_MSG_ERROR([swr requires AVX compiler support])
> +fi
> +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
> +SWR_AVX_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
> AC_SUBST([SWR_AVX_CXXFLAGS])
> +
> +AC_MSG_CHECKING([whether $CXX supports AVX2])
> +if ! swr_cxx_feature_flags_check \
> +"#if 
> !(defined(__AVX2__)&(__FMA__)&(__BMI2__)&(__F16C__))” 
> \

Is there any standard that says these are defined if the compiler supports 
them?  With icc 16.0.3, the test falls into the #error path when it tries the 
fallback test of -march=core-avx2.

> +",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2"; then
> +AC_MSG_RESULT([no])
> +AC_MSG_ERROR([swr 

[Mesa-dev] [PATCH 1/4] radeonsi: use conformant line rasterization

2016-06-28 Thread Marek Olšák
From: Marek Olšák 

AA lines are not completely correct (see TODO), but everything else
should be.

+ 3 linestipple piglits
---
 src/gallium/drivers/radeon/cayman_msaa.c | 12 ++--
 src/gallium/drivers/radeon/r600d_common.h|  6 ++
 src/gallium/drivers/radeonsi/si_state.c  | 10 +-
 src/gallium/drivers/radeonsi/si_state_draw.c |  6 --
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeon/cayman_msaa.c 
b/src/gallium/drivers/radeon/cayman_msaa.c
index a9ec4c3..89c4937 100644
--- a/src/gallium/drivers/radeon/cayman_msaa.c
+++ b/src/gallium/drivers/radeon/cayman_msaa.c
@@ -200,6 +200,14 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs, 
int nr_samples,
 {
int setup_samples = nr_samples > 1 ? nr_samples :
overrast_samples > 1 ? overrast_samples : 0;
+   /* Required by OpenGL line rasterization.
+*
+* TODO: We should also enable perpendicular endcaps for AA lines,
+*   but that requires implementing line stippling in the pixel
+*   shader. SC can only do line stippling with axis-aligned
+*   endcaps.
+*/
+   unsigned sc_line_cntl = S_028BDC_DX10_DIAMOND_TEST_ENA(1);
 
if (setup_samples > 1) {
/* indexed by log2(nr_samples) */
@@ -215,7 +223,7 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs, 
int nr_samples,
util_logbase2(util_next_power_of_two(ps_iter_samples));
 
radeon_set_context_reg_seq(cs, CM_R_028BDC_PA_SC_LINE_CNTL, 2);
-   radeon_emit(cs, S_028BDC_LAST_PIXEL(1) |
+   radeon_emit(cs, sc_line_cntl |
S_028BDC_EXPAND_LINE_WIDTH(1)); /* 
CM_R_028BDC_PA_SC_LINE_CNTL */
radeon_emit(cs, S_028BE0_MSAA_NUM_SAMPLES(log_samples) |
S_028BE0_MAX_SAMPLE_DIST(max_dist[log_samples]) |
@@ -242,7 +250,7 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs, 
int nr_samples,
}
} else {
radeon_set_context_reg_seq(cs, CM_R_028BDC_PA_SC_LINE_CNTL, 2);
-   radeon_emit(cs, S_028BDC_LAST_PIXEL(1)); /* 
CM_R_028BDC_PA_SC_LINE_CNTL */
+   radeon_emit(cs, sc_line_cntl); /* CM_R_028BDC_PA_SC_LINE_CNTL */
radeon_emit(cs, 0); /* CM_R_028BE0_PA_SC_AA_CONFIG */
 
radeon_set_context_reg(cs, CM_R_028804_DB_EQAA,
diff --git a/src/gallium/drivers/radeon/r600d_common.h 
b/src/gallium/drivers/radeon/r600d_common.h
index e50de96..6f534b3 100644
--- a/src/gallium/drivers/radeon/r600d_common.h
+++ b/src/gallium/drivers/radeon/r600d_common.h
@@ -203,6 +203,12 @@
 #define   S_028BDC_LAST_PIXEL(x)   (((unsigned)(x) & 0x1) 
<< 10)
 #define   G_028BDC_LAST_PIXEL(x)   (((x) >> 10) & 0x1)
 #define   C_028BDC_LAST_PIXEL  0xFBFF
+#define   S_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((unsigned)(x) & 0x1) 
<< 11)
+#define   G_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((x) >> 11) & 0x1)
+#define   C_028BDC_PERPENDICULAR_ENDCAP_ENA0xF7FF
+#define   S_028BDC_DX10_DIAMOND_TEST_ENA(x)(((unsigned)(x) & 0x1) 
<< 12)
+#define   G_028BDC_DX10_DIAMOND_TEST_ENA(x)(((x) >> 12) & 0x1)
+#define   C_028BDC_DX10_DIAMOND_TEST_ENA   0xEFFF
 #define CM_R_028BE0_PA_SC_AA_CONFIG  0x28be0
 #define   S_028BE0_MSAA_NUM_SAMPLES(x)  (((unsigned)(x) & 0x7) 
<< 0)
 #define   S_028BE0_AA_MASK_CENTROID_DTMN(x)(((unsigned)(x) & 0x1) 
<< 4)
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 0a2fdbf..b21fa5c 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -3805,7 +3805,15 @@ static void si_init_config(struct si_context *sctx)
   S_028034_BR_X(16384) | S_028034_BR_Y(16384));
 
si_pm4_set_reg(pm4, R_02820C_PA_SC_CLIPRECT_RULE, 0x);
-   si_pm4_set_reg(pm4, R_028230_PA_SC_EDGERULE, 0x);
+   si_pm4_set_reg(pm4, R_028230_PA_SC_EDGERULE,
+  S_028230_ER_TRI(0xA) |
+  S_028230_ER_POINT(0xA) |
+  S_028230_ER_RECT(0xA) |
+  /* Required by DX10_DIAMOND_TEST_ENA: */
+  S_028230_ER_LINE_LR(0x1A) |
+  S_028230_ER_LINE_RL(0x26) |
+  S_028230_ER_LINE_TB(0xA) |
+  S_028230_ER_LINE_BT(0xA));
/* PA_SU_HARDWARE_SCREEN_OFFSET must be 0 due to hw bug on SI */
si_pm4_set_reg(pm4, R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0);
si_pm4_set_reg(pm4, R_028820_PA_CL_NANINF_CNTL, 0);
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 5f866d5..b9a7c14 100644
--- 
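
The S_/G_/C_ register macros added in r600d_common.h follow a set/get/clear pattern. A small sketch of how such a field is used, with hypothetical stand-in macro names; bit positions 10 and 12 are taken from the patch, while the full 32-bit clear mask is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins modelled on the S_/G_/C_ pattern. Bit 12 matches
 * DX10_DIAMOND_TEST_ENA and bit 10 matches LAST_PIXEL in the patch; the
 * 32-bit clear mask is an assumption. */
#define S_DIAMOND_TEST_ENA(x)  (((uint32_t)(x) & 0x1) << 12)  /* set   */
#define G_DIAMOND_TEST_ENA(x)  (((x) >> 12) & 0x1)            /* get   */
#define C_DIAMOND_TEST_ENA     0xFFFFEFFFu                     /* clear */
#define S_LAST_PIXEL(x)        (((uint32_t)(x) & 0x1) << 10)

/* Swap the LAST_PIXEL bit for the diamond-test bit, as the patch does for
 * PA_SC_LINE_CNTL, while leaving any other bits in 'reg' alone. */
static uint32_t line_cntl_use_diamond_test(uint32_t reg)
{
    reg &= ~S_LAST_PIXEL(1);       /* drop LAST_PIXEL */
    reg &= C_DIAMOND_TEST_ENA;     /* clear the field before setting it */
    reg |= S_DIAMOND_TEST_ENA(1);  /* enable the diamond test */
    return reg;
}
```

The S_ macro builds the value written to the register, G_ extracts the field from a value read back, and C_ is the mask that clears only that field.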

[Mesa-dev] [PATCH 3/4] radeonsi: set optimal VGT_HS_OFFCHIP_PARAM

2016-06-28 Thread Marek Olšák
From: Marek Olšák 

ported from Vulkan
---
 src/gallium/drivers/radeonsi/si_pipe.c  |  6 +++
 src/gallium/drivers/radeonsi/si_pipe.h  |  1 +
 src/gallium/drivers/radeonsi/si_state.h |  2 -
 src/gallium/drivers/radeonsi/si_state_draw.c|  5 ++-
 src/gallium/drivers/radeonsi/si_state_shaders.c | 49 -
 5 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index d835681..f38ecc1 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -706,6 +706,12 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws)
if (!debug_get_bool_option("RADEON_DISABLE_PERFCOUNTERS", false))
si_init_perfcounters(sscreen);
 
+   /* Hawaii has a bug with offchip buffers > 256 that can be worked
+* around by setting 4K granularity.
+*/
+   sscreen->tess_offchip_block_dw_size =
+   sscreen->b.family == CHIP_HAWAII ? 4096 : 8192;
+
sscreen->b.has_cp_dma = true;
sscreen->b.has_streamout = true;
pipe_mutex_init(sscreen->shader_parts_mutex);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index d181905..ee64ecc 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -82,6 +82,7 @@ struct u_suballocator;
 struct si_screen {
struct r600_common_screen   b;
unsignedgs_table_depth;
+   unsignedtess_offchip_block_dw_size;
 
/* Whether shaders are monolithic (1-part) or separate (3-part). */
booluse_monolithic_shaders;
diff --git a/src/gallium/drivers/radeonsi/si_state.h 
b/src/gallium/drivers/radeonsi/si_state.h
index 2e4923d..9361849 100644
--- a/src/gallium/drivers/radeonsi/si_state.h
+++ b/src/gallium/drivers/radeonsi/si_state.h
@@ -40,8 +40,6 @@
 #define SI_NUM_IMAGES  16
 #define SI_NUM_SHADER_BUFFERS  16
 
-#define SI_TESS_OFFCHIP_BLOCK_SIZE (8192 * 4)
-
 struct si_screen;
 struct si_shader;
 
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index b9a7c14..3558510 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -147,8 +147,9 @@ static void si_emit_derived_tess_state(struct si_context 
*sctx,
   
output_patch_size));
 
/* Make sure the output data fits in the offchip buffer */
-   *num_patches = MIN2(*num_patches, SI_TESS_OFFCHIP_BLOCK_SIZE /
- output_patch_size);
+   *num_patches = MIN2(*num_patches,
+   (sctx->screen->tess_offchip_block_dw_size * 4) /
+   output_patch_size);
 
/* Not necessary for correctness, but improves performance. The
 * specific value is taken from the proprietary driver.
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 89490bd..9aa4a7c 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1798,9 +1798,38 @@ static bool si_update_spi_tmpring_size(struct si_context 
*sctx)
 
 static void si_init_tess_factor_ring(struct si_context *sctx)
 {
-   unsigned offchip_blocks = sctx->b.chip_class >= CIK ? 256 : 64;
-   assert(!sctx->tf_ring);
+   bool double_offchip_buffers = sctx->b.chip_class >= CIK;
+   unsigned max_offchip_buffers_per_se = double_offchip_buffers ? 128 : 64;
+   unsigned max_offchip_buffers = max_offchip_buffers_per_se *
+  sctx->screen->b.info.max_se;
+   unsigned offchip_granularity;
+
+   switch (sctx->screen->tess_offchip_block_dw_size) {
+   default:
+   assert(0);
+   /* fall through */
+   case 8192:
+   offchip_granularity = V_03093C_X_8K_DWORDS;
+   break;
+   case 4096:
+   offchip_granularity = V_03093C_X_4K_DWORDS;
+   break;
+   }
 
+   switch (sctx->b.chip_class) {
+   case SI:
+   max_offchip_buffers = MIN2(max_offchip_buffers, 126);
+   break;
+   case CIK:
+   max_offchip_buffers = MIN2(max_offchip_buffers, 508);
+   break;
+   case VI:
+   default:
+   max_offchip_buffers = MIN2(max_offchip_buffers, 512);
+   break;
+   }
+
+   assert(!sctx->tf_ring);
sctx->tf_ring = pipe_buffer_create(sctx->b.b.screen, PIPE_BIND_CUSTOM,
   PIPE_USAGE_DEFAULT,
   32768 * sctx->screen->b.info.max_se);
@@ -1812,8 
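
A standalone sketch of the arithmetic in this patch, to make the limits easier to check by hand. The constants (4096/8192 dwords, 64/128 buffers per SE, caps of 126/508/512) come from the diff; the enum and function names are hypothetical, and treating output_patch_size as a byte count is an inference from the "* 4" dword conversion.

```c
#include <assert.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the driver's chip_class enum. */
enum chip_class { SI, CIK, VI };

/* Block size in dwords: 4K on Hawaii, 8K elsewhere (values from the patch). */
static unsigned tess_offchip_block_dw_size(int is_hawaii)
{
    return is_hawaii ? 4096 : 8192;
}

/* Clamp the patch count so the output patches fit in one offchip block.
 * block_dw_size is in dwords; output_patch_size is assumed to be bytes. */
static unsigned clamp_num_patches(unsigned num_patches,
                                  unsigned block_dw_size,
                                  unsigned output_patch_size)
{
    assert(output_patch_size > 0);
    return MIN2(num_patches, (block_dw_size * 4) / output_patch_size);
}

/* Per-SE limit scaled by the SE count, then capped per generation. */
static unsigned max_offchip_buffers(enum chip_class chip, unsigned max_se)
{
    unsigned per_se = chip >= CIK ? 128 : 64;
    unsigned total = per_se * max_se;

    switch (chip) {
    case SI:  return MIN2(total, 126);
    case CIK: return MIN2(total, 508);
    case VI:
    default:  return MIN2(total, 512);
    }
}
```

For example, with 1024-byte output patches an 8K-dword block holds at most 32 patches, while the 4K-dword Hawaii setting holds 16.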

[Mesa-dev] [PATCH 2/4] radeonsi: enable CU0 in each SE for LS-HS execution

2016-06-28 Thread Marek Olšák
From: Marek Olšák 

Offchip-only tessellation allows this.
---
 src/gallium/drivers/radeonsi/si_state.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index b21fa5c..54febce 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -3829,6 +3829,7 @@ static void si_init_config(struct si_context *sctx)
si_pm4_set_reg(pm4, R_028408_VGT_INDX_OFFSET, 0);
 
if (sctx->b.chip_class >= CIK) {
+   si_pm4_set_reg(pm4, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, 
S_00B51C_CU_EN(0x));
si_pm4_set_reg(pm4, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, 0);
si_pm4_set_reg(pm4, R_00B31C_SPI_SHADER_PGM_RSRC3_ES, 
S_00B31C_CU_EN(0x));
si_pm4_set_reg(pm4, R_00B21C_SPI_SHADER_PGM_RSRC3_GS, 
S_00B21C_CU_EN(0x));
@@ -3841,7 +3842,6 @@ static void si_init_config(struct si_context *sctx)
 *
 * LATE_ALLOC_VS = 2 is the highest safe number.
 */
-   si_pm4_set_reg(pm4, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, 
S_00B51C_CU_EN(0x));
si_pm4_set_reg(pm4, R_00B118_SPI_SHADER_PGM_RSRC3_VS, 
S_00B118_CU_EN(0x));
si_pm4_set_reg(pm4, R_00B11C_SPI_SHADER_LATE_ALLOC_VS, 
S_00B11C_LIMIT(2));
} else {
@@ -3850,7 +3850,6 @@ static void si_init_config(struct si_context *sctx)
 * - VS can't execute on CU0.
 * - If HS writes outputs to LDS, LS can't execute on 
CU0.
 */
-   si_pm4_set_reg(pm4, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, 
S_00B51C_CU_EN(0xfffe));
si_pm4_set_reg(pm4, R_00B118_SPI_SHADER_PGM_RSRC3_VS, 
S_00B118_CU_EN(0xfffe));
si_pm4_set_reg(pm4, R_00B11C_SPI_SHADER_LATE_ALLOC_VS, 
S_00B11C_LIMIT(31));
}
-- 
2.7.4



[Mesa-dev] [PATCH 1/3] i965: Refactor intel_get_param()

2016-06-28 Thread Chad Versace
Replace the function's __DRIscreen parameter with struct intel_screen.
The callsites feel more natural that way.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 869119b..b693c45 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -970,7 +970,7 @@ static const __DRIextension *intelRobustScreenExtensions[] 
= {
 };
 
 static int
-intel_get_param(__DRIscreen *psp, int param, int *value)
+intel_get_param(struct intel_screen *screen, int param, int *value)
 {
int ret;
struct drm_i915_getparam gp;
@@ -979,7 +979,8 @@ intel_get_param(__DRIscreen *psp, int param, int *value)
gp.param = param;
gp.value = value;
 
-   ret = drmCommandWriteRead(psp->fd, DRM_I915_GETPARAM, &gp, sizeof(gp));
+   ret = drmCommandWriteRead(screen->driScrnPriv->fd,
+ DRM_I915_GETPARAM, &gp, sizeof(gp));
if (ret < 0 && ret != -EINVAL)
 _mesa_warning(NULL, "drm_i915_getparam: %d", ret);
 
@@ -987,10 +988,10 @@ intel_get_param(__DRIscreen *psp, int param, int *value)
 }
 
 static bool
-intel_get_boolean(__DRIscreen *psp, int param)
+intel_get_boolean(struct intel_screen *screen, int param)
 {
int value = 0;
-   return (intel_get_param(psp, param, &value) == 0) && value;
+   return (intel_get_param(screen, param, &value) == 0) && value;
 }
 
 static void
@@ -1125,12 +1126,12 @@ intel_detect_sseu(struct intel_screen *intelScreen)
intelScreen->subslice_total = -1;
intelScreen->eu_total = -1;
 
-   ret = intel_get_param(intelScreen->driScrnPriv, I915_PARAM_SUBSLICE_TOTAL,
+   ret = intel_get_param(intelScreen, I915_PARAM_SUBSLICE_TOTAL,
  &intelScreen->subslice_total);
if (ret < 0 && ret != -EINVAL)
   goto err_out;
 
-   ret = intel_get_param(intelScreen->driScrnPriv,
+   ret = intel_get_param(intelScreen,
  I915_PARAM_EU_TOTAL, &intelScreen->eu_total);
if (ret < 0 && ret != -EINVAL)
   goto err_out;
@@ -1167,7 +1168,7 @@ intel_init_bufmgr(struct intel_screen *intelScreen)
 
drm_intel_bufmgr_gem_enable_fenced_relocs(intelScreen->bufmgr);
 
-   if (!intel_get_boolean(spriv, I915_PARAM_HAS_RELAXED_DELTA)) {
+   if (!intel_get_boolean(intelScreen, I915_PARAM_HAS_RELAXED_DELTA)) {
   fprintf(stderr, "[%s: %u] Kernel 2.6.39 required.\n", __func__, 
__LINE__);
   return false;
}
-- 
2.9.0.rc2



[Mesa-dev] [PATCH 0/3] i965: Cleanups for DRM_IOCTL_I915_GETPARAM

2016-06-28 Thread Chad Versace
I've begun investigating Android sync fds, whose support will be
advertised with a new i915 getparam. While investigating the new
feature, I wrote this little cleanup series.

Chad Versace (3):
  i965: Refactor intel_get_param()
  i965: Use drmIoctl for DRM_I915_GETPARAM
  i965: Use intel_get_param() more often

 src/mesa/drivers/dri/i965/intel_screen.c | 30 --
 1 file changed, 12 insertions(+), 18 deletions(-)

-- 
2.9.0.rc2



[Mesa-dev] [PATCH 2/3] i965: Use drmIoctl for DRM_I915_GETPARAM

2016-06-28 Thread Chad Versace
Stop using drmCommandWriteRead for such a simple ioctl.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index b693c45..f7f806e 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -979,8 +979,7 @@ intel_get_param(struct intel_screen *screen, int param, int 
*value)
gp.param = param;
gp.value = value;
 
-   ret = drmCommandWriteRead(screen->driScrnPriv->fd,
- DRM_I915_GETPARAM, &gp, sizeof(gp));
+   ret = drmIoctl(screen->driScrnPriv->fd, DRM_IOCTL_I915_GETPARAM, &gp);
if (ret < 0 && ret != -EINVAL)
 _mesa_warning(NULL, "drm_i915_getparam: %d", ret);
 
-- 
2.9.0.rc2



[Mesa-dev] [PATCH 3/3] i965: Use intel_get_param() more often

2016-06-28 Thread Chad Versace
Replace some open-coded ioctls with intel_get_param().

This is just a cleanup. No change in behavior.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index f7f806e..4194fd6 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1604,12 +1604,10 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
  (ret != -1 || errno != EINVAL);
}
 
-   struct drm_i915_getparam getparam;
-   getparam.param = I915_PARAM_CMD_PARSER_VERSION;
-   getparam.value = &intelScreen->cmd_parser_version;
-   const int ret = drmIoctl(psp->fd, DRM_IOCTL_I915_GETPARAM, &getparam);
-   if (ret == -1)
+   if (intel_get_param(intelScreen, I915_PARAM_CMD_PARSER_VERSION,
+   &intelScreen->cmd_parser_version) < 0) {
   intelScreen->cmd_parser_version = 0;
+   }
 
/* Haswell requires command parser version 6 in order to write to the
 * MI_MATH GPR registers, and version 7 in order to use
@@ -1629,12 +1627,8 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
intelScreen->program_id = 1;
 
if (intelScreen->devinfo->has_resource_streamer) {
-  int val = -1;
-  getparam.param = I915_PARAM_HAS_RESOURCE_STREAMER;
-  getparam.value = &val;
-
-  drmIoctl(psp->fd, DRM_IOCTL_I915_GETPARAM, &getparam);
-  intelScreen->has_resource_streamer = val > 0;
+  intelScreen->has_resource_streamer =
+intel_get_boolean(intelScreen, I915_PARAM_HAS_RESOURCE_STREAMER);
}
 
return (const __DRIconfig**) intel_screen_make_configs(psp);
-- 
2.9.0.rc2
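
The point of this series is that a single wrapper owns the request struct and the error convention, and a boolean helper sits on top of it. A self-contained sketch of that shape follows; everything prefixed fake_ (the struct, the ioctl, the parameter IDs) is a hypothetical stand-in so the example builds without libdrm or the i915 headers, and it is not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real driver uses struct drm_i915_getparam
 * and drmIoctl() from libdrm. */
struct fake_getparam { int param; int *value; };
struct fake_screen   { int fd; };

enum { FAKE_PARAM_HAS_FEATURE = 1, FAKE_PARAM_VERSION = 2 };

/* Mock "kernel": knows two parameters and rejects everything else. */
static int fake_ioctl(int fd, struct fake_getparam *gp)
{
    (void)fd;
    switch (gp->param) {
    case FAKE_PARAM_HAS_FEATURE: *gp->value = 1; return 0;
    case FAKE_PARAM_VERSION:     *gp->value = 7; return 0;
    default:                     return -EINVAL;
    }
}

/* One wrapper fills in the request and returns 0 or a negative errno. */
static int get_param(struct fake_screen *screen, int param, int *value)
{
    struct fake_getparam gp = { .param = param, .value = value };
    return fake_ioctl(screen->fd, &gp);
}

/* Boolean helper: a failed query and a zero value both read as "no". */
static bool get_boolean(struct fake_screen *screen, int param)
{
    int value = 0;
    return get_param(screen, param, &value) == 0 && value;
}
```

Callers that only need a yes/no answer use the boolean helper, and callers that need the value check the return code against 0, which mirrors how the two converted call sites in the diff are written.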



Re: [Mesa-dev] [PATCH] Make single-buffered GLES representation internally consistent

2016-06-28 Thread Gurchetan Singh
Hi Ilia,

Setting it correctly from the start is messier.  At least in my use case, we
know the context type from EGL_RENDERABLE_TYPE before the framebuffer is
created.  We would need to add the context information to the visual used
by _mesa_initialize_window_framebuffer.  That requires including
main/mtypes.h in the EGL part of the source tree, which nobody else does
and leads to build system issues.

We could also make the change in _mesa_make_current instead of get.c, but
once again we'll be flipping the original value.  I'll send a modified
patch shortly unless somebody has any other ideas.

On Mon, Jun 27, 2016 at 7:55 PM, Ilia Mirkin  wrote:

> On Mon, Jun 27, 2016 at 6:30 PM, Gurchetan Singh
>  wrote:
> > Hi Ilia,
> >
> > The changes for get.c were prompted by the es3fIntegerStateQueryTests
> (see
> > modules/gles3/functional/es3fIntegerStateQueryTests.cpp in the dEQP
> tree).
> > Specifically, these few lines:
> >
> >>> const GLint validInitialValues[] = {GL_BACK, GL_NONE};
> >>> m_verifier->verifyIntegerAnyOf(m_testCtx, GL_READ_BUFFER,
> >>> validInitialValues, DE_LENGTH_OF_ARRAY(validInitialValues));
> >>> expectError(GL_NO_ERROR);
> >
> > We initially set ColorReadBuffer to GL_FRONT in
> > _mesa_initialize_window_framebuffer for single-buffered configs.
>
> So ... could we initialize it to GL_BACK for GLES and avoid this pain?
> Unfortunately I have no idea what the implications of that would be.
>
> >
> > We could also make sure the context is single-buffered in get.c to
> further
> > avoid bugs.  Let me know if that works for you and I'll send a modified
> > patch.
> >
> > I do agree it is a bit hacky ... I'd definitely be interested in
> alternative
> > solutions.
>
> If you're flipping the value in the getter, you might as well set that
> to be the value from the very beginning. However I don't know what the
> effects of that are.
>
>   -ilia
>


Re: [Mesa-dev] [PATCH] st/mesa: set the new pipe_surface::alpha_one field for RGB surfaces

2016-06-28 Thread Brian Paul
As I wrote in my other message yesterday, I was going to disable support 
in our VMware driver for the RGBX8 formats because of difficulties with 
ARB_copy_image functionality.  This led to the issue of blending to RGBA 
surfaces as if they were RGBX.


I guess we could set pipe_surface::format = RGBX for this case.  Though 
this would probably lead to some special-case code in the driver(s) (but 
probably comparable to what I had done for 'alpha_one'.)


The issue is when a driver says it can't support RGBX formats, it would 
still need to be prepared to handle some of those formats in the 
pipe_surface::format field.  As it is now, when a driver says it can't 
support a particular format, it really means it and is probably 
unprepared to see it anywhere.  So, if we set pipe_surface::format = 
RGBX in the state tracker, there's some regression risk across all 
drivers.  The flag I proposed wouldn't have that risk.


Anyway, I think I've found work-arounds in our driver to keep RGBX 
support so that this patch isn't needed after all.  I just have to 
finish more piglit testing.


-Brian


On 06/28/2016 09:11 AM, Marek Olšák wrote:

I guess you need this because your driver doesn't support LUMINANCE
and st/mesa selects RGBA, right? In that case, you can just set RGBX
in pipe_surface::format and you don't need another flag.

It would be better to select RGBX at renderbuffer creation, but doing
it later is fine as well.

Marek

On Fri, Jun 24, 2016 at 4:43 PM, Brian Paul  wrote:

This indicates the alpha channel of the surface should always be one.
Drivers can use this to adjust blending terms when needed.

v2: also check for R, RG, LUMINANCE surfaces, per Ilia
---
  src/mesa/state_tracker/st_cb_fbo.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
b/src/mesa/state_tracker/st_cb_fbo.c
index 9801b1f..843ff83 100644
--- a/src/mesa/state_tracker/st_cb_fbo.c
+++ b/src/mesa/state_tracker/st_cb_fbo.c
@@ -216,6 +216,11 @@ st_renderbuffer_alloc_storage(struct gl_context * ctx,
return FALSE;

 u_surface_default_template(&surf_tmpl, strb->texture);
+   surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB ||
+  strb->Base._BaseFormat == GL_RG ||
+  strb->Base._BaseFormat == GL_R ||
+  strb->Base._BaseFormat == GL_LUMINANCE);
+
 strb->surface = pipe->create_surface(pipe,
  strb->texture,
  &surf_tmpl);
@@ -463,6 +468,10 @@ st_update_renderbuffer_surface(struct st_context *st,
/* create a new pipe_surface */
struct pipe_surface surf_tmpl;
memset(&surf_tmpl, 0, sizeof(surf_tmpl));
+  surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB ||
+ strb->Base._BaseFormat == GL_RG ||
+ strb->Base._BaseFormat == GL_R ||
+ strb->Base._BaseFormat == GL_LUMINANCE);
surf_tmpl.format = format;
surf_tmpl.u.tex.level = level;
surf_tmpl.u.tex.first_layer = first_layer;
--
1.9.1



Re: [Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier

2016-06-28 Thread Ilia Mirkin
On Tue, Jun 28, 2016 at 11:46 AM, Alejandro Piñeiro
 wrote:
> Fixes:
> GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>
> On Haswell, Broadwell and Skylake (note that the GL and GLSL versions
> must be overridden in order to run that test).
>
> I was not able to find a documentation reference that justifies it.
> ---
>
> Having said that, I didn't find a documentation reference explicitly
> mentioning that this is needed.
>
> Initially I thought that a flag was missing when calling
> emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case
> as far as I saw.  Then I noted that there is a gen6 workaround on that
> code:
>
>  if (brw->gen == 6) {
> /* Hardware workaround: SNB B-Spec says:
>  *
>  * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
>  * Flush Enable =1, a PIPE_CONTROL with any non-zero
>  * post-sync-op is required.
>  */
> brw_emit_post_sync_nonzero_flush(brw);
>  }
>
> I tested calling that method for any gen, guessing that the workaround
> might also be needed for other gens, and the test got fixed. But looking at
> the documentation of other gens, I didn't find the need for this
> workaround. For that reason I moved to gen7_emit_cs_stall_flush, which is
> less aggressive and fixes the test too. It seems that in order to
> get a complete flush you need a cs stall flush with a
> pipe_control_write. But again, I didn't find any reference in the PRMs
> confirming it.
>
> Intuitively, this would be needed in brw_emit_mi_flush or even in
> brw_emit_pipe_control_flush (this one already includes some
> gen-specific workarounds), but I preferred to keep it in the only place
> that seems to need it for now.
>
> In addition to fixing that CTS test, it also fixes the
> test I recently sent to the piglit list, which is not included on master
> yet (acked for now):
> https://lists.freedesktop.org/archives/piglit/2016-June/020055.html
>
> That piglit patch adds 48 parameter combinations for the basic
> test. Without this mesa patch 5-6 subtests fail. With this patch all
> of them pass. Tested on Haswell, Broadwell and Skylake too.
>
>  src/mesa/drivers/dri/i965/intel_tex.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_tex.c 
> b/src/mesa/drivers/dri/i965/intel_tex.c
> index cac33ac..e7459cd 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex.c
> @@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx)
>  {
> struct brw_context *brw = brw_context(ctx);
>
> +   gen7_emit_cs_stall_flush(brw);
> brw_emit_mi_flush(brw);

Without commenting on exactly what these do, what texture barrier *should* do is

(1) wait for all previous draws to complete (since they may be in the
process of filling caches with "old" data)
(2) flush texture caches

If you flush caches without waiting first, then a draw currently in
progress may continue dirtying them with the "bad" data.

As I said, however, I have no idea what either of the above functions
*really* do, or what forms of parallelism are possible on intel hw.
Hopefully the above comments will help someone with the proper
knowledge evaluate whether this or a different change is necessary.

  -ilia
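
The ordering argument above (wait for draws, then flush) can be illustrated with a toy model. This is only a sketch of the reasoning and says nothing about how Intel hardware or the two functions in the patch behave; the struct and all function names are hypothetical. One texel lives in memory, a cache holds a copy, and one earlier draw is still in flight and will refill the cache with the old value when it retires.

```c
#include <assert.h>

/* Toy model, not real hardware. */
struct toy_gpu {
    int memory;          /* up-to-date texel value */
    int cache;           /* cached texel value */
    int cache_valid;     /* nonzero if 'cache' may be used */
    int draw_in_flight;  /* nonzero while an earlier draw is pending */
    int stale_value;     /* what the pending draw puts in the cache */
};

/* The pending draw finishes and dirties the cache with old data. */
static void retire_pending_draw(struct toy_gpu *g)
{
    if (g->draw_in_flight) {
        g->cache = g->stale_value;
        g->cache_valid = 1;
        g->draw_in_flight = 0;
    }
}

static void flush_texture_cache(struct toy_gpu *g)
{
    g->cache_valid = 0;
}

/* Steps (1) and (2) in the suggested order. */
static void barrier_wait_then_flush(struct toy_gpu *g)
{
    retire_pending_draw(g);
    flush_texture_cache(g);
}

/* Flush without waiting: the draw retires after the flush. */
static void barrier_flush_only(struct toy_gpu *g)
{
    flush_texture_cache(g);
    retire_pending_draw(g);
}

/* The next draw samples through the cache, refilling it on a miss. */
static int sample_texel(struct toy_gpu *g)
{
    if (!g->cache_valid) {
        g->cache = g->memory;
        g->cache_valid = 1;
    }
    return g->cache;
}
```

In this model the wait-then-flush barrier makes the next sample read the new value from memory, while the flush-only barrier leaves the stale value in the cache.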


[Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier

2016-06-28 Thread Alejandro Piñeiro
Fixes:
GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass

On Haswell, Broadwell and Skylake (note that the GL and GLSL versions
must be overridden in order to run that test).

I was not able to find a documentation reference that justifies it.
---

Having said that, I didn't find a documentation reference explicitly
mentioning that this is needed.

Initially I thought that a flag was missing when calling
emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case
as far as I saw.  Then I noted that there is a gen6 workaround on that
code:

 if (brw->gen == 6) {
/* Hardware workaround: SNB B-Spec says:
 *
 * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
 * Flush Enable =1, a PIPE_CONTROL with any non-zero
 * post-sync-op is required.
 */
brw_emit_post_sync_nonzero_flush(brw);
 }

I tested calling that method for any gen, guessing that the workaround
might also be needed for other gens, and the test got fixed. But looking at
the documentation of other gens, I didn't find the need for this
workaround. For that reason I moved to gen7_emit_cs_stall_flush, which is
less aggressive and fixes the test too. It seems that in order to
get a complete flush you need a cs stall flush with a
pipe_control_write. But again, I didn't find any reference in the PRMs
confirming it.

Intuitively, this would be needed in brw_emit_mi_flush or even in
brw_emit_pipe_control_flush (this one already includes some
gen-specific workarounds), but I preferred to keep it in the only place
that seems to need it for now.

In addition to fixing that CTS test, it also fixes the
test I recently sent to the piglit list, which is not included on master
yet (acked for now):
https://lists.freedesktop.org/archives/piglit/2016-June/020055.html

That piglit patch adds 48 parameter combinations for the basic
test. Without this mesa patch 5-6 subtests fail. With this patch all
of them pass. Tested on Haswell, Broadwell and Skylake too.

 src/mesa/drivers/dri/i965/intel_tex.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_tex.c 
b/src/mesa/drivers/dri/i965/intel_tex.c
index cac33ac..e7459cd 100644
--- a/src/mesa/drivers/dri/i965/intel_tex.c
+++ b/src/mesa/drivers/dri/i965/intel_tex.c
@@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx)
 {
struct brw_context *brw = brw_context(ctx);
 
+   gen7_emit_cs_stall_flush(brw);
brw_emit_mi_flush(brw);
 }
 
-- 
2.7.4



Re: [Mesa-dev] [PATCH] st/mesa: set the new pipe_surface::alpha_one field for RGB surfaces

2016-06-28 Thread Marek Olšák
On Tue, Jun 28, 2016 at 5:16 PM, Ilia Mirkin  wrote:
> The main issue is when the st selects an alpha-ful format, but the GL
> wants an alpha-less format. The driver has no way of knowing. This
> gives it a way of knowing.
>
> The alternative is that the driver has to support every format and we
> drop all the fallbacks from st_format. Brian's proposal seems like a
> simpler solution. (This happens on nvc0, for example - RGB10A2 is
> supported, but RGB10X2 isn't. So the st picks RGB10A2 and nvc0 is none
> the wiser - until someone tries to do DST_ALPHA blending.)

Note that no hardware supports RGBX fully as pipe_surface. Radeon also
only supports RGBA and there is a state to force DST_ALPHA to one.
That's enough to pass all tests. The idea is to treat RGBX as RGBA in
all places except blending, and pipe_surface can already describe
that.

Marek


Re: [Mesa-dev] [PATCH 2/2] clover: fix getting struct args api size

2016-06-28 Thread Jan Vesely
On Thu, 2016-06-23 at 18:03 -0700, Francisco Jerez wrote:
> Jan Vesely  writes:
> 
> > On Wed, 2016-06-22 at 20:22 -0700, Francisco Jerez wrote:
> > > Jan Vesely  writes:
> > > 
> > > > On Wed, 2016-06-22 at 17:07 -0700, Francisco Jerez wrote:
> > > > > Jan Vesely  writes:
> > > > > 
> > > > > > On Mon, 2016-06-13 at 17:24 -0700, Francisco Jerez wrote:
> > > > > > > Serge Martin  writes:
> > > > > > > 
> > > > > > > > This fixes getting the size of a struct arg. vec3 types
> > > > > > > > still
> > > > > > > > work
> > > > > > > > ok.
> > > > > > > > Only built-in args need to have power of two alignment,
> > > > > > > > getTypeAllocSize
> > > > > > > > reports the correct size.
> > > > > > > > ---
> > > > > > > >  src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > | 3
> > > > > > > > ++-
> > > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git
> > > > > > > > a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > index 03487d6..9af51539 100644
> > > > > > > > ---
> > > > > > > > a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > +++
> > > > > > > > b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > @@ -472,7 +472,8 @@ namespace {
> > > > > > > >   // aligned to the next larger power of
> > > > > > > > two".  We
> > > > > > > > need
> > > > > > > > this
> > > > > > > >   // alignment for three element vectors, which
> > > > > > > > have
> > > > > > > >   // non-power-of-2 store size.
> > > > > > > > - const unsigned arg_api_size =
> > > > > > > > util_next_power_of_two(arg_store_size);
> > > > > > > > + const unsigned arg_api_size = arg_type-
> > > > > > > > > isStructTy()
> > > > > > > > ?
> > > > > > > > +   arg_store_size :
> > > > > > > > util_next_power_of_two(arg_store_size);
> > > > > > > >  
> > > > > > > Hm...  Isn't this still going to be broken if you pass a
> > > > > > > struct
> > > > > > > argument
> > > > > > > to a kernel function and the alignment of any of the
> > > > > > > struct
> > > > > > > members
> > > > > > > doesn't match the target-specific data layout?  Not sure
> > > > > > > we
> > > > > > > can
> > > > > > > fix
> > > > > > > this
> > > > > > > sensibly without requiring the target's data layout to
> > > > > > > match
> > > > > > > the
> > > > > > > CL
> > > > > > > API
> > > > > > > exactly.  Any suggestions Tom?
> > > > > > 
> > > > > > according to 6.7.2.1 compilers can arbitrarily insert
> > > > > > padding
> > > > > > between
> > > > > > struct members (except at the beginning).
> > > > > 
> > > > > What spec version are you looking at?  My CL spec doesn't
> > > > > have
> > > > > any
> > > > > section labeled 6.7.2.1.
> > > > 
> > > > c99 specs, I did not find anything specific for CLC (it might
> > > > be
> > > > that I
> > > > just need to look harder). CLC 2.0 adds additional constraint
> > > > that
> > > > you
> > > > can't use address space qualifiers.
> > > > 
> > > 
> > > I'd expect that whatever the CL spec says regarding the memory
> > > layout
> > > of
> > > CLC types (e.g. section 6.1.5 which specifies the usual alignment
> > > rules
> > > for CL types and section 6.11.1 and 6.11.3 which specify various
> > > variable and type declaration attributes giving finer control
> > > over
> > > the
> > > alignment of variable and struct member declarations) fully
> > > overrides
> > > the C99 spec.
> > 
> > Right, even if we consider that none of the C99 6.7.2.1 apply (and
> > at
> > least CL2.0 6.5.6 does not make it sound so), it only gives us one
> > side, we can check that the CLC struct layout follows what we would
> > expect. We don't have means to check and enforce that the host side
> > struct layout is compatible.
> > 
> 
> Yes, exactly, the CL spec doesn't have anything to say about the
> host-side memory layout, that's up to the host platform's ABI to
> define.
> 
> > > 
> > > > > 
> > > > > > > Even if size/alignment of individual members match CL API
> > > > > > > exactly, there's no guarantee that the structure layout/size
> > > > > > > will be the same.
> > > > > > > 
> > > > > > How can you exchange structured data with a CL kernel then,
> > > > > > assuming that the layout of structure types in memory is fully
> > > > > > unspecified as you say?
> > > > > 
> > > > > that is my point. My understanding is that it relies on a silent
> > > > > assumption that both CLC and the host compiler will create the
> > > > > same structure layout given the same structure elements.
> > > > 
> > > > big endian host can create:
> > > > struct foo {
> > > > cl_int a;
> > > > // 16 bit padding;
> > > > cl_short b;
> > > > cl_int c;
> > > > };
> 
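A host-side check of the layout concern raised above can be sketched in C. This is only an illustration: `int32_t`/`int16_t` stand in for `cl_int`/`cl_short`, and the helper names `foo_padding_bytes` and `foo_layout_matches` are made up for this example, not part of any CL or Mesa API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for the struct from the thread; the fixed-width
 * types are assumptions replacing cl_int/cl_short. */
struct foo {
   int32_t a;
   int16_t b;
   int32_t c;
};

/* Bytes of padding the host compiler inserted anywhere in struct foo. */
static size_t
foo_padding_bytes(void)
{
   return sizeof(struct foo) -
          (sizeof(int32_t) + sizeof(int16_t) + sizeof(int32_t));
}

/* Returns 1 if the host layout equals the offsets and size the caller
 * expects the device compiler to use, 0 otherwise. */
static int
foo_layout_matches(size_t off_b, size_t off_c, size_t size)
{
   return offsetof(struct foo, a) == 0 &&
          offsetof(struct foo, b) == off_b &&
          offsetof(struct foo, c) == off_c &&
          sizeof(struct foo) == size;
}
```

A runtime could compare the values it expects from the device compiler against the host's `offsetof`/`sizeof` results and refuse to pass the struct through when they differ. It cannot learn the expected values from the C standard alone, which is the gap discussed in the thread.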

Re: [Mesa-dev] [PATCH 1/2] glsl: add driconf to zero-init unintialized vars

2016-06-28 Thread Marek Olšák
On Mon, Jun 27, 2016 at 9:28 PM, Rob Clark  wrote:
> On Mon, Jun 27, 2016 at 3:06 PM, Kenneth Graunke  
> wrote:
>> On Monday, June 27, 2016 11:43:28 AM PDT Matt Turner wrote:
>>> On Mon, Jun 27, 2016 at 4:44 AM, Rob Clark  wrote:
>>> > On Mon, Jun 27, 2016 at 7:13 AM, Alan Swanson  
>>> > wrote:
>>> >> On 2016-06-25 13:37, Rob Clark wrote:
>>> >>>
>>> >>> Some games are sloppy.. perhaps because it is defined behavior for DX or
>>> >>> perhaps because nv blob driver defaults things to zero.
>>> >>>
>>> >>> So add driconf param to force uninitialized variables to default to 
>>> >>> zero.
>>> >>>
>>> >>> This issue was observed with rust, from steam store.  But has surfaced
>>> >>> elsewhere in the past.
>>> >>>
>>> >>> Signed-off-by: Rob Clark 
>>> >>> ---
>>> >>> Note that I left out the drirc bit, since not entirely sure how to
>>> >>> identify this game.  (I don't actually have the game, just working off
>>> >>> of an apitrace)
>>> >>>
>>> >>> Possibly worth mentioning that for the shaders using uninitialized vars
>>> >>> having zero-initializers lets constant-propagation get rid of a whole
>>> >>> lot of instructions.  One shader I saw dropped to less than half of
>>> >>> its original instruction count.
>>> >>
>>> >>
>>> >> If the default for uninitialised variables is undefined, then with the
>>> >> reported shader optimisations why bother with the (DRI) option when
>>> >> zeroing could still essentially be classed as undefined?
>>> >>
>>> >> Cuts the patch down to just the src/compiler/glsl/ast_to_hir.cpp change.
>>> >
>>> > I did suggest that on #dri-devel, but Jason had a theoretical example
>>> > where it would hurt.. iirc something like:
>>> >
>>> >   float maybe_undef;
>>> >   for (int i = 0; i < some_uniform_at_least_one; i++)
>>> >  maybe_undef = ...
>>> >
>>> > also, he didn't want to hide shader bugs that app should fix.
>>> >
>>> > It would be interesting to run shaderdb w/ glsl_zero_init=true and
>>> > see what happens, but I didn't get around to that yet.
>>>
>>> Here's what I get on i965. It's not a clear win.
>>>
>>> total instructions in shared programs: 5249030 -> 5249002 (-0.00%)
>>> instructions in affected programs: 28936 -> 28908 (-0.10%)
>>> helped: 66
>>> HURT: 132
>>>
>>> total cycles in shared programs: 57966694 -> 57956306 (-0.02%)
>>> cycles in affected programs: 1136118 -> 1125730 (-0.91%)
>>> helped: 78
>>> HURT: 106
>>
>> I suspect most of the help is because we're missing undef optimizations,
>> such as CSE...while zero could be CSE'd.  (I have a patch, but it hurts
>> things too...)
>
> right, I was thinking that treating undef as zero in constant-folding
> would have the same effect.. ofc it might make shader bugs less
> obvious.
>
> Btw, does anyone know what fglrx does?  Afaiu nv blob treats undef as
> zero.  If fglrx does the same, I suppose that strengthens the argument
> for "just do this unconditionally".

No idea what fglrx does, but LLVM does eliminate code with undefined
inputs. Initializing everything to 0 might make that worse.
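For reference, the percentages in the shader-db report quoted above are plain relative changes. A minimal sketch in C, where the helper name `percent_change` is made up for this example and is not part of shader-db:

```c
#include <assert.h>
#include <stdint.h>

/* Relative change from `before` to `after`, in percent.
 * Negative values mean the count went down. */
static double
percent_change(uint64_t before, uint64_t after)
{
   return ((double)after - (double)before) / (double)before * 100.0;
}
```

With the quoted numbers, `percent_change(28936, 28908)` is roughly -0.097, which the report rounds to -0.10%, and `percent_change(1136118, 1125730)` is roughly -0.91.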

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/mesa: set the new pipe_surface::alpha_one field for RGB surfaces

2016-06-28 Thread Ilia Mirkin
The main issue is when the st selects an alpha-ful format, but the GL
wants an alpha-less format. The driver has no way of knowing. This
gives it a way of knowing.

The alternative is that the driver has to support every format and we
drop all the fallbacks from st_format. Brian's proposal seems like a
simpler solution. (This happens on nvc0, for example - RGB10A2 is
supported, but RGB10X2 isn't. So the st picks RGB10A2 and nvc0 is none
the wiser - until someone tries to do DST_ALPHA blending.)
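To make the DST_ALPHA case concrete, here is a rough sketch in C of what a driver could do with such a flag. The enums and function names are hypothetical stand-ins, not the Gallium or nvc0 definitions; only the list of base formats follows the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GL base formats and blend factors. */
enum base_format {
   BASE_R, BASE_RG, BASE_RGB, BASE_RGBA,
   BASE_LUMINANCE, BASE_LUMINANCE_ALPHA,
};

enum blend_factor {
   FACTOR_ZERO, FACTOR_ONE, FACTOR_SRC_ALPHA,
   FACTOR_DST_ALPHA, FACTOR_INV_DST_ALPHA,
};

/* Mirrors the condition in the patch: these base formats have no
 * alpha channel, so reads of destination alpha must yield one. */
static bool
base_format_alpha_is_one(enum base_format f)
{
   return f == BASE_RGB || f == BASE_RG ||
          f == BASE_R || f == BASE_LUMINANCE;
}

/* If the surface's alpha must read as one, replace blend factors
 * that depend on destination alpha with their constant value. */
static enum blend_factor
fixup_blend_factor(enum blend_factor factor, bool alpha_one)
{
   if (!alpha_one)
      return factor;

   switch (factor) {
   case FACTOR_DST_ALPHA:
      return FACTOR_ONE;   /* dst.a == 1 */
   case FACTOR_INV_DST_ALPHA:
      return FACTOR_ZERO;  /* 1 - dst.a == 0 */
   default:
      return factor;
   }
}
```

The point of the sketch is that the fix-up needs the GL-level base format, which the driver cannot recover from an RGB10A2 surface format alone.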

  -ilia


On Tue, Jun 28, 2016 at 11:11 AM, Marek Olšák  wrote:
> I guess you need this because your driver doesn't support LUMINANCE
> and st/mesa selects RGBA, right? In that case, you can just set RGBX
> in pipe_surface::format and you don't need another flag.
>
> It would be better to select RGBX at renderbuffer creation, but doing
> it later is fine as well.
>
> Marek
>
> On Fri, Jun 24, 2016 at 4:43 PM, Brian Paul  wrote:
>> This indicates the alpha channel of the surface should always be one.
>> Drivers can use this to adjust blending terms when needed.
>>
>> v2: also check for R, RG, LUMINANCE surfaces, per Ilia
>> ---
>>  src/mesa/state_tracker/st_cb_fbo.c | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
>> b/src/mesa/state_tracker/st_cb_fbo.c
>> index 9801b1f..843ff83 100644
>> --- a/src/mesa/state_tracker/st_cb_fbo.c
>> +++ b/src/mesa/state_tracker/st_cb_fbo.c
>> @@ -216,6 +216,11 @@ st_renderbuffer_alloc_storage(struct gl_context * ctx,
>>return FALSE;
>>
>> u_surface_default_template(&surf_tmpl, strb->texture);
>> +   surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB ||
>> +  strb->Base._BaseFormat == GL_RG ||
>> +  strb->Base._BaseFormat == GL_R ||
>> +  strb->Base._BaseFormat == GL_LUMINANCE);
>> +
>> strb->surface = pipe->create_surface(pipe,
>>  strb->texture,
>>  &surf_tmpl);
>> @@ -463,6 +468,10 @@ st_update_renderbuffer_surface(struct st_context *st,
>>/* create a new pipe_surface */
>>struct pipe_surface surf_tmpl;
>>memset(&surf_tmpl, 0, sizeof(surf_tmpl));
>> +  surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB ||
>> + strb->Base._BaseFormat == GL_RG ||
>> + strb->Base._BaseFormat == GL_R ||
>> + strb->Base._BaseFormat == GL_LUMINANCE);
>>surf_tmpl.format = format;
>>surf_tmpl.u.tex.level = level;
>>surf_tmpl.u.tex.first_layer = first_layer;
>> --
>> 1.9.1
>>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/mesa: set the new pipe_surface::alpha_one field for RGB surfaces

2016-06-28 Thread Marek Olšák
I guess you need this because your driver doesn't support LUMINANCE
and st/mesa selects RGBA, right? In that case, you can just set RGBX
in pipe_surface::format and you don't need another flag.

It would be better to select RGBX at renderbuffer creation, but doing
it later is fine as well.

Marek

On Fri, Jun 24, 2016 at 4:43 PM, Brian Paul  wrote:
> This indicates the alpha channel of the surface should always be one.
> Drivers can use this to adjust blending terms when needed.
>
> v2: also check for R, RG, LUMINANCE surfaces, per Ilia
> ---
>  src/mesa/state_tracker/st_cb_fbo.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
> b/src/mesa/state_tracker/st_cb_fbo.c
> index 9801b1f..843ff83 100644
> --- a/src/mesa/state_tracker/st_cb_fbo.c
> +++ b/src/mesa/state_tracker/st_cb_fbo.c
> @@ -216,6 +216,11 @@ st_renderbuffer_alloc_storage(struct gl_context * ctx,
>return FALSE;
>
> u_surface_default_template(&surf_tmpl, strb->texture);
> +   surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB ||
> +  strb->Base._BaseFormat == GL_RG ||
> +  strb->Base._BaseFormat == GL_R ||
> +  strb->Base._BaseFormat == GL_LUMINANCE);
> +
> strb->surface = pipe->create_surface(pipe,
>  strb->texture,
>  &surf_tmpl);
> @@ -463,6 +468,10 @@ st_update_renderbuffer_surface(struct st_context *st,
>/* create a new pipe_surface */
>struct pipe_surface surf_tmpl;
>memset(&surf_tmpl, 0, sizeof(surf_tmpl));
> +  surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB ||
> + strb->Base._BaseFormat == GL_RG ||
> + strb->Base._BaseFormat == GL_R ||
> + strb->Base._BaseFormat == GL_LUMINANCE);
>surf_tmpl.format = format;
>surf_tmpl.u.tex.level = level;
>surf_tmpl.u.tex.first_layer = first_layer;
> --
> 1.9.1
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/st: Include nir.h for nir_shader symbol.

2016-06-28 Thread Rob Clark
On Mon, Jun 27, 2016 at 10:08 PM, Matt Turner  wrote:
> On Mon, Jun 27, 2016 at 6:45 PM, Vinson Lee  wrote:
>> Fix this build error with GCC 4.4.
>>
>>   CC state_tracker/st_nir_lower_builtin.lo
>> In file included from state_tracker/st_nir_lower_builtin.c:61:
>> state_tracker/st_nir.h:34: error: redefinition of typedef ‘nir_shader’
>> ../../src/compiler/nir/nir.h:1830: note: previous declaration of 
>> ‘nir_shader’ was here
>
> This error seems to imply that nir.h is already being included somehow.
>
> Does just removing the typedef solve the problem? Can we figure out
> how nir.h is already being included and remove that?

nir.h is coming from st_nir_lower_builtin.c which #includes st_nir.h..

Perhaps the thing to do is drop the typedef, and just fwd declare
'struct nir_shader', and use 'struct nir_shader' instead of
'nir_shader' in st_nir.h

Already half of the world gets recompiled when you touch nir.h, and
I'd rather not make that worse..
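A minimal sketch of the forward-declaration approach, in C. The names `struct shader` and `shader_num_inputs` are invented for the example and only stand in for `struct nir_shader` and whatever st_nir.h actually declares:

```c
#include <assert.h>

/* --- what a header like st_nir.h could contain (sketch) --- */

/* Forward declaration only: users of the header can pass pointers
 * around without pulling in the full definition. No typedef is
 * repeated here, so there is nothing to clash with the header that
 * owns the real typedef. */
struct shader;

unsigned shader_num_inputs(const struct shader *s);

/* --- what only the implementation file needs to see --- */

struct shader {
   unsigned num_inputs;
};

unsigned
shader_num_inputs(const struct shader *s)
{
   return s->num_inputs;
}
```

Repeating a `typedef` for the same type is only permitted from C11 onward, which would explain why an older GCC rejects it; an incomplete `struct` declaration can be repeated freely in any C version.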

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates

2016-06-28 Thread Ilia Mirkin
On Tue, Jun 28, 2016 at 10:27 AM, Samuel Pitoiset
 wrote:
>
>
> On 06/28/2016 04:23 PM, Ilia Mirkin wrote:
>>
>> On Tue, Jun 28, 2016 at 10:21 AM, Samuel Pitoiset
>>  wrote:
>>>
>>> On 06/28/2016 04:15 PM, Ilia Mirkin wrote:


 Again, what problem was this patch trying to solve?
>>>
>>>
>>>
>>> The problem is that FADD can only emit 19 bits but longIMMD() will
>>> return
>>> false because it only checks for the high 12-bits.
>>>
>>> I don't know if you saw my messages on IRC but I found some other issues
>>> with longIMMD() and emitIMMD().
>>
>>
>> Nope, it will emit 19 bits and then the 20th (high aka sign) bit as
>> well, just to a different location. [And the bottom 12 bits are
>> guaranteed to be 0.]
>>
>> What's a specific example that you think it doesn't emit correctly?
>
>
> I don't have any shaders which hit that issue, but I think it's similar to
> the fix I did for IMUL32I. The immediate value was 0xf4240 in that specific
> case, and IMUL emitted 0x74240 instead... because the sign bit was used to
> emit the NEG modifier.

Right, which isn't the same thing for ints, but is the same thing for
floats. For integer immediates, it's also the low 20 bits, not the
high 20 bits. And I believe that the condition should be ensuring that
all 12 of the high bits are the same. But perhaps it doesn't properly
check that those 12 bits have the same value as the 20th bit?
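The two conditions described above can be written down as a sketch in C. This is my reading of the description, not the actual longIMMD() code, and the function names are made up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Float case: the short form keeps the high 20 bits of the value,
 * so it is usable only when the low 12 bits are zero. */
static bool
float_imm_fits_short(uint32_t bits)
{
   return (bits & 0xfff) == 0;
}

/* Integer case: the short form keeps the low 20 bits and treats
 * bit 19 as the sign, so the high 12 bits must all equal bit 19.
 * Equivalently, bits 31..19 are all zeros or all ones. */
static bool
int_imm_fits_short(uint32_t bits)
{
   uint32_t hi = bits >> 19;   /* bits 31..19, 13 bits wide */

   return hi == 0 || hi == 0x1fff;
}
```

Under this reading 0xf4240 does not fit the short integer form: its high 12 bits are zero but bit 19 is set, so the value would be reinterpreted as negative.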

Anyways... if it ain't broken, don't fix it. Doesn't sound like FADD
emission is broken in any way - let's not fix it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates

2016-06-28 Thread Samuel Pitoiset



On 06/28/2016 04:38 PM, Ilia Mirkin wrote:

On Tue, Jun 28, 2016 at 10:27 AM, Samuel Pitoiset
 wrote:



On 06/28/2016 04:23 PM, Ilia Mirkin wrote:


On Tue, Jun 28, 2016 at 10:21 AM, Samuel Pitoiset
 wrote:


On 06/28/2016 04:15 PM, Ilia Mirkin wrote:



Again, what problem was this patch trying to solve?




The problem is that FADD can only emit 19 bits but longIMMD() will
return
false because it only checks for the high 12-bits.

I don't know if you saw my messages on IRC but I found some other issues
with longIMMD() and emitIMMD().



Nope, it will emit 19 bits and then the 20th (high aka sign) bit as
well, just to a different location. [And the bottom 12 bits are
guaranteed to be 0.]

What's a specific example that you think it doesn't emit correctly?



I don't have any shaders which hit that issue, but I think it's similar to
the fix I did for IMUL32I. The immediate value was 0xf4240 in that specific
case, and IMUL emitted 0x74240 instead... because the sign bit was used to
emit the NEG modifier.


Right, which isn't the same thing for ints, but is the same thing for
floats. For integer immediates, it's also the low 20 bits, not the
high 20 bits. And I believe that the condition should be ensuring that
all 12 of the high bits are the same. But perhaps it doesn't properly
check that those 12 bits have the same value as the 20th bit?


Yes, it does not do that.



Anyways... if it ain't broken, don't fix it. Doesn't sound like FADD
emission is broken in any way - let's not fix it.


Okay, your call. :-)





--
-Samuel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] mapi: Export all GLES 3.1 functions in libGLESv2.so

2016-06-28 Thread Emil Velikov
On 27 June 2016 at 18:38, Ian Romanick  wrote:
> On 06/24/2016 09:30 AM, Emil Velikov wrote:
>> On 20 June 2016 at 19:14, Ian Romanick  wrote:
>>> On 06/17/2016 11:15 AM, Emil Velikov wrote:
 On 17 June 2016 at 18:20, Ian Romanick  wrote:
> From: Ian Romanick 
>
> Khronos recommends that the GLES 3.1 library also be called libGLESv2.
> It also requires that functions be statically linkable from that
> library.
>
> NOTE: Mesa has supported the EGL_KHR_get_all_proc_addresses extension
> since at least Mesa 10.5, so applications targeting Linux should use
> eglGetProcAddress to avoid problems running binaries on systems with
> older, non-GLES 3.1 libGLESv2 libraries.
>
 Fwiw I'm inclined that we should go the "opposite direction". Namely:
 don't expose new symbols and stick to a predefined version (3.0 being
 the personal favour of choice).

 Why, you might ask - for a couple of reasons:
  - If the list continues to grow programs will have unstable ABI -
 sort of how libGL ended up. Applications are going to link against 3.1
 or later symbols [1], even if they only optionally use them. Thus
 things will quite hairy and fragile.
>>>
>>> There are at least two solutions, and piglit uses both.  If use of a set
>>> of functions is optional, you can still use GetProcAddress (when
>>> EGL_KHR_get_all_proc_addresses is available) or you can use dlsym.
>>>
>>> For me, piglit is where this whole problem actually started.  Right now,
>>> piglit follows the (unextended) rules and does not attempt to use
>>> GetProcAddress on core functions.  It uses dlsym.  I tried to extend
>>> shader_runner for separate shader objects on GLES.  Guess what?  Since
>>> the symbols aren't exported by the library, it didn't work.  So... now
>>> piglit would need TWO code paths... one that uses dlsym and one that
>>> uses eglGetProcAddress... or require an optional extension.
>>>
>> I've started looking at piglit last night. There should be some fixes
>> for it on the list later on today.
>>
>>> If an application requires GLES 3.1 symbols, it should just be able to
>>> link with them.  As far as I can tell, that's how it works on Android.
>>>
>> I look at the Android wrapper too closely for the following reasons:
>>
>> - There is libGLESv3.so which is identical copy of the v2 one.
>> - Their libGLESv2/3.so periodically grows new symbols, including GLES
>> extensions.
>> - Android has tight control what and/how it's run on their platform -
>> something that Linux distributions cannot do afaict.
>> - Applications using GLES should annotate the version used in the
>> manifest, which (haven't checked exactly) could serve as a first line
>> of defence for applications e.g. using GLES 3.1 on system/drivers
>> supporting GLES 3.0.
>>
>> That said, there is one very good thing:
>> - They use dlsym and then eglGetProcAddress on all symbols. Thus mesa
>> will just work.
>>
  - The other desktop GLES* provider NVIDIA does not export even a
 single GLES 3.1/3.2 entry point (still going through the 3.0 list) in
 their libGLESv2.so.2 binary.

 So what to do with GLES (3.0?)/3.1 and later:
  - tweak the spec so that said version of the API is only supported if
 the implementation can get core symbols via eglGetProcAddress. Be that
 props to the EGL_KHR_get_all_proc_addresses extension or EGL 1.5 [2].

>> Any "sounds ok" or "that's a horrible idea" input on this suggestion ?
>
> That ship has already sailed.  OpenGL ES 3.0 and 3.1 have both been
> shipping for years.  I don't think changing that is how I would use my
> time machine. :)
>
As you guys wish, I won't stir up a hornet's nest. Just a reminder
that we did a similar thing on the libGL front, which, imho, was
significantly more likely to have actual users that depend on such
'odd' behaviour.

A humble request - can we keep an eye open as GLES 3.3 and/or OpenGL
4.6 come out? It would be great if those included the proposed
suggestion/fix. Namely: in order to use these with EGL, one needs to
have the EGL_KHR_get_all_proc_addresses extension or EGL 1.5.
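For illustration, the "dlsym first, then eglGetProcAddress" lookup order mentioned earlier in this thread can be sketched generically in C. The lookups are passed in as function pointers so the sketch does not depend on any EGL header; all names here are hypothetical, and the two toy lookups merely stand in for dlsym() and eglGetProcAddress():

```c
#include <assert.h>
#include <stddef.h>

typedef void (*generic_fn)(void);
typedef generic_fn (*lookup_fn)(const char *name);

/* Try the primary lookup first and fall back to the second one
 * only when the primary lookup does not know the symbol. */
static generic_fn
resolve_entry_point(const char *name, lookup_fn primary, lookup_fn fallback)
{
   generic_fn fn = primary ? primary(name) : NULL;

   if (fn == NULL && fallback)
      fn = fallback(name);

   return fn;
}

/* Toy lookups used only to exercise the helper. */
static void dummy_entry(void) {}

static generic_fn
lookup_none(const char *name)
{
   (void)name;
   return NULL;
}

static generic_fn
lookup_dummy(const char *name)
{
   (void)name;
   return dummy_entry;
}
```

A loader written this way works whether or not the library exports the newer entry points statically, provided the fallback is allowed to return core functions.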

I'll keep an eye open, Collabora being a Khronos member, although it
would be great if I'm not the only one.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Split gl_shader in two and clean-ups

2016-06-28 Thread Iago Toral
On Tue, 2016-06-28 at 11:52 +1000, Timothy Arceri wrote:
> There are two distinctly different uses of this struct. The first
> is to store GL shader objects. The second is to store information
> about a shader stage that's been linked.
> 
> The only place the new structs overlap is the shader layout fields and
> I intend to split that out into a third struct once this series lands.
> 
> Having two well defined structs helps code readability and allows the removal
> of some unreachable code paths that were the result of confusion between
> the two uses.

I think it is a good idea, thanks!

I dropped a comment in patch 4; with that fixed, patches 1-4 are:
Reviewed-by: Iago Toral Quiroga 

I'll try to review the last 3 patches tomorrow.

Iago



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates

2016-06-28 Thread Samuel Pitoiset



On 06/28/2016 04:23 PM, Ilia Mirkin wrote:

On Tue, Jun 28, 2016 at 10:21 AM, Samuel Pitoiset
 wrote:

On 06/28/2016 04:15 PM, Ilia Mirkin wrote:


Again, what problem was this patch trying to solve?



The problem is that FADD can only emit 19 bits but longIMMD() will return
false because it only checks for the high 12-bits.

I don't know if you saw my messages on IRC but I found some other issues
with longIMMD() and emitIMMD().


Nope, it will emit 19 bits and then the 20th (high aka sign) bit as
well, just to a different location. [And the bottom 12 bits are
guaranteed to be 0.]

What's a specific example that you think it doesn't emit correctly?


I don't have any shaders which hit that issue, but I think it's similar 
to the fix I did for IMUL32I. The immediate value was 0xf4240 in that 
specific case, and IMUL emitted 0x74240 instead... because the sign bit 
was used to emit the NEG modifier.




 -ilia



--
-Samuel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] glsl: pass symbols to find_matching_signature() rather than shader

2016-06-28 Thread Iago Toral
On Tue, 2016-06-28 at 11:52 +1000, Timothy Arceri wrote:
> This will allow us to later split gl_shader into two structs.
> ---
>  src/compiler/glsl/link_functions.cpp | 47 
> +---
>  1 file changed, 22 insertions(+), 25 deletions(-)
> 
> diff --git a/src/compiler/glsl/link_functions.cpp 
> b/src/compiler/glsl/link_functions.cpp
> index 4e10287..c9dacc1 100644
> --- a/src/compiler/glsl/link_functions.cpp
> +++ b/src/compiler/glsl/link_functions.cpp
> @@ -31,8 +31,7 @@
>  
>  static ir_function_signature *
>  find_matching_signature(const char *name, const exec_list *actual_parameters,
> - gl_shader **shader_list, unsigned num_shaders,
> - bool use_builtin);
> +glsl_symbol_table *symbols, bool use_builtin);
>  
>  namespace {
>  
> @@ -78,8 +77,8 @@ public:
> * final linked shader.  If it does, use it as the target of the call.
> */
>ir_function_signature *sig =
> -  find_matching_signature(name, &callee->parameters, &linked, 1,
> -  ir->use_builtin);
> + find_matching_signature(name, &callee->parameters, linked->symbols,
> + ir->use_builtin);
>if (sig != NULL) {
>ir->callee = sig;
>return visit_continue;
> @@ -88,8 +87,14 @@ public:
>/* Try to find the signature in one of the other shaders that is being
> * linked.  If it's not found there, return an error.
> */
> -  sig = find_matching_signature(name, &ir->actual_parameters, 
> shader_list,
> - num_shaders, ir->use_builtin);
> +  for (unsigned i = 0; i < num_shaders; i++) {
> + sig = find_matching_signature(name, &ir->actual_parameters,
> +   shader_list[i]->symbols,
> +   ir->use_builtin);
> + if (sig)
> +break;
> +  }
> +
>if (sig == NULL) {
>/* FINISHME: Log the full signature of unresolved function.
> */
> @@ -307,30 +312,22 @@ private:
>   */
>  ir_function_signature *
>  find_matching_signature(const char *name, const exec_list *actual_parameters,
> - gl_shader **shader_list, unsigned num_shaders,
> - bool use_builtin)
> +glsl_symbol_table *symbols, bool use_builtin)
>  {
> -   for (unsigned i = 0; i < num_shaders; i++) {
> -  ir_function *const f = shader_list[i]->symbols->get_function(name);
> -
> -  if (f == NULL)
> -  continue;
> +   ir_function *const f = symbols->get_function(name);
>  
> +   if (f) {
>ir_function_signature *sig =
>   f->matching_signature(NULL, actual_parameters, use_builtin);
>  
> -  if ((sig == NULL) ||
> -  (!sig->is_defined && !sig->is_intrinsic))
> -  continue;
> -
> -  /* If this function expects to bind to a built-in function and the
> -   * signature that we found isn't a built-in, keep looking.  Also keep
> -   * looking if we expect a non-built-in but found a built-in.
> -   */
> -  if (use_builtin != sig->is_builtin())
> - continue;
> -
> -  return sig;
> +  if (sig && (sig->is_defined || sig->is_intrinsic)) {
> + /* If this function expects to bind to a built-in function and the
> +  * signature that we found isn't a built-in, keep looking.  Also 
> keep
> +  * looking if we expect a non-built-in but found a built-in.
> +  */
> + if (use_builtin != sig->is_builtin())
> +return sig;

The code you changed would not return sig if this condition is true, so
I guess you meant:

if (use_builtin == sig->is_builtin())
   return sig;

Iago

> +  }
> }
>  
> return NULL;
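To spell out the condition from the review comment, here is a reduced sketch in C of the intended filter. The struct and function are simplified stand-ins, not the GLSL linker types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for ir_function_signature. */
struct signature {
   bool is_defined;
   bool is_intrinsic;
   bool is_builtin;
};

/* Return sig only if it is usable and its built-in-ness matches
 * what the caller asked for; otherwise report "not found" so the
 * caller can keep looking in the next symbol table. */
static const struct signature *
pick_signature(const struct signature *sig, bool use_builtin)
{
   if (sig && (sig->is_defined || sig->is_intrinsic)) {
      if (use_builtin == sig->is_builtin)
         return sig;
   }

   return NULL;
}
```

With `!=` in place of `==`, a user-defined function would be returned exactly when a built-in was requested, which is the opposite of what the removed `continue` did.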


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags

2016-06-28 Thread Chuck Atkins
Encapsulate the test for which flags are needed to get a compiler to
support certain features.  Along with this, give various options to try
for AVX and AVX2 support.  Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different.  This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.

This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.

v2: Pass preprocessor-check argument as true-state instead of
false-state for clarity.

Cc: Tim Rowley 
Signed-off-by: Chuck Atkins 
---
 configure.ac | 86 +++-
 1 file changed, 62 insertions(+), 24 deletions(-)

diff --git a/configure.ac b/configure.ac
index cc9bc47..6082778 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2330,6 +2330,39 @@ swr_llvm_check() {
 fi
 }
 
+swr_cxx_feature_flags_check() {
+preprocessor_test="$1"
+option_list="$2"
+unset SWR_CXX_FEATURE_FLAGS
+AC_LANG_PUSH([C++])
+save_CXXFLAGS="$CXXFLAGS"
+save_IFS="$IFS"
+IFS=","
+found=0
+for opts in $option_list
+do
+unset IFS
+CXXFLAGS="$opts $save_CXXFLAGS"
+AC_COMPILE_IFELSE(
+[AC_LANG_PROGRAM(
+[   #if !($preprocessor_test)
+#error
+#endif
+])],
+[found=1; break],
+[])
+IFS=","
+done
+IFS="$save_IFS"
+CXXFLAGS="$save_CXXFLAGS"
+AC_LANG_POP([C++])
+if test $found -eq 1; then
+SWR_CXX_FEATURE_FLAGS="$opts"
+return 0
+fi
+return 1
+}
+
 dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this 
block
 if test -n "$with_gallium_drivers"; then
 gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
@@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then
 xswr)
 swr_llvm_check "swr"
 
-AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
-SWR_AVX_CXXFLAGS="-mavx"
-SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
-
-AC_LANG_PUSH([C++])
-save_CXXFLAGS="$CXXFLAGS"
-CXXFLAGS="-std=c++11 $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([c++11 compiler support not 
detected])])
-CXXFLAGS="$save_CXXFLAGS"
-
-save_CXXFLAGS="$CXXFLAGS"
-CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([AVX compiler support not 
detected])])
-CXXFLAGS="$save_CXXFLAGS"
-
-save_CFLAGS="$CXXFLAGS"
-CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([AVX2 compiler support not 
detected])])
-CXXFLAGS="$save_CXXFLAGS"
-AC_LANG_POP([C++])
-
+AC_MSG_CHECKING([whether $CXX supports c++11])
+if ! swr_cxx_feature_flags_check \
+"__cplusplus >= 201103L" \
+",-std=c++11"; then
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires C++11 support])
+fi
+AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS"
+
+AC_MSG_CHECKING([whether $CXX supports AVX])
+if ! swr_cxx_feature_flags_check \
+"defined(__AVX__)" \
+",-mavx,-march=core-avx"; then
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires AVX compiler support])
+fi
+AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+SWR_AVX_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
 AC_SUBST([SWR_AVX_CXXFLAGS])
+
+AC_MSG_CHECKING([whether $CXX supports AVX2])
+if ! swr_cxx_feature_flags_check \
+
"defined(__AVX2__)&&defined(__FMA__)&&defined(__BMI2__)&&defined(__F16C__)" \
+",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2"; then
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires AVX2 compiler support])
+fi
+AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+SWR_AVX2_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
 AC_SUBST([SWR_AVX2_CXXFLAGS])
 
 HAVE_GALLIUM_SWR=yes
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
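The idea behind the preprocessor-based check in the swr patch above is that GCC and Clang define `__AVX2__`, `__FMA__`, `__BMI2__` and `__F16C__` when the matching instruction sets are enabled, so testing the macros tells you whether a given set of flags actually took effect. A small sketch in C, with a made-up function name, that reports the same thing at run time:

```c
#include <assert.h>

/* Returns 1 when this translation unit was compiled with all four
 * feature macros defined (for example via -mavx2 -mfma -mbmi2 -mf16c
 * or an -march value that implies them), and 0 otherwise. */
static int
built_with_avx2_feature_set(void)
{
#if defined(__AVX2__) && defined(__FMA__) && \
    defined(__BMI2__) && defined(__F16C__)
   return 1;
#else
   return 0;
#endif
}
```

A configure test that fails with `#error` in the `#else` branch, as the patch does, catches compilers that accept the flags silently without enabling the features.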


Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates

2016-06-28 Thread Ilia Mirkin
On Tue, Jun 28, 2016 at 10:21 AM, Samuel Pitoiset
 wrote:
> On 06/28/2016 04:15 PM, Ilia Mirkin wrote:
>>
>> Again, what problem was this patch trying to solve?
>
>
> The problem is that FADD can only emit 19 bits but longIMMD() will return
> false because it only checks for the high 12-bits.
>
> I don't know if you saw my messages on IRC but I found some other issues
> with longIMMD() and emitIMMD().

Nope, it will emit 19 bits and then the 20th (high aka sign) bit as
well, just to a different location. [And the bottom 12 bits are
guaranteed to be 0.]

What's a specific example that you think it doesn't emit correctly?

 -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] doc: improve INTEL_DEBUG documentation

2016-06-28 Thread Kenneth Graunke
On Tuesday, June 28, 2016 1:33:21 AM PDT Grazvydas Ignotas wrote:
> Remove 'reg' option that does not actually exist, elaborate more about
> 'sync' and add the missing options.
> 
> Signed-off-by: Grazvydas Ignotas 
> ---
>  no commit access, if this is ok please somebody push
> 
>  docs/envvars.html | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/envvars.html b/docs/envvars.html
> index ed957bd..2d9a289 100644
> --- a/docs/envvars.html
> +++ b/docs/envvars.html
> @@ -144,11 +144,10 @@ See the Xlib software driver 
> page for details.
> bat - emit batch information
> pix - emit messages about pixel operations
> buf - emit messages about buffer objects
> -   reg - emit messages about regions
> fbo - emit messages about framebuffers
> fs - dump shader assembly for fragment shaders
> gs - dump shader assembly for geometry shaders
> -   sync - emit messages about synchronization
> +   sync - after sending each batch, emit a message and wait for that 
> batch to finish rendering
> prim - emit messages about drawing primitives
> vert - emit messages about vertex assembly
> dri - emit messages about the DRI interface
> @@ -163,9 +162,18 @@ See the Xlib software driver 
> page for details.
> blorp - emit messages about the blorp operations (blits &amp; 
> clears)
> nodualobj - suppress generation of dual-object geometry shader 
> code
> optimizer - dump shader assembly to files at each optimization pass 
> and iteration that make progress
> +   ann - annotate IR in assembly dumps
> +   no8 - don't generate SIMD8 fragment shader
> vec4 - force vec4 mode in vertex shader
> spill_fs - force spilling of all registers in the scalar backend 
> (useful to debug spilling code)
> spill_vec4 - force spilling of all registers in the vec4 backend 
> (useful to debug spilling code)
> +   cs - dump shader assembly for compute shaders
> +   hex - print instruction hex dump with the disassembly
> +   nocompact - disable instruction compaction
> +   tcs - dump shader assembly for tessellation control shaders
> +   tes - dump shader assembly for tessellation evaluation shaders
> +   l3 - emit messages about the new L3 state during transitions
> +   do32 - generate compute shader SIMD32 programs even if workgroup size 
> doesn't exceed the SIMD16 limit
> norbc - disable single sampled render buffer compression
>  
>  
> 

Reviewed-by: Kenneth Graunke 

Also pushed:

To ssh://git.freedesktop.org/git/mesa/mesa
   c1dbc56..2343235  master -> master

Thanks!


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates

2016-06-28 Thread Samuel Pitoiset



On 06/28/2016 04:15 PM, Ilia Mirkin wrote:

On Tue, Jun 28, 2016 at 10:11 AM, Samuel Pitoiset
 wrote:



On 06/28/2016 04:00 PM, Ilia Mirkin wrote:


On Tue, Jun 28, 2016 at 4:33 AM, Samuel Pitoiset
 wrote:




On 06/28/2016 05:10 AM, Ilia Mirkin wrote:



On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset
 wrote:





On 06/28/2016 12:06 AM, Ilia Mirkin wrote:




On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin 
wrote:




On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset
 wrote:






On 06/28/2016 12:02 AM, Ilia Mirkin wrote:





This loses you saturation. Does the target account for this?






No saturate flag for FADD32I.





That's not what I asked.





Specifically look at this code:

bool
TargetNVC0::isSatSupported(const Instruction *insn) const
{
   if (insn->op == OP_CVT)
      return true;
   if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT))
      return false;

   if (insn->dType == TYPE_U32)
      return (insn->op == OP_ADD) || (insn->op == OP_MAD);

   // add f32 LIMM cannot saturate
   if (insn->op == OP_ADD && insn->sType == TYPE_F32) {
      if (insn->getSrc(1)->asImm() &&
          insn->getSrc(1)->reg.data.u32 & 0xfff)
         return false;
   }

Note how it will say that sat is supported for SIMMs with FADD? So the
compiler will generate those ops, but then the emitter won't be able
to handle it.



Okay, I get it.




By the way, instead of trying to fight the longIMMD, you should just fix
it -

/*0008*/   @P0 FADD R0, R0, 1.NEG;  /*
0x3858203f8000 */

which corresponds nicely to

  emitNEG(0x2d, insn->src(1));

The issue is that emitIMMD does

   if (len == 19) {
...
      emitField( 56,   1, (val & 0x80000) >> 19);
      emitField(pos, len, (val & 0x7ffff));

So the problem is that the 56 isn't as fixed as the emission code had
hoped. I suspect that adjusting it will fix all these silly cases.

  -ilia



/*0010*/   @P0 FADD R0, R0, 0.NEG;  /*
0x38582000 */
/*0010*/   @P0 FADD R0, R0, -0;  /*
0x3958 */

urgh?



So ... what problem were you having again?



The thing is: why those 2 instructions use a different position for the neg
flag?


One is setting the high bit of the immediate, the other is applying
negation to the argument.


Ok.



Again, what problem was this patch trying to solve?


The problem is that FADD can only emit 19 bits, but longIMMD() will
return false because it only checks the high 12 bits.


I don't know if you saw my messages on IRC but I found some other issues 
with longIMMD() and emitIMMD().
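
For readers following the thread, the short versus long immediate
distinction can be sketched in plain C. This is only an illustration
under assumptions, not the nouveau code: the helper names are
hypothetical, the low-12-bit test mirrors the "& 0xfff" check in the
quoted isSatSupported(), and the split into a 19-bit field plus one
extra top bit follows the quoted emitIMMD() excerpt.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw bit pattern of an f32 value. */
static uint32_t f32_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Assumption: an f32 fits the short form only if its low 12 bits are zero. */
static bool f32_needs_long_imm(uint32_t bits)
{
   return (bits & 0xfff) != 0;
}

struct short_imm {
   uint32_t field19; /* low 19 bits of the shortened value */
   uint32_t top_bit; /* bit 19 of the shortened value, emitted separately */
};

/* Drop the low 12 bits, then split the remaining 20 bits as 19 + 1. */
static struct short_imm split_short_f32(uint32_t bits)
{
   uint32_t val = bits >> 12;
   struct short_imm s;
   s.field19 = val & 0x7ffff;
   s.top_bit = (val & 0x80000) >> 19;
   return s;
}
```

With this model 1.0f and -1.0f share the same 19-bit field and differ
only in the separately emitted top bit, while a value such as 0.1f has
non-zero low bits and would need the long form.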




  -ilia



--
-Samuel


Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates

2016-06-28 Thread Ilia Mirkin
On Tue, Jun 28, 2016 at 10:11 AM, Samuel Pitoiset
 wrote:
>
>
> On 06/28/2016 04:00 PM, Ilia Mirkin wrote:
>>
>> On Tue, Jun 28, 2016 at 4:33 AM, Samuel Pitoiset
>>  wrote:
>>>
>>>
>>>
>>> On 06/28/2016 05:10 AM, Ilia Mirkin wrote:


>>>> On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset
>>>>  wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 06/28/2016 12:06 AM, Ilia Mirkin wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/28/2016 12:02 AM, Ilia Mirkin wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This loses you saturation. Does the target account for this?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> No saturate flag for FADD32I.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> That's not what I asked.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Specifically look at this code:
>>>>>>
>>>>>> bool
>>>>>> TargetNVC0::isSatSupported(const Instruction *insn) const
>>>>>> {
>>>>>>    if (insn->op == OP_CVT)
>>>>>>       return true;
>>>>>>    if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT))
>>>>>>       return false;
>>>>>>
>>>>>>    if (insn->dType == TYPE_U32)
>>>>>>       return (insn->op == OP_ADD) || (insn->op == OP_MAD);
>>>>>>
>>>>>>    // add f32 LIMM cannot saturate
>>>>>>    if (insn->op == OP_ADD && insn->sType == TYPE_F32) {
>>>>>>       if (insn->getSrc(1)->asImm() &&
>>>>>>           insn->getSrc(1)->reg.data.u32 & 0xfff)
>>>>>>          return false;
>>>>>>    }
>>>>>>
>>>>>> Note how it will say that sat is supported for SIMMs with FADD? So the
>>>>>> compiler will generate those ops, but then the emitter won't be able
>>>>>> to handle it.
>>>>>>
>>>>>
>>>>> Okay, I get it.
>>>>
>>>>
>>>>
>>>> By the way, instead of trying to fight the longIMMD, you should just fix
>>>> it -
>>>>
>>>> /*0008*/   @P0 FADD R0, R0, 1.NEG;  /*
>>>> 0x3858203f8000 */
>>>>
>>>> which corresponds nicely to
>>>>
>>>>   emitNEG(0x2d, insn->src(1));
>>>>
>>>> The issue is that emitIMMD does
>>>>
>>>>    if (len == 19) {
>>>> ...
>>>>       emitField( 56,   1, (val & 0x80000) >> 19);
>>>>       emitField(pos, len, (val & 0x7ffff));
>>>>
>>>> So the problem is that the 56 isn't as fixed as the emission code had
>>>> hoped. I suspect that adjusting it will fix all these silly cases.
>>>>
>>>>   -ilia

>>>
>>> /*0010*/   @P0 FADD R0, R0, 0.NEG;  /*
>>> 0x38582000 */
>>> /*0010*/   @P0 FADD R0, R0, -0;  /*
>>> 0x3958 */
>>>
>>> urgh?
>>
>>
>> So ... what problem were you having again?
>
>
> The thing is: why those 2 instructions use a different position for the neg
> flag?

One is setting the high bit of the immediate, the other is applying
negation to the argument.
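
To make the difference concrete, here is a small C sketch. It is only
a model under assumptions: struct fake_operand and both helpers are
hypothetical and do not describe the real instruction layout. The
point is that both forms denote the same arithmetic value while the
bits land in different places.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw bit pattern of an f32 value. */
static uint32_t f32_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Hypothetical operand model: immediate bits plus a separate modifier. */
struct fake_operand {
   uint32_t imm_bits;
   int neg_modifier;
};

/* "-0": the negation is folded into the immediate's own sign bit. */
static struct fake_operand negate_in_immediate(float f)
{
   struct fake_operand op = { f32_bits(f) ^ 0x80000000u, 0 };
   return op;
}

/* "0.NEG": the immediate is untouched and a source modifier is set. */
static struct fake_operand negate_with_modifier(float f)
{
   struct fake_operand op = { f32_bits(f), 1 };
   return op;
}
```

In this model negate_in_immediate(0.0f) changes the immediate to
0x80000000, whereas negate_with_modifier(0.0f) leaves it at zero and
only sets the flag.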

Again, what problem was this patch trying to solve?

  -ilia


Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates

2016-06-28 Thread Samuel Pitoiset



On 06/28/2016 04:00 PM, Ilia Mirkin wrote:

On Tue, Jun 28, 2016 at 4:33 AM, Samuel Pitoiset
 wrote:



On 06/28/2016 05:10 AM, Ilia Mirkin wrote:


On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset
 wrote:




On 06/28/2016 12:06 AM, Ilia Mirkin wrote:



On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin 
wrote:



On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset
 wrote:





On 06/28/2016 12:02 AM, Ilia Mirkin wrote:




This loses you saturation. Does the target account for this?





No saturate flag for FADD32I.




That's not what I asked.




Specifically look at this code:

bool
TargetNVC0::isSatSupported(const Instruction *insn) const
{
   if (insn->op == OP_CVT)
      return true;
   if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT))
      return false;

   if (insn->dType == TYPE_U32)
      return (insn->op == OP_ADD) || (insn->op == OP_MAD);

   // add f32 LIMM cannot saturate
   if (insn->op == OP_ADD && insn->sType == TYPE_F32) {
      if (insn->getSrc(1)->asImm() &&
          insn->getSrc(1)->reg.data.u32 & 0xfff)
         return false;
   }

Note how it will say that sat is supported for SIMMs with FADD? So the
compiler will generate those ops, but then the emitter won't be able
to handle it.



Okay, I get it.



By the way, instead of trying to fight the longIMMD, you should just fix
it -

/*0008*/   @P0 FADD R0, R0, 1.NEG;  /*
0x3858203f8000 */

which corresponds nicely to

  emitNEG(0x2d, insn->src(1));

The issue is that emitIMMD does

   if (len == 19) {
...
      emitField( 56,   1, (val & 0x80000) >> 19);
      emitField(pos, len, (val & 0x7ffff));

So the problem is that the 56 isn't as fixed as the emission code had
hoped. I suspect that adjusting it will fix all these silly cases.

  -ilia



/*0010*/   @P0 FADD R0, R0, 0.NEG;  /*
0x38582000 */
/*0010*/   @P0 FADD R0, R0, -0;  /*
0x3958 */

urgh?


So ... what problem were you having again?


The thing is: why do those 2 instructions use a different position for the
neg flag?


And by the way, bit 56 is fixed for all short immediates.



  -ilia



--
-Samuel


Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates

2016-06-28 Thread Ilia Mirkin
On Tue, Jun 28, 2016 at 4:33 AM, Samuel Pitoiset
 wrote:
>
>
> On 06/28/2016 05:10 AM, Ilia Mirkin wrote:
>>
>> On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset
>>  wrote:
>>>
>>>
>>>
>>> On 06/28/2016 12:06 AM, Ilia Mirkin wrote:


>>>> On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin
>>>> wrote:
>>>>>
>>>>>
>>>>> On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset
>>>>>  wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 06/28/2016 12:02 AM, Ilia Mirkin wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This loses you saturation. Does the target account for this?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> No saturate flag for FADD32I.
>>>>>
>>>>>
>>>>>
>>>>> That's not what I asked.
>>>>
>>>>
>>>>
>>>> Specifically look at this code:
>>>>
>>>> bool
>>>> TargetNVC0::isSatSupported(const Instruction *insn) const
>>>> {
>>>>    if (insn->op == OP_CVT)
>>>>       return true;
>>>>    if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT))
>>>>       return false;
>>>>
>>>>    if (insn->dType == TYPE_U32)
>>>>       return (insn->op == OP_ADD) || (insn->op == OP_MAD);
>>>>
>>>>    // add f32 LIMM cannot saturate
>>>>    if (insn->op == OP_ADD && insn->sType == TYPE_F32) {
>>>>       if (insn->getSrc(1)->asImm() &&
>>>>           insn->getSrc(1)->reg.data.u32 & 0xfff)
>>>>          return false;
>>>>    }
>>>>
>>>> Note how it will say that sat is supported for SIMMs with FADD? So the
>>>> compiler will generate those ops, but then the emitter won't be able
>>>> to handle it.

>>>
>>> Okay, I get it.
>>
>>
>> By the way, instead of trying to fight the longIMMD, you should just fix
>> it -
>>
>> /*0008*/   @P0 FADD R0, R0, 1.NEG;  /*
>> 0x3858203f8000 */
>>
>> which corresponds nicely to
>>
>>   emitNEG(0x2d, insn->src(1));
>>
>> The issue is that emitIMMD does
>>
>>    if (len == 19) {
>> ...
>>       emitField( 56,   1, (val & 0x80000) >> 19);
>>       emitField(pos, len, (val & 0x7ffff));
>>
>> So the problem is that the 56 isn't as fixed as the emission code had
>> hoped. I suspect that adjusting it will fix all these silly cases.
>>
>>   -ilia
>>
>
> /*0010*/   @P0 FADD R0, R0, 0.NEG;  /*
> 0x38582000 */
> /*0010*/   @P0 FADD R0, R0, -0;  /*
> 0x3958 */
>
> urgh?

So ... what problem were you having again?

  -ilia


[Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags

2016-06-28 Thread Chuck Atkins
Encapsulate the test for which flags are needed to get a compiler to
support certain features.  Along with this, give various options to try
for AVX and AVX2 support.  Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different.  This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.

This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.

Cc: Tim Rowley 
Signed-off-by: Chuck Atkins 
---
 configure.ac | 86 +++-
 1 file changed, 62 insertions(+), 24 deletions(-)

diff --git a/configure.ac b/configure.ac
index cc9bc47..806850e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2330,6 +2330,39 @@ swr_llvm_check() {
 fi
 }
 
+swr_cxx_feature_flags_check() {
+    ifndef_test=$1
+    option_list="$2"
+    unset SWR_CXX_FEATURE_FLAGS
+    AC_LANG_PUSH([C++])
+    save_CXXFLAGS="$CXXFLAGS"
+    save_IFS="$IFS"
+    IFS=","
+    found=0
+    for opts in $option_list
+    do
+        unset IFS
+        CXXFLAGS="$opts $save_CXXFLAGS"
+        AC_COMPILE_IFELSE(
+            [AC_LANG_PROGRAM(
+                [   $ifndef_test
+                    #error
+                    #endif
+                ])],
+            [found=1; break],
+            [])
+        IFS=","
+    done
+    IFS="$save_IFS"
+    CXXFLAGS="$save_CXXFLAGS"
+    AC_LANG_POP([C++])
+    if test $found -eq 1; then
+        SWR_CXX_FEATURE_FLAGS="$opts"
+        return 0
+    fi
+    return 1
+}
+
 dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this block
 if test -n "$with_gallium_drivers"; then
 gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
@@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then
 xswr)
 swr_llvm_check "swr"
 
-AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
-SWR_AVX_CXXFLAGS="-mavx"
-SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
-
-AC_LANG_PUSH([C++])
-save_CXXFLAGS="$CXXFLAGS"
-CXXFLAGS="-std=c++11 $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([c++11 compiler support not detected])])
-CXXFLAGS="$save_CXXFLAGS"
-
-save_CXXFLAGS="$CXXFLAGS"
-CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([AVX compiler support not detected])])
-CXXFLAGS="$save_CXXFLAGS"
-
-save_CFLAGS="$CXXFLAGS"
-CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-  [AC_MSG_ERROR([AVX2 compiler support not detected])])
-CXXFLAGS="$save_CXXFLAGS"
-AC_LANG_POP([C++])
-
+AC_MSG_CHECKING([whether $CXX supports c++11])
+if ! swr_cxx_feature_flags_check \
+"#if __cplusplus < 201103L" \
+",-std=c++11"; then
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires C++11 support])
+fi
+AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS"
+
+AC_MSG_CHECKING([whether $CXX supports AVX])
+if ! swr_cxx_feature_flags_check \
+"#ifndef __AVX__" \
+",-mavx,-march=core-avx"; then
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires AVX compiler support])
+fi
+AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+SWR_AVX_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
 AC_SUBST([SWR_AVX_CXXFLAGS])
+
+AC_MSG_CHECKING([whether $CXX supports AVX2])
+if ! swr_cxx_feature_flags_check \
+"#if !(defined(__AVX2__)&&defined(__FMA__)&&defined(__BMI2__)&&defined(__F16C__))" \
+",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2"; then
+AC_MSG_RESULT([no])
+AC_MSG_ERROR([swr requires AVX2 compiler support])
+fi
+AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+SWR_AVX2_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
 AC_SUBST([SWR_AVX2_CXXFLAGS])
 
 HAVE_GALLIUM_SWR=yes
-- 
2.5.5
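
As a rough illustration of what the AVX2 probe asks the compiler, the
same preprocessor condition can be wrapped in a tiny C function. This
is a sketch and not part of the patch: the function name is
hypothetical, and it only assumes that the compiler predefines
__AVX2__, __FMA__, __BMI2__ and __F16C__ when the matching instruction
sets are enabled, which is what the configure test relies on.

```c
/* Hypothetical helper, not part of the patch: reports whether the flags
 * used for this translation unit enable the full feature set that the
 * AVX2 configure check looks for. */
static int swr_avx2_feature_set_enabled(void)
{
#if defined(__AVX2__) && defined(__FMA__) && defined(__BMI2__) && defined(__F16C__)
   return 1;   /* all four feature macros are predefined */
#else
   return 0;   /* at least one is missing; the configure probe would hit #error */
#endif
}
```

The configure function turns the same condition into a compile failure
via #error, so the first option set in the list that compiles is the
one that ends up in SWR_CXX_FEATURE_FLAGS.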



Re: [Mesa-dev] [PATCH 1/2] vl: add a bicubic interpolation filter(v4)

2016-06-28 Thread Grigori Goronzy

On 2016-06-28 11:25, Nayan Deshmukh wrote:

This is a shader based bicubic interpolater which uses cubic
Hermite spline algorithm.

v2: set dst_area and dst_clip during scaling (Christian)
v3: clear the render target before rendering
v4: initialize offsets while initializing shaders
use a constant buffer to send dst_size to frag shader
small changes to reduce calculation in shader

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/auxiliary/Makefile.sources   |   2 +
 src/gallium/auxiliary/vl/vl_bicubic_filter.c | 465 +++
 src/gallium/auxiliary/vl/vl_bicubic_filter.h |  63 
 3 files changed, 530 insertions(+)
 create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.c
 create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.h

diff --git a/src/gallium/auxiliary/Makefile.sources
b/src/gallium/auxiliary/Makefile.sources
index ab58358..e0311bf 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -317,6 +317,8 @@ NIR_SOURCES := \
nir/tgsi_to_nir.h

 VL_SOURCES := \
+   vl/vl_bicubic_filter.c \
+   vl/vl_bicubic_filter.h \
vl/vl_compositor.c \
vl/vl_compositor.h \
vl/vl_csc.c \
diff --git a/src/gallium/auxiliary/vl/vl_bicubic_filter.c
b/src/gallium/auxiliary/vl/vl_bicubic_filter.c
new file mode 100644
index 000..396e76d
--- /dev/null
+++ b/src/gallium/auxiliary/vl/vl_bicubic_filter.c
@@ -0,0 +1,465 @@
+/**
+ *
+ * Copyright 2016 Nayan Deshmukh.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **/

+
+#include 
+
+#include "pipe/p_context.h"
+
+#include "tgsi/tgsi_ureg.h"
+
+#include "util/u_draw.h"
+#include "util/u_memory.h"
+#include "util/u_math.h"
+#include "util/u_rect.h"
+
+#include "vl_types.h"
+#include "vl_vertex_buffers.h"
+#include "vl_bicubic_filter.h"
+
+enum VS_OUTPUT
+{
+   VS_O_VPOS = 0,
+   VS_O_VTEX = 0
+};
+
+static void *
+create_vert_shader(struct vl_bicubic_filter *filter)
+{
+   struct ureg_program *shader;
+   struct ureg_src i_vpos;
+   struct ureg_dst o_vpos, o_vtex;
+
+   shader = ureg_create(PIPE_SHADER_VERTEX);
+   if (!shader)
+  return NULL;
+
+   i_vpos = ureg_DECL_vs_input(shader, 0);
+   o_vpos = ureg_DECL_output(shader, TGSI_SEMANTIC_POSITION, VS_O_VPOS);
+   o_vtex = ureg_DECL_output(shader, TGSI_SEMANTIC_GENERIC, VS_O_VTEX);
+
+   ureg_MOV(shader, o_vpos, i_vpos);
+   ureg_MOV(shader, o_vtex, i_vpos);
+
+   ureg_END(shader);
+
+   return ureg_create_shader_and_destroy(shader, filter->pipe);
+}
+
+static void
+create_frag_shader_cubic_interpolater(struct ureg_program *shader, struct ureg_src tex_a,
+                                      struct ureg_src tex_b, struct ureg_src tex_c,
+                                      struct ureg_src tex_d, struct ureg_src t,
+                                      struct ureg_dst o_fragment)
+{
+   struct ureg_dst temp[11];
+   struct ureg_dst t_2;
+   unsigned i;
+
+   for(i = 0; i < 11; ++i)
+   temp[i] = ureg_DECL_temporary(shader);
+   t_2 = ureg_DECL_temporary(shader);
+
+   /*
+* |temp[0]|   |  0  2  0  0 |  |tex_a|
+* |temp[1]| = | -1  0  1  0 |* |tex_b|
+* |temp[2]|   |  2 -5  4 -1 |  |tex_c|
+* |temp[3]|   | -1  3 -3  1 |  |tex_d|
+*/
+   ureg_MUL(shader, temp[0], tex_b, ureg_imm1f(shader, 2.0f));
+
+   ureg_MUL(shader, temp[1], tex_a, ureg_imm1f(shader, -1.0f));
+   ureg_MAD(shader, temp[1], tex_c, ureg_imm1f(shader, 1.0f),
+ureg_src(temp[1]));
+
+   ureg_MUL(shader, temp[2], tex_a, ureg_imm1f(shader, 2.0f));
+   ureg_MAD(shader, temp[2], tex_b, ureg_imm1f(shader, -5.0f),
+ureg_src(temp[2]));
+   ureg_MAD(shader, temp[2], tex_c, ureg_imm1f(shader, 4.0f),
+ 

Re: [Mesa-dev] [PATCH 1/2] vl: add a bicubic interpolation filter(v4)

2016-06-28 Thread Christian König

Am 28.06.2016 um 11:25 schrieb Nayan Deshmukh:

This is a shader based bicubic interpolater which uses cubic
Hermite spline algorithm.

v2: set dst_area and dst_clip during scaling (Christian)
v3: clear the render target before rendering
v4: initialize offsets while initializing shaders
 use a constant buffer to send dst_size to frag shader
 small changes to reduce calculation in shader

Signed-off-by: Nayan Deshmukh 
---
  src/gallium/auxiliary/Makefile.sources   |   2 +
  src/gallium/auxiliary/vl/vl_bicubic_filter.c | 465 +++
  src/gallium/auxiliary/vl/vl_bicubic_filter.h |  63 
  3 files changed, 530 insertions(+)
  create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.c
  create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.h

diff --git a/src/gallium/auxiliary/Makefile.sources 
b/src/gallium/auxiliary/Makefile.sources
index ab58358..e0311bf 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -317,6 +317,8 @@ NIR_SOURCES := \
nir/tgsi_to_nir.h
  
  VL_SOURCES := \

+   vl/vl_bicubic_filter.c \
+   vl/vl_bicubic_filter.h \
vl/vl_compositor.c \
vl/vl_compositor.h \
vl/vl_csc.c \
diff --git a/src/gallium/auxiliary/vl/vl_bicubic_filter.c 
b/src/gallium/auxiliary/vl/vl_bicubic_filter.c
new file mode 100644
index 000..396e76d
--- /dev/null
+++ b/src/gallium/auxiliary/vl/vl_bicubic_filter.c
@@ -0,0 +1,465 @@
+/**
+ *
+ * Copyright 2016 Nayan Deshmukh.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **/
+
+#include 
+
+#include "pipe/p_context.h"
+
+#include "tgsi/tgsi_ureg.h"
+
+#include "util/u_draw.h"
+#include "util/u_memory.h"
+#include "util/u_math.h"
+#include "util/u_rect.h"
+
+#include "vl_types.h"
+#include "vl_vertex_buffers.h"
+#include "vl_bicubic_filter.h"
+
+enum VS_OUTPUT
+{
+   VS_O_VPOS = 0,
+   VS_O_VTEX = 0
+};
+
+static void *
+create_vert_shader(struct vl_bicubic_filter *filter)
+{
+   struct ureg_program *shader;
+   struct ureg_src i_vpos;
+   struct ureg_dst o_vpos, o_vtex;
+
+   shader = ureg_create(PIPE_SHADER_VERTEX);
+   if (!shader)
+  return NULL;
+
+   i_vpos = ureg_DECL_vs_input(shader, 0);
+   o_vpos = ureg_DECL_output(shader, TGSI_SEMANTIC_POSITION, VS_O_VPOS);
+   o_vtex = ureg_DECL_output(shader, TGSI_SEMANTIC_GENERIC, VS_O_VTEX);
+
+   ureg_MOV(shader, o_vpos, i_vpos);
+   ureg_MOV(shader, o_vtex, i_vpos);
+
+   ureg_END(shader);
+
+   return ureg_create_shader_and_destroy(shader, filter->pipe);
+}
+
+static void
+create_frag_shader_cubic_interpolater(struct ureg_program *shader, struct ureg_src tex_a,
+                                      struct ureg_src tex_b, struct ureg_src tex_c,
+                                      struct ureg_src tex_d, struct ureg_src t,
+                                      struct ureg_dst o_fragment)
+{
+   struct ureg_dst temp[11];
+   struct ureg_dst t_2;
+   unsigned i;
+
+   for(i = 0; i < 11; ++i)
+   temp[i] = ureg_DECL_temporary(shader);
+   t_2 = ureg_DECL_temporary(shader);
+
+   /*
+* |temp[0]|   |  0  2  0  0 |  |tex_a|
+* |temp[1]| = | -1  0  1  0 |* |tex_b|
+* |temp[2]|   |  2 -5  4 -1 |  |tex_c|
+* |temp[3]|   | -1  3 -3  1 |  |tex_d|
+*/
+   ureg_MUL(shader, temp[0], tex_b, ureg_imm1f(shader, 2.0f));
+
+   ureg_MUL(shader, temp[1], tex_a, ureg_imm1f(shader, -1.0f));
+   ureg_MAD(shader, temp[1], tex_c, ureg_imm1f(shader, 1.0f),
+ureg_src(temp[1]));
+
+   ureg_MUL(shader, temp[2], tex_a, ureg_imm1f(shader, 2.0f));
+   ureg_MAD(shader, temp[2], tex_b, ureg_imm1f(shader, -5.0f),
+ureg_src(temp[2]));
+   ureg_MAD(shader, temp[2], tex_c, ureg_imm1f(shader, 4.0f),
+   
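
The coefficient matrix in the quoted comment has the same rows as the
Catmull-Rom form of a cubic Hermite spline. A CPU-side reference for
one channel might look like the sketch below. The function name is
hypothetical, and the final 0.5 scale is an assumption taken from the
usual Catmull-Rom formulation, since the quoted patch is cut off
before the shader combines the terms.

```c
/* Hypothetical CPU reference for one colour channel.
 * a, b, c, d are four neighbouring samples; t in [0, 1] is the position
 * between b and c. */
static float cubic_hermite_ref(float a, float b, float c, float d, float t)
{
   /* Rows of the matrix from the comment, applied to (a, b, c, d). */
   float c0 = 2.0f * b;
   float c1 = -a + c;
   float c2 = 2.0f * a - 5.0f * b + 4.0f * c - d;
   float c3 = -a + 3.0f * b - 3.0f * c + d;

   /* Evaluate c0 + c1*t + c2*t^2 + c3*t^3 with Horner's rule.
    * Assumption: the result is scaled by 0.5 as in Catmull-Rom. */
   return 0.5f * (c0 + t * (c1 + t * (c2 + t * c3)));
}
```

Under that assumption the curve passes through b at t = 0 and through
c at t = 1, and it reproduces linear data exactly, which gives a quick
sanity check for the shader output.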

Re: [Mesa-dev] [PATCH resend] pipe_loader_sw: Fix fd leak when instantiated via pipe_loader_sw_probe_kms

2016-06-28 Thread Hans de Goede

Hi,

On 27-05-16 16:24, Emil Velikov wrote:

Hi Hans,

On 27 May 2016 at 15:06, Hans de Goede  wrote:

Make pipe_loader_sw_probe_kms take ownership of the passed in fd,
like pipe_loader_drm_probe_fd does.

The only caller is dri_kms_init_screen which passes in a dupped fd,
just like dri2_init_screen passes in a dupped fd to
pipe_loader_drm_probe_fd.


My memory is failing ... I thought I replied to this.

The patch is correct, so
Reviewed-by: Emil Velikov 


Thanks, unfortunately I was swamped with other stuff, so I did not
get around to pushing it until now.

It is pushed now.
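
For anyone reading along, the ownership rule from the commit message
(the probe function takes over the fd it is handed, so the caller
passes in a dup) can be sketched with plain POSIX calls. Everything
named fake_* below is hypothetical and only stands in for the pipe
loader; it is not the Mesa code.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a loader device that owns its fd. */
struct fake_device {
   int fd;
};

/* Takes ownership of fd on success: the caller must not close it afterwards. */
static bool fake_probe_takes_fd(struct fake_device *dev, int fd)
{
   if (fd < 0)
      return false;
   dev->fd = fd;
   return true;
}

/* Releasing the device closes the fd it owns, so nothing leaks. */
static void fake_release(struct fake_device *dev)
{
   if (dev->fd >= 0) {
      close(dev->fd);
      dev->fd = -1;
   }
}

/* Helper for checking whether a descriptor is still open. */
static bool fd_is_open(int fd)
{
   return fcntl(fd, F_GETFD) != -1;
}
```

A caller that wants to keep using its own descriptor hands over
dup(fd); after fake_release() the duplicate is closed while the
caller's original fd stays valid.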


I wonder when I'll get the chance to fold the
almost-but-not-quite-the-same sw and hw sides of the pipe loader. If
you're interested let me know.


Sorry, -ENOTIME.

Regards,

Hans


Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling

2016-06-28 Thread Nayan Deshmukh
Hi Christian and Andy,

I have sent a new series of patches which takes care of the points Christian
pointed out.

I have also made some changes to make it more efficient than before.

Also, due to a wrong message ID, I sent the messages as a new thread
instead of replying to this thread.

Regards,
Nayan.
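
To illustrate the pixel-center point Christian raises in the quoted
mail below, here is a small C sketch. It assumes normalized texture
coordinates with texel centers at (i + 0.5) / size; the helper names
are hypothetical, and whether the filter needs exactly this correction
is not settled in this thread.

```c
/* Fractional part that behaves like x - floor(x), also for negative x. */
static float frac_part(float x)
{
   float whole = (float)(long)x;  /* truncates toward zero */
   if (whole > x)
      whole -= 1.0f;              /* step down for negative inputs */
   return x - whole;
}

/* t = frac(coord * size): treats texel centers as if they sat at 0.0. */
static float texel_fraction_naive(float coord, float size)
{
   return frac_part(coord * size);
}

/* Shift by half a texel first, so a coordinate on a texel center gives 0. */
static float texel_fraction_centered(float coord, float size)
{
   return frac_part(coord * size - 0.5f);
}
```

For a coordinate that sits exactly on the center of texel 3 in an
8-texel row (3.5 / 8), the naive form reports an offset of 0.5 while
the centered form reports 0.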

On Mon, Jun 27, 2016 at 7:50 PM, Christian König 
wrote:

> This code fragment:
>
> +   /* t = frac(i_vtex*size)
> ...
> +   ureg_MUL(shader, t, i_vtex, ureg_imm1f(shader, size));
>
> Probably doesn't do what you expect it to do when the pixel center is at
> 0.5 instead of 0.0.
>
> For the matrix and most other filters the difference doesn't matter
> because you get the same offset on x/y as input you need to apply in the
> texture instructions as well.
>
> Regards,
> Christian.
>
>
> Am 27.06.2016 um 15:51 schrieb Nayan Deshmukh:
>
> Hi Christian,
>
> I haven't taken that into account, but how will it any way affect my
> calculation. I have written
> the code taking inspiration from the way matrix_filter uses offsets.
>
> Regards,
> Nayan.
>
> On Mon, Jun 27, 2016 at 6:55 PM, Christian König 
> wrote:
>
>> Hi guys,
>>
>> Nayan have you taken into account that the pixel center is at 0.5 and not
>> 0.0?
>>
>> Regards,
>> Christian.
>>
>>
>> Am 26.06.2016 um 22:30 schrieb Andy Furniss:
>>
>>> Nayan Deshmukh wrote:
>>>
>>>> Hi Andy,
>>>>
>>>> On Sun, Jun 26, 2016 at 12:25 AM, Andy Furniss <
>>>> adf.li...@gmail.com> wrote:
>>>>
>>>> Nayan Deshmukh wrote:
>>>>>
>>>>> Hi Andy,
>>>>>>
>>>>>> Thanks for testing the patches.
>>>>>>
>>>>>> Please send me the videos and ratios with which there is corruption.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> https://drive.google.com/file/d/0BxP5-S1t9VEEaHZEM203RFpyNEE/view?usp=sharing
>>>>>
>>>>> This has no aspect encoded and displayed fullscreen on a 1920x1080
>>>>> monitor shows vertical line artifacts over the first 2/3 of the image.
>>>>>
>>>>> When I say lines they are not lines as such just that the distortion
>>>>> on the pendulum shows as it passes over imaginary lines at fixed
>>>>> points on the screen.
>>>>>
>>>>> with mplayer -aspect 4/3 or 16/9 it doesn't.
>>>>>
>>>>>
>>>> I tested the videos and found out that the distortion is because of the
>>>> amount
>>>> of calculation done in the fragment shader. I tested the video with
>>>> vl_median_filter
>>>> and it showed no distortion however, with vl_matrix_filter( which
>>>> requires
>>>> more
>>>> calculations than vl_median_filter) it showed the same distortion. I'll
>>>> try
>>>> to make it
>>>> more efficient. But it still requires a lot of processing for a single
>>>> pixel as it uses
>>>> 15 neighbouring pixel.

>>>
>>> Seems a bit strange, does the processing needed vary greatly with
>>> similar scale amounts? I have a powerful GPU and can force clocks
>>> high, but it makes no difference.
>>>
>>> Below is a png showing the artifacts I see on pendulum fullscreen
>>> are these what you see?
>>>
>>> If rather than full screen I stretch out the window to scale, there
>>> will be many sizes that don't produce those.
>>>
>>>
>>> https://drive.google.com/file/d/0BxP5-S1t9VEEd2hwNVp0ZXRSZTA/view?usp=sharing
>>>
>>> Also I don't see any offsets with the videos, may be I am missing
>>>> something.
>>>> If could tell me more about the offsets, I'll try to debug them.

>>>
>>>
>>> https://drive.google.com/file/d/0BxP5-S1t9VEEUGZTbndOMzBNZnM/view?usp=sharing
>>>
>>> Is a default scale, if you download both pngs and use something to
>>> display them both at the same time and line up the windows one on
>>> top of the other then flip between them you can see although the
>>> windows are lined up the images contained are not.
>>>
>>> You can make your own screen/window shots with xwd and display them
>>> with xwud. For me using fluxbox as a desktop it's easy to line up
>>> windows as they snap a bit towards the edge of the screen YMMV.
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>>
>>
>
>


[Mesa-dev] [PATCH 2/2] st/vdpau: use bicubic filter for scaling(v6)

2016-06-28 Thread Nayan Deshmukh
use bicubic filtering as high quality scaling L1.

v2: fix a typo and add a newline to code
v3: -render the unscaled image on a temporary surface (Christian)
-apply noise reduction and sharpness filter on
 unscaled surface
-render the final scaled surface using bicubic
 interpolation
v4: support high quality scaling
v5: set dst_area and dst_clip in bicubic filter
v6: set buffer layer before setting dst_area

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/state_trackers/vdpau/mixer.c | 112 ---
 src/gallium/state_trackers/vdpau/query.c |   1 +
 src/gallium/state_trackers/vdpau/vdpau_private.h |   6 ++
 3 files changed, 105 insertions(+), 14 deletions(-)

diff --git a/src/gallium/state_trackers/vdpau/mixer.c 
b/src/gallium/state_trackers/vdpau/mixer.c
index 65c3ce2..4dbbdf6 100644
--- a/src/gallium/state_trackers/vdpau/mixer.c
+++ b/src/gallium/state_trackers/vdpau/mixer.c
@@ -82,7 +82,6 @@ vlVdpVideoMixerCreate(VdpDevice device,
   switch (features[i]) {
   /* they are valid, but we doesn't support them */
   case VDP_VIDEO_MIXER_FEATURE_DEINTERLACE_TEMPORAL_SPATIAL:
-  case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L1:
   case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L2:
   case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L3:
   case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L4:
@@ -110,6 +109,9 @@ vlVdpVideoMixerCreate(VdpDevice device,
  vmixer->luma_key.supported = true;
  break;
 
+  case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L1:
+ vmixer->bicubic.supported = true;
+ break;
   default: goto no_params;
   }
}
@@ -202,6 +204,11 @@ vlVdpVideoMixerDestroy(VdpVideoMixer mixer)
   vl_matrix_filter_cleanup(vmixer->sharpness.filter);
   FREE(vmixer->sharpness.filter);
}
+
+   if (vmixer->bicubic.filter) {
+  vl_bicubic_filter_cleanup(vmixer->bicubic.filter);
+  FREE(vmixer->bicubic.filter);
+   }
    pipe_mutex_unlock(vmixer->device->mutex);
    DeviceReference(&vmixer->device, NULL);
 
@@ -230,9 +237,11 @@ VdpStatus vlVdpVideoMixerRender(VdpVideoMixer mixer,
 VdpLayer const *layers)
 {
enum vl_compositor_deinterlace deinterlace;
-   struct u_rect rect, clip, *prect;
+   struct u_rect rect, clip, *prect, dirty_area;
unsigned i, layer = 0;
struct pipe_video_buffer *video_buffer;
+   struct pipe_sampler_view *sampler_view;
+   struct pipe_surface *surface;
 
vlVdpVideoMixer *vmixer;
vlVdpSurface *surf;
@@ -325,7 +334,43 @@ VdpStatus vlVdpVideoMixerRender(VdpVideoMixer mixer,
       prect = &rect;
    }
    vl_compositor_set_buffer_layer(&vmixer->cstate, compositor, layer, video_buffer, prect, NULL, deinterlace);
-   vl_compositor_set_layer_dst_area(&vmixer->cstate, layer++, RectToPipe(destination_video_rect, &rect));
+
+   if(vmixer->bicubic.filter) {
+      struct pipe_context *pipe;
+      struct pipe_resource res_tmpl, *res;
+      struct pipe_sampler_view sv_templ;
+      struct pipe_surface surf_templ;
+
+      pipe = vmixer->device->context;
+      memset(&res_tmpl, 0, sizeof(res_tmpl));
+
+      res_tmpl.target = PIPE_TEXTURE_2D;
+      res_tmpl.width0 = surf->templat.width;
+      res_tmpl.height0 = surf->templat.height;
+      res_tmpl.format = dst->sampler_view->texture->format;
+      res_tmpl.depth0 = 1;
+      res_tmpl.array_size = 1;
+      res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET;
+      res_tmpl.usage = PIPE_USAGE_DEFAULT;
+
+      res = pipe->screen->resource_create(pipe->screen, &res_tmpl);
+
+      vlVdpDefaultSamplerViewTemplate(&sv_templ, res);
+      sampler_view = pipe->create_sampler_view(pipe, res, &sv_templ);
+
+      memset(&surf_templ, 0, sizeof(surf_templ));
+      surf_templ.format = res->format;
+      surface = pipe->create_surface(pipe, res, &surf_templ);
+
+      vl_compositor_reset_dirty_area(&dirty_area);
+      pipe_resource_reference(&res, NULL);
+   } else {
+      surface = dst->surface;
+      sampler_view = dst->sampler_view;
+      dirty_area = dst->dirty_area;
+      vl_compositor_set_layer_dst_area(&vmixer->cstate, layer++, RectToPipe(destination_video_rect, &rect));
+      vl_compositor_set_dst_clip(&vmixer->cstate, RectToPipe(destination_rect, &clip));
+   }
 
for (i = 0; i < layer_count; ++i) {
   vlVdpOutputSurface *src = vlGetDataHTAB(layers->source_surface);
@@ -343,22 +388,29 @@ VdpStatus vlVdpVideoMixerRender(VdpVideoMixer mixer,
   ++layers;
}
 
-   vl_compositor_set_dst_clip(&vmixer->cstate, RectToPipe(destination_rect, &clip));
-   if (!vmixer->noise_reduction.filter && !vmixer->sharpness.filter)
+   if (!vmixer->noise_reduction.filter && !vmixer->sharpness.filter && !vmixer->bicubic.filter)
       vlVdpSave4DelayedRendering(vmixer->device, destination_surface, &vmixer->cstate);
    else {
-      vl_compositor_render(&vmixer->cstate, compositor, dst->surface, &dst->dirty_area, true);
+      vl_compositor_render(&vmixer->cstate, compositor, surface, &dirty_area, true);
 
-  /* applying the noise reduction 

[Mesa-dev] [PATCH 1/2] vl: add a bicubic interpolation filter(v4)

2016-06-28 Thread Nayan Deshmukh
This is a shader-based bicubic interpolator which uses the cubic
Hermite spline algorithm.

v2: set dst_area and dst_clip during scaling (Christian)
v3: clear the render target before rendering
v4: intialize offsets while initializing shaders
use a constant buffer to send dst_size to frag shader
small changes to reduce calculation in shader

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/auxiliary/Makefile.sources   |   2 +
 src/gallium/auxiliary/vl/vl_bicubic_filter.c | 465 +++
 src/gallium/auxiliary/vl/vl_bicubic_filter.h |  63 
 3 files changed, 530 insertions(+)
 create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.c
 create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.h

diff --git a/src/gallium/auxiliary/Makefile.sources 
b/src/gallium/auxiliary/Makefile.sources
index ab58358..e0311bf 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -317,6 +317,8 @@ NIR_SOURCES := \
nir/tgsi_to_nir.h
 
 VL_SOURCES := \
+   vl/vl_bicubic_filter.c \
+   vl/vl_bicubic_filter.h \
vl/vl_compositor.c \
vl/vl_compositor.h \
vl/vl_csc.c \
diff --git a/src/gallium/auxiliary/vl/vl_bicubic_filter.c 
b/src/gallium/auxiliary/vl/vl_bicubic_filter.c
new file mode 100644
index 000..396e76d
--- /dev/null
+++ b/src/gallium/auxiliary/vl/vl_bicubic_filter.c
@@ -0,0 +1,465 @@
+/**
+ *
+ * Copyright 2016 Nayan Deshmukh.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **/
+
+#include 
+
+#include "pipe/p_context.h"
+
+#include "tgsi/tgsi_ureg.h"
+
+#include "util/u_draw.h"
+#include "util/u_memory.h"
+#include "util/u_math.h"
+#include "util/u_rect.h"
+
+#include "vl_types.h"
+#include "vl_vertex_buffers.h"
+#include "vl_bicubic_filter.h"
+
+enum VS_OUTPUT
+{
+   VS_O_VPOS = 0,
+   VS_O_VTEX = 0
+};
+
+static void *
+create_vert_shader(struct vl_bicubic_filter *filter)
+{
+   struct ureg_program *shader;
+   struct ureg_src i_vpos;
+   struct ureg_dst o_vpos, o_vtex;
+
+   shader = ureg_create(PIPE_SHADER_VERTEX);
+   if (!shader)
+  return NULL;
+
+   i_vpos = ureg_DECL_vs_input(shader, 0);
+   o_vpos = ureg_DECL_output(shader, TGSI_SEMANTIC_POSITION, VS_O_VPOS);
+   o_vtex = ureg_DECL_output(shader, TGSI_SEMANTIC_GENERIC, VS_O_VTEX);
+
+   ureg_MOV(shader, o_vpos, i_vpos);
+   ureg_MOV(shader, o_vtex, i_vpos);
+
+   ureg_END(shader);
+
+   return ureg_create_shader_and_destroy(shader, filter->pipe);
+}
+
+static void
+create_frag_shader_cubic_interpolater(struct ureg_program *shader, struct 
ureg_src tex_a,
+  struct ureg_src tex_b, struct ureg_src 
tex_c,
+  struct ureg_src tex_d, struct ureg_src t,
+  struct ureg_dst o_fragment)
+{
+   struct ureg_dst temp[11];
+   struct ureg_dst t_2;
+   unsigned i;
+
+   for(i = 0; i < 11; ++i)
+   temp[i] = ureg_DECL_temporary(shader);
+   t_2 = ureg_DECL_temporary(shader);
+
+   /*
+* |temp[0]|   |  0  2  0  0 |  |tex_a|
+* |temp[1]| = | -1  0  1  0 |* |tex_b|
+* |temp[2]|   |  2 -5  4 -1 |  |tex_c|
+* |temp[3]|   | -1  3 -3  1 |  |tex_d|
+*/
+   ureg_MUL(shader, temp[0], tex_b, ureg_imm1f(shader, 2.0f));
+
+   ureg_MUL(shader, temp[1], tex_a, ureg_imm1f(shader, -1.0f));
+   ureg_MAD(shader, temp[1], tex_c, ureg_imm1f(shader, 1.0f),
+ureg_src(temp[1]));
+
+   ureg_MUL(shader, temp[2], tex_a, ureg_imm1f(shader, 2.0f));
+   ureg_MAD(shader, temp[2], tex_b, ureg_imm1f(shader, -5.0f),
+ureg_src(temp[2]));
+   ureg_MAD(shader, temp[2], tex_c, ureg_imm1f(shader, 4.0f),
+ureg_src(temp[2]));
+   ureg_MAD(shader, temp[2], tex_d, 
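
The coefficient matrix in the comment above can be checked on the CPU. Below is a minimal sketch in plain C, not the shader from the patch: cubic_interpolate is a hypothetical helper, and the final 0.5 scale is an assumption (it is the usual Catmull-Rom normalisation for this matrix; the archived diff is cut off before that step).

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: evaluates the cubic through
 * four samples a, b, c, d at t in [0, 1], i.e. between b and c, using the
 * rows of the matrix from the patch comment as polynomial coefficients. */
static float cubic_interpolate(float a, float b, float c, float d, float t)
{
   assert(t >= 0.0f && t <= 1.0f);

   const float k0 = 2.0f * b;                            /* |  0  2  0  0 | */
   const float k1 = -a + c;                              /* | -1  0  1  0 | */
   const float k2 = 2.0f * a - 5.0f * b + 4.0f * c - d;  /* |  2 -5  4 -1 | */
   const float k3 = -a + 3.0f * b - 3.0f * c + d;        /* | -1  3 -3  1 | */

   /* Horner form of k0 + k1*t + k2*t^2 + k3*t^3; the 0.5 factor is the
    * assumed normalisation, it is not visible in the truncated diff. */
   return 0.5f * (k0 + t * (k1 + t * (k2 + t * k3)));
}
```

With that assumed scale the curve passes through b at t = 0 and through c at t = 1, which is the property one would expect from an interpolating filter.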

Re: [Mesa-dev] [PATCH 2/3] st/omx: add support for nouveau / interlaced

2016-06-28 Thread Christian König

Hi Leo,

Nice catch, the patch is Reviewed-by: Christian König .

> But we still need to fix transcoding issue with interlaced as true.
> Our transcode support tunneling, basic the decode buffer will be used
> directly for encode.

Ah, yes of course. Sorry, I was a bit fast with giving my rb on that; I
should have thought about it more.

The problem is that the VCE engine can only handle progressive frames
and not the interlaced memory layout.

What we should do is implement interlaced -> progressive conversion
in the omx state tracker tunneling handling when that happens. Then set
interlaced to false in the template and reallocate the video buffer for
the next transcoding round.

The weave filter from the compositor could be used for interlaced ->
progressive conversion. And btw: we are going to have the same problem
with the VA-API state tracker.


Regards,
Christian.
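
For readers unfamiliar with weaving, the interlaced -> progressive conversion mentioned above can be illustrated on the CPU. This is only a sketch under assumptions: the compositor's weave filter runs as a shader, and weave_fields with its parameters is a hypothetical helper, not an existing Mesa function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: interleaves the rows of a top and a bottom field
 * into one progressive plane. Strides are in bytes; one byte per pixel is
 * assumed for simplicity. */
static void weave_fields(uint8_t *dst, size_t dst_stride,
                         const uint8_t *top, const uint8_t *bottom,
                         size_t field_stride, size_t width,
                         size_t field_height)
{
   assert(dst_stride >= width && field_stride >= width);

   for (size_t y = 0; y < field_height; ++y) {
      /* even output rows come from the top field */
      memcpy(dst + (2 * y) * dst_stride, top + y * field_stride, width);
      /* odd output rows come from the bottom field */
      memcpy(dst + (2 * y + 1) * dst_stride, bottom + y * field_stride, width);
   }
}
```

The output plane has twice the height of each field, which is why the video buffer would need to be reallocated once interlaced is set to false.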

Am 28.06.2016 um 09:01 schrieb Julien Isorce:

Thx Leo.
I confirm it works with nouveau driver so your fix is:
Tested-by: Julien Isorce


On 28 June 2016 at 02:27, Liu, Leo wrote:


Hi Julien and Christian,


I got a patch attached to fix the "fillout" problem, and please
review.


But we still need to fix transcoding issue with interlaced as
true. Our transcode support tunneling, basic the decode buffer
will be used directly for encode.


Thanks,

Leo




*From:* Julien Isorce
*Sent:* June 27, 2016 4:54:07 PM
*To:* Liu, Leo
*Cc:* ML mesa-dev; Gurkirpal Singh; Koenig, Christian
*Subject:* Re: [Mesa-dev] [PATCH 2/3] st/omx: add support for
nouveau / interlaced
Hi Leo,

Sorry for the inconvenience, could you let me know how to
reproduce the problem ?
I have been playing with some gst pipelines and they all work but
I can only test with nouveau driver.

Cheers
Julien


On 27 June 2016 at 21:35, Leo Liu wrote:

This patch break omx decode to file, it got seg fault. Will
take look further.

Regards,
Leo



On 06/27/2016 04:16 AM, Julien Isorce wrote:

Signed-off-by: Julien Isorce
---
src/gallium/state_trackers/omx/vid_dec.c | 51

  1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/src/gallium/state_trackers/omx/vid_dec.c
b/src/gallium/state_trackers/omx/vid_dec.c
index 564ca2f..85ffb88 100644
--- a/src/gallium/state_trackers/omx/vid_dec.c
+++ b/src/gallium/state_trackers/omx/vid_dec.c
@@ -48,6 +48,7 @@
  #include "pipe/p_video_codec.h"
  #include "util/u_memory.h"
  #include "util/u_surface.h"
+#include "vl/vl_video_buffer.h"
  #include "vl/vl_vlc.h"
#include "entrypoint.h"
@@ -515,34 +516,34 @@ static void
vid_dec_FillOutput(vid_dec_PrivateType *priv, struct
pipe_video_buff
 OMX_VIDEO_PORTDEFINITIONTYPE *def =
>sPortParam.format.video;
   struct pipe_sampler_view **views;
-   struct pipe_transfer *transfer;
-   struct pipe_box box = { };
-   uint8_t *src, *dst;
+   unsigned i, j;
+   unsigned width, height;
   views = buf->get_sampler_view_planes(buf);
  -   dst = output->pBuffer;
-
-   box.width = def->nFrameWidth;
-   box.height = def->nFrameHeight;
-   box.depth = 1;
-
-   src = priv->pipe->transfer_map(priv->pipe,
views[0]->texture, 0,
- PIPE_TRANSFER_READ, &box, &transfer);
-   util_copy_rect(dst, views[0]->texture->format,
def->nStride, 0, 0,
-  box.width, box.height, src,
transfer->stride, 0, 0);
-   pipe_transfer_unmap(priv->pipe, transfer);
-
-   dst = ((uint8_t*)output->pBuffer) + (def->nStride *
box.height);
-
-   box.width = def->nFrameWidth / 2;
-   box.height = def->nFrameHeight / 2;
-
-   src = priv->pipe->transfer_map(priv->pipe,
views[1]->texture, 0,
- PIPE_TRANSFER_READ, &box, &transfer);
-   util_copy_rect(dst, views[1]->texture->format,
def->nStride, 0, 0,
-  box.width, box.height, src,
transfer->stride, 0, 0);
-   pipe_transfer_unmap(priv->pipe, transfer);
+   for 

Re: [Mesa-dev] [PATCH 0/7] mesa: Enable -fstrict-aliasing

2016-06-28 Thread Erik Faye-Lund
On Mon, Jun 27, 2016 at 11:42 PM, Matt Turner  wrote:
> Based on work by Davin McCall  from last summer.
>
> The biggest change is to exec_list. Previously, the head and tail sentinels
> overlapped, saving the size of a pointer. Unfortunately this is not allowed by
> the aliasing rules.
>
> I have fixed all warnings GCC reports in my normal build. I have not attempted
> to see what else needs to be fixed. I hope that the respective owners of the
> rest of Mesa can look into the remaining warnings.
>
> This series depends on my 4 patch series to glx, and the trivial "[PATCH] 
> i965:
> Simplify foreach_inst_in_block_safe() macro."
>
> Discuss!

I like it. I have some similar patches here:
https://github.com/kusma/mesa/tree/strict-aliasing

I'm not entirely convinced about the endianness-correctness of all of
the memcpy-conversions, though... but I could easily be wrong.
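
As an illustration of the memcpy-conversions mentioned above, here is a minimal sketch of reading a float's bit pattern without casting pointers between incompatible types. float_bits is a hypothetical name, not taken from either patch series, and the sketch assumes a 32-bit IEEE-754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the object representation of a float as a
 * uint32_t. Copying through memcpy avoids dereferencing a uint32_t pointer
 * that actually points at a float, which is what strict aliasing forbids. */
static uint32_t float_bits(float f)
{
   uint32_t u;

   assert(sizeof u == sizeof f);
   memcpy(&u, &f, sizeof u);
   return u;
}
```

A same-size copy like this one should not depend on byte order as long as floats and integers share it on the target; the endianness concern is more likely to apply where a memcpy copies between objects of different sizes.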


Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates

2016-06-28 Thread Samuel Pitoiset



On 06/28/2016 05:10 AM, Ilia Mirkin wrote:

On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset
 wrote:



On 06/28/2016 12:06 AM, Ilia Mirkin wrote:


On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin  wrote:


On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset
 wrote:




On 06/28/2016 12:02 AM, Ilia Mirkin wrote:



This loses you saturation. Does the target account for this?




No saturate flag for FADD32I.



That's not what I asked.



Specifically look at this code:

bool
TargetNVC0::isSatSupported(const Instruction *insn) const
{
   if (insn->op == OP_CVT)
  return true;
   if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT))
  return false;

   if (insn->dType == TYPE_U32)
  return (insn->op == OP_ADD) || (insn->op == OP_MAD);

   // add f32 LIMM cannot saturate
   if (insn->op == OP_ADD && insn->sType == TYPE_F32) {
  if (insn->getSrc(1)->asImm() &&
  insn->getSrc(1)->reg.data.u32 & 0xfff)
 return false;
   }

Note how it will say that sat is supported for SIMMs with FADD? So the
compiler will generate those ops, but then the emitter won't be able
to handle it.



Okay, I get it.


By the way, instead of trying to fight the longIMMD, you should just fix it -

/*0008*/   @P0 FADD R0, R0, 1.NEG;  /* 0x3858203f8000 */

which corresponds nicely to

  emitNEG(0x2d, insn->src(1));

The issue is that emitIMMD does

   if (len == 19) {
...
  emitField( 56,   1, (val & 0x8) >> 19);
  emitField(pos, len, (val & 0x7));

So the problem is that the 56 isn't as fixed as the emission code had
hoped. I suspect that adjusting it will fix all these silly cases.

  -ilia
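
The split that emitIMMD performs for the 19-bit case can be sketched outside the emitter. This is only an illustration in plain C (the emitter itself is C++), under several assumptions: the masks in the archived mail appear truncated, so 0x80000 and 0x7ffff below are derived from the 19-bit field length; the 12-bit shift is inferred from the 0xfff check in isSatSupported; and struct short_imm and split_f32_short_imm are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two pieces of a short immediate. */
struct short_imm {
   uint32_t sign;   /* the single bit the emitter places at position 56 */
   uint32_t field;  /* the 19 bits placed at the operand position */
};

/* Hypothetical helper: splits an f32 bit pattern whose low 12 bits are
 * zero into a sign bit and a 19-bit field. */
static struct short_imm split_f32_short_imm(uint32_t bits)
{
   struct short_imm r;
   uint32_t val;

   assert((bits & 0xfffu) == 0);   /* otherwise a long immediate is needed */
   val = bits >> 12;               /* keep the top 20 bits */

   r.sign = (val & 0x80000u) >> 19;
   r.field = val & 0x7ffffu;
   return r;
}
```

For 1.0f (0x3f800000) this gives field 0x3f800 with sign 0, and for -1.0f the same field with sign 1, which would explain why the disassembly shows the constant as "1.NEG" when the sign bit lands on a position that the FADD encoding treats as a negate modifier.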



/*0010*/   @P0 FADD R0, R0, 0.NEG;  /* 0x38582000 */
/*0010*/   @P0 FADD R0, R0, -0;  /* 0x3958 */


urgh?

--
-Samuel


[Mesa-dev] nouveau_drv_video.so ?

2016-06-28 Thread poma

nouveau_drv_video.so - what should it be?


https://koji.fedoraproject.org/koji/buildinfo?buildID=722316
... 0.7.4-13 - Revert symlinks - should be handled by mesa rhbz#1271842
https://bugzilla.redhat.com/show_bug.cgi?id=1271842
... 0.7.4-12 - Add symlinks for radeonsi,r600,nouveau - rhbz#1264499
 https://bugzilla.redhat.com/show_bug.cgi?id=1264499


$ rpm -q libva libva-vdpau-driver mesa-dri-drivers
libva-1.7.1-1.fc24.x86_64
libva-vdpau-driver-0.7.4-14.fc24.x86_64
mesa-dri-drivers-11.2.2-2.20160614.fc24.x86_64

$ rpm -ql libva-vdpau-driver 
/usr/lib64/dri/nvidia_drv_video.so
/usr/lib64/dri/s3g_drv_video.so
/usr/lib64/dri/vdpau_drv_video.so
/usr/share/doc/...
...

$ rpm -ql mesa-dri-drivers
/etc/drirc
/usr/lib64/dri/gallium_drv_video.so
/usr/lib64/dri/i915_dri.so
/usr/lib64/dri/i965_dri.so
/usr/lib64/dri/ilo_dri.so
/usr/lib64/dri/kms_swrast_dri.so
/usr/lib64/dri/nouveau_dri.so
/usr/lib64/dri/nouveau_vieux_dri.so
/usr/lib64/dri/r200_dri.so
/usr/lib64/dri/r300_dri.so
/usr/lib64/dri/r600_dri.so
/usr/lib64/dri/radeon_dri.so
/usr/lib64/dri/radeonsi_dri.so
/usr/lib64/dri/swrast_dri.so
/usr/lib64/dri/virtio_gpu_dri.so
/usr/lib64/dri/vmwgfx_dri.so
/usr/lib64/gallium-pipe
/usr/lib64/gallium-pipe/pipe_i965.so
/usr/lib64/gallium-pipe/pipe_nouveau.so
/usr/lib64/gallium-pipe/pipe_r300.so
/usr/lib64/gallium-pipe/pipe_r600.so
/usr/lib64/gallium-pipe/pipe_radeonsi.so
/usr/lib64/gallium-pipe/pipe_swrast.so
/usr/lib64/gallium-pipe/pipe_vmwgfx.so

$ ll /usr/lib64/dri/
... dummy_drv_video.so
... gallium_drv_video.so
... i915_dri.so
... i965_dri.so
... ilo_dri.so
... kms_swrast_dri.so
... nouveau_dri.so
... nouveau_vieux_dri.so
... nvidia_drv_video.so -> vdpau_drv_video.so
... r200_dri.so
... r300_dri.so
... r600_dri.so
... radeon_dri.so
... radeonsi_dri.so
... s3g_drv_video.so -> vdpau_drv_video.so
... swrast_dri.so
... vdpau_drv_video.so
... virtio_gpu_dri.so
... vmwgfx_dri.so


$ icecat 
...
libva info: VA-API version 0.39.2
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib64/dri/nouveau_drv_video.so
libva info: va_openDriver() returns -1
libva info: VA-API version 0.39.2
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib64/dri/nouveau_drv_video.so
libva info: va_openDriver() returns -1
libva info: VA-API version 0.39.2
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib64/dri/gallium_drv_video.so
libva info: Found init function __vaDriverInit_0_39
libva info: va_openDriver() returns 0




Re: [Mesa-dev] V3 On disk shader cache for i965 (Now with real world results!)

2016-06-28 Thread Timothy Arceri
On Mon, 2016-06-27 at 00:46 +1000, Timothy Arceri wrote:
> On Sun, 2016-06-26 at 16:15 +0300, Grazvydas Ignotas wrote:
> > Tried this while playing with apitrace and am getting segfaults
> > when
> > running any trace with a cached (second) run. Not sure if it's
> > "wrong"
> > traces I've chosen or what, you can take one example from this bug:
> > https://bugs.freedesktop.org/show_bug.cgi?id=96425
> 
> Thanks for testing I'll take a look tomorrow.

The problem is that the shaders were being detached after linking, so we
had nothing to fall back to on a shader cache miss.
I've hacked something up and pushed it to the shader-cache19 branch
that allows the trace to run. Not sure how it relates to real game
performance, but the trace goes from 5FPS to 7FPS on the second run on
my machine, which looks good :)

Note I would only use that branch for short testing (e.g. running
traces), not in real games, as the hack above will leak memory. I'll be
travelling tomorrow but should have a real fix by Thursday.

Thanks again for testing.

> 
> > 
> > It would also be good idea to hide the cache debug messages behind
> > some env var, or at least send them to stderr and not stdout, as
> > stdout breaks programs that pipe data through stdout like
> > qapitrace.
> 
> Right thats my next task, I should get this done tomorrow also. As
> stated below :) "For now I have left in some printf's as the feature
> is
> still disabled by default and they are useful for debugging. I intend
> to fix this soon to hide them behind an environment var."
> 
> Thanks again.
> 
> > 
> > Gražvydas
> > 
> > On Sun, Jun 26, 2016 at 7:16 AM, Timothy Arceri
> >  wrote:
> > > I've spent a bunch of time rebasing this series to remove the
> > > excess
> > > code churn and I've just pushed the results to the shader-cache
> > > branch
> > > mentioned below. There are no code changes to the end result but
> > > I've
> > > managed to get the patch count down to 80 (was 96 i think) and
> > > things
> > > should be much easier to review now.
> > > 
> > > I've also had reports of people testing with additional games
> > > such
> > > as
> > > Dota 2 and seeing good results.
> > > 
> > > 
> > > On Tue, 2016-06-21 at 16:08 +1000, Timothy Arceri wrote:
> > > > Rather than send 90+ patches to the list. Please see the repo
> > > > at
> > > > the
> > > > bottom of this email.
> > > > 
> > > > The big update is I've added all stages but compute and tested
> > > > with a
> > > > few games and everything seems to be working well so far.
> > > > Enabling
> > > > > shader cache with the Shadow of Mordor benchmark makes things
> > > > > noticeably
> > > > > smoother and helps consistently keep the min FPS at 15 on my
> > > > > Skylake,
> > > > > whereas without it the min can be anywhere between 4-15.
> > > > 
> > > > The elemental demo which Dave pointed out as also doing a bunch
> > > > of
> > > > compiles during the demo is also smoother especially on the
> > > > second
> > > > run
> > > > but its really slow on my Skylake regardless. Maybe someone
> > > > with
> > > > a
> > > > highend Skylake would like to give it a try.
> > > > 
> > > > 
> > > > V3:
> > > > - add support for geometry and tessellation stages
> > > > - cache clip planes
> > > > - reserve parameter storage before restoring list
> > > > - stop losing  buffer blocks on cache fallback
> > > > - lots of little fixes I cant remember
> > > > 
> > > > V2:
> > > > - rebased on master
> > > > - add support for encoding doubles
> > > > - renamed skip_cache params to is_cache_fallback, and fix
> > > > related
> > > > bug
> > > > when
> > > >  disabling shader cache for xfb.
> > > > 
> > > > This series is based on the great work done by Carl, Kristian
> > > > and
> > > > others.
> > > > 
> > > > I've split up Carls original patches for easier review, and
> > > > also
> > > > merged
> > > > a number of fixes and clean-ups into his patches. However there
> > > > is a
> > > > little more code churn than is ideal as the appoach taken by
> > > > the
> > > > original patches needed to be modified quite a lot, I'm hoping
> > > > its
> > > > not
> > > > more than people can live with as I'd like to keep some of the
> > > > history
> > > > rather than just squashing everything.
> > > > 
> > > > For now I have left in some printf's as the feature is still
> > > > disabled
> > > > by default and they are useful for debugging. I intend to fix
> > > > this
> > > > soon
> > > > to hide them behind an environment var.
> > > > 
> > > > There are no regressions after two runs of piglit with shader
> > > > cache
> > > > enabled on my Broadwell machine.
> > > > 
> > > > This series enables on disk shader cache for all stage except
> > > > compute
> > > > programs. For now transform feedback, and SSO programs skip
> > > > using
> > > > the
> > > > cache, these will be added as follow ups.
> > > > 
> > > > My main goal with this series is to land something that
> > > > passes piglit there is a number of optimisations 
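
The request quoted above, to keep cache debug messages off stdout and hide them behind an environment variable, could look roughly like this. It is a sketch under assumptions: EXAMPLE_SHADER_CACHE_DEBUG, cache_debug_enabled and cache_debug are hypothetical names, not the ones used in the shader-cache branch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical check: debugging is on when the (made-up) variable is set
 * to anything other than an empty string or a string starting with '0'. */
static bool cache_debug_enabled(void)
{
   const char *v = getenv("EXAMPLE_SHADER_CACHE_DEBUG");

   return v != NULL && v[0] != '\0' && v[0] != '0';
}

/* Hypothetical printf-style wrapper: writes to stderr so that programs
 * piping data through stdout are not disturbed. Returns the number of
 * characters written, or 0 when debugging is disabled. */
static int cache_debug(const char *fmt, ...)
{
   va_list ap;
   int n;

   assert(fmt != NULL);
   if (!cache_debug_enabled())
      return 0;

   va_start(ap, fmt);
   n = vfprintf(stderr, fmt, ap);
   va_end(ap);
   return n;
}
```

A real implementation would probably cache the getenv() result instead of querying it on every call.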

Re: [Mesa-dev] [PATCH 2/3] st/omx: add support for nouveau / interlaced

2016-06-28 Thread Julien Isorce
Thx Leo.
I confirm it works with nouveau driver so your fix is:
Tested-by: Julien Isorce 

On 28 June 2016 at 02:27, Liu, Leo  wrote:

> Hi Julien and Christian,
>
>
> I got a patch attached to fix the "fillout" problem, and please review.
>
>
> But we still need to fix transcoding issue with interlaced as true. Our
> transcode support tunneling, basic the decode buffer will be used directly
> for encode.
>
>
> Thanks,
>
> Leo
>
>
>
> --
> *From:* Julien Isorce 
> *Sent:* June 27, 2016 4:54:07 PM
> *To:* Liu, Leo
> *Cc:* ML mesa-dev; Gurkirpal Singh; Koenig, Christian
> *Subject:* Re: [Mesa-dev] [PATCH 2/3] st/omx: add support for nouveau /
> interlaced
>
> Hi Leo,
>
> Sorry for the inconvenience, could you let me know how to reproduce the
> problem ?
> I have been playing with some gst pipelines and they all work but I can
> only test with nouveau driver.
>
> Cheers
> Julien
>
>
> On 27 June 2016 at 21:35, Leo Liu  wrote:
>
>> This patch break omx decode to file, it got seg fault. Will take look
>> further.
>>
>> Regards,
>> Leo
>>
>>
>>
>> On 06/27/2016 04:16 AM, Julien Isorce wrote:
>>
>>> Signed-off-by: Julien Isorce 
>>> ---
>>>   src/gallium/state_trackers/omx/vid_dec.c | 51
>>> 
>>>   1 file changed, 26 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/src/gallium/state_trackers/omx/vid_dec.c
>>> b/src/gallium/state_trackers/omx/vid_dec.c
>>> index 564ca2f..85ffb88 100644
>>> --- a/src/gallium/state_trackers/omx/vid_dec.c
>>> +++ b/src/gallium/state_trackers/omx/vid_dec.c
>>> @@ -48,6 +48,7 @@
>>>   #include "pipe/p_video_codec.h"
>>>   #include "util/u_memory.h"
>>>   #include "util/u_surface.h"
>>> +#include "vl/vl_video_buffer.h"
>>>   #include "vl/vl_vlc.h"
>>> #include "entrypoint.h"
>>> @@ -515,34 +516,34 @@ static void vid_dec_FillOutput(vid_dec_PrivateType
>>> *priv, struct pipe_video_buff
>>>  OMX_VIDEO_PORTDEFINITIONTYPE *def = >sPortParam.format.video;
>>>struct pipe_sampler_view **views;
>>> -   struct pipe_transfer *transfer;
>>> -   struct pipe_box box = { };
>>> -   uint8_t *src, *dst;
>>> +   unsigned i, j;
>>> +   unsigned width, height;
>>>views = buf->get_sampler_view_planes(buf);
>>>   -   dst = output->pBuffer;
>>> -
>>> -   box.width = def->nFrameWidth;
>>> -   box.height = def->nFrameHeight;
>>> -   box.depth = 1;
>>> -
>>> -   src = priv->pipe->transfer_map(priv->pipe, views[0]->texture, 0,
>>> -  PIPE_TRANSFER_READ, &box, &transfer);
>>> -   util_copy_rect(dst, views[0]->texture->format, def->nStride, 0, 0,
>>> -  box.width, box.height, src, transfer->stride, 0, 0);
>>> -   pipe_transfer_unmap(priv->pipe, transfer);
>>> -
>>> -   dst = ((uint8_t*)output->pBuffer) + (def->nStride * box.height);
>>> -
>>> -   box.width = def->nFrameWidth / 2;
>>> -   box.height = def->nFrameHeight / 2;
>>> -
>>> -   src = priv->pipe->transfer_map(priv->pipe, views[1]->texture, 0,
>>> -  PIPE_TRANSFER_READ, &box, &transfer);
>>> -   util_copy_rect(dst, views[1]->texture->format, def->nStride, 0, 0,
>>> -  box.width, box.height, src, transfer->stride, 0, 0);
>>> -   pipe_transfer_unmap(priv->pipe, transfer);
>>> +   for (i = 0; i < 2 /* NV12 */; i++) {
>>> +  if (!views[i]) continue;
>>> +  width = buf->width;
>>> +  height = buf->height;
>>> +  vl_video_buffer_adjust_size(&width, &height, i, buf->interlaced,
>>> buf->chroma_format);
>>> +  for (j = 0; j < views[i]->texture->array_size; ++j) {
>>> + struct pipe_box box = {0, 0, j, width, height, 1};
>>> + struct pipe_transfer *transfer;
>>> + uint8_t *map, *dst;
>>> + map = priv->pipe->transfer_map(priv->pipe, views[i]->texture,
>>> 0,
>>> +  PIPE_TRANSFER_READ, &box, &transfer);
>>> + if (!map)
>>> +return;
>>> +
>>> + dst = ((uint8_t*)output->pBuffer + output->nOffset) + j *
>>> def->nStride + i * buf->width * buf->height;
>>> + util_copy_rect(dst,
>>> +views[i]->texture->format,
>>> +def->nStride * views[i]->texture->array_size, 0, 0,
>>> +box.width, box.height, map, transfer->stride, 0, 0);
>>> +
>>> + pipe_transfer_unmap(priv->pipe, transfer);
>>> +  }
>>> +   }
>>>   }
>>> static void vid_dec_FrameDecoded(OMX_COMPONENTTYPE *comp,
>>> OMX_BUFFERHEADERTYPE* input,
>>>
>>
>>
>


Re: [Mesa-dev] [PATCH 2/2] clover: fix getting struct args api size

2016-06-28 Thread Francisco Jerez
Francisco Jerez  writes:

> Serge Martin  writes:
>
>> This fixes getting the size of a struct arg. vec3 types still work ok.
>> Only built-in args need to have power of two alignment, getTypeAllocSize
>> reports the correct size.
>> ---
>>  src/gallium/state_trackers/clover/llvm/invocation.cpp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
>> b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> index 03487d6..9af51539 100644
>> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> @@ -472,7 +472,8 @@ namespace {
>>   // aligned to the next larger power of two".  We need this
>>   // alignment for three element vectors, which have
>>   // non-power-of-2 store size.
>> - const unsigned arg_api_size = 
>> util_next_power_of_two(arg_store_size);
>> + const unsigned arg_api_size = arg_type->isStructTy() ?
>> +   arg_store_size : util_next_power_of_two(arg_store_size);
>>  
> Hm...  Isn't this still going to be broken if you pass a struct argument
> to a kernel function and the alignment of any of the struct members
> doesn't match the target-specific data layout?  Not sure we can fix this
> sensibly without requiring the target's data layout to match the CL API
> exactly.  Any suggestions Tom?
>

Unless someone has a better plan, I suggest we roll back to v1.1 of this
patch and call it a back-end data layout bug if the expected alignment
or size of a kernel argument type doesn't match the requirements set by
the CL spec.
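
For reference, the size rule in the quoted v2 hunk can be written out as a small stand-alone sketch. It is plain C rather than clover's C++, and next_pow2 and arg_api_size are hypothetical stand-ins for util_next_power_of_two and the logic in invocation.cpp; it shows the behaviour under discussion, not a recommendation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for util_next_power_of_two: smallest power of two
 * that is greater than or equal to x. */
static unsigned next_pow2(unsigned x)
{
   unsigned p = 1;

   assert(x <= 0x80000000u);   /* avoid overflowing the shift below */
   while (p < x)
      p <<= 1;
   return p;
}

/* Hypothetical helper mirroring the quoted hunk: struct arguments keep
 * their store size, everything else is rounded up to a power of two so
 * that three-element vectors get the size of four-element ones. */
static unsigned arg_api_size(unsigned store_size, bool is_struct)
{
   return is_struct ? store_size : next_pow2(store_size);
}
```

A 12-byte float3 argument is padded to 16 bytes, while a 12-byte struct stays at 12. The sketch says nothing about member alignment inside the struct, which is the part questioned above.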

>>   llvm::Type *target_type = arg_type->isIntegerTy() ?
>> TD.getSmallestLegalIntType(mod->getContext(), arg_store_size 
>> * 8)
>> -- 
>> 2.5.5




Re: [Mesa-dev] [PATCH v2 31/34] i965/state: Account for the element size in emit_buffer_surface_state

2016-06-28 Thread Pohjolainen, Topi
On Thu, Jun 23, 2016 at 02:00:30PM -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 11 ++-
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  9 +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c|  9 +
>  3 files changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 944d64d..29b8976 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -496,6 +496,7 @@ gen4_emit_buffer_surface_state(struct brw_context *brw,
> unsigned pitch,
> bool rw)
>  {
> +   unsigned elements = buffer_size / pitch;

Could be const as well as in the two other occurrences further down.

> uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
>  6 * 4, 32, out_offset);
> memset(surf, 0, 6 * 4);
> @@ -504,9 +505,9 @@ gen4_emit_buffer_surface_state(struct brw_context *brw,
>   surface_format << BRW_SURFACE_FORMAT_SHIFT |
>   (brw->gen >= 6 ? BRW_SURFACE_RC_READ_WRITE : 0);
> surf[1] = (bo ? bo->offset64 : 0) + buffer_offset; /* reloc */
> -   surf[2] = ((buffer_size - 1) & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
> - (((buffer_size - 1) >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT;
> -   surf[3] = (((buffer_size - 1) >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
> +   surf[2] = ((elements - 1) & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
> + (((elements - 1) >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT;
> +   surf[3] = (((elements - 1) >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
>   (pitch - 1) << BRW_SURFACE_PITCH_SHIFT;
>  
> /* Emit relocation to surface contents.  The 965 PRM, Volume 4, section
> @@ -549,7 +550,7 @@ brw_update_buffer_texture_surface(struct gl_context *ctx,
> brw->vtbl.emit_buffer_surface_state(brw, surf_offset, bo,
> tObj->BufferOffset,
> brw_format,
> -   size / texel_size,
> +   size,
> texel_size,
> false /* rw */);
>  }
> @@ -1480,7 +1481,7 @@ update_image_surface(struct brw_context *brw,
>  
>   brw->vtbl.emit_buffer_surface_state(
>  brw, surf_offset, intel_obj->buffer, obj->BufferOffset,
> -format, intel_obj->Base.Size / texel_size, texel_size,
> +format, intel_obj->Base.Size, texel_size,
>  access != GL_READ_ONLY);
>  
>   update_buffer_image_param(brw, u, surface_idx, param);
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> index bb94f2d..65a1cb0 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> @@ -135,6 +135,7 @@ gen7_emit_buffer_surface_state(struct brw_context *brw,
> unsigned pitch,
> bool rw)
>  {
> +   unsigned elements = buffer_size / pitch;
> uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
>  8 * 4, 32, out_offset);
> memset(surf, 0, 8 * 4);
> @@ -143,12 +144,12 @@ gen7_emit_buffer_surface_state(struct brw_context *brw,
>   surface_format << BRW_SURFACE_FORMAT_SHIFT |
>   BRW_SURFACE_RC_READ_WRITE;
> surf[1] = (bo ? bo->offset64 : 0) + buffer_offset; /* reloc */
> -   surf[2] = SET_FIELD((buffer_size - 1) & 0x7f, GEN7_SURFACE_WIDTH) |
> - SET_FIELD(((buffer_size - 1) >> 7) & 0x3fff, 
> GEN7_SURFACE_HEIGHT);
> +   surf[2] = SET_FIELD((elements - 1) & 0x7f, GEN7_SURFACE_WIDTH) |
> + SET_FIELD(((elements - 1) >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT);
> if (surface_format == BRW_SURFACEFORMAT_RAW)
> -  surf[3] = SET_FIELD(((buffer_size - 1) >> 21) & 0x3ff, 
> BRW_SURFACE_DEPTH);
> +  surf[3] = SET_FIELD(((elements - 1) >> 21) & 0x3ff, BRW_SURFACE_DEPTH);
> else
> -  surf[3] = SET_FIELD(((buffer_size - 1) >> 21) & 0x3f, 
> BRW_SURFACE_DEPTH);
> +  surf[3] = SET_FIELD(((elements - 1) >> 21) & 0x3f, BRW_SURFACE_DEPTH);
> surf[3] |= (pitch - 1);
>  
> surf[5] = SET_FIELD(GEN7_MOCS_L3, GEN7_SURFACE_MOCS);
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index 00e4c48..9ac8a48 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -63,6 +63,7 @@ gen8_emit_buffer_surface_state(struct brw_context *brw,
> unsigned pitch,
> bool rw)
>  {
> +   unsigned 
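
To make the effect of the gen4 hunk quoted above concrete, here is a minimal sketch of how the element count is spread over the width, height and depth fields. The masks and shifts are copied from the quoted diff; struct buffer_dims and split_elements are hypothetical names, and the field shifts into the surface state dwords are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three fields before they are shifted
 * into surf[2] and surf[3]. */
struct buffer_dims {
   uint32_t width;   /* bits 0..6 of (elements - 1) */
   uint32_t height;  /* bits 7..19 */
   uint32_t depth;   /* bits 20..26 */
};

/* Hypothetical helper: derives the element count from the buffer size in
 * bytes and the pitch (element size), then splits (elements - 1) the way
 * the gen4 code in the patch does. */
static struct buffer_dims split_elements(unsigned buffer_size, unsigned pitch)
{
   struct buffer_dims d;

   assert(pitch > 0 && buffer_size >= pitch);
   const unsigned elements = buffer_size / pitch;

   d.width = (elements - 1) & 0x7f;
   d.height = ((elements - 1) >> 7) & 0x1fff;
   d.depth = ((elements - 1) >> 20) & 0x7f;
   return d;
}
```

For a 1024-byte buffer with 4-byte elements this yields 256 elements, so width 127, height 1 and depth 0. The local is marked const here, in line with the review comment.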
