Re: [Mesa-dev] Why do vulkan display surfaces not support alpha blending?

2020-03-19 Thread Keith Packard
Austin Shafer  writes:

> I'm just curious if there is a technical reason why blending isn't
> allowed, as the vulkan spec seems to permit it.

Just not implemented yet.

-- 
-keith




[Mesa-dev] Why do vulkan display surfaces not support alpha blending?

2020-03-19 Thread Austin Shafer
Hi all,

I noticed on an Intel laptop that the only supported alpha mode for
any display plane is opaque. Is there a specific reason for this? I'm
trying to draw interfaces and other things directly to the display, and it
would be really nice to have alpha blending.

From src/vulkan/wsi/wsi_common_display.c:

 721 VkResult
 722 wsi_get_display_plane_capabilities(VkPhysicalDevice physical_device,
 723                                    struct wsi_device *wsi_device,
 724                                    VkDisplayModeKHR mode_khr,
 725                                    uint32_t plane_index,
 726                                    VkDisplayPlaneCapabilitiesKHR *capabilities)
 727 {
 728    struct wsi_display_mode *mode = wsi_display_mode_from_handle(mode_khr);
 729
 730    /* XXX use actual values */
 731    capabilities->supportedAlpha = VK_DISPLAY_PLANE_ALPHA_OPAQUE_BIT_KHR;
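
For reference, this is the application-side query that ends up in the function
above. A minimal sketch (not from the patch; display_mode and plane_index are
assumed to come from the usual VK_KHR_display enumeration calls):

   VkDisplayPlaneCapabilitiesKHR caps;
   VkResult res = vkGetDisplayPlaneCapabilitiesKHR(physical_device,
                                                   display_mode,
                                                   plane_index, &caps);
   /* With the current WSI code supportedAlpha only ever contains
    * VK_DISPLAY_PLANE_ALPHA_OPAQUE_BIT_KHR, so this test never passes. */
   if (res == VK_SUCCESS &&
       (caps.supportedAlpha & VK_DISPLAY_PLANE_ALPHA_PER_PIXEL_BIT_KHR)) {
      /* could ask for per-pixel blending via alphaMode in
       * VkDisplaySurfaceCreateInfoKHR */
   }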

I'm just curious if there is a technical reason why blending isn't
allowed, as the vulkan spec seems to permit it.

Thanks!
Austin Shafer


[Mesa-dev] [RFC PATCH v2 6/6] nv50: Add shader disk caching

2020-03-19 Thread Mark Menzynski
Adds shader disk caching for nv50 so that shaders don't have to be recompiled
every time. Shaders are saved into the disk_shader_cache of the nv50_screen
structure.

The input nv50_ir_prog_info is serialized to compute the hash key and also to
do a byte comparison between the original nv50_ir_prog_info and the one saved
in the cache. If the keys match and the byte comparison shows they are equal,
the shaders are the same, and the compiled nv50_ir_prog_info_out from the
cache can be used instead of compiling the input info.
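
In rough outline, the lookup added to nv50_program_translate() works like this
(a condensed sketch of the flow in this patch; error handling, the shader_found
flag and the rest of the surrounding setup are omitted, and disk_shader_cache,
info and info_out are the variables used in that function):

   struct blob blob;
   cache_key key;
   size_t cached_size;

   blob_init(&blob);
   nv50_ir_prog_info_serialize(&blob, info);
   disk_cache_compute_key(disk_shader_cache, blob.data, blob.size, key);

   void *cached = disk_cache_get(disk_shader_cache, key, &cached_size);
   if (cached && cached_size >= blob.size &&
       memcmp(cached, blob.data, blob.size) == 0) {
      /* byte-exact match of the serialized input: reuse the compiled
       * program, which is stored right after the serialized info */
      nv50_ir_prog_info_out_deserialize(cached, cached_size, blob.size,
                                        &info_out);
   } else {
      /* miss: compile, then store serialized info + info_out under one key */
      nv50_ir_generate_code(info, &info_out);
      nv50_ir_prog_info_out_serialize(&blob, &info_out);
      disk_cache_put(disk_shader_cache, key, blob.data, blob.size, NULL);
   }
   free(cached);
   blob_finish(&blob);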

Signed-off-by: Mark Menzynski 
---
 .../drivers/nouveau/nv50/nv50_program.c   | 276 +++---
 .../drivers/nouveau/nv50/nv50_program.h   |   2 +
 .../drivers/nouveau/nv50/nv50_shader_state.c  |   4 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c |   1 +
 4 files changed, 47 insertions(+), 236 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index b5e36cf488d..156ac286a7f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -37,108 +37,6 @@ bitcount4(const uint32_t val)
return cnt[val & 0xf];
 }
 
-static int
-nv50_vertprog_assign_slots(struct nv50_ir_prog_info_out *info)
-{
-   struct nv50_program *prog = (struct nv50_program *)info->driverPriv;
-   unsigned i, n, c;
-
-   n = 0;
-   for (i = 0; i < info->numInputs; ++i) {
-  prog->in[i].id = i;
-  prog->in[i].sn = info->in[i].sn;
-  prog->in[i].si = info->in[i].si;
-  prog->in[i].hw = n;
-  prog->in[i].mask = info->in[i].mask;
-
-  prog->vp.attrs[(4 * i) / 32] |= info->in[i].mask << ((4 * i) % 32);
-
-  for (c = 0; c < 4; ++c)
- if (info->in[i].mask & (1 << c))
-info->in[i].slot[c] = n++;
-
-  if (info->in[i].sn == TGSI_SEMANTIC_PRIMID)
- prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_PRIMITIVE_ID;
-   }
-   prog->in_nr = info->numInputs;
-
-   for (i = 0; i < info->numSysVals; ++i) {
-  switch (info->sv[i].sn) {
-  case TGSI_SEMANTIC_INSTANCEID:
- prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_INSTANCE_ID;
- continue;
-  case TGSI_SEMANTIC_VERTEXID:
- prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID;
- prog->vp.attrs[2] |= 
NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START;
- continue;
-  default:
- break;
-  }
-   }
-
-   /*
-* Corner case: VP has no inputs, but we will still need to submit data to
-* draw it. HW will shout at us and won't draw anything if we don't enable
-* any input, so let's just pretend it's the first one.
-*/
-   if (prog->vp.attrs[0] == 0 &&
-   prog->vp.attrs[1] == 0 &&
-   prog->vp.attrs[2] == 0)
-  prog->vp.attrs[0] |= 0xf;
-
-   /* VertexID before InstanceID */
-   if (info->io.vertexId < info->numSysVals)
-  info->sv[info->io.vertexId].slot[0] = n++;
-   if (info->io.instanceId < info->numSysVals)
-  info->sv[info->io.instanceId].slot[0] = n++;
-
-   n = 0;
-   for (i = 0; i < info->numOutputs; ++i) {
-  switch (info->out[i].sn) {
-  case TGSI_SEMANTIC_PSIZE:
- prog->vp.psiz = i;
- break;
-  case TGSI_SEMANTIC_CLIPDIST:
- prog->vp.clpd[info->out[i].si] = n;
- break;
-  case TGSI_SEMANTIC_EDGEFLAG:
- prog->vp.edgeflag = i;
- break;
-  case TGSI_SEMANTIC_BCOLOR:
- prog->vp.bfc[info->out[i].si] = i;
- break;
-  case TGSI_SEMANTIC_LAYER:
- prog->gp.has_layer = true;
- prog->gp.layerid = n;
- break;
-  case TGSI_SEMANTIC_VIEWPORT_INDEX:
- prog->gp.has_viewport = true;
- prog->gp.viewportid = n;
- break;
-  default:
- break;
-  }
-  prog->out[i].id = i;
-  prog->out[i].sn = info->out[i].sn;
-  prog->out[i].si = info->out[i].si;
-  prog->out[i].hw = n;
-  prog->out[i].mask = info->out[i].mask;
-
-  for (c = 0; c < 4; ++c)
- if (info->out[i].mask & (1 << c))
-info->out[i].slot[c] = n++;
-   }
-   prog->out_nr = info->numOutputs;
-   prog->max_out = n;
-   if (!prog->max_out)
-  prog->max_out = 1;
-
-   if (prog->vp.psiz < info->numOutputs)
-  prog->vp.psiz = prog->out[prog->vp.psiz].hw;
-
-   return 0;
-}
-
 static int
 nv50_vertprog_assign_slots_info(struct nv50_ir_prog_info_out *info)
 {
@@ -263,115 +161,6 @@ nv50_vertprog_assign_slots_prog(struct 
nv50_ir_prog_info_out *info)
return 0;
 }
 
-static int
-nv50_fragprog_assign_slots(struct nv50_ir_prog_info_out *info)
-{
-   struct nv50_program *prog = (struct nv50_program *)info->driverPriv;
-   unsigned i, n, m, c;
-   unsigned nvary;
-   unsigned nflat;
-   unsigned nintp = 0;
-
-   /* count recorded non-flat inputs */
-   for (m = 0, i = 0; i < info->numInputs; ++i) {
-  switch (info->in[i].sn) {
-  case TGSI_SEMANTIC_POSITION:
- continue;
-  default:
- m += info->in[i].flat ? 0 : 1;
- 

[Mesa-dev] [RFC PATCH v2 1/6] nv50/ir: add nv50_ir_prog_info_out

2020-03-19 Thread Mark Menzynski
From: Karol Herbst 

Split the output-relevant fields out of the nv50_ir_prog_info struct into a
new nv50_ir_prog_info_out struct, in order to have a cleaner separation
between the input and output of the compilation.
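
After the split, the drivers call codegen with a separate output struct,
roughly like this (a condensed sketch of the call sites this patch touches;
most setup and all error handling omitted):

   struct nv50_ir_prog_info *info = CALLOC_STRUCT(nv50_ir_prog_info);
   struct nv50_ir_prog_info_out info_out = {};

   /* the driver fills in the compilation input... */
   info->type = PIPE_SHADER_VERTEX;
   info->bin.sourceRep = PIPE_SHADER_IR_NIR;
   info->bin.source = nir;              /* the driver's nir_shader */

   /* ...and codegen writes its results into the separate output struct */
   if (!nv50_ir_generate_code(info, &info_out)) {
      prog->code = info_out.bin.code;
      prog->code_size = info_out.bin.codeSize;
   }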

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir.cpp   |  49 ++--
 src/gallium/drivers/nouveau/codegen/nv50_ir.h |   9 +-
 .../drivers/nouveau/codegen/nv50_ir_driver.h  | 117 +---
 .../nouveau/codegen/nv50_ir_from_common.cpp   |  14 +-
 .../nouveau/codegen/nv50_ir_from_common.h |   3 +-
 .../nouveau/codegen/nv50_ir_from_nir.cpp  | 204 +++---
 .../nouveau/codegen/nv50_ir_from_tgsi.cpp | 256 +-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp |   6 +-
 .../nouveau/codegen/nv50_ir_target.cpp|   2 +-
 .../drivers/nouveau/codegen/nv50_ir_target.h  |   5 +-
 .../nouveau/codegen/nv50_ir_target_nv50.cpp   |  17 +-
 .../nouveau/codegen/nv50_ir_target_nv50.h |   3 +-
 .../drivers/nouveau/nouveau_compiler.c|   9 +-
 .../drivers/nouveau/nv50/nv50_program.c   |  62 +++--
 .../drivers/nouveau/nvc0/nvc0_program.c   |  87 +++---
 15 files changed, 449 insertions(+), 394 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
index c65853578f6..c2c5956874a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
@@ -1241,15 +1241,18 @@ void Program::releaseValue(Value *value)
 extern "C" {
 
 static void
-nv50_ir_init_prog_info(struct nv50_ir_prog_info *info)
+nv50_ir_init_prog_info(struct nv50_ir_prog_info *info,
+   struct nv50_ir_prog_info_out *info_out)
 {
+   info_out->target = info->target;
+   info_out->type = info->type;
if (info->type == PIPE_SHADER_TESS_CTRL || info->type == 
PIPE_SHADER_TESS_EVAL) {
-  info->prop.tp.domain = PIPE_PRIM_MAX;
-  info->prop.tp.outputPrim = PIPE_PRIM_MAX;
+  info_out->prop.tp.domain = PIPE_PRIM_MAX;
+  info_out->prop.tp.outputPrim = PIPE_PRIM_MAX;
}
if (info->type == PIPE_SHADER_GEOMETRY) {
-  info->prop.gp.instanceCount = 1;
-  info->prop.gp.maxVertices = 1;
+  info_out->prop.gp.instanceCount = 1;
+  info_out->prop.gp.maxVertices = 1;
}
if (info->type == PIPE_SHADER_COMPUTE) {
   info->prop.cp.numThreads[0] =
@@ -1257,23 +1260,26 @@ nv50_ir_init_prog_info(struct nv50_ir_prog_info *info)
   info->prop.cp.numThreads[2] = 1;
}
info->io.pointSize = 0xff;
-   info->io.instanceId = 0xff;
-   info->io.vertexId = 0xff;
-   info->io.edgeFlagIn = 0xff;
-   info->io.edgeFlagOut = 0xff;
-   info->io.fragDepth = 0xff;
-   info->io.sampleMask = 0xff;
+   info_out->bin.smemSize = info->bin.smemSize;
+   info_out->io.genUserClip = info->io.genUserClip;
+   info_out->io.instanceId = 0xff;
+   info_out->io.vertexId = 0xff;
+   info_out->io.edgeFlagIn = 0xff;
+   info_out->io.edgeFlagOut = 0xff;
+   info_out->io.fragDepth = 0xff;
+   info_out->io.sampleMask = 0xff;
info->io.backFaceColor[0] = info->io.backFaceColor[1] = 0xff;
 }
 
 int
-nv50_ir_generate_code(struct nv50_ir_prog_info *info)
+nv50_ir_generate_code(struct nv50_ir_prog_info *info,
+  struct nv50_ir_prog_info_out *info_out)
 {
int ret = 0;
 
nv50_ir::Program::Type type;
 
-   nv50_ir_init_prog_info(info);
+   nv50_ir_init_prog_info(info, info_out);
 
 #define PROG_TYPE_CASE(a, b)  \
case PIPE_SHADER_##a: type = nv50_ir::Program::TYPE_##b; break
@@ -1301,15 +1307,16 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
   return -1;
}
prog->driver = info;
+   prog->driver_out = info_out;
prog->dbgFlags = info->dbgFlags;
prog->optLevel = info->optLevel;
 
switch (info->bin.sourceRep) {
case PIPE_SHADER_IR_NIR:
-  ret = prog->makeFromNIR(info) ? 0 : -2;
+  ret = prog->makeFromNIR(info, info_out) ? 0 : -2;
   break;
case PIPE_SHADER_IR_TGSI:
-  ret = prog->makeFromTGSI(info) ? 0 : -2;
+  ret = prog->makeFromTGSI(info, info_out) ? 0 : -2;
   break;
default:
   ret = -1;
@@ -1320,7 +1327,7 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
if (prog->dbgFlags & NV50_IR_DEBUG_VERBOSE)
   prog->print();
 
-   targ->parseDriverInfo(info);
+   targ->parseDriverInfo(info, info_out);
prog->getTarget()->runLegalizePass(prog, nv50_ir::CG_STAGE_PRE_SSA);
 
prog->convertToSSA();
@@ -1342,7 +1349,7 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
 
prog->optimizePostRA(info->optLevel);
 
-   if (!prog->emitBinary(info)) {
+   if (!prog->emitBinary(info_out)) {
   ret = -5;
   goto out;
}
@@ -1350,10 +1357,10 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
 out:
INFO_DBG(prog->dbgFlags, VERBOSE, "nv50_ir_generate_code: ret = %i\n", ret);
 
-   info->bin.maxGPR = prog->maxGPR;
-   info->bin.code = prog->code;
-   info->bin.codeSize = prog->binSize;
-   info->bin.tlsSpace = 

[Mesa-dev] [RFC PATCH v2 4/6] nv50/ir: Add nv50_ir_prog_info serialize

2020-03-19 Thread Mark Menzynski
Adds a function for serializing a nv50_ir_prog_info structure, which is
needed for shader caching.

Signed-off-by: Mark Menzynski 
---
 .../drivers/nouveau/codegen/nv50_ir_driver.h  |  4 +
 .../nouveau/codegen/nv50_ir_serialize.cpp | 81 +++
 .../drivers/nouveau/nvc0/nvc0_context.h   |  1 +
 .../drivers/nouveau/nvc0/nvc0_program.c   | 43 --
 .../drivers/nouveau/nvc0/nvc0_shader_state.c  |  3 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c |  2 +
 6 files changed, 128 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index 10ae5cbe420..3728470ab45 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -278,6 +278,10 @@ namespace nv50_ir
 extern void
 nv50_ir_prog_info_out_print(struct nv50_ir_prog_info_out *);
 
+/* Serialize a nv50_ir_prog_info structure and save it into blob */
+extern bool
+nv50_ir_prog_info_serialize(struct blob *, struct nv50_ir_prog_info *);
+
 /* Serialize a nv50_ir_prog_info_out structure and save it into blob */
 extern bool
 nv50_ir_prog_info_out_serialize(struct blob *, struct nv50_ir_prog_info_out *);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_serialize.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_serialize.cpp
index 5671483bd4e..b640cb67503 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_serialize.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_serialize.cpp
@@ -17,6 +17,87 @@ enum FixupApplyFunc {
FLIP_GM107 = 7
 };
 
+extern bool
+nv50_ir_prog_info_serialize(struct blob *blob, struct nv50_ir_prog_info *info)
+{
+   blob_write_uint16(blob, info->target);
+   blob_write_uint8(blob, info->type);
+   blob_write_uint8(blob, info->optLevel);
+   blob_write_uint8(blob, info->dbgFlags);
+   blob_write_uint8(blob, info->omitLineNum);
+   blob_write_uint32(blob, info->bin.smemSize);
+   blob_write_uint16(blob, info->bin.maxOutput);
+   blob_write_uint8(blob, info->bin.sourceRep);
+
+   switch(info->bin.sourceRep) {
+  case PIPE_SHADER_IR_TGSI: {
+ struct tgsi_token *tokens = (struct tgsi_token *)info->bin.source;
+ unsigned int num_tokens = tgsi_num_tokens(tokens);
+
+ blob_write_uint32(blob, num_tokens);
+ blob_write_bytes(blob, tokens, num_tokens * sizeof(struct 
tgsi_token));
+ break;
+  }
+  case PIPE_SHADER_IR_NIR: {
+ struct nir_shader *nir = (struct nir_shader *)info->bin.source;
+ nir_serialize(blob, nir, false);
+ break;
+  }
+  default:
+ assert(!"unhandled info->bin.sourceRep");
+ return false;
+   }
+
+   blob_write_uint16(blob, info->immd.bufSize);
+   blob_write_bytes(blob, info->immd.buf, info->immd.bufSize * 
sizeof(*info->immd.buf));
+   blob_write_uint16(blob, info->immd.count);
+   blob_write_bytes(blob, info->immd.data, info->immd.count * 
sizeof(*info->immd.data));
+   blob_write_bytes(blob, info->immd.type, info->immd.count * 16); // for each 
vec4 (128 bit)
+
+   switch (info->type) {
+  case PIPE_SHADER_VERTEX:
+ blob_write_bytes(blob, info->prop.vp.inputMask,
+  4 * sizeof(*info->prop.vp.inputMask)); /* array of 
size 4 */
+ break;
+  case PIPE_SHADER_TESS_CTRL:
+ blob_write_uint32(blob, info->prop.cp.inputOffset);
+ blob_write_uint32(blob, info->prop.cp.sharedOffset);
+ blob_write_uint32(blob, info->prop.cp.gridInfoBase);
+ blob_write_bytes(blob, info->prop.cp.numThreads,
+  3 * sizeof(*info->prop.cp.numThreads)); /* array of size 3 */
+ break;
+  case PIPE_SHADER_GEOMETRY:
+ blob_write_uint8(blob, info->prop.gp.inputPrim);
+ break;
+  case PIPE_SHADER_FRAGMENT:
+ blob_write_uint8(blob, info->prop.fp.persampleInvocation);
+ break;
+  default:
+ break;
+   }
+
+   blob_write_uint8(blob, info->io.auxCBSlot);
+   blob_write_uint16(blob, info->io.ucpBase);
+   blob_write_uint16(blob, info->io.drawInfoBase);
+   blob_write_uint16(blob, info->io.alphaRefBase);
+   blob_write_uint8(blob, info->io.pointSize);
+   blob_write_uint8(blob, info->io.viewportId);
+   blob_write_bytes(blob, info->io.backFaceColor, 2 * 
sizeof(*info->io.backFaceColor));
+   blob_write_uint8(blob, info->io.mul_zero_wins);
+   blob_write_uint8(blob, info->io.nv50styleSurfaces);
+   blob_write_uint16(blob, info->io.texBindBase);
+   blob_write_uint16(blob, info->io.fbtexBindBase);
+   blob_write_uint16(blob, info->io.suInfoBase);
+   blob_write_uint16(blob, info->io.bindlessBase);
+   blob_write_uint16(blob, info->io.bufInfoBase);
+   blob_write_uint16(blob, info->io.sampleInfoBase);
+   blob_write_uint8(blob, info->io.msInfoCBSlot);
+   blob_write_uint16(blob, info->io.msInfoBase);
+   blob_write_uint16(blob, info->io.uboInfoBase);
+   blob_write_uint8(blob, info->io.genUserClip);
+
+   return true;
+}

[Mesa-dev] [RFC PATCH v2 2/6] nv50/ir: Add nv50_ir_prog_info_out serialize and deserialize

2020-03-19 Thread Mark Menzynski
Adds functions for serializing and deserializing the nv50_ir_prog_info_out
structure, which are needed for shader caching.
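
A minimal round-trip sketch of the new entry points (signatures as declared in
nv50_ir_driver.h below; info_out is assumed to have been filled in by
nv50_ir_generate_code(), and error handling is omitted):

   struct blob blob;
   struct nv50_ir_prog_info_out copy = {};

   blob_init(&blob);
   if (nv50_ir_prog_info_out_serialize(&blob, &info_out))
      /* offset 0: the blob contains nothing but the serialized struct here */
      nv50_ir_prog_info_out_deserialize(blob.data, blob.size, 0, &copy);
   blob_finish(&blob);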

Signed-off-by: Mark Menzynski 
---
 .../drivers/nouveau/codegen/nv50_ir_driver.h  |  44 
 .../nouveau/codegen/nv50_ir_emit_gk110.cpp|  14 +-
 .../nouveau/codegen/nv50_ir_emit_gm107.cpp|  14 +-
 .../nouveau/codegen/nv50_ir_emit_nv50.cpp |   6 +-
 .../nouveau/codegen/nv50_ir_emit_nvc0.cpp |  14 +-
 .../nouveau/codegen/nv50_ir_serialize.cpp | 196 ++
 src/gallium/drivers/nouveau/meson.build   |   1 +
 7 files changed, 265 insertions(+), 24 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_serialize.cpp

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index dab1ce030cb..eea32133ccf 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -25,6 +25,7 @@
 
 #include "pipe/p_shader_tokens.h"
 
+#include "util/blob.h"
 #include "tgsi/tgsi_util.h"
 #include "tgsi/tgsi_parse.h"
 #include "tgsi/tgsi_scan.h"
@@ -242,6 +243,49 @@ nv50_ir_apply_fixups(void *fixupData, uint32_t *code,
 extern void nv50_ir_get_target_library(uint32_t chipset,
const uint32_t **code, uint32_t *size);
 
+
+#ifdef __cplusplus
+namespace nv50_ir
+{
+   class FixupEntry;
+   class FixupData;
+
+   void
+   gk110_interpApply(const nv50_ir::FixupEntry *entry, uint32_t *code,
+ const nv50_ir::FixupData& data);
+   void
+   gm107_interpApply(const nv50_ir::FixupEntry *entry, uint32_t *code,
+ const nv50_ir::FixupData& data);
+   void
+   nv50_interpApply(const nv50_ir::FixupEntry *entry, uint32_t *code,
+const nv50_ir::FixupData& data);
+   void
+   nvc0_interpApply(const nv50_ir::FixupEntry *entry, uint32_t *code,
+const nv50_ir::FixupData& data);
+   void
+   gk110_selpFlip(const nv50_ir::FixupEntry *entry, uint32_t *code,
+  const nv50_ir::FixupData& data);
+   void
+   gm107_selpFlip(const nv50_ir::FixupEntry *entry, uint32_t *code,
+  const nv50_ir::FixupData& data);
+   void
+   nvc0_selpFlip(const nv50_ir::FixupEntry *entry, uint32_t *code,
+ const nv50_ir::FixupData& data);
+
+}
+#endif
+
+/* Serialize a nv50_ir_prog_info_out structure and save it into blob */
+extern bool
+nv50_ir_prog_info_out_serialize(struct blob *, struct nv50_ir_prog_info_out *);
+
+/* Deserialize from data and save into a nv50_ir_prog_info_out structure
+ * using a pointer. Size is a total size of the serialized data.
+ * Offset points to where info_out in data is located. */
+extern bool
+nv50_ir_prog_info_out_deserialize(void *data, size_t size, size_t offset,
+  struct nv50_ir_prog_info_out *);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 2118c3153f7..e651d7fdcb0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -1209,8 +1209,8 @@ CodeEmitterGK110::emitSLCT(const CmpInstruction *i)
}
 }
 
-static void
-selpFlip(const FixupEntry *entry, uint32_t *code, const FixupData& data)
+void
+gk110_selpFlip(const FixupEntry *entry, uint32_t *code, const FixupData& data)
 {
int loc = entry->loc;
if (data.force_persample_interp)
@@ -1227,7 +1227,7 @@ void CodeEmitterGK110::emitSELP(const Instruction *i)
   code[1] |= 1 << 13;
 
if (i->subOp == 1) {
-  addInterp(0, 0, selpFlip);
+  addInterp(0, 0, gk110_selpFlip);
}
 }
 
@@ -2042,8 +2042,8 @@ CodeEmitterGK110::emitInterpMode(const Instruction *i)
code[1] |= (i->ipa & 0xc) << (19 - 2);
 }
 
-static void
-interpApply(const FixupEntry *entry, uint32_t *code, const FixupData& data)
+void
+gk110_interpApply(const struct FixupEntry *entry, uint32_t *code, const 
FixupData& data)
 {
int ipa = entry->ipa;
int reg = entry->reg;
@@ -2078,10 +2078,10 @@ CodeEmitterGK110::emitINTERP(const Instruction *i)
 
if (i->op == OP_PINTERP) {
   srcId(i->src(1), 23);
-  addInterp(i->ipa, SDATA(i->src(1)).id, interpApply);
+  addInterp(i->ipa, SDATA(i->src(1)).id, gk110_interpApply);
} else {
   code[0] |= 0xff << 23;
-  addInterp(i->ipa, 0xff, interpApply);
+  addInterp(i->ipa, 0xff, gk110_interpApply);
}
 
srcId(i->src(0).getIndirect(0), 10);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index e244bd0d610..4970f14cb33 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -947,8 +947,8 @@ CodeEmitterGM107::emitI2I()
emitGPR  (0x00, insn->def(0));
 }
 

[Mesa-dev] [RFC PATCH v2 3/6] nv50/ir: Add prog_info_out print

2020-03-19 Thread Mark Menzynski
Adds a function for printing the nv50_ir_prog_info_out structure in a
JSON-like format, which can be used for debugging.
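
Intended usage is just a debug dump of the compiler output, e.g. (a sketch;
gating it on NV50_IR_DEBUG_VERBOSE is an assumption, any debug condition would
do):

   if (!nv50_ir_generate_code(info, &info_out) &&
       (info->dbgFlags & NV50_IR_DEBUG_VERBOSE))
      nv50_ir_prog_info_out_print(&info_out);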

Signed-off-by: Mark Menzynski 
---
 .../nouveau/codegen/.nv50_ir_from_nir.cpp.swp | Bin 0 -> 16384 bytes
 .../drivers/nouveau/codegen/nv50_ir_driver.h  |   3 +
 .../drivers/nouveau/codegen/nv50_ir_print.cpp | 154 ++
 3 files changed, 157 insertions(+)
 create mode 100644 src/gallium/drivers/nouveau/codegen/.nv50_ir_from_nir.cpp.swp

diff --git a/src/gallium/drivers/nouveau/codegen/.nv50_ir_from_nir.cpp.swp 
b/src/gallium/drivers/nouveau/codegen/.nv50_ir_from_nir.cpp.swp
new file mode 100644
index 
..c405065a5df6c33ee4f0439c30c474d446b87730
GIT binary patch
[binary literal data (16384 bytes) omitted]

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index eea32133ccf..10ae5cbe420 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -275,6 +275,9 @@ namespace nv50_ir
 }
 #endif
 
+extern void
+nv50_ir_prog_info_out_print(struct nv50_ir_prog_info_out *);
+
 /* Serialize a nv50_ir_prog_info_out structure and save it into blob */
 extern bool
 nv50_ir_prog_info_out_serialize(struct blob *, struct nv50_ir_prog_info_out *);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 5dcbf3c3e0c..2c13bef5e1a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -22,6 +22,7 @@
 
 #include "codegen/nv50_ir.h"
 #include "codegen/nv50_ir_target.h"
+#include "codegen/nv50_ir_driver.h"
 
 #include 
 
@@ -852,3 +853,156 @@ Function::printLiveIntervals() const
 }
 
 } // namespace nv50_ir
+
+extern void
+nv50_ir_prog_info_out_print(struct nv50_ir_prog_info_out *info_out)
+{
+   int i;
+
+   INFO("{\n");
+   INFO("   \"target\":\"%d\",\n", info_out->target);
+   INFO("   \"type\":\"%d\",\n", info_out->type);
+
+   // Bin
+   INFO("   \"bin\":{\n");
+   INFO("  \"maxGPR\":\"%d\",\n", info_out->bin.maxGPR);
+   INFO("  

[Mesa-dev] [RFC PATCH v2 5/6] nv50: Add separate functions for varying bits

2020-03-19 Thread Mark Menzynski
This separation will be needed for shader disk caching. The reason is that
when shaders are loaded from the cache, the data in the info structure is
already populated. That means the varying bits for info are only needed when
compiling shaders and not when loading from the cache, while the varying bits
for prog are needed in both cases.

Unfortunately, I don't know how most of this code works; I separated it
manually, only by looking at the original code, so this patch is
experimental. Together with the following commit it works (there seem to be
no regressions at all in VK-GL-CTS
[openglcts/data/mustpass/gl/khronos_mustpass/4.6.1.x/gl33-master.txt]
and all benchmarks behaved normally). Unfortunately, I cannot test with
Piglit because of technical problems, so some work might still be needed.

I am mainly asking for help with the function names, for spotting bugs, and
for pointing out useless code. I will be glad for every review.
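
To make the intended call pattern concrete, this is roughly how the two halves
end up being used once the following commit lands (a condensed sketch based on
that commit):

   /* compile path: codegen calls the _info variant through the callback,
    * so the slot[] assignments land in info_out and get cached with it */
   info->assignSlots = nv50_program_assign_varying_slots_info;
   ret = nv50_ir_generate_code(info, &info_out);

   /* both the compile path and the cache-hit path still have to fill in the
    * per-program state, so the _prog variant runs unconditionally */
   nv50_program_assign_varying_slots_prog(&info_out);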

Signed-off-by: Mark Menzynski 
---
 .../drivers/nouveau/nv50/nv50_program.c   | 344 ++
 1 file changed, 344 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 924120eecdf..b5e36cf488d 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -139,6 +139,130 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info_out 
*info)
return 0;
 }
 
+static int
+nv50_vertprog_assign_slots_info(struct nv50_ir_prog_info_out *info)
+{
+   unsigned i, n, c;
+
+   n = 0;
+   for (i = 0; i < info->numInputs; ++i) {
+   for (c = 0; c < 4; ++c)
+ if (info->in[i].mask & (1 << c))
+info->in[i].slot[c] = n++;
+   }
+
+   /* VertexID before InstanceID */
+   if (info->io.vertexId < info->numSysVals)
+  info->sv[info->io.vertexId].slot[0] = n++;
+   if (info->io.instanceId < info->numSysVals)
+  info->sv[info->io.instanceId].slot[0] = n++;
+
+   n = 0;
+   for (i = 0; i < info->numOutputs; ++i) {
+  for (c = 0; c < 4; ++c)
+ if (info->out[i].mask & (1 << c))
+info->out[i].slot[c] = n++;
+   }
+
+   return 0;
+}
+
+static int
+nv50_vertprog_assign_slots_prog(struct nv50_ir_prog_info_out *info)
+{
+   struct nv50_program *prog = (struct nv50_program *)info->driverPriv;
+   unsigned i, n, c;
+
+   n = 0;
+   for (i = 0; i < info->numInputs; ++i) {
+  prog->in[i].id = i;
+  prog->in[i].sn = info->in[i].sn;
+  prog->in[i].si = info->in[i].si;
+  prog->in[i].hw = n;
+  prog->in[i].mask = info->in[i].mask;
+
+  prog->vp.attrs[(4 * i) / 32] |= info->in[i].mask << ((4 * i) % 32);
+
+  for (c = 0; c < 4; ++c)
+ if (info->in[i].mask & (1 << c))
+n++;
+
+  if (info->in[i].sn == TGSI_SEMANTIC_PRIMID)
+ prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_PRIMITIVE_ID;
+   }
+   prog->in_nr = info->numInputs;
+
+   for (i = 0; i < info->numSysVals; ++i) {
+  switch (info->sv[i].sn) {
+  case TGSI_SEMANTIC_INSTANCEID:
+ prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_INSTANCE_ID;
+ continue;
+  case TGSI_SEMANTIC_VERTEXID:
+ prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID;
+ prog->vp.attrs[2] |= 
NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START;
+ continue;
+  default:
+ break;
+  }
+   }
+
+   /*
+* Corner case: VP has no inputs, but we will still need to submit data to
+* draw it. HW will shout at us and won't draw anything if we don't enable
+* any input, so let's just pretend it's the first one.
+*/
+   if (prog->vp.attrs[0] == 0 &&
+   prog->vp.attrs[1] == 0 &&
+   prog->vp.attrs[2] == 0)
+  prog->vp.attrs[0] |= 0xf;
+
+   n = 0;
+   for (i = 0; i < info->numOutputs; ++i) {
+  switch (info->out[i].sn) {
+  case TGSI_SEMANTIC_PSIZE:
+ prog->vp.psiz = i;
+ break;
+  case TGSI_SEMANTIC_CLIPDIST:
+ prog->vp.clpd[info->out[i].si] = n;
+ break;
+  case TGSI_SEMANTIC_EDGEFLAG:
+ prog->vp.edgeflag = i;
+ break;
+  case TGSI_SEMANTIC_BCOLOR:
+ prog->vp.bfc[info->out[i].si] = i;
+ break;
+  case TGSI_SEMANTIC_LAYER:
+ prog->gp.has_layer = true;
+ prog->gp.layerid = n;
+ break;
+  case TGSI_SEMANTIC_VIEWPORT_INDEX:
+ prog->gp.has_viewport = true;
+ prog->gp.viewportid = n;
+ break;
+  default:
+ break;
+  }
+  prog->out[i].id = i;
+  prog->out[i].sn = info->out[i].sn;
+  prog->out[i].si = info->out[i].si;
+  prog->out[i].hw = n;
+  prog->out[i].mask = info->out[i].mask;
+
+  for (c = 0; c < 4; ++c)
+ if (info->out[i].mask & (1 << c))
+n++;
+   }
+   prog->out_nr = info->numOutputs;
+   prog->max_out = n;
+   if (!prog->max_out)
+  prog->max_out = 1;
+
+   if (prog->vp.psiz < info->numOutputs)
+  prog->vp.psiz = 

[Mesa-dev] [RFC PATCH 2/2] nv50: Add shader disk caching

2020-03-19 Thread Mark Menzynski
Adds shader disk caching for nv50 so that shaders don't have to be recompiled
every time. Shaders are saved into the disk_shader_cache of the nv50_screen
structure.

It can be disabled with MESA_GLSL_CACHE_DISABLE=1.

The input nv50_ir_prog_info is serialized to compute the hash key and also to
do a byte comparison between the original nv50_ir_prog_info and the one saved
in the cache. If the keys match and the byte comparison shows they are equal,
the shaders are the same, and the compiled nv50_ir_prog_info_out from the
cache can be used instead of compiling the input info.

Signed-off-by: Mark Menzynski 
---
 .../drivers/nouveau/nv50/nv50_program.c   | 55 ---
 .../drivers/nouveau/nv50/nv50_program.h   |  2 +
 .../drivers/nouveau/nv50/nv50_shader_state.c  |  4 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c |  1 +
 4 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index bf63b20f613..0b85267f36f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -667,10 +667,21 @@ nv50_program_create_strmout_state(const struct 
nv50_ir_prog_info_out *info,
 
 bool
 nv50_program_translate(struct nv50_program *prog, uint16_t chipset,
+   struct disk_cache *disk_shader_cache,
struct pipe_debug_callback *debug)
 {
+   struct blob blob;
struct nv50_ir_prog_info *info;
-   int i, ret;
+   struct nv50_ir_prog_info_out info_out = {};
+
+   void *cached_data = NULL;
+   size_t cached_size;
+   bool shader_found = false;
+
+   int i;
+   int ret = 0;
+   cache_key key;
+
const uint8_t map_undef = (prog->type == PIPE_SHADER_VERTEX) ? 0x40 : 0x80;
 
info = CALLOC_STRUCT(nv50_ir_prog_info);
@@ -704,7 +715,7 @@ nv50_program_translate(struct nv50_program *prog, uint16_t 
chipset,
info->io.msInfoCBSlot = 15;
info->io.msInfoBase = NV50_CB_AUX_MS_OFFSET;
 
-   info->assignSlots = nv50_program_assign_varying_slots;
+   info->assignSlots = nv50_program_assign_varying_slots_info;
 
prog->vp.bfc[0] = 0xff;
prog->vp.bfc[1] = 0xff;
@@ -726,16 +737,42 @@ nv50_program_translate(struct nv50_program *prog, 
uint16_t chipset,
info->optLevel = 3;
 #endif
 
-   struct nv50_ir_prog_info_out info_out = {};
/* these fields might be overwritten by the compiler */
-   info_out.bin.smemSize = prog->cp.smem_size;
-   info_out.io.genUserClip = prog->vp.clpd_nr;
+   info->bin.smemSize = prog->cp.smem_size;
+   info->io.genUserClip = prog->vp.clpd_nr;
+
+   blob_init(&blob);
+
+   if (disk_shader_cache) {
+  nv50_ir_prog_info_serialize(&blob, info);
+  disk_cache_compute_key(disk_shader_cache, blob.data, blob.size, key);
+  cached_data = disk_cache_get(disk_shader_cache, key, &cached_size);
+
+  if (cached_data && cached_size >= blob.size) { // blob.size is the size of serialized "info"
+ if (memcmp(cached_data, blob.data, blob.size) == 0) {
+shader_found = true;
+/* Blob contains only "info". In disk cache, "info_out" comes right after it */
+size_t offset = blob.size;
+nv50_ir_prog_info_out_deserialize(cached_data, cached_size, offset, &info_out);
+ }
+  }
+  free(cached_data);
+   }
info_out.driverPriv = prog;
-   ret = nv50_ir_generate_code(info, &info_out);
-   if (ret) {
-  NOUVEAU_ERR("shader translation failed: %i\n", ret);
-  goto out;
+
+   if (!shader_found) {
+  ret = nv50_ir_generate_code(info, &info_out);
+  if (ret) {
+ NOUVEAU_ERR("shader translation failed: %i\n", ret);
+ goto out;
+  }
+  if (disk_shader_cache) {
+ nv50_ir_prog_info_out_serialize(&blob, &info_out);
+ disk_cache_put(disk_shader_cache, key, blob.data, blob.size, NULL);
+  }
}
+   blob_finish(&blob);
+   nv50_program_assign_varying_slots_prog(&info_out);
 
prog->code = info_out.bin.code;
prog->code_size = info_out.bin.codeSize;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h 
b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 1a89e0d5067..528e1d01fa1 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -116,7 +116,9 @@ struct nv50_program {
struct nv50_stream_output_state *so;
 };
 
+struct disk_cache;
 bool nv50_program_translate(struct nv50_program *, uint16_t chipset,
+struct disk_cache *,
 struct pipe_debug_callback *);
 bool nv50_program_upload_code(struct nv50_context *, struct nv50_program *);
 void nv50_program_destroy(struct nv50_context *, struct nv50_program *);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c 
b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index 2cbbdc0cc35..65891108464 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -116,7 +116,9 @@ 

[Mesa-dev] [RFC PATCH 1/2] nv50: Add separate functions for varying bits

2020-03-19 Thread Mark Menzynski
This separation will be needed for shader disk caching. The reason is that
when shaders are loaded from the cache, the data in the info structure is
already populated. That means the varying bits for info are only needed when
compiling shaders and not when loading from the cache, while the varying bits
for prog are needed in both cases.

Unfortunately, I don't know how most of this code works; I separated it
manually, only by looking at the original code, so this patch is
experimental. Together with the following commit it works (there seem to be
no regressions at all in VK-GL-CTS
[openglcts/data/mustpass/gl/khronos_mustpass/4.6.1.x/gl33-master.txt]
and all benchmarks behaved normally). Unfortunately, I cannot test with
Piglit because of technical problems, so some work might still be needed.

I am mainly asking for help with the function names, for spotting bugs, and
for pointing out useless code. I will be glad for every review.

Signed-off-by: Mark Menzynski 
---
 .../drivers/nouveau/nv50/nv50_program.c   | 344 ++
 1 file changed, 344 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index a3f3054cbaa..bf63b20f613 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -139,6 +139,130 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info_out 
*info)
return 0;
 }
 
+static int
+nv50_vertprog_assign_slots_info(struct nv50_ir_prog_info_out *info)
+{
+   unsigned i, n, c;
+
+   n = 0;
+   for (i = 0; i < info->numInputs; ++i) {
+   for (c = 0; c < 4; ++c)
+ if (info->in[i].mask & (1 << c))
+info->in[i].slot[c] = n++;
+   }
+
+   /* VertexID before InstanceID */
+   if (info->io.vertexId < info->numSysVals)
+  info->sv[info->io.vertexId].slot[0] = n++;
+   if (info->io.instanceId < info->numSysVals)
+  info->sv[info->io.instanceId].slot[0] = n++;
+
+   n = 0;
+   for (i = 0; i < info->numOutputs; ++i) {
+  for (c = 0; c < 4; ++c)
+ if (info->out[i].mask & (1 << c))
+info->out[i].slot[c] = n++;
+   }
+
+   return 0;
+}
+
+static int
+nv50_vertprog_assign_slots_prog(struct nv50_ir_prog_info_out *info)
+{
+   struct nv50_program *prog = (struct nv50_program *)info->driverPriv;
+   unsigned i, n, c;
+
+   n = 0;
+   for (i = 0; i < info->numInputs; ++i) {
+  prog->in[i].id = i;
+  prog->in[i].sn = info->in[i].sn;
+  prog->in[i].si = info->in[i].si;
+  prog->in[i].hw = n;
+  prog->in[i].mask = info->in[i].mask;
+
+  prog->vp.attrs[(4 * i) / 32] |= info->in[i].mask << ((4 * i) % 32);
+
+  for (c = 0; c < 4; ++c)
+ if (info->in[i].mask & (1 << c))
+n++;
+
+  if (info->in[i].sn == TGSI_SEMANTIC_PRIMID)
+ prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_PRIMITIVE_ID;
+   }
+   prog->in_nr = info->numInputs;
+
+   for (i = 0; i < info->numSysVals; ++i) {
+  switch (info->sv[i].sn) {
+  case TGSI_SEMANTIC_INSTANCEID:
+ prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_INSTANCE_ID;
+ continue;
+  case TGSI_SEMANTIC_VERTEXID:
+ prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID;
+ prog->vp.attrs[2] |= 
NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START;
+ continue;
+  default:
+ break;
+  }
+   }
+
+   /*
+* Corner case: VP has no inputs, but we will still need to submit data to
+* draw it. HW will shout at us and won't draw anything if we don't enable
+* any input, so let's just pretend it's the first one.
+*/
+   if (prog->vp.attrs[0] == 0 &&
+   prog->vp.attrs[1] == 0 &&
+   prog->vp.attrs[2] == 0)
+  prog->vp.attrs[0] |= 0xf;
+
+   n = 0;
+   for (i = 0; i < info->numOutputs; ++i) {
+  switch (info->out[i].sn) {
+  case TGSI_SEMANTIC_PSIZE:
+ prog->vp.psiz = i;
+ break;
+  case TGSI_SEMANTIC_CLIPDIST:
+ prog->vp.clpd[info->out[i].si] = n;
+ break;
+  case TGSI_SEMANTIC_EDGEFLAG:
+ prog->vp.edgeflag = i;
+ break;
+  case TGSI_SEMANTIC_BCOLOR:
+ prog->vp.bfc[info->out[i].si] = i;
+ break;
+  case TGSI_SEMANTIC_LAYER:
+ prog->gp.has_layer = true;
+ prog->gp.layerid = n;
+ break;
+  case TGSI_SEMANTIC_VIEWPORT_INDEX:
+ prog->gp.has_viewport = true;
+ prog->gp.viewportid = n;
+ break;
+  default:
+ break;
+  }
+  prog->out[i].id = i;
+  prog->out[i].sn = info->out[i].sn;
+  prog->out[i].si = info->out[i].si;
+  prog->out[i].hw = n;
+  prog->out[i].mask = info->out[i].mask;
+
+  for (c = 0; c < 4; ++c)
+ if (info->out[i].mask & (1 << c))
+n++;
+   }
+   prog->out_nr = info->numOutputs;
+   prog->max_out = n;
+   if (!prog->max_out)
+  prog->max_out = 1;
+
+   if (prog->vp.psiz < info->numOutputs)
+  prog->vp.psiz = 

Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-19 Thread Marek Olšák
On Thu., Mar. 19, 2020, 06:51 Daniel Vetter,  wrote:

> On Tue, Mar 17, 2020 at 11:01:57AM +0100, Michel Dänzer wrote:
> > On 2020-03-16 7:33 p.m., Marek Olšák wrote:
> > > On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer 
> wrote:
> > >> On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> > >>> The synchronization works because the Mesa driver waits for idle
> (drains
> > >>> the GFX pipeline) at the end of command buffers and there is only 1
> > >>> graphics queue, so everything is ordered.
> > >>>
> > >>> The GFX pipeline runs asynchronously to the command buffer, meaning
> the
> > >>> command buffer only starts draws and doesn't wait for completion. If
> the
> > >>> Mesa driver didn't wait at the end of the command buffer, the command
> > >>> buffer would finish and a different process could start execution of
> its
> > >>> own command buffer while shaders of the previous process are still
> > >> running.
> > >>>
> > >>> If the Mesa driver submits a command buffer internally (because it's
> > >> full),
> > >>> it doesn't wait, so the GFX pipeline doesn't notice that a command
> buffer
> > >>> ended and a new one started.
> > >>>
> > >>> The waiting at the end of command buffers happens only when the
> flush is
> > >>> external (Swap buffers, glFlush).
> > >>>
> > >>> It's a performance problem, because the GFX queue is blocked until
> the
> > >> GFX
> > >>> pipeline is drained at the end of every frame at least.
> > >>>
> > >>> So explicit fences for SwapBuffers would help.
> > >>
> > >> Not sure what difference it would make, since the same thing needs to
> be
> > >> done for explicit fences as well, doesn't it?
> > >
> > > No. Explicit fences don't require userspace to wait for idle in the
> command
> > > buffer. Fences are signalled when the last draw is complete and caches
> are
> > > flushed. Before that happens, any command buffer that is not dependent
> on
> > > the fence can start execution. There is never a need for the GPU to be
> idle
> > > if there is enough independent work to do.
> >
> > I don't think explicit fences in the context of this discussion imply
> > using that different fence signalling mechanism though. My understanding
> > is that the API proposed by Jason allows implicit fences to be used as
> > explicit ones and vice versa, so presumably they have to use the same
> > signalling mechanism.
> >
> >
> > Anyway, maybe the different fence signalling mechanism you describe
> > could be used by the amdgpu kernel driver in general, then Mesa could
> > drop the waits for idle and get the benefits with implicit sync as well?
>
> Yeah, this is entirely about the programming model visible to userspace.
> There shouldn't be any impact on the driver's choice of a top vs. bottom
> of the gpu pipeline used for synchronization, that's entirely up to what
> your hw/driver/scheduler can pull off.
>
> Doing a full gfx pipeline flush for shared buffers, when your hw can do
> better, sounds like an issue to me that's not related to this here at all. It
> might be intertwined with amdgpu's special interpretation of dma_resv
> fences though, no idea. We might need to revamp all that. But for a
> userspace client that does nothing fancy (no multiple render buffer
> targets in one bo, or vk style "I write to everything all the time,
> perhaps" stuff) there should be 0 perf difference between implicit sync
> through dma_resv and explicit sync through sync_file/syncobj/dma_fence
> directly.
>
> If there is I'd consider that a bit a driver bug.
>

Last time I checked, there was no fence sync in gnome shell and compiz
after an app passes a buffer to them. So drivers have to invent hacks to work
around it, which decreases performance. It's not a driver bug.

Implicit sync really means that apps and compositors don't sync, so the
driver has to guess when it should sync.

Marek


> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-19 Thread Adam Jackson
On Tue, 2020-03-17 at 10:12 -0700, Jacob Lifshay wrote:
> One related issue with explicit sync using sync_file is that combined
> CPUs/GPUs (the CPU cores *are* the GPU cores) that do all the
> rendering in userspace (like llvmpipe but for Vulkan and with extra
> instructions for GPU tasks) but need to synchronize with other
> drivers/processes is that there should be some way to create an
> explicit fence/semaphore from userspace and later signal it. This
> seems to conflict with the requirement for a sync_file to complete in
> finite time, since the user process could be stopped or killed.

DRI3 (okay, libxshmfence specifically) uses futexes for this. Would
that work for you? IIRC the semantics there are that if the process
dies the futex is treated as triggered, which seems like the only
sensible thing to do.

- ajax



Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-19 Thread Daniel Vetter
On Tue, Mar 17, 2020 at 11:01:57AM +0100, Michel Dänzer wrote:
> On 2020-03-16 7:33 p.m., Marek Olšák wrote:
> > On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer  wrote:
> >> On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> >>> The synchronization works because the Mesa driver waits for idle (drains
> >>> the GFX pipeline) at the end of command buffers and there is only 1
> >>> graphics queue, so everything is ordered.
> >>>
> >>> The GFX pipeline runs asynchronously to the command buffer, meaning the
> >>> command buffer only starts draws and doesn't wait for completion. If the
> >>> Mesa driver didn't wait at the end of the command buffer, the command
> >>> buffer would finish and a different process could start execution of its
> >>> own command buffer while shaders of the previous process are still
> >> running.
> >>>
> >>> If the Mesa driver submits a command buffer internally (because it's
> >> full),
> >>> it doesn't wait, so the GFX pipeline doesn't notice that a command buffer
> >>> ended and a new one started.
> >>>
> >>> The waiting at the end of command buffers happens only when the flush is
> >>> external (Swap buffers, glFlush).
> >>>
> >>> It's a performance problem, because the GFX queue is blocked until the
> >> GFX
> >>> pipeline is drained at the end of every frame at least.
> >>>
> >>> So explicit fences for SwapBuffers would help.
> >>
> >> Not sure what difference it would make, since the same thing needs to be
> >> done for explicit fences as well, doesn't it?
> > 
> > No. Explicit fences don't require userspace to wait for idle in the command
> > buffer. Fences are signalled when the last draw is complete and caches are
> > flushed. Before that happens, any command buffer that is not dependent on
> > the fence can start execution. There is never a need for the GPU to be idle
> > if there is enough independent work to do.
> 
> I don't think explicit fences in the context of this discussion imply
> using that different fence signalling mechanism though. My understanding
> is that the API proposed by Jason allows implicit fences to be used as
> explicit ones and vice versa, so presumably they have to use the same
> signalling mechanism.
> 
> 
> Anyway, maybe the different fence signalling mechanism you describe
> could be used by the amdgpu kernel driver in general, then Mesa could
> drop the waits for idle and get the benefits with implicit sync as well?

Yeah, this is entirely about the programming model visible to userspace.
There shouldn't be any impact on the driver's choice of a top vs. bottom
of the gpu pipeline used for synchronization, that's entirely up to what
your hw/driver/scheduler can pull off.

Doing a full gfx pipeline flush for shared buffers, when your hw can do
better, sounds like an issue to me that's not related to this here at all. It
might be intertwined with amdgpu's special interpretation of dma_resv
fences though, no idea. We might need to revamp all that. But for a
userspace client that does nothing fancy (no multiple render buffer
targets in one bo, or vk style "I write to everything all the time,
perhaps" stuff) there should be 0 perf difference between implicit sync
through dma_resv and explicit sync through sync_file/syncobj/dma_fence
directly.

If there is I'd consider that a bit a driver bug.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-19 Thread Daniel Vetter
On Tue, Mar 17, 2020 at 11:27:28AM -0500, Jason Ekstrand wrote:
> On Tue, Mar 17, 2020 at 10:33 AM Nicolas Dufresne  
> wrote:
> >
> > Le lundi 16 mars 2020 à 23:15 +0200, Laurent Pinchart a écrit :
> > > Hi Jason,
> > >
> > > On Mon, Mar 16, 2020 at 10:06:07AM -0500, Jason Ekstrand wrote:
> > > > On Mon, Mar 16, 2020 at 5:20 AM Laurent Pinchart wrote:
> > > > > Another issue is that V4L2 doesn't offer any guarantee on job 
> > > > > ordering.
> > > > > When you queue multiple buffers for camera capture for instance, you
> > > > > don't know until capture complete in which buffer the frame has been
> > > > > captured.
> > > >
> > > > Is this a Kernel UAPI issue?  Surely the kernel driver knows at the
> > > > start of frame capture which buffer it's getting written into.  I
> > > > would think that the kernel APIs could be adjusted (if we find good
> > > > reason to do so!) such that they return earlier and return a (buffer,
> > > > fence) pair.  Am I missing something fundamental about video here?
> > >
> > > For cameras I believe we could do that, yes. I was pointing out the
> > > issues caused by the current API. For video decoders I'll let Nicolas
> > > answer the question, he's way more knowledgeable that I am on that
> > > topic.
> >
> > Right now, there is simply no uAPI for supporting asynchronous errors
> > reporting when fences are invovled. That is true for both camera's and
> > CODEC. It's likely what all the attempt was missing, I don't know
> > enough myself to suggest something.
> >
> > Now, why Stateless video decoders are special is another subject. In
> > CODECs, the decoding and the presentation order may differ. For
> > Stateless kind of CODEC, a bitstream is passed to the HW. We don't know
> > if this bitstream is fully valid, since the it is being parsed and
> > validated by the firmware. It's also firmware job to decide which
> > buffer should be presented first.
> >
> > In most firmware interface, that information is communicated back all
> > at once when the frame is ready to be presented (which may be quite
> > some time after it was decoded). So indeed, a fence model is not really
> > easy to add, unless the firmware was designed with that model in mind.
> 
> Just to be clear, I think we should do whatever makes sense here and
> not try to slam sync_file in when it doesn't make sense just because
> we have it.  The more I read on this thread, the less out-fences from
> video decode sound like they make sense unless we have a really solid
> plan for async error reporting.  It's possible, depending on how many
> processes are involved in the pipeline, that async error reporting
> could help reduce latency a bit if it let the kernel report the error
> directly to the last process in the chain.  However, I'm not convinced
> the potential for userspace programmer error is worth it.  That said,
> I'm happy to leave that up to the actual video experts. (I just do 3D)

dma_fence has an error state which you can set when things went south. The
fence still completes (to guarantee forward progress).

Currently that error code isn't really propagated anywhere (well i915 iirc
does something like that since it tracks the dependencies internally in the
scheduler). Definitely not at the dma_fence level, since we don't track
the dependency graph there at all. We might want to add that, would at
least be possible.

If we track the cascading dma_fence error state in the kernel I do think
this could work. I'm not sure whether it's actually a good/useful idea
still.
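
For illustration, the driver-side part of that is just two calls with the
standard helpers (a minimal sketch; the error code and locking are
placeholders):

   /* record the failure before completing the fence, so waiters can look
    * at fence->error once the fence has signalled */
   dma_fence_set_error(fence, -EIO);
   dma_fence_signal(fence);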
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-19 Thread Daniel Vetter
On Wed, Mar 18, 2020 at 11:05:48AM +0100, Michel Dänzer wrote:
> On 2020-03-17 6:21 p.m., Lucas Stach wrote:
> > That's one of the issues with implicit sync that explicit may solve: 
> > a single client taking way too much time to render something can 
> > block the whole pipeline up until the display flip. With explicit 
> > sync the compositor can just decide to use the last client buffer if 
> > the latest buffer isn't ready by some deadline.
> 
> FWIW, the compositor can do this with implicit sync as well, by polling
> a dma-buf fd for the buffer. (Currently, it has to poll for writable,
> because waiting for the exclusive fence only isn't enough with amdgpu)

Would be great if we don't have to make this recommended uapi, just
because amdgpu leaks its trickery into the wider world. Polling for read
really should be enough (and I guess Christian gets to fix up amdgpu more,
at least for anything that has a dma-buf attached even if it's not shared
with anything !amdgpu.ko).
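
For reference, the compositor-side check Michel describes boils down to
something like this (a minimal sketch; a real compositor would add the fd to
its event loop instead of calling poll() directly):

   #include <poll.h>

   /* dmabuf_fd: the dma-buf file descriptor of the client buffer */
   struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLOUT };

   /* POLLOUT waits for all fences on the buffer (shared + exclusive);
    * POLLIN would only wait for the exclusive (write) fence */
   if (poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLOUT)) {
      /* buffer is idle, safe to use it for this frame */
   }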
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-19 Thread Daniel Vetter
On Tue, Mar 17, 2020 at 12:18:47PM -0500, Jason Ekstrand wrote:
> On Tue, Mar 17, 2020 at 12:13 PM Jacob Lifshay  
> wrote:
> >
> > One related issue with explicit sync using sync_file is that combined
> > CPUs/GPUs (the CPU cores *are* the GPU cores) that do all the
> > rendering in userspace (like llvmpipe but for Vulkan and with extra
> > instructions for GPU tasks) but need to synchronize with other
> > drivers/processes is that there should be some way to create an
> > explicit fence/semaphore from userspace and later signal it. This
> > seems to conflict with the requirement for a sync_file to complete in
> > finite time, since the user process could be stopped or killed.
> 
> Yeah... That's going to be a problem.  The only way I could see that
> working is if you created a sync_file that had a timeout associated
> with it.  However, then you run into the issue where you may have
> corruption if stuff doesn't complete on time.  Then again, you're not
> really dealing with an external unit and so the latency cost of going
> across the window system protocol probably isn't massively different
> from the latency cost of triggering the sync_file.  Maybe the answer
> there is to just do everything in-order and not worry about
> synchronization?

vgem does that already (fences with timeout). The corruption issue is also
not new: if your shaders take forever, real gpus will nick your rendering
with a quick reset. Iirc someone (from cros google team maybe) was even
looking into making llvmpipe run on top of vgem as a real dri/drm mesa
driver.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch