[Mesa-dev] [Bug 111248] Navi10 Font rendering issue in Overwatch

2019-08-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=111248

--- Comment #5 from Matt  ---
Main menu frame

https://send.firefox.com/download/73198043a0f77669/#J_dNWZXjNm3LXJaErpD7zA

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/4] iris/gen9: Optimize slice and subslice load balancing behavior.

2019-08-09 Thread Francisco Jerez
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.

Reviewed-by: Kenneth Graunke 
---
 src/gallium/drivers/iris/iris_blorp.c   |  6 ++
 src/gallium/drivers/iris/iris_context.c |  1 +
 src/gallium/drivers/iris/iris_context.h |  3 +
 src/gallium/drivers/iris/iris_genx_protos.h |  4 +
 src/gallium/drivers/iris/iris_state.c   | 96 +
 5 files changed, 110 insertions(+)

diff --git a/src/gallium/drivers/iris/iris_blorp.c 
b/src/gallium/drivers/iris/iris_blorp.c
index 7298e23d23c..7aae5ea7002 100644
--- a/src/gallium/drivers/iris/iris_blorp.c
+++ b/src/gallium/drivers/iris/iris_blorp.c
@@ -307,6 +307,12 @@ iris_blorp_exec(struct blorp_batch *blorp_batch,
 
iris_require_command_space(batch, 1400);
 
+   const unsigned scale = params->fast_clear_op ? UINT_MAX : 1;
+   if (ice->state.current_hash_scale != scale) {
+  genX(emit_hashing_mode)(ice, batch, params->x1 - params->x0,
+  params->y1 - params->y0, scale);
+   }
+
blorp_exec(blorp_batch, params);
 
/* We've smashed all state compared to what the normal 3D pipeline
diff --git a/src/gallium/drivers/iris/iris_context.c 
b/src/gallium/drivers/iris/iris_context.c
index 8710f010ebf..02b74d39619 100644
--- a/src/gallium/drivers/iris/iris_context.c
+++ b/src/gallium/drivers/iris/iris_context.c
@@ -98,6 +98,7 @@ iris_lost_context_state(struct iris_batch *batch)
}
 
ice->state.dirty = ~0ull;
+   ice->state.current_hash_scale = 0;
memset(ice->state.last_grid, 0, sizeof(ice->state.last_grid));
batch->last_surface_base_address = ~0ull;
ice->vtbl.lost_genx_state(ice, batch);
diff --git a/src/gallium/drivers/iris/iris_context.h 
b/src/gallium/drivers/iris/iris_context.h
index f25c91fb317..6237c6f7014 100644
--- a/src/gallium/drivers/iris/iris_context.h
+++ b/src/gallium/drivers/iris/iris_context.h
@@ -726,6 +726,9 @@ struct iris_context {
 
   /** Records the size of variable-length state for INTEL_DEBUG=bat */
   struct hash_table_u64 *sizes;
+
+  /** Last rendering scale argument provided to genX(emit_hashing_mode). */
+  unsigned current_hash_scale;
} state;
 };
 
diff --git a/src/gallium/drivers/iris/iris_genx_protos.h 
b/src/gallium/drivers/iris/iris_genx_protos.h
index 623eb6b4802..16da78d7e9f 100644
--- a/src/gallium/drivers/iris/iris_genx_protos.h
+++ b/src/gallium/drivers/iris/iris_genx_protos.h
@@ -33,6 +33,10 @@ void genX(emit_urb_setup)(struct iris_context *ice,
   struct iris_batch *batch,
   const unsigned size[4],
   bool tess_present, bool gs_present);
+void genX(emit_hashing_mode)(struct iris_context *ice,
+ struct iris_batch *batch,
+ unsigned width, unsigned height,
+ unsigned scale);
 
 /* iris_blorp.c */
 void genX(init_blorp)(struct iris_context *ice);
diff --git a/src/gallium/drivers/iris/iris_state.c 
b/src/gallium/drivers/iris/iris_state.c
index 7932df23e3d..9e255d4cf89 100644
--- a/src/gallium/drivers/iris/iris_state.c
+++ b/src/gallium/drivers/iris/iris_state.c
@@ -5192,6 +5192,9 @@ iris_upload_dirty_render_state(struct iris_context *ice,
   }
}
 
+   if (ice->state.current_hash_scale != 1)
+  genX(emit_hashing_mode)(ice, batch, UINT_MAX, UINT_MAX, 1);
+
/* TODO: Gen8 PMA fix */
 }
 
@@ -6450,6 +6453,99 @@ iris_lost_genx_state(struct iris_context *ice, struct 
iris_batch *batch)
memset(genx->last_index_buffer, 0, sizeof(genx->last_index_buffer));
 }
 
+/**
+ * Update the pixel hashing modes that determine the balancing of PS threads
+ * across subslices and slices.
+ *
+ * \param width Width bound of the rendering area (already scaled down if \p
+ *  scale is greater than 1).
+ * \param height Height bound of the rendering area (already scaled down if \p
+ *   scale is greater than 1).
+ * \param scale The number of framebuffer samples that could potentially be
+ *  affected by an individual channel of the PS thread.  This is
+ *  typically one for single-sampled rendering, but for operations
+ *  like CCS resolves and fast clears a single PS invocation may
+ *  update a huge number of pixels, in which case a finer
+ *  balancing is desirable in order to maximally utilize the
+ *  bandwidth available.  UINT_MAX can be used as shorthand for
+ *  "finest hashing mode available".
+ */
+void
+genX(emit_hashing_mode)(struct iris_context *ice, struct iris_batch *batch,
+unsigned width, unsigned height, unsigned scale)
+{
+#if GEN_GEN == 9
+   const struct gen_device_info *devinfo = &batch->screen->devinfo;
+   const unsigned slice_hashing[] = {
+  /* Because all Gen9 platforms with more than one slice require
+   * three-way subslice hashing, a single "normal" 16x16 slice hashing
+   * block 

[Mesa-dev] [PATCH 4/4] OPTIONAL: anv/gen9: Optimize slice and subslice load balancing behavior.

2019-08-09 Thread Francisco Jerez
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.  Marked optional because no performance evaluation
has been done on this commit, it is provided to match the hashing
settings of the Iris driver.  Test reports welcome.
---
 src/intel/vulkan/anv_genX.h|  4 ++
 src/intel/vulkan/anv_private.h |  6 ++
 src/intel/vulkan/genX_blorp_exec.c |  6 ++
 src/intel/vulkan/genX_cmd_buffer.c | 96 ++
 4 files changed, 112 insertions(+)

diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
index a5435e566a3..06c6b467acf 100644
--- a/src/intel/vulkan/anv_genX.h
+++ b/src/intel/vulkan/anv_genX.h
@@ -44,6 +44,10 @@ void genX(cmd_buffer_apply_pipe_flushes)(struct 
anv_cmd_buffer *cmd_buffer);
 
 void genX(cmd_buffer_emit_gen7_depth_flush)(struct anv_cmd_buffer *cmd_buffer);
 
+void genX(cmd_buffer_emit_hashing_mode)(struct anv_cmd_buffer *cmd_buffer,
+unsigned width, unsigned height,
+unsigned scale);
+
 void genX(flush_pipeline_select_3d)(struct anv_cmd_buffer *cmd_buffer);
 void genX(flush_pipeline_select_gpgpu)(struct anv_cmd_buffer *cmd_buffer);
 
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 2465f264354..b381386a716 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -2421,6 +2421,12 @@ struct anv_cmd_state {
 
bool conditional_render_enabled;
 
+   /**
+* Last rendering scale argument provided to
+* genX(cmd_buffer_emit_hashing_mode)().
+*/
+   unsigned current_hash_scale;
+
/**
 * Array length is anv_cmd_state::pass::attachment_count. Array content is
 * valid only when recording a render pass instance.
diff --git a/src/intel/vulkan/genX_blorp_exec.c 
b/src/intel/vulkan/genX_blorp_exec.c
index 1592e7f7e3d..e9eedc06696 100644
--- a/src/intel/vulkan/genX_blorp_exec.c
+++ b/src/intel/vulkan/genX_blorp_exec.c
@@ -223,6 +223,12 @@ genX(blorp_exec)(struct blorp_batch *batch,
   genX(cmd_buffer_config_l3)(cmd_buffer, cfg);
}
 
+   const unsigned scale = params->fast_clear_op ? UINT_MAX : 1;
+   if (cmd_buffer->state.current_hash_scale != scale) {
+  genX(cmd_buffer_emit_hashing_mode)(cmd_buffer, params->x1 - params->x0,
+ params->y1 - params->y0, scale);
+   }
+
 #if GEN_GEN >= 11
/* The PIPE_CONTROL command description says:
 *
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 86ef1663ac4..e9e5570d49f 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -1595,6 +1595,7 @@ genX(CmdExecuteCommands)(
 */
primary->state.current_pipeline = UINT32_MAX;
primary->state.current_l3_config = NULL;
+   primary->state.current_hash_scale = 0;
 
/* Each of the secondary command buffers will use its own state base
 * address.  We need to re-emit state base address for the primary after
@@ -2663,6 +2664,9 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer 
*cmd_buffer)
 
genX(cmd_buffer_config_l3)(cmd_buffer, pipeline->urb.l3_config);
 
+   if (cmd_buffer->state.current_hash_scale != 1)
+  genX(cmd_buffer_emit_hashing_mode)(cmd_buffer, UINT_MAX, UINT_MAX, 1);
+
genX(flush_pipeline_select_3d)(cmd_buffer);
 
if (vb_emit) {
@@ -3925,6 +3929,98 @@ genX(cmd_buffer_emit_gen7_depth_flush)(struct 
anv_cmd_buffer *cmd_buffer)
}
 }
 
+/**
+ * Update the pixel hashing modes that determine the balancing of PS threads
+ * across subslices and slices.
+ *
+ * \param width Width bound of the rendering area (already scaled down if \p
+ *  scale is greater than 1).
+ * \param height Height bound of the rendering area (already scaled down if \p
+ *   scale is greater than 1).
+ * \param scale The number of framebuffer samples that could potentially be
+ *  affected by an individual channel of the PS thread.  This is
+ *  typically one for single-sampled rendering, but for operations
+ *  like CCS resolves and fast clears a single PS invocation may
+ *  update a huge number of pixels, in which case a finer
+ *  balancing is desirable in order to maximally utilize the
+ *  bandwidth available.  UINT_MAX can be used as shorthand for
+ *  "finest hashing mode available".
+ */
+void
+genX(cmd_buffer_emit_hashing_mode)(struct anv_cmd_buffer *cmd_buffer,
+   unsigned width, unsigned height,
+   unsigned scale)
+{
+#if GEN_GEN == 9
+   const struct gen_device_info *devinfo = &cmd_buffer->device->info;
+   const unsigned slice_hashing[] = {
+  /* Because all Gen9 platforms with more than one slice require
+   * three-way subslice hashing, a single "normal" 16x16 slice 

[Mesa-dev] [PATCH 1/4] i965/gen9: Optimize slice and subslice load balancing behavior.

2019-08-09 Thread Francisco Jerez
The default pixel hashing mode settings used for slice and subslice
load balancing are far from optimal under certain conditions (see the
comments below for the gory details).  The top-of-the-line GT4 parts
suffer from a particularly severe performance problem currently due to
a subslice load balancing issue.  Fixing this seems to improve
graphics performance across the board for most of the benchmarks in my
test set, up to ~20% in some cases, e.g. from SKL GT4:

unigine/valley:                 3.44% ±0.11%
gfxbench/gl_manhattan31:        3.99% ±0.13%
gputest/pixmark_volplosion:     8.05% ±0.11%
synmark/OglTexFilterAniso:     15.22% ±0.07%
synmark/OglTexMem128:          22.26% ±0.06%

Lower-end platforms are also affected by some subslice load imbalance,
to a lesser degree, especially during CCS resolve and fast clear
operations. These are handled specially here because rasterization
occurs in reduced CCS coordinates, which changes the semantics of the
pixel hashing mode settings.

No regressions seen during my tests on some SKL, KBL and BXT
configurations.  Additional benchmark reports welcome on any Gen9
platforms (that includes anything with Skylake, Broxton, Kabylake,
Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your
renderer string).

P.S.: A similar problem is likely to be present on other non-Gen9
  platforms, especially for CCS resolve and fast clear operations.
  Will follow up with additional patches fixing the hashing mode
  for those once I have enough performance data to justify it.
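
The state tracking that the iris and anv patches in this series wrap
around the hashing mode update can be sketched in isolation. This is a
minimal, standalone illustration and not the driver code: the
`sketch_*` names and the `emit_count` field are hypothetical, and the
emit function below only records its arguments, assuming (as the
"Last rendering scale argument" field comment suggests) that the real
function remembers the scale it was given.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver context. */
struct sketch_context {
   unsigned current_hash_scale;  /* 0 means "unknown, must re-emit" */
   unsigned emit_count;          /* illustration only, not in the patches */
};

/* Stand-in for the real emit function: it programs no hardware state and
 * only records the scale (assumption about the real behaviour). */
static void
sketch_emit_hashing_mode(struct sketch_context *ctx, unsigned width,
                         unsigned height, unsigned scale)
{
   (void)width;
   (void)height;
   ctx->current_hash_scale = scale;
   ctx->emit_count++;
}

/* BLORP path: fast clears request the finest hashing mode (UINT_MAX),
 * every other operation uses a scale of 1. */
static void
sketch_blorp_exec(struct sketch_context *ctx, bool fast_clear,
                  unsigned x0, unsigned y0, unsigned x1, unsigned y1)
{
   const unsigned scale = fast_clear ? UINT_MAX : 1;
   if (ctx->current_hash_scale != scale)
      sketch_emit_hashing_mode(ctx, x1 - x0, y1 - y0, scale);
}

/* Regular draw path: make sure the normal (scale 1) mode is active. */
static void
sketch_upload_render_state(struct sketch_context *ctx)
{
   if (ctx->current_hash_scale != 1)
      sketch_emit_hashing_mode(ctx, UINT_MAX, UINT_MAX, 1);
}

/* Context loss: forget the cached value so the next user re-emits. */
static void
sketch_lost_context(struct sketch_context *ctx)
{
   ctx->current_hash_scale = 0;
}
```

In this sketch, back-to-back draws or back-to-back fast clears emit
nothing extra; only a switch between the two paths, or a lost context,
triggers a new emit.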

Reviewed-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_context.h  |  5 ++
 src/mesa/drivers/dri/i965/brw_defines.h  |  5 ++
 src/mesa/drivers/dri/i965/brw_misc_state.c   | 90 
 src/mesa/drivers/dri/i965/brw_state_upload.c |  9 +-
 src/mesa/drivers/dri/i965/genX_blorp_exec.c  |  6 ++
 5 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 2ac443bf032..17639bf5995 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1219,6 +1219,9 @@ struct brw_context
 
enum gen9_astc5x5_wa_tex_type gen9_astc5x5_wa_tex_mask;
 
+   /** Last rendering scale argument provided to brw_emit_hashing_mode(). */
+   unsigned current_hash_scale;
+
__DRIcontext *driContext;
struct intel_screen *screen;
 };
@@ -1265,6 +1268,8 @@ GLboolean brwCreateContext(gl_api api,
  */
 void brw_workaround_depthstencil_alignment(struct brw_context *brw,
GLbitfield clear_mask);
+void brw_emit_hashing_mode(struct brw_context *brw, unsigned width,
+   unsigned height, unsigned scale);
 
 /* brw_object_purgeable.c */
 void brw_init_object_purgeable_functions(struct dd_function_table *functions);
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 425f5534110..33d042be869 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1570,6 +1570,11 @@ enum brw_pixel_shader_coverage_mask_mode {
 # define GEN9_SUBSLICE_HASHING_8x4  (2 << 8)
 # define GEN9_SUBSLICE_HASHING_16x16(3 << 8)
 # define GEN9_SUBSLICE_HASHING_MASK_BITS REG_MASK(3 << 8)
+# define GEN9_SLICE_HASHING_NORMAL  (0 << 11)
+# define GEN9_SLICE_HASHING_DISABLED(1 << 11)
+# define GEN9_SLICE_HASHING_32x16   (2 << 11)
+# define GEN9_SLICE_HASHING_32x32   (3 << 11)
+# define GEN9_SLICE_HASHING_MASK_BITS REG_MASK(3 << 11)
 
 /* Predicate registers */
 #define MI_PREDICATE_SRC0   0x2400
diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
b/src/mesa/drivers/dri/i965/brw_misc_state.c
index e73cadc5d3e..1291470d479 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -601,6 +601,96 @@ brw_emit_select_pipeline(struct brw_context *brw, enum 
brw_pipeline pipeline)
}
 }
 
+/**
+ * Update the pixel hashing modes that determine the balancing of PS threads
+ * across subslices and slices.
+ *
+ * \param width Width bound of the rendering area (already scaled down if \p
+ *  scale is greater than 1).
+ * \param height Height bound of the rendering area (already scaled down if \p
+ *   scale is greater than 1).
+ * \param scale The number of framebuffer samples that could potentially be
+ *  affected by an individual channel of the PS thread.  This is
+ *  typically one for single-sampled rendering, but for operations
+ *  like CCS resolves and fast clears a single PS invocation may
+ *  update a huge number of pixels, in which case a finer
+ *  balancing is desirable in order to maximally utilize the
+ *  bandwidth available.  UINT_MAX can be used as shorthand for
+ *  "finest hashing mode available".
+ */
+void

[Mesa-dev] [PATCH 2/4] intel/genxml: Add GT_MODE hashing defs for Gen9.

2019-08-09 Thread Francisco Jerez
Reviewed-by: Kenneth Graunke 
---
 src/intel/genxml/gen9.xml | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index 9df7cd82738..0d037489df9 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -6477,6 +6477,23 @@
 
   
 
+  
+
+  
+  
+  
+  
+
+
+
+  
+  
+  
+  
+
+
+  
+
   
 
   
-- 
2.22.0


[Mesa-dev] [Patch] glx: Fix SEGV due to dereferencing a NULL ptr from XCB-GLX.

2019-08-09 Thread zegentzy
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1631/diffs?commit_id=a615f57eedad8e9c459784701a91cf0fc68a4f6d

Re: [Mesa-dev] [PATCH 2/2] panfrost: Add madvise support to BO cache

2019-08-09 Thread Alyssa Rosenzweig
Alright. Let's wait for Tomeu's ack, but R-b :)

On Fri, Aug 09, 2019 at 01:53:13PM -0600, Rob Herring wrote:
> The kernel now supports madvise ioctl to indicate which BOs can be freed
> when there is memory pressure. Mark BOs purgeable when they are in the
> BO cache. The BOs must also be munmapped when they are in the cache or
> they cannot be purged.
> 
> We could optimize avoiding the madvise ioctl on older kernels once the
> driver version bump lands, but probably not worth it given the other
> driver features also being added.
> 
> Signed-off-by: Rob Herring 
> ---
>  src/gallium/drivers/panfrost/pan_bo_cache.c | 21 +
>  src/gallium/drivers/panfrost/pan_drm.c  |  4 ++--
>  2 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_bo_cache.c 
> b/src/gallium/drivers/panfrost/pan_bo_cache.c
> index 7378d0a8abea..239ea0b46cb2 100644
> --- a/src/gallium/drivers/panfrost/pan_bo_cache.c
> +++ b/src/gallium/drivers/panfrost/pan_bo_cache.c
> @@ -23,6 +23,8 @@
>   * Authors (Collabora):
>   *   Alyssa Rosenzweig 
>   */
> +#include 
> +#include "drm-uapi/panfrost_drm.h"
>  
>  #include "pan_screen.h"
>  #include "util/u_math.h"
> @@ -88,9 +90,21 @@ panfrost_bo_cache_fetch(
>  list_for_each_entry_safe(struct panfrost_bo, entry, bucket, link) {
>  if (entry->size >= size &&
>  entry->flags == flags) {
> +int ret;
> +struct drm_panfrost_madvise madv;
> +
>  /* This one works, splice it out of the cache */
>  list_del(&entry->link);
>  
> +madv.handle = entry->gem_handle;
> +madv.madv = PANFROST_MADV_WILLNEED;
> + madv.retained = 0;
> +
> +ret = drmIoctl(screen->fd, 
> DRM_IOCTL_PANFROST_MADVISE, &madv);
> + if (!ret && !madv.retained) {
> + panfrost_drm_release_bo(screen, entry, false);
> + continue;
> + }
>  /* Let's go! */
>  return entry;
>  }
> @@ -109,6 +123,13 @@ panfrost_bo_cache_put(
>  struct panfrost_bo *bo)
>  {
>  struct list_head *bucket = pan_bucket(screen, bo->size);
> +struct drm_panfrost_madvise madv;
> +
> +madv.handle = bo->gem_handle;
> +madv.madv = PANFROST_MADV_DONTNEED;
> + madv.retained = 0;
> +
> +drmIoctl(screen->fd, DRM_IOCTL_PANFROST_MADVISE, &madv);
>  
>  /* Add us to the bucket */
>  list_addtail(&bo->link, bucket);
> diff --git a/src/gallium/drivers/panfrost/pan_drm.c 
> b/src/gallium/drivers/panfrost/pan_drm.c
> index 36a6b975680a..28a4287202bd 100644
> --- a/src/gallium/drivers/panfrost/pan_drm.c
> +++ b/src/gallium/drivers/panfrost/pan_drm.c
> @@ -163,6 +163,8 @@ panfrost_drm_release_bo(struct panfrost_screen *screen, 
> struct panfrost_bo *bo,
>  /* Rather than freeing the BO now, we'll cache the BO for later
>   * allocations if we're allowed to */
>  
> +panfrost_drm_munmap_bo(screen, bo);
> +
>  if (cacheable) {
>  bool cached = panfrost_bo_cache_put(screen, bo);
>  
> @@ -172,8 +174,6 @@ panfrost_drm_release_bo(struct panfrost_screen *screen, 
> struct panfrost_bo *bo,
>  
>  /* Otherwise, if the BO wasn't cached, we'll legitimately free the 
> BO */
>  
> -panfrost_drm_munmap_bo(screen, bo);
> -
>  ret = drmIoctl(screen->fd, DRM_IOCTL_GEM_CLOSE, &gem_close);
>  if (ret) {
>  fprintf(stderr, "DRM_IOCTL_GEM_CLOSE failed: %m\n");
> -- 
> 2.20.1
> 



Re: [Mesa-dev] [PATCH 2/2] panfrost: Add madvise support to BO cache

2019-08-09 Thread Rob Herring
On Fri, Aug 9, 2019 at 2:49 PM Alyssa Rosenzweig
 wrote:
>
> I'm not one to care, but fwiw, spacing is inconsistent..?

Context? I guess you mean the 'madv.retained = 0;' line.

>
> > + if (!ret && !madv.retained) {
>
> What's the logic here? (What does a 0/!0 return code mean here?) I'm
> wondering if this meant to be ||?
>
> Or is the idea that an older kernel will have ret!=0 (since it doesn't
> recognize the ioctl) and therefore BOs won't be released, whereas new
> kernels will have ret==0 and then retained=0 if it needs to be released?

Yes. Any error whether madvise is supported or not is just treated as
never having made the ioctl call. The only error we can get is handle
lookup failing.

I could perhaps simplify this by initializing madv.retained to 1 and
never checking the return.

Rob
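
The two ways of handling the ioctl result discussed in this thread can
be compared side by side. The following is a rough standalone sketch,
not the Panfrost code: `struct sketch_madvise` is a hypothetical mirror
of `struct drm_panfrost_madvise`, and the three `sketch_ioctl_*`
functions are made-up stand-ins for `drmIoctl()` on an old kernel, on a
new kernel with the pages retained, and on a new kernel with the pages
purged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of struct drm_panfrost_madvise. */
struct sketch_madvise {
   uint32_t handle;    /* in, GEM handle */
   uint32_t madv;      /* in, 0 stands for PANFROST_MADV_WILLNEED */
   uint32_t retained;  /* out, whether backing store still exists */
};

/* Stand-in for drmIoctl(fd, DRM_IOCTL_PANFROST_MADVISE, &madv). */
typedef int (*sketch_ioctl_fn)(struct sketch_madvise *madv);

/* Old kernel: the ioctl fails and the struct is left untouched. */
static int
sketch_ioctl_unsupported(struct sketch_madvise *madv)
{
   (void)madv;
   return -1;
}

/* New kernel, backing pages still present. */
static int
sketch_ioctl_retained(struct sketch_madvise *madv)
{
   madv->retained = 1;
   return 0;
}

/* New kernel, backing pages were purged under memory pressure. */
static int
sketch_ioctl_purged(struct sketch_madvise *madv)
{
   madv->retained = 0;
   return 0;
}

/* Variant from the patch: retained starts at 0, return code is checked. */
static bool
sketch_bo_usable_checked(sketch_ioctl_fn do_ioctl)
{
   struct sketch_madvise madv;
   madv.handle = 1;
   madv.madv = 0;
   madv.retained = 0;

   int ret = do_ioctl(&madv);
   /* The patch releases the BO only if the call succeeded AND the pages
    * are gone; any error is treated as "still usable". */
   return !(!ret && !madv.retained);
}

/* Suggested simplification: retained starts at 1, return code ignored. */
static bool
sketch_bo_usable_unchecked(sketch_ioctl_fn do_ioctl)
{
   struct sketch_madvise madv;
   madv.handle = 1;
   madv.madv = 0;
   madv.retained = 1;

   do_ioctl(&madv);
   return madv.retained != 0;
}
```

Under these assumptions both variants reach the same decision in all
three cases: the cached BO is reused unless the kernel explicitly
reports that the backing pages were discarded.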

Re: [Mesa-dev] [PATCH 2/2] panfrost: Add madvise support to BO cache

2019-08-09 Thread Alyssa Rosenzweig
I'm not one to care, but fwiw, spacing is inconsistent..?

> + if (!ret && !madv.retained) {

What's the logic here? (What does a 0/!0 return code mean here?) I'm
wondering if this meant to be ||?

Or is the idea that an older kernel will have ret!=0 (since it doesn't
recognize the ioctl) and therefore BOs won't be released, whereas new
kernels will have ret==0 and then retained=0 if it needs to be released?

Re: [Mesa-dev] [PATCH 1/2] panfrost: Sync UAPI header from kernel

2019-08-09 Thread Alyssa Rosenzweig
Patch #1 is R-b
On Fri, Aug 09, 2019 at 01:53:12PM -0600, Rob Herring wrote:
> Sync the panfrost_drm.h UAPI header with the latest from the kernel.
> This adds madvise ioctl and GPU feature params.
> 
> Signed-off-by: Rob Herring 
> ---
>  include/drm-uapi/panfrost_drm.h | 61 +
>  1 file changed, 61 insertions(+)
> 
> diff --git a/include/drm-uapi/panfrost_drm.h b/include/drm-uapi/panfrost_drm.h
> index 9150dd75aad8..ec19db1eead8 100644
> --- a/include/drm-uapi/panfrost_drm.h
> +++ b/include/drm-uapi/panfrost_drm.h
> @@ -20,6 +20,7 @@ extern "C" {
>  #define DRM_PANFROST_GET_BO_OFFSET   0x05
>  #define DRM_PANFROST_PERFCNT_ENABLE  0x06
>  #define DRM_PANFROST_PERFCNT_DUMP0x07
> +#define DRM_PANFROST_MADVISE 0x08
>  
>  #define DRM_IOCTL_PANFROST_SUBMITDRM_IOW(DRM_COMMAND_BASE + 
> DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
>  #define DRM_IOCTL_PANFROST_WAIT_BO   DRM_IOW(DRM_COMMAND_BASE + 
> DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
> @@ -27,6 +28,7 @@ extern "C" {
>  #define DRM_IOCTL_PANFROST_MMAP_BO   DRM_IOWR(DRM_COMMAND_BASE + 
> DRM_PANFROST_MMAP_BO, struct drm_panfrost_mmap_bo)
>  #define DRM_IOCTL_PANFROST_GET_PARAM DRM_IOWR(DRM_COMMAND_BASE + 
> DRM_PANFROST_GET_PARAM, struct drm_panfrost_get_param)
>  #define DRM_IOCTL_PANFROST_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + 
> DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
> +#define DRM_IOCTL_PANFROST_MADVISE   DRM_IOWR(DRM_COMMAND_BASE + 
> DRM_PANFROST_MADVISE, struct drm_panfrost_madvise)
>  
>  /*
>   * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
> @@ -130,6 +132,45 @@ struct drm_panfrost_mmap_bo {
>  
>  enum drm_panfrost_param {
>   DRM_PANFROST_PARAM_GPU_PROD_ID,
> + DRM_PANFROST_PARAM_GPU_REVISION,
> + DRM_PANFROST_PARAM_SHADER_PRESENT,
> + DRM_PANFROST_PARAM_TILER_PRESENT,
> + DRM_PANFROST_PARAM_L2_PRESENT,
> + DRM_PANFROST_PARAM_STACK_PRESENT,
> + DRM_PANFROST_PARAM_AS_PRESENT,
> + DRM_PANFROST_PARAM_JS_PRESENT,
> + DRM_PANFROST_PARAM_L2_FEATURES,
> + DRM_PANFROST_PARAM_CORE_FEATURES,
> + DRM_PANFROST_PARAM_TILER_FEATURES,
> + DRM_PANFROST_PARAM_MEM_FEATURES,
> + DRM_PANFROST_PARAM_MMU_FEATURES,
> + DRM_PANFROST_PARAM_THREAD_FEATURES,
> + DRM_PANFROST_PARAM_MAX_THREADS,
> + DRM_PANFROST_PARAM_THREAD_MAX_WORKGROUP_SZ,
> + DRM_PANFROST_PARAM_THREAD_MAX_BARRIER_SZ,
> + DRM_PANFROST_PARAM_COHERENCY_FEATURES,
> + DRM_PANFROST_PARAM_TEXTURE_FEATURES0,
> + DRM_PANFROST_PARAM_TEXTURE_FEATURES1,
> + DRM_PANFROST_PARAM_TEXTURE_FEATURES2,
> + DRM_PANFROST_PARAM_TEXTURE_FEATURES3,
> + DRM_PANFROST_PARAM_JS_FEATURES0,
> + DRM_PANFROST_PARAM_JS_FEATURES1,
> + DRM_PANFROST_PARAM_JS_FEATURES2,
> + DRM_PANFROST_PARAM_JS_FEATURES3,
> + DRM_PANFROST_PARAM_JS_FEATURES4,
> + DRM_PANFROST_PARAM_JS_FEATURES5,
> + DRM_PANFROST_PARAM_JS_FEATURES6,
> + DRM_PANFROST_PARAM_JS_FEATURES7,
> + DRM_PANFROST_PARAM_JS_FEATURES8,
> + DRM_PANFROST_PARAM_JS_FEATURES9,
> + DRM_PANFROST_PARAM_JS_FEATURES10,
> + DRM_PANFROST_PARAM_JS_FEATURES11,
> + DRM_PANFROST_PARAM_JS_FEATURES12,
> + DRM_PANFROST_PARAM_JS_FEATURES13,
> + DRM_PANFROST_PARAM_JS_FEATURES14,
> + DRM_PANFROST_PARAM_JS_FEATURES15,
> + DRM_PANFROST_PARAM_NR_CORE_GROUPS,
> + DRM_PANFROST_PARAM_THREAD_TLS_ALLOC,
>  };
>  
>  struct drm_panfrost_get_param {
> @@ -162,6 +203,26 @@ struct drm_panfrost_perfcnt_dump {
>   __u64 buf_ptr;
>  };
>  
> +/* madvise provides a way to tell the kernel in case a buffers contents
> + * can be discarded under memory pressure, which is useful for userspace
> + * bo cache where we want to optimistically hold on to buffer allocate
> + * and potential mmap, but allow the pages to be discarded under memory
> + * pressure.
> + *
> + * Typical usage would involve madvise(DONTNEED) when buffer enters BO
> + * cache, and madvise(WILLNEED) if trying to recycle buffer from BO cache.
> + * In the WILLNEED case, 'retained' indicates to userspace whether the
> + * backing pages still exist.
> + */
> +#define PANFROST_MADV_WILLNEED 0 /* backing pages are needed, status 
> returned in 'retained' */
> +#define PANFROST_MADV_DONTNEED 1 /* backing pages not needed */
> +
> +struct drm_panfrost_madvise {
> + __u32 handle; /* in, GEM handle */
> + __u32 madv;   /* in, PANFROST_MADV_x */
> + __u32 retained;   /* out, whether backing store still exists */
> +};
> +
>  #if defined(__cplusplus)
>  }
>  #endif
> -- 
> 2.20.1
> 



[Mesa-dev] [PATCH 9/9] panfrost: Implement transform feedback

2019-08-09 Thread Alyssa Rosenzweig
Midgard has no hardware support for transform feedback, so we simulate
it in software. Lucky us.

What Midgard does do is write out vertex shader outputs to main memory
unconditionally. Fragment shaders read varyings back from main memory;
there's no on-chip storage for varyings. Whether this was a reasonable
design is a question I will not be engaging with in this commit message.

What that does mean is that, in some sense, Midgard *always* does
transform feedback unconditionally, and there's no way to turn off
transform feedback. Normally, we would allocate some scratch memory
every frame to store the varyings in an arbitrary format (interleaved
for simplicity), and then feed that scratch to the fragment shader and
discard when the rendering completes.

The only difference now is that sometimes, for some buffers, we use a BO
provided to us by Gallium and a format provided by Gallium, instead of
allocating the memory and choosing the format ourselves. This has some
limitations -- in particular, it only works at vec4 granularity, so a
corresponding GLSL linkage patch is needed to correctly implement
transform feedback for non-vec4 types. Nevertheless, given the hardware
already works in this admittedly-bizarre fashion, transform feedback is
"free". Or, at least, it's no more expensive than any other rendering.

Specifically not implemented is dynamically-sized transform feedback
(i.e. with geometry/tessellation shaders).

Spoiler alert: Midgard has no support for geometry *or* tessellation
shaders, despite advertising support. They get compiled to *massive*
compute shaders. How's that for checkbox compliance?
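
The arithmetic used when pointing a varying buffer at a stream output
target can be sketched on its own. This is a simplified illustration
and not the driver code: the `sketch_*` structs are hypothetical,
trimmed-down stand-ins for `union mali_attr` and
`struct pipe_stream_output_target`, and reading `stride` as a count of
32-bit words and `offset` as a count of vertices already captured is an
interpretation of the patch below.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down attribute descriptor. */
struct sketch_attr {
   uint64_t elements;  /* GPU address of the first captured element */
   uint32_t stride;    /* bytes per vertex */
   uint32_t size;      /* bytes the hardware is allowed to write */
};

/* Hypothetical, trimmed-down stream output target. */
struct sketch_so_target {
   uint64_t bo_gpu;         /* GPU address of the backing BO */
   uint32_t buffer_offset;  /* byte offset of the target within the BO */
   uint32_t buffer_size;    /* bytes available in the target */
};

static uint32_t
sketch_min_u32(uint32_t a, uint32_t b)
{
   return a < b ? a : b;
}

/* stride_dwords: per-vertex stride in 32-bit words (assumed unit).
 * offset: vertices already written to the target (assumed unit).
 * count: vertices captured by this draw. */
static void
sketch_emit_streamout(struct sketch_attr *slot,
                      const struct sketch_so_target *target,
                      uint32_t stride_dwords, uint32_t offset,
                      uint32_t count)
{
   /* Convert the stride to bytes. */
   slot->stride = stride_dwords * 4;

   /* Never let the hardware write past the end of the target. */
   slot->size = sketch_min_u32(target->buffer_size, slot->stride * count);

   /* Start after whatever earlier draws already captured. */
   slot->elements = target->bo_gpu + target->buffer_offset +
                    (uint64_t)offset * slot->stride;
}
```

For example, with a vec4 stride of 4 words, a target of 256 bytes at
offset 16 in a BO mapped at 0x1000, and two vertices already captured,
a draw of 10 vertices gets a 160-byte window starting at 0x1030.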

Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/pan_varyings.c | 254 +---
 1 file changed, 217 insertions(+), 37 deletions(-)

diff --git a/src/gallium/drivers/panfrost/pan_varyings.c 
b/src/gallium/drivers/panfrost/pan_varyings.c
index 40d7d98bf65..69e9e6d036d 100644
--- a/src/gallium/drivers/panfrost/pan_varyings.c
+++ b/src/gallium/drivers/panfrost/pan_varyings.c
@@ -24,6 +24,7 @@
  */
 
 #include "pan_context.h"
+#include "util/u_prim.h"
 
 static mali_ptr
 panfrost_emit_varyings(
@@ -45,6 +46,33 @@ panfrost_emit_varyings(
 return transfer.gpu;
 }
 
+static void
+panfrost_emit_streamout(
+struct panfrost_context *ctx,
+union mali_attr *slot,
+unsigned stride,
+unsigned offset,
+unsigned count,
+struct pipe_stream_output_target *target)
+{
+/* Fill out the descriptor */
+slot->stride = stride * 4;
+slot->shift = slot->extra_flags = 0;
+
+unsigned max_size = target->buffer_size;
+unsigned expected_size = slot->stride * count;
+
+slot->size = MIN2(max_size, expected_size);
+
+/* Grab the BO and bind it to the batch */
+struct panfrost_job *batch = panfrost_get_job_for_fbo(ctx);
+struct panfrost_bo *bo = pan_resource(target->buffer)->bo;
+panfrost_job_add_bo(batch, bo);
+
+mali_ptr addr = bo->gpu + target->buffer_offset + (offset * 
slot->stride);
+slot->elements = addr;
+}
+
 static void
 panfrost_emit_point_coord(union mali_attr *slot)
 {
@@ -110,6 +138,44 @@ panfrost_emit_varying_meta(
 }
 }
 
+static bool
+has_point_coord(unsigned mask, gl_varying_slot loc)
+{
+if ((loc >= VARYING_SLOT_TEX0) && (loc <= VARYING_SLOT_TEX7))
+return (mask & (1 << (loc - VARYING_SLOT_TEX0)));
+else if (loc == VARYING_SLOT_PNTC)
+return (mask & (1 << 8));
+else
+return false;
+}
+
+/* Helpers for manipulating stream out information so we can pack varyings
+ * accordingly. Compute the src_offset for a given captured varying */
+
+static struct pipe_stream_output
+pan_get_so(struct pipe_stream_output_info info, gl_varying_slot loc)
+{
+for (unsigned i = 0; i < info.num_outputs; ++i) {
+if (info.output[i].register_index == loc)
+return  info.output[i];
+}
+
+unreachable("Varying not captured");
+}
+
+/* TODO: Integers */
+static enum mali_format
+pan_xfb_format(unsigned nr_components)
+{
+switch (nr_components) {
+case 1: return MALI_R32F;
+case 2: return MALI_RG32F;
+case 3: return MALI_RGB32F;
+case 4: return MALI_RGBA32F;
+default: unreachable("Invalid format");
+}
+}
+
 void
 panfrost_emit_varying_descriptor(
 struct panfrost_context *ctx,
@@ -129,53 +195,55 @@ panfrost_emit_varying_descriptor(
 struct panfrost_transfer trans = panfrost_allocate_transient(ctx,
  vs_size + fs_size);
 
-for (unsigned i = 0; i < vs->tripipe->varying_count; i++) {
-if (!is_special_varying(vs->varyings_loc[i]))
-vs->varyings[i].src_offset = 16 * (num_gen_varyings++);
-}
-
-for (unsigned i = 0; i < 

[Mesa-dev] [PATCH 3/9] panfrost: Set PIPE_CAP_TGSI_TEXCOORD

2019-08-09 Thread Alyssa Rosenzweig
It doesn't really make sense, since we don't have special texture
coordinate varyings, but it'll make some code simpler for XFB and it
doesn't hurt us, even if I lose a bit of my soul setting it.
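
For illustration, mapping a varying slot onto a bit of a point-sprite
style mask relative to the texture coordinate slots might look like the
sketch below. It mirrors the shape of the `has_point_coord()` helper
from the transform feedback patch in this series, but it is standalone:
the `SKETCH_SLOT_*` values are placeholders rather than Mesa's real
`gl_varying_slot` numbering, and treating `mask` as a point-sprite
enable bitmask is an assumption.

```c
#include <stdbool.h>

/* Placeholder slot numbers, NOT the real gl_varying_slot values.
 * Only the fact that TEX0..TEX7 are consecutive matters here. */
enum sketch_slot {
   SKETCH_SLOT_TEX0 = 4,
   SKETCH_SLOT_TEX7 = SKETCH_SLOT_TEX0 + 7,
   SKETCH_SLOT_PNTC = 20,
   SKETCH_SLOT_VAR0 = 32
};

/* Bits 0..7 of the mask select TEX0..TEX7, bit 8 selects PNTC. */
static bool
sketch_has_point_coord(unsigned mask, unsigned loc)
{
   if (loc >= SKETCH_SLOT_TEX0 && loc <= SKETCH_SLOT_TEX7)
      return (mask & (1u << (loc - SKETCH_SLOT_TEX0))) != 0;
   else if (loc == SKETCH_SLOT_PNTC)
      return (mask & (1u << 8)) != 0;
   else
      return false;
}
```

Generic varyings (the `SKETCH_SLOT_VAR0` range in this sketch) never
match, whatever the mask contains.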

Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/pan_screen.c   | 5 +
 src/gallium/drivers/panfrost/pan_varyings.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/panfrost/pan_screen.c 
b/src/gallium/drivers/panfrost/pan_screen.c
index 510828ce555..76618fb1668 100644
--- a/src/gallium/drivers/panfrost/pan_screen.c
+++ b/src/gallium/drivers/panfrost/pan_screen.c
@@ -180,6 +180,11 @@ panfrost_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
 case PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL:
 return 0;
 
+/* I really don't want to set this CAP but let's not swim against the
+ * tide.. */
+case PIPE_CAP_TGSI_TEXCOORD:
+return 1;
+
 case PIPE_CAP_SEAMLESS_CUBE_MAP:
 case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
 return 1;
diff --git a/src/gallium/drivers/panfrost/pan_varyings.c 
b/src/gallium/drivers/panfrost/pan_varyings.c
index b4ed512917a..40d7d98bf65 100644
--- a/src/gallium/drivers/panfrost/pan_varyings.c
+++ b/src/gallium/drivers/panfrost/pan_varyings.c
@@ -143,7 +143,7 @@ panfrost_emit_varying_descriptor(
 
 unsigned loc = fs->varyings_loc[i];
 unsigned pnt_loc =
-(loc >= VARYING_SLOT_VAR0) ? (loc - VARYING_SLOT_VAR0) 
:
+(loc >= VARYING_SLOT_TEX0) ? (loc - VARYING_SLOT_TEX0) 
:
 (loc == VARYING_SLOT_PNTC) ? 8 :
 ~0;
 
-- 
2.20.1


[Mesa-dev] [PATCH 5/9] panfrost: Import stream out utility from iris

2019-08-09 Thread Alyssa Rosenzweig
We'll need this in a moment. Ken's implementation, lightly edited for
Panfrost.
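
The remapping performed by this utility can be tried outside of Mesa
with a small standalone sketch. It follows the structure of the
function in the patch below, with assumptions: the `sketch_*` structs
are hypothetical stand-ins for `pipe_stream_output` and
`pipe_stream_output_info`, and `sketch_bit_scan64()` is a simple
portable replacement for Mesa's `u_bit_scan64()`.

```c
#include <stdint.h>

#define SKETCH_MAX_OUTPUTS 8

/* Hypothetical stand-ins for the Gallium stream output structs. */
struct sketch_so_output {
   unsigned register_index;
};

struct sketch_so_info {
   unsigned num_outputs;
   struct sketch_so_output output[SKETCH_MAX_OUTPUTS];
};

/* Return the index of the lowest set bit and clear it in *mask.
 * The caller must pass a nonzero mask. */
static int
sketch_bit_scan64(uint64_t *mask)
{
   int i = 0;
   while (!((*mask >> i) & 1))
      i++;
   *mask &= ~(UINT64_C(1) << i);
   return i;
}

static uint64_t
sketch_update_so_info(struct sketch_so_info *so_info,
                      uint64_t outputs_written)
{
   uint64_t so_outputs = 0;
   uint8_t reverse_map[64] = {0};
   unsigned slot = 0;

   /* Condensed slot N corresponds to the N-th set bit of the mask. */
   while (outputs_written)
      reverse_map[slot++] = (uint8_t)sketch_bit_scan64(&outputs_written);

   for (unsigned i = 0; i < so_info->num_outputs; i++) {
      struct sketch_so_output *output = &so_info->output[i];

      /* Map the condensed slot back to the real slot number. */
      output->register_index = reverse_map[output->register_index];

      so_outputs |= UINT64_C(1) << output->register_index;
   }

   return so_outputs;
}
```

With outputs written at bits 0, 15 and 32, condensed slots 1 and 2 are
rewritten to 15 and 32, and the returned mask has exactly those two
bits set.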

Signed-off-by: Alyssa Rosenzweig 
Suggested-by: Kenneth Graunke 
---
 src/gallium/drivers/panfrost/pan_context.c | 40 ++
 1 file changed, 40 insertions(+)

diff --git a/src/gallium/drivers/panfrost/pan_context.c 
b/src/gallium/drivers/panfrost/pan_context.c
index 5e182fec564..2d7e417bb27 100644
--- a/src/gallium/drivers/panfrost/pan_context.c
+++ b/src/gallium/drivers/panfrost/pan_context.c
@@ -1,6 +1,7 @@
 /*
  * © Copyright 2018 Alyssa Rosenzweig
  * Copyright © 2014-2017 Broadcom
+ * Copyright (C) 2017 Intel Corporation
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the "Software"),
@@ -1907,6 +1908,45 @@ panfrost_variant_matches(
 return true;
 }
 
+/**
+ * Fix an uncompiled shader's stream output info, and produce a bitmask
+ * of which VARYING_SLOT_* are captured for stream output.
+ *
+ * Core Gallium stores output->register_index as a "slot" number, where
+ * slots are assigned consecutively to all outputs in info->outputs_written.
+ * This naive packing of outputs doesn't work for us - we too have slots,
+ * but the layout is defined by the VUE map, which we won't have until we
+ * compile a specific shader variant.  So, we remap these and simply store
+ * VARYING_SLOT_* in our copy's output->register_index fields.
+ *
+ * We then produce a bitmask of outputs which are used for SO.
+ *
+ * Implementation from iris.
+ */
+
+static uint64_t
+update_so_info(struct pipe_stream_output_info *so_info,
+   uint64_t outputs_written)
+{
+   uint64_t so_outputs = 0;
+   uint8_t reverse_map[64] = {};
+   unsigned slot = 0;
+
+   while (outputs_written)
+   reverse_map[slot++] = u_bit_scan64(&outputs_written);
+
+   for (unsigned i = 0; i < so_info->num_outputs; i++) {
+   struct pipe_stream_output *output = &so_info->output[i];
+
+   /* Map Gallium's condensed "slots" back to real VARYING_SLOT_* enums */
+   output->register_index = reverse_map[output->register_index];
+
+   so_outputs |= 1ull << output->register_index;
+   }
+
+   return so_outputs;
+}
+
 static void
 panfrost_bind_shader_state(
 struct pipe_context *pctx,
-- 
2.20.1
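
The remapping in update_so_info can be tried outside the driver. Below is a minimal standalone sketch, not the Mesa code: sketch_bit_scan64() is a hypothetical stand-in for u_bit_scan64(), and a plain array of register indices stands in for pipe_stream_output_info. It only shows how condensed slot numbers are turned back into the bit positions of outputs_written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Mesa's u_bit_scan64(): return the index of the
 * lowest set bit and clear it. The mask must be non-zero on entry. */
static unsigned
sketch_bit_scan64(uint64_t *mask)
{
        unsigned i = 0;

        while (!((*mask >> i) & 1))
                i++;

        *mask &= ~(1ull << i);
        return i;
}

/* Rewrite each condensed slot number in regs[] to the bit position it
 * corresponds to in outputs_written, and return a mask of those positions.
 * Every regs[i] is assumed to be a valid condensed slot (< 64). */
static uint64_t
sketch_remap_slots(uint8_t *regs, unsigned count, uint64_t outputs_written)
{
        uint8_t reverse_map[64] = {0};
        uint64_t mask = 0;
        unsigned slot = 0;

        /* The n-th set bit of outputs_written is condensed slot n */
        while (outputs_written)
                reverse_map[slot++] = sketch_bit_scan64(&outputs_written);

        for (unsigned i = 0; i < count; i++) {
                regs[i] = reverse_map[regs[i]];
                mask |= 1ull << regs[i];
        }

        return mask;
}
```

With bits 0, 12 and 32 set in outputs_written, condensed slots 0, 1 and 2 map back to 0, 12 and 32, so an output recorded in slot 2 ends up with register index 32.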


[Mesa-dev] [PATCH 8/9] panfrost: Increment offsets[] per draw

2019-08-09 Thread Alyssa Rosenzweig
We have to maintain the internal offset ourselves. Per v3d.

Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/pan_context.c | 10 ++
 src/gallium/drivers/panfrost/pan_context.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c
index 9f89200a97f..29c8f3d0e62 100644
--- a/src/gallium/drivers/panfrost/pan_context.c
+++ b/src/gallium/drivers/panfrost/pan_context.c
@@ -1530,6 +1530,7 @@ panfrost_draw_vbo(
 /* Take into account a negative bias */
 ctx->vertex_count = info->count + abs(info->index_bias);
 ctx->instance_count = info->instance_count;
+ctx->active_prim = info->mode;
 
 /* For non-indexed draws, they're the same */
 unsigned vertex_count = ctx->vertex_count;
@@ -1644,6 +1645,15 @@ panfrost_draw_vbo(
 
 /* Fire off the draw itself */
 panfrost_queue_draw(ctx);
+
+/* Increment transform feedback offsets */
+
+for (unsigned i = 0; i < ctx->streamout.num_targets; ++i) {
+unsigned output_count = u_stream_outputs_for_vertices(
+ctx->active_prim, ctx->vertex_count);
+
+ctx->streamout.offsets[i] += output_count;
+}
 }
 
 /* CSO state */
diff --git a/src/gallium/drivers/panfrost/pan_context.h 
b/src/gallium/drivers/panfrost/pan_context.h
index 304733abc32..39893655f08 100644
--- a/src/gallium/drivers/panfrost/pan_context.h
+++ b/src/gallium/drivers/panfrost/pan_context.h
@@ -147,6 +147,7 @@ struct panfrost_context {
 
 unsigned vertex_count;
 unsigned instance_count;
+enum pipe_prim_type active_prim;
 
 /* If instancing is enabled, vertex count padded for instance; if
  * it is disabled, just equal to plain vertex count */
-- 
2.20.1
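
For readers who want to see the bookkeeping in isolation, here is a rough sketch under stated assumptions. The enum, sketch_outputs_for_vertices() and struct sketch_streamout are hypothetical and reduced; the real helper is u_stream_outputs_for_vertices() in util/u_prim.h, and this sketch only mirrors my understanding of it for a few primitive types (strips are captured as independent primitives, incomplete primitives contribute nothing).

```c
#include <assert.h>

/* Hypothetical, reduced primitive list; enum pipe_prim_type has more modes */
enum sketch_prim {
        SKETCH_POINTS,
        SKETCH_LINES,
        SKETCH_LINE_STRIP,
        SKETCH_TRIANGLES,
        SKETCH_TRIANGLE_STRIP,
};

/* Vertices captured by transform feedback for a draw of `count` vertices */
static unsigned
sketch_outputs_for_vertices(enum sketch_prim mode, unsigned count)
{
        switch (mode) {
        case SKETCH_POINTS:
                return count;
        case SKETCH_LINES:
                return (count / 2) * 2;
        case SKETCH_LINE_STRIP:
                return count >= 2 ? (count - 1) * 2 : 0;
        case SKETCH_TRIANGLES:
                return (count / 3) * 3;
        case SKETCH_TRIANGLE_STRIP:
                return count >= 3 ? (count - 2) * 3 : 0;
        }

        return 0;
}

/* Hypothetical stand-in for the streamout state kept in the context */
struct sketch_streamout {
        unsigned num_targets;
        unsigned offsets[4];
};

/* After each draw, advance every bound target by the captured vertex count */
static void
sketch_advance_offsets(struct sketch_streamout *so, enum sketch_prim mode,
                       unsigned vertex_count)
{
        unsigned output_count = sketch_outputs_for_vertices(mode, vertex_count);

        for (unsigned i = 0; i < so->num_targets; ++i)
                so->offsets[i] += output_count;
}
```

A 5-vertex triangle strip yields 3 triangles, so each bound target advances by 9 vertices; a following 7-vertex triangle list adds 6 more.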


[Mesa-dev] [PATCH 6/9] panfrost: Route outputs_written through the compiler

2019-08-09 Thread Alyssa Rosenzweig
It's there in shader_info, but we need to access it from pan_context.c

Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/pan_assemble.c | 6 +-
 src/gallium/drivers/panfrost/pan_compute.c  | 2 +-
 src/gallium/drivers/panfrost/pan_context.c  | 5 -
 src/gallium/drivers/panfrost/pan_context.h  | 3 ++-
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/panfrost/pan_assemble.c b/src/gallium/drivers/panfrost/pan_assemble.c
index 337f97bddbd..47f6c1e5312 100644
--- a/src/gallium/drivers/panfrost/pan_assemble.c
+++ b/src/gallium/drivers/panfrost/pan_assemble.c
@@ -41,7 +41,8 @@ panfrost_shader_compile(
 enum pipe_shader_ir ir_type,
 const void *ir,
 gl_shader_stage stage,
-struct panfrost_shader_state *state)
+struct panfrost_shader_state *state,
+uint64_t *outputs_written)
 {
 struct panfrost_screen *screen = pan_screen(ctx->base.screen);
 uint8_t *dst;
@@ -118,6 +119,9 @@ panfrost_shader_compile(
 state->reads_point_coord = false;
 state->helper_invocations = s->info.fs.needs_helper_invocations;
 
+if (outputs_written)
+*outputs_written = s->info.outputs_written;
+
 /* Separate as primary uniform count is truncated */
 state->uniform_count = program.uniform_count;
 
diff --git a/src/gallium/drivers/panfrost/pan_compute.c b/src/gallium/drivers/panfrost/pan_compute.c
index 43fef8d8cfa..3931ab2aeb7 100644
--- a/src/gallium/drivers/panfrost/pan_compute.c
+++ b/src/gallium/drivers/panfrost/pan_compute.c
@@ -53,7 +53,7 @@ panfrost_create_compute_state(
 
 panfrost_shader_compile(ctx, v->tripipe,
 cso->ir_type, cso->prog,
-MESA_SHADER_COMPUTE, v);
+MESA_SHADER_COMPUTE, v, NULL);
 
 
 
diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c
index 2d7e417bb27..1a7e2db6737 100644
--- a/src/gallium/drivers/panfrost/pan_context.c
+++ b/src/gallium/drivers/panfrost/pan_context.c
@@ -2008,12 +2008,15 @@ panfrost_bind_shader_state(
 /* We finally have a variant, so compile it */
 
 if (!shader_state->compiled) {
+uint64_t outputs_written = 0;
+
 panfrost_shader_compile(ctx, shader_state->tripipe,
   variants->base.type,
   variants->base.type == PIPE_SHADER_IR_NIR ?
   variants->base.ir.nir :
   variants->base.tokens,
-tgsi_processor_to_shader_stage(type), shader_state);
+tgsi_processor_to_shader_stage(type), shader_state,
+&outputs_written);
 
 shader_state->compiled = true;
 }
diff --git a/src/gallium/drivers/panfrost/pan_context.h b/src/gallium/drivers/panfrost/pan_context.h
index 24c54fe3467..66cab8736bd 100644
--- a/src/gallium/drivers/panfrost/pan_context.h
+++ b/src/gallium/drivers/panfrost/pan_context.h
@@ -330,7 +330,8 @@ panfrost_shader_compile(
 enum pipe_shader_ir ir_type,
 const void *ir,
 gl_shader_stage stage,
-struct panfrost_shader_state *state);
+struct panfrost_shader_state *state,
+uint64_t *outputs_written);
 
 void
 panfrost_pack_work_groups_compute(
-- 
2.20.1
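
The calling convention added here is an optional out-parameter: callers that care pass a pointer, the compute path passes NULL. A tiny sketch of that pattern, with hypothetical names (struct sketch_shader_info and sketch_compile are illustrations, not driver functions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of shader_info we care about */
struct sketch_shader_info {
        uint64_t outputs_written;
};

/* Report outputs_written only when the caller asked for it */
static void
sketch_compile(const struct sketch_shader_info *info,
               uint64_t *outputs_written)
{
        /* ... the actual compilation would happen here ... */

        if (outputs_written)
                *outputs_written = info->outputs_written;
}
```

Graphics callers would pass the address of a local uint64_t; a caller with no use for the mask passes NULL and nothing is written.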


[Mesa-dev] [PATCH 7/9] panfrost: Fixup stream out information per variant

2019-08-09 Thread Alyssa Rosenzweig
We could probably get away with doing this once per pipe_shader_state
but let's not jump down that rabbit hole quite yet.

Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/pan_context.c | 7 +++
 src/gallium/drivers/panfrost/pan_context.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c
index 1a7e2db6737..9f89200a97f 100644
--- a/src/gallium/drivers/panfrost/pan_context.c
+++ b/src/gallium/drivers/panfrost/pan_context.c
@@ -2019,6 +2019,13 @@ panfrost_bind_shader_state(
 &outputs_written);
 
 shader_state->compiled = true;
+
+/* Fixup the stream out information, since what Gallium returns
+ * normally is mildly insane */
+
+shader_state->stream_output = variants->base.stream_output;
+shader_state->so_mask =
+update_so_info(&shader_state->stream_output, outputs_written);
 }
 }
 
diff --git a/src/gallium/drivers/panfrost/pan_context.h b/src/gallium/drivers/panfrost/pan_context.h
index 66cab8736bd..304733abc32 100644
--- a/src/gallium/drivers/panfrost/pan_context.h
+++ b/src/gallium/drivers/panfrost/pan_context.h
@@ -229,6 +229,8 @@ struct panfrost_shader_state {
 
 struct mali_attr_meta varyings[PIPE_MAX_ATTRIBS];
 gl_varying_slot varyings_loc[PIPE_MAX_ATTRIBS];
+struct pipe_stream_output_info stream_output;
+uint64_t so_mask;
 
 unsigned sysval_count;
 unsigned sysval[MAX_SYSVAL_COUNT];
-- 
2.20.1


[Mesa-dev] [PATCH 4/9] panfrost: Flush when using transform feedback

2019-08-09 Thread Alyssa Rosenzweig
This is a huge hack to work around incomplete BO flushing logic, but it's
enough for the dEQP transform feedback tests, and doing the resource
management to get this right is out-of-scope for this patch series.

Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/pan_context.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c
index 8e7155f9e0a..5e182fec564 100644
--- a/src/gallium/drivers/panfrost/pan_context.c
+++ b/src/gallium/drivers/panfrost/pan_context.c
@@ -2519,6 +2519,7 @@ panfrost_get_query_result(struct pipe_context *pipe,
 
 case PIPE_QUERY_PRIMITIVES_GENERATED:
 case PIPE_QUERY_PRIMITIVES_EMITTED:
+panfrost_flush(pipe, NULL, PIPE_FLUSH_END_OF_FRAME);
 vresult->u64 = query->end - query->start;
 break;
 
-- 
2.20.1


[Mesa-dev] [PATCH 2/9] panfrost: Wire up statistics for primitives

2019-08-09 Thread Alyssa Rosenzweig
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN should now be handled.

Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/pan_context.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c
index 3903a4ca337..8e7155f9e0a 100644
--- a/src/gallium/drivers/panfrost/pan_context.c
+++ b/src/gallium/drivers/panfrost/pan_context.c
@@ -38,6 +38,7 @@
 #include "util/half_float.h"
 #include "util/u_helpers.h"
 #include "util/u_format.h"
+#include "util/u_prim.h"
 #include "util/u_prim_restart.h"
 #include "indices/u_primconvert.h"
 #include "tgsi/tgsi_parse.h"
@@ -234,6 +235,9 @@ panfrost_invalidate_frame(struct panfrost_context *ctx)
 
 /* XXX */
 ctx->dirty |= PAN_DIRTY_SAMPLERS | PAN_DIRTY_TEXTURES;
+
+/* TODO: When does this need to be handled? */
+ctx->active_queries = true;
 }
 
 /* In practice, every field of these payloads should be configurable
@@ -1450,6 +1454,26 @@ panfrost_scissor_culls_everything(struct panfrost_context *ctx)
 return (ss->minx == ss->maxx) || (ss->miny == ss->maxy);
 }
 
+/* Count generated primitives (when there is no geom/tess shaders) for
+ * transform feedback */
+
+static void
+panfrost_statistics_record(
+struct panfrost_context *ctx,
+const struct pipe_draw_info *info)
+{
+if (!ctx->active_queries)
+return;
+
+uint32_t prims = u_prims_for_vertices(info->mode, info->count);
+ctx->prims_generated += prims;
+
+if (ctx->streamout.num_targets <= 0)
+return;
+
+ctx->tf_prims_generated += prims;
+}
+
 static void
 panfrost_draw_vbo(
 struct pipe_context *pipe,
@@ -1536,6 +1560,8 @@ panfrost_draw_vbo(
 draw_flags |= 0x800;
 }
 
+panfrost_statistics_record(ctx, info);
+
 if (info->index_size) {
 /* Calculate the min/max index used so we can figure out how
  * many times to invoke the vertex shader */
-- 
2.20.1
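
To illustrate how the running counters and the per-query start/end snapshots fit together, here is a small sketch. All names are hypothetical, and the primitive count is hard-wired to triangle lists (one primitive per three vertices) rather than calling u_prims_for_vertices().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slice of the context: running totals across draws */
struct sketch_ctx {
        bool active_queries;
        unsigned num_so_targets;
        uint64_t prims_generated;
        uint64_t tf_prims_generated;
};

/* Hypothetical query object: snapshots of a running total */
struct sketch_query {
        uint64_t start;
        uint64_t end;
};

/* Per-draw accounting, assuming a triangle list */
static void
sketch_record_draw(struct sketch_ctx *ctx, unsigned vertex_count)
{
        if (!ctx->active_queries)
                return;

        uint64_t prims = vertex_count / 3;
        ctx->prims_generated += prims;

        /* Only count towards transform feedback if a target is bound */
        if (ctx->num_so_targets == 0)
                return;

        ctx->tf_prims_generated += prims;
}

static void
sketch_begin_query(struct sketch_query *q, const struct sketch_ctx *ctx)
{
        q->start = ctx->prims_generated;
}

static void
sketch_end_query(struct sketch_query *q, const struct sketch_ctx *ctx)
{
        q->end = ctx->prims_generated;
}

/* The result is simply the difference between the two snapshots */
static uint64_t
sketch_query_result(const struct sketch_query *q)
{
        return q->end - q->start;
}
```

Draws issued before the begin call raise the running total but cancel out in the subtraction, which is why the query never needs to reset the counter.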


[Mesa-dev] [PATCH 1/9] panfrost: Implement callbacks for PRIMITIVES queries

2019-08-09 Thread Alyssa Rosenzweig
We're just going to compute them in the driver but let's get the
structures set up to handle them. Implementation from v3d.

Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/pan_context.c | 52 --
 src/gallium/drivers/panfrost/pan_context.h | 15 ++-
 2 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c
index ac1d1b9429b..3903a4ca337 100644
--- a/src/gallium/drivers/panfrost/pan_context.c
+++ b/src/gallium/drivers/panfrost/pan_context.c
@@ -2365,7 +2365,8 @@ static void
 panfrost_set_active_query_state(struct pipe_context *pipe,
 bool enable)
 {
-//struct panfrost_context *panfrost = pan_context(pipe);
+struct panfrost_context *ctx = pan_context(pipe);
+ctx->active_queries = enable;
 }
 
 static void
@@ -2415,17 +2416,24 @@ panfrost_begin_query(struct pipe_context *pipe, struct pipe_query *q)
 switch (query->type) {
 case PIPE_QUERY_OCCLUSION_COUNTER:
 case PIPE_QUERY_OCCLUSION_PREDICATE:
-case PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE: {
+case PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE:
 /* Allocate a word for the query results to be stored */
 query->transfer = panfrost_allocate_transient(ctx, sizeof(unsigned));
-
 ctx->occlusion_query = query;
+break;
+
+/* Geometry statistics are computed in the driver. XXX: geom/tess
+ * shaders.. */
 
+case PIPE_QUERY_PRIMITIVES_GENERATED:
+query->start = ctx->prims_generated;
+break;
+case PIPE_QUERY_PRIMITIVES_EMITTED:
+query->start = ctx->tf_prims_generated;
 break;
-}
 
 default:
-DBG("Skipping query %d\n", query->type);
+fprintf(stderr, "Skipping query %d\n", query->type);
 break;
 }
 
@@ -2436,7 +2444,22 @@ static bool
 panfrost_end_query(struct pipe_context *pipe, struct pipe_query *q)
 {
 struct panfrost_context *ctx = pan_context(pipe);
-ctx->occlusion_query = NULL;
+struct panfrost_query *query = (struct panfrost_query *) q;
+
+switch (query->type) {
+case PIPE_QUERY_OCCLUSION_COUNTER:
+case PIPE_QUERY_OCCLUSION_PREDICATE:
+case PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE:
+ctx->occlusion_query = NULL;
+break;
+case PIPE_QUERY_PRIMITIVES_GENERATED:
+query->end = ctx->prims_generated;
+break;
+case PIPE_QUERY_PRIMITIVES_EMITTED:
+query->end = ctx->tf_prims_generated;
+break;
+}
+
 return true;
 }
 
@@ -2446,18 +2469,16 @@ panfrost_get_query_result(struct pipe_context *pipe,
   bool wait,
   union pipe_query_result *vresult)
 {
-/* STUB */
 struct panfrost_query *query = (struct panfrost_query *) q;
 
-/* We need to flush out the jobs to actually run the counter, TODO
- * check wait, TODO wallpaper after if needed */
-
-panfrost_flush(pipe, NULL, PIPE_FLUSH_END_OF_FRAME);
 
 switch (query->type) {
 case PIPE_QUERY_OCCLUSION_COUNTER:
 case PIPE_QUERY_OCCLUSION_PREDICATE:
-case PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE: {
+case PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE:
+/* Flush first */
+panfrost_flush(pipe, NULL, PIPE_FLUSH_END_OF_FRAME);
+
 /* Read back the query results */
 unsigned *result = (unsigned *) query->transfer.cpu;
 unsigned passed = *result;
@@ -2469,7 +2490,12 @@ panfrost_get_query_result(struct pipe_context *pipe,
 }
 
 break;
-}
+
+case PIPE_QUERY_PRIMITIVES_GENERATED:
+case PIPE_QUERY_PRIMITIVES_EMITTED:
+vresult->u64 = query->end - query->start;
+break;
+
 default:
 DBG("Skipped query get %d\n", query->type);
 break;
diff --git a/src/gallium/drivers/panfrost/pan_context.h b/src/gallium/drivers/panfrost/pan_context.h
index 542d24d2c27..24c54fe3467 100644
--- a/src/gallium/drivers/panfrost/pan_context.h
+++ b/src/gallium/drivers/panfrost/pan_context.h
@@ -79,8 +79,16 @@ struct panfrost_query {
 unsigned type;
 unsigned index;
 
-/* Memory for the GPU to writeback the value of the query */
-struct panfrost_transfer transfer;
+union {
+/* For computed queries. 64-bit to prevent overflow */
+struct {
+uint64_t start;
+uint64_t end;
+};
+
+/* Memory for the GPU to writeback the value of the query */
+

[Mesa-dev] [PATCH 2/2] panfrost: Add madvise support to BO cache

2019-08-09 Thread Rob Herring
The kernel now supports madvise ioctl to indicate which BOs can be freed
when there is memory pressure. Mark BOs purgeable when they are in the
BO cache. The BOs must also be munmapped when they are in the cache or
they cannot be purged.

We could optimize avoiding the madvise ioctl on older kernels once the
driver version bump lands, but probably not worth it given the other
driver features also being added.

Signed-off-by: Rob Herring 
---
 src/gallium/drivers/panfrost/pan_bo_cache.c | 21 +
 src/gallium/drivers/panfrost/pan_drm.c  |  4 ++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/panfrost/pan_bo_cache.c b/src/gallium/drivers/panfrost/pan_bo_cache.c
index 7378d0a8abea..239ea0b46cb2 100644
--- a/src/gallium/drivers/panfrost/pan_bo_cache.c
+++ b/src/gallium/drivers/panfrost/pan_bo_cache.c
@@ -23,6 +23,8 @@
  * Authors (Collabora):
  *   Alyssa Rosenzweig 
  */
+#include 
+#include "drm-uapi/panfrost_drm.h"
 
 #include "pan_screen.h"
 #include "util/u_math.h"
@@ -88,9 +90,21 @@ panfrost_bo_cache_fetch(
 list_for_each_entry_safe(struct panfrost_bo, entry, bucket, link) {
 if (entry->size >= size &&
 entry->flags == flags) {
+int ret;
+struct drm_panfrost_madvise madv;
+
 /* This one works, splice it out of the cache */
 list_del(&entry->link);
 
+madv.handle = entry->gem_handle;
+madv.madv = PANFROST_MADV_WILLNEED;
+   madv.retained = 0;
+
+ret = drmIoctl(screen->fd, DRM_IOCTL_PANFROST_MADVISE, &madv);
+   if (!ret && !madv.retained) {
+   panfrost_drm_release_bo(screen, entry, false);
+   continue;
+   }
 /* Let's go! */
 return entry;
 }
@@ -109,6 +123,13 @@ panfrost_bo_cache_put(
 struct panfrost_bo *bo)
 {
 struct list_head *bucket = pan_bucket(screen, bo->size);
+struct drm_panfrost_madvise madv;
+
+madv.handle = bo->gem_handle;
+madv.madv = PANFROST_MADV_DONTNEED;
+   madv.retained = 0;
+
+drmIoctl(screen->fd, DRM_IOCTL_PANFROST_MADVISE, &madv);
 
 /* Add us to the bucket */
 list_addtail(&bo->link, bucket);
diff --git a/src/gallium/drivers/panfrost/pan_drm.c b/src/gallium/drivers/panfrost/pan_drm.c
index 36a6b975680a..28a4287202bd 100644
--- a/src/gallium/drivers/panfrost/pan_drm.c
+++ b/src/gallium/drivers/panfrost/pan_drm.c
@@ -163,6 +163,8 @@ panfrost_drm_release_bo(struct panfrost_screen *screen, struct panfrost_bo *bo,
 /* Rather than freeing the BO now, we'll cache the BO for later
  * allocations if we're allowed to */
 
+panfrost_drm_munmap_bo(screen, bo);
+
 if (cacheable) {
 bool cached = panfrost_bo_cache_put(screen, bo);
 
@@ -172,8 +174,6 @@ panfrost_drm_release_bo(struct panfrost_screen *screen, struct panfrost_bo *bo,
 
 /* Otherwise, if the BO wasn't cached, we'll legitimately free the BO */
 
-panfrost_drm_munmap_bo(screen, bo);
-
 ret = drmIoctl(screen->fd, DRM_IOCTL_GEM_CLOSE, &gem_close);
 if (ret) {
 fprintf(stderr, "DRM_IOCTL_GEM_CLOSE failed: %m\n");
-- 
2.20.1
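
The put/fetch protocol described in the commit message can be modelled without a GPU. The sketch below is a simulation only: sketch_madvise() and sketch_memory_pressure() are hypothetical stand-ins for the kernel side (no ioctl is issued), and the cache is a flat array instead of size buckets. It shows the intended flow: DONTNEED on put, WILLNEED on fetch, and skipping entries whose pages were not retained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_MADV_WILLNEED = 0, SKETCH_MADV_DONTNEED = 1 };

/* Hypothetical buffer object with simulated kernel-side state */
struct sketch_bo {
        size_t size;
        bool purgeable; /* last advice was DONTNEED */
        bool purged;    /* simulated kernel dropped the backing pages */
        bool cached;    /* currently sitting in the userspace cache */
};

/* Simulated madvise: records the advice, returns the 'retained' status */
static bool
sketch_madvise(struct sketch_bo *bo, int madv)
{
        bo->purgeable = (madv == SKETCH_MADV_DONTNEED);
        return !bo->purged;
}

/* Simulated shrinker: reclaims every BO currently marked purgeable */
static void
sketch_memory_pressure(struct sketch_bo *bos, size_t n)
{
        for (size_t i = 0; i < n; i++) {
                if (bos[i].purgeable)
                        bos[i].purged = true;
        }
}

/* Entering the cache: tell the kernel the pages may be discarded */
static void
sketch_cache_put(struct sketch_bo *bo)
{
        sketch_madvise(bo, SKETCH_MADV_DONTNEED);
        bo->cached = true;
}

/* Leaving the cache: reclaim the pages, or skip the BO if they are gone */
static struct sketch_bo *
sketch_cache_fetch(struct sketch_bo *bos, size_t n, size_t size)
{
        for (size_t i = 0; i < n; i++) {
                struct sketch_bo *bo = &bos[i];

                if (!bo->cached || bo->size < size)
                        continue;

                /* Splice it out of the cache either way */
                bo->cached = false;

                /* Not retained: this BO is useless, keep looking */
                if (!sketch_madvise(bo, SKETCH_MADV_WILLNEED))
                        continue;

                return bo;
        }

        return NULL;
}
```

In the simulation, a fetch right after a put succeeds, while a fetch after memory pressure finds every cached entry purged and returns NULL, at which point a driver would fall back to a fresh allocation.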


[Mesa-dev] [PATCH 1/2] panfrost: Sync UAPI header from kernel

2019-08-09 Thread Rob Herring
Sync the panfrost_drm.h UAPI header with the latest from the kernel.
This adds madvise ioctl and GPU feature params.

Signed-off-by: Rob Herring 
---
 include/drm-uapi/panfrost_drm.h | 61 +
 1 file changed, 61 insertions(+)

diff --git a/include/drm-uapi/panfrost_drm.h b/include/drm-uapi/panfrost_drm.h
index 9150dd75aad8..ec19db1eead8 100644
--- a/include/drm-uapi/panfrost_drm.h
+++ b/include/drm-uapi/panfrost_drm.h
@@ -20,6 +20,7 @@ extern "C" {
 #define DRM_PANFROST_GET_BO_OFFSET 0x05
 #define DRM_PANFROST_PERFCNT_ENABLE0x06
 #define DRM_PANFROST_PERFCNT_DUMP  0x07
+#define DRM_PANFROST_MADVISE   0x08
 
 #define DRM_IOCTL_PANFROST_SUBMIT  DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
 #define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
@@ -27,6 +28,7 @@ extern "C" {
 #define DRM_IOCTL_PANFROST_MMAP_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MMAP_BO, struct drm_panfrost_mmap_bo)
 #define DRM_IOCTL_PANFROST_GET_PARAM   DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_PARAM, struct drm_panfrost_get_param)
 #define DRM_IOCTL_PANFROST_GET_BO_OFFSET   DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
+#define DRM_IOCTL_PANFROST_MADVISE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MADVISE, struct drm_panfrost_madvise)
 
 /*
  * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
@@ -130,6 +132,45 @@ struct drm_panfrost_mmap_bo {
 
 enum drm_panfrost_param {
DRM_PANFROST_PARAM_GPU_PROD_ID,
+   DRM_PANFROST_PARAM_GPU_REVISION,
+   DRM_PANFROST_PARAM_SHADER_PRESENT,
+   DRM_PANFROST_PARAM_TILER_PRESENT,
+   DRM_PANFROST_PARAM_L2_PRESENT,
+   DRM_PANFROST_PARAM_STACK_PRESENT,
+   DRM_PANFROST_PARAM_AS_PRESENT,
+   DRM_PANFROST_PARAM_JS_PRESENT,
+   DRM_PANFROST_PARAM_L2_FEATURES,
+   DRM_PANFROST_PARAM_CORE_FEATURES,
+   DRM_PANFROST_PARAM_TILER_FEATURES,
+   DRM_PANFROST_PARAM_MEM_FEATURES,
+   DRM_PANFROST_PARAM_MMU_FEATURES,
+   DRM_PANFROST_PARAM_THREAD_FEATURES,
+   DRM_PANFROST_PARAM_MAX_THREADS,
+   DRM_PANFROST_PARAM_THREAD_MAX_WORKGROUP_SZ,
+   DRM_PANFROST_PARAM_THREAD_MAX_BARRIER_SZ,
+   DRM_PANFROST_PARAM_COHERENCY_FEATURES,
+   DRM_PANFROST_PARAM_TEXTURE_FEATURES0,
+   DRM_PANFROST_PARAM_TEXTURE_FEATURES1,
+   DRM_PANFROST_PARAM_TEXTURE_FEATURES2,
+   DRM_PANFROST_PARAM_TEXTURE_FEATURES3,
+   DRM_PANFROST_PARAM_JS_FEATURES0,
+   DRM_PANFROST_PARAM_JS_FEATURES1,
+   DRM_PANFROST_PARAM_JS_FEATURES2,
+   DRM_PANFROST_PARAM_JS_FEATURES3,
+   DRM_PANFROST_PARAM_JS_FEATURES4,
+   DRM_PANFROST_PARAM_JS_FEATURES5,
+   DRM_PANFROST_PARAM_JS_FEATURES6,
+   DRM_PANFROST_PARAM_JS_FEATURES7,
+   DRM_PANFROST_PARAM_JS_FEATURES8,
+   DRM_PANFROST_PARAM_JS_FEATURES9,
+   DRM_PANFROST_PARAM_JS_FEATURES10,
+   DRM_PANFROST_PARAM_JS_FEATURES11,
+   DRM_PANFROST_PARAM_JS_FEATURES12,
+   DRM_PANFROST_PARAM_JS_FEATURES13,
+   DRM_PANFROST_PARAM_JS_FEATURES14,
+   DRM_PANFROST_PARAM_JS_FEATURES15,
+   DRM_PANFROST_PARAM_NR_CORE_GROUPS,
+   DRM_PANFROST_PARAM_THREAD_TLS_ALLOC,
 };
 
 struct drm_panfrost_get_param {
@@ -162,6 +203,26 @@ struct drm_panfrost_perfcnt_dump {
__u64 buf_ptr;
 };
 
+/* madvise provides a way to tell the kernel in case a buffers contents
+ * can be discarded under memory pressure, which is useful for userspace
+ * bo cache where we want to optimistically hold on to buffer allocate
+ * and potential mmap, but allow the pages to be discarded under memory
+ * pressure.
+ *
+ * Typical usage would involve madvise(DONTNEED) when buffer enters BO
+ * cache, and madvise(WILLNEED) if trying to recycle buffer from BO cache.
+ * In the WILLNEED case, 'retained' indicates to userspace whether the
+ * backing pages still exist.
+ */
+#define PANFROST_MADV_WILLNEED 0   /* backing pages are needed, status returned in 'retained' */
+#define PANFROST_MADV_DONTNEED 1   /* backing pages not needed */
+
+struct drm_panfrost_madvise {
+   __u32 handle; /* in, GEM handle */
+   __u32 madv;   /* in, PANFROST_MADV_x */
+   __u32 retained;   /* out, whether backing store still exists */
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.20.1


Re: [Mesa-dev] [PATCH v2 3/5] panfrost: Mark BOs as NOEXEC

2019-08-09 Thread Rob Herring
On Wed, Aug 7, 2019 at 2:37 AM Tomeu Vizoso  wrote:
>
> Unless a BO has the EXECUTABLE flag, mark it as NOEXEC.
>
> v2: - Rework version detection (Alyssa).
>
> Signed-off-by: Tomeu Vizoso 
> ---
>  include/drm-uapi/panfrost_drm.h   | 27 +++

Next time, I think this shouldn't really land in mesa before landing
in the kernel. It's also missing other updates (the additional
params).

>  src/gallium/drivers/panfrost/pan_drm.c|  6 -
>  src/gallium/drivers/panfrost/pan_screen.c |  3 ++-
>  src/gallium/drivers/panfrost/pan_screen.h |  3 +++
>  4 files changed, 37 insertions(+), 2 deletions(-)

Re: [Mesa-dev] [PATCH v2 2/5] panfrost: Allocate shaders in their own BOs

2019-08-09 Thread Rob Herring
On Fri, Aug 9, 2019 at 3:31 AM Tomeu Vizoso  wrote:
>
> On Thu, 8 Aug 2019 at 16:19, Rob Herring  wrote:
> >
> > On Wed, Aug 7, 2019 at 11:23 PM Tomeu Vizoso  
> > wrote:
> > >
> > > On Thu, 8 Aug 2019 at 00:47, Rob Herring  wrote:
> > > >
> > > > On Wed, Aug 7, 2019 at 2:37 AM Tomeu Vizoso 
> > > >  wrote:
> > > > >
> > > > > Instead of all shaders being stored in a single BO, have each shader 
> > > > > in
> > > > > its own.
> > > > >
> > > > > This removes the need for a 16MB allocation per context, and allows us
> > > > > to place transient blend shaders in BOs marked as executable (before
> > > > > they were allocated in the transient pool, which shouldn't be
> > > > > executable).
> > > > >
> > > > > v2: - Store compiled blend shaders in a malloc'ed buffer, to avoid
> > > > >   reading from GPU-accessible memory when patching (Alyssa).
> > > > > - Free struct panfrost_blend_shader (Alyssa).
> > > > > - Give the job a reference to regular shaders when emitting
> > > > >   (Alyssa).
> > > > >
> > > > > Signed-off-by: Tomeu Vizoso 
> > > >
> > > >
> > > > > diff --git a/src/gallium/drivers/panfrost/pan_bo_cache.c 
> > > > > b/src/gallium/drivers/panfrost/pan_bo_cache.c
> > > > > index fba495c1dd69..7378d0a8abea 100644
> > > > > --- a/src/gallium/drivers/panfrost/pan_bo_cache.c
> > > > > +++ b/src/gallium/drivers/panfrost/pan_bo_cache.c
> > > > > @@ -84,11 +84,10 @@ panfrost_bo_cache_fetch(
> > > > >  {
> > > > >  struct list_head *bucket = pan_bucket(screen, size);
> > > > >
> > > > > -/* TODO: Honour flags? */
> > > > > -
> > > > >  /* Iterate the bucket looking for something suitable */
> > > > >  list_for_each_entry_safe(struct panfrost_bo, entry, bucket, 
> > > > > link) {
> > > > > -if (entry->size >= size) {
> > > > > +if (entry->size >= size &&
> > > > > +entry->flags == flags) {
> > > >
> > > > This change probably warrants its own patch IMO.
> > >
> > > Agreed.
> > >
> > > > This is using the
> > > > untranslated flags, but I think it should be the 'translated_flags' as
> > > > those are the ones changing the allocation. For example, I don't think
> > > > there's any reason for DELAYED_MMAP to be used as a match criteria
> > > > (BTW, I'm also not sure if we can reclaim BOs if they remain mmap'ed).
> > > >
> > > > Another problem I see, if we have a 100MB buffer in the cache, would
> > > > we really want to hit on a 4KB allocation? Perhaps a 'entry->size * 2
> > > > < size' check.
> > >
> > > Yeah, as mentioned in the v1 discussion, we have plenty of room for
> > > improvements here, but the goal now is just to stop doing memory
> > > allocation so greedily that we reach OOM after launching a few GL
> > > clients.
> >
> > Sure. IMO, committing the BO cache without madvise was a mistake.
> > Without madvise, 2 instances of glmark will OOM.
>
> How can I test that? I just checked here and I'm running 10 instances
> of it within gnome-shell with 1GB still free (from a total of 2GB).
> This is with HEAP support, without it we'd be still allocating one
> 16MB buffer per context, but it's still not that bad.
>
> It used to be pretty bad when we were allocating gigantic buffers on
> context creation, just to be safe. But Mesa master now is much more
> careful with that and I think .
>
> > I should be able to
> > send out the patch for it today. I think it's going to need to disable
> > caching when madvise is not supported.
>
> Can you check if that would be still needed, please?

It definitely seems better now.

I think before I also had lots of debug dmesg output going to disk and filling
the page cache. I tried reproducing that, but it still seems okay now.

Rob
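
The matching criteria discussed in this thread can be written as a small predicate. This is one possible reading of the suggestion, not what the driver does: sketch_bo_fits() is a hypothetical name, and the "no more than twice the request" bound is an assumption about what the size check was meant to express.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decide whether a cached entry should satisfy a request. Flags must match
 * exactly, the entry must be large enough, and it must not be more than
 * twice the requested size (so a huge BO is not burned on a tiny request). */
static bool
sketch_bo_fits(size_t entry_size, uint32_t entry_flags,
               size_t size, uint32_t flags)
{
        if (entry_flags != flags)
                return false;

        if (entry_size < size)
                return false;

        /* entry_size <= 2 * size, written so it cannot overflow */
        return entry_size - size <= size;
}
```

Under this rule a 4 KB request can be served by a 4 KB or 8 KB entry, but not by a 100 MB one; whether 2x is the right factor is a tuning question the thread leaves open.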

[Mesa-dev] [Bug 111308] [Regression, NIR, bisected] Black squares in Unigine Heaven via DXVK

2019-08-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=111308

Ian Romanick  changed:

What   | Removed  | Added
Status | NEEDINFO | ASSIGNED

--- Comment #7 from Ian Romanick  ---
https://gitlab.freedesktop.org/mesa/piglit/merge_requests/110 contains several
test cases that reproduce this problem and some related problems.  I should
have an MR for the fixes later today.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [Bug 111340] [LLVM] DIRT 3 horrible performance when Shadows set to very high

2019-08-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=111340

Bug ID: 111340
Summary: [LLVM] DIRT 3 horrible performance when Shadows set to very high
Product: Mesa
Version: git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Vulkan/radeon
Assignee: mesa-dev@lists.freedesktop.org
Reporter: gr.mue...@gmail.com
QA Contact: mesa-dev@lists.freedesktop.org

Once you set Shadows to very high in the Options Menu the game slows down. You
have to use DXVK to run the game; the version doesn't matter.

I already reported this last year when I was still using my Radeon HD 7970:
https://github.com/doitsujin/dxvk/issues/67#issuecomment-421925690

The game takes a nose dive from 134 to 29 fps once you switch the settings.

I tried today with my new Vega 56 and ACO:
with LLVM: 72,5 fps
with ACO: 206,3 fps

So we finally have the proof that LLVM is causing this. This might be the same
problem we have with the long standing and still valid bug:
https://bugs.freedesktop.org/show_bug.cgi?id=100069

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

Re: [Mesa-dev] [PATCH] android: intel/perf: fix missing include path in makefile

2019-08-09 Thread Lionel Landwerlin

Hey Mauro,

I was kind of surprised that u_math.h would pull a gallium header file.
So I pulled the thread a bit and came up with this MR : 
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1625


Sorry it's all over the place :(

-Lionel

On 09/08/2019 15:06, Mauro Rossi wrote:

Fixes the following building error:

In file included from external/mesa/src/intel/perf/gen_perf.c:42:
external/mesa/src/util/u_math.h:42:10:
fatal error: 'pipe/p_compiler.h' file not found
  ^~~
1 error generated.

Fixes: 018f9b8 ("intel/perf: refactor gen_perf_begin_query into gen_perf")
Signed-off-by: Mauro Rossi 
---
  src/intel/Android.perf.mk | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/intel/Android.perf.mk b/src/intel/Android.perf.mk
index 0d7d746a63..c99d84c4be 100644
--- a/src/intel/Android.perf.mk
+++ b/src/intel/Android.perf.mk
@@ -31,7 +31,9 @@ LOCAL_MODULE_CLASS := STATIC_LIBRARIES
  
  intermediates := $(call local-generated-sources-dir)
  
-LOCAL_C_INCLUDES := $(MESA_TOP)/include/drm-uapi

+LOCAL_C_INCLUDES := \
+   $(MESA_TOP)/include/drm-uapi \
+   $(MESA_TOP)/src/gallium/include
  
  LOCAL_SRC_FILES := $(GEN_PERF_FILES)
  




[Mesa-dev] [Bug 111248] Navi10 Font rendering issue in Overwatch

2019-08-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=111248

--- Comment #4 from Bas Nieuwenhuizen  ---
Anyone able to get a renderdoc capture for this?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [PATCH] android: intel/perf: fix missing include path in makefile

2019-08-09 Thread Mauro Rossi
Fixes the following building error:

In file included from external/mesa/src/intel/perf/gen_perf.c:42:
external/mesa/src/util/u_math.h:42:10:
fatal error: 'pipe/p_compiler.h' file not found
 ^~~
1 error generated.

Fixes: 018f9b8 ("intel/perf: refactor gen_perf_begin_query into gen_perf")
Signed-off-by: Mauro Rossi 
---
 src/intel/Android.perf.mk | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/intel/Android.perf.mk b/src/intel/Android.perf.mk
index 0d7d746a63..c99d84c4be 100644
--- a/src/intel/Android.perf.mk
+++ b/src/intel/Android.perf.mk
@@ -31,7 +31,9 @@ LOCAL_MODULE_CLASS := STATIC_LIBRARIES
 
 intermediates := $(call local-generated-sources-dir)
 
-LOCAL_C_INCLUDES := $(MESA_TOP)/include/drm-uapi
+LOCAL_C_INCLUDES := \
+   $(MESA_TOP)/include/drm-uapi \
+   $(MESA_TOP)/src/gallium/include
 
 LOCAL_SRC_FILES := $(GEN_PERF_FILES)
 
-- 
2.20.1


Re: [Mesa-dev] [PATCH] util/anon_file: Fix build errors due to missing includes

2019-08-09 Thread Eduardo Lima Mitev
It was fixed already with c73988300f943e185a50aaba015f2f114ffcb262.

Eduardo

On 8/8/19 9:31 AM, Eduardo Lima Mitev wrote:
> Including stdlib.h and stdio.h is required in some configurations:
> 
> ../src/util/anon_file.c: In function ‘create_tmpfile_cloexec’:
> ../src/util/anon_file.c:75:9: error: implicit declaration of function 
> ‘mkostemp’ [-Werror=implicit-function-declaration]
> fd = mkostemp(tmpname, O_CLOEXEC);
>  ^~~~
> ../src/util/anon_file.c: In function ‘os_create_anonymous_file’:
> ../src/util/anon_file.c:126:11: error: implicit declaration of function 
> ‘getenv’ [-Werror=implicit-function-declaration]
> path = getenv("XDG_RUNTIME_DIR");
>^~
> ../src/util/anon_file.c:126:9: warning: assignment makes pointer from integer 
> without a cast [-Wint-conversion]
> path = getenv("XDG_RUNTIME_DIR");
>  ^
> ../src/util/anon_file.c:133:7: error: implicit declaration of function 
> ‘asprintf’ [-Werror=implicit-function-declaration]
>asprintf(&name, "%s/mesa-shared-%s-XXXXXX", path, debug_name);
>^~~~
> ../src/util/anon_file.c:141:4: error: implicit declaration of function ‘free’ 
> [-Werror=implicit-function-declaration]
> free(name);
> ^~~~
> ../src/util/anon_file.c:141:4: warning: incompatible implicit declaration of 
> built-in function ‘free’
> ../src/util/anon_file.c:141:4: note: include ‘<stdlib.h>’ or provide a 
> declaration of ‘free’
> cc1: some warnings being treated as errors
> ---
>  src/util/anon_file.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/util/anon_file.c b/src/util/anon_file.c
> index 184b8445bad..3334e793f5c 100644
> --- a/src/util/anon_file.c
> +++ b/src/util/anon_file.c
> @@ -33,6 +33,8 @@
>  #include 
>  #include 
>  #include 
> +#include <stdlib.h>
> +#include <stdio.h>
>  
>  #ifdef __FreeBSD__
>  #include 
> 
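As a minimal sketch of why those two headers matter: on glibc, mkostemp(), getenv() and free() are declared in <stdlib.h> and asprintf() in <stdio.h>, and the GNU extensions among them need _GNU_SOURCE defined before the first include. The helper name make_tmpfile_cloexec, the "sketch-" file prefix and the /tmp fallback are assumptions made for illustration; this is not the code in anon_file.c.

```c
#define _GNU_SOURCE   /* must precede all includes to expose mkostemp/asprintf on glibc */
#include <fcntl.h>    /* O_CLOEXEC */
#include <stdio.h>    /* asprintf */
#include <stdlib.h>   /* mkostemp, getenv, free */
#include <unistd.h>   /* unlink */

/* Hypothetical helper: create an unlinked, close-on-exec temporary file
 * under XDG_RUNTIME_DIR (falling back to /tmp, an assumption made here).
 * Returns the file descriptor, or -1 on failure. */
int
make_tmpfile_cloexec(const char *debug_name)
{
   const char *dir = getenv("XDG_RUNTIME_DIR");
   char *name = NULL;
   int fd;

   if (!dir)
      dir = "/tmp";

   /* mkostemp requires the template to end in six 'X' characters. */
   if (asprintf(&name, "%s/sketch-%s-XXXXXX", dir, debug_name) < 0)
      return -1;

   fd = mkostemp(name, O_CLOEXEC);
   if (fd >= 0)
      unlink(name);   /* the file stays alive until the descriptor is closed */

   free(name);
   return fd;
}
```

Without the two includes, a C compiler that treats implicit declarations as errors rejects each of these calls, which is the failure quoted above.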

Re: [Mesa-dev] [PATCH v2 2/5] panfrost: Allocate shaders in their own BOs

2019-08-09 Thread Tomeu Vizoso
On Thu, 8 Aug 2019 at 16:50, Rob Herring  wrote:
>
> On Wed, Aug 7, 2019 at 5:47 PM Alyssa Rosenzweig
>  wrote:
> >
> > > This is using the
> > > untranslated flags, but I think it should be the 'translated_flags' as
> > > those are the ones changing the allocation.
> >
> > It's a little more complex than that. There are some hypothetical
> > untranslated flags that I would want to match on. For instance, future
> > CPU read-only/write-only modifiers -- those affect the mmap (and need to
> > be accounted for in the BO cache) but aren't specified as
> > translated_flags to the kernel.
>
> I'll still argue that we shouldn't leave cached BOs mmap'ed so that
> example would be moot.
>
> The more bits we have to match on, the less effective the BO cache
> will be. Either we should use translated_flags or we should filter the
> untranslated flags to the ones we care about. The latter would be more
> flexible I guess.

Yeah, there's lots to optimize still, fortunately :)

Cheers,

Tomeu
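
The second option Rob describes, filtering the untranslated flags down to the ones that matter for matching, could look roughly like the sketch below. The flag names and values (SKETCH_BO_EXECUTE, SKETCH_BO_HEAP, SKETCH_BO_DELAY_MMAP) and the choice of mask are hypothetical stand-ins, not the actual Panfrost definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical creation flags; the real driver's names and values may differ. */
#define SKETCH_BO_EXECUTE    (1u << 0)  /* changes the allocation */
#define SKETCH_BO_HEAP       (1u << 1)  /* changes the allocation */
#define SKETCH_BO_DELAY_MMAP (1u << 2)  /* only affects when the CPU mapping is made */

/* Only these bits take part in cache matching (an assumption for this sketch). */
#define SKETCH_BO_CACHE_MATCH_MASK (SKETCH_BO_EXECUTE | SKETCH_BO_HEAP)

/* Two flag sets are interchangeable for the cache when they agree on every
 * masked bit; bits outside the mask are ignored. */
bool
sketch_flags_compatible(uint32_t cached, uint32_t requested)
{
   return ((cached ^ requested) & SKETCH_BO_CACHE_MATCH_MASK) == 0;
}
```

A flag that affects the mapping but is never passed to the kernel, such as the CPU read-only modifier Alyssa mentions, would be covered by adding its bit to the mask.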

Re: [Mesa-dev] [PATCH v2 2/5] panfrost: Allocate shaders in their own BOs

2019-08-09 Thread Tomeu Vizoso
On Thu, 8 Aug 2019 at 16:19, Rob Herring  wrote:
>
> On Wed, Aug 7, 2019 at 11:23 PM Tomeu Vizoso  
> wrote:
> >
> > On Thu, 8 Aug 2019 at 00:47, Rob Herring  wrote:
> > >
> > > On Wed, Aug 7, 2019 at 2:37 AM Tomeu Vizoso  
> > > wrote:
> > > >
> > > > Instead of all shaders being stored in a single BO, have each shader in
> > > > its own.
> > > >
> > > > This removes the need for a 16MB allocation per context, and allows us
> > > > to place transient blend shaders in BOs marked as executable (before
> > > > they were allocated in the transient pool, which shouldn't be
> > > > executable).
> > > >
> > > > v2: - Store compiled blend shaders in a malloc'ed buffer, to avoid
> > > >   reading from GPU-accessible memory when patching (Alyssa).
> > > > - Free struct panfrost_blend_shader (Alyssa).
> > > > - Give the job a reference to regular shaders when emitting
> > > >   (Alyssa).
> > > >
> > > > Signed-off-by: Tomeu Vizoso 
> > >
> > >
> > > > diff --git a/src/gallium/drivers/panfrost/pan_bo_cache.c 
> > > > b/src/gallium/drivers/panfrost/pan_bo_cache.c
> > > > index fba495c1dd69..7378d0a8abea 100644
> > > > --- a/src/gallium/drivers/panfrost/pan_bo_cache.c
> > > > +++ b/src/gallium/drivers/panfrost/pan_bo_cache.c
> > > > @@ -84,11 +84,10 @@ panfrost_bo_cache_fetch(
> > > >  {
> > > >  struct list_head *bucket = pan_bucket(screen, size);
> > > >
> > > > -/* TODO: Honour flags? */
> > > > -
> > > >  /* Iterate the bucket looking for something suitable */
> > > >  list_for_each_entry_safe(struct panfrost_bo, entry, bucket, 
> > > > link) {
> > > > -if (entry->size >= size) {
> > > > +if (entry->size >= size &&
> > > > +entry->flags == flags) {
> > >
> > > This change probably warrants its own patch IMO.
> >
> > Agreed.
> >
> > > This is using the
> > > untranslated flags, but I think it should be the 'translated_flags' as
> > > those are the ones changing the allocation. For example, I don't think
> > > there's any reason for DELAYED_MMAP to be used as a match criteria
> > > (BTW, I'm also not sure if we can reclaim BOs if they remain mmap'ed).
> > >
> > > Another problem I see, if we have a 100MB buffer in the cache, would
> > > we really want to hit on a 4KB allocation? Perhaps a 'entry->size * 2
> > > < size' check.
> >
> > Yeah, as mentioned in the v1 discussion, we have plenty of room for
> > improvements here, but the goal now is just to stop doing memory
> > allocation so greedily that we reach OOM after launching a few GL
> > clients.
>
> Sure. IMO, committing the BO cache without madvise was a mistake.
> Without madvise, 2 instances of glmark will OOM.

How can I test that? I just checked here and I'm running 10 instances
of it within gnome-shell with 1GB still free (from a total of 2GB).
This is with HEAP support; without it we'd still be allocating one
16MB buffer per context, but it's still not that bad.

It used to be pretty bad when we were allocating gigantic buffers on
context creation, just to be safe. But Mesa master now is much more
careful with that and I think .

> I should be able to
> send out the patch for it today. I think it's going to need to disable
> caching when madvise is not supported.

Can you check if that would still be needed, please?

Thanks,

Tomeu

> Rob
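
A rough sketch of a bucket lookup that honours both points raised in the review: the flags must match, and an oversized entry is not handed out for a small request. It assumes the intended size check is an upper bound, that is, a cached entry is accepted only when it is at most twice the requested size. The array-based bucket and all sketch_ names are hypothetical simplifications of the list-based cache in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a cached buffer object. */
struct sketch_bo {
   size_t size;
   uint32_t flags;
   int in_use;
};

/* Return the first free entry that fits the request, or NULL if none does. */
struct sketch_bo *
sketch_cache_fetch(struct sketch_bo *bucket, size_t count,
                   size_t size, uint32_t flags)
{
   for (size_t i = 0; i < count; i++) {
      struct sketch_bo *entry = &bucket[i];

      if (entry->in_use)
         continue;

      /* Too small for the request. */
      if (entry->size < size)
         continue;

      /* More than twice the requested size: leave it for a larger request.
       * Written as a subtraction so that size * 2 cannot overflow. */
      if (entry->size - size > size)
         continue;

      /* Different creation flags would mean a different kind of allocation. */
      if (entry->flags != flags)
         continue;

      entry->in_use = 1;
      return entry;
   }

   return NULL;
}
```

With this check a 4KB request skips a cached 100MB buffer and falls through to a fresh allocation instead of pinning the large one.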

[Mesa-dev] [Bug 111248] Navi10 Font rendering issue in Overwatch

2019-08-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=111248

--- Comment #3 from Matt  ---
Created attachment 144991
  --> https://bugs.freedesktop.org/attachment.cgi?id=144991&action=edit
fonts correct on amdvlk (ignore snowy artefacts)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [Bug 111248] Navi10 Font rendering issue in Overwatch

2019-08-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=111248

Matt  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         QA Contact|dri-devel@lists.freedesktop |mesa-dev@lists.freedesktop.
                   |.org                        |org
           Assignee|dri-devel@lists.freedesktop |mesa-dev@lists.freedesktop.
                   |.org                        |org
          Component|Drivers/Gallium/radeonsi    |Drivers/Vulkan/radeon

--- Comment #2 from Matt  ---
Moving this to Vulkan / RADV bug

Loading the game up under the latest amdvlk from 26/7 renders the fonts
correctly (albeit with artefacts that are inherent to that build at the moment
on Navi10). It seems I was mistaken in my initial testing: the game fails to
render at all under wined3d.

Mesa: 19.2 git 39a90749
LLVM: 10.0 git git4575679
Kernel: 5.2.3 with DRM-NEXT patchset

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.