Re: [Mesa-dev] [PATCH] intel: Load the driver even if I915_PARAM_REVISION is not found.

2019-08-19 Thread Rafael Antognolli
On Mon, Aug 19, 2019 at 11:25:38PM +0200, Lionel Landwerlin wrote:
> On 19/08/2019 21:28, Rafael Antognolli wrote:
> > This param is only available starting with kernel 4.16. Use a default
> > value of 0 instead if it is not found.
> 
> 
> I traced the param back to:
> 
> 
> commit 27cd44618b92fc8c6889e4628407791e45422bac
> Author: Neil Roberts 
> Date:   Wed Mar 4 14:41:16 2015 +
> 
>     drm/i915: Add I915_PARAM_REVISION
> 
> 
> That seems to go back to 4.1. Could it be another issue?
> 

Yeah, I noticed it later, just ignore this patch.

Thanks for looking, though.

Rafael

> -Lionel
> 
> 
> > 
> > Cc: Jordan Justen 
> > Cc: Mark Janes 
> > ---
> >   src/intel/dev/gen_device_info.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/intel/dev/gen_device_info.c 
> > b/src/intel/dev/gen_device_info.c
> > index 3953a1f4af3..375d13630a5 100644
> > --- a/src/intel/dev/gen_device_info.c
> > +++ b/src/intel/dev/gen_device_info.c
> > @@ -1366,7 +1366,7 @@ gen_get_device_info_from_fd(int fd, struct 
> > gen_device_info *devinfo)
> > return false;
> >  if (!getparam(fd, I915_PARAM_REVISION, &devinfo->revision))
> > -   return false;
> > +  devinfo->revision = 0;
> >  if (!query_topology(devinfo, fd)) {
> > if (devinfo->gen >= 10) {
> 
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] intel: Load the driver even if I915_PARAM_REVISION is not found.

2019-08-19 Thread Rafael Antognolli
This commit might also need a:

Fixes: 96e1c945f2b ("i965: Move device info initialization to common
code")

On Mon, Aug 19, 2019 at 12:28:55PM -0700, Rafael Antognolli wrote:
> This param is only available starting with kernel 4.16. Use a default
> value of 0 instead if it is not found.
> 
> Cc: Jordan Justen 
> Cc: Mark Janes 
> ---
>  src/intel/dev/gen_device_info.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/intel/dev/gen_device_info.c b/src/intel/dev/gen_device_info.c
> index 3953a1f4af3..375d13630a5 100644
> --- a/src/intel/dev/gen_device_info.c
> +++ b/src/intel/dev/gen_device_info.c
> @@ -1366,7 +1366,7 @@ gen_get_device_info_from_fd(int fd, struct 
> gen_device_info *devinfo)
>return false;
>  
> if (!getparam(fd, I915_PARAM_REVISION, &devinfo->revision))
> -   return false;
> +  devinfo->revision = 0;
>  
> if (!query_topology(devinfo, fd)) {
>if (devinfo->gen >= 10) {
> -- 
> 2.21.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] intel: Load the driver even if I915_PARAM_REVISION is not found.

2019-08-19 Thread Rafael Antognolli
This param is only available starting with kernel 4.16. Use a default
value of 0 instead if it is not found.

Cc: Jordan Justen 
Cc: Mark Janes 
---
 src/intel/dev/gen_device_info.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/dev/gen_device_info.c b/src/intel/dev/gen_device_info.c
index 3953a1f4af3..375d13630a5 100644
--- a/src/intel/dev/gen_device_info.c
+++ b/src/intel/dev/gen_device_info.c
@@ -1366,7 +1366,7 @@ gen_get_device_info_from_fd(int fd, struct 
gen_device_info *devinfo)
   return false;
 
 if (!getparam(fd, I915_PARAM_REVISION, &devinfo->revision))
-   return false;
+  devinfo->revision = 0;
 
if (!query_topology(devinfo, fd)) {
   if (devinfo->gen >= 10) {
-- 
2.21.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv: Properly initialize device->slice_hash.

2019-08-15 Thread Rafael Antognolli
On Wed, Aug 14, 2019 at 10:05:34PM -0500, Jason Ekstrand wrote:
> I take it this happens when subslices_delta == 0 and we take the early return?

Yes, exactly, in that case device->slice_hash is not initialized. I can
add this to the commit message to make it more clear.
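For illustration, here is a minimal standalone sketch of the pattern being
fixed (the struct names are simplified stand-ins, not the real anv types):
initializing the field before any conditional code means the early return
can no longer leave it undefined.

#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for anv_state / anv_device, for illustration only. */
struct toy_state { void *map; unsigned alloc_size; };
struct toy_device { struct toy_state slice_hash; int subslices_delta; };

static void
emit_slice_hashing_state(struct toy_device *device)
{
   /* The fix: zero the field up front so every exit path, including the
    * early return below, leaves it in a defined state. */
   device->slice_hash = (struct toy_state) { 0 };

   if (device->subslices_delta == 0)
      return; /* previously this path skipped the initialization */

   /* ... allocate and fill device->slice_hash here ... */
}

int
main(void)
{
   struct toy_device device;
   memset(&device, 0xff, sizeof(device)); /* simulate uninitialized memory */
   device.subslices_delta = 0;            /* balanced pipes: early return */

   emit_slice_hashing_state(&device);
   printf("slice_hash.map = %p\n", device.slice_hash.map); /* NULL now */
   return 0;
}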

> On Wed, Aug 14, 2019 at 5:45 PM Rafael Antognolli 
> 
> wrote:
> 
> I failed to initialize it on the other cases in GEN11 and it was causing
> a segfault when going through anv_DestroyDevice, if compiled with
> valgrind.
> 
> Fixes: 7bc022b4bbc ("anv/gen11: Emit SLICE_HASH_TABLE when pipes are
> unbalanced.")
> ---
>  src/intel/vulkan/genX_state.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c
> index de8b753dd34..3bf4890b4a4 100644
> --- a/src/intel/vulkan/genX_state.c
> +++ b/src/intel/vulkan/genX_state.c
> @@ -92,6 +92,8 @@ static void
>  genX(emit_slice_hashing_state)(struct anv_device *device,
> struct anv_batch *batch)
>  {
> +   device->slice_hash = (struct anv_state) { 0 };
> +
>  #if GEN_GEN == 11
> const unsigned *ppipe_subslices = device->info.ppipe_subslices;
> int subslices_delta = ppipe_subslices[0] - ppipe_subslices[1];
> @@ -156,8 +158,6 @@ genX(emit_slice_hashing_state)(struct anv_device
> *device,
> anv_batch_emit(batch, GENX(3DSTATE_3D_MODE), mode) {
>mode.SliceHashingTableEnable = true;
> }
> -#else
> -   device->slice_hash = (struct anv_state) { 0 };
>  #endif
>  }
> 
> --
> 2.21.0
> 
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] anv: Properly initialize device->slice_hash.

2019-08-14 Thread Rafael Antognolli
I failed to initialize it on the other cases in GEN11 and it was causing
a segfault when going through anv_DestroyDevice, if compiled with
valgrind.

Fixes: 7bc022b4bbc ("anv/gen11: Emit SLICE_HASH_TABLE when pipes are
unbalanced.")
---
 src/intel/vulkan/genX_state.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c
index de8b753dd34..3bf4890b4a4 100644
--- a/src/intel/vulkan/genX_state.c
+++ b/src/intel/vulkan/genX_state.c
@@ -92,6 +92,8 @@ static void
 genX(emit_slice_hashing_state)(struct anv_device *device,
struct anv_batch *batch)
 {
+   device->slice_hash = (struct anv_state) { 0 };
+
 #if GEN_GEN == 11
const unsigned *ppipe_subslices = device->info.ppipe_subslices;
int subslices_delta = ppipe_subslices[0] - ppipe_subslices[1];
@@ -156,8 +158,6 @@ genX(emit_slice_hashing_state)(struct anv_device *device,
anv_batch_emit(batch, GENX(3DSTATE_3D_MODE), mode) {
   mode.SliceHashingTableEnable = true;
}
-#else
-   device->slice_hash = (struct anv_state) { 0 };
 #endif
 }
 
-- 
2.21.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/gen11: fix genX_bits.h include path

2019-08-13 Thread Rafael Antognolli
On Tue, Aug 13, 2019 at 05:50:30PM +0200, Mauro Rossi wrote:
> Instead of "genX_bits.h" use "genxml/genX_bits.h"
> as already done in other similar cases
> 
> Besides being more correct, it also fixes a build error on Android.

Ugh, sorry for that.

Reviewed-by: Rafael Antognolli 

> Fixes: f0d2923 ("i965/gen11: Emit SLICE_HASH_TABLE when pipes are 
> unbalanced.")
> Signed-off-by: Mauro Rossi 
> ---
>  src/mesa/drivers/dri/i965/brw_state_upload.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index c095f2e59c..87550425fc 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -43,7 +43,7 @@
>  #include "brw_gs.h"
>  #include "brw_wm.h"
>  #include "brw_cs.h"
> -#include "genX_bits.h"
> +#include "genxml/genX_bits.h"
>  #include "main/framebuffer.h"
>  
>  void
> -- 
> 2.20.1
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] intel/isl: Align clear color buffer to full cacheline

2019-04-17 Thread Rafael Antognolli
On Wed, Apr 17, 2019 at 09:04:09AM -0700, Kenneth Graunke wrote:
> On Wednesday, April 17, 2019 7:16:28 AM PDT Topi Pohjolainen wrote:
> > From: Rafael Antognolli 
> > 
> > Fixes MCS fast clear gpu hangs with Vulkan CTS on ICL in CI.
> > 
> > CC: Anuj Phogat 
> > CC: Kenneth Graunke 
> > Tested-by: Topi Pohjolainen 
> > Signed-off-by: Rafael Antognolli 
> > ---
> >  src/intel/isl/isl.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > index 6b9e6c9e0f0..acfed5119ba 100644
> > --- a/src/intel/isl/isl.c
> > +++ b/src/intel/isl/isl.c
> > @@ -122,7 +122,8 @@ isl_device_init(struct isl_device *dev,
> > dev->ss.size = RENDER_SURFACE_STATE_length(info) * 4;
> > dev->ss.align = isl_align(dev->ss.size, 32);
> >  
> > -   dev->ss.clear_color_state_size = CLEAR_COLOR_length(info) * 4;
> > +   dev->ss.clear_color_state_size =
> > +  isl_align(CLEAR_COLOR_length(info) * 4, 64);
> > dev->ss.clear_color_state_offset =
> >RENDER_SURFACE_STATE_ClearValueAddress_start(info) / 32 * 4;
> >  
> > 
> 
> I'm not as familiar with Vulkan, but it looks like we're storing this
> clear color data as part of the underlying image's BO, rather than as
> a separate piece of data.  I wonder if it has anything to do with that
> BO being considered tiled, so something is trying to access an entire
> cacheline around here.  Or it's offsetting following data to not be
> cacheline aligned...

Hmmm... Yeah, we store it after the aux buffer, in the same BO as the
image.

What I think is the biggest issue in Vulkan is that we store some
data (resolve type and tracking) right after the clear color data. And
the data size is 32B, but the docs say it should be in the lower 32B of a
cacheline. For some reason I thought it was safe to write stuff into the
higher 32B, but apparently it wasn't :-/

> I did notice that the clear address has to be 64B aligned.

My understanding is that the image and aux surfaces are always 4K
aligned, so this restriction should be met. I guess adding an assert for
it wouldn't hurt, though...
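For reference, a small standalone sketch of the alignment arithmetic
discussed above (the 32B sizes and the 64B cacheline come from this thread;
align_u32() is a stand-in for isl_align()):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for isl_align(): round value up to a power-of-two alignment. */
static uint32_t
align_u32(uint32_t value, uint32_t align)
{
   return (value + align - 1) & ~(align - 1);
}

int
main(void)
{
   const uint32_t cacheline = 64;
   const uint32_t clear_color_size = 32; /* CLEAR_COLOR dwords * 4 */

   /* Without padding, the resolve-tracking data starts right after the
    * clear color, in the upper half of the same cacheline. */
   uint32_t tracking_unpadded = clear_color_size;
   /* With clear_color_state_size padded to a full cacheline, as in the
    * patch above, it moves to the next cacheline. */
   uint32_t tracking_padded = align_u32(clear_color_size, cacheline);

   assert(tracking_unpadded / cacheline == 0); /* shares the clear color's line */
   assert(tracking_padded / cacheline == 1);   /* pushed to the next line */

   printf("tracking data offset: %u -> %u\n",
          tracking_unpadded, tracking_padded);
   return 0;
}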
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] iris: Do not fast clear depth on gen > 9 yet.

2019-04-03 Thread Rafael Antognolli
Depth fast clears were unrestricted, meaning they were enabled on every
hardware generation. However, gen11+ requires some extra code to make it
work properly.
---
 src/gallium/drivers/iris/iris_clear.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/iris/iris_clear.c 
b/src/gallium/drivers/iris/iris_clear.c
index 6e0a569e7b0..2fd82c1881e 100644
--- a/src/gallium/drivers/iris/iris_clear.c
+++ b/src/gallium/drivers/iris/iris_clear.c
@@ -353,6 +353,12 @@ can_fast_clear_depth(struct iris_context *ice,
 {
struct pipe_resource *p_res = (void *) res;
 
+   struct iris_batch *batch = &ice->batches[IRIS_BATCH_RENDER];
+   const struct gen_device_info *devinfo = &batch->screen->devinfo;
+
+   if (devinfo->gen > 9)
+  return false;
+
/* Check for partial clears */
if (box->x > 0 || box->y > 0 ||
box->width < u_minify(p_res->width0, level) ||
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965/blorp: Remove unused parameter from blorp_surf_for_miptree.

2019-03-14 Thread Rafael Antognolli
It seems pretty useless nowadays.
---
 src/mesa/drivers/dri/i965/brw_blorp.c | 36 +--
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index 97a5f6a9937..e09a8cef762 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -125,8 +125,7 @@ blorp_surf_for_miptree(struct brw_context *brw,
enum isl_aux_usage aux_usage,
bool is_render_target,
unsigned *level,
-   unsigned start_layer, unsigned num_layers,
-   struct isl_surf tmp_surfs[1])
+   unsigned start_layer, unsigned num_layers)
 {
   const struct gen_device_info *devinfo = &brw->screen->devinfo;
 
@@ -406,12 +405,11 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
intel_miptree_prepare_access(brw, dst_mt, dst_level, 1, dst_layer, 1,
 dst_aux_usage, dst_clear_supported);
 
-   struct isl_surf tmp_surfs[2];
struct blorp_surf src_surf, dst_surf;
blorp_surf_for_miptree(brw, &src_surf, src_mt, src_aux_usage, false,
-  &src_level, src_layer, 1, &tmp_surfs[0]);
+  &src_level, src_layer, 1);
blorp_surf_for_miptree(brw, &dst_surf, dst_mt, dst_aux_usage, true,
-  &dst_level, dst_layer, 1, &tmp_surfs[1]);
+  &dst_level, dst_layer, 1);
 
struct isl_swizzle src_isl_swizzle = {
   .r = swizzle_to_scs(GET_SWZ(src_swizzle, 0)),
@@ -497,12 +495,11 @@ brw_blorp_copy_miptrees(struct brw_context *brw,
intel_miptree_prepare_access(brw, dst_mt, dst_level, 1, dst_layer, 1,
 dst_aux_usage, dst_clear_supported);
 
-   struct isl_surf tmp_surfs[2];
struct blorp_surf src_surf, dst_surf;
blorp_surf_for_miptree(brw, &src_surf, src_mt, src_aux_usage, false,
-  &src_level, src_layer, 1, &tmp_surfs[0]);
+  &src_level, src_layer, 1);
blorp_surf_for_miptree(brw, &dst_surf, dst_mt, dst_aux_usage, true,
-  &dst_level, dst_layer, 1, &tmp_surfs[1]);
+  &dst_level, dst_layer, 1);
 
/* The hardware seems to have issues with having a two different format
 * views of the same texture in the sampler cache at the same time.  It's
@@ -1300,10 +1297,9 @@ do_single_blorp_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   irb->mt, irb->mt_level, irb->mt_layer, num_layers);
 
   /* We can't setup the blorp_surf until we've allocated the MCS above */
-  struct isl_surf isl_tmp[2];
   struct blorp_surf surf;
   blorp_surf_for_miptree(brw, &surf, irb->mt, irb->mt->aux_usage, true,
- &level, irb->mt_layer, num_layers, isl_tmp);
+ &level, irb->mt_layer, num_layers);
 
   /* Ivybrigde PRM Vol 2, Part 1, "11.7 MCS Buffer for Render Target(s)":
*
@@ -1346,10 +1342,9 @@ do_single_blorp_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   intel_miptree_prepare_render(brw, irb->mt, level, irb->mt_layer,
num_layers, aux_usage);
 
-  struct isl_surf isl_tmp[2];
   struct blorp_surf surf;
   blorp_surf_for_miptree(brw, &surf, irb->mt, aux_usage, true,
- &level, irb->mt_layer, num_layers, isl_tmp);
+ &level, irb->mt_layer, num_layers);
 
   union isl_color_value clear_color;
   memcpy(clear_color.f32, ctx->Color.ClearColor.f, sizeof(float) * 4);
@@ -1442,7 +1437,6 @@ brw_blorp_clear_depth_stencil(struct brw_context *brw,
   return;
 
uint32_t level, start_layer, num_layers;
-   struct isl_surf isl_tmp[4];
struct blorp_surf depth_surf, stencil_surf;
 
struct intel_mipmap_tree *depth_mt = NULL;
@@ -1459,8 +1453,7 @@ brw_blorp_clear_depth_stencil(struct brw_context *brw,
 
   unsigned depth_level = level;
   blorp_surf_for_miptree(brw, &depth_surf, depth_mt, depth_mt->aux_usage,
- true, &depth_level, start_layer, num_layers,
- &isl_tmp[0]);
+ true, &depth_level, start_layer, num_layers);
   assert(depth_level == level);
}
 
@@ -1489,8 +1482,7 @@ brw_blorp_clear_depth_stencil(struct brw_context *brw,
   unsigned stencil_level = level;
   blorp_surf_for_miptree(brw, &stencil_surf, stencil_mt,
   ISL_AUX_USAGE_NONE, true,
- &stencil_level, start_layer, num_layers,
- &isl_tmp[2]);
+ &stencil_level, start_layer, num_layers);
}
 
assert((mask & BUFFER_BIT_DEPTH) || stencil_mask);
@@ -1525,11 +1517,9 @@ brw_blorp_resolve_color(struct brw_context *brw, struct 
intel_mipmap_tree *mt,
 
const mesa_format format = _mesa_get_srgb_format_linear(mt->format);
 
-   struct isl_surf isl_tmp[1];
struct blorp_surf surf;

Re: [Mesa-dev] [PATCH] i965: Perform manual preemption checks between commands

2019-03-05 Thread Rafael Antognolli
On Tue, Mar 05, 2019 at 07:50:24PM +, Chris Wilson wrote:
> Quoting Rafael Antognolli (2019-03-05 19:33:03)
> > On Tue, Mar 05, 2019 at 09:40:20AM +, Chris Wilson wrote:
> > > Not all commands support being preempted as they execute, and for those
> > > make sure we at least check for being preempted before we start so as to
> > > try and minimise the latency of whomever is more important than
> > > ourselves.
> > > 
> > > Cc: Jari Tahvanainen ,
> > > Cc: Rafael Antognolli 
> > > Cc: Kenneth Graunke 
> > > ---
> > > Always double check before you hit send.
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_defines.h | 1 +
> > >  src/mesa/drivers/dri/i965/brw_draw.c| 7 +++
> > >  2 files changed, 8 insertions(+)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> > > b/src/mesa/drivers/dri/i965/brw_defines.h
> > > index 2729a54e144..ef71c556cca 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > > @@ -1420,6 +1420,7 @@ enum brw_pixel_shader_coverage_mask_mode {
> > >  
> > >  #define MI_NOOP  (CMD_MI | 0)
> > >  
> > > +#define MI_ARB_CHECK (CMD_MI | 0x5 << 23)
> > >  #define MI_BATCH_BUFFER_END  (CMD_MI | 0xA << 23)
> > >  
> > >  #define MI_FLUSH (CMD_MI | (4 << 23))
> > > diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
> > > b/src/mesa/drivers/dri/i965/brw_draw.c
> > > index d07349419cc..a04e334ffc4 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_draw.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_draw.c
> > > @@ -196,6 +196,13 @@ brw_emit_prim(struct brw_context *brw,
> > > if (verts_per_instance == 0 && !prim->is_indirect && !xfb_obj)
> > >return;
> > >  
> > > +   /* If this object is not itself preemptible, check before we begin. */
> > > +   if (!brw->object_preemption) {
> > > +  BEGIN_BATCH(1);
> > > +  OUT_BATCH(MI_ARB_CHECK);
> > > +  ADVANCE_BATCH();
> > > +   }
> > > +
> > 
> > "The command streamer will preempt in the case arbitration is enabled,
> > there is a pending execution list and this command is currently being
> > parsed."
> > 
> > If there is a pending execution list, shouldn't we have been preempted
> > already, since mid-batch preemption is supposed to be enabled?
> 
> No, it still only occurs on certain instructions, not every instruction
> boundary.

Sounds good. In that case,

Reviewed-by: Rafael Antognolli 

> -Chris
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Perform manual preemption checks between commands

2019-03-05 Thread Rafael Antognolli
On Tue, Mar 05, 2019 at 09:40:20AM +, Chris Wilson wrote:
> Not all commands support being preempted as they execute, and for those
> make sure we at least check for being preempted before we start so as to
> try and minimise the latency of whomever is more important than
> ourselves.
> 
> Cc: Jari Tahvanainen ,
> Cc: Rafael Antognolli 
> Cc: Kenneth Graunke 
> ---
> Always double check before you hit send.
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h | 1 +
>  src/mesa/drivers/dri/i965/brw_draw.c| 7 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 2729a54e144..ef71c556cca 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1420,6 +1420,7 @@ enum brw_pixel_shader_coverage_mask_mode {
>  
>  #define MI_NOOP  (CMD_MI | 0)
>  
> +#define MI_ARB_CHECK (CMD_MI | 0x5 << 23)
>  #define MI_BATCH_BUFFER_END  (CMD_MI | 0xA << 23)
>  
>  #define MI_FLUSH (CMD_MI | (4 << 23))
> diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
> b/src/mesa/drivers/dri/i965/brw_draw.c
> index d07349419cc..a04e334ffc4 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw.c
> @@ -196,6 +196,13 @@ brw_emit_prim(struct brw_context *brw,
> if (verts_per_instance == 0 && !prim->is_indirect && !xfb_obj)
>return;
>  
> +   /* If this object is not itself preemptible, check before we begin. */
> +   if (!brw->object_preemption) {
> +  BEGIN_BATCH(1);
> +  OUT_BATCH(MI_ARB_CHECK);
> +  ADVANCE_BATCH();
> +   }
> +

"The command streamer will preempt in the case arbitration is enabled,
there is a pending execution list and this command is currently being
parsed."

If there is a pending execution list, shouldn't we have been preempted
already, since mid-batch preemption is supposed to be enabled?

> /* If we're set to always flush, do it before and after the primitive 
> emit.
>  * We want to catch both missed flushes that hurt instruction/state cache
>  * and missed flushes of the render cache as it heads to other parts of
> -- 
> 2.20.1
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv: wire up the state_pool_padding test

2019-02-05 Thread Rafael Antognolli
On Tue, Feb 05, 2019 at 12:09:45PM +, Emil Velikov wrote:
> From: Emil Velikov 
> 
> Cc: Rafael Antognolli 
> Cc: Jason Ekstrand 
> Cc: Dylan Baker 
> Fixes: 927ba12b53c ("anv/tests: Adding test for the state_pool padding.")
> Signed-off-by: Emil Velikov 

Reviewed-by: Rafael Antognolli 

> ---
>  src/intel/Makefile.vulkan.am | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/src/intel/Makefile.vulkan.am b/src/intel/Makefile.vulkan.am
> index b315f10a01a..cad0a57bc7f 100644
> --- a/src/intel/Makefile.vulkan.am
> +++ b/src/intel/Makefile.vulkan.am
> @@ -253,6 +253,7 @@ VULKAN_TESTS = \
>   vulkan/tests/block_pool_no_free \
>   vulkan/tests/state_pool_no_free \
>   vulkan/tests/state_pool_free_list_only \
> + vulkan/tests/state_pool_padding \
>   vulkan/tests/state_pool
>  
>  VULKAN_TEST_LDADD = \
> @@ -274,6 +275,10 @@ vulkan_tests_state_pool_free_list_only_CFLAGS = 
> $(VULKAN_CFLAGS)
>  vulkan_tests_state_pool_free_list_only_CPPFLAGS = $(VULKAN_CPPFLAGS)
>  vulkan_tests_state_pool_free_list_only_LDADD = $(VULKAN_TEST_LDADD)
>  
> +vulkan_tests_state_pool_padding_CFLAGS = $(VULKAN_CFLAGS)
> +vulkan_tests_state_pool_padding_CPPFLAGS = $(VULKAN_CPPFLAGS)
> +vulkan_tests_state_pool_padding_LDADD = $(VULKAN_TEST_LDADD)
> +
>  vulkan_tests_state_pool_CFLAGS = $(VULKAN_CFLAGS)
>  vulkan_tests_state_pool_CPPFLAGS = $(VULKAN_CPPFLAGS)
>  vulkan_tests_state_pool_LDADD = $(VULKAN_TEST_LDADD)
> -- 
> 2.20.1
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] MR: Move pln emul to the fs_visitor.

2019-01-25 Thread Rafael Antognolli
Move the pln emul code to the fs_visitor, so we get some optimizations
that don't happen at the fs_generator level, mainly better scheduling.

One big caveat of this change is that we don't use NF types and the
accumulator anymore, but apparently we don't need the extra precision.
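For context, a tiny standalone sketch of the math PLN computes per channel,
which the visitor-level emulation reproduces with a pair of fused
multiply-adds (names here are illustrative, not the fs_visitor API):

#include <stdio.h>

/* result = a * x + b * y + c, where (a, b, c) are the interpolation
 * coefficients and (x, y) the pixel deltas.  Lowered as two MADs:
 * tmp = b * y + c, then result = a * x + tmp. */
static float
plane_eval(float a, float b, float c, float x, float y)
{
   float tmp = b * y + c;
   return a * x + tmp;
}

int
main(void)
{
   printf("%f\n", plane_eval(0.5f, 0.25f, 1.0f, 2.0f, 4.0f));
   return 0;
}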

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/160

--
Rafael
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3] anv/allocator: Avoid race condition in anv_block_pool_map.

2019-01-23 Thread Rafael Antognolli
Accessing bo->map and then pool->center_bo_offset without a lock is
racy. One way of avoiding such a race condition is to store bo->map +
center_bo_offset into pool->map at the time the block pool is growing,
which happens within a lock.

v2: Only set pool->map if not using softpin (Jason).
v3: Move things around and only update center_bo_offset if not using
softpin too (Jason).

Cc: Jason Ekstrand 
Reported-by: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109442
Fixes: fc3f58832015cbb177179e7f3420d3611479b4a9
---
 src/intel/vulkan/anv_allocator.c | 20 ++--
 src/intel/vulkan/anv_private.h   | 13 +
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 89f26789c85..006175c8c65 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -436,7 +436,9 @@ anv_block_pool_init(struct anv_block_pool *pool,
pool->bo_flags = bo_flags;
pool->nbos = 0;
pool->size = 0;
+   pool->center_bo_offset = 0;
pool->start_address = gen_canonical_address(start_address);
+   pool->map = NULL;
 
/* This pointer will always point to the first BO in the list */
pool->bo = &pool->bos[0];
@@ -536,6 +538,7 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
   if (map == MAP_FAILED)
  return vk_errorf(pool->device->instance, pool->device,
   VK_ERROR_MEMORY_MAP_FAILED, "gem mmap failed: %m");
+  assert(center_bo_offset == 0);
} else {
   /* Just leak the old map until we destroy the pool.  We can't munmap it
* without races or imposing locking on the block allocate fast path. On
@@ -549,6 +552,11 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
   if (map == MAP_FAILED)
  return vk_errorf(pool->device->instance, pool->device,
   VK_ERROR_MEMORY_MAP_FAILED, "mmap failed: %m");
+
+  /* Now that we mapped the new memory, we can write the new
+   * center_bo_offset back into pool and update pool->map. */
+  pool->center_bo_offset = center_bo_offset;
+  pool->map = map + center_bo_offset;
   gem_handle = anv_gem_userptr(pool->device, map, size);
   if (gem_handle == 0) {
  munmap(map, size);
@@ -573,10 +581,6 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
if (!pool->device->info.has_llc)
   anv_gem_set_caching(pool->device, gem_handle, I915_CACHING_CACHED);
 
-   /* Now that we successfull allocated everything, we can write the new
-* center_bo_offset back into pool. */
-   pool->center_bo_offset = center_bo_offset;
-
/* For block pool BOs we have to be a bit careful about where we place them
 * in the GTT.  There are two documented workarounds for state base address
 * placement : Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset
@@ -670,8 +674,12 @@ anv_block_pool_get_bo(struct anv_block_pool *pool, int32_t 
*offset)
 void*
 anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
 {
-   struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
-   return bo->map + pool->center_bo_offset + offset;
+   if (pool->bo_flags & EXEC_OBJECT_PINNED) {
+  struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
+  return bo->map + offset;
+   } else {
+  return pool->map + offset;
+   }
 }
 
 /** Grows and re-centers the block pool.
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 3889065c93c..110b2ccf023 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -663,6 +663,19 @@ struct anv_block_pool {
 */
uint32_t center_bo_offset;
 
+   /* Current memory map of the block pool.  This pointer may or may not
+* point to the actual beginning of the block pool memory.  If
+* anv_block_pool_alloc_back has ever been called, then this pointer
+* will point to the "center" position of the buffer and all offsets
+* (negative or positive) given out by the block pool alloc functions
+* will be valid relative to this pointer.
+*
+* In particular, map == bo.map + center_offset
+*
+* DO NOT access this pointer directly. Use anv_block_pool_map() instead,
+* since it will handle the softpin case as well, where this points to NULL.
+*/
+   void *map;
int fd;
 
/**
-- 
2.17.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] anv/allocator: Avoid race condition in anv_block_pool_map.

2019-01-23 Thread Rafael Antognolli
On Wed, Jan 23, 2019 at 06:08:50PM -0600, Jason Ekstrand wrote:
> On Wed, Jan 23, 2019 at 5:26 PM Rafael Antognolli 
> 
> wrote:
> 
> Accessing bo->map and then pool->center_bo_offset without a lock is
> racy. One way of avoiding such a race condition is to store bo->map +
> center_bo_offset into pool->map at the time the block pool is growing,
> which happens within a lock.
> 
> v2: Only set pool->map if not using softpin (Jason).
> 
> Cc: Jason Ekstrand 
> Reported-by: Ian Romanick 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109442
> Fixes: fc3f58832015cbb177179e7f3420d3611479b4a9
> ---
>  src/intel/vulkan/anv_allocator.c | 11 +--
>  src/intel/vulkan/anv_private.h   | 13 +
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/
> anv_allocator.c
> index 89f26789c85..67f9c2a948d 100644
> --- a/src/intel/vulkan/anv_allocator.c
> +++ b/src/intel/vulkan/anv_allocator.c
> @@ -437,6 +437,7 @@ anv_block_pool_init(struct anv_block_pool *pool,
> pool->nbos = 0;
> pool->size = 0;
> pool->start_address = gen_canonical_address(start_address);
> +   pool->map = NULL;
> 
> /* This pointer will always point to the first BO in the list */
> pool->bo = &pool->bos[0];
> @@ -576,6 +577,8 @@ anv_block_pool_expand_range(struct anv_block_pool
> *pool,
> /* Now that we successfull allocated everything, we can write the new
>  * center_bo_offset back into pool. */
> pool->center_bo_offset = center_bo_offset;
> +   if (!use_softpin)
> +  pool->map = map + center_bo_offset;
> 
> 
> We could also put this a bit higher up right after where we actually call
> mmap.  That would reduce the number of "if (use_softpin)" blocks and probably
> make things more readable.
> 
> Come to think of it, we could also set pool->center_bo_offset there and just
> assert(center_bo_offset == 0) in the softpin case.  I like that.  Maybe that's
> a second patch?

I wouldn't mind sending a v3. In fact, I have it ready and will send it
right after this email. But I can also simply push this one and send a new
patch tomorrow, if it's too late to get a review on the v3.
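For readers following along, a toy standalone illustration of the race being
fixed: the grower publishes a single derived pointer while holding the lock,
so readers never combine a fresh map with a stale center offset (the names
are made up, not the anv ones).

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

struct toy_pool {
   pthread_mutex_t lock;
   char *bo_map;           /* where the current mapping starts */
   uint32_t center_offset; /* how far the "center" sits into it */
   char *map;              /* bo_map + center_offset, set under the lock */
};

static void
toy_pool_grow(struct toy_pool *pool, char *new_map, uint32_t new_center)
{
   pthread_mutex_lock(&pool->lock);
   pool->bo_map = new_map;
   pool->center_offset = new_center;
   pool->map = new_map + new_center; /* the one value readers rely on */
   pthread_mutex_unlock(&pool->lock);
}

static void *
toy_pool_map(struct toy_pool *pool, int32_t offset)
{
   /* Readers use pool->map alone, instead of reading bo_map and
    * center_offset separately while a grow may be updating them. */
   return pool->map + offset;
}

int
main(void)
{
   static char backing[1024];
   struct toy_pool pool = { .lock = PTHREAD_MUTEX_INITIALIZER };
   toy_pool_grow(&pool, backing, 512);
   printf("%p\n", toy_pool_map(&pool, -16));
   return 0;
}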

> In any case, this fixes today's bug
> 
> Reviewed-by: Jason Ekstrand 

Thanks!
Rafael

> --Jason
>  
> 
> 
> /* For block pool BOs we have to be a bit careful about where we place
> them
>  * in the GTT.  There are two documented workarounds for state base
> address
> @@ -670,8 +673,12 @@ anv_block_pool_get_bo(struct anv_block_pool *pool,
> int32_t *offset)
>  void*
>  anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
>  {
> -   struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
> -   return bo->map + pool->center_bo_offset + offset;
> +   if (pool->bo_flags & EXEC_OBJECT_PINNED) {
> +  struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
> +  return bo->map + offset;
> +   } else {
> +  return pool->map + offset;
> +   }
>  }
> 
>  /** Grows and re-centers the block pool.
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/
> anv_private.h
> index 3889065c93c..110b2ccf023 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -663,6 +663,19 @@ struct anv_block_pool {
>  */
> uint32_t center_bo_offset;
> 
> +   /* Current memory map of the block pool.  This pointer may or may not
> +* point to the actual beginning of the block pool memory.  If
> +* anv_block_pool_alloc_back has ever been called, then this pointer
> +* will point to the "center" position of the buffer and all offsets
> +* (negative or positive) given out by the block pool alloc functions
> +* will be valid relative to this pointer.
> +*
> +* In particular, map == bo.map + center_offset
> +*
> +* DO NOT access this pointer directly. Use anv_block_pool_map()
> instead,
> +* since it will handle the softpin case as well, where this points to
> NULL.
> +*/
> +   void *map;
> int fd;
> 
> /**
> --
> 2.17.2
> 
> 

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] anv/allocator: Avoid race condition in anv_block_pool_map.

2019-01-23 Thread Rafael Antognolli
Accessing bo->map and then pool->center_bo_offset without a lock is
racy. One way of avoiding such a race condition is to store bo->map +
center_bo_offset into pool->map at the time the block pool is growing,
which happens within a lock.

v2: Only set pool->map if not using softpin (Jason).

Cc: Jason Ekstrand 
Reported-by: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109442
Fixes: fc3f58832015cbb177179e7f3420d3611479b4a9
---
 src/intel/vulkan/anv_allocator.c | 11 +--
 src/intel/vulkan/anv_private.h   | 13 +
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 89f26789c85..67f9c2a948d 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -437,6 +437,7 @@ anv_block_pool_init(struct anv_block_pool *pool,
pool->nbos = 0;
pool->size = 0;
pool->start_address = gen_canonical_address(start_address);
+   pool->map = NULL;
 
/* This pointer will always point to the first BO in the list */
pool->bo = &pool->bos[0];
@@ -576,6 +577,8 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
/* Now that we successfull allocated everything, we can write the new
 * center_bo_offset back into pool. */
pool->center_bo_offset = center_bo_offset;
+   if (!use_softpin)
+  pool->map = map + center_bo_offset;
 
/* For block pool BOs we have to be a bit careful about where we place them
 * in the GTT.  There are two documented workarounds for state base address
@@ -670,8 +673,12 @@ anv_block_pool_get_bo(struct anv_block_pool *pool, int32_t 
*offset)
 void*
 anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
 {
-   struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
-   return bo->map + pool->center_bo_offset + offset;
+   if (pool->bo_flags & EXEC_OBJECT_PINNED) {
+  struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
+  return bo->map + offset;
+   } else {
+  return pool->map + offset;
+   }
 }
 
 /** Grows and re-centers the block pool.
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 3889065c93c..110b2ccf023 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -663,6 +663,19 @@ struct anv_block_pool {
 */
uint32_t center_bo_offset;
 
+   /* Current memory map of the block pool.  This pointer may or may not
+* point to the actual beginning of the block pool memory.  If
+* anv_block_pool_alloc_back has ever been called, then this pointer
+* will point to the "center" position of the buffer and all offsets
+* (negative or positive) given out by the block pool alloc functions
+* will be valid relative to this pointer.
+*
+* In particular, map == bo.map + center_offset
+*
+* DO NOT access this pointer directly. Use anv_block_pool_map() instead,
+* since it will handle the softpin case as well, where this points to NULL.
+*/
+   void *map;
int fd;
 
/**
-- 
2.17.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] anv/allocator: Avoid race condition in anv_block_pool_map.

2019-01-23 Thread Rafael Antognolli
Accessing bo->map and then pool->center_bo_offset without a lock is
racy. One way of avoiding such a race condition is to store bo->map +
center_bo_offset into pool->map at the time the block pool is growing,
which happens within a lock.

Cc: Jason Ekstrand 
Reported-by: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109442
Fixes: fc3f58832015cbb177179e7f3420d3611479b4a9
---
 src/intel/vulkan/anv_allocator.c | 10 --
 src/intel/vulkan/anv_private.h   | 13 +
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 89f26789c85..0bfe55bf684 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -437,6 +437,7 @@ anv_block_pool_init(struct anv_block_pool *pool,
pool->nbos = 0;
pool->size = 0;
pool->start_address = gen_canonical_address(start_address);
+   pool->map = NULL;
 
/* This pointer will always point to the first BO in the list */
pool->bo = &pool->bos[0];
@@ -575,6 +576,7 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
 
/* Now that we successfull allocated everything, we can write the new
 * center_bo_offset back into pool. */
+   pool->map = map + center_bo_offset;
pool->center_bo_offset = center_bo_offset;
 
/* For block pool BOs we have to be a bit careful about where we place them
@@ -670,8 +672,12 @@ anv_block_pool_get_bo(struct anv_block_pool *pool, int32_t 
*offset)
 void*
 anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
 {
-   struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
-   return bo->map + pool->center_bo_offset + offset;
+   if (pool->bo_flags & EXEC_OBJECT_PINNED) {
+  struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
+  return bo->map + offset;
+   } else {
+  return pool->map + offset;
+   }
 }
 
 /** Grows and re-centers the block pool.
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 3889065c93c..110b2ccf023 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -663,6 +663,19 @@ struct anv_block_pool {
 */
uint32_t center_bo_offset;
 
+   /* Current memory map of the block pool.  This pointer may or may not
+* point to the actual beginning of the block pool memory.  If
+* anv_block_pool_alloc_back has ever been called, then this pointer
+* will point to the "center" position of the buffer and all offsets
+* (negative or positive) given out by the block pool alloc functions
+* will be valid relative to this pointer.
+*
+* In particular, map == bo.map + center_offset
+*
+* DO NOT access this pointer directly. Use anv_block_pool_map() instead,
+* since it will handle the softpin case as well, where this points to NULL.
+*/
+   void *map;
int fd;
 
/**
-- 
2.17.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/genxml: add missing MI_PREDICATE compare operations

2019-01-18 Thread Rafael Antognolli
Reviewed-by: Rafael Antognolli 

On Fri, Jan 18, 2019 at 05:01:58PM +, Lionel Landwerlin wrote:
> Doesn't save us a great deal of lines but at least they get decoded in
> aubinators.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/genxml/gen10.xml | 2 ++
>  src/intel/genxml/gen11.xml | 2 ++
>  src/intel/genxml/gen7.xml  | 2 ++
>  src/intel/genxml/gen75.xml | 2 ++
>  src/intel/genxml/gen8.xml  | 2 ++
>  src/intel/genxml/gen9.xml  | 2 ++
>  src/intel/vulkan/genX_cmd_buffer.c | 1 -
>  7 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
> index 9ec311d6cc5..7043ab8995d 100644
> --- a/src/intel/genxml/gen10.xml
> +++ b/src/intel/genxml/gen10.xml
> @@ -3047,6 +3047,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen11.xml b/src/intel/genxml/gen11.xml
> index 6ab1f965650..3af80a6ed3d 100644
> --- a/src/intel/genxml/gen11.xml
> +++ b/src/intel/genxml/gen11.xml
> @@ -3042,6 +3042,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml
> index 893c12b8af9..3c445757300 100644
> --- a/src/intel/genxml/gen7.xml
> +++ b/src/intel/genxml/gen7.xml
> @@ -2051,6 +2051,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
> index 009a123ad69..3df7dc29939 100644
> --- a/src/intel/genxml/gen75.xml
> +++ b/src/intel/genxml/gen75.xml
> @@ -2462,6 +2462,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml
> index fd19b0c8b33..4d1488dae62 100644
> --- a/src/intel/genxml/gen8.xml
> +++ b/src/intel/genxml/gen8.xml
> @@ -2690,6 +2690,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
> index 706d398babb..3f02e866d0c 100644
> --- a/src/intel/genxml/gen9.xml
> +++ b/src/intel/genxml/gen9.xml
> @@ -2973,6 +2973,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> b/src/intel/vulkan/genX_cmd_buffer.c
> index 6fb19661ebb..544e2929990 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -3310,7 +3310,6 @@ void genX(CmdDispatchIndirect)(
> }
>  
> /* predicate = !predicate; */
> -#define COMPARE_FALSE   1
> anv_batch_emit(batch, GENX(MI_PREDICATE), mip) {
>mip.LoadOperation= LOAD_LOADINV;
>mip.CombineOperation = COMBINE_OR;
> -- 
> 2.20.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/blorp: Be more conservative about copying clear colors

2019-01-04 Thread Rafael Antognolli
On Fri, Jan 04, 2019 at 01:07:07PM -0600, Jason Ekstrand wrote:
> In 92eb5bbc68d7324 we attempted to avoid copying clear colors whenever
> we weren't doing a resolve.  However, this broke MSAA resolves because
> we need the clear color in the source.  This patch makes blorp much more
> conservative such that it only avoids the clear color copy if either
> aux_usage == NONE or it's explicitly doing a fast-clear.

Ah, nice! I think I tried to fix this by not setting the
surface->clear_color_addr.buffer in some cases, but I think it broke
some crucible test.  I don't remember the details of how, though.

Anyway, this looks better, and I assume it doesn't break anything.

Reviewed-by: Rafael Antognolli 

> Fixes: 92eb5bbc68d7 "intel/blorp: Only copy clear color when doing..."
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107728
> Cc: Rafael Antognolli 
> ---
>  src/intel/blorp/blorp_genX_exec.h | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/src/intel/blorp/blorp_genX_exec.h 
> b/src/intel/blorp/blorp_genX_exec.h
> index 9010b03fb67..29afe8ac78b 100644
> --- a/src/intel/blorp/blorp_genX_exec.h
> +++ b/src/intel/blorp/blorp_genX_exec.h
> @@ -1326,7 +1326,7 @@ blorp_emit_memcpy(struct blorp_batch *batch,
>  static void
>  blorp_emit_surface_state(struct blorp_batch *batch,
>   const struct brw_blorp_surface_info *surface,
> - enum isl_aux_op op,
> + enum isl_aux_op aux_op,
>   void *state, uint32_t state_offset,
>   const bool color_write_disables[4],
>   bool is_render_target)
> @@ -1382,7 +1382,7 @@ blorp_emit_surface_state(struct blorp_batch *batch,
>surface->aux_addr, *aux_addr);
> }
>  
> -   if (surface->clear_color_addr.buffer) {
> +   if (aux_usage != ISL_AUX_USAGE_NONE && surface->clear_color_addr.buffer) {
>  #if GEN_GEN >= 10
>assert((surface->clear_color_addr.offset & 0x3f) == 0);
>uint32_t *clear_addr = state + isl_dev->ss.clear_color_state_offset;
> @@ -1390,7 +1390,10 @@ blorp_emit_surface_state(struct blorp_batch *batch,
>isl_dev->ss.clear_color_state_offset,
>surface->clear_color_addr, *clear_addr);
>  #elif GEN_GEN >= 7
> -  if (op == ISL_AUX_OP_FULL_RESOLVE || op == ISL_AUX_OP_PARTIAL_RESOLVE) 
> {
> +  /* Fast clears just whack the AUX surface and don't actually use the
> +   * clear color for anything.  We can avoid the MI memcpy on that case.
> +   */
> +  if (aux_op != ISL_AUX_OP_FAST_CLEAR) {
>   struct blorp_address dst_addr = 
> blorp_get_surface_base_address(batch);
>   dst_addr.offset += state_offset + isl_dev->ss.clear_value_offset;
>   blorp_emit_memcpy(batch, dst_addr, surface->clear_color_addr,
> -- 
> 2.20.1
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 3/3] i965/gen9: Add workarounds for object preemption.

2018-12-13 Thread Rafael Antognolli
On Wed, Oct 31, 2018 at 04:27:31PM -0700, Kenneth Graunke wrote:
> On Wednesday, October 31, 2018 11:15:28 AM PDT Rafael Antognolli wrote:
> > On Tue, Oct 30, 2018 at 04:32:54PM -0700, Kenneth Graunke wrote:
> > > On Monday, October 29, 2018 10:19:54 AM PDT Rafael Antognolli wrote:
> > > Do we need any stalling when whacking CS_CHICKEN1...?
> > 
> > Hmmm... there's this:
> > 
> > "A fixed function pipe flush is required before modifying this field"
> > 
> > in the programming notes. I'm not sure what that is, but I assume it's
> > some type of PIPE_CONTROL?
> 
> Yeah.  I'm not honestly sure what kind - "fixed function pipe flush"
> isn't a thing.  Nobody ever uses wording that corresponds to actual
> mechanics of the hardware. :(
> 
> Maybe this would work:
> 
> brw_emit_end_of_pipe_sync(brw, PIPE_CONTROL_RENDER_TARGET_FLUSH);

Hey Ken,

Resurrecting this old thread... I just noticed that I had this in patch
2/3 (inside brw_enable_obj_preemption()):

 brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);

That's bit 7 of the PIPE_CONTROL, and from the docs:

 "Hardware on parsing PIPECONTROL command with Pipe Control Flush Enable
 set will wait for all the outstanding post sync operations
 corresponding to previously executed PIPECONTROL commands are complete
 before making forward progress."

Do you think that's maybe what they meant? And in that case, I guess I
would need a PIPE_CONTROL with a post-sync operation right before this
one, right?
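For concreteness, a sketch (in i965 terms, assuming the usual brw headers)
of the sequence being wondered about here, using only the helpers already
mentioned in this thread; whether this actually satisfies the "fixed
function pipe flush" wording is an assumption, which is exactly the open
question:

/* Sketch only, not a tested workaround. */
static void
flush_before_cs_chicken1(struct brw_context *brw)
{
   /* End-of-pipe sync: a PIPE_CONTROL with a post-sync write that the
    * later FLUSH_ENABLE can wait on. */
   brw_emit_end_of_pipe_sync(brw, PIPE_CONTROL_RENDER_TARGET_FLUSH);

   /* Then wait for all outstanding post-sync operations, per the quoted
    * docs for Pipe Control Flush Enable. */
   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);

   /* ... followed by the MI_LOAD_REGISTER_IMM write to CS_CHICKEN1 ... */
}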

> > > We may also need to disable autostripping by whacking some chicken
> > > registers if it's enabled (Gen9 WA #0799).  Which would be lame,
> > 
> > Looking again at #0799, it seems it's only applicable up to C0 on SKL,
> > and B0 on BXT. So maybe we should be fine here? Or just disable it on
> > BXT?
> 
> You're right, I misread that.  (I saw the Gen8 "FROM" tag and didn't
> notice that it uses "UNTIL" on Gen9...)  Nothing to do here.
> 
> > > because that's likely a useful optimization.  I guess we could disable
> > > preemption for TRILIST and LINELIST as well to be safe.
> > 
> > Is this because of the autostripping mentioned above, or some other
> > workaround? Or just your impression?
> > 
> > I can update it to include that, but just want to be sure that it's
> > still applicable, once we figure the thing about #0799.
> 
> Autostrip converts TRILIST/LINELIST into TRISTRIP/LINESTRIP, so
> the idea would be to avoid preemption for anything that hits the
> autostrip feature.  But, no need, as you noted above.
> 
> --Ken


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH 09/14] anv: Validate the list of BOs from the block pool.

2018-12-12 Thread Rafael Antognolli
On Mon, Dec 10, 2018 at 01:49:43PM -0600, Jason Ekstrand wrote:
> On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli 
> wrote:
> 
> We now have multiple BOs in the block pool, but sometimes we still
> reference only the first one in some instructions, and use relative
> offsets in others. So we must be sure to add all the BOs from the block
> pool to the validation list when submitting commands.
> ---
>  src/intel/vulkan/anv_batch_chain.c | 47 ++
>  1 file changed, 42 insertions(+), 5 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/
> anv_batch_chain.c
> index bec4d647b7e..65df28ccb91 100644
> --- a/src/intel/vulkan/anv_batch_chain.c
> +++ b/src/intel/vulkan/anv_batch_chain.c
> @@ -1356,6 +1356,36 @@ relocate_cmd_buffer(struct anv_cmd_buffer
> *cmd_buffer,
> return true;
>  }
> 
> +static void
> +anv_reloc_list_add_dep(struct anv_cmd_buffer *cmd_buffer,
> +   struct anv_bo_list *bo_list)
> +{
> +   struct anv_bo_list *iter;
> +   struct anv_bo *bo;
> +   struct anv_reloc_list *relocs = cmd_buffer->batch.relocs;
> +
> +   anv_block_pool_foreach_bo(bo_list, iter, bo) {
> +  _mesa_set_add(relocs->deps, bo);
> +   }
> +}
> +
> +static void
> +anv_batch_bos_add(struct anv_cmd_buffer *cmd_buffer)
> +{
> +   struct anv_bo_list *bo_list;
> +
> +   bo_list = cmd_buffer->device->dynamic_state_pool.block_pool.bos;
> +   anv_reloc_list_add_dep(cmd_buffer, bo_list);
> +
> +   bo_list = cmd_buffer->device->instruction_state_pool.block_pool.bos;
> +   anv_reloc_list_add_dep(cmd_buffer, bo_list);
> +
> +   if (cmd_buffer->device->instance->physicalDevice.use_softpin) {
> +  bo_list = cmd_buffer->device->binding_table_pool.block_pool.bos;
> +  anv_reloc_list_add_dep(cmd_buffer, bo_list);
> 
> 
> I don't think we want to add things to the command buffer's dependency set this
> late (at submit time).  Instead, I think we want to just do anv_execbuf_add_bo
> for each of them directly at the top of setup_execbuf_for_cmd_buffer.
> 
> 
> +   }
> +}
> +
>  static VkResult
>  setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf,
>   struct anv_cmd_buffer *cmd_buffer)
> @@ -1364,13 +1394,20 @@ setup_execbuf_for_cmd_buffer(struct anv_execbuf
> *execbuf,
> struct anv_state_pool *ss_pool =
> &cmd_buffer->device->surface_state_pool;
> 
> +   anv_batch_bos_add(cmd_buffer);
> +
> adjust_relocations_from_state_pool(ss_pool, &cmd_buffer->surface_relocs,
>cmd_buffer->last_ss_pool_center);
> -   VkResult result = anv_execbuf_add_bo(execbuf, ss_pool->block_pool.bo,
> -&cmd_buffer->surface_relocs, 0,
> -&cmd_buffer->device->alloc);
> -   if (result != VK_SUCCESS)
> -  return result;
> +   VkResult result;
> +   struct anv_bo *bo;
> +   struct anv_bo_list *iter;
> +   anv_block_pool_foreach_bo(ss_pool->block_pool.bos, iter, bo) {
> +  result = anv_execbuf_add_bo(execbuf, bo,
> +  &cmd_buffer->surface_relocs, 0,
> +  &cmd_buffer->device->alloc);
> 
> 
> I don't think we want to pass the relocation list on every BO.  Instead, we
> should have a softpin case where we walk the list and don't provide any
> relocations and a non-softpin case where we assert that there is only one BO
> and do provide the relocations.
>  

So, in the case of softpin, we need to pass the surface relocations at
some point, because the BO dependency is stored there. Unless we store
it somewhere else.

If we use:

   anv_execbuf_add_bo(execbuf, bo, NULL, 0, &cmd_buffer->device->alloc);

for the surface state pool, it seems like we end up missing some BOs in
the execbuf.

> 
> +  if (result != VK_SUCCESS)
> + return result;
> +   }
> 
> /* First, we walk over all of the bos we've seen and add them and 
> their
>  * relocations to the validate list.
> --
> 2.17.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH 10/14] anv: Add clflush to states.

2018-12-11 Thread Rafael Antognolli
On Mon, Dec 10, 2018 at 01:52:15PM -0600, Jason Ekstrand wrote:
> This seems very much over-the-top.  It would be better to either find the
> specific bug or else just allocate the BOs we use for states as snooped.  See
> also the anv_gem_set_caching call in genX_query.c.

It seems we were missing some flushes in the states allocated in
anv_shader_bin_create(). I added them there and apparently I don't need
this patch anymore.
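As an aside, a standalone sketch of the kind of flush involved (the clflush
loop mirrors the hunk quoted below; the trailing mfence is an assumption
about ordering, not copied from the patch). It only builds on x86 with
GCC/Clang, which is where this code runs anyway.

#include <stdlib.h>

#define CACHELINE_SIZE 64

/* Flush every cacheline backing a CPU-written state so a non-LLC GPU
 * read sees the data. */
static void
flush_range(void *start, size_t size)
{
   for (size_t i = 0; i < size; i += CACHELINE_SIZE)
      __builtin_ia32_clflush((char *)start + i);
   __builtin_ia32_mfence(); /* assumption: fence to order the flushes */
}

int
main(void)
{
   char *state = malloc(4096);
   for (size_t i = 0; i < 4096; i++)
      state[i] = (char)i; /* pretend this is CPU-written GPU state */
   flush_range(state, 4096);
   free(state);
   return 0;
}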

> On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli 
> wrote:
> 
> TODO: This is just flushing the entire dynamic states on every execbuf.
> Maybe it's too much. However, in theory we should be already flushing
> the states as needed, but I think we didn't hit any bug due to the
> coherence implied by userptr.
> ---
>  src/intel/vulkan/anv_batch_chain.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/
> anv_batch_chain.c
> index 65df28ccb91..99009679435 100644
> --- a/src/intel/vulkan/anv_batch_chain.c
> +++ b/src/intel/vulkan/anv_batch_chain.c
> @@ -1366,6 +1366,10 @@ anv_reloc_list_add_dep(struct anv_cmd_buffer
> *cmd_buffer,
> 
> anv_block_pool_foreach_bo(bo_list, iter, bo) {
>_mesa_set_add(relocs->deps, bo);
> +  if (!cmd_buffer->device->info.has_llc) {
> + for (uint32_t i = 0; i < bo->size; i += CACHELINE_SIZE)
> +__builtin_ia32_clflush(bo->map + i);
> +  }
> }
>  }
> 
> --
> 2.17.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH 12/14] anv/allocator: Rework chunk return to the state pool.

2018-12-11 Thread Rafael Antognolli
On Mon, Dec 10, 2018 at 11:10:02PM -0600, Jason Ekstrand wrote:
> 
> 
> On Mon, Dec 10, 2018 at 5:48 PM Rafael Antognolli 
> 
> wrote:
> 
> On Mon, Dec 10, 2018 at 04:56:40PM -0600, Jason Ekstrand wrote:
> > On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli <
> rafael.antogno...@intel.com>
> > wrote:
> >
> > This commit tries to rework the code that split and returns chunks
> back
> > to the state pool, while still keeping the same logic.
> >
> > The original code would get a chunk larger than we need and split it
> > into pool->block_size. Then it would return all but the first one,
> and
> > would split that first one into alloc_size chunks. Then it would 
> keep
> > the first one (for the allocation), and return the others back to 
> the
> > pool.
> >
> > The new anv_state_pool_return_chunk() function will take a chunk
> (with
> > the alloc_size part removed), and a small_size hint. It then splits
> that
> > chunk into pool->block_size'd chunks, and if there's some space 
> still
> > left, split that into small_size chunks. small_size in this case is
> the
> > same size as alloc_size.
> >
> > The idea is to keep the same logic, but make it in a way we can 
> reuse
> it
> > to return other chunks to the pool when we are growing the buffer.
> > ---
> >  src/intel/vulkan/anv_allocator.c | 147
> +--
> >  1 file changed, 102 insertions(+), 45 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/
> > anv_allocator.c
> > index 31258e38635..bddeb4a0fbd 100644
> > --- a/src/intel/vulkan/anv_allocator.c
> > +++ b/src/intel/vulkan/anv_allocator.c
> > @@ -994,6 +994,97 @@ anv_state_pool_get_bucket_size(uint32_t bucket)
> > return 1 << size_log2;
> >  }
> >
> > +/** Helper to create a chunk into the state table.
> > + *
> > + * It just creates 'count' entries into the state table and update
> their
> > sizes,
> > + * offsets and maps, also pushing them as "free" states.
> > + */
> > +static void
> > +anv_state_pool_return_blocks(struct anv_state_pool *pool,
> > + uint32_t chunk_offset, uint32_t count,
> > + uint32_t block_size)
> > +{
> > +   if (count == 0)
> > +  return;
> > +
> > +   uint32_t st_idx = anv_state_table_add(&pool->table, count);
> > +   for (int i = 0; i < count; i++) {
> > +  /* update states that were added back to the state table */
> > +  struct anv_state *state_i = anv_state_table_get(&pool->table,
> > +  st_idx + i);
> > +  state_i->alloc_size = block_size;
> > +  state_i->offset = chunk_offset + block_size * i;
> > +  struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
> > +state_i->offset);
> > +  state_i->map = pool_map.map + pool_map.offset;
> > +   }
> > +
> > +   uint32_t block_bucket = anv_state_pool_get_bucket(block_size);
> > +   anv_state_table_push(&pool->buckets[block_bucket].free_list,
> > +&pool->table, st_idx, count);
> > +}
> > +
> > +static uint32_t
> > +calculate_divisor(uint32_t size)
> > +{
> > +   uint32_t bucket = anv_state_pool_get_bucket(size);
> > +
> > +   while (bucket >= 0) {
> > +  uint32_t bucket_size = 
> anv_state_pool_get_bucket_size(bucket);
> > +  if (size % bucket_size == 0)
> > + return bucket_size;
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> > +/** Returns a chunk of memory back to the state pool.
> > + *
> > + * If small_size is zero, we split chunk_size into pool->
> block_size'd
> > pieces,
> > + * and return those. If there's some remaining 'rest' sp

Re: [Mesa-dev] [RFC PATCH 12/14] anv/allocator: Rework chunk return to the state pool.

2018-12-10 Thread Rafael Antognolli
On Mon, Dec 10, 2018 at 04:56:40PM -0600, Jason Ekstrand wrote:
> On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli 
> wrote:
> 
> This commit tries to rework the code that split and returns chunks back
> to the state pool, while still keeping the same logic.
> 
> The original code would get a chunk larger than we need and split it
> into pool->block_size. Then it would return all but the first one, and
> would split that first one into alloc_size chunks. Then it would keep
> the first one (for the allocation), and return the others back to the
> pool.
> 
> The new anv_state_pool_return_chunk() function will take a chunk (with
> the alloc_size part removed), and a small_size hint. It then splits that
> chunk into pool->block_size'd chunks, and if there's some space still
> left, split that into small_size chunks. small_size in this case is the
> same size as alloc_size.
> 
> The idea is to keep the same logic, but make it in a way we can reuse it
> to return other chunks to the pool when we are growing the buffer.
> ---
>  src/intel/vulkan/anv_allocator.c | 147 +--
>  1 file changed, 102 insertions(+), 45 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/
> anv_allocator.c
> index 31258e38635..bddeb4a0fbd 100644
> --- a/src/intel/vulkan/anv_allocator.c
> +++ b/src/intel/vulkan/anv_allocator.c
> @@ -994,6 +994,97 @@ anv_state_pool_get_bucket_size(uint32_t bucket)
> return 1 << size_log2;
>  }
> 
> +/** Helper to create a chunk into the state table.
> + *
> + * It just creates 'count' entries into the state table and update their
> sizes,
> + * offsets and maps, also pushing them as "free" states.
> + */
> +static void
> +anv_state_pool_return_blocks(struct anv_state_pool *pool,
> + uint32_t chunk_offset, uint32_t count,
> + uint32_t block_size)
> +{
> +   if (count == 0)
> +  return;
> +
> +   uint32_t st_idx = anv_state_table_add(>table, count);
> +   for (int i = 0; i < count; i++) {
> +  /* update states that were added back to the state table */
> +  struct anv_state *state_i = anv_state_table_get(>table,
> +  st_idx + i);
> +  state_i->alloc_size = block_size;
> +  state_i->offset = chunk_offset + block_size * i;
> +  struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
> +                                                     state_i->offset);
> +  state_i->map = pool_map.map + pool_map.offset;
> +   }
> +
> +   uint32_t block_bucket = anv_state_pool_get_bucket(block_size);
> +   anv_state_table_push(>buckets[block_bucket].free_list,
> +>table, st_idx, count);
> +}
> +
> +static uint32_t
> +calculate_divisor(uint32_t size)
> +{
> +   uint32_t bucket = anv_state_pool_get_bucket(size);
> +
> +   while (bucket >= 0) {
> +  uint32_t bucket_size = anv_state_pool_get_bucket_size(bucket);
> +  if (size % bucket_size == 0)
> + return bucket_size;
> +   }
> +
> +   return 0;
> +}
> +
> +/** Returns a chunk of memory back to the state pool.
> + *
> + * If small_size is zero, we split chunk_size into pool->block_size'd pieces,
> + * and return those. If there's some remaining 'rest' space (chunk_size is not
> + * divisible by pool->block_size), then we find a bucket size that is a divisor
> + * of that rest, and split the 'rest' into that size, returning it to the pool.
> + *
> + * If small_size is non-zero, we use it in two different ways:
> + ** if it is larger than pool->block_size, we split the chunk into
> + *    small_size'd pieces, instead of pool->block_size'd ones.
> + ** we also use it as the desired size to split the 'rest' after we split
> + *    the bigger size of the chunk into pool->block_size;
> 
> 
> This seems both overly complicated and not really what we want.  If I have a
> block size of 8k and allocate a single 64-byte state and then a 8k state, it
> will break my almost 8k of padding into 511 64-byte states and return those
> which may be very wasteful if the next thing I do is allocate a 1K state.

Good, this would definitely be a waste.

> It also doesn't provide the current alignment guarantees t

Re: [Mesa-dev] [RFC PATCH 06/14] anv/allocator: Add getters for anv_block_pool.

2018-12-10 Thread Rafael Antognolli
On Mon, Dec 10, 2018 at 12:45:00PM -0600, Jason Ekstrand wrote:
> On Fri, Dec 7, 2018 at 6:06 PM Rafael Antognolli 
> wrote:
> 
> We will need specially the anv_block_pool_map, to find the
> map relative to some BO that is not at the start of the block pool.
> ---
>  src/intel/vulkan/anv_allocator.c   | 23 ---
>  src/intel/vulkan/anv_batch_chain.c |  5 +++--
>  src/intel/vulkan/anv_private.h |  7 +++
>  src/intel/vulkan/genX_blorp_exec.c |  5 +++--
>  4 files changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/
> anv_allocator.c
> index cda6a1a9d25..acf3c80fbac 100644
> --- a/src/intel/vulkan/anv_allocator.c
> +++ b/src/intel/vulkan/anv_allocator.c
> @@ -601,6 +601,15 @@ anv_block_pool_expand_range(struct anv_block_pool
> *pool,
> return VK_SUCCESS;
>  }
> 
> +struct anv_pool_map
> +anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
> +{
> +   return (struct anv_pool_map) {
> +  .map = pool->map,
> +  .offset = offset,
> +   };
> 
> 
> Every caller of this function adds the two together.  Why not just return the
> offsetted pointer?

Ugh, I guess so. I thought I would have a use case for having them
separated, but so far there isn't. Will fix that for the next version.
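
A minimal sketch of that simplified getter, assuming the current
single-map case (the multi-BO lookup added later in the series would have
to pick the right BO first):

   void *
   anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
   {
      /* pool->map covers the whole pool, so the offsetted pointer is enough */
      return pool->map + offset;
   }

Callers would then just do "state.map = anv_block_pool_map(pool, offset);"
instead of adding .map and .offset themselves.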

> +}
> +
>  /** Grows and re-centers the block pool.
>   *
>   * We grow the block pool in one or both directions in such a way that 
> the
> @@ -967,7 +976,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool 
> *pool,
> st_idx +
> i);
> state_i->alloc_size = pool->block_size;
> state_i->offset = chunk_offset + pool->block_size * (i +
> 1);
> -   state_i->map = pool->block_pool.map + state_i->offset;
> +   struct anv_pool_map pool_map = anv_block_pool_map(>
> block_pool,
> + 
> state_i->
> offset);
> +   state_i->map = pool_map.map + pool_map.offset;
>  }
>  anv_state_table_push(>buckets[block_bucket].free_list,
>   >table, st_idx, push_back);
> @@ -983,7 +994,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool 
> *pool,
>  st_idx + i);
>  state_i->alloc_size = alloc_size;
>  state_i->offset = chunk_offset + alloc_size * (i + 1);
> -state_i->map = pool->block_pool.map + state_i->offset;
> +struct anv_pool_map pool_map = anv_block_pool_map(>
> block_pool,
> +  state_i->
> offset);
> +state_i->map = pool_map.map + pool_map.offset;
>   }
>   anv_state_table_push(>buckets[bucket].free_list,
>>table, st_idx, push_back);
> @@ -1002,7 +1015,11 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool
> *pool,
> state = anv_state_table_get(>table, idx);
> state->offset = offset;
> state->alloc_size = alloc_size;
> -   state->map = pool->block_pool.map + offset;
> +
> +   struct anv_pool_map pool_map = anv_block_pool_map(>block_pool,
> + state->offset);
> +   state->map = pool_map.map + pool_map.offset;
> +
> 
>  done:
> return *state;
> diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/
> anv_batch_chain.c
> index a9f8c5b79b1..6c06858efe1 100644
> --- a/src/intel/vulkan/anv_batch_chain.c
> +++ b/src/intel/vulkan/anv_batch_chain.c
> @@ -679,8 +679,9 @@ anv_cmd_buffer_alloc_binding_table(struct
> anv_cmd_buffer *cmd_buffer,
>return (struct anv_state) { 0 };
> 
> state.offset = cmd_buffer->bt_next;
> -   state.map = anv_binding_table_pool(device)->block_pool.map +
> -  bt_block->offset + state.offset;
> +   struct anv_pool_map pool_map =
> +  anv_block_pool_map(_binding_table_pool(device)->block_pool,
> bt_block->offset + state.offset);
> +   state.map = pool_map.map + pool_map.offset;
> 
> cmd_buffer->bt_next += state.alloc_size;
> 
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/

[Mesa-dev] [RFC PATCH 09/14] anv: Validate the list of BOs from the block pool.

2018-12-07 Thread Rafael Antognolli
We now have multiple BOs in the block pool, but some instructions still
reference only the first one while others use relative offsets. So we must
be sure to add all the BOs from the block
pool to the validation list when submitting commands.
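
The loops below iterate with anv_block_pool_foreach_bo(), which is not
shown in this hunk. A plausible shape for it (hypothetical, assuming the
bo/next fields used elsewhere in the series) would be:

   #define anv_block_pool_foreach_bo(list, iter, bo)              \
      for (iter = (list), bo = iter ? &iter->bo : NULL;           \
           iter != NULL;                                          \
           iter = iter->next, bo = iter ? &iter->bo : NULL)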
---
 src/intel/vulkan/anv_batch_chain.c | 47 ++
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_batch_chain.c 
b/src/intel/vulkan/anv_batch_chain.c
index bec4d647b7e..65df28ccb91 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -1356,6 +1356,36 @@ relocate_cmd_buffer(struct anv_cmd_buffer *cmd_buffer,
return true;
 }
 
+static void
+anv_reloc_list_add_dep(struct anv_cmd_buffer *cmd_buffer,
+   struct anv_bo_list *bo_list)
+{
+   struct anv_bo_list *iter;
+   struct anv_bo *bo;
+   struct anv_reloc_list *relocs = cmd_buffer->batch.relocs;
+
+   anv_block_pool_foreach_bo(bo_list, iter, bo) {
+  _mesa_set_add(relocs->deps, bo);
+   }
+}
+
+static void
+anv_batch_bos_add(struct anv_cmd_buffer *cmd_buffer)
+{
+   struct anv_bo_list *bo_list;
+
+   bo_list = cmd_buffer->device->dynamic_state_pool.block_pool.bos;
+   anv_reloc_list_add_dep(cmd_buffer, bo_list);
+
+   bo_list = cmd_buffer->device->instruction_state_pool.block_pool.bos;
+   anv_reloc_list_add_dep(cmd_buffer, bo_list);
+
+   if (cmd_buffer->device->instance->physicalDevice.use_softpin) {
+  bo_list = cmd_buffer->device->binding_table_pool.block_pool.bos;
+  anv_reloc_list_add_dep(cmd_buffer, bo_list);
+   }
+}
+
 static VkResult
 setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf,
  struct anv_cmd_buffer *cmd_buffer)
@@ -1364,13 +1394,20 @@ setup_execbuf_for_cmd_buffer(struct anv_execbuf 
*execbuf,
struct anv_state_pool *ss_pool =
   _buffer->device->surface_state_pool;
 
+   anv_batch_bos_add(cmd_buffer);
+
adjust_relocations_from_state_pool(ss_pool, _buffer->surface_relocs,
   cmd_buffer->last_ss_pool_center);
-   VkResult result = anv_execbuf_add_bo(execbuf, ss_pool->block_pool.bo,
-_buffer->surface_relocs, 0,
-_buffer->device->alloc);
-   if (result != VK_SUCCESS)
-  return result;
+   VkResult result;
+   struct anv_bo *bo;
+   struct anv_bo_list *iter;
+   anv_block_pool_foreach_bo(ss_pool->block_pool.bos, iter, bo) {
+  result = anv_execbuf_add_bo(execbuf, bo,
+  _buffer->surface_relocs, 0,
+  _buffer->device->alloc);
+  if (result != VK_SUCCESS)
+ return result;
+   }
 
/* First, we walk over all of the bos we've seen and add them and their
 * relocations to the validate list.
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH 02/14] anv/allocator: Add anv_state_table.

2018-12-07 Thread Rafael Antognolli
Add a structure to hold anv_states. This table will initially be used to
recycle anv_states, instead of relying on a linked list implemented in
GPU memory. Later it could be used so that all anv_states just point to
the content of this struct, instead of making copies of anv_states
everywhere.

TODO:
   1) I need to refine the API, specially anv_state_table_add(). So far
   we have to add an item, get the pointer to the anv_state, and then
   fill the content. I tried some different things so far but need to
   come back to this one.

   2) There's a lot of common code between this table backing store
   memory and the anv_block_pool buffer, due to how we grow it. I think
   it's possible to refactor this and reuse code in both places.

   3) Add unit tests.
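
To make TODO (1) concrete, the add/get/push cycle for a single state
currently looks roughly like this (names as introduced below; the exact
signatures are still subject to change):

   uint32_t idx = anv_state_table_add(&pool->table, 1);
   struct anv_state *state = anv_state_table_get(&pool->table, idx);
   state->offset = offset;
   state->alloc_size = alloc_size;
   state->map = pool->block_pool.map + state->offset;
   /* ... and later, instead of writing a free-list link into GPU memory: */
   anv_state_table_push(&pool->buckets[bucket].free_list,
                        &pool->table, idx, 1);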
---
 src/intel/vulkan/anv_allocator.c | 246 ++-
 src/intel/vulkan/anv_private.h   |  44 ++
 2 files changed, 288 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 67f2f73aa11..3590ede6050 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -100,6 +100,9 @@
 /* Allocations are always at least 64 byte aligned, so 1 is an invalid value.
  * We use it to indicate the free list is empty. */
 #define EMPTY 1
+#define EMPTY2 UINT32_MAX
+
+#define PAGE_SIZE 4096
 
 struct anv_mmap_cleanup {
void *map;
@@ -130,6 +133,246 @@ round_to_power_of_two(uint32_t value)
return 1 << ilog2_round_up(value);
 }
 
+struct anv_state_table_cleanup {
+   void *map;
+   size_t size;
+};
+
+#define ANV_STATE_TABLE_CLEANUP_INIT ((struct anv_state_table_cleanup){0})
+#define ANV_STATE_ENTRY_SIZE (sizeof(struct anv_free_entry))
+
+static VkResult
+anv_state_table_expand_range(struct anv_state_table *table, uint32_t size);
+
+VkResult
+anv_state_table_init(struct anv_state_table *table,
+struct anv_device *device,
+uint32_t initial_entries)
+{
+   VkResult result;
+
+   table->device = device;
+
+   table->fd = memfd_create("free table", MFD_CLOEXEC);
+   if (table->fd == -1)
+  return vk_error(VK_ERROR_INITIALIZATION_FAILED);
+
+   /* Just make it 2GB up-front.  The Linux kernel won't actually back it
+* with pages until we either map and fault on one of them or we use
+* userptr and send a chunk of it off to the GPU.
+*/
+   if (ftruncate(table->fd, BLOCK_POOL_MEMFD_SIZE) == -1) {
+  result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+  goto fail_fd;
+   }
+
+   if (!u_vector_init(>mmap_cleanups,
+  round_to_power_of_two(sizeof(struct 
anv_state_table_cleanup)),
+  128)) {
+  result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+  goto fail_fd;
+   }
+
+   table->state.next = 0;
+   table->state.end = 0;
+   table->size = 0;
+
+   uint32_t initial_size = initial_entries * ANV_STATE_ENTRY_SIZE;
+   result = anv_state_table_expand_range(table, initial_size);
+   if (result != VK_SUCCESS)
+  goto fail_mmap_cleanups;
+
+   return VK_SUCCESS;
+
+ fail_mmap_cleanups:
+   u_vector_finish(>mmap_cleanups);
+ fail_fd:
+   close(table->fd);
+
+   return result;
+}
+
+static VkResult
+anv_state_table_expand_range(struct anv_state_table *table, uint32_t size)
+{
+   void *map;
+   struct anv_mmap_cleanup *cleanup;
+
+   /* Assert that we only ever grow the pool */
+   assert(size >= table->state.end);
+
+   /* Assert that we don't go outside the bounds of the memfd */
+   assert(size <= BLOCK_POOL_MEMFD_SIZE);
+
+   cleanup = u_vector_add(>mmap_cleanups);
+   if (!cleanup)
+  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+   *cleanup = ANV_MMAP_CLEANUP_INIT;
+
+   /* Just leak the old map until we destroy the pool.  We can't munmap it
+* without races or imposing locking on the block allocate fast path. On
+* the whole the leaked maps adds up to less than the size of the
+* current map.  MAP_POPULATE seems like the right thing to do, but we
+* should try to get some numbers.
+*/
+   map = mmap(NULL, size, PROT_READ | PROT_WRITE,
+  MAP_SHARED | MAP_POPULATE, table->fd, 0);
+   if (map == MAP_FAILED) {
+  exit(-1);
+  return vk_errorf(table->device->instance, table->device,
+   VK_ERROR_MEMORY_MAP_FAILED, "mmap failed: %m");
+   }
+
+   cleanup->map = map;
+   cleanup->size = size;
+
+   table->map = map;
+   table->size = size;
+
+   return VK_SUCCESS;
+}
+
+static uint32_t
+anv_state_table_grow(struct anv_state_table *table)
+{
+   VkResult result = VK_SUCCESS;
+
+   pthread_mutex_lock(>device->mutex);
+
+   uint32_t used = align_u32(table->state.next * ANV_STATE_ENTRY_SIZE,
+ PAGE_SIZE);
+   uint32_t old_size = table->size;
+
+   /* The block pool is always initialized to a nonzero size and this function
+* is always called after initialization.
+*/
+   assert(old_size > 0);
+
+   uint32_t required = MAX2(used, old_size);
+   if (used * 

[Mesa-dev] [RFC PATCH 14/14] anv/allocator: Add support for non-userptr.

2018-12-07 Thread Rafael Antognolli
If softpin is supported, create new BOs for the required size and add the
respective BO maps. The other main change of this commit is that
anv_block_pool_map() now returns the map for the BO that the given
offset is part of. So there's no block_pool->map access anymore (when
softpin is used).
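
With softpin the getter then ends up looking roughly like this (sketch
only; anv_block_pool_get_bo() is the helper added earlier in the series,
which also rebases the offset onto the BO it finds):

   struct anv_pool_map
   anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
   {
      struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
      return (struct anv_pool_map) {
         .map = bo->map,      /* map of the BO that contains the offset */
         .offset = offset,    /* now relative to that BO */
      };
   }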
---
 src/intel/vulkan/anv_allocator.c | 92 ++--
 1 file changed, 53 insertions(+), 39 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 0d426edfb57..46f2278a56c 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -473,17 +473,19 @@ anv_block_pool_init(struct anv_block_pool *pool,
pool->size = 0;
pool->start_address = gen_canonical_address(start_address);
 
-   pool->fd = memfd_create("block pool", MFD_CLOEXEC);
-   if (pool->fd == -1)
-  return vk_error(VK_ERROR_INITIALIZATION_FAILED);
-
-   /* Just make it 2GB up-front.  The Linux kernel won't actually back it
-* with pages until we either map and fault on one of them or we use
-* userptr and send a chunk of it off to the GPU.
-*/
-   if (ftruncate(pool->fd, BLOCK_POOL_MEMFD_SIZE) == -1) {
-  result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
-  goto fail_fd;
+   if (!(pool->bo_flags & EXEC_OBJECT_PINNED)) {
+  pool->fd = memfd_create("block pool", MFD_CLOEXEC);
+  if (pool->fd == -1)
+ return vk_error(VK_ERROR_INITIALIZATION_FAILED);
+
+  /* Just make it 2GB up-front.  The Linux kernel won't actually back it
+   * with pages until we either map and fault on one of them or we use
+   * userptr and send a chunk of it off to the GPU.
+   */
+  if (ftruncate(pool->fd, BLOCK_POOL_MEMFD_SIZE) == -1) {
+ result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+ goto fail_fd;
+  }
}
 
if (!u_vector_init(>mmap_cleanups,
@@ -507,7 +509,8 @@ anv_block_pool_init(struct anv_block_pool *pool,
  fail_mmap_cleanups:
u_vector_finish(>mmap_cleanups);
  fail_fd:
-   close(pool->fd);
+   if (!(pool->bo_flags & EXEC_OBJECT_PINNED))
+  close(pool->fd);
 
return result;
 }
@@ -525,8 +528,9 @@ anv_block_pool_finish(struct anv_block_pool *pool)
}
 
u_vector_finish(>mmap_cleanups);
+   if (!(pool->bo_flags & EXEC_OBJECT_PINNED))
+  close(pool->fd);
 
-   close(pool->fd);
anv_block_pool_bo_finish(pool);
 }
 
@@ -537,6 +541,7 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
void *map;
uint32_t gem_handle;
struct anv_mmap_cleanup *cleanup;
+   const bool use_softpin = !!(pool->bo_flags & EXEC_OBJECT_PINNED);
 
/* Assert that we only ever grow the pool */
assert(center_bo_offset >= pool->back_state.end);
@@ -544,7 +549,8 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
 
/* Assert that we don't go outside the bounds of the memfd */
assert(center_bo_offset <= BLOCK_POOL_MEMFD_CENTER);
-   assert(size - center_bo_offset <=
+   assert(use_softpin ||
+  size - center_bo_offset <=
   BLOCK_POOL_MEMFD_SIZE - BLOCK_POOL_MEMFD_CENTER);
 
cleanup = u_vector_add(>mmap_cleanups);
@@ -553,28 +559,36 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
 
*cleanup = ANV_MMAP_CLEANUP_INIT;
 
-   /* Just leak the old map until we destroy the pool.  We can't munmap it
-* without races or imposing locking on the block allocate fast path. On
-* the whole the leaked maps adds up to less than the size of the
-* current map.  MAP_POPULATE seems like the right thing to do, but we
-* should try to get some numbers.
-*/
-   map = mmap(NULL, size, PROT_READ | PROT_WRITE,
-  MAP_SHARED | MAP_POPULATE, pool->fd,
-  BLOCK_POOL_MEMFD_CENTER - center_bo_offset);
-   if (map == MAP_FAILED)
-  return vk_errorf(pool->device->instance, pool->device,
-   VK_ERROR_MEMORY_MAP_FAILED, "mmap failed: %m");
-
-   gem_handle = anv_gem_userptr(pool->device, map, size);
-   if (gem_handle == 0) {
-  munmap(map, size);
-  return vk_errorf(pool->device->instance, pool->device,
-   VK_ERROR_TOO_MANY_OBJECTS, "userptr failed: %m");
+   uint32_t newbo_size = size - pool->size;
+   if (use_softpin) {
+  gem_handle = anv_gem_create(pool->device, newbo_size);
+  map = anv_gem_mmap(pool->device, gem_handle, 0, newbo_size, 0);
+  if (map == MAP_FAILED)
+ return vk_errorf(pool->device->instance, pool->device,
+  VK_ERROR_MEMORY_MAP_FAILED, "gem mmap failed: %m");
+   } else {
+  /* Just leak the old map until we destroy the pool.  We can't munmap it
+   * without races or imposing locking on the block allocate fast path. On
+   * the whole the leaked maps adds up to less than the size of the
+   * current map.  MAP_POPULATE seems like the right thing to do, but we
+   * should try to get some numbers.
+   */
+  map = mmap(NULL, size, PROT_READ | PROT_WRITE,
+ 

[Mesa-dev] [RFC PATCH 03/14] anv/allocator: Use anv_state_table on anv_state_pool_alloc.

2018-12-07 Thread Rafael Antognolli
Usage of anv_state_table_add is really annoying, see comment on the
previous commit.
---
 src/intel/vulkan/anv_allocator.c | 96 +---
 src/intel/vulkan/anv_private.h   |  4 +-
 2 files changed, 67 insertions(+), 33 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 3590ede6050..5f0458afd77 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -869,11 +869,17 @@ anv_state_pool_init(struct anv_state_pool *pool,
if (result != VK_SUCCESS)
   return result;
 
+   result = anv_state_table_init(>table, device, 64);
+   if (result != VK_SUCCESS) {
+  anv_block_pool_finish(>block_pool);
+  return result;
+   }
+
assert(util_is_power_of_two_or_zero(block_size));
pool->block_size = block_size;
pool->back_alloc_free_list = ANV_FREE_LIST_EMPTY;
for (unsigned i = 0; i < ANV_STATE_BUCKETS; i++) {
-  pool->buckets[i].free_list = ANV_FREE_LIST_EMPTY;
+  pool->buckets[i].free_list = ANV_FREE_LIST2_EMPTY;
   pool->buckets[i].block.next = 0;
   pool->buckets[i].block.end = 0;
}
@@ -886,6 +892,7 @@ void
 anv_state_pool_finish(struct anv_state_pool *pool)
 {
VG(VALGRIND_DESTROY_MEMPOOL(pool));
+   anv_state_table_finish(>table);
anv_block_pool_finish(>block_pool);
 }
 
@@ -946,22 +953,30 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
 {
uint32_t bucket = anv_state_pool_get_bucket(MAX2(size, align));
 
-   struct anv_state state;
-   state.alloc_size = anv_state_pool_get_bucket_size(bucket);
+   struct anv_state *state;
+   uint32_t alloc_size = anv_state_pool_get_bucket_size(bucket);
+   int32_t offset;
 
/* Try free list first. */
-   if (anv_free_list_pop(>buckets[bucket].free_list,
- >block_pool.map, )) {
-  assert(state.offset >= 0);
+   state = anv_state_table_pop(>buckets[bucket].free_list,
+   >table);
+   if (state) {
+  assert(state->offset >= 0);
   goto done;
}
 
+
/* Try to grab a chunk from some larger bucket and split it up */
for (unsigned b = bucket + 1; b < ANV_STATE_BUCKETS; b++) {
-  int32_t chunk_offset;
-  if (anv_free_list_pop(>buckets[b].free_list,
->block_pool.map, _offset)) {
+  state = anv_state_table_pop(>buckets[b].free_list, >table);
+  if (state) {
  unsigned chunk_size = anv_state_pool_get_bucket_size(b);
+ int32_t chunk_offset = state->offset;
+
+ /* First lets update the state we got to its new size. offset and map
+  * remain the same.
+  */
+ state->alloc_size = alloc_size;
 
  /* We've found a chunk that's larger than the requested state size.
   * There are a couple of options as to what we do with it:
@@ -990,44 +1005,62 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
   * We choose option (3).
   */
  if (chunk_size > pool->block_size &&
- state.alloc_size < pool->block_size) {
+ alloc_size < pool->block_size) {
 assert(chunk_size % pool->block_size == 0);
 /* We don't want to split giant chunks into tiny chunks.  Instead,
  * break anything bigger than a block into block-sized chunks and
  * then break it down into bucket-sized chunks from there.  Return
  * all but the first block of the chunk to the block bucket.
  */
+uint32_t push_back = (chunk_size / pool->block_size) - 1;
 const uint32_t block_bucket =
anv_state_pool_get_bucket(pool->block_size);
-anv_free_list_push(>buckets[block_bucket].free_list,
-   pool->block_pool.map,
-   chunk_offset + pool->block_size,
-   pool->block_size,
-   (chunk_size / pool->block_size) - 1);
+uint32_t st_idx = anv_state_table_add(>table, push_back);
+for (int i = 0; i < push_back; i++) {
+   /* update states that were added back to the state table */
+   struct anv_state *state_i = anv_state_table_get(>table,
+   st_idx + i);
+   state_i->alloc_size = pool->block_size;
+   state_i->offset = chunk_offset + pool->block_size * (i + 1);
+   state_i->map = pool->block_pool.map + state_i->offset;
+}
+anv_state_table_push(>buckets[block_bucket].free_list,
+ >table, st_idx, push_back);
 chunk_size = pool->block_size;
  }
 
- assert(chunk_size % state.alloc_size == 0);
- anv_free_list_push(>buckets[bucket].free_list,
-pool->block_pool.map,
-chunk_offset + state.alloc_size,
-

[Mesa-dev] [RFC PATCH 08/14] anv/allocator: Add support for a list of BOs in block pool.

2018-12-07 Thread Rafael Antognolli
So far we use only one BO (the last one created) in the block pool. When
we switch away from the userptr API, we will need multiple BOs. So add
code now to store multiple BOs in the block pool.

This has several implications, the main one being that we can't use
pool->map as before. For that reason we update the getter to find which
BO a given offset is part of, and return the respective map.
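
As an example of what the getter has to do now (hypothetical sizes): if
the pool has grown twice and owns two 2 MiB BOs, an offset of 3 MiB lives
in the second BO, 1 MiB past the start of that BO's map:

   int32_t offset = 3 * 1024 * 1024;
   struct anv_bo *bo = anv_block_pool_get_bo(pool, &offset);
   /* bo is the second BO in pool->bos, offset is now 1 MiB */
   void *addr = bo->map + offset;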
---
 src/intel/vulkan/anv_allocator.c | 132 +--
 src/intel/vulkan/anv_private.h   |  17 
 2 files changed, 125 insertions(+), 24 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 2eb191e98dc..31258e38635 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -428,6 +428,34 @@ static VkResult
 anv_block_pool_expand_range(struct anv_block_pool *pool,
 uint32_t center_bo_offset, uint32_t size);
 
+static struct anv_bo *
+anv_block_pool_bo_append(struct anv_block_pool *pool, struct anv_bo_list *elem)
+{
+   /* struct anv_bo_list *elem = malloc(sizeof(*elem)); */
+   elem->next = NULL;
+
+   if (pool->last)
+  pool->last->next = elem;
+   pool->last = elem;
+
+   /* if it's the first BO added, set the pointer to BOs too */
+   if (pool->bos == NULL)
+  pool->bos = elem;
+
+   return >bo;
+}
+
+static void
+anv_block_pool_bo_finish(struct anv_block_pool *pool)
+{
+   struct anv_bo_list *iter, *next;
+
+   for (iter = pool->bos; iter != NULL; iter = next) {
+  next = iter ? iter->next : NULL;
+  free(iter);
+   }
+}
+
 VkResult
 anv_block_pool_init(struct anv_block_pool *pool,
 struct anv_device *device,
@@ -439,19 +467,15 @@ anv_block_pool_init(struct anv_block_pool *pool,
 
pool->device = device;
pool->bo_flags = bo_flags;
+   pool->bo = NULL;
+   pool->bos = NULL;
+   pool->last = NULL;
+   pool->size = 0;
pool->start_address = gen_canonical_address(start_address);
 
-   pool->bo = malloc(sizeof(*pool->bo));
-   if (!pool->bo)
-  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
-
-   anv_bo_init(pool->bo, 0, 0);
-
pool->fd = memfd_create("block pool", MFD_CLOEXEC);
-   if (pool->fd == -1) {
-  result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
-  goto fail_bo;
-   }
+   if (pool->fd == -1)
+  return vk_error(VK_ERROR_INITIALIZATION_FAILED);
 
/* Just make it 2GB up-front.  The Linux kernel won't actually back it
 * with pages until we either map and fault on one of them or we use
@@ -484,8 +508,6 @@ anv_block_pool_init(struct anv_block_pool *pool,
u_vector_finish(>mmap_cleanups);
  fail_fd:
close(pool->fd);
- fail_bo:
-   free(pool->bo);
 
return result;
 }
@@ -495,7 +517,6 @@ anv_block_pool_finish(struct anv_block_pool *pool)
 {
struct anv_mmap_cleanup *cleanup;
 
-   free(pool->bo);
u_vector_foreach(cleanup, >mmap_cleanups) {
   if (cleanup->map)
  munmap(cleanup->map, cleanup->size);
@@ -506,6 +527,7 @@ anv_block_pool_finish(struct anv_block_pool *pool)
u_vector_finish(>mmap_cleanups);
 
close(pool->fd);
+   anv_block_pool_bo_finish(pool);
 }
 
 static VkResult
@@ -599,24 +621,86 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
 * the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag and the kernel does all of the
 * hard work for us.
 */
-   anv_bo_init(pool->bo, gem_handle, size);
+   struct anv_bo *bo;
+   struct anv_bo_list *bo_elem = NULL;
+
+   /* If using softpin, we will keep adding new BOs every time we expand the
+* range. On the other hand, if not using softpin, we need to add a BO if we
+* don't have one yet.
+*/
+   if (!pool->bo) {
+  bo_elem = malloc(sizeof(*bo_elem));
+  bo = _elem->bo;
+   } else {
+  bo = pool->bo;
+   }
+
+   /* pool->bo will always point to the first BO added on this block pool. */
+   if (!pool->bo)
+  pool->bo = bo;
+
+   anv_bo_init(bo, gem_handle, size);
if (pool->bo_flags & EXEC_OBJECT_PINNED) {
-  pool->bo->offset = pool->start_address + BLOCK_POOL_MEMFD_CENTER -
+  bo->offset = pool->start_address + BLOCK_POOL_MEMFD_CENTER -
  center_bo_offset;
}
-   pool->bo->flags = pool->bo_flags;
-   pool->bo->map = map;
+   bo->flags = pool->bo_flags;
+   bo->map = map;
+
+   if (bo_elem)
+  anv_block_pool_bo_append(pool, bo_elem);
+   pool->size = size;
 
return VK_SUCCESS;
 }
 
+static struct anv_bo *
+anv_block_pool_get_bo(struct anv_block_pool *pool, int32_t *offset)
+{
+   struct anv_bo *bo, *bo_found = NULL;
+   int32_t cur_offset = 0;
+
+   assert(offset);
+
+   if (!(pool->bo_flags & EXEC_OBJECT_PINNED))
+  return pool->bo;
+
+   struct anv_bo_list *iter;
+   anv_block_pool_foreach_bo(pool->bos, iter, bo) {
+  if (*offset < cur_offset + bo->size) {
+ bo_found = bo;
+ break;
+  }
+  cur_offset += bo->size;
+   }
+
+   assert(bo_found != NULL);
+   *offset -= cur_offset;
+
+   return bo_found;
+}
+
 struct anv_pool_map
 

[Mesa-dev] [RFC PATCH 04/14] anv/allocator: Use anv_state_table on back_alloc too.

2018-12-07 Thread Rafael Antognolli
---
 src/intel/vulkan/anv_allocator.c | 32 ++--
 src/intel/vulkan/anv_private.h   |  2 +-
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 5f0458afd77..2171a97970b 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -877,7 +877,7 @@ anv_state_pool_init(struct anv_state_pool *pool,
 
assert(util_is_power_of_two_or_zero(block_size));
pool->block_size = block_size;
-   pool->back_alloc_free_list = ANV_FREE_LIST_EMPTY;
+   pool->back_alloc_free_list = ANV_FREE_LIST2_EMPTY;
for (unsigned i = 0; i < ANV_STATE_BUCKETS; i++) {
   pool->buckets[i].free_list = ANV_FREE_LIST2_EMPTY;
   pool->buckets[i].block.next = 0;
@@ -1077,22 +1077,27 @@ anv_state_pool_alloc(struct anv_state_pool *pool, 
uint32_t size, uint32_t align)
 struct anv_state
 anv_state_pool_alloc_back(struct anv_state_pool *pool)
 {
-   struct anv_state state;
-   state.alloc_size = pool->block_size;
+   struct anv_state *state;
+   uint32_t alloc_size = pool->block_size;
 
-   if (anv_free_list_pop(>back_alloc_free_list,
- >block_pool.map, )) {
-  assert(state.offset < 0);
+   state = anv_state_table_pop(>back_alloc_free_list, >table);
+   if (state) {
+  assert(state->offset < 0);
   goto done;
}
 
-   state.offset = anv_block_pool_alloc_back(>block_pool,
-pool->block_size);
+   int32_t offset;
+   offset = anv_block_pool_alloc_back(>block_pool,
+  pool->block_size);
+   uint32_t idx = anv_state_table_add(>table, 1);
+   state = anv_state_table_get(>table, idx);
+   state->offset = offset;
+   state->alloc_size = alloc_size;
+   state->map = pool->block_pool.map + state->offset;
 
 done:
-   state.map = pool->block_pool.map + state.offset;
-   VG(VALGRIND_MEMPOOL_ALLOC(pool, state.map, state.alloc_size));
-   return state;
+   VG(VALGRIND_MEMPOOL_ALLOC(pool, state->map, state->alloc_size));
+   return *state;
 }
 
 static void
@@ -1103,9 +1108,8 @@ anv_state_pool_free_no_vg(struct anv_state_pool *pool, 
struct anv_state state)
 
if (state.offset < 0) {
   assert(state.alloc_size == pool->block_size);
-  anv_free_list_push(>back_alloc_free_list,
- pool->block_pool.map, state.offset,
- state.alloc_size, 1);
+  anv_state_table_push(>back_alloc_free_list,
+   >table, state.idx, 1);
} else {
   anv_state_table_push(>buckets[bucket].free_list,
>table, state.idx, 1);
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index f7b3ec5f6a4..d068a4be5d8 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -737,7 +737,7 @@ struct anv_state_pool {
uint32_t block_size;
 
/** Free list for "back" allocations */
-   union anv_free_list back_alloc_free_list;
+   union anv_free_list2 back_alloc_free_list;
 
struct anv_fixed_size_state_pool buckets[ANV_STATE_BUCKETS];
 };
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH 07/14] anv: Update usage of block_pool->bo.

2018-12-07 Thread Rafael Antognolli
Change block_pool->bo to be a pointer, and update its usage everywhere.
This makes it simpler to switch it later to a list of BOs.
---
 src/intel/vulkan/anv_allocator.c   | 31 +++---
 src/intel/vulkan/anv_batch_chain.c |  8 
 src/intel/vulkan/anv_blorp.c   |  2 +-
 src/intel/vulkan/anv_private.h |  2 +-
 src/intel/vulkan/gen8_cmd_buffer.c |  6 +++---
 src/intel/vulkan/genX_blorp_exec.c |  4 ++--
 src/intel/vulkan/genX_cmd_buffer.c | 20 +--
 7 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index acf3c80fbac..2eb191e98dc 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -441,11 +441,17 @@ anv_block_pool_init(struct anv_block_pool *pool,
pool->bo_flags = bo_flags;
pool->start_address = gen_canonical_address(start_address);
 
-   anv_bo_init(>bo, 0, 0);
+   pool->bo = malloc(sizeof(*pool->bo));
+   if (!pool->bo)
+  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+   anv_bo_init(pool->bo, 0, 0);
 
pool->fd = memfd_create("block pool", MFD_CLOEXEC);
-   if (pool->fd == -1)
-  return vk_error(VK_ERROR_INITIALIZATION_FAILED);
+   if (pool->fd == -1) {
+  result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
+  goto fail_bo;
+   }
 
/* Just make it 2GB up-front.  The Linux kernel won't actually back it
 * with pages until we either map and fault on one of them or we use
@@ -478,6 +484,8 @@ anv_block_pool_init(struct anv_block_pool *pool,
u_vector_finish(>mmap_cleanups);
  fail_fd:
close(pool->fd);
+ fail_bo:
+   free(pool->bo);
 
return result;
 }
@@ -487,6 +495,7 @@ anv_block_pool_finish(struct anv_block_pool *pool)
 {
struct anv_mmap_cleanup *cleanup;
 
+   free(pool->bo);
u_vector_foreach(cleanup, >mmap_cleanups) {
   if (cleanup->map)
  munmap(cleanup->map, cleanup->size);
@@ -590,13 +599,13 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
 * the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag and the kernel does all of the
 * hard work for us.
 */
-   anv_bo_init(>bo, gem_handle, size);
+   anv_bo_init(pool->bo, gem_handle, size);
if (pool->bo_flags & EXEC_OBJECT_PINNED) {
-  pool->bo.offset = pool->start_address + BLOCK_POOL_MEMFD_CENTER -
+  pool->bo->offset = pool->start_address + BLOCK_POOL_MEMFD_CENTER -
  center_bo_offset;
}
-   pool->bo.flags = pool->bo_flags;
-   pool->bo.map = map;
+   pool->bo->flags = pool->bo_flags;
+   pool->bo->map = map;
 
return VK_SUCCESS;
 }
@@ -659,7 +668,7 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct 
anv_block_state *state)
 
assert(state == >state || back_used > 0);
 
-   uint32_t old_size = pool->bo.size;
+   uint32_t old_size = pool->bo->size;
 
/* The block pool is always initialized to a nonzero size and this function
 * is always called after initialization.
@@ -685,7 +694,7 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct 
anv_block_state *state)
while (size < back_required + front_required)
   size *= 2;
 
-   assert(size > pool->bo.size);
+   assert(size > pool->bo->size);
 
/* We compute a new center_bo_offset such that, when we double the size
 * of the pool, we maintain the ratio of how much is used by each side.
@@ -722,7 +731,7 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct 
anv_block_state *state)
 
result = anv_block_pool_expand_range(pool, center_bo_offset, size);
 
-   pool->bo.flags = pool->bo_flags;
+   pool->bo->flags = pool->bo_flags;
 
 done:
pthread_mutex_unlock(>device->mutex);
@@ -733,7 +742,7 @@ done:
* needs to do so in order to maintain its concurrency model.
*/
   if (state == >state) {
- return pool->bo.size - pool->center_bo_offset;
+ return pool->bo->size - pool->center_bo_offset;
   } else {
  assert(pool->center_bo_offset > 0);
  return pool->center_bo_offset;
diff --git a/src/intel/vulkan/anv_batch_chain.c 
b/src/intel/vulkan/anv_batch_chain.c
index 6c06858efe1..bec4d647b7e 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -501,7 +501,7 @@ anv_cmd_buffer_surface_base_address(struct anv_cmd_buffer 
*cmd_buffer)
 {
struct anv_state *bt_block = u_vector_head(_buffer->bt_block_states);
return (struct anv_address) {
-  .bo = _binding_table_pool(cmd_buffer->device)->block_pool.bo,
+  .bo = anv_binding_table_pool(cmd_buffer->device)->block_pool.bo,
   .offset = bt_block->offset,
};
 }
@@ -1229,7 +1229,7 @@ adjust_relocations_to_state_pool(struct anv_state_pool 
*pool,
 * relocations that point to the pool bo with the correct offset.
 */
for (size_t i = 0; i < relocs->num_relocs; i++) {
-  if (relocs->reloc_bos[i] == >block_pool.bo) {
+  if (relocs->reloc_bos[i] == pool->block_pool.bo) {
  /* Adjust the delta value in the relocation to correctly
  

[Mesa-dev] [RFC PATCH 10/14] anv: Add clflush to states.

2018-12-07 Thread Rafael Antognolli
TODO: This just flushes all of the dynamic state on every execbuf, which
may be too much. In theory we should already be flushing the states as
needed; I suspect we just never hit a bug because of the coherence implied
by userptr.
---
 src/intel/vulkan/anv_batch_chain.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/intel/vulkan/anv_batch_chain.c 
b/src/intel/vulkan/anv_batch_chain.c
index 65df28ccb91..99009679435 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -1366,6 +1366,10 @@ anv_reloc_list_add_dep(struct anv_cmd_buffer *cmd_buffer,
 
anv_block_pool_foreach_bo(bo_list, iter, bo) {
   _mesa_set_add(relocs->deps, bo);
+  if (!cmd_buffer->device->info.has_llc) {
+ for (uint32_t i = 0; i < bo->size; i += CACHELINE_SIZE)
+__builtin_ia32_clflush(bo->map + i);
+  }
}
 }
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH 12/14] anv/allocator: Rework chunk return to the state pool.

2018-12-07 Thread Rafael Antognolli
This commit tries to rework the code that splits and returns chunks back
to the state pool, while still keeping the same logic.

The original code would get a chunk larger than we need and split it
into pool->block_size. Then it would return all but the first one, and
would split that first one into alloc_size chunks. Then it would keep
the first one (for the allocation), and return the others back to the
pool.

The new anv_state_pool_return_chunk() function will take a chunk (with
the alloc_size part removed), and a small_size hint. It then splits that
chunk into pool->block_size'd chunks, and if there's some space still
left, split that into small_size chunks. small_size in this case is the
same size as alloc_size.

The idea is to keep the same logic, but make it in a way we can reuse it
to return other chunks to the pool when we are growing the buffer.
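
A worked example with hypothetical numbers, block_size = 8 KiB and a
20 KiB chunk at offset 0 returned with small_size = 64:

   anv_state_pool_return_chunk(pool, 0, 20480, 64);
   /* divisor = MAX2(8192, 64) = 8192
    * nblocks = 20480 / 8192  = 2    -> two 8 KiB blocks at offsets 4096, 12288
    * rest    = 20480 % 8192  = 4096 -> 64 states of 64 bytes at offsets 0..4032
    * so the whole chunk ends up back on the free lists, in two bucket sizes.
    */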
---
 src/intel/vulkan/anv_allocator.c | 147 +--
 1 file changed, 102 insertions(+), 45 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 31258e38635..bddeb4a0fbd 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -994,6 +994,97 @@ anv_state_pool_get_bucket_size(uint32_t bucket)
return 1 << size_log2;
 }
 
+/** Helper to create a chunk into the state table.
+ *
+ * It just creates 'count' entries into the state table and update their sizes,
+ * offsets and maps, also pushing them as "free" states.
+ */
+static void
+anv_state_pool_return_blocks(struct anv_state_pool *pool,
+ uint32_t chunk_offset, uint32_t count,
+ uint32_t block_size)
+{
+   if (count == 0)
+  return;
+
+   uint32_t st_idx = anv_state_table_add(>table, count);
+   for (int i = 0; i < count; i++) {
+  /* update states that were added back to the state table */
+  struct anv_state *state_i = anv_state_table_get(>table,
+  st_idx + i);
+  state_i->alloc_size = block_size;
+  state_i->offset = chunk_offset + block_size * i;
+  struct anv_pool_map pool_map = anv_block_pool_map(>block_pool,
+state_i->offset);
+  state_i->map = pool_map.map + pool_map.offset;
+   }
+
+   uint32_t block_bucket = anv_state_pool_get_bucket(block_size);
+   anv_state_table_push(>buckets[block_bucket].free_list,
+>table, st_idx, count);
+}
+
+static uint32_t
+calculate_divisor(uint32_t size)
+{
+   uint32_t bucket = anv_state_pool_get_bucket(size);
+
+   while (bucket >= 0) {
+  uint32_t bucket_size = anv_state_pool_get_bucket_size(bucket);
+  if (size % bucket_size == 0)
+ return bucket_size;
+   }
+
+   return 0;
+}
+
+/** Returns a chunk of memory back to the state pool.
+ *
+ * If small_size is zero, we split chunk_size into pool->block_size'd pieces,
+ * and return those. If there's some remaining 'rest' space (chunk_size is not
+ * divisible by pool->block_size), then we find a bucket size that is a divisor
+ * of that rest, and split the 'rest' into that size, returning it to the pool.
+ *
+ * If small_size is non-zero, we use it in two different ways:
+ ** if it is larger than pool->block_size, we split the chunk into
+ *small_size'd pieces, instead of pool->block_size'd ones.
+ ** we also use it as the desired size to split the 'rest' after we split
+ *the bigger size of the chunk into pool->block_size;
+ */
+static void
+anv_state_pool_return_chunk(struct anv_state_pool *pool,
+uint32_t chunk_offset, uint32_t chunk_size,
+uint32_t small_size)
+{
+   uint32_t divisor = MAX2(pool->block_size, small_size);
+   uint32_t nblocks = chunk_size / divisor;
+   uint32_t rest = chunk_size % pool->block_size;
+
+   /* First return pool->block_size'd chunks.*/
+   uint32_t offset = chunk_offset + rest;
+   anv_state_pool_return_blocks(pool, offset, nblocks, pool->block_size);
+
+   if (rest == 0)
+  return;
+
+   chunk_size = rest;
+
+   if (small_size > 0) {
+  divisor = small_size;
+   } else {
+  /* Find the maximum divisor of the remaining chunk, and return smaller
+   * chunks of that size to the list.
+   */
+  divisor = calculate_divisor(chunk_size);
+  assert(divisor > 0);
+   }
+
+   /* Now return the smaller chunks of 'divisor' size */
+   assert(chunk_size % divisor == 0);
+   nblocks = (chunk_size / divisor);
+   anv_state_pool_return_blocks(pool, chunk_offset, nblocks, divisor);
+}
+
 static struct anv_state
 anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
uint32_t size, uint32_t align)
@@ -1025,6 +1116,10 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
   */
  state->alloc_size = alloc_size;
 
+ /* Now return the unused part of the chunk back to the pool as free
+  * blocks
+  

[Mesa-dev] [RFC PATCH 06/14] anv/allocator: Add getters for anv_block_pool.

2018-12-07 Thread Rafael Antognolli
We will need specially the anv_block_pool_map, to find the
map relative to some BO that is not at the start of the block pool.
---
 src/intel/vulkan/anv_allocator.c   | 23 ---
 src/intel/vulkan/anv_batch_chain.c |  5 +++--
 src/intel/vulkan/anv_private.h |  7 +++
 src/intel/vulkan/genX_blorp_exec.c |  5 +++--
 4 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index cda6a1a9d25..acf3c80fbac 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -601,6 +601,15 @@ anv_block_pool_expand_range(struct anv_block_pool *pool,
return VK_SUCCESS;
 }
 
+struct anv_pool_map
+anv_block_pool_map(struct anv_block_pool *pool, int32_t offset)
+{
+   return (struct anv_pool_map) {
+  .map = pool->map,
+  .offset = offset,
+   };
+}
+
 /** Grows and re-centers the block pool.
  *
  * We grow the block pool in one or both directions in such a way that the
@@ -967,7 +976,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
st_idx + i);
state_i->alloc_size = pool->block_size;
state_i->offset = chunk_offset + pool->block_size * (i + 1);
-   state_i->map = pool->block_pool.map + state_i->offset;
+   struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+                                                     state_i->offset);
+   state_i->map = pool_map.map + pool_map.offset;
 }
 anv_state_table_push(>buckets[block_bucket].free_list,
  >table, st_idx, push_back);
@@ -983,7 +994,9 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
 st_idx + i);
 state_i->alloc_size = alloc_size;
 state_i->offset = chunk_offset + alloc_size * (i + 1);
-state_i->map = pool->block_pool.map + state_i->offset;
+struct anv_pool_map pool_map = anv_block_pool_map(&pool->block_pool,
+                                                   state_i->offset);
+state_i->map = pool_map.map + pool_map.offset;
  }
  anv_state_table_push(>buckets[bucket].free_list,
   >table, st_idx, push_back);
@@ -1002,7 +1015,11 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
state = anv_state_table_get(>table, idx);
state->offset = offset;
state->alloc_size = alloc_size;
-   state->map = pool->block_pool.map + offset;
+
+   struct anv_pool_map pool_map = anv_block_pool_map(>block_pool,
+ state->offset);
+   state->map = pool_map.map + pool_map.offset;
+
 
 done:
return *state;
diff --git a/src/intel/vulkan/anv_batch_chain.c 
b/src/intel/vulkan/anv_batch_chain.c
index a9f8c5b79b1..6c06858efe1 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -679,8 +679,9 @@ anv_cmd_buffer_alloc_binding_table(struct anv_cmd_buffer 
*cmd_buffer,
   return (struct anv_state) { 0 };
 
state.offset = cmd_buffer->bt_next;
-   state.map = anv_binding_table_pool(device)->block_pool.map +
-  bt_block->offset + state.offset;
+   struct anv_pool_map pool_map =
+      anv_block_pool_map(&anv_binding_table_pool(device)->block_pool,
+                         bt_block->offset + state.offset);
+   state.map = pool_map.map + pool_map.offset;
 
cmd_buffer->bt_next += state.alloc_size;
 
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 539523450ef..a364be8dad5 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -749,6 +749,11 @@ struct anv_state_stream {
struct anv_state_stream_block *block_list;
 };
 
+struct anv_pool_map {
+   void *map;
+   int32_t offset;
+};
+
 /* The block_pool functions exported for testing only.  The block pool should
  * only be used via a state pool (see below).
  */
@@ -762,6 +767,8 @@ int32_t anv_block_pool_alloc(struct anv_block_pool *pool,
  uint32_t block_size);
 int32_t anv_block_pool_alloc_back(struct anv_block_pool *pool,
   uint32_t block_size);
+struct anv_pool_map anv_block_pool_map(struct anv_block_pool *pool,
+   int32_t offset);
 
 VkResult anv_state_pool_init(struct anv_state_pool *pool,
  struct anv_device *device,
diff --git a/src/intel/vulkan/genX_blorp_exec.c 
b/src/intel/vulkan/genX_blorp_exec.c
index c573e890946..5af6abb0894 100644
--- a/src/intel/vulkan/genX_blorp_exec.c
+++ b/src/intel/vulkan/genX_blorp_exec.c
@@ -63,8 +63,9 @@ blorp_surface_reloc(struct blorp_batch *batch, uint32_t 
ss_offset,
if (result != VK_SUCCESS)
   anv_batch_set_error(_buffer->batch, result);
 
-   void *dest = 

[Mesa-dev] [RFC PATCH 13/14] anv/allocator: Add padding information.

2018-12-07 Thread Rafael Antognolli
It's possible that we still have some space left in the block pool, but
that we try to allocate a state larger than that remaining space. Such a
state would start somewhere within the range of the old block_pool and end
past it, within the range of the new size.

That's fine when we use userptr, since the memory in the block pool is
CPU mapped continuously. However, by the end of this series, we will
have the block_pool split into different BOs, with different CPU
mapping ranges that are not necessarily continuous. So we must avoid
such case of a given state being part of two different BOs in the block
pool.

This commit solves the issue by detecting that we are growing the
block_pool even though we are not at the end of the range. If that
happens, we don't use the space left at the end of the old size, and
consider it as "padding" that can't be used in the allocation. We update
the size requested from the block pool to take the padding into account,
and return the offset after the padding, which happens to be at the
start of the new address range.

Additionally, we return the amount of padding we used, so the caller
knows that this happens and can return that padding back into a list of
free states, that can be reused later. This way we hopefully don't waste
any space, but also avoid having a state split between two different
BOs.
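
A worked example with hypothetical numbers: say the pool currently ends at
offset 1048576, pool_state->next is at 1048000, and an 8192-byte block is
requested:

   uint32_t padding;
   int32_t offset = anv_block_pool_alloc(&pool->block_pool, 8192, &padding);
   /* leftover            = 1048576 - 1048000 = 576   (returned in padding)
    * block_size grows to   8192 + 576        = 8768  (consumed from the pool)
    * offset               = 1048000 + 576    = 1048576, i.e. the start of the
    * newly grown range, so the state never straddles the old/new boundary;
    * the caller then returns the 576-byte padding to the free lists.
    */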
---
 src/intel/vulkan/anv_allocator.c| 57 ++---
 src/intel/vulkan/anv_private.h  |  2 +-
 src/intel/vulkan/tests/block_pool_no_free.c |  2 +-
 3 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index bddeb4a0fbd..0d426edfb57 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -839,16 +839,35 @@ done:
 static uint32_t
 anv_block_pool_alloc_new(struct anv_block_pool *pool,
  struct anv_block_state *pool_state,
- uint32_t block_size)
+ uint32_t block_size, uint32_t *padding)
 {
struct anv_block_state state, old, new;
 
+   /* Most allocations won't generate any padding */
+   if (padding)
+  *padding = 0;
+
while (1) {
   state.u64 = __sync_fetch_and_add(_state->u64, block_size);
   if (state.next + block_size <= state.end) {
  assert(pool->map);
  return state.next;
   } else if (state.next <= state.end) {
+ if (pool->bo_flags & EXEC_OBJECT_PINNED && state.next < state.end) {
+/* We need to grow the block pool, but still have some leftover
+ * space that can't be used by that particular allocation. So we
+ * add that as a "padding", and return it.
+ */
+uint32_t leftover = state.end - state.next;
+block_size += leftover;
+
+/* If there is some leftover space in the pool, the caller must
+ * deal with it.
+ */
+assert(leftover == 0 || padding);
+*padding = leftover;
+ }
+
  /* We allocated the first block outside the pool so we have to grow
   * the pool.  pool_state->next acts a mutex: threads who try to
   * allocate now will get block indexes above the current limit and
@@ -872,9 +891,16 @@ anv_block_pool_alloc_new(struct anv_block_pool *pool,
 
 int32_t
 anv_block_pool_alloc(struct anv_block_pool *pool,
- uint32_t block_size)
+ uint32_t block_size, uint32_t *padding)
 {
-   return anv_block_pool_alloc_new(pool, >state, block_size);
+   uint32_t offset;
+
+   offset = anv_block_pool_alloc_new(pool, >state, block_size, padding);
+
+   if (padding && *padding > 0)
+  offset += *padding;
+
+   return offset;
 }
 
 /* Allocates a block out of the back of the block pool.
@@ -891,7 +917,7 @@ anv_block_pool_alloc_back(struct anv_block_pool *pool,
   uint32_t block_size)
 {
int32_t offset = anv_block_pool_alloc_new(pool, >back_state,
- block_size);
+ block_size, NULL);
 
/* The offset we get out of anv_block_pool_alloc_new() is actually the
 * number of bytes downwards from the middle to the end of the block.
@@ -947,16 +973,24 @@ static uint32_t
 anv_fixed_size_state_pool_alloc_new(struct anv_fixed_size_state_pool *pool,
 struct anv_block_pool *block_pool,
 uint32_t state_size,
-uint32_t block_size)
+uint32_t block_size,
+uint32_t *padding)
 {
struct anv_block_state block, old, new;
uint32_t offset;
 
+   /* We don't always use anv_block_pool_alloc(), which would set *padding to
+* zero for us. So if we have a pointer to padding, we must zero it out
+* ourselves here, to make sure we always 

[Mesa-dev] [RFC PATCH 11/14] anv: Remove some asserts.

2018-12-07 Thread Rafael Antognolli
They won't be true anymore once we add support for multiple BOs with
non-userptr.
---
 src/intel/vulkan/genX_gpu_memcpy.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/intel/vulkan/genX_gpu_memcpy.c 
b/src/intel/vulkan/genX_gpu_memcpy.c
index 1bee1c6dc17..e20179fa675 100644
--- a/src/intel/vulkan/genX_gpu_memcpy.c
+++ b/src/intel/vulkan/genX_gpu_memcpy.c
@@ -133,9 +133,6 @@ genX(cmd_buffer_so_memcpy)(struct anv_cmd_buffer 
*cmd_buffer,
if (size == 0)
   return;
 
-   assert(dst.offset + size <= dst.bo->size);
-   assert(src.offset + size <= src.bo->size);
-
/* The maximum copy block size is 4 32-bit components at a time. */
assert(size % 4 == 0);
unsigned bs = gcd_pow2_u64(16, size);
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH 05/14] anv/allocator: Remove usage of anv_free_list.

2018-12-07 Thread Rafael Antognolli
Maybe we should already rename anv_free_list2 -> anv_free_list since the
old one is gone.
---
 src/intel/vulkan/anv_allocator.c | 55 
 src/intel/vulkan/anv_private.h   | 11 ---
 2 files changed, 66 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 2171a97970b..cda6a1a9d25 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -373,61 +373,6 @@ anv_state_table_pop(union anv_free_list2 *list,
return NULL;
 }
 
-static bool
-anv_free_list_pop(union anv_free_list *list, void **map, int32_t *offset)
-{
-   union anv_free_list current, new, old;
-
-   current.u64 = list->u64;
-   while (current.offset != EMPTY) {
-  /* We have to add a memory barrier here so that the list head (and
-   * offset) gets read before we read the map pointer.  This way we
-   * know that the map pointer is valid for the given offset at the
-   * point where we read it.
-   */
-  __sync_synchronize();
-
-  int32_t *next_ptr = *map + current.offset;
-  new.offset = VG_NOACCESS_READ(next_ptr);
-  new.count = current.count + 1;
-  old.u64 = __sync_val_compare_and_swap(>u64, current.u64, new.u64);
-  if (old.u64 == current.u64) {
- *offset = current.offset;
- return true;
-  }
-  current = old;
-   }
-
-   return false;
-}
-
-static void
-anv_free_list_push(union anv_free_list *list, void *map, int32_t offset,
-   uint32_t size, uint32_t count)
-{
-   union anv_free_list current, old, new;
-   int32_t *next_ptr = map + offset;
-
-   /* If we're returning more than one chunk, we need to build a chain to add
-* to the list.  Fortunately, we can do this without any atomics since we
-* own everything in the chain right now.  `offset` is left pointing to the
-* head of our chain list while `next_ptr` points to the tail.
-*/
-   for (uint32_t i = 1; i < count; i++) {
-  VG_NOACCESS_WRITE(next_ptr, offset + i * size);
-  next_ptr = map + offset + i * size;
-   }
-
-   old = *list;
-   do {
-  current = old;
-  VG_NOACCESS_WRITE(next_ptr, current.offset);
-  new.offset = offset;
-  new.count = current.count + 1;
-  old.u64 = __sync_val_compare_and_swap(>u64, current.u64, new.u64);
-   } while (old.u64 != current.u64);
-}
-
 /* All pointers in the ptr_free_list are assumed to be page-aligned.  This
  * means that the bottom 12 bits should all be zero.
  */
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index d068a4be5d8..539523450ef 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -605,16 +605,6 @@ anv_bo_init(struct anv_bo *bo, uint32_t gem_handle, 
uint64_t size)
  * both the block pool and the state pools.  Unfortunately, in order to
  * solve the ABA problem, we can't use a single uint32_t head.
  */
-union anv_free_list {
-   struct {
-  int32_t offset;
-
-  /* A simple count that is incremented every time the head changes. */
-  uint32_t count;
-   };
-   uint64_t u64;
-};
-
 union anv_free_list2 {
struct {
   uint32_t offset;
@@ -625,7 +615,6 @@ union anv_free_list2 {
uint64_t u64;
 };
 
-#define ANV_FREE_LIST_EMPTY ((union anv_free_list) { { 1, 0 } })
 #define ANV_FREE_LIST2_EMPTY ((union anv_free_list2) { { UINT32_MAX, 0 } })
 
 struct anv_block_state {
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH 01/14] anv/tests: Fix block_pool_no_free test.

2018-12-07 Thread Rafael Antognolli
The test was checking whether -1 was smaller than an unsigned int, which
is always false. So it was exiting early and never running until the
end, since it would reach the condition (thread_max == -1).

However, just fixing that is not enough. The test currently grabs the
highest pending block on each iteration and walks forward until it reaches
the end, but by that point it has not necessarily looked at all blocks of
all threads. For instance, with 3 threads and 4 blocks per thread, a
layout like this (unlikely, but possible):

[Thread]: [Blocks]
   [0]: [0, 32, 64, 96]
   [1]: [128, 160, 192, 224]
   [2]: [256, 288, 320, 352]

Would cause the test to iterate only over thread number 2.

The fix is to always grab the lowest block on each iteration, and assert
that it is higher than the one from the previous iteration.
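
A standalone sketch of the corrected scan (illustration only, not the test
code itself): always pick the smallest pending block across all threads
and require the resulting sequence to be strictly increasing, so every
block of every thread gets visited exactly once.

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   static void
   validate(uint32_t blocks[3][4])
   {
      unsigned next[3] = { 0, };
      uint32_t prev = 0;
      bool first = true;

      while (true) {
         int min_t = -1;
         for (int t = 0; t < 3; t++) {
            if (next[t] < 4 &&
                (min_t < 0 || blocks[t][next[t]] < blocks[min_t][next[min_t]]))
               min_t = t;
         }
         if (min_t < 0)
            break;   /* every block of every thread has been visited */

         assert(first || blocks[min_t][next[min_t]] > prev);
         prev = blocks[min_t][next[min_t]];
         next[min_t]++;
         first = false;
      }
   }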
---
 src/intel/vulkan/tests/block_pool_no_free.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/intel/vulkan/tests/block_pool_no_free.c 
b/src/intel/vulkan/tests/block_pool_no_free.c
index 17006dd3bc7..730297d4e36 100644
--- a/src/intel/vulkan/tests/block_pool_no_free.c
+++ b/src/intel/vulkan/tests/block_pool_no_free.c
@@ -78,16 +78,16 @@ static void validate_monotonic(uint32_t **blocks)
unsigned next[NUM_THREADS];
memset(next, 0, sizeof(next));
 
-   int highest = -1;
+   uint32_t lowest = UINT32_MAX;
while (true) {
-  /* First, we find which thread has the highest next element */
-  int thread_max = -1;
+  /* First, we find which thread has the lowest next element */
+  uint32_t thread_max = UINT32_MAX;
   int max_thread_idx = -1;
   for (unsigned i = 0; i < NUM_THREADS; i++) {
  if (next[i] >= BLOCKS_PER_THREAD)
 continue;
 
- if (thread_max < blocks[i][next[i]]) {
+ if (thread_max > blocks[i][next[i]]) {
 thread_max = blocks[i][next[i]];
 max_thread_idx = i;
  }
@@ -96,13 +96,14 @@ static void validate_monotonic(uint32_t **blocks)
   /* The only way this can happen is if all of the next[] values are at
* BLOCKS_PER_THREAD, in which case, we're done.
*/
-  if (thread_max == -1)
+  if (thread_max == UINT32_MAX)
  break;
 
-  /* That next element had better be higher than the previous highest */
-  assert(blocks[max_thread_idx][next[max_thread_idx]] > highest);
+  /* That next element had better be higher than the previous lowest */
+  assert(lowest == UINT32_MAX ||
+ blocks[max_thread_idx][next[max_thread_idx]] > lowest);
 
-  highest = blocks[max_thread_idx][next[max_thread_idx]];
+  lowest = blocks[max_thread_idx][next[max_thread_idx]];
   next[max_thread_idx]++;
}
 }
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH 00/14] Do not use userptr in anv if softpin is available.

2018-12-07 Thread Rafael Antognolli
This series changes anv_block_pool to use a list of BOs instead of a
single BO that gets reallocated and set with userptr.

The main changes are:
   - The introduction of anv_state_table to track anv_states, and
   recycle them;
   - Addition of a list of BOs in anv_block_pool, instead of a single
   BO;
   - Forcing allocations to not cross boundaries between the previous
   size of a block_pool and the new one;
   - And pushing the "padding" required to avoid crossing such
   boundaries back to the pool of free anv_states.

I'm still working on increasing the test coverage (adding new tests for
some new cases we have now), but the series seems reasonable imho to
start getting some review already.
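
To illustrate the boundary/padding idea from the list above, here is a
rough sketch (names and shape are purely illustrative, not the actual
implementation): an allocation that would straddle the boundary left by
the last pool resize gets moved up to the boundary, and the skipped bytes
are reported back so they can be recycled through the anv_state free
list.

   /* Illustrative only. */
   static uint32_t
   clamp_to_boundary(uint32_t *offset, uint32_t size, uint32_t boundary)
   {
      if (*offset < boundary && *offset + size > boundary) {
         uint32_t padding = boundary - *offset;
         *offset = boundary;   /* the allocation now starts at the boundary */
         return padding;       /* caller returns this to the free list */
      }
      return 0;
   }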

Cc: Jason Ekstrand 

Rafael Antognolli (14):
  anv/tests: Fix block_pool_no_free test.
  anv/allocator: Add anv_state_table.
  anv/allocator: Use anv_state_table on anv_state_pool_alloc.
  anv/allocator: Use anv_state_table on back_alloc too.
  anv/allocator: Remove usage of anv_free_list.
  anv/allocator: Add getters for anv_block_pool.
  anv: Update usage of block_pool->bo.
  anv/allocator: Add support for a list of BOs in block pool.
  anv: Validate the list of BOs from the block pool.
  anv: Add clflush to states.
  anv: Remove some asserts.
  anv/allocator: Rework chunk return to the state pool.
  anv/allocator: Add padding information.
  anv/allocator: Add support for non-userptr.

 src/intel/vulkan/anv_allocator.c| 741 
 src/intel/vulkan/anv_batch_chain.c  |  62 +-
 src/intel/vulkan/anv_blorp.c|   2 +-
 src/intel/vulkan/anv_private.h  |  73 +-
 src/intel/vulkan/gen8_cmd_buffer.c  |   6 +-
 src/intel/vulkan/genX_blorp_exec.c  |   9 +-
 src/intel/vulkan/genX_cmd_buffer.c  |  20 +-
 src/intel/vulkan/genX_gpu_memcpy.c  |   3 -
 src/intel/vulkan/tests/block_pool_no_free.c |  19 +-
 9 files changed, 740 insertions(+), 195 deletions(-)

-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 3/3] i965/gen9: Add workarounds for object preemption.

2018-10-31 Thread Rafael Antognolli
On Tue, Oct 30, 2018 at 04:32:54PM -0700, Kenneth Graunke wrote:
> On Monday, October 29, 2018 10:19:54 AM PDT Rafael Antognolli wrote:
> > Gen9 hardware requires some workarounds to disable preemption depending
> > on the type of primitive being emitted.
> > 
> > We implement this by adding a new atom that tracks BRW_NEW_PRIMITIVE.
> > Whenever it happens, we check the current type of primitive and
> > enable/disable object preemption.
> > 
> > For now, we just ignore blorp.  The only primitive it emits is
> > 3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can
> > safely leave preemption enabled when it happens. Or it will be disabled
> > by a previous 3DPRIMITIVE, which should be fine too.
> > 
> > Signed-off-by: Rafael Antognolli 
> > Cc: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/genX_state_upload.c | 47 +++
> >  1 file changed, 47 insertions(+)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
> > b/src/mesa/drivers/dri/i965/genX_state_upload.c
> > index 740cb0c4d2e..3a01bab1ae1 100644
> > --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> > +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> > @@ -5563,6 +5563,50 @@ static const struct brw_tracked_state 
> > genX(blend_constant_color) = {
> >  
> >  /* -- 
> > */
> >  
> > +#if GEN_GEN == 9
> > +
> > +/**
> 
>  * Enable or disable preemption based on the current primitive type.
>  * (This should only be necessary on Gen9 hardware, not Gen10+.)
>  *
> 
> > + * Implement workarounds for preemption:
> > + *- WaDisableMidObjectPreemptionForGSLineStripAdj
> > + *- WaDisableMidObjectPreemptionForTrifanOrPolygon
> > + */
> > +static void
> > +gen9_emit_preempt_wa(struct brw_context *brw)
> > +{
> 
> I think this might be a bit easier to follow as
> 
>bool object_preemption = true;
> 
>if (brw->primitive == _3DPRIM_LINESTRIP_ADJ && brw->gs.enabled)
>   object_preemption = false;
> 
>if (brw->primitive == _3DPRIM_TRIFAN)
>   object_preemption = false;
> 
>brw_enable_obj_preemption(brw, object_preemption);
> 
> (with the comments of course.)
> 
> Do we need any stalling when whacking CS_CHICKEN1...?

Hmmm... there's this:

"A fixed function pipe flush is required before modifying this field"

in the programming notes. I'm not sure what that is, but I assume it's
some type of PIPE_CONTROL?
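
For reference, patch 2/3 of this series already does it as a flush-enable
PIPE_CONTROL right before the LRI, i.e. something like:

   /* A fixed function pipe flush is required before modifying this field */
   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);

   brw_load_register_imm32(brw, CS_CHICKEN1,
                           replay_mode | GEN9_REPLAY_MODE_MASK);

so if that's the right interpretation of "fixed function pipe flush", we
should already be covered there.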

> Looking through the workarounds list, I believe that we also need to
> disable mid-object preemption for _3DPRIM_LINELOOP (Gen9 WA #0816).
> 
> We may need to disable it if instance_count > 0 in the 3DPRIMITIVE
> (Gen9 WA #0798).

Ack.
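
Something along these lines, I suppose (untested sketch; the instanced
draw check assumes we track the instance count in something like
brw->num_instances, and the exact condition still needs to be checked
against the WA text):

   /* Gen9 WA #0816: also disable mid-object preemption for line loops. */
   if (brw->primitive == _3DPRIM_LINELOOP)
      object_preemption = false;

   /* Gen9 WA #0798: disable it for instanced draws. */
   if (brw->num_instances > 1)
      object_preemption = false;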

> We may also need to disable autostripping by whacking some chicken
> registers if it's enabled (Gen9 WA #0799).  Which would be lame,

Looking again at #0799, it seems it's only applicable up to C0 on SKL,
and B0 on BXT. So maybe we should be fine here? Or just disable it on
BXT?

> because that's likely a useful optimization.  I guess we could disable
> preemption for TRILIST and LINELIST as well to be safe.

Is this because of the autostripping mentioned above, or some other
workaround? Or just your impression?

I can update it to include that, but I just want to be sure that it's
still applicable once we figure out the situation with #0799.

> 
> And GPGPU preemption looks like a mile long list of workarounds,
> so let's not try it on Gen9.

Fair enough, thanks a lot for the review!

> > +   /* WaDisableMidObjectPreemptionForGSLineStripAdj
> > +*
> > +*WA: Disable mid-draw preemption when draw-call is a linestrip_adj 
> > and
> > +*GS is enabled.
> > +*/
> > +   bool object_preemption =
> > +  !(brw->primitive == _3DPRIM_LINESTRIP_ADJ && brw->gs.enabled);
> > +
> > +   /* WaDisableMidObjectPreemptionForTrifanOrPolygon
> > +*
> > +*TriFan miscompare in Execlist Preemption test. Cut index that is 
> > on a
> > +*previous context. End the previous, then resume another context 
> > with a
> > +*tri-fan or polygon, and the vertex count is corrupted. If we 
> > preempt
> > +*again we will cause corruption.
> > +*
> > +*WA: Disable mid-draw preemption when draw-call has a tri-fan.
> > +*/
> > +   object_preemption =
> > +  object_preemption && !(brw->primitive == _3DPRIM_TRIFAN);
> > +
> > +   brw_enable_obj_preemption(brw, object_preemption);
> > +}

Re: [Mesa-dev] [PATCH v2 2/3] i965/gen10+: Enable object level preemption.

2018-10-29 Thread Rafael Antognolli
On Mon, Oct 29, 2018 at 05:29:10PM +, Chris Wilson wrote:
> Quoting Rafael Antognolli (2018-10-29 17:19:53)
> > +void
> > +brw_enable_obj_preemption(struct brw_context *brw, bool enable)
> > +{
> > +   const struct gen_device_info *devinfo = >screen->devinfo;
> > +   assert(devinfo->gen >= 9);
> > +
> > +   if (enable == brw->object_preemption)
> > +  return;
> > +
> > +   /* A fixed function pipe flush is required before modifying this field 
> > */
> > +   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);
> > +
> > +   bool replay_mode = enable ?
> > +  GEN9_REPLAY_MODE_MIDOBJECT : GEN9_REPLAY_MODE_MIDBUFFER;
> > +
> > +   /* enable object level preemption */
> > +   brw_load_register_imm32(brw, CS_CHICKEN1,
> > +   replay_mode | GEN9_REPLAY_MODE_MASK);
> > +
> > +   brw->object_preemption = enable;
> > +}
> > +
> >  static void
> >  brw_upload_initial_gpu_state(struct brw_context *brw)
> >  {
> > @@ -153,6 +175,9 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
> >   ADVANCE_BATCH();
> >}
> > }
> > +
> > +   if (devinfo->gen >= 10)
> 
> brw->object_preemption = false;
> 
> > +  brw_enable_obj_preemption(brw, true);
> 
> To force the LRI despite what the context may believe. (To accommodate
> recreating a logical context following a GPU hang.)
> -Chris

Fixing it locally, thanks.
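
i.e. something along the lines of:

   if (devinfo->gen >= 10) {
      /* Force the LRI even if the context believes preemption is already
       * enabled, e.g. after recreating the logical context following a
       * GPU hang. */
      brw->object_preemption = false;
      brw_enable_obj_preemption(brw, true);
   }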

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/3] i965/gen10+: Enable object level preemption.

2018-10-29 Thread Rafael Antognolli
Set the replay mode to mid-object in CS_CHICKEN1 when initializing the
context.

Signed-off-by: Rafael Antognolli 
---
 src/mesa/drivers/dri/i965/brw_context.h  |  2 ++
 src/mesa/drivers/dri/i965/brw_defines.h  |  5 
 src/mesa/drivers/dri/i965/brw_state.h|  3 ++-
 src/mesa/drivers/dri/i965/brw_state_upload.c | 25 
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 7fd15669eb9..9253386de7d 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -842,6 +842,8 @@ struct brw_context
 
GLuint primitive; /**< Hardware primitive, such as _3DPRIM_TRILIST. */
 
+   bool object_preemption; /**< Object level preemption enabled. */
+
GLenum reduced_primitive;
 
/**
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 97a787a2ab3..affc690618e 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1681,4 +1681,9 @@ enum brw_pixel_shader_coverage_mask_mode {
 # define HEADERLESS_MESSAGE_FOR_PREEMPTABLE_CONTEXTS(1 << 5)
 # define HEADERLESS_MESSAGE_FOR_PREEMPTABLE_CONTEXTS_MASK   REG_MASK(1 << 5)
 
+#define CS_CHICKEN10x2580 /* Gen9+ */
+# define GEN9_REPLAY_MODE_MIDBUFFER (0 << 0)
+# define GEN9_REPLAY_MODE_MIDOBJECT (1 << 0)
+# define GEN9_REPLAY_MODE_MASK  REG_MASK(1 << 0)
+
 #endif
diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index f6acf81b899..546d103d1a4 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -128,7 +128,7 @@ void brw_disk_cache_write_compute_program(struct 
brw_context *brw);
 void brw_disk_cache_write_render_programs(struct brw_context *brw);
 
 /***
- * brw_state.c
+ * brw_state_upload.c
  */
 void brw_upload_render_state(struct brw_context *brw);
 void brw_render_state_finished(struct brw_context *brw);
@@ -138,6 +138,7 @@ void brw_init_state(struct brw_context *brw);
 void brw_destroy_state(struct brw_context *brw);
 void brw_emit_select_pipeline(struct brw_context *brw,
   enum brw_pipeline pipeline);
+void brw_enable_obj_preemption(struct brw_context *brw, bool enable);
 
 static inline void
 brw_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 7f20579fb87..2e42dfb36d6 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -45,6 +45,28 @@
 #include "brw_cs.h"
 #include "main/framebuffer.h"
 
+void
+brw_enable_obj_preemption(struct brw_context *brw, bool enable)
+{
+   const struct gen_device_info *devinfo = >screen->devinfo;
+   assert(devinfo->gen >= 9);
+
+   if (enable == brw->object_preemption)
+  return;
+
+   /* A fixed function pipe flush is required before modifying this field */
+   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);
+
+   bool replay_mode = enable ?
+  GEN9_REPLAY_MODE_MIDOBJECT : GEN9_REPLAY_MODE_MIDBUFFER;
+
+   /* enable object level preemption */
+   brw_load_register_imm32(brw, CS_CHICKEN1,
+   replay_mode | GEN9_REPLAY_MODE_MASK);
+
+   brw->object_preemption = enable;
+}
+
 static void
 brw_upload_initial_gpu_state(struct brw_context *brw)
 {
@@ -153,6 +175,9 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
  ADVANCE_BATCH();
   }
}
+
+   if (devinfo->gen >= 10)
+  brw_enable_obj_preemption(brw, true);
 }
 
 static inline const struct brw_tracked_state *
-- 
2.19.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/3] intel/genxml: Add register for object preemption.

2018-10-29 Thread Rafael Antognolli
Signed-off-by: Rafael Antognolli 
---
 src/intel/genxml/gen10.xml | 8 
 src/intel/genxml/gen11.xml | 8 
 src/intel/genxml/gen9.xml  | 8 
 3 files changed, 24 insertions(+)

diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
index abd5da297d6..acded759335 100644
--- a/src/intel/genxml/gen10.xml
+++ b/src/intel/genxml/gen10.xml
@@ -3553,6 +3553,14 @@
 
   
 
+  
+
+  
+  
+
+
+  
+
   
 
   
diff --git a/src/intel/genxml/gen11.xml b/src/intel/genxml/gen11.xml
index c69d7dc89c2..d39bf09a5d7 100644
--- a/src/intel/genxml/gen11.xml
+++ b/src/intel/genxml/gen11.xml
@@ -3551,6 +3551,14 @@
 
   
 
+  
+
+  
+  
+
+
+  
+
   
 
   
diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index ca268254503..b7ce3095ab4 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -3491,6 +3491,14 @@
 
   
 
+  
+
+  
+  
+
+
+  
+
   
 
   
-- 
2.19.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 0/3] Add object level preemption to i965.

2018-10-29 Thread Rafael Antognolli
Re-sending the series, this time adding preemption support only to i965,
since we still don't have Vulkan tests for this.

The proposed piglit test for this series can be found here:
https://gitlab.freedesktop.org/rantogno/piglit/commits/review/context_preemption_v2

Cc: Kenneth Graunke 

Rafael Antognolli (3):
  intel/genxml: Add register for object preemption.
  i965/gen10+: Enable object level preemption.
  i965/gen9: Add workarounds for object preemption.

 src/intel/genxml/gen10.xml|  8 
 src/intel/genxml/gen11.xml|  8 
 src/intel/genxml/gen9.xml |  8 
 src/mesa/drivers/dri/i965/brw_context.h   |  2 +
 src/mesa/drivers/dri/i965/brw_defines.h   |  5 ++
 src/mesa/drivers/dri/i965/brw_state.h |  3 +-
 src/mesa/drivers/dri/i965/brw_state_upload.c  | 25 ++
 src/mesa/drivers/dri/i965/genX_state_upload.c | 47 +++
 8 files changed, 105 insertions(+), 1 deletion(-)

-- 
2.19.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/3] i965/gen9: Add workarounds for object preemption.

2018-10-29 Thread Rafael Antognolli
Gen9 hardware requires some workarounds to disable preemption depending
on the type of primitive being emitted.

We implement this by adding a new atom that tracks BRW_NEW_PRIMITIVE.
Whenever it happens, we check the current type of primitive and
enable/disable object preemption.

For now, we just ignore blorp.  The only primitive it emits is
3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can
safely leave preemption enabled when it happens. Or it will be disabled
by a previous 3DPRIMITIVE, which should be fine too.

Signed-off-by: Rafael Antognolli 
Cc: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/genX_state_upload.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
b/src/mesa/drivers/dri/i965/genX_state_upload.c
index 740cb0c4d2e..3a01bab1ae1 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -5563,6 +5563,50 @@ static const struct brw_tracked_state 
genX(blend_constant_color) = {
 
 /* -- */
 
+#if GEN_GEN == 9
+
+/**
+ * Implement workarounds for preemption:
+ *- WaDisableMidObjectPreemptionForGSLineStripAdj
+ *- WaDisableMidObjectPreemptionForTrifanOrPolygon
+ */
+static void
+gen9_emit_preempt_wa(struct brw_context *brw)
+{
+   /* WaDisableMidObjectPreemptionForGSLineStripAdj
+*
+*WA: Disable mid-draw preemption when draw-call is a linestrip_adj and
+*GS is enabled.
+*/
+   bool object_preemption =
+  !(brw->primitive == _3DPRIM_LINESTRIP_ADJ && brw->gs.enabled);
+
+   /* WaDisableMidObjectPreemptionForTrifanOrPolygon
+*
+*TriFan miscompare in Execlist Preemption test. Cut index that is on a
+*previous context. End the previous, then resume another context with a
+*tri-fan or polygon, and the vertex count is corrupted. If we preempt
+*again we will cause corruption.
+*
+*WA: Disable mid-draw preemption when draw-call has a tri-fan.
+*/
+   object_preemption =
+  object_preemption && !(brw->primitive == _3DPRIM_TRIFAN);
+
+   brw_enable_obj_preemption(brw, object_preemption);
+}
+
+static const struct brw_tracked_state gen9_preempt_wa = {
+   .dirty = {
+  .mesa = 0,
+  .brw = BRW_NEW_PRIMITIVE | BRW_NEW_GEOMETRY_PROGRAM,
+   },
+   .emit = gen9_emit_preempt_wa,
+};
+#endif
+
+/* -- */
+
 void
 genX(init_atoms)(struct brw_context *brw)
 {
@@ -5867,6 +5911,9 @@ genX(init_atoms)(struct brw_context *brw)
 
   (cut_index),
   _pma_fix,
+#if GEN_GEN == 9
+  _preempt_wa,
+#endif
};
 #endif
 
-- 
2.19.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/tools: Remove hardcoded PADDING_SIZE from sanitizer

2018-10-17 Thread Rafael Antognolli
On Wed, Oct 17, 2018 at 06:08:34PM +0300, Danylo Piliaiev wrote:
> Signed-off-by: Danylo Piliaiev 
> ---
>  src/intel/tools/intel_sanitize_gpu.c | 38 +++-
>  1 file changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/src/intel/tools/intel_sanitize_gpu.c 
> b/src/intel/tools/intel_sanitize_gpu.c
> index 9b49b0bbf2..36c4725a2f 100644
> --- a/src/intel/tools/intel_sanitize_gpu.c
> +++ b/src/intel/tools/intel_sanitize_gpu.c
> @@ -51,14 +51,6 @@ static int (*libc_fcntl)(int fd, int cmd, int param);
>  
>  #define DRM_MAJOR 226
>  
> -/* TODO: we want to make sure that the padding forces
> - * the BO to take another page on the (PP)GTT; 4KB
> - * may or may not be the page size for the BO. Indeed,
> - * depending on GPU, kernel version and GEM size, the
> - * page size can be one of 4KB, 64KB or 2M.
> - */
> -#define PADDING_SIZE 4096
> -
>  struct refcnt_hash_table {
> struct hash_table *t;
> int refcnt;
> @@ -80,6 +72,8 @@ pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
>  
>  static struct hash_table *fds_to_bo_sizes = NULL;
>  
> +static long padding_size = 0;
> +
>  static inline struct hash_table*
>  bo_size_table(int fd)
>  {
> @@ -166,7 +160,7 @@ padding_is_good(int fd, uint32_t handle)
> struct drm_i915_gem_mmap mmap_arg = {
>.handle = handle,
>.offset = bo_size(fd, handle),
> -  .size = PADDING_SIZE,
> +  .size = padding_size,
>.flags = 0,
> };
>  
> @@ -189,17 +183,17 @@ padding_is_good(int fd, uint32_t handle)
>  * if the bo is not cache coherent we likely need to
>  * invalidate the cache lines to get it.
>  */
> -   gen_invalidate_range(mapped, PADDING_SIZE);
> +   gen_invalidate_range(mapped, padding_size);
>  
> expected_value = handle & 0xFF;
> -   for (uint32_t i = 0; i < PADDING_SIZE; ++i) {
> +   for (uint32_t i = 0; i < padding_size; ++i) {
>if (expected_value != mapped[i]) {
> - munmap(mapped, PADDING_SIZE);
> + munmap(mapped, padding_size);
>   return false;
>}
>expected_value = next_noise_value(expected_value);
> }
> -   munmap(mapped, PADDING_SIZE);
> +   munmap(mapped, padding_size);
>  
> return true;
>  }
> @@ -207,9 +201,9 @@ padding_is_good(int fd, uint32_t handle)
>  static int
>  create_with_padding(int fd, struct drm_i915_gem_create *create)
>  {
> -   create->size += PADDING_SIZE;
> +   create->size += padding_size;
> int ret = libc_ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, create);
> -   create->size -= PADDING_SIZE;
> +   create->size -= padding_size;
>  
> if (ret != 0)
>return ret;
> @@ -218,7 +212,7 @@ create_with_padding(int fd, struct drm_i915_gem_create 
> *create)
> struct drm_i915_gem_mmap mmap_arg = {
>.handle = create->handle,
>.offset = create->size,
> -  .size = PADDING_SIZE,
> +  .size = padding_size,
>.flags = 0,
> };
>  
> @@ -228,8 +222,8 @@ create_with_padding(int fd, struct drm_i915_gem_create 
> *create)
>  
> noise_values = (uint8_t*) (uintptr_t) mmap_arg.addr_ptr;
> fill_noise_buffer(noise_values, create->handle & 0xFF,
> - PADDING_SIZE);
> -   munmap(noise_values, PADDING_SIZE);
> + padding_size);
> +   munmap(noise_values, padding_size);
>  
> _mesa_hash_table_insert(bo_size_table(fd), 
> (void*)(uintptr_t)create->handle,
> (void*)(uintptr_t)create->size);
> @@ -427,4 +421,12 @@ init(void)
> libc_close = dlsym(RTLD_NEXT, "close");
> libc_fcntl = dlsym(RTLD_NEXT, "fcntl");
> libc_ioctl = dlsym(RTLD_NEXT, "ioctl");
> +
> +   /* We want to make sure that the padding forces
> +* the BO to take another page on the (PP)GTT.
> +*/
> +   padding_size = sysconf(_SC_PAGESIZE);

I don't think this is the page size we want. This is the page size of
CPU/system memory, which might be different from what the GPU is using
to map pages. For instance, even if we are using 64K pages for GPU
mapping, I think this call would still return 4K.

Though I'm not sure there's an interface to query the kernel about which
page size we are using on the GPU side...
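
One conservative option, just as a sketch, would be to pad by a larger
assumed GPU page size instead of the CPU one; the 64KB value below is a
guess (2MB would be the fully conservative bound from the old comment):

   /* Pad by an assumed worst-ish case GPU page size rather than only the
    * CPU page size. */
   long cpu_page_size = sysconf(_SC_PAGESIZE);
   padding_size = cpu_page_size > 64 * 1024 ? cpu_page_size : 64 * 1024;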

> +   if (padding_size == -1) {
> +  unreachable("Bad page size");
> +   }
>  }
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: consider a 'base level' when calculating width0, height0, depth0

2018-10-11 Thread Rafael Antognolli
On Thu, Oct 11, 2018 at 03:12:08PM +0300, andrey simiklit wrote:
> Hi,
> 
> Thanks for reviewing.
> This 'simple reproducer' can only trigger an assertion
> in a debug mesa build, and I don't know how to check
> the things that cause it through the OpenGL API at the moment.
> I mean, it may be some internal mesa state that is not accessible
> from the outside, but I am going to try to do it anyway.

I don't think you need to check for the assertion. You can simply write
the piglit test that does the same thing as your simple reproducer does,
and if the test causes an assertion, then if I'm not wrong piglit will
report that test as a "crash". So we would have coverage. And if your
test gets to the end of the execution without crashing, you can assume
it's a pass.

You can probably add a comment at the end of the test stating that if
it has reached that point, then things should be fine.

As an extra check, the test could also verify that everything rendered
correctly (probe some colors from the framebuffer).

Anyway, just some ideas.
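
In case it helps, a very rough skeleton of what I have in mind (API names
from memory, so double-check them against existing piglit tests):

   #include "piglit-util-gl.h"

   PIGLIT_GL_TEST_CONFIG_BEGIN
      config.supports_gl_compat_version = 20;
      config.window_visual = PIGLIT_GL_VISUAL_RGBA | PIGLIT_GL_VISUAL_DOUBLE;
   PIGLIT_GL_TEST_CONFIG_END

   enum piglit_result
   piglit_display(void)
   {
      /* Optionally probe some framebuffer colors here. */
      return PIGLIT_PASS;
   }

   void
   piglit_init(int argc, char **argv)
   {
      /* Replay the reproducer from the bug report here.  If the driver
       * assertion fires, piglit reports the test as "crash". */

      /* Reaching this point without crashing means things are fine. */
      piglit_report_result(PIGLIT_PASS);
   }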

Thanks,
Rafael

> Regards,
> Andrii.
> On Mon, Oct 8, 2018 at 11:46 PM Rafael Antognolli 
> 
> wrote:
> 
> On Tue, Oct 02, 2018 at 07:16:01PM +0300, asimiklit.w...@gmail.com wrote:
> > From: Andrii Simiklit 
> >
> > I guess that when we are calculating the width0, height0, depth0
> > to use for function 'intel_miptree_create' we need to consider
> > the 'base level' like it is done in the
> 'intel_miptree_create_for_teximage'
> > function.
> 
> Hi Andrii, this makes sense to me. I'm also not familiar with this code,
> so I'm not sure this is the right way to solve the issue, but at least
> it's a way.
> 
> You added a simple test case in the bug, do you think you could make
> that a piglit test?
> 
> 
> Thanks,
> Rafael
> 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107987
> > Signed-off-by: Andrii Simiklit 
> > ---
> >  .../drivers/dri/i965/intel_tex_validate.c | 26 ++-
> >  1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c b/src/mesa/
> drivers/dri/i965/intel_tex_validate.c
> > index 72ce83c7ce..37aa8f43ec 100644
> > --- a/src/mesa/drivers/dri/i965/intel_tex_validate.c
> > +++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c
> > @@ -119,8 +119,32 @@ intel_finalize_mipmap_tree(struct brw_context *brw,
> > /* May need to create a new tree:
> >  */
> > if (!intelObj->mt) {
> > +  const unsigned level = firstImage->base.Base.Level;
> >intel_get_image_dims(>base.Base, , , &
> depth);
> > -
> > +  /* Figure out image dimensions at start level. */
> > +  switch(intelObj->base.Target) {
> > +  case GL_TEXTURE_2D_MULTISAMPLE:
> > +  case GL_TEXTURE_2D_MULTISAMPLE_ARRAY:
> > +  case GL_TEXTURE_RECTANGLE:
> > +  case GL_TEXTURE_EXTERNAL_OES:
> > +  assert(level == 0);
> > +  break;
> > +  case GL_TEXTURE_3D:
> > +  depth = depth << level;
> > +  /* Fall through */
> > +  case GL_TEXTURE_2D:
> > +  case GL_TEXTURE_2D_ARRAY:
> > +  case GL_TEXTURE_CUBE_MAP:
> > +  case GL_TEXTURE_CUBE_MAP_ARRAY:
> > +  height = height << level;
> > +  /* Fall through */
> > +  case GL_TEXTURE_1D:
> > +  case GL_TEXTURE_1D_ARRAY:
> > +  width = width << level;
> > +  break;
> > +  default:
> > +  unreachable("Unexpected target");
> > +  }
> >perf_debug("Creating new %s %dx%dx%d %d-level miptree to handle "
> >   "finalized texture miptree.\n",
> >   
> _mesa_get_format_name(firstImage->base.Base.TexFormat),
> > --
> > 2.17.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] anv/gen9+: Initialize new fields in STATE_BASE_ADDRESS

2018-10-11 Thread Rafael Antognolli
On Wed, Oct 10, 2018 at 05:00:33PM -0700, Jordan Justen wrote:
> On 2018-10-10 14:38:23, Rafael Antognolli wrote:
> > On Wed, Oct 10, 2018 at 02:04:11PM -0700, Jordan Justen wrote:
> > > On 2018-10-10 13:45:13, Rafael Antognolli wrote:
> > > > On Wed, Oct 10, 2018 at 01:39:25PM -0700, Jordan Justen wrote:
> > > > > Ref: 263b584d5e4 "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on 
> > > > > Skylake."
> > > > > Signed-off-by: Jordan Justen 
> > > > > ---
> > > > >  src/intel/vulkan/genX_cmd_buffer.c | 12 
> > > > >  1 file changed, 12 insertions(+)
> > > > > 
> > > > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> > > > > b/src/intel/vulkan/genX_cmd_buffer.c
> > > > > index c3a7e5c83c3..43a02f22567 100644
> > > > > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > > > > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > > > > @@ -121,6 +121,18 @@ genX(cmd_buffer_emit_state_base_address)(struct 
> > > > > anv_cmd_buffer *cmd_buffer)
> > > > >sba.IndirectObjectBufferSizeModifyEnable  = true;
> > > > >sba.InstructionBufferSize = 0xf;
> > > > >sba.InstructionBuffersizeModifyEnable = true;
> > > > > +#  endif
> > > > > +#  if (GEN_GEN >= 9)
> > > > > +  sba.BindlessSurfaceStateBaseAddress = (struct anv_address) { 
> > > > > NULL, 0 };
> > > > > +  sba.BindlessSurfaceStateMemoryObjectControlState = GENX(MOCS);
> > > > > +  sba.BindlessSurfaceStateBaseAddressModifyEnable = true;
> > > > > +  sba.BindlessSurfaceStateSize = 0;
> > > > > +#  endif
> > > > > +#  if (GEN_GEN >= 10)
> > > > > +  sba.BindlessSamplerStateBaseAddress = (struct anv_address) { 
> > > > > NULL, 0 };
> > > > > +  sba.BindlessSamplerStateMemoryObjectControlState = GENX(MOCS);
> > > > > +  sba.BindlessSamplerStateBaseAddressModifyEnable = true;
> > > > > +  sba.BindlessSamplerStateBufferSize = 0;
> > > > 
> > > > Do we really need to set all of these fields? AFAIK the ones we don't
> > > > set should be left as 0's anyway, so at least the Address and BufferSize
> > > > should be fine to be left out. I think the MOCS field should be fine
> > > > too, since we are not setting any pointer here. Unless you want to
> > > > be really explicit...
> > > 
> > > Yeah. I don't know that it is helpful since the genxml already sets
> > > the packet length, and I guess things should be zero by default. Maybe
> > > it will make it a little easier to find for bindless in the future?
> > > 
> > > Regarding Jason's comment about the enable bit, I was following Ken's
> > > referenced commit (263b584d5e4) for the similar field in gen9+ on
> > > i965. Maybe it is good to actually force the write to explicitly set
> > > the size to 0?
> > 
> > Yeah, my understanding is that we should set the "modify" bit, so it
> > will actually set the address and size to 0.
> > 
> > > I guess setting MOCS does not follow what Ken did in i965.
> > > 
> > > If we actually do want to set the enable bit, then it might be good to
> > > also leave the fields being explicitly set to zero.
> > > 
> > > My preference would be to just set the fields explicitly. Since we
> > > only specify this packet in one place, it doesn't seem like it adds
> > > too much verbosity.
> > 
> > On most of the genxml code I've seen, we only set the fields that are
> > not zeroed by default.
> 
> I think there are exceptions:
> 
> $ git grep -Ee "\..* = 0;" src/intel/vulkan/genX_*
> 
> I think the rule is more like: if setting the field to 0 is notable,
> then it's better to explicitly set it for informational purposes.
> 
> If the 'enable' bit is set, then I think think the fields that will be
> updated are notable. If the 'enable' bit was not set, then maybe the
> fields are not important. (In that case, perhaps the patch should be
> dropped entirely.)

OK, that's indeed a good explanation.

> Anyway, if Jason doesn't have any further input, I'll go with your
> suggestion of dropping the zeroed fields.

I'm fine with either way, as long as we set the modify enable bit.
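
i.e. (just sketching what I mean) keep only something like:

#  if (GEN_GEN >= 9)
      sba.BindlessSurfaceStateBaseAddressModifyEnable = true;
#  endif
#  if (GEN_GEN >= 10)
      sba.BindlessSamplerStateBaseAddressModifyEnable = true;
#  endif

and let the base address and size fields stay at their zero defaults.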

> -Jordan
> 
> > And if the name of these fields change or
> > something in a future generation, assuming we are still not using them,
> > it's easier to just change the xml for that gen.
> > 
> > So to keep the code consistent with the rest, I would leave it out, but
> > regardless of what you choose,
> > 
> > Reviewed-by: Rafael Antognolli 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965/miptree: Use enum instead of boolean.

2018-10-10 Thread Rafael Antognolli
ISL_AUX_USAGE_NONE happens to be the same as "false", but let's do the
right thing and use the enum.

v2: fix intel_miptree_finish_depth too (Caio)

Reviewed-by: Dylan Baker 
Reviewed-by: Caio Marcelo de Oliveira Filho 
Reviewed-by: Jason Ekstrand 
---

I just added the finish_depth() fix in the same patch and kept the rb's,
since it's a one-liner. And imho it makes sense to have all the fixes in
a single commit. Hopefully it's not an issue.

 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index e32641f4098..69b7a96b9c7 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2727,7 +2727,7 @@ intel_miptree_finish_depth(struct brw_context *brw,
 {
if (depth_written) {
   intel_miptree_finish_write(brw, mt, level, start_layer, layer_count,
- mt->aux_buf != NULL);
+ mt->aux_usage);
}
 }
 
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 708757c47b8..b0333655ad5 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -613,9 +613,10 @@ intel_miptree_access_raw(struct brw_context *brw,
  uint32_t level, uint32_t layer,
  bool write)
 {
-   intel_miptree_prepare_access(brw, mt, level, 1, layer, 1, false, false);
+   intel_miptree_prepare_access(brw, mt, level, 1, layer, 1,
+ISL_AUX_USAGE_NONE, false);
if (write)
-  intel_miptree_finish_write(brw, mt, level, layer, 1, false);
+  intel_miptree_finish_write(brw, mt, level, layer, 1, ISL_AUX_USAGE_NONE);
 }
 
 enum isl_aux_usage
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] anv/gen9+: Initialize new fields in STATE_BASE_ADDRESS

2018-10-10 Thread Rafael Antognolli
On Wed, Oct 10, 2018 at 02:04:11PM -0700, Jordan Justen wrote:
> On 2018-10-10 13:45:13, Rafael Antognolli wrote:
> > On Wed, Oct 10, 2018 at 01:39:25PM -0700, Jordan Justen wrote:
> > > Ref: 263b584d5e4 "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on 
> > > Skylake."
> > > Signed-off-by: Jordan Justen 
> > > ---
> > >  src/intel/vulkan/genX_cmd_buffer.c | 12 
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> > > b/src/intel/vulkan/genX_cmd_buffer.c
> > > index c3a7e5c83c3..43a02f22567 100644
> > > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > > @@ -121,6 +121,18 @@ genX(cmd_buffer_emit_state_base_address)(struct 
> > > anv_cmd_buffer *cmd_buffer)
> > >sba.IndirectObjectBufferSizeModifyEnable  = true;
> > >sba.InstructionBufferSize = 0xf;
> > >sba.InstructionBuffersizeModifyEnable = true;
> > > +#  endif
> > > +#  if (GEN_GEN >= 9)
> > > +  sba.BindlessSurfaceStateBaseAddress = (struct anv_address) { NULL, 
> > > 0 };
> > > +  sba.BindlessSurfaceStateMemoryObjectControlState = GENX(MOCS);
> > > +  sba.BindlessSurfaceStateBaseAddressModifyEnable = true;
> > > +  sba.BindlessSurfaceStateSize = 0;
> > > +#  endif
> > > +#  if (GEN_GEN >= 10)
> > > +  sba.BindlessSamplerStateBaseAddress = (struct anv_address) { NULL, 
> > > 0 };
> > > +  sba.BindlessSamplerStateMemoryObjectControlState = GENX(MOCS);
> > > +  sba.BindlessSamplerStateBaseAddressModifyEnable = true;
> > > +  sba.BindlessSamplerStateBufferSize = 0;
> > 
> > Do we really need to set all of these fields? AFAIK the ones we don't
> > set should be left as 0's anyway, so at least the Address and BufferSize
> > should be fine to be left out. I think the MOCS field should be fine
> > too, since we are not setting any pointer here. Unless you want to
> > be really explicit...
> 
> Yeah. I don't know that it is helpful since the genxml already sets
> the packet length, and I guess things should be zero by default. Maybe
> it will make it a little easier to find for bindless in the future?
> 
> Regarding Jason's comment about the enable bit, I was following Ken's
> referenced commit (263b584d5e4) for the similar field in gen9+ on
> i965. Maybe it is good to actually force the write to explicitly set
> the size to 0?

Yeah, my understanding is that we should set the "modify" bit, so it
will actually set the address and size to 0.

> I guess setting MOCS does not follow what Ken did in i965.
> 
> If we actually do want to set the enable bit, then it might be good to
> also leave the fields being explicitly set to zero.
> 
> My preference would be to just set the fields explicitly. Since we
> only specify this packet in one place, it doesn't seem like it adds
> too much verbosity.

On most of the genxml code I've seen, we only set the fields that are
not zeroed by default. And if the name of these fields change or
something in a future generation, assuming we are still not using them,
it's easier to just change the xml for that gen.

So to keep the code consistent with the rest, I would leave it out, but
regardless of what you choose,

Reviewed-by: Rafael Antognolli 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/gen10+: Initialize new fields in STATE_BASE_ADDRESS

2018-10-10 Thread Rafael Antognolli
On Wed, Oct 10, 2018 at 01:39:26PM -0700, Jordan Justen wrote:
> Ref: 263b584d5e4 "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on 
> Skylake."
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_misc_state.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
> b/src/mesa/drivers/dri/i965/brw_misc_state.c
> index 0895e1f2b7f..9bff2c8ac92 100644
> --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> @@ -688,7 +688,7 @@ brw_upload_state_base_address(struct brw_context *brw)
> * to the bottom 4GB.
> */
>uint32_t mocs_wb = devinfo->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
> -  int pkt_len = devinfo->gen >= 9 ? 19 : 16;
> +  int pkt_len = devinfo->gen >= 10 ? 22 : (devinfo->gen >= 9 ? 19 : 16);
>  
>BEGIN_BATCH(pkt_len);
>OUT_BATCH(CMD_STATE_BASE_ADDRESS << 16 | (pkt_len - 2));
> @@ -718,6 +718,11 @@ brw_upload_state_base_address(struct brw_context *brw)
>   OUT_BATCH(0);
>   OUT_BATCH(0);
>}
> +  if (devinfo->gen >= 10) {
> + OUT_BATCH(1);
> + OUT_BATCH(0);
> + OUT_BATCH(0);
> +  }

Reviewed-by: Rafael Antognolli 

>ADVANCE_BATCH();
> } else if (devinfo->gen >= 6) {
>uint8_t mocs = devinfo->gen == 7 ? GEN7_MOCS_L3 : 0;
> -- 
> 2.19.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] anv/gen9+: Initialize new fields in STATE_BASE_ADDRESS

2018-10-10 Thread Rafael Antognolli
On Wed, Oct 10, 2018 at 01:39:25PM -0700, Jordan Justen wrote:
> Ref: 263b584d5e4 "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on 
> Skylake."
> Signed-off-by: Jordan Justen 
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> b/src/intel/vulkan/genX_cmd_buffer.c
> index c3a7e5c83c3..43a02f22567 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -121,6 +121,18 @@ genX(cmd_buffer_emit_state_base_address)(struct 
> anv_cmd_buffer *cmd_buffer)
>sba.IndirectObjectBufferSizeModifyEnable  = true;
>sba.InstructionBufferSize = 0xf;
>sba.InstructionBuffersizeModifyEnable = true;
> +#  endif
> +#  if (GEN_GEN >= 9)
> +  sba.BindlessSurfaceStateBaseAddress = (struct anv_address) { NULL, 0 };
> +  sba.BindlessSurfaceStateMemoryObjectControlState = GENX(MOCS);
> +  sba.BindlessSurfaceStateBaseAddressModifyEnable = true;
> +  sba.BindlessSurfaceStateSize = 0;
> +#  endif
> +#  if (GEN_GEN >= 10)
> +  sba.BindlessSamplerStateBaseAddress = (struct anv_address) { NULL, 0 };
> +  sba.BindlessSamplerStateMemoryObjectControlState = GENX(MOCS);
> +  sba.BindlessSamplerStateBaseAddressModifyEnable = true;
> +  sba.BindlessSamplerStateBufferSize = 0;

Do we really need to set all of these fields? AFAIK the ones we don't
set should be left as 0's anyway, so at least the Address and BufferSize
should be fine to be left out. I think the MOCS field should be fine
too, since we are not setting any pointer here. Unless you want to
be really explicit...

>  #  endif
> }
>  
> -- 
> 2.19.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Update STATE_BASE_ADDRESS length for gen11+.

2018-10-10 Thread Rafael Antognolli
Please ignore this patch, Jordan's version is the correct one.

On Wed, Oct 10, 2018 at 01:30:52PM -0700, Rafael Antognolli wrote:
> Starting in gen11, we have 3 more dwords used for Bindless Sampler State
> pointer and size.
> 
> Cc: Anuj Phogat 
> 
> ---
>  src/mesa/drivers/dri/i965/brw_misc_state.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
> b/src/mesa/drivers/dri/i965/brw_misc_state.c
> index 0895e1f2b7f..965fbb10c4d 100644
> --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> @@ -688,7 +688,8 @@ brw_upload_state_base_address(struct brw_context *brw)
> * to the bottom 4GB.
> */
>uint32_t mocs_wb = devinfo->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
> -  int pkt_len = devinfo->gen >= 9 ? 19 : 16;
> +  const int pkt_len =
> + devinfo->gen >= 9 ? (devinfo->gen >= 11 ? 22 : 19) : 16;
>  
>BEGIN_BATCH(pkt_len);
>OUT_BATCH(CMD_STATE_BASE_ADDRESS << 16 | (pkt_len - 2));
> @@ -717,6 +718,12 @@ brw_upload_state_base_address(struct brw_context *brw)
>   OUT_BATCH(1);
>   OUT_BATCH(0);
>   OUT_BATCH(0);
> + if (devinfo->gen >= 11) {
> +/* Bindless Sampler State */
> +OUT_BATCH(0);
> +OUT_BATCH(0);
> +OUT_BATCH(0);
> + }
>}
>ADVANCE_BATCH();
> } else if (devinfo->gen >= 6) {
> -- 
> 2.17.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Update STATE_BASE_ADDRESS length for gen11+.

2018-10-10 Thread Rafael Antognolli
Starting in gen11, we have 3 more dwords used for Bindless Sampler State
pointer and size.

Cc: Anuj Phogat 

---
 src/mesa/drivers/dri/i965/brw_misc_state.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 0895e1f2b7f..965fbb10c4d 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -688,7 +688,8 @@ brw_upload_state_base_address(struct brw_context *brw)
* to the bottom 4GB.
*/
   uint32_t mocs_wb = devinfo->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
-  int pkt_len = devinfo->gen >= 9 ? 19 : 16;
+  const int pkt_len =
+ devinfo->gen >= 9 ? (devinfo->gen >= 11 ? 22 : 19) : 16;
 
   BEGIN_BATCH(pkt_len);
   OUT_BATCH(CMD_STATE_BASE_ADDRESS << 16 | (pkt_len - 2));
@@ -717,6 +718,12 @@ brw_upload_state_base_address(struct brw_context *brw)
  OUT_BATCH(1);
  OUT_BATCH(0);
  OUT_BATCH(0);
+ if (devinfo->gen >= 11) {
+/* Bindless Sampler State */
+OUT_BATCH(0);
+OUT_BATCH(0);
+OUT_BATCH(0);
+ }
   }
   ADVANCE_BATCH();
} else if (devinfo->gen >= 6) {
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/miptree: Use enum instead of boolean.

2018-10-10 Thread Rafael Antognolli
ISL_AUX_USAGE_NONE happens to be the same as "false", but let's do the
right thing and use the enum.
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 708757c47b8..b0333655ad5 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -613,9 +613,10 @@ intel_miptree_access_raw(struct brw_context *brw,
  uint32_t level, uint32_t layer,
  bool write)
 {
-   intel_miptree_prepare_access(brw, mt, level, 1, layer, 1, false, false);
+   intel_miptree_prepare_access(brw, mt, level, 1, layer, 1,
+ISL_AUX_USAGE_NONE, false);
if (write)
-  intel_miptree_finish_write(brw, mt, level, layer, 1, false);
+  intel_miptree_finish_write(brw, mt, level, layer, 1, ISL_AUX_USAGE_NONE);
 }
 
 enum isl_aux_usage
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: consider a 'base level' when calculating width0, height0, depth0

2018-10-08 Thread Rafael Antognolli
On Tue, Oct 02, 2018 at 07:16:01PM +0300, asimiklit.w...@gmail.com wrote:
> From: Andrii Simiklit 
> 
> I guess that when we are calculating the width0, height0, depth0
> to use for function 'intel_miptree_create' we need to consider
> the 'base level' like it is done in the 'intel_miptree_create_for_teximage'
> function.

Hi Andrii, this makes sense to me. I'm also not familiar with this code,
so I'm not sure this is the right way to solve the issue, but at least
it's a way.

You added a simple test case in the bug, do you think you could make
that a piglit test?

Thanks,
Rafael

> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107987
> Signed-off-by: Andrii Simiklit 
> ---
>  .../drivers/dri/i965/intel_tex_validate.c | 26 ++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c 
> b/src/mesa/drivers/dri/i965/intel_tex_validate.c
> index 72ce83c7ce..37aa8f43ec 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_validate.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c
> @@ -119,8 +119,32 @@ intel_finalize_mipmap_tree(struct brw_context *brw,
> /* May need to create a new tree:
>  */
> if (!intelObj->mt) {
> +  const unsigned level = firstImage->base.Base.Level;
>intel_get_image_dims(>base.Base, , , );
> -
> +  /* Figure out image dimensions at start level. */
> +  switch(intelObj->base.Target) {
> +  case GL_TEXTURE_2D_MULTISAMPLE:
> +  case GL_TEXTURE_2D_MULTISAMPLE_ARRAY:
> +  case GL_TEXTURE_RECTANGLE:
> +  case GL_TEXTURE_EXTERNAL_OES:
> +  assert(level == 0);
> +  break;
> +  case GL_TEXTURE_3D:
> +  depth = depth << level;
> +  /* Fall through */
> +  case GL_TEXTURE_2D:
> +  case GL_TEXTURE_2D_ARRAY:
> +  case GL_TEXTURE_CUBE_MAP:
> +  case GL_TEXTURE_CUBE_MAP_ARRAY:
> +  height = height << level;
> +  /* Fall through */
> +  case GL_TEXTURE_1D:
> +  case GL_TEXTURE_1D_ARRAY:
> +  width = width << level;
> +  break;
> +  default:
> +  unreachable("Unexpected target");
> +  }
>perf_debug("Creating new %s %dx%dx%d %d-level miptree to handle "
>   "finalized texture miptree.\n",
>   _mesa_get_format_name(firstImage->base.Base.TexFormat),
> -- 
> 2.17.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 07/11] intel: tools: aub_mem: reuse already mapped ppgtt buffers

2018-08-22 Thread Rafael Antognolli
On Wed, Aug 08, 2018 at 11:11:11PM +0100, Lionel Landwerlin wrote:
> On 08/08/18 20:07, Rafael Antognolli wrote:
> > On Tue, Aug 07, 2018 at 06:35:18PM +0100, Lionel Landwerlin wrote:
> > > When we map a PPGTT buffer into a continous address space of aubinator
> > > to be able to inspect it, we currently add it to the list of BOs to
> > > unmap once we're finished. An optimization we can apply it to look up
> > > that list before trying to remap PPGTT buffers again (we already do
> > > this for GGTT buffers).
> > > 
> > > We need to take some care before doing this because the list also
> > > contains GGTT BOs. As GGTT & PPGTT are 2 different address spaces, we
> > > can have matching addresses in both that point to different physical
> > > locations.
> > So, before this change, we could have the same address for PPGTT and
> > GGTT on the map list, but they never clashed because we only added the
> > PPGTT ones at the end, and then unmapped them? Or was there something
> > else preventing them from conflicting?
> 
> Before this change we could get clashes when asking for a GGTT address and
> get a PPGTT one.
> I think we got lucky so far because we use a very small amount of GGTT and
> that didn't happen.

You explained it, I understood and moved on, and forgot about it :-/

But this is:

Reviewed-by: Rafael Antognolli 

> 
> > > This changes adds a flag on the elements of the list of mapped BOs to
> > > differenciate between GGTT & PPGTT, which allows use to reuse that
> > > list when looking up both address spaces.
> > > 
> > > Signed-off-by: Lionel Landwerlin 
> > > ---
> > >   src/intel/tools/aub_mem.c | 16 +++-
> > >   1 file changed, 11 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/src/intel/tools/aub_mem.c b/src/intel/tools/aub_mem.c
> > > index 2d29386e57c..3d4dc8061bd 100644
> > > --- a/src/intel/tools/aub_mem.c
> > > +++ b/src/intel/tools/aub_mem.c
> > > @@ -42,6 +42,7 @@ struct bo_map {
> > >  struct list_head link;
> > >  struct gen_batch_decode_bo bo;
> > >  bool unmap_after_use;
> > > +   bool ppgtt;
> > >   };
> > >   struct ggtt_entry {
> > > @@ -59,10 +60,11 @@ struct phys_mem {
> > >   };
> > >   static void
> > > -add_gtt_bo_map(struct aub_mem *mem, struct gen_batch_decode_bo bo, bool 
> > > unmap_after_use)
> > > +add_gtt_bo_map(struct aub_mem *mem, struct gen_batch_decode_bo bo, bool 
> > > ppgtt, bool unmap_after_use)
> > >   {
> > >  struct bo_map *m = calloc(1, sizeof(*m));
> > > +   m->ppgtt = ppgtt;
> > >  m->bo = bo;
> > >  m->unmap_after_use = unmap_after_use;
> > >  list_add(>link, >maps);
> > > @@ -190,7 +192,7 @@ aub_mem_local_write(void *_mem, uint64_t address,
> > > .addr = address,
> > > .size = size,
> > >  };
> > > -   add_gtt_bo_map(mem, bo, false);
> > > +   add_gtt_bo_map(mem, bo, false, false);
> > >   }
> > >   void
> > > @@ -253,7 +255,7 @@ aub_mem_get_ggtt_bo(void *_mem, uint64_t address)
> > >  struct gen_batch_decode_bo bo = {0};
> > >  list_for_each_entry(struct bo_map, i, >maps, link)
> > > -  if (i->bo.addr <= address && i->bo.addr + i->bo.size > address)
> > > +  if (!i->ppgtt && i->bo.addr <= address && i->bo.addr + i->bo.size 
> > > > address)
> > >return i->bo;
> > >  address &= ~0xfff;
> > > @@ -292,7 +294,7 @@ aub_mem_get_ggtt_bo(void *_mem, uint64_t address)
> > > assert(res != MAP_FAILED);
> > >  }
> > > -   add_gtt_bo_map(mem, bo, true);
> > > +   add_gtt_bo_map(mem, bo, false, true);
> > >  return bo;
> > >   }
> > > @@ -328,6 +330,10 @@ aub_mem_get_ppgtt_bo(void *_mem, uint64_t address)
> > >  struct aub_mem *mem = _mem;
> > >  struct gen_batch_decode_bo bo = {0};
> > > +   list_for_each_entry(struct bo_map, i, >maps, link)
> > > +  if (i->ppgtt && i->bo.addr <= address && i->bo.addr + i->bo.size > 
> > > address)
> > > + return i->bo;
> > > +
> > >  address &= ~0xfff;
> > >  if (!ppgtt_mapped(mem, mem->pml4, address))
> > > @@ -353,7 +359,7 @@ aub_mem_get_ppgtt_bo(void *_mem, uint64_t address)
> > > assert(res != MAP_FAILED);
> > >  }
> > > -   add_gtt_bo_map(mem, bo, true);
> > > +   add_gtt_bo_map(mem, bo, true, true);
> > >  return bo;
> > >   }
> > > -- 
> > > 2.18.0
> > > 
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] intel/tools/aubwrite: Rename "legacy" to "Trace Block".

2018-08-21 Thread Rafael Antognolli
Hopefully it's a little more descriptive, and more accurate.

Cc: Lionel Landwerlin 
---
 src/intel/tools/aub_write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/tools/aub_write.c b/src/intel/tools/aub_write.c
index e92bdaf5ed4..5d59b4ef28a 100644
--- a/src/intel/tools/aub_write.c
+++ b/src/intel/tools/aub_write.c
@@ -478,7 +478,7 @@ aub_write_trace_block(struct aub_file *aub,
ppgtt_lookup(aub, gtt_offset + 
offset),
block_size,

AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL,
-   "legacy");
+   "Trace Block");
   } else {
  dword_out(aub, CMD_AUB_TRACE_HEADER_BLOCK |
 ((aub->addr_bits > 32 ? 6 : 5) - 2));
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] intel/tools/aubwrite: Always use physical addresses for traces.

2018-08-21 Thread Rafael Antognolli
It looks like we can't rely on the simulator to always translate virtual
addresses to physical ones correctly. So let's use physical everywhere.

Since our current GGTT maps virtual to physical addresses in a 1:1 way,
no further changes are required.

Additionally, we have other address spaces not in use right now. So
let's make it easier to switch which one we are using by putting the
default one into the aub_file struct.

Cc: Lionel Landwerlin 
---
 src/intel/tools/aub_write.c | 21 +++--
 src/intel/tools/aub_write.h |  1 +
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/intel/tools/aub_write.c b/src/intel/tools/aub_write.c
index 5d59b4ef28a..fb4e0de93e3 100644
--- a/src/intel/tools/aub_write.c
+++ b/src/intel/tools/aub_write.c
@@ -126,6 +126,7 @@ aub_file_init(struct aub_file *aub, FILE *file, uint16_t 
pci_id)
aub->addr_bits = aub->devinfo.gen >= 8 ? 48 : 32;
 
aub->pml4.phys_addr = PML4_PHYS_ADDR;
+   aub->default_addr_space = AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL;
 }
 
 void
@@ -339,7 +340,7 @@ write_execlists_header(struct aub_file *aub, const char 
*name)
 
/* RENDER_RING */
mem_trace_memory_write_header_out(aub, RENDER_RING_ADDR, RING_SIZE,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ aub->default_addr_space,
  "RENDER RING");
for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
   dword_out(aub, 0);
@@ -348,7 +349,7 @@ write_execlists_header(struct aub_file *aub, const char 
*name)
mem_trace_memory_write_header_out(aub, RENDER_CONTEXT_ADDR,
  PPHWSP_SIZE +
  CONTEXT_RENDER_SIZE,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ aub->default_addr_space,
  "RENDER PPHWSP");
for (uint32_t i = 0; i < PPHWSP_SIZE; i += sizeof(uint32_t))
   dword_out(aub, 0);
@@ -358,7 +359,7 @@ write_execlists_header(struct aub_file *aub, const char 
*name)
 
/* BLITTER_RING */
mem_trace_memory_write_header_out(aub, BLITTER_RING_ADDR, RING_SIZE,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ aub->default_addr_space,
  "BLITTER RING");
for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
   dword_out(aub, 0);
@@ -367,7 +368,7 @@ write_execlists_header(struct aub_file *aub, const char 
*name)
mem_trace_memory_write_header_out(aub, BLITTER_CONTEXT_ADDR,
  PPHWSP_SIZE +
  CONTEXT_OTHER_SIZE,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ aub->default_addr_space,
  "BLITTER PPHWSP");
for (uint32_t i = 0; i < PPHWSP_SIZE; i += sizeof(uint32_t))
   dword_out(aub, 0);
@@ -377,7 +378,7 @@ write_execlists_header(struct aub_file *aub, const char 
*name)
 
/* VIDEO_RING */
mem_trace_memory_write_header_out(aub, VIDEO_RING_ADDR, RING_SIZE,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ aub->default_addr_space,
  "VIDEO RING");
for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
   dword_out(aub, 0);
@@ -386,7 +387,7 @@ write_execlists_header(struct aub_file *aub, const char 
*name)
mem_trace_memory_write_header_out(aub, VIDEO_CONTEXT_ADDR,
  PPHWSP_SIZE +
  CONTEXT_OTHER_SIZE,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ aub->default_addr_space,
  "VIDEO PPHWSP");
for (uint32_t i = 0; i < PPHWSP_SIZE; i += sizeof(uint32_t))
   dword_out(aub, 0);
@@ -477,7 +478,7 @@ aub_write_trace_block(struct aub_file *aub,
  mem_trace_memory_write_header_out(aub,
ppgtt_lookup(aub, gtt_offset + 
offset),
block_size,
-   
AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL,
+   aub->default_addr_space,
"Trace Block");
   } else {
  dword_out(aub, CMD_AUB_TRACE_HEADER_BLOCK |
@@ -542,7 +543,7 @@ aub_dump_execlist(struct aub_file *aub, uint64_t 
batch_offset, int ring_flag)
}
 
mem_trace_memory_write_header_out(aub, ring_addr, 16,
- AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
+ aub->default_addr_space,
  "RING 

Re: [Mesa-dev] [PATCH v2] intel/decoder: fix the possible out of bounds group_iter

2018-08-14 Thread Rafael Antognolli
On Tue, Aug 14, 2018 at 03:36:18PM +0100, Lionel Landwerlin wrote:
> On 14/08/18 12:55, asimiklit.work wrote:
> > Hi Lionel,
> > > Hi Andrii,
> > > 
> > > Again sorry, I don't think this is the right fix.
> > > I'm sending another patch to fix the parsing of
> > > MI_BATCH_BUFFER_START which seems to be the actual issue.
> > > 
> > > Thanks for working on this,
> > Thanks for your fast reply.
> > I agree that it is not correct patch for this issue but anyway
> > "iter_more_groups" function will still work incorrectly
> > for unknown instructions when the "iter->group->variable" field is true.
> > I guess that this case should be fixed.
> > Please let me know if I am incorrect.
> 
> Hey Andrii,
> 
> We shouldn't even get to use the iterator if it's an unknown instruction.
> The decoder should just advance dword by dword until it finds something that
> makes sense again.
> 
> If we run into that problem, I think we should fix the caller.

In that case, would an unreachable() or assert be a good thing to do?

> 
> > 
> > Regards,
> > Andrii.
> > 
> > On 2018-08-14 1:26 PM, Lionel Landwerlin wrote:
> > > Hi Andrii,
> > > 
> > > Again sorry, I don't think this is the right fix.
> > > I'm sending another patch to fix the parsing of
> > > MI_BATCH_BUFFER_START which seems to be the actual issue.
> > > 
> > > Thanks for working on this,
> > > 
> > > -
> > > Lionel
> > > 
> > > On 14/08/18 10:04, asimiklit.w...@gmail.com wrote:
> > > > From: Andrii Simiklit 
> > > > 
> > > > The "gen_group_get_length" function can return a negative value
> > > > and it can lead to the out of bounds group_iter.
> > > > 
> > > > v2: printing of "unknown command type" was added
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
> > > > Signed-off-by: Andrii Simiklit 
> > > > ---
> > > >   src/intel/common/gen_decoder.c | 13 +++--
> > > >   1 file changed, 11 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/src/intel/common/gen_decoder.c
> > > > b/src/intel/common/gen_decoder.c
> > > > index ec0a486..b36facf 100644
> > > > --- a/src/intel/common/gen_decoder.c
> > > > +++ b/src/intel/common/gen_decoder.c
> > > > @@ -770,6 +770,13 @@ gen_group_get_length(struct gen_group
> > > > *group, const uint32_t *p)
> > > >   return -1;
> > > >     }
> > > >  }
> > > > +   default: {
> > > > +  fprintf(stderr, "Unknown command type %u in '%s::%s'\n",
> > > > +    type,
> > > > +    (group->parent && group->parent->name) ?
> > > > group->parent->name : "UNKNOWN",
> > > > +    group->name ? group->name : "UNKNOWN");
> > > > +  break;
> > > > +   }
> > > >  }
> > > >    return -1;
> > > > @@ -803,8 +810,10 @@ static bool
> > > >   iter_more_groups(const struct gen_field_iterator *iter)
> > > >   {
> > > >  if (iter->group->variable) {
> > > > -  return iter_group_offset_bits(iter, iter->group_iter + 1) <
> > > > -  (gen_group_get_length(iter->group, iter->p) * 32);
> > > > +  const int length = gen_group_get_length(iter->group, iter->p);
> > > > +  return length > 0 &&
> > > > +    iter_group_offset_bits(iter, iter->group_iter + 1) <
> > > > +  (length * 32);
> > > >  } else {
> > > >     return (iter->group_iter + 1) < iter->group->group_count ||
> > > >    iter->group->next != NULL;
> > > 
> > > 
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/decoder: fix the possible out of bounds group_iter

2018-08-10 Thread Rafael Antognolli
On Fri, Aug 10, 2018 at 05:37:12PM +0100, Lionel Landwerlin wrote:
> Andrey also opened a bug about this issue :
> https://bugs.freedesktop.org/show_bug.cgi?id=107544
> 
> It feels like it should be fixed on master though. get_length() shouldn't
> return -1 for structs anymore.
> We should probably return 1 at end of get_length() so that the decoder
> prints out "unknown instruction".
> That would help spot potential errors and updates needed to genxml.

Yeah, that makes sense. I saw that we were doing the check for length < 0
somewhere else, so it seemed reasonable to check for that, considering we
can return -1, but I agree that printing "unknown instruction" would be
better.
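
Something along these lines in the caller then, assuming
gen_group_get_length() keeps returning -1 for anything it doesn't know
(only a sketch, the names are made up):

   int length = gen_group_get_length(inst, p);
   if (length < 0) {
      fprintf(stderr, "unknown instruction %08x\n", p[0]);
      length = 1; /* advance one dword and try to resynchronize */
   }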

> -
> Lionel
> 
> 
> On 10/08/18 16:48, Rafael Antognolli wrote:
> > On Thu, Aug 09, 2018 at 03:00:30PM +0300, andrey simiklit wrote:
> > > Hi,
> > > 
> > > Sorry I missed the main thought here.
> > > The "gen_group_get_length" function returns int
> > > but the "iter_group_offset_bits" function returns uint32_t
> > > So uint32_t(int(-32)) = 0xFFFFFFE0U and it looks like unexpected behavior 
> > > for
> > > me:
> > > iter_group_offset_bits(iter, iter->group_iter + 1) < 0xFFFFFFE0U;
> > That's fine, I think the original commit message is good enough to
> > understand this change. Feel free to add this extra bit too if you want,
> > but I don't think it's needed.
> > 
> > Reviewed-by: Rafael Antognolli 
> > 
> > > Regards,
> > > Andrii.
> > > 
> > > On Thu, Aug 9, 2018 at 2:35 PM, Andrii Simiklit 
> > > wrote:
> > > 
> > >  The "gen_group_get_length" function can return a negative value
> > >  and it can lead to the out of bounds group_iter.
> > > 
> > >  Signed-off-by: Andrii Simiklit 
> > >  ---
> > >   src/intel/common/gen_decoder.c | 6 --
> > >   1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > >  diff --git a/src/intel/common/gen_decoder.c 
> > > b/src/intel/common/gen_decoder
> > >  .c
> > >  index ec0a486..f09bd87 100644
> > >  --- a/src/intel/common/gen_decoder.c
> > >  +++ b/src/intel/common/gen_decoder.c
> > >  @@ -803,8 +803,10 @@ static bool
> > >   iter_more_groups(const struct gen_field_iterator *iter)
> > >   {
> > >  if (iter->group->variable) {
> > >  -  return iter_group_offset_bits(iter, iter->group_iter + 1) <
> > >  -  (gen_group_get_length(iter->group, iter->p) * 32);
> > >  +  const int length = gen_group_get_length(iter->group, iter->p);
> > >  +  return length > 0 &&
> > >  + iter_group_offset_bits(iter, iter->group_iter + 1) <
> > >  +  (length * 32);
> > >  } else {
> > > return (iter->group_iter + 1) < iter->group->group_count ||
> > >iter->group->next != NULL;
> > >  --
> > >  2.7.4
> > > 
> > > 
> > > 
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/decoder: fix the possible out of bounds group_iter

2018-08-10 Thread Rafael Antognolli
On Thu, Aug 09, 2018 at 03:00:30PM +0300, andrey simiklit wrote:
> Hi,
> 
> Sorry I missed the main thought here.
> The "gen_group_get_length" function returns int
> but the "iter_group_offset_bits" function returns uint32_t
> So uint32_t(int(-32)) = 0xFFFFFFE0U and it looks like unexpected behavior for
> me:
> iter_group_offset_bits(iter, iter->group_iter + 1) < 0xFFFFFFE0U;

That's fine, I think the original commit message is good enough to
understand this change. Feel free to add this extra bit too if you want,
but I don't think it's needed.
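
For reference, the conversion Andrii describes is just the usual C
arithmetic conversion; a standalone illustration (not part of the patch):

   #include <stdint.h>
   #include <stdio.h>

   int main(void)
   {
      int length = -1;       /* e.g. gen_group_get_length() failing */
      uint32_t offset = 64;

      /* length * 32 (== -32) is converted to uint32_t (0xffffffe0)
       * before the comparison, so this prints 1. */
      printf("%d\n", offset < length * 32);
      return 0;
   }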

Reviewed-by: Rafael Antognolli 

> Regards,
> Andrii.
> 
> On Thu, Aug 9, 2018 at 2:35 PM, Andrii Simiklit 
> wrote:
> 
> The "gen_group_get_length" function can return a negative value
> and it can lead to the out of bounds group_iter.
> 
> Signed-off-by: Andrii Simiklit 
> ---
>  src/intel/common/gen_decoder.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder
> .c
> index ec0a486..f09bd87 100644
> --- a/src/intel/common/gen_decoder.c
> +++ b/src/intel/common/gen_decoder.c
> @@ -803,8 +803,10 @@ static bool
>  iter_more_groups(const struct gen_field_iterator *iter)
>  {
> if (iter->group->variable) {
> -  return iter_group_offset_bits(iter, iter->group_iter + 1) <
> -  (gen_group_get_length(iter->group, iter->p) * 32);
> +  const int length = gen_group_get_length(iter->group, iter->p);
> +  return length > 0 &&
> + iter_group_offset_bits(iter, iter->group_iter + 1) <
> +  (length * 32);
> } else {
>return (iter->group_iter + 1) < iter->group->group_count ||
>   iter->group->next != NULL;
> --
> 2.7.4
> 
> 
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 11/11] intel: aubinator_viewer: add urb view

2018-08-08 Thread Rafael Antognolli
 b/src/intel/tools/aubinator_viewer_urb.h
> @@ -0,0 +1,96 @@
> +#ifndef AUBINATOR_VIEWER_URB_H
> +#define AUBINATOR_VIEWER_URB_H
> +
> +#include "aubinator_viewer.h"
> +
> +#include "imgui.h"
> +
> +struct AubinatorViewerUrb {
> +
> +   float RowHeight;
> +
> +   AubinatorViewerUrb() {
> +  RowHeight = 10.0f;
> +   }
> +
> +   bool _Hovered(const ImVec2& mouse, bool window_hovered,
> + const ImVec2& tl, const ImVec2& br) {
> +  return window_hovered &&
> + tl.x <= mouse.x && tl.y <= mouse.y &&
> + br.x > mouse.x && br.y > mouse.y;
> +   }
> +
> +   void DrawAllocation(const char *label,
> +   int n_stages,
> +   int end_urb_offset,
> +   const char *stage_names[],
> +   const struct aub_decode_urb_stage_state *stages) {
> +  const ImVec2 label_size = ImGui::CalcTextSize("VS entry:  ", NULL, 
> true);
> +  ImVec2 graph_size(ImGui::CalcItemWidth(), 2 * n_stages * label_size.y);
> +
> +  ImGui::BeginChild(label, ImVec2(0, graph_size.y), false);
> +
> +  ImDrawList* draw_list = ImGui::GetWindowDrawList();
> +
> +  const float row_height = MAX2(RowHeight, label_size.y);
> +  const float width = ImGui::GetContentRegionAvailWidth() - label_size.x;
> +  const float alloc_delta = width / end_urb_offset;
> +  const ImVec2 window_pos = ImGui::GetWindowPos();
> +  const ImVec2 mouse_pos = ImGui::GetMousePos();
> +  const bool window_hovered = ImGui::IsWindowHovered();
> +
> +  int const_idx = 0;
> +  for (int s = 0; s < n_stages; s++) {
> + const float x = window_pos.x + label_size.x;
> + const float y = window_pos.y + s * row_height;
> +
> + ImVec2 alloc_pos(window_pos.x, y);
> + ImVec2 alloc_tl(x + stages[s].start * alloc_delta, y);
> + ImVec2 alloc_br(x + (stages[s].start +
> +  stages[s].n_entries * stages[s].size) * 
> alloc_delta,
> + y + row_height);
> + ImVec2 const_tl(x + const_idx * alloc_delta, y);
> + ImVec2 const_br(x + (const_idx + stages[s].const_rd_length) * 
> alloc_delta,
> + y + row_height);
> +
> + char label[40];
> + snprintf(label, sizeof(label), "%s: ", stage_names[s]);
> + draw_list->AddText(alloc_pos, ImGui::GetColorU32(ImGuiCol_Text), 
> label);
> +
> + float r, g, b;
> + bool hovered;
> +
> + /* URB allocation */
> + hovered = _Hovered(mouse_pos, window_hovered, alloc_tl, alloc_br);
> + ImGui::ColorConvertHSVtoRGB((2 * s) * 1.0f / (2 * n_stages),
> + 1.0f, hovered ? 1.0f : 0.8f,
> + r, g, b);
> + draw_list->AddRectFilled(alloc_tl, alloc_br, ImColor(r, g, b));
> + if (hovered) {
> +ImGui::SetTooltip("%s: start=%u end=%u",
> +  stage_names[s],
> +  stages[s].start,
> +  stages[s].n_entries * stages[s].size);

Maybe you should add stages[s].start to the "end" case? It seems
a little inconsistent right now.
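
I.e. something like this (sketch only), so it matches what the constant
URB tooltip below does with rd_offset/rd_length:

   ImGui::SetTooltip("%s: start=%u end=%u",
                     stage_names[s],
                     stages[s].start,
                     stages[s].start + stages[s].n_entries * stages[s].size);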

Other than that, and taking into account how much I understand about
imgui, consider this:

Reviewed-by: Rafael Antognolli 

> + }
> +
> + /* Constant URB input */
> + hovered = _Hovered(mouse_pos, window_hovered, const_tl, const_br);
> + ImGui::ColorConvertHSVtoRGB((2 * s + 1) * 1.0f / (2 * n_stages),
> + 1.0f, hovered ? 1.0f : 0.8f,
> + r, g, b);
> + draw_list->AddRectFilled(const_tl, const_br, ImColor(r, g, b));
> + if (hovered) {
> +ImGui::SetTooltip("%s constant: start=%u end=%u",
> +  stage_names[s],
> +  stages[s].rd_offset,
> +  stages[s].rd_offset + stages[s].rd_length);
> + }
> +
> + const_idx += stages[s].const_rd_length;
> +  }
> +
> +  ImGui::EndChild();
> +   }
> +};
> +
> +#endif /* AUBINATOR_VIEWER_URB_H */
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 10/11] intel: aubinator_viewer: store urb state during decoding

2018-08-08 Thread Rafael Antognolli
I'm not that familiar with this code yet, so take this review with a
grain of salt, but it looks good to me.

Reviewed-by: Rafael Antognolli 

Just a few comments below but nothing really important.

On Tue, Aug 07, 2018 at 06:35:21PM +0100, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aubinator_viewer.h   |  26 
>  src/intel/tools/aubinator_viewer_decoder.cpp | 150 ---
>  2 files changed, 153 insertions(+), 23 deletions(-)
> 
> diff --git a/src/intel/tools/aubinator_viewer.h 
> b/src/intel/tools/aubinator_viewer.h
> index 2d89d9cf658..4a030efc0d0 100644
> --- a/src/intel/tools/aubinator_viewer.h
> +++ b/src/intel/tools/aubinator_viewer.h
> @@ -33,12 +33,35 @@ struct aub_viewer_decode_cfg {
>  show_dwords(true) {}
>  };
>  
> +enum aub_decode_stage {
> +   AUB_DECODE_STAGE_VS,
> +   AUB_DECODE_STAGE_HS,
> +   AUB_DECODE_STAGE_DS,
> +   AUB_DECODE_STAGE_GS,
> +   AUB_DECODE_STAGE_PS,
> +   AUB_DECODE_STAGE_CS,
> +   AUB_DECODE_N_STAGE,
> +};
> +
> +struct aub_decode_urb_stage_state {
> +   uint32_t start;
> +   uint32_t size;
> +   uint32_t n_entries;
> +
> +   uint32_t const_rd_length;
> +   uint32_t rd_offset;
> +   uint32_t rd_length;
> +   uint32_t wr_offset;
> +   uint32_t wr_length;
> +};
> +
>  struct aub_viewer_decode_ctx {
> struct gen_batch_decode_bo (*get_bo)(void *user_data, uint64_t address);
> unsigned (*get_state_size)(void *user_data,
>uint32_t offset_from_dynamic_state_base_addr);
>  
> void (*display_shader)(void *user_data, const char *shader_desc, uint64_t 
> address);
> +   void (*display_urb)(void *user_data, const struct 
> aub_decode_urb_stage_state *stages);
> void (*edit_address)(void *user_data, uint64_t address, uint32_t length);
>  
> void *user_data;
> @@ -53,6 +76,9 @@ struct aub_viewer_decode_ctx {
> uint64_t dynamic_base;
> uint64_t instruction_base;
>  
> +   enum aub_decode_stage stage;
> +   uint32_t end_urb_offset;
> +   struct aub_decode_urb_stage_state urb_stages[AUB_DECODE_N_STAGE];
>  };
>  
>  void aub_viewer_decode_ctx_init(struct aub_viewer_decode_ctx *ctx,
> diff --git a/src/intel/tools/aubinator_viewer_decoder.cpp 
> b/src/intel/tools/aubinator_viewer_decoder.cpp
> index a2ea3ba4a64..273bc2da376 100644
> --- a/src/intel/tools/aubinator_viewer_decoder.cpp
> +++ b/src/intel/tools/aubinator_viewer_decoder.cpp
> @@ -695,38 +695,125 @@ decode_load_register_imm(struct aub_viewer_decode_ctx 
> *ctx,
> }
>  }
>  
> +static void
> +decode_3dprimitive(struct aub_viewer_decode_ctx *ctx,
> +   struct gen_group *inst,
> +   const uint32_t *p)
> +{
> +   if (ctx->display_urb) {
> +  if (ImGui::Button("Show URB"))
> + ctx->display_urb(ctx->user_data, ctx->urb_stages);
> +   }
> +}
> +
> +static void
> +handle_urb(struct aub_viewer_decode_ctx *ctx,
> +   struct gen_group *inst,
> +   const uint32_t *p)
> +{
> +   struct gen_field_iterator iter;
> +   gen_field_iterator_init(&iter, inst, p, 0, false);
> +   while (gen_field_iterator_next(&iter)) {
> +  if (strstr(iter.name, "URB Starting Address")) {
> + ctx->urb_stages[ctx->stage].start = iter.raw_value * 8192;
> +  } else if (strstr(iter.name, "URB Entry Allocation Size")) {
> + ctx->urb_stages[ctx->stage].size = (iter.raw_value + 1) * 64;
> +  } else if (strstr(iter.name, "Number of URB Entries")) {
> + ctx->urb_stages[ctx->stage].n_entries = iter.raw_value;
> +  }
> +   }
> +
> +   ctx->end_urb_offset = MAX2(ctx->urb_stages[ctx->stage].start +
> +  ctx->urb_stages[ctx->stage].n_entries *
> +  ctx->urb_stages[ctx->stage].size,
> +  ctx->end_urb_offset);
> +}
> +
> +static void
> +handle_urb_read(struct aub_viewer_decode_ctx *ctx,
> +struct gen_group *inst,
> +const uint32_t *p)
> +{
> +   struct gen_field_iterator iter;
> +   gen_field_iterator_init(&iter, inst, p, 0, false);
> +   while (gen_field_iterator_next(&iter)) {
> +  /* Workaround the "Force * URB Entry Read Length" fields */
> +  if (iter.end_bit - iter.start_bit < 2)
> + continue;
> +
> +  if (strstr(iter.name, "URB Entry Read Offset")) {
> + ctx->urb_stages[ctx->stage].rd_offset = iter.raw_value * 32;
> +  } else if (strstr(iter.name, "URB Entry Read Length")) {
> + ctx->urb_stages[ctx->stage].rd_length = 

Re: [Mesa-dev] [PATCH v2 07/11] intel: tools: aub_mem: reuse already mapped ppgtt buffers

2018-08-08 Thread Rafael Antognolli
On Tue, Aug 07, 2018 at 06:35:18PM +0100, Lionel Landwerlin wrote:
> When we map a PPGTT buffer into a continuous address space of aubinator
> to be able to inspect it, we currently add it to the list of BOs to
> unmap once we're finished. An optimization we can apply is to look up
> that list before trying to remap PPGTT buffers again (we already do
> this for GGTT buffers).
> 
> We need to take some care before doing this because the list also
> contains GGTT BOs. As GGTT & PPGTT are 2 different address spaces, we
> can have matching addresses in both that point to different physical
> locations.

So, before this change, we could have the same address for PPGTT and
GGTT on the map list, but they never clashed because we only added the
PPGTT ones at the end, and then unmapped them? Or was there something
else preventing them from conflicting?

> This change adds a flag on the elements of the list of mapped BOs to
> differentiate between GGTT & PPGTT, which allows us to reuse that
> list when looking up both address spaces.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aub_mem.c | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/src/intel/tools/aub_mem.c b/src/intel/tools/aub_mem.c
> index 2d29386e57c..3d4dc8061bd 100644
> --- a/src/intel/tools/aub_mem.c
> +++ b/src/intel/tools/aub_mem.c
> @@ -42,6 +42,7 @@ struct bo_map {
> struct list_head link;
> struct gen_batch_decode_bo bo;
> bool unmap_after_use;
> +   bool ppgtt;
>  };
>  
>  struct ggtt_entry {
> @@ -59,10 +60,11 @@ struct phys_mem {
>  };
>  
>  static void
> -add_gtt_bo_map(struct aub_mem *mem, struct gen_batch_decode_bo bo, bool 
> unmap_after_use)
> +add_gtt_bo_map(struct aub_mem *mem, struct gen_batch_decode_bo bo, bool 
> ppgtt, bool unmap_after_use)
>  {
> struct bo_map *m = calloc(1, sizeof(*m));
>  
> +   m->ppgtt = ppgtt;
> m->bo = bo;
> m->unmap_after_use = unmap_after_use;
> list_add(&m->link, &mem->maps);
> @@ -190,7 +192,7 @@ aub_mem_local_write(void *_mem, uint64_t address,
>.addr = address,
>.size = size,
> };
> -   add_gtt_bo_map(mem, bo, false);
> +   add_gtt_bo_map(mem, bo, false, false);
>  }
>  
>  void
> @@ -253,7 +255,7 @@ aub_mem_get_ggtt_bo(void *_mem, uint64_t address)
> struct gen_batch_decode_bo bo = {0};
>  
> list_for_each_entry(struct bo_map, i, &mem->maps, link)
> -  if (i->bo.addr <= address && i->bo.addr + i->bo.size > address)
> +  if (!i->ppgtt && i->bo.addr <= address && i->bo.addr + i->bo.size > 
> address)
>   return i->bo;
>  
> address &= ~0xfff;
> @@ -292,7 +294,7 @@ aub_mem_get_ggtt_bo(void *_mem, uint64_t address)
>assert(res != MAP_FAILED);
> }
>  
> -   add_gtt_bo_map(mem, bo, true);
> +   add_gtt_bo_map(mem, bo, false, true);
>  
> return bo;
>  }
> @@ -328,6 +330,10 @@ aub_mem_get_ppgtt_bo(void *_mem, uint64_t address)
> struct aub_mem *mem = _mem;
> struct gen_batch_decode_bo bo = {0};
>  
> +   list_for_each_entry(struct bo_map, i, &mem->maps, link)
> +  if (i->ppgtt && i->bo.addr <= address && i->bo.addr + i->bo.size > 
> address)
> + return i->bo;
> +
> address &= ~0xfff;
>  
> if (!ppgtt_mapped(mem, mem->pml4, address))
> @@ -353,7 +359,7 @@ aub_mem_get_ppgtt_bo(void *_mem, uint64_t address)
>assert(res != MAP_FAILED);
> }
>  
> -   add_gtt_bo_map(mem, bo, true);
> +   add_gtt_bo_map(mem, bo, true, true);
>  
> return bo;
>  }
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 05/11] intel: tools: create libaub

2018-08-08 Thread Rafael Antognolli
Reviewed-by: Rafael Antognolli 

On Tue, Aug 07, 2018 at 06:35:16PM +0100, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/meson.build | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/tools/meson.build b/src/intel/tools/meson.build
> index d749a80afed..258bf7011b3 100644
> --- a/src/intel/tools/meson.build
> +++ b/src/intel/tools/meson.build
> @@ -18,12 +18,22 @@
>  # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
> THE
>  # SOFTWARE.
>  
> +
> +libaub = static_library(
> +  'aub',
> +  files('aub_read.c', 'aub_mem.c'),
> +  include_directories : [inc_common, inc_intel],
> +  link_with : [libintel_common, libintel_dev, libmesa_util],
> +  c_args : [c_vis_args, no_override_init_args],
> +  install : false
> +)
> +
>  aubinator = executable(
>'aubinator',
> -  files('aubinator.c', 'intel_aub.h', 'aub_read.h', 'aub_read.c', 
> 'aub_mem.h', 'aub_mem.c'),
> +  files('aubinator.c'),
>dependencies : [dep_expat, dep_zlib, dep_dl, dep_thread, dep_m],
>include_directories : [inc_common, inc_intel],
> -  link_with : [libintel_common, libintel_compiler, libintel_dev, 
> libmesa_util],
> +  link_with : [libintel_common, libintel_compiler, libintel_dev, 
> libmesa_util, libaub],
>c_args : [c_vis_args, no_override_init_args],
>build_by_default : true,
>install : true
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 02/11] util: rb_tree: add safe iterators

2018-08-07 Thread Rafael Antognolli
On Tue, Aug 07, 2018 at 06:35:13PM +0100, Lionel Landwerlin wrote:
> v2: Add helper to make iterators more readable (Rafael)
> Fix rev iterator bug (Rafael)
> 
> Signed-off-by: Lionel Landwerlin 

Reviewed-by: Rafael Antognolli 

> ---
>  src/util/rb_tree.h | 58 ++
>  1 file changed, 58 insertions(+)
> 
> diff --git a/src/util/rb_tree.h b/src/util/rb_tree.h
> index c77e9255ea2..1e8aeb4a7b2 100644
> --- a/src/util/rb_tree.h
> +++ b/src/util/rb_tree.h
> @@ -227,6 +227,30 @@ struct rb_node *rb_node_next(struct rb_node *node);
>  /** Get the next previous (to the left) in the tree or NULL */
>  struct rb_node *rb_node_prev(struct rb_node *node);
>  
> +/** Get the next node if available or the same node again.
> + *
> + * \param   typeThe type of the containing data structure
> + *
> + * \param   nodeThe variable name for current node in the iteration;
> + *  this will be declared as a pointer to \p type
> + *
> + * \param   field   The rb_node field in containing data structure
> + */
> +#define rb_tree_node_next_if_available(type, node, field) \
> +   (&node->field != NULL) ? rb_node_data(type, rb_node_next(&node->field), field) : node
> +
> +/** Get the previous node if available or the same node again.
> + *
> + * \param   typeThe type of the containing data structure
> + *
> + * \param   nodeThe variable name for current node in the iteration;
> + *  this will be declared as a pointer to \p type
> + *
> + * \param   field   The rb_node field in containing data structure
> + */
> +#define rb_tree_node_prev_if_available(type, node, field) \
> +   (&node->field != NULL) ? rb_node_data(type, rb_node_prev(&node->field), field) : node
> +
>  /** Iterate over the nodes in the tree
>   *
>   * \param   typeThe type of the containing data structure
> @@ -243,6 +267,23 @@ struct rb_node *rb_node_prev(struct rb_node *node);
>  &node->field != NULL; \
>  node = rb_node_data(type, rb_node_next(&node->field), field))
>  
> +/** Iterate over the nodes in the tree, allowing the current node to be freed
> + *
> + * \param   typeThe type of the containing data structure
> + *
> + * \param   nodeThe variable name for current node in the iteration;
> + *  this will be declared as a pointer to \p type
> + *
> + * \param   T   The red-black tree
> + *
> + * \param   field   The rb_node field in containing data structure
> + */
> +#define rb_tree_foreach_safe(type, node, T, field) \
> +   for (type *node = rb_node_data(type, rb_tree_first(T), field), \
> +   *__next = rb_tree_node_next_if_available(type, node, field); \
> +        &node->field != NULL; \
> +node = __next, __next = rb_tree_node_next_if_available(type, node, 
> field))
> +
>  /** Iterate over the nodes in the tree in reverse
>   *
>   * \param   typeThe type of the containing data structure
> @@ -259,6 +300,23 @@ struct rb_node *rb_node_prev(struct rb_node *node);
>  &node->field != NULL; \
>  node = rb_node_data(type, rb_node_prev(&node->field), field))
>  
> +/** Iterate over the nodes in the tree in reverse, allowing the current node 
> to be freed
> + *
> + * \param   typeThe type of the containing data structure
> + *
> + * \param   nodeThe variable name for current node in the iteration;
> + *  this will be declared as a pointer to \p type
> + *
> + * \param   T   The red-black tree
> + *
> + * \param   field   The rb_node field in containing data structure
> + */
> +#define rb_tree_foreach_rev_safe(type, node, T, field) \
> +   for (type *node = rb_node_data(type, rb_tree_last(T), field), \
> +   *__prev = rb_tree_node_prev_if_available(type, node, field);  \
> +        &node->field != NULL; \
> +node = __prev, __prev = rb_tree_node_prev_if_available(type, node, 
> field))
> +
>  /** Validate a red-black tree
>   *
>   * This function walks the tree and validates that this is a valid red-
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 09/11] intel: tools: add aubinator viewer

2018-08-07 Thread Rafael Antognolli
On Tue, Aug 07, 2018 at 06:35:20PM +0100, Lionel Landwerlin wrote:
> A graphical user interface version of aubinator.
> Allows you to :
> 
>- simultaneously look at multiple points in the aub file (using all
>  the goodness of the existing decoding in aubinator)
> 
>- edit an aub file
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aubinator_viewer.cpp | 1118 ++
>  src/intel/tools/aubinator_viewer.h   |   71 ++
>  src/intel/tools/aubinator_viewer_decoder.cpp |  860 ++
>  src/intel/tools/imgui/imgui_memory_editor.h  |  704 +++
>  src/intel/tools/meson.build  |   12 +
>  5 files changed, 2765 insertions(+)
>  create mode 100644 src/intel/tools/aubinator_viewer.cpp
>  create mode 100644 src/intel/tools/aubinator_viewer.h
>  create mode 100644 src/intel/tools/aubinator_viewer_decoder.cpp
>  create mode 100644 src/intel/tools/imgui/imgui_memory_editor.h
> 

[...]

> +int main(int argc, char *argv[])
> +{
> +   int c, i;
> +   bool help = false;
> +   const struct option aubinator_opts[] = {
> +  { "help",  no_argument,   (int *) , 
> true },
> +  { "xml",   required_argument, NULL,  
> 'x' },
> +  { NULL,0, NULL,  0 
> }
> +   };
> +
> +   memset(&context, 0, sizeof(context));
> +
> +   i = 0;
> +   while ((c = getopt_long(argc, argv, "x:s:", aubinator_opts, &i)) != -1) {
> +  switch (c) {
> +  case 'x':
> + context.xml_path = strdup(optarg);
> + break;
> +  default:
> + break;
> +  }
> +   }
> +
> +   if (optind < argc)
> +  context.input_file = argv[optind];
> +
> +   if (help || !context.input_file) {
> +  print_help(argv[0], stderr);
> +  exit(0);
> +   }
> +
> +   context.file = aub_file_open(context.input_file);
> +
> +   gtk_init(NULL, NULL);
> +
> +   context.gtk_window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
> +   gtk_window_set_title(GTK_WINDOW(context.gtk_window), "GPUTop");

Ah, I guess this is why I'm seeing "GPUTop" at the title bar :P

> +   g_signal_connect(context.gtk_window, "delete-event", 
> G_CALLBACK(gtk_main_quit), NULL);
> +   gtk_window_resize(GTK_WINDOW(context.gtk_window), 1280, 720);
> +
> +   GtkWidget* gl_area = gtk_gl_area_new();
> +   g_signal_connect(gl_area, "render", G_CALLBACK(repaint_area), NULL);
> +   gtk_container_add(GTK_CONTAINER(context.gtk_window), gl_area);
> +
> +   gtk_widget_show_all(context.gtk_window);
> +
> +   ImGui::CreateContext();
> +   ImGui_ImplGtk3_Init(gl_area, true);
> +   ImGui_ImplOpenGL3_Init("#version 130");
> +
> +   init_ui();
> +
> +   gtk_main();
> +
> +   ImGui_ImplOpenGL3_Shutdown();
> +   ImGui_ImplGtk3_Shutdown();
> +   ImGui::DestroyContext();
> +
> +   free(context.xml_path);
> +
> +   return EXIT_SUCCESS;
> +}

I'm not sure what's going on here, but now when I close the window by
clicking on the "x" at the top right I see this message:

aubinator_viewer: ../../src/libepoxy/src/dispatch_common.c:863: 
epoxy_get_proc_address: Assertion `0 && "Couldn't find current GLX or EGL 
context.\n"' failed.

That's the only issue I've seen so far, and since it's only when closing
the window, I guess we can fix that later.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 13/14] intel: tools: add aubinator viewer

2018-08-06 Thread Rafael Antognolli
This is a really nice tool, and I'm looking forward to seeing the other
features you teased us with, as well as to helping improve it.

With that said, I don't know much about the imgui API and can't really
do much to help reviewing it yet. But I would like to have it landed
anyway.

Patches 13 and 14 are:

Acked-by: Rafael Antognolli  

On Thu, Aug 02, 2018 at 10:39:25AM +0100, Lionel Landwerlin wrote:
> A graphical user interface version of aubinator.
> Allows you to :
> 
>- simultaneously look at multiple points in the aub file (using all
>  the goodness of the existing decoding in aubinator)
> 
>- edit an aub file
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aubinator_viewer.cpp | 1173 ++
>  src/intel/tools/aubinator_viewer.h   |   61 +
>  src/intel/tools/aubinator_viewer_decoder.cpp |  849 +
>  src/intel/tools/imgui/imgui_memory_editor.h  |  426 +++
>  src/intel/tools/meson.build  |   13 +
>  5 files changed, 2522 insertions(+)
>  create mode 100644 src/intel/tools/aubinator_viewer.cpp
>  create mode 100644 src/intel/tools/aubinator_viewer.h
>  create mode 100644 src/intel/tools/aubinator_viewer_decoder.cpp
>  create mode 100644 src/intel/tools/imgui/imgui_memory_editor.h
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 12/14] build: new tool option for intel ui tools

2018-08-06 Thread Rafael Antognolli
Ugh, I just replied with the whole message without cutting it out, so in
case it doesn't reach the ML, this is what I wanted to say:

On Mon, Aug 06, 2018 at 11:19:20AM -0700, Rafael Antognolli wrote:
> I would change the commit summary line to make it clear we are importing
> imgui code, instead of just adding a build option.
> 
> With that,
> 
> Acked-by: Rafael Antognolli 
> 
> On Thu, Aug 02, 2018 at 10:39:24AM +0100, Lionel Landwerlin wrote:
> > We want to add a new UI tool to decode aub files. This will use the
> > Dear ImGui library to render its interface.
> > 
> > The main way to use ImGui is to embed its source code at a particular
> > revision. Most embedding projects have to do a bit of integration
> > which is really specific to one's project. In our case the only
> > modification is to include libepoxy instead of gl3w.
> > 
> > The import was done at this commit (https://github.com/ocornut/imgui) :
> > 
> > commit 6211f40f3d903dd9df961256e044029c49793aa3
> > Author: omar 
> > Date:   Fri Jul 27 12:29:33 2018 +0200
> > 
> > Internals: Drag and Drop: default drop preview use a narrower clipping 
> > rectangle (no effect here, but other branches uses a narrow clipping 
> > rectangle that was too small so this is a fix for it) + Comments
> > 
> > Signed-off-by: Lionel Landwerlin 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/14] intel: tools: aubmem: map gtt data to aub file

2018-08-06 Thread Rafael Antognolli
On Thu, Aug 02, 2018 at 10:39:23AM +0100, Lionel Landwerlin wrote:
> This will allow the aubinator viewer tool to modify the aub data that
> was loaded at a particular gtt address.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aub_mem.c | 29 +
>  src/intel/tools/aub_mem.h |  6 ++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/src/intel/tools/aub_mem.c b/src/intel/tools/aub_mem.c
> index 788a2461130..2d29386e57c 100644
> --- a/src/intel/tools/aub_mem.c
> +++ b/src/intel/tools/aub_mem.c
> @@ -55,6 +55,7 @@ struct phys_mem {
> uint64_t fd_offset;
> uint64_t phys_addr;
> uint8_t *data;
> +   const uint8_t *aub_data;
>  };
>  
>  static void
> @@ -220,6 +221,7 @@ aub_mem_phys_write(void *_mem, uint64_t phys_address,
>uint32_t size_this_page = MIN2(to_write, 4096 - offset);
>to_write -= size_this_page;
>memcpy(pmem->data + offset, data, size_this_page);
> +  pmem->aub_data = data - offset;
>data = (const uint8_t *)data + size_this_page;
> }
>  }
> @@ -389,3 +391,30 @@ aub_mem_fini(struct aub_mem *mem)
> close(mem->mem_fd);
> mem->mem_fd = -1;
>  }
> +
> +struct gen_batch_decode_bo
> +aub_mem_get_phys_addr_data(struct aub_mem *mem, uint64_t phys_addr)
> +{
> +   struct phys_mem *page = search_phys_mem(mem, phys_addr);
> +   return page ?
> +  (struct gen_batch_decode_bo) { .map = page->data, .addr = 
> page->phys_addr, .size = 4096 } :

Looks like we are starting to use gen_batch_decode_bo as a generic
address pointer now (to physical, virtual or aub data memory), so
maybe at some point we might want to change that name.

Doesn't need to be done in this patch, though.

Reviewed-by: Rafael Antognolli 

> +  (struct gen_batch_decode_bo) {};
> +}
> +
> +struct gen_batch_decode_bo
> +aub_mem_get_ppgtt_addr_data(struct aub_mem *mem, uint64_t virt_addr)
> +{
> +   struct phys_mem *page = ppgtt_walk(mem, mem->pml4, virt_addr);
> +   return page ?
> +  (struct gen_batch_decode_bo) { .map = page->data, .addr = virt_addr & 
> ~((1ULL << 12) - 1), .size = 4096 } :
> +  (struct gen_batch_decode_bo) {};
> +}
> +
> +struct gen_batch_decode_bo
> +aub_mem_get_ppgtt_addr_aub_data(struct aub_mem *mem, uint64_t virt_addr)
> +{
> +   struct phys_mem *page = ppgtt_walk(mem, mem->pml4, virt_addr);
> +   return page ?
> +  (struct gen_batch_decode_bo) { .map = page->aub_data, .addr = 
> virt_addr & ~((1ULL << 12) - 1), .size = 4096 } :
> +  (struct gen_batch_decode_bo) {};
> +}
> diff --git a/src/intel/tools/aub_mem.h b/src/intel/tools/aub_mem.h
> index 98e64214b98..1d73d3340f2 100644
> --- a/src/intel/tools/aub_mem.h
> +++ b/src/intel/tools/aub_mem.h
> @@ -65,6 +65,12 @@ void aub_mem_local_write(void *mem, uint64_t virt_address,
>  struct gen_batch_decode_bo aub_mem_get_ggtt_bo(void *mem, uint64_t address);
>  struct gen_batch_decode_bo aub_mem_get_ppgtt_bo(void *mem, uint64_t address);
>  
> +struct gen_batch_decode_bo aub_mem_get_phys_addr_data(struct aub_mem *mem, 
> uint64_t phys_addr);
> +struct gen_batch_decode_bo aub_mem_get_ppgtt_addr_data(struct aub_mem *mem, 
> uint64_t virt_addr);
> +
> +struct gen_batch_decode_bo aub_mem_get_ppgtt_addr_aub_data(struct aub_mem 
> *mem, uint64_t virt_addr);
> +
> +
>  #ifdef __cplusplus
>  }
>  #endif
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/14] intel: tools: split memory management out of aubinator

2018-08-03 Thread Rafael Antognolli
This also looks like a harmless and useful refactor.

Reviewed-by: Rafael Antognolli 

On Thu, Aug 02, 2018 at 10:39:21AM +0100, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aub_mem.c   | 391 
>  src/intel/tools/aub_mem.h   |  72 +++
>  src/intel/tools/aubinator.c | 379 +++---
>  src/intel/tools/meson.build |   2 +-
>  4 files changed, 491 insertions(+), 353 deletions(-)
>  create mode 100644 src/intel/tools/aub_mem.c
>  create mode 100644 src/intel/tools/aub_mem.h
> 
> diff --git a/src/intel/tools/aub_mem.c b/src/intel/tools/aub_mem.c
> new file mode 100644
> index 000..788a2461130
> --- /dev/null
> +++ b/src/intel/tools/aub_mem.c
> @@ -0,0 +1,391 @@
> +/*
> + * Copyright © 2016-2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "aub_mem.h"
> +
> +#ifndef HAVE_MEMFD_CREATE
> +#include 
> +
> +static inline int
> +memfd_create(const char *name, unsigned int flags)
> +{
> +   return syscall(SYS_memfd_create, name, flags);
> +}
> +#endif
> +
> +struct bo_map {
> +   struct list_head link;
> +   struct gen_batch_decode_bo bo;
> +   bool unmap_after_use;
> +};
> +
> +struct ggtt_entry {
> +   struct rb_node node;
> +   uint64_t virt_addr;
> +   uint64_t phys_addr;
> +};
> +
> +struct phys_mem {
> +   struct rb_node node;
> +   uint64_t fd_offset;
> +   uint64_t phys_addr;
> +   uint8_t *data;
> +};
> +
> +static void
> +add_gtt_bo_map(struct aub_mem *mem, struct gen_batch_decode_bo bo, bool 
> unmap_after_use)
> +{
> +   struct bo_map *m = calloc(1, sizeof(*m));
> +
> +   m->bo = bo;
> +   m->unmap_after_use = unmap_after_use;
> +   list_add(&m->link, &mem->maps);
> +}
> +
> +void
> +aub_mem_clear_bo_maps(struct aub_mem *mem)
> +{
> +   list_for_each_entry_safe(struct bo_map, i, &mem->maps, link) {
> +  if (i->unmap_after_use)
> + munmap((void *)i->bo.map, i->bo.size);
> +  list_del(&i->link);
> +  free(i);
> +   }
> +}
> +
> +static inline struct ggtt_entry *
> +ggtt_entry_next(struct ggtt_entry *entry)
> +{
> +   if (!entry)
> +  return NULL;
> +   struct rb_node *node = rb_node_next(&entry->node);
> +   if (!node)
> +  return NULL;
> +   return rb_node_data(struct ggtt_entry, node, node);
> +}
> +
> +static inline int
> +cmp_uint64(uint64_t a, uint64_t b)
> +{
> +   if (a < b)
> +  return -1;
> +   if (a > b)
> +  return 1;
> +   return 0;
> +}
> +
> +static inline int
> +cmp_ggtt_entry(const struct rb_node *node, const void *addr)
> +{
> +   struct ggtt_entry *entry = rb_node_data(struct ggtt_entry, node, node);
> +   return cmp_uint64(entry->virt_addr, *(const uint64_t *)addr);
> +}
> +
> +static struct ggtt_entry *
> +ensure_ggtt_entry(struct aub_mem *mem, uint64_t virt_addr)
> +{
> +   struct rb_node *node = rb_tree_search_sloppy(&mem->ggtt, &virt_addr,
> +cmp_ggtt_entry);
> +   int cmp = 0;
> +   if (!node || (cmp = cmp_ggtt_entry(node, &virt_addr))) {
> +  struct ggtt_entry *new_entry = calloc(1, sizeof(*new_entry));
> +  new_entry->virt_addr = virt_addr;
> +  rb_tree_insert_at(&mem->ggtt, node, &new_entry->node, cmp > 0);
> +  node = &new_entry->node;
> +   }
> +
> +   return rb_node_data(struct ggtt_entry, node, node);
> +}
> +
> 

Re: [Mesa-dev] [PATCH 10/14] intel: tools: aubwrite: wrap function declarations for c++

2018-08-03 Thread Rafael Antognolli
Reviewed-by: Rafael Antognolli 

On Thu, Aug 02, 2018 at 10:39:22AM +0100, Lionel Landwerlin wrote:
> ---
>  src/intel/tools/aub_write.h | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/src/intel/tools/aub_write.h b/src/intel/tools/aub_write.h
> index b421679b9eb..6a09c1747b9 100644
> --- a/src/intel/tools/aub_write.h
> +++ b/src/intel/tools/aub_write.h
> @@ -31,6 +31,10 @@
>  #include "dev/gen_device_info.h"
>  #include "common/gen_gem.h"
>  
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>  struct aub_ppgtt_table {
> uint64_t phys_addr;
> struct aub_ppgtt_table *subtables[512];
> @@ -78,4 +82,8 @@ void aub_write_trace_block(struct aub_file *aub,
>  void aub_write_exec(struct aub_file *aub, uint64_t batch_addr,
>  uint64_t offset, int ring_flag);
>  
> +#ifdef __cplusplus
> +}
> +#endif
> +
>  #endif /* INTEL_AUB_WRITE */
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/14] util: rb_tree: add safe iterators

2018-08-03 Thread Rafael Antognolli
On Thu, Aug 02, 2018 at 10:39:20AM +0100, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/util/rb_tree.h | 36 
>  1 file changed, 36 insertions(+)
> 
> diff --git a/src/util/rb_tree.h b/src/util/rb_tree.h
> index c77e9255ea2..df1a4197b8a 100644
> --- a/src/util/rb_tree.h
> +++ b/src/util/rb_tree.h
> @@ -243,6 +243,24 @@ struct rb_node *rb_node_prev(struct rb_node *node);
>  &node->field != NULL; \
>  node = rb_node_data(type, rb_node_next(&node->field), field))
>  
> +/** Iterate over the nodes in the tree, allowing the current node to be freed
> + *
> + * \param   typeThe type of the containing data structure
> + *
> + * \param   nodeThe variable name for current node in the iteration;
> + *  this will be declared as a pointer to \p type
> + *
> + * \param   T   The red-black tree
> + *
> + * \param   field   The rb_node field in containing data structure
> + */
> +#define rb_tree_foreach_safe(type, node, T, field) \
> +   for (type *node = rb_node_data(type, rb_tree_first(T), field), \
> +   *_next = (&node->field != NULL) ? rb_node_data(type, rb_node_next(&node->field), field) : node; \
> +        &node->field != NULL; \
> +node = _next, \
> +   _next = (&_next->field != NULL) ? rb_node_data(type, 
> rb_node_next(&_next->field), field) : _next)
> +

This looks nice. I'm wondering if maybe a simple helper would make things
easier to read, like:

#define _next_if_available(node, type, field) \
   (&node->field != NULL) ? rb_node_data(type, rb_node_next(&node->field), field) : node

It wouldn't simplify things much, but it's up to you in the end.
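
With such a helper the iterator body could read roughly like this (sketch
only, same semantics as your version):

   #define rb_tree_foreach_safe(type, node, T, field) \
      for (type *node = rb_node_data(type, rb_tree_first(T), field), \
                *_next = _next_if_available(node, type, field); \
           &node->field != NULL; \
           node = _next, _next = _next_if_available(_next, type, field))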

>  /** Iterate over the nodes in the tree in reverse
>   *
>   * \param   typeThe type of the containing data structure
> @@ -259,6 +277,24 @@ struct rb_node *rb_node_prev(struct rb_node *node);
>  &node->field != NULL; \
>  node = rb_node_data(type, rb_node_prev(&node->field), field))
>  
> +/** Iterate over the nodes in the tree in reverse, allowing the current node 
> to be freed
> + *
> + * \param   typeThe type of the containing data structure
> + *
> + * \param   nodeThe variable name for current node in the iteration;
> + *  this will be declared as a pointer to \p type
> + *
> + * \param   T   The red-black tree
> + *
> + * \param   field   The rb_node field in containing data structure
> + */
> +#define rb_tree_foreach_rev_safe(type, node, T, field) \
> +   for (type *node = rb_node_data(type, rb_tree_last(T), field), \
> +   *_prev = rb_node_prev(&node->field) ? rb_node_data(type, rb_node_prev(&node->field), field) : NULL; \
> +        &node->field != NULL; \
> +node = _prev, \
> +   _prev = (prev && rb_node_prev(&_prev->field)) ? 
> rb_node_data(type, rb_node_prev(&_prev->field), field) : NULL)
   
I can't really find where this "prev" came from, but I also don't see
any warning when building it.

Also, same comment about the helper could apply here I guess, but this
"prev" is making things confusing to me.

> +
>  /** Validate a red-black tree
>   *
>   * This function walks the tree and validates that this is a valid red-
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 07/14] intel: tools: split aub parsing from aubinator

2018-08-03 Thread Rafael Antognolli
Looks like no functional change, and it's needed by the ui tool, so

Reviewed-by: Rafael Antognolli 

On Thu, Aug 02, 2018 at 10:39:19AM +0100, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aub_read.c  | 307 ++
>  src/intel/tools/aub_read.h  |  75 +
>  src/intel/tools/aubinator.c | 324 +---
>  src/intel/tools/meson.build |   2 +-
>  4 files changed, 426 insertions(+), 282 deletions(-)
>  create mode 100644 src/intel/tools/aub_read.c
>  create mode 100644 src/intel/tools/aub_read.h
> 
> diff --git a/src/intel/tools/aub_read.c b/src/intel/tools/aub_read.c
> new file mode 100644
> index 000..e4578c687ff
> --- /dev/null
> +++ b/src/intel/tools/aub_read.c
> @@ -0,0 +1,307 @@
> +/*
> + * Copyright © 2016-2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "common/gen_gem.h"
> +#include "util/macros.h"
> +
> +#include "aub_read.h"
> +#include "intel_aub.h"
> +
> +#define TYPE(dw)   (((dw) >> 29) & 7)
> +#define OPCODE(dw) (((dw) >> 23) & 0x3f)
> +#define SUBOPCODE(dw)  (((dw) >> 16) & 0x7f)
> +
> +#define MAKE_HEADER(type, opcode, subopcode) \
> +   (((type) << 29) | ((opcode) << 23) | ((subopcode) << 16))
> +
> +#define TYPE_AUB0x7
> +
> +/* Classic AUB opcodes */
> +#define OPCODE_AUB  0x01
> +#define SUBOPCODE_HEADER0x05
> +#define SUBOPCODE_BLOCK 0x41
> +#define SUBOPCODE_BMP   0x1e
> +
> +/* Newer version AUB opcode */
> +#define OPCODE_NEW_AUB  0x2e
> +#define SUBOPCODE_REG_POLL  0x02
> +#define SUBOPCODE_REG_WRITE 0x03
> +#define SUBOPCODE_MEM_POLL  0x05
> +#define SUBOPCODE_MEM_WRITE 0x06
> +#define SUBOPCODE_VERSION   0x0e
> +
> +#define MAKE_GEN(major, minor) (((major) << 8) | (minor))
> +
> +static void
> +handle_trace_header(struct aub_read *read, const uint32_t *p)
> +{
> +   /* The intel_aubdump tool from IGT is kind enough to put a PCI-ID= tag in
> +* the AUB header comment.  If the user hasn't specified a hardware
> +* generation, try to use the one from the AUB file.
> +*/
> +   const uint32_t *end = p + (p[0] & 0xffff) + 2;
> +   int aub_pci_id = 0;
> +
> +   if (end > &p[12] && p[12] > 0) {
> +  if (sscanf((char *)&p[13], "PCI-ID=%i", &aub_pci_id) > 0) {
> + if (!gen_get_device_info(aub_pci_id, &read->devinfo)) {
> +fprintf(stderr, "can't find device information: pci_id=0x%x\n", 
> aub_pci_id);
> +exit(EXIT_FAILURE);
> + }
> +  }
> +   }
> +
> +   char app_name[33];
> +   strncpy(app_name, (const char *)&p[2], 32);
> +   app_name[32] = 0;
> +
> +   if (read->info)
> +  read->info(read->user_data, aub_pci_id, app_name);
> +}
> +
> +static void
> +handle_memtrace_version(struct aub_read *read, const uint32_t *p)
> +{
> +   int header_length = p[0] & 0xffff;
> +   char app_name[64];
> +   int app_name_len = MIN2(4 * (header_length + 1 - 5), ARRAY_SIZE(app_name) 
> - 1);
> +   int pci_id_len = 0;
> +   int aub_pci_id = 0;
> +
> +   strncpy(app_name, (const char *)&p[5], app_name_len);
> +   app_name[app_name_len] = 0;
> +
> +   if (sscanf(app_name, "PCI-ID=%i %n", _pci_id, _id_len) > 0) {
> +  if (!gen_get_device_info(aub_pci_id, >devinfo)) {

Re: [Mesa-dev] [PATCH 01/14] intel: aubinator: fix read the context/ring

2018-08-03 Thread Rafael Antognolli
On Thu, Aug 02, 2018 at 10:39:13AM +0100, Lionel Landwerlin wrote:
> Up to now we've been lucky that the buffer returned was always exactly
> at the address we requested.

Looks like this needs to land, even if the rest of the series doesn't.

Reviewed-by: Rafael Antognolli 

> Fixes: 144b40db5411 ("intel: aubinator: drop the 1Tb GTT mapping")
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aubinator.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> index 8989d558b66..3fec04c24c4 100644
> --- a/src/intel/tools/aubinator.c
> +++ b/src/intel/tools/aubinator.c
> @@ -590,7 +590,7 @@ handle_memtrace_reg_write(uint32_t *p)
> uint32_t pphwsp_addr = context_descriptor & 0xfffff000;
> struct gen_batch_decode_bo pphwsp_bo = get_ggtt_batch_bo(NULL, 
> pphwsp_addr);
> uint32_t *context = (uint32_t *)((uint8_t *)pphwsp_bo.map +
> -(pphwsp_bo.addr - pphwsp_addr) +
> +(pphwsp_addr - pphwsp_bo.addr) +
>  pphwsp_size);
>  
> uint32_t ring_buffer_head = context[5];
> @@ -601,7 +601,7 @@ handle_memtrace_reg_write(uint32_t *p)
> struct gen_batch_decode_bo ring_bo = get_ggtt_batch_bo(NULL,
>ring_buffer_start);
> assert(ring_bo.size > 0);
> -   void *commands = (uint8_t *)ring_bo.map + (ring_bo.addr - 
> ring_buffer_start);
> +   void *commands = (uint8_t *)ring_bo.map + (ring_buffer_start - 
> ring_bo.addr);
>  
> if (context_descriptor & 0x100 /* ppgtt */) {
>batch_ctx.get_bo = get_ppgtt_batch_bo;
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel: tools: aubwrite: split gen[89] from gen10+

2018-07-30 Thread Rafael Antognolli
On Mon, Jul 30, 2018 at 04:28:37PM +0100, Lionel Landwerlin wrote:
> Gen10+ has an additional bit in MI_BATCH_BUFFER_END to signal the end
> of the context image.

Cool, I see you are also adding a couple missing commands and noops into
the gen10+ contexts.

Reviewed-by: Rafael Antognolli 

> We select the largest size for the context image regardless of the
> generation.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aub_write.c | 216 +---
>  src/intel/tools/gen10_context.h | 141 +
>  src/intel/tools/gen8_context.h  | 135 
>  src/intel/tools/gen_context.h   | 107 
>  src/intel/tools/meson.build |   3 +-
>  5 files changed, 416 insertions(+), 186 deletions(-)
>  create mode 100644 src/intel/tools/gen10_context.h
>  create mode 100644 src/intel/tools/gen8_context.h
>  create mode 100644 src/intel/tools/gen_context.h
> 
> diff --git a/src/intel/tools/aub_write.c b/src/intel/tools/aub_write.c
> index 6fb99feb005..e92bdaf5ed4 100644
> --- a/src/intel/tools/aub_write.c
> +++ b/src/intel/tools/aub_write.c
> @@ -31,18 +31,14 @@
>  
>  #include "i915_drm.h"
>  #include "intel_aub.h"
> +#include "gen_context.h"
>  
>  #ifndef ALIGN
>  #define ALIGN(x, y) (((x) + (y)-1) & ~((y)-1))
>  #endif
>  
> -#define MI_LOAD_REGISTER_IMM_n(n) ((0x22 << 23) | (2 * (n) - 1))
> -#define MI_LRI_FORCE_POSTED   (1<<12)
> -
>  #define MI_BATCH_NON_SECURE_I965 (1 << 8)
>  
> -#define MI_BATCH_BUFFER_END (0xA << 23)
> -
>  #define min(a, b) ({\
>   __typeof(a) _a = (a);  \
>   __typeof(b) _b = (b);  \
> @@ -55,183 +51,33 @@
>   _a > _b ? _a : _b; \
>})
>  
> -#define HWS_PGA_RCSUNIT  0x02080
> -#define HWS_PGA_VCSUNIT0   0x12080
> -#define HWS_PGA_BCSUNIT  0x22080
> -
> -#define GFX_MODE_RCSUNIT   0x0229c
> -#define GFX_MODE_VCSUNIT0   0x1229c
> -#define GFX_MODE_BCSUNIT   0x2229c
> -
> -#define EXECLIST_SUBMITPORT_RCSUNIT   0x02230
> -#define EXECLIST_SUBMITPORT_VCSUNIT0   0x12230
> -#define EXECLIST_SUBMITPORT_BCSUNIT   0x22230
> -
> -#define EXECLIST_STATUS_RCSUNIT  0x02234
> -#define EXECLIST_STATUS_VCSUNIT0   0x12234
> -#define EXECLIST_STATUS_BCSUNIT  0x22234
> -
> -#define EXECLIST_SQ_CONTENTS0_RCSUNIT   0x02510
> -#define EXECLIST_SQ_CONTENTS0_VCSUNIT0   0x12510
> -#define EXECLIST_SQ_CONTENTS0_BCSUNIT   0x22510
> -
> -#define EXECLIST_CONTROL_RCSUNIT   0x02550
> -#define EXECLIST_CONTROL_VCSUNIT0   0x12550
> -#define EXECLIST_CONTROL_BCSUNIT   0x22550
> -
> -#define MEMORY_MAP_SIZE (64 /* MiB */ * 1024 * 1024)
> -
> -#define PTE_SIZE 4
> -#define GEN8_PTE_SIZE 8
> -
> -#define NUM_PT_ENTRIES (ALIGN(MEMORY_MAP_SIZE, 4096) / 4096)
> -#define PT_SIZE ALIGN(NUM_PT_ENTRIES * GEN8_PTE_SIZE, 4096)
> -
> -#define RING_SIZE (1 * 4096)
> -#define PPHWSP_SIZE (1 * 4096)
> -#define GEN11_LR_CONTEXT_RENDER_SIZE(14 * 4096)
> -#define GEN10_LR_CONTEXT_RENDER_SIZE(19 * 4096)
> -#define GEN9_LR_CONTEXT_RENDER_SIZE (22 * 4096)
> -#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * 4096)
> -#define GEN8_LR_CONTEXT_OTHER_SIZE  (2 * 4096)
> -
> -
> -#define STATIC_GGTT_MAP_START 0
> -
> -#define RENDER_RING_ADDR STATIC_GGTT_MAP_START
> -#define RENDER_CONTEXT_ADDR (RENDER_RING_ADDR + RING_SIZE)
> -
> -#define BLITTER_RING_ADDR (RENDER_CONTEXT_ADDR + PPHWSP_SIZE + 
> GEN10_LR_CONTEXT_RENDER_SIZE)
> -#define BLITTER_CONTEXT_ADDR (BLITTER_RING_ADDR + RING_SIZE)
> -
> -#define VIDEO_RING_ADDR (BLITTER_CONTEXT_ADDR + PPHWSP_SIZE + 
> GEN8_LR_CONTEXT_OTHER_SIZE)
> -#define VIDEO_CONTEXT_ADDR (VIDEO_RING_ADDR + RING_SIZE)
> -
> -#define STATIC_GGTT_MAP_END (VIDEO_CONTEXT_ADDR + PPHWSP_SIZE + 
> GEN8_LR_CONTEXT_OTHER_SIZE)
> -#define STATIC_GGTT_MAP_SIZE (STATIC_GGTT_MAP_END - STATIC_GGTT_MAP_START)
> -
> -#define PML4_PHYS_ADDR ((uint64_t)(STATIC_GGTT_MAP_END))
> -
> -#define CONTEXT_FLAGS (0x339)   /* Normal Priority | L3-LLC Coherency |
> - * PPGTT Enabled |
> - * Legacy Context with 64 bit VA support |
> - * Valid
> - */
> -
> -#define RENDER_CONTEXT_DESCRIPTOR  ((uint64_t)1 << 62 | RENDER_CONTEXT_ADDR  
> | CONTEXT_FLAGS)
> -#define BLITTER_CONTEXT_DESCRIPTOR ((uint64_t)2 << 62 | BLITTER_CONTEXT_ADDR 
> | CONTEXT_FLAGS)
> -#define VIDEO_CONTEXT_DESCRIPTOR   ((uint64_t)3 << 62 | VIDEO_CONTEXT_ADDR   
> | CONTEXT_FLAGS)
> -
> -sta

Re: [Mesa-dev] [PATCH] i965: Disable guardband clipping on SandyBridge for odd dimensions

2018-07-26 Thread Rafael Antognolli
Hi Vadym,

Ken and Ian explained the situation on this one to me a bit, and it
looks like neither of them is really against this patch. So unless
someone else raises any concerns, I'll ack and push the patch later today.

Thanks for fixing this.

Rafael

On Thu, Jul 26, 2018 at 04:04:29PM +0300, Vadym Shovkoplias wrote:
> ping
> 
> On Tue, Jul 3, 2018 at 5:09 PM, Vadim Shovkoplias 
> 
> wrote:
> 
> Hi mesa devs,
> 
> Can anyone please review this ? 
> This patch fixes following bugs:
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104388
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106158
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106667
> 
>
> 2018-06-07 18:27 GMT+03:00 Vadim Shovkoplias 
> :
> 
> Hi Kenneth,
> 
> Can you please look at this patch ?
> 
> 2018-06-07 15:30 GMT+03:00 Den :
> 
> Hello. Found out that this patch also fixes 2 new issues:
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106158
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106667
> 
> Tested-by: Denis 
> 
> 
> 
> On 24.05.18 14:16, vadym.shovkoplias wrote:
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104388
> Signed-off-by: Andriy Khulap 
> ---
>   src/mesa/drivers/dri/i965/genX_state_upload.c | 11
> +++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/
> src/mesa/drivers/dri/i965/genX_state_upload.c
> index b485e2c..5aa8033 100644
> --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> @@ -2473,6 +2473,17 @@ brw_calculate_guardband_size(uint32_t
> fb_width, uint32_t fb_height,
>   */
>  const float gb_size = GEN_GEN >= 7 ? 16384.0f : 8192.0f;
>   +   /* Workaround: prevent gpu hangs on SandyBridge
> +* by disabling guardband clipping for odd dimensions.
> +*/
> +   if (GEN_GEN == 6 && (fb_width & 1 || fb_height & 1)) {
> +  *xmin = -1.0f;
> +  *xmax =  1.0f;
> +  *ymin = -1.0f;
> +  *ymax =  1.0f;
> +  return;
> +   }
> +
>  if (m00 != 0 && m11 != 0) {
> /* First, we compute the screen-space render area */
> const float ss_ra_xmin = MIN3(0, m30 + m00, 
> m30
> - m00);
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
> 
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
> 
> 
> 
> --
> 
> Vadym Shovkoplias | Senior Software Engineer
> GlobalLogic
> P +380.57.766.7667  M +3.8050.931.7304  S vadym.shovkoplias
> www.globallogic.com
>  
> http://www.globallogic.com/email_disclaimer.txt
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] intel: tools: dump: make dump tool reliable under gdb

2018-07-20 Thread Rafael Antognolli
On Fri, Jul 20, 2018 at 11:24:58AM +0100, Lionel Landwerlin wrote:
> The problem with passing the configuration of the dump lib through a
> file descriptor is that it can be read only once. But under gdb you
> might want to rerun your program multiple times.

> This change hands the configuration through a temporary file that is
> deleted once the command line passes to intel_dump_gpu has exited.

Nice, I noticed this weird behavior too, thanks for fixing.

Reviewed-by: Rafael Antognolli 

> Signed-off-by: Lionel Landwerlin 

> ---
>  src/intel/tools/intel_dump_gpu.c  |  2 +-
>  src/intel/tools/intel_dump_gpu.in | 11 ---
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/src/intel/tools/intel_dump_gpu.c 
> b/src/intel/tools/intel_dump_gpu.c
> index 6ce7d452afb..1abe54147cf 100644
> --- a/src/intel/tools/intel_dump_gpu.c
> +++ b/src/intel/tools/intel_dump_gpu.c
> @@ -349,7 +349,7 @@ maybe_init(void)
>  
> initialized = true;
>  
> -   config = fdopen(3, "r");
> +   config = fopen(getenv("INTEL_DUMP_GPU_CONFIG"), "r");
> while (fscanf(config, "%m[^=]=%m[^\n]\n", , ) != EOF) {
>if (!strcmp(key, "verbose")) {
>   if (!strcmp(value, "1")) {
> diff --git a/src/intel/tools/intel_dump_gpu.in 
> b/src/intel/tools/intel_dump_gpu.in
> index 9eea37189db..0454cff25da 100755
> --- a/src/intel/tools/intel_dump_gpu.in
> +++ b/src/intel/tools/intel_dump_gpu.in
> @@ -82,7 +82,12 @@ done
>  
>  [ -z $file ] && add_arg "file=intel.aub"
>  
> +tmp_file=`mktemp`
> +echo -e $args > $tmp_file
> +
>  
> LD_PRELOAD="@install_libexecdir@/libintel_dump_gpu.so${LD_PPRELOAD:+:$LD_PRELOAD}"
>  \
> -  exec -- "$@" 3< -`echo -e $args`
> -EOF
> +  INTEL_DUMP_GPU_CONFIG=$tmp_file \
> +  $@
> +ret=$?
> +rm $tmp_file
> +exit $ret
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] intel: tools: dump: protect against multiple calls on destructor

2018-07-20 Thread Rafael Antognolli
On Fri, Jul 20, 2018 at 05:19:57PM +0100, Lionel Landwerlin wrote:
> When running gdb, make sure to pass the LD_PRELOAD variable only to
> the executed program, not the debugger. Otherwise the debugger will
> run the preloaded constructor/destructor too and bad things will
> happen.
> 
> Suggested-by: Rafael Antognolli 
> Signed-off-by: Lionel Landwerlin 

Reviewed-by: Rafael Antognolli 

> ---
>  src/intel/tools/intel_dump_gpu.in | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/src/intel/tools/intel_dump_gpu.in 
> b/src/intel/tools/intel_dump_gpu.in
> index 0454cff25da..aa187ba8614 100755
> --- a/src/intel/tools/intel_dump_gpu.in
> +++ b/src/intel/tools/intel_dump_gpu.in
> @@ -23,8 +23,10 @@ EOF
>  exit 0
>  }
>  
> +ld_preload="@install_libexecdir@/libintel_dump_gpu.so${LD_PPRELOAD:+:$LD_PRELOAD}"
>  args=""
>  file=""
> +gdb=""
>  
>  function add_arg() {
>  arg=$1
> @@ -60,6 +62,14 @@ while true; do
>  add_arg "device=${1##--device=}"
>  shift
>  ;;
> +--gdb)
> +gdb=1
> +shift
> +;;
> +-g)
> +gdb=1
> +shift
> +;;
>  --help)
>  show_help
>  ;;
> @@ -85,9 +95,12 @@ done
>  tmp_file=`mktemp`
>  echo -e $args > $tmp_file
>  
> -LD_PRELOAD="@install_libexecdir@/libintel_dump_gpu.so${LD_PPRELOAD:+:$LD_PRELOAD}"
>  \
> -  INTEL_DUMP_GPU_CONFIG=$tmp_file \
> -  $@
> +if [ -z $gdb ]; then
> +LD_PRELOAD="$ld_preload" INTEL_DUMP_GPU_CONFIG=$tmp_file $@
> +else
> +gdb -iex "set exec-wrapper env LD_PRELOAD=$ld_preload 
> INTEL_DUMP_GPU_CONFIG=$tmp_file" --args $@
> +fi
> +
>  ret=$?
>  rm $tmp_file
>  exit $ret
> -- 
> 2.18.0
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/4] intel: tools: dump: remove command execution feature

2018-07-19 Thread Rafael Antognolli
I was thinking about the patch. I didn't look deeply into the one that
removes the command execution stuff, but for the rest of the series,

Acked-by: Rafael Antognolli 

Sorry for being ambiguous :P

On Thu, Jul 19, 2018 at 10:12:57AM +0100, Lionel Landwerlin wrote:
> Was that for the whole series, or just this patch? :)
> 
> Thanks,
> 
> -
> Lionel
> 
> On 18/07/18 21:42, Jason Ekstrand wrote:
> 
> Very sketchily
> 
> Reviewed-by: Jason Ekstrand 
> 
> On Wed, Jul 18, 2018 at 10:21 AM Lionel Landwerlin <
> lionel.g.landwer...@intel.com> wrote:
> 
> In commit 86cb05a6d35a52 ("intel: aubinator: remove standard input
> processing option") we removed the ability to process aub as an input
> stream because we now rely on mmapping the aub file to back the
> buffers aubinator is parsing.
> 
> intel_aubdump was the provider of the standard input data and since
> we've copied/reworked intel_aubdump into intel_dump_gpu within Mesa,
> we don't need that code anymore.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/intel_dump_gpu.c  | 121 
> +++---
>  src/intel/tools/intel_dump_gpu.in |  27 +--
>  2 files changed, 29 insertions(+), 119 deletions(-)
> 
> diff --git a/src/intel/tools/intel_dump_gpu.c b/src/intel/tools/
> intel_dump_gpu.c
> index 6d2c4b7f983..5fd2c8ea723 100644
> --- a/src/intel/tools/intel_dump_gpu.c
> +++ b/src/intel/tools/intel_dump_gpu.c
> @@ -53,8 +53,8 @@ static int (*libc_close)(int fd) = 
> close_init_helper;
>  static int (*libc_ioctl)(int fd, unsigned long request, ...) =
> ioctl_init_helper;
> 
>  static int drm_fd = -1;
> -static char *filename = NULL;
> -static FILE *files[2] = { NULL, NULL };
> +static char *output_filename = NULL;
> +static FILE *output_file = NULL;
>  static int verbose = 0;
>  static bool device_override;
> 
> @@ -111,7 +111,7 @@ align_u32(uint32_t v, uint32_t a)
> 
>  static struct gen_device_info devinfo = {0};
>  static uint32_t device;
> -static struct aub_file aubs[2];
> +static struct aub_file aub_file;
> 
>  static void *
>  relocate_bo(struct bo *bo, const struct drm_i915_gem_execbuffer2
> *execbuffer2,
> @@ -205,28 +205,21 @@ dump_execbuffer2(int fd, struct
> drm_i915_gem_execbuffer2 *execbuffer2)
>fail_if(!gen_get_device_info(device, ),
>"failed to identify chipset=0x%x\n", device);
> 
> -  for (int i = 0; i < ARRAY_SIZE(files); i++) {
> - if (files[i] != NULL) {
> -aub_file_init([i], files[i], device);
> -if (verbose == 2)
> -   aubs[i].verbose_log_file = stdout;
> -aub_write_header([i], 
> program_invocation_short_name);
> - }
> -  }
> +  aub_file_init(_file, output_file, device);
> +  if (verbose == 2)
> + aub_file.verbose_log_file = stdout;
> +  aub_write_header(_file, program_invocation_short_name);
> 
>if (verbose)
>   printf("[intel_aubdump: running, "
>  "output file %s, chipset id 0x%04x, gen %d]\n",
> -filename, device, devinfo.gen);
> +output_filename, device, devinfo.gen);
> }
> 
> -   /* Any aub */
> -   struct aub_file *any_aub = files[0] ? [0] : [1];;
> -
> -   if (aub_use_execlists(any_aub))
> +   if (aub_use_execlists(_file))
>offset = 0x1000;
> else
> -  offset = aub_gtt_size(any_aub);
> +  offset = aub_gtt_size(_file);
> 
> if (verbose)
>printf("Dumping execbuffer2:\n");
> @@ -263,13 +256,8 @@ dump_execbuffer2(int fd, struct
> drm_i915_gem_execbuffer2 *execbuffer2)
>   bo->map = gem_mmap(fd, obj->handle, 0, bo->size);
>fail_if(bo->map == MAP_FAILED, "intel_aubdump: bo mmap failed\
> n");
> 
> -  for (int i = 0; i < ARRAY_SIZE(files); i++) {
> - if (files[i] == NULL)
> -continue;
> -
> - if (aub_use_execlists([i]))
> -aub_map_ppgtt([i], bo->offset, bo->size);
> -  

Re: [Mesa-dev] [PATCH v2 4/4] intel: tools: dump: trace memory writes

2018-07-19 Thread Rafael Antognolli
On Thu, Jul 19, 2018 at 10:14:32AM +0100, Lionel Landwerlin wrote:
> On 18/07/18 21:58, Rafael Antognolli wrote:
> > On Wed, Jul 18, 2018 at 06:21:32PM +0100, Lionel Landwerlin wrote:
> > > Signed-off-by: Lionel Landwerlin 
> > > ---
> > >   src/intel/tools/aub_write.c | 45 ++---
> > >   1 file changed, 32 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/src/intel/tools/aub_write.c b/src/intel/tools/aub_write.c
> > > index de4ce33..9c140553542 100644
> > > --- a/src/intel/tools/aub_write.c
> > > +++ b/src/intel/tools/aub_write.c
> > > @@ -313,10 +313,17 @@ dword_out(struct aub_file *aub, uint32_t data)
> > >   static void
> > >   mem_trace_memory_write_header_out(struct aub_file *aub, uint64_t addr,
> > > -  uint32_t len, uint32_t addr_space)
> > > +  uint32_t len, uint32_t addr_space,
> > > +  const char *desc)
> > Looks like you are not using desc anywhere...
> > 
> > Other than that, things look good.
> 
> Duh! Fixed locally.
> Counts as Rb?

Yeah, sure :)

> Thanks,
> 
> -
> Lionel
> 
> > 
> > >   {
> > >  uint32_t dwords = ALIGN(len, sizeof(uint32_t)) / sizeof(uint32_t);
> > > +   if (aub->verbose_log_file) {
> > > +  fprintf(aub->verbose_log_file,
> > > +  "  MEM WRITE (0x%016" PRIx64 "-0x%016" PRIx64 ")\n",
> > > +  addr, addr + len);
> > > +   }
> > > +
> > >  dword_out(aub, CMD_MEM_TRACE_MEMORY_WRITE | (5 + dwords - 1));
> > >  dword_out(aub, addr & 0x);   /* addr lo */
> > >  dword_out(aub, addr >> 32);   /* addr hi */
> > > @@ -387,7 +394,8 @@ populate_ppgtt_table(struct aub_file *aub, struct 
> > > aub_ppgtt_table *table,
> > > uint64_t write_size = (dirty_end - dirty_start + 1) *
> > >sizeof(uint64_t);
> > > mem_trace_memory_write_header_out(aub, write_addr, write_size,
> > > -
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL);
> > > +
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL,
> > > +"PPGTT update");
> > > data_out(aub, entries + dirty_start, write_size);
> > >  }
> > >   }
> > > @@ -476,7 +484,8 @@ write_execlists_header(struct aub_file *aub, const 
> > > char *name)
> > >  mem_trace_memory_write_header_out(aub, STATIC_GGTT_MAP_START >> 12,
> > >ggtt_ptes * GEN8_PTE_SIZE,
> > > - 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY);
> > > + 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY,
> > > + "GGTT PT");
> > >  for (uint32_t i = 0; i < ggtt_ptes; i++) {
> > > dword_out(aub, 1 + 0x1000 * i + STATIC_GGTT_MAP_START);
> > > dword_out(aub, 0);
> > > @@ -484,7 +493,8 @@ write_execlists_header(struct aub_file *aub, const 
> > > char *name)
> > >  /* RENDER_RING */
> > >  mem_trace_memory_write_header_out(aub, RENDER_RING_ADDR, RING_SIZE,
> > > - 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
> > > + 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
> > > + "RENDER RING");
> > >  for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
> > > dword_out(aub, 0);
> > > @@ -492,7 +502,8 @@ write_execlists_header(struct aub_file *aub, const 
> > > char *name)
> > >  mem_trace_memory_write_header_out(aub, RENDER_CONTEXT_ADDR,
> > >PPHWSP_SIZE +
> > >sizeof(render_context_init),
> > > - 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
> > > + 
> > > AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
> > > + "RENDER PPHWSP");
> > >  for (uint32_t i = 0; i < PPHWSP_SIZE; i += sizeof(uint32_t))
> > > dword_out(aub, 0);
> > > @@ -501,7 +512,

Re: [Mesa-dev] [PATCH v2 4/4] intel: tools: dump: trace memory writes

2018-07-18 Thread Rafael Antognolli
On Wed, Jul 18, 2018 at 06:21:32PM +0100, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aub_write.c | 45 ++---
>  1 file changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/src/intel/tools/aub_write.c b/src/intel/tools/aub_write.c
> index de4ce33..9c140553542 100644
> --- a/src/intel/tools/aub_write.c
> +++ b/src/intel/tools/aub_write.c
> @@ -313,10 +313,17 @@ dword_out(struct aub_file *aub, uint32_t data)
>  
>  static void
>  mem_trace_memory_write_header_out(struct aub_file *aub, uint64_t addr,
> -  uint32_t len, uint32_t addr_space)
> +  uint32_t len, uint32_t addr_space,
> +  const char *desc)

Looks like you are not using desc anywhere...

Other than that, things look good.
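
Maybe just fold desc into the MEM WRITE line, something like this
(untested sketch on top of this patch):

      fprintf(aub->verbose_log_file,
              "  MEM WRITE (0x%016" PRIx64 "-0x%016" PRIx64 ") %s\n",
              addr, addr + len, desc);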

>  {
> uint32_t dwords = ALIGN(len, sizeof(uint32_t)) / sizeof(uint32_t);
>  
> +   if (aub->verbose_log_file) {
> +  fprintf(aub->verbose_log_file,
> +  "  MEM WRITE (0x%016" PRIx64 "-0x%016" PRIx64 ")\n",
> +  addr, addr + len);
> +   }
> +
> dword_out(aub, CMD_MEM_TRACE_MEMORY_WRITE | (5 + dwords - 1));
> dword_out(aub, addr & 0x);   /* addr lo */
> dword_out(aub, addr >> 32);   /* addr hi */
> @@ -387,7 +394,8 @@ populate_ppgtt_table(struct aub_file *aub, struct 
> aub_ppgtt_table *table,
>uint64_t write_size = (dirty_end - dirty_start + 1) *
>   sizeof(uint64_t);
>mem_trace_memory_write_header_out(aub, write_addr, write_size,
> -
> AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL);
> +
> AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL,
> +"PPGTT update");
>data_out(aub, entries + dirty_start, write_size);
> }
>  }
> @@ -476,7 +484,8 @@ write_execlists_header(struct aub_file *aub, const char 
> *name)
>  
> mem_trace_memory_write_header_out(aub, STATIC_GGTT_MAP_START >> 12,
>   ggtt_ptes * GEN8_PTE_SIZE,
> - 
> AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY);
> + 
> AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY,
> + "GGTT PT");
> for (uint32_t i = 0; i < ggtt_ptes; i++) {
>dword_out(aub, 1 + 0x1000 * i + STATIC_GGTT_MAP_START);
>dword_out(aub, 0);
> @@ -484,7 +493,8 @@ write_execlists_header(struct aub_file *aub, const char 
> *name)
>  
> /* RENDER_RING */
> mem_trace_memory_write_header_out(aub, RENDER_RING_ADDR, RING_SIZE,
> - 
> AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
> + AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
> + "RENDER RING");
> for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
>dword_out(aub, 0);
>  
> @@ -492,7 +502,8 @@ write_execlists_header(struct aub_file *aub, const char 
> *name)
> mem_trace_memory_write_header_out(aub, RENDER_CONTEXT_ADDR,
>   PPHWSP_SIZE +
>   sizeof(render_context_init),
> - 
> AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
> + AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
> + "RENDER PPHWSP");
> for (uint32_t i = 0; i < PPHWSP_SIZE; i += sizeof(uint32_t))
>dword_out(aub, 0);
>  
> @@ -501,7 +512,8 @@ write_execlists_header(struct aub_file *aub, const char 
> *name)
>  
> /* BLITTER_RING */
> mem_trace_memory_write_header_out(aub, BLITTER_RING_ADDR, RING_SIZE,
> - 
> AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
> + AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
> + "BLITTER RING");
> for (uint32_t i = 0; i < RING_SIZE; i += sizeof(uint32_t))
>dword_out(aub, 0);
>  
> @@ -509,7 +521,8 @@ write_execlists_header(struct aub_file *aub, const char 
> *name)
> mem_trace_memory_write_header_out(aub, BLITTER_CONTEXT_ADDR,
>   PPHWSP_SIZE +
>   sizeof(blitter_context_init),
> - 
> AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT);
> + AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT,
> + "BLITTER PPHWSP");
> for (uint32_t i = 0; i < PPHWSP_SIZE; i += sizeof(uint32_t))
>dword_out(aub, 0);
>  
> @@ -518,7 +531,8 @@ write_execlists_header(struct aub_file *aub, const char 
> *name)
>  
> /* VIDEO_RING */
> mem_trace_memory_write_header_out(aub, VIDEO_RING_ADDR, RING_SIZE,
> - 
> 

Re: [Mesa-dev] [PATCH] intel: tools: Fix uninitialized variable warnings in intel_dump_gpu.

2018-07-13 Thread Rafael Antognolli
Reviewed-by: Rafael Antognolli 

On Thu, Jul 12, 2018 at 11:46:12AM -0700, Eric Anholt wrote:
> ---
>  src/intel/tools/intel_dump_gpu.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/intel/tools/intel_dump_gpu.c 
> b/src/intel/tools/intel_dump_gpu.c
> index 1201fa35ae0c..766ba662d910 100644
> --- a/src/intel/tools/intel_dump_gpu.c
> +++ b/src/intel/tools/intel_dump_gpu.c
> @@ -728,6 +728,8 @@ aub_dump_execlist(uint64_t batch_offset, int ring_flag)
>status_reg = EXECLIST_STATUS_BCSUNIT;
>control_reg = EXECLIST_CONTROL_BCSUNIT;
>break;
> +   default:
> +  unreachable("unknown ring");
> }
>  
> mem_trace_memory_write_header_out(ring_addr, 16,
> -- 
> 2.18.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] intel/tools/dump_gpu: Add option to print ppgtt mappings.

2018-07-09 Thread Rafael Antognolli
Using -vv increases the verbosity by printing the ppgtt mappings as
they get written into the aub file.
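
For illustration, the extra -vv output looks roughly like this (the
addresses and sizes below are made up, they only follow the printf
formats added by this patch):

 Mapping PPGTT address: 0x100000000, size: 65536
  PPGTT (0x0000000000001000), lvl 4, start: 0, end: 0
   Adding entry: 0, phys_addr: 0x0000000000002000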

Cc: Lionel Landwerlin 
---
 src/intel/tools/intel_dump_gpu.c  | 25 -
 src/intel/tools/intel_dump_gpu.in |  6 ++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/intel/tools/intel_dump_gpu.c b/src/intel/tools/intel_dump_gpu.c
index c909d63d88f..1201fa35ae0 100644
--- a/src/intel/tools/intel_dump_gpu.c
+++ b/src/intel/tools/intel_dump_gpu.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "intel_aub.h"
 
@@ -389,6 +390,11 @@ populate_ppgtt_table(struct ppgtt_table *table, int start, 
int end,
uint64_t entries[512] = {0};
int dirty_start = 512, dirty_end = 0;
 
+   if (verbose == 2) {
+  printf("  PPGTT (0x%016" PRIx64 "), lvl %d, start: %x, end: %x\n",
+ table->phys_addr, level, start, end);
+   }
+
for (int i = start; i <= end; i++) {
   if (!table->subtables[i]) {
  dirty_start = min(dirty_start, i);
@@ -396,11 +402,19 @@ populate_ppgtt_table(struct ppgtt_table *table, int 
start, int end,
  if (level == 1) {
 table->subtables[i] =
(void *)(phys_addrs_allocator++ << 12);
+if (verbose == 2) {
+   printf("   Adding entry: %x, phys_addr: 0x%016" PRIx64 "\n",
+  i, (uint64_t)table->subtables[i]);
+}
  } else {
 table->subtables[i] =
calloc(1, sizeof(struct ppgtt_table));
 table->subtables[i]->phys_addr =
phys_addrs_allocator++ << 12;
+if (verbose == 2) {
+   printf("   Adding entry: %x, phys_addr: 0x%016" PRIx64 "\n",
+  i, table->subtables[i]->phys_addr);
+}
  }
   }
   entries[i] = 3 /* read/write | present */ |
@@ -434,6 +448,11 @@ map_ppgtt(uint64_t start, uint64_t size)
 #define L2_table(addr) (L3_table(addr)->subtables[L3_index(addr)])
 #define L1_table(addr) (L2_table(addr)->subtables[L2_index(addr)])
 
+   if (verbose == 2) {
+  printf(" Mapping PPGTT address: 0x%" PRIx64 ", size: %" PRIu64"\n",
+ start, size);
+   }
+
populate_ppgtt_table(, L4_index(l4_start), L4_index(l4_end), 4);
 
for (uint64_t l4 = l4_start; l4 < l4_end; l4 += (1ULL << 39)) {
@@ -1072,7 +1091,11 @@ maybe_init(void)
config = fdopen(3, "r");
while (fscanf(config, "%m[^=]=%m[^\n]\n", , ) != EOF) {
   if (!strcmp(key, "verbose")) {
- verbose = 1;
+ if (!strcmp(value, "1")) {
+verbose = 1;
+ } else if (!strcmp(value, "2")) {
+verbose = 2;
+ }
   } else if (!strcmp(key, "device")) {
  fail_if(sscanf(value, "%i", ) != 1,
  "intel_aubdump: failed to parse device id '%s'",
diff --git a/src/intel/tools/intel_dump_gpu.in 
b/src/intel/tools/intel_dump_gpu.in
index 875a67e7682..b9887f0ed2e 100755
--- a/src/intel/tools/intel_dump_gpu.in
+++ b/src/intel/tools/intel_dump_gpu.in
@@ -17,6 +17,8 @@ contents and execution of the GEM application.
 
   -v Enable verbose output
 
+  -vvEnable extra verbosity - dumps gtt mappings
+
   --help Display this help message and exit
 
 EOF
@@ -55,6 +57,10 @@ while true; do
 add_arg "verbose=1"
 shift 1
 ;;
+-vv)
+add_arg "verbose=2"
+shift 1
+;;
 -o*)
 file=${1##-o}
 add_arg "file=${file:-$(basename ${file}).aub}"
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel: tools: dump_gpu: fix ppgtt mapping

2018-07-09 Thread Rafael Antognolli
Reviewed-by: Rafael Antognolli 

On Fri, Jul 06, 2018 at 11:02:05AM +0100, Lionel Landwerlin wrote:
> We were not properly writing page tables when the virtual address
> range spans multiple subtrees of the tables.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/intel_dump_gpu.c | 46 
>  1 file changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/src/intel/tools/intel_dump_gpu.c 
> b/src/intel/tools/intel_dump_gpu.c
> index 8a7dd52e746..c909d63d88f 100644
> --- a/src/intel/tools/intel_dump_gpu.c
> +++ b/src/intel/tools/intel_dump_gpu.c
> @@ -423,13 +423,7 @@ static void
>  map_ppgtt(uint64_t start, uint64_t size)
>  {
> uint64_t l4_start = start & 0xff80;
> -   uint64_t l3_start = start & 0xc000;
> -   uint64_t l2_start = start & 0xffe0;
> -   uint64_t l1_start = start & 0xf000;
> uint64_t l4_end = ((start + size - 1) | 0x007f) & 0x;
> -   uint64_t l3_end = ((start + size - 1) | 0x3fff) & 0x;
> -   uint64_t l2_end = ((start + size - 1) | 0x001f) & 0x;
> -   uint64_t l1_end = ((start + size - 1) | 0x0fff) & 0x;
>  
>  #define L4_index(addr) (((addr) >> 39) & 0x1ff)
>  #define L3_index(addr) (((addr) >> 30) & 0x1ff)
> @@ -442,28 +436,34 @@ map_ppgtt(uint64_t start, uint64_t size)
>  
> populate_ppgtt_table(, L4_index(l4_start), L4_index(l4_end), 4);
>  
> -   for (uint64_t a = l4_start; a < l4_end; a += (1ULL << 39)) {
> -  uint64_t _start = max(a, l3_start);
> -  uint64_t _end = min(a + (1ULL << 39), l3_end);
> +   for (uint64_t l4 = l4_start; l4 < l4_end; l4 += (1ULL << 39)) {
> +  uint64_t l3_start = max(l4, start & 0xc000);
> +  uint64_t l3_end = min(l4 + (1ULL << 39),
> +((start + size - 1) | 0x3fff) & 
> 0x);
> +  uint64_t l3_start_idx = L3_index(l3_start);
> +  uint64_t l3_end_idx = L3_index(l3_start) >= l3_start_idx ? 
> L3_index(l3_end) : 0x1ff;
>  
> -  populate_ppgtt_table(L3_table(a), L3_index(_start),
> -   L3_index(_end), 3);
> -   }
> +  populate_ppgtt_table(L3_table(l4), l3_start_idx, l3_end_idx, 3);
>  
> -   for (uint64_t a = l3_start; a < l3_end; a += (1ULL << 30)) {
> -  uint64_t _start = max(a, l2_start);
> -  uint64_t _end = min(a + (1ULL << 30), l2_end);
> +  for (uint64_t l3 = l3_start; l3 < l3_end; l3 += (1ULL << 30)) {
> + uint64_t l2_start = max(l3, start & 0xffe0);
> + uint64_t l2_end = min(l3 + (1ULL << 30),
> +   ((start + size - 1) | 0x001f) & 
> 0x);
> + uint64_t l2_start_idx = L2_index(l2_start);
> + uint64_t l2_end_idx = L2_index(l2_end) >= l2_start_idx ? 
> L2_index(l2_end) : 0x1ff;
>  
> -  populate_ppgtt_table(L2_table(a), L2_index(_start),
> -   L2_index(_end), 2);
> -   }
> + populate_ppgtt_table(L2_table(l3), l2_start_idx, l2_end_idx, 2);
>  
> -   for (uint64_t a = l2_start; a < l2_end; a += (1ULL << 21)) {
> -  uint64_t _start = max(a, l1_start);
> -  uint64_t _end = min(a + (1ULL << 21), l1_end);
> + for (uint64_t l2 = l2_start; l2 < l2_end; l2 += (1ULL << 21)) {
> +uint64_t l1_start = max(l2, start & 0xf000);
> +uint64_t l1_end = min(l2 + (1ULL << 21),
> +  ((start + size - 1) | 0x0fff) & 
> 0x);
> +uint64_t l1_start_idx = L1_index(l1_start);
> +uint64_t l1_end_idx = L1_index(l1_end) >= l1_start_idx ? 
> L1_index(l1_end) : 0x1ff;
>  
> -  populate_ppgtt_table(L1_table(a), L1_index(_start),
> -   L1_index(_end), 1);
> +populate_ppgtt_table(L1_table(l2), l1_start_idx, l1_end_idx, 1);
> + }
> +  }
> }
>  }
>  
> -- 
> 2.18.0
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 14/16] intel: devinfo: add simulator id

2018-06-22 Thread Rafael Antognolli
Patches 14-16 are

Reviewed-by: Rafael Antognolli 

On Thu, Jun 21, 2018 at 05:29:13PM +0100, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/dev/gen_device_info.c | 47 ++---
>  src/intel/dev/gen_device_info.h |  5 
>  2 files changed, 48 insertions(+), 4 deletions(-)
> 
> diff --git a/src/intel/dev/gen_device_info.c b/src/intel/dev/gen_device_info.c
> index 8e971329892..b0ae4d18034 100644
> --- a/src/intel/dev/gen_device_info.c
> +++ b/src/intel/dev/gen_device_info.c
> @@ -105,6 +105,7 @@ static const struct gen_device_info gen_device_info_i965 
> = {
>.size = 256,
> },
> .timestamp_frequency = 1250,
> +   .simulator_id = -1,
>  };
>  
>  static const struct gen_device_info gen_device_info_g4x = {
> @@ -124,6 +125,7 @@ static const struct gen_device_info gen_device_info_g4x = 
> {
>.size = 384,
> },
> .timestamp_frequency = 1250,
> +   .simulator_id = -1,
>  };
>  
>  static const struct gen_device_info gen_device_info_ilk = {
> @@ -142,6 +144,7 @@ static const struct gen_device_info gen_device_info_ilk = 
> {
>.size = 1024,
> },
> .timestamp_frequency = 1250,
> +   .simulator_id = -1,
>  };
>  
>  static const struct gen_device_info gen_device_info_snb_gt1 = {
> @@ -170,6 +173,7 @@ static const struct gen_device_info 
> gen_device_info_snb_gt1 = {
>},
> },
> .timestamp_frequency = 1250,
> +   .simulator_id = -1,
>  };
>  
>  static const struct gen_device_info gen_device_info_snb_gt2 = {
> @@ -198,6 +202,7 @@ static const struct gen_device_info 
> gen_device_info_snb_gt2 = {
>},
> },
> .timestamp_frequency = 1250,
> +   .simulator_id = -1,
>  };
>  
>  #define GEN7_FEATURES   \
> @@ -236,6 +241,7 @@ static const struct gen_device_info 
> gen_device_info_ivb_gt1 = {
>   [MESA_SHADER_GEOMETRY]  = 192,
>},
> },
> +   .simulator_id = 7,
>  };
>  
>  static const struct gen_device_info gen_device_info_ivb_gt2 = {
> @@ -265,6 +271,7 @@ static const struct gen_device_info 
> gen_device_info_ivb_gt2 = {
>   [MESA_SHADER_GEOMETRY]  = 320,
>},
> },
> +   .simulator_id = 7,
>  };
>  
>  static const struct gen_device_info gen_device_info_byt = {
> @@ -294,6 +301,7 @@ static const struct gen_device_info gen_device_info_byt = 
> {
>   [MESA_SHADER_GEOMETRY]  = 192,
>},
> },
> +   .simulator_id = 10,
>  };
>  
>  #define HSW_FEATURES \
> @@ -328,6 +336,7 @@ static const struct gen_device_info 
> gen_device_info_hsw_gt1 = {
>   [MESA_SHADER_GEOMETRY]  = 256,
>},
> },
> +   .simulator_id = 9,
>  };
>  
>  static const struct gen_device_info gen_device_info_hsw_gt2 = {
> @@ -356,6 +365,7 @@ static const struct gen_device_info 
> gen_device_info_hsw_gt2 = {
>   [MESA_SHADER_GEOMETRY]  = 640,
>},
> },
> +   .simulator_id = 9,
>  };
>  
>  static const struct gen_device_info gen_device_info_hsw_gt3 = {
> @@ -384,6 +394,7 @@ static const struct gen_device_info 
> gen_device_info_hsw_gt3 = {
>   [MESA_SHADER_GEOMETRY]  = 640,
>},
> },
> +   .simulator_id = 9,
>  };
>  
>  /* It's unclear how well supported sampling from the hiz buffer is on GEN8,
> @@ -429,7 +440,8 @@ static const struct gen_device_info 
> gen_device_info_bdw_gt1 = {
>   [MESA_SHADER_TESS_EVAL] = 1536,
>   [MESA_SHADER_GEOMETRY]  = 960,
>},
> -   }
> +   },
> +   .simulator_id = 11,
>  };
>  
>  static const struct gen_device_info gen_device_info_bdw_gt2 = {
> @@ -453,7 +465,8 @@ static const struct gen_device_info 
> gen_device_info_bdw_gt2 = {
>   [MESA_SHADER_TESS_EVAL] = 1536,
>   [MESA_SHADER_GEOMETRY]  = 960,
>},
> -   }
> +   },
> +   .simulator_id = 11,
>  };
>  
>  static const struct gen_device_info gen_device_info_bdw_gt3 = {
> @@ -477,7 +490,8 @@ static const struct gen_device_info 
> gen_device_info_bdw_gt3 = {
>   [MESA_SHADER_TESS_EVAL] = 1536,
>   [MESA_SHADER_GEOMETRY]  = 960,
>},
> -   }
> +   },
> +   .simulator_id = 11,
>  };
>  
>  static const struct gen_device_info gen_device_info_chv = {
> @@ -507,7 +521,8 @@ static const struct gen_device_info gen_device_info_chv = 
> {
>   [MESA_SHADER_TESS_EVAL] = 384,
>   [MESA_SHADER_GEOMETRY]  = 256,
>},
> -   }
> +   },
> +   .simulator_id = 13,
>  };
>  
>  #define GEN9_HW_INFO\
> @

Re: [Mesa-dev] [PATCH v3 13/16] intel: tools: dump-gpu: dump 48-bit addresses

2018-06-22 Thread Rafael Antognolli
On Thu, Jun 21, 2018 at 05:29:12PM +0100, Lionel Landwerlin wrote:
> From: Scott D Phillips 
> 
> For gen8+, write out PPGTT tables in aub files so that full 48-bit
> addresses can be serialized.

I don't fully understand how things worked before this patch, in the
GEN < 10 case. It looks to me like we would set up a GGTT mapping of only
64MiB of memory, but that wouldn't make much sense. So I also don't know
how things work with the legacy behavior, even after this patch.

For the execlists case, things make more sense to me, so I'll suggest
some comments below that imho we could add to help explain this patch.
Assuming those comments make sense and are correct, this patch is

Reviewed-by: Rafael Antognolli 
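
For anyone else trying to follow the PPGTT writes, this is the 48-bit
address split that the L4/L3/L2/L1 index macros in intel_dump_gpu.c
rely on (just restating the shifts as a note, naming the levels the
usual gen8 way):

   bits 47:39  PML4 (L4) index
   bits 38:30  PDP  (L3) index
   bits 29:21  PD   (L2) index
   bits 20:12  PT   (L1) index
   bits 11:0   offset within the 4KiB page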

> 
> v2: Fix handling of `end` index in map_ppgtt
> 
> v3: Correctly mark GGTT entry as present (Rafael)
> 
> Signed-off-by: Scott D Phillips 
> Signed-off-by: Lionel Landwerlin 
> Cc: Jordan Justen 
> ---
>  src/intel/tools/intel_aub.h  |   3 +-
>  src/intel/tools/intel_dump_gpu.c | 315 +++
>  2 files changed, 151 insertions(+), 167 deletions(-)
> 
> diff --git a/src/intel/tools/intel_aub.h b/src/intel/tools/intel_aub.h
> index 9ca548edaf3..2888515048f 100644
> --- a/src/intel/tools/intel_aub.h
> +++ b/src/intel/tools/intel_aub.h
> @@ -117,7 +117,8 @@
>  /* DW3 */
>  
>  #define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_MASK  0xf000
> -#define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_LOCAL (1 << 28)

Add some comments here:

/**
 * Address spaces:
 *   - GGTT: virtual addresses written through GGTT that need to be
 * translated to physical addresses
 *   - PHYSICAL: physical addresses, no GTT translation needed.
 *   - GGTT_ENTRY: adds an entry to the GGTT.
 *
 * Note that there's no PPGTT address space, because PPGTT virtual
 * addresses get translated and written as physical addresses.
 */
> +#define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT  (0 << 28)
> +#define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL  (2 << 28)
>  #define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY(4 << 28)
>  
>  /**
> diff --git a/src/intel/tools/intel_dump_gpu.c 
> b/src/intel/tools/intel_dump_gpu.c
> index 86c133da433..a9ce109b2b6 100644
> --- a/src/intel/tools/intel_dump_gpu.c
> +++ b/src/intel/tools/intel_dump_gpu.c
> @@ -51,6 +51,8 @@
>  #define MI_LOAD_REGISTER_IMM_n(n) ((0x22 << 23) | (2 * (n) - 1))
>  #define MI_LRI_FORCE_POSTED   (1<<12)
>  
> +#define MI_BATCH_NON_SECURE_I965 (1 << 8)
> +
>  #define MI_BATCH_BUFFER_END (0xA << 23)
>  
>  #define min(a, b) ({\
> @@ -59,6 +61,12 @@
>   _a < _b ? _a : _b; \
>})
>  
> +#define max(a, b) ({\
> + __typeof(a) _a = (a);  \
> + __typeof(b) _b = (b);  \
> + _a > _b ? _a : _b; \
> +  })
> +
>  #define HWS_PGA_RCSUNIT  0x02080
>  #define HWS_PGA_VCSUNIT0   0x12080
>  #define HWS_PGA_BCSUNIT  0x22080
> @@ -93,8 +101,12 @@
>  
>  #define RING_SIZE (1 * 4096)
>  #define PPHWSP_SIZE (1 * 4096)
> -#define GEN10_LR_CONTEXT_RENDER_SIZE   (19 * 4096)
> -#define GEN8_LR_CONTEXT_OTHER_SIZE   (2 * 4096)
> +#define GEN11_LR_CONTEXT_RENDER_SIZE(14 * 4096)
> +#define GEN10_LR_CONTEXT_RENDER_SIZE(19 * 4096)
> +#define GEN9_LR_CONTEXT_RENDER_SIZE (22 * 4096)
> +#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * 4096)
> +#define GEN8_LR_CONTEXT_OTHER_SIZE  (2 * 4096)
> +
>  
>  #define STATIC_GGTT_MAP_START 0
>  
> @@ -110,14 +122,19 @@
>  #define STATIC_GGTT_MAP_END (VIDEO_CONTEXT_ADDR + PPHWSP_SIZE + 
> GEN8_LR_CONTEXT_OTHER_SIZE)
>  #define STATIC_GGTT_MAP_SIZE (STATIC_GGTT_MAP_END - STATIC_GGTT_MAP_START)
>  
> -#define CONTEXT_FLAGS (0x229)   /* Normal Priority | L3-LLC Coherency |
> -   Legacy Context with no 64 bit VA support 
> | Valid */
> +#define PML4_PHYS_ADDR ((uint64_t)(STATIC_GGTT_MAP_END))
> +
> +#define CONTEXT_FLAGS (0x339)   /* Normal Priority | L3-LLC Coherency |
> + * PPGTT Enabled |
> + * Legacy Context with 64 bit VA support |
> + * Valid
> + */
>  
> -#define RENDER_CONTEXT_DESCRIPTOR  ((uint64_t)1 << 32 | RENDER_CONTEXT_ADDR  
> | CONTEXT_FLAGS)
> -#define BLITTER_CONTEXT_DESCRIPTOR ((uint64_t)2 << 32 | BLITTER_CONTEXT_ADDR 
> | CONTEXT_FLAGS)
> -#define VIDEO_CONTEXT_DESCRIPTOR   ((uint64_t)3 << 32 | VIDEO_CONTEXT_ADDR   
> | CONTEXT_FLAGS)
> +#define RENDER_CONTEXT_DESCRIPTOR  ((u

Re: [Mesa-dev] [PATCH v3 06/16] intel: aubinator: handle GGTT mappings

2018-06-21 Thread Rafael Antognolli
This patch is

Reviewed-by: Rafael Antognolli 

On Thu, Jun 21, 2018 at 05:29:05PM +0100, Lionel Landwerlin wrote:
> We use memfd to store physical pages as they get read/written to and
> the GGTT entries translating virtual address to physical pages.
> 
> Based on a commit by Scott Phillips.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aubinator.c | 257 ++--
>  1 file changed, 244 insertions(+), 13 deletions(-)
> 
> diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> index 3b04ba3f431..05083dbcda0 100644
> --- a/src/intel/tools/aubinator.c
> +++ b/src/intel/tools/aubinator.c
> @@ -39,12 +39,23 @@
>  
>  #include "util/list.h"
>  #include "util/macros.h"
> +#include "util/rb_tree.h"
>  
>  #include "common/gen_decoder.h"
>  #include "common/gen_disasm.h"
>  #include "common/gen_gem.h"
>  #include "intel_aub.h"
>  
> +#ifndef HAVE_MEMFD_CREATE
> +#include 
> +
> +static inline int
> +memfd_create(const char *name, unsigned int flags)
> +{
> +   return syscall(SYS_memfd_create, name, flags);
> +}
> +#endif
> +
>  /* Below is the only command missing from intel_aub.h in libdrm
>   * So, reuse intel_aub.h from libdrm and #define the
>   * AUB_MI_BATCH_BUFFER_END as below
> @@ -73,20 +84,39 @@ struct gen_batch_decode_ctx batch_ctx;
>  struct bo_map {
> struct list_head link;
> struct gen_batch_decode_bo bo;
> +   bool unmap_after_use;
> +};
> +
> +struct ggtt_entry {
> +   struct rb_node node;
> +   uint64_t virt_addr;
> +   uint64_t phys_addr;
> +};
> +
> +struct phys_mem {
> +   struct rb_node node;
> +   uint64_t fd_offset;
> +   uint64_t phys_addr;
> +   uint8_t *data;
>  };
>  
>  static struct list_head maps;
> +static struct rb_tree ggtt = {NULL};
> +static struct rb_tree mem = {NULL};
> +int mem_fd = -1;
> +off_t mem_fd_len = 0;
>  
>  FILE *outfile;
>  
>  struct brw_instruction;
>  
>  static void
> -add_gtt_bo_map(struct gen_batch_decode_bo bo)
> +add_gtt_bo_map(struct gen_batch_decode_bo bo, bool unmap_after_use)
>  {
> struct bo_map *m = calloc(1, sizeof(*m));
>  
> m->bo = bo;
> +   m->unmap_after_use = unmap_after_use;
> list_add(>link, );
>  }
>  
> @@ -94,21 +124,209 @@ static void
>  clear_bo_maps(void)
>  {
> list_for_each_entry_safe(struct bo_map, i, , link) {
> +  if (i->unmap_after_use)
> + munmap((void *)i->bo.map, i->bo.size);
>list_del(>link);
>free(i);
> }
>  }
>  
> +static inline struct ggtt_entry *
> +ggtt_entry_next(struct ggtt_entry *entry)
> +{
> +   if (!entry)
> +  return NULL;
> +   struct rb_node *node = rb_node_next(>node);
> +   if (!node)
> +  return NULL;
> +   return rb_node_data(struct ggtt_entry, node, node);
> +}
> +
> +static inline int
> +cmp_uint64(uint64_t a, uint64_t b)
> +{
> +   if (a < b)
> +  return -1;
> +   if (a > b)
> +  return 1;
> +   return 0;
> +}
> +
> +static inline int
> +cmp_ggtt_entry(const struct rb_node *node, const void *addr)
> +{
> +   struct ggtt_entry *entry = rb_node_data(struct ggtt_entry, node, node);
> +   return cmp_uint64(entry->virt_addr, *(const uint64_t *)addr);
> +}
> +
> +static struct ggtt_entry *
> +ensure_ggtt_entry(struct rb_tree *tree, uint64_t virt_addr)
> +{
> +   struct rb_node *node = rb_tree_search_sloppy(, _addr,
> +cmp_ggtt_entry);
> +   int cmp = 0;
> +   if (!node || (cmp = cmp_ggtt_entry(node, _addr))) {
> +  struct ggtt_entry *new_entry = calloc(1, sizeof(*new_entry));
> +  new_entry->virt_addr = virt_addr;
> +  rb_tree_insert_at(, node, _entry->node, cmp > 0);
> +  node = _entry->node;
> +   }
> +
> +   return rb_node_data(struct ggtt_entry, node, node);
> +}
> +
> +static struct ggtt_entry *
> +search_ggtt_entry(uint64_t virt_addr)
> +{
> +   virt_addr &= ~0xfff;
> +
> +   struct rb_node *node = rb_tree_search(, _addr, cmp_ggtt_entry);
> +
> +   if (!node)
> +  return NULL;
> +
> +   return rb_node_data(struct ggtt_entry, node, node);
> +}
> +
> +static inline int
> +cmp_phys_mem(const struct rb_node *node, const void *addr)
> +{
> +   struct phys_mem *mem = rb_node_data(struct phys_mem, node, node);
> +   return cmp_uint64(mem->phys_addr, *(uint64_t *)addr);
> +}
> +
> +static struct phys_mem *
> +ensure_phys_mem(uint64_t phys_addr)
> +{
> +   struct rb_node *node = rb_tree_search_sloppy(, _addr, 
> cmp_phys_

Re: [Mesa-dev] [PATCH v2 13/16] intel: tools: dump-gpu: dump 48-bit addresses

2018-06-20 Thread Rafael Antognolli
On Tue, Jun 19, 2018 at 02:45:28PM +0100, Lionel Landwerlin wrote:
> From: Scott D Phillips 
> 
> For gen8+, write out PPGTT tables in aub files so that full 48-bit
> addresses can be serialized.
> 
> v2: Fix handling of `end` index in map_ppgtt
> 
> Signed-off-by: Scott D Phillips 
> Signed-off-by: Lionel Landwerlin 
> Cc: Jordan Justen 
> ---
>  src/intel/tools/intel_aub.h  |   3 +-
>  src/intel/tools/intel_dump_gpu.c | 315 +++
>  2 files changed, 151 insertions(+), 167 deletions(-)
> 
> diff --git a/src/intel/tools/intel_aub.h b/src/intel/tools/intel_aub.h
> index 9ca548edaf3..2888515048f 100644
> --- a/src/intel/tools/intel_aub.h
> +++ b/src/intel/tools/intel_aub.h
> @@ -117,7 +117,8 @@
>  /* DW3 */
>  
>  #define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_MASK  0xf000
> -#define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_LOCAL (1 << 28)
> +#define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT  (0 << 28)
> +#define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_PHYSICAL  (2 << 28)
>  #define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY(4 << 28)
>  
>  /**
> diff --git a/src/intel/tools/intel_dump_gpu.c 
> b/src/intel/tools/intel_dump_gpu.c
> index 86c133da433..bfff481ba5e 100644
> --- a/src/intel/tools/intel_dump_gpu.c
> +++ b/src/intel/tools/intel_dump_gpu.c
> @@ -51,6 +51,8 @@
>  #define MI_LOAD_REGISTER_IMM_n(n) ((0x22 << 23) | (2 * (n) - 1))
>  #define MI_LRI_FORCE_POSTED   (1<<12)
>  
> +#define MI_BATCH_NON_SECURE_I965 (1 << 8)
> +
>  #define MI_BATCH_BUFFER_END (0xA << 23)
>  
>  #define min(a, b) ({\
> @@ -59,6 +61,12 @@
>   _a < _b ? _a : _b; \
>})
>  
> +#define max(a, b) ({\
> + __typeof(a) _a = (a);  \
> + __typeof(b) _b = (b);  \
> + _a > _b ? _a : _b; \
> +  })
> +
>  #define HWS_PGA_RCSUNIT  0x02080
>  #define HWS_PGA_VCSUNIT0   0x12080
>  #define HWS_PGA_BCSUNIT  0x22080
> @@ -93,8 +101,12 @@
>  
>  #define RING_SIZE (1 * 4096)
>  #define PPHWSP_SIZE (1 * 4096)
> -#define GEN10_LR_CONTEXT_RENDER_SIZE   (19 * 4096)
> -#define GEN8_LR_CONTEXT_OTHER_SIZE   (2 * 4096)
> +#define GEN11_LR_CONTEXT_RENDER_SIZE(14 * 4096)
> +#define GEN10_LR_CONTEXT_RENDER_SIZE(19 * 4096)
> +#define GEN9_LR_CONTEXT_RENDER_SIZE (22 * 4096)
> +#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * 4096)
> +#define GEN8_LR_CONTEXT_OTHER_SIZE  (2 * 4096)
> +
>  
>  #define STATIC_GGTT_MAP_START 0
>  
> @@ -110,14 +122,19 @@
>  #define STATIC_GGTT_MAP_END (VIDEO_CONTEXT_ADDR + PPHWSP_SIZE + 
> GEN8_LR_CONTEXT_OTHER_SIZE)
>  #define STATIC_GGTT_MAP_SIZE (STATIC_GGTT_MAP_END - STATIC_GGTT_MAP_START)
>  
> -#define CONTEXT_FLAGS (0x229)   /* Normal Priority | L3-LLC Coherency |
> -   Legacy Context with no 64 bit VA support 
> | Valid */
> +#define PML4_PHYS_ADDR ((uint64_t)(STATIC_GGTT_MAP_END))
> +
> +#define CONTEXT_FLAGS (0x339)   /* Normal Priority | L3-LLC Coherency |
> + * PPGTT Enabled |
> + * Legacy Context with 64 bit VA support |
> + * Valid
> + */
>  
> -#define RENDER_CONTEXT_DESCRIPTOR  ((uint64_t)1 << 32 | RENDER_CONTEXT_ADDR  
> | CONTEXT_FLAGS)
> -#define BLITTER_CONTEXT_DESCRIPTOR ((uint64_t)2 << 32 | BLITTER_CONTEXT_ADDR 
> | CONTEXT_FLAGS)
> -#define VIDEO_CONTEXT_DESCRIPTOR   ((uint64_t)3 << 32 | VIDEO_CONTEXT_ADDR   
> | CONTEXT_FLAGS)
> +#define RENDER_CONTEXT_DESCRIPTOR  ((uint64_t)1 << 62 | RENDER_CONTEXT_ADDR  
> | CONTEXT_FLAGS)
> +#define BLITTER_CONTEXT_DESCRIPTOR ((uint64_t)2 << 62 | BLITTER_CONTEXT_ADDR 
> | CONTEXT_FLAGS)
> +#define VIDEO_CONTEXT_DESCRIPTOR   ((uint64_t)3 << 62 | VIDEO_CONTEXT_ADDR   
> | CONTEXT_FLAGS)
>  
> -static const uint32_t render_context_init[GEN10_LR_CONTEXT_RENDER_SIZE /
> +static const uint32_t render_context_init[GEN9_LR_CONTEXT_RENDER_SIZE / /* 
> Choose the largest */
>sizeof(uint32_t)] = {
> 0 /* MI_NOOP */,
> MI_LOAD_REGISTER_IMM_n(14) | MI_LRI_FORCE_POSTED,
> @@ -147,8 +164,8 @@ static const uint32_t 
> render_context_init[GEN10_LR_CONTEXT_RENDER_SIZE /
> 0x2280 /* PDP2_LDW */,  0,
> 0x227C /* PDP1_UDW */,  0,
> 0x2278 /* PDP1_LDW */,  0,
> -   0x2274 /* PDP0_UDW */,  0,
> -   0x2270 /* PDP0_LDW */,  0,
> +   0x2274 /* PDP0_UDW */,  PML4_PHYS_ADDR >> 32,
> +   0x2270 /* PDP0_LDW */,  PML4_PHYS_ADDR,
> /* MI_NOOP */
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>  
> @@ -185,8 +202,8 @@ static const uint32_t 
> blitter_context_init[GEN8_LR_CONTEXT_OTHER_SIZE /
> 0x22280 /* PDP2_LDW */,  0,
> 0x2227C /* PDP1_UDW */,  0,
> 0x22278 /* PDP1_LDW */,  0,
> -   0x22274 /* PDP0_UDW */,  0,
> -   0x22270 /* PDP0_LDW */,  0,
> +   0x22274 /* 

Re: [Mesa-dev] [PATCH v2 12/16] intel: tools: import intel_aubdump

2018-06-20 Thread Rafael Antognolli
diff -u --ignore-all-space shows that this and the original file are
roughly the same, except for some macros, some includes and how we check
for hardware gen.

Acked-by: Rafael Antognolli 

On Tue, Jun 19, 2018 at 02:45:27PM +0100, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/Makefile.am |2 +
>  src/intel/tools/intel_dump_gpu.c  | 1313 +
>  src/intel/tools/intel_dump_gpu.in |  107 +++
>  src/intel/tools/meson.build   |   18 +
>  4 files changed, 1440 insertions(+)
>  create mode 100644 src/intel/tools/intel_dump_gpu.c
>  create mode 100755 src/intel/tools/intel_dump_gpu.in
> 
> diff --git a/src/intel/Makefile.am b/src/intel/Makefile.am
> index 3e098a7ac9b..8448640983f 100644
> --- a/src/intel/Makefile.am
> +++ b/src/intel/Makefile.am
> @@ -71,6 +71,8 @@ EXTRA_DIST = \
>   isl/meson.build \
>   tools/intel_sanitize_gpu.c \
>   tools/intel_sanitize_gpu.in \
> + tools/intel_dump_gpu.c \
> + tools/intel_dump_gpu.in \
>   tools/meson.build \
>   vulkan/meson.build \
>   meson.build
> diff --git a/src/intel/tools/intel_dump_gpu.c 
> b/src/intel/tools/intel_dump_gpu.c
> new file mode 100644
> index 000..86c133da433
> --- /dev/null
> +++ b/src/intel/tools/intel_dump_gpu.c
> @@ -0,0 +1,1313 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "intel_aub.h"
> +
> +#include "dev/gen_device_info.h"
> +#include "util/macros.h"
> +
> +#ifndef ALIGN
> +#define ALIGN(x, y) (((x) + (y)-1) & ~((y)-1))
> +#endif
> +
> +#define MI_LOAD_REGISTER_IMM_n(n) ((0x22 << 23) | (2 * (n) - 1))
> +#define MI_LRI_FORCE_POSTED   (1<<12)
> +
> +#define MI_BATCH_BUFFER_END (0xA << 23)
> +
> +#define min(a, b) ({\
> + __typeof(a) _a = (a);  \
> + __typeof(b) _b = (b);  \
> + _a < _b ? _a : _b; \
> +  })
> +
> +#define HWS_PGA_RCSUNIT  0x02080
> +#define HWS_PGA_VCSUNIT0   0x12080
> +#define HWS_PGA_BCSUNIT  0x22080
> +
> +#define GFX_MODE_RCSUNIT   0x0229c
> +#define GFX_MODE_VCSUNIT0   0x1229c
> +#define GFX_MODE_BCSUNIT   0x2229c
> +
> +#define EXECLIST_SUBMITPORT_RCSUNIT   0x02230
> +#define EXECLIST_SUBMITPORT_VCSUNIT0   0x12230
> +#define EXECLIST_SUBMITPORT_BCSUNIT   0x22230
> +
> +#define EXECLIST_STATUS_RCSUNIT  0x02234
> +#define EXECLIST_STATUS_VCSUNIT0   0x12234
> +#define EXECLIST_STATUS_BCSUNIT  0x22234
> +
> +#define EXECLIST_SQ_CONTENTS0_RCSUNIT   0x02510
> +#define EXECLIST_SQ_CONTENTS0_VCSUNIT0   0x12510
> +#define EXECLIST_SQ_CONTENTS0_BCSUNIT   0x22510
> +
> +#define EXECLIST_CONTROL_RCSUNIT   0x02550
> +#define EXECLIST_CONTROL_VCSUNIT0   0x12550
> +#define EXECLIST_CONTROL_BCSUNIT   0x22550
> +
> +#define MEMORY_MAP_SIZE (64 /* MiB */ * 1024 * 1024)
> +
> +#define PTE_SIZE 4
> +#define GEN8_PTE_SIZE 8
> +
> +#define NUM_PT_ENTRIES (ALIGN(MEMORY_MAP_SIZE, 4096) / 4096)
> +#define PT_SIZE ALIGN(NUM_PT_ENTRIES * GEN8_PTE_SIZE, 4096)
> +
> +#define RING_SIZE (1 * 4096)
> +#define PPHWSP_SIZE (1 * 4096)
> +#define GEN10_LR_CON

Re: [Mesa-dev] [PATCH v2 11/16] intel: tools: update intel_aub.h

2018-06-20 Thread Rafael Antognolli
On Tue, Jun 19, 2018 at 02:45:26PM +0100, Lionel Landwerlin wrote:
> Scott added new stuff in IGT.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/intel_aub.h | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/src/intel/tools/intel_aub.h b/src/intel/tools/intel_aub.h
> index 5f0aba8e68e..9ca548edaf3 100644
> --- a/src/intel/tools/intel_aub.h
> +++ b/src/intel/tools/intel_aub.h
> @@ -49,6 +49,12 @@
>  #define CMD_AUB  (7 << 29)
>  
>  #define CMD_AUB_HEADER   (CMD_AUB | (1 << 23) | (0x05 << 16))
> +
> +#define CMD_MEM_TRACE_REGISTER_POLL  (CMD_AUB | (0x2e << 23) | (0x02 << 16))
> +#define CMD_MEM_TRACE_REGISTER_WRITE (CMD_AUB | (0x2e << 23) | (0x03 << 16))
> +#define CMD_MEM_TRACE_MEMORY_WRITE   (CMD_AUB | (0x2e << 23) | (0x06 << 16))
> +#define CMD_MEM_TRACE_VERSION(CMD_AUB | (0x2e << 23) | (0x0e 
> << 16))
> +
>  /* DW1 */
>  # define AUB_HEADER_MAJOR_SHIFT  24
>  # define AUB_HEADER_MINOR_SHIFT  16
> @@ -92,8 +98,28 @@
>  #define AUB_TRACE_MEMTYPE_PCI(3 << 16)
>  #define AUB_TRACE_MEMTYPE_GTT_ENTRY (4 << 16)
>  
> +#define AUB_MEM_TRACE_VERSION_FILE_VERSION   1
> +
>  /* DW2 */
>  
> +#define AUB_MEM_TRACE_VERSION_DEVICE_MASK0xff00
> +#define AUB_MEM_TRACE_VERSION_DEVICE_CNL (15 << 8)
> +
> +#define AUB_MEM_TRACE_VERSION_METHOD_MASK0x000c
> +#define AUB_MEM_TRACE_VERSION_METHOD_PHY (1 << 18)
> +
> +#define AUB_MEM_TRACE_REGISTER_SIZE_MASK 0x000f
> +#define AUB_MEM_TRACE_REGISTER_SIZE_DWORD(2 << 16)
> +
> +#define AUB_MEM_TRACE_REGISTER_SPACE_MASK0xf000
> +#define AUB_MEM_TRACE_REGISTER_SPACE_MMIO(0 << 28)
> +
> +/* DW3 */
> +
> +#define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_MASK  0xf000
> +#define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_LOCAL (1 << 28)
> +#define AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY(4 << 28)
> +

Cool, we can use these last ones in aubinator's
handle_memtrace_mem_write(). That could be done later though.
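
Something along these lines maybe (untested sketch, assuming
handle_memtrace_mem_write() takes the raw dword pointer like
handle_memtrace_reg_write() does, and that the address space bits live
in DW3 as the header comment suggests):

   uint32_t address_space = p[3] & AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_MASK;

   switch (address_space) {
   case AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_GGTT_ENTRY:
      /* payload updates GGTT entries */
      break;
   case AUB_MEM_TRACE_MEMORY_ADDRESS_SPACE_LOCAL:
      /* payload is a plain memory write at the given address */
      break;
   default:
      break;
   }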

Reviewed-by: Rafael Antognolli 

>  /**
>   * aub_state_struct_type enum values are encoded with the top 16 bits
>   * representing the type to be delivered to the .aub file, and the bottom 16
> -- 
> 2.17.1
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 07/16] intel/tools/aubinator: aubinate ppgtt aubs

2018-06-20 Thread Rafael Antognolli
On Wed, Jun 20, 2018 at 12:01:28PM -0700, Rafael Antognolli wrote:
> On Wed, Jun 20, 2018 at 11:03:32AM +0100, Lionel Landwerlin wrote:
> > On 20/06/18 01:00, Rafael Antognolli wrote:
> > > On Tue, Jun 19, 2018 at 02:45:22PM +0100, Lionel Landwerlin wrote:
> > > > From: Scott D Phillips 
> > > > 
> > > > v2: by Lionel
> > > >  Fix memfd_create compilation issue
> 
> I guess this memfd_create was supposed to be on patch 05, right?

Oops, I meant patch 06 :P

> 
> With this and the extra memfd_create() removed, this patch is
> 
> Reviewed-by: Rafael Antognolli 
> 
> > > >  Fix pml4 address stored on 32 instead of 64bits
> > > >  Return no buffer if first ppgtt page is not mapped
> > > > 
> > > > Signed-off-by: Lionel Landwerlin 
> > > > ---
> > > >   src/intel/tools/aubinator.c | 76 -
> > > >   1 file changed, 75 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> > > > index 962546d360c..3368ac521bd 100644
> > > > --- a/src/intel/tools/aubinator.c
> > > > +++ b/src/intel/tools/aubinator.c
> > > > @@ -336,6 +336,68 @@ get_ggtt_batch_bo(void *user_data, uint64_t 
> > > > address)
> > > >  return bo;
> > > >   }
> > > > +
> > > > +static struct phys_mem *
> > > > +ppgtt_walk(uint64_t pml4, uint64_t address)
> > > > +{
> > > > +   uint64_t shift = 39;
> > > > +   uint64_t addr = pml4;
> > > > +   for (int level = 4; level > 0; level--) {
> > > > +  struct phys_mem *table = search_phys_mem(addr);
> > > > +  if (!table)
> > > > + return NULL;
> > > > +  int index = (address >> shift) & 0x1ff;
> > > > +  uint64_t entry = ((uint64_t *)table->data)[index];
> > > > +  if (!(entry & 1))
> > > > + return NULL;
> > > > +  addr = entry & ~0xfff;
> > > > +  shift -= 9;
> > > > +   }
> > > > +   return search_phys_mem(addr);
> > > > +}
> > > > +
> > > > +static bool
> > > > +ppgtt_mapped(uint64_t pml4, uint64_t address)
> > > > +{
> > > > +   return ppgtt_walk(pml4, address) != NULL;
> > > > +}
> > > > +
> > > > +static struct gen_batch_decode_bo
> > > > +get_ppgtt_batch_bo(void *user_data, uint64_t address)
> > > > +{
> > > > +   struct gen_batch_decode_bo bo = {0};
> > > > +   uint64_t pml4 = *(uint64_t *)user_data;
> > > > +
> > > > +   address &= ~0xfff;
> > > > +
> > > > +   if (!ppgtt_mapped(pml4, address))
> > > > +  return bo;
> > > > +
> > > > +   /* Map everything until the first gap since we don't know how much 
> > > > the
> > > > +* decoder actually needs.
> > > > +*/
> > > > +   uint64_t end = address;
> > > > +   while (ppgtt_mapped(pml4, end))
> > > > +  end += 4096;
> > > > +
> > > > +   bo.addr = address;
> > > > +   bo.size = end - address;
> > > > +   bo.map = mmap(NULL, bo.size, PROT_READ, MAP_SHARED | MAP_ANONYMOUS, 
> > > > -1, 0);
> > > > +   assert(bo.map != MAP_FAILED);
> > > > +
> > > > +   for (uint64_t page = address; page < end; page += 4096) {
> > > > +  struct phys_mem *phys_mem = ppgtt_walk(pml4, page);
> > > > +
> > > > +  void *res = mmap((uint8_t *)bo.map + (page - bo.addr), 4096, 
> > > > PROT_READ,
> > > > +   MAP_SHARED | MAP_FIXED, mem_fd, 
> > > > phys_mem->fd_offset);
> > > > +  assert(res != MAP_FAILED);
> > > > +   }
> > > > +
> > > > +   add_gtt_bo_map(bo, true);
> > > > +
> > > > +   return bo;
> > > > +}
> > > > +
> > > >   #define GEN_ENGINE_RENDER 1
> > > >   #define GEN_ENGINE_BLITTER 2
> > > > @@ -377,6 +439,7 @@ handle_trace_block(uint32_t *p)
> > > > }
> > > > (void)engine; /* TODO */
> > > > +  batch_ctx.get_bo = get_ggtt_batch_bo;
> > > > gen_print_batch(_ctx, bo.map, bo.size, 0);
> > > > clear_bo_maps();
> > > >

Re: [Mesa-dev] [PATCH v2 07/16] intel/tools/aubinator: aubinate ppgtt aubs

2018-06-20 Thread Rafael Antognolli
On Wed, Jun 20, 2018 at 11:03:32AM +0100, Lionel Landwerlin wrote:
> On 20/06/18 01:00, Rafael Antognolli wrote:
> > On Tue, Jun 19, 2018 at 02:45:22PM +0100, Lionel Landwerlin wrote:
> > > From: Scott D Phillips 
> > > 
> > > v2: by Lionel
> > >  Fix memfd_create compilation issue

I guess this memfd_create was supposed to be on patch 05, right?

With this and the extra memfd_create() removed, this patch is

Reviewed-by: Rafael Antognolli 

> > >  Fix pml4 address stored on 32 instead of 64bits
> > >  Return no buffer if first ppgtt page is not mapped
> > > 
> > > Signed-off-by: Lionel Landwerlin 
> > > ---
> > >   src/intel/tools/aubinator.c | 76 -
> > >   1 file changed, 75 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> > > index 962546d360c..3368ac521bd 100644
> > > --- a/src/intel/tools/aubinator.c
> > > +++ b/src/intel/tools/aubinator.c
> > > @@ -336,6 +336,68 @@ get_ggtt_batch_bo(void *user_data, uint64_t address)
> > >  return bo;
> > >   }
> > > +
> > > +static struct phys_mem *
> > > +ppgtt_walk(uint64_t pml4, uint64_t address)
> > > +{
> > > +   uint64_t shift = 39;
> > > +   uint64_t addr = pml4;
> > > +   for (int level = 4; level > 0; level--) {
> > > +  struct phys_mem *table = search_phys_mem(addr);
> > > +  if (!table)
> > > + return NULL;
> > > +  int index = (address >> shift) & 0x1ff;
> > > +  uint64_t entry = ((uint64_t *)table->data)[index];
> > > +  if (!(entry & 1))
> > > + return NULL;
> > > +  addr = entry & ~0xfff;
> > > +  shift -= 9;
> > > +   }
> > > +   return search_phys_mem(addr);
> > > +}
> > > +
> > > +static bool
> > > +ppgtt_mapped(uint64_t pml4, uint64_t address)
> > > +{
> > > +   return ppgtt_walk(pml4, address) != NULL;
> > > +}
> > > +
> > > +static struct gen_batch_decode_bo
> > > +get_ppgtt_batch_bo(void *user_data, uint64_t address)
> > > +{
> > > +   struct gen_batch_decode_bo bo = {0};
> > > +   uint64_t pml4 = *(uint64_t *)user_data;
> > > +
> > > +   address &= ~0xfff;
> > > +
> > > +   if (!ppgtt_mapped(pml4, address))
> > > +  return bo;
> > > +
> > > +   /* Map everything until the first gap since we don't know how much the
> > > +* decoder actually needs.
> > > +*/
> > > +   uint64_t end = address;
> > > +   while (ppgtt_mapped(pml4, end))
> > > +  end += 4096;
> > > +
> > > +   bo.addr = address;
> > > +   bo.size = end - address;
> > > +   bo.map = mmap(NULL, bo.size, PROT_READ, MAP_SHARED | MAP_ANONYMOUS, 
> > > -1, 0);
> > > +   assert(bo.map != MAP_FAILED);
> > > +
> > > +   for (uint64_t page = address; page < end; page += 4096) {
> > > +  struct phys_mem *phys_mem = ppgtt_walk(pml4, page);
> > > +
> > > +  void *res = mmap((uint8_t *)bo.map + (page - bo.addr), 4096, 
> > > PROT_READ,
> > > +   MAP_SHARED | MAP_FIXED, mem_fd, 
> > > phys_mem->fd_offset);
> > > +  assert(res != MAP_FAILED);
> > > +   }
> > > +
> > > +   add_gtt_bo_map(bo, true);
> > > +
> > > +   return bo;
> > > +}
> > > +
> > >   #define GEN_ENGINE_RENDER 1
> > >   #define GEN_ENGINE_BLITTER 2
> > > @@ -377,6 +439,7 @@ handle_trace_block(uint32_t *p)
> > > }
> > > (void)engine; /* TODO */
> > > +  batch_ctx.get_bo = get_ggtt_batch_bo;
> > > gen_print_batch(_ctx, bo.map, bo.size, 0);
> > > clear_bo_maps();
> > > @@ -402,7 +465,7 @@ aubinator_init(uint16_t aub_pci_id, const char 
> > > *app_name)
> > >  batch_flags |= GEN_BATCH_DECODE_FLOATS;
> > >  gen_batch_decode_ctx_init(_ctx, , outfile, batch_flags,
> > > - xml_path, get_ggtt_batch_bo, NULL, NULL);
> > > + xml_path, NULL, NULL, NULL);
> > >  batch_ctx.max_vbo_decoded_lines = max_vbo_lines;
> > >  char *color = GREEN_HEADER, *reset_color = NORMAL;
> > > @@ -542,12 +605,20 @@ handle_memtrace_reg_write(uint32_t *p)
> > >  uint32_t ring_buffer_head = context[5];

Re: [Mesa-dev] [PATCH v2 06/16] intel: aubinator: handle GGTT mappings

2018-06-20 Thread Rafael Antognolli
On Tue, Jun 19, 2018 at 02:45:21PM +0100, Lionel Landwerlin wrote:
> We use memfd to store physical pages as they get read/written to and
> the GGTT entries translating virtual address to physical pages.
> 
> Based on a commit by Scott Phillips.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aubinator.c | 256 ++--
>  1 file changed, 243 insertions(+), 13 deletions(-)
> 
> diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> index f70038376be..962546d360c 100644
> --- a/src/intel/tools/aubinator.c
> +++ b/src/intel/tools/aubinator.c
> @@ -39,12 +39,23 @@
>  
>  #include "util/list.h"
>  #include "util/macros.h"
> +#include "util/rb_tree.h"
>  
>  #include "common/gen_decoder.h"
>  #include "common/gen_disasm.h"
>  #include "common/gen_gem.h"
>  #include "intel_aub.h"
>  
> +#ifndef HAVE_MEMFD_CREATE
> +#include 
> +
> +static inline int
> +memfd_create(const char *name, unsigned int flags)
> +{
> +   return syscall(SYS_memfd_create, name, flags);
> +}
> +#endif
> +
>  /* Below is the only command missing from intel_aub.h in libdrm
>   * So, reuse intel_aub.h from libdrm and #define the
>   * AUB_MI_BATCH_BUFFER_END as below
> @@ -73,9 +84,27 @@ struct gen_batch_decode_ctx batch_ctx;
>  struct bo_map {
> struct list_head link;
> struct gen_batch_decode_bo bo;
> +   bool unmap_after_use;
> +};
> +
> +struct ggtt_entry {
> +   struct rb_node node;
> +   uint64_t virt_addr;
> +   uint64_t phys_addr;
> +};
> +
> +struct phys_mem {
> +   struct rb_node node;
> +   uint64_t fd_offset;
> +   uint64_t phys_addr;
> +   uint8_t *data;
>  };
>  
>  static struct list_head maps;
> +static struct rb_tree ggtt = {NULL};
> +static struct rb_tree mem = {NULL};
> +int mem_fd = -1;
> +off_t mem_fd_len = 0;
>  
>  FILE *outfile;
>  
> @@ -92,11 +121,12 @@ field(uint32_t value, int start, int end)
>  struct brw_instruction;
>  
>  static void
> -add_gtt_bo_map(struct gen_batch_decode_bo bo)
> +add_gtt_bo_map(struct gen_batch_decode_bo bo, bool unmap_after_use)
>  {
> struct bo_map *m = calloc(1, sizeof(*m));
>  
> m->bo = bo;
> +   m->unmap_after_use = unmap_after_use;
> list_add(>link, );
>  }
>  
> @@ -104,21 +134,208 @@ static void
>  clear_bo_maps(void)
>  {
> list_for_each_entry_safe(struct bo_map, i, , link) {
> +  if (i->unmap_after_use)
> + munmap((void *)i->bo.map, i->bo.size);
>list_del(>link);
>free(i);
> }
>  }
>  
> +static inline struct ggtt_entry *
> +ggtt_entry_next(struct ggtt_entry *entry)
> +{
> +   if (!entry)
> +  return NULL;
> +   struct rb_node *node = rb_node_next(>node);
> +   if (!node)
> +  return NULL;
> +   return rb_node_data(struct ggtt_entry, node, node);
> +}
> +
> +static inline int
> +cmp_uint64(uint64_t a, uint64_t b)
> +{
> +   if (a < b)
> +  return -1;
> +   if (a > b)
> +  return 1;
> +   return 0;
> +}
> +
> +static inline int
> +cmp_ggtt_entry(const struct rb_node *node, const void *addr)
> +{
> +   struct ggtt_entry *entry = rb_node_data(struct ggtt_entry, node, node);
> +   return cmp_uint64(entry->virt_addr, *(uint64_t *)addr);
> +}
> +
> +static struct ggtt_entry *
> +ensure_ggtt_entry(struct rb_tree *tree, uint64_t virt_addr)
> +{
> +   struct rb_node *node = rb_tree_search_sloppy(&ggtt, &virt_addr,
> +cmp_ggtt_entry);
> +   int cmp = 0;
> +   if (!node || (cmp = cmp_ggtt_entry(node, &virt_addr))) {
> +  struct ggtt_entry *new_entry = calloc(1, sizeof(*new_entry));
> +  new_entry->virt_addr = virt_addr;
> +  rb_tree_insert_at(&ggtt, node, &new_entry->node, cmp > 0);
> +  node = &new_entry->node;
> +   }
> +
> +   return rb_node_data(struct ggtt_entry, node, node);
> +}
> +
> +static struct ggtt_entry *
> +search_ggtt_entry(uint64_t virt_addr)
> +{
> +   virt_addr &= ~0xfff;
> +
> +   struct rb_node *node = rb_tree_search(&ggtt, &virt_addr, cmp_ggtt_entry);
> +
> +   if (!node)
> +  return NULL;
> +
> +   return rb_node_data(struct ggtt_entry, node, node);
> +}
> +
> +static inline int
> +cmp_phys_mem(const struct rb_node *node, const void *addr)
> +{
> +   struct phys_mem *mem = rb_node_data(struct phys_mem, node, node);
> +   return cmp_uint64(mem->phys_addr, *(uint64_t *)addr);
> +}
> +
> +static struct phys_mem *
> +ensure_phys_mem(uint64_t phys_addr)
> +{
> +   struct rb_node *node = rb_tree_search_sloppy(&mem, &phys_addr, 
> cmp_phys_mem);
> +   int cmp = 0;
> +   if (!node || (cmp = cmp_phys_mem(node, &phys_addr))) {
> +  struct phys_mem *new_mem = calloc(1, sizeof(*new_mem));
> +  new_mem->phys_addr = phys_addr;
> +  new_mem->fd_offset = mem_fd_len;
> +
> +  int ftruncate_res = ftruncate(mem_fd, mem_fd_len += 4096);
> +  assert(ftruncate_res == 0);
> +
> +  new_mem->data = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
> +   mem_fd, new_mem->fd_offset);
> +  assert(new_mem->data != MAP_FAILED);
> +
> +  rb_tree_insert_at(&mem, node, &new_mem->node, cmp > 0);
> + 

Re: [Mesa-dev] [PATCH v2 07/16] intel/tools/aubinator: aubinate ppgtt aubs

2018-06-19 Thread Rafael Antognolli
On Tue, Jun 19, 2018 at 02:45:22PM +0100, Lionel Landwerlin wrote:
> From: Scott D Phillips 
> 
> v2: by Lionel
> Fix memfd_create compilation issue
> Fix pml4 address stored on 32 instead of 64bits
> Return no buffer if first ppgtt page is not mapped
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aubinator.c | 76 -
>  1 file changed, 75 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> index 962546d360c..3368ac521bd 100644
> --- a/src/intel/tools/aubinator.c
> +++ b/src/intel/tools/aubinator.c
> @@ -336,6 +336,68 @@ get_ggtt_batch_bo(void *user_data, uint64_t address)
>  
> return bo;
>  }
> +
> +static struct phys_mem *
> +ppgtt_walk(uint64_t pml4, uint64_t address)
> +{
> +   uint64_t shift = 39;
> +   uint64_t addr = pml4;
> +   for (int level = 4; level > 0; level--) {
> +  struct phys_mem *table = search_phys_mem(addr);
> +  if (!table)
> + return NULL;
> +  int index = (address >> shift) & 0x1ff;
> +  uint64_t entry = ((uint64_t *)table->data)[index];
> +  if (!(entry & 1))
> + return NULL;
> +  addr = entry & ~0xfff;
> +  shift -= 9;
> +   }
> +   return search_phys_mem(addr);
> +}
> +
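
As an aside for readers less familiar with the layout: the walk above is
the standard 4-level format, 9 bits of table index per level starting at
bit 39, with bit 0 of each entry used as the present bit and the low 12
bits of the address being the offset inside the 4k page. A tiny
illustrative split of an address (not part of the patch, example value
made up):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
   uint64_t va = 0x0000712345678abcull;   /* made-up 48-bit GPU address */

   int pml4e = (va >> 39) & 0x1ff;        /* level 4, shift == 39 */
   int pdpe  = (va >> 30) & 0x1ff;        /* level 3, shift == 30 */
   int pde   = (va >> 21) & 0x1ff;        /* level 2, shift == 21 */
   int pte   = (va >> 12) & 0x1ff;        /* level 1, shift == 12 */
   int off   = va & 0xfff;                /* offset inside the 4k page */

   printf("pml4e=%d pdpe=%d pde=%d pte=%d offset=0x%03x\n",
          pml4e, pdpe, pde, pte, off);
   return 0;
}
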
> +static bool
> +ppgtt_mapped(uint64_t pml4, uint64_t address)
> +{
> +   return ppgtt_walk(pml4, address) != NULL;
> +}
> +
> +static struct gen_batch_decode_bo
> +get_ppgtt_batch_bo(void *user_data, uint64_t address)
> +{
> +   struct gen_batch_decode_bo bo = {0};
> +   uint64_t pml4 = *(uint64_t *)user_data;
> +
> +   address &= ~0xfff;
> +
> +   if (!ppgtt_mapped(pml4, address))
> +  return bo;
> +
> +   /* Map everything until the first gap since we don't know how much the
> +* decoder actually needs.
> +*/
> +   uint64_t end = address;
> +   while (ppgtt_mapped(pml4, end))
> +  end += 4096;
> +
> +   bo.addr = address;
> +   bo.size = end - address;
> +   bo.map = mmap(NULL, bo.size, PROT_READ, MAP_SHARED | MAP_ANONYMOUS, -1, 
> 0);
> +   assert(bo.map != MAP_FAILED);
> +
> +   for (uint64_t page = address; page < end; page += 4096) {
> +  struct phys_mem *phys_mem = ppgtt_walk(pml4, page);
> +
> +  void *res = mmap((uint8_t *)bo.map + (page - bo.addr), 4096, PROT_READ,
> +   MAP_SHARED | MAP_FIXED, mem_fd, phys_mem->fd_offset);
> +  assert(res != MAP_FAILED);
> +   }
> +
> +   add_gtt_bo_map(bo, true);
> +
> +   return bo;
> +}
> +
>  #define GEN_ENGINE_RENDER 1
>  #define GEN_ENGINE_BLITTER 2
>  
> @@ -377,6 +439,7 @@ handle_trace_block(uint32_t *p)
>}
>  
>(void)engine; /* TODO */
> +  batch_ctx.get_bo = get_ggtt_batch_bo;
>gen_print_batch(&batch_ctx, bo.map, bo.size, 0);
>  
>clear_bo_maps();
> @@ -402,7 +465,7 @@ aubinator_init(uint16_t aub_pci_id, const char *app_name)
> batch_flags |= GEN_BATCH_DECODE_FLOATS;
>  
> gen_batch_decode_ctx_init(&batch_ctx, &devinfo, outfile, batch_flags,
> - xml_path, get_ggtt_batch_bo, NULL, NULL);
> + xml_path, NULL, NULL, NULL);
> batch_ctx.max_vbo_decoded_lines = max_vbo_lines;
>  
> char *color = GREEN_HEADER, *reset_color = NORMAL;
> @@ -542,12 +605,20 @@ handle_memtrace_reg_write(uint32_t *p)
> uint32_t ring_buffer_head = context[5];
> uint32_t ring_buffer_tail = context[7];
> uint32_t ring_buffer_start = context[9];
> +   uint64_t pml4 = (uint64_t)context[49] << 32 | context[51];
>  
> struct gen_batch_decode_bo ring_bo = get_ggtt_batch_bo(NULL,
>ring_buffer_start);
> assert(ring_bo.size > 0);
> void *commands = (uint8_t *)ring_bo.map + (ring_bo.addr - 
> ring_buffer_start);
>  
> +   if (context_descriptor & 0x100 /* ppgtt */) {
> +  batch_ctx.get_bo = get_ppgtt_batch_bo;
> +  batch_ctx.user_data = &pml4;
> +   } else {
> +  batch_ctx.get_bo = get_ggtt_batch_bo;
> +   }
> +
> (void)engine; /* TODO */
> gen_print_batch(_ctx, commands, ring_buffer_tail - ring_buffer_head,
> 0);
> @@ -849,6 +920,9 @@ int main(int argc, char *argv[])
>  
> list_inithead(&maps);
>  
> +   mem_fd = memfd_create("phys memory", 0);
> +
> +

It seems like this memfd_create() got duplicated here (it was added in
the previous patch).

> file = aub_file_open(input_file);
>  
> while (aub_file_more_stuff(file) &&
> -- 
> 2.17.1
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 04/16] intel: aubinator: drop the 1Tb GTT mapping

2018-06-19 Thread Rafael Antognolli
Patch is

Reviewed-by: Rafael Antognolli 

On Tue, Jun 19, 2018 at 02:45:19PM +0100, Lionel Landwerlin wrote:
> Now that we're softpinning the address of our BOs in anv & i965, the
> addresses selected start at the top of the addressing space. This is a
> problem for the current implementation of aubinator which uses only a
> 40bit mmapped address space.
> 
> This change keeps track of all the memory writes from the aub file and
> fetch them on request by the batch decoder. As a result we can get rid
> of the 1<<40 mmapped address space and only rely on the mmap aub file
> \o/
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aubinator.c | 130 
>  1 file changed, 72 insertions(+), 58 deletions(-)
> 
> diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> index 0438f96cd1b..f70038376be 100644
> --- a/src/intel/tools/aubinator.c
> +++ b/src/intel/tools/aubinator.c
> @@ -37,10 +37,12 @@
>  #include 
>  #include 
>  
> +#include "util/list.h"
>  #include "util/macros.h"
>  
>  #include "common/gen_decoder.h"
>  #include "common/gen_disasm.h"
> +#include "common/gen_gem.h"
>  #include "intel_aub.h"
>  
>  /* Below is the only command missing from intel_aub.h in libdrm
> @@ -68,8 +70,12 @@ char *input_file = NULL, *xml_path = NULL;
>  struct gen_device_info devinfo;
>  struct gen_batch_decode_ctx batch_ctx;
>  
> -uint64_t gtt_size, gtt_end;
> -void *gtt;
> +struct bo_map {
> +   struct list_head link;
> +   struct gen_batch_decode_bo bo;
> +};
> +
> +static struct list_head maps;
>  
>  FILE *outfile;
>  
> @@ -85,10 +91,32 @@ field(uint32_t value, int start, int end)
>  
>  struct brw_instruction;
>  
> -static inline int
> -valid_offset(uint32_t offset)
> +static void
> +add_gtt_bo_map(struct gen_batch_decode_bo bo)
>  {
> -   return offset < gtt_end;
> +   struct bo_map *m = calloc(1, sizeof(*m));
> +
> +   m->bo = bo;
> +   list_add(&m->link, &maps);
> +}
> +
> +static void
> +clear_bo_maps(void)
> +{
> +   list_for_each_entry_safe(struct bo_map, i, &maps, link) {
> +  list_del(&i->link);
> +  free(i);
> +   }
> +}
> +
> +static struct gen_batch_decode_bo
> +get_gen_batch_bo(void *user_data, uint64_t address)
> +{
> +   list_for_each_entry(struct bo_map, i, &maps, link)
> +  if (i->bo.addr <= address && i->bo.addr + i->bo.size > address)
> + return i->bo;
> +
> +   return (struct gen_batch_decode_bo) { .map = NULL };
>  }
>  
>  #define GEN_ENGINE_RENDER 1
> @@ -100,26 +128,23 @@ handle_trace_block(uint32_t *p)
> int operation = p[1] & AUB_TRACE_OPERATION_MASK;
> int type = p[1] & AUB_TRACE_TYPE_MASK;
> int address_space = p[1] & AUB_TRACE_ADDRESS_SPACE_MASK;
> -   uint64_t offset = p[3];
> -   uint32_t size = p[4];
> int header_length = p[0] & 0x;
> -   uint32_t *data = p + header_length + 2;
> int engine = GEN_ENGINE_RENDER;
> -
> -   if (devinfo.gen >= 8)
> -  offset += (uint64_t) p[5] << 32;
> +   struct gen_batch_decode_bo bo = {
> +  .map = p + header_length + 2,
> +  /* Addresses written by aubdump here are in canonical form but the 
> batch
> +   * decoder always gives us addresses with the top 16bits zeroed, so do
> +   * the same here.
> +   */
> +  .addr = gen_48b_address((devinfo.gen >= 8 ? ((uint64_t) p[5] << 32) : 
> 0) |
> +  ((uint64_t) p[3])),
> +  .size = p[4],
> +   };
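
The gen_48b_address() call here is doing the normalization described in
the quoted comment just above: aubdump hands out canonical (sign-extended)
48-bit addresses, while the decoder compares addresses with the top 16
bits zeroed. A small sketch of the two conversions, not necessarily
gen_gem.h's exact implementation:

#include <assert.h>
#include <stdint.h>

/* Zero the top 16 bits: the form the batch decoder deals in. */
static uint64_t
addr_48b(uint64_t v)
{
   return v & ((1ull << 48) - 1);
}

/* Sign-extend bit 47 back up: the canonical form. Relies on arithmetic
 * right shift of a negative value, as gcc/clang provide.
 */
static uint64_t
addr_canonical(uint64_t v)
{
   return (uint64_t)((int64_t)(v << 16) >> 16);
}

int
main(void)
{
   uint64_t canonical = 0xffffffff12345000ull;   /* high-half example address */
   uint64_t zeroed    = addr_48b(canonical);

   assert(zeroed == 0x0000ffff12345000ull);
   assert(addr_canonical(zeroed) == canonical);
   return 0;
}
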
>  
> switch (operation) {
> case AUB_TRACE_OP_DATA_WRITE:
> -  if (address_space != AUB_TRACE_MEMTYPE_GTT)
> - break;
> -  if (gtt_size < offset + size) {
> - fprintf(stderr, "overflow gtt space: %s\n", strerror(errno));
> - exit(EXIT_FAILURE);
> -  }
> -  memcpy((char *) gtt + offset, data, size);
> -  if (gtt_end < offset + size)
> - gtt_end = offset + size;
> +  if (address_space == AUB_TRACE_MEMTYPE_GTT)
> + add_gtt_bo_map(bo);
>break;
> case AUB_TRACE_OP_COMMAND_WRITE:
>switch (type) {
> @@ -135,27 +160,13 @@ handle_trace_block(uint32_t *p)
>}
>  
>(void)engine; /* TODO */
> -  gen_print_batch(&batch_ctx, data, size, 0);
> +  gen_print_batch(&batch_ctx, bo.map, bo.size, 0);
>  
> -  gtt_end = 0;
> +  clear_bo_maps();
>break;
> }
>  }
>  
> -static struct gen_batch_decode_bo
> -get_gen_batch_bo(void *user_data, uint64_t address)
> -{
> -   if (address > gtt_end)
> -  

Re: [Mesa-dev] [PATCH v2 02/16] intel: aubinator: remove standard input processing option

2018-06-19 Thread Rafael Antognolli
On Tue, Jun 19, 2018 at 11:40:30AM -0700, Rafael Antognolli wrote:
> On Tue, Jun 19, 2018 at 02:45:17PM +0100, Lionel Landwerlin wrote:
> > Now that we rely on mmap of the data to parse, we can't process the
> > standard input anymore.
> 
> Didn't we rely on mmap of the data since forever?

Oh, I think it's because of patch 04, right? If so, I think we need to
update the message to reflect that this is going to be changed in a
later commit. And maybe explain it a little more, something like:

"In a follow-up commit in this series, we stop copying the data from the
mmap'ed file into our big gtt mmap, and start referencing data in it
directly. So reallocating the read buffer and adding more data from
stdin wouldn't work. For that reason, let's stop supporting stdin
processing."

Or something like that, assuming I understood it correctly.
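
In other words, once the later patches reference the mmap'ed file
directly, the parser wants the whole aub file available as one stable,
seekable, read-only mapping it can keep pointers into, and a pipe offers
neither a size to mmap nor stable backing. Roughly this shape
(illustrative code, not aub_file_open() itself):

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
   assert(argc == 2);

   int fd = open(argv[1], O_RDONLY);
   assert(fd >= 0);

   struct stat sb;
   int ret = fstat(fd, &sb);
   assert(ret == 0);                     /* stdin/pipes have no usable size */

   uint32_t *map = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
   assert(map != MAP_FAILED);

   const uint32_t *cursor = map;
   const uint32_t *end = map + sb.st_size / 4;

   /* ... walk dwords between cursor and end, keeping pointers into map ... */
   (void)cursor;
   (void)end;

   munmap(map, sb.st_size);
   close(fd);
   return 0;
}
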

Anyway, this patch is

Reviewed-by: Rafael Antognolli 

> > This isn't much of a big deal because we have in-process batch decoder
> > (run with INTEL_DEBUG=batch) that supports essentially doing the same
> > thing.
> > 
> > Signed-off-by: Lionel Landwerlin 
> > ---
> >  src/intel/tools/aubinator.c | 102 +---
> >  1 file changed, 12 insertions(+), 90 deletions(-)
> > 
> > diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> > index 949ba96e556..3f9047e69a8 100644
> > --- a/src/intel/tools/aubinator.c
> > +++ b/src/intel/tools/aubinator.c
> > @@ -350,17 +350,6 @@ aub_file_open(const char *filename)
> > return file;
> >  }
> >  
> > -static struct aub_file *
> > -aub_file_stdin(void)
> > -{
> > -   struct aub_file *file;
> > -
> > -   file = calloc(1, sizeof *file);
> > -   file->stream = stdin;
> > -
> > -   return file;
> > -}
> > -
> >  #define TYPE(dw)   (((dw) >> 29) & 7)
> >  #define OPCODE(dw) (((dw) >> 23) & 0x3f)
> >  #define SUBOPCODE(dw)  (((dw) >> 16) & 0x7f)
> > @@ -398,8 +387,7 @@ aub_file_decode_batch(struct aub_file *file)
> > uint32_t *p, h, *new_cursor;
> > int header_length, bias;
> >  
> > -   if (file->end - file->cursor < 1)
> > -  return AUB_ITEM_DECODE_NEED_MORE_DATA;
> > +   assert(file->cursor < file->end);
> >  
> > p = file->cursor;
> > h = *p;
> > @@ -421,13 +409,11 @@ aub_file_decode_batch(struct aub_file *file)
> >  
> > new_cursor = p + header_length + bias;
> > if ((h & 0xffff0000) == MAKE_HEADER(TYPE_AUB, OPCODE_AUB, 
> > SUBOPCODE_BLOCK)) {
> > -  if (file->end - file->cursor < 4)
> > - return AUB_ITEM_DECODE_NEED_MORE_DATA;
> > +  assert(file->end - file->cursor >= 4);
> >new_cursor += p[4] / 4;
> > }
> >  
> > -   if (new_cursor > file->end)
> > -  return AUB_ITEM_DECODE_NEED_MORE_DATA;
> > +   assert(new_cursor <= file->end);
> >  
> > switch (h & 0xffff0000) {
> > case MAKE_HEADER(TYPE_AUB, OPCODE_AUB, SUBOPCODE_HEADER):
> > @@ -468,48 +454,6 @@ aub_file_more_stuff(struct aub_file *file)
> > return file->cursor < file->end || (file->stream && 
> > !feof(file->stream));
> >  }
> >  
> > -#define AUB_READ_BUFFER_SIZE (4096)
> > -#define MAX(a, b) ((a) < (b) ? (b) : (a))
> > -
> > -static void
> > -aub_file_data_grow(struct aub_file *file)
> > -{
> > -   size_t old_size = (file->mem_end - file->map) * 4;
> > -   size_t new_size = MAX(old_size * 2, AUB_READ_BUFFER_SIZE);
> > -   uint32_t *new_start = realloc(file->map, new_size);
> > -
> > -   file->cursor = new_start + (file->cursor - file->map);
> > -   file->end = new_start + (file->end - file->map);
> > -   file->map = new_start;
> > -   file->mem_end = file->map + (new_size / 4);
> > -}
> > -
> > -static bool
> > -aub_file_data_load(struct aub_file *file)
> > -{
> > -   size_t r;
> > -
> > -   if (file->stream == NULL)
> > -  return false;
> > -
> > -   /* First remove any consumed data */
> > -   if (file->cursor > file->map) {
> > -  memmove(file->map, file->cursor,
> > -  (file->end - file->cursor) * 4);
> > -  file->end -= file->cursor - file->map;
> > -  file->cursor = file->map;
> > -   }
> > -
> > -   /* Then load some new data in */
> > -   if ((file->mem_end - file->end) < (AUB_READ_BUFFER_SIZE / 4))
> > -  a

Re: [Mesa-dev] [PATCH v2 03/16] intel: aubinator: rework register writes handling

2018-06-19 Thread Rafael Antognolli
For some reason I always have trouble finding the docs about this.
Still, it looks correct according to the docs, and it also matches what
I see in aubdump since "tools/intel_aubdump: Simulate "enhanced
execlist" submission for gen11+".

Reviewed-by: Rafael Antognolli 

On Tue, Jun 19, 2018 at 02:45:18PM +0100, Lionel Landwerlin wrote:
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/aubinator.c | 82 -
>  1 file changed, 54 insertions(+), 28 deletions(-)
> 
> diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
> index 3f9047e69a8..0438f96cd1b 100644
> --- a/src/intel/tools/aubinator.c
> +++ b/src/intel/tools/aubinator.c
> @@ -240,46 +240,72 @@ handle_memtrace_version(uint32_t *p)
>  static void
>  handle_memtrace_reg_write(uint32_t *p)
>  {
> +   static struct execlist_regs {
> +  uint32_t render_elsp[4];
> +  int render_elsp_index;
> +  uint32_t blitter_elsp[4];
> +  int blitter_elsp_index;
> +   } state = {};
> +
> uint32_t offset = p[1];
> uint32_t value = p[5];
> +
> int engine;
> -   static int render_elsp_writes = 0;
> -   static int blitter_elsp_writes = 0;
> -   static int render_elsq0 = 0;
> -   static int blitter_elsq0 = 0;
> -   uint8_t *pphwsp;
> -
> -   if (offset == 0x2230) {
> -  render_elsp_writes++;
> +   uint64_t context_descriptor;
> +
> +   switch (offset) {
> +   case 0x2230: /* render elsp */
> +  state.render_elsp[state.render_elsp_index++] = value;
> +  if (state.render_elsp_index < 4)
> + return;
> +
> +  state.render_elsp_index = 0;
>engine = GEN_ENGINE_RENDER;
> -   } else if (offset == 0x22230) {
> -  blitter_elsp_writes++;
> +  context_descriptor = (uint64_t)state.render_elsp[2] << 32 |
> + state.render_elsp[3];
> +  break;
> +   case 0x22230: /* blitter elsp */
> +  state.blitter_elsp[state.blitter_elsp_index++] = value;
> +  if (state.blitter_elsp_index < 4)
> + return;
> +
> +  state.blitter_elsp_index = 0;
>engine = GEN_ENGINE_BLITTER;
> -   } else if (offset == 0x2510) {
> -  render_elsq0 = value;
> -   } else if (offset == 0x22510) {
> -  blitter_elsq0 = value;
> -   } else if (offset == 0x2550 || offset == 0x22550) {
> -  /* nothing */;
> -   } else {
> +  context_descriptor = (uint64_t)state.blitter_elsp[2] << 32 |
> + state.blitter_elsp[3];
> +  break;
> +   case 0x2510: /* render elsq0 lo */
> +  state.render_elsp[3] = value;
>return;
> -   }
> -
> -   if (render_elsp_writes > 3 || blitter_elsp_writes > 3) {
> -  render_elsp_writes = blitter_elsp_writes = 0;
> -  pphwsp = (uint8_t*)gtt + (value & 0xfffff000);
> -   } else if (offset == 0x2550) {
> +  break;
> +   case 0x2514: /* render elsq0 hi */
> +  state.render_elsp[2] = value;
> +  return;
> +  break;
> +   case 0x22510: /* blitter elsq0 lo */
> +  state.blitter_elsp[3] = value;
> +  return;
> +  break;
> +   case 0x22514: /* blitter elsq0 hi */
> +  state.blitter_elsp[2] = value;
> +  return;
> +  break;
> +   case 0x2550: /* render elsc */
>engine = GEN_ENGINE_RENDER;
> -  pphwsp = (uint8_t*)gtt + (render_elsq0 & 0xfffff000);
> -   } else if (offset == 0x22550) {
> +  context_descriptor = (uint64_t)state.render_elsp[2] << 32 |
> + state.render_elsp[3];
> +  break;
> +   case 0x22550: /* blitter elsc */
>engine = GEN_ENGINE_BLITTER;
> -  pphwsp = (uint8_t*)gtt + (blitter_elsq0 & 0xfffff000);
> -   } else {
> +  context_descriptor = (uint64_t)state.blitter_elsp[2] << 32 |
> + state.blitter_elsp[3];
> +  break;
> +   default:
>return;
> }
>  
> const uint32_t pphwsp_size = 4096;
> -   uint32_t *context = (uint32_t*)(pphwsp + pphwsp_size);
> +   uint32_t *context = (uint32_t*)(gtt + (context_descriptor & 0xfffff000) + 
> pphwsp_size);
> uint32_t ring_buffer_head = context[5];
> uint32_t ring_buffer_tail = context[7];
> uint32_t ring_buffer_start = context[9];
> -- 
> 2.17.1
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

