date:20150706

Re: [Mesa-dev] [PATCH v4 3/6] i965: Enable hardware-generated binding tables on render path.

2015-07-06 Thread Abdiel Janulgue



On 07/03/2015 12:20 PM, Chris Wilson wrote:
 On Fri, Jul 03, 2015 at 10:00:31AM +0300, Abdiel Janulgue wrote:
 +void
 +gen7_enable_hw_binding_tables(struct brw_context *brw)
 +{
 +   if (!brw-use_resource_streamer)
 +  return;
 +
 +   if (!brw-hw_bt_pool.bo) {
 +  /* We use a single re-usable buffer object for the lifetime of the
 +   * context and size it to maximum allowed binding tables that can be
 +   * programmed per batch:
 +   *
 +   * From the Haswell PRM, Volume 7: 3D Media GPGPU,
 +   * 3DSTATE_BINDING_TABLE_POOL_ALLOC  Programming Note:
 +   * A maximum of 16,383 Binding tables are allowed in any batch 
 buffer
 +   */
 +  static const int max_size = 16383 * 4;
 +  brw-hw_bt_pool.bo = drm_intel_bo_alloc(brw-bufmgr, hw_bt,
 +  max_size, 64);
 +  brw-hw_bt_pool.next_offset = 0;
 +   }
 +
 +   int pkt_len = brw-gen = 8 ? 4 : 3;
 +   uint32_t dw1 = BRW_HW_BINDING_TABLE_ENABLE;
 +   if (brw-is_haswell)
 +  dw1 |= SET_FIELD(GEN7_MOCS_L3, GEN7_HW_BT_POOL_MOCS) |
 + HSW_BT_POOL_ALLOC_MUST_BE_ONE;
 +   else if (brw-gen = 8)
 +  dw1 |= BDW_MOCS_WB;
 +
 +   BEGIN_BATCH(pkt_len);
 +   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC  16 | (pkt_len - 2));
 +   if (brw-gen = 8) {
 +  OUT_RELOC64(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1);
 +  OUT_BATCH(brw-hw_bt_pool.bo-size);
 +   } else {
 +  OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1);
 +  OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
 + brw-hw_bt_pool.bo-size);
 +   }
 +   ADVANCE_BATCH();
 +
 +   /* From the Haswell PRM, Volume 7: 3D Media GPGPU,
 +* 3DSTATE_BINDING_TABLE_POOL_ALLOC  Programming Note:
 +*
 +* When switching between HW and SW binding table generation, SW must
 +* issue a state cache invalidate.
 +*/
 +   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
 
 Should this flush not be before the enable? It's a top-of-pipe sync,
 which means that the enable 3DSTATE will be parsed and changing the GPU
 state as the previous commands are still running. Since the flush is
 mandatory, that implies to me that the state it changing is not latched
 and so still being used by those earlier commands.

I made a test-run with the flush first before the enable hw-bt and
didn't see any problems. Re-ordering this to the top does actually makes
sense. Thanks for pointing this out!

-Abdiel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH RFC] egl/dri2: Add dri3 support to x11 platform

2015-07-06 Thread Pekka Paalanen

On Sat, 4 Jul 2015 11:18:18 +0800
Boyan Ding boyan.j.d...@gmail.com wrote:

  [snip]
 
  +/* FIXME: Is this right? Seems problematic for WL_bind_wayland_display */
  What seems to be the problem ?
 
  Afaik xcb_dri3_open_reply_fds should return an FD which is ok (be that a
  render_node device, or a master one with explicit auth).
 
 The problem is that WL_bind_wayland_display don't work under dri3 on
 x11. I only found it yesterday that to get it work, we'll need to add a
 mechanism to pass fd instead of name of dri device in wl_drm protocol.
 
 Previously, if a wayland client wants to use hardware accelerated EGL,
 it (with the help of libEGL in mesa) will bind to wl_drm object, and
 wl_drm will immediately send the name of dri device to the wayland
 client (actually also libEGL in mesa). After wayland platform code
 opens the device, it has to send the fd to the X server or drm to
 get authentication. Things are different with dri3, where a fd is
 directly sent to the client without the need to authenticate.
 
 I propose the following addition in wl_drm protocol:
 
 There are two kinds of wl_drm implementation. One is the current form.
 The other one, called dri3-capable (or whatever name), include wl_drm
 object built on dri3 directly or indirectly through wayland platform.
 
 If a client binds to a dri3-capable wl_drm object, it will send a device
 event to the client with NULL or empty string (so a client who knows
 nothing about it can safely fail). If the client knows about dri3-capable
 wl_drm object, it will send a request named get_fd and wl_drm will
 respond it with an fd acquired with dri3. If the wl_drm object is not
 dri3-capable it will raise an error if it receives a get_fd request,
 so will a dri3-capable wl_drm object if it receives authenticate
 request.

Remember, that all protocol errors in Wayland are always fatal, and
cause the whole client to be disconnected. Also, on Wayland EGL simply
cannot have its private connection to the server, it *must* be shared
with the whole application because otherwise EGL cannot operate on
the app's wl_surfaces.

If you really wanted to go this route (it seems you already have a
better idea than this), just use the normal mechanism to extend the
current wl_drm interface, or invent a whole new interface. Object
flavours does not seem like a workable solution the way you describe,
if I understood what you had in mind.

Note, that the compositor and the app may be using different versions
of Mesa, which means they may be using different definitions of wl_drm.
Therefore, wl_drm must follow the same stable protocol rules as
everything else.


Thanks,
pq


 
 So the following dri3_authenticate function is not needed. Let's not
 expose WL_bind_wayland_display for now, and its enablement
 should be separate patches.
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] clover: Implement image attribute getters

2015-07-06 Thread Zoltán Gilián

 This seems to be doing essentially the same thing as v1?  Is it the
 right patch?

The llvm pass was invoked in clover in v1. This patch relies on llvm
to perform that task (). What this patch does basically is that it
adds the image attributes to the end of the kernel input vector.
The commit message of this patch is misleading, I'll fix it.

On Wed, Jun 24, 2015 at 2:48 PM, Francisco Jerez curroje...@riseup.net wrote:
 Zoltan Gilian zoltan.gil...@gmail.com writes:

 Image attributes are passed to the kernel as hidden parameters after the
 image attribute itself. An llvm pass replaces the getter builtins to
 the appropriate parameters.

 This seems to be doing essentially the same thing as v1?  Is it the
 right patch?

 ---
  src/gallium/state_trackers/clover/core/kernel.cpp  | 26 +++
  src/gallium/state_trackers/clover/core/kernel.hpp  | 13 ++--
  src/gallium/state_trackers/clover/core/memory.cpp  |  2 +-
  .../state_trackers/clover/llvm/invocation.cpp  | 81 
 +-
  4 files changed, 116 insertions(+), 6 deletions(-)

 diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp 
 b/src/gallium/state_trackers/clover/core/kernel.cpp
 index 0756f06..291c799 100644
 --- a/src/gallium/state_trackers/clover/core/kernel.cpp
 +++ b/src/gallium/state_trackers/clover/core/kernel.cpp
 @@ -185,6 +185,13 @@ kernel::exec_context::bind(intrusive_ptrcommand_queue 
 _q,
}
 }

 +   // Bind image attribute args.
 +   for (const auto arg: kern._args) {
 +  if (auto img_arg = dynamic_castimage_argument*(arg.get())) {
 + img_arg-bind_attributes(*this);
 +  }
 +   }
 +
 // Create a new compute state if anything changed.
 if (!st || q != _q ||
 cs.req_local_mem != mem_local ||
 @@ -465,6 +472,25 @@ kernel::constant_argument::unbind(exec_context ctx) {
  }

  void
 +kernel::image_argument::bind_attributes(exec_context ctx) {
 +   cl_image_format format = img-format();
 +   cl_uint attributes[] = {
 + static_castcl_uint(img-width()),
 + static_castcl_uint(img-height()),
 + static_castcl_uint(img-depth()),
 + format.image_channel_data_type,
 + format.image_channel_order};
 +   for (unsigned i = 0; i  5; ++i) {
 +  auto v = bytes(attributes[i]);
 +
 +  extend(v, module::argument::zero_ext, sizeof(cl_uint));
 +  byteswap(v, ctx.q-device().endianness());
 +  align(ctx.input, sizeof(cl_uint));
 +  insert(ctx.input, v);
 +   }
 +}
 +
 +void
  kernel::image_rd_argument::set(size_t size, const void *value) {
 if (size != sizeof(cl_mem))
throw error(CL_INVALID_ARG_SIZE);
 diff --git a/src/gallium/state_trackers/clover/core/kernel.hpp 
 b/src/gallium/state_trackers/clover/core/kernel.hpp
 index d6432a4..8c15b2f 100644
 --- a/src/gallium/state_trackers/clover/core/kernel.hpp
 +++ b/src/gallium/state_trackers/clover/core/kernel.hpp
 @@ -190,7 +190,14 @@ namespace clover {
   pipe_surface *st;
};

 -  class image_rd_argument : public argument {
 +  class image_argument : public argument {
 +  public:
 + void bind_attributes(exec_context ctx);
 +  protected:
 + image *img;
 +  };
 +
 +  class image_rd_argument : public image_argument {
public:
   virtual void set(size_t size, const void *value);
   virtual void bind(exec_context ctx,
 @@ -198,11 +205,10 @@ namespace clover {
   virtual void unbind(exec_context ctx);

private:
 - image *img;
   pipe_sampler_view *st;
};

 -  class image_wr_argument : public argument {
 +  class image_wr_argument : public image_argument {
public:
   virtual void set(size_t size, const void *value);
   virtual void bind(exec_context ctx,
 @@ -210,7 +216,6 @@ namespace clover {
   virtual void unbind(exec_context ctx);

private:
 - image *img;
   pipe_surface *st;
};

 diff --git a/src/gallium/state_trackers/clover/core/memory.cpp 
 b/src/gallium/state_trackers/clover/core/memory.cpp
 index 055336a..b852e68 100644
 --- a/src/gallium/state_trackers/clover/core/memory.cpp
 +++ b/src/gallium/state_trackers/clover/core/memory.cpp
 @@ -189,7 +189,7 @@ image2d::image2d(clover::context ctx, cl_mem_flags 
 flags,
   const cl_image_format *format, size_t width,
   size_t height, size_t row_pitch,
   void *host_ptr) :
 -   image(ctx, flags, format, width, height, 0,
 +   image(ctx, flags, format, width, height, 1,
   row_pitch, 0, height * row_pitch, host_ptr) {
  }

 diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
 b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 index 9b91fee..a33d450 100644
 --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
 +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 @@ -80,6 +80,7 @@
  using namespace clover;

  namespace {
 +
  #if 0
 void

[Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts

2015-07-06 Thread Chris Wilson

Since the workaround bo is used strictly as a write-only buffer, we need
only allocate one per screen and use the same one from all contexts.

(The caveat here is during extension initialisation, where we write into
and read back register values from the buffer, but that is performed only
once for the first context - and baring synchronisation issues should not
be a problem. Safer would be to move that also to the screen.)

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++---
 src/mesa/drivers/dri/i965/intel_screen.c | 4 
 src/mesa/drivers/dri/i965/intel_screen.h | 1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index 7ee3cb6..05e14cd 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw,
 * the gen6 workaround because it involves actually writing to
 * the buffer, and the kernel doesn't let us write to the batch.
 */
-   brw-workaround_bo = drm_intel_bo_alloc(brw-bufmgr,
-   pipe_control workaround,
-   4096, 4096);
+   brw-workaround_bo = brw-intelScreen-workaround_bo;
if (brw-workaround_bo == NULL)
   return -ENOMEM;
 
+   drm_intel_bo_reference(brw-workaround_bo);
+
brw-pipe_controls_since_last_cs_stall = 0;
 
return 0;
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 839a984..cd8e6eb 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv)
 {
struct intel_screen *intelScreen = sPriv-driverPrivate;
 
+   drm_intel_bo_unreference(intelScreen-workaround_bo);
dri_bufmgr_destroy(intelScreen-bufmgr);
driDestroyOptionInfo(intelScreen-optionCache);
 
@@ -1096,6 +1097,9 @@ intel_init_bufmgr(struct intel_screen *intelScreen)
   return false;
}
 
+   intelScreen-workaround_bo =
+  drm_intel_bo_alloc(intelScreen-bufmgr, pipe_control w/a, 4096, 4096);
+
return true;
 }
 
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
b/src/mesa/drivers/dri/i965/intel_screen.h
index 941e0fc..e55fddb 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -60,6 +60,7 @@ struct intel_screen
bool has_context_reset_notification;
 
dri_bufmgr *bufmgr;
+   drm_intel_bo *workaround_bo;
 
/**
 * A unique ID for shader programs.
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 09/18] i965: Enable GPU snooping of CPU caches for select buffers

2015-07-06 Thread Chris Wilson

On LLC, all buffers are normally cache coherent between the CPU and the
GPU, giving both parties fast access to shared data.

However, older architectures or Atoms, do not implement LLC between the
CPU and GPU. Instead they utilise a snooping architecture where the GPU
can snoop the CPU cache when told. The snooping has much higher overhead
(i.e. slower) than regular memory fetches so its use should be reserved
to instances where the data likely resides in the CPU cache and the GPU
need only use it once (e.g. for streaming textures from the CPU). The
other major benefit is that it can also push data to the CPU in a cache
coherent fashion, and so CPU access to a snooped buffer is very fast,
making it preferrable for anytime we need to readback data from the GPU

For demonstration we set the snoop flag for reading back via a blit from
a miptree, and for reading back transform feedback.
---
 src/mesa/drivers/dri/i965/brw_batch.c | 31 +++
 src/mesa/drivers/dri/i965/brw_batch.h |  9 
 src/mesa/drivers/dri/i965/brw_context.h   |  1 +
 src/mesa/drivers/dri/i965/gen6_sol.c  |  3 ++-
 src/mesa/drivers/dri/i965/gen7_sol_state.c|  2 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c |  1 +
 6 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_batch.c 
b/src/mesa/drivers/dri/i965/brw_batch.c
index bfff9fe..e01a0c4 100644
--- a/src/mesa/drivers/dri/i965/brw_batch.c
+++ b/src/mesa/drivers/dri/i965/brw_batch.c
@@ -1323,6 +1323,37 @@ struct brw_bo *brw_bo_create(struct brw_batch *batch,
return bo;
 }
 
+static bool __brw_bo_set_caching(struct brw_bo *bo, int caching)
+{
+   struct drm_i915_gem_caching arg;
+
+   memset(arg, 0, sizeof(arg));
+   arg.handle = bo-handle;
+   arg.caching = caching;
+   return drmIoctl(bo-batch-fd, DRM_IOCTL_I915_GEM_SET_CACHING, arg) == 0;
+}
+
+void brw_bo_enable_snoop(struct brw_bo *bo)
+{
+   assert(bo-reusable);
+
+   if (bo-cache_coherent)
+  return;
+
+   if (bo-tiling)
+  return; /* XXX abort? */
+
+   if (!__brw_bo_set_caching(bo, I915_CACHING_CACHED))
+  return;
+
+   drm_intel_bo_disable_reuse(bo-base);
+   if (bo-reusable)
+  list_move(bo-link, bo-batch-inactive);
+
+   bo-reusable = false;
+   bo-cache_coherent = true;
+}
+
 static uint64_t brw_surface_size(int cpp,
 uint32_t width,
 uint32_t height,
diff --git a/src/mesa/drivers/dri/i965/brw_batch.h 
b/src/mesa/drivers/dri/i965/brw_batch.h
index 5b56e82..3628b03 100644
--- a/src/mesa/drivers/dri/i965/brw_batch.h
+++ b/src/mesa/drivers/dri/i965/brw_batch.h
@@ -220,6 +220,15 @@ struct brw_bo *brw_bo_create_from_name(struct brw_batch 
*batch,
   const char *name,
   uint32_t global_name);
 
+/* Enable CPU cache coherent to the buffer. On LLC, normally all buffers
+ * are cache coherent, but on non-LLC architectures we can tell the GPU
+ * to snoop from and to flush into the CPU cache. Performing the snoop
+ * is slower for the GPU, but eliminates the uncached penalty from the CPU,
+ * so it only useful for streaming data (read once) to the GPU or when
+ * we need to read anything back from the GPU.
+ */
+void brw_bo_enable_snoop(struct brw_bo *bo);
+
 void brw_bo_mark_dirty(struct brw_batch *batch, struct brw_bo *bo);
 
 inline static int brw_bo_madvise(struct brw_bo *bo, int state)
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 63f4f87..3aa003c 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -872,6 +872,7 @@ struct brw_transform_feedback_object {
 * Count of primitives generated during this transform feedback operation.
 *  @{
 */
+#define BRW_XFB_BO_SIZE (1610)
uint64_t prims_generated[BRW_MAX_XFB_STREAMS];
struct brw_bo *prim_count_bo;
unsigned prim_count_buffer_index; /** in number of uint64_t units */
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c 
b/src/mesa/drivers/dri/i965/gen6_sol.c
index 39bf8e9..037aad0 100644
--- a/src/mesa/drivers/dri/i965/gen6_sol.c
+++ b/src/mesa/drivers/dri/i965/gen6_sol.c
@@ -206,7 +206,8 @@ brw_new_transform_feedback(struct gl_context *ctx, GLuint 
name)
brw_obj-offset_bo =
   brw_bo_create(brw-batch, transform feedback offsets, 16, 64, 0);
brw_obj-prim_count_bo =
-  brw_bo_create(brw-batch, xfb primitive counts, 4096, 64, 0);
+  brw_bo_create(brw-batch, xfb primitive counts, BRW_XFB_BO_SIZE, 
64,0);
+   brw_bo_enable_snoop(brw_obj-prim_count_bo);
 
return brw_obj-base;
 }
diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index ff6678e..857ebe5 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -350,7 +350,7 @@ gen7_save_primitives_written_counters(struct brw_context 
*brw,

[Mesa-dev] [PATCH 07/18] i965: Make intel_mipmap_tree_map_raw() static

2015-07-06 Thread Chris Wilson

No external users, so no need to export the symbol outside of our
compilation unit.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 36 +--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  7 --
 2 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index fcad043..f9dc74f 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1342,6 +1342,24 @@ intel_miptree_copy_teximage(struct brw_context *brw,
intel_obj-needs_validate = true;
 }
 
+static void *
+intel_miptree_map_raw(struct brw_context *brw,
+ struct intel_mipmap_tree *mt,
+ GLbitfield mode)
+{
+   /* CPU accesses to color buffers don't understand fast color clears, so
+* resolve any pending fast color clears before we map.
+*/
+   intel_miptree_resolve_color(brw, mt);
+   return brw_bo_map(mt-bo, mode  GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ);
+}
+
+static void
+intel_miptree_unmap_raw(struct brw_context *brw,
+struct intel_mipmap_tree *mt)
+{
+}
+
 static bool
 intel_miptree_alloc_mcs(struct brw_context *brw,
 struct intel_mipmap_tree *mt,
@@ -2052,24 +2070,6 @@ intel_miptree_updownsample(struct brw_context *brw,
}
 }
 
-void *
-intel_miptree_map_raw(struct brw_context *brw,
- struct intel_mipmap_tree *mt,
- GLbitfield mode)
-{
-   /* CPU accesses to color buffers don't understand fast color clears, so
-* resolve any pending fast color clears before we map.
-*/
-   intel_miptree_resolve_color(brw, mt);
-   return brw_bo_map(mt-bo, mode  GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ);
-}
-
-void
-intel_miptree_unmap_raw(struct brw_context *brw,
-struct intel_mipmap_tree *mt)
-{
-}
-
 static void
 intel_miptree_map_gtt(struct brw_context *brw,
  struct intel_mipmap_tree *mt,
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 6c69572..0c25f1f 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -773,13 +773,6 @@ brw_miptree_layout(struct brw_context *brw,
enum intel_miptree_tiling_mode requested,
uint32_t layout_flags);
 
-void *intel_miptree_map_raw(struct brw_context *brw,
-   struct intel_mipmap_tree *mt,
-   GLbitfield mode);
-
-void intel_miptree_unmap_raw(struct brw_context *brw,
- struct intel_mipmap_tree *mt);
-
 void
 intel_miptree_map(struct brw_context *brw,
  struct intel_mipmap_tree *mt,
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback

2015-07-06 Thread Chris Wilson

Since the purpose of transform feedback tends to be for the client to
act upon the results to change the geometry in the scene, it is likely
that the client will soon be waiting upon the results. Flush the batch
early so that we don't build up a long queue of commands afterwards that
could delay the readback.
---
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 857ebe5..13dbe5b 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,
 
brw_batch_end(brw-batch);
 
+   /* We will likely want to read the results in the very near future, so
+* push this primitive to hardware if it is currently idle.
+*/
+   if (!brw_batch_busy(brw-batch))
+  brw_batch_flush(brw-batch);
+
/* EndTransformFeedback() means that we need to update the number of
 * vertices written.  Since it's only necessary if DrawTransformFeedback()
 * is called and it means mapping a buffer object, we delay computing it
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 11/18] i965: Track active GPU region from MapBufferRange

2015-07-06 Thread Chris Wilson

Avoid unrequired synchronization if the user requests to map an unused
range on active buffer, equivalent to BufferSubData.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/intel_buffer_objects.c | 78 +---
 1 file changed, 42 insertions(+), 36 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c 
b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
index fda5c9f..283182d 100644
--- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
+++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
@@ -330,44 +330,50 @@ brw_map_buffer_range(struct gl_context *ctx,
 * If they set INVALIDATE_BUFFER, we can pitch the current contents to
 * achieve the required synchronization.
 */
-   if (!(access  GL_MAP_UNSYNCHRONIZED_BIT)) {
+   if (!(access  GL_MAP_UNSYNCHRONIZED_BIT) 
+   brw_bo_busy(intel_obj-buffer, BUSY_WRITE | BUSY_RETIRE)) {
   if ((access  GL_MAP_INVALIDATE_BUFFER_BIT)) {
-if (brw_bo_busy(intel_obj-buffer, BUSY_WRITE | BUSY_RETIRE)) {
-   brw_bo_put(intel_obj-buffer);
-   alloc_buffer_object(brw, intel_obj);
-}
-  }
-   }
+ brw_bo_put(intel_obj-buffer);
+ alloc_buffer_object(brw, intel_obj);
+  } else if (offset + length = intel_obj-gpu_active_start ||
+ intel_obj-gpu_active_end = offset) {
+ access |= GL_MAP_UNSYNCHRONIZED_BIT;
+  } else if (!(access  GL_MAP_PERSISTENT_BIT) 
+ (access  GL_MAP_INVALIDATE_RANGE_BIT)) {
+ /* If the user is mapping a range of an active buffer object but
+  * doesn't require the current contents of that range, make a new
+  * BO, and we'll copy what they put in there out at unmap or
+  * FlushRange time.
+  *
+  * That is, unless they're looking for a persistent mapping -- we
+  * would need to do blits in the MemoryBarrier call, and it's easier
+  * to just do a GPU stall and do a mapping.
+  */
 
-   /* If the user is mapping a range of an active buffer object but
-* doesn't require the current contents of that range, make a new
-* BO, and we'll copy what they put in there out at unmap or
-* FlushRange time.
-*
-* That is, unless they're looking for a persistent mapping -- we would
-* need to do blits in the MemoryBarrier call, and it's easier to just do a
-* GPU stall and do a mapping.
-*/
-   if (!(access  (GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_PERSISTENT_BIT)) 
-   (access  GL_MAP_INVALIDATE_RANGE_BIT) 
-   brw_bo_busy(intel_obj-buffer, BUSY_WRITE | BUSY_RETIRE)) {
-  /* Ensure that the base alignment of the allocation meets the alignment
-   * guarantees the driver has advertised to the application.
-   */
-  const unsigned alignment = ctx-Const.MinMapBufferAlignment;
-
-  intel_obj-map_extra[index] = (uintptr_t) offset % alignment;
-  intel_obj-range_map_bo[index] =
- brw_bo_create(brw-batch,
-   BO blit temp,
-   length + intel_obj-map_extra[index],
-   alignment, 0);
-
-  obj-Mappings[index].Pointer =
-brw_bo_map(intel_obj-range_map_bo[index], MAP_WRITE) +
-intel_obj-map_extra[index];
-
-  return obj-Mappings[index].Pointer;
+ /* Ensure that the base alignment of the allocation meets the 
alignment
+  * guarantees the driver has advertised to the application.
+  */
+ const unsigned alignment = ctx-Const.MinMapBufferAlignment;
+
+ intel_obj-map_extra[index] = (uintptr_t) offset % alignment;
+ intel_obj-range_map_bo[index] =
+brw_bo_create(brw-batch,
+  BO blit temp,
+  length + intel_obj-map_extra[index],
+  alignment, 0);
+
+ obj-Mappings[index].Pointer =
+brw_bo_map(intel_obj-range_map_bo[index], MAP_WRITE) +
+intel_obj-map_extra[index];
+
+ return obj-Mappings[index].Pointer;
+  } else {
+ perf_debug(Stalling on glBufferMapRange(%ld, %ld) (%ldkb) to a busy 
+(%d-%d) buffer object.\n,
+(long)offset, (long)offset + length, (long)(length/1024),
+intel_obj-gpu_active_start,
+intel_obj-gpu_active_end);
+  }
}
 
map_flags = 0;
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices

2015-07-06 Thread Chris Wilson

Rather than allocating a fresh page every time we clear a buffer, keep
that page around between invocations by tracking the last used offset
and only allocating a fresh page when we wrap.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index a571a74..ffd1e3b 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -55,6 +55,7 @@
 struct brw_fast_clear_state {
GLuint vao;
GLuint vbo;
+   GLuint vbo_offset;
GLuint shader_prog;
GLint color_location;
 };
@@ -83,6 +84,8 @@ brw_fast_clear_init(struct brw_context *brw)
_mesa_VertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(float) * 2, 0);
_mesa_EnableVertexAttribArray(0);
 
+   clear-vbo_offset = 4096; /* Force an allocation on first use */
+
return true;
 }
 
@@ -166,7 +169,8 @@ struct rect {
 static void
 brw_draw_rectlist(struct gl_context *ctx, struct rect *rect, int num_instances)
 {
-   int start = 0, count = 3;
+   struct brw_fast_clear_state *clear = brw_context(ctx)-fast_clear_state;
+   int start, count = 3, offset;
struct _mesa_prim prim;
float verts[6];
 
@@ -178,8 +182,15 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect 
*rect, int num_instances)
verts[5] = rect-y0;
 
/* upload new vertex data */
-   _mesa_BufferData(GL_ARRAY_BUFFER_ARB, sizeof(verts), verts,
-GL_DYNAMIC_DRAW_ARB);
+   offset = clear-vbo_offset;
+   clear-vbo_offset += sizeof(verts);
+   if (clear-vbo_offset  4096) {
+  _mesa_BufferData(GL_ARRAY_BUFFER_ARB, 4096, NULL, GL_STREAM_DRAW_ARB);
+  offset = 0;
+  clear-vbo_offset = sizeof(verts);
+   }
+   _mesa_BufferSubData(GL_ARRAY_BUFFER_ARB, offset, sizeof(verts), verts);
+   start = offset / (2*sizeof(float));
 
if (ctx-NewState)
   _mesa_update_state(ctx);
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 08/18] i965: Coalesce relocation read/write domains to a single integer

2015-07-06 Thread Chris Wilson

There are only a handful of distinct cache domains (less than 16), and
internally the kernel simply doesn't care other than whether the object
is a render target. We can therefore trim a parameter by coalescing the
relocation domains to a single unsigned bitfield without loss of
generality.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/brw_batch.c  | 11 
 src/mesa/drivers/dri/i965/brw_batch.h  | 19 +++--
 src/mesa/drivers/dri/i965/brw_cc.c |  2 +-
 src/mesa/drivers/dri/i965/brw_clip_state.c |  2 +-
 src/mesa/drivers/dri/i965/brw_conditional_render.c |  6 ++--
 src/mesa/drivers/dri/i965/brw_context.c| 14 +-
 src/mesa/drivers/dri/i965/brw_context.h|  6 ++--
 src/mesa/drivers/dri/i965/brw_cs.cpp   |  4 +--
 src/mesa/drivers/dri/i965/brw_curbe.c  |  2 +-
 src/mesa/drivers/dri/i965/brw_draw.c   | 12 
 src/mesa/drivers/dri/i965/brw_draw_upload.c| 12 +++-
 src/mesa/drivers/dri/i965/brw_misc_state.c | 31 ++---
 .../drivers/dri/i965/brw_performance_monitor.c | 10 +++
 src/mesa/drivers/dri/i965/brw_pipe_control.c   |  8 ++
 src/mesa/drivers/dri/i965/brw_sampler_state.c  |  2 +-
 src/mesa/drivers/dri/i965/brw_sf_state.c   |  2 +-
 src/mesa/drivers/dri/i965/brw_vs_state.c   |  5 ++--
 src/mesa/drivers/dri/i965/brw_wm_state.c   |  4 +--
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c   | 31 -
 src/mesa/drivers/dri/i965/gen6_blorp.cpp   | 26 --
 src/mesa/drivers/dri/i965/gen6_depth_state.c   | 12 ++--
 src/mesa/drivers/dri/i965/gen6_gs_state.c  |  2 +-
 src/mesa/drivers/dri/i965/gen6_queryobj.c  |  8 +++---
 src/mesa/drivers/dri/i965/gen6_surface_state.c |  3 +-
 src/mesa/drivers/dri/i965/gen6_vs_state.c  |  2 +-
 src/mesa/drivers/dri/i965/gen6_wm_state.c  |  2 +-
 src/mesa/drivers/dri/i965/gen7_blorp.cpp   | 13 -
 src/mesa/drivers/dri/i965/gen7_gs_state.c  |  2 +-
 src/mesa/drivers/dri/i965/gen7_misc_state.c| 13 ++---
 src/mesa/drivers/dri/i965/gen7_sol_state.c |  8 +++---
 src/mesa/drivers/dri/i965/gen7_vs_state.c  |  4 +--
 src/mesa/drivers/dri/i965/gen7_wm_state.c  |  2 +-
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c  | 19 +++--
 src/mesa/drivers/dri/i965/gen8_depth_state.c   |  9 ++
 src/mesa/drivers/dri/i965/gen8_draw_upload.c   |  6 ++--
 src/mesa/drivers/dri/i965/gen8_gs_state.c  |  2 +-
 src/mesa/drivers/dri/i965/gen8_misc_state.c|  8 --
 src/mesa/drivers/dri/i965/gen8_ps_state.c  |  2 +-
 src/mesa/drivers/dri/i965/gen8_sol_state.c |  4 +--
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 14 --
 src/mesa/drivers/dri/i965/gen8_vs_state.c  |  2 +-
 src/mesa/drivers/dri/i965/intel_blit.c | 32 ++
 src/mesa/drivers/dri/i965/intel_extensions.c   |  4 +--
 43 files changed, 161 insertions(+), 221 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_batch.c 
b/src/mesa/drivers/dri/i965/brw_batch.c
index 24e96c6..bfff9fe 100644
--- a/src/mesa/drivers/dri/i965/brw_batch.c
+++ b/src/mesa/drivers/dri/i965/brw_batch.c
@@ -802,8 +802,7 @@ uint64_t __brw_batch_reloc(struct brw_batch *batch,
uint32_t batch_offset,
struct brw_bo *target_bo,
uint64_t target_offset,
-   unsigned read_domains,
-   unsigned write_domain)
+   unsigned domains)
 {
assert(target_bo-refcnt);
if (unlikely(target_bo-batch != batch)) {
@@ -854,8 +853,8 @@ uint64_t __brw_batch_reloc(struct brw_batch *batch,
   batch-reloc[n].delta = target_offset;
   batch-reloc[n].target_handle = target_bo-target_handle;
   batch-reloc[n].presumed_offset = target_bo-offset;
-  batch-reloc[n].read_domains = read_domains;
-  batch-reloc[n].write_domain = write_domain;
+  batch-reloc[n].read_domains = domains  0x;
+  batch-reloc[n].write_domain = domains  16;
 
   /* If we haven't added the batch to the execobject array yet, we
* will have to process all the relocations pointing to the
@@ -868,7 +867,7 @@ uint64_t __brw_batch_reloc(struct brw_batch *batch,
   }
}
 
-   if (write_domain  !target_bo-dirty) {
+   if (domains  16  !target_bo-dirty) {
   assert(target_bo != batch-bo);
   target_bo-write.rq = batch-next_request;
   list_move(target_bo-write.link, batch-next_request-write);
@@ -877,7 +876,7 @@ uint64_t __brw_batch_reloc(struct brw_batch *batch,
   target_bo-domain = DOMAIN_GPU;
   if (has_lut(batch)) {
  target_bo-exec-flags |= EXEC_OBJECT_WRITE;
- if (write_domain == I915_GEM_DOMAIN_INSTRUCTION

[Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once

2015-07-06 Thread Chris Wilson

Move the query for the TIMESTAMP register from context init to the
screen, so that it is only queried once for all contexts.

On 32bit systems, some old kernels trigger a hw bug resulting in the
TIMESTAMP register being shifted and the low bits always zero. Detect
this by repeating the read a few times and check the register is
incrementing.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/intel_extensions.c |  6 +-
 src/mesa/drivers/dri/i965/intel_screen.c | 15 +++
 src/mesa/drivers/dri/i965/intel_screen.h |  2 ++
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 3423190..740ac81 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -282,8 +282,6 @@ intelInitExtensions(struct gl_context *ctx)
}
 
if (brw-gen = 6) {
-  uint64_t dummy;
-
   ctx-Extensions.ARB_blend_func_extended =
  !driQueryOptionb(brw-optionCache, disable_blend_func_extended);
   ctx-Extensions.ARB_conditional_render_inverted = true;
@@ -307,9 +305,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx-Extensions.EXT_transform_feedback = true;
   ctx-Extensions.OES_depth_texture_cube_map = true;
 
-  /* Test if the kernel has the ioctl. */
-  if (drm_intel_reg_read(brw-bufmgr, TIMESTAMP, dummy) == 0)
- ctx-Extensions.ARB_timer_query = true;
+  ctx-Extensions.ARB_timer_query = brw-intelScreen-hw_has_timestamp;
 
   /* Only enable this in core profile because other parts of Mesa behave
* slightly differently when the extension is enabled.
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index f9398d7..839a984 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1123,6 +1123,20 @@ intel_detect_swizzling(struct intel_screen *screen)
   return true;
 }
 
+static bool
+intel_detect_timestamp(struct intel_screen *screen)
+{
+   uint64_t dummy = 0;
+   int loop = 10;
+
+   do {
+  if (drm_intel_reg_read(screen-bufmgr, TIMESTAMP, dummy))
+return false;
+   } while ((dummy  0x) == 0  --loop);
+
+   return loop  0;
+}
+
 /**
  * Return array of MSAA modes supported by the hardware. The array is
  * zero-terminated and sorted in decreasing order.
@@ -1378,6 +1392,7 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
intelScreen-hw_must_use_separate_stencil = intelScreen-devinfo-gen = 7;
 
intelScreen-hw_has_swizzling = intel_detect_swizzling(intelScreen);
+   intelScreen-hw_has_timestamp = intel_detect_timestamp(intelScreen);
 
const char *force_msaa = getenv(INTEL_FORCE_MSAA);
if (force_msaa) {
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
b/src/mesa/drivers/dri/i965/intel_screen.h
index 742b3d3..941e0fc 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -52,6 +52,8 @@ struct intel_screen
 
bool hw_has_swizzling;
 
+   bool hw_has_timestamp;
+
/**
 * Does the kernel support context reset notifications?
 */
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 02/18] i965: Move pipecontrol workaround bo to brw_pipe_control

2015-07-06 Thread Chris Wilson

With the exception of gen8, the sole user of the workaround bo are for
emitting pipe controls. Move it out of the purview of the batchbuffer
and into the pipecontrol.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/brw_context.c   |  7 +
 src/mesa/drivers/dri/i965/brw_context.h   | 12 +---
 src/mesa/drivers/dri/i965/brw_pipe_control.c  | 40 +++
 src/mesa/drivers/dri/i965/gen8_depth_state.c  |  2 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 12 
 src/mesa/drivers/dri/i965/intel_extensions.c  | 28 +--
 6 files changed, 64 insertions(+), 37 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 4b51fe5..8150b94 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -819,6 +819,12 @@ brwCreateContext(gl_api api,
   }
}
 
+   if (brw_init_pipe_control(brw, devinfo)) {
+  *dri_ctx_error = __DRI_CTX_ERROR_NO_MEMORY;
+  intelDestroyContext(driContextPriv);
+  return false;
+   }
+
brw_init_state(brw);
 
intelInitExtensions(ctx);
@@ -942,6 +948,7 @@ intelDestroyContext(__DRIcontext * driContextPriv)
if (ctx-swrast_context)
   _swrast_DestroyContext(brw-ctx);
 
+   brw_fini_pipe_control(brw);
intel_batchbuffer_free(brw);
 
drm_intel_bo_unreference(brw-throttle_batch[1]);
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 3553f6e..db0fc48 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -868,8 +868,6 @@ struct intel_batchbuffer {
drm_intel_bo *bo;
/** Last BO submitted to the hardware.  Used for glFinish(). */
drm_intel_bo *last_bo;
-   /** BO for post-sync nonzero writes for gen6 workaround. */
-   drm_intel_bo *workaround_bo;
 
uint16_t emit, total;
uint16_t used, reserved_space;
@@ -881,8 +879,6 @@ struct intel_batchbuffer {
enum brw_gpu_ring ring;
bool needs_sol_reset;
 
-   uint8_t pipe_controls_since_last_cs_stall;
-
struct {
   uint16_t used;
   int reloc_count;
@@ -1034,6 +1030,10 @@ struct brw_context
 
drm_intel_context *hw_ctx;
 
+   /** BO for post-sync nonzero writes for gen6 workaround. */
+   drm_intel_bo *workaround_bo;
+   uint8_t pipe_controls_since_last_cs_stall;
+
/**
 * Set of drm_intel_bo * that have been rendered to within this batchbuffer
 * and would need flushing before being used from another cache domain that
@@ -2000,6 +2000,10 @@ gen9_use_linear_1d_layout(const struct brw_context *brw,
   const struct intel_mipmap_tree *mt);
 
 /* brw_pipe_control.c */
+int brw_init_pipe_control(struct brw_context *brw,
+ const struct brw_device_info *info);
+void brw_fini_pipe_control(struct brw_context *brw);
+
 void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
 void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
  drm_intel_bo *bo, uint32_t offset,
diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index b4c86b9..7ee3cb6 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -72,13 +72,13 @@ gen7_cs_stall_every_four_pipe_controls(struct brw_context 
*brw, uint32_t flags)
if (brw-gen == 7  !brw-is_haswell) {
   if (flags  PIPE_CONTROL_CS_STALL) {
  /* If we're doing a CS stall, reset the counter and carry on. */
- brw-batch.pipe_controls_since_last_cs_stall = 0;
+ brw-pipe_controls_since_last_cs_stall = 0;
  return 0;
   }
 
   /* If this is the fourth pipe control without a CS stall, do one now. */
-  if (++brw-batch.pipe_controls_since_last_cs_stall == 4) {
- brw-batch.pipe_controls_since_last_cs_stall = 0;
+  if (++brw-pipe_controls_since_last_cs_stall == 4) {
+ brw-pipe_controls_since_last_cs_stall = 0;
  return PIPE_CONTROL_CS_STALL;
   }
}
@@ -213,7 +213,7 @@ gen7_emit_vs_workaround_flush(struct brw_context *brw)
brw_emit_pipe_control_write(brw,
PIPE_CONTROL_WRITE_IMMEDIATE
| PIPE_CONTROL_DEPTH_STALL,
-   brw-batch.workaround_bo, 0,
+   brw-workaround_bo, 0,
0, 0);
 }
 
@@ -227,7 +227,7 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
brw_emit_pipe_control_write(brw,
PIPE_CONTROL_CS_STALL
| PIPE_CONTROL_WRITE_IMMEDIATE,
-   brw-batch.workaround_bo, 0,
+   brw-workaround_bo, 0,
0, 0);
 }
 
@@ -277,7 +277,7 @@ brw_emit_post_sync_nonzero_flush(struct

[Mesa-dev] [PATCH 06/18] i965: Pass the map-mode along to intel_mipmap_tree_map_raw()

2015-07-06 Thread Chris Wilson

Since we can distinguish when mapping between READ and WRITE, we can
pass along the map mode to avoid stalls and flushes where possible.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 28 ++-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  3 ++-
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index e8bbc04..fcad043 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1401,7 +1401,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
 *
 * Note: the clear value for MCS buffers is all 1's, so we memset to 0xff.
 */
-   void *data = intel_miptree_map_raw(brw, mt-mcs_mt);
+   void *data = intel_miptree_map_raw(brw, mt-mcs_mt, GL_MAP_WRITE_BIT);
memset(data, 0xff, mt-mcs_mt-total_height * mt-mcs_mt-pitch);
intel_miptree_unmap_raw(brw, mt-mcs_mt);
mt-fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
@@ -2053,13 +2053,15 @@ intel_miptree_updownsample(struct brw_context *brw,
 }
 
 void *
-intel_miptree_map_raw(struct brw_context *brw, struct intel_mipmap_tree *mt)
+intel_miptree_map_raw(struct brw_context *brw,
+ struct intel_mipmap_tree *mt,
+ GLbitfield mode)
 {
/* CPU accesses to color buffers don't understand fast color clears, so
 * resolve any pending fast color clears before we map.
 */
intel_miptree_resolve_color(brw, mt);
-   return brw_bo_map(mt-bo, MAP_WRITE);
+   return brw_bo_map(mt-bo, mode  GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ);
 }
 
 void
@@ -2088,7 +2090,7 @@ intel_miptree_map_gtt(struct brw_context *brw,
assert(y % bh == 0);
y /= bh;
 
-   base = intel_miptree_map_raw(brw, mt) + mt-offset;
+   base = intel_miptree_map_raw(brw, mt, map-mode) + mt-offset;
 
if (base == NULL)
   map-ptr = NULL;
@@ -2155,7 +2157,7 @@ intel_miptree_map_blit(struct brw_context *brw,
   }
}
 
-   map-ptr = intel_miptree_map_raw(brw, map-mt);
+   map-ptr = intel_miptree_map_raw(brw, map-mt, map-mode);
 
DBG(%s: %d,%d %dx%d from mt %p (%s) %d,%d = %p/%d\n, __func__,
map-x, map-y, map-w, map-h,
@@ -2219,7 +2221,7 @@ intel_miptree_map_movntdqa(struct brw_context *brw,
image_x += map-x;
image_y += map-y;
 
-   void *src = intel_miptree_map_raw(brw, mt);
+   void *src = intel_miptree_map_raw(brw, mt, map-mode);
if (!src)
   return;
src += image_y * mt-pitch;
@@ -2285,7 +2287,7 @@ intel_miptree_map_s8(struct brw_context *brw,
 */
if (!(map-mode  GL_MAP_INVALIDATE_RANGE_BIT)) {
   uint8_t *untiled_s8_map = map-ptr;
-  uint8_t *tiled_s8_map = intel_miptree_map_raw(brw, mt);
+  uint8_t *tiled_s8_map = intel_miptree_map_raw(brw, mt, GL_MAP_READ_BIT);
   unsigned int image_x, image_y;
 
   intel_miptree_get_image_offset(mt, level, slice, image_x, image_y);
@@ -2322,7 +2324,7 @@ intel_miptree_unmap_s8(struct brw_context *brw,
if (map-mode  GL_MAP_WRITE_BIT) {
   unsigned int image_x, image_y;
   uint8_t *untiled_s8_map = map-ptr;
-  uint8_t *tiled_s8_map = intel_miptree_map_raw(brw, mt);
+  uint8_t *tiled_s8_map = intel_miptree_map_raw(brw, mt, GL_MAP_WRITE_BIT);
 
   intel_miptree_get_image_offset(mt, level, slice, image_x, image_y);
 
@@ -2377,7 +2379,7 @@ intel_miptree_unmap_etc(struct brw_context *brw,
image_x += map-x;
image_y += map-y;
 
-   uint8_t *dst = intel_miptree_map_raw(brw, mt)
+   uint8_t *dst = intel_miptree_map_raw(brw, mt, GL_MAP_WRITE_BIT)
 + image_y * mt-pitch
 + image_x * mt-cpp;
 
@@ -2428,8 +2430,8 @@ intel_miptree_map_depthstencil(struct brw_context *brw,
 */
if (!(map-mode  GL_MAP_INVALIDATE_RANGE_BIT)) {
   uint32_t *packed_map = map-ptr;
-  uint8_t *s_map = intel_miptree_map_raw(brw, s_mt);
-  uint32_t *z_map = intel_miptree_map_raw(brw, z_mt);
+  uint8_t *s_map = intel_miptree_map_raw(brw, s_mt, GL_MAP_READ_BIT);
+  uint32_t *z_map = intel_miptree_map_raw(brw, z_mt, GL_MAP_READ_BIT);
   unsigned int s_image_x, s_image_y;
   unsigned int z_image_x, z_image_y;
 
@@ -2489,8 +2491,8 @@ intel_miptree_unmap_depthstencil(struct brw_context *brw,
 
if (map-mode  GL_MAP_WRITE_BIT) {
   uint32_t *packed_map = map-ptr;
-  uint8_t *s_map = intel_miptree_map_raw(brw, s_mt);
-  uint32_t *z_map = intel_miptree_map_raw(brw, z_mt);
+  uint8_t *s_map = intel_miptree_map_raw(brw, s_mt, GL_MAP_WRITE_BIT);
+  uint32_t *z_map = intel_miptree_map_raw(brw, z_mt, GL_MAP_WRITE_BIT);
   unsigned int s_image_x, s_image_y;
   unsigned int z_image_x, z_image_y;
 
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 7e91c97..6c69572 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++

[Mesa-dev] [PATCH 12/18] i965: Reuse any available upload space for temporary MapBufferRange blits

2015-07-06 Thread Chris Wilson

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/intel_buffer_objects.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c 
b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
index 283182d..7110ce6 100644
--- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
+++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
@@ -354,17 +354,13 @@ brw_map_buffer_range(struct gl_context *ctx,
   * guarantees the driver has advertised to the application.
   */
  const unsigned alignment = ctx-Const.MinMapBufferAlignment;
-
- intel_obj-map_extra[index] = (uintptr_t) offset % alignment;
- intel_obj-range_map_bo[index] =
-brw_bo_create(brw-batch,
-  BO blit temp,
-  length + intel_obj-map_extra[index],
-  alignment, 0);
+ const unsigned adj = (uintptr_t) offset % alignment;
 
  obj-Mappings[index].Pointer =
-brw_bo_map(intel_obj-range_map_bo[index], MAP_WRITE) +
-intel_obj-map_extra[index];
+intel_upload_space(brw, length + adj, alignment,
+   intel_obj-range_map_bo[index],
+   intel_obj-map_extra[index]) + adj;
+ intel_obj-map_extra[index] += adj;
 
  return obj-Mappings[index].Pointer;
   } else {
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 13/18] util/register_allocate: Combine the BITSET arrays into a single allocation

2015-07-06 Thread Chris Wilson

ralloc's ability to track all pointers belonging to a context and free
them in a single call does not come cheap, and we can reduce the
overhead here by combining the array of BITSETs for a regset into a
single allocation.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Matt Turner matts...@gmail.com
Cc: Jason Ekstrand jason.ekstr...@intel.com
Cc: Martin Peres martin.pe...@linux.intel.com
---
 src/util/register_allocate.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
index 2ad8c3c..f5f7c04 100644
--- a/src/util/register_allocate.c
+++ b/src/util/register_allocate.c
@@ -187,14 +187,18 @@ ra_alloc_reg_set(void *mem_ctx, unsigned int count)
 {
unsigned int i;
struct ra_regs *regs;
+   BITSET_WORD *conflicts;
+   int bitset_count = BITSET_WORDS(count);
 
regs = rzalloc(mem_ctx, struct ra_regs);
regs-count = count;
regs-regs = rzalloc_array(regs, struct ra_reg, count);
 
+   conflicts = rzalloc_array(regs-regs, BITSET_WORD, count * bitset_count);
for (i = 0; i  count; i++) {
-  regs-regs[i].conflicts = rzalloc_array(regs-regs, BITSET_WORD,
-  BITSET_WORDS(count));
+  regs-regs[i].conflicts = conflicts;
+  conflicts += bitset_count;
+
   BITSET_SET(regs-regs[i].conflicts, i);
 
   regs-regs[i].conflict_list = ralloc_array(regs-regs, unsigned int, 4);
@@ -379,6 +383,8 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned 
int count)
 {
struct ra_graph *g;
unsigned int i;
+   BITSET_WORD *adjacency;
+   int bitset_count = BITSET_WORDS(count);
 
g = rzalloc(NULL, struct ra_graph);
g-regs = regs;
@@ -387,9 +393,10 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned 
int count)
 
g-stack = rzalloc_array(g, unsigned int, count);
 
+   adjacency = rzalloc_array(g, BITSET_WORD, count * bitset_count);
for (i = 0; i  count; i++) {
-  int bitset_count = BITSET_WORDS(count);
-  g-nodes[i].adjacency = rzalloc_array(g, BITSET_WORD, bitset_count);
+  g-nodes[i].adjacency = adjacency;
+  adjacency += bitset_count;
 
   g-nodes[i].adjacency_list_size = 4;
   g-nodes[i].adjacency_list =
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images

2015-07-06 Thread Marek Olšák

On Sun, Jul 5, 2015 at 3:47 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 On Sun, Jul 5, 2015 at 9:25 AM, Marek Olšák mar...@gmail.com wrote:
 From: Marek Olšák marek.ol...@amd.com

 Other approaches are being considered:

 1) Don't use resource wrappers (views) and pass all view parameters
(format, layer range, level) to set_shader_images just like
set_vertex_buffers, set_constant_buffer, or even glBindImageTexture do.

 2) Use pipe_sampler_view instead of pipe_image_view,
and maybe even use set_sampler_views instead of set_shader_images.
set_sampler_views would have to use start_slot = PIPE_MAX_SAMPLERS for
all writable images to allow for OpenGL textures in the lower slots.

 I wouldn't like to go back to using pipe_surface, because it doesn't fit
 into the DX12 hw design.

 Please discuss.
 ---
 diff --git a/src/gallium/include/pipe/p_context.h 
 b/src/gallium/include/pipe/p_context.h
 index c2eedf8..13f3938 100644
 --- a/src/gallium/include/pipe/p_context.h
 +++ b/src/gallium/include/pipe/p_context.h
 @@ -48,6 +48,7 @@ struct pipe_depth_stencil_alpha_state;
  struct pipe_draw_info;
  struct pipe_fence_handle;
  struct pipe_framebuffer_state;
 +struct pipe_image_view;
  struct pipe_index_buffer;
  struct pipe_query;
  struct pipe_poly_stipple;
 @@ -236,20 +237,21 @@ struct pipe_context {
const float default_inner_level[2]);

 /**
 -* Bind an array of shader resources that will be used by the
 -* graphics pipeline.  Any resources that were previously bound to
 -* the specified range will be unbound after this call.
 +* Bind an array of images that will be used by a shader.
 +* Any imagse that were previously bound to the specified range
 +* will be unbound.
  *
 -* \param start  first resource to bind.
 -* \param count  number of consecutive resources to bind.
 -* \param resources  array of pointers to the resources to bind, it
 +* \param shader shader stage where the images will be bound.
 +* \param start_slot first iamage slot to bind.
 +* \param count  number of consecutive images to bind.
 +* \param buffersarray of pointers to the images to bind, it
  *   should contain at least \a count elements
 -*   unless it's NULL, in which case no new
 -*   resources will be bound.
 +*   unless it's NULL, in which case no images will
 +*   be bound.
  */
 -   void (*set_shader_resources)(struct pipe_context *,
 -unsigned start, unsigned count,
 -struct pipe_surface **resources);
 +   void (*set_shader_images)(struct pipe_context *, unsigned shader,
 + unsigned start_slot, unsigned count,
 + struct pipe_image_view *images);

 On Fermi, there is only one set of 8 images that can be bound,
 irrespective of stage. There are a different set of 8 bindings for 3d
 and compute pipelines though. Kepler+ is all bindless so it can do
 whatever.

Evergreen only has 12 images shared by fragment and compute. Other
shaders can't access them (I think). That's old DX11 hardware though.
SI+ is all bindless too.


 Even though I originally advocated for the stage argument, it appears
 to neither match Fermi hardware nor the GL API where I believe images
 are also in a global namespace.

Textures are in a global namespace too, but that doesn't make them
global in shaders. It really depends on whether
combined=vertex=fragment=... or if combined=vertex+fragment+ Now
that you brought this up, I've checked what Catalyst reports and
indeed it's the former for all SSBOs, images, and atomics. For
simplicity and less needed space for binding slots and less overhead
managing them, we should follow that design too. So let's remove the
shader argument from set_shader_images and set_shader_buffers.



 void (*set_vertex_buffers)( struct pipe_context *,
 unsigned start_slot,
 @@ -392,6 +394,17 @@ struct pipe_context {
 struct pipe_surface *);

 /**
 +* Create an image view into a buffer or texture to be used with load,
 +* store, and atomic instructions by a shader stage.
 +*/
 +   struct pipe_image_view * (*create_image_view)(struct pipe_context *ctx,
 + struct pipe_resource 
 *texture,
 + const struct 
 pipe_image_view *templat);
 +
 +   void (*image_view_destroy)(struct pipe_context *ctx,
 +  struct pipe_image_view *view);
 +
 +   /**
  * Map a resource.
  *
  * Transfers are (by default) context-private and allow uploads to be
 diff --git a/src/gallium/include/pipe/p_state.h 
 b/src/gallium/include/pipe/p_state.h
 index dc8f550..f655dda 100644
 ---

Re: [Mesa-dev] [PATCH RFC] egl/dri2: Add dri3 support to x11 platform

2015-07-06 Thread Emil Velikov

On 4 July 2015 at 04:18, Boyan Ding boyan.j.d...@gmail.com wrote:
 Hi Emil,


 On 07/03/2015 10:36 PM, Emil Velikov wrote:
 Hi Boyan,

 Thank you for doing this ! A few suggestions which you might be interesting:

 Considering that the backend has handled more than dri2 perhaps we can
 do a s/dri2/dri/ :-) That obviously is independent of your work.

 On 01/07/15 16:31, Boyan Ding wrote:
 Signed-off-by: Boyan Ding boyan.j.d...@gmail.com
 ---
  configure.ac |3 +
  src/egl/drivers/dri2/Makefile.am |5 +
  src/egl/drivers/dri2/egl_dri2.c  |   65 +-
  src/egl/drivers/dri2/egl_dri2.h  |   12 +-
  src/egl/drivers/dri2/platform_x11.c  |  127 ++-
  src/egl/drivers/dri2/platform_x11_dri3.c | 1591 
 ++
  src/egl/drivers/dri2/platform_x11_dri3.h |  140 +++
  7 files changed, 1926 insertions(+), 17 deletions(-)
  create mode 100644 src/egl/drivers/dri2/platform_x11_dri3.c
  create mode 100644 src/egl/drivers/dri2/platform_x11_dri3.h

 diff --git a/configure.ac b/configure.ac
 index af61aa2..090e6c9 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -1545,6 +1545,9 @@ if test x$enable_egl = xyes; then
  if test x$enable_shared_glapi = xno; then
  AC_MSG_ERROR([egl_dri2 requires --enable-shared-glapi])
  fi
 +if test x$enable_dri3 = xyes; then
 +EGL_LIB_DEPS=$EGL_LIB_DEPS -lxcb-dri3 -lxcb-present 
 -lXxf86vm -lxshmfence -lxcb-sync
 Neither one of these should be a direct -lfoo expression. We should
 check/use PKG_CHECK_MODULES and foo_{CFLAGS,LIBS}.

 I'll correct that.


 +fi
  else
  # Avoid building an empty libEGL. Drop/update this
  # when other backends (haiku?) come along.
 diff --git a/src/egl/drivers/dri2/Makefile.am 
 b/src/egl/drivers/dri2/Makefile.am
 index 55be4a7..d5b511c 100644
 --- a/src/egl/drivers/dri2/Makefile.am
 +++ b/src/egl/drivers/dri2/Makefile.am
 @@ -52,6 +52,11 @@ if HAVE_EGL_PLATFORM_X11
  libegl_dri2_la_SOURCES += platform_x11.c
  AM_CFLAGS += -DHAVE_X11_PLATFORM
  AM_CFLAGS += $(XCB_DRI2_CFLAGS)
 +if HAVE_DRI3
 +libegl_dri2_la_SOURCES += \
 +platform_x11_dri3.c \
 +platform_x11_dri3.h
 +endif
  endif

  if HAVE_EGL_PLATFORM_WAYLAND
 diff --git a/src/egl/drivers/dri2/egl_dri2.c 
 b/src/egl/drivers/dri2/egl_dri2.c
 index b1b65f7..052c579 100644
 --- a/src/egl/drivers/dri2/egl_dri2.c
 +++ b/src/egl/drivers/dri2/egl_dri2.c
 @@ -322,6 +322,12 @@ struct dri2_extension_match {
 int offset;
  };

 +static struct dri2_extension_match dri3_driver_extensions[] = {
 +   { __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) },
 +   { __DRI_IMAGE_DRIVER, 1, offsetof(struct dri2_egl_display, 
 image_driver) },
 +   { NULL, 0, 0 }
 +};
 +
  static struct dri2_extension_match dri2_driver_extensions[] = {
 { __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) },
 { __DRI_DRI2, 2, offsetof(struct dri2_egl_display, dri2) },
 @@ -464,6 +470,24 @@ dri2_open_driver(_EGLDisplay *disp)
  }

  EGLBoolean
 +dri2_load_driver_dri3(_EGLDisplay *disp)
 dri3_load_driver perhaps ?

 I prefixed all my functions in public files with dri2 instead of
 dri3 because the EGL driver is currently called DRI2 (although it
 seems funny to see dri3 in a driver called dri2).

 The current name of the driver is DRI2 because it uses dri2 to talk
 with the X server and uses similar technologies with direct rendering
 on other platforms. With my patch, the first reason becomes invalid.
 But should the name of the driver or namespace of all functions (or
 just the functions involved with dri3) change? I'm open to others'
 suggestions. The same applies to comments on function names below.

Iirc for swrast based wayland (platform_wayland.c) we already remove
the it uses dri2 assumption. Then again all this is a simple name
change which, as mentioned before, is not strictly related to your
work.


 [snip]

 diff --git a/src/egl/drivers/dri2/egl_dri2.h 
 b/src/egl/drivers/dri2/egl_dri2.h
 index f0cc6da..d258753 100644
 --- a/src/egl/drivers/dri2/egl_dri2.h
 +++ b/src/egl/drivers/dri2/egl_dri2.h
 @@ -153,11 +153,16 @@ struct dri2_egl_display

 int   dri2_major;
 int   dri2_minor;
 +   int   dri3_major;
 +   int   dri3_minor;
 +   int   present_major;
 +   int   present_minor;
 Many of these are unused. Same goes for the glx code which was the
 inspiration for this work :-)
 I think it's safe to remove them then.


 __DRIscreen  *dri_screen;
 int   own_dri_screen;
 const __DRIconfig   **driver_configs;
 void *driver;
 const __DRIcoreExtension   *core;
 +   const __DRIimageDriverExtension *image_driver;
 const __DRIdri2Extension   *dri2;
 const __DRIswrastExtension *swrast;
 const

[Mesa-dev] i965: Context local batch manager, take 2

2015-07-06 Thread Chris Wilson

Since the last posting, it has grown a few superficial patches for
tweaks in the general area and a couple of drive-bys from looking at the
synmark profiles. As for the main patch, I've worked through the few
piglit regressions and it should be clean on ivb/byt/hsw/bdw to the best
of my knowledge, and thanks to Jesse's prodding I have written
substantial comments to try and explain the intricacies of the kernel
interface we are trying to optimise.

Please review kindly,
-Chris

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 18/18] i965: AMD_pinned_memory and userptr

2015-07-06 Thread Chris Wilson

All GEN GPU can bind to any piece of memory (thanks UMA), and so through
a special ioctl we can map a chunk of page-aligned client memory into
the GPU address space. However, not all GEN are equal. Some have
cache-coherency between the CPU and the GPU, whilst the others are
incoherent and rely on snooping on explicit flushes to push/pull dirty
data. Whereas we can use client buffers as a general replacement for kernel
allocated buffers with LLC (cache coherency), using snooped buffers
behaves differently and so must be used with care.

AMD_pinned_memory supposes that the client memory buffer is suitable
for any general usage (e.g. vertex data, texture data) and so only on
LLC can we offer that extension.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/mesa/drivers/dri/i965/brw_batch.c| 36 +
 src/mesa/drivers/dri/i965/brw_batch.h|  8 +
 src/mesa/drivers/dri/i965/intel_buffer_objects.c | 40 +---
 src/mesa/drivers/dri/i965/intel_extensions.c |  8 +
 src/mesa/drivers/dri/i965/intel_screen.c | 14 +
 src/mesa/drivers/dri/i965/intel_screen.h |  2 +-
 6 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_batch.c 
b/src/mesa/drivers/dri/i965/brw_batch.c
index e01a0c4..1f73148 100644
--- a/src/mesa/drivers/dri/i965/brw_batch.c
+++ b/src/mesa/drivers/dri/i965/brw_batch.c
@@ -1323,6 +1323,42 @@ struct brw_bo *brw_bo_create(struct brw_batch *batch,
return bo;
 }
 
+/*
+ * Wrap the chunk of client memory given by ptr+size inside a GPU
+ * buffer, and make it cache coherent (though on non-LLC architectures
+ * this requires snooping on explicit cache flushes). This allows the
+ * caller to write into the memory chunk and for those writes to be
+ * visible on the GPU (exactly as if they create the buffer and then
+ * persistently mapped it to obtain the pointer).
+ */
+struct brw_bo *brw_bo_create_userptr(struct brw_batch *batch,
+ const char *name,
+ void *ptr,
+ uint64_t size,
+ uint64_t alignment)
+{
+   drm_intel_bo *base;
+   struct brw_bo *bo;
+
+   base = drm_intel_bo_alloc_userptr(batch-bufmgr, name,
+ ptr, I915_TILING_NONE, 0, size, 0);
+   if (base == NULL)
+  return NULL;
+
+   base-align = alignment;
+   bo = brw_bo_import(batch, base, false);
+   if (bo == NULL) {
+  drm_intel_bo_unreference(base);
+  return NULL;
+   }
+
+   bo-cache_coherent = true;
+   bo-reusable = false;
+   list_move(bo-link, bo-batch-inactive);
+
+   return bo;
+}
+
 static bool __brw_bo_set_caching(struct brw_bo *bo, int caching)
 {
struct drm_i915_gem_caching arg;
diff --git a/src/mesa/drivers/dri/i965/brw_batch.h 
b/src/mesa/drivers/dri/i965/brw_batch.h
index 3628b03..3ee9b20 100644
--- a/src/mesa/drivers/dri/i965/brw_batch.h
+++ b/src/mesa/drivers/dri/i965/brw_batch.h
@@ -215,6 +215,14 @@ brw_bo_create_tiled(struct brw_batch *batch,
uint32_t *pitch,
unsigned flags);
 
+/* Create a local brw_bo for GPU access to client memory */
+struct brw_bo *
+brw_bo_create_userptr(struct brw_batch *batch,
+  const char *name,
+  void *ptr,
+  uint64_t size,
+  uint64_t alignment);
+
 /* Create a local brw_bo for a foreign buffer using its global flinked name */
 struct brw_bo *brw_bo_create_from_name(struct brw_batch *batch,
   const char *name,
diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c 
b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
index 7110ce6..03d0982 100644
--- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
+++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
@@ -58,14 +58,10 @@ mark_buffer_inactive(struct intel_buffer_object *intel_obj)
intel_obj-gpu_active_end = 0;
 }
 
-/** Allocates a new brw_bo to store the data for the buffer object. */
 static void
-alloc_buffer_object(struct brw_context *brw,
-struct intel_buffer_object *intel_obj)
+mark_new_state(struct brw_context *brw,
+   struct intel_buffer_object *intel_obj)
 {
-   intel_obj-buffer =
-  brw_bo_create(brw-batch, bufferobj, intel_obj-Base.Size, 64, 0);
-
/* the buffer might be bound as a uniform buffer, need to update it
 */
if (intel_obj-Base.UsageHistory  USAGE_UNIFORM_BUFFER)
@@ -74,8 +70,18 @@ alloc_buffer_object(struct brw_context *brw,
   brw-ctx.NewDriverState |= BRW_NEW_TEXTURE_BUFFER;
if (intel_obj-Base.UsageHistory  USAGE_ATOMIC_COUNTER_BUFFER)
   brw-ctx.NewDriverState |= BRW_NEW_ATOMIC_BUFFER;
+}
+
+/** Allocates a new brw_bo to store the data for the buffer object. */
+static void
+alloc_buffer_object(struct brw_context *brw,
+struct

Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images

2015-07-06 Thread Roland Scheidegger

Am 05.07.2015 um 15:47 schrieb Ilia Mirkin:
 On Sun, Jul 5, 2015 at 9:25 AM, Marek Olšák mar...@gmail.com wrote:
 From: Marek Olšák marek.ol...@amd.com

 Other approaches are being considered:

 1) Don't use resource wrappers (views) and pass all view parameters
(format, layer range, level) to set_shader_images just like
set_vertex_buffers, set_constant_buffer, or even glBindImageTexture do.

 2) Use pipe_sampler_view instead of pipe_image_view,
and maybe even use set_sampler_views instead of set_shader_images.
set_sampler_views would have to use start_slot = PIPE_MAX_SAMPLERS for
all writable images to allow for OpenGL textures in the lower slots.

 I wouldn't like to go back to using pipe_surface, because it doesn't fit
 into the DX12 hw design.

 Please discuss.
 ---
 diff --git a/src/gallium/include/pipe/p_context.h 
 b/src/gallium/include/pipe/p_context.h
 index c2eedf8..13f3938 100644
 --- a/src/gallium/include/pipe/p_context.h
 +++ b/src/gallium/include/pipe/p_context.h
 @@ -48,6 +48,7 @@ struct pipe_depth_stencil_alpha_state;
  struct pipe_draw_info;
  struct pipe_fence_handle;
  struct pipe_framebuffer_state;
 +struct pipe_image_view;
  struct pipe_index_buffer;
  struct pipe_query;
  struct pipe_poly_stipple;
 @@ -236,20 +237,21 @@ struct pipe_context {
const float default_inner_level[2]);

 /**
 -* Bind an array of shader resources that will be used by the
 -* graphics pipeline.  Any resources that were previously bound to
 -* the specified range will be unbound after this call.
 +* Bind an array of images that will be used by a shader.
 +* Any imagse that were previously bound to the specified range
 +* will be unbound.
  *
 -* \param start  first resource to bind.
 -* \param count  number of consecutive resources to bind.
 -* \param resources  array of pointers to the resources to bind, it
 +* \param shader shader stage where the images will be bound.
 +* \param start_slot first iamage slot to bind.
 +* \param count  number of consecutive images to bind.
 +* \param buffersarray of pointers to the images to bind, it
  *   should contain at least \a count elements
 -*   unless it's NULL, in which case no new
 -*   resources will be bound.
 +*   unless it's NULL, in which case no images will
 +*   be bound.
  */
 -   void (*set_shader_resources)(struct pipe_context *,
 -unsigned start, unsigned count,
 -struct pipe_surface **resources);
 +   void (*set_shader_images)(struct pipe_context *, unsigned shader,
 + unsigned start_slot, unsigned count,
 + struct pipe_image_view *images);
 
 On Fermi, there is only one set of 8 images that can be bound,
 irrespective of stage. There are a different set of 8 bindings for 3d
 and compute pipelines though. Kepler+ is all bindless so it can do
 whatever.
 
 Even though I originally advocated for the stage argument, it appears
 to neither match Fermi hardware nor the GL API where I believe images
 are also in a global namespace.

That looks exactly like d3d11 restrictions. UAVs were only in pixel
shaders and compute shaders. d3d11.1 made them available to all shader
stages though, but they continue to share the same bind points in all
shader stages (and still share them with render targets).
I don't really know though what's the better design. GL though seems to
think this is per shader stage (there are both per-stage and combined
limits just like as for textures).

 

 void (*set_vertex_buffers)( struct pipe_context *,
 unsigned start_slot,
 @@ -392,6 +394,17 @@ struct pipe_context {
 struct pipe_surface *);

 /**
 +* Create an image view into a buffer or texture to be used with load,
 +* store, and atomic instructions by a shader stage.
 +*/
 +   struct pipe_image_view * (*create_image_view)(struct pipe_context *ctx,
 + struct pipe_resource 
 *texture,
 + const struct 
 pipe_image_view *templat);
 +
 +   void (*image_view_destroy)(struct pipe_context *ctx,
 +  struct pipe_image_view *view);
 +
 +   /**
  * Map a resource.
  *
  * Transfers are (by default) context-private and allow uploads to be
 diff --git a/src/gallium/include/pipe/p_state.h 
 b/src/gallium/include/pipe/p_state.h
 index dc8f550..f655dda 100644
 --- a/src/gallium/include/pipe/p_state.h
 +++ b/src/gallium/include/pipe/p_state.h
 @@ -386,6 +386,31 @@ struct pipe_sampler_view


  /**
 + * A view into a writable buffer or texture that can be bound to a shader
 + * stage.
 + */
 +struct pipe_image_view
 +{
 +   struct pipe_reference

[Mesa-dev] [PATCH 14/18] util/register_allocate: Compute transitive conflicts using 2-passes

2015-07-06 Thread Chris Wilson

Avoid frequent use of reralloc() for tracking the conflicts list, and
walking that list every time we add a transitive conflict, by making the
observation we apply the indirect conflicts by combining the conflicts
of a conflicting register in a second pass.

Reduces brw_compiler_create() from 18351.5us to 4787.1us on my ivb
i7-3720QM (in context that 18ms represents about 50% of the time it takes
to start X, though why X instantiates an intel_screen at all remains a
mystery).

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Matt Turner matts...@gmail.com
Cc: Jason Ekstrand jason.ekstr...@intel.com
Cc: Martin Peres martin.pe...@linux.intel.com
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  | 18 +++-
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 16 ++-
 src/util/register_allocate.c   | 53 +-
 src/util/register_allocate.h   |  2 +
 4 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 8e5621d..7f87221 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -223,7 +223,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int 
reg_width)
 for (int base_reg = j;
  base_reg  j + (class_sizes[i] + 1) / 2;
  base_reg++) {
-   ra_add_transitive_reg_conflict(regs, base_reg, reg);
+   ra_mark_transitive_reg_conflict(regs, base_reg, reg);
 }
 
 reg++;
@@ -237,7 +237,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int 
reg_width)
 for (int base_reg = j;
  base_reg  j + class_sizes[i];
  base_reg++) {
-   ra_add_transitive_reg_conflict(regs, base_reg, reg);
+   ra_mark_transitive_reg_conflict(regs, base_reg, reg);
 }
 
 reg++;
@@ -246,6 +246,20 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int 
reg_width)
}
assert(reg == ra_reg_count);
 
+   reg = 0;
+   for (int i = 0; i  class_count; i++) {
+  int class_size = class_sizes[i];
+  int class_reg_count = base_reg_count - (class_size - 1);
+  if (devinfo-gen = 5  reg_width == 2)
+ class_size = (class_size + 1) / 2;
+  for (int j = 0; j  class_reg_count; j++) {
+for (int base_reg = j; base_reg  j + class_size; base_reg++)
+   ra_add_transitive_reg_conflict(regs, base_reg, reg);
+reg++;
+  }
+   }
+   assert(reg == ra_reg_count);
+
/* Add a special class for aligned pairs, which we'll put delta_xy
 * in on Gen = 6 so that we can do PLN.
 */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
index 555c42e..93b7297 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
@@ -140,7 +140,7 @@ brw_vec4_alloc_reg_set(struct brw_compiler *compiler)
 for (int base_reg = j;
  base_reg  j + class_sizes[i];
  base_reg++) {
-   ra_add_transitive_reg_conflict(compiler-vec4_reg_set.regs, 
base_reg, reg);
+   ra_mark_transitive_reg_conflict(compiler-vec4_reg_set.regs, 
base_reg, reg);
 }
 
 reg++;
@@ -158,6 +158,20 @@ brw_vec4_alloc_reg_set(struct brw_compiler *compiler)
}
assert(reg == ra_reg_count);
 
+   reg = 0;
+   for (int i = 0; i  class_count; i++) {
+  int class_reg_count = base_reg_count - (class_sizes[i] - 1);
+  for (int j = 0; j  class_reg_count; j++) {
+for (int base_reg = j;
+ base_reg  j + class_sizes[i];
+ base_reg++) {
+   ra_add_transitive_reg_conflict(compiler-vec4_reg_set.regs, 
base_reg, reg);
+}
+reg++;
+  }
+   }
+   assert(reg == ra_reg_count);
+
ra_set_finalize(compiler-vec4_reg_set.regs, q_values);
 
for (int i = 0; i  MAX_VGRF_SIZE; i++)
diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
index f5f7c04..2bbab7f 100644
--- a/src/util/register_allocate.c
+++ b/src/util/register_allocate.c
@@ -83,19 +83,17 @@
 
 struct ra_reg {
BITSET_WORD *conflicts;
-   unsigned int *conflict_list;
-   unsigned int conflict_list_size;
-   unsigned int num_conflicts;
+   unsigned int conflict_range[2];
 };
 
 struct ra_regs {
struct ra_reg *regs;
-   unsigned int count;
 
struct ra_class **classes;
unsigned int class_count;
 
bool round_robin;
+   unsigned int count;
 };
 
 struct ra_class {
@@ -200,11 +198,8 @@ ra_alloc_reg_set(void *mem_ctx, unsigned int count)
   conflicts += bitset_count;
 
   BITSET_SET(regs-regs[i].conflicts, i);
-
-  regs-regs[i].conflict_list = ralloc_array(regs-regs, unsigned int, 4);
-  regs-regs[i].conflict_list_size = 4;
-  regs-regs[i].conflict_list[0] = i;
-

[Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use

2015-07-06 Thread Chris Wilson

The vertices require a large chunk of memory, currently allocated during
context creation. However, this memory is not required until use so we
can defer the allocation until the first swrast_Wakeup().

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/swrast_setup/ss_context.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/mesa/swrast_setup/ss_context.c 
b/src/mesa/swrast_setup/ss_context.c
index 74b1da3..028eccb 100644
--- a/src/mesa/swrast_setup/ss_context.c
+++ b/src/mesa/swrast_setup/ss_context.c
@@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx )
swsetup-NewState = ~0;
_swsetup_trifuncs_init( ctx );
 
-   _tnl_init_vertices( ctx, ctx-Const.MaxArrayLockSize + 12, 
-  sizeof(SWvertex) );
-
-
return GL_TRUE;
 }
 
@@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx )
TNLcontext *tnl = TNL_CONTEXT(ctx);
SScontext *swsetup = SWSETUP_CONTEXT(ctx);
 
+   if (!(GET_VERTEX_STATE(ctx))-max_vertex_size)
+  _tnl_init_vertices(ctx,
+ctx-Const.MaxArrayLockSize + 12,
+sizeof(SWvertex));
+
tnl-Driver.Render.Start = _swsetup_RenderStart;
tnl-Driver.Render.Finish = _swsetup_RenderFinish;
tnl-Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive;
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 17/18] loader: Look for any version of currently linked libudev.so

2015-07-06 Thread Chris Wilson

Since there was an ABI break and linking twice against libudev.so.0 and
libudev.so.1 causes the application to quickly crash, we first check if
the application is currently linked against libudev before dlopening a
local handle. However for backwards/forwards compatability, we need to
inspect the application for current linkage against all known versions
first.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 src/loader/loader.c | 46 --
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/src/loader/loader.c b/src/loader/loader.c
index 8452cd3..8780587 100644
--- a/src/loader/loader.c
+++ b/src/loader/loader.c
@@ -128,26 +128,36 @@ static void *udev_handle = NULL;
 static void *
 udev_dlopen_handle(void)
 {
-   if (!udev_handle) {
-  udev_handle = dlopen(libudev.so.1, RTLD_LOCAL | RTLD_LAZY);
-
-  if (!udev_handle) {
- /* libudev.so.1 changed the return types of the two unref functions
-  * from voids to pointers.  We don't use those return values, and the
-  * only ABI I've heard that cares about this kind of change (calling
-  * a function with a void * return that actually only returns void)
-  * might be ia64.
-  */
- udev_handle = dlopen(libudev.so.0, RTLD_LOCAL | RTLD_LAZY);
-
- if (!udev_handle) {
-log_(_LOADER_WARNING, Couldn't dlopen libudev.so.1 or 
- libudev.so.0, driver detection may be broken.\n);
- }
+   char name[80];
+   unsigned flags = RTLD_NOLOAD | RTLD_LOCAL | RTLD_LAZY;
+   int version;
+
+   /* libudev.so.1 changed the return types of the two unref functions
+* from voids to pointers.  We don't use those return values, and the
+* only ABI I've heard that cares about this kind of change (calling
+* a function with a void * return that actually only returns void)
+* might be ia64.
+*/
+
+   /* First try opening an already linked libudev, then try loading one */
+   do {
+  for (version = 1; version = 0; version--) {
+ snprintf(name, sizeof(name), libudev.so.%d, version);
+ udev_handle = dlopen(name, flags);
+ if (udev_handle)
+return udev_handle;
   }
-   }
 
-   return udev_handle;
+  if ((flags  RTLD_NOLOAD) == 0)
+ break;
+
+  flags = ~RTLD_NOLOAD;
+   } while (1);
+
+   log_(_LOADER_WARNING,
+Couldn't dlopen libudev.so.1 or 
+libudev.so.0, driver detection may be broken.\n);
+   return NULL;
 }
 
 static int dlsym_failed = 0;
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 16/18] i965: Prevent coordinate overflow in intel_emit_linear_blit

2015-07-06 Thread Chris Wilson

Fixes regression from
commit 8c17d53823c77ac1c56b0548e4e54f69a33285f1
Author: Kenneth Graunke kenn...@whitecape.org
Date:   Wed Apr 15 03:04:33 2015 -0700

i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.

which adjusted the coordinates to be relative to the nearest cacheline.
However, this then offsets the coordinates by up to 63 and this may then
cause them to overflow the BLT limits. For the well aligned large
transfer case, we can use 32bpp pixels and so reduce the coordinates by
4 (versus the current 8bpp pixels). We also have to be more careful
doing the last line just in case it may exceed the coordinate limit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Kenneth Graunke kenn...@whitecape.org
Cc: Ian Romanick ian.d.roman...@intel.com
Cc: Anuj Phogat anuj.pho...@gmail.com
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/intel_blit.c | 62 --
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_blit.c 
b/src/mesa/drivers/dri/i965/intel_blit.c
index 7e81685..dbb53d8 100644
--- a/src/mesa/drivers/dri/i965/intel_blit.c
+++ b/src/mesa/drivers/dri/i965/intel_blit.c
@@ -762,35 +762,20 @@ intel_emit_linear_blit(struct brw_context *brw,
int16_t src_x, dst_x;
bool ok;
 
-   /* The pitch given to the GPU must be DWORD aligned, and
-* we want width to match pitch. Max width is (1  15 - 1),
-* rounding that down to the nearest DWORD is 1  15 - 4
-*/
-   pitch = ROUND_DOWN_TO(MIN2(size, (1  15) - 1), 4);
-   height = (pitch == 0) ? 1 : size / pitch;
-   src_x = src_offset % 64;
-   dst_x = dst_offset % 64;
-   ok = intelEmitCopyBlit(brw, 1,
- pitch, src_bo, src_offset - src_x, I915_TILING_NONE,
-  INTEL_MIPTREE_TRMODE_NONE,
- pitch, dst_bo, dst_offset - dst_x, I915_TILING_NONE,
-  INTEL_MIPTREE_TRMODE_NONE,
- src_x, 0, /* src x/y */
- dst_x, 0, /* dst x/y */
- pitch, height, /* w, h */
- GL_COPY);
-   if (!ok)
-  _mesa_problem(ctx, Failed to linear blit %dx%d\n, pitch, height);
-
-   src_offset += pitch * height;
-   dst_offset += pitch * height;
-   src_x = src_offset % 64;
-   dst_x = dst_offset % 64;
-   size -= pitch * height;
-   assert (size  (1  15));
-   pitch = ALIGN(size, 4);
-
-   if (size != 0) {
+   do {
+  /* The pitch given to the GPU must be DWORD aligned, and
+   * we want width to match pitch. Max width is (1  15 - 1),
+   * rounding that down to the nearest DWORD is 1  15 - 4
+   */
+  pitch = ROUND_DOWN_TO(MIN2(size, (1  15) - 64), 4);
+  height = (size  pitch || pitch == 0) ? 1 : size / pitch;
+
+  src_x = src_offset % 64;
+  dst_x = dst_offset % 64;
+  pitch = ALIGN(MIN2(size, (1  15) - 64), 4);
+  assert(src_x + pitch  1  15);
+  assert(dst_x + pitch  1  15);
+
   ok = intelEmitCopyBlit(brw, 1,
 pitch, src_bo, src_offset - src_x, 
I915_TILING_NONE,
  INTEL_MIPTREE_TRMODE_NONE,
@@ -798,11 +783,22 @@ intel_emit_linear_blit(struct brw_context *brw,
  INTEL_MIPTREE_TRMODE_NONE,
 src_x, 0, /* src x/y */
 dst_x, 0, /* dst x/y */
-size, 1, /* w, h */
+MIN2(size, pitch), height, /* w, h */
 GL_COPY);
-  if (!ok)
- _mesa_problem(ctx, Failed to linear blit %dx%d\n, size, 1);
-   }
+  if (!ok) {
+_mesa_problem(ctx, Failed to linear blit %dx%d\n,
+  MIN2(size, pitch), height);
+return;
+  }
+
+  pitch *= height;
+  if (size = pitch)
+return;
+
+  src_offset += pitch;
+  dst_offset += pitch;
+  size -= pitch;
+   } while (1);
 }
 
 /**
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] clover: Implement image attribute getters

2015-07-06 Thread Francisco Jerez

Zoltán Gilián zoltan.gil...@gmail.com writes:

 This seems to be doing essentially the same thing as v1?  Is it the
 right patch?

 The llvm pass was invoked in clover in v1. This patch relies on llvm
 to perform that task (). What this patch does basically is that it
 adds the image attributes to the end of the kernel input vector.
 The commit message of this patch is misleading, I'll fix it.

NAK.  Just like in v1, you're implementing the same pipe driver-specific
policy in Clover's core layer -- If you don't feel like fixing this
properly as I described in my reply to v1, it would be acceptable to
implement it for the time being using a workaround similar to
llvm/invocation.cpp:433 -- Hint: you'll need new
module::argument::semantic enums.

Thanks.

 On Wed, Jun 24, 2015 at 2:48 PM, Francisco Jerez curroje...@riseup.net 
 wrote:
 Zoltan Gilian zoltan.gil...@gmail.com writes:

 Image attributes are passed to the kernel as hidden parameters after the
 image attribute itself. An llvm pass replaces the getter builtins to
 the appropriate parameters.

 This seems to be doing essentially the same thing as v1?  Is it the
 right patch?

 ---
  src/gallium/state_trackers/clover/core/kernel.cpp  | 26 +++
  src/gallium/state_trackers/clover/core/kernel.hpp  | 13 ++--
  src/gallium/state_trackers/clover/core/memory.cpp  |  2 +-
  .../state_trackers/clover/llvm/invocation.cpp  | 81 
 +-
  4 files changed, 116 insertions(+), 6 deletions(-)

 diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp 
 b/src/gallium/state_trackers/clover/core/kernel.cpp
 index 0756f06..291c799 100644
 --- a/src/gallium/state_trackers/clover/core/kernel.cpp
 +++ b/src/gallium/state_trackers/clover/core/kernel.cpp
 @@ -185,6 +185,13 @@ 
 kernel::exec_context::bind(intrusive_ptrcommand_queue _q,
}
 }

 +   // Bind image attribute args.
 +   for (const auto arg: kern._args) {
 +  if (auto img_arg = dynamic_castimage_argument*(arg.get())) {
 + img_arg-bind_attributes(*this);
 +  }
 +   }
 +
 // Create a new compute state if anything changed.
 if (!st || q != _q ||
 cs.req_local_mem != mem_local ||
 @@ -465,6 +472,25 @@ kernel::constant_argument::unbind(exec_context ctx) {
  }

  void
 +kernel::image_argument::bind_attributes(exec_context ctx) {
 +   cl_image_format format = img-format();
 +   cl_uint attributes[] = {
 + static_castcl_uint(img-width()),
 + static_castcl_uint(img-height()),
 + static_castcl_uint(img-depth()),
 + format.image_channel_data_type,
 + format.image_channel_order};
 +   for (unsigned i = 0; i  5; ++i) {
 +  auto v = bytes(attributes[i]);
 +
 +  extend(v, module::argument::zero_ext, sizeof(cl_uint));
 +  byteswap(v, ctx.q-device().endianness());
 +  align(ctx.input, sizeof(cl_uint));
 +  insert(ctx.input, v);
 +   }
 +}
 +
 +void
  kernel::image_rd_argument::set(size_t size, const void *value) {
 if (size != sizeof(cl_mem))
throw error(CL_INVALID_ARG_SIZE);
 diff --git a/src/gallium/state_trackers/clover/core/kernel.hpp 
 b/src/gallium/state_trackers/clover/core/kernel.hpp
 index d6432a4..8c15b2f 100644
 --- a/src/gallium/state_trackers/clover/core/kernel.hpp
 +++ b/src/gallium/state_trackers/clover/core/kernel.hpp
 @@ -190,7 +190,14 @@ namespace clover {
   pipe_surface *st;
};

 -  class image_rd_argument : public argument {
 +  class image_argument : public argument {
 +  public:
 + void bind_attributes(exec_context ctx);
 +  protected:
 + image *img;
 +  };
 +
 +  class image_rd_argument : public image_argument {
public:
   virtual void set(size_t size, const void *value);
   virtual void bind(exec_context ctx,
 @@ -198,11 +205,10 @@ namespace clover {
   virtual void unbind(exec_context ctx);

private:
 - image *img;
   pipe_sampler_view *st;
};

 -  class image_wr_argument : public argument {
 +  class image_wr_argument : public image_argument {
public:
   virtual void set(size_t size, const void *value);
   virtual void bind(exec_context ctx,
 @@ -210,7 +216,6 @@ namespace clover {
   virtual void unbind(exec_context ctx);

private:
 - image *img;
   pipe_surface *st;
};

 diff --git a/src/gallium/state_trackers/clover/core/memory.cpp 
 b/src/gallium/state_trackers/clover/core/memory.cpp
 index 055336a..b852e68 100644
 --- a/src/gallium/state_trackers/clover/core/memory.cpp
 +++ b/src/gallium/state_trackers/clover/core/memory.cpp
 @@ -189,7 +189,7 @@ image2d::image2d(clover::context ctx, cl_mem_flags 
 flags,
   const cl_image_format *format, size_t width,
   size_t height, size_t row_pitch,
   void *host_ptr) :
 -   image(ctx, flags, format, width, height, 0,
 +   image(ctx, flags, format, width, height, 1,

[Mesa-dev] [PATCH] radeonsi: Use param export count from si_llvm_export_vs in si_shader_vs

2015-07-06 Thread Michel Dänzer

From: Michel Dänzer michel.daen...@amd.com

This eliminates the error prone logic in si_shader_vs recalculating this
value.

It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being
counted for VS exports, since they are passed to the fragment shader as
varyings as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193
Signed-off-by: Michel Dänzer michel.daen...@amd.com
---
 src/gallium/drivers/radeonsi/si_shader.c|  2 ++
 src/gallium/drivers/radeonsi/si_shader.h|  1 +
 src/gallium/drivers/radeonsi/si_state_shaders.c | 25 +++--
 3 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 4d97b58..753b238 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1218,6 +1218,8 @@ handle_semantic:
}
}
 
+   shader-nr_param_exports = param_count;
+
/* We need to add the position output manually if it's missing. */
if (!pos_args[0][0]) {
pos_args[0][0] = lp_build_const_int32(base-gallivm, 0xf); /* 
writemask */
diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
b/src/gallium/drivers/radeonsi/si_shader.h
index b4339ae..8d309b4 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -165,6 +165,7 @@ struct si_shader {
 
booluses_instanceid;
unsignednr_pos_exports;
+   unsignednr_param_exports;
boolis_gs_copy_shader;
booldx10_clamp_mode; /* convert NaNs to 0 */
 };
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index eef3baa..a842d9d 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -148,10 +148,9 @@ static void si_shader_gs(struct si_shader *shader)
 
 static void si_shader_vs(struct si_shader *shader)
 {
-   struct tgsi_shader_info *info = shader-selector-info;
struct si_pm4_state *pm4;
unsigned num_sgprs, num_user_sgprs;
-   unsigned nparams, i, vgpr_comp_cnt;
+   unsigned nparams, vgpr_comp_cnt;
uint64_t va;
unsigned window_space =
   
shader-selector-info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION];
@@ -180,26 +179,8 @@ static void si_shader_vs(struct si_shader *shader)
}
assert(num_sgprs = 104);
 
-   /* Certain attributes (position, psize, etc.) don't count as params.
-* VS is required to export at least one param and 
r600_shader_from_tgsi()
-* takes care of adding a dummy export.
-*/
-   for (nparams = 0, i = 0 ; i  info-num_outputs; i++) {
-   switch (info-output_semantic_name[i]) {
-   case TGSI_SEMANTIC_CLIPVERTEX:
-   case TGSI_SEMANTIC_CLIPDIST:
-   case TGSI_SEMANTIC_CULLDIST:
-   case TGSI_SEMANTIC_POSITION:
-   case TGSI_SEMANTIC_PSIZE:
-   case TGSI_SEMANTIC_EDGEFLAG:
-   break;
-   default:
-   nparams++;
-   }
-   }
-   if (nparams  1)
-   nparams = 1;
-
+   /* VS is required to export at least one param. */
+   nparams = MAX2(shader-nr_param_exports, 1);
si_pm4_set_reg(pm4, R_0286C4_SPI_VS_OUT_CONFIG,
   S_0286C4_VS_EXPORT_COUNT(nparams - 1));
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 90821] Segfault when calling glViewport on surfaceless EGL context without bound FBO

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=90821

Philip Derrin philip+freedesk...@breakawayconsulting.com.au changed:

   What|Removed |Added

 CC||philip+freedesktop@breakawa
   ||yconsulting.com.au

--- Comment #4 from Philip Derrin 
philip+freedesk...@breakawayconsulting.com.au ---
I encountered this too, on i965.

It happens because intel_viewport calls _mesa_is_winsys_fbo to determine
whether the draw buffer has an associated DRI2 drawable, but that predicate
returns true for the global IncompleteFramebuffer which is used by surfaceless
contexts, which obviously has no DRI2 drawable. A null pointer check in
intel_viewport fixes it.

The i915 driver looks like it has the same problem (in
intel_invalidate_viewport), though I haven't tested it.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: Use param export count from si_llvm_export_vs in si_shader_vs

2015-07-06 Thread Marek Olšák

I thought your patch does something else, but after re-reading it, it
seems to do the right thing.

Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Mon, Jul 6, 2015 at 1:38 PM, Marek Olšák mar...@gmail.com wrote:
 Hi Michel,

 VS_EXPORT_COUNT shouldn't include POS exports (POSITION, MISC (psize,
 edgeflag, layer, viewport), and CLIPDIST/CULLDIST), it should only
 count PARAM exports. Setting higher VS_EXPORT_COUNT doesn't affect
 correctness but it allocates more parameter memory. CLIPDIST is
 special in that it's exported twice, first as POS and then as PARAM,
 the same will be done for CULLDIST in the future, so these two should
 indeed be removed from the list.

 Marek


 On Mon, Jul 6, 2015 at 10:30 AM, Michel Dänzer mic...@daenzer.net wrote:
 From: Michel Dänzer michel.daen...@amd.com

 This eliminates the error prone logic in si_shader_vs recalculating this
 value.

 It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being
 counted for VS exports, since they are passed to the fragment shader as
 varyings as well.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193
 Signed-off-by: Michel Dänzer michel.daen...@amd.com
 ---
  src/gallium/drivers/radeonsi/si_shader.c|  2 ++
  src/gallium/drivers/radeonsi/si_shader.h|  1 +
  src/gallium/drivers/radeonsi/si_state_shaders.c | 25 
 +++--
  3 files changed, 6 insertions(+), 22 deletions(-)

 diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
 b/src/gallium/drivers/radeonsi/si_shader.c
 index 4d97b58..753b238 100644
 --- a/src/gallium/drivers/radeonsi/si_shader.c
 +++ b/src/gallium/drivers/radeonsi/si_shader.c
 @@ -1218,6 +1218,8 @@ handle_semantic:
 }
 }

 +   shader-nr_param_exports = param_count;
 +
 /* We need to add the position output manually if it's missing. */
 if (!pos_args[0][0]) {
 pos_args[0][0] = lp_build_const_int32(base-gallivm, 0xf); 
 /* writemask */
 diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
 b/src/gallium/drivers/radeonsi/si_shader.h
 index b4339ae..8d309b4 100644
 --- a/src/gallium/drivers/radeonsi/si_shader.h
 +++ b/src/gallium/drivers/radeonsi/si_shader.h
 @@ -165,6 +165,7 @@ struct si_shader {

 booluses_instanceid;
 unsignednr_pos_exports;
 +   unsignednr_param_exports;
 boolis_gs_copy_shader;
 booldx10_clamp_mode; /* convert NaNs to 0 */
  };
 diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
 b/src/gallium/drivers/radeonsi/si_state_shaders.c
 index eef3baa..a842d9d 100644
 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
 +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
 @@ -148,10 +148,9 @@ static void si_shader_gs(struct si_shader *shader)

  static void si_shader_vs(struct si_shader *shader)
  {
 -   struct tgsi_shader_info *info = shader-selector-info;
 struct si_pm4_state *pm4;
 unsigned num_sgprs, num_user_sgprs;
 -   unsigned nparams, i, vgpr_comp_cnt;
 +   unsigned nparams, vgpr_comp_cnt;
 uint64_t va;
 unsigned window_space =

 shader-selector-info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION];
 @@ -180,26 +179,8 @@ static void si_shader_vs(struct si_shader *shader)
 }
 assert(num_sgprs = 104);

 -   /* Certain attributes (position, psize, etc.) don't count as params.
 -* VS is required to export at least one param and 
 r600_shader_from_tgsi()
 -* takes care of adding a dummy export.
 -*/
 -   for (nparams = 0, i = 0 ; i  info-num_outputs; i++) {
 -   switch (info-output_semantic_name[i]) {
 -   case TGSI_SEMANTIC_CLIPVERTEX:
 -   case TGSI_SEMANTIC_CLIPDIST:
 -   case TGSI_SEMANTIC_CULLDIST:
 -   case TGSI_SEMANTIC_POSITION:
 -   case TGSI_SEMANTIC_PSIZE:
 -   case TGSI_SEMANTIC_EDGEFLAG:
 -   break;
 -   default:
 -   nparams++;
 -   }
 -   }
 -   if (nparams  1)
 -   nparams = 1;
 -
 +   /* VS is required to export at least one param. */
 +   nparams = MAX2(shader-nr_param_exports, 1);
 si_pm4_set_reg(pm4, R_0286C4_SPI_VS_OUT_CONFIG,
S_0286C4_VS_EXPORT_COUNT(nparams - 1));

 --
 2.1.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 90466] arm: linker error ndefined reference to `nir_metadata_preserve'

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=90466

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Emil Velikov emil.l.veli...@gmail.com ---
The issue has been resolved with 10.6.0.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] mesa: remove duplicate array index validation and extra uniform location lookup

2015-07-06 Thread Martin Peres


On 03/07/15 15:35, Timothy Arceri wrote:

On Fri, 2015-07-03 at 15:08 +0300, Martin Peres wrote:

On 03/07/15 13:55, Timothy Arceri wrote:

This change removes multiple functions designed to validate an array
subscript and replaces them with a call to a single function.

The change also means that validation is now only done once and the index
is retrived at the same time, as a result the getUniformLocation code can
be merged saving an extra hash table lookup (and yet another validation
call).

Cc: Martin Peres martin.pe...@linux.intel.com
Cc: Tapani Pälli tapani.pa...@intel.com
---
   This clean-up was done to allow AoA program interface query support to
be
   implemented more easily. When applied to my latest AoA work all but one
   AoA program interface query subtest for the CTS now passes.

   No Piglit regressions.

   I also ran the following tests in the CTS without regression:

   - ES31-CTS.program_interface_query*
   - ES3
-CTS.gtf.GL3Tests.explicit_attrib_location.explicit_attrib_location_room
   - ES3-CTS.gtf.GL3Tests.draw_instanced.draw_instanced_max_vertex_attribs

   src/mesa/main/program_resource.c |  66 ++---
   src/mesa/main/shader_query.cpp   | 206 -
--
   src/mesa/main/shaderapi.h|   7 +-
   src/mesa/main/uniform_query.cpp  |  75 --
   src/mesa/main/uniforms.c |   7 +-
   src/mesa/main/uniforms.h |   4 -
   6 files changed, 100 insertions(+), 265 deletions(-)

diff --git a/src/mesa/main/program_resource.c
b/src/mesa/main/program_resource.c
index d857b84..0a306b8 100644
--- a/src/mesa/main/program_resource.c
+++ b/src/mesa/main/program_resource.c
@@ -173,32 +173,12 @@ is_xfb_marker(const char *str)
  return false;
   }
   
-/**

- * Checks if given name index is legal for GetProgramResourceIndex,
- * check is written to be compatible with GL_ARB_array_of_arrays.
- */
-static bool
-valid_program_resource_index_name(const GLchar *name)
-{
-   const char *array = strstr(name, [);
-   const char *close = strrchr(name, ']');
-
-   /* Not array, no need for the check. */
-   if (!array)
-  return true;
-
-   /* Last array index has to be zero. */
-   if (!close || *--close != '0')
-  return false;
-
-   return true;
-}
-
   GLuint GLAPIENTRY
   _mesa_GetProgramResourceIndex(GLuint program, GLenum programInterface,
 const GLchar *name)
   {
  GET_CURRENT_CONTEXT(ctx);
+   unsigned array_index;
  struct gl_program_resource *res;
  struct gl_shader_program *shProg =
 _mesa_lookup_shader_program_err(ctx, program,
@@ -221,12 +201,9 @@ _mesa_GetProgramResourceIndex(GLuint program, GLenum
programInterface,
  case GL_PROGRAM_OUTPUT:
  case GL_UNIFORM:
  case GL_TRANSFORM_FEEDBACK_VARYING:
-  /* Validate name syntax for array variables */
-  if (!valid_program_resource_index_name(name))
- return GL_INVALID_INDEX;
-  /* fall-through */
  case GL_UNIFORM_BLOCK:
-  res = _mesa_program_resource_find_name(shProg, programInterface,
name);
+  res = _mesa_program_resource_find_name(shProg, programInterface,
name,
+ array_index, true);
 if (!res)
return GL_INVALID_INDEX;
   
@@ -300,36 +277,6 @@ _mesa_GetProgramResourceiv(GLuint program, GLenum

programInterface,
   propCount, props, bufSize, length,
params);
   }
   
-/**

- * Function verifies syntax of given name for GetProgramResourceLocation
- * and GetProgramResourceLocationIndex for the following cases:
- *
- * array element portion of a string passed to
GetProgramResourceLocation
- * or GetProgramResourceLocationIndex must not have, a + sign, extra
- * leading zeroes, or whitespace.
- *
- * Check is written to be compatible with GL_ARB_array_of_arrays.
- */
-static bool
-invalid_array_element_syntax(const GLchar *name)
-{
-   char *first = strchr(name, '[');
-   char *last = strrchr(name, '[');
-
-   if (!first)
-  return false;
-
-   /* No '+' or ' ' allowed anywhere. */
-   if (strchr(first, '+') || strchr(first, ' '))
-  return true;
-
-   /* Check that last array index is 0. */
-   if (last[1] == '0'  last[2] != ']')
-  return true;
-
-   return false;
-}
-
   static struct gl_shader_program *
   lookup_linked_program(GLuint program, const char *caller)
   {
@@ -356,7 +303,7 @@ _mesa_GetProgramResourceLocation(GLuint program,
GLenum programInterface,
  struct gl_shader_program *shProg =
 lookup_linked_program(program, glGetProgramResourceLocation);
   
-   if (!shProg || !name || invalid_array_element_syntax(name))

+   if (!shProg || !name)
 return -1;
   
  /* Validate programInterface. */

@@ -383,7 +330,8 @@ _mesa_GetProgramResourceLocation(GLuint program,
GLenum programInterface,
 _mesa_lookup_enum_by_nr(programInterface), name);
  }
   
-   return _mesa_program_resource_location(shProg, programInterface,

Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once

2015-07-06 Thread Martin Peres


On 06/07/15 13:33, Chris Wilson wrote:

Move the query for the TIMESTAMP register from context init to the
screen, so that it is only queried once for all contexts.

On 32bit systems, some old kernels trigger a hw bug resulting in the
TIMESTAMP register being shifted and the low bits always zero. Detect
this by repeating the read a few times and check the register is
incrementing.


You do not do the latter. You only check for the low bits.

I guess the counter is supposed to be monotonically increasing and with 
a resolution of a few microseconds which would make this perfectly 
valid. Could you confirm and make sure to add this information in the 
commit message please?




Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
  src/mesa/drivers/dri/i965/intel_extensions.c |  6 +-
  src/mesa/drivers/dri/i965/intel_screen.c | 15 +++
  src/mesa/drivers/dri/i965/intel_screen.h |  2 ++
  3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 3423190..740ac81 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -282,8 +282,6 @@ intelInitExtensions(struct gl_context *ctx)
 }
  
 if (brw-gen = 6) {

-  uint64_t dummy;
-
ctx-Extensions.ARB_blend_func_extended =
   !driQueryOptionb(brw-optionCache, disable_blend_func_extended);
ctx-Extensions.ARB_conditional_render_inverted = true;
@@ -307,9 +305,7 @@ intelInitExtensions(struct gl_context *ctx)
ctx-Extensions.EXT_transform_feedback = true;
ctx-Extensions.OES_depth_texture_cube_map = true;
  
-  /* Test if the kernel has the ioctl. */

-  if (drm_intel_reg_read(brw-bufmgr, TIMESTAMP, dummy) == 0)
- ctx-Extensions.ARB_timer_query = true;
+  ctx-Extensions.ARB_timer_query = brw-intelScreen-hw_has_timestamp;
  
/* Only enable this in core profile because other parts of Mesa behave

 * slightly differently when the extension is enabled.
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index f9398d7..839a984 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1123,6 +1123,20 @@ intel_detect_swizzling(struct intel_screen *screen)
return true;
  }
  
+static bool

+intel_detect_timestamp(struct intel_screen *screen)
+{
+   uint64_t dummy = 0;
+   int loop = 10;
+
+   do {
+  if (drm_intel_reg_read(screen-bufmgr, TIMESTAMP, dummy))
+return false;
+   } while ((dummy  0x) == 0  --loop);
+
+   return loop  0;
+}
+
  /**
   * Return array of MSAA modes supported by the hardware. The array is
   * zero-terminated and sorted in decreasing order.
@@ -1378,6 +1392,7 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
 intelScreen-hw_must_use_separate_stencil = intelScreen-devinfo-gen = 7;
  
 intelScreen-hw_has_swizzling = intel_detect_swizzling(intelScreen);

+   intelScreen-hw_has_timestamp = intel_detect_timestamp(intelScreen);
  
 const char *force_msaa = getenv(INTEL_FORCE_MSAA);

 if (force_msaa) {
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
b/src/mesa/drivers/dri/i965/intel_screen.h
index 742b3d3..941e0fc 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -52,6 +52,8 @@ struct intel_screen
  
 bool hw_has_swizzling;
  
+   bool hw_has_timestamp;

+
 /**
  * Does the kernel support context reset notifications?
  */


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 69874] Automake throws a lot of [...] option 'subdir-objects' is disabled

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=69874

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Emil Velikov emil.l.veli...@gmail.com ---
The final piece has been resolved and subdir-object has been enabled globally
as per
commit 404a90b82786080564fe32716f83ce055b9a934f
Author: Matt Turner matts...@gmail.com
Date:   Wed Jun 10 16:30:56 2015 -0700

mesa: Enable subdir-objects globally.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 89131] [Bisected] Graphical corruption in Weston, shows old framebuffer pieces

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=89131

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Emil Velikov emil.l.veli...@gmail.com ---
The following patch is part of mesa since 10.5.7. Feel free to reopen if it
doesn't resolve the issue.

commit 25e9ae2b79f32631e7255807a242e5fc4e39984c
Author: Marek Olšák marek.ol...@amd.com
Date:   Tue May 26 19:32:36 2015 +0200

st/dri: fix postprocessing crash when there's no depth buffer

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: Use param export count from si_llvm_export_vs in si_shader_vs

2015-07-06 Thread Marek Olšák

Hi Michel,

VS_EXPORT_COUNT shouldn't include POS exports (POSITION, MISC (psize,
edgeflag, layer, viewport), and CLIPDIST/CULLDIST), it should only
count PARAM exports. Setting higher VS_EXPORT_COUNT doesn't affect
correctness but it allocates more parameter memory. CLIPDIST is
special in that it's exported twice, first as POS and then as PARAM,
the same will be done for CULLDIST in the future, so these two should
indeed be removed from the list.

Marek


On Mon, Jul 6, 2015 at 10:30 AM, Michel Dänzer mic...@daenzer.net wrote:
 From: Michel Dänzer michel.daen...@amd.com

 This eliminates the error prone logic in si_shader_vs recalculating this
 value.

 It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being
 counted for VS exports, since they are passed to the fragment shader as
 varyings as well.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193
 Signed-off-by: Michel Dänzer michel.daen...@amd.com
 ---
  src/gallium/drivers/radeonsi/si_shader.c|  2 ++
  src/gallium/drivers/radeonsi/si_shader.h|  1 +
  src/gallium/drivers/radeonsi/si_state_shaders.c | 25 
 +++--
  3 files changed, 6 insertions(+), 22 deletions(-)

 diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
 b/src/gallium/drivers/radeonsi/si_shader.c
 index 4d97b58..753b238 100644
 --- a/src/gallium/drivers/radeonsi/si_shader.c
 +++ b/src/gallium/drivers/radeonsi/si_shader.c
 @@ -1218,6 +1218,8 @@ handle_semantic:
 }
 }

 +   shader-nr_param_exports = param_count;
 +
 /* We need to add the position output manually if it's missing. */
 if (!pos_args[0][0]) {
 pos_args[0][0] = lp_build_const_int32(base-gallivm, 0xf); /* 
 writemask */
 diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
 b/src/gallium/drivers/radeonsi/si_shader.h
 index b4339ae..8d309b4 100644
 --- a/src/gallium/drivers/radeonsi/si_shader.h
 +++ b/src/gallium/drivers/radeonsi/si_shader.h
 @@ -165,6 +165,7 @@ struct si_shader {

 booluses_instanceid;
 unsignednr_pos_exports;
 +   unsignednr_param_exports;
 boolis_gs_copy_shader;
 booldx10_clamp_mode; /* convert NaNs to 0 */
  };
 diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
 b/src/gallium/drivers/radeonsi/si_state_shaders.c
 index eef3baa..a842d9d 100644
 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
 +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
 @@ -148,10 +148,9 @@ static void si_shader_gs(struct si_shader *shader)

  static void si_shader_vs(struct si_shader *shader)
  {
 -   struct tgsi_shader_info *info = shader-selector-info;
 struct si_pm4_state *pm4;
 unsigned num_sgprs, num_user_sgprs;
 -   unsigned nparams, i, vgpr_comp_cnt;
 +   unsigned nparams, vgpr_comp_cnt;
 uint64_t va;
 unsigned window_space =

 shader-selector-info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION];
 @@ -180,26 +179,8 @@ static void si_shader_vs(struct si_shader *shader)
 }
 assert(num_sgprs = 104);

 -   /* Certain attributes (position, psize, etc.) don't count as params.
 -* VS is required to export at least one param and 
 r600_shader_from_tgsi()
 -* takes care of adding a dummy export.
 -*/
 -   for (nparams = 0, i = 0 ; i  info-num_outputs; i++) {
 -   switch (info-output_semantic_name[i]) {
 -   case TGSI_SEMANTIC_CLIPVERTEX:
 -   case TGSI_SEMANTIC_CLIPDIST:
 -   case TGSI_SEMANTIC_CULLDIST:
 -   case TGSI_SEMANTIC_POSITION:
 -   case TGSI_SEMANTIC_PSIZE:
 -   case TGSI_SEMANTIC_EDGEFLAG:
 -   break;
 -   default:
 -   nparams++;
 -   }
 -   }
 -   if (nparams  1)
 -   nparams = 1;
 -
 +   /* VS is required to export at least one param. */
 +   nparams = MAX2(shader-nr_param_exports, 1);
 si_pm4_set_reg(pm4, R_0286C4_SPI_VS_OUT_CONFIG,
S_0286C4_VS_EXPORT_COUNT(nparams - 1));

 --
 2.1.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 90905] mesa: Finish subdir-objects transition

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=90905

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Emil Velikov emil.l.veli...@gmail.com ---
Both issues (the subdir-objects and -Wl,--allow-multiple-definition) should be
sorted now. Thank for the help gents.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 89068] glTexImage2D regression by texstore_rgba switch to _mesa_format_convert

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=89068

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Emil Velikov emil.l.veli...@gmail.com ---
The patch referenced is present in mesa 10.5.0 onwards.
Thanks for the report and confirmation Brad.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 45348] [swrast] piglit fbo-drawbuffers-arbfp regression

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=45348

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Emil Velikov emil.l.veli...@gmail.com ---
Hi Vinson,

Brian has pushed a fix shortly after your last test. It's available with 10.5.0
and later. Feel free to give it a test and reopen if the issue persists.

commit 89c96afe3c0acf8f2fccaf02da02945afe8ba5f3
Author: Brian Paul bri...@vmware.com
Date:   Mon Feb 16 11:23:06 2015 -0700

swrast: fix multiple color buffer writing

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once

2015-07-06 Thread Chris Wilson

On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote:
 On 06/07/15 13:33, Chris Wilson wrote:
 Move the query for the TIMESTAMP register from context init to the
 screen, so that it is only queried once for all contexts.
 
 On 32bit systems, some old kernels trigger a hw bug resulting in the
 TIMESTAMP register being shifted and the low bits always zero. Detect
 this by repeating the read a few times and check the register is
 incrementing.
 
 You do not do the latter. You only check for the low bits.
 
 I guess the counter is supposed to be monotonically increasing and
 with a resolution of a few microseconds which would make this
 perfectly valid. Could you confirm and make sure to add this
 information in the commit message please?

The counter should increment every 80ns. What's misleading in what I
wrote? It describes the hw bug and how to detect it.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once

2015-07-06 Thread Martin Peres




On 06/07/15 16:15, Martin Peres wrote:

On 06/07/15 16:13, Chris Wilson wrote:

On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote:

On 06/07/15 13:33, Chris Wilson wrote:

Move the query for the TIMESTAMP register from context init to the
screen, so that it is only queried once for all contexts.

On 32bit systems, some old kernels trigger a hw bug resulting in the
TIMESTAMP register being shifted and the low bits always zero. Detect
this by repeating the read a few times and check the register is
incrementing.

You do not do the latter. You only check for the low bits.

I guess the counter is supposed to be monotonically increasing and
with a resolution of a few microseconds which would make this
perfectly valid. Could you confirm and make sure to add this
information in the commit message please?

The counter should increment every 80ns. What's misleading in what I
wrote? It describes the hw bug and how to detect it.


Well, it is not misleading, it just lacks this information.

If it incremented every seconds, the patch would be stupid because the 
timestamp could be at 0 and polling 10 times at a few us of interval 
would always yield the same result. That's all :)


Oh, forgot to say: With this information added in the commit message and 
the commit message duplicated as a comment in intel_detect_timestamp(), 
the patch is:


Reviewed-by: Martin Peres martin.pe...@linux.intel.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts

2015-07-06 Thread Martin Peres




On 06/07/15 13:33, Chris Wilson wrote:

Since the workaround bo is used strictly as a write-only buffer, we need
only allocate one per screen and use the same one from all contexts.

(The caveat here is during extension initialisation, where we write into
and read back register values from the buffer, but that is performed only
once for the first context - and baring synchronisation issues should not
be a problem. Safer would be to move that also to the screen.)

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
  src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++---
  src/mesa/drivers/dri/i965/intel_screen.c | 4 
  src/mesa/drivers/dri/i965/intel_screen.h | 1 +
  3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index 7ee3cb6..05e14cd 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw,
  * the gen6 workaround because it involves actually writing to
  * the buffer, and the kernel doesn't let us write to the batch.
  */
-   brw-workaround_bo = drm_intel_bo_alloc(brw-bufmgr,
-   pipe_control workaround,
-   4096, 4096);
+   brw-workaround_bo = brw-intelScreen-workaround_bo;
 if (brw-workaround_bo == NULL)
return -ENOMEM;
  
+   drm_intel_bo_reference(brw-workaround_bo);

+
 brw-pipe_controls_since_last_cs_stall = 0;
  
 return 0;

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 839a984..cd8e6eb 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv)
  {
 struct intel_screen *intelScreen = sPriv-driverPrivate;
  
+   drm_intel_bo_unreference(intelScreen-workaround_bo);


I guess this is OK because after the last context got destroyed the 
screen would get destroyed?


In any case, I don't get the point of reference counting when we 
obviously only need to free the buffer when the screen gets freed.

 dri_bufmgr_destroy(intelScreen-bufmgr);
 driDestroyOptionInfo(intelScreen-optionCache);
  
@@ -1096,6 +1097,9 @@ intel_init_bufmgr(struct intel_screen *intelScreen)

return false;
 }
  
+   intelScreen-workaround_bo =

+  drm_intel_bo_alloc(intelScreen-bufmgr, pipe_control w/a, 4096, 4096);
+
 return true;
  }
  
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h

index 941e0fc..e55fddb 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -60,6 +60,7 @@ struct intel_screen
 bool has_context_reset_notification;
  
 dri_bufmgr *bufmgr;

+   drm_intel_bo *workaround_bo;
  
 /**

  * A unique ID for shader programs.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 79706] [TRACKER] Mesa regression tracker

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=79706
Bug 79706 depends on bug 45348, which changed state.

Bug 45348 Summary: [swrast] piglit fbo-drawbuffers-arbfp regression
https://bugs.freedesktop.org/show_bug.cgi?id=45348

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/18] i965: Move pipecontrol workaround bo to brw_pipe_control

2015-07-06 Thread Martin Peres


On 06/07/15 13:33, Chris Wilson wrote:

With the exception of gen8, the sole user of the workaround bo are for
emitting pipe controls. Move it out of the purview of the batchbuffer
and into the pipecontrol.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk


Reviewed-by: Martin Peres martin.pe...@linux.intel.com

---
  src/mesa/drivers/dri/i965/brw_context.c   |  7 +
  src/mesa/drivers/dri/i965/brw_context.h   | 12 +---
  src/mesa/drivers/dri/i965/brw_pipe_control.c  | 40 +++
  src/mesa/drivers/dri/i965/gen8_depth_state.c  |  2 +-
  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 12 
  src/mesa/drivers/dri/i965/intel_extensions.c  | 28 +--
  6 files changed, 64 insertions(+), 37 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 4b51fe5..8150b94 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -819,6 +819,12 @@ brwCreateContext(gl_api api,
}
 }
  
+   if (brw_init_pipe_control(brw, devinfo)) {

+  *dri_ctx_error = __DRI_CTX_ERROR_NO_MEMORY;
+  intelDestroyContext(driContextPriv);
+  return false;
+   }
+
 brw_init_state(brw);
  
 intelInitExtensions(ctx);

@@ -942,6 +948,7 @@ intelDestroyContext(__DRIcontext * driContextPriv)
 if (ctx-swrast_context)
_swrast_DestroyContext(brw-ctx);
  
+   brw_fini_pipe_control(brw);

 intel_batchbuffer_free(brw);
  
 drm_intel_bo_unreference(brw-throttle_batch[1]);

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 3553f6e..db0fc48 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -868,8 +868,6 @@ struct intel_batchbuffer {
 drm_intel_bo *bo;
 /** Last BO submitted to the hardware.  Used for glFinish(). */
 drm_intel_bo *last_bo;
-   /** BO for post-sync nonzero writes for gen6 workaround. */
-   drm_intel_bo *workaround_bo;
  
 uint16_t emit, total;

 uint16_t used, reserved_space;
@@ -881,8 +879,6 @@ struct intel_batchbuffer {
 enum brw_gpu_ring ring;
 bool needs_sol_reset;
  
-   uint8_t pipe_controls_since_last_cs_stall;

-
 struct {
uint16_t used;
int reloc_count;
@@ -1034,6 +1030,10 @@ struct brw_context
  
 drm_intel_context *hw_ctx;
  
+   /** BO for post-sync nonzero writes for gen6 workaround. */

+   drm_intel_bo *workaround_bo;
+   uint8_t pipe_controls_since_last_cs_stall;
+
 /**
  * Set of drm_intel_bo * that have been rendered to within this batchbuffer
  * and would need flushing before being used from another cache domain that
@@ -2000,6 +2000,10 @@ gen9_use_linear_1d_layout(const struct brw_context *brw,
const struct intel_mipmap_tree *mt);
  
  /* brw_pipe_control.c */

+int brw_init_pipe_control(struct brw_context *brw,
+ const struct brw_device_info *info);
+void brw_fini_pipe_control(struct brw_context *brw);
+
  void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
   drm_intel_bo *bo, uint32_t offset,
diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index b4c86b9..7ee3cb6 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -72,13 +72,13 @@ gen7_cs_stall_every_four_pipe_controls(struct brw_context 
*brw, uint32_t flags)
 if (brw-gen == 7  !brw-is_haswell) {
if (flags  PIPE_CONTROL_CS_STALL) {
   /* If we're doing a CS stall, reset the counter and carry on. */
- brw-batch.pipe_controls_since_last_cs_stall = 0;
+ brw-pipe_controls_since_last_cs_stall = 0;
   return 0;
}
  
/* If this is the fourth pipe control without a CS stall, do one now. */

-  if (++brw-batch.pipe_controls_since_last_cs_stall == 4) {
- brw-batch.pipe_controls_since_last_cs_stall = 0;
+  if (++brw-pipe_controls_since_last_cs_stall == 4) {
+ brw-pipe_controls_since_last_cs_stall = 0;
   return PIPE_CONTROL_CS_STALL;
}
 }
@@ -213,7 +213,7 @@ gen7_emit_vs_workaround_flush(struct brw_context *brw)
 brw_emit_pipe_control_write(brw,
 PIPE_CONTROL_WRITE_IMMEDIATE
 | PIPE_CONTROL_DEPTH_STALL,
-   brw-batch.workaround_bo, 0,
+   brw-workaround_bo, 0,
 0, 0);
  }
  
@@ -227,7 +227,7 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)

 brw_emit_pipe_control_write(brw,
 PIPE_CONTROL_CS_STALL
 | PIPE_CONTROL_WRITE_IMMEDIATE,
-

Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once

2015-07-06 Thread Martin Peres


On 06/07/15 16:13, Chris Wilson wrote:

On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote:

On 06/07/15 13:33, Chris Wilson wrote:

Move the query for the TIMESTAMP register from context init to the
screen, so that it is only queried once for all contexts.

On 32bit systems, some old kernels trigger a hw bug resulting in the
TIMESTAMP register being shifted and the low bits always zero. Detect
this by repeating the read a few times and check the register is
incrementing.

You do not do the latter. You only check for the low bits.

I guess the counter is supposed to be monotonically increasing and
with a resolution of a few microseconds which would make this
perfectly valid. Could you confirm and make sure to add this
information in the commit message please?

The counter should increment every 80ns. What's misleading in what I
wrote? It describes the hw bug and how to detect it.


Well, it is not misleading, it just lacks this information.

If it incremented every seconds, the patch would be stupid because the 
timestamp could be at 0 and polling 10 times at a few us of interval 
would always yield the same result. That's all :)


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] gallivm: fix lp_build_compare_ext

2015-07-06 Thread Jose Fonseca


On 04/07/15 07:15, Vinson Lee wrote:

On Fri, Jul 3, 2015 at 6:05 PM,  srol...@vmware.com wrote:

From: Roland Scheidegger srol...@vmware.com

The expansion should always be to the same width as the input arguments
no matter what, since these functions should work with any bit width of
the arguments (the sext is a no-op on any sane simd architecture).
Thus, fix the caller expecting differently.

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=91222 (not tested
otherwise)
---
  src/gallium/auxiliary/gallivm/lp_bld_logic.c   | 2 +-
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 3 +++
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_logic.c 
b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
index f724cfa..80b53e5 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_logic.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
@@ -81,7 +81,7 @@ lp_build_compare_ext(struct gallivm_state *gallivm,
   boolean ordered)
  {
 LLVMBuilderRef builder = gallivm-builder;
-   LLVMTypeRef int_vec_type = lp_build_int_vec_type(gallivm, 
lp_type_int_vec(32, 32 * type.length));
+   LLVMTypeRef int_vec_type = lp_build_int_vec_type(gallivm, type);
 LLVMValueRef zeros = LLVMConstNull(int_vec_type);
 LLVMValueRef ones = LLVMConstAllOnes(int_vec_type);
 LLVMValueRef cond;
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index 1f2af85..0ad78b0 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -1961,8 +1961,11 @@ dset_emit_cpu(
 struct lp_build_emit_data * emit_data,
 unsigned pipe_func)
  {
+   LLVMBuilderRef builder = bld_base-base.gallivm-builder;
 LLVMValueRef cond = lp_build_cmp(bld_base-dbl_bld, pipe_func,
  emit_data-args[0], emit_data-args[1]);
+   /* arguments were 64 bit but store as 32 bit */
+   cond = LLVMBuildTrunc(builder, cond, bld_base-int_bld.int_vec_type, );
 emit_data-output[emit_data-chan] = cond;
  }

--
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev



Tested-by: Vinson Lee v...@freedesktop.org
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev



Reviewed-by: Jose Fonseca jfons...@vmware.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

2015-07-06 Thread Chris Wilson

Just skipping to interesting comments for the moment.

On Mon, Jul 06, 2015 at 12:34:00PM -0700, Kenneth Graunke wrote:
 While I really like this idea in principle, the current patch is rather
 huge, making it difficult to review; bisectability would also suffer.
 Would it be possible to split it up into several smaller patches?

First time I tried I ended up with a series of patches of equal churn
and I wasn't happy with the results. Honestly, I find mechanical noise
in patches easy to tune out, you quickly recognise the pattern and
deviations should stand out.

Some of the churn I think is important as it highlight the relative
emphasis on different coding patterns. In particular, how the relocation
must always emit the address in the batch, so in quite a few functions
the code had to be rearranged in order to make that happen
 
 One idea I had for how you might accomplish that is to introduce the
 brw_bo struct and related functions, but make it simply contain a
 drm_intel_bo * field, and fall back to the existing libdrm-based code.

Sure, what I had in mind was a changeover to a struct brw_bo that just
wrapped drm_intel_bo for typesafefty (so the compiler was there to catch
mistakes and missing pieces). I felt the patch was still about 500 lines
of mechcanical changes, 1000 lines removing the intel_batchbuffer.c and
2000 lines adding brw_bo (it has grown somewhat with the comments!). The
api to review was in one place, and the bits that required fine changes
(mostly intel_bufferobj.c) were equally obvious.

But what you really need is a slow progression in feature development of
brw_batch. Splitting the mechanical changes off with a touch more churn,
and even converting functions over to new API in stages is relatively
trivial compared to building up the feature set required to do
batchbuffer construction from scratch.

- relocations come first, as request tracking cannot happen with
integration into the relocations.
- execbuf
- request tracking
- mmap functions
- exporting busyness

Honestly though the design has never existed in a piecemeal fashion, it
has always been requests first and that drove the infrastructure
required to enable the request tracking.

Anyway splitting out the mechanical changes is worth it if it gets more
eyes onto brw_batch.c itself and some of its knock-on effects.

 That way, you could put all the mechanical renaming and refactoring
 across the entire code-base in one patch that should have no functional
 change.  You could then replace the contents or brw_bo and those
 functions with your new improved batch manager.
 
 We've talked about moving away from libdrm for a while, so having our
 own functions and structures makes a lot of sense.
 
 I'm also curious about the performance on non-LLC platforms (CHV or
 BYT)?  You've dropped a number of non-LLC paths - which is probably
 fine, honestly...they were always of dubious value - but it'd be nice
 to know the record the effects of this change on non-LLC in the commit
 message.

Ah, I didn't actually remove non-LLC paths, I added equivalent detiled
CPU access for non-LLC (strictly non cache coherent bo, incl scanouts) as
for LLC. So I guess you mean the few places that lost the brw-has_llc
differentiation?
 
  @@ -279,11 +254,6 @@ retry:
  brw-ctx.NewDriverState = ~0ull;
  brw-no_depth_or_stencil = false;
  brw-ib.type = -1;
  -
  -   /* Flush the sampler cache so any texturing from the destination is
  -* coherent.
  -*/
  -   brw_emit_mi_flush(brw);
 
 It looks like you moved this flush earlier, so it's in the section of
 code that retries?  That's probably reasonable.  Seems worth keeping the
 comment, and this could be done as a separate patch...

True. The reason why it has to be earlier is that I am much stricter on
having all BEGIN_BATCH/ADVANCE_BATCH chunks being inside a
brw_batch_begin/brw_batch_end - since brw_emit_mi_flush() itself doesn't
start a new brw_batch_begin section, it had to go in blorp's. The comment
was removed because we now have accurate dirty texture cache tracking and
that itself is not the reason wy flush has to exist. (The flush is for a
change in hiz mode iirc, or some similar change in GPU state that
requires a flush between primitives.)
 
 [snip]
  +static void
  +load_sized_register_mem(struct brw_context *brw,
  +uint32_t reg,
  +struct brw_bo *bo,
  +uint32_t read_domains, uint32_t write_domain,
  +uint32_t offset,
  +int size)
  +{
  +   int i;
  +
  +   /* MI_LOAD_REGISTER_MEM only exists on Gen7+. */
  +   assert(brw-gen = 7);
  +
  +   if (brw-gen = 8) {
  +  BEGIN_BATCH(4 * size);
  +  for (i = 0; i  size; i++) {
  + OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (4 - 2));
  + OUT_BATCH(reg + i * 4);
  + OUT_RELOC64(bo, read_domains, write_domain, offset + i * 4);
  +  }
  +  ADVANCE_BATCH();
  +   } else {
  +

Re: [Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for attribute((warn_unused_result)).

2015-07-06 Thread Chris Wilson

On Mon, Jul 06, 2015 at 11:18:00AM -0700, Kenneth Graunke wrote:
 In the kernel, this is called __must_check; all our attribute macros in
 Mesa appear to be uppercase, so I went with that.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 Cc: ch...@chris-wilson.co.uk
 Cc: matts...@gmail.com
 ---
  configure.ac  | 1 +
  src/util/macros.h | 6 ++
  2 files changed, 7 insertions(+)

 I noticed Chris wants to use this in one of his patches, so I figured I'd
 throw together a patch to do this a bit more cleanly.

That would suit me very much. I guess I need to learn about
AX_GCC_FUNC_ATTRIBUTE!

Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 11:33:20 AM Chris Wilson wrote:
 The vertices require a large chunk of memory, currently allocated during
 context creation. However, this memory is not required until use so we
 can defer the allocation until the first swrast_Wakeup().
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/swrast_setup/ss_context.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)
 
 diff --git a/src/mesa/swrast_setup/ss_context.c 
 b/src/mesa/swrast_setup/ss_context.c
 index 74b1da3..028eccb 100644
 --- a/src/mesa/swrast_setup/ss_context.c
 +++ b/src/mesa/swrast_setup/ss_context.c
 @@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx )
 swsetup-NewState = ~0;
 _swsetup_trifuncs_init( ctx );
  
 -   _tnl_init_vertices( ctx, ctx-Const.MaxArrayLockSize + 12, 
 -sizeof(SWvertex) );
 -
 -
 return GL_TRUE;
  }
  
 @@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx )
 TNLcontext *tnl = TNL_CONTEXT(ctx);
 SScontext *swsetup = SWSETUP_CONTEXT(ctx);
  
 +   if (!(GET_VERTEX_STATE(ctx))-max_vertex_size)
 +  _tnl_init_vertices(ctx,
 +  ctx-Const.MaxArrayLockSize + 12,
 +  sizeof(SWvertex));
 +
 tnl-Driver.Render.Start = _swsetup_RenderStart;
 tnl-Driver.Render.Finish = _swsetup_RenderFinish;
 tnl-Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive;
 

Looks reasonable - this saves 2.59MB of memory in Piglit's glsl-cos
test.  (It's worth noting that in the commit message.)

I wonder if we can do better though by avoiding most of the
swrast context entirely...

Reviewed-by: Kenneth Graunke kenn...@whitecape.org


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_FS_PROG_DATA in gen6_renderbuffer_surfaces.

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 12:18:18 PM Matt Turner wrote:
 On Mon, Jul 6, 2015 at 9:55 AM, Kenneth Graunke kenn...@whitecape.org wrote:
  It looks like this was forgotten in commit 3c9dc2d31b80fc73bffa1f40a
  (i965: Make a brw_stage_prog_data for storing the SURF_INDEX
  information.)   In other words, it's been missing since we moved to
  dynamic binding table slot assignments.
 
 Author: Eric Anholt e...@anholt.net
 AuthorDate: Wed Oct 2 14:07:40 2013 -0700
 CommitDate: Tue Oct 15 10:18:42 2013 -0700
 
 Dang.
 
 How did you find this?

After reading Eero's latest performance analysis observations, I decided
to try marking render targets uncached (MOCS = UC) when blending and
logic ops (i.e. things that cause render target reads) are disabled.

This necessitated adding _NEW_COLOR.  Then I realized
BRW_NEW_FS_PROG_DATA was just plain missing.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nvc0: fix wrong use of BLIT_SRC_Y_INT for 2D texture copy

2015-07-06 Thread Ilia Mirkin

Reviewed-by: Ilia Mirkin imir...@alum.mit.edu

I think this only gets called for copy_region, which in turn probably
can't be reached by MS surfaces. Not sure, but this is definitely
right anyways.

On Mon, Jul 6, 2015 at 4:06 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:
 According to nv50, this should be src-ms_y instead of src-ms_x. This
 code is here since 2012, so it's probably a typo error which has never
 been detected since a long time. I didn't do a full piglit run to check
 if it fixes some other weird issues.

 Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
 ---
  src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c 
 b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
 index a820de7..53cd8cd 100644
 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
 +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
 @@ -189,7 +189,7 @@ nvc0_2d_texture_do_copy(struct nouveau_pushbuf *push,
 PUSH_DATA (push, 0);
 PUSH_DATA (push, sx  src-ms_x);
 PUSH_DATA (push, 0);
 -   PUSH_DATA (push, sy  src-ms_x);
 +   PUSH_DATA (push, sy  src-ms_y);

 return 0;
  }
 --
 2.4.5

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use

2015-07-06 Thread Brian Paul


On 07/06/2015 01:11 PM, Matt Turner wrote:

On Mon, Jul 6, 2015 at 3:33 AM, Chris Wilson ch...@chris-wilson.co.uk wrote:

The vertices require a large chunk of memory, currently allocated during
context creation. However, this memory is not required until use so we
can defer the allocation until the first swrast_Wakeup().


Makes sense to me. Someone like Brian like to take a look.


Looks OK to me, assuming you've done some piglit testing with swrast.

-Brian





Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Kenneth Graunke kenn...@whitecape.org
---
  src/mesa/swrast_setup/ss_context.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/mesa/swrast_setup/ss_context.c 
b/src/mesa/swrast_setup/ss_context.c
index 74b1da3..028eccb 100644
--- a/src/mesa/swrast_setup/ss_context.c
+++ b/src/mesa/swrast_setup/ss_context.c
@@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx )
 swsetup-NewState = ~0;
 _swsetup_trifuncs_init( ctx );

-   _tnl_init_vertices( ctx, ctx-Const.MaxArrayLockSize + 12,
-  sizeof(SWvertex) );
-
-
 return GL_TRUE;
  }

@@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx )
 TNLcontext *tnl = TNL_CONTEXT(ctx);
 SScontext *swsetup = SWSETUP_CONTEXT(ctx);

+   if (!(GET_VERTEX_STATE(ctx))-max_vertex_size)
+  _tnl_init_vertices(ctx,
+ctx-Const.MaxArrayLockSize + 12,
+sizeof(SWvertex));


Replace the tabs on these two lines with spaces.


+
 tnl-Driver.Render.Start = _swsetup_RenderStart;
 tnl-Driver.Render.Finish = _swsetup_RenderFinish;
 tnl-Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive;
--
2.1.4


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for attribute((warn_unused_result)).

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 09:27:12 PM Chris Wilson wrote:
 On Mon, Jul 06, 2015 at 11:18:00AM -0700, Kenneth Graunke wrote:
  In the kernel, this is called __must_check; all our attribute macros in
  Mesa appear to be uppercase, so I went with that.

  Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  Cc: ch...@chris-wilson.co.uk
  Cc: matts...@gmail.com
  ---
   configure.ac  | 1 +
   src/util/macros.h | 6 ++
   2 files changed, 7 insertions(+)

  I noticed Chris wants to use this in one of his patches, so I figured I'd
  throw together a patch to do this a bit more cleanly.

 That would suit me very much. I guess I need to learn about
 AX_GCC_FUNC_ATTRIBUTE!

 Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk
 -Chris

It's pretty handy!  Matt found it in the autoconf archive:

http://www.gnu.org/software/autoconf-archive/ax_gcc_func_attribute.html

We just dropped it in our m4/ directory.

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts

2015-07-06 Thread Chris Wilson

On Mon, Jul 06, 2015 at 04:29:49PM +0300, Martin Peres wrote:
 
 
 On 06/07/15 13:33, Chris Wilson wrote:
 Since the workaround bo is used strictly as a write-only buffer, we need
 only allocate one per screen and use the same one from all contexts.
 
 (The caveat here is during extension initialisation, where we write into
 and read back register values from the buffer, but that is performed only
 once for the first context - and baring synchronisation issues should not
 be a problem. Safer would be to move that also to the screen.)
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
   src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++---
   src/mesa/drivers/dri/i965/intel_screen.c | 4 
   src/mesa/drivers/dri/i965/intel_screen.h | 1 +
   3 files changed, 8 insertions(+), 3 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
 b/src/mesa/drivers/dri/i965/brw_pipe_control.c
 index 7ee3cb6..05e14cd 100644
 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
 +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
 @@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw,
   * the gen6 workaround because it involves actually writing to
   * the buffer, and the kernel doesn't let us write to the batch.
   */
 -   brw-workaround_bo = drm_intel_bo_alloc(brw-bufmgr,
 -   pipe_control workaround,
 -   4096, 4096);
 +   brw-workaround_bo = brw-intelScreen-workaround_bo;
  if (brw-workaround_bo == NULL)
 return -ENOMEM;
 +   drm_intel_bo_reference(brw-workaround_bo);
 +
  brw-pipe_controls_since_last_cs_stall = 0;
  return 0;
 diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
 b/src/mesa/drivers/dri/i965/intel_screen.c
 index 839a984..cd8e6eb 100644
 --- a/src/mesa/drivers/dri/i965/intel_screen.c
 +++ b/src/mesa/drivers/dri/i965/intel_screen.c
 @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv)
   {
  struct intel_screen *intelScreen = sPriv-driverPrivate;
 +   drm_intel_bo_unreference(intelScreen-workaround_bo);
 
 I guess this is OK because after the last context got destroyed the
 screen would get destroyed?

Sure, but not taking a reference for each would get a little strange and
you would still need a comment to explain why not following the standard
idiom is safe. Then when we introduce context local brw_bo, the
conversion from reference to local is much cleaner with the existing
ref.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

2015-07-06 Thread Ilia Mirkin

That's right. Except really what might have happend was

occl query;
write X;
more drawing;
write X+1;

and then on the CPU, you see X+1. So the tests are always for = X.
And if you have more than 2^32 submits, you cry, because I'm *sure*
that nothing implements wraparound properly :)

On Mon, Jul 6, 2015 at 1:45 PM, Vyacheslav Gonakhchyan
ytri...@gmail.com wrote:
 Ilia, thanks a lot for the info.

 So basically if I submit to GPU's command stream:
 perform occlusion query,
 write X to Y.
 I know that query is completed when after reading Y address I get X.

 Regards,
 Vyacheslav

 On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin imir...@alum.mit.edu wrote:

 I'm only really familiar with nouveau, but I think all GPU hardware
 works in roughly the same way. Basically you have some way of
 inserting write X to address Y into the command stream (aka a
 fence), after which you insert write X+1 to address Y and so on.
 If you want the CPU to wait on a given fence, you just do while
 (*address  x);. If you have multiple GPU processing queues, you can
 usually also insert a stall this queue until the value at address Y
 is at least X command into the command stream.

 DRM uses implicit fences, so it knows which BOs are used for
 particular commands. So the flow goes something like submit bunch of
 commands; submit fence write and attach that fence id to the BOs in
 the previous bunch of comands. Then to wait for a bo to become ready,
 you just wait until the GPU writes the appropriate number to memory
 address Y (from above).

 The mesa drivers can sometimes use clever tricks that avoid this
 sync'ing because it knows exactly how it emits the commands and
 perhaps it waits on something related earlier whereby it knows the
 other thing will be ready. No idea if that's the case here.

 Hope this helps,

   -ilia


 On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan
 ytri...@gmail.com wrote:
  Ilia, thanks for the gallium link.
  Do you know any links to high level info with broad strokes about how
  this
  sync works? Frankly I do not know driver terminology and wanted to know
  more
  about how this sync is performed for my research. I'm using mesa as a
  reference because it has open implementation code. Occlusion query
  functionality probably waits for z-buffer to become ready. Problem is
  that
  usual synchronization techniques do not apply here. I'm thinking that
  driver
  code gets notifications about state change. I want to know what kind of
  notifications are available? Can query be performed in parallel with
  another
  frame being processed or does it need complete GPU pipeline flush?
 
  Thanks,
  Vyacheslav
 
  On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu
  wrote:
 
  On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
  ytri...@gmail.com wrote:
   Hi, everyone.
  
   Trying to understand method radeonQueryGetResult (more broadly
   GPU-CPU
   sync).
  
   static void radeonQueryGetResult(struct gl_context *ctx, struct
   gl_query_object *q)
   {
   struct radeon_query_object *query = (struct radeon_query_object
   *)q;
   uint32_t *result;
   int i;
  
   radeon_print(RADEON_STATE, RADEON_VERBOSE,
   %s: query id %d, result %d\n,
   __func__, query-Base.Id, (int) query-Base.Result);
  
   radeon_bo_map(query-bo, GL_FALSE);
   result = query-bo-ptr;
  
   query-Base.Result = 0;
   for (i = 0; i  query-curr_offset/sizeof(uint32_t); ++i) {
   query-Base.Result += LE32_TO_CPU(result[i]);
   radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n,
   i,
   LE32_TO_CPU(result[i]));
   }
  
   radeon_bo_unmap(query-bo);
   }
  
   I don't know which part is responsible for blocking behavior (waiting
   for
   response from GPU). I suspect that radeon_bo_map does this magic.
   Can someone point in the right direction?
 
  The radeon_bo_map defined in
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
  However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
  references the radeon_bo_map in libdrm, which does not appear to wait.
 
  FWIW for nouveau, nouveau_bo_map will also implicitly do a
  nouveau_bo_wait, but that does not appear to be the case for radeon.
 
  Cheers,
 
-ilia
 
 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] nvc0: fix wrong use of BLIT_SRC_Y_INT for 2D texture copy

2015-07-06 Thread Samuel Pitoiset

According to nv50, this should be src-ms_y instead of src-ms_x. This
code is here since 2012, so it's probably a typo error which has never
been detected since a long time. I didn't do a full piglit run to check
if it fixes some other weird issues.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
 src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
index a820de7..53cd8cd 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
@@ -189,7 +189,7 @@ nvc0_2d_texture_do_copy(struct nouveau_pushbuf *push,
PUSH_DATA (push, 0);
PUSH_DATA (push, sx  src-ms_x);
PUSH_DATA (push, 0);
-   PUSH_DATA (push, sy  src-ms_x);
+   PUSH_DATA (push, sy  src-ms_y);
 
return 0;
 }
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

2015-07-06 Thread Roland Scheidegger

Am 06.07.2015 um 19:54 schrieb Ilia Mirkin:
 That's right. Except really what might have happend was
 
 occl query;
 write X;
 more drawing;
 write X+1;
 
 and then on the CPU, you see X+1. So the tests are always for = X.
 And if you have more than 2^32 submits, you cry, because I'm *sure*
 that nothing implements wraparound properly :)
That's why 64bit counters are used right? :-).

Roland

 
 On Mon, Jul 6, 2015 at 1:45 PM, Vyacheslav Gonakhchyan
 ytri...@gmail.com wrote:
 Ilia, thanks a lot for the info.

 So basically if I submit to GPU's command stream:
 perform occlusion query,
 write X to Y.
 I know that query is completed when after reading Y address I get X.

 Regards,
 Vyacheslav

 On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin imir...@alum.mit.edu wrote:

 I'm only really familiar with nouveau, but I think all GPU hardware
 works in roughly the same way. Basically you have some way of
 inserting write X to address Y into the command stream (aka a
 fence), after which you insert write X+1 to address Y and so on.
 If you want the CPU to wait on a given fence, you just do while
 (*address  x);. If you have multiple GPU processing queues, you can
 usually also insert a stall this queue until the value at address Y
 is at least X command into the command stream.

 DRM uses implicit fences, so it knows which BOs are used for
 particular commands. So the flow goes something like submit bunch of
 commands; submit fence write and attach that fence id to the BOs in
 the previous bunch of comands. Then to wait for a bo to become ready,
 you just wait until the GPU writes the appropriate number to memory
 address Y (from above).

 The mesa drivers can sometimes use clever tricks that avoid this
 sync'ing because it knows exactly how it emits the commands and
 perhaps it waits on something related earlier whereby it knows the
 other thing will be ready. No idea if that's the case here.

 Hope this helps,

   -ilia


 On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan
 ytri...@gmail.com wrote:
 Ilia, thanks for the gallium link.
 Do you know any links to high level info with broad strokes about how
 this
 sync works? Frankly I do not know driver terminology and wanted to know
 more
 about how this sync is performed for my research. I'm using mesa as a
 reference because it has open implementation code. Occlusion query
 functionality probably waits for z-buffer to become ready. Problem is
 that
 usual synchronization techniques do not apply here. I'm thinking that
 driver
 code gets notifications about state change. I want to know what kind of
 notifications are available? Can query be performed in parallel with
 another
 frame being processed or does it need complete GPU pipeline flush?

 Thanks,
 Vyacheslav

 On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu
 wrote:

 On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
 ytri...@gmail.com wrote:
 Hi, everyone.

 Trying to understand method radeonQueryGetResult (more broadly
 GPU-CPU
 sync).

 static void radeonQueryGetResult(struct gl_context *ctx, struct
 gl_query_object *q)
 {
 struct radeon_query_object *query = (struct radeon_query_object
 *)q;
 uint32_t *result;
 int i;

 radeon_print(RADEON_STATE, RADEON_VERBOSE,
 %s: query id %d, result %d\n,
 __func__, query-Base.Id, (int) query-Base.Result);

 radeon_bo_map(query-bo, GL_FALSE);
 result = query-bo-ptr;

 query-Base.Result = 0;
 for (i = 0; i  query-curr_offset/sizeof(uint32_t); ++i) {
 query-Base.Result += LE32_TO_CPU(result[i]);
 radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n,
 i,
 LE32_TO_CPU(result[i]));
 }

 radeon_bo_unmap(query-bo);
 }

 I don't know which part is responsible for blocking behavior (waiting
 for
 response from GPU). I suspect that radeon_bo_map does this magic.
 Can someone point in the right direction?

 The radeon_bo_map defined in
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
 However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
 references the radeon_bo_map in libdrm, which does not appear to wait.

 FWIW for nouveau, nouveau_bo_map will also implicitly do a
 nouveau_bo_wait, but that does not appear to be the case for radeon.

 Cheers,

   -ilia




 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=BQIGaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=WLJtDrNOS1MO4md0Q2dXG1RDHVFkoqdi6-ZeojTw0l8s=ZgIqDkkPvOUZvUp4VGqWC8rnvcv-tBNOIB6Dqpkh2uUe=
  
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib

2015-07-06 Thread Samuel Pitoiset

Before validating vertex arrays we need to check if a VBO is present.
Checking if vb-buffer is not NULL fixes the issue.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
 src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c 
b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
index 1fd33b8..3d200bd 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
@@ -382,6 +382,11 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50)
   if (nv50-vbo_user  (1  b)) {
  address = addrs[b] + ve-pipe.src_offset;
  limit = addrs[b] + limits[b];
+  } else
+  if (!vb-buffer) {
+ BEGIN_NV04(push, NV50_3D(VERTEX_ARRAY_FETCH(i)), 1);
+ PUSH_DATA (push, 0);
+ continue;
   } else {
  struct nv04_resource *buf = nv04_resource(vb-buffer);
  if (!(refd  (1  b))) {
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/skl: Set the pulls bary bit in 3DSTATE_PS_EXTRA

2015-07-06 Thread Anuj Phogat

On Fri, Jul 3, 2015 at 5:15 AM, Neil Roberts n...@linux.intel.com wrote:
 On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if
 the shader sends a message to the pixel interpolator. This fixes the
 interpolateAt* tests on SKL, apart from interpolateatsample-nonconst
 but that is not implemented anywhere so it's not a regression.
 ---
  src/mesa/drivers/dri/i965/brw_context.h   | 1 +
  src/mesa/drivers/dri/i965/brw_defines.h   | 1 +
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp  | 4 
  src/mesa/drivers/dri/i965/gen8_ps_state.c | 3 +++
  4 files changed, 9 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
 b/src/mesa/drivers/dri/i965/brw_context.h
 index 3553f6e..7596139 100644
 --- a/src/mesa/drivers/dri/i965/brw_context.h
 +++ b/src/mesa/drivers/dri/i965/brw_context.h
 @@ -415,6 +415,7 @@ struct brw_wm_prog_data {
 bool uses_pos_offset;
 bool uses_omask;
 bool uses_kill;
 +   bool pulls_bary;
 uint32_t prog_offset_16;

 /**
 diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
 b/src/mesa/drivers/dri/i965/brw_defines.h
 index 66b9abc..19489ab 100644
 --- a/src/mesa/drivers/dri/i965/brw_defines.h
 +++ b/src/mesa/drivers/dri/i965/brw_defines.h
 @@ -2145,6 +2145,7 @@ enum brw_pixel_shader_computed_depth_mode {
  # define GEN8_PSX_SHADER_DISABLES_ALPHA_TO_COVERAGE (1  7)
  # define GEN8_PSX_SHADER_IS_PER_SAMPLE  (1  6)
  # define GEN8_PSX_SHADER_COMPUTES_STENCIL   (1  5)
 +# define GEN9_PSX_SHADER_PULLS_BARY (1  3)
  # define GEN8_PSX_SHADER_HAS_UAV(1  2)
  # define GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK   (1  1)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 index bd71404..3ebc3a2 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 @@ -1481,6 +1481,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder bld, 
 nir_intrinsic_instr *instr
 case nir_intrinsic_interp_var_at_centroid:
 case nir_intrinsic_interp_var_at_sample:
 case nir_intrinsic_interp_var_at_offset: {
 +  assert(stage == MESA_SHADER_FRAGMENT);
 +
 +  ((struct brw_wm_prog_data *) prog_data)-pulls_bary = true;
 +
fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2);

/* For most messages, we need one reg of ignored data; the hardware
 diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c 
 b/src/mesa/drivers/dri/i965/gen8_ps_state.c
 index a88f109..d544509 100644
 --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
 +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
 @@ -58,6 +58,9 @@ gen8_upload_ps_extra(struct brw_context *brw,
 if (prog_data-uses_omask)
dw1 |= GEN8_PSX_OMASK_TO_RENDER_TARGET;

 +   if (brw-gen = 9  prog_data-pulls_bary)
 +  dw1 |= GEN9_PSX_SHADER_PULLS_BARY;
 +
 if (_mesa_active_fragment_shader_has_atomic_ops(brw-ctx))
dw1 |= GEN8_PSX_SHADER_HAS_UAV;

 --
 1.9.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

2015-07-06 Thread Vyacheslav Gonakhchyan

Ilia, thanks a lot for the info.

So basically if I submit to GPU's command stream:
perform occlusion query,
write X to Y.
I know that query is completed when after reading Y address I get X.

Regards,
Vyacheslav

On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin imir...@alum.mit.edu wrote:

 I'm only really familiar with nouveau, but I think all GPU hardware
 works in roughly the same way. Basically you have some way of
 inserting write X to address Y into the command stream (aka a
 fence), after which you insert write X+1 to address Y and so on.
 If you want the CPU to wait on a given fence, you just do while
 (*address  x);. If you have multiple GPU processing queues, you can
 usually also insert a stall this queue until the value at address Y
 is at least X command into the command stream.

 DRM uses implicit fences, so it knows which BOs are used for
 particular commands. So the flow goes something like submit bunch of
 commands; submit fence write and attach that fence id to the BOs in
 the previous bunch of comands. Then to wait for a bo to become ready,
 you just wait until the GPU writes the appropriate number to memory
 address Y (from above).

 The mesa drivers can sometimes use clever tricks that avoid this
 sync'ing because it knows exactly how it emits the commands and
 perhaps it waits on something related earlier whereby it knows the
 other thing will be ready. No idea if that's the case here.

 Hope this helps,

   -ilia


 On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan
 ytri...@gmail.com wrote:
  Ilia, thanks for the gallium link.
  Do you know any links to high level info with broad strokes about how
 this
  sync works? Frankly I do not know driver terminology and wanted to know
 more
  about how this sync is performed for my research. I'm using mesa as a
  reference because it has open implementation code. Occlusion query
  functionality probably waits for z-buffer to become ready. Problem is
 that
  usual synchronization techniques do not apply here. I'm thinking that
 driver
  code gets notifications about state change. I want to know what kind of
  notifications are available? Can query be performed in parallel with
 another
  frame being processed or does it need complete GPU pipeline flush?
 
  Thanks,
  Vyacheslav
 
  On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu
 wrote:
 
  On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
  ytri...@gmail.com wrote:
   Hi, everyone.
  
   Trying to understand method radeonQueryGetResult (more broadly GPU-CPU
   sync).
  
   static void radeonQueryGetResult(struct gl_context *ctx, struct
   gl_query_object *q)
   {
   struct radeon_query_object *query = (struct radeon_query_object
 *)q;
   uint32_t *result;
   int i;
  
   radeon_print(RADEON_STATE, RADEON_VERBOSE,
   %s: query id %d, result %d\n,
   __func__, query-Base.Id, (int) query-Base.Result);
  
   radeon_bo_map(query-bo, GL_FALSE);
   result = query-bo-ptr;
  
   query-Base.Result = 0;
   for (i = 0; i  query-curr_offset/sizeof(uint32_t); ++i) {
   query-Base.Result += LE32_TO_CPU(result[i]);
   radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n,
 i,
   LE32_TO_CPU(result[i]));
   }
  
   radeon_bo_unmap(query-bo);
   }
  
   I don't know which part is responsible for blocking behavior (waiting
   for
   response from GPU). I suspect that radeon_bo_map does this magic.
   Can someone point in the right direction?
 
  The radeon_bo_map defined in
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
  However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
  references the radeon_bo_map in libdrm, which does not appear to wait.
 
  FWIW for nouveau, nouveau_bo_map will also implicitly do a
  nouveau_bo_wait, but that does not appear to be the case for radeon.
 
  Cheers,
 
-ilia
 
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 05:12:10 PM Chris Wilson wrote:
 On Mon, Jul 06, 2015 at 04:19:36PM +0300, Martin Peres wrote:
  
  
  On 06/07/15 16:15, Martin Peres wrote:
  On 06/07/15 16:13, Chris Wilson wrote:
  On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote:
  On 06/07/15 13:33, Chris Wilson wrote:
  Move the query for the TIMESTAMP register from context init to the
  screen, so that it is only queried once for all contexts.
  
  On 32bit systems, some old kernels trigger a hw bug resulting in the
  TIMESTAMP register being shifted and the low bits always zero. Detect
  this by repeating the read a few times and check the register is
  incrementing.
  You do not do the latter. You only check for the low bits.
  
  I guess the counter is supposed to be monotonically increasing and
  with a resolution of a few microseconds which would make this
  perfectly valid. Could you confirm and make sure to add this
  information in the commit message please?
  The counter should increment every 80ns. What's misleading in what I
  wrote? It describes the hw bug and how to detect it.
  
  Well, it is not misleading, it just lacks this information.
  
  If it incremented every seconds, the patch would be stupid because
  the timestamp could be at 0 and polling 10 times at a few us of
  interval would always yield the same result. That's all :)
  
  Oh, forgot to say: With this information added in the commit message
  and the commit message duplicated as a comment in
  intel_detect_timestamp(), the patch is:
 
 How about:
 
 On 32bit systems, some old kernels trigger a hw bug resulting in the
 TIMESTAMP register being shifted and the low 32bits always zero. Detect
 this by repeating the read a few times and check the register is
 incrementing every 80ns as expected and not stuck on zero (as would be 
 the case with the buggy kernel/hw.).
 -Chris
 
 

It would be great to put this in a comment above the loop rather than
only in the commit message.

Also, moving the check to screen-time and fixing the check to use a loop
are really two separate things - but they're so trivial I'm inclined to
not be picky. :)

Reviewed-by: Kenneth Graunke kenn...@whitecape.org


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 06/18] i965: Pass the map-mode along to intel_mipmap_tree_map_raw()

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 11:33:11 AM Chris Wilson wrote:
 Since we can distinguish when mapping between READ and WRITE, we can
 pass along the map mode to avoid stalls and flushes where possible.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 28 
 ++-
  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  3 ++-
  2 files changed, 17 insertions(+), 14 deletions(-)

Huh, I thought I fixed this ages ago.  Guess not.  Thanks!

Reviewed-by: Kenneth Graunke kenn...@whitecape.org



signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 11:33:08 AM Chris Wilson wrote:
 Since the workaround bo is used strictly as a write-only buffer, we need
 only allocate one per screen and use the same one from all contexts.
 
 (The caveat here is during extension initialisation, where we write into
 and read back register values from the buffer, but that is performed only
 once for the first context - and baring synchronisation issues should not
 be a problem. Safer would be to move that also to the screen.)
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++---
  src/mesa/drivers/dri/i965/intel_screen.c | 4 
  src/mesa/drivers/dri/i965/intel_screen.h | 1 +
  3 files changed, 8 insertions(+), 3 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
 b/src/mesa/drivers/dri/i965/brw_pipe_control.c
 index 7ee3cb6..05e14cd 100644
 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
 +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
 @@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw,
  * the gen6 workaround because it involves actually writing to
  * the buffer, and the kernel doesn't let us write to the batch.
  */
 -   brw-workaround_bo = drm_intel_bo_alloc(brw-bufmgr,
 -   pipe_control workaround,
 -   4096, 4096);
 +   brw-workaround_bo = brw-intelScreen-workaround_bo;
 if (brw-workaround_bo == NULL)
return -ENOMEM;

Checking for out-of-memory conditions in code that doesn't actually
allocate anything looks funky now.  I'd be inclined just to drop the
-ENOMEM path and make this a void function.

Alternatively, you could just fold this into the brw_context setup and
ditch the functions altogether.  Up to you.

  
 +   drm_intel_bo_reference(brw-workaround_bo);
 +
 brw-pipe_controls_since_last_cs_stall = 0;
  
 return 0;
 diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
 b/src/mesa/drivers/dri/i965/intel_screen.c
 index 839a984..cd8e6eb 100644
 --- a/src/mesa/drivers/dri/i965/intel_screen.c
 +++ b/src/mesa/drivers/dri/i965/intel_screen.c
 @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv)
  {
 struct intel_screen *intelScreen = sPriv-driverPrivate;
  
 +   drm_intel_bo_unreference(intelScreen-workaround_bo);
 dri_bufmgr_destroy(intelScreen-bufmgr);
 driDestroyOptionInfo(intelScreen-optionCache);
  
 @@ -1096,6 +1097,9 @@ intel_init_bufmgr(struct intel_screen *intelScreen)
return false;
 }
  
 +   intelScreen-workaround_bo =
 +  drm_intel_bo_alloc(intelScreen-bufmgr, pipe_control w/a, 4096, 
 4096);
 +

Seems a little funny to put this in intel_init_bufmgr, since it's not
setting up libdrm - why not put it in the caller?

 return true;
  }
  
 diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
 b/src/mesa/drivers/dri/i965/intel_screen.h
 index 941e0fc..e55fddb 100644
 --- a/src/mesa/drivers/dri/i965/intel_screen.h
 +++ b/src/mesa/drivers/dri/i965/intel_screen.h
 @@ -60,6 +60,7 @@ struct intel_screen
 bool has_context_reset_notification;
  
 dri_bufmgr *bufmgr;
 +   drm_intel_bo *workaround_bo;
  
 /**
  * A unique ID for shader programs.
 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/3] i965/gen4-5: Enable 16-wide dispatch on shaders with control flow.

2015-07-06 Thread Francisco Jerez

This was probably disabled due to a combination of several bugs in the
generator code (fixed earlier in this series) and a misunderstanding
of the hardware spec.  The documentation for most control flow
instructions mentions among other restrictions:

 Instruction compression is not allowed.

This however doesn't have any implications on 16 wide not being
supported, because none of the control flow instructions have
multi-register operands (control flow instructions are not compressed
on more recent hardware either, except maybe SNB's IF with inline
compare).  In fact Gen4-5 had 16-wide control flow masks and stacks,
and the spec mentions in several places that control flow instructions
push and pop 16 channels worth of data -- Otherwise there doesn't seem
to be any indication that it shouldn't work.

Causes no piglit regressions, and gives the following shader-db
results on ILK:

 total instructions in shared programs: 4711384 - 4711384 (0.00%)
 instructions in affected programs: 0 - 0
 helped:0
 HURT:  0
 GAINED:1215
 LOST:  0
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index bd71404..5247738 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -417,18 +417,12 @@ fs_visitor::nir_emit_if(nir_if *if_stmt)
 
bld.emit(BRW_OPCODE_ENDIF);
 
-   if (!try_replace_with_sel()  devinfo-gen  6) {
-  no16(Can't support (non-uniform) control flow on SIMD16\n);
-   }
+   try_replace_with_sel();
 }
 
 void
 fs_visitor::nir_emit_loop(nir_loop *loop)
 {
-   if (devinfo-gen  6) {
-  no16(Can't support (non-uniform) control flow on SIMD16\n);
-   }
-
bld.emit(BRW_OPCODE_DO);
 
nir_emit_cf_list(loop-body);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] glsl: Add missing check for whether an expression is an add operation

2015-07-06 Thread Matt Turner

Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] glsl: Make sure not to dereference NULL

2015-07-06 Thread Matt Turner

On Sat, Jul 4, 2015 at 2:40 PM, Neil Roberts n...@linux.intel.com wrote:
 In this bit of code point_five can be NULL if the expression is not a
 constant. This fixes it to match the pattern of the rest of the chunk
 of code so that it checks for NULLs.

 Cc: Matt Turner matts...@gmail.com
 Cc: 10.6 mesa-sta...@lists.freedesktop.org
 ---
  src/glsl/opt_algebraic.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 9b8a426..c4b8715 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -589,7 +589,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
 continue;

  ir_constant *point_five = add_expr-operands[1 - 
 j]-as_constant();
 -if (!point_five-is_value(0.5, 0))
 +if (!point_five || !point_five-is_value(0.5, 0))
 continue;

  if (abs_expr-operands[0]-equals(sign_expr-operands[0])) {
 --
 1.9.3


Whoops. I think I thought that is_value NULL checked 'this'.

Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/3] i965/gen4-5: Set ENDIF dst and src0 fields to the null register.

2015-07-06 Thread Francisco Jerez

The hardware docs don't mention explicitly what these fields should
be, but I've verified experimentally on ILK that using a GRF as
destination causes the register to be corrupted when the execution
size of an ENDIF instruction is higher than 8 -- and because the
destination we were using was g0, eventually a hang.

Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with
if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3.
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 0f53604..4d39762 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1584,8 +1584,8 @@ brw_ENDIF(struct brw_codegen *p)
}
 
if (devinfo-gen  6) {
-  brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
-  brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
+  brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
+  brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
   brw_set_src1(p, insn, brw_imm_d(0x0));
} else if (devinfo-gen == 6) {
   brw_set_dest(p, insn, brw_imm_w(0));
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] i965/gen4-5: Program the execution size correctly for DO/WHILE instructions.

2015-07-06 Thread Francisco Jerez

From the hardware docs for the DO instruction:

 Execution size is ignored for this instruction.

My observation on ILK hardware contradicts the spec though, channels
over the execution size of a DO instruction won't enter the loop, and
channels over the execution size of a WHILE instruction will exit the
loop after the first iteration -- The latter is consistent with the
spec though, there's no claim that the execution size is ignored for
the WHILE instruction so it's not completely unexpected that it has an
influence on the evaluation of EMask.

The execute_size argument of brw_DO() shouldn't have any effect on
Gen6 and newer hardware.  On Gen4-5 WHILE instructions inherit the
execution size from the matching DO, so this patch should fix them
too.  The execution size of BREAK and CONT instructions was already
being set correctly.

Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with
conditional and loop instructions 16-wide,
e.g. shaders/glsl-fs-continue-inside-do-while.
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 0a70bdc..c986d91 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1869,7 +1869,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
 break;
 
   case BRW_OPCODE_DO:
-brw_DO(p, BRW_EXECUTE_8);
+brw_DO(p, dispatch_width == 16 ? BRW_EXECUTE_16 : BRW_EXECUTE_8);
 break;
 
   case BRW_OPCODE_BREAK:
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: allocate at least 1 BLEND_STATE element

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 04:24:12 PM Emil Velikov wrote:
 Hello gents,
 
 On 2 July 2015 at 08:45, Kenneth Graunke kenn...@whitecape.org wrote:
  On Wednesday, July 01, 2015 10:16:28 AM Mike Stroyan wrote:
  When there are no color buffer render targets, gen6 and gen7 still
  use the first BLEND_STATE element to determine alpha test.
  gen6_upload_blend_state was allocating zero elements when
  ctx-Color.AlphaEnabled was false.
  That left _3DSTATE_CC_STATE_POINTERS or _3DSTATE_BLEND_STATE_POINTERS
  pointing to random data from some previous brw_state_batch().
  That sometimes suppressed depth rendering when those bits
  happened to mean COMPAREFUNC_NEVER.
  This produced flickering shadows for dota2 reborn.
  ---
   src/mesa/drivers/dri/i965/gen6_cc.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/src/mesa/drivers/dri/i965/gen6_cc.c 
  b/src/mesa/drivers/dri/i965/gen6_cc.c
  index 2bfa271..2b76e24 100644
  --- a/src/mesa/drivers/dri/i965/gen6_cc.c
  +++ b/src/mesa/drivers/dri/i965/gen6_cc.c
  @@ -51,7 +51,7 @@ gen6_upload_blend_state(struct brw_context *brw)
   * with render target 0, which will reference BLEND_STATE[0] for
   * alpha test enable.
   */
  -   if (nr_draw_buffers == 0  ctx-Color.AlphaEnabled)
  +   if (nr_draw_buffers == 0)
 nr_draw_buffers = 1;
 
  size = sizeof(*blend) * nr_draw_buffers;
 
 
  Great catch!
 
  Reviewed-by: Kenneth Graunke kenn...@whitecape.org
 
  And pushed:
 9d408a4..fe2b748  master - master
 
  I think we ought to change gen8_blend_state.c as well, but I'm not quite
  sure what change to make.  Either we should make the same change you did
  here, or delete the whole We need at least 1 BLEND_STATE written
  block.
 
  On Gen8+, it looks like the alpha test and other functions that might
  discard pixels are all in the shared/common DWord, and the per-color
  target DWord pairs look relatively harmless.  I suppose the null RT
  would still refer to BLEND_STATE[0]...so it might still be worth
  emitting one.  Any thoughts?
 Should we get this into 10.6 or there are some not so obvious
 dependencies that we're missing in the 10.6 branch ? Asking every user
 to rebuild mesa just to play Dota2 seems unreasonable imho.
 
 Cheers,
 Emil

Yeah, it should definitely land in 10.6.I think I just forgot to add
the Cc stable tag (sorry!).  It shouldn't have any dependencies.

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

2015-07-06 Thread Ilia Mirkin

On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
ytri...@gmail.com wrote:
 Hi, everyone.

 Trying to understand method radeonQueryGetResult (more broadly GPU-CPU
 sync).

 static void radeonQueryGetResult(struct gl_context *ctx, struct
 gl_query_object *q)
 {
 struct radeon_query_object *query = (struct radeon_query_object *)q;
 uint32_t *result;
 int i;

 radeon_print(RADEON_STATE, RADEON_VERBOSE,
 %s: query id %d, result %d\n,
 __func__, query-Base.Id, (int) query-Base.Result);

 radeon_bo_map(query-bo, GL_FALSE);
 result = query-bo-ptr;

 query-Base.Result = 0;
 for (i = 0; i  query-curr_offset/sizeof(uint32_t); ++i) {
 query-Base.Result += LE32_TO_CPU(result[i]);
 radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i,
 LE32_TO_CPU(result[i]));
 }

 radeon_bo_unmap(query-bo);
 }

 I don't know which part is responsible for blocking behavior (waiting for
 response from GPU). I suspect that radeon_bo_map does this magic.
 Can someone point in the right direction?

The radeon_bo_map defined in
src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
references the radeon_bo_map in libdrm, which does not appear to wait.

FWIW for nouveau, nouveau_bo_map will also implicitly do a
nouveau_bo_wait, but that does not appear to be the case for radeon.

Cheers,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/18] i965: Move pipecontrol workaround bo to brw_pipe_control

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 11:33:07 AM Chris Wilson wrote:
 With the exception of gen8, the sole user of the workaround bo are for
 emitting pipe controls. Move it out of the purview of the batchbuffer
 and into the pipecontrol.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

Reviewed-by: Kenneth Graunke kenn...@whitecape.org


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 11:33:10 AM Chris Wilson wrote:
 Rather than allocating a fresh page every time we clear a buffer, keep
 that page around between invocations by tracking the last used offset
 and only allocating a fresh page when we wrap.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++---
  1 file changed, 14 insertions(+), 3 deletions(-)

This looks okay to me.  Do you have any performance data to justify the
extra complexity?


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

2015-07-06 Thread Vyacheslav Gonakhchyan

Ilia, thanks for the gallium link.
Do you know any links to high level info with broad strokes about how this
sync works? Frankly I do not know driver terminology and wanted to know
more about how this sync is performed for my research. I'm using mesa as a
reference because it has open implementation code. Occlusion query
functionality probably waits for z-buffer to become ready. Problem is that
usual synchronization techniques do not apply here. I'm thinking that
driver code gets notifications about state change. I want to know what kind
of notifications are available? Can query be performed in parallel with
another frame being processed or does it need complete GPU pipeline flush?

Thanks,
Vyacheslav

On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu wrote:

 On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
 ytri...@gmail.com wrote:
  Hi, everyone.
 
  Trying to understand method radeonQueryGetResult (more broadly GPU-CPU
  sync).
 
  static void radeonQueryGetResult(struct gl_context *ctx, struct
  gl_query_object *q)
  {
  struct radeon_query_object *query = (struct radeon_query_object *)q;
  uint32_t *result;
  int i;
 
  radeon_print(RADEON_STATE, RADEON_VERBOSE,
  %s: query id %d, result %d\n,
  __func__, query-Base.Id, (int) query-Base.Result);
 
  radeon_bo_map(query-bo, GL_FALSE);
  result = query-bo-ptr;
 
  query-Base.Result = 0;
  for (i = 0; i  query-curr_offset/sizeof(uint32_t); ++i) {
  query-Base.Result += LE32_TO_CPU(result[i]);
  radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i,
  LE32_TO_CPU(result[i]));
  }
 
  radeon_bo_unmap(query-bo);
  }
 
  I don't know which part is responsible for blocking behavior (waiting for
  response from GPU). I suspect that radeon_bo_map does this magic.
  Can someone point in the right direction?

 The radeon_bo_map defined in
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
 However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
 references the radeon_bo_map in libdrm, which does not appear to wait.

 FWIW for nouveau, nouveau_bo_map will also implicitly do a
 nouveau_bo_wait, but that does not appear to be the case for radeon.

 Cheers,

   -ilia

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once

2015-07-06 Thread Chris Wilson

On Mon, Jul 06, 2015 at 04:19:36PM +0300, Martin Peres wrote:
 
 
 On 06/07/15 16:15, Martin Peres wrote:
 On 06/07/15 16:13, Chris Wilson wrote:
 On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote:
 On 06/07/15 13:33, Chris Wilson wrote:
 Move the query for the TIMESTAMP register from context init to the
 screen, so that it is only queried once for all contexts.
 
 On 32bit systems, some old kernels trigger a hw bug resulting in the
 TIMESTAMP register being shifted and the low bits always zero. Detect
 this by repeating the read a few times and check the register is
 incrementing.
 You do not do the latter. You only check for the low bits.
 
 I guess the counter is supposed to be monotonically increasing and
 with a resolution of a few microseconds which would make this
 perfectly valid. Could you confirm and make sure to add this
 information in the commit message please?
 The counter should increment every 80ns. What's misleading in what I
 wrote? It describes the hw bug and how to detect it.
 
 Well, it is not misleading, it just lacks this information.
 
 If it incremented every seconds, the patch would be stupid because
 the timestamp could be at 0 and polling 10 times at a few us of
 interval would always yield the same result. That's all :)
 
 Oh, forgot to say: With this information added in the commit message
 and the commit message duplicated as a comment in
 intel_detect_timestamp(), the patch is:

How about:

On 32bit systems, some old kernels trigger a hw bug resulting in the
TIMESTAMP register being shifted and the low 32bits always zero. Detect
this by repeating the read a few times and check the register is
incrementing every 80ns as expected and not stuck on zero (as would be 
the case with the buggy kernel/hw.).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=90264

--- Comment #32 from Matt Whitlock freedesk...@mattwhitlock.name ---
(In reply to Matt Whitlock from comment #29)
 Also, for what it's worth, this problem seems to have disappeared for me. I
 have rebuilt Mesa 10.5.6 without the revert of 95073a2d, and Chromium
 44.0.2403.30 is no longer exhibiting the bug for me. I am still on libdrm
 2.4.59, but I have since upgraded my Linux kernel to 4.0.5-gentoo.

I spoke too soon. I am still seeing tooltip corruptions in Chromium
44.0.2403.61 on Linux 4.0.5-gentoo, libdrm 2.4.62, Nouveau 1.0.11, and Mesa
10.6.1 (without the revert of 95073a2d).

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 07/18] i965: Make intel_mipmap_tree_map_raw() static

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 11:33:12 AM Chris Wilson wrote:
 No external users, so no need to export the symbol outside of our
 compilation unit.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

Good call.

Reviewed-by: Kenneth Graunke kenn...@whitecape.org


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_FS_PROG_DATA in gen6_renderbuffer_surfaces.

2015-07-06 Thread Kenneth Graunke

It looks like this was forgotten in commit 3c9dc2d31b80fc73bffa1f40a
(i965: Make a brw_stage_prog_data for storing the SURF_INDEX
information.)   In other words, it's been missing since we moved to
dynamic binding table slot assignments.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 72aad96..b67d9ca 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -770,7 +770,7 @@ update_renderbuffer_surfaces(struct brw_context *brw)
 {
const struct gl_context *ctx = brw-ctx;
 
-   /* _NEW_BUFFERS | _NEW_COLOR */
+   /* _NEW_BUFFERS | _NEW_COLOR | BRW_NEW_FS_PROG_DATA */
const struct gl_framebuffer *fb = ctx-DrawBuffer;
brw_update_renderbuffer_surfaces(
   brw, fb,
@@ -792,7 +792,8 @@ const struct brw_tracked_state brw_renderbuffer_surfaces = {
 const struct brw_tracked_state gen6_renderbuffer_surfaces = {
.dirty = {
   .mesa = _NEW_BUFFERS,
-  .brw = BRW_NEW_BATCH,
+  .brw = BRW_NEW_BATCH |
+ BRW_NEW_FS_PROG_DATA,
},
.emit = update_renderbuffer_surfaces,
 };
-- 
2.4.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

2015-07-06 Thread Ilia Mirkin

I'm only really familiar with nouveau, but I think all GPU hardware
works in roughly the same way. Basically you have some way of
inserting write X to address Y into the command stream (aka a
fence), after which you insert write X+1 to address Y and so on.
If you want the CPU to wait on a given fence, you just do while
(*address  x);. If you have multiple GPU processing queues, you can
usually also insert a stall this queue until the value at address Y
is at least X command into the command stream.

DRM uses implicit fences, so it knows which BOs are used for
particular commands. So the flow goes something like submit bunch of
commands; submit fence write and attach that fence id to the BOs in
the previous bunch of comands. Then to wait for a bo to become ready,
you just wait until the GPU writes the appropriate number to memory
address Y (from above).

The mesa drivers can sometimes use clever tricks that avoid this
sync'ing because it knows exactly how it emits the commands and
perhaps it waits on something related earlier whereby it knows the
other thing will be ready. No idea if that's the case here.

Hope this helps,

  -ilia


On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan
ytri...@gmail.com wrote:
 Ilia, thanks for the gallium link.
 Do you know any links to high level info with broad strokes about how this
 sync works? Frankly I do not know driver terminology and wanted to know more
 about how this sync is performed for my research. I'm using mesa as a
 reference because it has open implementation code. Occlusion query
 functionality probably waits for z-buffer to become ready. Problem is that
 usual synchronization techniques do not apply here. I'm thinking that driver
 code gets notifications about state change. I want to know what kind of
 notifications are available? Can query be performed in parallel with another
 frame being processed or does it need complete GPU pipeline flush?

 Thanks,
 Vyacheslav

 On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu wrote:

 On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
 ytri...@gmail.com wrote:
  Hi, everyone.
 
  Trying to understand method radeonQueryGetResult (more broadly GPU-CPU
  sync).
 
  static void radeonQueryGetResult(struct gl_context *ctx, struct
  gl_query_object *q)
  {
  struct radeon_query_object *query = (struct radeon_query_object *)q;
  uint32_t *result;
  int i;
 
  radeon_print(RADEON_STATE, RADEON_VERBOSE,
  %s: query id %d, result %d\n,
  __func__, query-Base.Id, (int) query-Base.Result);
 
  radeon_bo_map(query-bo, GL_FALSE);
  result = query-bo-ptr;
 
  query-Base.Result = 0;
  for (i = 0; i  query-curr_offset/sizeof(uint32_t); ++i) {
  query-Base.Result += LE32_TO_CPU(result[i]);
  radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i,
  LE32_TO_CPU(result[i]));
  }
 
  radeon_bo_unmap(query-bo);
  }
 
  I don't know which part is responsible for blocking behavior (waiting
  for
  response from GPU). I suspect that radeon_bo_map does this magic.
  Can someone point in the right direction?

 The radeon_bo_map defined in
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
 However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
 references the radeon_bo_map in libdrm, which does not appear to wait.

 FWIW for nouveau, nouveau_bo_map will also implicitly do a
 nouveau_bo_wait, but that does not appear to be the case for radeon.

 Cheers,

   -ilia


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965/gen4-5: Enable 16-wide dispatch on shaders with control flow.

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 11:03 AM, Francisco Jerez curroje...@riseup.net wrote:
 This was probably disabled due to a combination of several bugs in the
 generator code (fixed earlier in this series) and a misunderstanding
 of the hardware spec.  The documentation for most control flow
 instructions mentions among other restrictions:

  Instruction compression is not allowed.

 This however doesn't have any implications on 16 wide not being
 supported, because none of the control flow instructions have
 multi-register operands (control flow instructions are not compressed
 on more recent hardware either, except maybe SNB's IF with inline
 compare).  In fact Gen4-5 had 16-wide control flow masks and stacks,
 and the spec mentions in several places that control flow instructions
 push and pop 16 channels worth of data -- Otherwise there doesn't seem
 to be any indication that it shouldn't work.

Awesome. Yeah, I was wondering aloud to Ken recently about what
actually prevented non-uniform control flow from working -- because as
you say the mask registers are 16-channels wide.

Really cool to find out that it was just a couple of bugs and nothing
fundamental. Really nice work.

 Causes no piglit regressions, and gives the following shader-db
 results on ILK:

Assuming no regressions on Gen4 and G45 (running through Jenkins should do it):

Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 11:33:09 AM Chris Wilson wrote:
 When submitting commands to the GPU every cycle of latency counts;
 mutexes, spinlocks, even atomics quickly add to substantial overhead.
 
 This batch manager acts as thread-local shim over the buffer manager
 (drm_intel_bufmgr_gem). As we are only ever used from within a single
 context, we can rely on the upper layers providing thread safety.
 This allows us to import buffers from the shared screen (sharing buffers
 between multiple contexts, threads and users) and wrap that handle in
 our own. Similarly, we want to share the buffer cache between all
 users on the file and so allocate from the global threadsafe buffer
 manager, with a very small and transient local cache of active buffers.
 
 The batch manager provides a cheap way of busyness tracking and very
 efficient batch construction and kernel submission.
 
 The restrictions over and above the generic submission engine in
 intel_bufmgr_gem are:
  - not thread-safe
  - flat relocations, only the batch buffer itself carries
relocations. Relocations relative to auxiliary buffers
must be performed via STATE_BASE
  - direct mapping of the batch for writes, expect reads
from the batch to be slow
  - the batch is a fixed 64k in size
  - access to the batch must be wrapped by brw_batch_begin/_end
  - all relocations must be immediately written into the batch
 
 The importance of the flat relocation tree with local offset handling is
 that it allows us to use the relocation-less execbuffer interfaces,
 dramatically reducing the overhead of batch submission. However, that
 can be relaxed to allow other buffers than the batch buffer to carry
 relocations, if need be.
 
 ivb/bdw OglBatch7 improves by ~20% above and beyond my kernel relocation
 speedups.
 
 ISSUES:
 * shared mipmap trees
   - we instantiate a context local copy on use, but what are the semantics for
 serializing read/writes between them - do we need automagic flushing of
 execution on other contexts and common busyness tracking?
   - we retain references to the bo past the lifetime of its parent
 batchmgr as the mipmap_tree is retained past the lifetime of its
 original context, see glx_arb_create_context/default_major_version
 * OglMultithread is nevertheless unhappy; but that looks like undefined
   behaviour - i.e. a buggy client concurrently executing the same GL
   context in multiple threads, unpatched is equally buggy.
 * Add full-ppgtt softpinning support (no more relocations, at least for
   the first 256TiB), at the moment there is a limited proof-of-principle
   demonstration
 * polish and move to libdrm; though at the cost of sealing the structs?
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Daniel Vetter daniel.vet...@ffwll.ch
 Cc: Kristian Høgsberg k...@bitplanet.net
 Cc: Kenneth Graunke kenn...@whitecape.org
 Cc: Jesse Barnes jbar...@virtuousgeek.org
 Cc: Ian Romanick ian.d.roman...@intel.com
 Cc: Abdiel Janulgue abdiel.janul...@linux.intel.com
 Cc: Eero Tamminen eero.t.tammi...@intel.com
 Cc: Martin Peres martin.pe...@linux.intel.com
 ---
  src/mesa/drivers/dri/i965/Makefile.sources |4 +-
  src/mesa/drivers/dri/i965/brw_batch.c  | 1946 
 
  src/mesa/drivers/dri/i965/brw_batch.h  |  377 
  src/mesa/drivers/dri/i965/brw_binding_tables.c |1 -
  src/mesa/drivers/dri/i965/brw_blorp.cpp|   46 +-
  src/mesa/drivers/dri/i965/brw_cc.c |   16 +-
  src/mesa/drivers/dri/i965/brw_clear.c  |1 -
  src/mesa/drivers/dri/i965/brw_clip.c   |2 -
  src/mesa/drivers/dri/i965/brw_clip_line.c  |2 -
  src/mesa/drivers/dri/i965/brw_clip_point.c |2 -
  src/mesa/drivers/dri/i965/brw_clip_state.c |   14 +-
  src/mesa/drivers/dri/i965/brw_clip_tri.c   |2 -
  src/mesa/drivers/dri/i965/brw_clip_unfilled.c  |2 -
  src/mesa/drivers/dri/i965/brw_clip_util.c  |2 -
  src/mesa/drivers/dri/i965/brw_compute.c|   42 +-
  src/mesa/drivers/dri/i965/brw_conditional_render.c |2 +-
  src/mesa/drivers/dri/i965/brw_context.c|  233 ++-
  src/mesa/drivers/dri/i965/brw_context.h|  144 +-
  src/mesa/drivers/dri/i965/brw_cs.cpp   |6 +-
  src/mesa/drivers/dri/i965/brw_curbe.c  |1 -
  src/mesa/drivers/dri/i965/brw_draw.c   |  103 +-
  src/mesa/drivers/dri/i965/brw_draw_upload.c|   23 +-
  src/mesa/drivers/dri/i965/brw_ff_gs.c  |2 -
  src/mesa/drivers/dri/i965/brw_ff_gs_emit.c |1 -
  src/mesa/drivers/dri/i965/brw_fs.cpp   |5 +-
  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c|   11 +-
  src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c  |1 -
  src/mesa/drivers/dri/i965/brw_meta_updownsample.c  |1 -
  src/mesa/drivers/dri/i965/brw_misc_state.c |   10 +-

Re: [Mesa-dev] [PATCH 1/3] i965/gen4-5: Set ENDIF dst and src0 fields to the null register.

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 11:03 AM, Francisco Jerez curroje...@riseup.net wrote:
 The hardware docs don't mention explicitly what these fields should
 be, but I've verified experimentally on ILK that using a GRF as
 destination causes the register to be corrupted when the execution
 size of an ENDIF instruction is higher than 8 -- and because the
 destination we were using was g0, eventually a hang.

 Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with
 if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3.

Neat!

 ---
  src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
 b/src/mesa/drivers/dri/i965/brw_eu_emit.c
 index 0f53604..4d39762 100644
 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
 +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
 @@ -1584,8 +1584,8 @@ brw_ENDIF(struct brw_codegen *p)
 }

 if (devinfo-gen  6) {
 -  brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
 -  brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
 +  brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
 +  brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));

Strange that it was an imm32 (type D) on gen  6 and then imm16 after that.

Reviewed-by: Matt Turner matts...@gmail.com

brw_set_src1(p, insn, brw_imm_d(0x0));
 } else if (devinfo-gen == 6) {
brw_set_dest(p, insn, brw_imm_w(0));
 --
 2.4.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 91226] Crash in glLinkProgram (NEW)

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=91226

Neil Roberts n...@linux.intel.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Neil Roberts n...@linux.intel.com ---
I've pushed the patch here:

http://cgit.freedesktop.org/mesa/mesa/commit/?id=18039078e0254c7cb5e15b7186be

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for attribute((warn_unused_result)).

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 11:18 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 In the kernel, this is called __must_check; all our attribute macros in
 Mesa appear to be uppercase, so I went with that.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 Cc: ch...@chris-wilson.co.uk
 Cc: matts...@gmail.com
 ---

Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 3:33 AM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 The vertices require a large chunk of memory, currently allocated during
 context creation. However, this memory is not required until use so we
 can defer the allocation until the first swrast_Wakeup().

Makes sense to me. Someone like Brian like to take a look.


 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/swrast_setup/ss_context.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

 diff --git a/src/mesa/swrast_setup/ss_context.c 
 b/src/mesa/swrast_setup/ss_context.c
 index 74b1da3..028eccb 100644
 --- a/src/mesa/swrast_setup/ss_context.c
 +++ b/src/mesa/swrast_setup/ss_context.c
 @@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx )
 swsetup-NewState = ~0;
 _swsetup_trifuncs_init( ctx );

 -   _tnl_init_vertices( ctx, ctx-Const.MaxArrayLockSize + 12,
 -  sizeof(SWvertex) );
 -
 -
 return GL_TRUE;
  }

 @@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx )
 TNLcontext *tnl = TNL_CONTEXT(ctx);
 SScontext *swsetup = SWSETUP_CONTEXT(ctx);

 +   if (!(GET_VERTEX_STATE(ctx))-max_vertex_size)
 +  _tnl_init_vertices(ctx,
 +ctx-Const.MaxArrayLockSize + 12,
 +sizeof(SWvertex));

Replace the tabs on these two lines with spaces.

 +
 tnl-Driver.Render.Start = _swsetup_RenderStart;
 tnl-Driver.Render.Finish = _swsetup_RenderFinish;
 tnl-Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive;
 --
 2.1.4
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote:
 Since the purpose of transform feedback tends to be for the client to
 act upon the results to change the geometry in the scene, it is likely
 that the client will soon be waiting upon the results. Flush the batch
 early so that we don't build up a long queue of commands afterwards that
 could delay the readback.
 ---
  src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
 b/src/mesa/drivers/dri/i965/gen7_sol_state.c
 index 857ebe5..13dbe5b 100644
 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
 +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
 @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,
  
 brw_batch_end(brw-batch);
  
 +   /* We will likely want to read the results in the very near future, so
 +* push this primitive to hardware if it is currently idle.
 +*/
 +   if (!brw_batch_busy(brw-batch))
 +  brw_batch_flush(brw-batch);
 +
 /* EndTransformFeedback() means that we need to update the number of
  * vertices written.  Since it's only necessary if DrawTransformFeedback()
  * is called and it means mapping a buffer object, we delay computing it
 

We need some data to justify this change.

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] i965/gen4-5: Program the execution size correctly for DO/WHILE instructions.

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 11:03 AM, Francisco Jerez curroje...@riseup.net wrote:
 From the hardware docs for the DO instruction:

  Execution size is ignored for this instruction.

 My observation on ILK hardware contradicts the spec though, channels

/facepalm

 over the execution size of a DO instruction won't enter the loop, and
 channels over the execution size of a WHILE instruction will exit the
 loop after the first iteration -- The latter is consistent with the
 spec though, there's no claim that the execution size is ignored for
 the WHILE instruction so it's not completely unexpected that it has an
 influence on the evaluation of EMask.

 The execute_size argument of brw_DO() shouldn't have any effect on
 Gen6 and newer hardware.  On Gen4-5 WHILE instructions inherit the
 execution size from the matching DO, so this patch should fix them
 too.  The execution size of BREAK and CONT instructions was already
 being set correctly.

 Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with
 conditional and loop instructions 16-wide,
 e.g. shaders/glsl-fs-continue-inside-do-while.

Awesome!

Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for attribute((warn_unused_result)).

2015-07-06 Thread Kenneth Graunke

In the kernel, this is called __must_check; all our attribute macros in
Mesa appear to be uppercase, so I went with that.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: ch...@chris-wilson.co.uk
Cc: matts...@gmail.com
---
 configure.ac  | 1 +
 src/util/macros.h | 6 ++
 2 files changed, 7 insertions(+)

I noticed Chris wants to use this in one of his patches, so I figured I'd
throw together a patch to do this a bit more cleanly.

diff --git a/configure.ac b/configure.ac
index ea0f069..d240c06 100644
--- a/configure.ac
+++ b/configure.ac
@@ -210,6 +210,7 @@ AX_GCC_FUNC_ATTRIBUTE([format])
 AX_GCC_FUNC_ATTRIBUTE([malloc])
 AX_GCC_FUNC_ATTRIBUTE([packed])
 AX_GCC_FUNC_ATTRIBUTE([unused])
+AX_GCC_FUNC_ATTRIBUTE([warn_unused_result])
 
 AM_CONDITIONAL([GEN_ASM_OFFSETS], test x$GEN_ASM_OFFSETS = xyes)
 
diff --git a/src/util/macros.h b/src/util/macros.h
index 3b708ed..66698e7 100644
--- a/src/util/macros.h
+++ b/src/util/macros.h
@@ -182,6 +182,12 @@ do {   \
 #define UNUSED
 #endif
 
+#ifdef HAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT
+#define MUST_CHECK __attribute__((warn_unused_result))
+#else
+#define MUST_CHECK
+#endif
+
 /** Compute ceiling of integer quotient of A divided by B. */
 #define DIV_ROUND_UP( A, B )  ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
 
-- 
2.4.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/3] i965/gen4-5: Set ENDIF dst and src0 fields to the null register.

2015-07-06 Thread Kenneth Graunke

On Monday, July 06, 2015 09:03:35 PM Francisco Jerez wrote:
 The hardware docs don't mention explicitly what these fields should
 be, but I've verified experimentally on ILK that using a GRF as
 destination causes the register to be corrupted when the execution
 size of an ENDIF instruction is higher than 8 -- and because the
 destination we were using was g0, eventually a hang.
 
 Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with
 if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3.
 ---
  src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
 b/src/mesa/drivers/dri/i965/brw_eu_emit.c
 index 0f53604..4d39762 100644
 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
 +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
 @@ -1584,8 +1584,8 @@ brw_ENDIF(struct brw_codegen *p)
 }
  
 if (devinfo-gen  6) {
 -  brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
 -  brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
 +  brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
 +  brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
brw_set_src1(p, insn, brw_imm_d(0x0));
 } else if (devinfo-gen == 6) {
brw_set_dest(p, insn, brw_imm_w(0));
 

Hah!  I always thought there were just some bugs in this area, but I'm
surprised to see that they were so small.  Great work tracking them
down!

We were doing a bunch of other bogus stuff back in the day, but some of
that got fixed during the Gen4-7 and Gen8+ code generator merging.

Nice work :) Series is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_FS_PROG_DATA in gen6_renderbuffer_surfaces.

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 9:55 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 It looks like this was forgotten in commit 3c9dc2d31b80fc73bffa1f40a
 (i965: Make a brw_stage_prog_data for storing the SURF_INDEX
 information.)   In other words, it's been missing since we moved to
 dynamic binding table slot assignments.

Author: Eric Anholt e...@anholt.net
AuthorDate: Wed Oct 2 14:07:40 2013 -0700
CommitDate: Tue Oct 15 10:18:42 2013 -0700

Dang.

How did you find this?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib

2015-07-06 Thread Dylan Baker

  +  } else
  +  if (!vb-buffer) {
 
  Should the else and if be on the same line?
 
 The general style elsewhere is to do it in this weird way. Can't say
 I'm a big fan, but I prefer consistency.
 
 I'd happily take a change that undid that oddity.
 

Odd, okay.


signature.asc
Description: Digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib

2015-07-06 Thread Ilia Mirkin

On Mon, Jul 6, 2015 at 7:50 PM, Dylan Baker baker.dyla...@gmail.com wrote:
 On Mon, Jul 06, 2015 at 11:34:23PM +0200, Samuel Pitoiset wrote:
 Before validating vertex arrays we need to check if a VBO is present.
 Checking if vb-buffer is not NULL fixes the issue.

 Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
 ---
  src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 5 +
  1 file changed, 5 insertions(+)

 diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c 
 b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
 index 1fd33b8..3d200bd 100644
 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
 +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
 @@ -382,6 +382,11 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50)
if (nv50-vbo_user  (1  b)) {
   address = addrs[b] + ve-pipe.src_offset;
   limit = addrs[b] + limits[b];
 +  } else
 +  if (!vb-buffer) {

 Should the else and if be on the same line?

The general style elsewhere is to do it in this weird way. Can't say
I'm a big fan, but I prefer consistency.

I'd happily take a change that undid that oddity.


 + BEGIN_NV04(push, NV50_3D(VERTEX_ARRAY_FETCH(i)), 1);
 + PUSH_DATA (push, 0);
 + continue;
} else {
   struct nv04_resource *buf = nv04_resource(vb-buffer);
   if (!(refd  (1  b))) {
 --
 2.4.5

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/fs: Don't disable SIMD16 when using the pixel interpolator

2015-07-06 Thread Matt Turner

On Sun, Jul 5, 2015 at 4:45 PM, Francisco Jerez curroje...@riseup.net wrote:
 Hi Matt,

 Matt Turner matts...@gmail.com writes:

 On Fri, Jul 3, 2015 at 3:46 AM, Francisco Jerez curroje...@riseup.net 
 wrote:
 Heh, I happened to come across this comment yesterday while looking for
 the remaining no16 calls and wondered why on earth it couldn't do the
 same that the normal interpolation code does.  After this patch and a
 series coming up that will remove all SIMD8 fallbacks from the texturing
 code, the only case left still applicable to Gen7 hardware and later
 will be SIMD16 explicit accumulator operands unsupported.  Anyone?

 I can explain the problem:

 Prior to Gen7, the were were two accumulator registers usable for most
 datatypes (acc0, acc1). On Gen7, they removed integer-support from
 acc1, which was necessary to implement SIMD16 integer multiplication
 using the normal MUL/MACH sequence.

 IIRC they got rid of the acc1 register on IVB altogether, but managed to
 emulate it for floating point types by taking advantage of the extra
 precision not normally used for floating point arithmetic (the fake acc1
 basically uses the same storage in the EU that holds the 32 MSBs of each
 component of acc0), what explains the apparent asymmetry between integer
 and floating point data types.

I've never read anything that told me that -- what have you seen?

 I implemented 32-bit integer multiplication without using the
 accumulator in:

 commit f7df169ba13d22338e9276839a7e9629ca0a6b4f
 Author: Matt Turner matts...@gmail.com
 Date:   Wed May 13 18:34:03 2015 -0700

 i965/fs: Implement integer multiply without mul/mach.

 The remaining cases of SIMD16 explicit accumulator operands
 unsupported are ADDC, SUBB, and 32x32 - high 32-bit multiplication.
 The remaining multiplication case can probably be reimplemented
 without the accumulator, like I did for the low 32-bit result.

 Hmm, I have the suspicion that high 32-bit multiplication is the one
 legit use-case of the accumulator we have left, any algorithm breaking
 it up into individual 32/16-bit MULs would end up doing more
 multiplications than the two MUL/MACH instructions we do now, because we
 wouldn't be able to take advantage of the full precision implemented in
 the hardware if we truncate the 48-bit intermediate results to fit in a
 32-bit register.

That's probably true. It's just that Sandybridge and earlier don't
expose the functionality (but could do 64-bit integer multiplication
just fine), Ivybridge has the quarter-control/accumulator bug, Haswell
works fine if you split the multiplication sequence into SIMD8, and
Broadwell let's you do 32x32 - 64-bit multiplication without the
accumulator.

So you have only two platforms where it's you have to use the
accumulator, and one of them is broken (but I guess can be trivially
fixed by some force-writemask-all hackery).

The best SIMD16 code for [iu]mulExtended() where both lsb and msb
results are used is probably 2 sets of mul/mach/mov (with some kind of
work around for Ivybridge), but that's kind of hard to recognize.

 How about we use the SIMD width lowering pass to split the computation
 in half?  It should be quite straightforward but will probably require
 adding a new virtual opcode so that the SIMD width lowering pass doesn't
 have to deal with (seriously fucked-up) accumulators directly.

Seems fine to me.

 The ADDC and SUBB instructions implicitly write a bit to the
 accumulator if their operations overflowed. The 1Q/2Q quarter control
 is supposed to select which register is implicitly written -- except
 that there is no acc1 for integer types. Haswell and newer ignore the
 quarter control and always write acc0, but IVB (and presumably BYT)
 attempt to write to the nonexistent acc1.

 You could split the the SIMD16 operations into 2x SIMD8s and set
 force_writemask_all on the second, followed by a 2Q MOV from the
 accumulator. Maybe we'd rather use the .o (overflow) conditional mod
 on a result ADD to implement this.

 Yeah.  I did in fact try to implement uaddCarry last Friday without
 using the accumulator by doing something like:

 | CMP.o tmp, src0, -src1
 | MOV dst, -tmp

 ...what of course didn't work because of the extra argument precision
 post-source modifiers and also because the .o condmod doesn't work at
 all on CMP, but...

Ah, you were trying to use the fact that CMP returns 0/-1. That's a
cool idea. It's too bad that the CMP instruction doesn't do .o

I'd been thinking of doing ADD.o tmp, src0, src1 and then something
that sets/selects 0/1 based on the flag register. Maybe even a move
from the flag register would be best.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib

2015-07-06 Thread Dylan Baker

On Mon, Jul 06, 2015 at 11:34:23PM +0200, Samuel Pitoiset wrote:
 Before validating vertex arrays we need to check if a VBO is present.
 Checking if vb-buffer is not NULL fixes the issue.
 
 Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
 ---
  src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 5 +
  1 file changed, 5 insertions(+)
 
 diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c 
 b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
 index 1fd33b8..3d200bd 100644
 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
 +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
 @@ -382,6 +382,11 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50)
if (nv50-vbo_user  (1  b)) {
   address = addrs[b] + ve-pipe.src_offset;
   limit = addrs[b] + limits[b];
 +  } else
 +  if (!vb-buffer) {

Should the else and if be on the same line?

 + BEGIN_NV04(push, NV50_3D(VERTEX_ARRAY_FETCH(i)), 1);
 + PUSH_DATA (push, 0);
 + continue;
} else {
   struct nv04_resource *buf = nv04_resource(vb-buffer);
   if (!(refd  (1  b))) {
 -- 
 2.4.5
 
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: Digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib

2015-07-06 Thread Ilia Mirkin

On Mon, Jul 6, 2015 at 7:55 PM, Dylan Baker baker.dyla...@gmail.com wrote:
  +  } else
  +  if (!vb-buffer) {
 
  Should the else and if be on the same line?

 The general style elsewhere is to do it in this weird way. Can't say
 I'm a big fan, but I prefer consistency.

 I'd happily take a change that undid that oddity.


 Odd, okay.

Its one advantage is it allows one to easily remove clauses either
with #if or with comments on a line level without editing other lines.
However that's a pretty rare occurrence.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib

2015-07-06 Thread Ilia Mirkin

Reviewed-by: Ilia Mirkin imir...@alum.mit.edu

But please change the commit title to mention the actual effect, like
avoid segfault with enabled but unbound vertex attrib or something.
The piglit fix should be mentioned in the body of the message -- it's
a nice side-effect, but not really title-worthy.

Also:

Cc: mesa-sta...@lists.freedesktop.org

in the commit so that it gets cherry-picked into stable releases.

On Mon, Jul 6, 2015 at 5:34 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:
 Before validating vertex arrays we need to check if a VBO is present.
 Checking if vb-buffer is not NULL fixes the issue.

 Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
 ---
  src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 5 +
  1 file changed, 5 insertions(+)

 diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c 
 b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
 index 1fd33b8..3d200bd 100644
 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
 +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
 @@ -382,6 +382,11 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50)
if (nv50-vbo_user  (1  b)) {
   address = addrs[b] + ve-pipe.src_offset;
   limit = addrs[b] + limits[b];
 +  } else
 +  if (!vb-buffer) {
 + BEGIN_NV04(push, NV50_3D(VERTEX_ARRAY_FETCH(i)), 1);
 + PUSH_DATA (push, 0);
 + continue;
} else {
   struct nv04_resource *buf = nv04_resource(vb-buffer);
   if (!(refd  (1  b))) {
 --
 2.4.5

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 85203] [DRI2][RadeonSI] Pageflip completion event has impossible msc.

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=85203

--- Comment #17 from Michel Dänzer mic...@daenzer.net ---
http://lists.freedesktop.org/archives/dri-devel/2015-July/085663.html should
fix this.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 85203] [DRI2][RadeonSI] Pageflip completion event has impossible msc.

2015-07-06 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=85203

Aaron B aaronbotte...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #18 from Aaron B aaronbotte...@gmail.com ---
I've long since replaced the card with a GTX 970 Superclocked. I'll mark it
fixed and assume it is, thank guys for your work.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

2015-07-06 Thread Michel Dänzer

On 07.07.2015 01:32, Ilia Mirkin wrote:
 On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
 ytri...@gmail.com wrote:
 Hi, everyone.

 Trying to understand method radeonQueryGetResult (more broadly GPU-CPU
 sync).

 static void radeonQueryGetResult(struct gl_context *ctx, struct
 gl_query_object *q)
 {
 struct radeon_query_object *query = (struct radeon_query_object *)q;
 uint32_t *result;
 int i;

 radeon_print(RADEON_STATE, RADEON_VERBOSE,
 %s: query id %d, result %d\n,
 __func__, query-Base.Id, (int) query-Base.Result);

 radeon_bo_map(query-bo, GL_FALSE);
 result = query-bo-ptr;

 query-Base.Result = 0;
 for (i = 0; i  query-curr_offset/sizeof(uint32_t); ++i) {
 query-Base.Result += LE32_TO_CPU(result[i]);
 radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i,
 LE32_TO_CPU(result[i]));
 }

 radeon_bo_unmap(query-bo);
 }

 I don't know which part is responsible for blocking behavior (waiting for
 response from GPU). I suspect that radeon_bo_map does this magic.
 Can someone point in the right direction?
 
 The radeon_bo_map defined in
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
 However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
 references the radeon_bo_map in libdrm, which does not appear to wait.

It does, see radeon_bo_gem.c:bo_map():

wait:
boi-ptr = bo_gem-priv_ptr;
r = bo_wait(boi);


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: Use param export count from si_llvm_export_vs in si_shader_vs

2015-07-06 Thread Michel Dänzer

On 06.07.2015 20:41, Marek Olšák wrote:
 I thought your patch does something else, but after re-reading it,
 it seems to do the right thing.

Yeah, si_llvm_export_vs's param_count only counts parameter exports.


 Reviewed-by: Marek Olšák marek.ol...@amd.com

Thanks.


 On Mon, Jul 6, 2015 at 1:38 PM, Marek Olšák mar...@gmail.com wrote:
 
 CLIPDIST is special in that it's exported twice, first as POS and
 then as PARAM, the same will be done for CULLDIST in the future, so
 these two should indeed be removed from the list.

With this change, you won't need to do anything in si_shader_vs when
making that change for CULLDIST, si_llvm_export_vs just needs to include
it in param_count.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback

2015-07-06 Thread Kristian Høgsberg

On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote:
 Since the purpose of transform feedback tends to be for the client to
 act upon the results to change the geometry in the scene, it is likely
 that the client will soon be waiting upon the results. Flush the batch
 early so that we don't build up a long queue of commands afterwards that
 could delay the readback.
 ---
  src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++
  1 file changed, 6 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
 b/src/mesa/drivers/dri/i965/gen7_sol_state.c
 index 857ebe5..13dbe5b 100644
 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
 +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
 @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,

 brw_batch_end(brw-batch);

 +   /* We will likely want to read the results in the very near future, so
 +* push this primitive to hardware if it is currently idle.
 +*/
 +   if (!brw_batch_busy(brw-batch))
 +  brw_batch_flush(brw-batch);
 +
 /* EndTransformFeedback() means that we need to update the number of
  * vertices written.  Since it's only necessary if 
 DrawTransformFeedback()
  * is called and it means mapping a buffer object, we delay computing it


 We need some data to justify this change.

I think even the theory is not correct - transform feedback is
typically fed back into the GPU (as new geometry, eg) rather than
consumed by the CPU, and in that case the flush is not helpful. But at
the end of the day, data will tell.

Kristian

 --Ken

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

2015-07-06 Thread Vyacheslav Gonakhchyan

Hi, everyone.

Trying to understand method radeonQueryGetResult (more broadly GPU-CPU
sync).

static void radeonQueryGetResult(struct gl_context *ctx, struct
gl_query_object *q)
{
struct radeon_query_object *query = (struct radeon_query_object *)q;
uint32_t *result;
int i;

radeon_print(RADEON_STATE, RADEON_VERBOSE,
%s: query id %d, result %d\n,
__func__, query-Base.Id, (int) query-Base.Result);

radeon_bo_map(query-bo, GL_FALSE);
result = query-bo-ptr;

query-Base.Result = 0;
for (i = 0; i  query-curr_offset/sizeof(uint32_t); ++i) {
query-Base.Result += LE32_TO_CPU(result[i]);
radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i,
LE32_TO_CPU(result[i]));
}

radeon_bo_unmap(query-bo);
}

I don't know which part is responsible for blocking behavior (waiting for
response from GPU). I suspect that radeon_bo_map does this magic.
Can someone point in the right direction?

Thanks,
Vyacheslav
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 07/18] i965: Make intel_mipmap_tree_map_raw() static

2015-07-06 Thread Martin Peres


On 06/07/15 13:33, Chris Wilson wrote:

No external users, so no need to export the symbol outside of our
compilation unit.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk


Reviewed-by: Martin Peres martin.pe...@linux.intel.com


---
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 36 +--
  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  7 --
  2 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index fcad043..f9dc74f 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1342,6 +1342,24 @@ intel_miptree_copy_teximage(struct brw_context *brw,
 intel_obj-needs_validate = true;
  }
  
+static void *

+intel_miptree_map_raw(struct brw_context *brw,
+ struct intel_mipmap_tree *mt,
+ GLbitfield mode)
+{
+   /* CPU accesses to color buffers don't understand fast color clears, so
+* resolve any pending fast color clears before we map.
+*/
+   intel_miptree_resolve_color(brw, mt);
+   return brw_bo_map(mt-bo, mode  GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ);
+}
+
+static void
+intel_miptree_unmap_raw(struct brw_context *brw,
+struct intel_mipmap_tree *mt)
+{
+}
+
  static bool
  intel_miptree_alloc_mcs(struct brw_context *brw,
  struct intel_mipmap_tree *mt,
@@ -2052,24 +2070,6 @@ intel_miptree_updownsample(struct brw_context *brw,
 }
  }
  
-void *

-intel_miptree_map_raw(struct brw_context *brw,
- struct intel_mipmap_tree *mt,
- GLbitfield mode)
-{
-   /* CPU accesses to color buffers don't understand fast color clears, so
-* resolve any pending fast color clears before we map.
-*/
-   intel_miptree_resolve_color(brw, mt);
-   return brw_bo_map(mt-bo, mode  GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ);
-}
-
-void
-intel_miptree_unmap_raw(struct brw_context *brw,
-struct intel_mipmap_tree *mt)
-{
-}
-
  static void
  intel_miptree_map_gtt(struct brw_context *brw,
  struct intel_mipmap_tree *mt,
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 6c69572..0c25f1f 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -773,13 +773,6 @@ brw_miptree_layout(struct brw_context *brw,
 enum intel_miptree_tiling_mode requested,
 uint32_t layout_flags);
  
-void *intel_miptree_map_raw(struct brw_context *brw,

-   struct intel_mipmap_tree *mt,
-   GLbitfield mode);
-
-void intel_miptree_unmap_raw(struct brw_context *brw,
- struct intel_mipmap_tree *mt);
-
  void
  intel_miptree_map(struct brw_context *brw,
  struct intel_mipmap_tree *mt,


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

1 2 >

1 - 100 of 105 matches

Mail list logo