Re: [Mesa-dev] [PATCH v4 3/6] i965: Enable hardware-generated binding tables on render path.
On 07/03/2015 12:20 PM, Chris Wilson wrote: On Fri, Jul 03, 2015 at 10:00:31AM +0300, Abdiel Janulgue wrote: +void +gen7_enable_hw_binding_tables(struct brw_context *brw) +{ + if (!brw-use_resource_streamer) + return; + + if (!brw-hw_bt_pool.bo) { + /* We use a single re-usable buffer object for the lifetime of the + * context and size it to maximum allowed binding tables that can be + * programmed per batch: + * + * From the Haswell PRM, Volume 7: 3D Media GPGPU, + * 3DSTATE_BINDING_TABLE_POOL_ALLOC Programming Note: + * A maximum of 16,383 Binding tables are allowed in any batch buffer + */ + static const int max_size = 16383 * 4; + brw-hw_bt_pool.bo = drm_intel_bo_alloc(brw-bufmgr, hw_bt, + max_size, 64); + brw-hw_bt_pool.next_offset = 0; + } + + int pkt_len = brw-gen = 8 ? 4 : 3; + uint32_t dw1 = BRW_HW_BINDING_TABLE_ENABLE; + if (brw-is_haswell) + dw1 |= SET_FIELD(GEN7_MOCS_L3, GEN7_HW_BT_POOL_MOCS) | + HSW_BT_POOL_ALLOC_MUST_BE_ONE; + else if (brw-gen = 8) + dw1 |= BDW_MOCS_WB; + + BEGIN_BATCH(pkt_len); + OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC 16 | (pkt_len - 2)); + if (brw-gen = 8) { + OUT_RELOC64(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1); + OUT_BATCH(brw-hw_bt_pool.bo-size); + } else { + OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1); + OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, + brw-hw_bt_pool.bo-size); + } + ADVANCE_BATCH(); + + /* From the Haswell PRM, Volume 7: 3D Media GPGPU, +* 3DSTATE_BINDING_TABLE_POOL_ALLOC Programming Note: +* +* When switching between HW and SW binding table generation, SW must +* issue a state cache invalidate. +*/ + brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE); Should this flush not be before the enable? It's a top-of-pipe sync, which means that the enable 3DSTATE will be parsed and changing the GPU state as the previous commands are still running. Since the flush is mandatory, that implies to me that the state it changing is not latched and so still being used by those earlier commands. I made a test-run with the flush first before the enable hw-bt and didn't see any problems. Re-ordering this to the top does actually makes sense. Thanks for pointing this out! -Abdiel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH RFC] egl/dri2: Add dri3 support to x11 platform
On Sat, 4 Jul 2015 11:18:18 +0800 Boyan Ding boyan.j.d...@gmail.com wrote: [snip] +/* FIXME: Is this right? Seems problematic for WL_bind_wayland_display */ What seems to be the problem ? Afaik xcb_dri3_open_reply_fds should return an FD which is ok (be that a render_node device, or a master one with explicit auth). The problem is that WL_bind_wayland_display don't work under dri3 on x11. I only found it yesterday that to get it work, we'll need to add a mechanism to pass fd instead of name of dri device in wl_drm protocol. Previously, if a wayland client wants to use hardware accelerated EGL, it (with the help of libEGL in mesa) will bind to wl_drm object, and wl_drm will immediately send the name of dri device to the wayland client (actually also libEGL in mesa). After wayland platform code opens the device, it has to send the fd to the X server or drm to get authentication. Things are different with dri3, where a fd is directly sent to the client without the need to authenticate. I propose the following addition in wl_drm protocol: There are two kinds of wl_drm implementation. One is the current form. The other one, called dri3-capable (or whatever name), include wl_drm object built on dri3 directly or indirectly through wayland platform. If a client binds to a dri3-capable wl_drm object, it will send a device event to the client with NULL or empty string (so a client who knows nothing about it can safely fail). If the client knows about dri3-capable wl_drm object, it will send a request named get_fd and wl_drm will respond it with an fd acquired with dri3. If the wl_drm object is not dri3-capable it will raise an error if it receives a get_fd request, so will a dri3-capable wl_drm object if it receives authenticate request. Remember, that all protocol errors in Wayland are always fatal, and cause the whole client to be disconnected. Also, on Wayland EGL simply cannot have its private connection to the server, it *must* be shared with the whole application because otherwise EGL cannot operate on the app's wl_surfaces. If you really wanted to go this route (it seems you already have a better idea than this), just use the normal mechanism to extend the current wl_drm interface, or invent a whole new interface. Object flavours does not seem like a workable solution the way you describe, if I understood what you had in mind. Note, that the compositor and the app may be using different versions of Mesa, which means they may be using different definitions of wl_drm. Therefore, wl_drm must follow the same stable protocol rules as everything else. Thanks, pq So the following dri3_authenticate function is not needed. Let's not expose WL_bind_wayland_display for now, and its enablement should be separate patches. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] clover: Implement image attribute getters
This seems to be doing essentially the same thing as v1? Is it the right patch? The llvm pass was invoked in clover in v1. This patch relies on llvm to perform that task (). What this patch does basically is that it adds the image attributes to the end of the kernel input vector. The commit message of this patch is misleading, I'll fix it. On Wed, Jun 24, 2015 at 2:48 PM, Francisco Jerez curroje...@riseup.net wrote: Zoltan Gilian zoltan.gil...@gmail.com writes: Image attributes are passed to the kernel as hidden parameters after the image attribute itself. An llvm pass replaces the getter builtins to the appropriate parameters. This seems to be doing essentially the same thing as v1? Is it the right patch? --- src/gallium/state_trackers/clover/core/kernel.cpp | 26 +++ src/gallium/state_trackers/clover/core/kernel.hpp | 13 ++-- src/gallium/state_trackers/clover/core/memory.cpp | 2 +- .../state_trackers/clover/llvm/invocation.cpp | 81 +- 4 files changed, 116 insertions(+), 6 deletions(-) diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp index 0756f06..291c799 100644 --- a/src/gallium/state_trackers/clover/core/kernel.cpp +++ b/src/gallium/state_trackers/clover/core/kernel.cpp @@ -185,6 +185,13 @@ kernel::exec_context::bind(intrusive_ptrcommand_queue _q, } } + // Bind image attribute args. + for (const auto arg: kern._args) { + if (auto img_arg = dynamic_castimage_argument*(arg.get())) { + img_arg-bind_attributes(*this); + } + } + // Create a new compute state if anything changed. if (!st || q != _q || cs.req_local_mem != mem_local || @@ -465,6 +472,25 @@ kernel::constant_argument::unbind(exec_context ctx) { } void +kernel::image_argument::bind_attributes(exec_context ctx) { + cl_image_format format = img-format(); + cl_uint attributes[] = { + static_castcl_uint(img-width()), + static_castcl_uint(img-height()), + static_castcl_uint(img-depth()), + format.image_channel_data_type, + format.image_channel_order}; + for (unsigned i = 0; i 5; ++i) { + auto v = bytes(attributes[i]); + + extend(v, module::argument::zero_ext, sizeof(cl_uint)); + byteswap(v, ctx.q-device().endianness()); + align(ctx.input, sizeof(cl_uint)); + insert(ctx.input, v); + } +} + +void kernel::image_rd_argument::set(size_t size, const void *value) { if (size != sizeof(cl_mem)) throw error(CL_INVALID_ARG_SIZE); diff --git a/src/gallium/state_trackers/clover/core/kernel.hpp b/src/gallium/state_trackers/clover/core/kernel.hpp index d6432a4..8c15b2f 100644 --- a/src/gallium/state_trackers/clover/core/kernel.hpp +++ b/src/gallium/state_trackers/clover/core/kernel.hpp @@ -190,7 +190,14 @@ namespace clover { pipe_surface *st; }; - class image_rd_argument : public argument { + class image_argument : public argument { + public: + void bind_attributes(exec_context ctx); + protected: + image *img; + }; + + class image_rd_argument : public image_argument { public: virtual void set(size_t size, const void *value); virtual void bind(exec_context ctx, @@ -198,11 +205,10 @@ namespace clover { virtual void unbind(exec_context ctx); private: - image *img; pipe_sampler_view *st; }; - class image_wr_argument : public argument { + class image_wr_argument : public image_argument { public: virtual void set(size_t size, const void *value); virtual void bind(exec_context ctx, @@ -210,7 +216,6 @@ namespace clover { virtual void unbind(exec_context ctx); private: - image *img; pipe_surface *st; }; diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp index 055336a..b852e68 100644 --- a/src/gallium/state_trackers/clover/core/memory.cpp +++ b/src/gallium/state_trackers/clover/core/memory.cpp @@ -189,7 +189,7 @@ image2d::image2d(clover::context ctx, cl_mem_flags flags, const cl_image_format *format, size_t width, size_t height, size_t row_pitch, void *host_ptr) : - image(ctx, flags, format, width, height, 0, + image(ctx, flags, format, width, height, 1, row_pitch, 0, height * row_pitch, host_ptr) { } diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp index 9b91fee..a33d450 100644 --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp @@ -80,6 +80,7 @@ using namespace clover; namespace { + #if 0 void
[Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts
Since the workaround bo is used strictly as a write-only buffer, we need only allocate one per screen and use the same one from all contexts. (The caveat here is during extension initialisation, where we write into and read back register values from the buffer, but that is performed only once for the first context - and baring synchronisation issues should not be a problem. Safer would be to move that also to the screen.) Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++--- src/mesa/drivers/dri/i965/intel_screen.c | 4 src/mesa/drivers/dri/i965/intel_screen.h | 1 + 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index 7ee3cb6..05e14cd 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw, * the gen6 workaround because it involves actually writing to * the buffer, and the kernel doesn't let us write to the batch. */ - brw-workaround_bo = drm_intel_bo_alloc(brw-bufmgr, - pipe_control workaround, - 4096, 4096); + brw-workaround_bo = brw-intelScreen-workaround_bo; if (brw-workaround_bo == NULL) return -ENOMEM; + drm_intel_bo_reference(brw-workaround_bo); + brw-pipe_controls_since_last_cs_stall = 0; return 0; diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 839a984..cd8e6eb 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv) { struct intel_screen *intelScreen = sPriv-driverPrivate; + drm_intel_bo_unreference(intelScreen-workaround_bo); dri_bufmgr_destroy(intelScreen-bufmgr); driDestroyOptionInfo(intelScreen-optionCache); @@ -1096,6 +1097,9 @@ intel_init_bufmgr(struct intel_screen *intelScreen) return false; } + intelScreen-workaround_bo = + drm_intel_bo_alloc(intelScreen-bufmgr, pipe_control w/a, 4096, 4096); + return true; } diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h index 941e0fc..e55fddb 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.h +++ b/src/mesa/drivers/dri/i965/intel_screen.h @@ -60,6 +60,7 @@ struct intel_screen bool has_context_reset_notification; dri_bufmgr *bufmgr; + drm_intel_bo *workaround_bo; /** * A unique ID for shader programs. -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/18] i965: Enable GPU snooping of CPU caches for select buffers
On LLC, all buffers are normally cache coherent between the CPU and the GPU, giving both parties fast access to shared data. However, older architectures or Atoms, do not implement LLC between the CPU and GPU. Instead they utilise a snooping architecture where the GPU can snoop the CPU cache when told. The snooping has much higher overhead (i.e. slower) than regular memory fetches so its use should be reserved to instances where the data likely resides in the CPU cache and the GPU need only use it once (e.g. for streaming textures from the CPU). The other major benefit is that it can also push data to the CPU in a cache coherent fashion, and so CPU access to a snooped buffer is very fast, making it preferrable for anytime we need to readback data from the GPU For demonstration we set the snoop flag for reading back via a blit from a miptree, and for reading back transform feedback. --- src/mesa/drivers/dri/i965/brw_batch.c | 31 +++ src/mesa/drivers/dri/i965/brw_batch.h | 9 src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/gen6_sol.c | 3 ++- src/mesa/drivers/dri/i965/gen7_sol_state.c| 2 +- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 1 + 6 files changed, 45 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_batch.c b/src/mesa/drivers/dri/i965/brw_batch.c index bfff9fe..e01a0c4 100644 --- a/src/mesa/drivers/dri/i965/brw_batch.c +++ b/src/mesa/drivers/dri/i965/brw_batch.c @@ -1323,6 +1323,37 @@ struct brw_bo *brw_bo_create(struct brw_batch *batch, return bo; } +static bool __brw_bo_set_caching(struct brw_bo *bo, int caching) +{ + struct drm_i915_gem_caching arg; + + memset(arg, 0, sizeof(arg)); + arg.handle = bo-handle; + arg.caching = caching; + return drmIoctl(bo-batch-fd, DRM_IOCTL_I915_GEM_SET_CACHING, arg) == 0; +} + +void brw_bo_enable_snoop(struct brw_bo *bo) +{ + assert(bo-reusable); + + if (bo-cache_coherent) + return; + + if (bo-tiling) + return; /* XXX abort? */ + + if (!__brw_bo_set_caching(bo, I915_CACHING_CACHED)) + return; + + drm_intel_bo_disable_reuse(bo-base); + if (bo-reusable) + list_move(bo-link, bo-batch-inactive); + + bo-reusable = false; + bo-cache_coherent = true; +} + static uint64_t brw_surface_size(int cpp, uint32_t width, uint32_t height, diff --git a/src/mesa/drivers/dri/i965/brw_batch.h b/src/mesa/drivers/dri/i965/brw_batch.h index 5b56e82..3628b03 100644 --- a/src/mesa/drivers/dri/i965/brw_batch.h +++ b/src/mesa/drivers/dri/i965/brw_batch.h @@ -220,6 +220,15 @@ struct brw_bo *brw_bo_create_from_name(struct brw_batch *batch, const char *name, uint32_t global_name); +/* Enable CPU cache coherent to the buffer. On LLC, normally all buffers + * are cache coherent, but on non-LLC architectures we can tell the GPU + * to snoop from and to flush into the CPU cache. Performing the snoop + * is slower for the GPU, but eliminates the uncached penalty from the CPU, + * so it only useful for streaming data (read once) to the GPU or when + * we need to read anything back from the GPU. + */ +void brw_bo_enable_snoop(struct brw_bo *bo); + void brw_bo_mark_dirty(struct brw_batch *batch, struct brw_bo *bo); inline static int brw_bo_madvise(struct brw_bo *bo, int state) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 63f4f87..3aa003c 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -872,6 +872,7 @@ struct brw_transform_feedback_object { * Count of primitives generated during this transform feedback operation. * @{ */ +#define BRW_XFB_BO_SIZE (1610) uint64_t prims_generated[BRW_MAX_XFB_STREAMS]; struct brw_bo *prim_count_bo; unsigned prim_count_buffer_index; /** in number of uint64_t units */ diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c index 39bf8e9..037aad0 100644 --- a/src/mesa/drivers/dri/i965/gen6_sol.c +++ b/src/mesa/drivers/dri/i965/gen6_sol.c @@ -206,7 +206,8 @@ brw_new_transform_feedback(struct gl_context *ctx, GLuint name) brw_obj-offset_bo = brw_bo_create(brw-batch, transform feedback offsets, 16, 64, 0); brw_obj-prim_count_bo = - brw_bo_create(brw-batch, xfb primitive counts, 4096, 64, 0); + brw_bo_create(brw-batch, xfb primitive counts, BRW_XFB_BO_SIZE, 64,0); + brw_bo_enable_snoop(brw_obj-prim_count_bo); return brw_obj-base; } diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index ff6678e..857ebe5 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -350,7 +350,7 @@ gen7_save_primitives_written_counters(struct brw_context *brw,
[Mesa-dev] [PATCH 07/18] i965: Make intel_mipmap_tree_map_raw() static
No external users, so no need to export the symbol outside of our compilation unit. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 36 +-- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 7 -- 2 files changed, 18 insertions(+), 25 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index fcad043..f9dc74f 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -1342,6 +1342,24 @@ intel_miptree_copy_teximage(struct brw_context *brw, intel_obj-needs_validate = true; } +static void * +intel_miptree_map_raw(struct brw_context *brw, + struct intel_mipmap_tree *mt, + GLbitfield mode) +{ + /* CPU accesses to color buffers don't understand fast color clears, so +* resolve any pending fast color clears before we map. +*/ + intel_miptree_resolve_color(brw, mt); + return brw_bo_map(mt-bo, mode GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ); +} + +static void +intel_miptree_unmap_raw(struct brw_context *brw, +struct intel_mipmap_tree *mt) +{ +} + static bool intel_miptree_alloc_mcs(struct brw_context *brw, struct intel_mipmap_tree *mt, @@ -2052,24 +2070,6 @@ intel_miptree_updownsample(struct brw_context *brw, } } -void * -intel_miptree_map_raw(struct brw_context *brw, - struct intel_mipmap_tree *mt, - GLbitfield mode) -{ - /* CPU accesses to color buffers don't understand fast color clears, so -* resolve any pending fast color clears before we map. -*/ - intel_miptree_resolve_color(brw, mt); - return brw_bo_map(mt-bo, mode GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ); -} - -void -intel_miptree_unmap_raw(struct brw_context *brw, -struct intel_mipmap_tree *mt) -{ -} - static void intel_miptree_map_gtt(struct brw_context *brw, struct intel_mipmap_tree *mt, diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h index 6c69572..0c25f1f 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h @@ -773,13 +773,6 @@ brw_miptree_layout(struct brw_context *brw, enum intel_miptree_tiling_mode requested, uint32_t layout_flags); -void *intel_miptree_map_raw(struct brw_context *brw, - struct intel_mipmap_tree *mt, - GLbitfield mode); - -void intel_miptree_unmap_raw(struct brw_context *brw, - struct intel_mipmap_tree *mt); - void intel_miptree_map(struct brw_context *brw, struct intel_mipmap_tree *mt, -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback
Since the purpose of transform feedback tends to be for the client to act upon the results to change the geometry in the scene, it is likely that the client will soon be waiting upon the results. Flush the batch early so that we don't build up a long queue of commands afterwards that could delay the readback. --- src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index 857ebe5..13dbe5b 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx, brw_batch_end(brw-batch); + /* We will likely want to read the results in the very near future, so +* push this primitive to hardware if it is currently idle. +*/ + if (!brw_batch_busy(brw-batch)) + brw_batch_flush(brw-batch); + /* EndTransformFeedback() means that we need to update the number of * vertices written. Since it's only necessary if DrawTransformFeedback() * is called and it means mapping a buffer object, we delay computing it -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/18] i965: Track active GPU region from MapBufferRange
Avoid unrequired synchronization if the user requests to map an unused range on active buffer, equivalent to BufferSubData. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/intel_buffer_objects.c | 78 +--- 1 file changed, 42 insertions(+), 36 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c index fda5c9f..283182d 100644 --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c @@ -330,44 +330,50 @@ brw_map_buffer_range(struct gl_context *ctx, * If they set INVALIDATE_BUFFER, we can pitch the current contents to * achieve the required synchronization. */ - if (!(access GL_MAP_UNSYNCHRONIZED_BIT)) { + if (!(access GL_MAP_UNSYNCHRONIZED_BIT) + brw_bo_busy(intel_obj-buffer, BUSY_WRITE | BUSY_RETIRE)) { if ((access GL_MAP_INVALIDATE_BUFFER_BIT)) { -if (brw_bo_busy(intel_obj-buffer, BUSY_WRITE | BUSY_RETIRE)) { - brw_bo_put(intel_obj-buffer); - alloc_buffer_object(brw, intel_obj); -} - } - } + brw_bo_put(intel_obj-buffer); + alloc_buffer_object(brw, intel_obj); + } else if (offset + length = intel_obj-gpu_active_start || + intel_obj-gpu_active_end = offset) { + access |= GL_MAP_UNSYNCHRONIZED_BIT; + } else if (!(access GL_MAP_PERSISTENT_BIT) + (access GL_MAP_INVALIDATE_RANGE_BIT)) { + /* If the user is mapping a range of an active buffer object but + * doesn't require the current contents of that range, make a new + * BO, and we'll copy what they put in there out at unmap or + * FlushRange time. + * + * That is, unless they're looking for a persistent mapping -- we + * would need to do blits in the MemoryBarrier call, and it's easier + * to just do a GPU stall and do a mapping. + */ - /* If the user is mapping a range of an active buffer object but -* doesn't require the current contents of that range, make a new -* BO, and we'll copy what they put in there out at unmap or -* FlushRange time. -* -* That is, unless they're looking for a persistent mapping -- we would -* need to do blits in the MemoryBarrier call, and it's easier to just do a -* GPU stall and do a mapping. -*/ - if (!(access (GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_PERSISTENT_BIT)) - (access GL_MAP_INVALIDATE_RANGE_BIT) - brw_bo_busy(intel_obj-buffer, BUSY_WRITE | BUSY_RETIRE)) { - /* Ensure that the base alignment of the allocation meets the alignment - * guarantees the driver has advertised to the application. - */ - const unsigned alignment = ctx-Const.MinMapBufferAlignment; - - intel_obj-map_extra[index] = (uintptr_t) offset % alignment; - intel_obj-range_map_bo[index] = - brw_bo_create(brw-batch, - BO blit temp, - length + intel_obj-map_extra[index], - alignment, 0); - - obj-Mappings[index].Pointer = -brw_bo_map(intel_obj-range_map_bo[index], MAP_WRITE) + -intel_obj-map_extra[index]; - - return obj-Mappings[index].Pointer; + /* Ensure that the base alignment of the allocation meets the alignment + * guarantees the driver has advertised to the application. + */ + const unsigned alignment = ctx-Const.MinMapBufferAlignment; + + intel_obj-map_extra[index] = (uintptr_t) offset % alignment; + intel_obj-range_map_bo[index] = +brw_bo_create(brw-batch, + BO blit temp, + length + intel_obj-map_extra[index], + alignment, 0); + + obj-Mappings[index].Pointer = +brw_bo_map(intel_obj-range_map_bo[index], MAP_WRITE) + +intel_obj-map_extra[index]; + + return obj-Mappings[index].Pointer; + } else { + perf_debug(Stalling on glBufferMapRange(%ld, %ld) (%ldkb) to a busy +(%d-%d) buffer object.\n, +(long)offset, (long)offset + length, (long)(length/1024), +intel_obj-gpu_active_start, +intel_obj-gpu_active_end); + } } map_flags = 0; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices
Rather than allocating a fresh page every time we clear a buffer, keep that page around between invocations by tracking the last used offset and only allocating a fresh page when we wrap. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index a571a74..ffd1e3b 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -55,6 +55,7 @@ struct brw_fast_clear_state { GLuint vao; GLuint vbo; + GLuint vbo_offset; GLuint shader_prog; GLint color_location; }; @@ -83,6 +84,8 @@ brw_fast_clear_init(struct brw_context *brw) _mesa_VertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(float) * 2, 0); _mesa_EnableVertexAttribArray(0); + clear-vbo_offset = 4096; /* Force an allocation on first use */ + return true; } @@ -166,7 +169,8 @@ struct rect { static void brw_draw_rectlist(struct gl_context *ctx, struct rect *rect, int num_instances) { - int start = 0, count = 3; + struct brw_fast_clear_state *clear = brw_context(ctx)-fast_clear_state; + int start, count = 3, offset; struct _mesa_prim prim; float verts[6]; @@ -178,8 +182,15 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect *rect, int num_instances) verts[5] = rect-y0; /* upload new vertex data */ - _mesa_BufferData(GL_ARRAY_BUFFER_ARB, sizeof(verts), verts, -GL_DYNAMIC_DRAW_ARB); + offset = clear-vbo_offset; + clear-vbo_offset += sizeof(verts); + if (clear-vbo_offset 4096) { + _mesa_BufferData(GL_ARRAY_BUFFER_ARB, 4096, NULL, GL_STREAM_DRAW_ARB); + offset = 0; + clear-vbo_offset = sizeof(verts); + } + _mesa_BufferSubData(GL_ARRAY_BUFFER_ARB, offset, sizeof(verts), verts); + start = offset / (2*sizeof(float)); if (ctx-NewState) _mesa_update_state(ctx); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/18] i965: Coalesce relocation read/write domains to a single integer
There are only a handful of distinct cache domains (less than 16), and internally the kernel simply doesn't care other than whether the object is a render target. We can therefore trim a parameter by coalescing the relocation domains to a single unsigned bitfield without loss of generality. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/brw_batch.c | 11 src/mesa/drivers/dri/i965/brw_batch.h | 19 +++-- src/mesa/drivers/dri/i965/brw_cc.c | 2 +- src/mesa/drivers/dri/i965/brw_clip_state.c | 2 +- src/mesa/drivers/dri/i965/brw_conditional_render.c | 6 ++-- src/mesa/drivers/dri/i965/brw_context.c| 14 +- src/mesa/drivers/dri/i965/brw_context.h| 6 ++-- src/mesa/drivers/dri/i965/brw_cs.cpp | 4 +-- src/mesa/drivers/dri/i965/brw_curbe.c | 2 +- src/mesa/drivers/dri/i965/brw_draw.c | 12 src/mesa/drivers/dri/i965/brw_draw_upload.c| 12 +++- src/mesa/drivers/dri/i965/brw_misc_state.c | 31 ++--- .../drivers/dri/i965/brw_performance_monitor.c | 10 +++ src/mesa/drivers/dri/i965/brw_pipe_control.c | 8 ++ src/mesa/drivers/dri/i965/brw_sampler_state.c | 2 +- src/mesa/drivers/dri/i965/brw_sf_state.c | 2 +- src/mesa/drivers/dri/i965/brw_vs_state.c | 5 ++-- src/mesa/drivers/dri/i965/brw_wm_state.c | 4 +-- src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 31 - src/mesa/drivers/dri/i965/gen6_blorp.cpp | 26 -- src/mesa/drivers/dri/i965/gen6_depth_state.c | 12 ++-- src/mesa/drivers/dri/i965/gen6_gs_state.c | 2 +- src/mesa/drivers/dri/i965/gen6_queryobj.c | 8 +++--- src/mesa/drivers/dri/i965/gen6_surface_state.c | 3 +- src/mesa/drivers/dri/i965/gen6_vs_state.c | 2 +- src/mesa/drivers/dri/i965/gen6_wm_state.c | 2 +- src/mesa/drivers/dri/i965/gen7_blorp.cpp | 13 - src/mesa/drivers/dri/i965/gen7_gs_state.c | 2 +- src/mesa/drivers/dri/i965/gen7_misc_state.c| 13 ++--- src/mesa/drivers/dri/i965/gen7_sol_state.c | 8 +++--- src/mesa/drivers/dri/i965/gen7_vs_state.c | 4 +-- src/mesa/drivers/dri/i965/gen7_wm_state.c | 2 +- src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 19 +++-- src/mesa/drivers/dri/i965/gen8_depth_state.c | 9 ++ src/mesa/drivers/dri/i965/gen8_draw_upload.c | 6 ++-- src/mesa/drivers/dri/i965/gen8_gs_state.c | 2 +- src/mesa/drivers/dri/i965/gen8_misc_state.c| 8 -- src/mesa/drivers/dri/i965/gen8_ps_state.c | 2 +- src/mesa/drivers/dri/i965/gen8_sol_state.c | 4 +-- src/mesa/drivers/dri/i965/gen8_surface_state.c | 14 -- src/mesa/drivers/dri/i965/gen8_vs_state.c | 2 +- src/mesa/drivers/dri/i965/intel_blit.c | 32 ++ src/mesa/drivers/dri/i965/intel_extensions.c | 4 +-- 43 files changed, 161 insertions(+), 221 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_batch.c b/src/mesa/drivers/dri/i965/brw_batch.c index 24e96c6..bfff9fe 100644 --- a/src/mesa/drivers/dri/i965/brw_batch.c +++ b/src/mesa/drivers/dri/i965/brw_batch.c @@ -802,8 +802,7 @@ uint64_t __brw_batch_reloc(struct brw_batch *batch, uint32_t batch_offset, struct brw_bo *target_bo, uint64_t target_offset, - unsigned read_domains, - unsigned write_domain) + unsigned domains) { assert(target_bo-refcnt); if (unlikely(target_bo-batch != batch)) { @@ -854,8 +853,8 @@ uint64_t __brw_batch_reloc(struct brw_batch *batch, batch-reloc[n].delta = target_offset; batch-reloc[n].target_handle = target_bo-target_handle; batch-reloc[n].presumed_offset = target_bo-offset; - batch-reloc[n].read_domains = read_domains; - batch-reloc[n].write_domain = write_domain; + batch-reloc[n].read_domains = domains 0x; + batch-reloc[n].write_domain = domains 16; /* If we haven't added the batch to the execobject array yet, we * will have to process all the relocations pointing to the @@ -868,7 +867,7 @@ uint64_t __brw_batch_reloc(struct brw_batch *batch, } } - if (write_domain !target_bo-dirty) { + if (domains 16 !target_bo-dirty) { assert(target_bo != batch-bo); target_bo-write.rq = batch-next_request; list_move(target_bo-write.link, batch-next_request-write); @@ -877,7 +876,7 @@ uint64_t __brw_batch_reloc(struct brw_batch *batch, target_bo-domain = DOMAIN_GPU; if (has_lut(batch)) { target_bo-exec-flags |= EXEC_OBJECT_WRITE; - if (write_domain == I915_GEM_DOMAIN_INSTRUCTION
[Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once
Move the query for the TIMESTAMP register from context init to the screen, so that it is only queried once for all contexts. On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low bits always zero. Detect this by repeating the read a few times and check the register is incrementing. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/intel_extensions.c | 6 +- src/mesa/drivers/dri/i965/intel_screen.c | 15 +++ src/mesa/drivers/dri/i965/intel_screen.h | 2 ++ 3 files changed, 18 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index 3423190..740ac81 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -282,8 +282,6 @@ intelInitExtensions(struct gl_context *ctx) } if (brw-gen = 6) { - uint64_t dummy; - ctx-Extensions.ARB_blend_func_extended = !driQueryOptionb(brw-optionCache, disable_blend_func_extended); ctx-Extensions.ARB_conditional_render_inverted = true; @@ -307,9 +305,7 @@ intelInitExtensions(struct gl_context *ctx) ctx-Extensions.EXT_transform_feedback = true; ctx-Extensions.OES_depth_texture_cube_map = true; - /* Test if the kernel has the ioctl. */ - if (drm_intel_reg_read(brw-bufmgr, TIMESTAMP, dummy) == 0) - ctx-Extensions.ARB_timer_query = true; + ctx-Extensions.ARB_timer_query = brw-intelScreen-hw_has_timestamp; /* Only enable this in core profile because other parts of Mesa behave * slightly differently when the extension is enabled. diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index f9398d7..839a984 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1123,6 +1123,20 @@ intel_detect_swizzling(struct intel_screen *screen) return true; } +static bool +intel_detect_timestamp(struct intel_screen *screen) +{ + uint64_t dummy = 0; + int loop = 10; + + do { + if (drm_intel_reg_read(screen-bufmgr, TIMESTAMP, dummy)) +return false; + } while ((dummy 0x) == 0 --loop); + + return loop 0; +} + /** * Return array of MSAA modes supported by the hardware. The array is * zero-terminated and sorted in decreasing order. @@ -1378,6 +1392,7 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp) intelScreen-hw_must_use_separate_stencil = intelScreen-devinfo-gen = 7; intelScreen-hw_has_swizzling = intel_detect_swizzling(intelScreen); + intelScreen-hw_has_timestamp = intel_detect_timestamp(intelScreen); const char *force_msaa = getenv(INTEL_FORCE_MSAA); if (force_msaa) { diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h index 742b3d3..941e0fc 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.h +++ b/src/mesa/drivers/dri/i965/intel_screen.h @@ -52,6 +52,8 @@ struct intel_screen bool hw_has_swizzling; + bool hw_has_timestamp; + /** * Does the kernel support context reset notifications? */ -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/18] i965: Move pipecontrol workaround bo to brw_pipe_control
With the exception of gen8, the sole user of the workaround bo are for emitting pipe controls. Move it out of the purview of the batchbuffer and into the pipecontrol. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/brw_context.c | 7 + src/mesa/drivers/dri/i965/brw_context.h | 12 +--- src/mesa/drivers/dri/i965/brw_pipe_control.c | 40 +++ src/mesa/drivers/dri/i965/gen8_depth_state.c | 2 +- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 12 src/mesa/drivers/dri/i965/intel_extensions.c | 28 +-- 6 files changed, 64 insertions(+), 37 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 4b51fe5..8150b94 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -819,6 +819,12 @@ brwCreateContext(gl_api api, } } + if (brw_init_pipe_control(brw, devinfo)) { + *dri_ctx_error = __DRI_CTX_ERROR_NO_MEMORY; + intelDestroyContext(driContextPriv); + return false; + } + brw_init_state(brw); intelInitExtensions(ctx); @@ -942,6 +948,7 @@ intelDestroyContext(__DRIcontext * driContextPriv) if (ctx-swrast_context) _swrast_DestroyContext(brw-ctx); + brw_fini_pipe_control(brw); intel_batchbuffer_free(brw); drm_intel_bo_unreference(brw-throttle_batch[1]); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3553f6e..db0fc48 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -868,8 +868,6 @@ struct intel_batchbuffer { drm_intel_bo *bo; /** Last BO submitted to the hardware. Used for glFinish(). */ drm_intel_bo *last_bo; - /** BO for post-sync nonzero writes for gen6 workaround. */ - drm_intel_bo *workaround_bo; uint16_t emit, total; uint16_t used, reserved_space; @@ -881,8 +879,6 @@ struct intel_batchbuffer { enum brw_gpu_ring ring; bool needs_sol_reset; - uint8_t pipe_controls_since_last_cs_stall; - struct { uint16_t used; int reloc_count; @@ -1034,6 +1030,10 @@ struct brw_context drm_intel_context *hw_ctx; + /** BO for post-sync nonzero writes for gen6 workaround. */ + drm_intel_bo *workaround_bo; + uint8_t pipe_controls_since_last_cs_stall; + /** * Set of drm_intel_bo * that have been rendered to within this batchbuffer * and would need flushing before being used from another cache domain that @@ -2000,6 +2000,10 @@ gen9_use_linear_1d_layout(const struct brw_context *brw, const struct intel_mipmap_tree *mt); /* brw_pipe_control.c */ +int brw_init_pipe_control(struct brw_context *brw, + const struct brw_device_info *info); +void brw_fini_pipe_control(struct brw_context *brw); + void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags); void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags, drm_intel_bo *bo, uint32_t offset, diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index b4c86b9..7ee3cb6 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -72,13 +72,13 @@ gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags) if (brw-gen == 7 !brw-is_haswell) { if (flags PIPE_CONTROL_CS_STALL) { /* If we're doing a CS stall, reset the counter and carry on. */ - brw-batch.pipe_controls_since_last_cs_stall = 0; + brw-pipe_controls_since_last_cs_stall = 0; return 0; } /* If this is the fourth pipe control without a CS stall, do one now. */ - if (++brw-batch.pipe_controls_since_last_cs_stall == 4) { - brw-batch.pipe_controls_since_last_cs_stall = 0; + if (++brw-pipe_controls_since_last_cs_stall == 4) { + brw-pipe_controls_since_last_cs_stall = 0; return PIPE_CONTROL_CS_STALL; } } @@ -213,7 +213,7 @@ gen7_emit_vs_workaround_flush(struct brw_context *brw) brw_emit_pipe_control_write(brw, PIPE_CONTROL_WRITE_IMMEDIATE | PIPE_CONTROL_DEPTH_STALL, - brw-batch.workaround_bo, 0, + brw-workaround_bo, 0, 0, 0); } @@ -227,7 +227,7 @@ gen7_emit_cs_stall_flush(struct brw_context *brw) brw_emit_pipe_control_write(brw, PIPE_CONTROL_CS_STALL | PIPE_CONTROL_WRITE_IMMEDIATE, - brw-batch.workaround_bo, 0, + brw-workaround_bo, 0, 0, 0); } @@ -277,7 +277,7 @@ brw_emit_post_sync_nonzero_flush(struct
[Mesa-dev] [PATCH 06/18] i965: Pass the map-mode along to intel_mipmap_tree_map_raw()
Since we can distinguish when mapping between READ and WRITE, we can pass along the map mode to avoid stalls and flushes where possible. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 28 ++- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 3 ++- 2 files changed, 17 insertions(+), 14 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index e8bbc04..fcad043 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -1401,7 +1401,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw, * * Note: the clear value for MCS buffers is all 1's, so we memset to 0xff. */ - void *data = intel_miptree_map_raw(brw, mt-mcs_mt); + void *data = intel_miptree_map_raw(brw, mt-mcs_mt, GL_MAP_WRITE_BIT); memset(data, 0xff, mt-mcs_mt-total_height * mt-mcs_mt-pitch); intel_miptree_unmap_raw(brw, mt-mcs_mt); mt-fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR; @@ -2053,13 +2053,15 @@ intel_miptree_updownsample(struct brw_context *brw, } void * -intel_miptree_map_raw(struct brw_context *brw, struct intel_mipmap_tree *mt) +intel_miptree_map_raw(struct brw_context *brw, + struct intel_mipmap_tree *mt, + GLbitfield mode) { /* CPU accesses to color buffers don't understand fast color clears, so * resolve any pending fast color clears before we map. */ intel_miptree_resolve_color(brw, mt); - return brw_bo_map(mt-bo, MAP_WRITE); + return brw_bo_map(mt-bo, mode GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ); } void @@ -2088,7 +2090,7 @@ intel_miptree_map_gtt(struct brw_context *brw, assert(y % bh == 0); y /= bh; - base = intel_miptree_map_raw(brw, mt) + mt-offset; + base = intel_miptree_map_raw(brw, mt, map-mode) + mt-offset; if (base == NULL) map-ptr = NULL; @@ -2155,7 +2157,7 @@ intel_miptree_map_blit(struct brw_context *brw, } } - map-ptr = intel_miptree_map_raw(brw, map-mt); + map-ptr = intel_miptree_map_raw(brw, map-mt, map-mode); DBG(%s: %d,%d %dx%d from mt %p (%s) %d,%d = %p/%d\n, __func__, map-x, map-y, map-w, map-h, @@ -2219,7 +2221,7 @@ intel_miptree_map_movntdqa(struct brw_context *brw, image_x += map-x; image_y += map-y; - void *src = intel_miptree_map_raw(brw, mt); + void *src = intel_miptree_map_raw(brw, mt, map-mode); if (!src) return; src += image_y * mt-pitch; @@ -2285,7 +2287,7 @@ intel_miptree_map_s8(struct brw_context *brw, */ if (!(map-mode GL_MAP_INVALIDATE_RANGE_BIT)) { uint8_t *untiled_s8_map = map-ptr; - uint8_t *tiled_s8_map = intel_miptree_map_raw(brw, mt); + uint8_t *tiled_s8_map = intel_miptree_map_raw(brw, mt, GL_MAP_READ_BIT); unsigned int image_x, image_y; intel_miptree_get_image_offset(mt, level, slice, image_x, image_y); @@ -2322,7 +2324,7 @@ intel_miptree_unmap_s8(struct brw_context *brw, if (map-mode GL_MAP_WRITE_BIT) { unsigned int image_x, image_y; uint8_t *untiled_s8_map = map-ptr; - uint8_t *tiled_s8_map = intel_miptree_map_raw(brw, mt); + uint8_t *tiled_s8_map = intel_miptree_map_raw(brw, mt, GL_MAP_WRITE_BIT); intel_miptree_get_image_offset(mt, level, slice, image_x, image_y); @@ -2377,7 +2379,7 @@ intel_miptree_unmap_etc(struct brw_context *brw, image_x += map-x; image_y += map-y; - uint8_t *dst = intel_miptree_map_raw(brw, mt) + uint8_t *dst = intel_miptree_map_raw(brw, mt, GL_MAP_WRITE_BIT) + image_y * mt-pitch + image_x * mt-cpp; @@ -2428,8 +2430,8 @@ intel_miptree_map_depthstencil(struct brw_context *brw, */ if (!(map-mode GL_MAP_INVALIDATE_RANGE_BIT)) { uint32_t *packed_map = map-ptr; - uint8_t *s_map = intel_miptree_map_raw(brw, s_mt); - uint32_t *z_map = intel_miptree_map_raw(brw, z_mt); + uint8_t *s_map = intel_miptree_map_raw(brw, s_mt, GL_MAP_READ_BIT); + uint32_t *z_map = intel_miptree_map_raw(brw, z_mt, GL_MAP_READ_BIT); unsigned int s_image_x, s_image_y; unsigned int z_image_x, z_image_y; @@ -2489,8 +2491,8 @@ intel_miptree_unmap_depthstencil(struct brw_context *brw, if (map-mode GL_MAP_WRITE_BIT) { uint32_t *packed_map = map-ptr; - uint8_t *s_map = intel_miptree_map_raw(brw, s_mt); - uint32_t *z_map = intel_miptree_map_raw(brw, z_mt); + uint8_t *s_map = intel_miptree_map_raw(brw, s_mt, GL_MAP_WRITE_BIT); + uint32_t *z_map = intel_miptree_map_raw(brw, z_mt, GL_MAP_WRITE_BIT); unsigned int s_image_x, s_image_y; unsigned int z_image_x, z_image_y; diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h index 7e91c97..6c69572 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h +++
[Mesa-dev] [PATCH 12/18] i965: Reuse any available upload space for temporary MapBufferRange blits
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/intel_buffer_objects.c | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c index 283182d..7110ce6 100644 --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c @@ -354,17 +354,13 @@ brw_map_buffer_range(struct gl_context *ctx, * guarantees the driver has advertised to the application. */ const unsigned alignment = ctx-Const.MinMapBufferAlignment; - - intel_obj-map_extra[index] = (uintptr_t) offset % alignment; - intel_obj-range_map_bo[index] = -brw_bo_create(brw-batch, - BO blit temp, - length + intel_obj-map_extra[index], - alignment, 0); + const unsigned adj = (uintptr_t) offset % alignment; obj-Mappings[index].Pointer = -brw_bo_map(intel_obj-range_map_bo[index], MAP_WRITE) + -intel_obj-map_extra[index]; +intel_upload_space(brw, length + adj, alignment, + intel_obj-range_map_bo[index], + intel_obj-map_extra[index]) + adj; + intel_obj-map_extra[index] += adj; return obj-Mappings[index].Pointer; } else { -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 13/18] util/register_allocate: Combine the BITSET arrays into a single allocation
ralloc's ability to track all pointers belonging to a context and free them in a single call does not come cheap, and we can reduce the overhead here by combining the array of BITSETs for a regset into a single allocation. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Matt Turner matts...@gmail.com Cc: Jason Ekstrand jason.ekstr...@intel.com Cc: Martin Peres martin.pe...@linux.intel.com --- src/util/register_allocate.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c index 2ad8c3c..f5f7c04 100644 --- a/src/util/register_allocate.c +++ b/src/util/register_allocate.c @@ -187,14 +187,18 @@ ra_alloc_reg_set(void *mem_ctx, unsigned int count) { unsigned int i; struct ra_regs *regs; + BITSET_WORD *conflicts; + int bitset_count = BITSET_WORDS(count); regs = rzalloc(mem_ctx, struct ra_regs); regs-count = count; regs-regs = rzalloc_array(regs, struct ra_reg, count); + conflicts = rzalloc_array(regs-regs, BITSET_WORD, count * bitset_count); for (i = 0; i count; i++) { - regs-regs[i].conflicts = rzalloc_array(regs-regs, BITSET_WORD, - BITSET_WORDS(count)); + regs-regs[i].conflicts = conflicts; + conflicts += bitset_count; + BITSET_SET(regs-regs[i].conflicts, i); regs-regs[i].conflict_list = ralloc_array(regs-regs, unsigned int, 4); @@ -379,6 +383,8 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned int count) { struct ra_graph *g; unsigned int i; + BITSET_WORD *adjacency; + int bitset_count = BITSET_WORDS(count); g = rzalloc(NULL, struct ra_graph); g-regs = regs; @@ -387,9 +393,10 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned int count) g-stack = rzalloc_array(g, unsigned int, count); + adjacency = rzalloc_array(g, BITSET_WORD, count * bitset_count); for (i = 0; i count; i++) { - int bitset_count = BITSET_WORDS(count); - g-nodes[i].adjacency = rzalloc_array(g, BITSET_WORD, bitset_count); + g-nodes[i].adjacency = adjacency; + adjacency += bitset_count; g-nodes[i].adjacency_list_size = 4; g-nodes[i].adjacency_list = -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images
On Sun, Jul 5, 2015 at 3:47 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Sun, Jul 5, 2015 at 9:25 AM, Marek Olšák mar...@gmail.com wrote: From: Marek Olšák marek.ol...@amd.com Other approaches are being considered: 1) Don't use resource wrappers (views) and pass all view parameters (format, layer range, level) to set_shader_images just like set_vertex_buffers, set_constant_buffer, or even glBindImageTexture do. 2) Use pipe_sampler_view instead of pipe_image_view, and maybe even use set_sampler_views instead of set_shader_images. set_sampler_views would have to use start_slot = PIPE_MAX_SAMPLERS for all writable images to allow for OpenGL textures in the lower slots. I wouldn't like to go back to using pipe_surface, because it doesn't fit into the DX12 hw design. Please discuss. --- diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h index c2eedf8..13f3938 100644 --- a/src/gallium/include/pipe/p_context.h +++ b/src/gallium/include/pipe/p_context.h @@ -48,6 +48,7 @@ struct pipe_depth_stencil_alpha_state; struct pipe_draw_info; struct pipe_fence_handle; struct pipe_framebuffer_state; +struct pipe_image_view; struct pipe_index_buffer; struct pipe_query; struct pipe_poly_stipple; @@ -236,20 +237,21 @@ struct pipe_context { const float default_inner_level[2]); /** -* Bind an array of shader resources that will be used by the -* graphics pipeline. Any resources that were previously bound to -* the specified range will be unbound after this call. +* Bind an array of images that will be used by a shader. +* Any imagse that were previously bound to the specified range +* will be unbound. * -* \param start first resource to bind. -* \param count number of consecutive resources to bind. -* \param resources array of pointers to the resources to bind, it +* \param shader shader stage where the images will be bound. +* \param start_slot first iamage slot to bind. +* \param count number of consecutive images to bind. +* \param buffersarray of pointers to the images to bind, it * should contain at least \a count elements -* unless it's NULL, in which case no new -* resources will be bound. +* unless it's NULL, in which case no images will +* be bound. */ - void (*set_shader_resources)(struct pipe_context *, -unsigned start, unsigned count, -struct pipe_surface **resources); + void (*set_shader_images)(struct pipe_context *, unsigned shader, + unsigned start_slot, unsigned count, + struct pipe_image_view *images); On Fermi, there is only one set of 8 images that can be bound, irrespective of stage. There are a different set of 8 bindings for 3d and compute pipelines though. Kepler+ is all bindless so it can do whatever. Evergreen only has 12 images shared by fragment and compute. Other shaders can't access them (I think). That's old DX11 hardware though. SI+ is all bindless too. Even though I originally advocated for the stage argument, it appears to neither match Fermi hardware nor the GL API where I believe images are also in a global namespace. Textures are in a global namespace too, but that doesn't make them global in shaders. It really depends on whether combined=vertex=fragment=... or if combined=vertex+fragment+ Now that you brought this up, I've checked what Catalyst reports and indeed it's the former for all SSBOs, images, and atomics. For simplicity and less needed space for binding slots and less overhead managing them, we should follow that design too. So let's remove the shader argument from set_shader_images and set_shader_buffers. void (*set_vertex_buffers)( struct pipe_context *, unsigned start_slot, @@ -392,6 +394,17 @@ struct pipe_context { struct pipe_surface *); /** +* Create an image view into a buffer or texture to be used with load, +* store, and atomic instructions by a shader stage. +*/ + struct pipe_image_view * (*create_image_view)(struct pipe_context *ctx, + struct pipe_resource *texture, + const struct pipe_image_view *templat); + + void (*image_view_destroy)(struct pipe_context *ctx, + struct pipe_image_view *view); + + /** * Map a resource. * * Transfers are (by default) context-private and allow uploads to be diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h index dc8f550..f655dda 100644 ---
Re: [Mesa-dev] [PATCH RFC] egl/dri2: Add dri3 support to x11 platform
On 4 July 2015 at 04:18, Boyan Ding boyan.j.d...@gmail.com wrote: Hi Emil, On 07/03/2015 10:36 PM, Emil Velikov wrote: Hi Boyan, Thank you for doing this ! A few suggestions which you might be interesting: Considering that the backend has handled more than dri2 perhaps we can do a s/dri2/dri/ :-) That obviously is independent of your work. On 01/07/15 16:31, Boyan Ding wrote: Signed-off-by: Boyan Ding boyan.j.d...@gmail.com --- configure.ac |3 + src/egl/drivers/dri2/Makefile.am |5 + src/egl/drivers/dri2/egl_dri2.c | 65 +- src/egl/drivers/dri2/egl_dri2.h | 12 +- src/egl/drivers/dri2/platform_x11.c | 127 ++- src/egl/drivers/dri2/platform_x11_dri3.c | 1591 ++ src/egl/drivers/dri2/platform_x11_dri3.h | 140 +++ 7 files changed, 1926 insertions(+), 17 deletions(-) create mode 100644 src/egl/drivers/dri2/platform_x11_dri3.c create mode 100644 src/egl/drivers/dri2/platform_x11_dri3.h diff --git a/configure.ac b/configure.ac index af61aa2..090e6c9 100644 --- a/configure.ac +++ b/configure.ac @@ -1545,6 +1545,9 @@ if test x$enable_egl = xyes; then if test x$enable_shared_glapi = xno; then AC_MSG_ERROR([egl_dri2 requires --enable-shared-glapi]) fi +if test x$enable_dri3 = xyes; then +EGL_LIB_DEPS=$EGL_LIB_DEPS -lxcb-dri3 -lxcb-present -lXxf86vm -lxshmfence -lxcb-sync Neither one of these should be a direct -lfoo expression. We should check/use PKG_CHECK_MODULES and foo_{CFLAGS,LIBS}. I'll correct that. +fi else # Avoid building an empty libEGL. Drop/update this # when other backends (haiku?) come along. diff --git a/src/egl/drivers/dri2/Makefile.am b/src/egl/drivers/dri2/Makefile.am index 55be4a7..d5b511c 100644 --- a/src/egl/drivers/dri2/Makefile.am +++ b/src/egl/drivers/dri2/Makefile.am @@ -52,6 +52,11 @@ if HAVE_EGL_PLATFORM_X11 libegl_dri2_la_SOURCES += platform_x11.c AM_CFLAGS += -DHAVE_X11_PLATFORM AM_CFLAGS += $(XCB_DRI2_CFLAGS) +if HAVE_DRI3 +libegl_dri2_la_SOURCES += \ +platform_x11_dri3.c \ +platform_x11_dri3.h +endif endif if HAVE_EGL_PLATFORM_WAYLAND diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index b1b65f7..052c579 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -322,6 +322,12 @@ struct dri2_extension_match { int offset; }; +static struct dri2_extension_match dri3_driver_extensions[] = { + { __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) }, + { __DRI_IMAGE_DRIVER, 1, offsetof(struct dri2_egl_display, image_driver) }, + { NULL, 0, 0 } +}; + static struct dri2_extension_match dri2_driver_extensions[] = { { __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) }, { __DRI_DRI2, 2, offsetof(struct dri2_egl_display, dri2) }, @@ -464,6 +470,24 @@ dri2_open_driver(_EGLDisplay *disp) } EGLBoolean +dri2_load_driver_dri3(_EGLDisplay *disp) dri3_load_driver perhaps ? I prefixed all my functions in public files with dri2 instead of dri3 because the EGL driver is currently called DRI2 (although it seems funny to see dri3 in a driver called dri2). The current name of the driver is DRI2 because it uses dri2 to talk with the X server and uses similar technologies with direct rendering on other platforms. With my patch, the first reason becomes invalid. But should the name of the driver or namespace of all functions (or just the functions involved with dri3) change? I'm open to others' suggestions. The same applies to comments on function names below. Iirc for swrast based wayland (platform_wayland.c) we already remove the it uses dri2 assumption. Then again all this is a simple name change which, as mentioned before, is not strictly related to your work. [snip] diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h index f0cc6da..d258753 100644 --- a/src/egl/drivers/dri2/egl_dri2.h +++ b/src/egl/drivers/dri2/egl_dri2.h @@ -153,11 +153,16 @@ struct dri2_egl_display int dri2_major; int dri2_minor; + int dri3_major; + int dri3_minor; + int present_major; + int present_minor; Many of these are unused. Same goes for the glx code which was the inspiration for this work :-) I think it's safe to remove them then. __DRIscreen *dri_screen; int own_dri_screen; const __DRIconfig **driver_configs; void *driver; const __DRIcoreExtension *core; + const __DRIimageDriverExtension *image_driver; const __DRIdri2Extension *dri2; const __DRIswrastExtension *swrast; const
[Mesa-dev] i965: Context local batch manager, take 2
Since the last posting, it has grown a few superficial patches for tweaks in the general area and a couple of drive-bys from looking at the synmark profiles. As for the main patch, I've worked through the few piglit regressions and it should be clean on ivb/byt/hsw/bdw to the best of my knowledge, and thanks to Jesse's prodding I have written substantial comments to try and explain the intricacies of the kernel interface we are trying to optimise. Please review kindly, -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 18/18] i965: AMD_pinned_memory and userptr
All GEN GPU can bind to any piece of memory (thanks UMA), and so through a special ioctl we can map a chunk of page-aligned client memory into the GPU address space. However, not all GEN are equal. Some have cache-coherency between the CPU and the GPU, whilst the others are incoherent and rely on snooping on explicit flushes to push/pull dirty data. Whereas we can use client buffers as a general replacement for kernel allocated buffers with LLC (cache coherency), using snooped buffers behaves differently and so must be used with care. AMD_pinned_memory supposes that the client memory buffer is suitable for any general usage (e.g. vertex data, texture data) and so only on LLC can we offer that extension. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/brw_batch.c| 36 + src/mesa/drivers/dri/i965/brw_batch.h| 8 + src/mesa/drivers/dri/i965/intel_buffer_objects.c | 40 +--- src/mesa/drivers/dri/i965/intel_extensions.c | 8 + src/mesa/drivers/dri/i965/intel_screen.c | 14 + src/mesa/drivers/dri/i965/intel_screen.h | 2 +- 6 files changed, 96 insertions(+), 12 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_batch.c b/src/mesa/drivers/dri/i965/brw_batch.c index e01a0c4..1f73148 100644 --- a/src/mesa/drivers/dri/i965/brw_batch.c +++ b/src/mesa/drivers/dri/i965/brw_batch.c @@ -1323,6 +1323,42 @@ struct brw_bo *brw_bo_create(struct brw_batch *batch, return bo; } +/* + * Wrap the chunk of client memory given by ptr+size inside a GPU + * buffer, and make it cache coherent (though on non-LLC architectures + * this requires snooping on explicit cache flushes). This allows the + * caller to write into the memory chunk and for those writes to be + * visible on the GPU (exactly as if they create the buffer and then + * persistently mapped it to obtain the pointer). + */ +struct brw_bo *brw_bo_create_userptr(struct brw_batch *batch, + const char *name, + void *ptr, + uint64_t size, + uint64_t alignment) +{ + drm_intel_bo *base; + struct brw_bo *bo; + + base = drm_intel_bo_alloc_userptr(batch-bufmgr, name, + ptr, I915_TILING_NONE, 0, size, 0); + if (base == NULL) + return NULL; + + base-align = alignment; + bo = brw_bo_import(batch, base, false); + if (bo == NULL) { + drm_intel_bo_unreference(base); + return NULL; + } + + bo-cache_coherent = true; + bo-reusable = false; + list_move(bo-link, bo-batch-inactive); + + return bo; +} + static bool __brw_bo_set_caching(struct brw_bo *bo, int caching) { struct drm_i915_gem_caching arg; diff --git a/src/mesa/drivers/dri/i965/brw_batch.h b/src/mesa/drivers/dri/i965/brw_batch.h index 3628b03..3ee9b20 100644 --- a/src/mesa/drivers/dri/i965/brw_batch.h +++ b/src/mesa/drivers/dri/i965/brw_batch.h @@ -215,6 +215,14 @@ brw_bo_create_tiled(struct brw_batch *batch, uint32_t *pitch, unsigned flags); +/* Create a local brw_bo for GPU access to client memory */ +struct brw_bo * +brw_bo_create_userptr(struct brw_batch *batch, + const char *name, + void *ptr, + uint64_t size, + uint64_t alignment); + /* Create a local brw_bo for a foreign buffer using its global flinked name */ struct brw_bo *brw_bo_create_from_name(struct brw_batch *batch, const char *name, diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c index 7110ce6..03d0982 100644 --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c @@ -58,14 +58,10 @@ mark_buffer_inactive(struct intel_buffer_object *intel_obj) intel_obj-gpu_active_end = 0; } -/** Allocates a new brw_bo to store the data for the buffer object. */ static void -alloc_buffer_object(struct brw_context *brw, -struct intel_buffer_object *intel_obj) +mark_new_state(struct brw_context *brw, + struct intel_buffer_object *intel_obj) { - intel_obj-buffer = - brw_bo_create(brw-batch, bufferobj, intel_obj-Base.Size, 64, 0); - /* the buffer might be bound as a uniform buffer, need to update it */ if (intel_obj-Base.UsageHistory USAGE_UNIFORM_BUFFER) @@ -74,8 +70,18 @@ alloc_buffer_object(struct brw_context *brw, brw-ctx.NewDriverState |= BRW_NEW_TEXTURE_BUFFER; if (intel_obj-Base.UsageHistory USAGE_ATOMIC_COUNTER_BUFFER) brw-ctx.NewDriverState |= BRW_NEW_ATOMIC_BUFFER; +} + +/** Allocates a new brw_bo to store the data for the buffer object. */ +static void +alloc_buffer_object(struct brw_context *brw, +struct
Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images
Am 05.07.2015 um 15:47 schrieb Ilia Mirkin: On Sun, Jul 5, 2015 at 9:25 AM, Marek Olšák mar...@gmail.com wrote: From: Marek Olšák marek.ol...@amd.com Other approaches are being considered: 1) Don't use resource wrappers (views) and pass all view parameters (format, layer range, level) to set_shader_images just like set_vertex_buffers, set_constant_buffer, or even glBindImageTexture do. 2) Use pipe_sampler_view instead of pipe_image_view, and maybe even use set_sampler_views instead of set_shader_images. set_sampler_views would have to use start_slot = PIPE_MAX_SAMPLERS for all writable images to allow for OpenGL textures in the lower slots. I wouldn't like to go back to using pipe_surface, because it doesn't fit into the DX12 hw design. Please discuss. --- diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h index c2eedf8..13f3938 100644 --- a/src/gallium/include/pipe/p_context.h +++ b/src/gallium/include/pipe/p_context.h @@ -48,6 +48,7 @@ struct pipe_depth_stencil_alpha_state; struct pipe_draw_info; struct pipe_fence_handle; struct pipe_framebuffer_state; +struct pipe_image_view; struct pipe_index_buffer; struct pipe_query; struct pipe_poly_stipple; @@ -236,20 +237,21 @@ struct pipe_context { const float default_inner_level[2]); /** -* Bind an array of shader resources that will be used by the -* graphics pipeline. Any resources that were previously bound to -* the specified range will be unbound after this call. +* Bind an array of images that will be used by a shader. +* Any imagse that were previously bound to the specified range +* will be unbound. * -* \param start first resource to bind. -* \param count number of consecutive resources to bind. -* \param resources array of pointers to the resources to bind, it +* \param shader shader stage where the images will be bound. +* \param start_slot first iamage slot to bind. +* \param count number of consecutive images to bind. +* \param buffersarray of pointers to the images to bind, it * should contain at least \a count elements -* unless it's NULL, in which case no new -* resources will be bound. +* unless it's NULL, in which case no images will +* be bound. */ - void (*set_shader_resources)(struct pipe_context *, -unsigned start, unsigned count, -struct pipe_surface **resources); + void (*set_shader_images)(struct pipe_context *, unsigned shader, + unsigned start_slot, unsigned count, + struct pipe_image_view *images); On Fermi, there is only one set of 8 images that can be bound, irrespective of stage. There are a different set of 8 bindings for 3d and compute pipelines though. Kepler+ is all bindless so it can do whatever. Even though I originally advocated for the stage argument, it appears to neither match Fermi hardware nor the GL API where I believe images are also in a global namespace. That looks exactly like d3d11 restrictions. UAVs were only in pixel shaders and compute shaders. d3d11.1 made them available to all shader stages though, but they continue to share the same bind points in all shader stages (and still share them with render targets). I don't really know though what's the better design. GL though seems to think this is per shader stage (there are both per-stage and combined limits just like as for textures). void (*set_vertex_buffers)( struct pipe_context *, unsigned start_slot, @@ -392,6 +394,17 @@ struct pipe_context { struct pipe_surface *); /** +* Create an image view into a buffer or texture to be used with load, +* store, and atomic instructions by a shader stage. +*/ + struct pipe_image_view * (*create_image_view)(struct pipe_context *ctx, + struct pipe_resource *texture, + const struct pipe_image_view *templat); + + void (*image_view_destroy)(struct pipe_context *ctx, + struct pipe_image_view *view); + + /** * Map a resource. * * Transfers are (by default) context-private and allow uploads to be diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h index dc8f550..f655dda 100644 --- a/src/gallium/include/pipe/p_state.h +++ b/src/gallium/include/pipe/p_state.h @@ -386,6 +386,31 @@ struct pipe_sampler_view /** + * A view into a writable buffer or texture that can be bound to a shader + * stage. + */ +struct pipe_image_view +{ + struct pipe_reference
[Mesa-dev] [PATCH 14/18] util/register_allocate: Compute transitive conflicts using 2-passes
Avoid frequent use of reralloc() for tracking the conflicts list, and walking that list every time we add a transitive conflict, by making the observation we apply the indirect conflicts by combining the conflicts of a conflicting register in a second pass. Reduces brw_compiler_create() from 18351.5us to 4787.1us on my ivb i7-3720QM (in context that 18ms represents about 50% of the time it takes to start X, though why X instantiates an intel_screen at all remains a mystery). Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Matt Turner matts...@gmail.com Cc: Jason Ekstrand jason.ekstr...@intel.com Cc: Martin Peres martin.pe...@linux.intel.com --- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 18 +++- .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 16 ++- src/util/register_allocate.c | 53 +- src/util/register_allocate.h | 2 + 4 files changed, 64 insertions(+), 25 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp index 8e5621d..7f87221 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp @@ -223,7 +223,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int reg_width) for (int base_reg = j; base_reg j + (class_sizes[i] + 1) / 2; base_reg++) { - ra_add_transitive_reg_conflict(regs, base_reg, reg); + ra_mark_transitive_reg_conflict(regs, base_reg, reg); } reg++; @@ -237,7 +237,7 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int reg_width) for (int base_reg = j; base_reg j + class_sizes[i]; base_reg++) { - ra_add_transitive_reg_conflict(regs, base_reg, reg); + ra_mark_transitive_reg_conflict(regs, base_reg, reg); } reg++; @@ -246,6 +246,20 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int reg_width) } assert(reg == ra_reg_count); + reg = 0; + for (int i = 0; i class_count; i++) { + int class_size = class_sizes[i]; + int class_reg_count = base_reg_count - (class_size - 1); + if (devinfo-gen = 5 reg_width == 2) + class_size = (class_size + 1) / 2; + for (int j = 0; j class_reg_count; j++) { +for (int base_reg = j; base_reg j + class_size; base_reg++) + ra_add_transitive_reg_conflict(regs, base_reg, reg); +reg++; + } + } + assert(reg == ra_reg_count); + /* Add a special class for aligned pairs, which we'll put delta_xy * in on Gen = 6 so that we can do PLN. */ diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp index 555c42e..93b7297 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp @@ -140,7 +140,7 @@ brw_vec4_alloc_reg_set(struct brw_compiler *compiler) for (int base_reg = j; base_reg j + class_sizes[i]; base_reg++) { - ra_add_transitive_reg_conflict(compiler-vec4_reg_set.regs, base_reg, reg); + ra_mark_transitive_reg_conflict(compiler-vec4_reg_set.regs, base_reg, reg); } reg++; @@ -158,6 +158,20 @@ brw_vec4_alloc_reg_set(struct brw_compiler *compiler) } assert(reg == ra_reg_count); + reg = 0; + for (int i = 0; i class_count; i++) { + int class_reg_count = base_reg_count - (class_sizes[i] - 1); + for (int j = 0; j class_reg_count; j++) { +for (int base_reg = j; + base_reg j + class_sizes[i]; + base_reg++) { + ra_add_transitive_reg_conflict(compiler-vec4_reg_set.regs, base_reg, reg); +} +reg++; + } + } + assert(reg == ra_reg_count); + ra_set_finalize(compiler-vec4_reg_set.regs, q_values); for (int i = 0; i MAX_VGRF_SIZE; i++) diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c index f5f7c04..2bbab7f 100644 --- a/src/util/register_allocate.c +++ b/src/util/register_allocate.c @@ -83,19 +83,17 @@ struct ra_reg { BITSET_WORD *conflicts; - unsigned int *conflict_list; - unsigned int conflict_list_size; - unsigned int num_conflicts; + unsigned int conflict_range[2]; }; struct ra_regs { struct ra_reg *regs; - unsigned int count; struct ra_class **classes; unsigned int class_count; bool round_robin; + unsigned int count; }; struct ra_class { @@ -200,11 +198,8 @@ ra_alloc_reg_set(void *mem_ctx, unsigned int count) conflicts += bitset_count; BITSET_SET(regs-regs[i].conflicts, i); - - regs-regs[i].conflict_list = ralloc_array(regs-regs, unsigned int, 4); - regs-regs[i].conflict_list_size = 4; - regs-regs[i].conflict_list[0] = i; -
[Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use
The vertices require a large chunk of memory, currently allocated during context creation. However, this memory is not required until use so we can defer the allocation until the first swrast_Wakeup(). Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Kenneth Graunke kenn...@whitecape.org --- src/mesa/swrast_setup/ss_context.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/mesa/swrast_setup/ss_context.c b/src/mesa/swrast_setup/ss_context.c index 74b1da3..028eccb 100644 --- a/src/mesa/swrast_setup/ss_context.c +++ b/src/mesa/swrast_setup/ss_context.c @@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx ) swsetup-NewState = ~0; _swsetup_trifuncs_init( ctx ); - _tnl_init_vertices( ctx, ctx-Const.MaxArrayLockSize + 12, - sizeof(SWvertex) ); - - return GL_TRUE; } @@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx ) TNLcontext *tnl = TNL_CONTEXT(ctx); SScontext *swsetup = SWSETUP_CONTEXT(ctx); + if (!(GET_VERTEX_STATE(ctx))-max_vertex_size) + _tnl_init_vertices(ctx, +ctx-Const.MaxArrayLockSize + 12, +sizeof(SWvertex)); + tnl-Driver.Render.Start = _swsetup_RenderStart; tnl-Driver.Render.Finish = _swsetup_RenderFinish; tnl-Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 17/18] loader: Look for any version of currently linked libudev.so
Since there was an ABI break and linking twice against libudev.so.0 and libudev.so.1 causes the application to quickly crash, we first check if the application is currently linked against libudev before dlopening a local handle. However for backwards/forwards compatability, we need to inspect the application for current linkage against all known versions first. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/loader/loader.c | 46 -- 1 file changed, 28 insertions(+), 18 deletions(-) diff --git a/src/loader/loader.c b/src/loader/loader.c index 8452cd3..8780587 100644 --- a/src/loader/loader.c +++ b/src/loader/loader.c @@ -128,26 +128,36 @@ static void *udev_handle = NULL; static void * udev_dlopen_handle(void) { - if (!udev_handle) { - udev_handle = dlopen(libudev.so.1, RTLD_LOCAL | RTLD_LAZY); - - if (!udev_handle) { - /* libudev.so.1 changed the return types of the two unref functions - * from voids to pointers. We don't use those return values, and the - * only ABI I've heard that cares about this kind of change (calling - * a function with a void * return that actually only returns void) - * might be ia64. - */ - udev_handle = dlopen(libudev.so.0, RTLD_LOCAL | RTLD_LAZY); - - if (!udev_handle) { -log_(_LOADER_WARNING, Couldn't dlopen libudev.so.1 or - libudev.so.0, driver detection may be broken.\n); - } + char name[80]; + unsigned flags = RTLD_NOLOAD | RTLD_LOCAL | RTLD_LAZY; + int version; + + /* libudev.so.1 changed the return types of the two unref functions +* from voids to pointers. We don't use those return values, and the +* only ABI I've heard that cares about this kind of change (calling +* a function with a void * return that actually only returns void) +* might be ia64. +*/ + + /* First try opening an already linked libudev, then try loading one */ + do { + for (version = 1; version = 0; version--) { + snprintf(name, sizeof(name), libudev.so.%d, version); + udev_handle = dlopen(name, flags); + if (udev_handle) +return udev_handle; } - } - return udev_handle; + if ((flags RTLD_NOLOAD) == 0) + break; + + flags = ~RTLD_NOLOAD; + } while (1); + + log_(_LOADER_WARNING, +Couldn't dlopen libudev.so.1 or +libudev.so.0, driver detection may be broken.\n); + return NULL; } static int dlsym_failed = 0; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 16/18] i965: Prevent coordinate overflow in intel_emit_linear_blit
Fixes regression from commit 8c17d53823c77ac1c56b0548e4e54f69a33285f1 Author: Kenneth Graunke kenn...@whitecape.org Date: Wed Apr 15 03:04:33 2015 -0700 i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions. which adjusted the coordinates to be relative to the nearest cacheline. However, this then offsets the coordinates by up to 63 and this may then cause them to overflow the BLT limits. For the well aligned large transfer case, we can use 32bpp pixels and so reduce the coordinates by 4 (versus the current 8bpp pixels). We also have to be more careful doing the last line just in case it may exceed the coordinate limit. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Kenneth Graunke kenn...@whitecape.org Cc: Ian Romanick ian.d.roman...@intel.com Cc: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/dri/i965/intel_blit.c | 62 -- 1 file changed, 29 insertions(+), 33 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_blit.c b/src/mesa/drivers/dri/i965/intel_blit.c index 7e81685..dbb53d8 100644 --- a/src/mesa/drivers/dri/i965/intel_blit.c +++ b/src/mesa/drivers/dri/i965/intel_blit.c @@ -762,35 +762,20 @@ intel_emit_linear_blit(struct brw_context *brw, int16_t src_x, dst_x; bool ok; - /* The pitch given to the GPU must be DWORD aligned, and -* we want width to match pitch. Max width is (1 15 - 1), -* rounding that down to the nearest DWORD is 1 15 - 4 -*/ - pitch = ROUND_DOWN_TO(MIN2(size, (1 15) - 1), 4); - height = (pitch == 0) ? 1 : size / pitch; - src_x = src_offset % 64; - dst_x = dst_offset % 64; - ok = intelEmitCopyBlit(brw, 1, - pitch, src_bo, src_offset - src_x, I915_TILING_NONE, - INTEL_MIPTREE_TRMODE_NONE, - pitch, dst_bo, dst_offset - dst_x, I915_TILING_NONE, - INTEL_MIPTREE_TRMODE_NONE, - src_x, 0, /* src x/y */ - dst_x, 0, /* dst x/y */ - pitch, height, /* w, h */ - GL_COPY); - if (!ok) - _mesa_problem(ctx, Failed to linear blit %dx%d\n, pitch, height); - - src_offset += pitch * height; - dst_offset += pitch * height; - src_x = src_offset % 64; - dst_x = dst_offset % 64; - size -= pitch * height; - assert (size (1 15)); - pitch = ALIGN(size, 4); - - if (size != 0) { + do { + /* The pitch given to the GPU must be DWORD aligned, and + * we want width to match pitch. Max width is (1 15 - 1), + * rounding that down to the nearest DWORD is 1 15 - 4 + */ + pitch = ROUND_DOWN_TO(MIN2(size, (1 15) - 64), 4); + height = (size pitch || pitch == 0) ? 1 : size / pitch; + + src_x = src_offset % 64; + dst_x = dst_offset % 64; + pitch = ALIGN(MIN2(size, (1 15) - 64), 4); + assert(src_x + pitch 1 15); + assert(dst_x + pitch 1 15); + ok = intelEmitCopyBlit(brw, 1, pitch, src_bo, src_offset - src_x, I915_TILING_NONE, INTEL_MIPTREE_TRMODE_NONE, @@ -798,11 +783,22 @@ intel_emit_linear_blit(struct brw_context *brw, INTEL_MIPTREE_TRMODE_NONE, src_x, 0, /* src x/y */ dst_x, 0, /* dst x/y */ -size, 1, /* w, h */ +MIN2(size, pitch), height, /* w, h */ GL_COPY); - if (!ok) - _mesa_problem(ctx, Failed to linear blit %dx%d\n, size, 1); - } + if (!ok) { +_mesa_problem(ctx, Failed to linear blit %dx%d\n, + MIN2(size, pitch), height); +return; + } + + pitch *= height; + if (size = pitch) +return; + + src_offset += pitch; + dst_offset += pitch; + size -= pitch; + } while (1); } /** -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] clover: Implement image attribute getters
Zoltán Gilián zoltan.gil...@gmail.com writes: This seems to be doing essentially the same thing as v1? Is it the right patch? The llvm pass was invoked in clover in v1. This patch relies on llvm to perform that task (). What this patch does basically is that it adds the image attributes to the end of the kernel input vector. The commit message of this patch is misleading, I'll fix it. NAK. Just like in v1, you're implementing the same pipe driver-specific policy in Clover's core layer -- If you don't feel like fixing this properly as I described in my reply to v1, it would be acceptable to implement it for the time being using a workaround similar to llvm/invocation.cpp:433 -- Hint: you'll need new module::argument::semantic enums. Thanks. On Wed, Jun 24, 2015 at 2:48 PM, Francisco Jerez curroje...@riseup.net wrote: Zoltan Gilian zoltan.gil...@gmail.com writes: Image attributes are passed to the kernel as hidden parameters after the image attribute itself. An llvm pass replaces the getter builtins to the appropriate parameters. This seems to be doing essentially the same thing as v1? Is it the right patch? --- src/gallium/state_trackers/clover/core/kernel.cpp | 26 +++ src/gallium/state_trackers/clover/core/kernel.hpp | 13 ++-- src/gallium/state_trackers/clover/core/memory.cpp | 2 +- .../state_trackers/clover/llvm/invocation.cpp | 81 +- 4 files changed, 116 insertions(+), 6 deletions(-) diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp index 0756f06..291c799 100644 --- a/src/gallium/state_trackers/clover/core/kernel.cpp +++ b/src/gallium/state_trackers/clover/core/kernel.cpp @@ -185,6 +185,13 @@ kernel::exec_context::bind(intrusive_ptrcommand_queue _q, } } + // Bind image attribute args. + for (const auto arg: kern._args) { + if (auto img_arg = dynamic_castimage_argument*(arg.get())) { + img_arg-bind_attributes(*this); + } + } + // Create a new compute state if anything changed. if (!st || q != _q || cs.req_local_mem != mem_local || @@ -465,6 +472,25 @@ kernel::constant_argument::unbind(exec_context ctx) { } void +kernel::image_argument::bind_attributes(exec_context ctx) { + cl_image_format format = img-format(); + cl_uint attributes[] = { + static_castcl_uint(img-width()), + static_castcl_uint(img-height()), + static_castcl_uint(img-depth()), + format.image_channel_data_type, + format.image_channel_order}; + for (unsigned i = 0; i 5; ++i) { + auto v = bytes(attributes[i]); + + extend(v, module::argument::zero_ext, sizeof(cl_uint)); + byteswap(v, ctx.q-device().endianness()); + align(ctx.input, sizeof(cl_uint)); + insert(ctx.input, v); + } +} + +void kernel::image_rd_argument::set(size_t size, const void *value) { if (size != sizeof(cl_mem)) throw error(CL_INVALID_ARG_SIZE); diff --git a/src/gallium/state_trackers/clover/core/kernel.hpp b/src/gallium/state_trackers/clover/core/kernel.hpp index d6432a4..8c15b2f 100644 --- a/src/gallium/state_trackers/clover/core/kernel.hpp +++ b/src/gallium/state_trackers/clover/core/kernel.hpp @@ -190,7 +190,14 @@ namespace clover { pipe_surface *st; }; - class image_rd_argument : public argument { + class image_argument : public argument { + public: + void bind_attributes(exec_context ctx); + protected: + image *img; + }; + + class image_rd_argument : public image_argument { public: virtual void set(size_t size, const void *value); virtual void bind(exec_context ctx, @@ -198,11 +205,10 @@ namespace clover { virtual void unbind(exec_context ctx); private: - image *img; pipe_sampler_view *st; }; - class image_wr_argument : public argument { + class image_wr_argument : public image_argument { public: virtual void set(size_t size, const void *value); virtual void bind(exec_context ctx, @@ -210,7 +216,6 @@ namespace clover { virtual void unbind(exec_context ctx); private: - image *img; pipe_surface *st; }; diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp index 055336a..b852e68 100644 --- a/src/gallium/state_trackers/clover/core/memory.cpp +++ b/src/gallium/state_trackers/clover/core/memory.cpp @@ -189,7 +189,7 @@ image2d::image2d(clover::context ctx, cl_mem_flags flags, const cl_image_format *format, size_t width, size_t height, size_t row_pitch, void *host_ptr) : - image(ctx, flags, format, width, height, 0, + image(ctx, flags, format, width, height, 1,
[Mesa-dev] [PATCH] radeonsi: Use param export count from si_llvm_export_vs in si_shader_vs
From: Michel Dänzer michel.daen...@amd.com This eliminates the error prone logic in si_shader_vs recalculating this value. It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being counted for VS exports, since they are passed to the fragment shader as varyings as well. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193 Signed-off-by: Michel Dänzer michel.daen...@amd.com --- src/gallium/drivers/radeonsi/si_shader.c| 2 ++ src/gallium/drivers/radeonsi/si_shader.h| 1 + src/gallium/drivers/radeonsi/si_state_shaders.c | 25 +++-- 3 files changed, 6 insertions(+), 22 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 4d97b58..753b238 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -1218,6 +1218,8 @@ handle_semantic: } } + shader-nr_param_exports = param_count; + /* We need to add the position output manually if it's missing. */ if (!pos_args[0][0]) { pos_args[0][0] = lp_build_const_int32(base-gallivm, 0xf); /* writemask */ diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h index b4339ae..8d309b4 100644 --- a/src/gallium/drivers/radeonsi/si_shader.h +++ b/src/gallium/drivers/radeonsi/si_shader.h @@ -165,6 +165,7 @@ struct si_shader { booluses_instanceid; unsignednr_pos_exports; + unsignednr_param_exports; boolis_gs_copy_shader; booldx10_clamp_mode; /* convert NaNs to 0 */ }; diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index eef3baa..a842d9d 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -148,10 +148,9 @@ static void si_shader_gs(struct si_shader *shader) static void si_shader_vs(struct si_shader *shader) { - struct tgsi_shader_info *info = shader-selector-info; struct si_pm4_state *pm4; unsigned num_sgprs, num_user_sgprs; - unsigned nparams, i, vgpr_comp_cnt; + unsigned nparams, vgpr_comp_cnt; uint64_t va; unsigned window_space = shader-selector-info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION]; @@ -180,26 +179,8 @@ static void si_shader_vs(struct si_shader *shader) } assert(num_sgprs = 104); - /* Certain attributes (position, psize, etc.) don't count as params. -* VS is required to export at least one param and r600_shader_from_tgsi() -* takes care of adding a dummy export. -*/ - for (nparams = 0, i = 0 ; i info-num_outputs; i++) { - switch (info-output_semantic_name[i]) { - case TGSI_SEMANTIC_CLIPVERTEX: - case TGSI_SEMANTIC_CLIPDIST: - case TGSI_SEMANTIC_CULLDIST: - case TGSI_SEMANTIC_POSITION: - case TGSI_SEMANTIC_PSIZE: - case TGSI_SEMANTIC_EDGEFLAG: - break; - default: - nparams++; - } - } - if (nparams 1) - nparams = 1; - + /* VS is required to export at least one param. */ + nparams = MAX2(shader-nr_param_exports, 1); si_pm4_set_reg(pm4, R_0286C4_SPI_VS_OUT_CONFIG, S_0286C4_VS_EXPORT_COUNT(nparams - 1)); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90821] Segfault when calling glViewport on surfaceless EGL context without bound FBO
https://bugs.freedesktop.org/show_bug.cgi?id=90821 Philip Derrin philip+freedesk...@breakawayconsulting.com.au changed: What|Removed |Added CC||philip+freedesktop@breakawa ||yconsulting.com.au --- Comment #4 from Philip Derrin philip+freedesk...@breakawayconsulting.com.au --- I encountered this too, on i965. It happens because intel_viewport calls _mesa_is_winsys_fbo to determine whether the draw buffer has an associated DRI2 drawable, but that predicate returns true for the global IncompleteFramebuffer which is used by surfaceless contexts, which obviously has no DRI2 drawable. A null pointer check in intel_viewport fixes it. The i915 driver looks like it has the same problem (in intel_invalidate_viewport), though I haven't tested it. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: Use param export count from si_llvm_export_vs in si_shader_vs
I thought your patch does something else, but after re-reading it, it seems to do the right thing. Reviewed-by: Marek Olšák marek.ol...@amd.com Marek On Mon, Jul 6, 2015 at 1:38 PM, Marek Olšák mar...@gmail.com wrote: Hi Michel, VS_EXPORT_COUNT shouldn't include POS exports (POSITION, MISC (psize, edgeflag, layer, viewport), and CLIPDIST/CULLDIST), it should only count PARAM exports. Setting higher VS_EXPORT_COUNT doesn't affect correctness but it allocates more parameter memory. CLIPDIST is special in that it's exported twice, first as POS and then as PARAM, the same will be done for CULLDIST in the future, so these two should indeed be removed from the list. Marek On Mon, Jul 6, 2015 at 10:30 AM, Michel Dänzer mic...@daenzer.net wrote: From: Michel Dänzer michel.daen...@amd.com This eliminates the error prone logic in si_shader_vs recalculating this value. It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being counted for VS exports, since they are passed to the fragment shader as varyings as well. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193 Signed-off-by: Michel Dänzer michel.daen...@amd.com --- src/gallium/drivers/radeonsi/si_shader.c| 2 ++ src/gallium/drivers/radeonsi/si_shader.h| 1 + src/gallium/drivers/radeonsi/si_state_shaders.c | 25 +++-- 3 files changed, 6 insertions(+), 22 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 4d97b58..753b238 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -1218,6 +1218,8 @@ handle_semantic: } } + shader-nr_param_exports = param_count; + /* We need to add the position output manually if it's missing. */ if (!pos_args[0][0]) { pos_args[0][0] = lp_build_const_int32(base-gallivm, 0xf); /* writemask */ diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h index b4339ae..8d309b4 100644 --- a/src/gallium/drivers/radeonsi/si_shader.h +++ b/src/gallium/drivers/radeonsi/si_shader.h @@ -165,6 +165,7 @@ struct si_shader { booluses_instanceid; unsignednr_pos_exports; + unsignednr_param_exports; boolis_gs_copy_shader; booldx10_clamp_mode; /* convert NaNs to 0 */ }; diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index eef3baa..a842d9d 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -148,10 +148,9 @@ static void si_shader_gs(struct si_shader *shader) static void si_shader_vs(struct si_shader *shader) { - struct tgsi_shader_info *info = shader-selector-info; struct si_pm4_state *pm4; unsigned num_sgprs, num_user_sgprs; - unsigned nparams, i, vgpr_comp_cnt; + unsigned nparams, vgpr_comp_cnt; uint64_t va; unsigned window_space = shader-selector-info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION]; @@ -180,26 +179,8 @@ static void si_shader_vs(struct si_shader *shader) } assert(num_sgprs = 104); - /* Certain attributes (position, psize, etc.) don't count as params. -* VS is required to export at least one param and r600_shader_from_tgsi() -* takes care of adding a dummy export. -*/ - for (nparams = 0, i = 0 ; i info-num_outputs; i++) { - switch (info-output_semantic_name[i]) { - case TGSI_SEMANTIC_CLIPVERTEX: - case TGSI_SEMANTIC_CLIPDIST: - case TGSI_SEMANTIC_CULLDIST: - case TGSI_SEMANTIC_POSITION: - case TGSI_SEMANTIC_PSIZE: - case TGSI_SEMANTIC_EDGEFLAG: - break; - default: - nparams++; - } - } - if (nparams 1) - nparams = 1; - + /* VS is required to export at least one param. */ + nparams = MAX2(shader-nr_param_exports, 1); si_pm4_set_reg(pm4, R_0286C4_SPI_VS_OUT_CONFIG, S_0286C4_VS_EXPORT_COUNT(nparams - 1)); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90466] arm: linker error ndefined reference to `nir_metadata_preserve'
https://bugs.freedesktop.org/show_bug.cgi?id=90466 Emil Velikov emil.l.veli...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #2 from Emil Velikov emil.l.veli...@gmail.com --- The issue has been resolved with 10.6.0. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] mesa: remove duplicate array index validation and extra uniform location lookup
On 03/07/15 15:35, Timothy Arceri wrote: On Fri, 2015-07-03 at 15:08 +0300, Martin Peres wrote: On 03/07/15 13:55, Timothy Arceri wrote: This change removes multiple functions designed to validate an array subscript and replaces them with a call to a single function. The change also means that validation is now only done once and the index is retrived at the same time, as a result the getUniformLocation code can be merged saving an extra hash table lookup (and yet another validation call). Cc: Martin Peres martin.pe...@linux.intel.com Cc: Tapani Pälli tapani.pa...@intel.com --- This clean-up was done to allow AoA program interface query support to be implemented more easily. When applied to my latest AoA work all but one AoA program interface query subtest for the CTS now passes. No Piglit regressions. I also ran the following tests in the CTS without regression: - ES31-CTS.program_interface_query* - ES3 -CTS.gtf.GL3Tests.explicit_attrib_location.explicit_attrib_location_room - ES3-CTS.gtf.GL3Tests.draw_instanced.draw_instanced_max_vertex_attribs src/mesa/main/program_resource.c | 66 ++--- src/mesa/main/shader_query.cpp | 206 - -- src/mesa/main/shaderapi.h| 7 +- src/mesa/main/uniform_query.cpp | 75 -- src/mesa/main/uniforms.c | 7 +- src/mesa/main/uniforms.h | 4 - 6 files changed, 100 insertions(+), 265 deletions(-) diff --git a/src/mesa/main/program_resource.c b/src/mesa/main/program_resource.c index d857b84..0a306b8 100644 --- a/src/mesa/main/program_resource.c +++ b/src/mesa/main/program_resource.c @@ -173,32 +173,12 @@ is_xfb_marker(const char *str) return false; } -/** - * Checks if given name index is legal for GetProgramResourceIndex, - * check is written to be compatible with GL_ARB_array_of_arrays. - */ -static bool -valid_program_resource_index_name(const GLchar *name) -{ - const char *array = strstr(name, [); - const char *close = strrchr(name, ']'); - - /* Not array, no need for the check. */ - if (!array) - return true; - - /* Last array index has to be zero. */ - if (!close || *--close != '0') - return false; - - return true; -} - GLuint GLAPIENTRY _mesa_GetProgramResourceIndex(GLuint program, GLenum programInterface, const GLchar *name) { GET_CURRENT_CONTEXT(ctx); + unsigned array_index; struct gl_program_resource *res; struct gl_shader_program *shProg = _mesa_lookup_shader_program_err(ctx, program, @@ -221,12 +201,9 @@ _mesa_GetProgramResourceIndex(GLuint program, GLenum programInterface, case GL_PROGRAM_OUTPUT: case GL_UNIFORM: case GL_TRANSFORM_FEEDBACK_VARYING: - /* Validate name syntax for array variables */ - if (!valid_program_resource_index_name(name)) - return GL_INVALID_INDEX; - /* fall-through */ case GL_UNIFORM_BLOCK: - res = _mesa_program_resource_find_name(shProg, programInterface, name); + res = _mesa_program_resource_find_name(shProg, programInterface, name, + array_index, true); if (!res) return GL_INVALID_INDEX; @@ -300,36 +277,6 @@ _mesa_GetProgramResourceiv(GLuint program, GLenum programInterface, propCount, props, bufSize, length, params); } -/** - * Function verifies syntax of given name for GetProgramResourceLocation - * and GetProgramResourceLocationIndex for the following cases: - * - * array element portion of a string passed to GetProgramResourceLocation - * or GetProgramResourceLocationIndex must not have, a + sign, extra - * leading zeroes, or whitespace. - * - * Check is written to be compatible with GL_ARB_array_of_arrays. - */ -static bool -invalid_array_element_syntax(const GLchar *name) -{ - char *first = strchr(name, '['); - char *last = strrchr(name, '['); - - if (!first) - return false; - - /* No '+' or ' ' allowed anywhere. */ - if (strchr(first, '+') || strchr(first, ' ')) - return true; - - /* Check that last array index is 0. */ - if (last[1] == '0' last[2] != ']') - return true; - - return false; -} - static struct gl_shader_program * lookup_linked_program(GLuint program, const char *caller) { @@ -356,7 +303,7 @@ _mesa_GetProgramResourceLocation(GLuint program, GLenum programInterface, struct gl_shader_program *shProg = lookup_linked_program(program, glGetProgramResourceLocation); - if (!shProg || !name || invalid_array_element_syntax(name)) + if (!shProg || !name) return -1; /* Validate programInterface. */ @@ -383,7 +330,8 @@ _mesa_GetProgramResourceLocation(GLuint program, GLenum programInterface, _mesa_lookup_enum_by_nr(programInterface), name); } - return _mesa_program_resource_location(shProg, programInterface,
Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once
On 06/07/15 13:33, Chris Wilson wrote: Move the query for the TIMESTAMP register from context init to the screen, so that it is only queried once for all contexts. On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low bits always zero. Detect this by repeating the read a few times and check the register is incrementing. You do not do the latter. You only check for the low bits. I guess the counter is supposed to be monotonically increasing and with a resolution of a few microseconds which would make this perfectly valid. Could you confirm and make sure to add this information in the commit message please? Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/intel_extensions.c | 6 +- src/mesa/drivers/dri/i965/intel_screen.c | 15 +++ src/mesa/drivers/dri/i965/intel_screen.h | 2 ++ 3 files changed, 18 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index 3423190..740ac81 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -282,8 +282,6 @@ intelInitExtensions(struct gl_context *ctx) } if (brw-gen = 6) { - uint64_t dummy; - ctx-Extensions.ARB_blend_func_extended = !driQueryOptionb(brw-optionCache, disable_blend_func_extended); ctx-Extensions.ARB_conditional_render_inverted = true; @@ -307,9 +305,7 @@ intelInitExtensions(struct gl_context *ctx) ctx-Extensions.EXT_transform_feedback = true; ctx-Extensions.OES_depth_texture_cube_map = true; - /* Test if the kernel has the ioctl. */ - if (drm_intel_reg_read(brw-bufmgr, TIMESTAMP, dummy) == 0) - ctx-Extensions.ARB_timer_query = true; + ctx-Extensions.ARB_timer_query = brw-intelScreen-hw_has_timestamp; /* Only enable this in core profile because other parts of Mesa behave * slightly differently when the extension is enabled. diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index f9398d7..839a984 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1123,6 +1123,20 @@ intel_detect_swizzling(struct intel_screen *screen) return true; } +static bool +intel_detect_timestamp(struct intel_screen *screen) +{ + uint64_t dummy = 0; + int loop = 10; + + do { + if (drm_intel_reg_read(screen-bufmgr, TIMESTAMP, dummy)) +return false; + } while ((dummy 0x) == 0 --loop); + + return loop 0; +} + /** * Return array of MSAA modes supported by the hardware. The array is * zero-terminated and sorted in decreasing order. @@ -1378,6 +1392,7 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp) intelScreen-hw_must_use_separate_stencil = intelScreen-devinfo-gen = 7; intelScreen-hw_has_swizzling = intel_detect_swizzling(intelScreen); + intelScreen-hw_has_timestamp = intel_detect_timestamp(intelScreen); const char *force_msaa = getenv(INTEL_FORCE_MSAA); if (force_msaa) { diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h index 742b3d3..941e0fc 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.h +++ b/src/mesa/drivers/dri/i965/intel_screen.h @@ -52,6 +52,8 @@ struct intel_screen bool hw_has_swizzling; + bool hw_has_timestamp; + /** * Does the kernel support context reset notifications? */ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 69874] Automake throws a lot of [...] option 'subdir-objects' is disabled
https://bugs.freedesktop.org/show_bug.cgi?id=69874 Emil Velikov emil.l.veli...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #13 from Emil Velikov emil.l.veli...@gmail.com --- The final piece has been resolved and subdir-object has been enabled globally as per commit 404a90b82786080564fe32716f83ce055b9a934f Author: Matt Turner matts...@gmail.com Date: Wed Jun 10 16:30:56 2015 -0700 mesa: Enable subdir-objects globally. -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 89131] [Bisected] Graphical corruption in Weston, shows old framebuffer pieces
https://bugs.freedesktop.org/show_bug.cgi?id=89131 Emil Velikov emil.l.veli...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #7 from Emil Velikov emil.l.veli...@gmail.com --- The following patch is part of mesa since 10.5.7. Feel free to reopen if it doesn't resolve the issue. commit 25e9ae2b79f32631e7255807a242e5fc4e39984c Author: Marek Olšák marek.ol...@amd.com Date: Tue May 26 19:32:36 2015 +0200 st/dri: fix postprocessing crash when there's no depth buffer -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: Use param export count from si_llvm_export_vs in si_shader_vs
Hi Michel, VS_EXPORT_COUNT shouldn't include POS exports (POSITION, MISC (psize, edgeflag, layer, viewport), and CLIPDIST/CULLDIST), it should only count PARAM exports. Setting higher VS_EXPORT_COUNT doesn't affect correctness but it allocates more parameter memory. CLIPDIST is special in that it's exported twice, first as POS and then as PARAM, the same will be done for CULLDIST in the future, so these two should indeed be removed from the list. Marek On Mon, Jul 6, 2015 at 10:30 AM, Michel Dänzer mic...@daenzer.net wrote: From: Michel Dänzer michel.daen...@amd.com This eliminates the error prone logic in si_shader_vs recalculating this value. It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being counted for VS exports, since they are passed to the fragment shader as varyings as well. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193 Signed-off-by: Michel Dänzer michel.daen...@amd.com --- src/gallium/drivers/radeonsi/si_shader.c| 2 ++ src/gallium/drivers/radeonsi/si_shader.h| 1 + src/gallium/drivers/radeonsi/si_state_shaders.c | 25 +++-- 3 files changed, 6 insertions(+), 22 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 4d97b58..753b238 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -1218,6 +1218,8 @@ handle_semantic: } } + shader-nr_param_exports = param_count; + /* We need to add the position output manually if it's missing. */ if (!pos_args[0][0]) { pos_args[0][0] = lp_build_const_int32(base-gallivm, 0xf); /* writemask */ diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h index b4339ae..8d309b4 100644 --- a/src/gallium/drivers/radeonsi/si_shader.h +++ b/src/gallium/drivers/radeonsi/si_shader.h @@ -165,6 +165,7 @@ struct si_shader { booluses_instanceid; unsignednr_pos_exports; + unsignednr_param_exports; boolis_gs_copy_shader; booldx10_clamp_mode; /* convert NaNs to 0 */ }; diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index eef3baa..a842d9d 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -148,10 +148,9 @@ static void si_shader_gs(struct si_shader *shader) static void si_shader_vs(struct si_shader *shader) { - struct tgsi_shader_info *info = shader-selector-info; struct si_pm4_state *pm4; unsigned num_sgprs, num_user_sgprs; - unsigned nparams, i, vgpr_comp_cnt; + unsigned nparams, vgpr_comp_cnt; uint64_t va; unsigned window_space = shader-selector-info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION]; @@ -180,26 +179,8 @@ static void si_shader_vs(struct si_shader *shader) } assert(num_sgprs = 104); - /* Certain attributes (position, psize, etc.) don't count as params. -* VS is required to export at least one param and r600_shader_from_tgsi() -* takes care of adding a dummy export. -*/ - for (nparams = 0, i = 0 ; i info-num_outputs; i++) { - switch (info-output_semantic_name[i]) { - case TGSI_SEMANTIC_CLIPVERTEX: - case TGSI_SEMANTIC_CLIPDIST: - case TGSI_SEMANTIC_CULLDIST: - case TGSI_SEMANTIC_POSITION: - case TGSI_SEMANTIC_PSIZE: - case TGSI_SEMANTIC_EDGEFLAG: - break; - default: - nparams++; - } - } - if (nparams 1) - nparams = 1; - + /* VS is required to export at least one param. */ + nparams = MAX2(shader-nr_param_exports, 1); si_pm4_set_reg(pm4, R_0286C4_SPI_VS_OUT_CONFIG, S_0286C4_VS_EXPORT_COUNT(nparams - 1)); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90905] mesa: Finish subdir-objects transition
https://bugs.freedesktop.org/show_bug.cgi?id=90905 Emil Velikov emil.l.veli...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from Emil Velikov emil.l.veli...@gmail.com --- Both issues (the subdir-objects and -Wl,--allow-multiple-definition) should be sorted now. Thank for the help gents. -- You are receiving this mail because: You are the QA Contact for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 89068] glTexImage2D regression by texstore_rgba switch to _mesa_format_convert
https://bugs.freedesktop.org/show_bug.cgi?id=89068 Emil Velikov emil.l.veli...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #13 from Emil Velikov emil.l.veli...@gmail.com --- The patch referenced is present in mesa 10.5.0 onwards. Thanks for the report and confirmation Brad. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 45348] [swrast] piglit fbo-drawbuffers-arbfp regression
https://bugs.freedesktop.org/show_bug.cgi?id=45348 Emil Velikov emil.l.veli...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #5 from Emil Velikov emil.l.veli...@gmail.com --- Hi Vinson, Brian has pushed a fix shortly after your last test. It's available with 10.5.0 and later. Feel free to give it a test and reopen if the issue persists. commit 89c96afe3c0acf8f2fccaf02da02945afe8ba5f3 Author: Brian Paul bri...@vmware.com Date: Mon Feb 16 11:23:06 2015 -0700 swrast: fix multiple color buffer writing -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once
On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote: On 06/07/15 13:33, Chris Wilson wrote: Move the query for the TIMESTAMP register from context init to the screen, so that it is only queried once for all contexts. On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low bits always zero. Detect this by repeating the read a few times and check the register is incrementing. You do not do the latter. You only check for the low bits. I guess the counter is supposed to be monotonically increasing and with a resolution of a few microseconds which would make this perfectly valid. Could you confirm and make sure to add this information in the commit message please? The counter should increment every 80ns. What's misleading in what I wrote? It describes the hw bug and how to detect it. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once
On 06/07/15 16:15, Martin Peres wrote: On 06/07/15 16:13, Chris Wilson wrote: On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote: On 06/07/15 13:33, Chris Wilson wrote: Move the query for the TIMESTAMP register from context init to the screen, so that it is only queried once for all contexts. On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low bits always zero. Detect this by repeating the read a few times and check the register is incrementing. You do not do the latter. You only check for the low bits. I guess the counter is supposed to be monotonically increasing and with a resolution of a few microseconds which would make this perfectly valid. Could you confirm and make sure to add this information in the commit message please? The counter should increment every 80ns. What's misleading in what I wrote? It describes the hw bug and how to detect it. Well, it is not misleading, it just lacks this information. If it incremented every seconds, the patch would be stupid because the timestamp could be at 0 and polling 10 times at a few us of interval would always yield the same result. That's all :) Oh, forgot to say: With this information added in the commit message and the commit message duplicated as a comment in intel_detect_timestamp(), the patch is: Reviewed-by: Martin Peres martin.pe...@linux.intel.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts
On 06/07/15 13:33, Chris Wilson wrote: Since the workaround bo is used strictly as a write-only buffer, we need only allocate one per screen and use the same one from all contexts. (The caveat here is during extension initialisation, where we write into and read back register values from the buffer, but that is performed only once for the first context - and baring synchronisation issues should not be a problem. Safer would be to move that also to the screen.) Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++--- src/mesa/drivers/dri/i965/intel_screen.c | 4 src/mesa/drivers/dri/i965/intel_screen.h | 1 + 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index 7ee3cb6..05e14cd 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw, * the gen6 workaround because it involves actually writing to * the buffer, and the kernel doesn't let us write to the batch. */ - brw-workaround_bo = drm_intel_bo_alloc(brw-bufmgr, - pipe_control workaround, - 4096, 4096); + brw-workaround_bo = brw-intelScreen-workaround_bo; if (brw-workaround_bo == NULL) return -ENOMEM; + drm_intel_bo_reference(brw-workaround_bo); + brw-pipe_controls_since_last_cs_stall = 0; return 0; diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 839a984..cd8e6eb 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv) { struct intel_screen *intelScreen = sPriv-driverPrivate; + drm_intel_bo_unreference(intelScreen-workaround_bo); I guess this is OK because after the last context got destroyed the screen would get destroyed? In any case, I don't get the point of reference counting when we obviously only need to free the buffer when the screen gets freed. dri_bufmgr_destroy(intelScreen-bufmgr); driDestroyOptionInfo(intelScreen-optionCache); @@ -1096,6 +1097,9 @@ intel_init_bufmgr(struct intel_screen *intelScreen) return false; } + intelScreen-workaround_bo = + drm_intel_bo_alloc(intelScreen-bufmgr, pipe_control w/a, 4096, 4096); + return true; } diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h index 941e0fc..e55fddb 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.h +++ b/src/mesa/drivers/dri/i965/intel_screen.h @@ -60,6 +60,7 @@ struct intel_screen bool has_context_reset_notification; dri_bufmgr *bufmgr; + drm_intel_bo *workaround_bo; /** * A unique ID for shader programs. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 79706] [TRACKER] Mesa regression tracker
https://bugs.freedesktop.org/show_bug.cgi?id=79706 Bug 79706 depends on bug 45348, which changed state. Bug 45348 Summary: [swrast] piglit fbo-drawbuffers-arbfp regression https://bugs.freedesktop.org/show_bug.cgi?id=45348 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/18] i965: Move pipecontrol workaround bo to brw_pipe_control
On 06/07/15 13:33, Chris Wilson wrote: With the exception of gen8, the sole user of the workaround bo are for emitting pipe controls. Move it out of the purview of the batchbuffer and into the pipecontrol. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Reviewed-by: Martin Peres martin.pe...@linux.intel.com --- src/mesa/drivers/dri/i965/brw_context.c | 7 + src/mesa/drivers/dri/i965/brw_context.h | 12 +--- src/mesa/drivers/dri/i965/brw_pipe_control.c | 40 +++ src/mesa/drivers/dri/i965/gen8_depth_state.c | 2 +- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 12 src/mesa/drivers/dri/i965/intel_extensions.c | 28 +-- 6 files changed, 64 insertions(+), 37 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 4b51fe5..8150b94 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -819,6 +819,12 @@ brwCreateContext(gl_api api, } } + if (brw_init_pipe_control(brw, devinfo)) { + *dri_ctx_error = __DRI_CTX_ERROR_NO_MEMORY; + intelDestroyContext(driContextPriv); + return false; + } + brw_init_state(brw); intelInitExtensions(ctx); @@ -942,6 +948,7 @@ intelDestroyContext(__DRIcontext * driContextPriv) if (ctx-swrast_context) _swrast_DestroyContext(brw-ctx); + brw_fini_pipe_control(brw); intel_batchbuffer_free(brw); drm_intel_bo_unreference(brw-throttle_batch[1]); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3553f6e..db0fc48 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -868,8 +868,6 @@ struct intel_batchbuffer { drm_intel_bo *bo; /** Last BO submitted to the hardware. Used for glFinish(). */ drm_intel_bo *last_bo; - /** BO for post-sync nonzero writes for gen6 workaround. */ - drm_intel_bo *workaround_bo; uint16_t emit, total; uint16_t used, reserved_space; @@ -881,8 +879,6 @@ struct intel_batchbuffer { enum brw_gpu_ring ring; bool needs_sol_reset; - uint8_t pipe_controls_since_last_cs_stall; - struct { uint16_t used; int reloc_count; @@ -1034,6 +1030,10 @@ struct brw_context drm_intel_context *hw_ctx; + /** BO for post-sync nonzero writes for gen6 workaround. */ + drm_intel_bo *workaround_bo; + uint8_t pipe_controls_since_last_cs_stall; + /** * Set of drm_intel_bo * that have been rendered to within this batchbuffer * and would need flushing before being used from another cache domain that @@ -2000,6 +2000,10 @@ gen9_use_linear_1d_layout(const struct brw_context *brw, const struct intel_mipmap_tree *mt); /* brw_pipe_control.c */ +int brw_init_pipe_control(struct brw_context *brw, + const struct brw_device_info *info); +void brw_fini_pipe_control(struct brw_context *brw); + void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags); void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags, drm_intel_bo *bo, uint32_t offset, diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index b4c86b9..7ee3cb6 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -72,13 +72,13 @@ gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags) if (brw-gen == 7 !brw-is_haswell) { if (flags PIPE_CONTROL_CS_STALL) { /* If we're doing a CS stall, reset the counter and carry on. */ - brw-batch.pipe_controls_since_last_cs_stall = 0; + brw-pipe_controls_since_last_cs_stall = 0; return 0; } /* If this is the fourth pipe control without a CS stall, do one now. */ - if (++brw-batch.pipe_controls_since_last_cs_stall == 4) { - brw-batch.pipe_controls_since_last_cs_stall = 0; + if (++brw-pipe_controls_since_last_cs_stall == 4) { + brw-pipe_controls_since_last_cs_stall = 0; return PIPE_CONTROL_CS_STALL; } } @@ -213,7 +213,7 @@ gen7_emit_vs_workaround_flush(struct brw_context *brw) brw_emit_pipe_control_write(brw, PIPE_CONTROL_WRITE_IMMEDIATE | PIPE_CONTROL_DEPTH_STALL, - brw-batch.workaround_bo, 0, + brw-workaround_bo, 0, 0, 0); } @@ -227,7 +227,7 @@ gen7_emit_cs_stall_flush(struct brw_context *brw) brw_emit_pipe_control_write(brw, PIPE_CONTROL_CS_STALL | PIPE_CONTROL_WRITE_IMMEDIATE, -
Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once
On 06/07/15 16:13, Chris Wilson wrote: On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote: On 06/07/15 13:33, Chris Wilson wrote: Move the query for the TIMESTAMP register from context init to the screen, so that it is only queried once for all contexts. On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low bits always zero. Detect this by repeating the read a few times and check the register is incrementing. You do not do the latter. You only check for the low bits. I guess the counter is supposed to be monotonically increasing and with a resolution of a few microseconds which would make this perfectly valid. Could you confirm and make sure to add this information in the commit message please? The counter should increment every 80ns. What's misleading in what I wrote? It describes the hw bug and how to detect it. Well, it is not misleading, it just lacks this information. If it incremented every seconds, the patch would be stupid because the timestamp could be at 0 and polling 10 times at a few us of interval would always yield the same result. That's all :) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallivm: fix lp_build_compare_ext
On 04/07/15 07:15, Vinson Lee wrote: On Fri, Jul 3, 2015 at 6:05 PM, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com The expansion should always be to the same width as the input arguments no matter what, since these functions should work with any bit width of the arguments (the sext is a no-op on any sane simd architecture). Thus, fix the caller expecting differently. This fixes https://bugs.freedesktop.org/show_bug.cgi?id=91222 (not tested otherwise) --- src/gallium/auxiliary/gallivm/lp_bld_logic.c | 2 +- src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_logic.c b/src/gallium/auxiliary/gallivm/lp_bld_logic.c index f724cfa..80b53e5 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_logic.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_logic.c @@ -81,7 +81,7 @@ lp_build_compare_ext(struct gallivm_state *gallivm, boolean ordered) { LLVMBuilderRef builder = gallivm-builder; - LLVMTypeRef int_vec_type = lp_build_int_vec_type(gallivm, lp_type_int_vec(32, 32 * type.length)); + LLVMTypeRef int_vec_type = lp_build_int_vec_type(gallivm, type); LLVMValueRef zeros = LLVMConstNull(int_vec_type); LLVMValueRef ones = LLVMConstAllOnes(int_vec_type); LLVMValueRef cond; diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c index 1f2af85..0ad78b0 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c @@ -1961,8 +1961,11 @@ dset_emit_cpu( struct lp_build_emit_data * emit_data, unsigned pipe_func) { + LLVMBuilderRef builder = bld_base-base.gallivm-builder; LLVMValueRef cond = lp_build_cmp(bld_base-dbl_bld, pipe_func, emit_data-args[0], emit_data-args[1]); + /* arguments were 64 bit but store as 32 bit */ + cond = LLVMBuildTrunc(builder, cond, bld_base-int_bld.int_vec_type, ); emit_data-output[emit_data-chan] = cond; } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Tested-by: Vinson Lee v...@freedesktop.org ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Reviewed-by: Jose Fonseca jfons...@vmware.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
Just skipping to interesting comments for the moment. On Mon, Jul 06, 2015 at 12:34:00PM -0700, Kenneth Graunke wrote: While I really like this idea in principle, the current patch is rather huge, making it difficult to review; bisectability would also suffer. Would it be possible to split it up into several smaller patches? First time I tried I ended up with a series of patches of equal churn and I wasn't happy with the results. Honestly, I find mechanical noise in patches easy to tune out, you quickly recognise the pattern and deviations should stand out. Some of the churn I think is important as it highlight the relative emphasis on different coding patterns. In particular, how the relocation must always emit the address in the batch, so in quite a few functions the code had to be rearranged in order to make that happen One idea I had for how you might accomplish that is to introduce the brw_bo struct and related functions, but make it simply contain a drm_intel_bo * field, and fall back to the existing libdrm-based code. Sure, what I had in mind was a changeover to a struct brw_bo that just wrapped drm_intel_bo for typesafefty (so the compiler was there to catch mistakes and missing pieces). I felt the patch was still about 500 lines of mechcanical changes, 1000 lines removing the intel_batchbuffer.c and 2000 lines adding brw_bo (it has grown somewhat with the comments!). The api to review was in one place, and the bits that required fine changes (mostly intel_bufferobj.c) were equally obvious. But what you really need is a slow progression in feature development of brw_batch. Splitting the mechanical changes off with a touch more churn, and even converting functions over to new API in stages is relatively trivial compared to building up the feature set required to do batchbuffer construction from scratch. - relocations come first, as request tracking cannot happen with integration into the relocations. - execbuf - request tracking - mmap functions - exporting busyness Honestly though the design has never existed in a piecemeal fashion, it has always been requests first and that drove the infrastructure required to enable the request tracking. Anyway splitting out the mechanical changes is worth it if it gets more eyes onto brw_batch.c itself and some of its knock-on effects. That way, you could put all the mechanical renaming and refactoring across the entire code-base in one patch that should have no functional change. You could then replace the contents or brw_bo and those functions with your new improved batch manager. We've talked about moving away from libdrm for a while, so having our own functions and structures makes a lot of sense. I'm also curious about the performance on non-LLC platforms (CHV or BYT)? You've dropped a number of non-LLC paths - which is probably fine, honestly...they were always of dubious value - but it'd be nice to know the record the effects of this change on non-LLC in the commit message. Ah, I didn't actually remove non-LLC paths, I added equivalent detiled CPU access for non-LLC (strictly non cache coherent bo, incl scanouts) as for LLC. So I guess you mean the few places that lost the brw-has_llc differentiation? @@ -279,11 +254,6 @@ retry: brw-ctx.NewDriverState = ~0ull; brw-no_depth_or_stencil = false; brw-ib.type = -1; - - /* Flush the sampler cache so any texturing from the destination is -* coherent. -*/ - brw_emit_mi_flush(brw); It looks like you moved this flush earlier, so it's in the section of code that retries? That's probably reasonable. Seems worth keeping the comment, and this could be done as a separate patch... True. The reason why it has to be earlier is that I am much stricter on having all BEGIN_BATCH/ADVANCE_BATCH chunks being inside a brw_batch_begin/brw_batch_end - since brw_emit_mi_flush() itself doesn't start a new brw_batch_begin section, it had to go in blorp's. The comment was removed because we now have accurate dirty texture cache tracking and that itself is not the reason wy flush has to exist. (The flush is for a change in hiz mode iirc, or some similar change in GPU state that requires a flush between primitives.) [snip] +static void +load_sized_register_mem(struct brw_context *brw, +uint32_t reg, +struct brw_bo *bo, +uint32_t read_domains, uint32_t write_domain, +uint32_t offset, +int size) +{ + int i; + + /* MI_LOAD_REGISTER_MEM only exists on Gen7+. */ + assert(brw-gen = 7); + + if (brw-gen = 8) { + BEGIN_BATCH(4 * size); + for (i = 0; i size; i++) { + OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (4 - 2)); + OUT_BATCH(reg + i * 4); + OUT_RELOC64(bo, read_domains, write_domain, offset + i * 4); + } + ADVANCE_BATCH(); + } else { +
Re: [Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for __attribute__((warn_unused_result)).
On Mon, Jul 06, 2015 at 11:18:00AM -0700, Kenneth Graunke wrote: In the kernel, this is called __must_check; all our attribute macros in Mesa appear to be uppercase, so I went with that. Signed-off-by: Kenneth Graunke kenn...@whitecape.org Cc: ch...@chris-wilson.co.uk Cc: matts...@gmail.com --- configure.ac | 1 + src/util/macros.h | 6 ++ 2 files changed, 7 insertions(+) I noticed Chris wants to use this in one of his patches, so I figured I'd throw together a patch to do this a bit more cleanly. That would suit me very much. I guess I need to learn about AX_GCC_FUNC_ATTRIBUTE! Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use
On Monday, July 06, 2015 11:33:20 AM Chris Wilson wrote: The vertices require a large chunk of memory, currently allocated during context creation. However, this memory is not required until use so we can defer the allocation until the first swrast_Wakeup(). Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Kenneth Graunke kenn...@whitecape.org --- src/mesa/swrast_setup/ss_context.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/mesa/swrast_setup/ss_context.c b/src/mesa/swrast_setup/ss_context.c index 74b1da3..028eccb 100644 --- a/src/mesa/swrast_setup/ss_context.c +++ b/src/mesa/swrast_setup/ss_context.c @@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx ) swsetup-NewState = ~0; _swsetup_trifuncs_init( ctx ); - _tnl_init_vertices( ctx, ctx-Const.MaxArrayLockSize + 12, -sizeof(SWvertex) ); - - return GL_TRUE; } @@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx ) TNLcontext *tnl = TNL_CONTEXT(ctx); SScontext *swsetup = SWSETUP_CONTEXT(ctx); + if (!(GET_VERTEX_STATE(ctx))-max_vertex_size) + _tnl_init_vertices(ctx, + ctx-Const.MaxArrayLockSize + 12, + sizeof(SWvertex)); + tnl-Driver.Render.Start = _swsetup_RenderStart; tnl-Driver.Render.Finish = _swsetup_RenderFinish; tnl-Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive; Looks reasonable - this saves 2.59MB of memory in Piglit's glsl-cos test. (It's worth noting that in the commit message.) I wonder if we can do better though by avoiding most of the swrast context entirely... Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_FS_PROG_DATA in gen6_renderbuffer_surfaces.
On Monday, July 06, 2015 12:18:18 PM Matt Turner wrote: On Mon, Jul 6, 2015 at 9:55 AM, Kenneth Graunke kenn...@whitecape.org wrote: It looks like this was forgotten in commit 3c9dc2d31b80fc73bffa1f40a (i965: Make a brw_stage_prog_data for storing the SURF_INDEX information.) In other words, it's been missing since we moved to dynamic binding table slot assignments. Author: Eric Anholt e...@anholt.net AuthorDate: Wed Oct 2 14:07:40 2013 -0700 CommitDate: Tue Oct 15 10:18:42 2013 -0700 Dang. How did you find this? After reading Eero's latest performance analysis observations, I decided to try marking render targets uncached (MOCS = UC) when blending and logic ops (i.e. things that cause render target reads) are disabled. This necessitated adding _NEW_COLOR. Then I realized BRW_NEW_FS_PROG_DATA was just plain missing. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0: fix wrong use of BLIT_SRC_Y_INT for 2D texture copy
Reviewed-by: Ilia Mirkin imir...@alum.mit.edu I think this only gets called for copy_region, which in turn probably can't be reached by MS surfaces. Not sure, but this is definitely right anyways. On Mon, Jul 6, 2015 at 4:06 PM, Samuel Pitoiset samuel.pitoi...@gmail.com wrote: According to nv50, this should be src-ms_y instead of src-ms_x. This code is here since 2012, so it's probably a typo error which has never been detected since a long time. I didn't do a full piglit run to check if it fixes some other weird issues. Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com --- src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c index a820de7..53cd8cd 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c @@ -189,7 +189,7 @@ nvc0_2d_texture_do_copy(struct nouveau_pushbuf *push, PUSH_DATA (push, 0); PUSH_DATA (push, sx src-ms_x); PUSH_DATA (push, 0); - PUSH_DATA (push, sy src-ms_x); + PUSH_DATA (push, sy src-ms_y); return 0; } -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use
On 07/06/2015 01:11 PM, Matt Turner wrote: On Mon, Jul 6, 2015 at 3:33 AM, Chris Wilson ch...@chris-wilson.co.uk wrote: The vertices require a large chunk of memory, currently allocated during context creation. However, this memory is not required until use so we can defer the allocation until the first swrast_Wakeup(). Makes sense to me. Someone like Brian like to take a look. Looks OK to me, assuming you've done some piglit testing with swrast. -Brian Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Kenneth Graunke kenn...@whitecape.org --- src/mesa/swrast_setup/ss_context.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/mesa/swrast_setup/ss_context.c b/src/mesa/swrast_setup/ss_context.c index 74b1da3..028eccb 100644 --- a/src/mesa/swrast_setup/ss_context.c +++ b/src/mesa/swrast_setup/ss_context.c @@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx ) swsetup-NewState = ~0; _swsetup_trifuncs_init( ctx ); - _tnl_init_vertices( ctx, ctx-Const.MaxArrayLockSize + 12, - sizeof(SWvertex) ); - - return GL_TRUE; } @@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx ) TNLcontext *tnl = TNL_CONTEXT(ctx); SScontext *swsetup = SWSETUP_CONTEXT(ctx); + if (!(GET_VERTEX_STATE(ctx))-max_vertex_size) + _tnl_init_vertices(ctx, +ctx-Const.MaxArrayLockSize + 12, +sizeof(SWvertex)); Replace the tabs on these two lines with spaces. + tnl-Driver.Render.Start = _swsetup_RenderStart; tnl-Driver.Render.Finish = _swsetup_RenderFinish; tnl-Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for __attribute__((warn_unused_result)).
On Monday, July 06, 2015 09:27:12 PM Chris Wilson wrote: On Mon, Jul 06, 2015 at 11:18:00AM -0700, Kenneth Graunke wrote: In the kernel, this is called __must_check; all our attribute macros in Mesa appear to be uppercase, so I went with that. Signed-off-by: Kenneth Graunke kenn...@whitecape.org Cc: ch...@chris-wilson.co.uk Cc: matts...@gmail.com --- configure.ac | 1 + src/util/macros.h | 6 ++ 2 files changed, 7 insertions(+) I noticed Chris wants to use this in one of his patches, so I figured I'd throw together a patch to do this a bit more cleanly. That would suit me very much. I guess I need to learn about AX_GCC_FUNC_ATTRIBUTE! Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk -Chris It's pretty handy! Matt found it in the autoconf archive: http://www.gnu.org/software/autoconf-archive/ax_gcc_func_attribute.html We just dropped it in our m4/ directory. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts
On Mon, Jul 06, 2015 at 04:29:49PM +0300, Martin Peres wrote: On 06/07/15 13:33, Chris Wilson wrote: Since the workaround bo is used strictly as a write-only buffer, we need only allocate one per screen and use the same one from all contexts. (The caveat here is during extension initialisation, where we write into and read back register values from the buffer, but that is performed only once for the first context - and baring synchronisation issues should not be a problem. Safer would be to move that also to the screen.) Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++--- src/mesa/drivers/dri/i965/intel_screen.c | 4 src/mesa/drivers/dri/i965/intel_screen.h | 1 + 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index 7ee3cb6..05e14cd 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw, * the gen6 workaround because it involves actually writing to * the buffer, and the kernel doesn't let us write to the batch. */ - brw-workaround_bo = drm_intel_bo_alloc(brw-bufmgr, - pipe_control workaround, - 4096, 4096); + brw-workaround_bo = brw-intelScreen-workaround_bo; if (brw-workaround_bo == NULL) return -ENOMEM; + drm_intel_bo_reference(brw-workaround_bo); + brw-pipe_controls_since_last_cs_stall = 0; return 0; diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 839a984..cd8e6eb 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv) { struct intel_screen *intelScreen = sPriv-driverPrivate; + drm_intel_bo_unreference(intelScreen-workaround_bo); I guess this is OK because after the last context got destroyed the screen would get destroyed? Sure, but not taking a reference for each would get a little strange and you would still need a comment to explain why not following the standard idiom is safe. Then when we introduce context local brw_bo, the conversion from reference to local is much cleaner with the existing ref. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult
That's right. Except really what might have happend was occl query; write X; more drawing; write X+1; and then on the CPU, you see X+1. So the tests are always for = X. And if you have more than 2^32 submits, you cry, because I'm *sure* that nothing implements wraparound properly :) On Mon, Jul 6, 2015 at 1:45 PM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Ilia, thanks a lot for the info. So basically if I submit to GPU's command stream: perform occlusion query, write X to Y. I know that query is completed when after reading Y address I get X. Regards, Vyacheslav On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin imir...@alum.mit.edu wrote: I'm only really familiar with nouveau, but I think all GPU hardware works in roughly the same way. Basically you have some way of inserting write X to address Y into the command stream (aka a fence), after which you insert write X+1 to address Y and so on. If you want the CPU to wait on a given fence, you just do while (*address x);. If you have multiple GPU processing queues, you can usually also insert a stall this queue until the value at address Y is at least X command into the command stream. DRM uses implicit fences, so it knows which BOs are used for particular commands. So the flow goes something like submit bunch of commands; submit fence write and attach that fence id to the BOs in the previous bunch of comands. Then to wait for a bo to become ready, you just wait until the GPU writes the appropriate number to memory address Y (from above). The mesa drivers can sometimes use clever tricks that avoid this sync'ing because it knows exactly how it emits the commands and perhaps it waits on something related earlier whereby it knows the other thing will be ready. No idea if that's the case here. Hope this helps, -ilia On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Ilia, thanks for the gallium link. Do you know any links to high level info with broad strokes about how this sync works? Frankly I do not know driver terminology and wanted to know more about how this sync is performed for my research. I'm using mesa as a reference because it has open implementation code. Occlusion query functionality probably waits for z-buffer to become ready. Problem is that usual synchronization techniques do not apply here. I'm thinking that driver code gets notifications about state change. I want to know what kind of notifications are available? Can query be performed in parallel with another frame being processed or does it need complete GPU pipeline flush? Thanks, Vyacheslav On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Hi, everyone. Trying to understand method radeonQueryGetResult (more broadly GPU-CPU sync). static void radeonQueryGetResult(struct gl_context *ctx, struct gl_query_object *q) { struct radeon_query_object *query = (struct radeon_query_object *)q; uint32_t *result; int i; radeon_print(RADEON_STATE, RADEON_VERBOSE, %s: query id %d, result %d\n, __func__, query-Base.Id, (int) query-Base.Result); radeon_bo_map(query-bo, GL_FALSE); result = query-bo-ptr; query-Base.Result = 0; for (i = 0; i query-curr_offset/sizeof(uint32_t); ++i) { query-Base.Result += LE32_TO_CPU(result[i]); radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i, LE32_TO_CPU(result[i])); } radeon_bo_unmap(query-bo); } I don't know which part is responsible for blocking behavior (waiting for response from GPU). I suspect that radeon_bo_map does this magic. Can someone point in the right direction? The radeon_bo_map defined in src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic. However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c references the radeon_bo_map in libdrm, which does not appear to wait. FWIW for nouveau, nouveau_bo_map will also implicitly do a nouveau_bo_wait, but that does not appear to be the case for radeon. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nvc0: fix wrong use of BLIT_SRC_Y_INT for 2D texture copy
According to nv50, this should be src-ms_y instead of src-ms_x. This code is here since 2012, so it's probably a typo error which has never been detected since a long time. I didn't do a full piglit run to check if it fixes some other weird issues. Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com --- src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c index a820de7..53cd8cd 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c @@ -189,7 +189,7 @@ nvc0_2d_texture_do_copy(struct nouveau_pushbuf *push, PUSH_DATA (push, 0); PUSH_DATA (push, sx src-ms_x); PUSH_DATA (push, 0); - PUSH_DATA (push, sy src-ms_x); + PUSH_DATA (push, sy src-ms_y); return 0; } -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult
Am 06.07.2015 um 19:54 schrieb Ilia Mirkin: That's right. Except really what might have happend was occl query; write X; more drawing; write X+1; and then on the CPU, you see X+1. So the tests are always for = X. And if you have more than 2^32 submits, you cry, because I'm *sure* that nothing implements wraparound properly :) That's why 64bit counters are used right? :-). Roland On Mon, Jul 6, 2015 at 1:45 PM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Ilia, thanks a lot for the info. So basically if I submit to GPU's command stream: perform occlusion query, write X to Y. I know that query is completed when after reading Y address I get X. Regards, Vyacheslav On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin imir...@alum.mit.edu wrote: I'm only really familiar with nouveau, but I think all GPU hardware works in roughly the same way. Basically you have some way of inserting write X to address Y into the command stream (aka a fence), after which you insert write X+1 to address Y and so on. If you want the CPU to wait on a given fence, you just do while (*address x);. If you have multiple GPU processing queues, you can usually also insert a stall this queue until the value at address Y is at least X command into the command stream. DRM uses implicit fences, so it knows which BOs are used for particular commands. So the flow goes something like submit bunch of commands; submit fence write and attach that fence id to the BOs in the previous bunch of comands. Then to wait for a bo to become ready, you just wait until the GPU writes the appropriate number to memory address Y (from above). The mesa drivers can sometimes use clever tricks that avoid this sync'ing because it knows exactly how it emits the commands and perhaps it waits on something related earlier whereby it knows the other thing will be ready. No idea if that's the case here. Hope this helps, -ilia On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Ilia, thanks for the gallium link. Do you know any links to high level info with broad strokes about how this sync works? Frankly I do not know driver terminology and wanted to know more about how this sync is performed for my research. I'm using mesa as a reference because it has open implementation code. Occlusion query functionality probably waits for z-buffer to become ready. Problem is that usual synchronization techniques do not apply here. I'm thinking that driver code gets notifications about state change. I want to know what kind of notifications are available? Can query be performed in parallel with another frame being processed or does it need complete GPU pipeline flush? Thanks, Vyacheslav On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Hi, everyone. Trying to understand method radeonQueryGetResult (more broadly GPU-CPU sync). static void radeonQueryGetResult(struct gl_context *ctx, struct gl_query_object *q) { struct radeon_query_object *query = (struct radeon_query_object *)q; uint32_t *result; int i; radeon_print(RADEON_STATE, RADEON_VERBOSE, %s: query id %d, result %d\n, __func__, query-Base.Id, (int) query-Base.Result); radeon_bo_map(query-bo, GL_FALSE); result = query-bo-ptr; query-Base.Result = 0; for (i = 0; i query-curr_offset/sizeof(uint32_t); ++i) { query-Base.Result += LE32_TO_CPU(result[i]); radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i, LE32_TO_CPU(result[i])); } radeon_bo_unmap(query-bo); } I don't know which part is responsible for blocking behavior (waiting for response from GPU). I suspect that radeon_bo_map does this magic. Can someone point in the right direction? The radeon_bo_map defined in src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic. However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c references the radeon_bo_map in libdrm, which does not appear to wait. FWIW for nouveau, nouveau_bo_map will also implicitly do a nouveau_bo_wait, but that does not appear to be the case for radeon. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=BQIGaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=WLJtDrNOS1MO4md0Q2dXG1RDHVFkoqdi6-ZeojTw0l8s=ZgIqDkkPvOUZvUp4VGqWC8rnvcv-tBNOIB6Dqpkh2uUe= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib
Before validating vertex arrays we need to check if a VBO is present. Checking if vb-buffer is not NULL fixes the issue. Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com --- src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c index 1fd33b8..3d200bd 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c @@ -382,6 +382,11 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50) if (nv50-vbo_user (1 b)) { address = addrs[b] + ve-pipe.src_offset; limit = addrs[b] + limits[b]; + } else + if (!vb-buffer) { + BEGIN_NV04(push, NV50_3D(VERTEX_ARRAY_FETCH(i)), 1); + PUSH_DATA (push, 0); + continue; } else { struct nv04_resource *buf = nv04_resource(vb-buffer); if (!(refd (1 b))) { -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/skl: Set the pulls bary bit in 3DSTATE_PS_EXTRA
On Fri, Jul 3, 2015 at 5:15 AM, Neil Roberts n...@linux.intel.com wrote: On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if the shader sends a message to the pixel interpolator. This fixes the interpolateAt* tests on SKL, apart from interpolateatsample-nonconst but that is not implemented anywhere so it's not a regression. --- src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/brw_defines.h | 1 + src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 src/mesa/drivers/dri/i965/gen8_ps_state.c | 3 +++ 4 files changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3553f6e..7596139 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -415,6 +415,7 @@ struct brw_wm_prog_data { bool uses_pos_offset; bool uses_omask; bool uses_kill; + bool pulls_bary; uint32_t prog_offset_16; /** diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 66b9abc..19489ab 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -2145,6 +2145,7 @@ enum brw_pixel_shader_computed_depth_mode { # define GEN8_PSX_SHADER_DISABLES_ALPHA_TO_COVERAGE (1 7) # define GEN8_PSX_SHADER_IS_PER_SAMPLE (1 6) # define GEN8_PSX_SHADER_COMPUTES_STENCIL (1 5) +# define GEN9_PSX_SHADER_PULLS_BARY (1 3) # define GEN8_PSX_SHADER_HAS_UAV(1 2) # define GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK (1 1) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index bd71404..3ebc3a2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1481,6 +1481,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder bld, nir_intrinsic_instr *instr case nir_intrinsic_interp_var_at_centroid: case nir_intrinsic_interp_var_at_sample: case nir_intrinsic_interp_var_at_offset: { + assert(stage == MESA_SHADER_FRAGMENT); + + ((struct brw_wm_prog_data *) prog_data)-pulls_bary = true; + fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2); /* For most messages, we need one reg of ignored data; the hardware diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c index a88f109..d544509 100644 --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c @@ -58,6 +58,9 @@ gen8_upload_ps_extra(struct brw_context *brw, if (prog_data-uses_omask) dw1 |= GEN8_PSX_OMASK_TO_RENDER_TARGET; + if (brw-gen = 9 prog_data-pulls_bary) + dw1 |= GEN9_PSX_SHADER_PULLS_BARY; + if (_mesa_active_fragment_shader_has_atomic_ops(brw-ctx)) dw1 |= GEN8_PSX_SHADER_HAS_UAV; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult
Ilia, thanks a lot for the info. So basically if I submit to GPU's command stream: perform occlusion query, write X to Y. I know that query is completed when after reading Y address I get X. Regards, Vyacheslav On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin imir...@alum.mit.edu wrote: I'm only really familiar with nouveau, but I think all GPU hardware works in roughly the same way. Basically you have some way of inserting write X to address Y into the command stream (aka a fence), after which you insert write X+1 to address Y and so on. If you want the CPU to wait on a given fence, you just do while (*address x);. If you have multiple GPU processing queues, you can usually also insert a stall this queue until the value at address Y is at least X command into the command stream. DRM uses implicit fences, so it knows which BOs are used for particular commands. So the flow goes something like submit bunch of commands; submit fence write and attach that fence id to the BOs in the previous bunch of comands. Then to wait for a bo to become ready, you just wait until the GPU writes the appropriate number to memory address Y (from above). The mesa drivers can sometimes use clever tricks that avoid this sync'ing because it knows exactly how it emits the commands and perhaps it waits on something related earlier whereby it knows the other thing will be ready. No idea if that's the case here. Hope this helps, -ilia On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Ilia, thanks for the gallium link. Do you know any links to high level info with broad strokes about how this sync works? Frankly I do not know driver terminology and wanted to know more about how this sync is performed for my research. I'm using mesa as a reference because it has open implementation code. Occlusion query functionality probably waits for z-buffer to become ready. Problem is that usual synchronization techniques do not apply here. I'm thinking that driver code gets notifications about state change. I want to know what kind of notifications are available? Can query be performed in parallel with another frame being processed or does it need complete GPU pipeline flush? Thanks, Vyacheslav On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Hi, everyone. Trying to understand method radeonQueryGetResult (more broadly GPU-CPU sync). static void radeonQueryGetResult(struct gl_context *ctx, struct gl_query_object *q) { struct radeon_query_object *query = (struct radeon_query_object *)q; uint32_t *result; int i; radeon_print(RADEON_STATE, RADEON_VERBOSE, %s: query id %d, result %d\n, __func__, query-Base.Id, (int) query-Base.Result); radeon_bo_map(query-bo, GL_FALSE); result = query-bo-ptr; query-Base.Result = 0; for (i = 0; i query-curr_offset/sizeof(uint32_t); ++i) { query-Base.Result += LE32_TO_CPU(result[i]); radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i, LE32_TO_CPU(result[i])); } radeon_bo_unmap(query-bo); } I don't know which part is responsible for blocking behavior (waiting for response from GPU). I suspect that radeon_bo_map does this magic. Can someone point in the right direction? The radeon_bo_map defined in src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic. However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c references the radeon_bo_map in libdrm, which does not appear to wait. FWIW for nouveau, nouveau_bo_map will also implicitly do a nouveau_bo_wait, but that does not appear to be the case for radeon. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once
On Monday, July 06, 2015 05:12:10 PM Chris Wilson wrote: On Mon, Jul 06, 2015 at 04:19:36PM +0300, Martin Peres wrote: On 06/07/15 16:15, Martin Peres wrote: On 06/07/15 16:13, Chris Wilson wrote: On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote: On 06/07/15 13:33, Chris Wilson wrote: Move the query for the TIMESTAMP register from context init to the screen, so that it is only queried once for all contexts. On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low bits always zero. Detect this by repeating the read a few times and check the register is incrementing. You do not do the latter. You only check for the low bits. I guess the counter is supposed to be monotonically increasing and with a resolution of a few microseconds which would make this perfectly valid. Could you confirm and make sure to add this information in the commit message please? The counter should increment every 80ns. What's misleading in what I wrote? It describes the hw bug and how to detect it. Well, it is not misleading, it just lacks this information. If it incremented every seconds, the patch would be stupid because the timestamp could be at 0 and polling 10 times at a few us of interval would always yield the same result. That's all :) Oh, forgot to say: With this information added in the commit message and the commit message duplicated as a comment in intel_detect_timestamp(), the patch is: How about: On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low 32bits always zero. Detect this by repeating the read a few times and check the register is incrementing every 80ns as expected and not stuck on zero (as would be the case with the buggy kernel/hw.). -Chris It would be great to put this in a comment above the loop rather than only in the commit message. Also, moving the check to screen-time and fixing the check to use a loop are really two separate things - but they're so trivial I'm inclined to not be picky. :) Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 06/18] i965: Pass the map-mode along to intel_mipmap_tree_map_raw()
On Monday, July 06, 2015 11:33:11 AM Chris Wilson wrote: Since we can distinguish when mapping between READ and WRITE, we can pass along the map mode to avoid stalls and flushes where possible. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 28 ++- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 3 ++- 2 files changed, 17 insertions(+), 14 deletions(-) Huh, I thought I fixed this ages ago. Guess not. Thanks! Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts
On Monday, July 06, 2015 11:33:08 AM Chris Wilson wrote: Since the workaround bo is used strictly as a write-only buffer, we need only allocate one per screen and use the same one from all contexts. (The caveat here is during extension initialisation, where we write into and read back register values from the buffer, but that is performed only once for the first context - and baring synchronisation issues should not be a problem. Safer would be to move that also to the screen.) Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++--- src/mesa/drivers/dri/i965/intel_screen.c | 4 src/mesa/drivers/dri/i965/intel_screen.h | 1 + 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index 7ee3cb6..05e14cd 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw, * the gen6 workaround because it involves actually writing to * the buffer, and the kernel doesn't let us write to the batch. */ - brw-workaround_bo = drm_intel_bo_alloc(brw-bufmgr, - pipe_control workaround, - 4096, 4096); + brw-workaround_bo = brw-intelScreen-workaround_bo; if (brw-workaround_bo == NULL) return -ENOMEM; Checking for out-of-memory conditions in code that doesn't actually allocate anything looks funky now. I'd be inclined just to drop the -ENOMEM path and make this a void function. Alternatively, you could just fold this into the brw_context setup and ditch the functions altogether. Up to you. + drm_intel_bo_reference(brw-workaround_bo); + brw-pipe_controls_since_last_cs_stall = 0; return 0; diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 839a984..cd8e6eb 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv) { struct intel_screen *intelScreen = sPriv-driverPrivate; + drm_intel_bo_unreference(intelScreen-workaround_bo); dri_bufmgr_destroy(intelScreen-bufmgr); driDestroyOptionInfo(intelScreen-optionCache); @@ -1096,6 +1097,9 @@ intel_init_bufmgr(struct intel_screen *intelScreen) return false; } + intelScreen-workaround_bo = + drm_intel_bo_alloc(intelScreen-bufmgr, pipe_control w/a, 4096, 4096); + Seems a little funny to put this in intel_init_bufmgr, since it's not setting up libdrm - why not put it in the caller? return true; } diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h index 941e0fc..e55fddb 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.h +++ b/src/mesa/drivers/dri/i965/intel_screen.h @@ -60,6 +60,7 @@ struct intel_screen bool has_context_reset_notification; dri_bufmgr *bufmgr; + drm_intel_bo *workaround_bo; /** * A unique ID for shader programs. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] i965/gen4-5: Enable 16-wide dispatch on shaders with control flow.
This was probably disabled due to a combination of several bugs in the generator code (fixed earlier in this series) and a misunderstanding of the hardware spec. The documentation for most control flow instructions mentions among other restrictions: Instruction compression is not allowed. This however doesn't have any implications on 16 wide not being supported, because none of the control flow instructions have multi-register operands (control flow instructions are not compressed on more recent hardware either, except maybe SNB's IF with inline compare). In fact Gen4-5 had 16-wide control flow masks and stacks, and the spec mentions in several places that control flow instructions push and pop 16 channels worth of data -- Otherwise there doesn't seem to be any indication that it shouldn't work. Causes no piglit regressions, and gives the following shader-db results on ILK: total instructions in shared programs: 4711384 - 4711384 (0.00%) instructions in affected programs: 0 - 0 helped:0 HURT: 0 GAINED:1215 LOST: 0 --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index bd71404..5247738 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -417,18 +417,12 @@ fs_visitor::nir_emit_if(nir_if *if_stmt) bld.emit(BRW_OPCODE_ENDIF); - if (!try_replace_with_sel() devinfo-gen 6) { - no16(Can't support (non-uniform) control flow on SIMD16\n); - } + try_replace_with_sel(); } void fs_visitor::nir_emit_loop(nir_loop *loop) { - if (devinfo-gen 6) { - no16(Can't support (non-uniform) control flow on SIMD16\n); - } - bld.emit(BRW_OPCODE_DO); nir_emit_cf_list(loop-body); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] glsl: Add missing check for whether an expression is an add operation
Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] glsl: Make sure not to dereference NULL
On Sat, Jul 4, 2015 at 2:40 PM, Neil Roberts n...@linux.intel.com wrote: In this bit of code point_five can be NULL if the expression is not a constant. This fixes it to match the pattern of the rest of the chunk of code so that it checks for NULLs. Cc: Matt Turner matts...@gmail.com Cc: 10.6 mesa-sta...@lists.freedesktop.org --- src/glsl/opt_algebraic.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 9b8a426..c4b8715 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -589,7 +589,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) continue; ir_constant *point_five = add_expr-operands[1 - j]-as_constant(); -if (!point_five-is_value(0.5, 0)) +if (!point_five || !point_five-is_value(0.5, 0)) continue; if (abs_expr-operands[0]-equals(sign_expr-operands[0])) { -- 1.9.3 Whoops. I think I thought that is_value NULL checked 'this'. Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965/gen4-5: Set ENDIF dst and src0 fields to the null register.
The hardware docs don't mention explicitly what these fields should be, but I've verified experimentally on ILK that using a GRF as destination causes the register to be corrupted when the execution size of an ENDIF instruction is higher than 8 -- and because the destination we were using was g0, eventually a hang. Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3. --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 0f53604..4d39762 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1584,8 +1584,8 @@ brw_ENDIF(struct brw_codegen *p) } if (devinfo-gen 6) { - brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD)); - brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD)); + brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D)); + brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D)); brw_set_src1(p, insn, brw_imm_d(0x0)); } else if (devinfo-gen == 6) { brw_set_dest(p, insn, brw_imm_w(0)); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] i965/gen4-5: Program the execution size correctly for DO/WHILE instructions.
From the hardware docs for the DO instruction: Execution size is ignored for this instruction. My observation on ILK hardware contradicts the spec though, channels over the execution size of a DO instruction won't enter the loop, and channels over the execution size of a WHILE instruction will exit the loop after the first iteration -- The latter is consistent with the spec though, there's no claim that the execution size is ignored for the WHILE instruction so it's not completely unexpected that it has an influence on the evaluation of EMask. The execute_size argument of brw_DO() shouldn't have any effect on Gen6 and newer hardware. On Gen4-5 WHILE instructions inherit the execution size from the matching DO, so this patch should fix them too. The execution size of BREAK and CONT instructions was already being set correctly. Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with conditional and loop instructions 16-wide, e.g. shaders/glsl-fs-continue-inside-do-while. --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 0a70bdc..c986d91 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -1869,7 +1869,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) break; case BRW_OPCODE_DO: -brw_DO(p, BRW_EXECUTE_8); +brw_DO(p, dispatch_width == 16 ? BRW_EXECUTE_16 : BRW_EXECUTE_8); break; case BRW_OPCODE_BREAK: -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: allocate at least 1 BLEND_STATE element
On Monday, July 06, 2015 04:24:12 PM Emil Velikov wrote: Hello gents, On 2 July 2015 at 08:45, Kenneth Graunke kenn...@whitecape.org wrote: On Wednesday, July 01, 2015 10:16:28 AM Mike Stroyan wrote: When there are no color buffer render targets, gen6 and gen7 still use the first BLEND_STATE element to determine alpha test. gen6_upload_blend_state was allocating zero elements when ctx-Color.AlphaEnabled was false. That left _3DSTATE_CC_STATE_POINTERS or _3DSTATE_BLEND_STATE_POINTERS pointing to random data from some previous brw_state_batch(). That sometimes suppressed depth rendering when those bits happened to mean COMPAREFUNC_NEVER. This produced flickering shadows for dota2 reborn. --- src/mesa/drivers/dri/i965/gen6_cc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/gen6_cc.c b/src/mesa/drivers/dri/i965/gen6_cc.c index 2bfa271..2b76e24 100644 --- a/src/mesa/drivers/dri/i965/gen6_cc.c +++ b/src/mesa/drivers/dri/i965/gen6_cc.c @@ -51,7 +51,7 @@ gen6_upload_blend_state(struct brw_context *brw) * with render target 0, which will reference BLEND_STATE[0] for * alpha test enable. */ - if (nr_draw_buffers == 0 ctx-Color.AlphaEnabled) + if (nr_draw_buffers == 0) nr_draw_buffers = 1; size = sizeof(*blend) * nr_draw_buffers; Great catch! Reviewed-by: Kenneth Graunke kenn...@whitecape.org And pushed: 9d408a4..fe2b748 master - master I think we ought to change gen8_blend_state.c as well, but I'm not quite sure what change to make. Either we should make the same change you did here, or delete the whole We need at least 1 BLEND_STATE written block. On Gen8+, it looks like the alpha test and other functions that might discard pixels are all in the shared/common DWord, and the per-color target DWord pairs look relatively harmless. I suppose the null RT would still refer to BLEND_STATE[0]...so it might still be worth emitting one. Any thoughts? Should we get this into 10.6 or there are some not so obvious dependencies that we're missing in the 10.6 branch ? Asking every user to rebuild mesa just to play Dota2 seems unreasonable imho. Cheers, Emil Yeah, it should definitely land in 10.6.I think I just forgot to add the Cc stable tag (sorry!). It shouldn't have any dependencies. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult
On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Hi, everyone. Trying to understand method radeonQueryGetResult (more broadly GPU-CPU sync). static void radeonQueryGetResult(struct gl_context *ctx, struct gl_query_object *q) { struct radeon_query_object *query = (struct radeon_query_object *)q; uint32_t *result; int i; radeon_print(RADEON_STATE, RADEON_VERBOSE, %s: query id %d, result %d\n, __func__, query-Base.Id, (int) query-Base.Result); radeon_bo_map(query-bo, GL_FALSE); result = query-bo-ptr; query-Base.Result = 0; for (i = 0; i query-curr_offset/sizeof(uint32_t); ++i) { query-Base.Result += LE32_TO_CPU(result[i]); radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i, LE32_TO_CPU(result[i])); } radeon_bo_unmap(query-bo); } I don't know which part is responsible for blocking behavior (waiting for response from GPU). I suspect that radeon_bo_map does this magic. Can someone point in the right direction? The radeon_bo_map defined in src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic. However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c references the radeon_bo_map in libdrm, which does not appear to wait. FWIW for nouveau, nouveau_bo_map will also implicitly do a nouveau_bo_wait, but that does not appear to be the case for radeon. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/18] i965: Move pipecontrol workaround bo to brw_pipe_control
On Monday, July 06, 2015 11:33:07 AM Chris Wilson wrote: With the exception of gen8, the sole user of the workaround bo are for emitting pipe controls. Move it out of the purview of the batchbuffer and into the pipecontrol. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices
On Monday, July 06, 2015 11:33:10 AM Chris Wilson wrote: Rather than allocating a fresh page every time we clear a buffer, keep that page around between invocations by tracking the last used offset and only allocating a fresh page when we wrap. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) This looks okay to me. Do you have any performance data to justify the extra complexity? signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult
Ilia, thanks for the gallium link. Do you know any links to high level info with broad strokes about how this sync works? Frankly I do not know driver terminology and wanted to know more about how this sync is performed for my research. I'm using mesa as a reference because it has open implementation code. Occlusion query functionality probably waits for z-buffer to become ready. Problem is that usual synchronization techniques do not apply here. I'm thinking that driver code gets notifications about state change. I want to know what kind of notifications are available? Can query be performed in parallel with another frame being processed or does it need complete GPU pipeline flush? Thanks, Vyacheslav On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Hi, everyone. Trying to understand method radeonQueryGetResult (more broadly GPU-CPU sync). static void radeonQueryGetResult(struct gl_context *ctx, struct gl_query_object *q) { struct radeon_query_object *query = (struct radeon_query_object *)q; uint32_t *result; int i; radeon_print(RADEON_STATE, RADEON_VERBOSE, %s: query id %d, result %d\n, __func__, query-Base.Id, (int) query-Base.Result); radeon_bo_map(query-bo, GL_FALSE); result = query-bo-ptr; query-Base.Result = 0; for (i = 0; i query-curr_offset/sizeof(uint32_t); ++i) { query-Base.Result += LE32_TO_CPU(result[i]); radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i, LE32_TO_CPU(result[i])); } radeon_bo_unmap(query-bo); } I don't know which part is responsible for blocking behavior (waiting for response from GPU). I suspect that radeon_bo_map does this magic. Can someone point in the right direction? The radeon_bo_map defined in src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic. However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c references the radeon_bo_map in libdrm, which does not appear to wait. FWIW for nouveau, nouveau_bo_map will also implicitly do a nouveau_bo_wait, but that does not appear to be the case for radeon. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once
On Mon, Jul 06, 2015 at 04:19:36PM +0300, Martin Peres wrote: On 06/07/15 16:15, Martin Peres wrote: On 06/07/15 16:13, Chris Wilson wrote: On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote: On 06/07/15 13:33, Chris Wilson wrote: Move the query for the TIMESTAMP register from context init to the screen, so that it is only queried once for all contexts. On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low bits always zero. Detect this by repeating the read a few times and check the register is incrementing. You do not do the latter. You only check for the low bits. I guess the counter is supposed to be monotonically increasing and with a resolution of a few microseconds which would make this perfectly valid. Could you confirm and make sure to add this information in the commit message please? The counter should increment every 80ns. What's misleading in what I wrote? It describes the hw bug and how to detect it. Well, it is not misleading, it just lacks this information. If it incremented every seconds, the patch would be stupid because the timestamp could be at 0 and polling 10 times at a few us of interval would always yield the same result. That's all :) Oh, forgot to say: With this information added in the commit message and the commit message duplicated as a comment in intel_detect_timestamp(), the patch is: How about: On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low 32bits always zero. Detect this by repeating the read a few times and check the register is incrementing every 80ns as expected and not stuck on zero (as would be the case with the buggy kernel/hw.). -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome
https://bugs.freedesktop.org/show_bug.cgi?id=90264 --- Comment #32 from Matt Whitlock freedesk...@mattwhitlock.name --- (In reply to Matt Whitlock from comment #29) Also, for what it's worth, this problem seems to have disappeared for me. I have rebuilt Mesa 10.5.6 without the revert of 95073a2d, and Chromium 44.0.2403.30 is no longer exhibiting the bug for me. I am still on libdrm 2.4.59, but I have since upgraded my Linux kernel to 4.0.5-gentoo. I spoke too soon. I am still seeing tooltip corruptions in Chromium 44.0.2403.61 on Linux 4.0.5-gentoo, libdrm 2.4.62, Nouveau 1.0.11, and Mesa 10.6.1 (without the revert of 95073a2d). -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/18] i965: Make intel_mipmap_tree_map_raw() static
On Monday, July 06, 2015 11:33:12 AM Chris Wilson wrote: No external users, so no need to export the symbol outside of our compilation unit. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Good call. Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_FS_PROG_DATA in gen6_renderbuffer_surfaces.
It looks like this was forgotten in commit 3c9dc2d31b80fc73bffa1f40a (i965: Make a brw_stage_prog_data for storing the SURF_INDEX information.) In other words, it's been missing since we moved to dynamic binding table slot assignments. Cc: mesa-sta...@lists.freedesktop.org Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c index 72aad96..b67d9ca 100644 --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c @@ -770,7 +770,7 @@ update_renderbuffer_surfaces(struct brw_context *brw) { const struct gl_context *ctx = brw-ctx; - /* _NEW_BUFFERS | _NEW_COLOR */ + /* _NEW_BUFFERS | _NEW_COLOR | BRW_NEW_FS_PROG_DATA */ const struct gl_framebuffer *fb = ctx-DrawBuffer; brw_update_renderbuffer_surfaces( brw, fb, @@ -792,7 +792,8 @@ const struct brw_tracked_state brw_renderbuffer_surfaces = { const struct brw_tracked_state gen6_renderbuffer_surfaces = { .dirty = { .mesa = _NEW_BUFFERS, - .brw = BRW_NEW_BATCH, + .brw = BRW_NEW_BATCH | + BRW_NEW_FS_PROG_DATA, }, .emit = update_renderbuffer_surfaces, }; -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult
I'm only really familiar with nouveau, but I think all GPU hardware works in roughly the same way. Basically you have some way of inserting write X to address Y into the command stream (aka a fence), after which you insert write X+1 to address Y and so on. If you want the CPU to wait on a given fence, you just do while (*address x);. If you have multiple GPU processing queues, you can usually also insert a stall this queue until the value at address Y is at least X command into the command stream. DRM uses implicit fences, so it knows which BOs are used for particular commands. So the flow goes something like submit bunch of commands; submit fence write and attach that fence id to the BOs in the previous bunch of comands. Then to wait for a bo to become ready, you just wait until the GPU writes the appropriate number to memory address Y (from above). The mesa drivers can sometimes use clever tricks that avoid this sync'ing because it knows exactly how it emits the commands and perhaps it waits on something related earlier whereby it knows the other thing will be ready. No idea if that's the case here. Hope this helps, -ilia On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Ilia, thanks for the gallium link. Do you know any links to high level info with broad strokes about how this sync works? Frankly I do not know driver terminology and wanted to know more about how this sync is performed for my research. I'm using mesa as a reference because it has open implementation code. Occlusion query functionality probably waits for z-buffer to become ready. Problem is that usual synchronization techniques do not apply here. I'm thinking that driver code gets notifications about state change. I want to know what kind of notifications are available? Can query be performed in parallel with another frame being processed or does it need complete GPU pipeline flush? Thanks, Vyacheslav On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Hi, everyone. Trying to understand method radeonQueryGetResult (more broadly GPU-CPU sync). static void radeonQueryGetResult(struct gl_context *ctx, struct gl_query_object *q) { struct radeon_query_object *query = (struct radeon_query_object *)q; uint32_t *result; int i; radeon_print(RADEON_STATE, RADEON_VERBOSE, %s: query id %d, result %d\n, __func__, query-Base.Id, (int) query-Base.Result); radeon_bo_map(query-bo, GL_FALSE); result = query-bo-ptr; query-Base.Result = 0; for (i = 0; i query-curr_offset/sizeof(uint32_t); ++i) { query-Base.Result += LE32_TO_CPU(result[i]); radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i, LE32_TO_CPU(result[i])); } radeon_bo_unmap(query-bo); } I don't know which part is responsible for blocking behavior (waiting for response from GPU). I suspect that radeon_bo_map does this magic. Can someone point in the right direction? The radeon_bo_map defined in src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic. However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c references the radeon_bo_map in libdrm, which does not appear to wait. FWIW for nouveau, nouveau_bo_map will also implicitly do a nouveau_bo_wait, but that does not appear to be the case for radeon. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965/gen4-5: Enable 16-wide dispatch on shaders with control flow.
On Mon, Jul 6, 2015 at 11:03 AM, Francisco Jerez curroje...@riseup.net wrote: This was probably disabled due to a combination of several bugs in the generator code (fixed earlier in this series) and a misunderstanding of the hardware spec. The documentation for most control flow instructions mentions among other restrictions: Instruction compression is not allowed. This however doesn't have any implications on 16 wide not being supported, because none of the control flow instructions have multi-register operands (control flow instructions are not compressed on more recent hardware either, except maybe SNB's IF with inline compare). In fact Gen4-5 had 16-wide control flow masks and stacks, and the spec mentions in several places that control flow instructions push and pop 16 channels worth of data -- Otherwise there doesn't seem to be any indication that it shouldn't work. Awesome. Yeah, I was wondering aloud to Ken recently about what actually prevented non-uniform control flow from working -- because as you say the mask registers are 16-channels wide. Really cool to find out that it was just a couple of bugs and nothing fundamental. Really nice work. Causes no piglit regressions, and gives the following shader-db results on ILK: Assuming no regressions on Gen4 and G45 (running through Jenkins should do it): Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
On Monday, July 06, 2015 11:33:09 AM Chris Wilson wrote: When submitting commands to the GPU every cycle of latency counts; mutexes, spinlocks, even atomics quickly add to substantial overhead. This batch manager acts as thread-local shim over the buffer manager (drm_intel_bufmgr_gem). As we are only ever used from within a single context, we can rely on the upper layers providing thread safety. This allows us to import buffers from the shared screen (sharing buffers between multiple contexts, threads and users) and wrap that handle in our own. Similarly, we want to share the buffer cache between all users on the file and so allocate from the global threadsafe buffer manager, with a very small and transient local cache of active buffers. The batch manager provides a cheap way of busyness tracking and very efficient batch construction and kernel submission. The restrictions over and above the generic submission engine in intel_bufmgr_gem are: - not thread-safe - flat relocations, only the batch buffer itself carries relocations. Relocations relative to auxiliary buffers must be performed via STATE_BASE - direct mapping of the batch for writes, expect reads from the batch to be slow - the batch is a fixed 64k in size - access to the batch must be wrapped by brw_batch_begin/_end - all relocations must be immediately written into the batch The importance of the flat relocation tree with local offset handling is that it allows us to use the relocation-less execbuffer interfaces, dramatically reducing the overhead of batch submission. However, that can be relaxed to allow other buffers than the batch buffer to carry relocations, if need be. ivb/bdw OglBatch7 improves by ~20% above and beyond my kernel relocation speedups. ISSUES: * shared mipmap trees - we instantiate a context local copy on use, but what are the semantics for serializing read/writes between them - do we need automagic flushing of execution on other contexts and common busyness tracking? - we retain references to the bo past the lifetime of its parent batchmgr as the mipmap_tree is retained past the lifetime of its original context, see glx_arb_create_context/default_major_version * OglMultithread is nevertheless unhappy; but that looks like undefined behaviour - i.e. a buggy client concurrently executing the same GL context in multiple threads, unpatched is equally buggy. * Add full-ppgtt softpinning support (no more relocations, at least for the first 256TiB), at the moment there is a limited proof-of-principle demonstration * polish and move to libdrm; though at the cost of sealing the structs? Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Daniel Vetter daniel.vet...@ffwll.ch Cc: Kristian Høgsberg k...@bitplanet.net Cc: Kenneth Graunke kenn...@whitecape.org Cc: Jesse Barnes jbar...@virtuousgeek.org Cc: Ian Romanick ian.d.roman...@intel.com Cc: Abdiel Janulgue abdiel.janul...@linux.intel.com Cc: Eero Tamminen eero.t.tammi...@intel.com Cc: Martin Peres martin.pe...@linux.intel.com --- src/mesa/drivers/dri/i965/Makefile.sources |4 +- src/mesa/drivers/dri/i965/brw_batch.c | 1946 src/mesa/drivers/dri/i965/brw_batch.h | 377 src/mesa/drivers/dri/i965/brw_binding_tables.c |1 - src/mesa/drivers/dri/i965/brw_blorp.cpp| 46 +- src/mesa/drivers/dri/i965/brw_cc.c | 16 +- src/mesa/drivers/dri/i965/brw_clear.c |1 - src/mesa/drivers/dri/i965/brw_clip.c |2 - src/mesa/drivers/dri/i965/brw_clip_line.c |2 - src/mesa/drivers/dri/i965/brw_clip_point.c |2 - src/mesa/drivers/dri/i965/brw_clip_state.c | 14 +- src/mesa/drivers/dri/i965/brw_clip_tri.c |2 - src/mesa/drivers/dri/i965/brw_clip_unfilled.c |2 - src/mesa/drivers/dri/i965/brw_clip_util.c |2 - src/mesa/drivers/dri/i965/brw_compute.c| 42 +- src/mesa/drivers/dri/i965/brw_conditional_render.c |2 +- src/mesa/drivers/dri/i965/brw_context.c| 233 ++- src/mesa/drivers/dri/i965/brw_context.h| 144 +- src/mesa/drivers/dri/i965/brw_cs.cpp |6 +- src/mesa/drivers/dri/i965/brw_curbe.c |1 - src/mesa/drivers/dri/i965/brw_draw.c | 103 +- src/mesa/drivers/dri/i965/brw_draw_upload.c| 23 +- src/mesa/drivers/dri/i965/brw_ff_gs.c |2 - src/mesa/drivers/dri/i965/brw_ff_gs_emit.c |1 - src/mesa/drivers/dri/i965/brw_fs.cpp |5 +- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c| 11 +- src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c |1 - src/mesa/drivers/dri/i965/brw_meta_updownsample.c |1 - src/mesa/drivers/dri/i965/brw_misc_state.c | 10 +-
Re: [Mesa-dev] [PATCH 1/3] i965/gen4-5: Set ENDIF dst and src0 fields to the null register.
On Mon, Jul 6, 2015 at 11:03 AM, Francisco Jerez curroje...@riseup.net wrote: The hardware docs don't mention explicitly what these fields should be, but I've verified experimentally on ILK that using a GRF as destination causes the register to be corrupted when the execution size of an ENDIF instruction is higher than 8 -- and because the destination we were using was g0, eventually a hang. Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3. Neat! --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 0f53604..4d39762 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1584,8 +1584,8 @@ brw_ENDIF(struct brw_codegen *p) } if (devinfo-gen 6) { - brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD)); - brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD)); + brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D)); + brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D)); Strange that it was an imm32 (type D) on gen 6 and then imm16 after that. Reviewed-by: Matt Turner matts...@gmail.com brw_set_src1(p, insn, brw_imm_d(0x0)); } else if (devinfo-gen == 6) { brw_set_dest(p, insn, brw_imm_w(0)); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91226] Crash in glLinkProgram (NEW)
https://bugs.freedesktop.org/show_bug.cgi?id=91226 Neil Roberts n...@linux.intel.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #2 from Neil Roberts n...@linux.intel.com --- I've pushed the patch here: http://cgit.freedesktop.org/mesa/mesa/commit/?id=18039078e0254c7cb5e15b7186be -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for __attribute__((warn_unused_result)).
On Mon, Jul 6, 2015 at 11:18 AM, Kenneth Graunke kenn...@whitecape.org wrote: In the kernel, this is called __must_check; all our attribute macros in Mesa appear to be uppercase, so I went with that. Signed-off-by: Kenneth Graunke kenn...@whitecape.org Cc: ch...@chris-wilson.co.uk Cc: matts...@gmail.com --- Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use
On Mon, Jul 6, 2015 at 3:33 AM, Chris Wilson ch...@chris-wilson.co.uk wrote: The vertices require a large chunk of memory, currently allocated during context creation. However, this memory is not required until use so we can defer the allocation until the first swrast_Wakeup(). Makes sense to me. Someone like Brian like to take a look. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Kenneth Graunke kenn...@whitecape.org --- src/mesa/swrast_setup/ss_context.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/mesa/swrast_setup/ss_context.c b/src/mesa/swrast_setup/ss_context.c index 74b1da3..028eccb 100644 --- a/src/mesa/swrast_setup/ss_context.c +++ b/src/mesa/swrast_setup/ss_context.c @@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx ) swsetup-NewState = ~0; _swsetup_trifuncs_init( ctx ); - _tnl_init_vertices( ctx, ctx-Const.MaxArrayLockSize + 12, - sizeof(SWvertex) ); - - return GL_TRUE; } @@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx ) TNLcontext *tnl = TNL_CONTEXT(ctx); SScontext *swsetup = SWSETUP_CONTEXT(ctx); + if (!(GET_VERTEX_STATE(ctx))-max_vertex_size) + _tnl_init_vertices(ctx, +ctx-Const.MaxArrayLockSize + 12, +sizeof(SWvertex)); Replace the tabs on these two lines with spaces. + tnl-Driver.Render.Start = _swsetup_RenderStart; tnl-Driver.Render.Finish = _swsetup_RenderFinish; tnl-Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback
On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote: Since the purpose of transform feedback tends to be for the client to act upon the results to change the geometry in the scene, it is likely that the client will soon be waiting upon the results. Flush the batch early so that we don't build up a long queue of commands afterwards that could delay the readback. --- src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index 857ebe5..13dbe5b 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx, brw_batch_end(brw-batch); + /* We will likely want to read the results in the very near future, so +* push this primitive to hardware if it is currently idle. +*/ + if (!brw_batch_busy(brw-batch)) + brw_batch_flush(brw-batch); + /* EndTransformFeedback() means that we need to update the number of * vertices written. Since it's only necessary if DrawTransformFeedback() * is called and it means mapping a buffer object, we delay computing it We need some data to justify this change. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965/gen4-5: Program the execution size correctly for DO/WHILE instructions.
On Mon, Jul 6, 2015 at 11:03 AM, Francisco Jerez curroje...@riseup.net wrote: From the hardware docs for the DO instruction: Execution size is ignored for this instruction. My observation on ILK hardware contradicts the spec though, channels /facepalm over the execution size of a DO instruction won't enter the loop, and channels over the execution size of a WHILE instruction will exit the loop after the first iteration -- The latter is consistent with the spec though, there's no claim that the execution size is ignored for the WHILE instruction so it's not completely unexpected that it has an influence on the evaluation of EMask. The execute_size argument of brw_DO() shouldn't have any effect on Gen6 and newer hardware. On Gen4-5 WHILE instructions inherit the execution size from the matching DO, so this patch should fix them too. The execution size of BREAK and CONT instructions was already being set correctly. Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with conditional and loop instructions 16-wide, e.g. shaders/glsl-fs-continue-inside-do-while. Awesome! Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for __attribute__((warn_unused_result)).
In the kernel, this is called __must_check; all our attribute macros in Mesa appear to be uppercase, so I went with that. Signed-off-by: Kenneth Graunke kenn...@whitecape.org Cc: ch...@chris-wilson.co.uk Cc: matts...@gmail.com --- configure.ac | 1 + src/util/macros.h | 6 ++ 2 files changed, 7 insertions(+) I noticed Chris wants to use this in one of his patches, so I figured I'd throw together a patch to do this a bit more cleanly. diff --git a/configure.ac b/configure.ac index ea0f069..d240c06 100644 --- a/configure.ac +++ b/configure.ac @@ -210,6 +210,7 @@ AX_GCC_FUNC_ATTRIBUTE([format]) AX_GCC_FUNC_ATTRIBUTE([malloc]) AX_GCC_FUNC_ATTRIBUTE([packed]) AX_GCC_FUNC_ATTRIBUTE([unused]) +AX_GCC_FUNC_ATTRIBUTE([warn_unused_result]) AM_CONDITIONAL([GEN_ASM_OFFSETS], test x$GEN_ASM_OFFSETS = xyes) diff --git a/src/util/macros.h b/src/util/macros.h index 3b708ed..66698e7 100644 --- a/src/util/macros.h +++ b/src/util/macros.h @@ -182,6 +182,12 @@ do { \ #define UNUSED #endif +#ifdef HAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT +#define MUST_CHECK __attribute__((warn_unused_result)) +#else +#define MUST_CHECK +#endif + /** Compute ceiling of integer quotient of A divided by B. */ #define DIV_ROUND_UP( A, B ) ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 ) -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] i965/gen4-5: Set ENDIF dst and src0 fields to the null register.
On Monday, July 06, 2015 09:03:35 PM Francisco Jerez wrote: The hardware docs don't mention explicitly what these fields should be, but I've verified experimentally on ILK that using a GRF as destination causes the register to be corrupted when the execution size of an ENDIF instruction is higher than 8 -- and because the destination we were using was g0, eventually a hang. Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3. --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 0f53604..4d39762 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1584,8 +1584,8 @@ brw_ENDIF(struct brw_codegen *p) } if (devinfo-gen 6) { - brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD)); - brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD)); + brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D)); + brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D)); brw_set_src1(p, insn, brw_imm_d(0x0)); } else if (devinfo-gen == 6) { brw_set_dest(p, insn, brw_imm_w(0)); Hah! I always thought there were just some bugs in this area, but I'm surprised to see that they were so small. Great work tracking them down! We were doing a bunch of other bogus stuff back in the day, but some of that got fixed during the Gen4-7 and Gen8+ code generator merging. Nice work :) Series is: Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_FS_PROG_DATA in gen6_renderbuffer_surfaces.
On Mon, Jul 6, 2015 at 9:55 AM, Kenneth Graunke kenn...@whitecape.org wrote: It looks like this was forgotten in commit 3c9dc2d31b80fc73bffa1f40a (i965: Make a brw_stage_prog_data for storing the SURF_INDEX information.) In other words, it's been missing since we moved to dynamic binding table slot assignments. Author: Eric Anholt e...@anholt.net AuthorDate: Wed Oct 2 14:07:40 2013 -0700 CommitDate: Tue Oct 15 10:18:42 2013 -0700 Dang. How did you find this? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib
+ } else + if (!vb-buffer) { Should the else and if be on the same line? The general style elsewhere is to do it in this weird way. Can't say I'm a big fan, but I prefer consistency. I'd happily take a change that undid that oddity. Odd, okay. signature.asc Description: Digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib
On Mon, Jul 6, 2015 at 7:50 PM, Dylan Baker baker.dyla...@gmail.com wrote: On Mon, Jul 06, 2015 at 11:34:23PM +0200, Samuel Pitoiset wrote: Before validating vertex arrays we need to check if a VBO is present. Checking if vb-buffer is not NULL fixes the issue. Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com --- src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c index 1fd33b8..3d200bd 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c @@ -382,6 +382,11 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50) if (nv50-vbo_user (1 b)) { address = addrs[b] + ve-pipe.src_offset; limit = addrs[b] + limits[b]; + } else + if (!vb-buffer) { Should the else and if be on the same line? The general style elsewhere is to do it in this weird way. Can't say I'm a big fan, but I prefer consistency. I'd happily take a change that undid that oddity. + BEGIN_NV04(push, NV50_3D(VERTEX_ARRAY_FETCH(i)), 1); + PUSH_DATA (push, 0); + continue; } else { struct nv04_resource *buf = nv04_resource(vb-buffer); if (!(refd (1 b))) { -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Don't disable SIMD16 when using the pixel interpolator
On Sun, Jul 5, 2015 at 4:45 PM, Francisco Jerez curroje...@riseup.net wrote: Hi Matt, Matt Turner matts...@gmail.com writes: On Fri, Jul 3, 2015 at 3:46 AM, Francisco Jerez curroje...@riseup.net wrote: Heh, I happened to come across this comment yesterday while looking for the remaining no16 calls and wondered why on earth it couldn't do the same that the normal interpolation code does. After this patch and a series coming up that will remove all SIMD8 fallbacks from the texturing code, the only case left still applicable to Gen7 hardware and later will be SIMD16 explicit accumulator operands unsupported. Anyone? I can explain the problem: Prior to Gen7, the were were two accumulator registers usable for most datatypes (acc0, acc1). On Gen7, they removed integer-support from acc1, which was necessary to implement SIMD16 integer multiplication using the normal MUL/MACH sequence. IIRC they got rid of the acc1 register on IVB altogether, but managed to emulate it for floating point types by taking advantage of the extra precision not normally used for floating point arithmetic (the fake acc1 basically uses the same storage in the EU that holds the 32 MSBs of each component of acc0), what explains the apparent asymmetry between integer and floating point data types. I've never read anything that told me that -- what have you seen? I implemented 32-bit integer multiplication without using the accumulator in: commit f7df169ba13d22338e9276839a7e9629ca0a6b4f Author: Matt Turner matts...@gmail.com Date: Wed May 13 18:34:03 2015 -0700 i965/fs: Implement integer multiply without mul/mach. The remaining cases of SIMD16 explicit accumulator operands unsupported are ADDC, SUBB, and 32x32 - high 32-bit multiplication. The remaining multiplication case can probably be reimplemented without the accumulator, like I did for the low 32-bit result. Hmm, I have the suspicion that high 32-bit multiplication is the one legit use-case of the accumulator we have left, any algorithm breaking it up into individual 32/16-bit MULs would end up doing more multiplications than the two MUL/MACH instructions we do now, because we wouldn't be able to take advantage of the full precision implemented in the hardware if we truncate the 48-bit intermediate results to fit in a 32-bit register. That's probably true. It's just that Sandybridge and earlier don't expose the functionality (but could do 64-bit integer multiplication just fine), Ivybridge has the quarter-control/accumulator bug, Haswell works fine if you split the multiplication sequence into SIMD8, and Broadwell let's you do 32x32 - 64-bit multiplication without the accumulator. So you have only two platforms where it's you have to use the accumulator, and one of them is broken (but I guess can be trivially fixed by some force-writemask-all hackery). The best SIMD16 code for [iu]mulExtended() where both lsb and msb results are used is probably 2 sets of mul/mach/mov (with some kind of work around for Ivybridge), but that's kind of hard to recognize. How about we use the SIMD width lowering pass to split the computation in half? It should be quite straightforward but will probably require adding a new virtual opcode so that the SIMD width lowering pass doesn't have to deal with (seriously fucked-up) accumulators directly. Seems fine to me. The ADDC and SUBB instructions implicitly write a bit to the accumulator if their operations overflowed. The 1Q/2Q quarter control is supposed to select which register is implicitly written -- except that there is no acc1 for integer types. Haswell and newer ignore the quarter control and always write acc0, but IVB (and presumably BYT) attempt to write to the nonexistent acc1. You could split the the SIMD16 operations into 2x SIMD8s and set force_writemask_all on the second, followed by a 2Q MOV from the accumulator. Maybe we'd rather use the .o (overflow) conditional mod on a result ADD to implement this. Yeah. I did in fact try to implement uaddCarry last Friday without using the accumulator by doing something like: | CMP.o tmp, src0, -src1 | MOV dst, -tmp ...what of course didn't work because of the extra argument precision post-source modifiers and also because the .o condmod doesn't work at all on CMP, but... Ah, you were trying to use the fact that CMP returns 0/-1. That's a cool idea. It's too bad that the CMP instruction doesn't do .o I'd been thinking of doing ADD.o tmp, src0, src1 and then something that sets/selects 0/1 based on the flag register. Maybe even a move from the flag register would be best. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib
On Mon, Jul 06, 2015 at 11:34:23PM +0200, Samuel Pitoiset wrote: Before validating vertex arrays we need to check if a VBO is present. Checking if vb-buffer is not NULL fixes the issue. Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com --- src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c index 1fd33b8..3d200bd 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c @@ -382,6 +382,11 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50) if (nv50-vbo_user (1 b)) { address = addrs[b] + ve-pipe.src_offset; limit = addrs[b] + limits[b]; + } else + if (!vb-buffer) { Should the else and if be on the same line? + BEGIN_NV04(push, NV50_3D(VERTEX_ARRAY_FETCH(i)), 1); + PUSH_DATA (push, 0); + continue; } else { struct nv04_resource *buf = nv04_resource(vb-buffer); if (!(refd (1 b))) { -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: Digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib
On Mon, Jul 6, 2015 at 7:55 PM, Dylan Baker baker.dyla...@gmail.com wrote: + } else + if (!vb-buffer) { Should the else and if be on the same line? The general style elsewhere is to do it in this weird way. Can't say I'm a big fan, but I prefer consistency. I'd happily take a change that undid that oddity. Odd, okay. Its one advantage is it allows one to easily remove clauses either with #if or with comments on a line level without editing other lines. However that's a pretty rare occurrence. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: fix a SIGSEGV with piglit bin/gl-3.1-vao-broken-attrib
Reviewed-by: Ilia Mirkin imir...@alum.mit.edu But please change the commit title to mention the actual effect, like avoid segfault with enabled but unbound vertex attrib or something. The piglit fix should be mentioned in the body of the message -- it's a nice side-effect, but not really title-worthy. Also: Cc: mesa-sta...@lists.freedesktop.org in the commit so that it gets cherry-picked into stable releases. On Mon, Jul 6, 2015 at 5:34 PM, Samuel Pitoiset samuel.pitoi...@gmail.com wrote: Before validating vertex arrays we need to check if a VBO is present. Checking if vb-buffer is not NULL fixes the issue. Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com --- src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c index 1fd33b8..3d200bd 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c @@ -382,6 +382,11 @@ nv50_vertex_arrays_validate(struct nv50_context *nv50) if (nv50-vbo_user (1 b)) { address = addrs[b] + ve-pipe.src_offset; limit = addrs[b] + limits[b]; + } else + if (!vb-buffer) { + BEGIN_NV04(push, NV50_3D(VERTEX_ARRAY_FETCH(i)), 1); + PUSH_DATA (push, 0); + continue; } else { struct nv04_resource *buf = nv04_resource(vb-buffer); if (!(refd (1 b))) { -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 85203] [DRI2][RadeonSI] Pageflip completion event has impossible msc.
https://bugs.freedesktop.org/show_bug.cgi?id=85203 --- Comment #17 from Michel Dänzer mic...@daenzer.net --- http://lists.freedesktop.org/archives/dri-devel/2015-July/085663.html should fix this. -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 85203] [DRI2][RadeonSI] Pageflip completion event has impossible msc.
https://bugs.freedesktop.org/show_bug.cgi?id=85203 Aaron B aaronbotte...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #18 from Aaron B aaronbotte...@gmail.com --- I've long since replaced the card with a GTX 970 Superclocked. I'll mark it fixed and assume it is, thank guys for your work. -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult
On 07.07.2015 01:32, Ilia Mirkin wrote: On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan ytri...@gmail.com wrote: Hi, everyone. Trying to understand method radeonQueryGetResult (more broadly GPU-CPU sync). static void radeonQueryGetResult(struct gl_context *ctx, struct gl_query_object *q) { struct radeon_query_object *query = (struct radeon_query_object *)q; uint32_t *result; int i; radeon_print(RADEON_STATE, RADEON_VERBOSE, %s: query id %d, result %d\n, __func__, query-Base.Id, (int) query-Base.Result); radeon_bo_map(query-bo, GL_FALSE); result = query-bo-ptr; query-Base.Result = 0; for (i = 0; i query-curr_offset/sizeof(uint32_t); ++i) { query-Base.Result += LE32_TO_CPU(result[i]); radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i, LE32_TO_CPU(result[i])); } radeon_bo_unmap(query-bo); } I don't know which part is responsible for blocking behavior (waiting for response from GPU). I suspect that radeon_bo_map does this magic. Can someone point in the right direction? The radeon_bo_map defined in src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic. However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c references the radeon_bo_map in libdrm, which does not appear to wait. It does, see radeon_bo_gem.c:bo_map(): wait: boi-ptr = bo_gem-priv_ptr; r = bo_wait(boi); -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: Use param export count from si_llvm_export_vs in si_shader_vs
On 06.07.2015 20:41, Marek Olšák wrote: I thought your patch does something else, but after re-reading it, it seems to do the right thing. Yeah, si_llvm_export_vs's param_count only counts parameter exports. Reviewed-by: Marek Olšák marek.ol...@amd.com Thanks. On Mon, Jul 6, 2015 at 1:38 PM, Marek Olšák mar...@gmail.com wrote: CLIPDIST is special in that it's exported twice, first as POS and then as PARAM, the same will be done for CULLDIST in the future, so these two should indeed be removed from the list. With this change, you won't need to do anything in si_shader_vs when making that change for CULLDIST, si_llvm_export_vs just needs to include it in param_count. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback
On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote: Since the purpose of transform feedback tends to be for the client to act upon the results to change the geometry in the scene, it is likely that the client will soon be waiting upon the results. Flush the batch early so that we don't build up a long queue of commands afterwards that could delay the readback. --- src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index 857ebe5..13dbe5b 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx, brw_batch_end(brw-batch); + /* We will likely want to read the results in the very near future, so +* push this primitive to hardware if it is currently idle. +*/ + if (!brw_batch_busy(brw-batch)) + brw_batch_flush(brw-batch); + /* EndTransformFeedback() means that we need to update the number of * vertices written. Since it's only necessary if DrawTransformFeedback() * is called and it means mapping a buffer object, we delay computing it We need some data to justify this change. I think even the theory is not correct - transform feedback is typically fed back into the GPU (as new geometry, eg) rather than consumed by the CPU, and in that case the flush is not helpful. But at the end of the day, data will tell. Kristian --Ken ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult
Hi, everyone. Trying to understand method radeonQueryGetResult (more broadly GPU-CPU sync). static void radeonQueryGetResult(struct gl_context *ctx, struct gl_query_object *q) { struct radeon_query_object *query = (struct radeon_query_object *)q; uint32_t *result; int i; radeon_print(RADEON_STATE, RADEON_VERBOSE, %s: query id %d, result %d\n, __func__, query-Base.Id, (int) query-Base.Result); radeon_bo_map(query-bo, GL_FALSE); result = query-bo-ptr; query-Base.Result = 0; for (i = 0; i query-curr_offset/sizeof(uint32_t); ++i) { query-Base.Result += LE32_TO_CPU(result[i]); radeon_print(RADEON_STATE, RADEON_TRACE, result[%d] = %d\n, i, LE32_TO_CPU(result[i])); } radeon_bo_unmap(query-bo); } I don't know which part is responsible for blocking behavior (waiting for response from GPU). I suspect that radeon_bo_map does this magic. Can someone point in the right direction? Thanks, Vyacheslav ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/18] i965: Make intel_mipmap_tree_map_raw() static
On 06/07/15 13:33, Chris Wilson wrote: No external users, so no need to export the symbol outside of our compilation unit. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Reviewed-by: Martin Peres martin.pe...@linux.intel.com --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 36 +-- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 7 -- 2 files changed, 18 insertions(+), 25 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index fcad043..f9dc74f 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -1342,6 +1342,24 @@ intel_miptree_copy_teximage(struct brw_context *brw, intel_obj-needs_validate = true; } +static void * +intel_miptree_map_raw(struct brw_context *brw, + struct intel_mipmap_tree *mt, + GLbitfield mode) +{ + /* CPU accesses to color buffers don't understand fast color clears, so +* resolve any pending fast color clears before we map. +*/ + intel_miptree_resolve_color(brw, mt); + return brw_bo_map(mt-bo, mode GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ); +} + +static void +intel_miptree_unmap_raw(struct brw_context *brw, +struct intel_mipmap_tree *mt) +{ +} + static bool intel_miptree_alloc_mcs(struct brw_context *brw, struct intel_mipmap_tree *mt, @@ -2052,24 +2070,6 @@ intel_miptree_updownsample(struct brw_context *brw, } } -void * -intel_miptree_map_raw(struct brw_context *brw, - struct intel_mipmap_tree *mt, - GLbitfield mode) -{ - /* CPU accesses to color buffers don't understand fast color clears, so -* resolve any pending fast color clears before we map. -*/ - intel_miptree_resolve_color(brw, mt); - return brw_bo_map(mt-bo, mode GL_MAP_WRITE_BIT ? MAP_WRITE : MAP_READ); -} - -void -intel_miptree_unmap_raw(struct brw_context *brw, -struct intel_mipmap_tree *mt) -{ -} - static void intel_miptree_map_gtt(struct brw_context *brw, struct intel_mipmap_tree *mt, diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h index 6c69572..0c25f1f 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h @@ -773,13 +773,6 @@ brw_miptree_layout(struct brw_context *brw, enum intel_miptree_tiling_mode requested, uint32_t layout_flags); -void *intel_miptree_map_raw(struct brw_context *brw, - struct intel_mipmap_tree *mt, - GLbitfield mode); - -void intel_miptree_unmap_raw(struct brw_context *brw, - struct intel_mipmap_tree *mt); - void intel_miptree_map(struct brw_context *brw, struct intel_mipmap_tree *mt, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev