Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback
On Tuesday, July 07, 2015 04:46:22 PM Chris Wilson wrote:
On Tue, Jul 07, 2015 at 10:12:20AM +0100, Chris Wilson wrote:
On Mon, Jul 06, 2015 at 09:05:18PM -0700, Kristian Høgsberg wrote:
On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke kenn...@whitecape.org wrote:
On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote:

Since the purpose of transform feedback tends to be for the client to act upon the results to change the geometry in the scene, it is likely that the client will soon be waiting upon the results. Flush the batch early so that we don't build up a long queue of commands afterwards that could delay the readback.
---
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 857ebe5..13dbe5b 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,
    brw_batch_end(&brw->batch);

+   /* We will likely want to read the results in the very near future, so
+    * push this primitive to hardware if it is currently idle.
+    */
+   if (!brw_batch_busy(&brw->batch))
+      brw_batch_flush(&brw->batch);
+
    /* EndTransformFeedback() means that we need to update the number of
     * vertices written. Since it's only necessary if DrawTransformFeedback()
     * is called and it means mapping a buffer object, we delay computing it

We need some data to justify this change. I think even the theory is not correct - transform feedback is typically fed back into the GPU (as new geometry, eg) rather than consumed by the CPU, and in that case the flush is not helpful. But at the end of the day, data will tell.

How are they fed back? Can the xfb buffer be bound to the vertex buffer? (Genuine question! The only examples I've seen were for testing by the CPU.)

Yes, it can. Just glBindBuffer() some buffers around. Or, I suspect one could bind it as a texture buffer object or SSBO and then use a compute shader on the results. With GL 4.x, the "avoid synchronizing with the CPU" mentality is a lot more prevalent, due to the advent of compute shaders.

I've reviewed the code again, and gen7_end_transform_feedback() is always followed by brw_compute_xfb_vertices_written (and a read of the sol buffer) afaict, maybe not immediately but always before the next transform feedback.

Sadly, yes. We have a primitive count and we need a vertex count - so, a tiny bit of math. Ideally, we would use the Gen7.5 MI_MATH+ feature to do this, eliminating the CPU-GPU synchronization point.

Also afaict it is not possible to map the sol buffer directly into the application.
-Chris

It definitely is - the application creates GL buffer objects and binds them for use with transform feedback. They can certainly glMapBufferRange() those buffers.

--Ken

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
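For readers following along, the "tiny bit of math" that turns a primitive count into a vertex count could look roughly like the sketch below. This is a standalone illustration rather than the driver code: the enum and the function names are made up for the example, and it assumes the only primitive modes that matter are points, lines and triangles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the transform feedback primitive mode. */
enum xfb_prim_mode {
   XFB_POINTS,
   XFB_LINES,
   XFB_TRIANGLES
};

/* Number of vertices recorded for each primitive of the given mode. */
static unsigned
xfb_vertices_per_prim(enum xfb_prim_mode mode)
{
   switch (mode) {
   case XFB_POINTS:    return 1;
   case XFB_LINES:     return 2;
   case XFB_TRIANGLES: return 3;
   }
   assert(!"unknown primitive mode");
   return 0;
}

/* Convert a count of primitives written into a count of vertices written. */
uint64_t
xfb_vertices_written(uint64_t prims_written, enum xfb_prim_mode mode)
{
   return prims_written * xfb_vertices_per_prim(mode);
}
```

The multiplication itself is trivial; the expensive part discussed in the thread is getting the primitive count back from the GPU, which is where the synchronization point comes from.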
[Mesa-dev] [PATCH 1/2] gallium/hud: replace byte units flag with pipe_driver_query_type
Instead of using a boolean 'is bytes' value, use the pipe_driver_query_type enum type. This will let us add support for time values in the next patch.
---
 src/gallium/auxiliary/hud/hud_context.c      | 20
 src/gallium/auxiliary/hud/hud_driver_query.c |  9 +++--
 src/gallium/auxiliary/hud/hud_private.h      |  5 +++--
 3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c
index 6a124f7..9f42da9 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -231,14 +231,16 @@ hud_draw_string(struct hud_context *hud, unsigned x, unsigned y,
 }

 static void
-number_to_human_readable(uint64_t num, boolean is_in_bytes, char *out)
+number_to_human_readable(uint64_t num, enum pipe_driver_query_type type,
+                         char *out)
 {
    static const char *byte_units[] = {"", "KB", "MB", "GB", "TB", "PB", "EB"};
    static const char *metric_units[] = {"", "k", "M", "G", "T", "P", "E"};
-   const char **units = is_in_bytes ? byte_units : metric_units;
-   double divisor = is_in_bytes ? 1024 : 1000;
+   const char **units =
+      (type == PIPE_DRIVER_QUERY_TYPE_BYTES) ? byte_units : metric_units;
+   double divisor = (type == PIPE_DRIVER_QUERY_TYPE_BYTES) ? 1024 : 1000;
    int unit = 0;
    double d = num;

@@ -301,7 +303,7 @@ hud_pane_accumulate_vertices(struct hud_context *hud,
                    hud->font.glyph_height / 2;

       number_to_human_readable(pane->max_value * i / 5,
-                               pane->uses_byte_units, str);
+                               pane->type, str);
       hud_draw_string(hud, x, y, str);
    }

@@ -312,7 +314,7 @@ hud_pane_accumulate_vertices(struct hud_context *hud,
       unsigned y = pane->y2 + 2 + i*hud->font.glyph_height;

       number_to_human_readable(gr->current_value,
-                               pane->uses_byte_units, str);
+                               pane->type, str);
       hud_draw_string(hud, x, y, "%s: %s", gr->name, str);
       i++;
    }
@@ -869,12 +871,14 @@ hud_parse_env_var(struct hud_context *hud, const char *env)
       else if (strcmp(name, "samples-passed") == 0 &&
                has_occlusion_query(hud->pipe->screen)) {
          hud_pipe_query_install(pane, hud->pipe, "samples-passed",
-                                PIPE_QUERY_OCCLUSION_COUNTER, 0, 0, FALSE);
+                                PIPE_QUERY_OCCLUSION_COUNTER, 0, 0,
+                                PIPE_DRIVER_QUERY_TYPE_UINT64);
       }
       else if (strcmp(name, "primitives-generated") == 0 &&
                has_streamout(hud->pipe->screen)) {
          hud_pipe_query_install(pane, hud->pipe, "primitives-generated",
-                                PIPE_QUERY_PRIMITIVES_GENERATED, 0, 0, FALSE);
+                                PIPE_QUERY_PRIMITIVES_GENERATED, 0, 0,
+                                PIPE_DRIVER_QUERY_TYPE_UINT64);
       }
       else {
          boolean processed = FALSE;
@@ -901,7 +905,7 @@ hud_parse_env_var(struct hud_context *hud, const char *env)
             if (i < Elements(pipeline_statistics_names)) {
                hud_pipe_query_install(pane, hud->pipe, name,
                                       PIPE_QUERY_PIPELINE_STATISTICS, i,
-                                      0, FALSE);
+                                      0, PIPE_DRIVER_QUERY_TYPE_UINT64);
                processed = TRUE;
             }
          }
diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c b/src/gallium/auxiliary/hud/hud_driver_query.c
index ee71678..c47d232 100644
--- a/src/gallium/auxiliary/hud/hud_driver_query.c
+++ b/src/gallium/auxiliary/hud/hud_driver_query.c
@@ -150,7 +150,7 @@ void
 hud_pipe_query_install(struct hud_pane *pane, struct pipe_context *pipe,
                        const char *name, unsigned query_type,
                        unsigned result_index,
-                       uint64_t max_value, boolean uses_byte_units)
+                       uint64_t max_value, enum pipe_driver_query_type type)
 {
    struct hud_graph *gr;
    struct query_info *info;
@@ -178,8 +178,7 @@ hud_pipe_query_install(struct hud_pane *pane, struct pipe_context *pipe,
    hud_pane_add_graph(pane, gr);
    if (pane->max_value < max_value)
       hud_pane_set_max_value(pane, max_value);
-   if (uses_byte_units)
-      pane->uses_byte_units = TRUE;
+   pane->type = type;
 }

 boolean
@@ -189,7 +188,6 @@ hud_driver_query_install(struct hud_pane *pane, struct pipe_context *pipe,
    struct pipe_screen *screen = pipe->screen;
    struct pipe_driver_query_info query;
    unsigned num_queries, i;
-   boolean uses_byte_units;
    boolean found = FALSE;

    if (!screen->get_driver_query_info)
@@ -208,9 +206,8 @@ hud_driver_query_install(struct hud_pane *pane, struct pipe_context *pipe,
    if (!found)
       return FALSE;

-   uses_byte_units = query.type == PIPE_DRIVER_QUERY_TYPE_BYTES;
    hud_pipe_query_install(pane, pipe,
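To illustrate why an enum is handier than a boolean here, the following is a minimal standalone sketch of unit selection keyed on a query type. It is not the HUD code: the enum, the function name, the unit strings and the output format are all assumptions made for the example, and a further enum value (say, for time) would only need one more case in the selection.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for enum pipe_driver_query_type. */
enum query_type {
   QUERY_TYPE_UINT64,
   QUERY_TYPE_BYTES
};

/* Format num into out, choosing the unit table and divisor from the type. */
void
format_human_readable(uint64_t num, enum query_type type,
                      char *out, size_t out_size)
{
   static const char *byte_units[] = {"", " KB", " MB", " GB", " TB", " PB", " EB"};
   static const char *metric_units[] = {"", " k", " M", " G", " T", " P", " E"};
   const char **units = (type == QUERY_TYPE_BYTES) ? byte_units : metric_units;
   double divisor = (type == QUERY_TYPE_BYTES) ? 1024 : 1000;
   int unit = 0;
   double d = (double)num;

   assert(out != NULL && out_size > 0);

   /* Scale down until the value fits the current unit or units run out. */
   while (d >= divisor && unit < 6) {
      d /= divisor;
      unit++;
   }

   snprintf(out, out_size, "%.2f%s", d, units[unit]);
}
```

With this sketch, 2048 formatted as bytes gives a value in KB, while the same number formatted as a plain counter is scaled by 1000 instead.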
[Mesa-dev] [PATCH] st/dri: don't set PIPE_BIND_SCANOUT for MSAA surfaces
From: Marek Olšák marek.ol...@amd.com

---
 src/gallium/state_trackers/dri/dri2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c b/src/gallium/state_trackers/dri/dri2.c
index a8323a3..5aa785c 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -556,7 +556,7 @@ dri2_allocate_textures(struct dri_context *ctx,

       if (drawable->textures[statt]) {
          templ.format = drawable->textures[statt]->format;
-         templ.bind = drawable->textures[statt]->bind;
+         templ.bind = drawable->textures[statt]->bind & ~PIPE_BIND_SCANOUT;
          templ.nr_samples = drawable->stvis.samples;

          /* Try to reuse the resource.
--
2.1.0
Re: [Mesa-dev] [PATCH] st/dri: don't set PIPE_BIND_SCANOUT for MSAA surfaces
On 07/07/2015 10:29 AM, Marek Olšák wrote:
From: Marek Olšák marek.ol...@amd.com

---
 src/gallium/state_trackers/dri/dri2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c b/src/gallium/state_trackers/dri/dri2.c
index a8323a3..5aa785c 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -556,7 +556,7 @@ dri2_allocate_textures(struct dri_context *ctx,

       if (drawable->textures[statt]) {
          templ.format = drawable->textures[statt]->format;
-         templ.bind = drawable->textures[statt]->bind;
+         templ.bind = drawable->textures[statt]->bind & ~PIPE_BIND_SCANOUT;
          templ.nr_samples = drawable->stvis.samples;

          /* Try to reuse the resource.

LGTM.

Reviewed-by: Brian Paul bri...@vmware.com
Re: [Mesa-dev] [Mesa-stable] [PATCH] opencl: use versioned .so in mesa.icd
Ccing Tom

Thank you Igor !

On 07/07/15 11:05, Igor Gnatenko wrote:
We must have versioned library in mesa.icd, because ICD loader would fail if the mesa-devel package wasn't installed.

Reported-by: Fabian Deutsch fabian.deut...@gmx.de
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=73512
Cc: 10.6 mesa-sta...@lists.freedesktop.org
Signed-off-by: Igor Gnatenko i.gnatenko.br...@gmail.com

Similar to the default location of the .icd file, this is another picky topic. Regardless, I think we should go ahead with this patch. Why ? First let's see what others do:

 - nvidia - versioned soname, resides in lib. The full soname is used.
 - catalyst - no soname, resides in lib. libamdocl32/64.so is used.
 - beignet - unversioned soname, resides in lib/foo. Full library path is used and no version.
 - the spec - does not mention anything about soname, versioning or location. The example gives a plain libVendorAOpenCL.so. Based off this one can assume that it should live in lib, although everything else remains open.

As our lovely build always sets SONAME (even when -module is set), the Fedora guys are doing (have been shipping) it correctly.

With all that said, do we have any comments/objections against this patch ?

Thanks,
Emil
Re: [Mesa-dev] [PATCH] i965/fs: Don't disable SIMD16 when using the pixel interpolator
Matt Turner matts...@gmail.com writes:
On Sun, Jul 5, 2015 at 4:45 PM, Francisco Jerez curroje...@riseup.net wrote:
Hi Matt,
Matt Turner matts...@gmail.com writes:
On Fri, Jul 3, 2015 at 3:46 AM, Francisco Jerez curroje...@riseup.net wrote:

Heh, I happened to come across this comment yesterday while looking for the remaining no16 calls and wondered why on earth it couldn't do the same that the normal interpolation code does. After this patch and a series coming up that will remove all SIMD8 fallbacks from the texturing code, the only case left still applicable to Gen7 hardware and later will be "SIMD16 explicit accumulator operands unsupported". Anyone?

I can explain the problem: Prior to Gen7, there were two accumulator registers usable for most datatypes (acc0, acc1). On Gen7, they removed integer-support from acc1, which was necessary to implement SIMD16 integer multiplication using the normal MUL/MACH sequence.

IIRC they got rid of the acc1 register on IVB altogether, but managed to emulate it for floating point types by taking advantage of the extra precision not normally used for floating point arithmetic (the fake acc1 basically uses the same storage in the EU that holds the 32 MSBs of each component of acc0), which explains the apparent asymmetry between integer and floating point data types.

I've never read anything that told me that -- what have you seen?

Heh, I'll try to dig up my reference and send it to you in private.

I implemented 32-bit integer multiplication without using the accumulator in:

commit f7df169ba13d22338e9276839a7e9629ca0a6b4f
Author: Matt Turner matts...@gmail.com
Date: Wed May 13 18:34:03 2015 -0700

    i965/fs: Implement integer multiply without mul/mach.

The remaining cases of "SIMD16 explicit accumulator operands unsupported" are ADDC, SUBB, and 32x32 -> high 32-bit multiplication. The remaining multiplication case can probably be reimplemented without the accumulator, like I did for the low 32-bit result.
Hmm, I have the suspicion that high 32-bit multiplication is the one legit use-case of the accumulator we have left. Any algorithm breaking it up into individual 32/16-bit MULs would end up doing more multiplications than the two MUL/MACH instructions we do now, because we wouldn't be able to take advantage of the full precision implemented in the hardware if we truncate the 48-bit intermediate results to fit in a 32-bit register.

That's probably true. It's just that Sandybridge and earlier don't expose the functionality (but could do 64-bit integer multiplication just fine), Ivybridge has the quarter-control/accumulator bug, Haswell works fine if you split the multiplication sequence into SIMD8, and Broadwell lets you do 32x32 -> 64-bit multiplication without the accumulator. So you have only two platforms where you have to use the accumulator, and one of them is broken (but I guess can be trivially fixed by some force-writemask-all hackery).

I guess there's also VLV, CHV and BXT. AFAIK the latter two have some level of support for 64-bit multiplication (with the annoying alignment restriction on the operands) but it might be easier for them to use the accumulator path like earlier hardware.

The best SIMD16 code for [iu]mulExtended() where both lsb and msb results are used is probably 2 sets of mul/mach/mov (with some kind of work around for Ivybridge), but that's kind of hard to recognize.

It's probably also the best SIMD16 code (on chips without reasonable support for 64-bit multiply that is) for computing the high 32 bits of the result, regardless of whether the optimizer is able to recognise that the low 32 bits of the computation also come out as a side product, and whether or not the low 32 bits are used by the shader.
A potential solution could be to have the visitor emit full 64-bit MULs speculatively for any 32-bit integer multiplication (high or low), together with a MOV to chop off the unnecessary bits. A later optimization pass (run after CSE to give the optimizer the opportunity to merge the 64-bit MULs from the high and low 32-bit computations) would demote 64-bit MULs for which only the lowest 32 bits of the result are used to 32-bit MULs. Later on, the SIMD width lowering pass would split 16-wide 64-bit MULs in half, and a later pass would lower them into the MUL/MACH sequence on platforms that don't support full 64-bit MULs natively. Not sure if it's worth doing at this point.

I can have a look into implementing the lowering pass for 64-bit MULs so we can start taking advantage of the SIMD width lowering pass and get rid of the no16() call right away, but the additional optimization pass to demote 64-bit MULs (and speculative emission of 64-bit MULs from the visitor) can probably wait until we have some use-case?

How about we use the SIMD width lowering pass to split the computation in half? It should be quite straightforward but will probably require adding a new
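As a rough illustration of the cost argument above, here is a plain C sketch (not EU code, and not what the compiler emits) comparing a widening 32x32 -> 64-bit multiply with a version that only uses 16x16 -> 32-bit partial products. The function names are invented for the example; the point is simply that the second form needs four multiplies plus carry handling to get the high 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: widen to 64 bits and keep the upper half of the product. */
uint32_t
umul_high_ref(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Same result built from four 16x16 -> 32-bit partial products. */
uint32_t
umul_high_16(uint32_t a, uint32_t b)
{
   uint32_t a_lo = a & 0xffff, a_hi = a >> 16;
   uint32_t b_lo = b & 0xffff, b_hi = b >> 16;

   uint32_t lo_lo = a_lo * b_lo;
   uint32_t hi_lo = a_hi * b_lo;
   uint32_t lo_hi = a_lo * b_hi;
   uint32_t hi_hi = a_hi * b_hi;

   /* Sum of the middle column; at most 3 * 0xffff, so it cannot overflow. */
   uint32_t mid = (lo_lo >> 16) + (hi_lo & 0xffff) + (lo_hi & 0xffff);

   /* The carry out of the middle column feeds into the high word. */
   return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16);
}
```

Both functions return the same value for any pair of inputs, which makes the second one a convenient way to check a lowering that has no 64-bit multiply available.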
Re: [Mesa-dev] [PATCH 0/8] Render node only opencl and pipe-loader cleanups
On 30/06/15 16:09, Emil Velikov wrote:
Hello all,

As mentioned over IRC a few weeks back, here is a series that removes support for non-render node devices. The two main motivations being:
 - Currently we force X/xcb onto everyone that wants to use OpenCL (headless OpenCL systems/farms anyone ?)
 - Nice overall cleanup - 43 insertions(+), 279 deletions(-)

Note that the final patches touch related code - from removing an unused function (pipe_loader_sw_probe_xlib) to using loader_open_device() over open(), with the former caring about CLOEXEC.

Francisco, Tom,

Can you guys please take a look at the series. Even an Ack would be greatly appreciated.

Thanks
Emil
[Mesa-dev] [PATCH v3 5/6] i965: Upload binding tables in hw-generated binding table format.
When hardware-generated binding tables are enabled, use the hw-generated binding table format when uploading binding table state.

Normally, the CS will just consume the binding table pointer commands as pipelined state. When the RS is enabled however, the RS flushes whatever edited surface state entries of our on-chip binding table to the binding table pool before passing the command on to the CS.

Note that the binding table pointer offset is relative to the binding table pool base address when the resource streamer is enabled, instead of the surface state base address.

v2: Fix possible buffer overflow when allocating a chunk out of the hw-binding table pool (Ken).
v3: Remove extra newline and add missing brace around if-statement (Matt).

Cc: kenn...@whitecape.org
Cc: matts...@gmail.com
Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 72 --
 1 file changed, 56 insertions(+), 16 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index b3d592b..cc56dbf 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -50,6 +50,26 @@ static const GLuint stage_to_bt_edit[MESA_SHADER_FRAGMENT + 1] = {
    _3DSTATE_BINDING_TABLE_EDIT_PS,
 };

+static uint32_t
+reserve_hw_bt_space(struct brw_context *brw, unsigned bytes)
+{
+   if (brw->hw_bt_pool.next_offset + bytes >= brw->hw_bt_pool.bo->size - 128) {
+      gen7_reset_hw_bt_pool_offsets(brw);
+   }
+
+   uint32_t offset = brw->hw_bt_pool.next_offset;
+
+   /* From the Haswell PRM, Volume 2b: Command Reference: Instructions,
+    * 3DSTATE_BINDING_TABLE_POINTERS_xS:
+    *
+    * If HW Binding Table is enabled, the offset is relative to the
+    * Binding Table Pool Base Address and the alignment is 64 bytes.
+    */
+   brw->hw_bt_pool.next_offset += ALIGN(bytes, 64);
+
+   return offset;
+}
+
 /**
  * Upload a shader stage's binding table as indirect state.
  *
@@ -70,30 +90,50 @@ brw_upload_binding_table(struct brw_context *brw,

       stage_state->bind_bo_offset = 0;
    } else {
-      /* Upload a new binding table. */
-      if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
-         brw->vtbl.emit_buffer_surface_state(
-            brw, &stage_state->surf_offset[
-                    prog_data->binding_table.shader_time_start],
-            brw->shader_time.bo, 0, BRW_SURFACEFORMAT_RAW,
-            brw->shader_time.bo->size, 1, true);
+      /* When RS is enabled use hw-binding table uploads, otherwise fallback to
+       * software-uploads.
+       */
+      if (brw->use_resource_streamer) {
+         gen7_update_binding_table_from_array(brw, stage_state->stage,
+                                              stage_state->surf_offset,
+                                              prog_data->binding_table
+                                              .size_bytes / 4);
+      } else {
+         /* Upload a new binding table. */
+         if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
+            brw->vtbl.emit_buffer_surface_state(
+               brw, &stage_state->surf_offset[
+                       prog_data->binding_table.shader_time_start],
+               brw->shader_time.bo, 0, BRW_SURFACEFORMAT_RAW,
+               brw->shader_time.bo->size, 1, true);
+         }
+
+         uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
+                                          prog_data->binding_table.size_bytes,
+                                          32,
+                                          &stage_state->bind_bo_offset);
+
+         /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
+         memcpy(bind, stage_state->surf_offset,
+                prog_data->binding_table.size_bytes);
       }
-
-      uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
-                                       prog_data->binding_table.size_bytes, 32,
-                                       &stage_state->bind_bo_offset);
-
-      /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
-      memcpy(bind, stage_state->surf_offset,
-             prog_data->binding_table.size_bytes);
    }

    brw->ctx.NewDriverState |= brw_new_binding_table;

    if (brw->gen >= 7) {
+      if (brw->use_resource_streamer) {
+         stage_state->bind_bo_offset =
+            reserve_hw_bt_space(brw, prog_data->binding_table.size_bytes);
+      }
       BEGIN_BATCH(2);
       OUT_BATCH(packet_name << 16 | (2 - 2));
-      OUT_BATCH(stage_state->bind_bo_offset);
+      /* Align SurfaceStateOffset[16:6] format to [15:5] PS Binding Table field
+       * when hw-generated binding table is enabled.
+       */
+      OUT_BATCH(brw->use_resource_streamer ?
+                (stage_state->bind_bo_offset >> 1) :
+                stage_state->bind_bo_offset);
       ADVANCE_BATCH();
    }
 }
--
1.9.1
[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1
https://bugs.freedesktop.org/show_bug.cgi?id=91254

Tomasz C. toma...@o2.pl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |toma...@o2.pl

--
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1
https://bugs.freedesktop.org/show_bug.cgi?id=91254

            Bug ID: 91254
           Summary: (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1
           Product: Mesa
           Version: 10.6
          Hardware: x86-64 (AMD64)
               URL: https://bugs.archlinux.org/task/45459
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: medium
         Component: Mesa core
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: toma...@o2.pl
        QA Contact: i...@freedesktop.org

After upgrading mesa from 10.5.7 to 10.6.0-1 on Intel graphics, the display slows down and freezes when VA-API is in use.

Additional info:
* package version(s)
  mesa 10.6.0-1
  mesa-libgl 10.6.0-1
* config and/or log files etc.
  System: 4.0.6-1-ck x86_64 (64 bit), (tested also 4.0.5-2), Desktop: KDE (Plasma 5.3)
  CPU: Dual core Intel Core i5 M 450 (-HT-MCP-) cache: 3072 KB
  Graphics: Card: Intel Core Processor Integrated Graphics Controller
  Display Server: X.Org 1.17.2 driver: intel Resolution: 1920x1080@60.00hz
  GLX Renderer: Mesa DRI Intel Ironlake Mobile GLX Version: 2.1 Mesa 10.6.0

The problem occurs on the Intel Core i5 M 450, first generation (Nehalem) of Intel Core. I also tested the i3-3220T, third generation (Ivy Bridge), and the i3-4005U, fourth generation (Haswell), and both work properly. I did not test second generation (Sandy Bridge).

Steps to reproduce:

Method 1
- install mesa 10.6.0-1 and mesa-libgl 10.6.0-1 (or 10.6.1)
- install mpv and configure it: vo=opengl hwdec=vaapi
- play any video

Method 2
- install mesa 10.6.0-1 and mesa-libgl 10.6.0-1
- install kodi
- enable VA-API (Settings > Video > Acceleration)
- play any video

Symptoms: display slows down and freezes

Tested on:
- xf86-video-intel 1:2.99.917+364+gb24e758-1 and 2.99.917-5
- AccelMethod: SNA, UXA, glamor
- Linux 4.0.6, 4.0.5-2, 4.1.1

The problem shows up on most video files, but not all. You can test with the Jellyfish Video Bitrate Test Files: http://jell.yfish.us/

The only thing that helps is downgrading mesa and mesa-libgl to 10.5.7-1. Downgrading xf86-video-intel to 2.99.917-5 does not help, hence the suspicion that the problem is in mesa. Upgrading libva and libva-intel-driver from 1.5.1 to 1.6.0 does not resolve this bug.

The bug is also reported at:
https://bugs.archlinux.org/task/45459
https://bbs.archlinux.org/viewtopic.php?id=198982
[Mesa-dev] [PATCH v5 3/6] i965: Enable hardware-generated binding tables on render path.
This patch implements the binding table enable command which is also used to allocate a binding table pool where hardware-generated binding table entries are flushed into. Each binding table offset in the binding table pool is unique for each shader stage that is enabled within a batch.

Also insert the required brw_tracked_state objects to enable hw-generated binding tables in normal render path.

v2: - Use MOCS in binding table pool alloc for GEN8
    - Fix spurious offset when allocating binding table pool entry and start from zero instead.
v3: - Include GEN8 fix for spurious offset above.
v4: - Fixup wrong packet length in enable/disable hw-binding table for GEN8 (Ville).
    - Don't invoke HW-binding table disable command when we don't have resource streamer (Chris).
v5: - Reorder the state cache invalidate flush so it happens in-between enabling hw-generated binding tables and the previous sw-binding table GPU state (Chris).

Cc: kenn...@whitecape.org
Cc: syrj...@sci.fi
Cc: ch...@chris-wilson.co.uk
Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 96 ++
 src/mesa/drivers/dri/i965/brw_context.c        |  4 ++
 src/mesa/drivers/dri/i965/brw_context.h        |  6 ++
 src/mesa/drivers/dri/i965/brw_state.h          |  6 ++
 src/mesa/drivers/dri/i965/brw_state_upload.c   |  4 ++
 src/mesa/drivers/dri/i965/gen7_disable.c       |  4 +-
 src/mesa/drivers/dri/i965/gen8_disable.c       |  4 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.c  |  4 ++
 8 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index 98ff0dd..2f32976 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -170,6 +170,102 @@ const struct brw_tracked_state brw_gs_binding_table = {
    .emit = brw_gs_upload_binding_table,
 };

+/**
+ * Hardware-generated binding tables for the resource streamer
+ */
+void
+gen7_disable_hw_binding_tables(struct brw_context *brw)
+{
+   if (!brw->use_resource_streamer)
+      return;
+
+   int pkt_len = brw->gen >= 8 ? 4 : 3;
+
+   BEGIN_BATCH(pkt_len);
+   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (pkt_len - 2));
+   if (brw->gen >= 8) {
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+   } else {
+      OUT_BATCH(HSW_BT_POOL_ALLOC_MUST_BE_ONE);
+      OUT_BATCH(0);
+   }
+   ADVANCE_BATCH();
+
+   /* From the Haswell PRM, Volume 7: 3D Media GPGPU,
+    * 3DSTATE_BINDING_TABLE_POOL_ALLOC Programming Note:
+    *
+    * When switching between HW and SW binding table generation, SW must
+    * issue a state cache invalidate.
+    */
+   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
+}
+
+void
+gen7_enable_hw_binding_tables(struct brw_context *brw)
+{
+   if (!brw->use_resource_streamer)
+      return;
+
+   if (!brw->hw_bt_pool.bo) {
+      /* We use a single re-usable buffer object for the lifetime of the
+       * context and size it to maximum allowed binding tables that can be
+       * programmed per batch:
+       *
+       * From the Haswell PRM, Volume 7: 3D Media GPGPU,
+       * 3DSTATE_BINDING_TABLE_POOL_ALLOC Programming Note:
+       * A maximum of 16,383 Binding tables are allowed in any batch buffer
+       */
+      static const int max_size = 16383 * 4;
+      brw->hw_bt_pool.bo = drm_intel_bo_alloc(brw->bufmgr, "hw_bt",
+                                              max_size, 64);
+      brw->hw_bt_pool.next_offset = 0;
+   }
+
+   /* From the Haswell PRM, Volume 7: 3D Media GPGPU,
+    * 3DSTATE_BINDING_TABLE_POOL_ALLOC Programming Note:
+    *
+    * When switching between HW and SW binding table generation, SW must
+    * issue a state cache invalidate.
+    */
+   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
+
+   int pkt_len = brw->gen >= 8 ? 4 : 3;
+   uint32_t dw1 = BRW_HW_BINDING_TABLE_ENABLE;
+   if (brw->is_haswell)
+      dw1 |= SET_FIELD(GEN7_MOCS_L3, GEN7_HW_BT_POOL_MOCS) |
+             HSW_BT_POOL_ALLOC_MUST_BE_ONE;
+   else if (brw->gen >= 8)
+      dw1 |= BDW_MOCS_WB;
+
+   BEGIN_BATCH(pkt_len);
+   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (pkt_len - 2));
+   if (brw->gen >= 8) {
+      OUT_RELOC64(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1);
+      OUT_BATCH(brw->hw_bt_pool.bo->size);
+   } else {
+      OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1);
+      OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
+                brw->hw_bt_pool.bo->size);
+   }
+   ADVANCE_BATCH();
+}
+
+void
+gen7_reset_hw_bt_pool_offsets(struct brw_context *brw)
+{
+   brw->hw_bt_pool.next_offset = 0;
+}
+
+const struct brw_tracked_state gen7_hw_binding_tables = {
+   .dirty = {
+      .mesa = 0,
+      .brw = BRW_NEW_BATCH,
+   },
+   .emit = gen7_enable_hw_binding_tables
+};
+
 /** @} */

 /**
diff --git
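The v4 note about the packet length is easier to see in isolation: the first dword carries the command opcode in its upper half and the total length minus two in its lower bits, and the total length depends on the hardware generation. The following is a small sketch of just that encoding, with made-up function names and no real opcode values; it is not a substitute for the BEGIN_BATCH/OUT_BATCH macros.

```c
#include <assert.h>
#include <stdint.h>

/* Total length in dwords of the pool alloc packet, as chosen in the patch. */
unsigned
pool_alloc_packet_length(unsigned gen)
{
   return gen >= 8 ? 4 : 3;
}

/* Build the header dword: opcode in the upper 16 bits, length - 2 below. */
uint32_t
packet_header(uint32_t opcode, unsigned pkt_len)
{
   assert(pkt_len >= 2);
   return opcode << 16 | (pkt_len - 2);
}
```

Keeping the length in one variable and deriving both the batch reservation and the header field from it is what avoids the mismatch that v4 fixed.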
Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback
On Mon, Jul 06, 2015 at 09:05:18PM -0700, Kristian Høgsberg wrote:
On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke kenn...@whitecape.org wrote:
On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote:

Since the purpose of transform feedback tends to be for the client to act upon the results to change the geometry in the scene, it is likely that the client will soon be waiting upon the results. Flush the batch early so that we don't build up a long queue of commands afterwards that could delay the readback.
---
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 857ebe5..13dbe5b 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,
    brw_batch_end(&brw->batch);

+   /* We will likely want to read the results in the very near future, so
+    * push this primitive to hardware if it is currently idle.
+    */
+   if (!brw_batch_busy(&brw->batch))
+      brw_batch_flush(&brw->batch);
+
    /* EndTransformFeedback() means that we need to update the number of
     * vertices written. Since it's only necessary if DrawTransformFeedback()
     * is called and it means mapping a buffer object, we delay computing it

We need some data to justify this change. I think even the theory is not correct - transform feedback is typically fed back into the GPU (as new geometry, eg) rather than consumed by the CPU, and in that case the flush is not helpful. But at the end of the day, data will tell.

How are they fed back? Can the xfb buffer be bound to the vertex buffer? (Genuine question! The only examples I've seen were for testing by the CPU.)
The point of the patch was really more about getting people to think about the idea of making sure we queue work early that we need in the near future, and breaking such work up into packets that are naturally fenced by the kernel.

However, Jesse made a good point that spinning on a manual semaphore for such feedback (if needed by the CPU) is likely far superior to using the kernel wait interfaces. For the query object, we would reserve the first slot for the semaphore tracking, then after every query pair would add a PIPE_CONTROL dword write to that slot with the new seqno. For reporting we need only map async and spin until that value is greater than the query we want to report back to the user.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
Re: [Mesa-dev] [PATCH 0/8] Render node only opencl and pipe-loader cleanups
Emil Velikov emil.l.veli...@gmail.com writes:
On 30/06/15 16:09, Emil Velikov wrote:
Hello all,

As mentioned over IRC a few weeks back, here is a series that removes support for non-render node devices. The two main motivations being:
 - Currently we force X/xcb onto everyone that wants to use OpenCL (headless OpenCL systems/farms anyone ?)
 - Nice overall cleanup - 43 insertions(+), 279 deletions(-)

Note that the final patches touch related code - from removing an unused function (pipe_loader_sw_probe_xlib) to using loader_open_device() over open(), with the former caring about CLOEXEC.

Francisco, Tom,

Can you guys please take a look at the series. Even an Ack would be greatly appreciated.

Looks OK to me, assuming that Tom is OK with the general approach the series is:

Reviewed-by: Francisco Jerez curroje...@riseup.net

Thanks
Emil
Re: [Mesa-dev] [PATCH 0/8] Render node only opencl and pipe-loader cleanups
On Tue, Jul 07, 2015 at 05:43:19PM +0100, Emil Velikov wrote:
On 30/06/15 16:09, Emil Velikov wrote:
Hello all,

As mentioned over IRC a few weeks back, here is a series that removes support for non-render node devices. The two main motivations being:
 - Currently we force X/xcb onto everyone that wants to use OpenCL (headless OpenCL systems/farms anyone ?)

Is this really true? I don't see where lack of xcb prevents users from building OpenCL.

-Tom

 - Nice overall cleanup - 43 insertions(+), 279 deletions(-)

Note that the final patches touch related code - from removing an unused function (pipe_loader_sw_probe_xlib) to using loader_open_device() over open(), with the former caring about CLOEXEC.

Francisco, Tom,

Can you guys please take a look at the series. Even an Ack would be greatly appreciated.

I have no problems with merging these.

-Tom

Thanks
Emil
Re: [Mesa-dev] [PATCH v2 5/5] i965/gen9: Allocate YF/YS tiled buffer objects
On Tue, Jul 7, 2015 at 2:35 AM, Kenneth Graunke kenn...@whitecape.org wrote: On Tuesday, June 23, 2015 01:23:05 PM Anuj Phogat wrote: In case of I915_TILING_{X,Y} we need to pass tiling format to libdrm using drm_intel_bo_alloc_tiled(). But, In case of YF/YS tiled buffers libdrm need not know about the tiling format because these buffers don't have hardware support to be tiled or detiled through a fenced region. libdrm still need to know buffer alignment value for its use in kernel when resolving the relocation. Using drm_intel_bo_alloc_for_render() for YF/YS tiled buffers satisfy both the above conditions. V2: Delete min/max buffer size restrictions not valid for i965+. Remove redundant align to tile size statements. Remove some redundant code now when there are no min/max buffer size. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 62 +-- 1 file changed, 58 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 80c52f2..5bcb094 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -558,6 +558,48 @@ intel_lower_compressed_format(struct brw_context *brw, mesa_format format) } } +/* This function computes Yf/Ys tiled bo size, alignment and pitch. 
*/ +static uint64_t +intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment, +uint64_t *pitch) Hi Anuj, This patch has a subtle bug: you've specified pitch and stride to be uint64_t here, but below when you call it [snip] @@ -616,11 +658,23 @@ intel_miptree_create(struct brw_context *brw, alloc_flags |= BO_ALLOC_FOR_RENDER; unsigned long pitch; - mt-bo = drm_intel_bo_alloc_tiled(brw-bufmgr, miptree, total_width, - total_height, mt-cpp, mt-tiling, - pitch, alloc_flags); mt-etc_format = etc_format; - mt-pitch = pitch; + + if (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE) { + unsigned alignment = 0; + unsigned long size; + size = intel_get_yf_ys_bo_size(mt, alignment, pitch); ...you're passing a pointer to an unsigned long. On 32-bit builds, unsigned long is a 4 byte value, while uint64_t is 8 bytes. This could lead to stack corruption. (GCC warns about this during a 32-bit build.) Thanks for noticing this Ken. I think I never did 32 bit build with these patches :(. I assumed the solution was to make everything uint32_t, but apparently drm_intel_bo_alloc_tiled actually expects an unsigned long. So we can't change that. How about changing the parameter type of pitch to unsigned long* and types of size and stride to unsigned long? This fixes the 32 bit build warnings. Then I looked at your code, and realized that nothing even uses the pitch value. Is there some point to the parameter existing at all? pitch value is later assigned to mt-pitch. I could have avoided passing pitch parameter and instead assign mt-pitch in drm_intel_bo_alloc_for_render(). But, I used the current approach to keep mt-pitch assignments at a single place. I'm working on some refactoring to make this code look better. 
--Ken + assert(size); + mt-bo = drm_intel_bo_alloc_for_render(brw-bufmgr, miptree, + size, alignment); + mt-pitch = pitch; + } else { + mt-bo = drm_intel_bo_alloc_tiled(brw-bufmgr, miptree, +total_width, total_height, mt-cpp, +mt-tiling, pitch, +alloc_flags); + mt-pitch = pitch; + } /* If the BO is too large to fit in the aperture, we need to use the * BLT engine to support it. Prior to Sandybridge, the BLT paths can't ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback
On Tue, Jul 07, 2015 at 10:31:07AM -0700, Kenneth Graunke wrote: On Tuesday, July 07, 2015 04:46:22 PM Chris Wilson wrote: On Tue, Jul 07, 2015 at 10:12:20AM +0100, Chris Wilson wrote: On Mon, Jul 06, 2015 at 09:05:18PM -0700, Kristian Høgsberg wrote: On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote: Since the purpose of transform feedback tends to be for the client to act upon the results to change the geometry in the scene, it is likely that the client will soon be waiting upon the results. Flush the batch early so that we don't build up a long queue of commands afterwards that could delay the readback. --- src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index 857ebe5..13dbe5b 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx, brw_batch_end(brw-batch); + /* We will likely want to read the results in the very near future, so +* push this primitive to hardware if it is currently idle. +*/ + if (!brw_batch_busy(brw-batch)) + brw_batch_flush(brw-batch); + /* EndTransformFeedback() means that we need to update the number of * vertices written. Since it's only necessary if DrawTransformFeedback() * is called and it means mapping a buffer object, we delay computing it We need some data to justify this change. I think even the theory is not correct - transform feedback is typically fed back into the GPU (as new geometry, eg) rather than consumed by the CPU, and in that case the flush is not helpful. But at the end of the day, data will tell. How are they fed back? Can the xfb buffer be bound to the vertex buffer? (Genuine question! The only examples I've seen were for testing by the CPU.) Yes, it can. 
Just glBindBuffer() some buffers around. Or, I suspect one could bind it as a texture buffer object or SSBO and then use a compute shader on the results. With GL 4.x, the avoid synchronizing with the CPU mentality is a lot more prevalent, due to the advent of compute shaders. I've reviewed the code again, and gen7_end_transform_feedback() is always followed by brw_compute_xfb_vertices_written (and a read of the sol buffer) afaict, maybe not immediately but always before the next transform feedback. Sadly, yes. We have a primitive count and we need a vertex count - so, a tiny bit of math. Ideally, we would use the Gen7.5 MI_MATH+ feature to do this, eliminating the CPU-GPU synchronization point. Also afaict it is not possible to map the sol buffer directly into the application. -Chris It definitely is - the application creates GL buffer objects and binds them for use with transform feedback. They can certainly glMapBufferRange() those buffers. The trouble I see is that the values stored currently are implementation dependent and often reset. How is the application meant to use them directly? (Just trying to understand a bit better. If it is that the current implementation is stalling when not required, then trying to speed those stalls up really is just lipstick on a pig and irrelevant. The patch was just trying to make a suggestion that feeding the gpu around expected stall points works best with the current batch-level granularity of our fences. Using intrabatch semaphores for the query objects seems a more promising avenue than doing batch flushes anyway.) -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
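Note on the "tiny bit of math" mentioned above: converting a primitive count into a vertex count is a multiplication by the number of vertices per recorded primitive. The following is only a minimal sketch of that arithmetic, assuming transform feedback records points, lines, or triangles. The enum and function names are hypothetical and are not the driver's own brw_compute_xfb_vertices_written().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the primitive modes transform feedback records. */
enum xfb_prim_mode { XFB_POINTS, XFB_LINES, XFB_TRIANGLES };

/* Number of vertices written per recorded primitive. */
unsigned
xfb_vertices_per_prim(enum xfb_prim_mode mode)
{
   switch (mode) {
   case XFB_POINTS:    return 1;
   case XFB_LINES:     return 2;
   case XFB_TRIANGLES: return 3;
   }
   assert(!"unexpected primitive mode");
   return 0;
}

/* Convert a primitives-written counter into a vertex count. */
uint64_t
xfb_vertices_written(uint64_t prims_written, enum xfb_prim_mode mode)
{
   return prims_written * xfb_vertices_per_prim(mode);
}
```

The multiplication itself is trivial. The cost discussed in the thread comes from reading the primitive counter back on the CPU, which is why doing the math on the GPU is suggested as the ideal.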
Re: [Mesa-dev] [PATCH 17/18] loader: Look for any version of currently linked libudev.so
On 06/07/15 11:33, Chris Wilson wrote:
> Since there was an ABI break and linking twice against libudev.so.0 and
> libudev.so.1 causes the application to quickly crash, we first check if
> the application is currently linked against libudev before dlopening a
> local handle. However for backwards/forwards compatibility, we need to
> inspect the application for current linkage against all known versions
> first.
>
> Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

I'm ever so slightly concerned that RTLD_NOLOAD is not part of the POSIX standard, and thus it's missing on some platforms (*BSD seems OK, while Solaris and MacOS are not). Then again, this code is not built for them, so we are safe. Plus it does save nasty crashes :-)

Feel free to add the Cc: mesa-stable tag.

Reviewed-by: Emil Velikov emil.l.veli...@gmail.com

Note(s) to self:
1) what was the main obstacle for dropping libudev and sysfs
2) all that handling is completely broken in our configure.ac :-\

-Emil
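For readers unfamiliar with the approach described in the commit message, here is a rough sketch of probing for an already-loaded library before opening a new handle. It assumes glibc, where RTLD_NOLOAD is available once _GNU_SOURCE is defined. The function names and the fallback policy are hypothetical and are not taken from the patch itself.

```c
#define _GNU_SOURCE   /* RTLD_NOLOAD is a glibc extension, not POSIX */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Return a handle to the first library in `names` that is already mapped
 * into the process, or NULL if none of them is. RTLD_NOLOAD makes dlopen()
 * succeed only for libraries that are already loaded. */
void *
find_already_loaded(const char *const *names, size_t count)
{
   assert(names != NULL);
   for (size_t i = 0; i < count; i++) {
      void *handle = dlopen(names[i], RTLD_NOLOAD | RTLD_LAZY);
      if (handle)
         return handle;
   }
   return NULL;
}

/* Hypothetical wrapper: reuse whichever libudev version the application is
 * already linked against, and only load a fresh copy when none is present. */
void *
open_udev_handle(void)
{
   static const char *const known[] = { "libudev.so.1", "libudev.so.0" };
   void *handle = find_already_loaded(known, sizeof(known) / sizeof(known[0]));
   if (!handle)
      handle = dlopen(known[0], RTLD_LOCAL | RTLD_LAZY);
   return handle;
}
```

Checking every known version first is what avoids mapping libudev.so.0 and libudev.so.1 into the same process.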
[Mesa-dev] [PATCH] glsl: fix Bug 85252 - Segfault in compiler while processing ternary operator with void arguments
This is done by returning an rvalue of type void in the
ast_function_expression::hir function instead of a void expression.

This produces (in the case of the ternary) an hir with a call to the void
returning function and an assignment of a void variable. The assignment
will be optimized out during the optimization pass.

This fix results in having a valid subexpression in the many different
cases where the subexpressions are functions whose return values are void.

This prevents a NULL dereference in the following cases:
 * binary operator
 * unary operators
 * ternary operator
 * comparison operators (except equal and nequal operator)

Equal and nequal had to be handled as a special case because, instead of
segfaulting on a forbidden syntax, it was now accepting expressions with a
void return value on either (or both) side of the expression.

Piglit tests are on the way.

Signed-off-by: Renaud Gaubert ren...@lse.epita.fr
Reviewed-by: Gabriel Laskar gabr...@lse.epita.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85252
---
 src/glsl/ast_function.cpp |  6 +-
 src/glsl/ast_to_hir.cpp   | 10 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp
index 92e26bf..776a754 100644
--- a/src/glsl/ast_function.cpp
+++ b/src/glsl/ast_function.cpp
@@ -1785,7 +1785,11 @@ ast_function_expression::hir(exec_list *instructions,
          /* an error has already been emitted */
          value = ir_rvalue::error_value(ctx);
       } else {
-         value = generate_call(instructions, sig, &actual_parameters, state);
+         value = generate_call(instructions, sig, &actual_parameters, state);
+         if (!value) {
+            ir_variable *const tmp = new(ctx) ir_variable(glsl_type::void_type, "void_var", ir_var_temporary);
+            value = new(ctx) ir_dereference_variable(tmp);
+         }
       }
 
       return value;
diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 8cb46be..00cc16c 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -1270,7 +1270,15 @@ ast_expression::do_hir(exec_list *instructions,
        *applied to one operand that can make them match, in which
        *case this conversion is done.
        */
-      if ((!apply_implicit_conversion(op[0]->type, op[1], state)
+
+      if (op[0]->type == glsl_type::void_type || op[1]->type == glsl_type::void_type) {
+
+         _mesa_glsl_error(& loc, state, "`%s': wrong operand types: no operation "
+                          "`%1$s' exists that takes a left-hand operand of type 'void' or a "
+                          "right operand of type 'void'", (this->oper == ast_equal) ? "==" : "!=");
+
+         error_emitted = true;
+      } else if ((!apply_implicit_conversion(op[0]->type, op[1], state)
            && !apply_implicit_conversion(op[1]->type, op[0], state))
           || (op[0]->type != op[1]->type)) {
          _mesa_glsl_error(& loc, state, "operands of `%s' must have the same "
-- 
2.4.5
Re: [Mesa-dev] [PATCH 2/2] gallium/hud: add PIPE_DRIVER_QUERY_TYPE_MICROSECONDS for HUD
For the series:

Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Tue, Jul 7, 2015 at 5:37 PM, Brian Paul bri...@vmware.com wrote:
> This allows drivers to report queries in units of microseconds and
> have the HUD display "us" (microseconds), "ms" (milliseconds) or "s"
> (seconds) on the graph.
> ---
>  src/gallium/auxiliary/hud/hud_context.c | 25 -
>  src/gallium/include/pipe/p_defines.h    | 11 ++-
>  2 files changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c
> index 9f42da9..cb55220 100644
> --- a/src/gallium/auxiliary/hud/hud_context.c
> +++ b/src/gallium/auxiliary/hud/hud_context.c
> @@ -238,8 +238,9 @@ number_to_human_readable(uint64_t num, enum pipe_driver_query_type type,
>        {"", " KB", " MB", " GB", " TB", " PB", " EB"};
>     static const char *metric_units[] =
>        {"", " k", " M", " G", " T", " P", " E"};
> -   const char **units =
> -      (type == PIPE_DRIVER_QUERY_TYPE_BYTES) ? byte_units : metric_units;
> +   static const char *time_units[] =
> +      {" us", " ms", " s"};  /* based on microseconds */
> +   const char *suffix;
>     double divisor = (type == PIPE_DRIVER_QUERY_TYPE_BYTES) ? 1024 : 1000;
>     int unit = 0;
>     double d = num;
> @@ -249,12 +250,26 @@ number_to_human_readable(uint64_t num, enum pipe_driver_query_type type,
>        unit++;
>     }
>
> +   switch (type) {
> +   case PIPE_DRIVER_QUERY_TYPE_MICROSECONDS:
> +      assert(unit < ARRAY_SIZE(time_units));
> +      suffix = time_units[unit];
> +      break;
> +   case PIPE_DRIVER_QUERY_TYPE_BYTES:
> +      assert(unit < ARRAY_SIZE(byte_units));
> +      suffix = byte_units[unit];
> +      break;
> +   default:
> +      assert(unit < ARRAY_SIZE(metric_units));
> +      suffix = metric_units[unit];
> +   }
> +
>     if (d >= 100 || d == (int)d)
> -      sprintf(out, "%.0f%s", d, units[unit]);
> +      sprintf(out, "%.0f%s", d, suffix);
>     else if (d >= 10 || d*10 == (int)(d*10))
> -      sprintf(out, "%.1f%s", d, units[unit]);
> +      sprintf(out, "%.1f%s", d, suffix);
>     else
> -      sprintf(out, "%.2f%s", d, units[unit]);
> +      sprintf(out, "%.2f%s", d, suffix);
>  }
>
>  static void
> diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
> index 153897a..b0cd23d 100644
> --- a/src/gallium/include/pipe/p_defines.h
> +++ b/src/gallium/include/pipe/p_defines.h
> @@ -788,11 +788,12 @@ union pipe_color_union
>
>  enum pipe_driver_query_type
>  {
> -   PIPE_DRIVER_QUERY_TYPE_UINT64 = 0,
> -   PIPE_DRIVER_QUERY_TYPE_UINT = 1,
> -   PIPE_DRIVER_QUERY_TYPE_FLOAT = 2,
> -   PIPE_DRIVER_QUERY_TYPE_PERCENTAGE = 3,
> -   PIPE_DRIVER_QUERY_TYPE_BYTES = 4,
> +   PIPE_DRIVER_QUERY_TYPE_UINT64       = 0,
> +   PIPE_DRIVER_QUERY_TYPE_UINT         = 1,
> +   PIPE_DRIVER_QUERY_TYPE_FLOAT        = 2,
> +   PIPE_DRIVER_QUERY_TYPE_PERCENTAGE   = 3,
> +   PIPE_DRIVER_QUERY_TYPE_BYTES        = 4,
> +   PIPE_DRIVER_QUERY_TYPE_MICROSECONDS = 5,
>  };
>
>  enum pipe_driver_query_group_type
> --
> 1.9.1
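The unit-scaling logic in the quoted patch can be tried outside Mesa. What follows is a standalone sketch under stated assumptions: the enum, the function name, and the unit strings without leading spaces are all hypothetical, and the loop clamps at the last available unit where the patch asserts instead. It also uses snprintf() with an explicit buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the pipe_driver_query_type values in the patch. */
enum query_type { QUERY_TYPE_PLAIN, QUERY_TYPE_BYTES, QUERY_TYPE_MICROSECONDS };

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Format `num` with a unit suffix chosen by repeated division. */
void
format_human_readable(uint64_t num, enum query_type type,
                      char *out, size_t out_size)
{
   static const char *const byte_units[]   = {"", "KB", "MB", "GB", "TB", "PB", "EB"};
   static const char *const metric_units[] = {"", "k", "M", "G", "T", "P", "E"};
   static const char *const time_units[]   = {"us", "ms", "s"}; /* based on microseconds */

   const char *const *units;
   size_t num_units;
   double divisor = (type == QUERY_TYPE_BYTES) ? 1024.0 : 1000.0;

   switch (type) {
   case QUERY_TYPE_MICROSECONDS:
      units = time_units;
      num_units = COUNT_OF(time_units);
      break;
   case QUERY_TYPE_BYTES:
      units = byte_units;
      num_units = COUNT_OF(byte_units);
      break;
   default:
      units = metric_units;
      num_units = COUNT_OF(metric_units);
      break;
   }

   /* Scale down until the value fits, but never run past the last unit
    * (an assumption of this sketch; the patch asserts on the index). */
   double d = (double)num;
   size_t unit = 0;
   while (d > divisor && unit + 1 < num_units) {
      d /= divisor;
      unit++;
   }
   assert(unit < num_units);

   /* Print as few decimals as needed, as in the patch. */
   if (d >= 100 || d == (int)d)
      snprintf(out, out_size, "%.0f%s", d, units[unit]);
   else if (d >= 10 || d * 10 == (int)(d * 10))
      snprintf(out, out_size, "%.1f%s", d, units[unit]);
   else
      snprintf(out, out_size, "%.2f%s", d, units[unit]);
}
```

With these assumptions, 1500 microseconds formats as "1.5ms" and 2048 bytes as "2KB".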
Re: [Mesa-dev] [PATCH] gallium/hud: display percentages with % suffix
Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Tue, Jul 7, 2015 at 9:17 PM, Brian Paul bri...@vmware.com wrote:
> ---
>  src/gallium/auxiliary/hud/hud_context.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c
> index cb55220..bd57190 100644
> --- a/src/gallium/auxiliary/hud/hud_context.c
> +++ b/src/gallium/auxiliary/hud/hud_context.c
> @@ -255,6 +255,9 @@ number_to_human_readable(uint64_t num, enum pipe_driver_query_type type,
>        assert(unit < ARRAY_SIZE(time_units));
>        suffix = time_units[unit];
>        break;
> +   case PIPE_DRIVER_QUERY_TYPE_PERCENTAGE:
> +      suffix = "%";
> +      break;
>     case PIPE_DRIVER_QUERY_TYPE_BYTES:
>        assert(unit < ARRAY_SIZE(byte_units));
>        suffix = byte_units[unit];
> --
> 1.9.1
Re: [Mesa-dev] [PATCH v2 5/5] i965/gen9: Allocate YF/YS tiled buffer objects
On Tue, Jul 7, 2015 at 12:11 PM, Anuj Phogat anuj.pho...@gmail.com wrote: On Tue, Jul 7, 2015 at 2:35 AM, Kenneth Graunke kenn...@whitecape.org wrote: On Tuesday, June 23, 2015 01:23:05 PM Anuj Phogat wrote: In case of I915_TILING_{X,Y} we need to pass tiling format to libdrm using drm_intel_bo_alloc_tiled(). But, In case of YF/YS tiled buffers libdrm need not know about the tiling format because these buffers don't have hardware support to be tiled or detiled through a fenced region. libdrm still need to know buffer alignment value for its use in kernel when resolving the relocation. Using drm_intel_bo_alloc_for_render() for YF/YS tiled buffers satisfy both the above conditions. V2: Delete min/max buffer size restrictions not valid for i965+. Remove redundant align to tile size statements. Remove some redundant code now when there are no min/max buffer size. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 62 +-- 1 file changed, 58 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 80c52f2..5bcb094 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -558,6 +558,48 @@ intel_lower_compressed_format(struct brw_context *brw, mesa_format format) } } +/* This function computes Yf/Ys tiled bo size, alignment and pitch. 
*/ +static uint64_t +intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment, +uint64_t *pitch) Hi Anuj, This patch has a subtle bug: you've specified pitch and stride to be uint64_t here, but below when you call it [snip] @@ -616,11 +658,23 @@ intel_miptree_create(struct brw_context *brw, alloc_flags |= BO_ALLOC_FOR_RENDER; unsigned long pitch; - mt-bo = drm_intel_bo_alloc_tiled(brw-bufmgr, miptree, total_width, - total_height, mt-cpp, mt-tiling, - pitch, alloc_flags); mt-etc_format = etc_format; - mt-pitch = pitch; + + if (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE) { + unsigned alignment = 0; + unsigned long size; + size = intel_get_yf_ys_bo_size(mt, alignment, pitch); ...you're passing a pointer to an unsigned long. On 32-bit builds, unsigned long is a 4 byte value, while uint64_t is 8 bytes. This could lead to stack corruption. (GCC warns about this during a 32-bit build.) Thanks for noticing this Ken. I think I never did 32 bit build with these patches :(. I assumed the solution was to make everything uint32_t, but apparently drm_intel_bo_alloc_tiled actually expects an unsigned long. So we can't change that. How about changing the parameter type of pitch to unsigned long* and types of size and stride to unsigned long? This fixes the 32 bit build warnings. Then I looked at your code, and realized that nothing even uses the pitch value. Is there some point to the parameter existing at all? pitch value is later assigned to mt-pitch. I could have avoided passing pitch parameter and instead assign mt-pitch in drm_intel_bo_alloc_for_render(). But, I used the current approach Correction: assign mt-pitch in intel_get_yf_ys_bo_size() to keep mt-pitch assignments at a single place. I'm working on some refactoring to make this code look better. 
--Ken + assert(size); + mt-bo = drm_intel_bo_alloc_for_render(brw-bufmgr, miptree, + size, alignment); + mt-pitch = pitch; + } else { + mt-bo = drm_intel_bo_alloc_tiled(brw-bufmgr, miptree, +total_width, total_height, mt-cpp, +mt-tiling, pitch, +alloc_flags); + mt-pitch = pitch; + } /* If the BO is too large to fit in the aperture, we need to use the * BLT engine to support it. Prior to Sandybridge, the BLT paths can't ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
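To make the type mismatch discussed in this thread concrete: a callee that stores a uint64_t through a pointer to a 4-byte unsigned long writes past the caller's variable on 32-bit builds. The following is a minimal sketch of two safe shapes, not the actual intel_get_yf_ys_bo_size() code. All function names and values are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Option 1 (the fix proposed in the thread): type the out-parameter as
 * unsigned long so it matches the caller's variable on every ABI. */
unsigned long
get_bo_size_ulong(unsigned long row_bytes, unsigned long rows,
                  unsigned long *pitch)
{
   *pitch = row_bytes;
   return row_bytes * rows;
}

/* Option 2: the callee keeps uint64_t, so the caller must pass a pointer
 * to a real uint64_t, never to an unsigned long. */
uint64_t
get_bo_size_u64(uint64_t row_bytes, uint64_t rows, uint64_t *pitch)
{
   *pitch = row_bytes;
   return row_bytes * rows;
}

/* Caller-side pattern for option 2: use a matching temporary, then narrow
 * explicitly and reject values that do not fit in unsigned long. */
bool
get_pitch_checked(uint64_t row_bytes, uint64_t rows, unsigned long *pitch_out)
{
   uint64_t pitch64 = 0;
   (void)get_bo_size_u64(row_bytes, rows, &pitch64);
   if (pitch64 > ULONG_MAX)
      return false;
   *pitch_out = (unsigned long)pitch64;
   return true;
}
```

Option 1 is the smaller change. Option 2 only makes sense if the 64-bit return type has to stay for other reasons.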
[Mesa-dev] [PATCHv2] i965/gen9: Use custom MOCS entries set up by the kernel.
Instead of relying on hardware defaults, the i915 kernel driver is going to
program custom MOCS tables system-wide on Gen9 hardware. The "WT" entry
previously used for renderbuffers had a number of problems: It disabled
caching on eLLC, it used a reserved L3 cacheability setting, and it used to
override the PTE controls making renderbuffers always WT on LLC regardless
of the kernel's setting. Instead use an entry from the new MOCS tables with
parameters: TC=LLC/eLLC, LeCC=PTE, L3CC=WB.

The "WB" entry previously used for anything other than renderbuffers has
moved to a different index in the new MOCS tables but it should have the
same caching semantics as the old entry.

Even though the corresponding kernel change ("drm/i915: Added Programming
of the MOCS") is in a way an ABI break, it doesn't seem necessary to check
that the kernel is recent enough because the change should only affect Gen9
which is still unreleased hardware.

v2: Update MOCS values for the new Android-incompatible tables introduced
    in v7 of the kernel patch.

Cc: 10.6 mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/brw_defines.h        | 11 ++-
 src/mesa/drivers/dri/i965/gen8_surface_state.c |  3 +--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 66b9abc..8ab8d62 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2491,12 +2491,13 @@ enum brw_wm_barycentric_interp_mode {
 #define BDW_MOCS_WT  0x58
 #define BDW_MOCS_PTE 0x18
 
-/* Skylake: MOCS is now an index into an array of 64 different configurable
- * cache settings. We still use only either write-back or write-through; and
- * rely on the documented default values.
+/* Skylake: MOCS is now an index into an array of 62 different caching
+ * configurations programmed by the kernel.
  */
-#define SKL_MOCS_WB (0b001001 << 1)
-#define SKL_MOCS_WT (0b000101 << 1)
+/* TC=LLC/eLLC, LeCC=WB, LRUM=3, L3CC=WB */
+#define SKL_MOCS_WB  (2 << 1)
+/* TC=LLC/eLLC, LeCC=PTE, LRUM=3, L3CC=WB */
+#define SKL_MOCS_PTE (1 << 1)
 
 #define MEDIA_VFE_STATE                         0x7000
 /* GEN7 DW2, GEN8+ DW3 */
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index bd3eb00..dfaf762 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -401,8 +401,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
       irb->mt_layer : (irb->mt_layer / MAX2(mt->num_samples, 1));
    GLenum gl_target =
       rb->TexImage ? rb->TexImage->TexObject->Target : GL_TEXTURE_2D;
-   /* FINISHME: Use PTE MOCS on Skylake. */
-   uint32_t mocs = brw->gen >= 9 ? SKL_MOCS_WT : BDW_MOCS_PTE;
+   uint32_t mocs = brw->gen >= 9 ? SKL_MOCS_PTE : BDW_MOCS_PTE;
 
    intel_miptree_used_for_rendering(mt);
 
-- 
2.4.3
Re: [Mesa-dev] [PATCH 0/8] Render node only opencl and pipe-loader cleanups
On 07/07/15 19:42, Tom Stellard wrote:
> On Tue, Jul 07, 2015 at 05:43:19PM +0100, Emil Velikov wrote:
>> On 30/06/15 16:09, Emil Velikov wrote:
>>> Hello all,
>>>
>>> As mentioned over IRC a few weeks back, here is a series that removes
>>> support for non-render node devices.
>>>
>>> The two main motivations being:
>>>  - Currently we force X/xcb onto everyone that wants to use OpenCL
>>>    (headless OpenCL systems/farms anyone?)
>
> Is this really true? I don't see where lack of xcb prevents users from
> building OpenCL.

Ouch, just realised how silly the wording is. Sorry about that. Currently, if you have xcb at build time it will get picked, regardless of whether xcb is present at runtime or not. Something nasty which I refer to as a hidden dependency.

> -Tom
>
>>>  - Nice overall cleanup - 43 insertions(+), 279 deletions(-)
>>>
>>> Note that the final patches touch related code - from removing an unused
>>> function (pipe_loader_sw_probe_xlib) to using loader_open_device() over
>>> open(), with the former caring about CLOEXEC.
>>
>> Francisco, Tom,
>>
>> Can you guys please take a look at the series. Even an Ack would be
>> greatly appreciated.
>
> I have no problems with merging these.

If you'd like to take a look I can give it another week. Alternatively I'll fix the patch 1/8 summary (same "force" mistake) and will push these in a day or so.

Thanks
Emil
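On the CLOEXEC point in the cover letter: a file descriptor opened without close-on-exec leaks into any child process the application later spawns. The sketch below shows the general pattern of opening a device node with the flag set and falling back to fcntl() where O_CLOEXEC is rejected. The helper name is hypothetical, and this is not claimed to be the body of Mesa's loader_open_device().

```c
#define _GNU_SOURCE   /* make sure O_CLOEXEC is declared */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read/write with close-on-exec set.
 * Returns the file descriptor, or -1 on failure. */
int
open_device_cloexec(const char *path)
{
   int fd = open(path, O_RDWR | O_CLOEXEC);

   /* Assumed fallback for systems that reject O_CLOEXEC: open without it
    * and set FD_CLOEXEC afterwards. This leaves a small race window in
    * multithreaded programs, which the flag on open() avoids. */
   if (fd == -1 && errno == EINVAL) {
      fd = open(path, O_RDWR);
      if (fd != -1) {
         int flags = fcntl(fd, F_GETFD);
         if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
            close(fd);
            return -1;
         }
      }
   }
   return fd;
}
```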
[Mesa-dev] [PATCH] gallium/hud: display percentages with % suffix
---
 src/gallium/auxiliary/hud/hud_context.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c
index cb55220..bd57190 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -255,6 +255,9 @@ number_to_human_readable(uint64_t num, enum pipe_driver_query_type type,
       assert(unit < ARRAY_SIZE(time_units));
       suffix = time_units[unit];
       break;
+   case PIPE_DRIVER_QUERY_TYPE_PERCENTAGE:
+      suffix = "%";
+      break;
    case PIPE_DRIVER_QUERY_TYPE_BYTES:
       assert(unit < ARRAY_SIZE(byte_units));
       suffix = byte_units[unit];
-- 
1.9.1
Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback
On Tue, Jul 07, 2015 at 10:12:20AM +0100, Chris Wilson wrote: On Mon, Jul 06, 2015 at 09:05:18PM -0700, Kristian Høgsberg wrote: On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote: Since the purpose of transform feedback tends to be for the client to act upon the results to change the geometry in the scene, it is likely that the client will soon be waiting upon the results. Flush the batch early so that we don't build up a long queue of commands afterwards that could delay the readback. --- src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index 857ebe5..13dbe5b 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx, brw_batch_end(brw-batch); + /* We will likely want to read the results in the very near future, so +* push this primitive to hardware if it is currently idle. +*/ + if (!brw_batch_busy(brw-batch)) + brw_batch_flush(brw-batch); + /* EndTransformFeedback() means that we need to update the number of * vertices written. Since it's only necessary if DrawTransformFeedback() * is called and it means mapping a buffer object, we delay computing it We need some data to justify this change. I think even the theory is not correct - transform feedback is typically fed back into the GPU (as new geometry, eg) rather than consumed by the CPU, and in that case the flush is not helpful. But at the end of the day, data will tell. How are they fed back? Can the xfb buffer be bound to the vertex buffer? (Genuine question! The only examples I've seen were for testing by the CPU.) 
I've reviewed the code again, and gen7_end_transform_feedback() is always followed by brw_compute_xfb_vertices_written (and a read of the sol buffer) afaict, maybe not immediately but always before the next transform feedback. Also afaict it is not possible to map the sol buffer directly into the application. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images
Am 07.07.2015 um 22:35 schrieb Jose Fonseca: On 07/07/15 21:28, Ilia Mirkin wrote: On Tue, Jul 7, 2015 at 4:24 PM, Jose Fonseca jfons...@vmware.com wrote: I'm not experienced with the semantics around resources that can be read/written by shaders, so I can't really make educated comments. But overall this looks good to me FWIW. On 05/07/15 14:25, Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com Other approaches are being considered: 1) Don't use resource wrappers (views) and pass all view parameters (format, layer range, level) to set_shader_images just like set_vertex_buffers, set_constant_buffer, or even glBindImageTexture do. I don't know how much pipe drivers leverage this nowadays, but these structures are convenient placeholders for driver data, particular when they don't support something (e.g., a certain format, or need some swizzling), natively. 2) Use pipe_sampler_view instead of pipe_image_view, and maybe even use set_sampler_views instead of set_shader_images. set_sampler_views would have to use start_slot = PIPE_MAX_SAMPLERS for all writable images to allow for OpenGL textures in the lower slots. If pipe_sampler_view and pipe_image_view are the same, we could indeed use one structure for both. While still keeping the separate create/bind/destroy functions. The big difference is that a sampler view has a first/last layer and first/last level, while image views are more like surfaces which just have the one of each. But they also need a byte range for buffer images. D3D11_TEX2D_ARRAY_UAV allows to specify first/last layer https://msdn.microsoft.com/en-us/library/windows/desktop/ff476242.aspx , so it sounds that once pipe_image_view is updated to handle D3D11, the difference would reduce to the absence of last_level Of course we could just ignore that and guarantee that first==last for images. Yes, it might not be a bad idea. You could of course argue then isn't it really more like pipe_surface? 
At least in d3d11 clearly they are much closer in concept to rts. The actual structures are of course mostly the same in gallium, the differences boil down to pipe_surface having (long obsolete) width/height parameters and a writable flag, whereas sampler views instead have swizzling fields (I don't think they'd have any use for this), support multiple levels (again, not needed for shader images / uavs), and have a target parameter (in d3d10, rts actually have a target parameter too, but it is of no practical consequence, hence there was no need for that in gallium - I'm not sure if it would be required for shader images / uavs, uavs certainly have such target parameter too but I'm not sure it matters). But in any case, I'm pretty impartial to what structure is used, as long as it is created/destroyed separately. Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images
I'm not experienced with the semantics around resources that can be read/written by shaders, so I can't really make educated comments.

But overall this looks good to me FWIW.

On 05/07/15 14:25, Marek Olšák wrote:
> From: Marek Olšák marek.ol...@amd.com
>
> Other approaches are being considered:
>
> 1) Don't use resource wrappers (views) and pass all view parameters
>    (format, layer range, level) to set_shader_images just like
>    set_vertex_buffers, set_constant_buffer, or even glBindImageTexture do.

I don't know how much pipe drivers leverage this nowadays, but these structures are convenient placeholders for driver data, particularly when they don't support something (e.g., a certain format, or need some swizzling) natively.

> 2) Use pipe_sampler_view instead of pipe_image_view,
>    and maybe even use set_sampler_views instead of set_shader_images.
>    set_sampler_views would have to use start_slot = PIPE_MAX_SAMPLERS for
>    all writable images to allow for OpenGL textures in the lower slots.

If pipe_sampler_view and pipe_image_view are the same, we could indeed use one structure for both, while still keeping the separate create/bind/destroy functions.

This would enable drivers to treat them uniformly internally if they wanted (e.g., by concatenating all view bindings into a single array as you described), or use separate internal objects if they wanted.

This seems the best of both worlds.

There is even a precedent: {create,bind,delete}_{fs,vs,gs}_state. These all use the same template structure, but drivers are free to create joint or disjoint private structures for each kind. And in fact llvmpipe (and all draw-based drivers) end up using different private objects.

Jose
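The "one template structure, separate create functions" idea can be illustrated with a small standalone sketch. Every type, field, and function name here is hypothetical and only mirrors the shape of the proposal. None of it is the actual gallium interface.

```c
#include <stdlib.h>

/* Hypothetical shared template, playing the role of a common view
 * description used for both sampler views and image views. */
struct view_template {
   unsigned format;
   unsigned first_layer, last_layer;
   unsigned level;
};

/* Driver-private objects. A driver may keep these disjoint, as here,
 * or use a single private type for both kinds. */
struct drv_sampler_view {
   struct view_template base;
   unsigned swizzle;      /* only meaningful for sampling */
};

struct drv_image_view {
   struct view_template base;
   int writable;          /* only meaningful for shader images */
};

/* Separate create entry points that consume the same template. */
struct drv_sampler_view *
drv_create_sampler_view(const struct view_template *templ)
{
   struct drv_sampler_view *view = calloc(1, sizeof(*view));
   if (view)
      view->base = *templ;
   return view;
}

struct drv_image_view *
drv_create_image_view(const struct view_template *templ)
{
   struct drv_image_view *view = calloc(1, sizeof(*view));
   if (view) {
      view->base = *templ;
      view->writable = 1;
   }
   return view;
}
```

Because both private types start with the same base member, a driver that prefers uniform handling could also store pointers to the shared part in a single binding array.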
Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images
On 07/07/15 21:28, Ilia Mirkin wrote: On Tue, Jul 7, 2015 at 4:24 PM, Jose Fonseca jfons...@vmware.com wrote: I'm not experienced with the semantics around resources that can be read/written by shaders, so I can't really make educated comments. But overall this looks good to me FWIW. On 05/07/15 14:25, Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com Other approaches are being considered: 1) Don't use resource wrappers (views) and pass all view parameters (format, layer range, level) to set_shader_images just like set_vertex_buffers, set_constant_buffer, or even glBindImageTexture do. I don't know how much pipe drivers leverage this nowadays, but these structures are convenient placeholders for driver data, particular when they don't support something (e.g., a certain format, or need some swizzling), natively. 2) Use pipe_sampler_view instead of pipe_image_view, and maybe even use set_sampler_views instead of set_shader_images. set_sampler_views would have to use start_slot = PIPE_MAX_SAMPLERS for all writable images to allow for OpenGL textures in the lower slots. If pipe_sampler_view and pipe_image_view are the same, we could indeed use one structure for both. While still keeping the separate create/bind/destroy functions. The big difference is that a sampler view has a first/last layer and first/last level, while image views are more like surfaces which just have the one of each. But they also need a byte range for buffer images. D3D11_TEX2D_ARRAY_UAV allows to specify first/last layer https://msdn.microsoft.com/en-us/library/windows/desktop/ff476242.aspx , so it sounds that once pipe_image_view is updated to handle D3D11, the difference would reduce to the absence of last_level Of course we could just ignore that and guarantee that first==last for images. Yes, it might not be a bad idea. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images
On Tue, Jul 7, 2015 at 4:24 PM, Jose Fonseca jfons...@vmware.com wrote: I'm not experienced with the semantics around resources that can be read/written by shaders, so I can't really make educated comments. But overall this looks good to me FWIW. On 05/07/15 14:25, Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com Other approaches are being considered: 1) Don't use resource wrappers (views) and pass all view parameters (format, layer range, level) to set_shader_images just like set_vertex_buffers, set_constant_buffer, or even glBindImageTexture do. I don't know how much pipe drivers leverage this nowadays, but these structures are convenient placeholders for driver data, particular when they don't support something (e.g., a certain format, or need some swizzling), natively. 2) Use pipe_sampler_view instead of pipe_image_view, and maybe even use set_sampler_views instead of set_shader_images. set_sampler_views would have to use start_slot = PIPE_MAX_SAMPLERS for all writable images to allow for OpenGL textures in the lower slots. If pipe_sampler_view and pipe_image_view are the same, we could indeed use one structure for both. While still keeping the separate create/bind/destroy functions. The big difference is that a sampler view has a first/last layer and first/last level, while image views are more like surfaces which just have the one of each. But they also need a byte range for buffer images. Of course we could just ignore that and guarantee that first==last for images. This would enable drivers to treat them uniformly internally if they wanted (e.g, by concatenating all views bindings into a single array as you described). Or seperate internal objects if they wanted. This seems the best of both worlds. There is even a precendent: {create,bind,delete}_{fs,vs,gs}_state. These all use the same template structure, but drivers are free to create joint or disjoint private structures for each kind. 
And in fact llvmpipe (and all draw based drivers) end up using different private objects. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] gallium: add interface for writable shader images
On Tue, Jul 7, 2015 at 4:35 PM, Jose Fonseca jfons...@vmware.com wrote: On 07/07/15 21:28, Ilia Mirkin wrote: On Tue, Jul 7, 2015 at 4:24 PM, Jose Fonseca jfons...@vmware.com wrote: I'm not experienced with the semantics around resources that can be read/written by shaders, so I can't really make educated comments. But overall this looks good to me FWIW. On 05/07/15 14:25, Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com Other approaches are being considered: 1) Don't use resource wrappers (views) and pass all view parameters (format, layer range, level) to set_shader_images just like set_vertex_buffers, set_constant_buffer, or even glBindImageTexture do. I don't know how much pipe drivers leverage this nowadays, but these structures are convenient placeholders for driver data, particular when they don't support something (e.g., a certain format, or need some swizzling), natively. 2) Use pipe_sampler_view instead of pipe_image_view, and maybe even use set_sampler_views instead of set_shader_images. set_sampler_views would have to use start_slot = PIPE_MAX_SAMPLERS for all writable images to allow for OpenGL textures in the lower slots. If pipe_sampler_view and pipe_image_view are the same, we could indeed use one structure for both. While still keeping the separate create/bind/destroy functions. The big difference is that a sampler view has a first/last layer and first/last level, while image views are more like surfaces which just have the one of each. But they also need a byte range for buffer images. D3D11_TEX2D_ARRAY_UAV allows to specify first/last layer https://msdn.microsoft.com/en-us/library/windows/desktop/ff476242.aspx , so it sounds that once pipe_image_view is updated to handle D3D11, the difference would reduce to the absence of last_level Erm. Duh. OpenGL needs first/last layer too. And pipe_surface has it too, so it all works out well :) Of course we could just ignore that and guarantee that first==last for images. 
Yes, it might not be a bad idea. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] gallium/hud: add PIPE_DRIVER_QUERY_TYPE_MICROSECONDS for HUD
This allows drivers to report queries in units of microseconds and have the HUD display us (microseconds), ms (milliseconds) or s (seconds) on the graph. --- src/gallium/auxiliary/hud/hud_context.c | 25 - src/gallium/include/pipe/p_defines.h| 11 ++- 2 files changed, 26 insertions(+), 10 deletions(-) diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c index 9f42da9..cb55220 100644 --- a/src/gallium/auxiliary/hud/hud_context.c +++ b/src/gallium/auxiliary/hud/hud_context.c @@ -238,8 +238,9 @@ number_to_human_readable(uint64_t num, enum pipe_driver_query_type type, {, KB, MB, GB, TB, PB, EB}; static const char *metric_units[] = {, k, M, G, T, P, E}; - const char **units = - (type == PIPE_DRIVER_QUERY_TYPE_BYTES) ? byte_units : metric_units; + static const char *time_units[] = + { us, ms, s}; /* based on microseconds */ + const char *suffix; double divisor = (type == PIPE_DRIVER_QUERY_TYPE_BYTES) ? 1024 : 1000; int unit = 0; double d = num; @@ -249,12 +250,26 @@ number_to_human_readable(uint64_t num, enum pipe_driver_query_type type, unit++; } + switch (type) { + case PIPE_DRIVER_QUERY_TYPE_MICROSECONDS: + assert(unit ARRAY_SIZE(time_units)); + suffix = time_units[unit]; + break; + case PIPE_DRIVER_QUERY_TYPE_BYTES: + assert(unit ARRAY_SIZE(byte_units)); + suffix = byte_units[unit]; + break; + default: + assert(unit ARRAY_SIZE(metric_units)); + suffix = metric_units[unit]; + } + if (d = 100 || d == (int)d) - sprintf(out, %.0f%s, d, units[unit]); + sprintf(out, %.0f%s, d, suffix); else if (d = 10 || d*10 == (int)(d*10)) - sprintf(out, %.1f%s, d, units[unit]); + sprintf(out, %.1f%s, d, suffix); else - sprintf(out, %.2f%s, d, units[unit]); + sprintf(out, %.2f%s, d, suffix); } static void diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h index 153897a..b0cd23d 100644 --- a/src/gallium/include/pipe/p_defines.h +++ b/src/gallium/include/pipe/p_defines.h @@ -788,11 +788,12 @@ union 
pipe_color_union enum pipe_driver_query_type { - PIPE_DRIVER_QUERY_TYPE_UINT64 = 0, - PIPE_DRIVER_QUERY_TYPE_UINT = 1, - PIPE_DRIVER_QUERY_TYPE_FLOAT = 2, - PIPE_DRIVER_QUERY_TYPE_PERCENTAGE = 3, - PIPE_DRIVER_QUERY_TYPE_BYTES = 4, + PIPE_DRIVER_QUERY_TYPE_UINT64 = 0, + PIPE_DRIVER_QUERY_TYPE_UINT = 1, + PIPE_DRIVER_QUERY_TYPE_FLOAT= 2, + PIPE_DRIVER_QUERY_TYPE_PERCENTAGE = 3, + PIPE_DRIVER_QUERY_TYPE_BYTES= 4, + PIPE_DRIVER_QUERY_TYPE_MICROSECONDS = 5, }; enum pipe_driver_query_group_type -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
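The unit selection described in the patch above can be sketched in standalone C as follows. This is not the HUD code itself: the enum, the function name format_value, and the exact suffix strings are assumptions made for illustration, and the sketch clamps to the largest available suffix instead of asserting, so that a very large microsecond value stays within the three-entry time table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the driver query types. */
enum query_type {
   QUERY_TYPE_UINT64,
   QUERY_TYPE_BYTES,
   QUERY_TYPE_MICROSECONDS,
};

static void
format_value(uint64_t num, enum query_type type, char *out, size_t out_size)
{
   /* Suffix strings are illustrative, not copied from the HUD. */
   static const char *byte_units[] =
      {"", " KB", " MB", " GB", " TB", " PB", " EB"};
   static const char *metric_units[] =
      {"", " k", " M", " G", " T", " P", " E"};
   static const char *time_units[] =
      {" us", " ms", " s"};   /* based on microseconds */

   const char **units = metric_units;
   size_t num_units = ARRAY_SIZE(metric_units);
   double divisor = 1000;
   double d = (double)num;
   size_t unit = 0;

   assert(out_size > 0);

   if (type == QUERY_TYPE_BYTES) {
      units = byte_units;
      num_units = ARRAY_SIZE(byte_units);
      divisor = 1024;
   } else if (type == QUERY_TYPE_MICROSECONDS) {
      units = time_units;
      num_units = ARRAY_SIZE(time_units);
   }

   /* Scale down, but never step past the last suffix in the table. */
   while (d > divisor && unit + 1 < num_units) {
      d /= divisor;
      unit++;
   }

   /* Print as few decimals as needed, up to two. */
   if (d >= 100 || d == (int)d)
      snprintf(out, out_size, "%.0f%s", d, units[unit]);
   else if (d >= 10 || d * 10 == (int)(d * 10))
      snprintf(out, out_size, "%.1f%s", d, units[unit]);
   else
      snprintf(out, out_size, "%.2f%s", d, units[unit]);
}
```

With these assumed suffixes, 1500 microseconds is formatted as "1.5 ms" and 2048 bytes as "2 KB".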
[Mesa-dev] [PATCH] mesa: use implementation specified MAX_VERTEX_ATTRIBS rather than hardcoded value
--- src/glsl/linker.cpp | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index 6a69c15..2f5a36f 100644 --- a/src/glsl/linker.cpp +++ b/src/glsl/linker.cpp @@ -3084,12 +3084,7 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog) } } - /* FINISHME: The value of the max_attribute_index parameter is -* FINISHME: implementation dependent based on the value of -* FINISHME: GL_MAX_VERTEX_ATTRIBS. GL_MAX_VERTEX_ATTRIBS must be -* FINISHME: at least 16, so hardcode 16 for now. -*/ - if (!assign_attribute_or_color_locations(prog, MESA_SHADER_VERTEX, 16)) { + if (!assign_attribute_or_color_locations(prog, MESA_SHADER_VERTEX, ctx->Const.Program[MESA_SHADER_VERTEX].MaxAttribs)) { goto done; } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: use implementation specified MAX_VERTEX_ATTRIBS rather than hardcoded value
Assuming the comment is correct, this is Reviewed-by: Ilia Mirkin imir...@alum.mit.edu src/mesa/main/get_hash_params.py: [ MAX_VERTEX_ATTRIBS_ARB, CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxAttribs), extra_ARB_vertex_program_api_es2 ], Quickly looked over the code, and the comment does seem correct. Perhaps not going over 80 chars by so much would be better, your call whether to fix that or not. On Tue, Jul 7, 2015 at 7:42 PM, Timothy Arceri t_arc...@yahoo.com.au wrote: --- src/glsl/linker.cpp | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index 6a69c15..2f5a36f 100644 --- a/src/glsl/linker.cpp +++ b/src/glsl/linker.cpp @@ -3084,12 +3084,7 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog) } } - /* FINISHME: The value of the max_attribute_index parameter is -* FINISHME: implementation dependent based on the value of -* FINISHME: GL_MAX_VERTEX_ATTRIBS. GL_MAX_VERTEX_ATTRIBS must be -* FINISHME: at least 16, so hardcode 16 for now. -*/ - if (!assign_attribute_or_color_locations(prog, MESA_SHADER_VERTEX, 16)) { + if (!assign_attribute_or_color_locations(prog, MESA_SHADER_VERTEX, ctx->Const.Program[MESA_SHADER_VERTEX].MaxAttribs)) { goto done; } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback
On Tuesday, July 07, 2015 09:02:16 PM Chris Wilson wrote: On Tue, Jul 07, 2015 at 10:31:07AM -0700, Kenneth Graunke wrote: On Tuesday, July 07, 2015 04:46:22 PM Chris Wilson wrote: On Tue, Jul 07, 2015 at 10:12:20AM +0100, Chris Wilson wrote: On Mon, Jul 06, 2015 at 09:05:18PM -0700, Kristian Høgsberg wrote: On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote: Since the purpose of transform feedback tends to be for the client to act upon the results to change the geometry in the scene, it is likely that the client will soon be waiting upon the results. Flush the batch early so that we don't build up a long queue of commands afterwards that could delay the readback. --- src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index 857ebe5..13dbe5b 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx, brw_batch_end(brw-batch); + /* We will likely want to read the results in the very near future, so +* push this primitive to hardware if it is currently idle. +*/ + if (!brw_batch_busy(brw-batch)) + brw_batch_flush(brw-batch); + /* EndTransformFeedback() means that we need to update the number of * vertices written. Since it's only necessary if DrawTransformFeedback() * is called and it means mapping a buffer object, we delay computing it We need some data to justify this change. I think even the theory is not correct - transform feedback is typically fed back into the GPU (as new geometry, eg) rather than consumed by the CPU, and in that case the flush is not helpful. But at the end of the day, data will tell. How are they fed back? Can the xfb buffer be bound to the vertex buffer? (Genuine question! 
The only examples I've seen were for testing by the CPU.) Yes, it can. Just glBindBuffer() some buffers around. Or, I suspect one could bind it as a texture buffer object or SSBO and then use a compute shader on the results. With GL 4.x, the avoid synchronizing with the CPU mentality is a lot more prevalent, due to the advent of compute shaders. I've reviewed the code again, and gen7_end_transform_feedback() is always followed by brw_compute_xfb_vertices_written (and a read of the sol buffer) afaict, maybe not immediately but always before the next transform feedback. Sadly, yes. We have a primitive count and we need a vertex count - so, a tiny bit of math. Ideally, we would use the Gen7.5 MI_MATH+ feature to do this, eliminating the CPU-GPU synchronization point. Also afaict it is not possible to map the sol buffer directly into the application. -Chris It definitely is - the application creates GL buffer objects and binds them for use with transform feedback. They can certainly glMapBufferRange() those buffers. The trouble I see is that the values stored currently are implementation dependent and often reset. How is the application meant to use them directly? (Just trying to understand a bit better. If it is that the current implementation is stalling when not required, then trying to speed those stalls up really is just lipstick on a pig and irrelevant. The patch was just trying to make a suggestion that feeding the gpu around expected stall points works best with the current batch-level granularity of our fences. Using intrabatch semaphores for the query objects seems a more promising avenue than doing batch flushes anyway.) -Chris I think we misunderstood each other. By SOL buffer do you mean prim_count_bo? If so, that's not visible to applications. Stream out (aka transform feedback) works by writing geometry data coming out of the VS/HS/DS/GS stages (whichever is last) into an application buffer. So I assumed you meant that buffer. 
But the format of /that/ data is absolutely controlled by the application. The mechanism for counting the primitives written (to implement glDrawTransformFeedback()) is entirely up to the driver. It's not the best. Prior to MI_MATH existing, it was the best I could think of. --Ken ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
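The "tiny bit of math" mentioned in the message above (turning a primitive count into a vertex count) might look roughly like the following C sketch. The enum, the function names, and the idea of passing start/end counter snapshots are assumptions for illustration only; this is not the i965 implementation, which reads its counters from a driver-private buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum for the transform feedback output primitive. */
enum xfb_primitive {
   XFB_POINTS,
   XFB_LINES,
   XFB_TRIANGLES,
};

static unsigned
vertices_per_primitive(enum xfb_primitive mode)
{
   switch (mode) {
   case XFB_POINTS:    return 1;
   case XFB_LINES:     return 2;
   case XFB_TRIANGLES: return 3;
   }
   return 0;   /* not reached for valid enum values */
}

/* Assumed inputs: counter snapshots taken when transform feedback
 * began and ended. The difference is the number of primitives written. */
static uint64_t
xfb_vertices_written(uint64_t prims_start, uint64_t prims_end,
                     enum xfb_primitive mode)
{
   assert(prims_end >= prims_start);
   return (prims_end - prims_start) * vertices_per_primitive(mode);
}
```

Doing this subtraction and multiplication on the CPU is what forces the synchronization point discussed in the thread; performing it on the GPU instead would avoid the readback.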
[Mesa-dev] [RFC] loader: libudev vs sysfs vs libdrm
Hello all, A recent patch by Chris, fixing some libudev fun in our loader, made me wonder whether we can clean it up a bit. Having three different ways of retrieving the vendor/device ID does feel a bit excessive. Plus, as one gets fixed, others are likely to break - and they do. So here is a summary of each method, from a portability POV. - libudev: widely common across Linux distributions (but not all). - sysfs: written by Gary Wong to target GNU Hurd and *BSD. The *BSD folk never got to using it though :-\ - libdrm: used as a last-resort fallback after the above two. The sole option used by *BSD, MacOS and Android. libdrm seems like a nice middle ground that can be used everywhere. Which raises the question: from a technical POV, is there any advantage/disadvantage of using one over the other? I do recall Kristian and Eric participating in this discussion before, but the only thing I can find is along the lines of "linux distros should be using libudev" :-( Can anyone shed some light/cast their 2c? Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/vs: Fix matNxM vertex attributes where M != 4.
Reviewed-by: Chris Forbes chr...@ijw.co.nz On Thu, Jul 2, 2015 at 8:08 PM, Kenneth Graunke kenn...@whitecape.org wrote: Matrix vertex attributes have their columns padded out to vec4s, which I was failing to account for. Scalar NIR expects them to be packed, however. Cc: mesa-sta...@lists.freedesktop.org Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) I still need to write proper Piglit tests for this. We have basically a single test for matrix vertex attributes, and that's a mat4 (which worked). But I figure we probably shouldn't hold up the bugfix on that. diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index caf1300..37b1ed7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -91,12 +91,19 @@ fs_visitor::nir_setup_inputs(nir_shader *shader) * So, we need to copy from fs_reg(ATTR, var->location) to * offset(nir_inputs, var->data.driver_location). */ - unsigned components = var->type->without_array()->components(); + const glsl_type *const t = var->type->without_array(); + const unsigned components = t->components(); + const unsigned cols = t->matrix_columns; + const unsigned elts = t->vector_elements; unsigned array_length = var->type->is_array() ? 
var->type->length : 1; for (unsigned i = 0; i < array_length; i++) { -for (unsigned j = 0; j < components; j++) { - bld.MOV(retype(offset(input, bld, components * i + j), type), - offset(fs_reg(ATTR, var->data.location + i, type), bld, j)); +for (unsigned j = 0; j < cols; j++) { + for (unsigned k = 0; k < elts; k++) { + bld.MOV(offset(retype(input, type), bld, + components * i + elts * j + k), + offset(fs_reg(ATTR, var->data.location + i, type), + bld, 4 * j + k)); + } } } break; -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
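The index arithmetic in the patch above (source columns padded to vec4, destination packed) can be shown in isolation with a small C sketch. It works on plain float arrays for a single matrix rather than on the driver's registers, and the function name pack_matrix_columns is made up for this example.

```c
#include <assert.h>

/* Copy one matrix attribute from vec4-padded columns to a packed layout.
 * cols = number of columns, elts = components per column (at most 4). */
static void
pack_matrix_columns(const float *padded, float *packed,
                    unsigned cols, unsigned elts)
{
   assert(elts <= 4);

   for (unsigned j = 0; j < cols; j++) {
      for (unsigned k = 0; k < elts; k++) {
         /* Source columns start every 4 floats; destination columns
          * start every elts floats. */
         packed[elts * j + k] = padded[4 * j + k];
      }
   }
}
```

For a matrix with two columns of three components each, the padded input {1, 2, 3, pad, 4, 5, 6, pad} becomes the packed output {1, 2, 3, 4, 5, 6}. When elts is 4 the two layouts coincide, which is consistent with the mat4 test having passed before the fix.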
[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1
https://bugs.freedesktop.org/show_bug.cgi?id=91254 --- Comment #2 from Tomasz C. toma...@o2.pl --- On mesa-git 10.7.0_devel.71031 and mesa-libgl-git 10.7.0_devel.71031 (compiled from git master) this problem still exists, same as with 10.6 and 10.6.1. If I roll these two packages back to version 10.5.7, it works correctly. How can I help locate the source of the problem? -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Mesa 10.6.2 cut-off
Hi all, As requested by Ilia, a bit of a heads-up: Any patches sent to mesa-stable and/or landed in master after 12 PM (noon) GMT, on the 8th of July won't feature in 10.6.2. Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1
https://bugs.freedesktop.org/show_bug.cgi?id=91254 --- Comment #3 from Chris Wilson ch...@chris-wilson.co.uk --- You have two end points, a bisection would be very useful and only take a few minutes (maybe an hour at most?). -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
On Tue, Jul 07, 2015 at 01:14:53PM +0300, Abdiel Janulgue wrote: On 07/06/2015 01:33 PM, Chris Wilson wrote: @@ -600,7 +593,10 @@ brw_emit_null_surface_state(struct brw_context *brw, 1 << BRW_SURFACE_WRITEDISABLE_B_SHIFT | 1 << BRW_SURFACE_WRITEDISABLE_A_SHIFT); } - surf[1] = bo ? bo->offset64 : 0; + surf[1] = brw_batch_reloc(brw->batch, *out_offset + 4, + bo, 0, + I915_GEM_DOMAIN_RENDER, + I915_GEM_DOMAIN_RENDER); null check for bo? I put the NULL check into the inline variant of brw_batch_reloc() for a bit of syntactic sugar for these cases. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once
On 06/07/15 19:12, Chris Wilson wrote: On Mon, Jul 06, 2015 at 04:19:36PM +0300, Martin Peres wrote: On 06/07/15 16:15, Martin Peres wrote: On 06/07/15 16:13, Chris Wilson wrote: On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote: On 06/07/15 13:33, Chris Wilson wrote: Move the query for the TIMESTAMP register from context init to the screen, so that it is only queried once for all contexts. On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low bits always zero. Detect this by repeating the read a few times and check the register is incrementing. You do not do the latter. You only check for the low bits. I guess the counter is supposed to be monotonically increasing and with a resolution of a few microseconds which would make this perfectly valid. Could you confirm and make sure to add this information in the commit message please? The counter should increment every 80ns. What's misleading in what I wrote? It describes the hw bug and how to detect it. Well, it is not misleading, it just lacks this information. If it incremented every second, the patch would be stupid because the timestamp could be at 0 and polling 10 times at an interval of a few us would always yield the same result. That's all :) Oh, forgot to say: With this information added in the commit message and the commit message duplicated as a comment in intel_detect_timestamp(), the patch is: How about: On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low 32bits always zero. Detect this by repeating the read a few times and check the register is incrementing every 80ns as expected and not stuck on zero (as would be the case with the buggy kernel/hw). -Chris Perfect! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
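The detection strategy agreed on above (read the register a few times and check that the low 32 bits are not stuck at zero) could be sketched like this. The callback type, the fake register, and the function names are all hypothetical; the real code queries the kernel, which this sketch replaces with a simulated reader so that it can run on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: returns the raw 64-bit TIMESTAMP value. */
typedef uint64_t (*read_timestamp_fn)(void *ctx);

/* Returns true if any of the reads shows non-zero low 32 bits.
 * With a counter ticking every 80ns, a working register will not keep
 * its low 32 bits at zero across several reads; the buggy case always
 * reports them as zero because the value is shifted up. */
static bool
timestamp_low_bits_work(read_timestamp_fn read_reg, void *ctx, int attempts)
{
   assert(attempts > 0);

   for (int i = 0; i < attempts; i++) {
      uint64_t value = read_reg(ctx);
      if ((value & 0xffffffffu) != 0)
         return true;
   }
   return false;
}

/* Simulated register for experimenting with the check. */
struct fake_reg {
   uint64_t ticks;
   bool buggy;   /* true: value is shifted up, low 32 bits are zero */
};

static uint64_t
fake_read(void *ctx)
{
   struct fake_reg *reg = ctx;

   reg->ticks += 125;   /* pretend some 80ns ticks passed between reads */
   return reg->buggy ? reg->ticks << 32 : reg->ticks;
}
```

This also shows why the tick rate matters to the argument in the thread: the check is only meaningful if the counter is certain to advance between consecutive reads.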
Re: [Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback
On 06/07/15 22:36, Kenneth Graunke wrote: On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote: Since the purpose of transform feedback tends to be for the client to act upon the results to change the geometry in the scene, it is likely that the client will soon be waiting upon the results. Flush the batch early so that we don't build up a long queue of commands afterwards that could delay the readback. --- src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index 857ebe5..13dbe5b 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx, brw_batch_end(brw-batch); + /* We will likely want to read the results in the very near future, so +* push this primitive to hardware if it is currently idle. +*/ + if (!brw_batch_busy(brw-batch)) + brw_batch_flush(brw-batch); + /* EndTransformFeedback() means that we need to update the number of * vertices written. Since it's only necessary if DrawTransformFeedback() * is called and it means mapping a buffer object, we delay computing it We need some data to justify this change. I actually get a negative perf improvement out of this one, -0.9% on a customer benchmark. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/gen9: Plugin the code for selecting YF/YS tiling on skl+
On Wednesday, June 10, 2015 03:30:47 PM Anuj Phogat wrote: Buffers with Yf/Ys tiling end up using meta upload / download paths or the blitter for cases where they used tiled_memcpy paths in case of Y tiling. This has exposed some bugs in meta path. To avoid any piglit regressions on SKL this patch keeps the Yf/Ys tiling disabled at the moment. V3: Make brw_miptree_choose_tr_mode() actually choose TRMODE. (Ben) Few cosmetic changes. V4: Get rid of brw_miptree_choose_tr_mode(). Take care of all tile resource modes {Yf, Ys, none} for all generations at one place. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 97 -- 1 file changed, 79 insertions(+), 18 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index b9ac4cf..c0ef5cc 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -807,27 +807,88 @@ brw_miptree_layout(struct brw_context *brw, enum intel_miptree_tiling_mode requested, struct intel_mipmap_tree *mt) { - mt-tr_mode = INTEL_MIPTREE_TRMODE_NONE; + const unsigned bpp = mt-cpp * 8; + const bool is_tr_mode_yf_ys_allowed = + brw-gen = 9 + !for_bo + !mt-compressed + /* Enable YF/YS tiling only for color surfaces because depth and + * stencil surfaces are not supported in blitter using fast copy + * blit and meta PBO upload, download paths. No other paths + * currently support Yf/Ys tiled surfaces. + * FIXME: Remove this restriction once we have a tiled_memcpy() + * path to do depth/stencil data upload/download to Yf/Ys tiled + * surfaces. + */ + _mesa_is_format_color_format(mt-format) + (requested == INTEL_MIPTREE_TILING_Y || + requested == INTEL_MIPTREE_TILING_ANY) + (bpp is_power_of_two(bpp)) + /* FIXME: To avoid piglit regressions keep the Yf/Ys tiling + * disabled at the moment. + */ + false; I must say, I was a bit surprised to see this land as is. 
You've got a lot of conditions there, only to finish them up with "false" - with a comment saying that your code isn't passing Piglit yet. That doesn't really meet our usual qualifications for merging. Coverity also pointed out that your if (is_tr_mode_yf_ys_allowed) block below is dead code, issuing new warnings. Forgive my ignorance, but what's the purpose of Yf/Ys tiling? My understanding was that Ys is primarily in support of a new OpenGL feature - GL_ARB_sparse_texture(*) - which isn't yet enabled: https://www.opengl.org/registry/specs/ARB/sparse_texture.txt Is Yf tiling supposed to be more efficient than legacy Y-tiling? If so, then switching to it is an optimization, isn't it? We usually require data indicating some kind of performance improvement (any kind!) before landing a bunch of code for optimizations. Obviously that's pretty tricky with pre-release hardware, so I'd settle for "it's complete and functions correctly". At any rate, it's merged, and hopefully you're able to get it working... - intel_miptree_set_alignment(brw, mt); - intel_miptree_set_total_width_height(brw, mt); + /* Lower index (Yf) is the higher priority mode */ + const uint32_t tr_mode[3] = {INTEL_MIPTREE_TRMODE_YF, +INTEL_MIPTREE_TRMODE_YS, +INTEL_MIPTREE_TRMODE_NONE}; + int i = is_tr_mode_yf_ys_allowed ? 
0 : ARRAY_SIZE(tr_mode) - 1; - if (!mt-total_width || !mt-total_height) { - intel_miptree_release(mt); - return; - } + while (i ARRAY_SIZE(tr_mode)) { + if (brw-gen 9) + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE); + else + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_YF || +tr_mode[i] == INTEL_MIPTREE_TRMODE_YS || +tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE); - /* On Gen9+ the alignment values are expressed in multiples of the block -* size -*/ - if (brw-gen = 9) { - unsigned int i, j; - _mesa_get_format_block_size(mt-format, i, j); - mt-align_w /= i; - mt-align_h /= j; - } + mt-tr_mode = tr_mode[i]; + intel_miptree_set_alignment(brw, mt); + intel_miptree_set_total_width_height(brw, mt); - if (!for_bo) - mt-tiling = brw_miptree_choose_tiling(brw, requested, mt); + if (!mt-total_width || !mt-total_height) { + intel_miptree_release(mt); + return; + } + + /* On Gen9+ the alignment values are expressed in multiples of the + * block size. + */ + if (brw-gen = 9) { + unsigned int i, j; + _mesa_get_format_block_size(mt-format, i, j); + mt-align_w /= i; + mt-align_h /= j; + } + +
[Mesa-dev] [PATCH] opencl: use versioned .so in mesa.icd
We must have versioned library in mesa.icd, because ICD loader would fail if the mesa-devel package wasn't installed. Reported-by: Fabian Deutsch fabian.deut...@gmx.de Reference: https://bugs.freedesktop.org/show_bug.cgi?id=73512 Cc: 10.6 mesa-sta...@lists.freedesktop.org Signed-off-by: Igor Gnatenko i.gnatenko.br...@gmail.com --- configure.ac | 3 +++ src/gallium/targets/opencl/Makefile.am | 2 +- src/gallium/targets/opencl/mesa.icd| 1 - src/gallium/targets/opencl/mesa.icd.in | 1 + 4 files changed, 5 insertions(+), 2 deletions(-) delete mode 100644 src/gallium/targets/opencl/mesa.icd create mode 100644 src/gallium/targets/opencl/mesa.icd.in diff --git a/configure.ac b/configure.ac index d240c06..a7141a3 100644 --- a/configure.ac +++ b/configure.ac @@ -64,6 +64,8 @@ m4_ifdef([AM_PROG_AR], [AM_PROG_AR]) dnl Set internal versions OSMESA_VERSION=8 AC_SUBST([OSMESA_VERSION]) +OPENCL_VERSION=1 +AC_SUBST([OPENCL_VERSION]) dnl Versions for external dependencies LIBDRM_REQUIRED=2.4.38 @@ -2376,6 +2378,7 @@ AC_CONFIG_FILES([Makefile src/gallium/targets/libgl-xlib/Makefile src/gallium/targets/omx/Makefile src/gallium/targets/opencl/Makefile + src/gallium/targets/opencl/mesa.icd src/gallium/targets/osmesa/Makefile src/gallium/targets/osmesa/osmesa.pc src/gallium/targets/pipe-loader/Makefile diff --git a/src/gallium/targets/opencl/Makefile.am b/src/gallium/targets/opencl/Makefile.am index 70e60e2..af6d760 100644 --- a/src/gallium/targets/opencl/Makefile.am +++ b/src/gallium/targets/opencl/Makefile.am @@ -5,7 +5,7 @@ lib_LTLIBRARIES = lib@OPENCL_LIBNAME@.la lib@OPENCL_LIBNAME@_la_LDFLAGS = \ $(LLVM_LDFLAGS) \ -no-undefined \ - -version-number 1:0 \ + -version-number @OPENCL_VERSION@:0 \ $(GC_SECTIONS) \ $(LD_NO_UNDEFINED) diff --git a/src/gallium/targets/opencl/mesa.icd b/src/gallium/targets/opencl/mesa.icd deleted file mode 100644 index 6a6a870..000 --- a/src/gallium/targets/opencl/mesa.icd +++ /dev/null @@ -1 +0,0 @@ -libMesaOpenCL.so diff --git 
a/src/gallium/targets/opencl/mesa.icd.in b/src/gallium/targets/opencl/mesa.icd.in new file mode 100644 index 000..1b77b4e --- /dev/null +++ b/src/gallium/targets/opencl/mesa.icd.in @@ -0,0 +1 @@ +lib@OPENCL_LIBNAME@.so.@OPENCL_VERSION@ -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1
https://bugs.freedesktop.org/show_bug.cgi?id=91254

--- Comment #1 from Chris Wilson <ch...@chris-wilson.co.uk> ---
I suspect this is a dup of bug 90839. Does the regression remain on master
or the 10.6 branch?

-- 
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
On 07/06/2015 01:33 PM, Chris Wilson wrote:
> +/*
> + * Add a relocation entry for the target buffer into the current batch.
> + *
> + * This is the heart of performing fast relocations, both here and in
> + * the corresponding kernel relocation routines.
> + *
> + * - Instead of passing in handles for the kernel to convert back into
> + *   the buffer for every relocation, we tell the kernel which
> + *   execobject slot corresponds with the relocation. The kernel is
> + *   able to use a simple LUT constructed as it first looks up each buffer
> + *   for the batch rather than search a small, overfull hashtable. As both
> + *   the number of relocations and buffers in a batch grow, the simple
> + *   LUT is much more efficient (though the LUT itself is less cache
> + *   friendly).
> + *   However, as the batch buffer is by definition the last object in
> + *   the execbuffer array we have to perform a pass to relabel the
> + *   target of all relocations pointing to the batch. (Except when
> + *   the kernel supports batch-first, in which case we can do the relocation
> + *   target processing for the batch inline.)
> + *
> + * - If the kernel has not moved the buffer, it will still be in the same
> + *   location as last time we used it. If we tell the kernel that all the
> + *   relocation entries are the same as the offset for the buffer, then
> + *   the kernel need only check that all the buffers are still in the same
> + *   location and then skip performing relocations entirely. A huge win.
> + *
> + * - As a consequence of telling the kernel to skip processing the relocations,
> + *   we need to tell the kernel about the read/write domains and special needs
> + *   of the buffers.
> + *
> + * - Alternatively, we can request the kernel place the buffer exactly
> + *   where we want it and forgo all relocations to that buffer entirely.
> + *   The buffer is effectively pinned for its lifetime (if the kernel
> + *   does have to move it, for example to swap it out to recover memory,
> + *   the kernel will return it back to our requested location at the start
> + *   of the next batch.) This of course imposes a lot of constraints on where
> + *   we can say the buffers are, they must meet all the alignment constraints
> + *   and not overlap.
> + *
> + * - Essential to all these techniques is that we always use the same
> + *   presumed_offset for the relocations as for submitting the execobject.
> + *   That value must be written into the batch and it must match the value
> + *   we tell the kernel. (This breaks down when using relocation trees shared
> + *   between multiple contexts, hence the need for context-local batch
> + *   management.)
> + *
> + * In contrast to libdrm, we can build the execbuffer array along with
> + * the batch by forgoing the ability to handle general relocation trees.
> + * This avoids having multiple passes to build the execbuffer parameter,
> + * and also gives us a means to cheaply track when a buffer has been
> + * referenced by the batch.
> + */
> +uint64_t __brw_batch_reloc(struct brw_batch *batch,
> +                           uint32_t batch_offset,
> +                           struct brw_bo *target_bo,
> +                           uint64_t target_offset,
> +                           unsigned read_domains,
> +                           unsigned write_domain)
> +{
> +   assert(target_bo->refcnt);
> +   if (unlikely(target_bo->batch != batch)) {
> +      /* XXX legal sharing between contexts/threads? */
> +      target_bo = brw_bo_import(batch, target_bo->base, true);
> +      if (unlikely(target_bo == NULL))
> +         longjmp(batch->jmpbuf, -ENOMEM);
> +      target_bo->refcnt--; /* kept alive by the implicit active reference */
> +   }
> +   assert(target_bo->batch == batch);
> +
> +   if (target_bo->exec == NULL) {
> +      int n;
> +
> +      /* reserve one exec entry for the batch */
> +      if (unlikely(batch->emit.nexec + 1 == batch->exec_size))
> +         __brw_batch_grow_exec(batch);
> +
> +      n = batch->emit.nexec++;
> +      target_bo->target_handle = has_lut(batch) ? n : target_bo->handle;
> +      target_bo->exec = memset(batch->exec + n, 0, sizeof(*target_bo->exec));
> +      target_bo->exec->handle = target_bo->handle;
> +      target_bo->exec->alignment = target_bo->alignment;
> +      target_bo->exec->offset = target_bo->offset;
> +      if (target_bo->pinned)
> +         target_bo->exec->flags = EXEC_OBJECT_PINNED;
> +
> +      /* Track the total amount of memory in use by all active requests */
> +      if (target_bo->read.rq == NULL) {
> +         batch->rss += target_bo->size;
> +         if (batch->rss > batch->peak_rss)
> +            batch->peak_rss = batch->rss;
> +      }
> +      target_bo->read.rq = batch->next_request;
> +      list_move_tail(&target_bo->read.link, &batch->next_request->read);
> +
> +      batch->aperture += target_bo->size;
> +   }
> +
> +   if (!target_bo->pinned) {
> +      int n;
> +
> +      if (unlikely(batch->emit.nreloc == batch->reloc_size))
> +         __brw_batch_grow_reloc(batch);
> +
> +      n = batch->emit.nreloc++;
> +
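As a rough illustration of the presumed-offset idea described in the quoted comment, here is a small, self-contained C model. It is a sketch under stated assumptions, not the kernel's or the patch's implementation: the sketch_* structures and functions are hypothetical stand-ins for the real execbuffer structures, reduced to the two checks the comment talks about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an execobject entry. */
struct sketch_exec_object {
    uint64_t presumed_offset; /* where userspace believes the buffer is */
    uint64_t actual_offset;   /* where the kernel really placed it */
};

/* Hypothetical, simplified relocation entry. */
struct sketch_reloc {
    uint32_t target_slot;     /* index into the exec array (LUT-style handle) */
    uint64_t presumed_offset; /* offset that was written into the batch */
};

/* Kernel-side idea: relocation processing can be skipped entirely
 * only if no buffer moved since userspace last saw it. */
static bool sketch_can_skip_relocs(const struct sketch_exec_object *exec,
                                   size_t nexec)
{
    assert(exec != NULL || nexec == 0);
    for (size_t i = 0; i < nexec; i++)
        if (exec[i].presumed_offset != exec[i].actual_offset)
            return false;
    return true;
}

/* Userspace-side invariant: each relocation must carry the same
 * presumed offset as the exec object it targets, otherwise skipping
 * relocations would leave a stale address in the batch. */
static bool sketch_relocs_consistent(const struct sketch_exec_object *exec,
                                     size_t nexec,
                                     const struct sketch_reloc *relocs,
                                     size_t nreloc)
{
    for (size_t i = 0; i < nreloc; i++) {
        if (relocs[i].target_slot >= nexec)
            return false; /* slot index outside the exec array */
        if (relocs[i].presumed_offset !=
            exec[relocs[i].target_slot].presumed_offset)
            return false;
    }
    return true;
}
```

The second check is the one that stops holding once relocation state is shared between contexts, which is the motivation the comment gives for context-local batch management.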
Re: [Mesa-dev] [PATCH v2 5/5] i965/gen9: Allocate YF/YS tiled buffer objects
On Tuesday, June 23, 2015 01:23:05 PM Anuj Phogat wrote:
> In case of I915_TILING_{X,Y} we need to pass the tiling format to libdrm
> using drm_intel_bo_alloc_tiled(). But in case of YF/YS tiled buffers
> libdrm need not know about the tiling format because these buffers
> don't have hardware support to be tiled or detiled through a fenced
> region. libdrm still needs to know the buffer alignment value for its use
> in the kernel when resolving the relocation.
>
> Using drm_intel_bo_alloc_for_render() for YF/YS tiled buffers
> satisfies both the above conditions.
>
> V2: Delete min/max buffer size restrictions not valid for i965+.
>     Remove redundant align to tile size statements.
>     Remove some redundant code now when there are no min/max buffer size.
>
> Signed-off-by: Anuj Phogat <anuj.pho...@gmail.com>
> Cc: Ben Widawsky <b...@bwidawsk.net>
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 62 +--
>  1 file changed, 58 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 80c52f2..5bcb094 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -558,6 +558,48 @@ intel_lower_compressed_format(struct brw_context *brw, mesa_format format)
>     }
>  }
>
> +/* This function computes Yf/Ys tiled bo size, alignment and pitch. */
> +static uint64_t
> +intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment,
> +                        uint64_t *pitch)

Hi Anuj,

This patch has a subtle bug: you've specified pitch and stride to be
uint64_t here, but below when you call it

[snip]

> @@ -616,11 +658,23 @@ intel_miptree_create(struct brw_context *brw,
>        alloc_flags |= BO_ALLOC_FOR_RENDER;
>
>     unsigned long pitch;
> -   mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree", total_width,
> -                                     total_height, mt->cpp, &mt->tiling,
> -                                     &pitch, alloc_flags);
>     mt->etc_format = etc_format;
> -   mt->pitch = pitch;
> +
> +   if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) {
> +      unsigned alignment = 0;
> +      unsigned long size;
> +      size = intel_get_yf_ys_bo_size(mt, &alignment, &pitch);

...you're passing a pointer to an unsigned long. On 32-bit builds,
unsigned long is a 4 byte value, while uint64_t is 8 bytes. This could
lead to stack corruption. (GCC warns about this during a 32-bit build.)

I assumed the solution was to make everything uint32_t, but apparently
drm_intel_bo_alloc_tiled actually expects an unsigned long. So we can't
change that.

Then I looked at your code, and realized that nothing even uses the pitch
value. Is there some point to the parameter existing at all?

--Ken

> +      assert(size);
> +      mt->bo = drm_intel_bo_alloc_for_render(brw->bufmgr, "miptree",
> +                                             size, alignment);
> +      mt->pitch = pitch;
> +   } else {
> +      mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree",
> +                                        total_width, total_height, mt->cpp,
> +                                        &mt->tiling, &pitch,
> +                                        alloc_flags);
> +      mt->pitch = pitch;
> +   }
>
>     /* If the BO is too large to fit in the aperture, we need to use the
>      * BLT engine to support it. Prior to Sandybridge, the BLT paths can't
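To make the pointer-width problem above concrete, here is a minimal C sketch of one way to stay safe when a callee writes through a uint64_t pointer but the caller ultimately needs an unsigned long. This is not the fix that was applied to the patch; sketch_get_size and sketch_alloc are hypothetical names, and the 4096 x 64 dimensions are arbitrary illustration values.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical callee: reports the pitch through a uint64_t pointer,
 * so it always stores 8 bytes at *pitch. */
static uint64_t sketch_get_size(uint64_t width, uint64_t height,
                                uint64_t *pitch)
{
    *pitch = width;
    return width * height;
}

/* Hypothetical caller that needs the pitch as unsigned long (4 bytes on
 * typical 32-bit builds). Passing &pitch_out directly would be a type
 * mismatch and could overwrite neighbouring stack memory, so go through
 * a correctly typed temporary and narrow explicitly. */
static uint64_t sketch_alloc(unsigned long *pitch_out)
{
    uint64_t pitch64 = 0;
    uint64_t size = sketch_get_size(4096, 64, &pitch64);

    assert(pitch64 <= ULONG_MAX); /* narrowing must not lose bits */
    *pitch_out = (unsigned long)pitch64;
    return size;
}
```

If, as suggested above, nothing uses the pitch value, dropping the out-parameter altogether would be simpler than either variant.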
Re: [Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices
On 06/07/15 19:43, Kenneth Graunke wrote:
> On Monday, July 06, 2015 11:33:10 AM Chris Wilson wrote:
>> Rather than allocating a fresh page every time we clear a buffer, keep
>> that page around between invocations by tracking the last used offset
>> and only allocating a fresh page when we wrap.
>>
>> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
>> ---
>>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++---
>>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> This looks okay to me. Do you have any performance data to justify the
> extra complexity?

I actually see a small regression on a customer benchmark (-1.3%). Could
it be because we are waiting on the VBO at some point?

Which benchmark did you use to measure a perf improvement?
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
On 07/07/2015 01:19 PM, Chris Wilson wrote:
> On Tue, Jul 07, 2015 at 01:14:53PM +0300, Abdiel Janulgue wrote:
>> On 07/06/2015 01:33 PM, Chris Wilson wrote:
>>> @@ -600,7 +593,10 @@ brw_emit_null_surface_state(struct brw_context *brw,
>>>                   1 << BRW_SURFACE_WRITEDISABLE_B_SHIFT |
>>>                   1 << BRW_SURFACE_WRITEDISABLE_A_SHIFT);
>>>     }
>>> -   surf[1] = bo ? bo->offset64 : 0;
>>> +   surf[1] = brw_batch_reloc(&brw->batch, *out_offset + 4,
>>> +                             bo, 0,
>>> +                             I915_GEM_DOMAIN_RENDER,
>>> +                             I915_GEM_DOMAIN_RENDER);
>>
>> null check for bo?
>
> I put the NULL check into the inline variant of brw_batch_reloc() for a
> bit of syntactic sugar for these cases.
> -Chris

You're right. I failed to notice the inline variant.
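The pattern Chris describes, a thin inline wrapper that absorbs the NULL case before calling the out-of-line worker, can be sketched as follows. The names sketch_bo, sketch_reloc_slow and sketch_reloc are hypothetical and the bodies are deliberately trivial; this shows only the shape of the pattern, not the actual brw_batch_reloc() from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer object with a known GPU offset. */
struct sketch_bo {
    uint64_t offset;
};

/* Out-of-line worker: callers must guarantee bo != NULL. */
static uint64_t sketch_reloc_slow(const struct sketch_bo *bo, uint64_t delta)
{
    assert(bo != NULL);
    return bo->offset + delta;
}

/* Inline wrapper: a NULL buffer simply yields address 0, so call sites
 * such as "surf[1] = bo ? ... : 0" no longer need their own check. */
static inline uint64_t sketch_reloc(const struct sketch_bo *bo,
                                    uint64_t delta)
{
    if (bo == NULL)
        return 0;
    return sketch_reloc_slow(bo, delta);
}
```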
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
Hi Chris,

I made a genuine effort to review this patch, hoping to better understand
the various changes and what you were trying to accomplish. I spent many
hours reading and trying to enumerate changes - or potential changes I
needed to look hard at to convince myself whether they were correct.

I came up with a frighteningly long list of changes:

* Relocation handling changes considerably (the original point of
  Kristian's endeavour which led up to this).

* Fencing, busy tracking, and sync objects are completely reworked.

* Render-to-texture cache flushing and dirty buffer tracking is
  completely reworked.

* Gen7 SOL buffer offset resetting now uses MI_LOAD_REGISTER_IMM
  rather than the execbuf2 parameter, requiring the command validator
  on Haswell. This effectively bumps the kernel requirement from
  v3.6 to v4.2-rc1, which will simply not fly with distributions at
  this time.

* glBufferSubData() now uses intel_upload_data() rather than allocating
  a temporary BO. This is the first use of the upload buffer by the
  BLT engine, and could imply that the upload buffer's lifetime now
  extends across batches - longer than before. Separable change that
  requires separate evaluation and justification.

* Per buffer cache-coherency checking rather than brw->has_llc?

* glBufferSubData()'s prefer_stall_to_blit flag appears to depend on
  per-buffer cache-coherency rather than being set globally. Could
  impact performance of buffer uploads.

* Potential missing flushes (which can cause hangs or misrendering):

  - It looks like calling brw_bo_busy() with BUSY_FLUSH causes a flush
    when necessary. However, some instances of the old bo_busy,
    bo_references, batch_flush pattern are replaced without that flag.
    One occurrence was in BufferSubData(); I did not spend time to
    check every case.

  - Flushes are often done implicitly by e.g. brw_bo_read calling
    brw_bo_map with the appropriate flags, and many explicit checks and
    flushes are removed. Not bad, but needs careful review.

  - Gen6+ query object code might have dropped an implicit flush
    guaranteeing that when the GL application requests the result,
    any pending work will be kicked off so they can poll/spin
    repeatedly until the result arrives.

  - New code to avoid redundant flushes.

* perf_debug() warnings are removed all over the code for some reason:

  - Unsynchronized maps/BufferSubData not working on !LLC platforms?
    If they work now, that's a huge change! If not, why drop the warning?

  - Warnings about stalls on mapping buffers and miptrees are gone now.
    These have been useful in tracking down performance problems. They
    might not always be accurate, but surely removing them should be
    done separately with justification?

  - Warnings about stalls on query objects are gone. I've used these
    when analyzing application performance. Why?

  - Warnings about implicit flushes are gone.

* BO unmap calls appear to be missing in some places. A few map calls
  have moved around in hard-to-follow ways. Unclear how lifetimes of
  buffers and lifetimes of maps are affected.

* Possible mmap vs. pwrite preference changes? Hard to follow.

* Texture upload (tiled_memcpy) changes, which is notoriously fragile
  and can lose all of the performance benefit if the compiler isn't
  able to optimize it just right. Ideally separate.

* Assertions change to GL errors in brw_get_graphics_reset_status().

* Aperture space checking significantly reworked, especially for the
  BLT paths. Honestly, a lot nicer, but couldn't this be separated?

* The bo_reuse driconf option is removed.

* Gen4-5 structure changes.

* brw_get_timestamp() - removes initialization of result to 0.
  Probably unnecessary and OK to delete; should be separate.

* New helper functions and coding patterns. Separable.

* Noise (renaming, moving code between files, some other trivial
  changes like removing 'brw' variables and moving code into else
  blocks).

* ...I probably missed some things.

Based upon this, I cannot in good conscience consider merging this patch.
The potential for breakage is staggering. As a proof-of-concept, you've
done an excellent job in proving we can do much better, and introduced a
lot of good ideas. But there's a lot of work left to be done before we
can consider applying it to our production quality driver.

Please advise whether you would like to work towards making a mergeable,
incremental patch series, or if someone else should embark on that
endeavour.

--Ken
Re: [Mesa-dev] [PATCH] opencl: use versioned .so in mesa.icd
On 07.07.2015 19:05, Igor Gnatenko wrote:
> We must have a versioned library in mesa.icd, because the ICD loader would
> fail if the mesa-devel package wasn't installed.
>
> Reported-by: Fabian Deutsch <fabian.deut...@gmx.de>
> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=73512
> Cc: "10.6" <mesa-sta...@lists.freedesktop.org>
> Signed-off-by: Igor Gnatenko <i.gnatenko.br...@gmail.com>
> ---
>  configure.ac                           | 3 +++
>  src/gallium/targets/opencl/Makefile.am | 2 +-
>  src/gallium/targets/opencl/mesa.icd    | 1 -
>  src/gallium/targets/opencl/mesa.icd.in | 1 +
>  4 files changed, 5 insertions(+), 2 deletions(-)
>  delete mode 100644 src/gallium/targets/opencl/mesa.icd
>  create mode 100644 src/gallium/targets/opencl/mesa.icd.in
>
> diff --git a/configure.ac b/configure.ac
> index d240c06..a7141a3 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -64,6 +64,8 @@ m4_ifdef([AM_PROG_AR], [AM_PROG_AR])
>  dnl Set internal versions
>  OSMESA_VERSION=8
>  AC_SUBST([OSMESA_VERSION])
> +OPENCL_VERSION=1
> +AC_SUBST([OPENCL_VERSION])
>
>  dnl Versions for external dependencies
>  LIBDRM_REQUIRED=2.4.38
> @@ -2376,6 +2378,7 @@ AC_CONFIG_FILES([Makefile
>  		src/gallium/targets/libgl-xlib/Makefile
>  		src/gallium/targets/omx/Makefile
>  		src/gallium/targets/opencl/Makefile
> +		src/gallium/targets/opencl/mesa.icd
>  		src/gallium/targets/osmesa/Makefile
>  		src/gallium/targets/osmesa/osmesa.pc
>  		src/gallium/targets/pipe-loader/Makefile
> diff --git a/src/gallium/targets/opencl/Makefile.am b/src/gallium/targets/opencl/Makefile.am
> index 70e60e2..af6d760 100644
> --- a/src/gallium/targets/opencl/Makefile.am
> +++ b/src/gallium/targets/opencl/Makefile.am
> @@ -5,7 +5,7 @@ lib_LTLIBRARIES = lib@OPENCL_LIBNAME@.la
>  lib@OPENCL_LIBNAME@_la_LDFLAGS = \
>  	$(LLVM_LDFLAGS) \
>  	-no-undefined \
> -	-version-number 1:0 \
> +	-version-number @OPENCL_VERSION@:0 \
>  	$(GC_SECTIONS) \
>  	$(LD_NO_UNDEFINED)
>
> diff --git a/src/gallium/targets/opencl/mesa.icd b/src/gallium/targets/opencl/mesa.icd
> deleted file mode 100644
> index 6a6a870..000
> --- a/src/gallium/targets/opencl/mesa.icd
> +++ /dev/null
> @@ -1 +0,0 @@
> -libMesaOpenCL.so
> diff --git a/src/gallium/targets/opencl/mesa.icd.in b/src/gallium/targets/opencl/mesa.icd.in
> new file mode 100644
> index 000..1b77b4e
> --- /dev/null
> +++ b/src/gallium/targets/opencl/mesa.icd.in
> @@ -0,0 +1 @@
> +lib@OPENCL_LIBNAME@.so.@OPENCL_VERSION@

Acked-by: Michel Dänzer <michel.daen...@amd.com>

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
[Mesa-dev] [PATCH] clover: little OpenCL status code logging clean
s/build_error/compile_error in order to match the stored OpenCL status code.
Make program::build catch and log every OpenCL error.
Make tgsi error triggering uniform with the llvm one.
---
Note that compile_error class is kept for later use

 .../state_trackers/clover/core/compiler.hpp        |  3 ++-
 src/gallium/state_trackers/clover/core/error.hpp   |  4 ++--
 src/gallium/state_trackers/clover/core/program.cpp |  4 ++--
 .../state_trackers/clover/llvm/invocation.cpp      | 18 +++---
 .../state_trackers/clover/tgsi/compiler.cpp        | 28 +-
 5 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/src/gallium/state_trackers/clover/core/compiler.hpp b/src/gallium/state_trackers/clover/core/compiler.hpp
index c68aa39..2076417 100644
--- a/src/gallium/state_trackers/clover/core/compiler.hpp
+++ b/src/gallium/state_trackers/clover/core/compiler.hpp
@@ -37,7 +37,8 @@ namespace clover {
                                const std::string &opts,
                                std::string &r_log);
 
-   module compile_program_tgsi(const std::string &source);
+   module compile_program_tgsi(const std::string &source,
+                               std::string &r_log);
 }
 
 #endif
diff --git a/src/gallium/state_trackers/clover/core/error.hpp b/src/gallium/state_trackers/clover/core/error.hpp
index 780b973..59a5af4 100644
--- a/src/gallium/state_trackers/clover/core/error.hpp
+++ b/src/gallium/state_trackers/clover/core/error.hpp
@@ -65,9 +65,9 @@ namespace clover {
       cl_int code;
    };
 
-   class build_error : public error {
+   class compile_error : public error {
    public:
-      build_error(const std::string &what = "") :
+      compile_error(const std::string &what = "") :
          error(CL_COMPILE_PROGRAM_FAILURE, what) {
       }
    };
diff --git a/src/gallium/state_trackers/clover/core/program.cpp b/src/gallium/state_trackers/clover/core/program.cpp
index 0d6cc40..6eebd9c 100644
--- a/src/gallium/state_trackers/clover/core/program.cpp
+++ b/src/gallium/state_trackers/clover/core/program.cpp
@@ -56,14 +56,14 @@ program::build(const ref_vector<device> &devs, const char *opts,
       try {
          auto module = (dev.ir_format() == PIPE_SHADER_IR_TGSI ?
-                        compile_program_tgsi(_source) :
+                        compile_program_tgsi(_source, log) :
                         compile_program_llvm(_source, headers,
                                              dev.ir_format(),
                                              dev.ir_target(), build_opts(dev),
                                              log));
          _binaries.insert({ dev, module });
          _logs.insert({ dev, log });
-      } catch (const build_error &) {
+      } catch (const error &) {
          _logs.insert({ dev, log });
          throw;
       }
diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 9b91fee..967284d 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -108,7 +108,7 @@ namespace {
                                   name, llvm::MemoryBuffer::getMemBuffer(source));
 
       if (!c.ExecuteAction(act))
-         throw build_error(log);
+         throw compile_error(log);
    }
 
    module
@@ -256,7 +256,7 @@ namespace {
       r_log = log;
 
       if (!ExecSuccess)
-         throw build_error();
+         throw compile_error();
 
       // Get address spaces map to be able to find kernel argument address space
       memcpy(address_spaces, c.getTarget().getAddressSpaceMap(),
@@ -485,7 +485,7 @@ namespace {
       LLVMDisposeMessage(err_message);
 
       if (err) {
-         throw build_error();
+         throw compile_error();
       }
    }
 
@@ -505,7 +505,7 @@ namespace {
       if (LLVMGetTargetFromTriple(triple.c_str(), &target, &error_message)) {
          r_log = std::string(error_message);
          LLVMDisposeMessage(error_message);
-         throw build_error();
+         throw compile_error();
       }
 
       LLVMTargetMachineRef tm = LLVMCreateTargetMachine(
@@ -514,7 +514,7 @@ namespace {
 
       if (!tm) {
          r_log = "Could not create TargetMachine: " + triple;
-         throw build_error();
+         throw compile_error();
       }
 
       if (dump_asm) {
@@ -567,7 +567,7 @@ namespace {
          const char *name;
          if (gelf_getshdr(section, &symtab_header) != &symtab_header) {
             r_log = "Failed to read ELF section header.";
-            throw build_error();
+            throw compile_error();
          }
          name = elf_strptr(elf, section_str_index, symtab_header.sh_name);
          if (!strcmp(name, ".symtab")) {
@@ -577,9 +577,9 @@ namespace {
       }
       if (!symtab) {
          r_log = "Unable to find symbol table.";
-         throw build_error();
+         throw compile_error();
       }
-   }
[Mesa-dev] [Bug 91259] FS compile failed: Register spilling not supported with m14 used
https://bugs.freedesktop.org/show_bug.cgi?id=91259

            Bug ID: 91259
           Summary: FS compile failed: Register spilling not supported
                    with m14 used
           Product: Mesa
           Version: 10.6
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Other
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: maxweiss.1...@googlemail.com
        QA Contact: mesa-dev@lists.freedesktop.org

Failed to run a JavaFX application.

System:
- Arch Linux (last full package update 07 July 2015) (64 bit)
- Intel HD 3000
- Oracle JDK 8
- mesa 10.6.1-1

Symptoms:
A GUI with blurry fonts and inverted colors after a pile of exceptions.

Example Exception trace:
---
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Program link log: FS compile failed: Register spilling not supported with m14 used

java.lang.RuntimeException: Error creating shader program
    at com.sun.prism.es2.ES2Shader.createFromSource(ES2Shader.java:158)
    at com.sun.prism.es2.ES2Shader.createFromSource(ES2Shader.java:173)
    at com.sun.prism.es2.ES2ResourceFactory.createShader(ES2ResourceFactory.java:219)
    at com.sun.scenario.effect.impl.prism.ps.PPSRenderer.createShader(PPSRenderer.java:203)
    at com.sun.scenario.effect.impl.prism.ps.PPSLinearConvolvePeer.createShader(PPSLinearConvolvePeer.java:102)
    at com.sun.scenario.effect.impl.prism.ps.PPSOneSamplerPeer.filterImpl(PPSOneSamplerPeer.java:90)
    at com.sun.scenario.effect.impl.prism.ps.PPSEffectPeer.filter(PPSEffectPeer.java:54)
    at com.sun.scenario.effect.LinearConvolveCoreEffect.filterImageDatas(LinearConvolveCoreEffect.java:85)
    at com.sun.scenario.effect.LinearConvolveCoreEffect.filterImageDatas(LinearConvolveCoreEffect.java:41)
    at com.sun.scenario.effect.FilterEffect.filter(FilterEffect.java:195)
    at com.sun.scenario.effect.impl.prism.PrEffectHelper.render(PrEffectHelper.java:166)
    at com.sun.javafx.sg.prism.EffectFilter.render(EffectFilter.java:61)
    at com.sun.javafx.sg.prism.NGNode.renderEffect(NGNode.java:2379)
    at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2064)
    at com.sun.javafx.sg.prism.NGImageView.doRender(NGImageView.java:103)
    at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1959)
    at com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:235)
    at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2067)
    at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1959)
    at com.sun.javafx.tk.quantum.ViewPainter.doPaint(ViewPainter.java:474)
    at com.sun.javafx.tk.quantum.ViewPainter.paintImpl(ViewPainter.java:327)
    at com.sun.javafx.tk.quantum.UploadingPainter.run(UploadingPainter.java:133)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at com.sun.javafx.tk.RenderJob.run(RenderJob.java:58)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at com.sun.javafx.tk.quantum.QuantumRenderer$PipelineRunnable.run(QuantumRenderer.java:125)
    at java.lang.Thread.run(Thread.java:745)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
[Mesa-dev] [Bug 91259] FS compile failed: Register spilling not supported with m14 used
https://bugs.freedesktop.org/show_bug.cgi?id=91259

--- Comment #1 from maxweiss.1...@googlemail.com ---
Sorry, I failed to post the whole trace; here's the rest:

java.lang.IllegalStateException: Operation requires resource lock
    at com.sun.prism.impl.ManagedResource.assertLocked(ManagedResource.java:96)
    at com.sun.prism.impl.BaseTexture.assertLocked(BaseTexture.java:267)
    at com.sun.prism.impl.ps.BaseShaderContext.setTexture(BaseShaderContext.java:689)
    at com.sun.prism.impl.ps.BaseShaderContext.validateTextureOp(BaseShaderContext.java:585)
    at com.sun.prism.impl.ps.BaseShaderContext.validateTextureOp(BaseShaderContext.java:501)
    at com.sun.prism.impl.BaseGraphics.drawTextureRaw(BaseGraphics.java:703)
    at com.sun.scenario.effect.impl.prism.ps.PPSOneSamplerPeer.filterImpl(PPSOneSamplerPeer.java:117)
    at com.sun.scenario.effect.impl.prism.ps.PPSEffectPeer.filter(PPSEffectPeer.java:54)
    at com.sun.scenario.effect.LinearConvolveCoreEffect.filterImageDatas(LinearConvolveCoreEffect.java:85)
    at com.sun.scenario.effect.LinearConvolveCoreEffect.filterImageDatas(LinearConvolveCoreEffect.java:41)
    at com.sun.scenario.effect.FilterEffect.filter(FilterEffect.java:195)
    at com.sun.scenario.effect.impl.prism.PrEffectHelper.render(PrEffectHelper.java:166)
    at com.sun.javafx.sg.prism.EffectFilter.render(EffectFilter.java:61)
    at com.sun.javafx.sg.prism.NGNode.renderEffect(NGNode.java:2379)
    at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2064)
    at com.sun.javafx.sg.prism.NGImageView.doRender(NGImageView.java:103)
    at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1959)
    at com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:235)
    at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2067)
    at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1959)
    at com.sun.javafx.tk.quantum.ViewPainter.doPaint(ViewPainter.java:474)
    at com.sun.javafx.tk.quantum.ViewPainter.paintImpl(ViewPainter.java:327)
    at com.sun.javafx.tk.quantum.UploadingPainter.run(UploadingPainter.java:133)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at com.sun.javafx.tk.RenderJob.run(RenderJob.java:58)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at com.sun.javafx.tk.quantum.QuantumRenderer$PipelineRunnable.run(QuantumRenderer.java:125)
    at java.lang.Thread.run(Thread.java:745)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.