Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg
On Tue, 2016-09-13 at 22:12 -0700, Francisco Jerez wrote:
> Iago Toral writes:
>
> > On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote:
> > > Iago Toral Quiroga writes:
> > >
> > > > We will use this in cases where we want to force the vstride of a
> > > > src_reg to 0 to exploit a particular behavior of the hardware. It
> > > > will come in handy to implement access to components Z/W.
> > > > ---
> > > >  src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 +
> > > >  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 2 ++
> > > >  2 files changed, 3 insertions(+)
> > > >
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > index f66c093..f3cce4b 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > @@ -51,6 +51,7 @@ public:
> > > >     explicit src_reg(const dst_reg &reg);
> > > >
> > > >     src_reg *reladdr;
> > > > +   bool force_vstride0;
> > >
> > > I was wondering whether it would make more sense to unify this with
> > > the FS back-end's fs_reg::stride (a numeric stride field is also
> > > likely more convenient to do arithmetic on than a boolean) and
> > > promote it to backend_reg? It could be defined as the number of
> > > components to jump over for each logical channel of the register,
> > > which is just the vstride in single-precision SIMD4x2 and the
> > > hstride in scalar mode.
> >
> > We could do that, but I thought it would be a good idea to make it
> > clear that here we are using the vstride=0 with a very specific
> > intention and we don't expect the hardware to do what would normally
> > be expected (we are trying to exploit a hardware bug after all). If
> > we were to use a normal stride field for this I think we would make
> > this intention much less obvious and other people reading the code
> > would have a much harder time understanding what is really going on.
> > Since we are being tricky here I think the extra field to signal
> > that we are trying to do something "special" might be worth it:
> > people can track where we read and write that field and see exactly
> > where it is being used for the purpose of exploiting this particular
> > hardware behavior.
> >
> Yes, I agree that the hardware's behavior on Gen7 with non-identity
> vstride is tricky and special -- Special enough that *none* of the
> VEC4 optimization passes and IR-handling code need to be aware of it,
> because the field is only going to be used as an internal book-keeping
> data structure in convert_to_hw_regs() and immediately discarded. IOW
> you're storing an internal data structure of convert_to_hw_regs() as
> part of the shared IR data structure, with no well-defined semantics
> and which no back-end code (not even convert_to_hw_regs()) is going to
> be able to honor.
>
> So if your argument for making the representation of vstride
> unnecessarily non-orthogonal is that you want to discourage people
> from using it at the IR level (which is fair because it won't work at
> all!), I would argue that it doesn't belong in the IR data structures
> in the first place, because you could just keep convert_to_hw_regs'
> internal data structures internal to convert_to_hw_regs. (I don't
> actually think you need the data structure, neither internal nor
> external, but more on that later)

Yes, that makes sense.

> > > But thinking about it some more, I wonder if it's really necessary
> > > to expose vertical strides at the IR level? Aren't you planning to
> > > use this during the conversion to HW registers exclusively? Why
> > > don't you set the vstride field directly in that case?
> >
> > Yes, this is used exclusively at that time. The conversion to
> > hardware registers in convert_to_hw_regs() happens in two stages
> > now:
> >
> > We call our 'expand_64bit_swizzle_to_32bit()' helper first. This one
> > takes care of checking the regioning on DF instructions, translating
> > swizzles and setting force_vstride0 to true when needed (which is
> > also the only place that would set this to true). Then the rest of
> > the code in convert_to_hw_regs() just operates as usual, only that
> > it will check the force_vstride0 setting to decide the vstride to
> > use for DF regions.
> >
> > I did it like this because it allows us to keep the DF swizzle
> > translation and regioning checking logic separated from the
> > conversion to hardware registers, but this separation means that we
> > need to tell the latter when it has to set the vstride to 0, thus
> > the addition of the force_vstride0 field. I think having these two
> > things separated makes sense and makes the code easier to read. We
> > can keep both things separate and still avoid the force_vstride0
> > field by using a stride field as you suggest above, but as I
[Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL
Just looking at the channel enables is not sufficient, at least not on
Sky Lake. Channels that are disabled by the sample_mask may show up in
the channel enable register as being enabled even if they are not
executing. This can cause FIND_LIVE_CHANNEL to return a channel that
isn't actually executing.

In our handling of interpolateAtSample we do a clever trick with
emit_uniformize to call the interpolator once for each unique sample id.
Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an
infinite loop which hangs the GPU.

Signed-off-by: Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_eu.h               |  3 ++-
 src/mesa/drivers/dri/i965/brw_eu_emit.c          | 22 +++---
 src/mesa/drivers/dri/i965/brw_fs_builder.h       |  3 ++-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
 5 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
index 3e52764..9aaab78 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 void
 brw_find_live_channel(struct brw_codegen *p,
-                      struct brw_reg dst);
+                      struct brw_reg dst,
+                      struct brw_reg sample_mask);
 void
 brw_broadcast(struct brw_codegen *p,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 3b12030..f593a8d 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 }
 void
-brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
+brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
+                      struct brw_reg sample_mask)
 {
    const struct gen_device_info *devinfo = p->devinfo;
    const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
@@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
    if (devinfo->gen >= 8) {
       /* Getting the first active channel index is easy on Gen8: Just find
-       * the first bit set in the mask register. The same register exists
-       * on HSW already but it reads back as all ones when the current
-       * instruction has execution masking disabled, so it's kind of
-       * useless.
+       * the first bit set in the mask register AND the sample mask. The
+       * same register exists on HSW already but it reads back as all ones
+       * when the current instruction has execution masking disabled, so
+       * it's kind of useless.
        */
-      inst = brw_FBL(p, vec1(dst),
-                     retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
+      struct brw_reg mask_reg = retype(brw_mask_reg(0),
+                                       BRW_REGISTER_TYPE_UD);
+      if (sample_mask.file != BRW_IMMEDIATE_VALUE ||
+          sample_mask.ud != 0x) {
+         brw_AND(p, vec1(dst), mask_reg, sample_mask);
+         mask_reg = vec1(dst);
+      }
+
+      inst = brw_FBL(p, vec1(dst), mask_reg);
       /* Quarter control has the effect of magically shifting the value of
        * this register so you'll get the first active channel relative to
diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h
index 483672f..45b5f88 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
+++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
@@ -407,7 +407,8 @@ namespace brw {
          const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
          const dst_reg dst = vgrf(src.type);
-         ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
+         ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index,
+                   sample_mask_reg());
          ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, component(chan_index, 0));
          return src_reg(component(dst, 0));
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 2f4ba7b..d923b0b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -2041,7 +2041,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
          break;
       case SHADER_OPCODE_FIND_LIVE_CHANNEL:
-         brw_find_live_channel(p, dst);
+         brw_find_live_channel(p, dst, src[0]);
          break;
       case SHADER_OPCODE_BROADCAST:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 256abae..63fca6f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1863,7 +1863,7 @@ generate_code(struct brw_c
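For readers following the Gen8+ path in the patch above, here is a rough software model of the idea, not the actual EU code: the first live channel is taken as the lowest bit set in the channel enables ANDed with the sample mask. The names `find_first_bit_low` and `first_live_channel` are hypothetical, and returning -1 for an empty mask is an assumption made for this sketch only; it says nothing about what the hardware FBL instruction returns in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software stand-in for a "find first bit from the low end"
 * operation: index of the lowest set bit, or -1 if no bit is set
 * (the -1 convention is an assumption of this sketch). */
static int find_first_bit_low(uint32_t v)
{
   for (int i = 0; i < 32; i++) {
      if (v & (UINT32_C(1) << i))
         return i;
   }
   return -1;
}

/* A channel only counts as live if it is enabled in the channel enables
 * AND covered by the sample mask. */
static int first_live_channel(uint32_t channel_enables, uint32_t sample_mask)
{
   int chan = find_first_bit_low(channel_enables & sample_mask);
   assert(chan >= -1 && chan < 32);
   return chan;
}
```

With channel enables 0xF and sample mask 0xC, looking at the enables alone would pick channel 0, which the sample mask has disabled; the masked version picks channel 2.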
Re: [Mesa-dev] [PATCH v2 3/7] intel/isl: Add support for 1-D compressed textures
On Mon, Sep 12, 2016 at 05:58:20PM -0700, Jason Ekstrand wrote:
> Compressed 1-D textures are a well-defined thing in both GL and Vulkan.

Looks correct to me:

Reviewed-by: Topi Pohjolainen

> ---
>  src/intel/isl/isl.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index a75fddf..185984d 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -518,7 +518,6 @@ isl_calc_phys_level0_extent_sa(const struct isl_device *dev,
>        assert(info->height == 1);
>        assert(info->depth == 1);
>        assert(info->samples == 1);
> -      assert(!isl_format_is_compressed(info->format));
>
>        switch (dim_layout) {
>        case ISL_DIM_LAYOUT_GEN4_3D:
> @@ -527,8 +526,8 @@ isl_calc_phys_level0_extent_sa(const struct isl_device *dev,
>        case ISL_DIM_LAYOUT_GEN9_1D:
>        case ISL_DIM_LAYOUT_GEN4_2D:
>           *phys_level0_sa = (struct isl_extent4d) {
> -            .w = info->width,
> -            .h = 1,
> +            .w = isl_align_npot(info->width, fmtl->bw),
> +            .h = fmtl->bh,
>              .d = 1,
>              .a = info->array_len,
>           };
> --
> 2.5.0.400.gff86faf
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
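To make the effect of the hunk above concrete, here is a small standalone sketch of the width/height computation for a compressed 1-D surface. It assumes that the align helper rounds up to the next multiple of a (not necessarily power-of-two) block width; `align_npot`, `struct extent2d` and `phys_level0_extent_1d` are hypothetical names for illustration and are not the isl API.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of a; a need not be a power of two.
 * Assumed to mirror what the align helper in the patch does. */
static uint32_t align_npot(uint32_t n, uint32_t a)
{
   assert(a > 0);
   return ((n + a - 1) / a) * a;
}

/* Hypothetical, simplified extent: width and height in samples. */
struct extent2d {
   uint32_t w;
   uint32_t h;
};

/* For a 1-D surface whose format has compression blocks of bw x bh,
 * the physical extent covers whole blocks: the width is padded to a
 * multiple of bw and the height is one full block row. */
static struct extent2d phys_level0_extent_1d(uint32_t width,
                                             uint32_t bw, uint32_t bh)
{
   struct extent2d e = { align_npot(width, bw), bh };
   return e;
}
```

For an uncompressed format (bw = bh = 1) this reduces to the old behaviour of w = width, h = 1; for a 4x4 block format a 10-texel-wide 1-D texture becomes 12x4 samples.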
[Mesa-dev] [PATCH 2/3] st/vdpau: flush the context before calling flush_frontbuffer
so that the texture is rendered to back buffer before calling
flush_frontbuffer and can be copied to a different buffer in the
function

Signed-off-by: Nayan Deshmukh
---
 src/gallium/state_trackers/vdpau/presentation.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/vdpau/presentation.c b/src/gallium/state_trackers/vdpau/presentation.c
index 2862eaf..f35d73a 100644
--- a/src/gallium/state_trackers/vdpau/presentation.c
+++ b/src/gallium/state_trackers/vdpau/presentation.c
@@ -271,11 +271,14 @@ vlVdpPresentationQueueDisplay(VdpPresentationQueue presentation_queue,
    }
    vscreen->set_next_timestamp(vscreen, earliest_presentation_time);
-   pipe->screen->flush_frontbuffer(pipe->screen, tex, 0, 0,
-                                   vscreen->get_private(vscreen), NULL);
+   // flush before calling flush_frontbuffer so that rendering is flushed
+   // to back buffer so the texture can be copied in flush_frontbuffer
    pipe->screen->fence_reference(pipe->screen, &surf->fence, NULL);
    pipe->flush(pipe, &surf->fence, 0);
+   pipe->screen->flush_frontbuffer(pipe->screen, tex, 0, 0,
+                                   vscreen->get_private(vscreen), NULL);
+
    pq->last_surf = surf;
    if (dump_window == -1) {
--
2.7.4
[Mesa-dev] [PATCH 3/3] st/va: flush the context before calling flush_frontbuffer
so that the texture is rendered to back buffer before calling
flush_frontbuffer and can be copied to a different buffer in the
function

Signed-off-by: Nayan Deshmukh
---
 src/gallium/state_trackers/va/surface.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/surface.c b/src/gallium/state_trackers/va/surface.c
index 3ee1cdd..05c890d 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -321,10 +321,13 @@ vlVaPutSurface(VADriverContextP ctx, VASurfaceID surface_id, void* draw, short s
       return status;
    }
+   // flush before calling flush_frontbuffer so that rendering is flushed
+   // to back buffer so the texture can be copied in flush_frontbuffer
+   drv->pipe->flush(drv->pipe, NULL, 0);
+
    screen->flush_frontbuffer(screen, tex, 0, 0,
                              vscreen->get_private(vscreen), NULL);
-   drv->pipe->flush(drv->pipe, NULL, 0);
    pipe_resource_reference(&tex, NULL);
    pipe_surface_reference(&surf_draw, NULL);
--
2.7.4
[Mesa-dev] [PATCH 1/3] vl/dri3: handle the case of different GPU(v4)
In case of prime, when rendering is done on a GPU other than the server
GPU, use a separate linear buffer for each back buffer, which will be
displayed using the present extension.

v2: Use a separate linear buffer for each back buffer (Michel)
v3: Change variable names and fix coding style (Leo and Emil)
v4: Use PIPE_BIND_SAMPLER_VIEW for back buffer in case when a separate
    linear buffer is used (Michel)

Signed-off-by: Nayan Deshmukh
---
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 62 ---
 1 file changed, 49 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 3d596a6..f86300d 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -49,6 +49,7 @@
 struct vl_dri3_buffer
 {
    struct pipe_resource *texture;
+   struct pipe_resource *linear_texture;
    uint32_t pixmap;
    uint32_t sync_fence;
@@ -69,6 +70,8 @@ struct vl_dri3_screen
    xcb_present_event_t eid;
    xcb_special_event_t *special_event;
+   struct pipe_context *pipe;
+
    struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
    int cur_back;
@@ -82,6 +85,7 @@ struct vl_dri3_screen
    int64_t last_ust, ns_frame, last_msc, next_msc;
    bool flushed;
+   bool is_different_gpu;
 };
 static void
@@ -102,6 +106,8 @@ dri3_free_back_buffer(struct vl_dri3_screen *scrn,
    xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
    xshmfence_unmap_shm(buffer->shm_fence);
    pipe_resource_reference(&buffer->texture, NULL);
+   if (buffer->linear_texture)
+      pipe_resource_reference(&buffer->linear_texture, NULL);
    FREE(buffer);
 }
@@ -209,7 +215,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
    xcb_sync_fence_t sync_fence;
    struct xshmfence *shm_fence;
    int buffer_fd, fence_fd;
-   struct pipe_resource templ;
+   struct pipe_resource templ, *pixmap_buffer_texture;
    struct winsys_handle whandle;
    unsigned usage;
@@ -226,8 +232,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
       goto close_fd;
    memset(&templ, 0, sizeof(templ));
-   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
-                PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
    templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
    templ.target = PIPE_TEXTURE_2D;
    templ.last_level = 0;
@@ -235,16 +240,35 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
    templ.height0 = scrn->height;
    templ.depth0 = 1;
    templ.array_size = 1;
-   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
-                                                         &templ);
-   if (!buffer->texture)
-      goto unmap_shm;
+   if (scrn->is_different_gpu) {
+      buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+                                                            &templ);
+      if (!buffer->texture)
+         goto unmap_shm;
+
+      templ.bind |= PIPE_BIND_SCANOUT | PIPE_BIND_SHARED |
+                    PIPE_BIND_LINEAR;
+      buffer->linear_texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+                                                                   &templ);
+      pixmap_buffer_texture = buffer->linear_texture;
+
+      if (!buffer->linear_texture)
+         goto no_linear_texture;
+
+   } else {
+      templ.bind |= PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+      buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+                                                            &templ);
+      if (!buffer->texture)
+         goto unmap_shm;
+      pixmap_buffer_texture = buffer->texture;
+   }
    memset(&whandle, 0, sizeof(whandle));
    whandle.type= DRM_API_HANDLE_TYPE_FD;
    usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;
    scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL,
-                                           buffer->texture, &whandle,
+                                           pixmap_buffer_texture, &whandle,
                                            usage);
    buffer_fd = whandle.handle;
    buffer->pitch = whandle.stride;
@@ -271,6 +295,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
    return buffer;
+no_linear_texture:
+   pipe_resource_reference(&buffer->texture, NULL);
 unmap_shm:
    xshmfence_unmap_shm(shm_fence);
 close_fd:
@@ -474,6 +500,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
    struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private;
    uint32_t options = XCB_PRESENT_OPTION_NONE;
    struct vl_dri3_buffer *back;
+   struct pipe_box src_box;
    back = scrn->back_buffers[scrn->cur_back];
    if (!back)
@@ -485,6 +512,16 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
       return;
    }
+   if (scrn->is_different_gpu) {
+      u_box_origin_2d(scrn->width, scrn->height, &src_box);
+
Re: [Mesa-dev] [PATCH 62/95] i965/vec4: Add a shuffle_64bit_data helper
Iago Toral writes:

> On Mon, 2016-09-12 at 14:19 -0700, Francisco Jerez wrote:
>> Iago Toral Quiroga writes:
>>
>> > SIMD4x2 64bit data is stored in register space like this:
>> >
>> > r0.0:DF  x0 y0 z0 w0
>> > r0.1:DF  x1 y1 z1 w1
>> >
>> > When we need to write data such as this to memory using 32-bit
>> > write messages we need to shuffle it in this fashion:
>> >
>> > r0.0:DF  x0 y0 x1 y1
>> > r0.1:DF  z0 w0 z1 w1
>> >
>> > and emit two 32-bit write messages, one for r0.0 at base_offset
>> > and another one for r0.1 at base_offset+16.
>> >
>> > We also need to do the inverse operation when we read using 32-bit
>> > messages to produce valid SIMD4x2 64bit data from the data read. We
>> > can achieve this by applying the exact same shuffling to the data
>> > read, although we need to apply different channel enables since the
>> > layout of the data is reversed.
>> >
>> > This helper implements the data shuffling logic and we will use it
>> > in various places where we read and write 64bit data from/to memory.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_vec4.h       |  5 ++
>> >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 96 ++
>> >  2 files changed, 101 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
>> > index 26228d0..3337fc0 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
>> > @@ -327,6 +327,11 @@ public:
>> >
>> >     src_reg setup_imm_df(double v);
>> >
>> > +   vec4_instruction *shuffle_64bit_data(dst_reg dst, src_reg src,
>> > +                                        bool for_write,
>> > +                                        bblock_t *block = NULL,
>> > +                                        vec4_instruction *ref = NULL);
>> > +
>> >     virtual void emit_nir_code();
>> >     virtual void nir_setup_uniforms();
>> >     virtual void nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > index 450db92..346e822 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > @@ -2145,4 +2145,100 @@ vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr)
>> >        dst_reg(VGRF, alloc.allocate(instr->def.bit_size / 32));
>> >  }
>> >
>> > +/* SIMD4x2 64bit data is stored in register space like this:
>> > + *
>> > + * r0.0:DF  x0 y0 z0 w0
>> > + * r0.1:DF  x1 y1 z1 w1
>> > + *
>> > + * When we need to write data such as this to memory using 32-bit write
>> > + * messages we need to shuffle it in this fashion:
>> > + *
>> > + * r0.0:DF  x0 y0 x1 y1 (to be written at base offset)
>> > + * r0.1:DF  z0 w0 z1 w1 (to be written at base offset + 16)
>> > + *
>> > + * We need to do the inverse operation when we read using 32-bit messages,
>> > + * which we can do by applying the same exact shuffling on the 64-bit data
>> > + * read, only that because the data for each vertex is positioned differently
>> > + * we need to apply different channel enables.
>> > + *
>> > + * This function takes 64bit data and shuffles it as explained above.
>> > + *
>> > + * The @for_write parameter is used to specify if the shuffling is being done
>> > + * for proper SIMD4x2 64-bit data that needs to be shuffled prior to a 32-bit
>> > + * write message (for_write = true), or instead we are doing the inverse
>> > + * operation and we have just read 64-bit data using 32-bit messages that we
>> > + * need to shuffle to create valid SIMD4x2 64-bit data (for_write = false).
>> > + *
>> > + * If @block and @ref are non-NULL, then the shuffling is done after @ref,
>> > + * otherwise the instructions are emitted normally at the end. The function
>> > + * returns the last instruction inserted.
>> > + *
>> > + * Notice that @src and @dst cannot be the same register.
>> > + */
>> > +vec4_instruction *
>> > +vec4_visitor::shuffle_64bit_data(dst_reg dst, src_reg src, bool for_write,
>> > +                                 bblock_t *block, vec4_instruction *ref)
>> > +{
>> > +   assert(type_sz(src.type) == 8);
>> > +   assert(type_sz(dst.type) == 8);
>> > +   assert(!src.in_range(dst, 2));
>> > +   assert(dst.writemask == WRITEMASK_XYZW);
>> > +   assert(!ref == !block);
>> > +
>> > +   vec4_instruction *inst, *last;
>> > +   bool emit_before = ref != NULL;
>> > +
>> > +   #define EMIT(i) \
>> > +      if (!emit_before) { \
>> > +         emit(i); \
>> > +      } else { \
>> > +         ref->insert_after(block, i); \
>> > +         ref = i; \
>> > +      } \
>> > +      last = i;
>> > +
>> > +   /* Resolve swizzle in src */
>> > +   if (src.swizzle != BRW_SWIZZLE_XYZW) {
>> > +      dst_reg data = dst_reg(this, glsl_type::dvec4_type);
>> > +      inst = MOV(data, src);
>>
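The layout change described in the quoted comment can be modelled on plain arrays. The sketch below is only a data-layout illustration under that assumption: it treats the two SIMD4x2 registers as two rows of four doubles and ignores swizzles, writemasks and channel enables entirely, so it does not reflect how the helper emits instructions. `dvec4_row` and `shuffle_64bit_rows` are hypothetical names.

```c
#include <assert.h>

/* One register's worth of SIMD4x2 64-bit data, modelled as four doubles.
 * Hypothetical type for this sketch only. */
typedef struct {
   double c[4];
} dvec4_row;

/* Shuffle between the SIMD4x2 layout
 *    row 0: x0 y0 z0 w0
 *    row 1: x1 y1 z1 w1
 * and the layout used for 32-bit messages
 *    row 0: x0 y0 x1 y1
 *    row 1: z0 w0 z1 w1
 * Viewed as a 2x2 grid of two-component halves this is a transpose, so
 * in this array model the same function also performs the inverse. */
static void shuffle_64bit_rows(dvec4_row dst[2], const dvec4_row src[2])
{
   /* Like the helper in the patch, source and destination must differ. */
   assert(dst != src);

   dst[0].c[0] = src[0].c[0];
   dst[0].c[1] = src[0].c[1];
   dst[0].c[2] = src[1].c[0];
   dst[0].c[3] = src[1].c[1];

   dst[1].c[0] = src[0].c[2];
   dst[1].c[1] = src[0].c[3];
   dst[1].c[2] = src[1].c[2];
   dst[1].c[3] = src[1].c[3];
}
```

Applying the function twice returns the original rows, which matches the statement in the commit message that the read path can reuse the same shuffling; the differing channel enables mentioned there are a property of the hardware instructions and are not captured by this model.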
Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg
Iago Toral writes:

> On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote:
>> Iago Toral Quiroga writes:
>>
>> > We will use this in cases where we want to force the vstride of a
>> > src_reg to 0 to exploit a particular behavior of the hardware. It
>> > will come in handy to implement access to components Z/W.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 +
>> >  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 2 ++
>> >  2 files changed, 3 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > index f66c093..f3cce4b 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > @@ -51,6 +51,7 @@ public:
>> >     explicit src_reg(const dst_reg &reg);
>> >
>> >     src_reg *reladdr;
>> > +   bool force_vstride0;
>>
>> I was wondering whether it would make more sense to unify this with
>> the FS back-end's fs_reg::stride (a numeric stride field is also
>> likely more convenient to do arithmetic on than a boolean) and promote
>> it to backend_reg? It could be defined as the number of components to
>> jump over for each logical channel of the register, which is just the
>> vstride in single-precision SIMD4x2 and the hstride in scalar mode.
>
> We could do that, but I thought it would be a good idea to make it
> clear that here we are using the vstride=0 with a very specific
> intention and we don't expect the hardware to do what would normally
> be expected (we are trying to exploit a hardware bug after all). If we
> were to use a normal stride field for this I think we would make this
> intention much less obvious and other people reading the code would
> have a much harder time understanding what is really going on. Since
> we are being tricky here I think the extra field to signal that we are
> trying to do something "special" might be worth it: people can track
> where we read and write that field and see exactly where it is being
> used for the purpose of exploiting this particular hardware behavior.
>

Yes, I agree that the hardware's behavior on Gen7 with non-identity
vstride is tricky and special -- Special enough that *none* of the VEC4
optimization passes and IR-handling code need to be aware of it, because
the field is only going to be used as an internal book-keeping data
structure in convert_to_hw_regs() and immediately discarded. IOW you're
storing an internal data structure of convert_to_hw_regs() as part of
the shared IR data structure, with no well-defined semantics and which
no back-end code (not even convert_to_hw_regs()) is going to be able to
honor.

So if your argument for making the representation of vstride
unnecessarily non-orthogonal is that you want to discourage people from
using it at the IR level (which is fair because it won't work at all!),
I would argue that it doesn't belong in the IR data structures in the
first place, because you could just keep convert_to_hw_regs' internal
data structures internal to convert_to_hw_regs. (I don't actually think
you need the data structure, neither internal nor external, but more on
that later)

>> But thinking about it some more, I wonder if it's really necessary to
>> expose vertical strides at the IR level? Aren't you planning to use
>> this during the conversion to HW registers exclusively? Why don't you
>> set the vstride field directly in that case?
>
> Yes, this is used exclusively at that time. The conversion to hardware
> registers in convert_to_hw_regs() happens in two stages now:
>
> We call our 'expand_64bit_swizzle_to_32bit()' helper first. This one
> takes care of checking the regioning on DF instructions, translating
> swizzles and setting force_vstride0 to true when needed (which is also
> the only place that would set this to true). Then the rest of the code
> in convert_to_hw_regs() just operates as usual, only that it will
> check the force_vstride0 setting to decide the vstride to use for DF
> regions.
>
> I did it like this because it allows us to keep the DF swizzle
> translation and regioning checking logic separated from the conversion
> to hardware registers, but this separation means that we need to tell
> the latter when it has to set the vstride to 0, thus the addition of
> the force_vstride0 field. I think having these two things separated
> makes sense and makes the code easier to read. We can keep both things
> separate and still avoid the force_vstride0 field by using a stride
> field as you suggest above, but as I said, I think we might be doing a
> rather tricky thing a bit less obvious than it should to other people.
>

Keeping these two tasks logically separate from each other sounds fine
to me, but you don't need to extend the IR for them to exchange data.
AFAICT expand_64bit_swizzle_to_32bit() is doing two things:

 - Calculate the hardware swizzle, which potentially involves an
   adjustment of the subregister offset -- These two are uniquely
   determined
Re: [Mesa-dev] [PATCH 1/3] nir: Call nir_metadata_preserve from nir_lower_alu_to_scalar().
Kenneth Graunke writes:

> This is mandatory.

This series is:

Reviewed-by: Eric Anholt
Re: [Mesa-dev] [PATCH 2/2] nir/spirv/glsl450: Add support for the InterpolateAt opcodes
On 14 September 2016 at 14:26, Jason Ekstrand wrote:
> Signed-off-by: Jason Ekstrand

After writing my own version, I now understand the code enough to review
it. The only thing I did differently was to set a flag in the first
switch to avoid the second switch, but otherwise this looks how it
should.

For both:

Reviewed-by: Dave Airlie

> ---
>  src/compiler/spirv/vtn_glsl450.c | 54 +++-
>  1 file changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
> index e05d28f..cb0570d 100644
> --- a/src/compiler/spirv/vtn_glsl450.c
> +++ b/src/compiler/spirv/vtn_glsl450.c
> @@ -634,6 +634,57 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 entrypoint,
>     }
>  }
>
> +static void
> +handle_glsl450_interpolation(struct vtn_builder *b, enum GLSLstd450 opcode,
> +                             const uint32_t *w, unsigned count)
> +{
> +   const struct glsl_type *dest_type =
> +      vtn_value(b, w[1], vtn_value_type_type)->type->type;
> +
> +   struct vtn_value *val = vtn_push_value(b, w[2], vtn_value_type_ssa);
> +   val->ssa = vtn_create_ssa_value(b, dest_type);
> +
> +   nir_intrinsic_op op;
> +   switch (opcode) {
> +   case GLSLstd450InterpolateAtCentroid:
> +      op = nir_intrinsic_interp_var_at_centroid;
> +      break;
> +   case GLSLstd450InterpolateAtSample:
> +      op = nir_intrinsic_interp_var_at_sample;
> +      break;
> +   case GLSLstd450InterpolateAtOffset:
> +      op = nir_intrinsic_interp_var_at_offset;
> +      break;
> +   default:
> +      unreachable("Invalid opcode");
> +   }
> +
> +   nir_intrinsic_instr *intrin = nir_intrinsic_instr_create(b->nb.shader, op);
> +
> +   nir_deref_var *deref = vtn_nir_deref(b, w[5]);
> +   intrin->variables[0] =
> +      nir_deref_as_var(nir_copy_deref(intrin, &deref->deref));
> +
> +   switch (opcode) {
> +   case GLSLstd450InterpolateAtCentroid:
> +      break;
> +   case GLSLstd450InterpolateAtSample:
> +   case GLSLstd450InterpolateAtOffset:
> +      intrin->src[0] = nir_src_for_ssa(vtn_ssa_value(b, w[6])->def);
> +      break;
> +   default:
> +      unreachable("Invalid opcode");
> +   }
> +
> +   intrin->num_components = glsl_get_vector_elements(dest_type);
> +   nir_ssa_dest_init(&intrin->instr, &intrin->dest,
> +                     glsl_get_vector_elements(dest_type),
> +                     glsl_get_bit_size(dest_type), NULL);
> +   val->ssa->def = &intrin->dest.ssa;
> +
> +   nir_builder_instr_insert(&b->nb, &intrin->instr);
> +}
> +
>  bool
>  vtn_handle_glsl450_instruction(struct vtn_builder *b, uint32_t ext_opcode,
>                                 const uint32_t *w, unsigned count)
> @@ -656,7 +707,8 @@ vtn_handle_glsl450_instruction(struct vtn_builder *b, uint32_t ext_opcode,
>     case GLSLstd450InterpolateAtCentroid:
>     case GLSLstd450InterpolateAtSample:
>     case GLSLstd450InterpolateAtOffset:
> -      unreachable("Unhandled opcode");
> +      handle_glsl450_interpolation(b, ext_opcode, w, count);
> +      break;
>
>     default:
>        handle_glsl450_alu(b, (enum GLSLstd450)ext_opcode, w, count);
> --
> 2.5.0.400.gff86faf
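The single-switch variant mentioned in the review can be sketched roughly as follows. This is only an illustration of the control flow under simplified assumptions, not the code from either version of the patch: the enums, `struct interp_info` and `classify_interp` are hypothetical stand-ins for the GLSLstd450 opcodes and NIR intrinsics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three InterpolateAt opcodes. */
enum interp_opcode {
   INTERP_AT_CENTROID,
   INTERP_AT_SAMPLE,
   INTERP_AT_OFFSET
};

/* Hypothetical stand-ins for the matching intrinsics. */
enum interp_intrinsic {
   INTRINSIC_AT_CENTROID,
   INTRINSIC_AT_SAMPLE,
   INTRINSIC_AT_OFFSET
};

struct interp_info {
   enum interp_intrinsic op;
   bool has_extra_src; /* sample index or offset operand present */
};

/* One switch decides both the intrinsic and whether an extra source
 * operand has to be wired up later, so no second switch is needed. */
static struct interp_info classify_interp(enum interp_opcode opcode)
{
   struct interp_info info = { INTRINSIC_AT_CENTROID, false };

   switch (opcode) {
   case INTERP_AT_CENTROID:
      info.op = INTRINSIC_AT_CENTROID;
      break;
   case INTERP_AT_SAMPLE:
      info.op = INTRINSIC_AT_SAMPLE;
      info.has_extra_src = true;
      break;
   case INTERP_AT_OFFSET:
      info.op = INTRINSIC_AT_OFFSET;
      info.has_extra_src = true;
      break;
   default:
      assert(!"Invalid opcode");
   }

   return info;
}
```

A caller would then test `has_extra_src` where the patch has its second switch. Whether that reads better than two switches is a matter of taste; the behaviour is the same.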
[Mesa-dev] [PATCH 1/2] nir/spirv: Claim support for SampleRateShading
We already support all of the decorations that require this capability.

Signed-off-by: Jason Ekstrand
---
 src/compiler/spirv/spirv_to_nir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 7e7a026..0c6743b 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -2448,6 +2448,7 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
       case SpvCapabilityDerivativeControl:
       case SpvCapabilityInterpolationFunction:
       case SpvCapabilityMultiViewport:
+      case SpvCapabilitySampleRateShading:
          break;
       case SpvCapabilityClipDistance:
@@ -2467,7 +2468,6 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
       case SpvCapabilityImageGatherExtended:
       case SpvCapabilityStorageImageMultisample:
       case SpvCapabilityImageCubeArray:
-      case SpvCapabilitySampleRateShading:
       case SpvCapabilityInt8:
       case SpvCapabilityInputAttachment:
       case SpvCapabilitySparseResidency:
--
2.5.0.400.gff86faf
[Mesa-dev] [PATCH 2/2] nir/spirv/glsl450: Add support for the InterpolateAt opcodes
Signed-off-by: Jason Ekstrand
---
 src/compiler/spirv/vtn_glsl450.c | 54 +++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index e05d28f..cb0570d 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -634,6 +634,57 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 entrypoint,
    }
 }
 
+static void
+handle_glsl450_interpolation(struct vtn_builder *b, enum GLSLstd450 opcode,
+                             const uint32_t *w, unsigned count)
+{
+   const struct glsl_type *dest_type =
+      vtn_value(b, w[1], vtn_value_type_type)->type->type;
+
+   struct vtn_value *val = vtn_push_value(b, w[2], vtn_value_type_ssa);
+   val->ssa = vtn_create_ssa_value(b, dest_type);
+
+   nir_intrinsic_op op;
+   switch (opcode) {
+   case GLSLstd450InterpolateAtCentroid:
+      op = nir_intrinsic_interp_var_at_centroid;
+      break;
+   case GLSLstd450InterpolateAtSample:
+      op = nir_intrinsic_interp_var_at_sample;
+      break;
+   case GLSLstd450InterpolateAtOffset:
+      op = nir_intrinsic_interp_var_at_offset;
+      break;
+   default:
+      unreachable("Invalid opcode");
+   }
+
+   nir_intrinsic_instr *intrin = nir_intrinsic_instr_create(b->nb.shader, op);
+
+   nir_deref_var *deref = vtn_nir_deref(b, w[5]);
+   intrin->variables[0] =
+      nir_deref_as_var(nir_copy_deref(intrin, &deref->deref));
+
+   switch (opcode) {
+   case GLSLstd450InterpolateAtCentroid:
+      break;
+   case GLSLstd450InterpolateAtSample:
+   case GLSLstd450InterpolateAtOffset:
+      intrin->src[0] = nir_src_for_ssa(vtn_ssa_value(b, w[6])->def);
+      break;
+   default:
+      unreachable("Invalid opcode");
+   }
+
+   intrin->num_components = glsl_get_vector_elements(dest_type);
+   nir_ssa_dest_init(&intrin->instr, &intrin->dest,
+                     glsl_get_vector_elements(dest_type),
+                     glsl_get_bit_size(dest_type), NULL);
+   val->ssa->def = &intrin->dest.ssa;
+
+   nir_builder_instr_insert(&b->nb, &intrin->instr);
+}
+
 bool
 vtn_handle_glsl450_instruction(struct vtn_builder *b, uint32_t ext_opcode,
                                const uint32_t *w, unsigned count)
@@ -656,7 +707,8 @@ vtn_handle_glsl450_instruction(struct vtn_builder *b, uint32_t ext_opcode,
    case GLSLstd450InterpolateAtCentroid:
    case GLSLstd450InterpolateAtSample:
    case GLSLstd450InterpolateAtOffset:
-      unreachable("Unhandled opcode");
+      handle_glsl450_interpolation(b, ext_opcode, w, count);
+      break;
 
    default:
       handle_glsl450_alu(b, (enum GLSLstd450)ext_opcode, w, count);
-- 
2.5.0.400.gff86faf
Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2
On 14/09/16 02:53 AM, Marek Olšák wrote:
>
> cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu
> -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=O
> -DCMAKE_BUILD_TYPE=RelWithDebInfo
> -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
> -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
> -fno-omit-frame-pointer" \
> -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
> -fno-omit-frame-pointer".

FWIW, I recommend enabling assertions, i.e. setting
-DLLVM_ENABLE_ASSERTIONS=1 and removing -DNDEBUG.

> -DLLVM_BUILD_32_BITS=ON

Hah, didn't know about this, I manually added -m32 to C(XX)FLAGS. Thanks.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
[Mesa-dev] [PATCH] i965: enable ARB_ES3_2_compatibility on gen8+
Note that ASTC support is not actually mandated for this extension to be
exposed.

Signed-off-by: Ilia Mirkin
---

Also note that it doesn't seem required for the driver to simultaneously be
exposing an actual ES 3.2 context. The ext does, however, nominally require
GL 4.5. I think that can be ignored though.

 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index f1ef4f6..fe22d3f 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -400,6 +400,7 @@ intelInitExtensions(struct gl_context *ctx)
       ctx->Extensions.ARB_shader_precision = true;
       ctx->Extensions.ARB_gpu_shader_fp64 = true;
       ctx->Extensions.ARB_vertex_attrib_64bit = true;
+      ctx->Extensions.ARB_ES3_2_compatibility = true;
       ctx->Extensions.OES_geometry_shader = true;
       ctx->Extensions.OES_texture_cube_map_array = true;
    }
-- 
2.7.3
Re: [Mesa-dev] [PATCH] i965: Enable ANDROID_extension_pack_es31a on Gen9+.
Seems reasonable. Perhaps it'd be worth figuring out what the deal
with CHV's ASTC support is, since that's probably a more likely
Android target. In the meanwhile, this is

Reviewed-by: Ilia Mirkin

On Tue, Sep 13, 2016 at 9:06 PM, Kenneth Graunke wrote:
> AEP requires ASTC, which is only supported on Skylake and later.
>
> Signed-off-by: Kenneth Graunke
> ---
>  src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
> index 0f28546..6bb73b8 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -409,6 +409,7 @@ intelInitExtensions(struct gl_context *ctx)
>        ctx->Extensions.KHR_texture_compression_astc_ldr = true;
>        ctx->Extensions.KHR_texture_compression_astc_sliced_3d = true;
>        ctx->Extensions.ARB_shader_stencil_export = true;
> +      ctx->Extensions.ANDROID_extension_pack_es31a = true;
>        ctx->Extensions.MESA_shader_framebuffer_fetch = true;
>     }
>
> --
> 2.9.3
[Mesa-dev] [PATCH] i965: Enable ANDROID_extension_pack_es31a on Gen9+.
AEP requires ASTC, which is only supported on Skylake and later.

Signed-off-by: Kenneth Graunke
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index 0f28546..6bb73b8 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -409,6 +409,7 @@ intelInitExtensions(struct gl_context *ctx)
       ctx->Extensions.KHR_texture_compression_astc_ldr = true;
       ctx->Extensions.KHR_texture_compression_astc_sliced_3d = true;
       ctx->Extensions.ARB_shader_stencil_export = true;
+      ctx->Extensions.ANDROID_extension_pack_es31a = true;
       ctx->Extensions.MESA_shader_framebuffer_fetch = true;
    }
 
-- 
2.9.3
[Mesa-dev] [PATCH 1/2] st/mesa: enable GL_ANDROID_extension_pack_es31a when available
For now that's never since advanced blend hasn't been piped through.

Signed-off-by: Ilia Mirkin
---
 src/mesa/state_tracker/st_extensions.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 807fbfb..4d54928 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -1228,4 +1228,22 @@ void st_init_extensions(struct pipe_screen *screen,
    extensions->OES_primitive_bounding_box =
       extensions->ARB_ES3_1_compatibility;
    consts->NoPrimitiveBoundingBoxOutput = true;
+
+   extensions->ANDROID_extension_pack_es31a =
+      extensions->KHR_texture_compression_astc_ldr &&
+      extensions->KHR_blend_equation_advanced &&
+      extensions->OES_sample_variables &&
+      extensions->ARB_shader_image_load_store &&
+      extensions->ARB_texture_stencil8 &&
+      extensions->ARB_texture_multisample &&
+      extensions->OES_copy_image &&
+      extensions->ARB_draw_buffers_blend &&
+      extensions->OES_geometry_shader &&
+      extensions->ARB_gpu_shader5 &&
+      extensions->OES_primitive_bounding_box &&
+      extensions->ARB_tessellation_shader &&
+      extensions->ARB_texture_border_clamp &&
+      extensions->OES_texture_buffer &&
+      extensions->OES_texture_cube_map_array &&
+      extensions->EXT_texture_sRGB_decode;
 }
-- 
2.7.3
[Mesa-dev] [PATCH 2/2] st/mesa: enable ARB_ES3_2_compatibility when enough available
Signed-off-by: Ilia Mirkin
---
 src/mesa/state_tracker/st_extensions.c | 20
 1 file changed, 20 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 4d54928..55019d7 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -1246,4 +1246,24 @@ void st_init_extensions(struct pipe_screen *screen,
       extensions->OES_texture_buffer &&
       extensions->OES_texture_cube_map_array &&
       extensions->EXT_texture_sRGB_decode;
+
+   /* Same deal as for ARB_ES3_1_compatibility - this has to be computed
+    * before overall versions are selected. Also it's actually a subset of ES
+    * 3.2, since it doesn't require ASTC or advanced blending.
+    */
+   extensions->ARB_ES3_2_compatibility =
+      extensions->ARB_ES3_1_compatibility &&
+      extensions->KHR_robustness &&
+      extensions->ARB_copy_image &&
+      extensions->ARB_draw_buffers_blend &&
+      extensions->ARB_draw_elements_base_vertex &&
+      extensions->OES_geometry_shader &&
+      extensions->ARB_gpu_shader5 &&
+      extensions->ARB_sample_shading &&
+      extensions->ARB_tessellation_shader &&
+      extensions->ARB_texture_border_clamp &&
+      extensions->OES_texture_buffer &&
+      extensions->ARB_texture_cube_map_array &&
+      extensions->ARB_texture_stencil8 &&
+      extensions->ARB_texture_multisample;
 }
-- 
2.7.3
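Editor's illustration, not part of the series: the two st/mesa patches above derive one extension flag as the logical AND of its prerequisites. A minimal standalone sketch of that idea follows. The struct sketch_extensions and the function sketch_derive_extensions are hypothetical names, and the prerequisite lists are deliberately cut down to a handful of the flags named in the patches, so this is not the real gl_extensions layout or the full requirement set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the driver's extension table. */
struct sketch_extensions {
   /* Inputs: what the driver already reports. */
   bool KHR_texture_compression_astc_ldr;
   bool KHR_blend_equation_advanced;
   bool ARB_ES3_1_compatibility;
   bool OES_geometry_shader;
   bool ARB_tessellation_shader;

   /* Outputs: derived from the inputs above. */
   bool ANDROID_extension_pack_es31a;
   bool ARB_ES3_2_compatibility;
};

/* Each derived flag is true only if every prerequisite is true.
 * Only a subset of the prerequisites from the patches is modelled. */
static void
sketch_derive_extensions(struct sketch_extensions *ext)
{
   assert(ext != NULL);

   /* AEP needs ASTC and advanced blending on top of the shader stages. */
   ext->ANDROID_extension_pack_es31a =
      ext->KHR_texture_compression_astc_ldr &&
      ext->KHR_blend_equation_advanced &&
      ext->OES_geometry_shader &&
      ext->ARB_tessellation_shader;

   /* ARB_ES3_2_compatibility does not need ASTC or advanced blending. */
   ext->ARB_ES3_2_compatibility =
      ext->ARB_ES3_1_compatibility &&
      ext->OES_geometry_shader &&
      ext->ARB_tessellation_shader;
}
```

With advanced blending left false, as the 1/2 commit message says is currently the case, the AEP flag stays false while ARB_ES3_2_compatibility can still come out true.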
[Mesa-dev] [PATCH 3/3] nir: Report progress from nir_lower_phis_to_scalar.
Signed-off-by: Kenneth Graunke --- src/compiler/nir/nir.h | 2 +- src/compiler/nir/nir_lower_phis_to_scalar.c | 20 +++- src/gallium/drivers/freedreno/ir3/ir3_nir.c | 2 +- src/gallium/drivers/vc4/vc4_program.c | 3 +-- src/mesa/drivers/dri/i965/brw_nir.c | 2 +- 5 files changed, 19 insertions(+), 10 deletions(-) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index ea8837d..8c7837a 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2411,7 +2411,7 @@ bool nir_lower_vec_to_movs(nir_shader *shader); bool nir_lower_alu_to_scalar(nir_shader *shader); void nir_lower_load_const_to_scalar(nir_shader *shader); -void nir_lower_phis_to_scalar(nir_shader *shader); +bool nir_lower_phis_to_scalar(nir_shader *shader); void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask); void nir_lower_samplers(nir_shader *shader, diff --git a/src/compiler/nir/nir_lower_phis_to_scalar.c b/src/compiler/nir/nir_lower_phis_to_scalar.c index 9fd00cc..b12718f 100644 --- a/src/compiler/nir/nir_lower_phis_to_scalar.c +++ b/src/compiler/nir/nir_lower_phis_to_scalar.c @@ -166,6 +166,8 @@ static bool lower_phis_to_scalar_block(nir_block *block, struct lower_phis_to_scalar_state *state) { + bool progress = false; + /* Find the last phi node in the block */ nir_phi_instr *last_phi = NULL; nir_foreach_instr(instr, block) { @@ -248,6 +250,8 @@ lower_phis_to_scalar_block(nir_block *block, ralloc_steal(state->dead_ctx, phi); nir_instr_remove(&phi->instr); + progress = true; + /* We're using the safe iterator and inserting all the newly * scalarized phi nodes before their non-scalarized version so that's * ok. 
However, we are also inserting vec operations after all of @@ -258,13 +262,14 @@ lower_phis_to_scalar_block(nir_block *block, break; } - return true; + return progress; } -static void +static bool lower_phis_to_scalar_impl(nir_function_impl *impl) { struct lower_phis_to_scalar_state state; + bool progress = false; state.mem_ctx = ralloc_parent(impl); state.dead_ctx = ralloc_context(NULL); @@ -272,13 +277,14 @@ lower_phis_to_scalar_impl(nir_function_impl *impl) _mesa_key_pointer_equal); nir_foreach_block(block, impl) { - lower_phis_to_scalar_block(block, &state); + progress = lower_phis_to_scalar_block(block, &state) || progress; } nir_metadata_preserve(impl, nir_metadata_block_index | nir_metadata_dominance); ralloc_free(state.dead_ctx); + return progress; } /** A pass that lowers vector phi nodes to scalar @@ -288,11 +294,15 @@ lower_phis_to_scalar_impl(nir_function_impl *impl) * instance, if one of the sources is a non-scalarizable vector, then we * don't bother lowering because that would generate hard-to-coalesce movs. 
*/ -void +bool nir_lower_phis_to_scalar(nir_shader *shader) { + bool progress = false; + nir_foreach_function(function, shader) { if (function->impl) - lower_phis_to_scalar_impl(function->impl); + progress = lower_phis_to_scalar_impl(function->impl) || progress; } + + return progress; } diff --git a/src/gallium/drivers/freedreno/ir3/ir3_nir.c b/src/gallium/drivers/freedreno/ir3/ir3_nir.c index 2526222..2d86a52 100644 --- a/src/gallium/drivers/freedreno/ir3/ir3_nir.c +++ b/src/gallium/drivers/freedreno/ir3/ir3_nir.c @@ -91,7 +91,7 @@ ir3_optimize_loop(nir_shader *s) OPT_V(s, nir_lower_vars_to_ssa); progress |= OPT(s, nir_lower_alu_to_scalar); - OPT_V(s, nir_lower_phis_to_scalar); + progress |= OPT(s, nir_lower_phis_to_scalar); progress |= OPT(s, nir_copy_prop); progress |= OPT(s, nir_opt_dce); diff --git a/src/gallium/drivers/vc4/vc4_program.c b/src/gallium/drivers/vc4/vc4_program.c index ca0bd44..64c075a 100644 --- a/src/gallium/drivers/vc4/vc4_program.c +++ b/src/gallium/drivers/vc4/vc4_program.c @@ -1424,8 +1424,7 @@ vc4_optimize_nir(struct nir_shader *s) NIR_PASS_V(s, nir_lower_vars_to_ssa); NIR_PASS(progress, s, nir_lower_alu_to_scalar); -NIR_PASS_V(s, nir_lower_phis_to_scalar); - +NIR_PASS(progress, s, nir_lower_phis_to_scalar); NIR_PASS(progress, s, nir_copy_prop); NIR_PASS(progress, s, nir_opt_remove_phis); NIR_PASS(progress, s, nir_opt_dce); diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index 27be201..5b2130f 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.c +++ b/src/mesa/drivers/dri/i965/brw_nir.c @@ -381,7 +381,7 @@ nir_optimize(nir_shader *nir, bool is_scalar) OPT(nir_copy_prop); if (is_scalar) { - OPT_V(nir_lower_phis_to_scalar); +
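Editor's illustration, not part of the patch: the diff above accumulates progress as "progress = pass(...) || progress", with the call on the left. A small standalone sketch of why the operand order matters is below. The names sketch_lower_item, sketch_lower_all and sketch_optimize are hypothetical stand-ins for a per-block lowering step, a per-function pass and a driver optimization loop; no NIR API is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lowering step: halves a non-zero even value and reports
 * whether it changed anything. */
static bool
sketch_lower_item(int *item)
{
   if (*item != 0 && *item % 2 == 0) {
      *item /= 2;
      return true;
   }
   return false;
}

/* Runs the step on every item and reports whether any item changed. */
static bool
sketch_lower_all(int *items, size_t count)
{
   assert(items != NULL || count == 0);
   bool progress = false;

   for (size_t i = 0; i < count; i++) {
      /* The call comes first. Writing `progress || sketch_lower_item(...)`
       * would short-circuit and skip the remaining items as soon as one
       * item had reported progress. */
      progress = sketch_lower_item(&items[i]) || progress;
   }

   return progress;
}

/* Repeats the pass until it stops making progress and returns the
 * number of passes that changed something. */
static unsigned
sketch_optimize(int *items, size_t count)
{
   unsigned productive_passes = 0;
   bool progress;

   do {
      progress = sketch_lower_all(items, count);
      if (progress)
         productive_passes++;
   } while (progress);

   return productive_passes;
}
```

Returning the flag is what lets a caller loop like sketch_optimize decide whether another round is worthwhile, which is presumably why the callers in the diff switch from the void-style macros to the progress-tracking ones.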
[Mesa-dev] [PATCH 2/3] nir: Report progress from nir_lower_alu_to_scalar.
Signed-off-by: Kenneth Graunke --- src/compiler/nir/nir.h | 2 +- src/compiler/nir/nir_lower_alu_to_scalar.c | 42 ++--- src/gallium/drivers/freedreno/ir3/ir3_nir.c | 2 +- src/gallium/drivers/vc4/vc4_program.c | 2 +- src/mesa/drivers/dri/i965/brw_nir.c | 2 +- 5 files changed, 30 insertions(+), 20 deletions(-) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index ff7c422..ea8837d 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2408,7 +2408,7 @@ bool nir_remove_dead_variables(nir_shader *shader, nir_variable_mode modes); void nir_move_vec_src_uses_to_dest(nir_shader *shader); bool nir_lower_vec_to_movs(nir_shader *shader); -void nir_lower_alu_to_scalar(nir_shader *shader); +bool nir_lower_alu_to_scalar(nir_shader *shader); void nir_lower_load_const_to_scalar(nir_shader *shader); void nir_lower_phis_to_scalar(nir_shader *shader); diff --git a/src/compiler/nir/nir_lower_alu_to_scalar.c b/src/compiler/nir/nir_lower_alu_to_scalar.c index a84fbdf..fa18deb 100644 --- a/src/compiler/nir/nir_lower_alu_to_scalar.c +++ b/src/compiler/nir/nir_lower_alu_to_scalar.c @@ -73,7 +73,7 @@ lower_reduction(nir_alu_instr *instr, nir_op chan_op, nir_op merge_op, nir_instr_remove(&instr->instr); } -static void +static bool lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) { unsigned num_src = nir_op_infos[instr->op].num_inputs; @@ -90,7 +90,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) case name##3: \ case name##4: \ lower_reduction(instr, chan, merge, b); \ - return; + return true; switch (instr->op) { case nir_op_vec4: @@ -99,11 +99,11 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) /* We don't need to scalarize these ops, they're the ones generated to * group up outputs into a value that can be SSAed. 
*/ - return; + return false; case nir_op_pack_half_2x16: if (!b->shader->options->lower_pack_half_2x16) - return; + return false; nir_ssa_def *val = nir_pack_half_2x16_split(b, nir_channel(b, instr->src[0].src.ssa, @@ -113,7 +113,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(val)); nir_instr_remove(&instr->instr); - return; + return true; case nir_op_unpack_unorm_4x8: case nir_op_unpack_snorm_4x8: @@ -122,11 +122,11 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) /* There is no scalar version of these ops, unless we were to break it * down to bitshifts and math (which is definitely not intended). */ - return; + return false; case nir_op_unpack_half_2x16: { if (!b->shader->options->lower_unpack_half_2x16) - return; + return false; nir_ssa_def *comps[2]; comps[0] = nir_unpack_half_2x16_split_x(b, instr->src[0].src.ssa); @@ -135,7 +135,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(vec)); nir_instr_remove(&instr->instr); - return; + return true; } case nir_op_pack_uvec2_to_uint: { @@ -185,11 +185,11 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(val)); nir_instr_remove(&instr->instr); - return; + return true; } case nir_op_unpack_double_2x32: - return; + return false; LOWER_REDUCTION(nir_op_fdot, nir_op_fmul, nir_op_fadd); LOWER_REDUCTION(nir_op_ball_fequal, nir_op_feq, nir_op_iand); @@ -204,7 +204,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) } if (instr->dest.dest.ssa.num_components == 1) - return; + return false; unsigned num_components = instr->dest.dest.ssa.num_components; nir_ssa_def *comps[] = { NULL, NULL, NULL, NULL }; @@ -240,30 +240,40 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(vec)); 
nir_instr_remove(&instr->instr); + return true; } -static void +static bool nir_lower_alu_to_scalar_impl(nir_function_impl *impl) { nir_builder builder; nir_builder_init(&builder, impl); + bool progress = false; nir_foreach_block(block, impl) { nir_foreach_instr_safe(instr, block) { - if (instr->type == nir_instr_type_alu) -lower_alu_instr_scalar(nir_instr_as_alu(instr), &builder); + if (instr->type == nir_instr_type_alu) { +progress = lower_alu_instr_scalar(nir_instr_as_alu(instr), + &builder) || progress; + } } } nir_metadata_preserve(impl, nir_metadata_block_index | nir_met
[Mesa-dev] [PATCH 1/3] nir: Call nir_metadata_preserve from nir_lower_alu_to_scalar().
This is mandatory.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke
---
 src/compiler/nir/nir_lower_alu_to_scalar.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/compiler/nir/nir_lower_alu_to_scalar.c b/src/compiler/nir/nir_lower_alu_to_scalar.c
index 4f72cf7..a84fbdf 100644
--- a/src/compiler/nir/nir_lower_alu_to_scalar.c
+++ b/src/compiler/nir/nir_lower_alu_to_scalar.c
@@ -254,6 +254,9 @@ nir_lower_alu_to_scalar_impl(nir_function_impl *impl)
             lower_alu_instr_scalar(nir_instr_as_alu(instr), &builder);
       }
    }
+
+   nir_metadata_preserve(impl, nir_metadata_block_index |
+                               nir_metadata_dominance);
 }
 
 void
-- 
2.9.3
Re: [Mesa-dev] [PATCH 0/6] RadeonSI: Let's just stop spilling SGPRs
Patches 2, 3 and 5 are easy enough and so,

Reviewed-by: Edward O'Callaghan

Those numbers do look good but I think I'll leave the rest up to other
folks to review.

Kind Regards,
Edward.

On 09/14/2016 03:13 AM, Marek Olšák wrote:
> This is quite easy because we just have to get rid of all of
> the preloading at the beginning of shaders.
>
> I also removed preloading of PS inputs with literal indexing, which
> has almost the same effect as sinking interp instructions.
>
> I'm slightly concerned that LICM won't move interps because they are
> not considered speculatively-executable (=movable) by LLVM, but
> the shader-db stats show that it doesn't matter.
>
> LLVM is smart enough to do CSE where needed for both descriptor loads
> and interps. In fact, it's the CSE which is responsible for some of
> the remaining SGPR spills. (It makes sense if you think about it)
>
> The compile time increased by 6% because CSE has a lot more work,
> but it's certainly worth it.
>
>
> shader-db stats:
>
> [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor
> https://people.freedesktop.org/~mareko/no_preload1.html
> [PATCH 5/6] radeonsi: get rid of constant buffer preloading
> https://people.freedesktop.org/~mareko/no_preload2.html
> [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each
> https://people.freedesktop.org/~mareko/no_preload3_ps.html
>
> Total diff:
> https://people.freedesktop.org/~mareko/no_preload_total.html
>
> Please review.
>
> Marek
Re: [Mesa-dev] [PATCH v3 1/3] mesa: add a GLES3.2 enums section, and expose new MS line width params
On Tuesday, September 13, 2016 7:10:57 PM PDT Ilia Mirkin wrote:
> This also exposes them for ARB_ES3_2_compatibility.
>
> While both specs refer to the new MS line width parameters being
> separate from the existing AA line widths, reality begs to differ. It's
> the same on all hardware currently supported by mesa. Should hardware
> come along that wants these to be different, they're easy enough to
> separate out.
>
> Signed-off-by: Ilia Mirkin
> Reviewed-by: Ian Romanick (v1)

Series is:
Reviewed-by: Kenneth Graunke
Re: [Mesa-dev] [PATCH v2 0/3] *** Aubinator code simplification ***
On Tuesday, September 13, 2016 4:19:28 PM PDT Sirisha Gandikota wrote:
> From: Sirisha Gandikota
>
> This patch set simplifies parts of code in the aubinator tool
> as per review comments from Ken (Wed Aug 24 04:51:47 UTC 2016)
>
> v2 of the earlier patches simplifying code further as per Ken's comments
>
> Sirisha Gandikota (3):
>   aubinator: Simplify print_dword_val() method
>   aubinator: Make gen_disasm_disassemble handle split sends
>   aubinator: Remove bogus "end" parameter in gen_disasm_disassemble()
>
>  src/intel/tools/aubinator.c  | 24 ++--
>  src/intel/tools/disasm.c     | 27 +--
>  src/intel/tools/gen_disasm.h | 2 +-
>  3 files changed, 28 insertions(+), 25 deletions(-)

Pushed, thanks!

To ssh://git.freedesktop.org/git/mesa/mesa
   1eebb60..aa7b410  master -> master
[Mesa-dev] [PATCH v2 0/3] *** Aubinator code simplification ***
From: Sirisha Gandikota

This patch set simplifies parts of code in the aubinator tool
as per review comments from Ken (Wed Aug 24 04:51:47 UTC 2016)

v2 of the earlier patches simplifying code further as per Ken's comments

Sirisha Gandikota (3):
  aubinator: Simplify print_dword_val() method
  aubinator: Make gen_disasm_disassemble handle split sends
  aubinator: Remove bogus "end" parameter in gen_disasm_disassemble()

 src/intel/tools/aubinator.c  | 24 ++--
 src/intel/tools/disasm.c     | 27 +--
 src/intel/tools/gen_disasm.h | 2 +-
 3 files changed, 28 insertions(+), 25 deletions(-)

-- 
2.7.4
[Mesa-dev] [PATCH v2 3/3] aubinator: Remove bogus "end" parameter in gen_disasm_disassemble()
From: Sirisha Gandikota

Previously, the loop pretended to iterate over instructions from "start"
to "end", but the callers always passed 8192 for end, which is a huge
bogus value. The real loop termination condition is send-with-EOT or an
opcode of 0. (Ken)

v2: no change

Signed-off-by: Sirisha Gandikota
---
 src/intel/tools/aubinator.c  | 12 ++--
 src/intel/tools/disasm.c     | 8 +---
 src/intel/tools/gen_disasm.h | 2 +-
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
index 89d29f2..fad8aaa 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -303,7 +303,7 @@ handle_media_interface_descriptor_load(struct gen_spec *spec, uint32_t *p)
       }
 
       insns = (struct brw_instruction *) (gtt + start);
-      gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+      gen_disasm_disassemble(disasm, insns, 0, stdout);
 
       dump_samplers(spec, descriptors[3] & ~0x1f);
       dump_binding_table(spec, descriptors[4] & ~0x1f);
@@ -401,7 +401,7 @@ handle_3dstate_vs(struct gen_spec *spec, uint32_t *p)
              instruction_base, start);
 
       insns = (struct brw_instruction *) (gtt + instruction_base + start);
-      gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+      gen_disasm_disassemble(disasm, insns, 0, stdout);
    }
 }
 
@@ -425,7 +425,7 @@ handle_3dstate_hs(struct gen_spec *spec, uint32_t *p)
              instruction_base, start);
 
       insns = (struct brw_instruction *) (gtt + instruction_base + start);
-      gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+      gen_disasm_disassemble(disasm, insns, 0, stdout);
    }
 }
 
@@ -519,21 +519,21 @@ handle_3dstate_ps(struct gen_spec *spec, uint32_t *p)
    printf(" Kernel[0] %s\n", k0);
    if (k0 != unused) {
       insns = (struct brw_instruction *) (gtt + start);
-      gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+      gen_disasm_disassemble(disasm, insns, 0, stdout);
    }
 
    start = instruction_base + (p[k1_offset] & mask);
    printf(" Kernel[1] %s\n", k1);
    if (k1 != unused) {
       insns = (struct brw_instruction *) (gtt + start);
-      gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+      gen_disasm_disassemble(disasm, insns, 0, stdout);
    }
 
    start = instruction_base + (p[k2_offset] & mask);
    printf(" Kernel[2] %s\n", k2);
    if (k2 != unused) {
       insns = (struct brw_instruction *) (gtt + start);
-      gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+      gen_disasm_disassemble(disasm, insns, 0, stdout);
    }
 }
 
diff --git a/src/intel/tools/disasm.c b/src/intel/tools/disasm.c
index 89c711b..2b51424 100644
--- a/src/intel/tools/disasm.c
+++ b/src/intel/tools/disasm.c
@@ -45,13 +45,15 @@ is_send(uint32_t opcode)
 }
 
 void
-gen_disasm_disassemble(struct gen_disasm *disasm, void *assembly, int start,
-                       int end, FILE *out)
+gen_disasm_disassemble(struct gen_disasm *disasm, void *assembly,
+                       int start, FILE *out)
 {
    struct gen_device_info *devinfo = &disasm->devinfo;
    bool dump_hex = false;
+   int offset = start;
 
-   for (int offset = start; offset < end;) {
+   /* This loop exits when send-with-EOT or when opcode is 0 */
+   while (true) {
       brw_inst *insn = assembly + offset;
       brw_inst uncompacted;
       bool compacted = brw_inst_cmpt_control(devinfo, insn);
diff --git a/src/intel/tools/gen_disasm.h b/src/intel/tools/gen_disasm.h
index af6654f..24b56c9 100644
--- a/src/intel/tools/gen_disasm.h
+++ b/src/intel/tools/gen_disasm.h
@@ -28,7 +28,7 @@ struct gen_disasm;
 
 struct gen_disasm *gen_disasm_create(int pciid);
 void gen_disasm_disassemble(struct gen_disasm *disasm,
-                            void *assembly, int start, int end, FILE *out);
+                            void *assembly, int start, FILE *out);
 void gen_disasm_destroy(struct gen_disasm *disasm);
 
-- 
2.7.4
[Mesa-dev] [PATCH v2 2/3] aubinator: Make gen_disasm_disassemble handle split sends
From: Sirisha Gandikota

Skylake adds new SENDS and SENDSC opcodes, which should be handled in
the send-with-EOT check. Make an is_send() helper that checks if
the opcode is SEND/SENDC/SENDS/SENDSC (Ken)

v2: Make is_send() much crisper; mix declarations and code
    to make the code more compact (Ken)

Signed-off-by: Sirisha Gandikota
---
 src/intel/tools/disasm.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/intel/tools/disasm.c b/src/intel/tools/disasm.c
index 7e5a7cb..89c711b 100644
--- a/src/intel/tools/disasm.c
+++ b/src/intel/tools/disasm.c
@@ -35,6 +35,15 @@ struct gen_disasm {
    struct gen_device_info devinfo;
 };
 
+static bool
+is_send(uint32_t opcode)
+{
+   return (opcode == BRW_OPCODE_SEND ||
+           opcode == BRW_OPCODE_SENDC ||
+           opcode == BRW_OPCODE_SENDS ||
+           opcode == BRW_OPCODE_SENDSC );
+}
+
 void
 gen_disasm_disassemble(struct gen_disasm *disasm, void *assembly, int start,
                        int end, FILE *out)
@@ -74,14 +83,10 @@ gen_disasm_disassemble(struct gen_disasm *disasm, void *assembly, int start,
       brw_disassemble_inst(out, devinfo, insn, compacted);
 
       /* Simplistic, but efficient way to terminate disasm */
-      if (brw_inst_opcode(devinfo, insn) == BRW_OPCODE_SEND ||
-          brw_inst_opcode(devinfo, insn) == BRW_OPCODE_SENDC) {
-         if (brw_inst_eot(devinfo, insn))
-            break;
-      }
-
-      if (brw_inst_opcode(devinfo, insn) == 0)
+      uint32_t opcode = brw_inst_opcode(devinfo, insn);
+      if (opcode == 0 || (is_send(opcode) && brw_inst_eot(devinfo, insn))) {
          break;
+      }
    }
 }
 
-- 
2.7.4
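Editor's illustration, not part of the patch: the termination rule described above (stop at an opcode of 0, or at any send-family instruction with EOT set) can be exercised in isolation. The sketch below is standalone and makes several assumptions: the enum values are placeholders rather than the hardware encodings behind the BRW_OPCODE_* names, struct sketch_inst is a made-up fixed-size record instead of a real brw_inst, and sketch_count_insts is a hypothetical helper that only counts instructions instead of disassembling them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder opcode numbers for illustration only; they are NOT the
 * hardware encodings used by the real BRW_OPCODE_* definitions. */
enum sketch_opcode {
   SKETCH_OPCODE_ILLEGAL = 0,   /* an all-zero opcode also ends the walk */
   SKETCH_OPCODE_MOV,
   SKETCH_OPCODE_SEND,
   SKETCH_OPCODE_SENDC,
   SKETCH_OPCODE_SENDS,
   SKETCH_OPCODE_SENDSC,
};

/* Hypothetical decoded instruction: just the two fields the rule needs. */
struct sketch_inst {
   uint32_t opcode;
   bool eot;
};

/* True for all four send-family opcodes, including the split sends. */
static bool
sketch_is_send(uint32_t opcode)
{
   return opcode == SKETCH_OPCODE_SEND ||
          opcode == SKETCH_OPCODE_SENDC ||
          opcode == SKETCH_OPCODE_SENDS ||
          opcode == SKETCH_OPCODE_SENDSC;
}

/* Walks the program until the terminating instruction and returns how
 * many instructions were visited, the terminator included. There is no
 * "end" bound, so the caller must guarantee a terminator exists. */
static size_t
sketch_count_insts(const struct sketch_inst *insts)
{
   assert(insts != NULL);
   size_t n = 0;

   while (true) {
      const struct sketch_inst *insn = &insts[n++];

      if (insn->opcode == 0 ||
          (sketch_is_send(insn->opcode) && insn->eot))
         break;
   }

   return n;
}
```

A send without EOT, or an EOT bit on a non-send instruction, does not stop the walk; only the combination (or a zero opcode) does. Dropping the upper bound is only safe if the input really contains such a terminator.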
[Mesa-dev] [PATCH v2 1/3] aubinator: Simplify print_dword_val() method
From: Sirisha Gandikota

Remove the float/dword union and use iter->p[f->start / 32] directly,
since the printf format specifier %08x expects a uint32_t (Ken)

v2: Make the cleanup much crisper (Ken)

Signed-off-by: Sirisha Gandikota
---
 src/intel/tools/aubinator.c | 12
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
index 9d29b68..89d29f2 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -91,18 +91,14 @@ print_dword_val(struct gen_field_iterator *iter, uint64_t offset,
                 int *dword_num)
 {
    struct gen_field *f;
-   union {
-      uint32_t dw;
-      float f;
-   } v;
 
    f = iter->group->fields[iter->i - 1];
-   v.dw = iter->p[f->start / 32];
+   const int dword = f->start / 32;
 
-   if (*dword_num != (f->start / 32)) {
+   if (*dword_num != dword) {
       printf("0x%08lx: 0x%08x : Dword %d\n",
-             offset + 4 * (f->start / 32), v.dw, f->start / 32);
-      *dword_num = (f->start / 32);
+             offset + 4 * dword, iter->p[dword], dword);
+      *dword_num = dword;
    }
 }
 
-- 
2.7.4
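Editor's illustration, not part of the patch: the arithmetic in print_dword_val() maps a field's starting bit to a 32-bit dword index (start / 32) and then to a byte address (offset + 4 * dword). The standalone sketch below reproduces just that mapping. The function names are hypothetical, the field iterator is replaced by a plain start_bit argument, and the PRIx64/PRIx32 macros are used here as a portable alternative to the %08lx in the patch, which is an editorial choice rather than what the patch does.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Index of the 32-bit dword that contains bit `start_bit`. */
static int
sketch_field_dword(int start_bit)
{
   assert(start_bit >= 0);
   return start_bit / 32;
}

/* Byte address of that dword, given the byte offset of dword 0. */
static uint64_t
sketch_dword_address(uint64_t offset, int start_bit)
{
   return offset + 4 * (uint64_t)sketch_field_dword(start_bit);
}

/* Formats one line in the same layout as the patch's printf() and
 * returns snprintf()'s result. `p` must hold at least
 * sketch_field_dword(start_bit) + 1 dwords. */
static int
sketch_format_dword(char *buf, size_t size, uint64_t offset,
                    const uint32_t *p, int start_bit)
{
   const int dword = sketch_field_dword(start_bit);

   return snprintf(buf, size,
                   "0x%08" PRIx64 ": 0x%08" PRIx32 " : Dword %d\n",
                   sketch_dword_address(offset, start_bit),
                   p[dword], dword);
}
```

For example, a field starting at bit 70 lives in dword 2, and with a base offset of 0x1000 its dword sits at address 0x1008.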
[Mesa-dev] [PATCH v3 3/3] glsl: add core plumbing for GL_ANDROID_extension_pack_es31a
Signed-off-by: Ilia Mirkin --- src/compiler/glsl/glsl_parser_extras.cpp | 58 +++- src/compiler/glsl/glsl_parser_extras.h | 2 ++ src/mesa/main/extensions_table.h | 2 ++ src/mesa/main/mtypes.h | 1 + 4 files changed, 47 insertions(+), 16 deletions(-) diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp index 436ddd0..0e9bfa7 100644 --- a/src/compiler/glsl/glsl_parser_extras.cpp +++ b/src/compiler/glsl/glsl_parser_extras.cpp @@ -523,6 +523,11 @@ struct _mesa_glsl_extension { const char *name; /** +* Whether this extension is a part of AEP +*/ + bool aep; + + /** * Predicate that checks whether the relevant extension is available for * this context. */ @@ -565,9 +570,14 @@ has_##name_str(const struct gl_context *ctx, gl_api api, uint8_t version) \ #undef EXT #define EXT(NAME) \ - { "GL_" #NAME, has_##NAME, \ - &_mesa_glsl_parse_state::NAME##_enable,\ - &_mesa_glsl_parse_state::NAME##_warn } + { "GL_" #NAME, false, has_##NAME,\ + &_mesa_glsl_parse_state::NAME##_enable,\ + &_mesa_glsl_parse_state::NAME##_warn } + +#define EXT_AEP(NAME) \ + { "GL_" #NAME, true, has_##NAME, \ + &_mesa_glsl_parse_state::NAME##_enable,\ + &_mesa_glsl_parse_state::NAME##_warn } /** * Table of extensions that can be enabled/disabled within a shader, @@ -623,7 +633,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = { /* KHR extensions go here, sorted alphabetically. */ - EXT(KHR_blend_equation_advanced), + EXT_AEP(KHR_blend_equation_advanced), /* OES extensions go here, sorted alphabetically. 
*/ @@ -632,17 +642,17 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = { EXT(OES_geometry_shader), EXT(OES_gpu_shader5), EXT(OES_primitive_bounding_box), - EXT(OES_sample_variables), - EXT(OES_shader_image_atomic), + EXT_AEP(OES_sample_variables), + EXT_AEP(OES_shader_image_atomic), EXT(OES_shader_io_blocks), - EXT(OES_shader_multisample_interpolation), + EXT_AEP(OES_shader_multisample_interpolation), EXT(OES_standard_derivatives), EXT(OES_tessellation_point_size), EXT(OES_tessellation_shader), EXT(OES_texture_3D), EXT(OES_texture_buffer), EXT(OES_texture_cube_map_array), - EXT(OES_texture_storage_multisample_2d_array), + EXT_AEP(OES_texture_storage_multisample_2d_array), /* All other extensions go here, sorted alphabetically. */ @@ -651,23 +661,24 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = { EXT(AMD_shader_trinary_minmax), EXT(AMD_vertex_shader_layer), EXT(AMD_vertex_shader_viewport_index), + EXT(ANDROID_extension_pack_es31a), EXT(EXT_blend_func_extended), EXT(EXT_draw_buffers), EXT(EXT_clip_cull_distance), EXT(EXT_geometry_point_size), - EXT(EXT_geometry_shader), - EXT(EXT_gpu_shader5), - EXT(EXT_primitive_bounding_box), + EXT_AEP(EXT_geometry_shader), + EXT_AEP(EXT_gpu_shader5), + EXT_AEP(EXT_primitive_bounding_box), EXT(EXT_separate_shader_objects), EXT(EXT_shader_framebuffer_fetch), EXT(EXT_shader_integer_mix), - EXT(EXT_shader_io_blocks), + EXT_AEP(EXT_shader_io_blocks), EXT(EXT_shader_samples_identical), EXT(EXT_tessellation_point_size), - EXT(EXT_tessellation_shader), + EXT_AEP(EXT_tessellation_shader), EXT(EXT_texture_array), - EXT(EXT_texture_buffer), - EXT(EXT_texture_cube_map_array), + EXT_AEP(EXT_texture_buffer), + EXT_AEP(EXT_texture_cube_map_array), EXT(MESA_shader_integer_functions), }; @@ -713,7 +724,6 @@ static const _mesa_glsl_extension *find_extension(const char *name) return NULL; } - bool _mesa_glsl_process_extension(const char *name, YYLTYPE *name_locp, const char *behavior_string, YYLTYPE 
*behavior_locp, @@ -768,6 +778,22 @@ _mesa_glsl_process_extension(const char *name, YYLTYPE *name_locp, const _mesa_glsl_extension *extension = find_extension(name); if (extension && extension->compatible_with_state(state, api, gl_version)) { extension->set_flags(state, behavior); + if (extension->available_pred == has_ANDROID_extension_pack_es31a) { +for (unsigned i = 0; + i < ARRAY_SIZE(_mesa_glsl_supported_extensions); ++i) { + const _mesa_glsl_extension *extension = + &_mesa_glsl_supported_extensions[i]; + + if (!extension->aep) + continue; + /* AEP should not be enabled if all of the sub-extensions can't +* also be enabled. This is not the proper layer to do such +* error-checking though
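As a rough illustration of the table-driven approach in this patch, the sketch below marks pack members with an `aep` flag and enables all of them when the pack itself is requested. Everything here is a simplified assumption: the `ext_entry` struct, the `enabled` field and `enable_extension()` are hypothetical, the lookup is by name where the patch compares the `available_pred` pointer, and no availability checking is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down extension record: a name plus the AEP flag. */
struct ext_entry {
   const char *name;
   bool aep;     /* part of the Android extension pack? */
   bool enabled; /* stand-in for the per-extension parse-state flag */
};

static struct ext_entry exts[] = {
   { "GL_KHR_blend_equation_advanced",  true,  false },
   { "GL_OES_geometry_shader",          false, false },
   { "GL_EXT_geometry_shader",          true,  false },
   { "GL_ANDROID_extension_pack_es31a", false, false },
};

#define NUM_EXTS (sizeof(exts) / sizeof(exts[0]))

/* Enable one extension by name; enabling the pack also enables every member. */
static bool
enable_extension(const char *name)
{
   assert(name != NULL);

   for (size_t i = 0; i < NUM_EXTS; i++) {
      if (strcmp(exts[i].name, name) != 0)
         continue;

      exts[i].enabled = true;
      if (strcmp(name, "GL_ANDROID_extension_pack_es31a") == 0) {
         for (size_t j = 0; j < NUM_EXTS; j++) {
            if (exts[j].aep)
               exts[j].enabled = true;
         }
      }
      return true;
   }
   return false; /* unknown extension */
}
```

Keeping the membership in the table means adding an extension to the pack is a one-word change at its table entry, which appears to be the motivation for the separate `EXT_AEP` macro.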
[Mesa-dev] [PATCH v3 2/3] mesa: introduce glPrimitiveBoundingBoxARB entrypoint
This requires a bit of rejiggering, since normally ES entrypoints alias core ones, not vice-versa. Signed-off-by: Ilia Mirkin Reviewed-by: Ian Romanick --- src/mapi/glapi/gen/es_EXT.xml | 19 - src/mapi/glapi/gen/gl_API.xml | 37 + src/mesa/main/tests/dispatch_sanity.cpp | 3 +++ 3 files changed, 40 insertions(+), 19 deletions(-) diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml index b9fbec4..332dc5e 100644 --- a/src/mapi/glapi/gen/es_EXT.xml +++ b/src/mapi/glapi/gen/es_EXT.xml @@ -1342,23 +1342,4 @@ - - - - - - - - - - - - - - - - - - - diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index c39aa22..17c59db 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -8318,6 +8318,43 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 42fe61a..c87b1dc 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -1866,6 +1866,9 @@ const struct function gl_core_functions_possible[] = { { "glMultiDrawArraysIndirectCountARB", 31, -1 }, { "glMultiDrawElementsIndirectCountARB", 31, -1 }, + /* GL_ARB_ES3_2_compatibility */ + { "glPrimitiveBoundingBoxARB", 45, -1 }, + { NULL, 0, -1 } }; -- 2.7.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 1/3] mesa: add a GLES3.2 enums section, and expose new MS line width params
This also exposes them for ARB_ES3_2_compatibility. While both specs refer to the new MS line width parameters being separate from the existing AA line widths, reality begs to differ. It's the same on all hardware currently supported by mesa. Should hardware come along that wants these to be different, they're easy enough to separate out. Signed-off-by: Ilia Mirkin Reviewed-by: Ian Romanick (v1) --- v3: drop separate constants for the MS params, reuse the AA ones src/mesa/main/context.h | 10 ++ src/mesa/main/get.c | 26 -- src/mesa/main/get_hash_generator.py | 15 +++ src/mesa/main/get_hash_params.py| 5 + 4 files changed, 46 insertions(+), 10 deletions(-) diff --git a/src/mesa/main/context.h b/src/mesa/main/context.h index 4cd149d..520b3bb 100644 --- a/src/mesa/main/context.h +++ b/src/mesa/main/context.h @@ -318,6 +318,16 @@ _mesa_is_gles31(const struct gl_context *ctx) /** + * Checks if the context is for GLES 3.2 or later + */ +static inline bool +_mesa_is_gles32(const struct gl_context *ctx) +{ + return ctx->API == API_OPENGLES2 && ctx->Version >= 32; +} + + +/** * Checks if the context supports geometry shaders. 
*/ static inline bool diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 810ccb9..3cabb2b 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -142,6 +142,7 @@ enum value_extra { EXTRA_API_ES2, EXTRA_API_ES3, EXTRA_API_ES31, + EXTRA_API_ES32, EXTRA_NEW_BUFFERS, EXTRA_NEW_FRAG_CLAMP, EXTRA_VALID_DRAW_BUFFER, @@ -416,6 +417,12 @@ static const int extra_ARB_gpu_shader5_or_OES_sample_variables[] = { EXTRA_END }; +static const int extra_ES32[] = { + EXT(ARB_ES3_2_compatibility), + EXTRA_API_ES32, + EXTRA_END +}; + EXTRA_EXT(ARB_texture_cube_map); EXTRA_EXT(EXT_texture_array); EXTRA_EXT(NV_fog_distance); @@ -1164,6 +1171,11 @@ check_extra(struct gl_context *ctx, const char *func, const struct value_desc *d if (_mesa_is_gles31(ctx)) api_found = GL_TRUE; break; + case EXTRA_API_ES32: + api_check = GL_TRUE; + if (_mesa_is_gles32(ctx)) +api_found = GL_TRUE; +break; case EXTRA_API_GL: api_check = GL_TRUE; if (_mesa_is_desktop_gl(ctx)) @@ -1312,12 +1324,14 @@ find_value(const char *func, GLenum pname, void **p, union value *v) * value since it's compatible with GLES2 its entry in table_set[] is at the * end. 
*/ - STATIC_ASSERT(ARRAY_SIZE(table_set) == API_OPENGL_LAST + 3); - if (_mesa_is_gles3(ctx)) { - api = API_OPENGL_LAST + 1; - } - if (_mesa_is_gles31(ctx)) { - api = API_OPENGL_LAST + 2; + STATIC_ASSERT(ARRAY_SIZE(table_set) == API_OPENGL_LAST + 4); + if (ctx->API == API_OPENGLES2) { + if (ctx->Version >= 32) + api = API_OPENGL_LAST + 3; + else if (ctx->Version >= 31) + api = API_OPENGL_LAST + 2; + else if (ctx->Version >= 30) + api = API_OPENGL_LAST + 1; } mask = ARRAY_SIZE(table(api)) - 1; hash = (pname * prime_factor); diff --git a/src/mesa/main/get_hash_generator.py b/src/mesa/main/get_hash_generator.py index c777b78..d7460c8 100644 --- a/src/mesa/main/get_hash_generator.py +++ b/src/mesa/main/get_hash_generator.py @@ -44,7 +44,7 @@ prime_factor = 89 prime_step = 281 hash_table_size = 1024 -gl_apis=set(["GL", "GL_CORE", "GLES", "GLES2", "GLES3", "GLES31"]) +gl_apis=set(["GL", "GL_CORE", "GLES", "GLES2", "GLES3", "GLES31", "GLES32"]) def print_header(): print "typedef const unsigned short table_t[%d];\n" % (hash_table_size) @@ -69,6 +69,7 @@ api_enum = [ 'GL_CORE', 'GLES3', # Not in gl_api enum in mtypes.h 'GLES31', # Not in gl_api enum in mtypes.h + 'GLES32', # Not in gl_api enum in mtypes.h ] def api_index(api): @@ -168,13 +169,18 @@ def generate_hash_tables(enum_list, enabled_apis, param_descriptors): for api in valid_apis: add_to_hash_table(tables[api], hash_val, len(params)) -# Also add GLES2 items to the GLES3 and GLES31 hash table +# Also add GLES2 items to the GLES3+ hash tables if api == "GLES2": add_to_hash_table(tables["GLES3"], hash_val, len(params)) add_to_hash_table(tables["GLES31"], hash_val, len(params)) -# Also add GLES3 items to the GLES31 hash table + add_to_hash_table(tables["GLES32"], hash_val, len(params)) +# Also add GLES3 items to the GLES31+ hash tables if api == "GLES3": add_to_hash_table(tables["GLES31"], hash_val, len(params)) + add_to_hash_table(tables["GLES32"], hash_val, len(params)) +# Also add GLES31 items to the GLES32+ hash 
tables +if api == "GLES31": + add_to_hash_table(tables["GLES32"], hash_val, len(params)) params.append(["GL_" + enum_name, param[1]]) sort
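The table selection in find_value() can be read in isolation as the small function below. This is only a sketch: the `sketch_api` enum and `table_index()` are hypothetical stand-ins that mirror the order of `api_enum` in get_hash_generator.py, where the GLES3, GLES31 and GLES32 tables sit after the last real API value.

```c
#include <assert.h>

/* Hypothetical stand-in for the API enum; the ES3+ tables have no enum
 * value of their own and are addressed as LAST + 1, + 2, + 3. */
enum sketch_api {
   SKETCH_API_GL_COMPAT,
   SKETCH_API_GLES,
   SKETCH_API_GLES2,
   SKETCH_API_GL_CORE,
   SKETCH_API_LAST = SKETCH_API_GL_CORE
};

/* Pick the hash table index for a context with the given API and version
 * (version is encoded as major * 10 + minor, e.g. 31 for ES 3.1). */
static int
table_index(enum sketch_api api, int version)
{
   assert(version >= 0);
   int idx = (int)api;

   if (api == SKETCH_API_GLES2) {
      if (version >= 32)
         idx = SKETCH_API_LAST + 3;
      else if (version >= 31)
         idx = SKETCH_API_LAST + 2;
      else if (version >= 30)
         idx = SKETCH_API_LAST + 1;
   }
   return idx;
}
```

Checking the highest version first is what lets a single if/else chain replace the earlier sequence of independent `_mesa_is_gles3()` / `_mesa_is_gles31()` tests.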
Re: [Mesa-dev] [PATCH v3] HUD: Add support for block I/O, network I/O and lmsensor stats
On 09/13/2016 02:21 PM, Steven Toth wrote:
> V3: Flatten the entire patchset ready for the ML
>
> V2: Additional separate patches based on feedback
> a) configure.ac: Add a comment related to libsensors
> b) HUD: Disable Block/NIC I/O stats by default. Implement configuration option --enable-gallium-extra-hud=yes and enable both statistics when this option is enabled.
> c) Configure.ac: Minor cleanup to user visible configuration settings
> d) Configure.ac: HUD stats - build system improvements. Move the -lsensors out of a deeper Makefile, bring it into the configure.ac. Also, rename a compiler directive to more closely follow the standard.
>
> V1: Initial release to the ML
> Three new features:
> 1. Disk/block I/O device read/write stats in MB/s.
> 2. Network interface RX/TX transfer statistics as a percentage of the overall NIC speed.
> 3. lmsensor power, voltage and temperature sensors.
>
> The lmsensor changes add a dependency on libsensors, so that support is disabled by default and must be enabled explicitly.
>
> Signed-off-by: Steven Toth

Builds and runs as expected with MSVC. I'll leave the detailed review to others.

Tested-by: Brian Paul
Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v2)
On 09/13/2016 02:42 PM, Adam Jackson wrote:
> On Tue, 2016-09-13 at 14:14 -0600, Kyle Brenneman wrote:
> > On 09/13/2016 11:57 AM, Adam Jackson wrote:
> > > @@ -37,7 +39,7 @@
> > >
> > > /* This should be kept in sync with _eglInitThreadInfo() */
> > > #define _EGL_THREAD_INFO_INITIALIZER \
> > > - { EGL_SUCCESS, NULL, 0, NULL, NULL, NULL }
> > > + { EGL_SUCCESS, NULL, EGL_NONE, NULL, NULL, NULL }
> >
> > The API here should be EGL_OPENGL_ES_API, not EGL_NONE. Otherwise, the
> > current API would effectively change when the _EGLThreadInfo struct is
> > allocated. Or I guess more generally, _EGL_THREAD_INFO_INITIALIZER
> > should produce the same data as _eglInitThreadInfo.
>
> Mmm, okay. That's a very close reading of the spec. QueryAPI allows the result to be EGL_NONE, which does make sense for the dummy thread since you sure won't be doing much with it. But BindAPI says the default is EGL_OPENGL_ES_API, so presumably that should apply even to the dummy context.
>
> One does wonder then how you could ever get EGL_NONE out of QueryAPI.
>
> - ajax

eglQueryAPI allows the result to be EGL_NONE only if the implementation doesn't support GLES. From the spec (EGL 1.5, section 3.7):

"The initial value of the current rendering API is EGL_OPENGL_ES_API, unless OpenGL ES is not supported by an implementation, in which case the initial value is EGL_NONE."
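The rule quoted from the spec can be written down as a tiny sketch. The names below (`thread_info_sketch`, `initial_current_api`, `thread_info_init`) are hypothetical and not Mesa's; the token values are the ones published in the Khronos EGL headers, redefined locally with a SKETCH_ prefix so the example builds without <EGL/egl.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the EGL token values, so no EGL headers are needed. */
#define SKETCH_EGL_SUCCESS       0x3000
#define SKETCH_EGL_NONE          0x3038
#define SKETCH_EGL_OPENGL_ES_API 0x30A0

/* Hypothetical, cut-down per-thread state. */
struct thread_info_sketch {
   int last_error;
   unsigned current_api;
};

/* EGL 1.5, section 3.7: the initial API is EGL_OPENGL_ES_API unless
 * OpenGL ES is not supported, in which case it is EGL_NONE. */
static unsigned
initial_current_api(bool gles_supported)
{
   return gles_supported ? SKETCH_EGL_OPENGL_ES_API : SKETCH_EGL_NONE;
}

/* A static initializer and this function should produce the same data. */
static void
thread_info_init(struct thread_info_sketch *t, bool gles_supported)
{
   assert(t != NULL);
   t->last_error = SKETCH_EGL_SUCCESS;
   t->current_api = initial_current_api(gles_supported);
}
```

Routing both the dummy thread and freshly allocated threads through one helper like this is one way to keep the initializer macro and the init function from drifting apart.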
Re: [Mesa-dev] [PATCH] tgsi: Enable returns from within loops
Yes please, thanks!

On Tue, Sep 13, 2016 at 4:22 PM, Brian Paul wrote:
> On 09/13/2016 01:08 PM, Lars Hamre wrote:
>>
>> Fixes the following piglit test (for softpipe):
>> /spec/glsl-1.10/execution/fs-loop-return
>>
>> Signed-off-by: Lars Hamre
>>
>> ---
>> src/gallium/auxiliary/tgsi/tgsi_exec.c | 4
>> 1 file changed, 4 insertions(+)
>>
>> NOTE: Someone with access will need to commit this
>> after the review process
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
>> index 1457c06..aff35e6 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
>> @@ -5148,6 +5148,10 @@ exec_instruction(
>>           /* returning from main() */
>>           mach->CondStackTop = 0;
>>           mach->LoopStackTop = 0;
>> +         mach->ContStackTop = 0;
>> +         mach->LoopLabelStackTop = 0;
>> +         mach->SwitchStackTop = 0;
>> +         mach->BreakStackTop = 0;
>>           *pc = -1;
>>           return FALSE;
>>        }
>> --
>> 2.7.4
>>
>
> Reviewed-by: Brian Paul
>
> Do you need me to push this for you?
Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v2)
On Tue, 2016-09-13 at 14:14 -0600, Kyle Brenneman wrote:
> On 09/13/2016 11:57 AM, Adam Jackson wrote:
> > @@ -37,7 +39,7 @@
> >
> > /* This should be kept in sync with _eglInitThreadInfo() */
> > #define _EGL_THREAD_INFO_INITIALIZER \
> > - { EGL_SUCCESS, NULL, 0, NULL, NULL, NULL }
> > + { EGL_SUCCESS, NULL, EGL_NONE, NULL, NULL, NULL }
>
> The API here should be EGL_OPENGL_ES_API, not EGL_NONE. Otherwise, the
> current API would effectively change when the _EGLThreadInfo struct is
> allocated. Or I guess more generally, _EGL_THREAD_INFO_INITIALIZER
> should produce the same data as _eglInitThreadInfo.

Mmm, okay. That's a very close reading of the spec. QueryAPI allows the result to be EGL_NONE, which does make sense for the dummy thread since you sure won't be doing much with it. But BindAPI says the default is EGL_OPENGL_ES_API, so presumably that should apply even to the dummy context.

One does wonder then how you could ever get EGL_NONE out of QueryAPI.

- ajax
Re: [Mesa-dev] [PATCH v3] HUD: Add support for block I/O, network I/O and lmsensor stats
> V3: Flatten the entire patchset ready for the ML

Compile tested on Windows via AppVeyor.

Patches tested in various ./configure disable/enable modes on Ubuntu 16.04, 4.5.7 kernel on 32bit.

Many thanks to everyone who provided feedback.

-- 
Steven Toth - Kernel Labs
http://www.kernellabs.com
+1.646.355.8490
[Mesa-dev] [PATCH] radeonsi: get rid of img/buf/sampler descriptor preloading (v2)
From: Marek Olšák 26011 shaders in 14651 tests Totals: SGPRS: 1251920 -> 1152636 (-7.93 %) VGPRS: 728421 -> 728198 (-0.03 %) Spilled SGPRs: 16644 -> 3776 (-77.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 36001064 -> 35835152 (-0.46 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 21 -> 222372 (0.07 %) Wait states: 0 -> 0 (0.00 %) v2: merge codepaths where possible --- src/gallium/drivers/radeonsi/si_shader.c | 173 --- 1 file changed, 41 insertions(+), 132 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 84cbfd7..6f9c45f 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -100,25 +100,20 @@ struct si_shader_context LLVMTargetMachineRef tm; unsigned invariant_load_md_kind; unsigned range_md_kind; unsigned uniform_md_kind; LLVMValueRef empty_md; /* Preloaded descriptors. */ LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS]; - LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS]; - LLVMValueRef sampler_views[SI_NUM_SAMPLERS]; - LLVMValueRef sampler_states[SI_NUM_SAMPLERS]; - LLVMValueRef fmasks[SI_NUM_SAMPLERS]; - LLVMValueRef images[SI_NUM_IMAGES]; LLVMValueRef esgs_ring; LLVMValueRef gsvs_ring[4]; LLVMValueRef lds; LLVMValueRef gs_next_vertex[4]; LLVMValueRef return_value; LLVMTypeRef voidt; LLVMTypeRef i1; LLVMTypeRef i8; @@ -3399,32 +3394,32 @@ static void membar_emit( { struct si_shader_context *ctx = si_shader_context(bld_base); emit_waitcnt(ctx); } static LLVMValueRef shader_buffer_fetch_rsrc(struct si_shader_context *ctx, const struct tgsi_full_src_register *reg) { - LLVMValueRef ind_index; - LLVMValueRef rsrc_ptr; + LLVMValueRef index; + LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, +SI_PARAM_SHADER_BUFFERS); if (!reg->Register.Indirect) - return ctx->shader_buffers[reg->Register.Index]; - - ind_index = get_bounded_indirect_index(ctx, ®->Indirect, - 
reg->Register.Index, - SI_NUM_SHADER_BUFFERS); + index = LLVMConstInt(ctx->i32, reg->Register.Index, 0); + else + index = get_bounded_indirect_index(ctx, ®->Indirect, + reg->Register.Index, + SI_NUM_SHADER_BUFFERS); - rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_SHADER_BUFFERS); - return build_indexed_load_const(ctx, rsrc_ptr, ind_index); + return build_indexed_load_const(ctx, rsrc_ptr, index); } static bool tgsi_is_array_sampler(unsigned target) { return target == TGSI_TEXTURE_1D_ARRAY || target == TGSI_TEXTURE_SHADOW1D_ARRAY || target == TGSI_TEXTURE_2D_ARRAY || target == TGSI_TEXTURE_SHADOW2D_ARRAY || target == TGSI_TEXTURE_CUBE_ARRAY || target == TGSI_TEXTURE_SHADOWCUBE_ARRAY || @@ -3473,51 +3468,47 @@ static LLVMValueRef force_dcc_off(struct si_shader_context *ctx, * Load the resource descriptor for \p image. */ static void image_fetch_rsrc( struct lp_build_tgsi_context *bld_base, const struct tgsi_full_src_register *image, bool dcc_off, LLVMValueRef *rsrc) { struct si_shader_context *ctx = si_shader_context(bld_base); + LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, +SI_PARAM_IMAGES); + LLVMValueRef index, tmp; assert(image->Register.File == TGSI_FILE_IMAGE); if (!image->Register.Indirect) { - /* Fast path: use preloaded resources */ - *rsrc = ctx->images[image->Register.Index]; + index = LLVMConstInt(ctx->i32, image->Register.Index, 0); } else { - /* Indexing and manual load */ - LLVMValueRef ind_index; - LLVMValueRef rsrc_ptr; - LLVMValueRef tmp; - /* From the GL_ARB_shader_image_load_store extension spec: * *If a shader performs an image load, store, or atomic *operation using an image variable declared as an array, *and if the index used to select an individual element is *negative or greater than or equal to the size of the *array, the results of the operation are undefined but may *not lead to termination. */ - ind_index = get_bounded_indirect_ind
[Mesa-dev] [PATCH] radeonsi: load streamout buffer descriptors before use (v2)
From: Marek Olšák v2: inline the code and remove the conditional that's a no-op now --- src/gallium/drivers/radeonsi/si_shader.c | 47 ++-- 1 file changed, 14 insertions(+), 33 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index be6fae7..d61f4ff 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -105,21 +105,20 @@ struct si_shader_context unsigned uniform_md_kind; LLVMValueRef empty_md; LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS]; LLVMValueRef lds; LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS]; LLVMValueRef sampler_views[SI_NUM_SAMPLERS]; LLVMValueRef sampler_states[SI_NUM_SAMPLERS]; LLVMValueRef fmasks[SI_NUM_SAMPLERS]; LLVMValueRef images[SI_NUM_IMAGES]; - LLVMValueRef so_buffers[4]; LLVMValueRef esgs_ring; LLVMValueRef gsvs_ring[4]; LLVMValueRef gs_next_vertex[4]; LLVMValueRef return_value; LLVMTypeRef voidt; LLVMTypeRef i1; LLVMTypeRef i8; LLVMTypeRef i32; LLVMTypeRef i64; @@ -2264,20 +2263,33 @@ static void si_dump_streamout(struct pipe_stream_output_info *so) * to buffers. */ static void si_llvm_emit_streamout(struct si_shader_context *ctx, struct si_shader_output_values *outputs, unsigned noutput) { struct pipe_stream_output_info *so = &ctx->shader->selector->so; struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm; LLVMBuilderRef builder = gallivm->builder; int i, j; struct lp_build_if_state if_ctx; + LLVMValueRef so_buffers[4]; + LLVMValueRef buf_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, + SI_PARAM_RW_BUFFERS); + + /* Load the descriptors. */ + for (i = 0; i < 4; ++i) { + if (ctx->shader->selector->so.stride[i]) { + LLVMValueRef offset = lp_build_const_int32(gallivm, + SI_VS_STREAMOUT_BUF0 + i); + + so_buffers[i] = build_indexed_load_const(ctx, buf_ptr, offset); + } + } /* Get bits [22:16], i.e. 
(so_param >> 16) & 127; */ LLVMValueRef so_vtx_count = unpack_param(ctx, ctx->param_streamout_config, 16, 7); LLVMValueRef tid = get_thread_id(ctx); /* can_emit = tid < so_vtx_count; */ LLVMValueRef can_emit = LLVMBuildICmp(builder, LLVMIntULT, tid, so_vtx_count, ""); @@ -2359,21 +2371,21 @@ static void si_llvm_emit_streamout(struct si_shader_context *ctx, } break; } LLVMValueRef can_emit_stream = LLVMBuildICmp(builder, LLVMIntEQ, stream_id, lp_build_const_int32(gallivm, stream), ""); lp_build_if(&if_ctx_stream, gallivm, can_emit_stream); - build_tbuffer_store_dwords(ctx, ctx->so_buffers[buf_idx], + build_tbuffer_store_dwords(ctx, so_buffers[buf_idx], vdata, num_comps, so_write_offset[buf_idx], LLVMConstInt(ctx->i32, 0, 0), so->output[i].dst_offset*4); lp_build_endif(&if_ctx_stream); } } lp_build_endif(&if_ctx); } @@ -5917,49 +5929,20 @@ static void preload_images(struct si_shader_context *ctx) lp_build_const_int32(gallivm, i)); if (info->images_writemask & (1 << i) && !(info->images_buffers & (1 << i))) rsrc = force_dcc_off(ctx, rsrc); ctx->images[i] = rsrc; } } -static void preload_streamout_buffers(struct si_shader_context *ctx) -{ - struct lp_build_tgsi_context *bld_base = &ctx->radeon_bld.soa.bld_base; - struct gallivm_state *gallivm = bld_base->base.gallivm; - unsigned i; - - /* Streamout can only be used if the shader is compiled as VS. */ - if (!ctx->shader->selector->so.num_outputs || - (ctx->type == PIPE_SHADER_VERTEX && -(ctx->shader->key.vs.as_es || - ctx->shader->key.vs.as_ls)) || - (ctx->type == PIPE_SHADER_TESS_EVAL && -ctx->shader->key.tes.as_es)) - return; - - LLVMValueRef buf_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, - SI_PARAM_R
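The "Get bits [22:16]" comment describes a plain shift-and-mask extraction. A scalar sketch of that operation follows; `unpack_bits()` is a hypothetical helper working on ordinary integers, whereas `unpack_param()` in the patch builds the equivalent LLVM IR.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `bitwidth` bits of `value`, starting at bit `rshift`. */
static uint32_t
unpack_bits(uint32_t value, unsigned rshift, unsigned bitwidth)
{
   assert(rshift < 32 && bitwidth >= 1 && bitwidth <= 32);

   /* A full-width mask is special-cased because shifting a 32-bit value
    * by 32 is undefined in C. */
   const uint32_t mask =
      bitwidth == 32 ? UINT32_MAX : ((UINT32_C(1) << bitwidth) - 1);

   return (value >> rshift) & mask;
}
```

With `rshift = 16` and `bitwidth = 7` the mask is 127, which matches the `(so_param >> 16) & 127` form given in the comment.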
[Mesa-dev] [PATCH v3] HUD: Add support for block I/O, network I/O and lmsensor stats
V3: Flatten the entire patchset ready for the ML

V2: Additional separate patches based on feedback
a) configure.ac: Add a comment related to libsensors
b) HUD: Disable Block/NIC I/O stats by default. Implement configuration option --enable-gallium-extra-hud=yes and enable both statistics when this option is enabled.
c) Configure.ac: Minor cleanup to user visible configuration settings
d) Configure.ac: HUD stats - build system improvements. Move the -lsensors out of a deeper Makefile, bring it into the configure.ac. Also, rename a compiler directive to more closely follow the standard.

V1: Initial release to the ML
Three new features:
1. Disk/block I/O device read/write stats in MB/s.
2. Network interface RX/TX transfer statistics as a percentage of the overall NIC speed.
3. lmsensor power, voltage and temperature sensors.

The lmsensor changes add a dependency on libsensors, so that support is disabled by default and must be enabled explicitly.

Signed-off-by: Steven Toth
---
 configure.ac | 42 +++
 src/gallium/auxiliary/Makefile.am | 2 +
 src/gallium/auxiliary/Makefile.sources | 3 +
 src/gallium/auxiliary/hud/hud_context.c | 73 +
 src/gallium/auxiliary/hud/hud_diskstat.c | 335
 src/gallium/auxiliary/hud/hud_nic.c | 441 +++
 src/gallium/auxiliary/hud/hud_private.h | 25 ++
 src/gallium/auxiliary/hud/hud_sensors_temp.c | 374 +++
 src/gallium/include/pipe/p_defines.h | 4 +
 9 files changed, 1299 insertions(+)
 create mode 100644 src/gallium/auxiliary/hud/hud_diskstat.c
 create mode 100644 src/gallium/auxiliary/hud/hud_nic.c
 create mode 100644 src/gallium/auxiliary/hud/hud_sensors_temp.c

diff --git a/configure.ac b/configure.ac
index a413a3a..610dff0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -91,6 +91,7 @@ XCBGLX_REQUIRED=1.8.1
 XSHMFENCE_REQUIRED=1.1
 XVMC_REQUIRED=1.0.6
 PYTHON_MAKO_REQUIRED=0.8.0
+LIBSENSORS_REQUIRED=4.0.0
 
 dnl Check for progs
 AC_PROG_CPP
@@ -871,6 +872,32 @@ AC_ARG_ENABLE([dri], [enable_dri="$enableval"], [enable_dri=yes]) +AC_ARG_ENABLE([gallium-extra-hud],
+[AS_HELP_STRING([--enable-gallium-extra-hud], +[enable HUD block/NIC I/O HUD stats support @<:@default=disabled@:>@])], +[enable_gallium_extra_hud="$enableval"], +[enable_gallium_extra_hud=no]) +AM_CONDITIONAL(HAVE_GALLIUM_EXTRA_HUD, test "x$enable_gallium_extra_hud" = xyes) +if test "x$enable_gallium_extra_hud" = xyes ; then +DEFINES="${DEFINES} -DHAVE_GALLIUM_EXTRA_HUD=1" +fi + +#TODO: no pkgconfig .pc available for libsensors. +#PKG_CHECK_MODULES([LIBSENSORS], [libsensors >= $LIBSENSORS_REQUIRED], [enable_lmsensors=yes], [enable_lmsensors=no]) +AC_ARG_ENABLE([lmsensors], +[AS_HELP_STRING([--enable-lmsensors], +[enable HUD lmsensor support @<:@default=disabled@:>@])], +[enable_lmsensors="$enableval"], +[enable_lmsensors=no]) +AM_CONDITIONAL(HAVE_LIBSENSORS, test "x$enable_lmsensors" = xyes) +if test "x$enable_lmsensors" = xyes ; then +DEFINES="${DEFINES} -DHAVE_LIBSENSORS=1" +LIBSENSORS_LDFLAGS="-lsensors" +else +LIBSENSORS_LDFLAGS="" +fi +AC_SUBST(LIBSENSORS_LDFLAGS) + case "$host_os" in linux*) dri3_default=yes @@ -1122,6 +1149,8 @@ AM_CONDITIONAL(HAVE_DRISW_KMS, test "x$have_drisw_kms" = xyes ) AM_CONDITIONAL(HAVE_DRI2, test "x$enable_dri" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes ) AM_CONDITIONAL(HAVE_DRI3, test "x$enable_dri3" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes ) AM_CONDITIONAL(HAVE_APPLEDRI, test "x$enable_dri" = xyes -a "x$dri_platform" = xapple ) +AM_CONDITIONAL(HAVE_LMSENSORS, test "x$enable_lmsensors" = xyes ) +AM_CONDITIONAL(HAVE_GALLIUM_EXTRA_HUD, test "x$enable_gallium_extra_hud" = xyes ) AC_ARG_ENABLE([shared-glapi], [AS_HELP_STRING([--enable-shared-glapi], @@ -2876,6 +2905,19 @@ else echo "Gallium: no" fi +echo "" +if test "x$enable_gallium_extra_hud" != xyes; then +echo "HUD extra stats: no" +else +echo "HUD extra stats: yes" +fi + +if test "x$enable_lmsensors" != xyes; then +echo "HUD lmsensors: no" +else +echo "HUD lmsensors: yes" +fi + dnl Shader cache echo "" echo "Shader cache:$enable_shader_cache" 
diff --git a/src/gallium/auxiliary/Makefile.am b/src/gallium/auxiliary/Makefile.am index d971a2b..4a4a4fb 100644 --- a/src/gallium/auxiliary/Makefile.am +++ b/src/gallium/auxiliary/Makefile.am @@ -34,6 +34,8 @@ libgallium_la_SOURCES += \ endif +libgallium_la_LDFLAGS = $(LIBSENSORS_LDFLAGS) + MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D) PYTHON_GEN = $(AM_V_GEN)$(PYTHON2) $(PYTHON_FLAGS) diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources index f8954c9..650a403 100644 --- a/src/gallium/auxiliary/Makefile.sources +++ b/src/gallium/auxiliary/Makefile.sources @@ -62,6 +62,9 @@ C_SOURCES := \ hud
Re: [Mesa-dev] [PATCH] tgsi: Enable returns from within loops
On 09/13/2016 01:08 PM, Lars Hamre wrote:
> Fixes the following piglit test (for softpipe):
> /spec/glsl-1.10/execution/fs-loop-return
>
> Signed-off-by: Lars Hamre
>
> ---
> src/gallium/auxiliary/tgsi/tgsi_exec.c | 4
> 1 file changed, 4 insertions(+)
>
> NOTE: Someone with access will need to commit this
> after the review process
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> index 1457c06..aff35e6 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> @@ -5148,6 +5148,10 @@ exec_instruction(
>           /* returning from main() */
>           mach->CondStackTop = 0;
>           mach->LoopStackTop = 0;
> +         mach->ContStackTop = 0;
> +         mach->LoopLabelStackTop = 0;
> +         mach->SwitchStackTop = 0;
> +         mach->BreakStackTop = 0;
>           *pc = -1;
>           return FALSE;
>        }
> --
> 2.7.4

Reviewed-by: Brian Paul

Do you need me to push this for you?
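To make the effect of the fix concrete, here is a small self-contained sketch of a return handler that clears every control-flow stack when returning from main(). The struct and the `exec_ret()` helper are hypothetical simplifications: only the stack-top field names come from the patch, the `CallStackTop == 0` test for "returning from main()" is an assumption of this sketch, and the subroutine path is reduced to a counter decrement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down interpreter state: just the stack tops. */
struct exec_machine_sketch {
   int CallStackTop;
   int CondStackTop;
   int LoopStackTop;
   int ContStackTop;
   int LoopLabelStackTop;
   int SwitchStackTop;
   int BreakStackTop;
};

/*
 * Handle a RET. Returns false when the shader is finished (return from
 * main), true when execution continues in the caller.
 */
static bool
exec_ret(struct exec_machine_sketch *mach, int *pc)
{
   assert(mach != NULL && pc != NULL);

   if (mach->CallStackTop == 0) {
      /* Returning from main(): a return inside a loop or switch leaves
       * entries on these stacks, so all of them must be reset. */
      mach->CondStackTop = 0;
      mach->LoopStackTop = 0;
      mach->ContStackTop = 0;
      mach->LoopLabelStackTop = 0;
      mach->SwitchStackTop = 0;
      mach->BreakStackTop = 0;
      *pc = -1;
      return false;
   }

   /* Simplified subroutine return: just pop one call level. */
   mach->CallStackTop--;
   return true;
}
```

If any of the four added stacks kept a non-zero top after a return from inside a loop, the next invocation of the machine would start with stale entries, which is consistent with the fs-loop-return failure the patch addresses.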
[Mesa-dev] [PATCH] radeonsi: reload PS inputs with direct indexing at each use (v2)
From: Marek Olšák The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute. 26011 shaders in 14651 tests Totals: SGPRS: 1146340 -> 1132676 (-1.19 %) VGPRS: 727371 -> 711730 (-2.15 %) Spilled SGPRs: 2218 -> 2078 (-6.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35841268 -> 36009732 (0.47 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222559 -> 224779 (1.00 %) Wait states: 0 -> 0 (0.00 %) v2: don't call load_input for fragment shaders in emit_declaration --- src/gallium/drivers/radeon/radeon_llvm.h | 6 - .../drivers/radeon/radeon_setup_tgsi_llvm.c| 30 ++ src/gallium/drivers/radeonsi/si_shader.c | 27 --- 3 files changed, 41 insertions(+), 22 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h index da5b7f5..f508d32 100644 --- a/src/gallium/drivers/radeon/radeon_llvm.h +++ b/src/gallium/drivers/radeon/radeon_llvm.h @@ -23,21 +23,23 @@ * Authors: Tom Stellard * */ #ifndef RADEON_LLVM_H #define RADEON_LLVM_H #include #include "gallivm/lp_bld_init.h" #include "gallivm/lp_bld_tgsi.h" +#include "tgsi/tgsi_parse.h" +#define RADEON_LLVM_MAX_INPUT_SLOTS 32 #define RADEON_LLVM_MAX_INPUTS 32 * 4 #define RADEON_LLVM_MAX_OUTPUTS 32 * 4 #define RADEON_LLVM_INITIAL_CF_DEPTH 4 #define RADEON_LLVM_MAX_SYSTEM_VALUES 4 struct radeon_llvm_branch { LLVMBasicBlockRef endif_block; LLVMBasicBlockRef if_block; @@ -55,33 +57,35 @@ struct radeon_llvm_context { /*=== Front end configuration ===*/ /* Instructions that are not described by any of the TGSI opcodes. */ /** This function is responsible for initilizing the inputs array and will be * called once for each input declared in the TGSI shader. 
*/ void (*load_input)(struct radeon_llvm_context *, unsigned input_index, - const struct tgsi_full_declaration *decl); + const struct tgsi_full_declaration *decl, + LLVMValueRef out[4]); void (*load_system_value)(struct radeon_llvm_context *, unsigned index, const struct tgsi_full_declaration *decl); void (*declare_memory_region)(struct radeon_llvm_context *, const struct tgsi_full_declaration *decl); /** This array contains the input values for the shader. Typically these * values will be in the form of a target intrinsic that will inform the * backend how to load the actual inputs to the shader. */ + struct tgsi_full_declaration input_decls[RADEON_LLVM_MAX_INPUT_SLOTS]; LLVMValueRef inputs[RADEON_LLVM_MAX_INPUTS]; LLVMValueRef outputs[RADEON_LLVM_MAX_OUTPUTS][TGSI_NUM_CHANNELS]; /** This pointer is used to contain the temporary values. * The amount of temporary used in tgsi can't be bound to a max value and * thus we must allocate this array at runtime. */ LLVMValueRef *temps; unsigned temps_count; LLVMValueRef system_values[RADEON_LLVM_MAX_SYSTEM_VALUES]; diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 4643e6d..4fa43cd 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -439,28 +439,43 @@ LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base, bld_base->int_bld.zero); result = LLVMConstInsertElement(result, bld->immediates[reg->Register.Index][swizzle + 1], bld_base->int_bld.one); return LLVMConstBitCast(result, ctype); } else { return LLVMConstBitCast(bld->immediates[reg->Register.Index][swizzle], ctype); } } - case TGSI_FILE_INPUT: - result = ctx->inputs[radeon_llvm_reg_index_soa(reg->Register.Index, swizzle)]; + case TGSI_FILE_INPUT: { + unsigned index = reg->Register.Index; + LLVMValueRef input[4]; + + /* I don't think doing this for vertex shaders is beneficial. 
+* For those, we want to make sure the VMEM loads are executed +* only once. Fragment shaders don't care much, because +* v_interp instructions are much cheaper than VMEM loads. +*/ + if (ctx->soa.bld_base.info->processor == PIPE_SHADER_FRAGMENT) + ctx->load_input(ctx, index, &ctx->input_de
Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v2)
On 09/13/2016 11:57 AM, Adam Jackson wrote: From: Kyle Brenneman This decorates every EGL entrypoint with _EGL_FUNC_START, which records the function name and primary dispatch object label in the current thread state. It also adds debug report functions and calls them when appropriate. This would be useful enough for debugging on its own, if the user set a breakpoint when the report function was called. We will also need this state tracked in order to expose EGL_KHR_debug. v2: - Clear the object label in more cases in _eglSetFuncName - Set dummy thread's CurrentAPI to EGL_NONE not zero - Pass draw surface (if any) to _EGL_FUNC_START in eglSwapInterval --- src/egl/main/eglapi.c | 155 ++ src/egl/main/eglcurrent.c | 91 ++- src/egl/main/eglcurrent.h | 22 +++ src/egl/main/eglglobals.h | 5 ++ 4 files changed, 259 insertions(+), 14 deletions(-) diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c index 0477ad9..216b289 100644 --- a/src/egl/main/eglapi.c +++ b/src/egl/main/eglapi.c @@ -250,6 +250,37 @@ _eglUnlockDisplay(_EGLDisplay *dpy) mtx_unlock(&dpy->Mutex); } +static EGLBoolean +_eglSetFuncName(const char *funcName, _EGLDisplay *disp, EGLenum objectType, _EGLResource *object) +{ + _EGLThreadInfo *thr = _eglGetCurrentThread(); + if (!_eglIsCurrentThreadDummy()) { + thr->CurrentFuncName = funcName; + thr->CurrentObjectLabel = NULL; + + if (objectType == EGL_OBJECT_THREAD_KHR) + thr->CurrentObjectLabel = thr->Label; + else if (objectType == EGL_OBJECT_DISPLAY_KHR) + thr->CurrentObjectLabel = disp ? disp->Label : NULL; + else + thr->CurrentObjectLabel = object ? object->Label : NULL; + + return EGL_TRUE; + } + + _eglDebugReportFull(EGL_BAD_ALLOC, funcName, funcName, + EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL); + return EGL_FALSE; +} _eglSetFuncName starts with "thr->CurrentObjectLabel = NULL", so if it didn't set the label to something else later, it would have been cleared. 
+ +#define _EGL_FUNC_START(disp, objectType, object, ret) \ + do { \ + if (!_eglSetFuncName(__func__, disp, objectType, (_EGLResource *) object)) { \ + if (disp) \ +_eglUnlockDisplay(disp); \ + return ret; \ + } \ + } while(0) static EGLint * _eglConvertAttribsToInt(const EGLAttrib *attr_list) @@ -287,6 +318,8 @@ eglGetDisplay(EGLNativeDisplayType nativeDisplay) _EGLDisplay *dpy; void *native_display_ptr; + _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY); + STATIC_ASSERT(sizeof(void*) == sizeof(nativeDisplay)); native_display_ptr = (void*) nativeDisplay; @@ -330,6 +363,7 @@ static EGLDisplay EGLAPIENTRY eglGetPlatformDisplayEXT(EGLenum platform, void *native_display, const EGLint *attrib_list) { + _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY); return _eglGetPlatformDisplayCommon(platform, native_display, attrib_list); } @@ -340,6 +374,8 @@ eglGetPlatformDisplay(EGLenum platform, void *native_display, EGLDisplay display; EGLint *int_attribs; + _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY); + int_attribs = _eglConvertAttribsToInt(attrib_list); if (attrib_list && !int_attribs) RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, NULL); @@ -483,6 +519,8 @@ eglInitialize(EGLDisplay dpy, EGLint *major, EGLint *minor) { _EGLDisplay *disp = _eglLockDisplay(dpy); + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE); + if (!disp) RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE); @@ -533,6 +571,8 @@ eglTerminate(EGLDisplay dpy) { _EGLDisplay *disp = _eglLockDisplay(dpy); + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE); + if (!disp) RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE); @@ -560,6 +600,7 @@ eglQueryString(EGLDisplay dpy, EGLint name) } disp = _eglLockDisplay(dpy); + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, NULL); _EGL_CHECK_DISPLAY(disp, NULL, drv); switch (name) { @@ -585,6 +626,8 @@ eglGetConfigs(EGLDisplay dpy, EGLConfig *configs, _EGLDriver *drv; EGLBoolean ret; + 
_EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE); + _EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv); ret = drv->API.GetConfigs(drv, disp, configs, config_size, num_config); @@ -600,6 +643,8 @@ eglChooseConfig(EGLDisplay dpy, const EGLint *attrib_list, EGLConfig *configs, _EGLDriver *drv; EGLBoolean ret; + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE); + _EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv); ret = drv->API.ChooseConfig(drv, disp, attrib_list, configs, config_size, num_config); @@ -617,6 +662
Re: [Mesa-dev] [PATCH] Remove GL_GLEXT_PROTOTYPES guards from non-ext headers.
On Tue, Sep 13, 2016 at 4:05 PM, Eric Anholt wrote:
> Ilia Mirkin writes:
>
>> On Mon, Sep 12, 2016 at 11:55 AM, Emil Velikov wrote:
>>> On 12 September 2016 at 15:35, Ilia Mirkin wrote:
>>>> On Mon, Sep 12, 2016 at 10:10 AM, Emil Velikov wrote:
>>>>> Keeping diff/patches in git always felt like a hack, imho. Plus
>>>>> most/all(?) distros rely on the Mesa headers, so I'm not sure how that
>>>>> is going to work.
>>>>
>>>> The alternatives are considerably more painful for just a handful of
>>>> files with a small number of diffs. This would be as a tool for
>>>> developers like us who update the mesa versions by importing new KHR
>>>> versions, which will not have our local changes applied. The patch
>>>> would not be used as part of the build process or anything else.
>>>>
>>> The goal being to have the patches alongside the patched headers.
>>> This way one can use them as reference ? Sure sounds great imho.
>>
>> Exactly. So that when I download new KHR headers, I just apply the
>> patch to them (and hope it applies), and if not, look at what was
>> being done and try to repeat the process. Then I regenerate the patch
>> against the (new) originals and check the whole thing in.
>
> Or you could just use git like normal: You have a public branch of the
> unchanged headers. You make your own changes to the headers on master.
> When you want to update to new upstream headers, you check out the
> unchanged-headers branch, commit new unchanged upstreams there, check
> out master, and git merge.

Right. Seems hardly worth the hassle for a small handful of diffs on
files we update once every 2 years.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] Remove GL_GLEXT_PROTOTYPES guards from non-ext headers.
Ilia Mirkin writes:

> On Mon, Sep 12, 2016 at 11:55 AM, Emil Velikov wrote:
>> On 12 September 2016 at 15:35, Ilia Mirkin wrote:
>>> On Mon, Sep 12, 2016 at 10:10 AM, Emil Velikov wrote:
>>>> Keeping diff/patches in git always felt like a hack, imho. Plus
>>>> most/all(?) distros rely on the Mesa headers, so I'm not sure how that
>>>> is going to work.
>>>
>>> The alternatives are considerably more painful for just a handful of
>>> files with a small number of diffs. This would be as a tool for
>>> developers like us who update the mesa versions by importing new KHR
>>> versions, which will not have our local changes applied. The patch
>>> would not be used as part of the build process or anything else.
>>>
>> The goal being to have the patches alongside the patched headers.
>> This way one can use them as reference ? Sure sounds great imho.
>
> Exactly. So that when I download new KHR headers, I just apply the
> patch to them (and hope it applies), and if not, look at what was
> being done and try to repeat the process. Then I regenerate the patch
> against the (new) originals and check the whole thing in.

Or you could just use git like normal: You have a public branch of the
unchanged headers. You make your own changes to the headers on master.
When you want to update to new upstream headers, you check out the
unchanged-headers branch, commit new unchanged upstreams there, check
out master, and git merge.
[Mesa-dev] [Bug 97261] vaapi u/v wrong order since vl/util: add copy func for yv12image to nv12surface
https://bugs.freedesktop.org/show_bug.cgi?id=97261

Andy Furniss changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Andy Furniss ---
Fixed in mesa git -
https://cgit.freedesktop.org/mesa/mesa/commit/?id=304f70536a73f4b63360632428241c7488c99610
[Mesa-dev] [PATCH v3 3/9] nv50/ir: teach load propagation about src2
With OP_ADD3, we might want to swap sources 2 and 1.

Signed-off-by: Samuel Pitoiset
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 29 ++
 1 file changed, 29 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index a9172f8..f212eba 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -153,6 +153,7 @@ private:
    virtual bool visit(BasicBlock *);
 
    void checkSwapSrc01(Instruction *);
+   void checkSwapSrc21(Instruction *);
 
    bool isCSpaceLoad(Instruction *);
    bool isImmdLoad(Instruction *);
@@ -239,6 +240,32 @@ LoadPropagation::checkSwapSrc01(Instruction *insn)
    }
 }
 
+void
+LoadPropagation::checkSwapSrc21(Instruction *insn)
+{
+   const Target *targ = prog->getTarget();
+   if (insn->op != OP_ADD3)
+      return;
+   if (insn->src(2).getFile() != FILE_GPR)
+      return;
+
+   Instruction *i1 = insn->getSrc(1)->getInsn();
+   Instruction *i2 = insn->getSrc(2)->getInsn();
+
+   // Swap sources to inline the less frequently used source. That way,
+   // optimistically, it will eventually be able to remove the instruction.
+   int i1refs = insn->getSrc(1)->refCount();
+   int i2refs = insn->getSrc(2)->refCount();
+
+   if ((isCSpaceLoad(i2) || isImmdLoad(i2)) && targ->insnCanLoad(insn, 1, i2)) {
+      if ((!isImmdLoad(i1) && !isCSpaceLoad(i1)) ||
+          !targ->insnCanLoad(insn, 1, i1) ||
+          i2refs < i1refs) {
+         insn->swapSources(2, 1);
+      }
+   }
+}
+
 bool
 LoadPropagation::visit(BasicBlock *bb)
 {
@@ -256,6 +283,8 @@ LoadPropagation::visit(BasicBlock *bb)
 
       if (i->srcExists(1))
          checkSwapSrc01(i);
+      if (i->srcExists(2))
+         checkSwapSrc21(i);
 
       for (int s = 0; i->srcExists(s); ++s) {
          Instruction *ld = i->getSrc(s)->getInsn();
-- 
2.9.3
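The swap decision in `checkSwapSrc21` reduces to a small predicate. The sketch below restates it outside the compiler so the conditions are easier to read. `SrcInfo` and `should_swap_src21` are hypothetical names, and the three fields stand in for the results of `isCSpaceLoad()`/`isImmdLoad()`, `insnCanLoad()` and `refCount()`.

```cpp
// Hypothetical summary of one ADD3 source operand.
struct SrcInfo {
    bool is_foldable_load;  // produced by a constant-buffer or immediate load
    bool can_inline;        // the target accepts that load directly in slot 1
    int  ref_count;         // number of uses of the loaded value
};

// Returns true when source 2 should be swapped into slot 1.
inline bool should_swap_src21(const SrcInfo &s1, const SrcInfo &s2)
{
    // Source 2 must be something worth inlining, and slot 1 must accept it.
    if (!(s2.is_foldable_load && s2.can_inline))
        return false;

    // Swap when source 1 cannot be inlined anyway, or when source 2 has
    // fewer uses, so that its load is the more likely one to become dead.
    return !s1.is_foldable_load || !s1.can_inline ||
           s2.ref_count < s1.ref_count;
}
```

Preferring the operand with fewer references follows the comment in the patch: once the last use of a load has been inlined, the load instruction itself can be removed.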
[Mesa-dev] [PATCH v3 5/9] nv50/ir: optimize ADD3(d, a, b, c) to MOV(d, a + b + c)
Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index fe815e3..ecde364 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -797,6 +797,17 @@ ConstantFolding::expr(Instruction *i,
       }
       break;
    }
+   case OP_ADD3: {
+      switch (i->dType) {
+      case TYPE_S32:
+      case TYPE_U32:
+         res.data.u32 = a->data.u32 + b->data.u32 + c->data.u32;
+         break;
+      default:
+         return;
+      }
+      break;
+   }
    default:
       return;
    }
-- 
2.9.3
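As a standalone illustration of the fold above: when all three ADD3 sources are immediates, the result can be computed at compile time. The sketch assumes a hypothetical `DataType` enum and `fold_add3` helper. It does the sum in `uint32_t` for both signed and unsigned types, since wrap-around unsigned addition produces the same bit pattern as two's complement signed addition.

```cpp
#include <cstdint>

// Hypothetical subset of the IR data types.
enum class DataType { U32, S32, F32 };

// Folds ADD3(a, b, c) for 32-bit integer types.
// Returns false (and leaves 'out' untouched) for unsupported types.
inline bool fold_add3(DataType ty, uint32_t a, uint32_t b, uint32_t c,
                      uint32_t &out)
{
    switch (ty) {
    case DataType::U32:
    case DataType::S32:
        out = a + b + c;   // wraps modulo 2^32
        return true;
    default:
        return false;      // e.g. floats: ADD3 is integer-only
    }
}
```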
[Mesa-dev] [PATCH v3 4/9] nv50/ir: optimize ADD(ADD(a, b), c) to ADD3(a, b, c)
Signed-off-by: Samuel Pitoiset --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 55 ++ 1 file changed, 55 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index f212eba..fe815e3 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -1569,6 +1569,7 @@ private: void handleABS(Instruction *); bool handleADD(Instruction *); bool tryADDToMADOrSAD(Instruction *, operation toOp); + bool tryADDToADD3(Instruction *); void handleMINMAX(Instruction *); void handleRCP(Instruction *); void handleSLCT(Instruction *); @@ -1642,6 +1643,8 @@ AlgebraicOpt::handleADD(Instruction *add) changed = tryADDToMADOrSAD(add, OP_MAD); if (!changed && prog->getTarget()->isOpSupported(OP_SAD, add->dType)) changed = tryADDToMADOrSAD(add, OP_SAD); + if (!changed && prog->getTarget()->isOpSupported(OP_ADD3, add->dType)) + changed = tryADDToADD3(add); return changed; } @@ -1712,6 +1715,58 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, operation toOp) return true; } +// ADD(ADD(a,b), c) -> ADD3(a,b,c) +bool +AlgebraicOpt::tryADDToADD3(Instruction *add) +{ + Value *src0 = add->getSrc(0); + Value *src1 = add->getSrc(1); + const Modifier modBad = Modifier(~NV50_IR_MOD_NEG); + Modifier mod[4]; + Value *src; + int s; + + if (src0->refCount() == 1 && + src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_ADD) + s = 0; + else + if (src1->refCount() == 1 && + src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_ADD) + s = 1; + else + return false; + + src = add->getSrc(s); + + if (src->getUniqueInsn() && src->getUniqueInsn()->bb != add->bb) + return false; + + if (src->getInsn()->saturate) + return false; + + if (typeSizeof(add->dType) != typeSizeof(src->getInsn()->dType)) + return false; + + mod[0] = add->src(0).mod; + mod[1] = add->src(1).mod; + mod[2] = src->getUniqueInsn()->src(0).mod; + mod[3] = 
src->getUniqueInsn()->src(1).mod; + + if (((mod[0] | mod[1]) | (mod[2] | mod[3])) & modBad) + return false; + + add->op = OP_ADD3; + add->dType = src->getInsn()->dType; + add->sType = src->getInsn()->sType; + + add->setSrc(s, src->getInsn()->getSrc(0)); + add->src(s).mod = mod[s] ^ mod[2]; + add->setSrc(2, src->getInsn()->getSrc(1)); + add->src(2).mod = mod[3]; + + return true; +} + void AlgebraicOpt::handleMINMAX(Instruction *minmax) { -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
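The modifier handling in this transformation rests on one identity: negating an inner ADD is the same as negating both of its operands, so `-(a + b) + c` equals `(-a) + (-b) + c`. The sketch below checks that identity with wrap-around 32-bit arithmetic. `Operand`, `apply`, `eval_nested` and `eval_add3` are hypothetical names, only the negate modifier is modelled, and this illustrates the algebra rather than copying the patch.

```cpp
#include <cstdint>

// Hypothetical operand: a 32-bit value plus an optional negate modifier.
struct Operand {
    uint32_t value;
    bool neg;
};

// Applies the negate modifier with wrap-around (two's complement) semantics.
inline uint32_t apply(const Operand &o)
{
    return o.neg ? 0u - o.value : o.value;
}

// Reference form: ADD(ADD(a, b) with an optional outer negate, c).
inline uint32_t eval_nested(Operand a, Operand b, bool outer_neg, Operand c)
{
    uint32_t inner = apply(a) + apply(b);
    return (outer_neg ? 0u - inner : inner) + apply(c);
}

// Flattened form: ADD3(a', b', c), where the outer negate is pushed onto
// both inner operands by XOR-ing it with their own negate bits.
inline uint32_t eval_add3(Operand a, Operand b, bool outer_neg, Operand c)
{
    a.neg = (a.neg != outer_neg);
    b.neg = (b.neg != outer_neg);
    return apply(a) + apply(b) + apply(c);
}
```

Under this reading, each operand taken from the inner ADD ends up with its own modifier XOR-ed with the modifier of the outer source it replaces.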
[Mesa-dev] [PATCH v3 6/9] nv50/ir: optimize ADD3(d, a, b, 0x0) to ADD(d, a, b)
Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 8
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index ecde364..246cdff 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -937,6 +937,14 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue &imm2)
          return;
       }
       break;
+   case OP_ADD3:
+      if (imm2.isInteger(0)) {
+         i->op = OP_ADD;
+         i->setSrc(2, NULL);
+         foldCount++;
+         return;
+      }
+      break;
    default:
       return;
    }
-- 
2.9.3
[Mesa-dev] [PATCH v3 2/9] gm107/ir: add emission for OP_ADD3
Signed-off-by: Samuel Pitoiset --- .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 34 ++ 1 file changed, 34 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp index cfde66c..fd3dd3f 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp @@ -150,6 +150,7 @@ private: void emitLOP(); void emitNOT(); void emitIADD(); + void emitIADD3(); void emitIMUL(); void emitIMAD(); void emitIMNMX(); @@ -1728,6 +1729,36 @@ CodeEmitterGM107::emitIADD() } void +CodeEmitterGM107::emitIADD3() +{ + switch (insn->src(1).getFile()) { + case FILE_GPR: + emitInsn(0x5cc0); + emitGPR (0x14, insn->src(1)); + break; + case FILE_MEMORY_CONST: + emitInsn(0x4cc0); + emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(1)); + break; + case FILE_IMMEDIATE: + emitInsn(0x38c0); + emitIMMD(0x14, 19, insn->src(1)); + break; + default: + assert(!"bad src1 file"); + break; + } + emitNEG(0x33, insn->src(0)); + emitNEG(0x32, insn->src(1)); + emitNEG(0x31, insn->src(2)); + emitX (0x30); + emitCC (0x2f); + emitGPR(0x27, insn->src(2)); + emitGPR(0x08, insn->src(0)); + emitGPR(0x00, insn->def(0)); +} + +void CodeEmitterGM107::emitIMUL() { if (insn->src(1).getFile() != FILE_IMMEDIATE) { @@ -3077,6 +3108,9 @@ CodeEmitterGM107::emitInstruction(Instruction *i) emitIADD(); } break; + case OP_ADD3: + emitIADD3(); + break; case OP_MUL: if (isFloatType(insn->dType)) { if (insn->dType == TYPE_F64) -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 7/9] nv50/ir: optimize ADD3(d, 0x0, b, c) to ADD(d, b, c)
And ADD3(d, a, 0x0, c) to ADD(d, a, c) as well.

v2: - use moveSources()
    - allow ADD3 -> ADD when srcFlags is set

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 246cdff..284f187 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1070,7 +1070,12 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
          i->src(0).mod = Modifier(0);
       }
       break;
-
+   case OP_ADD3:
+      if (imm0.isInteger(0)) {
+         i->op = OP_ADD;
+         i->moveSources(s + 1, -1);
+      }
+      break;
    case OP_DIV:
       if (s != 1 || (i->dType != TYPE_S32 && i->dType != TYPE_U32))
          break;
-- 
2.9.3
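The effect of dropping a zero operand and shifting the remaining sources down can be shown on a plain vector. This is a sketch only: `Src` and `drop_zero_source` are hypothetical, and the `erase` call stands in for what `moveSources(s + 1, -1)` appears to do in the patch, namely moving every source after position `s` one slot toward the front.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical source operand: either an immediate or a register id.
struct Src {
    bool is_imm;
    uint32_t value;
};

// Removes the first immediate-zero source and shifts later sources down.
// Returns true when a source was removed, i.e. a three-source ADD3 has
// become a two-source ADD.
inline bool drop_zero_source(std::vector<Src> &srcs)
{
    for (std::size_t s = 0; s < srcs.size(); ++s) {
        if (srcs[s].is_imm && srcs[s].value == 0) {
            srcs.erase(srcs.begin() + static_cast<std::ptrdiff_t>(s));
            return true;
        }
    }
    return false;
}
```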
[Mesa-dev] [PATCH v3 1/9] nv50/ir: add preliminary support for OP_ADD3
This instruction is new since SM50 (Maxwell) and allows to perform an add with three sources. Unfortunately, it only supports integers. v3: - set commutative flag for OP_ADD3 - move OP_ADD3 after arithmetic ops Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/codegen/nv50_ir.h| 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp| 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 6 +++--- src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp | 4 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 12 6 files changed, 18 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index d6011d9..12a8b10 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -57,6 +57,7 @@ enum operation OP_MAD, OP_FMA, OP_SAD, // abs(src0 - src1) + src2 + OP_ADD3, OP_ABS, OP_NEG, OP_NOT, diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp index 22f2f5d..83340f2 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp @@ -86,6 +86,7 @@ const char *operationStr[OP_LAST + 1] = "mad", "fma", "sad", + "add3", "abs", "neg", "not", diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp index 7d7b315..dcf35ba 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp @@ -30,7 +30,7 @@ const uint8_t Target::operationSrcNr[] = 0, 0, // NOP, PHI 0, 0, 0, 0, // UNION, SPLIT, MERGE, CONSTRAINT 1, 1, 2,// MOV, LOAD, STORE - 2, 2, 2, 2, 2, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD + 2, 2, 2, 2, 2, 3, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD, ADD3 1, 1, 1,// 
ABS, NEG, NOT 2, 2, 2, 2, 2, // AND, OR, XOR, SHL, SHR 2, 2, 1,// MAX, MIN, SAT @@ -70,10 +70,10 @@ const OpClass Target::operationClass[] = OPCLASS_MOVE, OPCLASS_LOAD, OPCLASS_STORE, - // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD + // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD, ADD3 OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, - OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, + OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, // ABS, NEG; NOT, AND, OR, XOR; SHL, SHR OPCLASS_CONVERT, OPCLASS_CONVERT, OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC, diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp index 6b8f767..eecd61f 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp @@ -61,6 +61,10 @@ TargetGM107::isOpSupported(operation op, DataType ty) const case OP_DIV: case OP_MOD: return false; + case OP_ADD3: + if (isFloatType(ty)) + return false; + break; default: break; } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp index b37ea73..e1a7963 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp @@ -437,6 +437,7 @@ TargetNV50::isOpSupported(operation op, DataType ty) const case OP_EXTBF: case OP_EXIT: // want exit modifier instead (on NOP if required) case OP_MEMBAR: + case OP_ADD3: return false; case OP_SAD: return ty == TYPE_S32; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp index f5981de..a927c1e 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp @@ -147,7 +147,9 @@ static const struct opProperties _initProps[] 
= { OP_SUSTP, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0 }, { OP_SUCLAMP, 0x0, 0x0, 0x0, 0x0, 0x2, 0x2 }, { OP_SUBFM, 0x0, 0x0, 0x0, 0x0, 0x6, 0x2 }, - { OP_SUEAU, 0x0, 0x0, 0x0, 0x0, 0x6, 0x2 } + { OP_SUEAU, 0x0, 0x0, 0x0, 0x0, 0x6, 0x2 }, + // gm107 ops: + { OP_ADD3,0x7, 0x0, 0x0, 0x0, 0x2, 0x2 }, }; void TargetNVC0::initOpInfo() @@ -156,14 +158,14 @@ void TargetNVC0::initOpInfo() static const uint32_t commutative[(OP_LAST + 31) / 32] = { - // ADD, MAD, MUL, AND, OR, XOR, MAX, MIN - 0x0670ca00, 0x003f, 0x, 0x + // ADD, MAD, MUL, ADD3, AND, OR, XOR, MAX, MIN +
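The `commutative` table being edited above is a bitmask indexed by opcode number, so inserting `OP_ADD3` in the middle of the enum shifts the bit position of every later opcode and the hex constants have to be recomputed. A minimal sketch of the same idea, using a deliberately shortened and hypothetical enum, builds the mask from opcode names instead of hand-written constants:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, shortened opcode list; the real enum is much longer.
enum Op : unsigned { OP_ADD, OP_SUB, OP_MUL, OP_ADD3, OP_AND, OP_LAST };

// One bit per opcode, packed into 32-bit words.
constexpr std::size_t kWords = (OP_LAST + 31) / 32;

struct OpSet {
    uint32_t words[kWords] = {};

    void add(Op op) { words[op / 32] |= 1u << (op % 32); }
    bool contains(Op op) const { return (words[op / 32] >> (op % 32)) & 1u; }
};

// Commutative opcodes of the shortened enum above.
inline OpSet commutative_ops()
{
    OpSet s;
    s.add(OP_ADD);
    s.add(OP_MUL);
    s.add(OP_ADD3);
    s.add(OP_AND);
    return s;
}
```

With this layout the word value follows from the enum order: bits 0, 2, 3 and 4 are set here, giving `0x1d`. Adding or moving an opcode changes that constant without any manual bit arithmetic.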
[Mesa-dev] [PATCH v3 8/9] nv50/ir: optimize ADD3(d, a, b, c) to ADD(d, c, a + b)
This is similar to what we already do for MAD/FMA.

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 284f187..6ba2af6 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -605,6 +605,14 @@ ConstantFolding::expr(Instruction *i,
          return;
       }
       break;
+   case OP_ADD3:
+      switch (i->dType) {
+      case TYPE_S32:
+      case TYPE_U32: res.data.u32 = a->data.u32 + b->data.u32; break;
+      default:
+         return;
+      }
+      break;
    case OP_POW:
       switch (i->dType) {
       case TYPE_F32: res.data.f32 = pow(a->data.f32, b->data.f32); break;
@@ -721,7 +729,8 @@ ConstantFolding::expr(Instruction *i,
 
    switch (i->op) {
    case OP_MAD:
-   case OP_FMA: {
+   case OP_FMA:
+   case OP_ADD3: {
       ImmediateValue src0, src1 = *i->getSrc(0)->asImm();
 
       // Move the immediate into position 1, where we know it might be
-- 
2.9.3
[Mesa-dev] [PATCH v3 9/9] nv50/ir: optimize ADD3(d, a, b, c) to ADD(d, a, b + c)
And ADD3(d, a, b, c) to ADD(d, b, a + c) as well.

Very modest effect because OP_ADD3 only supports integers, but can reduce
the number of instructions in some shaders.

total instructions in shared programs :2594754 -> 2594686 (-0.00%)
total gprs used in shared programs    :366893 -> 366919 (0.01%)
total local used in shared programs   :31872 -> 31872 (0.00%)

                local        gpr       inst      bytes
    helped           0          0         39         39
      hurt           0         26          0          0

Signed-off-by: Samuel Pitoiset
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 62 ++
 1 file changed, 62 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 6ba2af6..e5e6e8e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -374,6 +374,7 @@ private:
    void expr(Instruction *, ImmediateValue&, ImmediateValue&);
    void expr(Instruction *, ImmediateValue&, ImmediateValue&, ImmediateValue&);
    void opnd(Instruction *, ImmediateValue&, int s);
+   void opnd2(Instruction *, ImmediateValue&, int, ImmediateValue&, int);
    void opnd3(Instruction *, ImmediateValue&);
 
    void unary(Instruction *, const ImmediateValue&);
@@ -429,6 +430,13 @@ ConstantFolding::visit(BasicBlock *bb)
          opnd(i, src1, 1);
       if (i->srcExists(2) && i->src(2).getImmediate(src2))
          opnd3(i, src2);
+      if (i->srcExists(2) &&
+          i->src(0).getImmediate(src0) && i->src(2).getImmediate(src2))
+         opnd2(i, src0, 0, src2, 2);
+      else
+      if (i->srcExists(2) &&
+          i->src(1).getImmediate(src1) && i->src(2).getImmediate(src2))
+         opnd2(i, src1, 1, src2, 2);
    }
    return true;
 }
@@ -960,6 +968,60 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue &imm2)
 }
 
 void
+ConstantFolding::opnd2(Instruction *i, ImmediateValue &imm0, int s0,
+                       ImmediateValue &imm1, int s1)
+{
+   struct Storage *const a = &imm0.reg, *const b = &imm1.reg;
+   ImmediateValue src0, src1;
+   struct Storage res;
+   DataType type = i->dType;
+
+   memset(&res.data, 0, sizeof(res.data));
+
+   switch (i->op) {
+   case OP_ADD3:
+      switch (i->dType) {
+      case TYPE_S32:
+      case TYPE_U32: res.data.u32 = a->data.u32 + b->data.u32; break;
+      default:
+         return;
+      }
+      break;
+   default:
+      return;
+   }
+   ++foldCount;
+
+   i->op = OP_ADD;
+
+   if (s0 == 0) {
+      i->setSrc(0, i->getSrc(1));
+      i->src(0).mod = i->src(1).mod;
+   }
+
+   i->setSrc(1, new_ImmediateValue(i->bb->getProgram(), res.data.u32));
+   i->setSrc(2, NULL);
+
+   i->getSrc(1)->reg.data = res.data;
+   i->getSrc(1)->reg.type = type;
+   i->getSrc(1)->reg.size = typeSizeof(type);
+
+   src1 = *i->getSrc(1)->asImm();
+
+   // Move the immediate into position 1, where we know it might be
+   // emittable. However it might not be anyways, as there may be other
+   // restrictions, so move it into a separate LValue.
+   bld.setPosition(i, false);
+   i->setSrc(1, bld.mkMov(bld.getSSA(type), i->getSrc(1), type)->getDef(0));
+   i->src(1).mod = Modifier(0);
+
+   if (i->src(0).getImmediate(src0))
+      expr(i, src0, src1);
+   else
+      opnd(i, src1, 1);
+}
+
+void
 ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 {
    const int t = !s;
-- 
2.9.3
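The rewrite performed by `opnd2` can be summarised as: when two of the three ADD3 sources are immediates, add them at compile time and emit a two-source ADD of the remaining operand and the folded constant. The sketch below shows only that arithmetic and operand selection. `Src`, `Add2` and `fold_two_imms` are hypothetical names, and source modifiers and the extra MOV that the patch inserts are left out.

```cpp
#include <cstdint>

// Hypothetical source operand: either an immediate or a register id.
struct Src {
    bool is_imm;
    uint32_t value;
};

// Hypothetical two-source ADD produced by the fold.
struct Add2 {
    Src lhs;
    Src rhs;
};

// Folds the immediates at positions s0 and s1 of an ADD3 source list.
// On success, 'out' holds ADD(remaining source, imm[s0] + imm[s1]).
inline bool fold_two_imms(const Src (&srcs)[3], int s0, int s1, Add2 &out)
{
    if (s0 < 0 || s0 > 2 || s1 < 0 || s1 > 2 || s0 == s1)
        return false;
    if (!srcs[s0].is_imm || !srcs[s1].is_imm)
        return false;

    // The three indices sum to 0 + 1 + 2 = 3, so the one left over is:
    const int rest = 3 - s0 - s1;

    out.lhs = srcs[rest];
    out.rhs = Src{true, srcs[s0].value + srcs[s1].value};  // wraps mod 2^32
    return true;
}
```

Putting the folded constant in the second slot matches the comment in the patch, which notes that position 1 is where an immediate is most likely to be encodable.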
[Mesa-dev] [PATCH] tgsi: Enable returns from within loops
Fixes the following piglit test (for softpipe):

/spec/glsl-1.10/execution/fs-loop-return

Signed-off-by: Lars Hamre
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 4
 1 file changed, 4 insertions(+)

NOTE: Someone with access will need to commit this after the review process

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 1457c06..aff35e6 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -5148,6 +5148,10 @@ exec_instruction(
             /* returning from main() */
             mach->CondStackTop = 0;
             mach->LoopStackTop = 0;
+            mach->ContStackTop = 0;
+            mach->LoopLabelStackTop = 0;
+            mach->SwitchStackTop = 0;
+            mach->BreakStackTop = 0;
             *pc = -1;
             return FALSE;
          }
-- 
2.7.4
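The reason for the fix can be seen in a reduced model: a `return` executed inside a loop leaves entries on every control-flow stack that the loop pushed, not only the two that were already being reset. The sketch below assumes a hypothetical `ExecMachine` with one counter per stack and models only the "return from main()" path.

```cpp
// Hypothetical reduced interpreter state: one "top" index per stack.
struct ExecMachine {
    int cond_stack_top = 0;
    int loop_stack_top = 0;
    int cont_stack_top = 0;
    int loop_label_stack_top = 0;
    int switch_stack_top = 0;
    int break_stack_top = 0;
};

// Handles a return from main(): unwinds every control-flow stack, marks
// the program counter as finished, and tells the caller to stop.
inline bool return_from_main(ExecMachine &m, int &pc)
{
    m.cond_stack_top = 0;
    m.loop_stack_top = 0;
    m.cont_stack_top = 0;        // the four resets below correspond to
    m.loop_label_stack_top = 0;  // the lines added by the patch
    m.switch_stack_top = 0;
    m.break_stack_top = 0;
    pc = -1;
    return false;
}
```

If any of these counters kept a non-zero value, the next run of the same machine would start with stale loop or switch state.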
[Mesa-dev] [Bug 94627] Game Risen on wine black grass
https://bugs.freedesktop.org/show_bug.cgi?id=94627

--- Comment #9 from Heiko Ernst ---
Bug is closed in mesa 12.0.2
[Mesa-dev] [PATCH 2/2] radeonsi/compute: Use the HSA abi for non-TGSI compute shaders v2
This patch switches non-TGSI compute shaders over to using the HSA ABI
described here:

https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md

The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.

The main changes in this patch are:

- We now pass the scratch buffer resource into the shader via user sgprs
  rather than using relocations.

- Grid/Block sizes are now passed to the shader via the dispatch packet
  rather than at the beginning of the kernel arguments.

Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically. However, in Mesa we let the driver do
this work. The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.

v2:
  - Add comments explaining why we are setting certain bits of the scratch
    resource descriptor.
---
 src/gallium/drivers/radeon/r600_pipe_common.c    |   6 +-
 src/gallium/drivers/radeonsi/amd_kernel_code_t.h | 534 +++
 src/gallium/drivers/radeonsi/si_compute.c        | 236 +-
 3 files changed, 758 insertions(+), 18 deletions(-)
 create mode 100644 src/gallium/drivers/radeonsi/amd_kernel_code_t.h

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index 6d7cc1b..8f17f36 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -822,7 +822,11 @@ static int r600_get_compute_param(struct pipe_screen *screen,
                 if (rscreen->family <= CHIP_ARUBA) {
                         triple = "r600--";
                 } else {
-                        triple = "amdgcn--";
+                        if (HAVE_LLVM < 0x0400) {
+                                triple = "amdgcn--";
+                        } else {
+                                triple = "amdgcn--mesa3d";
+                        }
                 }
                 switch(rscreen->family) {
                 /* Clang < 3.6 is missing Hainan in its list of
diff --git a/src/gallium/drivers/radeonsi/amd_kernel_code_t.h b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h
new file mode 100644
index 0000000..d0d7809
--- /dev/null
+++ b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h @@ -0,0 +1,534 @@ +/* + * Copyright 2015,2016 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * on the rights to use, copy, modify, merge, publish, distribute, sub + * license, and/or sell copies of the Software, and to permit persons to whom + * the Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL + * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM, + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE + * USE OR OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#ifndef AMDKERNELCODET_H +#define AMDKERNELCODET_H + +//---// +// AMD Kernel Code, and its dependencies // +//---// + +// Sets val bits for specified mask in specified dst packed instance. +#define AMD_HSA_BITS_SET(dst, mask, val) \ + dst &= (~(1 << mask ## _SHIFT) & ~mask); \ + dst |= (((val) << mask ## _SHIFT) & mask) + +// Gets bits for specified mask from specified src packed instance. +#define AMD_HSA_BITS_GET(src, mask) \ + ((src & mask) >> mask ## _SHIFT) \ + +/* Every amd_*_code_t has the following properties, which are composed of + * a number of bit fields. 
Every bit field has a mask (AMD_CODE_PROPERTY_*), + * bit width (AMD_CODE_PROPERTY_*_WIDTH, and bit shift amount + * (AMD_CODE_PROPERTY_*_SHIFT) for convenient access. Unused bits must be 0. + * + * (Note that bit fields cannot be used as their layout is + * implementation defined in the C standard and so cannot be used to + * specify an ABI) + */ +enum amd_code_property_mask_t { + + /* Enable the setup of the SGPR user data registers + * (AMD_CODE_PROPERTY_ENAB
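The mask/width/shift convention described in the header comment above can be illustrated with a small standalone sketch. It does not reproduce the header's macros; `EXAMPLE_FIELD` and the `field_set`/`field_get` helpers are hypothetical names invented for this illustration, and the field position is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit field occupying bits [5:4]; not taken from the header. */
#define EXAMPLE_FIELD_SHIFT 4
#define EXAMPLE_FIELD_WIDTH 2
#define EXAMPLE_FIELD \
    (((1u << EXAMPLE_FIELD_WIDTH) - 1u) << EXAMPLE_FIELD_SHIFT)

/* Return dst with the masked field replaced by val; other bits are kept. */
static inline uint32_t field_set(uint32_t dst, uint32_t mask,
                                 unsigned shift, uint32_t val)
{
    assert(shift < 32);
    return (dst & ~mask) | ((val << shift) & mask);
}

/* Extract the masked field from src and shift it down to bit 0. */
static inline uint32_t field_get(uint32_t src, uint32_t mask, unsigned shift)
{
    assert(shift < 32);
    return (src & mask) >> shift;
}
```

Because the layout is spelled out with explicit masks and shifts, it does not depend on how a given compiler lays out C bit fields, which is the point the comment makes about specifying an ABI.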
[Mesa-dev] [PATCH 1/2] radeonsi/compute: Add some more debug printfs
---
 src/gallium/drivers/radeonsi/si_compute.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index 5041761..a79c224 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -298,6 +298,9 @@ static bool si_switch_compute_shader(struct si_context *sctx,
 	radeon_emit(cs, config->rsrc1);
 	radeon_emit(cs, config->rsrc2);
 
+	COMPUTE_DBG(sctx->screen, "COMPUTE_PGM_RSRC1: 0x%08x "
+		"COMPUTE_PGM_RSRC2: 0x%08x\n", config->rsrc1, config->rsrc2);
+
 	radeon_set_sh_reg(cs, R_00B860_COMPUTE_TMPRING_SIZE,
 		S_00B860_WAVES(sctx->scratch_waves)
 		| S_00B860_WAVESIZE(config->scratch_bytes_per_wave >> 10));
-- 
2.7.4
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/va: also honors interlaced preference when providing a video format
Andy Furniss wrote: Leo Liu wrote: mpv --vo-vaapi all is apparently OK when playing say a 25fps vid, but I've found that if I push the framerate to refresh rate and do something that draws OSD than image is corrupted, possible many VM faults logged followed by a segfault in u_copy_yv12_img_to_nv12_surf this happens with or without the uv swap patch below. I will file a bug after more investigation. Bisecting mesa goes back to the commit that introduced VAAPI_DISABLE_INTERLACE. We have to be careful, we cannot override preferred interlaced type, got from querying. OK, but it won't work = corrupted but no crash without the env. Ahh, ignore that I see what you mean (I think), but I don't know how to fix it. I notice the env was described as a "temporary solution" in the commit message. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/6] RadeonSI: Let's just stop spilling SGPRs
On Tue, Sep 13, 2016 at 8:16 PM, Nicolai Hähnle wrote: > > On 13.09.2016 19:13, Marek Olšák wrote: >> >> This is quite easy because we just have to get rid of all of >> the preloading at the beginning of shaders. >> >> I also removed preloading of PS inputs with literal indexing, which >> has almost the same effect as sinking interp instructions. >> >> I'm slightly concerned that LICM won't move interps because they are >> not considered speculatively-executable (=movable) by LLVM, but >> the shader-db stats show that it doesn't matter. >> >> LLVM is smart enough to do CSE where needed for both descriptor loads >> and interps. In fact, it's the CSE which is responsible for some of >> the remaining SGPR spills. (It makes sense if you think about it) >> >> The compile time increased by 6% because CSE has a lot more work, >> but it's certainly worth it. >> >> >> shader-db stats: >> >> [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor >> https://people.freedesktop.org/~mareko/no_preload1.html >> [PATCH 5/6] radeonsi: get rid of constant buffer preloading >> https://people.freedesktop.org/~mareko/no_preload2.html >> [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each >> https://people.freedesktop.org/~mareko/no_preload3_ps.html >> >> Total diff: >> https://people.freedesktop.org/~mareko/no_preload_total.html > > > Those numbers are impressive. > > We do have to be slightly careful, I noticed that LLVM didn't lift some > constant loads out of loops with the earlier preload removal, in shaders > where SGPR pressure wasn't an issue at all. > > I think the right way to deal with this is to improve heuristics in LLVM, so > I'm fine with changing Mesa in this way. Yeah. The problem is the LICM (moving stuff out of loops) and Sink (moving stuff forward) passes are no-ops with intrinsics, because intrinsics fail the "isSafeToSpeculativelyExecute" function. 
The trivial fix would be to add a new "movable" flag for intrinsics and process it in that function. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2
Thanks a lot! I'll try that tonight! I have a 64-bit distrib, I don't think so but do I need to compile the 32-bit version of llvm as well (is it because Steam is using 32-bit libraries?). 2016-09-13 13:53 GMT-04:00 Marek Olšák : > LLVM 64-bit: > > mkdir -p build > cd build > cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu > -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=O > -DCMAKE_BUILD_TYPE=RelWithDebInfo > -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \ > -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG > -fno-omit-frame-pointer" \ > -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG > -fno-omit-frame-pointer". > ninja > sudo ninja install > > > LLVM 32-bit: > > mkdir -p build32 > cd build32 > cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/i386-linux-gnu > -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON > -DCMAKE_BUILD_TYPE=RelWithDebInfo > -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \ > -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG > -fno-omit-frame-pointer" \ > -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG > -fno-omit-frame-pointer" \ > -DLLVM_BUILD_32_BITS=ON > ninja > sudo ninja install > # then add /usr/llvm/x86_64-linux-gnu and /usr/llvm/i386-linux-gnu to > ld.conf > > > Mesa configure helper script, it will overwrite the /usr/lib/ files on > Ubuntu (run as-is for 64-bit, or use "-32" for 32-bit): > > if test x$1 = x-32; then > dir=i386-linux-gnu > build=i686-linux-gnu > export CFLAGS="-m32 -O2 -g" > export CXXFLAGS="$CFLAGS" > export LDFLAGS="-L/usr/lib/$dir" > export PKG_CONFIG_PATH="/usr/lib/$dir/pkgconfig" > else > dir=x86_64-linux-gnu > build=$dir > fi > > ./autogen.sh \ > --build=$build --prefix=/usr --libdir=/usr/lib/$dir > --with-llvm-prefix=/usr/llvm/$dir \ > --enable-glx-tls --enable-texture-float --enable-debug --enable-vdpau \ > --disable-xvmc --disable-va --enable-nine --with-sha1=libnettle \ > --with-gallium-drivers=radeonsi,r600,swrast --with-dri-drivers= \ > 
--with-egl-platforms=x11,drm --enable-gles1 --enable-gles2 > > make -j4 > sudo make install > > You'll probably want to delete /usr/lib/$dir/*mesa*/*. That's Ubuntu's > invention that will prevent you from using installed libGL and libEGL. > > It's all kind of a mess, but I don't know of a better way. > > Marek > > > > On Tue, Sep 13, 2016 at 7:33 PM, Romain Failliot > wrote: >> 2016-09-13 12:41 GMT-04:00 Marek Olšák : >>> >>> BTW, If you update LLVM to a newer version, you also have to re-build >>> Mesa, because the LLVM version used by Mesa is determined while Mesa >>> is being built. >>> >>> Also, the chance to rage-quit while building LLVM+Mesa is pretty high >>> if you've never done it before. >> >> I see, is there a tutorial somewhere maybe on how to do that? >> I know how to compile projects, that's not a problem. It's more about the >> little details to make everything work once it's compiled. > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers
On Tue, 2016-09-13 at 19:18 +0100, Emil Velikov wrote:

> For the series as a whole ?
> Two words which contradict any software's stable scheme - new feature.

Disagree, but I'm not the one running Mesa's stable branch, so my opinion doesn't count here.

- ajax
[Mesa-dev] [PATCH] virgl: fix flushing while removing sampler views
From: Marc-André Lureau When updating the sampler views, virgl might need to flush. After flushing, the resources are reattached, however, the sampler enabled_mask isn't yet updated, and some views could be in the process of being removed, which lead to the following crash: Thread 1 "heaven_x64" received signal SIGSEGV, Segmentation fault. 0x7f83893cc3de in virgl_attach_res_sampler_views (vctx=vctx@entry=0x1c22c00, shader_type=shader_type@entry=1) at virgl_context.c:113 113 res = virgl_resource(tinfo->views[i]->base.texture); (gdb) bt #0 0x7f83893cc3de in virgl_attach_res_sampler_views (vctx=vctx@entry=0x1c22c00, shader_type=shader_type@entry=1) at virgl_context.c:113 #1 0x7f83893cc703 in virgl_reemit_res (vctx=0x1c22c00) at virgl_context.c:182 #2 virgl_flush_eq (ctx=ctx@entry=0x1c22c00, closure=0x1c22c00) at virgl_context.c:637 #3 0x7f83893ccbf8 in virgl_flush_from_st (ctx=0x1c22c00, fence=, flags=) at virgl_context.c:659 #4 0x7f83893cd6b0 in virgl_encoder_write_cmd_dword (ctx=ctx@entry=0x1c22c00, dword=dword@entry=67075) at virgl_encode.c:43 #5 0x7f83893cd76b in virgl_encode_delete_object (ctx=0x1c22c00, handle=1306480, object=object@entry=6) at virgl_encode.c:72 #6 0x7f83893ccc81 in virgl_destroy_sampler_view (ctx=, view=0x7aca1b0) at virgl_context.c:741 #7 0x7f83893cca17 in pipe_sampler_view_reference (view=0x0, ptr=0x1c22fc8) at ../../../../src/gallium/auxiliary/util/u_inlines.h:151 #8 virgl_set_sampler_views (ctx=0x1c22c00, shader_type=1, start_slot=, num_views=, views=) at virgl_context.c:724 #9 0x7f8388fffd68 in cso_set_sampler_views (ctx=0x1ca2ee0, shader_stage=, count=9, views=) at cso_cache/cso_context.c:1301 #10 0x7f8388e670c1 in update_textures (st=, mesa_shader=, prog=, max_units=16, sampler_views=0x1c8c140, num_textures=0x1c8c644) at state_tracker/st_atom_texture.c:465 #11 0x7f8388e6296d in st_validate_state (st=st@entry=0x1c8a710, pipeline=pipeline@entry=ST_PIPELINE_RENDER) at state_tracker/st_atom.c:289 #12 0x7f8388e8343b in st_draw_vbo 
(ctx=0x1c50600, prims=0x7ffe99b5a580, nr_prims=1, ib=0x7ffe99b5a560, index_bounds_valid=, min_index=, max_index=, tfb_vertcount=0x0, stream=0, indirect=0x0) at state_tracker/st_draw.c:176 #13 0x7f8388e44d34 in vbo_validated_drawrangeelements (ctx=ctx@entry=0x1c50600, mode=mode@entry=4, index_bounds_valid=index_bounds_valid@entry=0 '\000', start=start@entry=4294967295, end=end@entry=4294967295, count=count@entry=2688, type=5123, indices=0x0, basevertex=0, numInstances=1, baseInstance=0) at vbo/vbo_exec_array.c:849 #14 0x7f8388e44db5 in vbo_exec_DrawElementsInstanced (mode=4, #count=2688, type=5123, indices=0x0, numInstances=1) at #vbo/vbo_exec_array.c:1030 Instead, remove the views from enabled_mask immediately. Signed-off-by: Marc-André Lureau --- src/gallium/drivers/virgl/virgl_context.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/virgl/virgl_context.c b/src/gallium/drivers/virgl/virgl_context.c index 9007583..55144d5 100644 --- a/src/gallium/drivers/virgl/virgl_context.c +++ b/src/gallium/drivers/virgl/virgl_context.c @@ -711,6 +711,7 @@ static void virgl_set_sampler_views(struct pipe_context *ctx, pipe_sampler_view_reference((struct pipe_sampler_view **)&tinfo->views[i], NULL); } + tinfo->enabled_mask &= ~disable_mask; for (i = 0; i < num_views; i++) { struct virgl_sampler_view *grview = virgl_sampler_view(views[i]); @@ -721,12 +722,11 @@ static void virgl_set_sampler_views(struct pipe_context *ctx, new_mask |= 1 << i; pipe_sampler_view_reference((struct pipe_sampler_view **)&tinfo->views[i], views[i]); } else { + tinfo->enabled_mask &= ~(1 << i); pipe_sampler_view_reference((struct pipe_sampler_view **)&tinfo->views[i], NULL); - disable_mask |= 1 << i; } } - tinfo->enabled_mask &= ~disable_mask; tinfo->enabled_mask |= new_mask; virgl_encode_set_sampler_views(vctx, shader_type, start_slot, num_views, tinfo->views); virgl_attach_res_sampler_views(vctx, shader_type); -- 2.10.0 ___ mesa-dev mailing list 
mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
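The ordering problem this patch addresses can be sketched with a toy model. This is not the virgl code: `toy_view`, `toy_textures`, `toy_reattach` and `toy_unbind` are hypothetical names, and the flush that the real driver may trigger while destroying a view is simulated here by calling the reattach step directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VIEWS 16

struct toy_view { int texture_id; };

struct toy_textures {
    struct toy_view *views[MAX_VIEWS];
    uint32_t enabled_mask;          /* bit i set means views[i] is bound */
};

/* Stand-in for the reattach step that runs after a flush: it visits
 * every slot marked enabled and dereferences its view. Returns the
 * number of views visited. */
static int toy_reattach(const struct toy_textures *t)
{
    int visited = 0;
    for (unsigned i = 0; i < MAX_VIEWS; i++) {
        if (!(t->enabled_mask & (1u << i)))
            continue;
        assert(t->views[i] != NULL);    /* would crash if the mask were stale */
        (void)t->views[i]->texture_id;
        visited++;
    }
    return visited;
}

/* Unbind slot i. The mask bit is cleared before the pointer is dropped,
 * so a flush triggered while the view is released never finds a slot
 * that is marked enabled but no longer holds a valid view. */
static int toy_unbind(struct toy_textures *t, unsigned i)
{
    assert(i < MAX_VIEWS);
    t->enabled_mask &= ~(1u << i);
    t->views[i] = NULL;
    return toy_reattach(t);             /* simulated flush during release */
}
```

If the mask were only updated after all references had been dropped, the simulated flush inside `toy_unbind` would walk a slot whose pointer is already gone, which corresponds to the crash in `virgl_attach_res_sampler_views` shown in the backtrace.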
Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers
On 13 September 2016 at 19:14, Adam Jackson wrote:
> On Tue, 2016-09-13 at 17:22 +0100, Emil Velikov wrote:
>
>> Actually, current code has a bunch of such bugs which this series addresses.
>> Considering there's only a couple of those and they are pretty hard to
>> hit I won't bother with respinning the patches.
>>
>> That is unless we want them for stable ?
>
> I mean, I'm going to want this in "stable" because I want to switch to
> libglvnd sooner than later. I'm perfectly capable of applying the
> series to Fedora's Mesa build on my own though.
>
> I guess my question is why applying the whole series to stable wouldn't
> be acceptable.
>
For the series as a whole ?
Two words which contradict any software's stable scheme - new feature.

Emil
Re: [Mesa-dev] [PATCH] st/va: also honors interlaced preference when providing a video format
Leo Liu wrote: On 09/13/2016 01:29 PM, Andy Furniss wrote: Leo Liu wrote: Hi Andy, On 09/13/2016 06:22 AM, Andy Furniss wrote: Zhang, Boyuan wrote: Hi Leo, Christian and Julien, I tested the patch with Vaapi Encoding and Transcoding, it seems working fine. We are using "VAAPI_DISABLE_INTERLACE" env, so interlaced is always disabled. Though I notice it will break screen recording scripts for existing users who previously didn't need the env set but will after this. Totally untested/thought through, but maybe the env should default to on? Agree, can you come up a patch for that? OK, but maybe I should test a bit first to see if anything regresses. Unfortunately I today, by chance found an issue with mpv. With VAAPI_DISABLE_INTERLACE=1 which it needs for mpv --vo-vaapi all is apparently OK when playing say a 25fps vid, but I've found that if I push the framerate to refresh rate and do something that draws OSD than image is corrupted, possible many VM faults logged followed by a segfault in u_copy_yv12_img_to_nv12_surf this happens with or without the uv swap patch below. I will file a bug after more investigation. Bisecting mesa goes back to the commit that introduced VAAPI_DISABLE_INTERLACE. We have to be careful, we cannot override preferred interlaced type, got from querying. OK, but it won't work = corrupted but no crash without the env. Also any outstanding patches for VA-API encode from you was reviewed, but not committed? if any, sent to me, I can push them. There's only https://lists.freedesktop.org/archives/mesa-dev/2016-July/124695.html for the uv swap issue. Done. pushed. Thanks. Not my issue as such, but did anyone notice this from Mark Thompson, who does vaapi for libav/ffmpeg? I notice he didn't keep the CCs so maybe it got missed. https://lists.freedesktop.org/archives/mesa-dev/2016-September/128076.html Did anyone have this reviewed? No - I was thinking that as no one replied at all it had been missed, it's kind of a bug report really. 
Re: [Mesa-dev] [PATCH 0/6] RadeonSI: Let's just stop spilling SGPRs
On 13.09.2016 19:13, Marek Olšák wrote: This is quite easy because we just have to get rid of all of the preloading at the beginning of shaders. I also removed preloading of PS inputs with literal indexing, which has almost the same effect as sinking interp instructions. I'm slightly concerned that LICM won't move interps because they are not considered speculatively-executable (=movable) by LLVM, but the shader-db stats show that it doesn't matter. LLVM is smart enough to do CSE where needed for both descriptor loads and interps. In fact, it's the CSE which is responsible for some of the remaining SGPR spills. (It makes sense if you think about it) The compile time increased by 6% because CSE has a lot more work, but it's certainly worth it. shader-db stats: [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor https://people.freedesktop.org/~mareko/no_preload1.html [PATCH 5/6] radeonsi: get rid of constant buffer preloading https://people.freedesktop.org/~mareko/no_preload2.html [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each https://people.freedesktop.org/~mareko/no_preload3_ps.html Total diff: https://people.freedesktop.org/~mareko/no_preload_total.html Those numbers are impressive. We do have to be slightly careful, I noticed that LLVM didn't lift some constant loads out of loops with the earlier preload removal, in shaders where SGPR pressure wasn't an issue at all. I think the right way to deal with this is to improve heuristics in LLVM, so I'm fine with changing Mesa in this way. Some comments sent in reply to the respective patches. Patches 2, 3, and 5 are Reviewed-by: Nicolai Hähnle Please review. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 14/14] egl: Implement EGL_KHR_debug
On 13 September 2016 at 18:55, Adam Jackson wrote: > On Tue, 2016-09-13 at 17:17 +0100, Emil Velikov wrote: > >> > + } else { >> > + _eglDebugReportFull(EGL_BAD_ALLOC, __func__, __func__, >> > + EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL); >> > + return EGL_BAD_ALLOC; >> > + } >> >> Nit: Please use the same style as the "objectType == >> EGL_OBJECT_DISPLAY_KHR" case. > > AFAICT the reason this code doesn't use RETURN_EGL_ERROR like > everything else is because it doesn't lock the display. Which is > extremely wrong, since we definitely depend on it not going away from > under us later! Fixed in v2. > Hehe, the locking 'issue' mentioned in 09/14 is already upon us. I've completely missed the lack of unlock here. >> Nit: You can also drop the "else" and flatten (indent one level less) >> all of the following code. > > Done in v2. > >> Missing EGLAPIENTRY > > Fixed in v2. > >> > +eglDebugMessageControlKHR(EGLDEBUGPROCKHR callback, >> > + const EGLAttrib *attrib_list) >> > +{ >> > + unsigned int newEnabled; >> > + >> > + _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC); >> > + >> > + mtx_lock(_eglGlobal.Mutex); >> > + >> > + newEnabled = _eglGlobal.debugTypesEnabled; >> > + if (attrib_list != NULL) { >> > + int i; >> > + >> > + for (i = 0; attrib_list[i] != EGL_NONE; i += 2) { >> >> Don't think we check it elsewhere (and/or if we should care too much) but >> still: >> Check if i overflows or use unsigned type ? > > There's a bunch of places where we don't check that... > >> > + if (attrib_list[i] >= EGL_DEBUG_MSG_CRITICAL_KHR && >> > + attrib_list[i] <= EGL_DEBUG_MSG_INFO_KHR) { >> > +if (attrib_list[i + 1]) { >> > + newEnabled |= DebugBitFromType(attrib_list[i]); >> > +} else { >> > + newEnabled &= ~DebugBitFromType(attrib_list[i]); >> > +} >> >> Nit: break; ? > > Nope. You're allowed to set the disposition for multiple error levels > in a single call to DebugMessageControl, so you need to validate them > all. > Right, had a bit of a brain freeze moment. 
>> > +eglQueryDebugKHR(EGLint attribute, EGLAttrib *value) >> > +{ >> > + _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC); >> > + >> > + mtx_lock(_eglGlobal.Mutex); >> > + if (attribute >= EGL_DEBUG_MSG_CRITICAL_KHR && >> > + attribute <= EGL_DEBUG_MSG_INFO_KHR) { >> > + if (_eglGlobal.debugTypesEnabled & DebugBitFromType(attribute)) { >> > + *value = EGL_TRUE; >> > + } else { >> > + *value = EGL_FALSE; >> > + } >> > + } else if (attribute == EGL_DEBUG_CALLBACK_KHR) { >> > + *value = (EGLAttrib) _eglGlobal.debugCallback; >> > + } else { >> > + mtx_unlock(_eglGlobal.Mutex); >> > + _eglReportError(EGL_BAD_ATTRIBUTE, NULL, >> > + "Invalid attribute 0x%04lx", (unsigned long) attribute); >> > + return EGL_FALSE; >> > + } >> >> Nit: Switch statement will be a lot easier to read. > > Meh. I factored out the valid-debug-level check to a helper, at which > point you can't really use a switch. Redone as a do-while so I could > use break to bail out of the success conditions. > Whichever works really. Just pointing out that using if/else chains esp. when the else isn't needed makes things messy/less appealing/etc.. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
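The attribute-list handling discussed in this review (several message levels may be enabled or disabled in a single call, so the loop has to validate every pair rather than break after the first) can be sketched as follows. This is a simplified standalone model, not the Mesa implementation: the `TOY_*` constants stand in for the EGL enums with made-up values, and `toy_apply_attribs` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up stand-ins for EGL_NONE and the four debug message levels. */
enum {
    TOY_NONE = 0,
    TOY_MSG_CRITICAL = 100,
    TOY_MSG_ERROR,
    TOY_MSG_WARN,
    TOY_MSG_INFO
};

/* Map a message level to its bit in the enabled mask. */
static unsigned toy_bit_from_type(intptr_t type)
{
    return 1u << (unsigned)(type - TOY_MSG_CRITICAL);
}

/* Walk a TOY_NONE-terminated list of (level, on/off) pairs.
 * Returns 0 on success and -1 on an unknown attribute. The caller's
 * mask is only updated once the whole list has been validated. */
static int toy_apply_attribs(const intptr_t *attribs, unsigned *enabled)
{
    assert(enabled != NULL);
    unsigned new_enabled = *enabled;

    if (attribs != NULL) {
        for (size_t i = 0; attribs[i] != TOY_NONE; i += 2) {
            if (attribs[i] < TOY_MSG_CRITICAL || attribs[i] > TOY_MSG_INFO)
                return -1;
            if (attribs[i + 1])
                new_enabled |= toy_bit_from_type(attribs[i]);
            else
                new_enabled &= ~toy_bit_from_type(attribs[i]);
        }
    }

    *enabled = new_enabled;
    return 0;
}
```

Using an unsigned index (`size_t` here) also sidesteps the signed-overflow question raised in the review for very long lists.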
Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers
On Tue, 2016-09-13 at 17:22 +0100, Emil Velikov wrote:

> Actually, current code has a bunch of such bugs which this series addresses.
> Considering there's only a couple of those and they are pretty hard to
> hit I won't bother with respinning the patches.
>
> That is unless we want them for stable ?

I mean, I'm going to want this in "stable" because I want to switch to libglvnd sooner than later. I'm perfectly capable of applying the series to Fedora's Mesa build on my own though.

I guess my question is why applying the whole series to stable wouldn't be acceptable.

- ajax
Re: [Mesa-dev] [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each use
On 13.09.2016 19:13, Marek Olšák wrote: From: Marek Olšák The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute. 26011 shaders in 14651 tests Totals: SGPRS: 1146340 -> 1132676 (-1.19 %) VGPRS: 727371 -> 711730 (-2.15 %) Spilled SGPRs: 2218 -> 2078 (-6.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35841268 -> 36009732 (0.47 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222559 -> 224779 (1.00 %) Wait states: 0 -> 0 (0.00 %) --- src/gallium/drivers/radeon/radeon_llvm.h | 6 - .../drivers/radeon/radeon_setup_tgsi_llvm.c| 28 ++ src/gallium/drivers/radeonsi/si_shader.c | 27 + 3 files changed, 39 insertions(+), 22 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h index da5b7f5..f508d32 100644 --- a/src/gallium/drivers/radeon/radeon_llvm.h +++ b/src/gallium/drivers/radeon/radeon_llvm.h @@ -23,21 +23,23 @@ * Authors: Tom Stellard * */ #ifndef RADEON_LLVM_H #define RADEON_LLVM_H #include #include "gallivm/lp_bld_init.h" #include "gallivm/lp_bld_tgsi.h" +#include "tgsi/tgsi_parse.h" +#define RADEON_LLVM_MAX_INPUT_SLOTS 32 #define RADEON_LLVM_MAX_INPUTS 32 * 4 #define RADEON_LLVM_MAX_OUTPUTS 32 * 4 #define RADEON_LLVM_INITIAL_CF_DEPTH 4 #define RADEON_LLVM_MAX_SYSTEM_VALUES 4 struct radeon_llvm_branch { LLVMBasicBlockRef endif_block; LLVMBasicBlockRef if_block; @@ -55,33 +57,35 @@ struct radeon_llvm_context { /*=== Front end configuration ===*/ /* Instructions that are not described by any of the TGSI opcodes. */ /** This function is responsible for initilizing the inputs array and will be * called once for each input declared in the TGSI shader. 
*/ void (*load_input)(struct radeon_llvm_context *, unsigned input_index, - const struct tgsi_full_declaration *decl); + const struct tgsi_full_declaration *decl, + LLVMValueRef out[4]); void (*load_system_value)(struct radeon_llvm_context *, unsigned index, const struct tgsi_full_declaration *decl); void (*declare_memory_region)(struct radeon_llvm_context *, const struct tgsi_full_declaration *decl); /** This array contains the input values for the shader. Typically these * values will be in the form of a target intrinsic that will inform the * backend how to load the actual inputs to the shader. */ + struct tgsi_full_declaration input_decls[RADEON_LLVM_MAX_INPUT_SLOTS]; LLVMValueRef inputs[RADEON_LLVM_MAX_INPUTS]; LLVMValueRef outputs[RADEON_LLVM_MAX_OUTPUTS][TGSI_NUM_CHANNELS]; /** This pointer is used to contain the temporary values. * The amount of temporary used in tgsi can't be bound to a max value and * thus we must allocate this array at runtime. */ LLVMValueRef *temps; unsigned temps_count; LLVMValueRef system_values[RADEON_LLVM_MAX_SYSTEM_VALUES]; diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 4643e6d..11f0cf2 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -439,28 +439,43 @@ LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base, bld_base->int_bld.zero); result = LLVMConstInsertElement(result, bld->immediates[reg->Register.Index][swizzle + 1], bld_base->int_bld.one); return LLVMConstBitCast(result, ctype); } else { return LLVMConstBitCast(bld->immediates[reg->Register.Index][swizzle], ctype); } } - case TGSI_FILE_INPUT: - result = ctx->inputs[radeon_llvm_reg_index_soa(reg->Register.Index, swizzle)]; + case TGSI_FILE_INPUT: { + unsigned index = reg->Register.Index; + LLVMValueRef input[4]; + + /* I don't think doing this for vertex shaders is beneficial. 
+* For those, we want to make sure the VMEM loads are executed +* only once. Fragment shaders don't care much, because +* v_interp instructions are much cheaper than VMEM loads. +*/ + if (ctx->soa.bld_base.info->processor == PIPE_SHADER_FRAGMENT) + ctx->load_input(ctx, index, &ctx->input_decls[index], input); + el
Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2
Marek Olšák wrote on 13.09.2016 19:53: > LLVM 64-bit: > > mkdir -p build > cd build > cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu > -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=O > -DCMAKE_BUILD_TYPE=RelWithDebInfo > -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \ > -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG > -fno-omit-frame-pointer" \ > -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG > -fno-omit-frame-pointer". > ninja > sudo ninja install > > > LLVM 32-bit: > > mkdir -p build32 > cd build32 > cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/i386-linux-gnu > -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON > -DCMAKE_BUILD_TYPE=RelWithDebInfo > -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \ > -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG > -fno-omit-frame-pointer" \ > -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG > -fno-omit-frame-pointer" \ > -DLLVM_BUILD_32_BITS=ON > ninja > sudo ninja install > # then add /usr/llvm/x86_64-linux-gnu and /usr/llvm/i386-linux-gnu to > ld.conf > > > Mesa configure helper script, it will overwrite the /usr/lib/ files on > Ubuntu (run as-is for 64-bit, or use "-32" for 32-bit): Just a note about Debian/Ubuntu (even though I don't think that's too interesting for Romain): Well, if you build for Debian or a derivative like Ubuntu and you do not need many versions in parallel, ie. as a user, then the best option (IMHO) is using Debian's package and building it with the current upstream (plus some odd changes here and there for new symbols and such). LLVM you mostly don't need to build (in that case), because you can just use the packages from apt.llvm.org, which Sylvestre, the Debian LLVM maintainer, builds for various Debian and Ubuntu flavours. (And that saves a lot of the rage-quit potential IMHO, since building LLVM can take a while and fail in very inopportune moments.) 
That way you can switch cleanly back to your distros packages, if something breaks (or they catch up far enough that you want to stop building Mesa yourself). Anyway, this is how I do my builds: LLVM only if I have to, in case apt.llvm.org is currently outdated (happens occasionally) or I'm testing a patch for upstream inclusion. And Mesa I build regularly by just using git-buildpackage with the Debian package as a base and a local branch for current Mesa git plus a local "Debian branch" including different configuration (eg. I was already building the VA-API stuff for myself before Debian started doing it) or patches (again: those I'm testing for upstream inclusion), if any. Mesa builds are quite fast, only a couple of minutes in a clean chroot environment (pbuilder!), so it's not nearly as annoying as building LLVM. If there's interest, I could probably whip some guide up, which could be shipped in Mesa's doc directory? By the way, since nobody mentioned this so far: if you want OpenCL support you're going to need to rebuild libclc as well. Your distro's libclc was built against the LLVM it ships. Cheers, Kai signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor preloading
On 13.09.2016 19:13, Marek Olšák wrote: From: Marek Olšák 26011 shaders in 14651 tests Totals: SGPRS: 1251920 -> 1152636 (-7.93 %) VGPRS: 728421 -> 728198 (-0.03 %) Spilled SGPRs: 16644 -> 3776 (-77.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 36001064 -> 35835152 (-0.46 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 21 -> 222372 (0.07 %) Wait states: 0 -> 0 (0.00 %) --- src/gallium/drivers/radeonsi/si_shader.c | 123 +++ 1 file changed, 28 insertions(+), 95 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 3f77714..c96c52e 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -100,25 +100,20 @@ struct si_shader_context LLVMTargetMachineRef tm; unsigned invariant_load_md_kind; unsigned range_md_kind; unsigned uniform_md_kind; LLVMValueRef empty_md; /* Preloaded descriptors. */ LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS]; - LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS]; - LLVMValueRef sampler_views[SI_NUM_SAMPLERS]; - LLVMValueRef sampler_states[SI_NUM_SAMPLERS]; - LLVMValueRef fmasks[SI_NUM_SAMPLERS]; - LLVMValueRef images[SI_NUM_IMAGES]; LLVMValueRef esgs_ring; LLVMValueRef gsvs_ring[4]; LLVMValueRef lds; LLVMValueRef gs_next_vertex[4]; LLVMValueRef return_value; LLVMTypeRef voidt; LLVMTypeRef i1; LLVMTypeRef i8; @@ -3420,30 +3415,32 @@ static void membar_emit( struct si_shader_context *ctx = si_shader_context(bld_base); emit_waitcnt(ctx); } static LLVMValueRef shader_buffer_fetch_rsrc(struct si_shader_context *ctx, const struct tgsi_full_src_register *reg) { LLVMValueRef ind_index; - LLVMValueRef rsrc_ptr; + LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, +SI_PARAM_SHADER_BUFFERS); - if (!reg->Register.Indirect) - return ctx->shader_buffers[reg->Register.Index]; + if (!reg->Register.Indirect) { + ind_index = LLVMConstInt(ctx->i32, reg->Register.Index, 0); + 
return build_indexed_load_const(ctx, rsrc_ptr, ind_index); + } ind_index = get_bounded_indirect_index(ctx, ®->Indirect, reg->Register.Index, SI_NUM_SHADER_BUFFERS); - rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_SHADER_BUFFERS); return build_indexed_load_const(ctx, rsrc_ptr, ind_index); The calls to build_indexed_load_const can be further unified. } static bool tgsi_is_array_sampler(unsigned target) { return target == TGSI_TEXTURE_1D_ARRAY || target == TGSI_TEXTURE_SHADOW1D_ARRAY || target == TGSI_TEXTURE_2D_ARRAY || target == TGSI_TEXTURE_SHADOW2D_ARRAY || target == TGSI_TEXTURE_CUBE_ARRAY || @@ -3493,46 +3490,54 @@ static LLVMValueRef force_dcc_off(struct si_shader_context *ctx, * Load the resource descriptor for \p image. */ static void image_fetch_rsrc( struct lp_build_tgsi_context *bld_base, const struct tgsi_full_src_register *image, bool dcc_off, LLVMValueRef *rsrc) { struct si_shader_context *ctx = si_shader_context(bld_base); + LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, +SI_PARAM_IMAGES); assert(image->Register.File == TGSI_FILE_IMAGE); if (!image->Register.Indirect) { - /* Fast path: use preloaded resources */ - *rsrc = ctx->images[image->Register.Index]; + struct tgsi_shader_info *info = &ctx->shader->selector->info; + int i = image->Register.Index; + LLVMValueRef index = LLVMConstInt(ctx->i32, i, 0); + + /* Rely on LLVM to shrink the load for buffer resources. */ + *rsrc = build_indexed_load_const(ctx, rsrc_ptr, index); + + if (info->images_writemask & (1 << i) && + !(info->images_buffers & (1 << i))) + *rsrc = force_dcc_off(ctx, *rsrc); } else { /* Indexing and manual load */ LLVMValueRef ind_index; - LLVMValueRef rsrc_ptr; LLVMValueRef tmp; /* From the GL_ARB_shader_image_load_store extension spec: * *If a shader performs an image load, store, or atomic *operation using an image variable declared as an array, *and if the index used to select an individual element is *negative or greater than or equal to th
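For readers checking the shader-db totals quoted in this message, the percentage column appears to be the plain relative change between the two counts. A minimal sketch, assuming that interpretation (`percent_change` is a hypothetical helper, not part of shader-db):

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

With the numbers from the "Spilled SGPRs" row, `percent_change(16644, 3776)` comes out at about -77.31, and the "SGPRS" row (`1251920 -> 1152636`) at about -7.93, matching the figures in the table.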
Re: [Mesa-dev] [PATCH] st/va: also honors interlaced preference when providing a video format
On 09/13/2016 01:29 PM, Andy Furniss wrote:

Leo Liu wrote:

Hi Andy,

On 09/13/2016 06:22 AM, Andy Furniss wrote:

Zhang, Boyuan wrote:

Hi Leo, Christian and Julien,

I tested the patch with Vaapi Encoding and Transcoding, and it seems to be working fine. We are using the "VAAPI_DISABLE_INTERLACE" env, so interlaced is always disabled.

Though I notice it will break screen recording scripts for existing users who previously didn't need the env set but will after this. Totally untested/thought through, but maybe the env should default to on?

Agree, can you come up with a patch for that?

OK, but maybe I should test a bit first to see if anything regresses.

Unfortunately, today I found by chance an issue with mpv. With VAAPI_DISABLE_INTERLACE=1, which it needs for mpv --vo-vaapi, all is apparently OK when playing, say, a 25fps vid, but I've found that if I push the framerate to the refresh rate and do something that draws the OSD, then the image is corrupted, possibly with many VM faults logged, followed by a segfault in u_copy_yv12_img_to_nv12_surf. This happens with or without the uv swap patch below. I will file a bug after more investigation. Bisecting mesa goes back to the commit that introduced VAAPI_DISABLE_INTERLACE.

We have to be careful: we cannot override the preferred interlaced type obtained from querying.

Also, are there any outstanding patches for VA-API encode from you that were reviewed but not committed? If any, send them to me and I can push them.

There's only https://lists.freedesktop.org/archives/mesa-dev/2016-July/124695.html for the uv swap issue.

Done, pushed.

Not my issue as such, but did anyone notice this from Mark Thompson, who does vaapi for libav/ffmpeg? I notice he didn't keep the CCs so maybe it got missed.

https://lists.freedesktop.org/archives/mesa-dev/2016-September/128076.html

Did anyone have this reviewed?

Regards,
Leo

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] intel/blorp: Stop setting 3DSTATE_DRAWING_RECTANGLE
On Mon, Sep 12, 2016 at 3:50 PM, Jason Ekstrand wrote: > The Vulkan driver sets 3DSTATE_DRAWING_RECTANGLE once to MAX_INT x MAX_INT > at the GPU initialization time and never sets it again. The GL driver sets > it every time the framebuffer changes. Originally, blorp set it to the > size of the drawing area but meant we had to set it back in the Vulkan > driver. Instead, we can easily just do that in the GL driver's blorp_exec > implementation and not set it in blorp core. > > Signed-off-by: Jason Ekstrand > --- > src/intel/blorp/blorp_genX_exec.h | 5 - > src/intel/vulkan/genX_blorp_exec.c | 15 --- > src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 + > 3 files changed, 5 insertions(+), 20 deletions(-) > > diff --git a/src/intel/blorp/blorp_genX_exec.h > b/src/intel/blorp/blorp_genX_exec.h > index aff59e1..eb4a5b9 100644 > --- a/src/intel/blorp/blorp_genX_exec.h > +++ b/src/intel/blorp/blorp_genX_exec.h > @@ -1216,11 +1216,6 @@ blorp_exec(struct blorp_batch *batch, const struct > blorp_params *params) >clear.DepthClearValue = params->depth.clear_color.u32[0]; > } > > - blorp_emit(batch, GENX(3DSTATE_DRAWING_RECTANGLE), rect) { > - rect.ClippedDrawingRectangleXMax = MAX2(params->x1, params->x0) - 1; > - rect.ClippedDrawingRectangleYMax = MAX2(params->y1, params->y0) - 1; > - } > - > blorp_emit(batch, GENX(3DPRIMITIVE), prim) { >prim.VertexAccessType = SEQUENTIAL; >prim.PrimitiveTopologyType = _3DPRIM_RECTLIST; > diff --git a/src/intel/vulkan/genX_blorp_exec.c > b/src/intel/vulkan/genX_blorp_exec.c > index a3ad97a..5ddbb7d 100644 > --- a/src/intel/vulkan/genX_blorp_exec.c > +++ b/src/intel/vulkan/genX_blorp_exec.c > @@ -203,21 +203,6 @@ genX(blorp_exec)(struct blorp_batch *batch, > > blorp_exec(batch, params); > > - /* BLORP sets DRAWING_RECTANGLE but we always want it set to the maximum. > -* Since we set it once at driver init and never again, we have to set it > -* back after invoking blorp. 
> -* > -* TODO: BLORP should assume a max drawing rectangle > -*/ > - blorp_emit(batch, GENX(3DSTATE_DRAWING_RECTANGLE), rect) { > - rect.ClippedDrawingRectangleYMin = 0; > - rect.ClippedDrawingRectangleXMin = 0; > - rect.ClippedDrawingRectangleYMax = UINT16_MAX; > - rect.ClippedDrawingRectangleXMax = UINT16_MAX; > - rect.DrawingRectangleOriginY = 0; > - rect.DrawingRectangleOriginX = 0; > - } > - > cmd_buffer->state.vb_dirty = ~0; > cmd_buffer->state.dirty = ~0; > cmd_buffer->state.push_constants_dirty = ~0; > diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c > b/src/mesa/drivers/dri/i965/genX_blorp_exec.c > index 8cd5a62..edcd896 100644 > --- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c > +++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c > @@ -206,6 +206,11 @@ retry: > > brw_emit_depth_stall_flushes(brw); > > + blorp_emit(batch, GENX(3DSTATE_DRAWING_RECTANGLE), rect) { > + rect.ClippedDrawingRectangleXMax = MAX2(params->x1, params->x0) - 1; > + rect.ClippedDrawingRectangleYMax = MAX2(params->y1, params->y0) - 1; > + } > + > blorp_exec(batch, params); > > /* Make sure we didn't wrap the batch unintentionally, and make sure we > -- > 2.5.0.400.gff86faf > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev Reviewed-by: Anuj Phogat ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
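The two drawing-rectangle policies discussed in this patch can be sketched in isolation. This is a minimal sketch, not the driver code: `struct drawing_rect`, `struct blit_rect`, `drawing_rect_for_blit` and `drawing_rect_max` are hypothetical names, and the fields only loosely mirror the ones that appear in the diff. It assumes the max fields are inclusive, which is what the `- 1` in the patch suggests.

```c
#include <assert.h>
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-in for the 3DSTATE_DRAWING_RECTANGLE fields in the diff. */
struct drawing_rect {
   uint32_t x_min, y_min;
   uint32_t x_max, y_max; /* assumed inclusive, hence the "- 1" below */
};

/* Hypothetical subset of the blorp parameters: the destination rectangle. */
struct blit_rect {
   uint32_t x0, y0, x1, y1;
};

/* GL-style policy: size the rectangle to the area being drawn. */
static struct drawing_rect
drawing_rect_for_blit(const struct blit_rect *r)
{
   /* A rectangle whose extent is 0 would underflow the subtraction. */
   assert(MAX2(r->x0, r->x1) > 0 && MAX2(r->y0, r->y1) > 0);

   struct drawing_rect rect = { 0, 0, 0, 0 };
   rect.x_max = MAX2(r->x0, r->x1) - 1;
   rect.y_max = MAX2(r->y0, r->y1) - 1;
   return rect;
}

/* Vulkan-style policy: program the maximum once and never touch it again. */
static struct drawing_rect
drawing_rect_max(void)
{
   struct drawing_rect rect = { 0, 0, UINT16_MAX, UINT16_MAX };
   return rect;
}
```

With the emit moved out of the shared core, each driver picks one of these policies in its own blorp_exec wrapper, so the Vulkan side no longer has to restore the maximum after every blorp operation.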
Re: [Mesa-dev] [PATCH 1/6] radeonsi: load streamout buffer descriptors before use
On 13.09.2016 19:13, Marek Olšák wrote: From: Marek Olšák --- src/gallium/drivers/radeonsi/si_shader.c | 67 1 file changed, 34 insertions(+), 33 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index be6fae7..b9ad4be 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -105,21 +105,20 @@ struct si_shader_context unsigned uniform_md_kind; LLVMValueRef empty_md; LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS]; LLVMValueRef lds; LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS]; LLVMValueRef sampler_views[SI_NUM_SAMPLERS]; LLVMValueRef sampler_states[SI_NUM_SAMPLERS]; LLVMValueRef fmasks[SI_NUM_SAMPLERS]; LLVMValueRef images[SI_NUM_IMAGES]; - LLVMValueRef so_buffers[4]; LLVMValueRef esgs_ring; LLVMValueRef gsvs_ring[4]; LLVMValueRef gs_next_vertex[4]; LLVMValueRef return_value; LLVMTypeRef voidt; LLVMTypeRef i1; LLVMTypeRef i8; LLVMTypeRef i32; LLVMTypeRef i64; @@ -2253,31 +2252,64 @@ static void si_dump_streamout(struct pipe_stream_output_info *so) i, so->output[i].output_buffer, so->output[i].dst_offset, so->output[i].dst_offset + so->output[i].num_components - 1, so->output[i].register_index, mask & 1 ? "x" : "", mask & 2 ? "y" : "", mask & 4 ? "z" : "", mask & 8 ? "w" : ""); } } +static void load_streamout_descriptors(struct si_shader_context *ctx, + LLVMValueRef so_buffers[4]) +{ + struct lp_build_tgsi_context *bld_base = &ctx->radeon_bld.soa.bld_base; + struct gallivm_state *gallivm = bld_base->base.gallivm; + unsigned i; + + /* Streamout can only be used if the shader is compiled as VS. */ + if (!ctx->shader->selector->so.num_outputs || + (ctx->type == PIPE_SHADER_VERTEX && +(ctx->shader->key.vs.as_es || + ctx->shader->key.vs.as_ls)) || + (ctx->type == PIPE_SHADER_TESS_EVAL && +ctx->shader->key.tes.as_es)) + return; This should probably be an assertion now. 
Cheers Nicolai + + LLVMValueRef buf_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, + SI_PARAM_RW_BUFFERS); + + /* Load the resources, we rely on the code sinking to do the rest */ + for (i = 0; i < 4; ++i) { + if (ctx->shader->selector->so.stride[i]) { + LLVMValueRef offset = lp_build_const_int32(gallivm, + SI_VS_STREAMOUT_BUF0 + i); + + so_buffers[i] = build_indexed_load_const(ctx, buf_ptr, offset); + } + } +} + /* On SI, the vertex shader is responsible for writing streamout data * to buffers. */ static void si_llvm_emit_streamout(struct si_shader_context *ctx, struct si_shader_output_values *outputs, unsigned noutput) { struct pipe_stream_output_info *so = &ctx->shader->selector->so; struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm; LLVMBuilderRef builder = gallivm->builder; int i, j; struct lp_build_if_state if_ctx; + LLVMValueRef so_buffers[4]; + + load_streamout_descriptors(ctx, so_buffers); /* Get bits [22:16], i.e. (so_param >> 16) & 127; */ LLVMValueRef so_vtx_count = unpack_param(ctx, ctx->param_streamout_config, 16, 7); LLVMValueRef tid = get_thread_id(ctx); /* can_emit = tid < so_vtx_count; */ LLVMValueRef can_emit = LLVMBuildICmp(builder, LLVMIntULT, tid, so_vtx_count, ""); @@ -2359,21 +2391,21 @@ static void si_llvm_emit_streamout(struct si_shader_context *ctx, } break; } LLVMValueRef can_emit_stream = LLVMBuildICmp(builder, LLVMIntEQ, stream_id, lp_build_const_int32(gallivm, stream), ""); lp_build_if(&if_ctx_stream, gallivm, can_emit_stream); - build_tbuffer_store_dwords(ctx, ctx->so_buffers[buf_idx], + build_tbuffer_store_dwords(ctx, so_buffers[buf_idx], vdata, num_comps, so_write_offset[buf_idx], LLVMConstInt(ctx->i32, 0, 0),
[Mesa-dev] [PATCH 08/14] egl: Factor out _eglCreateImageCommon (v2)
From: Kyle Brenneman v2: - Pass disp to RETURN_EGL_ERROR so we unlock the display --- src/egl/main/eglapi.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c index a74e5e4..ba4826a 100644 --- a/src/egl/main/eglapi.c +++ b/src/egl/main/eglapi.c @@ -1309,11 +1309,10 @@ eglReleaseThread(void) } -static EGLImage EGLAPIENTRY -eglCreateImageKHR(EGLDisplay dpy, EGLContext ctx, EGLenum target, +static EGLImage +_eglCreateImageCommon(_EGLDisplay *disp, EGLContext ctx, EGLenum target, EGLClientBuffer buffer, const EGLint *attr_list) { - _EGLDisplay *disp = _eglLockDisplay(dpy); _EGLContext *context = _eglLookupContext(ctx, disp); _EGLDriver *drv; _EGLImage *img; @@ -1337,18 +1336,27 @@ eglCreateImageKHR(EGLDisplay dpy, EGLContext ctx, EGLenum target, RETURN_EGL_EVAL(disp, ret); } +static EGLImage EGLAPIENTRY +eglCreateImageKHR(EGLDisplay dpy, EGLContext ctx, EGLenum target, + EGLClientBuffer buffer, const EGLint *attr_list) +{ + _EGLDisplay *disp = _eglLockDisplay(dpy); + return _eglCreateImageCommon(disp, ctx, target, buffer, attr_list); +} + EGLImage EGLAPIENTRY eglCreateImage(EGLDisplay dpy, EGLContext ctx, EGLenum target, EGLClientBuffer buffer, const EGLAttrib *attr_list) { + _EGLDisplay *disp = _eglLockDisplay(dpy); EGLImage image; EGLint *int_attribs = _eglConvertAttribsToInt(attr_list); if (attr_list && !int_attribs) - RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, EGL_NO_IMAGE); + RETURN_EGL_ERROR(disp, EGL_BAD_ALLOC, EGL_NO_IMAGE); - image = eglCreateImageKHR(dpy, ctx, target, buffer, int_attribs); + image = _eglCreateImageCommon(disp, ctx, target, buffer, int_attribs); free(int_attribs); return image; } -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
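The locking shape of this refactor, as I read the patch, is that each public entry point takes the display lock exactly once and hands the locked display to a shared helper, which releases it on return. The sketch below is a toy model under that assumption: `struct display`, `lock_display`, `unlock_display` and the `create_image*` functions are hypothetical stand-ins, not the EGL API, and a counter replaces the real mutex so the lock balance can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical display with a counter standing in for a non-recursive mutex. */
struct display {
   int lock_depth;
};

static struct display the_display;

static struct display *
lock_display(void)
{
   /* Locking twice would deadlock a real non-recursive mutex. */
   assert(the_display.lock_depth == 0);
   the_display.lock_depth++;
   return &the_display;
}

static void
unlock_display(struct display *disp)
{
   if (disp)
      disp->lock_depth--;
}

/* Shared helper: expects a locked display and unlocks it before returning. */
static int
create_image_common(struct display *disp, const int *attrs)
{
   int image = attrs ? attrs[0] : 1; /* toy "handle" */
   unlock_display(disp);
   return image;
}

/* Entry point taking int attributes: lock once, then delegate. */
static int
create_image_khr(const int *attrs)
{
   struct display *disp = lock_display();
   return create_image_common(disp, attrs);
}

/* Entry point taking wider attributes: convert, then delegate.  It must not
 * call create_image_khr(), which would try to take the lock a second time. */
static int
create_image(const long *attrs)
{
   struct display *disp = lock_display();
   int *int_attrs = NULL;

   if (attrs) {
      int_attrs = malloc(sizeof(*int_attrs));
      if (!int_attrs) {
         unlock_display(disp); /* the error path must also release the lock */
         return 0;
      }
      int_attrs[0] = (int) attrs[0];
   }

   int image = create_image_common(disp, int_attrs);
   free(int_attrs);
   return image;
}
```

The error path is the part the v2 note calls out: once the entry point owns the lock, every early return has to go through something that releases it.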
[Mesa-dev] [PATCH 14/14] egl: Implement EGL_KHR_debug (v2)
From: Kyle Brenneman Wire up the debug entrypoints to EGL dispatch, and add the extension string to the client extension list. v2: - Lots of style fixes - Fix missing EGLAPIENTRYs - Factor out valid attribute check - Lock display in eglLabelObjectKHR as needed, and use RETURN_EGL_* - Move "EGL_KHR_debug" into asciibetical order in client extension string --- src/egl/main/eglapi.c | 145 ++ src/egl/main/eglglobals.c | 1 + 2 files changed, 146 insertions(+) diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c index 216b289..7162039 100644 --- a/src/egl/main/eglapi.c +++ b/src/egl/main/eglapi.c @@ -1977,6 +1977,148 @@ eglExportDMABUFImageMESA(EGLDisplay dpy, EGLImage image, RETURN_EGL_EVAL(disp, ret); } +static EGLint EGLAPIENTRY +eglLabelObjectKHR(EGLDisplay dpy, EGLenum objectType, EGLObjectKHR object, + EGLLabelKHR label) +{ + _EGLDisplay *disp = NULL; + _EGLResourceType type; + + _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC); + + if (objectType == EGL_OBJECT_THREAD_KHR) { + _EGLThreadInfo *t = _eglGetCurrentThread(); + + if (!_eglIsCurrentThreadDummy()) { + t->Label = label; + return EGL_SUCCESS; + } + + RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, EGL_BAD_ALLOC); + } + + disp = _eglLockDisplay(dpy); + if (disp == NULL) + RETURN_EGL_ERROR(disp, EGL_BAD_DISPLAY, EGL_BAD_DISPLAY); + + if (objectType == EGL_OBJECT_DISPLAY_KHR) { + if (dpy != (EGLDisplay) object) + RETURN_EGL_ERROR(disp, EGL_BAD_PARAMETER, EGL_BAD_PARAMETER); + + disp->Label = label; + RETURN_EGL_EVAL(disp, EGL_SUCCESS); + } + + switch (objectType) { + case EGL_OBJECT_CONTEXT_KHR: + type = _EGL_RESOURCE_CONTEXT; + break; + case EGL_OBJECT_SURFACE_KHR: + type = _EGL_RESOURCE_SURFACE; + break; + case EGL_OBJECT_IMAGE_KHR: + type = _EGL_RESOURCE_IMAGE; + break; + case EGL_OBJECT_SYNC_KHR: + type = _EGL_RESOURCE_SYNC; + break; + case EGL_OBJECT_STREAM_KHR: + default: + RETURN_EGL_ERROR(disp, EGL_BAD_PARAMETER, EGL_BAD_PARAMETER); + } + + if (_eglCheckResource(object, type, disp)) { + _EGLResource 
*res = (_EGLResource *) object; + + res->Label = label; + RETURN_EGL_EVAL(disp, EGL_SUCCESS); + } + + RETURN_EGL_ERROR(disp, EGL_BAD_PARAMETER, EGL_BAD_PARAMETER); +} + +static EGLBoolean +validDebugMessageLevel(EGLAttrib level) +{ + return (level >= EGL_DEBUG_MSG_CRITICAL_KHR && + level <= EGL_DEBUG_MSG_INFO_KHR); +} + +static EGLint EGLAPIENTRY +eglDebugMessageControlKHR(EGLDEBUGPROCKHR callback, + const EGLAttrib *attrib_list) +{ + unsigned int newEnabled; + + _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC); + + mtx_lock(_eglGlobal.Mutex); + + newEnabled = _eglGlobal.debugTypesEnabled; + if (attrib_list != NULL) { + int i; + + for (i = 0; attrib_list[i] != EGL_NONE; i += 2) { + if (validDebugMessageLevel(attrib_list[i])) { +if (attrib_list[i + 1]) + newEnabled |= DebugBitFromType(attrib_list[i]); +else + newEnabled &= ~DebugBitFromType(attrib_list[i]); +continue; + } + + // On error, set the last error code, call the current + // debug callback, and return the error code. + mtx_unlock(_eglGlobal.Mutex); + _eglReportError(EGL_BAD_ATTRIBUTE, NULL, + "Invalid attribute 0x%04lx", (unsigned long) attrib_list[i]); + return EGL_BAD_ATTRIBUTE; + } + } + + if (callback != NULL) { + _eglGlobal.debugCallback = callback; + _eglGlobal.debugTypesEnabled = newEnabled; + } else { + _eglGlobal.debugCallback = NULL; + _eglGlobal.debugTypesEnabled = _EGL_DEBUG_BIT_CRITICAL | _EGL_DEBUG_BIT_ERROR; + } + + mtx_unlock(_eglGlobal.Mutex); + return EGL_SUCCESS; +} + +static EGLBoolean EGLAPIENTRY +eglQueryDebugKHR(EGLint attribute, EGLAttrib *value) +{ + _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC); + + mtx_lock(_eglGlobal.Mutex); + + do { + if (validDebugMessageLevel(attribute)) { + if (_eglGlobal.debugTypesEnabled & DebugBitFromType(attribute)) +*value = EGL_TRUE; + else +*value = EGL_FALSE; + break; + } + + if (attribute == EGL_DEBUG_CALLBACK_KHR) { + *value = (EGLAttrib) _eglGlobal.debugCallback; + break; + } + + mtx_unlock(_eglGlobal.Mutex); + 
_eglReportError(EGL_BAD_ATTRIBUTE, NULL, + "Invalid attribute 0x%04lx", (unsigned long) attribute); + return EGL_FALSE; + } while (0); + + mtx_unlock(_eglGlobal.Mutex); + return EGL_TRUE; +} + __eglMustCastToProperFunctionPointerType EGLAPIENTRY eglGetProcAddress(const char *procname) { @@ -2056,6 +2198,9 @@ eglGetProcAddress(const char *procname) { "eglGetSyncValuesCHROMIU
[Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v2)
From: Kyle Brenneman This decorates every EGL entrypoint with _EGL_FUNC_START, which records the function name and primary dispatch object label in the current thread state. It also adds debug report functions and calls them when appropriate. This would be useful enough for debugging on its own, if the user set a breakpoint when the report function was called. We will also need this state tracked in order to expose EGL_KHR_debug. v2: - Clear the object label in more cases in _eglSetFuncName - Set dummy thread's CurrentAPI to EGL_NONE not zero - Pass draw surface (if any) to _EGL_FUNC_START in eglSwapInterval --- src/egl/main/eglapi.c | 155 ++ src/egl/main/eglcurrent.c | 91 ++- src/egl/main/eglcurrent.h | 22 +++ src/egl/main/eglglobals.h | 5 ++ 4 files changed, 259 insertions(+), 14 deletions(-) diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c index 0477ad9..216b289 100644 --- a/src/egl/main/eglapi.c +++ b/src/egl/main/eglapi.c @@ -250,6 +250,37 @@ _eglUnlockDisplay(_EGLDisplay *dpy) mtx_unlock(&dpy->Mutex); } +static EGLBoolean +_eglSetFuncName(const char *funcName, _EGLDisplay *disp, EGLenum objectType, _EGLResource *object) +{ + _EGLThreadInfo *thr = _eglGetCurrentThread(); + if (!_eglIsCurrentThreadDummy()) { + thr->CurrentFuncName = funcName; + thr->CurrentObjectLabel = NULL; + + if (objectType == EGL_OBJECT_THREAD_KHR) + thr->CurrentObjectLabel = thr->Label; + else if (objectType == EGL_OBJECT_DISPLAY_KHR) + thr->CurrentObjectLabel = disp ? disp->Label : NULL; + else + thr->CurrentObjectLabel = object ? 
object->Label : NULL; + + return EGL_TRUE; + } + + _eglDebugReportFull(EGL_BAD_ALLOC, funcName, funcName, + EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL); + return EGL_FALSE; +} + +#define _EGL_FUNC_START(disp, objectType, object, ret) \ + do { \ + if (!_eglSetFuncName(__func__, disp, objectType, (_EGLResource *) object)) { \ + if (disp) \ +_eglUnlockDisplay(disp); \ + return ret; \ + } \ + } while(0) static EGLint * _eglConvertAttribsToInt(const EGLAttrib *attr_list) @@ -287,6 +318,8 @@ eglGetDisplay(EGLNativeDisplayType nativeDisplay) _EGLDisplay *dpy; void *native_display_ptr; + _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY); + STATIC_ASSERT(sizeof(void*) == sizeof(nativeDisplay)); native_display_ptr = (void*) nativeDisplay; @@ -330,6 +363,7 @@ static EGLDisplay EGLAPIENTRY eglGetPlatformDisplayEXT(EGLenum platform, void *native_display, const EGLint *attrib_list) { + _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY); return _eglGetPlatformDisplayCommon(platform, native_display, attrib_list); } @@ -340,6 +374,8 @@ eglGetPlatformDisplay(EGLenum platform, void *native_display, EGLDisplay display; EGLint *int_attribs; + _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY); + int_attribs = _eglConvertAttribsToInt(attrib_list); if (attrib_list && !int_attribs) RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, NULL); @@ -483,6 +519,8 @@ eglInitialize(EGLDisplay dpy, EGLint *major, EGLint *minor) { _EGLDisplay *disp = _eglLockDisplay(dpy); + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE); + if (!disp) RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE); @@ -533,6 +571,8 @@ eglTerminate(EGLDisplay dpy) { _EGLDisplay *disp = _eglLockDisplay(dpy); + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE); + if (!disp) RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE); @@ -560,6 +600,7 @@ eglQueryString(EGLDisplay dpy, EGLint name) } disp = _eglLockDisplay(dpy); + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, 
NULL); _EGL_CHECK_DISPLAY(disp, NULL, drv); switch (name) { @@ -585,6 +626,8 @@ eglGetConfigs(EGLDisplay dpy, EGLConfig *configs, _EGLDriver *drv; EGLBoolean ret; + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE); + _EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv); ret = drv->API.GetConfigs(drv, disp, configs, config_size, num_config); @@ -600,6 +643,8 @@ eglChooseConfig(EGLDisplay dpy, const EGLint *attrib_list, EGLConfig *configs, _EGLDriver *drv; EGLBoolean ret; + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE); + _EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv); ret = drv->API.ChooseConfig(drv, disp, attrib_list, configs, config_size, num_config); @@ -617,6 +662,8 @@ eglGetConfigAttrib(EGLDisplay dpy, EGLConfig config, _EGLDriver *drv; EGLBoolean ret; + _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE); + _EGL_CHECK_CONFIG(disp, conf, EGL_FALSE, drv); ret = drv->API.GetConfigAttrib(drv, di
Re: [Mesa-dev] [PATCH 14/14] egl: Implement EGL_KHR_debug
On Tue, 2016-09-13 at 17:17 +0100, Emil Velikov wrote:

> > +      } else {
> > +         _eglDebugReportFull(EGL_BAD_ALLOC, __func__, __func__,
> > +               EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL);
> > +         return EGL_BAD_ALLOC;
> > +      }
>
> Nit: Please use the same style as the "objectType ==
> EGL_OBJECT_DISPLAY_KHR" case.

AFAICT the reason this code doesn't use RETURN_EGL_ERROR like everything
else is because it doesn't lock the display. Which is extremely wrong,
since we definitely depend on it not going away from under us later!
Fixed in v2.

> Nit: You can also drop the "else" and flatten (indent one level less)
> all of the following code.

Done in v2.

> Missing EGLAPIENTRY

Fixed in v2.

> > +eglDebugMessageControlKHR(EGLDEBUGPROCKHR callback,
> > +                          const EGLAttrib *attrib_list)
> > +{
> > +   unsigned int newEnabled;
> > +
> > +   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
> > +
> > +   mtx_lock(_eglGlobal.Mutex);
> > +
> > +   newEnabled = _eglGlobal.debugTypesEnabled;
> > +   if (attrib_list != NULL) {
> > +      int i;
> > +
> > +      for (i = 0; attrib_list[i] != EGL_NONE; i += 2) {
>
> Don't think we check it elsewhere (and/or if we should care too much) but
> still:
> Check if i overflows or use unsigned type ?

There's a bunch of places where we don't check that...

> > +         if (attrib_list[i] >= EGL_DEBUG_MSG_CRITICAL_KHR &&
> > +             attrib_list[i] <= EGL_DEBUG_MSG_INFO_KHR) {
> > +            if (attrib_list[i + 1]) {
> > +               newEnabled |= DebugBitFromType(attrib_list[i]);
> > +            } else {
> > +               newEnabled &= ~DebugBitFromType(attrib_list[i]);
> > +            }
>
> Nit: break; ?

Nope. You're allowed to set the disposition for multiple error levels
in a single call to DebugMessageControl, so you need to validate them
all.

> > +eglQueryDebugKHR(EGLint attribute, EGLAttrib *value)
> > +{
> > +   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
> > +
> > +   mtx_lock(_eglGlobal.Mutex);
> > +   if (attribute >= EGL_DEBUG_MSG_CRITICAL_KHR &&
> > +       attribute <= EGL_DEBUG_MSG_INFO_KHR) {
> > +      if (_eglGlobal.debugTypesEnabled & DebugBitFromType(attribute)) {
> > +         *value = EGL_TRUE;
> > +      } else {
> > +         *value = EGL_FALSE;
> > +      }
> > +   } else if (attribute == EGL_DEBUG_CALLBACK_KHR) {
> > +      *value = (EGLAttrib) _eglGlobal.debugCallback;
> > +   } else {
> > +      mtx_unlock(_eglGlobal.Mutex);
> > +      _eglReportError(EGL_BAD_ATTRIBUTE, NULL,
> > +            "Invalid attribute 0x%04lx", (unsigned long) attribute);
> > +      return EGL_FALSE;
> > +   }
>
> Nit: Switch statement will be a lot easier to read.

Meh. I factored out the valid-debug-level check to a helper, at which
point you can't really use a switch. Redone as a do-while so I could
use break to bail out of the success conditions.

- ajax
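Two points from this exchange lend themselves to a small standalone sketch: the attribute loop must visit every (level, enable) pair rather than break after the first, and a `do { ... } while (0)` block lets the success cases `break` to a shared exit while the error case returns directly. The token values, `enabled_bits`, `callback_value` and all function names below are hypothetical stand-ins, not the EGL definitions, and the mutex from the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tokens; the real values come from the EGL headers. */
enum {
   ATTR_NONE = 0,
   LEVEL_CRITICAL = 10,
   LEVEL_ERROR,
   LEVEL_WARN,
   LEVEL_INFO,
   ATTR_CALLBACK = 20
};

static unsigned enabled_bits;  /* one bit per message level */
static long callback_value;    /* stand-in for the registered callback */

static bool
valid_level(long attr)
{
   return attr >= LEVEL_CRITICAL && attr <= LEVEL_INFO;
}

static unsigned
bit_from_level(long level)
{
   assert(valid_level(level));
   return 1u << (level - LEVEL_CRITICAL);
}

/* Apply every pair in the list.  There is no early break on a valid level,
 * because one call may configure several levels.  State is only committed
 * if the whole list validates. */
static bool
message_control(const long *attribs)
{
   unsigned new_bits = enabled_bits;

   if (attribs != NULL) {
      for (size_t i = 0; attribs[i] != ATTR_NONE; i += 2) {
         if (!valid_level(attribs[i]))
            return false;
         if (attribs[i + 1])
            new_bits |= bit_from_level(attribs[i]);
         else
            new_bits &= ~bit_from_level(attribs[i]);
      }
   }

   enabled_bits = new_bits;
   return true;
}

/* Query with a do/while(0) block: success cases break out to the shared
 * exit, the error case returns from inside the block. */
static bool
query(long attribute, long *value)
{
   do {
      if (valid_level(attribute)) {
         *value = (enabled_bits & bit_from_level(attribute)) != 0;
         break;
      }

      if (attribute == ATTR_CALLBACK) {
         *value = callback_value;
         break;
      }

      return false; /* error path: in the patch this unlocks and reports */
   } while (0);

   /* Shared success path: in the patch this is where the mutex is released. */
   return true;
}
```

Using `size_t` for the index is one way to sidestep the signed-overflow question raised in review; whether that is worth doing in the real code is left open in the thread.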
Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2
LLVM 64-bit:

mkdir -p build
cd build
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu \
  -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
  -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
  -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer"
ninja
sudo ninja install

LLVM 32-bit:

mkdir -p build32
cd build32
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/i386-linux-gnu \
  -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
  -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
  -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
  -DLLVM_BUILD_32_BITS=ON
ninja
sudo ninja install

# then add /usr/llvm/x86_64-linux-gnu and /usr/llvm/i386-linux-gnu to ld.so.conf

Mesa configure helper script. It will overwrite the /usr/lib/ files on Ubuntu (run as-is for 64-bit, or use "-32" for 32-bit):

if test x$1 = x-32; then
  dir=i386-linux-gnu
  build=i686-linux-gnu
  export CFLAGS="-m32 -O2 -g"
  export CXXFLAGS="$CFLAGS"
  export LDFLAGS="-L/usr/lib/$dir"
  export PKG_CONFIG_PATH="/usr/lib/$dir/pkgconfig"
else
  dir=x86_64-linux-gnu
  build=$dir
fi

./autogen.sh \
  --build=$build --prefix=/usr --libdir=/usr/lib/$dir --with-llvm-prefix=/usr/llvm/$dir \
  --enable-glx-tls --enable-texture-float --enable-debug --enable-vdpau \
  --disable-xvmc --disable-va --enable-nine --with-sha1=libnettle \
  --with-gallium-drivers=radeonsi,r600,swrast --with-dri-drivers= \
  --with-egl-platforms=x11,drm --enable-gles1 --enable-gles2
make -j4
sudo make install

You'll probably want to delete /usr/lib/$dir/*mesa*/*. That's Ubuntu's invention that will prevent you from using the installed libGL and libEGL.

It's all kind of a mess, but I don't know of a better way.

Marek

On Tue, Sep 13, 2016 at 7:33 PM, Romain Failliot <romain.faill...@foolstep.com> wrote:
> 2016-09-13 12:41 GMT-04:00 Marek Olšák:
>>
>> BTW, If you update LLVM to a newer version, you also have to re-build
>> Mesa, because the LLVM version used by Mesa is determined while Mesa
>> is being built.
>>
>> Also, the chance to rage-quit while building LLVM+Mesa is pretty high
>> if you've never done it before.
>
> I see, is there a tutorial somewhere maybe on how to do that?
> I know how to compile projects, that's not a problem. It's more about the
> little details to make everything work once it's compiled.
Re: [Mesa-dev] [PATCH 2/2] intel/isl: Divide QPitch by 2 for 3-D stencil textures on SKL+
On Tue, Sep 13, 2016 at 10:46 AM, Chad Versace wrote:

> On Thu 08 Sep 2016, Jason Ekstrand wrote:
> > ---
> >  src/intel/isl/isl_surface_state.c | 15 ++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/intel/isl/isl_surface_state.c b/src/intel/isl/isl_surface_state.c
> > index f8ea122..22fef3d 100644
> > --- a/src/intel/isl/isl_surface_state.c
> > +++ b/src/intel/isl/isl_surface_state.c
> > @@ -173,7 +173,20 @@ get_qpitch(const struct isl_surf *surf)
> >        unreachable("Bad isl_surf_dim");
> >     case ISL_DIM_LAYOUT_GEN4_2D:
> >        if (GEN_GEN >= 9) {
> > -         return isl_surf_get_array_pitch_el_rows(surf);
> > +         if (surf->dim == ISL_SURF_DIM_3D && surf->tiling == ISL_TILING_W) {
> > +            /* This is rather annoying and completely undocumented.  It
> > +             * appears that the hardware has a bug (or undocumented feature)
> > +             * regarding stencil buffers most likely related to the way
> > +             * W-tiling is handled as modified Y-tiling.  If you bind a 3-D or
> > +             * 2-D array stencil buffer normally, and use texelFetch on it,
> > +             * the z or array index will get implicitly multiplied by 2 for no
> > +             * obvious reason.  The fix appears to be to divide qpitch by 2
> > +             * for W-tiled surfaces.
> > +             */
>
> Have you confirmed that this fix is not needed on other gens? Or have
> you only confirmed that it's needed on SKL, and are deferring the
> workaround on the other gens until you had a chance to test it on them?

I'm not sure about KBL or later hardware. However, it's not needed pre-SKL because they use the GEN4_3D layout whereas SKL uses GEN4_2D for 3D textures.

> Either way, the patch is sound. And the workaround doesn't surprise me.
> Reviewed-by: Chad Versace
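For readers following the workaround, here is a minimal sketch of the selection logic the comment describes. It covers only the GEN_GEN >= 9 branch of the GEN4_2D layout case. The quoted hunk is cut off before the actual return statement, so the exact expression below is an assumption based on the comment ("divide qpitch by 2"); `struct surf`, the enums and `qpitch_gen9_2d_layout` are hypothetical simplifications of the isl types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the isl enums and surface struct. */
enum surf_dim { SURF_DIM_1D, SURF_DIM_2D, SURF_DIM_3D };
enum tiling { TILING_LINEAR, TILING_X, TILING_Y, TILING_W };

struct surf {
   enum surf_dim dim;
   enum tiling tiling;
   uint32_t array_pitch_el_rows; /* what the unpatched code would return */
};

/* QPitch for the 2D-style layout on Gen9+: W-tiled (stencil) 3-D surfaces
 * get half the array pitch, everything else is unchanged. */
static uint32_t
qpitch_gen9_2d_layout(const struct surf *s)
{
   assert(s != NULL);

   if (s->dim == SURF_DIM_3D && s->tiling == TILING_W)
      return s->array_pitch_el_rows / 2; /* assumed form of the workaround */

   return s->array_pitch_el_rows;
}
```

Note that the condition in the hunk tests only for 3-D surfaces, while the comment also mentions 2-D array stencil buffers; the sketch follows the condition as written in the diff.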
Re: [Mesa-dev] [PATCH 2/2] intel/isl: Divide QPitch by 2 for 3-D stencil textures on SKL+
On Thu 08 Sep 2016, Jason Ekstrand wrote:
> ---
>  src/intel/isl/isl_surface_state.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/isl/isl_surface_state.c b/src/intel/isl/isl_surface_state.c
> index f8ea122..22fef3d 100644
> --- a/src/intel/isl/isl_surface_state.c
> +++ b/src/intel/isl/isl_surface_state.c
> @@ -173,7 +173,20 @@ get_qpitch(const struct isl_surf *surf)
>        unreachable("Bad isl_surf_dim");
>     case ISL_DIM_LAYOUT_GEN4_2D:
>        if (GEN_GEN >= 9) {
> -         return isl_surf_get_array_pitch_el_rows(surf);
> +         if (surf->dim == ISL_SURF_DIM_3D && surf->tiling == ISL_TILING_W) {
> +            /* This is rather annoying and completely undocumented.  It
> +             * appears that the hardware has a bug (or undocumented feature)
> +             * regarding stencil buffers most likely related to the way
> +             * W-tiling is handled as modified Y-tiling.  If you bind a 3-D or
> +             * 2-D array stencil buffer normally, and use texelFetch on it,
> +             * the z or array index will get implicitly multiplied by 2 for no
> +             * obvious reason.  The fix appears to be to divide qpitch by 2
> +             * for W-tiled surfaces.
> +             */

Have you confirmed that this fix is not needed on other gens? Or have you only confirmed that it's needed on SKL, and are deferring the workaround on the other gens until you had a chance to test it on them?

Either way, the patch is sound. And the workaround doesn't surprise me.

Reviewed-by: Chad Versace
Re: [Mesa-dev] [PATCH 1/2] isl/state: Don't set QPitch for GEN4_3D surfaces
On Thu 08 Sep 2016, Jason Ekstrand wrote:
> ---
>  src/intel/isl/isl_surface_state.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/isl/isl_surface_state.c b/src/intel/isl/isl_surface_state.c
> index 979e140..f8ea122 100644
> --- a/src/intel/isl/isl_surface_state.c
> +++ b/src/intel/isl/isl_surface_state.c
> @@ -172,7 +172,6 @@ get_qpitch(const struct isl_surf *surf)
>     default:
>        unreachable("Bad isl_surf_dim");
>     case ISL_DIM_LAYOUT_GEN4_2D:
> -   case ISL_DIM_LAYOUT_GEN4_3D:
>        if (GEN_GEN >= 9) {
>           return isl_surf_get_array_pitch_el_rows(surf);
>        } else {
> @@ -199,6 +198,22 @@ get_qpitch(const struct isl_surf *surf)
>         *slices.
>         */
>        return isl_surf_get_array_pitch_el(surf);
> +   case ISL_DIM_LAYOUT_GEN4_3D:
> +      /* QPitch doesn't make sense for ISL_DIM_LAYOUT_GEN4_3D since it uses a
> +       * different pitch at each LOD.  Also, the QPitch field is ignored for
> +       * these surfaces.

Yep.

Reviewed-by: Chad Versace
Re: [Mesa-dev] [PATCH] intel/isl: Ignore base_array_layer and array_len for 3D storage surfaces
On Tue 13 Sep 2016, Jason Ekstrand wrote:
> The time we want to restrict the Z range of a 3-D surface is when rendering
> to it.  For storage surfaces, we always want he full range.  However, we

Typo: "he" should be "the".

> still need to set MinimumArrayElement and RenderTargetViewExtent to
> sensible values so we'll just set them to the reasonable defaults we used
> before we started respecting the base_array_layer and array_len.
>
> This fixes a bunch of Vulkan CTS regressions caused by 48f195d7c6483ed.
>
> Signed-off-by: Jason Ekstrand
> ---
>  src/intel/isl/isl_surface_state.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Chad Versace
Re: [Mesa-dev] [PATCH 14/14] egl: Implement EGL_KHR_debug
On 09/13/2016 10:17 AM, Emil Velikov wrote: Hi guys, There's a bunch of outstanding style nitpicks (come to think of it 13/14 could use the same) Those aside: there's a bunch of serious suggestions which I missed last time. On 12 September 2016 at 23:19, Adam Jackson wrote: From: Kyle Brenneman Wire up the debug entrypoints to EGL dispatch, and add the extension string to the client extension list. --- src/egl/main/eglapi.c | 140 ++ src/egl/main/eglglobals.c | 3 +- 2 files changed, 142 insertions(+), 1 deletion(-) diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c index 0a6ebe7..6b0fd2e 100644 --- a/src/egl/main/eglapi.c +++ b/src/egl/main/eglapi.c @@ -1987,6 +1987,143 @@ eglExportDMABUFImageMESA(EGLDisplay dpy, EGLImage image, RETURN_EGL_EVAL(disp, ret); } +static EGLint EGLAPIENTRY +eglLabelObjectKHR(EGLDisplay dpy, EGLenum objectType, EGLObjectKHR object, + EGLLabelKHR label) +{ + _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC); + + if (objectType == EGL_OBJECT_THREAD_KHR) { + _EGLThreadInfo *t = _eglGetCurrentThread(); + + if (!_eglIsCurrentThreadDummy()) { + t->Label = label; + return EGL_SUCCESS; + } else { + _eglDebugReportFull(EGL_BAD_ALLOC, __func__, __func__, + EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL); + return EGL_BAD_ALLOC; + } Nit: Please use the same style as the "objectType == EGL_OBJECT_DISPLAY_KHR" case. + } else { Nit: You can also drop the "else" and flatten (indent one level less) all of the following code. + _EGLDisplay *disp = _eglLookupDisplay(dpy); + + if (disp == NULL) { + _eglError(EGL_BAD_DISPLAY, "eglLabelObjectKHR"); + return EGL_BAD_DISPLAY; + } + + if (objectType == EGL_OBJECT_DISPLAY_KHR) { + if (dpy != (EGLDisplay) object) { +_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR"); +return EGL_BAD_PARAMETER; + } + disp->Label = label; + return EGL_SUCCESS; + } else { Nit: kill this "else {" as well ? 
+ _EGLResourceType type; + + switch (objectType) + { Nit: move to previous line +case EGL_OBJECT_CONTEXT_KHR: + type = _EGL_RESOURCE_CONTEXT; + break; +case EGL_OBJECT_SURFACE_KHR: + type = _EGL_RESOURCE_SURFACE; + break; +case EGL_OBJECT_IMAGE_KHR: + type = _EGL_RESOURCE_IMAGE; + break; +case EGL_OBJECT_SYNC_KHR: + type = _EGL_RESOURCE_SYNC; + break; +case EGL_OBJECT_STREAM_KHR: +default: +_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR"); + return EGL_BAD_PARAMETER; + } + + if (_eglCheckResource(object, type, disp)) { +_EGLResource *res = (_EGLResource *) object; +res->Label = label; +return EGL_SUCCESS; + } else { +_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR"); +return EGL_BAD_PARAMETER; + } Nit: coding style. + } + } +} + +static EGLint Missing EGLAPIENTRY +eglDebugMessageControlKHR(EGLDEBUGPROCKHR callback, + const EGLAttrib *attrib_list) +{ + unsigned int newEnabled; + + _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC); + + mtx_lock(_eglGlobal.Mutex); + + newEnabled = _eglGlobal.debugTypesEnabled; + if (attrib_list != NULL) { + int i; + + for (i = 0; attrib_list[i] != EGL_NONE; i += 2) { Don't think we check it elsewhere (and/or if we should care too much) but still: Check if i overflows or use unsigned type ? + if (attrib_list[i] >= EGL_DEBUG_MSG_CRITICAL_KHR && + attrib_list[i] <= EGL_DEBUG_MSG_INFO_KHR) { +if (attrib_list[i + 1]) { + newEnabled |= DebugBitFromType(attrib_list[i]); +} else { + newEnabled &= ~DebugBitFromType(attrib_list[i]); +} Nit: break; ? A break here would be incorrect, since you can specify multiple flags in the attribute list. + } else { +// On error, set the last error code, call the current +// debug callback, and return the error code. 
+mtx_unlock(_eglGlobal.Mutex); +_eglReportError(EGL_BAD_ATTRIBUTE, NULL, + "Invalid attribute 0x%04lx", (unsigned long) attrib_list[i]); +return EGL_BAD_ATTRIBUTE; + } + } + } + + if (callback != NULL) { + _eglGlobal.debugCallback = callback; + _eglGlobal.debugTypesEnabled = newEnabled; + } else { + _eglGlobal.debugCallback = NULL; + _eglGlobal.debugTypesEnabled = _EGL_DEBUG_BIT_CRITICAL | _EGL_DEBUG_BIT_ERROR; + } + + mtx_unlock(_eglGlobal.Mutex); + return EGL_SUCCESS; +} + +static EGLBoolean Missing EGLAPIENTRY +eglQueryDebugKH
[Mesa-dev] [PATCH] nvc0/ir: fix comments about instructions info
The comment for the commutative flags was wrong because OP_MUL is before OP_MAD. While we are at it add missing opcodes, and fix the comment about the short forms.

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index f5981de..f75e395 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -156,13 +156,14 @@ void TargetNVC0::initOpInfo()
    static const uint32_t commutative[(OP_LAST + 31) / 32] =
    {
-      // ADD, MAD, MUL, AND, OR, XOR, MAX, MIN
+      // ADD, MUL, MAD, FMA, AND, OR, XOR, MAX, MIN, SET_AND, SET_OR, SET_XOR,
+      // SET, SELP, SLCT
       0x0670ca00, 0x003f, 0x, 0x
    };
    static const uint32_t shortForm[(OP_LAST + 31) / 32] =
    {
-      // ADD, MAD, MUL, AND, OR, XOR, PRESIN, PREEX2, SFN, CVT, PINTERP, MOV
+      // ADD, MUL, MAD, FMA, AND, OR, XOR, MAX, MIN
       0x0670ca00, 0x, 0x, 0x
    };
--
2.9.3
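For readers unfamiliar with this kind of table: the arrays in the patch appear to be bitmasks with one bit per opcode, packed into (OP_LAST + 31) / 32 words, which is why the comment has to list the opcodes in enum order. (The word 0x0670ca00 has nine bits set, which matches the nine opcodes named in the new shortForm comment.) Below is a minimal sketch of that layout, assuming a made-up opcode enum; `enum op`, `op_set` and `op_test` are hypothetical names for illustration and are not part of nv50_ir.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode list, for illustration only. The real enum in
 * nv50_ir has many more entries and different values. */
enum op { OP_NOP, OP_ADD, OP_SUB, OP_MUL, OP_MAD, OP_AND, OP_LAST };

/* Number of 32-bit words needed to hold one bit per opcode. */
#define OP_WORDS ((OP_LAST + 31) / 32)

/* Mark an opcode as having the property (e.g. "commutative"). */
static void op_set(uint32_t mask[OP_WORDS], enum op op)
{
   assert(op < OP_LAST);
   mask[op / 32] |= UINT32_C(1) << (op % 32);
}

/* Query whether an opcode has the property. */
static bool op_test(const uint32_t mask[OP_WORDS], enum op op)
{
   assert(op < OP_LAST);
   return (mask[op / 32] >> (op % 32)) & 1;
}
```

Because bit positions follow the enum values, a comment next to such a constant is easiest to check against the hex value when it lists the opcodes in enum order, which seems to be the point of the fix above.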
Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2
2016-09-13 12:41 GMT-04:00 Marek Olšák:
>
> BTW, if you update LLVM to a newer version, you also have to re-build
> Mesa, because the LLVM version used by Mesa is determined while Mesa
> is being built.
>
> Also, the chance to rage-quit while building LLVM+Mesa is pretty high
> if you've never done it before.

I see, is there a tutorial somewhere maybe on how to do that? I know how to compile projects, that's not a problem. It's more about the little details to make everything work once it's compiled.
Re: [Mesa-dev] [PATCH 04/12] anv: Add func anv_image_has_hiz()
On Tue 13 Sep 2016, Nanley Chery wrote: > On Wed, Sep 07, 2016 at 03:51:14PM -0700, Chad Versace wrote: > > On Wed 07 Sep 2016, Nanley Chery wrote: > > > On Fri, Sep 02, 2016 at 11:42:24AM -0700, Chad Versace wrote: > > > > On Thu 01 Sep 2016, Jason Ekstrand wrote: > > > > > On Wed, Aug 31, 2016 at 8:29 PM, Nanley Chery > > > > > wrote: > > > > > > > > > > From: Chad Versace > > > > > > > > > > Nanley Chery (amend): > > > > > - Remove wip! tag > > > > > > > > > > Signed-off-by: Nanley Chery > > > > > --- > > > > > src/intel/vulkan/anv_private.h | 10 ++ > > > > > 1 file changed, 10 insertions(+) > > > > > > > > > > > > > > > > > > +static inline bool > > > > > +anv_image_has_hiz(const struct anv_image *image) > > > > > +{ > > > > > + /* We must check the usage because anv_image::hiz_surface > > > > > belongs to > > > > > + * a union. > > > > > + */ > > > > > + return (image->usage & > > > > > VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) && > > > > > > > > > > > > > > > Would checking (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT) be more > > > > > appropriate? > > > > > > > > I agree. VK_IMAGE_ASPECT_DEPTH_BIT makes more sense. > > > > > > > > Also, that's what the documentation for anv_image says, quoted below: > > > > > > > >struct anv_image { > > > > ... > > > > > > > > /** > > > >* Image subsurfaces > > > >* > > > >* For each foo, anv_image::foo_surface is valid if and only if > > > >* anv_image::aspects has a foo aspect. > > > >* > > > >* ... > > > >*/ > > > > union { > > > > struct anv_surface color_surface; > > > > > > > > struct { > > > > struct anv_surface depth_surface; > > > > struct anv_surface stencil_surface; > > > > > > > > }; > > > > }; > > > >}; > > > > > > > > > > Sure. Thanks for the documentation quote. > > > > > > A HiZ surface is created for a depth image if both usage and aspect > > > conditions > > > are satisfied. Would it be better for me to add the aspect check instead > > > of > > > replacing the usage check with it? > > > > I see. 
You want to avoid allocating the HiZ surface if it's never > > rendered as a depth attachment. > > > > So yes, your suggestion sounds good to me. > > I'll actually leave it out if you don't mind. The usage check isn't > required to get the right result. Sure. As long as the aspect check is present, then it's good with me.
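To make the point about the union concrete: a member such as hiz_surface shares storage with color_surface, so its contents only mean something once the aspect mask says the image has a depth aspect. The following is a rough sketch of the aspect-based check discussed in this thread, not the actual anv code. `struct surface`, `struct image`, the `size` field and the position of `hiz_surface` inside the union are assumptions made for illustration; the aspect bit values match the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for VK_IMAGE_ASPECT_*_BIT (same numeric values). */
#define ASPECT_COLOR_BIT   0x1u
#define ASPECT_DEPTH_BIT   0x2u
#define ASPECT_STENCIL_BIT 0x4u

/* Hypothetical surface record; size == 0 means "not allocated". */
struct surface { uint64_t size; };

/* Hypothetical image layout mirroring the quoted anv_image comment:
 * foo_surface is valid if and only if aspects has a foo aspect. */
struct image {
   uint32_t aspects;
   union {
      struct surface color_surface;
      struct {
         struct surface depth_surface;
         struct surface hiz_surface;
         struct surface stencil_surface;
      };
   };
};

/* Check the aspect first, then look at the union member. */
static bool image_has_hiz(const struct image *image)
{
   assert(image != NULL);
   return (image->aspects & ASPECT_DEPTH_BIT) &&
          image->hiz_surface.size > 0;
}
```

With this shape, a color image never reports HiZ regardless of what happens to sit in the shared storage, and a depth image reports it only when the surface was actually allocated.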
Re: [Mesa-dev] AppVeyor fails with 404 during wget
>> It looks like the winflexbison URL changed some time ago. But this >> didn't cause any build failures because the ZIP was being recovered from >> the cache. >> >> I'll look into it. >> >> Jose > > > It looks the archive was moved into a old_versions subdir. > > The attached patch should fix it. Could you please try it on your repos and > confirm. Much better, compilation starts correctly now, thanks. - Steve ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/va: also honors interlaced preference when providing a video format
Leo Liu wrote:
Hi Andy,

On 09/13/2016 06:22 AM, Andy Furniss wrote:
Zhang, Boyuan wrote:
Hi Leo, Christian and Julien,

I tested the patch with VAAPI encoding and transcoding, and it seems to be working fine. We are using the "VAAPI_DISABLE_INTERLACE" env, so interlaced is always disabled.

Though I notice it will break screen recording scripts for existing users who previously didn't need the env set but will after this. Totally untested/thought through, but maybe the env should default to on?

Agree, can you come up with a patch for that?

OK, but maybe I should test a bit first to see if anything regresses. Unfortunately, today I found by chance an issue with mpv. With VAAPI_DISABLE_INTERLACE=1, which it needs for mpv --vo-vaapi, all is apparently OK when playing, say, a 25fps vid, but I've found that if I push the framerate to the refresh rate and do something that draws the OSD, then the image is corrupted, possibly with many VM faults logged, followed by a segfault in u_copy_yv12_img_to_nv12_surf. This happens with or without the uv swap patch below. I will file a bug after more investigation. Bisecting mesa goes back to the commit that introduced VAAPI_DISABLE_INTERLACE.

Also, were any outstanding patches for VA-API encode from you reviewed but not committed? If any, send them to me and I can push them.

There's only https://lists.freedesktop.org/archives/mesa-dev/2016-July/124695.html for the uv swap issue.

Not my issue as such, but did anyone notice this from Mark Thompson, who does vaapi for libav/ffmpeg? I notice he didn't keep the CCs so maybe it got missed.

https://lists.freedesktop.org/archives/mesa-dev/2016-September/128076.html
Re: [Mesa-dev] [PATCH 1/3] intel: Move Vulkan sample positions to common code
On Mon, Sep 12, 2016 at 3:50 PM, Jason Ekstrand wrote: > Signed-off-by: Jason Ekstrand > --- > .../genX_multisample.h => common/gen_sample_positions.h} | 10 +- > src/intel/vulkan/genX_blorp_exec.c | 10 +- > src/intel/vulkan/genX_pipeline_util.h| 10 +- > src/intel/vulkan/genX_state.c| 12 > ++-- > 4 files changed, 21 insertions(+), 21 deletions(-) > rename src/intel/{vulkan/genX_multisample.h => > common/gen_sample_positions.h} (94%) > > diff --git a/src/intel/vulkan/genX_multisample.h > b/src/intel/common/gen_sample_positions.h > similarity index 94% > rename from src/intel/vulkan/genX_multisample.h > rename to src/intel/common/gen_sample_positions.h > index 0deb48f..0411bf0 100644 > --- a/src/intel/vulkan/genX_multisample.h > +++ b/src/intel/common/gen_sample_positions.h > @@ -22,17 +22,17 @@ > */ > #pragma once > > -#define SAMPLE_POS_1X(prefix) \ > +#define GEN_SAMPLE_POS_1X(prefix) \ > prefix##0XOffset = 0.5; \ > prefix##0YOffset = 0.5; > > -#define SAMPLE_POS_2X(prefix) \ > +#define GEN_SAMPLE_POS_2X(prefix) \ > prefix##0XOffset = 0.25; \ > prefix##0YOffset = 0.25; \ > prefix##1XOffset = 0.75; \ > prefix##1YOffset = 0.75; > > -#define SAMPLE_POS_4X(prefix) \ > +#define GEN_SAMPLE_POS_4X(prefix) \ > prefix##0XOffset = 0.375; \ > prefix##0YOffset = 0.125; \ > prefix##1XOffset = 0.875; \ > @@ -42,7 +42,7 @@ prefix##2YOffset = 0.625; \ > prefix##3XOffset = 0.625; \ > prefix##3YOffset = 0.875; > > -#define SAMPLE_POS_8X(prefix) \ > +#define GEN_SAMPLE_POS_8X(prefix) \ > prefix##0XOffset = 0.5625; \ > prefix##0YOffset = 0.3125; \ > prefix##1XOffset = 0.4375; \ > @@ -60,7 +60,7 @@ prefix##6YOffset = 0.9375; \ > prefix##7XOffset = 0.9375; \ > prefix##7YOffset = 0.0625; > > -#define SAMPLE_POS_16X(prefix) \ > +#define GEN_SAMPLE_POS_16X(prefix) \ > prefix##0XOffset = 0.5625; \ > prefix##0YOffset = 0.5625; \ > prefix##1XOffset = 0.4375; \ > diff --git a/src/intel/vulkan/genX_blorp_exec.c > b/src/intel/vulkan/genX_blorp_exec.c > index 889c423..5a08ed3 100644 > --- 
a/src/intel/vulkan/genX_blorp_exec.c > +++ b/src/intel/vulkan/genX_blorp_exec.c > @@ -24,7 +24,6 @@ > #include > > #include "anv_private.h" > -#include "genX_multisample.h" > > /* These are defined in anv_private.h and blorp_genX_exec.h */ > #undef __gen_address_type > @@ -32,6 +31,7 @@ > #undef __gen_combine_address > > #include "common/gen_l3_config.h" > +#include "common/gen_sample_positions.h" > #include "blorp/blorp_genX_exec.h" > > static void * > @@ -164,16 +164,16 @@ blorp_emit_3dstate_multisample(struct blorp_batch > *batch, unsigned samples) > >switch (samples) { >case 1: > - SAMPLE_POS_1X(ms.Sample); > + GEN_SAMPLE_POS_1X(ms.Sample); > break; >case 2: > - SAMPLE_POS_2X(ms.Sample); > + GEN_SAMPLE_POS_2X(ms.Sample); > break; >case 4: > - SAMPLE_POS_4X(ms.Sample); > + GEN_SAMPLE_POS_4X(ms.Sample); > break; >case 8: > - SAMPLE_POS_8X(ms.Sample); > + GEN_SAMPLE_POS_8X(ms.Sample); > break; >default: > break; > diff --git a/src/intel/vulkan/genX_pipeline_util.h > b/src/intel/vulkan/genX_pipeline_util.h > index 2c0bf3f..0ff92f1 100644 > --- a/src/intel/vulkan/genX_pipeline_util.h > +++ b/src/intel/vulkan/genX_pipeline_util.h > @@ -22,8 +22,8 @@ > */ > > #include "common/gen_l3_config.h" > +#include "common/gen_sample_positions.h" > #include "vk_format_info.h" > -#include "genX_multisample.h" > > static uint32_t > vertex_element_comp_control(enum isl_format format, unsigned comp) > @@ -610,16 +610,16 @@ emit_ms_state(struct anv_pipeline *pipeline, > >switch (samples) { >case 1: > - SAMPLE_POS_1X(ms.Sample); > + GEN_SAMPLE_POS_1X(ms.Sample); > break; >case 2: > - SAMPLE_POS_2X(ms.Sample); > + GEN_SAMPLE_POS_2X(ms.Sample); > break; >case 4: > - SAMPLE_POS_4X(ms.Sample); > + GEN_SAMPLE_POS_4X(ms.Sample); > break; >case 8: > - SAMPLE_POS_8X(ms.Sample); > + GEN_SAMPLE_POS_8X(ms.Sample); > break; >default: > break; > diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c > index 2849b50..a6d405d 100644 > --- a/src/intel/vulkan/genX_state.c > +++ 
b/src/intel/vulkan/genX_state.c > @@ -28,8 +28,8 @@ > #include > > #include "anv_private.h" > -#include "genX_multisample.h" > > +#include "common/gen_sample_positions.h" > #include "genxml/gen_macros.h" > #include "genxml/genX_pack.h" > > @@ -77,12 +77,12 @@ genX(init_device_state)(struct anv_device *device) > * VkPhysicalDeviceFeatures::standardSampleLocations. > */ > anv_batch_emit(&batch, GENX(3DSTATE_SAMPLE_PATTERN), sp) { > - SAMPLE_POS_1X(sp
[Mesa-dev] [PATCH v3 0/3] Make eglExportDMABUFImageMESA return corresponding offset.
This patchset makes eglExportDMABUFImageMESA return the corresponding offset of the EGLImage instead of 0 on Intel platforms with the classic DRI driver (i965).

v2: Add version check of __DRIimageExtension implementation in egl loader
    (Suggested by Axel Davy).
v3: Don't add version check of __DRIimageExtension implementation in egl
    loader. Set the offset only when queryImage() succeeds. (Suggested by
    Emil Velikov)

Chuanbo Weng (3):
  dri: add offset attribute and bump version of EGLImage extensions.
  egl: return corresponding offset of EGLImage instead of 0.
  i965: implement querying __DRI_IMAGE_ATTRIB_OFFSET.

 include/GL/internal/dri_interface.h      | 4 +++-
 src/egl/drivers/dri2/egl_dri2.c          | 8 +++-
 src/mesa/drivers/dri/i965/intel_screen.c | 9 +++--
 3 files changed, 17 insertions(+), 4 deletions(-)

--
1.9.1
[Mesa-dev] [PATCH v3 2/3] egl: return corresponding offset of EGLImage instead of 0.
The offset should not always be 0. For example, if EGLImage is created from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the offset should be the actual start of miplevel 1 in bo.

v2: Add version check of __DRIimageExtension implementation
    (Suggested by Axel Davy).
v3: Don't add version check of __DRIimageExtension implementation.
    Set the offset only when queryImage() succeeds. (Suggested by Emil Velikov)

Signed-off-by: Chuanbo Weng
---
 src/egl/drivers/dri2/egl_dri2.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index bbc457c..84687de 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -2259,8 +2259,14 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, _EGLDisplay *disp, _EGLImage *im
       dri2_dpy->image->queryImage(dri2_img->dri_image,
                                   __DRI_IMAGE_ATTRIB_STRIDE, strides);
-   if (offsets)
+   if (offsets) {
       offsets[0] = 0;
+      EGLint img_offset = 0;
+      bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image,
+                     __DRI_IMAGE_ATTRIB_OFFSET, &img_offset);
+      if(ret == true)
+         offsets[0] = img_offset;
+   }
    return EGL_TRUE;
 }
--
1.9.1
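The v3 behaviour ("set the offset only when queryImage() succeeds") can be illustrated in isolation. This is a small sketch, not the Mesa code: `query_image_fn`, `old_driver_query`, `new_driver_query` and `export_offset` are hypothetical names, the two mock drivers exist only to exercise both paths, and 4096 is an arbitrary example value. The attribute token 0x200A is the one defined in patch 1/3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Attribute token taken from patch 1/3 of this series. */
#define ATTRIB_OFFSET 0x200A

/* Hypothetical stand-in for the driver's queryImage() hook. */
typedef bool (*query_image_fn)(const void *image, int attrib, int *value);

/* Mock of a driver that predates the attribute: every query fails. */
static bool old_driver_query(const void *image, int attrib, int *value)
{
   (void)image; (void)attrib; (void)value;
   return false;
}

/* Mock of a driver that knows the attribute and reports 4096 bytes. */
static bool new_driver_query(const void *image, int attrib, int *value)
{
   (void)image;
   if (attrib != ATTRIB_OFFSET)
      return false;
   *value = 4096;
   return true;
}

/* Write the plane offset, falling back to 0 when the query fails. */
static void export_offset(query_image_fn query, const void *image,
                          int *offsets)
{
   assert(query != NULL);
   if (!offsets)
      return;

   int img_offset = 0;
   offsets[0] = 0;
   if (query(image, ATTRIB_OFFSET, &img_offset))
      offsets[0] = img_offset;
}
```

With a fallback of this kind, a driver that does not recognise the attribute still yields offset 0, which is presumably why the explicit version check from v2 could be dropped.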
[Mesa-dev] [PATCH v3 1/3] dri: add offset attribute and bump version of EGLImage extensions.
Offset is useful for buffer sharing with other components, so add it to queryImage attributes.

Signed-off-by: Chuanbo Weng
---
 include/GL/internal/dri_interface.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/GL/internal/dri_interface.h b/include/GL/internal/dri_interface.h
index 1c73cce..d0b1bc6 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1094,7 +1094,7 @@ struct __DRIdri2ExtensionRec {
  * extensions.
  */
 #define __DRI_IMAGE "DRI_IMAGE"
-#define __DRI_IMAGE_VERSION 12
+#define __DRI_IMAGE_VERSION 13
 /**
  * These formats correspond to the similarly named MESA_FORMAT_*
@@ -1208,6 +1208,8 @@ struct __DRIdri2ExtensionRec {
 #define __DRI_IMAGE_ATTRIB_FOURCC 0x2008 /* available in versions 11 */
 #define __DRI_IMAGE_ATTRIB_NUM_PLANES 0x2009 /* available in versions 11 */
+#define __DRI_IMAGE_ATTRIB_OFFSET 0x200A /* available in versions 13 */
+
 enum __DRIYUVColorSpace {
    __DRI_YUV_COLOR_SPACE_UNDEFINED = 0,
    __DRI_YUV_COLOR_SPACE_ITU_REC601 = 0x327F,
--
1.9.1
[Mesa-dev] [PATCH v3 3/3] i965: implement querying __DRI_IMAGE_ATTRIB_OFFSET.
Implement querying this attribute in intelImageExtension and bump version of intelImageExtension.

Signed-off-by: Chuanbo Weng
---
 src/mesa/drivers/dri/i965/intel_screen.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index a3d252d..8c75e61 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -609,6 +609,9 @@ intel_query_image(__DRIimage *image, int attrib, int *value)
    case __DRI_IMAGE_ATTRIB_NUM_PLANES:
       *value = 1;
       return true;
+   case __DRI_IMAGE_ATTRIB_OFFSET:
+      *value = image->offset;
+      return true;
    default:
       return false;
@@ -845,7 +848,7 @@ intel_from_planar(__DRIimage *parent, int plane, void *loaderPrivate)
 }
 static const __DRIimageExtension intelImageExtension = {
-    .base = { __DRI_IMAGE, 11 },
+    .base = { __DRI_IMAGE, 13 },
     .createImageFromName = intel_create_image_from_name,
     .createImageFromRenderbuffer = intel_create_image_from_renderbuffer,
@@ -860,7 +863,9 @@ static const __DRIimageExtension intelImageExtension = {
     .createImageFromFds = intel_create_image_from_fds,
     .createImageFromDmaBufs = intel_create_image_from_dma_bufs,
     .blitImage = NULL,
-    .getCapabilities = NULL
+    .getCapabilities = NULL,
+    .mapImage = NULL,
+    .unmapImage = NULL,
 };
 static int
--
1.9.1
[Mesa-dev] [PATCH 3/6] radeonsi: rename get_sampler_desc -> load_sampler_desc
From: Marek Olšák --- src/gallium/drivers/radeonsi/si_shader.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 696f67b..3f77714 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -4298,23 +4298,23 @@ enum desc_type { static LLVMTypeRef const_array(LLVMTypeRef elem_type, int num_elements) { return LLVMPointerType(LLVMArrayType(elem_type, num_elements), CONST_ADDR_SPACE); } /** * Load an image view, fmask view. or sampler state descriptor. */ -static LLVMValueRef get_sampler_desc_custom(struct si_shader_context *ctx, - LLVMValueRef list, LLVMValueRef index, - enum desc_type type) +static LLVMValueRef load_sampler_desc_custom(struct si_shader_context *ctx, +LLVMValueRef list, LLVMValueRef index, +enum desc_type type) { struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm; LLVMBuilderRef builder = gallivm->builder; switch (type) { case DESC_IMAGE: /* The image is at [0:7]. */ index = LLVMBuildMul(builder, index, LLVMConstInt(ctx->i32, 2, 0), ""); break; case DESC_FMASK: @@ -4327,27 +4327,27 @@ static LLVMValueRef get_sampler_desc_custom(struct si_shader_context *ctx, index = LLVMBuildMul(builder, index, LLVMConstInt(ctx->i32, 4, 0), ""); index = LLVMBuildAdd(builder, index, LLVMConstInt(ctx->i32, 3, 0), ""); list = LLVMBuildPointerCast(builder, list, const_array(ctx->v4i32, 0), ""); break; } return build_indexed_load_const(ctx, list, index); } -static LLVMValueRef get_sampler_desc(struct si_shader_context *ctx, +static LLVMValueRef load_sampler_desc(struct si_shader_context *ctx, LLVMValueRef index, enum desc_type type) { LLVMValueRef list = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_SAMPLERS); - return get_sampler_desc_custom(ctx, list, index, type); + return load_sampler_desc_custom(ctx, list, index, type); } /* Disable anisotropic filtering if BASE_LEVEL == LAST_LEVEL. 
* * SI-CI: * If BASE_LEVEL == LAST_LEVEL, the shader must disable anisotropic * filtering manually. The driver sets img7 to a mask clearing * MAX_ANISO_RATIO if BASE_LEVEL == LAST_LEVEL. The shader must do: * s_and_b32 samp0, samp0, img7 * @@ -4388,31 +4388,31 @@ static void tex_fetch_ptrs( if (emit_data->inst->Src[sampler_src].Register.Indirect) { const struct tgsi_full_src_register *reg = &emit_data->inst->Src[sampler_src]; LLVMValueRef ind_index; ind_index = get_bounded_indirect_index(ctx, ®->Indirect, reg->Register.Index, SI_NUM_SAMPLERS); - *res_ptr = get_sampler_desc(ctx, ind_index, DESC_IMAGE); + *res_ptr = load_sampler_desc(ctx, ind_index, DESC_IMAGE); if (target == TGSI_TEXTURE_2D_MSAA || target == TGSI_TEXTURE_2D_ARRAY_MSAA) { if (samp_ptr) *samp_ptr = NULL; if (fmask_ptr) - *fmask_ptr = get_sampler_desc(ctx, ind_index, DESC_FMASK); + *fmask_ptr = load_sampler_desc(ctx, ind_index, DESC_FMASK); } else { if (samp_ptr) { - *samp_ptr = get_sampler_desc(ctx, ind_index, DESC_SAMPLER); + *samp_ptr = load_sampler_desc(ctx, ind_index, DESC_SAMPLER); *samp_ptr = sici_fix_sampler_aniso(ctx, *res_ptr, *samp_ptr); } if (fmask_ptr) *fmask_ptr = NULL; } } else { *res_ptr = ctx->sampler_views[sampler_index]; if (samp_ptr) *samp_ptr = ctx->sampler_states[sampler_index]; if (fmask_ptr) @@ -5907,29 +5907,29 @@ static void preload_samplers(struct si_shader_context *ctx) LLVMValueRef offset; if (num_samplers == 0) return; /* Load the resources and samplers, we rely on the code sinking to do the rest */ for (i = 0; i < num_samplers; ++i) { /* Resource */ offset = lp_build_co
[Mesa-dev] [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor preloading
From: Marek Olšák 26011 shaders in 14651 tests Totals: SGPRS: 1251920 -> 1152636 (-7.93 %) VGPRS: 728421 -> 728198 (-0.03 %) Spilled SGPRs: 16644 -> 3776 (-77.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 36001064 -> 35835152 (-0.46 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 21 -> 222372 (0.07 %) Wait states: 0 -> 0 (0.00 %) --- src/gallium/drivers/radeonsi/si_shader.c | 123 +++ 1 file changed, 28 insertions(+), 95 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 3f77714..c96c52e 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -100,25 +100,20 @@ struct si_shader_context LLVMTargetMachineRef tm; unsigned invariant_load_md_kind; unsigned range_md_kind; unsigned uniform_md_kind; LLVMValueRef empty_md; /* Preloaded descriptors. */ LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS]; - LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS]; - LLVMValueRef sampler_views[SI_NUM_SAMPLERS]; - LLVMValueRef sampler_states[SI_NUM_SAMPLERS]; - LLVMValueRef fmasks[SI_NUM_SAMPLERS]; - LLVMValueRef images[SI_NUM_IMAGES]; LLVMValueRef esgs_ring; LLVMValueRef gsvs_ring[4]; LLVMValueRef lds; LLVMValueRef gs_next_vertex[4]; LLVMValueRef return_value; LLVMTypeRef voidt; LLVMTypeRef i1; LLVMTypeRef i8; @@ -3420,30 +3415,32 @@ static void membar_emit( struct si_shader_context *ctx = si_shader_context(bld_base); emit_waitcnt(ctx); } static LLVMValueRef shader_buffer_fetch_rsrc(struct si_shader_context *ctx, const struct tgsi_full_src_register *reg) { LLVMValueRef ind_index; - LLVMValueRef rsrc_ptr; + LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, +SI_PARAM_SHADER_BUFFERS); - if (!reg->Register.Indirect) - return ctx->shader_buffers[reg->Register.Index]; + if (!reg->Register.Indirect) { + ind_index = LLVMConstInt(ctx->i32, reg->Register.Index, 0); + return build_indexed_load_const(ctx, 
rsrc_ptr, ind_index); + } ind_index = get_bounded_indirect_index(ctx, ®->Indirect, reg->Register.Index, SI_NUM_SHADER_BUFFERS); - rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_SHADER_BUFFERS); return build_indexed_load_const(ctx, rsrc_ptr, ind_index); } static bool tgsi_is_array_sampler(unsigned target) { return target == TGSI_TEXTURE_1D_ARRAY || target == TGSI_TEXTURE_SHADOW1D_ARRAY || target == TGSI_TEXTURE_2D_ARRAY || target == TGSI_TEXTURE_SHADOW2D_ARRAY || target == TGSI_TEXTURE_CUBE_ARRAY || @@ -3493,46 +3490,54 @@ static LLVMValueRef force_dcc_off(struct si_shader_context *ctx, * Load the resource descriptor for \p image. */ static void image_fetch_rsrc( struct lp_build_tgsi_context *bld_base, const struct tgsi_full_src_register *image, bool dcc_off, LLVMValueRef *rsrc) { struct si_shader_context *ctx = si_shader_context(bld_base); + LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, +SI_PARAM_IMAGES); assert(image->Register.File == TGSI_FILE_IMAGE); if (!image->Register.Indirect) { - /* Fast path: use preloaded resources */ - *rsrc = ctx->images[image->Register.Index]; + struct tgsi_shader_info *info = &ctx->shader->selector->info; + int i = image->Register.Index; + LLVMValueRef index = LLVMConstInt(ctx->i32, i, 0); + + /* Rely on LLVM to shrink the load for buffer resources. */ + *rsrc = build_indexed_load_const(ctx, rsrc_ptr, index); + + if (info->images_writemask & (1 << i) && + !(info->images_buffers & (1 << i))) + *rsrc = force_dcc_off(ctx, *rsrc); } else { /* Indexing and manual load */ LLVMValueRef ind_index; - LLVMValueRef rsrc_ptr; LLVMValueRef tmp; /* From the GL_ARB_shader_image_load_store extension spec: * *If a shader performs an image load, store, or atomic *operation using an image variable declared as an array, *and if the index used to select an individual element is *negative or greater than or equal to the size of the *array, the results of the operation are undefined but may
[Mesa-dev] [PATCH 1/6] radeonsi: load streamout buffer descriptors before use
From: Marek Olšák --- src/gallium/drivers/radeonsi/si_shader.c | 67 1 file changed, 34 insertions(+), 33 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index be6fae7..b9ad4be 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -105,21 +105,20 @@ struct si_shader_context unsigned uniform_md_kind; LLVMValueRef empty_md; LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS]; LLVMValueRef lds; LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS]; LLVMValueRef sampler_views[SI_NUM_SAMPLERS]; LLVMValueRef sampler_states[SI_NUM_SAMPLERS]; LLVMValueRef fmasks[SI_NUM_SAMPLERS]; LLVMValueRef images[SI_NUM_IMAGES]; - LLVMValueRef so_buffers[4]; LLVMValueRef esgs_ring; LLVMValueRef gsvs_ring[4]; LLVMValueRef gs_next_vertex[4]; LLVMValueRef return_value; LLVMTypeRef voidt; LLVMTypeRef i1; LLVMTypeRef i8; LLVMTypeRef i32; LLVMTypeRef i64; @@ -2253,31 +2252,64 @@ static void si_dump_streamout(struct pipe_stream_output_info *so) i, so->output[i].output_buffer, so->output[i].dst_offset, so->output[i].dst_offset + so->output[i].num_components - 1, so->output[i].register_index, mask & 1 ? "x" : "", mask & 2 ? "y" : "", mask & 4 ? "z" : "", mask & 8 ? "w" : ""); } } +static void load_streamout_descriptors(struct si_shader_context *ctx, + LLVMValueRef so_buffers[4]) +{ + struct lp_build_tgsi_context *bld_base = &ctx->radeon_bld.soa.bld_base; + struct gallivm_state *gallivm = bld_base->base.gallivm; + unsigned i; + + /* Streamout can only be used if the shader is compiled as VS. 
*/ + if (!ctx->shader->selector->so.num_outputs || + (ctx->type == PIPE_SHADER_VERTEX && +(ctx->shader->key.vs.as_es || + ctx->shader->key.vs.as_ls)) || + (ctx->type == PIPE_SHADER_TESS_EVAL && +ctx->shader->key.tes.as_es)) + return; + + LLVMValueRef buf_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, + SI_PARAM_RW_BUFFERS); + + /* Load the resources, we rely on the code sinking to do the rest */ + for (i = 0; i < 4; ++i) { + if (ctx->shader->selector->so.stride[i]) { + LLVMValueRef offset = lp_build_const_int32(gallivm, + SI_VS_STREAMOUT_BUF0 + i); + + so_buffers[i] = build_indexed_load_const(ctx, buf_ptr, offset); + } + } +} + /* On SI, the vertex shader is responsible for writing streamout data * to buffers. */ static void si_llvm_emit_streamout(struct si_shader_context *ctx, struct si_shader_output_values *outputs, unsigned noutput) { struct pipe_stream_output_info *so = &ctx->shader->selector->so; struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm; LLVMBuilderRef builder = gallivm->builder; int i, j; struct lp_build_if_state if_ctx; + LLVMValueRef so_buffers[4]; + + load_streamout_descriptors(ctx, so_buffers); /* Get bits [22:16], i.e. (so_param >> 16) & 127; */ LLVMValueRef so_vtx_count = unpack_param(ctx, ctx->param_streamout_config, 16, 7); LLVMValueRef tid = get_thread_id(ctx); /* can_emit = tid < so_vtx_count; */ LLVMValueRef can_emit = LLVMBuildICmp(builder, LLVMIntULT, tid, so_vtx_count, ""); @@ -2359,21 +2391,21 @@ static void si_llvm_emit_streamout(struct si_shader_context *ctx, } break; } LLVMValueRef can_emit_stream = LLVMBuildICmp(builder, LLVMIntEQ, stream_id, lp_build_const_int32(gallivm, stream), ""); lp_build_if(&if_ctx_stream, gallivm, can_emit_stream); - build_tbuffer_store_dwords(ctx, ctx->so_buffers[buf_idx], + build_tbuffer_store_dwords(ctx, so_buffers[buf_idx], vdata, num_comps, so_write_offset[buf_idx], LLVMConstInt(ctx->i32, 0, 0), so->output[i].dst_offset*4); lp_build_endif(&if_ctx_st
[Mesa-dev] [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each use
From: Marek Olšák The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute. 26011 shaders in 14651 tests Totals: SGPRS: 1146340 -> 1132676 (-1.19 %) VGPRS: 727371 -> 711730 (-2.15 %) Spilled SGPRs: 2218 -> 2078 (-6.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35841268 -> 36009732 (0.47 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222559 -> 224779 (1.00 %) Wait states: 0 -> 0 (0.00 %) --- src/gallium/drivers/radeon/radeon_llvm.h | 6 - .../drivers/radeon/radeon_setup_tgsi_llvm.c| 28 ++ src/gallium/drivers/radeonsi/si_shader.c | 27 + 3 files changed, 39 insertions(+), 22 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h index da5b7f5..f508d32 100644 --- a/src/gallium/drivers/radeon/radeon_llvm.h +++ b/src/gallium/drivers/radeon/radeon_llvm.h @@ -23,21 +23,23 @@ * Authors: Tom Stellard * */ #ifndef RADEON_LLVM_H #define RADEON_LLVM_H #include #include "gallivm/lp_bld_init.h" #include "gallivm/lp_bld_tgsi.h" +#include "tgsi/tgsi_parse.h" +#define RADEON_LLVM_MAX_INPUT_SLOTS 32 #define RADEON_LLVM_MAX_INPUTS 32 * 4 #define RADEON_LLVM_MAX_OUTPUTS 32 * 4 #define RADEON_LLVM_INITIAL_CF_DEPTH 4 #define RADEON_LLVM_MAX_SYSTEM_VALUES 4 struct radeon_llvm_branch { LLVMBasicBlockRef endif_block; LLVMBasicBlockRef if_block; @@ -55,33 +57,35 @@ struct radeon_llvm_context { /*=== Front end configuration ===*/ /* Instructions that are not described by any of the TGSI opcodes. */ /** This function is responsible for initilizing the inputs array and will be * called once for each input declared in the TGSI shader. 
*/ void (*load_input)(struct radeon_llvm_context *, unsigned input_index, - const struct tgsi_full_declaration *decl); + const struct tgsi_full_declaration *decl, + LLVMValueRef out[4]); void (*load_system_value)(struct radeon_llvm_context *, unsigned index, const struct tgsi_full_declaration *decl); void (*declare_memory_region)(struct radeon_llvm_context *, const struct tgsi_full_declaration *decl); /** This array contains the input values for the shader. Typically these * values will be in the form of a target intrinsic that will inform the * backend how to load the actual inputs to the shader. */ + struct tgsi_full_declaration input_decls[RADEON_LLVM_MAX_INPUT_SLOTS]; LLVMValueRef inputs[RADEON_LLVM_MAX_INPUTS]; LLVMValueRef outputs[RADEON_LLVM_MAX_OUTPUTS][TGSI_NUM_CHANNELS]; /** This pointer is used to contain the temporary values. * The amount of temporary used in tgsi can't be bound to a max value and * thus we must allocate this array at runtime. */ LLVMValueRef *temps; unsigned temps_count; LLVMValueRef system_values[RADEON_LLVM_MAX_SYSTEM_VALUES]; diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c index 4643e6d..11f0cf2 100644 --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c @@ -439,28 +439,43 @@ LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base, bld_base->int_bld.zero); result = LLVMConstInsertElement(result, bld->immediates[reg->Register.Index][swizzle + 1], bld_base->int_bld.one); return LLVMConstBitCast(result, ctype); } else { return LLVMConstBitCast(bld->immediates[reg->Register.Index][swizzle], ctype); } } - case TGSI_FILE_INPUT: - result = ctx->inputs[radeon_llvm_reg_index_soa(reg->Register.Index, swizzle)]; + case TGSI_FILE_INPUT: { + unsigned index = reg->Register.Index; + LLVMValueRef input[4]; + + /* I don't think doing this for vertex shaders is beneficial. 
+* For those, we want to make sure the VMEM loads are executed +* only once. Fragment shaders don't care much, because +* v_interp instructions are much cheaper than VMEM loads. +*/ + if (ctx->soa.bld_base.info->processor == PIPE_SHADER_FRAGMENT) + ctx->load_input(ctx, index, &ctx->input_decls[index], input); + else +
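The TGSI_FILE_INPUT hunk above can be illustrated with a small standalone sketch. It is a toy model rather than the radeonsi code: `toy_ctx`, `toy_load_input` and `toy_fetch_input` are hypothetical names, floats stand in for `LLVMValueRef`, and a counter stands in for emitted interp intrinsics. The `else` branch is cut off in the archived patch, so the sketch assumes it falls back to the preloaded inputs array, as the removed line did.

```c
#include <assert.h>

#define TOY_MAX_INPUT_SLOTS 4   /* hypothetical, stands in for RADEON_LLVM_MAX_INPUT_SLOTS */
#define TOY_NUM_CHANNELS    4   /* stands in for TGSI_NUM_CHANNELS */

enum toy_stage { TOY_STAGE_VERTEX, TOY_STAGE_FRAGMENT };

struct toy_ctx {
    enum toy_stage processor;
    /* Number of times the load callback ran, i.e. how many "loads" were emitted. */
    int load_count;
    /* Values loaded once up front (the pre-patch behaviour for every stage). */
    float preloaded[TOY_MAX_INPUT_SLOTS * TOY_NUM_CHANNELS];
    /* Mirrors the patched load_input signature: results go to a caller array. */
    void (*load_input)(struct toy_ctx *ctx, unsigned index,
                       float out[TOY_NUM_CHANNELS]);
};

/* Toy loader: channel c of input i gets the value i * 4 + c. */
static void toy_load_input(struct toy_ctx *ctx, unsigned index,
                           float out[TOY_NUM_CHANNELS])
{
    ctx->load_count++;
    for (unsigned c = 0; c < TOY_NUM_CHANNELS; c++)
        out[c] = (float)(index * TOY_NUM_CHANNELS + c);
}

/* Fetch one channel of one input. Fragment shaders re-emit the load at every
 * use (cheap, and in the real driver LLVM can CSE the duplicates); all other
 * stages read the value that was loaded once (assumed fallback). */
static float toy_fetch_input(struct toy_ctx *ctx, unsigned index,
                             unsigned swizzle)
{
    assert(index < TOY_MAX_INPUT_SLOTS && swizzle < TOY_NUM_CHANNELS);

    if (ctx->processor == TOY_STAGE_FRAGMENT) {
        float input[TOY_NUM_CHANNELS];
        ctx->load_input(ctx, index, input);
        return input[swizzle];
    }
    return ctx->preloaded[index * TOY_NUM_CHANNELS + swizzle];
}
```

In the toy model a fragment-stage context calls the loader once per fetch, while a vertex-stage context never calls it from the fetch path. The real patch relies on LLVM to merge the repeated fragment-stage loads, which the sketch does not model.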
[Mesa-dev] [PATCH 2/6] radeonsi: cosmetic changes in si_shader.c
From: Marek Olšák --- src/gallium/drivers/radeonsi/si_shader.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index b9ad4be..696f67b 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -98,29 +98,31 @@ struct si_shader_context */ int param_tess_offchip; LLVMTargetMachineRef tm; unsigned invariant_load_md_kind; unsigned range_md_kind; unsigned uniform_md_kind; LLVMValueRef empty_md; + /* Preloaded descriptors. */ LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS]; - LLVMValueRef lds; LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS]; LLVMValueRef sampler_views[SI_NUM_SAMPLERS]; LLVMValueRef sampler_states[SI_NUM_SAMPLERS]; LLVMValueRef fmasks[SI_NUM_SAMPLERS]; LLVMValueRef images[SI_NUM_IMAGES]; LLVMValueRef esgs_ring; LLVMValueRef gsvs_ring[4]; + + LLVMValueRef lds; LLVMValueRef gs_next_vertex[4]; LLVMValueRef return_value; LLVMTypeRef voidt; LLVMTypeRef i1; LLVMTypeRef i8; LLVMTypeRef i32; LLVMTypeRef i64; LLVMTypeRef i128; LLVMTypeRef f32; @@ -5856,21 +5858,21 @@ static void create_function(struct si_shader_context *ctx) LLVMArrayType(ctx->i32, 64), "ddxy_lds", LOCAL_ADDR_SPACE); if ((ctx->type == PIPE_SHADER_VERTEX && shader->key.vs.as_ls) || ctx->type == PIPE_SHADER_TESS_CTRL || ctx->type == PIPE_SHADER_TESS_EVAL) declare_tess_lds(ctx); } -static void preload_constants(struct si_shader_context *ctx) +static void preload_constant_buffers(struct si_shader_context *ctx) { struct lp_build_tgsi_context *bld_base = &ctx->radeon_bld.soa.bld_base; struct gallivm_state *gallivm = bld_base->base.gallivm; const struct tgsi_shader_info *info = bld_base->info; unsigned buf; LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_CONST_BUFFERS); for (buf = 0; buf < SI_NUM_CONST_BUFFERS; buf++) { if (info->const_file_max[buf] == -1) continue; @@ -6790,21 +6792,21 @@ int si_compile_tgsi_shader(struct si_screen 
*sscreen, case PIPE_SHADER_COMPUTE: ctx.radeon_bld.declare_memory_region = declare_compute_memory; break; default: assert(!"Unsupported shader type"); return -1; } create_meta_data(&ctx); create_function(&ctx); - preload_constants(&ctx); + preload_constant_buffers(&ctx); preload_shader_buffers(&ctx); preload_samplers(&ctx); preload_images(&ctx); preload_ring_buffers(&ctx); if (ctx.is_monolithic && sel->type == PIPE_SHADER_FRAGMENT && shader->key.ps.prolog.poly_stipple) { LLVMValueRef list = LLVMGetParam(ctx.radeon_bld.main_fn, SI_PARAM_RW_BUFFERS); si_llvm_emit_polygon_stipple(&ctx, list, -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] radeonsi: get rid of constant buffer preloading
From: Marek Olšák 26011 shaders in 14651 tests Totals: SGPRS: 1152636 -> 1146340 (-0.55 %) VGPRS: 728198 -> 727371 (-0.11 %) Spilled SGPRs: 3776 -> 2218 (-41.26 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35835152 -> 35841268 (0.02 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222372 -> 222559 (0.08 %) Wait states: 0 -> 0 (0.00 %) --- src/gallium/drivers/radeonsi/si_shader.c | 38 1 file changed, 14 insertions(+), 24 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index c96c52e..faa5363 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -99,21 +99,20 @@ struct si_shader_context int param_tess_offchip; LLVMTargetMachineRef tm; unsigned invariant_load_md_kind; unsigned range_md_kind; unsigned uniform_md_kind; LLVMValueRef empty_md; /* Preloaded descriptors. */ - LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS]; LLVMValueRef esgs_ring; LLVMValueRef gsvs_ring[4]; LLVMValueRef lds; LLVMValueRef gs_next_vertex[4]; LLVMValueRef return_value; LLVMTypeRef voidt; LLVMTypeRef i1; LLVMTypeRef i8; @@ -1842,20 +1841,29 @@ static void declare_compute_memory(struct radeon_llvm_context *radeon_bld, var = LLVMAddGlobalInAddressSpace(gallivm->module, LLVMArrayType(ctx->i8, sel->local_size), "compute_lds", LOCAL_ADDR_SPACE); LLVMSetAlignment(var, 4); ctx->shared_memory = LLVMBuildBitCast(gallivm->builder, var, i8p, ""); } +static LLVMValueRef load_const_buffer_desc(struct si_shader_context *ctx, int i) +{ + LLVMValueRef list_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, +SI_PARAM_CONST_BUFFERS); + + return build_indexed_load_const(ctx, list_ptr, + LLVMConstInt(ctx->i32, i, 0)); +} + static LLVMValueRef fetch_constant( struct lp_build_tgsi_context *bld_base, const struct tgsi_full_src_register *reg, enum tgsi_opcode_type type, unsigned swizzle) { struct si_shader_context *ctx = si_shader_context(bld_base); 
 struct lp_build_context *base = &bld_base->base; const struct tgsi_ind_register *ireg = &reg->Indirect; unsigned buf, idx; @@ -1869,45 +1877,46 @@ static LLVMValueRef fetch_constant( for (chan = 0; chan < TGSI_NUM_CHANNELS; ++chan) values[chan] = fetch_constant(bld_base, reg, type, chan); return lp_build_gather_values(bld_base->base.gallivm, values, 4); } buf = reg->Register.Dimension ? reg->Dimension.Index : 0; idx = reg->Register.Index * 4 + swizzle; if (!reg->Register.Indirect && !reg->Dimension.Indirect) { - LLVMValueRef c0, c1; + LLVMValueRef c0, c1, desc; - c0 = buffer_load_const(ctx, ctx->const_buffers[buf], + desc = load_const_buffer_desc(ctx, buf); + c0 = buffer_load_const(ctx, desc, LLVMConstInt(ctx->i32, idx * 4, 0)); if (!tgsi_type_is_64bit(type)) return bitcast(bld_base, type, c0); else { - c1 = buffer_load_const(ctx, ctx->const_buffers[buf], + c1 = buffer_load_const(ctx, desc, LLVMConstInt(ctx->i32, (idx + 1) * 4, 0)); return radeon_llvm_emit_fetch_64bit(bld_base, type, c0, c1); } } if (reg->Register.Dimension && reg->Dimension.Indirect) { LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_CONST_BUFFERS); LLVMValueRef index; index = get_bounded_indirect_index(ctx, &reg->DimIndirect, reg->Dimension.Index, SI_NUM_CONST_BUFFERS); bufp = build_indexed_load_const(ctx, ptr, index); } else - bufp = ctx->const_buffers[buf]; + bufp = load_const_buffer_desc(ctx, buf); addr = ctx->radeon_bld.soa.addr[ireg->Index][ireg->Swizzle]; addr = LLVMBuildLoad(base->gallivm->builder, addr, "load addr reg"); addr = lp_build_mul_imm(&bld_base->uint_bld, addr, 16); addr = lp_build_add(&bld_base->uint_bld, addr, lp_build_const_int32(base->gallivm, idx * 4)); result = buffer_