Re: [Mesa-dev] [PATCH] util/build-id: Fix address comparison for binaries with LOAD vaddr > 0
I've verified this gets the correct address. Very nice work figuring this out, Stephan!

Reviewed-by: Tapani Pälli

On 01/24/2018 04:13 PM, Stephan Gerhold wrote:

build_id_find_nhdr_for_addr() fails to find the build-id if the first LOAD segment has a virtual address other than 0x0.

For most shared libraries, the first LOAD segment has vaddr=0x0:

  Type  Offset    VirtAddr    PhysAddr    FileSiz   MemSiz    Flg  Align
  LOAD  0x00      0x          0x          0x2d2e26  0x2d2e26  R E  0x1000
  LOAD  0x2d2e54  0x002d3e54  0x002d3e54  0x2e248   0x2f148   RW   0x1000

However, compiling the Intel Vulkan driver as 32-bit binary on Android produces the following ELF header with vaddr=0x8000 instead:

  Type  Offset    VirtAddr    PhysAddr    FileSiz   MemSiz    Flg  Align
  PHDR  0x34      0x8034      0x8034      0x00100   0x00100   R    0x4
  LOAD  0x00      0x8000      0x8000      0x224a04  0x224a04  R E  0x1000
  LOAD  0x225710  0x0022e710  0x0022e710  0x25988   0x27364   RW   0x1000

build_id_find_nhdr_callback() compares the address of dli_fbase from dladdr() and dlpi_addr from dl_iterate_phdr(). With vaddr > 0, these point to a different memory address, e.g.:

  dli_fbase=0xd8395000 (offset 0x8000)
  dlpi_addr=0xd838d000

At least on glibc and bionic (Android) dli_fbase refers to the address where the shared object is mapped into the process space, whereas dlpi_addr is just the base address for the vaddrs declared in the ELF header.

To compare them correctly, we need to calculate the start of the mapping by adding the vaddr of the first LOAD segment to the base address.
Cc: Chad Versace
Cc: Emil Velikov
Cc: Tapani Pälli
Cc:
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104642
Fixes: 5c98d38 "util: Query build-id by symbol address, not library name"
---
 src/util/build_id.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/util/build_id.c b/src/util/build_id.c
index 536c74360e..fb67d160e3 100644
--- a/src/util/build_id.c
+++ b/src/util/build_id.c
@@ -58,7 +58,18 @@ build_id_find_nhdr_callback(struct dl_phdr_info *info, size_t size, void *data_)
 {
    struct callback_data *data = data_;
 
-   if ((void *)info->dlpi_addr != data->dli_fbase)
+   /* Calculate address where shared object is mapped into the process space.
+    * (Using the base address and the virtual address of the first LOAD segment)
+    */
+   void *map_start = NULL;
+   for (unsigned i = 0; i < info->dlpi_phnum; i++) {
+      if (info->dlpi_phdr[i].p_type == PT_LOAD) {
+         map_start = (void *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
+         break;
+      }
+   }
+
+   if (map_start != data->dli_fbase)
       return 0;
 
    for (unsigned i = 0; i < info->dlpi_phnum; i++) {
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
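The address arithmetic in this patch can be checked in isolation. The following is a minimal sketch, not the Mesa code: struct seg, SEG_LOAD and first_load_map_start() are hypothetical stand-ins for the ElfW(Phdr) fields and the loop in build_id_find_nhdr_callback(), so that the numbers quoted in the commit message can be plugged in without calling dl_iterate_phdr().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two program header fields the check needs. */
struct seg {
    uint32_t type;   /* segment type; PT_LOAD has the value 1 in ELF */
    uint64_t vaddr;  /* virtual address declared in the program header */
};

#define SEG_LOAD 1u

/* Return base + vaddr of the first LOAD segment, or 0 if there is none. */
static uint64_t
first_load_map_start(uint64_t base, const struct seg *segs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (segs[i].type == SEG_LOAD)
            return base + segs[i].vaddr;
    }
    return 0;
}
```

With the Android values from the message (base 0xd838d000, first LOAD vaddr 0x8000) this yields 0xd8395000, which is the dli_fbase value reported there. With a first LOAD vaddr of 0x0 the result equals the base, which is why the old comparison worked for most libraries.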
Re: [Mesa-dev] [PATCH v2 0.5/5] i965/tiled_memcpy: linear_to_ytiled a cache line at a time
The OCD in me is seeing a couple more places you could micro-optimize. Before I actually point them out,

Reviewed-by: Jason Ekstrand

On Thu, Jan 25, 2018 at 8:23 AM, Scott D Phillips <scott.d.phill...@intel.com> wrote:

> TileY's low 6 address bits are: v1 v0 u3 u2 u1 u0
> Thus a cache line in the tiled surface is composed of a 2d area of
> 16x4 bytes of the linear surface.
>
> Add a special case where the area being copied is 4-line aligned
> and a multiple of 4-lines so that entire cache lines will be
> written at a time.
>
> On Apollolake, this increases tiling throughput to wc maps by
> 84.8512% +/- 0.935379%
>

Nice!

> v2: Split [y0, y1) and [y2, y3) loops apart for clarity (Jason Ekstrand)
> ---
>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 80 +++---
>  1 file changed, 72 insertions(+), 8 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> index 53a5679691..9e6bafa4b4 100644
> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> @@ -287,8 +287,8 @@ linear_to_xtiled(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
>   */
>  static inline void
>  linear_to_ytiled(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
> -                 uint32_t y0, uint32_t y1,
> -                 char *dst, const char *src,
> +                 uint32_t y0, uint32_t y3,
> +                 char *dst, const char *src0,
>                   int32_t src_pitch,
>                   uint32_t swizzle_bit,
>                   mem_copy_fn mem_copy,
> @@ -306,6 +306,9 @@ linear_to_ytiled(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
>     const uint32_t column_width = ytile_span;
>     const uint32_t bytes_per_column = column_width * ytile_height;
>
> +   uint32_t y1 = ALIGN_UP(y0, 4);
> +   uint32_t y2 = ALIGN_DOWN(y3, 4);
> +
>     uint32_t xo0 = (x0 % ytile_span) + (x0 / ytile_span) * bytes_per_column;
>     uint32_t xo1 = (x1 % ytile_span) + (x1 / ytile_span) * bytes_per_column;
>
> @@ -319,26 +322,87 @@ linear_to_ytiled(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
>
>     uint32_t x, yo;
>
> -   src += (ptrdiff_t)y0 * src_pitch;
> +   const char *src = src0 + (ptrdiff_t)y0 * src_pitch;
>
> -   for (yo = y0 * column_width; yo < y1 * column_width; yo += column_width) {
> +   if (y0 != y1) {
> +      for (yo = y0 * column_width; yo < y1 * column_width; yo += column_width) {
> +         uint32_t xo = xo1;
> +         uint32_t swizzle = swizzle1;
> +
> +         mem_copy(dst + ((xo0 + yo) ^ swizzle0), src + x0, x1 - x0);
> +
> +         /* Step by spans/columns. As it happens, the swizzle bit flips
> +          * at each step so we don't need to calculate it explicitly.
> +          */
> +         for (x = x1; x < x2; x += ytile_span) {
> +            mem_copy_align16(dst + ((xo + yo) ^ swizzle), src + x, ytile_span);
> +            xo += bytes_per_column;
> +            swizzle ^= swizzle_bit;
> +         }
> +
> +         mem_copy_align16(dst + ((xo + yo) ^ swizzle), src + x2, x3 - x2);
> +
> +         src += src_pitch;
> +      }
> +   }
> +
> +   src = src0 + (ptrdiff_t)y1 * src_pitch;
> +
> +   for (yo = y1 * column_width; yo < y2 * column_width; yo += 4 * column_width) {
>        uint32_t xo = xo1;
>        uint32_t swizzle = swizzle1;
>
> -      mem_copy(dst + ((xo0 + yo) ^ swizzle0), src + x0, x1 - x0);
> +      if (x0 != x1) {
> +         mem_copy(dst + ((xo0 + yo + 0 * column_width) ^ swizzle0), src + x0 + 0 * src_pitch, x1 - x0);
> +         mem_copy(dst + ((xo0 + yo + 1 * column_width) ^ swizzle0), src + x0 + 1 * src_pitch, x1 - x0);
> +         mem_copy(dst + ((xo0 + yo + 2 * column_width) ^ swizzle0), src + x0 + 2 * src_pitch, x1 - x0);
> +         mem_copy(dst + ((xo0 + yo + 3 * column_width) ^ swizzle0), src + x0 + 3 * src_pitch, x1 - x0);
> +      }
>
>        /* Step by spans/columns. As it happens, the swizzle bit flips
>         * at each step so we don't need to calculate it explicitly.
>         */
>        for (x = x1; x < x2; x += ytile_span) {
> -         mem_copy_align16(dst + ((xo + yo) ^ swizzle), src + x, ytile_span);
> +         mem_copy_align16(dst + ((xo + yo + 0 * column_width) ^ swizzle), src + x + 0 * src_pitch, ytile_span);
> +         mem_copy_align16(dst + ((xo + yo + 1 * column_width) ^ swizzle), src + x + 1 * src_pitch, ytile_span);
> +         mem_copy_align16(dst + ((xo + yo + 2 * column_width) ^ swizzle), src + x + 2 * src_pitch, ytile_span);
> +         mem_copy_align16(dst + ((xo + yo + 3 * column_width) ^ swizzle), src + x + 3 * src_pitch, ytile_span);
>           xo += bytes_per_column;
>           swizzle ^= swizzle_bit;
>        }
>
> -      mem_copy_align16(dst + ((xo + yo) ^ swizzle), src + x2, x3 - x2);
> +      if (x2 != x3) {
> +         mem_copy_align16(dst + ((xo + yo + 0 * column_width) ^ swizzle), src + x2 + 0 * src_pitch, x3 -
[Mesa-dev] [PATCH] meson: Add new picture_{h264, hevc}_enc.c files to meson too
---
Very nice that this finally arrives. Can you add the files to meson too, something like this patch? I can't test it because I only have Polaris here.

 src/gallium/state_trackers/va/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/meson.build b/src/gallium/state_trackers/va/meson.build
index 35da5ab532..2eb312ce4c 100644
--- a/src/gallium/state_trackers/va/meson.build
+++ b/src/gallium/state_trackers/va/meson.build
@@ -26,7 +26,7 @@ libva_st = static_library(
     'buffer.c', 'config.c', 'context.c', 'display.c', 'image.c', 'picture.c',
     'picture_mpeg12.c', 'picture_mpeg4.c', 'picture_h264.c', 'picture_hevc.c',
     'picture_vc1.c', 'picture_mjpeg.c', 'postproc.c', 'subpicture.c',
-    'surface.c',
+    'surface.c', 'picture_h264_enc.c', 'picture_hevc_enc.c'
   ),
   c_args : [
     c_vis_args,
-- 
2.16.1
[Mesa-dev] [PATCH 1/2] mesa: add blob overrun check to program binary reads
---
 src/mesa/main/program_binary.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/mesa/main/program_binary.c b/src/mesa/main/program_binary.c
index 2786487362..68a15ec258 100644
--- a/src/mesa/main/program_binary.c
+++ b/src/mesa/main/program_binary.c
@@ -287,5 +287,19 @@ _mesa_program_binary(struct gl_context *ctx, struct gl_shader_program *sh_prog,
       return;
    }
 
+   if (blob.current != blob.end || blob.overrun) {
+      /* Something has gone wrong ignore the binary and set link status to
+       * failure.
+       */
+      assert(!"Invalid program binary cache item!");
+
+      if (ctx->_Shader->Flags & GLSL_CACHE_INFO) {
+         fprintf(stderr, "Error reading program from program binary\n");
+      }
+      sh_prog->data->LinkStatus = linking_failure;
+
+      return;
+   }
+
    sh_prog->data->LinkStatus = linking_success;
 }
-- 
2.14.3
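To illustrate what the added condition guards against, here is a small self-contained sketch of a reader that tracks the same three pieces of state (current, end, overrun). It is not Mesa's blob API: struct mini_reader, mini_read_u32() and mini_fully_consumed() are hypothetical names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader with the same three fields the patch inspects. */
struct mini_reader {
    const uint8_t *current;  /* next byte to read */
    const uint8_t *end;      /* one past the last valid byte */
    bool overrun;            /* set once a read would pass 'end' */
};

/* Read a 32-bit value; on a short buffer, flag the overrun and return 0. */
static uint32_t mini_read_u32(struct mini_reader *r)
{
    uint32_t v = 0;
    if (r->overrun || (size_t)(r->end - r->current) < sizeof(v)) {
        r->overrun = true;
        return 0;
    }
    memcpy(&v, r->current, sizeof(v));
    r->current += sizeof(v);
    return v;
}

/* Mirrors the patch's condition: all bytes consumed and no overrun seen. */
static bool mini_fully_consumed(const struct mini_reader *r)
{
    return !r->overrun && r->current == r->end;
}
```

The check catches both failure modes: a binary with trailing bytes the deserialiser never touched (current != end) and a binary that was too short (overrun set).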
[Mesa-dev] [PATCH 2/2] st/shader_cache: restore num_tgsi_tokens when loading from cache
Without this we will fail to correctly serialise programs when using glGetProgramBinary() if the program was retrieved from the disk cache rather than freshly compiled.

Fixes: c69b0dd6817b "st/glsl_to_tgsi: store num_tgsi_tokens in st_*_program"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104762
---
 src/mesa/state_tracker/st_shader_cache.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/mesa/state_tracker/st_shader_cache.c b/src/mesa/state_tracker/st_shader_cache.c
index a971b0d7ee..92c633d450 100644
--- a/src/mesa/state_tracker/st_shader_cache.c
+++ b/src/mesa/state_tracker/st_shader_cache.c
@@ -142,10 +142,11 @@ read_stream_out_from_cache(struct blob_reader *blob_reader,
 
 static void
 read_tgsi_from_cache(struct blob_reader *blob_reader,
-                     const struct tgsi_token **tokens)
+                     const struct tgsi_token **tokens,
+                     unsigned *num_tokens)
 {
-   uint32_t num_tokens = blob_read_uint32(blob_reader);
-   unsigned tokens_size = num_tokens * sizeof(struct tgsi_token);
+   *num_tokens = blob_read_uint32(blob_reader);
+   unsigned tokens_size = *num_tokens * sizeof(struct tgsi_token);
    *tokens = (const struct tgsi_token*) MALLOC(tokens_size);
    blob_copy_bytes(blob_reader, (uint8_t *) *tokens, tokens_size);
 }
@@ -175,7 +176,8 @@ st_deserialise_tgsi_program(struct gl_context *ctx,
              sizeof(stvp->result_to_output));
 
       read_stream_out_from_cache(&blob_reader, &stvp->tgsi);
-      read_tgsi_from_cache(&blob_reader, &stvp->tgsi.tokens);
+      read_tgsi_from_cache(&blob_reader, &stvp->tgsi.tokens,
+                           &stvp->num_tgsi_tokens);
 
       if (st->vp == stvp)
          st->dirty |= ST_NEW_VERTEX_PROGRAM(st, stvp);
@@ -189,7 +191,8 @@ st_deserialise_tgsi_program(struct gl_context *ctx,
                                 &sttcp->variants, &sttcp->tgsi);
 
       read_stream_out_from_cache(&blob_reader, &sttcp->tgsi);
-      read_tgsi_from_cache(&blob_reader, &sttcp->tgsi.tokens);
+      read_tgsi_from_cache(&blob_reader, &sttcp->tgsi.tokens,
+                           &sttcp->num_tgsi_tokens);
 
       if (st->tcp == sttcp)
          st->dirty |= sttcp->affected_states;
@@ -203,7 +206,8 @@ st_deserialise_tgsi_program(struct gl_context *ctx,
                                 &sttep->variants, &sttep->tgsi);
 
       read_stream_out_from_cache(&blob_reader, &sttep->tgsi);
-      read_tgsi_from_cache(&blob_reader, &sttep->tgsi.tokens);
+      read_tgsi_from_cache(&blob_reader, &sttep->tgsi.tokens,
+                           &sttep->num_tgsi_tokens);
 
       if (st->tep == sttep)
          st->dirty |= sttep->affected_states;
@@ -217,7 +221,8 @@ st_deserialise_tgsi_program(struct gl_context *ctx,
                                 &stgp->tgsi);
 
       read_stream_out_from_cache(&blob_reader, &stgp->tgsi);
-      read_tgsi_from_cache(&blob_reader, &stgp->tgsi.tokens);
+      read_tgsi_from_cache(&blob_reader, &stgp->tgsi.tokens,
+                           &stgp->num_tgsi_tokens);
 
       if (st->gp == stgp)
          st->dirty |= stgp->affected_states;
@@ -229,7 +234,8 @@ st_deserialise_tgsi_program(struct gl_context *ctx,
 
       st_release_fp_variants(st, stfp);
 
-      read_tgsi_from_cache(&blob_reader, &stfp->tgsi.tokens);
+      read_tgsi_from_cache(&blob_reader, &stfp->tgsi.tokens,
+                           &stfp->num_tgsi_tokens);
 
       if (st->fp == stfp)
          st->dirty |= stfp->affected_states;
@@ -242,7 +248,8 @@ st_deserialise_tgsi_program(struct gl_context *ctx,
       st_release_cp_variants(st, stcp);
 
       read_tgsi_from_cache(&blob_reader,
-                           (const struct tgsi_token**) &stcp->tgsi.prog);
+                           (const struct tgsi_token**) &stcp->tgsi.prog,
+                           &stcp->num_tgsi_tokens);
 
       stcp->tgsi.req_local_mem = stcp->Base.info.cs.shared_size;
       stcp->tgsi.req_private_mem = 0;
-- 
2.14.3
Re: [Mesa-dev] [PATCH 1/2] radeon/vcn: add and manage render picture list
Hi Emil,

I cherry-picked these 2 patches over to the 17.3 branch, then tested locally and confirmed that it works fine. I also added the Bugzilla reference to the patches. Please see the details in the links below:

https://lists.freedesktop.org/archives/mesa-dev/2018-January/183485.html
https://lists.freedesktop.org/archives/mesa-dev/2018-January/183484.html

Regards,
Boyuan

-----Original Message-----
From: Emil Velikov [mailto:emil.l.veli...@gmail.com]
Sent: January-24-18 8:38 AM
To: Zhang, Boyuan; ML mesa-stable
Cc: ML mesa-dev
Subject: Re: [Mesa-dev] [PATCH 1/2] radeon/vcn: add and manage render picture list

On 11 December 2017 at 16:47, wrote:
> From: Boyuan Zhang
>
> Create a list in decoder to store all render picture buffer pointers
> that are currently being used in reference picture lists.
>
> During get message buffer call, check each pointer in
> render_pic_list[] within given pic->ref[] list, remove pointers that are
> no longer being used by pic->ref[]. Then add current render surface
> pointer to the pic->render_pic_list[] and assign the associated index
> to result.curr_idx.
>
> As a result, result.curr_idx will have the correct index to represent
> the current render picture, instead of the previous incrementing values.
>
> Signed-off-by: Boyuan Zhang
> Reviewed-by: Christian König
> ---

We'd want this and 2/2 (sha's below) in stable. Otherwise people will experience regressions when updating their firmware.

f2bfd1cbb7e radeon/vcn: add and manage render picture list
2ec48039b8a radeon/uvd: add and manage render picture list

Including the bugzilla reference will be great.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104745

-Emil
[Mesa-dev] [PATCH] mesa: silence MinGW 'may be used uninitialized' warning in get.c
The warning happens on line 2114 for the memcpy(data, p, size) call. I'm not sure why that generates the warning but not the earlier use of p in the code.
---
 src/mesa/main/get.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 7f2d72a..5fee9a6 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -2051,7 +2051,7 @@ _mesa_GetUnsignedBytevEXT(GLenum pname, GLubyte *data)
    const struct value_desc *d;
    union value v;
    int shift;
-   void *p;
+   void *p = NULL;
    GLsizei size;
    const char *func = "glGetUnsignedBytevEXT";
 
-- 
2.7.4
[Mesa-dev] [PATCH 2/2] radeon/uvd: add and manage render picture list
From: Boyuan Zhang

Create a list in the decoder to store all render picture buffer pointers that are currently being used in reference picture lists.

During the get message buffer call, check each pointer in render_pic_list[] against the given pic->ref[] list and remove pointers that are no longer used by pic->ref[]. Then add the current render surface pointer to render_pic_list[] and assign the associated index to result.curr_idx.

As a result, result.curr_idx will have the correct index to represent the current render picture, instead of the previous incrementing values.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104745
Signed-off-by: Boyuan Zhang
Reviewed-by: Christian König
Cc: mesa-sta...@lists.freedesktop.org
(cherry picked from commit 2ec48039b8aa1f6a5e16f3f12483b88981d0f5d3)
---
 src/gallium/drivers/radeon/radeon_uvd.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c
index 032ed7c..87e7858 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -97,6 +97,8 @@ struct ruvd_decoder {
                 unsigned                cmd;
                 unsigned                cntl;
         } reg;
+
+        void                            *render_pic_list[16];
 };
 
 /* flush IB to the hardware */
@@ -596,7 +598,7 @@ static struct ruvd_h265 get_h265_msg(struct ruvd_decoder *dec, struct pipe_video
                                      struct pipe_h265_picture_desc *pic)
 {
         struct ruvd_h265 result;
-        unsigned i;
+        unsigned i, j;
 
         memset(&result, 0, sizeof(result));
 
@@ -676,11 +678,28 @@ static struct ruvd_h265 get_h265_msg(struct ruvd_decoder *dec, struct pipe_video
                 result.row_height_minus1[i] = pic->pps->row_height_minus1[i];
 
         result.num_delta_pocs_ref_rps_idx = pic->NumDeltaPocsOfRefRpsIdx;
-        result.curr_idx = pic->CurrPicOrderCntVal;
         result.curr_poc = pic->CurrPicOrderCntVal;
 
+        for (i = 0 ; i < 16 ; i++) {
+                for (j = 0; (pic->ref[j] != NULL) && (j < 16) ; j++) {
+                        if (dec->render_pic_list[i] == pic->ref[j])
+                                break;
+                        if (j == 15)
+                                dec->render_pic_list[i] = NULL;
+                        else if (pic->ref[j+1] == NULL)
+                                dec->render_pic_list[i] = NULL;
+                }
+        }
+        for (i = 0 ; i < 16 ; i++) {
+                if (dec->render_pic_list[i] == NULL) {
+                        dec->render_pic_list[i] = target;
+                        result.curr_idx = i;
+                        break;
+                }
+        }
+
         vl_video_buffer_set_associated_data(target, &dec->base,
-                                            (void *)(uintptr_t)pic->CurrPicOrderCntVal,
+                                            (void *)(uintptr_t)result.curr_idx,
                                             &ruvd_destroy_associated_data);
 
         for (i = 0; i < 16; ++i) {
@@ -723,7 +742,7 @@ static struct ruvd_h265 get_h265_msg(struct ruvd_decoder *dec, struct pipe_video
         memcpy(dec->it + 864, pic->pps->sps->ScalingList32x32, 2 * 64);
 
         for (i = 0 ; i < 2 ; i++) {
-                for (int j = 0 ; j < 15 ; j++)
+                for (j = 0 ; j < 15 ; j++)
                         result.direct_reflist[i][j] = pic->RefPicList[i][j];
         }
 
@@ -1407,6 +1426,8 @@ struct pipe_video_codec *si_common_uvd_create_decoder(struct pipe_context *conte
                 goto error;
         }
 
+        for (i = 0; i < 16; i++)
+                dec->render_pic_list[i] = NULL;
         dec->fb_size = (info.family == CHIP_TONGA) ? FB_BUFFER_SIZE_TONGA :
                         FB_BUFFER_SIZE;
         bs_buf_size = width * height * (512 / (16 * 16));
-- 
2.7.4
[Mesa-dev] [PATCH 1/2] radeon/vcn: add and manage render picture list
From: Boyuan Zhang

Create a list in the decoder to store all render picture buffer pointers that are currently being used in reference picture lists.

During the get message buffer call, check each pointer in render_pic_list[] against the given pic->ref[] list and remove pointers that are no longer used by pic->ref[]. Then add the current render surface pointer to render_pic_list[] and assign the associated index to result.curr_idx.

As a result, result.curr_idx will have the correct index to represent the current render picture, instead of the previous incrementing values.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104745
Signed-off-by: Boyuan Zhang
Reviewed-by: Christian König
Cc: mesa-sta...@lists.freedesktop.org
(cherry picked from commit f2bfd1cbb7e72945ca192845a1ad28426c7aea89)
---
 src/gallium/drivers/radeon/radeon_vcn_dec.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.c b/src/gallium/drivers/radeon/radeon_vcn_dec.c
index 2ece4a3..8010010 100644
--- a/src/gallium/drivers/radeon/radeon_vcn_dec.c
+++ b/src/gallium/drivers/radeon/radeon_vcn_dec.c
@@ -78,6 +78,7 @@ struct radeon_decoder {
         unsigned                        bs_size;
         unsigned                        cur_buffer;
 
+        void                            *render_pic_list[16];
 };
 
 static rvcn_dec_message_avc_t get_h264_msg(struct radeon_decoder *dec,
@@ -186,7 +187,7 @@ static rvcn_dec_message_hevc_t get_h265_msg(struct radeon_decoder *dec,
                                         struct pipe_h265_picture_desc *pic)
 {
         rvcn_dec_message_hevc_t result;
-        unsigned i;
+        unsigned i, j;
 
         memset(&result, 0, sizeof(result));
         result.sps_info_flags = 0;
@@ -273,11 +274,28 @@ static rvcn_dec_message_hevc_t get_h265_msg(struct radeon_decoder *dec,
                 result.row_height_minus1[i] = pic->pps->row_height_minus1[i];
 
         result.num_delta_pocs_ref_rps_idx = pic->NumDeltaPocsOfRefRpsIdx;
-        result.curr_idx = pic->CurrPicOrderCntVal;
         result.curr_poc = pic->CurrPicOrderCntVal;
 
+        for (i = 0 ; i < 16 ; i++) {
+                for (j = 0; (pic->ref[j] != NULL) && (j < 16) ; j++) {
+                        if (dec->render_pic_list[i] == pic->ref[j])
+                                break;
+                        if (j == 15)
+                                dec->render_pic_list[i] = NULL;
+                        else if (pic->ref[j+1] == NULL)
+                                dec->render_pic_list[i] = NULL;
+                }
+        }
+        for (i = 0 ; i < 16 ; i++) {
+                if (dec->render_pic_list[i] == NULL) {
+                        dec->render_pic_list[i] = target;
+                        result.curr_idx = i;
+                        break;
+                }
+        }
+
         vl_video_buffer_set_associated_data(target, &dec->base,
-                                            (void *)(uintptr_t)pic->CurrPicOrderCntVal,
+                                            (void *)(uintptr_t)result.curr_idx,
                                             &radeon_dec_destroy_associated_data);
 
         for (i = 0; i < 16; ++i) {
@@ -320,7 +338,7 @@ static rvcn_dec_message_hevc_t get_h265_msg(struct radeon_decoder *dec,
         memcpy(dec->it + 864, pic->pps->sps->ScalingList32x32, 2 * 64);
 
         for (i = 0 ; i < 2 ; i++) {
-                for (int j = 0 ; j < 15 ; j++)
+                for (j = 0 ; j < 15 ; j++)
                         result.direct_reflist[i][j] = pic->RefPicList[i][j];
         }
 
@@ -1236,6 +1254,8 @@ struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context,
                 goto error;
         }
 
+        for (i = 0; i < 16; i++)
+                dec->render_pic_list[i] = NULL;
         bs_buf_size = width * height * (512 / (16 * 16));
         for (i = 0; i < NUM_BUFFERS; ++i) {
                 unsigned msg_fb_it_size = FB_BUFFER_OFFSET + FB_BUFFER_SIZE;
-- 
2.7.4
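The slot bookkeeping described in the commit message (drop entries that the reference list no longer mentions, then put the current target in the first free slot) can be sketched on its own. This is a simplified illustration, not the loop from the patch: assign_slot() and is_referenced() are hypothetical helpers, and the reference list is passed with an explicit count instead of being scanned for a NULL terminator.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 16

/* True if buf appears among the first ref_count entries of refs. */
static bool is_referenced(const void *buf, void *const *refs, size_t ref_count)
{
    for (size_t j = 0; j < ref_count; j++) {
        if (refs[j] == buf)
            return true;
    }
    return false;
}

/* Free every slot whose buffer is no longer referenced, then store target
 * in the first free slot. Returns the slot index, or -1 if all are in use. */
static int assign_slot(void *slots[NUM_SLOTS], void *const *refs,
                       size_t ref_count, void *target)
{
    for (size_t i = 0; i < NUM_SLOTS; i++) {
        if (slots[i] != NULL && !is_referenced(slots[i], refs, ref_count))
            slots[i] = NULL;
    }
    for (size_t i = 0; i < NUM_SLOTS; i++) {
        if (slots[i] == NULL) {
            slots[i] = target;
            return (int)i;
        }
    }
    return -1;
}
```

The returned index plays the role of result.curr_idx: it stays within 0..15 and is reused as soon as a picture drops out of the reference list, unlike a value derived from the picture order count.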
[Mesa-dev] [PATCH v2 2/5] i965/miptree: Use cpu tiling/detiling when mapping
Rename the (un)map_gtt functions to (un)map_map (map by returning a map) and add new functions (un)map_tiled_memcpy that return a shadow buffer populated with the intel_tiled_memcpy functions.

v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 95 ---
 1 file changed, 86 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index c480eade93..85297cb0c1 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -31,6 +31,7 @@
 #include "intel_image.h"
 #include "intel_mipmap_tree.h"
 #include "intel_tex.h"
+#include "intel_tiled_memcpy.h"
 #include "intel_blit.h"
 #include "intel_fbo.h"
 
@@ -3028,10 +3029,10 @@ intel_miptree_unmap_raw(struct intel_mipmap_tree *mt)
 }
 
 static void
-intel_miptree_map_gtt(struct brw_context *brw,
-                      struct intel_mipmap_tree *mt,
-                      struct intel_miptree_map *map,
-                      unsigned int level, unsigned int slice)
+intel_miptree_map_map(struct brw_context *brw,
+                      struct intel_mipmap_tree *mt,
+                      struct intel_miptree_map *map,
+                      unsigned int level, unsigned int slice)
 {
    unsigned int bw, bh;
    void *base;
@@ -3049,7 +3050,7 @@ intel_miptree_map_gtt(struct brw_context *brw,
    y /= bh;
    x /= bw;
 
-   base = intel_miptree_map_raw(brw, mt, map->mode);
+   base = intel_miptree_map_raw(brw, mt, map->mode | MAP_RAW);
 
    if (base == NULL)
       map->ptr = NULL;
@@ -3075,11 +3076,80 @@ intel_miptree_map_gtt(struct brw_context *brw,
 }
 
 static void
-intel_miptree_unmap_gtt(struct intel_mipmap_tree *mt)
+intel_miptree_unmap_map(struct intel_mipmap_tree *mt)
 {
    intel_miptree_unmap_raw(mt);
 }
 
+/* Compute extent parameters for use with tiled_memcpy functions.
+ * xs are in units of bytes and ys are in units of strides.
+ */
+static inline void
+tile_extents(struct intel_mipmap_tree *mt, struct intel_miptree_map *map,
+             unsigned int level, unsigned int slice, unsigned int *x1,
+             unsigned int *x2, unsigned int *y1, unsigned int *y2)
+{
+   unsigned int block_width, block_height, block_bytes;
+   unsigned int x0_el, y0_el;
+
+   _mesa_get_format_block_size(mt->format, &block_width, &block_height);
+   block_bytes = _mesa_get_format_bytes(mt->format);
+
+   assert(map->x % block_width == 0);
+   assert(map->y % block_height == 0);
+
+   intel_miptree_get_image_offset(mt, level, slice, &x0_el, &y0_el);
+   *x1 = (map->x / block_width + x0_el) * block_bytes;
+   *y1 = map->y / block_height + y0_el;
+   *x2 = (DIV_ROUND_UP(map->x + map->w, block_width) + x0_el) * block_bytes;
+   *y2 = DIV_ROUND_UP(map->y + map->h, block_height) + y0_el;
+}
+
+static void
+intel_miptree_map_tiled_memcpy(struct brw_context *brw,
+                               struct intel_mipmap_tree *mt,
+                               struct intel_miptree_map *map,
+                               unsigned int level, unsigned int slice)
+{
+   unsigned int x1, x2, y1, y2;
+   tile_extents(mt, map, level, slice, &x1, &x2, &y1, &y2);
+   map->stride = _mesa_format_row_stride(mt->format, map->w);
+   map->buffer = map->ptr = malloc(map->stride * (y2 - y1));
+
+   if (!(map->mode & GL_MAP_INVALIDATE_RANGE_BIT)) {
+      char *src = intel_miptree_map_raw(brw, mt, map->mode | MAP_RAW);
+      src += mt->offset;
+
+      tiled_to_linear(x1, x2, y1, y2, map->ptr, src, map->stride,
+                      mt->surf.row_pitch, brw->has_swizzling, mt->surf.tiling,
+                      memcpy);
+
+      intel_miptree_unmap_raw(mt);
+   }
+}
+
+static void
+intel_miptree_unmap_tiled_memcpy(struct brw_context *brw,
+                                 struct intel_mipmap_tree *mt,
+                                 struct intel_miptree_map *map,
+                                 unsigned int level,
+                                 unsigned int slice)
+{
+   if (map->mode & GL_MAP_WRITE_BIT) {
+      unsigned int x1, x2, y1, y2;
+      tile_extents(mt, map, level, slice, &x1, &x2, &y1, &y2);
+
+      char *dst = intel_miptree_map_raw(brw, mt, map->mode | MAP_RAW);
+      dst += mt->offset;
+
+      linear_to_tiled(x1, x2, y1, y2, dst, map->ptr, mt->surf.row_pitch,
+                      map->stride, brw->has_swizzling, mt->surf.tiling, memcpy);
+
+      intel_miptree_unmap_raw(mt);
+   }
+   free(map->buffer);
+   map->buffer = map->ptr = NULL;
+}
+
 static void
 intel_miptree_map_blit(struct brw_context *brw,
                        struct intel_mipmap_tree *mt,
@@ -3637,8 +3707,10 @@ intel_miptree_map(struct brw_context *brw,
               (mt->surf.row_pitch % 16 == 0)) {
       intel_miptree_map_movntdqa(brw, mt, map, level, slice);
 #endif
+   } else if (mt->surf.tiling != ISL_TILING_LINEAR) {
+      intel_miptree_map_tiled_memcpy(brw, mt, map, level, slice);
    } else {
-
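The extent arithmetic in tile_extents() (x in bytes, y in rows of blocks, both offset by the image position inside the miptree) may be easier to follow with concrete numbers. The sketch below repeats the same four formulas on plain integers. struct extents and compute_extents() are hypothetical names, and the format parameters are passed in directly rather than looked up from a Mesa format.

```c
#include <stdint.h>

struct extents {
    uint32_t x1, x2;  /* byte range [x1, x2) within a row */
    uint32_t y1, y2;  /* row range [y1, y2), in units of block rows */
};

static uint32_t div_round_up(uint32_t n, uint32_t d)
{
    return (n + d - 1) / d;
}

/* x, y, w, h: mapped rectangle in pixels (x and y block-aligned).
 * block_w, block_h, block_bytes: format block size in pixels and bytes.
 * x0_el, y0_el: offset of the image within the miptree, in blocks. */
static struct extents
compute_extents(uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                uint32_t block_w, uint32_t block_h, uint32_t block_bytes,
                uint32_t x0_el, uint32_t y0_el)
{
    struct extents e;
    e.x1 = (x / block_w + x0_el) * block_bytes;
    e.y1 = y / block_h + y0_el;
    e.x2 = (div_round_up(x + w, block_w) + x0_el) * block_bytes;
    e.y2 = div_round_up(y + h, block_h) + y0_el;
    return e;
}
```

For a format with 4x4 blocks of 8 bytes, mapping a 10x5 pixel rectangle at (4, 8) in an image placed at block offset (2, 1) gives bytes 24 to 48 and block rows 3 to 5. The upper bounds round up so that partially covered blocks are included.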
[Mesa-dev] [PATCH v2 0.5/5] i965/tiled_memcpy: linear_to_ytiled a cache line at a time
TileY's low 6 address bits are: v1 v0 u3 u2 u1 u0
Thus a cache line in the tiled surface is composed of a 2d area of 16x4 bytes of the linear surface.

Add a special case where the area being copied is 4-line aligned and a multiple of 4-lines so that entire cache lines will be written at a time.

On Apollolake, this increases tiling throughput to wc maps by 84.8512% +/- 0.935379%

v2: Split [y0, y1) and [y2, y3) loops apart for clarity (Jason Ekstrand)
---
 src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 80 +++---
 1 file changed, 72 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
index 53a5679691..9e6bafa4b4 100644
--- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
+++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
@@ -287,8 +287,8 @@ linear_to_xtiled(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
  */
 static inline void
 linear_to_ytiled(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
-                 uint32_t y0, uint32_t y1,
-                 char *dst, const char *src,
+                 uint32_t y0, uint32_t y3,
+                 char *dst, const char *src0,
                  int32_t src_pitch,
                  uint32_t swizzle_bit,
                  mem_copy_fn mem_copy,
@@ -306,6 +306,9 @@ linear_to_ytiled(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
    const uint32_t column_width = ytile_span;
    const uint32_t bytes_per_column = column_width * ytile_height;
 
+   uint32_t y1 = ALIGN_UP(y0, 4);
+   uint32_t y2 = ALIGN_DOWN(y3, 4);
+
    uint32_t xo0 = (x0 % ytile_span) + (x0 / ytile_span) * bytes_per_column;
    uint32_t xo1 = (x1 % ytile_span) + (x1 / ytile_span) * bytes_per_column;
 
@@ -319,26 +322,87 @@ linear_to_ytiled(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
 
    uint32_t x, yo;
 
-   src += (ptrdiff_t)y0 * src_pitch;
+   const char *src = src0 + (ptrdiff_t)y0 * src_pitch;
 
-   for (yo = y0 * column_width; yo < y1 * column_width; yo += column_width) {
+   if (y0 != y1) {
+      for (yo = y0 * column_width; yo < y1 * column_width; yo += column_width) {
+         uint32_t xo = xo1;
+         uint32_t swizzle = swizzle1;
+
+         mem_copy(dst + ((xo0 + yo) ^ swizzle0), src + x0, x1 - x0);
+
+         /* Step by spans/columns. As it happens, the swizzle bit flips
+          * at each step so we don't need to calculate it explicitly.
+          */
+         for (x = x1; x < x2; x += ytile_span) {
+            mem_copy_align16(dst + ((xo + yo) ^ swizzle), src + x, ytile_span);
+            xo += bytes_per_column;
+            swizzle ^= swizzle_bit;
+         }
+
+         mem_copy_align16(dst + ((xo + yo) ^ swizzle), src + x2, x3 - x2);
+
+         src += src_pitch;
+      }
+   }
+
+   src = src0 + (ptrdiff_t)y1 * src_pitch;
+
+   for (yo = y1 * column_width; yo < y2 * column_width; yo += 4 * column_width) {
       uint32_t xo = xo1;
       uint32_t swizzle = swizzle1;
 
-      mem_copy(dst + ((xo0 + yo) ^ swizzle0), src + x0, x1 - x0);
+      if (x0 != x1) {
+         mem_copy(dst + ((xo0 + yo + 0 * column_width) ^ swizzle0), src + x0 + 0 * src_pitch, x1 - x0);
+         mem_copy(dst + ((xo0 + yo + 1 * column_width) ^ swizzle0), src + x0 + 1 * src_pitch, x1 - x0);
+         mem_copy(dst + ((xo0 + yo + 2 * column_width) ^ swizzle0), src + x0 + 2 * src_pitch, x1 - x0);
+         mem_copy(dst + ((xo0 + yo + 3 * column_width) ^ swizzle0), src + x0 + 3 * src_pitch, x1 - x0);
+      }
 
       /* Step by spans/columns. As it happens, the swizzle bit flips
        * at each step so we don't need to calculate it explicitly.
        */
       for (x = x1; x < x2; x += ytile_span) {
-         mem_copy_align16(dst + ((xo + yo) ^ swizzle), src + x, ytile_span);
+         mem_copy_align16(dst + ((xo + yo + 0 * column_width) ^ swizzle), src + x + 0 * src_pitch, ytile_span);
+         mem_copy_align16(dst + ((xo + yo + 1 * column_width) ^ swizzle), src + x + 1 * src_pitch, ytile_span);
+         mem_copy_align16(dst + ((xo + yo + 2 * column_width) ^ swizzle), src + x + 2 * src_pitch, ytile_span);
+         mem_copy_align16(dst + ((xo + yo + 3 * column_width) ^ swizzle), src + x + 3 * src_pitch, ytile_span);
          xo += bytes_per_column;
          swizzle ^= swizzle_bit;
       }
 
-      mem_copy_align16(dst + ((xo + yo) ^ swizzle), src + x2, x3 - x2);
+      if (x2 != x3) {
+         mem_copy_align16(dst + ((xo + yo + 0 * column_width) ^ swizzle), src + x2 + 0 * src_pitch, x3 - x2);
+         mem_copy_align16(dst + ((xo + yo + 1 * column_width) ^ swizzle), src + x2 + 1 * src_pitch, x3 - x2);
+         mem_copy_align16(dst + ((xo + yo + 2 * column_width) ^ swizzle), src + x2 + 2 * src_pitch, x3 - x2);
+         mem_copy_align16(dst + ((xo + yo + 3 * column_width) ^ swizzle), src + x2 + 3 * src_pitch, x3 - x2);
+      }
 
-      src += src_pitch;
+      src += 4 * src_pitch;
+   }
+
+   if (y2 != y3) {
+      src = src0 + (ptrdiff_t)y2
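Two ideas from this patch can be tried out in a few lines: why a 16x4 byte area lands in one 64-byte cache line, and how the row range [y0, y3) is split into an unaligned head, a 4-row aligned body and an unaligned tail. The sketch below is an illustration under assumptions, not the Mesa code. It assumes a span of 16 bytes and a tile height of 32 rows, ignores swizzling, and uses the hypothetical names ytile_offset() and split_rows(). The clamping in split_rows() for ranges too short to contain an aligned group is an addition of this sketch and does not appear in the hunk above.

```c
#include <stdint.h>

/* Assumed Y-tile geometry: 16-byte wide columns, 32 rows per tile. */
enum { YTILE_SPAN = 16, YTILE_HEIGHT = 32 };

/* Byte offset inside one Y tile for linear position (x, y), no swizzle.
 * Same shape as the patch: xo = x % span + (x / span) * bytes_per_column,
 * yo = y * column_width. */
static uint32_t ytile_offset(uint32_t x, uint32_t y)
{
    const uint32_t bytes_per_column = YTILE_SPAN * YTILE_HEIGHT;
    return (x % YTILE_SPAN) + (x / YTILE_SPAN) * bytes_per_column
           + y * YTILE_SPAN;
}

static uint32_t align_up4(uint32_t v)   { return (v + 3u) & ~3u; }
static uint32_t align_down4(uint32_t v) { return v & ~3u; }

/* Rows [y0, y1) and [y2, y3) are copied one at a time,
 * rows [y1, y2) four at a time. */
struct row_split { uint32_t y1, y2; };

static struct row_split split_rows(uint32_t y0, uint32_t y3)
{
    struct row_split s;
    s.y1 = align_up4(y0);
    if (s.y1 > y3)       /* sketch-only clamp for short ranges */
        s.y1 = y3;
    s.y2 = align_down4(y3);
    if (s.y2 < s.y1)     /* sketch-only clamp for short ranges */
        s.y2 = s.y1;
    return s;
}
```

In ytile_offset() the low four bits come from x (u3..u0) and the next two from y (v1 v0), so every (x, y) with x < 16 and y < 4 falls into offsets 0..63. Copying four consecutive rows of one span therefore fills a whole cache line before moving on, which is the point of the aligned middle loop.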
[Mesa-dev] [PATCH] ac/nir: Correctly handle imod with different signs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102032
---
 src/amd/common/ac_nir_to_llvm.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 8ae8650a7b..4f1e4af37b 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1671,6 +1671,23 @@ static LLVMValueRef emit_ddxy_interp(
         return ac_build_gather_values(&ctx->ac, result, 4);
 }
 
+static LLVMValueRef emit_imod(struct ac_llvm_context *ctx, LLVMValueRef src0, LLVMValueRef src1)
+{
+        /* The imod result should have the same sign as src1 when not 0. */
+
+        LLVMValueRef result = LLVMBuildSRem(ctx->builder, src0, src1, "");
+
+        LLVMValueRef diff_sign = LLVMBuildXor(ctx->builder, result, src1, "");
+        diff_sign = LLVMBuildICmp(ctx->builder, LLVMIntSLT, diff_sign, ctx->i32_0, "");
+
+        LLVMValueRef nonzero = LLVMBuildICmp(ctx->builder, LLVMIntNE, result, ctx->i32_0, "");
+
+        LLVMValueRef cond = LLVMBuildAnd(ctx->builder, diff_sign, nonzero, "");
+        LLVMValueRef offset = LLVMBuildSelect(ctx->builder, cond, src1, ctx->i32_0, "");
+
+        return LLVMBuildAdd(ctx->builder, result, offset, "");
+}
+
 static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
 {
         LLVMValueRef src[4], result = NULL;
@@ -1733,7 +1750,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
                 result = LLVMBuildMul(ctx->ac.builder, src[0], src[1], "");
                 break;
         case nir_op_imod:
-                result = emit_imod(&ctx->ac, src[0], src[1]);
+                result = emit_imod(&ctx->ac, src[0], src[1]);
                 break;
         case nir_op_umod:
                 result = LLVMBuildURem(ctx->ac.builder, src[0], src[1], "");
-- 
2.16.1
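The sequence of LLVM instructions in emit_imod() corresponds to a small piece of integer arithmetic that can be written in plain C. The sketch below mirrors it step by step (truncating remainder, sign comparison via XOR, conditional add of the divisor). The name imod_floor() is made up for this example, and the function assumes b is non-zero and that the pair is not (INT32_MIN, -1), since the C % operator is undefined for those inputs.

```c
#include <stdint.h>

/* Modulo whose non-zero result takes the sign of the divisor b. */
static int32_t imod_floor(int32_t a, int32_t b)
{
    int32_t r = a % b;            /* like srem: sign follows the dividend */

    /* (r ^ b) < 0 is true exactly when r and b have different signs. */
    if (r != 0 && ((r ^ b) < 0))
        r += b;                   /* shift into the divisor's sign range */

    return r;
}
```

For example, -7 and 3 give a truncating remainder of -1, which differs in sign from the divisor, so the result becomes -1 + 3 = 2. When the remainder is zero or already has the divisor's sign, it is returned unchanged.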
[Mesa-dev] [Bug 102032] nir_op_imod is incorrectly implemented as LLVM's srem
https://bugs.freedesktop.org/show_bug.cgi?id=102032

--- Comment #1 from Bas Nieuwenhuizen ---
I went looking into why there were no good CTS tests for this and found this in the Vulkan spec:

    For the OpSRem and OpSMod instructions, if either operand is negative the result is undefined.

    Note: While the OpSRem and OpSMod instructions are supported by the Vulkan environment, they require non-negative values and thus do not enable additional functionality beyond what OpUMod provides.

While I'm open to fixing this, you may want to rethink what you are doing.
Re: [Mesa-dev] [PATCH 4/4] anv/pipeline: remove the pipeline layout field from anv_pipeline
3 and 4 are

Reviewed-by: Jason Ekstrand

On Thu, Jan 25, 2018 at 4:24 AM, Iago Toral Quiroga wrote:

> It no longer has any users.
>
> Suggested-by: Jason Ekstrand
> ---
>  src/intel/vulkan/anv_pipeline.c  | 2 --
>  src/intel/vulkan/anv_private.h   | 1 -
>  src/intel/vulkan/genX_pipeline.c | 1 -
>  3 files changed, 4 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
> index 4dc18096af5..43ae9f5ef91 100644
> --- a/src/intel/vulkan/anv_pipeline.c
> +++ b/src/intel/vulkan/anv_pipeline.c
> @@ -1297,8 +1297,6 @@ anv_pipeline_init(struct anv_pipeline *pipeline,
>     assert(pCreateInfo->subpass < render_pass->subpass_count);
>     pipeline->subpass = &render_pass->subpasses[pCreateInfo->subpass];
>
> -   pipeline->layout = anv_pipeline_layout_from_handle(pCreateInfo->layout);
> -
>     result = anv_reloc_list_init(&pipeline->batch_relocs, alloc);
>     if (result != VK_SUCCESS)
>        return result;
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> index ae99cd51ff4..ea3af3a0f2b 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -2147,7 +2147,6 @@ struct anv_pipeline {
>     struct anv_dynamic_state dynamic_state;
>
>     struct anv_subpass * subpass;
> -   struct anv_pipeline_layout * layout;
>
>     bool needs_data_cache;
>
> diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
> index 82fdf206a95..91cc37de04a 100644
> --- a/src/intel/vulkan/genX_pipeline.c
> +++ b/src/intel/vulkan/genX_pipeline.c
> @@ -1756,7 +1756,6 @@ compute_pipeline_create(
>        return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
>
>     pipeline->device = device;
> -   pipeline->layout = anv_pipeline_layout_from_handle(pCreateInfo->layout);
>
>     pipeline->blend_state.map = NULL;
>
> --
> 2.14.1
>
Re: [Mesa-dev] [PATCH 2/4] anv/pipeline: don't take the layout from the pipeline to compile shaders
I had a few nits below. With those fixed,

Reviewed-by: Jason Ekstrand

On Thu, Jan 25, 2018 at 4:24 AM, Iago Toral Quiroga wrote:

> The Vulkan spec states that VkPipelineLayout objects must not be
> destroyed while any command buffer that uses them is in the recording
> state, but it permits them to be destroyed otherwise. This means that
> applications are allowed to free pipeline layouts after command recording
> is finished even if there are pipeline objects that still exist and were
> created with these layouts.
>
> There are two solutions to this, one is to use reference counting on
> pipeline layout objects. The other is to avoid holding references to
> pipeline layouts where they are not really needed.
>
> This patch takes a step towards the second option by making the
> pipeline shader compile code take pipeline layout from the
> VkGraphicsPipelineCreateInfo provided rather than the pipeline
> object.
>
> A follow-up patch will remove any remaining uses of the layout field
> so we can remove it from the pipeline object and avoid the need
> for reference counting.
> > Suggested-by: Jason Ekstrand > --- > src/intel/vulkan/anv_nir.h | 3 +- > src/intel/vulkan/anv_nir_apply_pipeline_layout.c | 2 +- > src/intel/vulkan/anv_nir_lower_ycbcr_textures.c | 9 ++-- > src/intel/vulkan/anv_pipeline.c | 54 > > 4 files changed, 44 insertions(+), 24 deletions(-) > > diff --git a/src/intel/vulkan/anv_nir.h b/src/intel/vulkan/anv_nir.h > index 8ac0a119dac..ce95b40b014 100644 > --- a/src/intel/vulkan/anv_nir.h > +++ b/src/intel/vulkan/anv_nir.h > @@ -38,9 +38,10 @@ void anv_nir_lower_push_constants(nir_shader *shader); > bool anv_nir_lower_multiview(nir_shader *shader, uint32_t view_mask); > > bool anv_nir_lower_ycbcr_textures(nir_shader *shader, > - struct anv_pipeline *pipeline); > + struct anv_pipeline_layout *layout); > > void anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline, > + struct anv_pipeline_layout *layout, > nir_shader *shader, > struct brw_stage_prog_data *prog_data, > struct anv_pipeline_bind_map *map); > diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c > b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c > index 6775f9b464e..acabc5426be 100644 > --- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c > +++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c > @@ -326,11 +326,11 @@ setup_vec4_uniform_value(uint32_t *params, uint32_t > offset, unsigned n) > > void > anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline, > + struct anv_pipeline_layout *layout, >nir_shader *shader, >struct brw_stage_prog_data *prog_data, >struct anv_pipeline_bind_map *map) > { > - struct anv_pipeline_layout *layout = pipeline->layout; > gl_shader_stage stage = shader->info.stage; > > struct apply_pipeline_layout_state state = { > diff --git a/src/intel/vulkan/anv_nir_lower_ycbcr_textures.c > b/src/intel/vulkan/anv_nir_lower_ycbcr_textures.c > index 028f24e2f60..ad793ee0a0c 100644 > --- a/src/intel/vulkan/anv_nir_lower_ycbcr_textures.c > +++ b/src/intel/vulkan/anv_nir_lower_ycbcr_textures.c > @@ -316,13 +316,13 @@ 
swizzle_channel(struct isl_swizzle swizzle, unsigned > channel) > } > > static bool > -try_lower_tex_ycbcr(struct anv_pipeline *pipeline, > +try_lower_tex_ycbcr(struct anv_pipeline_layout *layout, > nir_builder *builder, > nir_tex_instr *tex) > { > nir_variable *var = tex->texture->var; > const struct anv_descriptor_set_layout *set_layout = > - pipeline->layout->set[var->data.descriptor_set].layout; > + layout->set[var->data.descriptor_set].layout; > const struct anv_descriptor_set_binding_layout *binding = >_layout->binding[var->data.binding]; > > @@ -440,7 +440,8 @@ try_lower_tex_ycbcr(struct anv_pipeline *pipeline, > } > > bool > -anv_nir_lower_ycbcr_textures(nir_shader *shader, struct anv_pipeline > *pipeline) > +anv_nir_lower_ycbcr_textures(nir_shader *shader, > + struct anv_pipeline_layout *layout) > { > bool progress = false; > > @@ -458,7 +459,7 @@ anv_nir_lower_ycbcr_textures(nir_shader *shader, > struct anv_pipeline *pipeline) > continue; > > nir_tex_instr *tex = nir_instr_as_tex(instr); > -function_progress |= try_lower_tex_ycbcr(pipeline, , > tex); > +function_progress |= try_lower_tex_ycbcr(layout, , > tex); > } >} > > diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_ > pipeline.c > index
Re: [Mesa-dev] [PATCH 3/4] r600: add ARB_query_buffer_object support
Am 25.01.2018 um 01:40 schrieb Dave Airlie: > From: Dave Airlie> > This uses a different shader than radeonsi, as we can't address non-256 > aligned ssbos, which the radeonsi code does. This passes some extra > offsets into the shader. Couldn't you just require the query buffers to have sufficient alignment in the first place, hence simplifying this? ssbo's need to have 256B alignment as well, as do UBOs. Albeit I can't really see what GL would require, buffer object alignment is quite a mystery to me in general... > > It also contains a set of u64 instruction implementation that may > or may not be complete (at least the u64div is definitely not something > that works outside this use-case). If r600 grows 64-bit integers, > it will use the GLSL lowering for divmod. > > Signed-off-by: Dave Airlie > --- ... > +static int emit_u64add(struct r600_shader_ctx *ctx, int op, > +int treg, > +int src0_sel, int src0_chan, > +int src1_sel, int src1_chan) > +{ > + struct r600_bytecode_alu alu; > + int r; > + int opc; > + > + if (op == ALU_OP2_ADD_INT) > + opc = ALU_OP2_ADDC_UINT; > + else > + opc = ALU_OP2_SUBB_UINT; > + > + memset(, 0, sizeof(struct r600_bytecode_alu)); > + alu.op = op;; > + alu.dst.sel = treg; > + alu.dst.chan = 0; > + alu.dst.write = 1; > + alu.src[0].sel = src0_sel; > + alu.src[0].chan = src0_chan + 0; > + alu.src[1].sel = src1_sel; > + alu.src[1].chan = src1_chan + 0; > + alu.src[1].neg = 0; > + r = r600_bytecode_add_alu(ctx->bc, ); > + if (r) > + return r; > + > + memset(, 0, sizeof(struct r600_bytecode_alu)); > + alu.op = op; > + alu.dst.sel = treg; > + alu.dst.chan = 1; > + alu.dst.write = 1; > + alu.src[0].sel = src0_sel; > + alu.src[0].chan = src0_chan + 1; > + alu.src[1].sel = src1_sel; > + alu.src[1].chan = src1_chan + 1; > + alu.src[1].neg = 0; > + r = r600_bytecode_add_alu(ctx->bc, ); > + if (r) > + return r; > + > + memset(, 0, sizeof(struct r600_bytecode_alu)); > + alu.op = opc; > + alu.dst.sel = treg; > + alu.dst.chan = 2; > + alu.dst.write = 
1; > + alu.last = 1; > + alu.src[0].sel = src0_sel; > + alu.src[0].chan = src0_chan + 0; > + alu.src[1].sel = src1_sel; > + alu.src[1].chan = src1_chan + 0; > + alu.src[1].neg = 0; > + r = r600_bytecode_add_alu(ctx->bc, ); > + if (r) > + return r; > + > + memset(, 0, sizeof(struct r600_bytecode_alu)); > + alu.op = op; > + alu.dst.sel = treg; > + alu.dst.chan = 1; > + alu.dst.write = 1; > + alu.src[0].sel = treg; > + alu.src[0].chan = 1; > + alu.src[1].sel = treg; > + alu.src[1].chan = 2; > + alu.last = 1; > + r = r600_bytecode_add_alu(ctx->bc, ); > + if (r) > + return r; > + return 0; > +} > + > +static int egcm_u64add(struct r600_shader_ctx *ctx) Couldn't you call into emit_u64add for performing the actual add? Or maybe it wouldn't really be simpler... > +{ > + struct tgsi_full_instruction *inst = > >parse.FullToken.FullInstruction; > + struct r600_bytecode_alu alu; > + int r; > + int treg = ctx->temp_reg; > + int op = ALU_OP2_ADD_INT, opc = ALU_OP2_ADDC_UINT; > + > + if (ctx->src[1].neg) { > + op = ALU_OP2_SUB_INT; > + opc = ALU_OP2_SUBB_UINT; > + } > + memset(, 0, sizeof(struct r600_bytecode_alu)); > + alu.op = op;; > + alu.dst.sel = treg; > + alu.dst.chan = 0; > + alu.dst.write = 1; > + r600_bytecode_src([0], >src[0], 0); > + r600_bytecode_src([1], >src[1], 0); > + alu.src[1].neg = 0; > + r = r600_bytecode_add_alu(ctx->bc, ); > + if (r) > + return r; > + > + memset(, 0, sizeof(struct r600_bytecode_alu)); > + alu.op = op; > + alu.dst.sel = treg; > + alu.dst.chan = 1; > + alu.dst.write = 1; > + r600_bytecode_src([0], >src[0], 1); > + r600_bytecode_src([1], >src[1], 1); > + alu.src[1].neg = 0; > + r = r600_bytecode_add_alu(ctx->bc, ); > + if (r) > + return r; > + > + memset(, 0, sizeof(struct r600_bytecode_alu)); > + alu.op = opc ; > + alu.dst.sel = treg; > + alu.dst.chan = 2; > + alu.dst.write = 1; > + alu.last = 1; > + r600_bytecode_src([0], >src[0], 0); > + r600_bytecode_src([1], >src[1], 0); > + alu.src[1].neg = 0; > + r = r600_bytecode_add_alu(ctx->bc, ); > + 
if (r) > + return r; > + > + memset(, 0, sizeof(struct r600_bytecode_alu)); > + alu.op = op; > + tgsi_dst(ctx, >Dst[0], 1, ); > + alu.src[0].sel = treg; > + alu.src[0].chan = 1; > + alu.src[1].sel = treg; > + alu.src[1].chan = 2; > + alu.last = 1; > + r = r600_bytecode_add_alu(ctx->bc, ); > + if (r) > +
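As a reading aid for the quoted emit_u64add(), the four ALU operations (low add, high add, carry of the low words, high plus carry) can be sketched in plain C. This is an illustration, not the r600 code: struct u32x2, u64_add_halves() and u64_sub_halves() are hypothetical names, and the sketch assumes that ALU_OP2_ADDC_UINT yields the carry-out of an unsigned add and ALU_OP2_SUBB_UINT the borrow of an unsigned subtract.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit channels holding one 64-bit value. */
struct u32x2 {
    uint32_t lo;
    uint32_t hi;
};

/* 64-bit add built from 32-bit operations. */
static struct u32x2 u64_add_halves(struct u32x2 a, struct u32x2 b)
{
    struct u32x2 r;
    uint32_t carry;

    r.lo = a.lo + b.lo;        /* low words, wraps modulo 2^32 */
    r.hi = a.hi + b.hi;        /* high words, carry not yet applied */
    carry = (r.lo < a.lo);     /* 1 if the low add wrapped (assumed ADDC) */
    r.hi += carry;             /* fold the carry into the high word */
    return r;
}

/* 64-bit subtract, the same shape with a borrow instead of a carry. */
static struct u32x2 u64_sub_halves(struct u32x2 a, struct u32x2 b)
{
    struct u32x2 r;
    uint32_t borrow;

    r.lo = a.lo - b.lo;        /* low words, wraps modulo 2^32 */
    r.hi = a.hi - b.hi;        /* high words, borrow not yet applied */
    borrow = (a.lo < b.lo);    /* 1 if the low subtract wrapped (assumed SUBB) */
    r.hi -= borrow;            /* take the borrow out of the high word */
    return r;
}
```

Adding {lo = 0xFFFFFFFF, hi = 0} and {lo = 1, hi = 0} yields {lo = 0, hi = 1}, which is the case where the third and fourth operations matter.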
Re: [Mesa-dev] [PATCH 1/4] anv/descriptor_set: add reference counting for descriptor set layouts
On Thu, Jan 25, 2018 at 4:24 AM, Iago Toral Quirogawrote: > The spec states that descriptor set layouts can be destroyed almost > at any time: > >"VkDescriptorSetLayout objects may be accessed by commands that > operate on descriptor sets allocated using that layout, and those > descriptor sets must not be updated with vkUpdateDescriptorSets > after the descriptor set layout has been destroyed. Otherwise, > descriptor set layouts can be destroyed any time they are not in > use by an API command." > > Fixes the following work-in-progress CTS tests: > dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics > dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.compute > > Suggested-by: Jason Ekstrand > --- > src/intel/vulkan/anv_cmd_buffer.c | 6 ++ > src/intel/vulkan/anv_descriptor_set.c | 17 ++--- > src/intel/vulkan/anv_private.h| 26 -- > 3 files changed, 40 insertions(+), 9 deletions(-) > > diff --git a/src/intel/vulkan/anv_cmd_buffer.c b/src/intel/vulkan/anv_cmd_ > buffer.c > index bf80061c6d4..521cf6b6a54 100644 > --- a/src/intel/vulkan/anv_cmd_buffer.c > +++ b/src/intel/vulkan/anv_cmd_buffer.c > @@ -913,8 +913,7 @@ void anv_CmdPushDescriptorSetKHR( > > assert(_set < MAX_SETS); > > - const struct anv_descriptor_set_layout *set_layout = > - layout->set[_set].layout; > + struct anv_descriptor_set_layout *set_layout = > layout->set[_set].layout; > > struct anv_push_descriptor_set *push_set = >anv_cmd_buffer_get_push_descriptor_set(cmd_buffer, > @@ -1006,8 +1005,7 @@ void anv_CmdPushDescriptorSetWithTemplateKHR( > > assert(_set < MAX_PUSH_DESCRIPTORS); > > - const struct anv_descriptor_set_layout *set_layout = > - layout->set[_set].layout; > + struct anv_descriptor_set_layout *set_layout = > layout->set[_set].layout; > > struct anv_push_descriptor_set *push_set = >anv_cmd_buffer_get_push_descriptor_set(cmd_buffer, > diff --git a/src/intel/vulkan/anv_descriptor_set.c b/src/intel/vulkan/anv_ > descriptor_set.c > index 1d4df264ae6..99122aed229 100644 > 
--- a/src/intel/vulkan/anv_descriptor_set.c > +++ b/src/intel/vulkan/anv_descriptor_set.c > @@ -67,6 +67,8 @@ VkResult anv_CreateDescriptorSetLayout( >return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY); > > memset(set_layout, 0, sizeof(*set_layout)); > + set_layout->ref_cnt = 1; > + set_layout->allocator = pAllocator; > There's a sticky bit here around allocators. Because we're reference counting, there's no guarantee that it will get freed when they call vkDestroyDescriptorSetLayout. This means that VK_SYSTEM_ALLOCATION_SCOPE_OBJECT is not appropriate. Instead, we should be using VK_SYSTEM_ALLOCATION_SCOPE_DEVICE and allocating it off the device allocator ignoring pAllocator. That will probably cause a CTS warning (not an error) in the allocation tests but I think it's the right thing to do. Other than that, looks good! > set_layout->binding_count = max_binding + 1; > > for (uint32_t b = 0; b <= max_binding; b++) { > @@ -204,7 +206,8 @@ void anv_DestroyDescriptorSetLayout( > if (!set_layout) >return; > > - vk_free2(>alloc, pAllocator, set_layout); > + assert(pAllocator == set_layout->allocator); > + anv_descriptor_set_layout_unref(device, set_layout); > } > > static void > @@ -246,6 +249,7 @@ VkResult anv_CreatePipelineLayout( >ANV_FROM_HANDLE(anv_descriptor_set_layout, set_layout, >pCreateInfo->pSetLayouts[set]); >layout->set[set].layout = set_layout; > + anv_descriptor_set_layout_ref(set_layout); > >layout->set[set].dynamic_offset_start = dynamic_offset_count; >for (uint32_t b = 0; b < set_layout->binding_count; b++) { > @@ -290,6 +294,9 @@ void anv_DestroyPipelineLayout( > if (!pipeline_layout) >return; > > + for (uint32_t i = 0; i < pipeline_layout->num_sets; i++) > + anv_descriptor_set_layout_unref(device, pipeline_layout->set[i]. 
> layout); > + > vk_free2(>alloc, pAllocator, pipeline_layout); > } > > @@ -423,7 +430,7 @@ struct surface_state_free_list_entry { > VkResult > anv_descriptor_set_create(struct anv_device *device, >struct anv_descriptor_pool *pool, > - const struct anv_descriptor_set_layout *layout, > + struct anv_descriptor_set_layout *layout, >struct anv_descriptor_set **out_set) > { > struct anv_descriptor_set *set; > @@ -455,8 +462,10 @@ anv_descriptor_set_create(struct anv_device *device, >} > } > > - set->size = size; > set->layout = layout; > + anv_descriptor_set_layout_ref(layout); > + > + set->size = size; > set->buffer_views = >(struct anv_buffer_view *) >descriptors[layout->size]; > set->buffer_count = layout->buffer_count; > @@ -512,6 +521,8 @@
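The ref/unref helpers called in the quoted patch live in anv_private.h, which is cut off in this message. The pattern they follow can be sketched roughly as below. Everything here is a stand-in: struct demo_layout and the demo_layout_* functions are hypothetical, C11 atomics are used in place of whatever Mesa uses internally, and plain calloc()/free() sidestep the allocation-scope question raised in the review comment above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a descriptor set layout; not the anv struct. */
struct demo_layout {
    atomic_uint ref_cnt;
    unsigned binding_count;
};

/* The creator holds the first reference, mirroring ref_cnt = 1 in the patch. */
static struct demo_layout *demo_layout_create(unsigned binding_count)
{
    struct demo_layout *layout = calloc(1, sizeof(*layout));
    if (!layout)
        return NULL;
    atomic_init(&layout->ref_cnt, 1);
    layout->binding_count = binding_count;
    return layout;
}

/* Taken by every object that stores a pointer to the layout. */
static void demo_layout_ref(struct demo_layout *layout)
{
    unsigned old = atomic_fetch_add(&layout->ref_cnt, 1);
    assert(old >= 1);   /* referencing an already freed layout is a bug */
    (void)old;
}

/* Drops one reference; frees the layout and returns true on the last one. */
static bool demo_layout_unref(struct demo_layout *layout)
{
    unsigned old = atomic_fetch_sub(&layout->ref_cnt, 1);
    assert(old >= 1);
    if (old == 1) {
        free(layout);
        return true;
    }
    return false;
}
```

In this sketch the "destroy" entry point only drops the creator's reference, so a layout that is still referenced by a pipeline layout or a descriptor set stays alive until that holder drops its reference too.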
Re: [Mesa-dev] [Mesa-stable] [PATCH] anv/pipeline: Don't look at blend state unless we have an attachment
It landed as 4b69ba381766cd911eb1284f1b0332a139ec8a75

On Thu, Jan 25, 2018 at 3:27 AM, Emil Velikov wrote:

> On 18 January 2018 at 01:16, Jason Ekstrand wrote:
> > Without this, we may end up dereferencing blend before we check for
> > binding->index != UINT32_MAX. However, Vulkan allows the blend state to
> > be NULL so long as you don't have any color attachments. This fixes a
> > segfault when running The Talos Principle.
> >
> > Fixes: 12f4e00b69e724a23504b7bd3958fb75dc462950
> > Cc: mesa-sta...@lists.freedesktop.org
> > ---
> Jason, did this fall through the cracks or has it been
> superseded/rejected for some reason?
>
> -Emil
[Mesa-dev] [PATCH 09/12] st/va: enable dual instances encode only for H264
From: Boyuan Zhang

Logic related to dual-instance encode should only run for H264, not for other codecs.

Signed-off-by: Boyuan Zhang
---
 src/gallium/state_trackers/va/picture.c |  3 ++-
 src/gallium/state_trackers/va/surface.c | 23 +++++++++++++----------
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/gallium/state_trackers/va/picture.c b/src/gallium/state_trackers/va/picture.c
index 77d379b..537e931 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -650,7 +650,8 @@ vlVaEndPicture(VADriverContextP ctx, VAContextID context_id)
    }
 
    context->decoder->end_frame(context->decoder, context->target, &context->desc.base);
-   if (context->decoder->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
+   if (context->decoder->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE &&
+       u_reduce_video_profile(context->templat.profile) == PIPE_VIDEO_FORMAT_MPEG4_AVC) {
       int idr_period = context->desc.h264enc.gop_size / context->gop_coeff;
       int p_remain_in_idr = idr_period - context->desc.h264enc.frame_num;
       surf->frame_num_cnt = context->desc.h264enc.frame_num_cnt;
diff --git a/src/gallium/state_trackers/va/surface.c b/src/gallium/state_trackers/va/surface.c
index 636505b..9823232 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -36,6 +36,7 @@
 #include "util/u_rect.h"
 #include "util/u_sampler.h"
 #include "util/u_surface.h"
+#include "util/u_video.h"
 
 #include "vl/vl_compositor.h"
 #include "vl/vl_video_buffer.h"
@@ -122,16 +123,18 @@ vlVaSyncSurface(VADriverContextP ctx, VASurfaceID render_target)
    }
 
    if (context->decoder->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
-      int frame_diff;
-      if (context->desc.h264enc.frame_num_cnt >= surf->frame_num_cnt)
-         frame_diff = context->desc.h264enc.frame_num_cnt - surf->frame_num_cnt;
-      else
-         frame_diff = 0x - surf->frame_num_cnt + 1 + context->desc.h264enc.frame_num_cnt;
-      if ((frame_diff == 0) &&
-          (surf->force_flushed == false) &&
-          (context->desc.h264enc.frame_num_cnt % 2 != 0)) {
-         context->decoder->flush(context->decoder);
-         context->first_single_submitted = true;
+      if (u_reduce_video_profile(context->templat.profile) == PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+         int frame_diff;
+         if (context->desc.h264enc.frame_num_cnt >= surf->frame_num_cnt)
+            frame_diff = context->desc.h264enc.frame_num_cnt - surf->frame_num_cnt;
+         else
+            frame_diff = 0x - surf->frame_num_cnt + 1 + context->desc.h264enc.frame_num_cnt;
+         if ((frame_diff == 0) &&
+             (surf->force_flushed == false) &&
+             (context->desc.h264enc.frame_num_cnt % 2 != 0)) {
+            context->decoder->flush(context->decoder);
+            context->first_single_submitted = true;
+         }
       }
       context->decoder->get_feedback(context->decoder, surf->feedback, &(surf->coded_buf->coded_size));
       surf->feedback = NULL;
-- 
2.7.4
[Mesa-dev] [PATCH 06/12] st/va: move H264 enc functions into separate file
From: Boyuan ZhangMove all H264 encode related functions into separate file. Similar to VAAPI decode side, there will be separate file for each codec on encode side as well. Signed-off-by: Boyuan Zhang --- src/gallium/state_trackers/va/Makefile.sources | 1 + src/gallium/state_trackers/va/picture.c | 146 +++- src/gallium/state_trackers/va/picture_h264_enc.c | 163 +++ src/gallium/state_trackers/va/va_private.h | 5 + 4 files changed, 218 insertions(+), 97 deletions(-) create mode 100644 src/gallium/state_trackers/va/picture_h264_enc.c diff --git a/src/gallium/state_trackers/va/Makefile.sources b/src/gallium/state_trackers/va/Makefile.sources index 2d6546b..8a69828 100644 --- a/src/gallium/state_trackers/va/Makefile.sources +++ b/src/gallium/state_trackers/va/Makefile.sources @@ -8,6 +8,7 @@ C_SOURCES := \ picture_mpeg12.c \ picture_mpeg4.c \ picture_h264.c \ + picture_h264_enc.c \ picture_hevc.c \ picture_vc1.c \ picture_mjpeg.c \ diff --git a/src/gallium/state_trackers/va/picture.c b/src/gallium/state_trackers/va/picture.c index 8951573..77d379b 100644 --- a/src/gallium/state_trackers/va/picture.c +++ b/src/gallium/state_trackers/va/picture.c @@ -349,55 +349,52 @@ handleVASliceDataBufferType(vlVaContext *context, vlVaBuffer *buf) static VAStatus handleVAEncMiscParameterTypeRateControl(vlVaContext *context, VAEncMiscParameterBuffer *misc) { - VAEncMiscParameterRateControl *rc = (VAEncMiscParameterRateControl *)misc->data; - if (context->desc.h264enc.rate_ctrl.rate_ctrl_method == - PIPE_H264_ENC_RATE_CONTROL_METHOD_CONSTANT) - context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second; - else - context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second * (rc->target_percentage / 100.0); - context->desc.h264enc.rate_ctrl.peak_bitrate = rc->bits_per_second; - if (context->desc.h264enc.rate_ctrl.target_bitrate < 200) - context->desc.h264enc.rate_ctrl.vbv_buffer_size = MIN2((context->desc.h264enc.rate_ctrl.target_bitrate * 2.75), 200); - else - 
context->desc.h264enc.rate_ctrl.vbv_buffer_size = context->desc.h264enc.rate_ctrl.target_bitrate; + VAStatus status = VA_STATUS_SUCCESS; - return VA_STATUS_SUCCESS; + switch (u_reduce_video_profile(context->templat.profile)) { + case PIPE_VIDEO_FORMAT_MPEG4_AVC: + status = vlVaHandleVAEncMiscParameterTypeRateControlH264(context, misc); + break; + + default: + break; + } + + return status; } static VAStatus handleVAEncMiscParameterTypeFrameRate(vlVaContext *context, VAEncMiscParameterBuffer *misc) { - VAEncMiscParameterFrameRate *fr = (VAEncMiscParameterFrameRate *)misc->data; - if (fr->framerate & 0x) { - context->desc.h264enc.rate_ctrl.frame_rate_num = fr->framerate & 0x; - context->desc.h264enc.rate_ctrl.frame_rate_den = fr->framerate >> 16 & 0x; - } else { - context->desc.h264enc.rate_ctrl.frame_rate_num = fr->framerate; - context->desc.h264enc.rate_ctrl.frame_rate_den = 1; + VAStatus status = VA_STATUS_SUCCESS; + + switch (u_reduce_video_profile(context->templat.profile)) { + case PIPE_VIDEO_FORMAT_MPEG4_AVC: + status = vlVaHandleVAEncMiscParameterTypeFrameRateH264(context, misc); + break; + + default: + break; } - return VA_STATUS_SUCCESS; + + return status; } static VAStatus handleVAEncSequenceParameterBufferType(vlVaDriver *drv, vlVaContext *context, vlVaBuffer *buf) { - VAEncSequenceParameterBufferH264 *h264 = (VAEncSequenceParameterBufferH264 *)buf->data; - if (!context->decoder) { - context->templat.max_references = h264->max_num_ref_frames; - context->templat.level = h264->level_idc; - context->decoder = drv->pipe->create_video_codec(drv->pipe, >templat); - if (!context->decoder) - return VA_STATUS_ERROR_ALLOCATION_FAILED; + VAStatus status = VA_STATUS_SUCCESS; + + switch (u_reduce_video_profile(context->templat.profile)) { + case PIPE_VIDEO_FORMAT_MPEG4_AVC: + status = vlVaHandleVAEncSequenceParameterBufferTypeH264(drv, context, buf); + break; + + default: + break; } - context->gop_coeff = ((1024 + h264->intra_idr_period - 1) / h264->intra_idr_period + 
1) / 2 * 2; - if (context->gop_coeff > VL_VA_ENC_GOP_COEFF) - context->gop_coeff = VL_VA_ENC_GOP_COEFF; - context->desc.h264enc.gop_size = h264->intra_idr_period * context->gop_coeff; - context->desc.h264enc.rate_ctrl.frame_rate_num = h264->time_scale / 2; - context->desc.h264enc.rate_ctrl.frame_rate_den = h264->num_units_in_tick; - context->desc.h264enc.pic_order_cnt_type = h264->seq_fields.bits.pic_order_cnt_type; - return VA_STATUS_SUCCESS; + return status; } static VAStatus @@ -426,80 +423,35 @@ handleVAEncMiscParameterBufferType(vlVaContext *context,
[Mesa-dev] [PATCH 10/12] st/va: add HEVC encode functions
From: Boyuan ZhangAdd a separate file for HEVC encode functions. Signed-off-by: Boyuan Zhang --- src/gallium/state_trackers/va/Makefile.sources | 1 + src/gallium/state_trackers/va/picture.c | 56 +-- src/gallium/state_trackers/va/picture_hevc_enc.c | 69 src/gallium/state_trackers/va/va_private.h | 5 ++ 4 files changed, 128 insertions(+), 3 deletions(-) create mode 100644 src/gallium/state_trackers/va/picture_hevc_enc.c diff --git a/src/gallium/state_trackers/va/Makefile.sources b/src/gallium/state_trackers/va/Makefile.sources index 8a69828..f3a13f2 100644 --- a/src/gallium/state_trackers/va/Makefile.sources +++ b/src/gallium/state_trackers/va/Makefile.sources @@ -10,6 +10,7 @@ C_SOURCES := \ picture_h264.c \ picture_h264_enc.c \ picture_hevc.c \ + picture_hevc_enc.c \ picture_vc1.c \ picture_mjpeg.c \ postproc.c \ diff --git a/src/gallium/state_trackers/va/picture.c b/src/gallium/state_trackers/va/picture.c index 537e931..e26996c 100644 --- a/src/gallium/state_trackers/va/picture.c +++ b/src/gallium/state_trackers/va/picture.c @@ -139,6 +139,31 @@ getEncParamPreset(vlVaContext *context) context->desc.h264enc.ref_pic_mode = 0x0201; } +static void +getEncParamPresetH265(vlVaContext *context) +{ + //rate control + context->desc.h265enc.rc.vbv_buffer_size = 2000; + context->desc.h265enc.rc.vbv_buf_lv = 48; + context->desc.h265enc.rc.fill_data_enable = 1; + context->desc.h265enc.rc.enforce_hrd = 1; + if (context->desc.h265enc.rc.frame_rate_num == 0 || + context->desc.h265enc.rc.frame_rate_den == 0) { + context->desc.h265enc.rc.frame_rate_num = 30; + context->desc.h265enc.rc.frame_rate_den = 1; + } + context->desc.h265enc.rc.target_bits_picture = + context->desc.h265enc.rc.target_bitrate * + ((float)context->desc.h265enc.rc.frame_rate_den / + context->desc.h265enc.rc.frame_rate_num); + context->desc.h265enc.rc.peak_bits_picture_integer = + context->desc.h265enc.rc.peak_bitrate * + ((float)context->desc.h265enc.rc.frame_rate_den / + 
context->desc.h265enc.rc.frame_rate_num); + + context->desc.h265enc.rc.peak_bits_picture_fraction = 0; +} + static VAStatus handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext *context, vlVaBuffer *buf) { @@ -356,6 +381,10 @@ handleVAEncMiscParameterTypeRateControl(vlVaContext *context, VAEncMiscParameter status = vlVaHandleVAEncMiscParameterTypeRateControlH264(context, misc); break; + case PIPE_VIDEO_FORMAT_HEVC: + status = vlVaHandleVAEncMiscParameterTypeRateControlHEVC(context, misc); + break; + default: break; } @@ -373,6 +402,10 @@ handleVAEncMiscParameterTypeFrameRate(vlVaContext *context, VAEncMiscParameterBu status = vlVaHandleVAEncMiscParameterTypeFrameRateH264(context, misc); break; + case PIPE_VIDEO_FORMAT_HEVC: + status = vlVaHandleVAEncMiscParameterTypeFrameRateHEVC(context, misc); + break; + default: break; } @@ -390,6 +423,10 @@ handleVAEncSequenceParameterBufferType(vlVaDriver *drv, vlVaContext *context, vl status = vlVaHandleVAEncSequenceParameterBufferTypeH264(drv, context, buf); break; + case PIPE_VIDEO_FORMAT_HEVC: + status = vlVaHandleVAEncSequenceParameterBufferTypeHEVC(drv, context, buf); + break; + default: break; } @@ -430,6 +467,10 @@ handleVAEncPictureParameterBufferType(vlVaDriver *drv, vlVaContext *context, vlV status = vlVaHandleVAEncPictureParameterBufferTypeH264(drv, context, buf); break; + case PIPE_VIDEO_FORMAT_HEVC: + status = vlVaHandleVAEncPictureParameterBufferTypeHEVC(drv, context, buf); + break; + default: break; } @@ -447,6 +488,10 @@ handleVAEncSliceParameterBufferType(vlVaDriver *drv, vlVaContext *context, vlVaB status = vlVaHandleVAEncSliceParameterBufferTypeH264(drv, context, buf); break; + case PIPE_VIDEO_FORMAT_HEVC: + status = vlVaHandleVAEncSliceParameterBufferTypeHEVC(drv, context, buf); + break; + default: break; } @@ -640,8 +685,11 @@ vlVaEndPicture(VADriverContextP ctx, VAContextID context_id) if (context->decoder->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) { coded_buf = context->coded_buf; - 
getEncParamPreset(context); - context->desc.h264enc.frame_num_cnt++; + if (u_reduce_video_profile(context->templat.profile) == PIPE_VIDEO_FORMAT_MPEG4_AVC) { + getEncParamPreset(context); + context->desc.h264enc.frame_num_cnt++; + } else if (u_reduce_video_profile(context->templat.profile) == PIPE_VIDEO_FORMAT_HEVC) + getEncParamPresetH265(context); context->decoder->begin_frame(context->decoder, context->target, >desc.base); context->decoder->encode_bitstream(context->decoder,
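The per-picture bit budget computed in getEncParamPresetH265() above is the bitrate multiplied by the frame duration (frame_rate_den / frame_rate_num), with a fallback to 30/1 when either field is zero. A standalone sketch of that arithmetic follows; demo_bits_per_picture() is a hypothetical helper rather than part of the patch, and it keeps the single-precision multiply used there.

```c
#include <stdint.h>

/* Hypothetical helper: bits available per picture for a given bitrate
 * (bits per second) and frame rate (fr_num / fr_den frames per second). */
static uint32_t demo_bits_per_picture(uint32_t bitrate,
                                      uint32_t fr_num, uint32_t fr_den)
{
    /* Same fallback as the preset function: assume 30 fps if unset. */
    if (fr_num == 0 || fr_den == 0) {
        fr_num = 30;
        fr_den = 1;
    }

    /* bits/second * seconds/frame, computed in float and then truncated.
     * The caller must keep the result within the range of uint32_t. */
    return (uint32_t)(bitrate * ((float)fr_den / fr_num));
}
```

For example, 3200000 bits per second at 32 frames per second gives 100000 bits per picture. Frame rates such as 30/1 are not exactly representable in float, so results for them are subject to rounding before the truncation.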
[Mesa-dev] [PATCH 12/12] radeonsi: enable vcn encode for HEVC main
From: Boyuan Zhang

Enable vcn encode for HEVC main profile on Raven.

Signed-off-by: Boyuan Zhang
---
 src/gallium/drivers/radeonsi/si_get.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_get.c b/src/gallium/drivers/radeonsi/si_get.c
index 1c84a25..8382721 100644
--- a/src/gallium/drivers/radeonsi/si_get.c
+++ b/src/gallium/drivers/radeonsi/si_get.c
@@ -588,8 +588,10 @@ static int si_get_video_param(struct pipe_screen *screen,
 	if (entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
 		switch (param) {
 		case PIPE_VIDEO_CAP_SUPPORTED:
-			return codec == PIPE_VIDEO_FORMAT_MPEG4_AVC &&
+			return (codec == PIPE_VIDEO_FORMAT_MPEG4_AVC &&
 				(si_vce_is_fw_version_supported(sscreen) ||
+				sscreen->info.family == CHIP_RAVEN)) ||
+				(profile == PIPE_VIDEO_PROFILE_HEVC_MAIN &&
 				sscreen->info.family == CHIP_RAVEN);
 		case PIPE_VIDEO_CAP_NPOT_TEXTURES:
 			return 1;
-- 
2.7.4
[Mesa-dev] [PATCH 11/12] st/va: implement HEVC encode functions
From: Boyuan ZhangImplement HEVC encode functions based on VAAPI HEVC encode interface. Signed-off-by: Boyuan Zhang --- src/gallium/state_trackers/va/picture_hevc_enc.c | 130 ++- 1 file changed, 125 insertions(+), 5 deletions(-) diff --git a/src/gallium/state_trackers/va/picture_hevc_enc.c b/src/gallium/state_trackers/va/picture_hevc_enc.c index 144bb8c..1f14098 100644 --- a/src/gallium/state_trackers/va/picture_hevc_enc.c +++ b/src/gallium/state_trackers/va/picture_hevc_enc.c @@ -32,7 +32,50 @@ VAStatus vlVaHandleVAEncPictureParameterBufferTypeHEVC(vlVaDriver *drv, vlVaContext *context, vlVaBuffer *buf) { - /* TODO */ + VAEncPictureParameterBufferHEVC *h265; + vlVaBuffer *coded_buf; + int i; + + h265 = buf->data; + context->desc.h265enc.decoded_curr_pic = h265->decoded_curr_pic.picture_id; + + for (i = 0; i < 15; i++) + context->desc.h265enc.reference_frames[i] = h265->reference_frames[i].picture_id; + + context->desc.h265enc.pic_order_cnt = h265->decoded_curr_pic.pic_order_cnt; + coded_buf = handle_table_get(drv->htab, h265->coded_buf); + + if (!coded_buf->derived_surface.resource) + coded_buf->derived_surface.resource = pipe_buffer_create(drv->pipe->screen, PIPE_BIND_VERTEX_BUFFER, +PIPE_USAGE_STREAM, coded_buf->size); + + context->coded_buf = coded_buf; + context->desc.h265enc.pic.log2_parallel_merge_level_minus2 = h265->log2_parallel_merge_level_minus2; + context->desc.h265enc.pic.nal_unit_type = h265->nal_unit_type; + + switch(h265->pic_fields.bits.coding_type) { + case 1: + if (h265->pic_fields.bits.idr_pic_flag) + context->desc.h265enc.picture_type = PIPE_H265_ENC_PICTURE_TYPE_IDR; + else + context->desc.h265enc.picture_type = PIPE_H265_ENC_PICTURE_TYPE_I; + break; + case 2: + context->desc.h265enc.picture_type = PIPE_H265_ENC_PICTURE_TYPE_P; + break; + case 3: + case 4: + case 5: + return VA_STATUS_ERROR_UNIMPLEMENTED; //no b frame support + break; + } + + context->desc.h265enc.pic.constrained_intra_pred_flag = 
h265->pic_fields.bits.constrained_intra_pred_flag; + context->desc.h265enc.pic.loop_filter_across_tiles_enabled_flag = h265->pic_fields.bits.loop_filter_across_tiles_enabled_flag; + + util_hash_table_set(context->desc.h265enc.frame_idx, + UINT_TO_PTR(h265->decoded_curr_pic.picture_id), + UINT_TO_PTR(context->desc.h265enc.frame_num)); return VA_STATUS_SUCCESS; } @@ -40,7 +83,33 @@ vlVaHandleVAEncPictureParameterBufferTypeHEVC(vlVaDriver *drv, vlVaContext *cont VAStatus vlVaHandleVAEncSliceParameterBufferTypeHEVC(vlVaDriver *drv, vlVaContext *context, vlVaBuffer *buf) { - /* TODO */ + VAEncSliceParameterBufferHEVC *h265; + + h265 = buf->data; + context->desc.h265enc.ref_idx_l0 = VA_INVALID_ID; + context->desc.h265enc.ref_idx_l1 = VA_INVALID_ID; + + for (int i = 0; i < 15; i++) { + if (h265->ref_pic_list0[i].picture_id != VA_INVALID_ID) { + if (context->desc.h265enc.ref_idx_l0 == VA_INVALID_ID) +context->desc.h265enc.ref_idx_l0 = PTR_TO_UINT(util_hash_table_get(context->desc.h265enc.frame_idx, + UINT_TO_PTR(h265->ref_pic_list0[i].picture_id))); + } + if (h265->ref_pic_list1[i].picture_id != VA_INVALID_ID && h265->slice_type == 1) { + if (context->desc.h265enc.ref_idx_l1 == VA_INVALID_ID) +context->desc.h265enc.ref_idx_l1 = PTR_TO_UINT(util_hash_table_get(context->desc.h265enc.frame_idx, + UINT_TO_PTR(h265->ref_pic_list1[i].picture_id))); + } + } + + context->desc.h265enc.slice.max_num_merge_cand = h265->max_num_merge_cand; + context->desc.h265enc.slice.slice_cb_qp_offset = h265->slice_cb_qp_offset; + context->desc.h265enc.slice.slice_cr_qp_offset = h265->slice_cr_qp_offset; + context->desc.h265enc.slice.slice_beta_offset_div2 = h265->slice_beta_offset_div2; + context->desc.h265enc.slice.slice_tc_offset_div2 = h265->slice_tc_offset_div2; + context->desc.h265enc.slice.cabac_init_flag = h265->slice_fields.bits.cabac_init_flag; + context->desc.h265enc.slice.slice_deblocking_filter_disabled_flag = h265->slice_fields.bits.slice_deblocking_filter_disabled_flag; + 
context->desc.h265enc.slice.slice_loop_filter_across_slices_enabled_flag = h265->slice_fields.bits.slice_loop_filter_across_slices_enabled_flag; return VA_STATUS_SUCCESS; } @@ -48,14 +117,57 @@ vlVaHandleVAEncSliceParameterBufferTypeHEVC(vlVaDriver *drv, vlVaContext *contex VAStatus vlVaHandleVAEncSequenceParameterBufferTypeHEVC(vlVaDriver *drv, vlVaContext *context, vlVaBuffer *buf) { - /* TODO */ + VAEncSequenceParameterBufferHEVC *h265 = (VAEncSequenceParameterBufferHEVC *)buf->data; + + if (!context->decoder) { + context->templat.level =
[Mesa-dev] [PATCH 02/12] radeon/vcn: add vcn encode interface for HEVC
From: Boyuan ZhangAdd vcn encode interface for HEVC, and rename radeon_enc_h264_enc_pic to radeon_enc_pic since radeon_enc_pic is used by both H264 and HEVC. Signed-off-by: Boyuan Zhang --- src/gallium/drivers/radeon/radeon_vcn_enc.h | 82 - 1 file changed, 80 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_vcn_enc.h b/src/gallium/drivers/radeon/radeon_vcn_enc.h index 0385860..2ec42e4 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_enc.h +++ b/src/gallium/drivers/radeon/radeon_vcn_enc.h @@ -48,6 +48,10 @@ #define RENCODE_IB_PARAM_FEEDBACK_BUFFER 0x0010 #define RENCODE_IB_PARAM_DIRECT_OUTPUT_NALU0x0020 +#define RENCODE_HEVC_IB_PARAM_SLICE_CONTROL0x0011 +#define RENCODE_HEVC_IB_PARAM_SPEC_MISC0x0012 +#define RENCODE_HEVC_IB_PARAM_DEBLOCKING_FILTER0x0013 + #define RENCODE_H264_IB_PARAM_SLICE_CONTROL0x0021 #define RENCODE_H264_IB_PARAM_SPEC_MISC0x0022 #define RENCODE_H264_IB_PARAM_ENCODE_PARAMS0x0023 @@ -67,6 +71,7 @@ #define RENCODE_IF_MINOR_VERSION_MASK 0x #define RENCODE_IF_MINOR_VERSION_SHIFT 0 +#define RENCODE_ENCODE_STANDARD_HEVC 0 #define RENCODE_ENCODE_STANDARD_H264 1 #define RENCODE_PREENCODE_MODE_NONE0x @@ -77,6 +82,9 @@ #define RENCODE_H264_SLICE_CONTROL_MODE_FIXED_MBS 0x #define RENCODE_H264_SLICE_CONTROL_MODE_FIXED_BITS 0x0001 +#define RENCODE_HEVC_SLICE_CONTROL_MODE_FIXED_CTBS 0x +#define RENCODE_HEVC_SLICE_CONTROL_MODE_FIXED_BITS 0x0001 + #define RENCODE_RATE_CONTROL_METHOD_NONE 0x #define RENCODE_RATE_CONTROL_METHOD_LATENCY_CONSTRAINED_VBR0x0001 #define RENCODE_RATE_CONTROL_METHOD_PEAK_CONSTRAINED_VBR 0x0002 @@ -95,6 +103,11 @@ #define RENCODE_HEADER_INSTRUCTION_END 0x #define RENCODE_HEADER_INSTRUCTION_COPY0x0001 +#define RENCODE_HEVC_HEADER_INSTRUCTION_DEPENDENT_SLICE_END0x0001 +#define RENCODE_HEVC_HEADER_INSTRUCTION_FIRST_SLICE0x00010001 +#define RENCODE_HEVC_HEADER_INSTRUCTION_SLICE_SEGMENT 0x00010002 +#define RENCODE_HEVC_HEADER_INSTRUCTION_SLICE_QP_DELTA 0x00010003 + #define RENCODE_H264_HEADER_INSTRUCTION_FIRST_MB 
0x0002 #define RENCODE_H264_HEADER_INSTRUCTION_SLICE_QP_DELTA 0x00020001 @@ -181,6 +194,25 @@ typedef struct rvcn_enc_h264_slice_control_s }; } rvcn_enc_h264_slice_control_t; +typedef struct rvcn_enc_hevc_slice_control_s +{ +uint32_t slice_control_mode; +union +{ +struct +{ +uint32_t num_ctbs_per_slice; +uint32_t num_ctbs_per_slice_segment; +} fixed_ctbs_per_slice; + +struct +{ +uint32_t num_bits_per_slice; +uint32_t num_bits_per_slice_segment; +} fixed_bits_per_slice; +}; +} rvcn_enc_hevc_slice_control_t; + typedef struct rvcn_enc_h264_spec_misc_s { uint32_t constrained_intra_pred_flag; @@ -192,6 +224,17 @@ typedef struct rvcn_enc_h264_spec_misc_s uint32_t level_idc; } rvcn_enc_h264_spec_misc_t; +typedef struct rvcn_enc_hevc_spec_misc_s +{ +uint32_t log2_min_luma_coding_block_size_minus3; +uint32_t amp_disabled; +uint32_t strong_intra_smoothing_enabled; +uint32_t constrained_intra_pred_flag; +uint32_t cabac_init_flag; +uint32_t half_pel_enabled; +uint32_t quarter_pel_enabled; +} rvcn_enc_hevc_spec_misc_t; + typedef struct rvcn_enc_rate_ctl_session_init_s { uint32_t rate_control_method; @@ -276,6 +319,16 @@ typedef struct rvcn_enc_h264_deblocking_filter_s int32_tcr_qp_offset; } rvcn_enc_h264_deblocking_filter_t; +typedef struct rvcn_enc_hevc_deblocking_filter_s +{ +uint32_t loop_filter_across_slices_enabled; +int32_tdeblocking_filter_disabled; +int32_tbeta_offset_div2; +int32_ttc_offset_div2; +int32_tcb_qp_offset; +int32_tcr_qp_offset; +} rvcn_enc_hevc_deblocking_filter_t; + typedef struct rvcn_enc_intra_refresh_s { uint32_t intra_refresh_mode; @@ -331,7 +384,7 @@ struct pipe_video_codec *radeon_create_encoder(struct pipe_context *context, struct radeon_winsys* ws, radeon_enc_get_buffer get_buffer); -struct radeon_enc_h264_enc_pic { +struct radeon_enc_pic { enumpipe_h264_enc_picture_type picture_type; unsignedframe_num; @@ -343,21 +396,46 @@ struct radeon_enc_h264_enc_pic { unsignedcrop_right; unsigned
[Mesa-dev] [PATCH 03/12] radeon/vcn: support picture parameters for HEVC
From: Boyuan ZhangPass pipe_picture_desc instead of pipe_h264_enc_picture_desc so that it can be used for different codecs. Add functions to handle picture parameters that will be used for HEVC encode. Signed-off-by: Boyuan Zhang --- src/gallium/drivers/radeon/radeon_vcn_enc.c | 73 +++-- src/gallium/drivers/radeon/radeon_vcn_enc.h | 2 +- src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c | 11 ++-- 3 files changed, 65 insertions(+), 21 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_vcn_enc.c b/src/gallium/drivers/radeon/radeon_vcn_enc.c index 06579c8..20be5e6 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_enc.c +++ b/src/gallium/drivers/radeon/radeon_vcn_enc.c @@ -38,20 +38,62 @@ #include "radeon_video.h" #include "radeon_vcn_enc.h" -static void radeon_vcn_enc_get_param(struct radeon_encoder *enc, struct pipe_h264_enc_picture_desc *pic) +static void radeon_vcn_enc_get_param(struct radeon_encoder *enc, struct pipe_picture_desc *picture) { - enc->enc_pic.picture_type = pic->picture_type; - enc->enc_pic.frame_num = pic->frame_num; - enc->enc_pic.pic_order_cnt = pic->pic_order_cnt; - enc->enc_pic.pic_order_cnt_type = pic->pic_order_cnt_type; - enc->enc_pic.ref_idx_l0 = pic->ref_idx_l0; - enc->enc_pic.ref_idx_l1 = pic->ref_idx_l1; - enc->enc_pic.not_referenced = pic->not_referenced; - enc->enc_pic.is_idr = (pic->picture_type == PIPE_H264_ENC_PICTURE_TYPE_IDR); - enc->enc_pic.crop_left = 0; - enc->enc_pic.crop_right = (align(enc->base.width, 16) - enc->base.width) / 2; - enc->enc_pic.crop_top = 0; - enc->enc_pic.crop_bottom = (align(enc->base.height, 16) - enc->base.height) / 2; + if (u_reduce_video_profile(picture->profile) == PIPE_VIDEO_FORMAT_MPEG4_AVC) { + struct pipe_h264_enc_picture_desc *pic = (struct pipe_h264_enc_picture_desc *)picture; + enc->enc_pic.picture_type = pic->picture_type; + enc->enc_pic.frame_num = pic->frame_num; + enc->enc_pic.pic_order_cnt = pic->pic_order_cnt; + enc->enc_pic.pic_order_cnt_type = pic->pic_order_cnt_type; + 
enc->enc_pic.ref_idx_l0 = pic->ref_idx_l0; + enc->enc_pic.ref_idx_l1 = pic->ref_idx_l1; + enc->enc_pic.not_referenced = pic->not_referenced; + enc->enc_pic.is_idr = (pic->picture_type == PIPE_H264_ENC_PICTURE_TYPE_IDR); + enc->enc_pic.crop_left = 0; + enc->enc_pic.crop_right = (align(enc->base.width, 16) - enc->base.width) / 2; + enc->enc_pic.crop_top = 0; + enc->enc_pic.crop_bottom = (align(enc->base.height, 16) - enc->base.height) / 2; + } else if (u_reduce_video_profile(picture->profile) == PIPE_VIDEO_FORMAT_HEVC) { + struct pipe_h265_enc_picture_desc *pic = (struct pipe_h265_enc_picture_desc *)picture; + enc->enc_pic.picture_type = pic->picture_type; + enc->enc_pic.frame_num = pic->frame_num; + enc->enc_pic.pic_order_cnt = pic->pic_order_cnt; + enc->enc_pic.pic_order_cnt_type = pic->pic_order_cnt_type; + enc->enc_pic.ref_idx_l0 = pic->ref_idx_l0; + enc->enc_pic.ref_idx_l1 = pic->ref_idx_l1; + enc->enc_pic.not_referenced = pic->not_referenced; + enc->enc_pic.is_idr = (pic->picture_type == PIPE_H265_ENC_PICTURE_TYPE_IDR) || +(pic->picture_type == PIPE_H265_ENC_PICTURE_TYPE_I); + enc->enc_pic.crop_left = 0; + enc->enc_pic.crop_right = (align(enc->base.width, 16) - enc->base.width) / 2; + enc->enc_pic.crop_top = 0; + enc->enc_pic.crop_bottom = (align(enc->base.height, 16) - enc->base.height) / 2; + enc->enc_pic.general_tier_flag = pic->seq.general_tier_flag; + enc->enc_pic.general_profile_idc = pic->seq.general_profile_idc; + enc->enc_pic.general_level_idc = pic->seq.general_level_idc; + enc->enc_pic.max_poc = pic->seq.intra_period; + enc->enc_pic.log2_max_poc = 0; + for (int i = enc->enc_pic.max_poc; i != 0; enc->enc_pic.log2_max_poc++) + i = (i >> 1); + enc->enc_pic.chroma_format_idc = pic->seq.chroma_format_idc; + enc->enc_pic.pic_width_in_luma_samples = pic->seq.pic_width_in_luma_samples; + enc->enc_pic.pic_height_in_luma_samples = pic->seq.pic_height_in_luma_samples; + enc->enc_pic.log2_diff_max_min_luma_coding_block_size = 
pic->seq.log2_diff_max_min_luma_coding_block_size; + enc->enc_pic.log2_min_transform_block_size_minus2 = pic->seq.log2_min_transform_block_size_minus2; + enc->enc_pic.log2_diff_max_min_transform_block_size = pic->seq.log2_diff_max_min_transform_block_size; + enc->enc_pic.max_transform_hierarchy_depth_inter = pic->seq.max_transform_hierarchy_depth_inter; + enc->enc_pic.max_transform_hierarchy_depth_intra = pic->seq.max_transform_hierarchy_depth_intra; + enc->enc_pic.log2_parallel_merge_level_minus2 = pic->pic.log2_parallel_merge_level_minus2; + enc->enc_pic.bit_depth_luma_minus8 =
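For readers following the log2_max_poc computation in the hunk above: the shift loop counts how many bits are needed to represent max_poc. Below is a minimal standalone sketch of that loop. The function name bit_length is made up for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): number of bits needed to
 * represent max_poc. Returns 0 for 0, otherwise floor(log2(max_poc)) + 1.
 * This is the same shift-and-count loop the patch uses for log2_max_poc. */
unsigned bit_length(uint32_t max_poc)
{
   unsigned bits = 0;

   for (uint32_t i = max_poc; i != 0; i >>= 1)
      bits++;

   /* A 32-bit input can never need more than 32 bits. */
   assert(bits <= 32);
   return bits;
}
```

With an intra period of 30 this gives 5, and with 32 it gives 6. Note that the SPS writer in patch 05/12 codes log2_max_poc - 4, so an intra period below 8 would make that subtraction wrap around; whether callers ever pass such values is not something this thread states.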
[Mesa-dev] [PATCH 05/12] radeon/vcn: add header implementations for HEVC
From: Boyuan ZhangImplement encoding of sps, pps, vps, aud, and slice headers for HEVC based on HEVC specs. Signed-off-by: Boyuan Zhang --- src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c | 348 +++- 1 file changed, 347 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c b/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c index a651f7e..74c4a08 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c +++ b/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c @@ -551,6 +551,86 @@ static void radeon_enc_nalu_sps(struct radeon_encoder *enc) RADEON_ENC_END(); } +static void radeon_enc_nalu_sps_hevc(struct radeon_encoder *enc) +{ + RADEON_ENC_BEGIN(RENCODE_IB_PARAM_DIRECT_OUTPUT_NALU); + RADEON_ENC_CS(RENCODE_DIRECT_OUTPUT_NALU_TYPE_SPS); + uint32_t *size_in_bytes = >cs->current.buf[enc->cs->current.cdw++]; + int i; + + radeon_enc_reset(enc); + radeon_enc_set_emulation_prevention(enc, false); + radeon_enc_code_fixed_bits(enc, 0x0001, 32); + radeon_enc_code_fixed_bits(enc, 0x4201, 16); + radeon_enc_byte_align(enc); + radeon_enc_set_emulation_prevention(enc, true); + radeon_enc_code_fixed_bits(enc, 0x0, 4); + radeon_enc_code_fixed_bits(enc, enc->enc_pic.layer_ctrl.max_num_temporal_layers - 1, 3); + radeon_enc_code_fixed_bits(enc, 0x1, 1); + radeon_enc_code_fixed_bits(enc, 0x0, 2); + radeon_enc_code_fixed_bits(enc, enc->enc_pic.general_tier_flag, 1); + radeon_enc_code_fixed_bits(enc, enc->enc_pic.general_profile_idc, 5); + radeon_enc_code_fixed_bits(enc, 0x6000, 32); + radeon_enc_code_fixed_bits(enc, 0xb000, 32); + radeon_enc_code_fixed_bits(enc, 0x0, 16); + radeon_enc_code_fixed_bits(enc, enc->enc_pic.general_level_idc, 8); + + for (i = 0; i < (enc->enc_pic.layer_ctrl.max_num_temporal_layers - 1) ; i++) + radeon_enc_code_fixed_bits(enc, 0x0, 2); + + if ((enc->enc_pic.layer_ctrl.max_num_temporal_layers - 1) > 0) { + for (i = (enc->enc_pic.layer_ctrl.max_num_temporal_layers - 1); i < 8; i++) + radeon_enc_code_fixed_bits(enc, 0x0, 2); + } + 
+ radeon_enc_code_ue(enc, 0x0); + radeon_enc_code_ue(enc, enc->enc_pic.chroma_format_idc); + radeon_enc_code_ue(enc, enc->enc_pic.pic_width_in_luma_samples); + radeon_enc_code_ue(enc, enc->enc_pic.pic_height_in_luma_samples); + radeon_enc_code_fixed_bits(enc, 0x0, 1); + radeon_enc_code_ue(enc, enc->enc_pic.bit_depth_luma_minus8); + radeon_enc_code_ue(enc, enc->enc_pic.bit_depth_chroma_minus8); + radeon_enc_code_ue(enc, enc->enc_pic.log2_max_poc - 4); + radeon_enc_code_fixed_bits(enc, 0x0, 1); + radeon_enc_code_ue(enc, 1); + radeon_enc_code_ue(enc, 0x0); + radeon_enc_code_ue(enc, 0x0); + radeon_enc_code_ue(enc, enc->enc_pic.hevc_spec_misc.log2_min_luma_coding_block_size_minus3); + //Only support CTBSize 64 + radeon_enc_code_ue(enc, 6 - (enc->enc_pic.hevc_spec_misc.log2_min_luma_coding_block_size_minus3 + 3)); + radeon_enc_code_ue(enc, enc->enc_pic.log2_min_transform_block_size_minus2); + radeon_enc_code_ue(enc, enc->enc_pic.log2_diff_max_min_transform_block_size); + radeon_enc_code_ue(enc, enc->enc_pic.max_transform_hierarchy_depth_inter); + radeon_enc_code_ue(enc, enc->enc_pic.max_transform_hierarchy_depth_intra); + + radeon_enc_code_fixed_bits(enc, 0x0, 1); + radeon_enc_code_fixed_bits(enc, !enc->enc_pic.hevc_spec_misc.amp_disabled, 1); + radeon_enc_code_fixed_bits(enc, enc->enc_pic.sample_adaptive_offset_enabled_flag, 1); + radeon_enc_code_fixed_bits(enc, enc->enc_pic.pcm_enabled_flag, 1); + + radeon_enc_code_ue(enc, 1); + radeon_enc_code_ue(enc, 1); + radeon_enc_code_ue(enc, 0); + radeon_enc_code_ue(enc, 0); + radeon_enc_code_fixed_bits(enc, 0x1, 1); + + radeon_enc_code_fixed_bits(enc, 0x0, 1); + + radeon_enc_code_fixed_bits(enc, 0, 1); + radeon_enc_code_fixed_bits(enc, enc->enc_pic.hevc_spec_misc.strong_intra_smoothing_enabled, 1); + + radeon_enc_code_fixed_bits(enc, 0x0, 1); + + radeon_enc_code_fixed_bits(enc, 0x0, 1); + + radeon_enc_code_fixed_bits(enc, 0x1, 1); + + radeon_enc_byte_align(enc); + radeon_enc_flush_headers(enc); + *size_in_bytes = 
(enc->bits_output + 7) / 8; + RADEON_ENC_END(); +} + static void radeon_enc_nalu_pps(struct radeon_encoder *enc) { RADEON_ENC_BEGIN(RENCODE_IB_PARAM_DIRECT_OUTPUT_NALU); @@ -586,6 +666,150 @@ static void radeon_enc_nalu_pps(struct radeon_encoder *enc) RADEON_ENC_END(); } +static void radeon_enc_nalu_pps_hevc(struct radeon_encoder *enc) +{ + RADEON_ENC_BEGIN(RENCODE_IB_PARAM_DIRECT_OUTPUT_NALU); + RADEON_ENC_CS(RENCODE_DIRECT_OUTPUT_NALU_TYPE_PPS); +
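The SPS writer above leans on radeon_enc_code_ue() for most syntax elements. That helper's body is not part of this hunk, so the following is only a sketch of unsigned Exp-Golomb (ue(v)) coding in general: write floor(log2(value + 1)) zero bits, then value + 1 in binary. The struct bit_writer and the bw_* functions are hypothetical names for illustration and do not mirror Mesa's encoder state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit writer for illustration only; not Mesa's radeon_encoder. */
struct bit_writer {
   uint8_t buf[64];
   unsigned bits;   /* number of bits written so far */
};

void bw_init(struct bit_writer *bw)
{
   memset(bw, 0, sizeof(*bw));
}

/* Append the low `count` bits of `value`, most significant bit first. */
void bw_put_bits(struct bit_writer *bw, uint64_t value, unsigned count)
{
   assert(count <= 64 && bw->bits + count <= sizeof(bw->buf) * 8);

   for (unsigned i = count; i-- > 0; ) {
      if ((value >> i) & 1)
         bw->buf[bw->bits / 8] |= (uint8_t)(0x80u >> (bw->bits % 8));
      bw->bits++;
   }
}

/* ue(v): a prefix of (len - 1) zero bits followed by value + 1 in len bits,
 * where len is the bit length of value + 1. */
void bw_put_ue(struct bit_writer *bw, uint32_t value)
{
   uint64_t code = (uint64_t)value + 1;
   unsigned len = 0;

   for (uint64_t t = code; t != 0; t >>= 1)
      len++;

   bw_put_bits(bw, 0, len - 1);   /* leading zero prefix */
   bw_put_bits(bw, code, len);    /* the code itself, starting with a 1 bit */
}
```

Coding the values 0, 1, 2, 3 in sequence yields the bit strings 1, 010, 011 and 00100, so small values stay cheap, which is why the SPS uses ue(v) for fields that are usually 0 or 1.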
[Mesa-dev] [PATCH 07/12] st/va: add HEVC picture desc
From: Boyuan Zhang

Add HEVC picture desc, and add codec check when creating and destroying
context.

Signed-off-by: Boyuan Zhang
---
 src/gallium/state_trackers/va/context.c    | 26 ++
 src/gallium/state_trackers/va/va_private.h |  1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c
index 78e1f19..f03b326 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -284,8 +284,18 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID config_id, int picture_width,
    context->desc.base.profile = config->profile;
    context->desc.base.entry_point = config->entrypoint;
    if (config->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
-      context->desc.h264enc.rate_ctrl.rate_ctrl_method = config->rc;
-      context->desc.h264enc.frame_idx = util_hash_table_create(handle_hash, handle_compare);
+      switch (u_reduce_video_profile(context->templat.profile)) {
+      case PIPE_VIDEO_FORMAT_MPEG4_AVC:
+         context->desc.h264enc.rate_ctrl.rate_ctrl_method = config->rc;
+         context->desc.h264enc.frame_idx = util_hash_table_create(handle_hash, handle_compare);
+         break;
+      case PIPE_VIDEO_FORMAT_HEVC:
+         context->desc.h265enc.rc.rate_ctrl_method = config->rc;
+         context->desc.h265enc.frame_idx = util_hash_table_create(handle_hash, handle_compare);
+         break;
+      default:
+         break;
+      }
    }
 
    mtx_lock(&drv->mutex);
@@ -314,8 +324,16 @@ vlVaDestroyContext(VADriverContextP ctx, VAContextID context_id)
 
    if (context->decoder) {
       if (context->desc.base.entry_point == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
-         if (context->desc.h264enc.frame_idx)
-            util_hash_table_destroy (context->desc.h264enc.frame_idx);
+         if (u_reduce_video_profile(context->decoder->profile) ==
+             PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+            if (context->desc.h264enc.frame_idx)
+               util_hash_table_destroy (context->desc.h264enc.frame_idx);
+         }
+         if (u_reduce_video_profile(context->decoder->profile) ==
+             PIPE_VIDEO_FORMAT_HEVC) {
+            if (context->desc.h265enc.frame_idx)
+               util_hash_table_destroy (context->desc.h265enc.frame_idx);
+         }
       } else {
          if (u_reduce_video_profile(context->decoder->profile) ==
                PIPE_VIDEO_FORMAT_MPEG4_AVC) {
diff --git a/src/gallium/state_trackers/va/va_private.h b/src/gallium/state_trackers/va/va_private.h
index 520f970..c022feb 100644
--- a/src/gallium/state_trackers/va/va_private.h
+++ b/src/gallium/state_trackers/va/va_private.h
@@ -270,6 +270,7 @@ typedef struct {
       struct pipe_h265_picture_desc h265;
       struct pipe_mjpeg_picture_desc mjpeg;
       struct pipe_h264_enc_picture_desc h264enc;
+      struct pipe_h265_enc_picture_desc h265enc;
    } desc;
 
    struct {
-- 
2.7.4
[Mesa-dev] [PATCH 08/12] st/va: add entrypoint check for HEVC
From: Boyuan Zhang

Add entrypoint check for HEVC to differentiate decode and encode jobs.

Signed-off-by: Boyuan Zhang
---
 src/gallium/state_trackers/va/context.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c
index f03b326..f567f54 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -263,16 +263,18 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID config_id, int picture_width,
 
    case PIPE_VIDEO_FORMAT_HEVC:
       context->templat.max_references = num_render_targets;
-      context->desc.h265.pps = CALLOC_STRUCT(pipe_h265_pps);
-      if (!context->desc.h265.pps) {
-         FREE(context);
-         return VA_STATUS_ERROR_ALLOCATION_FAILED;
-      }
-      context->desc.h265.pps->sps = CALLOC_STRUCT(pipe_h265_sps);
-      if (!context->desc.h265.pps->sps) {
-         FREE(context->desc.h265.pps);
-         FREE(context);
-         return VA_STATUS_ERROR_ALLOCATION_FAILED;
+      if (config->entrypoint != PIPE_VIDEO_ENTRYPOINT_ENCODE) {
+         context->desc.h265.pps = CALLOC_STRUCT(pipe_h265_pps);
+         if (!context->desc.h265.pps) {
+            FREE(context);
+            return VA_STATUS_ERROR_ALLOCATION_FAILED;
+         }
+         context->desc.h265.pps->sps = CALLOC_STRUCT(pipe_h265_sps);
+         if (!context->desc.h265.pps->sps) {
+            FREE(context->desc.h265.pps);
+            FREE(context);
+            return VA_STATUS_ERROR_ALLOCATION_FAILED;
+         }
       }
       break;
 
-- 
2.7.4
[Mesa-dev] [PATCH 04/12] radeon/vcn: add ib implementations for HEVC
From: Boyuan ZhangImplement required ibs for vcn HEVC encode. Signed-off-by: Boyuan Zhang --- src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c | 267 1 file changed, 222 insertions(+), 45 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c b/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c index 06b8092..a651f7e 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c +++ b/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c @@ -231,6 +231,27 @@ static void radeon_enc_session_init(struct radeon_encoder *enc) RADEON_ENC_END(); } +static void radeon_enc_session_init_hevc(struct radeon_encoder *enc) +{ + enc->enc_pic.session_init.encode_standard = RENCODE_ENCODE_STANDARD_HEVC; + enc->enc_pic.session_init.aligned_picture_width = align(enc->base.width, 64); + enc->enc_pic.session_init.aligned_picture_height = align(enc->base.height, 16); + enc->enc_pic.session_init.padding_width = enc->enc_pic.session_init.aligned_picture_width - enc->base.width; + enc->enc_pic.session_init.padding_height = enc->enc_pic.session_init.aligned_picture_height - enc->base.height; + enc->enc_pic.session_init.pre_encode_mode = RENCODE_PREENCODE_MODE_NONE; + enc->enc_pic.session_init.pre_encode_chroma_enabled = false; + + RADEON_ENC_BEGIN(RENCODE_IB_PARAM_SESSION_INIT); + RADEON_ENC_CS(enc->enc_pic.session_init.encode_standard); + RADEON_ENC_CS(enc->enc_pic.session_init.aligned_picture_width); + RADEON_ENC_CS(enc->enc_pic.session_init.aligned_picture_height); + RADEON_ENC_CS(enc->enc_pic.session_init.padding_width); + RADEON_ENC_CS(enc->enc_pic.session_init.padding_height); + RADEON_ENC_CS(enc->enc_pic.session_init.pre_encode_mode); + RADEON_ENC_CS(enc->enc_pic.session_init.pre_encode_chroma_enabled); + RADEON_ENC_END(); +} + static void radeon_enc_layer_control(struct radeon_encoder *enc) { enc->enc_pic.layer_ctrl.max_num_temporal_layers = 1; @@ -262,6 +283,19 @@ static void radeon_enc_slice_control(struct radeon_encoder *enc) RADEON_ENC_END(); } +static void 
radeon_enc_slice_control_hevc(struct radeon_encoder *enc) +{ + enc->enc_pic.hevc_slice_ctrl.slice_control_mode = RENCODE_HEVC_SLICE_CONTROL_MODE_FIXED_CTBS; + enc->enc_pic.hevc_slice_ctrl.fixed_ctbs_per_slice.num_ctbs_per_slice = align(enc->base.width, 64) / 64 * align(enc->base.height, 64) / 64; + enc->enc_pic.hevc_slice_ctrl.fixed_ctbs_per_slice.num_ctbs_per_slice_segment = enc->enc_pic.hevc_slice_ctrl.fixed_ctbs_per_slice.num_ctbs_per_slice; + + RADEON_ENC_BEGIN(RENCODE_HEVC_IB_PARAM_SLICE_CONTROL); + RADEON_ENC_CS(enc->enc_pic.hevc_slice_ctrl.slice_control_mode); + RADEON_ENC_CS(enc->enc_pic.hevc_slice_ctrl.fixed_ctbs_per_slice.num_ctbs_per_slice); + RADEON_ENC_CS(enc->enc_pic.hevc_slice_ctrl.fixed_ctbs_per_slice.num_ctbs_per_slice_segment); + RADEON_ENC_END(); +} + static void radeon_enc_spec_misc(struct radeon_encoder *enc) { enc->enc_pic.spec_misc.constrained_intra_pred_flag = 0; @@ -283,27 +317,68 @@ static void radeon_enc_spec_misc(struct radeon_encoder *enc) RADEON_ENC_END(); } +static void radeon_enc_spec_misc_hevc(struct radeon_encoder *enc, struct pipe_picture_desc *picture) +{ + struct pipe_h265_enc_picture_desc *pic = (struct pipe_h265_enc_picture_desc *)picture; + enc->enc_pic.hevc_spec_misc.log2_min_luma_coding_block_size_minus3 = pic->seq.log2_min_luma_coding_block_size_minus3; + enc->enc_pic.hevc_spec_misc.amp_disabled = !pic->seq.amp_enabled_flag; + enc->enc_pic.hevc_spec_misc.strong_intra_smoothing_enabled = pic->seq.strong_intra_smoothing_enabled_flag; + enc->enc_pic.hevc_spec_misc.constrained_intra_pred_flag = pic->pic.constrained_intra_pred_flag; + enc->enc_pic.hevc_spec_misc.cabac_init_flag = pic->slice.cabac_init_flag; + enc->enc_pic.hevc_spec_misc.half_pel_enabled = 1; + enc->enc_pic.hevc_spec_misc.quarter_pel_enabled = 1; + + RADEON_ENC_BEGIN(RENCODE_HEVC_IB_PARAM_SPEC_MISC); + RADEON_ENC_CS(enc->enc_pic.hevc_spec_misc.log2_min_luma_coding_block_size_minus3); + RADEON_ENC_CS(enc->enc_pic.hevc_spec_misc.amp_disabled); + 
RADEON_ENC_CS(enc->enc_pic.hevc_spec_misc.strong_intra_smoothing_enabled); + RADEON_ENC_CS(enc->enc_pic.hevc_spec_misc.constrained_intra_pred_flag); + RADEON_ENC_CS(enc->enc_pic.hevc_spec_misc.cabac_init_flag); + RADEON_ENC_CS(enc->enc_pic.hevc_spec_misc.half_pel_enabled); + RADEON_ENC_CS(enc->enc_pic.hevc_spec_misc.quarter_pel_enabled); + RADEON_ENC_END(); +} + static void radeon_enc_rc_session_init(struct radeon_encoder *enc, struct pipe_picture_desc *picture) { - struct pipe_h264_enc_picture_desc *pic = (struct pipe_h264_enc_picture_desc *)picture; - switch(pic->rate_ctrl.rate_ctrl_method) { -
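The session-init and slice-control functions above both depend on rounding the picture size up to a block boundary: the padding is the difference between the aligned and the real size, and the CTB count is the aligned area divided into 64x64 blocks. A small sketch of that arithmetic follows. align_pot stands in for the align() helper the patch calls, and all three function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to a multiple of alignment, which must be a power of two.
 * Stand-in for the align() helper used in the patch. */
uint32_t align_pot(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Padding needed to bring size up to the next multiple of alignment. */
uint32_t padding_for(uint32_t size, uint32_t alignment)
{
   return align_pot(size, alignment) - size;
}

/* Number of 64x64 CTBs needed to cover a width x height picture. */
uint32_t ctbs_per_picture(uint32_t width, uint32_t height)
{
   return (align_pot(width, 64) / 64) * (align_pot(height, 64) / 64);
}
```

For a 1920x1080 picture this gives 30 x 17 = 510 CTBs, a width padding of 0 at 64-sample alignment and a height padding of 8 at 16-sample alignment, matching the alignments used in radeon_enc_session_init_hevc().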
[Mesa-dev] [PATCH 01/12] vl: add parameters for HEVC encode
From: Boyuan Zhang

Add HEVC encode interface

Signed-off-by: Boyuan Zhang
---
 src/gallium/include/pipe/p_video_state.h | 100 +++
 1 file changed, 100 insertions(+)

diff --git a/src/gallium/include/pipe/p_video_state.h b/src/gallium/include/pipe/p_video_state.h
index 5a88e6c..26e0acf 100644
--- a/src/gallium/include/pipe/p_video_state.h
+++ b/src/gallium/include/pipe/p_video_state.h
@@ -120,6 +120,15 @@ enum pipe_h264_enc_picture_type
    PIPE_H264_ENC_PICTURE_TYPE_SKIP = 0x04
 };
 
+enum pipe_h265_enc_picture_type
+{
+   PIPE_H265_ENC_PICTURE_TYPE_P = 0x00,
+   PIPE_H265_ENC_PICTURE_TYPE_B = 0x01,
+   PIPE_H265_ENC_PICTURE_TYPE_I = 0x02,
+   PIPE_H265_ENC_PICTURE_TYPE_IDR = 0x03,
+   PIPE_H265_ENC_PICTURE_TYPE_SKIP = 0x04
+};
+
 enum pipe_h264_enc_rate_control_method
 {
    PIPE_H264_ENC_RATE_CONTROL_METHOD_DISABLE = 0x00,
@@ -129,6 +138,15 @@ enum pipe_h264_enc_rate_control_method
    PIPE_H264_ENC_RATE_CONTROL_METHOD_VARIABLE = 0x04
 };
 
+enum pipe_h265_enc_rate_control_method
+{
+   PIPE_H265_ENC_RATE_CONTROL_METHOD_DISABLE = 0x00,
+   PIPE_H265_ENC_RATE_CONTROL_METHOD_CONSTANT_SKIP = 0x01,
+   PIPE_H265_ENC_RATE_CONTROL_METHOD_VARIABLE_SKIP = 0x02,
+   PIPE_H265_ENC_RATE_CONTROL_METHOD_CONSTANT = 0x03,
+   PIPE_H265_ENC_RATE_CONTROL_METHOD_VARIABLE = 0x04
+};
+
 struct pipe_picture_desc
 {
    enum pipe_video_profile profile;
@@ -412,6 +430,88 @@ struct pipe_h264_enc_picture_desc
 
 };
 
+struct pipe_h265_enc_seq_param
+{
+   uint8_t general_profile_idc;
+   uint8_t general_level_idc;
+   uint8_t general_tier_flag;
+   uint32_t intra_period;
+   uint16_t pic_width_in_luma_samples;
+   uint16_t pic_height_in_luma_samples;
+   uint32_t chroma_format_idc;
+   uint32_t bit_depth_luma_minus8;
+   uint32_t bit_depth_chroma_minus8;
+   bool strong_intra_smoothing_enabled_flag;
+   bool amp_enabled_flag;
+   bool sample_adaptive_offset_enabled_flag;
+   bool pcm_enabled_flag;
+   bool sps_temporal_mvp_enabled_flag;
+   uint8_t log2_min_luma_coding_block_size_minus3;
+   uint8_t log2_diff_max_min_luma_coding_block_size;
+   uint8_t log2_min_transform_block_size_minus2;
+   uint8_t log2_diff_max_min_transform_block_size;
+   uint8_t max_transform_hierarchy_depth_inter;
+   uint8_t max_transform_hierarchy_depth_intra;
+};
+
+struct pipe_h265_enc_pic_param
+{
+   uint8_t log2_parallel_merge_level_minus2;
+   uint8_t nal_unit_type;
+   bool constrained_intra_pred_flag;
+   bool loop_filter_across_tiles_enabled_flag;
+};
+
+struct pipe_h265_enc_slice_param
+{
+   uint8_t max_num_merge_cand;
+   int8_t slice_cb_qp_offset;
+   int8_t slice_cr_qp_offset;
+   int8_t slice_beta_offset_div2;
+   int8_t slice_tc_offset_div2;
+   bool cabac_init_flag;
+   uint32_t slice_deblocking_filter_disabled_flag;
+   bool slice_loop_filter_across_slices_enabled_flag;
+};
+
+struct pipe_h265_enc_rate_control
+{
+   enum pipe_h265_enc_rate_control_method rate_ctrl_method;
+   unsigned target_bitrate;
+   unsigned peak_bitrate;
+   unsigned frame_rate_num;
+   unsigned frame_rate_den;
+   unsigned quant_i_frames;
+   unsigned vbv_buffer_size;
+   unsigned vbv_buf_lv;
+   unsigned target_bits_picture;
+   unsigned peak_bits_picture_integer;
+   unsigned peak_bits_picture_fraction;
+   unsigned fill_data_enable;
+   unsigned enforce_hrd;
+};
+
+struct pipe_h265_enc_picture_desc
+{
+   struct pipe_picture_desc base;
+
+   struct pipe_h265_enc_seq_param seq;
+   struct pipe_h265_enc_pic_param pic;
+   struct pipe_h265_enc_slice_param slice;
+   struct pipe_h265_enc_rate_control rc;
+
+   enum pipe_h265_enc_picture_type picture_type;
+   unsigned decoded_curr_pic;
+   unsigned reference_frames[16];
+   unsigned frame_num;
+   unsigned pic_order_cnt;
+   unsigned pic_order_cnt_type;
+   unsigned ref_idx_l0;
+   unsigned ref_idx_l1;
+   bool not_referenced;
+   struct util_hash_table *frame_idx;
+};
+
 struct pipe_h265_sps
 {
    uint8_t chroma_format_idc;
-- 
2.7.4
Re: [Mesa-dev] [PATCH] i965/fs: Reset the register file to VGRF in lower_integer_multiplication
On Thu, Jan 25, 2018 at 10:08 AM, Matt Turner wrote:

> On Fri, Dec 15, 2017 at 5:12 PM, Jason Ekstrand wrote:
> > 18fde36ced4279f2577097a1a7d31b55f2f5f141 changed the way temporary
> > registers were allocated in lower_integer_multiplication so that we
> > allocate regs_written(inst) space and keep the stride of the original
> > destination register. This was to ensure that any MUL which originally
> > followed the CHV/BXT integer multiply regioning restrictions would
> > continue to follow those restrictions even after lowering. This works
> > fine except that I forgot to reset the register file to VGRF so, even
> > though they were assigned a number from alloc.allocate(), they had the
> > wrong register file. This caused some GLES 3.0 CTS tests to start
> > failing on Sandy Bridge due to attempted reads from the MRF:
> >
> > ES3-CTS.functional.shaders.precision.int.highp_mul_fragment.snbm64
> > ES3-CTS.functional.shaders.precision.int.mediump_mul_fragment.snbm64
> > ES3-CTS.functional.shaders.precision.int.lowp_mul_fragment.snbm64
> > ES3-CTS.functional.shaders.precision.uint.highp_mul_fragment.snbm64
> > ES3-CTS.functional.shaders.precision.uint.mediump_mul_fragment.snbm64
> > ES3-CTS.functional.shaders.precision.uint.lowp_mul_fragment.snbm64
> >
> > This commit remedies this problem by, instead of copying inst->dst and
> > overwriting nr, just make a new register and set the region to match
> > inst->dst.
> >
> > Cc: Matt Turner
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103626
> > Fixes: 18fde36ced4279f2577097a1a7d31b55f2f5f141
> > Cc: "17.3"
>
> Thanks. Sorry this got lost. Looks like it was sent the day I started
> vacation.
>
> Reviewed-by: Matt Turner

Thanks! I'll give it one more run through Jenkins and land it.
Re: [Mesa-dev] [PATCH] vulkan: Update the XML and headers to 1.0.68
pushed

On Thu, Jan 25, 2018 at 10:30 AM, Chad Versace wrote:

> On Wed 24 Jan 2018, Jason Ekstrand wrote:
> > ---
> >  include/vulkan/vulkan.h    | 54 ---
> >  src/vulkan/registry/vk.xml | 91 +++++-
> >  2 files changed, 130 insertions(+), 15 deletions(-)
>
> Acked-by: Chad Versace
Re: [Mesa-dev] [PATCH] mesa: Fix function pointers initialization in status tracker
On 01/25/2018 01:09 PM, Eleni Maria Stea wrote:

> We assigned the function that gets the device uuid to the GetDriverUuid
> function pointer and the function that gets the driver uuid to the
> GetDeviceUuid function pointer inside the state tracker. Exchanged the
> pointers.
> ---
>  src/mesa/state_tracker/st_context.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c
> index 3ba4847926..d3e7d3fb7f 100644
> --- a/src/mesa/state_tracker/st_context.c
> +++ b/src/mesa/state_tracker/st_context.c
> @@ -757,8 +757,8 @@ st_init_driver_functions(struct pipe_screen *screen,
>     functions->UpdateState = st_invalidate_state;
>     functions->QueryMemoryInfo = st_query_memory_info;
>     functions->SetBackgroundContext = st_set_background_context;
> -   functions->GetDriverUuid = st_get_device_uuid;
> -   functions->GetDeviceUuid = st_get_driver_uuid;
> +   functions->GetDriverUuid = st_get_driver_uuid;
> +   functions->GetDeviceUuid = st_get_device_uuid;
>
>     /* GL_ARB_get_program_binary */
>     functions->GetProgramBinaryDriverSHA1 = st_get_program_binary_driver_sha1;

Reviewed-by: Brian Paul

I'll also cc mesa-stable on it and push it soon.

-Brian
[Mesa-dev] [PATCH] mesa: Fix function pointers initialization in status tracker
We assigned the function that gets the device uuid to the GetDriverUuid function pointer and the function that gets the driver uuid to the GetDeviceUuid function pointer inside the state tracker. Exchanged the pointers. --- src/mesa/state_tracker/st_context.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c index 3ba4847926..d3e7d3fb7f 100644 --- a/src/mesa/state_tracker/st_context.c +++ b/src/mesa/state_tracker/st_context.c @@ -757,8 +757,8 @@ st_init_driver_functions(struct pipe_screen *screen, functions->UpdateState = st_invalidate_state; functions->QueryMemoryInfo = st_query_memory_info; functions->SetBackgroundContext = st_set_background_context; - functions->GetDriverUuid = st_get_device_uuid; - functions->GetDeviceUuid = st_get_driver_uuid; + functions->GetDriverUuid = st_get_driver_uuid; + functions->GetDeviceUuid = st_get_device_uuid; /* GL_ARB_get_program_binary */ functions->GetProgramBinaryDriverSHA1 = st_get_program_binary_driver_sha1; -- 2.15.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: emit a cache flush before enabling predication
On 26 Jan. 2018 01:10, "Matthew Nicholls" wrote:

> Otherwise cache flushes could get conditionally disabled while still
> clearing the flush_bits, and thus flushes due to application pipeline
> barriers may never get executed.

I wonder whether we would be better off not predicating flushes at all. I added that as an extra optimization, but it might be the wrong move.

Dave.

> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/amd/vulkan/radv_meta_fast_clear.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_meta_fast_clear.c b/src/amd/vulkan/radv_meta_fast_clear.c
> index fdeeaeedbf..f4353fd889 100644
> --- a/src/amd/vulkan/radv_meta_fast_clear.c
> +++ b/src/amd/vulkan/radv_meta_fast_clear.c
> @@ -602,6 +602,8 @@ radv_emit_color_decompress(struct radv_cmd_buffer *cmd_buffer,
>  	}
>
>  	if (!decompress_dcc && image->surface.dcc_size) {
> +		si_emit_cache_flush(cmd_buffer);
> +
>  		radv_emit_set_predication_state_from_image(cmd_buffer, image, true);
>  		cmd_buffer->state.predicating = true;
>  	}
> --
> 2.13.6
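To make the failure mode in the commit message concrete, here is a small toy model in C. It is only a sketch under stated assumptions: the `toy_*` names and fields are hypothetical and are not radv's real structures or packet emission. It assumes that a flush emitted while predication is active is skipped by the GPU when the predicate fails, while the CPU-side `flush_bits` are cleared either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not radv's real command buffer. */
struct toy_cmd_buffer {
    uint32_t flush_bits;     /* flushes requested but not yet emitted */
    bool     predicating;    /* packets emitted now may be skipped by the GPU */
    bool     predicate_pass; /* what the GPU decides at execution time */
    uint32_t executed;       /* flushes the GPU actually performed */
};

/* Emit the pending flushes. The CPU-side bits are always cleared, but a
 * predicated packet only takes effect if the predicate passes. */
static void toy_emit_cache_flush(struct toy_cmd_buffer *cb)
{
    if (!cb->predicating || cb->predicate_pass)
        cb->executed |= cb->flush_bits;
    cb->flush_bits = 0;
}

/* Model a decompress pass, with or without the extra flush that the patch
 * adds before predication is enabled. Returns the flushes that really ran. */
static uint32_t toy_decompress(uint32_t pending, bool predicate_pass,
                               bool flush_first)
{
    struct toy_cmd_buffer cb = {
        .flush_bits = pending,
        .predicate_pass = predicate_pass,
    };

    if (flush_first)
        toy_emit_cache_flush(&cb); /* unpredicated, so it always executes */

    cb.predicating = true;
    toy_emit_cache_flush(&cb);     /* flush emitted inside the predicated region */
    cb.predicating = false;

    assert(cb.flush_bits == 0);    /* the pending bits are gone in every case */
    return cb.executed;
}
```

In this model, a failing predicate without the early flush loses the application's barrier flushes entirely, which matches the problem the patch describes.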
Re: [Mesa-dev] [PATCH v1 0/7] Implement commont gralloc_handle_t in libdrm
On Thu, Jan 25, 2018 at 10:21 AM, Robert Foss wrote:
> Hey Tomasz,
>
> On 01/24/2018 11:04 AM, Tomasz Figa wrote:
>>
>> Hi Robert,
>>
>> On Wed, Jan 17, 2018 at 2:36 AM, Robert Foss wrote:
>>>
>>> This series moves {gbm,drm,cros}_gralloc_handle_t struct to libdrm,
>>> since at least 4 implementations exist, and share a lot of contents.
>>> The idea is to keep the common stuff defined in one place, and libdrm
>>> is the common codebase to all of these platforms.
>>>
>>> Additionally, having this struct defined in libdrm will make it
>>> easier for mesa and gralloc implementations to communicate.
>>>
>>> Robert Foss (7):
>>>   android: Move gralloc handle struct to libdrm
>>>   android: Add version variable to gralloc_handle_t
>>>   android: Mark gralloc_handle_t magic variable as const
>>>   android: Remove member name from gralloc_handle_t
>>>   android: Change gralloc_handle_t format from Android format to fourcc
>>>   android: Change gralloc_handle_t members to be fixed width
>>>   android: Add accessor functions for gralloc_handle_t variables
>>
>> Again, thanks for working on this.
>>
>> I looked through the series and it seems to be much different from
>> what I imagined when writing my previous reply. I must have
>> misunderstood your proposal back then.
>
> Ah, glad we caught it before v2 then :)
>
>> Generally, current series doesn't solve Chromium OS main concern of
>> locking down the handle struct. Even though accessors are added, they
>> are implemented in libdrm and refer to the exact handle layout as per
>> the handle struct defined by libdrm.
>
> So solving the problems of multiple projects is the goal, so reconsidering
> is probably the way forward.
>
>> What I had in my mind, would be creating a secondary struct,
>> consisting only of callbacks, which would be filled in by particular
>> gralloc implementation running in the system with its accessors. This
>> would completely eliminate any dependencies on the handle struct
>> itself from consumers of gralloc buffers.
>
> So just to sketch out the solution, it would look something like this?
>
> struct gralloc_handle_t {

This is not a handle...

>   uint32_t (*get_fd)(buffer_handle_t handle, uint32_t plane);
>   uint64_t (*get_modifier)(buffer_handle_t handle, uint32_t plane);
>   uint32_t (*get_offsets)(buffer_handle_t handle, uint32_t plane);
>   uint32_t (*get_stride)(buffer_handle_t handle, uint32_t plane);
>   ...
> } gralloc_funcs_t;
>
> struct gralloc_handle_t {
>   native_handle_t base;
>
>   /* api variables */
>   const int magic; /* differentiate between allocator impls */
>   const int version; /* api version */
>
>   gralloc_funcs_t funcs;

This doesn't go in the handle, but rather you would retrieve this struct I guess with a "perform" call to gralloc AIUI. Of course, if you have 1 perform call, then why not just a perform op for each accessor. Does perform even exist in a gralloc 2 implementation?

>   ...
> } gralloc_handle_t;
>
> For reasons of backwards compatibility gralloc_handle_t should probably
> contain whatever gbm_gralloc_handle_t contains now too.

Being backwards compatible with upstream (to the extent there is one) was a goal. You can't really have that and what Tomasz proposes.

> Since we're going to version this struct, we can always drop extraneous
> variables later.
> Since we'll be able to drop variables, we could add more variables to
> support the cros minigbm variables or even the intel minigbm ones.
> This would be a bit high churn, but probably ease adoption.

I've yet to hear technical reasons why the handle struct needs to be different.

> Additionally the gralloc buffer registering mechanism doesn't exist in any
> of the gralloc implementations, so being able to start out with something
> that works on all platforms would be nice.
>
>
> Rob.
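For readers following the thread, the callback-table idea Tomasz describes could be sketched in C roughly as below. This is an illustration under assumptions, not anything proposed for libdrm: every `toy_*` name is hypothetical, `toy_buffer_handle_t` stands in for Android's `buffer_handle_t`, and `get_fd` returns `int` here rather than the `uint32_t` used in the quoted sketch. How a consumer would obtain the table (a "perform" call or otherwise) is left open, as it is in the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Android's buffer_handle_t. */
typedef const void *toy_buffer_handle_t;

/* Private layout of one imaginary gralloc implementation. Consumers never
 * see this struct; only the accessors below know about it. */
struct toy_private_handle {
    int      fds[4];
    uint32_t strides[4];
    uint64_t modifier;
};

/* Accessor table filled in by whichever gralloc is running on the system. */
struct toy_gralloc_funcs {
    int      (*get_fd)(toy_buffer_handle_t handle, uint32_t plane);
    uint32_t (*get_stride)(toy_buffer_handle_t handle, uint32_t plane);
    uint64_t (*get_modifier)(toy_buffer_handle_t handle, uint32_t plane);
};

static int toy_get_fd(toy_buffer_handle_t handle, uint32_t plane)
{
    const struct toy_private_handle *p = handle;
    return plane < 4 ? p->fds[plane] : -1;
}

static uint32_t toy_get_stride(toy_buffer_handle_t handle, uint32_t plane)
{
    const struct toy_private_handle *p = handle;
    return plane < 4 ? p->strides[plane] : 0;
}

static uint64_t toy_get_modifier(toy_buffer_handle_t handle, uint32_t plane)
{
    const struct toy_private_handle *p = handle;
    (void)plane; /* this toy layout has a single modifier for all planes */
    return p->modifier;
}

static const struct toy_gralloc_funcs toy_funcs = {
    .get_fd = toy_get_fd,
    .get_stride = toy_get_stride,
    .get_modifier = toy_get_modifier,
};

/* Consumer side: depends only on the table, never on the handle layout. */
static uint32_t consumer_row_bytes(const struct toy_gralloc_funcs *funcs,
                                   toy_buffer_handle_t handle)
{
    assert(funcs != NULL && funcs->get_stride != NULL);
    return funcs->get_stride(handle, 0);
}
```

The trade-off raised in the reply still applies to a design like this: the table decouples consumers from the handle layout, but it is not backwards compatible with code that reads the existing handle struct directly.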
Re: [Mesa-dev] [Mesa-stable] [PATCH v2] configure.ac: add missing llvm dependencies to .pc files
Hi Emil,

>> I'll squash it before pushing
>
> Thanks! Hopefully once my new account goes through I can push on my own.

It looks like my account finally went through so I can just take care of pushing it myself.

- Chuck
Re: [Mesa-dev] [PATCH] swr/rast: Optimize DumpToFile output size
Reviewed-by: Bruce Cherniak

> On Jan 24, 2018, at 2:50 PM, George Kyriazis wrote:
>
> Modify DumpToFile to only dump the function, not the entire module.
> Reduces file sizes and speeds up the dumping.
> ---
>  src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> index 675438b..7105766 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> @@ -421,8 +421,7 @@ void JitManager::DumpToFile(Function *f, const char *fileName)
>      sprintf(fName, "%s.%s.ll", funcName, fileName);
>  #endif
>      raw_fd_ostream fd(fName, EC, llvm::sys::fs::F_None);
> -    Module* pModule = f->getParent();
> -    pModule->print(fd, nullptr);
> +    f->print(fd, nullptr);
>
>  #if defined(_WIN32)
>      sprintf(fName, "%s\\cfg.%s.%s.dot", outDir.c_str(), funcName, fileName);
> --
> 2.7.4
Re: [Mesa-dev] [PATCH] svga: s/Bool/SVGA3dBool/ in SVGA3dDevCapResult
Reviewed-by: Charmaine LeeFrom: Brian Paul Sent: Thursday, January 25, 2018 10:38:51 AM To: mesa-dev@lists.freedesktop.org Cc: Charmaine Lee; Neha Bhende Subject: [PATCH] svga: s/Bool/SVGA3dBool/ in SVGA3dDevCapResult And fix whitespace. To sync up with in-house code. --- src/gallium/drivers/svga/include/svga3d_devcaps.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/svga/include/svga3d_devcaps.h b/src/gallium/drivers/svga/include/svga3d_devcaps.h index ade210b..4e2f6bf 100644 --- a/src/gallium/drivers/svga/include/svga3d_devcaps.h +++ b/src/gallium/drivers/svga/include/svga3d_devcaps.h @@ -448,10 +448,10 @@ typedef enum { SVGADX_DXFMT_MULTISAMPLE_8 ) typedef union { - Bool b; + SVGA3dBool b; uint32 u; - int32 i; - float f; + int32 i; + float f; } SVGA3dDevCapResult; #endif /* _SVGA3D_DEVCAPS_H_ */ -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/10] swrast: remove non-applicable GLX_SWAP_COPY_OML comment
On 12 December 2017 at 00:19, Ian Romanick wrote:
> On 12/07/2017 09:07 AM, Emil Velikov wrote:
>> From: Emil Velikov
>>
>> Noticed while skimming for GLX_ instances i the dri codebase.
>
> in
>
> With that fixed, this patch is also
>
> Reviewed-by: Ian Romanick

Thank you Ian, tweaked and pushed the first four patches.
If anyone is feeling a bit bored and wants to skim through the rest [1] that would be appreciated.

-Emil

[1] https://patchwork.freedesktop.org/series/35051/
[Mesa-dev] [PATCH] svga: s/Bool/SVGA3dBool/ in SVGA3dDevCapResult
And fix whitespace. To sync up with in-house code. --- src/gallium/drivers/svga/include/svga3d_devcaps.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/svga/include/svga3d_devcaps.h b/src/gallium/drivers/svga/include/svga3d_devcaps.h index ade210b..4e2f6bf 100644 --- a/src/gallium/drivers/svga/include/svga3d_devcaps.h +++ b/src/gallium/drivers/svga/include/svga3d_devcaps.h @@ -448,10 +448,10 @@ typedef enum { SVGADX_DXFMT_MULTISAMPLE_8 ) typedef union { - Bool b; + SVGA3dBool b; uint32 u; - int32 i; - float f; + int32 i; + float f; } SVGA3dDevCapResult; #endif /* _SVGA3D_DEVCAPS_H_ */ -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] vulkan: Update the XML and headers to 1.0.68
On Wed 24 Jan 2018, Jason Ekstrand wrote:
> ---
>  include/vulkan/vulkan.h    | 54 ---
>  src/vulkan/registry/vk.xml | 91 +-
>  2 files changed, 130 insertions(+), 15 deletions(-)

Acked-by: Chad Versace
[Mesa-dev] [Bug 104141] include/c11/threads_posix.h:96: undefined reference to `pthread_once'
https://bugs.freedesktop.org/show_bug.cgi?id=104141

Vinson Lee changed:

           What    |Removed |Added
----------------------------------------
             Status|NEW     |RESOLVED
         Resolution|---     |FIXED

--
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
Re: [Mesa-dev] [PATCH 1/5] swr/rast: Support USE_SIMD16_FRONTEND=0 for EarlyRast
Series Reviewed-by: Bruce Cherniak> On Jan 24, 2018, at 9:31 AM, George Kyriazis > wrote: > > Early Rasterization did not initially work with USE_SIMD16_FRONTEND=0. > Fix it so it works there, too. Please note that the default setting > is USE_SIMD16_FRONTEND=1. > --- > .../drivers/swr/rasterizer/core/frontend.cpp | 66 +++--- > 1 file changed, 33 insertions(+), 33 deletions(-) > > diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp > b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp > index 9600f78..66c4b74 100644 > --- a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp > +++ b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp > @@ -1032,31 +1032,31 @@ static void GeometryShaderStage( > simdscalari vPrimId = > _simd_set1_epi32(pPrimitiveId[inputPrim]); > > // Gather data from the SVG if provided. > -simdscalari vViewportIdx = > SIMD16::setzero_si(); > -simdscalari vRtIdx = SIMD16::setzero_si(); > -SIMD8::Vec4 svgAttrib[4]; > +simdscalari vViewportIdx = > SIMD::setzero_si(); > +simdscalari vRtIdx = SIMD::setzero_si(); > +SIMD::Vec4 svgAttrib[4]; > > if (state.backendState.readViewportArrayIndex > || state.backendState.readRenderTargetArrayIndex) > { > -tessPa.Assemble(VERTEX_SGV_SLOT, > svgAttrib); > +gsPa.Assemble(VERTEX_SGV_SLOT, > svgAttrib); > } > > > if (state.backendState.readViewportArrayIndex) > { > -vViewportIdx = > SIMD8::castps_si(svgAttrib[0][VERTEX_SGV_VAI_COMP]); > +vViewportIdx = > SIMD::castps_si(svgAttrib[0][VERTEX_SGV_VAI_COMP]); > > // OOB VPAI indices => forced to zero. 
> -vViewportIdx = > SIMD8::max_epi32(vViewportIdx, SIMD8::setzero_si()); > -simd16scalari vNumViewports = > SIMD8::set1_epi32(KNOB_NUM_VIEWPORTS_SCISSORS); > -simd16scalari vClearMask = > SIMD8::cmplt_epi32(vViewportIdx, vNumViewports); > -vViewportIdx = SIMD8::and_si(vClearMask, > vViewportIdx); > -tessPa.viewportArrayActive = true; > +vViewportIdx = > SIMD::max_epi32(vViewportIdx, SIMD::setzero_si()); > +simdscalari vNumViewports = > SIMD::set1_epi32(KNOB_NUM_VIEWPORTS_SCISSORS); > +simdscalari vClearMask = > SIMD::cmplt_epi32(vViewportIdx, vNumViewports); > +vViewportIdx = SIMD::and_si(vClearMask, > vViewportIdx); > +gsPa.viewportArrayActive = true; > } > if > (state.backendState.readRenderTargetArrayIndex) > { > -vRtIdx = > SIMD8::castps_si(svgAttrib[0][VERTEX_SGV_RTAI_COMP]); > -tessPa.rtArrayActive = true; > +vRtIdx = > SIMD::castps_si(svgAttrib[0][VERTEX_SGV_RTAI_COMP]); > +gsPa.rtArrayActive = true; > } > > pfnClipFunc(pDC, gsPa, workerId, attrib, > GenMask(gsPa.NumPrims()), vPrimId, vViewportIdx, vRtIdx); > @@ -1437,9 +1437,9 @@ static void TessellationStages( > } > #else > // Gather data from the SVG if provided. > -simdscalari vViewportIdx = SIMD16::setzero_si(); > -simdscalari vRtIdx = SIMD16::setzero_si(); > -SIMD8::Vec4 svgAttrib[4]; > +simdscalari vViewportIdx = SIMD::setzero_si(); > +simdscalari vRtIdx = SIMD::setzero_si(); > +SIMD::Vec4 svgAttrib[4]; > > if (state.backendState.readViewportArrayIndex || > state.backendState.readRenderTargetArrayIndex) > { > @@ -1448,18 +1448,18 @@ static void TessellationStages( > > if (state.backendState.readViewportArrayIndex) > { > -vViewportIdx = > SIMD8::castps_si(svgAttrib[0][VERTEX_SGV_VAI_COMP]); > +vViewportIdx = > SIMD::castps_si(svgAttrib[0][VERTEX_SGV_VAI_COMP]); > > // OOB VPAI
[Mesa-dev] [Bug 104710] [swrast] piglit draw-batch regression
https://bugs.freedesktop.org/show_bug.cgi?id=104710

Vinson Lee changed:

           What    |Removed |Added
----------------------------------------
         Resolution|---     |FIXED
             Status|NEW     |RESOLVED

--- Comment #2 from Vinson Lee ---
commit 365a48abddcabf6596c2e34a784d91c8ab929918
Author: Brian Paul
Date:   Tue Jan 23 10:48:51 2018 -0700

    vbo: fix incorrect min/max_index values in display list draw call

    This fixes another regression from commit 8e4efdc895ea ("vbo: optimize
    some display list drawing"). The problem was the min_index, max_index
    values passed to the vbo drawing function were not computed to
    compensate for the biased prim::start values.

    https://bugs.freedesktop.org/show_bug.cgi?id=104746
    https://bugs.freedesktop.org/show_bug.cgi?id=104742
    https://bugs.freedesktop.org/show_bug.cgi?id=104690
    Tested-by: Clayton Craft
    Fixes: 8e4efdc895ea ("vbo: optimize some display list drawing")
    Reviewed-by: Emil Velikov

--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
[Mesa-dev] [PATCH v3 2/8] compiler: Add SYSTEM_VALUE_FIRST_VERTEX and instrinsics
This VS system value will contain the value passed as for indexed draw calls or the value passed as for non-indexed draw calls. It can be used to calculate the gl_VertexID as SYSTEM_VALUE_VERTEX_ID_ZERO_BASE plus SYSTEM_VALUE_FIRST_VERTEX. From the OpenGL 4.6 spec, 10.4 "Drawing Commands Using Vertex Arrays": - Page 352: "The index of any element transferred to the GL by DrawArraysOneInstance is referred to as its vertex ID, and may be read by a vertex shader as gl_VertexID. The vertex ID of the ith element transferred is first + i." - Page 355: "The index of any element transferred to the GL by DrawElementsOneInstance is referred to as its vertex ID, and may be read by a vertex shader as gl_VertexID. The vertex ID of the ith element transferred is the sum of basevertex and the value stored in the currently bound element array buffer at offset indices + i." Currently the gl_VertexID calculation uses SYSTEM_VALUE_BASE_VERTEX but this will have to change when the value of gl_BaseVertex is fixed. Currently its value is broken for non-indexed draw calls because it must be zero but we are setting it to . v2: use SYSTEM_VALUE_FIRST_VERTEX as name for the value, instead of SYSTEM_VALUE_BASE_VERTEX_ID (Kenneth). 
Reviewed-by: Neil RobertsReviewed-by: Kenneth Graunke --- src/compiler/nir/nir.c | 4 src/compiler/nir/nir_gather_info.c | 1 + src/compiler/nir/nir_intrinsics.h | 1 + src/compiler/shader_enums.c| 1 + src/compiler/shader_enums.h| 14 ++ 5 files changed, 21 insertions(+) diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c index bdd8960403c..e69c2accbbf 100644 --- a/src/compiler/nir/nir.c +++ b/src/compiler/nir/nir.c @@ -1919,6 +1919,8 @@ nir_intrinsic_from_system_value(gl_system_value val) return nir_intrinsic_load_base_instance; case SYSTEM_VALUE_VERTEX_ID_ZERO_BASE: return nir_intrinsic_load_vertex_id_zero_base; + case SYSTEM_VALUE_FIRST_VERTEX: + return nir_intrinsic_load_first_vertex; case SYSTEM_VALUE_BASE_VERTEX: return nir_intrinsic_load_base_vertex; case SYSTEM_VALUE_INVOCATION_ID: @@ -1990,6 +1992,8 @@ nir_system_value_from_intrinsic(nir_intrinsic_op intrin) return SYSTEM_VALUE_BASE_INSTANCE; case nir_intrinsic_load_vertex_id_zero_base: return SYSTEM_VALUE_VERTEX_ID_ZERO_BASE; + case nir_intrinsic_load_first_vertex: + return SYSTEM_VALUE_FIRST_VERTEX; case nir_intrinsic_load_base_vertex: return SYSTEM_VALUE_BASE_VERTEX; case nir_intrinsic_load_invocation_id: diff --git a/src/compiler/nir/nir_gather_info.c b/src/compiler/nir/nir_gather_info.c index 946939657ec..555ae77b1d3 100644 --- a/src/compiler/nir/nir_gather_info.c +++ b/src/compiler/nir/nir_gather_info.c @@ -247,6 +247,7 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, nir_shader *shader) case nir_intrinsic_load_vertex_id: case nir_intrinsic_load_vertex_id_zero_base: case nir_intrinsic_load_base_vertex: + case nir_intrinsic_load_first_vertex: case nir_intrinsic_load_base_instance: case nir_intrinsic_load_instance_id: case nir_intrinsic_load_sample_id: diff --git a/src/compiler/nir/nir_intrinsics.h b/src/compiler/nir/nir_intrinsics.h index ede29277876..7d3421f0e30 100644 --- a/src/compiler/nir/nir_intrinsics.h +++ b/src/compiler/nir/nir_intrinsics.h @@ -333,6 +333,7 @@ SYSTEM_VALUE(frag_coord, 
4, 0, xx, xx, xx) SYSTEM_VALUE(front_face, 1, 0, xx, xx, xx) SYSTEM_VALUE(vertex_id, 1, 0, xx, xx, xx) SYSTEM_VALUE(vertex_id_zero_base, 1, 0, xx, xx, xx) +SYSTEM_VALUE(first_vertex, 1, 0, xx, xx, xx) SYSTEM_VALUE(base_vertex, 1, 0, xx, xx, xx) SYSTEM_VALUE(instance_id, 1, 0, xx, xx, xx) SYSTEM_VALUE(base_instance, 1, 0, xx, xx, xx) diff --git a/src/compiler/shader_enums.c b/src/compiler/shader_enums.c index 2179c475abd..5e123f29f37 100644 --- a/src/compiler/shader_enums.c +++ b/src/compiler/shader_enums.c @@ -214,6 +214,7 @@ gl_system_value_name(gl_system_value sysval) ENUM(SYSTEM_VALUE_INSTANCE_ID), ENUM(SYSTEM_VALUE_INSTANCE_INDEX), ENUM(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE), + ENUM(SYSTEM_VALUE_FIRST_VERTEX), ENUM(SYSTEM_VALUE_BASE_VERTEX), ENUM(SYSTEM_VALUE_BASE_INSTANCE), ENUM(SYSTEM_VALUE_DRAW_ID), diff --git a/src/compiler/shader_enums.h b/src/compiler/shader_enums.h index ffe551ab20f..9f71194c146 100644 --- a/src/compiler/shader_enums.h +++ b/src/compiler/shader_enums.h @@ -472,6 +472,20 @@ typedef enum */ SYSTEM_VALUE_BASE_VERTEX, + /** +* Depending on the type of the draw call (indexed or non-indexed), +* is the value of \c basevertex passed to \c glDrawElementsBaseVertex and +* similar, or is the value of \c first passed to \c glDrawArrays and +* similar. +* +* \note +* It can be used to calculate the \c SYSTEM_VALUE_VERTEX_ID as +* \c SYSTEM_VALUE_VERTEX_ID_ZERO_BASE plus \c SYSTEM_VALUE_FIRST_VERTEX. +* +* \sa
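The relationship described in the commit message (gl_VertexID as SYSTEM_VALUE_VERTEX_ID_ZERO_BASE plus SYSTEM_VALUE_FIRST_VERTEX) can be checked with a small C sketch. It is only an illustration of the two spec quotes above: `struct toy_draw` and the `toy_*` helpers are hypothetical and are not Mesa code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a draw call; not a Mesa structure. */
struct toy_draw {
    int             indexed;    /* nonzero for DrawElements-style calls */
    int32_t         first;      /* 'first' of a non-indexed draw */
    int32_t         basevertex; /* 'basevertex' of an indexed draw */
    const uint32_t *indices;    /* element array, indexed draws only */
};

/* Models SYSTEM_VALUE_FIRST_VERTEX: basevertex for indexed draws,
 * first for non-indexed draws. */
static int32_t toy_first_vertex(const struct toy_draw *draw)
{
    return draw->indexed ? draw->basevertex : draw->first;
}

/* Models SYSTEM_VALUE_VERTEX_ID_ZERO_BASE: the vertex ID before the
 * per-draw offset is applied. */
static int32_t toy_vertex_id_zero_base(const struct toy_draw *draw, uint32_t i)
{
    if (draw->indexed) {
        assert(draw->indices != NULL);
        return (int32_t)draw->indices[i];
    }
    return (int32_t)i;
}

/* gl_VertexID of the i-th element transferred: "first + i" for
 * DrawArrays, "basevertex + indices[i]" for DrawElements. */
static int32_t toy_vertex_id(const struct toy_draw *draw, uint32_t i)
{
    return toy_vertex_id_zero_base(draw, i) + toy_first_vertex(draw);
}
```

With `first = 10`, element 3 of a non-indexed draw gets vertex ID 13; with `basevertex = 100` and an index value of 2, an indexed draw yields 102.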
[Mesa-dev] [PATCH v3 8/8] i965: gl_BaseVertex must be zero for non-indexed draw calls
We keep 'firstvertex' as it is and move gl_BaseVertex to the drawID vertex element. The previous Vertex Elements order was: * VE 1: * VE 2: and now it is: * VE 1:* VE 2: To move the BaseVertex keeping VE1 as it is, allows to keep pointing the vertex buffer associated to VE 1 to the indirect buffer for indirect draw calls. From the OpenGL 4.6 (11.1.3.9 Shader Inputs) specification: "gl_BaseVertex holds the integer value passed to the baseVertex parameter to the command that resulted in the current shader invocation. In the case where the command has no baseVertex parameter, the value of gl_BaseVertex is zero." Fixes CTS tests: * KHR-GL45.shader_draw_parameters_tests.ShaderDrawArraysParameters * KHR-GL45.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters * KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters * KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters * KHR-GL45.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678 --- src/intel/compiler/brw_nir.c | 14 + src/intel/compiler/brw_vec4.cpp | 14 + src/mesa/drivers/dri/i965/brw_context.h | 32 ++- src/mesa/drivers/dri/i965/brw_draw.c | 45 ++- src/mesa/drivers/dri/i965/brw_draw_upload.c | 24 -- src/mesa/drivers/dri/i965/genX_state_upload.c | 38 +++--- 6 files changed, 105 insertions(+), 62 deletions(-) diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c index 34b1e44adf0..c10fa73f4fc 100644 --- a/src/intel/compiler/brw_nir.c +++ b/src/intel/compiler/brw_nir.c @@ -238,8 +238,7 @@ brw_nir_lower_vs_inputs(nir_shader *nir, */ const bool has_sgvs = nir->info.system_values_read & - (BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX) | - BITFIELD64_BIT(SYSTEM_VALUE_FIRST_VERTEX) | + (BITFIELD64_BIT(SYSTEM_VALUE_FIRST_VERTEX) | BITFIELD64_BIT(SYSTEM_VALUE_BASE_INSTANCE) | BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) | BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID)); @@ -279,7 +278,6 
@@ brw_nir_lower_vs_inputs(nir_shader *nir, nir_intrinsic_set_base(load, num_inputs); switch (intrin->intrinsic) { - case nir_intrinsic_load_base_vertex: case nir_intrinsic_load_first_vertex: nir_intrinsic_set_component(load, 0); break; @@ -293,11 +291,15 @@ brw_nir_lower_vs_inputs(nir_shader *nir, nir_intrinsic_set_component(load, 3); break; case nir_intrinsic_load_draw_id: - /* gl_DrawID is stored right after gl_VertexID and friends - * if any of them exist. + case nir_intrinsic_load_base_vertex: + /* gl_DrawID and gl_BaseVertex are stored right after + gl_VertexID and friends if any of them exist. */ nir_intrinsic_set_base(load, num_inputs + has_sgvs); - nir_intrinsic_set_component(load, 0); + if (intrin->intrinsic == nir_intrinsic_load_draw_id) + nir_intrinsic_set_component(load, 0); + else + nir_intrinsic_set_component(load, 1); break; default: unreachable("Invalid system value intrinsic"); diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp index 06c79630119..3b4b3c01b57 100644 --- a/src/intel/compiler/brw_vec4.cpp +++ b/src/intel/compiler/brw_vec4.cpp @@ -2787,14 +2787,19 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data, * incoming vertex attribute. So, add an extra slot. 
*/ if (shader->info.system_values_read & - (BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX) | -BITFIELD64_BIT(SYSTEM_VALUE_FIRST_VERTEX) | + (BITFIELD64_BIT(SYSTEM_VALUE_FIRST_VERTEX) | BITFIELD64_BIT(SYSTEM_VALUE_BASE_INSTANCE) | BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) | BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID))) { nr_attribute_slots++; } + if (shader->info.system_values_read & + (BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX) | +BITFIELD64_BIT(SYSTEM_VALUE_DRAW_ID))) { + nr_attribute_slots++; + } + if (shader->info.system_values_read & BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX)) prog_data->uses_basevertex = true; @@ -2815,12 +2820,9 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data, BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID)) prog_data->uses_instanceid = true; - /* gl_DrawID has its very own vec4 */ if (shader->info.system_values_read & -
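The slot accounting in the hunks above can be summarised with a short C sketch. It mirrors only the logic visible in this diff: one extra vertex element when any of the "VertexID and friends" values is read, and a second one shared by gl_DrawID (component 0) and gl_BaseVertex (component 1). The enum values and `toy_*` names are hypothetical; Mesa's real `gl_system_value` numbering differs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; not Mesa's gl_system_value enum. */
enum toy_sysval {
    TOY_FIRST_VERTEX,
    TOY_BASE_INSTANCE,
    TOY_VERTEX_ID_ZERO_BASE,
    TOY_INSTANCE_ID,
    TOY_BASE_VERTEX,
    TOY_DRAW_ID
};

#define TOY_BIT(v) (UINT64_C(1) << (v))

/* Number of extra vertex attribute slots needed for system values. */
static unsigned toy_extra_attribute_slots(uint64_t system_values_read)
{
    unsigned slots = 0;

    /* First element: firstvertex, baseinstance, vertex ID, instance ID. */
    if (system_values_read &
        (TOY_BIT(TOY_FIRST_VERTEX) | TOY_BIT(TOY_BASE_INSTANCE) |
         TOY_BIT(TOY_VERTEX_ID_ZERO_BASE) | TOY_BIT(TOY_INSTANCE_ID)))
        slots++;

    /* Second element: gl_DrawID and gl_BaseVertex share one vec4. */
    if (system_values_read &
        (TOY_BIT(TOY_BASE_VERTEX) | TOY_BIT(TOY_DRAW_ID)))
        slots++;

    assert(slots <= 2);
    return slots;
}

/* Component used inside the second element. */
static unsigned toy_second_element_component(enum toy_sysval value)
{
    assert(value == TOY_DRAW_ID || value == TOY_BASE_VERTEX);
    return value == TOY_DRAW_ID ? 0 : 1;
}
```

A shader that reads only gl_BaseVertex therefore needs one extra slot in this model, and one that reads both a first-element value and gl_DrawID needs two.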
[Mesa-dev] [PATCH v3 3/8] intel/compiler: Add a uses_firstvertex flag
From: Neil Roberts

Reviewed-by: Kenneth Graunke
---
 src/intel/compiler/brw_compiler.h | 1 +
 src/intel/compiler/brw_vec4.cpp   | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/src/intel/compiler/brw_compiler.h b/src/intel/compiler/brw_compiler.h
index b1086bbcee5..0afe5757945 100644
--- a/src/intel/compiler/brw_compiler.h
+++ b/src/intel/compiler/brw_compiler.h
@@ -966,6 +966,7 @@ struct brw_vs_prog_data {
    bool uses_vertexid;
    bool uses_instanceid;
    bool uses_basevertex;
+   bool uses_firstvertex;
    bool uses_baseinstance;
    bool uses_drawid;
 };
diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp
index ad6d8f9d6bc..36e17d77d47 100644
--- a/src/intel/compiler/brw_vec4.cpp
+++ b/src/intel/compiler/brw_vec4.cpp
@@ -2798,6 +2798,10 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data,
        BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX))
       prog_data->uses_basevertex = true;

+   if (shader->info.system_values_read &
+       BITFIELD64_BIT(SYSTEM_VALUE_FIRST_VERTEX))
+      prog_data->uses_firstvertex = true;
+
    if (shader->info.system_values_read &
        BITFIELD64_BIT(SYSTEM_VALUE_BASE_INSTANCE))
       prog_data->uses_baseinstance = true;
--
2.14.1
[Mesa-dev] [PATCH v3 5/8] spirv: Lower BaseVertex to FIRST_VERTEX instead of BASE_VERTEX
From: Neil RobertsThe base vertex in Vulkan is different from GL in that for non-indexed primitives the value is taken from the firstVertex parameter instead of being set to zero. This coincides with the new SYSTEM_VALUE_FIRST_VERTEX instead of BASE_VERTEX. --- src/compiler/spirv/vtn_variables.c | 2 +- src/intel/vulkan/genX_cmd_buffer.c | 16 src/intel/vulkan/genX_pipeline.c | 2 ++ 3 files changed, 15 insertions(+), 5 deletions(-) diff --git a/src/compiler/spirv/vtn_variables.c b/src/compiler/spirv/vtn_variables.c index eb306d0c4a8..3e5686af1d9 100644 --- a/src/compiler/spirv/vtn_variables.c +++ b/src/compiler/spirv/vtn_variables.c @@ -1279,7 +1279,7 @@ vtn_get_builtin_location(struct vtn_builder *b, set_mode_system_value(b, mode); break; case SpvBuiltInBaseVertex: - *location = SYSTEM_VALUE_BASE_VERTEX; + *location = SYSTEM_VALUE_FIRST_VERTEX; set_mode_system_value(b, mode); break; case SpvBuiltInBaseInstance: diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c index c23a54fb7b9..9fc281bf4eb 100644 --- a/src/intel/vulkan/genX_cmd_buffer.c +++ b/src/intel/vulkan/genX_cmd_buffer.c @@ -2223,7 +2223,9 @@ void genX(CmdDraw)( genX(cmd_buffer_flush_state)(cmd_buffer); - if (vs_prog_data->uses_basevertex || vs_prog_data->uses_baseinstance) + if (vs_prog_data->uses_firstvertex || + vs_prog_data->uses_basevertex || + vs_prog_data->uses_baseinstance) emit_base_vertex_instance(cmd_buffer, firstVertex, firstInstance); if (vs_prog_data->uses_drawid) emit_draw_index(cmd_buffer, 0); @@ -2261,7 +2263,9 @@ void genX(CmdDrawIndexed)( genX(cmd_buffer_flush_state)(cmd_buffer); - if (vs_prog_data->uses_basevertex || vs_prog_data->uses_baseinstance) + if (vs_prog_data->uses_firstvertex || + vs_prog_data->uses_basevertex || + vs_prog_data->uses_baseinstance) emit_base_vertex_instance(cmd_buffer, vertexOffset, firstInstance); if (vs_prog_data->uses_drawid) emit_draw_index(cmd_buffer, 0); @@ -2417,7 +2421,9 @@ void genX(CmdDrawIndirect)( struct anv_bo *bo = 
buffer->bo; uint32_t bo_offset = buffer->offset + offset; - if (vs_prog_data->uses_basevertex || vs_prog_data->uses_baseinstance) + if (vs_prog_data->uses_firstvertex || + vs_prog_data->uses_basevertex || + vs_prog_data->uses_baseinstance) emit_base_vertex_instance_bo(cmd_buffer, bo, bo_offset + 8); if (vs_prog_data->uses_drawid) emit_draw_index(cmd_buffer, i); @@ -2456,7 +2462,9 @@ void genX(CmdDrawIndexedIndirect)( uint32_t bo_offset = buffer->offset + offset; /* TODO: We need to stomp base vertex to 0 somehow */ - if (vs_prog_data->uses_basevertex || vs_prog_data->uses_baseinstance) + if (vs_prog_data->uses_firstvertex || + vs_prog_data->uses_basevertex || + vs_prog_data->uses_baseinstance) emit_base_vertex_instance_bo(cmd_buffer, bo, bo_offset + 12); if (vs_prog_data->uses_drawid) emit_draw_index(cmd_buffer, i); diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c index 82fdf206a95..5f4cf58b83d 100644 --- a/src/intel/vulkan/genX_pipeline.c +++ b/src/intel/vulkan/genX_pipeline.c @@ -98,6 +98,7 @@ emit_vertex_input(struct anv_pipeline *pipeline, const bool needs_svgs_elem = vs_prog_data->uses_vertexid || vs_prog_data->uses_instanceid || vs_prog_data->uses_basevertex || +vs_prog_data->uses_firstvertex || vs_prog_data->uses_baseinstance; uint32_t elem_count = __builtin_popcount(elements) - @@ -178,6 +179,7 @@ emit_vertex_input(struct anv_pipeline *pipeline, * well. Just do all or nothing. */ uint32_t base_ctrl = (vs_prog_data->uses_basevertex || +vs_prog_data->uses_firstvertex || vs_prog_data->uses_baseinstance) ? VFCOMP_STORE_SRC : VFCOMP_STORE_0; -- 2.14.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
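The difference this patch relies on can be stated as a tiny C function. This is a sketch of the semantics as described in this series (the GL rule comes from the spec text quoted in patch 8/8, the Vulkan rule from the commit message above); `toy_base_vertex_builtin` and `enum toy_api` are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_api { TOY_API_GL, TOY_API_VULKAN };

/*
 * Value a vertex shader sees for the base-vertex built-in.
 *
 * 'offset' is basevertex/vertexOffset for indexed draws and
 * first/firstVertex for non-indexed draws.
 */
static int32_t toy_base_vertex_builtin(enum toy_api api, bool indexed,
                                       int32_t offset)
{
    assert(api == TOY_API_GL || api == TOY_API_VULKAN);

    if (indexed)
        return offset; /* both APIs expose the base vertex of the draw */

    /* Non-indexed: GL defines gl_BaseVertex as zero, while Vulkan takes
     * the value from firstVertex, which is what FIRST_VERTEX provides. */
    return api == TOY_API_VULKAN ? offset : 0;
}
```

This is why the SPIR-V front end can map `SpvBuiltInBaseVertex` to SYSTEM_VALUE_FIRST_VERTEX while GL keeps a separate SYSTEM_VALUE_BASE_VERTEX.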
[Mesa-dev] [PATCH v3 6/8] i965: Don't request GLSL IR lowering of gl_VertexID
From: Ian Romanick

Let the lowering in NIR handle it instead.

This hurts one shader that occurs twice in shader-db (SynMark GSCloth) on IVB and HSW. No other shaders or platforms were affected.

total cycles in shared programs: 253438422 -> 253438426 (0.00%)
cycles in affected programs: 412 -> 416 (0.97%)
helped: 0
HURT: 2

Signed-off-by: Ian Romanick
Reviewed-by: Antia Puentes
---
 src/mesa/drivers/dri/i965/brw_context.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 9ed8bc64bb3..7775468f98a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -590,7 +590,6 @@ brw_initialize_context_constants(struct brw_context *brw)
    ctx->Const.QuadsFollowProvokingVertexConvention = false;

    ctx->Const.NativeIntegers = true;
-   ctx->Const.VertexID_is_zero_based = true;

    /* Regarding the CMP instruction, the Ivybridge PRM says:
     *
--
2.14.1
[Mesa-dev] [PATCH v3 7/8] nir: Offset vertex_id by first_vertex instead of base_vertex
From: Neil Roberts

base_vertex will be zero for non-indexed calls and in that case we need
vertex_id to be offset by the ‘first’ parameter instead. That is what
we get with first_vertex. This is true for both GL and Vulkan.

The freedreno driver is also setting vertex_id_zero_based on
nir_options. In order to avoid breakage this patch switches the
relevant code to handle SYSTEM_VALUE_FIRST_VERTEX so that it can
retain the same behavior.

v2: change a3xx/fd3_emit.c and a4xx/fd4_emit.c from
    SYSTEM_VALUE_BASE_VERTEX to SYSTEM_VALUE_FIRST_VERTEX (Kenneth).

Cc: Rob Clark
Cc: Marek Olšák
---
 src/compiler/nir/nir_lower_system_values.c           | 2 +-
 src/gallium/drivers/freedreno/a3xx/fd3_emit.c        | 2 +-
 src/gallium/drivers/freedreno/a4xx/fd4_emit.c        | 2 +-
 src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c | 5 ++---
 src/intel/vulkan/genX_cmd_buffer.c                   | 4 ----
 src/intel/vulkan/genX_pipeline.c                     | 4 +---
 6 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/src/compiler/nir/nir_lower_system_values.c b/src/compiler/nir/nir_lower_system_values.c
index 3594f4ae5ce..6f4fb8233ab 100644
--- a/src/compiler/nir/nir_lower_system_values.c
+++ b/src/compiler/nir/nir_lower_system_values.c
@@ -105,7 +105,7 @@ convert_block(nir_block *block, nir_builder *b)
          if (b->shader->options->vertex_id_zero_based) {
             sysval = nir_iadd(b,
                               nir_load_vertex_id_zero_base(b),
-                              nir_load_base_vertex(b));
+                              nir_load_first_vertex(b));
          } else {
             sysval = nir_load_vertex_id(b);
          }
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
index b9e1af00e2c..3419ba86d46 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
@@ -374,7 +374,7 @@ fd3_emit_vertex_bufs(struct fd_ringbuffer *ring, struct fd3_emit *emit)
 			continue;
 		if (vp->inputs[i].sysval) {
 			switch(vp->inputs[i].slot) {
-			case SYSTEM_VALUE_BASE_VERTEX:
+			case SYSTEM_VALUE_FIRST_VERTEX:
 				/* handled elsewhere */
 				break;
 			case SYSTEM_VALUE_VERTEX_ID_ZERO_BASE:
diff --git a/src/gallium/drivers/freedreno/a4xx/fd4_emit.c b/src/gallium/drivers/freedreno/a4xx/fd4_emit.c
index 5fec2b6b08a..42268ceea71 100644
--- a/src/gallium/drivers/freedreno/a4xx/fd4_emit.c
+++ b/src/gallium/drivers/freedreno/a4xx/fd4_emit.c
@@ -378,7 +378,7 @@ fd4_emit_vertex_bufs(struct fd_ringbuffer *ring, struct fd4_emit *emit)
 			continue;
 		if (vp->inputs[i].sysval) {
 			switch(vp->inputs[i].slot) {
-			case SYSTEM_VALUE_BASE_VERTEX:
+			case SYSTEM_VALUE_FIRST_VERTEX:
 				/* handled elsewhere */
 				break;
 			case SYSTEM_VALUE_VERTEX_ID_ZERO_BASE:
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
index 15a3aa4c802..d3a8dbec14e 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
@@ -2073,11 +2073,10 @@ emit_intrinsic(struct ir3_context *ctx, nir_intrinsic_instr *intr)
 			ctx->ir->outputs[n] = src[i];
 		}
 		break;
-	case nir_intrinsic_load_base_vertex:
+	case nir_intrinsic_load_first_vertex:
 		if (!ctx->basevertex) {
 			ctx->basevertex = create_driver_param(ctx, IR3_DP_VTXID_BASE);
-			add_sysval_input(ctx, SYSTEM_VALUE_BASE_VERTEX,
-					ctx->basevertex);
+			add_sysval_input(ctx, SYSTEM_VALUE_FIRST_VERTEX, ctx->basevertex);
 		}
 		dst[0] = ctx->basevertex;
 		break;
diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index 9fc281bf4eb..d7dc14f387b 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -2224,7 +2224,6 @@ void genX(CmdDraw)(
    genX(cmd_buffer_flush_state)(cmd_buffer);
    if (vs_prog_data->uses_firstvertex ||
-       vs_prog_data->uses_basevertex ||
        vs_prog_data->uses_baseinstance)
       emit_base_vertex_instance(cmd_buffer, firstVertex, firstInstance);
    if (vs_prog_data->uses_drawid)
@@ -2264,7 +2263,6 @@ void genX(CmdDrawIndexed)(
    genX(cmd_buffer_flush_state)(cmd_buffer);
    if (vs_prog_data->uses_firstvertex ||
-       vs_prog_data->uses_basevertex ||
        vs_prog_data->uses_baseinstance)
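A minimal sketch of the distinction this commit message draws, assuming a hypothetical `draw_params` struct and helper names that are illustrative only and not Mesa API. It models base_vertex as zero for non-indexed draws, first_vertex as either the base vertex or the 'first' argument, and vertex_id as the zero-based ID plus first_vertex:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parameters of one draw call. */
struct draw_params {
   bool indexed;     /* true for glDrawElements*-style draws */
   int first;        /* 'first' argument of a non-indexed draw */
   int base_vertex;  /* basevertex of an indexed draw */
};

/* What the shader sees as the base vertex: zero for non-indexed draws. */
static int
base_vertex_seen_by_shader(const struct draw_params *p)
{
   return p->indexed ? p->base_vertex : 0;
}

/* first_vertex: base vertex for indexed draws, 'first' otherwise. */
static int
first_vertex(const struct draw_params *p)
{
   return p->indexed ? p->base_vertex : p->first;
}

/* vertex_id reconstructed from the zero-based ID, as in the lowering. */
static int
vertex_id(const struct draw_params *p, int vertex_id_zero_base)
{
   return vertex_id_zero_base + first_vertex(p);
}
```

With a non-indexed draw starting at vertex 10, adding the base vertex would leave the first vertex with ID 0, while adding first_vertex gives the expected ID 10.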
[Mesa-dev] [PATCH v3 1/8] i965: allocate a SGVS element when VertexID or InstanceID are read
From: Iago Toral Quiroga

Although on gen8+ platforms we can in theory use 3DSTATE_VF_SGVS to put
these beyond the last vertex element it seems that we still need to
allocate the SGVS element, otherwise we have observed cases where we end
up reading garbage. Specifically, the CTS test mentioned below was flaky
with a fail rate of ~1% on some gen9+ platforms caused by reading garbage
for the gl_InstanceID value. The flakiness goes away as soon as we start
allocating the SGVS element.

v2:
  - Do this for gen8+, not just gen9+, and pull the boolean outside the
    #if block (Jason)

Fixes flaky test:
KHR-GL45.vertex_attrib_64bit.limits_test

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104335
Reviewed-by: Jason Ekstrand
---
 src/mesa/drivers/dri/i965/genX_state_upload.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c
index 50ac5bc59ff..d0a980f9730 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -486,26 +486,13 @@ genX(emit_vertices)(struct brw_context *brw)
    } else {
       brw_batch_emit(brw, GENX(3DSTATE_VF_SGVS), vfs);
    }
+#endif
-   /* Normally we don't need an element for the SGVS attribute because the
-    * 3DSTATE_VF_SGVS instruction lets you store the generated attribute in an
-    * element that is past the list in 3DSTATE_VERTEX_ELEMENTS. However if
-    * we're using draw parameters then we need an element for the those
-    * values. Additionally if there is an edge flag element then the SGVS
-    * can't be inserted past that so we need a dummy element to ensure that
-    * the edge flag is the last one.
-    */
-   const bool needs_sgvs_element = (vs_prog_data->uses_basevertex ||
-                                    vs_prog_data->uses_baseinstance ||
-                                    ((vs_prog_data->uses_instanceid ||
-                                      vs_prog_data->uses_vertexid)
-                                     && uses_edge_flag));
-#else
    const bool needs_sgvs_element = (vs_prog_data->uses_basevertex ||
                                     vs_prog_data->uses_baseinstance ||
                                     vs_prog_data->uses_instanceid ||
                                     vs_prog_data->uses_vertexid);
-#endif
+
    unsigned nr_elements =
       brw->vb.nr_enabled + needs_sgvs_element + vs_prog_data->uses_drawid;
--
2.14.1
[Mesa-dev] [PATCH v3 4/8] intel: Handle firstvertex in an identical way to BaseVertex
Until we set gl_BaseVertex to zero for non-indexed draw calls both have
an identical value.

The Vertex Elements are kept like that:
 * VE 1:
 * VE 2:
---
 src/intel/compiler/brw_nir.c                  |  3 +++
 src/intel/compiler/brw_vec4.cpp               |  1 +
 src/mesa/drivers/dri/i965/brw_context.h       |  8 ++--
 src/mesa/drivers/dri/i965/brw_draw.c          | 14 +
 src/mesa/drivers/dri/i965/brw_draw_upload.c   |  7 +--
 src/mesa/drivers/dri/i965/genX_state_upload.c | 11 +++
 6 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index dbddef0d04d..34b1e44adf0 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -239,6 +239,7 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
    const bool has_sgvs =
       nir->info.system_values_read &
       (BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX) |
+       BITFIELD64_BIT(SYSTEM_VALUE_FIRST_VERTEX) |
        BITFIELD64_BIT(SYSTEM_VALUE_BASE_INSTANCE) |
        BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) |
        BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID));
@@ -261,6 +262,7 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
          switch (intrin->intrinsic) {
          case nir_intrinsic_load_base_vertex:
+         case nir_intrinsic_load_first_vertex:
          case nir_intrinsic_load_base_instance:
          case nir_intrinsic_load_vertex_id_zero_base:
          case nir_intrinsic_load_instance_id:
@@ -278,6 +280,7 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
             nir_intrinsic_set_base(load, num_inputs);
             switch (intrin->intrinsic) {
             case nir_intrinsic_load_base_vertex:
+            case nir_intrinsic_load_first_vertex:
                nir_intrinsic_set_component(load, 0);
                break;
             case nir_intrinsic_load_base_instance:
diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp
index 36e17d77d47..06c79630119 100644
--- a/src/intel/compiler/brw_vec4.cpp
+++ b/src/intel/compiler/brw_vec4.cpp
@@ -2788,6 +2788,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data,
     */
    if (shader->info.system_values_read &
        (BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX) |
+        BITFIELD64_BIT(SYSTEM_VALUE_FIRST_VERTEX) |
         BITFIELD64_BIT(SYSTEM_VALUE_BASE_INSTANCE) |
         BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) |
         BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID))) {
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 9046acd175c..0a20706567e 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -881,8 +881,12 @@ struct brw_context
    struct {
       struct {
-         /** The value of gl_BaseVertex for the current _mesa_prim. */
-         int gl_basevertex;
+         /**
+          * Either the value of gl_BaseVertex for indexed draw calls or the
+          * value of the argument for non-indexed draw calls for the
+          * current _mesa_prim.
+          */
+         int firstvertex;
          /** The value of gl_BaseInstance for the current _mesa_prim. */
          int gl_baseinstance;
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 50cf8b12c74..a1a5161fd35 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -816,25 +816,29 @@ brw_draw_single_prim(struct gl_context *ctx,
     * always flag if the shader uses one of the values. For direct draws,
     * we only flag if the values change.
     */
-   const int new_basevertex =
+   const int new_firstvertex =
       prim->indexed ? prim->basevertex : prim->start;
    const int new_baseinstance = prim->base_instance;
    const struct brw_vs_prog_data *vs_prog_data =
       brw_vs_prog_data(brw->vs.base.prog_data);
    if (prim_id > 0) {
-      const bool uses_draw_parameters =
+      const bool uses_firstvertex =
          vs_prog_data->uses_basevertex ||
+         vs_prog_data->uses_firstvertex;
+
+      const bool uses_draw_parameters =
+         uses_firstvertex ||
          vs_prog_data->uses_baseinstance;
       if ((uses_draw_parameters && prim->is_indirect) ||
-          (vs_prog_data->uses_basevertex &&
-           brw->draw.params.gl_basevertex != new_basevertex) ||
+          (uses_firstvertex &&
+           brw->draw.params.firstvertex != new_firstvertex) ||
           (vs_prog_data->uses_baseinstance &&
            brw->draw.params.gl_baseinstance != new_baseinstance))
          brw->ctx.NewDriverState |= BRW_NEW_VERTICES;
    }
-   brw->draw.params.gl_basevertex = new_basevertex;
+   brw->draw.params.firstvertex = new_firstvertex;
    brw->draw.params.gl_baseinstance = new_baseinstance;
    brw_bo_unreference(brw->draw.draw_params_bo);
diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c
Re: [Mesa-dev] [PATCH] i965/fs: Reset the register file to VGRF in lower_integer_multiplication
On Fri, Dec 15, 2017 at 5:12 PM, Jason Ekstrand wrote:
> 18fde36ced4279f2577097a1a7d31b55f2f5f141 changed the way temporary
> registers were allocated in lower_integer_multiplication so that we
> allocate regs_written(inst) space and keep the stride of the original
> destination register. This was to ensure that any MUL which originally
> followed the CHV/BXT integer multiply regioning restrictions would
> continue to follow those restrictions even after lowering. This works
> fine except that I forgot to reset the register file to VGRF so, even
> though they were assigned a number from alloc.allocate(), they had the
> wrong register file. This caused some GLES 3.0 CTS tests to start
> failing on Sandy Bridge due to attempted reads from the MRF:
>
> ES3-CTS.functional.shaders.precision.int.highp_mul_fragment.snbm64
> ES3-CTS.functional.shaders.precision.int.mediump_mul_fragment.snbm64
> ES3-CTS.functional.shaders.precision.int.lowp_mul_fragment.snbm64
> ES3-CTS.functional.shaders.precision.uint.highp_mul_fragment.snbm64
> ES3-CTS.functional.shaders.precision.uint.mediump_mul_fragment.snbm64
> ES3-CTS.functional.shaders.precision.uint.lowp_mul_fragment.snbm64
>
> This commit remedies this problem by, instead of copying inst->dst and
> overwriting nr, just make a new register and set the region to match
> inst->dst.
>
> Cc: Matt Turner
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103626
> Fixes: 18fde36ced4279f2577097a1a7d31b55f2f5f141
> Cc: "17.3"

Thanks. Sorry this got lost. Looks like it was sent the day I started
vacation.

Reviewed-by: Matt Turner
Re: [Mesa-dev] [PATCH] mesa: add missing RGB9_E5 format in _mesa_base_fbo_format
On 25.01.2018 at 17:56, Roland Scheidegger wrote:
> On 25.01.2018 at 16:30, Michel Dänzer wrote:
>> On 2018-01-24 05:38 PM, Juan A. Suarez Romero wrote:
>>> This fixes KHR-GL45.internalformat.renderbuffer.rgb9_e5.
>>> ---
>>> src/mesa/main/fbobject.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
>>> index d23916d1ad7..c72204e11a0 100644
>>> --- a/src/mesa/main/fbobject.c
>>> +++ b/src/mesa/main/fbobject.c
>>> @@ -1976,6 +1976,9 @@ _mesa_base_fbo_format(const struct gl_context *ctx, GLenum internalFormat)
>>>               ctx->Extensions.ARB_texture_float) ||
>>>              _mesa_is_gles3(ctx) /* EXT_color_buffer_float */ )
>>>           ? GL_RGBA : 0;
>>> +   case GL_RGB9_E5:
>>> +      return (_mesa_is_desktop_gl(ctx) && ctx->Extensions.EXT_texture_shared_exponent)
>>> +         ? GL_RGB: 0;
>>>     case GL_ALPHA16F_ARB:
>>>     case GL_ALPHA32F_ARB:
>>>        return ctx->API == API_OPENGL_COMPAT &&
>>>
>>
>> Unfortunately, this broke the "spec@arb_internalformat_query2@samples
>> and num_sample_counts pname checks" piglit tests with radeonsi and
>> llvmpipe, see below.
>>
>> Any idea what might need to be done in Gallium to fix this?
>>
>>
>> 32 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 32 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 32 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 64 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 64 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 64 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> PIGLIT: {"subtest": {"GL_NUM_SAMPLE_COUNTS" : "fail"}}
>> 32 bit failing case: pname = GL_SAMPLES, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 32 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 32 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 64 bit failing case: pname = GL_SAMPLES, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 64 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> 64 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
>> PIGLIT: {"subtest": {"GL_SAMPLES" : "fail"}}
>> PIGLIT: {"result": "fail" }
>>
>>
>
> Purely coincidentally, I was trying to clean up the formatquery code
> recently (should help some failures with r600 too), and I think these
> cleanups would fix it.
> Basically outright say "no" to target/pname combinations which don't
> make sense rather than trying to find a format suitable for another
> target and then asking the driver for the nonsense combination, plus
> some other small bits like not validating things again (sometimes, a
> third time...).
> Albeit it will cause some breakage with the piglit test, which I believe
> is a test error, but that might be open for debate...
> (For TEXTURE_BUFFER and the internalformat size/type queries, do you
> return valid values or unsupported? The problem here is ARB_tbo says you
> can't get these values via the equivalent GetTexLevelParameter queries,
> whereas with GL 3.1 you can. And internalformat_query2 says it returns
> "the same information" as GetTexLevelParameter, albeit it's not entirely
> true in any case since the equivalent of the internalformat stencil type
> doesn't even exist. My stance would be that valid values should be
> reported even without GL 3.1, but the piglit test thinks differently.)
>

Err, actually this won't fix it I suppose - because rgb9e5 now is a
valid fbo format. Was that commit really correct? It does not make
sense to me, rgb9e5 cannot be a fbo/renderable format. Or was this just
working around issues in formatquery.c (which I try to address with this
patch)?

Roland
Re: [Mesa-dev] [PATCH] st/mesa: expand glDrawPixels cache to handle multiple images
2018-01-25 18:02 GMT+01:00 Roland Scheidegger:
> On 25.01.2018 at 16:55, Brian Paul wrote:
> > The newest version of WSI Fusion makes several glDrawPixels calls
> > per frame. By caching more than one image, we get better performance
> > when panning/zomming the map.
> Still zooming :-)
>
> >
> > v2: move pixel unpack param checking out of cache search loop, per Roland
> > ---
> > src/mesa/state_tracker/st_cb_drawpixels.c | 196 +-
> > src/mesa/state_tracker/st_context.c | 4 -
> > src/mesa/state_tracker/st_context.h | 22 +++-
> > 3 files changed, 154 insertions(+), 68 deletions(-)
> >
> > diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
> > index 1d88976..e63f6f7 100644
> > --- a/src/mesa/state_tracker/st_cb_drawpixels.c
> > +++ b/src/mesa/state_tracker/st_cb_drawpixels.c
> > @@ -375,6 +375,131 @@ alloc_texture(struct st_context *st, GLsizei width, GLsizei height,
> >
> >
> > /**
> > + * Search the cache for an image which matches the given parameters.
> > + * \return pipe_resource pointer if found, NULL if not found.
> > + */
> > +static struct pipe_resource *
> > +search_drawpixels_cache(struct st_context *st,
> > +                        GLsizei width, GLsizei height,
> > +                        GLenum format, GLenum type,
> > +                        const struct gl_pixelstore_attrib *unpack,
> > +                        const void *pixels)
> > +{
> > +   struct pipe_resource *pt = NULL;
> > +   const GLint bpp = _mesa_bytes_per_pixel(format, type);
> > +   unsigned i;
> > +
> > +   if ((unpack->RowLength != 0 && unpack->RowLength != width) ||
> > +       unpack->SkipPixels != 0 ||
> > +       unpack->SkipRows != 0 ||
> > +       unpack->SwapBytes) {
> > +      /* we don't allow non-default pixel unpacking values */
> > +      return NULL;
> > +   }
> > +
> > +   /* Search cache entries for a match */
> > +   for (i = 0; i < ARRAY_SIZE(st->drawpix_cache.entries); i++) {
> > +      struct drawpix_cache_entry *entry = &st->drawpix_cache.entries[i];
> > +
> > +      if (width == entry->width &&
> > +          height == entry->height &&
> > +          format == entry->format &&
> > +          type == entry->type &&
> > +          pixels == entry->user_pointer &&
> > +          !_mesa_is_bufferobj(unpack->BufferObj) &&
> Move this line as well?
>
> > +          entry->image) {
> > +         assert(entry->texture);
> > +
> > +         /* check if the pixel data is the same */
> > +         if (memcmp(pixels, entry->image, width * height * bpp) == 0) {
> > +            /* Success - found a cache match */
> > +            pipe_resource_reference(&pt, entry->texture);
> > +            /* refcount of returned texture should be at least two here. One
> > +             * reference for the cache to hold on to, one for the caller (which
> > +             * it will release), and possibly more held by the driver.
> > +             */
> > +            assert(pt->reference.count >= 2);
> > +
> > +            /* update the age of this entry */
> > +            entry->age = ++st->drawpix_cache.age;
> > +
> > +            return pt;
> > +         }
> > +      }
> > +   }
> > +
> > +   /* no cache match found */
> > +   return NULL;
> > +}
> > +
> > +
> > +/**
> > + * Find the oldest entry in the glDrawPixels cache. We'll replace this
> > + * one when we need to store a new image.
> > + */
> > +static struct drawpix_cache_entry *
> > +find_oldest_drawpixels_cache_entry(struct st_context *st)
> > +{
> > +   unsigned oldest_age = ~0u, oldest_index = ~0u;
> > +   unsigned i;
> > +
> > +   /* Find entry with oldest (lowest) age */
> > +   for (i = 0; i < ARRAY_SIZE(st->drawpix_cache.entries); i++) {
> > +      const struct drawpix_cache_entry *entry =
> > +         &st->drawpix_cache.entries[i];
> > +      if (entry->age < oldest_age) {
> > +         oldest_age = entry->age;
> > +         oldest_index = i;
> > +      }
> > +   }
> > +
> > +   assert(oldest_age != ~0u);
> Ok, if it takes 2 years to hit it, that's probably ok...
>
> Reviewed-by: Roland Scheidegger
>

Note that at 13000fps (maximum I could achieve with glxgears) it would
take less than 4 days. Though I guess if you run glDrawPixels each frame
you couldn't achieve such fps value.

Gustaw Smolarczyk

> > +   assert(oldest_index != ~0u);
> > +
> > +   return &st->drawpix_cache.entries[oldest_index];
> > +}
> > +
> > +
> > +/**
> > + * Try to save the given glDrawPixels image in the cache.
> > + */
> > +static void
> > +cache_drawpixels_image(struct st_context *st,
> > +                       GLsizei width, GLsizei height,
> > +                       GLenum format, GLenum type,
> > +                       const struct gl_pixelstore_attrib *unpack,
> > +                       const void *pixels,
> > +                       struct pipe_resource *pt)
> > +{
> > +   if ((unpack->RowLength == 0 || unpack->RowLength == width) &&
> >
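As a rough check of the two estimates in this thread, here is a small worked calculation. It assumes the age counter is a 32-bit unsigned value bumped once per frame; the helper name is made up for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Days until a 32-bit counter that starts at 0 reaches ~0u when it is
 * incremented `per_second` times each second. Hypothetical helper. */
static double
days_until_counter_saturates(double per_second)
{
   const double max_value = (double)UINT32_MAX;   /* 4294967295 */
   const double seconds_per_day = 86400.0;

   return max_value / per_second / seconds_per_day;
}
```

At 13000 increments per second this comes to roughly 3.8 days, in line with "less than 4 days". Under the same assumption, a horizon of about two years corresponds to roughly 68 increments per second.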
Re: [Mesa-dev] [PATCH] configure.ac: correct driglx-direct help text
On 20 December 2017 at 17:34, Emil Velikov wrote:
> The default was toggled a while back, but the text wasn't updated.

Reviewed-by: Daniel Stone
Re: [Mesa-dev] [PATCH 1/5] configure.ac: add Wundef to the build flags
On 24 November 2017 at 18:26, Eric Engestrom wrote:
> On Friday, 2017-11-24 18:14:41 +, Emil Velikov wrote:
>> On 24 November 2017 at 14:32, Eric Engestrom wrote:
>> > On Friday, 2017-11-24 14:25:02 +, Emil Velikov wrote:
>> >> From: Emil Velikov
>> >>
>> >> From the manual:
>> >> Warn if an undefined identifier is evaluated in an `#if' directive.
>> >>
>> >> This is something we want to know and address. Otherwise we can end up
>> >> with subtle issues, in the less commonly used codepaths.
>> >>
>> >> Note: this will trigger a lot of extra warnings, with ~60 of those being
>> >> unique. Once all those are resolved we'd want to promote the warning to
>> >> an error.
>> >
>> > Yes please; series is
>> > Reviewed-by: Eric Engestrom
>> >
>> Thanks. I think we should hold these off, until some (say 1/3?) of the
>> issues are resolved.
>> Otherwise devs might get a bit annoyed by the massive amount of warnings.
>
> Agreed. The series I just sent fixes 99% of the warnings already,
> because c99_{compat,math}.h is included everywhere.
>
> Once that series and your gtest patches land, I think it should be good
> enough, and individual devs can take care of the rest.
>
> The next biggest offender is Nouveau, and I haven't had a proper look
> but at a glance I think it looked like it was probably just a few places
> generating many warnings.

FTR, I haven't forgotten about this one.

Upstream gtest has not replied to the series that I posted nearly
2 months ago [1]. I'd love to address any feedback and flow things
naturally into Mesa.

Alternatively we could pull it locally, although next time we update
gtest things might be fiddly.

-Emil

[1] https://github.com/google/googletest/pull/1335
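A minimal sketch of the kind of subtle issue the quoted manual text refers to. The macro HYPOTHETICAL_HAVE_FAST_PATH is invented for this example and is deliberately never defined. Without -Wundef the first `#if` compiles silently and takes the `#else` branch; with -Wundef the compiler points at it. Guarding the test with `defined()` is one common way to keep such a check explicit.

```c
#include <assert.h>

/* HYPOTHETICAL_HAVE_FAST_PATH is a made-up macro that is intentionally
 * never defined anywhere in this file. */

static int
fast_path_unchecked(void)
{
#if HYPOTHETICAL_HAVE_FAST_PATH /* undefined, evaluates to 0; -Wundef flags this */
   return 1;
#else
   return 0;
#endif
}

static int
fast_path_checked(void)
{
   /* The right-hand operand is only evaluated when the macro exists. */
#if defined(HYPOTHETICAL_HAVE_FAST_PATH) && HYPOTHETICAL_HAVE_FAST_PATH
   return 1;
#else
   return 0;
#endif
}
```

Both functions return 0 here. The difference is that a misspelled or missing configuration macro in the first form disables a code path without any diagnostic unless the warning is enabled.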
Re: [Mesa-dev] [PATCH 3/3] mesa: whitespace fixes in varray.h
Looks good. For the series,
Reviewed-by: Neha Bhende

Regards,
Neha

From: Brian Paul
Sent: Thursday, January 25, 2018 8:48:00 AM
To: mesa-dev@lists.freedesktop.org
Cc: Neha Bhende; Charmaine Lee; Roland Scheidegger
Subject: [PATCH 3/3] mesa: whitespace fixes in varray.h

---
 src/mesa/main/varray.h | 55 ++
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/src/mesa/main/varray.h b/src/mesa/main/varray.h
index 03d81d0..93f2f47 100644
--- a/src/mesa/main/varray.h
+++ b/src/mesa/main/varray.h
@@ -44,9 +44,10 @@ _mesa_vertex_attrib_address(const struct gl_array_attributes *array,
    if (_mesa_is_bufferobj(binding->BufferObj))
       return (const GLubyte *) (binding->Offset + array->RelativeOffset);
    else
-      return array->Ptr;
+      return array->Ptr;
 }
+
 /**
  * Sets the fields in a gl_vertex_array to values derived from a
  * gl_array_attributes and a gl_vertex_buffer_binding.
@@ -70,6 +71,7 @@ _mesa_update_client_array(struct gl_context *ctx,
    _mesa_reference_buffer_object(ctx, &dst->BufferObj, binding->BufferObj);
 }
+
 static inline bool
 _mesa_attr_zero_aliases_vertex(const struct gl_context *ctx)
 {
@@ -190,7 +192,7 @@ _mesa_SecondaryColorPointer_no_error(GLint size, GLenum type, GLsizei stride,
                                      const GLvoid *ptr);
 extern void GLAPIENTRY
 _mesa_SecondaryColorPointer(GLint size, GLenum type,
-                            GLsizei stride, const GLvoid *ptr);
+                            GLsizei stride, const GLvoid *ptr);
 extern void GLAPIENTRY
@@ -206,8 +208,8 @@ _mesa_VertexAttribPointer_no_error(GLuint index, GLint size, GLenum type,
                                    const GLvoid *pointer);
 extern void GLAPIENTRY
 _mesa_VertexAttribPointer(GLuint index, GLint size, GLenum type,
-                          GLboolean normalized, GLsizei stride,
-                          const GLvoid *pointer);
+                          GLboolean normalized, GLsizei stride,
+                          const GLvoid *pointer);
 void GLAPIENTRY
 _mesa_VertexAttribIPointer_no_error(GLuint index, GLint size, GLenum type,
@@ -295,35 +297,35 @@ _mesa_InterleavedArrays(GLenum format, GLsizei stride, const GLvoid *pointer);
 extern void GLAPIENTRY
-_mesa_MultiDrawArrays( GLenum mode, const GLint *first,
-                       const GLsizei *count, GLsizei primcount );
+_mesa_MultiDrawArrays(GLenum mode, const GLint *first,
+                      const GLsizei *count, GLsizei primcount);
 extern void GLAPIENTRY
-_mesa_MultiDrawElementsEXT( GLenum mode, const GLsizei *count, GLenum type,
-                            const GLvoid **indices, GLsizei primcount );
+_mesa_MultiDrawElementsEXT(GLenum mode, const GLsizei *count, GLenum type,
+                           const GLvoid **indices, GLsizei primcount);
 extern void GLAPIENTRY
-_mesa_MultiDrawElementsBaseVertex( GLenum mode,
-                                   const GLsizei *count, GLenum type,
-                                   const GLvoid **indices, GLsizei primcount,
-                                   const GLint *basevertex);
+_mesa_MultiDrawElementsBaseVertex(GLenum mode,
+                                  const GLsizei *count, GLenum type,
+                                  const GLvoid **indices, GLsizei primcount,
+                                  const GLint *basevertex);
 extern void GLAPIENTRY
-_mesa_MultiModeDrawArraysIBM( const GLenum * mode, const GLint * first,
-                              const GLsizei * count,
-                              GLsizei primcount, GLint modestride );
+_mesa_MultiModeDrawArraysIBM(const GLenum * mode, const GLint * first,
+                             const GLsizei * count,
+                             GLsizei primcount, GLint modestride );
 extern void GLAPIENTRY
-_mesa_MultiModeDrawElementsIBM( const GLenum * mode, const GLsizei * count,
-                                GLenum type, const GLvoid * const * indices,
-                                GLsizei primcount, GLint modestride );
+_mesa_MultiModeDrawElementsIBM(const GLenum * mode, const GLsizei * count,
+                               GLenum type, const GLvoid * const * indices,
+                               GLsizei primcount, GLint modestride );
 extern void GLAPIENTRY
 _mesa_LockArraysEXT(GLint first, GLsizei count);
 extern void GLAPIENTRY
-_mesa_UnlockArraysEXT( void );
+_mesa_UnlockArraysEXT(void);
 extern void GLAPIENTRY
@@ -343,13 +345,13 @@ _mesa_DrawRangeElements(GLenum mode, GLuint start, GLuint end, GLsizei count,
 extern void GLAPIENTRY
 _mesa_DrawElementsBaseVertex(GLenum mode, GLsizei count, GLenum type,
-                             const GLvoid *indices, GLint basevertex);
+                             const GLvoid *indices, GLint basevertex);
 extern void GLAPIENTRY
 _mesa_DrawRangeElementsBaseVertex(GLenum mode, GLuint start, GLuint end,
Re: [Mesa-dev] [PATCH] configure.ac: correct driglx-direct help text
On 20 December 2017 at 17:34, Emil Velikov wrote:
> From: Emil Velikov
>
> The default was toggled a while back, but the text wasn't updated.
>
> Fixes: bd526ec9e1b ("configure: Always default to
> --enable-driglx-direct")
> Cc: Jon TURNEY
> Signed-off-by: Emil Velikov
> ---
> configure.ac | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure.ac b/configure.ac
> index 79f275d3914..cadbe4bce3c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1597,7 +1597,7 @@ fi
> AC_ARG_ENABLE([driglx-direct],
> [AS_HELP_STRING([--disable-driglx-direct],
> [disable direct rendering in GLX and EGL for DRI \
> -@<:@default=auto@:>@])],
> +@<:@default=enabled@:>@])],
> [driglx_direct="$enableval"],
> [driglx_direct="yes"])

Humble ping, anyone?

Thanks
Emil
Re: [Mesa-dev] [PATCH 2/2] radv: fix RADV_DEBUG=syncshaders on GFX9
Reviewed-by: Bas Nieuwenhuizen

On Thu, Jan 25, 2018 at 3:46 PM, Samuel Pitoiset wrote:
> Signed-off-by: Samuel Pitoiset
> ---
> src/amd/vulkan/radv_cmd_buffer.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
> index ba5fd92f2a1..b694174de68 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -433,13 +433,22 @@ radv_cmd_buffer_after_draw(struct radv_cmd_buffer *cmd_buffer,
>                            enum radv_cmd_flush_bits flags)
>  {
>  	if (cmd_buffer->device->instance->debug_flags & RADV_DEBUG_SYNC_SHADERS) {
> +		uint32_t *ptr = NULL;
> +		uint64_t va = 0;
> +
>  		assert(flags & (RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
>  				RADV_CMD_FLAG_CS_PARTIAL_FLUSH));
>
> +		if (cmd_buffer->device->physical_device->rad_info.chip_class == GFX9) {
> +			va = radv_buffer_get_va(cmd_buffer->gfx9_fence_bo) +
> +			     cmd_buffer->gfx9_fence_offset;
> +			ptr = &cmd_buffer->gfx9_fence_idx;
> +		}
> +
>  		/* Force wait for graphics or compute engines to be idle. */
>  		si_cs_emit_cache_flush(cmd_buffer->cs, false,
>  				       cmd_buffer->device->physical_device->rad_info.chip_class,
> -				       NULL, 0,
> +				       ptr, va,
>  				       radv_cmd_buffer_uses_mec(cmd_buffer),
>  				       flags);
>  	}
> --
> 2.16.1
Re: [Mesa-dev] [PATCH 1/2] radv: fix a GPU hang with RADV_DEBUG=syncshaders
Reviewed-by: Bas Nieuwenhuizen

On Thu, Jan 25, 2018 at 3:46 PM, Samuel Pitoiset wrote:
> The GPU hangs when the driver forces a PS_PARTIAL_FLUSH after
> a dispatch call (and vice versa for graphics). Something has
> changed in the kernel driver because it used to work.
>
> Signed-off-by: Samuel Pitoiset
> ---
> src/amd/vulkan/radv_cmd_buffer.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
> index 6d512c6070a..ba5fd92f2a1 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -429,15 +429,14 @@ void radv_cmd_buffer_trace_emit(struct radv_cmd_buffer *cmd_buffer)
>  }
>
>  static void
> -radv_cmd_buffer_after_draw(struct radv_cmd_buffer *cmd_buffer)
> +radv_cmd_buffer_after_draw(struct radv_cmd_buffer *cmd_buffer,
> +                           enum radv_cmd_flush_bits flags)
>  {
>  	if (cmd_buffer->device->instance->debug_flags & RADV_DEBUG_SYNC_SHADERS) {
> -		enum radv_cmd_flush_bits flags;
> -
> -		/* Force wait for graphics/compute engines to be idle. */
> -		flags = RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
> -			RADV_CMD_FLAG_CS_PARTIAL_FLUSH;
> +		assert(flags & (RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
> +				RADV_CMD_FLAG_CS_PARTIAL_FLUSH));
>
> +		/* Force wait for graphics or compute engines to be idle. */
>  		si_cs_emit_cache_flush(cmd_buffer->cs, false,
>  				       cmd_buffer->device->physical_device->rad_info.chip_class,
>  				       NULL, 0,
> @@ -3501,7 +3500,7 @@ radv_draw(struct radv_cmd_buffer *cmd_buffer,
>  	}
>
>  	assert(cmd_buffer->cs->cdw <= cdw_max);
> -	radv_cmd_buffer_after_draw(cmd_buffer);
> +	radv_cmd_buffer_after_draw(cmd_buffer, RADV_CMD_FLAG_PS_PARTIAL_FLUSH);
>  }
>
>  void radv_CmdDraw(
> @@ -3821,7 +3820,7 @@ radv_dispatch(struct radv_cmd_buffer *cmd_buffer,
>  		radv_emit_dispatch_packets(cmd_buffer, info);
>  	}
>
> -	radv_cmd_buffer_after_draw(cmd_buffer);
> +	radv_cmd_buffer_after_draw(cmd_buffer, RADV_CMD_FLAG_CS_PARTIAL_FLUSH);
>  }
>
>  void radv_CmdDispatch(
> --
> 2.16.1
Re: [Mesa-dev] [PATCH] st/mesa: expand glDrawPixels cache to handle multiple images
On 25.01.2018 at 16:55, Brian Paul wrote:
> The newest version of WSI Fusion makes several glDrawPixels calls
> per frame.  By caching more than one image, we get better performance
> when panning/zomming the map.

Still zooming :-)

>
> v2: move pixel unpack param checking out of cache search loop, per Roland
> ---
>  src/mesa/state_tracker/st_cb_drawpixels.c | 196 +-
>  src/mesa/state_tracker/st_context.c       |   4 -
>  src/mesa/state_tracker/st_context.h       |  22 +++-
>  3 files changed, 154 insertions(+), 68 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
> index 1d88976..e63f6f7 100644
> --- a/src/mesa/state_tracker/st_cb_drawpixels.c
> +++ b/src/mesa/state_tracker/st_cb_drawpixels.c
> @@ -375,6 +375,131 @@ alloc_texture(struct st_context *st, GLsizei width, GLsizei height,
>
>
>  /**
> + * Search the cache for an image which matches the given parameters.
> + * \return  pipe_resource pointer if found, NULL if not found.
> + */
> +static struct pipe_resource *
> +search_drawpixels_cache(struct st_context *st,
> +                        GLsizei width, GLsizei height,
> +                        GLenum format, GLenum type,
> +                        const struct gl_pixelstore_attrib *unpack,
> +                        const void *pixels)
> +{
> +   struct pipe_resource *pt = NULL;
> +   const GLint bpp = _mesa_bytes_per_pixel(format, type);
> +   unsigned i;
> +
> +   if ((unpack->RowLength != 0 && unpack->RowLength != width) ||
> +       unpack->SkipPixels != 0 ||
> +       unpack->SkipRows != 0 ||
> +       unpack->SwapBytes) {
> +      /* we don't allow non-default pixel unpacking values */
> +      return NULL;
> +   }
> +
> +   /* Search cache entries for a match */
> +   for (i = 0; i < ARRAY_SIZE(st->drawpix_cache.entries); i++) {
> +      struct drawpix_cache_entry *entry = &st->drawpix_cache.entries[i];
> +
> +      if (width == entry->width &&
> +          height == entry->height &&
> +          format == entry->format &&
> +          type == entry->type &&
> +          pixels == entry->user_pointer &&
> +          !_mesa_is_bufferobj(unpack->BufferObj) &&

Move this line as well?

> +          entry->image) {
> +         assert(entry->texture);
> +
> +         /* check if the pixel data is the same */
> +         if (memcmp(pixels, entry->image, width * height * bpp) == 0) {
> +            /* Success - found a cache match */
> +            pipe_resource_reference(&pt, entry->texture);
> +            /* refcount of returned texture should be at least two here.  One
> +             * reference for the cache to hold on to, one for the caller (which
> +             * it will release), and possibly more held by the driver.
> +             */
> +            assert(pt->reference.count >= 2);
> +
> +            /* update the age of this entry */
> +            entry->age = ++st->drawpix_cache.age;
> +
> +            return pt;
> +         }
> +      }
> +   }
> +
> +   /* no cache match found */
> +   return NULL;
> +}
> +
> +
> +/**
> + * Find the oldest entry in the glDrawPixels cache.  We'll replace this
> + * one when we need to store a new image.
> + */
> +static struct drawpix_cache_entry *
> +find_oldest_drawpixels_cache_entry(struct st_context *st)
> +{
> +   unsigned oldest_age = ~0u, oldest_index = ~0u;
> +   unsigned i;
> +
> +   /* Find entry with oldest (lowest) age */
> +   for (i = 0; i < ARRAY_SIZE(st->drawpix_cache.entries); i++) {
> +      const struct drawpix_cache_entry *entry = &st->drawpix_cache.entries[i];
> +      if (entry->age < oldest_age) {
> +         oldest_age = entry->age;
> +         oldest_index = i;
> +      }
> +   }
> +
> +   assert(oldest_age != ~0u);

Ok, if it takes 2 years to hit it, that's probably ok...

Reviewed-by: Roland Scheidegger

> +   assert(oldest_index != ~0u);
> +
> +   return &st->drawpix_cache.entries[oldest_index];
> +}
> +
> +
> +/**
> + * Try to save the given glDrawPixels image in the cache.
> + */
> +static void
> +cache_drawpixels_image(struct st_context *st,
> +                       GLsizei width, GLsizei height,
> +                       GLenum format, GLenum type,
> +                       const struct gl_pixelstore_attrib *unpack,
> +                       const void *pixels,
> +                       struct pipe_resource *pt)
> +{
> +   if ((unpack->RowLength == 0 || unpack->RowLength == width) &&
> +       unpack->SkipPixels == 0 &&
> +       unpack->SkipRows == 0) {
> +      const GLint bpp = _mesa_bytes_per_pixel(format, type);
> +      struct drawpix_cache_entry *entry =
> +         find_oldest_drawpixels_cache_entry(st);
> +      assert(entry);
> +      entry->width = width;
> +      entry->height = height;
> +      entry->format = format;
> +      entry->type = type;
> +      entry->user_pointer = pixels;
> +      free(entry->image);
> +      entry->image = malloc(width * height * bpp);
> +      if
[Mesa-dev] [Bug 104749] rasterizer/jitter/JitManager.cpp:252:91: error: no matching function for call to ‘llvm::DIBuilder::createBasicType(const char [8], int, llvm::dwarf::TypeKind)’
https://bugs.freedesktop.org/show_bug.cgi?id=104749

Emil Velikov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #5 from Emil Velikov ---
Should be resolved as of

commit 0e879aad2fd1dac102c13d680edf455aa068d5df
Author: George Kyriazis
Date:   Tue Jan 23 16:12:42 2018 -0600

    swr/rast: support llvm 3.9 type declarations

    LLVM 3.9 was not taken into account in initial check-in.

    Fixes: 01ab218bbc ("swr/rast: Initial work for debugging support.")
    cc: mesa-sta...@lists.freedesktop.org
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104749

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] mesa: whitespace fixes in varray.h
For the series, Reviewed-by: Charmaine LeeFrom: Brian Paul Sent: Thursday, January 25, 2018 8:48:00 AM To: mesa-dev@lists.freedesktop.org Cc: Neha Bhende; Charmaine Lee; Roland Scheidegger Subject: [PATCH 3/3] mesa: whitespace fixes in varray.h --- src/mesa/main/varray.h | 55 ++ 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/src/mesa/main/varray.h b/src/mesa/main/varray.h index 03d81d0..93f2f47 100644 --- a/src/mesa/main/varray.h +++ b/src/mesa/main/varray.h @@ -44,9 +44,10 @@ _mesa_vertex_attrib_address(const struct gl_array_attributes *array, if (_mesa_is_bufferobj(binding->BufferObj)) return (const GLubyte *) (binding->Offset + array->RelativeOffset); else - return array->Ptr; + return array->Ptr; } + /** * Sets the fields in a gl_vertex_array to values derived from a * gl_array_attributes and a gl_vertex_buffer_binding. @@ -70,6 +71,7 @@ _mesa_update_client_array(struct gl_context *ctx, _mesa_reference_buffer_object(ctx, >BufferObj, binding->BufferObj); } + static inline bool _mesa_attr_zero_aliases_vertex(const struct gl_context *ctx) { @@ -190,7 +192,7 @@ _mesa_SecondaryColorPointer_no_error(GLint size, GLenum type, GLsizei stride, const GLvoid *ptr); extern void GLAPIENTRY _mesa_SecondaryColorPointer(GLint size, GLenum type, - GLsizei stride, const GLvoid *ptr); +GLsizei stride, const GLvoid *ptr); extern void GLAPIENTRY @@ -206,8 +208,8 @@ _mesa_VertexAttribPointer_no_error(GLuint index, GLint size, GLenum type, const GLvoid *pointer); extern void GLAPIENTRY _mesa_VertexAttribPointer(GLuint index, GLint size, GLenum type, - GLboolean normalized, GLsizei stride, - const GLvoid *pointer); + GLboolean normalized, GLsizei stride, + const GLvoid *pointer); void GLAPIENTRY _mesa_VertexAttribIPointer_no_error(GLuint index, GLint size, GLenum type, @@ -295,35 +297,35 @@ _mesa_InterleavedArrays(GLenum format, GLsizei stride, const GLvoid *pointer); extern void GLAPIENTRY -_mesa_MultiDrawArrays( GLenum mode, const GLint *first, - const GLsizei *count, 
GLsizei primcount ); +_mesa_MultiDrawArrays(GLenum mode, const GLint *first, + const GLsizei *count, GLsizei primcount); extern void GLAPIENTRY -_mesa_MultiDrawElementsEXT( GLenum mode, const GLsizei *count, GLenum type, -const GLvoid **indices, GLsizei primcount ); +_mesa_MultiDrawElementsEXT(GLenum mode, const GLsizei *count, GLenum type, + const GLvoid **indices, GLsizei primcount); extern void GLAPIENTRY -_mesa_MultiDrawElementsBaseVertex( GLenum mode, - const GLsizei *count, GLenum type, - const GLvoid **indices, GLsizei primcount, - const GLint *basevertex); +_mesa_MultiDrawElementsBaseVertex(GLenum mode, + const GLsizei *count, GLenum type, + const GLvoid **indices, GLsizei primcount, + const GLint *basevertex); extern void GLAPIENTRY -_mesa_MultiModeDrawArraysIBM( const GLenum * mode, const GLint * first, - const GLsizei * count, - GLsizei primcount, GLint modestride ); +_mesa_MultiModeDrawArraysIBM(const GLenum * mode, const GLint * first, + const GLsizei * count, + GLsizei primcount, GLint modestride ); extern void GLAPIENTRY -_mesa_MultiModeDrawElementsIBM( const GLenum * mode, const GLsizei * count, - GLenum type, const GLvoid * const * indices, - GLsizei primcount, GLint modestride ); +_mesa_MultiModeDrawElementsIBM(const GLenum * mode, const GLsizei * count, + GLenum type, const GLvoid * const * indices, + GLsizei primcount, GLint modestride ); extern void GLAPIENTRY _mesa_LockArraysEXT(GLint first, GLsizei count); extern void GLAPIENTRY -_mesa_UnlockArraysEXT( void ); +_mesa_UnlockArraysEXT(void); extern void GLAPIENTRY @@ -343,13 +345,13 @@ _mesa_DrawRangeElements(GLenum mode, GLuint start, GLuint end, GLsizei count, extern void GLAPIENTRY _mesa_DrawElementsBaseVertex(GLenum mode, GLsizei count, GLenum type, -const GLvoid *indices, GLint basevertex); + const GLvoid *indices, GLint basevertex); extern void GLAPIENTRY _mesa_DrawRangeElementsBaseVertex(GLenum mode, GLuint start, GLuint end, -
[Mesa-dev] [Bug 104710] [swrast] piglit draw-batch regression
https://bugs.freedesktop.org/show_bug.cgi?id=104710

--- Comment #1 from Emil Velikov ---
Vinson, I suspect this should be fixed by commit
365a48abddcabf6596c2e34a784d91c8ab929918.

Can you please confirm?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
Re: [Mesa-dev] [PATCH] mesa: add missing RGB9_E5 format in _mesa_base_fbo_format
On 25.01.2018 at 16:30, Michel Dänzer wrote:
> On 2018-01-24 05:38 PM, Juan A. Suarez Romero wrote:
>> This fixes KHR-GL45.internalformat.renderbuffer.rgb9_e5.
>> ---
>>  src/mesa/main/fbobject.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
>> index d23916d1ad7..c72204e11a0 100644
>> --- a/src/mesa/main/fbobject.c
>> +++ b/src/mesa/main/fbobject.c
>> @@ -1976,6 +1976,9 @@ _mesa_base_fbo_format(const struct gl_context *ctx, GLenum internalFormat)
>>                ctx->Extensions.ARB_texture_float) ||
>>               _mesa_is_gles3(ctx) /* EXT_color_buffer_float */ )
>>           ? GL_RGBA : 0;
>> +   case GL_RGB9_E5:
>> +      return (_mesa_is_desktop_gl(ctx) && ctx->Extensions.EXT_texture_shared_exponent)
>> +         ? GL_RGB: 0;
>>     case GL_ALPHA16F_ARB:
>>     case GL_ALPHA32F_ARB:
>>        return ctx->API == API_OPENGL_COMPAT &&
>>
>
> Unfortunately, this broke the "spec@arb_internalformat_query2@samples
> and num_sample_counts pname checks" piglit tests with radeonsi and
> llvmpipe, see below.
>
> Any idea what might need to be done in Gallium to fix this?
>
>
> 32 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 32 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 32 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 64 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 64 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 64 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> PIGLIT: {"subtest": {"GL_NUM_SAMPLE_COUNTS" : "fail"}}
> 32 bit failing case: pname = GL_SAMPLES, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 32 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 32 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 64 bit failing case: pname = GL_SAMPLES, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 64 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> 64 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1
> PIGLIT: {"subtest": {"GL_SAMPLES" : "fail"}}
> PIGLIT: {"result": "fail" }
>
>

Purely coincidentally, I was trying to clean up the formatquery code
recently (which should help with some failures on r600 too), and I think
these cleanups would fix it.
Basically, they outright say "no" to target/pname combinations which
don't make sense, rather than trying to find a format suitable for
another target and then asking the driver about the nonsense
combination, plus some other small bits like not validating things again
(sometimes a third time...).
It will cause some breakage with the piglit test, though, which I
believe is a test error, but that might be open for debate...
(For TEXTURE_BUFFER and the internalformat size/type queries, do you
return valid values or unsupported? The problem here is that ARB_tbo
says you can't get these values via the equivalent GetTexLevelParameter
queries, whereas with GL 3.1 you can. And internalformat_query2 says it
returns "the same information" as GetTexLevelParameter, although that's
not entirely true in any case, since the equivalent of the
internalformat stencil type doesn't even exist. My stance would be that
valid values should be reported even without GL 3.1, but the piglit test
thinks differently.)

Roland

diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c
index 61f798c88f..3f5da272c3 100644
--- a/src/mesa/main/formatquery.c
+++ b/src/mesa/main/formatquery.c
@@ -398,8 +398,6 @@ _is_target_supported(struct gl_context *ctx, GLenum target)
    case GL_TEXTURE_1D:
    case GL_TEXTURE_2D:
    case GL_TEXTURE_3D:
-      if (!_mesa_is_desktop_gl(ctx))
-         return false;
       break;
 
    case GL_TEXTURE_1D_ARRAY:
@@ -560,15 +558,29 @@ _is_internalformat_supported(struct gl_context *ctx, GLenum target,
     * implementation accepts it for any texture specification commands, and
     * -
[Mesa-dev] [PATCH 3/3] mesa: whitespace fixes in varray.h
--- src/mesa/main/varray.h | 55 ++ 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/src/mesa/main/varray.h b/src/mesa/main/varray.h index 03d81d0..93f2f47 100644 --- a/src/mesa/main/varray.h +++ b/src/mesa/main/varray.h @@ -44,9 +44,10 @@ _mesa_vertex_attrib_address(const struct gl_array_attributes *array, if (_mesa_is_bufferobj(binding->BufferObj)) return (const GLubyte *) (binding->Offset + array->RelativeOffset); else - return array->Ptr; + return array->Ptr; } + /** * Sets the fields in a gl_vertex_array to values derived from a * gl_array_attributes and a gl_vertex_buffer_binding. @@ -70,6 +71,7 @@ _mesa_update_client_array(struct gl_context *ctx, _mesa_reference_buffer_object(ctx, >BufferObj, binding->BufferObj); } + static inline bool _mesa_attr_zero_aliases_vertex(const struct gl_context *ctx) { @@ -190,7 +192,7 @@ _mesa_SecondaryColorPointer_no_error(GLint size, GLenum type, GLsizei stride, const GLvoid *ptr); extern void GLAPIENTRY _mesa_SecondaryColorPointer(GLint size, GLenum type, - GLsizei stride, const GLvoid *ptr); +GLsizei stride, const GLvoid *ptr); extern void GLAPIENTRY @@ -206,8 +208,8 @@ _mesa_VertexAttribPointer_no_error(GLuint index, GLint size, GLenum type, const GLvoid *pointer); extern void GLAPIENTRY _mesa_VertexAttribPointer(GLuint index, GLint size, GLenum type, - GLboolean normalized, GLsizei stride, - const GLvoid *pointer); + GLboolean normalized, GLsizei stride, + const GLvoid *pointer); void GLAPIENTRY _mesa_VertexAttribIPointer_no_error(GLuint index, GLint size, GLenum type, @@ -295,35 +297,35 @@ _mesa_InterleavedArrays(GLenum format, GLsizei stride, const GLvoid *pointer); extern void GLAPIENTRY -_mesa_MultiDrawArrays( GLenum mode, const GLint *first, - const GLsizei *count, GLsizei primcount ); +_mesa_MultiDrawArrays(GLenum mode, const GLint *first, + const GLsizei *count, GLsizei primcount); extern void GLAPIENTRY -_mesa_MultiDrawElementsEXT( GLenum mode, const GLsizei *count, GLenum type, -const GLvoid 
**indices, GLsizei primcount ); +_mesa_MultiDrawElementsEXT(GLenum mode, const GLsizei *count, GLenum type, + const GLvoid **indices, GLsizei primcount); extern void GLAPIENTRY -_mesa_MultiDrawElementsBaseVertex( GLenum mode, - const GLsizei *count, GLenum type, - const GLvoid **indices, GLsizei primcount, - const GLint *basevertex); +_mesa_MultiDrawElementsBaseVertex(GLenum mode, + const GLsizei *count, GLenum type, + const GLvoid **indices, GLsizei primcount, + const GLint *basevertex); extern void GLAPIENTRY -_mesa_MultiModeDrawArraysIBM( const GLenum * mode, const GLint * first, - const GLsizei * count, - GLsizei primcount, GLint modestride ); +_mesa_MultiModeDrawArraysIBM(const GLenum * mode, const GLint * first, + const GLsizei * count, + GLsizei primcount, GLint modestride ); extern void GLAPIENTRY -_mesa_MultiModeDrawElementsIBM( const GLenum * mode, const GLsizei * count, - GLenum type, const GLvoid * const * indices, - GLsizei primcount, GLint modestride ); +_mesa_MultiModeDrawElementsIBM(const GLenum * mode, const GLsizei * count, + GLenum type, const GLvoid * const * indices, + GLsizei primcount, GLint modestride ); extern void GLAPIENTRY _mesa_LockArraysEXT(GLint first, GLsizei count); extern void GLAPIENTRY -_mesa_UnlockArraysEXT( void ); +_mesa_UnlockArraysEXT(void); extern void GLAPIENTRY @@ -343,13 +345,13 @@ _mesa_DrawRangeElements(GLenum mode, GLuint start, GLuint end, GLsizei count, extern void GLAPIENTRY _mesa_DrawElementsBaseVertex(GLenum mode, GLsizei count, GLenum type, -const GLvoid *indices, GLint basevertex); + const GLvoid *indices, GLint basevertex); extern void GLAPIENTRY _mesa_DrawRangeElementsBaseVertex(GLenum mode, GLuint start, GLuint end, - GLsizei count, GLenum type, - const GLvoid *indices, - GLint basevertex); + GLsizei count, GLenum type, + const GLvoid *indices, +
[Mesa-dev] [PATCH 2/3] mesa: include mtypes.h in varray.h
We actually use some of the types from mtypes.h so include it directly
instead of relying on indirectly including it via bufferobj.h
---
 src/mesa/main/varray.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/mesa/main/varray.h b/src/mesa/main/varray.h
index 6dcf1db..03d81d0 100644
--- a/src/mesa/main/varray.h
+++ b/src/mesa/main/varray.h
@@ -28,11 +28,9 @@
 #define VARRAY_H
 
 
-#include "glheader.h"
+#include "mtypes.h"
 #include "bufferobj.h"
 
-struct gl_vertex_array;
-struct gl_context;
 
 /**
  * Returns a pointer to the vertex attribute data in a client array,
-- 
2.7.4
[Mesa-dev] [PATCH 1/3] mesa: s/gl_vertex_attrib_array/gl_array_attributes/ in comments
The structure type was renamed some time ago, but some comments
were not updated.
---
 src/mesa/main/arrayobj.c | 2 +-
 src/mesa/main/mtypes.h   | 2 +-
 src/mesa/main/varray.h   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/arrayobj.c b/src/mesa/main/arrayobj.c
index d6dc82d..2810647 100644
--- a/src/mesa/main/arrayobj.c
+++ b/src/mesa/main/arrayobj.c
@@ -307,7 +307,7 @@ _mesa_initialize_vao(struct gl_context *ctx,
 
 
 /**
- * Updates the derived gl_vertex_arrays when a gl_vertex_attrib_array
+ * Updates the derived gl_vertex_arrays when a gl_array_attributes
  * or a gl_vertex_buffer_binding has changed.
  */
 void
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index ce4fd4c..66c56a9 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -1541,7 +1541,7 @@ struct gl_vertex_array_object
    /**
     * Derived vertex attribute arrays
     *
-    * This is a legacy data structure created from gl_vertex_attrib_array and
+    * This is a legacy data structure created from gl_array_attributes and
     * gl_vertex_buffer_binding, for compatibility with existing driver code.
     */
    struct gl_vertex_array _VertexAttrib[VERT_ATTRIB_MAX];
diff --git a/src/mesa/main/varray.h b/src/mesa/main/varray.h
index 8ec6d30..6dcf1db 100644
--- a/src/mesa/main/varray.h
+++ b/src/mesa/main/varray.h
@@ -51,7 +51,7 @@ _mesa_vertex_attrib_address(const struct gl_array_attributes *array,
 
 /**
  * Sets the fields in a gl_vertex_array to values derived from a
- * gl_vertex_attrib_array and a gl_vertex_buffer_binding.
+ * gl_array_attributes and a gl_vertex_buffer_binding.
  */
 static inline void
 _mesa_update_client_array(struct gl_context *ctx,
-- 
2.7.4
Re: [Mesa-dev] [Mesa-stable] [PATCH v2] configure.ac: add missing llvm dependencies to .pc files
> > > +if test "x$enable_glx" == xgallium-xlib; then > > +GL_PC_LIB_PRIV="$GL_PC_LIB_PRIV $LLVM_LIBS" > > +fi > > +if test "x$enable_gallium_osmesa" = xyes; then > > +OSMESA_PC_LIB_PRIV="$OSMESA_PC_LIB_PRIV $LLVM_LIBS" > > +fi > I'm itching to add a comment above these two, since Eric brought it up. > Modulo any objections None from me. Please go ahead. I'll squash it before pushing. > Thanks! Hopefully once my new account goes through I can push on my own. - Chuck ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v1 0/7] Implement common gralloc_handle_t in libdrm
Hey Tomasz,

On 01/24/2018 11:04 AM, Tomasz Figa wrote:
> Hi Robert,
>
> On Wed, Jan 17, 2018 at 2:36 AM, Robert Foss wrote:
>> This series moves {gbm,drm,cros}_gralloc_handle_t struct to libdrm,
>> since at least 4 implementations exist, and share a lot of contents.
>> The idea is to keep the common stuff defined in one place, and libdrm
>> is the common codebase to all of these platforms.
>>
>> Additionally, having this struct defined in libdrm will make it
>> easier for mesa and gralloc implementations to communicate.
>>
>> Robert Foss (7):
>>   android: Move gralloc handle struct to libdrm
>>   android: Add version variable to gralloc_handle_t
>>   android: Mark gralloc_handle_t magic variable as const
>>   android: Remove member name from gralloc_handle_t
>>   android: Change gralloc_handle_t format from Android format to fourcc
>>   android: Change gralloc_handle_t members to be fixed width
>>   android: Add accessor functions for gralloc_handle_t variables
>
> Again, thanks for working on this.
>
> I looked through the series and it seems to be much different from
> what I imagined when writing my previous reply. I must have
> misunderstood your proposal back then.

Ah, glad we caught it before v2 then :)

> Generally, current series doesn't solve Chromium OS main concern of
> locking down the handle struct. Even though accessors are added, they
> are implemented in libdrm and refer to the exact handle layout as per
> the handle struct defined by libdrm.

Solving the problems of multiple projects is the goal, so
reconsidering is probably the way forward.

> What I had in my mind, would be creating a secondary struct,
> consisting only of callbacks, which would be filled in by particular
> gralloc implementation running in the system with its accessors. This
> would completely eliminate any dependencies on the handle struct
> itself from consumers of gralloc buffers.

So just to sketch out the solution, it would look something like this?

typedef struct gralloc_funcs_t {
  uint32_t (*get_fd)(buffer_handle_t handle, uint32_t plane);
  uint64_t (*get_modifier)(buffer_handle_t handle, uint32_t plane);
  uint32_t (*get_offsets)(buffer_handle_t handle, uint32_t plane);
  uint32_t (*get_stride)(buffer_handle_t handle, uint32_t plane);
  ...
} gralloc_funcs_t;

typedef struct gralloc_handle_t {
  native_handle_t base;

  /* api variables */
  const int magic; /* differentiate between allocator impls */
  const int version; /* api version */

  gralloc_funcs_t funcs;
  ...
} gralloc_handle_t;

For reasons of backwards compatibility, gralloc_handle_t should probably
contain whatever gbm_gralloc_handle_t contains now too.
Since we're going to version this struct, we can always drop extraneous
variables later.

Since we'll be able to drop variables, we could add more variables to
support the cros minigbm variables or even the intel minigbm ones.
This would be a bit high churn, but would probably ease adoption.

Additionally, the gralloc buffer registering mechanism doesn't exist in
any of the gralloc implementations, so being able to start out with
something that works on all platforms would be nice.

Rob.
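To make the callback-table idea in this message concrete, here is a minimal, self-contained sketch. It is not libdrm or gralloc code: every name prefixed with example_ is hypothetical, the handle type stands in for Android's buffer_handle_t, and get_fd returns int here (rather than uint32_t as in the sketch above) purely as an assumption for the illustration. The point it shows is that a consumer can read plane data through the table without ever seeing the allocator's private handle layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Android's buffer_handle_t. */
typedef const void *example_buffer_handle_t;

/* Table of accessors, filled in by whichever allocator is in use. */
typedef struct example_gralloc_funcs {
   int      (*get_fd)(example_buffer_handle_t handle, uint32_t plane);
   uint64_t (*get_modifier)(example_buffer_handle_t handle, uint32_t plane);
   uint32_t (*get_offset)(example_buffer_handle_t handle, uint32_t plane);
   uint32_t (*get_stride)(example_buffer_handle_t handle, uint32_t plane);
} example_gralloc_funcs;

/* One allocator's private handle layout; consumers never include this. */
struct example_impl_handle {
   int fd;
   uint64_t modifier;
   uint32_t offsets[4];
   uint32_t strides[4];
};

static int
impl_get_fd(example_buffer_handle_t handle, uint32_t plane)
{
   (void)plane; /* this hypothetical allocator uses one fd for all planes */
   return ((const struct example_impl_handle *)handle)->fd;
}

static uint64_t
impl_get_modifier(example_buffer_handle_t handle, uint32_t plane)
{
   (void)plane;
   return ((const struct example_impl_handle *)handle)->modifier;
}

static uint32_t
impl_get_offset(example_buffer_handle_t handle, uint32_t plane)
{
   return ((const struct example_impl_handle *)handle)->offsets[plane];
}

static uint32_t
impl_get_stride(example_buffer_handle_t handle, uint32_t plane)
{
   return ((const struct example_impl_handle *)handle)->strides[plane];
}

/* The allocator publishes its accessors through the table. */
static const example_gralloc_funcs example_impl_funcs = {
   .get_fd = impl_get_fd,
   .get_modifier = impl_get_modifier,
   .get_offset = impl_get_offset,
   .get_stride = impl_get_stride,
};

/* Consumer side: byte offset just past `rows` rows of a plane, computed
 * using only the callbacks. */
static uint64_t
example_plane_end(const example_gralloc_funcs *funcs,
                  example_buffer_handle_t handle,
                  uint32_t plane, uint32_t rows)
{
   return (uint64_t)funcs->get_offset(handle, plane) +
          (uint64_t)funcs->get_stride(handle, plane) * rows;
}
```

A different allocator would supply its own handle struct and its own table; example_plane_end() would not need to change, which is the decoupling the thread is asking for.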
Re: [Mesa-dev] [PATCH mesa] egl: keep extension list sorted, per comment at the top
On Thu, 2018-01-25 at 10:14 +, Eric Engestrom wrote:
> Signed-off-by: Eric Engestrom

Reviewed-by: Adam Jackson

- ajax
[Mesa-dev] [PATCH] st/mesa: expand glDrawPixels cache to handle multiple images
The newest version of WSI Fusion makes several glDrawPixels calls
per frame.  By caching more than one image, we get better performance
when panning/zomming the map.

v2: move pixel unpack param checking out of cache search loop, per Roland
---
 src/mesa/state_tracker/st_cb_drawpixels.c | 196 +-
 src/mesa/state_tracker/st_context.c       |   4 -
 src/mesa/state_tracker/st_context.h       |  22 +++-
 3 files changed, 154 insertions(+), 68 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
index 1d88976..e63f6f7 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -375,6 +375,131 @@ alloc_texture(struct st_context *st, GLsizei width, GLsizei height,
 
 
 /**
+ * Search the cache for an image which matches the given parameters.
+ * \return  pipe_resource pointer if found, NULL if not found.
+ */
+static struct pipe_resource *
+search_drawpixels_cache(struct st_context *st,
+                        GLsizei width, GLsizei height,
+                        GLenum format, GLenum type,
+                        const struct gl_pixelstore_attrib *unpack,
+                        const void *pixels)
+{
+   struct pipe_resource *pt = NULL;
+   const GLint bpp = _mesa_bytes_per_pixel(format, type);
+   unsigned i;
+
+   if ((unpack->RowLength != 0 && unpack->RowLength != width) ||
+       unpack->SkipPixels != 0 ||
+       unpack->SkipRows != 0 ||
+       unpack->SwapBytes) {
+      /* we don't allow non-default pixel unpacking values */
+      return NULL;
+   }
+
+   /* Search cache entries for a match */
+   for (i = 0; i < ARRAY_SIZE(st->drawpix_cache.entries); i++) {
+      struct drawpix_cache_entry *entry = &st->drawpix_cache.entries[i];
+
+      if (width == entry->width &&
+          height == entry->height &&
+          format == entry->format &&
+          type == entry->type &&
+          pixels == entry->user_pointer &&
+          !_mesa_is_bufferobj(unpack->BufferObj) &&
+          entry->image) {
+         assert(entry->texture);
+
+         /* check if the pixel data is the same */
+         if (memcmp(pixels, entry->image, width * height * bpp) == 0) {
+            /* Success - found a cache match */
+            pipe_resource_reference(&pt, entry->texture);
+            /* refcount of returned texture should be at least two here.  One
+             * reference for the cache to hold on to, one for the caller (which
+             * it will release), and possibly more held by the driver.
+             */
+            assert(pt->reference.count >= 2);
+
+            /* update the age of this entry */
+            entry->age = ++st->drawpix_cache.age;
+
+            return pt;
+         }
+      }
+   }
+
+   /* no cache match found */
+   return NULL;
+}
+
+
+/**
+ * Find the oldest entry in the glDrawPixels cache.  We'll replace this
+ * one when we need to store a new image.
+ */
+static struct drawpix_cache_entry *
+find_oldest_drawpixels_cache_entry(struct st_context *st)
+{
+   unsigned oldest_age = ~0u, oldest_index = ~0u;
+   unsigned i;
+
+   /* Find entry with oldest (lowest) age */
+   for (i = 0; i < ARRAY_SIZE(st->drawpix_cache.entries); i++) {
+      const struct drawpix_cache_entry *entry = &st->drawpix_cache.entries[i];
+      if (entry->age < oldest_age) {
+         oldest_age = entry->age;
+         oldest_index = i;
+      }
+   }
+
+   assert(oldest_age != ~0u);
+   assert(oldest_index != ~0u);
+
+   return &st->drawpix_cache.entries[oldest_index];
+}
+
+
+/**
+ * Try to save the given glDrawPixels image in the cache.
+ */
+static void
+cache_drawpixels_image(struct st_context *st,
+                       GLsizei width, GLsizei height,
+                       GLenum format, GLenum type,
+                       const struct gl_pixelstore_attrib *unpack,
+                       const void *pixels,
+                       struct pipe_resource *pt)
+{
+   if ((unpack->RowLength == 0 || unpack->RowLength == width) &&
+       unpack->SkipPixels == 0 &&
+       unpack->SkipRows == 0) {
+      const GLint bpp = _mesa_bytes_per_pixel(format, type);
+      struct drawpix_cache_entry *entry =
+         find_oldest_drawpixels_cache_entry(st);
+      assert(entry);
+      entry->width = width;
+      entry->height = height;
+      entry->format = format;
+      entry->type = type;
+      entry->user_pointer = pixels;
+      free(entry->image);
+      entry->image = malloc(width * height * bpp);
+      if (entry->image) {
+         memcpy(entry->image, pixels, width * height * bpp);
+         pipe_resource_reference(&entry->texture, pt);
+         entry->age = ++st->drawpix_cache.age;
+      }
+      else {
+         /* out of memory, free/disable cached texture */
+         entry->width = 0;
+         entry->height = 0;
+         pipe_resource_reference(&entry->texture, NULL);
+      }
+   }
+}
+
+
+/**
  * Make texture containing an image for glDrawPixels image.
  * If 'pixels' is
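The replacement policy in this patch (a global age counter bumped on every hit or store, with the lowest-age entry evicted) can be tried out in isolation. The following is a reduced sketch under stated assumptions, not the Mesa code: the cache holds plain integer key/value pairs instead of images and textures, and all example_ names, including the size of 4, are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_CACHE_SIZE 4   /* arbitrary size chosen for the sketch */

struct example_entry {
   int key;
   int value;
   unsigned age;   /* larger means more recently used */
   int valid;      /* 0 until the slot has been filled once */
};

struct example_cache {
   struct example_entry entries[EXAMPLE_CACHE_SIZE];
   unsigned age;   /* global counter, bumped on every hit or store */
};

/* Return a pointer to the cached value, or NULL on a miss.  A hit
 * refreshes the entry's age so it is evicted later. */
static const int *
example_lookup(struct example_cache *cache, int key)
{
   for (size_t i = 0; i < EXAMPLE_CACHE_SIZE; i++) {
      struct example_entry *entry = &cache->entries[i];
      if (entry->valid && entry->key == key) {
         entry->age = ++cache->age;
         return &entry->value;
      }
   }
   return NULL;
}

/* Store a value, replacing the entry with the lowest age.  Unused slots
 * have age 0, so they are filled before anything is evicted. */
static void
example_store(struct example_cache *cache, int key, int value)
{
   size_t oldest = 0;
   for (size_t i = 1; i < EXAMPLE_CACHE_SIZE; i++) {
      if (cache->entries[i].age < cache->entries[oldest].age)
         oldest = i;
   }
   cache->entries[oldest].key = key;
   cache->entries[oldest].value = value;
   cache->entries[oldest].valid = 1;
   cache->entries[oldest].age = ++cache->age;
}
```

With a zero-initialized cache, storing keys 1 to 4, looking up key 1, and then storing key 5 evicts key 2: the lookup refreshed key 1, leaving key 2 with the lowest age.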
Re: [Mesa-dev] [Mesa-stable] [PATCH v2] configure.ac: add missing llvm dependencies to .pc files
On 25 January 2018 at 14:43, Chuck Atkins wrote:
> v2: Only add as dependencies for gallium-osmesa and gallium-xlib
>
> CC:
> Signed-off-by: Chuck Atkins

Reviewed-by: Emil Velikov

> ---
>  configure.ac | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/configure.ac b/configure.ac
> index 7c1fbe0ed1..448bd3a6ba 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2780,6 +2780,12 @@ if test "x$enable_llvm" = xyes; then
>  fi
>  fi
>  fi
> +if test "x$enable_glx" == xgallium-xlib; then
> +    GL_PC_LIB_PRIV="$GL_PC_LIB_PRIV $LLVM_LIBS"
> +fi
> +if test "x$enable_gallium_osmesa" = xyes; then
> +    OSMESA_PC_LIB_PRIV="$OSMESA_PC_LIB_PRIV $LLVM_LIBS"
> +fi

I'm itching to add a comment above these two, since Eric brought it up.
Modulo any objections I'll squash it before pushing.

    The following two targets embed the swr/llvmpipe driver into the
    final binary. Adding LLVM_LIBS propagates the LLVM libraries into
    the Libs.private field of the respective .pc file, which is used
    when statically linking the respective targets into other projects.

-Emil
[Mesa-dev] [PATCH] radeonsi: Export signalled sync file instead of -1.
-1 is considered an error for EGL_ANDROID_native_fence_sync, so
we need to actually create a sync file.

Fixes: f536f45250 "radeonsi: implement sync_file import/export"
---
 src/gallium/drivers/radeon/radeon_winsys.h |  5 +
 src/gallium/drivers/radeonsi/si_fence.c    |  2 ++
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c  | 23 +++
 3 files changed, 30 insertions(+)

diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h
index d1c761f4ee..307f8efaec 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -610,6 +610,11 @@ struct radeon_winsys {
     int (*fence_export_sync_file)(struct radeon_winsys *ws,
                                   struct pipe_fence_handle *fence);
 
+    /**
+     * Return a sync file FD that is already signalled.
+     */
+    int (*export_signalled_sync_file)(struct radeon_winsys *ws);
+
     /**
      * Initialize surface
      *
diff --git a/src/gallium/drivers/radeonsi/si_fence.c b/src/gallium/drivers/radeonsi/si_fence.c
index 5f320803aa..47d68dbc33 100644
--- a/src/gallium/drivers/radeonsi/si_fence.c
+++ b/src/gallium/drivers/radeonsi/si_fence.c
@@ -356,6 +356,8 @@ static int si_fence_get_fd(struct pipe_screen *screen,
 	/* If we don't have FDs at this point, it means we don't have fences
 	 * either. */
+	if (sdma_fd == -1 && gfx_fd == -1)
+		return ws->export_signalled_sync_file(ws);
 	if (sdma_fd == -1)
 		return gfx_fd;
 	if (gfx_fd == -1)
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 63cd63287f..b60574cfdd 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -114,6 +114,28 @@ static int amdgpu_fence_export_sync_file(struct radeon_winsys *rws,
    return fd;
 }
 
+static int amdgpu_export_signalled_sync_file(struct radeon_winsys *rws)
+{
+   struct amdgpu_winsys *ws = amdgpu_winsys(rws);
+   uint32_t syncobj;
+   int fd = -1;
+
+   int r = amdgpu_cs_create_syncobj2(ws->dev, DRM_SYNCOBJ_CREATE_SIGNALED,
+                                     &syncobj);
+   if (r) {
+      return -1;
+   }
+
+   r = amdgpu_cs_syncobj_export_sync_file(ws->dev, syncobj, &fd);
+   if (r) {
+      fd = -1;
+   }
+
+   amdgpu_cs_destroy_syncobj(ws->dev, syncobj);
+   return fd;
+}
+
+
 static void amdgpu_fence_submitted(struct pipe_fence_handle *fence,
                                    uint64_t seq_no,
                                    uint64_t *user_fence_cpu_address)
@@ -1560,4 +1582,5 @@ void amdgpu_cs_init_functions(struct amdgpu_winsys *ws)
    ws->base.fence_reference = amdgpu_fence_reference;
    ws->base.fence_import_sync_file = amdgpu_fence_import_sync_file;
    ws->base.fence_export_sync_file = amdgpu_fence_export_sync_file;
+   ws->base.export_signalled_sync_file = amdgpu_export_signalled_sync_file;
 }
-- 
2.16.0.rc1.238.g530d649a79-goog
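The control flow the si_fence.c hunk adds can be summarized as a small decision function. This is only a sketch of the branch ordering visible in the hunk, with hypothetical example_ names; it does not call any driver or libdrm function. What the real function does when both FDs are valid lies outside the quoted context, and the "return the SDMA FD" branch is presumed from the symmetric check that the hunk cuts off.

```c
#include <assert.h>

/* Possible outcomes when picking the FD to hand back to the caller. */
enum example_fence_choice {
   EXAMPLE_EXPORT_SIGNALLED, /* no fences at all: create a signalled sync file */
   EXAMPLE_RETURN_GFX,       /* only the gfx fence has an FD */
   EXAMPLE_RETURN_SDMA,      /* only the SDMA fence has an FD (presumed) */
   EXAMPLE_BOTH_PRESENT      /* both valid; handling not shown in the hunk */
};

/* Classify the two FDs in the same order as the patched code checks them.
 * An FD of -1 means "no fence". */
static enum example_fence_choice
example_classify_fence_fds(int sdma_fd, int gfx_fd)
{
   if (sdma_fd == -1 && gfx_fd == -1)
      return EXAMPLE_EXPORT_SIGNALLED;   /* the case the patch adds */
   if (sdma_fd == -1)
      return EXAMPLE_RETURN_GFX;
   if (gfx_fd == -1)
      return EXAMPLE_RETURN_SDMA;
   return EXAMPLE_BOTH_PRESENT;
}
```

The ordering matters: the new check has to come first, because the existing "sdma_fd == -1" branch would otherwise return gfx_fd, which is also -1 in that case and is what EGL_ANDROID_native_fence_sync treats as an error.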
Re: [Mesa-dev] [PATCH] st/mesa: expand glDrawPixels cache to handle multiple images
On 01/24/2018 09:06 PM, Roland Scheidegger wrote:
> On 25.01.2018 at 00:19, Brian Paul wrote:
>> The newest version of WSI Fusion makes several glDrawPixels calls
>> per frame.  By caching more than one image, we get better performance
>> when panning/zomming the map.
> zooming
>
>> ---
>>  src/mesa/state_tracker/st_cb_drawpixels.c | 192 +-
>>  src/mesa/state_tracker/st_context.c       |   4 -
>>  src/mesa/state_tracker/st_context.h       |  22 +++-
>>  3 files changed, 150 insertions(+), 68 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
>> index 1d88976..2e4e89d 100644
>> --- a/src/mesa/state_tracker/st_cb_drawpixels.c
>> +++ b/src/mesa/state_tracker/st_cb_drawpixels.c
>> @@ -375,6 +375,127 @@ alloc_texture(struct st_context *st, GLsizei width, GLsizei height,
>>
>>
>>  /**
>> + * Search the cache for an image which matches the given parameters.
>> + * \return  pipe_resource pointer if found, NULL if not found.
>> + */
>> +static struct pipe_resource *
>> +search_drawpixels_cache(struct st_context *st,
>> +                        GLsizei width, GLsizei height,
>> +                        GLenum format, GLenum type,
>> +                        const struct gl_pixelstore_attrib *unpack,
>> +                        const void *pixels)
>> +{
>> +   struct pipe_resource *pt = NULL;
>> +   const GLint bpp = _mesa_bytes_per_pixel(format, type);
>> +   unsigned i;
>> +
>> +   /* Search cache entries for a match */
>> +   for (i = 0; i < ARRAY_SIZE(st->drawpix_cache.entries); i++) {
>> +      struct drawpix_cache_entry *entry = &st->drawpix_cache.entries[i];
>> +
>> +      if (width == entry->width &&
>> +          height == entry->height &&
>> +          format == entry->format &&
>> +          type == entry->type &&
>> +          pixels == entry->user_pointer &&
>> +          !_mesa_is_bufferobj(unpack->BufferObj) &&
>> +          (unpack->RowLength == 0 || unpack->RowLength == width) &&
>> +          unpack->SkipPixels == 0 &&
>> +          unpack->SkipRows == 0 &&
>> +          unpack->SwapBytes == GL_FALSE &&
> Maybe factor out all these unpack parameter (which don't change) into
> their own var? Would make it more obvious which parameter you're
> actually comparing in the entries. And if that combined unpack var isn't
> true, you should probably skip the for loop in the first place.

Yeah, I'll lift those out of the loop.

>> +          entry->image) {
>> +         assert(entry->texture);
>> +
>> +         /* check if the pixel data is the same */
>> +         if (memcmp(pixels, entry->image, width * height * bpp) == 0) {
>> +            /* Success- found a cache match */
> whitespace before -
>
>> +            pipe_resource_reference(&pt, entry->texture);
>> +            /* refcount of returned texture should be at least two here.  One
>> +             * reference for the cache to hold on to, one for the caller (which
>> +             * it will release), and possibly more held by the driver.
>> +             */
>> +            assert(pt->reference.count >= 2);
>> +
>> +            /* update the age of this entry */
>> +            entry->age = ++st->drawpix_cache.age;
>> +
>> +            return pt;
>> +         }
>> +      }
>> +   }
>> +
>> +   /* no cache match found */
>> +   return NULL;
>> +}
>> +
>> +
>> +/**
>> + * Find the oldest entry in the glDrawPixels cache.  We'll replace this
>> + * one when we need to store a new image.
>> + */
>> +static struct drawpix_cache_entry *
>> +find_oldest_drawpixels_cache_entry(struct st_context *st)
>> +{
>> +   unsigned oldest_age = ~0u, oldest_index = ~0u;
>> +   unsigned i;
>> +
>> +   /* Find entry with oldest (lowest) age */
>> +   for (i = 0; i < ARRAY_SIZE(st->drawpix_cache.entries); i++) {
>> +      const struct drawpix_cache_entry *entry = &st->drawpix_cache.entries[i];
>> +      if (entry->age < oldest_age) {
>> +         oldest_age = entry->age;
>> +         oldest_index = i;
>> +      }
>> +   }
>> +
>> +   assert(oldest_age != ~0u);
> Couldn't you hit that with 32bit wraparound of age? I think the logic
> should be pretty safe against wraparound (would just not return the
> oldest entry).

Yeah, I can drop that.  Though, even at 60 draws/second, it'd take over
2 years to hit wrap-around. :)

> Reviewed-by: Roland Scheidegger

Thanks.  I'll post a v2.

-Brian

>> +   assert(oldest_index != ~0u);
>> +
>> +   return &st->drawpix_cache.entries[oldest_index];
>> +}
>> +
>> +
>> +/**
>> + * Try to save the given glDrawPixels image in the cache.
>> + */
>> +static void
>> +cache_drawpixels_image(struct st_context *st,
>> +                       GLsizei width, GLsizei height,
>> +                       GLenum format, GLenum type,
>> +                       const struct gl_pixelstore_attrib *unpack,
>> +                       const void *pixels,
>> +                       struct pipe_resource *pt)
>> +{
>> +   if ((unpack->RowLength == 0 || unpack->RowLength == width) &&
>> +       unpack->SkipPixels == 0 &&
>> +       unpack->SkipRows == 0) {
>> +      const GLint bpp = _mesa_bytes_per_pixel(format, type);
>> +      struct drawpix_cache_entry *entry =
>> +         find_oldest_drawpixels_cache_entry(st);
>> +      assert(entry);
>> +      entry->width = width;
>> +      entry->height =
Re: [Mesa-dev] [PATCH 2/3] anv/gen10: Ignore push constant packets during context restore.
On Wed, Jan 24, 2018 at 05:08:54PM -0800, Jason Ekstrand wrote: > On Wed, Jan 24, 2018 at 4:33 PM, Rafael Antognolli >> wrote: > > Similar to the GL driver, ignore 3DSTATE_CONSTANT_* packets when doing a > context restore. > > Signed-off-by: Rafael Antognolli > Cc: Jason Ekstrand > Cc: "18.0" > --- > src/intel/vulkan/anv_private.h | 1 + > src/intel/vulkan/genX_cmd_buffer.c | 47 ++ > > 2 files changed, 48 insertions(+) > > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_ > private.h > index b351c6f63b3..a4c84d2c295 100644 > --- a/src/intel/vulkan/anv_private.h > +++ b/src/intel/vulkan/anv_private.h > @@ -1458,6 +1458,7 @@ enum anv_pipe_bits { > ANV_PIPE_CONSTANT_CACHE_INVALIDATE_BIT= (1 << 3), > ANV_PIPE_VF_CACHE_INVALIDATE_BIT = (1 << 4), > ANV_PIPE_DATA_CACHE_FLUSH_BIT = (1 << 5), > + ANV_PIPE_ISP_DISABLE_BIT = (1 << 9), > > > Let's drop this if we're not going to use it. OK. > > ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT = (1 << 10), > ANV_PIPE_INSTRUCTION_CACHE_INVALIDATE_BIT = (1 << 11), > ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT= (1 << 12), > diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/ > genX_cmd_buffer.c > index c23a54fb7b9..7028c1ce9df 100644 > --- a/src/intel/vulkan/genX_cmd_buffer.c > +++ b/src/intel/vulkan/genX_cmd_buffer.c > @@ -1008,6 +1008,50 @@ genX(BeginCommandBuffer)( > return result; > } > > +/** > + * From the PRM, Volume 2a: > + * > + *"Indirect State Pointers Disable > + * > + *At the completion of the post-sync operation associated with this > pipe > + *control packet, the indirect state pointers in the hardware are > + *considered invalid; the indirect pointers are not saved in the > context. > + *If any new indirect state commands are executed in the command > stream > + *while the pipe control is pending, the new indirect state commands > are > + *preserved. > + * > + *[DevIVB+]: Using Invalidate State Pointer (ISP) only inhibits > context > + *restoring of Push Constant (3DSTATE_CONSTANT_*) commands. 
Push > Constant > + *commands are only considered as Indirect State Pointers. Once ISP > is > + *issued in a context, SW must initialize by programming push > constant > + *commands for all the shaders (at least to zero length) before > attempting > + *any rendering operation for the same context." > + * > + * 3DSTATE_CONSTANT_* packets are restored during a context restore, > + * even though they point to a BO that has been already unreferenced at > + * the end of the previous batch buffer. This has been fine so far since > + * we are protected by these scratch page (every address not covered by > + * a BO should be pointing to the scratch page). But on CNL, it is > + * causing a GPU hang during context restore at the 3DSTATE_CONSTANT_* > + * instruction. > + * > + * The flag "Indirect State Pointers Disable" in PIPE_CONTROL tells the > + * hardware to ignore previous 3DSTATE_CONSTANT_* packets during a > + * context restore, so the mentioned hang doesn't happen. However, > + * software must program push constant commands for all stages prior to > + * rendering anything, so we flag them as dirty. > > > And... The next command buffer won't. I just looked at it and we won't set up > push constants again until we use them. We could either set > 3DSTATE_CONSTANT_* > instead or we can make sure that push constants are flagged as dirty in > BeginCommandBuffer. Oh, I understood that anv was always sending them at every command buffer. OK, will check this again. > + */ > +static void > +emit_isp_disable(struct anv_cmd_buffer *cmd_buffer) > +{ > + anv_batch_emit(_buffer->batch, GENX(PIPE_CONTROL), pc) { > + pc.IndirectStatePointersDisable = true; > + pc.PostSyncOperation = WriteImmediateData; > + pc.Address = > +(struct anv_address) { _buffer->device->workaround_bo, 0 > };' > > > Is the W/A BO write needed? That's what I understood from "At the completion of the post-sync operation associated with this pipe control packet..." 
> > + } > +} > + > VkResult > genX(EndCommandBuffer)( > VkCommandBuffer commandBuffer) > @@ -1024,6 +1068,9 @@ genX(EndCommandBuffer)( > > genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer); > > + if (GEN_GEN == 10) > + emit_isp_disable(cmd_buffer); > + >
Re: [Mesa-dev] [PATCH] mesa: add missing RGB9_E5 format in _mesa_base_fbo_format
On 2018-01-24 05:38 PM, Juan A. Suarez Romero wrote: > This fixes KHR-GL45.internalformat.renderbuffer.rgb9_e5. > --- > src/mesa/main/fbobject.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c > index d23916d1ad7..c72204e11a0 100644 > --- a/src/mesa/main/fbobject.c > +++ b/src/mesa/main/fbobject.c > @@ -1976,6 +1976,9 @@ _mesa_base_fbo_format(const struct gl_context *ctx, > GLenum internalFormat) > ctx->Extensions.ARB_texture_float) || >_mesa_is_gles3(ctx) /* EXT_color_buffer_float */ ) > ? GL_RGBA : 0; > + case GL_RGB9_E5: > + return (_mesa_is_desktop_gl(ctx) && > ctx->Extensions.EXT_texture_shared_exponent) > + ? GL_RGB: 0; > case GL_ALPHA16F_ARB: > case GL_ALPHA32F_ARB: >return ctx->API == API_OPENGL_COMPAT && > Unfortunately, this broke the "spec@arb_internalformat_query2@samples and num_sample_counts pname checks" piglit tests with radeonsi and llvmpipe, see below. Any idea what might need to be done in Gallium to fix this? 
32 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 32 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 32 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 64 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 64 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 64 bit failing case: pname = GL_NUM_SAMPLE_COUNTS, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 PIGLIT: {"subtest": {"GL_NUM_SAMPLE_COUNTS" : "fail"}} 32 bit failing case: pname = GL_SAMPLES, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 32 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 32 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 64 bit failing case: pname = GL_SAMPLES, target = GL_RENDERBUFFER, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 64 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 64 bit failing case: pname = GL_SAMPLES, target = GL_TEXTURE_2D_MULTISAMPLE_ARRAY, internalformat = GL_RGB9_E5, params[0] = (1,GL_TRUE), supported=1 PIGLIT: {"subtest": {"GL_SAMPLES" : "fail"}} PIGLIT: {"result": "fail" } -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev 
mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Correctly print glTexImage dimensions
texture_format_error_check_gles() displays errors like "glTexImage%dD".
This patch just replaces the %d with the correct dimension.

Signed-off-by: Elie Tournier
---
 src/mesa/main/teximage.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index e5f8bb0718..cc329e6410 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -1787,7 +1787,6 @@ texture_formats_agree(GLenum internalFormat,
  * \param format pixel data format given by the user.
  * \param type pixel data type given by the user.
  * \param internalFormat internal format given by the user.
- * \param dimensions texture image dimensions (must be 1, 2 or 3).
  * \param callerName name of the caller function to print in the error message
  *
  * \return true if a error is found, false otherwise
@@ -1796,8 +1795,7 @@ texture_formats_agree(GLenum internalFormat,
  */
 static bool
 texture_format_error_check_gles(struct gl_context *ctx, GLenum format,
-                                GLenum type, GLenum internalFormat,
-                                GLuint dimensions, const char *callerName)
+                                GLenum type, GLenum internalFormat, const char *callerName)
 {
    GLenum err = _mesa_es3_error_check_format_and_type(ctx, format, type,
                                                       internalFormat);
@@ -1911,9 +1909,11 @@ texture_error_check( struct gl_context *ctx,
     * Formats and types that require additional extensions (e.g., GL_FLOAT
     * requires GL_OES_texture_float) are filtered elsewhere.
     */
+   char bufCallerName[20];
+   snprintf(bufCallerName, 20, "glTexImage%dD", dimensions);
    if (_mesa_is_gles(ctx) &&
-       texture_format_error_check_gles(ctx, format, type, internalFormat,
-                                       dimensions, "glTexImage%dD")) {
+       texture_format_error_check_gles(ctx, format, type,
+                                       internalFormat, bufCallerName)) {
       return GL_TRUE;
    }

@@ -2234,8 +2234,7 @@ texsubimage_error_check(struct gl_context *ctx, GLuint dimensions,
     */
    if (_mesa_is_gles(ctx) &&
        texture_format_error_check_gles(ctx, format, type,
-                                       internalFormat,
-                                       dimensions, callerName)) {
+                                       internalFormat, callerName)) {
       return GL_TRUE;
    }

--
2.16.1
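For readers who want to try the idea in isolation: the patch formats the caller name into a stack buffer before handing it on, instead of passing a literal that still contains "%d". A minimal sketch of that step follows. The helper name format_teximage_name() is made up for illustration and is not part of Mesa; it also checks for truncation, which the patch itself does not need because "glTexImage3D" always fits in 20 bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of Mesa: write "glTexImage<N>D" into buf.
 * Returns 0 on success, -1 if the formatted name did not fit in the buffer. */
int format_teximage_name(char *buf, size_t size, unsigned dimensions)
{
    assert(buf != NULL && size > 0);

    /* snprintf() never writes more than size bytes and always NUL-terminates
     * when size > 0; its return value is the length the full string needs. */
    int len = snprintf(buf, size, "glTexImage%uD", dimensions);

    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

Passing sizeof(buf) rather than a repeated literal such as 20 keeps the size and the declaration from drifting apart if the buffer is ever resized.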
Re: [Mesa-dev] [PATCH] meson: fix some more defines meson.build
On Thursday, 2018-01-25 10:31:12 +0100, Marc Dietrich wrote:
> On Thursday, 25 January 2018, 10:28:26 CET, Marc Dietrich wrote:
> > On Thursday, 25 January 2018, 10:18:16 CET, Eric Engestrom wrote:
> > > On Wednesday, 2018-01-24 22:02:42 +0100, Marc Dietrich wrote:
> > > > Btw, there is still some strange problem in PACKAGE_BUGREPORT as it
> > > > includes a "$" in the url. I don't know where this comes from.
> > >
> > > Where do you see this "$"?
> > > I've looked at the code and it looks all good to me.
> >
> > yes, code is fine, output is not (see build.ninja):
> >
> > '-DPACKAGE_BUGREPORT="https$://bugs.freedesktop.org/enter_bug.cgi?product=Mesa"'
> >
> > maybe some escaping required?
> >
> > Marc
>
> arr, I just checked the resulting binary and it seems to be ok there, so
> false alarm. Still puzzling where it came from and where it went to.

This is some ninja-specific escaping:
https://ninja-build.org/manual.html#_lexical_syntax

> Marc
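To make the escaping concrete: in ninja's lexical syntax "$:" stands for a literal colon, "$$" for a literal dollar sign and "$ " for a literal space, which is why the "$" shows up in build.ninja but not in the compiled binary. The sketch below undoes just those three escapes. It is an illustration only, not ninja's actual parser; the function name ninja_unescape() is made up, and variable references such as $name or ${name} are copied through untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: undo the three simple ninja escapes "$$", "$:" and "$ ".
 * Anything else (including $var references) is copied unchanged.
 * Returns the number of bytes written, not counting the terminating NUL. */
size_t ninja_unescape(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return 0;

    /* Always leave room for the terminating NUL. */
    while (*in != '\0' && n + 1 < out_size) {
        if (in[0] == '$' && (in[1] == '$' || in[1] == ':' || in[1] == ' ')) {
            out[n++] = in[1];   /* keep the escaped character, drop the '$' */
            in += 2;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the string from build.ninja, "https$://bugs.freedesktop.org/...", gives back the plain "https://" URL that ends up in the define.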
Re: [Mesa-dev] [PATCH] mesa: simplify _mesa_delete_list() a bit, add some assertions
On 01/24/2018 09:41 PM, Roland Scheidegger wrote: Am 25.01.2018 um 00:19 schrieb Brian Paul: All but two cases of the switch did the same n += InstSize[n[0].opcode] instruction. Just move it after the switch. Add some sanity check assertions. --- src/mesa/main/dlist.c | 39 +++ 1 file changed, 11 insertions(+), 28 deletions(-) diff --git a/src/mesa/main/dlist.c b/src/mesa/main/dlist.c index a6b212e..7b8e0f6 100644 --- a/src/mesa/main/dlist.c +++ b/src/mesa/main/dlist.c @@ -961,79 +961,60 @@ _mesa_delete_list(struct gl_context *ctx, struct gl_display_list *dlist) /* for some commands, we need to free malloc'd memory */ case OPCODE_MAP1: free(get_pointer([6])); -n += InstSize[n[0].opcode]; break; case OPCODE_MAP2: free(get_pointer([10])); -n += InstSize[n[0].opcode]; break; case OPCODE_CALL_LISTS: free(get_pointer([3])); -n += InstSize[n[0].opcode]; break; case OPCODE_DRAW_PIXELS: free(get_pointer([5])); -n += InstSize[n[0].opcode]; break; case OPCODE_BITMAP: free(get_pointer([7])); -n += InstSize[n[0].opcode]; break; case OPCODE_POLYGON_STIPPLE: free(get_pointer([1])); -n += InstSize[n[0].opcode]; break; case OPCODE_TEX_IMAGE1D: free(get_pointer([8])); -n += InstSize[n[0].opcode]; break; case OPCODE_TEX_IMAGE2D: free(get_pointer([9])); -n += InstSize[n[0].opcode]; break; case OPCODE_TEX_IMAGE3D: free(get_pointer([10])); -n += InstSize[n[0].opcode]; break; case OPCODE_TEX_SUB_IMAGE1D: free(get_pointer([7])); -n += InstSize[n[0].opcode]; break; case OPCODE_TEX_SUB_IMAGE2D: free(get_pointer([9])); -n += InstSize[n[0].opcode]; break; case OPCODE_TEX_SUB_IMAGE3D: free(get_pointer([11])); -n += InstSize[n[0].opcode]; break; case OPCODE_COMPRESSED_TEX_IMAGE_1D: free(get_pointer([7])); -n += InstSize[n[0].opcode]; break; case OPCODE_COMPRESSED_TEX_IMAGE_2D: free(get_pointer([8])); -n += InstSize[n[0].opcode]; break; case OPCODE_COMPRESSED_TEX_IMAGE_3D: free(get_pointer([9])); -n += InstSize[n[0].opcode]; break; case OPCODE_COMPRESSED_TEX_SUB_IMAGE_1D: free(get_pointer([7])); 
-n += InstSize[n[0].opcode]; break; case OPCODE_COMPRESSED_TEX_SUB_IMAGE_2D: free(get_pointer([9])); -n += InstSize[n[0].opcode]; break; case OPCODE_COMPRESSED_TEX_SUB_IMAGE_3D: free(get_pointer([11])); -n += InstSize[n[0].opcode]; break; case OPCODE_PROGRAM_STRING_ARB: free(get_pointer([4])); /* program string */ -n += InstSize[n[0].opcode]; break; case OPCODE_UNIFORM_1FV: case OPCODE_UNIFORM_2FV: @@ -1048,7 +1029,6 @@ _mesa_delete_list(struct gl_context *ctx, struct gl_display_list *dlist) case OPCODE_UNIFORM_3UIV: case OPCODE_UNIFORM_4UIV: free(get_pointer([3])); -n += InstSize[n[0].opcode]; break; case OPCODE_UNIFORM_MATRIX22: case OPCODE_UNIFORM_MATRIX33: @@ -1060,7 +1040,6 @@ _mesa_delete_list(struct gl_context *ctx, struct gl_display_list *dlist) case OPCODE_UNIFORM_MATRIX34: case OPCODE_UNIFORM_MATRIX43: free(get_pointer([4])); -n += InstSize[n[0].opcode]; break; case OPCODE_PROGRAM_UNIFORM_1FV: case OPCODE_PROGRAM_UNIFORM_2FV: @@ -1075,7 +1054,6 @@ _mesa_delete_list(struct gl_context *ctx, struct gl_display_list *dlist) case OPCODE_PROGRAM_UNIFORM_3UIV: case OPCODE_PROGRAM_UNIFORM_4UIV: free(get_pointer([4])); -n += InstSize[n[0].opcode]; break; case OPCODE_PROGRAM_UNIFORM_MATRIX22F: case OPCODE_PROGRAM_UNIFORM_MATRIX33F: @@ -1087,15 +1065,12 @@ _mesa_delete_list(struct gl_context *ctx, struct gl_display_list *dlist) case OPCODE_PROGRAM_UNIFORM_MATRIX34F: case OPCODE_PROGRAM_UNIFORM_MATRIX43F: free(get_pointer([5])); -n += InstSize[n[0].opcode]; break; case OPCODE_PIXEL_MAP: free(get_pointer([3])); -n += InstSize[n[0].opcode]; break;
Re: [Mesa-dev] [Mesa-stable] [PATCH] util/build-id: Fix address comparison for binaries with LOAD vaddr > 0
On Thu, Jan 25, 2018 at 11:22:10AM +, Emil Velikov wrote:
> On 24 January 2018 at 14:13, Stephan Gerhold wrote:
> > build_id_find_nhdr_for_addr() fails to find the build-id if the first LOAD
> > segment has a virtual address other than 0x0.
> >
> > For most shared libraries, the first LOAD segment has vaddr=0x0:
> >
> >   Type  Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
> >   LOAD  0x00     0x         0x         0x2d2e26 0x2d2e26 R E 0x1000
> >   LOAD  0x2d2e54 0x002d3e54 0x002d3e54 0x2e248  0x2f148  RW  0x1000
> >
> > However, compiling the Intel Vulkan driver as 32-bit binary on Android
> > produces the following ELF header with vaddr=0x8000 instead:
> >
> >   Type  Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
> >   PHDR  0x34     0x8034     0x8034     0x00100  0x00100  R   0x4
> >   LOAD  0x00     0x8000     0x8000     0x224a04 0x224a04 R E 0x1000
> >   LOAD  0x225710 0x0022e710 0x0022e710 0x25988  0x27364  RW  0x1000
> >
> > build_id_find_nhdr_callback() compares the address of dli_fbase from
> > dladdr() and dlpi_addr from dl_iterate_phdr(). With vaddr > 0, these
> > point to a different memory address, e.g.:
> >
> >   dli_fbase=0xd8395000 (offset 0x8000)
> >   dlpi_addr=0xd838d000
> >
> > At least on glibc and bionic (Android) dli_fbase refers to the address
> > where the shared object is mapped into the process space, whereas
> > dlpi_addr is just the base address for the vaddrs declared in the ELF
> > header.
> >
> > To compare them correctly, we need to calculate the start of the mapping
> > by adding the vaddr of the first LOAD segment to the base address.
> >
> > Cc: Chad Versace
> > Cc: Emil Velikov
> > Cc: Tapani Pälli
> > Cc:
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104642
> > Fixes: 5c98d38 "util: Query build-id by symbol address, not library name"
> > ---
> Based on my observation of glibc code and reading at the spec, I think
> this is correct.
> Admittedly the man page could be improved.
>
> FWIW I've poked the #musl people about this change last night, and
> haven't heard any feedback yet.
> Be that about a) our understanding of how it should work or b) musl's
> implementation on the topic.

I found a related discussion about the implementation of dli_fbase on the
musl mailing list[1]. The FreeBSD man page for dladdr()[2] linked in the
message on the musl mailing list is a bit more specific about dli_fbase:

  "The base address at which the shared object is mapped into the address
   space of the calling process."

... which is - at least as far as I understand it - exactly how glibc and
bionic behave and the reason why we need this patch for LOAD vaddrs != 0.

However, from what I've noticed when testing with musl, they seem to handle
it unlike glibc/bionic/the FreeBSD man page. musl always returns the base
address without the offset where the shared object is mapped.

Technically, this means that this patch will break on musl in the rare
situation that you actually link a shared library with LOAD vaddr != 0.
However, considering that only they seem to handle it differently, this
might be worth reporting to them instead?

[1]: http://www.openwall.com/lists/musl/2013/01/16/10
[2]: https://www.unix.com/man-page/FreeBSD/3/dladdr/

> Patch looks sensible, although input from Chad/Matt would be appreciated.
> Reviewed-by: Emil Velikov
>
> -Emil
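The address arithmetic discussed in this thread can be checked by hand with the numbers quoted from the Android binary: dlpi_addr 0xd838d000 plus the first LOAD vaddr 0x8000 gives 0xd8395000, which is the dli_fbase value reported by dladdr(). The sketch below redoes that calculation on a trimmed-down stand-in for the program header table. The struct toy_phdr and toy_map_start() are assumptions made for illustration, not the types from <link.h> or the code in src/util/build_id.c; only the numeric values of PT_LOAD (1) and PT_PHDR (6) are taken from the ELF definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PT_LOAD 1u   /* same value as PT_LOAD in <elf.h> */
#define TOY_PT_PHDR 6u   /* same value as PT_PHDR in <elf.h> */

/* Hypothetical, reduced program header: only the two fields used here. */
struct toy_phdr {
    uint32_t type;
    uint64_t vaddr;
};

/* Returns base + vaddr of the first LOAD segment, i.e. where the object's
 * mapping starts, or 0 if the table contains no LOAD segment at all. */
uint64_t toy_map_start(uint64_t base, const struct toy_phdr *phdr, size_t count)
{
    assert(phdr != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (phdr[i].type == TOY_PT_LOAD)
            return base + phdr[i].vaddr;
    }
    return 0;
}
```

With a first LOAD vaddr of 0x0, as in most shared libraries, the function simply returns the base address, which is why the old direct comparison usually worked.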
[Mesa-dev] [PATCH] radv: emit a cache flush before enabling predication
Otherwise cache flushes could get conditionally disabled while still
clearing the flush_bits, and thus flushes due to application pipeline
barriers may never get executed.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/amd/vulkan/radv_meta_fast_clear.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/amd/vulkan/radv_meta_fast_clear.c b/src/amd/vulkan/radv_meta_fast_clear.c
index fdeeaeedbf..f4353fd889 100644
--- a/src/amd/vulkan/radv_meta_fast_clear.c
+++ b/src/amd/vulkan/radv_meta_fast_clear.c
@@ -602,6 +602,8 @@ radv_emit_color_decompress(struct radv_cmd_buffer *cmd_buffer,
 	}

 	if (!decompress_dcc && image->surface.dcc_size) {
+		si_emit_cache_flush(cmd_buffer);
+
 		radv_emit_set_predication_state_from_image(cmd_buffer, image, true);
 		cmd_buffer->state.predicating = true;
 	}
--
2.13.6
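The failure mode described in the commit message can be reproduced with a toy model: the CPU side clears its pending flush bits whenever it emits a flush packet, but the GPU skips that packet if predication is enabled and the predicate turns out false. Everything in the sketch below (struct toy_cmd_buffer, toy_emit_cache_flush(), toy_decompress()) is a hypothetical stand-in written for this note and is not radv code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not radv code: all names here are hypothetical. */
struct toy_cmd_buffer {
    unsigned flush_bits;       /* flushes requested but not yet emitted */
    bool predicating;          /* are emitted packets currently predicated? */
    bool predicate_value;      /* GPU-side condition, unknown to the CPU */
    unsigned flushes_executed; /* flush bits the "GPU" really executed */
};

/* Emit the pending flushes. The CPU always clears flush_bits, but the GPU
 * skips the packet when predication is on and the predicate is false. */
static void toy_emit_cache_flush(struct toy_cmd_buffer *cb)
{
    if (!cb->predicating || cb->predicate_value)
        cb->flushes_executed |= cb->flush_bits;
    cb->flush_bits = 0;
}

/* Runs a decompress pass whose predicate is false and returns the flush
 * bits the GPU executed for a barrier that was pending beforehand. */
unsigned toy_decompress(bool flush_before_predication)
{
    struct toy_cmd_buffer cb = { .flush_bits = 0x1, .predicate_value = false };

    if (flush_before_predication)
        toy_emit_cache_flush(&cb);  /* the ordering the patch introduces */

    cb.predicating = true;
    toy_emit_cache_flush(&cb);      /* a flush emitted inside the predicated region */
    cb.predicating = false;

    assert(cb.flush_bits == 0);     /* the CPU thinks nothing is pending either way */
    return cb.flushes_executed;
}
```

In this model toy_decompress(false) loses the barrier's flush entirely, while toy_decompress(true) executes it before predication can suppress it.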
Re: [Mesa-dev] [PATCH] gallivm: fix crash with seamless cube filtering with different min/mag filter
Looks great. Reviewed-by: Jose FonsecaOn 25/01/18 03:33, srol...@vmware.com wrote: From: Roland Scheidegger We are not allowed to modify the incoming coords values, or things may crash (as we may be inside a llvm conditional and the values may be used in another branch). I recently broke this when fixing an issue with NaNs and seamless cube map filtering, and it causes crashes when doing cubemap filtering if the min and mag filters are different. Add const to the pointers passed in to prevent this mishap in the future. Fixes: a485ad0bcd ("gallivm: fix an issue with NaNs with seamless cube filtering") --- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 38 +-- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c index ff8cbf6..8f760f5 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c @@ -857,7 +857,7 @@ lp_build_sample_image_nearest(struct lp_build_sample_context *bld, LLVMValueRef img_stride_vec, LLVMValueRef data_ptr, LLVMValueRef mipoffsets, - LLVMValueRef *coords, + const LLVMValueRef *coords, const LLVMValueRef *offsets, LLVMValueRef colors_out[4]) { @@ -1004,7 +1004,7 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, LLVMValueRef img_stride_vec, LLVMValueRef data_ptr, LLVMValueRef mipoffsets, - LLVMValueRef *coords, + const LLVMValueRef *coords, const LLVMValueRef *offsets, LLVMValueRef colors_out[4]) { @@ -1106,7 +1106,7 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, struct lp_build_if_state edge_if; LLVMTypeRef int1t; LLVMValueRef new_faces[4], new_xcoords[4][2], new_ycoords[4][2]; - LLVMValueRef coord, have_edge, have_corner; + LLVMValueRef coord0, coord1, have_edge, have_corner; LLVMValueRef fall_off_ym_notxm, fall_off_ym_notxp, fall_off_x, fall_off_y; LLVMValueRef fall_off_yp_notxm, fall_off_yp_notxp; LLVMValueRef x0, x1, y0, y1, 
y0_clamped, y1_clamped; @@ -1130,20 +1130,20 @@ lp_build_sample_image_linear(struct lp_build_sample_context *bld, * other values might be bogus in the end too). * So kill off the NaNs here. */ - coords[0] = lp_build_max_ext(coord_bld, coords[0], coord_bld->zero, - GALLIVM_NAN_RETURN_OTHER_SECOND_NONNAN); - coords[1] = lp_build_max_ext(coord_bld, coords[1], coord_bld->zero, - GALLIVM_NAN_RETURN_OTHER_SECOND_NONNAN); - coord = lp_build_mul(coord_bld, coords[0], flt_width_vec); + coord0 = lp_build_max_ext(coord_bld, coords[0], coord_bld->zero, +GALLIVM_NAN_RETURN_OTHER_SECOND_NONNAN); + coord0 = lp_build_mul(coord_bld, coord0, flt_width_vec); /* instead of clamp, build mask if overflowed */ - coord = lp_build_sub(coord_bld, coord, half); + coord0 = lp_build_sub(coord_bld, coord0, half); /* convert to int, compute lerp weight */ /* not ideal with AVX (and no AVX2) */ - lp_build_ifloor_fract(coord_bld, coord, , _fpart); + lp_build_ifloor_fract(coord_bld, coord0, , _fpart); x1 = lp_build_add(ivec_bld, x0, ivec_bld->one); - coord = lp_build_mul(coord_bld, coords[1], flt_height_vec); - coord = lp_build_sub(coord_bld, coord, half); - lp_build_ifloor_fract(coord_bld, coord, , _fpart); + coord1 = lp_build_max_ext(coord_bld, coords[1], coord_bld->zero, +GALLIVM_NAN_RETURN_OTHER_SECOND_NONNAN); + coord1 = lp_build_mul(coord_bld, coord1, flt_height_vec); + coord1 = lp_build_sub(coord_bld, coord1, half); + lp_build_ifloor_fract(coord_bld, coord1, , _fpart); y1 = lp_build_add(ivec_bld, y0, ivec_bld->one); fall_off[0] = lp_build_cmp(ivec_bld, PIPE_FUNC_LESS, x0, ivec_bld->zero); @@ -1747,7 +1747,7 @@ lp_build_sample_mipmap(struct lp_build_sample_context *bld, unsigned img_filter, unsigned mip_filter, boolean is_gather, - LLVMValueRef *coords, + const LLVMValueRef *coords, const LLVMValueRef *offsets, LLVMValueRef ilevel0, LLVMValueRef ilevel1, @@ -1820,6 +1820,7 @@ lp_build_sample_mipmap(struct lp_build_sample_context *bld,
[Mesa-dev] [PATCH 1/2] radv: fix a GPU hang with RADV_DEBUG=syncshaders
The GPU hangs when the driver forces a PS_PARTIAL_FLUSH after a
dispatch call (and vice versa for graphics). Something has changed in
the kernel driver because it used to work.

Signed-off-by: Samuel Pitoiset
---
 src/amd/vulkan/radv_cmd_buffer.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 6d512c6070a..ba5fd92f2a1 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -429,15 +429,14 @@ void radv_cmd_buffer_trace_emit(struct radv_cmd_buffer *cmd_buffer)
 }

 static void
-radv_cmd_buffer_after_draw(struct radv_cmd_buffer *cmd_buffer)
+radv_cmd_buffer_after_draw(struct radv_cmd_buffer *cmd_buffer,
+			   enum radv_cmd_flush_bits flags)
 {
 	if (cmd_buffer->device->instance->debug_flags & RADV_DEBUG_SYNC_SHADERS) {
-		enum radv_cmd_flush_bits flags;
-
-		/* Force wait for graphics/compute engines to be idle. */
-		flags = RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
-			RADV_CMD_FLAG_CS_PARTIAL_FLUSH;
+		assert(flags & (RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
+				RADV_CMD_FLAG_CS_PARTIAL_FLUSH));

+		/* Force wait for graphics or compute engines to be idle. */
 		si_cs_emit_cache_flush(cmd_buffer->cs, false,
 				       cmd_buffer->device->physical_device->rad_info.chip_class,
 				       NULL, 0,
@@ -3501,7 +3500,7 @@ radv_draw(struct radv_cmd_buffer *cmd_buffer,
 	}

 	assert(cmd_buffer->cs->cdw <= cdw_max);
-	radv_cmd_buffer_after_draw(cmd_buffer);
+	radv_cmd_buffer_after_draw(cmd_buffer, RADV_CMD_FLAG_PS_PARTIAL_FLUSH);
 }

 void radv_CmdDraw(
@@ -3821,7 +3820,7 @@ radv_dispatch(struct radv_cmd_buffer *cmd_buffer,
 		radv_emit_dispatch_packets(cmd_buffer, info);
 	}

-	radv_cmd_buffer_after_draw(cmd_buffer);
+	radv_cmd_buffer_after_draw(cmd_buffer, RADV_CMD_FLAG_CS_PARTIAL_FLUSH);
 }

 void radv_CmdDispatch(
--
2.16.1
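Reduced to its core, the change makes the caller pick the one partial flush that matches the work it just submitted, rather than always requesting both. A small sketch of that selection follows, under the assumption that draws want the PS flush and dispatches the CS flush, as the two call sites in the patch suggest. The enum names and values are invented for the example and are unrelated to the real RADV_CMD_FLAG_* definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the radv flush bits; the values are made up. */
enum toy_flush_bits {
    TOY_PS_PARTIAL_FLUSH = 1 << 0,
    TOY_CS_PARTIAL_FLUSH = 1 << 1,
};

enum toy_work { TOY_DRAW, TOY_DISPATCH };

/* Pick only the partial flush that matches the work just submitted. */
unsigned toy_sync_flags(enum toy_work work)
{
    unsigned flags = (work == TOY_DRAW) ? TOY_PS_PARTIAL_FLUSH
                                        : TOY_CS_PARTIAL_FLUSH;

    /* Mirrors the assertion added by the patch: at least one of the two
     * partial flushes has to be requested. */
    assert(flags & (TOY_PS_PARTIAL_FLUSH | TOY_CS_PARTIAL_FLUSH));
    return flags;
}
```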
[Mesa-dev] [PATCH 2/2] radv: fix RADV_DEBUG=syncshaders on GFX9
Signed-off-by: Samuel Pitoiset
---
 src/amd/vulkan/radv_cmd_buffer.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index ba5fd92f2a1..b694174de68 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -433,13 +433,22 @@ radv_cmd_buffer_after_draw(struct radv_cmd_buffer *cmd_buffer,
 			   enum radv_cmd_flush_bits flags)
 {
 	if (cmd_buffer->device->instance->debug_flags & RADV_DEBUG_SYNC_SHADERS) {
+		uint32_t *ptr = NULL;
+		uint64_t va = 0;
+
 		assert(flags & (RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
 				RADV_CMD_FLAG_CS_PARTIAL_FLUSH));

+		if (cmd_buffer->device->physical_device->rad_info.chip_class == GFX9) {
+			va = radv_buffer_get_va(cmd_buffer->gfx9_fence_bo) +
+			     cmd_buffer->gfx9_fence_offset;
+			ptr = &cmd_buffer->gfx9_fence_idx;
+		}
+
 		/* Force wait for graphics or compute engines to be idle. */
 		si_cs_emit_cache_flush(cmd_buffer->cs, false,
 				       cmd_buffer->device->physical_device->rad_info.chip_class,
-				       NULL, 0,
+				       ptr, va,
 				       radv_cmd_buffer_uses_mec(cmd_buffer),
 				       flags);
 	}
--
2.16.1
[Mesa-dev] [PATCH v2] configure.ac: add missing llvm dependencies to .pc files
v2: Only add as dependencies for gallium-osmesa and gallium-xlib

CC:
Signed-off-by: Chuck Atkins
---
 configure.ac | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/configure.ac b/configure.ac
index 7c1fbe0ed1..448bd3a6ba 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2780,6 +2780,12 @@ if test "x$enable_llvm" = xyes; then
             fi
         fi
     fi
+    if test "x$enable_glx" = xgallium-xlib; then
+        GL_PC_LIB_PRIV="$GL_PC_LIB_PRIV $LLVM_LIBS"
+    fi
+    if test "x$enable_gallium_osmesa" = xyes; then
+        OSMESA_PC_LIB_PRIV="$OSMESA_PC_LIB_PRIV $LLVM_LIBS"
+    fi
 fi

 AM_CONDITIONAL(HAVE_GALLIUM_SVGA, test "x$HAVE_GALLIUM_SVGA" = xyes)
--
2.14.3
Re: [Mesa-dev] [PATCH 4/7] freedreno: a2xx: Support TEXTURE_RECT
On Thu, Jan 25, 2018 at 08:41:11AM -0500, Ilia Mirkin wrote:
> Should you also expose PIPE_CAP_TEXTURE_RECTANGLE? (Or whatever it's
> called... I forget.)

Yes, good point, will add that.

Wladimir
Re: [Mesa-dev] [Freedreno] [PATCH 1/7] freedreno: a2xx: Update rnndb header
On Thu, Jan 25, 2018 at 08:40:00AM -0500, Ilia Mirkin wrote:
> On Thu, Jan 25, 2018 at 8:29 AM, Wladimir J. van der Laan wrote:
> > Also update BLEND_ to BLEND2_ opcodes to accommodate.
>
> Are you saying this doesn't compile right now? I would have expected
> the accompanying change to a2xx.xml.h for that. Perhaps this landed
> into the wrong commit?

There used to be a rename from BLEND_ to BLEND2_ here; it probably made it
in with an earlier patch?

It does compile as it is now, but I think the change is correct:

BLEND_* is a3xx_rb_blend_opcode
BLEND2_* is a2xx_rb_blend_opcode

However, it happens that BLEND2_DST_PLUS_SRC and BLEND_DST_PLUS_SRC have the
same value, so it's a nop either way.

> Also it's odd that the formats are so different than originally
> entered. Any opinion on how that happened?

I do not know where the original values come from - mine come from the
yamato register headers that are part of the amd-gpu kernel driver.

(see freedreno envytools commit 1b32c444f82cd7144d71602106462f59f146c1d0, and also:
https://github.com/jaketesler/UDOO_Kernel/blob/master/drivers/mxc/amd-gpu/include/reg/yamato/22/yamato_enum.h#L1799 )

I've checked on a20x that, for example, the ETC1 ones check out, but
obviously not every single one of them.

Regards,
Wladimir

> >
> > Signed-off-by: Wladimir J.
van der Laan > > --- > > src/gallium/drivers/freedreno/a2xx/a2xx.xml.h | 33 > > +++ > > src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 4 ++-- > > 2 files changed, 15 insertions(+), 22 deletions(-) > > > > diff --git a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h > > b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h > > index 55a4355..279a652 100644 > > --- a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h > > +++ b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h > > @@ -84,13 +84,12 @@ enum a2xx_sq_surfaceformat { > > FMT_5_5_5_1 = 13, > > FMT_8_8_8_8_A = 14, > > FMT_4_4_4_4 = 15, > > - FMT_10_11_11 = 16, > > - FMT_11_11_10 = 17, > > + FMT_8_8_8 = 16, > > FMT_DXT1 = 18, > > FMT_DXT2_3 = 19, > > FMT_DXT4_5 = 20, > > + FMT_10_10_10_2 = 21, > > FMT_24_8 = 22, > > - FMT_24_8_FLOAT = 23, > > FMT_16 = 24, > > FMT_16_16 = 25, > > FMT_16_16_16_16 = 26, > > @@ -106,29 +105,23 @@ enum a2xx_sq_surfaceformat { > > FMT_32_FLOAT = 36, > > FMT_32_32_FLOAT = 37, > > FMT_32_32_32_32_FLOAT = 38, > > - FMT_32_AS_8 = 39, > > - FMT_32_AS_8_8 = 40, > > - FMT_16_MPEG = 41, > > - FMT_16_16_MPEG = 42, > > - FMT_8_INTERLACED = 43, > > - FMT_32_AS_8_INTERLACED = 44, > > - FMT_32_AS_8_8_INTERLACED = 45, > > - FMT_16_INTERLACED = 46, > > - FMT_16_MPEG_INTERLACED = 47, > > - FMT_16_16_MPEG_INTERLACED = 48, > > + FMT_ATI_TC_RGB = 39, > > + FMT_ATI_TC_RGBA = 40, > > + FMT_ATI_TC_555_565_RGB = 41, > > + FMT_ATI_TC_555_565_RGBA = 42, > > + FMT_ATI_TC_RGBA_INTERP = 43, > > + FMT_ATI_TC_555_565_RGBA_INTERP = 44, > > + FMT_ETC1_RGBA_INTERP = 46, > > + FMT_ETC1_RGB = 47, > > + FMT_ETC1_RGBA = 48, > > FMT_DXN = 49, > > - FMT_8_8_8_8_AS_16_16_16_16 = 50, > > - FMT_DXT1_AS_16_16_16_16 = 51, > > - FMT_DXT2_3_AS_16_16_16_16 = 52, > > - FMT_DXT4_5_AS_16_16_16_16 = 53, > > + FMT_2_3_3 = 51, > > FMT_2_10_10_10_AS_16_16_16_16 = 54, > > - FMT_10_11_11_AS_16_16_16_16 = 55, > > - FMT_11_11_10_AS_16_16_16_16 = 56, > > + FMT_10_10_10_2_AS_16_16_16_16 = 55, > > FMT_32_32_32_FLOAT = 57, > > FMT_DXT3A = 58, > > FMT_DXT5A = 
59, > > FMT_CTX1 = 60, > > - FMT_DXT3A_AS_1_1_1_1 = 61, > > }; > > > > enum a2xx_sq_ps_vtx_mode { > > diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c > > b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c > > index 0905ab6..46a7d18 100644 > > --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c > > +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c > > @@ -293,10 +293,10 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct > > fd_tile *tile) > > OUT_PKT3(ring, CP_SET_CONSTANT, 2); > > OUT_RING(ring, CP_REG(REG_A2XX_RB_BLEND_CONTROL)); > > OUT_RING(ring, A2XX_RB_BLEND_CONTROL_COLOR_SRCBLEND(FACTOR_ONE) | > > - > > A2XX_RB_BLEND_CONTROL_COLOR_COMB_FCN(BLEND_DST_PLUS_SRC) | > > + > > A2XX_RB_BLEND_CONTROL_COLOR_COMB_FCN(BLEND2_DST_PLUS_SRC) | > > A2XX_RB_BLEND_CONTROL_COLOR_DESTBLEND(FACTOR_ZERO) | > > A2XX_RB_BLEND_CONTROL_ALPHA_SRCBLEND(FACTOR_ONE) | > > - > > A2XX_RB_BLEND_CONTROL_ALPHA_COMB_FCN(BLEND_DST_PLUS_SRC) | > > + > > A2XX_RB_BLEND_CONTROL_ALPHA_COMB_FCN(BLEND2_DST_PLUS_SRC) | > > A2XX_RB_BLEND_CONTROL_ALPHA_DESTBLEND(FACTOR_ZERO)); > > > >
Re: [Mesa-dev] [PATCH 4/7] freedreno: a2xx: Support TEXTURE_RECT
Should you also expose PIPE_CAP_TEXTURE_RECTANGLE? (Or whatever it's called... I forget.)

On Thu, Jan 25, 2018 at 8:29 AM, Wladimir J. van der Laan wrote:
> Denormalized texture coordinates are required for text rendering in
> GALLIUM_HUD.
>
> Signed-off-by: Wladimir J. van der Laan
> ---
>  src/gallium/drivers/freedreno/a2xx/fd2_compiler.c | 3 ++-
>  src/gallium/drivers/freedreno/a2xx/ir-a2xx.c      | 1 +
>  src/gallium/drivers/freedreno/a2xx/ir-a2xx.h      | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
> b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
> index 2ffd8cd..9f2fc61 100644
> --- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
> +++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
> @@ -791,6 +791,7 @@ translate_tex(struct fd2_compile_context *ctx,
>      instr = ir2_instr_create(next_exec_cf(ctx), IR2_FETCH);
>      instr->fetch.opc = TEX_FETCH;
>      instr->fetch.is_cube = (inst->Texture.Texture == TGSI_TEXTURE_3D);
> +    instr->fetch.is_rect = (inst->Texture.Texture == TGSI_TEXTURE_RECT);
>      assert(inst->Texture.NumOffsets <= 1); // TODO what to do in other
> cases?
>
>      /* save off the tex fetch to be patched later with correct const_idx:
> */
> @@ -802,7 +803,7 @@ translate_tex(struct fd2_compile_context *ctx,
>      reg = add_src_reg(ctx, instr, coord);
>
>      /* blob compiler always sets 3rd component to same as 1st for 2d: */
> -    if (inst->Texture.Texture == TGSI_TEXTURE_2D)
> +    if (inst->Texture.Texture == TGSI_TEXTURE_2D || inst->Texture.Texture
> == TGSI_TEXTURE_RECT)
>          reg->swizzle[2] = reg->swizzle[0];
>
>      /* dst register needs to be marked for sync: */
> diff --git a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
> b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
> index 163c282..3666a7e 100644
> --- a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
> +++ b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
> @@ -341,6 +341,7 @@ static int instr_emit_fetch(struct ir2_instruction *instr,
>      tex->use_comp_lod = 1;
>      tex->use_reg_lod = !instr->fetch.is_cube;
>      tex->sample_location = SAMPLE_CENTER;
> +    tex->tx_coord_denorm = instr->fetch.is_rect;
>
>      if (instr->pred != IR2_PRED_NONE) {
>          tex->pred_select = 1;
> diff --git a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
> b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
> index 36ed204..c4b6c18 100644
> --- a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
> +++ b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
> @@ -74,6 +74,7 @@ struct ir2_instruction {
>              unsigned const_idx;
>              /* texture fetch specific: */
>              bool is_cube : 1;
> +            bool is_rect : 1;
>              /* vertex fetch specific: */
>              unsigned const_idx_sel;
>              enum a2xx_sq_surfaceformat fmt;
> --
> 2.7.4
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Freedreno] [PATCH 1/7] freedreno: a2xx: Update rnndb header
On Thu, Jan 25, 2018 at 8:29 AM, Wladimir J. van der Laan wrote:
> Also update BLEND_ to BLEND2_ opcodes to accommodate.

Are you saying this doesn't compile right now? I would have expected the accompanying change to a2xx.xml.h for that. Perhaps this landed in the wrong commit?

Also it's odd that the formats are so different from what was originally entered. Any opinion on how that happened?

>
> Signed-off-by: Wladimir J. van der Laan
> ---
>  src/gallium/drivers/freedreno/a2xx/a2xx.xml.h | 33
> +++++++++++++--------------------
>  src/gallium/drivers/freedreno/a2xx/fd2_gmem.c |  4 ++--
>  2 files changed, 15 insertions(+), 22 deletions(-)
>
> diff --git a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
> b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
> index 55a4355..279a652 100644
> --- a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
> +++ b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
> @@ -84,13 +84,12 @@ enum a2xx_sq_surfaceformat {
>      FMT_5_5_5_1 = 13,
>      FMT_8_8_8_8_A = 14,
>      FMT_4_4_4_4 = 15,
> -    FMT_10_11_11 = 16,
> -    FMT_11_11_10 = 17,
> +    FMT_8_8_8 = 16,
>      FMT_DXT1 = 18,
>      FMT_DXT2_3 = 19,
>      FMT_DXT4_5 = 20,
> +    FMT_10_10_10_2 = 21,
>      FMT_24_8 = 22,
> -    FMT_24_8_FLOAT = 23,
>      FMT_16 = 24,
>      FMT_16_16 = 25,
>      FMT_16_16_16_16 = 26,
> @@ -106,29 +105,23 @@ enum a2xx_sq_surfaceformat {
>      FMT_32_FLOAT = 36,
>      FMT_32_32_FLOAT = 37,
>      FMT_32_32_32_32_FLOAT = 38,
> -    FMT_32_AS_8 = 39,
> -    FMT_32_AS_8_8 = 40,
> -    FMT_16_MPEG = 41,
> -    FMT_16_16_MPEG = 42,
> -    FMT_8_INTERLACED = 43,
> -    FMT_32_AS_8_INTERLACED = 44,
> -    FMT_32_AS_8_8_INTERLACED = 45,
> -    FMT_16_INTERLACED = 46,
> -    FMT_16_MPEG_INTERLACED = 47,
> -    FMT_16_16_MPEG_INTERLACED = 48,
> +    FMT_ATI_TC_RGB = 39,
> +    FMT_ATI_TC_RGBA = 40,
> +    FMT_ATI_TC_555_565_RGB = 41,
> +    FMT_ATI_TC_555_565_RGBA = 42,
> +    FMT_ATI_TC_RGBA_INTERP = 43,
> +    FMT_ATI_TC_555_565_RGBA_INTERP = 44,
> +    FMT_ETC1_RGBA_INTERP = 46,
> +    FMT_ETC1_RGB = 47,
> +    FMT_ETC1_RGBA = 48,
>      FMT_DXN = 49,
> -    FMT_8_8_8_8_AS_16_16_16_16 = 50,
> -    FMT_DXT1_AS_16_16_16_16 = 51,
> -    FMT_DXT2_3_AS_16_16_16_16 = 52,
> -    FMT_DXT4_5_AS_16_16_16_16 = 53,
> +    FMT_2_3_3 = 51,
>      FMT_2_10_10_10_AS_16_16_16_16 = 54,
> -    FMT_10_11_11_AS_16_16_16_16 = 55,
> -    FMT_11_11_10_AS_16_16_16_16 = 56,
> +    FMT_10_10_10_2_AS_16_16_16_16 = 55,
>      FMT_32_32_32_FLOAT = 57,
>      FMT_DXT3A = 58,
>      FMT_DXT5A = 59,
>      FMT_CTX1 = 60,
> -    FMT_DXT3A_AS_1_1_1_1 = 61,
>  };
>
>  enum a2xx_sq_ps_vtx_mode {
> diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
> b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
> index 0905ab6..46a7d18 100644
> --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
> +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
> @@ -293,10 +293,10 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct
> fd_tile *tile)
>      OUT_PKT3(ring, CP_SET_CONSTANT, 2);
>      OUT_RING(ring, CP_REG(REG_A2XX_RB_BLEND_CONTROL));
>      OUT_RING(ring, A2XX_RB_BLEND_CONTROL_COLOR_SRCBLEND(FACTOR_ONE) |
> -
> A2XX_RB_BLEND_CONTROL_COLOR_COMB_FCN(BLEND_DST_PLUS_SRC) |
> +
> A2XX_RB_BLEND_CONTROL_COLOR_COMB_FCN(BLEND2_DST_PLUS_SRC) |
>              A2XX_RB_BLEND_CONTROL_COLOR_DESTBLEND(FACTOR_ZERO) |
>              A2XX_RB_BLEND_CONTROL_ALPHA_SRCBLEND(FACTOR_ONE) |
> -
> A2XX_RB_BLEND_CONTROL_ALPHA_COMB_FCN(BLEND_DST_PLUS_SRC) |
> +
> A2XX_RB_BLEND_CONTROL_ALPHA_COMB_FCN(BLEND2_DST_PLUS_SRC) |
>              A2XX_RB_BLEND_CONTROL_ALPHA_DESTBLEND(FACTOR_ZERO));
>
>      OUT_PKT3(ring, CP_SET_CONSTANT, 3);
> --
> 2.7.4
>
> ___
> Freedreno mailing list
> freedr...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Mesa-dev] [Mesa-stable] [PATCH] configure.ac: add missing llvm dependencies to .pc files
> Should be used only for gallium-xlib based glx, since it embeds the
> swr/llvmpipe driver.
> ...
...
> There is no LLVM specific code in these - ^^ should not be needed.
>

Correct. This was initially to address the problem for OSMesa, but I realized it was likely an issue for more than just OSMesa. After a bit of debugging I see that I was indeed a bit overzealous on that. Will fix in v2.

> >> +if test "x$enable_osmesa$enable_gallium_osmesa" != xnono; then
> >> +OSMESA_PC_LIB_PRIV="$OSMESA_PC_LIB_PRIV $OSMESA_PC_LIB_PRIV $LLVM_LIBS"
> >
> > I'm assuming the duplicate `$OSMESA_PC_LIB_PRIV` wasn't intended?
>

Will fix in v2.

> These variables have the dependency libs (-lfoo) that the respective
> libraries libGL.so/libGLES*so/etc.
> Then they are stored in the .pc Libs.private section - thus anyone
> static linking said libraries will reuse it.
>

This is indeed the use case here: building a static libGL or static libOSMesa and having proper dependency info available. I'll push a corrected v2 shortly.

Thanks for the review.
[Mesa-dev] [PATCH 6/7] freedreno: a2xx: implement SEQ/SNE instructions
Extend translate_sge_slt to emit these, in analogous fashion but using CNDEv.

Signed-off-by: Wladimir J. van der Laan
---
 src/gallium/drivers/freedreno/a2xx/fd2_compiler.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 9f2fc61..52f0aba 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -829,8 +829,10 @@ translate_tex(struct fd2_compile_context *ctx,
 
 /* SGE(a,b) = GTE((b - a), 1.0, 0.0) */
 /* SLT(a,b) = GTE((b - a), 0.0, 1.0) */
+/* SEQ(a,b) = EQU((b - a), 1.0, 0.0) */
+/* SNE(a,b) = EQU((b - a), 0.0, 1.0) */
 static void
-translate_sge_slt(struct fd2_compile_context *ctx,
+translate_sge_slt_seq_sne(struct fd2_compile_context *ctx,
         struct tgsi_full_instruction *inst, unsigned opc)
 {
     struct ir2_instruction *instr;
@@ -838,6 +840,7 @@ translate_sge_slt(struct fd2_compile_context *ctx,
     struct tgsi_src_register tmp_src;
     struct tgsi_src_register tmp_const;
     float c0, c1;
+    instr_vector_opc_t vopc;
 
     switch (opc) {
     default:
@@ -845,10 +848,22 @@ translate_sge_slt(struct fd2_compile_context *ctx,
     case TGSI_OPCODE_SGE:
         c0 = 1.0;
         c1 = 0.0;
+        vopc = CNDGTEv;
         break;
     case TGSI_OPCODE_SLT:
         c0 = 0.0;
         c1 = 1.0;
+        vopc = CNDGTEv;
+        break;
+    case TGSI_OPCODE_SEQ:
+        c0 = 0.0;
+        c1 = 1.0;
+        vopc = CNDEv;
+        break;
+    case TGSI_OPCODE_SNE:
+        c0 = 1.0;
+        c1 = 0.0;
+        vopc = CNDEv;
         break;
     }
 
@@ -859,7 +874,7 @@ translate_sge_slt(struct fd2_compile_context *ctx,
     add_src_reg(ctx, instr, &inst->Src[0].Register)->flags |= IR2_REG_NEGATE;
     add_src_reg(ctx, instr, &inst->Src[1].Register);
 
-    instr = ir2_instr_create_alu(next_exec_cf(ctx), CNDGTEv, ~0);
+    instr = ir2_instr_create_alu(next_exec_cf(ctx), vopc, ~0);
     add_dst_reg(ctx, instr, &inst->Dst[0].Register);
     /* maybe should re-arrange the syntax some day, but
      * in assembler/disassembler and what ir.c expects
@@ -1057,7 +1072,9 @@ translate_instruction(struct fd2_compile_context *ctx,
         break;
     case TGSI_OPCODE_SLT:
     case TGSI_OPCODE_SGE:
-        translate_sge_slt(ctx, inst, opc);
+    case TGSI_OPCODE_SEQ:
+    case TGSI_OPCODE_SNE:
+        translate_sge_slt_seq_sne(ctx, inst, opc);
         break;
     case TGSI_OPCODE_MAD:
         instr = ir2_instr_create_alu(cf, MULADDv, ~0);
-- 
2.7.4
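For readers following along, the two identities stated in the comments of this patch can be modelled in plain C. This is only a sketch, not driver code: `cnd_e` is a hypothetical scalar stand-in that assumes the EQU-style select returns its first constant when the tested value is exactly 0.0 and the second constant otherwise. The operand order actually used by the a2xx IR may differ.

```c
/* Hypothetical scalar model of an "equal to zero" select:
 * returns c0 when x == 0.0f, otherwise c1 (assumption, see above). */
static float cnd_e(float x, float c0, float c1)
{
    return (x == 0.0f) ? c0 : c1;
}

/* SEQ(a,b) = EQU((b - a), 1.0, 0.0): 1.0 when a equals b. */
static float seq(float a, float b)
{
    return cnd_e(b - a, 1.0f, 0.0f);
}

/* SNE(a,b) = EQU((b - a), 0.0, 1.0): 1.0 when a differs from b. */
static float sne(float a, float b)
{
    return cnd_e(b - a, 0.0f, 1.0f);
}
```

Under that assumption, `seq(2.0f, 2.0f)` yields 1.0 and `sne(2.0f, 2.0f)` yields 0.0, which matches the usual set-on-equal and set-on-not-equal behaviour.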
[Mesa-dev] [PATCH 7/7] freedreno: a2xx: Implement DP2 instruction
Use DOT2ADDv instruction with 0.0f constant add.

Signed-off-by: Wladimir J. van der Laan
---
 src/gallium/drivers/freedreno/a2xx/fd2_compiler.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 52f0aba..ce0b33a 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -987,6 +987,24 @@ translate_trig(struct fd2_compile_context *ctx,
     add_src_reg(ctx, instr, &tmp_src);
 }
 
+static void
+translate_dp2(struct fd2_compile_context *ctx,
+        struct tgsi_full_instruction *inst,
+        unsigned opc)
+{
+    struct tgsi_src_register tmp_const;
+    struct ir2_instruction *instr;
+    /* DP2ADD c,a,b -> dot2(a,b) + c */
+    /* for c we use the constant 0.0 */
+    instr = ir2_instr_create_alu(next_exec_cf(ctx), DOT2ADDv, ~0);
+    get_immediate(ctx, &tmp_const, fui(0.0f));
+    add_dst_reg(ctx, instr, &inst->Dst[0].Register);
+    add_src_reg(ctx, instr, &tmp_const);
+    add_src_reg(ctx, instr, &inst->Src[0].Register);
+    add_src_reg(ctx, instr, &inst->Src[1].Register);
+    add_vector_clamp(inst, instr);
+}
+
 /*
  * Main part of compiler/translator:
  */
@@ -1054,6 +1072,9 @@ translate_instruction(struct fd2_compile_context *ctx,
         instr = ir2_instr_create_alu(cf, ADDv, ~0);
         add_regs_vector_2(ctx, inst, instr);
         break;
+    case TGSI_OPCODE_DP2:
+        translate_dp2(ctx, inst, opc);
+        break;
     case TGSI_OPCODE_DP3:
         instr = ir2_instr_create_alu(cf, DOT3v, ~0);
         add_regs_vector_2(ctx, inst, instr);
-- 
2.7.4
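The arithmetic behind this patch is small enough to write out. The following is a minimal scalar sketch, assuming DOT2ADD computes dot2(a, b) + c as the patch comment says; `struct vec2`, `dot2add` and `dp2` are illustrative names and not part of the driver.

```c
/* Two-component vector; only x and y take part in a DP2. */
struct vec2 {
    float x;
    float y;
};

/* Hypothetical scalar model of DOT2ADD: dot2(a, b) + c. */
static float dot2add(struct vec2 a, struct vec2 b, float c)
{
    return a.x * b.x + a.y * b.y + c;
}

/* DP2 expressed as DOT2ADD with a constant 0.0 addend. */
static float dp2(struct vec2 a, struct vec2 b)
{
    return dot2add(a, b, 0.0f);
}
```

For a = (1, 2) and b = (3, 4) this gives 1*3 + 2*4 + 0 = 11, so adding the zero constant leaves the plain two-component dot product.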
[Mesa-dev] [PATCH 2/7] freedreno: a2xx: Fix fd2_tex_swiz
Compose swizzles using util_format_compose_swizzles instead of the custom code (which somehow had a bug). This makes the GL_ALPHA internal format work.

Signed-off-by: Wladimir J. van der Laan
---
 src/gallium/drivers/freedreno/a2xx/fd2_util.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_util.c b/src/gallium/drivers/freedreno/a2xx/fd2_util.c
index 0bdcfcd..25f2bf4 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_util.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_util.c
@@ -309,14 +309,14 @@ fd2_tex_swiz(enum pipe_format format, unsigned swizzle_r, unsigned swizzle_g,
 {
     const struct util_format_description *desc =
             util_format_description(format);
-    uint8_t swiz[] = {
-            swizzle_r, swizzle_g, swizzle_b, swizzle_a,
-            PIPE_SWIZZLE_0, PIPE_SWIZZLE_1,
-            PIPE_SWIZZLE_1, PIPE_SWIZZLE_1,
-    };
+    unsigned char swiz[4] = {
+        swizzle_r, swizzle_g, swizzle_b, swizzle_a,
+    }, rswiz[4];
 
-    return A2XX_SQ_TEX_3_SWIZ_X(tex_swiz(swiz[desc->swizzle[0]])) |
-            A2XX_SQ_TEX_3_SWIZ_Y(tex_swiz(swiz[desc->swizzle[1]])) |
-            A2XX_SQ_TEX_3_SWIZ_Z(tex_swiz(swiz[desc->swizzle[2]])) |
-            A2XX_SQ_TEX_3_SWIZ_W(tex_swiz(swiz[desc->swizzle[3]]));
+    util_format_compose_swizzles(desc->swizzle, swiz, rswiz);
+
+    return A2XX_SQ_TEX_3_SWIZ_X(tex_swiz(rswiz[0])) |
+        A2XX_SQ_TEX_3_SWIZ_Y(tex_swiz(rswiz[1])) |
+        A2XX_SQ_TEX_3_SWIZ_Z(tex_swiz(rswiz[2])) |
+        A2XX_SQ_TEX_3_SWIZ_W(tex_swiz(rswiz[3]));
 }
-- 
2.7.4
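To illustrate what composing the two swizzles means, here is a self-contained sketch. It is written from my understanding of how such a helper behaves and is not copied from Mesa: `enum swz` and `compose_swizzles` are hypothetical names, and the alpha-only format used in the note below is an assumed example.

```c
#include <assert.h>

/* Hypothetical stand-ins for the PIPE_SWIZZLE_* selectors. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_0, SWZ_1 };

/* Compose a format swizzle with a sampler-view swizzle.
 * A channel selector (X..W) in the view swizzle is looked up in the
 * format swizzle; a constant selector (0 or 1) passes through as is. */
static void compose_swizzles(const unsigned char fmt_swz[4],
                             const unsigned char view_swz[4],
                             unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        assert(view_swz[i] <= SWZ_1);
        out[i] = (view_swz[i] <= SWZ_W) ? fmt_swz[view_swz[i]]
                                        : view_swz[i];
    }
}
```

With an assumed alpha-only format swizzle of {0, 0, 0, X} and an identity view swizzle {X, Y, Z, W}, the result is {0, 0, 0, X}: the colour channels read constant zero and alpha reads the stored channel.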
[Mesa-dev] [PATCH 3/7] freedreno: a2xx: Prevent crash in emit_texture if view is not set
Textures are sometimes updated while the texture view state is unset. Without this change, that causes an assertion failure or a segfault.

Signed-off-by: Wladimir J. van der Laan
---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index 5a1db13..ebe698f 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -125,6 +125,7 @@ emit_texture(struct fd_ringbuffer *ring, struct fd_context *ctx,
 {
     unsigned const_idx = fd2_get_const_idx(ctx, tex, samp_id);
     static const struct fd2_sampler_stateobj dummy_sampler = {};
+    static const struct fd2_pipe_sampler_view dummy_view = {};
     const struct fd2_sampler_stateobj *sampler;
     struct fd2_pipe_sampler_view *view;
 
@@ -134,13 +135,19 @@ emit_texture(struct fd_ringbuffer *ring, struct fd_context *ctx,
     sampler = tex->samplers[samp_id] ?
             fd2_sampler_stateobj(tex->samplers[samp_id]) :
             &dummy_sampler;
-    view = fd2_pipe_sampler_view(tex->textures[samp_id]);
+    view = tex->textures[samp_id] ?
+            fd2_pipe_sampler_view(tex->textures[samp_id]) :
+            &dummy_view;
 
     OUT_PKT3(ring, CP_SET_CONSTANT, 7);
     OUT_RING(ring, 0x0001 + (0x6 * const_idx));
 
     OUT_RING(ring, sampler->tex0 | view->tex0);
-    OUT_RELOC(ring, fd_resource(view->base.texture)->bo, 0, view->fmt, 0);
+    if (view->base.texture)
+        OUT_RELOC(ring, fd_resource(view->base.texture)->bo, 0, view->fmt, 0);
+    else
+        OUT_RING(ring, 0);
+
     OUT_RING(ring, view->tex2);
     OUT_RING(ring, sampler->tex3 | view->tex3);
     OUT_RING(ring, sampler->tex4);
-- 
2.7.4
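The guard this patch adds follows a common pattern: fall back to a zero-initialized dummy object when nothing is bound, and emit a zero word when there is no buffer to relocate. Below is a stripped-down sketch of that pattern, assuming simplified stand-in types; `struct texture`, `struct sampler_view`, `view_or_dummy` and `emit_texture_handle` are hypothetical and not the driver's structures.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct texture {
    unsigned bo_handle;
};

struct sampler_view {
    const struct texture *texture;
    unsigned tex0;
};

/* Zero-initialized fallback used when no view is bound. */
static const struct sampler_view dummy_view = { 0 };

/* Returns the bound view, or the all-zero dummy when the slot is empty. */
static const struct sampler_view *view_or_dummy(const struct sampler_view *bound)
{
    return bound ? bound : &dummy_view;
}

/* Returns the buffer handle to emit, or 0 when the view has no texture. */
static unsigned emit_texture_handle(const struct sampler_view *bound)
{
    const struct sampler_view *view = view_or_dummy(bound);

    return view->texture ? view->texture->bo_handle : 0;
}
```

The sketch keeps the view pointer `const`, so pointing it at a `const` dummy does not discard a qualifier. Whether the driver wants the same is a separate question.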
[Mesa-dev] [PATCH 4/7] freedreno: a2xx: Support TEXTURE_RECT
Denormalized texture coordinates are required for text rendering in GALLIUM_HUD.

Signed-off-by: Wladimir J. van der Laan
---
 src/gallium/drivers/freedreno/a2xx/fd2_compiler.c | 3 ++-
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.c      | 1 +
 src/gallium/drivers/freedreno/a2xx/ir-a2xx.h      | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
index 2ffd8cd..9f2fc61 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
@@ -791,6 +791,7 @@ translate_tex(struct fd2_compile_context *ctx,
     instr = ir2_instr_create(next_exec_cf(ctx), IR2_FETCH);
     instr->fetch.opc = TEX_FETCH;
     instr->fetch.is_cube = (inst->Texture.Texture == TGSI_TEXTURE_3D);
+    instr->fetch.is_rect = (inst->Texture.Texture == TGSI_TEXTURE_RECT);
     assert(inst->Texture.NumOffsets <= 1); // TODO what to do in other cases?
 
     /* save off the tex fetch to be patched later with correct const_idx: */
@@ -802,7 +803,7 @@ translate_tex(struct fd2_compile_context *ctx,
     reg = add_src_reg(ctx, instr, coord);
 
     /* blob compiler always sets 3rd component to same as 1st for 2d: */
-    if (inst->Texture.Texture == TGSI_TEXTURE_2D)
+    if (inst->Texture.Texture == TGSI_TEXTURE_2D || inst->Texture.Texture == TGSI_TEXTURE_RECT)
         reg->swizzle[2] = reg->swizzle[0];
 
     /* dst register needs to be marked for sync: */
diff --git a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
index 163c282..3666a7e 100644
--- a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
+++ b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
@@ -341,6 +341,7 @@ static int instr_emit_fetch(struct ir2_instruction *instr,
     tex->use_comp_lod = 1;
     tex->use_reg_lod = !instr->fetch.is_cube;
     tex->sample_location = SAMPLE_CENTER;
+    tex->tx_coord_denorm = instr->fetch.is_rect;
 
     if (instr->pred != IR2_PRED_NONE) {
         tex->pred_select = 1;
diff --git a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
index 36ed204..c4b6c18 100644
--- a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
+++ b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
@@ -74,6 +74,7 @@ struct ir2_instruction {
             unsigned const_idx;
             /* texture fetch specific: */
             bool is_cube : 1;
+            bool is_rect : 1;
             /* vertex fetch specific: */
             unsigned const_idx_sel;
             enum a2xx_sq_surfaceformat fmt;
-- 
2.7.4
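As background on what "denormalized" means here, the difference between the two coordinate conventions can be sketched for a single axis. This is a simplified model of nearest sampling with clamp-to-edge, under the assumption that the `tx_coord_denorm` bit simply tells the hardware to skip the scale by the texture size; `texel_index` is a hypothetical helper, not a driver function.

```c
/* Convert a texture coordinate to a texel index along one axis.
 * Normalized coordinates span [0, 1) and are scaled by the size;
 * denormalized coordinates are already expressed in texels.
 * The coordinate is assumed to be non-negative. */
static int texel_index(float coord, int size, int denorm)
{
    float texel = denorm ? coord : coord * (float)size;
    int i = (int)texel; /* truncate towards zero */

    if (i < 0)
        i = 0;
    if (i > size - 1)
        i = size - 1; /* clamp to the last texel */
    return i;
}
```

For a 256-texel axis, a normalized coordinate of 0.5 and a denormalized coordinate of 128.0 both select texel 128, which is why a rectangle texture can be addressed directly in pixels.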
[Mesa-dev] [PATCH 5/7] freedreno: a2xx: Compressed textures support
Add support for:
- PIPE_FORMAT_ETC1_RGB8
- PIPE_FORMAT_DXT1_RGB
- PIPE_FORMAT_DXT1_RGBA
- PIPE_FORMAT_DXT3_RGBA
- PIPE_FORMAT_DXT5_RGBA

Signed-off-by: Wladimir J. van der Laan
---
 src/gallium/drivers/freedreno/a2xx/fd2_util.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_util.c b/src/gallium/drivers/freedreno/a2xx/fd2_util.c
index 25f2bf4..60e5c39 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_util.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_util.c
@@ -183,6 +183,17 @@ fd2_pipe2surface(enum pipe_format format)
     case PIPE_FORMAT_R32G32B32A32_FLOAT:
         return FMT_32_32_32_32_FLOAT;
 
+    /* Compressed textures. */
+    case PIPE_FORMAT_ETC1_RGB8:
+        return FMT_ETC1_RGB;
+    case PIPE_FORMAT_DXT1_RGB:
+    case PIPE_FORMAT_DXT1_RGBA:
+        return FMT_DXT1;
+    case PIPE_FORMAT_DXT3_RGBA:
+        return FMT_DXT2_3;
+    case PIPE_FORMAT_DXT5_RGBA:
+        return FMT_DXT4_5;
+
     /* YUV buffers. */
     case PIPE_FORMAT_UYVY:
         return FMT_Cr_Y1_Cb_Y0;
-- 
2.7.4
[Mesa-dev] [PATCH 1/7] freedreno: a2xx: Update rnndb header
Also update BLEND_ to BLEND2_ opcodes to accommodate.

Signed-off-by: Wladimir J. van der Laan
---
 src/gallium/drivers/freedreno/a2xx/a2xx.xml.h | 33 +++++++++++++--------------------
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c |  4 ++--
 2 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
index 55a4355..279a652 100644
--- a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
+++ b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
@@ -84,13 +84,12 @@ enum a2xx_sq_surfaceformat {
     FMT_5_5_5_1 = 13,
     FMT_8_8_8_8_A = 14,
     FMT_4_4_4_4 = 15,
-    FMT_10_11_11 = 16,
-    FMT_11_11_10 = 17,
+    FMT_8_8_8 = 16,
     FMT_DXT1 = 18,
     FMT_DXT2_3 = 19,
     FMT_DXT4_5 = 20,
+    FMT_10_10_10_2 = 21,
     FMT_24_8 = 22,
-    FMT_24_8_FLOAT = 23,
     FMT_16 = 24,
     FMT_16_16 = 25,
     FMT_16_16_16_16 = 26,
@@ -106,29 +105,23 @@ enum a2xx_sq_surfaceformat {
     FMT_32_FLOAT = 36,
     FMT_32_32_FLOAT = 37,
     FMT_32_32_32_32_FLOAT = 38,
-    FMT_32_AS_8 = 39,
-    FMT_32_AS_8_8 = 40,
-    FMT_16_MPEG = 41,
-    FMT_16_16_MPEG = 42,
-    FMT_8_INTERLACED = 43,
-    FMT_32_AS_8_INTERLACED = 44,
-    FMT_32_AS_8_8_INTERLACED = 45,
-    FMT_16_INTERLACED = 46,
-    FMT_16_MPEG_INTERLACED = 47,
-    FMT_16_16_MPEG_INTERLACED = 48,
+    FMT_ATI_TC_RGB = 39,
+    FMT_ATI_TC_RGBA = 40,
+    FMT_ATI_TC_555_565_RGB = 41,
+    FMT_ATI_TC_555_565_RGBA = 42,
+    FMT_ATI_TC_RGBA_INTERP = 43,
+    FMT_ATI_TC_555_565_RGBA_INTERP = 44,
+    FMT_ETC1_RGBA_INTERP = 46,
+    FMT_ETC1_RGB = 47,
+    FMT_ETC1_RGBA = 48,
     FMT_DXN = 49,
-    FMT_8_8_8_8_AS_16_16_16_16 = 50,
-    FMT_DXT1_AS_16_16_16_16 = 51,
-    FMT_DXT2_3_AS_16_16_16_16 = 52,
-    FMT_DXT4_5_AS_16_16_16_16 = 53,
+    FMT_2_3_3 = 51,
     FMT_2_10_10_10_AS_16_16_16_16 = 54,
-    FMT_10_11_11_AS_16_16_16_16 = 55,
-    FMT_11_11_10_AS_16_16_16_16 = 56,
+    FMT_10_10_10_2_AS_16_16_16_16 = 55,
     FMT_32_32_32_FLOAT = 57,
     FMT_DXT3A = 58,
     FMT_DXT5A = 59,
     FMT_CTX1 = 60,
-    FMT_DXT3A_AS_1_1_1_1 = 61,
 };
 
 enum a2xx_sq_ps_vtx_mode {
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index 0905ab6..46a7d18 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -293,10 +293,10 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct fd_tile *tile)
     OUT_PKT3(ring, CP_SET_CONSTANT, 2);
     OUT_RING(ring, CP_REG(REG_A2XX_RB_BLEND_CONTROL));
     OUT_RING(ring, A2XX_RB_BLEND_CONTROL_COLOR_SRCBLEND(FACTOR_ONE) |
-            A2XX_RB_BLEND_CONTROL_COLOR_COMB_FCN(BLEND_DST_PLUS_SRC) |
+            A2XX_RB_BLEND_CONTROL_COLOR_COMB_FCN(BLEND2_DST_PLUS_SRC) |
             A2XX_RB_BLEND_CONTROL_COLOR_DESTBLEND(FACTOR_ZERO) |
             A2XX_RB_BLEND_CONTROL_ALPHA_SRCBLEND(FACTOR_ONE) |
-            A2XX_RB_BLEND_CONTROL_ALPHA_COMB_FCN(BLEND_DST_PLUS_SRC) |
+            A2XX_RB_BLEND_CONTROL_ALPHA_COMB_FCN(BLEND2_DST_PLUS_SRC) |
             A2XX_RB_BLEND_CONTROL_ALPHA_DESTBLEND(FACTOR_ZERO));
 
     OUT_PKT3(ring, CP_SET_CONSTANT, 3);
-- 
2.7.4