Re: [Mesa-dev] swrast gallium provider doesn't work on Solaris SPARC (Mesa 20 with LLVM 10)

2020-05-20 Thread Roland Scheidegger
Oh I missed that sparc is big endian.
Still not sure where the llvm IR would differ but indeed some different
unpacking somewhere could explain things.
(And I'd guess we'd get issues reported for other big endian archs soon
enough too...)
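
To make the suspicion concrete, here is a minimal sketch of how "machine-specific unpacking" can swap channels. The helper names are made up for illustration and are not Mesa functions. Reading a packed 4x8-bit attribute through memory depends on host byte order, while reading it with shifts does not.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read channel i (0..3) of a packed 4x8-bit value
 * by looking at its bytes in memory. The result depends on the host
 * byte order. */
uint8_t channel_by_memory(uint32_t packed, unsigned i)
{
   uint8_t bytes[4];
   memcpy(bytes, &packed, sizeof bytes);
   return bytes[i];
}

/* Hypothetical helper: read channel i by shifting, treating channel 0
 * as the least significant byte. Same result on every host. */
uint8_t channel_by_shift(uint32_t packed, unsigned i)
{
   return (uint8_t)(packed >> (8 * i));
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
   const uint32_t probe = 1;
   uint8_t first;
   memcpy(&first, &probe, 1);
   return first == 1;
}
```

On x86 both helpers agree for every channel; on a big-endian host channel_by_memory() returns the channels in reverse order, which would be one way to end up with a wrong channel order.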

Roland


Am 19.05.20 um 22:00 schrieb Adam Jackson:
> On Wed, 2020-05-13 at 18:27 +0200, Petr Sumbera wrote:
>> Hi,
>>
>> I have difficulties with Mesa update from version Mesa 18 (LLVM 6) where 
>> it used to work.
>>
>> Now with Mesa 20 and LLVM 10 glxgears doesn't show wheels but just some 
>> random rectangles. Please see attached picture.
>>
>> Any idea what could be possibly wrong?
> 
> You have the misfortune to be using a big-endian CPU, and that
> corruption looks quite a lot like the vertex processing stage getting
> channel order wrong. Since the internals of both TGSI and NIR ought to
> be endian-indifferent, I suspect the error would be at the point where you
> acquire the vertex data from the app and need to do machine-specific
> unpacking, but that's just a guess. Dumping the IR generated by each
> should hopefully be instructive.
> 
> It might also be worth trying to build Mesa 20 against LLVM 6 to see if
> that points to anything in LLVM, but that's probably a bit less likely.
> 
> - ajax
> 
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] swrast gallium provider doesn't work on Solaris SPARC (Mesa 20 with LLVM 10)

2020-05-15 Thread Roland Scheidegger
You can also set LP_DEBUG=tgsi_ir instead (but it will only work in
debug builds).

No idea though what could go wrong when using NIR, seems fairly strange
since there's nothing architecture specific in there (and the generated
IR should be fairly similar too especially for simple stuff).

Roland

Am 15.05.20 um 09:42 schrieb Petr Sumbera:
> Seems to be related to:
> 
> llvmpipe: switch to NIR by default
> 
> https://github.com/mesa3d/mesa/commit/e6b2af56cb037e3174d049478e0ad7c7715780e4#diff-ca9de5cfd0347994dacc6eca9816d334
> 
> 
> Following helps me as workaround:
> 
> --- a/src/gallium/drivers/llvmpipe/lp_screen.c
> +++ b/src/gallium/drivers/llvmpipe/lp_screen.c
> @@ -434,8 +434,10 @@ llvmpipe_get_shader_param(struct pipe_screen *screen,
>   struct llvmpipe_screen *lscreen = llvmpipe_screen(screen);
>   if (lscreen->use_tgsi)
>  return PIPE_SHADER_IR_TGSI;
> +#if !defined(__sparc__)
>   else
>  return PIPE_SHADER_IR_NIR;
> +#endif
>    }
> 
>    switch (param) {
> 
> ---
> 
> Any idea?
> 
> Petr
> 
> On 13.05.2020 18:27, Petr Sumbera wrote:
>> Hi,
>>
>> I have difficulties with Mesa update from version Mesa 18 (LLVM 6)
>> where it used to work.
>>
>> Now with Mesa 20 and LLVM 10 glxgears doesn't show wheels but just
>> some random rectangles. Please see attached picture.
>>
>> Any idea what could be possibly wrong?
>>
>> Thank you!
>>
>> Petr
> 



Re: [Mesa-dev] size of LP_MAX_VBUF_SIZE

2020-02-20 Thread Roland Scheidegger
Am 20.02.20 um 02:45 schrieb Dave Airlie:
> Hey,
> 
> Anyone know why LP_MAX_VBUF_SIZE is only 4K?
> 
> Tess submits > 100K verts to the draw pipeline, which starts to split
> them. Due to the 4K size of the above it splits at 50 vertices per vbuf;
> however, it then calls draw_reset_vertex_ids, which iterates over all
> 100K vertices each time.
> 
> I might try fixing the reset, but I wondered why this was only sending
> 50 vertices at a time to the rasterizer.
> 
> Dave.
> 

Dave,

I don't recall, I think this even predates me working on llvmpipe...
That said, I think in general splitting into smaller chunks is done so
things are more cache friendly (though the limit is so low it would fit
into L1 cache even back then...). And probably the overhead of invoking
things multiple times just wasn't all that large compared to the
execution time of the vs (and the setup code in llvmpipe).
I don't know if that was actually measured though at some point, and it
is quite possible the average vertex size got quite a bit larger since
then (hence max vertices per split lower), as everything was geared
towards quite simple apps back then.

So I think increasing the limit is probably quite fine, but splitting
still needs to work correctly.
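
For a rough sense of the numbers in Dave's report, here is a back-of-the-envelope sketch. The 80-byte vertex size is an assumption picked so that a 4K buffer holds roughly 50 vertices; the function names are illustrative only.

```c
#include <stddef.h>

/* Number of whole vertices that fit in one vertex buffer.
 * vertex_bytes must be nonzero. */
size_t verts_per_vbuf(size_t vbuf_bytes, size_t vertex_bytes)
{
   return vbuf_bytes / vertex_bytes;
}

/* Number of vbuf submissions needed for total_verts, rounded up.
 * per_vbuf must be nonzero. */
size_t num_splits(size_t total_verts, size_t per_vbuf)
{
   return (total_verts + per_vbuf - 1) / per_vbuf;
}

/* Vertex visits spent in a reset that walks every vertex once per split. */
unsigned long long reset_visits(size_t total_verts, size_t per_vbuf)
{
   return (unsigned long long)num_splits(total_verts, per_vbuf) * total_verts;
}
```

With 100000 vertices and 50 per split this gives 2000 splits and 200 million vertex visits in the reset alone, so either a larger buffer or a reset that only touches the current chunk would cut that cost substantially.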

Roland


Re: [Mesa-dev] Is it time to stop using the mailing list for patch review?

2019-12-09 Thread Roland Scheidegger
So, I'm guilty of being one of the few still using the mailing list...
But in any case, certainly I can switch to using MRs if everybody
prefers that now (meaning, I'm really indifferent to this).

Roland


Am 10.12.19 um 00:07 schrieb Dylan Baker:
> Hi everyone,
> 
> I think it's time we discussed whether we're going to continue to do patch
> review on the mailing list, or if it should all go through gitlab. I think we
> should stop using the mailing list; here are some reasons:
> 
> 1) Most development is happening on gitlab at this point, patches on the
>    mailing list are often overlooked
> 2) The mailing list bypasses CI which potentially breaks the build
> 3) Probably more reasons I'm forgetting.
> 
> Please discuss,
> Dylan
> 
> 
> 



Re: [Mesa-dev] [PATCH] Revert "draw: revert using correct order for prim decomposition."

2019-11-22 Thread Roland Scheidegger
Sorry for the delay, I've now pushed this.

Roland


Am 21.11.19 um 06:31 schrieb Zebediah Figura:
> Ping?
> 
> On 11/5/19 10:21 AM, Zebediah Figura wrote:
>> This reverts commit f97b731c82afb06cfd6ffebc90a3e098a9a1b308.
>>
>> Closes: 
>> https://gitlab.freedesktop.org/mesa/mesa/issues/250
>> ---
>> Corresponding piglit test patch: 
>> https://patchwork.freedesktop.org/patch/338747/
>>
>> This is my first patch to Mesa. I do not have commit access.
>>
>>  src/gallium/auxiliary/draw/draw_pt_decompose.h | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/draw/draw_pt_decompose.h 
>> b/src/gallium/auxiliary/draw/draw_pt_decompose.h
>> index c4fab6548b..0b2522c08f 100644
>> --- a/src/gallium/auxiliary/draw/draw_pt_decompose.h
>> +++ b/src/gallium/auxiliary/draw/draw_pt_decompose.h
>> @@ -3,8 +3,6 @@
>> const boolean quads_flatshade_last =  \
>>draw->quads_always_flatshade_last; \
>> const boolean last_vertex_last =  \
>> -  !(draw->rasterizer->flatshade &&   \
>> -draw->rasterizer->flatshade_first);
>> -/* FIXME: the draw->rasterizer->flatshade part is really wrong */
>> +  !draw->rasterizer->flatshade_first;
>>  
>>  #include "draw_decompose_tmp.h"
>>
> 
> 


Re: [Mesa-dev] [PATCH] llvmpipe: Check thread creation errors

2019-11-10 Thread Roland Scheidegger
Looks great to me.
Reviewed-by: Roland Scheidegger 
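
The idea of the patch (keep however many threads were created, and never touch a handle that was not) can be sketched on its own roughly as follows. This is not the llvmpipe code: spawn_fn is a hypothetical callback standing in for u_thread_create() or pthread_create(), so that a failure can be simulated.

```c
#define MAX_WORKERS 16

/* Hypothetical spawn callback: returns nonzero on success and 0 on
 * failure, similar to testing the handle returned by a thread-create
 * call. */
typedef int (*spawn_fn)(unsigned index, void *ctx);

struct worker_pool {
   unsigned num_threads;   /* workers that were really started */
};

/* Try to start `wanted` workers and keep however many were created. */
unsigned pool_start(struct worker_pool *pool, unsigned wanted,
                    spawn_fn spawn, void *ctx)
{
   unsigned i;

   if (wanted > MAX_WORKERS)
      wanted = MAX_WORKERS;

   pool->num_threads = wanted;
   for (i = 0; i < wanted; i++) {
      if (!spawn(i, ctx)) {
         pool->num_threads = i;   /* only workers [0, i) exist */
         break;
      }
   }
   return pool->num_threads;
}

/* Stub used to simulate hitting a thread limit: spawning succeeds
 * only while index is below *limit. */
int spawn_with_limit(unsigned index, void *ctx)
{
   const unsigned *limit = ctx;
   return index < *limit;
}
```

Any later teardown would then loop only up to num_threads, so a join is never attempted on a thread that was not created.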

Am 08.11.19 um 23:05 schrieb Nathan Kidd:
> In the case of glibc, pthread_t is internally a pointer.  If
> lp_rast_destroy() passes a 0-value pthread_t to pthread_join(), the
> latter will SEGV dereferencing it.
> 
> pthread_create() can fail if either the user's ulimit -u or Linux
> kernel's /proc/sys/kernel/threads-max is reached.
> 
> Choosing to continue, rather than fail, on theory that it is better to
> run with the one main thread, than not run at all.
> 
> Keeping as many threads as we got, since lack of threads severely
> degrades llvmpipe performance.
> 
> Signed-off-by: Nathan Kidd 
> ---
>  src/gallium/drivers/llvmpipe/lp_rast.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c
> b/src/gallium/drivers/llvmpipe/lp_rast.c
> index d50e92b..ef783ea 100644
> --- a/src/gallium/drivers/llvmpipe/lp_rast.c
> +++ b/src/gallium/drivers/llvmpipe/lp_rast.c
> @@ -867,6 +867,10 @@ create_rast_threads(struct lp_rasterizer *rast)
>       pipe_semaphore_init(&rast->tasks[i].work_done, 0);
>       rast->threads[i] = u_thread_create(thread_function,
>                                          (void *) &rast->tasks[i]);
> +  if (!rast->threads[i]) {
> + rast->num_threads = i; /* previous thread is max */
> + break;
> +  }
> }
>  }
>  -- 1.8.3.1
> 
> 
> 


Re: [Mesa-dev] [PATCH] scons: Fix force_scons parsing.

2019-10-25 Thread Roland Scheidegger
Looks alright to me.
Reviewed-by: Roland Scheidegger 

Am 25.10.19 um 23:12 schrieb Jose Fonseca:
> - Use parsed options instead of using ARGUMENTS directly.
> - Handle case mingw cross compilation.
> ---
>  SConstruct | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/SConstruct b/SConstruct
> index 61a915f7deb..f905189dd9e 100644
> --- a/SConstruct
> +++ b/SConstruct
> @@ -71,9 +71,8 @@ Help(opts.GenerateHelpText(env))
>  ###
>  # Print a deprecation warning for using scons on non-windows
>  
> -if common.host_platform != 'windows':
> -force = ARGUMENTS['force_scons']
> -if force.lower() not in {'false', 'off', 'none', '0', 'n'}:
> +if common.host_platform != 'windows' and env['platform'] != 'windows':
> +if env['force_scons']:
>  print("WARNING: Scons is deprecated for non-windows platforms 
> (including cygwin) "
>"please use meson instead.", file=sys.stderr)
>  else:
> 


Re: [Mesa-dev] [PATCH] gallivm: use fallback code for mul_hi with llvm >= 7.0

2019-10-16 Thread Roland Scheidegger
Am 16.10.19 um 01:20 schrieb Dave Airlie:
> On Fri, 30 Aug 2019 at 00:55, Roland Scheidegger  wrote:
>>
>> Am 29.08.19 um 15:05 schrieb Jose Fonseca:
>>> This change is
>>>
>>>   Reviewed-by: Jose Fonseca 
>>>
>>> Regarding follow up change, do you think the LLVM pattern is sane/doable?
>> Yes, should be doable and not too bad (I did not verify that what we're
>> doing doesn't actually get recognized, since it's theoretically possible
>> some other lowering could produce the pattern, although it seems unlikely).
>> I think though this code isn't hit a lot - it was once used by draw,
>> which is why I noticed the suboptimal code generated and added the
>> optimized version, but nowadays it's just for mulhi, so should be fairly
>> rare in practice.
>>
>>
>>>
>>> If not we should try to ask them to reconsider relying strictly upon
>>> pattern matching.  I get the feeling upstream LLVM is throwing the baby
>>> out with the bathwater with these changes.  I do understand the advantages of
>>> moving away from vendor specific intrinsics, but I think that for things
>>> which have no natural representation on LLVM base IR, they should add a
>>> vendor-agnostic intrinsic, for example a new "llvm.mulhi.*" set of
>>> intrinsics, as narrow pattern matching is bound to produce performance
>>> cliffs nobody will notice.
>> They did add new generic intrinsics for some things, but not this one
>> indeed.
>> I'm not exactly a big fan of this pattern matching in favor of
>> intrinsics either, at least if the patterns are non-trivial...
> 
> Btw In working on something else, I found the padd and psub intrinsic
> paths seem to fail at least on LLVM 8.
You mean the signed sse2/avx2 intrinsics for psub/padd?
At a quick glance it seems like llvm 8 actually already removed both the
signed and unsigned intrinsics in the end (initially only the unsigned
versions were removed, which is a lot more noticeable as it is hit
basically in all tests). I think the llvm specific intrinsics should
work in llvm 8 too.
I'll test this...

Roland


> 
> I don't have an example test to show though.
> 
> Dave.
> 


Re: [Mesa-dev] [PATCH] llvmpipe: increase max texture size to 2GB

2019-10-10 Thread Roland Scheidegger
Am 10.10.19 um 20:52 schrieb Jose Fonseca:
> Sounds great.
> 
> Reviewed-by: Jose Fonseca 
> 
> BTW, it's not really difficult to do gather with unsigned offsets: add
> 0x8000 to the base, subtract 0x800 to the offsets, and use the
> signed gather.   If the cost of doing so is significant, we could do
> this just for large textures, by adding a bit per texture to the shader key.
Right, that should work. I guess the cost should be insignificant
(although might make debugging harder?). Theoretically the base address
adjustment could even be done outside the shader (though that would be
looking a bit odd, in any case it's just a single scalar add per jit
invocation), and the offset adjustment could be done while the offsets
are still vectorized.
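
The bias trick can be checked with plain integer arithmetic. The sketch below assumes a bias of 2^31 (0x80000000), which is what a signed 32-bit offset needs; the constants in the quoted message appear shortened in this archive. Addresses are modelled as 64-bit integers and all names are illustrative.

```c
#include <stdint.h>

#define BIAS 0x80000000u   /* 2^31, assumed bias for 32-bit offsets */

/* Address a gather with a signed 32-bit offset would access. */
uint64_t signed_gather_addr(uint64_t base, int32_t offset)
{
   return base + (uint64_t)(int64_t)offset;
}

/* Map an unsigned offset onto the signed range by subtracting the bias.
 * Done in 64 bit so the result always fits in int32_t. */
int32_t bias_offset(uint32_t offset)
{
   return (int32_t)((int64_t)offset - (int64_t)BIAS);
}

/* Unsigned-offset access built from the signed one: bias the base up
 * and the offset down by the same amount. */
uint64_t unsigned_gather_addr(uint64_t base, uint32_t offset)
{
   return signed_gather_addr(base + BIAS, bias_offset(offset));
}
```

Since the base adjustment is one scalar add and the offset adjustment one vector subtract, this matches the expectation above that the cost should be insignificant.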

Roland


> 
> Jose
> 
> 
> 
> *From:* srol...@vmware.com 
> *Sent:* Thursday, October 10, 2019 19:18
> *To:* Jose Fonseca ; mesa-dev@lists.freedesktop.org
> 
> *Cc:* Roland Scheidegger 
> *Subject:* [PATCH] llvmpipe: increase max texture size to 2GB
>  
> From: Roland Scheidegger 
> 
> The 1GB limit was arbitrary, increase this to 2GB (which is the max
> possible without code changes).
> ---
>  src/gallium/drivers/llvmpipe/lp_limits.h | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_limits.h
> b/src/gallium/drivers/llvmpipe/lp_limits.h
> index c2808162c78..569179ecdf4 100644
> --- a/src/gallium/drivers/llvmpipe/lp_limits.h
> +++ b/src/gallium/drivers/llvmpipe/lp_limits.h
> @@ -43,7 +43,11 @@
>  /**
>   * Max texture sizes
>   */
> -#define LP_MAX_TEXTURE_SIZE (1 * 1024 * 1024 * 1024ULL)  /* 1GB for now */
> +/**
> + * 2GB is the actual max currently (we always use 32bit offsets, and both
> + * llvm GEP as well as avx2 gather use signed offsets).
> + */
> +#define LP_MAX_TEXTURE_SIZE (2 * 1024 * 1024 * 1024ULL)
>  #define LP_MAX_TEXTURE_2D_LEVELS 14  /* 8K x 8K for now */
>  #define LP_MAX_TEXTURE_3D_LEVELS 12  /* 2K x 2K x 2K for now */
>  #define LP_MAX_TEXTURE_CUBE_LEVELS 14  /* 8K x 8K for now */
> -- 
> 2.17.1
> 


Re: [Mesa-dev] [PATCH] gallivm: disable accurate cube corner for integer textures.

2019-08-29 Thread Roland Scheidegger
Am 29.08.19 um 22:06 schrieb Dave Airlie:
> From: Dave Airlie 
> 
> Bugzilla: 
> https://bugs.freedesktop.org/show_bug.cgi?id=111511
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> index adb6adf143a..69dba78ac8a 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> @@ -1039,6 +1039,10 @@ lp_build_sample_image_linear(struct 
> lp_build_sample_context *bld,
>  
> accurate_cube_corners = ACCURATE_CUBE_CORNERS && seamless_cube_filter;
>  
> +   /* disable accurate cube corners for integer textures. */
> +   if (is_gather && 
> util_format_is_pure_integer(bld->static_texture_state->format))
> +  accurate_cube_corners = FALSE;

I think you should drop the is_gather condition - it would crash all the
same if it ends up here (which it shouldn't as the texture would be
incomplete in this case).

So just accurate_cube_corners = ACCURATE_CUBE_CORNERS &&
seamless_cube_filter && !util_format_is_pure_integer()
(maybe with a comment that we should only end up with the linear image
path here in case of gather).
With that fixed,
Reviewed-by: Roland Scheidegger 


> lp_build_extract_image_sizes(bld,
>  &bld->int_size_bld,
>  bld->int_coord_type,
> 


Re: [Mesa-dev] [PATCH] gallivm: use fallback code for mul_hi with llvm >= 7.0

2019-08-29 Thread Roland Scheidegger
Am 29.08.19 um 15:05 schrieb Jose Fonseca:
> This change is 
> 
>   Reviewed-by: Jose Fonseca 
> 
> Regarding follow up change, do you think the LLVM pattern is sane/doable?
Yes, should be doable and not too bad (I did not verify that what we're
doing doesn't actually get recognized, since it's theoretically possible
some other lowering could produce the pattern, although it seems unlikely).
I think though this code isn't hit a lot - it was once used by draw,
which is why I noticed the suboptimal code generated and added the
optimized version, but nowadays it's just for mulhi, so should be fairly
rare in practice.
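
For reference, what mul_hi computes per lane is just the upper half of a widened product. A scalar sketch follows; it is not the vector pattern LLVM matches, and the function names are made up.

```c
#include <stdint.h>

/* High 32 bits of an unsigned 32x32 -> 64 bit product. */
uint32_t mulhi_u32(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of a signed 32x32 -> 64 bit product.
 * Right-shifting a negative value is implementation-defined in C;
 * this assumes the usual arithmetic shift. */
int32_t mulhi_s32(int32_t a, int32_t b)
{
   return (int32_t)(((int64_t)a * b) >> 32);
}
```

The difficulty discussed here is only about getting the backend to turn the widened vector form into a single multiply instruction; the arithmetic itself is this simple.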


> 
> If not we should try to ask them to reconsider relying strictly upon
> pattern matching.  I get the feeling upstream LLVM is throwing the baby
> out with the bathwater with these changes.  I do understand the advantages of
> moving away from vendor specific intrinsics, but I think that for things
> which have no natural representation on LLVM base IR, they should add a
> vendor-agnostic intrinsic, for example a new "llvm.mulhi.*" set of
> intrinsics, as narrow pattern matching is bound to produce performance
> cliffs nobody will notice.
They did add new generic intrinsics for some things, but not this one
indeed.
I'm not exactly a big fan of this pattern matching in favor of
intrinsics either, at least if the patterns are non-trivial...

Roland



> Jose
> 
> 
> *From:* srol...@vmware.com 
> *Sent:* Wednesday, August 28, 2019 20:37
> *To:* mesa-dev@lists.freedesktop.org ;
> Jose Fonseca ; airl...@freedesktop.org
> 
> *Cc:* Roland Scheidegger 
> *Subject:* [PATCH] gallivm: use fallback code for mul_hi with llvm >= 7.0
>  
> From: Roland Scheidegger 
> 
> LLVM 7.0 ditched the pmulu intrinsics.
> This is only a trivial patch to use the fallback code instead.
> It'll likely produce atrocious code since the pattern doesn't match what
> llvm itself uses in its autoupgrade paths, hence the pattern won't be
> recognized.
> 
> Should fix https://bugs.freedesktop.org/show_bug.cgi?id=111496
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> index c4931c0b230..f1866c6625f 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> @@ -1169,8 +1169,13 @@ lp_build_mul_32_lohi_cpu(struct lp_build_context
> *bld,
>  * https://llvm.org/bugs/show_bug.cgi?id=30845
>  * So, whip up our own code, albeit only for length 4 and 8 (which
>  * should be good enough)...
> +    * FIXME: For llvm >= 7.0 we should match the autoupgrade pattern
> +    * (bitcast/and/mul/shuffle for unsigned, bitcast/shl/ashr/mul/shuffle
> +    * for signed), which the fallback code does not, without this llvm
> +    * will likely still produce atrocious code.
>  */
> -   if ((bld->type.length == 4 || bld->type.length == 8) &&
> +   if (HAVE_LLVM < 0x0700 &&
> +   (bld->type.length == 4 || bld->type.length == 8) &&
>     ((util_cpu_caps.has_sse2 && (bld->type.sign == 0)) ||
>  util_cpu_caps.has_sse4_1)) {
>    const char *intrinsic = NULL;
> -- 
> 2.17.1
> 


Re: [Mesa-dev] [PATCH 4/4] scons: Make GCC builds stricter.

2019-08-27 Thread Roland Scheidegger
Am 27.08.19 um 12:57 schrieb Jose Fonseca:
> Uses some of the same -Werror options used by Meson, as suggested by
> Michel Dänzer.
> ---
>  scons/gallium.py | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/scons/gallium.py b/scons/gallium.py
> index 21197c8d0d1..2eff4174257 100755
> --- a/scons/gallium.py
> +++ b/scons/gallium.py
> @@ -473,7 +473,10 @@ def generate(env):
>  '-fmessage-length=0', # be nice to Eclipse
>  ]
>  cflags += [
> -'-Wmissing-prototypes',
> +'-Werror=implicit-function-declaration',
> +'-Werror=missing-prototypes',
> +'-Werror=return-type',
> +'-Werror=incompatible-pointer-types',
>  '-std=gnu99',
>      ]
>  if icc:
> 

For the series:
Reviewed-by: Roland Scheidegger 

Re: [Mesa-dev] [PATCH] gallivm: fix issue with AtomicCmpXchg wrapper on llvm 3.5-3.8

2019-08-02 Thread Roland Scheidegger
Am 02.08.19 um 18:54 schrieb Brian Paul:
> On 08/02/2019 10:36 AM, srol...@vmware.com wrote:
>> From: Roland Scheidegger 
>>
>> These versions still need wrapper but already have both success and
>> failure ordering.
>> (Compile tested on llvm 3.7, llvm 3.8.)
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=02
>> ---
>>   src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 16 +++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> index 79d10293e80..723c84d57c2 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> @@ -822,15 +822,29 @@ static llvm::AtomicOrdering
>> mapFromLLVMOrdering(LLVMAtomicOrdering Ordering) {
>>  llvm_unreachable("Invalid LLVMAtomicOrdering value!");
>>   }
>>   +#if HAVE_LLVM < 0x305
>>   LLVMValueRef LLVMBuildAtomicCmpXchg(LLVMBuilderRef B, LLVMValueRef Ptr,
>>   LLVMValueRef Cmp, LLVMValueRef New,
>>   LLVMAtomicOrdering SuccessOrdering,
>>   LLVMAtomicOrdering FailureOrdering,
>>   LLVMBool SingleThread)
>>   {
>> -   /* LLVM 3.8 doesn't have a second ordering and uses old
>> SynchronizationScope enum */
>> +   /* LLVM < 3.5 doesn't have a second ordering and uses old
>> SynchronizationScope enum */
>>  return
>> llvm::wrap(llvm::unwrap(B)->CreateAtomicCmpXchg(llvm::unwrap(Ptr),
>> llvm::unwrap(Cmp),
>>    
>> llvm::unwrap(New), mapFromLLVMOrdering(SuccessOrdering),
>>    
>> SingleThread ? llvm::SynchronizationScope::SingleThread :
>> llvm::SynchronizationScope::CrossThread));
>>   }
>> +#else
>> +LLVMValueRef LLVMBuildAtomicCmpXchg(LLVMBuilderRef B, LLVMValueRef Ptr,
>> +    LLVMValueRef Cmp, LLVMValueRef New,
>> +    LLVMAtomicOrdering SuccessOrdering,
>> +    LLVMAtomicOrdering FailureOrdering,
>> +    LLVMBool SingleThread)
>> +{
>> +   return
>> llvm::wrap(llvm::unwrap(B)->CreateAtomicCmpXchg(llvm::unwrap(Ptr),
>> llvm::unwrap(Cmp),
>> + 
>> llvm::unwrap(New), mapFromLLVMOrdering(SuccessOrdering),
>> + 
>> mapFromLLVMOrdering(FailureOrdering),
>> + 
>> SingleThread ? llvm::SynchronizationScope::SingleThread :
>> llvm::SynchronizationScope::CrossThread));
>> +}
>> +#endif
>>   #endif
>>
> 
> Could the #if / #endif logic be moved into the body of
> LLVMBuildAtomicCmpXchg() so the whole function isn't duplicated?
Ah yes sure. Somehow I didn't think of that...
Will change this before submit.
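
Brian's suggestion amounts to something like the following shape. This is a sketch in plain C with hypothetical stand-ins: API_VERSION, build_one_ordering() and build_two_orderings() are placeholders, not LLVM or Mesa names.

```c
#ifndef API_VERSION
#define API_VERSION 0x0308   /* hypothetical version macro */
#endif

/* Hypothetical stand-in for the old builder call with one ordering. */
int build_one_ordering(int success)
{
   return success;
}

/* Hypothetical stand-in for the newer builder call with two orderings. */
int build_two_orderings(int success, int failure)
{
   return success * 16 + failure;
}

/* One wrapper with one signature; only the call inside differs by
 * version. */
int build_cmpxchg(int success, int failure)
{
#if API_VERSION < 0x0305
   (void)failure;   /* older API has no separate failure ordering */
   return build_one_ordering(success);
#else
   return build_two_orderings(success, failure);
#endif
}
```

Keeping the conditional inside the body means the prototype and the parameter list exist only once, so they cannot drift apart between the two variants.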

Roland


> 
> Other than that,
> Reviewed-by: Brian Paul 


Re: [Mesa-dev] [AppVeyor] mesa master #11902 failed

2019-07-18 Thread Roland Scheidegger
Am 16.07.19 um 20:55 schrieb AppVeyor:
> 
>   Build mesa 11902 failed
> Commit 856e84083e by Rob Clark  on
> 7/15/2019 4:05 PM:
> mesa/st: add sampler uniforms
> 
> Add sampler uniforms for the UV plane(s), so driver can count the
> uniforms and get the correct sampler count.
> 
> Fixes lowered YUV on a6xx which actually wants to know # of samplers.
> 
> Signed-off-by: Rob Clark
> Reviewed-by: Kristian H. Kristensen
> Reviewed-by: Eric Anholt
>

Apparently this commit broke windows builds...

Roland

Re: [Mesa-dev] [PATCH] gallivm: Improve lp_build_rcp_refine.

2019-06-25 Thread Roland Scheidegger
Looks good to me, albeit it's potentially minimally slower, so I'm
wondering if the higher precision is actually useful?
I guess though the last two steps could use lp_build_fmuladd?
Reviewed-by: Roland Scheidegger 
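
For reference, the two refinement forms from the patch, written as scalar C. This is a sketch only; the real code builds LLVM IR on vectors and the function names here are made up.

```c
#include <math.h>   /* INFINITY, for trying the zero case */

/* Old form: x1 = x * (2 - a * x) */
float rcp_refine_old(float a, float x)
{
   return x * (2.0f - a * x);
}

/* New form: x1 = x + x * (1 - a * x).
 * The final multiply and add have the shape x * e + x, which is the
 * part that could map onto a fused multiply-add. */
float rcp_refine_new(float a, float x)
{
   float e = 1.0f - a * x;   /* residual error of the estimate */
   return x + x * e;
}
```

Calling rcp_refine_new(0.0f, INFINITY) also shows the caveat from the code comment in the patch: 0 * Inf is NaN, so the refined result is NaN rather than Inf.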



Am 25.06.19 um 11:17 schrieb Jose Fonseca:
> Use the alternative more accurate expression from
> https://en.wikipedia.org/wiki/Division_algorithm#Newton%E2%80%93Raphson_division
> 
> Tested by enabling this code path, and running gloss mesa demo.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> index 02fb81afe51..8aa5931eb69 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> @@ -2707,11 +2707,11 @@ lp_build_sqrt(struct lp_build_context *bld,
>  /**
>   * Do one Newton-Raphson step to improve reciprocate precision:
>   *
> - *   x_{i+1} = x_i * (2 - a * x_i)
> + *   x_{i+1} = x_i + x_i * (1 - a * x_i)
>   *
>   * XXX: Unfortunately this won't give IEEE-754 conformant results for 0 or
>   * +/-Inf, giving NaN instead.  Certain applications rely on this behavior,
> - * such as Google Earth, which does RCP(RSQRT(0.0) when drawing the Earth's
> + * such as Google Earth, which does RCP(RSQRT(0.0)) when drawing the Earth's
>   * halo. It would be necessary to clamp the argument to prevent this.
>   *
>   * See also:
> @@ -2724,12 +2724,13 @@ lp_build_rcp_refine(struct lp_build_context *bld,
>  LLVMValueRef rcp_a)
>  {
> LLVMBuilderRef builder = bld->gallivm->builder;
> -   LLVMValueRef two = lp_build_const_vec(bld->gallivm, bld->type, 2.0);
> LLVMValueRef res;
>  
> res = LLVMBuildFMul(builder, a, rcp_a, "");
> -   res = LLVMBuildFSub(builder, two, res, "");
> +   res = LLVMBuildFSub(builder, bld->one, res, "");
> +
> res = LLVMBuildFMul(builder, rcp_a, res, "");
> +   res = LLVMBuildFAdd(builder, rcp_a, res, "");
>  
> return res;
>  }
> 


Re: [Mesa-dev] [PATCH] llvmpipe: make remove_shader_variant static.

2019-06-20 Thread Roland Scheidegger
Reviewed-by: Roland Scheidegger 

Am 19.06.19 um 22:47 schrieb Dave Airlie:
> From: Dave Airlie 
> 
> this isn't used outside this file.
> ---
>  src/gallium/drivers/llvmpipe/lp_state_fs.c | 2 +-
>  src/gallium/drivers/llvmpipe/lp_state_fs.h | 4 
>  2 files changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c 
> b/src/gallium/drivers/llvmpipe/lp_state_fs.c
> index ab285bed1ca..b05997a3aab 100644
> --- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
> +++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
> @@ -3023,7 +3023,7 @@ llvmpipe_bind_fs_state(struct pipe_context *pipe, void 
> *fs)
>   * Remove shader variant from two lists: the shader's variant list
>   * and the context's variant list.
>   */
> -void
> +static void
>  llvmpipe_remove_shader_variant(struct llvmpipe_context *lp,
> struct lp_fragment_shader_variant *variant)
>  {
> diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.h 
> b/src/gallium/drivers/llvmpipe/lp_state_fs.h
> index 28eccde17f8..dc04df8bd94 100644
> --- a/src/gallium/drivers/llvmpipe/lp_state_fs.h
> +++ b/src/gallium/drivers/llvmpipe/lp_state_fs.h
> @@ -145,8 +145,4 @@ struct lp_fragment_shader
>  void
>  lp_debug_fs_variant(const struct lp_fragment_shader_variant *variant);
>  
> -void
> -llvmpipe_remove_shader_variant(struct llvmpipe_context *lp,
> -   struct lp_fragment_shader_variant *variant);
> -
>  #endif /* LP_STATE_FS_H_ */
> 


Re: [Mesa-dev] [PATCH] gallivm: fix default cbuf info.

2019-05-27 Thread Roland Scheidegger
Am 27.05.19 um 11:39 schrieb Juan A. Suarez Romero:
> On Fri, 2019-05-24 at 03:08 +0200, srol...@vmware.com wrote:
>> From: Roland Scheidegger 
>>
>> The default null_output really needs to be static, otherwise the values
>> we'll eventually get later are doubly random (they are not initialized,
>> and even if they were it's a pointer to a local stack variable).
>> VMware bug 2349556.
> 
> 
> Shouldn't this be CC to @stable ?
I forgot to mention this, but it should not actually be an issue in the
public branch, since that part of the information gathered there isn't
actually used by llvmpipe, hence if it contains garbage or not doesn't
matter. So there isn't really any need for stable.
But we have a branch where llvmpipe uses it.
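
A stripped-down sketch of why the array has to be static. The struct and function names are placeholders, not the gallivm types.

```c
#define NUM_BUFS 8

/* Placeholder for the per-channel info struct. */
struct channel_info {
   unsigned file;
   unsigned swizzle;
};

/* Placeholder for the shader info that keeps pointers to defaults. */
struct shader_info {
   const struct channel_info *cbuf[NUM_BUFS];
};

void init_default_cbuf(struct shader_info *info)
{
   /* static: the storage lives for the whole program and is
    * zero-initialized. Without static this would be an uninitialized
    * stack array, and the stored pointers would dangle as soon as the
    * function returns. */
   static const struct channel_info null_output[4];
   unsigned i;

   for (i = 0; i < NUM_BUFS; ++i)
      info->cbuf[i] = null_output;
}
```

After the call, every cbuf entry points at the same zeroed array and stays valid for as long as the caller keeps the info struct around.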

Roland

> 
> 
>> ---
>>  src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c 
>> b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c
>> index b4e3c2fbc8..9fc9b8c77e 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c
>> @@ -608,7 +608,7 @@ finished:
>>  */
>>  
>> for (index = 0; index < PIPE_MAX_COLOR_BUFS; ++index) {
>> -  const struct lp_tgsi_channel_info null_output[4];
>> +  static const struct lp_tgsi_channel_info null_output[4];
>>info->cbuf[index] = null_output;
>> }
>>  
> 
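
The lifetime problem described in the commit message can be shown with a
small standalone sketch. The names channel_info, shader_info, MAX_COLOR_BUFS
and init_default_cbufs are made up for this illustration and are not the
gallivm definitions; only the static-versus-automatic storage point is taken
from the patch.

```c
#include <stddef.h>

#define MAX_COLOR_BUFS 8   /* hypothetical stand-in for PIPE_MAX_COLOR_BUFS */

struct channel_info {      /* hypothetical stand-in for lp_tgsi_channel_info */
   unsigned file;
   unsigned swizzle;
};

struct shader_info {
   const struct channel_info *cbuf[MAX_COLOR_BUFS];
};

void
init_default_cbufs(struct shader_info *info)
{
   for (size_t i = 0; i < MAX_COLOR_BUFS; ++i) {
      /* static storage: zero-initialized, a single instance, and still
       * valid after this function returns.  Without "static" this would
       * be an uninitialized stack array, and info->cbuf[i] would dangle
       * as soon as the enclosing block is left. */
      static const struct channel_info null_output[4];
      info->cbuf[i] = null_output;
   }
}
```

Every cbuf entry then points at the same zeroed array, which is safe to read
at any later point.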


Re: [Mesa-dev] [PATCH 5/5] gallium/util: fix two MSVC compiler warnings

2019-05-07 Thread Roland Scheidegger
For the series:
Reviewed-by: Roland Scheidegger 


Am 04.05.19 um 18:07 schrieb Brian Paul:
> Remove stray const qualifier.
> s/unsigned/enum tgsi_semantic/
> ---
>  src/gallium/auxiliary/util/u_format_zs.h  | 2 +-
>  src/gallium/auxiliary/util/u_simple_shaders.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_format_zs.h 
> b/src/gallium/auxiliary/util/u_format_zs.h
> index 160919d..bed3c51 100644
> --- a/src/gallium/auxiliary/util/u_format_zs.h
> +++ b/src/gallium/auxiliary/util/u_format_zs.h
> @@ -113,7 +113,7 @@ void
>  util_format_z24_unorm_s8_uint_pack_s_8uint(uint8_t *dst_row, unsigned 
> dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, 
> unsigned height);
>  
>  void
> -util_format_z24_unorm_s8_uint_pack_separate(uint8_t *dst_row, unsigned 
> dst_stride, const uint32_t *z_src_row, unsigned z_src_stride, const uint8_t 
> *s_src_row, unsigned s_src_stride, const unsigned width, unsigned height);
> +util_format_z24_unorm_s8_uint_pack_separate(uint8_t *dst_row, unsigned 
> dst_stride, const uint32_t *z_src_row, unsigned z_src_stride, const uint8_t 
> *s_src_row, unsigned s_src_stride, unsigned width, unsigned height);
>  
>  void
>  util_format_s8_uint_z24_unorm_unpack_z_float(float *dst_row, unsigned 
> dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, 
> unsigned height);
> diff --git a/src/gallium/auxiliary/util/u_simple_shaders.c 
> b/src/gallium/auxiliary/util/u_simple_shaders.c
> index 4046ab1..d62a655 100644
> --- a/src/gallium/auxiliary/util/u_simple_shaders.c
> +++ b/src/gallium/auxiliary/util/u_simple_shaders.c
> @@ -117,8 +117,8 @@ util_make_vertex_passthrough_shader_with_so(struct 
> pipe_context *pipe,
>  
>  void *util_make_layered_clear_vertex_shader(struct pipe_context *pipe)
>  {
> -   const unsigned semantic_names[] = {TGSI_SEMANTIC_POSITION,
> -  TGSI_SEMANTIC_GENERIC};
> +   const enum tgsi_semantic semantic_names[] = {TGSI_SEMANTIC_POSITION,
> +TGSI_SEMANTIC_GENERIC};
> const unsigned semantic_indices[] = {0, 0};
>  
> return util_make_vertex_passthrough_shader_with_so(pipe, 2, 
> semantic_names,
> 


Re: [Mesa-dev] [PATCH] llvmpipe: init some vars to NULL to silence MinGW compiler warnings

2019-05-02 Thread Roland Scheidegger
Reviewed-by: Roland Scheidegger 

Am 01.05.19 um 18:48 schrieb Brian Paul:
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_format_s3tc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_s3tc.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_format_s3tc.c
> index 9561c34..90b2be9 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_format_s3tc.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_format_s3tc.c
> @@ -2191,7 +2191,7 @@ lp_build_fetch_s3tc_rgba_aos(struct gallivm_state 
> *gallivm,
>rgba = LLVMGetUndef(i128_vectype);
>  
>for (count = 0; count < n / 4; count++) {
> - LLVMValueRef colors, codewords, alpha_lo, alpha_hi;
> + LLVMValueRef colors, codewords, alpha_lo = NULL, alpha_hi = NULL;
>  
>   i4 = lp_build_extract_range(gallivm, i, count * 4, 4);
>   j4 = lp_build_extract_range(gallivm, j, count * 4, 4);
> @@ -2230,7 +2230,7 @@ lp_build_fetch_s3tc_rgba_aos(struct gallivm_state 
> *gallivm,
>rgba = LLVMBuildBitCast(builder, rgba, i8_vectype, "");
> }
> else {
> -  LLVMValueRef colors, codewords, alpha_lo, alpha_hi;
> +  LLVMValueRef colors, codewords, alpha_lo = NULL, alpha_hi = NULL;
>  
>lp_build_gather_s3tc(gallivm, n, format_desc, &colors, &codewords,
> &alpha_lo, &alpha_hi, base_ptr, offset);
> 


Re: [Mesa-dev] Low interpolation precision for 8 bit textures using llvmpipe

2019-04-15 Thread Roland Scheidegger
Am 15.04.19 um 13:55 schrieb Dominik Drees:
> On 4/12/19 5:32 PM, Roland Scheidegger wrote:
>> Am 12.04.19 um 14:34 schrieb Dominik Drees:
>>> Hi Roland!
>>>
>>> On 4/11/19 8:18 PM, Roland Scheidegger wrote:
>>>> What version of mesa are you using?
>>> The original results were generated using version 19.0.2 (from the arch
>>> linux repositories), but I got the same results using the current git
>>> version (98934e6aa19795072a353dae6020dafadc76a1e3).
>> Alright, both of these would use the GALLIVM_PERF var.
>>
>>>> The debug flags were changed a while ago (so that those perf tweaks can
>>>> be disabled on release builds too), it needs to be either:
>>>> GALLIVM_PERF=no_rho_approx,no_brilinear,no_quad_lod
>>>> or easier
>>>> GALLIVM_PERF=no_filter_hacks (which disables these 3 things above
>>>> together)
>>>>
>>>> Although all of that only really affects filtering with mipmaps (not
>>>> sure if you do?).
>>> Using GALLIVM_PERF does not make a difference, either, but that should
>>> be expected because I'm not using mipmaps, just "regular" linear
>>> filtering (GL_LINEAR).
>>>>
>>>>
>>>> (more below)
>>> See my responses below as well.
>>>>
>>>>
>>>> Am 11.04.19 um 18:00 schrieb Dominik Drees:
>>>>> Running with the suggested flags in the environment does not change the
>>>>> result for the test case I described below. The results with and without
>>>>> the environment variables set are pixel-wise equal.
>>>>>
>>>>> By the way, if this is of interest: for GL_NEAREST sampling the results
>>>>> from hardware and llvmpipe are equal as well.
>>>>>
>>>>> Best,
>>>>> Dominik
>>>>>
>>>>> On 4/11/19 4:36 PM, Ilia Mirkin wrote:
>>>>>> llvmpipe takes a number of shortcuts in the interest of speed which
>>>>>> cause inaccurate texturing. Try running with
>>>>>>
>>>>>> GALLIVM_DEBUG=no_rho_approx,no_brilinear,no_quad_lod
>>>>>>
>>>>>> and see if the issue still occurs.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>      -ilia
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 11, 2019 at 8:30 AM Dominik Drees 
>>>>>> wrote:
>>>>>>>
>>>>>>> Hello, everyone!
>>>>>>>
>>>>>>> I have a question regarding the interpolation precision of llvmpipe.
>>>>>>> Feel free to redirect me to somewhere else if this is not the right
>>>>>>> place to ask. Consider the following scenario: In a fragment shader we
>>>>>>> are sampling from a 16x16, 8 bit texture with values between 0 and 3
>>>>>>> using linear interpolation. Then we write white to the screen if the
>>>>>>> sampled value is > 1/255 and black otherwise. The output looks very
>>>>>>> different when rendered with llvmpipe compared to the result
>>>>>>> produced by
>>>>>>> rendering hardware (for both intel (mesa i965) and nvidia (proprietary
>>>>>>> driver)).
>>>>>>>
>>>>>>> I've uploaded exemplary output images here
>>>>>>> (https://imgur.com/a/D1udpez)
>>>>>>>
>>>>>>>
>>>>>>> and the corresponding fragment shader here
>>>>>>> (https://pastebin.com/pa808Req).
>>>>>>>
>>>> The shader looks iffy to me, how do you use that vec4 in the if clause?
>>>>
>>>>
>>>>>>>
>>>>>>> My hypothesis is that llvmpipe (in contrast to hardware) only uses
>>>>>>> 8 bit
>>>>>>> for the interpolation computation when r

Re: [Mesa-dev] [PATCH 1/3] llvmpipe: add lp_fence_timedwait() helper

2019-04-12 Thread Roland Scheidegger
Looks correct to me.
For the series,
Reviewed-by: Roland Scheidegger 

Am 11.04.19 um 18:05 schrieb Emil Velikov:
> The function is analogous to lp_fence_wait() while taking a timeout
> (ns) parameter, as needed for EGL fence/sync.
> 
> Cc: Roland Scheidegger 
> Signed-off-by: Emil Velikov 
> ---
>  src/gallium/drivers/llvmpipe/lp_fence.c | 22 ++
>  src/gallium/drivers/llvmpipe/lp_fence.h |  3 +++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c 
> b/src/gallium/drivers/llvmpipe/lp_fence.c
> index 20cd91cd63d..f8b31a9d6a5 100644
> --- a/src/gallium/drivers/llvmpipe/lp_fence.c
> +++ b/src/gallium/drivers/llvmpipe/lp_fence.c
> @@ -125,3 +125,25 @@ lp_fence_wait(struct lp_fence *f)
>  }
>  
>  
> +boolean
> +lp_fence_timedwait(struct lp_fence *f, uint64_t timeout)
> +{
> +   struct timespec ts = {
> +  .tv_nsec = timeout % 1000000000L,
> +  .tv_sec = timeout / 1000000000L,
> +   };
> +   int ret;
> +
> +   if (LP_DEBUG & DEBUG_FENCE)
> +  debug_printf("%s %d\n", __FUNCTION__, f->id);
> +
> +   mtx_lock(&f->mutex);
> +   assert(f->issued);
> +   while (f->count < f->rank) {
> +  ret = cnd_timedwait(&f->signalled, &f->mutex, &ts);
> +   }
> +   mtx_unlock(&f->mutex);
> +   return ret == thrd_success;
> +}
> +
> +
> diff --git a/src/gallium/drivers/llvmpipe/lp_fence.h 
> b/src/gallium/drivers/llvmpipe/lp_fence.h
> index b72026492c6..5ba746d22d1 100644
> --- a/src/gallium/drivers/llvmpipe/lp_fence.h
> +++ b/src/gallium/drivers/llvmpipe/lp_fence.h
> @@ -65,6 +65,9 @@ lp_fence_signalled(struct lp_fence *fence);
>  void
>  lp_fence_wait(struct lp_fence *fence);
>  
> +boolean
> +lp_fence_timedwait(struct lp_fence *fence, uint64_t timeout);
> +
>  void
>  llvmpipe_init_screen_fence_funcs(struct pipe_screen *screen);
>  
> 
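
As a side note on the nanosecond handling in the patch, here is a minimal
sketch of splitting a nanosecond timeout into a struct timespec and turning
it into an absolute deadline. The helper names ns_to_timespec and
deadline_after are hypothetical and are not part of llvmpipe. Standard C11
specifies that cnd_timedwait() waits until a TIME_UTC-based calendar time,
so a caller holding a relative timeout would typically add it to the current
time first; whether the patch intends that is not stated in the thread.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ull

/* Split a nanosecond count into whole seconds and leftover nanoseconds. */
struct timespec
ns_to_timespec(uint64_t ns)
{
   struct timespec ts;
   ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
   ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
   return ts;
}

/* Absolute deadline = now + relative timeout, with tv_nsec kept below 1e9. */
struct timespec
deadline_after(struct timespec now, uint64_t timeout_ns)
{
   struct timespec rel = ns_to_timespec(timeout_ns);
   struct timespec out;

   out.tv_sec = now.tv_sec + rel.tv_sec;
   out.tv_nsec = now.tv_nsec + rel.tv_nsec;
   if (out.tv_nsec >= (long)NSEC_PER_SEC) {
      /* carry the overflow into the seconds field */
      out.tv_sec += 1;
      out.tv_nsec -= (long)NSEC_PER_SEC;
   }
   return out;
}
```

In a real caller, "now" would presumably come from timespec_get(&now,
TIME_UTC) before the wait loop.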


Re: [Mesa-dev] Low interpolation precision for 8 bit textures using llvmpipe

2019-04-12 Thread Roland Scheidegger
Am 12.04.19 um 14:34 schrieb Dominik Drees:
> Hi Roland!
> 
> On 4/11/19 8:18 PM, Roland Scheidegger wrote:
>> What version of mesa are you using?
> The original results were generated using version 19.0.2 (from the arch
> linux repositories), but I got the same results using the current git
> version (98934e6aa19795072a353dae6020dafadc76a1e3).
Alright, both of these would use the GALLIVM_PERF var.

>> The debug flags were changed a while ago (so that those perf tweaks can
>> be disabled on release builds too), it needs to be either:
>> GALLIVM_PERF=no_rho_approx,no_brilinear,no_quad_lod
>> or easier
>> GALLIVM_PERF=no_filter_hacks (which disables these 3 things above
>> together)
>>
>> Although all of that only really affects filtering with mipmaps (not
>> sure if you do?).
> Using GALLIVM_PERF does not make a difference, either, but that should
> be expected because I'm not using mipmaps, just "regular" linear
> filtering (GL_LINEAR).
>>
>>
>> (more below)
> See my responses below as well.
>>
>>
>> Am 11.04.19 um 18:00 schrieb Dominik Drees:
>>> Running with the suggested flags in the environment does not change the
>>> result for the test case I described below. The results with and without
>>> the environment variables set are pixel-wise equal.
>>>
>>> By the way, if this is of interest: for GL_NEAREST sampling the results
>>> from hardware and llvmpipe are equal as well.
>>>
>>> Best,
>>> Dominik
>>>
>>> On 4/11/19 4:36 PM, Ilia Mirkin wrote:
>>>> llvmpipe takes a number of shortcuts in the interest of speed which
>>>> cause inaccurate texturing. Try running with
>>>>
>>>> GALLIVM_DEBUG=no_rho_approx,no_brilinear,no_quad_lod
>>>>
>>>> and see if the issue still occurs.
>>>>
>>>> Cheers,
>>>>
>>>>     -ilia
>>>>
>>>>
>>>>
>>>> On Thu, Apr 11, 2019 at 8:30 AM Dominik Drees 
>>>> wrote:
>>>>>
>>>>> Hello, everyone!
>>>>>
>>>>> I have a question regarding the interpolation precision of llvmpipe.
>>>>> Feel free to redirect me to somewhere else if this is not the right
>>>>> place to ask. Consider the following scenario: In a fragment shader we
>>>>> are sampling from a 16x16, 8 bit texture with values between 0 and 3
>>>>> using linear interpolation. Then we write white to the screen if the
>>>>> sampled value is > 1/255 and black otherwise. The output looks very
>>>>> different when rendered with llvmpipe compared to the result
>>>>> produced by
>>>>> rendering hardware (for both intel (mesa i965) and nvidia (proprietary
>>>>> driver)).
>>>>>
>>>>> I've uploaded exemplary output images here
>>>>> (https://imgur.com/a/D1udpez)
>>>>>
>>>>>
>>>>> and the corresponding fragment shader here
>>>>> (https://pastebin.com/pa808Req).
>>>>>
>> The shader looks iffy to me, how do you use that vec4 in the if clause?
>>
>>
>>>>>
>>>>> My hypothesis is that llvmpipe (in contrast to hardware) only uses
>>>>> 8 bit
>>>>> for the interpolation computation when reading from 8 bit textures and
>>>>> thus loses precision in the lower bits. Is that correct? If so, does
>>>>> anyone know of a workaround?
>>
>> So, in theory it is indeed possible the results are less accurate with
>> llvmpipe (I believe all recent hw does rgba8 filtering with more than 8
>> bit precision).
>> For formats fitting into rgba8, we have a fast path in llvmpipe
>> (gallivm) for the lerp, which unpacks the 8bit values into 16bit values,
>> does the lerp with that and packs back to 8 bit. The result is
>> accurately rounded there (to 8 bit) but only for 1 lerp step - for a 2d
>> texture there are 3 of those (one per direction, and a final one
>&g

Re: [Mesa-dev] Low interpolation precision for 8 bit textures using llvmpipe

2019-04-11 Thread Roland Scheidegger
What version of mesa are you using?
The debug flags were changed a while ago (so that those perf tweaks can
be disabled on release builds too), it needs to be either:
GALLIVM_PERF=no_rho_approx,no_brilinear,no_quad_lod
or easier
GALLIVM_PERF=no_filter_hacks (which disables these 3 things above together)

Although all of that only really affects filtering with mipmaps (not
sure if you do?).


(more below)


Am 11.04.19 um 18:00 schrieb Dominik Drees:
> Running with the suggested flags in the environment does not change the
> result for the test case I described below. The results with and without
> the environment variables set are pixel-wise equal.
> 
> By the way, if this is of interest: for GL_NEAREST sampling the results
> from hardware and llvmpipe are equal as well.
> 
> Best,
> Dominik
> 
> On 4/11/19 4:36 PM, Ilia Mirkin wrote:
>> llvmpipe takes a number of shortcuts in the interest of speed which
>> cause inaccurate texturing. Try running with
>>
>> GALLIVM_DEBUG=no_rho_approx,no_brilinear,no_quad_lod
>>
>> and see if the issue still occurs.
>>
>> Cheers,
>>
>>    -ilia
>>
>>
>>
>> On Thu, Apr 11, 2019 at 8:30 AM Dominik Drees 
>> wrote:
>>>
>>> Hello, everyone!
>>>
>>> I have a question regarding the interpolation precision of llvmpipe.
>>> Feel free to redirect me to somewhere else if this is not the right
>>> place to ask. Consider the following scenario: In a fragment shader we
>>> are sampling from a 16x16, 8 bit texture with values between 0 and 3
>>> using linear interpolation. Then we write white to the screen if the
>>> sampled value is > 1/255 and black otherwise. The output looks very
>>> different when rendered with llvmpipe compared to the result produced by
>>> rendering hardware (for both intel (mesa i965) and nvidia (proprietary
>>> driver)).
>>>
>>> I've uploaded exemplary output images here
>>> (https://imgur.com/a/D1udpez)
>>>
>>> and the corresponding fragment shader here
>>> (https://pastebin.com/pa808Req).
The shader looks iffy to me, how do you use that vec4 in the if clause?


>>>
>>> My hypothesis is that llvmpipe (in contrast to hardware) only uses 8 bit
>>> for the interpolation computation when reading from 8 bit textures and
>>> thus loses precision in the lower bits. Is that correct? If so, does
>>> anyone know of a workaround?

So, in theory it is indeed possible the results are less accurate with
llvmpipe (I believe all recent hw does rgba8 filtering with more than 8
bit precision).
For formats fitting into rgba8, we have a fast path in llvmpipe
(gallivm) for the lerp, which unpacks the 8bit values into 16bit values,
does the lerp with that and packs back to 8 bit. The result is
accurately rounded there (to 8 bit) but only for 1 lerp step - for a 2d
texture there are 3 of those (one per direction, and a final one
combining the result). And yes this means the filtered result only has 8
bits.
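
To make the effect of the three rounded lerp steps concrete, here is a small
standalone sketch. It is not the gallivm code: the fixed-point weight format
(0..256), the round-to-nearest formula and the function names are assumptions
chosen only to show how rounding after every step can differ from an
unrounded float filter.

```c
#include <stdint.h>

/* 8-bit lerp with the result rounded back to 8 bits.
 * w is a fixed-point weight in the range [0, 256]. */
uint8_t
lerp8(uint8_t a, uint8_t b, unsigned w)
{
   return (uint8_t)((a * (256u - w) + b * w + 128u) >> 8);
}

/* Bilinear filter built from three rounded 8-bit lerps:
 * one per row in x, then one combining the two rows in y. */
uint8_t
bilinear8(uint8_t t00, uint8_t t10, uint8_t t01, uint8_t t11,
          unsigned wx, unsigned wy)
{
   uint8_t top = lerp8(t00, t10, wx);
   uint8_t bottom = lerp8(t01, t11, wx);
   return lerp8(top, bottom, wy);
}

/* Same filter in float without intermediate rounding.
 * The result is expressed in 8-bit units (0..255). */
float
bilinear_float(uint8_t t00, uint8_t t10, uint8_t t01, uint8_t t11,
               unsigned wx, unsigned wy)
{
   float fx = wx / 256.0f;
   float fy = wy / 256.0f;
   float top = t00 + (t10 - t00) * fx;
   float bottom = t01 + (t11 - t01) * fx;
   return top + (bottom - top) * fy;
}
```

With texels 0, 0, 0, 3 and both weights at 160/256, the rounded version
yields 1 while the float version yields about 1.17, so a "> 1/255" test in a
shader would go different ways for the two.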

I do believe you should not rely on implementations having more accuracy
- as far as I know the filtering we do is conformant there (it is tricky
to do better using the fast path).

There is code to actually do filtering with full float precision,
although there's no way to reach it with rgba8 formats unless you change
the code. If you want to try out the theory, look at
lp_bld_sample_soa.c: lp_build_sample_soa_code() determines whether to
use the fast (AoS) filtering path (use_aos, determined mostly by
util_format_fits_8unorm()). If you set this to false it will use the
full float filtering path. (FWIW I was actually thinking a while ago we
should force this path when there's only 1 channel, albeit I never got
around to testing (benchmarking) it - this is because the AoS filtering
path is really optimized for rgba8 formats, and if you only have 1
channel it's quite possible float filtering is actually faster, since it
handles the channels individually.)
I guess though if the full float precision filtering is useful in
general, we could add that to GALLIVM_PERF.

Roland




>>>
>>> A little bit of background about the use case: We are trying to move the
>>> CI of Voreen
>>> (https://www.uni-muenster.de/Voreen/)
>>> to the Gitlab-CI
>>> running in docker without any hardware dependencies. Using llvmpipe for
>>> 

Re: [Mesa-dev] [PATCH 1/2] draw: fix undefined shift of (1 << 31)

2019-04-11 Thread Roland Scheidegger
For the series, and the other one (undefined shifts in swrast/draw),
Reviewed-by: Roland Scheidegger 


Am 11.04.19 um 12:32 schrieb Dave Airlie:
> From: Dave Airlie 
> 
> Pointed out by a coverity scan.
> ---
>  src/gallium/auxiliary/draw/draw_pipe_aapoint.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c 
> b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
> index 2b96b8ad446..dc22039b127 100644
> --- a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
> +++ b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
> @@ -175,7 +175,7 @@ aa_transform_prolog(struct tgsi_transform_context *ctx)
>  
> /* find two free temp regs */
> for (i = 0; i < 32; i++) {
> -  if ((aactx->tempsUsed & (1 << i)) == 0) {
> +  if ((aactx->tempsUsed & (1u << i)) == 0) {
>   /* found a free temp */
>   if (aactx->tmp0 < 0)
>  aactx->tmp0 = i;
> 
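
For reference, the same loop shape as a standalone sketch. The function name
find_free_temp is made up for this illustration; the point taken from the
patch is only that the shift operand is unsigned.

```c
#include <stdint.h>

/* Return the index of the lowest clear bit in 'used',
 * or -1 if all 32 bits are set. */
int
find_free_temp(uint32_t used)
{
   for (unsigned i = 0; i < 32; i++) {
      /* 1u keeps the shift in unsigned arithmetic.  With a 32-bit int,
       * (1 << 31) shifts into the sign bit, which is undefined. */
      if ((used & (1u << i)) == 0)
         return (int)i;
   }
   return -1;
}
```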


Re: [Mesa-dev] [PATCH 5/5] softpipe: add support for vertex streams

2019-03-29 Thread Roland Scheidegger
As long as there are no regressions in llvmpipe, looks great to me.

Am 29.03.19 um 06:48 schrieb Dave Airlie:
> This enables the ARB_gpu_shader5 vertex streams on softpipe.
> 
> Signed-off-by: Dave Airlie 
> ---
>  src/gallium/drivers/softpipe/sp_screen.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/softpipe/sp_screen.c 
> b/src/gallium/drivers/softpipe/sp_screen.c
> index 438557e146a..d2c31b8935d 100644
> --- a/src/gallium/drivers/softpipe/sp_screen.c
> +++ b/src/gallium/drivers/softpipe/sp_screen.c
> @@ -122,7 +122,7 @@ softpipe_get_param(struct pipe_screen *screen, enum 
> pipe_cap param)
> case PIPE_CAP_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS:
>return 1024;
> case PIPE_CAP_MAX_VERTEX_STREAMS:
> -  return 1;
> +  return PIPE_MAX_VERTEX_STREAMS;
I think technically you should make this dependent on
!sp_screen->use_llvm (unless you want to fix the llvm paths :-)).

For the series:
Reviewed-by: Roland Scheidegger 



> case PIPE_CAP_MAX_VERTEX_ATTRIB_STRIDE:
>return 2048;
> case PIPE_CAP_PRIMITIVE_RESTART:
> 


Re: [Mesa-dev] [PATCH 1/2] softpipe/draw: fix vertex id in soft paths.

2019-03-27 Thread Roland Scheidegger
>  {
> struct exec_vertex_shader *evs = exec_vertex_shader(shader);
> struct tgsi_exec_machine *machine = evs->machine;
> @@ -133,7 +134,7 @@ vs_exec_run_linear(struct draw_vertex_shader *shader,
>   if (shader->info.uses_vertexid) {
>  unsigned vid = 
> machine->SysSemanticToIndex[TGSI_SEMANTIC_VERTEXID];
>  assert(vid < ARRAY_SIZE(machine->SystemValue));
> -machine->SystemValue[vid].xyzw[0].i[j] = i + j + basevertex;
> +machine->SystemValue[vid].xyzw[0].i[j] = fetch_elts ? 
> fetch_elts[i + j] : (i + j + basevertex);
>   }
>   if (shader->info.uses_basevertex) {
>  unsigned vid = 
> machine->SysSemanticToIndex[TGSI_SEMANTIC_BASEVERTEX];
> @@ -143,7 +144,7 @@ vs_exec_run_linear(struct draw_vertex_shader *shader,
>   if (shader->info.uses_vertexid_nobase) {
>  unsigned vid = 
> machine->SysSemanticToIndex[TGSI_SEMANTIC_VERTEXID_NOBASE];
>  assert(vid < ARRAY_SIZE(machine->SystemValue));
> -machine->SystemValue[vid].xyzw[0].i[j] = i + j;
> +    machine->SystemValue[vid].xyzw[0].i[j] = fetch_elts ? 
> fetch_elts[i + j] : (i + j);
So, I'm pretty sure you'd actually have to subtract basevertex here for
the elts case - fetch_elts still includes it already (this is also what
the llvm paths do:
"system_values.vertex_id_nobase = LLVMBuildSub(builder,
true_index_array, system_values.basevertex, "");").
Might not hit that with gl state tracker though...
With that fixed,
For the series:
Reviewed-by: Roland Scheidegger 

Thanks for tackling this! We got a bit sloppy there when we implemented
this stuff in draw first, essentially only made sure it worked with llvm
paths...

Roland


>   }
>  
>   for (slot = 0; slot < shader->info.num_inputs; slot++) {
> diff --git a/src/gallium/auxiliary/draw/draw_vs_llvm.c 
> b/src/gallium/auxiliary/draw/draw_vs_llvm.c
> index c92e4317216..15486f8ffa8 100644
> --- a/src/gallium/auxiliary/draw/draw_vs_llvm.c
> +++ b/src/gallium/auxiliary/draw/draw_vs_llvm.c
> @@ -53,7 +53,8 @@ vs_llvm_run_linear( struct draw_vertex_shader *shader,
>  const unsigned constants_size[PIPE_MAX_CONSTANT_BUFFERS],
>   unsigned count,
>   unsigned input_stride,
> - unsigned output_stride )
> + unsigned output_stride,
> + const unsigned *elts)
>  {
> /* we should never get here since the entire pipeline is
>  * generated in draw_pt_fetch_shade_pipeline_llvm.c */
> diff --git a/src/gallium/auxiliary/draw/draw_vs_variant.c 
> b/src/gallium/auxiliary/draw/draw_vs_variant.c
> index af36a86674d..44cf29b8e47 100644
> --- a/src/gallium/auxiliary/draw/draw_vs_variant.c
> +++ b/src/gallium/auxiliary/draw/draw_vs_variant.c
> @@ -179,7 +179,7 @@ static void PIPE_CDECL vsvg_run_elts( struct 
> draw_vs_variant *variant,
>vsvg->base.vs->draw->pt.user.vs_constants_size,
>count,
>temp_vertex_stride, 
> -  temp_vertex_stride);
> +  temp_vertex_stride, NULL);
>  
> /* FIXME: geometry shading? */
>  
> @@ -247,7 +247,7 @@ static void PIPE_CDECL vsvg_run_linear( struct 
> draw_vs_variant *variant,
>vsvg->base.vs->draw->pt.user.vs_constants_size,
>count,
>temp_vertex_stride, 
> -  temp_vertex_stride);
> +  temp_vertex_stride, NULL);
>  
> if (vsvg->base.key.clip) {
>/* not really handling clipping, just do the rhw so we can
> 
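
The basevertex remark above can be written down as a tiny sketch. It assumes,
as stated in the review comment, that the element array already includes
basevertex; the function names and the plain int/unsigned types are
hypothetical and not the draw module's interfaces.

```c
#include <stddef.h>

/* elts == NULL selects the linear (non-indexed) path.
 * Assumption: entries of elts already have basevertex added. */
int
vertex_id(const unsigned *elts, unsigned i, int basevertex)
{
   return elts ? (int)elts[i] : (int)i + basevertex;
}

/* Vertex id without the base: for the indexed path the base has to be
 * subtracted again, for the linear path it was never added. */
int
vertex_id_nobase(const unsigned *elts, unsigned i, int basevertex)
{
   return elts ? (int)elts[i] - basevertex : (int)i;
}
```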


Re: [Mesa-dev] [PATCH] st/mesa: fix texture deletion context mix-up issues (v2)

2019-03-22 Thread Roland Scheidegger
Looks alright to me, it's all quite tricky stuff...
Reviewed-by: Roland Scheidegger 


Am 22.03.19 um 20:51 schrieb Brian Paul:
> When we destroy a context, we need to temporarily make that context
> the current one for the thread.
> 
> That's because during context tear-down we make many calls to
> _mesa_reference_texobj(, NULL).  Note there's no context
> parameter.  If the texture's refcount goes to zero and we need to
> delete it, we use the thread's current context.  But if that context
> isn't the context we're tearing down, we get into trouble when
> deallocating sampler views.  See patch 593e36f956 ("st/mesa:
> implement "zombie" sampler views (v2)") for background information.
> 
> Also, we need to release any sampler views attached to the fallback
> textures.
> 
> Fixes a crash on exit with a glretrace of the Nobel Clinician
> application.
> 
> v2: at end of st_destroy_context(), check if save_ctx == ctx and
> unbind the context if so.
> ---
>  src/mesa/state_tracker/st_context.c | 51 
> -
>  1 file changed, 39 insertions(+), 12 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_context.c 
> b/src/mesa/state_tracker/st_context.c
> index f037384..09d467a 100644
> --- a/src/mesa/state_tracker/st_context.c
> +++ b/src/mesa/state_tracker/st_context.c
> @@ -917,15 +917,39 @@ st_destroy_context(struct st_context *st)
>  {
> struct gl_context *ctx = st->ctx;
> struct st_framebuffer *stfb, *next;
> +   struct gl_framebuffer *save_drawbuffer;
> +   struct gl_framebuffer *save_readbuffer;
> +
> +   /* Save the current context and draw/read buffers*/
> +   GET_CURRENT_CONTEXT(save_ctx);
> +   if (save_ctx) {
> +  save_drawbuffer = save_ctx->WinSysDrawBuffer;
> +  save_readbuffer = save_ctx->WinSysReadBuffer;
> +   } else {
> +  save_drawbuffer = save_readbuffer = NULL;
> +   }
>  
> -   GET_CURRENT_CONTEXT(curctx);
> +   /*
> +* We need to bind the context we're deleting so that
> +* _mesa_reference_texobj_() uses this context when deleting textures.
> +* Similarly for framebuffer objects, etc.
> +*/
> +   _mesa_make_current(ctx, NULL, NULL);
>  
> -   if (curctx == NULL) {
> -  /* No current context, but we need one to release
> -   * renderbuffer surface when we release framebuffer.
> -   * So temporarily bind the context.
> -   */
> -  _mesa_make_current(ctx, NULL, NULL);
> +   /* This must be called first so that glthread has a chance to finish */
> +   _mesa_glthread_destroy(ctx);
> +
> +   _mesa_HashWalk(ctx->Shared->TexObjects, destroy_tex_sampler_cb, st);
> +
> +   /* For the fallback textures, free any sampler views belonging to this
> +* context.
> +*/
> +   for (unsigned i = 0; i < NUM_TEXTURE_TARGETS; i++) {
> +  struct st_texture_object *stObj =
> + st_texture_object(ctx->Shared->FallbackTex[i]);
> +  if (stObj) {
> + st_texture_release_context_sampler_view(st, stObj);
> +  }
> }
>  
> st_context_free_zombie_objects(st);
> @@ -933,11 +957,6 @@ st_destroy_context(struct st_context *st)
> mtx_destroy(&st->zombie_sampler_views.mutex);
> mtx_destroy(&st->zombie_shaders.mutex);
>  
> -   /* This must be called first so that glthread has a chance to finish */
> -   _mesa_glthread_destroy(ctx);
> -
> -   _mesa_HashWalk(ctx->Shared->TexObjects, destroy_tex_sampler_cb, st);
> -
> st_reference_fragprog(st, &st->fp, NULL);
> st_reference_prog(st, &st->gp, NULL);
> st_reference_vertprog(st, &st->vp, NULL);
> @@ -965,4 +984,12 @@ st_destroy_context(struct st_context *st)
> st = NULL;
>  
> free(ctx);
> +
> +   if (save_ctx == ctx) {
> +  /* unbind the context we just deleted */
> +  _mesa_make_current(NULL, NULL, NULL);
> +   } else {
> +  /* Restore the current context and draw/read buffers (may be NULL) */
> +  _mesa_make_current(save_ctx, save_drawbuffer, save_readbuffer);
> +   }
>  }
> 
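
The save / bind / restore pattern from the patch, reduced to a standalone
sketch. Everything here is hypothetical (struct context, make_current,
get_current, release_objects); it only models the idea that object deletion
looks at the thread's current context rather than at a parameter, and it does
not free the context so the result can be inspected.

```c
#include <stddef.h>

struct context {
   int id;
   int torn_down_under;   /* id of the context that was current at teardown */
};

/* Hypothetical stand-in for the per-thread "current context" pointer. */
static _Thread_local struct context *current_ctx;

void make_current(struct context *ctx) { current_ctx = ctx; }
struct context *get_current(void) { return current_ctx; }

static void
release_objects(struct context *ctx)
{
   /* Like the texture unreference in the patch description, this has no
    * context parameter for deletion and consults the current context. */
   ctx->torn_down_under = current_ctx ? current_ctx->id : -1;
}

void
destroy_context(struct context *ctx)
{
   struct context *saved = current_ctx;   /* may be NULL */

   make_current(ctx);                     /* bind the context being deleted */
   release_objects(ctx);

   if (saved == ctx)
      make_current(NULL);                 /* never leave a dead context bound */
   else
      make_current(saved);                /* restore the caller's context */
}
```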


Re: [Mesa-dev] [PATCH 2/2] softpipe: handle 32-bit bitfield inserts

2019-03-21 Thread Roland Scheidegger
Am 21.03.19 um 05:16 schrieb Dave Airlie:
> From: Dave Airlie 
> 
> Fixes piglits if ARB_gpu_shader5 is enabled
> ---
>  src/gallium/auxiliary/tgsi/tgsi_exec.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
> b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> index c93e4e26e40..78159fc1d9f 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> @@ -4999,10 +4999,14 @@ micro_bfi(union tgsi_exec_channel *dst,
>  {
> int i;
> for (i = 0; i < 4; i++) {
> -  int width = src3->u[i] & 0x1f;
> +  int width = src3->u[i];
>int offset = src2->u[i] & 0x1f;
> -  int bitmask = ((1 << width) - 1) << offset;
> -  dst->u[i] = ((src1->u[i] << offset) & bitmask) | (src0->u[i] & 
> ~bitmask);
> +  if (width == 32) {
> + dst->u[i] = src1->u[i];
> +  } else {
> + int bitmask = ((1 << width) - 1) << offset;
> + dst->u[i] = ((src1->u[i] << offset) & bitmask) | (src0->u[i] & 
> ~bitmask);
> +  }
> }
>  }
>  
> 

I think this is a really highly annoying difference between d3d11 and GL
there for bitfieldInsert/Extract...
But in any case, all 4 patches look good to me.
Reviewed-by: Roland Scheidegger 
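
For illustration, a standalone bitfield insert along the lines of the patch,
written with unsigned arithmetic. This is a sketch, not the tgsi_exec code:
the function name is made up, and treating any width of 32 or more as a full
replacement is an assumption made here to keep the shift well defined.

```c
#include <stdint.h>

/* Insert the low 'width' bits of 'insert' into 'base' at bit 'offset'. */
uint32_t
bitfield_insert(uint32_t base, uint32_t insert, unsigned offset, unsigned width)
{
   offset &= 0x1f;

   /* A 5-bit mask on width cannot express 32, so handle it separately.
    * Assumption: widths above 32 are treated the same way. */
   if (width >= 32)
      return insert;

   uint32_t mask = ((UINT32_C(1) << width) - 1) << offset;
   return ((insert << offset) & mask) | (base & ~mask);
}
```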

Re: [Mesa-dev] [PATCH] st/mesa: implement "zombie" sampler views (v2)

2019-03-18 Thread Roland Scheidegger
Reviewed-by: Roland Scheidegger 

Am 15.03.19 um 22:12 schrieb Brian Paul:
> When st_texture_release_all_sampler_views() is called the texture may
> have sampler views belonging to several contexts.  If we unreference a
> sampler view and its refcount hits zero, we need to be sure to destroy
> the sampler view with the same context which created it.
> 
> This was not the case with the previous code which used
> pipe_sampler_view_release().  That function could end up freeing a
> sampler view with a context different than the one which created it.
> In the case of the VMware svga driver, we detected this but leaked the
> sampler view.  This led to a crash with google-chrome when the kernel
> module had too many sampler views.  VMware bug 2274734.
> 
> Alternately, if we try to delete a sampler view with the correct
> context, we may be "reaching into" a context which is active on
> another thread.  That's not safe.
> 
> To fix these issues this patch adds a per-context list of "zombie"
> sampler views.  These are views which are to be freed at some point
> when the context is active.  Other contexts may safely add sampler
> views to the zombie list at any time (it's mutex protected).  This
> avoids the context/view ownership mix-ups we had before.
> 
> Tested with: google-chrome, google earth, Redway3D Watch/Turbine demos,
> and a few Linux games.  If anyone can recommend some other multi-threaded,
> multi-context GL apps to test, please let me know.
> 
> v2: avoid potential race issue by always adding sampler views to the
> zombie list if the view's context doesn't match the current context,
> ignoring the refcount.
> 
> Reviewed-by: Roland Scheidegger 
> Reviewed-by: Neha Bhende 
> Reviewed-by: Mathias Fröhlich 
> Reviewed-By: Jose Fonseca 
> ---
>  src/mesa/state_tracker/st_cb_flush.c |  6 +++
>  src/mesa/state_tracker/st_context.c  | 80 
> 
>  src/mesa/state_tracker/st_context.h  | 25 ++
>  src/mesa/state_tracker/st_sampler_view.c | 21 +++--
>  src/mesa/state_tracker/st_texture.h  |  3 ++
>  5 files changed, 131 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_cb_flush.c 
> b/src/mesa/state_tracker/st_cb_flush.c
> index 5b3188c..81e5338 100644
> --- a/src/mesa/state_tracker/st_cb_flush.c
> +++ b/src/mesa/state_tracker/st_cb_flush.c
> @@ -39,6 +39,7 @@
>  #include "st_cb_flush.h"
>  #include "st_cb_clear.h"
>  #include "st_cb_fbo.h"
> +#include "st_context.h"
>  #include "st_manager.h"
>  #include "pipe/p_context.h"
>  #include "pipe/p_defines.h"
> @@ -53,6 +54,11 @@ st_flush(struct st_context *st,
>  {
> st_flush_bitmap_cache(st);
>  
> +   /* We want to call this function periodically.
> +* Typically, it has nothing to do so it shouldn't be expensive.
> +*/
> +   st_context_free_zombie_objects(st);
> +
> st->pipe->flush(st->pipe, fence, flags);
>  }
>  
> diff --git a/src/mesa/state_tracker/st_context.c 
> b/src/mesa/state_tracker/st_context.c
> index 2898279..c38f8e5 100644
> --- a/src/mesa/state_tracker/st_context.c
> +++ b/src/mesa/state_tracker/st_context.c
> @@ -261,6 +261,79 @@ st_invalidate_state(struct gl_context *ctx)
>  }
>  
>  
> +/*
> + * In some circumstances (such as running google-chrome) the state
> + * tracker may try to delete a resource view from a context different
> + * than when it was created.  We don't want to do that.
> + *
> + * In that situation, st_texture_release_all_sampler_views() calls this
> + * function to transfer the sampler view reference to this context (expected
> + * to be the context which created the view.)
> + */
> +void
> +st_save_zombie_sampler_view(struct st_context *st,
> +struct pipe_sampler_view *view)
> +{
> +   struct st_zombie_sampler_view_node *entry;
> +
> +   assert(view->context == st->pipe);
> +
> +   entry = MALLOC_STRUCT(st_zombie_sampler_view_node);
> +   if (!entry)
> +  return;
> +
> +   entry->view = view;
> +
> +   /* We need a mutex since this function may be called from one thread
> +* while free_zombie_resource_views() is called from another.
> +*/
> +   mtx_lock(&st->zombie_sampler_views.mutex);
> +   LIST_ADDTAIL(&entry->node, &st->zombie_sampler_views.list.node);
> +   mtx_unlock(&st->zombie_sampler_views.mutex);
> +}
> +
> +
> +/*
> + * Free any zombie sampler views that may be attached to this context.
> + */
> +static void
> +free_zombie_sampler_views(struct st_context *st)
> +{
> +   struct st_zombie_sampler_view_node *entry, *next;
> +

Re: [Mesa-dev] [PATCH 1/9] gallium/u_math: add ushort_to_float/float_to_ushort

2019-03-15 Thread Roland Scheidegger
Am 16.03.19 um 02:28 schrieb Qiang Yu:
> Signed-off-by: Qiang Yu 
> ---
>  src/util/u_math.h | 31 +++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/src/util/u_math.h b/src/util/u_math.h
> index e7dbbe5ca22..ffadfb47282 100644
> --- a/src/util/u_math.h
> +++ b/src/util/u_math.h
> @@ -389,6 +389,37 @@ float_to_ubyte(float f)
> }
>  }
>  
> +/**
> + * Convert ushort to float in [0, 1].
> + */
> +static inline float
> +ushort_to_float(ushort us)
> +{
> +   return (float) us * (1.0f / 65535.0f);
> +}
> +
> +
> +/**
> + * Convert float in [0,1] to ushort in [0,65535] with clamping.
> + */
> +static inline ushort
> +float_to_ushort(float f)
> +{
> +   union fi tmp;
> +
> +   tmp.f = f;
> +   if (tmp.i < 0) {
> +  return (ushort) 0;
> +   }
> +   else if (tmp.i >= 0x3f800000 /* 1.0f */) {
> +  return (ushort) 65535;
> +   }
This will convert NaNs to either 0 or 65535, depending on their sign.
I think generally it's better to convert this consistently to 0 (gl
usually doesn't require this, however d3d10 does).
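
One way to get the consistent behavior is to test for NaN before any range check. The following is only a sketch, not the version from the patch: `float_to_ushort_nan_zero` is a hypothetical name, and it uses plain float comparisons and round-to-nearest rather than the integer bit tricks in u_math.h.

```c
#include <math.h>
#include <stdint.h>

/* Convert float to ushort in [0, 65535] with clamping.
 * NaN of either sign maps to 0, as do negative values and -0.0f.
 */
static inline uint16_t
float_to_ushort_nan_zero(float f)
{
   if (isnan(f) || f <= 0.0f)
      return 0;
   if (f >= 1.0f)
      return 65535;              /* also covers +infinity */
   return (uint16_t)(f * 65535.0f + 0.5f);
}
```

With the bit-pattern approach the same effect would need an extra check, since a positive NaN compares greater than the bit pattern of 1.0f and a negative NaN compares less than zero.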

Roland


> +   else {
> +  tmp.f = tmp.f * (65535.0f/65536.0f) + 128.0f;
> +  return (ushort) tmp.i;
> +   }
> +}
> +
>  static inline float
>  byte_to_float_tex(int8_t b)
>  {
> 

Re: [Mesa-dev] [PATCH] gallium/docs: clarify set_sampler_views

2019-03-14 Thread Roland Scheidegger
Am 15.03.19 um 02:18 schrieb Rob Clark:
> On Thu, Mar 14, 2019 at 8:28 PM Roland Scheidegger  wrote:
>>
>> Am 14.03.19 um 22:06 schrieb Rob Clark:
>>> On Thu, Mar 14, 2019 at 3:58 PM Roland Scheidegger  
>>> wrote:
>>>>
>>>> Am 14.03.19 um 14:13 schrieb Rob Clark:
>>>>> On Tue, Mar 12, 2019 at 1:59 PM Roland Scheidegger  
>>>>> wrote:
>>>>>>
>>>>>> Am 12.03.19 um 16:16 schrieb Rob Clark:
>>>>>>> This previously was not called out clearly, but based on a survey of the
>>>>>>> code, it seems the expected behavior is to release the reference to any
>>>>>>> sampler views beyond the new range being bound.
>>>>>>
>>>>>> That isn't really true. This was designed to work like d3d10, where
>>>>>> other views are unmodified.
>>>>>> The cso code will actually unset all views which previously were set and
>>>>>> are above the num_views in the call (this wouldn't be necessary if the
>>>>>> pipe function itself would work like this).
>>>>>> However, it will only do this for fragment textures, and pass through
>>>>>> the parameters unmodified otherwise. Which means behavior might not be
>>>>>> very consistent for the different stages...
>>>>>
>>>>> Any opinion about whether views==NULL should be allowed?  Currently I
>>>>> have something like:
>>>>>
>>>>> 
>>>>> diff --git a/src/gallium/docs/source/context.rst
>>>>> b/src/gallium/docs/source/context.rst
>>>>> index f89d9e1005e..06d30bfb38b 100644
>>>>> --- a/src/gallium/docs/source/context.rst
>>>>> +++ b/src/gallium/docs/source/context.rst
>>>>> @@ -143,6 +143,11 @@ to the array index which is used for sampling.
>>>>>to a respective sampler view and releases a reference to the previous
>>>>>sampler view.
>>>>>
>>>>> +  Sampler views outside of ``[start_slot, start_slot + num_views)`` are
>>>>> +  unmodified.  If ``views`` is NULL, the behavior is the same as if
>>>>> +  ``views[n]`` was NULL for the entire range, ie. releasing the reference
>>>>> +  for all the sampler views in the specified range.
>>>>> +
>>>>>  * ``create_sampler_view`` creates a new sampler view. ``texture`` is 
>>>>> associated
>>>>>with the sampler view which results in sampler view holding a reference
>>>>>to the texture. Format specified in template must be compatible
>>>>> 
>>>>>
>>>>> But going thru the other drivers, a lot of them also don't handle the
>>>>> views==NULL case.  This case doesn't seem to come up with mesa/st, but
>>>>> does with XA and nine, and some of the test code.
>>>>
>>>> I think this should be illegal. As you've noted some drivers can't
>>>> handle it, and I don't see a particularly good reason to allow it. Well
>>>> I guess it trades some complexity in state trackers with some complexity
>>>> in drivers...
>>>
>>> fwiw, going with the idea that it should be legal, I fixed that in the
>>> drivers that didn't handle it in:
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/449
>>>
>>> (planning to send to list, I just pushed a WIP MR to run it thru the CI 
>>> system)
>>
>> I'm pretty sure both softpipe and llvmpipe would crash too, they
>> dereference this without checking if it's null.
>> So effectively all drivers but one thought it was illegal?
>> I still see no point in allowing it (or rather, changing this to be
>> allowed - per se there's nothing really wrong with this to be allowed).
>> That said, it appears that set_shader_images and set_shader_buffers
>> allow it, so there's some precedent for this.
> 
> hmm, I'd assumed llvmpipe was used with xa somewhere so I didn't
> really look at it and assumed it handled this..
xa only sets fragment sampler views, and those only through cso.
cso will turn this into a non-null views parameter.
(cso itself also won't tolerate null views parameter, unless the count
is zero, but that sho

Re: [Mesa-dev] [PATCH 8/8] gallium/util: remove pipe_sampler_view_release()

2019-03-14 Thread Roland Scheidegger
This looks all good to me.
For the series:
Reviewed-by: Roland Scheidegger 

Am 14.03.19 um 20:37 schrieb Brian Paul:
> It's no longer used.
> ---
>  src/gallium/auxiliary/util/u_inlines.h | 20 
>  1 file changed, 20 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_inlines.h 
> b/src/gallium/auxiliary/util/u_inlines.h
> index fa1e920..567d3d0 100644
> --- a/src/gallium/auxiliary/util/u_inlines.h
> +++ b/src/gallium/auxiliary/util/u_inlines.h
> @@ -192,26 +192,6 @@ pipe_sampler_view_reference(struct pipe_sampler_view 
> **dst,
> *dst = src;
>  }
>  
> -/**
> - * Similar to pipe_sampler_view_reference() but always set the pointer to
> - * NULL and pass in the current context explicitly.
> - *
> - * If *ptr is non-NULL, it may refer to a view that was created in a 
> different
> - * context (however, that context must still be alive).
> - */
> -static inline void
> -pipe_sampler_view_release(struct pipe_context *ctx,
> -  struct pipe_sampler_view **ptr)
> -{
> -   struct pipe_sampler_view *old_view = *ptr;
> -
> -   if (pipe_reference_described(&old_view->reference, NULL,
> -
> (debug_reference_descriptor)debug_describe_sampler_view)) {
> -  ctx->sampler_view_destroy(ctx, old_view);
> -   }
> -   *ptr = NULL;
> -}
> -
>  static inline void
>  pipe_so_target_reference(struct pipe_stream_output_target **dst,
>   struct pipe_stream_output_target *src)
> 

Re: [Mesa-dev] [PATCH] gallium/docs: clarify set_sampler_views

2019-03-14 Thread Roland Scheidegger
Am 14.03.19 um 22:06 schrieb Rob Clark:
> On Thu, Mar 14, 2019 at 3:58 PM Roland Scheidegger  wrote:
>>
>> Am 14.03.19 um 14:13 schrieb Rob Clark:
>>> On Tue, Mar 12, 2019 at 1:59 PM Roland Scheidegger  
>>> wrote:
>>>>
>>>> Am 12.03.19 um 16:16 schrieb Rob Clark:
>>>>> This previously was not called out clearly, but based on a survey of the
>>>>> code, it seems the expected behavior is to release the reference to any
>>>>> sampler views beyond the new range being bound.
>>>>
>>>> That isn't really true. This was designed to work like d3d10, where
>>>> other views are unmodified.
>>>> The cso code will actually unset all views which previously were set and
>>>> are above the num_views in the call (this wouldn't be necessary if the
>>>> pipe function itself would work like this).
>>>> However, it will only do this for fragment textures, and pass through
>>>> the parameters unmodified otherwise. Which means behavior might not be
>>>> very consistent for the different stages...
>>>
>>> Any opinion about whether views==NULL should be allowed?  Currently I
>>> have something like:
>>>
>>> 
>>> diff --git a/src/gallium/docs/source/context.rst
>>> b/src/gallium/docs/source/context.rst
>>> index f89d9e1005e..06d30bfb38b 100644
>>> --- a/src/gallium/docs/source/context.rst
>>> +++ b/src/gallium/docs/source/context.rst
>>> @@ -143,6 +143,11 @@ to the array index which is used for sampling.
>>>to a respective sampler view and releases a reference to the previous
>>>sampler view.
>>>
>>> +  Sampler views outside of ``[start_slot, start_slot + num_views)`` are
>>> +  unmodified.  If ``views`` is NULL, the behavior is the same as if
>>> +  ``views[n]`` was NULL for the entire range, ie. releasing the reference
>>> +  for all the sampler views in the specified range.
>>> +
>>>  * ``create_sampler_view`` creates a new sampler view. ``texture`` is 
>>> associated
>>>with the sampler view which results in sampler view holding a reference
>>>to the texture. Format specified in template must be compatible
>>> 
>>>
>>> But going thru the other drivers, a lot of them also don't handle the
>>> views==NULL case.  This case doesn't seem to come up with mesa/st, but
>>> does with XA and nine, and some of the test code.
>>
>> I think this should be illegal. As you've noted some drivers can't
>> handle it, and I don't see a particularly good reason to allow it. Well
>> I guess it trades some complexity in state trackers with some complexity
>> in drivers...
> 
> fwiw, going with the idea that it should be legal, I fixed that in the
> drivers that didn't handle it in:
> 
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/449
> 
> (planning to send to list, I just pushed a WIP MR to run it thru the CI 
> system)

I'm pretty sure both softpipe and llvmpipe would crash too, they
dereference this without checking if it's null.
So effectively all drivers but one thought it was illegal?
I still see no point in allowing it (or rather, changing this to be
allowed - per se there's nothing really wrong with this to be allowed).
That said, it appears that set_shader_images and set_shader_buffers
allow it, so there's some precedent for this.

> 
> I was on the fence between making st handle it vs making driver handle
> it, but it is trivial for the driver to handle it if it knows it is
> supposed to.  And I had to fixup the drivers for various things
> already (most hadn't been updated to handle the `start_slot` param,
> for ex).
Yes, I think in particular because when going through cso things will
always start at slot 0, so some drivers got sloppy...
But well for views not being allowed to be null that's also pretty
trivial for state trackers to handle...

> 
> Eric suggested (on the MR) introducing a helper for this, which might
> be a better approach to cut down on the boilerplate..  I'll play with
> that idea.
> 
> (btw, from a quick look, set_sampler_views isn't the only problem
> slot.. I noticed set_shader_buffers has the same issue.. but maybe
> I'll try to fix one thing at a time and worry more about that when
> panfrost or etnaviv gets closer to the point of supporting SSBO and
> compute sh

Re: [Mesa-dev] [PATCH] gallium/docs: clarify set_sampler_views

2019-03-14 Thread Roland Scheidegger
Am 14.03.19 um 14:13 schrieb Rob Clark:
> On Tue, Mar 12, 2019 at 1:59 PM Roland Scheidegger  wrote:
>>
>> Am 12.03.19 um 16:16 schrieb Rob Clark:
>>> This previously was not called out clearly, but based on a survey of the
>>> code, it seems the expected behavior is to release the reference to any
>>> sampler views beyond the new range being bound.
>>
>> That isn't really true. This was designed to work like d3d10, where
>> other views are unmodified.
>> The cso code will actually unset all views which previously were set and
>> are above the num_views in the call (this wouldn't be necessary if the
>> pipe function itself would work like this).
>> However, it will only do this for fragment textures, and pass through
>> the parameters unmodified otherwise. Which means behavior might not be
>> very consistent for the different stages...
> 
> Any opinion about whether views==NULL should be allowed?  Currently I
> have something like:
> 
> 
> diff --git a/src/gallium/docs/source/context.rst
> b/src/gallium/docs/source/context.rst
> index f89d9e1005e..06d30bfb38b 100644
> --- a/src/gallium/docs/source/context.rst
> +++ b/src/gallium/docs/source/context.rst
> @@ -143,6 +143,11 @@ to the array index which is used for sampling.
>to a respective sampler view and releases a reference to the previous
>sampler view.
> 
> +  Sampler views outside of ``[start_slot, start_slot + num_views)`` are
> +  unmodified.  If ``views`` is NULL, the behavior is the same as if
> +  ``views[n]`` was NULL for the entire range, ie. releasing the reference
> +  for all the sampler views in the specified range.
> +
>  * ``create_sampler_view`` creates a new sampler view. ``texture`` is 
> associated
>with the sampler view which results in sampler view holding a reference
>to the texture. Format specified in template must be compatible
> 
> 
> But going thru the other drivers, a lot of them also don't handle the
> views==NULL case.  This case doesn't seem to come up with mesa/st, but
> does with XA and nine, and some of the test code.

I think this should be illegal. As you've noted some drivers can't
handle it, and I don't see a particularly good reason to allow it. Well
I guess it trades some complexity in state trackers with some complexity
in drivers...
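
To illustrate the state tracker side of that trade-off: if views==NULL is illegal, a caller that wants to unbind a range can pass an array of NULL pointers instead. The sketch below is an assumption-laden illustration, not gallium code; `struct driver`, its `set_views` hook and `st_unbind_views()` are hypothetical names modelled loosely on set_sampler_views.

```c
#include <stddef.h>

#define MAX_VIEWS 16   /* hypothetical slot limit for this sketch */

struct view {
   int unused;
};

struct driver {
   struct view *bound[MAX_VIEWS];
   /* Hypothetical hook; in this sketch 'views' must never be NULL. */
   void (*set_views)(struct driver *drv, unsigned start, unsigned count,
                     struct view **views);
};

/* A driver that dereferences 'views' without checking it. */
static void
driver_set_views(struct driver *drv, unsigned start, unsigned count,
                 struct view **views)
{
   for (unsigned i = 0; i < count; i++)
      drv->bound[start + i] = views[i];
}

/* State-tracker-side helper: unbind [start, start + count) without
 * ever handing the driver a NULL array.  Assumes count <= MAX_VIEWS.
 */
static void
st_unbind_views(struct driver *drv, unsigned start, unsigned count)
{
   struct view *null_views[MAX_VIEWS] = { NULL };

   drv->set_views(drv, start, count, null_views);
}
```

The cost is one small array on the caller's stack; the alternative puts a NULL check in every driver's hook.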

Roland



> BR,
> -R
> 
>>
>>
>>>
>>> I think radeonsi and freedreno were the only ones not doing this.  Which
>>> could probably temporarily leak a bit of memory by holding on to the
>>> sampler view reference.
>> Not sure about other drivers, but llvmpipe will not do this either.
>>
>> Roland
>>
>>
>>>
>>> Signed-off-by: Rob Clark 
>>> ---
>>>  src/gallium/docs/source/context.rst | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/src/gallium/docs/source/context.rst 
>>> b/src/gallium/docs/source/context.rst
>>> index f89d9e1005e..199d335f8f4 100644
>>> --- a/src/gallium/docs/source/context.rst
>>> +++ b/src/gallium/docs/source/context.rst
>>> @@ -143,6 +143,9 @@ to the array index which is used for sampling.
>>>to a respective sampler view and releases a reference to the previous
>>>sampler view.
>>>
>>> +  Previously bound samplers with index ``>= num_views`` are unbound rather
>>> +  than unmodified.
>>> +
>>>  * ``create_sampler_view`` creates a new sampler view. ``texture`` is 
>>> associated
>>>with the sampler view which results in sampler view holding a reference
>>>to the texture. Format specified in template must be compatible
>>>
>>

Re: [Mesa-dev] [PATCH] gallium/docs: clarify set_sampler_views

2019-03-13 Thread Roland Scheidegger
Am 12.03.19 um 22:48 schrieb Rob Clark:
> On Tue, Mar 12, 2019 at 1:59 PM Roland Scheidegger  wrote:
>>
>> Am 12.03.19 um 16:16 schrieb Rob Clark:
>>> This previously was not called out clearly, but based on a survey of the
>>> code, it seems the expected behavior is to release the reference to any
>>> sampler views beyond the new range being bound.
>>
>> That isn't really true. This was designed to work like d3d10, where
>> other views are unmodified.
>> The cso code will actually unset all views which previously were set and
>> are above the num_views in the call (this wouldn't be necessary if the
>> pipe function itself would work like this).
>> However, it will only do this for fragment textures, and pass through
>> the parameters unmodified otherwise. Which means behavior might not be
>> very consistent for the different stages...
> 
> hmm, I did notice w/ deqp tests (which aren't so good at
> resetting/clearing state between tests), that I ended up w/ different
> # of sampler views bound (without changing freedreno to match the
> behavior of most of the other drivers).. I didn't really dig in that
> closely but it seemed like mesa/st wasn't clearing the additional
> previously bound textures.  Maybe I overlooked something, but that
> seemed wrong.
> 
> One way or another, I guess we should clarify and change the various
> drivers to have a common behavior because right now there are two
> different behaviors and I guess it is at least confusing for new
> gallium driver writers (as it was for me and I've been at it for a
> while)

Yes, I agree with that, the current state there doesn't help anyone.

Roland


> BR,
> -R
> 
>>
>>
>>>
>>> I think radeonsi and freedreno were the only ones not doing this.  Which
>>> could probably temporarily leak a bit of memory by holding on to the
>>> sampler view reference.
>> Not sure about other drivers, but llvmpipe will not do this either.
>>
>> Roland
>>
>>
>>>
>>> Signed-off-by: Rob Clark 
>>> ---
>>>  src/gallium/docs/source/context.rst | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/src/gallium/docs/source/context.rst 
>>> b/src/gallium/docs/source/context.rst
>>> index f89d9e1005e..199d335f8f4 100644
>>> --- a/src/gallium/docs/source/context.rst
>>> +++ b/src/gallium/docs/source/context.rst
>>> @@ -143,6 +143,9 @@ to the array index which is used for sampling.
>>>to a respective sampler view and releases a reference to the previous
>>>sampler view.
>>>
>>> +  Previously bound samplers with index ``>= num_views`` are unbound rather
>>> +  than unmodified.
>>> +
>>>  * ``create_sampler_view`` creates a new sampler view. ``texture`` is 
>>> associated
>>>with the sampler view which results in sampler view holding a reference
>>>to the texture. Format specified in template must be compatible
>>>
>>

Re: [Mesa-dev] [PATCH] gallium/docs: clarify set_sampler_views

2019-03-12 Thread Roland Scheidegger
Am 12.03.19 um 16:16 schrieb Rob Clark:
> This previously was not called out clearly, but based on a survey of the
> code, it seems the expected behavior is to release the reference to any
> sampler views beyond the new range being bound.

That isn't really true. This was designed to work like d3d10, where
other views are unmodified.
The cso code will actually unset all views which previously were set and
are above the num_views in the call (this wouldn't be necessary if the
pipe function itself would work like this).
However, it will only do this for fragment textures, and pass through
the parameters unmodified otherwise. Which means behavior might not be
very consistent for the different stages...
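
The two behaviors can be contrasted with a small sketch. All names here (`struct view`, `struct stage_state`, `set_views_range()`, `set_views_unbind_rest()`) are hypothetical and the reference counting is simplified; it only illustrates the semantics, assuming start + count never exceeds MAX_VIEWS.

```c
#include <stddef.h>

#define MAX_VIEWS 8   /* hypothetical slot limit for this sketch */

struct view {
   int refcount;
};

struct stage_state {
   struct view *views[MAX_VIEWS];
   unsigned num_views;   /* tracked only for the cso-style wrapper */
};

/* Point *dst at src, adjusting both reference counts. */
static void
view_reference(struct view **dst, struct view *src)
{
   if (src)
      src->refcount++;
   if (*dst)
      (*dst)->refcount--;
   *dst = src;
}

/* d3d10-style: only slots in [start, start + count) are touched. */
static void
set_views_range(struct stage_state *s, unsigned start, unsigned count,
                struct view **views)
{
   for (unsigned i = 0; i < count; i++)
      view_reference(&s->views[start + i], views[i]);
}

/* cso-style wrapper: bind [0, count) and unbind anything that was
 * previously bound above count.
 */
static void
set_views_unbind_rest(struct stage_state *s, unsigned count,
                      struct view **views)
{
   struct view *tmp[MAX_VIEWS] = { NULL };
   unsigned total = count > s->num_views ? count : s->num_views;

   for (unsigned i = 0; i < count; i++)
      tmp[i] = views[i];

   set_views_range(s, 0, total, tmp);
   s->num_views = count;
}
```

With the first function a stale view in slot 1 survives a call that only binds slot 0; with the wrapper it is released.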


> 
> I think radeonsi and freedreno were the only ones not doing this.  Which
> could probably temporarily leak a bit of memory by holding on to the
> sampler view reference.
Not sure about other drivers, but llvmpipe will not do this either.

Roland


> 
> Signed-off-by: Rob Clark 
> ---
>  src/gallium/docs/source/context.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/gallium/docs/source/context.rst 
> b/src/gallium/docs/source/context.rst
> index f89d9e1005e..199d335f8f4 100644
> --- a/src/gallium/docs/source/context.rst
> +++ b/src/gallium/docs/source/context.rst
> @@ -143,6 +143,9 @@ to the array index which is used for sampling.
>to a respective sampler view and releases a reference to the previous
>sampler view.
>  
> +  Previously bound samplers with index ``>= num_views`` are unbound rather
> +  than unmodified.
> +
>  * ``create_sampler_view`` creates a new sampler view. ``texture`` is 
> associated
>with the sampler view which results in sampler view holding a reference
>to the texture. Format specified in template must be compatible
> 

Re: [Mesa-dev] [PATCH] st/mesa: init hash keys with memset(), not designated initializers

2019-03-12 Thread Roland Scheidegger
Am 12.03.19 um 18:34 schrieb Eric Anholt:
> Ilia Mirkin  writes:
> 
>> I believe the distinction here is between initializing all the fields
>> and using them individually (thus leaving the padding uninitialized),
>> vs using the fields to create a structure to hash as a sequence of
>> bytes. For the latter, you really need all the memory initialized, not
>> just the "valid" parts of the structure. In at least my mind, it's
>> fairly well-established that compilers don't always initialize all of
>> a structure's underlying bytes, but I also don't have a specific
>> instance of that situation I can point to.
>>
>> For most usage, foo = {0} is fine since you're not hashing the bytes
>> but rather accessing the fields directly. But for hashing, you really
>> want all the bits initialized.
> 
> Gah.  The commit message even said it was about padding, and I failed to
> read.  Sorry for the noise, this does seem right.

The alternative is to use explicit padding in the struct, so that
structure initialization is guaranteed to initialize everything to zero.
I believe some places still do that too, but it's not very nice either
(can easily make mistakes there). So - pick your poison...
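
For illustration, the two options look roughly like this. The structs are hypothetical, not the actual st_*_variant_key layouts, and the sketch assumes a typical ABI where uint32_t is 4-byte aligned, so the implicit version has three padding bytes after 'flag'.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key with implicit padding after 'flag'. */
struct key_implicit {
   uint8_t flag;
   uint32_t value;
};

/* Same key with the padding spelled out as a named member, so that
 * "= {0}" zeroes every byte (assuming no further padding is added).
 */
struct key_explicit {
   uint8_t flag;
   uint8_t pad[3];
   uint32_t value;
};

/* memset() clears the padding bytes too, which an initializer is not
 * required to do.  In practice compilers leave them zero afterwards.
 */
static void
key_implicit_init(struct key_implicit *k, uint8_t flag, uint32_t value)
{
   memset(k, 0, sizeof(*k));
   k->flag = flag;
   k->value = value;
}

/* Byte-wise comparison, as a hash table keyed on raw bytes would do. */
static int
key_implicit_equal(const struct key_implicit *a,
                   const struct key_implicit *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```

The explicit version breaks silently if someone adds a field and forgets to adjust 'pad', which is the kind of mistake meant above.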

Roland

Re: [Mesa-dev] [PATCH] st/mesa: init hash keys with memset(), not designated initializers

2019-03-08 Thread Roland Scheidegger
Reviewed-by: Roland Scheidegger 

Am 08.03.19 um 18:14 schrieb Brian Paul:
> Since the compiler may not zero-out padding in the object.
> Add a couple comments about this to prevent misunderstandings in
> the future.
> 
> Fixes: 67d96816ff5 ("st/mesa: move, clean-up shader variant key decls/inits")
> ---
>  src/mesa/state_tracker/st_atom_shader.c |  9 +++--
>  src/mesa/state_tracker/st_program.c | 13 ++---
>  2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_atom_shader.c 
> b/src/mesa/state_tracker/st_atom_shader.c
> index ac7a1a5..a4475e2 100644
> --- a/src/mesa/state_tracker/st_atom_shader.c
> +++ b/src/mesa/state_tracker/st_atom_shader.c
> @@ -112,7 +112,10 @@ st_update_fp( struct st_context *st )
> !stfp->variants->key.bitmap) {
>shader = stfp->variants->driver_shader;
> } else {
> -  struct st_fp_variant_key key = {0};
> +  struct st_fp_variant_key key;
> +
> +  /* use memset, not an initializer to be sure all memory is zeroed */
> +  memset(&key, 0, sizeof(key));
>  
>key.st = st->has_shareable_shaders ? NULL : st;
>  
> @@ -168,7 +171,9 @@ st_update_vp( struct st_context *st )
> stvp->variants->key.passthrough_edgeflags == st->vertdata_edgeflags) {
>st->vp_variant = stvp->variants;
> } else {
> -  struct st_vp_variant_key key = {0};
> +  struct st_vp_variant_key key;
> +
> +  memset(&key, 0, sizeof(key));
>  
>key.st = st->has_shareable_shaders ? NULL : st;
>  
> diff --git a/src/mesa/state_tracker/st_program.c 
> b/src/mesa/state_tracker/st_program.c
> index 6d669a9..fe03070 100644
> --- a/src/mesa/state_tracker/st_program.c
> +++ b/src/mesa/state_tracker/st_program.c
> @@ -1807,7 +1807,10 @@ st_get_cp_variant(struct st_context *st,
>  {
> struct pipe_context *pipe = st->pipe;
> struct st_basic_variant *v;
> -   struct st_basic_variant_key key = {0};
> +   struct st_basic_variant_key key;
> +
> +   /* use memset, not an initializer to be sure all memory is zeroed */
> +   memset(&key, 0, sizeof(key));
>  
> key.st = st->has_shareable_shaders ? NULL : st;
>  
> @@ -2030,7 +2033,9 @@ st_precompile_shader_variant(struct st_context *st,
> switch (prog->Target) {
> case GL_VERTEX_PROGRAM_ARB: {
>struct st_vertex_program *p = (struct st_vertex_program *)prog;
> -  struct st_vp_variant_key key = {0};
> +  struct st_vp_variant_key key;
> +
> +  memset(&key, 0, sizeof(key));
>  
>key.st = st->has_shareable_shaders ? NULL : st;
>st_get_vp_variant(st, p, &key);
> @@ -2057,7 +2062,9 @@ st_precompile_shader_variant(struct st_context *st,
>  
> case GL_FRAGMENT_PROGRAM_ARB: {
>struct st_fragment_program *p = (struct st_fragment_program *)prog;
> -  struct st_fp_variant_key key = {0};
> +  struct st_fp_variant_key key;
> +
> +  memset(&key, 0, sizeof(key));
>  
>key.st = st->has_shareable_shaders ? NULL : st;
>st_get_fp_variant(st, p, &key);
> 

Re: [Mesa-dev] [PATCH] st/mesa: move, clean-up shader variant key decls/inits

2019-03-08 Thread Roland Scheidegger
Am 07.03.19 um 17:20 schrieb Brian Paul:
> Move the variant key declarations inside the scope they're used.
> Use designated initializers instead of memset() calls.
I don't think this will always work as intended, since there's
non-explicit padding bits in the key, and AFAIK even with c11 compilers
are not required to set such padding to zero too.

Roland



> ---
>  src/mesa/state_tracker/st_atom_shader.c | 8 
>  src/mesa/state_tracker/st_program.c | 9 +++--
>  2 files changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_atom_shader.c 
> b/src/mesa/state_tracker/st_atom_shader.c
> index c6faa3f..ac7a1a5 100644
> --- a/src/mesa/state_tracker/st_atom_shader.c
> +++ b/src/mesa/state_tracker/st_atom_shader.c
> @@ -97,7 +97,6 @@ void
>  st_update_fp( struct st_context *st )
>  {
> struct st_fragment_program *stfp;
> -   struct st_fp_variant_key key;
>  
> assert(st->ctx->FragmentProgram._Current);
> stfp = st_fragment_program(st->ctx->FragmentProgram._Current);
> @@ -113,7 +112,8 @@ st_update_fp( struct st_context *st )
> !stfp->variants->key.bitmap) {
>shader = stfp->variants->driver_shader;
> } else {
> -  memset(&key, 0, sizeof(key));
> +  struct st_fp_variant_key key = {0};
> +
>key.st = st->has_shareable_shaders ? NULL : st;
>  
>/* _NEW_FRAG_CLAMP */
> @@ -155,7 +155,6 @@ void
>  st_update_vp( struct st_context *st )
>  {
> struct st_vertex_program *stvp;
> -   struct st_vp_variant_key key;
>  
> /* find active shader and params -- Should be covered by
>  * ST_NEW_VERTEX_PROGRAM
> @@ -169,7 +168,8 @@ st_update_vp( struct st_context *st )
> stvp->variants->key.passthrough_edgeflags == st->vertdata_edgeflags) {
>st->vp_variant = stvp->variants;
> } else {
> -  memset(&key, 0, sizeof key);
> +  struct st_vp_variant_key key = {0};
> +
>key.st = st->has_shareable_shaders ? NULL : st;
>  
>/* When this is true, we will add an extra input to the vertex
> diff --git a/src/mesa/state_tracker/st_program.c 
> b/src/mesa/state_tracker/st_program.c
> index c2daa4d..5e43a2e 100644
> --- a/src/mesa/state_tracker/st_program.c
> +++ b/src/mesa/state_tracker/st_program.c
> @@ -1772,9 +1772,8 @@ st_get_cp_variant(struct st_context *st,
>  {
> struct pipe_context *pipe = st->pipe;
> struct st_basic_variant *v;
> -   struct st_basic_variant_key key;
> +   struct st_basic_variant_key key = {0};
>  
> -   memset(&key, 0, sizeof(key));
> key.st = st->has_shareable_shaders ? NULL : st;
>  
> /* Search for existing variant */
> @@ -1996,9 +1995,8 @@ st_precompile_shader_variant(struct st_context *st,
> switch (prog->Target) {
> case GL_VERTEX_PROGRAM_ARB: {
>struct st_vertex_program *p = (struct st_vertex_program *)prog;
> -  struct st_vp_variant_key key;
> +  struct st_vp_variant_key key = {0};
>  
> -  memset(&key, 0, sizeof(key));
>key.st = st->has_shareable_shaders ? NULL : st;
>st_get_vp_variant(st, p, &key);
>break;
> @@ -2024,9 +2022,8 @@ st_precompile_shader_variant(struct st_context *st,
>  
> case GL_FRAGMENT_PROGRAM_ARB: {
>struct st_fragment_program *p = (struct st_fragment_program *)prog;
> -  struct st_fp_variant_key key;
> +  struct st_fp_variant_key key = {0};
>  
> -  memset(&key, 0, sizeof(key));
>key.st = st->has_shareable_shaders ? NULL : st;
>st_get_fp_variant(st, p, &key);
>break;
> 

Re: [Mesa-dev] [PATCH] gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits.

2019-03-06 Thread Roland Scheidegger
The idea seems reasonable to me (albeit I think having so many different
barrier bits in the API in the first place is likely to cause apps to
slightly misuse them...). Drivers which don't need to care can always
ignore it.
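
As a rough sketch of what "translate the bits, let drivers ignore them" could look like (see the patch description quoted below): the GL_* values are the ones from the OpenGL headers, but the SK_BARRIER_* names and values are made up for this example and are not the PIPE_BARRIER_* definitions from p_defines.h.

```c
#include <stdint.h>

/* Values as defined by the OpenGL headers. */
#define GL_TEXTURE_UPDATE_BARRIER_BIT 0x00000100u
#define GL_BUFFER_UPDATE_BARRIER_BIT  0x00000200u

/* Hypothetical driver-facing bits for this sketch only. */
#define SK_BARRIER_UPDATE_BUFFER  (1u << 0)
#define SK_BARRIER_UPDATE_TEXTURE (1u << 1)
#define SK_BARRIER_UPDATE \
   (SK_BARRIER_UPDATE_BUFFER | SK_BARRIER_UPDATE_TEXTURE)

/* State tracker side: translate the two GL bits instead of dropping them. */
static unsigned
translate_update_barriers(uint32_t gl_bits)
{
   unsigned flags = 0;

   if (gl_bits & GL_TEXTURE_UPDATE_BARRIER_BIT)
      flags |= SK_BARRIER_UPDATE_TEXTURE;
   if (gl_bits & GL_BUFFER_UPDATE_BARRIER_BIT)
      flags |= SK_BARRIER_UPDATE_BUFFER;
   return flags;
}

/* Driver side: a driver that already flushes implicitly masks the new
 * bits off and returns whatever still needs explicit work.
 */
static unsigned
driver_filter_barriers(unsigned flags)
{
   return flags & ~SK_BARRIER_UPDATE;
}
```

A driver that does care, as described for i965 and iris below, would act on the bits instead of masking them.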

Roland



Am 06.03.19 um 09:32 schrieb Kenneth Graunke:
> The glMemoryBarrier() function makes shader memory stores ordered with
> respect to things specified by the given bits.  Until now, st/mesa has
> ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT,
> saying that drivers should implicitly perform the needed flushing.
> 
> This seems like a pretty big assumption to make.  Instead, this commit
> opts to translate them to new PIPE_BARRIER bits, and adjusts existing
> drivers to continue ignoring them (preserving the current behavior).
> 
> The i965 driver performs actions on these memory barriers.  Shader
> memory stores go through a "data cache" which is separate from the
> render cache and other read caches (like the texture cache).  All
> memory barriers need to flush the data cache (to ensure shader memory
> stores are visible), and possibly invalidate read caches (to ensure
> stale data is no longer visible).  The driver implicitly flushes for
> most caches, but not for data cache, since ARB_shader_image_load_store
> introduced MemoryBarrier() precisely to order these explicitly.
> 
> I would like to follow i965's approach in iris, flushing the data cache
> on any MemoryBarrier() call, so I need st/mesa to actually call the
> pipe->memory_barrier() callback.
> ---
>  .../drivers/freedreno/freedreno_context.c |  3 ++
>  src/gallium/drivers/r600/r600_state_common.c  |  4 +++
>  src/gallium/drivers/radeonsi/si_state.c   |  3 ++
>  src/gallium/drivers/softpipe/sp_flush.c   |  3 ++
>  src/gallium/drivers/tegra/tegra_context.c |  3 ++
>  src/gallium/drivers/v3d/v3d_context.c |  3 ++
>  src/gallium/include/pipe/p_defines.h  |  7 +++-
>  src/mesa/state_tracker/st_cb_texturebarrier.c | 34 +++
>  8 files changed, 44 insertions(+), 16 deletions(-)
> 
> I am curious to hear people's thoughts on this.  It seems useful for the
> driver to receive a memory_barrier() call...and adding a few bits seemed
> to be the cleanest way to make that happen.  But I'm open to ideas.
> 
> There are no nouveau changes in this patch, but that's only because none
> appeared to be necessary.  Most drivers performed some kind of flush on
> any memory_barrier() call, regardless of the bits - but nouveau's hooks
> don't.  So the newly added case would already be a no-op.
> 
> diff --git a/src/gallium/drivers/freedreno/freedreno_context.c 
> b/src/gallium/drivers/freedreno/freedreno_context.c
> index 6c01c15bb32..4e86d099974 100644
> --- a/src/gallium/drivers/freedreno/freedreno_context.c
> +++ b/src/gallium/drivers/freedreno/freedreno_context.c
> @@ -99,6 +99,9 @@ fd_texture_barrier(struct pipe_context *pctx, unsigned 
> flags)
>  static void
>  fd_memory_barrier(struct pipe_context *pctx, unsigned flags)
>  {
> + if (!(flags & ~PIPE_BARRIER_UPDATE))
> + return;
> +
>   fd_context_flush(pctx, NULL, 0);
>   /* TODO do we need to check for persistently mapped buffers and 
> fd_bo_cpu_prep()?? */
>  }
> diff --git a/src/gallium/drivers/r600/r600_state_common.c 
> b/src/gallium/drivers/r600/r600_state_common.c
> index f886a27170d..c7c606f131b 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -94,6 +94,10 @@ void r600_emit_alphatest_state(struct r600_context *rctx, 
> struct r600_atom *atom
>  static void r600_memory_barrier(struct pipe_context *ctx, unsigned flags)
>  {
>   struct r600_context *rctx = (struct r600_context *)ctx;
> +
> + if (!(flags & ~PIPE_BARRIER_UPDATE))
> + return;
> +
>   if (flags & PIPE_BARRIER_CONSTANT_BUFFER)
>   rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE;
>  
> diff --git a/src/gallium/drivers/radeonsi/si_state.c 
> b/src/gallium/drivers/radeonsi/si_state.c
> index 458b108a7e3..3c29b4c92ed 100644
> --- a/src/gallium/drivers/radeonsi/si_state.c
> +++ b/src/gallium/drivers/radeonsi/si_state.c
> @@ -4710,6 +4710,9 @@ void si_memory_barrier(struct pipe_context *ctx, 
> unsigned flags)
>  {
>   struct si_context *sctx = (struct si_context *)ctx;
>  
> + if (!(flags & ~PIPE_BARRIER_UPDATE))
> + return;
> +
>   /* Subsequent commands must wait for all shader invocations to
>* complete. */
>   sctx->flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
> diff --git a/src/gallium/drivers/softpipe/sp_flush.c 
> b/src/gallium/drivers/softpipe/sp_flush.c
> index 3bf8c499218..5eccbe34d46 100644
> --- a/src/gallium/drivers/softpipe/sp_flush.c
> +++ b/src/gallium/drivers/softpipe/sp_flush.c
> @@ -192,5 +192,8 @@ void softpipe_texture_barrier(struct pipe_context *pipe, 
> unsigned flags)
>  
>  void softpipe_memory_barrier(struct pipe_context *pipe, unsigned flags)
>  {
> +   if (!(flags & 

Re: [Mesa-dev] [PATCH] gallium/util: Fix off-by-one in box intersection

2019-02-28 Thread Roland Scheidegger
Am 01.03.19 um 00:28 schrieb Gurchetan Singh:
> On Thu, Feb 28, 2019 at 12:39 AM Boris Brezillon
>  wrote:
>>
>> Hello Gurchetan,
>>
>> On Wed, 27 Feb 2019 10:34:26 -0800
>> Gurchetan Singh  wrote:
>>
>>> On Mon, Feb 25, 2019 at 12:35 AM Boris Brezillon
>>>  wrote:

>>>> From: Daniel Stone
>>>>
>>>> pipe_boxes are x/y + width/height, rather than x0/y0 -> x1/y1. This
>>>> means that (x+width) is not included in the box.
>>>>
>>>> The box intersection check was seemingly written for inclusive regions,
>>>> and would falsely assert that adjacent boxes would overlap.
>>>>
>>>> Fix the off-by-one by being one pixel less greedy.
>>>
>>> Is there a reason for this change?  I only see this used in a warning
>>> in the nine state tracker and virgl (where reporting adjacent
>>> intersections is preferred).
>>
>> This patch was part of a series Daniel worked on to optimize texture
>> atlas updates on Vivante GPUs [1]. In the end, this work has been put
>> on hold because the perf optimization was not as high as expected, but
>> it might be resurrected at some point.
>> Anyway, back to the point. In this patchset, the pipe_region_overlaps()
>> helper needs to check when regions overlap and not when they're
>> adjacent. If other users need u_box_test_intersection_2d() to also
>> detect when boxes are adjacent, then we should definitely keep the code
>> unchanged, but maybe it's worth a comment in the code to clarify the
>> behavior.
> 
> Thanks for the information.  You can just modify this function to be
> something like:
> 
> u_box_test_intersection_2d(const struct pipe_box *a, const struct
> pipe_box *b, boolean adjacent_allowed)
> 
> [or add another function --- whatever you prefer]
> 
> That way we can keep behavior for virgl/nine unchanged.

I can't see why you'd want to know if the regions are adjacent?
If they are adjacent you can still do blits etc. without having to worry
about overwriting src regions etc.
Now for 1d regions (buffers) I could see adjacent being useful - could
use that to combine multiple ranges into one for instance. But I don't
think you'd want to use a 2d intersect test for that...

Roland



> 
>>
>> Regards,
>>
>> Boris
>>
>> [1] https://gitlab.collabora.com/bbrezillon/mesa/commits/etna-texture-atlas-18.2.4
>>
>>>

>>>> Signed-off-by: Daniel Stone
>>>> Signed-off-by: Boris Brezillon
>>>> ---
>>>>  src/gallium/auxiliary/util/u_box.h | 16 
>>>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/src/gallium/auxiliary/util/u_box.h 
>>>> b/src/gallium/auxiliary/util/u_box.h
>>>> index b3f478e7bfc4..ead7189ecaf8 100644
>>>> --- a/src/gallium/auxiliary/util/u_box.h
>>>> +++ b/src/gallium/auxiliary/util/u_box.h
>>>> @@ -161,15 +161,15 @@ u_box_test_intersection_2d(const struct pipe_box *a,
>>>> unsigned i;
>>>> int a_l[2], a_r[2], b_l[2], b_r[2];
>>>>
>>>> -   a_l[0] = MIN2(a->x, a->x + a->width);
>>>> -   a_r[0] = MAX2(a->x, a->x + a->width);
>>>> -   a_l[1] = MIN2(a->y, a->y + a->height);
>>>> -   a_r[1] = MAX2(a->y, a->y + a->height);
>>>> +   a_l[0] = MIN2(a->x, a->x + a->width - 1);
>>>> +   a_r[0] = MAX2(a->x, a->x + a->width - 1);
>>>> +   a_l[1] = MIN2(a->y, a->y + a->height - 1);
>>>> +   a_r[1] = MAX2(a->y, a->y + a->height - 1);
>>>>
>>>> -   b_l[0] = MIN2(b->x, b->x + b->width);
>>>> -   b_r[0] = MAX2(b->x, b->x + b->width);
>>>> -   b_l[1] = MIN2(b->y, b->y + b->height);
>>>> -   b_r[1] = MAX2(b->y, b->y + b->height);
>>>> +   b_l[0] = MIN2(b->x, b->x + b->width - 1);
>>>> +   b_r[0] = MAX2(b->x, b->x + b->width - 1);
>>>> +   b_l[1] = MIN2(b->y, b->y + b->height - 1);
>>>> +   b_r[1] = MAX2(b->y, b->y + b->height - 1);
>>>>
>>>> for (i = 0; i < 2; ++i) {
>>>>    if (a_l[i] > b_r[i] || a_r[i] < b_l[i])
>>>> --
>>>> 2.20.1

>>
> 

Re: [Mesa-dev] [PATCH] scons: Workaround failures with MSVC when using SCons 3.0.[2-4].

2019-02-28 Thread Roland Scheidegger
Am 28.02.19 um 11:03 schrieb Jose Fonseca:
> This change applies the workaround suggested by Bill Deegan on the
> affected SCons versions.
> 
> It also adds a comment with the URL explaining why we were
> customizing the decider and max_drift in the first place, as I had
> forgotten all about it.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109443
> Tested-by: liviupro...@yahoo.com
> ---
>  scons/gallium.py | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/scons/gallium.py b/scons/gallium.py
> index 963834a5fbc..efe32e06c6c 100755
> --- a/scons/gallium.py
> +++ b/scons/gallium.py
> @@ -308,7 +308,13 @@ def generate(env):
>  if env.GetOption('num_jobs') <= 1:
>  env.SetOption('num_jobs', num_jobs())
>  
> -env.Decider('MD5-timestamp')
> +# Speed up dependency checking.  See
> +# - https://github.com/SCons/scons/wiki/GoFastButton
> +# - https://bugs.freedesktop.org/show_bug.cgi?id=109443
> +scons_version = distutils.version.StrictVersion(SCons.__version__)
> +if scons_version < distutils.version.StrictVersion('3.0.2') or \
> +   scons_version > distutils.version.StrictVersion('3.0.4'):
> +env.Decider('MD5-timestamp')
>  env.SetOption('max_drift', 60)
>  
>  # C preprocessor options
> 

Reviewed-by: Roland Scheidegger 

Re: [Mesa-dev] [PATCH] gallium: allow more PIPE_RESOURCE_ driver flags

2019-01-31 Thread Roland Scheidegger
Looks ok to me, there's still a few bits left luckily...
Reviewed-by: Roland Scheidegger 
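
As a quick check on the bit budget, a small sketch that counts how many single-bit driver-private flags fit below the state-tracker range. The shift values are the ones in the quoted patch; the TOY_* names and the helper are made up for illustration.

```c
/* Shift values taken from the quoted patch; names are illustrative. */
#define TOY_FLAG_DRV_PRIV_OLD (1u << 16)
#define TOY_FLAG_DRV_PRIV_NEW (1u << 8)
#define TOY_FLAG_ST_PRIV      (1u << 24)

/* Counts the single-bit flags from `first` (inclusive) up to `limit`
 * (exclusive). Both arguments must be powers of two. */
static unsigned
toy_count_private_bits(unsigned first, unsigned limit)
{
   unsigned n = 0;
   for (unsigned bit = first; bit < limit; bit <<= 1)
      n++;
   return n;
}
```

Under these definitions the driver range grows from 8 to 16 flags, while bits 4 to 7 remain free for generic PIPE_RESOURCE_FLAG_* additions.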

Am 30.01.19 um 20:21 schrieb Marek Olšák:
> From: Marek Olšák 
> 
> radeonsi has 8 and will probably have 9 soon.
> ---
>  src/gallium/include/pipe/p_defines.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/include/pipe/p_defines.h 
> b/src/gallium/include/pipe/p_defines.h
> index 867d0cb5d74..1f2a3469cc9 100644
> --- a/src/gallium/include/pipe/p_defines.h
> +++ b/src/gallium/include/pipe/p_defines.h
> @@ -480,21 +480,21 @@ enum pipe_flush_flags
>  #define PIPE_BIND_LINEAR  (1 << 21)
>  
>  
>  /**
>   * Flags for the driver about resource behaviour:
>   */
>  #define PIPE_RESOURCE_FLAG_MAP_PERSISTENT (1 << 0)
>  #define PIPE_RESOURCE_FLAG_MAP_COHERENT   (1 << 1)
>  #define PIPE_RESOURCE_FLAG_TEXTURING_MORE_LIKELY (1 << 2)
>  #define PIPE_RESOURCE_FLAG_SPARSE(1 << 3)
> -#define PIPE_RESOURCE_FLAG_DRV_PRIV(1 << 16) /* driver/winsys private */
> +#define PIPE_RESOURCE_FLAG_DRV_PRIV(1 << 8) /* driver/winsys private */
>  #define PIPE_RESOURCE_FLAG_ST_PRIV (1 << 24) /* state-tracker/winsys 
> private */
>  
>  /**
>   * Hint about the expected lifecycle of a resource.
>   * Sorted according to GPU vs CPU access.
>   */
>  enum pipe_resource_usage {
> PIPE_USAGE_DEFAULT,/* fast GPU access */
> PIPE_USAGE_IMMUTABLE,  /* fast GPU access, immutable */
> PIPE_USAGE_DYNAMIC,/* uploaded data is used multiple times */
> 



Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()

2019-01-29 Thread Roland Scheidegger
Am 29.01.19 um 10:10 schrieb Erik Faye-Lund:
> On Mon, 2019-01-28 at 09:31 -0800, Matt Turner wrote:
>> Use the trick of adding and then subtracting 2**52 (52 is the number
>> of
>> explicit mantissa bits a double-precision floating-point value has)
>> to
>> implement round-to-even.
>>
>> Cuts the number of instructions on SKL of the piglit test
>> fs-roundEven-double.shader_test from 109 to 21.
> 
> Won't this approach only work for "small" values, that is values equal
> to or smaller than DBL_MAX - 2**52? Once you add 2**52, you'll get
> infinity, and you can't subtract 2**52 away again without being stuck
> with infinity, no...

It would actually work for very large numbers in theory.
The only numbers the magic trick won't work are those with magnitude
between 2^52 and 2^104 (those are already integral and the add will
cause some of them to be rounded up to another number with the sub not
doing anything afterwards), for larger ones it will work again, up to
and including inf.
But in any case, that's what the bcsel is for, for numbers larger than
2^52 no operations are performed at all.
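
A small CPU-side sketch of the same idea, in case it helps to see the guard at work. The function names are invented for this example, and it assumes the default round-to-nearest-even mode and no fast-math; volatile is used only to stop the compiler from folding the add/sub pair.

```c
#include <math.h>   /* signbit() */

/* 2^52: from this magnitude on, every double is already integral. */
#define TWO52 4503599627370496.0

/* The bare trick: add and subtract 2^52 on |x|, then restore the sign.
 * Only meaningful for |x| < 2^52. */
static double
toy_round_even_unguarded(double x)
{
   const int negative = signbit(x) != 0;
   volatile double t = negative ? -x : x;   /* |x| */
   t = t + TWO52;   /* fractional bits are rounded off here */
   t = t - TWO52;   /* exact, recovers the rounded value */
   return negative ? -t : t;
}

/* Guarded variant: inputs with |x| >= 2^52 (and NaN) are returned
 * unchanged, which is the job of the select in the patch. */
static double
toy_round_even(double x)
{
   const double ax = signbit(x) ? -x : x;
   return ax < TWO52 ? toy_round_even_unguarded(x) : x;
}
```

For x = 2^52 + 1 the unguarded version returns 2^52, because 2^53 + 1 is not representable and the tie rounds to the even neighbor; the guarded version returns x untouched.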

Roland


> 
> 



Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()

2019-01-28 Thread Roland Scheidegger
I like it :-).
That said, there's some caveats as discussed on IRC - in particular for
gpus which don't do round-to-nearest-even for ordinary fp64 math (or
rounding mode could be set to something else manually) it won't do the
right thing.
And if you can have fast-math enabled, then it probably won't round at
all (at least I think it would be legal to eliminate the add/sub in this
case).
So I'm not entirely sure anymore if this can be used unconditionally.
But I can't really tell if those potential caveats actually matter, hence
Reviewed-by: Roland Scheidegger 

Am 28.01.19 um 18:31 schrieb Matt Turner:
> Use the trick of adding and then subtracting 2**52 (52 is the number of
> explicit mantissa bits a double-precision floating-point value has) to
> implement round-to-even.
> 
> Cuts the number of instructions on SKL of the piglit test
> fs-roundEven-double.shader_test from 109 to 21.
> ---
>  src/compiler/nir/nir_lower_double_ops.c | 56 ++---
>  1 file changed, 12 insertions(+), 44 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_lower_double_ops.c 
> b/src/compiler/nir/nir_lower_double_ops.c
> index 4d4cdf635ea..054fce9c168 100644
> --- a/src/compiler/nir/nir_lower_double_ops.c
> +++ b/src/compiler/nir/nir_lower_double_ops.c
> @@ -392,50 +392,18 @@ lower_fract(nir_builder *b, nir_ssa_def *src)
>  static nir_ssa_def *
>  lower_round_even(nir_builder *b, nir_ssa_def *src)
>  {
> -   /* If fract(src) == 0.5, then we will have to decide the rounding 
> direction.
> -* We will do this by computing the mod(abs(src), 2) and testing if it
> -* is < 1 or not.
> -*
> -* We compute mod(abs(src), 2) as:
> -* abs(src) - 2.0 * floor(abs(src) / 2.0)
> -*/
> -   nir_ssa_def *two = nir_imm_double(b, 2.0);
> -   nir_ssa_def *abs_src = nir_fabs(b, src);
> -   nir_ssa_def *mod =
> -  nir_fsub(b,
> -   abs_src,
> -   nir_fmul(b,
> -two,
> -nir_ffloor(b,
> -   nir_fmul(b,
> -abs_src,
> -                                    nir_imm_double(b, 0.5)))));
> -
> -   /*
> -* If fract(src) != 0.5, then we round as floor(src + 0.5)
> -*
> -* If fract(src) == 0.5, then we have to check the modulo:
> -*
> -*   if it is < 1 we need a trunc operation so we get:
> -*  0.5 -> 0,   -0.5 -> -0
> -*  2.5 -> 2,   -2.5 -> -2
> -*
> -*   otherwise we need to check if src >= 0, in which case we need to 
> round
> -*   upwards, or not, in which case we need to round downwards so we get:
> -*  1.5 -> 2,   -1.5 -> -2
> -*  3.5 -> 4,   -3.5 -> -4
> -*/
> -   nir_ssa_def *fract = nir_ffract(b, src);
> -   return nir_bcsel(b,
> -nir_fne(b, fract, nir_imm_double(b, 0.5)),
> -nir_ffloor(b, nir_fadd(b, src, nir_imm_double(b, 0.5))),
> -nir_bcsel(b,
> -  nir_flt(b, mod, nir_imm_double(b, 1.0)),
> -  nir_ftrunc(b, src),
> -  nir_bcsel(b,
> -nir_fge(b, src, nir_imm_double(b, 
> 0.0)),
> -nir_fadd(b, src, nir_imm_double(b, 
> 0.5)),
> -nir_fsub(b, src, nir_imm_double(b, 
> 0.5)))));
> +   /* Add and subtract 2**52 to round off any fractional bits. */
> +   nir_ssa_def *two52 = nir_imm_double(b, (double)(1ull << 52));
> +   nir_ssa_def *sign = nir_iand(b, nir_unpack_64_2x32_split_y(b, src),
> +nir_imm_int(b, 1ull << 31));
> +
> +   b->exact = true;
> +   nir_ssa_def *res = nir_fsub(b, nir_fadd(b, nir_fabs(b, src), two52), 
> two52);
> +   b->exact = false;
> +
> +   return nir_bcsel(b, nir_flt(b, nir_fabs(b, src), two52),
> +nir_pack_64_2x32_split(b, nir_unpack_64_2x32_split_x(b, 
> res),
> +   nir_ior(b, 
> nir_unpack_64_2x32_split_y(b, res), sign)), src);
>  }
>  
>  static nir_ssa_def *
> 



Re: [Mesa-dev] [PATCH 2/2] gallivm: Return true from arch_rounding_available() if NEON is available

2019-01-22 Thread Roland Scheidegger
I have no clue on aarch64, but looks all good to me.
For the series:
Reviewed-by: Roland Scheidegger 

Am 23.01.19 um 00:12 schrieb Matt Turner:
> LLVM uses the single instruction "FRINTI" to implement llvm.nearbyint.
> Fixes the rounding tests of lp_test_arit.
> 
> Bug: https://bugs.gentoo.org/665570
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 4 +++-
>  src/gallium/drivers/llvmpipe/lp_test_arit.c | 3 ++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> index c050bfdb936..057c50ed278 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> @@ -1992,6 +1992,8 @@ arch_rounding_available(const struct lp_type type)
> else if ((util_cpu_caps.has_altivec &&
>  (type.width == 32 && type.length == 4)))
>return TRUE;
> +   else if (util_cpu_caps.has_neon)
> +  return TRUE;
>  
> return FALSE;
>  }
> @@ -2099,7 +2101,7 @@ lp_build_round_arch(struct lp_build_context *bld,
>  LLVMValueRef a,
>  enum lp_build_round_mode mode)
>  {
> -   if (util_cpu_caps.has_sse4_1) {
> +   if (util_cpu_caps.has_sse4_1 || util_cpu_caps.has_neon) {
>LLVMBuilderRef builder = bld->gallivm->builder;
>const struct lp_type type = bld->type;
>const char *intrinsic_root;
> diff --git a/src/gallium/drivers/llvmpipe/lp_test_arit.c 
> b/src/gallium/drivers/llvmpipe/lp_test_arit.c
> index acba7ed44a8..eb3f67dc1fe 100644
> --- a/src/gallium/drivers/llvmpipe/lp_test_arit.c
> +++ b/src/gallium/drivers/llvmpipe/lp_test_arit.c
> @@ -458,7 +458,8 @@ test_unary(unsigned verbose, FILE *fp, const struct 
> unary_test_t *test, unsigned
>  continue;
>   }
>  
> - if (test->ref ==  && length == 2 && 
> + if (!util_cpu_caps.has_neon &&
> + test->ref ==  && length == 2 &&
>   ref != roundf(testval)) {
>  /* FIXME: The generic (non SSE) path in lp_build_iround, which is
>   * always taken for length==2 regardless of native round support,
> 



Re: [Mesa-dev] [PATCH] autotools: Deprecate the use of autotools

2019-01-12 Thread Roland Scheidegger
Am 11.01.19 um 23:28 schrieb Ilia Mirkin:
> On Fri, Jan 11, 2019 at 5:12 PM Matt Turner  wrote:
>>
>> From: Gert Wollny 
>>
>> Since Meson will eventually be the only build system deprecate autotools
>> now. It can still be used by invoking configure with the flag
>>   --enable-autotools
>>
>> NAKed-by: Ilia Mirkin 
> 
> [nouveau]
> 
>> Acked-by: Eric Engestrom 
>> Acked-by: Kenneth Graunke 
>> Acked-by: Lionel Landwerlin 
>> Acked-by: Jason Ekstrand 
>> Reviewed-by: Matt Turner 
> 
> [intel]
> 
>> Acked-by: Rob Clark 
> 
> [freedreno]
> 
>> Acked-by: Marek Olšák 
> 
> [radeon]
> 
>> Reviewed-by: Christian Gmeiner 
> 
> [etnaviv]
> 
>> Reviewed-by: Eric Anholt 
> 
> [vc4]
> 
>> Signed-off-by: Gert Wollny 
> 
> [sorry Gert, not sure how to classify you]
> 
> I think the vmware team (which largely maintains llvmpipe and svga) is
> probably worth hearing from -- I believe they've largely stayed out of
> it. But an ack/nack would be good. Also virgl isn't represented, I
> believe. Probably not *required* to hear from these, but perhaps worth
> a poke?

I don't think we actually use autotools anywhere (we use scons)
officially, so I don't think we have any objections to deprecate
autotools. Though at some point we should probably adopt meson too...

Roland

> 
>> ---
>> I think there's support for overriding the sole objection to this patch.
>>
>> To confirm:
>>
>> (1) The plan is to remove Autotools, perhaps after the 19.0 release
>>
>> (2) This patch's purpose is to ensure that everyone knows that
>> Autotools will be going away (think: people who build Mesa as
>> part of an automated process and wouldn't notice a deprecation
>> warning unless it requires some action from them)
> 
> If it's being removed _after_ the 19.0 release, does it make sense to
> have a patch like this _in_ the 19.0 release? (Perhaps the answer is
> `yes', but I'd still like to ask the question.)
> 
>>
>> (3) We expect all reasonable concerns about Meson to be resolved
>> before Autotools is removed (e.g., reconfiguration problems,
>> retrieving configuration command line, configuration status
>> output, etc.)
>>
>>  configure.ac | 13 +
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/configure.ac b/configure.ac
>> index e4d20054d5f..c7473d77eff 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -52,6 +52,19 @@ mingw*)
>>  ;;
>>  esac
>>
>> +AC_ARG_ENABLE(autotools,
>> +   [AS_HELP_STRING([--enable-autotools],
>> +   [Enable the use of this autotools based build 
>> configuration])],
>> +   [enable_autotools=$enableval], [enable_autotools=no])
>> +
>> +if test "x$enable_autotools" != "xyes" ; then
>> +AC_MSG_ERROR([the autotools build system has been deprecated in favour 
>> of
>> +meson and will be removed eventually. For instructions on how to use 
>> meson
>> see https://www.mesa3d.org/meson.html.
>> +If you still want to use the autotools build, then add 
>> --enable-autotools
>> +to the configure command line.])
>> +fi
>> +
>>  # Support silent build rules, requires at least automake-1.11. Disable
>>  # by either passing --disable-silent-rules to configure or passing V=1
>>  # to make
>> --
>> 2.19.2
>>
> 



Re: [Mesa-dev] [PATCH] llvmpipe: Always return some fence in flush (v2)

2019-01-10 Thread Roland Scheidegger
Am 10.01.19 um 04:51 schrieb Tomasz Figa:
> Hi Roland,
> 
> On Thu, Jan 10, 2019 at 1:36 AM Roland Scheidegger  wrote:
>>
>> Sorry but I had to revert this, as we've seen lots of assertion failures
>> (src/gallium/drivers/llvmpipe/lp_fence.c:120: lp_fence_wait: Assertion
>> `f->issued' failed.). For instance with libgl_xlib state tracker and piglit.
>> I'm not entirely sure if it's really safe to just remove the assert or
> 
> Understood. Sorry for not spotting this in my testing.
No problem...

> Would you be
> able to help with more details on how to reproduce these failures?

I'm not sure if this reproduces with dri, so build libgl-xlib (I just
use scons, but it shouldn't matter, as long as you get asserts enabled...).
Set LD_LIBRARY_PATH to the built libGL, and run any of the affected
piglit tests (automated testing here got about 3500 crashes) - picking
just the first piglit/bin/arb_vertex_array_bgra-get -auto
arb_vertex_array_bgra-get: src/gallium/drivers/llvmpipe/lp_fence.c:120:
lp_fence_wait: Assertion `f->issued' failed.

It actually seems to mostly affect tests which don't render anything
(and they might actually crash after reporting the test passed).
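
My guess at the mechanism, as a toy sketch rather than the llvmpipe code: a fence handed out by flush when nothing was rendered never gets marked as issued, so the first wait on it trips the assert. All names below are hypothetical, and the sketch is single-threaded; it only shows what a dummy fence that is "born" issued and signalled would look like.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded stand-in for a driver fence. */
struct toy_fence {
   bool issued;      /* handed to something that will signal it */
   unsigned rank;    /* number of signals expected */
   unsigned count;   /* number of signals received so far */
};

static bool
toy_fence_signalled(const struct toy_fence *f)
{
   return f->count >= f->rank;
}

/* Fence returned when no rendering is pending: nobody will ever
 * signal it, so it expects zero signals and is marked issued. */
static struct toy_fence
toy_fence_create_dummy(void)
{
   struct toy_fence f = { .issued = true, .rank = 0, .count = 0 };
   return f;
}

/* Mirrors the shape of the failing check: waiting on a fence that
 * was never issued is treated as a bug. A real implementation would
 * block here instead of asserting that the fence is signalled. */
static void
toy_fence_wait(const struct toy_fence *f)
{
   assert(f->issued);
   assert(toy_fence_signalled(f));
}
```

Whether marking such a fence as issued is actually safe in the driver is exactly the part that would need a closer look.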

FWIW this is the stack trace:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x77433801 in __GI_abort () at abort.c:79
#2  0x7742339a in __assert_fail_base (fmt=0x775aa7d8
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x756f9fa9 "f->issued",
file=file@entry=0x756f9f50
"src/gallium/drivers/llvmpipe/lp_fence.c", line=line@entry=120,
function=function@entry=0x756fa090 <__PRETTY_FUNCTION__.11755>
"lp_fence_wait") at assert.c:92
#3  0x77423412 in __GI___assert_fail (assertion=0x756f9fa9
"f->issued", file=0x756f9f50
"src/gallium/drivers/llvmpipe/lp_fence.c", line=120,
function=0x756fa090 <__PRETTY_FUNCTION__.11755> "lp_fence_wait") at
assert.c:101
#4  0x73eae07a in lp_fence_wait (f=0x559b1a10) at
src/gallium/drivers/llvmpipe/lp_fence.c:120
#5  0x73eaaa9b in llvmpipe_fence_finish (screen=0x55798170,
ctx=0x0, fence_handle=0x559b1a10, timeout=18446744073709551615) at
src/gallium/drivers/llvmpipe/lp_screen.c:637
#6  0x73f22eaf in XMesaFlush (c=0x557c4db0) at
src/gallium/state_trackers/glx/xlib/xm_api.c:1402
#7  0x73f22b80 in XMesaMakeCurrent2 (c=0x0, drawBuffer=0x0,
readBuffer=0x0) at src/gallium/state_trackers/glx/xlib/xm_api.c:1300
#8  0x73f260f1 in glXMakeContextCurrent (dpy=0x55781e60,
draw=0, read=0, ctx=0x0) at
src/gallium/state_trackers/glx/xlib/glx_api.c:1262
#9  0x73f2616c in glXMakeCurrent (dpy=0x55781e60,
drawable=0, ctx=0x0) at src/gallium/state_trackers/glx/xlib/glx_api.c:1283
#10 0x76b81a70 in ?? () from
/usr/lib/x86_64-linux-gnu/libwaffle-1.so.0
#11 0x76b7e01c in waffle_make_current () from
/usr/lib/x86_64-linux-gnu/libwaffle-1.so.0
#12 0x77b30829 in piglit_wfl_framework_teardown
(wfl_fw=0x557817a0) at
/home/sroland/devel/piglit/tests/util/piglit-framework-gl/piglit_wfl_framework.c:636
#13 0x77b30f05 in piglit_winsys_framework_teardown
(winsys_fw=0x557817a0) at
/home/sroland/devel/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:238
#14 0x77b31e1b in destroy (gl_fw=0x557817a0) at
/home/sroland/devel/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:212
#15 0x77b14b6e in destroy () at
/home/sroland/devel/piglit/tests/util/piglit-framework-gl.c:210
#16 0x77436041 in __run_exit_handlers (status=0,
listp=0x777de718 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true,
run_dtors=run_dtors@entry=true) at exit.c:108
#17 0x7743613a in __GI_exit (status=) at exit.c:139
#18 0x777e66b9 in piglit_report_result (result=PIGLIT_PASS) at
/home/sroland/devel/piglit/tests/util/piglit-util.c:241
#19 0x4fca in piglit_init (argc=1, argv=0x7fffe338) at
/home/sroland/devel/piglit/tests/spec/arb_vertex_array_bgra/get.c:74
#20 0x77b30975 in run_test (gl_fw=0x557817a0, argc=1,
argv=0x7fffe338) at
/home/sroland/devel/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:73
#21 0x77b14c22 in piglit_gl_test_run (argc=1,
argv=0x7fffe338, config=0x7fffe1f0) at
/home/sroland/devel/piglit/tests/util/piglit-framework-gl.c:229
#22 0x4e20 in main (argc=1, argv=0x7fffe338) at
/home/sroland/devel/piglit/tests/spec/arb_vertex_array_bgra/get.c:38



> 
>> if it needs some more work, and I don't have time right now for a
>> thorough investigation, but I'll happily take new patches...
> 
> Perhaps we could make these dummy fences "issued". I'll check how this
> works in the code.
> 
> Best regards,
> To

Re: [Mesa-dev] [PATCH] st/mesa: don't leak pipe_surface if pipe_context is not current

2019-01-09 Thread Roland Scheidegger
Am 08.01.19 um 21:03 schrieb Marek Olšák:
> On Tue, Jan 8, 2019 at 12:54 PM Roland Scheidegger wrote:
> 
>> Am 08.01.19 um 17:17 schrieb Marek Olšák:
>>> From: Marek Olšák
>>>
>>> We have found some pipe_surface leaks internally.
>>>
>>> This is the same code as surface_destroy in radeonsi.
>>> Ideally, surface_destroy would be in pipe_screen.
>>> No, pipe_surfaces are not context objects.
>> Well they are supposed to be...
>> But yes mesa/st doesn't play by the rules there, so I guess that's
>> better than a leak...
> 
> 
> If it was possible, I would change the rules. create_surface could stay
> in pipe_context, but I would move surface_destroy into pipe_screen.

I think st/mesa would still use (not just delete) the pipe_surface in
other contexts? If so I don't like the proposal, because it's still a
lie that pipe_surface is a context-based object.

Roland


> 
> Marek



Re: [Mesa-dev] [PATCH] llvmpipe: Always return some fence in flush (v2)

2019-01-09 Thread Roland Scheidegger
Sorry but I had to revert this, as we've seen lots of assertion failures
(src/gallium/drivers/llvmpipe/lp_fence.c:120: lp_fence_wait: Assertion
`f->issued' failed.). For instance with libgl_xlib state tracker and piglit.
I'm not entirely sure if it's really safe to just remove the assert or
if it needs some more work, and I don't have time right now for a
thorough investigation, but I'll happily take new patches...

Roland


Am 09.01.19 um 02:09 schrieb Roland Scheidegger:
> Am 07.01.19 um 09:54 schrieb Tomasz Figa:
>> On Sun, Dec 23, 2018 at 12:55 AM Roland Scheidegger  
>> wrote:
>>>
>>> Alright, I guess it should work...
>>>
>>> Reviewed-by: Roland Scheidegger 
>>>
>>
>> Thanks!
>>
>> Would we have anyone who could help to commit it?
> 
> Pushed (albeit I forgot the R-b on it, ah well...)
> 
> Roland
> 
>>
>> (I know that I was supposed to apply for commit rights, but I expect
>> my contribution rate to be relatively low, due to a shift to different
>> areas, so I don't think I'm a good candidate for a committer anymore.)
>>
>> Best regards,
>> Tomasz
>>
>>>
>>> Am 14.12.18 um 09:17 schrieb Tomasz Figa:
>>>> If there is no last fence, due to no rendering happening yet, just
>>>> create a new signaled fence and return it, to match the expectations of
>>>> the EGL sync fence API.
>>>>
>>>> Fixes random "Could not create sync fence 0x3003" assertion failures from
>>>> Skia on Android, coming from the following code:
>>>>
>>>> https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427
>>>>
>>>> Reproducible especially with thread count >= 4.
>>>>
>>>> One could make the driver always keep the reference to the last fence,
>>>> but:
>>>>
>>>>  - the driver seems to explicitly destroy the fence whenever a rendering
>>>>pass completes and changing that would require a significant functional
>>>>change to the code. (Specifically, in lp_scene_end_rasterization().)
>>>>
>>>>  - it still wouldn't solve the problem of an EGL sync fence being created
>>>>and waited on without any rendering happening at all, which is
>>>>also likely to happen with Android code pointed to in the commit.
>>>>
>>>> Therefore, the simple approach of always creating a fence is taken,
>>>> similarly to other drivers, such as radeonsi.
>>>>
>>>> Tested with piglit llvmpipe suite with no regressions and following
>>>> tests fixed:
>>>>
>>>> egl_khr_fence_sync
>>>>  conformance
>>>>   eglclientwaitsynckhr_flag_sync_flush
>>>>   eglclientwaitsynckhr_nonzero_timeout
>>>>   eglclientwaitsynckhr_zero_timeout
>>>>   eglcreatesynckhr_default_attributes
>>>>   eglgetsyncattribkhr_invalid_attrib
>>>>   eglgetsyncattribkhr_sync_status
>>>>
>>>> v2:
>>>>  - remove the useless lp_fence_reference() dance (Nicolai),
>>>>  - explain why creating the dummy fence is the right approach.
>>>>
>>>> Signed-off-by: Tomasz Figa 
>>>> ---
>>>>  src/gallium/drivers/llvmpipe/lp_setup.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c 
>>>> b/src/gallium/drivers/llvmpipe/lp_setup.c
>>>> index b087369473..e72e119c8a 100644
>>>> --- a/src/gallium/drivers/llvmpipe/lp_setup.c
>>>> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
>>>> @@ -361,6 +361,8 @@ lp_setup_flush( struct lp_setup_context *setup,
>>>>
>>>> if (fence) {
>>>>lp_fence_reference((struct lp_fence **)fence, setup->last_fence);
>>>> +  if (!*fence)
>>>> + *fence = (struct pipe_fence_handle *)lp_fence_create(0);
>>>> }
>>>>  }
>>>>
>>>>
>>>
> 
> 



Re: [Mesa-dev] [PATCH] llvmpipe: Always return some fence in flush (v2)

2019-01-08 Thread Roland Scheidegger
On 07.01.19 at 09:54, Tomasz Figa wrote:
> On Sun, Dec 23, 2018 at 12:55 AM Roland Scheidegger wrote:
>>
>> Alright, I guess it should work...
>>
>> Reviewed-by: Roland Scheidegger 
>>
> 
> Thanks!
> 
> Would we have anyone who could help to commit it?

Pushed (albeit I forgot the R-b on it, ah well...)

Roland

> 
> (I know that I was supposed to apply for commit rights, but I expect
> my contribution rate to be relatively low, due to a shift to different
> areas, so I don't think I'm a good candidate for a committer anymore.)
> 
> Best regards,
> Tomasz
> 
>>
>> On 14.12.18 at 09:17, Tomasz Figa wrote:
>>> If there is no last fence, due to no rendering happening yet, just
>>> create a new signaled fence and return it, to match the expectations of
>>> the EGL sync fence API.
>>>
>>> Fixes random "Could not create sync fence 0x3003" assertion failures from
>>> Skia on Android, coming from the following code:
>>>
>>> https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427
>>>
>>> Reproducible especially with thread count >= 4.
>>>
>>> One could make the driver always keep the reference to the last fence,
>>> but:
>>>
>>>  - the driver seems to explicitly destroy the fence whenever a rendering
>>>pass completes and changing that would require a significant functional
>>>change to the code. (Specifically, in lp_scene_end_rasterization().)
>>>
>>>  - it still wouldn't solve the problem of an EGL sync fence being created
>>>and waited on without any rendering happening at all, which is
>>>also likely to happen with Android code pointed to in the commit.
>>>
>>> Therefore, the simple approach of always creating a fence is taken,
>>> similarly to other drivers, such as radeonsi.
>>>
>>> Tested with piglit llvmpipe suite with no regressions and following
>>> tests fixed:
>>>
>>> egl_khr_fence_sync
>>>  conformance
>>>   eglclientwaitsynckhr_flag_sync_flush
>>>   eglclientwaitsynckhr_nonzero_timeout
>>>   eglclientwaitsynckhr_zero_timeout
>>>   eglcreatesynckhr_default_attributes
>>>   eglgetsyncattribkhr_invalid_attrib
>>>   eglgetsyncattribkhr_sync_status
>>>
>>> v2:
>>>  - remove the useless lp_fence_reference() dance (Nicolai),
>>>  - explain why creating the dummy fence is the right approach.
>>>
>>> Signed-off-by: Tomasz Figa 
>>> ---
>>>  src/gallium/drivers/llvmpipe/lp_setup.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c 
>>> b/src/gallium/drivers/llvmpipe/lp_setup.c
>>> index b087369473..e72e119c8a 100644
>>> --- a/src/gallium/drivers/llvmpipe/lp_setup.c
>>> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
>>> @@ -361,6 +361,8 @@ lp_setup_flush( struct lp_setup_context *setup,
>>>
>>> if (fence) {
>>>lp_fence_reference((struct lp_fence **)fence, setup->last_fence);
>>> +  if (!*fence)
>>> + *fence = (struct pipe_fence_handle *)lp_fence_create(0);
>>> }
>>>  }
>>>
>>>
>>



Re: [Mesa-dev] [PATCH] st/mesa: don't leak pipe_surface if pipe_context is not current

2019-01-08 Thread Roland Scheidegger
On 08.01.19 at 17:17, Marek Olšák wrote:
> From: Marek Olšák 
> 
> We have found some pipe_surface leaks internally.
> 
> This is the same code as surface_destroy in radeonsi.
> Ideally, surface_destroy would be in pipe_screen.
> No, pipe_surfaces are not context objects.
Well they are supposed to be...
But yes mesa/st doesn't play by the rules there, so I guess that's
better than a leak...
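
To make the trade-off concrete, here is a toy model of the two release paths. It is only a sketch under stated assumptions: the `toy_*` types and functions are hypothetical and are not the Gallium structs. An object is normally destroyed through the context that created it, and the fallback frees it directly when no context is available.

```c
#include <stdlib.h>

/* Toy context; the counter only exists to make destruction observable. */
struct toy_context {
   int destroyed_surfaces;
};

/* Toy refcounted object that remembers its creating context. */
struct toy_surface {
   int refcount;
   struct toy_context *ctx;
};

static struct toy_surface *
toy_surface_create(struct toy_context *ctx)
{
   struct toy_surface *s = calloc(1, sizeof(*s));
   if (s) {
      s->refcount = 1;
      s->ctx = ctx;
   }
   return s;
}

/* Normal path: drop a reference and destroy through the context. */
static void
toy_surface_release(struct toy_context *ctx, struct toy_surface **ptr)
{
   struct toy_surface *s = *ptr;
   if (s && --s->refcount == 0) {
      ctx->destroyed_surfaces++;
      free(s);
   }
   *ptr = NULL;
}

/* Fallback path: no context is available, so free the object directly. */
static void
toy_surface_release_no_context(struct toy_surface **ptr)
{
   struct toy_surface *s = *ptr;
   if (s && --s->refcount == 0)
      free(s);
   *ptr = NULL;
}
```

The fallback never touches the creating context, which is why it avoids the leak but also why it only works if destruction needs nothing from that context.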

Roland


> 
> Cc: 18.3 19.0 
> ---
>  src/gallium/auxiliary/util/u_inlines.h | 19 +++
>  src/mesa/state_tracker/st_cb_fbo.c |  5 -
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_inlines.h 
> b/src/gallium/auxiliary/util/u_inlines.h
> index b06fb111709..fa1e920b509 100644
> --- a/src/gallium/auxiliary/util/u_inlines.h
> +++ b/src/gallium/auxiliary/util/u_inlines.h
> @@ -147,20 +147,39 @@ pipe_resource_reference(struct pipe_resource **dst, 
> struct pipe_resource *src)
>  
>   old_dst->screen->resource_destroy(old_dst->screen, old_dst);
>   old_dst = next;
>} while (pipe_reference_described(&old_dst->reference, NULL,
>  (debug_reference_descriptor)
>  debug_describe_resource));
> }
> *dst = src;
>  }
>  
> +/**
> + * Same as pipe_surface_release, but used when pipe_context doesn't exist
> + * anymore.
> + */
> +static inline void
> +pipe_surface_release_no_context(struct pipe_surface **ptr)
> +{
> +   struct pipe_surface *surf = *ptr;
> +
> +   if (pipe_reference_described(&surf->reference, NULL,
> +(debug_reference_descriptor)
> +debug_describe_surface)) {
> +  /* trivially destroy pipe_surface */
> +  pipe_resource_reference(&surf->texture, NULL);
> +  free(surf);
> +   }
> +   *ptr = NULL;
> +}
> +
>  /**
>   * Set *dst to \p src with proper reference counting.
>   *
>   * The caller must guarantee that \p src and *dst were created in
>   * the same context (if they exist), and that this must be the current 
> context.
>   */
>  static inline void
>  pipe_sampler_view_reference(struct pipe_sampler_view **dst,
>  struct pipe_sampler_view *src)
>  {
> diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
> b/src/mesa/state_tracker/st_cb_fbo.c
> index 8901a8680ef..8d099f7b0f9 100644
> --- a/src/mesa/state_tracker/st_cb_fbo.c
> +++ b/src/mesa/state_tracker/st_cb_fbo.c
> @@ -278,22 +278,25 @@ st_renderbuffer_alloc_storage(struct gl_context * ctx,
>   * gl_renderbuffer::Delete()
>   */
>  static void
>  st_renderbuffer_delete(struct gl_context *ctx, struct gl_renderbuffer *rb)
>  {
> struct st_renderbuffer *strb = st_renderbuffer(rb);
> if (ctx) {
>struct st_context *st = st_context(ctx);
>pipe_surface_release(st->pipe, &strb->surface_srgb);
>pipe_surface_release(st->pipe, &strb->surface_linear);
> -  strb->surface = NULL;
> +   } else {
> +  pipe_surface_release_no_context(&strb->surface_srgb);
> +  pipe_surface_release_no_context(&strb->surface_linear);
> }
> +   strb->surface = NULL;
> pipe_resource_reference(&strb->texture, NULL);
> free(strb->data);
> _mesa_delete_renderbuffer(ctx, rb);
>  }
>  
>  
>  /**
>   * Called via ctx->Driver.NewRenderbuffer()
>   */
>  static struct gl_renderbuffer *
> 



Re: [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

2019-01-03 Thread Roland Scheidegger
On 03.01.19 at 20:50, Jason Ekstrand wrote:
> > The problem you're describing is in converting from NIR to another IR,
> > not to hardware.  In LLVM they made a choice to put types on SSA
> values
> > but then to have the actual semantics be based on the instruction
> > itself.  This leads to lots of redundancy in the IR but also lots of
> > things you can validate which is kind-of nice.  Is that redundancy
> > really useful?  In our experience with NIR, we haven't found it to be
> > other than booleans (now sorted), this one constant issue, and
> > translating to IRs that have that redundancy.  Then why did they add
> > it?  I'm really not sure but it may, at least somewhat, be related to
> > the fact that they allow arrays and structs in their SSA values and so
> > need full types.  This is another design decision in LLVM which I find
> > highly questionable.  What you're saying is more-or-less that NIR should
> carry,
> > maintain, and validate extra useless information just so we can
> pass it
> > on to LLVM which is going to use it for what, exactly?  Sorry if I'm
> > extremely reluctant to make fairly fundamental changes to NIR with no
> > better reason than "LLVM did it that way".
> >
> > There's a second reason why I don't like the idea of putting types on
> > SSA values: It's extremely deceptive.  In SPIR-V you're allowed to
> do an
> > OpSConvert with a source that is an unsigned 16-bit integer and a
> > destination that is an unsigned 32-bit integer.  What happens?  Your
> > uint -> uint cast sign extends!  Yup That's what happens.  No
> joke. 
> > The same is true of signed vs. unsigned division or modulus.  The end
> > result is that the types in SPIR-V are useless and you can't actually
> > trust them for anything except bit-size and sometimes labelling
> > something as a float vs. int.
> This is really a decision of spir-v only though, llvm doesn't have that
> nonsense (maybe for making it easier to translate to other languages?) -
> there's just int and float types there, with no distinction between
> signed and unsigned.
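
The point that the opcode, not the declared type, fixes the semantics can be shown in plain C. This is a minimal sketch: the function names are hypothetical and merely borrow from the SPIR-V opcode names, and both functions deliberately take and return unsigned types so that only the operation differs.

```c
#include <stdint.h>

/* Widen 16 -> 32 bits by sign extension, whatever the declared types say. */
static uint32_t
sconvert_16_to_32(uint16_t bits)
{
   uint32_t wide = bits;
   if (wide & 0x8000u)        /* top bit of the 16-bit source is set */
      wide |= 0xffff0000u;    /* replicate it into the upper half */
   return wide;
}

/* Widen 16 -> 32 bits by zero extension. */
static uint32_t
uconvert_16_to_32(uint16_t bits)
{
   return (uint32_t)bits;
}
```

With the same "unsigned" input 0xffff, the first returns 0xffffffff and the second 0x0000ffff, so the unsigned label on the value tells a consumer nothing about which result it will get.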
> 
> 
> I think with SPIR-V you could probably just pick one and make everything
> either signed or unsigned.  I'm not immediately aware of any opcodes
> that require one signedness or the other; most just require an integer
> or require a float.  That said, this is SPIR-V so I'm not going to bet
> money on that...
>  
> 
> You are quite right though that float vs. int is somewhat redundant too
> due to the instructions indicating the type. I suppose there might be
> reasons why there's different types - hw may use different registers for
> instance (whereas of course noone in their right mind would use
> different registers for signed vs. unsigned ints), or there might be
> some real cost of bitcasts (at least bypass delays are common for cpus
> when transitioning values between int and float execution units). For
> instance, it makes a difference with x86 sse if you use float or int
> loads, which otherwise you couldn't indicate directly (llvm can optimize
> this into the right kind of load nowadays even if you use the wrong kind
> of variable for the load, e.g. int load then bitcast to float and do
> some float op will change it into a float load, but this is IIRC
> actually a pretty new ability, and possibly doesn't happen if you
> disable enough optimization passes).
> 
> 
> Having it for the purpose of register allocation makes sense in the CPU
> world.  In the GPU world, everything is typically designed float-first
> and I'm not aware of any hardware that has separate int and float registers
> like x86 does.  That said, hardware changes over time and it's entirely
> possible that someone will decide that a heterogeneous register file is
> a good idea on a GPU.  (Technically, most GPUs do have flag regs or
> accumulators or something but it's not as bad as x86.)  Out of
> curiosity, do you (or anyone else) know if LLVM actually uses them for
> that purpose?  I could see that information getting lost in the back-end
> and them using something else to choose registers.
> 
> --Jason

For llvm with x86 sse, I don't think a variable being characterized as
float or int via bitcast actually makes a difference whatsoever (at
least not with optimization). llvm would determine if it's float or int
on its own (based on preceding / succeeding instructions). For example,
as you might know, sse has both float and int logic ops, whereas llvm of
course does not - but it would still use float logic ops appropriately
(when the value is coming out of / going into a "true" float op),
despite that you have to cast it to int in llvm to do the logic op. (A
more interesting example is actually shuffles, since again due to sse
being quite horrid there some are only directly possible with ints, some

Re: [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

2019-01-03 Thread Roland Scheidegger
On 03.01.19 at 18:58, Jason Ekstrand wrote:
> On Thu, Jan 3, 2019 at 3:39 AM Erik Faye-Lund wrote:
> 
> On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
> > On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin wrote:
> > > Have a look at the first 4 patches in the series from Jonathan
> > > Marek
> > > to address some of these issues:
> > >
> > > https://patchwork.freedesktop.org/series/54295/
> 
> 
> > >
> > > Not sure exactly what state that work is in, but I've added
> > > Jonathan
> > > to CC, perhaps he can provide an update.
> > >
> > > Cheers,
> > >
> > >   -ilia
> > >
> > > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu wrote:
> > > >
> > > > Hi guys,
> > > >
> > > > I found the problem with this test fragment shader when lima
> > > development:
> > > > uniform int color;
> > > > void main() {
> > > >     if (color > 1)
> > > >         gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
> > > >     else
> > > >         gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
> > > > }
> > > >
> > > > nir_print_shader output:
> > > > impl main {
> > > >         block block_0:
> > > >         /* preds: */
> > > >         vec1 32 ssa_0 = load_const (0x00000001 /* 0.000000 */)
> > > >         vec4 32 ssa_1 = load_const (0x3f800000 /* 1.000000 */, 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */)
> > > >         vec4 32 ssa_2 = load_const (0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */, 0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */)
> > > >         vec1 32 ssa_3 = load_const (0x00000000 /* 0.000000 */)
> > > >         vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 1, 0) /* base=0 */ /* range=1 */ /* component=0 */   /* color */
> > > >         vec1 32 ssa_5 = slt ssa_0, ssa_4
> > > >         vec1 32 ssa_6 = fnot ssa_5
> > > >         vec4 32 ssa_7 = bcsel ssa_6.xxxx, ssa_2, ssa_1
> > > >         intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /* base=0 */ /* wrmask=xyzw */ /* component=0 */       /* gl_FragColor */
> > > >         /* succs: block_1 */
> > > >         block block_1:
> > > > }
> > > >
> > > > ssa_0 is not converted to float in the GLSL-to-NIR translation. I
> > > > see that glsl_to_nir.cpp creates flt/ilt/ult based on the source
> > > > type for GPUs with native integer support, but for GPUs without
> > > > native integer support it just creates slt for every source type.
> > > > nir_lower_constant_initializers also does no type conversion for
> > > > integer constants.
> >
> > This is a generally sticky issue.  In NIR, we have no concept of
> > types on SSA values which has proven perfectly reasonable and
> > actually very powerful in a world where integers are supported
> > natively.  Unfortunately, it causes significant problems for float-
> > only architectures.
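
The effect of the untyped constant can be reproduced on a CPU. The following is a minimal sketch, assuming that a float-only GPU supplies the `color` uniform as a float and evaluates `slt` as a float compare; `toy_slt` and the other function names are hypothetical. The integer constant 1 keeps its raw bit pattern 0x00000001, which read as a float is a tiny denormal rather than 1.0.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without converting the value. */
static float
bits_as_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}

/* slt as a float-only GPU would evaluate it: 1.0 if a < b, else 0.0. */
static float
toy_slt(float a, float b)
{
   return a < b ? 1.0f : 0.0f;
}

/* `color > 1` with the constant left as the raw integer pattern 0x00000001. */
static float
compare_with_raw_int_constant(float color)
{
   return toy_slt(bits_as_float(0x00000001u), color);
}

/* The same comparison after converting the integer constant to 1.0f. */
static float
compare_with_converted_constant(float color)
{
   return toy_slt((float)1, color);
}
```

For `color` equal to 1.0 the raw-constant version reports that the condition holds, because the denormal is less than 1.0, while the converted version correctly reports that it does not.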
> 
> I would like to take this chance to say that this untyped SSA-value
> choice has led to issues in both radeon_si (because LLVM values are
> typed) and zink (similarly, because SPIR-V values are typed), where we
> need to do bitcasts on every access because there's just not enough
> information available to emit variables with the right type.
> 
> 
> I'm not sure if I agree that the two problems are the same or not... 
> More on that in a bit.
>  
> 
> It took us a lot of time to realize that the meta-data from the opcodes
> doesn't *really* provide this, because the rest of nir doesn't treat
> values consistently. In fact, this feels arguably more like buggy
> behavior; why do we even have fmov when all of the time the compiler
> will emit imovs for floating-point values...? Or why do we have bitcast
> 
> 
> Why do we have different mov opcodes?  Because they have different
> behavior in the presence of source/destination modifiers.  You likely
> don't use those in radeon or Zink but we do use them on Intel.  That
> being said, I've very seriously considered adding a generic nir_op_mov
> which is entirely typeless and doesn't support modifiers at all and make
> most of core NIR use that.  For that matter, there's no real reason why
> we need fmov with modifiers at all when we could easily replace
> "ssa1 = fmov.sat |x|" with "ssa1 = fsat |x|" or "ssa1 = fabs.sat x".  If
> we had a generic nir_op_mov to deal 

Re: [Mesa-dev] [PATCH] llvmpipe: Always return some fence in flush (v2)

2018-12-22 Thread Roland Scheidegger
Alright, I guess it should work...

Reviewed-by: Roland Scheidegger 


On 14.12.18 at 09:17, Tomasz Figa wrote:
> If there is no last fence, due to no rendering happening yet, just
> create a new signaled fence and return it, to match the expectations of
> the EGL sync fence API.
> 
> Fixes random "Could not create sync fence 0x3003" assertion failures from
> Skia on Android, coming from the following code:
> 
> https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427
> 
> Reproducible especially with thread count >= 4.
> 
> One could make the driver always keep the reference to the last fence,
> but:
> 
>  - the driver seems to explicitly destroy the fence whenever a rendering
>pass completes and changing that would require a significant functional
>change to the code. (Specifically, in lp_scene_end_rasterization().)
> 
>  - it still wouldn't solve the problem of an EGL sync fence being created
>and waited on without any rendering happening at all, which is
>also likely to happen with Android code pointed to in the commit.
> 
> Therefore, the simple approach of always creating a fence is taken,
> similarly to other drivers, such as radeonsi.
> 
> Tested with piglit llvmpipe suite with no regressions and following
> tests fixed:
> 
> egl_khr_fence_sync
>  conformance
>   eglclientwaitsynckhr_flag_sync_flush
>   eglclientwaitsynckhr_nonzero_timeout
>   eglclientwaitsynckhr_zero_timeout
>   eglcreatesynckhr_default_attributes
>   eglgetsyncattribkhr_invalid_attrib
>   eglgetsyncattribkhr_sync_status
> 
> v2:
>  - remove the useless lp_fence_reference() dance (Nicolai),
>  - explain why creating the dummy fence is the right approach.
> 
> Signed-off-by: Tomasz Figa 
> ---
>  src/gallium/drivers/llvmpipe/lp_setup.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c 
> b/src/gallium/drivers/llvmpipe/lp_setup.c
> index b087369473..e72e119c8a 100644
> --- a/src/gallium/drivers/llvmpipe/lp_setup.c
> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
> @@ -361,6 +361,8 @@ lp_setup_flush( struct lp_setup_context *setup,
>  
> if (fence) {
>lp_fence_reference((struct lp_fence **)fence, setup->last_fence);
> +  if (!*fence)
> + *fence = (struct pipe_fence_handle *)lp_fence_create(0);
> }
>  }
>  
> 



Re: [Mesa-dev] [PATCH 1/3] st/mesa: Make an enum for pipeline statistics query result indices.

2018-12-21 Thread Roland Scheidegger
On 22.12.18 at 01:21, Ilia Mirkin wrote:
> 
> 
> On Fri, Dec 21, 2018, 19:16 Marek Olšák wrote:
> 
> 
> 
> On Fri, Dec 21, 2018, 6:28 PM Kenneth Graunke wrote:
> 
> That seems like a reasonable interface to me.
> 
> But, I don't think it's backwards compatible.  Today, don't state
> trackers set index = 0 and expect all 11 to be returned?  We could
> easily change the in-tree state trackers, but not sure about the
> other ones.
> 
> We could always encode the index differently, but at that point, I
> wonder if it would be cleaner to just add a new query type like Ilia
> suggested.
> 
> Marek, what would you prefer?
> 
> 
> Backward compatibility is not required. Gallium is not a stable API.
> In tree state trackers can be fixed easily. We shouldn't worry too
> much about closed source state trackers.
> 
> 
> Fwiw my take is that while it's fine to change apis around (we do this
> all the time), we should avoid causing a loss of functionality just
> because no in-tree state tracker uses it. I think having a
> forward-looking gallium API greatly facilitated GL 3 and 4 bringup of
> gallium drivers, even though there wasn't necessarily an in-tree way to
> access all the functionality at the time.
> 
> As long as all the previously accessible functionality remains, I think
> we're fine.

Yes, I agree. Certainly it isn't a problem for us to change the supplied
index to the query.
Although it is usually better if interface changes are not just
semantically different, but also syntactically, to avoid surprises - you
know there's always just this odd place in the code which for some
reason you failed to change, and a compile error is much better than
strange failures later (this is of course even more true for out-of-tree
state trackers). But this might not always be feasible, keeping the
interface nice should take priority. (There's not really any hard rules
for gallium interface changes, that's just my take on it.)
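
As an illustration of a change that is syntactically as well as semantically different, here is a minimal sketch. Everything in it is hypothetical (the `toy_stat_*` names, the three counters, their order) and it is not a proposal for the actual interface. Wrapping the index in a distinct struct type means a stale caller that still passes a bare integer no longer compiles.

```c
#include <stdint.h>

/* Hypothetical index enum; the names and their order are assumptions. */
enum toy_stat_index {
   TOY_STAT_IA_VERTICES,
   TOY_STAT_IA_PRIMITIVES,
   TOY_STAT_VS_INVOCATIONS,
   TOY_STAT_COUNT
};

/* Distinct wrapper type: a bare integer is no longer a valid argument. */
struct toy_stat_selector {
   enum toy_stat_index index;
};

/* Return the selected counter, or 0 for an out-of-range index. */
static uint64_t
toy_get_stat(const uint64_t counters[TOY_STAT_COUNT],
             struct toy_stat_selector sel)
{
   unsigned i = (unsigned)sel.index;
   return i < TOY_STAT_COUNT ? counters[i] : 0;
}
```

A leftover call such as `toy_get_stat(counters, 0)` is rejected by the compiler, whereas a plain `unsigned index` parameter would have accepted it silently with the new meaning.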

Roland





Re: [Mesa-dev] [PATCH] gallivm: abort when trying to use non-existing intrinsic

2018-12-21 Thread Roland Scheidegger
On 21.12.18 at 15:30, Jose Fonseca wrote:
> On 21/12/2018 14:28, Roland Scheidegger wrote:
>> On 21.12.18 at 08:46, Jose Fonseca wrote:
>>> On 21/12/2018 01:42, srol...@vmware.com wrote:
>>>> From: Roland Scheidegger 
>>>>
>>>> Whenever llvm removes an intrinsic (we're using), we're hitting
>>>> segfaults
>>>> due to llvm doing calls to address 0 in the jitted code instead.
>>>> However, Jose figured out we can actually detect this with
>>>> LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder
>>>> what got broken. (Of course, someone still needs to fix the code to
>>>> no longer use this intrinsic.)
>>>> ---
>>>>    src/gallium/auxiliary/gallivm/lp_bld_intr.c | 10 ++
>>>>    1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
>>>> b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
>>>> index 74ed16f33f0..c9df136b103 100644
>>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
>>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
>>>> @@ -241,6 +241,16 @@ lp_build_intrinsic(LLVMBuilderRef builder,
>>>>        function = lp_declare_intrinsic(module, name, ret_type,
>>>> arg_types, num_args);
>>>>    +  /*
>>>> +   * If llvm removes an intrinsic we use, we'll hit this abort
>>>> (rather
>>>> +   * than a call to address zero in the jited code).
>>>> +   */
>>>> +  if (LLVMGetIntrinsicID(function) == 0) {
>>>> + printf("llvm (version 0x%x) found no intrinsic for %s, going
>>>> to crash...\n",
>>>> +    HAVE_LLVM, name);
>>>
>>> Better to use _debug_printf() so it's redirected to stderr (or
>>> OutpuDebug on Windows.)
>> Alright, though this will drop the output on non-debug builds.
> 
> Not really: debug_printf only prints on debug builds, but _debug_printf
> (note the leading underscore) always prints.
Ahh right I failed to catch the difference here.
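
The distinction can be sketched as follows. This is a toy re-creation with hypothetical names (`toy_always_printf`, `toy_debug_printf`, `TOY_DEBUG`), not the actual Mesa helpers, and it assumes only the behaviour described above: one variant prints in every build, the other is compiled out unless debugging is enabled.

```c
#include <stdarg.h>
#include <stdio.h>

/* Always-on variant: formats to stderr in every build.
 * Returns the number of characters written, as vfprintf does. */
static int
toy_always_printf(const char *fmt, ...)
{
   va_list ap;
   int n;

   va_start(ap, fmt);
   n = vfprintf(stderr, fmt, ap);
   va_end(ap);
   return n;
}

/* Debug-only variant: forwards in debug builds, expands to nothing otherwise. */
#ifdef TOY_DEBUG
#define toy_debug_printf(...) toy_always_printf(__VA_ARGS__)
#else
#define toy_debug_printf(...) ((void)0)
#endif
```

A message that must reach the user just before an abort belongs on the always-on variant, since a release build would otherwise crash without saying why.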

Roland



> 
> Perhaps it's not the smartest naming convention, but it should do the
> expected.
> 
> 
> Jose



Re: [Mesa-dev] [PATCH] gallivm: abort when trying to use non-existing intrinsic

2018-12-21 Thread Roland Scheidegger
On 21.12.18 at 08:46, Jose Fonseca wrote:
> On 21/12/2018 01:42, srol...@vmware.com wrote:
>> From: Roland Scheidegger 
>>
>> Whenever llvm removes an intrinsic (we're using), we're hitting segfaults
>> due to llvm doing calls to address 0 in the jitted code instead.
>> However, Jose figured out we can actually detect this with
>> LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder
>> what got broken. (Of course, someone still needs to fix the code to
>> no longer use this intrinsic.)
>> ---
>>   src/gallium/auxiliary/gallivm/lp_bld_intr.c | 10 ++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
>> b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
>> index 74ed16f33f0..c9df136b103 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_intr.c
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_intr.c
>> @@ -241,6 +241,16 @@ lp_build_intrinsic(LLVMBuilderRef builder,
>>       function = lp_declare_intrinsic(module, name, ret_type,
>> arg_types, num_args);
>>   +  /*
>> +   * If llvm removes an intrinsic we use, we'll hit this abort
>> (rather
>> +   * than a call to address zero in the jited code).
>> +   */
>> +  if (LLVMGetIntrinsicID(function) == 0) {
>> + printf("llvm (version 0x%x) found no intrinsic for %s, going
>> to crash...\n",
>> +    HAVE_LLVM, name);
> 
> Better to use _debug_printf() so it's redirected to stderr (or
> OutpuDebug on Windows.)
Alright, though this will drop the output on non-debug builds.

> 
>> + abort();
>> +  }
>> +
>>     if (!set_callsite_attrs)
>>    lp_add_func_attributes(function, attr_mask);
>>  
> 
> I think it's worth auditing that we don't use the lp_build_intrinsic() helper
> for LLVM functions we built ourselves.  I took a look, and didn't find any.
It never occurred to me you could use it for ordinary functions. But I
suppose you're right in theory someone could use it (although of course
the function body would have to be defined elsewhere).

Roland



> Otherwise
> 
> Reviewed-by: Jose Fonseca 
> 
> Jose



Re: [Mesa-dev] [PATCH] gallivm: use llvm jit code for decoding s3tc

2018-12-20 Thread Roland Scheidegger
Ahh great find!
I tried it out and it works, if it's not an intrinsic it will return 0,
otherwise some number (no idea how they are enumerated, but who cares).

Roland


On 21.12.18 at 00:56, Jose Fonseca wrote:
> There's a function -- LLVMGetIntrinsicID -- I wonder if we can use it
> to trap unsupported intrinsics?
> 
> Jose
> 
> On 20/12/2018 22:09, Roland Scheidegger wrote:
>> On 20.12.18 at 16:56, Michel Dänzer wrote:
>>> On 2018-12-19 4:51 a.m., srol...@vmware.com wrote:
>>>> From: Roland Scheidegger 
>>>>
>>>> This is (much) faster than using the util fallback.
>>>> (Note that there's two methods here, one would use a cache, similar to
>>>> the existing code (although the cache was disabled), except the block
>>>> decode is done with jit code, the other directly decodes the required
>>>> pixels. For now don't use the cache (being direct-mapped is suboptimal,
>>>> but it's difficult to come up with something better which doesn't have
>>>> too much overhead.)
>>>
>>> This change made lp_test_format segfault on my Ryzen 7 1700, both using
>>> LLVM 7 and current SVN HEAD. Not much information in the backtrace
>>> unfortunately:
>>>
>>
>> Ahh I failed to test with newer llvm versions (it works with llvm 5.0 at
>> least).
>> It is (once again...) due to intrinsics disappearing from llvm.
>> I especially hate it that there's seemingly no way to detect if an
>> intrinsic has disappeared, since llvm will just replace it with calls to
>> address 0 if the intrinsic doesn't exist nowadays...
>> Probably it's the llvm.x86.sse2.pavg.b intrinsic, I'll fix it...
>>
>> Roland
>>
>>
>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x0000000000000000 in ?? ()
>>> (gdb) bt
>>> #0  0x0000000000000000 in ?? ()
>>> #1  0x77fcd16a in ?? ()
>>> #2  0x2c2c2c2caeaeaeae in ?? ()
>>> #3  0x979797976f6f6f6f in ?? ()
>>> #4  0x20b0d7f2 in ?? ()
>>> #5  0x976f2cae in ?? ()
>>> #6  0x0089625d008eb099 in ?? ()
>>> #7  0x7b8487218483a821 in ?? ()
>>> #8  0x008414210094ffd6 in ?? ()
>>> #9  0x007bef9400739629 in ?? ()
>>> #10 0x0069 in ?? ()
>>> #11 0x06d0 in ?? ()
>>> #12 0x7fffe490 in ?? ()
>>> #13 0x5593f040 in ?? ()
>>> #14 0x7fffe430 in ?? ()
>>> #15 0x77fcd053 in ?? ()
>>> #16 0x77fcd000 in ?? ()
>>> #17 0x556fd618 in util_format_test_cases ()
>>> #18 0x558c0780 in ?? ()
>>> #19 0x7fffe490 in ?? ()
>>> #20 0x556fd618 in util_format_test_cases ()
>>> #21 0x555667cc in test_format_float (verbose=<optimized out>,
>>> desc=0x7fffe4a0, fp=0x0) at
>>> ../src/gallium/drivers/llvmpipe/lp_test_format.c:184
>>> #22 test_one (verbose=<optimized out>, format_desc=0x7fffe4a0,
>>> fp=0x0) at ../src/gallium/drivers/llvmpipe/lp_test_format.c:342
>>> #23 test_all (verbose=<optimized out>, fp=0x0) at
>>> ../src/gallium/drivers/llvmpipe/lp_test_format.c:395
>>> #24 0x5556621f in main (argc=1, argv=0x7fffe628) at
>>> ../src/gallium/drivers/llvmpipe/lp_test_main.c:419
>>>
>>>
>>
> 



Re: [Mesa-dev] [PATCH] gallivm: use llvm jit code for decoding s3tc

2018-12-20 Thread Roland Scheidegger
On 20.12.18 at 16:56, Michel Dänzer wrote:
> On 2018-12-19 4:51 a.m., srol...@vmware.com wrote:
>> From: Roland Scheidegger 
>>
>> This is (much) faster than using the util fallback.
>> (Note that there's two methods here, one would use a cache, similar to
>> the existing code (although the cache was disabled), except the block
>> decode is done with jit code, the other directly decodes the required
>> pixels. For now don't use the cache (being direct-mapped is suboptimal,
>> but it's difficult to come up with something better which doesn't have
>> too much overhead.)
> 
> This change made lp_test_format segfault on my Ryzen 7 1700, both using
> LLVM 7 and current SVN HEAD. Not much information in the backtrace
> unfortunately:
> 

Ahh I failed to test with newer llvm versions (it works with llvm 5.0 at
least).
It is (once again...) due to intrinsics disappearing from llvm.
I especially hate it that there's seemingly no way to detect if an
intrinsic has disappeared, since llvm will just replace it with calls to
address 0 if the intrinsic doesn't exist nowadays...
Probably it's the llvm.x86.sse2.pavg.b intrinsic, I'll fix it...
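
For reference, the operation behind that intrinsic (PAVGB, an unsigned
rounding byte average) can be written without any target-specific
intrinsic. This is only a minimal scalar sketch in C, assuming the usual
definition (a + b + 1) >> 1 computed in a wider type; the function names
are made up for illustration and this is not the actual fix:

```c
#include <stddef.h>
#include <stdint.h>

/* Rounding average of two unsigned bytes, as PAVGB computes per lane:
 * widen, add both operands plus 1, shift right by one, narrow again. */
static uint8_t
avg_round_u8(uint8_t a, uint8_t b)
{
   return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Apply the rounding average element-wise (hypothetical helper). */
static void
avg_round_u8_array(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
   for (size_t i = 0; i < n; i++)
      dst[i] = avg_round_u8(a[i], b[i]);
}
```

Widening first matters: adding two bytes plus one can exceed 255, so the
sum has to be formed in a type that cannot overflow before the shift.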

Roland



> Program received signal SIGSEGV, Segmentation fault.
> 0x in ?? ()
> (gdb) bt
> #0  0x in ?? ()
> #1  0x77fcd16a in ?? ()
> #2  0x2c2c2c2caeaeaeae in ?? ()
> #3  0x979797976f6f6f6f in ?? ()
> #4  0x20b0d7f2 in ?? ()
> #5  0x976f2cae in ?? ()
> #6  0x0089625d008eb099 in ?? ()
> #7  0x7b8487218483a821 in ?? ()
> #8  0x008414210094ffd6 in ?? ()
> #9  0x007bef9400739629 in ?? ()
> #10 0x0069 in ?? ()
> #11 0x06d0 in ?? ()
> #12 0x7fffe490 in ?? ()
> #13 0x5593f040 in ?? ()
> #14 0x7fffe430 in ?? ()
> #15 0x77fcd053 in ?? ()
> #16 0x77fcd000 in ?? ()
> #17 0x556fd618 in util_format_test_cases ()
> #18 0x558c0780 in ?? ()
> #19 0x7fffe490 in ?? ()
> #20 0x556fd618 in util_format_test_cases ()
> #21 0x555667cc in test_format_float (verbose=, 
> desc=0x7fffe4a0, fp=0x0) at 
> ../src/gallium/drivers/llvmpipe/lp_test_format.c:184
> #22 test_one (verbose=, format_desc=0x7fffe4a0, fp=0x0) at 
> ../src/gallium/drivers/llvmpipe/lp_test_format.c:342
> #23 test_all (verbose=, fp=0x0) at 
> ../src/gallium/drivers/llvmpipe/lp_test_format.c:395
> #24 0x5556621f in main (argc=1, argv=0x7fffe628) at 
> ../src/gallium/drivers/llvmpipe/lp_test_main.c:419
> 
> 



Re: [Mesa-dev] [PATCH] st/mesa: allow glDrawElements to work with GL_SELECT feedback

2018-12-19 Thread Roland Scheidegger
Fix looks good to me.

Reviewed-by: Roland Scheidegger 


Am 19.12.18 um 04:50 schrieb Ilia Mirkin:
> Not sure if this ever worked, but the current logic for setting the
> min/max index is definitely wrong for indexed draws. While we're at it,
> bring in all the usual logic from the non-indirect drawing path.
> 
> Bugzilla: 
> https://bugs.freedesktop.org/show_bug.cgi?id=109086
> Signed-off-by: Ilia Mirkin 
> ---
> 
> This makes it more or less mirror st_draw_vbo. As the comment in the
> code says, would be nice to refactor, but ... meh.
> 
> Note that I haven't tested any of the interactions with additional
> features, like primitive restart or instancing or any of that. However I
> don't think that this would make things *worse*.
> 
>  src/mesa/state_tracker/st_draw_feedback.c | 61 +++
>  1 file changed, 39 insertions(+), 22 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_draw_feedback.c 
> b/src/mesa/state_tracker/st_draw_feedback.c
> index 6ec6d5c16f4..49fdecf7e38 100644
> --- a/src/mesa/state_tracker/st_draw_feedback.c
> +++ b/src/mesa/state_tracker/st_draw_feedback.c
> @@ -84,27 +84,6 @@ set_feedback_vertex_format(struct gl_context *ctx)
>  }
>  
>  
> -/**
> - * Helper for drawing current vertex arrays.
> - */
> -static void
> -draw_arrays(struct draw_context *draw, unsigned mode,
> -unsigned start, unsigned count)
> -{
> -   struct pipe_draw_info info;
> -
> -   util_draw_init_info(&info);
> -
> -   info.mode = mode;
> -   info.start = start;
> -   info.count = count;
> -   info.min_index = start;
> -   info.max_index = start + count - 1;
> -
> -   draw_vbo(draw, &info);
> -}
> -
> -
>  /**
>   * Called by VBO to draw arrays when in selection or feedback mode and
>   * to implement glRasterPos.
> @@ -136,10 +115,18 @@ st_feedback_draw_vbo(struct gl_context *ctx,
> struct pipe_transfer *ib_transfer = NULL;
> GLuint i;
> const void *mapped_indices = NULL;
> +   struct pipe_draw_info info;
>  
> if (!draw)
>return;
>  
> +   /* Initialize pipe_draw_info. */
> +   info.primitive_restart = false;
> +   info.vertices_per_patch = ctx->TessCtrlProgram.patch_vertices;
> +   info.indirect = NULL;
> +   info.count_from_stream_output = NULL;
> +   info.restart_index = 0;
> +
> st_flush_bitmap_cache(st);
> st_invalidate_readpix_cache(st);
>  
> @@ -213,9 +200,23 @@ st_feedback_draw_vbo(struct gl_context *ctx,
>   mapped_indices = ib->ptr;
>}
>  
> +  info.index_size = ib->index_size;
> +  info.min_index = min_index;
> +  info.max_index = max_index;
> +  info.has_user_indices = true;
> +  info.index.user = mapped_indices;
> +
>draw_set_indexes(draw,
> (ubyte *) mapped_indices,
> index_size, ~0);
> +
> +  if (ctx->Array._PrimitiveRestart) {
> + info.primitive_restart = true;
> + info.restart_index = _mesa_primitive_restart_index(ctx, 
> info.index_size);
> +  }
> +   } else {
> +  info.index_size = 0;
> +  info.has_user_indices = false;
> }
>  
> /* set the constant buffer */
> @@ -226,7 +227,23 @@ st_feedback_draw_vbo(struct gl_context *ctx,
>  
> /* draw here */
> for (i = 0; i < nr_prims; i++) {
> -  draw_arrays(draw, prims[i].mode, start + prims[i].start, 
> prims[i].count);
> +  info.count = prims[i].count;
> +
> +  if (!info.count)
> + continue;
> +
> +  info.mode = prims[i].mode;
> +  info.start = start + prims[i].start;
> +  info.start_instance = prims[i].base_instance;
> +  info.instance_count = prims[i].num_instances;
> +  info.index_bias = prims[i].basevertex;
> +  info.drawid = prims[i].draw_id;
> +  if (!ib) {
> + info.min_index = info.start;
> + info.max_index = info.start + info.count - 1;
> +  }
> +
> +  draw_vbo(draw, &info);
> }
>  
>  
> 



Re: [Mesa-dev] [PATCH] gallivm: use llvm jit code for decoding s3tc

2018-12-19 Thread Roland Scheidegger
Am 19.12.18 um 08:35 schrieb Jose Fonseca:
> On 19/12/2018 03:51, srol...@vmware.com wrote:
>> From: Roland Scheidegger 
>>
>> This is (much) faster than using the util fallback.
>> (Note that there's two methods here, one would use a cache, similar to
>> the existing code (although the cache was disabled), except the block
>> decode is done with jit code, the other directly decodes the required
>> pixels. For now don't use the cache (being direct-mapped is suboptimal,
>> but it's difficult to come up with something better which doesn't have
>> too much overhead.)
>> ---

>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
>> b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
>> index 018cca8f9df..a6662c5e01b 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
>> @@ -3549,10 +3549,6 @@ lp_build_sample_soa_func(struct gallivm_state
>> *gallivm,
>>     const struct util_format_description *format_desc;
>>     format_desc =
>> util_format_description(static_texture_state->format);
>>     if (format_desc && format_desc->layout ==
>> UTIL_FORMAT_LAYOUT_S3TC) {
>> - /*
>> -  * This is not 100% correct, if we have cache but the
>> -  * util_format_s3tc_prefer is true the cache won't get used
>> -  * regardless (could hook up the block decode there...) */
>>    need_cache = TRUE;
> 
> I'm a bit confused.  Based on your comment description, shouldn't this be
> FALSE?  Or is this dead code?
No that should be correct (note util_format_s3tc_prefer doesn't even
exist anymore). This is in if (cache_ptr) section - it just means that
we will pass the cache pointer (if it exists, that is if cache is
enabled) to the (separate) texture function.

Roland


> 
>>     }
>>  }
>> diff --git a/src/gallium/auxiliary/meson.build
>> b/src/gallium/auxiliary/meson.build
>> index a4dbcf7b4ca..57f7e69050f 100644
>> --- a/src/gallium/auxiliary/meson.build
>> +++ b/src/gallium/auxiliary/meson.build
>> @@ -389,8 +389,8 @@ if with_llvm
>>   'gallivm/lp_bld_flow.h',
>>   'gallivm/lp_bld_format_aos_array.c',
>>   'gallivm/lp_bld_format_aos.c',
>> -    'gallivm/lp_bld_format_cached.c',
>>   'gallivm/lp_bld_format_float.c',
>> +    'gallivm/lp_bld_format_s3tc.c',
>>   'gallivm/lp_bld_format.c',
>>   'gallivm/lp_bld_format.h',
>>   'gallivm/lp_bld_format_soa.c',
>>
> 
> 
> Otherwise looks great.  Thanks!
> 
> Reviewed-by: Jose Fonseca 



Re: [Mesa-dev] MR: NIR: Partial redundancy elimination for compares

2018-12-17 Thread Roland Scheidegger
Am 17.12.18 um 23:27 schrieb Roland Scheidegger:
> Am 17.12.18 um 23:07 schrieb Ilia Mirkin:
>> On Mon, Dec 17, 2018 at 5:05 PM Ian Romanick  wrote:
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/22
>>>
>>> This series adds a new optimization pass that tries to replace code
>>> sequences like
>>>
>>> if (x < y) {
>>> z = y - x;
>>> ...
>>> }
>>>
>>> with a sequence like
>>>
>>> t = x - y;
>>> if (t < 0) {
>>> z = -t;
>>> ...
>>> }
>>
>> Is it worth worrying about infinities? e.g. if x = -Infinity, y =
>> Infinity, "x < y" will be true, but "x - y < 0" will not be (pretty
>> sure it'll be a NaN, which is not < 0).
> 
> I was wondering the same, but I think this should still work.
> -Inf - Inf = -Inf, so no problem there.
> 

Although it looks like the optimization might be a bit problematic? For
it to work you really need not just be able to use flags generated by
sub, but also you need to be able to eliminate the negation one way or
another (e.g. free input negate going into another operation, turning
subsequent adds into subs, ...). But well maybe that's often possible.

Roland




Re: [Mesa-dev] MR: NIR: Partial redundancy elimination for compares

2018-12-17 Thread Roland Scheidegger
Am 17.12.18 um 23:07 schrieb Ilia Mirkin:
> On Mon, Dec 17, 2018 at 5:05 PM Ian Romanick  wrote:
>>
>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/22
>>
>> This series adds a new optimization pass that tries to replace code
>> sequences like
>>
>> if (x < y) {
>> z = y - x;
>> ...
>> }
>>
>> with a sequence like
>>
>> t = x - y;
>> if (t < 0) {
>> z = -t;
>> ...
>> }
> 
> Is it worth worrying about infinities? e.g. if x = -Infinity, y =
> Infinity, "x < y" will be true, but "x - y < 0" will not be (pretty
> sure it'll be a NaN, which is not < 0).

I was wondering the same, but I think this should still work.
-Inf - Inf = -Inf, so no problem there.
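
A quick way to check the infinity cases is a small C sketch (assuming
IEEE 754 float arithmetic without fast-math; the helper names are made
up for illustration):

```c
#include <math.h>
#include <stdbool.h>

/* Original form of the condition. */
static bool
less_direct(float x, float y)
{
   return x < y;
}

/* Rewritten form: compare the difference against zero. */
static bool
less_via_sub(float x, float y)
{
   float t = x - y;
   return t < 0.0f;
}
```

With x = -Inf and y = Inf both forms are true, since -Inf - Inf is -Inf.
With x = y = Inf the difference is a NaN, so the rewritten form is false,
but x < y is false there too, so the two still agree.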

Roland


>   -ilia


Re: [Mesa-dev] [PATCH 1/3] st/mesa: Make an enum for pipeline statistics query result indices.

2018-12-15 Thread Roland Scheidegger
Am 15.12.18 um 22:39 schrieb Ilia Mirkin:
> On Sat, Dec 15, 2018 at 4:12 PM Kenneth Graunke  wrote:
>>
>> On Saturday, December 15, 2018 9:10:46 AM PST Ilia Mirkin wrote:
>>> On Sat, Dec 15, 2018 at 4:45 AM Kenneth Graunke  
>>> wrote:
>>>> Gallium handles pipeline statistics queries as a single query
>>>> (PIPE_QUERY_PIPELINE_STATISTICS) which returns a struct with 11 values.
>>>> Sometimes it's useful to refer to each of those values individually,
>>>> rather than as a group.  To avoid hardcoding numbers, we define a new
>>>> enum for each value.  Here, the name and enum value correspond to the
>>>> index in the struct pipe_query_data_pipeline_statistics result.
>>>
>>> This later-in-the-series desire to be able to get just one value back
>>> from Gallium is an API change which would break any existing d3d1x
>>> state trackers. I realize you're not changing any drivers, but in my
>>> mind, it's preferable not to have ambiguous APIs where some drivers do
>>> one thing, others do another. For NVIDIA, we have to fetch the
>>> counters individually too, but we just do them all. It's ~free to do
>>> so, and we only do one set of synchronization for all of them.
>>
>> Yes, I suppose you're right in that the existing interface is designed
>> to return all 11 counters, and I'm essentially implementing it wrong
>> because I didn't understand it.  It seemed like more of an inconsistency
>> in get_query_results vs get_query_results_resource, where one of those
>> knew what value to fetch, and the other did not.
> 
> I implemented the direct-write-to-resource logic, and there was no
> use-case for fetching them all, and you had to support a GL API where
> you had to be able to place each value into a precise location. I
> didn't want to alter the other API at the time as I didn't feel it
> hurt too much.
> 
>>
>> While it may not be that expensive to return 11 values, it isn't free.
>> The ARB_pipeline_statistics_query extension is designed to help port
>> D3D1x apps to OpenGL, but it provides GL-style single-value queries
>> rather than a single query that returns a struct.  So, if a D3D1x
>> translator naively tries to fetch all 11 counters, it would have to
>> do 11 different GL queries...each of which would map to Gallium's
>> return-all-11 interface...resulting in 121 counter reads.
> 
> Yep. Not ideal. (Which was never my argument anyways -- having a way
> to return a single value would definitely be better.)
> 
>>
>>> Anyways, I'd rather not have this ambiguous thing of "you could return
>>> some or all" situation. We should be more definitive. I'd recommend
>>> adding a PIPE_QUERY_PIPELINE_STATISTICS_ONE to differentiate [or
>>> PIPE_QUERY_PIPELINE_STATISTIC if you want to be clever and cause lots
>>> of typos], along with a CAP for support.
>>
>> I'd like to have an interface that better serves the in-tree state
>> trackers.  st/mesa wants 1 value, but it seems st/nine wants 2.  So
>> the single interface isn't ideal for nine, either, I guess.
> 
> Historically we've tried to avoid creating breakage for VMware where
> it wasn't necessary.
> 
>>
>> If people would prefer that we add a new query type and capability bit,
>> I can do that, but I probably won't bother implementing the all-11 mode
>> in my driver, then.
> 
> I think that's fine. It's already behind some PIPE_CAP. The state
> tracker could then do like
> 
> if ONE: get_one();
> if ALL: get_all(); select one();
> 
> Cheers,
> 
>   -ilia
> 

Yes, it's unfortunate d3d returns all values and GL returns just one.
Although FWIW outside of test suites we haven't really seen much use of
pipeline statistics queries, I'd guess that's the same for GL. Maybe
they are used for development/debugging/profiling of apps, but otherwise
it appears they don't see much (if any) use - as such they probably
aren't all that performance critical.

I think it's a good idea for the interface to be explicit (in some way)
about whether all values or just one is requested (and it's ok if not all
drivers implement requesting all).
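
To make the difference concrete, here is a rough sketch in C of the
"ONE vs. ALL" selection suggested above. Everything in it is
hypothetical (the struct, the capability flag and the function names are
stand-ins, not Gallium interfaces); it only counts how many counter
reads each mode costs:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PIPELINE_STATS 11

/* Hypothetical model of a driver backend; not a real Gallium structure. */
struct stats_backend {
   bool supports_single;                   /* hypothetical capability */
   uint64_t counters[NUM_PIPELINE_STATS];  /* stand-in for hw counters */
   unsigned reads;                         /* counter reads issued so far */
};

static uint64_t
read_counter(struct stats_backend *b, unsigned index)
{
   b->reads++;
   return b->counters[index];
}

/* "ALL" mode: fetch every counter, like the struct-returning query. */
static void
get_all(struct stats_backend *b, uint64_t out[NUM_PIPELINE_STATS])
{
   for (unsigned i = 0; i < NUM_PIPELINE_STATS; i++)
      out[i] = read_counter(b, i);
}

/* "ONE" mode: fetch only the requested counter. */
static uint64_t
get_one(struct stats_backend *b, unsigned index)
{
   return read_counter(b, index);
}

/* What a state tracker could do for a GL-style single-value query. */
static uint64_t
query_statistic(struct stats_backend *b, unsigned index)
{
   uint64_t all[NUM_PIPELINE_STATS];

   if (b->supports_single)
      return get_one(b, index);

   get_all(b, all);
   return all[index];
}
```

In this model a single-value query costs 1 read in "ONE" mode and 11 in
"ALL" mode, which is where the 121 reads mentioned earlier in the thread
come from when all 11 values are queried one at a time.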

(Patch 1 looks fine regardless to me fwiw.)

Roland


Re: [Mesa-dev] [PATCH 4/6] mesa/st: wire up DiscardSubFramebuffer

2018-12-11 Thread Roland Scheidegger
Am 11.12.18 um 23:50 schrieb Rob Clark:
> Signed-off-by: Rob Clark 
> ---
>  src/gallium/include/pipe/p_context.h | 11 +++
>  src/mesa/state_tracker/st_cb_fbo.c   | 26 ++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/src/gallium/include/pipe/p_context.h 
> b/src/gallium/include/pipe/p_context.h
> index d4e9179b78a..eb52c7e9a4e 100644
> --- a/src/gallium/include/pipe/p_context.h
> +++ b/src/gallium/include/pipe/p_context.h
> @@ -811,6 +811,17 @@ struct pipe_context {
> void (*invalidate_surface)(struct pipe_context *ctx,
>struct pipe_surface *surf);
>  
> +   /**
> +* Invalidate a portion of a surface.  This is used to
> +*
> +* (1) implement glInvalidateSubFramebuffer() and friends
> +* (2) as a hint before a scissored clear (which is turned into draw_vbo()
> +* that the cleared rect can be discarded
> +*/
> +   void (*invalidate_sub_surface)(struct pipe_context *ctx,
> +  struct pipe_surface *surf,
> +  const struct pipe_scissor_state *rect);
> +

The same applies as to the previous change. Additionally, I think we
really don't need 3 functions essentially doing the same thing. Ok I
could see that maybe there's value passing in the surface directly
(rather than mip levels, layers), but surely
invalidate_surface/invalidate_sub_surface looks like overkill to me.
Maybe pass in a NULL pointer for rect if you want to clear everything?
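
As an illustration of the NULL-rect idea, a minimal sketch in C. The
types and function names here are hypothetical stand-ins (not the
Gallium ones), and the bounds are inclusive only because that is how the
quoted patch computes them:

```c
#include <stddef.h>

/* Hypothetical types; bounds are inclusive, as in the quoted patch. */
struct rect {
   unsigned minx, miny, maxx, maxy;
};

struct surface {
   unsigned width, height;   /* assumed non-zero */
};

/* Build a rect the way the quoted st_discard_sub_framebuffer() does. */
static struct rect
rect_from_xywh(unsigned x, unsigned y, unsigned width, unsigned height)
{
   struct rect r = { x, y, x + width - 1, y + height - 1 };
   return r;
}

/* Resolve the region to invalidate: NULL selects the whole surface. */
static struct rect
resolve_invalidate_rect(const struct surface *surf, const struct rect *r)
{
   if (r == NULL)
      return rect_from_xywh(0, 0, surf->width, surf->height);
   return *r;
}
```

With something along these lines a single entry point could cover both
the full and the partial invalidate.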

Roland


> /**
>  * Return information about unexpected device resets.
>  */
> diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
> b/src/mesa/state_tracker/st_cb_fbo.c
> index 3ece1d4a9de..50c27ea51d9 100644
> --- a/src/mesa/state_tracker/st_cb_fbo.c
> +++ b/src/mesa/state_tracker/st_cb_fbo.c
> @@ -774,6 +774,31 @@ st_discard_framebuffer(struct gl_context *ctx, struct 
> gl_framebuffer *fb,
>st->pipe->invalidate_surface(st->pipe, psurf);
>  }
>  
> +static void
> +st_discard_sub_framebuffer(struct gl_context *ctx, struct gl_framebuffer *fb,
> +   struct gl_renderbuffer_attachment *att, GLint x,
> +   GLint y, GLsizei width, GLsizei height)
> +{
> +   struct st_context *st = st_context(ctx);
> +   struct pipe_surface *psurf;
> +
> +   if (!att->Renderbuffer)
> +  return;
> +
> +   psurf = st_renderbuffer(att->Renderbuffer)->surface;
> +
> +   if (st->pipe->invalidate_sub_surface) {
> +  struct pipe_scissor_state rect;
> +
> +  rect.minx = x;
> +  rect.maxx = x + width - 1;
> +  rect.miny = y;
> +  rect.maxy = y + height - 1;
> +
> +  st->pipe->invalidate_sub_surface(st->pipe, psurf, &rect);
> +   }
> +}
> +
>  /**
>   * Called via glDrawBuffer.  We only provide this driver function so that we
>   * can check if we need to allocate a new renderbuffer.  Specifically, we
> @@ -952,6 +977,7 @@ st_init_fbo_functions(struct dd_function_table *functions)
> functions->FinishRenderTexture = st_finish_render_texture;
> functions->ValidateFramebuffer = st_validate_framebuffer;
> functions->DiscardFramebuffer = st_discard_framebuffer;
> +   functions->DiscardSubFramebuffer = st_discard_sub_framebuffer;
>  
> functions->DrawBufferAllocate = st_DrawBufferAllocate;
> functions->ReadBuffer = st_ReadBuffer;
> 



Re: [Mesa-dev] [PATCH 3/6] mesa/st: wire up DiscardFramebuffer

2018-12-11 Thread Roland Scheidegger
Am 11.12.18 um 23:50 schrieb Rob Clark:
> pipe_context::invalidate_resource() is so *almost* what we want, but
> with FBOs the fb can be a single layer/level of a pipe_resource, which
> makes ::invalidate_resource() not expressive enough.
> 
> Signed-off-by: Rob Clark 
> ---
>  src/gallium/include/pipe/p_context.h |  8 
>  src/mesa/state_tracker/st_cb_fbo.c   | 16 
>  2 files changed, 24 insertions(+)
> 
> diff --git a/src/gallium/include/pipe/p_context.h 
> b/src/gallium/include/pipe/p_context.h
> index e07b76d4f03..d4e9179b78a 100644
> --- a/src/gallium/include/pipe/p_context.h
> +++ b/src/gallium/include/pipe/p_context.h
> @@ -803,6 +803,14 @@ struct pipe_context {
> void (*invalidate_resource)(struct pipe_context *ctx,
> struct pipe_resource *resource);
>  
> +   /**
> +* Like ->invalidate_surface,
Like ->invalidate_resource?

gallium interface changes must use gallium: in the shortlog. Should
probably split up the patch into 2.
It's also missing the gallium/docs changes.

Roland



 but can invalidate a specific layer and level
> +* of a resource.  If the backing surf->texture has just a single layer 
> and
> +* level (like window system buffers) it is equiv to ->invalidate_resource
> +*/
> +   void (*invalidate_surface)(struct pipe_context *ctx,
> +  struct pipe_surface *surf);
> +
> /**
>  * Return information about unexpected device resets.
>  */
> diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
> b/src/mesa/state_tracker/st_cb_fbo.c
> index 8901a8680ef..3ece1d4a9de 100644
> --- a/src/mesa/state_tracker/st_cb_fbo.c
> +++ b/src/mesa/state_tracker/st_cb_fbo.c
> @@ -758,6 +758,21 @@ st_validate_framebuffer(struct gl_context *ctx, struct 
> gl_framebuffer *fb)
> }
>  }
>  
> +static void
> +st_discard_framebuffer(struct gl_context *ctx, struct gl_framebuffer *fb,
> +   struct gl_renderbuffer_attachment *att)
> +{
> +   struct st_context *st = st_context(ctx);
> +   struct pipe_surface *psurf;
> +
> +   if (!att->Renderbuffer)
> +  return;
> +
> +   psurf = st_renderbuffer(att->Renderbuffer)->surface;
> +
> +   if (st->pipe->invalidate_surface)
> +  st->pipe->invalidate_surface(st->pipe, psurf);
> +}
>  
>  /**
>   * Called via glDrawBuffer.  We only provide this driver function so that we
> @@ -936,6 +951,7 @@ st_init_fbo_functions(struct dd_function_table *functions)
> functions->RenderTexture = st_render_texture;
> functions->FinishRenderTexture = st_finish_render_texture;
> functions->ValidateFramebuffer = st_validate_framebuffer;
> +   functions->DiscardFramebuffer = st_discard_framebuffer;
>  
> functions->DrawBufferAllocate = st_DrawBufferAllocate;
> functions->ReadBuffer = st_ReadBuffer;
> 



Re: [Mesa-dev] [PATCH] gallium/aux: add is_unorm() helper

2018-12-11 Thread Roland Scheidegger
Am 11.12.18 um 22:21 schrieb Ilia Mirkin:
> So ... Z24 will end up with is_unorm() == true? [Just guessing --
> assume it doesn't have is_mixed == true.]
Z24X8 would return true, but Z24S8 would return false. (If you didn't
want that I guess you could check for colorspace.)

> Also, does RGB10A2 have mixed
> set? If so, then it won't report unorm. Not 100% sure if is_mixed is
> only for norm + int mixing.
Not sure which RGB10A2 variant you're referring to but there's only one
weird one (for bump maps) which would have mixed set. All others I think
always have components with the same type, so the one with unorm should
return true...

FWIW I don't think the combination of normalized & pure_integer is
really possible, so an explicit check for !pure_integer seems redundant.
But shouldn't hurt I suppose (and of course is_snorm does it the same).
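
The Z24X8 vs. Z24S8 behaviour can be seen in a small stand-alone model
of the check, sketched in C below. The types are simplified stand-ins
for the u_format description, and the two format entries used to
exercise it are hand-written approximations (assuming Z24S8 is flagged
as mixed because it combines a unorm channel with a pure-integer one),
not copies of the real format table:

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-ins for the u_format description. */
enum chan_type { CHAN_VOID, CHAN_UNSIGNED, CHAN_SIGNED };

struct chan_desc {
   enum chan_type type;
   bool normalized;
   bool pure_integer;
};

struct format_desc {
   bool is_mixed;
   unsigned nr_channels;
   struct chan_desc channel[4];
};

static int
first_non_void_channel(const struct format_desc *desc)
{
   for (unsigned i = 0; i < desc->nr_channels; i++) {
      if (desc->channel[i].type != CHAN_VOID)
         return (int)i;
   }
   return -1;
}

/* Same structure as the quoted util_format_is_unorm(). */
static bool
format_is_unorm(const struct format_desc *desc)
{
   int i;

   if (desc->is_mixed)
      return false;

   i = first_non_void_channel(desc);
   if (i == -1)
      return false;

   return desc->channel[i].type == CHAN_UNSIGNED &&
          !desc->channel[i].pure_integer &&
          desc->channel[i].normalized;
}
```

A Z24X8-like entry (one unorm channel plus a void channel, not mixed)
passes, while a Z24S8-like entry is rejected by the is_mixed test before
any channel is looked at.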

Roland




> On Tue, Dec 11, 2018 at 4:05 PM Rob Clark  wrote:
>>
>> We already had one for is_snorm() but not unorm.
>>
>> Signed-off-by: Rob Clark 
>> ---
>>  src/gallium/auxiliary/util/u_format.c | 21 +
>>  src/gallium/auxiliary/util/u_format.h |  3 +++
>>  2 files changed, 24 insertions(+)
>>
>> diff --git a/src/gallium/auxiliary/util/u_format.c 
>> b/src/gallium/auxiliary/util/u_format.c
>> index e43a619313e..231e89017b4 100644
>> --- a/src/gallium/auxiliary/util/u_format.c
>> +++ b/src/gallium/auxiliary/util/u_format.c
>> @@ -169,6 +169,27 @@ util_format_is_snorm(enum pipe_format format)
>>desc->channel[i].normalized;
>>  }
>>
>> +/**
>> + * Returns true if all non-void channels are normalized unsigned.
>> + */
>> +boolean
>> +util_format_is_unorm(enum pipe_format format)
>> +{
>> +   const struct util_format_description *desc = 
>> util_format_description(format);
>> +   int i;
>> +
>> +   if (desc->is_mixed)
>> +  return FALSE;
>> +
>> +   i = util_format_get_first_non_void_channel(format);
>> +   if (i == -1)
>> +  return FALSE;
>> +
>> +   return desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED &&
>> +  !desc->channel[i].pure_integer &&
>> +  desc->channel[i].normalized;
>> +}
>> +
>>  boolean
>>  util_format_is_snorm8(enum pipe_format format)
>>  {
>> diff --git a/src/gallium/auxiliary/util/u_format.h 
>> b/src/gallium/auxiliary/util/u_format.h
>> index 5bcfc1f1154..8dcc438a4a1 100644
>> --- a/src/gallium/auxiliary/util/u_format.h
>> +++ b/src/gallium/auxiliary/util/u_format.h
>> @@ -726,6 +726,9 @@ util_format_is_pure_uint(enum pipe_format format);
>>  boolean
>>  util_format_is_snorm(enum pipe_format format);
>>
>> +boolean
>> +util_format_is_unorm(enum pipe_format format);
>> +
>>  boolean
>>  util_format_is_snorm8(enum pipe_format format);
>>
>> --
>> 2.19.2
>>
> 



Re: [Mesa-dev] [PATCH 04/11] st/mesa: Use Array._DrawVAO in st_atom_array.c.

2018-12-11 Thread Roland Scheidegger
Am 11.12.18 um 10:37 schrieb Mathias Fröhlich:
> 
> Hey,
> 
> On Tuesday, 11 December 2018 10:19:47 CET Erik Faye-Lund wrote:
>> On Mon, 2018-12-10 at 18:23 +0100, Mathias Fröhlich wrote:
>>> Hi Erik,
>>>
>>> Not sure if this is our problem as I think that I only saw simple
>>> bindings with a zero instance divisor while debugging supertux kart.
>>>
>>> But at least I think that this is a problem in virglrenderer. The
>>> glVertexBindingDivisor is per binding and not per vertex attribute
>>> in OpenGL.
>>> ... you probably want to solve that differently, but for now this
>>> should be a quick band-aid to pinpoint the problem that we observe.
>>>
>>> Does the attached patch to virglrenderer fix our problem?
>>>
>>
>> It does! Thanks a lot :)
>>
>> I'll find out what the proper fix is, and submit a patch in your name!
>> Again, thanks a lot :)
> 
> You are welcome! And I don't need credits. It's a bit of a pity that the
> vertex element/buffer structure in gallium is different than it is in OpenGL.
> OTOH, does it match the way it is done in DirectX?
Yes indeed. It actually seems D3d10 (11/12) is the odd man out here,
since GL, Vulkan, and even Metal have it per vertex buffer. But having
it per attribute is a more powerful representation (since you can have
multiple attribs per buffer, but not the other way round). Not sure
off-hand if d3d9 could already do it. I suppose if you'd be interested
only in gl state tracker you could ignore that the value could
potentially be different for different attributes but the same buffer,
otherwise you'd have to duplicate the binding in theory, although I
believe that hitting such a condition will be extremely rare (it
probably doesn't make much sense that someone would organize the data in
such a strange way that you have both per-vertex and per-instance (or
per-instance data with separate rates) in the same buffer).

Roland


Re: [Mesa-dev] [PATCH 11/28] util: added float to float16 conversions with RTZ and RTNE

2018-12-07 Thread Roland Scheidegger
Am 07.12.18 um 05:22 schrieb Matt Turner:
> On Thu, Dec 6, 2018 at 7:22 PM Roland Scheidegger  wrote:
>>
>> Am 07.12.18 um 03:20 schrieb Matt Turner:
>>> Since this is for an extension that will be BDW+ can we use the
>>> _cvtss_sh() intrinsic instead? It corresponds to an IVB+ instruction
>>> and even takes the rounding mode directly as an immediate argument.
>>
>> Not saying trying to use it isn't a good idea, but you'd need the right
>> compile flags, and you can't assume it's present, since even the latest
>> pentiums don't support avx (and by extension, f16c). (The same is true
>> for atoms too, of course).
> 
> I'm not sure that AVX and F16C are related, but from a quick glance it
> seems that you're right that Atoms ("little core") doesn't support
> F16C. I had no idea :(
> 
> As far as I can tell all "big cores" have F16C. That's what
> https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
>  indicates.
That also indicates SNB and up all have AVX. Despite that,
Pentiums/Celerons from those families definitely do not. (I suppose that
means cputype=ivybridge etc. can't be used if you target the
pentiums/celerons, at least not for gcc. I know this was a recurring
problem for llvm with autodetect of cpu type, when it would recognize
newer core and then trying to use avx / avx2 on pentiums, dying in a fire.)
That f16c is tied implicitly to avx seems obvious without a doubt, since
the instructions (VCVTPH2PS, VCVTPS2PH) only exist with VEX encoding.
You cannot issue VEX-encoded instructions without AVX (VEX-encoding _is_
AVX, regardless if you use the 128bit or 256bit variants).
If you don't like that pentiums don't support those, complain to intel
(as it's just disabled, of course). IMHO it's a bit silly nowadays...
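
Since the feature can't be assumed from the CPU family, a runtime check
is the usual way out. A minimal sketch in C, assuming GCC or Clang
(which ship <cpuid.h>) and using made-up function names; a complete
check would also verify OSXSAVE and the enabled state via XGETBV, which
is left out here:

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header, assumed available */
#endif

/* CPUID leaf 1, ECX: bit 28 reports AVX, bit 29 reports F16C. */
#define ECX_AVX  (1u << 28)
#define ECX_F16C (1u << 29)

/* F16C instructions are VEX-encoded, so require AVX as well. */
static bool
f16c_usable_from_ecx(unsigned ecx)
{
   return (ecx & ECX_AVX) != 0 && (ecx & ECX_F16C) != 0;
}

static bool
cpu_has_f16c(void)
{
#if defined(__i386__) || defined(__x86_64__)
   unsigned eax, ebx, ecx, edx;

   if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
      return false;
   return f16c_usable_from_ecx(ecx);
#else
   return false;   /* not x86: no F16C */
#endif
}
```

A caller would then pick the _cvtss_sh() path only when cpu_has_f16c()
returns true and keep the generic conversion code as the fallback.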

> 
> If we've got to have the code, we might as well use it and not
> complicate it by using _cvtss_sh() then. Dang.
> 
> (Unfortunately there seems to be bad information out there confusing
> things though... see 
> https://communities.intel.com/thread/121635)

Quite sure this is blatantly false. Seems even intel is confused about
it :-).

Roland



Re: [Mesa-dev] [PATCH 11/28] util: added float to float16 conversions with RTZ and RTNE

2018-12-06 Thread Roland Scheidegger
Am 07.12.18 um 03:20 schrieb Matt Turner:
> Since this is for an extension that will be BDW+ can we use the
> _cvtss_sh() intrinsic instead? It corresponds to an IVB+ instruction
> and even takes the rounding mode directly as an immediate argument.

Not saying trying to use it isn't a good idea, but you'd need the right
compile flags, and you can't assume it's present, since even the latest
pentiums don't support avx (and by extension, f16c). (The same is true
for atoms too, of course).

Roland


Re: [Mesa-dev] [PATCH] gallivm: Use nextafterf(0.5, 0.0) as rounding constant

2018-11-28 Thread Roland Scheidegger
Am 28.11.18 um 07:37 schrieb Matt Turner:
> The common truncf(x + 0.5) fails for the floating-point value just less
> than 0.5 (nextafterf(0.5, 0.0)). nextafterf(0.5, 0.0) + 0.5, after
> rounding is 1.0, thus truncf does not produce the desired value.
> 
> The solution is to add nextafterf(0.5, 0.0) instead of 0.5 before
> truncating. This works for all values.

Reviewed-by: Roland Scheidegger 

Although this will still do round-to-nearest-away-from-zero, instead of
nearest-even which we probably want.
That said, I don't think it matters anywhere - d3d10 round will require
round-to-nearest-even, and the shader round might fall back to this, but
I don't think we really care in this case. GL in general doesn't care
anyway of course. (I don't think you can easily implement
round-to-nearest-even with a fallback.)


> ---
> I noticed this while investigating https://bugs.gentoo.org/665570 but it
> does not fix it.
> 
> Roland, do you have a suggestion for how to make lp_build_iround() work
> on non-SSE/non-Altivec platforms? I notice that if I unconditionally
> return TRUE from arch_rounding_available() and make
> lp_build_round_arch() take the SSE4.1 path (that emits llvm.nearbyint)
> it passes on ARM64.
> 
> I noticed there's some hack in lp_test_arit.c:test_unary:
> 
>if (test->ref ==  && length == 2 &&
>ref != roundf(testval)) {
>   /* FIXME: The generic (non SSE) path in lp_build_iround, which is
>* always taken for length==2 regardless of native round support,
>* does not round to even. */
>   expected_pass = FALSE;
>}
> 
> It'd be nice to get rid of that.. but maybe we can somehow use it to
> just mark all the round tests as expected fail on other platforms if no
> real fix is forthcoming?

Actually I think arch_rounding_available() would not need to check for
vector size (but we should check for type width instead) for the sse41
case? I think llvm should be able to handle any vector size reasonably,
type legalization should take care of it, which would eliminate the
length-2 problem for sse41 (at some point we actually used sse
intrinsics, not the llvm ones.)

As a side note, perhaps we could use the llvm intrinsics on altivec too
instead nowadays, I don't know if they do the right thing there, but I
can't see why they wouldn't, but someone would need to test it.

The only problem with using the llvm intrinsics is you really really
really don't want to do it if the cpu doesn't natively support it. In
this case llvm (at least used to) emit (scalar of course) calls to math
library, which is completely horrendous (in particular since we don't
really care that much about the exact rounding), if it even works. Hence
despite using llvm intrinsics we need to know if the cpu actually
supports it.
So ideally for arm we'd actually detect cpu features like for other
archs (I don't think all arm chips can do rounding ops, even if they
have neon, although perhaps all arm64 ones can) and follow similar
logic. But no one ever submitted an arm-specific patch for better llvmpipe
support...

Roland

> 
>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> index f348833206b..c050bfdb936 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> @@ -2477,7 +2477,7 @@ lp_build_iround(struct lp_build_context *bld,
> else {
>LLVMValueRef half;
>  
> -  half = lp_build_const_vec(bld->gallivm, type, 0.5);
> +  half = lp_build_const_vec(bld->gallivm, type, nextafterf(0.5, 0.0));
>  
>if (type.sign) {
>   LLVMTypeRef vec_type = bld->vec_type;
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf

2018-11-19 Thread Roland Scheidegger
Although I'm not sure we actually really wanted that rounding behavior in the 
first place - it's possible the only reason it was used is just because it had 
an easy implementation...


From: Matt Turner 
Sent: Friday, November 16, 2018 8:02:00 PM
To: Dylan Baker
Cc: Roland Scheidegger; ML mesa-dev; erik.faye-l...@collabora.com
Subject: Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf

On Fri, Nov 16, 2018 at 10:34 AM Dylan Baker  wrote:
>
> Quoting Roland Scheidegger (2018-11-13 18:41:00)
> > Am 14.11.18 um 03:21 schrieb Matt Turner:
> > > On Tue, Nov 13, 2018 at 6:03 PM Roland Scheidegger  
> > > wrote:
> > >>
> > >> Am 13.11.18 um 23:49 schrieb Dylan Baker:
> > >>> Quoting Roland Scheidegger (2018-11-13 14:13:00)
> > >>>> Am 13.11.18 um 18:00 schrieb Dylan Baker:
> > >>>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
> > >>>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
> > >>>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
> > >>>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
> > >>>>>>>>> Which has the same behavior.
> > >>>>>>>>
> > >>>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
> > >>>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds
> > >>>>>>>> to
> > >>>>>>>> the nearest *even* value regardless of the FPU rounding mode, no?
> > >>>>>>>>
> > >>>>>>>> I'm not sure if it matters or not, but *at least* point that out in
> > >>>>>>>> the
> > >>>>>>>> commit message. Unless I'm missing something, of course...
> > >>>>>>>
> > >>>>>>> I should put it in the commit message, but there is a comment in
> > >>>>>>> rounding.h that
> > >>>>>>> if you change the rounding mode you get to keep the pieces.
> > >>>>>>
> > >>>>>> Well, this might regress performance pretty badly. Especially in the
> > >>>>>> swrast code, this could be bad...
> > >>>>>>
> > >>>>>
> > >>>>> Why? we have the assumption that you don't change the rounding mode 
> > >>>>> already in
> > >>>>> core mesa and many of the drivers.
> > >>>>>
> > >>>>> For performance, I measured a simple 1000 loops of rounding, and 
> > >>>>> found that the
> > >>>>> only way the rounding.h function was slower is if you used the 
> > >>>>> __SSE4_1__
> > >>>>> path... (It was the same performance as the int cast +0.5 
> > >>>>> implementation)
> > >>>> FWIW I'm not entirely sure it's useful to have a sse41 implementation -
> > >>>> since all sse2 capable cpus can natively do rintf. Although maybe it
> > >>>> should be pointed out that the sse41 implementation will use a defined
> > >>>> rounding mode, whereas rintf will use current rounding mode. But I 
> > >>>> don't
> > >>>> think anyone ever cares for the results if a different rounding mode
> > >>>> would be set. Although of course rint and its variant do not actually
> > >>>> guarantee the even part of it (but well if it's a sse41 capable box we
> > >>>> pretty much know it would do just that anyway)... (And technically
> > >>>> nearbyintf would probably be an even better solution, since we never
> > >>>> want to get involved with the clunky exceptions, otherwise it's
> > >>>> identical. But there might be reasons why it isn't used.)
> > >>>>
> > >>>> Roland
> > >>>
> > >>> I'm not convinced we want it either, since it seems to be slower than 
> > >>> glibc's
> > >>> rintf. I guess it probably does make sense to use the nearbyintf 
> > >>> instead.
> > >>>
> > >>> As an aside (since I know 0 about assembly), does 
> > >>> _MM_FROUND_CUR_DIRECTION not
> > >>> check the rounding mode?
> > >> Oh indeed, I didn't check the code too closely (I was just assuming
> > 

Re: [Mesa-dev] [PATCH 30/30] mesa/st: require linear interpolation for ARB_texture_float

2018-11-19 Thread Roland Scheidegger
FWIW this looks to me rather similar to what happened when mesa began to
verify the max vertex stride (which needs to be 2048 with GL 4.4, whereas
r600 can only do 2047). There I argued it's a much better idea to lie about
the GL version rather than about the specific vertex stride bit, but I was
rather unsuccessful, and apparently not everybody shares this view...


From: mesa-dev  on behalf of Ilia 
Mirkin 
Sent: Monday, November 19, 2018 5:37:58 PM
To: Erik Faye-Lund
Cc: ML Mesa-dev; Timothy Arceri; Emil Velikov
Subject: Re: [Mesa-dev] [PATCH 30/30] mesa/st: require linear interpolation for 
ARB_texture_float

On Mon, Nov 19, 2018 at 11:30 AM Erik Faye-Lund
 wrote:
>
> On Mon, 2018-11-19 at 11:13 -0500, Ilia Mirkin wrote:
> > On Mon, Nov 19, 2018 at 10:40 AM Erik Faye-Lund
> >  wrote:
> > > On Mon, 2018-11-19 at 10:02 -0500, Ilia Mirkin wrote:
> > > > Unfortunately this will drop GL 3.0 from Adreno A3xx. I think
> > > > we'd
> > > > rather fake linear interpolation with F32 textures which are
> > > > never
> > > > used than lose GL 3.0 there...
> > >
> > > Right...
> > >
> > > I guess this means that this GPU never really did support OpenGL
> > > 3.0,
> > > > and will make some applications misbehave. There's definitely
> > > applications out there that will lead to surprisingly bad problems
> > > when
> > > features like these are not supported.
> > >
> > > For instance if an application tries to take a local gradient by
> > > sampling a texture twice with a tiny epsilon (a common trick in
> > > tangent-free normal mapping, for instance), it will essentially get
> > > garbage, which can cause close to useless rendering.
> > >
> > > I've worked on applications that would have had problems like these
> > > if
> > > drivers report the wrong version, but could work correctly if they
> > > report the right version.
> > >
> > > Either way, I don't believe faking like that belongs in core Mesa.
> > > So
> > > if the Freedreno developers really want this kind of behavior,
> > > perhaps
> > > something like this could be a better move?
> > >
> > > ---8<---
> > > diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c
> > > b/src/gallium/drivers/freedreno/freedreno_screen.c
> > > index 88d91a91234..de811371f05 100644
> > > --- a/src/gallium/drivers/freedreno/freedreno_screen.c
> > > +++ b/src/gallium/drivers/freedreno/freedreno_screen.c
> > > @@ -260,6 +260,11 @@ fd_screen_get_param(struct pipe_screen
> > > *pscreen,
> > > enum pipe_cap param)
> > > return 0;
> > >
> > > case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
> > > +   /* HACK: A330 doesn't support linear interpolation
> > > of
> > > FP32 textures, but
> > > +* to keep OpenGL 3.0 support, we lie about it
> > > here.
> > > +*/
> > > +   return is_a3xx(screen) || is_a4xx(screen) ||
> > > is_a5xx(screen) || is_a6xx(screen);
> > > +
> > > case PIPE_CAP_CUBE_MAP_ARRAY:
> > > case PIPE_CAP_SAMPLER_VIEW_TARGET:
> > > case PIPE_CAP_TEXTURE_QUERY_LOD:
> > > ---8<---
> > >
> > > Alternatively, they could ask users to override the GL-version for
> > > > applications that need GL 3.0, but don't have problems with the
> > > lack
> > > of FP32-interpolation...
> >
> > GL 3.0 brings SO much stuff in though, and GL 3.1 brings core
> > profiles.
> >
> > Your proposed solution will also expose the OES_bla ext, which we
> > definitely don't want to do. I'd instead keep it loose. The hardware
> > that doesn't support this stuff is generally targeted at ES. However
> > it's convenient to have desktop GL both for test coverage (piglit) as
> > well as regular use.
> >
> > Tons of desktop stuff doesn't work in Adreno. Starting with different
> > cull modes for front and back. Setting polygon mode for quads to
> > lines
> > shows you the internal line. Edge mode isn't supported. Probably
> > 1
> > other things.
> >
> > But it's still very useful to have GL 3.x advertised.
>
> As I tried to point out, that's only useful from one point of view.
> From an application developer's point of view, it's *worse* to expose
> GL 3.0 when it's not really supported. There's no way for applications
> to tell if filtering will work or not. When the correct version is
> reported, the application can provide a fallback path for the features
> it needs, or fall back to lower quality rendering.
>
> When you're outside the spec, you kinda have to pick your poison. But I
> don't think a single driver wanting to fake the support should affect
> all other drivers regardless.

You're looking at this as some hypothetical driver which supports a
random smattering of extension enables, and trying to make mesa
resilient against such an adversarial opponent.

But that's not what's going on here. Features come in packs. I think
that a3xx on adreno is the only hardware affected by this change in
practice.

>
> And with the other legacy GL features that 

Re: [Mesa-dev] [PATCH] r600: clean up the GS ring buffers when the context is destroyed

2018-11-19 Thread Roland Scheidegger
Reviewed-by: Roland Scheidegger 


From: mesa-dev  on behalf of Gert 
Wollny 
Sent: Friday, November 16, 2018 6:06:15 PM
To: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] r600: clean up the GS ring buffers when the 
context is destroyed

I forgot:

Fixes: 1371d65a7fbd695d3516861fe733685569d890d0
   r600g: initial support for geometry shaders on evergreen (v2)

Am Freitag, den 16.11.2018, 12:48 +0100 schrieb Gert Wollny:
> From: Gert Wollny 
>
> This fixes two memory leaks reported by ASAN:
>
> Direct leak of 248 byte(s) in 1 object(s) allocated from:
>in malloc (/usr/lib64/gcc/x86_64-pc-linux-
> gnu/7.3.0/libasan.so+0xdb880)
>in r600_alloc_buffer_struct
> ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:578
>in r600_buffer_create
> ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:600
>in r600_resource_create_common
> ../../samba/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1265
>in r600_resource_create
> ../../samba/mesa/src/gallium/drivers/r600/r600_pipe.c:725
>in pipe_buffer_create
> ../../samba/mesa/src/gallium/auxiliary/util/u_inlines.h:291
>in update_gs_block_state
> ../../samba/mesa/src/gallium/drivers/r600/r600_state_common.c:1482
>
> Direct leak of 248 byte(s) in 1 object(s) allocated from:
>in malloc (/usr/lib64/gcc/x86_64-pc-linux-
> gnu/7.3.0/libasan.so+0xdb880)
>in r600_alloc_buffer_struct
> ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:578
>in r600_buffer_create
> ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:600
>in r600_resource_create_common
> ../../samba/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1265
>in r600_resource_create
> ../../samba/mesa/src/gallium/drivers/r600/r600_pipe.c:722
>in pipe_buffer_create
> ../../samba/mesa/src/gallium/auxiliary/util/u_inlines.h:291
>in update_gs_block_state
> ../../samba/mesa/src/gallium/drivers/r600/r600_state_common.c:1489
>
> Signed-off-by: Gert Wollny 
> ---
>  src/gallium/drivers/r600/r600_pipe.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/gallium/drivers/r600/r600_pipe.c
> b/src/gallium/drivers/r600/r600_pipe.c
> index 9e8501ff33..adf6ebc95f 100644
> --- a/src/gallium/drivers/r600/r600_pipe.c
> +++ b/src/gallium/drivers/r600/r600_pipe.c
> @@ -105,6 +105,12 @@ static void r600_destroy_context(struct
> pipe_context *context)
>   }
>   util_unreference_framebuffer_state(
> &rctx->framebuffer.state);
>
> +if (rctx->gs_rings.gsvs_ring.buffer)
> +   pipe_resource_reference(&rctx->gs_rings.gsvs_ring.buffer,
> NULL);
> +
> +if (rctx->gs_rings.esgs_ring.buffer)
> +   pipe_resource_reference(&rctx->gs_rings.esgs_ring.buffer,
> NULL);
> +
>   for (sh = 0; sh < PIPE_SHADER_TYPES; ++sh)
>   for (i = 0; i < PIPE_MAX_CONSTANT_BUFFERS; ++i)
>   rctx->b.b.set_constant_buffer(context, sh,
> i, NULL);
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf

2018-11-13 Thread Roland Scheidegger
Am 14.11.18 um 03:21 schrieb Matt Turner:
> On Tue, Nov 13, 2018 at 6:03 PM Roland Scheidegger  wrote:
>>
>> Am 13.11.18 um 23:49 schrieb Dylan Baker:
>>> Quoting Roland Scheidegger (2018-11-13 14:13:00)
>>>> Am 13.11.18 um 18:00 schrieb Dylan Baker:
>>>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
>>>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
>>>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
>>>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
>>>>>>>>> Which has the same behavior.
>>>>>>>>
>>>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
>>>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds
>>>>>>>> to
>>>>>>>> the nearest *even* value regardless of the FPU rounding mode, no?
>>>>>>>>
>>>>>>>> I'm not sure if it matters or not, but *at least* point that out in
>>>>>>>> the
>>>>>>>> commit message. Unless I'm missing something, of course...
>>>>>>>
>>>>>>> I should put it in the commit message, but there is a comment in
>>>>>>> rounding.h that
>>>>>>> if you change the rounding mode you get to keep the pieces.
>>>>>>
>>>>>> Well, this might regress performance pretty badly. Especially in the
>>>>>> swrast code, this could be bad...
>>>>>>
>>>>>
>>>>> Why? we have the assumption that you don't change the rounding mode 
>>>>> already in
>>>>> core mesa and many of the drivers.
>>>>>
>>>>> For performance, I measured a simple 1000 loops of rounding, and found 
>>>>> that the
>>>>> only way the rounding.h function was slower is if you used the __SSE4_1__
>>>>> path... (It was the same performance as the int cast +0.5 implementation)
>>>> FWIW I'm not entirely sure it's useful to have a sse41 implementation -
>>>> since all sse2 capable cpus can natively do rintf. Although maybe it
>>>> should be pointed out that the sse41 implementation will use a defined
>>>> rounding mode, whereas rintf will use current rounding mode. But I don't
>>>> think anyone ever cares for the results if a different rounding mode
>>>> would be set. Although of course rint and its variant do not actually
>>>> guarantee the even part of it (but well if it's a sse41 capable box we
>>>> pretty much know it would do just that anyway)... (And technically
>>>> nearbyintf would probably be an even better solution, since we never
>>>> want to get involved with the clunky exceptions, otherwise it's
>>>> identical. But there might be reasons why it isn't used.)
>>>>
>>>> Roland
>>>
>>> I'm not convinced we want it either, since it seems to be slower than 
>>> glibc's
>>> rintf. I guess it probably does make sense to use the nearbyintf instead.
>>>
>>> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION 
>>> not
>>> check the rounding mode?
>> Oh indeed, I didn't check the code too closely (I was just assuming
>> _mm_round_ss() was used because it is possible to use round-to-nearest
>> regardless of the actual rounding mode, but that's not the case).
>>
>> But actually I misread this code: the point of mesa_roundevenf is to
>> round to float WITHOUT conversion to int. In which case it makes more
>> sense at least at first look...
>>
>> But if you want to round to nearest integer WITH conversion to int, you
>> probably really want to use something else. nearbyint family doesn't
>> have variants which give you ints. There's rint functions which give you
>> ints directly, but they are likely a very bad idea (aside from exception
> 
> Why?
Not sure what the why refers to here?


> 
>> handling, not quite sure if this really causes the compiler to do
>> something different) because of giving you long (or long long) results -
>> meaning that you can't use the simple cpu instructions giving you 32bit
>> results (because conversion to 64bit long + trunc to 32bit will give you
>> defined (although meaningless) results in some cases where direct
>> conversion to 32bit int wouldn't).
>> So ideally you'd pick a variant where the compiler is smart enough to
>> recognize it ca

Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf

2018-11-13 Thread Roland Scheidegger
Am 14.11.18 um 03:02 schrieb Roland Scheidegger:
> Am 13.11.18 um 23:49 schrieb Dylan Baker:
>> Quoting Roland Scheidegger (2018-11-13 14:13:00)
>>> Am 13.11.18 um 18:00 schrieb Dylan Baker:
>>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
>>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
>>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
>>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
>>>>>>>> Which has the same behavior.
>>>>>>>
>>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
>>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds
>>>>>>> to
>>>>>>> the nearest *even* value regardless of the FPU rounding mode, no?
>>>>>>>
>>>>>>> I'm not sure if it matters or not, but *at least* point that out in
>>>>>>> the
>>>>>>> commit message. Unless I'm missing something, of course...
>>>>>>
>>>>>> I should put it in the commit message, but there is a comment in
>>>>>> rounding.h that
>>>>>> if you change the rounding mode you get to keep the pieces.
>>>>>
>>>>> Well, this might regress performance pretty badly. Especially in the
>>>>> swrast code, this could be bad...
>>>>>
>>>>
>>>> Why? we have the assumption that you don't change the rounding mode 
>>>> already in
>>>> core mesa and many of the drivers.
>>>>
>>>> For performance, I measured a simple 1000 loops of rounding, and found 
>>>> that the
>>>> only way the rounding.h function was slower is if you used the __SSE4_1__
>>>> path... (It was the same performance as the int cast +0.5 implementation)
>>> FWIW I'm not entirely sure it's useful to have a sse41 implementation -
>>> since all sse2 capable cpus can natively do rintf. Although maybe it
>>> should be pointed out that the sse41 implementation will use a defined
>>> rounding mode, whereas rintf will use current rounding mode. But I don't
>>> think anyone ever cares for the results if a different rounding mode
>>> would be set. Although of course rint and its variant do not actually
>>> guarantee the even part of it (but well if it's a sse41 capable box we
>>> pretty much know it would do just that anyway)... (And technically
>>> nearbyintf would probably be an even better solution, since we never
>>> want to get involved with the clunky exceptions, otherwise it's
>>> identical. But there might be reasons why it isn't used.)
>>>
>>> Roland
>>
>> I'm not convinced we want it either, since it seems to be slower than glibc's
>> rintf. I guess it probably does make sense to use the nearbyintf instead.
>>
>> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION 
>> not
>> check the rounding mode?
> Oh indeed, I didn't check the code too closely (I was just assuming
> _mm_round_ss() was used because it is possible to use round-to-nearest
> regardless of the actual rounding mode, but that's not the case).
> 
> But actually I misread this code: the point of mesa_roundevenf is to
> round to float WITHOUT conversion to int. In which case it makes more
> sense at least at first look...
> 
> But if you want to round to nearest integer WITH conversion to int, you
> probably really want to use something else. nearbyint family doesn't
> have variants which give you ints. There's rint functions which give you
> ints directly, but they are likely a very bad idea (aside from exception
> handling, not quite sure if this really causes the compiler to do
> something different) because of giving you long (or long long) results -
> meaning that you can't use the simple cpu instructions giving you 32bit
> results (because conversion to 64bit long + trunc to 32bit will give you
> defined (although meaningless) results in some cases where direct
> conversion to 32bit int wouldn't).
> So ideally you'd pick a variant where the compiler is smart enough to
> recognize it can be done with a single instruction. I would guess
> nearbyintf + int cast should do just about everywhere, at least as long
> as x64 or x86 + sse2 is used, my suspicion is the old IROUND function
> was done in a time where x87 was still relevant. Or maybe rintf + int
> cast, no idea how the compiler really handles them differently (I tried
> to quickly look at it in gcc source, but no idea where those are
> buried)

Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf

2018-11-13 Thread Roland Scheidegger
Am 13.11.18 um 23:49 schrieb Dylan Baker:
> Quoting Roland Scheidegger (2018-11-13 14:13:00)
>> Am 13.11.18 um 18:00 schrieb Dylan Baker:
>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
>>>>>>> Which has the same behavior.
>>>>>>
>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds
>>>>>> to
>>>>>> the nearest *even* value regardless of the FPU rounding mode, no?
>>>>>>
>>>>>> I'm not sure if it matters or not, but *at least* point that out in
>>>>>> the
>>>>>> commit message. Unless I'm missing something, of course...
>>>>>
>>>>> I should put it in the commit message, but there is a comment in
>>>>> rounding.h that
>>>>> if you change the rounding mode you get to keep the pieces.
>>>>
>>>> Well, this might regress performance pretty badly. Especially in the
>>>> swrast code, this could be bad...
>>>>
>>>
>>> Why? we have the assumption that you don't change the rounding mode already 
>>> in
>>> core mesa and many of the drivers.
>>>
>>> For performance, I measured a simple 1000 loops of rounding, and found that 
>>> the
>>> only way the rounding.h function was slower is if you used the __SSE4_1__
>>> path... (It was the same performance as the int cast +0.5 implementation)
>> FWIW I'm not entirely sure it's useful to have a sse41 implementation -
>> since all sse2 capable cpus can natively do rintf. Although maybe it
>> should be pointed out that the sse41 implementation will use a defined
>> rounding mode, whereas rintf will use current rounding mode. But I don't
>> think anyone ever cares for the results if a different rounding mode
>> would be set. Although of course rint and its variant do not actually
>> guarantee the even part of it (but well if it's a sse41 capable box we
>> pretty much know it would do just that anyway)... (And technically
>> nearbyintf would probably be an even better solution, since we never
>> want to get involved with the clunky exceptions, otherwise it's
>> identical. But there might be reasons why it isn't used.)
>>
>> Roland
> 
> I'm not convinced we want it either, since it seems to be slower than glibc's
> rintf. I guess it probably does make sense to use the nearbyintf instead.
> 
> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION not
> check the rounding mode?
Oh indeed, I didn't check the code too closely (I was just assuming
_mm_round_ss() was used because it is possible to use round-to-nearest
regardless of the actual rounding mode, but that's not the case).

But actually I misread this code: the point of mesa_roundevenf is to
round to float WITHOUT conversion to int. In which case it makes more
sense at least at first look...

But if you want to round to nearest integer WITH conversion to int, you
probably really want to use something else. nearbyint family doesn't
have variants which give you ints. There's rint functions which give you
ints directly, but they are likely a very bad idea (aside from exception
handling, not quite sure if this really causes the compiler to do
something different) because of giving you long (or long long) results -
meaning that you can't use the simple cpu instructions giving you 32bit
results (because conversion to 64bit long + trunc to 32bit will give you
defined (although meaningless) results in some cases where direct
conversion to 32bit int wouldn't).
So ideally you'd pick a variant where the compiler is smart enough to
recognize it can be done with a single instruction. I would guess
nearbyintf + int cast should do just about everywhere, at least as long
as x64 or x86 + sse2 is used, my suspicion is the old IROUND function
was done in a time where x87 was still relevant. Or maybe rintf + int
cast, no idea how the compiler really handles them differently (I tried
to quickly look at it in gcc source, but no idea where those are
buried). As a side note, I hate it when the assembly solution is obvious
and you can't really figure out how the hell you should coax the
compiler in giving you the right answer (I mean, high level languages
are there to help, not get in your way...).

All that said, I still don't really see the point of the manual sse41
assembly (even for the case when we don't want to convert to int) -
assuming there is an easy solution to get the compiler to do the right
thing...

Roland

> 
> Dylan
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf

2018-11-13 Thread Roland Scheidegger
Am 13.11.18 um 18:00 schrieb Dylan Baker:
> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
>>>>> Which has the same behavior.
>>>>
>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds
>>>> to
>>>> the nearest *even* value regardless of the FPU rounding mode, no?
>>>>
>>>> I'm not sure if it matters or not, but *at least* point that out in
>>>> the
>>>> commit message. Unless I'm missing something, of course...
>>>
>>> I should put it in the commit message, but there is a comment in
>>> rounding.h that
>>> if you change the rounding mode you get to keep the pieces.
>>
>> Well, this might regress performance pretty badly. Especially in the
>> swrast code, this could be bad...
>>
> 
> Why? we have the assumption that you don't change the rounding mode already in
> core mesa and many of the drivers.
> 
> For performance, I measured a simple 1000 loops of rounding, and found that 
> the
> only way the rounding.h function was slower is if you used the __SSE4_1__
> path... (It was the same performance as the int cast +0.5 implementation)
FWIW I'm not entirely sure it's useful to have a sse41 implementation -
since all sse2 capable cpus can natively do rintf. Although maybe it
should be pointed out that the sse41 implementation will use a defined
rounding mode, whereas rintf will use current rounding mode. But I don't
think anyone ever cares for the results if a different rounding mode
would be set. Although of course rint and its variant do not actually
guarantee the even part of it (but well if it's a sse41 capable box we
pretty much know it would do just that anyway)... (And technically
nearbyintf would probably be an even better solution, since we never
want to get involved with the clunky exceptions, otherwise it's
identical. But there might be reasons why it isn't used.)

Roland


> 
> Dylan
> 
> 
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] nir: add lowering for ffloor

2018-11-13 Thread Roland Scheidegger
Am 12.11.18 um 20:40 schrieb Jason Ekstrand:
> On Mon, Nov 12, 2018 at 1:29 PM Christian Gmeiner
> <christian.gmei...@gmail.com> wrote:
> 
> Hi Jason
> 
> Am Sa., 1. Sep. 2018 um 21:23 Uhr schrieb Jason Ekstrand
> <ja...@jlekstrand.net>:
> >
> > I don't think either of these work for negative numbers
> >
> 
> I would like to land this patch - can you provide some details why
> this does not work for
> negative numbers?
> 
> 
> No, this is correct.  It's GLSL's definition of fract(x) that's dumb. 
> GLSL defines fract(x) as x - floor(x) (exactly what your formula says). 
> This means that
> 
> fract(-1.4) = -1.4 - floor(-1.4) = -1.4 - (-2.0) = -1.4 + 2.0 = 0.6
> 
> so GLSL fract() doesn't give you the fractional part at all.  *sigh*
Can you elaborate why you think this result to be wrong? In fact it's
the only definition I've ever heard of. But yes, wikipedia says fract
for negative numbers is open for debate (with 3 possible solutions). In
the context of graphics shading languages, it is however most definitely
the agreed-upon formula for fractional parts (and certainly the same
formula was used pre-glsl or hlsl).

Roland


> 
> Reviewed-by: Jason Ekstrand
>  
> 
> > On September 1, 2018 14:16:11 Christian Gmeiner
> > <christian.gmei...@gmail.com> wrote:
> >
> > > Signed-off-by: Christian Gmeiner
> > > ---
> > > src/compiler/nir/nir.h                | 3 +++
> > > src/compiler/nir/nir_opt_algebraic.py | 1 +
> > > 2 files changed, 4 insertions(+)
> > >
> > > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> > > index 169fa1fa20..d81eefc032 100644
> > > --- a/src/compiler/nir/nir.h
> > > +++ b/src/compiler/nir/nir.h
> > > @@ -2054,6 +2054,9 @@ typedef struct nir_shader_compiler_options {
> > >     */
> > >    bool fdot_replicates;
> > >
> > > +   /** lowers ffloor to fsub+ffract: */
> > > +   bool lower_ffloor;
> > > +
> > >    /** lowers ffract to fsub+ffloor: */
> > >    bool lower_ffract;
> > >
> > > diff --git a/src/compiler/nir/nir_opt_algebraic.py
> > > b/src/compiler/nir/nir_opt_algebraic.py
> > > index ae1261f874..3d2b861a42 100644
> > > --- a/src/compiler/nir/nir_opt_algebraic.py
> > > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > > @@ -118,6 +118,7 @@ optimizations = [
> > >    (('~flrp', a, 0.0, c), ('fadd', ('fmul', ('fneg', a), c), a)),
> > > >    (('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp32'),
> > > >    (('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp64'),
> > > > +   (('ffloor', a), ('fsub', a, ('ffract', a)), 'options->lower_ffloor'),
> > > >    (('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
> > > >    (('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', c)))), ('fmul', b, ('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
> > > >    (('~fadd@32', ('fmul', a, ('fadd', 1.0, ('fneg',         c ))), ('fmul', b,         c )), ('flrp', a, b, c), '!options->lower_flrp32'),
> > > --
> > > 2.17.1
> > >
> 
> 
> >
> >
> >
> 
> -- 
> greets
> --
> Christian Gmeiner, MSc
> 
> https://christian-gmeiner.info
> 
> 
> 
> 



Re: [Mesa-dev] [PATCH] gallivm: fix improper clamping of vertex index when fetching gs inputs

2018-11-08 Thread Roland Scheidegger
On 08.11.18 at 14:59, Jose Fonseca wrote:
> Good find.
> 
> On 08/11/2018 01:54, srol...@vmware.com wrote:
>> From: Roland Scheidegger 
>>
>> Because we only have one file_max for the (2d) gs input file, the value
>> actually represents the max of attrib and vertex index (although I'm
>> not entirely sure if we really want the max, since the max valid value
>> of the vertex dimension can be easily deduced from the input primitive).
> 
> Yes, perhaps we should have a 2nd file_max array for the 2nd axis.   But
> it would be a more invasive change.
If just for gs inputs, I can't see why we'd need it, since it can easily
be deduced from the input prim (we also have file_count, although that
seems just plain broken to me for gs inputs, as well as file_mask, which
also mixes vertex / attrib indexes, and looks useless in that form to me
as well).
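
As a rough sketch of what "deduced from the input prim" means: the enum and
helper names below are hypothetical and are not the ones in util/u_prim.h;
only the vertex counts per primitive type are taken as given.

```c
#include <assert.h>

/* Hypothetical subset of geometry shader input primitive types. */
enum gs_input_prim {
   GS_PRIM_POINTS,
   GS_PRIM_LINES,
   GS_PRIM_TRIANGLES,
   GS_PRIM_LINES_ADJACENCY,
   GS_PRIM_TRIANGLES_ADJACENCY
};

/* Number of vertices a geometry shader sees per input primitive. */
static int gs_vertices_per_prim(enum gs_input_prim prim)
{
   switch (prim) {
   case GS_PRIM_POINTS:              return 1;
   case GS_PRIM_LINES:               return 2;
   case GS_PRIM_TRIANGLES:           return 3;
   case GS_PRIM_LINES_ADJACENCY:     return 4;
   case GS_PRIM_TRIANGLES_ADJACENCY: return 6;
   }
   return 1;
}

/* Clamp an indirect vertex index to the last valid vertex. The limit is
 * signed and asserted non-negative so that a bogus limit traps. */
static unsigned clamp_vertex_index(unsigned index, int index_limit)
{
   assert(index_limit >= 0);
   return index < (unsigned)index_limit ? index : (unsigned)index_limit;
}
```

For a triangle input, the limit would be gs_vertices_per_prim(...) - 1 = 2,
regardless of how many attributes the shader declares.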

> 
>> Thus in cases where the number of inputs is higher than the number of
>> vertices per prim, we did not properly clamp the vertex index, which
>> would result in out-of-bound fetches, potentially causing segfaults
>> (the segfaults seemed actually difficult to trigger, but valgrind
>> certainly wasn't happy). This might have happened even if the shader
>> did not actually try to fetch bogus vertices, if the fetching happened
>> in non-active conditional clauses.
>>
>> To fix simply use the correct max vertex index value (derived from
>> the input prim type) instead when clamping for this case.
>> ---
>>   .../auxiliary/gallivm/lp_bld_tgsi_soa.c   | 38 ++-
>>   1 file changed, 28 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
>> b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
>> index 83d7dbea9a..0db81b31ad 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
>> @@ -41,6 +41,7 @@
>>   #include "util/u_debug.h"
>>   #include "util/u_math.h"
>>   #include "util/u_memory.h"
>> +#include "util/u_prim.h"
>>   #include "tgsi/tgsi_dump.h"
>>   #include "tgsi/tgsi_exec.h"
>>   #include "tgsi/tgsi_info.h"
>> @@ -1059,7 +1060,8 @@ emit_mask_scatter(struct
>> lp_build_tgsi_soa_context *bld,
>>   static LLVMValueRef
>>   get_indirect_index(struct lp_build_tgsi_soa_context *bld,
>>  unsigned reg_file, unsigned reg_index,
>> -   const struct tgsi_ind_register *indirect_reg)
>> +   const struct tgsi_ind_register *indirect_reg,
>> +   unsigned index_limit)
> 
> file_max array is signed, so let's make index_limit also signed, and
> assert here in this function that it is positive.  This would trap if
> some caller gets the limit for the wrong file.
> 
> Which reminds me -- SM4 allows declaring constant buffers with
> undeclared size.  What exactly would we get passed here in those
> circumstances?  In practice constant buffers can't be larger than 64KB,
> which means we could presume file_max is 4K registers.
For constant reg file, the clamping is skipped in this function here.
emit_fetch_constant does its own clamping, based on the actual size of
the bound buffers. (Although note the clamping is only done for indirect
constant buffer fetches, not for direct ones, which may not be safe.
Maybe should fix that as well? I think it was omitted for performance
reasons, and because unlike for indirect fetches the shader would be
somewhat bogus, but I guess it would still result in out-of-bound fetches.)
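
A bounds check based on the size of the bound buffer, applied to every fetch
(direct or indirect), could look roughly like the sketch below. The struct,
the function name and the "return zero when out of range" policy are all
assumptions made for this example, not what emit_fetch_constant generates.

```c
#include <stddef.h>

/* Hypothetical bound constant buffer: data plus number of float elements. */
struct const_buffer {
   const float *data;
   size_t num_elements;
};

/* Fetch with an explicit bounds check. Out-of-range reads yield 0.0f,
 * which is an assumed policy for this sketch. */
static float fetch_constant(const struct const_buffer *buf, size_t index)
{
   if (index >= buf->num_elements)
      return 0.0f;
   return buf->data[index];
}
```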


> 
>>   {
>>  LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;
>>  struct lp_build_context *uint_bld = &bld->bld_base.uint_bld;
>> @@ -1107,8 +1109,7 @@ get_indirect_index(struct
>> lp_build_tgsi_soa_context *bld,
>>   */
>>  if (reg_file != TGSI_FILE_CONSTANT) {
>>     max_index = lp_build_const_int_vec(bld->bld_base.base.gallivm,
>> - uint_bld->type,
>> -
>> bld->bld_base.info->file_max[reg_file]);
>> + uint_bld->type, index_limit);
>>       assert(!uint_bld->type.sign);
>>     index = lp_build_min(uint_bld, index, max_index);
>> @@ -1224,7 +1225,8 @@ emit_fetch_constant(
>>     indirect_index = get_indirect_index(bld,
>>     reg->Register.File,
>>     reg->Register.Index,
>> -  >Indire

Re: [Mesa-dev] [PATCH vulkan/spec 1/3] Add an extension to specify that derivative groups are quads

2018-11-06 Thread Roland Scheidegger
On 06.11.18 at 22:48, Jason Ekstrand wrote:
> This came to the top of my list recently due to a difference between
> OpenGL and Vulkan discard operations and D3D's discard operation.  The
> OpenGL and Vulkan discard is defined to be control flow and derivatives
> are undefined after discard.  With D3D, derivatives are considered
> well-defined after discard.
> 
> In order to work around this, DXVK (and I would assume VKD3D, though I'm
> not sure) simply sets a global boolean instead of doing the discard and
> then emits `if (do_discard) discard;` at the end of the shader.  For
> complex shaders, this leads to the shader doing way more work than
> needed and poor performance.
Is that really all they do? This will not work in the presence of UAVs
(shader images / ssbos), since after a discard writes must have no effect.

> If, on the other hand, they knew that
> derivative groups are just subgroup quads, they could do something
> better:
> 
> bool want_discard;
> 
> void d3d_discard()
> {
> want_discard = true;
> if (subgroupClusteredAnd(want_discard, 4))
> discard;
> }
> 
> void main()
> {
> want_discard = false;
> 
> // stuff
> 
> if (some_condition)
> d3d_discard();
> 
> // Exepensive stuff
Expensive

> 
> if (want_discard)
> discard;
If the expensive stuff includes buffer / image writes, that still
wouldn't work as far as I can tell (although it could be fixed, just
like without the extension, by wrapping the writes with if (!want_discard)).
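
To spell out the guarded-write idea, here is a small serial model of one quad
in C. It is only a sketch: real invocations run in lockstep, the struct and
function names are invented for this example, and the loop merely stands in
for subgroupClusteredAnd(want_discard, 4).

```c
#include <stdbool.h>

#define QUAD_SIZE 4

/* Per-invocation state for one 2x2 quad (hypothetical model). */
struct quad_state {
   bool want_discard[QUAD_SIZE]; /* set by the D3D-style discard */
   bool killed[QUAD_SIZE];       /* really discarded, control flow ended */
};

/* Model of `if (subgroupClusteredAnd(want_discard, 4)) discard;`:
 * the quad is only really killed once all four lanes want to discard. */
static void d3d_discard(struct quad_state *q, int lane)
{
   bool all = true;

   q->want_discard[lane] = true;
   for (int i = 0; i < QUAD_SIZE; i++)
      all = all && q->want_discard[i];
   if (all) {
      for (int i = 0; i < QUAD_SIZE; i++)
         q->killed[i] = true;
   }
}

/* Guarded store: a lane that has logically discarded must not write,
 * even if it keeps running to provide derivatives to its neighbours. */
static bool store_if_live(const struct quad_state *q, int lane,
                          int *dst, int value)
{
   if (q->killed[lane] || q->want_discard[lane])
      return false;
   *dst = value;
   return true;
}
```

A lane that called d3d_discard() keeps executing until the whole quad agrees,
but store_if_live() makes its buffer / image writes no-ops in the meantime.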

Roland


> }
> ---
>  chapters/features.txt| 21 +
>  chapters/shaders.txt | 12 
>  chapters/textures.txt|  8 
>  include/vulkan/vulkan_core.h | 15 +--
>  xml/vk.xml   | 13 ++---
>  5 files changed, 64 insertions(+), 5 deletions(-)
> 
> diff --git a/chapters/features.txt b/chapters/features.txt
> index 08c8d8420..3d22972ea 100644
> --- a/chapters/features.txt
> +++ b/chapters/features.txt
> @@ -2969,6 +2969,27 @@ more slink:VkSubgroupFeatureFlagBits.
>  
>  endif::VK_VERSION_1_1[]
>  
> +ifdef::VK_EXT_derivative_group_quad[]
> +
> +[open,refpage='VkPhysicalDeviceDerivativeGroupQuadPropertiesEXT',desc='Structure
>  describing the relationship between derivative groups and subgroup quads for 
> an implementation',type='structs']
> +--
> +
> +The sname:VkPhysicalDeviceDerivativeGroupQuadPropertiesEXT structure is
> +defined as:
> +
> +include::../api/structs/VkPhysicalDeviceDerivativeGroupQuadPropertiesEXT.txt[]
> +
> +The members of the sname:VkPhysicalDeviceDerivativeGroupQuadPropertiesEXT
> +structure describe the following implementation-dependent limits:
> +
> +  * pname:derivativeGroupsAreSubgroupQuads is a boolean that specifies that
> +derivative groups in fragment shaders correspond to subgroup quads.
> +--
> +
> +include::../validity/structs/VkPhysicalDeviceDerivativeGroupQuadPropertiesEXT.txt[]
> +
> +endif::VK_EXT_derivative_group_quad[]
> +
>  ifdef::VK_EXT_blend_operation_advanced[]
>  
>  
> [open,refpage='VkPhysicalDeviceBlendOperationAdvancedPropertiesEXT',desc='Structure
>  describing advanced blending limits that can be supported by an 
> implementation',type='structs']
> diff --git a/chapters/shaders.txt b/chapters/shaders.txt
> index 5cb3edb35..11d3ea9db 100644
> --- a/chapters/shaders.txt
> +++ b/chapters/shaders.txt
> @@ -808,6 +808,11 @@ A _derivative group_ (see the subsection "`Control 
> Flow`" of section 2 of
>  the SPIR-V 1.00 Revision 4 specification) for a fragment shader is the set
>  of invocations generated by a single primitive (point, line, or triangle),
>  including any helper invocations generated by that primitive.
> +ifdef::VK_EXT_derivative_group_quad[]
> +If the fname:derivativeGroupsAreSubgroupQuads field of
> +slink:VkPhysicalDeviceDerivativeGroupQuadPropertiesEXT is ename:VK_TRUE, a
> +derivative group for a fragment shader is a single subgroup quad.
> +endif::VK_EXT_derivative_group_quad[]
>  ifdef::VK_NV_compute_shader_derivatives[]
>  A derivative group for a compute shader is a single local workgroup.
>  endif::VK_NV_compute_shader_derivatives[]
> @@ -920,6 +925,13 @@ The operations supported are add, mul, min, max, and, 
> or, xor.
>  
>  The quad subgroup operations allow clusters of 4 invocations (a quad), to
>  share data efficiently with each other.
> +ifdef::VK_EXT_derivative_group_quad[]
> +For fragment shaders, if the fname:derivativeGroupsAreSubgroupQuads field of
> +slink:VkPhysicalDeviceDerivativeGroupQuadPropertiesEXT is ename:VK_TRUE,
> +each quad corresponds to one of the groups of four shader
> +invocations used for
> +<>.
> +endif::VK_EXT_derivative_group_quad[]
>  ifdef::VK_NV_compute_shader_derivatives[]
>  For compute shaders using the code:DerivativeGroupQuadsNV or
>  code:DerivativeGroupLinearNV execution modes, each quad corresponds to one
> diff --git a/chapters/textures.txt 

Re: [Mesa-dev] [PATCH v2 2/5] gallium: Add new PIPE_CAP_MULTISAMPLED_RENDER_TO_TEXTURE

2018-11-06 Thread Roland Scheidegger
On 07.11.18 at 00:03, Kristian Høgsberg wrote:
> On Tue, Nov 6, 2018 at 2:44 PM Axel Davy wrote:
>>
>> Hi,
>>
>> Is there anything to be done in the nine state trackers (or other state
>> trackers).
>>
>> Nine uses create_surface. Should it expect the field to be filled
>> properly by the driver ?
> 
> Nothing is required from any state tracker, but if your API has an
> extension like EXT_multisampled_render_to_texture, and the driver has
> this new capability, you can set pipe_surface::nr_samples. The driver
> will then render with that many samples internally and transparently
> resolve the rendering to the (non-MSAA) resource.
> 
>> On 06/11/2018 23:09, Kristian H. Kristensen wrote:
>>> +   /**
>>> +* If a driver doesn't advertise 
>>> PIPE_CAP_MULTISAMPLED_RENDER_TO_TEXTURE,
>>> +* pipe_surface::nr_samples will always be 0.
>>> +*/
>> The above comment should be added to the comment below.
>>> +   /** Number of samples for the surface.  This can be different from the
>>> +* resource nr_samples when the resource is bound using
>>> +* FramebufferTexture2DMultisampleEXT.
>>> +*/
>>> +   unsigned nr_samples:8;
> 
> Hm, I probably need to reword that a bit, since it implies the surface
> sample count can be same as the resource, when it is only ever either
> surface samples = 0 or surface samples > 1 with resource samples = 1.
Wouldn't it be more logical to adjust the code to match the
comment instead? That is, the surface would inherit the sample count of the
resource by default, but it could be set to something different for this
extension.
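
Roughly what I have in mind, as a sketch only. The structs are trimmed-down,
hypothetical stand-ins for pipe_resource / pipe_surface, and the helper names
do not exist in gallium.

```c
/* Hypothetical stand-ins for the Gallium structures. */
struct sketch_resource {
   unsigned nr_samples;
};

struct sketch_surface {
   const struct sketch_resource *texture;
   unsigned nr_samples;
};

/* Default: the surface inherits the sample count of its resource. */
static void sketch_surface_init(struct sketch_surface *surf,
                                const struct sketch_resource *res)
{
   surf->texture = res;
   surf->nr_samples = res->nr_samples;
}

/* Override for multisampled render to texture: render with `samples`
 * samples and resolve to the non-MSAA resource. */
static void sketch_surface_set_samples(struct sketch_surface *surf,
                                       unsigned samples)
{
   surf->nr_samples = samples;
}

/* True when the driver would have to resolve implicitly for this surface. */
static int sketch_surface_needs_resolve(const struct sketch_surface *surf)
{
   return surf->nr_samples > 1 && surf->texture->nr_samples <= 1;
}
```

With this convention a surface sample count equal to the resource sample
count is the normal case, and only a differing count signals the extension.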


> Kristian
> 
>>> +
>>>  union pipe_surface_desc u;
>>>   };
>>>
>>
>>
>> Yours,
>>
>>
>> Axel Davy
>>



Re: [Mesa-dev] [PATCH v3 4/4] r600: Add support for EXT_texture_sRGB_R8

2018-11-06 Thread Roland Scheidegger

Reviewed-by: Roland Scheidegger 

On 06.11.18 at 12:04, Gert Wollny wrote:
> Hi Dave & Roland, 
> 
> Can I persuade you to take a look at this one-liner? 
> 
> Many thanks, 
> Gert 
> 
> On Thursday, 01.11.2018 at 12:59 +0100, Gert Wollny wrote:
>> From: Gert Wollny 
>>
>> Enables on R600 and makes pass:
>>   dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
>>   dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8*
>>
>> v2: remove chunk for dri/radeon (Emil)
>>
>> Signed-off-by: Gert Wollny 
>> ---
>>  src/gallium/drivers/r600/r600_state_common.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/gallium/drivers/r600/r600_state_common.c
>> b/src/gallium/drivers/r600/r600_state_common.c
>> index e6c1b0be97..2d36541787 100644
>> --- a/src/gallium/drivers/r600/r600_state_common.c
>> +++ b/src/gallium/drivers/r600/r600_state_common.c
>> @@ -2917,6 +2917,7 @@ uint32_t r600_translate_texformat(struct
>> pipe_screen *screen,
>>  switch (desc->nr_channels) {
>>  case 1:
>>  result = FMT_8;
>> +is_srgb_valid = TRUE;
>>  goto out_word4;
>>  case 2:
>>  result = FMT_8_8;



Re: [Mesa-dev] [PATCH v2 2/4] Gallium: Add format PIPE_FORMAT_R8_SRGB

2018-11-01 Thread Roland Scheidegger
On 01.11.18 at 15:48, Gert Wollny wrote:
> On Tuesday, 30.10.2018 at 16:04 +, Roland Scheidegger wrote:
>> With the format ordering in svga_format.c as Ilia mentioned fixed
>> Reviewed-by: Roland Scheidegger
> 
> CMIIW, but I guess your R-b was only for this patch, right?
Right, I didn't have time to look at the others.

Roland


> 
> Best, 
> Gert 
> 
>>
>> On 30.10.18 at 11:46, Gert Wollny wrote:
>>> This format is needed to support EXT_texture_sRGB_R8. The patch
>>> adds a new format enum, the format entries in Gallium and svga, the
>>> mapping between sRGB and linear formats, and tests.
>>>
>>>   v2: - add mapping to linear format for PIPE_FORMAT_R8_SRGB
>>>   v3: - Add texture format to svga format table since otherwise
>>>         building mesa will fail when this driver is enabled. It was
>>>         not tested whether the extension actually works.
>>>   v4: - svga: remove the SVGA specific format definitions and table
>>>         entries and only correct the location of PIPE_FORMAT_R8_SRGB
>>>         in the format_conversion_table (Ilia Mirkin)
>>>       - Split patch (1/2) to separate Gallium part and mesa/st
>>>         part. (Roland Scheidegger)
>>>       - Trim the commit message to only contain the relevant parts
>>>         from the split.
>>> Signed-off-by: Gert Wollny 
>>> ---
>>>  src/gallium/auxiliary/util/u_format.csv | 1 +
>>>  src/gallium/auxiliary/util/u_format.h   | 4 
>>>  src/gallium/auxiliary/util/u_format_tests.c | 4 
>>>  src/gallium/drivers/svga/svga_format.c  | 1 +
>>>  src/gallium/include/pipe/p_format.h | 2 ++
>>>  5 files changed, 12 insertions(+)
>>>
>>> diff --git a/src/gallium/auxiliary/util/u_format.csv
>>> b/src/gallium/auxiliary/util/u_format.csv
>>> index f9e4925f27..911ac07d32 100644
>>> --- a/src/gallium/auxiliary/util/u_format.csv
>>> +++ b/src/gallium/auxiliary/util/u_format.csv
>>> @@ -114,6 +114,7 @@ PIPE_FORMAT_I32_FLOAT , plain, 1,
>>> 1, f32 , , , , , r
>>>  
>>>  # SRGB formats
>>>  PIPE_FORMAT_L8_SRGB   , plain, 1, 1, un8
>>> , , , , xxx1, srgb 
>>> +PIPE_FORMAT_R8_SRGB   , plain, 1, 1, un8
>>> , , , , x001, srgb
>>>  PIPE_FORMAT_L8A8_SRGB , plain, 1, 1, un8 , un8
>>> , , , xxxy, srgb 
>>>  PIPE_FORMAT_R8G8B8_SRGB   , plain, 1, 1, un8 , un8 , un8
>>> , , xyz1, srgb 
>>>  PIPE_FORMAT_R8G8B8A8_SRGB , plain, 1, 1, un8 , un8 , un8 ,
>>> un8 , xyzw, srgb 
>>> diff --git a/src/gallium/auxiliary/util/u_format.h
>>> b/src/gallium/auxiliary/util/u_format.h
>>> index e66849c16b..5bcfc1f115 100644
>>> --- a/src/gallium/auxiliary/util/u_format.h
>>> +++ b/src/gallium/auxiliary/util/u_format.h
>>> @@ -925,6 +925,8 @@ util_format_srgb(enum pipe_format format)
>>> switch (format) {
>>> case PIPE_FORMAT_L8_UNORM:
>>>return PIPE_FORMAT_L8_SRGB;
>>> +   case PIPE_FORMAT_R8_UNORM:
>>> +  return PIPE_FORMAT_R8_SRGB;
>>> case PIPE_FORMAT_L8A8_UNORM:
>>>return PIPE_FORMAT_L8A8_SRGB;
>>> case PIPE_FORMAT_R8G8B8_UNORM:
>>> @@ -1001,6 +1003,8 @@ util_format_linear(enum pipe_format format)
>>> switch (format) {
>>> case PIPE_FORMAT_L8_SRGB:
>>>return PIPE_FORMAT_L8_UNORM;
>>> +   case PIPE_FORMAT_R8_SRGB:
>>> +  return PIPE_FORMAT_R8_UNORM;
>>> case PIPE_FORMAT_L8A8_SRGB:
>>>return PIPE_FORMAT_L8A8_UNORM;
>>> case PIPE_FORMAT_R8G8B8_SRGB:
>>> diff --git a/src/gallium/auxiliary/util/u_format_tests.c
>>> b/src/gallium/auxiliary/util/u_format_tests.c
>>> index 9c9a5838d1..dee52533c1 100644
>>> --- a/src/gallium/auxiliary/util/u_format_tests.c
>>> +++ b/src/gallium/auxiliary/util/u_format_tests.c
>>> @@ -236,6 +236,10 @@ util_format_test_cases[] =
>>> {PIPE_FORMAT_L8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xbc),
>>> UNPACKED_1x1(0.502886458033, 0.502886458033, 0.502886458033, 1.0)},
>>> {PIPE_FORMAT_L8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xff),
>>> UNPACKED_1x1(1.0, 1.0, 1.0, 1.0)},
>>>  
>>> +   {PIPE_FORMAT_R8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0x00),
>>> UNPACKED_1x1(0.0, 0.0, 0.

Re: [Mesa-dev] [PATCH 09/12] gallivm: remove workarounds for pre LLVM 5.0

2018-10-31 Thread Roland Scheidegger
On 31.10.18 at 14:30, Emil Velikov wrote:
> From: Emil Velikov 
> 
> With LLVM 5.0.1 the minimum required version, we can drop all the dead
> code.
> 
> Cc: Roland Scheidegger 
> Cc: Jose Fonseca 
> Signed-off-by: Emil Velikov 
> ---
> Gents this is a quick and dirty grep job. A couple of places may need
> the comments to be tweaked/dropped - I've annotated those with XXX.
Looks good enough to me, these comments can be tweaked later (it is also
possible there's some minor simplification possible in some places with
the old stuff dropped).
Quite the ifdef cleanup :-).

For 9/12, 10/12:
Reviewed-by: Roland Scheidegger 


> ---
>  src/gallium/auxiliary/gallivm/lp_bld.h|  25 +-
>  src/gallium/auxiliary/gallivm/lp_bld_arit.c   | 104 +-
>  .../auxiliary/gallivm/lp_bld_debug.cpp|   9 -
>  src/gallium/auxiliary/gallivm/lp_bld_gather.c |   4 +-
>  src/gallium/auxiliary/gallivm/lp_bld_init.c   |  68 +---
>  src/gallium/auxiliary/gallivm/lp_bld_intr.c   |  38 +-
>  src/gallium/auxiliary/gallivm/lp_bld_intr.h   |   6 +-
>  src/gallium/auxiliary/gallivm/lp_bld_logic.c  |   5 +-
>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 346 +-
>  src/gallium/auxiliary/gallivm/lp_bld_misc.h   |   1 -
>  10 files changed, 26 insertions(+), 580 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld.h 
> b/src/gallium/auxiliary/gallivm/lp_bld.h
> index 239c27e3c25..a008541c18d 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld.h
> +++ b/src/gallium/auxiliary/gallivm/lp_bld.h
> @@ -53,16 +53,8 @@
>  #ifndef HAVE_LLVM
>  #error "HAVE_LLVM should be set with LLVM's version number, e.g. (0x0207 for 
> 2.7)"
>  #endif
> -#if HAVE_LLVM < 0x303
> -#error "LLVM 3.3 or newer required"
> -#endif
> -
> -
> -#if HAVE_LLVM <= 0x0303
> -/* We won't actually use LLVMMCJITMemoryManagerRef, just create a dummy
> - * typedef to simplify things elsewhere.
> - */
> -typedef void *LLVMMCJITMemoryManagerRef;
> +#if HAVE_LLVM < 0x500
> +#error "LLVM 5.0 or newer required"
>  #endif
>  
>  
> @@ -96,17 +88,4 @@ typedef void *LLVMMCJITMemoryManagerRef;
>  #define LLVMCreateBuilder ILLEGAL_LLVM_FUNCTION
>  
>  
> -/*
> - * Before LLVM 3.4 LLVMSetAlignment only supported GlobalValue, not
> - * LoadInst/StoreInst as we need.
> - */
> -#if HAVE_LLVM < 0x0304
> -#  ifdef __cplusplus
> -  extern "C"
> -#  endif
> -   void LLVMSetAlignmentBackport(LLVMValueRef V, unsigned Bytes);
> -#  define LLVMSetAlignment LLVMSetAlignmentBackport
> -#endif
> -
> -
>  #endif /* LP_BLD_H */
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> index f348833206b..e91ff361924 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> @@ -142,49 +142,6 @@ lp_build_min_simple(struct lp_build_context *bld,
>   intrinsic = "llvm.ppc.altivec.vminfp";
>   intr_size = 128;
>}
> -   } else if (HAVE_LLVM < 0x0309 &&
> -  util_cpu_caps.has_avx2 && type.length > 4) {
> -  intr_size = 256;
> -  switch (type.width) {
> -  case 8:
> - intrinsic = type.sign ? "llvm.x86.avx2.pmins.b" : 
> "llvm.x86.avx2.pminu.b";
> - break;
> -  case 16:
> - intrinsic = type.sign ? "llvm.x86.avx2.pmins.w" : 
> "llvm.x86.avx2.pminu.w";
> - break;
> -  case 32:
> - intrinsic = type.sign ? "llvm.x86.avx2.pmins.d" : 
> "llvm.x86.avx2.pminu.d";
> - break;
> -  }
> -   } else if (HAVE_LLVM < 0x0309 &&
> -  util_cpu_caps.has_sse2 && type.length >= 2) {
> -  intr_size = 128;
> -  if ((type.width == 8 || type.width == 16) &&
> -  (type.width * type.length <= 64) &&
> -  (gallivm_debug & GALLIVM_DEBUG_PERF)) {
> - debug_printf("%s: inefficient code, bogus shuffle due to packing\n",
> -  __FUNCTION__);
> -  }
> -  if (type.width == 8 && !type.sign) {
> - intrinsic = "llvm.x86.sse2.pminu.b";
> -  }
> -  else if (type.width == 16 && type.sign) {
> - intrinsic = "llvm.x86.sse2.pmins.w";
> -  }
> -  if (util_cpu_caps.has_sse4_1) {
> - if (type.width == 8 && type.sign) {
> -intrinsic = "llvm.x86.sse41.pminsb";
> - }
> - if (type.width == 16 && !type.sign) {
> -intrinsic = "llvm.x86.sse41.pminuw";
&

Re: [Mesa-dev] [PATCH] gallium/util: don't let children of fork & exec inherit our thread affinity

2018-10-30 Thread Roland Scheidegger
On 30.10.18 at 23:55, Marek Olšák wrote:
> On Tue, Oct 30, 2018 at 6:32 PM Gustaw Smolarczyk wrote:
> 
> Tue, 30 Oct 2018, 23:01 Marek Olšák:
> 
> On Mon, Oct 29, 2018 at 12:43 PM Michel Dänzer
> <mic...@daenzer.net> wrote:
> 
> On 2018-10-28 11:27 a.m., Gustaw Smolarczyk wrote:
> > Mon, 17 Sep 2018 at 18:24 Michel Dänzer
> > <mic...@daenzer.net> wrote:
> >>
> >> On 2018-09-15 3:04 a.m., Marek Olšák wrote:
> >>> On Fri, Sep 14, 2018 at 4:53 AM, Michel Dänzer
> >>> <mic...@daenzer.net> wrote:
> 
>  Last but not least, this doesn't solve the issue of apps such as
>  blender, which spawn their own worker threads after initializing OpenGL
>  (possibly not themselves directly, but via the toolkit or another
>  library; e.g. GTK+4 uses OpenGL by default), inheriting the thread
>  affinity.
> 
> 
>  Due to these issues, setting the thread affinity needs to be disabled by
>  default, and only white-listed for applications where it's known safe
>  and beneficial. This sucks, but I'm afraid that's the reality until
>  there's better API available which allows solving these issues.
> >>>
> >>> We don't have the bandwidth to maintain whitelists. This will either
> >>> have to be always on or always off.
> >>>
> >>> On the positive side, only Ryzens with multiple CCXs get all the
> >>> benefits and disadvantages.
> >>
> >> In other words, only people who spent relatively large amounts of money
> >> for relatively high-end CPUs will be affected (I'm sure they'll be glad
> >> to know that "common people" aren't affected. ;). Affected applications
> >> will see their performance decreased by a factor of 2-8 (the number of
> >> CCXs in the CPU).
> >>
> >> OTOH, only a relatively small number of games will get a significant
> >> benefit from the thread affinity, and the benefit will be smaller than a
> >> factor of 2. This cannot justify risking a performance drop of up to a
> >> factor of 8, no matter how small the risk.
> >>
> >> Therefore, the appropriate mechanism is a whitelist.
> >
> > Hi,
> >
> > What was the conclusion of this discussion? I don't see any
> > whitelist/blacklist for this feature.
> >
> > I have just tested blender and it still renders on only a single CCX
> > on mesa from git master. Also, there is a bug report that suggests
> > this regressed performance in at least one game [1].
> 
> I hooked up that bug report to the 18.3 blocker bug.
> 
> 
> > If you think enabling it by default is the way to go, we should also
> > implement a blacklist so that it can be turned off in such cases.
> 
> I stand by my opinion that a white-list is appropriate, not a
> black-list. It's pretty much the same as mesa_glthread.
> 
> 
> So you are saying that gallium multithreading should be slower
> than singlethreading by default.
> 
> Marek
> 
> 
> Hi Marek,
> 
> The Ryzen optimization helps a lot of applications (mostly games)
> and improves their performance, mostly because of the reduced cost
> of communication between application's GL API thread and driver's
> pipe/winsys threads.
> 
> However, not all of the applications respond in the same way. The
> thread affinity management is hacky, by which I mean that this
> mechanism was not meant to mess with application threads from within
> library's threads. As an example, blender's threads, which use
> OpenGL "by accident", are forced to use the same CCX as the main
> gallium/winsys thread, even if they are many and want to work on as
> many CCXs as are possible. The thread that starts using GL spawns
> many more threads that don't use GL at all, and the current atfork
> mechanism doesn't help.
> 
> The current mechanism of tweaking thread affinities doesn't work
> universally with all Linux applications. We need a mechanism of
> tweaking this behavior, either through a whitelist or through a
> blacklist. As any application using OpenGL can be 

Re: [Mesa-dev] [PATCH v2 2/4] Gallium: Add format PIPE_FORMAT_R8_SRGB

2018-10-30 Thread Roland Scheidegger
With the format ordering in svga_format.c as Ilia mentioned fixed
Reviewed-by: Roland Scheidegger 

On 30.10.18 at 11:46, Gert Wollny wrote:
> This format is needed to support EXT_texture_sRGB_R8. The patch adds a new
> format enum, the format entries in Gallium and svga, the mapping between
> sRGB and linear formats, and tests.
> 
>   v2: - add mapping to linear format for PIPE_FORMAT_R8_SRGB
>   v3: - Add texture format to svga format table since otherwise building
>         mesa will fail when this driver is enabled. It was not tested
>         whether the extension actually works.
>   v4: - svga: remove the SVGA specific format definitions and table entries
>         and only correct the location of PIPE_FORMAT_R8_SRGB in the
>         format_conversion_table (Ilia Mirkin)
>       - Split patch (1/2) to separate Gallium part and mesa/st part.
>         (Roland Scheidegger)
>       - Trim the commit message to only contain the relevant parts from the
>         split.
> Signed-off-by: Gert Wollny 
> ---
>  src/gallium/auxiliary/util/u_format.csv | 1 +
>  src/gallium/auxiliary/util/u_format.h   | 4 
>  src/gallium/auxiliary/util/u_format_tests.c | 4 
>  src/gallium/drivers/svga/svga_format.c  | 1 +
>  src/gallium/include/pipe/p_format.h | 2 ++
>  5 files changed, 12 insertions(+)
> 
> diff --git a/src/gallium/auxiliary/util/u_format.csv 
> b/src/gallium/auxiliary/util/u_format.csv
> index f9e4925f27..911ac07d32 100644
> --- a/src/gallium/auxiliary/util/u_format.csv
> +++ b/src/gallium/auxiliary/util/u_format.csv
> @@ -114,6 +114,7 @@ PIPE_FORMAT_I32_FLOAT , plain, 1, 1, f32 ,
>  , , , , r
>  
>  # SRGB formats
>  PIPE_FORMAT_L8_SRGB   , plain, 1, 1, un8 , , , , 
> xxx1, srgb 
> +PIPE_FORMAT_R8_SRGB   , plain, 1, 1, un8 , , , , 
> x001, srgb
>  PIPE_FORMAT_L8A8_SRGB , plain, 1, 1, un8 , un8 , , , 
> xxxy, srgb 
>  PIPE_FORMAT_R8G8B8_SRGB   , plain, 1, 1, un8 , un8 , un8 , , 
> xyz1, srgb 
>  PIPE_FORMAT_R8G8B8A8_SRGB , plain, 1, 1, un8 , un8 , un8 , un8 , 
> xyzw, srgb 
> diff --git a/src/gallium/auxiliary/util/u_format.h 
> b/src/gallium/auxiliary/util/u_format.h
> index e66849c16b..5bcfc1f115 100644
> --- a/src/gallium/auxiliary/util/u_format.h
> +++ b/src/gallium/auxiliary/util/u_format.h
> @@ -925,6 +925,8 @@ util_format_srgb(enum pipe_format format)
> switch (format) {
> case PIPE_FORMAT_L8_UNORM:
>return PIPE_FORMAT_L8_SRGB;
> +   case PIPE_FORMAT_R8_UNORM:
> +  return PIPE_FORMAT_R8_SRGB;
> case PIPE_FORMAT_L8A8_UNORM:
>return PIPE_FORMAT_L8A8_SRGB;
> case PIPE_FORMAT_R8G8B8_UNORM:
> @@ -1001,6 +1003,8 @@ util_format_linear(enum pipe_format format)
> switch (format) {
> case PIPE_FORMAT_L8_SRGB:
>return PIPE_FORMAT_L8_UNORM;
> +   case PIPE_FORMAT_R8_SRGB:
> +  return PIPE_FORMAT_R8_UNORM;
> case PIPE_FORMAT_L8A8_SRGB:
>return PIPE_FORMAT_L8A8_UNORM;
> case PIPE_FORMAT_R8G8B8_SRGB:
> diff --git a/src/gallium/auxiliary/util/u_format_tests.c 
> b/src/gallium/auxiliary/util/u_format_tests.c
> index 9c9a5838d1..dee52533c1 100644
> --- a/src/gallium/auxiliary/util/u_format_tests.c
> +++ b/src/gallium/auxiliary/util/u_format_tests.c
> @@ -236,6 +236,10 @@ util_format_test_cases[] =
> {PIPE_FORMAT_L8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xbc), 
> UNPACKED_1x1(0.502886458033, 0.502886458033, 0.502886458033, 1.0)},
> {PIPE_FORMAT_L8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xff), 
> UNPACKED_1x1(1.0, 1.0, 1.0, 1.0)},
>  
> +   {PIPE_FORMAT_R8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0x00), 
> UNPACKED_1x1(0.0, 0.0, 0.0, 1.0)},
> +   {PIPE_FORMAT_R8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xbc), 
> UNPACKED_1x1(0.502886458033, 0.0, 0.0, 1.0)},
> +   {PIPE_FORMAT_R8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xff), 
> UNPACKED_1x1(1.0, 0.0, 0.0, 1.0)},
> +
> {PIPE_FORMAT_L8A8_SRGB, PACKED_1x16(0xffff), PACKED_1x16(0x0000), 
> UNPACKED_1x1(0.0, 0.0, 0.0, 0.0)},
> {PIPE_FORMAT_L8A8_SRGB, PACKED_1x16(0xffff), PACKED_1x16(0x00bc), 
> UNPACKED_1x1(0.502886458033, 0.502886458033, 0.502886458033, 0.0)},
> {PIPE_FORMAT_L8A8_SRGB, PACKED_1x16(0xffff), PACKED_1x16(0x00ff), 
> UNPACKED_1x1(1.0, 1.0, 1.0, 0.0)},
> diff --git a/src/gallium/drivers/svga/svga_format.c 
> b/src/gallium/drivers/svga/svga_format.c
> index 9f6a618706..bf1bbca3e2 100644
> --- a/src/gallium/drivers/svga/svga_format.c
> +++ b/src/gallium/drivers/svga/svga_format.c
> @@ -154,6 +154,7 @@ static const struct vgpu10_format_entry 
> format_conversion_table[] =
> { PIPE_FORMAT_R16G16B16_FLOAT,   SVGA3D_R16G16B16

Re: [Mesa-dev] [PATCH 2/3] mesa/st: Add Gallium support for EXT_texture_sRGB_R8

2018-10-29 Thread Roland Scheidegger
This patch should probably be split into adding the format to gallium
and using it in mesa/st.
More comments inline.

On 29.10.18 at 08:35, Gert Wollny wrote:
> This only adds support on the Gallium core level, for the drivers
> it is likely that additional changes are needed to support the
> new texture format.
> 
> Enables on softpipe and makes pass: 
>   dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
> 
> v2: - add include for getting GL_SR8_EXT
> - add mapping to linear format for PIPE_FORMAT_R8_SRGB
> v3: - Add texture format to svga format table since otherwise building
>   mesa will fail when this driver is enabled. It was not tested
>   whether the extension actually works.
> 
> Signed-off-by: Gert Wollny 
> ---
>  src/gallium/auxiliary/util/u_format.csv   |  1 +
>  src/gallium/auxiliary/util/u_format.h |  4 
>  src/gallium/auxiliary/util/u_format_tests.c   |  4 
>  src/gallium/drivers/svga/include/svga3d_devcaps.h |  1 +
>  src/gallium/drivers/svga/include/svga3d_surfacedefs.h |  5 +
>  src/gallium/drivers/svga/include/svga3d_types.h   |  2 +-
>  src/gallium/drivers/svga/svga_format.c|  7 +++
>  src/gallium/include/pipe/p_format.h   |  2 ++
>  src/mesa/state_tracker/st_extensions.c|  4 
>  src/mesa/state_tracker/st_format.c| 10 ++
>  10 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_format.csv 
> b/src/gallium/auxiliary/util/u_format.csv
> index f9e4925f27..911ac07d32 100644
> --- a/src/gallium/auxiliary/util/u_format.csv
> +++ b/src/gallium/auxiliary/util/u_format.csv
> @@ -114,6 +114,7 @@ PIPE_FORMAT_I32_FLOAT , plain, 1, 1, f32 ,
>  , , , , r
>  
>  # SRGB formats
>  PIPE_FORMAT_L8_SRGB   , plain, 1, 1, un8 , , , , 
> xxx1, srgb 
> +PIPE_FORMAT_R8_SRGB   , plain, 1, 1, un8 , , , , 
> x001, srgb
>  PIPE_FORMAT_L8A8_SRGB , plain, 1, 1, un8 , un8 , , , 
> xxxy, srgb 
>  PIPE_FORMAT_R8G8B8_SRGB   , plain, 1, 1, un8 , un8 , un8 , , 
> xyz1, srgb 
>  PIPE_FORMAT_R8G8B8A8_SRGB , plain, 1, 1, un8 , un8 , un8 , un8 , 
> xyzw, srgb 
> diff --git a/src/gallium/auxiliary/util/u_format.h 
> b/src/gallium/auxiliary/util/u_format.h
> index e66849c16b..5bcfc1f115 100644
> --- a/src/gallium/auxiliary/util/u_format.h
> +++ b/src/gallium/auxiliary/util/u_format.h
> @@ -925,6 +925,8 @@ util_format_srgb(enum pipe_format format)
> switch (format) {
> case PIPE_FORMAT_L8_UNORM:
>return PIPE_FORMAT_L8_SRGB;
> +   case PIPE_FORMAT_R8_UNORM:
> +  return PIPE_FORMAT_R8_SRGB;
> case PIPE_FORMAT_L8A8_UNORM:
>return PIPE_FORMAT_L8A8_SRGB;
> case PIPE_FORMAT_R8G8B8_UNORM:
> @@ -1001,6 +1003,8 @@ util_format_linear(enum pipe_format format)
> switch (format) {
> case PIPE_FORMAT_L8_SRGB:
>return PIPE_FORMAT_L8_UNORM;
> +   case PIPE_FORMAT_R8_SRGB:
> +  return PIPE_FORMAT_R8_UNORM;
> case PIPE_FORMAT_L8A8_SRGB:
>return PIPE_FORMAT_L8A8_UNORM;
> case PIPE_FORMAT_R8G8B8_SRGB:
> diff --git a/src/gallium/auxiliary/util/u_format_tests.c 
> b/src/gallium/auxiliary/util/u_format_tests.c
> index 9c9a5838d1..dee52533c1 100644
> --- a/src/gallium/auxiliary/util/u_format_tests.c
> +++ b/src/gallium/auxiliary/util/u_format_tests.c
> @@ -236,6 +236,10 @@ util_format_test_cases[] =
> {PIPE_FORMAT_L8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xbc), 
> UNPACKED_1x1(0.502886458033, 0.502886458033, 0.502886458033, 1.0)},
> {PIPE_FORMAT_L8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xff), 
> UNPACKED_1x1(1.0, 1.0, 1.0, 1.0)},
>  
> +   {PIPE_FORMAT_R8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0x00), 
> UNPACKED_1x1(0.0, 0.0, 0.0, 1.0)},
> +   {PIPE_FORMAT_R8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xbc), 
> UNPACKED_1x1(0.502886458033, 0.0, 0.0, 1.0)},
> +   {PIPE_FORMAT_R8_SRGB, PACKED_1x8(0xff), PACKED_1x8(0xff), 
> UNPACKED_1x1(1.0, 0.0, 0.0, 1.0)},
> +
> {PIPE_FORMAT_L8A8_SRGB, PACKED_1x16(0xffff), PACKED_1x16(0x0000), 
> UNPACKED_1x1(0.0, 0.0, 0.0, 0.0)},
> {PIPE_FORMAT_L8A8_SRGB, PACKED_1x16(0xffff), PACKED_1x16(0x00bc), 
> UNPACKED_1x1(0.502886458033, 0.502886458033, 0.502886458033, 0.0)},
> {PIPE_FORMAT_L8A8_SRGB, PACKED_1x16(0xffff), PACKED_1x16(0x00ff), 
> UNPACKED_1x1(1.0, 1.0, 1.0, 0.0)},
> diff --git a/src/gallium/drivers/svga/include/svga3d_devcaps.h 
> b/src/gallium/drivers/svga/include/svga3d_devcaps.h
> index a519198b64..996fe23952 100644
> --- a/src/gallium/drivers/svga/include/svga3d_devcaps.h
> +++ b/src/gallium/drivers/svga/include/svga3d_devcaps.h
> @@ -437,6 +437,7 @@ typedef enum {
>  
> SVGA3D_DEVCAP_MULTISAMPLE_2X= 245,
> SVGA3D_DEVCAP_MULTISAMPLE_4X= 246,
> +   SVGA3D_DEVCAP_DXFMT_R8_UNORM_SRGB   = 247,
This won't work. Those devcaps are defined by our svga device (this 

Re: [Mesa-dev] [PATCH] scons: Remove gles option.

2018-10-19 Thread Roland Scheidegger
Looks alright to me, if it's broken anyway.

Reviewed-by: Roland Scheidegger 

On 19.10.18 at 14:33, Jose Fonseca wrote:
> It's broken, and WGL state tracker is always built with GLES support
> nowadays.
> ---
>  common.py| 2 --
>  src/SConscript   | 7 ---
>  src/gallium/state_trackers/osmesa/SConscript | 4 +---
>  src/gallium/state_trackers/wgl/SConscript| 4 +---
>  src/gallium/targets/libgl-gdi/SConscript | 6 --
>  src/gallium/targets/libgl-xlib/SConscript| 6 --
>  src/mapi/glapi/SConscript| 6 +-
>  src/mapi/shared-glapi/SConscript | 9 +
>  src/mesa/SConscript  | 4 +---
>  src/mesa/drivers/osmesa/SConscript   | 4 +---
>  10 files changed, 6 insertions(+), 46 deletions(-)
> 
> diff --git a/common.py b/common.py
> index 113fc7f5c12..f4f2bb44c1c 100644
> --- a/common.py
> +++ b/common.py
> @@ -99,8 +99,6 @@ def AddOptions(opts):
>  'enable static code analysis where available', 'no'))
>  opts.Add(BoolOption('asan', 'enable Address Sanitizer', 'no'))
>  opts.Add('toolchain', 'compiler toolchain', default_toolchain)
> -opts.Add(BoolOption('gles', 'EXPERIMENTAL: enable OpenGL ES support',
> -'no'))
>  opts.Add(BoolOption('llvm', 'use LLVM', default_llvm))
>  opts.Add(BoolOption('openmp', 'EXPERIMENTAL: compile with openmp 
> (swrast)',
>  'no'))
> diff --git a/src/SConscript b/src/SConscript
> index 95ea061c4bb..54350a9cdcc 100644
> --- a/src/SConscript
> +++ b/src/SConscript
> @@ -42,10 +42,6 @@ env.Append(CPPPATH = ["#" + env['build_dir']])
>  if env['platform'] != 'windows':
>  SConscript('loader/SConscript')
>  
> -# When env['gles'] is set, the targets defined in mapi/glapi/SConscript are 
> not
> -# used.  libgl-xlib and libgl-gdi adapt themselves to use the targets defined
> -# in mapi/glapi-shared/SConscript.  mesa/SConscript also adapts itself to
> -# enable OpenGL ES support.
>  SConscript('mapi/glapi/gen/SConscript')
>  SConscript('mapi/glapi/SConscript')
>  
> @@ -61,8 +57,5 @@ if not env['embedded']:
>  if env['platform'] == 'haiku':
>  SConscript('egl/SConscript')
>  
> -if env['gles']:
> -SConscript('mapi/shared-glapi/SConscript')
> -
>  SConscript('gallium/SConscript')
>  
> diff --git a/src/gallium/state_trackers/osmesa/SConscript 
> b/src/gallium/state_trackers/osmesa/SConscript
> index f5519f13762..be67d0fe739 100644
> --- a/src/gallium/state_trackers/osmesa/SConscript
> +++ b/src/gallium/state_trackers/osmesa/SConscript
> @@ -14,10 +14,8 @@ if env['platform'] == 'windows':
>  env.AppendUnique(CPPDEFINES = [
>  'BUILD_GL32', # declare gl* as __declspec(dllexport) in Mesa headers
>  'WIN32_LEAN_AND_MEAN', # 
> http://msdn2.microsoft.com/en-us/library/6dwk3a1z.aspx
> +'_GLAPI_NO_EXPORTS', # prevent _glapi_* from being declared 
> __declspec(dllimport)
>  ])
> -if not env['gles']:
> -# prevent _glapi_* from being declared __declspec(dllimport)
> -env.Append(CPPDEFINES = ['_GLAPI_NO_EXPORTS'])
>  
>  st_osmesa = env.ConvenienceLibrary(
>  target ='st_osmesa',
> diff --git a/src/gallium/state_trackers/wgl/SConscript 
> b/src/gallium/state_trackers/wgl/SConscript
> index a7fbb07a89a..bbf5ebd9764 100644
> --- a/src/gallium/state_trackers/wgl/SConscript
> +++ b/src/gallium/state_trackers/wgl/SConscript
> @@ -14,10 +14,8 @@ env.AppendUnique(CPPDEFINES = [
>  '_GDI32_', # prevent wgl* being declared __declspec(dllimport)
>  'BUILD_GL32', # declare gl* as __declspec(dllexport) in Mesa headers
>  'WIN32_LEAN_AND_MEAN', # 
> http://msdn2.microsoft.com/en-us/library/6dwk3a1z.aspx
> +'_GLAPI_NO_EXPORTS', # prevent _glapi_* from being declared 
> __declspec(dllimport)
>  ])
> -if not env['gles']:
> -# prevent _glapi_* from being declared __declspec(dllimport)
> -env.Append(CPPDEFINES = ['_GLAPI_NO_EXPORTS'])
>  
>  wgl = env.ConvenienceLibrary(
>  target ='wgl',
> diff --git a/src/gallium/targets/libgl-gdi/SConscript 
> b/src/gallium/targets/libgl-gdi/SConscript
> index 132cb73358d..94feca24ef3 100644
> --- a/src/gallium/targets/libgl-gdi/SConscript
> +++ b/src/gallium/targets/libgl-gdi/SConscript
> @@ -48,12 +48,6 @@ else:
>  
>  env['no_import_lib'] = 1
>  
> -# when GLES is enabled, gl* and _glapi_* belong to bridge_glapi and
> -# shared_glapi respectively
> -if env['gles']:
> -env.Prepend(LIBPATH = [shared_glapi.dir])
> -glapi = [bridge_glapi, 'libglapi']
> -
>  opengl32 = env.SharedLibrary(
>

Re: [Mesa-dev] [PATCH mesa] radv: s/abs/fabsf/ for floats

2018-10-18 Thread Roland Scheidegger
On 18.10.18 at 19:25, Matt Turner wrote:
> On Thu, Oct 18, 2018 at 8:46 AM Eric Engestrom  
> wrote:
>>
>> Fixes: a4c4efad89eceb26cf82 "radv: Rework guard band calculation"
>> Cc: Bas Nieuwenhuizen 
>> Signed-off-by: Eric Engestrom 
>> ---
>>  src/amd/vulkan/si_cmd_buffer.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
>> index de057657ee70d354e910..52daf99414790d4764b6 100644
>> --- a/src/amd/vulkan/si_cmd_buffer.c
>> +++ b/src/amd/vulkan/si_cmd_buffer.c
>> @@ -516,16 +516,16 @@ si_write_scissors(struct radeon_cmdbuf *cs, int first,
>> VkRect2D scissor = si_intersect_scissor([i], 
>> _scissor);
>>
>> get_viewport_xform(viewports + i, scale, translate);
>> -   scale[0] = abs(scale[0]);
>> -   scale[1] = abs(scale[1]);
>> +   scale[0] = fabsf(scale[0]);
>> +   scale[1] = fabsf(scale[1]);
>>
>> if (scale[0] < 0.5)
> 
> You might want to suffix these immediates with f at the same time. As
> is, this will convert scale[0] to a double before the comparison
> against 0.5. I'm assuming that scale[0] is a float, which I've
> inferred from the patch but haven't confirmed.

Not that I think doing this is a bad idea, but aren't compilers clever
enough to figure this out and omit the conversion? At least
if you don't compile with -O0...
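
For reference, a minimal sketch of the two points in this thread. This is not
the radv code; the function names are made up for illustration, and clamping
to 0.5 is only assumed from the visible "if (scale[0] < 0.5)" line. It shows
why abs() is wrong for floats (the argument is converted to int first) and
that the float-literal and double-literal comparisons agree, since 0.5 is
exactly representable in both types.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* What abs() effectively does with a float argument: the value is
 * converted to int (truncating toward zero) before the absolute value
 * is taken. The explicit cast models that implicit conversion. */
static float
scale_with_int_abs(float s)
{
   return (float)abs((int)s);
}

/* Hypothetical clamp using fabsf() and a float literal, so the whole
 * comparison is done in single precision. */
static float
clamp_scale(float s)
{
   s = fabsf(s);
   if (s < 0.5f)
      s = 0.5f;
   return s;
}

/* Same logic with a double literal: s is promoted to double for the
 * comparison. The result is identical because the promotion is exact
 * and 0.5 has the same value as a float and as a double. */
static float
clamp_scale_double_literal(float s)
{
   s = fabsf(s);
   if (s < 0.5)
      s = 0.5f;
   return s;
}
```

Whether the promotion survives in the generated code depends on the compiler
and optimization level; the f suffix simply removes the question.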

Roland



Re: [Mesa-dev] Q: to which software renderers should we contribute to help virgl conformance testing

2018-10-17 Thread Roland Scheidegger
On 17.10.18 at 19:15, Gert Wollny wrote:
> Dear all, 
> 
> we are looking into doing a CI for virglrenderer that also runs a
> subset of the GLES dEQP, and in order to be able to run this also in
> gitlab.fd.o we were looking into the available gallium software
> renderers. Initial tests by just running the dEQP-GLES2 were quite
> successful in the sense that the execution time is not too long (a full
> run on the GL and GLES host with llvmpipe takes about 10 min [1]). 
> 
> Now to extend on that work the focus is turning to which software
> renderer has the most features, the least failing tests, and is
> actively developed. 
> 
> Simply looking at the commit stats it seems that the development of
> softpipe and llvmpipe is mostly stalled, swr, on the other hand, has seen
> quite some development, but mostly regarding performance, and given the
> FAQ [2] the focus is on a very specific application space and not so
> much on getting more features in.
I wouldn't quite say llvmpipe is stalled, although it's true that there
weren't all that many changes (in particular as far as new features are
concerned).

> 
> When checking for conformance of virglrenderer we need a host driver
> that is conformant itself, and we are willing to contribute here, but
> it seems to make most sense to focus this work on just one driver. To
> make sensible choice there are some open questions:
> 
> Are there plans to get swr and/or llvmpipe to support gles 3.1, or
> carry any of the drivers even further, maybe GLES 3.2 and desktop 4.x?
At a quick glance, for gles 3.1 llvmpipe would be missing mostly
compute shaders and shader images / ssbo, so definitely some work. GL 4
would add tessellation as well (at least I think these are the big parts
missing).
Unfortunately I don't have time to work on this, but it would be nice to
have indeed. Well, volunteers welcome, no special hw nor docs needed :-).
(Although softpipe is easier to work with, but it's just not all that
interesting.)

> 
> 
> Is there any specific interest to fix all failures that occur when
> running gles dEQP? In this bug report [3] Roland pointed out that
> "there is no goal as such to pass dEQP, although patches are welcome",
> any opinion for the other drivers? (for swr beyond what is written in
> the FAQ).
I think it wouldn't really be all that much work to get dEQP passing -
since llvmpipe is built to honor dx10 rules, which are typically more
strict than GL. But some things specific to GL fail. So IMHO if you want
a non-hw driver to pass dEQP, llvmpipe is probably still your best bet
(but of course, softpipe is generally easier to fix).
Can't really comment on swr there.

Roland


> 
> As pointed out in the FAQ, swr is very Intel specific, are there plans
> not laid out in the FAQ to support other, non-x86 hardware?
> 
> many thanks 
> Gert



Re: [Mesa-dev] [PATCH] softpipe: dynamically allocate space for immediate constants

2018-10-16 Thread Roland Scheidegger
Looks reasonable to me.
Reviewed-by: Roland Scheidegger 


On 16.10.18 at 10:07, Gert Wollny wrote:
> From: Gert Wollny 
> 
> The number of immediate constants was fixed and the size check was
> only done by means of an assertion. Given this a shader that emits
> more immediate constants would result in a memory corruption when
> mesa is built in release mode.
> 
> Instead of using this fixed limit allocate the space dynamically, let it 
> grow as needed, and also remove the unused ImmArray.
> 
> Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1
> 
> Signed-off-by: Gert Wollny 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_exec.c | 13 -
>  src/gallium/auxiliary/tgsi/tgsi_exec.h |  7 +++
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
> b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> index 59194ebe31..5db515a075 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> @@ -1223,7 +1223,17 @@ tgsi_exec_machine_bind_shader(
>   {
>  uint size = parse.FullToken.FullImmediate.Immediate.NrTokens - 1;
>  assert( size <= 4 );
> -assert( mach->ImmLimit + 1 <= TGSI_EXEC_NUM_IMMEDIATES );
> +if (mach->ImmLimit >= mach->ImmsReserved) {
> +   unsigned newReserved = mach->ImmsReserved ? 2 * 
> mach->ImmsReserved : 128;
> +   float4 *imms = REALLOC(mach->Imms, mach->ImmsReserved, 
> newReserved * sizeof(float4));
> +   if (imms) {
> +  mach->ImmsReserved = newReserved;
> +  mach->Imms = imms;
> +   } else {
> +  debug_printf("Unable to (re)allocate space for immidiate 
> constants\n");
> +  break;
> +   }
> +}
>  
>  for( i = 0; i < size; i++ ) {
> mach->Imms[mach->ImmLimit][i] = 
> @@ -1337,6 +1347,7 @@ tgsi_exec_machine_destroy(struct tgsi_exec_machine 
> *mach)
> if (mach) {
>FREE(mach->Instructions);
>FREE(mach->Declarations);
> +  FREE(mach->Imms);
>  
>align_free(mach->Inputs);
>align_free(mach->Outputs);
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h 
> b/src/gallium/auxiliary/tgsi/tgsi_exec.h
> index ed8b9e8869..6d4ac38142 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
> @@ -231,7 +231,6 @@ struct tgsi_sampler
>  };
>  
>  #define TGSI_EXEC_NUM_TEMPS   4096
> -#define TGSI_EXEC_NUM_IMMEDIATES  256
>  
>  /*
>   * Locations of various utility registers (_I = Index, _C = Channel)
> @@ -341,6 +340,7 @@ enum tgsi_break_type {
>  
>  #define TGSI_EXEC_MAX_BREAK_STACK (TGSI_EXEC_MAX_LOOP_NESTING + 
> TGSI_EXEC_MAX_SWITCH_NESTING)
>  
> +typedef float float4[4];
>  
>  /**
>   * Run-time virtual machine state for executing TGSI shader.
> @@ -352,9 +352,8 @@ struct tgsi_exec_machine
> struct tgsi_exec_vector   Temps[TGSI_EXEC_NUM_TEMPS +
> TGSI_EXEC_NUM_TEMP_EXTRAS];
>  
> -   float Imms[TGSI_EXEC_NUM_IMMEDIATES][4];
> -
> -   float ImmArray[TGSI_EXEC_NUM_IMMEDIATES][4];
> +   unsigned   ImmsReserved;
> +   float4 *Imms;
>  
> struct tgsi_exec_vector   *Inputs;
> struct tgsi_exec_vector   *Outputs;
> 
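
The growth strategy in the patch (start at 128 entries, double when full, keep
the old array if reallocation fails) can be sketched on its own as follows.
This is only an illustration under my own assumptions: it uses plain realloc()
rather than Mesa's REALLOC helper, and struct imm_store and its functions are
hypothetical names, not part of tgsi_exec.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef float float4[4];

/* Hypothetical stand-in for the immediates part of the exec machine. */
struct imm_store {
   float4   *imms;      /* heap array of 4-component immediates */
   unsigned  count;     /* entries in use (ImmLimit in the patch) */
   unsigned  reserved;  /* entries allocated (ImmsReserved in the patch) */
};

/* Append one immediate with n <= 4 components, zero-filling the rest.
 * Capacity doubles when the array is full. Returns 0 on success and -1
 * if the allocation fails, in which case the store is left unchanged. */
static int
imm_store_push(struct imm_store *s, const float *values, unsigned n)
{
   assert(n <= 4);

   if (s->count >= s->reserved) {
      unsigned new_reserved = s->reserved ? 2 * s->reserved : 128;
      float4 *p = realloc(s->imms, new_reserved * sizeof(float4));
      if (!p)
         return -1;
      s->imms = p;
      s->reserved = new_reserved;
   }

   memset(s->imms[s->count], 0, sizeof(float4));
   memcpy(s->imms[s->count], values, n * sizeof(float));
   s->count++;
   return 0;
}

/* Release the array and reset the bookkeeping. */
static void
imm_store_free(struct imm_store *s)
{
   free(s->imms);
   s->imms = NULL;
   s->count = 0;
   s->reserved = 0;
}
```

Pushing 300 immediates, which is more than the old fixed limit of 256, grows
the array through 128 and 256 to 512 entries instead of writing past the end.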



Re: [Mesa-dev] [RFC 4/7] mesa: Helper functions for counting set bits in a mask

2018-10-15 Thread Roland Scheidegger
On 15.10.18 at 15:19, Toni Lönnberg wrote:
> ---
>  src/util/bitscan.h | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/src/util/bitscan.h b/src/util/bitscan.h
> index dc89ac9..cdfecaf 100644
> --- a/src/util/bitscan.h
> +++ b/src/util/bitscan.h
> @@ -112,6 +112,31 @@ u_bit_scan64(uint64_t *mask)
> return i;
>  }
>  
> +/* Count bits set in mask */
> +static inline int
> +u_count_bits(unsigned *mask)
I don't think you'd want to pass a pointer.

Besides, I don't think we need another set of functions for this.
src/util/u_math.h already has util_bitcount64 and util_bitcount which do
the same thing.
(Although I don't know which one is better, util_bitcount looks like it
would be potentially faster with just very few bits set, but with
"random" uint/uint64 it certainly would seem the new one is better. But
in any case, can't beat the cpu popcount instruction...)

Roland


> +{
> +   unsigned v = *mask;
> +   int c;
> +   v = v - ((v >> 1) & 0x55555555);
> +   v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
> +   v = (v + (v >> 4)) & 0xF0F0F0F;
> +   c = (int)((v * 0x1010101) >> 24);
> +   return c;
> +}
> +
> +static inline int
> +u_count_bits64(uint64_t *mask)
> +{
> +   uint64_t v = *mask;
> +   int c;
> +   v = v - ((v >> 1) & 0x5555555555555555ull);
> +   v = (v & 0x3333333333333333ull) + ((v >> 2) & 0x3333333333333333ull);
> +   v = (v + (v >> 4)) & 0xF0F0F0F0F0F0F0Full;
> +   c = (int)((v * 0x101010101010101ull) >> 56);
> +   return c;
> +}
> +
>  /* Determine if an unsigned value is a power of two.
>   *
>   * \note
> 
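
To illustrate the review comments, here is a sketch of by-value bit counting
in two flavours. The function names are hypothetical; the first uses the usual
parallel (SWAR) method with the standard 0x55555555 / 0x33333333 masks, the
second is a simple loop whose cost grows with the number of set bits, which is
the trade-off mentioned above. I am not claiming either matches what
u_math.h does.

```c
#include <assert.h>
#include <stdint.h>

/* Parallel bit count: fixed number of operations regardless of input. */
static inline unsigned
count_bits32(uint32_t v)
{
   v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit sums */
   v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit sums */
   v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
   return (v * 0x01010101u) >> 24;                   /* add the bytes */
}

/* Clears the lowest set bit per iteration, so it runs once per set bit.
 * Cheap for sparse masks, slower for dense ones. */
static inline unsigned
count_bits32_sparse(uint32_t v)
{
   unsigned n = 0;
   while (v) {
      v &= v - 1;
      n++;
   }
   return n;
}

/* 64-bit variant of the parallel method. */
static inline unsigned
count_bits64(uint64_t v)
{
   v = v - ((v >> 1) & 0x5555555555555555ull);
   v = (v & 0x3333333333333333ull) + ((v >> 2) & 0x3333333333333333ull);
   v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0Full;
   return (unsigned)((v * 0x0101010101010101ull) >> 56);
}
```

Taking the mask by value means callers can pass constants and expressions
directly. Where the compiler provides one, a popcount builtin (for example
__builtin_popcount on GCC and Clang) would normally be preferable to either
version.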



Re: [Mesa-dev] [PATCH] st/mesa: Pass index to pipe->create_query() for statistics queries.

2018-10-15 Thread Roland Scheidegger
FWIW the gallium pipeline stats query exists for way longer than the GL
ARB spec for it, and at least llvmpipe implemented it for ages.
So the reason for it being like that is due to dx10 (which always
queries these together), when gl couldn't do it at all.

To make it a bit nicer you could use new defines instead of just numbers
for indices, without making the change more intrusive.
I'm not sure it's really worth the trouble of splitting it up (well it
would mean we'd have to emit a boatload of queries for dx10, unless you
just add additional ones, but then the drivers would need to support
both...), since I don't think it's really something which gets used a lot.
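
As a rough sketch of what I mean by defines instead of bare numbers: the enum
below gives each of the 11 counters a name, in the same order as the indices
0 to 10 used in the patch. All names here are made up for the example (they
are not existing gallium defines), and the query_target enum is only a
stand-in for the GL targets so the snippet compiles without GL headers.

```c
#include <assert.h>

/* Hypothetical named indices for the pipeline statistics counters. */
enum stat_query_index {
   STAT_QUERY_IA_VERTICES = 0,
   STAT_QUERY_IA_PRIMITIVES,
   STAT_QUERY_VS_INVOCATIONS,
   STAT_QUERY_GS_INVOCATIONS,
   STAT_QUERY_GS_PRIMITIVES,
   STAT_QUERY_C_INVOCATIONS,
   STAT_QUERY_C_PRIMITIVES,
   STAT_QUERY_PS_INVOCATIONS,
   STAT_QUERY_HS_INVOCATIONS,
   STAT_QUERY_DS_INVOCATIONS,
   STAT_QUERY_CS_INVOCATIONS,
   STAT_QUERY_COUNT
};

/* Stand-in for the GL query targets (values are arbitrary). */
enum query_target {
   TARGET_VERTICES_SUBMITTED,
   TARGET_PRIMITIVES_SUBMITTED,
   TARGET_VERTEX_SHADER_INVOCATIONS,
   TARGET_GEOMETRY_SHADER_INVOCATIONS,
   TARGET_GEOMETRY_SHADER_PRIMITIVES_EMITTED,
   TARGET_CLIPPING_INPUT_PRIMITIVES,
   TARGET_CLIPPING_OUTPUT_PRIMITIVES,
   TARGET_FRAGMENT_SHADER_INVOCATIONS,
   TARGET_TESS_CONTROL_SHADER_PATCHES,
   TARGET_TESS_EVALUATION_SHADER_INVOCATIONS,
   TARGET_COMPUTE_SHADER_INVOCATIONS,
   TARGET_OTHER
};

/* Map a target to its counter index using names rather than literals.
 * Targets that are not statistics counters fall back to index 0. */
static int
target_to_stat_index(enum query_target target)
{
   switch (target) {
   case TARGET_VERTICES_SUBMITTED:
      return STAT_QUERY_IA_VERTICES;
   case TARGET_PRIMITIVES_SUBMITTED:
      return STAT_QUERY_IA_PRIMITIVES;
   case TARGET_VERTEX_SHADER_INVOCATIONS:
      return STAT_QUERY_VS_INVOCATIONS;
   case TARGET_GEOMETRY_SHADER_INVOCATIONS:
      return STAT_QUERY_GS_INVOCATIONS;
   case TARGET_GEOMETRY_SHADER_PRIMITIVES_EMITTED:
      return STAT_QUERY_GS_PRIMITIVES;
   case TARGET_CLIPPING_INPUT_PRIMITIVES:
      return STAT_QUERY_C_INVOCATIONS;
   case TARGET_CLIPPING_OUTPUT_PRIMITIVES:
      return STAT_QUERY_C_PRIMITIVES;
   case TARGET_FRAGMENT_SHADER_INVOCATIONS:
      return STAT_QUERY_PS_INVOCATIONS;
   case TARGET_TESS_CONTROL_SHADER_PATCHES:
      return STAT_QUERY_HS_INVOCATIONS;
   case TARGET_TESS_EVALUATION_SHADER_INVOCATIONS:
      return STAT_QUERY_DS_INVOCATIONS;
   case TARGET_COMPUTE_SHADER_INVOCATIONS:
      return STAT_QUERY_CS_INVOCATIONS;
   default:
      return 0;
   }
}
```

Drivers that fetch the counters as a group could keep ignoring the index,
while drivers that fetch them one at a time could switch on the named value.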

Some comment inline.


On 15.10.18 at 08:29, Kenneth Graunke wrote:
> GL exposes separate queries for each pipeline statistics counter.
> For some reason, Gallium chose to map them all to a single target,
> PIPE_QUERY_PIPELINE_STATISTICS.  Radeon hardware appears to query
> them all as a group.  pipe->get_query_result_resource() takes an
> index, indicating which to write to the buffer.  The CPU-side hook,
> pipe->get_query_result(), simply writes them all, and st/mesa returns
> the one that was actually desired.
> 
> On Intel hardware, each individual pipeline statistics value is handled
> as a separate counter and query.  We can query each individually, and
> that is more efficient than querying all 11 counters each time.  But,
> we need pipe->get_query_result() to know which one to return.
> 
> To handle this, we pass the index into pipe->create_query(), which
> was previously always 0 for these queries.  Drivers which return all
> of the counters as a group can simply ignore it; drivers querying one
> at a time can use it to distinguish between the counters.
> 
> This is the least invasive fix, but it is kind of ugly, and I wonder
> whether we'd be better off just adding PIPE_QUERY_IA_VERTICES (etc.)
> targets...
> ---
>  src/mesa/state_tracker/st_cb_queryobj.c | 76 -
>  1 file changed, 36 insertions(+), 40 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_cb_queryobj.c 
> b/src/mesa/state_tracker/st_cb_queryobj.c
> index 69e6004c3f1..0dc06ceb574 100644
> --- a/src/mesa/state_tracker/st_cb_queryobj.c
> +++ b/src/mesa/state_tracker/st_cb_queryobj.c
> @@ -88,6 +88,40 @@ st_DeleteQuery(struct gl_context *ctx, struct 
> gl_query_object *q)
> free(stq);
>  }
>  
> +static int
> +target_to_index(const struct gl_query_object *q)
> +{
> +   switch (q->Target) {
> +   case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
> +   case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB:
> +   case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB:
The last one here doesn't actually have an index (as it is used for
querying all streams) - albeit I suppose q->Stream should be 0 anyway.
GL_PRIMITIVES_GENERATED though can have an index.

Otherwise looks reasonable to me.

Roland


> +  return q->Stream;
> +   case GL_VERTICES_SUBMITTED_ARB:
> +  return 0;
> +   case GL_PRIMITIVES_SUBMITTED_ARB:
> +  return 1;
> +   case GL_VERTEX_SHADER_INVOCATIONS_ARB:
> +  return 2;
> +   case GL_GEOMETRY_SHADER_INVOCATIONS:
> +  return 3;
> +   case GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB:
> +  return 4;
> +   case GL_CLIPPING_INPUT_PRIMITIVES_ARB:
> +  return 5;
> +   case GL_CLIPPING_OUTPUT_PRIMITIVES_ARB:
> +  return 6;
> +   case GL_FRAGMENT_SHADER_INVOCATIONS_ARB:
> +  return 7;
> +   case GL_TESS_CONTROL_SHADER_PATCHES_ARB:
> +  return 8;
> +   case GL_TESS_EVALUATION_SHADER_INVOCATIONS_ARB:
> +  return 9;
> +   case GL_COMPUTE_SHADER_INVOCATIONS_ARB:
> +  return 10;
> +   default:
> +  return 0;
> +   }
> +}
>  
>  static void
>  st_BeginQuery(struct gl_context *ctx, struct gl_query_object *q)
> @@ -164,7 +198,7 @@ st_BeginQuery(struct gl_context *ctx, struct 
> gl_query_object *q)
>   ret = pipe->end_query(pipe, stq->pq_begin);
> } else {
>if (!stq->pq) {
> - stq->pq = pipe->create_query(pipe, type, q->Stream);
> + stq->pq = pipe->create_query(pipe, type, target_to_index(q));
>   stq->type = type;
>}
>if (stq->pq)
> @@ -383,46 +417,8 @@ st_StoreQueryResult(struct gl_context *ctx, struct 
> gl_query_object *q,
>  
> if (pname == GL_QUERY_RESULT_AVAILABLE) {
>index = -1;
> -   } else if (stq->type == PIPE_QUERY_PIPELINE_STATISTICS) {
> -  switch (q->Target) {
> -  case GL_VERTICES_SUBMITTED_ARB:
> - index = 0;
> - break;
> -  case GL_PRIMITIVES_SUBMITTED_ARB:
> - index = 1;
> - break;
> -  case GL_VERTEX_SHADER_INVOCATIONS_ARB:
> - index = 2;
> - break;
> -  case GL_GEOMETRY_SHADER_INVOCATIONS:
> - index = 3;
> - break;
> -  case GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB:
> - index = 4;
> - break;
> -  case GL_CLIPPING_INPUT_PRIMITIVES_ARB:
> - index = 5;
> - break;
> -  case GL_CLIPPING_OUTPUT_PRIMITIVES_ARB:

Re: [Mesa-dev] [PATCH 3/3] appveyor: Cache pip's cache files.

2018-10-15 Thread Roland Scheidegger
On 12.10.18 at 17:27, Jose Fonseca wrote:
> It should speed up the Python packages installation.
> ---
>  appveyor.yml | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/appveyor.yml b/appveyor.yml
> index a4e942c14ca..ccb84fd3403 100644
> --- a/appveyor.yml
> +++ b/appveyor.yml
> @@ -33,7 +33,9 @@ branches:
>  # - 
> https://www.appveyor.com/blog/2014/06/04/shallow-clone-for-git-repositories
>  clone_depth: 100
>  
> +# https://www.appveyor.com/docs/build-cache/
>  cache:
> +- '%LOCALAPPDATA%\pip\Cache -> appveyor.yml'
>  - win_flex_bison-2.5.15.zip
>  - llvm-5.0.1-msvc2017-mtd.7z
>  
> 

Series looks good to me.
Reviewed-by: Roland Scheidegger 


Re: [Mesa-dev] [PATCH v2] gallivm: Make it possible to disable some optimization shortcuts in release builds

2018-10-05 Thread Roland Scheidegger
Looks alright to me. I'm not quite sold on the "safemath" name though, 
since "safe math" is usually associated with floating point 
optimizations, and this here is just filtering hacks. Maybe something 
like disable_filtering_hacks would be more fitting?

Reviewed-by: Roland Scheidegger 
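
To make the naming question concrete, here is a simplified sketch of how a
comma-separated option string maps onto the perf bits, including a combined
name like "safemath" that sets the three filtering flags at once. This is not
Mesa's option helper; parse_flags(), struct named_flag and the PERF_* names
are assumptions for the example, with bit values copied from the quoted patch.

```c
#include <assert.h>
#include <string.h>

#define PERF_NO_BRILINEAR  (1u << 0)
#define PERF_NO_RHO_APPROX (1u << 1)
#define PERF_NO_QUAD_LOD   (1u << 2)
#define PERF_NO_OPT        (1u << 3)

struct named_flag {
   const char *name;
   unsigned    value;
};

/* Table of accepted names, terminated by a NULL name. */
static const struct named_flag perf_flags[] = {
   { "no_brilinear",  PERF_NO_BRILINEAR },
   { "no_rho_approx", PERF_NO_RHO_APPROX },
   { "no_quad_lod",   PERF_NO_QUAD_LOD },
   { "nopt",          PERF_NO_OPT },
   { "safemath",      PERF_NO_BRILINEAR | PERF_NO_RHO_APPROX |
                      PERF_NO_QUAD_LOD },
   { NULL, 0 }
};

/* OR together the values of all names found in a comma-separated string.
 * Unknown names are ignored; a NULL string yields 0. */
static unsigned
parse_flags(const char *str, const struct named_flag *table)
{
   unsigned result = 0;

   if (!str)
      return 0;

   while (*str) {
      size_t len = strcspn(str, ",");   /* length of the current token */
      const struct named_flag *f;

      for (f = table; f->name; f++) {
         if (strlen(f->name) == len && strncmp(str, f->name, len) == 0)
            result |= f->value;
      }

      str += len;
      if (*str == ',')
         str++;
   }
   return result;
}
```

In practice the string would come from the environment, e.g.
getenv("GALLIVM_PERF"). Renaming the combined entry only touches the table,
which is why the choice of name is cheap to change.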

On 10/05/2018 06:08 AM, Gert Wollny wrote:
> From: Gert Wollny 
> 
> For testing it is of interest that all tests of dEQP pass, e.g. to test
> virglrenderer on a host only providing software rendering like in a CI.
> Hence make it possible to disable certain optimizations that make tests fail.
> 
> While we are there also add some documentation to the flags to make it clear
> that this is opt-out.
> 
> Setting the environment variable "GALLIVM_PERF=disable_all" can be used to 
> make
> the following tests pass in release mode:
> 
>dEQP-GLES2.functional.texture.mipmap.2d.affine.*_linear_*
>dEQP-GLES2.functional.texture.mipmap.cube.generate.*
>dEQP-GLES2.functional.texture.vertex.2d.filtering.*_mipmap_linear_*
>dEQP-GLES2.functional.texture.vertex.2d.wrap.*
> 
> Related:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=94957
> 
> v2: rename optimization disabling flag to 'safemath' and also move the
>  nopt flag to the perf flags.
> 
> Signed-off-by: Gert Wollny 
> ---
>   src/gallium/auxiliary/gallivm/lp_bld_debug.h  | 16 ---
>   src/gallium/auxiliary/gallivm/lp_bld_init.c   | 25 
> +++
>   src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  6 +++---
>   src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c   |  6 +++---
>   4 files changed, 32 insertions(+), 21 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_debug.h 
> b/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> index f96a1afa7a..eeef0d6ba6 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> @@ -39,20 +39,22 @@
>   #define GALLIVM_DEBUG_TGSI  (1 << 0)
>   #define GALLIVM_DEBUG_IR(1 << 1)
>   #define GALLIVM_DEBUG_ASM   (1 << 2)
> -#define GALLIVM_DEBUG_NO_OPT(1 << 3)
> -#define GALLIVM_DEBUG_PERF  (1 << 4)
> -#define GALLIVM_DEBUG_NO_BRILINEAR  (1 << 5)
> -#define GALLIVM_DEBUG_NO_RHO_APPROX (1 << 6)
> -#define GALLIVM_DEBUG_NO_QUAD_LOD   (1 << 7)
> -#define GALLIVM_DEBUG_GC(1 << 8)
> -#define GALLIVM_DEBUG_DUMP_BC   (1 << 9)
> +#define GALLIVM_DEBUG_PERF  (1 << 3)
> +#define GALLIVM_DEBUG_GC(1 << 4)
> +#define GALLIVM_DEBUG_DUMP_BC   (1 << 5)
>   
> +#define GALLIVM_PERF_NO_BRILINEAR  (1 << 0)
> +#define GALLIVM_PERF_NO_RHO_APPROX (1 << 1)
> +#define GALLIVM_PERF_NO_QUAD_LOD   (1 << 2)
> +#define GALLIVM_PERF_NO_OPT(1 << 3)
>   
>   #ifdef __cplusplus
>   extern "C" {
>   #endif
>   
>   
> +extern unsigned gallivm_perf;
> +
>   #ifdef DEBUG
>   extern unsigned gallivm_debug;
>   #else
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_init.c
> index 1f0a01cde6..3f7c4d3154 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
> @@ -59,6 +59,17 @@ static const bool use_mcjit = USE_MCJIT;
>   static bool use_mcjit = FALSE;
>   #endif
>   
> +unsigned gallivm_perf = 0;
> +
> +static const struct debug_named_value lp_bld_perf_flags[] = {
> +   { "no_brilinear", GALLIVM_PERF_NO_BRILINEAR, "disable brilinear 
> optimization" },
> +   { "no_rho_approx", GALLIVM_PERF_NO_RHO_APPROX, "disable rho_approx 
> optimization" },
> +   { "no_quad_lod", GALLIVM_PERF_NO_QUAD_LOD, "disable quad_lod 
> optimization" },
> +   { "nopt",   GALLIVM_PERF_NO_OPT, "disable optimization passes to speed up 
> shader compilation" },
> +   { "safemath", GALLIVM_PERF_NO_BRILINEAR | GALLIVM_PERF_NO_RHO_APPROX |
> + GALLIVM_PERF_NO_QUAD_LOD, "disable unsafe optimizations" },
> +   DEBUG_NAMED_VALUE_END
> +};
>   
>   #ifdef DEBUG
>   unsigned gallivm_debug = 0;
> @@ -67,11 +78,7 @@ static const struct debug_named_value lp_bld_debug_flags[] 
> = {
>  { "tgsi",   GALLIVM_DEBUG_TGSI, NULL },
>  { "ir", GALLIVM_DEBUG_IR, NULL },
>  { "

Re: [Mesa-dev] [PATCH] gallivm: Make it possible to disable some optimization shortcuts in release builds

2018-10-04 Thread Roland Scheidegger
I've attached the diff (no guarantees, can't test it right now).
nopt is useful because if there's tons of shaders to compile compilation is 
quite a bit faster (but generally you really don't want to do this).
Not sure about why dumpbc isn't debug only, might not have been a deliberate 
decision. Although using a separate variable has the advantage that the 
compiler can optimise out the unneeded code (we didn't really care).
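
A small sketch of that last point, with made-up names (sketch_debug,
sketch_perf, SKETCH_DEBUG_BUILD are not the real gallivm symbols): if the
debug mask is a compile-time 0 in release builds, every test against it folds
away, whereas a separate perf mask stays a real variable that can be set at
runtime in any build.

```c
#include <assert.h>

#define DBG_DUMP_IR        (1u << 1)
#define PERF_NO_BRILINEAR  (1u << 0)

#ifdef SKETCH_DEBUG_BUILD
static unsigned sketch_debug = 0;   /* runtime-settable in debug builds */
#else
#define sketch_debug 0u             /* constant 0 in release builds */
#endif

/* Always a real variable, so release builds can still honor it. */
static unsigned sketch_perf = 0;

/* In a release build this reduces to "return 0", and any code guarded by
 * it can be dropped by the compiler. */
static int
should_dump_ir(void)
{
   return (sketch_debug & DBG_DUMP_IR) != 0;
}

/* This check has to be evaluated at runtime in every build. */
static int
brilinear_disabled(void)
{
   return (sketch_perf & PERF_NO_BRILINEAR) != 0;
}
```

The cost of keeping everything in one always-defined variable is just those
extra runtime checks, which is why we didn't really care.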


From: Gert Wollny 
Sent: Thursday, October 4, 2018 12:58:41 AM
To: Roland Scheidegger; mesa-dev@lists.freedesktop.org
Cc: imir...@alum.mit.edu; Jose Fonseca
Subject: Re: [PATCH] gallivm: Make it possible to disable some optimization 
shortcuts in release builds

On Wednesday, 03.10.2018, 20:47 +, Roland Scheidegger wrote:
> Is it worth it splitting out to another var?
> We actually have code branches internally where we just define the
> gallivm_debug var always, and some of the debug flags outside the
> #ifdef
> debug (we'll actually need more than just these 3 accessible outside
> debug builds).
> If you think this is cleaner though I suppose we can deal with it...

One part of me says that keeping it separate is indeed cleaner, another
says it  doesn't really matter. Why don't you upstream your version
with all the flags you need exposed (and document the not so obvious
ones)?

Best,
Gert


>
> Roland
>
>
>
> On 10/03/2018 09:52 AM, Gert Wollny wrote:
> > From: Gert Wollny 
> >
> > For testing it is of interest that all tests of dEQP pass, e.g.
> > to test
> > virglrenderer on a host only providing software rendering like in a
> > CI.
> > Hence make it possible to disable certain optimizations that make
> > tests fail.
> >
> > While we are there also add some documentation to the flags to make
> > it clear
> > that this is opt-out.
> >
> > Setting the environment variable "GALLIVM_PERF=disable_all" can be
> > used to make
> > the following tests pass in release mode:
> >
> >dEQP-GLES2.functional.texture.mipmap.2d.affine.*_linear_*
> >dEQP-GLES2.functional.texture.mipmap.cube.generate.*
> >dEQP-GLES2.functional.texture.vertex.2d.filtering.*_mipmap_linear_*
> >dEQP-GLES2.functional.texture.vertex.2d.wrap.*
> >
> > Related:
> >https://bugs.freedesktop.org/show_bug.cgi?id=94957
> >
> > Signed-off-by: Gert Wollny 
> > ---
> >   src/gallium/auxiliary/gallivm/lp_bld_debug.h  | 13 -
> >   src/gallium/auxiliary/gallivm/lp_bld_init.c   | 15 ---
> >   src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  6 +++---
> >   src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c   |  6 +++---
> >   4 files changed, 26 insertions(+), 14 deletions(-)
> >
> > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> > b/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> > index f96a1afa7a..ce09d789cb 100644
> > --- a/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> > +++ b/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> > @@ -41,18 +41,21 @@
> >   #define GALLIVM_DEBUG_ASM           (1 << 2)
> >   #define GALLIVM_DEBUG_NO_OPT        (1 << 3)
> >   #define GALLIVM_DEBUG_PERF          (1 << 4)
> > -#define GALLIVM_DEBUG_NO_BRILINEAR  (1 << 5)
> > -#define GALLIVM_DEBUG_NO_RHO_APPROX (1 << 6)
> > -#define GALLIVM_DEBUG_NO_QUAD_LOD   (1 << 7)
> > -#define GALLIVM_DEBUG_GC            (1 << 8)
> > -#define GALLIVM_DEBUG_DUMP_BC       (1 << 9)
> > +#define GALLIVM_DEBUG_GC            (1 << 5)
> > +#define GALLIVM_DEBUG_DUMP_BC       (1 << 6)
> >
> >
> > +#define GALLIVM_PERF_NO_BRILINEAR  (1 << 0)
> > +#define GALLIVM_PERF_NO_RHO_APPROX (1 << 1)
> > +#define GALLIVM_PERF_NO_QUAD_LOD   (1 << 2)
> > +
> >   #ifdef __cplusplus
> >   extern "C" {
> >   #endif
> >
> >
> > +extern unsigned gallivm_perf;
> > +
> >   #ifdef DEBUG
> >   extern unsigned gallivm_debug;
> >   #else
> > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c
> > b/src/gallium/auxiliary/gallivm/lp_bld_init.c
> > index 1f0a01cde6..c8b2a7fcc9 100644
> > --- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
> > +++ b/src/gallium/auxiliary/gallivm/

Re: [Mesa-dev] [PATCH] gallivm: Make it possible to disable some optimization shortcuts in release builds

2018-10-03 Thread Roland Scheidegger
Is it worth it splitting out to another var?
We actually have code branches internally where we just define the 
gallivm_debug var always, and some of the debug flags outside the #ifdef 
debug (we'll actually need more than just these 3 accessible outside 
debug builds).
If you think this is cleaner though I suppose we can deal with it...

Roland



On 10/03/2018 09:52 AM, Gert Wollny wrote:
> From: Gert Wollny 
> 
> For testing it is of interest that all tests of dEQP pass, e.g. to test
> virglrenderer on a host only providing software rendering like in a CI.
> Hence make it possible to disable certain optimizations that make tests fail.
> 
> While we are there also add some documentation to the flags to make it clear
> that this is opt-out.
> 
> Setting the environment variable "GALLIVM_PERF=disable_all" can be used to
> make the following tests pass in release mode:
> 
>dEQP-GLES2.functional.texture.mipmap.2d.affine.*_linear_*
>dEQP-GLES2.functional.texture.mipmap.cube.generate.*
>dEQP-GLES2.functional.texture.vertex.2d.filtering.*_mipmap_linear_*
>dEQP-GLES2.functional.texture.vertex.2d.wrap.*
> 
> Related:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=94957
> 
> Signed-off-by: Gert Wollny 
> ---
>   src/gallium/auxiliary/gallivm/lp_bld_debug.h  | 13 -
>   src/gallium/auxiliary/gallivm/lp_bld_init.c   | 15 ---
>   src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  6 +++---
>   src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c   |  6 +++---
>   4 files changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_debug.h 
> b/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> index f96a1afa7a..ce09d789cb 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_debug.h
> @@ -41,18 +41,21 @@
>   #define GALLIVM_DEBUG_ASM           (1 << 2)
>   #define GALLIVM_DEBUG_NO_OPT        (1 << 3)
>   #define GALLIVM_DEBUG_PERF          (1 << 4)
> -#define GALLIVM_DEBUG_NO_BRILINEAR  (1 << 5)
> -#define GALLIVM_DEBUG_NO_RHO_APPROX (1 << 6)
> -#define GALLIVM_DEBUG_NO_QUAD_LOD   (1 << 7)
> -#define GALLIVM_DEBUG_GC            (1 << 8)
> -#define GALLIVM_DEBUG_DUMP_BC       (1 << 9)
> +#define GALLIVM_DEBUG_GC            (1 << 5)
> +#define GALLIVM_DEBUG_DUMP_BC       (1 << 6)
>   
>   
> +#define GALLIVM_PERF_NO_BRILINEAR  (1 << 0)
> +#define GALLIVM_PERF_NO_RHO_APPROX (1 << 1)
> +#define GALLIVM_PERF_NO_QUAD_LOD   (1 << 2)
> +
>   #ifdef __cplusplus
>   extern "C" {
>   #endif
>   
>   
> +extern unsigned gallivm_perf;
> +
>   #ifdef DEBUG
>   extern unsigned gallivm_debug;
>   #else
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_init.c
> index 1f0a01cde6..c8b2a7fcc9 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
> @@ -59,6 +59,16 @@ static const bool use_mcjit = USE_MCJIT;
>   static bool use_mcjit = FALSE;
>   #endif
>   
> +unsigned gallivm_perf = 0;
> +
> +static const struct debug_named_value lp_bld_perf_flags[] = {
> +   { "no_brilinear", GALLIVM_PERF_NO_BRILINEAR, "disable brilinear optimization" },
> +   { "no_rho_approx", GALLIVM_PERF_NO_RHO_APPROX, "disable rho_approx optimization" },
> +   { "no_quad_lod", GALLIVM_PERF_NO_QUAD_LOD, "disable quad_lod optimization" },
> +   { "disable_all", GALLIVM_PERF_NO_BRILINEAR | GALLIVM_PERF_NO_RHO_APPROX |
> + GALLIVM_PERF_NO_QUAD_LOD, "disable all optimizations" },
> +   DEBUG_NAMED_VALUE_END
> +};
>   
>   #ifdef DEBUG
>   unsigned gallivm_debug = 0;
> @@ -69,9 +79,6 @@ static const struct debug_named_value lp_bld_debug_flags[] = {
>  { "asm",GALLIVM_DEBUG_ASM, NULL },
>  { "nopt",   GALLIVM_DEBUG_NO_OPT, NULL },
>  { "perf",   GALLIVM_DEBUG_PERF, NULL },
> -   { "no_brilinear", GALLIVM_DEBUG_NO_BRILINEAR, NULL },
> -   { "no_rho_approx", GALLIVM_DEBUG_NO_RHO_APPROX, NULL },
> -   { "no_quad_lod", GALLIVM_DEBUG_NO_QUAD_LOD, NULL },
>  { "gc", GALLIVM_DEBUG_GC, NULL },
>  { "dumpbc", GALLIVM_DEBUG_DUMP_BC, NULL },
>  DEBUG_NAMED_VALUE_END
> @@ -420,6 +427,8 @@ lp_build_init(void)
>  gallivm_debug = debug_get_option_gallivm_debug();
>   #endif
>   
> +   gallivm_perf = debug_get_flags_option("GALLIVM_PERF", lp_bld_perf_flags, 0 );
> +
>  lp_set_target_options();
>   
>  util_cpu_detect();
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> index 8f760f59fe..018cca8f9d 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> @@ -2825,13 +2825,13 @@ 

Re: [Mesa-dev] [PATCH] Revert "mesa: remove unnecessary 'sort by year' for the GL extensions"

2018-09-25 Thread Roland Scheidegger
Looks great to me.

Reviewed-by: Roland Scheidegger 

On 24.09.2018 at 17:19, Emil Velikov wrote:
> This reverts commit 3d81e11b49366b5636b8524ba0f8c7076e3fdf34.
> 
> As reported by Federico, some games require the 'sort by year' since
> they truncate the extensions which do not fit the fixed size string
> array.
> 
> Seemingly I did not consider that, as the documentation (both Mesa and
> Nvidia) mentions program crashes ... which are worked around by
> setting the env. variable.
> 
> This commit reinstates the workaround and enhances the documentation.
> 
> Cc: Federico Dossena 
> Cc: Timothy Arceri 
> Cc: Marek Olšák 
> Cc: Roland Scheidegger 
> Cc: Ian Romanick 
> Reported-by: Federico Dossena 
> Fixes: 3d81e11b493 ("mesa: remove unnecessary 'sort by year' for the GL
> extensions")
> ---
> UNTESTED: Resolved the revert conflicts, but I haven't done any actual
> testing. If Federico can do the honours that'll be amazing.
> 
>  src/mesa/main/extensions.c | 46 --
>  1 file changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
> index 25c3161f7d0..4d95a072793 100644
> --- a/src/mesa/main/extensions.c
> +++ b/src/mesa/main/extensions.c
> @@ -335,6 +335,30 @@ _mesa_extension_supported(const struct gl_context *ctx, extension_index i)
> return (ctx->Version >= ext->version[ctx->API]) && base[ext->offset];
>  }
>  
> +/**
> + * Compare two entries of the extensions table.  Sorts first by year,
> + * then by name.
> + *
> + * Arguments are indices into _mesa_extension_table.
> + */
> +static int
> +extension_compare(const void *p1, const void *p2)
> +{
> +   extension_index i1 = * (const extension_index *) p1;
> +   extension_index i2 = * (const extension_index *) p2;
> +   const struct mesa_extension *e1 = &_mesa_extension_table[i1];
> +   const struct mesa_extension *e2 = &_mesa_extension_table[i2];
> +   int res;
> +
> +   res = (int)e1->year - (int)e2->year;
> +
> +   if (res == 0) {
> +  res = strcmp(e1->name, e2->name);
> +   }
> +
> +   return res;
> +}
> +
>  
>  /**
>   * Construct the GL_EXTENSIONS string.  Called the first time that
> @@ -372,8 +396,8 @@ _mesa_make_extension_string(struct gl_context *ctx)
>  
>if (i->year <= maxYear &&
>_mesa_extension_supported(ctx, k)) {
> - length += strlen(i->name) + 1; /* +1 for space */
> - extension_indices[count++] = k;
> +  length += strlen(i->name) + 1; /* +1 for space */
> +  ++count;
>}
> }
> for (k = 0; k < MAX_UNRECOGNIZED_EXTENSIONS; k++)
> @@ -385,6 +409,24 @@ _mesa_make_extension_string(struct gl_context *ctx)
>return NULL;
> }
>  
> +   /* Sort extensions in chronological order because idTech 2/3 games
> +* (e.g., Quake3 demo) store the extension list in a fixed size buffer.
> +* Some cases truncate, while others overflow the buffer. Resulting in
> +* misrendering and crashes, respectively.
> +* Address the former here, while the latter will be addressed by setting
> +* the MESA_EXTENSION_MAX_YEAR environment variable.
> +*/
> +   j = 0;
> +   for (k = 0; k < MESA_EXTENSION_COUNT; ++k) {
> +  if (_mesa_extension_table[k].year <= maxYear &&
> + _mesa_extension_supported(ctx, k)) {
> + extension_indices[j++] = k;
> +  }
> +   }
> +   assert(j == count);
> +   qsort(extension_indices, count,
> + sizeof *extension_indices, extension_compare);
> +
> /* Build the extension string.*/
> for (j = 0; j < count; ++j) {
>const struct mesa_extension *i = &_mesa_extension_table[extension_indices[j]];
> 


