Re: Replacing NIR with SPIR-V?

2022-01-23 Thread Connor Abbott
On Sun, Jan 23, 2022 at 1:58 PM Abel Bernabeu wrote:
>>
>> Yes, NIR has arrays and structs and nir_deref to deal with them but, by the time 
>> you get into the back-end, all the nir_derefs are gone and you're left with 
>> load/store messages with actual addresses (either a 64-bit memory address or 
>> an index+offset pair for a bound resource).  Again, unless you're going to 
>> dump straight into LLVM, you really don't want to handle that in your 
>> back-end unless you really have to.
>
>
> That is the thing: there is already a community-maintained LLVM backend for 
> RISC-V and I need to see how to get value from that effort. And that is a 
> very typical scenario for new architectures. There is already an LLVM 
> backend for a programmable device and someone asks: could you do some 
> graphics around this without spending millions?
>
> Then your options as an engineer are:
>
> - Use Mesa as a framework and translate NIR to assembly (most likely choice).
>
> - Use Mesa as a framework and translate NIR to LLVM IR with some intrinsics, 
> then feed the pre-existing LLVM backend.

Using the pre-existing backend probably isn't a real option, because
it's designed for different things. The biggest hurdle is that for
many years now, vendors have realized that SIMT-style parallelism is
the most appropriate for GPUs, and in order to do SIMT effectively
your whole compiler stack has to be aware of it, from the frontend to
the backend. There are two "views" of your program: the "thread-level
view," which is what the programmer wrote, where SIMD lanes are separate
threads and you're specifying what happens to one thread, and the
"wave-level view," which is what the machine actually executes. In the
backend you have to be aware of both the thread-level view and
wave-level view of your program in order to effectively register
allocate, and that's something that LLVM's backend infrastructure just
can't do. AMDGPU has some hacks, but they're not 100% effective and to
use them in RISC-V you'd probably have to rewrite the whole backend
anyway. For example, in the AMDGPU backend vector registers aren't
exposed as actual vector registers to LLVM's machinery, but it still
models control flow at the "wave level," which causes some inaccuracy
in liveness. So, the existing investment isn't really worth as much as
you think it is.
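
To make the two views concrete, here is a minimal GLSL sketch (the
pseudo-ISA in the trailing comment is hand-written for illustration,
not any real machine's encoding):

#version 450
layout(location = 0) in float a;
layout(location = 1) flat in int cond;
layout(location = 0) out vec4 color;

void main()
{
    float x = a * 2.0;       // thread level: x is dead on the else path
    float y;
    if (cond != 0)
        y = x + 1.0;         // divergent branch: only some lanes take this
    else
        y = 2.0;
    color = vec4(y);
}

/* Wave-level view the machine actually executes (pseudo-ISA):
 *
 *   x    = a * 2.0
 *   save = exec
 *   exec &= cond        ; mask off the "else" lanes
 *   y    = x + 1.0      ; the whole wave issues this
 *   exec = save & ~cond ; flip to the "else" lanes
 *   y    = 2.0
 *   exec = save
 *
 * The schedule and encoding follow the wave-level CFG, but per-lane
 * liveness of x follows the thread-level CFG -- an allocator that only
 * sees one of the two views gets register pressure wrong.
 */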

>
> - Use some new alternative, possibly a Mesa fork relying on the Khronos 
> SPIR-V to LLVM IR translator. Start fixing the tool for supporting 
> graphics... Make SPIR-V the IR that connects the frontend and backend :-)
>
> I am not thinking in terms of what is best for Mesa, but in terms of how 
> could the RISC-V community organize its effort given that an LLVM backend is 
> a given thing.
>
> I see the current reasons why NIR is preferred over SPIR-V in Mesa. So far 
> you have given me three:
>
> - There is a well designed library for traversing NIR, whereas SPIR-V defines 
> nothing.
> - The arrays and structs are lowered before the shader is passed to the 
> backend.
> - You see SPIR-V as a "serializing" format for IR to be exchanged through the 
> network (like a .PNG for shaders), whereas NIR's focus is more about how the 
> data structures are represented in memory while in use.
>
> My takeaway messages are two:
>
> - Advise to support NIR on the RISC-V plan.

Sure.

> - If I have a chance, suggest that Khronos make SPIR-V more like NIR, so 
> that in the future it is considered more than a serializing format.

No, this is a terrible idea. Serialization formats like SPIR-V and IRs
intended to be consumed by a backend like NIR (and LLVM) have very
different needs - SPIR-V has to be backwards compatible and have very
broad support, NIR has to represent lower-level constructs that SPIR-V
doesn't, etc. If you created an in-memory IR data structure that
adhered slavishly to SPIR-V you'd have a terrible replacement for NIR,
and that's just by necessity - they're solving different problems.

There's some history to this, because before SPIR-V there was SPIR
which was just LLVM bitcode with some graphics stuff on top. It failed
for the reasons above and was abandoned, and that's why we have SPIR-V
today.

>
> Thanks for your comments so far.
>
>
>
> On Fri, Jan 21, 2022 at 4:24 AM Jason Ekstrand  wrote:
>>
>> On Thu, Jan 20, 2022 at 5:49 PM Abel Bernabeu wrote:
>>>
>>> In principle, all the properties you highlight in your blog as key points 
>>> of NIR also apply to SPIR-V.
>>
>>
>> First off, that blog post is truly ancient.  Based on the quote from 
>> nir_opt_algebraic.c, it looks like less than 6 months after the original NIR 
>> patches landed, which puts it at 5-6 years old. A lot has changed since then.
>>
>>>
>>> I was curious to know where, in the details I am missing, NIR starts 
>>> shining as a more suitable IR than SPIR-V for the task of connecting the 
>>> front-end and back-end. By the way, thanks for putting together that blog 
>>> post.
>>
>>
>> In terms of what they're capable of communicating, 

Re: NIR: is_used_once breaks multi-pass rendering

2022-01-20 Thread Connor Abbott
There are precise rules for when calculations in GL must return the
same result, which are laid out in Appendix A ("Invariance"). The
relevant rule here:

"Rule 4 The same vertex or fragment shader will produce the same result when
run multiple times with the same input. The wording ‘the same shader’ means a
program object that is populated with the same source strings, which
are compiled
and then linked, possibly multiple times, and which program object is
then executed
using the same GL state vector. Invariance is relaxed for shaders with
side effects,
such as accessing atomic counters (see section A.5)"

The key part is "using the same GL state vector." In particular that
includes which buffers are attached - and the entire program has to be
the same, which allows varying linking optimizations. The intent of
this was probably exactly to enable these sorts of optimizations based
on what's using a value while allowing aggressive cross-stage
optimizations. This means that in your example, the app is broken and
it should be using "invariant gl_Position;" (i.e. what the app
workaround does).
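
For reference, the app-side fix is a one-line redeclaration in every
vertex shader involved -- a minimal sketch:

#version 330
invariant gl_Position;   // force identical position math across programs

in vec3 position;
uniform mat4 mvp;

void main()
{
    gl_Position = mvp * vec4(position, 1.0);
}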

There's a question of whether apps are broken often enough that we
ought to enable vs_position_always_invariant by default. That will
obviously come with some performance cost, although maybe it's
acceptable enough in practice.
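
For context, the kind of rule at issue looks like this in
nir_opt_algebraic.py -- a sketch of the pattern shape, not a rule
copied verbatim from the tree:

# Variable names are strings, as in nir_opt_algebraic.py.
a = 'a'
b = 'b'
c = 'c'

optimizations = [
   # '~' marks the rule inexact (dropped for values marked precise or
   # invariant); is_used_once makes it fire only when the fmul has a
   # single user, so whether it matches can change when a varying also
   # reads the intermediate value.
   (('~fadd', ('fmul(is_used_once)', a, b), c), ('ffma', a, b, c)),
]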

Connor

On Thu, Jan 20, 2022 at 9:31 AM Marek Olšák  wrote:
>
> Hi,
>
> "is_used_once" within an inexact transformation in nir_opt_algebraic can lead 
> to geometry differences with multi-pass rendering, causing incorrect output. 
> Here's an example to prove this:
>
> Let's assume there is a pass that writes out some intermediate value from the 
> position calculation as a varying. Let's assume there is another pass that 
> does the same thing, but only draws to the depth buffer, so varyings are 
> eliminated. The second pass would get "is_used_once" because there is just 
> the position, and let's assume there is an inexact transformation with 
> "is_used_once" that matches that. On the other hand, the first pass wouldn't 
> get "is_used_once" because there is the varying. Now the same position 
> calculation is different for each pass, causing depth test functions commonly 
> used in multi-pass rendering such as EQUAL to fail.
>
> The application might even use the exact same shader for both passes, and the 
> driver might just look for depth-only rendering and remove the varyings based 
> on that. Or it can introduce more "is_used_once" cases via uniform inlining. 
> From the app's point of view, the positions should be identical between both 
> passes if it's the exact same shader.
>
> The workaround we have for this issue is called 
> "vs_position_always_invariant", which was added for inexact FMA fusing, but 
> it works with all inexact transformations containing "is_used_once".
>
> This issue could be exacerbated by future optimizations.
>
> Some of the solutions are:
> - Remove "is_used_once" (safe)
> - Enable vs_position_always_invariant by default (not helpful if the data 
> flow is shader->texture->shader->position)
> - Always suppress inexact transformations containing "is_used_once" for all 
> instructions contributing to the final position value (less aggressive than 
> vs_position_always_invariant; it needs a proof that it's equivalent to 
> vs_position_always_invariant in terms of invariance, not behavior)
> - Continue using app workarounds.
>
> Just some food for thought.
>
> Marek


Re: git and Marge troubles this week

2022-01-07 Thread Connor Abbott
On Fri, Jan 7, 2022 at 6:32 PM Emma Anholt  wrote:
>
> On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott  wrote:
> >
> > Unfortunately batch mode has only made it *worse* - I'm sure it's not
> > intentional, but it seems that it's still running the CI pipelines
> > individually after the batch pipeline passes and not merging them
> > right away, which completely defeats the point. See, for example,
> > !14213 which has gone through 8 cycles being batched with earlier MRs,
> > 5 of those passing only to have an earlier job in the batch spuriously
> > fail when actually merging and Marge seemingly giving up on merging it
> > (???). As I type it was "lucky" enough to be the first job in a batch
> > which passed and is currently running its pipeline and is blocked on
> > iris-whl-traces-performance (I have !14453 to disable that broken job,
> > but who knows with the Marge chaos when it's going to get merged...).
> >
> > Stepping back, I think it was a bad idea to push a "I think this might
> > help" type change like this without first carefully monitoring things
> > afterwards. An hour or so of babysitting Marge would've caught that
> > this wasn't working, and would've prevented many hours of backlog and
> > perception of general CI instability.
>
> I spent the day watching marge, like I do every day.  Looking at the
> logs, we got 0 MRs in during my work hours PST, out of about 14 or so
> marge assignments that day.  Leaving marge broken for the night would
> have been indistinguishable from the status quo, was my assessment.

Yikes, that's awful - and I know it's definitely not easy keeping
everything running!

But unfortunately it seems like the problems that day were transient,
and as I said earlier there were at least 6 MRs that succeeded and
would've been merged if they weren't batch MRs, so enabling batch mode
did wind up causing some damage compared to doing nothing. So doing it
the next day, when there was some possibility to follow up any
problems, would've been better. Not to take away from what you guys
are doing, just a lesson for next time.

Connor


Re: git and Marge troubles this week

2022-01-07 Thread Connor Abbott
On Fri, Jan 7, 2022 at 6:32 PM Emma Anholt  wrote:
>
> On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott  wrote:
> >
> > Unfortunately batch mode has only made it *worse* - I'm sure it's not
> > intentional, but it seems that it's still running the CI pipelines
> > individually after the batch pipeline passes and not merging them
> > right away, which completely defeats the point. See, for example,
> > !14213 which has gone through 8 cycles being batched with earlier MRs,
> > 5 of those passing only to have an earlier job in the batch spuriously
> > fail when actually merging and Marge seemingly giving up on merging it
> > (???). As I type it was "lucky" enough to be the first job in a batch
> > which passed and is currently running its pipeline and is blocked on
> > iris-whl-traces-performance (I have !14453 to disable that broken job,
> > but who knows with the Marge chaos when it's going to get merged...).
> >
> > Stepping back, I think it was a bad idea to push a "I think this might
> > help" type change like this without first carefully monitoring things
> > afterwards. An hour or so of babysitting Marge would've caught that
> > this wasn't working, and would've prevented many hours of backlog and
> > perception of general CI instability.
>
> I spent the day watching marge, like I do every day.  Looking at the
> logs, we got 0 MRs in during my work hours PST, out of about 14 or so
> marge assignments that day.  Leaving marge broken for the night would
> have been indistinguishable from the status quo, was my assessment.
>
> There was definitely some extra spam about trying batches, more than
> there were actual batches attempted.  My guess would be gitlab
> connection reliability stuff, but I'm not sure.
>
> Of the 5 batches marge attempted before the change was reverted, three
> fell to https://gitlab.freedesktop.org/mesa/mesa/-/issues/5837, one to
> the git fetch fails, and one to a new timeout I don't think I've seen
> before: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17357425#L1731.
> Of all the sub-MRs involved in those batches, I think two of those
> might have gotten through by dodging the LAVA lab fail.  Marge's batch
> backoff did work, and !14436 and maybe !14433 landed during that time.

Looks like I was a bit off with the numbers, but I double-checked and
these batch MRs containing !14213 all passed and yet it didn't get
merged: !14456, !14452, !14449, !14445, !14440, !14438... so actually
6.

!14436, for whatever reason, was never put into a batch - it worked as
before the change, probably because there weren't other MRs to combine
it with at the time. I've been looking through Marge's history and
can't find a single example where a successful batched merge happened.
Typically, when there's a successful batch MR, the first MR in the
batch gets rebased by Marge but not merged; instead its pipeline gets
run and (seemingly) Marge moves on and picks some other MR, not even
waiting for it to finish. Since (iirc) Marge picks MRs by
least-recently-active and this generates activity, it gets shoved to
the back of the queue and then gets locked in a cycle (!14213 is the
worst, but there are others). I think this happens because Mesa gates
acceptance on the pipeline passing, and therefore when Marge goes to
merge the MRs in a batch she can't, and she just moves on to the
next one.

Connor


Re: git and Marge troubles this week

2022-01-07 Thread Connor Abbott
On Fri, Jan 7, 2022 at 3:18 PM Connor Abbott  wrote:
>
> Unfortunately batch mode has only made it *worse* - I'm sure it's not
> intentional, but it seems that it's still running the CI pipelines
> individually after the batch pipeline passes and not merging them
> right away, which completely defeats the point. See, for example,
> !14213 which has gone through 8 cycles being batched with earlier MRs,
> 5 of those passing only to have an earlier job in the batch spuriously
> fail when actually merging and Marge seemingly giving up on merging it
> (???). As I type it was "lucky" enough to be the first job in a batch
> which passed and is currently running its pipeline and is blocked on
> iris-whl-traces-performance (I have !14453 to disable that broken job,
> but who knows with the Marge chaos when it's going to get merged...).

Ah, I guess I spoke too soon and my analysis of what's happening was
wrong, because in the meantime she merged !14121 without waiting for
that broken job to timeout. It's somehow just completely ignoring
!14213 and merging other things on top of it, I guess? Either way
something is seriously wrong.

>
> Stepping back, I think it was a bad idea to push a "I think this might
> help" type change like this without first carefully monitoring things
> afterwards. An hour or so of babysitting Marge would've caught that
> this wasn't working, and would've prevented many hours of backlog and
> perception of general CI instability.
>
> Connor
>
> On Fri, Jan 7, 2022 at 6:36 AM Emma Anholt  wrote:
> >
> > As you've probably noticed, there have been issues with git access
> > this week.  The fd.o sysadmins are desperately trying to stay on
> > vacation because they do deserve a break, but have still been working
> > on the problem and a couple of solutions haven't worked out yet.
> > Hopefully we'll have some news soon.
> >
> > Due to these ongoing git timeouts, our CI runners have been getting
> > bogged down with stalled jobs and causing a lot of spurious failures
> > where the pipeline doesn't get all its jobs assigned to runners before
> > Marge gives up.  Today, I asked daniels to bump Marge's pipeline
> > timeout to 4 hours (up from 1).  To get MRs flowing at a similar rate
> > despite the longer total pipeline times, we also enabled batch mode as
> > described at 
> > https://github.com/smarkets/marge-bot/blob/master/README.md#batching-merge-requests.
> >
> > It means there are now theoretical cases as described in the README
> > where Marge might merge a set of code that leaves main broken.
> > However, those cases are pretty obscure, and I expect that failure
> > rate to be much lower than the existing "you can merge flaky code"
> > failure rate and worth the risk.
> >
> > Hopefully this gets us all productive again.


Re: git and Marge troubles this week

2022-01-07 Thread Connor Abbott
Unfortunately batch mode has only made it *worse* - I'm sure it's not
intentional, but it seems that it's still running the CI pipelines
individually after the batch pipeline passes and not merging them
right away, which completely defeats the point. See, for example,
!14213 which has gone through 8 cycles being batched with earlier MRs,
5 of those passing only to have an earlier job in the batch spuriously
fail when actually merging and Marge seemingly giving up on merging it
(???). As I type it was "lucky" enough to be the first job in a batch
which passed and is currently running its pipeline and is blocked on
iris-whl-traces-performance (I have !14453 to disable that broken job,
but who knows with the Marge chaos when it's going to get merged...).

Stepping back, I think it was a bad idea to push a "I think this might
help" type change like this without first carefully monitoring things
afterwards. An hour or so of babysitting Marge would've caught that
this wasn't working, and would've prevented many hours of backlog and
perception of general CI instability.

Connor

On Fri, Jan 7, 2022 at 6:36 AM Emma Anholt  wrote:
>
> As you've probably noticed, there have been issues with git access
> this week.  The fd.o sysadmins are desperately trying to stay on
> vacation because they do deserve a break, but have still been working
> on the problem and a couple of solutions haven't worked out yet.
> Hopefully we'll have some news soon.
>
> Due to these ongoing git timeouts, our CI runners have been getting
> bogged down with stalled jobs and causing a lot of spurious failures
> where the pipeline doesn't get all its jobs assigned to runners before
> Marge gives up.  Today, I asked daniels to bump Marge's pipeline
> timeout to 4 hours (up from 1).  To get MRs flowing at a similar rate
> despite the longer total pipeline times, we also enabled batch mode as
> described at 
> https://github.com/smarkets/marge-bot/blob/master/README.md#batching-merge-requests.
>
> It means there are now theoretical cases as described in the README
> where Marge might merge a set of code that leaves main broken.
> However, those cases are pretty obscure, and I expect that failure
> rate to be much lower than the existing "you can merge flaky code"
> failure rate and worth the risk.
>
> Hopefully this gets us all productive again.


Re: [Mesa-dev] SpvOpSelect w/ float operands

2020-11-18 Thread Connor Abbott
On Tue, Nov 17, 2020 at 9:56 PM Brian Paul  wrote:
>
> On 11/17/2020 11:49 AM, Ian Romanick wrote:
> > On 11/17/20 9:25 AM, Brian Paul wrote:
> >>
> >> Using the Intel Vulkan driver, we've found some cases where SpvOpSelect
> >> is returning -0.0 (negative zeros) instead of normal 0.0 depending on
> >> the arguments.
> >
> > Do you have a specific test case that fails?
>
> Yeah, but as with the NMin/NMax issue it's not a simple test case.  It
> comes from a Windows WHCK test suite.
>
>
> > It seems like on some platforms there was an errata about the version of
> > the SEL instruction that is used for min or max that can return the
> > wrong signed zero in some cases.
> >
> > It's also possible that some optimizations are causing problems.  I
> > don't remember exactly how it works in SPIR-V, but does marking those
> > SPIR-V instructions as precise (that's what it was in GLSL) make a
> > difference?
>
> AFAIK, there's only a SPIR-V decoration for tagging things for relaxed
> precision.

The SPIR-V equivalent of "precise" is NoContraction. But for emulating DX
stuff you're supposed to use KHR_shader_float_controls which was
specifically designed for emulating DX floating-point requirements. In
NIR that just marks everything as "exact" if you force correct
NaN/signed zero/Inf handling.
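
For illustration, NoContraction decorates the result id of the
operation that must not be fused -- a hand-written SPIR-V assembly
fragment (ids made up, surrounding module elided):

OpDecorate %mul NoContraction      ; %mul may not be contracted into an fma
...
%mul = OpFMul %float %a %b
%sum = OpFAdd %float %mul %c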

>
> -Brian
>
> >
> >> I'm wondering if "SpvOpSelect x, a, b" for floats is being implemented
> >> with something like "a*x + b*(1-x)" ?  That might explain where the
> >> negative zeros are coming from.
> >>
> >> Our work-around is to implement selection with bitwise operations: (a &
> >> x) | (b & ~x)
> >>
> >> It seems to me that SpvOpSelect shouldn't interpret the bits and just
> >> return an exact copy of the argument.
> >>
> >> -Brian


Re: [Mesa-dev] gl_nir_lower_samplers_as_deref vs drawpixels lowering

2019-11-25 Thread Connor Abbott
Why are you calling gl_nir_lower_samplers_as_deref twice? The entire
point of it is to deal with the crazy legacy GL model where samplers
are "just normal uniforms" that can be embedded in structs and calling
glUniform() can update the sampler binding. It doesn't touch samplers
with an explicit binding at all. It does set nir->info.textures_used,
but that's a small part of what it does. So while you certainly could
make it re-entrant, it really wouldn't make any sense to call it twice,
as it isn't necessary for any mesa-internal samplers beyond the
(trivial) updating of textures_used. So I'd say that the best course of
action is to just update nir->info.textures_used yourself in the
drawpix lowering pass and not call gl_nir_lower_samplers_as_deref.
Maybe longer-term the solution is to fill that out in nir_gather_info?
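
Something like this in the drawpix pass would be enough -- a sketch,
where "unit" stands for whatever texture unit the pass assigned to its
new sampler (names are illustrative, not the actual st/mesa code):

#include "nir.h"

static void
mark_drawpix_sampler_used(nir_shader *nir, unsigned unit)
{
   nir->info.textures_used |= 1u << unit;   /* bitfield of used units */
}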

Connor

On Mon, Nov 25, 2019 at 6:56 AM Dave Airlie  wrote:
>
> I was asked to use some newer radeonsi code in my tgsi info gathering
> wrapper for NIR. One of the things it does is use
> nir->info.textures_used.
>
> Now with the piglit test draw-pixel-with-texture there is a reentrancy
> issue with the passes.
>
> We create a shader and gl_nir_lower_samplers_as_deref get called on
> it, this sets nir->info.textures_used to 1, and it also lowers all
> texture derefs in the shader.
>
> The shader gets used in a variant later for drawpixels, the drawpixels
> lowering then adds its own "drawpix" sampler and accesses it with 
> derefs. Then gl_nir_lower_samplers_as_deref gets called again for the
> whole shader in finalisation, but this time the first set of derefs
> have already been lowered so it only lowers the new drawpix ones, and
> sets nir->info.textures_used to 2 (it's a bitfield), when it should be 3.
>
> Are the other drivers seeing this? Any ideas on what needs to be 
> fixed? The nir sampler lowering could, I suppose, record 
> nir->info.textures_used for non-deref textures as well.
>
> Dave.

Re: [Mesa-dev] [PATCH v2 00/37] panfrost: Support batch pipelining

2019-09-16 Thread Connor Abbott
As a drive-by comment, in case you didn't know, the "standard"
solution for avoiding flushing when BOs are written by the CPU (e.g.
uniform buffer updates), as documented in ARM's performance guide, is to
add a copy-on-write mechanism, so that you have "shadow" BOs when the
original BO is modified by the user. I believe this is implemented in
freedreno, at least there was a talk about it at XDC a few years ago.
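
The core of the idea is small. A hedged sketch, with every helper name
(panfrost_bo_busy, panfrost_bo_create, ...) an illustrative assumption
rather than the real API:

#include <string.h>

/* Called before the CPU writes a BO. If the GPU may still be reading
 * the old contents, redirect the write to a fresh copy instead of
 * flushing and stalling. */
static struct panfrost_bo *
prepare_cpu_write(struct panfrost_context *ctx, struct panfrost_bo *bo)
{
   if (!panfrost_bo_busy(bo))
      return bo;                       /* idle: safe to write in place */

   struct panfrost_bo *shadow = panfrost_bo_create(ctx, bo->size);
   memcpy(shadow->cpu, bo->cpu, bo->size);  /* carry over untouched bytes */
   panfrost_bo_unreference(bo);       /* in-flight batches keep their refs */
   return shadow;                     /* rebind the resource to the copy */
}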

On Mon, Sep 16, 2019 at 4:37 PM Boris Brezillon wrote:
>
> Hello,
>
> This is the second attempt at supporting batch pipelining. This time I
> implemented it using a dependency graph (as suggested by Alyssa and
> Steven) so that batch submission can be delayed even more: the only
> time we flush batches now is when we have an explicit flush or when
> the CPU needs to access a BO (we might want to tweak that a bit to
> avoid the extra latency incurred by this solution). With that in place
> we hope to increase GPU utilization.
>
> A few words about the patches in this series:
>
> * Like the previous version, this series is a mix of cleanups and
>   functional changes. Most of them should be pretty trivial to review
>   and I intend to merge them independently once they have received 
>   proper review (to avoid having to send another patch bomb like this
>   one).
>
> * The "rework BO API" batch has been split to ease review
>
> * Patches 35 and 36 are not mandatory, but I remember reading (I think
>   it was Steven who mentioned that) that draw order matters when
>   queueing render operations for different frames (frame N should
>   ideally be ready before frame N+1). Not sure if enforcing draw call
>   order is enough to guarantee that rendering of frame N always
>   finishes before frame N+1 though.
>
> Regards,
>
> Boris
>
> Boris Brezillon (37):
>   panfrost: Stop exposing internal panfrost_*_batch() functions
>   panfrost: Use the correct type for the bo_handle array
>   panfrost: Add missing panfrost_batch_add_bo() calls
>   panfrost: Add polygon_list to the batch BO set at allocation time
>   panfrost: Kill a useless memset(0) in panfrost_create_context()
>   panfrost: Stop passing has_draws to panfrost_drm_submit_vs_fs_batch()
>   panfrost: Get rid of pan_drm.c
>   panfrost: Move panfrost_bo_{reference,unreference}() to pan_bo.c
>   panfrost: s/PAN_ALLOCATE_/PAN_BO_/
>   panfrost: Move the BO API to its own header
>   panfrost: Stop exposing panfrost_bo_cache_{fetch,put}()
>   panfrost: Don't check if BO is mmaped before calling
> panfrost_bo_mmap()
>   panfrost: Stop passing screen around for BO operations
>   panfrost: Stop using panfrost_bo_release() outside of pan_bo.c
>   panfrost: Add panfrost_bo_{alloc,free}()
>   panfrost: Don't return imported/exported BOs to the cache
>   panfrost: Make sure the BO is 'ready' when picked from the cache
>   panfrost: Add flags to reflect the BO imported/exported state
>   panfrost: Add the panfrost_batch_create_bo() helper
>   panfrost: Add FBO BOs to batch->bos earlier
>   panfrost: Allocate tiler and scratchpad BOs per-batch
>   panfrost: Extend the panfrost_batch_add_bo() API to pass access flags
>   panfrost: Make panfrost_batch->bos a hash table
>   panfrost: Cache GPU accesses to BOs
>   panfrost: Add a batch fence
>   panfrost: Use the per-batch fences to wait on the last submitted batch
>   panfrost: Add a panfrost_freeze_batch() helper
>   panfrost: Start tracking inter-batch dependencies
>   panfrost: Prepare panfrost_fence for batch pipelining
>   panfrost: Add a panfrost_flush_all_batches() helper
>   panfrost: Add a panfrost_flush_batches_accessing_bo() helper
>   panfrost: Kill the explicit serialization in panfrost_batch_submit()
>   panfrost: Get rid of the flush in panfrost_set_framebuffer_state()
>   panfrost: Do fine-grained flushing when preparing BO for CPU accesses
>   panfrost: Rename ctx->batches into ctx->fbo_to_batch
>   panfrost: Take draw call order into account
>   panfrost/ci: New tests are passing
>
>  .../drivers/panfrost/ci/expected-failures.txt |   4 -
>  src/gallium/drivers/panfrost/meson.build  |   1 -
>  src/gallium/drivers/panfrost/pan_allocate.c   |  22 +-
>  src/gallium/drivers/panfrost/pan_allocate.h   |  20 -
>  src/gallium/drivers/panfrost/pan_assemble.c   |   3 +-
>  src/gallium/drivers/panfrost/pan_blend_cso.c  |  13 +-
>  src/gallium/drivers/panfrost/pan_bo.c | 331 +++-
>  src/gallium/drivers/panfrost/pan_bo.h | 130 +++
>  src/gallium/drivers/panfrost/pan_compute.c|   2 +-
>  src/gallium/drivers/panfrost/pan_context.c| 175 ++--
>  src/gallium/drivers/panfrost/pan_context.h|  22 +-
>  src/gallium/drivers/panfrost/pan_drm.c| 394 -
>  src/gallium/drivers/panfrost/pan_fragment.c   |   3 -
>  src/gallium/drivers/panfrost/pan_instancing.c |   6 +-
>  src/gallium/drivers/panfrost/pan_job.c| 760 --
>  src/gallium/drivers/panfrost/pan_job.h|  85 +-
>  src/gallium/drivers/panfrost/pan_mfbd.c   |   1 +
>  

[Mesa-dev] [PATCH] nir/lower_io_to_temporaries: Fix hash table leak

2019-07-08 Thread Connor Abbott
Fixes: c45f5db527252384395e55fb1149b673ec7b5fa8 ("nir/lower_io_to_temporaries: 
Handle interpolation intrinsics")
---
Whoops...

 src/compiler/nir/nir_lower_io_to_temporaries.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/compiler/nir/nir_lower_io_to_temporaries.c b/src/compiler/nir/nir_lower_io_to_temporaries.c
index c865c7de10c..f92489b9d51 100644
--- a/src/compiler/nir/nir_lower_io_to_temporaries.c
+++ b/src/compiler/nir/nir_lower_io_to_temporaries.c
@@ -364,4 +364,6 @@ nir_lower_io_to_temporaries(nir_shader *shader, nir_function_impl *entrypoint,
exec_list_append(&shader->globals, &state.old_outputs);
 
nir_fixup_deref_modes(shader);
+
+   _mesa_hash_table_destroy(state.input_map, NULL);
 }
-- 
2.17.2


Re: [Mesa-dev] Possible bug in nir_algebraic?

2019-06-22 Thread Connor Abbott
I haven't thought about whether it's algebraically correct, but
otherwise your pattern looks fine to me.

If you haven't noticed already, I added some commented out code to
nir_replace_instr() that will print out each pattern that's matched.
The first thing I'd do is move that up to the beginning of the
function, so that it prints potential matches instead of actual
matches. If your pattern shows up as a potential match, then it's a
problem with match_expression() and not the automaton, and at that
point you can start setting breakpoints and stepping through it.

If it's not a potential match because the automaton filtered it out,
then debugging is currently a little harder. You'll have to add some
debugging code to ${pass_name}_pre_block() that prints out the
assigned state for each ALU instruction. The state is an integer which
is conceptually an index into TreeAutomaton.states for the constructed
TreeAutomaton. So if an instruction has state n, then
automaton.states[n] should be a set of potential partial matches for
that instruction. Note that variables like a, b, c etc. are converted
into __wildcard while constants are all converted into __constant. So
for example, ssa_2 should have a set { __const, __wildcard } as it can
be matched as a variable or as a constant (we actually explicitly
construct this set in TreeAutomaton._build_table). ssa_83 should have
a set { __wildcard, (neg __wildcard) } since it can be matched either
as a variable or as something like (neg a). (yes, this is very similar
to the subset construction for getting a DFA from an NFA...). Unless
there's a bug, each of these "match sets" should contain the
appropriate subset of your pattern until ssa_97 which should have the
full pattern as one of the entries in the set. Let us know the details
if that's not the case.

I think that we could definitely do better when it comes to debugging
why the automaton didn't match something. We could emit the
automaton's state list in C, and then have a debugging option to print
the match set for each instruction so you'd know where something went
awry. I didn't do that earlier since I didn't have a need for it while
bringing up the automaton, but we could add it if it helps. That being
said, hopefully you won't need it this time :)

Best,

Connor

On Sat, Jun 22, 2019 at 2:26 AM Ian Romanick  wrote:
>
> I have encountered what I believe to be a bug in nir_algebraic.  Since
> the rewrite to use automata, I'm not sure how to begin debugging it.
> I'm looking for some suggestions... even if the suggestion is, "Fix your
> patterns."
>
> I have added a pattern like:
>
>(('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
>   ('fadd', 1.0, ('fneg', a))),
>  ('fmul', ('flrp', a, 1.0, a), b)),
> ('flrp', 1.0, b, a), '!options->lower_flrp32'),
>
> While using NIR_PRINT=1, I see this in my instruction stream:
>
> vec1 32 ssa_2 = load_const (0x3f80 /* 1.00 */)
> ...
> vec1 32 ssa_196 = intrinsic load_uniform (ssa_195) (68, 4, 160)
> vec1 32 ssa_83 = fneg ssa_196
> vec1 32 ssa_84 = fadd ssa_83, ssa_2
> vec1 32 ssa_85 = fmul ssa_84, ssa_84
> ...
> vec1 32 ssa_95 = flrp ssa_196, ssa_2, ssa_196
> vec1 32 ssa_96 = fmul ssa_78, ssa_95
> vec1 32 ssa_97 = fadd ssa_96, ssa_85
>
> But nir_opt_algebraic does not make any progress.  It sure looks like it
> should trigger with a = ssa_196 and b = ssa_78.
>
> However, progress is made if I change the pattern to
>
>(('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
>   c),
>  ('fmul', ('flrp', a, 1.0, a), b)),
> ('flrp', 1.0, b, a), '!options->lower_flrp32'),
>
> ssa_85 is definitely ('fmul', ssa_84, ssa_84), and ssa_84 is definitely
> ('fadd', 1.0, ('fneg', ssa_196))... both times. :)

[Mesa-dev] [PATCH] ac, radeonsi: Always mark buffer stores as inaccessiblememonly

2019-06-18 Thread Connor Abbott
inaccessiblememonly means that it doesn't modify memory accessible via
normal LLVM pointers. This lets LLVM's dead store elimination, memcpy
forwarding, etc. ignore functions with this attribute. We don't
represent descriptors as pointers, so this property is always true of
buffer and image stores. There are plans to represent descriptors via
pointers, but that just means that then nothing will be inaccessiblememonly,
as LLVM will then understand loads/stores via its usual alias analysis.

Radeonsi was mistakenly only setting it if the driver could prove that
there were no reads, and then it was cargo-culted into ac_llvm_build
and ac_llvm_to_nir. Rip it out of everything.
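
In LLVM IR terms the attribute looks like this (an illustrative
declaration of the legacy buffer-store intrinsic):

; DSE, memcpy forwarding, etc. may now ignore this call when reasoning
; about ordinary pointer memory, since descriptors aren't LLVM pointers.
declare void @llvm.amdgcn.buffer.store.f32(float, <4 x i32>, i32, i32, i1, i1) #0

attributes #0 = { inaccessiblememonly nounwind }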

statistics with nir enabled:

Totals from affected shaders:
SGPRS: 152 -> 152 (0.00 %)
VGPRS: 128 -> 132 (3.12 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 9324 -> 9244 (-0.86 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 17 -> 17 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

The only difference was a manhattan31 shader.

Acked-by: Timothy Arceri 
---

I included this in
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1018 since it
was related to my goal of making sure that the NIR code added all the
attributes that the TGSI code does for buffer stores/loads, but I'm
sending it here for visibility since it affects the old paths too and
involves some LLVM knowledge.

 src/amd/common/ac_llvm_build.c| 61 +++
 src/amd/common/ac_llvm_build.h| 16 ++---
 src/amd/common/ac_llvm_util.h |  7 ---
 src/amd/common/ac_nir_to_llvm.c   | 12 ++--
 src/amd/vulkan/radv_nir_to_llvm.c | 20 +++---
 .../radeonsi/si_compute_prim_discard.c|  4 +-
 src/gallium/drivers/radeonsi/si_shader.c  | 26 
 .../drivers/radeonsi/si_shader_tgsi_mem.c |  8 +--
 8 files changed, 61 insertions(+), 93 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index b93fdde023e..1e6247ad72e 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -1116,7 +1116,6 @@ ac_build_llvm7_buffer_store_common(struct ac_llvm_context *ctx,
   unsigned num_channels,
   bool glc,
   bool slc,
-  bool writeonly_memory,
   bool use_format)
 {
LLVMValueRef args[] = {
@@ -1141,7 +1140,7 @@ ac_build_llvm7_buffer_store_common(struct ac_llvm_context *ctx,
}
 
ac_build_intrinsic(ctx, name, ctx->voidt, args, ARRAY_SIZE(args),
-  ac_get_store_intr_attribs(writeonly_memory));
+  AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY);
 }
 
 static void
@@ -1155,7 +1154,6 @@ ac_build_llvm8_buffer_store_common(struct ac_llvm_context *ctx,
   LLVMTypeRef return_channel_type,
   bool glc,
   bool slc,
-  bool writeonly_memory,
   bool use_format,
   bool structurized)
 {
@@ -1184,7 +1182,7 @@ ac_build_llvm8_buffer_store_common(struct ac_llvm_context *ctx,
}
 
ac_build_intrinsic(ctx, name, ctx->voidt, args, idx,
-  ac_get_store_intr_attribs(writeonly_memory));
+  AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY);
 }
 
 void
@@ -1195,18 +1193,17 @@ ac_build_buffer_store_format(struct ac_llvm_context *ctx,
 LLVMValueRef voffset,
 unsigned num_channels,
 bool glc,
-bool slc,
-bool writeonly_memory)
+bool slc)
 {
if (HAVE_LLVM >= 0x800) {
ac_build_llvm8_buffer_store_common(ctx, rsrc, data, vindex,
   voffset, NULL, num_channels,
   ctx->f32, glc, slc,
-  writeonly_memory, true, 
true);
+  true, true);
} else {
ac_build_llvm7_buffer_store_common(ctx, rsrc, data, vindex, 
voffset,
   num_channels, glc, slc,
-  writeonly_memory, true);
+  true);
}
 }
 
@@ -1224,7 +1221,6 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx,
unsigned inst_offset,
bool glc,
bool slc,
-   bool writeonly_memory,
bool 

[Mesa-dev] [PATCH] ac/nir: Remove stale TODO

2019-06-05 Thread Connor Abbott
While we're here, copy the comment explaining this from radeonsi.
---
 src/amd/common/ac_nir_to_llvm.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 833b1e54abc..11de22a8cbd 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -3878,7 +3878,13 @@ static void visit_tex(struct ac_nir_context *ctx, nir_tex_instr *instr)
args.offset = NULL;
}
 
-   /* TODO TG4 support */
+   /* DMASK was repurposed for GATHER4. 4 components are always
+* returned and DMASK works like a swizzle - it selects
+* the component to fetch. The only valid DMASK values are
+* 1=red, 2=green, 4=blue, 8=alpha. (e.g. 1 returns
+* (red,red,red,red) etc.) The ISA document doesn't mention
+* this.
+*/
args.dmask = 0xf;
if (instr->op == nir_texop_tg4) {
if (instr->is_shadow)
-- 
2.17.2


[Mesa-dev] [PATCH] radeonsi: Don't force dcc disable for loads

2019-06-05 Thread Connor Abbott
When e9d935ed0e2 added force_dcc_off(), we forced it off for any
preloaded image descriptor that had stores associated with it, since
the same preloaded descriptors were used for loads and stores. However,
when the preloading was removed in 16be87c9042, the existing logic was
kept despite it not being necessary anymore. The comment above
force_dcc_off() only mentions stores, so only force DCC off for stores.

Cc: Nicolai Hähnle 
Cc: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_shader_nir.c  | 6 --
 src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c | 7 ---
 2 files changed, 13 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c b/src/gallium/drivers/radeonsi/si_shader_nir.c
index 72e6ffbac8a..a852283aff0 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -997,16 +997,10 @@ si_nir_load_sampler_desc(struct ac_shader_abi *abi,
 bool write, bool bindless)
 {
struct si_shader_context *ctx = si_shader_context_from_abi(abi);
-   const struct tgsi_shader_info *info = &ctx->shader->selector->info;
LLVMBuilderRef builder = ctx->ac.builder;
unsigned const_index = base_index + constant_index;
bool dcc_off = write;
 
-   /* TODO: images_store and images_atomic are not set */
-   if (!dynamic_index && image &&
-   (info->images_store | info->images_atomic) & (1 << const_index))
-   dcc_off = true;
-
assert(!descriptor_set);
assert(!image || desc_type == AC_DESC_IMAGE || desc_type == 
AC_DESC_BUFFER);
 
diff --git a/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c b/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
index c5704bc0eae..53075f1b546 100644
--- a/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
+++ b/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
@@ -218,15 +218,8 @@ image_fetch_rsrc(
bool dcc_off = is_store;
 
if (!image->Register.Indirect) {
-   const struct tgsi_shader_info *info = bld_base->info;
-   unsigned images_writemask = info->images_store |
-   info->images_atomic;
-
index = LLVMConstInt(ctx->i32,
 si_get_image_slot(image->Register.Index), 
0);
-
-   if (images_writemask & (1 << image->Register.Index))
-   dcc_off = true;
} else {
/* From the GL_ARB_shader_image_load_store extension spec:
 *
-- 
2.17.2


Re: [Mesa-dev] [PATCH] radeonsi/nir: Fix type in bindless address computation

2019-06-04 Thread Connor Abbott
On Tue, Jun 4, 2019 at 4:24 PM Juan A. Suarez Romero wrote:
>
> On Fri, 2019-05-31 at 14:55 +0200, Connor Abbott wrote:
> > Bindless handles in GL are 64-bit. This fixes an assert failure in LLVM.
>
> Does it make sense to nominate this for stable release?

I don't think so, since radeonsi NIR still isn't enabled by default
yet. This is work towards being able to do that.

Connor

[Mesa-dev] [PATCH] radeonsi/nir: Fix type in bindless address computation

2019-05-31 Thread Connor Abbott
Bindless handles in GL are 64-bit. This fixes an assert failure in LLVM.
---

With this patch, we now have Piglit parity in debug mode.

 src/gallium/drivers/radeonsi/si_shader_nir.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c b/src/gallium/drivers/radeonsi/si_shader_nir.c
index 19ed71ae05d..72e6ffbac8a 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -1020,7 +1020,7 @@ si_nir_load_sampler_desc(struct ac_shader_abi *abi,
 * 16-dword slots for now.
 */
dynamic_index = LLVMBuildMul(ctx->ac.builder, 
dynamic_index,
-LLVMConstInt(ctx->i32, 2, 0), "");
+LLVMConstInt(ctx->i64, 2, 0), "");
 
return si_load_image_desc(ctx, list, dynamic_index, 
desc_type,
  dcc_off, true);
@@ -1032,7 +1032,7 @@ si_nir_load_sampler_desc(struct ac_shader_abi *abi,
 * to prevent incorrect code generation and hangs.
 */
dynamic_index = LLVMBuildMul(ctx->ac.builder, dynamic_index,
-LLVMConstInt(ctx->i32, 2, 0), "");
+LLVMConstInt(ctx->i64, 2, 0), "");
list = ac_build_pointer_add(&ctx->ac, list, dynamic_index);
return si_load_sampler_desc(ctx, list, ctx->i32_0, desc_type);
}
-- 
2.17.2


[Mesa-dev] [PATCH] radeonsi: Fix editorconfig

2019-05-17 Thread Connor Abbott
At least on vim, indenting doesn't work without this. Copied from
src/amd/vulkan.
---
 src/gallium/drivers/radeonsi/.editorconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/radeonsi/.editorconfig b/src/gallium/drivers/radeonsi/.editorconfig
index cc8e11ffd65..21a3c7d1274 100644
--- a/src/gallium/drivers/radeonsi/.editorconfig
+++ b/src/gallium/drivers/radeonsi/.editorconfig
@@ -1,2 +1,3 @@
 [*.{c,h}]
 indent_style = tab
+indent_size = tab
-- 
2.17.2


Re: [Mesa-dev] [PATCH] docs: advice to resolve discussion on gitlab MR doc

2019-05-16 Thread Connor Abbott
Some grammar nits:

- "Resolve Discussion" goes before "button" as it modifies it.
- It's either "This way..." or "In this manner...", not "In this
way...", although the latter is a little too stilted/over-formal here.
- This isn't a hypothetical or another situation where "would know..."
is appropriate.
- "...which didn't" (since it's short for "which didn't get handled").

The corrected text is:

After an update, for the feedback you handled, close the feedback
discussion with the "Resolve Discussion" button. This way the reviewer
knows which feedback got handled and which didn't.

I definitely didn't notice this when I started using Gitlab either.
With the fixed text:

Reviewed-by: Connor Abbott 

On Thu, May 16, 2019 at 11:36 AM Alejandro Piñeiro  wrote:
>
> For newcomers to gitlab, it is not evident that it is better to press
> the "Resolve Discussion" button when you update your branch handling
> feedback.
> ---
>
> As the commit message says, it is not always evident. I was pointed to
> do that when I started to use gitlab, and just today I mentioned it to
> two different people that didn't know about that.
>
> Having said so, I feel that the specific text needs some polishing 
> first, so any suggestion is welcome.
>
>  docs/submittingpatches.html | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/docs/submittingpatches.html b/docs/submittingpatches.html
> index 020e73d09ec..147b97d76e1 100644
> --- a/docs/submittingpatches.html
> +++ b/docs/submittingpatches.html
> @@ -258,6 +258,10 @@ your email administrator for this.)
>  
>  
>Make changes and update your branch based on feedback
> +  After an update, for the feedback you handled, close the
> +  feedback discussion with the button "Resolve Discussion". In this
> +  way the reviewer would know which feedback got handled and which
> +  not.
>Old, stale MR may be closed, but you can reopen it if you
>  still want to pursue the changes
>You should periodically check to see if your MR needs to be
> --
> 2.19.1
>

Re: [Mesa-dev] [RFC PATCH 2/2] panfrost/midgard: Ignore imov/fmov distinction

2019-05-13 Thread Connor Abbott
On Mon, May 13, 2019 at 4:48 PM Alyssa Rosenzweig  wrote:
>
> > Using nir_gather_ssa_types is wrong. In both Midgard and NIR, SSA
> > values are just a bunch of bits with no int/float distinction, and
> > therefore you shouldn't need to know how a register is used to compile
> > the instruction producing it. The only distinction between imov and
> > fmov, in both NIR and the Midgard ISA, is what modifiers they support.
> > What you want to do is probably what you originally did, i.e. use fmov
> > for NIR fmov as well as fabs and fneg, imov for imov (and if we delete
> > fmov, just using it for fabs and fneg). If this fixes any bugs, it's
> > just papering over bugs in your backend, and you should fix them
> > instead. Note that later GLSL versions have intBitsToFloat() and
> > floatBitsToInt() which blow away any assumption that the types will be
> > consistent.
>
> That's how my mental model was, but it doesn't look to be the case: the
> blob is very consistent in emitting imov/fmov distinctly (and
> icsel/fcsel), and a lot of bizarre bugs come up if you do it any other
> way, even absent any modifiers. There's -some- difference, it's just not
> obvious what. Although I'll admit playing with intBitsToFloat/etc allows
> seemingly wrong shaders even with the blob...

This means that you're probably missing something else, and this is
papering over another bug. What the blob happens to use is irrelevant,
and of course you can always force it to do the "wrong" move by
reinterpreting things using intBitsToFloat()/floatBitsToInt().

Re: [Mesa-dev] [RFC PATCH 2/2] panfrost/midgard: Ignore imov/fmov distinction

2019-05-13 Thread Connor Abbott
Using nir_gather_ssa_types is wrong. In both Midgard and NIR, SSA
values are just a bunch of bits with no int/float distinction, and
therefore you shouldn't need to know how a register is used to compile
the instruction producing it. The only distinction between imov and
fmov, in both NIR and the Midgard ISA, is what modifiers they support.
What you want to do is probably what you originally did, i.e. use fmov
for NIR fmov as well as fabs and fneg, imov for imov (and if we delete
fmov, just using it for fabs and fneg). If this fixes any bugs, it's
just papering over bugs in your backend, and you should fix them
instead. Note that later GLSL versions have intBitsToFloat() and
floatBitsToInt() which blow away any assumption that the types will be
consistent.
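
A trivial GLSL case of why usage can't pin down a type:

#version 330
uniform float u;
out vec4 color;

void main()
{
    int bits = floatBitsToInt(u);       // reinterpret the same 32 bits
    bits &= 0x7fffffff;                 // integer op: clear the sign bit
    color = vec4(intBitsToFloat(bits)); // back to "float": this is abs(u)
}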

On Mon, May 13, 2019 at 5:48 AM Alyssa Rosenzweig  wrote:
>
> We use nir_gather_ssa_types, rather than the instruction name, to decide
> how moves should be compiled. This is important since the imov/fmov
> never mapped to what Midgard needed to begin with. This should allow
> Jason's imov/fmov merger to proceed without regressing Panfrost, since
> this is one less buggy behaviour we rely on.
>
> A great deal of future work is required for handling registers, which
> likely accounts for some regressions in dEQP.
>
> Signed-off-by: Alyssa Rosenzweig 
> Cc: Jason Ekstrand 
> ---
>  .../panfrost/midgard/midgard_compile.c| 81 ---
>  1 file changed, 53 insertions(+), 28 deletions(-)
>
> diff --git a/src/gallium/drivers/panfrost/midgard/midgard_compile.c b/src/gallium/drivers/panfrost/midgard/midgard_compile.c
> index 4a26ba769b2..b0985f55635 100644
> --- a/src/gallium/drivers/panfrost/midgard/midgard_compile.c
> +++ b/src/gallium/drivers/panfrost/midgard/midgard_compile.c
> @@ -511,6 +511,10 @@ typedef struct compiler_context {
>  unsigned sysvals[MAX_SYSVAL_COUNT];
>  unsigned sysval_count;
>  struct hash_table_u64 *sysval_to_id;
> +
> +/* Mapping for int/float words, filled out */
> +BITSET_WORD *float_types;
> +BITSET_WORD *int_types;
>  } compiler_context;
>
>  /* Append instruction to end of current block */
> @@ -1153,10 +1157,41 @@ emit_indirect_offset(compiler_context *ctx, nir_src *src)
>  emit_mir_instruction(ctx, ins);
>  }
>
> +static bool
> +is_probably_int(compiler_context *ctx, unsigned ssa_index)
> +{
> +/* TODO: Extend to registers XXX... assume float for now since that's
> + * slightly safer for reasons we don't totally get.. */
> +
> +if (ssa_index >= ctx->func->impl->ssa_alloc)
> +return false;
> +
> +bool is_float = BITSET_TEST(ctx->float_types, ssa_index);
> +bool is_int = BITSET_TEST(ctx->int_types, ssa_index);
> +
> +if (is_float && !is_int)
> +return false;
> +
> +if (is_int && !is_float)
> +return true;
> +
> +/* TODO: Other cases.. if we're not sure but it is SSA, try int... 
> this
> + * is all kinda arbitrary right now */
> +return true;
> +}
> +
>  #define ALU_CASE(nir, _op) \
> case nir_op_##nir: \
> op = midgard_alu_op_##_op; \
> break;
> +
> +#define ALU_CASE_IF(nir, _fop, _iop) \
> +case nir_op_##nir: \
> +op = is_probably_int(ctx, dest) \
> +? midgard_alu_op_##_iop : \
> +  midgard_alu_op_##_fop; \
> +break;
> +
>  static bool
>  nir_is_fzero_constant(nir_src src)
>  {
> @@ -1198,7 +1233,6 @@ emit_alu(compiler_context *ctx, nir_alu_instr *instr)
>  ALU_CASE(imax, imax);
>  ALU_CASE(umin, umin);
>  ALU_CASE(umax, umax);
> -ALU_CASE(fmov, fmov);
>  ALU_CASE(ffloor, ffloor);
>  ALU_CASE(fround_even, froundeven);
>  ALU_CASE(ftrunc, ftrunc);
> @@ -1209,7 +1243,10 @@ emit_alu(compiler_context *ctx, nir_alu_instr *instr)
>  ALU_CASE(isub, isub);
>  ALU_CASE(imul, imul);
>  ALU_CASE(iabs, iabs);
> -ALU_CASE(imov, imov);
> +
> +/* NIR is typeless here */
> +ALU_CASE_IF(imov, fmov, imov);
> +ALU_CASE_IF(fmov, fmov, imov);
>
>  ALU_CASE(feq32, feq);
>  ALU_CASE(fne32, fne);
> @@ -3361,31 +3398,6 @@ midgard_opt_copy_prop_tex(compiler_context *ctx, midgard_block *block)
>  return progress;
>  }
>
> -/* We don't really understand the imov/fmov split, so always use fmov (but 
> let
> - * it be imov in the IR so we don't do unsafe floating point "optimizations"
> - * and break things */
> -
> -static void
> -midgard_imov_workaround(compiler_context *ctx, midgard_block *block)
> -{
> -mir_foreach_instr_in_block_safe(block, ins) {
> -if (ins->type != TAG_ALU_4) continue;
> -if (ins->alu.op != 

Re: [Mesa-dev] [RFC PATCH 03/17] eir: add live ranges pass

2019-05-10 Thread Connor Abbott
On Fri, May 10, 2019 at 11:47 AM Connor Abbott  wrote:
>
>
> This way of representing liveness, and then using a coloring register
> allocator, is a common anti-pattern in Mesa, that was initially copied
> from i965 which dates back to before we knew any better. I really
> don't want to see it spread to yet another driver :(.
>
> Representing live ranges like this is imprecise. If I have a program like 
> this:
>
> foo = ...
> if (...) {
>bar = ...
>... = bar; /* last use of "bar" */
> }
> ... = foo;

Whoops, that should actually read:

foo = ...
if (...) {
   bar = ...
   ... = bar; /* last use of "bar" */
} else {
   ... = foo;
}

>
> Then it will say that foo and bar interfere, even when they don't.
>
> Now, this approximation does make things a bit simpler. But, it turns
> out that if you're willing to make it, then the interference graph is
> trivially colorable via a simple linear-time algorithm. This is the
> basis of "linear-scan" register allocators, including the one in LLVM.
> If you want to go down this route, you can, but this hybrid is just
> useless as it gives you the worst of both worlds.
>
> If you want to properly build up the interference graph, it's actually
> not that hard. After doing the inter-basic-block liveness analysis,
> for each block, you initialize a bitset to the live-out bitset. Then
> you walk the block backwards, updating it at each instruction exactly
> as in liveness analysis, so that it always represents the live
> registers before each instruction. Then you add interferences between
> all of the live registers and the register(s) defined at the
> instruction.
>
> One last pitfall I'll mention is that in the real world, you'll also
> need to use reachability. If you have something like
>
> if (...)
>foo = ... /* only definition of "foo" */
>
> ... = foo;
>
> where foo is only partially defined, then the liveness of foo will
> "leak" through the if. To fix this you need to consider what's called
> "reachability," i.e. something is only live if, in addition to
> potentially being used sometime later, it is reachable (potentially
> defined sometime earlier). Reachability analysis is exactly like
> liveness analysis, but everything is backwards. i965 does this
> properly nowadays, and the change had a huge effect on spilling/RA.
>

Re: [Mesa-dev] [RFC PATCH 03/17] eir: add live ranges pass

2019-05-10 Thread Connor Abbott
On Fri, May 10, 2019 at 11:09 AM Christian Gmeiner wrote:
>
> Signed-off-by: Christian Gmeiner 
> ---
>  src/etnaviv/compiler/eir.h|   3 +
>  src/etnaviv/compiler/eir_live_variables.c | 258 ++
>  src/etnaviv/compiler/meson.build  |   1 +
>  .../compiler/tests/eir_live_variables.cpp | 162 +++
>  src/etnaviv/compiler/tests/meson.build|  11 +
>  5 files changed, 435 insertions(+)
>  create mode 100644 src/etnaviv/compiler/eir_live_variables.c
>  create mode 100644 src/etnaviv/compiler/tests/eir_live_variables.cpp
>
> diff --git a/src/etnaviv/compiler/eir.h b/src/etnaviv/compiler/eir.h
> index a05b12de94b..38c6af4e07e 100644
> --- a/src/etnaviv/compiler/eir.h
> +++ b/src/etnaviv/compiler/eir.h
> @@ -151,6 +151,9 @@ struct eir
> /* keep track of number of allocated uniforms */
> struct util_dynarray uniform_alloc;
> unsigned uniform_offset;
> +
> +   /* Live ranges of temp registers */
> +   int *temp_start, *temp_end;

This way of representing liveness, and then using a coloring register
allocator, is a common anti-pattern in Mesa, that was initially copied
from i965 which dates back to before we knew any better. I really
don't want to see it spread to yet another driver :(.

Representing live ranges like this is imprecise. If I have a program like this:

foo = ...
if (...) {
   bar = ...
   ... = bar; /* last use of "bar" */
}
... = foo;

Then it will say that foo and bar interfere, even when they don't.

Now, this approximation does make things a bit simpler. But, it turns
out that if you're willing to make it, then the interference graph is
trivially colorable via a simple linear-time algorithm. This is the
basis of "linear-scan" register allocators, including the one in LLVM.
If you want to go down this route, you can, but this hybrid is just
useless as it gives you the worst of both worlds.

If you want to properly build up the interference graph, it's actually
not that hard. After doing the inter-basic-block liveness analysis,
for each block, you initialize a bitset to the live-out bitset. Then
you walk the block backwards, updating it at each instruction exactly
as in liveness analysis, so that it always represents the live
registers before each instruction. Then you add interferences between
all of the live registers and the register(s) defined at the
instruction.
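
Concretely, the per-block walk might look something like this (untested
sketch -- "struct instr" and its fields are made-up stand-ins for the
driver's real IR, but the bitset and RA helpers are the real ones from
util/bitset.h and util/register_allocate.h):

#include <string.h>
#include "util/bitset.h"
#include "util/register_allocate.h"

/* Hypothetical IR: each instruction writes one register and reads up
 * to three. */
struct instr {
   unsigned def;
   unsigned num_srcs;
   unsigned src[3];
};

static void
add_block_interferences(struct ra_graph *g,
                        const struct instr *instrs, unsigned num_instrs,
                        const BITSET_WORD *live_out,
                        BITSET_WORD *live, /* scratch, BITSET_WORDS(num_regs) */
                        unsigned num_regs)
{
   /* Start with the block's live-out set... */
   memcpy(live, live_out, BITSET_WORDS(num_regs) * sizeof(BITSET_WORD));

   /* ...and walk the block backwards. */
   for (int i = num_instrs - 1; i >= 0; i--) {
      const struct instr *ins = &instrs[i];

      /* The def interferes with everything live across the write. */
      for (unsigned r = 0; r < num_regs; r++) {
         if (r != ins->def && BITSET_TEST(live, r))
            ra_add_node_interference(g, ins->def, r);
      }

      /* Liveness transfer function: kill the def, then add the uses,
       * so "live" now describes the point just before "ins". */
      BITSET_CLEAR(live, ins->def);
      for (unsigned s = 0; s < ins->num_srcs; s++)
         BITSET_SET(live, ins->src[s]);
   }
}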

One last pitfall I'll mention is that in the real world, you'll also
need to use reachability. If you have something like

if (...)
   foo = ... /* only definition of "foo" */

... = foo;

where foo is only partially defined, then the liveness of foo will
"leak" through the if. To fix this you need to consider what's called
"reachability," i.e. something is only live if, in addition to
potentially being used sometime later, it is reachable (potentially
defined sometime earlier). Reachability analysis is exactly like
liveness analysis, but everything is backwards. i965 does this
properly nowadays, and the change had a huge effect on spilling/RA.
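
In code it's just the mirror image of the sketch above (again untested,
same hypothetical "struct instr"): a forward walk where definitions turn
bits on instead of a backward walk where uses do.

/* Forward walk: "reach" starts as the block's reach-in, i.e. the union
 * of the predecessors' reach-out, iterated to a fixed point exactly
 * like the liveness analysis. */
static void
update_block_reachability(BITSET_WORD *reach,
                          const struct instr *instrs, unsigned num_instrs)
{
   for (unsigned i = 0; i < num_instrs; i++)
      BITSET_SET(reach, instrs[i].def);
}

A register is then treated as live at a point only if it is set in both
the live bitset and the reach bitset, which is what stops the partially
defined "foo" above from leaking backwards through the if.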

>  };
>
>  struct eir_info {
> diff --git a/src/etnaviv/compiler/eir_live_variables.c 
> b/src/etnaviv/compiler/eir_live_variables.c
> new file mode 100644
> index 000..fe94e7a2a3d
> --- /dev/null
> +++ b/src/etnaviv/compiler/eir_live_variables.c
> @@ -0,0 +1,258 @@
> +/*
> + * Copyright (c) 2018 Etnaviv Project
> + * Copyright (C) 2018 Zodiac Inflight Innovations
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sub license,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> + * next paragraph) shall be included in all copies or substantial portions
> + * of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + * Authors:
> + *Christian Gmeiner 
> + */
> +
> +#include "eir.h"
> +#include "util/bitset.h"
> +#include "util/ralloc.h"
> +#include "util/u_math.h"
> +
> +#define MAX_INSTRUCTION (1 << 30)
> +
> +struct block_data {
> +   BITSET_WORD *def;
> +   BITSET_WORD *use;
> +   BITSET_WORD *livein;
> +   BITSET_WORD *liveout;
> +   int start_ip, end_ip;
> +};
> +
> +/* Returns the variable index 

Re: [Mesa-dev] [PATCH] [Panfrost] [Bifrost] Add a few missing ops to disassembler

2019-05-06 Thread Connor Abbott
On Mon, May 6, 2019 at 1:26 AM  wrote:
>
> From: Ryan Houdek 
>
> Adds Round, RoundEven, and the two Trunc variants to the disassembler.
> Additionally adds two control register types that were used with these
> instructions.
>
> Signed-off-by: Ryan Houdek 
> ---
>  src/gallium/drivers/panfrost/bifrost/disassemble.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/src/gallium/drivers/panfrost/bifrost/disassemble.c 
> b/src/gallium/drivers/panfrost/bifrost/disassemble.c
> index daadf257896..585f6ced107 100644
> --- a/src/gallium/drivers/panfrost/bifrost/disassemble.c
> +++ b/src/gallium/drivers/panfrost/bifrost/disassemble.c
> @@ -287,6 +287,7 @@ static struct bifrost_reg_ctrl DecodeRegCtrl(struct 
> bifrost_regs regs)
>  case 1:
>  decoded.fma_write_unit = REG_WRITE_TWO;
>  break;
> +case 2:

New modes for something as simple as an instruction that rounds seem
highly suspicious... maybe this is the 64-bit clause mode? That has an
entirely different reg ctrl, and I haven't ported the changes needed
from SPD to Mesa yet. I also can't reproduce it myself with the
offline compiler. Can you share an example where the blob uses one of
these?

>  case 3:
>  decoded.fma_write_unit = REG_WRITE_TWO;
>  decoded.read_reg3 = true;
> @@ -318,6 +319,8 @@ static struct bifrost_reg_ctrl DecodeRegCtrl(struct 
> bifrost_regs regs)
>  decoded.add_write_unit = REG_WRITE_TWO;
>  decoded.clause_start = true;
>  break;
> +
> +case 7:
>  case 15:
>  decoded.fma_write_unit = REG_WRITE_THREE;
>  decoded.add_write_unit = REG_WRITE_TWO;
> @@ -681,10 +684,13 @@ static const struct fma_op_info FMAOpInfos[] = {
>  { 0xe0bc0, "UMAX3", FMA_THREE_SRC },
>  { 0xe0c00, "IMIN3", FMA_THREE_SRC },
>  { 0xe0c40, "UMIN3", FMA_THREE_SRC },
> +{ 0xe0ec5, "ROUND", FMA_ONE_SRC },
>  { 0xe0f40, "CSEL", FMA_THREE_SRC }, // src2 != 0 ? src1 : src0
>  { 0xe0fc0, "MUX.i32", FMA_THREE_SRC }, // see ADD comment
> +{ 0xe1805, "ROUNDEVEN", FMA_ONE_SRC },
>  { 0xe1845, "CEIL", FMA_ONE_SRC },
>  { 0xe1885, "FLOOR", FMA_ONE_SRC },
> +{ 0xe18c5, "TRUNC", FMA_ONE_SRC },
>  { 0xe19b0, "ATAN_LDEXP.Y.f32", FMA_TWO_SRC },
>  { 0xe19b8, "ATAN_LDEXP.X.f32", FMA_TWO_SRC },
>  // These instructions in the FMA slot, together with 
> LSHIFT_ADD_HIGH32.i32
> @@ -1177,6 +1183,7 @@ static const struct add_op_info add_op_infos[] = {
>  { 0x07bc5, "FLOG_FREXPE", ADD_ONE_SRC },
>  { 0x07d45, "CEIL", ADD_ONE_SRC },
>  { 0x07d85, "FLOOR", ADD_ONE_SRC },
> +{ 0x07dc5, "TRUNC", ADD_ONE_SRC },

You can add to the list:
7d05 -> roundEven

although bizarrely ROUND doesn't seem to have an ADD equivalent.

>  { 0x07f18, "LSHIFT_ADD_HIGH32.i32", ADD_TWO_SRC },
>  { 0x08000, "LD_ATTR.f16", ADD_LOAD_ATTR, true },
>  { 0x08100, "LD_ATTR.v2f16", ADD_LOAD_ATTR, true },
> --
> 2.20.1
>

Re: [Mesa-dev] [PATCH] panfrost: Refactor blend descriptors

2019-05-05 Thread Connor Abbott
On Sun, May 5, 2019 at 1:27 AM Alyssa Rosenzweig  wrote:
>
> > The blend shader enable bit is already described in the comments in
> > the header; the blend shader is enabled when unk2 == 0.
>
> I'm pretty sure that comment was from you, but thank you ;)
>
> > (the blend shader has
> > to be allocated within the same 2^24 byte range as the main shader for
> > it to work properly anyways, even on Midgard, which is probably not
> > implemented properly on mainline).
>
> Indeed. Mainline Midgard blend shaders work (well, stubbed out so they
> just do passthrough without any real blending, but the hardware is
> correct). That said, we cap shader memory at 16MB upfront, which
> "resolves" this problem.

No, it doesn't. The high (64 - 24) bits have to be exactly the same.
So if your 16 MB allocation is not aligned, you could wind up with two
shaders crossing that boundary by sheer bad luck and then things go
boom. ARM's kernel driver dealt with it by aligning all executable
memory allocations to 2^24, but the upstream driver doesn't have a
solution yet.
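
In other words, roughly this check has to hold for every main
shader/blend shader pair (hypothetical helper, not actual driver code):

#include <stdbool.h>
#include <stdint.h>

/* The blend shader must live in the same 2^24-byte region as the main
 * shader: the high 64 - 24 = 40 address bits must match exactly. */
static bool
blend_shader_reachable(uint64_t main_shader_va, uint64_t blend_shader_va)
{
   return (main_shader_va >> 24) == (blend_shader_va >> 24);
}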

>
> > Maybe it would be better if these functions got passed the
> > mali_shader_descriptor itself?
>
> Possibly. I don't have access to any Bifrost machines right now, so I
> can't test that.

I could test it, but most of the freedreno tests that exercise blend
shaders wouldn't work because you never implemented capturing the
memory for each submission separately. Have you done that since?

Re: [Mesa-dev] [PATCH] panfrost: Refactor blend descriptors

2019-05-04 Thread Connor Abbott
On Sun, May 5, 2019 at 12:14 AM Alyssa Rosenzweig  wrote:
>
> This commit does a fairly large cleanup of blend descriptors, although
> there should not be any functional changes. In particular, we split
> apart the Midgard and Bifrost blend descriptors, since they are
> radically different. From there, we can identify that the Midgard
> descriptor as previously written was really two render targets'
> descriptors stuck together. From this observation, we split the Midgard
> descriptor into what a single RT actually needs. This enables us to
> correctly dump blending configuration for MRT samples on Midgard. It
> also allows the Midgard and Bifrost blend code to peacefully coexist,
> with runtime selection rather than a #ifdef. So, as a bonus, this will
> help the future Bifrost effort, eliminating one major source of
> compile-time architectural divergence.
>
> Signed-off-by: Alyssa Rosenzweig 
> ---
>  .../drivers/panfrost/include/panfrost-job.h   |  56 ---
>  src/gallium/drivers/panfrost/pan_context.c|  31 ++--
>  .../drivers/panfrost/pandecode/decode.c   | 155 +-
>  3 files changed, 122 insertions(+), 120 deletions(-)
>
> diff --git a/src/gallium/drivers/panfrost/include/panfrost-job.h 
> b/src/gallium/drivers/panfrost/include/panfrost-job.h
> index c2d922678b8..71ac054f7c3 100644
> --- a/src/gallium/drivers/panfrost/include/panfrost-job.h
> +++ b/src/gallium/drivers/panfrost/include/panfrost-job.h
> @@ -415,25 +415,37 @@ enum mali_format {
>  #define MALI_READS_ZS (1 << 12)
>  #define MALI_READS_TILEBUFFER (1 << 16)
>
> -struct mali_blend_meta {
> -#ifndef BIFROST
> -/* Base value of 0x200.
> +/* The raw Midgard blend payload can either be an equation or a shader
> + * address, depending on the context */
> +
> +union midgard_blend {
> +mali_ptr shader;
> +struct mali_blend_equation equation;
> +};
> +
> +/* On MRT Midgard systems (using an MFBD), each render target gets its own
> + * blend descriptor */
> +
> +struct midgard_blend_rt {
> +/* Flags base value of 0x200 to enable the render target.
>   * OR with 0x1 for blending (anything other than REPLACE).
> - * OR with 0x2 for programmable blending
> + * OR with 0x2 for programmable blending with 0-2 registers
> + * OR with 0x3 for programmable blending with 2+ registers
>   */
>
> -u64 unk1;
> +u64 flags;
> +union midgard_blend blend;
> +} __attribute__((packed));
>
> -union {
> -struct mali_blend_equation blend_equation_1;
> -mali_ptr blend_shader;
> -};
> +/* On Bifrost systems (all MRT), each render target gets one of these
> + * descriptors */
> +
> +struct bifrost_blend_rt {
> +/* This is likely an analogue of the flags on
> + * midgard_blend_rt */
>
> -u64 zero2;
> -struct mali_blend_equation blend_equation_2;
> -#else
>  u32 unk1; // = 0x200
> -struct mali_blend_equation blend_equation;
> +struct mali_blend_equation equation;
>  /*
>   * - 0x19 normally
>   * - 0x3 when this slot is unused (everything else is 0 except the 
> index)
> @@ -479,11 +491,13 @@ struct mali_blend_meta {
>  * in the same pool as the original shader. The kernel will
>  * make sure this allocation is aligned to 2^24 bytes.
>  */
> -   u32 blend_shader;
> +   u32 shader;
> };
> -#endif
>  } __attribute__((packed));
>
> +/* Descriptor for the shader. Following this is at least one, up to four 
> blend
> + * descriptors for each active render target */
> +
>  struct mali_shader_meta {
>  mali_ptr shader;
>  u16 texture_count;
> @@ -584,17 +598,7 @@ struct mali_shader_meta {
>   * MALI_HAS_BLEND_SHADER to decide how to interpret.
>   */
>
> -union {
> -mali_ptr blend_shader;
> -struct mali_blend_equation blend_equation;
> -};
> -
> -/* There can be up to 4 blend_meta's. None of them are required for
> - * vertex shaders or the non-MRT case for Midgard (so the blob 
> doesn't
> - * allocate any space).
> - */
> -struct mali_blend_meta blend_meta[];
> -
> +union midgard_blend blend;
>  } __attribute__((packed));
>
>  /* This only concerns hardware jobs */
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index 17b5b75db92..cab7c89ac8b 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -1000,7 +1000,7 @@ panfrost_emit_for_draw(struct panfrost_context *ctx, 
> bool with_vertex_data)
>   * maybe both are read...?) */
>
>  if (ctx->blend->has_blend_shader) {
> -ctx->fragment_shader_core.blend_shader = 
> ctx->blend->blend_shader;
> + 

Re: [Mesa-dev] [PATCH] nir/algebraic: Don't emit empty initializers for MSVC

2019-05-03 Thread Connor Abbott
On Fri, May 3, 2019 at 10:39 PM Jason Ekstrand  wrote:

> On Fri, May 3, 2019 at 3:29 PM Connor Abbott  wrote:
>
>> FWIW, the reason I changed it away was to keep it consistent with the
>> line directly above that uses the length (since the C array won't exist if
>> it's length 0). Does that convince you?
>>
>
> Nope.  First off, if you take Dylan's suggestions (which I think are
> reasonable), it doesn't use the length.  Second, it means that the C code
> will now have an unverifiable random number in it.  Are you sure it's
> really 137?  I dare you to go count them.
>

Ok, I pushed it with your change.

Connor


>
> --Jason
>
>
>> On Fri, May 3, 2019 at 10:26 PM Jason Ekstrand 
>> wrote:
>>
>>> On Thu, May 2, 2019 at 3:51 PM Dylan Baker  wrote:
>>>
>>>> Quoting Connor Abbott (2019-05-02 13:34:07)
>>>> > Just don't emit the transform array at all if there are no transforms
>>>> > for a state, and avoid trying to walk over it.
>>>> > ---
>>>> > Brian, does this build on Windows? I tested it on my shader-db
>>>> > on radeonsi.
>>>> >
>>>> > ---
>>>> >  src/compiler/nir/nir_algebraic.py | 6 +-
>>>> >  1 file changed, 5 insertions(+), 1 deletion(-)
>>>> >
>>>> > diff --git a/src/compiler/nir/nir_algebraic.py
>>>> b/src/compiler/nir/nir_algebraic.py
>>>> > index 6db749e9248..7af80a4f92e 100644
>>>> > --- a/src/compiler/nir/nir_algebraic.py
>>>> > +++ b/src/compiler/nir/nir_algebraic.py
>>>> > @@ -993,11 +993,13 @@ static const uint16_t CONST_STATE = 1;
>>>> >  % endfor
>>>> >
>>>> >  % for state_id, state_xforms in enumerate(automaton.state_patterns):
>>>> > +% if len(state_xforms) > 0: # avoid emitting a 0-length array for
>>>> MSVC
>>>>
>>>> if not state_xforms:
>>>>
>>>> Using len() to test container emptiness is an anti-pattern in python;
>>>> it is
>>>> roughly 10x slower than this.
>>>>
>>>> >  static const struct transform ${pass_name}_state${state_id}_xforms[]
>>>> = {
>>>> >  % for i in state_xforms:
>>>> >{ ${xforms[i].search.c_ptr(cache)},
>>>> ${xforms[i].replace.c_value_ptr(cache)}, ${xforms[i].condition_index} },
>>>> >  % endfor
>>>> >  };
>>>> > +% endif
>>>> >  % endfor
>>>> >
>>>> >  static const struct per_op_table
>>>> ${pass_name}_table[nir_num_search_ops] = {
>>>> > @@ -1080,7 +1082,8 @@ ${pass_name}_block(nir_builder *build,
>>>> nir_block *block,
>>>> >switch (states[alu->dest.dest.ssa.index]) {
>>>> >  % for i in range(len(automaton.state_patterns)):
>>>> >case ${i}:
>>>> > - for (unsigned i = 0; i <
>>>> ARRAY_SIZE(${pass_name}_state${i}_xforms); i++) {
>>>>
>>>
>>> I'd rather keep the ARRAY_SIZE unless we've got a really good reason to
>>> drop it.  With that and Dylan's changes,
>>>
>>> Reviewed-by: Jason Ekstrand 
>>>
>>>
>>>> > + % if len(automaton.state_patterns[i]) > 0:
>>>>
>>>> same here
>>>>
>>>> Dylan
>>>>
>>>> > + for (unsigned i = 0; i <
>>>> ${len(automaton.state_patterns[i])}; i++) {
>>>> >  const struct transform *xform =
>>>> &${pass_name}_state${i}_xforms[i];
>>>> >  if (condition_flags[xform->condition_offset] &&
>>>> >  nir_replace_instr(build, alu, xform->search,
>>>> xform->replace)) {
>>>> > @@ -1088,6 +1091,7 @@ ${pass_name}_block(nir_builder *build,
>>>> nir_block *block,
>>>> > break;
>>>> >  }
>>>> >   }
>>>> > + % endif
>>>> >   break;
>>>> >  % endfor
>>>> >default: assert(0);
>>>> > --
>>>> > 2.17.2
>>>> >

Re: [Mesa-dev] [PATCH] nir/algebraic: Don't emit empty initializers for MSVC

2019-05-03 Thread Connor Abbott
FWIW, the reason I changed it away was to keep it consistent with the line
directly above that uses the length (since the C array won't exist if it's
length 0). Does that convince you?

On Fri, May 3, 2019 at 10:26 PM Jason Ekstrand  wrote:

> On Thu, May 2, 2019 at 3:51 PM Dylan Baker  wrote:
>
>> Quoting Connor Abbott (2019-05-02 13:34:07)
>> > Just don't emit the transform array at all if there are no transforms
>> > for a state, and avoid trying to walk over it.
>> > ---
>> > Brian, does this build on Windows? I tested it on my shader-db
>> > on radeonsi.
>> >
>> > ---
>> >  src/compiler/nir/nir_algebraic.py | 6 +-
>> >  1 file changed, 5 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/src/compiler/nir/nir_algebraic.py
>> b/src/compiler/nir/nir_algebraic.py
>> > index 6db749e9248..7af80a4f92e 100644
>> > --- a/src/compiler/nir/nir_algebraic.py
>> > +++ b/src/compiler/nir/nir_algebraic.py
>> > @@ -993,11 +993,13 @@ static const uint16_t CONST_STATE = 1;
>> >  % endfor
>> >
>> >  % for state_id, state_xforms in enumerate(automaton.state_patterns):
>> > +% if len(state_xforms) > 0: # avoid emitting a 0-length array for MSVC
>>
>> if not state_xforms:
>>
>> Using len() to test container emptiness is an anti-pattern in python; it
>> is
>> roughly 10x slower than this.
>>
>> >  static const struct transform ${pass_name}_state${state_id}_xforms[] =
>> {
>> >  % for i in state_xforms:
>> >{ ${xforms[i].search.c_ptr(cache)},
>> ${xforms[i].replace.c_value_ptr(cache)}, ${xforms[i].condition_index} },
>> >  % endfor
>> >  };
>> > +% endif
>> >  % endfor
>> >
>> >  static const struct per_op_table
>> ${pass_name}_table[nir_num_search_ops] = {
>> > @@ -1080,7 +1082,8 @@ ${pass_name}_block(nir_builder *build, nir_block
>> *block,
>> >switch (states[alu->dest.dest.ssa.index]) {
>> >  % for i in range(len(automaton.state_patterns)):
>> >case ${i}:
>> > - for (unsigned i = 0; i <
>> ARRAY_SIZE(${pass_name}_state${i}_xforms); i++) {
>>
>
> I'd rather keep the ARRAY_SIZE unless we've got a really good reason to
> drop it.  With that and Dylan's changes,
>
> Reviewed-by: Jason Ekstrand 
>
>
>> > + % if len(automaton.state_patterns[i]) > 0:
>>
>> same here
>>
>> Dylan
>>
>> > + for (unsigned i = 0; i < ${len(automaton.state_patterns[i])};
>> i++) {
>> >  const struct transform *xform =
>> &${pass_name}_state${i}_xforms[i];
>> >  if (condition_flags[xform->condition_offset] &&
>> >  nir_replace_instr(build, alu, xform->search,
>> xform->replace)) {
>> > @@ -1088,6 +1091,7 @@ ${pass_name}_block(nir_builder *build, nir_block
>> *block,
>> > break;
>> >  }
>> >   }
>> > + % endif
>> >   break;
>> >  % endfor
>> >default: assert(0);
>> > --
>> > 2.17.2
>> >

[Mesa-dev] [PATCH] nir/algebraic: Don't emit empty initializers for MSVC

2019-05-02 Thread Connor Abbott
Just don't emit the transform array at all if there are no transforms
for a state, and avoid trying to walk over it.
---
Brian, does this build on Windows? I tested it on my shader-db
on radeonsi.

---
 src/compiler/nir/nir_algebraic.py | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_algebraic.py 
b/src/compiler/nir/nir_algebraic.py
index 6db749e9248..7af80a4f92e 100644
--- a/src/compiler/nir/nir_algebraic.py
+++ b/src/compiler/nir/nir_algebraic.py
@@ -993,11 +993,13 @@ static const uint16_t CONST_STATE = 1;
 % endfor
 
 % for state_id, state_xforms in enumerate(automaton.state_patterns):
+% if len(state_xforms) > 0: # avoid emitting a 0-length array for MSVC
 static const struct transform ${pass_name}_state${state_id}_xforms[] = {
 % for i in state_xforms:
   { ${xforms[i].search.c_ptr(cache)}, ${xforms[i].replace.c_value_ptr(cache)}, 
${xforms[i].condition_index} },
 % endfor
 };
+% endif
 % endfor
 
 static const struct per_op_table ${pass_name}_table[nir_num_search_ops] = {
@@ -1080,7 +1082,8 @@ ${pass_name}_block(nir_builder *build, nir_block *block,
   switch (states[alu->dest.dest.ssa.index]) {
 % for i in range(len(automaton.state_patterns)):
   case ${i}:
- for (unsigned i = 0; i < ARRAY_SIZE(${pass_name}_state${i}_xforms); 
i++) {
+ % if len(automaton.state_patterns[i]) > 0:
+ for (unsigned i = 0; i < ${len(automaton.state_patterns[i])}; i++) {
 const struct transform *xform = &${pass_name}_state${i}_xforms[i];
 if (condition_flags[xform->condition_offset] &&
 nir_replace_instr(build, alu, xform->search, xform->replace)) {
@@ -1088,6 +1091,7 @@ ${pass_name}_block(nir_builder *build, nir_block *block,
break;
 }
  }
+ % endif
  break;
 % endfor
   default: assert(0);
-- 
2.17.2


Re: [Mesa-dev] [PATCH] nir: don't emit empty initializers for MSVC

2019-05-02 Thread Connor Abbott
Whoops, that should be "radeonsi with radeonsi_enable_nir=true" since NIR
isn't enabled by default yet.

On Thu, May 2, 2019 at 9:35 PM Connor Abbott  wrote:

> This will crash at runtime, since it'll construct a "struct transform"
> with all NULL pointers, and then the loop below in ${pass_name}_block()
> will see that there's one transform in the array since it uses ARRAY_SIZE
> and then crash trying to access it.
>
> Running piglit with i965, or radeonsi will reproduce the crash.
>
> On Thu, May 2, 2019 at 7:52 PM Brian Paul  wrote:
>
>> This fixes a build failure with MSVC.
>>
>> ---
>>
>> I've compiled tested this, but not sure how to runtime test it.
>> ---
>>  src/compiler/nir/nir_algebraic.py | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/src/compiler/nir/nir_algebraic.py
>> b/src/compiler/nir/nir_algebraic.py
>> index 6db749e..dc25421 100644
>> --- a/src/compiler/nir/nir_algebraic.py
>> +++ b/src/compiler/nir/nir_algebraic.py
>> @@ -997,6 +997,9 @@ static const struct transform
>> ${pass_name}_state${state_id}_xforms[] = {
>>  % for i in state_xforms:
>>{ ${xforms[i].search.c_ptr(cache)},
>> ${xforms[i].replace.c_value_ptr(cache)}, ${xforms[i].condition_index} },
>>  % endfor
>> +% if state_xforms == []: # avoid empty initializers for MSVC
>> +  0
>> +% endif
>>  };
>>  % endfor
>>
>> --
>> 2.7.4
>>

Re: [Mesa-dev] [PATCH] nir: don't emit empty initializers for MSVC

2019-05-02 Thread Connor Abbott
This will crash at runtime, since it'll construct a "struct transform" with
all NULL pointers, and then the loop below in ${pass_name}_block() will see
that there's one transform in the array since it uses ARRAY_SIZE and then
crash trying to access it.
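
The mechanism, boiled down (generic pointer types here, not the exact
generated ones):

struct transform {
   const void *search, *replace;
   unsigned condition_offset;
};

/* What the proposed workaround generates for an empty state: */
static const struct transform state0_xforms[] = { 0 };

/* sizeof(state0_xforms) / sizeof(state0_xforms[0]) is 1, not 0, so the
 * generated loop visits one all-NULL entry and crashes dereferencing
 * it. */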

Running piglit with i965, or radeonsi will reproduce the crash.

On Thu, May 2, 2019 at 7:52 PM Brian Paul  wrote:

> This fixes a build failure with MSVC.
>
> ---
>
> I've compiled tested this, but not sure how to runtime test it.
> ---
>  src/compiler/nir/nir_algebraic.py | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/compiler/nir/nir_algebraic.py
> b/src/compiler/nir/nir_algebraic.py
> index 6db749e..dc25421 100644
> --- a/src/compiler/nir/nir_algebraic.py
> +++ b/src/compiler/nir/nir_algebraic.py
> @@ -997,6 +997,9 @@ static const struct transform
> ${pass_name}_state${state_id}_xforms[] = {
>  % for i in state_xforms:
>{ ${xforms[i].search.c_ptr(cache)},
> ${xforms[i].replace.c_value_ptr(cache)}, ${xforms[i].condition_index} },
>  % endfor
> +% if state_xforms == []: # avoid empty initializers for MSVC
> +  0
> +% endif
>  };
>  % endfor
>
> --
> 2.7.4
>

Re: [Mesa-dev] [PATCH 3/5] panfrost: Disable PIPE_CAP_TGSI_TEXCOORD

2019-03-15 Thread Connor Abbott
On Fri, Mar 15, 2019 at 5:00 PM Ilia Mirkin  wrote:
>
> On Fri, Mar 15, 2019 at 11:52 AM Connor Abbott 
wrote:
> >
> > On Fri, Mar 15, 2019 at 3:46 PM Ilia Mirkin 
wrote:
> > >
> > > On Fri, Mar 15, 2019 at 10:29 AM Alyssa Rosenzweig <
aly...@rosenzweig.io> wrote:
> > > >
> > > > > This is needed if you can only handle point sprites on certain
> > > > > varyings but not others. This is the case for nv30- and nvc0-based
> > > > > GPUs, which is why the cap was introduced.
> > > > >
> > > > > If your GPU does not have such restrictions, you can safely
remove the cap.
> > > >
> > > > Ah-ha, I see, thank you.
> > > >
> > > > I can't handle point sprites on any varyings (they all have to get
> > > > lowered to gl_PointCoord), so.. :)
> > >
> > > Yeah, that's not extremely surprising. Point sprites weren't a thing
> > > in GLES 1, and only available as gl_PointCoord in GLES 2. With GL 1.x
> > > + multitexture, you can have up to 8 tex coords used in your fragment
> > > pipeline, and so need a fixed-function way of setting them to point
> > > coords when drawing points. Hence the current TEXCOORD semantics of
> > > just bumping generic varyings by 8, so that we can be sure that any
> > > shaders with true gl_TexCoord[] usage get the low 8 varyings assigned
> > > (and which, on nvc0+ go through "special" shader outputs, subject to
> > > the replacement).
> > >
> > > Note that e.g. Adreno A3xx+ is capable of doing the replacement
> > > everywhere, despite being a GLES-targeted GPU. So it's not out of the
> > > realm of possibility that you just haven't figured out the full
> > > mechanism for doing it yet.
> >
> > Can you replace more than one varying and output multiple points from
> > the vertex shader, or something crazy like that? On Mali gl_PointCoord
> > is just a normal varying whose descriptor points to a special
> > point-coord buffer that gets fed to the tiler, so we should be able to
> > make an arbitrary varying "turn into" gl_PointCoord just by changing
> > its descriptor at draw time.
>
> Yeah, any (gl_TexCoord) varying may get replaced by a point sprite
> coord (in GL 1.x / GL 2.x). Note that there's additional complication
> from coordinate inversion, when you're drawing on the winsys fb vs an
> fbo. So you have to be able to generate both S,T and S,1-T. [Or
> perhaps there's an explicit control for it, I forget.] With GLES2+,
> it's just gl_PointCoord though.
>
> The way gallium handles this is that you get a mask of varyings to
> replace in
https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/include/pipe/p_state.h#n188
> + whether to invert or not (sprite_coord_mode).

Oh, whoops... I guess I was thinking of gl_PointSize. gl_PointCoord in the
fragment shader does indeed come through an extra-special varying, at least
the way the blob does it.

Re: [Mesa-dev] [PATCH 3/5] panfrost: Disable PIPE_CAP_TGSI_TEXCOORD

2019-03-15 Thread Connor Abbott
On Fri, Mar 15, 2019 at 3:46 PM Ilia Mirkin  wrote:
>
> On Fri, Mar 15, 2019 at 10:29 AM Alyssa Rosenzweig 
wrote:
> >
> > > This is needed if you can only handle point sprites on certain
> > > varyings but not others. This is the case for nv30- and nvc0-based
> > > GPUs, which is why the cap was introduced.
> > >
> > > If your GPU does not have such restrictions, you can safely remove
the cap.
> >
> > Ah-ha, I see, thank you.
> >
> > I can't handle point sprites on any varyings (they all have to get
> > lowered to gl_PointCoord), so.. :)
>
> Yeah, that's not extremely surprising. Point sprites weren't a thing
> in GLES 1, and only available as gl_PointCoord in GLES 2. With GL 1.x
> + multitexture, you can have up to 8 tex coords used in your fragment
> pipeline, and so need a fixed-function way of setting them to point
> coords when drawing points. Hence the current TEXCOORD semantics of
> just bumping generic varyings by 8, so that we can be sure that any
> shaders with true gl_TexCoord[] usage get the low 8 varyings assigned
> (and which, on nvc0+ go through "special" shader outputs, subject to
> the replacement).
>
> Note that e.g. Adreno A3xx+ is capable of doing the replacement
> everywhere, despite being a GLES-targeted GPU. So it's not out of the
> realm of possibility that you just haven't figured out the full
> mechanism for doing it yet.

Can you replace more than one varying and output multiple points from the
vertex shader, or something crazy like that? On Mali gl_PointCoord is just a
normal varying whose descriptor points to a special point-coord buffer that
gets fed to the tiler, so we should be able to make an arbitrary varying
"turn into" gl_PointCoord just by changing its descriptor at draw time.

Re: [Mesa-dev] [PATCH] panfrost: Use tiler fast path (performance boost)

2019-02-21 Thread Connor Abbott
On Thu, Feb 21, 2019 at 5:07 PM Alyssa Rosenzweig 
wrote:

> > Yes, there is a buffer for holding the results of the tiler. The way it
> > works is that the userspace driver allocates a very large buffer with a
> > "grow on page fault" bit, so the kernel will allocate more memory as the
> > tiler asks for it.
>
> To be clear, right now I have this magic misc_0 buffer setup that way,
> too (allocating something enormous and setting GROW_ON_GPF). Although,
> it's not clear to me that that is the _correct_ thing to do; IIRC, the
> blob allocates "just enough", no GROW_ON_GPF needed, but I should
> probably check that.
>

I don't know about Midgard, but on Bifrost it definitely uses GROW_ON_GPF,
telling the kernel to preallocate some small-ish initial set, presumably as
an optimization. I don't think you can accurately predict how much memory
you're going to need ahead of time, since it depends on how the triangles
are distributed along the screen.

Re: [Mesa-dev] [PATCH] panfrost: Use tiler fast path (performance boost)

2019-02-21 Thread Connor Abbott
On Thu, Feb 21, 2019 at 3:01 PM Rob Clark  wrote:

> On Thu, Feb 21, 2019 at 1:19 AM Alyssa Rosenzweig 
> wrote:
> >
> > For reasons that are still unclear (speculation included in the comment
> > added in this patch), the tiler? metadata has a fast path that we were
> > not enabling; there looks to be a possible time/memory tradeoff, but the
> > details remain unclear.
> >
> > Regardless, this patch improves performance dramatically. Particular
> > wins are for geometry-heavy scenes. For instance, glmark2-es2's
> > Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped
> > from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious
> > when vsync is disabled, as in glmark2-es2-wayland.
> >
> > With this patch, on GLES 2.0 samples not involving FBOs, it appears
> > performance is converging with (and sometimes surpassing) the blob.
> >
> > Signed-off-by: Alyssa Rosenzweig 
> > ---
> >  src/gallium/drivers/panfrost/pan_context.c | 42 +++---
> >  1 file changed, 38 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/gallium/drivers/panfrost/pan_context.c
> b/src/gallium/drivers/panfrost/pan_context.c
> > index 822b5a0dfef..2996a9c1e09 100644
> > --- a/src/gallium/drivers/panfrost/pan_context.c
> > +++ b/src/gallium/drivers/panfrost/pan_context.c
> > @@ -256,7 +256,28 @@ static struct bifrost_framebuffer
> >  panfrost_emit_mfbd(struct panfrost_context *ctx)
> >  {
> >  struct bifrost_framebuffer framebuffer = {
> > -.tiler_meta = 0xf0c600,
> > +/* It is not yet clear what tiler_meta means or how it's
> > + * calculated, but we can tell the lower 32-bits are a
> > + * (monotonically increasing?) function of tile count
> and
> > + * geometry complexity; I suspect it defines a memory
> size of
> > + * some kind? for the tiler. It's really unclear at the
> > + * moment... but to add to the confusion, the hardware
> is happy
> > + * enough to accept a zero in this field, so we don't
> even have
> > + * to worry about it right now.
>
> *somewhere* the result of VS (or binning shader if VS is split in two)
> needs to be saved for use during the per-tile rendering.  Presumably
> you have to give the hw a buffer of appropriate/matching size
> somewhere, or with enough geometry (and too small a buffer) things
> will overflow and go badly.
>
>
Yes, there is a buffer for holding the results of the tiler. The way it
works is that the userspace driver allocates a very large buffer with a
"grow on page fault" bit, so the kernel will allocate more memory as the
tiler asks for it.


> I guess if you exceed the size given in .tiler_meta, then hw falls
> back to running VS per tile for all geometry (not just geom visible in
> the tile), hence big diff in perf for large tile counts and lotsa
> geometry.
>

No, this isn't it. The vertex shaders aren't split in half, they all run
entirely before the tiler even starts. The only thing I could imagine would
be some optional part of the tiler buffer where the aforementioned grow on
page fault thing doesn't quite work out.


>
> It does sound a bit strange that it would depend on tile count.  I'd
> expect it would be a function strictly of amount of geometry (and
> possibly effected by whether or not gl_PointSize is written?).. and
> possibly amount of varyings if VS isn't split into two parts.
>
> BR,
> -R
>
> > + * The byte (just after the 32-bit mark) is much more
> > + * interesting. The higher nibble I've only ever seen
> as 0xF,
> > + * but the lower one I've seen as 0x0 or 0xF, and it's
> not
> > + * obvious what the difference is. But what -is-
> obvious is
> > + * that when the lower nibble is zero, performance is
> severely
> > + * degraded compared to when the lower nibble is set.
> > + * Evidently, that nibble enables some sort of fast
> path,
> > + * perhaps relating to caching or tile flush?
> Regardless, at
> > + * this point there's no clear reason not to set it,
> aside from
> > + * substantially increased memory requirements (of the
> misc_0
> > + * buffer) */
> > +
> > +.tiler_meta = ((uint64_t) 0xff << 32) | 0x0,
> >
> >  .width1 = MALI_POSITIVE(ctx->pipe_framebuffer.width),
> >  .height1 = MALI_POSITIVE(ctx->pipe_framebuffer.height),
> > @@ -271,10 +292,23 @@ panfrost_emit_mfbd(struct panfrost_context *ctx)
> >
> >  .unknown2 = 0x1f,
> >
> > -/* Presumably corresponds to unknown_address_X of SFBD
> */
> > +/* Corresponds to unknown_address_X of SFBD */
> >  .scratchpad = ctx->scratchpad.gpu,
> >  .tiler_scratch_start  = ctx->misc_0.gpu,
> > -

Re: [Mesa-dev] A few NIR compile time optimisations

2019-02-19 Thread Connor Abbott
On Wed, Feb 20, 2019 at 4:29 AM Marek Olšák  wrote:

> On Tue, Feb 19, 2019 at 7:57 PM Rob Clark  wrote:
>
>> On Tue, Feb 19, 2019 at 6:49 PM Marek Olšák  wrote:
>> >
>> > st_link_shader takes 55% of CPU time with NIR, and 9% with TGSI.
>> >
>> > nir_validate_shader 49%
>> >
>> > nir_validate_shader is overused. It doesn't make sense even in debug
>> builds.
>>
>> tbh, I like nir_validate enabled when I do piglit/deqp runs.. and I
>> already do separate release vs debug builds (which meson kinda
>> encourages by disallowing in-tree builds in the first place, but is
>> totally possible with autotools builds).. I kinda think benchmarking
>> debug build in the first place is a flawed process.
>>
>> So I wouldn't profile/benchmark nir vs tgsi (or really anything) with
>> debug builds.. I could get behind a separate compiler-debug option
>> that changed the default NIR_VALIDATE setting and maybe some other nir
>> debug features to get some features of debug builds without enabling
>> the heavier nir debug features.  But other than making debug options a
>> bit more fine grained, I wouldn't change things in response to a
>> flawed benchmarking process..
>>
>
> That's a harsh reaction to a relatively good benchmarking setup. I use
> debugoptimized with -DDEBUG. My performance is probably more affected by
> -fno-omit-frame-pointer than -DDEBUG.
>

Why would you enable DEBUG in a profiling build? AFAIK it's supposed to
enable validation more expensive than simple asserts, which you probably
don't want in a benchmarking setup, although from grepping the source it's
not used much. It might be a good idea to move running NIR validation
behind DEBUG instead of !NDEBUG. In the meantime, unless you really want to
benchmark things with assertions enabled in which case NIR_VALIDATE=0 works
(but why would you?), you can set -Db_ndebug=false in Meson. I just saw
today that they're enabled by default in debugoptimized builds (whoops,
better go fix that and re-profile...).


>
> Marek
>

Re: [Mesa-dev] [PATCH] panfrost: Backport driver to Mali T600/T700

2019-02-19 Thread Connor Abbott
On Tue, Feb 19, 2019 at 5:33 PM Emil Velikov 
wrote:

> On Fri, 15 Feb 2019 at 21:39, Alyssa Rosenzweig 
> wrote:
> >
> > > - about 1/5 of the patch seems to be white space changes
> >
> > ...Oops. Any tips for avoiding this type of diff churn in the future? I
> > suppose it's not inherently harmful, but maybe it could make merging
> > more difficult than strictly necessary.
> >
> Splitting/polishing patches can be fiddly - but IMHO pays in the long run.
> IIRC not too long ago Dave mentioned that specific work took him more
> time to polish+split than to write+debug the code.
>
> > > - doesn't seem like BIFROST is defined anywhere
> >
> > Indeed it's not; Bifrost is not yet supported, but at least this way we
> > can share headers with the out-of-tree work on Bifrost (is anyone
> > working on these parts right now..?)
> >
> > > - other drivers keep details like is_t6xx, require_sfbd, others in
> > > driver/screen specific struct
> >
> > Aye, that'll be fixed next patch :)
> >
> > > - the __LP64__ checks seems suspicious, no other mesa driver has those
> >
> > Is there a better way to handle mixed bit-ness? We have shared memory
> > (sort of -- separate MMUs, separate address spaces, but they're mapped
> > together with shared physical RAM and we opt for SAME_VA where gpu_va ==
> > user_cpu_va). As such, 32-bit Mali and 64-bit Mali behave differently,
> > since pointers are larger and then some fields get rearranged to pack
> > tighter/less-so depending on pointer sizes. There's no real benefit to
> > support both modes in the same build of the driver; by far, having a
> > 32-bit build for armhf with 32-bit Mali descriptors and a 64-bit build
> > for aarch64 with 64-bit descriptors is the sane approach. Accordingly,
> > I reasoned that __LP64__ is the cleanest way to check what type of
> > system we're building for, and from there which descriptor flavour we
> > should use. Is there something inherently problematic about this scheme?
> >
> I might not be the best person for this, but I think subtle
> differences like these should not be exposed to userspace (be part of
> the UABI).
> In the x86 world running 64bit kernels with 32bit userspace is fairly
> common, but from what I hear it's less so in Arm land.
>

This isn't UABI, since it has absolutely nothing to do with the kernel. The
hardware supports two command stream formats, the 32-bit one where pointers
are expanded from 32-bit to 64-bit internally by the HW and the 64-bit one,
and userspace tells the hardware which one it wants to use by setting a bit
in the job header which is only interpreted by the hardware. Right now the
idea is to select which one based on what bitsize the userspace is compiled
for, hence uintptr_t for pointers, but this could change in the future
without having to notify the kernel. Nothing in the kernel is reading these
pointers at all besides some HW workarounds in the vendor kernel which read
the same "which bitsize am I" bit the HW does.


>
> > In theory we can mix and match, the hardware can do both regardless of
> > the CPU as far as I know, but that complicates things dramatically for
> > little benefit.
> >
> > Keep in mind that Midgard onwards uses descriptors in shared memory,
> > rather than a true command stream, so it's possible no other mesa driver
> > does this since no other mesa-supported hardware needs this.
> >
> I don't know all the drivers but it sounds possible.
>
> Thanks for the extensive reply
> Emil

Re: [Mesa-dev] [PATCH] nir: remove simple dead if detection from nir_opt_dead_cf()

2019-02-15 Thread Connor Abbott
Reviewed-by: Connor Abbott 

I agree that if we ever need to bring this back, we should just check for
both branches empty and no phis afterwards.

On Thu, Feb 14, 2019 at 2:38 AM Timothy Arceri 
wrote:

> This was probably useful when it was first written, however it
> looks to be no longer necessary.
>
> As far as I can tell these days dce is smart enough to remove useless
> instructions from if branches. Once this is done
> nir_opt_peephole_select() will end up removing the empty if.
>
> Removing this support reduces the dolphin uber shader compilation
> time by around 60%. Compile time is reduced due to no longer having
> to compute the live ssa defs metadata so much.
>
> No shader-db changes on i965 or radeonsi.
> ---
>  src/compiler/nir/nir_opt_dead_cf.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/src/compiler/nir/nir_opt_dead_cf.c
> b/src/compiler/nir/nir_opt_dead_cf.c
> index 14986732096..053c5743527 100644
> --- a/src/compiler/nir/nir_opt_dead_cf.c
> +++ b/src/compiler/nir/nir_opt_dead_cf.c
> @@ -180,7 +180,7 @@ def_not_live_out(nir_ssa_def *def, void *state)
>  }
>
>  /*
> - * Test if a loop node or if node is dead. Such nodes are dead if:
> + * Test if a loop node is dead. Such nodes are dead if:
>   *
>   * 1) It has no side effects (i.e. intrinsics which could possibly affect
> the
>   * state of the program aside from producing an SSA value, indicated by a
> lack
> @@ -198,7 +198,7 @@ def_not_live_out(nir_ssa_def *def, void *state)
>  static bool
>  node_is_dead(nir_cf_node *node)
>  {
> -   assert(node->type == nir_cf_node_loop || node->type == nir_cf_node_if);
> +   assert(node->type == nir_cf_node_loop);
>
> nir_block *before = nir_cf_node_as_block(nir_cf_node_prev(node));
> nir_block *after = nir_cf_node_as_block(nir_cf_node_next(node));
> @@ -230,11 +230,6 @@ dead_cf_block(nir_block *block)
>  {
> nir_if *following_if = nir_block_get_following_if(block);
> if (following_if) {
> -  if (node_is_dead(&following_if->cf_node)) {
> - nir_cf_node_remove(&following_if->cf_node);
> - return true;
> -  }
> -
>if (!nir_src_is_const(following_if->condition))
>   return false;
>
> --
> 2.20.1
>

Re: [Mesa-dev] A few NIR compile time optimisations

2019-02-13 Thread Connor Abbott
Reviewed-by: Connor Abbott 

I'm a bit surprised it's that slow... do you have any idea what's going on?
I've made flamegraphs in the past on i965 to see where most of the time is
spent.

On Wed, Feb 13, 2019 at 9:00 AM Timothy Arceri 
wrote:

> Currently the radeonsi NIR backend takes around twice the time
> of the tgsi backend to compile shader-db. These are some first
> steps at reducing the overhead of NIR.
>
> This series reduces the compile time of a Deus Ex program I was
> profiling by around 5%.
>
>

Re: [Mesa-dev] [PATCH v3 12/44] nir: add new fadd, fsub, fmul opcodes taking into account rounding mode

2019-02-06 Thread Connor Abbott
On Wed, Feb 6, 2019 at 11:46 AM Samuel Iglesias Gonsálvez <
sigles...@igalia.com> wrote:

> According to Vulkan spec, the new execution modes affect only
> correctly rounded SPIR-V instructions, which includes fadd,
> fsub and fmul.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/nir/nir_opcodes.py | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opcodes.py
> b/src/compiler/nir/nir_opcodes.py
> index f8997444c28..7b45d38f460 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -453,9 +453,15 @@ def binop_convert(name, out_type, in_type, alg_props,
> const_expr):
> opcode(name, 0, out_type, [0, 0], [in_type, in_type],
>False, alg_props, const_expr, "")
>
> +def binop_convert_rounding_mode(name, out_type, in_type, alg_props,
> const_expr, rounding_mode):
> +   opcode(name, 0, out_type, [0, 0], [in_type, in_type], False,
> alg_props, const_expr, rounding_mode)
> +
>  def binop(name, ty, alg_props, const_expr):
> binop_convert(name, ty, ty, alg_props, const_expr)
>
> +def binop_rounding_mode(name, ty, alg_props, const_expr, rounding_mode):
> +   binop_convert_rounding_mode(name, ty, ty, alg_props, const_expr,
> rounding_mode)
> +
>  def binop_compare(name, ty, alg_props, const_expr):
> binop_convert(name, tbool1, ty, alg_props, const_expr)
>
> @@ -490,13 +496,24 @@ def binop_reduce(name, output_size, output_type,
> src_type, prereduce_expr,
>final(reduce_(reduce_(src0, src1), reduce_(src2, src3))), "")
>
>  binop("fadd", tfloat, commutative + associative, "src0 + src1")
> +binop_rounding_mode("fadd_rtne", tfloat, commutative + associative,
> +"_mesa_roundeven(src0 + src1)", "_rtne")
>

There are two things wrong here:
- The default floating-point environment, and the one Mesa expects, does
round-to-nearest-even so it's rtz that needs special handling.
- _mesa_roundeven here is a no-op (it'll implicitly expand to a double and
then convert back to a float). The rounding actually happens as part of the
addition itself. I'm not sure if adding as double (with
round-to-nearest-even) and then rounding back to a float will work, but
that definitely won't work for doubles. I think you'll have to implement
round-to-zero addition yourself, or fiddle with the CPU's floating-point
environment.

The same goes for multiply and fma.
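
For the constant-expression side, one way to get a correctly rounded
round-to-zero add on the CPU is to switch the FP environment's rounding
mode around the operation. A sketch (C99; compilers don't always honor
FENV_ACCESS, hence the volatile):

#include <fenv.h>

#pragma STDC FENV_ACCESS ON

static float
fadd_rtz(float a, float b)
{
   const int saved = fegetround();
   fesetround(FE_TOWARDZERO);
   /* volatile so the add can't be constant-folded at compile time
    * under the default round-to-nearest-even mode */
   volatile float result = a + b;
   fesetround(saved);
   return result;
}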


> +binop_rounding_mode("fadd_rtz", tfloat, commutative + associative, "src0
> + src1", "_rtz")
> +
>  binop("iadd", tint, commutative + associative, "src0 + src1")
>  binop("uadd_sat", tuint, commutative,
>"(src0 + src1) < src0 ? UINT64_MAX : (src0 + src1)")
>  binop("fsub", tfloat, "", "src0 - src1")
> +binop_rounding_mode("fsub_rtne", tfloat, "",
> +"_mesa_roundeven(src0 - src1)", "_rtne")
> +binop_rounding_mode("fsub_rtz", tfloat, "", "src0 - src1", "_rtz")
>  binop("isub", tint, "", "src0 - src1")
>
>  binop("fmul", tfloat, commutative + associative, "src0 * src1")
> +binop_rounding_mode("fmul_rtne", tfloat, commutative + associative,
> +"_mesa_roundeven(src0 * src1)", "_rtne")
> +binop_rounding_mode("fmul_rtz", tfloat, commutative + associative, "src0
> * src1", "_rtz")
> +
>  # low 32-bits of signed/unsigned integer multiply
>  binop("imul", tint, commutative + associative, "src0 * src1")
>
> --
> 2.19.1
>


Re: [Mesa-dev] [PATCH v2 14/29] nir: fix constant expressions for rounding mode conversions

2019-01-30 Thread Connor Abbott
I think maybe it would be better if we put all this in nir_opcodes.py
instead. Again, I'd like to make sure that for every opcode, what it does
is right next to the definition instead of buried in
nir_constant_expressions.py.

Also, I don't see anywhere in the series where you handle different
rounding modes for fadd, fsub, fmul, etc.

On Tue, Dec 18, 2018 at 11:35 AM Samuel Iglesias Gonsálvez <
sigles...@igalia.com> wrote:

> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/nir/nir_constant_expressions.py | 46 +---
>  1 file changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/src/compiler/nir/nir_constant_expressions.py
> b/src/compiler/nir/nir_constant_expressions.py
> index dc2132df0d0..84cf819e921 100644
> --- a/src/compiler/nir/nir_constant_expressions.py
> +++ b/src/compiler/nir/nir_constant_expressions.py
> @@ -64,6 +64,7 @@ template = """\
>  #include "util/rounding.h" /* for _mesa_roundeven */
>  #include "util/half_float.h"
>  #include "util/bigmath.h"
> +#include "util/double.h"
>  #include "nir_constant_expressions.h"
>
>  /**
> @@ -324,7 +325,15 @@ struct ${type}${width}_vec {
>   % elif input_types[j] == "float16":
>  _mesa_half_to_float(_src[${j}].u16[${k}]),
>   % else:
> -_src[${j}].${get_const_field(input_types[j])}[${k}],
> +% if ("rtne" in op.rounding_mode) and ("float" in
> input_types[j]) and ("int" in output_type):
> +   % if "float32" in input_types[j]:
> +
> _mesa_roundevenf(_src[${j}].${get_const_field(input_types[j])}[${k}]),
> +   % else:
> +
> _mesa_roundeven(_src[${j}].${get_const_field(input_types[j])}[${k}]),
> +   % endif
> +% else:
> +   _src[${j}].${get_const_field(input_types[j])}[${k}],
> +% endif
>   % endif
>% endfor
>% for k in range(op.input_sizes[j], 4):
> @@ -353,8 +362,27 @@ struct ${type}${width}_vec {
> const float src${j} =
>_mesa_half_to_float(_src[${j}].u16[_i]);
>  % else:
> -   const ${input_types[j]}_t src${j} =
> -  _src[${j}].${get_const_field(input_types[j])}[_i];
> +   % if ("rtne" in op.rounding_mode) and ("float" in
> input_types[j]) and ("int" in output_type):
> +  % if "float32" in input_types[j]:
> + const ${input_types[j]}_t src${j} =
> +
> _mesa_roundevenf(_src[${j}].${get_const_field(input_types[j])}[_i]);
> +  % else:
> + const ${input_types[j]}_t src${j} =
> +
> _mesa_roundeven(_src[${j}].${get_const_field(input_types[j])}[_i]);
> +
> +  % endif
> +   % elif ("float64" in input_types[j]) and ("float32" in
> output_type):
> +  % if ("rtz" in op.rounding_mode):
> + const ${input_types[j]}_t src${j} =
> +
> _mesa_double_to_float_rtz(_src[${j}].${get_const_field(input_types[j])}[_i]);
> +  % else:
> + const ${input_types[j]}_t src${j} =
> +
> _mesa_double_to_float_rtne(_src[${j}].${get_const_field(input_types[j])}[_i]);
> +  % endif
> +   % else:
> +  const ${input_types[j]}_t src${j} =
> + _src[${j}].${get_const_field(input_types[j])}[_i];
> +   % endif
>  % endif
>   % endfor
>
> @@ -378,7 +406,11 @@ struct ${type}${width}_vec {
>  ## Sanitize the C value to a proper NIR 0/-1 bool
>  _dst_val.${get_const_field(output_type)}[_i] = -(int)dst;
>   % elif output_type == "float16":
> -_dst_val.u16[_i] = _mesa_float_to_half(dst);
> +% if "rtz" in op.rounding_mode:
> +   _dst_val.u16[_i] = _mesa_float_to_float16_rtz(dst);
> +% else:
> +   _dst_val.u16[_i] = _mesa_float_to_float16_rtne(dst);
> +% endif
>   % else:
>  _dst_val.${get_const_field(output_type)}[_i] = dst;
>   % endif
> @@ -416,7 +448,11 @@ struct ${type}${width}_vec {
>  ## Sanitize the C value to a proper NIR 0/-1 bool
>  _dst_val.${get_const_field(output_type)}[${k}] =
> -(int)dst.${"xyzw"[k]};
>   % elif output_type == "float16":
> -_dst_val.u16[${k}] = _mesa_float_to_half(dst.${"xyzw"[k]});
> +% if "rtz" in op.rounding_mode:
> +   _dst_val.u16[${k}] =
> _mesa_float_to_float16_rtz(dst.${"xyzw"[k]});
> +% else:
> +   _dst_val.u16[${k}] =
> _mesa_float_to_float16_rtne(dst.${"xyzw"[k]});
> +% endif
>   % else:
>  _dst_val.${get_const_field(output_type)}[${k}] =
> dst.${"xyzw"[k]};
>   % endif
> --
> 2.19.1
>

Re: [Mesa-dev] [PATCH v2 04/29] nir: add support for flushing to zero denorm constants

2019-01-30 Thread Connor Abbott
On Tue, Dec 18, 2018 at 11:35 AM Samuel Iglesias Gonsálvez <
sigles...@igalia.com> wrote:

> v2:
> - Refactor conditions and shared function (Connor)
> - Move code to nir_eval_const_opcode() (Connor)
> - Don't flush to zero on fquantize2f16
>   From Vulkan spec, VK_KHR_shader_float_controls section:
>
>   "3) Do denorm and rounding mode controls apply to OpSpecConstantOp?
>
>   RESOLVED: Yes, except when the opcode is OpQuantizeToF16."
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/nir/nir_constant_expressions.h  |  3 +-
>  src/compiler/nir/nir_constant_expressions.py | 65 ++--
>  src/compiler/nir/nir_loop_analyze.c  |  7 ++-
>  src/compiler/nir/nir_opt_constant_folding.c  | 15 ++---
>  src/compiler/spirv/spirv_to_nir.c|  3 +-
>  5 files changed, 75 insertions(+), 18 deletions(-)
>
> diff --git a/src/compiler/nir/nir_constant_expressions.h
> b/src/compiler/nir/nir_constant_expressions.h
> index 1d6bbbc25d3..a2d416abc45 100644
> --- a/src/compiler/nir/nir_constant_expressions.h
> +++ b/src/compiler/nir/nir_constant_expressions.h
> @@ -31,6 +31,7 @@
>  #include "nir.h"
>
>  nir_const_value nir_eval_const_opcode(nir_op op, unsigned num_components,
> -  unsigned bit_size, nir_const_value
> *src);
> +  unsigned bit_size, nir_const_value
> *src,
> +  unsigned
> float_controls_execution_mode);
>
>  #endif /* NIR_CONSTANT_EXPRESSIONS_H */
> diff --git a/src/compiler/nir/nir_constant_expressions.py
> b/src/compiler/nir/nir_constant_expressions.py
> index 505cdd8baae..dc2132df0d0 100644
> --- a/src/compiler/nir/nir_constant_expressions.py
> +++ b/src/compiler/nir/nir_constant_expressions.py
> @@ -66,6 +66,37 @@ template = """\
>  #include "util/bigmath.h"
>  #include "nir_constant_expressions.h"
>
> +/**
> + * Checks if the provided value is a denorm and flushes it to zero.
> +*/
> +static nir_const_value
> +constant_denorm_flush_to_zero(nir_const_value value, unsigned index,
> unsigned bit_size)
> +{
> +   switch(bit_size) {
> +   case 64:
> +  if (value.u64[index] < 0x0010)
> + value.u64[index] = 0;
> +  if (value.u64[index] & 0x8000 &&
> +  !(value.u64[index] & 0x7ff0))
> + value.u64[index] = 0x8000;
> +  break;
> +   case 32:
> +  if (value.u32[index] < 0x0080)
> + value.u32[index] = 0;
> +  if (value.u32[index] & 0x8000 &&
> +  !(value.u32[index] & 0x7f80))
> + value.u32[index] = 0x8000;
> +  break;
> +   case 16:
> +  if (value.u16[index] < 0x0400)
> + value.u16[index] = 0;
> +  if (value.u16[index] & 0x8000 &&
> +  !(value.u16[index] & 0x7c00))
> + value.u16[index] = 0x8000;
> +   }
> +   return value;
> +}
> +
>  /**
>   * Evaluate one component of packSnorm4x8.
>   */
> @@ -260,7 +291,7 @@ struct ${type}${width}_vec {
>  % endfor
>  % endfor
>
> -<%def name="evaluate_op(op, bit_size)">
> +<%def name="evaluate_op(op, bit_size, execution_mode)">
> <%
> output_type = type_add_size(op.output_type, bit_size)
> input_types = [type_add_size(type_, bit_size) for type_ in
> op.input_types]
> @@ -277,6 +308,14 @@ struct ${type}${width}_vec {
>   <% continue %>
>%endif
>
> +  % for k in range(op.input_sizes[j]):
> + % if op.name != "fquantize2f16" and bit_size > 8 and
> op.input_types[j] == "float":
>

This condition doesn't seem right. It may happen to work, but it isn't
following NIR principles on ALU instructions. Each NIR opcode has an output
type (given by op.output_type) and input types for each source (given by
op.input_types). Each type may be sized, like "float32", or unsized like
"float". All unsized inputs/outputs must have the same bit size at runtime.
The bit_size here is the common bitsize for inputs/outputs with unsized
types like "float" in the opcode definition. Even though in general it's
only known at runtime, we're switching over all bitsizes in the generated
code, so here we do know what it is at compile time. In order to handle
sized types, we have to drop all references to bit_size and stop comparing
op.input_types[j] directly with "float" since it may be a sized type. Also,
if this source has a float type, we already know that its bit size is at
least 16, so the second check should be useless. I think what you actually
want to do is extract the base type, i.e. something like: op.name !=
"fquantize16" and type_base_name(input_types[j]) == "float"


> +if (execution_mode &
> SHADER_DENORM_FLUSH_TO_ZERO_FP${bit_size}) {
>

bit_size here isn't the right bit size for sized sources. You want
${type_size(input_types[j])}, which is the actual size of the source at
runtime. This will be bit_size if op.input_types[j] is unsized, or the
input size otherwise.
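
Putting those two corrections together, the generated template might end up
looking roughly like this (a sketch only, assuming the type_base_name()
helper suggested above exists in the generator):

  % if op.name != "fquantize2f16" and type_base_name(input_types[j]) == "float":
     if (execution_mode & SHADER_DENORM_FLUSH_TO_ZERO_FP${type_size(input_types[j])}) {
        _src[${j}] = constant_denorm_flush_to_zero(_src[${j}], ${k},
                                                   ${type_size(input_types[j])});
     }
  % endif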

+   _src[${j}] = 
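
As a quick sanity check on the masks used by constant_denorm_flush_to_zero
above, here is a standalone Python model of the 32-bit case (runnable as-is):

def flush32(u):
    # a zero exponent field means denorm or zero: keep only the sign bit
    if u < 0x00800000:                             # +denorm (or +0)
        u = 0
    if (u & 0x80000000) and not (u & 0x7f800000):  # -denorm (or -0)
        u = 0x80000000
    return u

assert flush32(0x00000001) == 0                    # smallest +denorm -> +0.0
assert flush32(0x80000001) == 0x80000000           # smallest -denorm -> -0.0
assert flush32(0x00800000) == 0x00800000           # smallest normal survives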

Re: [Mesa-dev] [PATCH 17/28] intel/nir: call nir_opt_constant_folding before nir_opt_algebraic is executed

2018-12-10 Thread Connor Abbott
On Mon, Dec 10, 2018 at 2:20 PM Samuel Iglesias Gonsálvez
 wrote:
>
> On 05/12/2018 19:22, Connor Abbott wrote:
> > Why is this needed? In general, we shouldn't be relying on
> > optimization ordering for correctness. It probably just means one of
> > the optimizations is wrong, and you're working around that.
>
> There is an issue when the shader tries to generate a NaN using 0/0:
> opt_algebraic replaces the 0 * NaN part with 0.
>
> This is an example of denorm * NaN code, although it is happening for
> other tests too. The optimization takes this code:
>
> vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
> vec1 32 ssa_2 = load_const (0x00800000 /* 0.000000 */)
> vec1 32 ssa_3 = load_const (0x008003f0 /* 0.000000 */)
> vec1 32 ssa_4 = fneg ssa_2
> vec1 32 ssa_5 = fadd ssa_3, ssa_4 <<<< generate denorm
> vec1 32 ssa_7 = frcp ssa_0
> vec1 32 ssa_8 = fmul ssa_0, ssa_7 <<<< generate NaN
> vec1 32 ssa_10 = fmul ssa_5, ssa_8 <<<< denorm * NaN
>
> and converts it to the following:
>
> vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
> vec1 32 ssa_2 = load_const (0x00800000 /* 0.000000 */)
> vec1 32 ssa_3 = load_const (0x008003f0 /* 0.000000 */)
> vec1 32 ssa_4 = fneg ssa_2
> vec1 32 ssa_5 = fadd ssa_3, ssa_4   <<<< build denorm
> vec1 32 ssa_7 = frcp ssa_0
> vec1 32 ssa_18 = load_const (0x00000000 /* 0.000000 */)
> vec1 32 ssa_19 = imov ssa_18   <<<< fmul (a, 0) = 0, so we miss NaN!
> vec1 32 ssa_10 = fmul ssa_5, ssa_19 <<<< denorm * 0
>
> Is it possible to represent fmul (a, 0) = 0 if 'a' is not NaN? Or
> fmul(a, NaN) = NaN? Do you have any other suggestion?

It sounds like another imprecise optimization that doesn't respect
NaN's in opt_algebraic. Either we have to mark everything as precise
when we must preserve NaN's (or flush denorms, or preserve denorms, or
...), or we have to do an audit of all the imprecise optimizations in
opt_algebraic, figure out which execution modes they don't respect,
and disable them when the offending execution modes are set. For
example, we could disable (fmul a, 0) -> 0 only when NaN's must be
preserved. We could also add weaker optimizations that are precise and
hence always enabled, like limiting that transform to when a is known
not to be NaN, but that's something for later.
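
Concretely, using the condition column that nir_opt_algebraic.py rules
already support (the nan_preserve query here is made up for illustration,
not an existing helper):

   # only fold a*0 -> 0 when NaNs are allowed to be dropped
   (('~fmul', a, 0.0), 0.0, '!nan_preserve(info)'),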

>
> Sam
>
> > On Wed, Dec 5, 2018 at 4:56 PM Samuel Iglesias Gonsálvez
> >  wrote:
> >>
> >> This would do constant folding and also flush denorm operands to zero
> >> before nir_opt_algebraic is executed.
> >>
> >> Signed-off-by: Samuel Iglesias Gonsálvez 
> >> ---
> >>  src/intel/compiler/brw_nir.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> >> index 0a5aa35c700..600f7a97df9 100644
> >> --- a/src/intel/compiler/brw_nir.c
> >> +++ b/src/intel/compiler/brw_nir.c
> >> @@ -570,8 +570,8 @@ brw_nir_optimize(nir_shader *nir, const struct 
> >> brw_compiler *compiler,
> >>OPT(nir_opt_cse);
> >>OPT(nir_opt_peephole_select, 0);
> >>OPT(nir_opt_intrinsics);
> >> -  OPT(nir_opt_algebraic);
> >>OPT(nir_opt_constant_folding);
> >> +  OPT(nir_opt_algebraic);
> >>OPT(nir_opt_dead_cf);
> >>if (OPT(nir_opt_trivial_continues)) {
> >>   /* If nir_opt_trivial_continues makes progress, then we need to 
> >> clean
> >> --
> >> 2.19.1
> >>


[Mesa-dev] [PATCH] nir: Fixup algebraic test for variable-sized conversions

2018-12-07 Thread Connor Abbott
b2i can now take any size boolean in preparation for 1-bit booleans, so
the error message printed is slightly different.

Fixes: dca6cd9ce65 ("nir: Make boolean conversions sized just like the others")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108961
Cc: Jason Ekstrand 
---
 src/compiler/nir/tests/algebraic_parser_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/nir/tests/algebraic_parser_test.py 
b/src/compiler/nir/tests/algebraic_parser_test.py
index 492a09ec7db..d96da7db519 100644
--- a/src/compiler/nir/tests/algebraic_parser_test.py
+++ b/src/compiler/nir/tests/algebraic_parser_test.py
@@ -67,7 +67,7 @@ class ValidatorTests(unittest.TestCase):
 
 def test_replace_src_bitsize(self):
 self.common((('iadd', a, ('b2i', b)), ('iadd', a, b)),
-"Sources a (bit size of a) and b (bit size of 32) " \
+"Sources a (bit size of a) and b (bit size of b) " \
 "of ('iadd', 'a', 'b') may not have the same bit size " \
 "when building the replacement expression.")
 
-- 
2.17.2



Re: [Mesa-dev] [PATCH 05/28] Revert "spirv: Don’t check for NaN for most OpFOrd* comparisons"

2018-12-07 Thread Connor Abbott
On Fri, Dec 7, 2018 at 1:16 AM Ian Romanick  wrote:
>
> On 12/05/2018 10:12 AM, Connor Abbott wrote:
> > This won't work, since this optimization in nir_opt_algebraic will undo it:
> >
> > # For any float comparison operation, "cmp", if you have "a == a && a cmp b"
> > # then the "a == a" is redundant because it's equivalent to "a is not NaN"
> > # and, if a is a NaN then the second comparison will fail anyway.
> > for op in ['flt', 'fge', 'feq']:
> >optimizations += [
> >   (('iand', ('feq', a, a), (op, a, b)), (op, a, b)),
> >   (('iand', ('feq', a, a), (op, b, a)), (op, b, a)),
> >]
> >
> > and then this optimization might change the behavior for NaN's:
> >
> ># Comparison simplifications
> >(('~inot', ('flt', a, b)), ('fge', a, b)),
> >(('~inot', ('fge', a, b)), ('flt', a, b)),
> >(('~inot', ('feq', a, b)), ('fne', a, b)),
> >(('~inot', ('fne', a, b)), ('feq', a, b)),
> >
> > The topic of NaN's and comparisons in NIR has been discussed several
> > times before, most recently with this thread:
> > https://lists.freedesktop.org/archives/mesa-dev/2018-December/210780.html
>
> These two optimizations do not play well together.  Each by itself is
> valid, but the combination is not. As I mentioned in the previous
> thread, I believe the first change should mark the resulting (op, a, b)
> as precise.  That would prevent the second optimization from triggering.
 I don't think nir_opt_algebraic has a way to do this, but it doesn't seem
> like it would be that hard to add.  Regardless of what happens with the
> rest, breaking the cumulative effect of these optimizations is necessary.

Well, the entire reason for the first optimization was the code that's
being reinstated in this commit, so we could probably get rid of it
once we get rid of the need for it. If user code actually does the
extra NaN check, then yeah, we might need a way to mark this as
precise. One long-term thing would be splitting up the "precise" flag
into a per-instruction bitflag with all the options introduced in this
series, so we can mark the compare as "preserves NaN" and still let
other NaN-preserving but e.g. incorrect wrt signed zero optimizations
take place.
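
A sketch of what such a split could look like; none of these flags exist
today, the names are invented:

# hypothetical per-instruction replacement for the single 'exact' bit
FLOAT_PRESERVE_NAN         = 1 << 0
FLOAT_PRESERVE_SIGNED_ZERO = 1 << 1
FLOAT_PRESERVE_INF         = 1 << 2

A transform that keeps NaNs but may flip the sign of zero would then be
legal on an instruction marked only with FLOAT_PRESERVE_NAN.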

>
> > Given that this language got added to the Vulkan spec: "By default,
> > the implementation may perform optimizations on half, single, or
> > double-precision floating-point instructions respectively that ignore
> > sign of a zero, or assume that arguments and results are not Nans or
> > ±∞" I think we should probably do the following:
> >
> > - Fix the CTS tests that prompted this whole block of code to only
> > check the result of comparing NaN when this extension is available and
> > NaN's are preserved.
> > - nir_op_fne must be unordered already, regardless of what
> > floating-point options the user asked for, since it's used to
> > implement isNan() already. We should probably also define nir_op_feq
> > to be ordered. So we don't have to do anything with them, and !(a ==
> > b) == (a == b) is guaranteed.
> > - Define fgt and fle to be ordered, as this is what every backend
> > already does. No need to add unnecessary NaN checks.
> > - Disable the comparison simplifications (except for the fne one,
> > which shouldn't be marked imprecise as it is now) when preserve-nan is
> > set. I think there are a few other imprecise transforms that also need
> > to be disabled.
>
> Even with the work that I've been doing this week, removing these
> optimizations still hurts (quite dramatically in a few cases) over 1500
> shaders in shader-db.

That's why I said "when preserve-nan is set." When an application sets
this flag, it means they really do want NaN to work correctly,
whatever the cost, so the combination of the first and second
optimizations is invalid.

> By and large, applications go to great lengths to
> avoid things that could generate NaN.  If a calculation generates NaN in
> a graphics shader, you've already lost.  As a result, these hurt shaders
> work as-is.  Adding an extra 8% instructions to a working shader doesn't
> help anyone.
>
> Based on the commit message in d55835b8bdf0, my hypothesis is that
> disabling the combination of the two sets of optimizations won't
> negatively affect any shaders.

It will, since it makes every comparison precise for SPIR-V. This
won't hurt only if you never emit the NaN checks unless preserve-nan
is set. But this will just emit some pointless NaN checks in vtn when
preserve-nan is enabled and then throw them away immediately in
opt_algebraic while marking the comparison as precise. So 

Re: [Mesa-dev] [PATCH 06/25] amd/common: scan/reduce across waves of a workgroup

2018-12-06 Thread Connor Abbott
Is this going to be used by an extension? If you don't have a use for
it yet, it would probably be better to wait.
On Thu, Dec 6, 2018 at 3:01 PM Nicolai Hähnle  wrote:
>
> From: Nicolai Hähnle 
>
> Order-aware scan/reduce can trade off LDS traffic for external atomic
> memory traffic in producer/consumer compute shaders.
> ---
>  src/amd/common/ac_llvm_build.c | 195 -
>  src/amd/common/ac_llvm_build.h |  36 ++
>  2 files changed, 227 insertions(+), 4 deletions(-)
>
> diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
> index 68c8bad9e83..932f4bbdeef 100644
> --- a/src/amd/common/ac_llvm_build.c
> +++ b/src/amd/common/ac_llvm_build.c
> @@ -3345,68 +3345,88 @@ ac_build_alu_op(struct ac_llvm_context *ctx, 
> LLVMValueRef lhs, LLVMValueRef rhs,
> _64bit ? ctx->f64 : ctx->f32,
> (LLVMValueRef[]){lhs, rhs}, 2, 
> AC_FUNC_ATTR_READNONE);
> case nir_op_iand: return LLVMBuildAnd(ctx->builder, lhs, rhs, "");
> case nir_op_ior: return LLVMBuildOr(ctx->builder, lhs, rhs, "");
> case nir_op_ixor: return LLVMBuildXor(ctx->builder, lhs, rhs, "");
> default:
> unreachable("bad reduction intrinsic");
> }
>  }
>
> -/* TODO: add inclusive and exclusive scan functions for SI chip class.  */
> +/**
> + * \param maxprefix specifies that the result only needs to be correct for a
> + * prefix of this many threads
> + *
> + * TODO: add inclusive and exclusive scan functions for SI chip class.
> + */
>  static LLVMValueRef
> -ac_build_scan(struct ac_llvm_context *ctx, nir_op op, LLVMValueRef src, 
> LLVMValueRef identity)
> +ac_build_scan(struct ac_llvm_context *ctx, nir_op op, LLVMValueRef src, 
> LLVMValueRef identity,
> + unsigned maxprefix)
>  {
> LLVMValueRef result, tmp;
> result = src;
> +   if (maxprefix <= 1)
> +   return result;
> tmp = ac_build_dpp(ctx, identity, src, dpp_row_sr(1), 0xf, 0xf, 
> false);
> result = ac_build_alu_op(ctx, result, tmp, op);
> +   if (maxprefix <= 2)
> +   return result;
> tmp = ac_build_dpp(ctx, identity, src, dpp_row_sr(2), 0xf, 0xf, 
> false);
> result = ac_build_alu_op(ctx, result, tmp, op);
> +   if (maxprefix <= 3)
> +   return result;
> tmp = ac_build_dpp(ctx, identity, src, dpp_row_sr(3), 0xf, 0xf, 
> false);
> result = ac_build_alu_op(ctx, result, tmp, op);
> +   if (maxprefix <= 4)
> +   return result;
> tmp = ac_build_dpp(ctx, identity, result, dpp_row_sr(4), 0xf, 0xe, 
> false);
> result = ac_build_alu_op(ctx, result, tmp, op);
> +   if (maxprefix <= 8)
> +   return result;
> tmp = ac_build_dpp(ctx, identity, result, dpp_row_sr(8), 0xf, 0xc, 
> false);
> result = ac_build_alu_op(ctx, result, tmp, op);
> +   if (maxprefix <= 16)
> +   return result;
> tmp = ac_build_dpp(ctx, identity, result, dpp_row_bcast15, 0xa, 0xf, 
> false);
> result = ac_build_alu_op(ctx, result, tmp, op);
> +   if (maxprefix <= 32)
> +   return result;
> tmp = ac_build_dpp(ctx, identity, result, dpp_row_bcast31, 0xc, 0xf, 
> false);
> result = ac_build_alu_op(ctx, result, tmp, op);
> return result;
>  }
>
>  LLVMValueRef
>  ac_build_inclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, 
> nir_op op)
>  {
> ac_build_optimization_barrier(ctx, &src);
> LLVMValueRef result;
> LLVMValueRef identity =
> get_reduction_identity(ctx, op, 
> ac_get_type_size(LLVMTypeOf(src)));
> result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, 
> src, identity),
>   LLVMTypeOf(identity), "");
> -   result = ac_build_scan(ctx, op, result, identity);
> +   result = ac_build_scan(ctx, op, result, identity, 64);
>
> return ac_build_wwm(ctx, result);
>  }
>
>  LLVMValueRef
>  ac_build_exclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, 
> nir_op op)
>  {
> ac_build_optimization_barrier(ctx, &src);
> LLVMValueRef result;
> LLVMValueRef identity =
> get_reduction_identity(ctx, op, 
> ac_get_type_size(LLVMTypeOf(src)));
> result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, 
> src, identity),
>   LLVMTypeOf(identity), "");
> result = ac_build_dpp(ctx, identity, result, dpp_wf_sr1, 0xf, 0xf, 
> false);
> -   result = ac_build_scan(ctx, op, result, identity);
> +   result = ac_build_scan(ctx, op, result, identity, 64);
>
> return ac_build_wwm(ctx, result);
>  }
>
>  LLVMValueRef
>  ac_build_reduce(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op op, 
> unsigned cluster_size)
>  {
> if (cluster_size == 1) return src;
> 
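
For intuition: ac_build_scan above is a wave-level take on the classic
log-step (Hillis-Steele) inclusive scan; the 1/2/3/4/8 shift sequence and
the two broadcasts exist because DPP row shifts only move data within
16-lane rows. A plain Python model of the underlying idea, with maxprefix
cutting the scan short (a sketch of the concept, not of the DPP sequence):

def inclusive_scan(vals, op, identity, maxprefix):
    # each step combines a lane with the lane 'shift' positions to its left;
    # lanes with no left neighbour pull in the identity, much like the
    # identity/bound-control handling in ac_build_dpp
    shift = 1
    while shift < maxprefix:
        vals = [op(vals[i - shift] if i >= shift else identity, vals[i])
                for i in range(len(vals))]
        shift *= 2
    return vals

assert inclusive_scan([1, 2, 3, 4], lambda x, y: x + y, 0, 4) == [1, 3, 6, 10]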

Re: [Mesa-dev] [PATCH v2 00/11] nir: Rework boolean conversions

2018-12-05 Thread Connor Abbott
I had a look, and this series is

Reviewed-by: Connor Abbott 

I think I already reviewed at least part of your 1-bit-boolean series,
and had some feedback, but I'm done reviewing for today :)
On Fri, Nov 30, 2018 at 12:37 AM Jason Ekstrand  wrote:
>
> This is mostly a re-send of my earlier series to rework boolean conversions
> in NIR which can be found here:
>
> https://lists.freedesktop.org/archives/mesa-dev/2018-November/209089.html
>
> This version solidly chooses path B from the previous series and is rebased
> on top of Connor's nir_algebraic.py reworks.  With his rework, the
> nir_algebraic.py changes required by this series are much simpler.
>
> Jason Ekstrand (11):
>   nir/opcodes: Pull in the type helpers from constant_expressions
>   nir/opcodes: Rename tbool to tbool32
>   nir/algebraic: Clean up some __str__ cruft
>   nir/algebraic: Refactor codegen a bit
>   nir/algebraic: Add support for unsized conversion opcodes
>   nir/opt_algebraic: Simplify an optimization using the new search ops
>   nir/opt_algebraic: Drop bit-size suffixes from conversions
>   nir/opt_algebraic: Add 32-bit specifiers to a bunch of booleans
>   nir: Make boolean conversions sized just like the others
>   FIXUP: nir/opt_algebraic: Add suffixes to some x2b opcodes
>   FIXUP: Fix NIR producers and consumers to use unsized conversions
>
>  src/amd/common/ac_nir_to_llvm.c  |  12 +-
>  src/broadcom/compiler/nir_to_vir.c   |   8 +-
>  src/compiler/glsl/glsl_to_nir.cpp|   2 +-
>  src/compiler/nir/nir.h   |   4 +-
>  src/compiler/nir/nir_algebraic.py|  74 ++--
>  src/compiler/nir/nir_builder.h   |  12 ++
>  src/compiler/nir/nir_constant_expressions.py |  25 +--
>  src/compiler/nir/nir_lower_idiv.c|   2 +-
>  src/compiler/nir/nir_lower_int64.c   |   2 +-
>  src/compiler/nir/nir_opcodes.py  |  73 +---
>  src/compiler/nir/nir_opcodes_c.py|  41 ++--
>  src/compiler/nir/nir_opt_algebraic.py| 187 +--
>  src/compiler/nir/nir_opt_if.c|   2 +-
>  src/compiler/nir/nir_search.c| 109 ++-
>  src/compiler/nir/nir_search.h|  17 +-
>  src/compiler/spirv/vtn_glsl450.c |   4 +-
>  src/freedreno/ir3/ir3_compiler_nir.c |  11 +-
>  src/gallium/drivers/vc4/vc4_program.c|   8 +-
>  src/intel/compiler/brw_fs_nir.cpp|  19 +-
>  src/intel/compiler/brw_vec4_nir.cpp  |   9 +-
>  src/mesa/program/prog_to_nir.c   |   4 +-
>  21 files changed, 397 insertions(+), 228 deletions(-)
>
> --
> 2.19.1
>


Re: [Mesa-dev] [PATCH 15/28] nir: support for denorm flush-to-zero in nir_lower_double_ops

2018-12-05 Thread Connor Abbott
All the current lowerings produce their result using a floating-point
multiply or add, so denorms should already be flushed (e.g.
nir_op_frcp), or they never produce a denorm (e.g. nir_op_ftrunc), so
I don't think this is necessary.
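
One note on the constant in the quoted hunk below: it is exactly DBL_MIN,
the smallest normal double, so the comparison really is an "is denorm"
test (quick Python check):

assert float("2.22507385850720138309023271733e-308") == 2.0 ** -1022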
On Wed, Dec 5, 2018 at 4:56 PM Samuel Iglesias Gonsálvez
 wrote:
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/nir/nir_lower_double_ops.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/src/compiler/nir/nir_lower_double_ops.c 
> b/src/compiler/nir/nir_lower_double_ops.c
> index b3543bc6963..97b825d2fdb 100644
> --- a/src/compiler/nir/nir_lower_double_ops.c
> +++ b/src/compiler/nir/nir_lower_double_ops.c
> @@ -558,6 +558,18 @@ lower_doubles_instr(nir_alu_instr *instr, 
> nir_lower_doubles_options options)
>unreachable("unhandled opcode");
> }
>
> +   bool denorm_flush_to_zero =
> +  bld.shader->info.shader_float_controls_execution_mode & 
> SHADER_DENORM_FLUSH_TO_ZERO_FP64;
> +   if (denorm_flush_to_zero) {
> +  /* TODO: add support for flushing negative denorms to -0.0 */
> +  /* Flush to zero if the result value is a denorm */
> +  result = nir_bcsel(&bld,
> + nir_flt(&bld, nir_fabs(&bld, result),
> + nir_imm_double(&bld, 2.22507385850720138309023271733e-308)),
> + nir_imm_double(&bld, 0.0),
> + result);
> +   }
> +
> nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(result));
> nir_instr_remove(&instr->instr);
> return true;
> --
> 2.19.1
>


Re: [Mesa-dev] [PATCH 14/28] nir: fix denorms in unpack_half_1x16()

2018-12-05 Thread Connor Abbott
On Wed, Dec 5, 2018 at 4:56 PM Samuel Iglesias Gonsálvez
 wrote:
>
> According to VK_KHR_shader_float_controls:
>
> "Denormalized values obtained via unpacking an integer into a vector
>  of values with smaller bit width and interpreting those values as
>  floating-point numbers must: be flushed to zero, unless the entry point
>  is declared with the code:DenormPreserve execution mode."
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/nir/nir_constant_expressions.py | 13 +
>  src/compiler/nir/nir_lower_alu_to_scalar.c   | 10 --
>  src/compiler/nir/nir_opcodes.py  |  5 +
>  3 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/nir/nir_constant_expressions.py 
> b/src/compiler/nir/nir_constant_expressions.py
> index a9af1bd233d..bc60a08da28 100644
> --- a/src/compiler/nir/nir_constant_expressions.py
> +++ b/src/compiler/nir/nir_constant_expressions.py
> @@ -245,6 +245,19 @@ pack_half_1x16(float x)
> return _mesa_float_to_half(x);
>  }
>
> +/**
> + * Evaluate one component of unpackHalf2x16.
> + */
> +static float
> +unpack_half_1x16_flush_to_zero(uint16_t u)
> +{
> +   if (u < 0x0400)
> +  u = 0;
> +   if (u & 0x8000 && !(u & 0x7c00))
> +  u = 0x8000;
> +   return _mesa_half_to_float(u);
> +}
> +
>  /**
>   * Evaluate one component of unpackHalf2x16.
>   */
> diff --git a/src/compiler/nir/nir_lower_alu_to_scalar.c 
> b/src/compiler/nir/nir_lower_alu_to_scalar.c
> index 7ef032cd164..d80cf2504c7 100644
> --- a/src/compiler/nir/nir_lower_alu_to_scalar.c
> +++ b/src/compiler/nir/nir_lower_alu_to_scalar.c
> @@ -133,8 +133,14 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder 
> *b)
>nir_ssa_def *packed = nir_ssa_for_alu_src(b, instr, 0);
>
>nir_ssa_def *comps[2];
> -  comps[0] = nir_unpack_half_2x16_split_x(b, packed);
> -  comps[1] = nir_unpack_half_2x16_split_y(b, packed);
> +
> +  if (b->shader->info.shader_float_controls_execution_mode & 
> SHADER_DENORM_FLUSH_TO_ZERO_FP16) {
> + comps[0] = nir_unpack_half_2x16_split_x_flush_to_zero(b, packed);
> + comps[1] = nir_unpack_half_2x16_split_y_flush_to_zero(b, packed);

This feels a little wrong... I think either we should add a
nir_op_unpack_half_2x16_flush_to_zero, or we shouldn't be changing
this here. We should be consistent on whether the flushing behavior is
implied by the environment or specified by the opcode.
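
If we go the explicit-opcode route, the combined opcode could mirror the
existing unpack_half_2x16 definition in nir_opcodes.py; a sketch of the
suggestion (this opcode does not exist, and it reuses the
unpack_half_1x16_flush_to_zero helper added by this patch):

unop_horiz("unpack_half_2x16_flush_to_zero", 2, tfloat32, 1, tuint32, """
dst.x = unpack_half_1x16_flush_to_zero((uint16_t)(src0.x & 0xffff));
dst.y = unpack_half_1x16_flush_to_zero((uint16_t)(src0.x >> 16));
""")

That way the flushing behaviour would be visible in the opcode definition
itself rather than implied by the execution environment.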

> +  } else {
> + comps[0] = nir_unpack_half_2x16_split_x(b, packed);
> + comps[1] = nir_unpack_half_2x16_split_y(b, packed);
> +  }
>nir_ssa_def *vec = nir_vec(b, comps, 2);
>
>nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(vec));
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index eb554a66b44..191025f6932 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
>  unop_convert("unpack_half_2x16_split_x", tfloat32, tuint32,
>   "unpack_half_1x16((uint16_t)(src0 & 0xffff))")
>  unop_convert("unpack_half_2x16_split_y", tfloat32, tuint32,
>   "unpack_half_1x16((uint16_t)(src0 >> 16))")
>
> +unop_convert("unpack_half_2x16_split_x_flush_to_zero", tfloat32, tuint32,
> + "unpack_half_1x16_flush_to_zero((uint16_t)(src0 & 0x))")
> +unop_convert("unpack_half_2x16_split_y_flush_to_zero", tfloat32, tuint32,
> + "unpack_half_1x16_flush_to_zero((uint16_t)(src0 >> 16))")
> +
>  unop_convert("unpack_32_2x16_split_x", tuint16, tuint32, "src0")
>  unop_convert("unpack_32_2x16_split_y", tuint16, tuint32, "src0 >> 16")
>
> --
> 2.19.1
>


Re: [Mesa-dev] [PATCH 17/28] intel/nir: call nir_opt_constant_folding before nir_opt_algebraic is executed

2018-12-05 Thread Connor Abbott
Why is this needed? In general, we shouldn't be relying on
optimization ordering for correctness. It probably just means one of
the optimizations is wrong, and you're working around that.
On Wed, Dec 5, 2018 at 4:56 PM Samuel Iglesias Gonsálvez
 wrote:
>
> This would do constant folding and also flush denorm operands to zero before
> nir_opt_algebraic is executed.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/intel/compiler/brw_nir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> index 0a5aa35c700..600f7a97df9 100644
> --- a/src/intel/compiler/brw_nir.c
> +++ b/src/intel/compiler/brw_nir.c
> @@ -570,8 +570,8 @@ brw_nir_optimize(nir_shader *nir, const struct 
> brw_compiler *compiler,
>OPT(nir_opt_cse);
>OPT(nir_opt_peephole_select, 0);
>OPT(nir_opt_intrinsics);
> -  OPT(nir_opt_algebraic);
>OPT(nir_opt_constant_folding);
> +  OPT(nir_opt_algebraic);
>OPT(nir_opt_dead_cf);
>if (OPT(nir_opt_trivial_continues)) {
>   /* If nir_opt_trivial_continues makes progress, then we need to 
> clean
> --
> 2.19.1
>


Re: [Mesa-dev] [PATCH 13/28] nir: take into account rounding modes in conversions

2018-12-05 Thread Connor Abbott
On Wed, Dec 5, 2018 at 4:56 PM Samuel Iglesias Gonsálvez
 wrote:
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/nir/nir.h   | 15 +++
>  src/compiler/nir/nir_constant_expressions.py | 46 +---
>  src/compiler/spirv/vtn_alu.c | 16 ++-
>  3 files changed, 71 insertions(+), 6 deletions(-)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 65a1f60c3c6..f22ac13b2ac 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -866,6 +866,21 @@ nir_get_nir_type_for_glsl_type(const struct glsl_type 
> *type)
>  nir_op nir_type_conversion_op(nir_alu_type src, nir_alu_type dst,
>nir_rounding_mode rnd);
>
> +static inline nir_rounding_mode
> +nir_get_rounding_mode_from_float_controls(unsigned rounding_mode,
> +  nir_alu_type type)
> +{
> +   if (nir_alu_type_get_base_type(type) != nir_type_float)
> +  return nir_rounding_mode_undef;
> +
> +   if (rounding_mode & SHADER_ROUNDING_MODE_RTZ)
> +  return nir_rounding_mode_rtz;
> +   if (rounding_mode & SHADER_ROUNDING_MODE_RTE)
> +  return nir_rounding_mode_rtne;
> +
> +   return nir_rounding_mode_undef;
> +}
> +
>  typedef enum {
> NIR_OP_IS_COMMUTATIVE = (1 << 0),
> NIR_OP_IS_ASSOCIATIVE = (1 << 1),
> diff --git a/src/compiler/nir/nir_constant_expressions.py 
> b/src/compiler/nir/nir_constant_expressions.py
> index 118af9f7818..a9af1bd233d 100644
> --- a/src/compiler/nir/nir_constant_expressions.py
> +++ b/src/compiler/nir/nir_constant_expressions.py
> @@ -79,6 +79,7 @@ template = """\
>  #include 
>  #include "util/rounding.h" /* for _mesa_roundeven */
>  #include "util/half_float.h"
> +#include "util/double.h"
>  #include "nir_constant_expressions.h"
>
>  /**
> @@ -300,7 +301,15 @@ struct bool32_vec {
>   % elif input_types[j] == "float16":
>  _mesa_half_to_float(_src[${j}].u16[${k}]),
>   % else:
> -_src[${j}].${get_const_field(input_types[j])}[${k}],
> +% if ("rtne" in op.name) and ("float" in input_types[j]) and 
> ("int" in output_type):

This should be done in the opcode definition, not buried inside
internal nir_constant_expressions code, unless there's a very good
reason. The whole reason the constant-folding code is put next to the
opcode is to document what it does, and you're defeating that here.
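
For instance, a rounding variant could be its own sized conversion opcode
whose constant expression spells the rounding out right next to the
definition (a sketch; no such opcode exists at this point in the series,
and it borrows the _mesa_float_to_float16_rtz helper used later in this
patch):

unop_convert("f2f16_rtz", tfloat16, tfloat32,
             "_mesa_float_to_float16_rtz(src0)")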

> +   % if "float32" in input_types[j]:
> +  
> _mesa_roundevenf(_src[${j}].${get_const_field(input_types[j])}[${k}]),
> +   % else:
> +  
> _mesa_roundeven(_src[${j}].${get_const_field(input_types[j])}[${k}]),
> +   % endif
> +% else:
> +   _src[${j}].${get_const_field(input_types[j])}[${k}],
> +% endif
>   % endif
>% endfor
>% for k in range(op.input_sizes[j], 4):
> @@ -328,8 +337,27 @@ struct bool32_vec {
> const float src${j} =
>_mesa_half_to_float(_src[${j}].u16[_i]);
>  % else:
> -   const ${input_types[j]}_t src${j} =
> -  _src[${j}].${get_const_field(input_types[j])}[_i];
> +   % if ("rtne" in op.name) and ("float" in input_types[j]) and 
> ("int" in output_type):
> +  % if "float32" in input_types[j]:
> + const ${input_types[j]}_t src${j} =
> +
> _mesa_roundevenf(_src[${j}].${get_const_field(input_types[j])}[_i]);
> +  % else:
> + const ${input_types[j]}_t src${j} =
> +
> _mesa_roundeven(_src[${j}].${get_const_field(input_types[j])}[_i]);
> +
> +  % endif
> +   % elif ("float64" in input_types[j]) and ("float32" in 
> output_type):
> +  % if ("rtz" in op.name):
> + const ${input_types[j]}_t src${j} =
> +
> _mesa_double_to_float_rtz(_src[${j}].${get_const_field(input_types[j])}[_i]);
> +  % else:
> + const ${input_types[j]}_t src${j} =
> +
> _mesa_double_to_float_rtne(_src[${j}].${get_const_field(input_types[j])}[_i]);
> +  % endif
> +   % else:
> +  const ${input_types[j]}_t src${j} =
> + _src[${j}].${get_const_field(input_types[j])}[_i];
> +   % endif
>  % endif
>   % endfor
>
> @@ -350,7 +378,11 @@ struct bool32_vec {
>  ## Sanitize the C value to a proper NIR bool
>  _dst_val.u32[_i] = dst ? NIR_TRUE : NIR_FALSE;
>   % elif output_type == "float16":
> -_dst_val.u16[_i] = _mesa_float_to_half(dst);
> +% if "rtz" in op.name:
> +   _dst_val.u16[_i] = _mesa_float_to_float16_rtz(dst);
> +% else:
> +   _dst_val.u16[_i] = 

Re: [Mesa-dev] [PATCH 09/28] nir/algebraic: fix (inf - inf) = NaN case

2018-12-05 Thread Connor Abbott
This is not acceptable, since this will disable the optimization even
when it would otherwise be allowed. Either we need to mark everything
as precise if one of these execution modes are enabled, or we need to
disable this optimization only if the appropriate preserve-NaN-and-Inf
mode is set.
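
For example, keeping the rule but gating it, in the style of the existing
condition strings (the helper name here is invented):

   (('~fadd', ('fneg', a), a), 0.0, '!preserve_inf_nan(info)'),
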
On Wed, Dec 5, 2018 at 4:56 PM Samuel Iglesias Gonsálvez
 wrote:
>
> If we have (inf - inf) we should return NaN, not 0.0. Same for
> (NaN - NaN) case.
>
> Fixes tests in Vulkan CTS that produce such kind subtractions.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index 747f1751086..e4f77e7b952 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -91,7 +91,6 @@ optimizations = [
> (('usadd_4x8', a, ~0), ~0),
> (('~fadd', ('fmul', a, b), ('fmul', a, c)), ('fmul', a, ('fadd', b, c))),
> (('iadd', ('imul', a, b), ('imul', a, c)), ('imul', a, ('iadd', b, c))),
> -   (('~fadd', ('fneg', a), a), 0.0),
> (('iadd', ('ineg', a), a), 0),
> (('iadd', ('ineg', a), ('iadd', a, b)), b),
> (('iadd', a, ('iadd', ('ineg', a), b)), b),
> @@ -891,7 +890,6 @@ before_ffma_optimizations = [
>
> (('~fadd', ('fmul', a, b), ('fmul', a, c)), ('fmul', a, ('fadd', b, c))),
> (('iadd', ('imul', a, b), ('imul', a, c)), ('imul', a, ('iadd', b, c))),
> -   (('~fadd', ('fneg', a), a), 0.0),
> (('iadd', ('ineg', a), a), 0),
> (('iadd', ('ineg', a), ('iadd', a, b)), b),
> (('iadd', a, ('iadd', ('ineg', a), b)), b),
> --
> 2.19.1
>


Re: [Mesa-dev] [PATCH 05/28] Revert "spirv: Don’t check for NaN for most OpFOrd* comparisons"

2018-12-05 Thread Connor Abbott
This won't work, since this optimization in nir_opt_algebraic will undo it:

# For any float comparison operation, "cmp", if you have "a == a && a cmp b"
# then the "a == a" is redundant because it's equivalent to "a is not NaN"
# and, if a is a NaN then the second comparison will fail anyway.
for op in ['flt', 'fge', 'feq']:
   optimizations += [
  (('iand', ('feq', a, a), (op, a, b)), (op, a, b)),
  (('iand', ('feq', a, a), (op, b, a)), (op, b, a)),
   ]

and then this optimization might change the behavior for NaN's:

   # Comparison simplifications
   (('~inot', ('flt', a, b)), ('fge', a, b)),
   (('~inot', ('fge', a, b)), ('flt', a, b)),
   (('~inot', ('feq', a, b)), ('fne', a, b)),
   (('~inot', ('fne', a, b)), ('feq', a, b)),

The topic of NaN's and comparisons in NIR has been discussed several
times before, most recently with this thread:
https://lists.freedesktop.org/archives/mesa-dev/2018-December/210780.html

Given that this language got added to the Vulkan spec: "By default,
the implementation may perform optimizations on half, single, or
double-precision floating-point instructions respectively that ignore
sign of a zero, or assume that arguments and results are not Nans or
±∞" I think we should probably do the following:

- Fix the CTS tests that prompted this whole block of code to only
check the result of comparing NaN when this extension is available and
NaN's are preserved.
- nir_op_fne must be unordered already, regardless of what
floating-point options the user asked for, since it's used to
implement isNan() already. We should probably also define nir_op_feq
to be ordered. So we don't have to do anything with them, and !(a ==
b) == (a == b) is guaranteed.
- Define fgt and fle to be ordered, as this is what every backend
already does. No need to add unnecessary NaN checks.
- Disable the comparison simplifications (except for the fne one,
which shouldn't be marked imprecise as it is now) when preserve-nan is
set. I think there are a few other imprecise transforms that also need
to be disabled.
- (optional) Add fgtu and fleu opcodes for unordered comparisons. This
might help backends which can do these in only one instruction. Even
if we don't do this, these can be implemented as not (fle a, b) and
not (fgt a, b) respectively, which is fewer instructions than the
current lowering (see the sketch after this list).
- (optional) Add fequ and fneo opcodes that do unordered equal and
ordered not-equal, respectively. Otherwise they have to be implemented
with explicit NaN checks like now.
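
A sketch of that lowering as opt_algebraic-style rules; fgtu, fleu, fgt
and fle are the hypothetical opcodes proposed in the list above, not
existing NIR opcodes:

   (('fgtu', a, b), ('inot', ('fle', a, b))),
   (('fleu', a, b), ('inot', ('fgt', a, b))),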

On Wed, Dec 5, 2018 at 4:56 PM Samuel Iglesias Gonsálvez
 wrote:
>
> This reverts commit c4ab1bdcc9710e3c7cc7115d3be9c69b7e7712ef. We need
> to check the arguments looking for NaNs, because they can introduce
> failures in tests for FOrd*, specially when running
> VK_KHR_shader_float_control tests in CTS.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/spirv/vtn_alu.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/src/compiler/spirv/vtn_alu.c b/src/compiler/spirv/vtn_alu.c
> index dc6fedc9129..629b57560ca 100644
> --- a/src/compiler/spirv/vtn_alu.c
> +++ b/src/compiler/spirv/vtn_alu.c
> @@ -535,18 +535,23 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
>break;
> }
>
> -   case SpvOpFOrdNotEqual: {
> -  /* For all the SpvOpFOrd* comparisons apart from NotEqual, the value
> -   * from the ALU will probably already be false if the operands are not
> -   * ordered so we don’t need to handle it specially.
> -   */
> +   case SpvOpFOrdEqual:
> +   case SpvOpFOrdNotEqual:
> +   case SpvOpFOrdLessThan:
> +   case SpvOpFOrdGreaterThan:
> +   case SpvOpFOrdLessThanEqual:
> +   case SpvOpFOrdGreaterThanEqual: {
>bool swap;
>unsigned src_bit_size = glsl_get_bit_size(vtn_src[0]->type);
>unsigned dst_bit_size = glsl_get_bit_size(type);
>nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &swap,
>src_bit_size, 
> dst_bit_size);
>
> -  assert(!swap);
> +  if (swap) {
> + nir_ssa_def *tmp = src[0];
> + src[0] = src[1];
> + src[1] = tmp;
> +  }
>
>val->ssa->def =
>   nir_iand(&b->nb,
> --
> 2.19.1
>


Re: [Mesa-dev] [PATCH 04/28] nir: add support for flushing to zero denorm constants

2018-12-05 Thread Connor Abbott
Given that other places call nir_eval_const_opcode(), and they'll be
broken unless they also flush denorms, it's probably a good idea to
move all this into nir_eval_const_opcode() itself.
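
In pseudocode, the suggestion looks like this (Python-style pseudocode for
the C function; flush_denorms_if_needed stands in for the per-bit-size mask
logic from the quoted patch):

def nir_eval_const_opcode(op, num_components, bit_size, src, execution_mode):
    # flush denorm inputs before folding, and the output after, so every
    # caller inherits the float-controls behaviour automatically
    src = [flush_denorms_if_needed(s, op, bit_size, execution_mode)
           for s in src]
    dst = evaluate(op, src)
    return flush_denorms_if_needed(dst, op, bit_size, execution_mode)

This is also where the extra float_controls_execution_mode parameter on
nir_eval_const_opcode(), seen at the top of this digest, comes from.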

On Wed, Dec 5, 2018 at 4:56 PM Samuel Iglesias Gonsálvez
 wrote:
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/nir/nir_opt_constant_folding.c | 74 +++--
>  1 file changed, 68 insertions(+), 6 deletions(-)
>
> diff --git a/src/compiler/nir/nir_opt_constant_folding.c 
> b/src/compiler/nir/nir_opt_constant_folding.c
> index 1fca530af24..a6df8284e17 100644
> --- a/src/compiler/nir/nir_opt_constant_folding.c
> +++ b/src/compiler/nir/nir_opt_constant_folding.c
> @@ -39,7 +39,7 @@ struct constant_fold_state {
>  };
>
>  static bool
> -constant_fold_alu_instr(nir_alu_instr *instr, void *mem_ctx)
> +constant_fold_alu_instr(nir_alu_instr *instr, void *mem_ctx, unsigned 
> execution_mode)
>  {
> nir_const_value src[NIR_MAX_VEC_COMPONENTS];
>
> @@ -77,12 +77,39 @@ constant_fold_alu_instr(nir_alu_instr *instr, void 
> *mem_ctx)
>   switch(load_const->def.bit_size) {
>   case 64:
>  src[i].u64[j] = load_const->value.u64[instr->src[i].swizzle[j]];
> +if (execution_mode & SHADER_DENORM_FLUSH_TO_ZERO_FP64 &&
> +(nir_op_infos[instr->op].input_types[i] == nir_type_float ||
> + nir_op_infos[instr->op].input_types[i] == 
> nir_type_float64)) {
> +   if (src[i].u64[j] < 0x0010000000000000)
> +  src[i].u64[j] = 0;
> +   if (src[i].u64[j] & 0x8000000000000000 &&
> +   !(src[i].u64[j] & 0x7ff0000000000000))
> +  src[i].u64[j] = 0x8000000000000000;
> +}

Given that this code is duplicated for inputs and outputs in this
patch, maybe refactor to a shared helper?

>  break;
>   case 32:
>  src[i].u32[j] = load_const->value.u32[instr->src[i].swizzle[j]];
> +if (execution_mode & SHADER_DENORM_FLUSH_TO_ZERO_FP32 &&
> +(nir_op_infos[instr->op].input_types[i] == nir_type_float ||
> + nir_op_infos[instr->op].input_types[i] == 
> nir_type_float32)) {
> +   if (src[i].u32[j] < 0x00800000)
> +  src[i].u32[j] = 0;
> +   if (src[i].u32[j] & 0x80000000 &&
> +   !(src[i].u32[j] & 0x7f800000))
> +  src[i].u32[j] = 0x80000000;
> +}
>  break;
>   case 16:
>  src[i].u16[j] = load_const->value.u16[instr->src[i].swizzle[j]];
> +if (execution_mode & SHADER_DENORM_FLUSH_TO_ZERO_FP16 &&
> +(nir_op_infos[instr->op].input_types[i] == nir_type_float ||
> + nir_op_infos[instr->op].input_types[i] == 
> nir_type_float16)) {
> +   if (src[i].u16[j] < 0x0400)
> +  src[i].u16[j] = 0;
> +   if (src[i].u16[j] & 0x8000 &&
> +   !(src[i].u16[j] & 0x7c00))
> +  src[i].u16[j] = 0x8000;
> +}
>  break;
>   case 8:
>  src[i].u8[j] = load_const->value.u8[instr->src[i].swizzle[j]];
> @@ -106,6 +133,40 @@ constant_fold_alu_instr(nir_alu_instr *instr, void 
> *mem_ctx)
>nir_eval_const_opcode(instr->op, instr->dest.dest.ssa.num_components,
>  bit_size, src);
>
> +   for (unsigned j = 0; j < instr->dest.dest.ssa.num_components; j++) {
> +  if (execution_mode & SHADER_DENORM_FLUSH_TO_ZERO_FP64 &&
> +  bit_size == 64 &&
> +  (nir_op_infos[instr->op].output_type == nir_type_float ||
> +   nir_op_infos[instr->op].output_type == nir_type_float64)) {

The bit_size doesn't have to equal the destination bitsize, it's the
bitsize for inputs and outputs which are unsized (e.g. output_type ==
nir_type_float instead of nir_type_float32). This should be
(output_type == nir_type_float && bit_size == 64) || output_type ==
nir_type_float64, and that goes for the other bitsizes too.

> + if (dest.u64[j] < 0x0010000000000000)
> +dest.u64[j] = 0;
> + if (dest.u64[j] & 0x8000000000000000 &&
> + !(dest.u64[j] & 0x7ff0000000000000))
> +dest.u64[j] = 0x8000000000000000;
> +  }
> +  if (execution_mode & SHADER_DENORM_FLUSH_TO_ZERO_FP32 &&
> +  bit_size == 32 &&
> +  (nir_op_infos[instr->op].output_type == nir_type_float ||
> +   nir_op_infos[instr->op].output_type == nir_type_float32)) {
> + if (dest.u32[j] < 0x00800000)
> +dest.u32[j] = 0;
> + if (dest.u32[j] & 0x80000000 &&
> + !(dest.u32[j] & 0x7f800000))
> +dest.u32[j] = 0x80000000;
> +  }
> +
> +  if (execution_mode & SHADER_DENORM_FLUSH_TO_ZERO_FP16 &&
> +  bit_size == 16 &&
> +  (nir_op_infos[instr->op].output_type == nir_type_float ||
> +   

[Mesa-dev] [PATCH v3] nir/algebraic: Rewrite bit-size inference

2018-12-05 Thread Connor Abbott
Before this commit, there were two copies of the algorithm: one in C,
that we would use to figure out what bit-size to give the replacement
expression, and one in Python, that emulated the C one and tried to
prove that the C algorithm would never fail to correctly assign
bit-sizes. That seemed pretty fragile, and likely to fall over if we
make any changes. Furthermore, the C code was really just recomputing
more-or-less the same thing as the Python code every time. Instead, we
can just store the results of the Python algorithm in the C
datastructure, and consult it to compute the bitsize of each value,
moving the "brains" entirely into Python. Since the Python algorithm no
longer has to match C, it's also a lot easier to change it to something
more closely approximating an actual type-inference algorithm. The
algorithm used is based on Hindley-Milner, although deliberately
weakened a little. It's a few more lines than the old one, judging by
the diffstat, but I think it's easier to verify that it's correct while
being as general as possible.

We could split this up into two changes, first making the C code use the
results of the Python code and then rewriting the Python algorithm, but
since the old algorithm never tracked which variable each equivalence
class corresponded to, it would mean we'd have to add some non-trivial
code which would
then get thrown away. I think it's better to see the final state all at
once, although I could also try splitting it up.

v2:
- Replace instances of "== None" and "!= None" with "is None" and
"is not None".
- Rename first_src to first_unsized_src
- Only merge the destination with the first unsized source, since the
sources have already been merged.
- Add a comment explaining what nir_search_value::bit_size now means.
v3:
- Fix one last instance to use "is not" instead of !=
- Don't try to be so clever when choosing which error message to print
based on whether we're in the search or replace expression.
- Fix trailing whitespace.
---
 src/compiler/nir/nir_algebraic.py | 520 --
 src/compiler/nir/nir_search.c | 146 +
 src/compiler/nir/nir_search.h |  17 +-
 3 files changed, 317 insertions(+), 366 deletions(-)

diff --git a/src/compiler/nir/nir_algebraic.py 
b/src/compiler/nir/nir_algebraic.py
index 728196136ab..efd6e52cdb9 100644
--- a/src/compiler/nir/nir_algebraic.py
+++ b/src/compiler/nir/nir_algebraic.py
@@ -88,7 +88,7 @@ class Value(object):
 
__template = mako.template.Template("""
 static const ${val.c_type} ${val.name} = {
-   { ${val.type_enum}, ${val.bit_size} },
+   { ${val.type_enum}, ${val.c_bit_size} },
 % if isinstance(val, Constant):
${val.type()}, { ${val.hex()} /* ${val.value} */ },
 % elif isinstance(val, Variable):
@@ -112,6 +112,40 @@ static const ${val.c_type} ${val.name} = {
def __str__(self):
   return self.in_val
 
+   def get_bit_size(self):
+  """Get the physical bit-size that has been chosen for this value, or if
+  there is none, the canonical value which currently represents this
+  bit-size class. Variables will be preferred, i.e. if there are any
+  variables in the equivalence class, the canonical value will be a
+  variable. We do this since we'll need to know which variable each value
+  is equivalent to when constructing the replacement expression. This is
+  the "find" part of the union-find algorithm.
+  """
+  bit_size = self
+
+  while isinstance(bit_size, Value):
+ if bit_size._bit_size is None:
+break
+ bit_size = bit_size._bit_size
+
+  if bit_size is not self:
+ self._bit_size = bit_size
+  return bit_size
+
+   def set_bit_size(self, other):
+  """Make self.get_bit_size() return what other.get_bit_size() return
+  before calling this, or just "other" if it's a concrete bit-size. This is
+  the "union" part of the union-find algorithm.
+  """
+
+  self_bit_size = self.get_bit_size()
+  other_bit_size = other if isinstance(other, int) else 
other.get_bit_size()
+
+  if self_bit_size == other_bit_size:
+ return
+
+  self_bit_size._bit_size = other_bit_size
+
@property
def type_enum(self):
   return "nir_search_value_" + self.type_str
@@ -124,6 +158,21 @@ static const ${val.c_type} ${val.name} = {
def c_ptr(self):
   return "&{0}.value".format(self.name)
 
+   @property
+   def c_bit_size(self):
+  bit_size = self.get_bit_size()
+  if isinstance(bit_size, int):
+ return bit_size
+  elif isinstance(bit_size, Variable):
+ return -bit_size.index - 1
+  else:
+ # If the bit-size class is neither a variable, nor an actual 
bit-size, then
+ # - If it's in the search expression, we don't need to check anything
+ # - If it's in the replace expression, either it's ambiguous (in which
+ # case we'd reject it), or it equals the bit-size of the search value
+ # We represent these cases 

Re: [Mesa-dev] [PATCH 1/2] nir/algebraic: Rewrite bit-size inference

2018-12-04 Thread Connor Abbott
On Mon, Dec 3, 2018 at 11:39 PM Dylan Baker  wrote:
>
> Quoting Jason Ekstrand (2018-12-03 14:12:41)
> > On Mon, Dec 3, 2018 at 3:50 PM Dylan Baker  wrote:
> >
> >     Quoting Connor Abbott (2018-11-29 10:32:02)
> > > Before this commit, there were two copies of the algorithm: one in C,
> > > that we would use to figure out what bit-size to give the replacement
> > > expression, and one in Python, that emulated the C one and tried to
> > > prove that the C algorithm would never fail to correctly assign
> > > bit-sizes. That seemed pretty fragile, and likely to fall over if we
> > > make any changes. Furthermore, the C code was really just recomputing
> > > more-or-less the same thing as the Python code every time. Instead, we
> > > can just store the results of the Python algorithm in the C
> > > datastructure, and consult it to compute the bitsize of each value,
> > > moving the "brains" entirely into Python. Since the Python algorithm 
> > no
> > > longer has to match C, it's also a lot easier to change it to 
> > something
> > > more closely approximating an actual type-inference algorithm. The
> > > algorithm used is based on Hindley-Milner, although deliberately
> > > weakened a little. It's a few more lines than the old one, judging by
> > > the diffstat, but I think it's easier to verify that it's correct 
> > while
> > > being as general as possible.
> > >
> > > We could split this up into two changes, first making the C code use 
> > the
> > > results of the Python code and then rewriting the Python algorithm, 
> > but
> > > since the old algorithm never tracked which variable each equivalence
> > > class corresponded to, it would mean we'd have to add some non-trivial
> > > code which would
> > > then get thrown away. I think it's better to see the final state all 
> > at
> > > once, although I could also try splitting it up.
> > > ---
> > >  src/compiler/nir/nir_algebraic.py | 518 
> > --
> > >  src/compiler/nir/nir_search.c | 146 +
> > >  src/compiler/nir/nir_search.h |   2 +-
> > >  3 files changed, 295 insertions(+), 371 deletions(-)
> > >
> > > diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/
> > nir_algebraic.py
> > > index 728196136ab..48390dbde38 100644
> > > --- a/src/compiler/nir/nir_algebraic.py
> > > +++ b/src/compiler/nir/nir_algebraic.py
> > > @@ -88,7 +88,7 @@ class Value(object):
> > >
> > > __template = mako.template.Template("""
> > >  static const ${val.c_type} ${val.name} = {
> > > -   { ${val.type_enum}, ${val.bit_size} },
> > > +   { ${val.type_enum}, ${val.c_bit_size} },
> > >  % if isinstance(val, Constant):
> > > ${val.type()}, { ${val.hex()} /* ${val.value} */ },
> > >  % elif isinstance(val, Variable):
> > > @@ -112,6 +112,40 @@ static const ${val.c_type} ${val.name} = {
> > > def __str__(self):
> > >return self.in_val
> > >
> > > +   def get_bit_size(self):
> > > +  """Get the physical bit-size that has been chosen for this 
> > value,
> > or if
> > > +  there is none, the canonical value which currently represents 
> > this
> > > +  bit-size class. Variables will be preferred, i.e. if there are 
> > any
> > > +  variables in the equivalence class, the canonical value will 
> > be a
> > > +  variable. We do this since we'll need to know which variable 
> > each
> > value
> > > +  is equivalent to when constructing the replacement expression.
> > This is
> > > +  the "find" part of the union-find algorithm.
> > > +  """
> > > +  bit_size = self
> > > +
> > > +  while isinstance(bit_size, Value):
> > > + if bit_size._bit_size == None:
> >
> > Use "is" and "is not" instead of "==" and "!=" when comparing singletons
> > like
> > None, True, False; the former are the identity operators, they'll be 
> > faster
> > and
> > avoid any surprises.
> >
> > > +break
> >

[Mesa-dev] [PATCH v2] nir/algebraic: Rewrite bit-size inference

2018-12-03 Thread Connor Abbott
Before this commit, there were two copies of the algorithm: one in C,
that we would use to figure out what bit-size to give the replacement
expression, and one in Python, that emulated the C one and tried to
prove that the C algorithm would never fail to correctly assign
bit-sizes. That seemed pretty fragile, and likely to fall over if we
make any changes. Furthermore, the C code was really just recomputing
more-or-less the same thing as the Python code every time. Instead, we
can just store the results of the Python algorithm in the C
datastructure, and consult it to compute the bitsize of each value,
moving the "brains" entirely into Python. Since the Python algorithm no
longer has to match C, it's also a lot easier to change it to something
more closely approximating an actual type-inference algorithm. The
algorithm used is based on Hindley-Milner, although deliberately
weakened a little. It's a few more lines than the old one, judging by
the diffstat, but I think it's easier to verify that it's correct while
being as general as possible.

We could split this up into two changes, first making the C code use the
results of the Python code and then rewriting the Python algorithm, but
since the old algorithm never tracked which variable each equivalence
class corresponded to, it would mean we'd have to add some non-trivial
code which would
then get thrown away. I think it's better to see the final state all at
once, although I could also try splitting it up.

v2:
- Replace instances of "== None" and "!= None" with "is None" and
"is not None".
- Rename first_src to first_unsized_src
- Only merge the destination with the first unsized source, since the
sources have already been merged.
- Add a comment explaining what nir_search_value::bit_size now means.
---
 src/compiler/nir/nir_algebraic.py | 518 --
 src/compiler/nir/nir_search.c | 146 +
 src/compiler/nir/nir_search.h |  17 +-
 3 files changed, 310 insertions(+), 371 deletions(-)

diff --git a/src/compiler/nir/nir_algebraic.py 
b/src/compiler/nir/nir_algebraic.py
index 728196136ab..1a5b6a1be01 100644
--- a/src/compiler/nir/nir_algebraic.py
+++ b/src/compiler/nir/nir_algebraic.py
@@ -88,7 +88,7 @@ class Value(object):
 
__template = mako.template.Template("""
 static const ${val.c_type} ${val.name} = {
-   { ${val.type_enum}, ${val.bit_size} },
+   { ${val.type_enum}, ${val.c_bit_size} },
 % if isinstance(val, Constant):
${val.type()}, { ${val.hex()} /* ${val.value} */ },
 % elif isinstance(val, Variable):
@@ -112,6 +112,40 @@ static const ${val.c_type} ${val.name} = {
def __str__(self):
   return self.in_val
 
+   def get_bit_size(self):
+  """Get the physical bit-size that has been chosen for this value, or if
+  there is none, the canonical value which currently represents this
+  bit-size class. Variables will be preferred, i.e. if there are any
+  variables in the equivalence class, the canonical value will be a
+  variable. We do this since we'll need to know which variable each value
+  is equivalent to when constructing the replacement expression. This is
+  the "find" part of the union-find algorithm.
+  """
+  bit_size = self
+
+  while isinstance(bit_size, Value):
+ if bit_size._bit_size is None:
+break
+ bit_size = bit_size._bit_size
+
+  if bit_size != self:
+ self._bit_size = bit_size
+  return bit_size
+
+   def set_bit_size(self, other):
+  """Make self.get_bit_size() return what other.get_bit_size() return
+  before calling this, or just "other" if it's a concrete bit-size. This is
+  the "union" part of the union-find algorithm.
+  """
+
+  self_bit_size = self.get_bit_size()
+  other_bit_size = other if isinstance(other, int) else 
other.get_bit_size()
+
+  if self_bit_size == other_bit_size:
+ return
+
+  self_bit_size._bit_size = other_bit_size
+
@property
def type_enum(self):
   return "nir_search_value_" + self.type_str
@@ -124,6 +158,21 @@ static const ${val.c_type} ${val.name} = {
def c_ptr(self):
   return "&{0}.value".format(self.name)
 
+   @property
+   def c_bit_size(self):
+  bit_size = self.get_bit_size()
+  if isinstance(bit_size, int):
+ return bit_size
+  elif isinstance(bit_size, Variable):
+ return -bit_size.index - 1
+  else:
+ # If the bit-size class is neither a variable, nor an actual 
bit-size, then
+ # - If it's in the search expression, we don't need to check anything
+ # - If it's in the replace expression, either it's ambiguous (in which
+ # case we'd reject it), or it equals the bit-size of the search value
+ # We represent these cases with a 0 bit-size.
+ return 0
+
def render(self):
   return self.__template.render(val=self,
 Constant=Constant,
@@ -140,14 +189,14 @@ class Constant(Value):
   if 

Re: [Mesa-dev] [PATCH 1/2] nir: add a compiler option for disabling float comparison simplifications

2018-12-01 Thread Connor Abbott
On Sat, Dec 1, 2018 at 3:22 PM Samuel Pitoiset
 wrote:
>
> I'm not saying this series is the right thing to do. It just fixes two
> test failures in the vkd3d testsuite for RADV. I added a new compiler
> option to not break anything and to only affect RADV. Anyways, it seems
> unclear what the best option is. To sum up, looks like there is 3 ways:
>
> 1) set the exact bit for all SPIRV float comparisons ops
> 2) draft a new extension, not sure if we really need to
> 3) just remove these optimisations when targeting Vulkan
>
> Opinions are welcome, thanks!

Well, I was just pointing out that this has bit us in the past, and
will probably bite us in the future if we don't fix it once and for
all, so it's about more than the vkd3d testsuite. We're just getting
lucky since right now the optimizations only trigger and mess things
up with this testsuite. After thinking about it, I think it's best if
we do another option 4, which is to remove the optimizations in
question and always do the right thing even for GLSL.

>
> On 11/29/18 4:22 PM, Jason Ekstrand wrote:
> > Can you provide some context for this?  Those rules are already flagged
> > "inexact" (that's what the ~ means) so they won't apply to anything
> > that's "precise" or "invariant".
> >
> > On Thu, Nov 29, 2018 at 9:18 AM Samuel Pitoiset
> > mailto:samuel.pitoi...@gmail.com>> wrote:
> >
> > It's correct in GLSL because the behaviour is undefined in
> > presence of NaNs. But this seems incorrect in Vulkan.
> >
> > Signed-off-by: Samuel Pitoiset  > >
> > ---
> >   src/compiler/nir/nir.h| 6 ++
> >   src/compiler/nir/nir_opt_algebraic.py | 8 
> >   2 files changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> > index db935c8496b..4107c293962 100644
> > --- a/src/compiler/nir/nir.h
> > +++ b/src/compiler/nir/nir.h
> > @@ -2188,6 +2188,12 @@ typedef struct nir_shader_compiler_options {
> >  /* Set if nir_lower_wpos_ytransform() should also invert
> > gl_PointCoord. */
> >  bool lower_wpos_pntc;
> >
> > +   /* If false, lower ~inot(flt(a,b)) -> fge(a,b) and variants.
> > +* In presence of NaNs, this is correct in GLSL because the
> > behaviour is
> > +* undefined. In Vulkan, doing these transformations is
> > incorrect.
> > +*/
> > +   bool exact_float_comparisons;
> > +
> >  /**
> >   * Should nir_lower_io() create load_interpolated_input intrinsics?
> >   *
> > diff --git a/src/compiler/nir/nir_opt_algebraic.py
> > b/src/compiler/nir/nir_opt_algebraic.py
> > index f2a7be0c403..3750874407b 100644
> > --- a/src/compiler/nir/nir_opt_algebraic.py
> > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > @@ -154,10 +154,10 @@ optimizations = [
> >  (('ishl', ('imul', a, '#b'), '#c'), ('imul', a, ('ishl', b, c))),
> >
> >  # Comparison simplifications
> > -   (('~inot', ('flt', a, b)), ('fge', a, b)),
> > -   (('~inot', ('fge', a, b)), ('flt', a, b)),
> > -   (('~inot', ('feq', a, b)), ('fne', a, b)),
> > -   (('~inot', ('fne', a, b)), ('feq', a, b)),
> > +   (('~inot', ('flt', a, b)), ('fge', a, b),
> > '!options->exact_float_comparisons'),
> > +   (('~inot', ('fge', a, b)), ('flt', a, b),
> > '!options->exact_float_comparisons'),
> > +   (('~inot', ('feq', a, b)), ('fne', a, b),
> > '!options->exact_float_comparisons'),
> > +   (('~inot', ('fne', a, b)), ('feq', a, b),
> > '!options->exact_float_comparisons'),
> >
> >
> > The feq/fne one is actually completely safe.  fne is defined to be !feq
> > even when NaN is considered.
> >
> > --Jason
> >
> >  (('inot', ('ilt', a, b)), ('ige', a, b)),
> >  (('inot', ('ult', a, b)), ('uge', a, b)),
> >  (('inot', ('ige', a, b)), ('ilt', a, b)),
> > --
> > 2.19.2
> >


Re: [Mesa-dev] [PATCH 1/2] nir: add a compiler option for disabling float comparison simplifications

2018-12-01 Thread Connor Abbott
On Fri, Nov 30, 2018 at 10:18 PM Ian Romanick  wrote:
>
> On 11/29/2018 07:47 AM, Connor Abbott wrote:
> > On Thu, Nov 29, 2018 at 4:22 PM Jason Ekstrand  wrote:
> >>
> >> Can you provide some context for this?  Those rules are already flagged 
> >> "inexact" (that's what the ~ means) so they won't apply to anything that's 
> >> "precise" or "invariant".
> >
> > I think the concern is that this isn't allowed in SPIR-V, even without
> > exact or invariant. We even go out of our way to do the correct thing
> > in the frontend by inserting an "&& a == a" or "|| a != a", but then
>
> If you're that paranoid about it, why not just mark the operations are
> precise?  That's literally why it exists.

Yes, that's right. And if we decide we need to get it correct for
SPIR-V, then what it's doing now is broken anyways...

>
> > opt_algebraic removes it with another rule and then this rule can flip
> > it from ordered to unordered. The spec says that operations don't have
> > to produce NaN, but it doesn't say anything on comparisons other than
> > the generic "everything must follow IEEE rules" and an entry in the
> > table that says "produces correct results." Then again, I can't find
> > anything in GLSL allowing these transforms either, so maybe we just
> > need to get rid of them.
>
> What I hear you saying is, "The behavior isn't defined."  Unless you can
> point to a CTS test or an application that has incorrect behavior, I'm
> going to oppose removing this pretty strongly.  *Every* GLSL compiler
> does this.

No, I don't think ARB_shader_precision actually says that the
behavior is undefined. While it does say that you don't have to
produce NaN's, it also says that intBitsToFloat() must produce a NaN
given the right input, it otherwise just says that comparisons must
produce the "correct result," with no exception for NaN's. "correct
result" does not mean "the behavior is undefined." It never refers
back to the IEEE spec or says what "correct result" means, but one
can only assume it's referring to the required unsignaling
comparisons (Table 5.1 and 5.3 in IEEE 754-2008), which is also what C
defines them to be. Those rules haven't changed much since, and
they're basically the same for Vulkan.
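To make that concrete, here's a standalone C demo (not Mesa code) of why
negating an ordered comparison isn't equivalent to the opposite ordered
comparison once NaN is involved -- which is exactly the flip the ~inot
rules above perform:

#include <math.h>
#include <stdio.h>

int main(void)
{
   float a = NAN, b = 1.0f;
   /* Ordered '<' is false when either operand is NaN... */
   printf("!(a < b) = %d\n", !(a < b)); /* prints 1 */
   /* ...so its negation is the unordered '>=', not the ordered one. */
   printf("a >= b   = %d\n", a >= b);   /* prints 0 */
   return 0;
}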

As others have said, there are currently Vulkan CTS tests that
actually check comparisons with NaN, and we currently pass them
basically by dumb luck because of the brokenness I mentioned (see mesa
commit e062eb6415de3aa51b43f30d638ce8215efc0511 which introduced the
extra checks for NaN and cites the CTS tests). It would probably be an
uphill battle to change the CTS tests, partially because one can argue
that it actually is required, but also because of the CL-over-Vulkan
efforts, as well as DXVK and VKD3D which are emulating API's that need
comparisons with NaN to work correctly. Also, according to
https://patchwork.freedesktop.org/patch/206486/, apparently
Wolfenstein 2 actually does care about it and breaks if you change
ordered to unordered -- again, we're getting it right by dumb luck.
And it's probably likely that some DX game does it, and we also get it
right by dumb luck. Just think about how much crazy stuff games come
to rely on by accident! We could make comparisons precise in SPIR-V,
but then we'd need to make unordered comparisons fast anyways, and it
seems silly to let GL and SPIR-V diverge like that, especially when
Wine has been emulating DX over GL for a long time.

Best,

Connor


>
> >> On Thu, Nov 29, 2018 at 9:18 AM Samuel Pitoiset 
> >>  wrote:
> >>>
> >>> It's correct in GLSL because the behaviour is undefined in
> >>> presence of NaNs. But this seems incorrect in Vulkan.
> >>>
> >>> Signed-off-by: Samuel Pitoiset 
> >>> ---
> >>>  src/compiler/nir/nir.h| 6 ++
> >>>  src/compiler/nir/nir_opt_algebraic.py | 8 
> >>>  2 files changed, 10 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> >>> index db935c8496b..4107c293962 100644
> >>> --- a/src/compiler/nir/nir.h
> >>> +++ b/src/compiler/nir/nir.h
> >>> @@ -2188,6 +2188,12 @@ typedef struct nir_shader_compiler_options {
> >>> /* Set if nir_lower_wpos_ytransform() should also invert 
> >>> gl_PointCoord. */
> >>> bool lower_wpos_pntc;
> >>>
> >>> +   /* If false, lower ~inot(flt(a,b)) -> fge(a,b) and variants.
> >>> +* In presence of NaNs, this is correct in GLSL because the 
> >>> b

[Mesa-dev] [PATCH 2/2] nir/algebraic: Add unit tests for bitsize validation

2018-11-29 Thread Connor Abbott
The non-failure path can be tested by just compiling mesa and then
testing it, but the failure paths won't be hit unless you make a mistake,
so it's best to test them with some unit tests.
---
 src/compiler/Makefile.nir.am  |   4 +-
 src/compiler/nir/meson.build  |   7 ++
 .../nir/tests/algebraic_parser_test.py| 116 ++
 .../nir/tests/algebraic_parser_test.sh|   3 +
 4 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 src/compiler/nir/tests/algebraic_parser_test.py
 create mode 100644 src/compiler/nir/tests/algebraic_parser_test.sh

diff --git a/src/compiler/Makefile.nir.am b/src/compiler/Makefile.nir.am
index c646c6bdc1e..aa0a92856f1 100644
--- a/src/compiler/Makefile.nir.am
+++ b/src/compiler/Makefile.nir.am
@@ -87,10 +87,12 @@ nir_tests_vars_tests_SOURCES = nir/tests/vars_tests.cpp
 nir_tests_vars_tests_CFLAGS = $(NIR_TESTS_CFLAGS)
 nir_tests_vars_tests_LDADD = $(NIR_TESTS_LDADD)
 
+check_SCRIPTS = nir/tests/algebraic_parser_test.sh
 
 TESTS += \
 nir/tests/control_flow_tests \
-nir/tests/vars_tests
+nir/tests/vars_tests \
+   nir/tests/algebraic_parser_test.sh
 
 
 BUILT_SOURCES += \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index b0c3a7feb31..84e58cafb6f 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -259,4 +259,11 @@ if with_tests
   link_with : libmesa_util,
 )
   )
+  test(
+'nir_algebraic_parser',
+prog_python,
+args : [
+  join_paths(meson.current_source_dir(), 'tests/algebraic_parser_test.py')
+],
+  )
 endif
diff --git a/src/compiler/nir/tests/algebraic_parser_test.py 
b/src/compiler/nir/tests/algebraic_parser_test.py
new file mode 100644
index 000..492a09ec7db
--- /dev/null
+++ b/src/compiler/nir/tests/algebraic_parser_test.py
@@ -0,0 +1,116 @@
+#
+# Copyright (C) 2018 Valve Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+# IN THE SOFTWARE.
+
+import unittest
+
+import sys
+import os
+sys.path.insert(1, os.path.join(sys.path[0], '..'))
+
+from nir_algebraic import SearchAndReplace
+
+# These tests check that the bitsize validator correctly rejects various
+# different kinds of malformed expressions, and documents what the error
+# message looks like.
+
+a = 'a'
+b = 'b'
+c = 'c'
+
+class ValidatorTests(unittest.TestCase):
+class ValidatorTests(unittest.TestCase):
+    pattern = ()
+    message = ''
+
+    def common(self, pattern, message):
+        with self.assertRaises(AssertionError) as context:
+            SearchAndReplace(pattern)
+
+        self.assertEqual(message, str(context.exception))
+
+    def test_wrong_src_count(self):
+        self.common((('iadd', a), ('fadd', a, a)),
+            "Expression ('iadd', 'a') has 1 sources, expected 2")
+
+    def test_var_bitsize(self):
+        self.common((('iadd', 'a@32', 'a@64'), ('fadd', a, a)),
+            "Variable a has conflicting bit size requirements: " \
+            "it must have bit size 32 and 64")
+
+    def test_var_bitsize_2(self):
+        self.common((('iadd', a, 'a@32'), ('fadd', 'a@64', a)),
+            "Variable a has conflicting bit size requirements: " \
+            "it must have bit size 32 and 64")
+
+    def test_search_src_bitsize(self):
+        self.common((('iadd', 'a@32', 'b@64'), ('fadd', a, b)),
+            "Source a@32 of ('iadd', 'a@32', 'b@64') must have bit size 32, " \
+            "while source b@64 must have incompatible bit size 64")
+
+    def test_replace_src_bitsize(self):
+        self.common((('iadd', a, ('b2i', b)), ('iadd', a, b)),
+            "Sources a (bit size of a) and b (bit size of 32) " \
+            "of ('iadd', 'a', 'b') may not have the same bit size " \
+            "when building the replacement expression.")
+
+    def test_search_src_bitsize_fixed(self):
+        self.common((('ishl', a, 'b@64'), ('ishl', a, b)),
+            "b@64 must have 64 bits, but as a source of 

[Mesa-dev] [PATCH 1/2] nir/algebraic: Rewrite bit-size inference

2018-11-29 Thread Connor Abbott
Before this commit, there were two copies of the algorithm: one in C,
that we would use to figure out what bit-size to give the replacement
expression, and one in Python, that emulated the C one and tried to
prove that the C algorithm would never fail to correctly assign
bit-sizes. That seemed pretty fragile, and likely to fall over if we
make any changes. Furthermore, the C code was really just recomputing
more-or-less the same thing as the Python code every time. Instead, we
can just store the results of the Python algorithm in the C
datastructure, and consult it to compute the bitsize of each value,
moving the "brains" entirely into Python. Since the Python algorithm no
longer has to match C, it's also a lot easier to change it to something
more closely approximating an actual type-inference algorithm. The
algorithm used is based on Hindley-Milner, although deliberately
weakened a little. It's a few more lines than the old one, judging by
the diffstat, but I think it's easier to verify that it's correct while
being as general as possible.

We could split this up into two changes, first making the C code use the
results of the Python code and then rewriting the Python algorithm, but
since the old algorithm never tracked which variable represented each
equivalence class, we'd have to add some non-trivial code which would
then get thrown away. I think it's better to see the final state all at
once, although I could also try splitting it up.
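(For anyone who doesn't have the algorithm fresh in mind, the shape of
what the Python code below implements is ordinary union-find with path
compression; here's a generic C sketch, independent of the
nir_algebraic details:

struct node {
   struct node *parent; /* points to itself when it's the class leader */
};

static struct node *
find(struct node *n)
{
   /* the "find" half: walk to the leader, halving the path as we go */
   while (n->parent != n)
      n = n->parent = n->parent->parent;
   return n;
}

static void
merge(struct node *a, struct node *b)
{
   /* the "union" half: re-parent one leader onto the other */
   a = find(a);
   b = find(b);
   if (a != b)
      a->parent = b;
}

The real code differs in that a leader can be a concrete bit-size and
variables are preferred as leaders, but the find/union skeleton is the
same.)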
---
 src/compiler/nir/nir_algebraic.py | 518 --
 src/compiler/nir/nir_search.c | 146 +
 src/compiler/nir/nir_search.h |   2 +-
 3 files changed, 295 insertions(+), 371 deletions(-)

diff --git a/src/compiler/nir/nir_algebraic.py 
b/src/compiler/nir/nir_algebraic.py
index 728196136ab..48390dbde38 100644
--- a/src/compiler/nir/nir_algebraic.py
+++ b/src/compiler/nir/nir_algebraic.py
@@ -88,7 +88,7 @@ class Value(object):
 
__template = mako.template.Template("""
 static const ${val.c_type} ${val.name} = {
-   { ${val.type_enum}, ${val.bit_size} },
+   { ${val.type_enum}, ${val.c_bit_size} },
 % if isinstance(val, Constant):
${val.type()}, { ${val.hex()} /* ${val.value} */ },
 % elif isinstance(val, Variable):
@@ -112,6 +112,40 @@ static const ${val.c_type} ${val.name} = {
def __str__(self):
   return self.in_val
 
+   def get_bit_size(self):
+  """Get the physical bit-size that has been chosen for this value, or if
+  there is none, the canonical value which currently represents this
+  bit-size class. Variables will be preferred, i.e. if there are any
+  variables in the equivalence class, the canonical value will be a
+  variable. We do this since we'll need to know which variable each value
+  is equivalent to when constructing the replacement expression. This is
+  the "find" part of the union-find algorithm.
+  """
+  bit_size = self
+
+  while isinstance(bit_size, Value):
+ if bit_size._bit_size == None:
+break
+ bit_size = bit_size._bit_size
+
+  if bit_size != self:
+ self._bit_size = bit_size
+  return bit_size
+
+   def set_bit_size(self, other):
+  """Make self.get_bit_size() return what other.get_bit_size() return
+  before calling this, or just "other" if it's a concrete bit-size. This is
+  the "union" part of the union-find algorithm.
+  """
+
+  self_bit_size = self.get_bit_size()
+  other_bit_size = other if isinstance(other, int) else 
other.get_bit_size()
+
+  if self_bit_size == other_bit_size:
+ return
+
+  self_bit_size._bit_size = other_bit_size
+
@property
def type_enum(self):
   return "nir_search_value_" + self.type_str
@@ -124,6 +158,21 @@ static const ${val.c_type} ${val.name} = {
def c_ptr(self):
   return "&{0}.value".format(self.name)
 
+   @property
+   def c_bit_size(self):
+      bit_size = self.get_bit_size()
+      if isinstance(bit_size, int):
+         return bit_size
+      elif isinstance(bit_size, Variable):
+         return -bit_size.index - 1
+      else:
+         # If the bit-size class is neither a variable nor an actual bit-size,
+         # then:
+         # - If it's in the search expression, we don't need to check anything
+         # - If it's in the replace expression, either it's ambiguous (in which
+         #   case we'd reject it), or it equals the bit-size of the search value
+         # We represent these cases with a 0 bit-size.
+         return 0
+
def render(self):
   return self.__template.render(val=self,
 Constant=Constant,
@@ -140,14 +189,14 @@ class Constant(Value):
   if isinstance(val, (str)):
  m = _constant_re.match(val)
  self.value = ast.literal_eval(m.group('value'))
- self.bit_size = int(m.group('bits')) if m.group('bits') else 0
+ self._bit_size = int(m.group('bits')) if m.group('bits') else None
   else:
  self.value 

[Mesa-dev] [PATCH 0/2] nir/algebraic: Rewrite bit-size handling

2018-11-29 Thread Connor Abbott
While nir_algebraic in general is great, the code to implement bit-size
inference has always been pretty fragile. While I was trying to remember
how everything worked in order to review Jason's patches touching this
area, I realized that I could make the whole thing significantly
simpler. This series is the end result of that. Since I was nice, I also
included some tests to exercise all the different errors you can hit. It
conflicts a little bit with Jason's series, since the bit size inference
algorithm has to be aware of the new sizeless comparison operators, but
I don't think it'll be too bad. And it means that I don't have to review
the trickiest prep patches for that series :)

This series is available at:
https://gitlab.freedesktop.org/cwabbott0/mesa/commits/nir-bitsize-validator-rewrite
It might be better to go there to see the final version of the
BitSizeValidator, since that part of the diff won't be too helpful.

Finally, I haven't actually tested the compiled mesa, but I'm working on
it :) It would be good to give this a run through the Intel CI.

Connor Abbott (2):
  nir/algebraic: Rewrite bit-size inference
  nir/algebraic: Add unit tests for bitsize validation

 src/compiler/Makefile.nir.am  |   4 +-
 src/compiler/nir/meson.build  |   7 +
 src/compiler/nir/nir_algebraic.py | 518 ++
 src/compiler/nir/nir_search.c | 146 +
 src/compiler/nir/nir_search.h |   2 +-
 .../nir/tests/algebraic_parser_test.py| 116 
 .../nir/tests/algebraic_parser_test.sh|   3 +
 7 files changed, 424 insertions(+), 372 deletions(-)
 create mode 100644 src/compiler/nir/tests/algebraic_parser_test.py
 create mode 100644 src/compiler/nir/tests/algebraic_parser_test.sh

-- 
2.17.2



Re: [Mesa-dev] [PATCH 1/2] nir: add a compiler option for disabling float comparison simplifications

2018-11-29 Thread Connor Abbott
On Thu, Nov 29, 2018 at 4:47 PM Connor Abbott  wrote:
>
> On Thu, Nov 29, 2018 at 4:22 PM Jason Ekstrand  wrote:
> >
> > Can you provide some context for this?  Those rules are already flagged 
> > "inexact" (that's what the ~ means) so they won't apply to anything that's 
> > "precise" or "invariant".
>
> I think the concern is that this isn't allowed in SPIR-V, even without
> exact or invariant. We even go out of our way to do the correct thing
> in the frontend by inserting an "&& a == a" or "|| a != a", but then
> opt_algebraic removes it with another rule and then this rule can flip
> it from ordered to unordered. The spec says that operations don't have
> to produce NaN, but it doesn't say anything on comparisons other than
> the generic "everything must follow IEEE rules" and an entry in the
> table that says "produces correct results." Then again, I can't find
> anything in GLSL allowing these transforms either, so maybe we just
> need to get rid of them.

Sorry... by SPIR-V, I meant Vulkan SPIR-V. Specifically Appendix A of
the Vulkan spec, in the "Precision and Operation of SPIR-V
Instructions" section. There's a section in the GLSL 4.50 spec which
is identical AFAICT.

>
> >
> > On Thu, Nov 29, 2018 at 9:18 AM Samuel Pitoiset  
> > wrote:
> >>
> >> It's correct in GLSL because the behaviour is undefined in
> >> presence of NaNs. But this seems incorrect in Vulkan.
> >>
> >> Signed-off-by: Samuel Pitoiset 
> >> ---
> >>  src/compiler/nir/nir.h| 6 ++
> >>  src/compiler/nir/nir_opt_algebraic.py | 8 
> >>  2 files changed, 10 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> >> index db935c8496b..4107c293962 100644
> >> --- a/src/compiler/nir/nir.h
> >> +++ b/src/compiler/nir/nir.h
> >> @@ -2188,6 +2188,12 @@ typedef struct nir_shader_compiler_options {
> >> /* Set if nir_lower_wpos_ytransform() should also invert 
> >> gl_PointCoord. */
> >> bool lower_wpos_pntc;
> >>
> >> +   /* If false, lower ~inot(flt(a,b)) -> fge(a,b) and variants.
> >> +* In presence of NaNs, this is correct in GLSL because the 
> >> behaviour is
> >> +* undefined. In Vulkan, doing these transformations is incorrect.
> >> +*/
> >> +   bool exact_float_comparisons;
> >> +
> >> /**
> >>  * Should nir_lower_io() create load_interpolated_input intrinsics?
> >>  *
> >> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> >> b/src/compiler/nir/nir_opt_algebraic.py
> >> index f2a7be0c403..3750874407b 100644
> >> --- a/src/compiler/nir/nir_opt_algebraic.py
> >> +++ b/src/compiler/nir/nir_opt_algebraic.py
> >> @@ -154,10 +154,10 @@ optimizations = [
> >> (('ishl', ('imul', a, '#b'), '#c'), ('imul', a, ('ishl', b, c))),
> >>
> >> # Comparison simplifications
> >> -   (('~inot', ('flt', a, b)), ('fge', a, b)),
> >> -   (('~inot', ('fge', a, b)), ('flt', a, b)),
> >> -   (('~inot', ('feq', a, b)), ('fne', a, b)),
> >> -   (('~inot', ('fne', a, b)), ('feq', a, b)),
> >> +   (('~inot', ('flt', a, b)), ('fge', a, b), 
> >> '!options->exact_float_comparisons'),
> >> +   (('~inot', ('fge', a, b)), ('flt', a, b), 
> >> '!options->exact_float_comparisons'),
> >> +   (('~inot', ('feq', a, b)), ('fne', a, b), 
> >> '!options->exact_float_comparisons'),
> >> +   (('~inot', ('fne', a, b)), ('feq', a, b), 
> >> '!options->exact_float_comparisons'),
> >
> >
> > The feq/fne one is actually completely safe.  fne is defined to be !feq 
> > even when NaN is considered.
> >
> > --Jason
> >
> >>
> >> (('inot', ('ilt', a, b)), ('ige', a, b)),
> >> (('inot', ('ult', a, b)), ('uge', a, b)),
> >> (('inot', ('ige', a, b)), ('ilt', a, b)),
> >> --
> >> 2.19.2
> >>


Re: [Mesa-dev] [PATCH 1/2] nir: add a compiler option for disabling float comparison simplifications

2018-11-29 Thread Connor Abbott
On Thu, Nov 29, 2018 at 4:22 PM Jason Ekstrand  wrote:
>
> Can you provide some context for this?  Those rules are already flagged 
> "inexact" (that's what the ~ means) so they won't apply to anything that's 
> "precise" or "invariant".

I think the concern is that this isn't allowed in SPIR-V, even without
exact or invariant. We even go out of our way to do the correct thing
in the frontend by inserting an "&& a == a" or "|| a != a", but then
opt_algebraic removes it with another rule and then this rule can flip
it from ordered to unordered. The spec says that operations don't have
to produce NaN, but it doesn't say anything on comparisons other than
the generic "everything must follow IEEE rules" and an entry in the
table that says "produces correct results." Then again, I can't find
anything in GLSL allowing these transforms either, so maybe we just
need to get rid of them.
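(As a plain-C sketch of the idea -- a != a is true exactly when a is
NaN, so the inserted guards turn one flavor of comparison into the
other:

#include <stdbool.h>

/* ordered not-equal: the "&&" guards force false when NaN is present */
static bool ord_ne(float a, float b)   { return a != b && a == a && b == b; }
/* unordered equal: the "||" guards force true when NaN is present */
static bool unord_eq(float a, float b) { return a == b || a != a || b != b; }

Once opt_algebraic strips those guards, nothing stops a later rule from
flipping the ordered/unordered flavor.)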

>
> On Thu, Nov 29, 2018 at 9:18 AM Samuel Pitoiset  
> wrote:
>>
>> It's correct in GLSL because the behaviour is undefined in
>> presence of NaNs. But this seems incorrect in Vulkan.
>>
>> Signed-off-by: Samuel Pitoiset 
>> ---
>>  src/compiler/nir/nir.h| 6 ++
>>  src/compiler/nir/nir_opt_algebraic.py | 8 
>>  2 files changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index db935c8496b..4107c293962 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -2188,6 +2188,12 @@ typedef struct nir_shader_compiler_options {
>> /* Set if nir_lower_wpos_ytransform() should also invert gl_PointCoord. 
>> */
>> bool lower_wpos_pntc;
>>
>> +   /* If false, lower ~inot(flt(a,b)) -> fge(a,b) and variants.
>> +* In presence of NaNs, this is correct in GLSL because the 
>> behaviour is
>> +* undefined. In Vulkan, doing these transformations is incorrect.
>> +*/
>> +   bool exact_float_comparisons;
>> +
>> /**
>>  * Should nir_lower_io() create load_interpolated_input intrinsics?
>>  *
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index f2a7be0c403..3750874407b 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -154,10 +154,10 @@ optimizations = [
>> (('ishl', ('imul', a, '#b'), '#c'), ('imul', a, ('ishl', b, c))),
>>
>> # Comparison simplifications
>> -   (('~inot', ('flt', a, b)), ('fge', a, b)),
>> -   (('~inot', ('fge', a, b)), ('flt', a, b)),
>> -   (('~inot', ('feq', a, b)), ('fne', a, b)),
>> -   (('~inot', ('fne', a, b)), ('feq', a, b)),
>> +   (('~inot', ('flt', a, b)), ('fge', a, b), 
>> '!options->exact_float_comparisons'),
>> +   (('~inot', ('fge', a, b)), ('flt', a, b), 
>> '!options->exact_float_comparisons'),
>> +   (('~inot', ('feq', a, b)), ('fne', a, b), 
>> '!options->exact_float_comparisons'),
>> +   (('~inot', ('fne', a, b)), ('feq', a, b), 
>> '!options->exact_float_comparisons'),
>
>
> The feq/fne one is actually completely safe.  fne is defined to be !feq even 
> when NaN is considered.
>
> --Jason
>
>>
>> (('inot', ('ilt', a, b)), ('ige', a, b)),
>> (('inot', ('ult', a, b)), ('uge', a, b)),
>> (('inot', ('ige', a, b)), ('ilt', a, b)),
>> --
>> 2.19.2
>>


Re: [Mesa-dev] [PATCH 3/3] ac: handle cast derefs

2018-11-21 Thread Connor Abbott
I don't see this working where there are any phi nodes involved. The
variable pointers spec says that you can use pointers with phi nodes.
Currently we just create a plain integer type, so this will fall over.
Are there any CTS tests that exercise phi nodes which don't get removed by
nir_opt_peephole_sel?

Fundamentally, there's a mismatch between the NIR and LLVM/SPIR-V
model for derefs. In LLVM and SPIR-V, pointer-ness is part of the type
of a value, and so pointers can (in theory, in SPIR-V there are
restrictions based on what capabilities are present) be passed around
freely, but on the other hand you can't e.g. add a pointer and an
integer. Instead, you have to do a ptr-to-int cast or do an access
chain (or GEP in LLVM lingo). In NIR, values only have their bitsize
and number of components, so the ptr-to-int cast is implicit, but on
the other hand, every intrinsic using a deref has to be traced back to
a variable or cast deref instruction. So one way of correctly
implementing this would be to emit a LLVM int-to-ptr if necessary for
nir_deref_type_cast here, and emit a ptr-to-int cast in
ac_to_integer(). There's the risk that this doesn't optimize as well
due to LLVM not expecting it, but at least it works. If that becomes a
problem, we could do some kind of fancier analysis to figure out where
to put the casts, or move NIR more towards LLVM's model.
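Concretely, I'm thinking of something like this for the cast case -- an
untested sketch against the LLVM-C API, where ptr_type would be derived
from the deref's mode:

#include <llvm-c/Core.h>

static LLVMValueRef
deref_cast_to_pointer(LLVMBuilderRef builder, LLVMValueRef src,
                      LLVMTypeRef ptr_type)
{
   /* NIR's deref casts leave the int-to-ptr conversion implicit;
    * LLVM wants it spelled out. */
   if (LLVMGetTypeKind(LLVMTypeOf(src)) == LLVMPointerTypeKind)
      return src;
   return LLVMBuildIntToPtr(builder, src, ptr_type, "");
}

with the matching LLVMBuildPtrToInt() emitted in ac_to_integer().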
On Mon, Nov 19, 2018 at 9:16 PM Dave Airlie  wrote:
>
> From: Dave Airlie 
>
> Just give back the same value for now.
> ---
>  src/amd/common/ac_nir_to_llvm.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index cc795324cc5..d7296a4617e 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -3715,6 +3715,9 @@ static void visit_deref(struct ac_nir_context *ctx,
> > result = ac_build_gep0(&ctx->ac, get_src(ctx, instr->parent),
>get_src(ctx, instr->arr.index));
> break;
> +   case nir_deref_type_cast:
> +   result = get_src(ctx, instr->parent);
> +   break;
> default:
> unreachable("Unhandled deref_instr deref type");
> }
> --
> 2.17.2
>


Re: [Mesa-dev] [PATCH] nir: fix an assertion for 16-bit integers in nir_imm_intN_t()

2018-11-19 Thread Connor Abbott
This will cause the assert to pass when it shouldn't in some cases
with a 32-bit bitsize, and it seems like a hack since it subverts the
point of the assert, which is to guarantee that we won't lose any
information by truncating the source. It would be better to fix the
caller that's hitting the assert, and maybe change the argument to be
an int64_t too.
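For illustration only (the builder "b" and the incoming constant here
are hypothetical), a caller-side fix would sign-extend before building
the immediate:

int16_t parsed = (int16_t)0xffff; /* the 16-bit constant, i.e. -1 */
/* Sign-extending to int64_t keeps the truncation check intact:
 * ((int64_t)x >> 16) == -1 holds, so the assert passes honestly. */
nir_ssa_def *imm = nir_imm_intN_t(b, (int64_t)parsed, 16);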
On Mon, Nov 19, 2018 at 10:38 AM Samuel Pitoiset
 wrote:
>
> Fixes dEQP-VK.spirv_assembly.type.scalar.i16.*
>
> Fixes: 1f29f4db1e ("nir/builder: Assert that intN_t immediates fit")
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/compiler/nir/nir_builder.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/compiler/nir/nir_builder.h b/src/compiler/nir/nir_builder.h
> index e37aba23dc..1f29209157 100644
> --- a/src/compiler/nir/nir_builder.h
> +++ b/src/compiler/nir/nir_builder.h
> @@ -332,7 +332,8 @@ nir_imm_intN_t(nir_builder *build, uint64_t x, unsigned 
> bit_size)
>
> assert(bit_size == 64 ||
>(int64_t)x >> bit_size == 0 ||
> -  (int64_t)x >> bit_size == -1);
> +  (int64_t)x >> bit_size == -1 ||
> +  (int64_t)x >> bit_size == 65535);
>
> memset(&v, 0, sizeof(v));
> assert(bit_size <= 64);
> --
> 2.19.1
>


Re: [Mesa-dev] [PATCH 6/6] radv: implement fast HTILE clears for depth or stencil only on GFX9

2018-11-12 Thread Connor Abbott
On Mon, Nov 12, 2018 at 9:34 PM Bas Nieuwenhuizen
 wrote:
>
> On Mon, Nov 12, 2018 at 5:55 PM Samuel Pitoiset
>  wrote:
> >
> > This allows to fast clear the depth part (or the stencil part)
> > of a depth+stencil surface when HTILE is enabled. I didn't test
> > on GFX8, so it's disabled currently.
> >
> > This gives a very nice boost, for example when clearing the depth
> > aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster).
> >
> > BEFORE: 235 us
> > AFTER: 13 us
> >
> > Signed-off-by: Samuel Pitoiset 
> > ---
> >  src/amd/vulkan/radv_meta_clear.c | 268 ++-
> >  src/amd/vulkan/radv_private.h|   6 +
> >  2 files changed, 269 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_meta_clear.c 
> > b/src/amd/vulkan/radv_meta_clear.c
> > index b3128d021d..364a38daba 100644
> > --- a/src/amd/vulkan/radv_meta_clear.c
> > +++ b/src/amd/vulkan/radv_meta_clear.c
> > @@ -303,6 +303,22 @@ create_color_pipeline(struct radv_device *device,
> > return result;
> >  }
> >
> > +static void
> > +finish_meta_clear_htile_mask_state(struct radv_device *device)
> > +{
> > +   struct radv_meta_state *state = >meta_state;
> > +
> > +   radv_DestroyPipeline(radv_device_to_handle(device),
> > +state->clear_htile_mask_pipeline,
> > +&state->alloc);
> > +   radv_DestroyPipelineLayout(radv_device_to_handle(device),
> > +  state->clear_htile_mask_p_layout,
> > +  &state->alloc);
> > +   radv_DestroyDescriptorSetLayout(radv_device_to_handle(device),
> > +   state->clear_htile_mask_ds_layout,
> > +   &state->alloc);
> > +}
> > +
> >  void
> >  radv_device_finish_meta_clear_state(struct radv_device *device)
> >  {
> > @@ -339,6 +355,8 @@ radv_device_finish_meta_clear_state(struct radv_device 
> > *device)
> > radv_DestroyPipelineLayout(radv_device_to_handle(device),
> >state->clear_depth_p_layout,
> > +   &state->alloc);
> > +
> > +   finish_meta_clear_htile_mask_state(device);
> >  }
> >
> >  static void
> > @@ -746,6 +764,69 @@ emit_depthstencil_clear(struct radv_cmd_buffer 
> > *cmd_buffer,
> > }
> >  }
> >
> > +static uint32_t
> > +clear_htile_mask(struct radv_cmd_buffer *cmd_buffer,
> > +struct radeon_winsys_bo *bo, uint64_t offset, uint64_t 
> > size,
> > +uint32_t htile_value, uint32_t htile_mask)
> > +{
> > +   struct radv_device *device = cmd_buffer->device;
> > +   struct radv_meta_state *state = &device->meta_state;
> > +   uint64_t block_count = round_up_u64(size, 1024);
> > +   struct radv_meta_saved_state saved_state;
> > +
> > +   radv_meta_save(&saved_state, cmd_buffer,
> > +  RADV_META_SAVE_COMPUTE_PIPELINE |
> > +  RADV_META_SAVE_CONSTANTS |
> > +  RADV_META_SAVE_DESCRIPTORS);
> > +
> > +   struct radv_buffer dst_buffer = {
> > +   .bo = bo,
> > +   .offset = offset,
> > +   .size = size
> > +   };
> > +
> > +   radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
> > +VK_PIPELINE_BIND_POINT_COMPUTE,
> > +state->clear_htile_mask_pipeline);
> > +
> > +   radv_meta_push_descriptor_set(cmd_buffer, 
> > VK_PIPELINE_BIND_POINT_COMPUTE,
> > + state->clear_htile_mask_p_layout,
> > + 0, /* set */
> > + 1, /* descriptorWriteCount */
> > + (VkWriteDescriptorSet[]) {
> > + {
> > + .sType = 
> > VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
> > + .dstBinding = 0,
> > + .dstArrayElement = 0,
> > + .descriptorCount = 1,
> > + .descriptorType = 
> > VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
> > + .pBufferInfo = 
> > &(VkDescriptorBufferInfo) {
> > + .buffer = 
> > radv_buffer_to_handle(_buffer),
> > + .offset = 0,
> > + .range = size
> > + }
> > + }
> > + });
> > +
> > +   const unsigned constants[2] = {
> > +   htile_value & htile_mask,
> > +   ~htile_mask,
> > +   };
> > +
> > +

Re: [Mesa-dev] [PATCH 21/31] nir: Add support for 1-bit data types

2018-11-06 Thread Connor Abbott
I quickly looked through the users of glsl_get_bit_size, and here are
the ones I found that still have to be fixed up:

- vtn_type_layout_430()
- vtn_type_block_size() in vtn_variables.c
- our OpBitcast implementation, although I don't think we can see
booleans there? (Also, it doesn't handle 8 and 16-bit types?!)
- nir_opt_large_constants.c, in handle_constant_store which has to
upload the constant data to a buffer, and then presumably later when
we load it we now have to insert i2b instructions.
- glsl_to_nir, when we generate a UBO or SSBO load, we'll start
generating a load to a 1-bit value, and then inserting an i2b
instruction, which probably isn't what we want.

Also, one comment below...

On Tue, Oct 23, 2018 at 12:16 AM Jason Ekstrand  wrote:
>> This commit adds support for 1-bit booleans and integers.  Booleans
> obviously take a value of true or false.  Because we have to define the
> semantics of 1-bit signed and unsigned integers, we define uint1_t to
> take values of 0 and 1 and int1_t to take values of 0 and -1.  1-bit
> arithmetic is then well-defined in the usual way, just with fewer bits.
> The definition of int1_t and uint1_t doesn't usually matter but we do
> need something for purposes of constant folding.
> ---
>  src/compiler/nir/nir.c| 15 +--
>  src/compiler/nir/nir.h| 21 +++-
>  src/compiler/nir/nir_builder.h| 12 -
>  src/compiler/nir/nir_constant_expressions.py  | 25 ---
>  src/compiler/nir/nir_instr_set.c  | 23 ++---
>  .../nir/nir_lower_load_const_to_scalar.c  |  3 +++
>  src/compiler/nir/nir_opt_constant_folding.c   |  3 +++
>  src/compiler/nir/nir_opt_large_constants.c|  5 
>  src/compiler/nir/nir_print.c  |  3 +++
>  src/compiler/nir/nir_search.c |  3 ++-
>  src/compiler/nir/nir_validate.c   |  2 +-
>  src/compiler/nir_types.cpp|  2 +-
>  src/compiler/spirv/spirv_to_nir.c |  9 +++
>  13 files changed, 101 insertions(+), 25 deletions(-)
>
> diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
> index 0be40d257f5..8e83edb3644 100644
> --- a/src/compiler/nir/nir.c
> +++ b/src/compiler/nir/nir.c
> @@ -638,6 +638,7 @@ const_value_int(int64_t i, unsigned bit_size)
>  {
> nir_const_value v;
> switch (bit_size) {
> +   case 1:  v.b[0]   = i & 1;  break;
> case 8:  v.i8[0]  = i;  break;
> case 16: v.i16[0] = i;  break;
> case 32: v.i32[0] = i;  break;
> @@ -1206,6 +1207,8 @@ nir_src_comp_as_int(nir_src src, unsigned comp)
>
> assert(comp < load->def.num_components);
> switch (load->def.bit_size) {
> +   /* int1_t uses 0/-1 convention */
> +   case 1:  return -(int)load->value.b[comp];
> case 8:  return load->value.i8[comp];
> case 16: return load->value.i16[comp];
> case 32: return load->value.i32[comp];
> @@ -1223,6 +1226,7 @@ nir_src_comp_as_uint(nir_src src, unsigned comp)
>
> assert(comp < load->def.num_components);
> switch (load->def.bit_size) {
> +   case 1:  return load->value.b[comp];
> case 8:  return load->value.u8[comp];
> case 16: return load->value.u16[comp];
> case 32: return load->value.u32[comp];
> @@ -1235,15 +1239,12 @@ nir_src_comp_as_uint(nir_src src, unsigned comp)
>  bool
>  nir_src_comp_as_bool(nir_src src, unsigned comp)
>  {
> -   assert(nir_src_is_const(src));
> -   nir_load_const_instr *load = 
> nir_instr_as_load_const(src.ssa->parent_instr);
> +   int64_t i = nir_src_comp_as_int(src, comp);
>
> -   assert(comp < load->def.num_components);
> -   assert(load->def.bit_size == 32);
> -   assert(load->value.u32[comp] == NIR_TRUE ||
> -  load->value.u32[comp] == NIR_FALSE);
> +   /* Booleans of any size use 0/-1 convention */
> +   assert(i == 0 || i == -1);
>
> -   return load->value.u32[comp];
> +   return i;
>  }
>
>  double
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 47c7f400b2d..b19138f9e61 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -117,6 +117,7 @@ typedef enum {
>  } nir_rounding_mode;
>
>  typedef union {
> +   bool b[NIR_MAX_VEC_COMPONENTS];
> float f32[NIR_MAX_VEC_COMPONENTS];
> double f64[NIR_MAX_VEC_COMPONENTS];
> int8_t i8[NIR_MAX_VEC_COMPONENTS];
> @@ -778,17 +779,25 @@ typedef struct {
> unsigned write_mask : NIR_MAX_VEC_COMPONENTS; /* ignored if dest.is_ssa 
> is true */
>  } nir_alu_dest;
>
> +/** NIR sized and unsized types
> + *
> + * The values in this enum are carefully chosen so that the sized type is
> + * just the unsized type OR the number of bits.
> + */
>  typedef enum {
> nir_type_invalid = 0, /* Not a valid type */
> -   nir_type_float,
> -   nir_type_int,
> -   nir_type_uint,
> -   nir_type_bool,
> +   nir_type_int =   2,
> +   nir_type_uint =  4,
> +   nir_type_bool =  6,
> +   nir_type_float = 128,
> +   nir_type_bool1 = 1  | 

Re: [Mesa-dev] [PATCH 21/31] nir: Add support for 1-bit data types

2018-11-06 Thread Connor Abbott
On Tue, Oct 23, 2018 at 12:16 AM Jason Ekstrand  wrote:
>
> This commit adds support for 1-bit booleans and integers.  Booleans
> obviously take a value of true or false.  Because we have to define the
> semantics of 1-bit signed and unsigned integers, we define uint1_t to
> take values of 0 and 1 and int1_t to take values of 0 and -1.  1-bit
> arithmetic is then well-defined in the usual way, just with fewer bits.
> The definition of int1_t and uint1_t doesn't usually matter but we do
> need something for purposes of constant folding.
> ---
>  src/compiler/nir/nir.c| 15 +--
>  src/compiler/nir/nir.h| 21 +++-
>  src/compiler/nir/nir_builder.h| 12 -
>  src/compiler/nir/nir_constant_expressions.py  | 25 ---
>  src/compiler/nir/nir_instr_set.c  | 23 ++---
>  .../nir/nir_lower_load_const_to_scalar.c  |  3 +++
>  src/compiler/nir/nir_opt_constant_folding.c   |  3 +++
>  src/compiler/nir/nir_opt_large_constants.c|  5 
>  src/compiler/nir/nir_print.c  |  3 +++
>  src/compiler/nir/nir_search.c |  3 ++-
>  src/compiler/nir/nir_validate.c   |  2 +-
>  src/compiler/nir_types.cpp|  2 +-
>  src/compiler/spirv/spirv_to_nir.c |  9 +++
>  13 files changed, 101 insertions(+), 25 deletions(-)
>
> diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
> index 0be40d257f5..8e83edb3644 100644
> --- a/src/compiler/nir/nir.c
> +++ b/src/compiler/nir/nir.c
> @@ -638,6 +638,7 @@ const_value_int(int64_t i, unsigned bit_size)
>  {
> nir_const_value v;
> switch (bit_size) {
> +   case 1:  v.b[0]   = i & 1;  break;
> case 8:  v.i8[0]  = i;  break;
> case 16: v.i16[0] = i;  break;
> case 32: v.i32[0] = i;  break;
> @@ -1206,6 +1207,8 @@ nir_src_comp_as_int(nir_src src, unsigned comp)
>
> assert(comp < load->def.num_components);
> switch (load->def.bit_size) {
> +   /* int1_t uses 0/-1 convention */
> +   case 1:  return -(int)load->value.b[comp];
> case 8:  return load->value.i8[comp];
> case 16: return load->value.i16[comp];
> case 32: return load->value.i32[comp];
> @@ -1223,6 +1226,7 @@ nir_src_comp_as_uint(nir_src src, unsigned comp)
>
> assert(comp < load->def.num_components);
> switch (load->def.bit_size) {
> +   case 1:  return load->value.b[comp];
> case 8:  return load->value.u8[comp];
> case 16: return load->value.u16[comp];
> case 32: return load->value.u32[comp];
> @@ -1235,15 +1239,12 @@ nir_src_comp_as_uint(nir_src src, unsigned comp)
>  bool
>  nir_src_comp_as_bool(nir_src src, unsigned comp)
>  {
> -   assert(nir_src_is_const(src));
> -   nir_load_const_instr *load = 
> nir_instr_as_load_const(src.ssa->parent_instr);
> +   int64_t i = nir_src_comp_as_int(src, comp);
>
> -   assert(comp < load->def.num_components);
> -   assert(load->def.bit_size == 32);
> -   assert(load->value.u32[comp] == NIR_TRUE ||
> -  load->value.u32[comp] == NIR_FALSE);
> +   /* Booleans of any size use 0/-1 convention */
> +   assert(i == 0 || i == -1);
>
> -   return load->value.u32[comp];
> +   return i;
>  }
>
>  double
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 47c7f400b2d..b19138f9e61 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -117,6 +117,7 @@ typedef enum {
>  } nir_rounding_mode;
>
>  typedef union {
> +   bool b[NIR_MAX_VEC_COMPONENTS];
> float f32[NIR_MAX_VEC_COMPONENTS];
> double f64[NIR_MAX_VEC_COMPONENTS];
> int8_t i8[NIR_MAX_VEC_COMPONENTS];
> @@ -778,17 +779,25 @@ typedef struct {
> unsigned write_mask : NIR_MAX_VEC_COMPONENTS; /* ignored if dest.is_ssa 
> is true */
>  } nir_alu_dest;
>
> +/** NIR sized and unsized types
> + *
> + * The values in this enum are carefully chosen so that the sized type is
> + * just the unsized type OR the number of bits.
> + */
>  typedef enum {
> nir_type_invalid = 0, /* Not a valid type */
> -   nir_type_float,
> -   nir_type_int,
> -   nir_type_uint,
> -   nir_type_bool,
> +   nir_type_int =   2,
> +   nir_type_uint =  4,
> +   nir_type_bool =  6,
> +   nir_type_float = 128,

This seems pretty convoluted... can we just swap things around, i.e.
do something like

nir_type_int = (1 << 6),
nir_type_uint = (2 << 6),

etc. I guess this makes it take more than a byte, but normally
nir_alu_type is never actually part of the IR, so this shouldn't
matter much. From some grepping, it seems like we only try to pack a
nir_alu_type in 8 bits in one place in nir_serialize.c, and there are
10 extra bits to spare there.

> +   nir_type_bool1 = 1  | nir_type_bool,
> nir_type_bool32 =32 | nir_type_bool,
> +   nir_type_int1 =  1  | nir_type_int,
> nir_type_int8 =  8  | nir_type_int,
> nir_type_int16 = 16 | nir_type_int,
> nir_type_int32 = 32 | nir_type_int,
> nir_type_int64 = 64 | 

Re: [Mesa-dev] intel: WIP: Support for using 16-bits for mediump

2018-11-06 Thread Connor Abbott
On Tue, Nov 6, 2018 at 1:45 PM Pohjolainen, Topi
 wrote:
>
> On Tue, Nov 06, 2018 at 11:31:58AM +0100, Connor Abbott wrote:
> > On Tue, Nov 6, 2018 at 11:14 AM Pohjolainen, Topi
> >  wrote:
> > >
> > > On Tue, Nov 06, 2018 at 10:45:52AM +0100, Connor Abbott wrote:
> > > > As far as I understand, mediump handling can be split into two parts:
> > > >
> > > > 1. Figuring out which operations (instructions or SSA values in NIR)
> > > > can use relaxed precision.
> > > > 2. Deciding which relaxed-precision operations to actually compute in
> > > > 16-bit precision.
> > > >
> > > > At least for GLSL, #1 is pretty well nailed down by the GLSL spec,
> > > > where it's specified in terms of the source expressions. For example,
> > > > something like:
> > > >
> > > > mediump float a = ...;
> > > > mediump float b = ...;
> > > > float c = a + b;
> > > > float d = c + 2.0;
> > > >
> > > > the last addition must be performed in full precision, whereas for:
> > > >
> > > >
> > > > mediump float a = ...;
> > > > mediump float b = ...;
> > > > float d = (a + b) + 2.0;
> > > >
> > > > it can be lowered to 16-bit. This information gets lost during
> > > > expression grafting in GLSL IR, or vars-to-SSA in NIR, and even the
> > > > AST -> GLSL IR transform will sometimes split up expressions, so it
> > > > seems like both are too low-level for this. The analysis described by
> > > > the spec (the paragraph in section 4.7.3 "Precision Qualifiers" of the
> > > > GLSL ES 3.20 spec) has to happen on the AST after type checking but
> > > > before lowering to GLSL IR in order to be correct and not overly
> > > > conservative. If you want to do it in NIR since #2 is easier with SSA,
> > > > then sure... but we can't mix them up and do both at the same time.
> > > > We'll have to add support for annotating ir_expression's and nir_instr
> > > > (or maybe nir_ssa_def's) with a relaxed precision, and filter that
> > > > information down through the pipeline. Hopefully that also works
> > > > better for SPIR-V, where you can annotate individual instructions as
> > > > being RelaxedPrecision, and afaik (hopefully) #1 is handled by
> > > > glslang.
> > >
> > > I tried to describe the logic I used and my interpretation of the spec in
> > > the accompanying patch:
> > >
> > > https://lists.freedesktop.org/archives/mesa-dev/2018-November/208683.html
> > >
> > > Does it make any sense?
> >
> > It seems incorrect, since it will make the addition in my example
> > operate in 16 bit precision when it shouldn't. As I explained above,
> > it's impossible to do this correctly in NIR.
> >
> > Also, abusing a 16-bit bitsize in NIR to mean mediump is not ok. There
> > are other vulkan/glsl extensions out there that provide actual fp16
> > support, where the result is guaranteed to be calculated as a
> > half-float, and these obviously won't work properly with this pass. We
> > need to add a flag to the SSA def, or Jason's idea a long time ago was
> > to add a fake "24-bit" bitsize. Part of #2 will involve converting the
> > bitsize to be 16-bit and removing the flag.
>
> I wrote a small shader-runner test:
>
> [require]
> GL ES >= 2.0
> GLSL ES >= 1.00
>
> [vertex shader]
> #version 100
>
> precision highp float;
>
> attribute vec4 piglit_vertex;
>
> void main()
> {
> gl_Position = piglit_vertex;
> }
>
> [fragment shader]
> #version 100
> precision highp float;
>
> uniform mediump float a;
> uniform mediump float b;
>
> void main()
> {
> float c = a + b;
> float d = c + 0.4;
>
> gl_FragColor = vec4(a, b, c, d);
> }
>
> [test]
> uniform float a 0.1
> uniform float b 0.2
> draw rect -1 -1 2 2
> probe all rgba 0.1 0.2 0.3 0.7
>
>
And that made me realize another shortcoming in my implementation - I lowered
variable precision after running nir_lower_var_copies(), losing precision for
> local temporaries. Moving it before gave me:
>
> NIR (final form) for fragment shader:
> shader: MESA_SHADER_FRAGMENT
> name: GLSL3
> inputs: 0
> outputs: 0
> uniforms: 8
> shared: 0
> decl_var uniform INTERP_MODE_NONE float16_t a (0, 0, 0)
> decl_var uniform INTERP_MODE_NONE float16_t b (1, 4, 0)
> decl_var shader_out INTERP_MODE_NON

Re: [Mesa-dev] intel: WIP: Support for using 16-bits for mediump

2018-11-06 Thread Connor Abbott
On Tue, Nov 6, 2018 at 11:31 AM Connor Abbott  wrote:
>
> On Tue, Nov 6, 2018 at 11:14 AM Pohjolainen, Topi
>  wrote:
> >
> > On Tue, Nov 06, 2018 at 10:45:52AM +0100, Connor Abbott wrote:
> > > As far as I understand, mediump handling can be split into two parts:
> > >
> > > 1. Figuring out which operations (instructions or SSA values in NIR)
> > > can use relaxed precision.
> > > 2. Deciding which relaxed-precision operations to actually compute in
> > > 16-bit precision.
> > >
> > > At least for GLSL, #1 is pretty well nailed down by the GLSL spec,
> > > where it's specified in terms of the source expressions. For example,
> > > something like:
> > >
> > > mediump float a = ...;
> > > mediump float b = ...;
> > > float c = a + b;
> > > float d = c + 2.0;
> > >
> > > the last addition must be performed in full precision, whereas for:
> > >
> > >
> > > mediump float a = ...;
> > > mediump float b = ...;
> > > float d = (a + b) + 2.0;
> > >
> > > it can be lowered to 16-bit. This information gets lost during
> > > expression grafting in GLSL IR, or vars-to-SSA in NIR, and even the
> > > AST -> GLSL IR transform will sometimes split up expressions, so it
> > > seems like both are too low-level for this. The analysis described by
> > > the spec (the paragraph in section 4.7.3 "Precision Qualifiers" of the
> > > GLSL ES 3.20 spec) has to happen on the AST after type checking but
> > > before lowering to GLSL IR in order to be correct and not overly
> > > conservative. If you want to do it in NIR since #2 is easier with SSA,
> > > then sure... but we can't mix them up and do both at the same time.
> > > We'll have to add support for annotating ir_expression's and nir_instr
> > > (or maybe nir_ssa_def's) with a relaxed precision, and filter that
> > > information down through the pipeline. Hopefully that also works
> > > better for SPIR-V, where you can annotate individual instructions as
> > > being RelaxedPrecision, and afaik (hopefully) #1 is handled by
> > > glslang.
> >
> > I tried to describe the logic I used and my interpretation of the spec in
> > the accompanying patch:
> >
> > https://lists.freedesktop.org/archives/mesa-dev/2018-November/208683.html
> >
> > Does it make any sense?
>
> It seems incorrect, since it will make the addition in my example
> operate in 16 bit precision when it shouldn't. As I explained above,
> it's impossible to do this correctly in NIR.

I dug a little more, and it seems like my interpretation of the
spec is also what glslang does, as well as at least one other
implementation (Mali). You can see the glslang implementation here:
https://github.com/KhronosGroup/glslang/blob/5ff3c3da3b374a03a5eff96544fcd6678ed575c1/glslang/MachineIndependent/Intermediate.cpp#L3024

I should clarify a little bit: it may be possible to implement the
GLSL rules for propagating precision qualifiers later, by annotating
every expression node/SSA def as either highp, mediump, or "could be
either." Then you can do something like what you're doing, propagating
things through and stopping when you hit a mediump or highp
expression. Every expression that defines a variable on the source
level must be either highp or mediump. But I think this makes the
changes to the rest of the optimization passes even more invasive,
since you still have to handle highp/mediump correctly everywhere, but
you also have to deal with the "could be either" case, so it doesn't
really help you. It's easier to just do it at the AST level, doing
something like what glslang does. This part is completely
target-independent -- only later, in a NIR pass, should we decide
whether to actually use fp16 while taking into account what the HW
actually can do.

>
> Also, abusing a 16-bit bitsize in NIR to mean mediump is not ok. There
> are other vulkan/glsl extensions out there that provide actual fp16
> support, where the result is guaranteed to be calculated as a
> half-float, and these obviously won't work properly with this pass. We
> need to add a flag to the SSA def, or Jason's idea a long time ago was
> to add a fake "24-bit" bitsize. Part of #2 will involve converting the
> bitsize to be 16-bit and removing the flag.
>
> >
> > >
> > >
> > > On Tue, Nov 6, 2018 at 7:30 AM Topi Pohjolainen
> > >  wrote:
> > > >
> > > > Here is a version 2 of adding support for 16-bit float instructions in
> > > > the shader compiler. Unlike the first version which did all the analysis

Re: [Mesa-dev] intel: WIP: Support for using 16-bits for mediump

2018-11-06 Thread Connor Abbott
On Tue, Nov 6, 2018 at 11:14 AM Pohjolainen, Topi
 wrote:
>
> On Tue, Nov 06, 2018 at 10:45:52AM +0100, Connor Abbott wrote:
> > As far as I understand, mediump handling can be split into two parts:
> >
> > 1. Figuring out which operations (instructions or SSA values in NIR)
> > can use relaxed precision.
> > 2. Deciding which relaxed-precision operations to actually compute in
> > 16-bit precision.
> >
> > At least for GLSL, #1 is pretty well nailed down by the GLSL spec,
> > where it's specified in terms of the source expressions. For example,
> > something like:
> >
> > mediump float a = ...;
> > mediump float b = ...;
> > float c = a + b;
> > float d = c + 2.0;
> >
> > the last addition must be performed in full precision, whereas for:
> >
> >
> > mediump float a = ...;
> > mediump float b = ...;
> > float d = (a + b) + 2.0;
> >
> > it can be lowered to 16-bit. This information gets lost during
> > expression grafting in GLSL IR, or vars-to-SSA in NIR, and even the
> > AST -> GLSL IR transform will sometimes split up expressions, so it
> > seems like both are too low-level for this. The analysis described by
> > the spec (the paragraph in section 4.7.3 "Precision Qualifiers" of the
> > GLSL ES 3.20 spec) has to happen on the AST after type checking but
> > before lowering to GLSL IR in order to be correct and not overly
> > conservative. If you want to do it in NIR since #2 is easier with SSA,
> > then sure... but we can't mix them up and do both at the same time.
> > We'll have to add support for annotating ir_expression's and nir_instr
> > (or maybe nir_ssa_def's) with a relaxed precision, and filter that
> > information down through the pipeline. Hopefully that also works
> > better for SPIR-V, where you can annotate individual instructions as
> > being RelaxedPrecision, and afaik (hopefully) #1 is handled by
> > glslang.
>
> I tried to describe the logic I used and my interpretation of the spec in
> the accompanying patch:
>
> https://lists.freedesktop.org/archives/mesa-dev/2018-November/208683.html
>
> Does it make any sense?

It seems incorrect, since it will make the addition in my example
operate in 16 bit precision when it shouldn't. As I explained above,
it's impossible to do this correctly in NIR.

Also, abusing a 16-bit bitsize in NIR to mean mediump is not ok. There
are other vulkan/glsl extensions out there that provide actual fp16
support, where the result is guaranteed to be calculated as a
half-float, and these obviously won't work properly with this pass. We
need to add a flag to the SSA def, or Jason's idea a long time ago was
to add a fake "24-bit" bitsize. Part of #2 will involve converting the
bitsize to be 16-bit and removing the flag.

>
> >
> >
> > On Tue, Nov 6, 2018 at 7:30 AM Topi Pohjolainen
> >  wrote:
> > >
> > > Here is a version 2 of adding support for 16-bit float instructions in
> > > the shader compiler. Unlike the first version, which did all the analysis
> > > at the GLSL level, this one adds the notion of precision to NIR variables
> > > and does the analysis and precision lowering at the NIR level.
> > >
> > > This lives in: gitlab.freedesktop.org:tpohjola/mesa and branch fp16.
> > >
> > > This is now mature enough to be able to use 16-bit precision for all
> > > instructions except a few special cases for gfxbench trex and alu2.
> > > (Unfortunately I'm not seeing any performance benefit. This is not
> > > that surprising as I got to the same point with the glsl-based
> > > solution and was able to measure the performance already back then).
> > > Hence I thought it is time to share it.
> > >
> > > While this is still work-in-progress I didn't want to flood the list
> > > with the full set of patches but instead included the very last where
> > > I try to outline the logic and its current shortcomings. There is also
> > > a short list of TODO items.
> > >
> > > In addition to those I need to examine couple of Intel specific
> > > misrenderings. I haven't gotten that deep yet but it looks I'm missing
> > > something with 16-bit inot and mad/mac lowered interpolation.
> > > Unfortunately I get corrupted rendering only with hardware while
> > > simulator is happy.
> > >
> > > Mostly I'm afraid how to test all of this properly. I haven't written
> > > any unit tests but that is high on my list. This is mostly because I've
> > > been uncertain about my design choices. So far I've used shader
> > > runner te

Re: [Mesa-dev] intel: WIP: Support for using 16-bits for mediump

2018-11-06 Thread Connor Abbott
As far as I understand, mediump handling can be split into two parts:

1. Figuring out which operations (instructions or SSA values in NIR)
can use relaxed precision.
2. Deciding which relaxed-precision operations to actually compute in
16-bit precision.

At least for GLSL, #1 is pretty well nailed down by the GLSL spec,
where it's specified in terms of the source expressions. For example,
something like:

mediump float a = ...;
mediump float b = ...;
float c = a + b;
float d = c + 2.0;

the last addition must be performed in full precision, whereas for:


mediump float a = ...;
mediump float b = ...;
float d = (a + b) + 2.0;

it can be lowered to 16-bit. This information gets lost during
expression grafting in GLSL IR, or vars-to-SSA in NIR, and even the
AST -> GLSL IR transform will sometimes split up expressions, so it
seems like both are too low-level for this. The analysis described by
the spec (the paragraph in section 4.7.3 "Precision Qualifiers" of the
GLSL ES 3.20 spec) has to happen on the AST after type checking but
before lowering to GLSL IR in order to be correct and not overly
conservative. If you want to do it in NIR since #2 is easier with SSA,
then sure... but we can't mix them up and do both at the same time.
We'll have to add support for annotating ir_expression's and nir_instr
(or maybe nir_ssa_def's) with a relaxed precision, and filter that
information down through the pipeline. Hopefully that also works
better for SPIR-V, where you can annotate individual instructions as
being RelaxedPrecision, and afaik (hopefully) #1 is handled by
glslang.
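Spelled out as a sketch (the types here are hypothetical, this is not a
proposed patch), the rule from that paragraph is: an operation evaluates
at the highest precision among its qualified operands, and only when no
operand is qualified does it inherit from the consuming expression:

enum prec { PREC_NONE, PREC_LOWP, PREC_MEDIUMP, PREC_HIGHP };

static enum prec
operation_precision(const enum prec *operands, unsigned num,
                    enum prec context)
{
   enum prec p = PREC_NONE;
   for (unsigned i = 0; i < num; i++) {
      if (operands[i] > p)
         p = operands[i]; /* highest qualified operand wins */
   }
   /* No operand qualified: inherit from the consuming expression.
    * That's why "c + 2.0" above must stay at full precision once c is a
    * (default) highp temporary, while "(a + b) + 2.0" may be evaluated
    * at mediump. */
   return p != PREC_NONE ? p : context;
}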


On Tue, Nov 6, 2018 at 7:30 AM Topi Pohjolainen
 wrote:
>
> Here is a version 2 of adding support for 16-bit float instructions in
> the shader compiler. Unlike the first version, which did all the analysis
> at the GLSL level, this one adds the notion of precision to NIR variables
> and does the analysis and precision lowering at the NIR level.
>
> This lives in: gitlab.freedesktop.org:tpohjola/mesa and branch fp16.
>
> This is now mature enough to be able to use 16-bit precision for all
> instructions except a few special cases for gfxbench trex and alu2.
> (Unfortunately I'm not seeing any performance benefit. This is not
> that surprising as I got to the same point with the glsl-based
> solution and was able to measure the performance already back then).
> Hence I thought it is time to share it.
>
> While this is still work-in-progress I didn't want to flood the list
> with the full set of patches but instead included the very last where
> I try to outline the logic and its current shortcomings. There is also
> a short list of TODO items.
>
> In addition to those I need to examine couple of Intel specific
> misrenderings. I haven't gotten that deep yet but it looks I'm missing
> something with 16-bit inot and mad/mac lowered interpolation.
> Unfortunately I get corrupted rendering only with hardware while
> simulator is happy.
>
> Mostly I'm afraid how to test all of this properly. I haven't written
> any unit tests but that is high on my list. This is mostly because I've
> been uncertain about my design choices. So far I've used shader
> runner tests that I've written for specific cases. These are useful for
> development purposes but don't bring much value for regression testing.
>
> Alejandro Piñeiro (1):
>   intel/compiler/fs: Use half_precision data_format on 16-bit fb writes
>
> Jose Maria Casanova Crespo (2):
>   intel/compiler/fs: Include support for RT data_format bit
>   intel/compiler/disasm: Show half-precision data_format on rt_writes
>
> Topi Pohjolainen (58):
>   intel/compiler/fs: Set 16-bit sampler return format
>   intel/compiler/disasm: Show half-precision for sampler messages
>   intel/compiler/fs: Skip tex-inst early in conversion lowering
>   intel/compiler/fs: Support for dumping 16-bit IMM values
>   intel/compiler: Allow 16-bit math
>   intel/compiler/fs: Add helpers for 16-bit null regs
>   intel/compiler/fs: Use two SIMD8 instructions for 16-bit math
>   intel/compiler/fs: Use 16-bit null dest with 16-bit math
>   intel/compiler/fs: Use 16-bit null dest with 16-bit compare
>   intel/compiler/fs: Add 16-bit type support for nir_if
>   intel/compiler/eu: Prepare 3-src-op for 16-bit sources
>   intel/compiler/eu: Prepare 3-src-op for 16-bit dst
>   intel/compiler/eu: Allow 3-src-op with mixed precision (HF/F) sources
>   intel/compiler/disasm: Print mixed precision 3-src types correctly
>   intel/compiler/disasm: Print 16-bit IMM values
>   intel/compiler/fs: Support for combining 16-bit immediates
>   intel/compiler/fs: Set tex type for generator to flag fp16
>   intel/compiler/fs: Use component_size() instead of open coded
>   intel/compiler/fs: Add register padding support
>   intel/compiler/fs: Pad 16-bit texture return payloads
>   intel/compiler/fs: Pad 16-bit output (store/fb write) payloads
>   intel/compiler/fs: Pad 16-bit nir vec* components into full reg
>   intel/compiler/fs: Pad 16-bit nir intrinsic dest into 

Re: [Mesa-dev] [PATCH 05/15] ac: revert new LLVM 7.0 behavior for fdiv

2018-10-29 Thread Connor Abbott
ctx->f32_1 probably needs to be replaced by the appropriately-sized
float, like LLVMConstReal(1., ...)
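
Something like this (ac_build_fdiv_fixed is a hypothetical name for the
sketch; the metadata handling is copied from the patch quoted below):

LLVMValueRef
ac_build_fdiv_fixed(struct ac_llvm_context *ctx,
                    LLVMValueRef num, LLVMValueRef den)
{
        /* Take 1.0 from the denominator's own type, so an f64 division
         * no longer builds an f32 RCP that instruction selection then
         * chokes on. */
        LLVMValueRef one = LLVMConstReal(LLVMTypeOf(den), 1.0);
        LLVMValueRef rcp = LLVMBuildFDiv(ctx->builder, one, den, "");
        LLVMValueRef ret = LLVMBuildFMul(ctx->builder, num, rcp, "");

        /* Keep the 2.5 ULP fpmath hint so f32 still selects v_rcp_f32. */
        if (!LLVMIsConstant(ret))
                LLVMSetMetadata(ret, ctx->fpmath_md_kind,
                                ctx->fpmath_md_2p5_ulp);
        return ret;
}
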
On Mon, Oct 29, 2018 at 11:45 AM Timothy Arceri  wrote:
>
> Hi Marek,
>
> It's late and I haven't dug into this any further but this patch causes
> a whole bunch of f64 piglit tests to fail for the radeonsi nir backend.
>
> e.g.
>
> ./bin/shader_runner
> generated_tests/spec/glsl-4.00/execution/built-in-functions/fs-inverse-dmat2.shader_test
> -auto -fbo
>
>
> LLVM ERROR: Cannot select: 0x7fbc48075aa8: f64 = bitcast 0x7fbc48077730
>0x7fbc48077730: f32 = RCP 0x7fbc4806e788
>  0x7fbc4806e788: f64 = fadd nsz 0x7fbc480757d0, 0x7fbc48075a40
>0x7fbc480757d0: f64 = fmul nsz 0x7fbc4806f0e0, 0x7fbc4806f420
>  0x7fbc4806f0e0: f64 = bitcast 0x7fbc4806f078
>
> On 30/8/18 6:13 am, Marek Olšák wrote:
> > From: Marek Olšák 
> >
> > Cc: 18.1 18.2 
> > ---
> >   src/amd/common/ac_llvm_build.c | 9 -
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
> > index c741a1ab62d..629cd2a7527 100644
> > --- a/src/amd/common/ac_llvm_build.c
> > +++ b/src/amd/common/ac_llvm_build.c
> > @@ -554,21 +554,28 @@ LLVMValueRef ac_build_expand_to_vec4(struct ac_llvm_context *ctx,
> >   chan[num_channels++] = LLVMGetUndef(elemtype);
> >
> >   return ac_build_gather_values(ctx, chan, 4);
> >   }
> >
> >   LLVMValueRef
> >   ac_build_fdiv(struct ac_llvm_context *ctx,
> > LLVMValueRef num,
> > LLVMValueRef den)
> >   {
> > - LLVMValueRef ret = LLVMBuildFDiv(ctx->builder, num, den, "");
> > + /* If we do (num / den), LLVM >= 7.0 does:
> > +  *    return num * v_rcp_f32(den * (fabs(den) > 0x1.0p+96f ? 0x1.0p-32f : 1.0f));
> > +  *
> > +  * If we do (num * (1 / den)), LLVM does:
> > +  *    return num * v_rcp_f32(den);
> > +  */
> > + LLVMValueRef rcp = LLVMBuildFDiv(ctx->builder, ctx->f32_1, den, "");
> > + LLVMValueRef ret = LLVMBuildFMul(ctx->builder, num, rcp, "");
> >
> >   /* Use v_rcp_f32 instead of precise division. */
> >   if (!LLVMIsConstant(ret))
> >   LLVMSetMetadata(ret, ctx->fpmath_md_kind, 
> > ctx->fpmath_md_2p5_ulp);
> >   return ret;
> >   }
> >
> >   /* Coordinates for cube map selection. sc, tc, and ma are as in Table 8.27
> >* of the OpenGL 4.5 (Compatibility Profile) specification, except ma is
> >* already multiplied by two. id is the cube face number.
> >


Re: [Mesa-dev] [RFC 31/31] nir: Add a bool to float32 lowering pass

2018-10-23 Thread Connor Abbott
On Tue, Oct 23, 2018 at 12:16 AM Jason Ekstrand  wrote:
>
> This should be useful for drivers that don't support real integers.
>
> Cc: Alyssa Rosenzweig 
> ---
>  src/compiler/Makefile.sources  |   1 +
>  src/compiler/nir/meson.build   |   1 +
>  src/compiler/nir/nir_lower_bool_to_float.c | 181 +
>  3 files changed, 183 insertions(+)
>  create mode 100644 src/compiler/nir/nir_lower_bool_to_float.c
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index 8f65f974ab8..2ff12ff43cb 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -230,6 +230,7 @@ NIR_FILES = \
> nir/nir_lower_atomics_to_ssbo.c \
> nir/nir_lower_bitmap.c \
> nir/nir_lower_bit_size.c \
> +   nir/nir_lower_bool_to_float.c \
> nir/nir_lower_bool_to_int32.c \
> nir/nir_lower_clamp_color_outputs.c \
> nir/nir_lower_clip.c \
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index 5809551c9d4..f715668a03b 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -113,6 +113,7 @@ files_libnir = files(
>'nir_lower_alpha_test.c',
>'nir_lower_atomics_to_ssbo.c',
>'nir_lower_bitmap.c',
> +  'nir_lower_bool_to_float.c',
>'nir_lower_bool_to_int32.c',
>'nir_lower_clamp_color_outputs.c',
>'nir_lower_clip.c',
> diff --git a/src/compiler/nir/nir_lower_bool_to_float.c 
> b/src/compiler/nir/nir_lower_bool_to_float.c
> new file mode 100644
> index 000..7aa5efb5a2f
> --- /dev/null
> +++ b/src/compiler/nir/nir_lower_bool_to_float.c
> @@ -0,0 +1,181 @@
> +/*
> + * Copyright © 2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "nir.h"
> +#include "nir_builder.h"
> +
> +static bool
> +assert_ssa_def_is_not_1bit(nir_ssa_def *def, UNUSED void *unused)
> +{
> +   assert(def->bit_size > 1);
> +   return true;
> +}
> +
> +static bool
> +rewrite_1bit_ssa_def_to_32bit(nir_ssa_def *def, void *_progress)
> +{
> +   bool *progress = _progress;
> +   if (def->bit_size == 1) {
> +  def->bit_size = 32;
> +  *progress = true;
> +   }
> +   return true;
> +}
> +
> +static bool
> +lower_alu_instr(nir_builder *b, nir_alu_instr *alu)
> +{
> +   const nir_op_info *op_info = &nir_op_infos[alu->op];
> +
> +   b->cursor = nir_before_instr(&alu->instr);
> +
> +   /* Replacement SSA value */
> +   nir_ssa_def *rep = NULL;
> +   switch (alu->op) {
> +   case nir_op_b2f: alu->op = nir_op_fmov; break;
> +   case nir_op_b2i: alu->op = nir_op_fmov; break;
> +   case nir_op_f2b:
> +   case nir_op_i2b:
> +  rep = nir_sne(b, nir_ssa_for_alu_src(b, alu, 0),
> +   nir_imm_float(b, 0));
> +  break;
> +
> +   case nir_op_flt: alu->op = nir_op_slt; break;
> +   case nir_op_fge: alu->op = nir_op_sge; break;
> +   case nir_op_feq: alu->op = nir_op_seq; break;
> +   case nir_op_fne: alu->op = nir_op_sne; break;
> +   case nir_op_ilt: alu->op = nir_op_slt; break;
> +   case nir_op_ige: alu->op = nir_op_sge; break;
> +   case nir_op_ieq: alu->op = nir_op_seq; break;
> +   case nir_op_ine: alu->op = nir_op_sne; break;
> +   case nir_op_ult: alu->op = nir_op_slt; break;
> +   case nir_op_uge: alu->op = nir_op_sge; break;
> +
> +   case nir_op_ball_fequal2:  alu->op = nir_op_fall_equal2; break;
> +   case nir_op_ball_fequal3:  alu->op = nir_op_fall_equal3; break;
> +   case nir_op_ball_fequal4:  alu->op = nir_op_fall_equal4; break;
> +   case nir_op_bany_fnequal2: alu->op = nir_op_fany_nequal2; break;
> +   case nir_op_bany_fnequal3: alu->op = nir_op_fany_nequal3; break;
> +   case nir_op_bany_fnequal4: alu->op = nir_op_fany_nequal4; break;
> +   case nir_op_ball_iequal2:  alu->op = nir_op_fall_equal2; break;
> +   case nir_op_ball_iequal3:  alu->op = nir_op_fall_equal3; break;
> +   case 

Re: [Mesa-dev] [PATCH 30/31] nir/algebraic: Add some optimizations for D3D-style booleans

2018-10-23 Thread Connor Abbott
On Tue, Oct 23, 2018 at 12:16 AM Jason Ekstrand  wrote:
>
> D3D booleans use a 32-bit 0/-1 representation.  Because this previously
> matched NIR exactly, we didn't have to really optimize for it.  Now that
> we have 1-bit booleans, we need some specific optimizations to chew
> through the D3D12-style booleans.
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index f0861c4411d..4d778e4b308 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -239,6 +239,7 @@ optimizations = [
> (('fne', ('b2f', a), 0.0), a),
> (('ieq', ('b2i', a), 0),   ('inot', a)),
> (('ine', ('b2i', a), 0),   a),
> +   (('ine', ('ineg', ('b2i', a)), 0), a),
>
> (('fne', ('u2f32', a), 0.0), ('ine', a, 0)),
> (('feq', ('u2f32', a), 0.0), ('ieq', a, 0)),
> @@ -528,6 +529,18 @@ optimizations = [
> (('bcsel', a, b, b), b),
> (('fcsel', a, b, b), b),
>
> +   # D3D Boolean emulation
> +   (('bcsel', a, -1, 0), ('ineg', ('b2i', a))),
> +   (('bcsel', a, 0, -1), ('ineg', ('b2i', ('inot', a)))),
> +   (('iand', ('ineg', ('b2i', a)), ('ineg', ('b2i', b))),
> +    ('ineg', ('b2i', ('iand', a, b)))),
> +   (('ior', ('ineg', ('b2i', a)), ('ineg', ('b2i', b))),
> +    ('ineg', ('b2i', ('ior', a, b)))),
> +   (('ieq', ('ineg', ('b2i', a)), 0), ('inot', a)),
> +   (('ieq', ('ineg', ('b2i', a)), -1), a),
> +   (('ine', ('ineg', ('b2i', a)), 0), a),

Isn't this the same as the line you added above?

> +   (('ine', ('ineg', ('b2i', a)), -1), ('inot', a)),
> +
> # Conversions
> (('i2b', ('b2i', a)), a),
> (('f2i32', ('ftrunc', a)), ('f2i32', a)),
> --
> 2.19.1
>


[Mesa-dev] [PATCH 1/3] ac: Introduce ac_build_expand()

2018-10-18 Thread Connor Abbott
And implement ac_build_expand_to_vec4() on top of it.

Fixes: 7e7ee82698247d8f93fe37775b99f4838b0247dd ("ac: add support for 16bit 
buffer loads")
---
 src/amd/common/ac_llvm_build.c | 40 ++
 src/amd/common/ac_llvm_build.h |  3 +++
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 2d78ca1b52a..c54a50dcd86 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -523,39 +523,51 @@ ac_build_gather_values(struct ac_llvm_context *ctx,
return ac_build_gather_values_extended(ctx, values, value_count, 1, 
false, false);
 }
 
-/* Expand a scalar or vector to <4 x type> by filling the remaining channels
- * with undef. Extract at most num_channels components from the input.
+/* Expand a scalar or vector to <dst_channels x type> by filling the remaining
+ * channels with undef. Extract at most src_channels components from the input.
  */
-LLVMValueRef ac_build_expand_to_vec4(struct ac_llvm_context *ctx,
-LLVMValueRef value,
-unsigned num_channels)
+LLVMValueRef ac_build_expand(struct ac_llvm_context *ctx,
+LLVMValueRef value,
+unsigned src_channels,
+unsigned dst_channels)
 {
LLVMTypeRef elemtype;
-   LLVMValueRef chan[4];
+   LLVMValueRef chan[dst_channels];
 
if (LLVMGetTypeKind(LLVMTypeOf(value)) == LLVMVectorTypeKind) {
unsigned vec_size = LLVMGetVectorSize(LLVMTypeOf(value));
-   num_channels = MIN2(num_channels, vec_size);
 
-   if (num_channels >= 4)
+   if (src_channels == dst_channels && vec_size == dst_channels)
return value;
 
-   for (unsigned i = 0; i < num_channels; i++)
+   src_channels = MIN2(src_channels, vec_size);
+
+   for (unsigned i = 0; i < src_channels; i++)
chan[i] = ac_llvm_extract_elem(ctx, value, i);
 
elemtype = LLVMGetElementType(LLVMTypeOf(value));
} else {
-   if (num_channels) {
-   assert(num_channels == 1);
+   if (src_channels) {
+   assert(src_channels == 1);
chan[0] = value;
}
elemtype = LLVMTypeOf(value);
}
 
-   while (num_channels < 4)
-   chan[num_channels++] = LLVMGetUndef(elemtype);
+   for (unsigned i = src_channels; i < dst_channels; i++)
+   chan[i] = LLVMGetUndef(elemtype);
+
+   return ac_build_gather_values(ctx, chan, dst_channels);
+}
 
-   return ac_build_gather_values(ctx, chan, 4);
+/* Expand a scalar or vector to <4 x type> by filling the remaining channels
+ * with undef. Extract at most num_channels components from the input.
+ */
+LLVMValueRef ac_build_expand_to_vec4(struct ac_llvm_context *ctx,
+LLVMValueRef value,
+unsigned num_channels)
+{
+   return ac_build_expand(ctx, value, num_channels, 4);
 }
 
 LLVMValueRef ac_build_round(struct ac_llvm_context *ctx, LLVMValueRef value)
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index f68efbc49ff..1275e4fb698 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -172,6 +172,9 @@ LLVMValueRef
 ac_build_gather_values(struct ac_llvm_context *ctx,
   LLVMValueRef *values,
   unsigned value_count);
+LLVMValueRef ac_build_expand(struct ac_llvm_context *ctx,
+LLVMValueRef value,
+unsigned src_channels, unsigned dst_channels);
 LLVMValueRef ac_build_expand_to_vec4(struct ac_llvm_context *ctx,
 LLVMValueRef value,
 unsigned num_channels);
-- 
2.17.2



[Mesa-dev] [PATCH 3/3] Revert "radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT"

2018-10-18 Thread Connor Abbott
This reverts commit 647c2b90e96a9ab8571baf958a7c67c1e816911a. There was
one recently-introduced bug in ac for dvec3 loads, but the other test
failures were actually bugs in the tests. See
https://github.com/cwabbott0/VK-GL-CTS/tree/voteallequal-fixes
---
 src/amd/vulkan/radv_device.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 6eb37472992..4a705a724ef 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -1057,14 +1057,12 @@ void radv_GetPhysicalDeviceProperties2(
(VkPhysicalDeviceSubgroupProperties*)ext;
properties->subgroupSize = 64;
properties->supportedStages = VK_SHADER_STAGE_ALL;
-                       /* TODO: Enable VK_SUBGROUP_FEATURE_VOTE_BIT when wwm
-                        * is fixed in LLVM.
-                        */
                        properties->supportedOperations =
                                                VK_SUBGROUP_FEATURE_ARITHMETIC_BIT |
                                                VK_SUBGROUP_FEATURE_BASIC_BIT |
                                                VK_SUBGROUP_FEATURE_BALLOT_BIT |
-                                               VK_SUBGROUP_FEATURE_QUAD_BIT;
+                                               VK_SUBGROUP_FEATURE_QUAD_BIT |
+                                               VK_SUBGROUP_FEATURE_VOTE_BIT;
if (pdevice->rad_info.chip_class >= VI) {
properties->supportedOperations |=

VK_SUBGROUP_FEATURE_SHUFFLE_BIT |
-- 
2.17.2



[Mesa-dev] [PATCH 2/3] ac: Fix loading a dvec3 from an SSBO

2018-10-18 Thread Connor Abbott
The comment was wrong, since the loop above casts to a type with the
correct bitsize already.

Fixes: 7e7ee82698247d8f93fe37775b99f4838b0247dd ("ac: add support for 16bit 
buffer loads")
---
 src/amd/common/ac_nir_to_llvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 402cf2d6655..ee75e2890dd 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1685,8 +1685,8 @@ static LLVMValueRef visit_load_buffer(struct ac_nir_context *ctx,
};
 
if (num_bytes > 16 && num_components == 3) {
-               /* we end up with a v4f32 and v2f32 but shuffle fails on that */
-               results[1] = ac_build_expand_to_vec4(&ctx->ac, results[1], 2);
+               /* we end up with a v2i64 and i64 but shuffle fails on that */
+               results[1] = ac_build_expand(&ctx->ac, results[1], 1, 2);
}
 
LLVMValueRef swizzle = LLVMConstVector(masks, num_components);
-- 
2.17.2



Re: [Mesa-dev] [PATCH 2/7] nir/int64: Add some more lowering helpers

2018-10-15 Thread Connor Abbott
On Mon, Oct 15, 2018 at 8:41 PM Jason Ekstrand  wrote:
>
> On Mon, Oct 15, 2018 at 1:39 PM Ian Romanick  wrote:
>>
>> On 10/14/2018 03:58 PM, Jason Ekstrand wrote:
>> > On October 14, 2018 17:12:34 Matt Turner  wrote:
>> >> +static nir_ssa_def *
>> >> +lower_iabs64(nir_builder *b, nir_ssa_def *x)
>> >> +{
>> >> +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
>> >> +   nir_ssa_def *x_is_neg = nir_ilt(b, x_hi, nir_imm_int(b, 0));
>> >> +   return nir_bcsel(b, x_is_neg, lower_ineg64(b, x), x);
>> >
>> > lower_bcsel?  Or, since we're depending on this running multiple times,
>> > just nir_ineg?  I go back and forth on whether a pass like this should
>> > run in a loop or be smart enough to lower intermediate bits on the fly.
>> > We should probably pick one.
>>
>> In principle, I agree.  I've been bitten a couple times by lowering
>> passes that generate other things that need to be lowered on some
>> platforms (that I didn't test).  In this case, I think the loop is the
>> right answer since each operation is lowered by a separate flag.
>
>
> That's the easy answer, certainly.  The other option is to have every lowered 
> thing builder check the flag and conditionally do the lowering.  That's 
> annoying and hard to get right so a loop is probably best for now.

Couldn't you just have the builder be right after the instruction,
instead of before it, and make the outer loop use a non-safe iterator
so that it will immediately run over the instructions generated? Doing
another pass over the whole shader is usually a little expensive.
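
A rough sketch of that scheme (the pass structure is assumed, and the two
should_lower_int64()/lower_int64_alu() helpers are hypothetical):

static void
lower_int64_block(nir_builder *b, nir_block *block)
{
   /* Deliberately not the _safe variant: the replacement code is
    * emitted right after the instruction being lowered, so the walk
    * falls straight into it and lowers any 64-bit ops it produced,
    * without another pass over the shader. */
   nir_foreach_instr(instr, block) {
      if (instr->type != nir_instr_type_alu)
         continue;

      nir_alu_instr *alu = nir_instr_as_alu(instr);
      if (!should_lower_int64(alu))
         continue;

      b->cursor = nir_after_instr(instr);
      nir_ssa_def *lowered = lower_int64_alu(b, alu);
      nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa,
                               nir_src_for_ssa(lowered));
      /* The original instruction is dead now; leave it for DCE rather
       * than unlinking it out from under the iterator. */
   }
}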

>
> --Jason


Re: [Mesa-dev] [PATCH V2 01/10] nir: evaluate if condition uses inside the if branches

2018-07-18 Thread Connor Abbott
On Thu, Jul 19, 2018 at 10:24 AM, Timothy Arceri  wrote:
> On 19/07/18 12:02, Connor Abbott wrote:
>>
>> Why not do the more general thing, and evaluate the condition in every
>> block dominated by the then and else blocks? That should handle the
>> loop and non-loop cases.
>
>
> Can you explain what the advantage would be in doing that? Is it just likely
> to reduce the code required?

Yeah, that's it. Plus it works the same for unstructured control flow
(if we ever add that).

>
>
>>
>> On Thu, Jul 19, 2018 at 8:06 AM, Timothy Arceri 
>> wrote:
>>>
>>> Since we know what side of the branch we ended up on we can just
>>> replace the use with a constant.
>>>
>>> All helped shaders are from Unreal Engine 4 besides one shader from
>>> Dirt Showdown.
>>>
>>> V2: make sure we do evaluation when condition is used in else with
>>>  a single block (we were checking for blocks < the last else
>>>  block rather than <=)
>>>
>>> shader-db results SKL:
>>>
>>> total instructions in shared programs: 13219725 -> 13219643 (<.01%)
>>> instructions in affected programs: 28917 -> 28835 (-0.28%)
>>> helped: 45
>>> HURT: 0
>>>
>>> total cycles in shared programs: 529335971 -> 529334604 (<.01%)
>>> cycles in affected programs: 216209 -> 214842 (-0.63%)
>>> helped: 45
>>> HURT: 4
>>>
>>> Cc: Ian Romanick 
>>>
>>> fix if condition eval for else with a single block
>>> ---
>>>   src/compiler/nir/nir_opt_if.c | 121 ++
>>>   1 file changed, 121 insertions(+)
>>>
>>> diff --git a/src/compiler/nir/nir_opt_if.c
>>> b/src/compiler/nir/nir_opt_if.c
>>> index a52de120ad6..4ed919887ce 100644
>>> --- a/src/compiler/nir/nir_opt_if.c
>>> +++ b/src/compiler/nir/nir_opt_if.c
>>> @@ -348,6 +348,86 @@ opt_if_loop_terminator(nir_if *nif)
>>>  return true;
>>>   }
>>>
>>> +static void
>>> +replace_if_condition_use_with_const(nir_src *use, unsigned nir_boolean,
>>> +void *mem_ctx, bool if_condition)
>>> +{
>>> +   /* Create const */
>>> +   nir_load_const_instr *load = nir_load_const_instr_create(mem_ctx, 1,
>>> 32);
>>> +   load->value.u32[0] = nir_boolean;
>>> +
>>> +   if (if_condition) {
>>> +  nir_instr_insert_before_cf(&use->parent_if->cf_node, &load->instr);
>>> +   } else if (use->parent_instr->type == nir_instr_type_phi) {
>>> +  nir_phi_instr *cond_phi = nir_instr_as_phi(use->parent_instr);
>>> +
>>> +  bool UNUSED found = false;
>>> +  nir_foreach_phi_src(phi_src, cond_phi) {
>>> + if (phi_src->src.ssa == use->ssa) {
>>> +nir_instr_insert_before_block(phi_src->pred, &load->instr);
>>> +found = true;
>>> +break;
>>> + }
>>> +  }
>>> +  assert(found);
>>> +   } else {
>>> +  nir_instr_insert_before(use->parent_instr, &load->instr);
>>> +   }
>>> +
>>> +   /* Rewrite use to use const */
>>> +   nir_src new_src = nir_src_for_ssa(&load->def);
>>> +
>>> +   if (if_condition)
>>> +  nir_if_rewrite_condition(use->parent_if, new_src);
>>> +   else
>>> +  nir_instr_rewrite_src(use->parent_instr, use, new_src);
>>> +}
>>> +
>>> +static bool
>>> +evaluate_condition_use(nir_if *nif, nir_src *use_src, void *mem_ctx,
>>> +   bool if_condition)
>>> +{
>>> +   bool progress = false;
>>> +
>>> +   nir_block *first_then = nir_if_first_then_block(nif);
>>> +   if (use_src->parent_instr->block->index > first_then->index) {
>>> +  nir_block *first_else = nir_if_first_else_block(nif);
>>> +  if (use_src->parent_instr->block->index < first_else->index) {
>>> + replace_if_condition_use_with_const(use_src, NIR_TRUE, mem_ctx,
>>> + if_condition);
>>> +
>>> + progress = true;
>>> +  } else if (use_src->parent_instr->block->index <=
>>> + nir_if_last_else_block(nif)->index) {
>>> + replace_if_condition_use_with_const(use_src, NIR_FALSE,
>>> mem_ctx,
>>> +

Re: [Mesa-dev] [PATCH V2 01/10] nir: evaluate if condition uses inside the if branches

2018-07-18 Thread Connor Abbott
Why not do the more general thing, and evaluate the condition in every
block dominated by the then and else blocks? That should handle the
loop and non-loop cases.
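
A sketch of the dominance-based check for instruction uses (it requires
nir_metadata_dominance to be valid; replace_if_condition_use_with_const is
the helper from the patch below, and if-condition uses would need their
containing block computed from the parent if rather than parent_instr):

static bool
evaluate_condition_use(nir_if *nif, nir_src *use_src, void *mem_ctx,
                       bool if_condition)
{
   nir_block *use_block = use_src->parent_instr->block;

   /* Any use dominated by the first then-block must see the condition
    * as true, and any use dominated by the first else-block must see
    * it as false -- whether the use sits directly in the branch or in
    * a loop nested inside it. */
   if (nir_block_dominates(nir_if_first_then_block(nif), use_block)) {
      replace_if_condition_use_with_const(use_src, NIR_TRUE, mem_ctx,
                                          if_condition);
      return true;
   }

   if (nir_block_dominates(nir_if_first_else_block(nif), use_block)) {
      replace_if_condition_use_with_const(use_src, NIR_FALSE, mem_ctx,
                                          if_condition);
      return true;
   }

   return false;
}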

On Thu, Jul 19, 2018 at 8:06 AM, Timothy Arceri  wrote:
> Since we know what side of the branch we ended up on we can just
> replace the use with a constant.
>
> All helped shaders are from Unreal Engine 4 besides one shader from
> Dirt Showdown.
>
> V2: make sure we do evaluation when condition is used in else with
> a single block (we were checking for blocks < the last else
> block rather than <=)
>
> shader-db results SKL:
>
> total instructions in shared programs: 13219725 -> 13219643 (<.01%)
> instructions in affected programs: 28917 -> 28835 (-0.28%)
> helped: 45
> HURT: 0
>
> total cycles in shared programs: 529335971 -> 529334604 (<.01%)
> cycles in affected programs: 216209 -> 214842 (-0.63%)
> helped: 45
> HURT: 4
>
> Cc: Ian Romanick 
>
> fix if condition eval for else with a single block
> ---
>  src/compiler/nir/nir_opt_if.c | 121 ++
>  1 file changed, 121 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opt_if.c b/src/compiler/nir/nir_opt_if.c
> index a52de120ad6..4ed919887ce 100644
> --- a/src/compiler/nir/nir_opt_if.c
> +++ b/src/compiler/nir/nir_opt_if.c
> @@ -348,6 +348,86 @@ opt_if_loop_terminator(nir_if *nif)
> return true;
>  }
>
> +static void
> +replace_if_condition_use_with_const(nir_src *use, unsigned nir_boolean,
> +void *mem_ctx, bool if_condition)
> +{
> +   /* Create const */
> +   nir_load_const_instr *load = nir_load_const_instr_create(mem_ctx, 1, 32);
> +   load->value.u32[0] = nir_boolean;
> +
> +   if (if_condition) {
> +  nir_instr_insert_before_cf(&use->parent_if->cf_node, &load->instr);
> +   } else if (use->parent_instr->type == nir_instr_type_phi) {
> +  nir_phi_instr *cond_phi = nir_instr_as_phi(use->parent_instr);
> +
> +  bool UNUSED found = false;
> +  nir_foreach_phi_src(phi_src, cond_phi) {
> + if (phi_src->src.ssa == use->ssa) {
> +nir_instr_insert_before_block(phi_src->pred, &load->instr);
> +found = true;
> +break;
> + }
> +  }
> +  assert(found);
> +   } else {
> +  nir_instr_insert_before(use->parent_instr, &load->instr);
> +   }
> +
> +   /* Rewrite use to use const */
> +   nir_src new_src = nir_src_for_ssa(&load->def);
> +
> +   if (if_condition)
> +  nir_if_rewrite_condition(use->parent_if, new_src);
> +   else
> +  nir_instr_rewrite_src(use->parent_instr, use, new_src);
> +}
> +
> +static bool
> +evaluate_condition_use(nir_if *nif, nir_src *use_src, void *mem_ctx,
> +   bool if_condition)
> +{
> +   bool progress = false;
> +
> +   nir_block *first_then = nir_if_first_then_block(nif);
> +   if (use_src->parent_instr->block->index > first_then->index) {
> +  nir_block *first_else = nir_if_first_else_block(nif);
> +  if (use_src->parent_instr->block->index < first_else->index) {
> + replace_if_condition_use_with_const(use_src, NIR_TRUE, mem_ctx,
> + if_condition);
> +
> + progress = true;
> +  } else if (use_src->parent_instr->block->index <=
> + nir_if_last_else_block(nif)->index) {
> + replace_if_condition_use_with_const(use_src, NIR_FALSE, mem_ctx,
> + if_condition);
> +
> + progress = true;
> +  }
> +   }
> +
> +   return progress;
> +}
> +
> +static bool
> +opt_if_evaluate_condition_use(nir_if *nif, void *mem_ctx)
> +{
> +   bool progress = false;
> +
> +   /* Evaluate any uses of the if condition inside the if branches */
> +   assert(nif->condition.is_ssa);
> +   nir_foreach_use_safe(use_src, nif->condition.ssa) {
> +  progress |= evaluate_condition_use(nif, use_src, mem_ctx, false);
> +   }
> +
> +   nir_foreach_if_use_safe(use_src, nif->condition.ssa) {
> +  if (use_src->parent_if != nif)
> + progress |= evaluate_condition_use(nif, use_src, mem_ctx, true);
> +   }
> +
> +   return progress;
> +}
> +
>  static bool
>  opt_if_cf_list(nir_builder *b, struct exec_list *cf_list)
>  {
> @@ -381,6 +461,41 @@ opt_if_cf_list(nir_builder *b, struct exec_list *cf_list)
> return progress;
>  }
>
> +/**
> + * These optimisations depend on nir_metadata_block_index and therefore must
> + * not do anything to cause the metadata to become invalid.
> + */
> +static bool
> +opt_if_safe_cf_list(nir_builder *b, struct exec_list *cf_list, void *mem_ctx)
> +{
> +   bool progress = false;
> +   foreach_list_typed(nir_cf_node, cf_node, node, cf_list) {
> +  switch (cf_node->type) {
> +  case nir_cf_node_block:
> + break;
> +
> +  case nir_cf_node_if: {
> + nir_if *nif = nir_cf_node_as_if(cf_node);
> + progress |= opt_if_safe_cf_list(b, &nif->then_list, mem_ctx);
> + progress |= opt_if_safe_cf_list(b, &nif->else_list, mem_ctx);
> + 

Re: [Mesa-dev] [PATCH] nir: fix ir_binop_gequal glsl_to_nir conversion

2018-04-14 Thread Connor Abbott
On Sat, Apr 14, 2018 at 3:39 PM, Erico Nunes  wrote:
> On Sat, Apr 14, 2018 at 9:26 PM, Jason Ekstrand  wrote:
>> Reviewed-by: Jason Ekstrand 
>>
>> What driver is hitting this path?  The !supports_ints path isn't used to my
>> knowledge so if some driver has started using it, they're liable to find
>> more bugs than just this one. :-)
>
> I'm doing some work on the lima vertex shader compiler and I hit this.
>
> And yeah this is there since 2015 it seems, so I suppose no other
> drivers are using this path, we'll see if there's more.

I think that it's probably impractical to use this path, and we should
probably delete it. There are just too many optimizations, e.g. in
nir_opt_algebraic and lowering passes that assume you have ints. I
think a better plan would be to silently convert ints to floats in the
lima driver, and maybe inhibit any optimizations that use bit
twiddling tricks if real int support isn't indicated.



Re: [Mesa-dev] Question about min_index/max_index calculation

2018-03-24 Thread Connor Abbott
On Sat, Mar 24, 2018 at 6:00 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> On Sat, Mar 24, 2018 at 2:27 PM, Marek Olšák <mar...@gmail.com> wrote:
>>
>> On Sat, Mar 24, 2018 at 1:36 PM, Connor Abbott <cwabbo...@gmail.com>
>> wrote:
>>>
>>> If Gallium was being lazy and not
>>> specifying the bounds for internal shaders, that needs to be fixed for
>>> the HW to work properly.
>>
>>
>> I don't understand the sentence. Shaders don't interact with vertex
>> indices. I also don't like the word "lazy". The proper expression is "saving
>> time".
>
>
> I figured he meant for things like u_blitter.  But why those things would be
> using an index buffer is beyond me...

Yeah, that was just speculation. I just meant to explain why Mali
might require the min/max index up-front, unlike other HW (AFAIK). I'm
certainly no expert when it comes to Gallium, and I don't have the
hardware to reproduce the problem.


Re: [Mesa-dev] Question about min_index/max_index calculation

2018-03-24 Thread Connor Abbott
My understanding is that unlike most other architectures, Mali does
vertex shading on every vertex up-front, completely ignoring the index
buffer. Primitive assembly and tile binning happen after every vertex
is transformed. There is no cache of transformed vertices. Utgard also
only supports GLES2, so the index buffer data is always passed through
a CPU pointer. Since it's possible to calculate the bounds on the CPU
without stalling, and since the HW was designed with low transistor
count in mind, they don't compute the bounds on the HW, and instead
expect you to pass it through. If Gallium was being lazy and not
specifying the bounds for internal shaders, that needs to be fixed for
the HW to work properly. Or, we need to guess by looking at the vertex
buffer size as Ilia mentioned.
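
For illustration, the CPU-side scan such a driver ends up doing (a sketch
only; GLES2-style 16-bit indices and the function name are assumptions):

static void
compute_index_bounds(const uint16_t *indices, unsigned count,
                     unsigned *min_index, unsigned *max_index)
{
   unsigned lo = ~0u, hi = 0;

   /* One pass over the index buffer before vertex shading is kicked
    * off, since the HW will not compute the bounds itself. */
   for (unsigned i = 0; i < count; i++) {
      if (indices[i] < lo)
         lo = indices[i];
      if (indices[i] > hi)
         hi = indices[i];
   }

   *min_index = lo;
   *max_index = hi;
}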

On Sun, Mar 18, 2018 at 12:59 AM, Marek Olšák  wrote:
> The index bounds are computed only when they are needed for uploading
> vertices that are passed via a CPU pointer (user_buffer). In all other
> cases, computing the index bounds has a performance cost, which can be very
> significant.
>
> If you rely on u_vbuf to upload vertices for you, you shouldn't need the
> index bounds.
>
> Marek
>
> On Sat, Mar 17, 2018 at 2:12 PM, Erico Nunes  wrote:
>>
>> Hi all,
>>
>> I have been working to add indexed drawing/glDrawElements support to
>> the mesa-lima driver currently in development
>> (https://github.com/yuq/mesa-lima).
>> For that implementation, it seems that we need to have the minimum and
>> maximum index values from the index buffer available in order to set
>> up a proper command stream.
>> Mesa provides these values in pipe_draw_info.min_index and
>> pipe_draw_info.max_index, however in some cases we noticed that it
>> decides to not calculate those. This happens because of
>> st_context.draw_needs_minmax_index being false after evaluating the
>> vertex data. In those cases, min_index gets to be 0 and max_index gets
>> to be 0x.
>> According to the gallium documentation, this seems to be on purpose
>> and apparently drivers should be able to handle the 0 and 0x
>> case and be able to render anyway. However, we haven't figured out a
>> way to do the render anyway with 0 and 0x.
>>
>> For us it would be interesting to always have mesa calculate those
>> values for indexed drawing. We haven't been able to figure out a way
>> to do that without changing the mesa generic code. Is there some way
>> we could accomplish that in driver specific code?
>> Otherwise, can you provide some advice on how to best handle this?
>>
>> Using mesa 17.3 and kmscube with the following patch is one way to
>> reproduce st_context.draw_needs_minmax_index not being set.
>>
>> https://gist.githubusercontent.com/enunes/366398fbee3d194deb3a46ef9c2ca78d/raw/82a2c8084236e35635b7a247609213d0068974e3/kmscube.patch
>> The only way that this works for us with the current implementation is
>> by hacking st_context.draw_needs_minmax_index to be always true in
>> some way.
>>
>> Thanks
>>
>> Erico


Re: [Mesa-dev] FLAG-DAY: NIR derefs

2018-03-14 Thread Connor Abbott
On Wed, Mar 14, 2018 at 6:07 PM, Rob Clark <robdcl...@gmail.com> wrote:
> On Wed, Mar 14, 2018 at 7:42 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
>> On Wed, Mar 14, 2018 at 5:05 PM, Rob Clark <robdcl...@gmail.com> wrote:
>>> On Wed, Mar 14, 2018 at 4:58 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
>>>> FWIW, the way I imagined doing this was something like:
>>>>
>>>> 1. Add nir_deref_instr and nir_deref_type_pointer. At this point, just
>>>> make everything assert if the base deref isn't a nir_deref_var. This
>>>> will be a bit of a flag-day, but also very mechanical. It'll also help
>>>> us catch cases where we don't handle new-style derefs later.
>>>> 2. Add a pass to flatten nir_deref_type_pointer into
>>>> nir_deref_type_var if possible (i.e. if there's a clear chain up to
>>>> the base variable without any phi nodes or whatever). This should
>>>> always be possible for GLSL, as well as SPIR-V unless
>>>> KHR_variable_pointers is enabled. We'll use this to avoid too much
>>>> churn in drivers, passes that haven't been updated, etc. We might also
>>>> want a pass to do the opposite, for converting passes where we don't
>>>> want to have codepaths for both forms at once.
>>>
>>> btw, does it seem reasonable to assert that deref instruction src's
>>> are *always* in SSA form?  That seems reasonable to me since they will
>>> be mostly lowered away before the driver sees them (and I think makes
>>> some of the operation on them easier), and I can't think of any way
>>> for them *not* to be SSA (since they aren't real instructions).
>>
>> I think so... as long as you don't lower locals to regs before
>> lowering everything to explicit address arithmetic. Although, with the
>> physical memory model, it's just another source like any other so I'm
>> not sure if there's a point.
>>
>
> I think w/ phys memory model, we could lower away the deref's before
> going to regs.  That *seems* like a reasonable requirement to me.
>
>>>
>>> If so, my rough thoughts are a deref instruction chain (formed by ssa
>>> links to previous deref instruction) either start w/
>>> nir_deref_instr_pointer or nir_deref_instruction_var instructions at
>>> the head of the list (to start, I guess you could ignore adding the
>>> nir_deref_instr_pointer instruction and I could add that for
>>> clover/spirv work).  Followed by N links of struct/array deref_link
>>> instructions that have two ssa src's (one that is previous deref
>>> instruction and one that is array or struct member offset)
>>
>> Why would you need a separate nir_deref_instr_pointer? Do you want to
>> put information like what type of pointer it is in there? Maybe we
>> could just make that part of every nir_deref_instr instead?
>
> well, in clc you could hypothetically do something like:
>
>   __global struct Foo *f = (struct Foo *)0x1234;
>
> so you don't necessarily have a var at the start of your deref chain.
>
> More realistic example is:
>
>   ptr->a.b->c.d
>
> which is really two deref chains, first starting at a var, second
> starting at an ssa ptr (which I think realistically ends up needing to
> be a fat pointer to deal w/ cl's multiple address spaces[1]), with an
> intermediate load_global or load_shared intrinsic in between.
>
> Anyways, don't want to derail the conversion to deref instructions too
> much, but I do think we need something different for "var" vs "ptr"
> (and the nice thing about deref chains is this should be easier to
> add)

My point was that you don't really need a distinction, as long as
deref instructions can accept any old pointer. In your second example,
there would be a struct deref, a load, and then a second struct deref
using the result of the load. This is similar to how it's done in
LLVM.
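
For reference, the LLVM shape being described, via the C API (given an
LLVMBuilderRef builder and an SSA pointer value ptr; the struct types and
field indices are made up for illustration):

LLVMTypeRef i32 = LLVMInt32Type();
LLVMValueRef idx1[] = {
        LLVMConstInt(i32, 0, false),   /* step through the pointer itself */
        LLVMConstInt(i32, 0, false),   /* .a */
        LLVMConstInt(i32, 1, false),   /* .b, a pointer-typed member */
};
LLVMValueRef pab = LLVMBuildGEP(builder, ptr, idx1, 3, ""); /* first chain */
LLVMValueRef p2  = LLVMBuildLoad(builder, pab, "");         /* load the pointer */
LLVMValueRef idx2[] = {
        LLVMConstInt(i32, 0, false),
        LLVMConstInt(i32, 2, false),   /* ->c */
        LLVMConstInt(i32, 3, false),   /* .d */
};
LLVMValueRef pcd = LLVMBuildGEP(builder, p2, idx2, 3, "");  /* second chain */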

>
> BR,
> -R
>
> [1] kind of a different topic.. short version is I'm leaning towards a
> nir_deref_instr_pointer taking a two component vector as its src so
> it can be lowered to an if/else chain to deal with different address
> spaces, and then let opt passes clean things up so driver ends up with
> either load/store_global or load/store_local, etc
>
>
>>
>>>
>>>> 3. Modify nir_lower_io to handle new-style derefs, especially for
>>>> shared variables (i.e. KHR_variable_pointers for anv). We might have
>>>> to modify a few other passes, too.
>>>> 4. Add the required deref lowering passes to all drivers.
>>>> 5. Rewrite glsl_to_nir and spirv

Re: [Mesa-dev] FLAG-DAY: NIR derefs

2018-03-14 Thread Connor Abbott
On Wed, Mar 14, 2018 at 5:05 PM, Rob Clark <robdcl...@gmail.com> wrote:
> On Wed, Mar 14, 2018 at 4:58 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
>> FWIW, the way I imagined doing this was something like:
>>
>> 1. Add nir_deref_instr and nir_deref_type_pointer. At this point, just
>> make everything assert if the base deref isn't a nir_deref_var. This
>> will be a bit of a flag-day, but also very mechanical. It'll also help
>> us catch cases where we don't handle new-style derefs later.
>> 2. Add a pass to flatten nir_deref_type_pointer into
>> nir_deref_type_var if possible (i.e. if there's a clear chain up to
>> the base variable without any phi nodes or whatever). This should
>> always be possible for GLSL, as well as SPIR-V unless
>> KHR_variable_pointers is enabled. We'll use this to avoid too much
>> churn in drivers, passes that haven't been updated, etc. We might also
>> want a pass to do the opposite, for converting passes where we don't
>> want to have codepaths for both forms at once.
>
> btw, does it seem reasonable to assert that deref instruction src's
> are *always* in SSA form?  That seems reasonable to me since they will
> be mostly lowered away before the driver sees them (and I think makes
> some of the operation on them easier), and I can't think of any way
> for them *not* to be SSA (since they aren't real instructions).

I think so... as long as you don't lower locals to regs before
lowering everything to explicit address arithmetic. Although, with the
physical memory model, it's just another source like any other so I'm
not sure if there's a point.

>
> If so, my rough thoughts are a deref instruction chain (formed by ssa
> links to previous deref instruction) either start w/
> nir_deref_instr_pointer or nir_deref_instruction_var instructions at
> the head of the list (to start, I guess you could ignore adding the
> nir_deref_instr_pointer instruction and I could add that for
> clover/spirv work).  Followed by N links of struct/array deref_link
> instructions that have two ssa src's (one that is previous deref
> instruction and one that is array or struct member offset)

Why would you need a separate nir_deref_instr_pointer? Do you want to
put information like what type of pointer it is in there? Maybe we
could just make that part of every nir_deref_instr instead?

>
>> 3. Modify nir_lower_io to handle new-style derefs, especially for
>> shared variables (i.e. KHR_variable_pointers for anv). We might have
>> to modify a few other passes, too.
>> 4. Add the required deref lowering passes to all drivers.
>> 5. Rewrite glsl_to_nir and spirv_to_nir to emit the new-style derefs.
>> At the very least, we should be using this to implement the shared
>> variable bits of KHR_variable_pointers. If we add stride/offset
>> annotations to nir_deref_instr for UBO's and SSBO's, then we might
>> also be able to get rid of the vtn_deref stuff entirely (although I'm
>> not sure if that should be a goal right now).
>
> I think I might try to prototype something where we convert vtn over
> to new-style deref instructions, plus a pass to lower to old style
> deref chains.  It partly comes down to how quickly I can finish a
> couple other things, and how much I can't sleep on a long-ass flight.
> (I guess even if throw-away, if it gives some idea of what to do or
> what not to do it might be useful?)
>
> Anyways, as far as decoupling this from backend drivers, I think a
> nir_intr_get_var(intr, n) instruction to replace open coded
> intr->variables[0]->var could go a long way.  (In the new world this
> would follow ssa links to previous deref instruction to find the
> nir_deref_instruction_var.)  I'll try typing this up in a few minutes.
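
That helper, written against the current (pre-deref-instruction) NIR, is
just a one-liner:

static inline nir_variable *
nir_intr_get_var(nir_intrinsic_instr *intr, unsigned n)
{
   /* Centralize the open-coded intr->variables[n]->var access so that
    * only this one spot has to learn to chase deref instructions once
    * derefs become SSA-based. */
   return intr->variables[n]->var;
}
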
>
>> At this point, we can fix things up and move everything else over to
>> new-style derefs at our leisure. Also, it should now be pretty
>> straightforward to add support for shared variable pointers to radv
>> without lowering everything to offsets up-front, which is nice.
>>
>> Connor
>>
>>
>> On Wed, Mar 14, 2018 at 2:32 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>>> All,
>>>
>>> Connor and I along with several others have been discussing for a while
>>> changing the way NIR dereferences work.  In particular, adding a new
>>> nir_deref_instr type where the first one in the chain takes a variable and
>>> is followed by a series of instructions which take another deref instruction
>>> and do an array or structure dereference on it.
>>>
>>> Much of the motivation for this is some of the upcoming SPIR-V stuff where
>>> we have more real pointers and 

Re: [Mesa-dev] FLAG-DAY: NIR derefs

2018-03-14 Thread Connor Abbott
FWIW, the way I imagined doing this was something like:

1. Add nir_deref_instr and nir_deref_type_pointer. At this point, just
make everything assert if the base deref isn't a nir_deref_var. This
will be a bit of a flag-day, but also very mechanical. It'll also help
us catch cases where we don't handle new-style derefs later.
2. Add a pass to flatten nir_deref_type_pointer into
nir_deref_type_var if possible (i.e. if there's a clear chain up to
the base variable without any phi nodes or whatever). This should
always be possible for GLSL, as well as SPIR-V unless
KHR_variable_pointers is enabled. We'll use this to avoid too much
churn in drivers, passes that haven't been updated, etc. We might also
want a pass to do the opposite, for converting passes where we don't
want to have codepaths for both forms at once.
3. Modify nir_lower_io to handle new-style derefs, especially for
shared variables (i.e. KHR_variable_pointers for anv). We might have
to modify a few other passes, too.
4. Add the required deref lowering passes to all drivers.
5. Rewrite glsl_to_nir and spirv_to_nir to emit the new-style derefs.
At the very least, we should be using this to implement the shared
variable bits of KHR_variable_pointers. If we add stride/offset
annotations to nir_deref_instr for UBO's and SSBO's, then we might
also be able to get rid of the vtn_deref stuff entirely (although I'm
not sure if that should be a goal right now).

At this point, we can fix things up and move everything else over to
new-style derefs at our leisure. Also, it should now be pretty
straightforward to add support for shared variable pointers to radv
without lowering everything to offsets up-front, which is nice.

Connor


On Wed, Mar 14, 2018 at 2:32 PM, Jason Ekstrand  wrote:
> All,
>
> Connor and I along with several others have been discussing for a while
> changing the way NIR dereferences work.  In particular, adding a new
> nir_deref_instr type where the first one in the chain takes a variable and
> is followed by a series of instructions which take another deref instruction
> and do an array or structure dereference on it.
>
> Much of the motivation for this is some of the upcoming SPIR-V stuff where
> we have more real pointers and deref chains don't really work anymore.  It
> will also allow for things such as CSE of common derefs which could make
> analysis easier.  This is similar to what LLVM does and it's working very
> well for them.
>
> The reason for this e-mail is that this is going to be a flag-day change.
> We've been talking about it for a while but this is going to be a major and
> fairly painful change in the short term so no one has actually done it.
> It's time we finally just suck it up and make it happen.  While we will try
> to make the change as incrementally and reviewably as possible, there is
> a real limit to what is possible here.  My plan is to start cracking away
> at this on Monday and hopefully have something working for i965/anv by the
> end of the week or maybe some time the week after.  If anyone has something
> to say in opposition, please speak up now and not after I've spent a week
> straight frantically hacking on NIR.
>
> I would like everyone to be respectful of the fact that this will be a major
> change and very painful to rebase.  If you've got outstanding NIR, GLSL, or
> SPIR-V work that is likely to conflict with this, please try to land it
> before Monday so that we can avoid rebase conflicts.  If you have interest
> in reviewing this, please try to be responsive so that we can get it
> reviewed and landed before it becomes too painful.  I'll try to send out
> some preview patches as I go so that the data structures themselves can get
> some review before the rest of the changes have been made.
>
> I'm also asking for help from Rob, Bas, and Eric if there are changes needed
> in any of their drivers.  I suspect the impact on back-end drivers will be
> low because most of them don't use derefs directly, but it would be good if
> people were on hand to help catch bugs if nothing else.
>
> Thanks,
>
> --Jason Ekstrand
>


Re: [Mesa-dev] [PATCH] nir: fix divide by zero crash during constant folding

2018-02-27 Thread Connor Abbott
Floating point division shouldn't signal on division by zero, it
should just return an appropriately-signed infinity, which seems like
a sane thing to do, and way better than just returning 0. So we
shouldn't do this with fdiv. I guess 0 is as good a result as any for
the integer division, though -- there aren't really any great choices.
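
A quick C illustration of that IEEE behaviour (the volatile just keeps the
compiler from folding the division at compile time):

#include <stdio.h>

int main(void)
{
   volatile float den = 0.0f;

   /* IEEE 754: x/0 quietly yields a signed infinity, no exception, so
    * only idiv/udiv actually need the == 0 guard. */
   printf("%f %f\n", 1.0f / den, -1.0f / den);   /* inf -inf */
   return 0;
}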

On Tue, Feb 27, 2018 at 10:07 PM, Timothy Arceri  wrote:
> From the GLSL 4.60 spec Section 5.9 (Expressions):
>
>"Dividing by zero does not cause an exception but does result in
> an unspecified value."
>
> Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
> ---
>  src/compiler/nir/nir_opcodes.py | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index 278562b2bd..dcc5b07d05 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -403,9 +403,9 @@ binop("imul_high", tint32, commutative,
>  binop("umul_high", tuint32, commutative,
>"(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")
>
> -binop("fdiv", tfloat, "", "src0 / src1")
> -binop("idiv", tint, "", "src0 / src1")
> -binop("udiv", tuint, "", "src0 / src1")
> +binop("fdiv", tfloat, "", "src1 == 0 ? 0 : (src0 / src1)")
> +binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
> +binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")
>
>  # returns a boolean representing the carry resulting from the addition of
>  # the two unsigned arguments.
> --
> 2.14.3
>


Re: [Mesa-dev] [PATCH 1/2] ac: create ac_get_zero_for_type() helper

2018-02-25 Thread Connor Abbott
Why not just use LLVMConstNull()?
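
LLVMConstNull() already returns the zero value for any type (integer 0,
float +0.0, zeroed vectors and aggregates), so the whole helper could
reduce to:

LLVMValueRef
ac_get_zero_for_type(struct ac_llvm_context *ctx, LLVMTypeRef type)
{
   return LLVMConstNull(type);
}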

On Mon, Feb 26, 2018 at 12:14 AM, Timothy Arceri  wrote:
> ---
>  src/amd/common/ac_llvm_build.c | 15 +++
>  src/amd/common/ac_llvm_build.h |  3 +++
>  2 files changed, 18 insertions(+)
>
> diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
> index 15144addb9..f3775938e5 100644
> --- a/src/amd/common/ac_llvm_build.c
> +++ b/src/amd/common/ac_llvm_build.c
> @@ -148,6 +148,21 @@ ac_get_elem_bits(struct ac_llvm_context *ctx, 
> LLVMTypeRef type)
> unreachable("Unhandled type kind in get_elem_bits");
>  }
>
> +LLVMValueRef
> +ac_get_zero_for_type(struct ac_llvm_context *ctx, LLVMTypeRef type)
> +{
> +   if (type == ctx->i32)
> +   return ctx->i32_0;
> +   if (type == ctx->f32)
> +   return ctx->f32_0;
> +   if (type == ctx->i64)
> +   return ctx->i64_0;
> +   if (type == ctx->f64)
> +   return ctx->f64_0;
> +
> +   unreachable("Unhandled type kind in ac_get_zero_for_type()");
> +}
> +
>  unsigned
>  ac_get_type_size(LLVMTypeRef type)
>  {
> diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
> index 0a49ad8ca1..2051dc35bc 100644
> --- a/src/amd/common/ac_llvm_build.h
> +++ b/src/amd/common/ac_llvm_build.h
> @@ -100,6 +100,9 @@ ac_get_llvm_num_components(LLVMValueRef value);
>  int
>  ac_get_elem_bits(struct ac_llvm_context *ctx, LLVMTypeRef type);
>
> +LLVMValueRef
> +ac_get_zero_for_type(struct ac_llvm_context *ctx, LLVMTypeRef type);
> +
>  LLVMValueRef
>  ac_llvm_extract_elem(struct ac_llvm_context *ac,
>  LLVMValueRef value,
> --
> 2.14.3
>


Re: [Mesa-dev] [PATCH 1/1] ac/nir: use ordered float comparisons except for not equal

2018-02-23 Thread Connor Abbott
Yes, this is the right ordering for NIR (and GLSL) comparisons.

Reviewed-by: Connor Abbott <cwabbo...@gmail.com>

On Fri, Feb 23, 2018 at 8:21 AM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:
> Original patch from Timothy Arceri, I have just fixed the
> not equal case locally.
>
> This fixes one important rendering issue in Wolfenstein 2
> (the cutscene transition issue).
>
> RadeonSI uses the same ordered comparisons, so I guess that
> what we should do as well.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104302
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104905
> Cc: <mesa-sta...@lists.freedesktop.org>
> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
> ---
>  src/amd/common/ac_nir_to_llvm.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 687157..bc1d16d2a4 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -1801,16 +1801,16 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
> 		result = emit_int_cmp(&ctx->ac, LLVMIntUGE, src[0], src[1]);
> 		break;
> 	case nir_op_feq:
> -		result = emit_float_cmp(&ctx->ac, LLVMRealUEQ, src[0], src[1]);
> +		result = emit_float_cmp(&ctx->ac, LLVMRealOEQ, src[0], src[1]);
> 		break;
> 	case nir_op_fne:
> 		result = emit_float_cmp(&ctx->ac, LLVMRealUNE, src[0], src[1]);
> 		break;
> 	case nir_op_flt:
> -		result = emit_float_cmp(&ctx->ac, LLVMRealULT, src[0], src[1]);
> +		result = emit_float_cmp(&ctx->ac, LLVMRealOLT, src[0], src[1]);
> 		break;
> 	case nir_op_fge:
> -		result = emit_float_cmp(&ctx->ac, LLVMRealUGE, src[0], src[1]);
> +		result = emit_float_cmp(&ctx->ac, LLVMRealOGE, src[0], src[1]);
> 		break;
> 	case nir_op_fabs:
> 		result = emit_intrin_1f_param(&ctx->ac, "llvm.fabs",
> --
> 2.16.2
>


Re: [Mesa-dev] [PATCH 5/5] ac: use correct LLVM opcodes for ordered comparisons

2018-02-23 Thread Connor Abbott
On Fri, Feb 23, 2018 at 8:30 AM, Bas Nieuwenhuizen <ba...@chromium.org> wrote:
>
>
> On Thu, Feb 15, 2018 at 8:54 AM, Connor Abbott <cwabbo...@gmail.com> wrote:
>>
>> On Wed, Feb 14, 2018 at 11:53 PM, Timothy Arceri <tarc...@itsqueeze.com>
>> wrote:
>> >
>> >
>> > On 15/02/18 04:39, Marek Olšák wrote:
>> >>
>> >> Reviewed-by: Marek Olšák <marek.ol...@amd.com>
>> >>
>> >> Marek
>> >>
>> >> On Wed, Feb 14, 2018 at 7:29 AM, Timothy Arceri <tarc...@itsqueeze.com>
>> >> wrote:
>> >>>
>> >>> Fixes glsl-1.30/execution/isinf-and-isnan* piglit tests for
>> >>> radeonsi and should fix SPIRV errors when LLVM optimises away
>> >>> the workarounds in vtn_handle_alu() for handling ordered
>> >>> comparisons.
>> >>>
>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104905
>> >>> ---
>> >>>   src/amd/common/ac_nir_to_llvm.c | 8 
>> >>>   1 file changed, 4 insertions(+), 4 deletions(-)
>> >>>
>> >>> diff --git a/src/amd/common/ac_nir_to_llvm.c
>> >>> b/src/amd/common/ac_nir_to_llvm.c
>> >>> index a0c5680205..e81f86bb08 100644
>> >>> --- a/src/amd/common/ac_nir_to_llvm.c
>> >>> +++ b/src/amd/common/ac_nir_to_llvm.c
>> >>> @@ -1792,16 +1792,16 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
>> >>>  		result = emit_int_cmp(&ctx->ac, LLVMIntUGE, src[0], src[1]);
>> >>>  		break;
>> >>>  	case nir_op_feq:
>> >>> -		result = emit_float_cmp(&ctx->ac, LLVMRealUEQ, src[0], src[1]);
>> >>> +		result = emit_float_cmp(&ctx->ac, LLVMRealOEQ, src[0], src[1]);
>> >>>  		break;
>> >>>  	case nir_op_fne:
>> >>> -		result = emit_float_cmp(&ctx->ac, LLVMRealUNE, src[0], src[1]);
>> >>> +		result = emit_float_cmp(&ctx->ac, LLVMRealONE, src[0], src[1]);
>> >
>> >
>> > It seems we need to leave this one as is to avoid regressions. This is
>> > also
>> > what radeonsi does.
>>
>> So, the thing you have to understand is that in LLVM unordered
>> comparisons are precisely the inverse of the ordered comparisons. That
>> is, (a <=(ordered) b) == !(a >(unordered) b), (a ==(ordered) b) == !(a
>> !=(unordered) b), and so on. C defines that all comparisons are
>> ordered except !=, so that (a == b) == !(a != b) always holds true.
>> Most hardware follows this convention -- offhand, x86 SSE is the only
>> ISA I know of with separate ordered and unordered comparisons, and
>> LLVM appears to have copied the distinction from them, but no one else
>> has both. I'm not even sure if it's in the IEEE spec. GLSL follows the
>> C convention, so glsl_to_nir just uses nir_op_fne to mean unordered
>> not-equal. spirv_to_nir generates some extra instructions, which then
>> get stripped away later... sigh.
>>
>> I think the right way to untangle this mess is to define that the NIR
>> opcodes should always match the C convention. The separate ordered and
>> unordered opcodes are unnecessary, since one is just the logical
>> negation of the other, and LLVM was a little overzealous -- I'm sure
>> they would get rid of the distinction if they had the chance -- and
>> then they were blindly copied to SPIR-V. spirv_to_nir should just
>> negate the result if necessary rather than emitting the extra code to
>> handle NaN, and ac should use ordered except for not-equals.
>
>
> I think we should also use ordered for not-equal. Otherwise we have no way
> to construct an unordered equal or ordered not-equal using the not-trick. I
> think that would be more important than trying to keep it in sync with C?

I was thinking about that too... but all the backends (except for
radv), frontends, opt_algebraic patterns, etc. currently assume fne
means unordered not-equals. We'd have to rewrite a lot of stuff to
flip the meaning. But if you're willing to do all the mechanical
rewriting, sure :).
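
For anyone following along, the "not-trick" Bas is referring to is easy
to see in plain C. A minimal sketch, with made-up helper names
(unordered() just checks for NaN):

   #include <math.h>
   #include <stdbool.h>

   static bool unordered(float a, float b) { return isnan(a) || isnan(b); }

   /* The two opcodes we have today: feq is ordered, fne is unordered. */
   static bool feq_ordered(float a, float b)   { return !unordered(a, b) && a == b; }
   static bool fne_unordered(float a, float b) { return unordered(a, b) || a != b; }

For all a and b, !feq_ordered(a, b) == fne_unordered(a, b) and
!fne_unordered(a, b) == feq_ordered(a, b), so negating one opcode only
ever reproduces the other: ordered not-equal and unordered equal are
unreachable, which is exactly Bas's objection above.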

>>
>>
>> >
>> >
>> >>>  break;
>> >>>  case nir_op_flt:
>> >>> -   result = emit_float_cmp(&ctx->ac, LLVMRealULT, src[0],
>> >>> s

Re: [Mesa-dev] [PATCH v2 2/2] radv: implement AMD_gcn_shader extension

2018-02-21 Thread Connor Abbott
On Wed, Feb 21, 2018 at 3:03 PM, Daniel Schürmann
 wrote:
>
>> On Wed, Feb 21, 2018 at 1:00 PM,  
>> wrote:
>>>
>>> From: Dave Airlie 
>>>
>>> Co-authored-by: Daniel Schürmann 
>>> Signed-off-by: Daniel Schürmann 
>>> ---
>>>   src/amd/common/ac_llvm_build.c|  3 +-
>>>   src/amd/common/ac_nir_to_llvm.c   | 39 ++
>>>   src/amd/vulkan/radv_extensions.py |  1 +
>>>   src/compiler/nir/meson.build  |  1 +
>>>   src/compiler/nir/nir_intrinsics.h |  4 +++
>>>   src/compiler/spirv/spirv_to_nir.c |  2 ++
>>>   src/compiler/spirv/vtn_amd.c  | 68
>>> +++
>>>   src/compiler/spirv/vtn_private.h  |  3 ++
>>>   8 files changed, 120 insertions(+), 1 deletion(-)
>>>   create mode 100644 src/compiler/spirv/vtn_amd.c
>>>
>>> diff --git a/src/amd/common/ac_llvm_build.c
>>> b/src/amd/common/ac_llvm_build.c
>>> index 15144addb9..3bb74c2b0b 100644
>>> --- a/src/amd/common/ac_llvm_build.c
>>> +++ b/src/amd/common/ac_llvm_build.c
>>> @@ -370,7 +370,8 @@ LLVMValueRef
>>>   ac_build_shader_clock(struct ac_llvm_context *ctx)
>>>   {
>>>  LLVMValueRef tmp = ac_build_intrinsic(ctx,
>>> "llvm.readcyclecounter",
>>> - ctx->i64, NULL, 0, 0);
>>> + ctx->i64, NULL, 0,
>>> AC_FUNC_ATTR_READONLY);
>>> +   ac_build_optimization_barrier(ctx, &tmp);
>>
>> ac_build_optimization_barrier() creates an empty inline asm statement,
>> which actually doesn't do much to prevent code motion beyond the
>> attributes already added to llvm.readcyclecounter by llvm. It prevents
>> duplicating it, but that's about it, and not useful anyways. We only
>> use it to work around some problems with cross-wavefront intrinsics,
>> which don't exist here. You can just drop this hunk.
>
> It also prevents LLVM from eliminating multiple calls to the same function,
> which is the purpose in this case. (And also functions as "kind of" code
> motion barrier)

I think we can fix this by removing the "readonly" parameter from
ac_build_intrinsic(). LLVM doesn't really give you many extra
guarantees with an inline asm block, compared to an intrinsic with no
flags set. The only difference, I think, is that noduplicate is
implied for asm calls with the sideeffect bit set, but that's not
useful here.
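
For reference, the source-level analogue of that kind of barrier is
just an empty volatile asm. A sketch in GCC/Clang C -- this is not what
ac_llvm_build emits, just an illustration of how weak the guarantee is:

   #include <stdint.h>

   /* The empty, volatile asm consumes and redefines v, so two calls
    * producing the same value cannot be merged away, but unrelated
    * loads and stores are still free to move across it. */
   static inline uint64_t opt_barrier_u64(uint64_t v)
   {
      __asm__ volatile("" : "+r"(v));
      return v;
   }

In other words, you essentially only get noduplicate-style behavior out
of it, which is the point above.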

>
>>
>>>  return LLVMBuildBitCast(ctx->builder, tmp, ctx->v2i32, "");
>>>   }
>>>
>>> diff --git a/src/amd/common/ac_nir_to_llvm.c
>>> b/src/amd/common/ac_nir_to_llvm.c
>>> index 2460e105f7..05f28b26a2 100644
>>> --- a/src/amd/common/ac_nir_to_llvm.c
>>> +++ b/src/amd/common/ac_nir_to_llvm.c
>>> @@ -4328,6 +4328,38 @@ load_patch_vertices_in(struct ac_shader_abi *abi)
>>>  return LLVMConstInt(ctx->ac.i32,
>>> ctx->options->key.tcs.input_vertices, false);
>>>   }
>>>
>>> +static LLVMValueRef
>>> +visit_cube_face_index(struct ac_nir_context *ctx,
>>> + nir_intrinsic_instr *instr)
>>> +{
>>> +   LLVMValueRef result;
>>> +   LLVMValueRef in[3];
>>> +   LLVMValueRef src0 = ac_to_float(&ctx->ac, get_src(ctx,
>>> instr->src[0]));
>>> +   for (unsigned chan = 0; chan < 3; chan++)
>>> +   in[chan] = ac_llvm_extract_elem(&ctx->ac, src0, chan);
>>> +
>>> +   result = ac_build_intrinsic(&ctx->ac,  "llvm.amdgcn.cubeid",
>>> +   ctx->ac.f32, in, 3,
>>> AC_FUNC_ATTR_READNONE);
>>> +   return result;
>>> +}
>>> +
>>> +static LLVMValueRef
>>> +visit_cube_face_coord(struct ac_nir_context *ctx,
>>> + nir_intrinsic_instr *instr)
>>> +{
>>> +   LLVMValueRef results[2];
>>> +   LLVMValueRef in[3];
>>> +   LLVMValueRef src0 = ac_to_float(&ctx->ac, get_src(ctx,
>>> instr->src[0]));
>>> +   for (unsigned chan = 0; chan < 3; chan++)
>>> +   in[chan] = ac_llvm_extract_elem(&ctx->ac, src0, chan);
>>> +
>>> +   results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
>>> +   ctx->ac.f32, in, 3,
>>> AC_FUNC_ATTR_READNONE);
>>> +   results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
>>> +   ctx->ac.f32, in, 3,
>>> AC_FUNC_ATTR_READNONE);
>>> +   return ac_build_gather_values(&ctx->ac, results, 2);
>>> +}
>>> +
>>>   static void visit_intrinsic(struct ac_nir_context *ctx,
>>>   nir_intrinsic_instr *instr)
>>>   {
>>> @@ -4613,6 +4645,13 @@ static void visit_intrinsic(struct ac_nir_context
>>> *ctx,
>>>  result = LLVMBuildSExt(ctx->ac.builder, tmp,
>>> ctx->ac.i32, "");
>>>  break;
>>>  }
>>> +   case nir_intrinsic_cube_face_index:
>>> +   result = visit_cube_face_index(ctx, instr);
>>> +   break;
>>> +   case nir_intrinsic_cube_face_coord:
>>> +   result = 

Re: [Mesa-dev] [PATCH v2 2/2] radv: implement AMD_gcn_shader extension

2018-02-21 Thread Connor Abbott
On Wed, Feb 21, 2018 at 3:13 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
> On Wed, Feb 21, 2018 at 3:03 PM, Daniel Schürmann
> <daniel.schuerm...@campus.tu-berlin.de> wrote:
>>
>>> On Wed, Feb 21, 2018 at 1:00 PM,  <daniel.schuerm...@campus.tu-berlin.de>
>>> wrote:
>>>>
>>>> From: Dave Airlie <airl...@redhat.com>
>>>>
>>>> Co-authored-by: Daniel Schürmann <daniel.schuerm...@campus.tu-berlin.de>
>>>> Signed-off-by: Daniel Schürmann <daniel.schuerm...@campus.tu-berlin.de>
>>>> ---
>>>>   src/amd/common/ac_llvm_build.c|  3 +-
>>>>   src/amd/common/ac_nir_to_llvm.c   | 39 ++
>>>>   src/amd/vulkan/radv_extensions.py |  1 +
>>>>   src/compiler/nir/meson.build  |  1 +
>>>>   src/compiler/nir/nir_intrinsics.h |  4 +++
>>>>   src/compiler/spirv/spirv_to_nir.c |  2 ++
>>>>   src/compiler/spirv/vtn_amd.c  | 68
>>>> +++
>>>>   src/compiler/spirv/vtn_private.h  |  3 ++
>>>>   8 files changed, 120 insertions(+), 1 deletion(-)
>>>>   create mode 100644 src/compiler/spirv/vtn_amd.c
>>>>
>>>> diff --git a/src/amd/common/ac_llvm_build.c
>>>> b/src/amd/common/ac_llvm_build.c
>>>> index 15144addb9..3bb74c2b0b 100644
>>>> --- a/src/amd/common/ac_llvm_build.c
>>>> +++ b/src/amd/common/ac_llvm_build.c
>>>> @@ -370,7 +370,8 @@ LLVMValueRef
>>>>   ac_build_shader_clock(struct ac_llvm_context *ctx)
>>>>   {
>>>>  LLVMValueRef tmp = ac_build_intrinsic(ctx,
>>>> "llvm.readcyclecounter",
>>>> - ctx->i64, NULL, 0, 0);
>>>> + ctx->i64, NULL, 0,
>>>> AC_FUNC_ATTR_READONLY);
>>>> +   ac_build_optimization_barrier(ctx, &tmp);
>>>
>>> ac_build_optimization_barrier() creates an empty inline asm statement,
>>> which actually doesn't do much to prevent code motion beyond the
>>> attributes already added to llvm.readcyclecounter by llvm. It prevents
>>> duplicating it, but that's about it, and not useful anyways. We only
>>> use it to work around some problems with cross-wavefront intrinsics,
>>> which don't exist here. You can just drop this hunk.
>>
>> It also prevents LLVM from eliminating multiple calls to the same function,
>> which is the purpose in this case. (And also functions as "kind of" code
>> motion barrier)
>
> I think we can fix this by removing the "readonly" parameter from
> ac_build_intrinsic(). LLVM doesn't really give you many extra
> guarantees with an inline asm block, compared to an intrinsic with no
> flags set. The only difference, I think, is that noduplicate is
> implied for asm calls with the sideeffect bit set, but that's not
> useful here.

(Also, I forgot to mention, but this should be a separate change,
since it impacts radeonsi NIR as well.)

>
>>
>>>
>>>>  return LLVMBuildBitCast(ctx->builder, tmp, ctx->v2i32, "");
>>>>   }
>>>>
>>>> diff --git a/src/amd/common/ac_nir_to_llvm.c
>>>> b/src/amd/common/ac_nir_to_llvm.c
>>>> index 2460e105f7..05f28b26a2 100644
>>>> --- a/src/amd/common/ac_nir_to_llvm.c
>>>> +++ b/src/amd/common/ac_nir_to_llvm.c
>>>> @@ -4328,6 +4328,38 @@ load_patch_vertices_in(struct ac_shader_abi *abi)
>>>>  return LLVMConstInt(ctx->ac.i32,
>>>> ctx->options->key.tcs.input_vertices, false);
>>>>   }
>>>>
>>>> +static LLVMValueRef
>>>> +visit_cube_face_index(struct ac_nir_context *ctx,
>>>> + nir_intrinsic_instr *instr)
>>>> +{
>>>> +   LLVMValueRef result;
>>>> +   LLVMValueRef in[3];
>>>> +   LLVMValueRef src0 = ac_to_float(&ctx->ac, get_src(ctx,
>>>> instr->src[0]));
>>>> +   for (unsigned chan = 0; chan < 3; chan++)
>>>> +   in[chan] = ac_llvm_extract_elem(&ctx->ac, src0, chan);
>>>> +
>>>> +   result = ac_build_intrinsic(&ctx->ac,  "llvm.amdgcn.cubeid",
>>>> +   ctx->ac.f32, in, 3,
>>>> AC_FUNC_ATTR_READNONE);
>>>> +   return result;
>>>> +}
>>>> +
>>>> +static LLVMValueRef
>>>> +visit_cube_face_coord(struct ac_n

Re: [Mesa-dev] [PATCH v2 2/2] radv: implement AMD_gcn_shader extension

2018-02-21 Thread Connor Abbott
On Wed, Feb 21, 2018 at 1:00 PM,   wrote:
> From: Dave Airlie 
>
> Co-authored-by: Daniel Schürmann 
> Signed-off-by: Daniel Schürmann 
> ---
>  src/amd/common/ac_llvm_build.c|  3 +-
>  src/amd/common/ac_nir_to_llvm.c   | 39 ++
>  src/amd/vulkan/radv_extensions.py |  1 +
>  src/compiler/nir/meson.build  |  1 +
>  src/compiler/nir/nir_intrinsics.h |  4 +++
>  src/compiler/spirv/spirv_to_nir.c |  2 ++
>  src/compiler/spirv/vtn_amd.c  | 68 
> +++
>  src/compiler/spirv/vtn_private.h  |  3 ++
>  8 files changed, 120 insertions(+), 1 deletion(-)
>  create mode 100644 src/compiler/spirv/vtn_amd.c
>
> diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
> index 15144addb9..3bb74c2b0b 100644
> --- a/src/amd/common/ac_llvm_build.c
> +++ b/src/amd/common/ac_llvm_build.c
> @@ -370,7 +370,8 @@ LLVMValueRef
>  ac_build_shader_clock(struct ac_llvm_context *ctx)
>  {
> LLVMValueRef tmp = ac_build_intrinsic(ctx, "llvm.readcyclecounter",
> - ctx->i64, NULL, 0, 0);
> + ctx->i64, NULL, 0, AC_FUNC_ATTR_READONLY);
> +   ac_build_optimization_barrier(ctx, &tmp);

ac_build_optimization_barrier() creates an empty inline asm statement,
which actually doesn't do much to prevent code motion beyond the
attributes already added to llvm.readcyclecounter by llvm. It prevents
duplicating it, but that's about it, and not useful anyways. We only
use it to work around some problems with cross-wavefront intrinsics,
which don't exist here. You can just drop this hunk.

> return LLVMBuildBitCast(ctx->builder, tmp, ctx->v2i32, "");
>  }
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 2460e105f7..05f28b26a2 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -4328,6 +4328,38 @@ load_patch_vertices_in(struct ac_shader_abi *abi)
> return LLVMConstInt(ctx->ac.i32, 
> ctx->options->key.tcs.input_vertices, false);
>  }
>
> +static LLVMValueRef
> +visit_cube_face_index(struct ac_nir_context *ctx,
> + nir_intrinsic_instr *instr)
> +{
> +   LLVMValueRef result;
> +   LLVMValueRef in[3];
> +   LLVMValueRef src0 = ac_to_float(&ctx->ac, get_src(ctx,
> instr->src[0]));
> +   for (unsigned chan = 0; chan < 3; chan++)
> +   in[chan] = ac_llvm_extract_elem(&ctx->ac, src0, chan);
> +
> +   result = ac_build_intrinsic(&ctx->ac,  "llvm.amdgcn.cubeid",
> +   ctx->ac.f32, in, 3,
> AC_FUNC_ATTR_READNONE);
> +   return result;
> +}
> +
> +static LLVMValueRef
> +visit_cube_face_coord(struct ac_nir_context *ctx,
> + nir_intrinsic_instr *instr)
> +{
> +   LLVMValueRef results[2];
> +   LLVMValueRef in[3];
> +   LLVMValueRef src0 = ac_to_float(&ctx->ac, get_src(ctx,
> instr->src[0]));
> +   for (unsigned chan = 0; chan < 3; chan++)
> +   in[chan] = ac_llvm_extract_elem(&ctx->ac, src0, chan);
> +
> +   results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
> +   ctx->ac.f32, in, 3,
> AC_FUNC_ATTR_READNONE);
> +   results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
> +   ctx->ac.f32, in, 3,
> AC_FUNC_ATTR_READNONE);
> +   return ac_build_gather_values(&ctx->ac, results, 2);
> +}
> +
>  static void visit_intrinsic(struct ac_nir_context *ctx,
>  nir_intrinsic_instr *instr)
>  {
> @@ -4613,6 +4645,13 @@ static void visit_intrinsic(struct ac_nir_context *ctx,
> result = LLVMBuildSExt(ctx->ac.builder, tmp, ctx->ac.i32, "");
> break;
> }
> +   case nir_intrinsic_cube_face_index:
> +   result = visit_cube_face_index(ctx, instr);
> +   break;
> +   case nir_intrinsic_cube_face_coord:
> +   result = visit_cube_face_coord(ctx, instr);
> +   break;
> +
> default:
> fprintf(stderr, "Unknown intrinsic: ");
> nir_print_instr(&instr->instr, stderr);
> diff --git a/src/amd/vulkan/radv_extensions.py 
> b/src/amd/vulkan/radv_extensions.py
> index d761895d3a..a63e01faae 100644
> --- a/src/amd/vulkan/radv_extensions.py
> +++ b/src/amd/vulkan/radv_extensions.py
> @@ -88,6 +88,7 @@ EXTENSIONS = [
>  Extension('VK_EXT_external_memory_host',  1, 
> 'device->rad_info.has_userptr'),
>  Extension('VK_EXT_global_priority',   1, 
> 'device->rad_info.has_ctx_priority'),
>  Extension('VK_AMD_draw_indirect_count',   1, True),
> +Extension('VK_AMD_gcn_shader',1, True),
>  Extension('VK_AMD_rasterization_order',   1, 
> 'device->rad_info.chip_class >= 

Re: [Mesa-dev] [PATCH 2/2] radv: implement AMD_gcn_shader extension

2018-02-20 Thread Connor Abbott
On Tue, Feb 20, 2018 at 2:06 PM, Daniel Schürmann
 wrote:
> From: Dave Airlie 
>
> Signed-off-by: Daniel Schürmann 
> ---
>  src/amd/common/ac_nir_to_llvm.c   | 51 +++
>  src/amd/vulkan/radv_extensions.py |  1 +
>  src/compiler/nir/meson.build  |  1 +
>  src/compiler/nir/nir_intrinsics.h |  5 
>  src/compiler/spirv/spirv_to_nir.c |  2 ++
>  src/compiler/spirv/vtn_amd.c  | 63
> +++
>  src/compiler/spirv/vtn_private.h  |  3 ++
>  7 files changed, 126 insertions(+)
>  create mode 100644 src/compiler/spirv/vtn_amd.c
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c
> b/src/amd/common/ac_nir_to_llvm.c
> index 12f097e2b2..251c225676 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -4325,6 +4325,47 @@ load_patch_vertices_in(struct ac_shader_abi *abi)
> return LLVMConstInt(ctx->ac.i32,
> ctx->options->key.tcs.input_vertices, false);
>  }
>  +static LLVMValueRef
> +visit_cube_face_index(struct ac_nir_context *ctx,
> + nir_intrinsic_instr *instr)
> +{
> +   LLVMValueRef result;
> +   LLVMValueRef in[3];
> +   LLVMValueRef src0 = ac_to_float(&ctx->ac, get_src(ctx,
> instr->src[0]));
> +   for (unsigned chan = 0; chan < 3; chan++)
> +   in[chan] = ac_llvm_extract_elem(&ctx->ac, src0, chan);
> +
> +   result = ac_build_intrinsic(&ctx->ac,  "llvm.amdgcn.cubeid",
> +   ctx->ac.f32, in, 3,
> AC_FUNC_ATTR_READNONE);
> +   return result;
> +}
> +
> +static LLVMValueRef
> +visit_cube_face_coord(struct ac_nir_context *ctx,
> + nir_intrinsic_instr *instr)
> +{
> +   LLVMValueRef results[2];
> +   LLVMValueRef in[3];
> +   LLVMValueRef src0 = ac_to_float(&ctx->ac, get_src(ctx,
> instr->src[0]));
> +   for (unsigned chan = 0; chan < 3; chan++)
> +   in[chan] = ac_llvm_extract_elem(&ctx->ac, src0, chan);
> +
> +   results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
> +   ctx->ac.f32, in, 3,
> AC_FUNC_ATTR_READNONE);
> +   results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
> +   ctx->ac.f32, in, 3,
> AC_FUNC_ATTR_READNONE);
> +   return ac_build_gather_values(&ctx->ac, results, 2);
> +}
> +
> +static LLVMValueRef
> +visit_time(struct ac_nir_context *ctx,
> +nir_intrinsic_instr *instr)
> +{
> +   return ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.s.memrealtime",
> + ctx->ac.i64, NULL, 0,
> AC_FUNC_ATTR_READONLY);
> +
> +}
> +
>  static void visit_intrinsic(struct ac_nir_context *ctx,
>  nir_intrinsic_instr *instr)
>  {
> @@ -4610,6 +4651,16 @@ static void visit_intrinsic(struct ac_nir_context
> *ctx,
> result = LLVMBuildSExt(ctx->ac.builder, tmp, ctx->ac.i32,
> "");
> break;
> }
> +   case nir_intrinsic_cube_face_index:
> +   result = visit_cube_face_index(ctx, instr);
> +   break;
> +   case nir_intrinsic_cube_face_coord:
> +   result = visit_cube_face_coord(ctx, instr);
> +   break;
> +   case nir_intrinsic_time:
> +   result = visit_time(ctx, instr);
> +   break;
> +
> default:
> fprintf(stderr, "Unknown intrinsic: ");
> nir_print_instr(&instr->instr, stderr);
> diff --git a/src/amd/vulkan/radv_extensions.py
> b/src/amd/vulkan/radv_extensions.py
> index d761895d3a..a63e01faae 100644
> --- a/src/amd/vulkan/radv_extensions.py
> +++ b/src/amd/vulkan/radv_extensions.py
> @@ -88,6 +88,7 @@ EXTENSIONS = [
>  Extension('VK_EXT_external_memory_host',  1,
> 'device->rad_info.has_userptr'),
>  Extension('VK_EXT_global_priority',   1,
> 'device->rad_info.has_ctx_priority'),
>  Extension('VK_AMD_draw_indirect_count',   1, True),
> +Extension('VK_AMD_gcn_shader',1, True),
>  Extension('VK_AMD_rasterization_order',   1,
> 'device->rad_info.chip_class >= VI && device->rad_info.max_se >= 2'),
>  Extension('VK_AMD_shader_info',   1, True),
>  ]
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index 859a0c1e62..e0011a4dc0 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -189,6 +189,7 @@ files_libnir = files(
>'../spirv/spirv_info.h',
>'../spirv/spirv_to_nir.c',
>'../spirv/vtn_alu.c',
> +  '../spirv/vtn_amd.c',
>'../spirv/vtn_cfg.c',
>'../spirv/vtn_glsl450.c',
>'../spirv/vtn_private.h',
> diff --git a/src/compiler/nir/nir_intrinsics.h
> b/src/compiler/nir/nir_intrinsics.h
> index ede2927787..e3c0620ce8 100644
> --- a/src/compiler/nir/nir_intrinsics.h
> +++ b/src/compiler/nir/nir_intrinsics.h

Re: [Mesa-dev] [PATCH 5/5] ac: use correct LLVM opcodes for ordered comparisons

2018-02-14 Thread Connor Abbott
On Wed, Feb 14, 2018 at 11:53 PM, Timothy Arceri  wrote:
>
>
> On 15/02/18 04:39, Marek Olšák wrote:
>>
>> Reviewed-by: Marek Olšák 
>>
>> Marek
>>
>> On Wed, Feb 14, 2018 at 7:29 AM, Timothy Arceri 
>> wrote:
>>>
>>> Fixes glsl-1.30/execution/isinf-and-isnan* piglit tests for
>>> radeonsi and should fix SPIRV errors when LLVM optimises away
>>> the workarounds in vtn_handle_alu() for handling ordered
>>> comparisons.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104905
>>> ---
>>>   src/amd/common/ac_nir_to_llvm.c | 8 ++++----
>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/amd/common/ac_nir_to_llvm.c
>>> b/src/amd/common/ac_nir_to_llvm.c
>>> index a0c5680205..e81f86bb08 100644
>>> --- a/src/amd/common/ac_nir_to_llvm.c
>>> +++ b/src/amd/common/ac_nir_to_llvm.c
>>> @@ -1792,16 +1792,16 @@ static void visit_alu(struct ac_nir_context *ctx,
>>> const nir_alu_instr *instr)
>>>  result = emit_int_cmp(&ctx->ac, LLVMIntUGE, src[0],
>>> src[1]);
>>>  break;
>>>  case nir_op_feq:
>>> -   result = emit_float_cmp(&ctx->ac, LLVMRealUEQ, src[0],
>>> src[1]);
>>> +   result = emit_float_cmp(&ctx->ac, LLVMRealOEQ, src[0],
>>> src[1]);
>>>  break;
>>>  case nir_op_fne:
>>> -   result = emit_float_cmp(&ctx->ac, LLVMRealUNE, src[0],
>>> src[1]);
>>> +   result = emit_float_cmp(&ctx->ac, LLVMRealONE, src[0],
>>> src[1]);
>
>
> It seems we need to leave this one as is to avoid regressions. This is also
> what radeonsi does.

So, the thing you have to understand is that in LLVM unordered
comparisons are precisely the inverse of the ordered comparisons. That
is, (a <=(ordered) b) == !(a >(unordered) b), (a ==(ordered) b) == !(a
!=(unordered) b), and so on. C defines that all comparisons are
ordered except !=, so that (a == b) == !(a != b) always holds true.
Most hardware follows this convention -- offhand, x86 SSE is the only
ISA I know of with separate ordered and unordered comparisons, and
LLVM appears to have copied the distinction from them, but no one else
has both. I'm not even sure if it's in the IEEE spec. GLSL follows the
C convention, so glsl_to_nir just uses nir_op_fne to mean unordered
not-equal. spirv_to_nir generates some extra instructions, which then
get stripped away later... sigh.

I think the right way to untangle this mess is to define that the NIR
opcodes should always match the C convention. The separate ordered and
unordered opcodes are unnecessary, since one is just the logical
negation of the other, and LLVM was a little overzealous -- I'm sure
they would get rid of the distinction if they had the chance -- and
then they were blindly copied to SPIR-V. spirv_to_nir should just
negate the result if necessary rather than emitting the extra code to
handle NaN, and ac should use ordered except for not-equals.
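
As a quick sanity check of the C convention, every comparison against
NaN is false except !=, which is what keeps (a == b) == !(a != b) an
identity. A self-contained sketch:

   #include <math.h>
   #include <stdio.h>

   int main(void)
   {
      float a = NAN, b = 1.0f;

      printf("%d\n", a == b);                /* 0: ordered, NaN makes it false */
      printf("%d\n", a < b);                 /* 0: ordered, NaN makes it false */
      printf("%d\n", a != b);                /* 1: the one unordered operator  */
      printf("%d\n", (a == b) == !(a != b)); /* 1: the identity holds          */
      return 0;
   }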

>
>
>>>  break;
>>>  case nir_op_flt:
>>> -   result = emit_float_cmp(&ctx->ac, LLVMRealULT, src[0],
>>> src[1]);
>>> +   result = emit_float_cmp(&ctx->ac, LLVMRealOLT, src[0],
>>> src[1]);
>>>  break;
>>>  case nir_op_fge:
>>> -   result = emit_float_cmp(&ctx->ac, LLVMRealUGE, src[0],
>>> src[1]);
>>> +   result = emit_float_cmp(&ctx->ac, LLVMRealOGE, src[0],
>>> src[1]);
>>>  break;
>>>  case nir_op_ufeq:
>>>  result = emit_float_cmp(&ctx->ac, LLVMRealUEQ, src[0],
>>> src[1]);
>>> --
>>> 2.14.3
>>>


Re: [Mesa-dev] [PATCH] i965/nir: do int64 lowering before optimization

2018-02-04 Thread Connor Abbott
On Mon, Dec 11, 2017 at 11:01 AM, Jason Ekstrand  wrote:
> On Mon, Dec 11, 2017 at 12:55 AM, Iago Toral  wrote:
>>
>> This didn't get any reviews yet. Any takers?
>>
>> On Fri, 2017-12-01 at 13:46 +0100, Iago Toral Quiroga wrote:
>> > Otherwise loop unrolling will fail to see the actual cost of
>> > the unrolling operations when the loop body contains 64-bit integer
>> > instructions, and especially when the divmod64 lowering applies,
>> > since its lowering is quite expensive.
>> >
>> > Without this change, some in-development CTS tests for int64
>> > get stuck forever trying to register allocate a shader with
>> > over 50K SSA values. The large number of SSA values is the result
>> > of NIR first unrolling multiple seemingly simple loops that involve
>> > int64 instructions, only to then lower these instructions to produce
>> > a massive pile of code (due to the divmod64 lowering in the unrolled
>> > instructions).
>> >
>> > With this change, loop unrolling will see the loops with the int64
>> > code already lowered and will realize that it is too expensive to
>> > unroll.
>
>
> Hrm... I'm not quite sure what I think of this.  I put it after nir_optimize
> because I wanted opt_algebraic to be able to work its magic and hopefully
> remove a bunch of int64 ops before we lower them.  In particular, we have
> optimizations to remove integer division and replace it with shifts.
> However, loop unrolling does need to happen before lower_indirect_derefs so
> that lower_indirect_derefs will do as little work as possible.
>
> This is a bit of a pickle...  I don't really want to add a third
> brw_nir_optimize call.  It probably wouldn't be the end of the world but it
> does add compile time.
>
> One crazy idea which I don't think I like would be to have a quick pass that
> walks the IR and sees if there are any 64-bit SSA values.  If it finds any,
> we run brw_nir_optimize without loop unrolling, then 64-bit lowering, and
> then we go into the normal brw_nir_optimize.
>
> --Jason

Why don't we just add some sort of backend-specific code-size metric
to the loop unrolling, rather than just counting NIR instructions?
i.e. something like a num_assembly_instructions(nir_instr *) function
pointer in nir_shader_compiler_options. The root of the problem is
that different NIR instructions can turn into vastly different
numbers of assembly instructions, but we really care about the latter,
so the cutoff isn't doing its job of avoiding code-size blowup. As far
as I'm aware, this is what most other compilers (e.g. LLVM) do to
solve this problem.
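
A rough sketch of what I mean -- the struct and hook here are
illustrative, not an existing NIR interface:

   /* Hypothetical backend-supplied cost hook: */
   struct nir_shader_compiler_options_sketch {
      /* ... the existing options ... */

      /* Estimated number of backend instructions this NIR instruction
       * expands to; NULL would mean "count every instruction as 1". */
      unsigned (*instr_cost)(const nir_instr *instr);
   };

Loop unrolling would then sum instr_cost() over the loop body and
compare that against its budget instead of counting NIR instructions,
so something like a divmod64 (which lowers to a huge pile of code) gets
costed honestly.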

>
>>
>> > ---
>> >  src/intel/compiler/brw_nir.c | 8 ++++----
>> >  1 file changed, 4 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/src/intel/compiler/brw_nir.c
>> > b/src/intel/compiler/brw_nir.c
>> > index 8f3f77f89a..ef12cdfff8 100644
>> > --- a/src/intel/compiler/brw_nir.c
>> > +++ b/src/intel/compiler/brw_nir.c
>> > @@ -636,6 +636,10 @@ brw_preprocess_nir(const struct brw_compiler
>> > *compiler, nir_shader *nir)
>> >
>> > OPT(nir_split_var_copies);
>> >
>> > +   nir_lower_int64(nir, nir_lower_imul64 |
>> > +nir_lower_isign64 |
>> > +nir_lower_divmod64);
>> > +
>> > nir = brw_nir_optimize(nir, compiler, is_scalar);
>> >
>> > if (is_scalar) {
>> > @@ -663,10 +667,6 @@ brw_preprocess_nir(const struct brw_compiler
>> > *compiler, nir_shader *nir)
>> >brw_nir_no_indirect_mask(compiler, nir->info.stage);
>> > nir_lower_indirect_derefs(nir, indirect_mask);
>> >
>> > -   nir_lower_int64(nir, nir_lower_imul64 |
>> > -nir_lower_isign64 |
>> > -nir_lower_divmod64);
>> > -
>> > /* Get rid of split copies */
>> > nir = brw_nir_optimize(nir, compiler, is_scalar);
>> >


Re: [Mesa-dev] [PATCH 8/8] st/glsl_to_nir: disable io lowering and forced indirect array splitting in fs

2018-01-29 Thread Connor Abbott
When I was talking about handling I/O for radeonsi NIR with Nicolai, I
think the conclusion was that the best way forward is to make the
driver call nir_lower_io and friends, at least for inputs and outputs.
This is a perfect example of the kind of hacks that we have to do --
precisely how to handle inputs is a driver-specific thing, and piling
up special cases like this in the state tracker is just unsustainable.

As an aside, we probably should be using the load_barycentric_*
intrinsics, which would make the code for interpolateAt* much cleaner
for radeonsi, since we don't have to deal with variable derefs. The
goal should be to eliminate manual deref walking and offset
calculation from radeonsi nir entirely; it shouldn't be necessary at
all.
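
Roughly what I have in mind, using interpolateAtOffset as the example
(illustrative NIR-level pseudocode, not exact syntax):

   /* GLSL:
    *    vec4 v = interpolateAtOffset(in_color, off);
    *
    * after lowering, the interpolation position becomes an explicit
    * barycentric value and the input is read with a plain offset-based
    * load, so no variable deref survives to the backend:
    */
   vec2 bary = load_barycentric_at_offset(off);
   vec4 v    = load_interpolated_input(bary, base_offset);

The backend then only has to implement the two intrinsics instead of
walking derefs and recomputing offsets itself.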

Connor


On Sun, Jan 14, 2018 at 10:46 PM, Timothy Arceri  wrote:
> We need this to be able to support the interpolateAt builtins in a
> sane way. It also leads to the generation of more optimal code.
>
> The lowering and splitting is made conditional on glsl 400 because
> vc4 and freedreno both expect these passes to be enabled and neither
> supports glsl 400, so they don't need to deal with the interpolateAt builtins.
>
> We leave the other stages for now as to avoid regressions.
> ---
>  src/mesa/state_tracker/st_glsl_to_nir.cpp | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
> b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> index 6e3a1548f4..bc55c5b7db 100644
> --- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> @@ -461,7 +461,9 @@ st_nir_get_mesa_program(struct gl_context *ctx,
>  struct gl_linked_shader *shader)
>  {
> struct st_context *st = st_context(ctx);
> +   struct pipe_screen *screen = st->pipe->screen;
> struct gl_program *prog;
> +   unsigned glsl_version = screen->get_param(screen, 
> PIPE_CAP_GLSL_FEATURE_LEVEL);
>
> validate_ir_tree(shader->ir);
>
> @@ -491,11 +493,14 @@ st_nir_get_mesa_program(struct gl_context *ctx,
> prog->nir = nir;
>
> if (nir->info.stage != MESA_SHADER_TESS_CTRL &&
> -   nir->info.stage != MESA_SHADER_TESS_EVAL) {
> +   nir->info.stage != MESA_SHADER_TESS_EVAL &&
> +   (nir->info.stage != MESA_SHADER_FRAGMENT ||
> +(glsl_version < 400 && nir->info.stage == MESA_SHADER_FRAGMENT))) {
>NIR_PASS_V(nir, nir_lower_io_to_temporaries,
>   nir_shader_get_entrypoint(nir),
>   true, true);
> }
> +
> NIR_PASS_V(nir, nir_lower_global_vars_to_local);
> NIR_PASS_V(nir, nir_split_var_copies);
> NIR_PASS_V(nir, nir_lower_var_copies);
> @@ -665,12 +670,16 @@ st_finalize_nir(struct st_context *st, struct 
> gl_program *prog,
>  struct gl_shader_program *shader_program, nir_shader *nir)
>  {
> struct pipe_screen *screen = st->pipe->screen;
> +   unsigned glsl_version = screen->get_param(screen, 
> PIPE_CAP_GLSL_FEATURE_LEVEL);
>
> NIR_PASS_V(nir, nir_split_var_copies);
> NIR_PASS_V(nir, nir_lower_var_copies);
> if (nir->info.stage != MESA_SHADER_TESS_CTRL &&
> -   nir->info.stage != MESA_SHADER_TESS_EVAL)
> +   nir->info.stage != MESA_SHADER_TESS_EVAL &&
> +   (nir->info.stage != MESA_SHADER_FRAGMENT ||
> +(glsl_version < 400 && nir->info.stage == MESA_SHADER_FRAGMENT))) {
>NIR_PASS_V(nir, nir_lower_io_arrays_to_elements_no_indirects);
> +   }
>
> if (nir->info.stage == MESA_SHADER_VERTEX) {
>/* Needs special handling so drvloc matches the vbo state: */
> --
> 2.14.3
>


Re: [Mesa-dev] [RFC PATCH 1/2] r600/sb: Set flags for GROUP_BARRIER instruction and force it into slot X

2018-01-10 Thread Connor Abbott
On Wed, Jan 10, 2018 at 3:27 PM, Ilia Mirkin  wrote:
> On Wed, Jan 10, 2018 at 3:13 PM, Gert Wollny  wrote:
>> Am Mittwoch, den 10.01.2018, 16:36 +0100 schrieb Gert Wollny:
>>> This seems to satisfy the sb optimizer, i.e. no regressions in the
>>> piglits compared to disabling sb for tessellation shaders with
>>> barriers but enabling them in general.
>>> ---
>>
>> Actually, it seems this is not enough, at least for Tomb Raider which
>> uses one tessellation control shader with a barrier. The optimizer
>> reorders the LDS instructions around the barrier in a way that in the
>> optimized version there are more reads before it than in the original
>> byte code.
>>
>> The number of writes is the same though, and as far as I can tell from
>> the TGSI, the values written to LDS before the barrier are not read
>> back within the shader - which makes me wonder whether the barrier is
>> actually necessary.
>
> If your hardware executes all the vertices in parallel, then a barrier
> should be unnecessary.

While this is true, you also need to be careful that reads after the
barrier don't get reordered wrt any writes before the barrier --
nouveau might be more conservative, but sb might be more aggressive
here.
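
Concretely, the pattern to worry about looks something like this
(TCS-style sketch, where lds[] stands for the shared per-patch storage):

   lds[invocation] = compute(invocation);   /* write before the barrier */
   barrier();
   float y = lds[invocation ^ 1];           /* read after the barrier   */

Hoisting the read above the barrier leaves the write count around the
barrier unchanged, but the read can now observe the neighbouring
invocation's stale value -- which would match the symptoms Gert
describes.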

>
>   -ilia


[Mesa-dev] [PATCH] nv50/ir/ra: Fix copying compound for moves

2018-01-08 Thread Connor Abbott
In order to reduce moves when coalescing multiple registers into a
larger register, RA will try to coalesce MERGE instructions with their
definitions. For example, for something like this in GLSL:

uint a = ...;
uint b = ...;
uint64 x = packUint2x32(a, b);

The compiler will try to coalesce x with a and b, in the same way as
something like:

uint a = ...;
uint b = ...;
...
uint x = phi(a, b);

with the crucial difference that the definitions of a and b only clobber
part of the register, instead of the whole thing. This information is
carried through the compound flag and compMask bitmask. If compound is
set, then the value has been coalesced in such a way that not all the
defs clobber the entire register. The compMask bitmask describes which
subregister each def clobbers, although it does it in a slightly
convoluted way. It's an invariant that once compound is set on one def,
it must be set for all the defs in a given coalesced value.

In more detail, the constraints pass will first create extra moves:

uint a = ...;
uint b = ...;
uint a' = a;
uint b' = b;
uint64 x = packUint2x32(a', b');

and then RA will merge values involved in MERGE/SPLIT instructions,
merging x with a' and b' and making the combined value compound -- this
is relatively simple, and will always succeed since we just created a'
and b', so they never interfere with x, and x has no other definitions,
since we haven't started coalescing moves yet. Basically, we just replaced
the MERGE instruction with an equivalent sequence of partial writes to the
destination. The tricky part comes when we try to merge a' with a
and b' with b. We need to transfer the compound information from a' to a
and b' to b, which copyCompound() does, but we also need to transfer it
to any defs coalesced with a and b, which the code failed to do. Similarly,
if x is the argument to a phi instruction, then when we try to merge it
with other arguments to the same phi by coalescing moves, we'd have
problems guaranteeing that all the other merged defs stay up-to-date.

One tricky part of fixing this is that in order to properly propagate
the information from a' to a, we need to do it before the defs for a and
a' are merged in coalesceValues(), since we need to know which defs are
merged with a but not a' -- after coalesceValues() returns, all the defs
have been combined, so we don't know which is which. I took the approach
of calling copyCompound() inside coalesceValues(), instead of
afterwards.

Cc: Ilia Mirkin 
Cc: Karol Herbst 
---
So, I guess curiosity got the best of me :). Of course, I have no actual
way to test if this fixes the problem, but hopefully this at least helps
someone get further...

 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 56 ++
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index b33d7b4010..2664c0678f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -890,6 +890,34 @@ GCRA::RIG_Node::init(const RegisterSet& regs, LValue *lval)
livei.insert(lval->livei);
 }
 
+// Used when coalescing moves. The non-compound value will become one, e.g.:
+// mov b32 $r0 $r2 / merge b64 $r0d { $r0 $r1 }
+// split b64 { $r0 $r1 } $r0d / mov b64 $r0d f64 $r2d
+static inline void copyCompound(Value *dst, Value *src)
+{
+   LValue *ldst = dst->asLValue();
+   LValue *lsrc = src->asLValue();
+
+   if (ldst->compound && !lsrc->compound) {
+  LValue *swap = lsrc;
+  lsrc = ldst;
+  ldst = swap;
+   }
+
+   assert(!ldst->compound);
+
+   if (lsrc->compound) {
+  Value *dstRep = ldst->join;
+  for (Value::DefIterator d = dstRep->defs.begin(); d != 
dstRep->defs.end();
+   ++d) {
+ LValue *ldst = (*d)->get()->asLValue();
+ assert(!ldst->compound);
+ ldst->compound = 1;
+ ldst->compMask = lsrc->compMask;
+  }
+   }
+}
+
 bool
 GCRA::coalesceValues(Value *dst, Value *src, bool force)
 {
@@ -932,9 +960,16 @@ GCRA::coalesceValues(Value *dst, Value *src, bool force)
if (!force && nRep->livei.overlaps(nVal->livei))
   return false;
 
+   // TODO: Handle this case properly.
+   if (!force && rep->compound && val->compound)
+  return false;
+
INFO_DBG(prog->dbgFlags, REG_ALLOC, "joining %%%i($%i) <- %%%i\n",
 rep->id, rep->reg.data.id, val->id);
 
+   if (!force)
+  copyCompound(dst, src);
+
// set join pointer of all values joined with val
for (Value::DefIterator def = val->defs.begin(); def != val->defs.end();
 ++def)
@@ -997,24 +1032,6 @@ static inline uint8_t makeCompMask(int compSize, int 
base, int size)
}
 }
 
-// Used when coalescing moves. The non-compound value will become one, e.g.:
-// mov b32 $r0 $r2/ merge b64 $r0d { $r0 $r1 }
-// split b64 { $r0 $r1 } $r0d / mov b64 
