Re: [Mesa-dev] [PATCH 0/9] radeonsi: ARB_query_buffer_object implementation

2016-09-17 Thread Nicolai Hähnle

On 16.09.2016 19:11, Ian Romanick wrote:

> On 09/16/2016 06:57 AM, Nicolai Hähnle wrote:
>> Hi all,
>>
>> as the title says. The implementation uses a compute shader to summarize
>> data from the query buffers. As long as only one query buffer is in flight
>> (the normal case), that compute shader is launched exactly once, on a
>> single thread. If multiple buffers were required, then one compute grid is
>> launched for each of these buffers, in sequence.
>>
>> All of this could be done in much fancier ways using bindless buffers and
>> wave-wide computations, but really, the expectation is that most queries
>> will be rather simple (though occlusion queries always contain at least 8
>> result pairs, so it's not like it would be completely pointless).
>>
>> This code also exposes the hilarious lowering of 64-bit integer divides
>> in LLVM, since timestamp queries use it. This lowering generates more than
>> 2KB of code for a single division, which is excessive even when the division
>> *isn't* by a constant. The right place to fix this is in LLVM, and I'm
>> already looking into it. For normal queries this is completely irrelevant
>> because the code will just be skipped.
>
> Is the division by a constant?  If it is, you might want to use
> something like what libdivide would generate.


Yes it is. I'd rather fix this in LLVM, though. LLVM has the required 
infrastructure already, it just doesn't use it in this case out of 
silliness.
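For readers unfamiliar with the libdivide-style approach mentioned above, here is a minimal sketch of dividing a 64-bit unsigned value by a constant with a multiply-high and a shift instead of a full division. The divisor 10 is purely illustrative (the thread does not state the divisor used for timestamp queries), and the helper names mul_hi_u64 and div10_u64 are hypothetical, not taken from Mesa, LLVM, or libdivide.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b, built from 32-bit halves
 * so that no 128-bit integer type is required. */
static uint64_t mul_hi_u64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Carry out of the middle 32-bit column into the high word. */
    uint64_t carry = ((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32;

    return p3 + (p1 >> 32) + (p2 >> 32) + carry;
}

/* n / 10 for any 64-bit unsigned n, without a divide instruction.
 * 0xCCCCCCCCCCCCCCCD is ceil(2^67 / 10); the remaining shift by 3
 * completes the division by 2^67. The divisor 10 is an assumption
 * chosen only for illustration. */
static uint64_t div10_u64(uint64_t n)
{
    return mul_hi_u64(n, 0xCCCCCCCCCCCCCCCDULL) >> 3;
}
```

A compiler that applies this transformation emits a handful of multiplies, adds and shifts, which is why a multi-kilobyte expansion for a constant divisor stands out.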


Nicolai




>> Please review!
>> Thanks
>> Nicolai

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev





Re: [Mesa-dev] [PATCH 0/9] radeonsi: ARB_query_buffer_object implementation

2016-09-16 Thread Ian Romanick
On 09/16/2016 06:57 AM, Nicolai Hähnle wrote:
> Hi all,
> 
> as the title says. The implementation uses a compute shader to summarize
> data from the query buffers. As long as only one query buffer is in flight
> (the normal case), that compute shader is launched exactly once, on a
> single thread. If multiple buffers were required, then one compute grid is
> launched for each of these buffers, in sequence.
> 
> All of this could be done in much fancier ways using bindless buffers and
> wave-wide computations, but really, the expectation is that most queries
> will be rather simple (though occlusion queries always contain at least 8
> result pairs, so it's not like it would be completely pointless).
> 
> This code also exposes the hilarious lowering of 64-bit integer divides
> in LLVM, since timestamp queries use it. This lowering generates more than
> 2KB of code for a single division, which is excessive even when the division
> *isn't* by a constant. The right place to fix this is in LLVM, and I'm
> already looking into it. For normal queries this is completely irrelevant
> because the code will just be skipped.

Is the division by a constant?  If it is, you might want to use
something like what libdivide would generate.

> Please review!
> Thanks
> Nicolai
> 



[Mesa-dev] [PATCH 0/9] radeonsi: ARB_query_buffer_object implementation

2016-09-16 Thread Nicolai Hähnle
Hi all,

as the title says. The implementation uses a compute shader to summarize
data from the query buffers. As long as only one query buffer is in flight
(the normal case), that compute shader is launched exactly once, on a
single thread. If multiple buffers were required, then one compute grid is
launched for each of these buffers, in sequence.

All of this could be done in much fancier ways using bindless buffers and
wave-wide computations, but really, the expectation is that most queries
will be rather simple (though occlusion queries always contain at least 8
result pairs, so it's not like it would be completely pointless).
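As a rough illustration of what "summarizing" a query buffer on a single thread might look like, here is a CPU-side sketch in C. It assumes each result pair holds a begin and an end counter value and that the summary is the sum of the differences. That layout, the struct result_pair type and the summarize_pairs function are assumptions made for this example; they are not taken from the patch series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one (begin, end) counter pair per slot. */
struct result_pair {
    uint64_t begin;
    uint64_t end;
};

/* Walk all pairs sequentially, as a single thread would, and
 * accumulate end - begin into one total. */
static uint64_t summarize_pairs(const struct result_pair *pairs, size_t count)
{
    uint64_t total = 0;

    for (size_t i = 0; i < count; i++)
        total += pairs[i].end - pairs[i].begin;

    return total;
}
```

With only around 8 pairs per occlusion query, a sequential loop like this is cheap, which fits the point above that a wave-wide reduction would help only a little.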

This code also exposes the hilarious lowering of 64-bit integer divides
in LLVM, since timestamp queries use it. This lowering generates more than
2KB of code for a single division, which is excessive even when the division
*isn't* by a constant. The right place to fix this is in LLVM, and I'm
already looking into it. For normal queries this is completely irrelevant
because the code will just be skipped.

Please review!
Thanks
Nicolai
