Re: [Mesa-dev] [PATCH v3 00/43] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

2017-11-02 Thread Chema Casanova
On 02/11/17 at 01:43, Jason Ekstrand wrote:
> I'm done reading for the day.  As you're working on incorporating
> feedback, I'd  like you to re-arrange things a bit so that we do
> everything required to enable VK_KHR_16bit_storage (including
> advertising the Vulkan extension string) for SSBOs and UBOs first and
> then enable it for push constants and enable it for inputs/outputs
> last.  This way we can land the most important part (UBOs and SSBOs)
> soon and the more annoying parts can get the review time that they need.

I think that is a good approach. I'll reorder the series so we can land
and enable the UBO/SSBO support without the other capabilities.

Chema


Re: [Mesa-dev] [PATCH v3 00/43] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

2017-11-01 Thread Jason Ekstrand
I'm done reading for the day.  As you're working on incorporating feedback,
I'd  like you to re-arrange things a bit so that we do everything required
to enable VK_KHR_16bit_storage (including advertising the Vulkan extension
string) for SSBOs and UBOs first and then enable it for push constants and
enable it for inputs/outputs last.  This way we can land the most important
part (UBOs and SSBOs) soon and the more annoying parts can get the review
time that they need.

Re: [Mesa-dev] [PATCH v3 00/43] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

2017-10-30 Thread Jason Ekstrand
Patches 1-5, 8-11, and 13-18 are

Reviewed-by: Jason Ekstrand 


Re: [Mesa-dev] [PATCH v3 00/43] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

2017-10-16 Thread Pohjolainen, Topi
On Mon, Oct 16, 2017 at 08:03:41AM -0700, Jason Ekstrand wrote:
> FYI: I'm planning to review this some time this week.  Probably not today
> though.

Great, I was hoping you would. I'm just reading out of curiosity and asking
random questions. Mostly trying to remind myself how the compiler works :) It
has been a while since I had anything to do with it.


Re: [Mesa-dev] [PATCH v3 00/43] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

2017-10-16 Thread Jason Ekstrand
FYI: I'm planning to review this some time this week.  Probably not today
though.


[Mesa-dev] [PATCH v3 00/43] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

2017-10-12 Thread Jose Maria Casanova Crespo
Hello,

This is the V3 series implementing the
SPV_KHR_16bit_storage and VK_KHR_16bit_storage extensions on the anv
Vulkan driver, in addition to the GLSL and NIR support needed.

The original series can be found here [1], and the V2 is available
here [2].

In short, V3 includes the following:

 * Updates to several patches after the review of the V2 series.
   This includes some squashes, and especially changes so that 16-bit
   types are always packed, not using stride 2 by default.
   This implied a re-implementation of all load_input/store_output
   intrinsics for 16-bit. The new solution shuffles and unshuffles
   16-bit components in 32-bit URB write and read operations. This
   saves space in the URB writes and reduces register pressure
   by using only half of the space.

 * 5 patches have been removed from the V2 series because we no longer
   assume stride 2 for 16-bit registers. We also removed the
   reuse_16bit_conversions_register patch. The spilling problems
   that motivated that patch were better addressed by
   Curro's liveness patch.

   i965/fs: Set stride 2 when dealing with 16-bit floats/ints
   i965/fs: Retype 16-bit/stride2 movs to UD on nir_op_vecX
   i965/fs: Need to allocate as minimum 32-bit register
   i965/fs: Update assertion on copy propagation
   i965/fs: Add reuse_16bit_conversions_register optimization
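
The shuffle mentioned in the first item can be pictured as packing pairs
of 16-bit components into 32-bit dwords before a URB write and splitting
them again after a read. The following is a minimal sketch in plain C, not
the i965 backend code. The helper names are hypothetical, and placing the
even component in the low half of each dword is an assumption made for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack n 16-bit components into (n + 1) / 2 dwords.
 * Component 2*i lands in the low half of dword i and component 2*i+1 in
 * the high half (assumed layout). Returns the number of dwords written. */
static size_t
shuffle_16bit_to_32bit(const uint16_t *src, size_t n, uint32_t *dst)
{
   size_t dwords = (n + 1) / 2;
   for (size_t i = 0; i < dwords; i++) {
      uint32_t lo = src[2 * i];
      /* An odd component count leaves the last high half zeroed. */
      uint32_t hi = (2 * i + 1 < n) ? src[2 * i + 1] : 0;
      dst[i] = lo | (hi << 16);
   }
   return dwords;
}

/* Hypothetical inverse: extract n 16-bit components from packed dwords. */
static void
unshuffle_32bit_to_16bit(const uint32_t *src, size_t n, uint16_t *dst)
{
   for (size_t i = 0; i < n; i++) {
      uint32_t dword = src[i / 2];
      dst[i] = (uint16_t)((i & 1) ? (dword >> 16) : (dword & 0xffff));
   }
}
```

With this layout a three-component 16-bit vector occupies two dwords
instead of three, which is where the space saving comes from.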

Finally an updated overview of the patches:

Patches 1-2 add 16-bit float, int and uint types to GLSL. This is
needed because NIR uses GLSL types internally. We use the enums
already defined by the AMD_gpu_shader_half_float and NV_gpu_shader
extensions. Patch 4 updates mesa/st to avoid warnings for
types not handled in a switch.

Patches 3-6 add NIR support for those new GLSL 16-bit types,
conversion opcodes, and rounding modes for float to half-float
conversions.
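
To illustrate why the rounding mode matters for float to half-float
conversions, here is a small software sketch that converts a 32-bit float
to IEEE 754 binary16 with either round-to-nearest-even (RTE) or
round-toward-zero (RTZ). It is not the NIR or hardware implementation;
the function and enum names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum rounding { ROUND_RTE, ROUND_RTZ };   /* hypothetical names */

/* Hypothetical helper: convert a float to binary16 bits. */
static uint16_t
float_to_half(float f, enum rounding mode)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof(bits));

   uint32_t sign = (bits >> 16) & 0x8000;
   uint32_t exp = (bits >> 23) & 0xff;
   uint32_t man = bits & 0x7fffff;

   if (exp == 0xff)                        /* Inf or NaN */
      return (uint16_t)(sign | 0x7c00 | (man ? 0x200 : 0));

   int32_t e = (int32_t)exp - 127 + 15;    /* rebias the exponent */
   if (e >= 31)                            /* too large for binary16 */
      return (uint16_t)(sign | (mode == ROUND_RTE ? 0x7c00 : 0x7bff));
   if (e < -10)                            /* below half the smallest subnormal */
      return (uint16_t)sign;

   uint32_t sig, shift, result;
   if (e <= 0) {                           /* result is subnormal */
      sig = man | 0x800000;                /* make the implicit 1 explicit */
      shift = (uint32_t)(14 - e);
      result = sig >> shift;
   } else {                                /* result is normal */
      sig = man;
      shift = 13;
      result = ((uint32_t)e << 10) | (sig >> shift);
   }

   /* RTZ simply drops the discarded bits; RTE inspects them. */
   uint32_t rem = sig & ((1u << shift) - 1);
   uint32_t halfway = 1u << (shift - 1);
   if (mode == ROUND_RTE &&
       (rem > halfway || (rem == halfway && (result & 1))))
      result++;                            /* a carry into the exponent is fine */

   return (uint16_t)(sign | result);
}
```

For example, 65520.0f lies exactly halfway between the largest finite
half (65504) and the next power of two, so RTE produces infinity (0x7c00)
while RTZ produces 0x7bff.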

Patches 7-9 add the SPIR-V (SPV_KHR_16bit_storage) to NIR support.

Patches 10-13 add general 16-bit support for i965. This includes
handling the new types in several general-purpose methods and
updating or removing some asserts.

Patches 14-18 add support for 32-bit to 16-bit conversions for i965,
including rounding mode opcodes (needed for float to half-float
conversions), and an optimization that removes superfluous rounding
mode sets.
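
The idea behind removing superfluous rounding mode sets can be sketched
as a single walk over a straight-line instruction list that drops any
"set rounding mode" instruction selecting the mode that is already
active. This is only an illustration under simplifying assumptions (one
basic block, a made-up instruction struct); the opcode, type and function
names are hypothetical and do not reflect the i965 IR.

```c
#include <stddef.h>

/* Hypothetical toy IR, for illustration only. */
enum opcode { OP_SET_RND_MODE, OP_F2F16, OP_OTHER };
enum rnd_mode { RND_UNSPECIFIED, RND_RTE, RND_RTZ };

struct inst {
   enum opcode op;
   enum rnd_mode mode;   /* only meaningful for OP_SET_RND_MODE */
};

/* Compact the list in place, dropping rounding mode sets that do not
 * change the current mode. Returns the new instruction count. */
static size_t
drop_redundant_rnd_mode_sets(struct inst *insts, size_t count)
{
   enum rnd_mode current = RND_UNSPECIFIED;
   size_t kept = 0;

   for (size_t i = 0; i < count; i++) {
      if (insts[i].op == OP_SET_RND_MODE) {
         if (insts[i].mode == current)
            continue;               /* mode already active: skip it */
         current = insts[i].mode;
      }
      insts[kept++] = insts[i];
   }
   return kept;
}
```

A real pass would also have to reset its notion of the current mode at
block boundaries, which this sketch leaves out.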

Patch 19 adds 16-bit support for constant location.

Patches 20-24 add and use two new messages: byte scattered read and
write. These were needed because the untyped surface message has a fixed
32-bit write size. The new messages are used for the 16-bit support of
store SSBO, load SSBO, load UBO and load shared.
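
The problem with a fixed 32-bit write size can be shown on a plain byte
buffer: storing one 16-bit value with a dword-sized write also overwrites
the neighbouring 16-bit element, while a byte-granular write touches only
the two bytes it should. This is a CPU-side sketch with hypothetical
helper names, not a model of the actual hardware messages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the low `size` bytes of value (1, 2 or 4),
 * least significant byte first. */
static void
store_bytes(uint8_t *buf, size_t offset, uint32_t value, size_t size)
{
   for (size_t i = 0; i < size; i++)
      buf[offset + i] = (uint8_t)(value >> (8 * i));
}

/* Hypothetical helper: a store that always writes a full dword. */
static void
store_dword(uint8_t *buf, size_t offset, uint32_t value)
{
   store_bytes(buf, offset, value, 4);
}
```

With a buffer holding two 16-bit elements, `store_dword(buf, 0, 0x1234)`
zeroes the second element, whereas `store_bytes(buf, 0, 0x1234, 2)`
leaves it intact.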

Patches 25-29 implement 16-bit vertex attribute input support on
i965. These include changes to anv. This was needed because 16-bit
surface formats do an implicit conversion to 32-bit. To work around this,
we override the 16-bit surface formats and use 32-bit ones instead.

Patch 30 implements load input and load store for all intra stage.
This patch replaces the previous simple patch "i965/fs: Set stride 2
when dealing with 16-bit floats/ints".

Patches 31-37 implement 16-bit store output support for fragment
shaders on i965.

Patches 38-41 are the new patches included in V2. Three of them are
improvements over V1 that don't fix any execution problem, but they
improve performance by reducing the use of multiple scattered messages
for untyped read/write operations. The 16-bit CTS tests pass without them.
The other one (patch 41) would fix a real problem, but unfortunately
no CTS test catches it yet.

Patches 42-43 enable both extensions on the anv Vulkan driver.

[1] https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html
[2] https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html

Alejandro Piñeiro (14):
  i965/vec4: Handle 16-bit types at type_size_xvec4
  i965/fs: Add brw_reg_type_from_bit_size utility method
  i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type
  i965/fs: Handle 32-bit to 16-bit conversions
  i965/fs: Define new shader opcode to set rounding modes
  i965/fs: Enable rounding mode on f2f16 ops
  i965/fs: Add remove_extra_rounding_modes optimization
  i965/fs: Adjust type_size/type_slots on store_ssbo
  i965/fs: Use byte_scattered_write on 16-bit store_ssbo
  anv/pipeline: Use 32-bit surface formats for 16-bit formats
  anv/cmd_buffer: Add a padding to the vertex buffer
  i965/fs: Use half_precision data_format on 16-bit fb writes
  i965/fs: Predicate byte scattered writes if needed
  anv: Enable VK_KHR_16bit_storage

Eduardo Lima Mitev (8):
  glsl: Add 16-bit types
  mesa/st: Handle 16-bit types at st_glsl_storage_type_size()
  nir: Add support for 16-bit types (half float, int16 and uint16)
  nir: Populate conversion opcodes to/from 16-bit types
  spirv/nir: Handle 16-bit types
  spirv/nir: Add support for SPV_KHR_16bit_storage
  i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg
  anv: Enable SPV_KHR_16bit_storage on gen 8+

Jose Maria Casanova Crespo (21):
  nir: Add rounding modes enum
  nir: Handle fp16