On Thu, Aug 23, 2018 at 05:48:45PM +0100, Ard Biesheuvel wrote:
> Replace the literal load of the addend vector with a sequence that
> performs each add individually. This sequence is only 2 instructions
> longer than the original, and 2% faster on Cortex-A53.
>
> This is an improvement by itself,
On 23 August 2018 at 21:04, Nick Desaulniers wrote:
> On Thu, Aug 23, 2018 at 9:48 AM Ard Biesheuvel
> wrote:
>>
>> Replace the literal load of the addend vector with a sequence that
>> performs each add individually. This sequence is only 2 instructions
>> longer than the original, and 2% faster
On Thu, Aug 23, 2018 at 9:48 AM Ard Biesheuvel
wrote:
>
> Replace the literal load of the addend vector with a sequence that
> performs each add individually. This sequence is only 2 instructions
> longer than the original, and 2% faster on Cortex-A53.
>
> This is an improvement by itself, but als
Replace the literal load of the addend vector with a sequence that
performs each add individually. This sequence is only 2 instructions
longer than the original, and 2% faster on Cortex-A53.
This is an improvement by itself, but also works around a Clang issue,
whose integrated assembler does not