[PING] Re: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0

2022-11-15 Thread Daniel Engel
Hello, 

Is there still any interest in merging this patch? 

Thanks,
Daniel



[PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0

2022-10-31 Thread Daniel Engel
Hi Richard,

I am re-submitting my libgcc patch from 2021:

https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html

I believe I have finally made the stage1 window. 

Regards,
Daniel

---

Changes since v6:

* Rebased and tested with gcc-13

There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
Clean master:

# of expected passes            529397
# of unexpected failures        41160
# of unexpected successes       12
# of expected failures          3442
# of unresolved testcases       978
# of unsupported tests          28993

Patched master:

# of expected passes            529397
# of unexpected failures        41160
# of unexpected successes       12
# of expected failures          3442
# of unresolved testcases       978
# of unsupported tests          28993

---

This patch series adds an assembly-language implementation of IEEE-754-compliant
single-precision functions designed for the Cortex M0 (v6m) architecture.  There
are improvements to most of the EABI integer functions as well.  This is the
libgcc component of a larger library project originally proposed in 2018:

https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html

As one point of comparison, a test program [1] links 916 bytes from libgcc with
the patched toolchain vs 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
toolchain.  That's roughly a 91% size reduction.

I have extensive test vectors [2], and this patch passes all tests on an
STM32F051.  These vectors were derived from UCB [3], Testfloat [4], and
IEEECC754 [5], plus many of my own generation.
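
To illustrate what a bit-exact vector check can look like, here is a minimal
sketch in C.  It is not taken from the test suite in [2]: the struct layout,
the function names, and the four sample vectors are all hypothetical.  On a
soft-float Cortex M0 build, the float addition would typically compile to a
call to __aeabi_fadd(); on other hosts it exercises whatever the toolchain
provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector layout (not the format used in [2]): two operands
   and the expected sum, all as raw IEEE-754 binary32 bit patterns. */
struct addf_vector {
    uint32_t a, b, expected;
};

/* Reinterpret bits without violating strict aliasing. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Returns the number of vectors whose computed sum differs from the
   expected bit pattern. */
size_t check_addf_vectors(const struct addf_vector *v, size_t count)
{
    size_t failures = 0;
    for (size_t i = 0; i < count; i++) {
        /* volatile discourages compile-time folding, so the addition is
           performed by the target's run-time support. */
        volatile float a = bits_to_float(v[i].a);
        volatile float b = bits_to_float(v[i].b);
        float sum = a + b;
        if (float_to_bits(sum) != v[i].expected)
            failures++;
    }
    return failures;
}

/* Illustrative samples only, assuming the default round-to-nearest mode. */
const struct addf_vector sample_vectors[] = {
    { 0x3f800000u, 0x3f800000u, 0x40000000u },  /* 1.0 + 1.0   = 2.0  */
    { 0x3f000000u, 0x3e800000u, 0x3f400000u },  /* 0.5 + 0.25  = 0.75 */
    { 0x7f800000u, 0x3f800000u, 0x7f800000u },  /* +inf + 1.0  = +inf */
    { 0x80000000u, 0x00000000u, 0x00000000u },  /* -0.0 + +0.0 = +0.0 */
};
```

Comparing bit patterns rather than float values distinguishes +0.0 from -0.0.
NaN results would need separate handling, since the payload bits of a NaN
result are not fully specified.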

There may be some follow-on projects worth discussing:

* The library is currently integrated into the ARM v6s-m multilib only.  It
is likely that some other architectures would benefit from these routines.
However, I have NOT profiled the existing implementations (ieee754-sf.S) to
estimate where improvements may be found.

* GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
There may be useful bits in [1] that can be integrated.
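
As a rough idea of what a test for the 64-bit divide helpers might check, the
sketch below verifies the division identity n == q*d + r with r < d.  It is
only an assumption about a possible test shape, not code from [1]; the
function names and the case list are hypothetical.  On ARM EABI targets
without a hardware 64-bit divide, the / and % operators on uint64_t are
typically lowered to __aeabi_uldivmod().

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the toolchain's 64-bit unsigned quotient and remainder
   satisfy the division identity for (n, d), otherwise 0. */
int check_udivmod64(uint64_t n, uint64_t d)
{
    if (d == 0)
        return 0;               /* division by zero is out of scope here */

    /* volatile discourages the compiler from folding constant operands. */
    volatile uint64_t vn = n, vd = d;
    uint64_t q = vn / vd;
    uint64_t r = vn % vd;

    /* Identity: n == q * d + r, with 0 <= r < d. */
    return (r < d) && (q * d + r == n);
}

/* Runs a few hand-picked edge cases (hypothetical selection) and
   returns the number of failures. */
int run_udivmod64_checks(void)
{
    static const struct { uint64_t n, d; } cases[] = {
        { 0, 1 },
        { 1, 1 },
        { 100, 7 },
        { 1, UINT64_MAX },
        { UINT64_MAX, 1 },
        { UINT64_MAX, UINT64_MAX },
        { UINT64_MAX, 0x100000000ULL },   /* divisor just above 32 bits */
        { 0x100000000ULL, 3 },            /* dividend just above 32 bits */
    };
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++)
        if (!check_udivmod64(cases[i].n, cases[i].d))
            failures++;
    return failures;
}
```

This is a consistency check rather than a full test: it does not compare
against an independent reference divider, and a signed variant for
__aeabi_ldivmod() would additionally need to check the sign of the remainder.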

On Cortex M0, the library has (approximately) the following properties:

Function(s)                     Size (bytes)        Cycles              Stack   Accuracy
__clzsi2                        50                  20                  0       exact
__clzsi2 (OPTIMIZE_SIZE)        22                  51                  0       exact
__clzdi2                        8+__clzsi2          4+__clzsi2          0       exact

__clrsbsi2                      8+__clzsi2          6+__clzsi2          0       exact
__clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2    0       exact

__ctzsi2                        52                  21                  0       exact
__ctzsi2 (OPTIMIZE_SIZE)        24                  52                  0       exact
__ctzdi2                        8+__ctzsi2          5+__ctzsi2          0       exact

__ffssi2                        8                   6..(5+__ctzsi2)     0       exact
__ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)     0       exact

__popcountsi2                   52                  25                  0       exact
__popcountsi2 (OPTIMIZE_SIZE)   14                  9..201              0       exact
__popcountdi2                   34+__popcountsi2    46                  0       exact
__popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401             0       exact

__paritysi2                     24                  14                  0       exact
__paritysi2 (OPTIMIZE_SIZE)     16                  38                  0       exact
__paritydi2                     2+__paritysi2       1+__paritysi2       0       exact

__umulsidi3                     44                  24                  0       exact
__mulsidi3                      30+__umulsidi3      24+__umulsidi3      8       exact
__muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3       0       exact
__ashldi3 (__aeabi_llsl)        22                  13                  0       exact
__lshrdi3 (__aeabi_llsr)        22                  13                  0       exact
__ashrdi3 (__aeabi_lasr)        22                  13                  0       exact

__aeabi_lcmp                    20                  13                  0       exact
__aeabi_ulcmp                   16                  10                  0       exact

__udivsi3 (__aeabi_uidiv)       56                  72..385             0       < 1 lsb
__divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3        8       < 1 lsb
__udivdi3 (__aeabi_uldiv)       164                 103..1394           16      < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)       142                 120..1392           16      < 1 lsb
__divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3        32      < 1 lsb

__shared_float