On Monday 16 February 2015 08:34 PM, Bernhard Reutner-Fischer wrote:
>> While at it I also did some arch specific adjustment in sigaction path
>> - inlining the rt_sigaction syscall stub detour to reduce branch return
>> stack mispredicts etc - which is what 6/8 does !
> This sounds suspicious.
> IIRC we already had that argument, last time around _dl_do_reloc and 
> _dl_do_lazy_reloc.
> Could it be that your port has a bug here ( missed optimisation ) around 
> ifunc handling? Sounds like back then on ARM https://gcc.gnu.org/PR40887#c6
> 
> What am I missing?


I don't think my use case is close to the ARM issue you pointed to above, as
there is no ifunc or function pointer involved.

With the original code, we get 2 function calls on ARC:

0000b504 <__libc_sigaction>:
    b504:       push_s     blink
    b506:       sub_s      sp,sp,12
    b508:       bl.d       36b20 <__st_r13_to_r15>
...

    b540:       bl.d       b750 <__syscall_rt_sigaction>   <--- DIRECT CALL
    b544:       mov_s      r3,8
    b546:       add_s      sp,sp,20
    b548:       mov_s      r12,12
    b54a:       b          36b88 <__ld_r13_to_r15_ret>
    b54e:       nop_s

0000b750 <__syscall_rt_sigaction>:
    b750:       mov        r8,134
    b754:       swi                                <---- SYSCALL TRAP INTO KERNEL
    b758:       cmp        r0,0xfffffc00
    b75c:       bls_s      b76a
    b75e:       st.a       blink,[sp,-4]
    b762:       bl         b550 <__syscall_error>
    b766:       ld.ab      blink,[sp,4]
    b76a:       j_s        [blink]
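
For anyone not fluent in ARC assembly, the tail of the stub above
(cmp r0,0xfffffc00 followed by bls_s) is the check on the raw syscall return
value: anything unsigned-greater than 0xfffffc00 is treated as a negated errno
and handed to __syscall_error. A minimal C sketch of that check follows. The
helper name decode_syscall_ret is made up for illustration, and "set errno,
return -1" is an assumption about what __syscall_error does, not a copy of the
uClibc source.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: decode a raw 32-bit syscall return value the way
 * the tail of the stub appears to. */
static int32_t decode_syscall_ret(uint32_t raw)
{
    /* cmp r0,0xfffffc00 ; bls_s: values <= 0xfffffc00 are success and
     * are returned unchanged. */
    if (raw <= 0xfffffc00u)
        return (int32_t)raw;

    /* Otherwise raw is assumed to hold -errno (the __syscall_error path). */
    errno = (int)(0u - raw);
    return -1;
}
```

With this, a raw value of 0xfffffff2 (that is, -14) ends up as a return of -1
with errno set to 14, while 0 passes straight through.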

A call to such a small function is not necessarily good micro-architecturally:
the return can mispredict because the call return stack has a limited number
of entries. That cost is amortized if the function is largish.

I do understand that these small syscall wrappers are a common uClibc design
pattern and exist all over the place, but given that this was all arch code I
took the liberty of removing the one hop, and the code now looks as below:

0000b4d8 <__libc_sigaction>:
    b4d8:       st.a       gp,[sp,-4]
    b4dc:       sub_s      sp,sp,20
    b4de:       add        gp,pcl,0x00065284
    b4e6:       breq_s     r1,0,b516
    b4e8:       ld_s       r3,[r1,4]
...
    b516:       mov        r8,134
    b51a:       mov_s      r3,8
    b51c:       swi
    b520:       cmp        r0,0xfffffc00
    b524:       bls_s      b532
    b526:       st.a       blink,[sp,-4]
    b52a:       bl         b53c <__syscall_error>
    b52e:       ld.ab      blink,[sp,4]
    b532:       ld.a       gp,[sp,20]
    b536:       j_s.d      [blink]
    b538:       add_s      sp,sp,4
    b53a:       nop_s
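
At the C level, the difference between the two listings is roughly the
following. This is only a sketch of the shape of the change, not the actual
patch: every name here is hypothetical, fake_trap_rt_sigaction stands in for
the swi trap so the code runs in user space, and the noinline/always_inline
attributes are GCC extensions used to force the two layouts.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for the kernel trap: "succeeds" for signals 1..64,
 * otherwise returns -EINVAL as a raw 32-bit value. */
static uint32_t fake_trap_rt_sigaction(int sig)
{
    return (sig >= 1 && sig <= 64) ? 0u : (uint32_t)-EINVAL;
}

/* Shared error check: raw values above 0xfffffc00 are taken as -errno. */
static inline int32_t check_ret(uint32_t raw)
{
    if (raw <= 0xfffffc00u)
        return (int32_t)raw;
    errno = (int)(0u - raw);
    return -1;
}

/* Before: the syscall lives in its own small function, so the caller
 * pays an extra call and return around the trap. */
__attribute__((noinline))
static int32_t stub_rt_sigaction(int sig)
{
    return check_ret(fake_trap_rt_sigaction(sig));
}

int32_t sigaction_two_hops(int sig)
{
    return stub_rt_sigaction(sig);
}

/* After: the same body is forced inline, so the trap and the error
 * check sit directly in the caller and the extra hop disappears. */
static inline __attribute__((always_inline))
int32_t inline_rt_sigaction(int sig)
{
    return check_ret(fake_trap_rt_sigaction(sig));
}

int32_t sigaction_one_hop(int sig)
{
    return inline_rt_sigaction(sig);
}
```

Both entry points return the same values; only the call structure around the
trap differs.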

-Vineet