On Wednesday 18 February 2015 01:33 PM, Bernhard Reutner-Fischer wrote:
> On February 18, 2015 6:51:17 AM GMT+01:00, Vineet Gupta 
> <[email protected]> wrote:
>> On Monday 16 February 2015 08:34 PM, Bernhard Reutner-Fischer wrote:
>>>> While at it I also did some arch specific adjustment in sigaction path
>>>> - inlining the rt_sigaction syscall stub detour to reduce branch return
>>>> stack mispredicts etc - which is what 6/8 does !
>>> This sounds suspicious.
>>> IIRC we already had that argument, last time around _dl_do_reloc and
>>> _dl_do_lazy_reloc.
>>> Could it be that your port has a bug here ( missed optimisation )
>>> around ifunc handling? Sounds like back then on ARM
>>> https://gcc.gnu.org/PR40887#c6
>>> What am I missing?
>>
>> I don't think my use-case is close to the ARM issue you pointed to above,
>> as there is no ifunc or function pointer involved.
> I was more thinking about the relic functors.
> Does GCC 5 produce identical code on ARC master for explicit function
> calls compared to using a function pointer, as suggested and used in all
> other ports?
> If not then I'd consider this a bug.
>
>> With orig code, we get 2 function calls on ARC:
>>
>> 0000b504 <__libc_sigaction>:
>>    b504:     push_s     blink
>>    b506:     sub_s      sp,sp,12
>>    b508:     bl.d       36b20 <__st_r13_to_r15>
>> ...
>>
>>    b540:     bl.d       b750 <__syscall_rt_sigaction>   <--- DIRECT CALL
>>    b544:     mov_s      r3,8
>>    b546:     add_s      sp,sp,20
>>    b548:     mov_s      r12,12
>>    b54a:     b          36b88 <__ld_r13_to_r15_ret>
>>    b54e:     nop_s
>>
>> 0000b750 <__syscall_rt_sigaction>:
>>    b750:     mov        r8,134
>>    b754:     swi                                <---- SYSCALL TRAP INTO KERNEL
>>    b758:     cmp        r0,0xfffffc00
>>    b75c:     bls_s      b76a
>>    b75e:     st.a       blink,[sp,-4]
>>    b762:     bl         b550 <__syscall_error>
>>    b766:     ld.ab      blink,[sp,4]
>>    b76a:     j_s        [blink]
>>
>> The small function call is not necessarily good micro-architecturally when
>> returning, due to the limited number of call return stack entries. That
>> cost is amortized if the function is largish.
>>
>> I do understand that these small syscall wrappers are a common uClibc
>> design pattern and exist all over the place, but given that this was all
>> arch code I took the liberty of removing the one hop and the code now
>> looks as below:
>>
>> 0000b4d8 <__libc_sigaction>:
>>    b4d8:     st.a       gp,[sp,-4]
>>    b4dc:     sub_s      sp,sp,20
>>    b4de:     add        gp,pcl,0x00065284
>>    b4e6:     breq_s     r1,0,b516
>>    b4e8:     ld_s       r3,[r1,4]
>> ...
>>    b516:     mov        r8,134
>>    b51a:     mov_s      r3,8
>>    b51c:     swi
>>    b520:     cmp        r0,0xfffffc00
>>    b524:     bls_s      b532
>>    b526:     st.a       blink,[sp,-4]
>>    b52a:     bl         b53c <__syscall_error>
>>    b52e:     ld.ab      blink,[sp,4]
>>    b532:     ld.a       gp,[sp,20]
>>    b536:     j_s.d      [blink]
>>    b538:     add_s      sp,sp,4
>>    b53a:     nop_s
> I would have assumed / hoped that GCC 5 should generate this 2nd variant for 
> extern inline __syscall_rt_sigaction.
>
> Doesn't it do that?

The ARC gcc upgrade to 5.0 is still in progress, so I can't comment. CCing our
gcc gurus!
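
For anyone who wants to compare what their compiler does, here is a minimal
sketch of the two shapes under discussion. It is not the uClibc code: the
function names are hypothetical, the libc syscall() helper stands in for the
raw trap, and it assumes Linux with GCC or Clang attributes. Both callers only
probe a signal number (act and oact are NULL), so no handler is installed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical stand-in for an out-of-line syscall stub: noinline forces
 * a separate call/return pair, like the first listing in this thread. */
__attribute__((noinline))
static long stub_rt_sigaction(int sig, const void *act, void *oact,
                              size_t setsize)
{
    return syscall(SYS_rt_sigaction, sig, act, oact, setsize);
}

/* Same body, but always_inline asks the compiler to fold it into the
 * caller, which removes the extra hop (the second listing's shape). */
static inline __attribute__((always_inline))
long inline_rt_sigaction(int sig, const void *act, void *oact,
                         size_t setsize)
{
    return syscall(SYS_rt_sigaction, sig, act, oact, setsize);
}

/* Caller that goes through the separate stub. */
long query_via_stub(int sig)
{
    return stub_rt_sigaction(sig, NULL, NULL, _NSIG / 8);
}

/* Caller with the stub body inlined. */
long query_inlined(int sig)
{
    return inline_rt_sigaction(sig, NULL, NULL, _NSIG / 8);
}

/* Sanity helper: both variants must return the same result. */
void check_same_result(int sig)
{
    assert(query_via_stub(sig) == query_inlined(sig));
}
```

Building this with optimisation and running objdump -d on the result should
show whether query_via_stub keeps a separate call while query_inlined does
not. The behaviour is the same either way; only the call sequence differs.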

-Vineet