On Wednesday 18 February 2015 01:33 PM, Bernhard Reutner-Fischer wrote:
> On February 18, 2015 6:51:17 AM GMT+01:00, Vineet Gupta
> <[email protected]> wrote:
>> On Monday 16 February 2015 08:34 PM, Bernhard Reutner-Fischer wrote:
>>>> While at it I also did some arch specific adjustment in sigaction path
>>>> - inlining the rt_sigaction syscall stub detour to reduce branch return
>>>> stack mispredicts etc - which is what 6/8 does !
>>> This sounds suspicious.
>>> IIRC we already had that argument, last time around _dl_do_reloc and
>>> _dl_do_lazy_reloc.
>>> Could it be that your port has a bug here ( missed optimisation )
>>> around ifunc handling? Sounds like back then on ARM
>>> https://gcc.gnu.org/PR40887#c6
>>> What am I missing?
>>
>> I don't think my use-case is close to the ARM issue u pointed to above
>> as there is no ifunc or function pointer involved.
> I was more thinking about the relic functors.
> Does GCC 5 produce identical code for ARC master way to explicit function
> calls compared to using a function pointer like suggested and used in all
> other ports?
> If not then I'd consider this a bug.
>
>> With orig code, we get 2 function calls on ARC:
>>
>> 0000b504 <__libc_sigaction>:
>>     b504:  push_s  blink
>>     b506:  sub_s   sp,sp,12
>>     b508:  bl.d    36b20 <__st_r13_to_r15>
>>     ...
>>
>>     b540:  bl.d    b750 <__syscall_rt_sigaction>   <--- DIRECT CALL
>>     b544:  mov_s   r3,8
>>     b546:  add_s   sp,sp,20
>>     b548:  mov_s   r12,12
>>     b54a:  b       36b88 <__ld_r13_to_r15_ret>
>>     b54e:  nop_s
>>
>> 0000b750 <__syscall_rt_sigaction>:
>>     b750:  mov     r8,134
>>     b754:  swi                     <---- SYSCALL TRAP INTO KERNEL
>>     b758:  cmp     r0,0xfffffc00
>>     b75c:  bls_s   b76a
>>     b75e:  st.a    blink,[sp,-4]
>>     b762:  bl      b550 <__syscall_error>
>>     b766:  ld.ab   blink,[sp,4]
>>     b76a:  j_s     [blink]
>>
>> The small function call is not necessarily good micro-architecturally
>> when returning due to limited number of call return stack entries.
>> That cost is amortized if function is largish.
>>
>> I do understand that these small syscall wrappers are a common uClibc
>> design pattern and exist all over the place but given that this was
>> all arch code I took the liberty of removing the one hop and the code
>> now looks as below:
>>
>> 0000b4d8 <__libc_sigaction>:
>>     b4d8:  st.a    gp,[sp,-4]
>>     b4dc:  sub_s   sp,sp,20
>>     b4de:  add     gp,pcl,0x00065284
>>     b4e6:  breq_s  r1,0,b516
>>     b4e8:  ld_s    r3,[r1,4]
>>     ...
>>     b516:  mov     r8,134
>>     b51a:  mov_s   r3,8
>>     b51c:  swi
>>     b520:  cmp     r0,0xfffffc00
>>     b524:  bls_s   b532
>>     b526:  st.a    blink,[sp,-4]
>>     b52a:  bl      b53c <__syscall_error>
>>     b52e:  ld.ab   blink,[sp,4]
>>     b532:  ld.a    gp,[sp,20]
>>     b536:  j_s.d   [blink]
>>     b538:  add_s   sp,sp,4
>>     b53a:  nop_s
> I would have assumed / hoped that GCC 5 should generate this 2nd variant for
> extern inline __syscall_rt_sigaction.
>
> Doesn't it do that?

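For anyone following along, the two code shapes compared above can be approximated in portable C. This is only a rough sketch, not the uClibc ARC code: the names stub_rt_sigaction, inline_rt_sigaction, sigaction_via_stub, sigaction_inlined and KERNEL_SIGSET_SIZE are made up for illustration, and it goes through the libc syscall() helper rather than an inline trap, so it models only the extra stub hop and not the swi sequence itself. It assumes Linux, GCC or Clang, and an 8-byte kernel sigset_t.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: the kernel sigset_t is 8 bytes, matching "mov_s r3,8" in the
 * listings above. Other targets may use a different size. */
#define KERNEL_SIGSET_SIZE 8L

/* Variant 1: out-of-line stub, comparable to __syscall_rt_sigaction in the
 * first listing. noinline forces a real call/return pair around the syscall. */
__attribute__((noinline))
static long stub_rt_sigaction(int sig, const void *act, void *oact)
{
    return syscall(SYS_rt_sigaction, sig, act, oact, KERNEL_SIGSET_SIZE);
}

/* Caller that takes the extra hop through the stub. */
long sigaction_via_stub(int sig)
{
    /* NULL act and oact: validate the signal number only, change nothing. */
    return stub_rt_sigaction(sig, NULL, NULL);
}

/* Variant 2: the same stub, but always_inline folds it into the caller,
 * comparable to the second listing where the hop is gone. */
static inline __attribute__((always_inline))
long inline_rt_sigaction(int sig, const void *act, void *oact)
{
    return syscall(SYS_rt_sigaction, sig, act, oact, KERNEL_SIGSET_SIZE);
}

/* Caller with the stub body merged in by the compiler. */
long sigaction_inlined(int sig)
{
    return inline_rt_sigaction(sig, NULL, NULL);
}

/* Both variants must report the same result; only the call structure differs. */
void check_variants_agree(void)
{
    assert(sigaction_via_stub(SIGUSR1) == sigaction_inlined(SIGUSR1));
}
```

Building this at -O2 and comparing the objdump -d output of sigaction_via_stub and sigaction_inlined should show whether a given compiler keeps or removes the intermediate call, which is the question raised for GCC 5 above.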
ARC gcc upgrade to 5.0 is still being done, so I can't comment. CCing our gcc
gurus!

-Vineet
_______________________________________________
uClibc mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/uclibc
