Re: New prologue/epilogue code for i386 string functions
On Tue, Oct 22, 2013 at 8:58 AM, Jan Hubicka hubi...@ucw.cz wrote:
> Hi,
> this patch adds code to produce prologues/epilogues as suggested by Ondrej
> Bilka (I described the approach in more detail in
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg02082.html).
> This patch is an updated and cleaned-up version after Mikhail's changes
> merging the memset/memcpy generation code. (I will continue with some
> incremental cleanups of the code duplication we ended up with.)
>
> For now I don't have the value range code in, but all the logic is in place
> once http://gcc.gnu.org/ml/gcc-patches/2013-09/msg02011.html gets reviewed.
>
> Bootstrapped/regtested x86_64-linux, also with -minline-all-stringops, and
> tested on SPEC2k6. I will commit it later today after more testing.
>
> Honza
>
> 	* i386.h (TARGET_MISALIGNED_MOVE_STRING_PROLOGUES_EPILOGUES): New tuning flag.
> 	* x86-tune.def (TARGET_MISALIGNED_MOVE_STRING_PROLOGUES): Define it.
> 	* i386.c (expand_small_movmem_or_setmem): New function.
> 	(expand_set_or_movmem_prologue_epilogue_by_misaligned_moves): New function.
> 	(alg_usable_p): Add support for value ranges; cleanup.
> 	(ix86_expand_set_or_movmem): Add support for misaligned moves.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59605

--
H.J.
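For readers unfamiliar with the misaligned-move prologue/epilogue idea, here is a minimal plain-C sketch of it. It is an illustration only, not the RTL that ix86_expand_set_or_movmem emits. It assumes a 16-byte (SSE-sized) move width, non-overlapping buffers and n >= 16; the names sketch_copy_ge16 and move16_unaligned are hypothetical. The head and the tail of the block are each covered by one unaligned 16-byte move, so neither an alignment loop nor a tail loop is needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one 16-byte move with no alignment assumption.
   With a constant size, optimizing compilers usually lower this memcpy
   to a single unaligned vector load/store (e.g. movdqu on x86-64). */
static inline void move16_unaligned(unsigned char *d, const unsigned char *s)
{
    memcpy(d, s, 16);
}

/* Sketch: copy n >= 16 bytes between non-overlapping buffers. */
void sketch_copy_ge16(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t skip;

    /* Prologue: the first 16 bytes, written with a misaligned move. */
    move16_unaligned(d, s);
    /* Epilogue: the last 16 bytes; overlaps the prologue when n < 32. */
    move16_unaligned(d + n - 16, s + n - 16);
    if (n <= 32)
        return;

    /* Advance dst to its next 16-byte boundary.  skip is in 1..16, and
       the bytes skipped here were already written by the prologue. */
    skip = 16 - ((uintptr_t)d & 15);
    d += skip;
    s += skip;
    n -= skip;

    /* Main loop: dst is now 16-byte aligned (src may not be), so a real
       expander could use aligned stores here.  The final partial chunk
       (fewer than 16 bytes) was already covered by the epilogue. */
    while (n >= 16) {
        move16_unaligned(d, s);
        d += 16;
        s += 16;
        n -= 16;
    }
}
```

The trade-off is that some bytes are written twice, in exchange for a fixed cost of two moves for head and tail handling regardless of n or the destination's alignment.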
Re: New prologue/epilogue code for i386 string functions
Jan Hubicka hubi...@ucw.cz writes:

> +static void
> +expand_set_or_movmem_prologue_epilogue_by_misaligned_moves (rtx destmem, rtx srcmem,
> +							    rtx *destptr, rtx *srcptr,
> +							    enum machine_mode mode,
> +							    rtx value, rtx vec_value,
> +							    rtx *count,
> +							    rtx *done_label,
> +							    int size,
> +							    int desired_align,
> +							    int align,
> +							    unsigned HOST_WIDE_INT *min_size,
> +							    bool dynamic_check,
> +							    bool issetmem)

That's a scary prototype. Could you refactor this somehow to not need that
many parameters? Perhaps this should be multiple functions.

-Andi

--
a...@linux.intel.com -- Speaking for myself only
Re: New prologue/epilogue code for i386 string functions
> Jan Hubicka hubi...@ucw.cz writes:
>
>> +static void
>> +expand_set_or_movmem_prologue_epilogue_by_misaligned_moves (rtx destmem, rtx srcmem,
>> [...]
>> +							    bool issetmem)
>
> That's a scary prototype. Could you refactor this somehow to not need that
> many parameters? Perhaps this should be multiple functions.

Well, it became worse with merging the memcpy and memset code (but not by
that much). The problem here is that the prologue/epilogue code really does
quite a lot of things at once. It naturally splits into the code handling
small memcpy and the branchless code copying the first and last N bytes.
I can split the function this way, but I think the first one will take
pretty much all the existing parameters... I will try to look into this...

Honza
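As a rough illustration of the split Honza describes, the sketch below, again in plain C rather than GCC's RTL expansion, pulls the "copy the first and last N bytes" step into its own small helper and builds a separate small-size handler on top of it. The names copy_first_last and sketch_copy_small are hypothetical, and the actual refactoring in i386.c may look quite different. The sketch assumes non-overlapping buffers and n < 16. Each size class is handled by at most two moves of width w, which may overlap, so no loop is needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy bytes [0, w) and [n - w, n) with two moves
   of width w.  Correct whenever w <= n <= 2 * w; the two moves overlap
   when n < 2 * w.  Once inlined with a constant w, each memcpy is
   typically a single (possibly misaligned) load/store. */
static inline void copy_first_last(unsigned char *d, const unsigned char *s,
                                   size_t n, size_t w)
{
    memcpy(d, s, w);
    memcpy(d + n - w, s + n - w, w);
}

/* Small-size path.  Precondition: n < 16 (larger sizes would go through
   the aligned main-loop path instead). */
void sketch_copy_small(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= 8)            /* 8..15 bytes */
        copy_first_last(d, s, n, 8);
    else if (n >= 4)       /* 4..7 bytes */
        copy_first_last(d, s, n, 4);
    else if (n >= 2)       /* 2..3 bytes */
        copy_first_last(d, s, n, 2);
    else if (n == 1)
        d[0] = s[0];
    /* n == 0: nothing to copy. */
}
```

Note that the helper needs only the two pointers, the length and the move width. As Honza points out, though, the parameters a real expander has to thread through (count, alignment info, value/vec_value for memset, the done label) would probably remain on the outer small-size handler.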