Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
Dear all, I just want to let you know that we just published the final version of the Vector Function ABI specification. The call-clobbered and call-preserved lists of register has been updated (see section 2.1) . The document is located at the same address: https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi Kind regards, Francesco > On May 16, 2018, at 11:21 AM, Steve Ellcey wrote: > > On Tue, 2018-05-15 at 18:29 +, Francesco Petrogalli wrote: > >> Hi Steve, >> >> I am happy to let you know that the Vector Function ABI for AArch64 >> is now public and available via the link at [1]. >> >> Don’t hesitate to contact me in case you have any questions. >> >> Kind regards, >> >> Francesco >> >> [1] https://developer.arm.com/products/software-development-tools/hpc >> /arm-compiler-for-hpc/vector-function-abi >> >>> >>> Steve Ellcey >>> sell...@cavium.com > > Thanks for publishing this Francesco, it looks like the main issue for > GCC is that the Vector Function ABI has different caller saved / callee > saved register conventions than the standard ARM calling convention. > > If I understand things correctly, in the standard calling convention > the callee will only save the bottom 64 bits of V8-V15 and so the > caller needs to save those registers if it is using the top half. In > the Vector calling convention the callee will save all 128 bits of > these registers (and possibly more registers) so the caller does not > have to save these registers at all, even if it is using all 128 bits > of them. > > It doesn't look like GCC has any existing mechanism for having different > sets of caller saved/callee saved registers depending on the function > attributes of the calling or called function. > > Changing what registers a callee function saves and restores shouldn't > be too difficult since that can be done when generating the prologue > and epilogue code but changing what registers a caller saves/restores > when doing the call seems trickier. The macro > TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the > function being called. It returns true/false depending on just the > register number and mode. > > Steve Ellcey > sell...@cavium.com
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On 05/31/2018 04:39 AM, Alan Hayward wrote: > (Missed this thread initially due to incorrect email address) Sorry. Good to hear your're still interested in figuring this out. > >> On 29 May 2018, at 11:05, Richard Sandiford >> wrote: >> >> Jeff Law writes: >>> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff. >>> When we left things I think we were trying to decide between >>> CLOBBER_HIGH and clobbering the appropriate subreg. The problem with >>> the latter is the dataflow we compute is inaccurate (overly pessimistic) >>> so that'd have to be fixed. > > Yes, I want to get back to looking at this again, however I’ve been busy > elsewhere. Similarly. > >> >> The clobbered part of the register in this case is a high-part subreg, >> which is ill-formed for single registers. It would also be difficult >> to represent in terms of the mode, since there are no defined modes for >> what can be stored in the high part of an SVE register. For 128-bit >> SVE that mode would have zero bits. :-) >> >> I thought the alternative suggestion was instead to have: >> >> (set (reg:M X) (reg:M X)) >> >> when X is preserved in mode M but not in wider modes. But that seems >> like too much of a special case to me, both in terms of the source and >> the destination: > > Agreed. When I looked at doing it that way back in Jan, my conclusion was > that if we did it that way we end up with more or less the same code but > instead of: > > if (GET_CODE (setter) == CLOBBER_HIGH >&& reg_is_clobbered_by_clobber_high(REGNO(dest), GET_MODE > (rsp->last_set_value)) > > Now becomes something like: > > if (GET_CODE (setter) == SET >&& REG_P (dest) && HARD_REGISTER_P (dest) && REG_P (src) && REGNO(dst) == > REGNO(src) >&& reg_is_clobbered_by_self_set(REGNO(dest), GET_MODE > (rsp->last_set_value)) > > Ok, some of that code can go into a macro, but it feel much clearer to > explicitly check for CLOBBER_HIGH rather then applying some special semantics > to a specific SET case. Then let's return to the CLOBBER_HIGH approach. The hope was that most of the places where you had to introduce CLOBBER_HIGH would "just work" with the self-set approach. If that's not the case, then there's really nothing to be gained with self-set. I suggest you get the patch updated for the trunk and repost now that we're in broad agreement that self-set is a rathole. jeff
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On 05/29/2018 04:05 AM, Richard Sandiford wrote: > Jeff Law writes: >> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff. >> When we left things I think we were trying to decide between >> CLOBBER_HIGH and clobbering the appropriate subreg. The problem with >> the latter is the dataflow we compute is inaccurate (overly pessimistic) >> so that'd have to be fixed. > > The clobbered part of the register in this case is a high-part subreg, > which is ill-formed for single registers. It would also be difficult > to represent in terms of the mode, since there are no defined modes for > what can be stored in the high part of an SVE register. For 128-bit > SVE that mode would have zero bits. :-) > > I thought the alternative suggestion was instead to have: > >(set (reg:M X) (reg:M X)) You're right. I mis-remembered. IT happens far too often these days. > > when X is preserved in mode M but not in wider modes. But that seems > like too much of a special case to me, both in terms of the source and > the destination: Well, the hope was this would "just work" without having to introduce a new RTX code and teach all the RTL passes about it. The self-assignment has the right semantics, but I believe Alan showed that the DF infrastructure pessimized it horribly. At which point the question became how painful would it be to fix DF and compare that to the pain of adding a new RTX code. > > - On the destination side, a SET normally provides something for later > instructions to use, whereas here the effect is intended to be the > opposite: the instruction has no effect at all on a value of mode M > in X. As you say, this would pessimise df without specific handling. > But I think all optimisations that look for the definition of a value > would need to be taught to "look through" this set to find the real > definition of (reg:M X) (or any value of a mode no larger than M in X). > Very few passes use the df def-uses chains for this due its high cost. But how often do we really need to look for the REG in a large mode than M? Yea, it happens occasionally, but I don't think it's pervasive and the cases where we do probably aren't *that* important performance-wise. Though at a conceptual level I agree. SET is meant to provide something for later consumption, we'd be abusing it. > > More fundamentally, it should be possible in RTL to express an > instruction J that *does* read X in mode M and clobbers its high part. > If we use the SET above to represent the clobber, and treat the rhs use > as special, then presumably J would need two uses of X, one "dummy" one > on the no-op SET and one "real" one on some other SET (or perhaps in a > top-level USE). Having the number of uses determine this seems > a bit awkward. > > IMO CLOBBER and SET have different semantics for good reason: CLOBBER > represents an optimisation barrier for things that care about the value > of a certain rtx object, while SET represents a productive effect or > side-effect. The effect we want here is the same as a normal clobber, > except that the clobber is mode-dependent. I largely agree. It was really a matter of whether or not using the self-set would simplify the implementation in a significant way. jeff
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
(Missed this thread initially due to incorrect email address) > On 29 May 2018, at 11:05, Richard Sandiford > wrote: > > Jeff Law writes: >> Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff. >> When we left things I think we were trying to decide between >> CLOBBER_HIGH and clobbering the appropriate subreg. The problem with >> the latter is the dataflow we compute is inaccurate (overly pessimistic) >> so that'd have to be fixed. Yes, I want to get back to looking at this again, however I’ve been busy elsewhere. > > The clobbered part of the register in this case is a high-part subreg, > which is ill-formed for single registers. It would also be difficult > to represent in terms of the mode, since there are no defined modes for > what can be stored in the high part of an SVE register. For 128-bit > SVE that mode would have zero bits. :-) > > I thought the alternative suggestion was instead to have: > > (set (reg:M X) (reg:M X)) > > when X is preserved in mode M but not in wider modes. But that seems > like too much of a special case to me, both in terms of the source and > the destination: Agreed. When I looked at doing it that way back in Jan, my conclusion was that if we did it that way we end up with more or less the same code but instead of: if (GET_CODE (setter) == CLOBBER_HIGH && reg_is_clobbered_by_clobber_high(REGNO(dest), GET_MODE (rsp->last_set_value)) Now becomes something like: if (GET_CODE (setter) == SET && REG_P (dest) && HARD_REGISTER_P (dest) && REG_P (src) && REGNO(dst) == REGNO(src) && reg_is_clobbered_by_self_set(REGNO(dest), GET_MODE (rsp->last_set_value)) Ok, some of that code can go into a macro, but it feel much clearer to explicitly check for CLOBBER_HIGH rather then applying some special semantics to a specific SET case. Alan.
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
Jeff Law writes: > Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff. > When we left things I think we were trying to decide between > CLOBBER_HIGH and clobbering the appropriate subreg. The problem with > the latter is the dataflow we compute is inaccurate (overly pessimistic) > so that'd have to be fixed. The clobbered part of the register in this case is a high-part subreg, which is ill-formed for single registers. It would also be difficult to represent in terms of the mode, since there are no defined modes for what can be stored in the high part of an SVE register. For 128-bit SVE that mode would have zero bits. :-) I thought the alternative suggestion was instead to have: (set (reg:M X) (reg:M X)) when X is preserved in mode M but not in wider modes. But that seems like too much of a special case to me, both in terms of the source and the destination: - On the destination side, a SET normally provides something for later instructions to use, whereas here the effect is intended to be the opposite: the instruction has no effect at all on a value of mode M in X. As you say, this would pessimise df without specific handling. But I think all optimisations that look for the definition of a value would need to be taught to "look through" this set to find the real definition of (reg:M X) (or any value of a mode no larger than M in X). Very few passes use the df def-uses chains for this due its high cost. - On the source side, the instruction doesn't actually care what's in X, but nevertheless appears to use it. This means that most passes would need to be taught that a use of X on the rhs of a no-op SET is special and should usually be ignored. More fundamentally, it should be possible in RTL to express an instruction J that *does* read X in mode M and clobbers its high part. If we use the SET above to represent the clobber, and treat the rhs use as special, then presumably J would need two uses of X, one "dummy" one on the no-op SET and one "real" one on some other SET (or perhaps in a top-level USE). Having the number of uses determine this seems a bit awkward. IMO CLOBBER and SET have different semantics for good reason: CLOBBER represents an optimisation barrier for things that care about the value of a certain rtx object, while SET represents a productive effect or side-effect. The effect we want here is the same as a normal clobber, except that the clobber is mode-dependent. Thanks, Richard
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On 05/26/2018 04:09 AM, Richard Sandiford wrote: > Steve Ellceywrites: >> On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote: >>> >>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way >>> of saying that an rtl instruction preserves the low part of a >>> register but clobbers the high part. We would need something like >>> Alan H's CLOBBER_HIGH patches to do it using explicit clobbers. >>> >>> Another approach would be to piggy-back on the -fipa-ra >>> infrastructure >>> and record that vector PCS functions only clobber Q0-Q7. If -fipa-ra >>> knows that a function doesn't clobber Q8-Q15 then that should >>> override >>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED. (I'm not sure whether it does >>> in practice, but it should :-) And if it doesn't that's a bug that's >>> worth fixing for its own sake.) >>> >>> Thanks, >>> Richard >> >> Alan, >> >> I have been looking at your CLOBBER_HIGH patches to see if they >> might be helpful in implementing the ARM SIMD Vector ABI in GCC. >> I have also been looking at the -fipa-ra flag and how it works. >> >> I was wondering if you considered using the ipa-ra infrastructure >> for the SVE work that you are currently trying to support with >> the CLOBBER_HIGH macro? >> >> My current thought for the ABI work is to mark all the floating >> point / vector registers as caller saved (the lower half of V8-V15 >> are currently callee saved) and remove >> TARGET_HARD_REGNO_CALL_PART_CLOBBERED. >> This should work but would be inefficient. >> >> The next step would be to split get_call_reg_set_usage up into >> two functions so that I don't have to pass in a default set of >> registers. One function would return call_used_reg_set by >> default (but could return a smaller set if it had actual used >> register information) and the other would return regs_invalidated >> by_call by default (but could also return a smaller set). >> >> Next I would add a 'largest mode used' array to call_cgraph_rtl_info >> structure in addition to the current function_used_regs register >> set. >> >> Then I could turn the get_call_reg_set_usage replacement functions >> into target specific functions and with the information in the >> call_cgraph_rtl_info structure and any simd attribute information on >> a function I could modify what registers are really being used/invalidated >> without being saved. >> >> If the called function only uses the bottom half of a register it would not >> be marked as used/invalidated. If it uses the entire register and the >> function is not marked as simd, then the register would marked as >> used/invalidated. If the function was marked as simd the register would not >> be marked because a simd function would save both the upper and lower halves >> of a callee saved register (whereas a non simd function would only save the >> lower half). >> >> Does this sound like something that could be used in place of your >> CLOBBER_HIGH patch? > > One of the advantages of CLOBBER_HIGH is that it can be attached to > arbitrary instructions, not just calls. The motivating example was > tlsdesc_small_, which isn't treated as a call but as a normal > instruction. (And I don't think we want to change that, since it's much > easier for rtl optimisers to deal with normal instructions compared to > calls. In general a call is part of a longer sequence of instructions > that includes setting up arguments, etc.) Yea. I don't think we want to change tlsdesc*. Representing them as normal insns rather than calls seems reasonable to me. Now that we're in stage1 I do want to revisit the CLOBBER_HIGH stuff. When we left things I think we were trying to decide between CLOBBER_HIGH and clobbering the appropriate subreg. The problem with the latter is the dataflow we compute is inaccurate (overly pessimistic) so that'd have to be fixed. Jeff
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On Sat, May 26, 2018 at 11:09:24AM +0100, Richard Sandiford wrote: > On the wider point about changing the way call clobber information > is represented: I agree it would be good to generalise what we have > now. But if possible I think we should avoid target hooks that take > a specific call, and instead make it an inherent part of the call insn > itself, much like CALL_INSN_FUNCTION_USAGE is now. E.g. we could add > a field that points to an ABI description, with -fipa-ra effectively > creating ad-hoc ABIs. That ABI description could start out with > whatever we think is relevant now and could grow over time. Somewhat related: there still is PR68150 open for problems with HARD_REGNO_CALL_PART_CLOBBERED in postreload-gcse (it ignores it). Segher
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
Steve Ellceywrites: > On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote: >> >> TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way >> of saying that an rtl instruction preserves the low part of a >> register but clobbers the high part. We would need something like >> Alan H's CLOBBER_HIGH patches to do it using explicit clobbers. >> >> Another approach would be to piggy-back on the -fipa-ra >> infrastructure >> and record that vector PCS functions only clobber Q0-Q7. If -fipa-ra >> knows that a function doesn't clobber Q8-Q15 then that should >> override >> TARGET_HARD_REGNO_CALL_PART_CLOBBERED. (I'm not sure whether it does >> in practice, but it should :-) And if it doesn't that's a bug that's >> worth fixing for its own sake.) >> >> Thanks, >> Richard > > Alan, > > I have been looking at your CLOBBER_HIGH patches to see if they > might be helpful in implementing the ARM SIMD Vector ABI in GCC. > I have also been looking at the -fipa-ra flag and how it works. > > I was wondering if you considered using the ipa-ra infrastructure > for the SVE work that you are currently trying to support with > the CLOBBER_HIGH macro? > > My current thought for the ABI work is to mark all the floating > point / vector registers as caller saved (the lower half of V8-V15 > are currently callee saved) and remove > TARGET_HARD_REGNO_CALL_PART_CLOBBERED. > This should work but would be inefficient. > > The next step would be to split get_call_reg_set_usage up into > two functions so that I don't have to pass in a default set of > registers. One function would return call_used_reg_set by > default (but could return a smaller set if it had actual used > register information) and the other would return regs_invalidated > by_call by default (but could also return a smaller set). > > Next I would add a 'largest mode used' array to call_cgraph_rtl_info > structure in addition to the current function_used_regs register > set. > > Then I could turn the get_call_reg_set_usage replacement functions > into target specific functions and with the information in the > call_cgraph_rtl_info structure and any simd attribute information on > a function I could modify what registers are really being used/invalidated > without being saved. > > If the called function only uses the bottom half of a register it would not > be marked as used/invalidated. If it uses the entire register and the > function is not marked as simd, then the register would marked as > used/invalidated. If the function was marked as simd the register would not > be marked because a simd function would save both the upper and lower halves > of a callee saved register (whereas a non simd function would only save the > lower half). > > Does this sound like something that could be used in place of your > CLOBBER_HIGH patch? One of the advantages of CLOBBER_HIGH is that it can be attached to arbitrary instructions, not just calls. The motivating example was tlsdesc_small_, which isn't treated as a call but as a normal instruction. (And I don't think we want to change that, since it's much easier for rtl optimisers to deal with normal instructions compared to calls. In general a call is part of a longer sequence of instructions that includes setting up arguments, etc.) The other use case (not implemented in the posted patches) would be to represent the effect of syscalls, which clobber the "SVE part" of all vector registers. In that case the clobber would need to be attached to an inline asm insn. On the wider point about changing the way call clobber information is represented: I agree it would be good to generalise what we have now. But if possible I think we should avoid target hooks that take a specific call, and instead make it an inherent part of the call insn itself, much like CALL_INSN_FUNCTION_USAGE is now. E.g. we could add a field that points to an ABI description, with -fipa-ra effectively creating ad-hoc ABIs. That ABI description could start out with whatever we think is relevant now and could grow over time. Thanks, Richard
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote: > > TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way > of saying that an rtl instruction preserves the low part of a > register but clobbers the high part. We would need something like > Alan H's CLOBBER_HIGH patches to do it using explicit clobbers. > > Another approach would be to piggy-back on the -fipa-ra > infrastructure > and record that vector PCS functions only clobber Q0-Q7. If -fipa-ra > knows that a function doesn't clobber Q8-Q15 then that should > override > TARGET_HARD_REGNO_CALL_PART_CLOBBERED. (I'm not sure whether it does > in practice, but it should :-) And if it doesn't that's a bug that's > worth fixing for its own sake.) > > Thanks, > Richard Alan, I have been looking at your CLOBBER_HIGH patches to see if they might be helpful in implementing the ARM SIMD Vector ABI in GCC. I have also been looking at the -fipa-ra flag and how it works. I was wondering if you considered using the ipa-ra infrastructure for the SVE work that you are currently trying to support with the CLOBBER_HIGH macro? My current thought for the ABI work is to mark all the floating point / vector registers as caller saved (the lower half of V8-V15 are currently callee saved) and remove TARGET_HARD_REGNO_CALL_PART_CLOBBERED. This should work but would be inefficient. The next step would be to split get_call_reg_set_usage up into two functions so that I don't have to pass in a default set of registers. One function would return call_used_reg_set by default (but could return a smaller set if it had actual used register information) and the other would return regs_invalidated by_call by default (but could also return a smaller set). Next I would add a 'largest mode used' array to call_cgraph_rtl_info structure in addition to the current function_used_regs register set. Then I could turn the get_call_reg_set_usage replacement functions into target specific functions and with the information in the call_cgraph_rtl_info structure and any simd attribute information on a function I could modify what registers are really being used/invalidated without being saved. If the called function only uses the bottom half of a register it would not be marked as used/invalidated. If it uses the entire register and the function is not marked as simd, then the register would marked as used/invalidated. If the function was marked as simd the register would not be marked because a simd function would save both the upper and lower halves of a callee saved register (whereas a non simd function would only save the lower half). Does this sound like something that could be used in place of your CLOBBER_HIGH patch? Steve Ellcey sell...@cavium.com
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
Steve Ellceywrites: > On Wed, 2018-05-16 at 17:30 +0100, Richard Earnshaw (lists) wrote: >> On 16/05/18 17:21, Steve Ellcey wrote: >> > >> > It doesn't look like GCC has any existing mechanism for having different >> > sets of caller saved/callee saved registers depending on the function >> > attributes of the calling or called function. >> > >> > Changing what registers a callee function saves and restores shouldn't >> > be too difficult since that can be done when generating the prologue >> > and epilogue code but changing what registers a caller saves/restores >> > when doing the call seems trickier. The macro >> > TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the >> > function being called. It returns true/false depending on just the >> > register number and mode. >> > >> > Steve Ellcey >> > sell...@cavium.com >> > >> >> Actually, we can. See, for example, the attribute((pcs)) for the ARM >> port. I think we could probably handle this automagically for the SVE >> vector calling convention in AArch64. >> >> R. > > Interesting, it looks like one could use aarch64_emit_call to emit > extra use_reg / clobber_reg instructions but in this case we want to > tell the caller that some registers are not being clobbered by the > callee. The ARM port does not > define TARGET_HARD_REGNO_CALL_PART_CLOBBERED and that seemed like one > of the most problamatic issues with Aarch64. Maybe we would have to > undefine this for aarch64 and use explicit clobbers to say what > floating point registers / vector registers are clobbered for each > call? I wonder how that would affect register allocation. TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way of saying that an rtl instruction preserves the low part of a register but clobbers the high part. We would need something like Alan H's CLOBBER_HIGH patches to do it using explicit clobbers. Another approach would be to piggy-back on the -fipa-ra infrastructure and record that vector PCS functions only clobber Q0-Q7. If -fipa-ra knows that a function doesn't clobber Q8-Q15 then that should override TARGET_HARD_REGNO_CALL_PART_CLOBBERED. (I'm not sure whether it does in practice, but it should :-) And if it doesn't that's a bug that's worth fixing for its own sake.) Thanks, Richard
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On Wed, 2018-05-16 at 17:30 +0100, Richard Earnshaw (lists) wrote: > On 16/05/18 17:21, Steve Ellcey wrote: > > > > It doesn't look like GCC has any existing mechanism for having different > > sets of caller saved/callee saved registers depending on the function > > attributes of the calling or called function. > > > > Changing what registers a callee function saves and restores shouldn't > > be too difficult since that can be done when generating the prologue > > and epilogue code but changing what registers a caller saves/restores > > when doing the call seems trickier. The macro > > TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the > > function being called. It returns true/false depending on just the > > register number and mode. > > > > Steve Ellcey > > sell...@cavium.com > > > > Actually, we can. See, for example, the attribute((pcs)) for the ARM > port. I think we could probably handle this automagically for the SVE > vector calling convention in AArch64. > > R. Interesting, it looks like one could use aarch64_emit_call to emit extra use_reg / clobber_reg instructions but in this case we want to tell the caller that some registers are not being clobbered by the callee. The ARM port does not define TARGET_HARD_REGNO_CALL_PART_CLOBBERED and that seemed like one of the most problamatic issues with Aarch64. Maybe we would have to undefine this for aarch64 and use explicit clobbers to say what floating point registers / vector registers are clobbered for each call? I wonder how that would affect register allocation. Steve Ellcey sell...@cavium.com
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On 16/05/18 17:21, Steve Ellcey wrote: > On Tue, 2018-05-15 at 18:29 +, Francesco Petrogalli wrote: > >> Hi Steve, >> >> I am happy to let you know that the Vector Function ABI for AArch64 >> is now public and available via the link at [1]. >> >> Don’t hesitate to contact me in case you have any questions. >> >> Kind regards, >> >> Francesco >> >> [1] https://developer.arm.com/products/software-development-tools/hpc >> /arm-compiler-for-hpc/vector-function-abi >> >>> >>> Steve Ellcey >>> sell...@cavium.com > > Thanks for publishing this Francesco, it looks like the main issue for > GCC is that the Vector Function ABI has different caller saved / callee > saved register conventions than the standard ARM calling convention. > > If I understand things correctly, in the standard calling convention > the callee will only save the bottom 64 bits of V8-V15 and so the > caller needs to save those registers if it is using the top half. In > the Vector calling convention the callee will save all 128 bits of > these registers (and possibly more registers) so the caller does not > have to save these registers at all, even if it is using all 128 bits > of them. > > It doesn't look like GCC has any existing mechanism for having different > sets of caller saved/callee saved registers depending on the function > attributes of the calling or called function. > > Changing what registers a callee function saves and restores shouldn't > be too difficult since that can be done when generating the prologue > and epilogue code but changing what registers a caller saves/restores > when doing the call seems trickier. The macro > TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the > function being called. It returns true/false depending on just the > register number and mode. > > Steve Ellcey > sell...@cavium.com > Actually, we can. See, for example, the attribute((pcs)) for the ARM port. I think we could probably handle this automagically for the SVE vector calling convention in AArch64. R.
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On Tue, 2018-05-15 at 18:29 +, Francesco Petrogalli wrote: > Hi Steve, > > I am happy to let you know that the Vector Function ABI for AArch64 > is now public and available via the link at [1]. > > Don’t hesitate to contact me in case you have any questions. > > Kind regards, > > Francesco > > [1] https://developer.arm.com/products/software-development-tools/hpc > /arm-compiler-for-hpc/vector-function-abi > > > > > Steve Ellcey > > sell...@cavium.com Thanks for publishing this Francesco, it looks like the main issue for GCC is that the Vector Function ABI has different caller saved / callee saved register conventions than the standard ARM calling convention. If I understand things correctly, in the standard calling convention the callee will only save the bottom 64 bits of V8-V15 and so the caller needs to save those registers if it is using the top half. In the Vector calling convention the callee will save all 128 bits of these registers (and possibly more registers) so the caller does not have to save these registers at all, even if it is using all 128 bits of them. It doesn't look like GCC has any existing mechanism for having different sets of caller saved/callee saved registers depending on the function attributes of the calling or called function. Changing what registers a callee function saves and restores shouldn't be too difficult since that can be done when generating the prologue and epilogue code but changing what registers a caller saves/restores when doing the call seems trickier. The macro TARGET_HARD_REGNO_CALL_PART_CLOBBERED doesn't know anything about the function being called. It returns true/false depending on just the register number and mode. Steve Ellcey sell...@cavium.com
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
> On Feb 9, 2018, at 3:47 PM, Steve Ellceywrote: > > […] > I was wondering if the function vector ABI has been published yet and > if so, where I could find it. > Hi Steve, I am happy to let you know that the Vector Function ABI for AArch64 is now public and available via the link at [1]. Don’t hesitate to contact me in case you have any questions. Kind regards, Francesco [1] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi > Steve Ellcey > sell...@cavium.com
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
James, This is a follow-up to https://gcc.gnu.org/ml/gcc/2017-03/msg00109.html where you said: | Hi Ashwin, | | Thanks for the question. ARM has defined a vector function ABI, based | on the Vector Function ABI Specification you linked below, which | is designed to be suitable for both the Advanced SIMD and Scalable | Vector Extensions. There has not yet been a release of this document | which I can point you at, nor can I give you an estimate of when the | document will be published. I was wondering if the function vector ABI has been published yet and if so, where I could find it. Steve Ellcey sell...@cavium.com
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On Friday 17 March 2017 07:31 PM, James Greenhalgh wrote: > On Wed, Mar 15, 2017 at 09:50:18AM +, Sekhar, Ashwin wrote: >> Hi GCC Team, Aarch64 Maintainers, >> >> >> The rules in Vector Function Application Binary Interface Specification for >> OpenMP >> (https://sourceware.org/glibc/wiki/libmvec?action=AttachFile=view=VectorABI.txt) >> is used in x86 for generating the simd clones of a function. >> >> Is there a similar one defined for Aarch64? >> >> If not, would like to start a discussion on the same for Aarch64. To kick >> start the same, a draft proposal for Aarch64 (on the same lines as x86 ABI) >> is included below. The only change from x86 ABI is in the function name >> mangling. Here the letter 'b' is used for indicating the ASIMD isa. > > Hi Ashwin, > > Thanks for the question. ARM has defined a vector function ABI, based > on the Vector Function ABI Specification you linked below, which > is designed to be suitable for both the Advanced SIMD and Scalable > Vector Extensions. There has not yet been a release of this document > which I can point you at, nor can I give you an estimate of when the > document will be published. > > However, Francesco Petrogalli has recently made a proposal to the > LLVM mailing list ( https://reviews.llvm.org/D30739 ) which I would > note conflicts with your proposal in one way. You choose 'b' for name > mangling for a vector function using Advanced SIMD, while Francesco > uses 'n', which is the agreed character in the Vector Function ABI > Specification we have been working on. > > I'd encourage you to wait for formal publication of the ARM Vector > Function ABI to prevent any unexpected divergence between > implementations. Thanks for the information. We at Cavium are also working on libraries which requires this ABI specification. So we would like to see this published as early as possible. > > Thanks, > James > > Thanks Ashwin
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On Wed, Mar 15, 2017 at 09:50:18AM +, Sekhar, Ashwin wrote: > Hi GCC Team, Aarch64 Maintainers, > > > The rules in Vector Function Application Binary Interface Specification for > OpenMP > (https://sourceware.org/glibc/wiki/libmvec?action=AttachFile=view=VectorABI.txt) > is used in x86 for generating the simd clones of a function. > > Is there a similar one defined for Aarch64? > > If not, would like to start a discussion on the same for Aarch64. To kick > start the same, a draft proposal for Aarch64 (on the same lines as x86 ABI) > is included below. The only change from x86 ABI is in the function name > mangling. Here the letter 'b' is used for indicating the ASIMD isa. Hi Ashwin, Thanks for the question. ARM has defined a vector function ABI, based on the Vector Function ABI Specification you linked below, which is designed to be suitable for both the Advanced SIMD and Scalable Vector Extensions. There has not yet been a release of this document which I can point you at, nor can I give you an estimate of when the document will be published. However, Francesco Petrogalli has recently made a proposal to the LLVM mailing list ( https://reviews.llvm.org/D30739 ) which I would note conflicts with your proposal in one way. You choose 'b' for name mangling for a vector function using Advanced SIMD, while Francesco uses 'n', which is the agreed character in the Vector Function ABI Specification we have been working on. I'd encourage you to wait for formal publication of the ARM Vector Function ABI to prevent any unexpected divergence between implementations. Thanks, James
[Aarch64] Vector Function Application Binary Interface Specification for OpenMP
Hi GCC Team, Aarch64 Maintainers, The rules in Vector Function Application Binary Interface Specification for OpenMP (https://sourceware.org/glibc/wiki/libmvec?action=AttachFile=view=VectorABI.txt) is used in x86 for generating the simd clones of a function. Is there a similar one defined for Aarch64? If not, would like to start a discussion on the same for Aarch64. To kick start the same, a draft proposal for Aarch64 (on the same lines as x86 ABI) is included below. The only change from x86 ABI is in the function name mangling. Here the letter 'b' is used for indicating the ASIMD isa. Please review and comment. Thanks and Regards, Ashwin Sekhar T K CUT HERE -- Aarch64 Vector Function Application Binary Interface Specification for OpenMP 1. Vector Function ABI Overview Aarch64 Vector Function ABI provides ABI for the vector functions generated by compiler supporting SIMD constructs of OpenMP 4.0 [1] in Aarch64. This is based on the x86 Vector Function Application Binary Interface Specification for OpenMP [2]. 2. Vector Function ABI Vector Function ABI defines a set of rules that the caller and the callee functions must obey. These rules consist of: * Calling convention * Vector length (the number of concurrent scalar invocations to be processed per invocation of the vector function) * Mapping from element data types to vector data types * Ordering of vector arguments * Vector function masking * Vector function name mangling * Compiler generated variants of vector function 2.1. Calling Convention The vector functions should use calling convention described in Procedure Call Standard for the ARM 64-bit Architecture (AArch64) [3]. 2.2. Vector Length Every vector variant of a SIMD-enabled function has a vector length (VLEN). If OpenMP clause "simdlen" is used, the VLEN is the value of the argument of that clause. The VLEN value must be power of 2. In other case the notion of the function`s "characteristic data type" (CDT) is used to compute the vector length. CDT is defined in the following order: a) For non-void function, the CDT is the return type. b) If the function has any non-uniform, non-linear parameters, then the CDT is the type of the first such parameter. c) If the CDT determined by a) or b) above is struct, union, or class type which is pass-by-value (except for the type that maps to the built-in complex data type), the characteristic data type is int. d) If none of the above three cases is applicable, the CDT is int. VLEN = sizeof(vector_register) / sizeof(CDT), For example, if ISA is ASIMD, sizeof(vector_register) = 16, as the vector registers are 128 bit. And if the CDT of the function is "int", sizeof(CDT) = 4. So, VLEN = 4. 2.3. Element Data Type to Vector Data Type Mapping The vector data types for parameters are selected depending on ISA, vector length, data type of original parameter, and parameter specification. For uniform and linear parameters (detailed description could be found in [1]), the original data type is preserved. For vector parameters, vector data types are selected by the compiler. The mapping from element data type to vector data type is described as below. * The bit size of vector data type of parameter is computed as: size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8 For instance, for ASIMD version of vector function with parameter data type "int": If VLEN = 4, size_of_vector_data_type = 4 * 4 * 8 = 128 (bits), which means one argument of type __m128 to be passed. * If the size_of_vector_data_type is greater than the width of the vector register, multiple vector registers are selected and the parameter will be passed in multiple vector registers. For instance, for ASIMD version of vector function with parameter data type "int": If VLEN = 8, size_of_vector_data_type = 8 * 4 * 8 = 256 (bits), so the vector data type is __m256, which means 2 arguments of type __m128 are to be passed. 2.4. Ordering of Vector Arguments * When a parameter in the original data type results in one argument in the vector function, the ordering rule is a simple one to one match with the original argument order. For example, when the