Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
Hi Russell, On 01.07.2014 18:19, Russell King wrote: ARMv6 and greater introduced a new instruction (bx) which can be used to return from function calls. Recent CPUs perform better when the bx lr instruction is used rather than the mov pc, lr instruction, and this sequence is strongly recommended to be used by the ARM architecture manual (section A.4.1.1). We provide a new macro ret with all its variants for the condition code which will resolve to the appropriate instruction. Rather than doing this piecemeal, and miss some instances, change all the mov pc instances to use the new macro, with the exception of the movs instruction and the kprobes code. This allows us to detect the mov pc, lr case and fix it up - and also gives us the possibility of deploying this for other registers depending on the CPU selection. Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk Build- and boot-tested on mach-s3c64xx. Tested-by: Tomasz Figa tomasz.f...@gmail.com Best regards, Tomasz -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
Russell King rmk+ker...@arm.linux.org.uk writes: ARMv6 and greater introduced a new instruction (bx) which can be used to return from function calls. Recent CPUs perform better when the bx lr instruction is used rather than the mov pc, lr instruction, and this sequence is strongly recommended to be used by the ARM architecture manual (section A.4.1.1). We provide a new macro ret with all its variants for the condition code which will resolve to the appropriate instruction. When the source register is not lr the name ret is a misnomer since only the bx lr instruction is predicted as a function return. The bx instruction with other source registers uses the normal prediction mechanisms, leaving the return stack alone, and should not be used for function returns. Any code currently using another register to return from a function should probably be modified to use lr instead, unless there are special reasons for doing otherwise. If code jumping to an address in a non-lr register is not a return, using the ret name will make for some rather confusing reading. I would suggest either using a more neutral name than ret or adding an alias to be used for non-return jumps so as to make the intent clearer. -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
On Tue, Jul 01, 2014 at 05:42:42PM +0100, Måns Rullgård wrote: Russell King rmk+ker...@arm.linux.org.uk writes: ARMv6 and greater introduced a new instruction (bx) which can be used to return from function calls. Recent CPUs perform better when the bx lr instruction is used rather than the mov pc, lr instruction, and this sequence is strongly recommended to be used by the ARM architecture manual (section A.4.1.1). We provide a new macro ret with all its variants for the condition code which will resolve to the appropriate instruction. When the source register is not lr the name ret is a misnomer since only the bx lr instruction is predicted as a function return. The bx instruction with other source registers uses the normal prediction mechanisms, leaving the return stack alone, and should not be used for function returns. Any code currently using another register to return from a function should probably be modified to use lr instead, unless there are special reasons for doing otherwise. If code jumping to an address in a non-lr register is not a return, using the ret name will make for some rather confusing reading. I would suggest either using a more neutral name than ret or adding an alias to be used for non-return jumps so as to make the intent clearer. If you read the patch, the branches which are changed are those which do indeed return in some way. There are those which do this having moved lr to a different register. As you point out, bx lr /may/ be treated specially (I've actually been discussing this with Will Deacon over the last couple of days, who has also been talking to the hardware people in ARM, and Will is happy with this patch as in its current form.) This is why I've changed all mov pc, reg instructions which return in some way to use this macro, and left others (those which are used to call some function and return back to the same point) alone. I have thought about introducing a call macro for those other sites, but as there are soo few of them, I've left them as-is for the time being (this patch is already large enough.) If there are any in the patch which you have specific concerns about, please specifically point them out those which give you a concern. Thanks. -- FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly improving, and getting towards what was expected from it. -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
Russell King - ARM Linux li...@arm.linux.org.uk writes: On Tue, Jul 01, 2014 at 05:42:42PM +0100, Måns Rullgård wrote: Russell King rmk+ker...@arm.linux.org.uk writes: ARMv6 and greater introduced a new instruction (bx) which can be used to return from function calls. Recent CPUs perform better when the bx lr instruction is used rather than the mov pc, lr instruction, and this sequence is strongly recommended to be used by the ARM architecture manual (section A.4.1.1). We provide a new macro ret with all its variants for the condition code which will resolve to the appropriate instruction. When the source register is not lr the name ret is a misnomer since only the bx lr instruction is predicted as a function return. The bx instruction with other source registers uses the normal prediction mechanisms, leaving the return stack alone, and should not be used for function returns. Any code currently using another register to return from a function should probably be modified to use lr instead, unless there are special reasons for doing otherwise. If code jumping to an address in a non-lr register is not a return, using the ret name will make for some rather confusing reading. I would suggest either using a more neutral name than ret or adding an alias to be used for non-return jumps so as to make the intent clearer. If you read the patch, the branches which are changed are those which do indeed return in some way. There are those which do this having moved lr to a different register. The patch is huge, and it wasn't obvious (to me) from the diff context what the non-lr jumps were doing. As you point out, bx lr /may/ be treated specially (I've actually been Most, if not all, Cortex-A cores do this according the public TRMs. They also do the same thing for mov pc, lr so there will probably be no performance gain from this change. It's still a good idea though, since we don't know what future cores will do. discussing this with Will Deacon over the last couple of days, who has also been talking to the hardware people in ARM, and Will is happy with this patch as in its current form.) This is why I've changed all mov pc, reg instructions which return in some way to use this macro, and left others (those which are used to call some function and return back to the same point) alone. In that case the patch should be fine. Your patch description didn't make it clear that only actual returns were being changed. -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
Hi Mans, On Tue, Jul 01, 2014 at 06:24:43PM +0100, Måns Rullgård wrote: Russell King - ARM Linux li...@arm.linux.org.uk writes: As you point out, bx lr /may/ be treated specially (I've actually been Most, if not all, Cortex-A cores do this according the public TRMs. They also do the same thing for mov pc, lr so there will probably be no performance gain from this change. It's still a good idea though, since we don't know what future cores will do. Funnily enough, that's not actually true (and is more or less what prompted this patch after discussion with Russell). There are cores out there that don't predict mov pc, lr at all (let alone do anything with the return stack). discussing this with Will Deacon over the last couple of days, who has also been talking to the hardware people in ARM, and Will is happy with this patch as in its current form.) This is why I've changed all mov pc, reg instructions which return in some way to use this macro, and left others (those which are used to call some function and return back to the same point) alone. In that case the patch should be fine. Your patch description didn't make it clear that only actual returns were being changed. I'm led to believe that some predictors require lr in order to update the return stack, whilst others don't. That part is all horribly micro-architectural, so the current patch is doing the right thing by sticking to the ARM ARM but enabling us to hook into other registers later on if we choose. Cheers, Will -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
On 07/01/2014 10:19 AM, Russell King wrote: ARMv6 and greater introduced a new instruction (bx) which can be used to return from function calls. Recent CPUs perform better when the bx lr instruction is used rather than the mov pc, lr instruction, and this sequence is strongly recommended to be used by the ARM architecture manual (section A.4.1.1). We provide a new macro ret with all its variants for the condition code which will resolve to the appropriate instruction. Rather than doing this piecemeal, and miss some instances, change all the mov pc instances to use the new macro, with the exception of the movs instruction and the kprobes code. This allows us to detect the mov pc, lr case and fix it up - and also gives us the possibility of deploying this for other registers depending on the CPU selection. Tested-by: Stephen Warren swar...@nvidia.com (On an NVIDIA Tegra Jetson TK1 board, both CPU hotplug and system sleep were tested, which are the use-cases that actually use the edited assembly files) -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
Will Deacon will.dea...@arm.com writes: Hi Mans, On Tue, Jul 01, 2014 at 06:24:43PM +0100, Måns Rullgård wrote: Russell King - ARM Linux li...@arm.linux.org.uk writes: As you point out, bx lr /may/ be treated specially (I've actually been Most, if not all, Cortex-A cores do this according the public TRMs. They also do the same thing for mov pc, lr so there will probably be no performance gain from this change. It's still a good idea though, since we don't know what future cores will do. Funnily enough, that's not actually true (and is more or less what prompted this patch after discussion with Russell). There are cores out there that don't predict mov pc, lr at all (let alone do anything with the return stack). Even ones where the TRM says they do? I suppose the only way to know for sure is to measure it. discussing this with Will Deacon over the last couple of days, who has also been talking to the hardware people in ARM, and Will is happy with this patch as in its current form.) This is why I've changed all mov pc, reg instructions which return in some way to use this macro, and left others (those which are used to call some function and return back to the same point) alone. In that case the patch should be fine. Your patch description didn't make it clear that only actual returns were being changed. I'm led to believe that some predictors require lr in order to update the return stack, whilst others don't. That part is all horribly micro-architectural, so the current patch is doing the right thing by sticking to the ARM ARM but enabling us to hook into other registers later on if we choose. Agreed. -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
Russell King rmk+ker...@arm.linux.org.uk writes: ARMv6 and greater introduced a new instruction (bx) which can be used to return from function calls. Recent CPUs perform better when the bx lr instruction is used rather than the mov pc, lr instruction, and this sequence is strongly recommended to be used by the ARM architecture manual (section A.4.1.1). For ARMv5 - XScale, and more specificaly the mioa701_bootresume.S path (suspend to RAM) : Tested-by: Robert Jarzmik robert.jarz...@free.fr Cheers. -- Robert -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
On Tue, 1 Jul 2014, Will Deacon wrote: Hi Mans, On Tue, Jul 01, 2014 at 06:24:43PM +0100, Måns Rullgård wrote: Russell King - ARM Linux li...@arm.linux.org.uk writes: As you point out, bx lr /may/ be treated specially (I've actually been Most, if not all, Cortex-A cores do this according the public TRMs. They also do the same thing for mov pc, lr so there will probably be no performance gain from this change. It's still a good idea though, since we don't know what future cores will do. Funnily enough, that's not actually true (and is more or less what prompted this patch after discussion with Russell). There are cores out there that don't predict mov pc, lr at all (let alone do anything with the return stack). discussing this with Will Deacon over the last couple of days, who has also been talking to the hardware people in ARM, and Will is happy with this patch as in its current form.) This is why I've changed all mov pc, reg instructions which return in some way to use this macro, and left others (those which are used to call some function and return back to the same point) alone. In that case the patch should be fine. Your patch description didn't make it clear that only actual returns were being changed. I'm led to believe that some predictors require lr in order to update the return stack, whilst others don't. That part is all horribly micro-architectural, so the current patch is doing the right thing by sticking to the ARM ARM but enabling us to hook into other registers later on if we choose. May I suggest to have a patch with only the macro definition in it and all this discussion in the commit log please? The usage sites should be done in a separate patch to make it clearer. Nicolas