Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Wed, 7 Nov 2018, Stefan Agner wrote:

> On 06.11.2018 05:49, Nicolas Pitre wrote:
> > So here's the revised patch. It now has full compile-test coverage for
> > real this time. Would you mind reviewing it again before I resubmit it
> > please?
>
> Compile tested all copypage implementations with the revised patch using
> Clang too, everything builds fine.
>
> FWIW, I used this defconfigs:
> copypage-fa.c: moxart_defconfig
> copypage-feroceon.c: mvebu_v5_defconfig
> copypage-v4mc.c: h3600_defconfig+CONFIG_AEABI
> copypage-v4wb.c/v4wt.c: multi_v4t_defconfig
> copypage-xsc3.c/scale.c: pxa_defconfig-CONFIG_FTRACE
>
> The changes look good to me:
>
> Reviewed-by: Stefan Agner

Thanks. Submitted here:

http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8805/2

Nicolas
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On 06.11.2018 05:49, Nicolas Pitre wrote:
> On Tue, 6 Nov 2018, Stefan Agner wrote:
>
>> On 16.10.2018 22:43, Nicolas Pitre wrote:
>> > Subject: [PATCH] ARM: remove naked function usage
>> >
>> > Convert page copy functions not to rely on the naked function attribute.
>> >
>> > This attribute is known to confuse some gcc versions when function
>> > arguments aren't explicitly listed as inline assembly operands despite
>> > the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
>> > 6164/1: Add kto and kfrom to input operands list.").
>> >
>> > Yet that commit has problems of its own by having assembly operand
>> > constraints completely wrong. If the generated code has been OK since
>> > then, it is due to luck rather than correctness. So this patch provides
>> > proper assembly operand usage, and removes two instances of redundant
>> > register duplications in the implementation while at it.
>> >
>> > Inspection of the generated code with this patch doesn't show any obvious
>> > quality degradation either, so not relying on __naked at all will make
>> > the code less fragile, and more likely to be compilable with clang.
>> >
>> > The only remaining __naked instances (excluding the kprobes test cases)
>> > are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those
>> > cases only the function address is used by the compiler with no chance of
>> > inlining it by mistake.
>> >
>> > Signed-off-by: Nicolas Pitre
>>
>> As mentioned a couple of weeks ago, I did test this patchset on two
>> architectures (pxa_defconfig -> copypage-xscale.c and
>> versatile_defconfig -> copypage-v4wb.c).
>>
>> I really like this approach, can we move forward with this?
>
> Yes, the patch was submitted to the patch tracker a few days later.
>

Oh sorry, didn't realize that!

>> > + asm volatile ("\ >> > + pld [%1, #0]\n\ >> > + pld [%1, #32] \n\ >> > +1:pld [%1, #64] \n\ >> > + pld [%1, #96] \n\ >> >\n\ >> > -2:ldrdr2, [r1], #8\n\ >> > - mov ip, r0 \n\ >> > - ldrdr4, [r1], #8\n\ >> > - mcr p15, 0, ip, c7, c6, 1 @ invalidate\n\ >> > - strdr2, [r0], #8\n\ >> > - ldrdr2, [r1], #8\n\ >> > - strdr4, [r0], #8\n\ >> > - ldrdr4, [r1], #8\n\ >> > - strdr2, [r0], #8\n\ >> > - strdr4, [r0], #8\n\ >> > - ldrdr2, [r1], #8\n\ >> > - mov ip, r0 \n\ >> > - ldrdr4, [r1], #8\n\ >> > - mcr p15, 0, ip, c7, c6, 1 @ invalidate\n\ >> > - strdr2, [r0], #8\n\ >> > - ldrdr2, [r1], #8\n\ >> > - subslr, lr, #1 \n\ >> > - strdr4, [r0], #8\n\ >> > - ldrdr4, [r1], #8\n\ >> > - strdr2, [r0], #8\n\ >> > - strdr4, [r0], #8\n\ >> > +2:ldrdr2, [%1], #8\n\ >> > + ldrdr4, [%1], #8\n\ >> > + mcr p15, 0, %0, c7, c6, 1 @ invalidate\n\ >> > + strdr2, [%0], #8\n\ >> > + ldrdr2, [%1], #8\n\ >> > + strdr4, [%0], #8\n\ >> > + ldrdr4, [%1], #8\n\ >> > + strdr2, [%0], #8\n\ >> > + strdr4, [%0], #8\n\ >> > + ldrdr2, [%1], #8\n\ >> > + ldrdr4, [%1], #8\n\ >> > + mcr p15, 0, %0, c7, c6, 1 @ invalidate\n\ >> > + strdr2, [%0], #8\n\ >> > + ldrdr2, [%1], #8\n\ >> > + subs%2, %2, #1 \n\ >> > + strdr4, [%0], #8\n\ >> > + ldrdr4, [%1], #8\n\ >> > + strdr2, [%0], #8\n\ >> > + strdr4, [%0], #8\n\ >> >bgt 1b \n\ >> > - beq 2b \n\ >> > - \n\ >> > - ldmfd sp!, {r4, r5, pc}" >> > - : >> > - : "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64 - 1)); >> > + beq 2b " >> > + : "+&r" (kto), "+&r" (kfrom), "=&r" (tmp) >> > + : "2" (PAGE_SIZE / 64 - 1) >> > + : "r2", "r3", "r4", "r5"); >> >> r3 and r5 are not used above, so no need to have them in the clobber >> list. > > They are used. ldrd and strd instructions always use a pair of > consecutive registers. So "ldrd r2, ..." loads into r2-r3 and "ldrd r4, ..." > loads into r4-r5. Oh I see. The clobber list is fine then! > >> > diff --git a/ar
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, 6 Nov 2018, Robin Murphy wrote:

> On 06/11/2018 04:49, Nicolas Pitre wrote:
> [...]
> >> r3 and r5 are not used above, so no need to have them in the clobber
> >> list.
> >
> > They are used. ldrd and strd instructions always use a pair of
> > consecutive registers. So "ldrd r2, ..." loads into r2-r3 and "ldrd r4, ..."
> > loads into r4-r5.
>
> FWIW, since we should now be enabling unified syntax everywhere, I guess we
> could probably rewrite all those ldrd/strd to the UAL 3-operand form - i.e.
> "ldrd r2, r3, [...]" - if we really cared for the extra clarity.

Good idea. Worthy of a separate patch though.

Nicolas
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On 06/11/2018 04:49, Nicolas Pitre wrote:
[...]
>> r3 and r5 are not used above, so no need to have them in the clobber
>> list.
>
> They are used. ldrd and strd instructions always use a pair of
> consecutive registers. So "ldrd r2, ..." loads into r2-r3 and "ldrd r4, ..."
> loads into r4-r5.

FWIW, since we should now be enabling unified syntax everywhere, I guess we
could probably rewrite all those ldrd/strd to the UAL 3-operand form - i.e.
"ldrd r2, r3, [...]" - if we really cared for the extra clarity.

Robin.
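
[Editorial sketch, not part of the thread.] The register-pair behaviour discussed above can be modelled in portable C. This is only a toy model under stated assumptions: `struct regs` and `model_ldrd()` are hypothetical names invented for illustration, and the function mimics the ARM-state post-indexed form "ldrd rt, [addr], #8", where the first register must be even-numbered and the second word lands in the next register. It shows why "ldrd r2, ..." also overwrites r3, which is the reason r3 and r5 belong in the clobber list.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the sixteen ARM core registers r0-r15. */
struct regs {
	uint32_t r[16];
};

/*
 * Toy model of "ldrd rt, [addr], #8" in ARM state: the first register
 * must be even (and not r14), the second word goes to rt + 1, and the
 * base address is advanced by 8 bytes afterwards (post-indexing).
 */
static const uint8_t *model_ldrd(struct regs *regs, unsigned int rt,
				 const uint8_t *addr)
{
	assert(rt % 2 == 0 && rt < 14);
	memcpy(&regs->r[rt], addr, 4);		/* first word  -> rt     */
	memcpy(&regs->r[rt + 1], addr + 4, 4);	/* second word -> rt + 1 */
	return addr + 8;
}
```

Calling `model_ldrd(&regs, 2, src)` fills both `regs.r[2]` and `regs.r[3]`, even though only "r2" appears in the two-operand spelling; the UAL three-operand spelling "ldrd r2, r3, [...]" simply makes that second register visible.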
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, 6 Nov 2018, Stefan Agner wrote:

> On 16.10.2018 22:43, Nicolas Pitre wrote:
> > Subject: [PATCH] ARM: remove naked function usage
> >
> > Convert page copy functions not to rely on the naked function attribute.
> >
> > This attribute is known to confuse some gcc versions when function
> > arguments aren't explicitly listed as inline assembly operands despite
> > the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
> > 6164/1: Add kto and kfrom to input operands list.").
> >
> > Yet that commit has problems of its own by having assembly operand
> > constraints completely wrong. If the generated code has been OK since
> > then, it is due to luck rather than correctness. So this patch provides
> > proper assembly operand usage, and removes two instances of redundant
> > register duplications in the implementation while at it.
> >
> > Inspection of the generated code with this patch doesn't show any obvious
> > quality degradation either, so not relying on __naked at all will make
> > the code less fragile, and more likely to be compilable with clang.
> >
> > The only remaining __naked instances (excluding the kprobes test cases)
> > are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those
> > cases only the function address is used by the compiler with no chance of
> > inlining it by mistake.
> >
> > Signed-off-by: Nicolas Pitre
>
> As mentioned a couple of weeks ago, I did test this patchset on two
> architectures (pxa_defconfig -> copypage-xscale.c and
> versatile_defconfig -> copypage-v4wb.c).
>
> I really like this approach, can we move forward with this?

Yes, the patch was submitted to the patch tracker a few days later.

> A couple of comments below: > > > > --- > > arch/arm/mm/copypage-fa.c | 34 ++-- > > arch/arm/mm/copypage-feroceon.c | 97 +-- > > arch/arm/mm/copypage-v4mc.c | 18 +++ > > arch/arm/mm/copypage-v4wb.c | 40 +++ > > arch/arm/mm/copypage-v4wt.c | 36 ++--- > > arch/arm/mm/copypage-xsc3.c | 70 +++-- > > arch/arm/mm/copypage-xscale.c | 70 - > > 7 files changed, 171 insertions(+), 194 deletions(-) > > > > diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c > > index d130a5ece5..453a3341ca 100644 > > --- a/arch/arm/mm/copypage-fa.c > > +++ b/arch/arm/mm/copypage-fa.c > > @@ -17,26 +17,24 @@ > > /* > > * Faraday optimised copy_user_page > > */ > > -static void __naked > > -fa_copy_user_page(void *kto, const void *kfrom) > > +static void fa_copy_user_page(void *kto, const void *kfrom) > > { > > - asm("\ > > - stmfd sp!, {r4, lr} @ 2\n\ > > - mov r2, %0 @ 1\n\ > > -1: ldmia r1!, {r3, r4, ip, lr} @ 4\n\ > > - stmia r0, {r3, r4, ip, lr}@ 4\n\ > > - mcr p15, 0, r0, c7, c14, 1 @ 1 clean and invalidate D > > line\n\ > > - add r0, r0, #16 @ 1\n\ > > - ldmia r1!, {r3, r4, ip, lr} @ 4\n\ > > - stmia r0, {r3, r4, ip, lr}@ 4\n\ > > - mcr p15, 0, r0, c7, c14, 1 @ 1 clean and invalidate D > > line\n\ > > - add r0, r0, #16 @ 1\n\ > > - subsr2, r2, #1 @ 1\n\ > > + int tmp; > > There should be an empty line here. Yeah... there should. 
> > + asm volatile ("\ > > +1: ldmia %1!, {r3, r4, ip, lr} @ 4\n\ > > + stmia %0, {r3, r4, ip, lr}@ 4\n\ > > + mcr p15, 0, %0, c7, c14, 1 @ 1 clean and invalidate D > > line\n\ > > + add %0, %0, #16 @ 1\n\ > > + ldmia %1!, {r3, r4, ip, lr} @ 4\n\ > > + stmia %0, {r3, r4, ip, lr}@ 4\n\ > > + mcr p15, 0, %0, c7, c14, 1 @ 1 clean and invalidate D > > line\n\ > > + add %0, %0, #16 @ 1\n\ > > + subs%2, %2, #1 @ 1\n\ > > bne 1b @ 1\n\ > > - mcr p15, 0, r2, c7, c10, 4 @ 1 drain WB\n\ > > - ldmfd sp!, {r4, pc} @ 3" > > - : > > - : "I" (PAGE_SIZE / 32)); > > + mcr p15, 0, %2, c7, c10, 4 @ 1 drain WB" > > + : "+&r" (kto), "+&r" (kfrom), "=&r" "tmp) > > There is sneaked in a " before tmp instead of (. Good catch. I did compile-test all the existing defconfigs though. Apparently this file is not covered? > > diff --git a/arch/arm/mm/copypage-feroceon.c > > b/arch/arm/mm/copypage-feroceon.c > > index 49ee0c1a72..1349430c63 100644 > > --- a/arch/arm/mm/copypage-feroceon.c > > +++ b/arch/arm/mm/copypage-feroceon.c > > @@ -13,58 +13,55 @@ > > #include > > #include > > > > -static void __naked > > -feroceon_copy_user_page(void *kto, const void *kfrom) > > +static void feroceon_copy_user_page(void *kto, const void *kfrom) > > { > > - asm("\ > > - stmfd sp!, {r4-r9, lr}
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On 16.10.2018 22:43, Nicolas Pitre wrote: > On Tue, 16 Oct 2018, Russell King - ARM Linux wrote: > >> On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote: >> > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner wrote: >> > >> > > When functions incoming parameters are not in input operands list gcc >> > > 4.5 does not load the parameters into registers before calling this >> > > function but the inline assembly assumes valid addresses inside this >> > > function. This breaks the code because r0 and r1 are invalid when >> > > execution enters v4wb_copy_user_page () >> > > >> > > Also the constant needs to be used as third input operand so account >> > > for that as well. >> > > >> > > This fixes copypage-fa.c what has previously done before for the other >> > > copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto >> > > and kfrom to input operands list."). >> > > >> > > Signed-off-by: Stefan Agner >> > >> > Please add: >> > Cc: sta...@vger.kernel.org >> >> It's not obvious yet whether this is right - it contradicts the GCC >> manual, but then we have evidence that it's required for some GCC >> versions where GCC may clone the function, or if the function is >> used within the same file. > > Why not getting rid of __naked altogether? Here's what I suggest: > > - >8 > Subject: [PATCH] ARM: remove naked function usage > > Convert page copy functions not to rely on the naked function attribute. > > This attribute is known to confuse some gcc versions when function > arguments aren't explicitly listed as inline assembly operands despite > the gcc documentation. That resulted in commit 9a40ac86152c ("ARM: > 6164/1: Add kto and kfrom to input operands list."). > > Yet that commit has problems of its own by having assembly operand > constraints completely wrong. If the generated code has been OK since > then, it is due to luck rather than correctness. 
So this patch provides > proper assembly operand usage, and removes two instances of redundant > register duplications in the implementation while at it. > > Inspection of the generated code with this patch doesn't show any obvious > quality degradation either, so not relying on __naked at all will make > the code less fragile, and more likely to be compilable with clang. > > The only remaining __naked instances (excluding the kprobes test cases) > are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those > cases only the function address is used by the compiler with no chance of > inlining it by mistake. > > Signed-off-by: Nicolas Pitre As mentioned a couple of weeks ago, I did test this patchset on two architectures (pxa_defconfig -> copypage-xscale.c and versatile_defconfig -> copypage-v4wb.c). I really like this approach, can we move forward with this? A couple of comments below: > --- > arch/arm/mm/copypage-fa.c | 34 ++-- > arch/arm/mm/copypage-feroceon.c | 97 +-- > arch/arm/mm/copypage-v4mc.c | 18 +++ > arch/arm/mm/copypage-v4wb.c | 40 +++ > arch/arm/mm/copypage-v4wt.c | 36 ++--- > arch/arm/mm/copypage-xsc3.c | 70 +++-- > arch/arm/mm/copypage-xscale.c | 70 - > 7 files changed, 171 insertions(+), 194 deletions(-) > > diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c > index d130a5ece5..453a3341ca 100644 > --- a/arch/arm/mm/copypage-fa.c > +++ b/arch/arm/mm/copypage-fa.c > @@ -17,26 +17,24 @@ > /* > * Faraday optimised copy_user_page > */ > -static void __naked > -fa_copy_user_page(void *kto, const void *kfrom) > +static void fa_copy_user_page(void *kto, const void *kfrom) > { > - asm("\ > - stmfd sp!, {r4, lr} @ 2\n\ > - mov r2, %0 @ 1\n\ > -1: ldmia r1!, {r3, r4, ip, lr} @ 4\n\ > - stmia r0, {r3, r4, ip, lr}@ 4\n\ > - mcr p15, 0, r0, c7, c14, 1 @ 1 clean and invalidate D > line\n\ > - add r0, r0, #16 @ 1\n\ > - ldmia r1!, {r3, r4, ip, lr} @ 4\n\ > - stmia r0, {r3, r4, ip, lr}@ 4\n\ > - mcr p15, 0, r0, c7, c14, 1 @ 1 clean and 
invalidate D > line\n\ > - add r0, r0, #16 @ 1\n\ > - subsr2, r2, #1 @ 1\n\ > + int tmp; There should be an empty line here. > + asm volatile ("\ > +1: ldmia %1!, {r3, r4, ip, lr} @ 4\n\ > + stmia %0, {r3, r4, ip, lr}@ 4\n\ > + mcr p15, 0, %0, c7, c14, 1 @ 1 clean and invalidate D > line\n\ > + add %0, %0, #16 @ 1\n\ > + ldmia %1!, {r3, r4, ip, lr} @ 4\n\ > + stmia %0, {r3, r4, ip, lr}@ 4\n\ > + mcr p15, 0, %0, c7, c14, 1 @ 1 clean and invalidate D > line\n\ > + add %0, %0, #16 @ 1\n\ > + subs%2, %
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Wed, 17 Oct 2018, Arnd Bergmann wrote: > On Tue, Oct 16, 2018 at 10:43 PM Nicolas Pitre > wrote: > > On Tue, 16 Oct 2018, Russell King - ARM Linux wrote: > > > On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote: > > > > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner wrote: > > > It's not obvious yet whether this is right - it contradicts the GCC > > > manual, but then we have evidence that it's required for some GCC > > > versions where GCC may clone the function, or if the function is > > > used within the same file. > > > > Why not getting rid of __naked altogether? Here's what I suggest: > > > > - >8 > > Subject: [PATCH] ARM: remove naked function usage > > > > Convert page copy functions not to rely on the naked function attribute. > > > > This attribute is known to confuse some gcc versions when function > > arguments aren't explicitly listed as inline assembly operands despite > > the gcc documentation. That resulted in commit 9a40ac86152c ("ARM: > > 6164/1: Add kto and kfrom to input operands list."). > > It's probably worth noting that the minimum gcc version for compiling > the kernel is now gcc-4.6, which I think does not suffer from the gcc-4.5 > bug that triggered the change. See in particular commits 9c695203a7dd > ("compiler-gcc.h: gcc-4.5 needs noclone and noinline on __naked functions") > and d124b44f09ca ("Compiler Attributes: naked was fixed in gcc 4.6"). > > The first one made sure we don't inline these functions, so gcc-4.5 > no longer runs into the problem even in the absence of the workaround, > and the second patch reverts that again, noting that gcc-4.6 is fixed. > > I don't see anything wrong with converting the functions to not > use __naked at all, but I think we can also just revert the original > commit 9a40ac86152c to get it to build with clang. When I last > played with clang on arm32, that's what I did. I'll reply with the > patch I have in my randconfig tree. 

The __naked attribute has idiosyncrasies of its own, regardless of any
potential bugs, that sometimes make the code harder to maintain and
prevent extra optimizations that the compiler could otherwise take care
of. So I think it is a good thing to get rid of __naked when its usage
isn't necessary, as in the instances in this patch. The remaining
instances are cases where there is simply no stack available, which
makes __naked necessary there.

Nicolas
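
[Editorial sketch, not part of the thread.] The approach argued for above, an ordinary function whose inline assembly declares everything it reads, writes and clobbers, could look roughly like the following. It is a simplified illustration and not the patch itself: `sketch_copy_page()` and `SKETCH_PAGE_SIZE` are made-up names, the 4 KiB page size is an assumption, and the "cc" and "memory" clobbers are additions of this sketch that the quoted patch does not list. On anything other than 32-bit ARM in ARM state the sketch falls back to memcpy() so that it still builds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical non-naked page copy; both pointers must be word aligned. */
static void sketch_copy_page(void *to, const void *from)
{
	assert(((uintptr_t)to & 3u) == 0 && ((uintptr_t)from & 3u) == 0);
#if defined(__arm__) && !defined(__thumb__)
	int tmp;

	/*
	 * %0 and %1 are read and written ("+&r"), %2 is a scratch loop
	 * counter initialised through the matching constraint "2".
	 * Every register named in the template is in the clobber list,
	 * so the compiler saves and restores whatever it needs to.
	 */
	__asm__ __volatile__ (
	"1:	ldmia	%1!, {r3, r4, ip, lr}\n"
	"	stmia	%0!, {r3, r4, ip, lr}\n"
	"	subs	%2, %2, #1\n"
	"	bne	1b"
	: "+&r" (to), "+&r" (from), "=&r" (tmp)
	: "2" (SKETCH_PAGE_SIZE / 16)
	: "r3", "r4", "ip", "lr", "cc", "memory");
#else
	/* Portable fallback so the sketch compiles on other targets. */
	memcpy(to, from, SKETCH_PAGE_SIZE);
#endif
}
```

Because the function has a normal prologue and epilogue, there is no hand-written stmfd/ldmfd pair to keep in sync with the registers used, and the compiler stays free to inline or clone the function without breaking the assembly's assumptions about r0 and r1.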
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, Oct 16, 2018 at 10:43 PM Nicolas Pitre wrote:
> On Tue, 16 Oct 2018, Russell King - ARM Linux wrote:
> > On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:
> > > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner wrote:

> > It's not obvious yet whether this is right - it contradicts the GCC
> > manual, but then we have evidence that it's required for some GCC
> > versions where GCC may clone the function, or if the function is
> > used within the same file.
>
> Why not getting rid of __naked altogether? Here's what I suggest:
>
> - >8
> Subject: [PATCH] ARM: remove naked function usage
>
> Convert page copy functions not to rely on the naked function attribute.
>
> This attribute is known to confuse some gcc versions when function
> arguments aren't explicitly listed as inline assembly operands despite
> the gcc documentation. That resulted in commit 9a40ac86152c ("ARM:
> 6164/1: Add kto and kfrom to input operands list.").

It's probably worth noting that the minimum gcc version for compiling
the kernel is now gcc-4.6, which I think does not suffer from the gcc-4.5
bug that triggered the change. See in particular commits 9c695203a7dd
("compiler-gcc.h: gcc-4.5 needs noclone and noinline on __naked functions")
and d124b44f09ca ("Compiler Attributes: naked was fixed in gcc 4.6").

The first one made sure we don't inline these functions, so gcc-4.5
no longer runs into the problem even in the absence of the workaround,
and the second patch reverts that again, noting that gcc-4.6 is fixed.

I don't see anything wrong with converting the functions to not
use __naked at all, but I think we can also just revert the original
commit 9a40ac86152c to get it to build with clang. When I last
played with clang on arm32, that's what I did. I'll reply with the
patch I have in my randconfig tree.

Arnd
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On 16.10.2018 22:43, Nicolas Pitre wrote: > On Tue, 16 Oct 2018, Russell King - ARM Linux wrote: > >> On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote: >> > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner wrote: >> > >> > > When functions incoming parameters are not in input operands list gcc >> > > 4.5 does not load the parameters into registers before calling this >> > > function but the inline assembly assumes valid addresses inside this >> > > function. This breaks the code because r0 and r1 are invalid when >> > > execution enters v4wb_copy_user_page () >> > > >> > > Also the constant needs to be used as third input operand so account >> > > for that as well. >> > > >> > > This fixes copypage-fa.c what has previously done before for the other >> > > copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto >> > > and kfrom to input operands list."). >> > > >> > > Signed-off-by: Stefan Agner >> > >> > Please add: >> > Cc: sta...@vger.kernel.org >> >> It's not obvious yet whether this is right - it contradicts the GCC >> manual, but then we have evidence that it's required for some GCC >> versions where GCC may clone the function, or if the function is >> used within the same file. > > Why not getting rid of __naked altogether? Here's what I suggest: > > - >8 > Subject: [PATCH] ARM: remove naked function usage > > Convert page copy functions not to rely on the naked function attribute. > > This attribute is known to confuse some gcc versions when function > arguments aren't explicitly listed as inline assembly operands despite > the gcc documentation. That resulted in commit 9a40ac86152c ("ARM: > 6164/1: Add kto and kfrom to input operands list."). > > Yet that commit has problems of its own by having assembly operand > constraints completely wrong. If the generated code has been OK since > then, it is due to luck rather than correctness. 
So this patch provides > proper assembly operand usage, and removes two instances of redundant > register duplications in the implementation while at it. > > Inspection of the generated code with this patch doesn't show any obvious > quality degradation either, so not relying on __naked at all will make > the code less fragile, and more likely to be compilable with clang. > > The only remaining __naked instances (excluding the kprobes test cases) > are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those > cases only the function address is used by the compiler with no chance of > inlining it by mistake. Tested using Qemu mainstone and versatileab (pxa_defconfig-CONFIG_FTRACE and versatile_defconfig) compiled with Clang 7.0. Both configuration compile and boot fine. So from that perspective: Tested-by: Stefan Agner -- Stefan > > Signed-off-by: Nicolas Pitre > --- > arch/arm/mm/copypage-fa.c | 34 ++-- > arch/arm/mm/copypage-feroceon.c | 97 +-- > arch/arm/mm/copypage-v4mc.c | 18 +++ > arch/arm/mm/copypage-v4wb.c | 40 +++ > arch/arm/mm/copypage-v4wt.c | 36 ++--- > arch/arm/mm/copypage-xsc3.c | 70 +++-- > arch/arm/mm/copypage-xscale.c | 70 - > 7 files changed, 171 insertions(+), 194 deletions(-) > > diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c > index d130a5ece5..453a3341ca 100644 > --- a/arch/arm/mm/copypage-fa.c > +++ b/arch/arm/mm/copypage-fa.c > @@ -17,26 +17,24 @@ > /* > * Faraday optimised copy_user_page > */ > -static void __naked > -fa_copy_user_page(void *kto, const void *kfrom) > +static void fa_copy_user_page(void *kto, const void *kfrom) > { > - asm("\ > - stmfd sp!, {r4, lr} @ 2\n\ > - mov r2, %0 @ 1\n\ > -1: ldmia r1!, {r3, r4, ip, lr} @ 4\n\ > - stmia r0, {r3, r4, ip, lr}@ 4\n\ > - mcr p15, 0, r0, c7, c14, 1 @ 1 clean and invalidate D > line\n\ > - add r0, r0, #16 @ 1\n\ > - ldmia r1!, {r3, r4, ip, lr} @ 4\n\ > - stmia r0, {r3, r4, ip, lr}@ 4\n\ > - mcr p15, 0, r0, c7, c14, 1 @ 1 clean and invalidate D > line\n\ > - 
add r0, r0, #16 @ 1\n\ > - subsr2, r2, #1 @ 1\n\ > + int tmp; > + asm volatile ("\ > +1: ldmia %1!, {r3, r4, ip, lr} @ 4\n\ > + stmia %0, {r3, r4, ip, lr}@ 4\n\ > + mcr p15, 0, %0, c7, c14, 1 @ 1 clean and invalidate D > line\n\ > + add %0, %0, #16 @ 1\n\ > + ldmia %1!, {r3, r4, ip, lr} @ 4\n\ > + stmia %0, {r3, r4, ip, lr}@ 4\n\ > + mcr p15, 0, %0, c7, c14, 1 @ 1 clean and invalidate D > line\n\ > + add %0, %0, #16 @ 1\n\ > + subs%2, %2, #1 @ 1\n\ > bne 1b
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, 16 Oct 2018, Russell King - ARM Linux wrote: > On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote: > > On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner wrote: > > > > > When functions incoming parameters are not in input operands list gcc > > > 4.5 does not load the parameters into registers before calling this > > > function but the inline assembly assumes valid addresses inside this > > > function. This breaks the code because r0 and r1 are invalid when > > > execution enters v4wb_copy_user_page () > > > > > > Also the constant needs to be used as third input operand so account > > > for that as well. > > > > > > This fixes copypage-fa.c what has previously done before for the other > > > copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto > > > and kfrom to input operands list."). > > > > > > Signed-off-by: Stefan Agner > > > > Please add: > > Cc: sta...@vger.kernel.org > > It's not obvious yet whether this is right - it contradicts the GCC > manual, but then we have evidence that it's required for some GCC > versions where GCC may clone the function, or if the function is > used within the same file. Why not getting rid of __naked altogether? Here's what I suggest: - >8 Subject: [PATCH] ARM: remove naked function usage Convert page copy functions not to rely on the naked function attribute. This attribute is known to confuse some gcc versions when function arguments aren't explicitly listed as inline assembly operands despite the gcc documentation. That resulted in commit 9a40ac86152c ("ARM: 6164/1: Add kto and kfrom to input operands list."). Yet that commit has problems of its own by having assembly operand constraints completely wrong. If the generated code has been OK since then, it is due to luck rather than correctness. So this patch provides proper assembly operand usage, and removes two instances of redundant register duplications in the implementation while at it. 
Inspection of the generated code with this patch doesn't show any obvious quality degradation either, so not relying on __naked at all will make the code less fragile, and more likely to be compilable with clang. The only remaining __naked instances (excluding the kprobes test cases) are exynos_pm_power_up_setup() and tc2_pm_power_up_setup(). But in those cases only the function address is used by the compiler with no chance of inlining it by mistake. Signed-off-by: Nicolas Pitre --- arch/arm/mm/copypage-fa.c | 34 ++-- arch/arm/mm/copypage-feroceon.c | 97 +-- arch/arm/mm/copypage-v4mc.c | 18 +++ arch/arm/mm/copypage-v4wb.c | 40 +++ arch/arm/mm/copypage-v4wt.c | 36 ++--- arch/arm/mm/copypage-xsc3.c | 70 +++-- arch/arm/mm/copypage-xscale.c | 70 - 7 files changed, 171 insertions(+), 194 deletions(-) diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c index d130a5ece5..453a3341ca 100644 --- a/arch/arm/mm/copypage-fa.c +++ b/arch/arm/mm/copypage-fa.c @@ -17,26 +17,24 @@ /* * Faraday optimised copy_user_page */ -static void __naked -fa_copy_user_page(void *kto, const void *kfrom) +static void fa_copy_user_page(void *kto, const void *kfrom) { - asm("\ - stmfd sp!, {r4, lr} @ 2\n\ - mov r2, %0 @ 1\n\ -1: ldmia r1!, {r3, r4, ip, lr} @ 4\n\ - stmia r0, {r3, r4, ip, lr}@ 4\n\ - mcr p15, 0, r0, c7, c14, 1 @ 1 clean and invalidate D line\n\ - add r0, r0, #16 @ 1\n\ - ldmia r1!, {r3, r4, ip, lr} @ 4\n\ - stmia r0, {r3, r4, ip, lr}@ 4\n\ - mcr p15, 0, r0, c7, c14, 1 @ 1 clean and invalidate D line\n\ - add r0, r0, #16 @ 1\n\ - subsr2, r2, #1 @ 1\n\ + int tmp; + asm volatile ("\ +1: ldmia %1!, {r3, r4, ip, lr} @ 4\n\ + stmia %0, {r3, r4, ip, lr}@ 4\n\ + mcr p15, 0, %0, c7, c14, 1 @ 1 clean and invalidate D line\n\ + add %0, %0, #16 @ 1\n\ + ldmia %1!, {r3, r4, ip, lr} @ 4\n\ + stmia %0, {r3, r4, ip, lr}@ 4\n\ + mcr p15, 0, %0, c7, c14, 1 @ 1 clean and invalidate D line\n\ + add %0, %0, #16 @ 1\n\ + subs%2, %2, #1 @ 1\n\ bne 1b @ 1\n\ - mcr p15, 0, r2, c7, c10, 4 @ 1 
drain WB\n\ - ldmfd sp!, {r4, pc} @ 3" - : - : "I" (PAGE_SIZE / 32)); + mcr p15, 0, %2, c7, c10, 4 @ 1 drain WB" + : "+&r" (kto), "+&r" (kfrom), "=&r" "tmp) + : "2" (PAGE_SIZE / 32) + : "r3", "r4", "ip", "lr"); } void fa_copy_user_highpage(struct p
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, Oct 16, 2018 at 10:44 AM Russell King - ARM Linux wrote:
> On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:

> > I am on deep waters with ARM assembly, admittedly. So I wanted to
> > ask: OpenWRT has this cache patch:
> > https://github.com/openwrt/openwrt/blob/master/target/linux/gemini/patches-4.14/0001-cache-patch-from-OpenWRT.patch
> > I do not know why (sorry).
> >
> > Do you think that patch is actually a hack to hide the problem
> > fixed with this patch? (OK maybe stupid question but...)
>
> No, it looks to me like a hack to make DMA cache handling "more
> efficient" by cleaning/invalidating the entire cache when dealing
> with large streaming buffers.

Aha that makes a lot of sense. I will attempt to drop it from
OpenWRT in the next kernel upgrade unless benchmarks can show
that it is worth it.

Thanks Russell!
Linus Walleij
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:
> On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner wrote:
>
> > When functions incoming parameters are not in input operands list gcc
> > 4.5 does not load the parameters into registers before calling this
> > function but the inline assembly assumes valid addresses inside this
> > function. This breaks the code because r0 and r1 are invalid when
> > execution enters v4wb_copy_user_page ()
> >
> > Also the constant needs to be used as third input operand so account
> > for that as well.
> >
> > This fixes copypage-fa.c what has previously done before for the other
> > copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
> > and kfrom to input operands list.").
> >
> > Signed-off-by: Stefan Agner
>
> Please add:
> Cc: sta...@vger.kernel.org

It's not obvious yet whether this is right - it contradicts the GCC
manual, but then we have evidence that it's required for some GCC
versions where GCC may clone the function, or if the function is
used within the same file.

> I am on deep waters with ARM assembly, admittedly. So I wanted to
> ask: OpenWRT has this cache patch:
> https://github.com/openwrt/openwrt/blob/master/target/linux/gemini/patches-4.14/0001-cache-patch-from-OpenWRT.patch
> I do not know why (sorry).
>
> Do you think that patch is actually a hack to hide the problem
> fixed with this patch? (OK maybe stupid question but...)

No, it looks to me like a hack to make DMA cache handling "more
efficient" by cleaning/invalidating the entire cache when dealing
with large streaming buffers.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
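
[Editorial sketch, not part of the thread.] The distinction drawn above, between a function that is also "used within the same file" and one that is only reached through a pointer, can be illustrated in plain C. All names here (`sketch_copy_ops`, `sketch_copy_direct()`, `sketch_copy_highpage()`, `SKETCH_CHUNK`) are hypothetical, and the 64-byte copy stands in for a real page copy. The `noinline` attribute is the GCC/Clang spelling; whether a given compiler version would actually inline or clone the helper without it is not something this sketch can demonstrate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE	/* assumption: no equivalent attribute available */
#endif

enum { SKETCH_CHUNK = 64 };	/* stand-in for a page-sized copy */

/* Hypothetical ops table: callers reach the copy routine via a pointer. */
struct sketch_copy_ops {
	void (*copy_page)(void *to, const void *from);
};

/*
 * Called directly from sketch_copy_highpage() in the same file, so the
 * compiler sees the call site and could inline or clone this function.
 * The attribute asks it not to inline.
 */
static SKETCH_NOINLINE void sketch_copy_direct(void *to, const void *from)
{
	assert(to != NULL && from != NULL);
	memcpy(to, from, SKETCH_CHUNK);
}

/* Wrapper with a direct, same-file call to the helper above. */
static void sketch_copy_highpage(void *to, const void *from)
{
	sketch_copy_direct(to, from);
}

/* The wrapper itself is only published through the ops table. */
static const struct sketch_copy_ops sketch_ops = {
	.copy_page = sketch_copy_highpage,
};
```

A caller would go through `sketch_ops.copy_page(dst, src)`; only the direct call inside `sketch_copy_highpage()` gives the compiler an obvious inlining opportunity, which is the situation the `noinline` marking addresses.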
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner wrote:

> When functions incoming parameters are not in input operands list gcc
> 4.5 does not load the parameters into registers before calling this
> function but the inline assembly assumes valid addresses inside this
> function. This breaks the code because r0 and r1 are invalid when
> execution enters v4wb_copy_user_page ()
>
> Also the constant needs to be used as third input operand so account
> for that as well.
>
> This fixes copypage-fa.c what has previously done before for the other
> copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
> and kfrom to input operands list.").
>
> Signed-off-by: Stefan Agner

Please add:
Cc: sta...@vger.kernel.org

I am on deep waters with ARM assembly, admittedly. So I wanted to
ask: OpenWRT has this cache patch:
https://github.com/openwrt/openwrt/blob/master/target/linux/gemini/patches-4.14/0001-cache-patch-from-OpenWRT.patch
I do not know why (sorry).

Do you think that patch is actually a hack to hide the problem
fixed with this patch? (OK maybe stupid question but...)
it appeared anonymously in OpenWRT with the commit message
"add v3.18 support" at one point.

Yours,
Linus Walleij
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, Oct 16, 2018 at 12:52:58AM +0200, Stefan Agner wrote:
> On 16.10.2018 00:46, Russell King - ARM Linux wrote:
> > On Tue, Oct 16, 2018 at 12:39:54AM +0200, Stefan Agner wrote:
> >> On 16.10.2018 00:23, Russell King - ARM Linux wrote:
> >> > On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
> >> >> When functions incoming parameters are not in input operands list gcc
> >> >> 4.5 does not load the parameters into registers before calling this
> >> >> function but the inline assembly assumes valid addresses inside this
> >> >> function. This breaks the code because r0 and r1 are invalid when
> >> >> execution enters v4wb_copy_user_page ()
> >> >
> >> > NAK. Naked functions must never be inlined. Please add a "noinline"
> >> > attribute to the function rather than making things more complex.
> >> >
> >>
> >> To be honest, I did not put much thought into this commit since it is
> >> just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
> >> kfrom to input operands list.") has been done to the other copypage
> >> implementations...
> >>
> >> [adding Khem]
> >>
> >> > The GCC manual states:
> >> >
> >> > `naked'
> >> > Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
> >> > and SPU ports to indicate that the specified function does not
> >> > need prologue/epilogue sequences generated by the compiler. It is
> >> > up to the programmer to provide these sequences. The only
> >> >
> >> > statements that can be safely included in naked functions are
> >> > ^
> >> > `asm' statements that do not have operands. All other statements,
> >> > ^^^
> >> > including declarations of local variables, `if' statements, and so
> >> > forth, should be avoided. Naked functions should be used to
> >> > implement the body of an assembly function, while allowing the
> >> > compiler to construct the requisite function declaration for the
> >> > assembler.
> >> >
> >> > The 'I' attribute is fine here because it is a constant that is not
> >> > allowed to be in a register (and hence has no code generation side
> >> > effects.)
> >> >
> >> > Adding operands for the input parameters, however, isn't going to
> >> > work around the fact that _this_ assembly is written to be out of
> >> > line and so it must never be inlined by the compiler.
> >>
> >> I briefly looked at a disassembled version after applying both patches,
> >> it indeed leads to inlining. However, the code seems to be working
> >> (thanks to asm volatile?)...
> >
> > Apart from v4wb_copy_user_page() and mc_copy_user_page(), how is
> > Clang inlining these static functions that are only used through
> > function pointers?
>
> I only looked at copypage-xscale.c (the mc_copy_user_page() case)...

The two I mention are different from the rest, because they are used
from other functions within the same file. The rest are all used
through function pointers and should, therefore, never be inlined.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
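The distinction drawn in the message above, between a copy routine called directly from the same file and one reached only through a function pointer, can be sketched in portable C. This is an illustration under assumptions, not kernel code: demo_user_fns, demo_copy_user_page(), demo_copy_page() and DEMO_PAGE_SIZE are hypothetical names, and memcpy() stands in for the hand-written assembly.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical ops table holding the address of a page copy routine. */
struct demo_user_fns {
	void (*copy_user_page)(void *kto, const void *kfrom);
};

/*
 * Static function that is never called by name in this file. Its
 * address is stored in the table below, so the compiler has to emit
 * an out-of-line body for it.
 */
static void demo_copy_user_page(void *kto, const void *kfrom)
{
	memcpy(kto, kfrom, DEMO_PAGE_SIZE);
}

const struct demo_user_fns demo_user_fns = {
	.copy_user_page = demo_copy_user_page,
};

/* Callers only see the table and make an indirect call through it. */
void demo_copy_page(const struct demo_user_fns *fns, void *to,
		    const void *from)
{
	fns->copy_user_page(to, from);
}
```

Taking the address forces an out-of-line copy to exist, but it is not a language-level guarantee against inlining: an optimizer that can see the table contents at the call site may still resolve the indirect call.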
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On 16.10.2018 00:46, Russell King - ARM Linux wrote:
> On Tue, Oct 16, 2018 at 12:39:54AM +0200, Stefan Agner wrote:
>> On 16.10.2018 00:23, Russell King - ARM Linux wrote:
>> > On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
>> >> When functions incoming parameters are not in input operands list gcc
>> >> 4.5 does not load the parameters into registers before calling this
>> >> function but the inline assembly assumes valid addresses inside this
>> >> function. This breaks the code because r0 and r1 are invalid when
>> >> execution enters v4wb_copy_user_page ()
>> >
>> > NAK. Naked functions must never be inlined. Please add a "noinline"
>> > attribute to the function rather than making things more complex.
>> >
>>
>> To be honest, I did not put much thought into this commit since it is
>> just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
>> kfrom to input operands list.") has been done to the other copypage
>> implementations...
>>
>> [adding Khem]
>>
>> > The GCC manual states:
>> >
>> > `naked'
>> > Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
>> > and SPU ports to indicate that the specified function does not
>> > need prologue/epilogue sequences generated by the compiler. It is
>> > up to the programmer to provide these sequences. The only
>> >
>> > statements that can be safely included in naked functions are
>> > ^
>> > `asm' statements that do not have operands. All other statements,
>> > ^^^
>> > including declarations of local variables, `if' statements, and so
>> > forth, should be avoided. Naked functions should be used to
>> > implement the body of an assembly function, while allowing the
>> > compiler to construct the requisite function declaration for the
>> > assembler.
>> >
>> > The 'I' attribute is fine here because it is a constant that is not
>> > allowed to be in a register (and hence has no code generation side
>> > effects.)
>> >
>> > Adding operands for the input parameters, however, isn't going to
>> > work around the fact that _this_ assembly is written to be out of
>> > line and so it must never be inlined by the compiler.
>>
>> I briefly looked at a disassembled version after applying both patches,
>> it indeed leads to inlining. However, the code seems to be working
>> (thanks to asm volatile?)...
>
> Apart from v4wb_copy_user_page() and mc_copy_user_page(), how is
> Clang inlining these static functions that are only used through
> function pointers?

I only looked at copypage-xscale.c (the mc_copy_user_page() case)...

--
Stefan
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, Oct 16, 2018 at 12:39:54AM +0200, Stefan Agner wrote:
> On 16.10.2018 00:23, Russell King - ARM Linux wrote:
> > On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
> >> When functions incoming parameters are not in input operands list gcc
> >> 4.5 does not load the parameters into registers before calling this
> >> function but the inline assembly assumes valid addresses inside this
> >> function. This breaks the code because r0 and r1 are invalid when
> >> execution enters v4wb_copy_user_page ()
> >
> > NAK. Naked functions must never be inlined. Please add a "noinline"
> > attribute to the function rather than making things more complex.
> >
>
> To be honest, I did not put much thought into this commit since it is
> just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
> kfrom to input operands list.") has been done to the other copypage
> implementations...
>
> [adding Khem]
>
> > The GCC manual states:
> >
> > `naked'
> > Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
> > and SPU ports to indicate that the specified function does not
> > need prologue/epilogue sequences generated by the compiler. It is
> > up to the programmer to provide these sequences. The only
> >
> > statements that can be safely included in naked functions are
> > ^
> > `asm' statements that do not have operands. All other statements,
> > ^^^
> > including declarations of local variables, `if' statements, and so
> > forth, should be avoided. Naked functions should be used to
> > implement the body of an assembly function, while allowing the
> > compiler to construct the requisite function declaration for the
> > assembler.
> >
> > The 'I' attribute is fine here because it is a constant that is not
> > allowed to be in a register (and hence has no code generation side
> > effects.)
> >
> > Adding operands for the input parameters, however, isn't going to
> > work around the fact that _this_ assembly is written to be out of
> > line and so it must never be inlined by the compiler.
>
> I briefly looked at a disassembled version after applying both patches,
> it indeed leads to inlining. However, the code seems to be working
> (thanks to asm volatile?)...

Apart from v4wb_copy_user_page() and mc_copy_user_page(), how is
Clang inlining these static functions that are only used through
function pointers?

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On 16.10.2018 00:23, Russell King - ARM Linux wrote:
> On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
>> When functions incoming parameters are not in input operands list gcc
>> 4.5 does not load the parameters into registers before calling this
>> function but the inline assembly assumes valid addresses inside this
>> function. This breaks the code because r0 and r1 are invalid when
>> execution enters v4wb_copy_user_page ()
>
> NAK. Naked functions must never be inlined. Please add a "noinline"
> attribute to the function rather than making things more complex.
>

To be honest, I did not put much thought into this commit since it is
just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
kfrom to input operands list.") has been done to the other copypage
implementations...

[adding Khem]

> The GCC manual states:
>
> `naked'
> Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
> and SPU ports to indicate that the specified function does not
> need prologue/epilogue sequences generated by the compiler. It is
> up to the programmer to provide these sequences. The only
>
> statements that can be safely included in naked functions are
> ^
> `asm' statements that do not have operands. All other statements,
> ^^^
> including declarations of local variables, `if' statements, and so
> forth, should be avoided. Naked functions should be used to
> implement the body of an assembly function, while allowing the
> compiler to construct the requisite function declaration for the
> assembler.
>
> The 'I' attribute is fine here because it is a constant that is not
> allowed to be in a register (and hence has no code generation side
> effects.)
>
> Adding operands for the input parameters, however, isn't going to
> work around the fact that _this_ assembly is written to be out of
> line and so it must never be inlined by the compiler.

I briefly looked at a disassembled version after applying both patches,
it indeed leads to inlining. However, the code seems to be working
(thanks to asm volatile?)...

Anyway, my goal is actually what patch 2 ("ARM: copypage: do not use
naked functions") is doing: Make Clang happy. As a matter of fact,
reverting 9a40ac86152c actually fixes compilation for Clang too, and
seems to lead to a working Kernel (tested with versatile_defconfig in
Qemu), so maybe that is what we should do here?

--
Stefan

>
>> Also the constant needs to be used as third input operand so account
>> for that as well.
>>
>> This fixes copypage-fa.c what has previously done before for the other
>> copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
>> and kfrom to input operands list.").
>>
>> Signed-off-by: Stefan Agner
>> ---
>> arch/arm/mm/copypage-fa.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
>> index d130a5ece5d5..ec6501308c60 100644
>> --- a/arch/arm/mm/copypage-fa.c
>> +++ b/arch/arm/mm/copypage-fa.c
>> @@ -22,7 +22,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
>> {
>> asm("\
>> stmfd sp!, {r4, lr} @ 2\n\
>> -mov r2, %0 @ 1\n\
>> +mov r2, %2 @ 1\n\
>> 1: ldmia r1!, {r3, r4, ip, lr} @ 4\n\
>> stmia r0, {r3, r4, ip, lr}@ 4\n\
>> mcr p15, 0, r0, c7, c14, 1 @ 1 clean and invalidate D
>> line\n\
>> @@ -36,7 +36,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
>> mcr p15, 0, r2, c7, c10, 4 @ 1 drain WB\n\
>> ldmfd sp!, {r4, pc} @ 3"
>> :
>> -: "I" (PAGE_SIZE / 32));
>> +: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 32));
>> }
>>
>> void fa_copy_user_highpage(struct page *to, struct page *from,
>> --
>> 2.19.1
>>
Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
> When functions incoming parameters are not in input operands list gcc
> 4.5 does not load the parameters into registers before calling this
> function but the inline assembly assumes valid addresses inside this
> function. This breaks the code because r0 and r1 are invalid when
> execution enters v4wb_copy_user_page ()

NAK. Naked functions must never be inlined. Please add a "noinline"
attribute to the function rather than making things more complex.

The GCC manual states:

`naked'
     Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
     and SPU ports to indicate that the specified function does not
     need prologue/epilogue sequences generated by the compiler. It is
     up to the programmer to provide these sequences. The only
     statements that can be safely included in naked functions are
     ^
     `asm' statements that do not have operands. All other statements,
     ^^^
     including declarations of local variables, `if' statements, and so
     forth, should be avoided. Naked functions should be used to
     implement the body of an assembly function, while allowing the
     compiler to construct the requisite function declaration for the
     assembler.

The 'I' attribute is fine here because it is a constant that is not
allowed to be in a register (and hence has no code generation side
effects.)

Adding operands for the input parameters, however, isn't going to
work around the fact that _this_ assembly is written to be out of
line and so it must never be inlined by the compiler.

> Also the constant needs to be used as third input operand so account
> for that as well.
>
> This fixes copypage-fa.c what has previously done before for the other
> copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
> and kfrom to input operands list.").
>
> Signed-off-by: Stefan Agner
> ---
> arch/arm/mm/copypage-fa.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
> index d130a5ece5d5..ec6501308c60 100644
> --- a/arch/arm/mm/copypage-fa.c
> +++ b/arch/arm/mm/copypage-fa.c
> @@ -22,7 +22,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
> {
> asm("\
> stmfd sp!, {r4, lr} @ 2\n\
> - mov r2, %0 @ 1\n\
> + mov r2, %2 @ 1\n\
> 1: ldmia r1!, {r3, r4, ip, lr} @ 4\n\
> stmia r0, {r3, r4, ip, lr}@ 4\n\
> mcr p15, 0, r0, c7, c14, 1 @ 1 clean and invalidate D
> line\n\
> @@ -36,7 +36,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
> mcr p15, 0, r2, c7, c10, 4 @ 1 drain WB\n\
> ldmfd sp!, {r4, pc} @ 3"
> :
> - : "I" (PAGE_SIZE / 32));
> + : "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 32));
> }
>
> void fa_copy_user_highpage(struct page *to, struct page *from,
> --
> 2.19.1
>

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
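The "noinline" suggestion in the reply above can be sketched in portable C. This is a minimal illustration under stated assumptions, not the copypage-fa.c code: sketch_copy_user_page(), sketch_copy_user_highpage() and SKETCH_PAGE_SIZE are hypothetical names, a C loop stands in for the ARM assembly, and the 32-byte step is only inferred from the PAGE_SIZE / 32 loop count in the patch. The naked attribute is left out so the sketch builds on any GCC or Clang target.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Hypothetical out-of-line copy routine. The noinline attribute asks
 * GCC and Clang to keep this function as a real call target even
 * though its only caller lives in the same file.
 */
static void __attribute__((noinline))
sketch_copy_user_page(void *kto, const void *kfrom)
{
	unsigned char *dst = kto;
	const unsigned char *src = kfrom;
	unsigned int n = SKETCH_PAGE_SIZE / 32;	/* loop count, as in the patch */

	do {
		memcpy(dst, src, 32);	/* one 32-byte chunk per iteration */
		dst += 32;
		src += 32;
	} while (--n);
}

/* Direct caller in the same file: the case where inlining can occur. */
void sketch_copy_user_highpage(void *to, const void *from)
{
	sketch_copy_user_page(to, from);
}
```

Whether noinline alone is sufficient for the real naked functions is exactly what the thread goes on to debate, so this shows only the mechanics of the attribute.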
[PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list
When functions incoming parameters are not in input operands list gcc
4.5 does not load the parameters into registers before calling this
function but the inline assembly assumes valid addresses inside this
function. This breaks the code because r0 and r1 are invalid when
execution enters v4wb_copy_user_page ()

Also the constant needs to be used as third input operand so account
for that as well.

This fixes copypage-fa.c what has previously done before for the other
copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
and kfrom to input operands list.").

Signed-off-by: Stefan Agner
---
 arch/arm/mm/copypage-fa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
index d130a5ece5d5..ec6501308c60 100644
--- a/arch/arm/mm/copypage-fa.c
+++ b/arch/arm/mm/copypage-fa.c
@@ -22,7 +22,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
 {
 	asm("\
 	stmfd	sp!, {r4, lr}			@ 2\n\
-	mov	r2, %0				@ 1\n\
+	mov	r2, %2				@ 1\n\
 1:	ldmia	r1!, {r3, r4, ip, lr}		@ 4\n\
 	stmia	r0, {r3, r4, ip, lr}		@ 4\n\
 	mcr	p15, 0, r0, c7, c14, 1		@ 1   clean and invalidate D line\n\
@@ -36,7 +36,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
 	mcr	p15, 0, r2, c7, c10, 4		@ 1   drain WB\n\
 	ldmfd	sp!, {r4, pc}			@ 3"
 	:
-	: "I" (PAGE_SIZE / 32));
+	: "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 32));
 }
 
 void fa_copy_user_highpage(struct page *to, struct page *from,
--
2.19.1