Re: [PATCH] powerpc/vdso: Separate vvar vma from vdso
On 3/31/21 10:59 AM, Michael Ellerman wrote:
> Christophe Leroy writes:
[..]
>>
>>> @@ -133,7 +135,13 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
>>>  	 * install_special_mapping or the perf counter mmap tracking code
>>>  	 * will fail to recognise it as a vDSO.
>>>  	 */
>>> -	mm->context.vdso = (void __user *)vdso_base + PAGE_SIZE;
>>> +	mm->context.vdso = (void __user *)vdso_base + vvar_size;
>>> +
>>> +	vma = _install_special_mapping(mm, vdso_base, vvar_size,
>>> +				       VM_READ | VM_MAYREAD | VM_IO |
>>> +				       VM_DONTDUMP | VM_PFNMAP, _spec);
>>> +	if (IS_ERR(vma))
>>> +		return PTR_ERR(vma);
>>>
>>>  	/*
>>>  	 * our vma flags don't have VM_WRITE so by default, the process isn't
>>
>> IIUC, VM_PFNMAP is for when we have a vvar_fault handler.
>> Although we will soon have one for handling TIME_NS, at the moment
>> powerpc doesn't have that handler.
>> Isn't it dangerous to set VM_PFNMAP then ?

I believe it's fine, special_mapping_fault() does:

:	if (sm->fault)
:		return sm->fault(sm, vmf->vma, vmf);

> Some of the other flags seem odd too.
> eg. VM_IO ? VM_DONTDUMP ?

Yeah, so: VM_PFNMAP | VM_IO is a protection from remote access to the
pages. One can't access such a page with ptrace(), /proc/$pid/mem or
process_vm_writev(). Otherwise, that would create a COW mapping and the
tracee would keep working with a stale vvar.
VM_DONTDUMP excludes the area from coredumps, and gdb will also avoid
accessing it [1][2].

I agree that VM_PFNMAP was probably excessive in this patch alone and
rather synchronized the code with other architectures, but it makes
more sense now in the new patch set by Christophe:
https://lore.kernel.org/linux-arch/cover.1617209141.git.christophe.le...@csgroup.eu/

[1] https://lore.kernel.org/lkml/550731af.6080...@redhat.com/T/
[2] https://sourceware.org/legacy-ml/gdb-patches/2015-03/msg00383.html

Thanks,
          Dmitry
Re: [PATCH] powerpc/vdso: Separate vvar vma from vdso
On 3/30/21 11:17 AM, Christophe Leroy wrote:
>
> Le 26/03/2021 à 20:17, Dmitry Safonov a écrit :
[..]
>> --- a/arch/powerpc/kernel/vdso.c
>> +++ b/arch/powerpc/kernel/vdso.c
>> @@ -55,10 +55,10 @@ static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struc
>>  {
>>  	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
>>
>> -	if (new_size != text_size + PAGE_SIZE)
>> +	if (new_size != text_size)
>>  		return -EINVAL;
>
> In ARM64 you have removed the above test in commit 871402e05b24cb56
> ("mm: forbid splitting special mappings"). Do we need to keep it here ?
>
>> -	current->mm->context.vdso = (void __user *)new_vma->vm_start + PAGE_SIZE;
>> +	current->mm->context.vdso = (void __user *)new_vma->vm_start;
>>
>>  	return 0;
>>  }

Yes, right you are, this can be dropped.

Thanks,
          Dmitry
Re: [PATCH] powerpc/vdso: Separate vvar vma from vdso
On 3/29/21 4:14 PM, Laurent Dufour wrote:
> Le 26/03/2021 à 20:17, Dmitry Safonov a écrit :
>> Since commit 511157ab641e ("powerpc/vdso: Move vdso datapage up front")
>> the VVAR page is in front of the VDSO area. As a result it breaks CRIU
>> (Checkpoint Restore In Userspace) [1], where CRIU expects that "[vdso]"
>> from /proc/../maps points at the ELF/vdso image, rather than at the
>> VVAR data page.
>> Laurent made a patch to keep CRIU working (by reading the aux vector).
>> But I think it still makes sense to separate the two mappings into
>> different VMAs. It will also make ppc64 less "special" for userspace
>> and as a side-bonus will make the VVAR page un-writable by a debugger
>> (which previously would COW the page, which can be unexpected).
>>
>> I opportunistically Cc stable on it: I understand that usually such
>> stuff isn't stable material, but that will allow us in CRIU to have
>> one workaround less that is needed just for one release (v5.11) on
>> one platform (ppc64), which we otherwise have to maintain.
>> I wouldn't go as far as to say that commit 511157ab641e is an ABI
>> regression as no other userspace got broken, but I'd really appreciate
>> it if it gets backported to v5.11 after v5.12 is released, so as not
>> to complicate the already non-simple CRIU-vdso code. Thanks!
>>
>> Cc: Andrei Vagin
>> Cc: Andy Lutomirski
>> Cc: Benjamin Herrenschmidt
>> Cc: Christophe Leroy
>> Cc: Laurent Dufour
>> Cc: Michael Ellerman
>> Cc: Paul Mackerras
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: sta...@vger.kernel.org # v5.11
>> [1]: https://github.com/checkpoint-restore/criu/issues/1417
>> Signed-off-by: Dmitry Safonov
>> Tested-by: Christophe Leroy
>
> I ran CRIU's test suite and, except for the usual suspects, all the
> tests passed.
>
> Tested-by: Laurent Dufour

Thank you, Laurent!

--
          Dmitry
Re: [PATCH] powerpc/vdso: Separate vvar vma from vdso
Hi Christophe,

On 3/27/21 5:19 PM, Christophe Leroy wrote:
[..]
>> I opportunistically Cc stable on it: I understand that usually such
>> stuff isn't stable material, but that will allow us in CRIU to have
>> one workaround less that is needed just for one release (v5.11) on
>> one platform (ppc64), which we otherwise have to maintain.
>
> Why is that a workaround, and why for one release only ? I think the
> solution proposed by Laurent to use the aux vector AT_SYSINFO_EHDR
> should work with any past and future release.

Yeah, I guess. Previously (before v5.11 on power) all kernels had the
ELF start at the "[vdso]" VMA start; now we'll have to carry the offset
in the VMA. Probably not the worst thing, but as it will be only for
the v5.11 release it can break, so it needs separate testing. Life was
a bit easier without this additional code.

>> I wouldn't go as far as to say that commit 511157ab641e is an ABI
>> regression as no other userspace got broken, but I'd really appreciate
>> it if it gets backported to v5.11 after v5.12 is released, so as not
>> to complicate the already non-simple CRIU-vdso code. Thanks!
>>
>> Cc: Andrei Vagin
>> Cc: Andy Lutomirski
>> Cc: Benjamin Herrenschmidt
>> Cc: Christophe Leroy
>> Cc: Laurent Dufour
>> Cc: Michael Ellerman
>> Cc: Paul Mackerras
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: sta...@vger.kernel.org # v5.11
>> [1]: https://github.com/checkpoint-restore/criu/issues/1417
>> Signed-off-by: Dmitry Safonov
>> Tested-by: Christophe Leroy
>
> I tested it with the sigreturn_vdso selftest and it worked, because
> that selftest doesn't involve VDSO data.

Thanks again for helping with testing it, I appreciate it!

> But if I do a mremap() on the VDSO text vma without remapping VVAR to
> keep the same distance between the two vmas, gettimeofday() crashes.
> The reason is that the code obtains the address of the data by
> calculating a fixed difference from its own address with the below
> macro, the delta being resolved at link time:
>
> .macro get_datapage ptr
> 	bcl	20, 31, .+4
> 999:
> 	mflr	\ptr
> #if CONFIG_PPC_PAGE_SHIFT > 14
> 	addis	\ptr, \ptr, (_vdso_datapage - 999b)@ha
> #endif
> 	addi	\ptr, \ptr, (_vdso_datapage - 999b)@l
> .endm
>
> So the datapage needs to remain at the same distance from the code at
> all times.
>
> Wondering how the other architectures do to have two independent VMAs
> and be able to move one independently of the other.

It's alright as far as I know. If userspace remaps vdso/vvar it should
be aware of this (CRIU keeps this in mind; also the old vdso image is
dumped to compare on restore with the one that the host has).

Thanks,
          Dmitry
Re: VDSO ELF header
On 3/26/21 6:40 PM, Christophe Leroy wrote:
>
> Le 26/03/2021 à 18:11, Dmitry Safonov a écrit :
>> On 3/26/21 5:07 PM, Christophe Leroy wrote:
>>> No, the problem is that user access has to be allowed for the flush()
>>>
>>> A hacky solution would be to call user_access_begin(); will test that
>>> later
>>
>> Yeah, cool.
>>
>> Will it be fine if I send the vvar patch with your Tested-by?
>
> Tested-by: Christophe Leroy

Thank you! I'll properly submit it shortly.

> With the user access fixed on the flush, the sigreturn_vdso selftest
> is a success. I'll send a patch for it in the coming days.

Nice!

> What is the status of your series which adds generic vdso_base
> tracking ?

Yeah, I was doing a new version of the patches and I was always
unsatisfied with the result, stuck between "good" and "best" (more code
rewriting). And then I had some other work to finish.
I'll try to finish and send it next week, thanks for pinging :-)

--
          Dmitry
Re: VDSO ELF header
On 3/26/21 5:07 PM, Christophe Leroy wrote:
> No, the problem is that user access has to be allowed for the flush()
>
> A hacky solution would be to call user_access_begin(); will test that
> later

Yeah, cool.

Will it be fine if I send the vvar patch with your Tested-by?

Thanks,
          Dmitry
Re: VDSO ELF header
On 3/26/21 4:11 PM, Christophe Leroy wrote:
[..]
>>>
>>> Dmitry proposed the same, see
>>> https://github.com/0x7f454c46/linux/commit/783c7a2532d2219edbcf555cc540eab05f698d2a
>>>
>>> Discussion at https://github.com/checkpoint-restore/criu/issues/1417
>>
>> Yeah, I didn't submit it officially to lkml because I couldn't test it
>> yet (and I usually don't send untested patches). The VM I have fails
>> to kexec and there's some difficulty getting a serial console working,
>> so I'd appreciate it if someone could either pick it up, or add a
>> Tested-by.
>
> Just to let everyone know, while testing your patch with the selftest
> I encountered the following Oops. But I also have it without your
> patch, though.

Thank you, Christophe!

> root@vgoip:~# ./sigreturn_vdso
> test: sigreturn_vdso
> tags: git_version:v5.12-rc4-1553-gc31141d460e6
> VDSO is at 0x104000-0x10bfff (32768 bytes)
> Signal delivered OK with VDSO mapped
> VDSO moved to 0x77bf4000-0x77bfbfff (32768 bytes)
> Signal delivered OK with VDSO moved
> Unmapped VDSO
> [ 1855.444371] Kernel attempted to read user page (7ff9ff30) - exploit attempt? (uid: 0)
> [ 1855.459404] BUG: Unable to handle kernel data access on read at 0x7ff9ff30
> [ 1855.466188] Faulting instruction address: 0xc00111d4
> [ 1855.471099] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 1855.476428] BE PAGE_SIZE=16K PREEMPT CMPC885
> [ 1855.480702] SAF3000 DIE NOTIFICATION
> [ 1855.484184] CPU: 0 PID: 362 Comm: sigreturn_vdso Not tainted 5.12.0-rc4-s3k-dev-01553-gc31141d460e6 #4811
> [ 1855.493644] NIP: c00111d4 LR: c0005a28 CTR:
> [ 1855.498634] REGS: cadb3dd0 TRAP: 0300 Not tainted (5.12.0-rc4-s3k-dev-01553-gc31141d460e6)
> [ 1855.507068] MSR: 9032 CR: 48000884 XER: 2000
> [ 1855.513866] DAR: 7ff9ff30 DSISR: 8800
> [ 1855.513866] GPR00: c0007788 cadb3e90 c28dc000 7ff9ff30 7ff9ff40 04e0 7ff9fd50
> [ 1855.513866] GPR08: 0001 0001 7ff9ff30 28000282 1001b7e8 100a0920
> [ 1855.513866] GPR16: 100cac0c 100b 102883a4 10289685 100d 100d 100d 100b2e9e
> [ 1855.513866] GPR24: 102883c8 7ff9ff38 cadb3f40 cadb3ec8 c28dc000
> [ 1855.552767] NIP [c00111d4] flush_icache_range+0x90/0xb4
> [ 1855.557932] LR [c0005a28] handle_signal32+0x1bc/0x1c4
> [ 1855.562925] Call Trace:
> [ 1855.565332] [cadb3e90] [100d] 0x100d (unreliable)
> [ 1855.570666] [cadb3ec0] [c0007788] do_notify_resume+0x260/0x314
> [ 1855.576432] [cadb3f20] [c000c764] syscall_exit_prepare+0x120/0x184
> [ 1855.582542] [cadb3f30] [c00100b4] ret_from_syscall+0xc/0x28
> [ 1855.588050] --- interrupt: c00 at 0xfe807f8
> [ 1855.592183] NIP: 0fe807f8 LR: 10001048 CTR: c0139378
> [ 1855.597174] REGS: cadb3f40 TRAP: 0c00 Not tainted (5.12.0-rc4-s3k-dev-01553-gc31141d460e6)
> [ 1855.605607] MSR: d032 CR: 28000282 XER: 2000
> [ 1855.612664]
> [ 1855.612664] GPR00: 0025 7ffa0230 77c09690 000a 28000282 0001 0ff03a38
> [ 1855.612664] GPR08: d032 0328 c28dc000 0009 88000282 1001b7e8 100a0920
> [ 1855.612664] GPR16: 100cac0c 100b 102883a4 10289685 100d 100d 100d 100b2e9e
> [ 1855.612664] GPR24: 102883c8 77bff628 10002358 1001 1000210c 8000
> [ 1855.648894] NIP [0fe807f8] 0xfe807f8
> [ 1855.652426] LR [10001048] 0x10001048
> [ 1855.655954] --- interrupt: c00
> [ 1855.658969] Instruction dump:
> [ 1855.661893] 38630010 7c001fac 38630010 4200fff0 7c0004ac 4c00012c 4e800020 7c001fac
> [ 1855.669811] 2c0a 38630010 4082ffcc 4be4 <7c00186c> 2c07 39430010 4082ff8c
> [ 1855.677910] ---[ end trace f071a5587092b3aa ]---
> [ 1855.682462]
> Remapped the stack executable
> !! child died by signal 11
> failure: sigreturn_vdso

Yes, it seems unrelated. Probably a bit of a hacky solution to this one
could be:

--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -911,7 +911,7 @@ int handle_signal32(struct ksignal *ksig, sigset_t *oldset,
 	}
 	user_write_access_end();

-	if (tramp == (unsigned long)mctx->mc_pad)
+	if ((tramp == (unsigned long)mctx->mc_pad) && access_ok(tramp, 2 * sizeof(unsigned long)))
 		flush_icache_range(tramp, tramp + 2 * sizeof(unsigned long));

 	regs->link = tramp;
--

But it's up to you; it seems power-related.

Thanks,
          Dmitry
Re: VDSO ELF header
Hello,

On 3/26/21 10:50 AM, Christophe Leroy wrote:
>
> Le 26/03/2021 à 11:46, Michael Ellerman a écrit :
>> Laurent Dufour writes:
>>> Le 25/03/2021 à 17:56, Laurent Dufour a écrit :
>>>> Le 25/03/2021 à 17:46, Christophe Leroy a écrit :
>>>>> Le 25/03/2021 à 17:11, Laurent Dufour a écrit :
>>>>>> Since v5.11 and the changes you made to the VDSO code, it no
>>>>>> longer exposes the ELF header at the beginning of the VDSO mapping
>>>>>> in user space.
>>>>>>
>>>>>> This is confusing CRIU, which is checking for this ELF header
>>>>>> cookie (https://github.com/checkpoint-restore/criu/issues/1417).
>>>>>
>>>>> How does it do on other architectures ?
>>>>
>>>> Good question, I'll double check the CRIU code.
>>>
>>> On x86, there are 2 VDSO entries:
>>> 77fcb000-77fce000 r--p 00:00 0 [vvar]
>>> 77fce000-77fcf000 r-xp 00:00 0 [vdso]
>>>
>>> And the VDSO is starting with the ELF header.
>>>
>>>>>> I'm not an expert in loading and the ELF part and, reading the
>>>>>> change you made, I can't identify how this could work now, as I'm
>>>>>> expecting the loader to need that ELF header to do the relocation.
>>>>>
>>>>> I think the loader is able to find it at the expected place.
>>>>
>>>> Actually, it seems the loader relies on the AUX vector
>>>> AT_SYSINFO_EHDR. I guess CRIU should do the same.
>>>>
>>>>>> From my investigation it seems that the first bytes of the VDSO
>>>>>> area are now the vdso_arch_data.
>>>>>>
>>>>>> Is the ELF header put somewhere else?
>>>>>> How could the loader process the VDSO without that ELF header?
>>>>>
>>>>> Like most other architectures, we now have the data section as the
>>>>> first page and the text section follows. So you will likely find
>>>>> the ELF header on the second page.
>>>
>>> I'm wondering if the data section you're referring to is the vvar
>>> section I can see on x86.
>>
>> Many of the other architectures have separate vm_special_mapping's for
>> the data page and the vdso binary, where the former is called "vvar".
>>
>> eg, s390:
>>
>> static struct vm_special_mapping vvar_mapping = {
>> 	.name = "[vvar]",
>> 	.fault = vvar_fault,
>> };
>>
>> static struct vm_special_mapping vdso_mapping = {
>> 	.name = "[vdso]",
>> 	.mremap = vdso_mremap,
>> };
>>
>> I guess we probably should be doing that too.
>
> Dmitry proposed the same, see
> https://github.com/0x7f454c46/linux/commit/783c7a2532d2219edbcf555cc540eab05f698d2a
>
> Discussion at https://github.com/checkpoint-restore/criu/issues/1417

Yeah, I didn't submit it officially to lkml because I couldn't test it
yet (and I usually don't send untested patches). The VM I have fails to
kexec and there's some difficulty getting a serial console working, so
I'd appreciate it if someone could either pick it up, or add a
Tested-by.

Thanks,
          Dmitry
Re: [PATCH] x86/mpx: fix recursive munmap() corruption
Hi Laurent, Christophe, Michael, all,

On 11/3/20 5:11 PM, Laurent Dufour wrote:
> Le 23/10/2020 à 14:28, Christophe Leroy a écrit :
[..]
>>>> That seems like it would work for CRIU and make sense in general?
>>>
>>> Sorry for the late answer, yes this would make more sense.
>>>
>>> Here is a patch doing that.
>>
>> In your patch, the test seems overkill:
>>
>> +	if ((start <= vdso_base && vdso_end <= end) ||  /* 1   */
>> +	    (vdso_base <= start && start < vdso_end) || /* 3,4 */
>> +	    (vdso_base < end && end <= vdso_end))       /* 2,3 */
>> +		mm->context.vdso_base = mm->context.vdso_end = 0;
>>
>> What about
>>
>> 	if (start < vdso_end && vdso_start < end)
>> 		mm->context.vdso_base = mm->context.vdso_end = 0;
>>
>> This should cover all cases, or am I missing something ?
>>
>> And do we really need to store vdso_end in the context ?
>> I think it should be possible to re-calculate it: the size of the VDSO
>> should be (_end - _start) + PAGE_SIZE for the 32-bit VDSO,
>> and (_end - _start) + PAGE_SIZE for the 64-bit VDSO.
>
> Thanks Christophe for the advice.
>
> That is covering all the cases, and indeed is similar to Michael's
> proposal I missed last year.
>
> I'll send a patch fixing this issue following your proposal.

It's probably not necessary anymore. I've sent patches [1], currently
in akpm; the last one forbids splitting of vm_special_mapping. So a
user is able to munmap() or mremap() vdso as a whole, but not partly.

[1]: https://lore.kernel.org/linux-mm/20201013013416.390574-1-d...@arista.com/

Thanks,
          Dmitry
Re: [PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()
Hi Christophe, Will,

On 10/23/20 12:57 PM, Christophe Leroy wrote:
>
> Le 23/10/2020 à 13:25, Will Deacon a écrit :
>> On Fri, Oct 23, 2020 at 01:22:04PM +0200, Christophe Leroy wrote:
>>> Hi Dmitry,
[..]
>>> I haven't seen the patches, did you send them out finally ?

I was working on the .close() hook, but while cooking it, I thought it
may be better to make tracking of the user landing generic. Note that
the vdso base address is mostly needed by the kernel as an address to
land in userspace after processing a signal.
I have some raw patches that add

+#ifdef CONFIG_ARCH_HAS_USER_LANDING
+	struct vm_area_struct *user_landing;
+#endif

inside mm_struct, and I plan to finish them after rc1 gets released.

While working on that, I noticed that arm32 and some other
architectures track the vdso position in mm.context with the only
reason being to put AT_SYSINFO_EHDR into the aux vector of the ELF
that's being loaded. That's quite an overkill: a pointer in mm.context
that rather could be a local variable in the ELF binfmt loader. Also, I
found some issues with the mremap code.

The patch series mentioned is at the base of the branch with the
generic user landing. I have sent only those patches, not the full
branch, as I remember there was a policy that during the merge window
one should send only fixes, rather than refactoring/new code.

>> I think it's this series:
>>
>> https://lore.kernel.org/r/20201013013416.390574-1-d...@arista.com
>>
>> but they look really invasive to me, so I may cook a small hack for
>> arm64 in the meantime / for stable.

I don't mind small hacks, but I'm concerned that the suggested fix
which sets `mm->context.vdso_base = 0` on munmap() may have its own
issue: that won't work if a user for whatever-broken-reason mremap()s
vdso to address 0. As the fix is supposed to fix an issue that hasn't
fired for anyone yet, it probably shouldn't introduce another. That's
why I've used vm_area_struct to track the vdso position in the patch
set.

Probably, temporarily, you could use something like:

#define BAD_VDSO_ADDRESS	(-1UL)

or any non-page-aligned address. But the signal code that checks
whether it can land on vdso/sigpage should also be aligned with the new
definition.

> Not sure we are talking about the same thing.
>
> I can't see any new .close function added to vm_special_mapping in
> order to replace the arch_unmap() hook.

Thanks,
          Dmitry
Re: [PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()
On 9/27/20 8:43 AM, Christophe Leroy wrote:
>
> Le 21/09/2020 à 13:26, Will Deacon a écrit :
>> On Fri, Aug 28, 2020 at 12:14:28PM +1000, Michael Ellerman wrote:
>>> Dmitry Safonov <0x7f454...@gmail.com> writes:
[..]
>>>> I'll cook a patch for vm_special_mapping if you don't mind :-)
>>>
>>> That would be great, thanks!
>>
>> I lost track of this one. Is there a patch kicking around to resolve
>> this, or is the segfault expected behaviour?
>
> IIUC Dmitry said he will cook a patch. I have not seen any patch yet.

Yes, sorry about the delay - I was a bit busy with the xfrm patches.
I'll send patches for .close() this week, working on them now.

> AFAIKS, among the architectures having VDSO sigreturn trampolines,
> only SH, X86 and POWERPC provide an alternative trampoline on the
> stack when the VDSO is not there.
>
> All other architectures just having a VDSO don't expect the VDSO to
> not be mapped.
>
> As nowadays stacks are mapped non-executable, getting a segfault is
> expected behaviour. However, I think we should really make it cleaner.
> Today it segfaults because it is still pointing to the VDSO trampoline
> that has been unmapped. But should the user map some other code at the
> same address, we'll run into the weeds on signal return instead of
> segfaulting.

+1.

> So VDSO unmapping should really be properly managed: the reference
> should be properly cleared in order to segfault in a controllable
> manner.
>
> Only powerpc has a hook to properly clear the VDSO pointer when the
> VDSO is unmapped.

Thanks,
          Dmitry
Re: [PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()
Hello,

On Wed, 26 Aug 2020 at 15:39, Michael Ellerman wrote:
> Christophe Leroy writes:
[..]
>> arch_remap() gets replaced by vdso_remap()
>>
>> For arch_unmap(), I'm wondering how/what other architectures do,
>> because powerpc seems to be the only one to erase the vdso context
>> pointer when unmapping the vdso.
>
> Yeah. The original unmap/remap stuff was added for CRIU, which I
> thought people tested on other architectures (more than powerpc even).
>
> Possibly no one really cares about vdso unmap though, vs just moving
> the vdso.
>
> We added a test for vdso unmap recently because it happened to trigger
> a KUAP failure, and someone actually hit it & reported it.

You're right, CRIU cares much more about moving the vDSO.
It's done for each restoree, and as on most setups the vDSO is
premapped and used by the application, it's actively tested. Speaking
about vDSO unmap - that's a concern only for heterogeneous C/R, i.e.
when an application is migrated from a system that uses the vDSO to one
which doesn't - a much rarer scenario.
(For arm it's !CONFIG_VDSO, for x86 it's the `vdso=0` boot parameter.)

Looking at the code, it seems quite easy to provide/maintain a .close()
for vm_special_mapping. It's a bit harder to add a test from the CRIU
side (as glibc won't know on restore that it can't use the vdso
anymore), but totally not impossible.

> Running that test on arm64 segfaults:
>
> # ./sigreturn_vdso
> VDSO is at 0x8191f000-0x8191 (4096 bytes)
> Signal delivered OK with VDSO mapped
> VDSO moved to 0x8191a000-0x8191afff (4096 bytes)
> Signal delivered OK with VDSO moved
> Unmapped VDSO
> Remapped the stack executable
> [   48.556191] potentially unexpected fatal signal 11.
> [   48.556752] CPU: 0 PID: 140 Comm: sigreturn_vdso Not tainted 5.9.0-rc2-00057-g2ac69819ba9e #190
> [   48.556990] Hardware name: linux,dummy-virt (DT)
> [   48.557336] pstate: 60001000 (nZCv daif -PAN -UAO BTYPE=--)
> [   48.557475] pc : 8191a7bc
> [   48.557603] lr : 8191a7bc
> [   48.557697] sp : c13c9e90
> [   48.557873] x29: c13cb0e0 x28:
> [   48.558201] x27: x26:
> [   48.558337] x25: x24:
> [   48.558754] x23: x22:
> [   48.558893] x21: 004009b0 x20:
> [   48.559046] x19: 00400ff0 x18:
> [   48.559180] x17: 817da300 x16: 00412010
> [   48.559312] x15: x14: 001c
> [   48.559443] x13: 656c626174756365 x12: 7865206b63617473
> [   48.559625] x11: 0003 x10: 0101010101010101
> [   48.559828] x9 : 818afda8 x8 : 0081
> [   48.559973] x7 : 6174732065687420 x6 : 64657070616d6552
> [   48.560115] x5 : 0e0388bd x4 : 0040135d
> [   48.560270] x3 : x2 : 0001
> [   48.560412] x1 : 0003 x0 : 004120b8
> Segmentation fault
> #
>
> So I think we need to keep the unmap hook. Maybe it should be handled
> by the special_mapping stuff generically.

I'll cook a patch for vm_special_mapping if you don't mind :-)

Thanks,
          Dmitry
[PATCHv3 00/50] Add log level to show_stack()
Changes in v3:
- Collected more architectural Acks and Reviewed-by
- Fixed compilation on sparc64

Changes in v2:
- Removed excessive pr_cont("\n") (nits by Senozhatsky)
- Leave backtrace debugging messages with pr_debug() (noted by Russell
  King and Will Deacon)
- Correct microblaze_unwind_inner() declaration (Thanks to Michal Simek
  and kbuild test robot)
- Fix copy'n'paste typo in show_stack_loglvl() for sparc (kbuild robot)
- Fix backtrace output on xtensa (Thanks Max Filippov)
- Add loglevel to show_stack() on s390 (kbuild robot)
- Collected all Reviewed-by and Acked-by (thanks!)

v2: https://lore.kernel.org/linux-riscv/20200316143916.195608-1-d...@arista.com/
v1: https://lore.kernel.org/linux-riscv/20191106030542.868541-1-d...@arista.com/

Add a log level argument to show_stack(). Done in three stages:
1. Introducing show_stack_loglvl() for every architecture
2. Migrating old users with an explicit log level
3. Renaming show_stack_loglvl() into show_stack()

Justification:

o It's a design mistake to move a business-logic decision into a
  platform realization detail.
o I currently have two patch sets that would benefit from this work:
  Removing console_loglevel jumps in the sysrq driver [1]
  Hung task warning before panic [2] - suggested by Tetsuo (but he
  probably didn't realise what it would involve).
o While doing (1) and (2), the backtraces were adjusted to the headers
  and other messages for each situation - so there won't be a situation
  where the backtrace is printed but the headers are missing because
  they have a lesser log level (or the reverse).
o As a result, in (2) the plays with console_loglevel for kdb are
  removed.

The least important for upstream, but maybe still worth noting: every
company I've worked in so far had an off-list patch to print a
backtrace with the needed log level (but only for the architecture they
cared about). If you have other ideas how you would benefit from
show_stack() with a log level - please, reply to this cover letter.

See also the discussion on v1:
https://lore.kernel.org/linux-riscv/20191106083538.z5nlpuf64cigx...@pathway.suse.cz/

Cc: Andrew Morton
Cc: Greg Kroah-Hartman
Cc: Ingo Molnar
Cc: Jiri Slaby
Cc: Petr Mladek
Cc: Sergey Senozhatsky
Cc: Steven Rostedt
Cc: Tetsuo Handa

Thanks,
          Dmitry

[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
[2]: https://lkml.kernel.org/r/41fd7652-df1f-26f6-aba0-b87ebae07...@i-love.sakura.ne.jp

Dmitry Safonov (50):
  kallsyms/printk: Add loglvl to print_ip_sym()
  alpha: Add show_stack_loglvl()
  arc: Add show_stack_loglvl()
  arm/asm: Add loglvl to c_backtrace()
  arm: Add loglvl to unwind_backtrace()
  arm: Add loglvl to dump_backtrace()
  arm: Wire up dump_backtrace_{entry,stm}
  arm: Add show_stack_loglvl()
  arm64: Add loglvl to dump_backtrace()
  arm64: Add show_stack_loglvl()
  c6x: Add show_stack_loglvl()
  csky: Add show_stack_loglvl()
  h8300: Add show_stack_loglvl()
  hexagon: Add show_stack_loglvl()
  ia64: Pass log level as arg into ia64_do_show_stack()
  ia64: Add show_stack_loglvl()
  m68k: Add show_stack_loglvl()
  microblaze: Add loglvl to microblaze_unwind_inner()
  microblaze: Add loglvl to microblaze_unwind()
  microblaze: Add show_stack_loglvl()
  mips: Add show_stack_loglvl()
  nds32: Add show_stack_loglvl()
  nios2: Add show_stack_loglvl()
  openrisc: Add show_stack_loglvl()
  parisc: Add show_stack_loglvl()
  powerpc: Add show_stack_loglvl()
  riscv: Add show_stack_loglvl()
  s390: Add show_stack_loglvl()
  sh: Add loglvl to dump_mem()
  sh: Remove needless printk()
  sh: Add loglvl to printk_address()
  sh: Add loglvl to show_trace()
  sh: Add show_stack_loglvl()
  sparc: Add show_stack_loglvl()
  um/sysrq: Remove needless variable sp
  um: Add show_stack_loglvl()
  unicore32: Remove unused pmode argument in c_backtrace()
  unicore32: Add loglvl to c_backtrace()
  unicore32: Add show_stack_loglvl()
  x86: Add missing const qualifiers for log_lvl
  x86: Add show_stack_loglvl()
  xtensa: Add loglvl to show_trace()
  xtensa: Add show_stack_loglvl()
  sysrq: Use show_stack_loglvl()
  x86/amd_gart: Print stacktrace for a leak with KERN_ERR
  power: Use show_stack_loglvl()
  kdb: Don't play with console_loglevel
  sched: Print stack trace with KERN_INFO
  kernel: Use show_stack_loglvl()
  kernel: Rename show_stack_loglvl() => show_stack()

 arch/alpha/kernel/traps.c           | 22 +++
 arch/arc/include/asm/bug.h          |  3 ++-
 arch/arc/kernel/stacktrace.c        | 17 +++-
 arch/arc/kernel/troubleshoot.c      |  2 +-
 arch/arm/include/asm/bug.h          |  3 ++-
 arch/arm/include/asm/traps.h        |  3 ++-
 arch/arm/include/asm/unwind.h       |  3 ++-
 arch/arm/kernel/traps.c             | 39 +++
 arch/arm/kernel/unwind.c            |  5 ++--
 arch/arm/lib/backtrace-clang.S      |  9 +--
 arch/arm/lib/backtrace.S            | 14 +++---
 arch/arm64/include/asm/stacktrace.h |  3 ++-
 arch
[PATCHv3 26/50] powerpc: Add show_stack_loglvl()
Currently, the log level of show_stack() depends on a platform
realization. It creates situations where the headers are printed with
a lower or higher log level than the stacktrace itself (depending on
the platform or user). Furthermore, it forces the logic decision from
the user to the architecture side. As a result, some users such as
sysrq/kdb/etc are doing tricks with temporarily raising
console_loglevel while printing their messages. And in result it may
not only print unwanted messages from other CPUs, but also omit
printing at all in the unlucky case where the printk() was deferred.

Introducing a log-level parameter and KERN_UNSUPPRESSED [1] seems an
easier approach than introducing more printk buffers. Also, it will
consolidate printings with headers.

Introduce show_stack_loglvl(), that eventually will substitute
show_stack().

Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: linuxppc-dev@lists.ozlabs.org
[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
Acked-by: Michael Ellerman (powerpc)
Signed-off-by: Dmitry Safonov
---
 arch/powerpc/kernel/process.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9c21288f8645..6ad438d59796 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2068,7 +2068,8 @@ unsigned long get_wchan(struct task_struct *p)

 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;

-void show_stack(struct task_struct *tsk, unsigned long *stack)
+void show_stack_loglvl(struct task_struct *tsk, unsigned long *stack,
+		       const char *loglvl)
 {
 	unsigned long sp, ip, lr, newsp;
 	int count = 0;
@@ -2093,7 +2094,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 	}

 	lr = 0;
-	printk("Call Trace:\n");
+	printk("%sCall Trace:\n", loglvl);
 	do {
 		if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
 			break;
@@ -2102,7 +2103,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 		newsp = stack[0];
 		ip = stack[STACK_FRAME_LR_SAVE];
 		if (!firstframe || ip != lr) {
-			printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+			printk("%s["REG"] ["REG"] %pS",
+			       loglvl, sp, ip, (void *)ip);
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 			ret_addr = ftrace_graph_ret_addr(current,
 						_idx, ip, stack);
@@ -2124,8 +2126,9 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 			struct pt_regs *regs = (struct pt_regs *)
 				(sp + STACK_FRAME_OVERHEAD);
 			lr = regs->link;
-			printk("--- interrupt: %lx at %pS\nLR = %pS\n",
-			       regs->trap, (void *)regs->nip, (void *)lr);
+			printk("%s--- interrupt: %lx at %pS\nLR = %pS\n",
+			       loglvl, regs->trap,
+			       (void *)regs->nip, (void *)lr);
 			firstframe = 1;
 		}

@@ -2135,6 +2138,11 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 	put_task_stack(tsk);
 }

+void show_stack(struct task_struct *tsk, unsigned long *stack)
+{
+	show_stack_loglvl(tsk, stack, KERN_DEFAULT);
+}
+
 #ifdef CONFIG_PPC64
 /* Called with hard IRQs off */
 void notrace __ppc64_runlatch_on(void)
--
2.26.0
[PATCHv2 00/50] Add log level to show_stack()
Changes in v2:
- Removed excessive pr_cont("\n") (nits by Senozhatsky)
- Leave backtrace debugging messages with pr_debug() (noted by Russell King and Will Deacon)
- Correct microblaze_unwind_inner() declaration (thanks to Michal Simek and the kbuild test robot)
- Fix copy'n'paste typo in show_stack_loglvl() for sparc (kbuild robot)
- Fix backtrace output on xtensa (thanks Max Filippov)
- Add loglevel to show_stack() on s390 (kbuild robot)
- Collected all Reviewed-by and Acked-by (thanks!)

Add a log level argument to show_stack(). Done in three stages:
1. Introduce show_stack_loglvl() for every architecture
2. Migrate old users with an explicit log level
3. Rename show_stack_loglvl() into show_stack()

Justification:
o It's a design mistake to move a business-logic decision into a platform
  realization detail.
o I currently have two patch sets that would benefit from this work:
  - Removing console_loglevel jumps in the sysrq driver [1]
  - Hung task warning before panic [2] - suggested by Tetsuo (but he
    probably didn't realise what it would involve).
o While doing (1) and (2), the backtraces were adjusted to the headers and
  other messages for each situation - so there won't be a situation where
  the backtrace is printed but the headers are missing because they have a
  lower log level (or the reverse).
o Least important for upstream, but maybe still worth noting: every company
  I've worked in so far had an off-list patch to print a backtrace with the
  needed log level (but only for the architecture they cared about).

If you have other ideas on how you would benefit from show_stack() with a log level - please reply to this cover letter.

See also the discussion on v1:
https://lore.kernel.org/linux-riscv/20191106083538.z5nlpuf64cigx...@pathway.suse.cz/

Cc: Andrew Morton
Cc: Greg Kroah-Hartman
Cc: Ingo Molnar
Cc: Jiri Slaby
Cc: Petr Mladek
Cc: Sergey Senozhatsky
Cc: Steven Rostedt
Cc: Tetsuo Handa

Thanks,
          Dmitry

[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
[2]: https://lkml.kernel.org/r/41fd7652-df1f-26f6-aba0-b87ebae07...@i-love.sakura.ne.jp

Dmitry Safonov (50):
  kallsyms/printk: Add loglvl to print_ip_sym()
  alpha: Add show_stack_loglvl()
  arc: Add show_stack_loglvl()
  arm/asm: Add loglvl to c_backtrace()
  arm: Add loglvl to unwind_backtrace()
  arm: Add loglvl to dump_backtrace()
  arm: Wire up dump_backtrace_{entry,stm}
  arm: Add show_stack_loglvl()
  arm64: Add loglvl to dump_backtrace()
  arm64: Add show_stack_loglvl()
  c6x: Add show_stack_loglvl()
  csky: Add show_stack_loglvl()
  h8300: Add show_stack_loglvl()
  hexagon: Add show_stack_loglvl()
  ia64: Pass log level as arg into ia64_do_show_stack()
  ia64: Add show_stack_loglvl()
  m68k: Add show_stack_loglvl()
  microblaze: Add loglvl to microblaze_unwind_inner()
  microblaze: Add loglvl to microblaze_unwind()
  microblaze: Add show_stack_loglvl()
  mips: Add show_stack_loglvl()
  nds32: Add show_stack_loglvl()
  nios2: Add show_stack_loglvl()
  openrisc: Add show_stack_loglvl()
  parisc: Add show_stack_loglvl()
  powerpc: Add show_stack_loglvl()
  riscv: Add show_stack_loglvl()
  s390: Add show_stack_loglvl()
  sh: Add loglvl to dump_mem()
  sh: Remove needless printk()
  sh: Add loglvl to printk_address()
  sh: Add loglvl to show_trace()
  sh: Add show_stack_loglvl()
  sparc: Add show_stack_loglvl()
  um/sysrq: Remove needless variable sp
  um: Add show_stack_loglvl()
  unicore32: Remove unused pmode argument in c_backtrace()
  unicore32: Add loglvl to c_backtrace()
  unicore32: Add show_stack_loglvl()
  x86: Add missing const qualifiers for log_lvl
  x86: Add show_stack_loglvl()
  xtensa: Add loglvl to show_trace()
  xtensa: Add show_stack_loglvl()
  sysrq: Use show_stack_loglvl()
  x86/amd_gart: Print stacktrace for a leak with KERN_ERR
  power: Use show_stack_loglvl()
  kdb: Don't play with console_loglevel
  sched: Print stack trace with KERN_INFO
  kernel: Use show_stack_loglvl()
  kernel: Rename show_stack_loglvl() => show_stack()

 arch/alpha/kernel/traps.c           | 22 +++
 arch/arc/include/asm/bug.h          | 3 ++-
 arch/arc/kernel/stacktrace.c        | 17 +++-
 arch/arc/kernel/troubleshoot.c      | 2 +-
 arch/arm/include/asm/bug.h          | 3 ++-
 arch/arm/include/asm/traps.h        | 3 ++-
 arch/arm/include/asm/unwind.h       | 3 ++-
 arch/arm/kernel/traps.c             | 40
 arch/arm/kernel/unwind.c            | 7 ++---
 arch/arm/lib/backtrace-clang.S      | 9 +--
 arch/arm/lib/backtrace.S            | 14 +++---
 arch/arm64/include/asm/stacktrace.h | 3 ++-
 arch/arm64/kernel/process.c         | 2 +-
 arch/arm64/kernel/traps.c           | 19 ++---
 arch/c6x/kernel/traps.c             | 18 +++--
 arch/csky/kernel/dumpstack.c        | 9 ---
 arch/csky/kernel/ptrace.c           | 4 +--
 arch/h8300/kernel/traps.c           | 12 -
 arch/hexagon/kernel/traps.c
[PATCHv2 26/50] powerpc: Add show_stack_loglvl()
Currently, the log level of show_stack() depends on the platform realization. This creates situations where the headers are printed with a lower or higher log level than the stacktrace itself (depending on the platform or user). Furthermore, it pushes a logic decision from the user down to the architecture side. As a result, some users such as sysrq/kdb/etc resort to tricks with temporarily raising console_loglevel while printing their messages. That may not only print unwanted messages from other CPUs, but also omit printing at all in the unlucky case where the printk() was deferred.

Introducing a log-level parameter and KERN_UNSUPPRESSED [1] seems an easier approach than introducing more printk buffers. It will also consolidate the backtraces with their headers.

Introduce show_stack_loglvl(), which will eventually substitute show_stack().

Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: linuxppc-dev@lists.ozlabs.org
[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
Acked-by: Michael Ellerman (powerpc)
Signed-off-by: Dmitry Safonov
---
 arch/powerpc/kernel/process.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fad50db9dcf2..c1ab7f613da4 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2034,7 +2034,8 @@ unsigned long get_wchan(struct task_struct *p)
 
 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
-void show_stack(struct task_struct *tsk, unsigned long *stack)
+void show_stack_loglvl(struct task_struct *tsk, unsigned long *stack,
+		       const char *loglvl)
 {
 	unsigned long sp, ip, lr, newsp;
 	int count = 0;
@@ -2059,7 +2060,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 	}
 
 	lr = 0;
-	printk("Call Trace:\n");
+	printk("%sCall Trace:\n", loglvl);
 	do {
 		if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
 			break;
@@ -2068,7 +2069,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 		newsp = stack[0];
 		ip = stack[STACK_FRAME_LR_SAVE];
 		if (!firstframe || ip != lr) {
-			printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+			printk("%s["REG"] ["REG"] %pS",
+			       loglvl, sp, ip, (void *)ip);
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 			ret_addr = ftrace_graph_ret_addr(current, _idx, ip, stack);
@@ -2090,8 +2092,9 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 			struct pt_regs *regs = (struct pt_regs *)
 				(sp + STACK_FRAME_OVERHEAD);
 			lr = regs->link;
-			printk("--- interrupt: %lx at %pS\nLR = %pS\n",
-			       regs->trap, (void *)regs->nip, (void *)lr);
+			printk("%s--- interrupt: %lx at %pS\nLR = %pS\n",
+			       loglvl, regs->trap,
+			       (void *)regs->nip, (void *)lr);
 			firstframe = 1;
 		}
@@ -2101,6 +2104,11 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 	put_task_stack(tsk);
 }
 
+void show_stack(struct task_struct *tsk, unsigned long *stack)
+{
+	show_stack_loglvl(tsk, stack, KERN_DEFAULT);
+}
+
 #ifdef CONFIG_PPC64 /* Called with hard IRQs off */
 void notrace __ppc64_runlatch_on(void)
-- 
2.25.1
[PATCH-tty-testing] tty/serial/8250: Add has_sysrq to plat_serial8250_port
In contrast to 8250/8250_of, legacy_serial on powerpc does fill (struct plat_serial8250_port). The reason is likely that it's done in a device_initcall(), not on probe, so 8250_core is not probed yet.

Propagate the value from platform_device on 8250 probe: in the case of the powerpc legacy driver it's initialized in the initcall; in the case of 8250_of it will be initialized later in of_platform_serial_setup().

Fixes: ea2683bf546c ("tty/serial: Migrate 8250_fsl to use has_sysrq")
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: kbuild test robot
Signed-off-by: Dmitry Safonov
---
It's probably better to squash this into the 8250_fsl patch.
I've added the Fixes tag in case the branch won't be rebased.
Tested the powerpc build manually with a ppc64 cross-compiler.

 drivers/tty/serial/8250/8250_core.c | 1 +
 include/linux/serial_8250.h         | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index e682390ce0de..0894a22fd702 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -816,6 +816,7 @@ static int serial8250_probe(struct platform_device *dev)
 		uart.port.flags		= p->flags;
 		uart.port.mapbase	= p->mapbase;
 		uart.port.hub6		= p->hub6;
+		uart.port.has_sysrq	= p->has_sysrq;
 		uart.port.private_data	= p->private_data;
 		uart.port.type		= p->type;
 		uart.port.serial_in	= p->serial_in;
diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h
index bb2bc99388ca..6a8e8c48c882 100644
--- a/include/linux/serial_8250.h
+++ b/include/linux/serial_8250.h
@@ -25,6 +25,7 @@ struct plat_serial8250_port {
 	unsigned char	regshift;	/* register shift */
 	unsigned char	iotype;		/* UPIO_* */
 	unsigned char	hub6;
+	unsigned char	has_sysrq;	/* supports magic SysRq */
 	upf_t		flags;		/* UPF_* flags */
 	unsigned int	type;		/* If UPF_FIXED_TYPE */
 	unsigned int	(*serial_in)(struct uart_port *, int);
-- 
2.24.1
Re: [PATCH 00/58] serial/sysrq: Cleanup ifdeffery
Hi Christophe,

On 12/13/19 5:47 AM, Christophe Leroy wrote:
> Le 13/12/2019 à 01:05, Dmitry Safonov a écrit :
[..]
>
> powerpc patchwork didn't get the full series, see
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=148198

Yes, I was under the impression that architecture mailing lists only want the related patches. But now I see that from the patchwork point of view it's better to have the whole series in the inbox.

> Can't find them on linux-serial patchwork either
> (https://patches.linaro.org/project/linux-serial/list/)

I'm not sure - maybe its checking frequency is low? I see all the patches in the linux-serial ml:
https://marc.info/?l=linux-serial=1=201912=2

> It is impossible to review/test powerpc bits without the first patches
> of the series, where can the entire series be found ?

Sorry for the inconvenience. I can resend to just the ppc mailing list, without Cc'ing everybody, if that works for you. Or you can clone it directly from my github:
https://github.com/0x7f454c46/linux/tree/sysrq-serial-seq-v1

Thanks,
          Dmitry
[PATCH 49/58] serial/ucc_uart: Remove ifdef SUPPORT_SYSRQ
ucc_uart doesn't seem to support a console over itself, so the SUPPORT_SYSRQ ifdeffery can probably be deleted from the file, together with the uart_handle_sysrq_char() support it guards.

Cc: Timur Tabi
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Dmitry Safonov
---
 drivers/tty/serial/ucc_uart.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/tty/serial/ucc_uart.c b/drivers/tty/serial/ucc_uart.c
index a0555ae2b1ef..ff7784047156 100644
--- a/drivers/tty/serial/ucc_uart.c
+++ b/drivers/tty/serial/ucc_uart.c
@@ -551,9 +551,7 @@ static void qe_uart_int_rx(struct uart_qe_port *qe_port)
 		/* Overrun does not affect the current character ! */
 		if (status & BD_SC_OV)
 			tty_insert_flip_char(tport, 0, TTY_OVERRUN);
-#ifdef SUPPORT_SYSRQ
 		port->sysrq = 0;
-#endif
 		goto error_return;
 	}
-- 
2.24.0
[PATCH 31/58] tty/serial: Migrate pmac_zilog to use has_sysrq
The SUPPORT_SYSRQ ifdeffery is not nice as it:
- may create misunderstanding about sizeof(struct uart_port) between different objects
- prevents moving functions out of serial_core.h
- reduces readability (well, it's ifdeffery - it's hard to follow)

In order to remove SUPPORT_SYSRQ, a has_sysrq variable has been added. Initialise it in the driver's probe and remove the ifdeffery.

Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Dmitry Safonov
---
 drivers/tty/serial/pmac_zilog.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/tty/serial/pmac_zilog.c b/drivers/tty/serial/pmac_zilog.c
index bcb5bf70534e..ba65a3bbd72a 100644
--- a/drivers/tty/serial/pmac_zilog.c
+++ b/drivers/tty/serial/pmac_zilog.c
@@ -61,10 +61,6 @@
 #define of_machine_is_compatible(x) (0)
 #endif
 
-#if defined (CONFIG_SERIAL_PMACZILOG_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
-#define SUPPORT_SYSRQ
-#endif
-
 #include
 #include
 
@@ -1721,6 +1717,7 @@ static int __init pmz_init_port(struct uart_pmac_port *uap)
 	uap->control_reg = uap->port.membase;
 	uap->data_reg = uap->control_reg + 4;
 	uap->port_type = 0;
+	uap->port.has_sysrq = IS_ENABLED(CONFIG_SERIAL_PMACZILOG_CONSOLE);
 
 	pmz_convert_to_zs(uap, CS8, 0, 9600);
-- 
2.24.0
[PATCH 05/58] tty/serial: Migrate 8250_fsl to use has_sysrq
The SUPPORT_SYSRQ ifdeffery is not nice as it:
- may create misunderstanding about sizeof(struct uart_port) between different objects
- prevents moving functions out of serial_core.h
- reduces readability (well, it's ifdeffery - it's hard to follow)

In order to remove SUPPORT_SYSRQ, a has_sysrq variable has been added. Initialise it in the driver's probe and remove the ifdeffery.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Dmitry Safonov
---
 arch/powerpc/kernel/legacy_serial.c | 4 +++-
 drivers/tty/serial/8250/8250_fsl.c  | 4 ----
 drivers/tty/serial/8250/8250_of.c   | 4 +++-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/legacy_serial.c b/arch/powerpc/kernel/legacy_serial.c
index 7cea5978f21f..f061e06e9f51 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -479,8 +479,10 @@ static void __init fixup_port_irq(int index,
 	port->irq = virq;
 
 #ifdef CONFIG_SERIAL_8250_FSL
-	if (of_device_is_compatible(np, "fsl,ns16550"))
+	if (of_device_is_compatible(np, "fsl,ns16550")) {
 		port->handle_irq = fsl8250_handle_irq;
+		port->has_sysrq = IS_ENABLED(CONFIG_SERIAL_8250_CONSOLE);
+	}
 #endif
 }
 
diff --git a/drivers/tty/serial/8250/8250_fsl.c b/drivers/tty/serial/8250/8250_fsl.c
index aa0e216d5ead..0d0c80905c58 100644
--- a/drivers/tty/serial/8250/8250_fsl.c
+++ b/drivers/tty/serial/8250/8250_fsl.c
@@ -1,8 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0
-#if defined(CONFIG_SERIAL_8250_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
-#define SUPPORT_SYSRQ
-#endif
-
 #include
 #include
 
diff --git a/drivers/tty/serial/8250/8250_of.c b/drivers/tty/serial/8250/8250_of.c
index 92fbf46ce3bd..531ad67395e0 100644
--- a/drivers/tty/serial/8250/8250_of.c
+++ b/drivers/tty/serial/8250/8250_of.c
@@ -222,8 +222,10 @@ static int of_platform_serial_setup(struct platform_device *ofdev,
 	if (IS_ENABLED(CONFIG_SERIAL_8250_FSL) &&
 	    (of_device_is_compatible(np, "fsl,ns16550") ||
-	     of_device_is_compatible(np, "fsl,16550-FIFO64")))
+	     of_device_is_compatible(np, "fsl,16550-FIFO64"))) {
 		port->handle_irq = fsl8250_handle_irq;
+		port->has_sysrq = IS_ENABLED(CONFIG_SERIAL_8250_CONSOLE);
+	}
 
 	return 0;
 err_unprepare:
-- 
2.24.0
[PATCH 00/58] serial/sysrq: Cleanup ifdeffery
The original purpose of the patch set was to add a way to enable sysrq on a uart where currently it can only be constantly on or off (CONFIG_MAGIC_SYSRQ_SERIAL); see the last patch:
"serial/sysrq: Add MAGIC_SYSRQ_SERIAL_SEQUENCE"

But to do that, I had to add uart_try_toggle_sysrq() and I didn't want to bloat serial_core.h even more. So, I did a cleanup by removing SUPPORT_SYSRQ, resulting in a nice diff-stat and less ifdeffery.

Most patches are one-liners; I decided to keep them separated per driver to make it easier for reviewers to follow the purpose.

Cc: Greg Kroah-Hartman
Cc: Jiri Slaby
Cc: Vasiliy Khoruzhick
Cc: linux-ser...@vger.kernel.org

Dmitry Safonov (58):
  sysrq: Remove sysrq_handler_registered
  serial: Move sysrq members above
  serial_core: Un-ifdef sysrq SUPPORT_SYSRQ
  tty/serial: Migrate aspeed_vuart to use has_sysrq
  tty/serial: Migrate 8250_fsl to use has_sysrq
  tty/serial: Migrate bcm63xx_uart to use has_sysrq
  tty/serial: Migrate 8250_omap to use has_sysrq
  tty/serial: Migrate 8250_port to use has_sysrq
  tty/serial: Migrate amba-pl01* to use has_sysrq
  tty/serial: Migrate apbuart to use has_sysrq
  tty/serial: Migrate arc_uart to use has_sysrq
  tty/serial: Migrate atmel_serial to use has_sysrq
  tty/serial: Migrate clps711x to use has_sysrq
  tty/serial: Migrate cpm_uart to use has_sysrq
  tty/serial: Migrate dz to use has_sysrq
  tty/serial: Migrate efm32-uart to use has_sysrq
  tty/serial: Migrate fsl_linflexuart to use has_sysrq
  tty/serial: Migrate fsl_lpuart to use has_sysrq
  tty/serial: Migrate imx to use has_sysrq
  tty/serial: Migrate ip22zilog to use has_sysrq
  tty/serial: Migrate meson_uart to use has_sysrq
  tty/serial: Migrate milbeaut_usio to use has_sysrq
  tty/serial: Migrate mpc52xx_uart to use has_sysrq
  tty/serial: Don't zero port->sysrq
  tty/serial: Migrate msm_serial to use has_sysrq
  tty/serial: Migrate mux to use has_sysrq
  tty/serial: Migrate mxs-auart to use has_sysrq
  tty/serial: Migrate omap-serial to use has_sysrq
  tty/serial: Migrate pch_uart to use has_sysrq
  tty/serial: Don't check port->sysrq
  tty/serial: Migrate pmac_zilog to use has_sysrq
  tty/serial: Migrate pnx8xxx_uart to use has_sysrq
  serial/f81534: Don't check port->sysrq
  tty/serial: Migrate pxa to use has_sysrq
  tty/serial: Migrate qcom_geni_serial to use has_sysrq
  tty/serial: Migrate sa1100 to use has_sysrq
  tty/serial: Migrate samsung_tty to use has_sysrq
  tty/serial: Migrate sb1250-duart to use has_sysrq
  tty/serial: Migrate sccnxp to use has_sysrq
  tty/serial: Migrate serial_txx9 to use has_sysrq
  tty/serial: Migrate sh-sci to use has_sysrq
  tty/serial: Migrate sprd_serial to use has_sysrq
  tty/serial: Migrate st-asc to use has_sysrq
  tty/serial: Migrate stm32-usart to use has_sysrq
  tty/serial: Migrate sunhv to use has_sysrq
  tty/serial: Migrate sunsab to use has_sysrq
  tty/serial: Migrate sunsu to use has_sysrq
  tty/serial: Migrate sunzilog to use has_sysrq
  serial/ucc_uart: Remove ifdef SUPPORT_SYSRQ
  tty/serial: Migrate vr41xx_siu to use has_sysrq
  tty/serial: Migrate vt8500_serial to use has_sysrq
  tty/serial: Migrate xilinx_uartps to use has_sysrq
  tty/serial: Migrate zs to use has_sysrq
  serial_core: Remove SUPPORT_SYSRQ ifdeffery
  usb/serial: Don't handle break when CONFIG_MAGIC_SYSRQ is disabled
  serial_core: Move sysrq functions from header file
  sysctl/sysrq: Remove __sysrq_enabled copy
  serial/sysrq: Add MAGIC_SYSRQ_SERIAL_SEQUENCE

 arch/powerpc/kernel/legacy_serial.c         | 4 +-
 drivers/tty/serial/8250/8250_aspeed_vuart.c | 5 +-
 drivers/tty/serial/8250/8250_fsl.c          | 4 -
 drivers/tty/serial/8250/8250_of.c           | 4 +-
 drivers/tty/serial/8250/8250_omap.c         | 5 +-
 drivers/tty/serial/8250/8250_port.c         | 5 +-
 drivers/tty/serial/amba-pl010.c             | 5 +-
 drivers/tty/serial/amba-pl011.c             | 6 +-
 drivers/tty/serial/apbuart.c                | 5 +-
 drivers/tty/serial/arc_uart.c               | 5 +-
 drivers/tty/serial/atmel_serial.c           | 9 +-
 drivers/tty/serial/bcm63xx_uart.c           | 5 +-
 drivers/tty/serial/clps711x.c               | 5 +-
 drivers/tty/serial/cpm_uart/cpm_uart_core.c | 9 +-
 drivers/tty/serial/dz.c                     | 5 +-
 drivers/tty/serial/efm32-uart.c             | 5 +-
 drivers/tty/serial/fsl_linflexuart.c        | 8 +-
 drivers/tty/serial/fsl_lpuart.c             | 9 +-
 drivers/tty/serial/imx.c                    | 7 +-
 drivers/tty/serial/ip22zilog.c              | 7 +-
 drivers/tty/serial/meson_uart.c             | 5 +-
 drivers/tty/serial/milbeaut_usio.c          | 5 +-
 drivers/tty/serial/mpc52xx_uart.c           | 11 +-
 drivers/tty/serial/msm_serial.c             | 5 +-
 drivers/tty/serial/mux.c                    | 5 +-
 drivers/tty/serial/mxs-auart.c              | 5 +-
 drivers/tty/serial/omap-serial.c            | 5 +-
 drivers/tty/serial/pch_uart.c               |
Re: [PATCH 00/50] Add log level to show_stack()
Hi Sergey,

On 11/13/19 6:33 AM, Sergey Senozhatsky wrote:
[..]
> Well, here we go. There is a number of generally useful functions that
> print nice data and where people might want to have better loglevel control
> (for debugging purposes). show_stack() is just one of them. Patching all
> those functions, which you have mentioned above, is hardly a fun task to do.
> Hence the printk() per-CPU per-context loglevel proposition. The code there
> is not clever or complicated and is meant for debugging purposes only, but
> with just 3 lines of code one can do some stuff:
>
>	/* @BTW you can't do this with "%s" KERN_FOO ;P */
>	+ printk_emergency_enter(LOGLEVEL_SCHED);
>	+ debug_show_all_locks();
>	+ printk_emergency_exit();

Not that I want to criticize your proposal more, but just to clarify what I meant by "clever and complicated": I don't think there is any semantic difference between

	printk_emergency_enter(LOGLEVEL_SCHED);
	printk();
	printk_emergency_exit();

and

	printk("%s ...", KERN_SCHED, ...);

I was speaking about complexity not of the usage, but of the realization inside printk_emergency_enter(): there will be some business logic that decides "is it NMI? Use the loglevel given. Is it KERN_ALERT? Don't downgrade the loglevel..." and other smart details - logic which one has to think about and later maintain.

There will also be minor issues like people inserting a print with one log level and seeing it in dmesg with another, but I presume those printk_enter()/printk_exit() pairs will cover a small amount of code without much nesting [as long as they aren't overused]. And it can be documented so that people learn about another feature of printk().

Also, this year I've seen the presentation "Why printk() is so complicated?", and I presumed the approach is to keep things as simple as possible.

In conclusion: I see that your proposal also solves the problem [except preemption and some pr_debug() that shouldn't be printed]. And I think your approach is better in the short term (we won't have any missed "%s KERN_" in linux-next), but in the long term it adds some amount of business logic and another feature to printk.

Just in case: again, I don't mind - it's up to you, the maintainers of printk. It's also not very fun for me to create those patches. But they fix console_loglevel issues (I hope we could un-export it in the end), and I also need them for my other patches, which will produce warnings with debug loglevel when configured through sysctl.

Thanks,
          Dmitry
Re: [PATCH 00/50] Add log level to show_stack()
On 11/12/19 4:25 AM, Sergey Senozhatsky wrote:
> On (19/11/12 02:40), Dmitry Safonov wrote:
> [..]
>> In my point of view the cost of one-time [mostly build] testing every
>> architecture is cheaper than introducing some new smart code that will
>> live forever.
>
> Well, there may be the need to pass loglevel deeper due to "hey __show_stack()
> on that arch invokes bar(), which invokes foo() and now foo() does printk(),
> but we don't see it". The context which decided to backtaraces decided
> to do so for a reason, probably, so I guess we can look at it as "a special
> error reporting code block".
>
> The proposed patch set passes loglevel via slightly unusual channel -
> via sprintf(). We probably can do it, but I would prefer to minimize
> the number of such printk-s in the kernel. The code snippet which I
> posted also does pretty unusual thing w.r.t loglevel. Both approaches
> are "non-standard" from that POV.

I don't strongly disagree, but if you look at the results of

	git grep 'printk("%s.*", \(lvl\|level\)'

it seems this channel is already used in quite a few places.

Thanks,
          Dmitry
Re: [PATCH 00/50] Add log level to show_stack()
On 11/13/19 1:23 AM, Sergey Senozhatsky wrote:
> On (19/11/12 19:12), Sergey Senozhatsky wrote:
>> On (19/11/12 09:35), Petr Mladek wrote:
>> [..]
>>> This is getting too complicated. It would introduce too many
>>> hidden rules. While the explicitly passed loglevel parameter
>>> is straightforward and clear.
>>
>> If loglevel is DEFAULT or NOTICE or INFO then we can overwrite it
>> (either downgrade or upgrade). That's one rule, basically. Not too
>> complicated, I guess.
>
> Can be taken even a bit further than
>
>	show_stack(NULL, NULL, LOGLEVEL_DEBUG);
> or
>	show_stack(NULL, NULL, LOGLEVEL_ERR);
>
> For instance,
>
>	spin_lock_irqsave(>lock, flags);
>	printk_emergency_enter(LOGLEVEL_SCHED);
>	...
>	show_stack(...);
>	printk();
>	printk();
>	...
>	spin_unlock_irqrestore(>lock, flags);
>
> or
>
>	spin_lock_irqsave(_port->lock, flags);
>	printk_emergency_enter(LOGLEVEL_SCHED);
>	...
>	printk();
>	printk();
>	...

Yeah, that makes sense. I believe it's up to you, Petr and Steven to decide what's better for printk() - after all, you care for all this code.

I've already made the point that in my view the price of one-time testing code is cheaper than adding a new printk feature that swaps log levels on the fly. But again, it's just how I see it: if we fix those printing problems, then in half a year we will forget they ever existed, whereas with your proposal there will still be some clever printk code around.

On the other side, it's also worth noting that the current patch set fixes the problems for kdb (and for my hung task printing patch), but it's incomplete for the sysrq driver. I've gone through the functions used by the sysrq driver, and the same change - introducing a log level parameter - would be needed for: sched_show_task(), debug_show_all_locks(), show_regs(), show_state(), show_mem(). Some of them don't need any platform changes, but at least show_regs() does. So, you also need to keep that in mind when making the decision.

Thanks,
          Dmitry
Re: [PATCH 00/50] Add log level to show_stack()
Hi Sergey,

On 11/12/19 2:17 AM, Sergey Senozhatsky wrote:
> On (19/11/11 19:47), Dmitry Safonov wrote:
[..]
>> What I'm going to do - is to fix all build and reported issues, I'll
>> send v2 this week and feel free to NAK it, I will forget about those
>> patches and won't be offended.
>
> Lovely.
> And - no, I'm not going to NAK platform specific changes. Just so you know.
>
> *All* I'm talking about is an alternative, less "go and touch a ton of
> platform code" approach. The argument "I patched so many files that I'm
> not even going to discuss anything now" is not productive, to say the
> least. Hope this clarifies.

It probably was a wrong impression on both sides. My impression was "You touch every architecture - we won't even consider that". Sorry for the wrong impression from my side - I'm open to discussion.

In my point of view, the cost of one-time [mostly build] testing of every architecture is cheaper than introducing some new smart code that will live forever. Though, again, you and Petr understand the printk() code better than I do, so I'm not insisting on anything.

I'll reply to your suggestion tomorrow, it's a bit late in my tz.

Thanks,
          Dmitry
Re: [PATCH 00/50] Add log level to show_stack()
Hi Sergey, Petr,

On 11/11/19 1:23 AM, Sergey Senozhatsky wrote:
> On (19/11/08 14:04), Petr Mladek wrote:
> [..]
>> I agree that it is complicated to pass the loglevel as
>> a parameter. It would be better define the default
>> log level for a given code section. It might be stored
>> in task_struct for the normal context and in per-CPU
>> variables for interrupt contexts.
>
> I do recall that we talked about per-CPU printk state bit which would
> start/end "just print it" section. We probably can extend it to "just
> log_store" type of functionality. Doesn't look like a very bad idea.
> "This task/context is in trouble, whatever it printk()-s is important".

I don't see how bits on task_struct or in per-CPU variables are easier than supplying a log level parameter down the stack. How would it work if sysrq_handle_crash() is called by a key press? How would it interact with deferred printing? How would it make prints from the current context visible, but not those from something that preempted it?

Furthermore, what problems are you trying to solve with this design? Only the sysrq driver? Kdb? From my perspective it sounds too complicated and over-engineered for something that has two or three users.

I've also tried to point out that I sometimes need to print a backtrace with the KERN_DEBUG loglevel, to use in production for early notices that should go only to log files - and currently each architecture decides which log level it prefers.

And what's so complicated about this patch set? I only see the testing side, but build testing is covered by the 0day bot and costs nothing, and any visible regression may be found during the -next period. Whereas introducing those printk sections may subtly break things.

I mean, I definitely know less about printk() and its internals than you - so my points may be nonsense. What I'm going to do is fix all build and reported issues; I'll send v2 this week, and feel free to NAK it - I will forget about those patches and won't be offended. I don't see many people saying "hey, we'll benefit from this". And doing this patch set was neither much fun (dirty business), nor something I can be proud of later (hey, I've introduced the log level parameter to printing functions!).

Thanks,
          Dima
Re: [PATCH 00/50] Add log level to show_stack()
On 11/8/19 5:30 PM, Russell King - ARM Linux admin wrote:
> On Fri, Nov 08, 2019 at 04:28:30PM +0000, Dmitry Safonov wrote:
[..]
>>
>> Well, the use-case for lower log-level is that everything goes into logs
>> (/var/log/dmesg or /var/log/messages whatever rsyslog has settting).
>>
>> That has it's value:
>> - after a failure (i.e. panic) messages, those were only signs that
>> something goes wrong can be seen in logs which can give ideas what has
>> happened.
>
> No they don't. When the kernel panics, userspace generally stops
> running, so rsyslog won't be able to write them to /var/log/messages.
>
> How, by "kernel panics" I mean a real kernel panic, which probably
> isn't what you're talking about there. You are probably talking
> about the whole shebang of non-fatal kernel oops, kernel warnings
> and the like. If so, I'd ask you to stop confuzzilating terminology.
>
> If you really want to capture such events, then you need to have the
> kernel write the panic to (e.g.) flash - see the mtdoops driver.

I was talking about things prior to the panic: OOMs, MMC write/read warnings, hung tasks; we also have local patches to produce a warning if a mutex is being held for too long or a task is starving for CPU time because of hard/soft irqs (I hope I will design something like that for upstream).

I've found those warnings useful to:
(a) have an early message when things are starting to go bad;
(b) analyze contentions, a scale too large for a box, or faulty hardware for non-reproducible issues, just from the logs.

We use kexec to save the dmesg ringbuffer content after the panic.

Thanks,
          Dmitry
Re: [PATCH 00/50] Add log level to show_stack()
On 11/6/19 8:34 PM, Peter Zijlstra wrote:
> On Wed, Nov 06, 2019 at 04:27:33PM +0000, Dmitry Safonov wrote:
[..]
>> Sorry, I should have tried to describe better.
>>
>> I'm trying to remove external users of console_loglevel by following
>> reasons:
>
> I suppose since all my machines have 'debug ignore_loglevel
> earlyprintk=serial,ttyS0,115200 console=ttyS0,115200' I don't have this
> experience.

Yeah, I remember you avoid all those functionalities of printk(), fair enough. On the other side, regular users - and I'm betting most of the non-tuned distributions - use /proc/sys/kernel/printk by default. (Checking on my Arch & Fedora: loglevel 4 out of the box.)

>> - changing console_loglevel on SMP means that unwanted messages from
>> other CPUs will appear (that have lower log level)
>> - on UMP unwanted messages may appear if the code is preempted while it
>> hasn't set the console_loglevel back to old
>> - rising console_loglevel to print wanted message(s) may not work at all
>> if printk() has being delayed and the console_loglevel is already set
>> back to old value
>
> Sure, frobbing the global console_loglevel is bad.
>
>> I also have patches in wip those needs to print backtrace with specific
>> loglevel (higher when it's critical, lower when it's notice and
>> shouldn't go to serial console).
>
> (everything always should go to serial, serial is awesome :-)

Personally I agree. Unfortunately, here @Arista there are switches (I'm speaking about the order of thousands at least) that have a baud rate of 9600. Being verbose over such a setup is a bit expensive.

>> Besides on local tests I see hits those have headers (messages like
>> "Backtrace: ") without an actual backtrace and the reverse - a backtrace
>> without a reason for it. It's quite annoying and worth addressing by
>> syncing headers log levels to backtraces.
>
> I suppose I'm surprised there are backtraces that are not important.
> Either badness happened and it needs printing, or the user asked for it
> and it needs printing.
>
> Perhaps we should be removing backtraces if they're not important
> instead of allowing to print them as lower loglevels?

Well, the use-case for a lower log level is that everything still goes into the logs (/var/log/dmesg or /var/log/messages, whatever setting rsyslog has).

That has its value:
- after a failure (i.e. panic), messages that were the only signs that something was going wrong can be seen in the logs, which can give ideas about what happened;
- before a failure, those messages go to telemetry and can be automagically processed remotely to take a decision (e.g. balance the traffic away).

I wish all the information could be gathered in userspace, but many existing kernel/driver messages are valuable. A more detailed example is the hung task detector: we want the backtrace for a task that has been in uninterruptible state for too long, but only in the logs, as printing to a serial console may lead to holding the console lock and hitting the watchdog.

Besides customer notifications and investigations, I see the value of such "bottleneck" warnings during long-running integration tests.

Thanks,
          Dmitry
Re: [PATCH 00/50] Add log level to show_stack()
Hi Peter,

On 11/6/19 9:20 AM, Peter Zijlstra wrote:
> On Wed, Nov 06, 2019 at 03:04:51AM +0000, Dmitry Safonov wrote:
>> Add a log level argument to show_stack().
>> Done in three stages:
>> 1. Introducing show_stack_loglvl() for every architecture
>> 2. Migrating old users with an explicit log level
>> 3. Renaming show_stack_loglvl() into show_stack()
>>
>> Justification:
>> o It's a design mistake to move a business-logic decision
>>   into a platform realization detail.
>> o I currently have two patch sets that would benefit from this work:
>>   Removing console_loglevel jumps in the sysrq driver [1]
>>   Hung task warning before panic [2] - suggested by Tetsuo (but he
>>   probably didn't realise what it would involve).
>> o While doing (1), (2) the backtraces were adjusted to the headers
>>   and other messages for each situation - so there won't be a situation
>>   where the backtrace is printed but the headers are missing because
>>   they have a lesser log level (or the reverse).
>> o As a result, in (2) the plays with console_loglevel for kdb are
>>   removed.
>
> I really don't understand that word salad. Why are you doing this?

Sorry, I should have tried to describe it better.

I'm trying to remove external users of console_loglevel for the
following reasons:
- changing console_loglevel on SMP means that unwanted messages from
  other CPUs (those with a lower log level) will appear
- on UP, unwanted messages may appear if the code is preempted before it
  has set console_loglevel back to the old value
- raising console_loglevel to print the wanted message(s) may not work
  at all if printk() has been delayed and console_loglevel has already
  been set back to the old value

The sysrq driver changes console_loglevel because it needs to print a
message with a specific log level (the code says userspace relies on
this). Kdb changes console_loglevel to print a backtrace, as the log
level depends on the architecture's realisation.
I also have patches in the works that need to print a backtrace with a
specific log level (higher when it's critical, lower when it's a notice
and shouldn't go to the serial console).

Besides, in local tests I see hits that have headers (messages like
"Backtrace: ") without an actual backtrace, and the reverse - a
backtrace without a reason for it. It's quite annoying and worth
addressing by syncing the headers' log levels with the backtraces.

Thanks,
          Dmitry
Re: [PATCH 00/50] Add log level to show_stack()
On 11/6/19 8:35 AM, Petr Mladek wrote:
> On Wed 2019-11-06 03:04:51, Dmitry Safonov wrote:
>> Add a log level argument to show_stack().
>> Done in three stages:
>> 1. Introducing show_stack_loglvl() for every architecture
>> 2. Migrating old users with an explicit log level
>> 3. Renaming show_stack_loglvl() into show_stack()
>>
>> Justification:
>> o It's a design mistake to move a business-logic decision
>>   into a platform realization detail.
>> o I currently have two patch sets that would benefit from this work:
>>   Removing console_loglevel jumps in the sysrq driver [1]
>
> Just to clarify. The problem in the sysrq driver is a bit different.
> It modifies console_loglevel to show even less important messages
> on the console.
>
> IMHO, it should be solved by printing the header line with pr_err().
> It is not ideal. A cleaner solution might be to introduce another
> loglevel that will always get pushed to the console. But I am
> not sure if it is worth this single line.

I believe the reason it's not done that way is a comment in the sysrq
code saying that userspace relies on the log level of sysrq messages
(and may trigger alerts on the emerg/err log level):

 * Raise the apparent loglevel to maximum so that the sysrq header
 * is shown to provide the user with positive feedback.  We do not
 * simply emit this at KERN_EMERG as that would change message
 * routing in the consumers of /proc/kmsg.

But I don't mind any solution.

>> Hung task warning before panic [2] - suggested by Tetsuo (but he
>> probably didn't realise what it would involve).
>> o While doing (1), (2) the backtraces were adjusted to the headers
>>   and other messages for each situation - so there won't be a situation
>>   where the backtrace is printed but the headers are missing because
>>   they have a lesser log level (or the reverse).
>> o As a result, in (2) the plays with console_loglevel for kdb are
>>   removed.
>
>> The least important for upstream, but maybe still worth noting: every
>> company I've worked at so far has had an off-list patch to print a
>> backtrace with the needed log level (but only for the architecture
>> they cared about).
>> If you have other ideas how you would benefit from show_stack() with
>> a log level - please, reply to this cover letter.
>
> I agree with all the other justification.
>
> I would add: the backtrace is really useful for debugging. It should
> be possible to print it even in less critical situations.
>
> I am afraid that many people use WARN() for this purpose. But WARN()
> is not always appropriate. WARN() misuse hurts when the panic_on_warn
> option is used.

Thanks, Petr.

--
          Dmitry
Re: [PATCH 26/50] powerpc: Add show_stack_loglvl()
On 11/6/19 9:52 AM, Michael Ellerman wrote:
> Dmitry Safonov writes:
>> Currently, the log level of show_stack() depends on the platform
>> realization. It creates situations where the headers are printed with
>> a lower or higher log level than the stacktrace (depending on the
>> platform or user).
>
> Yes, I've had bug reports where the stacktrace is missing, which is
> annoying. Thanks for trying to fix the problem.
>
>> Furthermore, it forces the logic decision from the user onto the
>> architecture side. As a result, some users such as sysrq/kdb/etc are
>> doing tricks with temporarily raising console_loglevel while printing
>> their messages. And as a result it may not only print unwanted
>> messages from other CPUs, but also omit printing at all in the
>> unlucky case where the printk() was deferred.
>>
>> Introducing a log-level parameter and KERN_UNSUPPRESSED [1] seems an
>> easier approach than introducing more printk buffers.
>> Also, it will consolidate printings with headers.
>>
>> Introduce show_stack_loglvl(), which will eventually substitute
>> show_stack().
>>
>> Cc: Benjamin Herrenschmidt
>> Cc: Michael Ellerman
>> Cc: Paul Mackerras
>> Cc: linuxppc-dev@lists.ozlabs.org
>> [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
>> Signed-off-by: Dmitry Safonov
>> ---
>> arch/powerpc/kernel/process.c | 18 +-
>> 1 file changed, 13 insertions(+), 5 deletions(-)
>
> This looks good to me.
>
> Acked-by: Michael Ellerman (powerpc)

Thanks for the review and time!

--
          Dmitry
[PATCH 26/50] powerpc: Add show_stack_loglvl()
Currently, the log level of show_stack() depends on the platform
realization. It creates situations where the headers are printed with
a lower or higher log level than the stacktrace (depending on the
platform or user).

Furthermore, it forces the logic decision from the user onto the
architecture side. As a result, some users such as sysrq/kdb/etc are
doing tricks with temporarily raising console_loglevel while printing
their messages. And as a result it may not only print unwanted messages
from other CPUs, but also omit printing at all in the unlucky case
where the printk() was deferred.

Introducing a log-level parameter and KERN_UNSUPPRESSED [1] seems an
easier approach than introducing more printk buffers. Also, it will
consolidate printings with headers.

Introduce show_stack_loglvl(), which will eventually substitute
show_stack().

Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: linuxppc-dev@lists.ozlabs.org
[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
Signed-off-by: Dmitry Safonov
---
 arch/powerpc/kernel/process.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 639ceae7da9d..34b46680a196 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2028,7 +2028,8 @@ unsigned long get_wchan(struct task_struct *p)

 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;

-void show_stack(struct task_struct *tsk, unsigned long *stack)
+void show_stack_loglvl(struct task_struct *tsk, unsigned long *stack,
+		       const char *loglvl)
 {
 	unsigned long sp, ip, lr, newsp;
 	int count = 0;
@@ -2053,7 +2054,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 	}

 	lr = 0;
-	printk("Call Trace:\n");
+	printk("%sCall Trace:\n", loglvl);
 	do {
 		if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
 			break;
@@ -2062,7 +2063,8 @@
 		newsp = stack[0];
 		ip = stack[STACK_FRAME_LR_SAVE];
 		if (!firstframe || ip != lr) {
-			printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+			printk("%s["REG"] ["REG"] %pS",
+			       loglvl, sp, ip, (void *)ip);
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 			ret_addr = ftrace_graph_ret_addr(current, &ftrace_idx, ip, stack);
@@ -2084,8 +2086,9 @@
 			struct pt_regs *regs = (struct pt_regs *)
 				(sp + STACK_FRAME_OVERHEAD);
 			lr = regs->link;
-			printk("--- interrupt: %lx at %pS\nLR = %pS\n",
-			       regs->trap, (void *)regs->nip, (void *)lr);
+			printk("%s--- interrupt: %lx at %pS\nLR = %pS\n",
+			       loglvl, regs->trap,
+			       (void *)regs->nip, (void *)lr);
 			firstframe = 1;
 		}
@@ -2095,6 +2098,11 @@
 	put_task_stack(tsk);
 }

+void show_stack(struct task_struct *tsk, unsigned long *stack)
+{
+	show_stack_loglvl(tsk, stack, KERN_DEFAULT);
+}
+
 #ifdef CONFIG_PPC64
 /* Called with hard IRQs off */
 void notrace __ppc64_runlatch_on(void)
--
2.23.0
[PATCH 00/50] Add log level to show_stack()
Add a log level argument to show_stack().
Done in three stages:
1. Introducing show_stack_loglvl() for every architecture
2. Migrating old users with an explicit log level
3. Renaming show_stack_loglvl() into show_stack()

Justification:
o It's a design mistake to move a business-logic decision
  into a platform realization detail.
o I currently have two patch sets that would benefit from this work:
  Removing console_loglevel jumps in the sysrq driver [1]
  Hung task warning before panic [2] - suggested by Tetsuo (but he
  probably didn't realise what it would involve).
o While doing (1), (2) the backtraces were adjusted to the headers
  and other messages for each situation - so there won't be a situation
  where the backtrace is printed but the headers are missing because
  they have a lesser log level (or the reverse).
o As a result, in (2) the plays with console_loglevel for kdb are
  removed.

The least important for upstream, but maybe still worth noting: every
company I've worked at so far has had an off-list patch to print a
backtrace with the needed log level (but only for the architecture they
cared about).
If you have other ideas how you would benefit from show_stack() with a
log level - please, reply to this cover letter.

Ok, to the scary part: I've tested it on x86_64 and build-tested it on
a couple of architectures. Though I can't cover all platforms, so I
hope I'll get a couple of reports and then it'll soak in linux-next for
some time. In my opinion the variety of architectures shouldn't stop
general improvements.
Cc: Andrew Morton
Cc: Greg Kroah-Hartman
Cc: Ingo Molnar
Cc: Jiri Slaby
Cc: Petr Mladek
Cc: Sergey Senozhatsky
Cc: Steven Rostedt
Cc: Tetsuo Handa

Thanks,
Dmitry

[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
[2]: https://lkml.kernel.org/r/41fd7652-df1f-26f6-aba0-b87ebae07...@i-love.sakura.ne.jp

Dmitry Safonov (50):
  kallsyms/printk: Add loglvl to print_ip_sym()
  alpha: Add show_stack_loglvl()
  arc: Add show_stack_loglvl()
  arm/asm: Add loglvl to c_backtrace()
  arm: Add loglvl to unwind_backtrace()
  arm: Add loglvl to dump_backtrace()
  arm: Wire up dump_backtrace_{entry,stm}
  arm: Add show_stack_loglvl()
  arm64: Add loglvl to dump_backtrace()
  arm64: Add show_stack_loglvl()
  c6x: Add show_stack_loglvl()
  csky: Add show_stack_loglvl()
  h8300: Add show_stack_loglvl()
  hexagon: Add show_stack_loglvl()
  ia64: Pass log level as arg into ia64_do_show_stack()
  ia64: Add show_stack_loglvl()
  m68k: Add show_stack_loglvl()
  microblaze: Add loglvl to microblaze_unwind_inner()
  microblaze: Add loglvl to microblaze_unwind()
  microblaze: Add show_stack_loglvl()
  mips: Add show_stack_loglvl()
  nds32: Add show_stack_loglvl()
  nios2: Add show_stack_loglvl()
  openrisc: Add show_stack_loglvl()
  parisc: Add show_stack_loglvl()
  powerpc: Add show_stack_loglvl()
  riscv: Add show_stack_loglvl()
  s390: Add show_stack_loglvl()
  sh: Add loglvl to dump_mem()
  sh: Remove needless printk()
  sh: Add loglvl to printk_address()
  sh: Add loglvl to show_trace()
  sh: Add show_stack_loglvl()
  sparc: Add show_stack_loglvl()
  um/sysrq: Remove needless variable sp
  um: Add show_stack_loglvl()
  unicore32: Remove unused pmode argument in c_backtrace()
  unicore32: Add loglvl to c_backtrace()
  unicore32: Add show_stack_loglvl()
  x86: Add missing const qualifiers for log_lvl
  x86: Add show_stack_loglvl()
  xtensa: Add loglvl to show_trace()
  xtensa: Add show_stack_loglvl()
  sysrq: Use show_stack_loglvl()
  x86/amd_gart: Print stacktrace for a leak with KERN_ERR
  power: Use show_stack_loglvl()
  kdb: Don't play with
console_loglevel
  sched: Print stack trace with KERN_INFO
  kernel: Use show_stack_loglvl()
  kernel: Rename show_stack_loglvl() => show_stack()

 arch/alpha/kernel/traps.c           | 22 +++
 arch/arc/include/asm/bug.h          |  3 ++-
 arch/arc/kernel/stacktrace.c        | 17 +++-
 arch/arc/kernel/troubleshoot.c      |  2 +-
 arch/arm/include/asm/bug.h          |  3 ++-
 arch/arm/include/asm/traps.h        |  3 ++-
 arch/arm/include/asm/unwind.h       |  3 ++-
 arch/arm/kernel/traps.c             | 40
 arch/arm/kernel/unwind.c            |  7 ++---
 arch/arm/lib/backtrace-clang.S      |  9 +--
 arch/arm/lib/backtrace.S            | 14 +++---
 arch/arm64/include/asm/stacktrace.h |  3 ++-
 arch/arm64/kernel/process.c         |  2 +-
 arch/arm64/kernel/traps.c           | 19 ++---
 arch/c6x/kernel/traps.c             | 18 +++--
 arch/csky/kernel/dumpstack.c        |  9 ---
 arch/csky/kernel/ptrace.c           |  4 +--
 arch/h8300/kernel/traps.c           | 12 -
 arch/hexagon/kernel/traps.c         | 25 -
 arch/ia64/include/asm/ptrace.h      |  1 -
 arch/ia64/kernel/mca.c              |  2 +-
 arch/ia64/kernel/process.c          | 17 ++--
 arch/m68k/kernel/traps.c            | 13 -
 arch/microblaze/incl
[RFC 1/4] mm: remove unused TASK_SIZE_OF()
All users of TASK_SIZE_OF(tsk) have migrated to mm->task_size or
TASK_SIZE_MAX since:
commit d696ca016d57 ("x86/fsgsbase/64: Use TASK_SIZE_MAX for FSBASE/GSBASE upper limits"),
commit a06db751c321 ("pagemap: check permissions and capabilities at open time"),

Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Ralf Baechle <r...@linux-mips.org>
Cc: "James E.J. Bottomley" <j...@parisc-linux.org>
Cc: Helge Deller <del...@gmx.de>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Martin Schwidefsky <schwidef...@de.ibm.com>
Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/arm64/include/asm/memory.h       | 2 --
 arch/mips/include/asm/processor.h     | 3 ---
 arch/parisc/include/asm/processor.h   | 3 +--
 arch/powerpc/include/asm/processor.h  | 3 +--
 arch/s390/include/asm/processor.h     | 3 +--
 arch/sparc/include/asm/processor_64.h | 3 ---
 arch/x86/include/asm/processor.h      | 2 --
 include/linux/sched.h                 | 4
 8 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index bfe632808d77..329bb4fd543c 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -80,8 +80,6 @@
 #define TASK_SIZE_32		UL(0x1)
 #define TASK_SIZE		(test_thread_flag(TIF_32BIT) ? \
				TASK_SIZE_32 : TASK_SIZE_64)
-#define TASK_SIZE_OF(tsk)	(test_tsk_thread_flag(tsk, TIF_32BIT) ?
\
-				TASK_SIZE_32 : TASK_SIZE_64)
 #else
 #define TASK_SIZE		TASK_SIZE_64
 #endif /* CONFIG_COMPAT */

diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 95b8c471f572..c2827a5507d4 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -73,9 +73,6 @@ extern unsigned int vced_count, vcei_count;
 #define TASK_SIZE (test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE64)
 #define STACK_TOP_MAX	TASK_SIZE64

-#define TASK_SIZE_OF(tsk)						\
-	(test_tsk_thread_flag(tsk, TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE64)
-
 #define TASK_IS_32BIT_ADDR test_thread_flag(TIF_32BIT_ADDR)

 #endif

diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index a3661ee6b060..8b51ddae8e4a 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -32,8 +32,7 @@
 #define HAVE_ARCH_PICK_MMAP_LAYOUT

-#define TASK_SIZE_OF(tsk)	((tsk)->thread.task_size)
-#define TASK_SIZE		TASK_SIZE_OF(current)
+#define TASK_SIZE		(current->thread.task_size)
 #define TASK_UNMAPPED_BASE	(current->thread.map_base)

 #define DEFAULT_TASK_SIZE32	(0xFFF0UL)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 1ba814436c73..04e575ead590 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -111,9 +111,8 @@ void release_thread(struct task_struct *);
 */
 #define TASK_SIZE_USER32 (0x0001UL - (1*PAGE_SIZE))

-#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
+#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
		TASK_SIZE_USER32 : TASK_SIZE_USER64)
-#define TASK_SIZE		TASK_SIZE_OF(current)

 /* This decides where the kernel will search for a free chunk of vm
 * space during mmap's.
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 6bca916a5ba0..c53e8e2a51ac 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -89,10 +89,9 @@ extern void execve_tail(void);
 * User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit.
 */

-#define TASK_SIZE_OF(tsk)	((tsk)->mm->context.asce_limit)
 #define TASK_UNMAPPED_BASE	(test_thread_flag(TIF_31BIT) ? \
				(1UL << 30) : (1UL << 41))
-#define TASK_SIZE		TASK_SIZE_OF(current)
+#define TASK_SIZE		(current->mm->context.asce_limit)
 #define TASK_MAX_SIZE		(1UL << 53)

 #define STACK_TOP		(1UL << (test_thread_flag(TIF_31BIT) ? 31:42))

diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 6448cfc8292f..6ce1a75d7a24 100644
--- a/arch/sparc
[RFC 0/4] x86: keep TASK_SIZE in sync with mm->task_size
At this moment, we have the following task_size-related things:
- the TASK_SIZE_OF() macro, which is unused;
- current->mm->task_size, which is used in half of the code, and the
  TASK_SIZE macro, which is used in the other half;
- TIF_ADDR32, which is used to detect a 32-bit address space and is
  x86-specific, where some other arches misused TIF_32BIT;
- the ADDR_LIMIT_32BIT personality, which is used on arm/alpha;
- ADDR_LIMIT_3GB, which is x86-specific and can be used to change a
  running task's TASK_SIZE 3GB <-> 4GB.

This patch set removes the unused definition of TASK_SIZE_OF (1) and
defines the TASK_SIZE macro as current->mm->task_size (3). I would
suggest defining TASK_SIZE this way in the generic version, but
currently I've tested it only on x86.

It also frees a thread info flag (2) and adds an arch_prctl() on x86_64
to get/set the current virtual address space size - as it's needed by
now only for CRIU, hide it under the CHECKPOINT_RESTORE config.

Hope those patches will help to clean up task_size-related code at
least a bit (and help me to restore vaddr limits).

Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Kirill A.
Shutemov <kirill.shute...@linux.intel.com> Cc: x...@kernel.org Dmitry Safonov (4): mm: remove unused TASK_SIZE_OF() x86/thread_info: kill TIF_ADDR32 in favour of ADDR_LIMIT_32BIT x86/mm: define TASK_SIZE as current->mm->task_size x86/arch_prctl: add ARCH_{GET,SET}_TASK_SIZE arch/arm64/include/asm/memory.h | 2 -- arch/mips/include/asm/processor.h | 3 --- arch/parisc/include/asm/processor.h | 3 +-- arch/powerpc/include/asm/processor.h | 3 +-- arch/s390/include/asm/processor.h | 3 +-- arch/sparc/include/asm/processor_64.h | 3 --- arch/x86/include/asm/elf.h| 7 +-- arch/x86/include/asm/processor.h | 19 +-- arch/x86/include/asm/thread_info.h| 4 +--- arch/x86/include/uapi/asm/prctl.h | 3 +++ arch/x86/kernel/process_64.c | 17 +++-- arch/x86/kernel/sys_x86_64.c | 4 ++-- arch/x86/um/asm/segment.h | 2 +- arch/x86/xen/mmu.c| 4 ++-- fs/exec.c | 17 +++-- include/linux/sched.h | 4 16 files changed, 52 insertions(+), 46 deletions(-) -- 2.11.0
Re: [PATCHv3 0/8] powerpc/mm: refactor vDSO mapping code
On 11/08/2016 02:57 AM, Michael Ellerman wrote:
> Dmitry Safonov <0x7f454...@gmail.com> writes:
>> 2016-10-27 20:09 GMT+03:00 Dmitry Safonov <dsafo...@virtuozzo.com>:
>>> ping?
>>
>> There's another series doing some similar changes:
>> http://www.spinics.net/lists/linux-mm/msg115860.html

Well, that version makes the arch_mremap hook more general, with
renaming the vdso pointer, while this series erases that hook totally.
So, we've agreed that it would be better without this hook, but with a
generic version of the vdso_mremap special_mapping helper:
https://marc.info/?i=d1aa8bec-a53e-cd30-e66a-39bebb6a4...@codeaurora.org

> And I don't like all the macro games in 3/8, eg:
>
> +#ifndef BITS
> +#define BITS 32
> +#endif
> +
> +#undef Elf_Ehdr
> +#undef Elf_Sym
> +#undef Elf_Shdr
> +
> +#define _CONCAT3(a, b, c)	a ## b ## c
> +#define CONCAT3(a, b, c)	_CONCAT3(a, b, c)
> +#define Elf_Ehdr		CONCAT3(Elf, BITS, _Ehdr)
> +#define Elf_Sym		CONCAT3(Elf, BITS, _Sym)
> +#define Elf_Shdr		CONCAT3(Elf, BITS, _Shdr)
> +#define VDSO_LBASE		CONCAT3(VDSO, BITS, _LBASE)
> +#define vdso_kbase		CONCAT3(vdso, BITS, _kbase)
> +#define vdso_pages		CONCAT3(vdso, BITS, _pages)
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt)		"vDSO" __stringify(BITS) ": " fmt
> +
> +#define lib_elfinfo		CONCAT3(lib, BITS, _elfinfo)
> +
> +#define find_section		CONCAT3(find_section, BITS,)
> +static void * __init find_section(Elf_Ehdr *ehdr, const char *secname,
> +				  unsigned long *size)
>
> I'd rather we kept the duplication of code than the obfuscation those
> macros add. If we can come up with a way to share more of the code
> without having to do all those tricks then I'd be interested.

Well, ok, I thought it's quite common even outside of tracing: e.g.,
fs/compat_binfmt_elf.c does quite the same trick. But as you find it
obscured - then ok, I will resend without that common-vdso part.

> cheers

--
          Dmitry
Re: [PATCHv3 1/8] powerpc/vdso: unify return paths in setup_additional_pages
On 11/08/2016 03:10 AM, Michael Ellerman wrote:
> Hi Dmitry,
>
> Thanks for the patches.
>
> Dmitry Safonov <dsafo...@virtuozzo.com> writes:
>> Impact: cleanup
>
> I'm not a fan of these "Impact" lines, especially when they're not
> correct, ie. this is not a cleanup, a cleanup doesn't change logic.
>
>> Rename the `rc' variable, which doesn't seem to mean anything, into
>> the kernel-known `ret'.
>
> 'rc' means "Return Code", it's fairly common. I see at least ~8500
> "int rc" declarations in the kernel.
>
> Please don't rename variables and change logic in one patch.

Ok, right - I just hadn't seen `rc' as frequently as `ret'. Will leave
the name.

>> Combine two function returns into one as it's also easier to read.
>>
>> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
>> Cc: Paul Mackerras <pau...@samba.org>
>> Cc: Michael Ellerman <m...@ellerman.id.au>
>> Cc: Andy Lutomirski <l...@amacapital.net>
>> Cc: Oleg Nesterov <o...@redhat.com>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux...@kvack.org
>> Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
>> ---
>> arch/powerpc/kernel/vdso.c | 19 +++
>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
>> index 4111d30badfa..4ffb82a2d9e9 100644
>> --- a/arch/powerpc/kernel/vdso.c
>> +++ b/arch/powerpc/kernel/vdso.c
>> @@ -154,7 +154,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>> 	struct page **vdso_pagelist;
>> 	unsigned long vdso_pages;
>> 	unsigned long vdso_base;
>> -	int rc;
>> +	int ret = 0;
>
> Please don't initialise return codes in the declaration, it prevents
> the compiler from warning you if you forget to initialise it in a
> particular path. AFAICS you never even use the default value either.

Oh, right - I split this patch from converting install_special_mapping()
to the special vma version, _install_special_mapping(), patch 6/8 in the
series. Will move the initialization to that patch.
>> 	if (!vdso_ready)
>> 		return 0;
>> @@ -203,8 +203,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>> 			((VDSO_ALIGNMENT - 1) & PAGE_MASK), 0, 0);
>> 	if (IS_ERR_VALUE(vdso_base)) {
>> -		rc = vdso_base;
>> -		goto fail_mmapsem;
>> +		ret = vdso_base;
>> +		goto out_up_mmap_sem;
>> 	}
>>
>> 	/* Add required alignment. */
>> @@ -227,21 +227,16 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>> 	 * It's fine to use that for setting breakpoints in the vDSO code
>> 	 * pages though.
>> 	 */
>> -	rc = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
>> +	ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
>> 				     VM_READ|VM_EXEC|
>> 				     VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
>> 				     vdso_pagelist);
>> -	if (rc) {
>> +	if (ret)
>> 		current->mm->context.vdso_base = 0;
>> -		goto fail_mmapsem;
>> -	}
>> -
>> -	up_write(&mm->mmap_sem);
>> -	return 0;
>> -
>> - fail_mmapsem:
>> +out_up_mmap_sem:
>> 	up_write(&mm->mmap_sem);
>> -	return rc;
>> +	return ret;
>> }
>
> If you strip out the variable renames then I think that change would be
> OK.
>
> cheers

--
          Dmitry
Re: [PATCHv3 0/8] powerpc/mm: refactor vDSO mapping code
2016-10-27 20:09 GMT+03:00 Dmitry Safonov <dsafo...@virtuozzo.com>: > Changes since v1, v2: > - use vdso64_pages only under CONFIG_PPC64 (32-bit build fix) > - remove arch_vma_name helper as not needed anymore, > simplify vdso_base pointer initializing in map_vdso() > > Cleanup patches for vDSO on powerpc. > Originally, I wanted to add vDSO remapping on arm/aarch64 and > I decided to cleanup that part on powerpc. > I've add a hook for vm_ops for vDSO just like I did for x86, > which makes cross-arch arch_mremap hook no more needed. > Other changes - reduce exhaustive code duplication by > separating the common vdso code. > > No visible to userspace changes expected. > Tested on qemu with buildroot rootfs. > > Dmitry Safonov (8): > powerpc/vdso: unify return paths in setup_additional_pages > powerpc/vdso: remove unused params in vdso_do_func_patch{32,64} > powerpc/vdso: separate common code in vdso_common > powerpc/vdso: introduce init_vdso{32,64}_pagelist > powerpc/vdso: split map_vdso from arch_setup_additional_pages > powerpc/vdso: switch from legacy_special_mapping_vmops > mm: kill arch_mremap > powerpc/vdso: remove arch_vma_name > > arch/alpha/include/asm/Kbuild| 1 - > arch/arc/include/asm/Kbuild | 1 - > arch/arm/include/asm/Kbuild | 1 - > arch/arm64/include/asm/Kbuild| 1 - > arch/avr32/include/asm/Kbuild| 1 - > arch/blackfin/include/asm/Kbuild | 1 - > arch/c6x/include/asm/Kbuild | 1 - > arch/cris/include/asm/Kbuild | 1 - > arch/frv/include/asm/Kbuild | 1 - > arch/h8300/include/asm/Kbuild| 1 - > arch/hexagon/include/asm/Kbuild | 1 - > arch/ia64/include/asm/Kbuild | 1 - > arch/m32r/include/asm/Kbuild | 1 - > arch/m68k/include/asm/Kbuild | 1 - > arch/metag/include/asm/Kbuild| 1 - > arch/microblaze/include/asm/Kbuild | 1 - > arch/mips/include/asm/Kbuild | 1 - > arch/mn10300/include/asm/Kbuild | 1 - > arch/nios2/include/asm/Kbuild| 1 - > arch/openrisc/include/asm/Kbuild | 1 - > arch/parisc/include/asm/Kbuild | 1 - > arch/powerpc/include/asm/mm-arch-hooks.h | 28 -- 
> arch/powerpc/kernel/vdso.c | 502 > +-- > arch/powerpc/kernel/vdso_common.c| 248 +++ > arch/s390/include/asm/Kbuild | 1 - > arch/score/include/asm/Kbuild| 1 - > arch/sh/include/asm/Kbuild | 1 - > arch/sparc/include/asm/Kbuild| 1 - > arch/tile/include/asm/Kbuild | 1 - > arch/um/include/asm/Kbuild | 1 - > arch/unicore32/include/asm/Kbuild| 1 - > arch/x86/include/asm/Kbuild | 1 - > arch/xtensa/include/asm/Kbuild | 1 - > include/asm-generic/mm-arch-hooks.h | 16 - > include/linux/mm-arch-hooks.h| 25 -- > mm/mremap.c | 4 - > 36 files changed, 324 insertions(+), 529 deletions(-) > delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h > create mode 100644 arch/powerpc/kernel/vdso_common.c > delete mode 100644 include/asm-generic/mm-arch-hooks.h > delete mode 100644 include/linux/mm-arch-hooks.h ping?
Re: [PATCH 0/7] powerpc/mm: refactor vDSO mapping code
2016-10-25 18:50 GMT+03:00 Dmitry Safonov <dsafo...@virtuozzo.com>:
> Cleanup patches for vDSO on powerpc.
> Originally, I wanted to add vDSO remapping on arm/aarch64, and I
> decided to clean up that part on powerpc first.
> I've added a hook for vm_ops for vDSO, just like I did for x86.
> Other changes - reduce exhaustive code duplication.
> No userspace-visible changes expected.
>
> Tested on qemu with a buildroot rootfs.
>
> Dmitry Safonov (7):
>   powerpc/vdso: unify return paths in setup_additional_pages
>   powerpc/vdso: remove unused params in vdso_do_func_patch{32,64}
>   powerpc/vdso: separate common code in vdso_common
>   powerpc/vdso: introduce init_vdso{32,64}_pagelist
>   powerpc/vdso: split map_vdso from arch_setup_additional_pages
>   powerpc/vdso: switch from legacy_special_mapping_vmops
>   mm: kill arch_mremap

Ignore this version, please - I've just sent v3 with some new fixes.
[PATCHv3 7/8] mm: kill arch_mremap
This reverts commit 4abad2ca4a4d ("mm: new arch_remap() hook") and commit 2ae416b142b6 ("mm: new mm hook framework"). It also keeps the same functionality of mremapping vDSO blob with introducing vm_special_mapping mremap op for powerpc. The same way it's being handled on x86. Cc: Laurent Dufour <lduf...@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: Andrew Morton <a...@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/alpha/include/asm/Kbuild| 1 - arch/arc/include/asm/Kbuild | 1 - arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild| 1 - arch/avr32/include/asm/Kbuild| 1 - arch/blackfin/include/asm/Kbuild | 1 - arch/c6x/include/asm/Kbuild | 1 - arch/cris/include/asm/Kbuild | 1 - arch/frv/include/asm/Kbuild | 1 - arch/h8300/include/asm/Kbuild| 1 - arch/hexagon/include/asm/Kbuild | 1 - arch/ia64/include/asm/Kbuild | 1 - arch/m32r/include/asm/Kbuild | 1 - arch/m68k/include/asm/Kbuild | 1 - arch/metag/include/asm/Kbuild| 1 - arch/microblaze/include/asm/Kbuild | 1 - arch/mips/include/asm/Kbuild | 1 - arch/mn10300/include/asm/Kbuild | 1 - arch/nios2/include/asm/Kbuild| 1 - arch/openrisc/include/asm/Kbuild | 1 - arch/parisc/include/asm/Kbuild | 1 - arch/powerpc/include/asm/mm-arch-hooks.h | 28 arch/powerpc/kernel/vdso.c | 25 + arch/powerpc/kernel/vdso_common.c| 1 + arch/s390/include/asm/Kbuild | 1 - arch/score/include/asm/Kbuild| 1 - arch/sh/include/asm/Kbuild | 1 - arch/sparc/include/asm/Kbuild| 1 - arch/tile/include/asm/Kbuild | 1 - arch/um/include/asm/Kbuild | 1 - arch/unicore32/include/asm/Kbuild| 1 - arch/x86/include/asm/Kbuild | 1 - arch/xtensa/include/asm/Kbuild | 1 - include/asm-generic/mm-arch-hooks.h | 16 
include/linux/mm-arch-hooks.h| 25 - mm/mremap.c | 4 36 files changed, 26 insertions(+), 103 deletions(-) delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h delete mode 100644 include/asm-generic/mm-arch-hooks.h delete mode 100644 include/linux/mm-arch-hooks.h diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index bf8475ce85ee..0a5e0ec2842b 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -6,7 +6,6 @@ generic-y += exec.h generic-y += export.h generic-y += irq_work.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += preempt.h generic-y += sections.h generic-y += trace_clock.h diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index c332604606dd..e6059a808463 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -22,7 +22,6 @@ generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += mman.h generic-y += msgbuf.h generic-y += msi.h diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index 0745538b26d3..44b717cb4a55 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -15,7 +15,6 @@ generic-y += irq_regs.h generic-y += kdebug.h generic-y += local.h generic-y += local64.h -generic-y += mm-arch-hooks.h generic-y += msgbuf.h generic-y += msi.h generic-y += param.h diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 44e1d7f10add..a42a1367aea4 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -20,7 +20,6 @@ generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += mman.h generic-y += msgbuf.h generic-y += msi.h diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild index 241b9b9729d8..519810d0d5e1 100644 --- a/arch/avr32/include/asm/Kbuild +++ 
b/arch/avr32/include/asm/Kbuild @@ -12,7 +12,6 @@ generic-y += irq_work.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += param.h generic-y += percpu.h generic-y += preempt.h diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild index
[PATCHv3 8/8] powerpc/vdso: remove arch_vma_name
arch_vma_name() is not needed anymore since the vDSO is now inserted with a vm_special_mapping, which carries the vma name. This also reverts commit f2053f1a7bf6 ("powerpc/perf_counter: Fix vdso detection"), which is likewise no longer needed. See also commit f7b6eb3fa072 ("x86: Set context.vdso before installing the mapping"). Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/powerpc/kernel/vdso.c | 20 +++- 1 file changed, 3 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index 431bdf7ec68e..f66f52aa94de 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -208,13 +208,6 @@ static int map_vdso(struct vm_special_mapping *vsm, unsigned long vdso_pages, vdso_base = ALIGN(vdso_base, VDSO_ALIGNMENT); /* -* Put vDSO base into mm struct. We need to do this before calling -* install_special_mapping or the perf counter mmap tracking code -* will fail to recognise it as a vDSO (since arch_vma_name fails). -*/ - current->mm->context.vdso_base = vdso_base; - - /* * our vma flags don't have VM_WRITE so by default, the process isn't * allowed to write those pages.
* gdb can break that with ptrace interface, and thus trigger COW on @@ -228,10 +221,10 @@ static int map_vdso(struct vm_special_mapping *vsm, unsigned long vdso_pages, VM_READ|VM_EXEC| VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, vsm); - if (IS_ERR(vma)) { + if (IS_ERR(vma)) ret = PTR_ERR(vma); - current->mm->context.vdso_base = 0; - } + else + current->mm->context.vdso_base = vdso_base; out_up_mmap_sem: up_write(>mmap_sem); @@ -262,13 +255,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) return -1; } -const char *arch_vma_name(struct vm_area_struct *vma) -{ - if (vma->vm_mm && vma->vm_start == vma->vm_mm->context.vdso_base) - return "[vdso]"; - return NULL; -} - #ifdef CONFIG_VDSO32 #include "vdso_common.c" #endif /* CONFIG_VDSO32 */ -- 2.10.1
[PATCHv3 2/8] powerpc/vdso: remove unused params in vdso_do_func_patch{32,64}
Impact: cleanup vdso_do_func_patch32() only uses the v32 parameter, and vdso_do_func_patch64() only uses v64; remove the unneeded parameters. Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/powerpc/kernel/vdso.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index 4ffb82a2d9e9..278b9aa25a1c 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -309,7 +309,6 @@ static unsigned long __init find_function32(struct lib32_elfinfo *lib, } static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32, - struct lib64_elfinfo *v64, const char *orig, const char *fix) { Elf32_Sym *sym32_gen, *sym32_fix; @@ -344,7 +343,6 @@ static unsigned long __init find_function32(struct lib32_elfinfo *lib, } static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32, - struct lib64_elfinfo *v64, const char *orig, const char *fix) { return 0; @@ -419,8 +417,7 @@ static unsigned long __init find_function64(struct lib64_elfinfo *lib, #endif } -static int __init vdso_do_func_patch64(struct lib32_elfinfo *v32, - struct lib64_elfinfo *v64, +static int __init vdso_do_func_patch64(struct lib64_elfinfo *v64, const char *orig, const char *fix) { Elf64_Sym *sym64_gen, *sym64_fix; @@ -619,11 +616,9 @@ static __init int vdso_fixup_alt_funcs(struct lib32_elfinfo *v32, * It would be easy to do, but doesn't seem to be necessary, * patching the OPD symbol is enough.
*/ - vdso_do_func_patch32(v32, v64, patch->gen_name, -patch->fix_name); + vdso_do_func_patch32(v32, patch->gen_name, patch->fix_name); #ifdef CONFIG_PPC64 - vdso_do_func_patch64(v32, v64, patch->gen_name, -patch->fix_name); + vdso_do_func_patch64(v64, patch->gen_name, patch->fix_name); #endif /* CONFIG_PPC64 */ } -- 2.10.1
[PATCHv3 6/8] powerpc/vdso: switch from legacy_special_mapping_vmops
This allows the vDSO vma to be handled like a special_mapping, which has its own name and hooks. This is needed for the mremap hook, which will replace the arch_mremap helper, and also for removing arch_vma_name. Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/powerpc/kernel/vdso.c| 19 +++ arch/powerpc/kernel/vdso_common.c | 8 ++-- 2 files changed, 17 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index e68601ffc9ad..9ee3fd65c6e9 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -51,7 +51,7 @@ #define VDSO_ALIGNMENT (1 << 16) static unsigned int vdso32_pages; -static struct page **vdso32_pagelist; +static struct vm_special_mapping vdso32_mapping; unsigned long vdso32_sigtramp; unsigned long vdso32_rt_sigtramp; @@ -64,7 +64,7 @@ static void *vdso32_kbase; extern char vdso64_start, vdso64_end; static void *vdso64_kbase = &vdso64_start; static unsigned int vdso64_pages; -static struct page **vdso64_pagelist; +static struct vm_special_mapping vdso64_mapping; unsigned long vdso64_rt_sigtramp; #endif /* CONFIG_PPC64 */ @@ -143,10 +143,11 @@ struct lib64_elfinfo unsigned long text; }; -static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages, +static int map_vdso(struct vm_special_mapping *vsm, unsigned long vdso_pages, unsigned long vdso_base) { struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; int ret = 0; mm->context.vdso_base = 0; @@ -198,12 +199,14 @@ static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages, * It's fine to use that for setting breakpoints in the vDSO code * pages though.
*/ - ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT, + vma = _install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT, VM_READ|VM_EXEC| VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, -vdso_pagelist); - if (ret) +vsm); + if (IS_ERR(vma)) { + ret = PTR_ERR(vma); current->mm->context.vdso_base = 0; + } out_up_mmap_sem: up_write(>mmap_sem); @@ -220,7 +223,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) return 0; if (is_32bit_task()) - return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE); + return map_vdso(_mapping, vdso32_pages, VDSO32_MBASE); #ifdef CONFIG_PPC64 else /* @@ -228,7 +231,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) * allows get_unmapped_area to find an area near other mmaps * and most likely share a SLB entry. */ - return map_vdso(vdso64_pagelist, vdso64_pages, 0); + return map_vdso(_mapping, vdso64_pages, 0); #endif WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel"); return -1; diff --git a/arch/powerpc/kernel/vdso_common.c b/arch/powerpc/kernel/vdso_common.c index c97c30606b3f..047f6b8b230f 100644 --- a/arch/powerpc/kernel/vdso_common.c +++ b/arch/powerpc/kernel/vdso_common.c @@ -14,7 +14,7 @@ #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE) #define vdso_kbase CONCAT3(vdso, BITS, _kbase) #define vdso_pages CONCAT3(vdso, BITS, _pages) -#define vdso_pagelist CONCAT3(vdso, BITS, _pagelist) +#define vdso_mapping CONCAT3(vdso, BITS, _mapping) #undef pr_fmt #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt @@ -207,6 +207,7 @@ static __init int vdso_setup(struct lib_elfinfo *v) static __init void init_vdso_pagelist(void) { int i; + struct page **vdso_pagelist; /* Make sure pages are in the correct state */ vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2), @@ -221,6 +222,9 @@ static __init void init_vdso_pagelist(void) } vdso_pagelist[i++] = virt_to_page(vdso_data); vdso_pagelist[i] = NULL; + + vdso_mapping.pages = vdso_pagelist; + 
vdso_mapping.name = "[vdso]"; } #undef find_section @@ -236,7 +240,7 @@ static __init void init_vdso_pagelist(void) #undef VDSO_LBASE #undef vdso_kbase #undef vdso_pages -#undef vdso_pagelist +#undef vdso_mapping #undef lib_elfinfo #undef BITS #undef _CONCAT3 -- 2.10.1
[PATCHv3 5/8] powerpc/vdso: split map_vdso from arch_setup_additional_pages
Impact: cleanup It'll be easier to introduce the vm_special_mapping struct in a smaller map_vdso() function (see the next patches), the same way it's handled on x86. Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/powerpc/kernel/vdso.c | 67 +- 1 file changed, 31 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index 25d03d773c49..e68601ffc9ad 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -143,52 +143,23 @@ struct lib64_elfinfo unsigned long text; }; - -/* - * This is called from binfmt_elf, we create the special vma for the - * vDSO and insert it into the mm struct tree - */ -int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) +static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages, + unsigned long vdso_base) { struct mm_struct *mm = current->mm; - struct page **vdso_pagelist; - unsigned long vdso_pages; - unsigned long vdso_base; int ret = 0; - if (!vdso_ready) - return 0; - -#ifdef CONFIG_PPC64 - if (is_32bit_task()) { - vdso_pagelist = vdso32_pagelist; - vdso_pages = vdso32_pages; - vdso_base = VDSO32_MBASE; - } else { - vdso_pagelist = vdso64_pagelist; - vdso_pages = vdso64_pages; - /* -* On 64bit we don't have a preferred map address. This -* allows get_unmapped_area to find an area near other mmaps -* and most likely share a SLB entry.
-*/ - vdso_base = 0; - } -#else - vdso_pagelist = vdso32_pagelist; - vdso_pages = vdso32_pages; - vdso_base = VDSO32_MBASE; -#endif - - current->mm->context.vdso_base = 0; + mm->context.vdso_base = 0; - /* vDSO has a problem and was disabled, just don't "enable" it for the + /* +* vDSO has a problem and was disabled, just don't "enable" it for the * process */ if (vdso_pages == 0) return 0; + /* Add a page to the vdso size for the data page */ - vdso_pages ++; + vdso_pages++; /* * pick a base address for the vDSO in process space. We try to put it @@ -239,6 +210,30 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) return ret; } +/* + * This is called from binfmt_elf, we create the special vma for the + * vDSO and insert it into the mm struct tree + */ +int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) +{ + if (!vdso_ready) + return 0; + + if (is_32bit_task()) + return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE); +#ifdef CONFIG_PPC64 + else + /* +* On 64bit we don't have a preferred map address. This +* allows get_unmapped_area to find an area near other mmaps +* and most likely share a SLB entry. +*/ + return map_vdso(vdso64_pagelist, vdso64_pages, 0); +#endif + WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel"); + return -1; +} + const char *arch_vma_name(struct vm_area_struct *vma) { if (vma->vm_mm && vma->vm_start == vma->vm_mm->context.vdso_base) -- 2.10.1
[PATCHv3 3/8] powerpc/vdso: separate common code in vdso_common
Impact: cleanup There are parallel functions for handling 32-bit and 64-bit vDSO ELF files: find_section{32,64}, find_symbol{32,64}, find_function{32,64}, vdso_do_func_patch{32,64}, vdso_do_find_sections{32,64}, vdso_fixup_datapage{32,64}, vdso_fixup_features{32,64} and vdso_setup{32,64}, which all do the same work; the only difference is whether they use the 32-bit or 64-bit ELF structures. Combine them into common code, reducing the copy'n'paste. Small changes: I also switched printk(KERN_..., ...) to pr_...(...) and used the pr_fmt() macro for the "vDSO{32,64}: " prefix. Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/powerpc/kernel/vdso.c| 352 ++ arch/powerpc/kernel/vdso_common.c | 221 2 files changed, 234 insertions(+), 339 deletions(-) create mode 100644 arch/powerpc/kernel/vdso_common.c diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index 278b9aa25a1c..8010a0d82049 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -51,13 +51,13 @@ #define VDSO_ALIGNMENT (1 << 16) static unsigned int vdso32_pages; -static void *vdso32_kbase; static struct page **vdso32_pagelist; unsigned long vdso32_sigtramp; unsigned long vdso32_rt_sigtramp; #ifdef CONFIG_VDSO32 extern char vdso32_start, vdso32_end; +static void *vdso32_kbase; #endif #ifdef CONFIG_PPC64 @@ -246,250 +246,16 @@ const char *arch_vma_name(struct vm_area_struct *vma) return NULL; } - - #ifdef CONFIG_VDSO32 -static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname, - unsigned long *size) -{ - Elf32_Shdr *sechdrs; - unsigned int i; - char *secnames; - - /* Grab section headers and strings so we can tell who is who */ - sechdrs = (void *)ehdr + ehdr->e_shoff; - secnames = (void *)ehdr
+ sechdrs[ehdr->e_shstrndx].sh_offset; - - /* Find the section they want */ - for (i = 1; i < ehdr->e_shnum; i++) { - if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) { - if (size) - *size = sechdrs[i].sh_size; - return (void *)ehdr + sechdrs[i].sh_offset; - } - } - *size = 0; - return NULL; -} - -static Elf32_Sym * __init find_symbol32(struct lib32_elfinfo *lib, - const char *symname) -{ - unsigned int i; - char name[MAX_SYMNAME], *c; - - for (i = 0; i < (lib->dynsymsize / sizeof(Elf32_Sym)); i++) { - if (lib->dynsym[i].st_name == 0) - continue; - strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, - MAX_SYMNAME); - c = strchr(name, '@'); - if (c) - *c = 0; - if (strcmp(symname, name) == 0) - return >dynsym[i]; - } - return NULL; -} - -/* Note that we assume the section is .text and the symbol is relative to - * the library base - */ -static unsigned long __init find_function32(struct lib32_elfinfo *lib, - const char *symname) -{ - Elf32_Sym *sym = find_symbol32(lib, symname); - - if (sym == NULL) { - printk(KERN_WARNING "vDSO32: function %s not found !\n", - symname); - return 0; - } - return sym->st_value - VDSO32_LBASE; -} - -static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32, - const char *orig, const char *fix) -{ - Elf32_Sym *sym32_gen, *sym32_fix; - - sym32_gen = find_symbol32(v32, orig); - if (sym32_gen == NULL) { - printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", orig); - return -1; - } - if (fix == NULL) { - sym32_gen->st_name = 0; - return 0; - } - sym32_fix = find_symbol32(v32, fix); - if (sym32_fix == NULL) { - printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", fix); - return -1; - } - sym32_gen->st_value = sym32_fix->st_value; - sym32_gen->st_size = sym32_fix->st_size; - sym32_gen->st_info = sym32_fix->st_info; - sym32_gen->st_other = sym32_fix->st_other; - sym32_gen->st_shndx = sym32_fix->st_shndx; - - return 0; -} -#else /* !CONFIG_VDSO32 */ -static unsigned long __init find_function32(struct lib32_elfinfo *lib, -
[PATCHv3 4/8] powerpc/vdso: introduce init_vdso{32,64}_pagelist
Impact: cleanup Move allocation/initialization of vDSO's pagelist for 32/64-bit vDSO into common vdso code, introducing a function for that. Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/powerpc/kernel/vdso.c| 27 ++- arch/powerpc/kernel/vdso_common.c | 22 ++ 2 files changed, 24 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index 8010a0d82049..25d03d773c49 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -382,8 +382,6 @@ early_initcall(vdso_getcpu_init); static int __init vdso_init(void) { - int i; - #ifdef CONFIG_PPC64 /* * Fill up the "systemcfg" stuff for backward compatibility @@ -454,32 +452,11 @@ static int __init vdso_init(void) } #ifdef CONFIG_VDSO32 - /* Make sure pages are in the correct state */ - vdso32_pagelist = kzalloc(sizeof(struct page *) * (vdso32_pages + 2), - GFP_KERNEL); - BUG_ON(vdso32_pagelist == NULL); - for (i = 0; i < vdso32_pages; i++) { - struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE); - ClearPageReserved(pg); - get_page(pg); - vdso32_pagelist[i] = pg; - } - vdso32_pagelist[i++] = virt_to_page(vdso_data); - vdso32_pagelist[i] = NULL; + init_vdso32_pagelist(); #endif #ifdef CONFIG_PPC64 - vdso64_pagelist = kzalloc(sizeof(struct page *) * (vdso64_pages + 2), - GFP_KERNEL); - BUG_ON(vdso64_pagelist == NULL); - for (i = 0; i < vdso64_pages; i++) { - struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE); - ClearPageReserved(pg); - get_page(pg); - vdso64_pagelist[i] = pg; - } - vdso64_pagelist[i++] = virt_to_page(vdso_data); - vdso64_pagelist[i] = NULL; + init_vdso64_pagelist(); #endif /* CONFIG_PPC64 */ get_page(virt_to_page(vdso_data)); diff --git 
a/arch/powerpc/kernel/vdso_common.c b/arch/powerpc/kernel/vdso_common.c index ac25d66134fb..c97c30606b3f 100644 --- a/arch/powerpc/kernel/vdso_common.c +++ b/arch/powerpc/kernel/vdso_common.c @@ -14,6 +14,7 @@ #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE) #define vdso_kbase CONCAT3(vdso, BITS, _kbase) #define vdso_pages CONCAT3(vdso, BITS, _pages) +#define vdso_pagelist CONCAT3(vdso, BITS, _pagelist) #undef pr_fmt #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt @@ -202,6 +203,25 @@ static __init int vdso_setup(struct lib_elfinfo *v) return 0; } +#define init_vdso_pagelist CONCAT3(init_vdso, BITS, _pagelist) +static __init void init_vdso_pagelist(void) +{ + int i; + + /* Make sure pages are in the correct state */ + vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2), + GFP_KERNEL); + BUG_ON(vdso_pagelist == NULL); + for (i = 0; i < vdso_pages; i++) { + struct page *pg = virt_to_page(vdso_kbase + i*PAGE_SIZE); + + ClearPageReserved(pg); + get_page(pg); + vdso_pagelist[i] = pg; + } + vdso_pagelist[i++] = virt_to_page(vdso_data); + vdso_pagelist[i] = NULL; +} #undef find_section #undef find_symbol @@ -211,10 +231,12 @@ static __init int vdso_setup(struct lib_elfinfo *v) #undef vdso_fixup_datapage #undef vdso_fixup_features #undef vdso_setup +#undef init_vdso_pagelist #undef VDSO_LBASE #undef vdso_kbase #undef vdso_pages +#undef vdso_pagelist #undef lib_elfinfo #undef BITS #undef _CONCAT3 -- 2.10.1
[PATCHv3 1/8] powerpc/vdso: unify return paths in setup_additional_pages
Impact: cleanup

Rename the `rc' variable, whose name doesn't convey much, to the kernel-conventional `ret'. Combine the two function exit paths into one, which is also easier to read.

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4111d30badfa..4ffb82a2d9e9 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -154,7 +154,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	struct page **vdso_pagelist;
 	unsigned long vdso_pages;
 	unsigned long vdso_base;
-	int rc;
+	int ret = 0;
 
 	if (!vdso_ready)
 		return 0;
@@ -203,8 +203,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 			((VDSO_ALIGNMENT - 1) & PAGE_MASK), 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
-		rc = vdso_base;
-		goto fail_mmapsem;
+		ret = vdso_base;
+		goto out_up_mmap_sem;
 	}
 
 	/* Add required alignment. */
@@ -227,21 +227,16 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	 * It's fine to use that for setting breakpoints in the vDSO code
 	 * pages though.
 	 */
-	rc = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+	ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
 				     VM_READ|VM_EXEC|
 				     VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 				     vdso_pagelist);
-	if (rc) {
+	if (ret)
 		current->mm->context.vdso_base = 0;
-		goto fail_mmapsem;
-	}
-
-	up_write(&mm->mmap_sem);
-	return 0;
-
- fail_mmapsem:
+out_up_mmap_sem:
 	up_write(&mm->mmap_sem);
-	return rc;
+	return ret;
 }
 
 const char *arch_vma_name(struct vm_area_struct *vma)
-- 
2.10.1
[PATCHv3 0/8] powerpc/mm: refactor vDSO mapping code
Changes since v1, v2:
- use vdso64_pages only under CONFIG_PPC64 (32-bit build fix)
- remove the arch_vma_name helper as it's not needed anymore; simplify vdso_base pointer initialization in map_vdso()

Cleanup patches for the vDSO on powerpc. Originally I wanted to add vDSO remapping on arm/aarch64, and I decided to clean up that part of powerpc first. I've added a vm_ops hook for the vDSO just like I did for x86, which makes the cross-arch arch_mremap hook no longer needed. The other changes reduce extensive code duplication by separating out the common vdso code.

No user-visible changes are expected. Tested on qemu with a buildroot rootfs.

Dmitry Safonov (8):
  powerpc/vdso: unify return paths in setup_additional_pages
  powerpc/vdso: remove unused params in vdso_do_func_patch{32,64}
  powerpc/vdso: separate common code in vdso_common
  powerpc/vdso: introduce init_vdso{32,64}_pagelist
  powerpc/vdso: split map_vdso from arch_setup_additional_pages
  powerpc/vdso: switch from legacy_special_mapping_vmops
  mm: kill arch_mremap
  powerpc/vdso: remove arch_vma_name

 arch/alpha/include/asm/Kbuild | 1 -
 arch/arc/include/asm/Kbuild | 1 -
 arch/arm/include/asm/Kbuild | 1 -
 arch/arm64/include/asm/Kbuild | 1 -
 arch/avr32/include/asm/Kbuild | 1 -
 arch/blackfin/include/asm/Kbuild | 1 -
 arch/c6x/include/asm/Kbuild | 1 -
 arch/cris/include/asm/Kbuild | 1 -
 arch/frv/include/asm/Kbuild | 1 -
 arch/h8300/include/asm/Kbuild | 1 -
 arch/hexagon/include/asm/Kbuild | 1 -
 arch/ia64/include/asm/Kbuild | 1 -
 arch/m32r/include/asm/Kbuild | 1 -
 arch/m68k/include/asm/Kbuild | 1 -
 arch/metag/include/asm/Kbuild | 1 -
 arch/microblaze/include/asm/Kbuild | 1 -
 arch/mips/include/asm/Kbuild | 1 -
 arch/mn10300/include/asm/Kbuild | 1 -
 arch/nios2/include/asm/Kbuild | 1 -
 arch/openrisc/include/asm/Kbuild | 1 -
 arch/parisc/include/asm/Kbuild | 1 -
 arch/powerpc/include/asm/mm-arch-hooks.h | 28 --
 arch/powerpc/kernel/vdso.c | 502 +--
 arch/powerpc/kernel/vdso_common.c | 248 +++
 arch/s390/include/asm/Kbuild | 1 -
 arch/score/include/asm/Kbuild | 1 -
 arch/sh/include/asm/Kbuild | 1 -
 arch/sparc/include/asm/Kbuild | 1 -
 arch/tile/include/asm/Kbuild | 1 -
 arch/um/include/asm/Kbuild | 1 -
 arch/unicore32/include/asm/Kbuild | 1 -
 arch/x86/include/asm/Kbuild | 1 -
 arch/xtensa/include/asm/Kbuild | 1 -
 include/asm-generic/mm-arch-hooks.h | 16 -
 include/linux/mm-arch-hooks.h | 25 --
 mm/mremap.c | 4 -
 36 files changed, 324 insertions(+), 529 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 create mode 100644 arch/powerpc/kernel/vdso_common.c
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h
-- 
2.10.1
[PATCHv2 7/7] mm: kill arch_mremap
This reverts commit 4abad2ca4a4d ("mm: new arch_remap() hook") and commit 2ae416b142b6 ("mm: new mm hook framework"). It also keeps the same functionality of mremapping vDSO blob with introducing vm_special_mapping mremap op for powerpc. Cc: Laurent Dufour <lduf...@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: Andrew Morton <a...@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- v2: use vdso64_pages only under CONFIG_PPC64 arch/alpha/include/asm/Kbuild| 1 - arch/arc/include/asm/Kbuild | 1 - arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild| 1 - arch/avr32/include/asm/Kbuild| 1 - arch/blackfin/include/asm/Kbuild | 1 - arch/c6x/include/asm/Kbuild | 1 - arch/cris/include/asm/Kbuild | 1 - arch/frv/include/asm/Kbuild | 1 - arch/h8300/include/asm/Kbuild| 1 - arch/hexagon/include/asm/Kbuild | 1 - arch/ia64/include/asm/Kbuild | 1 - arch/m32r/include/asm/Kbuild | 1 - arch/m68k/include/asm/Kbuild | 1 - arch/metag/include/asm/Kbuild| 1 - arch/microblaze/include/asm/Kbuild | 1 - arch/mips/include/asm/Kbuild | 1 - arch/mn10300/include/asm/Kbuild | 1 - arch/nios2/include/asm/Kbuild| 1 - arch/openrisc/include/asm/Kbuild | 1 - arch/parisc/include/asm/Kbuild | 1 - arch/powerpc/include/asm/mm-arch-hooks.h | 28 arch/powerpc/kernel/vdso.c | 25 + arch/powerpc/kernel/vdso_common.c| 1 + arch/s390/include/asm/Kbuild | 1 - arch/score/include/asm/Kbuild| 1 - arch/sh/include/asm/Kbuild | 1 - arch/sparc/include/asm/Kbuild| 1 - arch/tile/include/asm/Kbuild | 1 - arch/um/include/asm/Kbuild | 1 - arch/unicore32/include/asm/Kbuild| 1 - arch/x86/include/asm/Kbuild | 1 - arch/xtensa/include/asm/Kbuild | 1 - include/asm-generic/mm-arch-hooks.h | 16 
include/linux/mm-arch-hooks.h| 25 - mm/mremap.c | 4 36 files changed, 26 insertions(+), 103 deletions(-) delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h delete mode 100644 include/asm-generic/mm-arch-hooks.h delete mode 100644 include/linux/mm-arch-hooks.h diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index bf8475ce85ee..0a5e0ec2842b 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -6,7 +6,6 @@ generic-y += exec.h generic-y += export.h generic-y += irq_work.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += preempt.h generic-y += sections.h generic-y += trace_clock.h diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index c332604606dd..e6059a808463 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -22,7 +22,6 @@ generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += mman.h generic-y += msgbuf.h generic-y += msi.h diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index 0745538b26d3..44b717cb4a55 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -15,7 +15,6 @@ generic-y += irq_regs.h generic-y += kdebug.h generic-y += local.h generic-y += local64.h -generic-y += mm-arch-hooks.h generic-y += msgbuf.h generic-y += msi.h generic-y += param.h diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 44e1d7f10add..a42a1367aea4 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -20,7 +20,6 @@ generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += mman.h generic-y += msgbuf.h generic-y += msi.h diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild index 241b9b9729d8..519810d0d5e1 100644 --- a/arch/avr32/include/asm/Kbuild +++ 
b/arch/avr32/include/asm/Kbuild @@ -12,7 +12,6 @@ generic-y += irq_work.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += param.h generic-y += percpu.h generic-y += preempt.h diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild
[PATCH 4/7] powerpc/vdso: introduce init_vdso{32,64}_pagelist
Common code with allocation/initialization of vDSO's pagelist. Impact: cleanup Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/powerpc/kernel/vdso.c| 27 ++- arch/powerpc/kernel/vdso_common.c | 22 ++ 2 files changed, 24 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index 8010a0d82049..25d03d773c49 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -382,8 +382,6 @@ early_initcall(vdso_getcpu_init); static int __init vdso_init(void) { - int i; - #ifdef CONFIG_PPC64 /* * Fill up the "systemcfg" stuff for backward compatibility @@ -454,32 +452,11 @@ static int __init vdso_init(void) } #ifdef CONFIG_VDSO32 - /* Make sure pages are in the correct state */ - vdso32_pagelist = kzalloc(sizeof(struct page *) * (vdso32_pages + 2), - GFP_KERNEL); - BUG_ON(vdso32_pagelist == NULL); - for (i = 0; i < vdso32_pages; i++) { - struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE); - ClearPageReserved(pg); - get_page(pg); - vdso32_pagelist[i] = pg; - } - vdso32_pagelist[i++] = virt_to_page(vdso_data); - vdso32_pagelist[i] = NULL; + init_vdso32_pagelist(); #endif #ifdef CONFIG_PPC64 - vdso64_pagelist = kzalloc(sizeof(struct page *) * (vdso64_pages + 2), - GFP_KERNEL); - BUG_ON(vdso64_pagelist == NULL); - for (i = 0; i < vdso64_pages; i++) { - struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE); - ClearPageReserved(pg); - get_page(pg); - vdso64_pagelist[i] = pg; - } - vdso64_pagelist[i++] = virt_to_page(vdso_data); - vdso64_pagelist[i] = NULL; + init_vdso64_pagelist(); #endif /* CONFIG_PPC64 */ get_page(virt_to_page(vdso_data)); diff --git a/arch/powerpc/kernel/vdso_common.c b/arch/powerpc/kernel/vdso_common.c index 
ac25d66134fb..c97c30606b3f 100644 --- a/arch/powerpc/kernel/vdso_common.c +++ b/arch/powerpc/kernel/vdso_common.c @@ -14,6 +14,7 @@ #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE) #define vdso_kbase CONCAT3(vdso, BITS, _kbase) #define vdso_pages CONCAT3(vdso, BITS, _pages) +#define vdso_pagelist CONCAT3(vdso, BITS, _pagelist) #undef pr_fmt #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt @@ -202,6 +203,25 @@ static __init int vdso_setup(struct lib_elfinfo *v) return 0; } +#define init_vdso_pagelist CONCAT3(init_vdso, BITS, _pagelist) +static __init void init_vdso_pagelist(void) +{ + int i; + + /* Make sure pages are in the correct state */ + vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2), + GFP_KERNEL); + BUG_ON(vdso_pagelist == NULL); + for (i = 0; i < vdso_pages; i++) { + struct page *pg = virt_to_page(vdso_kbase + i*PAGE_SIZE); + + ClearPageReserved(pg); + get_page(pg); + vdso_pagelist[i] = pg; + } + vdso_pagelist[i++] = virt_to_page(vdso_data); + vdso_pagelist[i] = NULL; +} #undef find_section #undef find_symbol @@ -211,10 +231,12 @@ static __init int vdso_setup(struct lib_elfinfo *v) #undef vdso_fixup_datapage #undef vdso_fixup_features #undef vdso_setup +#undef init_vdso_pagelist #undef VDSO_LBASE #undef vdso_kbase #undef vdso_pages +#undef vdso_pagelist #undef lib_elfinfo #undef BITS #undef _CONCAT3 -- 2.10.0
[PATCH 3/7] powerpc/vdso: separate common code in vdso_common
Impact: cleanup I also switched usage of printk(KERNEL_,...) on pr_(...) and used pr_fmt() macro for "vDSO{32,64}: " prefix. Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/powerpc/kernel/vdso.c| 352 ++ arch/powerpc/kernel/vdso_common.c | 221 2 files changed, 234 insertions(+), 339 deletions(-) create mode 100644 arch/powerpc/kernel/vdso_common.c diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index 278b9aa25a1c..8010a0d82049 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -51,13 +51,13 @@ #define VDSO_ALIGNMENT (1 << 16) static unsigned int vdso32_pages; -static void *vdso32_kbase; static struct page **vdso32_pagelist; unsigned long vdso32_sigtramp; unsigned long vdso32_rt_sigtramp; #ifdef CONFIG_VDSO32 extern char vdso32_start, vdso32_end; +static void *vdso32_kbase; #endif #ifdef CONFIG_PPC64 @@ -246,250 +246,16 @@ const char *arch_vma_name(struct vm_area_struct *vma) return NULL; } - - #ifdef CONFIG_VDSO32 -static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname, - unsigned long *size) -{ - Elf32_Shdr *sechdrs; - unsigned int i; - char *secnames; - - /* Grab section headers and strings so we can tell who is who */ - sechdrs = (void *)ehdr + ehdr->e_shoff; - secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset; - - /* Find the section they want */ - for (i = 1; i < ehdr->e_shnum; i++) { - if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) { - if (size) - *size = sechdrs[i].sh_size; - return (void *)ehdr + sechdrs[i].sh_offset; - } - } - *size = 0; - return NULL; -} - -static Elf32_Sym * __init find_symbol32(struct lib32_elfinfo *lib, - const char *symname) -{ - unsigned int i; - char name[MAX_SYMNAME], 
*c; - - for (i = 0; i < (lib->dynsymsize / sizeof(Elf32_Sym)); i++) { - if (lib->dynsym[i].st_name == 0) - continue; - strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, - MAX_SYMNAME); - c = strchr(name, '@'); - if (c) - *c = 0; - if (strcmp(symname, name) == 0) - return >dynsym[i]; - } - return NULL; -} - -/* Note that we assume the section is .text and the symbol is relative to - * the library base - */ -static unsigned long __init find_function32(struct lib32_elfinfo *lib, - const char *symname) -{ - Elf32_Sym *sym = find_symbol32(lib, symname); - - if (sym == NULL) { - printk(KERN_WARNING "vDSO32: function %s not found !\n", - symname); - return 0; - } - return sym->st_value - VDSO32_LBASE; -} - -static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32, - const char *orig, const char *fix) -{ - Elf32_Sym *sym32_gen, *sym32_fix; - - sym32_gen = find_symbol32(v32, orig); - if (sym32_gen == NULL) { - printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", orig); - return -1; - } - if (fix == NULL) { - sym32_gen->st_name = 0; - return 0; - } - sym32_fix = find_symbol32(v32, fix); - if (sym32_fix == NULL) { - printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", fix); - return -1; - } - sym32_gen->st_value = sym32_fix->st_value; - sym32_gen->st_size = sym32_fix->st_size; - sym32_gen->st_info = sym32_fix->st_info; - sym32_gen->st_other = sym32_fix->st_other; - sym32_gen->st_shndx = sym32_fix->st_shndx; - - return 0; -} -#else /* !CONFIG_VDSO32 */ -static unsigned long __init find_function32(struct lib32_elfinfo *lib, - const char *symname) -{ - return 0; -} - -static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32, - const char *orig, const char *fix) -{ - return 0; -} +#include "vdso_common.c" #endif /* CONFIG_VDSO32 */ - #ifdef CONFIG_PPC64 - -static void * __init find_section64(Elf64_Ehdr *ehdr, const char *secname, -
[PATCH 7/7] mm: kill arch_mremap
This reverts commit 4abad2ca4a4d ("mm: new arch_remap() hook") and commit 2ae416b142b6 ("mm: new mm hook framework"). It also keeps the same functionality of mremapping vDSO blob with introducing vm_special_mapping mremap op for powerpc. Cc: Laurent Dufour <lduf...@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: Andrew Morton <a...@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/alpha/include/asm/Kbuild| 1 - arch/arc/include/asm/Kbuild | 1 - arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild| 1 - arch/avr32/include/asm/Kbuild| 1 - arch/blackfin/include/asm/Kbuild | 1 - arch/c6x/include/asm/Kbuild | 1 - arch/cris/include/asm/Kbuild | 1 - arch/frv/include/asm/Kbuild | 1 - arch/h8300/include/asm/Kbuild| 1 - arch/hexagon/include/asm/Kbuild | 1 - arch/ia64/include/asm/Kbuild | 1 - arch/m32r/include/asm/Kbuild | 1 - arch/m68k/include/asm/Kbuild | 1 - arch/metag/include/asm/Kbuild| 1 - arch/microblaze/include/asm/Kbuild | 1 - arch/mips/include/asm/Kbuild | 1 - arch/mn10300/include/asm/Kbuild | 1 - arch/nios2/include/asm/Kbuild| 1 - arch/openrisc/include/asm/Kbuild | 1 - arch/parisc/include/asm/Kbuild | 1 - arch/powerpc/include/asm/mm-arch-hooks.h | 28 arch/powerpc/kernel/vdso.c | 19 +++ arch/powerpc/kernel/vdso_common.c| 1 + arch/s390/include/asm/Kbuild | 1 - arch/score/include/asm/Kbuild| 1 - arch/sh/include/asm/Kbuild | 1 - arch/sparc/include/asm/Kbuild| 1 - arch/tile/include/asm/Kbuild | 1 - arch/um/include/asm/Kbuild | 1 - arch/unicore32/include/asm/Kbuild| 1 - arch/x86/include/asm/Kbuild | 1 - arch/xtensa/include/asm/Kbuild | 1 - include/asm-generic/mm-arch-hooks.h | 16 include/linux/mm-arch-hooks.h| 25 - mm/mremap.c 
| 4 36 files changed, 20 insertions(+), 103 deletions(-) delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h delete mode 100644 include/asm-generic/mm-arch-hooks.h delete mode 100644 include/linux/mm-arch-hooks.h diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index bf8475ce85ee..0a5e0ec2842b 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -6,7 +6,6 @@ generic-y += exec.h generic-y += export.h generic-y += irq_work.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += preempt.h generic-y += sections.h generic-y += trace_clock.h diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index c332604606dd..e6059a808463 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -22,7 +22,6 @@ generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += mman.h generic-y += msgbuf.h generic-y += msi.h diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index 0745538b26d3..44b717cb4a55 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -15,7 +15,6 @@ generic-y += irq_regs.h generic-y += kdebug.h generic-y += local.h generic-y += local64.h -generic-y += mm-arch-hooks.h generic-y += msgbuf.h generic-y += msi.h generic-y += param.h diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 44e1d7f10add..a42a1367aea4 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -20,7 +20,6 @@ generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += mman.h generic-y += msgbuf.h generic-y += msi.h diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild index 241b9b9729d8..519810d0d5e1 100644 --- a/arch/avr32/include/asm/Kbuild +++ b/arch/avr32/include/asm/Kbuild @@ -12,7 +12,6 @@ generic-y += 
irq_work.h generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mm-arch-hooks.h generic-y += param.h generic-y += percpu.h generic-y += preempt.h diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild index 91d49c0a3118..c80181e4454f 100644 --- a/ar
[PATCH 6/7] powerpc/vdso: switch from legacy_special_mapping_vmops
This will allow to introduce mremap hook (the next patch). Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> Cc: Paul Mackerras <pau...@samba.org> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Oleg Nesterov <o...@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> --- arch/powerpc/kernel/vdso.c| 19 +++ arch/powerpc/kernel/vdso_common.c | 8 ++-- 2 files changed, 17 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index e68601ffc9ad..9ee3fd65c6e9 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -51,7 +51,7 @@ #define VDSO_ALIGNMENT (1 << 16) static unsigned int vdso32_pages; -static struct page **vdso32_pagelist; +static struct vm_special_mapping vdso32_mapping; unsigned long vdso32_sigtramp; unsigned long vdso32_rt_sigtramp; @@ -64,7 +64,7 @@ static void *vdso32_kbase; extern char vdso64_start, vdso64_end; static void *vdso64_kbase = _start; static unsigned int vdso64_pages; -static struct page **vdso64_pagelist; +static struct vm_special_mapping vdso64_mapping; unsigned long vdso64_rt_sigtramp; #endif /* CONFIG_PPC64 */ @@ -143,10 +143,11 @@ struct lib64_elfinfo unsigned long text; }; -static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages, +static int map_vdso(struct vm_special_mapping *vsm, unsigned long vdso_pages, unsigned long vdso_base) { struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; int ret = 0; mm->context.vdso_base = 0; @@ -198,12 +199,14 @@ static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages, * It's fine to use that for setting breakpoints in the vDSO code * pages though. 
*/ - ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT, + vma = _install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT, VM_READ|VM_EXEC| VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, -vdso_pagelist); - if (ret) +vsm); + if (IS_ERR(vma)) { + ret = PTR_ERR(vma); current->mm->context.vdso_base = 0; + } out_up_mmap_sem: up_write(>mmap_sem); @@ -220,7 +223,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) return 0; if (is_32bit_task()) - return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE); + return map_vdso(_mapping, vdso32_pages, VDSO32_MBASE); #ifdef CONFIG_PPC64 else /* @@ -228,7 +231,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) * allows get_unmapped_area to find an area near other mmaps * and most likely share a SLB entry. */ - return map_vdso(vdso64_pagelist, vdso64_pages, 0); + return map_vdso(_mapping, vdso64_pages, 0); #endif WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel"); return -1; diff --git a/arch/powerpc/kernel/vdso_common.c b/arch/powerpc/kernel/vdso_common.c index c97c30606b3f..047f6b8b230f 100644 --- a/arch/powerpc/kernel/vdso_common.c +++ b/arch/powerpc/kernel/vdso_common.c @@ -14,7 +14,7 @@ #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE) #define vdso_kbase CONCAT3(vdso, BITS, _kbase) #define vdso_pages CONCAT3(vdso, BITS, _pages) -#define vdso_pagelist CONCAT3(vdso, BITS, _pagelist) +#define vdso_mapping CONCAT3(vdso, BITS, _mapping) #undef pr_fmt #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt @@ -207,6 +207,7 @@ static __init int vdso_setup(struct lib_elfinfo *v) static __init void init_vdso_pagelist(void) { int i; + struct page **vdso_pagelist; /* Make sure pages are in the correct state */ vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2), @@ -221,6 +222,9 @@ static __init void init_vdso_pagelist(void) } vdso_pagelist[i++] = virt_to_page(vdso_data); vdso_pagelist[i] = NULL; + + vdso_mapping.pages = vdso_pagelist; + 
vdso_mapping.name = "[vdso]"; } #undef find_section @@ -236,7 +240,7 @@ static __init void init_vdso_pagelist(void) #undef VDSO_LBASE #undef vdso_kbase #undef vdso_pages -#undef vdso_pagelist +#undef vdso_mapping #undef lib_elfinfo #undef BITS #undef _CONCAT3 -- 2.10.0
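With the vDSO now installed through `_install_special_mapping()` and described by a `struct vm_special_mapping`, the mm can later be taught to follow the blob across `mremap()` by filling in an mremap op (what the last patch in the series is after). A sketch of that shape, using the field layout the upstream `vm_special_mapping` eventually gained; kernel context only, not buildable standalone:

```
/* Sketch: keep mm->context.vdso_base in sync when userspace
 * mremap()s the vDSO.  Signature follows the later upstream API. */
static int vdso_mremap(const struct vm_special_mapping *sm,
		       struct vm_area_struct *new_vma)
{
	current->mm->context.vdso_base = new_vma->vm_start;
	return 0;
}

static struct vm_special_mapping vdso64_mapping = {
	.name   = "[vdso]",
	.mremap = vdso_mremap,	/* .pages is filled at init time */
};
```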
[PATCH 5/7] powerpc/vdso: split map_vdso from arch_setup_additional_pages
It'll be easier to introduce the vm_special_mapping struct in a smaller
map_vdso() function (see the next patches).

Impact: cleanup

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 67 +++++++++++++++++++++++++++++++-------------------------------------
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 25d03d773c49..e68601ffc9ad 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -143,52 +143,23 @@ struct lib64_elfinfo
 	unsigned long	text;
 };
 
-
-/*
- * This is called from binfmt_elf, we create the special vma for the
- * vDSO and insert it into the mm struct tree
- */
-int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages,
+		unsigned long vdso_base)
 {
 	struct mm_struct *mm = current->mm;
-	struct page **vdso_pagelist;
-	unsigned long vdso_pages;
-	unsigned long vdso_base;
 	int ret = 0;
 
-	if (!vdso_ready)
-		return 0;
-
-#ifdef CONFIG_PPC64
-	if (is_32bit_task()) {
-		vdso_pagelist = vdso32_pagelist;
-		vdso_pages = vdso32_pages;
-		vdso_base = VDSO32_MBASE;
-	} else {
-		vdso_pagelist = vdso64_pagelist;
-		vdso_pages = vdso64_pages;
-		/*
-		 * On 64bit we don't have a preferred map address. This
-		 * allows get_unmapped_area to find an area near other mmaps
-		 * and most likely share a SLB entry.
-		 */
-		vdso_base = 0;
-	}
-#else
-	vdso_pagelist = vdso32_pagelist;
-	vdso_pages = vdso32_pages;
-	vdso_base = VDSO32_MBASE;
-#endif
-
-	current->mm->context.vdso_base = 0;
+	mm->context.vdso_base = 0;
 
-	/* vDSO has a problem and was disabled, just don't "enable" it for the
+	/*
+	 * vDSO has a problem and was disabled, just don't "enable" it for the
 	 * process
 	 */
 	if (vdso_pages == 0)
 		return 0;
+
 	/* Add a page to the vdso size for the data page */
-	vdso_pages ++;
+	vdso_pages++;
 
 	/*
 	 * pick a base address for the vDSO in process space. We try to put it
@@ -239,6 +210,30 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	return ret;
 }
 
+/*
+ * This is called from binfmt_elf, we create the special vma for the
+ * vDSO and insert it into the mm struct tree
+ */
+int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+{
+	if (!vdso_ready)
+		return 0;
+
+	if (is_32bit_task())
+		return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE);
+#ifdef CONFIG_PPC64
+	else
+		/*
+		 * On 64bit we don't have a preferred map address. This
+		 * allows get_unmapped_area to find an area near other mmaps
+		 * and most likely share a SLB entry.
+		 */
+		return map_vdso(vdso64_pagelist, vdso64_pages, 0);
+#endif
+	WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel");
+	return -1;
+}
+
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
 	if (vma->vm_mm && vma->vm_start == vma->vm_mm->context.vdso_base)
-- 
2.10.0
[PATCH 2/7] powerpc/vdso: remove unused params in vdso_do_func_patch{32, 64}
Impact: cleanup

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4ffb82a2d9e9..278b9aa25a1c 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -309,7 +309,6 @@ static unsigned long __init find_function32(struct lib32_elfinfo *lib,
 }
 
 static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-				       struct lib64_elfinfo *v64,
 				       const char *orig, const char *fix)
 {
 	Elf32_Sym *sym32_gen, *sym32_fix;
@@ -344,7 +343,6 @@ static unsigned long __init find_function32(struct lib32_elfinfo *lib,
 }
 
 static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-				       struct lib64_elfinfo *v64,
 				       const char *orig, const char *fix)
 {
 	return 0;
@@ -419,8 +417,7 @@ static unsigned long __init find_function64(struct lib64_elfinfo *lib,
 #endif
 }
 
-static int __init vdso_do_func_patch64(struct lib32_elfinfo *v32,
-				       struct lib64_elfinfo *v64,
+static int __init vdso_do_func_patch64(struct lib64_elfinfo *v64,
 				       const char *orig, const char *fix)
 {
 	Elf64_Sym *sym64_gen, *sym64_fix;
@@ -619,11 +616,9 @@ static __init int vdso_fixup_alt_funcs(struct lib32_elfinfo *v32,
 		 * It would be easy to do, but doesn't seem to be necessary,
 		 * patching the OPD symbol is enough.
 		 */
-		vdso_do_func_patch32(v32, v64, patch->gen_name,
-				     patch->fix_name);
+		vdso_do_func_patch32(v32, patch->gen_name, patch->fix_name);
 #ifdef CONFIG_PPC64
-		vdso_do_func_patch64(v32, v64, patch->gen_name,
-				     patch->fix_name);
+		vdso_do_func_patch64(v64, patch->gen_name, patch->fix_name);
 #endif /* CONFIG_PPC64 */
 	}
-- 
2.10.0
[PATCH 1/7] powerpc/vdso: unify return paths in setup_additional_pages
Impact: cleanup

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4111d30badfa..4ffb82a2d9e9 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -154,7 +154,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	struct page **vdso_pagelist;
 	unsigned long vdso_pages;
 	unsigned long vdso_base;
-	int rc;
+	int ret = 0;
 
 	if (!vdso_ready)
 		return 0;
@@ -203,8 +203,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 			((VDSO_ALIGNMENT - 1) & PAGE_MASK), 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
-		rc = vdso_base;
-		goto fail_mmapsem;
+		ret = vdso_base;
+		goto out_up_mmap_sem;
 	}
 
 	/* Add required alignment. */
@@ -227,21 +227,16 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	 * It's fine to use that for setting breakpoints in the vDSO code
 	 * pages though.
 	 */
-	rc = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+	ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
 				     VM_READ|VM_EXEC|
 				     VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 				     vdso_pagelist);
-	if (rc) {
+	if (ret)
 		current->mm->context.vdso_base = 0;
-		goto fail_mmapsem;
-	}
-
-	up_write(&mm->mmap_sem);
-	return 0;
 
- fail_mmapsem:
+out_up_mmap_sem:
 	up_write(&mm->mmap_sem);
-	return rc;
+	return ret;
 }
 
 const char *arch_vma_name(struct vm_area_struct *vma)
-- 
2.10.0
[PATCH 0/7] powerpc/mm: refactor vDSO mapping code
Cleanup patches for the vDSO on powerpc. Originally I wanted to add vDSO
remapping on arm/aarch64, and decided to clean up that part on powerpc
first. I've added a vm_ops hook for the vDSO, just like I did for x86.
The other changes reduce exhaustive code duplication.
No userspace-visible changes are expected.
Tested on qemu with a buildroot rootfs.

Dmitry Safonov (7):
  powerpc/vdso: unify return paths in setup_additional_pages
  powerpc/vdso: remove unused params in vdso_do_func_patch{32,64}
  powerpc/vdso: separate common code in vdso_common
  powerpc/vdso: introduce init_vdso{32,64}_pagelist
  powerpc/vdso: split map_vdso from arch_setup_additional_pages
  powerpc/vdso: switch from legacy_special_mapping_vmops
  mm: kill arch_mremap

 arch/alpha/include/asm/Kbuild            |   1 -
 arch/arc/include/asm/Kbuild              |   1 -
 arch/arm/include/asm/Kbuild              |   1 -
 arch/arm64/include/asm/Kbuild            |   1 -
 arch/avr32/include/asm/Kbuild            |   1 -
 arch/blackfin/include/asm/Kbuild         |   1 -
 arch/c6x/include/asm/Kbuild              |   1 -
 arch/cris/include/asm/Kbuild             |   1 -
 arch/frv/include/asm/Kbuild              |   1 -
 arch/h8300/include/asm/Kbuild            |   1 -
 arch/hexagon/include/asm/Kbuild          |   1 -
 arch/ia64/include/asm/Kbuild             |   1 -
 arch/m32r/include/asm/Kbuild             |   1 -
 arch/m68k/include/asm/Kbuild             |   1 -
 arch/metag/include/asm/Kbuild            |   1 -
 arch/microblaze/include/asm/Kbuild       |   1 -
 arch/mips/include/asm/Kbuild             |   1 -
 arch/mn10300/include/asm/Kbuild          |   1 -
 arch/nios2/include/asm/Kbuild            |   1 -
 arch/openrisc/include/asm/Kbuild         |   1 -
 arch/parisc/include/asm/Kbuild           |   1 -
 arch/powerpc/include/asm/mm-arch-hooks.h |  28 --
 arch/powerpc/kernel/vdso.c               | 492 +--
 arch/powerpc/kernel/vdso_common.c        | 248
 arch/s390/include/asm/Kbuild             |   1 -
 arch/score/include/asm/Kbuild            |   1 -
 arch/sh/include/asm/Kbuild               |   1 -
 arch/sparc/include/asm/Kbuild            |   1 -
 arch/tile/include/asm/Kbuild             |   1 -
 arch/um/include/asm/Kbuild               |   1 -
 arch/unicore32/include/asm/Kbuild        |   1 -
 arch/x86/include/asm/Kbuild              |   1 -
 arch/xtensa/include/asm/Kbuild           |   1 -
 include/asm-generic/mm-arch-hooks.h      |  16 -
 include/linux/mm-arch-hooks.h            |  25 --
 mm/mremap.c                              |   4 -
 36 files changed, 323 insertions(+), 520 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 create mode 100644 arch/powerpc/kernel/vdso_common.c
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h
-- 
2.10.0
Re: VDSO unmap and remap support for additional architectures
On 04/29/2016 04:22 PM, Christopher Covington wrote:
> On 04/28/2016 02:53 PM, Andy Lutomirski wrote:
>> Also, at some point, possibly quite soon, x86 will want a way for user
>> code to ask the kernel to map a specific vdso variant at a specific
>> address. Could we perhaps add a new pair of syscalls:
>>
>> struct vdso_info {
>>         unsigned long space_needed_before;
>>         unsigned long space_needed_after;
>>         unsigned long alignment;
>> };
>>
>> long vdso_get_info(unsigned int vdso_type, struct vdso_info *info);
>> long vdso_remap(unsigned int vdso_type, unsigned long addr, unsigned int flags);
>>
>> #define VDSO_X86_I386 0
>> #define VDSO_X86_64 1
>> #define VDSO_X86_X32 2
>> // etc.
>>
>> vdso_remap will map the vdso of the chosen type such that
>> AT_SYSINFO_EHDR lines up with addr. It will use up to
>> space_needed_before bytes before that address and space_needed_after
>> after that address. It will also unmap the old vdso (or maybe only do
>> that if some flag is set).
>>
>> On x86, mremap is *not* sufficient for everything that's needed,
>> because some programs will need to change the vdso type.
>
> I don't understand. Why can't people just exec() the ELF type that
> corresponds to the vDSO they want?

I can speak to my own need for it: not losing all the existing state in
the application. Imagine you're restoring a container with 64-bit and
32-bit applications (in compatible mode). You need to somehow switch the
vDSO type in the restorer for a 32-bit application. Yes, you could
exec(), but then all the application properties restored so far are
lost. You would need to transfer information about the mappings, design
a protocol between the restorer binary and the main criu application,
and you would end up with an architecture much more complicated than the
current one. It would also be slower.

It's also quite logical: if one can switch between modes, why can't one
change the vDSO mapping to match the mode switched to?
(Note: if the work on removing the per-thread compat flags gets done (on
x86), there will no longer even be such a thing as an application mode -
just a difference in which syscalls the application uses: compatible or
native.)

Thanks,
Dmitry Safonov