Re: [PATCH] powerpc/vdso: Separate vvar vma from vdso

2021-03-31 Thread Dmitry Safonov
On 3/31/21 10:59 AM, Michael Ellerman wrote:
> Christophe Leroy  writes:
[..]
>>
>>> @@ -133,7 +135,13 @@ static int __arch_setup_additional_pages(struct 
>>> linux_binprm *bprm, int uses_int
>>>  * install_special_mapping or the perf counter mmap tracking code
>>>  * will fail to recognise it as a vDSO.
>>>  */
>>> -   mm->context.vdso = (void __user *)vdso_base + PAGE_SIZE;
>>> +   mm->context.vdso = (void __user *)vdso_base + vvar_size;
>>> +
>>> +   vma = _install_special_mapping(mm, vdso_base, vvar_size,
>>> +  VM_READ | VM_MAYREAD | VM_IO |
>>> +  VM_DONTDUMP | VM_PFNMAP, _spec);
>>> +   if (IS_ERR(vma))
>>> +   return PTR_ERR(vma);
>>>   
>>> /*
>>>  * our vma flags don't have VM_WRITE so by default, the process isn't
>>
>>
>> IIUC, VM_PFNMAP is for when we have a vvar_fault handler.
>> Although we will soon have one for handling TIME_NS, at the moment
>> powerpc doesn't have that handler.
>> Isn't it dangerous to set VM_PFNMAP then?

I believe it's fine; special_mapping_fault() does:
:   if (sm->fault)
:   return sm->fault(sm, vmf->vma, vmf);

> Some of the other flags seem odd too.
> eg. VM_IO ? VM_DONTDUMP ?

Yeah, so:
VM_PFNMAP | VM_IO is a protection against remote access to the pages, so
one can't access such a page with ptrace(), /proc/$pid/mem or
process_vm_writev(). Otherwise, such an access would create a COW mapping
and the tracee would be left running on a stale vvar page.

VM_DONTDUMP excludes the area from core dumps, and gdb will also avoid
accessing such areas [1][2].

I agree that VM_PFNMAP was probably excessive for this patch alone and
mostly served to synchronize the code with other architectures, but it
makes more sense now with the new patch set by Christophe:
https://lore.kernel.org/linux-arch/cover.1617209141.git.christophe.le...@csgroup.eu/
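
For context, a minimal sketch of what such a vvar fault handler usually
looks like once an architecture grows one (illustration only, not the
actual powerpc code from that series; the vvar_pfn lookup is an
assumption):

static unsigned long vvar_pfn;	/* pfn of the vdso data page, set at init */

static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
			     struct vm_area_struct *vma, struct vm_fault *vmf)
{
	/* In this sketch the mapping is a single page: pgoff 0 is vvar. */
	if (vmf->pgoff == 0)
		return vmf_insert_pfn(vma, vmf->address, vvar_pfn);
	return VM_FAULT_SIGBUS;
}

And since the area is VM_IO | VM_PFNMAP, get_user_pages() refuses to pin
it, which is what backs the remote-access protection described above.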


[1] https://lore.kernel.org/lkml/550731af.6080...@redhat.com/T/
[2] https://sourceware.org/legacy-ml/gdb-patches/2015-03/msg00383.html

Thanks,
  Dmitry


Re: [PATCH] powerpc/vdso: Separate vvar vma from vdso

2021-03-31 Thread Dmitry Safonov
On 3/30/21 11:17 AM, Christophe Leroy wrote:
> 
> 
> Le 26/03/2021 à 20:17, Dmitry Safonov a écrit :
[..]
>> --- a/arch/powerpc/kernel/vdso.c
>> +++ b/arch/powerpc/kernel/vdso.c
>> @@ -55,10 +55,10 @@ static int vdso_mremap(const struct
>> vm_special_mapping *sm, struct vm_area_struc
>>   {
>>   unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
>>   -    if (new_size != text_size + PAGE_SIZE)
>> +    if (new_size != text_size)
>>   return -EINVAL;
> 
> In ARM64 you have removed the above test in commit 871402e05b24cb56
> ("mm: forbid splitting special mappings"). Do we need to keep it here ?
> 
>>   -    current->mm->context.vdso = (void __user *)new_vma->vm_start +
>> PAGE_SIZE;
>> +    current->mm->context.vdso = (void __user *)new_vma->vm_start;
>>     return 0;
>>   }
> 

Yes, right you are, this can be dropped.

Thanks,
  Dmitry


Re: [PATCH] powerpc/vdso: Separate vvar vma from vdso

2021-03-29 Thread Dmitry Safonov
On 3/29/21 4:14 PM, Laurent Dufour wrote:
> Le 26/03/2021 à 20:17, Dmitry Safonov a écrit :
>> Since commit 511157ab641e ("powerpc/vdso: Move vdso datapage up front")
>> VVAR page is in front of the VDSO area. As a result it breaks CRIU
>> (Checkpoint Restore In Userspace) [1], where CRIU expects that "[vdso]"
>> from /proc/../maps points at ELF/vdso image, rather than at VVAR data
>> page.
>> Laurent made a patch to keep CRIU working (by reading aux vector).
>> But I think it still makes sense to separate the two mappings into different
>> VMAs. It will also make ppc64 less "special" for userspace and as
>> a side-bonus will make VVAR page un-writable by debugger (which
>> previously
>> would COW page and can be unexpected).
>>
>> I opportunistically Cc stable on it: I understand that usually such
>> stuff isn't stable material, but that will allow us in CRIU to have
>> one workaround less that is needed just for one release (v5.11) on
>> one platform (ppc64), which we otherwise have to maintain.
>> I wouldn't go as far as to say that the commit 511157ab641e is ABI
>> regression as no other userspace got broken, but I'd really appreciate
>> if it gets backported to v5.11 after v5.12 is released, so as not
>> to complicate already non-simple CRIU-vdso code. Thanks!
>>
>> Cc: Andrei Vagin 
>> Cc: Andy Lutomirski 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Christophe Leroy 
>> Cc: Laurent Dufour 
>> Cc: Michael Ellerman 
>> Cc: Paul Mackerras 
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: sta...@vger.kernel.org # v5.11
>> [1]: https://github.com/checkpoint-restore/criu/issues/1417
>> Signed-off-by: Dmitry Safonov 
>> Tested-by: Christophe Leroy 
> 
> I ran the CRIU test suite and, except for the usual suspects, all the
> tests passed.
> 
> Tested-by: Laurent Dufour 

Thank you, Laurent!

-- 
  Dmitry


Re: [PATCH] powerpc/vdso: Separate vvar vma from vdso

2021-03-27 Thread Dmitry Safonov
Hi Christophe,

On 3/27/21 5:19 PM, Christophe Leroy wrote:
[..]
>> I opportunistically Cc stable on it: I understand that usually such
>> stuff isn't stable material, but that will allow us in CRIU to have
>> one workaround less that is needed just for one release (v5.11) on
>> one platform (ppc64), which we otherwise have to maintain.
> 
> Why is that a workaround, and why for one release only? I think the
> solution proposed by Laurent to use the aux vector AT_SYSINFO_EHDR should
> work with any past and future release.

Yeah, I guess.
Previously (before v5.11 on powerpc), all kernels had the ELF header at the
start of the "[vdso]" VMA; now we'll have to carry the offset within the
VMA. Probably not the worst thing, but as it applies only to the v5.11
release it can break, so it needs separate testing.
Life was a bit easier without this additional code.

>> I wouldn't go as far as to say that the commit 511157ab641e is ABI
>> regression as no other userspace got broken, but I'd really appreciate
>> if it gets backported to v5.11 after v5.12 is released, so as not
>> to complicate already non-simple CRIU-vdso code. Thanks!
>>
>> Cc: Andrei Vagin 
>> Cc: Andy Lutomirski 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Christophe Leroy 
>> Cc: Laurent Dufour 
>> Cc: Michael Ellerman 
>> Cc: Paul Mackerras 
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: sta...@vger.kernel.org # v5.11
>> [1]: https://github.com/checkpoint-restore/criu/issues/1417
>> Signed-off-by: Dmitry Safonov 
>> Tested-by: Christophe Leroy 
> 
> I tested it with the sigreturn_vdso selftest and it worked, because that
> selftest doesn't involve VDSO data.

Thanks again for helping with testing it, I appreciate it!

> But if I do a mremap() on the VDSO text vma without remapping VVAR to
> keep the same distance between the two vmas, gettimeofday() crashes. The
> reason is that the code obtains the address of the data by calculating a
> fix difference from its own address with the below macro, the delta
> being resolved at link time:
> 
> .macro get_datapage ptr
> bcl    20, 31, .+4
> 999:
> mflr    \ptr
> #if CONFIG_PPC_PAGE_SHIFT > 14
> addis    \ptr, \ptr, (_vdso_datapage - 999b)@ha
> #endif
> addi    \ptr, \ptr, (_vdso_datapage - 999b)@l
> .endm
> 
> So the datapage needs to remain at the same distance from the code at
> all time.
> 
> Wondering how the other architectures do to have two independent VMAs
> and be able to move one independently of the other.

It's all right as far as I know. If userspace remaps vdso/vvar it should
be aware of this (CRIU keeps it in mind; the old vdso image is also dumped
so it can be compared on restore with the one the host has).
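
To illustrate the point, a userspace mover has to relocate both VMAs and
keep the link-time delta between them, roughly like this (a sketch, not
CRIU code; it assumes the [vvar] page sits directly in front of [vdso]
and that the destination range doesn't overlap the old mappings):

#define _GNU_SOURCE
#include <sys/mman.h>

/* Move the [vvar]+[vdso] pair to new_base, preserving the layout so the
 * get_datapage offset baked into the vdso text stays valid. */
static int move_vdso_pair(void *vvar, size_t vvar_len,
			  void *vdso, size_t vdso_len, void *new_base)
{
	if (mremap(vvar, vvar_len, vvar_len,
		   MREMAP_MAYMOVE | MREMAP_FIXED, new_base) == MAP_FAILED)
		return -1;
	if (mremap(vdso, vdso_len, vdso_len,
		   MREMAP_MAYMOVE | MREMAP_FIXED,
		   (char *)new_base + vvar_len) == MAP_FAILED)
		return -1;
	return 0;
}

Moving only one of the two, as your modified test does, breaks that delta,
which is exactly the crash you observed.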

Thanks,
  Dmitry


Re: VDSO ELF header

2021-03-26 Thread Dmitry Safonov
On 3/26/21 6:40 PM, Christophe Leroy wrote:
> 
> 
> Le 26/03/2021 à 18:11, Dmitry Safonov a écrit :
>> On 3/26/21 5:07 PM, Christophe Leroy wrote:
>>> No, the problem is that user access has to be allowed for the flush()
>>>
>>> A hacky solution would be to call user_access_begin() , will test that
>>> later
>>
>> Yeah, cool.
>>
>> Will it be fine if I send the vvar patch with your Tested-by?
>>
> 
> Tested-by: Christophe Leroy 

Thank you!
I'll properly submit it shortly.

> With the user access fixed on the flush, the sigreturn_vdso selftest is a
> success. I'll send a patch for it in the coming days.

Nice!

> What is the status of your series which adds generic vdso_base tracking ?

Yeah, I was working on a new version of the patches, but I was never
satisfied with the result, stuck between "good" and "best" (more code
rewriting). And then I had some other work to finish.

I'll try to finish and send it next week, thanks for pinging :-)

--
   Dmitry


Re: VDSO ELF header

2021-03-26 Thread Dmitry Safonov
On 3/26/21 5:07 PM, Christophe Leroy wrote:
> No, the problem is that user access has to be allowed for the flush()
> 
> A hacky solution would be to call user_access_begin() , will test that
> later

Yeah, cool.

Will it be fine if I send the vvar patch with your Tested-by?

Thanks,
Dmitry


Re: VDSO ELF header

2021-03-26 Thread Dmitry Safonov
On 3/26/21 4:11 PM, Christophe Leroy wrote:
[..]
>>>
>>> Dmitry proposed the same, see
>>> https://github.com/0x7f454c46/linux/commit/783c7a2532d2219edbcf555cc540eab05f698d2a
>>>
>>>
>>>
>>> Discussion at https://github.com/checkpoint-restore/criu/issues/1417
>>
>> Yeah, I didn't submit it officially to lkml because I couldn't test it
>> yet (and I usually don't send untested patches). The VM I have fails to
>> kexec and there's some difficulty getting a serial console working, so I'd
>> appreciate it if someone could either pick it up or add a Tested-by.
>>
> 
> Just to let everyone know, while testing your patch with the selftest I
> encountered the following Oops. But I also have it without your patch
> though.

Thank you, Christophe!

> 
> root@vgoip:~# ./sigreturn_vdso
> test: sigreturn_vdso
> tags: git_version:v5.12-rc4-1553-gc31141d460e6
> VDSO is at 0x104000-0x10bfff (32768 bytes)
> Signal delivered OK with VDSO mapped
> VDSO moved to 0x77bf4000-0x77bfbfff (32768 bytes)
> Signal delivered OK with VDSO moved
> Unmapped VDSO
> [ 1855.444371] Kernel attempted to read user page (7ff9ff30) - exploit
> attempt? (uid: 0)
> [ 1855.459404] BUG: Unable to handle kernel data access on read at
> 0x7ff9ff30
> [ 1855.466188] Faulting instruction address: 0xc00111d4
> [ 1855.471099] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 1855.476428] BE PAGE_SIZE=16K PREEMPT CMPC885
> [ 1855.480702] SAF3000 DIE NOTIFICATION
> [ 1855.484184] CPU: 0 PID: 362 Comm: sigreturn_vdso Not tainted
> 5.12.0-rc4-s3k-dev-01553-gc31141d460e6 #4811
> [ 1855.493644] NIP:  c00111d4 LR: c0005a28 CTR: 
> [ 1855.498634] REGS: cadb3dd0 TRAP: 0300   Not tainted 
> (5.12.0-rc4-s3k-dev-01553-gc31141d460e6)
> [ 1855.507068] MSR:  9032   CR: 48000884  XER: 2000
> [ 1855.513866] DAR: 7ff9ff30 DSISR: 8800
> [ 1855.513866] GPR00: c0007788 cadb3e90 c28dc000 7ff9ff30 7ff9ff40
> 04e0 7ff9fd50 
> [ 1855.513866] GPR08: 0001 0001 7ff9ff30  28000282
> 1001b7e8 100a0920 
> [ 1855.513866] GPR16: 100cac0c 100b 102883a4 10289685 100d
> 100d 100d 100b2e9e
> [ 1855.513866] GPR24:  102883c8  7ff9ff38 cadb3f40
> cadb3ec8 c28dc000 
> [ 1855.552767] NIP [c00111d4] flush_icache_range+0x90/0xb4
> [ 1855.557932] LR [c0005a28] handle_signal32+0x1bc/0x1c4
> [ 1855.562925] Call Trace:
> [ 1855.565332] [cadb3e90] [100d] 0x100d (unreliable)
> [ 1855.570666] [cadb3ec0] [c0007788] do_notify_resume+0x260/0x314
> [ 1855.576432] [cadb3f20] [c000c764] syscall_exit_prepare+0x120/0x184
> [ 1855.582542] [cadb3f30] [c00100b4] ret_from_syscall+0xc/0x28
> [ 1855.588050] --- interrupt: c00 at 0xfe807f8
> [ 1855.592183] NIP:  0fe807f8 LR: 10001048 CTR: c0139378
> [ 1855.597174] REGS: cadb3f40 TRAP: 0c00   Not tainted 
> (5.12.0-rc4-s3k-dev-01553-gc31141d460e6)
> [ 1855.605607] MSR:  d032   CR: 28000282  XER:
> 2000
> [ 1855.612664]
> [ 1855.612664] GPR00: 0025 7ffa0230 77c09690  000a
> 28000282 0001 0ff03a38
> [ 1855.612664] GPR08: d032 0328 c28dc000 0009 88000282
> 1001b7e8 100a0920 
> [ 1855.612664] GPR16: 100cac0c 100b 102883a4 10289685 100d
> 100d 100d 100b2e9e
> [ 1855.612664] GPR24:  102883c8  77bff628 10002358
> 1001 1000210c 8000
> [ 1855.648894] NIP [0fe807f8] 0xfe807f8
> [ 1855.652426] LR [10001048] 0x10001048
> [ 1855.655954] --- interrupt: c00
> [ 1855.658969] Instruction dump:
> [ 1855.661893] 38630010 7c001fac 38630010 4200fff0 7c0004ac 4c00012c
> 4e800020 7c001fac
> [ 1855.669811] 2c0a 38630010 4082ffcc 4be4 <7c00186c> 2c07
> 39430010 4082ff8c
> [ 1855.677910] ---[ end trace f071a5587092b3aa ]---
> [ 1855.682462]
> Remapped the stack executable
> !! child died by signal 11
> failure: sigreturn_vdso

Yes, it seems unrelated.
Probably a somewhat hacky solution to this one could be:

--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -911,7 +911,7 @@ int handle_signal32(struct ksignal *ksig, sigset_t *oldset,
 	}
 	user_write_access_end();
 
-	if (tramp == (unsigned long)mctx->mc_pad)
+	if ((tramp == (unsigned long)mctx->mc_pad) && access_ok(tramp, 2 * sizeof(unsigned long)))
 		flush_icache_range(tramp, tramp + 2 * sizeof(unsigned long));
 
 	regs->link = tramp;
--

But it's up to you; it seems powerpc-specific.

Thanks,
Dmitry


Re: VDSO ELF header

2021-03-26 Thread Dmitry Safonov
Hello,

On 3/26/21 10:50 AM, Christophe Leroy wrote:
> 
> 
> Le 26/03/2021 à 11:46, Michael Ellerman a écrit :
>> Laurent Dufour  writes:
>>> Le 25/03/2021 à 17:56, Laurent Dufour a écrit :
 Le 25/03/2021 à 17:46, Christophe Leroy a écrit :
> Le 25/03/2021 à 17:11, Laurent Dufour a écrit :
>> Since v5.11 and the changes you made to the VDSO code, it no more
>> exposing
>> the ELF header at the beginning of the VDSO mapping in user space.
>>
>> This is confusing CRIU which is checking for this ELF header cookie
>> (https://github.com/checkpoint-restore/criu/issues/1417).
>
> How does it do on other architectures ?

 Good question, I'll double check the CRIU code.
>>>
>>> On x86, there are 2 VDSO entries:
>>> 77fcb000-77fce000 r--p  00:00
>>> 0  [vvar]
>>> 77fce000-77fcf000 r-xp  00:00
>>> 0  [vdso]
>>>
>>> And the VDSO is starting with the ELF header.
>>>
>> I'm not an expert in loading and ELF part and reading the change
>> you made, I
>> can't identify how this could work now as I'm expecting the loader
>> to need
>> that ELF header to do the relocation.
>
> I think the loader is able to find it at the expected place.

 Actually, it seems the loader relies on the AUX vector
 AT_SYSINFO_EHDR. I guess
 CRIU should do the same.

>>
>>   From my investigation it seems that the first bytes of the VDSO
>> area are now
>> the vdso_arch_data.
>>
>> Is the ELF header put somewhere else?
>> How could the loader process the VDSO without that ELF header?
>>
>
> Like most other architectures, we now have the data section as
> first page and
> the text section follows. So you will likely find the elf header on
> the second
> page.
>>>
>>> I'm wondering if the data section you're refering to is the vvar
>>> section I can
>>> see on x86.
>>
>> Many of the other architectures have separate vm_special_mapping's for
>> the data page and the vdso binary, where the former is called "vvar".
>>
>> eg, s390:
>>
>> static struct vm_special_mapping vvar_mapping = {
>> .name = "[vvar]",
>> .fault = vvar_fault,
>> };
>>
>> static struct vm_special_mapping vdso_mapping = {
>> .name = "[vdso]",
>> .mremap = vdso_mremap,
>> };
>>
>>
>> I guess we probably should be doing that too.
>>
> 
> Dmitry proposed the same, see
> https://github.com/0x7f454c46/linux/commit/783c7a2532d2219edbcf555cc540eab05f698d2a
> 
> 
> Discussion at https://github.com/checkpoint-restore/criu/issues/1417

Yeah, I didn't submit it officially to lkml because I couldn't test it
yet (and I usually don't send untested patches). The VM I have fails to
kexec and there's some difficulty getting a serial console working, so I'd
appreciate it if someone could either pick it up or add a Tested-by.

Thanks,
  Dmitry


Re: [PATCH] x86/mpx: fix recursive munmap() corruption

2020-11-03 Thread Dmitry Safonov
Hi Laurent, Christophe, Michael, all,

On 11/3/20 5:11 PM, Laurent Dufour wrote:
> Le 23/10/2020 à 14:28, Christophe Leroy a écrit :
[..]
 That seems like it would work for CRIU and make sense in general?
>>>
>>> Sorry for the late answer, yes this would make more sense.
>>>
>>> Here is a patch doing that.
>>>
>>
>> In your patch, the test seems overkill:
>>
>> +    if ((start <= vdso_base && vdso_end <= end) ||  /* 1   */
>> +    (vdso_base <= start && start < vdso_end) || /* 3,4 */
>> +    (vdso_base < end && end <= vdso_end))   /* 2,3 */
>> +    mm->context.vdso_base = mm->context.vdso_end = 0;
>>
>> What about
>>
>>  if (start < vdso_end && vdso_start < end)
>>  mm->context.vdso_base = mm->context.vdso_end = 0;
>>
>> This should cover all cases, or am I missing something ?
>>
>>
>> And do we really need to store vdso_end in the context ?
>> I think it should be possible to re-calculate it: the size of the VDSO
>> should be (_end - _start) + PAGE_SIZE for 32 bits VDSO,
>> and (_end - _start) + PAGE_SIZE for the 64 bits VDSO.
> 
> Thanks Christophe for the advise.
> 
> That is covering all the cases, and indeed is similar to the Michael's
> proposal I missed last year.
> 
> I'll send a patch fixing this issue following your proposal.

It's probably not necessary anymore. I've sent patches [1], currently in
akpm's tree; the last one forbids splitting of vm_special_mapping.
So a user is able to munmap() or mremap() the vdso as a whole, but not
partially.
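
A quick way to see the new behaviour from userspace (illustration only,
not part of the series; it assumes the vdso is at least two pages long):

#include <stdio.h>
#include <sys/auxv.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	unsigned long vdso = getauxval(AT_SYSINFO_EHDR);
	long page = sysconf(_SC_PAGESIZE);

	/* Splitting the special mapping is now rejected with EINVAL... */
	if (munmap((void *)(vdso + page), page))
		perror("partial munmap of [vdso]");
	/* ...while unmapping or mremap()ing it as a whole still works. */
	return 0;
}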

[1]:
https://lore.kernel.org/linux-mm/20201013013416.390574-1-d...@arista.com/

Thanks,
  Dmitry


Re: [PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()

2020-10-23 Thread Dmitry Safonov
Hi Christophe, Will,

On 10/23/20 12:57 PM, Christophe Leroy wrote:
> 
> 
> Le 23/10/2020 à 13:25, Will Deacon a écrit :
>> On Fri, Oct 23, 2020 at 01:22:04PM +0200, Christophe Leroy wrote:
>>> Hi Dmitry,
[..]
>>> I haven't seen the patches, did you sent them out finally ?

I was working on the .close() hook, but while cooking it, I thought it may
be better to make the tracking of the user landing address generic. Note
that the vdso base address is mostly needed by the kernel as an address to
land on in userspace after processing a signal.

I have some raw patches that add
+#ifdef CONFIG_ARCH_HAS_USER_LANDING
+   struct vm_area_struct *user_landing;
+#endif
inside mm_struct and I plan to finish them after rc1 gets released.

While working on that, I noticed that arm32 and some other architectures
track the vdso position in mm.context for the sole purpose of adding
AT_SYSINFO_EHDR to the aux vector of the ELF binary being loaded. It's
quite an overkill to keep a pointer in mm.context that could instead be a
local variable in the ELF binfmt loader. Also, I found some issues with the
mremap code. The patch series mentioned above sits at the base of the
branch with the generic user-landing tracking. I have sent only those
patches, not the full branch, as I remember there was a policy that during
the merge window one should send only fixes rather than refactoring/new
code.

>> I think it's this series:
>>
>> https://lore.kernel.org/r/20201013013416.390574-1-d...@arista.com
>>
>> but they look really invasive to me, so I may cook a small hack for arm64
>> in the meantine / for stable.

I don't mind small hacks, but I'm concerned that the suggested fix, which
sets `mm->context.vdso_base = 0` on munmap(), may have its own issue: it
won't work if a user, for whatever broken reason, mremap()s the vdso to
address 0. As the fix is meant to address an issue that hasn't fired for
anyone yet, it probably shouldn't introduce another. That's why I've used
a vm_area_struct to track the vdso position in the patch set.
As a temporary measure, you could probably use something like:
#define BAD_VDSO_ADDRESS	(-1UL)
or a non-page-aligned address. But the signal code that checks whether it
can land on the vdso/sigpage should also be aligned with the new
definition.
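
Just to make the concern concrete, that would look something like this
(purely illustrative, the names are assumptions):

/* Every consumer of vdso_base - the signal-return path in particular -
 * has to agree on the sentinel. */
#define BAD_VDSO_ADDRESS	(-1UL)

static inline bool vdso_is_mapped(struct mm_struct *mm)
{
	return mm->context.vdso_base != BAD_VDSO_ADDRESS;
}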

> Not sure we are talking about the same thing.
> 
> I can't see any new .close function added to vm_special_mapping in order
> to replace arch_unmap() hook.

Thanks,
  Dmitry


Re: [PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()

2020-09-28 Thread Dmitry Safonov
On 9/27/20 8:43 AM, Christophe Leroy wrote:
> 
> 
> Le 21/09/2020 à 13:26, Will Deacon a écrit :
>> On Fri, Aug 28, 2020 at 12:14:28PM +1000, Michael Ellerman wrote:
>>> Dmitry Safonov <0x7f454...@gmail.com> writes:
[..]
>>>> I'll cook a patch for vm_special_mapping if you don't mind :-)
>>>
>>> That would be great, thanks!
>>
>> I lost track of this one. Is there a patch kicking around to resolve
>> this,
>> or is the segfault expected behaviour?
>>
> 
> IIUC dmitry said he will cook a patch. I have not seen any patch yet.

Yes, sorry about the delay - I was a bit busy with xfrm patches.

I'll send patches for .close() this week, working on them now.

> AFAIKS, among the architectures having VDSO sigreturn trampolines, only
> SH, X86 and POWERPC provide alternative trampoline on stack when VDSO is
> not there.
> 
> All other architectures just having a VDSO don't expect VDSO to not be
> mapped.
> 
> As nowadays stacks are mapped non-executable, getting a segfault
> is expected behaviour. However, I think we should really make it
> cleaner. Today it segfaults because it is still pointing to the VDSO
> trampoline that has been unmapped. But should the user map some other
> code at the same address, we'll run into the weeds on signal return
> instead of segfaulting.

+1.

> So VDSO unmapping should really be properly managed, the reference
> should be properly cleared in order to segfault in a controllable manner.
> 
> Only powerpc has a hook to properly clear the VDSO pointer when VDSO is
> unmapped.

Thanks,
 Dmitry


Re: [PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()

2020-08-27 Thread Dmitry Safonov
Hello,

On Wed, 26 Aug 2020 at 15:39, Michael Ellerman  wrote:
> Christophe Leroy  writes:
[..]
> > arch_remap() gets replaced by vdso_remap()
> >
> > For arch_unmap(), I'm wondering how/what other architectures do, because
> > powerpc seems to be the only one to erase the vdso context pointer when
> > unmapping the vdso.
>
> Yeah. The original unmap/remap stuff was added for CRIU, which I thought
> people tested on other architectures (more than powerpc even).
>
> Possibly no one really cares about vdso unmap though, vs just moving the
> vdso.
>
> We added a test for vdso unmap recently because it happened to trigger a
> KAUP failure, and someone actually hit it & reported it.

You're right, CRIU cares much more about moving the vDSO.
It's done for each restoree, and as on most setups the vDSO is premapped
and used by the application, it's actively tested.
Speaking of vDSO unmap - that's a concern only for heterogeneous C/R,
i.e. when an application is migrated from a system that uses the vDSO to
one which doesn't - a much rarer scenario.
(For arm it's !CONFIG_VDSO, for x86 it's the `vdso=0` boot parameter.)

Looking at the code, it seems quite easy to provide/maintain a .close()
for vm_special_mapping. It's a bit harder to add a test from the CRIU side
(as glibc won't know on restore that it can't use the vdso anymore),
but totally not impossible.
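
For the record, the shape I have in mind (a sketch on top of the current
struct; the .close() member is the proposed addition and doesn't exist
upstream):

struct vm_special_mapping {
	const char *name;
	struct page **pages;
	vm_fault_t (*fault)(const struct vm_special_mapping *sm,
			    struct vm_area_struct *vma,
			    struct vm_fault *vmf);
	int (*mremap)(const struct vm_special_mapping *sm,
		      struct vm_area_struct *new_vma);
	/* Proposed: called on unmap, so an arch can clear its vdso pointer
	 * instead of relying on an arch_unmap() hook. */
	void (*close)(const struct vm_special_mapping *sm,
		      struct vm_area_struct *vma);
};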

> Running that test on arm64 segfaults:
>
>   # ./sigreturn_vdso
>   VDSO is at 0x8191f000-0x8191 (4096 bytes)
>   Signal delivered OK with VDSO mapped
>   VDSO moved to 0x8191a000-0x8191afff (4096 bytes)
>   Signal delivered OK with VDSO moved
>   Unmapped VDSO
>   Remapped the stack executable
>   [   48.556191] potentially unexpected fatal signal 11.
>   [   48.556752] CPU: 0 PID: 140 Comm: sigreturn_vdso Not tainted 
> 5.9.0-rc2-00057-g2ac69819ba9e #190
>   [   48.556990] Hardware name: linux,dummy-virt (DT)
>   [   48.557336] pstate: 60001000 (nZCv daif -PAN -UAO BTYPE=--)
>   [   48.557475] pc : 8191a7bc
>   [   48.557603] lr : 8191a7bc
>   [   48.557697] sp : c13c9e90
>   [   48.557873] x29: c13cb0e0 x28: 
>   [   48.558201] x27:  x26: 
>   [   48.558337] x25:  x24: 
>   [   48.558754] x23:  x22: 
>   [   48.558893] x21: 004009b0 x20: 
>   [   48.559046] x19: 00400ff0 x18: 
>   [   48.559180] x17: 817da300 x16: 00412010
>   [   48.559312] x15:  x14: 001c
>   [   48.559443] x13: 656c626174756365 x12: 7865206b63617473
>   [   48.559625] x11: 0003 x10: 0101010101010101
>   [   48.559828] x9 : 818afda8 x8 : 0081
>   [   48.559973] x7 : 6174732065687420 x6 : 64657070616d6552
>   [   48.560115] x5 : 0e0388bd x4 : 0040135d
>   [   48.560270] x3 :  x2 : 0001
>   [   48.560412] x1 : 0003 x0 : 004120b8
>   Segmentation fault
>   #
>
> So I think we need to keep the unmap hook. Maybe it should be handled by
> the special_mapping stuff generically.

I'll cook a patch for vm_special_mapping if you don't mind :-)

Thanks,
 Dmitry


[PATCHv3 00/50] Add log level to show_stack()

2020-04-19 Thread Dmitry Safonov
Changes to v3:
- Collected more architectural Acks and Reviewed-by tags
- Fixed compilation on sparc64

Changes to v2:
- Removed excessive pr_cont("\n") (nits by Senozhatsky)
- Leave backtrace debugging messages with pr_debug()
  (noted by Russell King and Will Deacon)
- Correct microblaze_unwind_inner() declaration
  (Thanks to Michal Simek and kbuild test robot)
- Fix copy'n'paste typo in show_stack_loglvl() for sparc
  (kbuild robot)
- Fix backtrace output on xtensa
  (Thanks Max Filippov)
- Add loglevel to show_stack() on s390 (kbuild robot)
- Collected all Reviewed-by and Acked-by (thanks!)

v2: https://lore.kernel.org/linux-riscv/20200316143916.195608-1-d...@arista.com/
v1: https://lore.kernel.org/linux-riscv/20191106030542.868541-1-d...@arista.com/

Add log level argument to show_stack().
Done in three stages:
1. Introducing show_stack_loglvl() for every architecture
2. Migrating old users with an explicit log level
3. Renaming show_stack_loglvl() into show_stack()
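
To make the end state of stage 3 concrete - not a hunk from this series,
just what a call site ends up looking like (the helper here is made up
for illustration):

/* Illustration only: callers pass the log level explicitly. */
static void dump_task_backtrace(struct task_struct *tsk, bool debug)
{
	show_stack(tsk, NULL, debug ? KERN_DEBUG : KERN_ERR);
}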

Justification:
o It's a design mistake to move a business-logic decision
  into a platform implementation detail.
o I currently have two patch sets that would benefit from this work:
  Removing console_loglevel jumps in the sysrq driver [1]
  Hung task warning before panic [2] - suggested by Tetsuo (but he
  probably didn't realise what it would involve).
o While doing (1) and (2), the backtraces were adjusted to match the
  headers and other messages for each situation - so there won't be a
  situation where the backtrace is printed but the headers are missing
  because they have a lower log level (or the reverse).
o As a result, the console_loglevel games for kdb in (2) are removed.

Least important for upstream, but maybe still worth noting: every company
I've worked at so far has had an off-list patch to print a backtrace with
the needed log level (but only for the architecture they cared about).
If you have other ideas on how you would benefit from show_stack() with
a log level - please reply to this cover letter.

See also discussion on v1:
https://lore.kernel.org/linux-riscv/20191106083538.z5nlpuf64cigx...@pathway.suse.cz/

Cc: Andrew Morton 
Cc: Greg Kroah-Hartman 
Cc: Ingo Molnar 
Cc: Jiri Slaby 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: Tetsuo Handa 

Thanks,
Dmitry

[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
[2]: 
https://lkml.kernel.org/r/41fd7652-df1f-26f6-aba0-b87ebae07...@i-love.sakura.ne.jp

Dmitry Safonov (50):
  kallsyms/printk: Add loglvl to print_ip_sym()
  alpha: Add show_stack_loglvl()
  arc: Add show_stack_loglvl()
  arm/asm: Add loglvl to c_backtrace()
  arm: Add loglvl to unwind_backtrace()
  arm: Add loglvl to dump_backtrace()
  arm: Wire up dump_backtrace_{entry,stm}
  arm: Add show_stack_loglvl()
  arm64: Add loglvl to dump_backtrace()
  arm64: Add show_stack_loglvl()
  c6x: Add show_stack_loglvl()
  csky: Add show_stack_loglvl()
  h8300: Add show_stack_loglvl()
  hexagon: Add show_stack_loglvl()
  ia64: Pass log level as arg into ia64_do_show_stack()
  ia64: Add show_stack_loglvl()
  m68k: Add show_stack_loglvl()
  microblaze: Add loglvl to microblaze_unwind_inner()
  microblaze: Add loglvl to microblaze_unwind()
  microblaze: Add show_stack_loglvl()
  mips: Add show_stack_loglvl()
  nds32: Add show_stack_loglvl()
  nios2: Add show_stack_loglvl()
  openrisc: Add show_stack_loglvl()
  parisc: Add show_stack_loglvl()
  powerpc: Add show_stack_loglvl()
  riscv: Add show_stack_loglvl()
  s390: Add show_stack_loglvl()
  sh: Add loglvl to dump_mem()
  sh: Remove needless printk()
  sh: Add loglvl to printk_address()
  sh: Add loglvl to show_trace()
  sh: Add show_stack_loglvl()
  sparc: Add show_stack_loglvl()
  um/sysrq: Remove needless variable sp
  um: Add show_stack_loglvl()
  unicore32: Remove unused pmode argument in c_backtrace()
  unicore32: Add loglvl to c_backtrace()
  unicore32: Add show_stack_loglvl()
  x86: Add missing const qualifiers for log_lvl
  x86: Add show_stack_loglvl()
  xtensa: Add loglvl to show_trace()
  xtensa: Add show_stack_loglvl()
  sysrq: Use show_stack_loglvl()
  x86/amd_gart: Print stacktrace for a leak with KERN_ERR
  power: Use show_stack_loglvl()
  kdb: Don't play with console_loglevel
  sched: Print stack trace with KERN_INFO
  kernel: Use show_stack_loglvl()
  kernel: Rename show_stack_loglvl() => show_stack()

 arch/alpha/kernel/traps.c| 22 +++
 arch/arc/include/asm/bug.h   |  3 ++-
 arch/arc/kernel/stacktrace.c | 17 +++-
 arch/arc/kernel/troubleshoot.c   |  2 +-
 arch/arm/include/asm/bug.h   |  3 ++-
 arch/arm/include/asm/traps.h |  3 ++-
 arch/arm/include/asm/unwind.h|  3 ++-
 arch/arm/kernel/traps.c  | 39 +++
 arch/arm/kernel/unwind.c |  5 ++--
 arch/arm/lib/backtrace-clang.S   |  9 +--
 arch/arm/lib/backtrace.S | 14 +++---
 arch/arm64/include/asm/stacktrace.h  |  3 ++-
 arch

[PATCHv3 26/50] powerpc: Add show_stack_loglvl()

2020-04-18 Thread Dmitry Safonov
Currently, the log level of show_stack() depends on the platform
implementation. This creates situations where the headers are printed with
a lower or higher log level than the stacktrace itself (depending on the
platform or the user).

Furthermore, it forces a logic decision from the user onto the architecture
side. As a result, some users such as sysrq/kdb/etc. resort to tricks like
temporarily raising console_loglevel while printing their messages.
In turn, that not only may print unwanted messages from other CPUs, but
may also omit the printing entirely in the unlucky case where the printk()
was deferred.

Introducing a log-level parameter and KERN_UNSUPPRESSED [1] seems
an easier approach than introducing more printk buffers.
Also, it keeps the backtrace consolidated with its headers.

Introduce show_stack_loglvl(), which will eventually replace
show_stack().

Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
Acked-by: Michael Ellerman  (powerpc)
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/process.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9c21288f8645..6ad438d59796 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2068,7 +2068,8 @@ unsigned long get_wchan(struct task_struct *p)
 
 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
-void show_stack(struct task_struct *tsk, unsigned long *stack)
+void show_stack_loglvl(struct task_struct *tsk, unsigned long *stack,
+  const char *loglvl)
 {
unsigned long sp, ip, lr, newsp;
int count = 0;
@@ -2093,7 +2094,7 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
}
 
lr = 0;
-   printk("Call Trace:\n");
+   printk("%sCall Trace:\n", loglvl);
do {
if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
break;
@@ -2102,7 +2103,8 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
newsp = stack[0];
ip = stack[STACK_FRAME_LR_SAVE];
if (!firstframe || ip != lr) {
-   printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+   printk("%s["REG"] ["REG"] %pS",
+   loglvl, sp, ip, (void *)ip);
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
ret_addr = ftrace_graph_ret_addr(current,
_idx, ip, stack);
@@ -2124,8 +2126,9 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
struct pt_regs *regs = (struct pt_regs *)
(sp + STACK_FRAME_OVERHEAD);
lr = regs->link;
-   printk("--- interrupt: %lx at %pS\nLR = %pS\n",
-  regs->trap, (void *)regs->nip, (void *)lr);
+   printk("%s--- interrupt: %lx at %pS\nLR = %pS\n",
+  loglvl, regs->trap,
+  (void *)regs->nip, (void *)lr);
firstframe = 1;
}
 
@@ -2135,6 +2138,11 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
put_task_stack(tsk);
 }
 
+void show_stack(struct task_struct *tsk, unsigned long *stack)
+{
+   show_stack_loglvl(tsk, stack, KERN_DEFAULT);
+}
+
 #ifdef CONFIG_PPC64
 /* Called with hard IRQs off */
 void notrace __ppc64_runlatch_on(void)
-- 
2.26.0



[PATCHv2 00/50] Add log level to show_stack()

2020-03-16 Thread Dmitry Safonov
Changes to v2:
- Removed excessive pr_cont("\n") (nits by Senozhatsky)
- Leave backtrace debugging messages with pr_debug()
  (noted by Russell King and Will Deacon)
- Correct microblaze_unwind_inner() declaration
  (Thanks to Michal Simek and kbuild test robot)
- Fix copy'n'paste typo in show_stack_loglvl() for sparc
  (kbuild robot)
- Fix backtrace output on xtensa
  (Thanks Max Filippov)
- Add loglevel to show_stack() on s390 (kbuild robot)
- Collected all Reviewed-by and Acked-by (thanks!)

Add log level argument to show_stack().
Done in three stages:
1. Introducing show_stack_loglvl() for every architecture
2. Migrating old users with an explicit log level
3. Renaming show_stack_loglvl() into show_stack()

Justification:
o It's a design mistake to move a business-logic decision
  into a platform implementation detail.
o I currently have two patch sets that would benefit from this work:
  Removing console_loglevel jumps in the sysrq driver [1]
  Hung task warning before panic [2] - suggested by Tetsuo (but he
  probably didn't realise what it would involve).
o While doing (1) and (2), the backtraces were adjusted to match the
  headers and other messages for each situation - so there won't be a
  situation where the backtrace is printed but the headers are missing
  because they have a lower log level (or the reverse).
o As a result, the console_loglevel games for kdb in (2) are removed.
Least important for upstream, but maybe still worth noting: every company
I've worked at so far has had an off-list patch to print a backtrace with
the needed log level (but only for the architecture they cared about).
If you have other ideas on how you would benefit from show_stack() with
a log level - please reply to this cover letter.

See also discussion on v1:
https://lore.kernel.org/linux-riscv/20191106083538.z5nlpuf64cigx...@pathway.suse.cz/

Cc: Andrew Morton 
Cc: Greg Kroah-Hartman 
Cc: Ingo Molnar 
Cc: Jiri Slaby 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: Tetsuo Handa 

Thanks,
Dmitry

[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
[2]: 
https://lkml.kernel.org/r/41fd7652-df1f-26f6-aba0-b87ebae07...@i-love.sakura.ne.jp

Dmitry Safonov (50):
  kallsyms/printk: Add loglvl to print_ip_sym()
  alpha: Add show_stack_loglvl()
  arc: Add show_stack_loglvl()
  arm/asm: Add loglvl to c_backtrace()
  arm: Add loglvl to unwind_backtrace()
  arm: Add loglvl to dump_backtrace()
  arm: Wire up dump_backtrace_{entry,stm}
  arm: Add show_stack_loglvl()
  arm64: Add loglvl to dump_backtrace()
  arm64: Add show_stack_loglvl()
  c6x: Add show_stack_loglvl()
  csky: Add show_stack_loglvl()
  h8300: Add show_stack_loglvl()
  hexagon: Add show_stack_loglvl()
  ia64: Pass log level as arg into ia64_do_show_stack()
  ia64: Add show_stack_loglvl()
  m68k: Add show_stack_loglvl()
  microblaze: Add loglvl to microblaze_unwind_inner()
  microblaze: Add loglvl to microblaze_unwind()
  microblaze: Add show_stack_loglvl()
  mips: Add show_stack_loglvl()
  nds32: Add show_stack_loglvl()
  nios2: Add show_stack_loglvl()
  openrisc: Add show_stack_loglvl()
  parisc: Add show_stack_loglvl()
  powerpc: Add show_stack_loglvl()
  riscv: Add show_stack_loglvl()
  s390: Add show_stack_loglvl()
  sh: Add loglvl to dump_mem()
  sh: Remove needless printk()
  sh: Add loglvl to printk_address()
  sh: Add loglvl to show_trace()
  sh: Add show_stack_loglvl()
  sparc: Add show_stack_loglvl()
  um/sysrq: Remove needless variable sp
  um: Add show_stack_loglvl()
  unicore32: Remove unused pmode argument in c_backtrace()
  unicore32: Add loglvl to c_backtrace()
  unicore32: Add show_stack_loglvl()
  x86: Add missing const qualifiers for log_lvl
  x86: Add show_stack_loglvl()
  xtensa: Add loglvl to show_trace()
  xtensa: Add show_stack_loglvl()
  sysrq: Use show_stack_loglvl()
  x86/amd_gart: Print stacktrace for a leak with KERN_ERR
  power: Use show_stack_loglvl()
  kdb: Don't play with console_loglevel
  sched: Print stack trace with KERN_INFO
  kernel: Use show_stack_loglvl()
  kernel: Rename show_stack_loglvl() => show_stack()

 arch/alpha/kernel/traps.c| 22 +++
 arch/arc/include/asm/bug.h   |  3 ++-
 arch/arc/kernel/stacktrace.c | 17 +++-
 arch/arc/kernel/troubleshoot.c   |  2 +-
 arch/arm/include/asm/bug.h   |  3 ++-
 arch/arm/include/asm/traps.h |  3 ++-
 arch/arm/include/asm/unwind.h|  3 ++-
 arch/arm/kernel/traps.c  | 40 
 arch/arm/kernel/unwind.c |  7 ++---
 arch/arm/lib/backtrace-clang.S   |  9 +--
 arch/arm/lib/backtrace.S | 14 +++---
 arch/arm64/include/asm/stacktrace.h  |  3 ++-
 arch/arm64/kernel/process.c  |  2 +-
 arch/arm64/kernel/traps.c| 19 ++---
 arch/c6x/kernel/traps.c  | 18 +++--
 arch/csky/kernel/dumpstack.c |  9 ---
 arch/csky/kernel/ptrace.c|  4 +--
 arch/h8300/kernel/traps.c| 12 -
 arch/hexagon/kernel/traps.c 

[PATCHv2 26/50] powerpc: Add show_stack_loglvl()

2020-03-16 Thread Dmitry Safonov
Currently, the log level of show_stack() depends on the platform
implementation. This creates situations where the headers are printed with
a lower or higher log level than the stacktrace itself (depending on the
platform or the user).

Furthermore, it forces a logic decision from the user onto the architecture
side. As a result, some users such as sysrq/kdb/etc. resort to tricks like
temporarily raising console_loglevel while printing their messages.
In turn, that not only may print unwanted messages from other CPUs, but
may also omit the printing entirely in the unlucky case where the printk()
was deferred.

Introducing a log-level parameter and KERN_UNSUPPRESSED [1] seems
an easier approach than introducing more printk buffers.
Also, it keeps the backtrace consolidated with its headers.

Introduce show_stack_loglvl(), which will eventually replace
show_stack().

Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
Acked-by: Michael Ellerman  (powerpc)
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/process.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fad50db9dcf2..c1ab7f613da4 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2034,7 +2034,8 @@ unsigned long get_wchan(struct task_struct *p)
 
 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
-void show_stack(struct task_struct *tsk, unsigned long *stack)
+void show_stack_loglvl(struct task_struct *tsk, unsigned long *stack,
+  const char *loglvl)
 {
unsigned long sp, ip, lr, newsp;
int count = 0;
@@ -2059,7 +2060,7 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
}
 
lr = 0;
-   printk("Call Trace:\n");
+   printk("%sCall Trace:\n", loglvl);
do {
if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
break;
@@ -2068,7 +2069,8 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
newsp = stack[0];
ip = stack[STACK_FRAME_LR_SAVE];
if (!firstframe || ip != lr) {
-   printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+   printk("%s["REG"] ["REG"] %pS",
+   loglvl, sp, ip, (void *)ip);
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
ret_addr = ftrace_graph_ret_addr(current,
_idx, ip, stack);
@@ -2090,8 +2092,9 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
struct pt_regs *regs = (struct pt_regs *)
(sp + STACK_FRAME_OVERHEAD);
lr = regs->link;
-   printk("--- interrupt: %lx at %pS\nLR = %pS\n",
-  regs->trap, (void *)regs->nip, (void *)lr);
+   printk("%s--- interrupt: %lx at %pS\nLR = %pS\n",
+  loglvl, regs->trap,
+  (void *)regs->nip, (void *)lr);
firstframe = 1;
}
 
@@ -2101,6 +2104,11 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
put_task_stack(tsk);
 }
 
+void show_stack(struct task_struct *tsk, unsigned long *stack)
+{
+   show_stack_loglvl(tsk, stack, KERN_DEFAULT);
+}
+
 #ifdef CONFIG_PPC64
 /* Called with hard IRQs off */
 void notrace __ppc64_runlatch_on(void)
-- 
2.25.1



[PATCH-tty-testing] tty/serial/8250: Add has_sysrq to plat_serial8250_port

2019-12-17 Thread Dmitry Safonov
In contrast to 8250/8250_of, legacy_serial on powerpc does fill
(struct plat_serial8250_port). The reason is likely that it's done on
device_initcall(), not on probe. So, 8250_core is not yet probed.

Propagate the value from the platform_device on 8250 probe - in the case
of the powerpc legacy driver it's initialized at initcall time, in the
case of 8250_of it will be initialized later in of_platform_serial_setup().

Fixes: ea2683bf546c ("tty/serial: Migrate 8250_fsl to use has_sysrq").
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: kbuild test robot 
Signed-off-by: Dmitry Safonov 
---
 It's probably better to squash this into the 8250_fsl patch.
 I've added Fixes tag in case the branch won't be rebased.
 Tested powerpc build manually with ppc64 cross-compiler.

 drivers/tty/serial/8250/8250_core.c | 1 +
 include/linux/serial_8250.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/tty/serial/8250/8250_core.c 
b/drivers/tty/serial/8250/8250_core.c
index e682390ce0de..0894a22fd702 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -816,6 +816,7 @@ static int serial8250_probe(struct platform_device *dev)
uart.port.flags = p->flags;
uart.port.mapbase   = p->mapbase;
uart.port.hub6  = p->hub6;
+   uart.port.has_sysrq = p->has_sysrq;
uart.port.private_data  = p->private_data;
uart.port.type  = p->type;
uart.port.serial_in = p->serial_in;
diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h
index bb2bc99388ca..6a8e8c48c882 100644
--- a/include/linux/serial_8250.h
+++ b/include/linux/serial_8250.h
@@ -25,6 +25,7 @@ struct plat_serial8250_port {
unsigned char   regshift;   /* register shift */
unsigned char   iotype; /* UPIO_* */
unsigned char   hub6;
+   unsigned char   has_sysrq;  /* supports magic SysRq */
upf_t   flags;  /* UPF_* flags */
unsigned inttype;   /* If UPF_FIXED_TYPE */
unsigned int(*serial_in)(struct uart_port *, int);
-- 
2.24.1



Re: [PATCH 00/58] serial/sysrq: Cleanup ifdeffery

2019-12-13 Thread Dmitry Safonov
Hi Christophe,

On 12/13/19 5:47 AM, Christophe Leroy wrote:
> Le 13/12/2019 à 01:05, Dmitry Safonov a écrit :
[..]
> 
> powerpc patchwork didn't get the full series, see
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=148198

Yes, I was under the impression that architecture mailing lists only want
the related patches. But now I see that from the patchwork point of view
it's better to have the whole series in the inbox.

> Can't find them on linux-serial patchwork either
> (https://patches.linaro.org/project/linux-serial/list/)

I'm not sure - maybe the frequency of checking is low?
I see all patches in linux-serial ml:
https://marc.info/?l=linux-serial=1=201912=2

> It is impossible to review/test powerpc bits without the first patches
> of the series, where can the entire series be found ?

Sorry for the inconvenience.
I can resend the series just to the ppc mailing list, without Cc'ing all
the people, if that works for you. Or you can clone it directly from my
GitHub:
https://github.com/0x7f454c46/linux/tree/sysrq-serial-seq-v1

Thanks,
  Dmitry


[PATCH 49/58] serial/ucc_uart: Remove ifdef SUPPORT_SYSRQ

2019-12-12 Thread Dmitry Safonov
ucc_uart doesn't seem to support a console over itself, so maybe the sysrq
handling can be deleted from the file together with
uart_handle_sysrq_char().

Cc: Timur Tabi 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Dmitry Safonov 
---
 drivers/tty/serial/ucc_uart.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/tty/serial/ucc_uart.c b/drivers/tty/serial/ucc_uart.c
index a0555ae2b1ef..ff7784047156 100644
--- a/drivers/tty/serial/ucc_uart.c
+++ b/drivers/tty/serial/ucc_uart.c
@@ -551,9 +551,7 @@ static void qe_uart_int_rx(struct uart_qe_port *qe_port)
/* Overrun does not affect the current character ! */
if (status & BD_SC_OV)
tty_insert_flip_char(tport, 0, TTY_OVERRUN);
-#ifdef SUPPORT_SYSRQ
port->sysrq = 0;
-#endif
goto error_return;
 }
 
-- 
2.24.0



[PATCH 31/58] tty/serial: Migrate pmac_zilog to use has_sysrq

2019-12-12 Thread Dmitry Safonov
The SUPPORT_SYSRQ ifdeffery is not nice as:
- May create misunderstanding about sizeof(struct uart_port) between
  different objects
- Prevents moving functions from serial_core.h
- Reduces readability (well, it's ifdeffery - it's hard to follow)

In order to remove SUPPORT_SYSRQ, has_sysrq variable has been added.
Initialise it in driver's probe and remove ifdeffery.

Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Dmitry Safonov 
---
 drivers/tty/serial/pmac_zilog.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/tty/serial/pmac_zilog.c b/drivers/tty/serial/pmac_zilog.c
index bcb5bf70534e..ba65a3bbd72a 100644
--- a/drivers/tty/serial/pmac_zilog.c
+++ b/drivers/tty/serial/pmac_zilog.c
@@ -61,10 +61,6 @@
 #define of_machine_is_compatible(x) (0)
 #endif
 
-#if defined (CONFIG_SERIAL_PMACZILOG_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
-#define SUPPORT_SYSRQ
-#endif
-
 #include 
 #include 
 
@@ -1721,6 +1717,7 @@ static int __init pmz_init_port(struct uart_pmac_port 
*uap)
uap->control_reg   = uap->port.membase;
uap->data_reg  = uap->control_reg + 4;
uap->port_type = 0;
+   uap->port.has_sysrq = IS_ENABLED(CONFIG_SERIAL_PMACZILOG_CONSOLE);
 
pmz_convert_to_zs(uap, CS8, 0, 9600);
 
-- 
2.24.0



[PATCH 05/58] tty/serial: Migrate 8250_fsl to use has_sysrq

2019-12-12 Thread Dmitry Safonov
The SUPPORT_SYSRQ ifdeffery is not nice as:
- May create misunderstanding about sizeof(struct uart_port) between
  different objects
- Prevents moving functions from serial_core.h
- Reduces readability (well, it's ifdeffery - it's hard to follow)

In order to remove SUPPORT_SYSRQ, has_sysrq variable has been added.
Initialise it in driver's probe and remove ifdeffery.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/legacy_serial.c | 4 +++-
 drivers/tty/serial/8250/8250_fsl.c  | 4 
 drivers/tty/serial/8250/8250_of.c   | 4 +++-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/legacy_serial.c 
b/arch/powerpc/kernel/legacy_serial.c
index 7cea5978f21f..f061e06e9f51 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -479,8 +479,10 @@ static void __init fixup_port_irq(int index,
port->irq = virq;
 
 #ifdef CONFIG_SERIAL_8250_FSL
-   if (of_device_is_compatible(np, "fsl,ns16550"))
+   if (of_device_is_compatible(np, "fsl,ns16550")) {
port->handle_irq = fsl8250_handle_irq;
+   port->has_sysrq = IS_ENABLED(CONFIG_SERIAL_8250_CONSOLE);
+   }
 #endif
 }
 
diff --git a/drivers/tty/serial/8250/8250_fsl.c 
b/drivers/tty/serial/8250/8250_fsl.c
index aa0e216d5ead..0d0c80905c58 100644
--- a/drivers/tty/serial/8250/8250_fsl.c
+++ b/drivers/tty/serial/8250/8250_fsl.c
@@ -1,8 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0
-#if defined(CONFIG_SERIAL_8250_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
-#define SUPPORT_SYSRQ
-#endif
-
 #include 
 #include 
 
diff --git a/drivers/tty/serial/8250/8250_of.c 
b/drivers/tty/serial/8250/8250_of.c
index 92fbf46ce3bd..531ad67395e0 100644
--- a/drivers/tty/serial/8250/8250_of.c
+++ b/drivers/tty/serial/8250/8250_of.c
@@ -222,8 +222,10 @@ static int of_platform_serial_setup(struct platform_device 
*ofdev,
 
if (IS_ENABLED(CONFIG_SERIAL_8250_FSL) &&
(of_device_is_compatible(np, "fsl,ns16550") ||
-of_device_is_compatible(np, "fsl,16550-FIFO64")))
+of_device_is_compatible(np, "fsl,16550-FIFO64"))) {
port->handle_irq = fsl8250_handle_irq;
+   port->has_sysrq = IS_ENABLED(CONFIG_SERIAL_8250_CONSOLE);
+   }
 
return 0;
 err_unprepare:
-- 
2.24.0



[PATCH 00/58] serial/sysrq: Cleanup ifdeffery

2019-12-12 Thread Dmitry Safonov
The original purpose of this patch set was to add a way to enable
sysrq on a uart where currently it can only be constantly on or off
(CONFIG_MAGIC_SYSRQ_SERIAL); see the last patch:
  "serial/sysrq: Add MAGIC_SYSRQ_SERIAL_SEQUENCE"

But to do that, I had to add uart_try_toggle_sysrq() and I didn't want
to bloat serial_core.h even more. So I did a cleanup by removing
SUPPORT_SYSRQ, resulting in a nice diff-stat and less ifdeffery.

Most patches are one-liners; I decided to keep them separate per driver
to let reviewers follow the purpose more easily.

Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: Vasiliy Khoruzhick 
Cc: linux-ser...@vger.kernel.org

Dmitry Safonov (58):
  sysrq: Remove sysrq_handler_registered
  serial: Move sysrq members above
  serial_core: Un-ifdef sysrq SUPPORT_SYSRQ
  tty/serial: Migrate aspeed_vuart to use has_sysrq
  tty/serial: Migrate 8250_fsl to use has_sysrq
  tty/serial: Migrate bcm63xx_uart to use has_sysrq
  tty/serial: Migrate 8250_omap to use has_sysrq
  tty/serial: Migrate 8250_port to use has_sysrq
  tty/serial: Migrate amba-pl01* to use has_sysrq
  tty/serial: Migrate apbuart to use has_sysrq
  tty/serial: Migrate arc_uart to use has_sysrq
  tty/serial: Migrate atmel_serial to use has_sysrq
  tty/serial: Migrate clps711x to use has_sysrq
  tty/serial: Migrate cpm_uart to use has_sysrq
  tty/serial: Migrate dz to use has_sysrq
  tty/serial: Migrate efm32-uart to use has_sysrq
  tty/serial: Migrate fsl_linflexuart to use has_sysrq
  tty/serial: Migrate fsl_lpuart to use has_sysrq
  tty/serial: Migrate imx to use has_sysrq
  tty/serial: Migrate ip22zilog to use has_sysrq
  tty/serial: Migrate meson_uart to use has_sysrq
  tty/serial: Migrate milbeaut_usio to use has_sysrq
  tty/serial: Migrate mpc52xx_uart to use has_sysrq
  tty/serial: Don't zero port->sysrq
  tty/serial: Migrate msm_serial to use has_sysrq
  tty/serial: Migrate mux to use has_sysrq
  tty/serial: Migrate mxs-auart to use has_sysrq
  tty/serial: Migrate omap-serial to use has_sysrq
  tty/serial: Migrate pch_uart to use has_sysrq
  tty/serial: Don't check port->sysrq
  tty/serial: Migrate pmac_zilog to use has_sysrq
  tty/serial: Migrate pnx8xxx_uart to use has_sysrq
  serial/f81534: Don't check port->sysrq
  tty/serial: Migrate pxa to use has_sysrq
  tty/serial: Migrate qcom_geni_serial to use has_sysrq
  tty/serial: Migrate sa1100 to use has_sysrq
  tty/serial: Migrate samsung_tty to use has_sysrq
  tty/serial: Migrate sb1250-duart to use has_sysrq
  tty/serial: Migrate sccnxp to use has_sysrq
  tty/serial: Migrate serial_txx9 to use has_sysrq
  tty/serial: Migrate sh-sci to use has_sysrq
  tty/serial: Migrate sprd_serial to use has_sysrq
  tty/serial: Migrate st-asc to use has_sysrq
  tty/serial: Migrate stm32-usart to use has_sysrq
  tty/serial: Migrate sunhv to use has_sysrq
  tty/serial: Migrate sunsab to use has_sysrq
  tty/serial: Migrate sunsu to use has_sysrq
  tty/serial: Migrate sunzilog to use has_sysrq
  serial/ucc_uart: Remove ifdef SUPPORT_SYSRQ
  tty/serial: Migrate vr41xx_siu to use has_sysrq
  tty/serial: Migrate vt8500_serial to use has_sysrq
  tty/serial: Migrate xilinx_uartps to use has_sysrq
  tty/serial: Migrate zs to use has_sysrq
  serial_core: Remove SUPPORT_SYSRQ ifdeffery
  usb/serial: Don't handle break when CONFIG_MAGIC_SYSRQ is disabled
  serial_core: Move sysrq functions from header file
  sysctl/sysrq: Remove __sysrq_enabled copy
  serial/sysrq: Add MAGIC_SYSRQ_SERIAL_SEQUENCE

 arch/powerpc/kernel/legacy_serial.c |   4 +-
 drivers/tty/serial/8250/8250_aspeed_vuart.c |   5 +-
 drivers/tty/serial/8250/8250_fsl.c  |   4 -
 drivers/tty/serial/8250/8250_of.c   |   4 +-
 drivers/tty/serial/8250/8250_omap.c |   5 +-
 drivers/tty/serial/8250/8250_port.c |   5 +-
 drivers/tty/serial/amba-pl010.c |   5 +-
 drivers/tty/serial/amba-pl011.c |   6 +-
 drivers/tty/serial/apbuart.c|   5 +-
 drivers/tty/serial/arc_uart.c   |   5 +-
 drivers/tty/serial/atmel_serial.c   |   9 +-
 drivers/tty/serial/bcm63xx_uart.c   |   5 +-
 drivers/tty/serial/clps711x.c   |   5 +-
 drivers/tty/serial/cpm_uart/cpm_uart_core.c |   9 +-
 drivers/tty/serial/dz.c |   5 +-
 drivers/tty/serial/efm32-uart.c |   5 +-
 drivers/tty/serial/fsl_linflexuart.c|   8 +-
 drivers/tty/serial/fsl_lpuart.c |   9 +-
 drivers/tty/serial/imx.c|   7 +-
 drivers/tty/serial/ip22zilog.c  |   7 +-
 drivers/tty/serial/meson_uart.c |   5 +-
 drivers/tty/serial/milbeaut_usio.c  |   5 +-
 drivers/tty/serial/mpc52xx_uart.c   |  11 +-
 drivers/tty/serial/msm_serial.c |   5 +-
 drivers/tty/serial/mux.c|   5 +-
 drivers/tty/serial/mxs-auart.c  |   5 +-
 drivers/tty/serial/omap-serial.c|   5 +-
 drivers/tty/serial/pch_uart.c   | 

Re: [PATCH 00/50] Add log level to show_stack()

2019-11-13 Thread Dmitry Safonov
Hi Sergey,

On 11/13/19 6:33 AM, Sergey Senozhatsky wrote:
[..]
> Well, here we go. There is a number of generally useful functions that
> print nice data and where people might want to have better loglevel control
> (for debugging purposes). show_stack() is just one of them. Patching all
> those functions, which you have mentioned above, is hardly a fun task to do.
> Hence the printk() per-CPU per-context loglevel proposition. The code there
> is not clever or complicated and is meant for debugging purposes only, but
> with just 3 lines of code one can do some stuff:
> 
>   /* @BTW you can't do this with "%s" KERN_FOO ;P */
> + printk_emergency_enter(LOGLEVEL_SCHED);
> + debug_show_all_locks();
> + printk_emergency_exit();

Not that I want to criticize your proposal further, but just to clarify
what I meant by "clever and complicated":

I don't think semantically there is any difference between:

printk_emergency_enter(LOGLEVEL_SCHED);
printk();
printk_emergency_exit();
vs
printk("%s ...", KERN_SHED, ...);

I was speaking about complexity not of usage but of the implementation
inside printk_emergency_enter(): there will be some business logic that
does "is it NMI? Use the given loglevel. Is it KERN_ALERT? Don't downgrade
the loglevel..." and other smart details - real logic which one has to
think about and later maintain.

There will also be minor issues like people inserting a print with one log
level and seeing it in dmesg with another, but I presume those
printk_enter()/printk_exit() sections will cover a small amount of code
without much nesting [as long as it doesn't get overused]. It can also be
documented, so people will learn about another feature of printk().

And this year I've seen the presentation "Why printk() is so
complicated?" and I presumed that the approach is to keep things as
simple as possible.

In conclusion:
I see that your proposal also solves the problem [except for preemption and
some pr_debug() that shouldn't be printed]. And I think your approach is
better in the short term (we won't have any missed "%s", KERN_* in
linux-next), but in the long term it adds some amount of business logic
and another feature to printk.

Just in case: again, I don't mind, it's up to you, the maintainers of
printk. It's also not very fun for me to create those patches.
But they fix the console_loglevel issues (I hope we can un-export it in
the end) and I also need them for my other patches, which will produce
warnings with debug loglevel when configured through sysctl.

Thanks,
  Dmitry


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-12 Thread Dmitry Safonov
On 11/12/19 4:25 AM, Sergey Senozhatsky wrote:
> On (19/11/12 02:40), Dmitry Safonov wrote:
> [..]
>> In my point of view the cost of one-time [mostly build] testing every
>> architecture is cheaper than introducing some new smart code that will
>> live forever.
> 
> Well, there may be the need to pass loglevel deeper due to "hey __show_stack()
> on that arch invokes bar(), which invokes foo() and now foo() does printk(),
> but we don't see it". The context which decided to backtaraces decided
> to do so for a reason, probably, so I guess we can look at it as "a special
> error reporting code block".
> 
> The proposed patch set passes loglevel via slightly unusual channel -
> via sprintf(). We probably can do it, but I would prefer to minimize
> the number of such printk-s in the kernel. The code snippet which I
> posted also does pretty unusual thing w.r.t loglevel. Both approaches
> are "non-standard" from that POV.

I don't strongly disagree, but if you look at the results of:
git grep 'printk("%s.*", \(lvl\|level\)'

it seems to be used in quite a few places.
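
For reference, the hits mostly follow a pattern like this (just an
illustration; x86's show_trace_log_lvl() is one example, and the powerpc
patch in this series does the same thing):

	printk("%sCall Trace:\n", log_lvl);
	printk("%s %pS\n", log_lvl, (void *)addr);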

Thanks,
  Dmitry


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-12 Thread Dmitry Safonov
On 11/13/19 1:23 AM, Sergey Senozhatsky wrote:
> On (19/11/12 19:12), Sergey Senozhatsky wrote:
>> On (19/11/12 09:35), Petr Mladek wrote:
>> [..]
>>> This is getting too complicated. It would introduce too many
>>> hidden rules. While the explicitly passed loglevel parameter
>>> is straightforward and clear.
>>
>> If loglevel is DEFAULT or NOTICE or INFO then we can overwrite it
>> (either downgrade or upgrade). That's one rule, basically. Not too
>> complicated, I guess.
> 
> Can be taken even a bit further than
> 
>   show_stack(NULL, NULL, LOGLEVEL_DEBUG);
> or
>   show_stack(NULL, NULL, LOGLEVEL_ERR);
> 
> For instance,
> 
>   spin_lock_irqsave(>lock, flags);
>   printk_emergency_enter(LOGLEVEL_SCHED);
>   ...
>   show_stack(...);
>   printk();
>   printk();
>   ...
>   spin_unlock_irqrestore(>lock, flags);
> 
> or
> 
>   spin_lock_irqsave(_port->lock, flags);
>   printk_emergency_enter(LOGLEVEL_SCHED);
>   ...
>   printk();
>   printk();
>   ...

Yeah, that makes sense.

I believe it's up to you, Petr and Steven, to decide what's better for
printk(). I mean, after all, you care for all this code.

I guess my point was that, in my view, the price of one-time testing
is cheaper than adding a new printk feature to swap log levels on
the fly. But again, it's just how I see it - if we fix those printing
problems, then in half a year we will forget they ever existed, whereas
with your proposal there will still be some clever printk code.

On the other side, it's also worth noting that the current patch set fixes
problems for kdb (and for my hung task printing patch), but it's
incomplete for the sysrq driver. I've gone through the functions used by
the sysrq driver, and the same changes introducing a log level parameter
would be needed for: sched_show_task(), debug_show_all_locks(), show_regs(),
show_state(), show_mem(). Some of them don't need any platform changes,
but at least show_regs() does.
So you also need to keep that in mind when making the decision.

Thanks,
  Dmitry


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-11 Thread Dmitry Safonov
Hi Sergey,

On 11/12/19 2:17 AM, Sergey Senozhatsky wrote:
> On (19/11/11 19:47), Dmitry Safonov wrote:
[..]
>> What I'm going to do - is to fix all build and reported issues, I'll
>> send v2 this week and feel free to NAK it, I will forget about those
>> patches and won't be offended.
> 
> Lovely.
> And - no, I'm not going to NAK platform specific changes. Just so you know.
> 
> *All* I'm talking about is an alternative, less "go and touch a ton of
> platform code" approach. The argument "I patched so many files that I'm
> not even going to discuss anything now" is not productive, to say the
> least. Hope this clarifies.

It probably was a wrong impression on both sides.
My impression was "You touch every architecture - we won't even consider
that". Sorry for the wrong impression from my side - I'm open to
discussion.

In my view, the cost of one-time [mostly build] testing on every
architecture is cheaper than introducing some new smart code that will
live forever. Though, again, you and Petr understand the printk() code
better than I do, so I'm not insisting on anything.

I'll reply to your suggestion tomorrow, it's a bit late in my tz.

Thanks,
  Dmitry


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-11 Thread Dmitry Safonov
Hi Sergey, Petr,

On 11/11/19 1:23 AM, Sergey Senozhatsky wrote:
> On (19/11/08 14:04), Petr Mladek wrote:
> [..]
>> I agree that it is complicated to pass the loglevel as
>> a parameter. It would be better define the default
>> log level for a given code section. It might be stored
>> in task_struct for the normal context and in per-CPU
>> variables for interrupt contexts.
> 
> I do recall that we talked about per-CPU printk state bit which would
> start/end "just print it" section. We probably can extend it to "just
> log_store" type of functionality. Doesn't look like a very bad idea.
> "This task/context is in trouble, whatever it printk()-s is important".

I don't see how bits on task_struct or in per-cpu variables are easier than
supplying a log level parameter down the stack.
How would it work if sysrq_handle_crash() is called by a key press?
How would that interact with deferred printing?
How would it make prints from the current context visible, but not those
from something that preempted it?

Furthermore, what problems are you trying to solve with this design?
Only the sysrq driver? Kdb?
From my perspective it sounds too complicated and over-engineered for
something that has two or three users.
Also, I've tried to point out that I sometimes need to print a backtrace
with KERN_DEBUG loglevel to use it in production for early notices that
need to go only to log files, and currently each architecture decides
which log level it prefers.

And what's so complicated about this patch set? I only see the testing
side, but the build testing is covered by the 0day bot, costs nothing,
and any visible regression may be found during the -next period.
Introducing those printk sections, on the other hand, may subtly break things.

I mean, I definitely know less about printk() and its internals than you
- so my points may make no sense.

What I'm going to do is fix all build and reported issues; I'll
send v2 this week, and feel free to NAK it - I will forget about those
patches and won't be offended.
I don't see many people saying "hey, we'll benefit from this".
And doing this patch set was neither much fun (dirty business), nor
something I can later be proud of (hey, I've introduced the log level
parameter to printing functions!).

Thanks,
  Dima


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-08 Thread Dmitry Safonov


On 11/8/19 5:30 PM, Russell King - ARM Linux admin wrote:
> On Fri, Nov 08, 2019 at 04:28:30PM +0000, Dmitry Safonov wrote:
[..]
>>
>> Well, the use-case for lower log-level is that everything goes into logs
>> (/var/log/dmesg or /var/log/messages whatever rsyslog has settting).
>>
>> That has it's value:
>> - after a failure (i.e. panic) messages, those were only signs that
>> something goes wrong can be seen in logs which can give ideas what has
>> happened.
> 
> No they don't.  When the kernel panics, userspace generally stops
> running, so rsyslog won't be able to write them to /var/log/messages.
> 
> How, by "kernel panics" I mean a real kernel panic, which probably
> isn't what you're talking about there.  You are probably talking
> about the whole shebang of non-fatal kernel oops, kernel warnings
> and the like.  If so, I'd ask you to stop confuzzilating terminology.
> 
> If you really want to capture such events, then you need to have the
> kernel write the panic to (e.g.) flash - see the mtdoops driver.

I was talking about things prior to the panic: OOMs, MMC write/read
warnings, hung tasks; we also have local patches that produce a warning if
a mutex is being held for too long or a task is starved of CPU time by
hard/soft irqs (I hope I will design something like that for
upstream). I've found those warnings useful to:
(a) have an early message when things are starting to go bad.
(b) analyze contention, a scale too large for a box, or faulty hardware
for non-reproducible issues just from the logs.

We use kexec to save the dmesg ring buffer content after the panic.

Thanks,
  Dmitry


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-08 Thread Dmitry Safonov
On 11/6/19 8:34 PM, Peter Zijlstra wrote:
> On Wed, Nov 06, 2019 at 04:27:33PM +0000, Dmitry Safonov wrote:
[..]
>> Sorry, I should have tried to describe better.
>>
>> I'm trying to remove external users of console_loglevel by following
>> reasons:
>
> I suppose since all my machines have 'debug ignore_loglevel
> earlyprintk=serial,ttyS0,115200 console=ttyS0,115200' I don't have this
> experience.

Yeah, I remember you avoid all those functionalities of printk(), fair
enough. On the other side, regular users, and I'm betting most of
the non-tuned distributions, use /proc/sys/kernel/printk by default.
(Checking on my Arch & Fedora: loglevel 4 out of the box)

>> - changing console_loglevel on SMP means that unwanted messages from
>> other CPUs will appear (that have lower log level)
>> - on UMP unwanted messages may appear if the code is preempted while it
>> hasn't set the console_loglevel back to old
>> - rising console_loglevel to print wanted message(s) may not work at all
>> if printk() has being delayed and the console_loglevel is already set
>> back to old value
>
> Sure, frobbing the global console_loglevel is bad.
>
>> I also have patches in wip those needs to print backtrace with specific
>> loglevel (higher when it's critical, lower when it's notice and
>> shouldn't go to serial console).
>
> (everything always should go to serial, serial is awesome :-)

Personally I agree. Unfortunately, here @Arista there are switches (I'm
speaking about the order of thousands at least) that have a baud rate of
9600. It's a bit expensive to be verbose with such a setup.

>> Besides on local tests I see hits those have headers (messages like
>> "Backtrace: ") without an actual backtrace and the reverse - a backtrace
>> without a reason for it. It's quite annoying and worth addressing by
>> syncing headers log levels to backtraces.
>
> I suppose I'm surprised there are backtraces that are not important.
> Either badness happened and it needs printing, or the user asked for it
> and it needs printing.
>
> Perhaps we should be removing backtraces if they're not important
> instead of allowing to print them as lower loglevels?

Well, the use case for a lower log level is that everything goes into the
logs (/var/log/dmesg or /var/log/messages, whatever rsyslog has set up).

That has its value:
- after a failure (i.e. panic), the messages that were the only signs that
something was going wrong can be seen in the logs, which can give ideas
about what happened.
- before the failure, those messages go to telemetry and can be
auto-magically processed remotely to make a decision (e.g. balance the
traffic away).

I wish all the information could be gathered in userspace, but many
existing kernel/driver messages are valuable. A more detailed example is
the hung task detector: we want to have the backtrace for a task that has
been in uninterruptible state for too long, but only in the logs, as
printing on the serial console may lead to holding the console lock and
tripping a watchdog. Besides customer notifications and investigations,
I see the value of such "bottleneck" warnings during long-running
integration tests.

Thanks,
  Dmitry


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-06 Thread Dmitry Safonov
Hi Peter,

On 11/6/19 9:20 AM, Peter Zijlstra wrote:
> On Wed, Nov 06, 2019 at 03:04:51AM +0000, Dmitry Safonov wrote:
>> Add log level argument to show_stack().
>> Done in three stages:
>> 1. Introducing show_stack_loglvl() for every architecture
>> 2. Migrating old users with an explicit log level
>> 3. Renaming show_stack_loglvl() into show_stack()
>>
>> Justification:
>> o It's a design mistake to move a business-logic decision
>>   into platform realization detail.
>> o I have currently two patches sets that would benefit from this work:
>>   Removing console_loglevel jumps in sysrq driver [1]
>>   Hung task warning before panic [2] - suggested by Tetsuo (but he
>>   probably didn't realise what it would involve).
>> o While doing (1), (2) the backtraces were adjusted to headers
>>   and other messages for each situation - so there won't be a situation
>>   when the backtrace is printed, but the headers are missing because
>>   they have lesser log level (or the reverse).
>> o As the result in (2) plays with console_loglevel for kdb are removed.
> 
> I really don't understand that word salad. Why are you doing this?
> 

Sorry, I should have tried to describe it better.

I'm trying to remove external users of console_loglevel for the following
reasons:
- changing console_loglevel on SMP means that unwanted messages (those
with a lower log level) from other CPUs will appear
- on UP unwanted messages may appear if the code is preempted while it
hasn't yet set console_loglevel back to the old value
- raising console_loglevel to print the wanted message(s) may not work at
all if printk() has been deferred and console_loglevel is already set
back to the old value

The sysrq driver changes console_loglevel because it needs to print a
message with a specific log level (the code says userspace relies on
this), as sketched below.
Kdb changes console_loglevel to print a backtrace, as the backtrace's log
level depends on the architecture's realisation.
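
For illustration, the pattern I want to get rid of looks roughly like this
(a simplified sketch of what the sysrq handler does around its header
line today):

	int orig_level = console_loglevel;

	console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;	/* raise it so the header reaches the console */
	pr_info("sysrq: %s\n", op_p->action_msg);	/* stands in for the real header print */
	console_loglevel = orig_level;			/* racy wrt other CPUs and deferred printk */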

I also have WIP patches that need to print a backtrace with a specific
loglevel (higher when it's critical, lower when it's a notice and
shouldn't go to the serial console).

Besides, in local tests I see hits that have headers (messages like
"Backtrace: ") without an actual backtrace, and the reverse - a backtrace
without a reason for it. It's quite annoying and worth addressing by
syncing the headers' log levels with the backtraces'.

Thanks,
  Dmitry


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-06 Thread Dmitry Safonov
On 11/6/19 8:35 AM, Petr Mladek wrote:
> On Wed 2019-11-06 03:04:51, Dmitry Safonov wrote:
>> Add log level argument to show_stack().
>> Done in three stages:
>> 1. Introducing show_stack_loglvl() for every architecture
>> 2. Migrating old users with an explicit log level
>> 3. Renaming show_stack_loglvl() into show_stack()
>>
>> Justification:
>> o It's a design mistake to move a business-logic decision
>>   into platform realization detail.
>> o I have currently two patches sets that would benefit from this work:
>>   Removing console_loglevel jumps in sysrq driver [1]
> 
> Just to clarify. The problem in sysrq driver is a bit different.
> It modifies console_loglevel to show even less important message
> on the console.
> 
> IMHO, it should be solved by printing the header line with pr_error().
> It is not ideal. A cleaner solution might be to introduce another
> loglevel that will always get pushed to the console. But I am
> not sure if it is worth this single line.

I believe the reason it's not done that way is a comment in the sysrq code
saying that userspace relies on the loglevel of sysrq messages (and may
trigger alerts on the emerg/err log levels):
  * Raise the apparent loglevel to maximum so that the sysrq header
  * is shown to provide the user with positive feedback.  We do not
  * simply emit this at KERN_EMERG as that would change message
  * routing in the consumers of /proc/kmsg.

But I don't mind any solution.

>>   Hung task warning before panic [2] - suggested by Tetsuo (but he
>>   probably didn't realise what it would involve).
>> o While doing (1), (2) the backtraces were adjusted to headers
>>   and other messages for each situation - so there won't be a situation
>>   when the backtrace is printed, but the headers are missing because
>>   they have lesser log level (or the reverse).
>> o As the result in (2) plays with console_loglevel for kdb are removed.
> 
>> The least important for upstream, but maybe still worth to note that
>> every company I've worked in so far had an off-list patch to print
>> backtrace with the needed log level (but only for the architecture they
>> cared about).
>> If you have other ideas how you will benefit from show_stack() with
>> a log level - please, reply to this cover letter.
> 
> I agree with all the other justification.
> 
> I would add. The backtrace is really useful for debugging. It should
> be possible to print it even in less critical situations.
> 
> I am afraid that many people use WARN() for this purpose. But WARN()
> is not always appropriate. WARN() misuse huts when panic_on_warn
> option is used.

Thanks, Petr.

-- 
  Dmitry


Re: [PATCH 26/50] powerpc: Add show_stack_loglvl()

2019-11-06 Thread Dmitry Safonov
On 11/6/19 9:52 AM, Michael Ellerman wrote:
> Dmitry Safonov  writes:
>> Currently, the log-level of show_stack() depends on a platform
>> realization. It creates situations where the headers are printed with
>> lower log level or higher than the stacktrace (depending on
>> a platform or user).
> 
> Yes, I've had bug reports where the stacktrace is missing, which is
> annoying. Thanks for trying to fix the problem.
> 
>> Furthermore, it forces the logic decision from user to an architecture
>> side. In result, some users as sysrq/kdb/etc are doing tricks with
>> temporary rising console_loglevel while printing their messages.
>> And in result it not only may print unwanted messages from other CPUs,
>> but also omit printing at all in the unlucky case where the printk()
>> was deferred.
>>
>> Introducing log-level parameter and KERN_UNSUPPRESSED [1] seems
>> an easier approach than introducing more printk buffers.
>> Also, it will consolidate printings with headers.
>>
>> Introduce show_stack_loglvl(), that eventually will substitute
>> show_stack().
>>
>> Cc: Benjamin Herrenschmidt 
>> Cc: Michael Ellerman 
>> Cc: Paul Mackerras 
>> Cc: linuxppc-dev@lists.ozlabs.org
>> [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
>> Signed-off-by: Dmitry Safonov 
>> ---
>>  arch/powerpc/kernel/process.c | 18 +-
>>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> This looks good to me.
> 
> Acked-by: Michael Ellerman  (powerpc)

Thanks for the review and time!

-- 
  Dmitry


[PATCH 26/50] powerpc: Add show_stack_loglvl()

2019-11-05 Thread Dmitry Safonov
Currently, the log level of show_stack() depends on the platform's
realization. This creates situations where the headers are printed with
a lower or higher log level than the stacktrace itself (depending on
the platform or the caller).

Furthermore, it forces a logic decision from the caller onto the
architecture side. As a result, some users such as sysrq/kdb/etc. resort
to tricks with temporarily raising console_loglevel while printing their
messages. And as a result it not only may print unwanted messages from
other CPUs, but may also omit printing entirely in the unlucky case where
the printk() was deferred.

Introducing a log-level parameter and KERN_UNSUPPRESSED [1] seems an
easier approach than introducing more printk buffers.
It also keeps the headers' log levels consistent with the backtraces.

Introduce show_stack_loglvl(), which will eventually replace
show_stack().

Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/process.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 639ceae7da9d..34b46680a196 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2028,7 +2028,8 @@ unsigned long get_wchan(struct task_struct *p)
 
 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
-void show_stack(struct task_struct *tsk, unsigned long *stack)
+void show_stack_loglvl(struct task_struct *tsk, unsigned long *stack,
+  const char *loglvl)
 {
unsigned long sp, ip, lr, newsp;
int count = 0;
@@ -2053,7 +2054,7 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
}
 
lr = 0;
-   printk("Call Trace:\n");
+   printk("%sCall Trace:\n", loglvl);
do {
if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
break;
@@ -2062,7 +2063,8 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
newsp = stack[0];
ip = stack[STACK_FRAME_LR_SAVE];
if (!firstframe || ip != lr) {
-   printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+   printk("%s["REG"] ["REG"] %pS",
+   loglvl, sp, ip, (void *)ip);
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
ret_addr = ftrace_graph_ret_addr(current,
_idx, ip, stack);
@@ -2084,8 +2086,9 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
struct pt_regs *regs = (struct pt_regs *)
(sp + STACK_FRAME_OVERHEAD);
lr = regs->link;
-   printk("--- interrupt: %lx at %pS\nLR = %pS\n",
-  regs->trap, (void *)regs->nip, (void *)lr);
+   printk("%s--- interrupt: %lx at %pS\nLR = %pS\n",
+  loglvl, regs->trap,
+  (void *)regs->nip, (void *)lr);
firstframe = 1;
}
 
@@ -2095,6 +2098,11 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
put_task_stack(tsk);
 }
 
+void show_stack(struct task_struct *tsk, unsigned long *stack)
+{
+   show_stack_loglvl(tsk, stack, KERN_DEFAULT);
+}
+
 #ifdef CONFIG_PPC64
 /* Called with hard IRQs off */
 void notrace __ppc64_runlatch_on(void)
-- 
2.23.0



[PATCH 00/50] Add log level to show_stack()

2019-11-05 Thread Dmitry Safonov
Add log level argument to show_stack().
Done in three stages:
1. Introducing show_stack_loglvl() for every architecture
2. Migrating old users with an explicit log level
3. Renaming show_stack_loglvl() into show_stack()

Justification:
o It's a design mistake to move a business-logic decision
  into a platform realization detail.
o I currently have two patch sets that would benefit from this work:
  Removing console_loglevel jumps in the sysrq driver [1]
  Hung task warning before panic [2] - suggested by Tetsuo (but he
  probably didn't realise what it would involve).
o While doing (1) and (2), the backtraces were adjusted to match the
  headers and other messages for each situation - so there won't be a
  situation where the backtrace is printed but the headers are missing
  because they have a lower log level (or the reverse).
o As a result, in (2) the games with console_loglevel for kdb are removed.

Least important for upstream, but maybe still worth noting: every
company I've worked at so far has had an off-list patch to print a
backtrace with the needed log level (but only for the architecture they
cared about).
If you have other ideas on how you would benefit from show_stack() with
a log level - please reply to this cover letter.

Ok, to the scary part: I've tested it on x86_64 and build-tested on a
couple of architectures. Though I can't cover all platforms, so I hope
I'll get a couple of reports and then it'll soak in linux-next
for some time. In my opinion the variety of architectures shouldn't
stop general improvements.

Cc: Andrew Morton 
Cc: Greg Kroah-Hartman 
Cc: Ingo Molnar 
Cc: Jiri Slaby 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: Tetsuo Handa 

Thanks,
Dmitry

[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-d...@arista.com/T/#u
[2]: 
https://lkml.kernel.org/r/41fd7652-df1f-26f6-aba0-b87ebae07...@i-love.sakura.ne.jp

Dmitry Safonov (50):
  kallsyms/printk: Add loglvl to print_ip_sym()
  alpha: Add show_stack_loglvl()
  arc: Add show_stack_loglvl()
  arm/asm: Add loglvl to c_backtrace()
  arm: Add loglvl to unwind_backtrace()
  arm: Add loglvl to dump_backtrace()
  arm: Wire up dump_backtrace_{entry,stm}
  arm: Add show_stack_loglvl()
  arm64: Add loglvl to dump_backtrace()
  arm64: Add show_stack_loglvl()
  c6x: Add show_stack_loglvl()
  csky: Add show_stack_loglvl()
  h8300: Add show_stack_loglvl()
  hexagon: Add show_stack_loglvl()
  ia64: Pass log level as arg into ia64_do_show_stack()
  ia64: Add show_stack_loglvl()
  m68k: Add show_stack_loglvl()
  microblaze: Add loglvl to microblaze_unwind_inner()
  microblaze: Add loglvl to microblaze_unwind()
  microblaze: Add show_stack_loglvl()
  mips: Add show_stack_loglvl()
  nds32: Add show_stack_loglvl()
  nios2: Add show_stack_loglvl()
  openrisc: Add show_stack_loglvl()
  parisc: Add show_stack_loglvl()
  powerpc: Add show_stack_loglvl()
  riscv: Add show_stack_loglvl()
  s390: Add show_stack_loglvl()
  sh: Add loglvl to dump_mem()
  sh: Remove needless printk()
  sh: Add loglvl to printk_address()
  sh: Add loglvl to show_trace()
  sh: Add show_stack_loglvl()
  sparc: Add show_stack_loglvl()
  um/sysrq: Remove needless variable sp
  um: Add show_stack_loglvl()
  unicore32: Remove unused pmode argument in c_backtrace()
  unicore32: Add loglvl to c_backtrace()
  unicore32: Add show_stack_loglvl()
  x86: Add missing const qualifiers for log_lvl
  x86: Add show_stack_loglvl()
  xtensa: Add loglvl to show_trace()
  xtensa: Add show_stack_loglvl()
  sysrq: Use show_stack_loglvl()
  x86/amd_gart: Print stacktrace for a leak with KERN_ERR
  power: Use show_stack_loglvl()
  kdb: Don't play with console_loglevel
  sched: Print stack trace with KERN_INFO
  kernel: Use show_stack_loglvl()
  kernel: Rename show_stack_loglvl() => show_stack()

 arch/alpha/kernel/traps.c| 22 +++
 arch/arc/include/asm/bug.h   |  3 ++-
 arch/arc/kernel/stacktrace.c | 17 +++-
 arch/arc/kernel/troubleshoot.c   |  2 +-
 arch/arm/include/asm/bug.h   |  3 ++-
 arch/arm/include/asm/traps.h |  3 ++-
 arch/arm/include/asm/unwind.h|  3 ++-
 arch/arm/kernel/traps.c  | 40 
 arch/arm/kernel/unwind.c |  7 ++---
 arch/arm/lib/backtrace-clang.S   |  9 +--
 arch/arm/lib/backtrace.S | 14 +++---
 arch/arm64/include/asm/stacktrace.h  |  3 ++-
 arch/arm64/kernel/process.c  |  2 +-
 arch/arm64/kernel/traps.c| 19 ++---
 arch/c6x/kernel/traps.c  | 18 +++--
 arch/csky/kernel/dumpstack.c |  9 ---
 arch/csky/kernel/ptrace.c|  4 +--
 arch/h8300/kernel/traps.c| 12 -
 arch/hexagon/kernel/traps.c  | 25 -
 arch/ia64/include/asm/ptrace.h   |  1 -
 arch/ia64/kernel/mca.c   |  2 +-
 arch/ia64/kernel/process.c   | 17 ++--
 arch/m68k/kernel/traps.c | 13 -
 arch/microblaze/incl

[RFC 1/4] mm: remove unused TASK_SIZE_OF()

2016-12-30 Thread Dmitry Safonov
All users of TASK_SIZE_OF(tsk) have migrated to mm->task_size or
TASK_SIZE_MAX since:
commit d696ca016d57 ("x86/fsgsbase/64: Use TASK_SIZE_MAX for
FSBASE/GSBASE upper limits"),
commit a06db751c321 ("pagemap: check permissions and capabilities at
open time"),

Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Ralf Baechle <r...@linux-mips.org>
Cc: "James E.J. Bottomley" <j...@parisc-linux.org>
Cc: Helge Deller <del...@gmx.de>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Martin Schwidefsky <schwidef...@de.ibm.com>
Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/arm64/include/asm/memory.h   | 2 --
 arch/mips/include/asm/processor.h | 3 ---
 arch/parisc/include/asm/processor.h   | 3 +--
 arch/powerpc/include/asm/processor.h  | 3 +--
 arch/s390/include/asm/processor.h | 3 +--
 arch/sparc/include/asm/processor_64.h | 3 ---
 arch/x86/include/asm/processor.h  | 2 --
 include/linux/sched.h | 4 
 8 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index bfe632808d77..329bb4fd543c 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -80,8 +80,6 @@
 #define TASK_SIZE_32   UL(0x1)
 #define TASK_SIZE  (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_32 : TASK_SIZE_64)
-#define TASK_SIZE_OF(tsk)  (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
-   TASK_SIZE_32 : TASK_SIZE_64)
 #else
 #define TASK_SIZE  TASK_SIZE_64
 #endif /* CONFIG_COMPAT */
diff --git a/arch/mips/include/asm/processor.h 
b/arch/mips/include/asm/processor.h
index 95b8c471f572..c2827a5507d4 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -73,9 +73,6 @@ extern unsigned int vced_count, vcei_count;
 #define TASK_SIZE (test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : 
TASK_SIZE64)
 #define STACK_TOP_MAX  TASK_SIZE64
 
-#define TASK_SIZE_OF(tsk)  \
-   (test_tsk_thread_flag(tsk, TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE64)
-
 #define TASK_IS_32BIT_ADDR test_thread_flag(TIF_32BIT_ADDR)
 
 #endif
diff --git a/arch/parisc/include/asm/processor.h 
b/arch/parisc/include/asm/processor.h
index a3661ee6b060..8b51ddae8e4a 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -32,8 +32,7 @@
 
 #define HAVE_ARCH_PICK_MMAP_LAYOUT
 
-#define TASK_SIZE_OF(tsk)   ((tsk)->thread.task_size)
-#define TASK_SIZE  TASK_SIZE_OF(current)
+#define TASK_SIZE  (current->thread.task_size)
 #define TASK_UNMAPPED_BASE  (current->thread.map_base)
 
 #define DEFAULT_TASK_SIZE32(0xFFF0UL)
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 1ba814436c73..04e575ead590 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -111,9 +111,8 @@ void release_thread(struct task_struct *);
  */
 #define TASK_SIZE_USER32 (0x0001UL - (1*PAGE_SIZE))
 
-#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
+#define TASK_SIZE  (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_USER32 : TASK_SIZE_USER64)
-#define TASK_SIZETASK_SIZE_OF(current)
 
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
diff --git a/arch/s390/include/asm/processor.h 
b/arch/s390/include/asm/processor.h
index 6bca916a5ba0..c53e8e2a51ac 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -89,10 +89,9 @@ extern void execve_tail(void);
  * User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit.
  */
 
-#define TASK_SIZE_OF(tsk)  ((tsk)->mm->context.asce_limit)
 #define TASK_UNMAPPED_BASE (test_thread_flag(TIF_31BIT) ? \
(1UL << 30) : (1UL << 41))
-#define TASK_SIZE  TASK_SIZE_OF(current)
+#define TASK_SIZE  (current->mm->context.asce_limit)
 #define TASK_MAX_SIZE  (1UL << 53)
 
 #define STACK_TOP  (1UL << (test_thread_flag(TIF_31BIT) ? 31:42))
diff --git a/arch/sparc/include/asm/processor_64.h 
b/arch/sparc/include/asm/processor_64.h
index 6448cfc8292f..6ce1a75d7a24 100644
--- a/arch/sparc

[RFC 0/4] x86: keep TASK_SIZE in sync with mm->task_size

2016-12-30 Thread Dmitry Safonov
At this moment, we have the following task_size-related things:
- the TASK_SIZE_OF() macro, which is unused;
- current->mm->task_size, which is used in half of the code, and the
  TASK_SIZE macro, which is used in the other half;
- TIF_ADDR32, which is used to detect a 32-bit address space and is
  x86-specific, while some other arches misuse TIF_32BIT;
- the ADDR_LIMIT_32BIT personality, which is used on arm/alpha;
- ADDR_LIMIT_3GB, which is x86-specific and can be used to switch a
  running task's TASK_SIZE between 3GB and 4GB.
This patch set removes the unused definition of TASK_SIZE_OF (1) and
defines the TASK_SIZE macro as current->mm->task_size (3).
I would suggest defining TASK_SIZE this way in a generic version,
but currently I have tested it only on x86.
It also frees a thread info flag (2) and adds an arch_prctl()
on x86_64 to get/set the current virtual address space size - as
it's needed for now only by CRIU, it is hidden under the
CHECKPOINT_RESTORE config.
I hope these patches help to clean up the task_size-related code
at least a bit (and help me to restore vaddr limits).
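
Concretely, the direction for x86 in (3) is simply (a sketch; the exact
ifdef-ery is in the patch itself, which is not quoted here):

#define TASK_SIZE	(current->mm->task_size)

similar in spirit to how parisc and s390 already define TASK_SIZE in terms
of per-task state (see patch 1).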

Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
Cc: x...@kernel.org

Dmitry Safonov (4):
  mm: remove unused TASK_SIZE_OF()
  x86/thread_info: kill TIF_ADDR32 in favour of ADDR_LIMIT_32BIT
  x86/mm: define TASK_SIZE as current->mm->task_size
  x86/arch_prctl: add ARCH_{GET,SET}_TASK_SIZE

 arch/arm64/include/asm/memory.h   |  2 --
 arch/mips/include/asm/processor.h |  3 ---
 arch/parisc/include/asm/processor.h   |  3 +--
 arch/powerpc/include/asm/processor.h  |  3 +--
 arch/s390/include/asm/processor.h |  3 +--
 arch/sparc/include/asm/processor_64.h |  3 ---
 arch/x86/include/asm/elf.h|  7 +--
 arch/x86/include/asm/processor.h  | 19 +--
 arch/x86/include/asm/thread_info.h|  4 +---
 arch/x86/include/uapi/asm/prctl.h |  3 +++
 arch/x86/kernel/process_64.c  | 17 +++--
 arch/x86/kernel/sys_x86_64.c  |  4 ++--
 arch/x86/um/asm/segment.h |  2 +-
 arch/x86/xen/mmu.c|  4 ++--
 fs/exec.c | 17 +++--
 include/linux/sched.h |  4 
 16 files changed, 52 insertions(+), 46 deletions(-)

-- 
2.11.0



Re: [PATCHv3 0/8] powerpc/mm: refactor vDSO mapping code

2016-11-08 Thread Dmitry Safonov

On 11/08/2016 02:57 AM, Michael Ellerman wrote:

Dmitry Safonov <0x7f454...@gmail.com> writes:


2016-10-27 20:09 GMT+03:00 Dmitry Safonov <dsafo...@virtuozzo.com>:

ping?


There's another series doing some similar changes:

http://www.spinics.net/lists/linux-mm/msg115860.html


Well, that version makes the arch_mremap hook more general by renaming
the vdso pointer, while this series erases that hook totally.
So, we've agreed that it would be better without this hook, but with a
generic version of the vdso_mremap special_mapping helper:
https://marc.info/?i=d1aa8bec-a53e-cd30-e66a-39bebb6a4...@codeaurora.org


And I don't like all the macro games in 3/8, eg:

+#ifndef BITS
+#define BITS 32
+#endif
+
+#undef Elf_Ehdr
+#undef Elf_Sym
+#undef Elf_Shdr
+
+#define _CONCAT3(a, b, c)  a ## b ## c
+#define CONCAT3(a, b, c)   _CONCAT3(a, b, c)
+#define Elf_Ehdr   CONCAT3(Elf,  BITS, _Ehdr)
+#define Elf_SymCONCAT3(Elf,  BITS, _Sym)
+#define Elf_Shdr   CONCAT3(Elf,  BITS, _Shdr)
+#define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE)
+#define vdso_kbase CONCAT3(vdso, BITS, _kbase)
+#define vdso_pages CONCAT3(vdso, BITS, _pages)
+
+#undef pr_fmt
+#define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt
+
+#define lib_elfinfo CONCAT3(lib, BITS, _elfinfo)
+
+#define find_section CONCAT3(find_section, BITS,)
+static void * __init find_section(Elf_Ehdr *ehdr, const char *secname,
+   unsigned long *size)


I'd rather we kept the duplication of code than the obfuscation those
macros add.

If we can come up with a way to share more of the code without having to
do all those tricks then I'd be interested.


Well, ok, I thought it was quite common even outside of tracing:
e.g., fs/compat_binfmt_elf.c does much the same trick.
But as you find it obscure - then ok, I will resend without that
common-vdso part.



cheers




--
 Dmitry


Re: [PATCHv3 1/8] powerpc/vdso: unify return paths in setup_additional_pages

2016-11-08 Thread Dmitry Safonov

On 11/08/2016 03:10 AM, Michael Ellerman wrote:

Hi Dmitry,

Thanks for the patches.

Dmitry Safonov <dsafo...@virtuozzo.com> writes:

Impact: cleanup


I'm not a fan of these "Impact" lines, especially when they're not
correct, ie. this is not a cleanup, a cleanup doesn't change logic.


Rename `rc' variable which doesn't seems to mean anything into
kernel-known `ret'.


'rc' means "Return Code", it's fairly common. I see at least ~8500
"int rc" declarations in the kernel.

Please don't rename variables and change logic in one patch.


Ok, right - I just didn't see `rc' as frequently as `ret'.
I will leave the name.


Combine two function returns into one as it's
also easier to read.

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4111d30badfa..4ffb82a2d9e9 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -154,7 +154,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
struct page **vdso_pagelist;
unsigned long vdso_pages;
unsigned long vdso_base;
-   int rc;
+   int ret = 0;


Please don't initialise return codes in the declaration, it prevents the
compiler from warning you if you forget to initialise it in a
particular path.

AFAICS you never even use the default value either.


Oh, right - I split this patch out of the conversion of
install_special_mapping() to the special-vma version
_install_special_mapping(), patch 6/8 in the series.
I will move the initialization to that patch.


if (!vdso_ready)
return 0;
@@ -203,8 +203,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
  ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
  0, 0);
if (IS_ERR_VALUE(vdso_base)) {
-   rc = vdso_base;
-   goto fail_mmapsem;
+   ret = vdso_base;
+   goto out_up_mmap_sem;
}

/* Add required alignment. */
@@ -227,21 +227,16 @@ int arch_setup_additional_pages(struct linux_binprm 
*bprm, int uses_interp)
 * It's fine to use that for setting breakpoints in the vDSO code
 * pages though.
 */
-   rc = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+   ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
 VM_READ|VM_EXEC|
 VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 vdso_pagelist);
-   if (rc) {
+   if (ret)
current->mm->context.vdso_base = 0;
-   goto fail_mmapsem;
-   }
-
-   up_write(>mmap_sem);
-   return 0;

- fail_mmapsem:
+out_up_mmap_sem:
up_write(>mmap_sem);
-   return rc;
+   return ret;
 }



If you strip out the variable renames then I think that change would be
OK.

cheers




--
 Dmitry


Re: [PATCHv3 0/8] powerpc/mm: refactor vDSO mapping code

2016-11-07 Thread Dmitry Safonov
2016-10-27 20:09 GMT+03:00 Dmitry Safonov <dsafo...@virtuozzo.com>:
> Changes since v1, v2:
> - use vdso64_pages only under CONFIG_PPC64 (32-bit build fix)
> - remove arch_vma_name helper as not needed anymore,
>   simplify vdso_base pointer initializing in map_vdso()
>
> Cleanup patches for vDSO on powerpc.
> Originally, I wanted to add vDSO remapping on arm/aarch64 and
> I decided to cleanup that part on powerpc.
> I've add a hook for vm_ops for vDSO just like I did for x86,
> which makes cross-arch arch_mremap hook no more needed.
> Other changes - reduce exhaustive code duplication by
> separating the common vdso code.
>
> No visible to userspace changes expected.
> Tested on qemu with buildroot rootfs.
>
> Dmitry Safonov (8):
>   powerpc/vdso: unify return paths in setup_additional_pages
>   powerpc/vdso: remove unused params in vdso_do_func_patch{32,64}
>   powerpc/vdso: separate common code in vdso_common
>   powerpc/vdso: introduce init_vdso{32,64}_pagelist
>   powerpc/vdso: split map_vdso from arch_setup_additional_pages
>   powerpc/vdso: switch from legacy_special_mapping_vmops
>   mm: kill arch_mremap
>   powerpc/vdso: remove arch_vma_name
>
>  arch/alpha/include/asm/Kbuild|   1 -
>  arch/arc/include/asm/Kbuild  |   1 -
>  arch/arm/include/asm/Kbuild  |   1 -
>  arch/arm64/include/asm/Kbuild|   1 -
>  arch/avr32/include/asm/Kbuild|   1 -
>  arch/blackfin/include/asm/Kbuild |   1 -
>  arch/c6x/include/asm/Kbuild  |   1 -
>  arch/cris/include/asm/Kbuild |   1 -
>  arch/frv/include/asm/Kbuild  |   1 -
>  arch/h8300/include/asm/Kbuild|   1 -
>  arch/hexagon/include/asm/Kbuild  |   1 -
>  arch/ia64/include/asm/Kbuild |   1 -
>  arch/m32r/include/asm/Kbuild |   1 -
>  arch/m68k/include/asm/Kbuild |   1 -
>  arch/metag/include/asm/Kbuild|   1 -
>  arch/microblaze/include/asm/Kbuild   |   1 -
>  arch/mips/include/asm/Kbuild |   1 -
>  arch/mn10300/include/asm/Kbuild  |   1 -
>  arch/nios2/include/asm/Kbuild|   1 -
>  arch/openrisc/include/asm/Kbuild |   1 -
>  arch/parisc/include/asm/Kbuild   |   1 -
>  arch/powerpc/include/asm/mm-arch-hooks.h |  28 --
>  arch/powerpc/kernel/vdso.c   | 502 
> +--
>  arch/powerpc/kernel/vdso_common.c| 248 +++
>  arch/s390/include/asm/Kbuild |   1 -
>  arch/score/include/asm/Kbuild|   1 -
>  arch/sh/include/asm/Kbuild   |   1 -
>  arch/sparc/include/asm/Kbuild|   1 -
>  arch/tile/include/asm/Kbuild |   1 -
>  arch/um/include/asm/Kbuild   |   1 -
>  arch/unicore32/include/asm/Kbuild|   1 -
>  arch/x86/include/asm/Kbuild  |   1 -
>  arch/xtensa/include/asm/Kbuild   |   1 -
>  include/asm-generic/mm-arch-hooks.h  |  16 -
>  include/linux/mm-arch-hooks.h|  25 --
>  mm/mremap.c  |   4 -
>  36 files changed, 324 insertions(+), 529 deletions(-)
>  delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
>  create mode 100644 arch/powerpc/kernel/vdso_common.c
>  delete mode 100644 include/asm-generic/mm-arch-hooks.h
>  delete mode 100644 include/linux/mm-arch-hooks.h

ping?


Re: [PATCH 0/7] powerpc/mm: refactor vDSO mapping code

2016-10-27 Thread Dmitry Safonov
2016-10-25 18:50 GMT+03:00 Dmitry Safonov <dsafo...@virtuozzo.com>:
> Cleanup patches for vDSO on powerpc.
> Originally, I wanted to add vDSO remapping on arm/aarch64 and
> I decided to cleanup that part on powerpc.
> I've add a hook for vm_ops for vDSO just like I did for x86.
> Other changes - reduce exhaustive code duplication.
> No visible to userspace changes expected.
>
> Tested on qemu with buildroot rootfs.
>
> Dmitry Safonov (7):
>   powerpc/vdso: unify return paths in setup_additional_pages
>   powerpc/vdso: remove unused params in vdso_do_func_patch{32,64}
>   powerpc/vdso: separate common code in vdso_common
>   powerpc/vdso: introduce init_vdso{32,64}_pagelist
>   powerpc/vdso: split map_vdso from arch_setup_additional_pages
>   powerpc/vdso: switch from legacy_special_mapping_vmops
>   mm: kill arch_mremap

Ignore this version, please - I've just sent v3 with some new fixes.


[PATCHv3 7/8] mm: kill arch_mremap

2016-10-27 Thread Dmitry Safonov
This reverts commit 4abad2ca4a4d ("mm: new arch_remap() hook") and
commit 2ae416b142b6 ("mm: new mm hook framework").
It keeps the same functionality of mremapping the vDSO blob by
introducing a vm_special_mapping mremap op for powerpc,
the same way it's handled on x86.

Cc: Laurent Dufour <lduf...@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/alpha/include/asm/Kbuild|  1 -
 arch/arc/include/asm/Kbuild  |  1 -
 arch/arm/include/asm/Kbuild  |  1 -
 arch/arm64/include/asm/Kbuild|  1 -
 arch/avr32/include/asm/Kbuild|  1 -
 arch/blackfin/include/asm/Kbuild |  1 -
 arch/c6x/include/asm/Kbuild  |  1 -
 arch/cris/include/asm/Kbuild |  1 -
 arch/frv/include/asm/Kbuild  |  1 -
 arch/h8300/include/asm/Kbuild|  1 -
 arch/hexagon/include/asm/Kbuild  |  1 -
 arch/ia64/include/asm/Kbuild |  1 -
 arch/m32r/include/asm/Kbuild |  1 -
 arch/m68k/include/asm/Kbuild |  1 -
 arch/metag/include/asm/Kbuild|  1 -
 arch/microblaze/include/asm/Kbuild   |  1 -
 arch/mips/include/asm/Kbuild |  1 -
 arch/mn10300/include/asm/Kbuild  |  1 -
 arch/nios2/include/asm/Kbuild|  1 -
 arch/openrisc/include/asm/Kbuild |  1 -
 arch/parisc/include/asm/Kbuild   |  1 -
 arch/powerpc/include/asm/mm-arch-hooks.h | 28 
 arch/powerpc/kernel/vdso.c   | 25 +
 arch/powerpc/kernel/vdso_common.c|  1 +
 arch/s390/include/asm/Kbuild |  1 -
 arch/score/include/asm/Kbuild|  1 -
 arch/sh/include/asm/Kbuild   |  1 -
 arch/sparc/include/asm/Kbuild|  1 -
 arch/tile/include/asm/Kbuild |  1 -
 arch/um/include/asm/Kbuild   |  1 -
 arch/unicore32/include/asm/Kbuild|  1 -
 arch/x86/include/asm/Kbuild  |  1 -
 arch/xtensa/include/asm/Kbuild   |  1 -
 include/asm-generic/mm-arch-hooks.h  | 16 
 include/linux/mm-arch-hooks.h| 25 -
 mm/mremap.c  |  4 
 36 files changed, 26 insertions(+), 103 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index bf8475ce85ee..0a5e0ec2842b 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -6,7 +6,6 @@ generic-y += exec.h
 generic-y += export.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += preempt.h
 generic-y += sections.h
 generic-y += trace_clock.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index c332604606dd..e6059a808463 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -22,7 +22,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 0745538b26d3..44b717cb4a55 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -15,7 +15,6 @@ generic-y += irq_regs.h
 generic-y += kdebug.h
 generic-y += local.h
 generic-y += local64.h
-generic-y += mm-arch-hooks.h
 generic-y += msgbuf.h
 generic-y += msi.h
 generic-y += param.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 44e1d7f10add..a42a1367aea4 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -20,7 +20,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild
index 241b9b9729d8..519810d0d5e1 100644
--- a/arch/avr32/include/asm/Kbuild
+++ b/arch/avr32/include/asm/Kbuild
@@ -12,7 +12,6 @@ generic-y += irq_work.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild
index

[PATCHv3 8/8] powerpc/vdso: remove arch_vma_name

2016-10-27 Thread Dmitry Safonov
It's not needed since the vDSO is inserted with a vm_special_mapping,
which contains the vma name.
This also reverts commit f2053f1a7bf6 ("powerpc/perf_counter: Fix vdso
detection") as it is not needed anymore.
See also commit f7b6eb3fa072 ("x86: Set context.vdso before installing
the mapping").

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 20 +++-
 1 file changed, 3 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 431bdf7ec68e..f66f52aa94de 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -208,13 +208,6 @@ static int map_vdso(struct vm_special_mapping *vsm, 
unsigned long vdso_pages,
vdso_base = ALIGN(vdso_base, VDSO_ALIGNMENT);
 
/*
-* Put vDSO base into mm struct. We need to do this before calling
-* install_special_mapping or the perf counter mmap tracking code
-* will fail to recognise it as a vDSO (since arch_vma_name fails).
-*/
-   current->mm->context.vdso_base = vdso_base;
-
-   /*
 * our vma flags don't have VM_WRITE so by default, the process isn't
 * allowed to write those pages.
 * gdb can break that with ptrace interface, and thus trigger COW on
@@ -228,10 +221,10 @@ static int map_vdso(struct vm_special_mapping *vsm, 
unsigned long vdso_pages,
 VM_READ|VM_EXEC|
 VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 vsm);
-   if (IS_ERR(vma)) {
+   if (IS_ERR(vma))
ret = PTR_ERR(vma);
-   current->mm->context.vdso_base = 0;
-   }
+   else
+   current->mm->context.vdso_base = vdso_base;
 
 out_up_mmap_sem:
up_write(>mmap_sem);
@@ -262,13 +255,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
return -1;
 }
 
-const char *arch_vma_name(struct vm_area_struct *vma)
-{
-   if (vma->vm_mm && vma->vm_start == vma->vm_mm->context.vdso_base)
-   return "[vdso]";
-   return NULL;
-}
-
 #ifdef CONFIG_VDSO32
 #include "vdso_common.c"
 #endif /* CONFIG_VDSO32 */
-- 
2.10.1



[PATCHv3 2/8] powerpc/vdso: remove unused params in vdso_do_func_patch{32, 64}

2016-10-27 Thread Dmitry Safonov
Impact: cleanup

vdso_do_func_patch{32,64} only use the {v32,v64} parameter, respectively.
Remove the unneeded parameters.

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4ffb82a2d9e9..278b9aa25a1c 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -309,7 +309,6 @@ static unsigned long __init find_function32(struct 
lib32_elfinfo *lib,
 }
 
 static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64,
   const char *orig, const char *fix)
 {
Elf32_Sym *sym32_gen, *sym32_fix;
@@ -344,7 +343,6 @@ static unsigned long __init find_function32(struct 
lib32_elfinfo *lib,
 }
 
 static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64,
   const char *orig, const char *fix)
 {
return 0;
@@ -419,8 +417,7 @@ static unsigned long __init find_function64(struct 
lib64_elfinfo *lib,
 #endif
 }
 
-static int __init vdso_do_func_patch64(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64,
+static int __init vdso_do_func_patch64(struct lib64_elfinfo *v64,
   const char *orig, const char *fix)
 {
Elf64_Sym *sym64_gen, *sym64_fix;
@@ -619,11 +616,9 @@ static __init int vdso_fixup_alt_funcs(struct 
lib32_elfinfo *v32,
 * It would be easy to do, but doesn't seem to be necessary,
 * patching the OPD symbol is enough.
 */
-   vdso_do_func_patch32(v32, v64, patch->gen_name,
-patch->fix_name);
+   vdso_do_func_patch32(v32, patch->gen_name, patch->fix_name);
 #ifdef CONFIG_PPC64
-   vdso_do_func_patch64(v32, v64, patch->gen_name,
-patch->fix_name);
+   vdso_do_func_patch64(v64, patch->gen_name, patch->fix_name);
 #endif /* CONFIG_PPC64 */
}
 
-- 
2.10.1



[PATCHv3 6/8] powerpc/vdso: switch from legacy_special_mapping_vmops

2016-10-27 Thread Dmitry Safonov
This will allow handling the vDSO vma like a special_mapping, which has
its name and hooks. Needed for the mremap hook, which will replace the
arch_mremap helper, and also for removing arch_vma_name.
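
For context, the hook this enables looks roughly like the x86 one (only a
sketch here, not part of this patch):

static int vdso_mremap(const struct vm_special_mapping *sm,
		       struct vm_area_struct *new_vma)
{
	/* keep context.vdso_base in sync when userspace mremap()s the vDSO */
	current->mm->context.vdso_base = new_vma->vm_start;
	return 0;
}

which could be wired up by setting .mremap in vdso{32,64}_mapping.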

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c| 19 +++
 arch/powerpc/kernel/vdso_common.c |  8 ++--
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index e68601ffc9ad..9ee3fd65c6e9 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -51,7 +51,7 @@
 #define VDSO_ALIGNMENT (1 << 16)
 
 static unsigned int vdso32_pages;
-static struct page **vdso32_pagelist;
+static struct vm_special_mapping vdso32_mapping;
 unsigned long vdso32_sigtramp;
 unsigned long vdso32_rt_sigtramp;
 
@@ -64,7 +64,7 @@ static void *vdso32_kbase;
 extern char vdso64_start, vdso64_end;
 static void *vdso64_kbase = _start;
 static unsigned int vdso64_pages;
-static struct page **vdso64_pagelist;
+static struct vm_special_mapping vdso64_mapping;
 unsigned long vdso64_rt_sigtramp;
 #endif /* CONFIG_PPC64 */
 
@@ -143,10 +143,11 @@ struct lib64_elfinfo
unsigned long   text;
 };
 
-static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages,
+static int map_vdso(struct vm_special_mapping *vsm, unsigned long vdso_pages,
unsigned long vdso_base)
 {
struct mm_struct *mm = current->mm;
+   struct vm_area_struct *vma;
int ret = 0;
 
mm->context.vdso_base = 0;
@@ -198,12 +199,14 @@ static int map_vdso(struct page **vdso_pagelist, unsigned 
long vdso_pages,
 * It's fine to use that for setting breakpoints in the vDSO code
 * pages though.
 */
-   ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+   vma = _install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
 VM_READ|VM_EXEC|
 VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
-vdso_pagelist);
-   if (ret)
+vsm);
+   if (IS_ERR(vma)) {
+   ret = PTR_ERR(vma);
current->mm->context.vdso_base = 0;
+   }
 
 out_up_mmap_sem:
up_write(>mmap_sem);
@@ -220,7 +223,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
return 0;
 
if (is_32bit_task())
-   return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE);
+   return map_vdso(_mapping, vdso32_pages, VDSO32_MBASE);
 #ifdef CONFIG_PPC64
else
/*
@@ -228,7 +231,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
 * allows get_unmapped_area to find an area near other mmaps
 * and most likely share a SLB entry.
 */
-   return map_vdso(vdso64_pagelist, vdso64_pages, 0);
+   return map_vdso(_mapping, vdso64_pages, 0);
 #endif
WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel");
return -1;
diff --git a/arch/powerpc/kernel/vdso_common.c 
b/arch/powerpc/kernel/vdso_common.c
index c97c30606b3f..047f6b8b230f 100644
--- a/arch/powerpc/kernel/vdso_common.c
+++ b/arch/powerpc/kernel/vdso_common.c
@@ -14,7 +14,7 @@
 #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE)
 #define vdso_kbase CONCAT3(vdso, BITS, _kbase)
 #define vdso_pages CONCAT3(vdso, BITS, _pages)
-#define vdso_pagelist  CONCAT3(vdso, BITS, _pagelist)
+#define vdso_mapping   CONCAT3(vdso, BITS, _mapping)
 
 #undef pr_fmt
 #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt
@@ -207,6 +207,7 @@ static __init int vdso_setup(struct lib_elfinfo *v)
 static __init void init_vdso_pagelist(void)
 {
int i;
+   struct page **vdso_pagelist;
 
/* Make sure pages are in the correct state */
vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2),
@@ -221,6 +222,9 @@ static __init void init_vdso_pagelist(void)
}
vdso_pagelist[i++] = virt_to_page(vdso_data);
vdso_pagelist[i] = NULL;
+
+   vdso_mapping.pages = vdso_pagelist;
+   vdso_mapping.name = "[vdso]";
 }
 
 #undef find_section
@@ -236,7 +240,7 @@ static __init void init_vdso_pagelist(void)
 #undef VDSO_LBASE
 #undef vdso_kbase
 #undef vdso_pages
-#undef vdso_pagelist
+#undef vdso_mapping
 #undef lib_elfinfo
 #undef BITS
 #undef _CONCAT3
-- 
2.10.1



[PATCHv3 5/8] powerpc/vdso: split map_vdso from arch_setup_additional_pages

2016-10-27 Thread Dmitry Safonov
Impact: cleanup

It'll be easier to introduce the vm_special_mapping struct in
a smaller map_vdso() function (see the next patches).
It's handled the same way on x86.

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 67 +-
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 25d03d773c49..e68601ffc9ad 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -143,52 +143,23 @@ struct lib64_elfinfo
unsigned long   text;
 };
 
-
-/*
- * This is called from binfmt_elf, we create the special vma for the
- * vDSO and insert it into the mm struct tree
- */
-int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages,
+   unsigned long vdso_base)
 {
struct mm_struct *mm = current->mm;
-   struct page **vdso_pagelist;
-   unsigned long vdso_pages;
-   unsigned long vdso_base;
int ret = 0;
 
-   if (!vdso_ready)
-   return 0;
-
-#ifdef CONFIG_PPC64
-   if (is_32bit_task()) {
-   vdso_pagelist = vdso32_pagelist;
-   vdso_pages = vdso32_pages;
-   vdso_base = VDSO32_MBASE;
-   } else {
-   vdso_pagelist = vdso64_pagelist;
-   vdso_pages = vdso64_pages;
-   /*
-* On 64bit we don't have a preferred map address. This
-* allows get_unmapped_area to find an area near other mmaps
-* and most likely share a SLB entry.
-*/
-   vdso_base = 0;
-   }
-#else
-   vdso_pagelist = vdso32_pagelist;
-   vdso_pages = vdso32_pages;
-   vdso_base = VDSO32_MBASE;
-#endif
-
-   current->mm->context.vdso_base = 0;
+   mm->context.vdso_base = 0;
 
-   /* vDSO has a problem and was disabled, just don't "enable" it for the
+   /*
+* vDSO has a problem and was disabled, just don't "enable" it for the
 * process
 */
if (vdso_pages == 0)
return 0;
+
/* Add a page to the vdso size for the data page */
-   vdso_pages ++;
+   vdso_pages++;
 
/*
 * pick a base address for the vDSO in process space. We try to put it
@@ -239,6 +210,30 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
return ret;
 }
 
+/*
+ * This is called from binfmt_elf, we create the special vma for the
+ * vDSO and insert it into the mm struct tree
+ */
+int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+{
+   if (!vdso_ready)
+   return 0;
+
+   if (is_32bit_task())
+   return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE);
+#ifdef CONFIG_PPC64
+   else
+   /*
+* On 64bit we don't have a preferred map address. This
+* allows get_unmapped_area to find an area near other mmaps
+* and most likely share a SLB entry.
+*/
+   return map_vdso(vdso64_pagelist, vdso64_pages, 0);
+#endif
+   WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel");
+   return -1;
+}
+
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
if (vma->vm_mm && vma->vm_start == vma->vm_mm->context.vdso_base)
-- 
2.10.1



[PATCHv3 3/8] powerpc/vdso: separate common code in vdso_common

2016-10-27 Thread Dmitry Safonov
Impact: cleanup

There are common functions for handling 32-bit and 64-bit vDSO ELF
files: find_section{32,64}, find_symbol{32,64}, find_function{32,64},
vdso_do_func_patch{32,64}, vdso_do_find_sections{32,64},
vdso_fixup_datapage{32,64}, vdso_fixup_features{32,64}, vdso_setup{32,64},
which all do the same work; the only difference is the structures used
for 32-bit or 64-bit ELF. Let's combine them into common code, reducing
copy'n'paste code.
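
As a stand-alone illustration of the token-pasting trick (only a
sketch: the CONCAT3/BITS names follow the patch, but this demo program
and its report*_pages() helpers are made up, not actual kernel code):

#include <stdio.h>

#define _CONCAT3(a, b, c)	a##b##c
#define CONCAT3(a, b, c)	_CONCAT3(a, b, c)

static unsigned int vdso32_pages = 4;
static unsigned int vdso64_pages = 8;

/* This mimics what including vdso_common.c once per word size does:
 * every use of the vdso_pages alias pastes BITS into the real name. */
#define BITS 32
#define vdso_pages CONCAT3(vdso, BITS, _pages)	/* -> vdso32_pages */
static void CONCAT3(report, BITS, _pages)(void)	/* -> report32_pages() */
{
	printf("vDSO%d: %u pages\n", BITS, vdso_pages);
}
#undef vdso_pages
#undef BITS

#define BITS 64
#define vdso_pages CONCAT3(vdso, BITS, _pages)	/* -> vdso64_pages */
static void CONCAT3(report, BITS, _pages)(void)	/* -> report64_pages() */
{
	printf("vDSO%d: %u pages\n", BITS, vdso_pages);
}
#undef vdso_pages
#undef BITS

int main(void)
{
	report32_pages();
	report64_pages();
	return 0;
}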

Small changes:
I also switched usage of printk(KERNEL_<LEVEL>, ...) to pr_<level>(...)
and used the pr_fmt() macro for the "vDSO{32,64}: " prefix.

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c| 352 ++
 arch/powerpc/kernel/vdso_common.c | 221 
 2 files changed, 234 insertions(+), 339 deletions(-)
 create mode 100644 arch/powerpc/kernel/vdso_common.c

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 278b9aa25a1c..8010a0d82049 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -51,13 +51,13 @@
 #define VDSO_ALIGNMENT (1 << 16)
 
 static unsigned int vdso32_pages;
-static void *vdso32_kbase;
 static struct page **vdso32_pagelist;
 unsigned long vdso32_sigtramp;
 unsigned long vdso32_rt_sigtramp;
 
 #ifdef CONFIG_VDSO32
 extern char vdso32_start, vdso32_end;
+static void *vdso32_kbase;
 #endif
 
 #ifdef CONFIG_PPC64
@@ -246,250 +246,16 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return NULL;
 }
 
-
-
 #ifdef CONFIG_VDSO32
-static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname,
- unsigned long *size)
-{
-   Elf32_Shdr *sechdrs;
-   unsigned int i;
-   char *secnames;
-
-   /* Grab section headers and strings so we can tell who is who */
-   sechdrs = (void *)ehdr + ehdr->e_shoff;
-   secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset;
-
-   /* Find the section they want */
-   for (i = 1; i < ehdr->e_shnum; i++) {
-   if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) {
-   if (size)
-   *size = sechdrs[i].sh_size;
-   return (void *)ehdr + sechdrs[i].sh_offset;
-   }
-   }
-   *size = 0;
-   return NULL;
-}
-
-static Elf32_Sym * __init find_symbol32(struct lib32_elfinfo *lib,
-   const char *symname)
-{
-   unsigned int i;
-   char name[MAX_SYMNAME], *c;
-
-   for (i = 0; i < (lib->dynsymsize / sizeof(Elf32_Sym)); i++) {
-   if (lib->dynsym[i].st_name == 0)
-   continue;
-   strlcpy(name, lib->dynstr + lib->dynsym[i].st_name,
-   MAX_SYMNAME);
-   c = strchr(name, '@');
-   if (c)
-   *c = 0;
-   if (strcmp(symname, name) == 0)
-   return &lib->dynsym[i];
-   }
-   return NULL;
-}
-
-/* Note that we assume the section is .text and the symbol is relative to
- * the library base
- */
-static unsigned long __init find_function32(struct lib32_elfinfo *lib,
-   const char *symname)
-{
-   Elf32_Sym *sym = find_symbol32(lib, symname);
-
-   if (sym == NULL) {
-   printk(KERN_WARNING "vDSO32: function %s not found !\n",
-  symname);
-   return 0;
-   }
-   return sym->st_value - VDSO32_LBASE;
-}
-
-static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  const char *orig, const char *fix)
-{
-   Elf32_Sym *sym32_gen, *sym32_fix;
-
-   sym32_gen = find_symbol32(v32, orig);
-   if (sym32_gen == NULL) {
-   printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", orig);
-   return -1;
-   }
-   if (fix == NULL) {
-   sym32_gen->st_name = 0;
-   return 0;
-   }
-   sym32_fix = find_symbol32(v32, fix);
-   if (sym32_fix == NULL) {
-   printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", fix);
-   return -1;
-   }
-   sym32_gen->st_value = sym32_fix->st_value;
-   sym32_gen->st_size = sym32_fix->st_size;
-   sym32_gen->st_info = sym32_fix->st_info;
-   sym32_gen->st_other = sym32_fix->st_other;
-   sym32_gen->st_shndx = sym32_fix->st_shndx;
-
-   return 0;
-}
-#else /* !CONFIG_VDSO32 */
-static unsigned long __init find_function32(struct lib32_elfinfo *lib,
-   

[PATCHv3 4/8] powerpc/vdso: introduce init_vdso{32,64}_pagelist

2016-10-27 Thread Dmitry Safonov
Impact: cleanup

Move allocation/initialization of vDSO's pagelist for 32/64-bit vDSO
into common vdso code, introducing a function for that.

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c| 27 ++-
 arch/powerpc/kernel/vdso_common.c | 22 ++
 2 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 8010a0d82049..25d03d773c49 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -382,8 +382,6 @@ early_initcall(vdso_getcpu_init);
 
 static int __init vdso_init(void)
 {
-   int i;
-
 #ifdef CONFIG_PPC64
/*
 * Fill up the "systemcfg" stuff for backward compatibility
@@ -454,32 +452,11 @@ static int __init vdso_init(void)
}
 
 #ifdef CONFIG_VDSO32
-   /* Make sure pages are in the correct state */
-   vdso32_pagelist = kzalloc(sizeof(struct page *) * (vdso32_pages + 2),
- GFP_KERNEL);
-   BUG_ON(vdso32_pagelist == NULL);
-   for (i = 0; i < vdso32_pages; i++) {
-   struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
-   ClearPageReserved(pg);
-   get_page(pg);
-   vdso32_pagelist[i] = pg;
-   }
-   vdso32_pagelist[i++] = virt_to_page(vdso_data);
-   vdso32_pagelist[i] = NULL;
+   init_vdso32_pagelist();
 #endif
 
 #ifdef CONFIG_PPC64
-   vdso64_pagelist = kzalloc(sizeof(struct page *) * (vdso64_pages + 2),
- GFP_KERNEL);
-   BUG_ON(vdso64_pagelist == NULL);
-   for (i = 0; i < vdso64_pages; i++) {
-   struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
-   ClearPageReserved(pg);
-   get_page(pg);
-   vdso64_pagelist[i] = pg;
-   }
-   vdso64_pagelist[i++] = virt_to_page(vdso_data);
-   vdso64_pagelist[i] = NULL;
+   init_vdso64_pagelist();
 #endif /* CONFIG_PPC64 */
 
get_page(virt_to_page(vdso_data));
diff --git a/arch/powerpc/kernel/vdso_common.c b/arch/powerpc/kernel/vdso_common.c
index ac25d66134fb..c97c30606b3f 100644
--- a/arch/powerpc/kernel/vdso_common.c
+++ b/arch/powerpc/kernel/vdso_common.c
@@ -14,6 +14,7 @@
 #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE)
 #define vdso_kbase CONCAT3(vdso, BITS, _kbase)
 #define vdso_pages CONCAT3(vdso, BITS, _pages)
+#define vdso_pagelist  CONCAT3(vdso, BITS, _pagelist)
 
 #undef pr_fmt
 #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt
@@ -202,6 +203,25 @@ static __init int vdso_setup(struct lib_elfinfo *v)
return 0;
 }
 
+#define init_vdso_pagelist CONCAT3(init_vdso, BITS, _pagelist)
+static __init void init_vdso_pagelist(void)
+{
+   int i;
+
+   /* Make sure pages are in the correct state */
+   vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2),
+ GFP_KERNEL);
+   BUG_ON(vdso_pagelist == NULL);
+   for (i = 0; i < vdso_pages; i++) {
+   struct page *pg = virt_to_page(vdso_kbase + i*PAGE_SIZE);
+
+   ClearPageReserved(pg);
+   get_page(pg);
+   vdso_pagelist[i] = pg;
+   }
+   vdso_pagelist[i++] = virt_to_page(vdso_data);
+   vdso_pagelist[i] = NULL;
+}
 
 #undef find_section
 #undef find_symbol
@@ -211,10 +231,12 @@ static __init int vdso_setup(struct lib_elfinfo *v)
 #undef vdso_fixup_datapage
 #undef vdso_fixup_features
 #undef vdso_setup
+#undef init_vdso_pagelist
 
 #undef VDSO_LBASE
 #undef vdso_kbase
 #undef vdso_pages
+#undef vdso_pagelist
 #undef lib_elfinfo
 #undef BITS
 #undef _CONCAT3
-- 
2.10.1



[PATCHv3 1/8] powerpc/vdso: unify return paths in setup_additional_pages

2016-10-27 Thread Dmitry Safonov
Impact: cleanup

Rename the `rc' variable, which doesn't seem to mean anything, to the
kernel-conventional `ret'. Combine the two function returns into one,
as it's also easier to read.

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4111d30badfa..4ffb82a2d9e9 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -154,7 +154,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
struct page **vdso_pagelist;
unsigned long vdso_pages;
unsigned long vdso_base;
-   int rc;
+   int ret = 0;
 
if (!vdso_ready)
return 0;
@@ -203,8 +203,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
  ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
  0, 0);
if (IS_ERR_VALUE(vdso_base)) {
-   rc = vdso_base;
-   goto fail_mmapsem;
+   ret = vdso_base;
+   goto out_up_mmap_sem;
}
 
/* Add required alignment. */
@@ -227,21 +227,16 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 * It's fine to use that for setting breakpoints in the vDSO code
 * pages though.
 */
-   rc = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+   ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
 VM_READ|VM_EXEC|
 VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 vdso_pagelist);
-   if (rc) {
+   if (ret)
current->mm->context.vdso_base = 0;
-   goto fail_mmapsem;
-   }
-
-   up_write(&mm->mmap_sem);
-   return 0;
 
- fail_mmapsem:
+out_up_mmap_sem:
up_write(&mm->mmap_sem);
-   return rc;
+   return ret;
 }
 
 const char *arch_vma_name(struct vm_area_struct *vma)
-- 
2.10.1



[PATCHv3 0/8] powerpc/mm: refactor vDSO mapping code

2016-10-27 Thread Dmitry Safonov
Changes since v1, v2:
- use vdso64_pages only under CONFIG_PPC64 (32-bit build fix)
- remove the arch_vma_name helper as it's not needed anymore,
  simplify vdso_base pointer initialization in map_vdso()

Cleanup patches for vDSO on powerpc.
Originally, I wanted to add vDSO remapping on arm/aarch64, and
I decided to clean up that part on powerpc first.
I've added a vm_ops hook for the vDSO just like I did for x86,
which makes the cross-arch arch_mremap hook no longer needed.
Other changes reduce exhaustive code duplication by
separating out the common vdso code.

No userspace-visible changes are expected.
Tested on qemu with buildroot rootfs.

Dmitry Safonov (8):
  powerpc/vdso: unify return paths in setup_additional_pages
  powerpc/vdso: remove unused params in vdso_do_func_patch{32,64}
  powerpc/vdso: separate common code in vdso_common
  powerpc/vdso: introduce init_vdso{32,64}_pagelist
  powerpc/vdso: split map_vdso from arch_setup_additional_pages
  powerpc/vdso: switch from legacy_special_mapping_vmops
  mm: kill arch_mremap
  powerpc/vdso: remove arch_vma_name

 arch/alpha/include/asm/Kbuild|   1 -
 arch/arc/include/asm/Kbuild  |   1 -
 arch/arm/include/asm/Kbuild  |   1 -
 arch/arm64/include/asm/Kbuild|   1 -
 arch/avr32/include/asm/Kbuild|   1 -
 arch/blackfin/include/asm/Kbuild |   1 -
 arch/c6x/include/asm/Kbuild  |   1 -
 arch/cris/include/asm/Kbuild |   1 -
 arch/frv/include/asm/Kbuild  |   1 -
 arch/h8300/include/asm/Kbuild|   1 -
 arch/hexagon/include/asm/Kbuild  |   1 -
 arch/ia64/include/asm/Kbuild |   1 -
 arch/m32r/include/asm/Kbuild |   1 -
 arch/m68k/include/asm/Kbuild |   1 -
 arch/metag/include/asm/Kbuild|   1 -
 arch/microblaze/include/asm/Kbuild   |   1 -
 arch/mips/include/asm/Kbuild |   1 -
 arch/mn10300/include/asm/Kbuild  |   1 -
 arch/nios2/include/asm/Kbuild|   1 -
 arch/openrisc/include/asm/Kbuild |   1 -
 arch/parisc/include/asm/Kbuild   |   1 -
 arch/powerpc/include/asm/mm-arch-hooks.h |  28 --
 arch/powerpc/kernel/vdso.c   | 502 +--
 arch/powerpc/kernel/vdso_common.c| 248 +++
 arch/s390/include/asm/Kbuild |   1 -
 arch/score/include/asm/Kbuild|   1 -
 arch/sh/include/asm/Kbuild   |   1 -
 arch/sparc/include/asm/Kbuild|   1 -
 arch/tile/include/asm/Kbuild |   1 -
 arch/um/include/asm/Kbuild   |   1 -
 arch/unicore32/include/asm/Kbuild|   1 -
 arch/x86/include/asm/Kbuild  |   1 -
 arch/xtensa/include/asm/Kbuild   |   1 -
 include/asm-generic/mm-arch-hooks.h  |  16 -
 include/linux/mm-arch-hooks.h|  25 --
 mm/mremap.c  |   4 -
 36 files changed, 324 insertions(+), 529 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 create mode 100644 arch/powerpc/kernel/vdso_common.c
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h

-- 
2.10.1



[PATCHv2 7/7] mm: kill arch_mremap

2016-10-25 Thread Dmitry Safonov
This reverts commit 4abad2ca4a4d ("mm: new arch_remap() hook") and
commit 2ae416b142b6 ("mm: new mm hook framework").
It keeps the same functionality of mremapping the vDSO blob by
introducing a vm_special_mapping mremap op for powerpc (sketched below).
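
For reference, a minimal sketch of how such a special-mapping mremap
op could look on the powerpc side (a simplified illustration, not the
exact code from this series; the generic special-mapping code calls
the .mremap callback when userspace moves the mapping):

static int vdso_mremap(const struct vm_special_mapping *sm,
		       struct vm_area_struct *new_vma)
{
	/* Keep the cached base in sync after an mremap() of the vDSO. */
	current->mm->context.vdso_base = new_vma->vm_start;
	return 0;
}

static struct vm_special_mapping vdso64_mapping = {
	.name   = "[vdso]",
	.mremap = vdso_mremap,
};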

Cc: Laurent Dufour <lduf...@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
v2: use vdso64_pages only under CONFIG_PPC64

 arch/alpha/include/asm/Kbuild|  1 -
 arch/arc/include/asm/Kbuild  |  1 -
 arch/arm/include/asm/Kbuild  |  1 -
 arch/arm64/include/asm/Kbuild|  1 -
 arch/avr32/include/asm/Kbuild|  1 -
 arch/blackfin/include/asm/Kbuild |  1 -
 arch/c6x/include/asm/Kbuild  |  1 -
 arch/cris/include/asm/Kbuild |  1 -
 arch/frv/include/asm/Kbuild  |  1 -
 arch/h8300/include/asm/Kbuild|  1 -
 arch/hexagon/include/asm/Kbuild  |  1 -
 arch/ia64/include/asm/Kbuild |  1 -
 arch/m32r/include/asm/Kbuild |  1 -
 arch/m68k/include/asm/Kbuild |  1 -
 arch/metag/include/asm/Kbuild|  1 -
 arch/microblaze/include/asm/Kbuild   |  1 -
 arch/mips/include/asm/Kbuild |  1 -
 arch/mn10300/include/asm/Kbuild  |  1 -
 arch/nios2/include/asm/Kbuild|  1 -
 arch/openrisc/include/asm/Kbuild |  1 -
 arch/parisc/include/asm/Kbuild   |  1 -
 arch/powerpc/include/asm/mm-arch-hooks.h | 28 
 arch/powerpc/kernel/vdso.c   | 25 +
 arch/powerpc/kernel/vdso_common.c|  1 +
 arch/s390/include/asm/Kbuild |  1 -
 arch/score/include/asm/Kbuild|  1 -
 arch/sh/include/asm/Kbuild   |  1 -
 arch/sparc/include/asm/Kbuild|  1 -
 arch/tile/include/asm/Kbuild |  1 -
 arch/um/include/asm/Kbuild   |  1 -
 arch/unicore32/include/asm/Kbuild|  1 -
 arch/x86/include/asm/Kbuild  |  1 -
 arch/xtensa/include/asm/Kbuild   |  1 -
 include/asm-generic/mm-arch-hooks.h  | 16 
 include/linux/mm-arch-hooks.h| 25 -
 mm/mremap.c  |  4 
 36 files changed, 26 insertions(+), 103 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index bf8475ce85ee..0a5e0ec2842b 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -6,7 +6,6 @@ generic-y += exec.h
 generic-y += export.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += preempt.h
 generic-y += sections.h
 generic-y += trace_clock.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index c332604606dd..e6059a808463 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -22,7 +22,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 0745538b26d3..44b717cb4a55 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -15,7 +15,6 @@ generic-y += irq_regs.h
 generic-y += kdebug.h
 generic-y += local.h
 generic-y += local64.h
-generic-y += mm-arch-hooks.h
 generic-y += msgbuf.h
 generic-y += msi.h
 generic-y += param.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 44e1d7f10add..a42a1367aea4 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -20,7 +20,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild
index 241b9b9729d8..519810d0d5e1 100644
--- a/arch/avr32/include/asm/Kbuild
+++ b/arch/avr32/include/asm/Kbuild
@@ -12,7 +12,6 @@ generic-y += irq_work.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild

[PATCH 4/7] powerpc/vdso: introduce init_vdso{32,64}_pagelist

2016-10-25 Thread Dmitry Safonov
Introduce common code for the allocation/initialization of the vDSO's pagelist.

Impact: cleanup

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c| 27 ++-
 arch/powerpc/kernel/vdso_common.c | 22 ++
 2 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 8010a0d82049..25d03d773c49 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -382,8 +382,6 @@ early_initcall(vdso_getcpu_init);
 
 static int __init vdso_init(void)
 {
-   int i;
-
 #ifdef CONFIG_PPC64
/*
 * Fill up the "systemcfg" stuff for backward compatibility
@@ -454,32 +452,11 @@ static int __init vdso_init(void)
}
 
 #ifdef CONFIG_VDSO32
-   /* Make sure pages are in the correct state */
-   vdso32_pagelist = kzalloc(sizeof(struct page *) * (vdso32_pages + 2),
- GFP_KERNEL);
-   BUG_ON(vdso32_pagelist == NULL);
-   for (i = 0; i < vdso32_pages; i++) {
-   struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
-   ClearPageReserved(pg);
-   get_page(pg);
-   vdso32_pagelist[i] = pg;
-   }
-   vdso32_pagelist[i++] = virt_to_page(vdso_data);
-   vdso32_pagelist[i] = NULL;
+   init_vdso32_pagelist();
 #endif
 
 #ifdef CONFIG_PPC64
-   vdso64_pagelist = kzalloc(sizeof(struct page *) * (vdso64_pages + 2),
- GFP_KERNEL);
-   BUG_ON(vdso64_pagelist == NULL);
-   for (i = 0; i < vdso64_pages; i++) {
-   struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
-   ClearPageReserved(pg);
-   get_page(pg);
-   vdso64_pagelist[i] = pg;
-   }
-   vdso64_pagelist[i++] = virt_to_page(vdso_data);
-   vdso64_pagelist[i] = NULL;
+   init_vdso64_pagelist();
 #endif /* CONFIG_PPC64 */
 
get_page(virt_to_page(vdso_data));
diff --git a/arch/powerpc/kernel/vdso_common.c b/arch/powerpc/kernel/vdso_common.c
index ac25d66134fb..c97c30606b3f 100644
--- a/arch/powerpc/kernel/vdso_common.c
+++ b/arch/powerpc/kernel/vdso_common.c
@@ -14,6 +14,7 @@
 #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE)
 #define vdso_kbase CONCAT3(vdso, BITS, _kbase)
 #define vdso_pages CONCAT3(vdso, BITS, _pages)
+#define vdso_pagelist  CONCAT3(vdso, BITS, _pagelist)
 
 #undef pr_fmt
 #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt
@@ -202,6 +203,25 @@ static __init int vdso_setup(struct lib_elfinfo *v)
return 0;
 }
 
+#define init_vdso_pagelist CONCAT3(init_vdso, BITS, _pagelist)
+static __init void init_vdso_pagelist(void)
+{
+   int i;
+
+   /* Make sure pages are in the correct state */
+   vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2),
+ GFP_KERNEL);
+   BUG_ON(vdso_pagelist == NULL);
+   for (i = 0; i < vdso_pages; i++) {
+   struct page *pg = virt_to_page(vdso_kbase + i*PAGE_SIZE);
+
+   ClearPageReserved(pg);
+   get_page(pg);
+   vdso_pagelist[i] = pg;
+   }
+   vdso_pagelist[i++] = virt_to_page(vdso_data);
+   vdso_pagelist[i] = NULL;
+}
 
 #undef find_section
 #undef find_symbol
@@ -211,10 +231,12 @@ static __init int vdso_setup(struct lib_elfinfo *v)
 #undef vdso_fixup_datapage
 #undef vdso_fixup_features
 #undef vdso_setup
+#undef init_vdso_pagelist
 
 #undef VDSO_LBASE
 #undef vdso_kbase
 #undef vdso_pages
+#undef vdso_pagelist
 #undef lib_elfinfo
 #undef BITS
 #undef _CONCAT3
-- 
2.10.0



[PATCH 3/7] powerpc/vdso: separate common code in vdso_common

2016-10-25 Thread Dmitry Safonov
Impact: cleanup

I also switched usage of printk(KERNEL_<LEVEL>, ...) to pr_<level>(...)
and used the pr_fmt() macro for the "vDSO{32,64}: " prefix.

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c| 352 ++
 arch/powerpc/kernel/vdso_common.c | 221 
 2 files changed, 234 insertions(+), 339 deletions(-)
 create mode 100644 arch/powerpc/kernel/vdso_common.c

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 278b9aa25a1c..8010a0d82049 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -51,13 +51,13 @@
 #define VDSO_ALIGNMENT (1 << 16)
 
 static unsigned int vdso32_pages;
-static void *vdso32_kbase;
 static struct page **vdso32_pagelist;
 unsigned long vdso32_sigtramp;
 unsigned long vdso32_rt_sigtramp;
 
 #ifdef CONFIG_VDSO32
 extern char vdso32_start, vdso32_end;
+static void *vdso32_kbase;
 #endif
 
 #ifdef CONFIG_PPC64
@@ -246,250 +246,16 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return NULL;
 }
 
-
-
 #ifdef CONFIG_VDSO32
-static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname,
- unsigned long *size)
-{
-   Elf32_Shdr *sechdrs;
-   unsigned int i;
-   char *secnames;
-
-   /* Grab section headers and strings so we can tell who is who */
-   sechdrs = (void *)ehdr + ehdr->e_shoff;
-   secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset;
-
-   /* Find the section they want */
-   for (i = 1; i < ehdr->e_shnum; i++) {
-   if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) {
-   if (size)
-   *size = sechdrs[i].sh_size;
-   return (void *)ehdr + sechdrs[i].sh_offset;
-   }
-   }
-   *size = 0;
-   return NULL;
-}
-
-static Elf32_Sym * __init find_symbol32(struct lib32_elfinfo *lib,
-   const char *symname)
-{
-   unsigned int i;
-   char name[MAX_SYMNAME], *c;
-
-   for (i = 0; i < (lib->dynsymsize / sizeof(Elf32_Sym)); i++) {
-   if (lib->dynsym[i].st_name == 0)
-   continue;
-   strlcpy(name, lib->dynstr + lib->dynsym[i].st_name,
-   MAX_SYMNAME);
-   c = strchr(name, '@');
-   if (c)
-   *c = 0;
-   if (strcmp(symname, name) == 0)
-   return &lib->dynsym[i];
-   }
-   return NULL;
-}
-
-/* Note that we assume the section is .text and the symbol is relative to
- * the library base
- */
-static unsigned long __init find_function32(struct lib32_elfinfo *lib,
-   const char *symname)
-{
-   Elf32_Sym *sym = find_symbol32(lib, symname);
-
-   if (sym == NULL) {
-   printk(KERN_WARNING "vDSO32: function %s not found !\n",
-  symname);
-   return 0;
-   }
-   return sym->st_value - VDSO32_LBASE;
-}
-
-static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  const char *orig, const char *fix)
-{
-   Elf32_Sym *sym32_gen, *sym32_fix;
-
-   sym32_gen = find_symbol32(v32, orig);
-   if (sym32_gen == NULL) {
-   printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", orig);
-   return -1;
-   }
-   if (fix == NULL) {
-   sym32_gen->st_name = 0;
-   return 0;
-   }
-   sym32_fix = find_symbol32(v32, fix);
-   if (sym32_fix == NULL) {
-   printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", fix);
-   return -1;
-   }
-   sym32_gen->st_value = sym32_fix->st_value;
-   sym32_gen->st_size = sym32_fix->st_size;
-   sym32_gen->st_info = sym32_fix->st_info;
-   sym32_gen->st_other = sym32_fix->st_other;
-   sym32_gen->st_shndx = sym32_fix->st_shndx;
-
-   return 0;
-}
-#else /* !CONFIG_VDSO32 */
-static unsigned long __init find_function32(struct lib32_elfinfo *lib,
-   const char *symname)
-{
-   return 0;
-}
-
-static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  const char *orig, const char *fix)
-{
-   return 0;
-}
+#include "vdso_common.c"
 #endif /* CONFIG_VDSO32 */
 
-
 #ifdef CONFIG_PPC64
-
-static void * __init find_section64(Elf64_Ehdr *ehdr, const char *secname,
-

[PATCH 7/7] mm: kill arch_mremap

2016-10-25 Thread Dmitry Safonov
This reverts commit 4abad2ca4a4d ("mm: new arch_remap() hook") and
commit 2ae416b142b6 ("mm: new mm hook framework").
It keeps the same functionality of mremapping the vDSO blob by
introducing a vm_special_mapping mremap op for powerpc.

Cc: Laurent Dufour <lduf...@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/alpha/include/asm/Kbuild|  1 -
 arch/arc/include/asm/Kbuild  |  1 -
 arch/arm/include/asm/Kbuild  |  1 -
 arch/arm64/include/asm/Kbuild|  1 -
 arch/avr32/include/asm/Kbuild|  1 -
 arch/blackfin/include/asm/Kbuild |  1 -
 arch/c6x/include/asm/Kbuild  |  1 -
 arch/cris/include/asm/Kbuild |  1 -
 arch/frv/include/asm/Kbuild  |  1 -
 arch/h8300/include/asm/Kbuild|  1 -
 arch/hexagon/include/asm/Kbuild  |  1 -
 arch/ia64/include/asm/Kbuild |  1 -
 arch/m32r/include/asm/Kbuild |  1 -
 arch/m68k/include/asm/Kbuild |  1 -
 arch/metag/include/asm/Kbuild|  1 -
 arch/microblaze/include/asm/Kbuild   |  1 -
 arch/mips/include/asm/Kbuild |  1 -
 arch/mn10300/include/asm/Kbuild  |  1 -
 arch/nios2/include/asm/Kbuild|  1 -
 arch/openrisc/include/asm/Kbuild |  1 -
 arch/parisc/include/asm/Kbuild   |  1 -
 arch/powerpc/include/asm/mm-arch-hooks.h | 28 
 arch/powerpc/kernel/vdso.c   | 19 +++
 arch/powerpc/kernel/vdso_common.c|  1 +
 arch/s390/include/asm/Kbuild |  1 -
 arch/score/include/asm/Kbuild|  1 -
 arch/sh/include/asm/Kbuild   |  1 -
 arch/sparc/include/asm/Kbuild|  1 -
 arch/tile/include/asm/Kbuild |  1 -
 arch/um/include/asm/Kbuild   |  1 -
 arch/unicore32/include/asm/Kbuild|  1 -
 arch/x86/include/asm/Kbuild  |  1 -
 arch/xtensa/include/asm/Kbuild   |  1 -
 include/asm-generic/mm-arch-hooks.h  | 16 
 include/linux/mm-arch-hooks.h| 25 -
 mm/mremap.c  |  4 
 36 files changed, 20 insertions(+), 103 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index bf8475ce85ee..0a5e0ec2842b 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -6,7 +6,6 @@ generic-y += exec.h
 generic-y += export.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += preempt.h
 generic-y += sections.h
 generic-y += trace_clock.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index c332604606dd..e6059a808463 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -22,7 +22,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 0745538b26d3..44b717cb4a55 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -15,7 +15,6 @@ generic-y += irq_regs.h
 generic-y += kdebug.h
 generic-y += local.h
 generic-y += local64.h
-generic-y += mm-arch-hooks.h
 generic-y += msgbuf.h
 generic-y += msi.h
 generic-y += param.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 44e1d7f10add..a42a1367aea4 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -20,7 +20,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild
index 241b9b9729d8..519810d0d5e1 100644
--- a/arch/avr32/include/asm/Kbuild
+++ b/arch/avr32/include/asm/Kbuild
@@ -12,7 +12,6 @@ generic-y += irq_work.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild
index 91d49c0a3118..c80181e4454f 100644
--- a/ar

[PATCH 6/7] powerpc/vdso: switch from legacy_special_mapping_vmops

2016-10-25 Thread Dmitry Safonov
This will allow introducing the mremap hook (the next patch).

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c| 19 +++
 arch/powerpc/kernel/vdso_common.c |  8 ++--
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index e68601ffc9ad..9ee3fd65c6e9 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -51,7 +51,7 @@
 #define VDSO_ALIGNMENT (1 << 16)
 
 static unsigned int vdso32_pages;
-static struct page **vdso32_pagelist;
+static struct vm_special_mapping vdso32_mapping;
 unsigned long vdso32_sigtramp;
 unsigned long vdso32_rt_sigtramp;
 
@@ -64,7 +64,7 @@ static void *vdso32_kbase;
 extern char vdso64_start, vdso64_end;
static void *vdso64_kbase = &vdso64_start;
 static unsigned int vdso64_pages;
-static struct page **vdso64_pagelist;
+static struct vm_special_mapping vdso64_mapping;
 unsigned long vdso64_rt_sigtramp;
 #endif /* CONFIG_PPC64 */
 
@@ -143,10 +143,11 @@ struct lib64_elfinfo
unsigned long   text;
 };
 
-static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages,
+static int map_vdso(struct vm_special_mapping *vsm, unsigned long vdso_pages,
unsigned long vdso_base)
 {
struct mm_struct *mm = current->mm;
+   struct vm_area_struct *vma;
int ret = 0;
 
mm->context.vdso_base = 0;
@@ -198,12 +199,14 @@ static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages,
 * It's fine to use that for setting breakpoints in the vDSO code
 * pages though.
 */
-   ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+   vma = _install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
 VM_READ|VM_EXEC|
 VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
-vdso_pagelist);
-   if (ret)
+vsm);
+   if (IS_ERR(vma)) {
+   ret = PTR_ERR(vma);
current->mm->context.vdso_base = 0;
+   }
 
 out_up_mmap_sem:
up_write(&mm->mmap_sem);
@@ -220,7 +223,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
return 0;
 
if (is_32bit_task())
-   return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE);
+   return map_vdso(&vdso32_mapping, vdso32_pages, VDSO32_MBASE);
 #ifdef CONFIG_PPC64
else
/*
@@ -228,7 +231,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 * allows get_unmapped_area to find an area near other mmaps
 * and most likely share a SLB entry.
 */
-   return map_vdso(vdso64_pagelist, vdso64_pages, 0);
+   return map_vdso(&vdso64_mapping, vdso64_pages, 0);
 #endif
WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel");
return -1;
diff --git a/arch/powerpc/kernel/vdso_common.c b/arch/powerpc/kernel/vdso_common.c
index c97c30606b3f..047f6b8b230f 100644
--- a/arch/powerpc/kernel/vdso_common.c
+++ b/arch/powerpc/kernel/vdso_common.c
@@ -14,7 +14,7 @@
 #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE)
 #define vdso_kbase CONCAT3(vdso, BITS, _kbase)
 #define vdso_pages CONCAT3(vdso, BITS, _pages)
-#define vdso_pagelist  CONCAT3(vdso, BITS, _pagelist)
+#define vdso_mapping   CONCAT3(vdso, BITS, _mapping)
 
 #undef pr_fmt
 #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt
@@ -207,6 +207,7 @@ static __init int vdso_setup(struct lib_elfinfo *v)
 static __init void init_vdso_pagelist(void)
 {
int i;
+   struct page **vdso_pagelist;
 
/* Make sure pages are in the correct state */
vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2),
@@ -221,6 +222,9 @@ static __init void init_vdso_pagelist(void)
}
vdso_pagelist[i++] = virt_to_page(vdso_data);
vdso_pagelist[i] = NULL;
+
+   vdso_mapping.pages = vdso_pagelist;
+   vdso_mapping.name = "[vdso]";
 }
 
 #undef find_section
@@ -236,7 +240,7 @@ static __init void init_vdso_pagelist(void)
 #undef VDSO_LBASE
 #undef vdso_kbase
 #undef vdso_pages
-#undef vdso_pagelist
+#undef vdso_mapping
 #undef lib_elfinfo
 #undef BITS
 #undef _CONCAT3
-- 
2.10.0



[PATCH 5/7] powerpc/vdso: split map_vdso from arch_setup_additional_pages

2016-10-25 Thread Dmitry Safonov
It'll be easier to introduce the vm_special_mapping struct in
a smaller map_vdso() function (see the next patches).

Impact: cleanup

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 67 +-
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 25d03d773c49..e68601ffc9ad 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -143,52 +143,23 @@ struct lib64_elfinfo
unsigned long   text;
 };
 
-
-/*
- * This is called from binfmt_elf, we create the special vma for the
- * vDSO and insert it into the mm struct tree
- */
-int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages,
+   unsigned long vdso_base)
 {
struct mm_struct *mm = current->mm;
-   struct page **vdso_pagelist;
-   unsigned long vdso_pages;
-   unsigned long vdso_base;
int ret = 0;
 
-   if (!vdso_ready)
-   return 0;
-
-#ifdef CONFIG_PPC64
-   if (is_32bit_task()) {
-   vdso_pagelist = vdso32_pagelist;
-   vdso_pages = vdso32_pages;
-   vdso_base = VDSO32_MBASE;
-   } else {
-   vdso_pagelist = vdso64_pagelist;
-   vdso_pages = vdso64_pages;
-   /*
-* On 64bit we don't have a preferred map address. This
-* allows get_unmapped_area to find an area near other mmaps
-* and most likely share a SLB entry.
-*/
-   vdso_base = 0;
-   }
-#else
-   vdso_pagelist = vdso32_pagelist;
-   vdso_pages = vdso32_pages;
-   vdso_base = VDSO32_MBASE;
-#endif
-
-   current->mm->context.vdso_base = 0;
+   mm->context.vdso_base = 0;
 
-   /* vDSO has a problem and was disabled, just don't "enable" it for the
+   /*
+* vDSO has a problem and was disabled, just don't "enable" it for the
 * process
 */
if (vdso_pages == 0)
return 0;
+
/* Add a page to the vdso size for the data page */
-   vdso_pages ++;
+   vdso_pages++;
 
/*
 * pick a base address for the vDSO in process space. We try to put it
@@ -239,6 +210,30 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
return ret;
 }
 
+/*
+ * This is called from binfmt_elf, we create the special vma for the
+ * vDSO and insert it into the mm struct tree
+ */
+int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+{
+   if (!vdso_ready)
+   return 0;
+
+   if (is_32bit_task())
+   return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE);
+#ifdef CONFIG_PPC64
+   else
+   /*
+* On 64bit we don't have a preferred map address. This
+* allows get_unmapped_area to find an area near other mmaps
+* and most likely share a SLB entry.
+*/
+   return map_vdso(vdso64_pagelist, vdso64_pages, 0);
+#endif
+   WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel");
+   return -1;
+}
+
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
if (vma->vm_mm && vma->vm_start == vma->vm_mm->context.vdso_base)
-- 
2.10.0



[PATCH 2/7] powerpc/vdso: remove unused params in vdso_do_func_patch{32, 64}

2016-10-25 Thread Dmitry Safonov
Impact: cleanup

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4ffb82a2d9e9..278b9aa25a1c 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -309,7 +309,6 @@ static unsigned long __init find_function32(struct lib32_elfinfo *lib,
 }
 
 static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64,
   const char *orig, const char *fix)
 {
Elf32_Sym *sym32_gen, *sym32_fix;
@@ -344,7 +343,6 @@ static unsigned long __init find_function32(struct lib32_elfinfo *lib,
 }
 
 static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64,
   const char *orig, const char *fix)
 {
return 0;
@@ -419,8 +417,7 @@ static unsigned long __init find_function64(struct lib64_elfinfo *lib,
 #endif
 }
 
-static int __init vdso_do_func_patch64(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64,
+static int __init vdso_do_func_patch64(struct lib64_elfinfo *v64,
   const char *orig, const char *fix)
 {
Elf64_Sym *sym64_gen, *sym64_fix;
@@ -619,11 +616,9 @@ static __init int vdso_fixup_alt_funcs(struct lib32_elfinfo *v32,
 * It would be easy to do, but doesn't seem to be necessary,
 * patching the OPD symbol is enough.
 */
-   vdso_do_func_patch32(v32, v64, patch->gen_name,
-patch->fix_name);
+   vdso_do_func_patch32(v32, patch->gen_name, patch->fix_name);
 #ifdef CONFIG_PPC64
-   vdso_do_func_patch64(v32, v64, patch->gen_name,
-patch->fix_name);
+   vdso_do_func_patch64(v64, patch->gen_name, patch->fix_name);
 #endif /* CONFIG_PPC64 */
}
 
-- 
2.10.0



[PATCH 1/7] powerpc/vdso: unify return paths in setup_additional_pages

2016-10-25 Thread Dmitry Safonov
Impact: cleanup

Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com>
---
 arch/powerpc/kernel/vdso.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4111d30badfa..4ffb82a2d9e9 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -154,7 +154,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
struct page **vdso_pagelist;
unsigned long vdso_pages;
unsigned long vdso_base;
-   int rc;
+   int ret = 0;
 
if (!vdso_ready)
return 0;
@@ -203,8 +203,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
  ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
  0, 0);
if (IS_ERR_VALUE(vdso_base)) {
-   rc = vdso_base;
-   goto fail_mmapsem;
+   ret = vdso_base;
+   goto out_up_mmap_sem;
}
 
/* Add required alignment. */
@@ -227,21 +227,16 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 * It's fine to use that for setting breakpoints in the vDSO code
 * pages though.
 */
-   rc = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+   ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
 VM_READ|VM_EXEC|
 VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 vdso_pagelist);
-   if (rc) {
+   if (ret)
current->mm->context.vdso_base = 0;
-   goto fail_mmapsem;
-   }
-
-   up_write(&mm->mmap_sem);
-   return 0;
 
- fail_mmapsem:
+out_up_mmap_sem:
up_write(&mm->mmap_sem);
-   return rc;
+   return ret;
 }
 
 const char *arch_vma_name(struct vm_area_struct *vma)
-- 
2.10.0



[PATCH 0/7] powerpc/mm: refactor vDSO mapping code

2016-10-25 Thread Dmitry Safonov
Cleanup patches for vDSO on powerpc.
Originally, I wanted to add vDSO remapping on arm/aarch64, and
I decided to clean up that part on powerpc first.
I've added a vm_ops hook for the vDSO just like I did for x86.
Other changes reduce exhaustive code duplication.
No userspace-visible changes are expected.

Tested on qemu with buildroot rootfs.

Dmitry Safonov (7):
  powerpc/vdso: unify return paths in setup_additional_pages
  powerpc/vdso: remove unused params in vdso_do_func_patch{32,64}
  powerpc/vdso: separate common code in vdso_common
  powerpc/vdso: introduce init_vdso{32,64}_pagelist
  powerpc/vdso: split map_vdso from arch_setup_additional_pages
  powerpc/vdso: switch from legacy_special_mapping_vmops
  mm: kill arch_mremap

 arch/alpha/include/asm/Kbuild|   1 -
 arch/arc/include/asm/Kbuild  |   1 -
 arch/arm/include/asm/Kbuild  |   1 -
 arch/arm64/include/asm/Kbuild|   1 -
 arch/avr32/include/asm/Kbuild|   1 -
 arch/blackfin/include/asm/Kbuild |   1 -
 arch/c6x/include/asm/Kbuild  |   1 -
 arch/cris/include/asm/Kbuild |   1 -
 arch/frv/include/asm/Kbuild  |   1 -
 arch/h8300/include/asm/Kbuild|   1 -
 arch/hexagon/include/asm/Kbuild  |   1 -
 arch/ia64/include/asm/Kbuild |   1 -
 arch/m32r/include/asm/Kbuild |   1 -
 arch/m68k/include/asm/Kbuild |   1 -
 arch/metag/include/asm/Kbuild|   1 -
 arch/microblaze/include/asm/Kbuild   |   1 -
 arch/mips/include/asm/Kbuild |   1 -
 arch/mn10300/include/asm/Kbuild  |   1 -
 arch/nios2/include/asm/Kbuild|   1 -
 arch/openrisc/include/asm/Kbuild |   1 -
 arch/parisc/include/asm/Kbuild   |   1 -
 arch/powerpc/include/asm/mm-arch-hooks.h |  28 --
 arch/powerpc/kernel/vdso.c   | 492 +--
 arch/powerpc/kernel/vdso_common.c| 248 
 arch/s390/include/asm/Kbuild |   1 -
 arch/score/include/asm/Kbuild|   1 -
 arch/sh/include/asm/Kbuild   |   1 -
 arch/sparc/include/asm/Kbuild|   1 -
 arch/tile/include/asm/Kbuild |   1 -
 arch/um/include/asm/Kbuild   |   1 -
 arch/unicore32/include/asm/Kbuild|   1 -
 arch/x86/include/asm/Kbuild  |   1 -
 arch/xtensa/include/asm/Kbuild   |   1 -
 include/asm-generic/mm-arch-hooks.h  |  16 -
 include/linux/mm-arch-hooks.h|  25 --
 mm/mremap.c  |   4 -
 36 files changed, 323 insertions(+), 520 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 create mode 100644 arch/powerpc/kernel/vdso_common.c
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h

-- 
2.10.0



Re: VDSO unmap and remap support for additional architectures

2016-04-29 Thread Dmitry Safonov

On 04/29/2016 04:22 PM, Christopher Covington wrote:

On 04/28/2016 02:53 PM, Andy Lutomirski wrote:

Also, at some point, possibly quite soon, x86 will want a way for
user code to ask the kernel to map a specific vdso variant at a specific
address. Could we perhaps add a new pair of syscalls:

struct vdso_info {
 unsigned long space_needed_before;
 unsigned long space_needed_after;
 unsigned long alignment;
};

long vdso_get_info(unsigned int vdso_type, struct vdso_info *info);

long vdso_remap(unsigned int vdso_type, unsigned long addr, unsigned int flags);

#define VDSO_X86_I386 0
#define VDSO_X86_64 1
#define VDSO_X86_X32 2
// etc.

vdso_remap will map the vdso of the chosen type such that
AT_SYSINFO_EHDR lines up with addr. It will use up to
space_needed_before bytes before that address and space_needed_after
after that address. It will also unmap the old vdso (or maybe only do
that if some flag is set).

On x86, mremap is *not* sufficient for everything that's needed,
because some programs will need to change the vdso type.

I don't understand. Why can't people just exec() the ELF type that
corresponds to the VDSO they want?


Let me explain my need for it: not losing all the existing
information in the application.
Imagine you're restoring a container with 64-bit and 32-bit
applications (in compat mode). So you need to somehow switch
the vdso type in the restorer for a 32-bit application.
Yes, you could exec(), but then all the already-restored application
properties would be lost. You would need to transfer information
about mappings, build a protocol between the restorer binary
and the main criu application, and in the end you'd have a much
more complicated architecture than there is now.
And it would be slower.

Also it's pretty logical: if one can switch between modes,
why can't one change the vdso mapping to the mode one switched to?
(Note: if the work on removing the per-thread compat flags gets
done (on x86), there won't even be such a thing as an application
mode - just a difference in which syscalls it uses: compat or
native.)

Thanks,
Dmitry Safonov