[PATCH] powerpc/setup: display reason for not booting

2018-12-17 Thread Christophe Leroy
When no machine description matches, display it clearly
before looping forever.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/setup-common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 4fe7740917a7..ef7fb60534a8 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -634,7 +634,7 @@ void probe_machine(void)
}
/* What can we do if we didn't find ? */
if (machine_id >= &__machine_desc_end) {
-   DBG("No suitable machine found !\n");
+   pr_err("No suitable machine description found !\n");
for (;;);
}
 
-- 
2.13.3
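
For readers following along, the probe loop this patch touches can be sketched in plain C as below. This is only an illustration under stated assumptions: struct machine_desc_sketch, the two probe callbacks and probe_machine_sketch() are hypothetical names, and the sketch returns NULL where the kernel prints with pr_err() and then spins forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's machine description entry. */
struct machine_desc_sketch {
	const char *name;
	int (*probe)(void);	/* returns non-zero when the platform matches */
};

/* Example probe callbacks, present only for illustration. */
int probe_never(void)  { return 0; }
int probe_always(void) { return 1; }

/*
 * Walk [begin, end) and return the first entry whose probe() succeeds.
 * When nothing matches, say so on stderr and return NULL; the kernel
 * prints the message and then loops in for (;;) instead.
 */
const struct machine_desc_sketch *
probe_machine_sketch(const struct machine_desc_sketch *begin,
		     const struct machine_desc_sketch *end)
{
	const struct machine_desc_sketch *m;

	for (m = begin; m < end; m++)
		if (m->probe())
			return m;

	fprintf(stderr, "No suitable machine description found !\n");
	return NULL;
}
```

The point of the patch is the visibility of the message: DBG() is typically compiled out, so a user would otherwise see a silent hang.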



[PATCH v2] powerpc/perf: Fix loop exit condition in nest_imc_event_init

2018-12-17 Thread Anju T Sudhakar
The data structure (i.e. struct imc_mem_info) to hold the memory address
information for nest imc units is allocated based on the number of nodes
in the system.

nest_imc_event_init() traverses this struct array to calculate the memory
base address for the event-cpu. If we fail to find a match for the event
cpu's chip-id in imc_mem_info struct array, then the do-while loop will
iterate until we crash.

Fix this by changing the loop exit condition based on the number of 
non zero vbase elements in the array, since the allocation is done for
nr_chips + 1.

Reported-by: Dan Carpenter  
Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar 
---
 arch/powerpc/perf/imc-pmu.c   | 2 +-
 arch/powerpc/platforms/powernv/opal-imc.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 4f34c75..d1009fe 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -508,7 +508,7 @@ static int nest_imc_event_init(struct perf_event *event)
break;
}
pcni++;
-   } while (pcni);
+   } while (pcni->vbase != 0);
 
if (!flag)
return -ENODEV;
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 58a0794..3d27f02 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -127,7 +127,7 @@ static int imc_get_mem_addr_nest(struct device_node *node,
nr_chips))
goto error;
 
-   pmu_ptr->mem_info = kcalloc(nr_chips, sizeof(*pmu_ptr->mem_info),
+   pmu_ptr->mem_info = kcalloc(nr_chips + 1, sizeof(*pmu_ptr->mem_info),
GFP_KERNEL);
if (!pmu_ptr->mem_info)
goto error;
-- 
1.8.3.1
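
The idea behind the fix, a zeroed sentinel entry that terminates the walk, can be sketched in standalone C. The names below (struct mem_info_sketch, alloc_mem_info_sketch(), find_chip_sketch()) are hypothetical, calloc() stands in for kcalloc(), and the sketch uses a for loop rather than the driver's do-while.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced version of the per-chip memory info entry. */
struct mem_info_sketch {
	unsigned long vbase;	/* 0 marks the terminating sentinel */
	unsigned int id;	/* chip id */
};

/*
 * Allocate nr_chips + 1 zeroed entries so that the last one always
 * has vbase == 0 and can serve as an end marker.
 */
struct mem_info_sketch *alloc_mem_info_sketch(size_t nr_chips)
{
	return calloc(nr_chips + 1, sizeof(struct mem_info_sketch));
}

/* Return the entry for chip_id, or NULL once the sentinel is reached. */
const struct mem_info_sketch *
find_chip_sketch(const struct mem_info_sketch *p, unsigned int chip_id)
{
	for (; p->vbase != 0; p++)
		if (p->id == chip_id)
			return p;
	return NULL;
}
```

With the old condition, while (pcni), the pointer itself is tested and never becomes NULL by incrementing, which is why a missing chip id ran off the end of the array.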



Re: [PATCH v1 03/13] powerpc/mm/32s: rework mmu_mapin_ram()

2018-12-17 Thread Jonathan Neuschäfer
On Mon, Dec 17, 2018 at 10:29:18AM +0100, Christophe Leroy wrote:
> > With patches 1-3:
> > [0.00] setbat(0, c000, , 0100, 311)
> > [0.00] setbat(2, c100, 0100, 0080, 311)
> > [0.00] setbat(4, d000, 1000, 0200, 791)
> 
> What we see is that BAT0 is not used in the original code. I have always
> wondered about the reason; maybe there is something odd behind it and BAT0
> shall not be used.
> 
> Could you try and modify find_free_bat() so that it starts at b = 1 instead
> of b = 0 ?

In this case, setbat is called with index 2, 3, and 4, but the Wii still
doesn't boot.

> > According to arch/powerpc/include/asm/book3s/32/hash.h,
> >   - 0x591 = _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_COHERENT | _PAGE_PRESENT
> >   - 0x311 = _PAGE_EXEC | _PAGE_ACCESSED | _PAGE_COHERENT | _PAGE_PRESENT
> >   - 0x791 = _PAGE_RW | _PAGE_EXEC | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_COHERENT | _PAGE_PRESENT
> > 
> 
> Yes, patch 1 added _PAGE_EXEC which explains this 0x200.
> Do you confirm it still works well with only patch 1 ?

Patch 1 alone boots to userspace.


Jonathan
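
The flag arithmetic quoted in this message can be checked with a few lines of C. The bit values below are assumptions chosen to be consistent with the decomposition quoted above (they are not copied from hash.h here), and the SK_ names and helper functions are hypothetical.

```c
#include <assert.h>

/* Assumed bit values, consistent with the quoted decomposition. */
enum {
	SK_PAGE_PRESENT  = 0x001,
	SK_PAGE_COHERENT = 0x010,
	SK_PAGE_DIRTY    = 0x080,
	SK_PAGE_ACCESSED = 0x100,
	SK_PAGE_EXEC     = 0x200,
	SK_PAGE_RW       = 0x400
};

/* Flags of a read-write, non-executable mapping (0x591 in the thread). */
unsigned int sk_flags_rw_data(void)
{
	return SK_PAGE_RW | SK_PAGE_ACCESSED | SK_PAGE_DIRTY |
	       SK_PAGE_COHERENT | SK_PAGE_PRESENT;
}

/* Bits that are set in 'after' but were not set in 'before'. */
unsigned int sk_added_bits(unsigned int before, unsigned int after)
{
	return after & ~before;
}
```

Under these assumed values, the difference between 0x591 and 0x791 is exactly the exec bit, which matches the explanation given for the extra 0x200.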




Re: [PATCH] KVM: PPC: Book3S PR: Set hflag to indicate that POWER9 supports 1T segments

2018-12-17 Thread Paul Mackerras
On Fri, Dec 07, 2018 at 02:43:18PM +1100, Suraj Jitindar Singh wrote:
> When booting a kvm-pr guest on a POWER9 machine the following message is
> observed:
> "qemu-system-ppc64: KVM does not support 1TiB segments which guest expects"
> 
> This is because the guest is expecting to be able to use 1T segments
> however we don't indicate support for it. This is because we don't set
> the BOOK3S_HFLAG_MULTI_PGSIZE flag in the hflags in kvmppc_set_pvr_pr()
> on POWER9.
> 
> POWER9 does indeed have support for 1T segments, so add a case for
> POWER9 to the switch statement to ensure it is set.
> 
> Signed-off-by: Suraj Jitindar Singh 

Thanks, patch applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH] KVM: PPC: Book3S HV: Change to use DEFINE_SHOW_ATTRIBUTE macro

2018-12-17 Thread Paul Mackerras
On Mon, Nov 05, 2018 at 09:47:17AM -0500, Yangtao Li wrote:
> Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
> 
> Signed-off-by: Yangtao Li 

Thanks, patch applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH V4 0/8] KVM: PPC: Implement passthrough of emulated devices for nested guests

2018-12-17 Thread Paul Mackerras
On Fri, Dec 14, 2018 at 04:29:02PM +1100, Suraj Jitindar Singh wrote:
> This patch series allows for emulated devices to be passed through to nested
> guests, irrespective of at which level the device is being emulated.
> 
> Note that the emulated device must be using dma, not virtio.
> 
> For example, passing through an emulated e1000:
> 
> 1. Emulate the device at L(n) for L(n+1)
> 
> qemu-system-ppc64 -netdev type=user,id=net0 -device e1000,netdev=net0
> 
> 2. Assign the VFIO-PCI driver at L(n+1)
> 
> echo vfio-pci > /sys/bus/pci/devices/:00:00.0/driver_override
> echo :00:00.0 > /sys/bus/pci/drivers/e1000/unbind
> echo :00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
> chmod 666 /dev/vfio/0
> 
> 3. Pass the device through from L(n+1) to L(n+2)
> 
> qemu-system-ppc64 -device vfio-pci,host=:00:00.0
> 
> 4. L(n+2) can now access the device which will be emulated at L(n)
> 
> V2 -> V3:
> 1/8: None
> 2/8: None
> 3/8: None
> 4/8: None
> 5/8: None
> 6/8: Add ifdef to fix compilation for some platforms
> 7/8: None
> 8/8: None
> 
> Suraj Jitindar Singh (8):
>   KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
>   KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
>   KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
>   KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops
> struct
>   KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
>   KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2
> guest
>   KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants
> 1 & 2
>   KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3
> guest

Thanks, series applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH 7/8] powerpc/dma: split the two __dma_alloc_coherent implementations

2018-12-17 Thread Gerhard Pircher
On 2018-12-17 at 08:35, Christoph Hellwig wrote:
> On Mon, Dec 17, 2018 at 07:51:05AM +0100, Christophe Leroy wrote:
>>
>>
>> On 16/12/2018 at 18:19, Christoph Hellwig wrote:
>>> The implementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
>>> any code with the one for systems with coherent caches.  Split it off
>>> and merge it with the helpers in dma-noncoherent.c that have no other
>>> callers.
>>>
>>> Signed-off-by: Christoph Hellwig 
>>> Acked-by: Benjamin Herrenschmidt 
>>> ---
>>>   arch/powerpc/include/asm/dma-mapping.h |  5 -
>>>   arch/powerpc/kernel/dma.c  | 14 ++
>>
>> Instead of all the ifdefs in dma.c, couldn't we split it
>> in two files, ie dma.c for common parts and dma-coherence.c for specific 
>> stuff ?
> 
> The end goal is to kill dma.c and keep dma-noncoherent.c only with most
> of the code moving to common code.  Here is the current state of that:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.5
> 
> But it still has issues on two tested platforms and isn't ready yet.

I hope that I can give this a try on one of my AmigaOne machines over
Christmas. Unfortunately my main local AmigaOne machine is out of order
and the other one is only remotely accessible, which makes kernel testing
a bit hard. :-)

Gerhard


Re: [PATCH 2/2] s390/pci: handle function enumeration after sriov enablement

2018-12-17 Thread Christoph Hellwig
On Mon, Dec 17, 2018 at 06:30:18PM +0100, Sebastian Ott wrote:
> Something like this:
> https://lore.kernel.org/linux-pci/20181212215453.gj99...@google.com/T/#m649d86ea3c65f669c74d048f89afbaf473876ac3

No, I literally meant a flag to skip the work.  Think about it: there
is a standard way to probe VFs, which comes from what is defined in the
PCIe spec itself.  It just turns out s390 for some weird reason decides
to already let the VFs show up basically like PFs.  There really should
be no reason to branch out into per-arch code here as there really
isn't much to do on a per-arch level.  It is more of a quirk: the
firmware is buggy and already reports the VFs to us, so skip the
probing.


Re: [PATCH 2/2] s390/pci: handle function enumeration after sriov enablement

2018-12-17 Thread Sebastian Ott
On Fri, 14 Dec 2018, Christoph Hellwig wrote:
> On Fri, Dec 14, 2018 at 05:12:45AM -0800, Christoph Hellwig wrote:
> > On Thu, Dec 13, 2018 at 06:54:28PM +0100, Sebastian Ott wrote:
> > > Implement pcibios_sriov_{add|del}_vfs as empty functions. VF
> > > creation will be triggered by the hotplug code.
> > 
> > And instead of having the arch supply a no-op arch override I
> > think it would be better to have the config option just stub it
> > out in common code.
> 
> Or in fact maybe even a runtime flag in struct pci_dev.  Who knows
> if all future s390 PCIe busses will have exactly the same behavior
> or if we eventually get the standards compliant behavior back?

Something like this:
https://lore.kernel.org/linux-pci/20181212215453.gj99...@google.com/T/#m649d86ea3c65f669c74d048f89afbaf473876ac3

Not a runtime flag, but a function pointer in struct pci_host_bridge.
This would provide the requested flexibility. The problem with this
approach is that it requires other patches that are not yet upstream
(https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=pci-probe-rework).

Since this discussion has been going on for a few months and I want to
have this code upstream and in distributions for HW enablement I've
asked Bjorn to go with the initial approach (weak functions) and
promised to move that to struct pci_host_bridge once Arnd's patches
are upstream. Would that be OK for you too?

Regards,
Sebastian



Re: [RESEND PATCH] kernel/dma/direct: Do not include SME mask in the DMA supported check

2018-12-17 Thread Christoph Hellwig
Thanks,

applied to the dma-mapping for-linus tree.


Re: [PATCH] kernel/dma/direct: Do not include SME mask in the DMA supported check

2018-12-17 Thread Lendacky, Thomas
On 12/16/2018 05:41 PM, Tom Lendacky wrote:
> On 12/15/2018 04:55 AM, Christoph Hellwig wrote:
>> The mail seems to be so oddly encoded that git-am fails on it.  Can
>> you resend as plain text?
> 
> Hmmm... not sure what happened with that, but yeah, looking at the message
> source shows something strange went on. Let me take a look and I'll try to
> get a good version to you tomorrow (Monday).

Must have been something with stgit... just resent using git format-patch
and git send-email and it looks ok.  Let me know if it's still not right
when it gets to you.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>


[RESEND PATCH] kernel/dma/direct: Do not include SME mask in the DMA supported check

2018-12-17 Thread Lendacky, Thomas
The dma_direct_supported() function intends to check the DMA mask against
specific values. However, the phys_to_dma() function includes the SME
encryption mask, which defeats the intended purpose of the check. This
prevents drivers that support less than 48-bit DMA (SME encryption mask
is bit 47) from being able to set the DMA mask successfully when SME is
active, which results in the driver failing to initialize.

Change the function used to check the mask from phys_to_dma() to
__phys_to_dma() so that the SME encryption mask is not part of the check.

Fixes: c1d0af1a1d5d ("kernel/dma/direct: take DMA offset into account in dma_direct_supported")
Signed-off-by: Tom Lendacky 
---
 kernel/dma/direct.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 22a12ab..375c77e 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -309,7 +309,12 @@ int dma_direct_supported(struct device *dev, u64 mask)
 
min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
 
-   return mask >= phys_to_dma(dev, min_mask);
+   /*
+* This check needs to be against the actual bit mask value, so
+* use __phys_to_dma() here so that the SME encryption mask isn't
+* part of the check.
+*/
+   return mask >= __phys_to_dma(dev, min_mask);
 }
 
 int dma_direct_mapping_error(struct device *dev, dma_addr_t dma_addr)
-- 
1.9.1
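
To make the effect of the encryption bit on the comparison concrete, here is a small user-space sketch. Everything in it is an assumption made for illustration: the two helper functions only model "address with the SME bit ORed in" versus "plain address", and the bit position 47 is taken from the commit message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SME encryption mask as described in the commit message: bit 47. */
#define SK_SME_MASK (UINT64_C(1) << 47)

/* Models a translation that adds the encryption bit. */
uint64_t sk_phys_to_dma_with_sme(uint64_t paddr)
{
	return paddr | SK_SME_MASK;
}

/* Models a translation that leaves the encryption bit out. */
uint64_t sk_phys_to_dma_plain(uint64_t paddr)
{
	return paddr;
}

/* Old check: the encryption bit leaks into the comparison. */
bool sk_supported_with_sme(uint64_t mask, uint64_t min_mask)
{
	return mask >= sk_phys_to_dma_with_sme(min_mask);
}

/* New check: compare against the plain bit mask value. */
bool sk_supported_without_sme(uint64_t mask, uint64_t min_mask)
{
	return mask >= sk_phys_to_dma_plain(min_mask);
}
```

A 32-bit device mask fails the first check purely because of bit 47 and passes the second, which is the behaviour change the patch describes.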



[PATCH] powerpc/prom: move the device tree if not in declared memory.

2018-12-17 Thread Christophe Leroy
If the device tree doesn't reside in the memory which is declared
inside it, it has to be moved as well as this memory will not be
mapped by the kernel.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/prom.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 87a68e2dc531..4181ec715f88 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -124,8 +124,8 @@ static void __init move_device_tree(void)
size = fdt_totalsize(initial_boot_params);
 
if ((memory_limit && (start + size) > PHYSICAL_START + memory_limit) ||
-   overlaps_crashkernel(start, size) ||
-   overlaps_initrd(start, size)) {
+   !memblock_is_memory(start + size - 1) ||
+   overlaps_crashkernel(start, size) || overlaps_initrd(start, size)) {
p = __va(memblock_phys_alloc(size, PAGE_SIZE));
memcpy(p, initial_boot_params, size);
initial_boot_params = p;
-- 
2.13.3
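
The added condition asks whether the last byte of the device tree blob lies inside declared memory. A rough standalone model is shown below; struct sk_mem_region and both functions are hypothetical, and a simple linear scan stands in for memblock_is_memory().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one declared memory range. */
struct sk_mem_region {
	uint64_t base;
	uint64_t size;
};

/* True when addr falls inside any of the n declared regions. */
bool sk_is_memory(const struct sk_mem_region *r, size_t n, uint64_t addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (addr >= r[i].base && addr - r[i].base < r[i].size)
			return true;
	return false;
}

/*
 * Mirrors only the new condition of the patch: the blob must be moved
 * when its last byte (start + size - 1) is not declared memory.
 */
bool sk_fdt_outside_memory(const struct sk_mem_region *r, size_t n,
			   uint64_t start, uint64_t size)
{
	return !sk_is_memory(r, n, start + size - 1);
}
```

Note that the sketch, like the patch, tests only the end of the blob; it does not check every byte in between.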



Re: [PATCH v5 0/5] powerpc: system call table generation support

2018-12-17 Thread Michael Ellerman
Firoz Khan  writes:

> Hi Michael,
>
> On Mon, 17 Dec 2018 at 16:01, Michael Ellerman  wrote:
>> No it's fine if it applies on next.
>>
>> I can also fix up minor merge conflicts if there are any.
>
> Ohh. I already rebased and sent v6.

That's OK.

cheers


Re: [PATCH NEXT v2 1/4] powerpc/pasemi: Add PCI initialisation for Nemo board.

2018-12-17 Thread Michael Ellerman
Darren Stevens  writes:

> Michael,
>
> Any comments on these?

Hi Darren,

I guess in general we'd like more of this to come from the device tree.

But I'll merge this series as-is, because I don't think it helps anyone
to have this code out-of-tree. We can always clean things up further in
future if anyone has the time & motivation.

cheers

> On 19/08/2018, Darren Stevens wrote:
>> The A-Eon AmigaOne X1000's Nemo motherboard has an AMD SB600
>> connected to one of the PCI-e root ports on its PaSemi
>> PWRficient 1682M SoC. Normally the SB600 southbridge would be
>> connected to a hidden PCI-e port on the system's northbridge,
>> and as a result doesn't fully comply with the PCI-e spec.
>> 
>> Add code to relax the PCI-e detection in both the root port
>> and the Linux kernel allowing on board devices to be detected.
>> 
>> Signed-off-by: Darren Stevens 
>>
>> ---
>>
>> Changes made:
>>
>> v2: Replaced sb600_bus with a define, moved iob_mapbase into 
>> sb600_set_flag()
>> Created some register/Flag names (as I don't have the docs
>> for the PA6T-1682M)


Re: [PATCH v2] powerpc/mm: make NULL pointer deferences explicit on bad page faults.

2018-12-17 Thread Michael Ellerman
Christophe Leroy  writes:

> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 01b9bcc7fa85..3398291f4785 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -636,21 +636,24 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
>   switch (TRAP(regs)) {
>   case 0x300:
>   case 0x380:
> - printk(KERN_ALERT "Unable to handle kernel paging request for "
> - "data at address 0x%08lx\n", regs->dar);
> + if (regs->dar < PAGE_SIZE)
> + pr_alert("BUG: Kernel NULL pointer dereference");
> + else
> + pr_alert("BUG: Unable to handle kernel data access");
> + pr_cont(" at 0x%08lx\n", regs->dar);

It's best to avoid pr_cont() as it can lead to interleaving, so I
rewrote this as:

pr_alert("BUG: %s at 0x%08lx\n",
 regs->dar < PAGE_SIZE ? "Kernel NULL pointer dereference" :
 "Unable to handle kernel data access", regs->dar);


>   break;
>   case 0x400:
>   case 0x480:
> - printk(KERN_ALERT "Unable to handle kernel paging request for "
> - "instruction fetch\n");
> + pr_alert("BUG: Unable to handle kernel instruction fetch%s",
> +  regs->nip < PAGE_SIZE ? " (NULL pointer ?)\n" : "\n");
   I dropped the space here ^


cheers
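
The single-call form can be tried outside the kernel with snprintf(). This is a sketch only: sk_format_fault() is a hypothetical helper, and the page size is passed in as a parameter instead of using PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the whole message in one formatting call, so no second call
 * (the pr_cont() in the original patch) is needed to finish the line.
 */
int sk_format_fault(char *buf, size_t len, unsigned long dar,
		    unsigned long page_size)
{
	return snprintf(buf, len, "BUG: %s at 0x%08lx\n",
			dar < page_size ? "Kernel NULL pointer dereference" :
			"Unable to handle kernel data access", dar);
}
```

Because the line is emitted in one piece, output from another CPU cannot land between the text and the address.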


Re: [PATCH v1 3/9] powerpc/vdso: don't clear PG_reserved

2018-12-17 Thread Michael Ellerman
David Hildenbrand  writes:

> The VDSO is part of the kernel image and therefore the struct pages are
> marked as reserved during boot.
>
> As we install a special mapping, the actual struct pages will never be
> exposed to MM via the page tables. We can therefore leave the pages
> marked as reserved.
>
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Christophe Leroy 
> Cc: Kees Cook 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Matthew Wilcox 
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/kernel/vdso.c | 2 --
>  1 file changed, 2 deletions(-)

Thanks.

Acked-by: Michael Ellerman  (powerpc)

cheers

> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
> index 65b3bdb99f0b..d59dc2e9a695 100644
> --- a/arch/powerpc/kernel/vdso.c
> +++ b/arch/powerpc/kernel/vdso.c
> @@ -795,7 +795,6 @@ static int __init vdso_init(void)
>   BUG_ON(vdso32_pagelist == NULL);
>   for (i = 0; i < vdso32_pages; i++) {
>   struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
> - ClearPageReserved(pg);
>   get_page(pg);
>   vdso32_pagelist[i] = pg;
>   }
> @@ -809,7 +808,6 @@ static int __init vdso_init(void)
>   BUG_ON(vdso64_pagelist == NULL);
>   for (i = 0; i < vdso64_pages; i++) {
>   struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
> - ClearPageReserved(pg);
>   get_page(pg);
>   vdso64_pagelist[i] = pg;
>   }
> -- 
> 2.17.2


Re: [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods

2018-12-17 Thread Michael Ellerman
Christoph Hellwig  writes:

> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index dbfc7056d7df..d442d23e182b 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -247,6 +252,8 @@ static inline void dma_nommu_unmap_page(struct device *dev,
>enum dma_data_direction direction,
>unsigned long attrs)
>  {
> + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> + __dma_sync(bus_to_virt(dma_address), size, dir);

I did s/dir/direction here.

cheers


Re: [PATCH] powerpc/mm: make NULL pointer deferences explicit on bad page faults.

2018-12-17 Thread Michael Ellerman
Christophe Leroy  writes:
> Hi Michael,
>
> On 14/12/2018 at 01:57, Michael Ellerman wrote:
>> Hi Christophe,
>> 
>> You know it's the trivial patches that are going to get lots of review
>> comments :)
>
> I'm so happy to get comments.

Haha :)

>> Christophe Leroy  writes:
>>> Like several other arches including x86, this patch makes it explicit
>>> that a bad page fault is a NULL pointer dereference when the fault
>>> address is lower than PAGE_SIZE
>> 
>> I'm being pedantic, but it's not necessarily a NULL pointer dereference.
>> It might just be a direct access to a low address, eg:
>> 
>>   char *p = 0x100;
>>   *p = 0;
>> 
>> That's not a NULL pointer dereference.
>> 
>> But other arches do print this so I guess it's OK to add, and in most
>> cases it will be an actual NULL pointer dereference.
>> 
>> I wonder though if we should use 4096 rather than PAGE_SIZE, given
>> that's the actual value other arches are using. We support 256K pages on
>> some systems, which is getting quite large.
>
> Those invalid accesses are caught because the first page is marked non
> present or non accessible in the page table, so I think using PAGE_SIZE
> here is valid regardless of the page size.

It's not a question of whether we catch the fault it's what we print
when we catch it. Most of the time on 64-bit the first few GB of the
page tables will be empty, so those will all fault, but we don't call
them NULL pointer dereferences.

So I'm just saying that this is a heuristic, ie. an access close to zero
is probably an access at a small offset from a NULL pointer, but it may
not be. And so it's kind of arbitrary where we decide to make the cut
off point between printing that it's a NULL pointer vs a regularly bad
access.

Anyway I'm happy to use PAGE_SIZE for now, if anyone complains we can
always change it.

>> What about:
>> 
>>BUG: Unable to handle kernel instruction fetch at 0x
>
> I think we still need to make it explicit that we jumped there due to a
> NULL function pointer, although I don't have a good text idea yet for
> this.

Being pedantic again, we don't know that it was a NULL function pointer.
You might have done a bad setcontext and set your NIP to zero.

But it's probably fine to print it as a hint, and it's probably right
most of the time.

cheers
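
How much the cut-off matters can be seen with a tiny classifier. The enum and function below are hypothetical; the cut-off is passed in so the 4096 and 256K cases discussed above can be compared side by side.

```c
#include <assert.h>

enum sk_fault_kind {
	SK_FAULT_NULL_DEREF_HINT,	/* probably NULL plus a small offset */
	SK_FAULT_BAD_ACCESS		/* any other bad address */
};

/*
 * Heuristic only: an address below the cut-off is reported as a likely
 * NULL pointer dereference, everything else as a plain bad access.
 */
enum sk_fault_kind sk_classify_fault(unsigned long addr, unsigned long cutoff)
{
	return addr < cutoff ? SK_FAULT_NULL_DEREF_HINT : SK_FAULT_BAD_ACCESS;
}
```

With a 256K cut-off an access at 0x10000 is labelled as a NULL dereference, while with 4096 it is not, which is the arbitrariness being pointed out in this message.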


Re: [PATCH] powerpc/ptrace: cleanup do_syscall_trace_enter

2018-12-17 Thread Oleg Nesterov
On 12/16, Dmitry V. Levin wrote:
>
>  long do_syscall_trace_enter(struct pt_regs *regs)
>  {
> + u32 cached_flags;
> +
>   user_exit();
>  
> - if (test_thread_flag(TIF_SYSCALL_EMU)) {
> - /*
> -  * A nonzero return code from tracehook_report_syscall_entry()
> -  * tells us to prevent the syscall execution, but we are not
> -  * going to execute it anyway.
> -  *
> -  * Returning -1 will skip the syscall execution. We want to
> -  * avoid clobbering any register also, thus, not 'gotoing'
> -  * skip label.
> -  */
> - if (tracehook_report_syscall_entry(regs))
> - ;
> - return -1;
> - }
> + cached_flags = READ_ONCE(current_thread_info()->flags) &
> +(_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE);
>  
> - /*
> -  * The tracer may decide to abort the syscall, if so tracehook
> -  * will return !0. Note that the tracer may also just change
> -  * regs->gpr[0] to an invalid syscall number, that is handled
> -  * below on the exit path.
> -  */
> - if (test_thread_flag(TIF_SYSCALL_TRACE) &&
> - tracehook_report_syscall_entry(regs))
> - goto skip;
> + if (cached_flags) {
> + int rc = tracehook_report_syscall_entry(regs);
> +
> + if (unlikely(cached_flags & _TIF_SYSCALL_EMU)) {
> + /*
> +  * A nonzero return code from
> +  * tracehook_report_syscall_entry() tells us
> +  * to prevent the syscall execution, but
> +  * we are not going to execute it anyway.
> +  *
> +  * Returning -1 will skip the syscall execution.
> +  * We want to avoid clobbering any register also,
> +  * thus, not 'gotoing' skip label.
> +  */
> + return -1;
> + }
> +
> + if (rc) {
> + /*
> +  * The tracer decided to abort the syscall.
> +  * Note that the tracer may also just change
> +  * regs->gpr[0] to an invalid syscall number,
> +  * that is handled below on the exit path.
> +  */
> + goto skip;
> + }
> + }

Looks good to me,

Oleg.
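
The control flow of the reworked function can be modelled in user space as follows. All names are hypothetical (the SK_ flag values are arbitrary, and a callback with a call counter stands in for tracehook_report_syscall_entry()); the sketch only illustrates that the report hook runs once and that the emulation case wins over the return code.

```c
#include <assert.h>

/* Arbitrary flag values for the sketch, not the kernel's. */
#define SK_TIF_SYSCALL_TRACE 0x1u
#define SK_TIF_SYSCALL_EMU   0x2u

enum sk_action { SK_RUN_SYSCALL, SK_SKIP_NO_CLOBBER, SK_SKIP };

/* Counts how often the report hook was invoked. */
int sk_report_calls;

int sk_report_continue(void) { sk_report_calls++; return 0; }
int sk_report_abort(void)    { sk_report_calls++; return 1; }

enum sk_action sk_trace_enter(unsigned int thread_flags, int (*report)(void))
{
	/* Read the relevant flags once and work on the snapshot. */
	unsigned int flags = thread_flags &
			     (SK_TIF_SYSCALL_EMU | SK_TIF_SYSCALL_TRACE);

	if (flags) {
		int rc = report();	/* single call for both cases */

		/* Emulation: never run the syscall, ignore rc. */
		if (flags & SK_TIF_SYSCALL_EMU)
			return SK_SKIP_NO_CLOBBER;

		/* Tracing: a non-zero rc means the tracer aborted it. */
		if (rc)
			return SK_SKIP;
	}
	return SK_RUN_SYSCALL;
}
```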



Re: [PATCH] powerpc/ptrace: cleanup do_syscall_trace_enter

2018-12-17 Thread Dmitry V. Levin
Hi,

On Mon, Dec 17, 2018 at 10:20:26PM +1100, Michael Ellerman wrote:
> "Dmitry V. Levin"  writes:
> > Invoke tracehook_report_syscall_entry once.
> 
> Thanks.
> 
> > Signed-off-by: Dmitry V. Levin 
> > ---
> >  arch/powerpc/kernel/ptrace.c | 54 +---
> >  1 file changed, 31 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> > index 714c3480c52d..8794d32c2d9e 100644
> > --- a/arch/powerpc/kernel/ptrace.c
> > +++ b/arch/powerpc/kernel/ptrace.c
> > @@ -3263,32 +3263,40 @@ static inline int do_seccomp(struct pt_regs *regs) { return 0; }
> >   */
> >  long do_syscall_trace_enter(struct pt_regs *regs)
> >  {
> > +   u32 cached_flags;
> > +
> 
> Do you mind if I just call it "flags", I find "cached_flags" a bit
> unwieldy for some reason.
> 
> I'm happy to fix it up when applying.

No problem, feel free to call it whatever you like.  Thanks,


-- 
ldv




Re: [PATCH] powerpc/ptrace: cleanup do_syscall_trace_enter

2018-12-17 Thread Michael Ellerman
"Dmitry V. Levin"  writes:
> Invoke tracehook_report_syscall_entry once.

Thanks.

> Signed-off-by: Dmitry V. Levin 
> ---
>  arch/powerpc/kernel/ptrace.c | 54 +---
>  1 file changed, 31 insertions(+), 23 deletions(-)
>
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 714c3480c52d..8794d32c2d9e 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -3263,32 +3263,40 @@ static inline int do_seccomp(struct pt_regs *regs) { return 0; }
>   */
>  long do_syscall_trace_enter(struct pt_regs *regs)
>  {
> + u32 cached_flags;
> +

Do you mind if I just call it "flags", I find "cached_flags" a bit
unwieldy for some reason.

I'm happy to fix it up when applying.

cheers

>   user_exit();
>  
> - if (test_thread_flag(TIF_SYSCALL_EMU)) {
> - /*
> -  * A nonzero return code from tracehook_report_syscall_entry()
> -  * tells us to prevent the syscall execution, but we are not
> -  * going to execute it anyway.
> -  *
> -  * Returning -1 will skip the syscall execution. We want to
> -  * avoid clobbering any register also, thus, not 'gotoing'
> -  * skip label.
> -  */
> - if (tracehook_report_syscall_entry(regs))
> - ;
> - return -1;
> - }
> + cached_flags = READ_ONCE(current_thread_info()->flags) &
> +(_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE);
>  
> - /*
> -  * The tracer may decide to abort the syscall, if so tracehook
> -  * will return !0. Note that the tracer may also just change
> -  * regs->gpr[0] to an invalid syscall number, that is handled
> -  * below on the exit path.
> -  */
> - if (test_thread_flag(TIF_SYSCALL_TRACE) &&
> - tracehook_report_syscall_entry(regs))
> - goto skip;
> + if (cached_flags) {
> + int rc = tracehook_report_syscall_entry(regs);
> +
> + if (unlikely(cached_flags & _TIF_SYSCALL_EMU)) {
> + /*
> +  * A nonzero return code from
> +  * tracehook_report_syscall_entry() tells us
> +  * to prevent the syscall execution, but
> +  * we are not going to execute it anyway.
> +  *
> +  * Returning -1 will skip the syscall execution.
> +  * We want to avoid clobbering any register also,
> +  * thus, not 'gotoing' skip label.
> +  */
> + return -1;
> + }
> +
> + if (rc) {
> + /*
> +  * The tracer decided to abort the syscall.
> +  * Note that the tracer may also just change
> +  * regs->gpr[0] to an invalid syscall number,
> +  * that is handled below on the exit path.
> +  */
> + goto skip;
> + }
> + }
>  
>   /* Run seccomp after ptrace; allow it to set gpr[3]. */
>   if (do_seccomp(regs))
> -- 
> ldv


Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache

2018-12-17 Thread Michael Ellerman
Hi Frank,

frowand.l...@gmail.com writes:
> From: Frank Rowand 
>
> Non-overlay dynamic devicetree node removal may leave the node in
> the phandle cache.  Subsequent calls to of_find_node_by_phandle()
> will incorrectly find the stale entry.  Remove the node from the
> cache.
>
> Add paranoia checks in of_find_node_by_phandle() as a second level
> of defense (do not return cached node if detached, do not add node
> to cache if detached).
>
> Reported-by: Michael Bringmann 
> Signed-off-by: Frank Rowand 
> ---

Similarly here can we add:

Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()")
Cc: sta...@vger.kernel.org # v4.17+


Thanks for doing this series.

Some minor comments below.

> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 6c33d63361b8..ad71864cecf5 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -162,6 +162,27 @@ int of_free_phandle_cache(void)
>  late_initcall_sync(of_free_phandle_cache);
>  #endif
>  
> +/*
> + * Caller must hold devtree_lock.
> + */
> +void __of_free_phandle_cache_entry(phandle handle)
> +{
> + phandle masked_handle;
> +
> + if (!handle)
> + return;

We could fold the phandle_cache check into that if and return early for
both cases couldn't we?

> + masked_handle = handle & phandle_cache_mask;
> +
> + if (phandle_cache) {

Meaning this wouldn't be necessary.

> + if (phandle_cache[masked_handle] &&
> + handle == phandle_cache[masked_handle]->phandle) {
> + of_node_put(phandle_cache[masked_handle]);
> + phandle_cache[masked_handle] = NULL;
> + }

A temporary would help the readability here I think, eg:

struct device_node *np;
np = phandle_cache[masked_handle];

if (np && handle == np->phandle) {
of_node_put(np);
phandle_cache[masked_handle] = NULL;
}

> @@ -1209,11 +1230,18 @@ struct device_node *of_find_node_by_phandle(phandle handle)
>   if (phandle_cache[masked_handle] &&
>   handle == phandle_cache[masked_handle]->phandle)
>   np = phandle_cache[masked_handle];
> + if (np && of_node_check_flag(np, OF_DETACHED)) {
> + WARN_ON(1);
> + of_node_put(np);

Do we really want to do the put here?

We're here because something has gone wrong, possibly even memory
corruption such that np is not even pointing at a device node anymore.
So it seems like it would be safer to just leave the ref count alone,
possibly leak a small amount of memory, and NULL out the reference.


cheers
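
Putting the two suggestions from this review together (early return for both cases, plus a temporary), the entry removal could look roughly like the standalone sketch below. The types and names are hypothetical, the cache is passed as a parameter instead of being a file-scope variable, and a plain counter decrement stands in for of_node_put().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced node: only what the sketch needs. */
struct sk_node {
	unsigned int phandle;
	int refcount;
};

struct sk_cache {
	struct sk_node **slots;	/* NULL when no cache is allocated */
	unsigned int mask;	/* number of slots minus one */
};

void sk_free_cache_entry(struct sk_cache *c, unsigned int handle)
{
	struct sk_node *np;
	unsigned int masked;

	/* Both early-exit cases folded into one check. */
	if (!handle || !c->slots)
		return;

	masked = handle & c->mask;
	np = c->slots[masked];

	/* Only drop the entry if the slot really holds this phandle. */
	if (np && handle == np->phandle) {
		np->refcount--;		/* stands in for of_node_put() */
		c->slots[masked] = NULL;
	}
}
```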


Re: [PATCH v2 1/2] of: of_node_get()/of_node_put() nodes held in phandle cache

2018-12-17 Thread Michael Ellerman
Hi Frank,

frowand.l...@gmail.com writes:
> From: Frank Rowand 
>
> The phandle cache contains struct device_node pointers.  The refcount
> of the pointers was not incremented while in the cache, allowing use
> after free error after kfree() of the node.  Add the proper increment
> and decrement of the use count.
>
> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()")

Can we also add:

Cc: sta...@vger.kernel.org # v4.17+


This and the next patch solve WARN_ONs and other problems for us on some
systems so I think they meet the criteria for a stable backport.

Rest of the patch LGTM, I'm not able to test it unfortunately, I have to
defer to mwb for that.

cheers

> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 09692c9b32a7..6c33d63361b8 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -116,9 +116,6 @@ int __weak of_node_to_nid(struct device_node *np)
>  }
>  #endif
>  
> -static struct device_node **phandle_cache;
> -static u32 phandle_cache_mask;
> -
>  /*
>   * Assumptions behind phandle_cache implementation:
>   *   - phandle property values are in a contiguous range of 1..n
> @@ -127,6 +124,44 @@ int __weak of_node_to_nid(struct device_node *np)
>   *   - the phandle lookup overhead reduction provided by the cache
>   * will likely be less
>   */
> +
> +static struct device_node **phandle_cache;
> +static u32 phandle_cache_mask;
> +
> +/*
> + * Caller must hold devtree_lock.
> + */
> +static void __of_free_phandle_cache(void)
> +{
> + u32 cache_entries = phandle_cache_mask + 1;
> + u32 k;
> +
> + if (!phandle_cache)
> + return;
> +
> + for (k = 0; k < cache_entries; k++)
> + of_node_put(phandle_cache[k]);
> +
> + kfree(phandle_cache);
> + phandle_cache = NULL;
> +}
> +
> +int of_free_phandle_cache(void)
> +{
> + unsigned long flags;
> +
> + raw_spin_lock_irqsave(&devtree_lock, flags);
> +
> + __of_free_phandle_cache();
> +
> + raw_spin_unlock_irqrestore(&devtree_lock, flags);
> +
> + return 0;
> +}
> +#if !defined(CONFIG_MODULES)
> +late_initcall_sync(of_free_phandle_cache);
> +#endif
> +
>  void of_populate_phandle_cache(void)
>  {
>   unsigned long flags;
> @@ -136,8 +171,7 @@ void of_populate_phandle_cache(void)
>  
>   raw_spin_lock_irqsave(&devtree_lock, flags);
>  
> - kfree(phandle_cache);
> - phandle_cache = NULL;
> + __of_free_phandle_cache();
>  
>   for_each_of_allnodes(np)
>   if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
> @@ -155,30 +189,15 @@ void of_populate_phandle_cache(void)
>   goto out;
>  
>   for_each_of_allnodes(np)
> - if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
> + if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) {
> + of_node_get(np);
>   phandle_cache[np->phandle & phandle_cache_mask] = np;
> + }
>  
>  out:
>   raw_spin_unlock_irqrestore(&devtree_lock, flags);
>  }
>  
> -int of_free_phandle_cache(void)
> -{
> - unsigned long flags;
> -
> - raw_spin_lock_irqsave(&devtree_lock, flags);
> -
> - kfree(phandle_cache);
> - phandle_cache = NULL;
> -
> - raw_spin_unlock_irqrestore(&devtree_lock, flags);
> -
> - return 0;
> -}
> -#if !defined(CONFIG_MODULES)
> -late_initcall_sync(of_free_phandle_cache);
> -#endif
> -
>  void __init of_core_init(void)
>  {
>   struct device_node *np;
> @@ -1195,8 +1214,11 @@ struct device_node *of_find_node_by_phandle(phandle 
> handle)
>   if (!np) {
>   for_each_of_allnodes(np)
>   if (np->phandle == handle) {
> - if (phandle_cache)
> + if (phandle_cache) {
> + /* will put when removed from cache */
> + of_node_get(np);
>   phandle_cache[masked_handle] = np;
> + }
>   break;
>   }
>   }
> -- 
> Frank Rowand 
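
For readers following the refcount change in the quoted patch, here is a
minimal user-space sketch of the idea: each cache slot holds a reference
on the node it points to, and that reference is dropped when the cache is
freed. The names struct node, node_get(), node_put() and the cache_*()
helpers are hypothetical stand-ins for struct device_node, of_node_get(),
of_node_put() and the phandle cache code; there is no locking here, and
dropping the reference of an evicted entry in cache_store() is a choice
made for this sketch, not something the quoted diff shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device_node: an id plus a refcount. */
struct node {
	unsigned int phandle;
	int refcount;
};

static void node_get(struct node *n) { if (n) n->refcount++; }
static void node_put(struct node *n) { if (n) n->refcount--; }

#define CACHE_ENTRIES 4u			/* must be a power of two */
#define CACHE_MASK (CACHE_ENTRIES - 1u)

static struct node **cache;

/* Drop the reference held by every slot, then free the cache itself. */
static void cache_free(void)
{
	unsigned int k;

	if (!cache)
		return;

	for (k = 0; k < CACHE_ENTRIES; k++)
		node_put(cache[k]);	/* node_put(NULL) is a no-op */

	free(cache);
	cache = NULL;
}

/* (Re)allocate an empty cache; returns 0 on success, -1 on failure. */
static int cache_alloc(void)
{
	cache_free();
	cache = calloc(CACHE_ENTRIES, sizeof(*cache));
	return cache ? 0 : -1;
}

/* Store a node; the slot owns one reference until it is evicted or freed. */
static void cache_store(struct node *n)
{
	unsigned int slot;

	assert(n != NULL);
	if (!cache)
		return;

	slot = n->phandle & CACHE_MASK;
	node_get(n);			/* reference owned by the cache slot */
	node_put(cache[slot]);		/* sketch-only: release evicted entry */
	cache[slot] = n;
}
```

With this pairing, a node that sits in the cache cannot have its last
reference dropped elsewhere while the cache still points at it.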


Re: [PATCH v5 0/5] powerpc: system call table generation support

2018-12-17 Thread Firoz Khan
Hi Michael,

On Mon, 17 Dec 2018 at 16:01, Michael Ellerman  wrote:
> No it's fine if it applies on next.
>
> I can also fix up minor merge conflicts if there are any.

Ohh. I already rebased and sent v6.

Thanks
Firoz


[PATCH v6 5/5] powerpc: generate uapi header and system call table files

2018-12-17 Thread Firoz Khan
The system call table generation script must be run to
generate the unistd_32/64.h and
syscall_table_32/64/c32/spu.h files. This patch adds the
changes that invoke the script.

This patch generates the unistd_32/64.h and
syscall_table_32/64/c32/spu.h files via the syscall table
generation script invoked by arch/powerpc/Makefile. The
generated files must be identical to the removed files.

The generated uapi header file will be included in
uapi/asm/unistd.h and the generated system call table
header file will be included by the kernel/systbl.S file.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/Makefile   |   3 +
 arch/powerpc/include/asm/Kbuild |   4 +
 arch/powerpc/include/asm/systbl.h   | 395 
 arch/powerpc/include/uapi/asm/Kbuild|   2 +
 arch/powerpc/include/uapi/asm/unistd.h  | 392 +--
 arch/powerpc/kernel/Makefile|  10 -
 arch/powerpc/kernel/systbl.S|  52 +---
 arch/powerpc/kernel/systbl_chk.c|  60 -
 arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
 9 files changed, 26 insertions(+), 909 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/systbl.h
 delete mode 100644 arch/powerpc/kernel/systbl_chk.c

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 8a2ce14..34897191 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -402,6 +402,9 @@ archclean:
 
 archprepare: checkbin
 
+archheaders:
+   $(Q)$(MAKE) $(build)=arch/powerpc/kernel/syscalls all
+
 ifdef CONFIG_STACKPROTECTOR
 prepare: stack_protector_prepare
 
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 3196d22..77ff7fb 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -1,3 +1,7 @@
+generated-y += syscall_table_32.h
+generated-y += syscall_table_64.h
+generated-y += syscall_table_c32.h
+generated-y += syscall_table_spu.h
 generic-y += div64.h
 generic-y += export.h
 generic-y += irq_regs.h
diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
deleted file mode 100644
index c4321b9..000
--- a/arch/powerpc/include/asm/systbl.h
+++ /dev/null
@@ -1,395 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * List of powerpc syscalls. For the meaning of the _SPU suffix see
- * arch/powerpc/platforms/cell/spu_callbacks.c
- */
-
-SYSCALL(restart_syscall)
-SYSCALL(exit)
-PPC_SYS(fork)
-SYSCALL_SPU(read)
-SYSCALL_SPU(write)
-COMPAT_SYS_SPU(open)
-SYSCALL_SPU(close)
-SYSCALL_SPU(waitpid)
-SYSCALL_SPU(creat)
-SYSCALL_SPU(link)
-SYSCALL_SPU(unlink)
-COMPAT_SYS(execve)
-SYSCALL_SPU(chdir)
-COMPAT_SYS_SPU(time)
-SYSCALL_SPU(mknod)
-SYSCALL_SPU(chmod)
-SYSCALL_SPU(lchown)
-SYSCALL(ni_syscall)
-OLDSYS(stat)
-COMPAT_SYS_SPU(lseek)
-SYSCALL_SPU(getpid)
-COMPAT_SYS(mount)
-SYSX(sys_ni_syscall,sys_oldumount,sys_oldumount)
-SYSCALL_SPU(setuid)
-SYSCALL_SPU(getuid)
-COMPAT_SYS_SPU(stime)
-COMPAT_SYS(ptrace)
-SYSCALL_SPU(alarm)
-OLDSYS(fstat)
-SYSCALL(pause)
-COMPAT_SYS(utime)
-SYSCALL(ni_syscall)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(access)
-SYSCALL_SPU(nice)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(sync)
-SYSCALL_SPU(kill)
-SYSCALL_SPU(rename)
-SYSCALL_SPU(mkdir)
-SYSCALL_SPU(rmdir)
-SYSCALL_SPU(dup)
-SYSCALL_SPU(pipe)
-COMPAT_SYS_SPU(times)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(brk)
-SYSCALL_SPU(setgid)
-SYSCALL_SPU(getgid)
-SYSCALL(signal)
-SYSCALL_SPU(geteuid)
-SYSCALL_SPU(getegid)
-SYSCALL(acct)
-SYSCALL(umount)
-SYSCALL(ni_syscall)
-COMPAT_SYS_SPU(ioctl)
-COMPAT_SYS_SPU(fcntl)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(setpgid)
-SYSCALL(ni_syscall)
-SYSX(sys_ni_syscall,sys_olduname,sys_olduname)
-SYSCALL_SPU(umask)
-SYSCALL_SPU(chroot)
-COMPAT_SYS(ustat)
-SYSCALL_SPU(dup2)
-SYSCALL_SPU(getppid)
-SYSCALL_SPU(getpgrp)
-SYSCALL_SPU(setsid)
-SYS32ONLY(sigaction)
-SYSCALL_SPU(sgetmask)
-SYSCALL_SPU(ssetmask)
-SYSCALL_SPU(setreuid)
-SYSCALL_SPU(setregid)
-SYS32ONLY(sigsuspend)
-SYSX(sys_ni_syscall,compat_sys_sigpending,sys_sigpending)
-SYSCALL_SPU(sethostname)
-COMPAT_SYS_SPU(setrlimit)
-SYSX(sys_ni_syscall,compat_sys_old_getrlimit,sys_old_getrlimit)
-COMPAT_SYS_SPU(getrusage)
-COMPAT_SYS_SPU(gettimeofday)
-COMPAT_SYS_SPU(settimeofday)
-SYSCALL_SPU(getgroups)
-SYSCALL_SPU(setgroups)
-SYSX(sys_ni_syscall,sys_ni_syscall,ppc_select)
-SYSCALL_SPU(symlink)
-OLDSYS(lstat)
-SYSCALL_SPU(readlink)
-SYSCALL(uselib)
-SYSCALL(swapon)
-SYSCALL(reboot)
-SYSX(sys_ni_syscall,compat_sys_old_readdir,sys_old_readdir)
-SYSCALL_SPU(mmap)
-SYSCALL_SPU(munmap)
-COMPAT_SYS_SPU(truncate)
-COMPAT_SYS_SPU(ftruncate)
-SYSCALL_SPU(fchmod)
-SYSCALL_SPU(fchown)
-SYSCALL_SPU(getpriority)
-SYSCALL_SPU(setpriority)
-SYSCALL(ni_syscall)
-COMPAT_SYS(statfs)
-COMPAT_SYS(fstatfs)
-SYSCALL(ni_syscall)
-COMPAT_SYS_SPU(socketcall)
-SYSCALL_SPU(syslog)
-COMPAT_SYS_SPU(setitimer)
-COMPAT_SYS_SPU(getitimer)
-COMPAT_SYS_SPU(newstat)
-COMPAT_SYS_SPU(newlstat)
-COMPAT_SYS_SPU(newfstat)
-SYSX(sys_ni_syscall,sys_uname,sys_uname)

[PATCH v6 4/5] powerpc: split compat syscall table out from native table

2018-12-17 Thread Firoz Khan
PowerPC uses a syscall table with native and compat calls
interleaved, which is a slightly simpler way to define two
matching tables.

As we move to having the tables generated, that advantage
is no longer important, but the interleaved table gets in
the way of using the same scripts as on the other
architectures.

Split out a new compat_sys_call_table symbol that contains
all the compat calls, and leave the main table for the
native calls, to more closely match the method we use
everywhere else.
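
The effect of the split on table indexing can be sketched in plain C.
This is only an illustration under assumptions: the integer tables and
the function names below are made up, and the shift values mirror the
slwi change in entry_64.S (16 bytes per syscall in the interleaved
layout, 8 bytes per syscall once the tables are split).

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_SIZE 8u	/* one .8byte table entry on ppc64 */
#define NR_DEMO    3u	/* number of syscalls in this toy example */

/* Old layout: native and compat entries alternate in a single table. */
static const int interleaved_tbl[] = { 100, 200, 101, 201, 102, 202 };

/* New layout: one table per ABI, indexed directly by syscall number. */
static const int native_tbl[] = { 100, 101, 102 };
static const int compat_tbl[] = { 200, 201, 202 };

/* Byte offset in the interleaved table: nr << 4, plus 8 for compat. */
static size_t interleaved_offset(unsigned int nr, int is_compat)
{
	size_t off = (size_t)nr << 4;

	if (is_compat)
		off += ENTRY_SIZE;
	return off;
}

/* Byte offset in a split table: nr << 3, the base selects the ABI. */
static size_t split_offset(unsigned int nr)
{
	return (size_t)nr << 3;
}

static int lookup_interleaved(unsigned int nr, int is_compat)
{
	assert(nr < NR_DEMO);
	return interleaved_tbl[interleaved_offset(nr, is_compat) / ENTRY_SIZE];
}

static int lookup_split(unsigned int nr, int is_compat)
{
	const int *tbl = is_compat ? compat_tbl : native_tbl;

	assert(nr < NR_DEMO);
	return tbl[split_offset(nr) / ENTRY_SIZE];
}
```

Both lookups return the same handler for every (nr, is_compat) pair; the
split form simply moves the native/compat decision from the offset into
the choice of table base.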

Suggested-by: Arnd Bergmann 
Signed-off-by: Firoz Khan 
---
 arch/powerpc/include/asm/syscall.h |  3 +--
 arch/powerpc/kernel/entry_64.S |  7 +--
 arch/powerpc/kernel/systbl.S   | 35 ---
 arch/powerpc/kernel/vdso.c |  7 +--
 4 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/syscall.h 
b/arch/powerpc/include/asm/syscall.h
index ab9f3f0..1a0e7a8 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -18,9 +18,8 @@
 #include 
 
 /* ftrace syscalls requires exporting the sys_call_table */
-#ifdef CONFIG_FTRACE_SYSCALLS
 extern const unsigned long sys_call_table[];
-#endif /* CONFIG_FTRACE_SYSCALLS */
+extern const unsigned long compat_sys_call_table[];
 
 static inline int syscall_get_nr(struct task_struct *task, struct pt_regs 
*regs)
 {
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 7b1693a..5574d92 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -54,6 +54,9 @@
 SYS_CALL_TABLE:
.tc sys_call_table[TC],sys_call_table
 
+COMPAT_SYS_CALL_TABLE:
+   .tc compat_sys_call_table[TC],compat_sys_call_table
+
 /* This value is used to mark exception frames on the stack. */
 exception_marker:
.tc ID_EXC_MARKER[TC],STACK_FRAME_REGS_MARKER
@@ -173,7 +176,7 @@ system_call:/* label this so stack 
traces look sane */
ld  r11,SYS_CALL_TABLE@toc(2)
andis.  r10,r10,_TIF_32BIT@h
beq 15f
-   addir11,r11,8   /* use 32-bit syscall entries */
+   ld  r11,COMPAT_SYS_CALL_TABLE@toc(2)
clrldi  r3,r3,32
clrldi  r4,r4,32
clrldi  r5,r5,32
@@ -181,7 +184,7 @@ system_call:/* label this so stack 
traces look sane */
clrldi  r7,r7,32
clrldi  r8,r8,32
 15:
-   slwir0,r0,4
+   slwir0,r0,3
 
barrier_nospec_asm
/*
diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S
index 9ff1913..0fa84e1 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.S
@@ -17,13 +17,13 @@
 #include 
 
 #ifdef CONFIG_PPC64
-#define SYSCALL(func)  .8byte  DOTSYM(sys_##func),DOTSYM(sys_##func)
-#define COMPAT_SYS(func)   .8byte  
DOTSYM(sys_##func),DOTSYM(compat_sys_##func)
-#define PPC_SYS(func)  .8byte  DOTSYM(ppc_##func),DOTSYM(ppc_##func)
-#define OLDSYS(func)   .8byte  
DOTSYM(sys_ni_syscall),DOTSYM(sys_ni_syscall)
-#define SYS32ONLY(func).8byte  
DOTSYM(sys_ni_syscall),DOTSYM(compat_sys_##func)
-#define PPC64ONLY(func).8byte  
DOTSYM(ppc_##func),DOTSYM(sys_ni_syscall)
-#define SYSX(f, f3264, f32).8byte  DOTSYM(f),DOTSYM(f3264)
+#define SYSCALL(func)  .8byte  DOTSYM(sys_##func)
+#define COMPAT_SYS(func)   .8byte  DOTSYM(sys_##func)
+#define PPC_SYS(func)  .8byte  DOTSYM(ppc_##func)
+#define OLDSYS(func)   .8byte  DOTSYM(sys_ni_syscall)
+#define SYS32ONLY(func).8byte  DOTSYM(sys_ni_syscall)
+#define PPC64ONLY(func).8byte  DOTSYM(ppc_##func)
+#define SYSX(f, f3264, f32).8byte  DOTSYM(f)
 #else
 #define SYSCALL(func)  .long   sys_##func
 #define COMPAT_SYS(func)   .long   sys_##func
@@ -46,6 +46,27 @@
 
 .globl sys_call_table
 sys_call_table:
+#include 
+
+#undef SYSCALL
+#undef COMPAT_SYS
+#undef PPC_SYS
+#undef OLDSYS
+#undef SYS32ONLY
+#undef PPC64ONLY
+#undef SYSX
 
+#ifdef CONFIG_COMPAT
+#define SYSCALL(func)  .8byte  DOTSYM(sys_##func)
+#define COMPAT_SYS(func)   .8byte  DOTSYM(compat_sys_##func)
+#define PPC_SYS(func)  .8byte  DOTSYM(ppc_##func)
+#define OLDSYS(func)   .8byte  DOTSYM(sys_ni_syscall)
+#define SYS32ONLY(func).8byte  DOTSYM(compat_sys_##func)
+#define PPC64ONLY(func).8byte  DOTSYM(sys_ni_syscall)
+#define SYSX(f, f3264, f32).8byte  DOTSYM(f3264)
+
+.globl compat_sys_call_table
+compat_sys_call_table:
 #define compat_sys_sigsuspend  sys_sigsuspend
 #include 
+#endif
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 65b3bdb..7725a97 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -671,15 +671,18 @@ static void __init vdso_setup_syscall_map(void)
 {
unsigned int i;
extern unsigned long *sys_call_table;
+#ifdef CONFIG_PPC64
+   extern unsigned long 

[PATCH v6 3/5] powerpc: add system call table generation support

2018-12-17 Thread Firoz Khan
The system call tables are in a different format on each
architecture, which makes it difficult to add or modify
system calls manually in the respective files. To make
this easier, keep a script that generates the uapi header
and syscall table files. This change will also help to
unify the implementation across all architectures.

The system call table generation scripts are added in the
syscalls directory, which contains the scripts to generate
both the uapi header file and the system call table files.
The syscall.tbl file is the input for the scripts.

syscall.tbl contains the list of available system calls
along with the system call number and corresponding entry
point. A new system call can be added on this architecture
by adding a new entry to the syscall.tbl file.

A new table entry consists of:
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.
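
As a rough illustration of the entry format listed above, the following C
sketch parses one whitespace-separated table line into those five fields.
It is not the syscallhdr.sh/syscalltbl.sh logic (those are shell
scripts); struct tbl_entry, parse_tbl_line() and the buffer sizes are
assumptions made for this example only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one syscall.tbl entry. */
struct tbl_entry {
	int nr;			/* system call number */
	char abi[16];		/* ABI column */
	char name[64];		/* system call name */
	char entry[64];		/* entry point name */
	char compat[64];	/* compat entry name, empty if absent */
};

/*
 * Parse one line. Returns 1 when at least the four mandatory fields
 * were read, 0 for comments, blank lines and malformed input.
 */
static int parse_tbl_line(const char *line, struct tbl_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	memset(e, 0, sizeof(*e));

	/* Skip leading whitespace, then ignore comments and empty lines. */
	while (*line == ' ' || *line == '\t')
		line++;
	if (*line == '#' || *line == '\n' || *line == '\0')
		return 0;

	/* Field widths keep every string inside its buffer. */
	n = sscanf(line, "%d %15s %63s %63s %63s",
		   &e->nr, e->abi, e->name, e->entry, e->compat);
	return n >= 4;
}
```

A line with only four columns, such as the restart_syscall entry in
syscall.tbl, leaves the compat field as an empty string.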

syscallhdr.sh and syscalltbl.sh will generate the uapi
header files unistd_32/64.h and the
syscall_table_32/64/c32/spu.h files, respectively. The
syscall_table_32/64/c32/spu.h files are included by
systbl.S, the real system call table. Both *.sh scripts
parse the content of syscall.tbl to generate the header
and table files.

The ARM, s390 and x86 architectures already have similar
support. I leveraged their implementation to come up with
a generic solution.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/kernel/syscalls/Makefile  |  63 +
 arch/powerpc/kernel/syscalls/syscall.tbl   | 427 +
 arch/powerpc/kernel/syscalls/syscallhdr.sh |  37 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh |  36 +++
 4 files changed, 563 insertions(+)
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh

diff --git a/arch/powerpc/kernel/syscalls/Makefile 
b/arch/powerpc/kernel/syscalls/Makefile
new file mode 100644
index 000..27b4895
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/Makefile
@@ -0,0 +1,63 @@
+# SPDX-License-Identifier: GPL-2.0
+kapi := arch/$(SRCARCH)/include/generated/asm
+uapi := arch/$(SRCARCH)/include/generated/uapi/asm
+
+_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')  \
+ $(shell [ -d '$(kapi)' ] || mkdir -p '$(kapi)')
+
+syscall := $(srctree)/$(src)/syscall.tbl
+syshdr := $(srctree)/$(src)/syscallhdr.sh
+systbl := $(srctree)/$(src)/syscalltbl.sh
+
+quiet_cmd_syshdr = SYSHDR  $@
+  cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@'   \
+  '$(syshdr_abis_$(basetarget))'   \
+  '$(syshdr_pfx_$(basetarget))'\
+  '$(syshdr_offset_$(basetarget))'
+
+quiet_cmd_systbl = SYSTBL  $@
+  cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@'   \
+  '$(systbl_abis_$(basetarget))'   \
+  '$(systbl_abi_$(basetarget))'\
+  '$(systbl_offset_$(basetarget))'
+
+syshdr_abis_unistd_32 := common,nospu,32
+$(uapi)/unistd_32.h: $(syscall) $(syshdr)
+   $(call if_changed,syshdr)
+
+syshdr_abis_unistd_64 := common,nospu,64
+$(uapi)/unistd_64.h: $(syscall) $(syshdr)
+   $(call if_changed,syshdr)
+
+systbl_abis_syscall_table_32 := common,nospu,32
+systbl_abi_syscall_table_32 := 32
+$(kapi)/syscall_table_32.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abis_syscall_table_64 := common,nospu,64
+systbl_abi_syscall_table_64 := 64
+$(kapi)/syscall_table_64.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abis_syscall_table_c32 := common,nospu,32
+systbl_abi_syscall_table_c32 := c32
+$(kapi)/syscall_table_c32.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abis_syscall_table_spu := common,spu
+systbl_abi_syscall_table_spu := spu
+$(kapi)/syscall_table_spu.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+uapisyshdr-y   += unistd_32.h unistd_64.h
+kapisyshdr-y   += syscall_table_32.h   \
+  syscall_table_64.h   \
+  syscall_table_c32.h  \
+  syscall_table_spu.h
+
+targets+= $(uapisyshdr-y) $(kapisyshdr-y)
+
+PHONY += all
+all: $(addprefix $(uapi)/,$(uapisyshdr-y))
+all: $(addprefix $(kapi)/,$(kapisyshdr-y))
+   @:
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
new file mode 100644
index 000..db3bbb8
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -0,0 +1,427 @@
+# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+#
+# system call numbers and entry vectors for powerpc
+#
+# The format is:
+# <number> <abi> <name> <entry point> <compat entry point>
+#
+# The <abi> can be common, spu, nospu, 64, or 32 for this file.
+#
+0  nospu   restart_syscall sys_restart_syscall

[PATCH v6 2/5] powerpc: move macro definition from asm/systbl.h

2018-12-17 Thread Firoz Khan
Move the macro definition for compat_sys_sigsuspend from
asm/systbl.h to the file in which it is included.

One of the patches in this series generates the uapi
header and syscall table files. In order to come up with
a common implementation across all architectures, we need
to make this change.

This change will simplify the implementation of the system
call table generation script and help to come up with a
common implementation across all architectures.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/include/asm/systbl.h | 1 -
 arch/powerpc/kernel/systbl.S  | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 01b5171..c4321b9 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -76,7 +76,6 @@
 SYSCALL_SPU(ssetmask)
 SYSCALL_SPU(setreuid)
 SYSCALL_SPU(setregid)
-#define compat_sys_sigsuspend sys_sigsuspend
 SYS32ONLY(sigsuspend)
 SYSX(sys_ni_syscall,compat_sys_sigpending,sys_sigpending)
 SYSCALL_SPU(sethostname)
diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S
index 919a327..9ff1913 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.S
@@ -47,4 +47,5 @@
 .globl sys_call_table
 sys_call_table:
 
+#define compat_sys_sigsuspend  sys_sigsuspend
 #include 
-- 
1.9.1



[PATCH v6 1/5] powerpc: add __NR_syscalls along with NR_syscalls

2018-12-17 Thread Firoz Khan
The NR_syscalls macro holds the number of system calls
that exist on the powerpc architecture. We have to change
the value of NR_syscalls whenever we add or delete a
system call.

One of the patches in this series has a script which
will generate a uapi header based on the syscall.tbl file.
The syscall.tbl file contains the system call number
information. So we have two options for updating the
NR_syscalls value.

1. Update NR_syscalls in asm/unistd.h manually by counting
   the number of system calls. There is no need to update
   NR_syscalls until we either add a new system call or
   delete an existing one.

2. Keep this feature in the above-mentioned script, which
   will count the number of syscalls and keep it in a
   generated file. In this case we don't need to explicitly
   update NR_syscalls in the asm/unistd.h file.

The 2nd option is the recommended one. For that, I added
the __NR_syscalls macro in uapi/asm/unistd.h along with
NR_syscalls in asm/unistd.h. The __NR_syscalls macro is
also added to keep the naming convention the same across
all architectures. While __NR_syscalls isn't strictly part
of the uapi, having it as part of the generated header
simplifies the implementation. We also need to enclose
this macro with #ifdef __KERNEL__ to avoid side effects.
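
A small C sketch of what option 2 amounts to, under the assumption that
the generated value is the highest system call number plus one (which
matches this patch, where __NR_io_pgetevents is 388 and __NR_syscalls is
389). The helper name nr_syscalls_from_numbers() is hypothetical; the
real computation lives in the generation script, not in C.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Derive the syscall count from the numbers listed in a table.
 * Returns the highest number plus one, or 0 for an empty list.
 */
static int nr_syscalls_from_numbers(const int *numbers, size_t count)
{
	int max = -1;
	size_t i;

	assert(numbers != NULL || count == 0);

	for (i = 0; i < count; i++)
		if (numbers[i] > max)
			max = numbers[i];

	return max + 1;
}
```

Because the value is derived from the table, adding a new entry at the
end updates the count without touching asm/unistd.h.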

Signed-off-by: Firoz Khan 
---
 arch/powerpc/include/asm/unistd.h  | 3 +--
 arch/powerpc/include/uapi/asm/unistd.h | 5 -
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index b0de85b..a3c35e6 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -11,8 +11,7 @@
 
 #include 
 
-
-#define NR_syscalls389
+#define NR_syscalls__NR_syscalls
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index 985534d..7195868 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -10,7 +10,6 @@
 #ifndef _UAPI_ASM_POWERPC_UNISTD_H_
 #define _UAPI_ASM_POWERPC_UNISTD_H_
 
-
 #define __NR_restart_syscall 0
 #define __NR_exit1
 #define __NR_fork2
@@ -401,4 +400,8 @@
 #define __NR_rseq  387
 #define __NR_io_pgetevents 388
 
+#ifdef __KERNEL__
+#define __NR_syscalls  389
+#endif
+
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
1.9.1



[PATCH v6 0/5] powerpc: system call table generation support

2018-12-17 Thread Firoz Khan
The purpose of this patch series is to make it easy to
add, modify or delete system call table entries by
changing an entry in the syscall.tbl file instead of
manually changing many files. The other goal is to unify
the system call table generation support implementation
across all the architectures.

The system call tables are in a different format on each
architecture, which makes it difficult to add, modify or
delete system calls manually in the respective files. To
make this easier, keep a script that generates the uapi
header file and the syscall table file.

syscall.tbl contains the list of available system calls
along with the system call number and corresponding entry
point. A new system call can be added on this architecture
by adding a new entry to the syscall.tbl file.

A new table entry consists of:
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.
- spu entry name, if required.

The ARM, s390 and x86 architectures already have similar
support. I leveraged their implementation to come up with
a generic solution.

I have done the same work for alpha, ia64, m68k,
microblaze, mips, parisc, sh, sparc, and xtensa. The git
repository mentioned below contains more details about
the workflow.

https://github.com/frzkhn/system_call_table_generator/

Finally, this is the groundwork for solving the Y2038
issue. We need to add two dozen system calls to solve the
Y2038 issue, so this patch series will help to add new
system calls easily by adding a new entry to syscall.tbl.

Changes since v5:
 - rebased with 4.20-rc7.

Changes since v4:
 - DOTSYM macro removed for ppc32, which was causing
   the compilation error.

Changes since v3:
 - split compat syscall table out from native table.
 - modified the script to add new line in the generated
   file.

Changes since v2:
 - modified/optimized the syscall.tbl to avoid duplicate
   for the spu entries.
 - updated the syscalltbl.sh to meet the above point.

Changes since v1:
 - optimized/updated the syscall table generation 
   scripts.
 - fixed all mixed indentation issues in syscall.tbl.
 - added "comments" in syscall_*.tbl.
 - changed from generic-y to generated-y in Kbuild.

Firoz Khan (5):
  powerpc: add __NR_syscalls along with NR_syscalls
  powerpc: move macro definition from asm/systbl.h
  powerpc: add system call table generation support
  powerpc: split compat syscall table out from native table
  powerpc: generate uapi header and system call table files

 arch/powerpc/Makefile   |   3 +
 arch/powerpc/include/asm/Kbuild |   4 +
 arch/powerpc/include/asm/syscall.h  |   3 +-
 arch/powerpc/include/asm/systbl.h   | 396 --
 arch/powerpc/include/asm/unistd.h   |   3 +-
 arch/powerpc/include/uapi/asm/Kbuild|   2 +
 arch/powerpc/include/uapi/asm/unistd.h  | 389 +
 arch/powerpc/kernel/Makefile|  10 -
 arch/powerpc/kernel/entry_64.S  |   7 +-
 arch/powerpc/kernel/syscalls/Makefile   |  63 
 arch/powerpc/kernel/syscalls/syscall.tbl| 427 
 arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
 arch/powerpc/kernel/systbl.S|  40 ++-
 arch/powerpc/kernel/systbl_chk.c|  60 
 arch/powerpc/kernel/vdso.c  |   7 +-
 arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
 17 files changed, 606 insertions(+), 898 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/systbl.h
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
 delete mode 100644 arch/powerpc/kernel/systbl_chk.c

-- 
1.9.1



Re: [PATCH v5 0/5] powerpc: system call table generation support

2018-12-17 Thread Michael Ellerman
Satheesh Rajendran  writes:
> Hi Firoz,
>
> On Thu, Dec 13, 2018 at 02:32:45PM +0530, Firoz Khan wrote:
>> The purpose of this patch series is, we can easily
>> add/modify/delete system call table support by cha-
>> nging entry in syscall.tbl file instead of manually
>> changing many files. The other goal is to unify the 
>> system call table generation support implementation 
>> across all the architectures. 
>> 
>> The system call tables are in different format in 
>> all architecture. It will be difficult to manually
>> add, modify or delete the system calls in the resp-
>> ective files manually. To make it easy by keeping a 
>> script and which'll generate uapi header file and 
>> syscall table file.
>> 
>> syscall.tbl contains the list of available system 
>> calls along with system call number and correspond-
>> ing entry point. Add a new system call in this arch-
>> itecture will be possible by adding new entry in 
>> the syscall.tbl file.
>> 
>> Adding a new table entry consisting of:
>> - System call number.
>> - ABI.
>> - System call name.
>> - Entry point name.
>>  - Compat entry name, if required.
>>  - spu entry name, if required.
>> 
>> ARM, s390 and x86 architecuture does exist the sim-
>> ilar support. I leverage their implementation to 
>> come up with a generic solution.
>> 
>> I have done the same support for work for alpha, 
>> ia64, m68k, microblaze, mips, parisc, sh, sparc, 
>> and xtensa. Below mentioned git repository contains
>> more details about the workflow.
>> 
>> https://github.com/frzkhn/system_call_table_generator/
>> 
>> Finally, this is the ground work to solve the Y2038
>> issue. We need to add two dozen of system calls to 
>> solve Y2038 issue. So this patch series will help to
>> add new system calls easily by adding new entry in the
>> syscall.tbl.
>> 
>> Changes since v4:
>>  - DOTSYM macro removed for ppc32, which was causing
>>the compilation error.
>> 
>> Changes since v3:
>>  - split compat syscall table out from native table.
>>  - modified the script to add new line in the generated
>>file.
>> 
>> Changes since v2:
>>  - modified/optimized the syscall.tbl to avoid duplicate
>>for the spu entries.
>>  - updated the syscalltbl.sh to meet the above point.
>> 
>> Changes since v1:
>>  - optimized/updated the syscall table generation 
>>scripts.
>>  - fixed all mixed indentation issues in syscall.tbl.
>>  - added "comments" in syscall_*.tbl.
>>  - changed from generic-y to generated-y in Kbuild.
>> 
>> Firoz Khan (5):
>>   powerpc: add __NR_syscalls along with NR_syscalls
>>   powerpc: move macro definition from asm/systbl.h
>>   powerpc: add system call table generation support
>>   powerpc: split compat syscall table out from native table
>>   powerpc: generate uapi header and system call table files
>
> Tried to apply on linus "master" and 
> linuxppc-dev(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git)
>  "merge" branch,
> both failed to apply series.
>
> # git am mbox
> Applying: powerpc: add __NR_syscalls along with NR_syscalls
> Applying: powerpc: move macro definition from asm/systbl.h
> Applying: powerpc: add system call table generation support
> Applying: powerpc: split compat syscall table out from native table
> Applying: powerpc: generate uapi header and system call table files
> error: patch failed: arch/powerpc/include/uapi/asm/Kbuild:1
> error: arch/powerpc/include/uapi/asm/Kbuild: patch does not apply
> Patch failed at 0005 powerpc: generate uapi header and system call table files
> Use 'git am --show-current-patch' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
>
> Then, tried with 
> linuxppc-dev(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git)
>  "next" branch,
> patch got applied, compiled with ppc64le_defconfig and booted on IBM Power8 
> box.
>
> # uname -r
> 4.20.0-rc2-gdd2690d2c
>
> Looks like patch series needs a rebase against the latest kernel versions.

No it's fine if it applies on next.

I can also fix up minor merge conflicts if there are any.

cheers


Re: [PATCH v5 0/5] powerpc: system call table generation support

2018-12-17 Thread Firoz Khan
Hi Satheesh,

On Mon, 17 Dec 2018 at 13:39, Satheesh Rajendran
 wrote:
>
> Hi Firoz,
>
> On Thu, Dec 13, 2018 at 02:32:45PM +0530, Firoz Khan wrote:
> Tried to apply on linus "master" and 
> linuxppc-dev(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git)
>  "merge" branch,
> both failed to apply series.
>
> # git am mbox
> Applying: powerpc: add __NR_syscalls along with NR_syscalls
> Applying: powerpc: move macro definition from asm/systbl.h
> Applying: powerpc: add system call table generation support
> Applying: powerpc: split compat syscall table out from native table
> Applying: powerpc: generate uapi header and system call table files
> error: patch failed: arch/powerpc/include/uapi/asm/Kbuild:1
> error: arch/powerpc/include/uapi/asm/Kbuild: patch does not apply
> Patch failed at 0005 powerpc: generate uapi header and system call table files
> Use 'git am --show-current-patch' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
>
> Then, tried with 
> linuxppc-dev(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git)
>  "next" branch,
> patch got applied, compiled with ppc64le_defconfig and booted on IBM Power8 
> box.
>
> # uname -r
> 4.20.0-rc2-gdd2690d2c
>
> Looks like patch series needs a rebase against the latest kernel versions.

Thanks for the update.
Sure, I'll update the patch series and post asap.

Firoz


Re: [PATCH v1 9/9] mm: better document PG_reserved

2018-12-17 Thread David Hildenbrand
On 15.12.18 01:12, Randy Dunlap wrote:
> On 12/14/18 3:10 AM, David Hildenbrand wrote:
>> The usage of PG_reserved and how PG_reserved pages are to be treated is
>> buried deep down in different parts of the kernel. Let's shine some light
>> onto these details by documenting current users and expected
>> behavior.
>>
>> Especially, clarify on the "Some of them might not even exist" case.
>> These are physical memory gaps that will never be dumped as they
>> are not marked as IORESOURCE_SYSRAM. PG_reserved does in general not
>> hinder anybody from dumping or swapping. In some cases, these pages
>> will not be stored in the hibernation image.
> 
> Hi,
> Thanks for the doc update.
> Comments below.
> 
>> Cc: Andrew Morton 
>> Cc: Stephen Rothwell 
>> Cc: Pavel Tatashin 
>> Cc: Michal Hocko 
>> Cc: Alexander Duyck 
>> Cc: Matthew Wilcox 
>> Cc: Anthony Yznaga 
>> Cc: Miles Chen 
>> Cc: yi.z.zh...@linux.intel.com
>> Cc: Dan Williams 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  include/linux/page-flags.h | 33 +++--
>>  1 file changed, 31 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 808b4183e30d..9de2e941cbd5 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -17,8 +17,37 @@
>>  /*
>>   * Various page->flags bits:
>>   *
>> - * PG_reserved is set for special pages, which can never be swapped out. 
>> Some
>> - * of them might not even exist...
>> + * PG_reserved is set for special pages. The "struct page" of such a page
>> + * should in general not be touched (e.g. set dirty) except by their owner.
> 
>by its owner.

Indeed.

> 
>> + * Pages marked as PG_reserved include:
>> + * - Pages part of the kernel image (including vDSO) and similar (e.g. BIOS,
>> + *   initrd, HW tables)
>> + * - Pages reserved or allocated early during boot (before the page 
>> allocator
>> + *   was initialized). This includes (depending on the architecture) the
>> + *   initial vmmap, initial page tables, crashkernel, elfcorehdr, and much
> 
> VM map,

This should actually be vmemmap (aka struct pages).

> 
>> + *   much more. Once (if ever) freed, PG_reserved is cleared and they will
>> + *   be given to the page allocator.
>> + * - Pages falling into physical memory gaps - not IORESOURCE_SYSRAM. Trying
>> + *   to read/write these pages might end badly. Don't touch!
>> + * - The zero page(s)
>> + * - Pages not added to the page allocator when onlining a section because
>> + *   they were excluded via the online_page_callback() or because they are
>> + *   PG_hwpoison.
>> + * - Pages allocated in the context of kexec/kdump (loaded kernel image,
>> + *   control pages, vmcoreinfo)
>> + * - MMIO/DMA pages. Some architectures don't allow to ioremap pages that are
>> + *   not marked PG_reserved (as they might be in use by somebody else who does
>> + *   not respect the caching strategy).
>> + * - Pages part of an offline section (struct pages of offline sections should
>> + *   not be trusted as they will be initialized when first onlined).
>> + * - MCA pages on ia64
>> + * - Pages holding CPU notes for POWER Firmware Assisted Dump
>> + * - Device memory (e.g. PMEM, DAX, HMM)
>> + * Some PG_reserved pages will be excluded from the hibernation image.
>> + * PG_reserved does in general not hinder anybody from dumping or swapping
>> + * and is no longer required for remap_pfn_range(). ioremap might require it.
>> + * Consequently, PG_reserved for a page mapped into user space can indicate
>> + * the zero page, the vDSO, MMIO pages or device memory.
>>   *
>>   * The PG_private bitflag is set on pagecache pages if they contain filesystem
>>   * specific data (which is normally at page->private). It can be used by
>>
> 
> cheers.
> 

Thanks!

-- 

Thanks,

David / dhildenb


Re: [PATCH v1 03/13] powerpc/mm/32s: rework mmu_mapin_ram()

2018-12-17 Thread Christophe Leroy




On 17/12/2018 at 02:28, Jonathan Neuschäfer wrote:

Hi, thanks for your reply.

On Thu, Dec 13, 2018 at 03:51:32PM +0100, Christophe Leroy wrote:

Hi Again,

On 13/12/2018 at 13:16, Christophe Leroy wrote:

[...]

Can you tell/provide the .config and dts used ?


I'm using wii.dts and almost the wii_defconfig from my tree (savedefconfig
result is attached), which is 4.20-rc5 plus a few patches:

   https://github.com/neuschaefer/linux wii-4.20-rc5(w/o your patches)
   https://github.com/neuschaefer/linux wii-4.20-rc5-ppcbat (w/ your patches 1-3)


You seem to have 319MB RAM whereas arch/powerpc/boot/dts/wii.dts only
has 88MB Memory:

  memory {
      device_type = "memory";
      reg = <0x00000000 0x01800000    /* MEM1 24MB 1T-SRAM */
             0x10000000 0x04000000>;  /* MEM2 64MB GDDR3 */
  };


This is, I think, because something marks all the address space from 0
to the end of MEM2 as RAM, and then cuts out a hole in the middle. I'm
not sure about the exact mechanism.

Unfortunately this hole has to be treated carefully because it contains
MMIO devices.


Putting the same description in my mpc832x board DTS and doing a few hacks
to get the WII functions called, I get the following:

[0.00] Top of RAM: 0x1400, Total RAM: 0x580
[0.00] Memory hole size: 232MB
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x-0x13ff]
[0.00]   Normal   empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x-0x017f]
[0.00]   node   0: [mem 0x1000-0x13ff]
[0.00] Initmem setup node 0 [mem 0x-0x13ff]
[0.00] On node 0 totalpages: 22528
[0.00]   DMA zone: 640 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 22528 pages, LIFO batch:3
[0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[0.00] pcpu-alloc: [0] 0
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 21888
[0.00] Kernel command line: loglevel=7 ip=192.168.2.5:192.168.2.2::255.0
[0.00] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
[0.00] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
[0.00] Memory: 77060K/90112K available (6548K kernel code, 1156K rwdata,
[0.00] Kernel virtual memory layout:
[0.00]   * 0xfffdf000..0xf000  : fixmap
[0.00]   * 0xfdffd000..0xfe00  : early ioremap
[0.00]   * 0xd500..0xfdffd000  : vmalloc & ioremap
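
The "Top of RAM", "Total RAM", "Memory hole size" and "totalpages" figures in the log above are consistent with simple arithmetic over the two devicetree memory ranges. The following is a minimal userspace sketch of that arithmetic, not the kernel's own code. The struct and function names are hypothetical, and the full base and size values mentioned in the note below are assumptions derived from the 24MB and 64MB comments in the wii.dts excerpt.

```c
#include <stdint.h>
#include <stddef.h>

/* One physical memory range, like a <base size> pair in a devicetree "reg". */
struct mem_range {
	uint32_t base;
	uint32_t size;
};

/* First address past the highest range ("Top of RAM" in the log). */
static uint32_t top_of_ram(const struct mem_range *r, size_t n)
{
	uint32_t top = 0;

	for (size_t i = 0; i < n; i++)
		if (r[i].base + r[i].size > top)
			top = r[i].base + r[i].size;
	return top;
}

/* Sum of all range sizes ("Total RAM" in the log); ranges must not overlap. */
static uint32_t total_ram(const struct mem_range *r, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += r[i].size;
	return sum;
}

/* Address space below the top of RAM that no range backs, in MB. */
static uint32_t memory_hole_mb(const struct mem_range *r, size_t n)
{
	return (top_of_ram(r, n) - total_ram(r, n)) >> 20;
}
```

Assuming MEM1 is 24MB at 0x00000000 and MEM2 is 64MB at 0x10000000, this gives a top of RAM of 0x14000000, a total of 0x05800000 (88MB, or 22528 pages of 4KB) and a 232MB hole, which agrees with the log.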




root@vgoippro:~# cat /sys/kernel/debug/powerpc/block_address_translation
---[ Instruction Block Address Translation ]---
0: 0xc000-0xc0ff 0x Kernel EXEC coherent
1: -
2: 0xc100-0xc17f 0x0100 Kernel EXEC coherent
3: -
4: 0xd000-0xd3ff 0x1000 Kernel EXEC coherent
5: -
6: -
7: -

---[ Data Block Address Translation ]---
0: 0xc000-0xc0ff 0x Kernel RW coherent
1: 0xfffe-0x 0x0d00 Kernel RW no cache guarded
2: 0xc100-0xc17f 0x0100 Kernel RW coherent
3: -
4: 0xd000-0xd3ff 0x1000 Kernel RW coherent
5: -
6: -
7: -
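
The BAT listing above shows the 24MB MEM1 region covered by a 16MB block plus an 8MB block, and the 64MB MEM2 region by a single block. This follows from the BAT constraint that each block is a power of two in size and aligned to its own size. The sketch below illustrates one greedy way to produce such a split. It is an illustration under those assumptions, not the mmu_mapin_ram() implementation from the patch, and all names in it are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define BAT_MIN_SIZE 0x00020000u	/* 128KB, smallest classic BAT block */
#define BAT_MAX_SIZE 0x10000000u	/* 256MB, largest classic BAT block */

struct bat_block {
	uint32_t base;
	uint32_t size;
};

/*
 * Largest power-of-two block that starts at 'base', is aligned to its
 * own size and does not exceed 'remaining'. Returns 0 if none fits.
 */
static uint32_t bat_block_size(uint32_t base, uint32_t remaining)
{
	for (uint32_t size = BAT_MAX_SIZE; size >= BAT_MIN_SIZE; size >>= 1)
		if ((base & (size - 1)) == 0 && size <= remaining)
			return size;
	return 0;
}

/* Greedily cover [base, base + len) with BAT-sized blocks. */
static size_t split_into_bats(uint32_t base, uint32_t len,
			      struct bat_block *out, size_t max)
{
	size_t n = 0;

	while (len && n < max) {
		uint32_t size = bat_block_size(base, len);

		if (!size)
			break;	/* leftover is smaller than one BAT block */
		out[n].base = base;
		out[n].size = size;
		n++;
		base += size;
		len -= size;
	}
	return n;
}
```

Called with a 24MB region at physical address 0, this yields a 16MB block followed by an 8MB block. A 64MB region at 0x10000000 yields one 64MB block.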


Could you please provide the dmesg and
/sys/kernel/debug/powerpc/block_address_translation from before this patch,
so that we can compare and identify the differences if any ?


After applying the patch that adds this debugfs file and enabling
CONFIG_PPC_PTDUMP, I get this:

# cat /sys/kernel/debug/powerpc/block_address_translation
---[ Instruction Block Address Translation ]---
0: -
1: -
2: 0xc000-0xc0ff 0x Kernel EXEC
3: 0xc100-0xc17f 0x0100 Kernel EXEC
4: 0xd000-0xd1ff 0x1000 Kernel EXEC
5: -
6: -
7: -

---[ Data Block Address Translation ]---
0: -
1: 0xfffe-0x 0x0d00 Kernel RW no cache guarded
2: 0xc000-0xc0ff 0x Kernel RW
3: 0xc100-0xc17f 0x0100 Kernel RW
4: 0xd000-0xd1ff 0x1000 Kernel RW
5: -
6: -
7: -

dmesg is attached.


I added some tracing to the setbat function:

diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index f6f575bae3bc..4da3dc54fe46 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -120,6 +120,9 @@ void __init setbat(int index, unsigned long virt, phys_addr_t phys,
struct ppc_bat *bat = BATS[index];
unsigned long flags = pgprot_val(prot);
  
+	pr_info("setbat(%u, %px, %px, %px, %lx)\n",

+   index, (void *)virt, (void *)phys, (void *)size, flags);
+
if ((flags & 

Re: [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods

2018-12-17 Thread Christoph Hellwig
On Mon, Dec 17, 2018 at 08:39:17AM +0100, Christophe Leroy wrote:
> I can help you with powerpc 8xx actually.

Below is a patch that implements the proper scheme on top of the series
in this thread.  Compile tested with tqm8xx_defconfig and tqm8xx_defconfig
+ CONFIG_HIGHMEM only.

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index dacd0f93f2b2..8df9dd42b351 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -39,19 +39,17 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
  * to ensure it is consistent.
  */
 struct device;
-extern void __dma_sync(void *vaddr, size_t size, int direction);
-extern void __dma_sync_page(struct page *page, unsigned long offset,
-size_t size, int direction);
+void ppc_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+   size_t size, enum dma_data_direction dir);
+void ppc_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+   size_t size, enum dma_data_direction dir);
 extern unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr);
 
 #else /* ! CONFIG_NOT_COHERENT_CACHE */
-/*
- * Cache coherent cores.
- */
-
-#define __dma_sync(addr, size, rw) ((void)0)
-#define __dma_sync_page(pg, off, sz, rw)   ((void)0)
-
+static inline void ppc_sync_dma_for_device(struct device *dev,
+   phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+}
 #endif /* ! CONFIG_NOT_COHERENT_CACHE */
 
 static inline unsigned long device_to_mask(struct device *dev)
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 270b2911c437..0c0bcfebc271 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -6,7 +6,7 @@
  */
 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -194,23 +194,12 @@ static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
continue;
 
-   __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
+   ppc_sync_dma_for_device(dev, sg_phys(sg), sg->length, direction);
}
 
return nents;
 }
 
-static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sgl,
-   int nents, enum dma_data_direction direction,
-   unsigned long attrs)
-{
-   struct scatterlist *sg;
-   int i;
-
-   for_each_sg(sgl, sg, nents, i)
-   __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
-}
-
 static u64 dma_nommu_get_required_mask(struct device *dev)
 {
u64 end, mask;
@@ -230,39 +219,70 @@ static inline dma_addr_t dma_nommu_map_page(struct device *dev,
 enum dma_data_direction dir,
 unsigned long attrs)
 {
+   phys_addr_t paddr = page_to_phys(page) + offset;
+
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   __dma_sync_page(page, offset, size, dir);
+   ppc_sync_dma_for_device(dev, paddr, size, dir);
 
-   return page_to_phys(page) + offset + get_dma_offset(dev);
+   return paddr + get_dma_offset(dev);
 }
 
+#ifdef CONFIG_NOT_COHERENT_CACHE
 static inline void dma_nommu_unmap_page(struct device *dev,
 dma_addr_t dma_address,
 size_t size,
-enum dma_data_direction direction,
+enum dma_data_direction dir,
 unsigned long attrs)
 {
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   __dma_sync(bus_to_virt(dma_address), size, dir);
+   ppc_sync_dma_for_cpu(dev, dma_to_phys(dev, dma_address), size,
+   dir);
 }
 
-#ifdef CONFIG_NOT_COHERENT_CACHE
-static inline void dma_nommu_sync_sg(struct device *dev,
+static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sgl,
+   int nents, enum dma_data_direction direction,
+   unsigned long attrs)
+{
+   struct scatterlist *sg;
+   int i;
+
+   for_each_sg(sgl, sg, nents, i)
+   ppc_sync_dma_for_cpu(dev, sg_phys(sg), sg->length, direction);
+}
+
+static inline void dma_nommu_sync_sg_for_device(struct device *dev,
struct scatterlist *sgl, int nents,
-   enum dma_data_direction direction)
+   enum dma_data_direction dir)
 {
struct scatterlist *sg;
int i;
 
for_each_sg(sgl, sg, nents, i)
-   __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
+   ppc_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
+static inline void dma_nommu_sync_sg_for_cpu(struct device *dev,
+   struct 

Re: [PATCH v5 0/5] powerpc: system call table generation support

2018-12-17 Thread Satheesh Rajendran
Hi Firoz,

On Thu, Dec 13, 2018 at 02:32:45PM +0530, Firoz Khan wrote:
> The purpose of this patch series is to make it easy to
> add, modify, or delete system calls by changing an
> entry in the syscall.tbl file instead of manually
> editing many files. The other goal is to unify the
> system call table generation implementation
> across all architectures.
> 
> The system call tables are in a different format on
> each architecture, which makes it difficult to add,
> modify, or delete system calls in the respective
> files by hand. To make this easier, keep a script
> that generates the uapi header file and the
> syscall table file.
> 
> syscall.tbl contains the list of available system
> calls along with the system call number and the
> corresponding entry point. Adding a new system call
> on this architecture is then possible by adding a new
> entry to the syscall.tbl file.
> 
> A new table entry consists of:
>   - System call number.
>   - ABI.
>   - System call name.
>   - Entry point name.
>   - Compat entry name, if required.
>   - spu entry name, if required.
> 
> The ARM, s390 and x86 architectures already have similar
> support. I leveraged their implementation to
> come up with a generic solution.
> 
> I have done the same work for alpha,
> ia64, m68k, microblaze, mips, parisc, sh, sparc,
> and xtensa. The git repository mentioned below contains
> more details about the workflow.
> 
> https://github.com/frzkhn/system_call_table_generator/
> 
> Finally, this is the groundwork for solving the Y2038
> issue. We need to add two dozen system calls to
> solve the Y2038 issue, so this patch series will make it
> easy to add new system calls by adding new entries to
> syscall.tbl.
> 
> Changes since v4:
>  - DOTSYM macro removed for ppc32, which was causing
>    the compilation error.
> 
> Changes since v3:
>  - split compat syscall table out from native table.
>  - modified the script to add new line in the generated
>    file.
> 
> Changes since v2:
>  - modified/optimized the syscall.tbl to avoid duplicate
>    for the spu entries.
>  - updated the syscalltbl.sh to meet the above point.
> 
> Changes since v1:
>  - optimized/updated the syscall table generation
>    scripts.
>  - fixed all mixed indentation issues in syscall.tbl.
>  - added "comments" in syscall_*.tbl.
>  - changed from generic-y to generated-y in Kbuild.
> 
> Firoz Khan (5):
>   powerpc: add __NR_syscalls along with NR_syscalls
>   powerpc: move macro definition from asm/systbl.h
>   powerpc: add system call table generation support
>   powerpc: split compat syscall table out from native table
>   powerpc: generate uapi header and system call table files

I tried to apply the series on Linus' "master" branch and on the linuxppc-dev
(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git) "merge"
branch; it failed to apply on both.

# git am mbox
Applying: powerpc: add __NR_syscalls along with NR_syscalls
Applying: powerpc: move macro definition from asm/systbl.h
Applying: powerpc: add system call table generation support
Applying: powerpc: split compat syscall table out from native table
Applying: powerpc: generate uapi header and system call table files
error: patch failed: arch/powerpc/include/uapi/asm/Kbuild:1
error: arch/powerpc/include/uapi/asm/Kbuild: patch does not apply
Patch failed at 0005 powerpc: generate uapi header and system call table files
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

I then tried the linuxppc-dev
(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git) "next"
branch: the series applied, compiled with ppc64le_defconfig, and booted on an
IBM Power8 box.

# uname -r
4.20.0-rc2-gdd2690d2c

It looks like the patch series needs a rebase against the latest kernel versions.


Thanks,
-Satheesh.
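
For readers unfamiliar with the table format quoted above (system call number, ABI, name, entry point, optional compat entry), here is a minimal userspace sketch of parsing one whitespace-separated line of that shape. It only illustrates the field layout and is not the syscallhdr.sh or syscalltbl.sh logic from the series. The struct, the function name and the field widths are hypothetical, and the optional spu field is not handled.

```c
#include <stdio.h>
#include <string.h>

/* One parsed table line; field widths are arbitrary for this sketch. */
struct syscall_entry {
	int  nr;
	char abi[16];
	char name[64];
	char entry[64];
	char compat[64];	/* empty string when no compat entry is given */
};

/*
 * Parse "<number> <abi> <name> <entry point> [<compat entry point>]".
 * Returns 0 on success, -1 for blank lines, '#' comments and lines
 * with fewer than four fields.
 */
static int parse_syscall_line(const char *line, struct syscall_entry *e)
{
	int n;

	memset(e, 0, sizeof(*e));
	while (*line == ' ' || *line == '\t')
		line++;
	if (*line == '#' || *line == '\n' || *line == '\0')
		return -1;

	n = sscanf(line, "%d %15s %63s %63s %63s",
		   &e->nr, e->abi, e->name, e->entry, e->compat);
	return n >= 4 ? 0 : -1;
}
```

A line such as "3 common read sys_read" (an illustrative example, not copied from the patch) would parse into number 3, ABI "common", name "read" and entry point "sys_read", with an empty compat field.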

> 
>  arch/powerpc/Makefile   |   3 +
>  arch/powerpc/include/asm/Kbuild |   4 +
>  arch/powerpc/include/asm/syscall.h  |   3 +-
>  arch/powerpc/include/asm/systbl.h   | 396 --
>  arch/powerpc/include/asm/unistd.h   |   3 +-
>  arch/powerpc/include/uapi/asm/Kbuild|   2 +
>  arch/powerpc/include/uapi/asm/unistd.h  | 389 +
>  arch/powerpc/kernel/Makefile|  10 -
>  arch/powerpc/kernel/entry_64.S  |   7 +-
>  arch/powerpc/kernel/syscalls/Makefile   |  63 
>  arch/powerpc/kernel/syscalls/syscall.tbl| 427 
>  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
>  arch/powerpc/kernel/systbl.S|  40 ++-
>  arch/powerpc/kernel/systbl_chk.c|  60 
>  arch/powerpc/kernel/vdso.c  | 

[PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache

2018-12-17 Thread frowand . list
From: Frank Rowand 

Non-overlay dynamic devicetree node removal may leave the node in
the phandle cache.  Subsequent calls to of_find_node_by_phandle()
will incorrectly find the stale entry.  Remove the node from the
cache.

Add paranoia checks in of_find_node_by_phandle() as a second level
of defense (do not return cached node if detached, do not add node
to cache if detached).

Reported-by: Michael Bringmann 
Signed-off-by: Frank Rowand 
---

changes since v1:
  - add WARN_ON(1) for unexpected condition in of_find_node_by_phandle()

 drivers/of/base.c   | 30 +-
 drivers/of/dynamic.c|  3 +++
 drivers/of/of_private.h |  4 
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 6c33d63361b8..ad71864cecf5 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -162,6 +162,27 @@ int of_free_phandle_cache(void)
 late_initcall_sync(of_free_phandle_cache);
 #endif
 
+/*
+ * Caller must hold devtree_lock.
+ */
+void __of_free_phandle_cache_entry(phandle handle)
+{
+   phandle masked_handle;
+
+   if (!handle)
+   return;
+
+   masked_handle = handle & phandle_cache_mask;
+
+   if (phandle_cache) {
+   if (phandle_cache[masked_handle] &&
+   handle == phandle_cache[masked_handle]->phandle) {
+   of_node_put(phandle_cache[masked_handle]);
+   phandle_cache[masked_handle] = NULL;
+   }
+   }
+}
+
 void of_populate_phandle_cache(void)
 {
unsigned long flags;
@@ -1209,11 +1230,18 @@ struct device_node *of_find_node_by_phandle(phandle handle)
if (phandle_cache[masked_handle] &&
handle == phandle_cache[masked_handle]->phandle)
np = phandle_cache[masked_handle];
+   if (np && of_node_check_flag(np, OF_DETACHED)) {
+   WARN_ON(1);
+   of_node_put(np);
+   phandle_cache[masked_handle] = NULL;
+   np = NULL;
+   }
}
 
if (!np) {
for_each_of_allnodes(np)
-   if (np->phandle == handle) {
+   if (np->phandle == handle &&
+   !of_node_check_flag(np, OF_DETACHED)) {
if (phandle_cache) {
/* will put when removed from cache */
of_node_get(np);
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index f4f8ed9b5454..ecea92f68c87 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -268,6 +268,9 @@ void __of_detach_node(struct device_node *np)
}
 
of_node_set_flag(np, OF_DETACHED);
+
+   /* race with of_find_node_by_phandle() prevented by devtree_lock */
+   __of_free_phandle_cache_entry(np->phandle);
 }
 
 /**
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index 5d1567025358..24786818e32e 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -84,6 +84,10 @@ static inline void __of_detach_node_sysfs(struct device_node *np) {}
 int of_resolve_phandles(struct device_node *tree);
 #endif
 
+#if defined(CONFIG_OF_DYNAMIC)
+void __of_free_phandle_cache_entry(phandle handle);
+#endif
+
 #if defined(CONFIG_OF_OVERLAY)
 void of_overlay_mutex_lock(void);
 void of_overlay_mutex_unlock(void);
-- 
Frank Rowand
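
The stale-entry problem the patch above addresses can be modelled in userspace with a small direct-mapped cache indexed by the masked phandle. The sketch below is a simplified illustration only: it omits devtree_lock, the of_node_get()/of_node_put() reference counting and the WARN_ON() from the real patch. Every name in it (struct node, cache_free_entry(), detach_node(), find_by_phandle()) is hypothetical and is not a kernel API.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define CACHE_ENTRIES 8u			/* must be a power of two */
#define CACHE_MASK    (CACHE_ENTRIES - 1)

struct node {
	uint32_t phandle;
	bool detached;
};

/* Direct-mapped cache: slot index is the phandle masked to the cache size. */
static struct node *cache[CACHE_ENTRIES];

/* Drop the cache slot for 'handle' if it really holds that phandle. */
static void cache_free_entry(uint32_t handle)
{
	uint32_t slot = handle & CACHE_MASK;

	if (!handle)
		return;
	if (cache[slot] && cache[slot]->phandle == handle)
		cache[slot] = NULL;
}

/* Mark the node detached and make sure the cache cannot return it. */
static void detach_node(struct node *np)
{
	np->detached = true;
	cache_free_entry(np->phandle);
}

/* Look up a node, preferring the cache and never returning detached nodes. */
static struct node *find_by_phandle(struct node *all, size_t n, uint32_t handle)
{
	uint32_t slot = handle & CACHE_MASK;
	struct node *np = NULL;

	if (!handle)
		return NULL;

	if (cache[slot] && cache[slot]->phandle == handle)
		np = cache[slot];

	/* Second line of defence: a cached but detached node is stale. */
	if (np && np->detached) {
		cache[slot] = NULL;
		np = NULL;
	}

	if (!np) {
		for (size_t i = 0; i < n; i++) {
			if (all[i].phandle == handle && !all[i].detached) {
				cache[slot] = &all[i];
				return &all[i];
			}
		}
	}
	return np;
}
```

In this model, looking up a phandle after detach_node() returns NULL instead of the stale cached pointer, which is the behaviour the commit message describes.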