Re: perf tools: add support for generating bpf prologue on powerpc

2016-05-06 Thread Michael Ellerman
On Thu, 2016-05-05 at 15:23:19 UTC, "Naveen N. Rao" wrote:
> Generalize existing macros to serve the purpose.
> 
> Cc: Wang Nan 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Masami Hiramatsu 
> Cc: Ian Munsie 
> Cc: Michael Ellerman 
> Signed-off-by: Naveen N. Rao 
> ---
> With this patch:
> # ./perf test 37
> 37: Test BPF filter  :
> 37.1: Test basic BPF filtering   : Ok
> 37.2: Test BPF prologue generation   : Ok
> 37.3: Test BPF relocation checker: Ok
> 
>  tools/perf/arch/powerpc/Makefile  |  1 +
 tools/perf/arch/powerpc/util/dwarf-regs.c | 40 +--
>  2 files changed, 29 insertions(+), 12 deletions(-)

Looks feasible, and it's in powerpc-only code; should I take this or acme?

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [3/3] powerpc/fadump: add support for fadump_nr_cpus= parameter

2016-05-06 Thread Michael Ellerman
On Fri, 2016-05-06 at 11:51:08 UTC, Hari Bathini wrote:
> Kernel parameter 'nr_cpus' can be used to limit the maximum number
> of processors that an SMP kernel could support. This patch extends
> this to fadump by introducing 'fadump_nr_cpus' parameter that can
> help in booting fadump kernel on a lower memory footprint.

Is there really no other way to do this? I really hate adding new, single-use
command line parameters.

cheers

Re: [2/3] powerpc/fadump: add support to specify memory range based size

2016-05-06 Thread Michael Ellerman
On Fri, 2016-05-06 at 11:50:37 UTC, Hari Bathini wrote:
> Currently, memory for fadump can be specified with fadump_reserve_mem=size,
> where only a fixed size can be specified. This patch tries to extend this
> syntax to support conditional reservation based on memory size, with the
> below syntax:
> 
>   fadump_reserve_mem=ramsize-range:size[,ramsize-range:size,...]
> 
> This syntax helps in using the same command-line parameter for different
> system memory sizes.

This is basically using the crashkernel= syntax right?

So can we please reuse the crashkernel= parsing code?

cheers
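
The crashkernel=-style range matching being discussed can be sketched in a small
userspace C program. Everything here is illustrative: memparse_ull() is a cut-down
stand-in for the kernel's memparse(), and pick_reserve_size() mirrors the
"first matching range wins" selection rule, not the kernel's actual parser.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal userspace analogue of the kernel's memparse(): parse a number
 * with an optional K/M/G suffix and advance *retptr past it. */
static unsigned long long memparse_ull(const char *p, char **retptr)
{
    unsigned long long v = strtoull(p, retptr, 0);

    switch (**retptr) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; (*retptr)++;
    }
    return v;
}

/* Walk "start-end:size[,...]" entries and return the size of the first
 * range containing system_ram, or 0 if none matches -- the same
 * selection rule crashkernel= uses. */
static unsigned long long pick_reserve_size(const char *spec,
                                            unsigned long long system_ram)
{
    char *cur = (char *)spec, *tmp;

    do {
        unsigned long long start, end = UINT64_MAX, size;

        start = memparse_ull(cur, &tmp);
        if (cur == tmp || *tmp != '-')
            return 0;               /* malformed entry */
        cur = tmp + 1;

        if (*cur != ':') {          /* an omitted end means "no limit" */
            end = memparse_ull(cur, &tmp);
            if (cur == tmp || end <= start)
                return 0;
            cur = tmp;
        }
        if (*cur != ':')
            return 0;
        cur++;

        size = memparse_ull(cur, &tmp);
        if (cur == tmp)
            return 0;
        cur = tmp;

        if (system_ram >= start && system_ram < end)
            return size;            /* first matching range wins */
    } while (*cur++ == ',');

    return 0;
}
```

With a spec like "4G-16G:1G,16G-64G:2G", an 8G machine picks 1G and a 32G
machine picks 2G, which is exactly the "one command line for many machine
sizes" property the patch description claims.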

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-06 Thread Alex Williamson
On Fri, 6 May 2016 16:35:38 +1000
Alexey Kardashevskiy  wrote:

> On 05/06/2016 01:05 AM, Alex Williamson wrote:
> > On Thu, 5 May 2016 12:15:46 +
> > "Tian, Kevin"  wrote:
> >  
> >>> From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> >>> Sent: Thursday, May 05, 2016 7:43 PM
> >>>
> >>> Hi David and Kevin,
> >>>
> >>> On 2016/5/5 17:54, David Laight wrote:
> >>>  
>  From: Tian, Kevin  
> > Sent: 05 May 2016 10:37  
>  ...  
> >> Acutually, we are not aimed at accessing MSI-X table from
> >> guest. So I think it's safe to passthrough MSI-X table if we
> >> can make sure guest kernel would not touch MSI-X table in
> >> normal code path such as para-virtualized guest kernel on PPC64.
> >>  
> > Then how do you prevent malicious guest kernel accessing it?  
>  Or a malicious guest driver for an ethernet card setting up
>  the receive buffer ring to contain a single word entry that
>  contains the address associated with an MSI-X interrupt and
>  then using a loopback mode to cause a specific packet be
>  received that writes the required word through that address.
> 
>  Remember the PCIe cycle for an interrupt is a normal memory write
>  cycle.
> 
>   David
>   
> >>>
> >>> If we have enough permission to load a malicious driver or
> >>> kernel, we can easily break the guest without exposed
> >>> MSI-X table.
> >>>
> >>> I think it should be safe to expose MSI-X table if we can
> >>> make sure that malicious guest driver/kernel can't use
> >>> the MSI-X table to break other guest or host. The
> >>> capability of IRQ remapping could provide this
> >>> kind of protection.
> >>>  
> >>
> >> With IRQ remapping it doesn't mean you can pass through MSI-X
> >> structure to guest. I know actual IRQ remapping might be platform
> >> specific, but at least for Intel VT-d specification, MSI-X entry must
> >> be configured with a remappable format by host kernel which
> >> contains an index into IRQ remapping table. The index will find a
> >> IRQ remapping entry which controls interrupt routing for a specific
> >> device. If you allow a malicious program random index into MSI-X
> >> entry of assigned device, the hole is obvious...
> >>
>> Above might make sense only for an IRQ remapping implementation
> >> which doesn't rely on extended MSI-X format (e.g. simply based on
> >> BDF). If that's the case for PPC, then you should build MSI-X
> >> passthrough based on this fact instead of general IRQ remapping
> >> enabled or not.  
> >
> > I don't think anyone is expecting that we can expose the MSI-X vector
> > table to the guest and the guest can make direct use of it.  The end
> > goal here is that the guest on a power system is already
> > paravirtualized to not program the device MSI-X by directly writing to
> > the MSI-X vector table.  They have hypercalls for this since they
> > always run virtualized.  Therefore a) they never intend to touch the
> > MSI-X vector table and b) they have sufficient isolation that a guest
> > can only hurt itself by doing so.
> >
> > On x86 we don't have a), our method of programming the MSI-X vector
> > table is to directly write to it. Therefore we will always require QEMU
> > to place a MemoryRegion over the vector table to intercept those
> > accesses.  However with interrupt remapping, we do have b) on x86, which
> > means that we don't need to be so strict in disallowing user accesses
> > to the MSI-X vector table.  It's not useful for configuring MSI-X on
> > the device, but the user should only be able to hurt themselves by
> > writing it directly.  x86 doesn't really get anything out of this
> > change, but it helps this special case on power pretty significantly
> > aiui.  Thanks,  
> 
> Excellent short overview, saved :)
> 
> How do we proceed with these patches? Nobody seems to object to them, but
> nobody seems to be taking them either...

Well, this series is still based on some non-upstream patches, so...
Once that dependency is resolved this series should probably be split
into functional areas for acceptance by the appropriate subsystem
maintainers.
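
The underlying constraint in this thread is that mmap works in whole pages, so
the MSI-X vector table "taints" every page it touches. A minimal sketch of the
overlap test a vfio-style driver has to make before allowing such an mmap
(userspace, illustrative names; not vfio-pci's actual code):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Round down/up to a page boundary. */
static uint64_t page_down(uint64_t x) { return x & PAGE_MASK; }
static uint64_t page_up(uint64_t x)   { return (x + PAGE_SIZE - 1) & PAGE_MASK; }

/* Return 1 if an mmap request [req_off, req_off+req_len) into a BAR
 * overlaps the page-aligned span covering the MSI-X vector table at
 * [tbl_off, tbl_off+tbl_len). Without isolation (interrupt remapping on
 * x86, or PPC's hypercall-only MSI setup), such a request must be
 * rejected; with it, the overlap only lets the guest hurt itself. */
static int overlaps_msix_table(uint64_t req_off, uint64_t req_len,
                               uint64_t tbl_off, uint64_t tbl_len)
{
    uint64_t tbl_start = page_down(tbl_off);
    uint64_t tbl_end   = page_up(tbl_off + tbl_len);

    return req_off < tbl_end && req_off + req_len > tbl_start;
}
```

A table at offset 0x3000 of length 0x100 taints the whole page [0x3000, 0x4000),
so an mmap of [0x2000, 0x4000) conflicts even though it never touches the table
bytes themselves.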

Re: [PATCH v9 22/22] PCI/hotplug: PowerPC PowerNV PCI hotplug driver

2016-05-06 Thread Rob Herring
On Thu, May 5, 2016 at 7:28 PM, Gavin Shan  wrote:
> On Thu, May 05, 2016 at 12:04:49PM -0500, Rob Herring wrote:
>>On Tue, May 3, 2016 at 8:22 AM, Gavin Shan  wrote:
>>> This adds standalone driver to support PCI hotplug for PowerPC PowerNV
>>> platform that runs on top of skiboot firmware. The firmware identifies
>>> hotpluggable slots and marked their device tree node with proper
>>> "ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans
>>> device tree nodes to create/register PCI hotplug slot accordingly.
>>>
>>> The PCI slots are organized as a tree: one PCI slot might have a parent
>>> PCI slot, and a parent PCI slot may contain multiple child PCI slots.
>>> At plug time, the parent
>>> PCI slot is populated before its children. The child PCI slots are
>>> removed before their parent PCI slot can be removed from the system.
>>>
>>> If the skiboot firmware doesn't support slot status retrieval, the PCI
>>> slot device node shouldn't have property "ibm,reset-by-firmware". In
>>> that case, none of valid PCI slots will be detected from device tree.
>>> The skiboot firmware doesn't export the capability to access attention
>>> LEDs yet; that is left as a TBD.
>>>
>>> Signed-off-by: Gavin Shan 
>>> Acked-by: Bjorn Helgaas 
>>
>>[...]
>>
>>> +static void pnv_php_handle_poweron(struct pnv_php_slot *php_slot)
>>> +{
>>> +   void *fdt, *fdt1, *dt;
>>> +   int confirm = PNV_PHP_POWER_CONFIRMED_SUCCESS;
>>> +   int ret;
>>> +
>>> +   /* We don't know the FDT blob size. We try to get it through
>>> +* maximal memory chunk and then copy it to another chunk that
>>> +* fits the real size.
>>> +*/
>>> +   fdt1 = kzalloc(0x10000, GFP_KERNEL);
>>> +   if (!fdt1)
>>> +   goto error;
>>> +
>>> +   ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
>>> +   if (ret)
>>> +   goto free_fdt1;
>>> +
>>> +   fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
>>> +   if (!fdt)
>>> +   goto free_fdt1;
>>> +
>>> +   /* Unflatten device tree blob */
>>> +   memcpy(fdt, fdt1, fdt_totalsize(fdt1));
>>
>>This is wrong. If the size is greater than 64K, then you will be
>>overrunning the fdt1 buffer. You need to fetch the FDT again if it is
>>bigger than 64KB.
>>
>
> Thanks for the review, Rob. Sorry, but I don't see how it's a problem. An
> error code is returned from pnv_pci_get_device_tree() if the FDT blob
> size is greater than 64K. In this case, memcpy() won't be triggered.
> pnv_pci_get_device_tree() relies on the firmware implementation, which
> avoids overrunning the buffer.

Okay, I missed that pnv_pci_get_device_tree would error out.

> On the other hand, it would be reasonable to retry retrieving the
> FDT blob if a 64K buffer isn't enough. Also, kzalloc() can be replaced
> with alloc_pages() as 64K is the default page size on PPC64. I will
> have something like below unless someone has more concerns. As the
> size of the allocated buffer will be greater than the real FDT blob
> size, some memory (not too much) is wasted. I guess it should be ok.
>
> struct page *page;
> void *fdt;
> unsigned int order;
> int ret;
>
> for (order = 0; order < MAX_ORDER; order++) {
> page = alloc_pages(GFP_KERNEL, order);
> if (!page)
> continue;
>
> fdt = page_address(page);
> ret = pnv_pci_get_device_tree(php_slot->dn->phandle,
>   fdt, (1 << order) * PAGE_SIZE);
> if (!ret)
> break;  /* success: fdt now holds the blob */
>
> dev_dbg(&php_slot->pdev->dev, "Error %d getting device tree (%d)\n",
> ret, order);
> free_pages((unsigned long)fdt, order);
> }

I would allocate a minimal buffer to read the header, get the actual
size, then allocate a new buffer. There's no point in looping. If you
know 64KB is the biggest size you should ever see, then how you had it
is reasonable, too.

Rob
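
Rob's header-first approach can be sketched in userspace. The mock firmware call
below is a stand-in for pnv_pci_get_device_tree() and is assumed, for this
sketch, to fill as much of the buffer as it can; the two fixed-size fields at
the start of a flattened device tree (big-endian magic and totalsize) are what
make the two-step fetch possible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>

/* First two fields of a flattened device tree header. */
struct fdt_header {
    uint32_t magic;      /* big-endian 0xd00dfeed */
    uint32_t totalsize;  /* big-endian total blob size */
};

/* Mock of the firmware call: copies up to bufsize bytes of the blob.
 * The real call takes a phandle; this is only a stand-in. */
static const uint8_t *g_blob;
static size_t g_blob_size;

static int mock_get_device_tree(void *buf, size_t bufsize)
{
    size_t n = bufsize < g_blob_size ? bufsize : g_blob_size;

    memcpy(buf, g_blob, n);
    return 0;
}

/* Step 1: minimal read to learn the blob's real size.
 * Step 2: exact-size allocation and a single full fetch -- no loop,
 * no oversized 64K guess, nothing wasted. */
static void *fetch_fdt(size_t *sizep)
{
    struct fdt_header hdr;
    size_t total;
    void *fdt;

    mock_get_device_tree(&hdr, sizeof(hdr));
    if (ntohl(hdr.magic) != 0xd00dfeed)
        return NULL;
    total = ntohl(hdr.totalsize);

    fdt = malloc(total);
    if (!fdt || mock_get_device_tree(fdt, total) != 0) {
        free(fdt);
        return NULL;
    }
    *sizep = total;
    return fdt;
}
```

Whether this works in the kernel depends on the firmware call tolerating a
buffer smaller than the blob, which is exactly the behavior being debated above.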

Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model

2016-05-06 Thread Josh Poimboeuf
On Fri, May 06, 2016 at 01:33:01PM +0200, Petr Mladek wrote:
> On Thu 2016-04-28 15:44:48, Josh Poimboeuf wrote:
> > diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
> > index 782fbb5..b3b8639 100644
> > --- a/kernel/livepatch/patch.c
> > +++ b/kernel/livepatch/patch.c
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include "patch.h"
> > +#include "transition.h"
> >  
> >  static LIST_HEAD(klp_ops);
> >  
> > @@ -58,11 +59,42 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> > ops = container_of(fops, struct klp_ops, fops);
> >  
> > rcu_read_lock();
> > +
> > func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
> >   stack_node);
> > -   if (WARN_ON_ONCE(!func))
> > +
> > +   if (!func)
> > goto unlock;
> >  
> > +   /*
> > +* See the comment for the 2nd smp_wmb() in klp_init_transition() for
> > +* an explanation of why this read barrier is needed.
> > +*/
> > +   smp_rmb();
> > +
> > +   if (unlikely(func->transition)) {
> > +
> > +   /*
> > +* See the comment for the 1st smp_wmb() in
> > +* klp_init_transition() for an explanation of why this read
> > +* barrier is needed.
> > +*/
> > +   smp_rmb();
> 
> I would add here:
> 
>   WARN_ON_ONCE(current->patch_state == KLP_UNDEFINED);
> 
> We do not know in which context this is called, so the printk's are
> not ideal. But it will get triggered only if there is a bug in
> the livepatch implementation. It should happen on random locations
> and rather early when a bug is introduced.
> 
> Anyway, better to die and catch the bug than to let the system run
> in an undefined state and produce cryptic errors later on.

Ok, makes sense.

> > +   if (current->patch_state == KLP_UNPATCHED) {
> > +   /*
> > +* Use the previously patched version of the function.
> > +* If no previous patches exist, use the original
> > +* function.
> > +*/
> > +   func = list_entry_rcu(func->stack_node.next,
> > + struct klp_func, stack_node);
> > +
> > +   if (&func->stack_node == &ops->func_stack)
> > +   goto unlock;
> > +   }
> > +   }
> 
> I am staring into the code for too long now. I need to step back for a
> while. I'll do another look when you send the next version. Anyway,
> you did a great work. I speak mainly for the livepatch part and
> I like it.

Thanks for the helpful reviews!  I'll be on vacation again next week so
I get a break too :-)

-- 
Josh

Canyonlands oops at Shutdown

2016-05-06 Thread Julian Margetson
Getting the following at shutdown with kernel 4.6-rc's on a Sam460ex Canyonlands board.


Regards
Julian


[ 1533.722779] Unable to handle kernel paging request for data at address 0x0128

[ 1533.744309] Faulting instruction address: 0xc026d3c8
[ 1535.763583] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1535.782886] PREEMPT Canyonlands
[ 1535.799805] Modules linked in:
[ 1535.816546] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.6.0-rc6-sam460ex-jm #4

[ 1535.838341] task: ea85 ti: ea846000 task.ti: ea846000
[ 1535.857783] NIP: c026d3c8 LR: c0466984 CTR: c001a8ac
[ 1535.876847] REGS: ea847d10 TRAP: 0300   Not tainted  (4.6.0-rc6-sam460ex-jm)

[ 1535.898224] MSR: 00029000   CR: 44422284  XER:
[ 1535.918868] DEAR: 0128 ESR:
GPR00: c0466984 ea847dc0 ea85 0108 000f fff0  0007
GPR08: 0001 c0b5a19c  ea847de0 28428468 205cfe94 205a946e bfff8a7c
GPR16: 2097d008 2097d018 2097d090 bfff8980 bfff897c   4321fedc
GPR24: 2097d578 c0b3f17c c0b8  c0a95e68 eaa35000 c0b8545c 0108

[ 1536.024665] NIP [c026d3c8] kobject_get+0x18/0x80
[ 1536.044003] LR [c0466984] get_device+0x1c/0x38
[ 1536.063198] Call Trace:
[ 1536.080424] [ea847dc0] [eaa3fa10] 0xeaa3fa10 (unreliable)
[ 1536.100896] [ea847dd0] [c0466984] get_device+0x1c/0x38
[ 1536.121059] [ea847de0] [c04689f4] device_shutdown+0x58/0x178
[ 1536.141727] [ea847e10] [c003c280] kernel_halt+0x38/0x64
[ 1536.161875] [ea847e20] [c003c4cc] SyS_reboot+0x140/0x1b0
[ 1536.182057] [ea847f40] [c000ad80] ret_from_syscall+0x0/0x3c
[ 1536.202568] --- interrupt: c01 at 0x203d1fbc
[ 1536.202568] LR = 0x2058f878
[ 1536.239761] Instruction dump:
[ 1536.257358] 4b9c 7fa3eb78 484cad91 39610020 7fe3fb78 4bda3d74 9421fff0 7c0802a6
[ 1536.280308] 93e1000c 7c7f1b79 90010014 41820060 <813f0020> 2f89 41bc001c 809f

[ 1536.303611] ---[ end trace f5a63492b41c62f2 ]---
[ 1536.323546]
[ 1537.340293] note: systemd-shutdow[1] exited with preempt_count 1
[ 1537.363602] Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b

[ 1537.363602]
[ 1537.404665] Rebooting in 180 seconds..


U-Boot 2015.a (May 16 2015 - 14:20:11)

CPU:   AMCC PowerPC 460EX Rev. B at 1155 MHz (PLB=231 OPB=115 EBC=115)
   No Security/Kasumi support
   Bootstrap Option H - Boot ROM Location I2C (Addr 0x52)
   Internal PCI arbiter enabled
   32 kB I-Cache 32 kB D-Cache
Board: Sam460ex/cr, PCIe 4x + SATA-2
I2C:   ready
DRAM:  ddr2_boost enabled, level 3
   2 GiB (ECC not enabled, 462 MHz, CL4)
PCI:   Bus Dev VenId DevId Class Int
00  04  1095  3512  0104  00
00  06  126f  0501  0380  00
PCIE1: successfully set as root-complex
02  00  1002  683f  0300  ff
Net:   ppc_4xx_eth0
FPGA:  Revision 03 (2010-10-07)
SM502: found
PERMD2:not found
VGA:   1
VESA:  OK
[0.00] Using Canyonlands machine description
[0.00] Linux version 4.6.0-rc6-sam460ex-jm (root@julian-VirtualBox) 
(gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #4 PREEMPT Fri May 6 
07:54:21 AST 2016
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x-0x2fff]
[0.00]   Normal   empty
[0.00]   HighMem  [mem 0x3000-0x7fff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x-0x7fff]
[0.00] Initmem setup node 0 [mem 0x-0x7fff]
[0.00] MMU: Allocated 1088 bytes of context maps for 255 contexts
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total 
pages: 522752
[0.00] Kernel command line: root=/dev/sda6 console=ttyS0,115200 
console=tty0
[0.00] PID hash table entries: 4096 (order: 2, 16384 bytes)
[0.00] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[0.00] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[0.00] Sorting __ex_table...
[0.00] Memory: 2001664K/2097152K available (7416K kernel code, 316K 
rwdata, 3808K rodata, 240K init, 370K bss, 95488K reserved, 0K cma-reserved, 
1310720K highmem)
[0.00] Kernel virtual memory layout:
[0.00]   * 0xfffcf000..0xf000  : fixmap
[0.00]   * 0xffc0..0xffe0  : highmem PTEs
[0.00]   * 0xffa0..0xffc0  : consistent mem
[0.00]   * 0xffa0..0xffa0  : early ioremap
[0.00]   * 0xf100..0xffa0  : vmalloc & ioremap
[0.00] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[0.00] Preemptible hierarchical RCU implementation.
[0.00] Build-time adjustment of leaf fanout to 32.
[0.00] NR_IRQS:512 nr_irqs:512 16
[0.00] UIC0 (32 IRQ sources) at DCR 0xc0
[0.00] UIC1 (32 IRQ sources) at DCR 0xd0
[0.00] UIC2 (32 IRQ sources) at DCR 0xe0
[0.00] UIC3 (32 IRQ sources) at DCR 0xf0
[

Re: klp_task_patch: was: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model

2016-05-06 Thread Josh Poimboeuf
On Thu, May 05, 2016 at 01:57:01PM +0200, Petr Mladek wrote:
> I have missed that the two commands are called with preemption
> disabled. So, I had the following crazy scenario in mind:
> 
> 
> CPU0  CPU1
> 
> klp_enable_patch()
> 
>   klp_target_state = KLP_PATCHED;
> 
>   for_each_task()
>  set TIF_PENDING_PATCH
> 
>   # task 123
> 
>   if (klp_patch_pending(current)
> klp_patch_task(current)
> 
> clear TIF_PENDING_PATCH
> 
>   smp_rmb();
> 
>   # switch to assembly of
>   # klp_patch_task()
> 
>   mov klp_target_state, %r12
> 
>   # interrupt and schedule
>   # another task
> 
> 
>   klp_reverse_transition();
> 
> klp_target_state = KLP_UNPATCHED;
> 
> klt_try_to_complete_transition()
> 
>   task = 123;
>   if (task->patch_state == klp_target_state)
>  return 0;
> 
> => task 123 is in target state and does
> not block conversion
> 
>   klp_complete_transition()
> 
> 
>   # disable previous patch on the stack
>   klp_disable_patch();
> 
> klp_target_state = KLP_UNPATCHED;
>   
>   
>   # task 123 gets scheduled again
>   lea %r12, task->patch_state
> 
>   => it happily stores an outdated
>   state
> 

Thanks for the clear explanation, this helps a lot.

> This is why the two functions should get called with preemption
> disabled. We should document it at least. I imagine that we will
> use them later also in another context and nobody will remember
> this crazy scenario.
> 
> Well, even disabled preemption does not help. The process on
> CPU1 might be also interrupted by an NMI and do some long
> printk in it.
> 
> IMHO, the only safe approach is to call klp_patch_task()
> only for "current" on a safe place. Then this race is harmless.
> The switch happen on a safe place, so that it does not matter
> into which state the process is switched.

I'm not sure about this solution.  When klp_complete_transition() is
called, we need all tasks to be patched, for good.  We don't want any of
them to randomly switch to the wrong state at some later time in the
middle of a future patch operation.  How would changing klp_patch_task()
to only use "current" prevent that?

> By other words, the task state might be updated only
> 
>+ by the task itself on a safe place
>+ by another task when the updated one is sleeping on a safe place
> 
> This should be well documented and the API should help to avoid
> a misuse.

I think we could fix it to be safe for future callers who might not have
preemption disabled with a couple of changes to klp_patch_task():
disabling preemption and testing/clearing the TIF_PATCH_PENDING flag
before changing the patch state:

  void klp_patch_task(struct task_struct *task)
  {
preempt_disable();
  
if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING))
task->patch_state = READ_ONCE(klp_target_state);
  
preempt_enable();
  }

We would also need a synchronize_sched() after the patching is complete,
either at the end of klp_try_complete_transition() or in
klp_complete_transition().  That would make sure that all existing calls
to klp_patch_task() are done.
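
As a rough userspace analogue of the pattern above (not the kernel
implementation: C11 atomics stand in for the thread-flag helpers and
READ_ONCE(), and preemption control has no userspace equivalent):

```c
#include <assert.h>
#include <stdatomic.h>

enum klp_state { KLP_UNPATCHED, KLP_PATCHED };

/* Userspace stand-ins: the TIF_PATCH_PENDING thread flag becomes an
 * atomic int, test_and_clear_tsk_thread_flag() becomes atomic_exchange(),
 * and READ_ONCE() becomes atomic_load(). preempt_disable()/enable()
 * appear only as comments. */
struct task {
    atomic_int patch_pending;
    int patch_state;
};

static atomic_int klp_target_state = KLP_PATCHED;

static void klp_patch_task(struct task *task)
{
    /* preempt_disable(); */

    /* Test-and-clear the pending flag before reading the target state,
     * so the task either consumes the current target or is left flagged
     * for a later pass -- it never stores a stale state with the flag
     * already clear. */
    if (atomic_exchange(&task->patch_pending, 0))
        task->patch_state = atomic_load(&klp_target_state);

    /* preempt_enable(); */
}
```

The ordering is the whole point: clearing the flag first means a concurrent
reversal can re-flag the task and force another pass, which is what closes the
race Petr described.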

-- 
Josh

[PATCH 3/3] powerpc/fadump: add support for fadump_nr_cpus= parameter

2016-05-06 Thread Hari Bathini
Kernel parameter 'nr_cpus' can be used to limit the maximum number
of processors that an SMP kernel could support. This patch extends
this to fadump by introducing 'fadump_nr_cpus' parameter that can
help in booting fadump kernel on a lower memory footprint.

Suggested-by: Mahesh Salgaonkar 
Signed-off-by: Hari Bathini 
---
 arch/powerpc/kernel/fadump.c |   22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index a7fef3e..c75783c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -470,6 +470,28 @@ static int __init early_fadump_param(char *p)
 }
 early_param("fadump", early_fadump_param);
 
+/* Look for fadump_nr_cpus= cmdline option. */
+static int __init early_fadump_nrcpus(char *p)
+{
+   int nr_cpus;
+
+   /*
+* fadump_nr_cpus parameter is only applicable on a
+* fadump active kernel. This is to reduce memory
+* needed to boot a fadump active kernel.
+* So, check if we are booting after crash.
+*/
+   if (!is_fadump_active())
+   return 0;
+
+   get_option(&p, &nr_cpus);
+   if (nr_cpus > 0 && nr_cpus < nr_cpu_ids)
+   nr_cpu_ids = nr_cpus;
+
+   return 0;
+}
+early_param("fadump_nr_cpus", early_fadump_nrcpus);
+
 static void register_fw_dump(struct fadump_mem_struct *fdm)
 {
int rc;


[PATCH 2/3] powerpc/fadump: add support to specify memory range based size

2016-05-06 Thread Hari Bathini
Currently, memory for fadump can be specified with fadump_reserve_mem=size,
where only a fixed size can be specified. This patch tries to extend this
syntax to support conditional reservation based on memory size, with the
below syntax:

fadump_reserve_mem=ramsize-range:size[,ramsize-range:size,...]

This syntax helps in using the same command-line parameter for different
system memory sizes.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/kernel/fadump.c |  127 +++---
 1 file changed, 118 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index d0af58b..a7fef3e 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -193,6 +193,121 @@ static unsigned long init_fadump_mem_struct(struct 
fadump_mem_struct *fdm,
return addr;
 }
 
+#define FADUMP_MEM_CMDLINE_PREFIX  "fadump_reserve_mem="
+
+static __init char *get_last_fadump_reserve_mem(void)
+{
+   char *p = boot_command_line, *fadump_cmdline = NULL;
+
+   /* find fadump_reserve_mem and use the last one if there are more */
+   p = strstr(p, FADUMP_MEM_CMDLINE_PREFIX);
+   while (p) {
+   fadump_cmdline = p;
+   p = strstr(p+1, FADUMP_MEM_CMDLINE_PREFIX);
+   }
+
+   return fadump_cmdline;
+}
+
+#define parse_fadump_print(fmt, arg...) \
+   printk(KERN_INFO "fadump_reserve_mem: " fmt, ##arg)
+
+/*
+ * This function parses command line for fadump_reserve_mem=
+ *
+ * Supports the below two syntaxes:
+ *1. fadump_reserve_mem=size
+ *2. fadump_reserve_mem=ramsize-range:size[,...]
+ *
+ * Sets fw_dump.reserve_bootvar with the memory size
+ * provided, 0 otherwise
+ *
+ * The function returns -EINVAL on failure, 0 otherwise.
+ */
+static int __init parse_fadump_reserve_mem(void)
+{
+   char *cur, *tmp;
+   char *first_colon, *first_space;
+   char *fadump_cmdline;
+   unsigned long long system_ram;
+
+   fw_dump.reserve_bootvar = 0;
+   fadump_cmdline = get_last_fadump_reserve_mem();
+
+   /* when no fadump_reserve_mem= cmdline option is provided */
+   if (!fadump_cmdline)
+   return 0;
+
+   first_colon = strchr(fadump_cmdline, ':');
+   first_space = strchr(fadump_cmdline, ' ');
+   cur = fadump_cmdline + strlen(FADUMP_MEM_CMDLINE_PREFIX);
+
+   /* for fadump_reserve_mem=size cmdline syntax */
+   if (!first_colon || (first_space && (first_colon > first_space))) {
+   fw_dump.reserve_bootvar = memparse(cur, &tmp);
+   return 0;
+   }
+
+   /* for fadump_reserve_mem=ramsize-range:size[,...] cmdline syntax */
+   system_ram = memblock_phys_mem_size();
+   /* for each entry of the comma-separated list */
+   do {
+   unsigned long long start, end = ULLONG_MAX, size;
+
+   /* get the start of the range */
+   start = memparse(cur, &tmp);
+   if (cur == tmp) {
+   parse_fadump_print("Memory value expected\n");
+   return -EINVAL;
+   }
+   cur = tmp;
+   if (*cur != '-') {
+   parse_fadump_print("'-' expected\n");
+   return -EINVAL;
+   }
+   cur++;
+
+   /* if no ':' is here, then we read the end */
+   if (*cur != ':') {
+   end = memparse(cur, &tmp);
+   if (cur == tmp) {
+   parse_fadump_print("Memory value expected\n");
+   return -EINVAL;
+   }
+   cur = tmp;
+   if (end <= start) {
+   parse_fadump_print("end <= start\n");
+   return -EINVAL;
+   }
+   }
+
+   if (*cur != ':') {
+   parse_fadump_print("':' expected\n");
+   return -EINVAL;
+   }
+   cur++;
+
+   size = memparse(cur, &tmp);
+   if (cur == tmp) {
+   parse_fadump_print("Memory value expected\n");
+   return -EINVAL;
+   }
+   cur = tmp;
+   if (size >= system_ram) {
+   parse_fadump_print("invalid size\n");
+   return -EINVAL;
+   }
+
+   /* match ? */
+   if (system_ram >= start && system_ram < end) {
+   fw_dump.reserve_bootvar = size;
+   break;
+   }
+   } while (*cur++ == ',');
+
+   return 0;
+}
+
 /**
  * fadump_calculate_reserve_size(): reserve variable boot area 5% of System RAM
  *
@@ -212,6 +327,9 @@ static inline unsigned long 
fadump_calculate_reserve_size(void)
 {
unsigned long size;
 
+   /* sets fw_dump.reserve_bootvar */
+   

[PATCH 1/3] powerpc/fadump: set an upper limit for the default memory reserved for fadump

2016-05-06 Thread Hari Bathini
When boot memory size for fadump is not specified, memory is reserved
for fadump based on system RAM size. As the system RAM size increases,
the memory reserved for fadump increases as well. This patch sets an
upper limit on the memory reserved for fadump, to avoid reserving
excess memory.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/fadump.h |6 ++
 arch/powerpc/kernel/fadump.c  |4 
 2 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index b4407d0..2c3cb32 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -43,6 +43,12 @@
 #define MIN_BOOT_MEM   (((RMA_END < (0x1UL << 28)) ? (0x1UL << 28) : RMA_END) \
+ (0x1UL << 26))
 
+/*
+ * Maximum memory needed for fadump to boot up successfully. Use this as
+ * an upper limit for fadump so we don't end up reserving excess memory.
+ */
+#define MAX_BOOT_MEM   (0x1UL << 32)
+
 #define memblock_num_regions(memblock_type)(memblock.memblock_type.cnt)
 
 #ifndef ELF_CORE_EFLAGS
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3cb3b02a..d0af58b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -225,6 +225,10 @@ static inline unsigned long 
fadump_calculate_reserve_size(void)
/* round it down in multiples of 256 */
size = size & ~0x0FFFUL;
 
+   /* Set an upper limit on the memory to be reserved */
+   if (size > MAX_BOOT_MEM)
+   size = MAX_BOOT_MEM;
+
/* Truncate to memory_limit. We don't want to over reserve the memory.*/
if (memory_limit && size > memory_limit)
size = memory_limit;
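
The resulting sizing policy can be illustrated with a small userspace sketch:
5% of system RAM, rounded down to a 256MB multiple, raised to a minimum and now
capped at 4GB. The MIN_RES value below is an illustrative stand-in, since the
real MIN_BOOT_MEM depends on RMA_END.

```c
#include <assert.h>

#define MB256   (256ULL << 20)
#define MIN_RES ((256ULL << 20) + (64ULL << 20))  /* illustrative MIN_BOOT_MEM */
#define MAX_RES (1ULL << 32)                      /* MAX_BOOT_MEM: 4GB cap */

/* Sketch of fadump_calculate_reserve_size() after this patch. */
static unsigned long long fadump_reserve_size(unsigned long long ram)
{
    unsigned long long size = ram / 20;   /* 5% of system RAM */

    size &= ~(MB256 - 1);                 /* round down to a 256MB multiple */
    if (size < MIN_RES)
        size = MIN_RES;
    if (size > MAX_RES)                   /* the new upper limit */
        size = MAX_RES;
    return size;
}
```

So a 64GB machine still reserves 3GB, but a 2TB machine is now capped at 4GB
instead of reserving roughly 100GB.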


Re: [PATCH V10 00/28] Add new powerpc specific ELF core notes

2016-05-06 Thread Michael Ellerman
On Tue, 2016-02-16 at 14:29 +0530, Anshuman Khandual wrote:

>   This patch series adds twelve new ELF core note sections which can
> be used with existing ptrace request PTRACE_GETREGSET-SETREGSET for accessing
> various transactional memory and other miscellaneous debug register sets on
> powerpc platform.
> 
> Test Result (All tests pass on both BE and LE)
> --
> ptrace-ebb         PASS
> ptrace-gpr         PASS
> ptrace-tm-gpr      PASS
> ptrace-tm-spd-gpr  PASS
> ptrace-tar         PASS
> ptrace-tm-tar      PASS
> ptrace-tm-spd-tar  PASS
> ptrace-vsx         PASS
> ptrace-tm-vsx      PASS
> ptrace-tm-spd-vsx  PASS
> ptrace-tm-spr      PASS

How are you building the tests? On BE I get:


  In file included from ptrace-tm-gpr.c:12:0:
  ptrace-tm-gpr.c: In function ‘trace_tm_gpr’:
  In file included from ptrace.h:31:0,
   from ptrace-tm-vsx.c:11:
  ptrace-tm-vsx.c: In function ‘ptrace_tm_vsx’:
  ptrace-gpr.h:20:19: error: large integer implicitly truncated to unsigned 
type [-Werror=overflow]
   #define FPR_2_REP 0x3f60624de000
 ^
  ptrace-tm-gpr.c:209:26: note: in expansion of macro ‘FPR_2_REP’
ret = validate_fpr(fpr, FPR_2_REP);
^
  ptrace-tm-vsx.c:150:46: error: ‘PPC_FEATURE2_HTM’ undeclared (first use in 
this function)
SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2) & PPC_FEATURE2_HTM));
^
  
/home/kerkins/workspace/kernel-build-selftests/arch/powerpc/compiler/gcc_ubuntu_be/linux/tools/testing/selftests/powerpc/utils.h:49:7:
 note: in definition of macro ‘SKIP_IF’
if ((x)) {  \
 ^
  ptrace-gpr.h:19:19: error: large integer implicitly truncated to unsigned 
type [-Werror=overflow]
   #define FPR_1_REP 0x3f50624de000
 ^
  ptrace-tm-gpr.c:217:26: note: in expansion of macro ‘FPR_1_REP’
ret = validate_fpr(fpr, FPR_1_REP);
^
  ptrace-tm-vsx.c:150:46: note: each undeclared identifier is reported only 
once for each function it appears in
SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2) & PPC_FEATURE2_HTM));
^
  
/home/kerkins/workspace/kernel-build-selftests/arch/powerpc/compiler/gcc_ubuntu_be/linux/tools/testing/selftests/powerpc/utils.h:49:7:
 note: in definition of macro ‘SKIP_IF’
if ((x)) {  \
 ^
  ptrace-gpr.h:21:19: error: large integer implicitly truncated to unsigned 
type [-Werror=overflow]
   #define FPR_3_REP 0x3f689374c000
 ^
  ptrace-tm-gpr.c:233:30: note: in expansion of macro ‘FPR_3_REP’
ret = write_ckpt_fpr(child, FPR_3_REP);
^
  In file included from ptrace.h:31:0,
   from ptrace-tm-gpr.c:11:
  ptrace-tm-gpr.c: In function ‘ptrace_tm_gpr’:
  ptrace-tm-gpr.c:249:46: error: ‘PPC_FEATURE2_HTM’ undeclared (first use in 
this function)
SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2) & PPC_FEATURE2_HTM));
^
  
/home/kerkins/workspace/kernel-build-selftests/arch/powerpc/compiler/gcc_ubuntu_be/linux/tools/testing/selftests/powerpc/utils.h:49:7:
 note: in definition of macro ‘SKIP_IF’
if ((x)) {  \
 ^
  ptrace-tm-gpr.c:249:46: note: each undeclared identifier is reported only 
once for each function it appears in
SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2) & PPC_FEATURE2_HTM));
^
  
/home/kerkins/workspace/kernel-build-selftests/arch/powerpc/compiler/gcc_ubuntu_be/linux/tools/testing/selftests/powerpc/utils.h:49:7:
 note: in definition of macro ‘SKIP_IF’
if ((x)) {  \
 ^
  cc1: all warnings being treated as errors
  In file included from ../pmu/ebb/ebb.h:12:0,
   from ptrace-ebb.c:11:
  ptrace-ebb.h: In function ‘reset_ebb_with_clear_mask’:
  ../pmu/ebb/../../reg.h:49:31: error: left shift count >= width of type 
[-Werror=shift-count-overflow]
   #define BESCR_PME  (0x1ul << 32) /* PMU Event-based exception Enable */
 ^
  ../pmu/ebb/../../reg.h:16:60: note: in definition of macro ‘mtspr’
   : "r" ((unsigned long)(v)) \
  ^
  ptrace-ebb.h:73:28: note: in expansion of macro ‘BESCR_PME’
   mtspr(SPRN_BESCRS, BESCR_PME);
  ^
  In file included from ptrace-tm-tar.c:12:0:
  ptrace-tm-tar.c: In function ‘tm_tar’:
  ptrace-tar.h:24:17: error: large integer implicitly truncated to unsigned 
type [-Werror=overflow]
   #define PPR_4   0x10/* or 2,2,2 */
   ^
  ptrace-tm-tar.c:78:45: note: in expansion of macro ‘PPR_4’
 ret = validate_tar_registers(regs, TAR_4, PPR_4, DSCR_4);
   ^
  In file included from 

Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model

2016-05-06 Thread Petr Mladek
On Thu 2016-04-28 15:44:48, Josh Poimboeuf wrote:
> diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
> index 782fbb5..b3b8639 100644
> --- a/kernel/livepatch/patch.c
> +++ b/kernel/livepatch/patch.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include "patch.h"
> +#include "transition.h"
>  
>  static LIST_HEAD(klp_ops);
>  
> @@ -58,11 +59,42 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>   ops = container_of(fops, struct klp_ops, fops);
>  
>   rcu_read_lock();
> +
>   func = list_first_or_null_rcu(>func_stack, struct klp_func,
> stack_node);
> - if (WARN_ON_ONCE(!func))
> +
> + if (!func)
>   goto unlock;
>  
> + /*
> +  * See the comment for the 2nd smp_wmb() in klp_init_transition() for
> +  * an explanation of why this read barrier is needed.
> +  */
> + smp_rmb();
> +
> + if (unlikely(func->transition)) {
> +
> + /*
> +  * See the comment for the 1st smp_wmb() in
> +  * klp_init_transition() for an explanation of why this read
> +  * barrier is needed.
> +  */
> + smp_rmb();

I would add here:

WARN_ON_ONCE(current->patch_state == KLP_UNDEFINED);

We do not know in which context this is called, so the printks are
not ideal. But it will get triggered only if there is a bug in
the livepatch implementation. It should happen at random locations
and rather early when a bug is introduced.

Anyway, better to die and catch the bug than to let the system run
in an undefined state and produce cryptic errors later on.


> + if (current->patch_state == KLP_UNPATCHED) {
> + /*
> +  * Use the previously patched version of the function.
> +  * If no previous patches exist, use the original
> +  * function.
> +  */
> + func = list_entry_rcu(func->stack_node.next,
> +   struct klp_func, stack_node);
> +
> + if (>stack_node == >func_stack)
> + goto unlock;
> + }
> + }

I have been staring at the code for too long now. I need to step back for a
while. I'll take another look when you send the next version. Anyway,
you did great work. I speak mainly for the livepatch part, and
I like it.

Best Regards,
Petr
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v9 04/22] powerpc/powernv: Increase PE# capacity

2016-05-06 Thread Gavin Shan
On Fri, May 06, 2016 at 05:17:25PM +1000, Alexey Kardashevskiy wrote:
>On 05/03/2016 11:22 PM, Gavin Shan wrote:
>>Each PHB maintains an array helping to translate 2-bytes Request
>>ID (RID) to PE# with the assumption that PE# takes one byte, meaning
>>that we can't have more than 256 PEs. However, pci_dn->pe_number
>>already had 4-bytes for the PE#.
>
>Can you possibly have more than 256 PEs? Or exactly 256? What patch in this
>series makes use of it?
>
>I probably asked but do not remember the answer :)
>
>Looks like waste of memory - you only used a small fraction of
>pe_rmap[0x10000] and now the waste is quadrupled.
>

The PE capacities on different hardware are different, as below. So we're
going to support 16-bit PE numbers in the near future. That means each element
in the array needs at least an "unsigned short", and 2 pages (2 * 64KB) will
be reserved for it.

P7IOC: 127    PHB3: 256
PHB4:  65536  NPU1: 4    NPU2: 16

I agree some memory is wasted, and the wasted amount depends on the PCI
topology. No memory is wasted if 256 buses show up on one particular
PHB; the fewer buses a PHB has, the more memory is wasted. As I explained
before, the total used memory is 4 pages (4 * 64KB). Considering the memory
capacity on PPC64 (especially PowerNV), I guess it's fine. Note that the
memory is allocated from memblock together with the PHB instance.

The alternative solution (to avoid wasting memory) would be searching for
the PE number according to the input BDFN through the PE list maintained
in each PHB. Obviously, that would introduce more logic and burn more CPU
cycles, so it's a trade-off. If you really want to see this, I
can absolutely do it in the next revision. Another option would be to improve
it later and keep the code as it is. Please share your thoughts.

>
>>
>>This extends the PE# capacity for every PHB. After that, the PE number
>>is represented by a 4-byte value. Then we can reuse IODA_INVALID_PE to
>>check whether the PE# in phb->pe_rmap[] is valid or not.
>
>Looks like using IODA_INVALID_PE is the only reason for this patch.
>

For now, yes. In the near future, it needs to be extended to represent 16-bit
PE numbers for PHB4, as I explained above.

>
>>
>>Signed-off-by: Gavin Shan 
>>Reviewed-by: Daniel Axtens 
>>---
>> arch/powerpc/platforms/powernv/pci-ioda.c | 6 +-
>> arch/powerpc/platforms/powernv/pci.h  | 7 ++-
>> 2 files changed, 7 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index cbd4c0b..cf96cb5 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -768,7 +768,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, 
>>struct pnv_ioda_pe *pe)
>>
>>  /* Clear the reverse map */
>>  for (rid = pe->rid; rid < rid_end; rid++)
>>- phb->ioda.pe_rmap[rid] = 0;
>>+ phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>>
>>  /* Release from all parents PELT-V */
>>  while (parent) {
>>@@ -3406,6 +3406,10 @@ static void __init pnv_pci_init_ioda_phb(struct 
>>device_node *np,
>>  if (prop32)
>>  phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
>>
>>+ /* Invalidate RID to PE# mapping */
>>+ for (segno = 0; segno < ARRAY_SIZE(phb->ioda.pe_rmap); segno++)
>>+ phb->ioda.pe_rmap[segno] = IODA_INVALID_PE;
>>+
>>  /* Parse 64-bit MMIO range */
>>  pnv_ioda_parse_m64_window(phb);
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci.h 
>>b/arch/powerpc/platforms/powernv/pci.h
>>index 904f60b..80f5326 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -156,11 +156,8 @@ struct pnv_phb {
>>  struct list_headpe_list;
>>  struct mutexpe_list_mutex;
>>
>>- /* Reverse map of PEs, will have to extend if
>>-  * we are to support more than 256 PEs, indexed
>>-  * bus { bus, devfn }
>>-  */
>>- unsigned char   pe_rmap[0x10000];
>>+ /* Reverse map of PEs, indexed by {bus, devfn} */
>>+ unsigned int    pe_rmap[0x10000];
>>
>>  /* TCE cache invalidate registers (physical and
>>   * remapped)
>>
>
>
>-- 
>Alexey
>


Re: [PATCH 0/9] FP/VEC/VSX switching optimisations

2016-05-06 Thread Naveen N. Rao
On 2016/05/05 05:32PM, Naveen N Rao wrote:
> On 2016/02/29 05:53PM, Cyril Bur wrote:
> > Cover-letter for V1 of the series is at
> > https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-November/136350.html
> > 
> > Cover-letter for V2 of the series is at
> > https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-January/138054.html
> > 
> > Changes in V3:
> > Addressed review comments from Michael Neuling
> >  - Made commit message in 4/9 better reflect the patch
> >  - Removed overuse of #ifdef blocks and redundant condition in 5/9
> >  - Split 6/8 in two to better prepare for 7,8,9
> >  - Removed #ifdefs in 6/9
> > 
> > Changes in V4:
> >  - Addressed non ABI compliant ASM macros in 1/9
> >  - Fixed build breakage due to changing #ifdefs in V3 (6/9)
> >  - Reordered some conditions in if statements
> > 
> > Changes in V5:
> >  - Enhanced basic-asm.h to provide ABI independent macro as pointed out by
> >Naveen Rao.
> >- Tested for both BE and LE builds. Had to disable -flto from the
> >  selftests/powerpc Makefile as it didn't play well with the custom ASM.
> >  - Added some extra debugging output to the vmx_signal testcase
> >  - Fixed comments in testing code
> >  - Updated VSX test code to use GCC Altivec macros
> > 
> > Changes in V6:
> >  - Removed recursive definition of CFLAGS in math/Makefile
> >  - Corrected the use of the word param in favour of doubleword
> >  - Reordered some code in basic-asm.h and neatened some comments
> 
> This series is resulting in a kernel crash with one of the perf tests.  
> To reproduce, build perf and run the test for breakpoint overflow signal 
> handler.
> 
> # ./perf test -v 17
> 17: Test breakpoint overflow signal handler  :
> --- start ---
> test child forked, pid 3753
> failed opening event 0
> failed opening event 0
> cpu 0xd: Vector: 600 (Alignment) at [c000edd738c0]
> pc: c000a818: save_fpu+0xa8/0x2ac
> lr: c001568c: __giveup_fpu+0x2c/0x90
> sp: c000edd73b40
>msr: 8280b033
>dar: c000edc436e0
>  dsisr: 4200
>   current = 0xc000edc42c00
>   paca= 0xc7e82700 softe: 0irq_happened: 0x01
> pid   = 3753, comm = perf
> Linux version 4.6.0-rc3-nnr+ (root@rhel71le) (gcc version 4.8.3 20140911 (Red 
> Hat 4.8.3-8) (GCC) ) #93 SMP Wed May 4 22:01:06 IST 2016
> enter ? for help
> [link register   ] c001568c __giveup_fpu+0x2c/0x90
> [c000edd73b40]  (unreliable)
> [c000edd73b70] c0015730 giveup_fpu+0x40/0xa0
> [c000edd73ba0] c0015810 flush_fp_to_thread+0x80/0x90
> [c000edd73bd0] c0026b3c setup_sigcontext.constprop.3+0xbc/0x1f0
> [c000edd73c30] c00274c4 handle_rt_signal64+0x3b4/0x7c0
> [c000edd73d10] c0017ee0 do_signal+0x150/0x2b0
> [c000edd73e00] c0018220 do_notify_resume+0xd0/0x110
> [c000edd73e30] c0009844 ret_from_except_lite+0x70/0x74
> --- Exception: 900 (Decrementer) at 100b3c88
> SP (3fffd08cfb20) is in userspace
> d:mon> ls save_fpu
> save_fpu: c000a770
> 
> With v4.5, the test would fail, but not cause what looks to be an 
> alignment exception.

xmon couldn't decode the instructions:

d:mon>
c000a810  3880  li  r4,0
c000a814  f250  .long 0xf250
c000a818  7c062798  .long 0x7c062798
c000a81c  f250  .long 0xf250
c000a820  38800010  li  r4,16
c000a824  f0210a50  .long 0xf0210a50
c000a828  7c262798  .long 0x7c262798
c000a82c  f0210a50  .long 0xf0210a50
c000a830  38800020  li  r4,32
c000a834  f0421250  .long 0xf0421250

However, with objdump, the instructions look to be ok:

c000aa10 
c000a810:   00 00 80 38 li  r4,0
c000a814:   50 02 00 f0 xxswapd vs0,vs0
c000a818:   98 27 06 7c stxvd2x vs0,r6,r4
c000a81c:   50 02 00 f0 xxswapd vs0,vs0
c000a820:   10 00 80 38 li  r4,16
c000a824:   50 0a 21 f0 xxswapd vs1,vs1
c000a828:   98 27 26 7c stxvd2x vs1,r6,r4
c000a82c:   50 0a 21 f0 xxswapd vs1,vs1

I saw this on a LE vm on Power7 and that looks to be the issue, since a 
BE vm does not show this. I'm attaching the .config in case it helps.


- Naveen

#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.6.0-rc3 Kernel Configuration
#
CONFIG_PPC64=y

#
# Processor support
#
CONFIG_PPC_BOOK3S_64=y
# CONFIG_PPC_BOOK3E_64 is not set
CONFIG_POWER7_CPU=y
# CONFIG_POWER8_CPU is not set
CONFIG_PPC_BOOK3S=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_VSX=y
CONFIG_PPC_ICSWX=y
# CONFIG_PPC_ICSWX_PID is not set
# CONFIG_PPC_ICSWX_USE_SIGILL is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_64=y
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_MM_SLICES=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2048
CONFIG_PPC_DOORBELL=y
# CONFIG_CPU_BIG_ENDIAN is not set

[GIT PULL] Please pull powerpc/linux.git powerpc-4.6-5 tag

2016-05-06 Thread Michael Ellerman
Hi Linus,

Please pull one powerpc fix for 4.6:

The following changes since commit d701cca6744fe0d67c86346dcfc9b128b17b5045:

  powerpc: wire up preadv2 and pwritev2 syscalls (2016-04-27 16:47:55 +1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.6-5

for you to fetch changes up to b4c112114aab9aff5ed4568ca5e662bb02cdfe74:

  powerpc: Fix bad inline asm constraint in create_zero_mask() (2016-05-02 
11:10:25 +1000)


powerpc fixes for 4.6 #4

 - Fix bad inline asm constraint in create_zero_mask() from Anton Blanchard


Anton Blanchard (1):
  powerpc: Fix bad inline asm constraint in create_zero_mask()

 arch/powerpc/include/asm/word-at-a-time.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[PATCH v2] cxl: Add kernel API to allow a context to operate with relocate disabled

2016-05-06 Thread Ian Munsie
From: Ian Munsie 

cxl devices typically access memory using an MMU in much the same way as
the CPU, and each context includes a state register much like the MSR in
the CPU. Like the CPU, the state register includes a bit to enable
relocation, which we currently always enable.

In some cases, it may be desirable to allow a device to access memory
using real addresses instead of effective addresses, so this adds a new
API, cxl_set_translation_mode, that can be used to disable relocation
on a given kernel context. This can allow for the creation of a special
privileged context that the device can use if it needs relocation
disabled, and can use regular contexts at times when it needs relocation
enabled.

This interface is only available to users of the kernel API for obvious
reasons, and will never be supported in a virtualised environment.

This will be used by the upcoming cxl support in the mlx5 driver.
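
For context, a rough sketch of how a kernel-API user might wire this up (cxl_dev_context_init(), cxl_start_context() and cxl_release_context() are existing in-kernel cxl API calls; the helper and its flow here are illustrative assumptions, not the actual mlx5 usage):

```c
/* Sketch only: create one privileged kernel context with relocation
 * disabled, alongside ordinary translated contexts. */
static struct cxl_context *setup_real_mode_ctx(struct pci_dev *dev, u64 wed)
{
	struct cxl_context *ctx = cxl_dev_context_init(dev);
	int rc;

	if (IS_ERR(ctx))
		return ctx;

	/* Must happen while the context is stopped: returns -EBUSY once
	 * the context has been started. On PowerVM, the subsequent start
	 * of a real-mode context fails with -EPERM. */
	rc = cxl_set_translation_mode(ctx, true);
	if (rc)
		goto err;

	rc = cxl_start_context(ctx, wed, NULL);  /* NULL task => kernel context */
	if (rc)
		goto err;

	return ctx;

err:
	cxl_release_context(ctx);
	return ERR_PTR(rc);
}
```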

Signed-off-by: Ian Munsie 
---

Changes since v1:
- Changed API to use a dedicated cxl_set_translation_mode() call instead of
  adding an extra parameter to cxl_start_context2() based on review feedback
  from Frederic Barrat
- Changed error code for attempting to use in PowerVM environment to -EPERM

 drivers/misc/cxl/api.c| 19 +++
 drivers/misc/cxl/cxl.h|  1 +
 drivers/misc/cxl/guest.c  |  3 +++
 drivers/misc/cxl/native.c |  5 +++--
 include/misc/cxl.h|  8 
 5 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 8075823..6d228cc 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -183,6 +183,7 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
ctx->pid = get_task_pid(task, PIDTYPE_PID);
ctx->glpid = get_task_pid(task->group_leader, PIDTYPE_PID);
kernel = false;
+   ctx->real_mode = false;
}
 
cxl_ctx_get();
@@ -219,6 +220,24 @@ void cxl_set_master(struct cxl_context *ctx)
 }
 EXPORT_SYMBOL_GPL(cxl_set_master);
 
+int cxl_set_translation_mode(struct cxl_context *ctx, bool real_mode)
+{
+   if (ctx->status == STARTED) {
+   /*
+* We could potentially update the PE and issue an update LLCMD
+* to support this, but it doesn't seem to have a good use case
+* since it's trivial to just create a second kernel context
+* with different translation modes, so until someone convinces
+* me otherwise:
+*/
+   return -EBUSY;
+   }
+
+   ctx->real_mode = real_mode;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_set_translation_mode);
+
 /* wrappers around afu_* file ops which are EXPORTED */
 int cxl_fd_open(struct inode *inode, struct file *file)
 {
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index dfdbfb0..6e3e485 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -523,6 +523,7 @@ struct cxl_context {
bool pe_inserted;
bool master;
bool kernel;
+   bool real_mode;
bool pending_irq;
bool pending_fault;
bool pending_afu_err;
diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
index 769971c..c2815b9 100644
--- a/drivers/misc/cxl/guest.c
+++ b/drivers/misc/cxl/guest.c
@@ -617,6 +617,9 @@ static int guest_attach_process(struct cxl_context *ctx, 
bool kernel, u64 wed, u
 {
pr_devel("in %s\n", __func__);
 
+   if (ctx->real_mode)
+   return -EPERM;
+
ctx->kernel = kernel;
if (ctx->afu->current_mode == CXL_MODE_DIRECTED)
return attach_afu_directed(ctx, wed, amr);
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index ef494ba..ba459a9 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -485,8 +485,9 @@ static u64 calculate_sr(struct cxl_context *ctx)
if (mfspr(SPRN_LPCR) & LPCR_TC)
sr |= CXL_PSL_SR_An_TC;
if (ctx->kernel) {
-   sr |= CXL_PSL_SR_An_R | (mfmsr() & MSR_SF);
-   sr |= CXL_PSL_SR_An_HV;
+   if (!ctx->real_mode)
+   sr |= CXL_PSL_SR_An_R;
+   sr |= (mfmsr() & MSR_SF) | CXL_PSL_SR_An_HV;
} else {
sr |= CXL_PSL_SR_An_PR | CXL_PSL_SR_An_R;
sr &= ~(CXL_PSL_SR_An_HV);
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index 7d5e261..56560c5 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -127,6 +127,14 @@ int cxl_afu_reset(struct cxl_context *ctx);
 void cxl_set_master(struct cxl_context *ctx);
 
 /*
+ * Sets the context to use real mode memory accesses to operate with
+ * translation disabled. Note that this only makes sense for kernel contexts
+ * under bare metal, and will not work with virtualisation. May only be
+ * performed on stopped contexts.
+ */
+int cxl_set_translation_mode(struct cxl_context *ctx, bool real_mode);

Re: [PATCH] cxl: Add kernel API to allow a context to operate with relocate disabled

2016-05-06 Thread Ian Munsie
Sure thing, that actually simplifies things a great deal. Testing now
and will resend shortly :)

-Ian


Re: [PATCH v9 04/22] powerpc/powernv: Increase PE# capacity

2016-05-06 Thread Alexey Kardashevskiy

On 05/03/2016 11:22 PM, Gavin Shan wrote:

Each PHB maintains an array helping to translate a 2-byte Request
ID (RID) to a PE#, with the assumption that the PE# takes one byte, meaning
that we can't have more than 256 PEs. However, pci_dn->pe_number
already has 4 bytes for the PE#.


Can you possibly have more than 256 PEs? Or exactly 256? What patch in this 
series makes use of it?


I probably asked but do not remember the answer :)

Looks like waste of memory - you only used a small fraction of 
pe_rmap[0x10000] and now the waste is quadrupled.





This extends the PE# capacity for every PHB. After that, the PE number
is represented by a 4-byte value. Then we can reuse IODA_INVALID_PE to
check whether the PE# in phb->pe_rmap[] is valid or not.


Looks like using IODA_INVALID_PE is the only reason for this patch.




Signed-off-by: Gavin Shan 
Reviewed-by: Daniel Axtens 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 6 +-
 arch/powerpc/platforms/powernv/pci.h  | 7 ++-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index cbd4c0b..cf96cb5 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -768,7 +768,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, 
struct pnv_ioda_pe *pe)

/* Clear the reverse map */
for (rid = pe->rid; rid < rid_end; rid++)
-   phb->ioda.pe_rmap[rid] = 0;
+   phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;

/* Release from all parents PELT-V */
while (parent) {
@@ -3406,6 +3406,10 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
if (prop32)
phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);

+   /* Invalidate RID to PE# mapping */
+   for (segno = 0; segno < ARRAY_SIZE(phb->ioda.pe_rmap); segno++)
+   phb->ioda.pe_rmap[segno] = IODA_INVALID_PE;
+
/* Parse 64-bit MMIO range */
pnv_ioda_parse_m64_window(phb);

diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 904f60b..80f5326 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -156,11 +156,8 @@ struct pnv_phb {
struct list_headpe_list;
struct mutexpe_list_mutex;

-   /* Reverse map of PEs, will have to extend if
-* we are to support more than 256 PEs, indexed
-* bus { bus, devfn }
-*/
-   unsigned char   pe_rmap[0x10000];
+   /* Reverse map of PEs, indexed by {bus, devfn} */
+   unsigned int    pe_rmap[0x10000];

/* TCE cache invalidate registers (physical and
 * remapped)




--
Alexey

Re: [GIT PULL 00/17] perf/core improvements and fixes

2016-05-06 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> 
> The following changes since commit 1b6de5917172967acd8db4d222df4225d23a8a60:
> 
>   perf/x86/intel/pt: Convert ACCESS_ONCE()s (2016-05-05 10:16:29 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
> tags/perf-core-for-mingo-20160505
> 
> for you to fetch changes up to b6b85dad30ad7e7394990e2317a780577974a4e6:
> 
>   perf evlist: Rename variable in perf_mmap__read() (2016-05-05 21:04:04 
> -0300)
> 
> 
> perf/core improvements and fixes:
> 
> User visible:
> 
> - Order output of 'perf trace --summary' better, now the threads will
>   appear ascending order of number of events, and then, for each, in
>   descending order of syscalls by the time spent in the syscalls, so
>   that the last page produced can be the one about the most interesting
>   thread straced, suggested by Milian Wolff (Arnaldo Carvalho de Melo)
> 
> - Do not show the runtime_ms for a thread when not collecting it, that
>   is done so far only with 'perf trace --sched' (Arnaldo Carvalho de Melo)
> 
> - Fix kallsyms perf test on ppc64le (Naveen N. Rao)
> 
> Infrastructure:
> 
> - Move global variables related to presence of some keys in the sort order to 
> a
>   per hist struct, to allow code like the hists browser to work with multiple
>   hists with different lists of columns (Jiri Olsa)
> 
> - Add support for generating bpf prologue in powerpc (Naveen N. Rao)
> 
> - Fix kprobe and kretprobe handling with kallsyms on ppc64le (Naveen N. Rao)
> 
> - evlist mmap changes, prep work for supporting reading backwards (Wang Nan)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (5):
>   perf machine: Introduce number of threads member
>   perf tools: Add template for generating rbtree resort class
>   perf trace: Sort summary output by number of events
>   perf trace: Sort syscalls stats by msecs in --summary
>   perf trace: Do not show the runtime_ms for a thread when not collecting 
> it
> 
> Jiri Olsa (7):
>   perf hists: Move sort__need_collapse into struct perf_hpp_list
>   perf hists: Move sort__has_parent into struct perf_hpp_list
>   perf hists: Move sort__has_sym into struct perf_hpp_list
>   perf hists: Move sort__has_dso into struct perf_hpp_list
>   perf hists: Move sort__has_socket into struct perf_hpp_list
>   perf hists: Move sort__has_thread into struct perf_hpp_list
>   perf hists: Move sort__has_comm into struct perf_hpp_list
> 
> Naveen N. Rao (3):
>   perf tools powerpc: Add support for generating bpf prologue
>   perf powerpc: Fix kprobe and kretprobe handling with kallsyms on ppc64le
>   perf symbols: Fix kallsyms perf test on ppc64le
> 
> Wang Nan (2):
>   perf evlist: Extract perf_mmap__read()
>   perf evlist: Rename variable in perf_mmap__read()
> 
>  tools/perf/arch/powerpc/Makefile|   1 +
>  tools/perf/arch/powerpc/util/dwarf-regs.c   |  40 +---
>  tools/perf/arch/powerpc/util/sym-handling.c |  43 ++--
>  tools/perf/builtin-diff.c   |   4 +-
>  tools/perf/builtin-report.c |   4 +-
>  tools/perf/builtin-top.c|   8 +-
>  tools/perf/builtin-trace.c  |  87 ++--
>  tools/perf/tests/hists_common.c |   2 +-
>  tools/perf/tests/hists_cumulate.c   |   2 +-
>  tools/perf/tests/hists_link.c   |   4 +-
>  tools/perf/tests/hists_output.c |   2 +-
>  tools/perf/ui/browsers/hists.c  |  32 +++---
>  tools/perf/ui/gtk/hists.c   |   2 +-
>  tools/perf/ui/hist.c|   2 +-
>  tools/perf/util/annotate.c  |   2 +-
>  tools/perf/util/callchain.c |   2 +-
>  tools/perf/util/evlist.c|  56 ++-
>  tools/perf/util/hist.c  |  14 +--
>  tools/perf/util/hist.h  |  10 ++
>  tools/perf/util/machine.c   |   9 +-
>  tools/perf/util/machine.h   |   1 +
>  tools/perf/util/probe-event.c   |   5 +-
>  tools/perf/util/probe-event.h   |   3 +-
>  tools/perf/util/rb_resort.h | 149 
> 
>  tools/perf/util/sort.c  |  35 +++
>  tools/perf/util/sort.h  |   7 --
>  tools/perf/util/symbol-elf.c|   7 +-
>  tools/perf/util/symbol.h|   3 +-
>  28 files changed, 382 insertions(+), 154 deletions(-)
>  create mode 100644 tools/perf/util/rb_resort.h

Pulled, thanks a lot Arnaldo!

Ingo

Re: [PATCH v9 03/22] powerpc/powernv: Move pnv_pci_ioda_setup_opal_tce_kill() around

2016-05-06 Thread Alexey Kardashevskiy

On 05/03/2016 11:22 PM, Gavin Shan wrote:

pnv_pci_ioda_setup_opal_tce_kill() is called by pnv_ioda_setup_dma()
to remap the TCE kill register. What's done in pnv_ioda_setup_dma()
will be covered in pcibios_setup_bridge(), which is invoked on each
PCI bridge. It means we may remap the TCE kill register
multiple times, which is unnecessary.

This moves pnv_pci_ioda_setup_opal_tce_kill() to where the PHB is
initialized (pnv_pci_init_ioda_phb()) to avoid above issue.

Signed-off-by: Gavin Shan 


Reviewed-by: Alexey Kardashevskiy 


---
 arch/powerpc/platforms/powernv/pci-ioda.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5ee8a57..cbd4c0b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2599,8 +2599,6 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
pr_info("PCI: Domain %04x has %d available 32-bit DMA segments\n",
hose->global_number, phb->ioda.dma32_count);

-   pnv_pci_ioda_setup_opal_tce_kill(phb);
-
/* Walk our PE list and configure their DMA segments */
list_for_each_entry(pe, >ioda.pe_list, list) {
weight = pnv_pci_ioda_pe_dma_weight(pe);
@@ -3396,6 +3394,9 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
if (phb->regs == NULL)
pr_err("  Failed to map registers !\n");

+   /* Initialize TCE kill register */
+   pnv_pci_ioda_setup_opal_tce_kill(phb);
+
/* Initialize more IODA stuff */
phb->ioda.total_pe_num = 1;
prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);




--
Alexey

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-06 Thread Alexey Kardashevskiy

On 05/06/2016 01:05 AM, Alex Williamson wrote:

On Thu, 5 May 2016 12:15:46 +
"Tian, Kevin"  wrote:


From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
Sent: Thursday, May 05, 2016 7:43 PM

Hi David and Kevin,

On 2016/5/5 17:54, David Laight wrote:


From: Tian, Kevin

Sent: 05 May 2016 10:37

...

Actually, we are not aiming at accessing the MSI-X table from the
guest. So I think it's safe to pass through the MSI-X table if we
can make sure the guest kernel would not touch the MSI-X table in
a normal code path, such as a para-virtualized guest kernel on PPC64.


Then how do you prevent a malicious guest kernel from accessing it?

Or a malicious guest driver for an ethernet card setting up
the receive buffer ring to contain a single word entry that
contains the address associated with an MSI-X interrupt and
then using a loopback mode to cause a specific packet be
received that writes the required word through that address.

Remember the PCIe cycle for an interrupt is a normal memory write
cycle.

David



If we have enough permission to load a malicious driver or
kernel, we can easily break the guest even without an exposed
MSI-X table.

I think it should be safe to expose the MSI-X table if we can
make sure that a malicious guest driver/kernel can't use
the MSI-X table to break another guest or the host. The
capability of IRQ remapping could provide this
kind of protection.



With IRQ remapping, it doesn't mean you can pass the MSI-X
structure through to the guest. I know actual IRQ remapping might be platform
specific, but at least for the Intel VT-d specification, the MSI-X entry must
be configured with a remappable format by the host kernel, which
contains an index into the IRQ remapping table. The index finds an
IRQ remapping entry which controls interrupt routing for a specific
device. If you allow a malicious program a random index into the MSI-X
entry of an assigned device, the hole is obvious...

The above might make sense only for an IRQ remapping implementation
which doesn't rely on the extended MSI-X format (e.g. one simply based on
BDF). If that's the case for PPC, then you should build MSI-X
passthrough based on that fact, instead of on whether general IRQ remapping
is enabled or not.


I don't think anyone is expecting that we can expose the MSI-X vector
table to the guest and the guest can make direct use of it.  The end
goal here is that the guest on a power system is already
paravirtualized to not program the device MSI-X by directly writing to
the MSI-X vector table.  They have hypercalls for this since they
always run virtualized.  Therefore a) they never intend to touch the
MSI-X vector table and b) they have sufficient isolation that a guest
can only hurt itself by doing so.

On x86 we don't have a), our method of programming the MSI-X vector
table is to directly write to it. Therefore we will always require QEMU
to place a MemoryRegion over the vector table to intercept those
accesses.  However with interrupt remapping, we do have b) on x86, which
means that we don't need to be so strict in disallowing user accesses
to the MSI-X vector table.  It's not useful for configuring MSI-X on
the device, but the user should only be able to hurt themselves by
writing it directly.  x86 doesn't really get anything out of this
change, but it helps this special case on power pretty significantly
aiui.  Thanks,


Excellent short overview, saved :)

How do we proceed with these patches? Nobody seems to object to them, but also
nobody seems to be taking them either...





--
Alexey

Re: [PATCH 4/5] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge

2016-05-06 Thread Alexey Kardashevskiy

On 04/27/2016 10:43 PM, Yongji Xie wrote:

Any IODA host bridge has the capability of IRQ remapping.
So we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host bridge
is detected.

Signed-off-by: Yongji Xie 



Reviewed-by: Alexey Kardashevskiy 



---
 arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index f90dc04..9557638 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3080,6 +3080,12 @@ static void pnv_pci_ioda_fixup(void)
pnv_npu_ioda_fixup();
 }

+int pnv_pci_ioda_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+   bridge->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+   return 0;
+}
+
 /*
  * Returns the alignment for I/O or memory windows for P2P
  * bridges. That actually depends on how PEs are segmented.
@@ -3364,6 +3370,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
 */
ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;

+   ppc_md.pcibios_root_bridge_prepare = pnv_pci_ioda_root_bridge_prepare;
+
if (phb->type == PNV_PHB_NPU)
hose->controller_ops = pnv_npu_ioda_controller_ops;
else




--
Alexey

Re: [PATCH 0/2] Enable ZONE_DEVICE on POWER

2016-05-06 Thread oliver
Hi,

I've been working on kernel support for a persistent memory (nvdimm)
device, and the kernel driver infrastructure requires ZONE_DEVICE for
DAX support. I've had it enabled in my tree for some time (without
altmap support) without any real issues.

I wasn't planning on upstreaming any of my changes until 4.8 at the
earliest so I am ok with carrying these patches myself. However, there
has been some interest in using ZONE_DEVICE for other things on ppc
(wasn't that you?) and given that ZONE_DEVICE is gated behind
CONFIG_EXPERT I can't see there being any kind of negative impact on
end users by merging it now. At the very least it lets the rest of the
kernel development community know that changes affecting zones should
also be tested on powerpc.

Thanks,
Oliver


On Fri, May 6, 2016 at 3:13 PM, Anshuman Khandual
 wrote:
> On 05/05/2016 08:18 PM, Aneesh Kumar K.V wrote:
>> Anshuman Khandual  writes:
>>
>>> This enables base ZONE_DEVICE support on POWER. This series depends on
>>> the following patches posted by Oliver.
>>>
>>> https://patchwork.ozlabs.org/patch/618867/
>>> https://patchwork.ozlabs.org/patch/618868/
>>>
>>> Anshuman Khandual (2):
>>>   powerpc/mm: Make vmemmap_populate accommodate ZONE_DEVICE memory
>>>   powerpc/mm: Enable support for ZONE_DEVICE on PPC_BOOK3S_64 platforms
>>>
>>>  arch/powerpc/mm/init_64.c | 4 +++-
>>>  mm/Kconfig| 2 +-
>>>  2 files changed, 4 insertions(+), 2 deletions(-)
>>>
>>
>> What is the use case ? Who will use ZONE_DEVICE on ppc64. This should be
>> be merged along with the patch series that use this.
>
> IIUC, Oliver has been looking at using ZONE_DEVICE for the NVDIMM (or
> some other persistent memory) drivers. I have been following Dan William's
> work on this front and want to explore more details about it's functioning
> on ppc64. This enablement will just help us little bit in that direction.
>