date:20160630

Re: [PATCH v3 2/7] PCI: Ignore enforced alignment to VF BARs

2016-06-30 Thread Yongji Xie


Hi Gavin,

On 2016/7/1 8:39, Gavin Shan wrote:


On Thu, Jun 30, 2016 at 06:53:08PM +0800, Yongji Xie wrote:

VF BARs are read-only zeroes according to SRIOV spec,
the normal way(writing BARs) of allocating resources wouldn't
be applied to VFs. The VFs' resources would be allocated
when we enable SR-IOV capability. So we should not try to
reassign alignment after we enable VFs. It's meaningless
and will release the allocated resources which leads to a bug.

Signed-off-by: Yongji Xie 
---
drivers/pci/pci.c |4 
1 file changed, 4 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index be8f72c..6ae02de 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4822,6 +4822,10 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
resource_size_t align, size;
u16 command;

+   /* We should never try to reassign VF's alignment */
+   if (dev->is_virtfn)
+   return;
+

Yongji, I think it's correct to ignore VF's BARs. Another concern is:
it's safe to apply alignment to PF's IOV BARs? Lets have an extreme
example here: one PF has 16 VFs; each VF has only one 1KB. It means
the only PF IOV BAR is 16KB. I don't see how it works after expanding
it to 64KB which is the page size. It might be not a problem on PowerNV
platform, but potentially a issue on x86?


Seems like the alignment would not be applied to IOV BARs because
pci_reassigndev_resource_alignment() will be called before
sriov_init().

Thanks,
Yongji

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

linux-next: build failure after merge of the powerpc tree

2016-06-30 Thread Stephen Rothwell

Hi all,

After merging the powerpc tree, today's linux-next build (powerpc
allyesconfig) failed like this:

arch/powerpc/kernel/fadump.c: In function 'fadump_invalidate_dump':
arch/powerpc/kernel/fadump.c:1014:2: error: expected ';' before '}' token
  }
  ^

Caused by commit

  4a03749f140c ("powerpc/fadump: Trivial fix of spelling mistake, clean up 
message")

I have added this fix patch for today:

From: Stephen Rothwell 
Date: Fri, 1 Jul 2016 15:19:34 +1000
Subject: [PATCH] powerpc/fadump: add missing semicolon

Fixes: 4a03749f140c ("powerpc/fadump: Trivial fix of spelling mistake, clean up 
message")
Signed-off-by: Stephen Rothwell 
---
 arch/powerpc/kernel/fadump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index f0664860753e..b3a66d36 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1010,7 +1010,7 @@ static int fadump_invalidate_dump(struct 
fadump_mem_struct *fdm)
 
if (rc) {
pr_err("Failed to invalidate firmware-assisted dump 
registration. Unexpected error (%d).\n", rc);
-   return rc
+   return rc;
}
fw_dump.dump_active = 0;
fdm_active = NULL;
-- 
2.8.1

-- 
Cheers,
Stephen Rothwell
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/7] PCI: Ignore enforced alignment when kernel uses existing firmware setup

2016-06-30 Thread Yongji Xie


Hi Gavin,

On 2016/7/1 8:28, Gavin Shan wrote:


On Thu, Jun 30, 2016 at 06:53:07PM +0800, Yongji Xie wrote:

PCI resources allocator will use firmware setup and not try to
reassign resource when PCI_PROBE_ONLY or IORESOURCE_PCI_FIXED
is set.

The enforced alignment in pci_reassigndev_resource_alignment()
should be ignored in this case. Otherwise, some PCI devices'
resources would be released here and not re-allocated.

Signed-off-by: Yongji Xie 
---
drivers/pci/pci.c |   12 
1 file changed, 12 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c8b4dbd..be8f72c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4760,6 +4760,13 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev)

spin_lock(_alignment_lock);
p = resource_alignment_param;
+   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (*p)
+   printk_once(KERN_INFO "PCI: Ignore resource_alignment 
parameter: %s with PCI_PROBE_ONLY set\n",
+   p);

We don't have to print @resource_alignment as it's ignored completely.
The content included in it doesn't mean anything:

pr_info_once("PCI: resource_alignment ignored with 
PCI_PROBE_ONLY\n");


OK. It looks simpler now.


+   spin_unlock(_alignment_lock);
+   return 0;
+   }

A empty line is needed here.


while (*p) {
count = 0;
if (sscanf(p, "%d%n", _order, ) == 1 &&
@@ -4837,6 +4844,11 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
r = >resource[i];
if (!(r->flags & IORESOURCE_MEM))
continue;
+   if (r->flags & IORESOURCE_PCI_FIXED) {
+   dev_info(>dev, "No alignment for fixed BAR%d: 
%pR\n",
+   i, r);

The message would be like below to match PCI code style. I'm thinking it 
probably
uses dev_dbg() instead dev_info(), but not sure for 100%.

dev_info(>dev, "BAR %d: fixed %pR, no 
alignment\n", i, r);


Maybe dev_info() is OK.:-)


+   continue;
+   }

A empty line is needed here.


size = resource_size(r);
if (size < align) {
size = align;

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 5/7] PCI: Do not use IORESOURCE_STARTALIGN to identify bridge resources

2016-06-30 Thread Gavin Shan

On Thu, Jun 30, 2016 at 06:53:11PM +0800, Yongji Xie wrote:
>Now we use the IORESOURCE_STARTALIGN to identify bridge resources
>in __assign_resources_sorted(). That's quite fragile. We may also
>set flag IORESOURCE_STARTALIGN for SR-IOV resources in some cases,
>for example, using the option "noresize" of parameter
>"pci=resource_alignment".
>
>In this patch, we try to use a more robust way to identify
>bridge resources.
>
>Signed-off-by: Yongji Xie 

Reviewed-by: Gavin Shan 

Yongji, I think this doesn't have to be part of this series, meaning
it can be sent or merged separately.

>---
> drivers/pci/setup-bus.c |9 ++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
>index 55641a3..216ddbc 100644
>--- a/drivers/pci/setup-bus.c
>+++ b/drivers/pci/setup-bus.c
>@@ -390,6 +390,7 @@ static void __assign_resources_sorted(struct list_head 
>*head,
>   struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
>   unsigned long fail_type;
>   resource_size_t add_align, align;
>+  int index;
>
>   /* Check if optional add_size is there */
>   if (!realloc_head || list_empty(realloc_head))
>@@ -410,11 +411,13 @@ static void __assign_resources_sorted(struct list_head 
>*head,
>
>   /*
>* There are two kinds of additional resources in the list:
>-   * 1. bridge resource  -- IORESOURCE_STARTALIGN
>-   * 2. SR-IOV resource   -- IORESOURCE_SIZEALIGN
>+   * 1. bridge resource
>+   * 2. SR-IOV resource
>* Here just fix the additional alignment for bridge
>*/
>-  if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
>+  index = dev_res->res - dev_res->dev->resource;
>+  if (index < PCI_BRIDGE_RESOURCES ||
>+  index > PCI_BRIDGE_RESOURCE_END)
>   continue;
>
>   add_align = get_res_add_align(realloc_head, dev_res->res);

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/fadump: Fix compile error due to missing semicolon

2016-06-30 Thread Ian Munsie

From: Ian Munsie 

The commit "powerpc/fadump: trivial fix of spelling mistake, clean up
message" removed a semicolon causing the following compile failure:

arch/powerpc/kernel/fadump.c: In function ‘fadump_invalidate_dump’:
arch/powerpc/kernel/fadump.c:1014:2: error: expected ‘;’ before ‘}’ token
  }
  ^

Reported-by: Huy Nguyen 
Signed-off-by: Ian Munsie 
---
 arch/powerpc/kernel/fadump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index f066486..b3a6633 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1010,7 +1010,7 @@ static int fadump_invalidate_dump(struct 
fadump_mem_struct *fdm)
 
if (rc) {
pr_err("Failed to invalidate firmware-assisted dump 
registration. Unexpected error (%d).\n", rc);
-   return rc
+   return rc;
}
fw_dump.dump_active = 0;
fdm_active = NULL;
-- 
2.8.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 4/7] PCI: Add a new option for resource_alignment to reassign alignment

2016-06-30 Thread Gavin Shan

On Thu, Jun 30, 2016 at 06:53:10PM +0800, Yongji Xie wrote:
>When using resource_alignment kernel parameter, the current
>implement reassigns the alignment by changing resources' size
>which can potentially break some drivers. For example, the driver
>uses the size to locate some register whose length is related
>to the size.
>
>This patch adds a new option "noresize" for the parameter to
>solve this problem.
>
>Signed-off-by: Yongji Xie 

Reviewed-by: Gavin Shan 

>---
> Documentation/kernel-parameters.txt |5 -
> drivers/pci/pci.c   |   35 +--
> 2 files changed, 29 insertions(+), 11 deletions(-)
>
>diff --git a/Documentation/kernel-parameters.txt 
>b/Documentation/kernel-parameters.txt
>index 82b42c9..c4802f5 100644
>--- a/Documentation/kernel-parameters.txt
>+++ b/Documentation/kernel-parameters.txt
>@@ -2997,13 +2997,16 @@ bytes respectively. Such letter suffixes can also be 
>entirely omitted.
>   window. The default value is 64 megabytes.
>   resource_alignment=
>   Format:
>-  [align>@][:]:.[; ...]
>+  [align>@][:]:.
>+  [:noresize][; ...]
>   Specifies alignment and device to reassign
>   aligned memory resources.
>   If  is not specified,
>   PAGE_SIZE is used as alignment.
>   PCI-PCI bridge can be specified, if resource
>   windows need to be expanded.
>+  noresize: Don't change the resources' sizes when
>+  reassigning alignment.
>   ecrc=   Enable/disable PCIe ECRC (transaction layer
>   end-to-end CRC checking).
>   bios: Use BIOS/firmware settings. This is the
>diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>index 6241cfc..1d80a94 100644
>--- a/drivers/pci/pci.c
>+++ b/drivers/pci/pci.c
>@@ -4748,11 +4748,13 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
> /**
>  * pci_specified_resource_alignment - get resource alignment specified by 
> user.
>  * @dev: the PCI device to get
>+ * @resize: whether or not to change resources' size when reassigning 
>alignment
>  *
>  * RETURNS: Resource alignment if it is specified.
>  *  Zero if it is not specified.
>  */
>-static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
>+static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>+  bool *resize)
> {
>   int seg, bus, slot, func, align_order, count;
>   resource_size_t align = 0;
>@@ -4787,6 +4789,11 @@ static resource_size_t 
>pci_specified_resource_alignment(struct pci_dev *dev)
>   }
>   }
>   p += count;
>+  if (!strncmp(p, ":noresize", 9)) {
>+  *resize = false;
>+  p += 9;
>+  } else
>+  *resize = true;
>   if (seg == pci_domain_nr(dev->bus) &&
>   bus == dev->bus->number &&
>   slot == PCI_SLOT(dev->devfn) &&
>@@ -4819,6 +4826,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
>*dev)
> {
>   int i;
>   struct resource *r;
>+  bool resize = true;
>   resource_size_t align, size;
>
>   /* We should never try to reassign VF's alignment */
>@@ -4826,7 +4834,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
>*dev)
>   return;
>
>   /* check if specified PCI is target device to reassign */
>-  align = pci_specified_resource_alignment(dev);
>+  align = pci_specified_resource_alignment(dev, );
>   if (!align)
>   return;
>
>@@ -4848,15 +4856,22 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
>*dev)
>   continue;
>   }
>   size = resource_size(r);
>-  if (size < align) {
>-  size = align;
>-  dev_info(>dev,
>-  "Rounding up size of resource #%d to %#llx.\n",
>-  i, (unsigned long long)size);
>+  if (resize) {
>+  if (size < align) {
>+  size = align;
>+  dev_info(>dev,
>+  "Rounding up size of resource #%d to 
>%#llx.\n",
>+  i, (unsigned long long)size);
>+  }
>+  r->flags |= IORESOURCE_UNSET;
>+  r->end = size - 1;
>+  r->start = 0;
>+  } else {
>+  r->flags &= ~IORESOURCE_SIZEALIGN;
>+

Re: [PATCH v2 00/12] gendisk: Generate uevent after attribute available

2016-06-30 Thread Dan Williams

On Thu, Jun 30, 2016 at 6:01 PM, Fam Zheng  wrote:
> On Wed, 06/29 23:38, Christoph Hellwig wrote:
>> On Thu, Jun 30, 2016 at 02:35:54PM +0800, Fam Zheng wrote:
>> > also more code and less flexible IMO.  For example, we need at least two
>> > variants, for attribute_group and device_attribute separately, right?
>>
>> Yes, or maybe just a calling convention that just passes both.
>
> OK, I can look into that, but I'm not sure about the error handling. Currently
> add_disk returns void, do you have any plan on that too? should I change it in
> v3 (to at least return the attribute creation failure)?

I think we should only support a "groups" interface to
device_add_disk() and convert all the drivers that currently do
device_create_file() after add_disk() to pass in a group list instead.
That way we follow the expectation that the only way to get an
attribute for a device to show up before KOBJ_ADD, is to define a
group:

From Documentation/driver-model/device.txt:
As explained in Documentation/kobject.txt, device attributes must be be
created before the KOBJ_ADD uevent is generated. The only way to realize
that is by defining an attribute group.

Let's defer the return value fixing for now.

You can find the pending device_add_disk() patches in the nvdimm
patchwork starting with "block: introduce device_add_disk()"

https://patchwork.kernel.org/project/linux-nvdimm/list/
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 05/12] aoeblk: Generate uevent after attribute available

2016-06-30 Thread Ed Cashin


On 06/29/2016 09:59 PM, Fam Zheng wrote:

It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.


Looks like an improvement, thanks!

--
  Ed

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 04/12] axonrom: Generate uevent after attribute available

2016-06-30 Thread Fam Zheng

On Thu, 06/30 15:10, Dan Williams wrote:
> On Wed, Jun 29, 2016 at 6:59 PM, Fam Zheng  wrote:
> > It is documented that KOBJ_ADD should be generated after the object's
> > attributes and children are ready.  We can achieve this with the new
> > disk_gen_uevents interface.
> >
> > Signed-off-by: Fam Zheng 
> > ---
> >  arch/powerpc/sysdev/axonram.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
> > index 4efd69b..27e7175 100644
> > --- a/arch/powerpc/sysdev/axonram.c
> > +++ b/arch/powerpc/sysdev/axonram.c
> > @@ -238,7 +238,7 @@ static int axon_ram_probe(struct platform_device 
> > *device)
> > set_capacity(bank->disk, bank->size >> AXON_RAM_SECTOR_SHIFT);
> > blk_queue_make_request(bank->disk->queue, axon_ram_make_request);
> > blk_queue_logical_block_size(bank->disk->queue, 
> > AXON_RAM_SECTOR_SIZE);
> > -   add_disk(bank->disk, true);
> > +   add_disk(bank->disk, false);
> >
> > bank->irq_id = irq_of_parse_and_map(device->dev.of_node, 0);
> > if (bank->irq_id == NO_IRQ) {
> > @@ -262,6 +262,7 @@ static int axon_ram_probe(struct platform_device 
> > *device)
> > rc = -EFAULT;
> > goto failed;
> > }
> > +   disk_gen_uevents(bank->disk);
> 
> I assume you are doing this after:
> 
>rc = device_create_file(>dev, _attr_ecc);
> 
> ...so that userspace gets notified of the new attribute, but this
> attribute is on the parent device, not the disk itself.  Instead I
> think this attribute should simply be registered before the call to
> add_disk().  Then the KOBJ_ADD event for the disk comes after the
> attribute is available.  It's still not a clean fit, because userspace
> should not be expecting a child device uevent to signal new attributes
> available on the parent.

Yes you are right, this patch is a mistake. Moving to before add_disk makes
sense to me.

Thanks for taking a look!

Fam
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 00/12] gendisk: Generate uevent after attribute available

2016-06-30 Thread Fam Zheng

On Wed, 06/29 23:38, Christoph Hellwig wrote:
> On Thu, Jun 30, 2016 at 02:35:54PM +0800, Fam Zheng wrote:
> > also more code and less flexible IMO.  For example, we need at least two
> > variants, for attribute_group and device_attribute separately, right?
> 
> Yes, or maybe just a calling convention that just passes both.

OK, I can look into that, but I'm not sure about the error handling. Currently
add_disk returns void, do you have any plan on that too? should I change it in
v3 (to at least return the attribute creation failure)?

Fam
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 3/7] PCI: Do not disable memory decoding in pci_reassigndev_resource_alignment()

2016-06-30 Thread Gavin Shan

On Thu, Jun 30, 2016 at 06:53:09PM +0800, Yongji Xie wrote:
>We should not disable memory decoding when we reassign alignment
>in pci_reassigndev_resource_alignment(). It's meaningless and
>have some side effect. For example, some fixup functions such as
>quirk_e100_interrupt() read PCI_COMMAND_MEMORY to know whether
>the devices has been initialized by the firmware or not. If we
>disable memory decoding here, these functions will get a wrong
>information that the devices was not initialized by the firmware
>which may cause a wrong fixup. Besides, disabling memory decoding
>may also break some devices that need to have memory decoding
>always-on during probing.
>

It seems the changelog isn't correct enough if it's talking about
below check in code:

if (!(command & PCI_COMMAND_MEMORY) || !pci_resource_start(dev, 0))
return;

>Signed-off-by: Yongji Xie 
>---
> drivers/pci/pci.c |8 +---
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
>diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>index 6ae02de..6241cfc 100644
>--- a/drivers/pci/pci.c
>+++ b/drivers/pci/pci.c
>@@ -4820,7 +4820,6 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
>*dev)
>   int i;
>   struct resource *r;
>   resource_size_t align, size;
>-  u16 command;
>
>   /* We should never try to reassign VF's alignment */
>   if (dev->is_virtfn)
>@@ -4838,12 +4837,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
>*dev)
>   return;
>   }
>
>-  dev_info(>dev,
>-  "Disabling memory decoding and releasing memory resources.\n");
>-  pci_read_config_word(dev, PCI_COMMAND, );
>-  command &= ~PCI_COMMAND_MEMORY;
>-  pci_write_config_word(dev, PCI_COMMAND, command);
>-
>+  dev_info(>dev, "Releasing memory resources.\n");

Is there a problem you found with PCI_COMMAND removed? If so, could you
please share more details, thanks.

>   for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
>   r = >resource[i];
>   if (!(r->flags & IORESOURCE_MEM))

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 2/7] PCI: Ignore enforced alignment to VF BARs

2016-06-30 Thread Gavin Shan

On Thu, Jun 30, 2016 at 06:53:08PM +0800, Yongji Xie wrote:
>VF BARs are read-only zeroes according to SRIOV spec,
>the normal way(writing BARs) of allocating resources wouldn't
>be applied to VFs. The VFs' resources would be allocated
>when we enable SR-IOV capability. So we should not try to
>reassign alignment after we enable VFs. It's meaningless
>and will release the allocated resources which leads to a bug.
>
>Signed-off-by: Yongji Xie 
>---
> drivers/pci/pci.c |4 
> 1 file changed, 4 insertions(+)
>
>diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>index be8f72c..6ae02de 100644
>--- a/drivers/pci/pci.c
>+++ b/drivers/pci/pci.c
>@@ -4822,6 +4822,10 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
>*dev)
>   resource_size_t align, size;
>   u16 command;
>
>+  /* We should never try to reassign VF's alignment */
>+  if (dev->is_virtfn)
>+  return;
>+

Yongji, I think it's correct to ignore VF's BARs. Another concern is:
it's safe to apply alignment to PF's IOV BARs? Lets have an extreme
example here: one PF has 16 VFs; each VF has only one 1KB. It means
the only PF IOV BAR is 16KB. I don't see how it works after expanding
it to 64KB which is the page size. It might be not a problem on PowerNV
platform, but potentially a issue on x86?

>   /* check if specified PCI is target device to reassign */
>   align = pci_specified_resource_alignment(dev);
>   if (!align)

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/7] PCI: Ignore enforced alignment when kernel uses existing firmware setup

2016-06-30 Thread Gavin Shan

On Thu, Jun 30, 2016 at 06:53:07PM +0800, Yongji Xie wrote:
>PCI resources allocator will use firmware setup and not try to
>reassign resource when PCI_PROBE_ONLY or IORESOURCE_PCI_FIXED
>is set.
>
>The enforced alignment in pci_reassigndev_resource_alignment()
>should be ignored in this case. Otherwise, some PCI devices'
>resources would be released here and not re-allocated.
>
>Signed-off-by: Yongji Xie 
>---
> drivers/pci/pci.c |   12 
> 1 file changed, 12 insertions(+)
>
>diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>index c8b4dbd..be8f72c 100644
>--- a/drivers/pci/pci.c
>+++ b/drivers/pci/pci.c
>@@ -4760,6 +4760,13 @@ static resource_size_t 
>pci_specified_resource_alignment(struct pci_dev *dev)
>
>   spin_lock(_alignment_lock);
>   p = resource_alignment_param;
>+  if (pci_has_flag(PCI_PROBE_ONLY)) {
>+  if (*p)
>+  printk_once(KERN_INFO "PCI: Ignore resource_alignment 
>parameter: %s with PCI_PROBE_ONLY set\n",
>+  p);

We don't have to print @resource_alignment as it's ignored completely.
The content included in it doesn't mean anything:

pr_info_once("PCI: resource_alignment ignored with 
PCI_PROBE_ONLY\n");

>+  spin_unlock(_alignment_lock);
>+  return 0;
>+  }

A empty line is needed here.

>   while (*p) {
>   count = 0;
>   if (sscanf(p, "%d%n", _order, ) == 1 &&
>@@ -4837,6 +4844,11 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
>*dev)
>   r = >resource[i];
>   if (!(r->flags & IORESOURCE_MEM))
>   continue;
>+  if (r->flags & IORESOURCE_PCI_FIXED) {
>+  dev_info(>dev, "No alignment for fixed BAR%d: 
>%pR\n",
>+  i, r);

The message would be like below to match PCI code style. I'm thinking it 
probably
uses dev_dbg() instead dev_info(), but not sure for 100%.

dev_info(>dev, "BAR %d: fixed %pR, no 
alignment\n", i, r);

>+  continue;
>+  }

A empty line is needed here.

>   size = resource_size(r);
>   if (size < align) {
>   size = align;

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/2] crypto: powerpc: Add POWER8 optimised crc32c

2016-06-30 Thread Anton Blanchard

From: Anton Blanchard 

Use the vector polynomial multiply-sum instructions in POWER8 to
speed up crc32c.

This is just over 41x faster than the slice-by-8 method that it
replaces. Measurements on a 4.1 GHz POWER8 show it sustaining
52 GiB/sec.

A simple btrfs write performance test:

dd if=/dev/zero of=/mnt/tmpfile bs=1M count=4096
sync

is over 3.7x faster.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/crypto/Makefile |2 +
 arch/powerpc/crypto/crc32c-vpmsum_asm.S  | 1553 ++
 arch/powerpc/crypto/crc32c-vpmsum_glue.c |  167 
 arch/powerpc/include/asm/ppc-opcode.h|   12 +
 crypto/Kconfig   |   11 +
 5 files changed, 1745 insertions(+)
 create mode 100644 arch/powerpc/crypto/crc32c-vpmsum_asm.S
 create mode 100644 arch/powerpc/crypto/crc32c-vpmsum_glue.c

diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index 9c221b6..7998c17 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -9,9 +9,11 @@ obj-$(CONFIG_CRYPTO_MD5_PPC) += md5-ppc.o
 obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o
 obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o
 obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
+obj-$(CONFIG_CRYPT_CRC32C_VPMSUM) += crc32c-vpmsum.o
 
 aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o 
aes-spe-glue.o
 md5-ppc-y := md5-asm.o md5-glue.o
 sha1-powerpc-y := sha1-powerpc-asm.o sha1.o
 sha1-ppc-spe-y := sha1-spe-asm.o sha1-spe-glue.o
 sha256-ppc-spe-y := sha256-spe-asm.o sha256-spe-glue.o
+crc32c-vpmsum-y := crc32c-vpmsum_asm.o crc32c-vpmsum_glue.o
diff --git a/arch/powerpc/crypto/crc32c-vpmsum_asm.S 
b/arch/powerpc/crypto/crc32c-vpmsum_asm.S
new file mode 100644
index 000..dc640b2
--- /dev/null
+++ b/arch/powerpc/crypto/crc32c-vpmsum_asm.S
@@ -0,0 +1,1553 @@
+/*
+ * Calculate the checksum of data that is 16 byte aligned and a multiple of
+ * 16 bytes.
+ *
+ * The first step is to reduce it to 1024 bits. We do this in 8 parallel
+ * chunks in order to mask the latency of the vpmsum instructions. If we
+ * have more than 32 kB of data to checksum we repeat this step multiple
+ * times, passing in the previous 1024 bits.
+ *
+ * The next step is to reduce the 1024 bits to 64 bits. This step adds
+ * 32 bits of 0s to the end - this matches what a CRC does. We just
+ * calculate constants that land the data in this 32 bits.
+ *
+ * We then use fixed point Barrett reduction to compute a mod n over GF(2)
+ * for n = CRC using POWER8 instructions. We use x = 32.
+ *
+ * http://en.wikipedia.org/wiki/Barrett_reduction
+ *
+ * Copyright (C) 2015 Anton Blanchard , IBM
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include 
+#include 
+
+   .section.rodata
+.balign 16
+
+.byteswap_constant:
+   /* byte reverse permute constant */
+   .octa 0x0F0E0D0C0B0A09080706050403020100
+
+#define MAX_SIZE   32768
+.constants:
+
+   /* Reduce 262144 kbits to 1024 bits */
+   /* x^261120 mod p(x)` << 1, x^261184 mod p(x)` << 1 */
+   .octa 0xb6ca9e209c37c408
+
+   /* x^260096 mod p(x)` << 1, x^260160 mod p(x)` << 1 */
+   .octa 0x350249a80001b51df26c
+
+   /* x^259072 mod p(x)` << 1, x^259136 mod p(x)` << 1 */
+   .octa 0x0001862dac540724b9d0
+
+   /* x^258048 mod p(x)` << 1, x^258112 mod p(x)` << 1 */
+   .octa 0x0001d87fb48c0001c00532fe
+
+   /* x^257024 mod p(x)` << 1, x^257088 mod p(x)` << 1 */
+   .octa 0x0001f39b699ef05a9362
+
+   /* x^256000 mod p(x)` << 1, x^256064 mod p(x)` << 1 */
+   .octa 0x000101da11b40001e1007970
+
+   /* x^254976 mod p(x)` << 1, x^255040 mod p(x)` << 1 */
+   .octa 0x0001cab571e0a57366ee
+
+   /* x^253952 mod p(x)` << 1, x^254016 mod p(x)` << 1 */
+   .octa 0xc7020cfe000192011284
+
+   /* x^252928 mod p(x)` << 1, x^252992 mod p(x)` << 1 */
+   .octa 0xcdaed1ae000162716d9a
+
+   /* x^251904 mod p(x)` << 1, x^251968 mod p(x)` << 1 */
+   .octa 0x0001e804effccd97ecde
+
+   /* x^250880 mod p(x)` << 1, x^250944 mod p(x)` << 1 */
+   .octa 0x77c3ea3a58812bc0
+
+   /* x^249856 mod p(x)` << 1, x^249920 mod p(x)` << 1 */
+   .octa 0x68df31b488b8c12e
+
+   /* x^248832 mod p(x)` << 1, x^248896 mod p(x)` << 1 */
+   .octa 0xb059b6c20001230b234c
+
+   /* x^247808 mod p(x)` << 1, x^247872 mod p(x)` << 1 */
+   .octa 0x000145fb8ed80001120b416e
+
+   /* x^246784 mod p(x)` << 1, x^246848 mod p(x)` << 1 */
+   .octa 0xcbc091680001974aecb0
+
+   /* x^245760

[PATCH 1/2] powerpc: define FUNC_START/FUNC_END

2016-06-30 Thread Anton Blanchard

From: Anton Blanchard 

gcc provides FUNC_START/FUNC_END macros to help with creating
assembly functions. Mirror these in the kernel so we can more easily
share code between userspace and the kernel. FUNC_END is just a
stub since we don't currently annotate the end of kernel functions.

It might make sense to do a wholesale search and replace, but for
now just create a couple of defines.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/include/asm/ppc_asm.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 7b591f9..7a924da 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -286,6 +286,9 @@ n:
 
 #endif
 
+#define FUNC_START(name)   _GLOBAL(name)
+#define FUNC_END(name)
+
 /* 
  * LOAD_REG_IMMEDIATE(rn, expr)
  *   Loads the value of the constant expression 'expr' into register 'rn'
-- 
2.7.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 04/12] axonrom: Generate uevent after attribute available

2016-06-30 Thread Dan Williams

On Wed, Jun 29, 2016 at 6:59 PM, Fam Zheng  wrote:
> It is documented that KOBJ_ADD should be generated after the object's
> attributes and children are ready.  We can achieve this with the new
> disk_gen_uevents interface.
>
> Signed-off-by: Fam Zheng 
> ---
>  arch/powerpc/sysdev/axonram.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
> index 4efd69b..27e7175 100644
> --- a/arch/powerpc/sysdev/axonram.c
> +++ b/arch/powerpc/sysdev/axonram.c
> @@ -238,7 +238,7 @@ static int axon_ram_probe(struct platform_device *device)
> set_capacity(bank->disk, bank->size >> AXON_RAM_SECTOR_SHIFT);
> blk_queue_make_request(bank->disk->queue, axon_ram_make_request);
> blk_queue_logical_block_size(bank->disk->queue, AXON_RAM_SECTOR_SIZE);
> -   add_disk(bank->disk, true);
> +   add_disk(bank->disk, false);
>
> bank->irq_id = irq_of_parse_and_map(device->dev.of_node, 0);
> if (bank->irq_id == NO_IRQ) {
> @@ -262,6 +262,7 @@ static int axon_ram_probe(struct platform_device *device)
> rc = -EFAULT;
> goto failed;
> }
> +   disk_gen_uevents(bank->disk);

I assume you are doing this after:

   rc = device_create_file(>dev, _attr_ecc);

...so that userspace gets notified of the new attribute, but this
attribute is on the parent device, not the disk itself.  Instead I
think this attribute should simply be registered before the call to
add_disk().  Then the KOBJ_ADD event for the disk comes after the
attribute is available.  It's still not a clean fit, because userspace
should not be expecting a child device uevent to signal new attributes
available on the parent.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] linuxppc/devtree: Parse new DRC mem/cpu/dev device tree elements

2016-06-30 Thread Michael Bringmann

Several properties in the DRC device tree format are replaced by
more compact representations to allow, for example, for the encoding
of vast amounts of memory, and or reduced duplication of information
in related data structures.

"ibm,drc-info": This property, when present, replaces the following
four properties: "ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
and "ibm,drc-power-domains".  This property is defined for all
dynamically reconfigurable platform nodes.  The "ibm,drc-info" elements
are intended to provide a more compact representation, and reduce some
search overhead.

"ibm,dynamic-memory-v2": This property replaces the "ibm,dynamic-memory"
node representation within the "ibm,dynamic-reconfiguration-memory"
property provided by the BMC.  This element format is intended to provide
a more compact representation of memory, especially, for systems with
massive amounts of RAM.  To simplify portability, this property is
converted to the "ibm,dynamic-memory" property during system boot.

"ibm,architecture.vec": Bit flags are added to this data structure
by the front end processor to inform the kernel as to whether to expect
the changes to one or both of the device tree structures "ibm,drc-info"
and "ibm,dynamic-memory-v2".

Signed-off-by: Michael Bringmann 
---
diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index b062924..a9d66d5 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -51,6 +51,8 @@
 #define FW_FEATURE_BEST_ENERGY ASM_CONST(0x8000)
 #define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0001)
 #define FW_FEATURE_PRRNASM_CONST(0x0002)
+#define FW_FEATURE_RPS_DM2 ASM_CONST(0x0004)
+#define FW_FEATURE_RPS_DRC_INFOASM_CONST(0x0008)
 
 #ifndef __ASSEMBLY__
 
@@ -66,7 +68,8 @@ enum {
FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
-   FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN,
+   FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
+   FW_FEATURE_RPS_DM2 | FW_FEATURE_RPS_DRC_INFO,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9d86c66..e4c5076 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -215,6 +215,8 @@ extern int early_init_dt_scan_opal(unsigned long node, 
const char *uname,
   int depth, void *data);
 extern int early_init_dt_scan_recoverable_ranges(unsigned long node,
 const char *uname, int depth, void *data);
+extern int pseries_probe_fw_features(unsigned long node,
+const char *uname, int depth, void *data);
 
 extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
 extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 7f436ba..99ef058 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -69,6 +69,8 @@ struct boot_param_header {
  * OF address retreival & translation
  */
 
+extern int n_mem_addr_cells;
+
 /* Parse the ibm,dma-window property of an OF node into the busno, phys and
  * size parameters.
  */
@@ -81,8 +83,9 @@ extern void of_instantiate_rtc(void);
 extern int of_get_ibm_chip_id(struct device_node *np);
 
 /* The of_drconf_cell struct defines the layout of the LMB array
- * specified in the device tree property
- * ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
+ * specified in the device tree properties,
+ * ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
+ * ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory-v2
  */
 struct of_drconf_cell {
u64 base_addr;
@@ -92,9 +95,59 @@ struct of_drconf_cell {
u32 flags;
 };
 
-#define DRCONF_MEM_ASSIGNED0x0008
-#define DRCONF_MEM_AI_INVALID  0x0040
-#define DRCONF_MEM_RESERVED0x0080
+   /* It is important to note that this structure can not
+* be safely mapped onto the memory containing the
+* 'ibm,dynamic-memory-v2' property due to the issues
+* of compiler alignment.  This structure represents
+* the order of the fields stored, but compiler alignment
+* may insert extra bytes of padding between the fields
+* 'num_seq_lmbs' and 'base_addr'.
+*/
+struct of_drconf_cell_v2 {
+   u32 num_seq_lmbs;
+   u64 base_addr;
+   u32 drc_index;
+   u32 aa_index;
+   u32 flags;
+};
+
+#defineDRCONF_V2_CELLS (n_mem_addr_cells + 4)

Re: [PATCH v3 2/9] kexec_file: Generalize kexec_add_buffer.

2016-06-30 Thread Dave Young

On 06/30/16 at 01:42pm, Thiago Jung Bauermann wrote:
> Am Donnerstag, 30 Juni 2016, 12:49:44 schrieb Thiago Jung Bauermann:
> > Am Donnerstag, 30 Juni 2016, 11:07:00 schrieb Dave Young:
> > > On 06/29/16 at 06:18pm, Thiago Jung Bauermann wrote:
> > > > I'm not following. The IMA buffer patchset doesn't use
> > > > kexec_locate_mem_hole nor struct kexec_buf.
> > > 
> > > It does not use kexec_locate_mem_hole, but the buffer being passed is
> > > very similar to a kexec_buf struct, no?
> > 
> > If what you're saying is that the arguments passed to
> > kexec_add_handover_buffer in the IMA buffer patchset are very similar to
> > the arguments passed to kexec_add_buffer then yes, it's true.
> > 
> > > So you may refactor kexec_add_buffer and your new function to pass only
> > > kimage and a kbuf, it will be better than passing all those arguments
> > > separately.
> > 
> > To be honest I think struct kexec_buf is an implementation detail inside
> > kexec_locate_mem_hole, made necessary because the callback functions it
> > uses need to access its arguments. Callers of kexec_locate_mem_hole,
> > kexec_add_handover_buffer and kexec_add_buffer shouldn't need to know it
> > exists.
> 
> Elaborating a bit more: the argument list for these three functions are 
> equal or similar because kexec_add_handover_buffer uses kexec_add_buffer, 
> which uses kexec_locate_mem_hole.
> 
> It could be beneficial to have a struct to collect the arguments to these 
> functions if someone using one of them would be likely to use another one 
> with the same arguments. In that case, you set up kexec_buf once and then 
> just pass it whenever you need to call one of those functions.
> 
> But that is unlikely to happen. A user of the kexec API will need to use 
> only one of these functions with a given set of arguments, so they don't 
> gain anything by setting up a struct.
> 
> Syntactically, I also don't think it's clearer to set struct members instead 
> of simply passing arguments to a function, even if the argument list is 
> long.

Sorry, I'm not sure I get your points but the long argument list really looks 
ugly,
since you are introducing more callbacks I still think a cleanup is necessary.

kexec_buffer struct is pretty fine to be a abstract of all these buffers.

Thanks
Dave
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 3/9] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.

2016-06-30 Thread Dave Young

On 06/30/16 at 01:08pm, Thiago Jung Bauermann wrote:
> Am Donnerstag, 30 Juni 2016, 11:41:19 schrieb Dave Young:
> > On 06/29/16 at 06:09pm, Thiago Jung Bauermann wrote:
> > > Am Mittwoch, 29 Juni 2016, 15:45:18 schrieb Dave Young:
> > > > On 06/28/16 at 07:18pm, Thiago Jung Bauermann wrote:
> > > > > Am Dienstag, 28 Juni 2016, 15:20:55 schrieb Dave Young:
> > > > > > On 06/27/16 at 04:21pm, Dave Young wrote:
> > > > It looks good except one nitpick inline..
> > > > 
> > > > > +/**
> > > > > + * kexec_locate_mem_hole - find free memory to load segment or use
> > > > > in
> > > > > purgatory
> > > > 
> > > > It is not necessary to use only for purgatory load..
> > > 
> > > Ok, what about this?
> > > 
> > > /**
> > > 
> > >  * kexec_locate_mem_hole - find free memory in a given kimage.
> > 
> > Hmm, a given kimage sounds not correct, I can not get a better way to
> > describe it. How about below with a little change to your previous one:
> > 
> > kexec_locate_mem_hole - find a free chunk of memory to load kexec segment.
> > In powerpc the memory chunk can also be used for the purgatory stack.
> 
> That describes what the memory currently is used for. If powerpc or any 
> other architecture starts to use the memory for something else, this comment 
> would need to be updated. :-)
> 
> What the function really does is find free memory in the physical address 
> space after the currently running kernel hands over control to whatever runs 
> next. What that memory is used for is decided by the caller of the function.
> 
> Since (at least for now), the only things that run next are the purgatory 
> and the next kernel, what about this?
> 
> kexec_locate_mem_hole - find free memory for the purgatory or the next 
> kernel

Ok, I'm fine with above version.

Thanks
Dave 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/2] powerpc/pseries: Implemented indexed-count hotplug memory remove

2016-06-30 Thread Sahil Mehta

Indexed-count remove for memory hotplug guarantees that a contiguous block
of  lmbs beginning at a specified  will be unassigned (NOT
that  lmbs will be removed). Because of Qemu's per-DIMM memory
management, the removal of a contiguous block of memory currently
requires a series of individual calls. Indexed-count remove reduces
this series into a single call.

Signed-off-by: Sahil Mehta 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   85 +++
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index cf359bf..4d81810 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -503,6 +503,86 @@ static int dlpar_memory_remove_by_index(u32 drc_index, 
struct property *prop)
return rc;
 }

+static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index,
+struct property *prop)
+{
+   struct of_drconf_cell *lmbs;
+   u32 num_lmbs, *p;
+   int i, rc;
+   int lmbs_available = 0, start_index = 0, end_index;
+
+   pr_info("Attempting to hot-remove %u LMB(s) at %x\n",
+   lmbs_to_remove, drc_index);
+
+   if (lmbs_to_remove == 0)
+   return -EINVAL;
+
+   p = prop->value;
+   num_lmbs = *p++;
+   lmbs = (struct of_drconf_cell *)p;
+
+   /* Navigate to drc_index */
+   while (start_index < num_lmbs) {
+   if (lmbs[start_index].drc_index == drc_index)
+   break;
+
+   start_index++;
+   }
+
+   end_index = start_index + lmbs_to_remove;
+
+   /* Validate that there are enough LMBs to satisfy the request */
+   for (i = start_index; i < end_index; i++) {
+   if (lmbs[i].flags & DRCONF_MEM_RESERVED)
+   break;
+
+   lmbs_available++;
+   }
+
+   if (lmbs_available < lmbs_to_remove)
+   return -EINVAL;
+
+   for (i = 0; i < end_index; i++) {
+   if (!(lmbs[i].flags & DRCONF_MEM_ASSIGNED))
+   continue;
+
+   rc = dlpar_remove_lmb([i]);
+   if (rc)
+   break;
+
+   lmbs[i].reserved = 1;
+   }
+
+   if (rc) {
+   pr_err("Memory indexed-count-remove failed, adding any removed 
LMBs\n");
+
+   for (i = start_index; i < end_index; i++) {
+   if (!lmbs[i].reserved)
+   continue;
+
+   rc = dlpar_add_lmb([i]);
+   if (rc)
+   pr_err("Failed to add LMB, drc index %x\n",
+  be32_to_cpu(lmbs[i].drc_index));
+
+   lmbs[i].reserved = 0;
+   }
+   rc = -EINVAL;
+   } else {
+   for (i = start_index; i < end_index; i++) {
+   if (!lmbs[i].reserved)
+   continue;
+
+   pr_info("Memory at %llx (drc index %x) was 
hot-removed\n",
+   lmbs[i].base_addr, lmbs[i].drc_index);
+
+   lmbs[i].reserved = 0;
+   }
+   }
+
+   return rc;
+}
+
 #else
 static inline int pseries_remove_memblock(unsigned long base,
  unsigned int memblock_size)
@@ -770,7 +850,6 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 
drc_index,

pr_info("Memory at %llx (drc index %x) was hot-added\n",
lmbs[i].base_addr, lmbs[i].drc_index);
-   lmbs[i].reserved = 0;
}
}

@@ -820,6 +899,10 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
} else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
drc_index = hp_elog->_drc_u.drc_index;
rc = dlpar_memory_remove_by_index(drc_index, prop);
+   } else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_IC) {
+   ic[0] = hp_elog->_drc_u.indexed_count[0];
+   ic[1] = hp_elog->_drc_u.indexed_count[1];
+   rc = dlpar_memory_remove_by_ic(ic[0], ic[1], prop);
} else
rc = -EINVAL;
break;

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/2] powerpc/pseries: Implemented indexed-count hotplug memory add

2016-06-30 Thread Sahil Mehta

Indexed-count add for memory hotplug guarantees that a contiguous block
of  lmbs beginning at a specified  will be assigned (NOT
that  lmbs will be added). Because of Qemu's per-DIMM memory
management, the addition of a contiguous block of memory currently
requires a series of individual calls. Indexed-count add reduces
this series into a single call.

Signed-off-by: Sahil Mehta 
---
 arch/powerpc/include/asm/rtas.h |2
 arch/powerpc/platforms/pseries/dlpar.c  |   32 ++-
 arch/powerpc/platforms/pseries/hotplug-memory.c |  109 ---
 3 files changed, 126 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 51400ba..f46b271 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -307,6 +307,7 @@ struct pseries_hp_errorlog {
union {
__be32  drc_index;
__be32  drc_count;
+   __be32  indexed_count[2];
chardrc_name[1];
} _drc_u;
 };
@@ -322,6 +323,7 @@ struct pseries_hp_errorlog {
 #define PSERIES_HP_ELOG_ID_DRC_NAME1
 #define PSERIES_HP_ELOG_ID_DRC_INDEX   2
 #define PSERIES_HP_ELOG_ID_DRC_COUNT   3
+#define PSERIES_HP_ELOG_ID_IC  4

 struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
  uint16_t section_id);
diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index 2b93ae8..a3d5f20 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -345,11 +345,17 @@ static int handle_dlpar_errorlog(struct 
pseries_hp_errorlog *hp_elog)
switch (hp_elog->id_type) {
case PSERIES_HP_ELOG_ID_DRC_COUNT:
hp_elog->_drc_u.drc_count =
-   be32_to_cpu(hp_elog->_drc_u.drc_count);
+   be32_to_cpu(hp_elog->_drc_u.drc_count);
break;
case PSERIES_HP_ELOG_ID_DRC_INDEX:
hp_elog->_drc_u.drc_index =
-   be32_to_cpu(hp_elog->_drc_u.drc_index);
+   be32_to_cpu(hp_elog->_drc_u.drc_index);
+   break;
+   case PSERIES_HP_ELOG_ID_IC:
+   hp_elog->_drc_u.indexed_count[0] =
+   be32_to_cpu(hp_elog->_drc_u.indexed_count[0]);
+   hp_elog->_drc_u.indexed_count[1] =
+   be32_to_cpu(hp_elog->_drc_u.indexed_count[1]);
}

switch (hp_elog->resource) {
@@ -409,7 +415,27 @@ static ssize_t dlpar_store(struct class *class, struct 
class_attribute *attr,
goto dlpar_store_out;
}

-   if (!strncmp(arg, "index", 5)) {
+   if (!strncmp(arg, "indexed-count", 13)) {
+   u32 index, count;
+   char *cstr, *istr;
+
+   hp_elog->id_type = PSERIES_HP_ELOG_ID_IC;
+   arg += strlen("indexed-count ");
+
+   cstr = kstrdup(arg, GFP_KERNEL);
+   istr = strchr(cstr, ' ');
+   *istr++ = '\0';
+
+   if (kstrtou32(cstr, 0, ) || kstrtou32(istr, 0, )) {
+   rc = -EINVAL;
+   pr_err("Invalid index or count : \"%s\"\n", buf);
+   goto dlpar_store_out;
+   }
+   kfree(cstr);
+
+   hp_elog->_drc_u.indexed_count[0] = cpu_to_be32(count);
+   hp_elog->_drc_u.indexed_count[1] = cpu_to_be32(index);
+   } else if (!strncmp(arg, "index", 5)) {
u32 index;

hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2ce1385..cf359bf 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -458,8 +458,8 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove,
if (!lmbs[i].reserved)
continue;

-   pr_info("Memory at %llx was hot-removed\n",
-   lmbs[i].base_addr);
+   pr_info("Memory at %llx (drc index %x) was 
hot-removed\n",
+   lmbs[i].base_addr, lmbs[i].drc_index);

lmbs[i].reserved = 0;
}
@@ -618,7 +618,6 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add, 
struct property *prop)
num_lmbs = *p++;
lmbs = (struct of_drconf_cell *)p;

-   /* Validate that there are enough LMBs to satisfy the request */
for (i = 0; i < num_lmbs; i++) {
if (!(lmbs[i].flags & DRCONF_MEM_ASSIGNED))
lmbs_available++;
@@ -634,7 +633,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add, 
struct property

[PATCH 0/2] powerpc/pseries: Implemented indexed-count hotplug memory management

2016-06-30 Thread Sahil Mehta

Indexed-count memory management allows addition and removal of contiguous
lmb blocks to be added and removed with a single command. When compared
to the series of calls previously required to manage contiguous blocks,
indexed-count decreases command frequency and complexity, consequently
preventing buffer overflow.

-Sahil Mehta
---
 include/asm/rtas.h |2
 platforms/pseries/dlpar.c  |   32 +-
 platforms/pseries/hotplug-memory.c |  194 ++---
 3 files changed, 210 insertions(+), 18 deletions(-)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] cxl: Ignore CAPI adapters misplaced in switched slots

2016-06-30 Thread Ian Munsie

Thanks Philippe - this looks like a decent solution to the problem (and
I intend to use this for the upcoming cx4 support as well).

Acked-by: Ian Munsie 

Excerpts from Philippe Bergheaud's message of 2016-06-30 13:45:37 +0200:
> One should not attempt to switch a PHB into CAPI mode if there is
> a switch between the PHB and the adapter. This patch modifies the
> cxl driver to ignore CAPI adapters misplaced in switched slots.
> 
> Signed-off-by: Philippe Bergheaud 
> ---
> This patch fixes Bz 142217.
> 
>  drivers/misc/cxl/pci.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index a08fcc8..2f978ed 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -1280,6 +1280,30 @@ static void cxl_pci_remove_adapter(struct cxl *adapter)
>  device_unregister(>dev);
>  }
>  
> +#define CXL_MAX_PCIEX_PARENT 2
> +
> +static int cxl_slot_is_switched(struct pci_dev *dev)
> +{
> +struct device_node *np;
> +int depth = 0;
> +const __be32 *prop;
> +
> +if (!(np = pci_device_to_OF_node(dev))) {
> +pr_err("cxl: np = NULL\n");
> +return -ENODEV;
> +}
> +of_node_get(np);
> +while (np) {
> +np = of_get_next_parent(np);
> +prop = of_get_property(np, "device_type", NULL);
> +if (!prop || strcmp((char *)prop, "pciex"))
> +break;
> +depth++;
> +}
> +of_node_put(np);
> +return (depth > CXL_MAX_PCIEX_PARENT);
> +}
> +
>  static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
>  {
>  struct cxl *adapter;
> @@ -1291,6 +1315,11 @@ static int cxl_probe(struct pci_dev *dev, const struct 
> pci_device_id *id)
>  return -ENODEV;
>  }
>  
> +if (cxl_slot_is_switched(dev)) {
> +dev_dbg(>dev, "cxl_init_adapter: Ignoring switched slot 
> device\n");
> +return -ENODEV;
> +}
> +
>  if (cxl_verbose)
>  dump_cxl_config_space(dev);
>  

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2] cxl: Fix bug where AFU disable operation had no effect

2016-06-30 Thread Ian Munsie

From: Ian Munsie 

The AFU disable operation has a bug where it will not clear the enable
bit and therefore will have no effect. To date this has likely been
masked by fact that we perform an AFU reset before the disable, which
also has the effect of clearing the enable bit, making the following
disable operation effectively a noop on most hardware. This patch
modifies the afu_control function to take a parameter to clear from the
AFU control register so that the disable operation can clear the
appropriate bit.

This bug was uncovered on the Mellanox CX4, which uses an XSL rather
than a PSL. On the XSL the reset operation will not complete while the
AFU is enabled, meaning the enable bit was still set at the start of the
disable and as a result this bug was hit and the disable also timed out.

Because of this difference in behaviour between the PSL and XSL, this
patch now makes the reset dependent on the card using a PSL to avoid
waiting for a timeout on the XSL. It is entirely possible that we may be
able to drop the reset altogether if it turns out we only ever needed it
due to this bug - however I am not willing to drop it without further
regression testing and have added comments to the code explaining the
background.

This also fixes a small issue where the AFU_Cntl register was read
outside of the lock that protects it.

Signed-off-by: Ian Munsie 
---
Changes since v1:
- Modified comment to dedicated process disable path to explain
  the architected procedure, and the origin of our more heavy weight
  procedure.
- Modified comment to afu directed disable path with a note that the
  procedure is AFU specific, and explaining the origin of our heavy
  weight prcedure.
- Removed needs_reset_before_disable condition from dedicated process
  disable path. The XSL in the CX4 does not use this mode (AFU directed
  only), and since the reset in this procedure is architected we
  *should* never need to skip it.

 drivers/misc/cxl/cxl.h|  1 +
 drivers/misc/cxl/native.c | 58 +--
 drivers/misc/cxl/pci.c|  1 +
 3 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index ce2b9d5..bab8dfd 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -544,6 +544,7 @@ struct cxl_service_layer_ops {
void (*write_timebase_ctrl)(struct cxl *adapter);
u64 (*timebase_read)(struct cxl *adapter);
int capi_mode;
+   bool needs_reset_before_disable;
 };
 
 struct cxl_native {
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 120c468..e774505 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -21,10 +21,10 @@
 #include "cxl.h"
 #include "trace.h"
 
-static int afu_control(struct cxl_afu *afu, u64 command,
+static int afu_control(struct cxl_afu *afu, u64 command, u64 clear,
   u64 result, u64 mask, bool enabled)
 {
-   u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+   u64 AFU_Cntl;
unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
int rc = 0;
 
@@ -33,7 +33,8 @@ static int afu_control(struct cxl_afu *afu, u64 command,
 
trace_cxl_afu_ctrl(afu, command);
 
-   cxl_p2n_write(afu, CXL_AFU_Cntl_An, AFU_Cntl | command);
+   AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+   cxl_p2n_write(afu, CXL_AFU_Cntl_An, (AFU_Cntl & ~clear) | command);
 
AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
while ((AFU_Cntl & mask) != result) {
@@ -67,7 +68,7 @@ static int afu_enable(struct cxl_afu *afu)
 {
pr_devel("AFU enable request\n");
 
-   return afu_control(afu, CXL_AFU_Cntl_An_E,
+   return afu_control(afu, CXL_AFU_Cntl_An_E, 0,
   CXL_AFU_Cntl_An_ES_Enabled,
   CXL_AFU_Cntl_An_ES_MASK, true);
 }
@@ -76,7 +77,8 @@ int cxl_afu_disable(struct cxl_afu *afu)
 {
pr_devel("AFU disable request\n");
 
-   return afu_control(afu, 0, CXL_AFU_Cntl_An_ES_Disabled,
+   return afu_control(afu, 0, CXL_AFU_Cntl_An_E,
+  CXL_AFU_Cntl_An_ES_Disabled,
   CXL_AFU_Cntl_An_ES_MASK, false);
 }
 
@@ -85,7 +87,7 @@ static int native_afu_reset(struct cxl_afu *afu)
 {
pr_devel("AFU reset request\n");
 
-   return afu_control(afu, CXL_AFU_Cntl_An_RA,
+   return afu_control(afu, CXL_AFU_Cntl_An_RA, 0,
   CXL_AFU_Cntl_An_RS_Complete | 
CXL_AFU_Cntl_An_ES_Disabled,
   CXL_AFU_Cntl_An_RS_MASK | CXL_AFU_Cntl_An_ES_MASK,
   false);
@@ -595,7 +597,33 @@ static int deactivate_afu_directed(struct cxl_afu *afu)
cxl_sysfs_afu_m_remove(afu);
cxl_chardev_afu_remove(afu);
 
-   cxl_ops->afu_reset(afu);
+   /*
+* The CAIA section 2.2.1 indicates

Re: [PATCH 1/2] cxl: Fix bug where AFU disable operation had no effect

2016-06-30 Thread Ian Munsie

Excerpts from Frederic Barrat's message of 2016-06-30 17:50:00 +0200:
> 
> Le 30/06/2016 17:32, Ian Munsie a écrit :
> >> For dedicated mode, the CAIA recommends an explicit reset of the AFU
> >> >(section 2.1.1).
> > True, I had forgotten that procedure was added to the document before it
> > was made public - I'll update the comment and resend.
> >
> 
> Actually, my point was that for dedicated mode, we shouldn't have the 
> "if" and always reset. It's only for dedicated mode, so it wouldn't 
> impact cx4 and we would stay CAIA-compliant. If one day, there's a xsl 
> with a dedicated mode AFU, they are expected to follow the spec.

Yeah, I thought of that as well while I was updating the patch and
removed that from the dedicated path. I still added a comment to that
path to note that though.

Cheers,
-Ian

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 2/9] kexec_file: Generalize kexec_add_buffer.

2016-06-30 Thread Thiago Jung Bauermann

Am Donnerstag, 30 Juni 2016, 12:49:44 schrieb Thiago Jung Bauermann:
> Am Donnerstag, 30 Juni 2016, 11:07:00 schrieb Dave Young:
> > On 06/29/16 at 06:18pm, Thiago Jung Bauermann wrote:
> > > I'm not following. The IMA buffer patchset doesn't use
> > > kexec_locate_mem_hole nor struct kexec_buf.
> > 
> > It does not use kexec_locate_mem_hole, but the buffer being passed is
> > very similar to a kexec_buf struct, no?
> 
> If what you're saying is that the arguments passed to
> kexec_add_handover_buffer in the IMA buffer patchset are very similar to
> the arguments passed to kexec_add_buffer then yes, it's true.
> 
> > So you may refactor kexec_add_buffer and your new function to pass only
> > kimage and a kbuf, it will be better than passing all those arguments
> > separately.
> 
> To be honest I think struct kexec_buf is an implementation detail inside
> kexec_locate_mem_hole, made necessary because the callback functions it
> uses need to access its arguments. Callers of kexec_locate_mem_hole,
> kexec_add_handover_buffer and kexec_add_buffer shouldn't need to know it
> exists.

Elaborating a bit more: the argument list for these three functions are 
equal or similar because kexec_add_handover_buffer uses kexec_add_buffer, 
which uses kexec_locate_mem_hole.

It could be beneficial to have a struct to collect the arguments to these 
functions if someone using one of them would be likely to use another one 
with the same arguments. In that case, you set up kexec_buf once and then 
just pass it whenever you need to call one of those functions.

But that is unlikely to happen. A user of the kexec API will need to use 
only one of these functions with a given set of arguments, so they don't 
gain anything by setting up a struct.

Syntactically, I also don't think it's clearer to set struct members instead 
of simply passing arguments to a function, even if the argument list is 
long.

[]'s
Thiago Jung Bauermann
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 3/9] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.

2016-06-30 Thread Thiago Jung Bauermann

Am Donnerstag, 30 Juni 2016, 11:41:19 schrieb Dave Young:
> On 06/29/16 at 06:09pm, Thiago Jung Bauermann wrote:
> > Am Mittwoch, 29 Juni 2016, 15:45:18 schrieb Dave Young:
> > > On 06/28/16 at 07:18pm, Thiago Jung Bauermann wrote:
> > > > Am Dienstag, 28 Juni 2016, 15:20:55 schrieb Dave Young:
> > > > > On 06/27/16 at 04:21pm, Dave Young wrote:
> > > It looks good except one nitpick inline..
> > > 
> > > > +/**
> > > > + * kexec_locate_mem_hole - find free memory to load segment or use
> > > > in
> > > > purgatory
> > > 
> > > It is not necessary to use only for purgatory load..
> > 
> > Ok, what about this?
> > 
> > /**
> > 
> >  * kexec_locate_mem_hole - find free memory in a given kimage.
> 
> Hmm, a given kimage sounds not correct, I can not get a better way to
> describe it. How about below with a little change to your previous one:
> 
> kexec_locate_mem_hole - find a free chunk of memory to load kexec segment.
> In powerpc the memory chunk can also be used for the purgatory stack.

That describes what the memory currently is used for. If powerpc or any 
other architecture starts to use the memory for something else, this comment 
would need to be updated. :-)

What the function really does is find free memory in the physical address 
space after the currently running kernel hands over control to whatever runs 
next. What that memory is used for is decided by the caller of the function.

Since (at least for now), the only things that run next are the purgatory 
and the next kernel, what about this?

kexec_locate_mem_hole - find free memory for the purgatory or the next 
kernel

[]'s
Thiago Jung Bauermann
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/2] cxl: Fix bug where AFU disable operation had no effect

2016-06-30 Thread Frederic Barrat




Le 30/06/2016 17:32, Ian Munsie a écrit :

For dedicated mode, the CAIA recommends an explicit reset of the AFU
>(section 2.1.1).

True, I had forgotten that procedure was added to the document before it
was made public - I'll update the comment and resend.



Actually, my point was that for dedicated mode, we shouldn't have the 
"if" and always reset. It's only for dedicated mode, so it wouldn't 
impact cx4 and we would stay CAIA-compliant. If one day, there's a xsl 
with a dedicated mode AFU, they are expected to follow the spec.


  Fred

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 2/9] kexec_file: Generalize kexec_add_buffer.

2016-06-30 Thread Thiago Jung Bauermann

Am Donnerstag, 30 Juni 2016, 11:07:00 schrieb Dave Young:
> On 06/29/16 at 06:18pm, Thiago Jung Bauermann wrote:
> > Am Mittwoch, 29 Juni 2016, 15:47:51 schrieb Dave Young:
> > > On 06/28/16 at 07:18pm, Thiago Jung Bauermann wrote:
> > > > +/**
> > > > + * struct kexec_buf - parameters for finding a place for a buffer
> > > > in
> > > > memory + * @image:   kexec image in which memory to search.
> > > > + * @mem: On return will have address of the buffer in memory.
> > > > + * @memsz:   Size for the buffer in memory.
> > > > + * @buf_align:   Minimum alignment needed.
> > > > + * @buf_min: The buffer can't be placed below this address.
> > > > + * @buf_max: The buffer can't be placed above this address.
> > > > + * @top_down:Allocate from top of memory.
> > > > + */
> > > > +struct kexec_buf {
> > > > + struct kimage *image;
> > > > + unsigned long mem;
> > > > + unsigned long memsz;
> > > > + unsigned long buf_align;
> > > > + unsigned long buf_min;
> > > > + unsigned long buf_max;
> > > > + bool top_down;
> > > > +};
> > > 
> > > Rethink about the first patch, you dropped the user buffer in
> > > kexec_buf
> > > But later your passing IMA digests buffer patchset may need use it.
> > > 
> > > So keep it in kexec_buf should be better.
> > 
> > I'm not following. The IMA buffer patchset doesn't use
> > kexec_locate_mem_hole nor struct kexec_buf.
> 
> It does not use kexec_locate_mem_hole, but the buffer being passed is
> very similar to a kexec_buf struct, no?

If what you're saying is that the arguments passed to 
kexec_add_handover_buffer in the IMA buffer patchset are very similar to the 
arguments passed to kexec_add_buffer then yes, it's true.

> So you may refactor kexec_add_buffer and your new function to pass only
> kimage and a kbuf, it will be better than passing all those arguments
> separately.

To be honest I think struct kexec_buf is an implementation detail inside 
kexec_locate_mem_hole, made necessary because the callback functions it uses 
need to access its arguments. Callers of kexec_locate_mem_hole, 
kexec_add_handover_buffer and kexec_add_buffer shouldn't need to know it 
exists.

[]'s
Thiago Jung Bauermann
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 3/9] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.

2016-06-30 Thread Dave Young

On 06/29/16 at 06:09pm, Thiago Jung Bauermann wrote:
> Am Mittwoch, 29 Juni 2016, 15:45:18 schrieb Dave Young:
> > On 06/28/16 at 07:18pm, Thiago Jung Bauermann wrote:
> > > Am Dienstag, 28 Juni 2016, 15:20:55 schrieb Dave Young:
> > > > On 06/27/16 at 04:21pm, Dave Young wrote:
> > > > Using one argument for both sounds more reasonable than using a
> > > > separate
> > > > argument for memory walk..
> > > 
> > > I agree. This patch doesn't use a separate top_down argument, it's the
> > > same patch I sent earlier except that the comments to struct kexec_buf
> > > are in patch 2/9. What do you think?
> > 
> > It looks good except one nitpick inline..
> > 
> >
> > > +/**
> > > + * kexec_locate_mem_hole - find free memory to load segment or use in
> > > purgatory
>  
> > It is not necessary to use only for purgatory load..
> 
> Ok, what about this?
> 
> /**
>  * kexec_locate_mem_hole - find free memory in a given kimage.

Hmm, a given kimage sounds not correct, I can not get a better way to
describe it. How about below with a little change to your previous one:

kexec_locate_mem_hole - find a free chunk of memory to load kexec segment.
In powerpc the memory chunk can also be used for the purgatory stack.

Thanks
Dave
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4] cpuidle: Fix last_residency division

2016-06-30 Thread Nicolas Pitre

On Thu, 30 Jun 2016, Daniel Lezcano wrote:

> On 06/30/2016 04:34 PM, Shreyas B. Prabhu wrote:
> > Snooze is a poll idle state in powernv and pseries platforms. Snooze
> > has a timeout so that if a cpu stays in snooze for more than target
> > residency of the next available idle state, then it would exit thereby
> > giving chance to the cpuidle governor to re-evaluate and
> > promote the cpu to a deeper idle state. Therefore whenever snooze exits
> > due to this timeout, its last_residency will be target_residency of next
> > deeper state.
> >
> > commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
> > changed the math around last_residency calculation. Specifically, while
> > converting last_residency value from nanoseconds to microseconds it does
> > right shift by 10. Due to this, in snooze timeout exit scenarios
> > last_residency calculated is roughly 2.3% less than target_residency of
> > next available state. This pattern is picked up get_typical_interval()
> > in the menu governor and therefore expected_interval in menu_select() is
> > frequently less than the target_residency of any state but snooze.
> >
> > Due to this we are entering snooze at a higher rate, thereby affecting
> > the single thread performance.
> >
> > Fix this by using a better approximation for division by 1000.
> >
> > Reported-by: Anton Blanchard 
> > Bisected-by: Shilpasri G Bhat 
> > Suggested-by David Laight 
> > Signed-off-by: Shreyas B. Prabhu 
> > ---
> > Changes in v4
> > =
> >   - Increasing the threshold upto which approximation can be used.
> >   - Removed explicit cast. Instead added a comment saying why cast
> > is safe.
> >
> > Changes in v3
> > =
> >   - Using approximation suggested by David
> >
> > Changes in v2
> > =
> >   - Fixing it in the cpuidle core code instead of driver code.
> >
> >   drivers/cpuidle/cpuidle.c | 11 +++
> >   drivers/cpuidle/cpuidle.h | 38 ++
> >   2 files changed, 41 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index a4d0059..f55ad01 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -174,7 +174,6 @@ int cpuidle_enter_state(struct cpuidle_device *dev,
> > struct cpuidle_driver *drv,
> >struct cpuidle_state *target_state = >states[index];
> >bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
> >u64 time_start, time_end;
> > -   s64 diff;
> >
> >/*
> >  * Tell the time framework to switch to a broadcast timer because our
> > @@ -218,14 +217,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev,
> > struct cpuidle_driver *drv,
> > local_irq_enable();
> >
> > /*
> > -* local_clock() returns the time in nanosecond, let's shift
> > -* by 10 (divide by 1024) to have microsecond based time.
> > +* local_clock() returns the time in nanoseconds, convert it to
> > +* microsecond based time.
> >  */
> > -   diff = (time_end - time_start) >> 10;
> > -   if (diff > INT_MAX)
> > -   diff = INT_MAX;
> > -
> > -   dev->last_residency = (int) diff;
> > +   dev->last_residency = convert_nsec_to_usec(time_end - time_start);
> >
> >if (entered_state >= 0) {
> > /* Update cpuidle counters */
> > diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
> > index f87f399..a027b35 100644
> > --- a/drivers/cpuidle/cpuidle.h
> > +++ b/drivers/cpuidle/cpuidle.h
> > @@ -68,4 +68,42 @@ static inline void
> > cpuidle_coupled_unregister_device(struct cpuidle_device *dev)
> >   }
> >   #endif
> >
> > +/*
> > + * To ensure that there is no overflow while approximation
> > + * for dividing val by 1000, we must respect -
> > + * val + (val >> 5) <= 0x
> > + * val + val/32 <= 0x
> > + * val <= (0x * 32) / 33
> > + * val <= 0xF83E0F82
> > + * Hence the threshold for val below which we can use the
> > + * approximation is 0xF83E0F82
> > + */
> > +#define DIV_APPROXIMATION_THRESHOLD 0xF83E0F82UL
> > +
> > +/*
> > + * Used for calculating last_residency in usec. Optimized for case
> > + * where last_residency in nsecs is < DIV_APPROXIMATION_THRESHOLD
> > + * Approximated value has less than 1% error.
> > + */
> > +static inline int convert_nsec_to_usec(u64 nsec)
> > +{
> > +   if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
> > +   u32 usec = nsec;
> > +
> > +   usec += usec >> 5;
> > +   usec = usec >> 10;
> > +
> > +   /* Can safely cast to int since usec is < INT_MAX */
> > +   return usec;
> > +   } else {
> > +   u64 usec = div_u64(nsec, 1000);
> > +
> > +   if (usec > INT_MAX)
> > +   usec = INT_MAX;
> > +
> > +   /* Can safely cast to int since usec is < INT_MAX */
> > +   return usec;
> > +   }
> > +}
>

Re: [PATCH 1/2] cxl: Fix bug where AFU disable operation had no effect

2016-06-30 Thread Ian Munsie

Excerpts from Frederic Barrat's message of 2016-06-30 16:19:54 +0200:
> I'm not a big fan of the new "clear" argument, which forces us to pass 
> an extra 0 most of the time. Why not always clearing the "action" bits 
> of the register before applying the command? They are mutually 
> exclusive, so it should work. I.e. :
> 
> +AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
> +AFU_Cntl &= ~(CXL_AFU_Cntl_An_E | CXL_AFU_Cntl_An_RA);
> +AFU_Cntl |= command;

In theory that should be OK, but I'd want to test it on some PSL images
first just in case they don't behave quite how we expect since we've had
problems in this area in the past (although after discovering the bug in
the disable path that may provide the explanation for those problems).

The risk I see with that approach is setting the reset bit and clearing
the enable bit at the same time is commanding the hardware to do two
similar but subtly different operations simultaneously, which is not
something we have done before or tested - we can always clean this up in
a later patch after we've tested it on a couple of PSLs, cxlflash, etc
and are happy that it works with no ill effects. I don't see any reason
that needs to be done in this patch though since it's just churn inside
the driver and doesn't impact anyone else.

> >   static inline int detach_process_native_dedicated(struct cxl_context *ctx)
> >   {
> > -cxl_ops->afu_reset(ctx->afu);
> > +if (ctx->afu->adapter->native->sl_ops->needs_reset_before_disable) {
> > +/*
> > + * XXX: We may be able to do away with this entirely - it is
> > + * possible that this was only ever needed due to a bug where
> > + * the disable operation did not clear the enable bit, however
> > + * I will only consider dropping this after more regression
> > + * testing on earlier PSL images.
> > + */
> > +cxl_ops->afu_reset(ctx->afu);
> > +}
> 
> For dedicated mode, the CAIA recommends an explicit reset of the AFU 
> (section 2.1.1).

True, I had forgotten that procedure was added to the document before it
was made public - I'll update the comment and resend.

> For directed mode, CAIA says it's AFU-specific.

Damnit, that should never have been architected as AFU specific - PSL
implementation specific would have been ok, but AFU specific is just
asking for trouble since this is all generic code with no way to
identify the AFU.

Oh well, in practice the AFU developers will be coding against our
implementation now, so I guess we effectively became the authority on
how this works by default... and therefore should probably continue to
do the reset here.

Just hope there aren't too many more ASIC implementations that implement
some contradictory behaviour and are set in stone before anyone
realises.

I'll resend this with these two comments updated.

> So for xsl, we only disable the afu and purge the xsl. Are we getting
> rid of the reset because it's useless in that environment, or because
> it times out?
>
> If just because of timeout, would it make sense to switch the order
> and disable first, then reset? I don't see any afu-reset on the next
> activation.

It times out if the AFU is enabled when we attempt the reset - that's
OK, but is a bit of a waste of time and generates unnecessary noise in
the kernel log. We could switch the order, but I don't think that the
AFU reset is necessary - the documentation certainly suggests that only
a disable is required (but then again, the XSL has been full of
surprises already so I reserve the right to send a later patch adding a
reset if it has one more in store for me :-p).

A reset is also pretty meaningless on the XSL as far as I can tell - it
looks like it will assert a bit to the CX4 hardware, so I assume the
rest of the card could choose to do something with it, but unlike the
FPGA cards I don't think it actually resets anything (and I doubt we
even need the one we do while initialising the card since the AFU
descriptor is in the CX4 hardware and should be readable without that).

Cheers,
-Ian

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] cpuidle: Fix last_residency division

2016-06-30 Thread Nicolas Pitre

On Thu, 30 Jun 2016, Shreyas B. Prabhu wrote:

> Snooze is a poll idle state in powernv and pseries platforms. Snooze
> has a timeout so that if a cpu stays in snooze for more than target
> residency of the next available idle state, then it would exit thereby
> giving chance to the cpuidle governor to re-evaluate and
> promote the cpu to a deeper idle state. Therefore whenever snooze exits
> due to this timeout, its last_residency will be target_residency of next
> deeper state.
> 
> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
> changed the math around last_residency calculation. Specifically, while
> converting last_residency value from nanoseconds to microseconds it does
> right shift by 10. Due to this, in snooze timeout exit scenarios
> last_residency calculated is roughly 2.3% less than target_residency of
> next available state. This pattern is picked up get_typical_interval()
> in the menu governor and therefore expected_interval in menu_select() is
> frequently less than the target_residency of any state but snooze.
> 
> Due to this we are entering snooze at a higher rate, thereby affecting
> the single thread performance.
> 
> Fix this by using a better approximation for division by 1000.
> 
> Reported-by: Anton Blanchard 
> Bisected-by: Shilpasri G Bhat 
> Suggested-by David Laight 
> Signed-off-by: Shreyas B. Prabhu 

Minor nit:

> +/*
> + * To ensure that there is no overflow while approximation
> + * for dividing val by 1000, we must respect -
> + * val + (val >> 5) <= 0x
> + * val + val/32 <= 0x
> + * val <= (0x * 32) / 33
> + * val <= 0xF83E0F82
> + * Hence the threshold for val below which we can use the
> + * approximation is 0xF83E0F82
> + */
> +#define DIV_APPROXIMATION_THRESHOLD 0xF83E0F82UL
> +
> +/*
> + * Used for calculating last_residency in usec. Optimized for case
> + * where last_residency in nsecs is < DIV_APPROXIMATION_THRESHOLD
> + * Approximated value has less than 1% error.
> + */
> +static inline int convert_nsec_to_usec(u64 nsec)
> +{
> + if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {

To be coherent with the comment, you could use <= instead.

Then you may add:

Reviewed-by: Nicolas Pitre 


Nicolas
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 2/9] kexec_file: Generalize kexec_add_buffer.

2016-06-30 Thread Dave Young

On 06/29/16 at 06:18pm, Thiago Jung Bauermann wrote:
> Am Mittwoch, 29 Juni 2016, 15:47:51 schrieb Dave Young:
> > On 06/28/16 at 07:18pm, Thiago Jung Bauermann wrote:
> > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > > index e8acb2b43dd9..e16d845d587f 100644
> > > --- a/include/linux/kexec.h
> > > +++ b/include/linux/kexec.h
> > > @@ -146,7 +146,30 @@ struct kexec_file_ops {
> > > 
> > >   kexec_verify_sig_t *verify_sig;
> > >  
> > >  #endif
> > >  };
> > > 
> > > -#endif
> > > +
> > > +/**
> > > + * struct kexec_buf - parameters for finding a place for a buffer in
> > > memory + * @image:   kexec image in which memory to search.
> > > + * @mem: On return will have address of the buffer in memory.
> > > + * @memsz:   Size for the buffer in memory.
> > > + * @buf_align:   Minimum alignment needed.
> > > + * @buf_min: The buffer can't be placed below this address.
> > > + * @buf_max: The buffer can't be placed above this address.
> > > + * @top_down:Allocate from top of memory.
> > > + */
> > > +struct kexec_buf {
> > > + struct kimage *image;
> > > + unsigned long mem;
> > > + unsigned long memsz;
> > > + unsigned long buf_align;
> > > + unsigned long buf_min;
> > > + unsigned long buf_max;
> > > + bool top_down;
> > > +};
> > 
> > Rethink about the first patch, you dropped the user buffer in kexec_buf
> > But later your passing IMA digests buffer patchset may need use it.
> > 
> > So keep it in kexec_buf should be better.
> 
> I'm not following. The IMA buffer patchset doesn't use kexec_locate_mem_hole 
> nor struct kexec_buf.

It does not use kexec_locate_mem_hole, but the buffer being passed is
very similar to a kexec_buf struct, no? 

So you may refactor kexec_add_buffer and your new function to pass only kimage
and a kbuf, it will be better than passing all those arguments separately.

> 
> > For the IMA buffer patchset I'm still reading and learning the
> > background, will reply them later.
> 
> Thank you!
> 
> []'s
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 

Thanks
Dave
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4] cpuidle: Fix last_residency division

2016-06-30 Thread Daniel Lezcano


On 06/30/2016 04:34 PM, Shreyas B. Prabhu wrote:

Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than target
residency of the next available idle state, then it would exit thereby
giving chance to the cpuidle governor to re-evaluate and
promote the cpu to a deeper idle state. Therefore whenever snooze exits
due to this timeout, its last_residency will be target_residency of next
deeper state.

commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around last_residency calculation. Specifically, while
converting last_residency value from nanoseconds to microseconds it does
right shift by 10. Due to this, in snooze timeout exit scenarios
last_residency calculated is roughly 2.3% less than target_residency of
next available state. This pattern is picked up get_typical_interval()
in the menu governor and therefore expected_interval in menu_select() is
frequently less than the target_residency of any state but snooze.

Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.

Fix this by using a better approximation for division by 1000.

Reported-by: Anton Blanchard 
Bisected-by: Shilpasri G Bhat 
Suggested-by David Laight 
Signed-off-by: Shreyas B. Prabhu 
---
Changes in v4
=
  - Increasing the threshold upto which approximation can be used.
  - Removed explicit cast. Instead added a comment saying why cast
is safe.

Changes in v3
=
  - Using approximation suggested by David

Changes in v2
=
  - Fixing it in the cpuidle core code instead of driver code.

  drivers/cpuidle/cpuidle.c | 11 +++
  drivers/cpuidle/cpuidle.h | 38 ++
  2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..f55ad01 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -174,7 +174,6 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct 
cpuidle_driver *drv,
struct cpuidle_state *target_state = >states[index];
bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
u64 time_start, time_end;
-   s64 diff;

/*
 * Tell the time framework to switch to a broadcast timer because our
@@ -218,14 +217,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, 
struct cpuidle_driver *drv,
local_irq_enable();

/*
-* local_clock() returns the time in nanosecond, let's shift
-* by 10 (divide by 1024) to have microsecond based time.
+* local_clock() returns the time in nanoseconds, convert it to
+* microsecond based time.
 */
-   diff = (time_end - time_start) >> 10;
-   if (diff > INT_MAX)
-   diff = INT_MAX;
-
-   dev->last_residency = (int) diff;
+   dev->last_residency = convert_nsec_to_usec(time_end - time_start);

if (entered_state >= 0) {
/* Update cpuidle counters */
diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
index f87f399..a027b35 100644
--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -68,4 +68,42 @@ static inline void cpuidle_coupled_unregister_device(struct 
cpuidle_device *dev)
  }
  #endif

+/*
+ * To ensure that there is no overflow while approximation
+ * for dividing val by 1000, we must respect -
+ * val + (val >> 5) <= 0x
+ * val + val/32 <= 0x
+ * val <= (0x * 32) / 33
+ * val <= 0xF83E0F82
+ * Hence the threshold for val below which we can use the
+ * approximation is 0xF83E0F82
+ */
+#define DIV_APPROXIMATION_THRESHOLD 0xF83E0F82UL
+
+/*
+ * Used for calculating last_residency in usec. Optimized for case
+ * where last_residency in nsecs is < DIV_APPROXIMATION_THRESHOLD
+ * Approximated value has less than 1% error.
+ */
+static inline int convert_nsec_to_usec(u64 nsec)
+{
+   if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
+   u32 usec = nsec;
+
+   usec += usec >> 5;
+   usec = usec >> 10;
+
+   /* Can safely cast to int since usec is < INT_MAX */
+   return usec;
+   } else {
+   u64 usec = div_u64(nsec, 1000);
+
+   if (usec > INT_MAX)
+   usec = INT_MAX;
+
+   /* Can safely cast to int since usec is < INT_MAX */
+   return usec;
+   }
+}



What bothers me with this division is the benefit of adding an extra 
ultra optimized division by 1000 in cpuidle.h while we have already 
ktime_divns which is optimized in ktime.h.


Why not:

ts = ns_to_ktime(local_clock());

...

te = ns_to_ktime(local_clock());


diff = ktime_us_delta(te, ts);





--
  Linaro.org │ Open source software for ARM

[PATCH v4] cpuidle: Fix last_residency division

2016-06-30 Thread Shreyas B. Prabhu

Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than target
residency of the next available idle state, then it would exit thereby
giving chance to the cpuidle governor to re-evaluate and
promote the cpu to a deeper idle state. Therefore whenever snooze exits
due to this timeout, its last_residency will be target_residency of next
deeper state.

commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around last_residency calculation. Specifically, while
converting last_residency value from nanoseconds to microseconds it does
right shift by 10. Due to this, in snooze timeout exit scenarios
last_residency calculated is roughly 2.3% less than target_residency of
next available state. This pattern is picked up get_typical_interval()
in the menu governor and therefore expected_interval in menu_select() is
frequently less than the target_residency of any state but snooze.

Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.

Fix this by using a better approximation for division by 1000.

Reported-by: Anton Blanchard 
Bisected-by: Shilpasri G Bhat 
Suggested-by David Laight 
Signed-off-by: Shreyas B. Prabhu 
---
Changes in v4
=
 - Increasing the threshold upto which approximation can be used.
 - Removed explicit cast. Instead added a comment saying why cast
   is safe.

Changes in v3
=
 - Using approximation suggested by David

Changes in v2
=
 - Fixing it in the cpuidle core code instead of driver code.

 drivers/cpuidle/cpuidle.c | 11 +++
 drivers/cpuidle/cpuidle.h | 38 ++
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..f55ad01 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -174,7 +174,6 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct 
cpuidle_driver *drv,
struct cpuidle_state *target_state = >states[index];
bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
u64 time_start, time_end;
-   s64 diff;
 
/*
 * Tell the time framework to switch to a broadcast timer because our
@@ -218,14 +217,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, 
struct cpuidle_driver *drv,
local_irq_enable();
 
/*
-* local_clock() returns the time in nanosecond, let's shift
-* by 10 (divide by 1024) to have microsecond based time.
+* local_clock() returns the time in nanoseconds, convert it to
+* microsecond based time.
 */
-   diff = (time_end - time_start) >> 10;
-   if (diff > INT_MAX)
-   diff = INT_MAX;
-
-   dev->last_residency = (int) diff;
+   dev->last_residency = convert_nsec_to_usec(time_end - time_start);
 
if (entered_state >= 0) {
/* Update cpuidle counters */
diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
index f87f399..a027b35 100644
--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -68,4 +68,42 @@ static inline void cpuidle_coupled_unregister_device(struct 
cpuidle_device *dev)
 }
 #endif
 
+/*
+ * To ensure that there is no overflow while approximation
+ * for dividing val by 1000, we must respect -
+ * val + (val >> 5) <= 0x
+ * val + val/32 <= 0x
+ * val <= (0x * 32) / 33
+ * val <= 0xF83E0F82
+ * Hence the threshold for val below which we can use the
+ * approximation is 0xF83E0F82
+ */
+#define DIV_APPROXIMATION_THRESHOLD 0xF83E0F82UL
+
+/*
+ * Used for calculating last_residency in usec. Optimized for case
+ * where last_residency in nsecs is < DIV_APPROXIMATION_THRESHOLD
+ * Approximated value has less than 1% error.
+ */
+static inline int convert_nsec_to_usec(u64 nsec)
+{
+   if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
+   u32 usec = nsec;
+
+   usec += usec >> 5;
+   usec = usec >> 10;
+
+   /* Can safely cast to int since usec is < INT_MAX */
+   return usec;
+   } else {
+   u64 usec = div_u64(nsec, 1000);
+
+   if (usec > INT_MAX)
+   usec = INT_MAX;
+
+   /* Can safely cast to int since usec is < INT_MAX */
+   return usec;
+   }
+}
+
 #endif /* __DRIVER_CPUIDLE_H */
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] cpuidle: Fix last_residency division

2016-06-30 Thread Shreyas B Prabhu

Please ignore this mail. I have resent this post using the correct
version number. Sorry for the noise.


On 06/30/2016 07:57 PM, Shreyas B. Prabhu wrote:
> Snooze is a poll idle state in powernv and pseries platforms. Snooze
> has a timeout so that if a cpu stays in snooze for more than target
> residency of the next available idle state, then it would exit thereby
> giving chance to the cpuidle governor to re-evaluate and
> promote the cpu to a deeper idle state. Therefore whenever snooze exits
> due to this timeout, its last_residency will be target_residency of next
> deeper state.
> 
> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
> changed the math around last_residency calculation. Specifically, while
> converting last_residency value from nanoseconds to microseconds it does
> right shift by 10. Due to this, in snooze timeout exit scenarios
> last_residency calculated is roughly 2.3% less than target_residency of
> next available state. This pattern is picked up get_typical_interval()
> in the menu governor and therefore expected_interval in menu_select() is
> frequently less than the target_residency of any state but snooze.
> 
> Due to this we are entering snooze at a higher rate, thereby affecting
> the single thread performance.
> 
> Fix this by using a better approximation for division by 1000.
> 
> Reported-by: Anton Blanchard 
> Bisected-by: Shilpasri G Bhat 
> Suggested-by David Laight 
> Signed-off-by: Shreyas B. Prabhu 
> ---
> Changes in v4
> =
>  - Increasing the threshold upto which approximation can be used.
>  - Removed explicit cast. Instead added a comment saying why cast
>is safe.
> 
> Changes in v3
> =
>  - Using approximation suggested by David
> 
> Changes in v2
> =
>  - Fixing it in the cpuidle core code instead of driver code.
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3] powerpc/boot: Add OPAL console to epapr wrappers

2016-06-30 Thread Oliver O'Halloran

This patch adds an OPAL console backend to the powerpc boot wrapper so
that decompression failures inside the wrapper can be reported to the
user. This is important since it typically indicates data corruption in
the firmware and other nasty things.

Currently this only works when building a little endian kernel. When
compiling a 64 bit BE kernel the wrapper is always build 32 bit to be
compatible with some 32 bit firmwares. BE support will be added at a
later date. Another limitation of this is that only the "raw" type of
OPAL console is supported, however machines that provide a hvsi console
also provide a raw console so this is not an issue in practice.

Actually-written-by: Benjamin Herrenschmidt 
Signed-off-by: Oliver O'Halloran 
Cc: Stewart Smith 
Cc: sta...@vger.kernel.org
---

Changelog:

v2: Added missing files
v3: Added copyright headers to opal.c and opal-calls.S

---
 arch/powerpc/boot/Makefile |  4 +-
 arch/powerpc/boot/opal-calls.S | 58 +
 arch/powerpc/boot/opal.c   | 97 ++
 arch/powerpc/boot/ops.h|  1 +
 arch/powerpc/boot/ppc_asm.h|  4 ++
 arch/powerpc/boot/serial.c |  2 +
 arch/powerpc/boot/types.h  | 10 +
 7 files changed, 174 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/boot/opal-calls.S
 create mode 100644 arch/powerpc/boot/opal.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 8fe78a3efc92..00cf88aa9a23 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -70,7 +70,7 @@ $(addprefix $(obj)/,$(zlib) cuboot-c2k.o gunzip_util.o 
main.o): \
 libfdt   := fdt.c fdt_ro.c fdt_wip.c fdt_sw.c fdt_rw.c fdt_strerror.c
 libfdtheader := fdt.h libfdt.h libfdt_internal.h
 
-$(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o epapr.o): \
+$(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o epapr.o opal.o): \
$(addprefix $(obj)/,$(libfdtheader))
 
 src-wlib-y := string.S crt0.S crtsavres.S stdio.c main.c \
@@ -78,7 +78,7 @@ src-wlib-y := string.S crt0.S crtsavres.S stdio.c main.c \
ns16550.c serial.c simple_alloc.c div64.S util.S \
gunzip_util.c elf_util.c $(zlib) devtree.c stdlib.c \
oflib.c ofconsole.c cuboot.c mpsc.c cpm-serial.c \
-   uartlite.c mpc52xx-psc.c
+   uartlite.c mpc52xx-psc.c opal.c opal-calls.S
 src-wlib-$(CONFIG_40x) += 4xx.c planetcore.c
 src-wlib-$(CONFIG_44x) += 4xx.c ebony.c bamboo.c
 src-wlib-$(CONFIG_8xx) += mpc8xx.c planetcore.c fsl-soc.c
diff --git a/arch/powerpc/boot/opal-calls.S b/arch/powerpc/boot/opal-calls.S
new file mode 100644
index ..ff2f1b97bc53
--- /dev/null
+++ b/arch/powerpc/boot/opal-calls.S
@@ -0,0 +1,58 @@
+/*
+ * Copyright (c) 2016 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "ppc_asm.h"
+#include "../include/asm/opal-api.h"
+
+   .text
+
+#define OPAL_CALL(name, token) \
+   .globl name;\
+name:  \
+   li  r0, token;  \
+   b   opal_call;
+
+opal_call:
+   mflrr11
+   std r11,16(r1)
+   mfcrr12
+   stw r12,8(r1)
+   mr  r13,r2
+
+   /* Set opal return address */
+   ld  r11,opal_return@got(r2)
+   mtlrr11
+   mfmsr   r12
+
+   /* switch to BE when we enter OPAL */
+   li  r11,MSR_LE
+   andcr12,r12,r11
+   mtspr   SPRN_HSRR1,r12
+
+   /* load the opal call entry point and base */
+   ld  r11,opal@got(r2)
+   ld  r12,8(r11)
+   ld  r2,0(r11)
+   mtspr   SPRN_HSRR0,r12
+   hrfid
+
+opal_return:
+   FIXUP_ENDIAN
+   mr  r2,r13;
+   lwz r11,8(r1);
+   ld  r12,16(r1)
+   mtcrr11;
+   mtlrr12
+   blr
+
+OPAL_CALL(opal_console_write,  OPAL_CONSOLE_WRITE);
+OPAL_CALL(opal_console_read,   OPAL_CONSOLE_READ);
+OPAL_CALL(opal_console_write_buffer_space, 
OPAL_CONSOLE_WRITE_BUFFER_SPACE);
+OPAL_CALL(opal_poll_events,OPAL_POLL_EVENTS);
+OPAL_CALL(opal_console_flush,  OPAL_CONSOLE_FLUSH);
diff --git a/arch/powerpc/boot/opal.c b/arch/powerpc/boot/opal.c
new file mode 100644
index ..3a2ce1e1f048
--- /dev/null
+++ b/arch/powerpc/boot/opal.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright (c) 2016 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation;

[PATCH] cpuidle: Fix last_residency division

2016-06-30 Thread Shreyas B. Prabhu

Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than target
residency of the next available idle state, then it would exit thereby
giving chance to the cpuidle governor to re-evaluate and
promote the cpu to a deeper idle state. Therefore whenever snooze exits
due to this timeout, its last_residency will be target_residency of next
deeper state.

commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around last_residency calculation. Specifically, while
converting last_residency value from nanoseconds to microseconds it does
right shift by 10. Due to this, in snooze timeout exit scenarios
last_residency calculated is roughly 2.3% less than target_residency of
next available state. This pattern is picked up get_typical_interval()
in the menu governor and therefore expected_interval in menu_select() is
frequently less than the target_residency of any state but snooze.

Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.

Fix this by using a better approximation for division by 1000.

Reported-by: Anton Blanchard 
Bisected-by: Shilpasri G Bhat 
Suggested-by David Laight 
Signed-off-by: Shreyas B. Prabhu 
---
Changes in v4
=
 - Increasing the threshold upto which approximation can be used.
 - Removed explicit cast. Instead added a comment saying why cast
   is safe.

Changes in v3
=
 - Using approximation suggested by David

Changes in v2
=
 - Fixing it in the cpuidle core code instead of driver code.

 drivers/cpuidle/cpuidle.c | 11 +++
 drivers/cpuidle/cpuidle.h | 38 ++
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..f55ad01 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -174,7 +174,6 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct 
cpuidle_driver *drv,
struct cpuidle_state *target_state = >states[index];
bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
u64 time_start, time_end;
-   s64 diff;
 
/*
 * Tell the time framework to switch to a broadcast timer because our
@@ -218,14 +217,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, 
struct cpuidle_driver *drv,
local_irq_enable();
 
/*
-* local_clock() returns the time in nanosecond, let's shift
-* by 10 (divide by 1024) to have microsecond based time.
+* local_clock() returns the time in nanoseconds, convert it to
+* microsecond based time.
 */
-   diff = (time_end - time_start) >> 10;
-   if (diff > INT_MAX)
-   diff = INT_MAX;
-
-   dev->last_residency = (int) diff;
+   dev->last_residency = convert_nsec_to_usec(time_end - time_start);
 
if (entered_state >= 0) {
/* Update cpuidle counters */
diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
index f87f399..a027b35 100644
--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -68,4 +68,42 @@ static inline void cpuidle_coupled_unregister_device(struct 
cpuidle_device *dev)
 }
 #endif
 
+/*
+ * To ensure that there is no overflow while approximation
+ * for dividing val by 1000, we must respect -
+ * val + (val >> 5) <= 0x
+ * val + val/32 <= 0x
+ * val <= (0x * 32) / 33
+ * val <= 0xF83E0F82
+ * Hence the threshold for val below which we can use the
+ * approximation is 0xF83E0F82
+ */
+#define DIV_APPROXIMATION_THRESHOLD 0xF83E0F82UL
+
+/*
+ * Used for calculating last_residency in usec. Optimized for case
+ * where last_residency in nsecs is < DIV_APPROXIMATION_THRESHOLD
+ * Approximated value has less than 1% error.
+ */
+static inline int convert_nsec_to_usec(u64 nsec)
+{
+   if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
+   u32 usec = nsec;
+
+   usec += usec >> 5;
+   usec = usec >> 10;
+
+   /* Can safely cast to int since usec is < INT_MAX */
+   return usec;
+   } else {
+   u64 usec = div_u64(nsec, 1000);
+
+   if (usec > INT_MAX)
+   usec = INT_MAX;
+
+   /* Can safely cast to int since usec is < INT_MAX */
+   return usec;
+   }
+}
+
 #endif /* __DRIVER_CPUIDLE_H */
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3] cpuidle: Fix last_residency division

2016-06-30 Thread Shreyas B Prabhu



On 06/29/2016 08:31 PM, Nicolas Pitre wrote:
> On Wed, 29 Jun 2016, Daniel Lezcano wrote:
> 
>> On 06/29/2016 09:06 AM, Shreyas B. Prabhu wrote:
>>> diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
>>> index f87f399..c8ea5ad 100644
>>> --- a/drivers/cpuidle/cpuidle.h
>>> +++ b/drivers/cpuidle/cpuidle.h
>>> @@ -68,4 +68,27 @@ static inline void
>>> cpuidle_coupled_unregister_device(struct cpuidle_device *dev)
>>>   }
>>>   #endif
>>>
>>> +/*
>>> + * Used for calculating last_residency in usec. Optimized for case
>>> + * where last_residency in nsecs is < INT_MAX/2 by using faster
>>> + * approximation. Approximated value has less than 1% error.
>>> + */
>>> +static inline int convert_nsec_to_usec(u64 nsec)
>>> +{
>>> +   if (likely(nsec < INT_MAX / 2)) {
>>
>> UINT_MAX ?
> 
> Actually this can be better than that.
> 
>>> +   int usec = (int)nsec;
> 
> First, you'll want an unsigned type. Given the provided argument is u64, 
> we can assume there won't be any negative values here.
> 
> Then it would be wise to use a type with an explicit width, like U32.

Cool. I wanted to avoid multiple casts. i.e u64 -> u32 -> int. But I
guess there is no real need to avoid it.

Sending v4 with your suggestions.

Thanks,
Shreyas

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/2] cxl: Fix bug where AFU disable operation had no effect

2016-06-30 Thread Frederic Barrat


Hi Ian,


-static int afu_control(struct cxl_afu *afu, u64 command,
+static int afu_control(struct cxl_afu *afu, u64 command, u64 clear,
   u64 result, u64 mask, bool enabled)


I'm not a big fan of the new "clear" argument, which forces us to pass 
an extra 0 most of the time. Why not always clearing the "action" bits 
of the register before applying the command? They are mutually 
exclusive, so it should work. I.e. :


+   AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+   AFU_Cntl &= ~(CXL_AFU_Cntl_An_E | CXL_AFU_Cntl_An_RA);
+   AFU_Cntl |= command;





@@ -595,7 +597,16 @@ static int deactivate_afu_directed(struct cxl_afu *afu)
cxl_sysfs_afu_m_remove(afu);
cxl_chardev_afu_remove(afu);

-   cxl_ops->afu_reset(afu);
+   if (afu->adapter->native->sl_ops->needs_reset_before_disable) {
+   /*
+* XXX: We may be able to do away with this entirely - it is
+* possible that this was only ever needed due to a bug where
+* the disable operation did not clear the enable bit, however
+* I will only consider dropping this after more regression
+* testing on earlier PSL images.
+*/
+   cxl_ops->afu_reset(afu);
+   }
cxl_afu_disable(afu);
cxl_psl_purge(afu);

@@ -735,7 +746,16 @@ static int native_attach_process(struct cxl_context *ctx, 
bool kernel,

  static inline int detach_process_native_dedicated(struct cxl_context *ctx)
  {
-   cxl_ops->afu_reset(ctx->afu);
+   if (ctx->afu->adapter->native->sl_ops->needs_reset_before_disable) {
+   /*
+* XXX: We may be able to do away with this entirely - it is
+* possible that this was only ever needed due to a bug where
+* the disable operation did not clear the enable bit, however
+* I will only consider dropping this after more regression
+* testing on earlier PSL images.
+*/
+   cxl_ops->afu_reset(ctx->afu);
+   }


For dedicated mode, the CAIA recommends an explicit reset of the AFU 
(section 2.1.1).


For directed mode, CAIA says it's AFU-specific. So for xsl, we only 
disable the afu and purge the xsl. Are we getting rid of the reset 
because it's useless in that environment, or because it times out? If 
just because of timeout, would it make sense to switch the order and 
disable first, then reset? I don't see any afu-reset on the next activation.


  Fred

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible parent clocks

2016-06-30 Thread Rafael J. Wysocki

On Thursday, June 30, 2016 05:46:42 AM Scott Wood wrote:
> On 06/29/2016 10:02 PM, Yuantian Tang wrote:
> >> -Original Message-
> >> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> >> Sent: Thursday, June 30, 2016 10:24 AM
> >> To: Yuantian Tang 
> >> Cc: Scott Wood ; Russell King ;
> >> Michael Turquette ; Stephen Boyd
> >> ; Viresh Kumar ; linux-
> >> c...@vger.kernel.org; linux...@vger.kernel.org; linuxppc-
> >> d...@lists.ozlabs.org; Yang-Leo Li ; Xiaofeng Ren
> >> ; Scott Wood 
> >> Subject: Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible
> >> parent clocks
> >>
> >> On Thursday, June 30, 2016 01:47:09 AM Yuantian Tang wrote:
>  -Original Message-
>  From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
>  Sent: Thursday, June 30, 2016 9:47 AM
>  To: Yuantian Tang 
>  Cc: Scott Wood ; Russell King
>  ; Michael Turquette
>  ; Stephen Boyd ;
>  Viresh Kumar ; linux- c...@vger.kernel.org;
>  linux...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Yang-Leo
>  Li ; Xiaofeng Ren ; Scott
>  Wood 
>  Subject: Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering
>  possible parent clocks
> 
>  On Wednesday, June 29, 2016 05:50:26 AM Yuantian Tang wrote:
> > Hi,
> >
> > This patch is acked by clock maintainer. If no comments from
> > anyone else,
>  we will merge it in next week.
> 
>  There is a cpufreq commit depending on it.  Are you going to handle
>  that one too?
> 
> >>> That one has been acked by cpufreq maintainer. You can get this from
> >> patch comments.
> >>
> >> I know that it has been ACKed.
> >>
> >> My question is whether or not you are going to apply it along the [1/2].
> >>
> >> If not, it will have to be deferred until the [1/2] is merged and then 
> >> applied
> >> which may not be desirable.
> >>
> > I hope we can apply both at same time. Seems Scott has a few concerns.
> > 
> > What you think about this patch? Can you apply it?
> > If you have applied this patch, then I can push CPUfreq maintainer to apply 
> > another one which will be delayed.
> 
> My only concern was getting an ack for this patch (1/2) -- did I miss it
> somewhere?

OK, so who's going to apply the series?

Thanks,
Rafael

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [V2] powerpc/fadump: trivial fix of spelling mistake, clean up message

2016-06-30 Thread Michael Ellerman

On Mon, 2016-27-06 at 11:07:41 UTC, Colin King wrote:
> From: Colin Ian King 
> 
> Fix trivial spelling mistake "rgistration". Also use pr_err
> instead of printk and unsplit the string to keep it all on one
> line.
> 
> Signed-off-by: Colin Ian King 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/4a03749f140cbee6fee66b674b

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: powerpc/powernv: spelling mistake: "Retrived" -> "Retrieved"

2016-06-30 Thread Michael Ellerman

On Thu, 2016-23-06 at 17:05:56 UTC, Colin King wrote:
> From: Colin Ian King 
> 
> trivial fix to spelling mistake in pr_debug message
> 
> Signed-off-by: Colin Ian King 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6e8a9279a85abd07d05e932284

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [V5, 3/3] powerpc/drivers: Add driver for operator panel on FSP machines

2016-06-30 Thread Michael Ellerman

On Wed, 2016-29-06 at 03:38:39 UTC, Suraj Jitindar Singh wrote:
> Implement new character device driver to allow access from user space
> to the operator panel display present on IBM Power Systems machines
> with FSPs.
> 
> This will allow status information to be presented on the display which
> is visible to a user.
> 
> The driver implements a character buffer which a user can read/write
> by accessing the device (/dev/op_panel). This buffer is then displayed on
> the operator panel display. Any attempt to write past the last character
> position will have no effect and attempts to write more characters than
> the size of the display will be truncated. The device may only be accessed
> by a single process at a time.
> 
> Signed-off-by: Suraj Jitindar Singh 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/43a1dd9b5fc64184e578ac1570

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [V5, 2/3] powerpc/opal: Add inline function to get rc from an ASYNC_COMP opal_msg

2016-06-30 Thread Michael Ellerman

On Wed, 2016-29-06 at 03:38:38 UTC, Suraj Jitindar Singh wrote:
> An opal_msg of type OPAL_MSG_ASYNC_COMP contains the return code in the
> params[1] struct member. However this isn't intuitive or obvious when
> reading the code and requires that a user look at the skiboot
> documentation or opal-api.h to verify this.
> 
> Add an inline function to get the return code from an opal_msg and update
> call sites accordingly.
> 
> Signed-off-by: Suraj Jitindar Singh 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d0226d315dba5e401a124b394a

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RESEND, v7, 2/2] cxl: Add set and get private data to context struct

2016-06-30 Thread Michael Ellerman

On Fri, 2016-24-06 at 06:47:07 UTC, Philippe Bergheaud wrote:
> From: Michael Neuling 
> 
> This provides AFU drivers a means to associate private data with a cxl
> context. This is particularly intended for make the new callbacks for
> driver specific events easier for AFU drivers to use, as they can easily
> get back to any private data structures they may use.
> 
> Signed-off-by: Michael Neuling 
> Signed-off-by: Ian Munsie 
> Signed-off-by: Philippe Bergheaud  Reviewed-by: Matthew R. Ochs 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ad42de859ff14c079e966e61cb

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [V5, 1/3] devicetree/bindings: Add binding for operator panel on FSP machines

2016-06-30 Thread Michael Ellerman

On Wed, 2016-29-06 at 03:38:37 UTC, Suraj Jitindar Singh wrote:
> Add a binding to Documentation/devicetree/bindings/powerpc/opal
> (oppanel-opal.txt) for the operator panel which is present on IBM
> Power Systems machines with FSPs.
> 
> Signed-off-by: Suraj Jitindar Singh 
> Acked-by: Rob Herring 
> Acked-by: Stewart Smith 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1ae88fd54c3ac31f68f91e37f7

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v7, 1/2] cxl: Add mechanism for delivering AFU driver specific events

2016-06-30 Thread Michael Ellerman

On Thu, 2016-23-06 at 13:03:53 UTC, Philippe Bergheaud wrote:
> This adds an afu_driver_ops structure with fetch_event() and
> event_delivered() callbacks. An AFU driver such as cxlflash can fill
> this out and associate it with a context to enable passing custom AFU
> specific events to userspace.
...
> 
> Signed-off-by: Philippe Bergheaud 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b810253bd9342f863a86ec7dff

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] cxl: Ignore CAPI adapters misplaced in switched slots

2016-06-30 Thread Philippe Bergheaud

One should not attempt to switch a PHB into CAPI mode if there is
a switch between the PHB and the adapter. This patch modifies the
cxl driver to ignore CAPI adapters misplaced in switched slots.

Signed-off-by: Philippe Bergheaud 
---
This patch fixes Bz 142217.

 drivers/misc/cxl/pci.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index a08fcc8..2f978ed 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1280,6 +1280,30 @@ static void cxl_pci_remove_adapter(struct cxl *adapter)
device_unregister(>dev);
 }
 
+#define CXL_MAX_PCIEX_PARENT 2
+
+static int cxl_slot_is_switched(struct pci_dev *dev)
+{
+   struct device_node *np;
+   int depth = 0;
+   const __be32 *prop;
+
+   if (!(np = pci_device_to_OF_node(dev))) {
+   pr_err("cxl: np = NULL\n");
+   return -ENODEV;
+   }
+   of_node_get(np);
+   while (np) {
+   np = of_get_next_parent(np);
+   prop = of_get_property(np, "device_type", NULL);
+   if (!prop || strcmp((char *)prop, "pciex"))
+   break;
+   depth++;
+   }
+   of_node_put(np);
+   return (depth > CXL_MAX_PCIEX_PARENT);
+}
+
 static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
struct cxl *adapter;
@@ -1291,6 +1315,11 @@ static int cxl_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
return -ENODEV;
}
 
+   if (cxl_slot_is_switched(dev)) {
+   dev_dbg(>dev, "cxl_init_adapter: Ignoring switched slot 
device\n");
+   return -ENODEV;
+   }
+
if (cxl_verbose)
dump_cxl_config_space(dev);
 
-- 
2.8.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] cxl: Workaround XSL bug that does not clear the RA bit after a reset

2016-06-30 Thread Frederic Barrat




Le 29/06/2016 20:51, Ian Munsie a écrit :

From: Ian Munsie 

An issue was noted in our debug logs where the XSL would leave the RA
bit asserted after an AFU reset operation, which would effectively
prevent further AFU reset operations from working.

Workaround the issue by clearing the RA bit with an MMIO write if it is
still asserted after any AFU control operation.

Signed-off-by: Ian Munsie 



OK, should be safe and valid even on non-xsl hardware.

Reviewed-by: Frederic Barrat 


  Fred

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 7/7] PCI: Add a macro to set default alignment for all PCI devices

2016-06-30 Thread Yongji Xie

Now we can use something like "pci=resource_alignment=*:*:*.*:noresize"
to enforce the alignment of all MMIO BARs to be at least PAGE_SIZE so
that we can passthrough sub-page(size < PAGE_SIZE) BARs to guest in
VFIO module.

But sometimes we may want to enable this feature by default on
some platforms such as PowerNV platform which would easily see those
sub-page BARs because of its 64K page. To achieve that, we add a
macro PCIBIOS_DEFAULT_ALIGNMENT to set default alignment for all
PCI devices and define it on PowerNV platform.

Signed-off-by: Yongji Xie 
---
 arch/powerpc/include/asm/pci.h |4 
 drivers/pci/pci.c  |4 
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index a6f3ac0..b0d76b8 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -28,6 +28,10 @@
 #define PCIBIOS_MIN_IO 0x1000
 #define PCIBIOS_MIN_MEM0x1000
 
+#ifdef CONFIG_PPC_POWERNV
+#define PCIBIOS_DEFAULT_ALIGNMENT  PAGE_SIZE
+#endif
+
 struct pci_dev;
 
 /* Values for the `which' argument to sys_pciconfig_iobase syscall.  */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2e15ac8..dde0cce 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4761,6 +4761,10 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
char *p;
bool invalid = false;
 
+#ifdef PCIBIOS_DEFAULT_ALIGNMENT
+   align = PCIBIOS_DEFAULT_ALIGNMENT;
+   *resize = false;
+#endif
spin_lock(_alignment_lock);
p = resource_alignment_param;
if (pci_has_flag(PCI_PROBE_ONLY)) {
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 2/7] PCI: Ignore enforced alignment to VF BARs

2016-06-30 Thread Yongji Xie

VF BARs are read-only zeroes according to SRIOV spec,
the normal way(writing BARs) of allocating resources wouldn't
be applied to VFs. The VFs' resources would be allocated
when we enable SR-IOV capability. So we should not try to
reassign alignment after we enable VFs. It's meaningless
and will release the allocated resources which leads to a bug.

Signed-off-by: Yongji Xie 
---
 drivers/pci/pci.c |4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index be8f72c..6ae02de 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4822,6 +4822,10 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
resource_size_t align, size;
u16 command;
 
+   /* We should never try to reassign VF's alignment */
+   if (dev->is_virtfn)
+   return;
+
/* check if specified PCI is target device to reassign */
align = pci_specified_resource_alignment(dev);
if (!align)
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 6/7] PCI: Add support for enforcing all MMIO BARs to be page aligned

2016-06-30 Thread Yongji Xie

When vfio passthrough a PCI device of which MMIO BARs are
smaller than PAGE_SIZE, guest will not handle the mmio
accesses to the BARs which leads to mmio emulations in host.

This is because vfio will not allow to passthrough one BAR's
mmio page which may be shared with other BARs. Otherwise,
there will be a backdoor that guest can use to access BARs
of other guest.

To solve this issue, this patch modifies resource_alignment
to support syntax where multiple devices get the same
alignment. So we can use something like
"pci=resource_alignment=*:*:*.*:noresize" to enforce the
alignment of all MMIO BARs to be at least PAGE_SIZE so that
one BAR's mmio page would not be shared with other BARs.

Signed-off-by: Yongji Xie 
---
 Documentation/kernel-parameters.txt |2 ++
 drivers/pci/pci.c   |   60 ---
 2 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index c4802f5..cb09503 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3003,6 +3003,8 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
aligned memory resources.
If  is not specified,
PAGE_SIZE is used as alignment.
+   , ,  and  can be set to
+   "*" which means match all values.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
noresize: Don't change the resources' sizes when
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1d80a94..2e15ac8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4759,6 +4759,7 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
int seg, bus, slot, func, align_order, count;
resource_size_t align = 0;
char *p;
+   bool invalid = false;
 
spin_lock(_alignment_lock);
p = resource_alignment_param;
@@ -4777,16 +4778,49 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
} else {
align_order = -1;
}
-   if (sscanf(p, "%x:%x:%x.%x%n",
-   , , , , ) != 4) {
+   if (p[0] == '*' && p[1] == ':') {
+   seg = -1;
+   count = 1;
+   } else if (sscanf(p, "%x%n", , ) != 1 ||
+   p[count] != ':') {
+   invalid = true;
+   break;
+   }
+   p += count + 1;
+   if (*p == '*') {
+   bus = -1;
+   count = 1;
+   } else if (sscanf(p, "%x%n", , ) != 1) {
+   invalid = true;
+   break;
+   }
+   p += count;
+   if (*p == '.') {
+   slot = bus;
+   bus = seg;
seg = 0;
-   if (sscanf(p, "%x:%x.%x%n",
-   , , , ) != 3) {
-   /* Invalid format */
-   printk(KERN_ERR "PCI: Can't parse 
resource_alignment parameter: %s\n",
-   p);
+   p++;
+   } else if (*p == ':') {
+   p++;
+   if (p[0] == '*' && p[1] == '.') {
+   slot = -1;
+   count = 1;
+   } else if (sscanf(p, "%x%n", , ) != 1 ||
+   p[count] != '.') {
+   invalid = true;
break;
}
+   p += count + 1;
+   } else {
+   invalid = true;
+   break;
+   }
+   if (*p == '*') {
+   func = -1;
+   count = 1;
+   } else if (sscanf(p, "%x%n", , ) != 1) {
+   invalid = true;
+   break;
}
p += count;
if (!strncmp(p, ":noresize", 9)) {
@@ -4794,10 +4828,10 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
p += 9;
} else
*resize = true;
-   if (seg == pci_domain_nr(dev->bus) &&
-   bus == dev->bus->number &&
-   slot == PCI_SLOT(dev->devfn) &&
-   func == PCI_FUNC(dev->devfn)) {
+   if ((seg == pci_domain_nr(dev->bus) || seg == -1) &&
+

[PATCH v3 5/7] PCI: Do not use IORESOURCE_STARTALIGN to identify bridge resources

2016-06-30 Thread Yongji Xie

Now we use the IORESOURCE_STARTALIGN to identify bridge resources
in __assign_resources_sorted(). That's quite fragile. We may also
set flag IORESOURCE_STARTALIGN for SR-IOV resources in some cases,
for example, using the option "noresize" of parameter
"pci=resource_alignment".

In this patch, we try to use a more robust way to identify
bridge resources.

Signed-off-by: Yongji Xie 
---
 drivers/pci/setup-bus.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 55641a3..216ddbc 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -390,6 +390,7 @@ static void __assign_resources_sorted(struct list_head 
*head,
struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
unsigned long fail_type;
resource_size_t add_align, align;
+   int index;
 
/* Check if optional add_size is there */
if (!realloc_head || list_empty(realloc_head))
@@ -410,11 +411,13 @@ static void __assign_resources_sorted(struct list_head 
*head,
 
/*
 * There are two kinds of additional resources in the list:
-* 1. bridge resource  -- IORESOURCE_STARTALIGN
-* 2. SR-IOV resource   -- IORESOURCE_SIZEALIGN
+* 1. bridge resource
+* 2. SR-IOV resource
 * Here just fix the additional alignment for bridge
 */
-   if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
+   index = dev_res->res - dev_res->dev->resource;
+   if (index < PCI_BRIDGE_RESOURCES ||
+   index > PCI_BRIDGE_RESOURCE_END)
continue;
 
add_align = get_res_add_align(realloc_head, dev_res->res);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 3/7] PCI: Do not disable memory decoding in pci_reassigndev_resource_alignment()

2016-06-30 Thread Yongji Xie

We should not disable memory decoding when we reassign alignment
in pci_reassigndev_resource_alignment(). It's meaningless and
have some side effect. For example, some fixup functions such as
quirk_e100_interrupt() read PCI_COMMAND_MEMORY to know whether
the devices has been initialized by the firmware or not. If we
disable memory decoding here, these functions will get a wrong
information that the devices was not initialized by the firmware
which may cause a wrong fixup. Besides, disabling memory decoding
may also break some devices that need to have memory decoding
always-on during probing.

Signed-off-by: Yongji Xie 
---
 drivers/pci/pci.c |8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6ae02de..6241cfc 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4820,7 +4820,6 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
int i;
struct resource *r;
resource_size_t align, size;
-   u16 command;
 
/* We should never try to reassign VF's alignment */
if (dev->is_virtfn)
@@ -4838,12 +4837,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
return;
}
 
-   dev_info(>dev,
-   "Disabling memory decoding and releasing memory resources.\n");
-   pci_read_config_word(dev, PCI_COMMAND, );
-   command &= ~PCI_COMMAND_MEMORY;
-   pci_write_config_word(dev, PCI_COMMAND, command);
-
+   dev_info(>dev, "Releasing memory resources.\n");
for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
r = >resource[i];
if (!(r->flags & IORESOURCE_MEM))
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 4/7] PCI: Add a new option for resource_alignment to reassign alignment

2016-06-30 Thread Yongji Xie

When using resource_alignment kernel parameter, the current
implement reassigns the alignment by changing resources' size
which can potentially break some drivers. For example, the driver
uses the size to locate some register whose length is related
to the size.

This patch adds a new option "noresize" for the parameter to
solve this problem.

Signed-off-by: Yongji Xie 
---
 Documentation/kernel-parameters.txt |5 -
 drivers/pci/pci.c   |   35 +--
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 82b42c9..c4802f5 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2997,13 +2997,16 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
window. The default value is 64 megabytes.
resource_alignment=
Format:
-   [@][:]:.[; ...]
+   [@][:]:.
+   [:noresize][; ...]
Specifies alignment and device to reassign
aligned memory resources.
If  is not specified,
PAGE_SIZE is used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
+   noresize: Don't change the resources' sizes when
+   reassigning alignment.
ecrc=   Enable/disable PCIe ECRC (transaction layer
end-to-end CRC checking).
bios: Use BIOS/firmware settings. This is the
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6241cfc..1d80a94 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4748,11 +4748,13 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
 /**
  * pci_specified_resource_alignment - get resource alignment specified by user.
  * @dev: the PCI device to get
+ * @resize: whether or not to change resources' size when reassigning alignment
  *
  * RETURNS: Resource alignment if it is specified.
  *  Zero if it is not specified.
  */
-static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
+static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
+   bool *resize)
 {
int seg, bus, slot, func, align_order, count;
resource_size_t align = 0;
@@ -4787,6 +4789,11 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev)
}
}
p += count;
+   if (!strncmp(p, ":noresize", 9)) {
+   *resize = false;
+   p += 9;
+   } else
+   *resize = true;
if (seg == pci_domain_nr(dev->bus) &&
bus == dev->bus->number &&
slot == PCI_SLOT(dev->devfn) &&
@@ -4819,6 +4826,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
 {
int i;
struct resource *r;
+   bool resize = true;
resource_size_t align, size;
 
/* We should never try to reassign VF's alignment */
@@ -4826,7 +4834,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
return;
 
/* check if specified PCI is target device to reassign */
-   align = pci_specified_resource_alignment(dev);
+   align = pci_specified_resource_alignment(dev, );
if (!align)
return;
 
@@ -4848,15 +4856,22 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
continue;
}
size = resource_size(r);
-   if (size < align) {
-   size = align;
-   dev_info(>dev,
-   "Rounding up size of resource #%d to %#llx.\n",
-   i, (unsigned long long)size);
+   if (resize) {
+   if (size < align) {
+   size = align;
+   dev_info(>dev,
+   "Rounding up size of resource #%d to 
%#llx.\n",
+   i, (unsigned long long)size);
+   }
+   r->flags |= IORESOURCE_UNSET;
+   r->end = size - 1;
+   r->start = 0;
+   } else {
+   r->flags &= ~IORESOURCE_SIZEALIGN;
+   r->flags |= IORESOURCE_STARTALIGN | IORESOURCE_UNSET;
+   r->start = max(align, size);
+   r->end = r->start + size - 1;

[PATCH v3 1/7] PCI: Ignore enforced alignment when kernel uses existing firmware setup

2016-06-30 Thread Yongji Xie

PCI resources allocator will use firmware setup and not try to
reassign resource when PCI_PROBE_ONLY or IORESOURCE_PCI_FIXED
is set.

The enforced alignment in pci_reassigndev_resource_alignment()
should be ignored in this case. Otherwise, some PCI devices'
resources would be released here and not re-allocated.

Signed-off-by: Yongji Xie 
---
 drivers/pci/pci.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c8b4dbd..be8f72c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4760,6 +4760,13 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev)
 
spin_lock(_alignment_lock);
p = resource_alignment_param;
+   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (*p)
+   printk_once(KERN_INFO "PCI: Ignore resource_alignment 
parameter: %s with PCI_PROBE_ONLY set\n",
+   p);
+   spin_unlock(_alignment_lock);
+   return 0;
+   }
while (*p) {
count = 0;
if (sscanf(p, "%d%n", _order, ) == 1 &&
@@ -4837,6 +4844,11 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
r = >resource[i];
if (!(r->flags & IORESOURCE_MEM))
continue;
+   if (r->flags & IORESOURCE_PCI_FIXED) {
+   dev_info(>dev, "No alignment for fixed BAR%d: 
%pR\n",
+   i, r);
+   continue;
+   }
size = resource_size(r);
if (size < align) {
size = align;
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 0/7] PCI: Add support for enforcing all MMIO BARs not to share PAGE_SIZE

2016-06-30 Thread Yongji Xie

This series aims to add an option for PCI resource allocator to  
force BARs not to share PAGE_SIZE. This would make sense to VFIO 
driver. Because current VFIO implementation disallows to mmap 
sub-page(size < PAGE_SIZE) MMIO BARs which may share the same page 
with other BARs for security reasons. Thus, we have to handle mmio 
access to these BARs in QEMU emulation rather than in guest which 
will cause some performance loss.

To achieve that, we would like to make use of the existing 
resource_alignment kernel parameter and force a minimum alignment 
of PAGE_SIZE. It's flexible. And we can enable it by default on
some archs which may easily hit the performance issue because of  
their 64K page.

In this series, patch 1,2,3 fixed bugs of using resource_alignment;
patch 4,5 tried to add a new option for resource_alignment to use 
IORESOURCE_STARTALIGN to specify the alignment of PCI BARs; patch 6
modified resource_alignment to support syntax which can be used to
enforce the alignment of all MMIO BARs to be at least PAGE_SIZE;
patch 7 enabled this feature by default on PowerNV platform.

Changelog v3:
- Ignore enforced alignment to fixed BARs
- Fix issue that disabling memory decoding when reassigning the alignment
- Only enable default alignment on PowerNV platform

Changelog v2:
- Ignore enforced alignment to VF BARs on pci_reassigndev_resource_alignment()

Yongji Xie (7):
  PCI: Ignore enforced alignment when kernel uses existing firmware setup
  PCI: Ignore enforced alignment to VF BARs
  PCI: Do not disable memory decoding in pci_reassigndev_resource_alignment()
  PCI: Add a new option for resource_alignment to reassign alignment
  PCI: Do not use IORESOURCE_STARTALIGN to identify bridge resources
  PCI: Add support for enforcing all MMIO BARs to be page aligned
  PCI: Add a macro to set default alignment for all PCI devices

 Documentation/kernel-parameters.txt |7 +-
 arch/powerpc/include/asm/pci.h  |4 ++
 drivers/pci/pci.c   |  123 +++
 drivers/pci/setup-bus.c |9 ++-
 4 files changed, 111 insertions(+), 32 deletions(-)

-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCHv2 0/7] eBPF JIT for PPC64

2016-06-30 Thread Naveen N. Rao

On 2016/06/30 12:34PM, Denis Kirjanov wrote:
> On 6/30/16, Andreas Ziegler <andreas.zieg...@fau.de> wrote:
> > Hi Naveen,
> >
> > this patchset makes a change to arch/powerpc/net/Makefile in order to only
> > compile the previously existing bpf_jit_comp.c if !CONFIG_PPC64, and use
> > bpf_jit_comp64.c if CONFIG_PPC64 is enabled.
> >
> > Inside arch/powerpc/net/bpf_jit_comp.c, however, there is still an #ifdef
> > CONFIG_PPC64 block at line 667 (linux-next of today, i.e., next-20160630):
> >
> >   #ifdef CONFIG_PPC64
> >   /* Function descriptor nastiness: Address + TOC */
> >   ((u64 *)image)[0] = (u64)code_base;
> >   ((u64 *)image)[1] = local_paca->kernel_toc;
> >   #endif
> >
> > From my understanding of the code, this #ifdef can now be removed, as there
> > is
> > no way the file could be compiled with CONFIG_PPC64 enabled. Is this
> > correct?
> 
> That was used for running classic BPF on ppc64. With eBPF on ppc64 the
> whole block can be removed.

Yes, that can be removed and so can the many 64-bit related macros in 
bpf_jit32.h. The reason I didn't remove those just yet was so that we 
could easily try out the classic BPF JIT for ppc64 BE, for 
testing/performance comparison and so on. I think it would be good to 
retain this for one kernel release.

- Naveen

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 3/3] powerpc: tm: Enable transactional memory (TM) lazily for userspace

2016-06-30 Thread Laurent Dufour

On 29/06/2016 08:34, Cyril Bur wrote:
> Currently the MSR TM bit is always set if the hardware is TM capable.
> This adds extra overhead as it means the TM SPRS (TFHAR, TEXASR and
> TFAIR) must be swapped for each process regardless of if they use TM.
> 
> For processes that don't use TM the TM MSR bit can be turned off
> allowing the kernel to avoid the expensive swap of the TM registers.
> 
> A TM unavailable exception will occur if a thread does use TM and the
> kernel will enable MSR_TM and leave it so for some time afterwards.
> 
> Signed-off-by: Cyril Bur 
> ---
>  arch/powerpc/include/asm/processor.h |  1 +
>  arch/powerpc/kernel/process.c| 30 ++
>  arch/powerpc/kernel/traps.c  |  8 
>  3 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/processor.h 
> b/arch/powerpc/include/asm/processor.h
> index 5ff1e4c..9d4363c 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -257,6 +257,7 @@ struct thread_struct {
>   int used_spe;   /* set if process has used spe */
>  #endif /* CONFIG_SPE */
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> + u8  load_tm;
>   u64 tm_tfhar;   /* Transaction fail handler addr */
>   u64 tm_texasr;  /* Transaction exception & summary */
>   u64 tm_tfiar;   /* Transaction fail instr address reg */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 2e903c6..8abecda 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -870,6 +870,9 @@ void tm_recheckpoint(struct thread_struct *thread,
>  {
>   unsigned long flags;
>  
> + if (!(thread->regs->msr & MSR_TM))
> + return;
> +
>   /* We really can't be interrupted here as the TEXASR registers can't
>* change and later in the trecheckpoint code, we have a userspace R1.
>* So let's hard disable over this region.
> @@ -905,6 +908,9 @@ static inline void tm_recheckpoint_new_task(struct 
> task_struct *new)
>   if (!new->thread.regs)
>   return;
>  
> + if (!(new->thread.regs->msr & MSR_TM))
> + return;
> +
>   if (!MSR_TM_ACTIVE(new->thread.regs->msr)){
>   tm_restore_sprs(>thread);
>   return;
> @@ -925,11 +931,18 @@ static inline void tm_recheckpoint_new_task(struct 
> task_struct *new)
>new->pid, mfmsr());
>  }
>  
> -static inline void __switch_to_tm(struct task_struct *prev)
> +static inline void __switch_to_tm(struct task_struct *prev, struct 
> task_struct *new)
>  {
>   if (cpu_has_feature(CPU_FTR_TM)) {
> - tm_enable();
> - tm_reclaim_task(prev);
> + if (prev->thread.regs && (prev->thread.regs->msr & MSR_TM)) {
> + prev->thread.load_tm++;
> + tm_enable();
> + tm_reclaim_task(prev);
> + if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && 
> prev->thread.load_tm == 0)
> + prev->thread.regs->msr |= ~MSR_TM;

Hi Cyrill,

I guess the idea is to clear MSR_TM here, so why "or-ing" here ?
I'd rather see :
 +  prev->thread.regs->msr &= ~MSR_TM;

Cheers,
Laurent.

> + } else if (new && new->thread.regs && (new->thread.regs->msr & 
> MSR_TM)) {
> + tm_enable();
> + }
>   }
>  }
>  
> @@ -965,7 +978,7 @@ void restore_tm_state(struct pt_regs *regs)
>  
>  #else
>  #define tm_recheckpoint_new_task(new)
> -#define __switch_to_tm(prev)
> +#define __switch_to_tm(prev, new)
>  #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
>  
>  static inline void save_sprs(struct thread_struct *t)
> @@ -1095,7 +1108,7 @@ struct task_struct *__switch_to(struct task_struct 
> *prev,
>   /* Save FPU, Altivec, VSX and SPE state */
>   giveup_all(prev);
>  
> - __switch_to_tm(prev);
> + __switch_to_tm(prev, new);
>  
>   /*
>* We can't take a PMU exception inside _switch() since there is a
> @@ -1340,8 +1353,11 @@ int arch_dup_task_struct(struct task_struct *dst, 
> struct task_struct *src)
>* transitions the CPU out of TM mode.  Hence we need to call
>* tm_recheckpoint_new_task() (on the same task) to restore the
>* checkpointed state back and the TM mode.
> +  *
> +  * Can't pass dst because it isn't ready. Doesn't matter, passing
> +  * dst is only important for __switch_to()
>*/
> - __switch_to_tm(src);
> + __switch_to_tm(src, NULL);
>   tm_recheckpoint_new_task(src);
>  
>   *dst = *src;
> @@ -1574,8 +1590,6 @@ void start_thread(struct pt_regs *regs, unsigned long 
> start, unsigned long sp)
>   current->thread.used_spe = 0;
>  #endif /* CONFIG_SPE */
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> - if (cpu_has_feature(CPU_FTR_TM))
> -

Re: [PATCHv2 0/7] eBPF JIT for PPC64

2016-06-30 Thread Denis Kirjanov

On 6/30/16, Andreas Ziegler <andreas.zieg...@fau.de> wrote:
> Hi Naveen,
>
> this patchset makes a change to arch/powerpc/net/Makefile in order to only
> compile the previously existing bpf_jit_comp.c if !CONFIG_PPC64, and use
> bpf_jit_comp64.c if CONFIG_PPC64 is enabled.
>
> Inside arch/powerpc/net/bpf_jit_comp.c, however, there is still an #ifdef
> CONFIG_PPC64 block at line 667 (linux-next of today, i.e., next-20160630):
>
>   #ifdef CONFIG_PPC64
>   /* Function descriptor nastiness: Address + TOC */
>   ((u64 *)image)[0] = (u64)code_base;
>   ((u64 *)image)[1] = local_paca->kernel_toc;
>   #endif
>
> From my understanding of the code, this #ifdef can now be removed, as there
> is
> no way the file could be compiled with CONFIG_PPC64 enabled. Is this
> correct?

That was used for running classic BPF on ppc64. With eBPF on ppc64 the
whole block can be removed.

>
> Best regards,
>
> Andreas
> ___
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCHv2 0/7] eBPF JIT for PPC64

2016-06-30 Thread Andreas Ziegler

Hi Naveen,

this patchset makes a change to arch/powerpc/net/Makefile in order to only
compile the previously existing bpf_jit_comp.c if !CONFIG_PPC64, and use
bpf_jit_comp64.c if CONFIG_PPC64 is enabled.

Inside arch/powerpc/net/bpf_jit_comp.c, however, there is still an #ifdef
CONFIG_PPC64 block at line 667 (linux-next of today, i.e., next-20160630):

  #ifdef CONFIG_PPC64
  /* Function descriptor nastiness: Address + TOC */
  ((u64 *)image)[0] = (u64)code_base;
  ((u64 *)image)[1] = local_paca->kernel_toc;
  #endif

From my understanding of the code, this #ifdef can now be removed, as there is
no way the file could be compiled with CONFIG_PPC64 enabled. Is this correct?

Best regards,

Andreas
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 RFC] pasemi: Fix boot failure on 4.7-rc1

2016-06-30 Thread Aneesh Kumar K.V

Olof Johansson  writes:

> Hi,
>
> On Wed, Jun 29, 2016 at 1:06 PM, Darren Stevens  
> wrote:
>>
>> Commit:d6a9996e84ac4beb7713e9485f4563e100a9b03e (powerpc/mm:
>> vmalloc abstraction in preparation for radix) turned kernel memory
>> and IO addresses from #defined constants to variables initialised
>> at runtime.
>>
>> On PA6T systems the setup_arch machine call initialises the onboard
>> PCI-e root-ports, and uses pci_io_base to do this, which is now before
>> its value has been set resulting in a panic right after 'booting
>> linux via __start()'
>
> I don't see the panic here on a Chitra, at least not on recent -next.
>
> What config are you building with when you see this?
>
> Please cc me on PA Semi related patches if you want my attention. I
> still have hardware, in fact I boot every -next release on it through
> automation.
>

You can find more details in the below thread.

https://lkml.kernel.org/r/8b4c4ab7-5c17-4992-935a-361153472...@xenosoft.de


-aneesh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v5 43/44] dma-mapping: Remove dma_get_attr

2016-06-30 Thread Krzysztof Kozlowski

After switching DMA attributes to unsigned long it is easier to just
compare the bits.

Signed-off-by: Krzysztof Kozlowski 
[for avr32]
Acked-by: Hans-Christian Noren Egtvedt 
[for arc]
Acked-by: Vineet Gupta 
[for arm64 and dma-iommu]
Acked-by: Robin Murphy 
---
 Documentation/DMA-API.txt  |  4 +--
 arch/arc/mm/dma.c  |  4 +--
 arch/arm/mm/dma-mapping.c  | 36 --
 arch/arm/xen/mm.c  |  4 +--
 arch/arm64/mm/dma-mapping.c| 10 +++
 arch/avr32/mm/dma-coherent.c   |  4 +--
 arch/ia64/sn/pci/pci_dma.c | 10 ++-
 arch/metag/kernel/dma.c|  2 +-
 arch/mips/mm/dma-default.c |  6 ++---
 arch/openrisc/kernel/dma.c |  4 +--
 arch/parisc/kernel/pci-dma.c   |  2 +-
 arch/powerpc/platforms/cell/iommu.c| 12 -
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c|  2 +-
 drivers/iommu/dma-iommu.c  |  2 +-
 drivers/media/v4l2-core/videobuf2-dma-contig.c |  2 +-
 include/linux/dma-mapping.h| 10 ---
 16 files changed, 47 insertions(+), 67 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 24f9688bb98a..1d26eeb6b5f6 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -422,9 +422,7 @@ void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t 
dma_addr,
 unsigned long attrs)
 {

-   int foo =  dma_get_attr(DMA_ATTR_FOO, attrs);
-   
-   if (foo)
+   if (attrs & DMA_ATTR_FOO)
/* twizzle the frobnozzle */

 
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 3d1f467d1792..74bbe68dce9d 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -46,7 +46,7 @@ static void *arc_dma_alloc(struct device *dev, size_t size,
 *   (vs. always going to memory - thus are faster)
 */
if ((is_isa_arcv2() && ioc_exists) ||
-   dma_get_attr(DMA_ATTR_NON_CONSISTENT, attrs))
+   (attrs & DMA_ATTR_NON_CONSISTENT))
need_coh = 0;
 
/*
@@ -95,7 +95,7 @@ static void arc_dma_free(struct device *dev, size_t size, 
void *vaddr,
struct page *page = virt_to_page(dma_handle);
int is_non_coh = 1;
 
-   is_non_coh = dma_get_attr(DMA_ATTR_NON_CONSISTENT, attrs) ||
+   is_non_coh = (attrs & DMA_ATTR_NON_CONSISTENT) ||
(is_isa_arcv2() && ioc_exists);
 
if (PageHighMem(page) || !is_non_coh)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ebb3fde99043..43e03b5293d0 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -126,7 +126,7 @@ static dma_addr_t arm_dma_map_page(struct device *dev, 
struct page *page,
 unsigned long offset, size_t size, enum dma_data_direction dir,
 unsigned long attrs)
 {
-   if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+   if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
__dma_page_cpu_to_dev(page, offset, size, dir);
return pfn_to_dma(dev, page_to_pfn(page)) + offset;
 }
@@ -155,7 +155,7 @@ static dma_addr_t arm_coherent_dma_map_page(struct device 
*dev, struct page *pag
 static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-   if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+   if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
  handle & ~PAGE_MASK, size, dir);
 }
@@ -622,9 +622,9 @@ static void __free_from_contiguous(struct device *dev, 
struct page *page,
 
 static inline pgprot_t __get_dma_pgprot(unsigned long attrs, pgprot_t prot)
 {
-   prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
-   pgprot_writecombine(prot) :
-   pgprot_dmacoherent(prot);
+   prot = (attrs & DMA_ATTR_WRITE_COMBINE) ?
+   pgprot_writecombine(prot) :
+   pgprot_dmacoherent(prot);
return prot;
 }
 
@@ -744,7 +744,7 @@ static void *__dma_alloc(struct device *dev, size_t size, 
dma_addr_t *handle,
.gfp = gfp,
.prot = prot,
.caller = caller,
-   .want_vaddr = !dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs),
+   .want_vaddr = ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0),
};
 
 #ifdef CONFIG_DMA_API_DEBUG
@@ -887,7 +887,7 @@ static void __arm_dma_free(struct device *dev, size_t size, 
void *cpu_addr,
.size = PAGE_ALIGN(size),
.cpu_addr = cpu_addr,
.page =

[PATCH v5 21/44] powerpc: dma-mapping: Use unsigned long for dma_attrs

2016-06-30 Thread Krzysztof Kozlowski

Split out subsystem specific changes for easier reviews. This will be
squashed with main commit.

Signed-off-by: Krzysztof Kozlowski 
---
 arch/powerpc/include/asm/dma-mapping.h|  7 +++
 arch/powerpc/include/asm/iommu.h  | 10 +-
 arch/powerpc/kernel/dma-iommu.c   | 12 ++--
 arch/powerpc/kernel/dma.c | 18 +-
 arch/powerpc/kernel/ibmebus.c | 12 ++--
 arch/powerpc/kernel/iommu.c   | 12 ++--
 arch/powerpc/kernel/vio.c | 12 ++--
 arch/powerpc/platforms/cell/iommu.c   | 16 
 arch/powerpc/platforms/pasemi/iommu.c |  2 +-
 arch/powerpc/platforms/powernv/npu-dma.c  |  8 
 arch/powerpc/platforms/powernv/pci-ioda.c |  4 ++--
 arch/powerpc/platforms/powernv/pci.c  |  2 +-
 arch/powerpc/platforms/powernv/pci.h  |  2 +-
 arch/powerpc/platforms/ps3/system-bus.c   | 18 +-
 arch/powerpc/platforms/pseries/iommu.c|  6 +++---
 arch/powerpc/sysdev/dart_iommu.c  |  2 +-
 16 files changed, 71 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 77816acd4fd9..84e3f8dd5e4f 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -13,7 +13,6 @@
 /* need struct page definitions */
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -25,14 +24,14 @@
 /* Some dma direct funcs must be visible for use in other dma_ops */
 extern void *__dma_direct_alloc_coherent(struct device *dev, size_t size,
 dma_addr_t *dma_handle, gfp_t flag,
-struct dma_attrs *attrs);
+unsigned long attrs);
 extern void __dma_direct_free_coherent(struct device *dev, size_t size,
   void *vaddr, dma_addr_t dma_handle,
-  struct dma_attrs *attrs);
+  unsigned long attrs);
 extern int dma_direct_mmap_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t handle,
-   size_t size, struct dma_attrs *attrs);
+   size_t size, unsigned long attrs);
 
 #ifdef CONFIG_NOT_COHERENT_CACHE
 /*
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7b87bab09564..760915241ce2 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -53,7 +53,7 @@ struct iommu_table_ops {
long index, long npages,
unsigned long uaddr,
enum dma_data_direction direction,
-   struct dma_attrs *attrs);
+   unsigned long attrs);
 #ifdef CONFIG_IOMMU_API
/*
 * Exchanges existing TCE with new TCE plus direction bits;
@@ -248,12 +248,12 @@ extern int ppc_iommu_map_sg(struct device *dev, struct 
iommu_table *tbl,
struct scatterlist *sglist, int nelems,
unsigned long mask,
enum dma_data_direction direction,
-   struct dma_attrs *attrs);
+   unsigned long attrs);
 extern void ppc_iommu_unmap_sg(struct iommu_table *tbl,
   struct scatterlist *sglist,
   int nelems,
   enum dma_data_direction direction,
-  struct dma_attrs *attrs);
+  unsigned long attrs);
 
 extern void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
  size_t size, dma_addr_t *dma_handle,
@@ -264,10 +264,10 @@ extern dma_addr_t iommu_map_page(struct device *dev, 
struct iommu_table *tbl,
 struct page *page, unsigned long offset,
 size_t size, unsigned long mask,
 enum dma_data_direction direction,
-struct dma_attrs *attrs);
+unsigned long attrs);
 extern void iommu_unmap_page(struct iommu_table *tbl, dma_addr_t dma_handle,
 size_t size, enum dma_data_direction direction,
-struct dma_attrs *attrs);
+unsigned long attrs);
 
 extern void iommu_init_early_pSeries(void);
 extern void iommu_init_early_dart(struct pci_controller_ops *controller_ops);
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 41a7d9d49a5a..fb7cbaa37658 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -18,7 +18,7 @@

[PATCH v5 00/44] dma-mapping: Use unsigned long for dma_attrs

2016-06-30 Thread Krzysztof Kozlowski

Hi,


This is fifth approach for replacing struct dma_attrs with unsigned
long.

The main patch (1/44) doing the change is split into many subpatches
for easier review (2-42).  They should be squashed together when
applying.


Rebased on v4.7-rc5.

For easier testing the patchset is available here:
repo:   https://github.com/krzk/linux
branch: for-next/dma-attrs-const-v5


Changes since v4

1. Collect some acks. Still need more.
2. Minor fixes pointed by Robin Murphy.
3. Applied changes from Bart Van Assche's comment.
4. More tests and builds (using https://www.kernel.org/pub/tools/crosstool/).


Changes since v3

1. Collect some acks.
2. Drop wrong patch 1/45 ("powerpc: dma-mapping: Don't hard-code
   the value of DMA_ATTR_WEAK_ORDERING").
3. Minor fix pointed out by Michael Ellerman.


Changes since v2

1. Follow Christoph Hellwig's comments (don't use BIT add
   documentation, remove dma_get_attr).


Rationale
=
The dma-mapping core and the implementations do not change the
DMA attributes passed by pointer.  Thus the pointer can point to const
data.  However the attributes do not have to be a bitfield. Instead
unsigned long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.


Best regards,
Krzysztof


Krzysztof Kozlowski (44):
  dma-mapping: Use unsigned long for dma_attrs
  alpha: dma-mapping: Use unsigned long for dma_attrs
  arc: dma-mapping: Use unsigned long for dma_attrs
  ARM: dma-mapping: Use unsigned long for dma_attrs
  arm64: dma-mapping: Use unsigned long for dma_attrs
  avr32: dma-mapping: Use unsigned long for dma_attrs
  blackfin: dma-mapping: Use unsigned long for dma_attrs
  c6x: dma-mapping: Use unsigned long for dma_attrs
  cris: dma-mapping: Use unsigned long for dma_attrs
  frv: dma-mapping: Use unsigned long for dma_attrs
  drm/exynos: dma-mapping: Use unsigned long for dma_attrs
  drm/mediatek: dma-mapping: Use unsigned long for dma_attrs
  drm/msm: dma-mapping: Use unsigned long for dma_attrs
  drm/nouveau: dma-mapping: Use unsigned long for dma_attrs
  drm/rockship: dma-mapping: Use unsigned long for dma_attrs
  infiniband: dma-mapping: Use unsigned long for dma_attrs
  iommu: dma-mapping: Use unsigned long for dma_attrs
  [media] dma-mapping: Use unsigned long for dma_attrs
  xen: dma-mapping: Use unsigned long for dma_attrs
  swiotlb: dma-mapping: Use unsigned long for dma_attrs
  powerpc: dma-mapping: Use unsigned long for dma_attrs
  video: dma-mapping: Use unsigned long for dma_attrs
  x86: dma-mapping: Use unsigned long for dma_attrs
  iommu: intel: dma-mapping: Use unsigned long for dma_attrs
  h8300: dma-mapping: Use unsigned long for dma_attrs
  hexagon: dma-mapping: Use unsigned long for dma_attrs
  ia64: dma-mapping: Use unsigned long for dma_attrs
  m68k: dma-mapping: Use unsigned long for dma_attrs
  metag: dma-mapping: Use unsigned long for dma_attrs
  microblaze: dma-mapping: Use unsigned long for dma_attrs
  mips: dma-mapping: Use unsigned long for dma_attrs
  mn10300: dma-mapping: Use unsigned long for dma_attrs
  nios2: dma-mapping: Use unsigned long for dma_attrs
  openrisc: dma-mapping: Use unsigned long for dma_attrs
  parisc: dma-mapping: Use unsigned long for dma_attrs
  misc: mic: dma-mapping: Use unsigned long for dma_attrs
  s390: dma-mapping: Use unsigned long for dma_attrs
  sh: dma-mapping: Use unsigned long for dma_attrs
  sparc: dma-mapping: Use unsigned long for dma_attrs
  tile: dma-mapping: Use unsigned long for dma_attrs
  unicore32: dma-mapping: Use unsigned long for dma_attrs
  xtensa: dma-mapping: Use unsigned long for dma_attrs
  dma-mapping: Remove dma_get_attr
  dma-mapping: Document the DMA attributes next to the declaration

 Documentation/DMA-API.txt  |  33 +++---
 Documentation/DMA-attributes.txt   |   2 +-
 arch/alpha/include/asm/dma-mapping.h   |   2 -
 arch/alpha/kernel/pci-noop.c   |   2 +-
 arch/alpha/kernel/pci_iommu.c  |  12 +-
 arch/arc/mm/dma.c  |  12 +-
 arch/arm/common/dmabounce.c|   4 +-
 arch/arm/include/asm/dma-mapping.h |  13 +--
 arch/arm/include/asm/xen/page-coherent.h   |  16 +--
 arch/arm/mm/dma-mapping.c  | 117 +--
 arch/arm/xen/mm.c  |   8 +-
 arch/arm64/mm/dma-mapping.c|  66 +--
 arch/avr32/mm/dma-coherent.c   |  12 +-
 arch/blackfin/kernel/dma-mapping.c |   8 +-
 arch/c6x/include/asm/dma-mapping.h |   4 +-
 arch/c6x/kernel/dma.c  |

Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible parent clocks

2016-06-30 Thread Scott Wood

On 06/29/2016 10:02 PM, Yuantian Tang wrote:
>> -Original Message-
>> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
>> Sent: Thursday, June 30, 2016 10:24 AM
>> To: Yuantian Tang 
>> Cc: Scott Wood ; Russell King ;
>> Michael Turquette ; Stephen Boyd
>> ; Viresh Kumar ; linux-
>> c...@vger.kernel.org; linux...@vger.kernel.org; linuxppc-
>> d...@lists.ozlabs.org; Yang-Leo Li ; Xiaofeng Ren
>> ; Scott Wood 
>> Subject: Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible
>> parent clocks
>>
>> On Thursday, June 30, 2016 01:47:09 AM Yuantian Tang wrote:
 -Original Message-
 From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
 Sent: Thursday, June 30, 2016 9:47 AM
 To: Yuantian Tang 
 Cc: Scott Wood ; Russell King
 ; Michael Turquette
 ; Stephen Boyd ;
 Viresh Kumar ; linux- c...@vger.kernel.org;
 linux...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Yang-Leo
 Li ; Xiaofeng Ren ; Scott
 Wood 
 Subject: Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering
 possible parent clocks

 On Wednesday, June 29, 2016 05:50:26 AM Yuantian Tang wrote:
> Hi,
>
> This patch is acked by clock maintainer. If no comments from
> anyone else,
 we will merge it in next week.

 There is a cpufreq commit depending on it.  Are you going to handle
 that one too?

>>> That one has been acked by cpufreq maintainer. You can get this from
>> patch comments.
>>
>> I know that it has been ACKed.
>>
>> My question is whether or not you are going to apply it along the [1/2].
>>
>> If not, it will have to be deferred until the [1/2] is merged and then 
>> applied
>> which may not be desirable.
>>
> I hope we can apply both at same time. Seems Scott has a few concerns.
> 
> What you think about this patch? Can you apply it?
> If you have applied this patch, then I can push CPUfreq maintainer to apply 
> another one which will be delayed.

My only concern was getting an ack for this patch (1/2) -- did I miss it
somewhere?

-Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 RFC] pasemi: Fix boot failure on 4.7-rc1

2016-06-30 Thread Olof Johansson

Hi,

On Wed, Jun 29, 2016 at 1:06 PM, Darren Stevens  wrote:
>
> Commit:d6a9996e84ac4beb7713e9485f4563e100a9b03e (powerpc/mm:
> vmalloc abstraction in preparation for radix) turned kernel memory
> and IO addresses from #defined constants to variables initialised
> at runtime.
>
> On PA6T systems the setup_arch machine call initialises the onboard
> PCI-e root-ports, and uses pci_io_base to do this, which is now before
> its value has been set resulting in a panic right after 'booting
> linux via __start()'

I don't see the panic here on a Chitra, at least not on recent -next.

What config are you building with when you see this?

Please cc me on PA Semi related patches if you want my attention. I
still have hardware, in fact I boot every -next release on it through
automation.



-Olof
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 00/12] gendisk: Generate uevent after attribute available

2016-06-30 Thread Christoph Hellwig

On Thu, Jun 30, 2016 at 02:35:54PM +0800, Fam Zheng wrote:
> also more code and less flexible IMO.  For example, we need at least two
> variants, for attribute_group and device_attribute separately, right?

Yes, or maybe just a calling convention that just passes both.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 00/12] gendisk: Generate uevent after attribute available

2016-06-30 Thread Fam Zheng

On Wed, 06/29 23:24, Christoph Hellwig wrote:
> On Thu, Jun 30, 2016 at 09:59:41AM +0800, Fam Zheng wrote:
> > Documentation/kobject.txt:
> > > Use the KOBJ_ADD action for when the kobject is first added to the kernel.
> > > This should be done only after any attributes or children of the kobject
> > > have been initialized properly, as userspace will instantly start to look
> > > for them when this call happens.
> > 
> > Unfortunately it seems impossible to fix this generally without touching the
> > offending callers.  The approach I'm proposing here is adding a flag to
> > suppress uevent in add_disk(), which is patch 1, then in later patches, 
> > convert
> > any caller to only trigger the uevent when attributes are added.
> 
> We (or rather Dan) is touching most add_disk callers anyway for the
> driverfs_dev removal.  Let's just pass the array of attributes to
> a disk_add variant and solve the issue for real.

I thought about that. Its usage is more compact compared to this series, but is
also more code and less flexible IMO.  For example, we need at least two
variants, for attribute_group and device_attribute separately, right?

Fam
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 00/12] gendisk: Generate uevent after attribute available

2016-06-30 Thread Christoph Hellwig

On Thu, Jun 30, 2016 at 09:59:41AM +0800, Fam Zheng wrote:
> Documentation/kobject.txt:
> > Use the KOBJ_ADD action for when the kobject is first added to the kernel.
> > This should be done only after any attributes or children of the kobject
> > have been initialized properly, as userspace will instantly start to look
> > for them when this call happens.
> 
> Unfortunately it seems impossible to fix this generally without touching the
> offending callers.  The approach I'm proposing here is adding a flag to
> suppress uevent in add_disk(), which is patch 1, then in later patches, 
> convert
> any caller to only trigger the uevent when attributes are added.

We (or rather Dan) is touching most add_disk callers anyway for the
driverfs_dev removal.  Let's just pass the array of attributes to
a disk_add variant and solve the issue for real.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] cxl: Fix NULL pointer dereference on kernel contexts with no AFU interrupts

2016-06-30 Thread Ian Munsie

Excerpts from andrew.donnellan's message of 2016-06-30 15:15:02 +1000:
> On 30/06/16 15:00, Michael Ellerman wrote:
> > On Thu, 2016-06-30 at 08:28 +1000, Andrew Donnellan wrote:
> >> On 30/06/16 04:55, Ian Munsie wrote:
> >>>
> >>> From: Ian Munsie 
> >>>
> >>> If a kernel context is initialised and does not have any AFU interrupts
> >>> allocated it will cause a NULL pointer dereference when the context is
> >>> detached since the irq_names list will not have been initialised.
> >>>
> >>> Move the initialisation of the irq_names list into the cxl_context_init
> >>> routine so that it will be valid for the entire lifetime of the context
> >>> and will not cause a NULL pointer dereference.
> >>>
> >>> Signed-off-by: Ian Munsie 
> >
> >> As it's nice having your machine not crash on every shutdown...
> >
> > Fixes: 
> 
> Ian can correct me if I'm wrong, but I suspect this doesn't affect 
> cxlflash (the only current user of the cxl kernel API) - this issue was 
> hit while working on CAPI support for mlx5.

Correct - no current user hits this bug, but the upcoming mlx5 support
does because of the way it uses interrupts.

-Ian

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2] cpufreq: powernv: Replacing pstate_id with frequency table index

2016-06-30 Thread Akshay Adiga

Refactoring code to use frequency table index instead of pstate_id.
This abstraction will make the code independent of the pstate values.

- No functional changes
- The highest frequency is at frequency table index 0 and the frequency
  decreases as the index increases.
- Macros pstates_to_idx() and idx_to_pstate() can be used for conversion
  between pstate_id and index.
- powernv_pstate_info now contains frequency table index to min, max and
  nominal frequency (instead of pstate_ids)
- global_pstate_info new stores index values instead pstate ids.
- variables renamed as *_idx which now store index instead of pstate

Signed-off-by: Akshay Adiga 
Reviewed-by: Gautham R. Shenoy 
---
Changes from v1:
  - changed macro names from get_pstate()/ get_index() to 
idx_to_pstate()/ pstate_to_idx()
  - Renamed variables that store index instead of pstate_id to *_idx
  - Retained previous printk's 

v1 : http://marc.info/?l=linux-pm=146677701501225=1

 drivers/cpufreq/powernv-cpufreq.c | 177 ++
 1 file changed, 102 insertions(+), 75 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index b29c5c2..72c91d8 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -64,12 +64,14 @@
 /**
  * struct global_pstate_info - Per policy data structure to maintain history of
  * global pstates
- * @highest_lpstate:   The local pstate from which we are ramping down
+ * @highest_lpstate_idx:   The local pstate index from which we are
+ * ramping down
  * @elapsed_time:  Time in ms spent in ramping down from
- * highest_lpstate
+ * highest_lpstate_idx
  * @last_sampled_time: Time from boot in ms when global pstates were
  * last set
- * @last_lpstate,last_gpstate: Last set values for local and global pstates
+ * @last_lpstate_idx,  Last set value of local pstate and global
+ * last_gpstate_idxpstate in terms of cpufreq table index
  * @timer: Is used for ramping down if cpu goes idle for
  * a long time with global pstate held high
  * @gpstate_lock:  A spinlock to maintain synchronization between
@@ -77,11 +79,11 @@
  * governer's target_index calls
  */
 struct global_pstate_info {
-   int highest_lpstate;
+   int highest_lpstate_idx;
unsigned int elapsed_time;
unsigned int last_sampled_time;
-   int last_lpstate;
-   int last_gpstate;
+   int last_lpstate_idx;
+   int last_gpstate_idx;
spinlock_t gpstate_lock;
struct timer_list timer;
 };
@@ -124,29 +126,47 @@ static int nr_chips;
 static DEFINE_PER_CPU(struct chip *, chip_info);
 
 /*
- * Note: The set of pstates consists of contiguous integers, the
- * smallest of which is indicated by powernv_pstate_info.min, the
- * largest of which is indicated by powernv_pstate_info.max.
+ * Note:
+ * The set of pstates consists of contiguous integers.
+ * powernv_pstate_info stores the index of the frequency table for
+ * max, min and nominal frequencies. It also stores number of
+ * available frequencies.
  *
- * The nominal pstate is the highest non-turbo pstate in this
- * platform. This is indicated by powernv_pstate_info.nominal.
+ * powernv_pstate_info.nominal indicates the index to the highest
+ * non-turbo frequency.
  */
 static struct powernv_pstate_info {
-   int min;
-   int max;
-   int nominal;
-   int nr_pstates;
+   unsigned int min;
+   unsigned int max;
+   unsigned int nominal;
+   unsigned int nr_pstates;
 } powernv_pstate_info;
 
+/* Use following macros for conversions between pstate_id and index */
+static inline int idx_to_pstate(unsigned int i)
+{
+   return powernv_freqs[i].driver_data;
+}
+
+static inline unsigned int pstate_to_idx(int pstate)
+{
+   /*
+* abs() is deliberately used so that is works with
+* both monotonically increasing and decreasing
+* pstate values
+*/
+   return abs(pstate - idx_to_pstate(powernv_pstate_info.max));
+}
+
 static inline void reset_gpstates(struct cpufreq_policy *policy)
 {
struct global_pstate_info *gpstates = policy->driver_data;
 
-   gpstates->highest_lpstate = 0;
+   gpstates->highest_lpstate_idx = 0;
gpstates->elapsed_time = 0;
gpstates->last_sampled_time = 0;
-   gpstates->last_lpstate = 0;
-   gpstates->last_gpstate = 0;
+   gpstates->last_lpstate_idx = 0;
+   gpstates->last_gpstate_idx = 0;
 }
 
 /*
@@ -156,9 +176,10 @@ static inline void reset_gpstates(struct cpufreq_policy 
*policy)
 static int init_powernv_pstates(void)
 {
struct device_node *power_mgt;
-   int i, pstate_min,

Re: [PATCH v3 3/4] perf annotate: add powerpc support

2016-06-30 Thread Michael Ellerman

On Thu, 2016-06-30 at 11:44 +0530, Ravi Bangoria wrote:
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 36a5825..b87eac7 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -476,6 +481,125 @@ static int ins__cmp(const void *a, const void *b)
...
> +
> +static struct ins *ins__find_powerpc(const char *name)
> +{
> + int i;
> + struct ins *ins;
> + struct ins_ops *ops;
> + static struct instructions_powerpc head;
> + static bool list_initialized;
> +
> + /*
> +  * - Interested only if instruction starts with 'b'.
> +  * - Few start with 'b', but aren't branch instructions.
> +  * - Let's also ignore instructions involving 'ctr' and
> +  *   'tar' since target branch addresses for those can't
> +  *   be determined statically.
> +  */
> + if (name[0] != 'b' ||
> + !strncmp(name, "bcd", 3)   ||
> + !strncmp(name, "brinc", 5) ||
> + !strncmp(name, "bper", 4)  ||
> + strstr(name, "ctr")||
> + strstr(name, "tar"))
> + return NULL;

It would be good if 'bctr' was at least recognised as a branch, even if we
can't determine the target. They are very common.

It doesn't look like we have the opcode handy here? Could we get it somehow?
That would make this a *lot* more robust.

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 4/4] perf: Define macro for normalized arch names

2016-06-30 Thread Ravi Bangoria

Define macro for each normalized arch name and use them instead
of using arch name as string

Signed-off-by: Ravi Bangoria 
---
Changes in v3:
  - No changes

 tools/perf/arch/common.c   | 36 ++--
 tools/perf/arch/common.h   | 11 +++
 tools/perf/util/annotate.c | 10 +-
 tools/perf/util/unwind-libunwind.c |  4 ++--
 4 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index ee69668..feb2113 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -122,25 +122,25 @@ static int lookup_triplets(const char *const *triplets, 
const char *name)
 const char *normalize_arch(char *arch)
 {
if (!strcmp(arch, "x86_64"))
-   return "x86";
+   return NORM_X86;
if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-   return "x86";
+   return NORM_X86;
if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-   return "sparc";
+   return NORM_SPARC;
if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64"))
-   return "arm64";
+   return NORM_ARM64;
if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-   return "arm";
+   return NORM_ARM;
if (!strncmp(arch, "s390", 4))
-   return "s390";
+   return NORM_S390;
if (!strncmp(arch, "parisc", 6))
-   return "parisc";
+   return NORM_PARISC;
if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-   return "powerpc";
+   return NORM_POWERPC;
if (!strncmp(arch, "mips", 4))
-   return "mips";
+   return NORM_MIPS;
if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-   return "sh";
+   return NORM_SH;
 
return arch;
 }
@@ -180,21 +180,21 @@ static int perf_env__lookup_binutils_path(struct perf_env 
*env,
zfree();
}
 
-   if (!strcmp(arch, "arm"))
+   if (!strcmp(arch, NORM_ARM))
path_list = arm_triplets;
-   else if (!strcmp(arch, "arm64"))
+   else if (!strcmp(arch, NORM_ARM64))
path_list = arm64_triplets;
-   else if (!strcmp(arch, "powerpc"))
+   else if (!strcmp(arch, NORM_POWERPC))
path_list = powerpc_triplets;
-   else if (!strcmp(arch, "sh"))
+   else if (!strcmp(arch, NORM_SH))
path_list = sh_triplets;
-   else if (!strcmp(arch, "s390"))
+   else if (!strcmp(arch, NORM_S390))
path_list = s390_triplets;
-   else if (!strcmp(arch, "sparc"))
+   else if (!strcmp(arch, NORM_SPARC))
path_list = sparc_triplets;
-   else if (!strcmp(arch, "x86"))
+   else if (!strcmp(arch, NORM_X86))
path_list = x86_triplets;
-   else if (!strcmp(arch, "mips"))
+   else if (!strcmp(arch, NORM_MIPS))
path_list = mips_triplets;
else {
ui__error("binutils for %s not supported.\n", arch);
diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h
index 6b01c73..14ca8ca 100644
--- a/tools/perf/arch/common.h
+++ b/tools/perf/arch/common.h
@@ -5,6 +5,17 @@
 
 extern const char *objdump_path;
 
+/* Macro for normalized arch names */
+#define NORM_X86   "x86"
+#define NORM_SPARC "sparc"
+#define NORM_ARM64 "arm64"
+#define NORM_ARM   "arm"
+#define NORM_S390  "s390"
+#define NORM_PARISC"parisc"
+#define NORM_POWERPC   "powerpc"
+#define NORM_MIPS  "mips"
+#define NORM_SH"sh"
+
 int perf_env__lookup_objdump(struct perf_env *env);
 const char *normalize_arch(char *arch);
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index b87eac7..fce60b4 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -68,7 +68,7 @@ static int call__parse(struct ins_operands *ops,
 
name++;
 
-   if (!strcmp(norm_arch, "arm") && strchr(name, '+'))
+   if (!strcmp(norm_arch, NORM_ARM) && strchr(name, '+'))
return -1;
 
tok = strchr(name, '>');
@@ -255,7 +255,7 @@ static int mov__parse(struct ins_operands *ops,
 
target = ++s;
 
-   if (!strcmp(norm_arch, "arm"))
+   if (!strcmp(norm_arch, NORM_ARM))
comment = strchr(s, ';');
else
comment = strchr(s, '#');
@@ -629,13 +629,13 @@ static struct ins *ins__find(const char *name, const char 
*norm_arch)
sorted = true;
}
 
-   if (!strcmp(norm_arch, "x86")) {
+   if (!strcmp(norm_arch, NORM_X86)) {
instructions = instructions_x86;
nmemb = ARRAY_SIZE(instructions_x86);
-   } else if (!strcmp(norm_arch, "arm")) {
+   } else if (!strcmp(norm_arch, NORM_ARM)) {

[PATCH v3 3/4] perf annotate: add powerpc support

2016-06-30 Thread Ravi Bangoria

From: Naveen N. Rao  

Powerpc has long list of branch instructions and hardcoding them in
table appears to be error-prone. So, add new function to find
instruction instead of creating table. This function dynamically
create table(list of 'struct ins'), and instead of creating object
every time, first check if list already contain object for that
instruction.

Signed-off-by: Naveen N. Rao 
Signed-off-by: Ravi Bangoria 
---
Changes in v3:
  - Optimized code
  - Corrected one memory leak

 tools/perf/util/annotate.c | 126 +
 1 file changed, 126 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 36a5825..b87eac7 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -461,6 +461,11 @@ static struct ins instructions_arm[] = {
{ .name = "bne",   .ops  = _ops, },
 };
 
+struct instructions_powerpc {
+   struct ins *ins;
+   struct list_head list;
+};
+
 static int ins__key_cmp(const void *name, const void *insp)
 {
const struct ins *ins = insp;
@@ -476,6 +481,125 @@ static int ins__cmp(const void *a, const void *b)
return strcmp(ia->name, ib->name);
 }
 
+static struct ins *list_add__ins_powerpc(struct instructions_powerpc *head,
+const char *name, struct ins_ops *ops)
+{
+   struct instructions_powerpc *ins_powerpc;
+   struct ins *ins;
+
+   ins = zalloc(sizeof(struct ins));
+   if (!ins)
+   return NULL;
+
+   ins_powerpc = zalloc(sizeof(struct instructions_powerpc));
+   if (!ins_powerpc)
+   goto out_free_ins;
+
+   ins->name = strdup(name);
+   if (!ins->name)
+   goto out_free_ins_power;
+
+   ins->ops = ops;
+   ins_powerpc->ins = ins;
+   list_add_tail(&(ins_powerpc->list), &(head->list));
+
+   return ins;
+
+out_free_ins_power:
+   zfree(_powerpc);
+out_free_ins:
+   zfree();
+   return NULL;
+}
+
+static struct ins *list_search__ins_powerpc(struct instructions_powerpc *head,
+   const char *name)
+{
+   struct instructions_powerpc *pos;
+
+   list_for_each_entry(pos, >list, list) {
+   if (!strcmp(pos->ins->name, name))
+   return pos->ins;
+   }
+   return NULL;
+}
+
+static struct ins *ins__find_powerpc(const char *name)
+{
+   int i;
+   struct ins *ins;
+   struct ins_ops *ops;
+   static struct instructions_powerpc head;
+   static bool list_initialized;
+
+   /*
+* - Interested only if instruction starts with 'b'.
+* - Few start with 'b', but aren't branch instructions.
+* - Let's also ignore instructions involving 'ctr' and
+*   'tar' since target branch addresses for those can't
+*   be determined statically.
+*/
+   if (name[0] != 'b' ||
+   !strncmp(name, "bcd", 3)   ||
+   !strncmp(name, "brinc", 5) ||
+   !strncmp(name, "bper", 4)  ||
+   strstr(name, "ctr")||
+   strstr(name, "tar"))
+   return NULL;
+
+   if (!list_initialized) {
+   INIT_LIST_HEAD();
+   list_initialized = true;
+   }
+
+   /*
+* Return if we already have object of 'struct ins' for this
+* instruction
+*/
+   ins = list_search__ins_powerpc(, name);
+   if (ins)
+   return ins;
+
+   ops = _ops;
+
+   i = strlen(name) - 1;
+   if (i < 0)
+   return NULL;
+
+   /* ignore optional hints at the end of the instructions */
+   if (name[i] == '+' || name[i] == '-')
+   i--;
+
+   if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) {
+   /*
+* if the instruction ends up with 'l' or 'la', then
+* those are considered 'calls' since they update LR.
+* ... except for 'bnl' which is branch if not less than
+* and the absolute form of the same.
+*/
+   if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
+   strcmp(name, "bnl-") && strcmp(name, "bnla") &&
+   strcmp(name, "bnla+") && strcmp(name, "bnla-"))
+   ops = _ops;
+   }
+   if (name[i] == 'r' && name[i-1] == 'l')
+   /*
+* instructions ending with 'lr' are considered to be
+* return instructions
+*/
+   ops = _ops;
+
+   /*
+* Add instruction to list so next time no need to
+* allocate memory for it.
+*/
+   ins = list_add__ins_powerpc(, name, ops);
+   if (ins)
+   return ins;
+
+   return NULL;
+}
+
 static void ins__sort(struct ins *instructions, int nmemb)
 {

[PATCH v3 2/4] perf annotate: Enable cross arch annotate

2016-06-30 Thread Ravi Bangoria

Change current data structures and function to enable cross arch
annotate.

Current implementation does not contain logic of record on one arch
and annotating on other. This remote annotate is partially possible
with current implementation for x86 (or may be arm as well) only.
But, to make remote annotation work properly, all architecture
instruction tables need to be included in the perf binary. And while
annotating, look for instruction table where perf.data was recorded.

For arm, few instructions were defined under #if __arm__ which I've
used as a table for arm. But I'm not sure whether instruction defined
outside of that also contains arm instructions. Apart from that,
'call__parse()' and 'move__parse()' contains #ifdef __arm__ directive.
I've changed it to  if (!strcmp(norm_arch, arm)). But I've not
tested this as well.

Signed-off-by: Ravi Bangoria 
---
Changes in v3:
  - No changes

 tools/perf/builtin-top.c  |   2 +-
 tools/perf/ui/browsers/annotate.c |   3 +-
 tools/perf/ui/gtk/annotate.c  |   2 +-
 tools/perf/util/annotate.c| 136 --
 tools/perf/util/annotate.h|   5 +-
 5 files changed, 95 insertions(+), 53 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 07fc792..d4fd947 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -128,7 +128,7 @@ static int perf_top__parse_source(struct perf_top *top, 
struct hist_entry *he)
return err;
}
 
-   err = symbol__annotate(sym, map, 0);
+   err = symbol__annotate(sym, map, 0, NULL);
if (err == 0) {
 out_assign:
top->sym_filter_entry = he;
diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 29dc6d2..3a652a6f 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -1050,7 +1050,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map 
*map,
  (nr_pcnt - 1);
}
 
-   if (symbol__annotate(sym, map, sizeof_bdl) < 0) {
+   if (symbol__annotate(sym, map, sizeof_bdl,
+perf_evsel__env_arch(evsel)) < 0) {
ui__error("%s", ui_helpline__last_msg);
goto out_free_offsets;
}
diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index 9c7ff8d..d7150b3 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -166,7 +166,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct 
map *map,
if (map->dso->annotate_warned)
return -1;
 
-   if (symbol__annotate(sym, map, 0) < 0) {
+   if (symbol__annotate(sym, map, 0, perf_evsel__env_arch(evsel)) < 0) {
ui__error("%s", ui_helpline__current);
return -1;
}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index c385fec..36a5825 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -20,12 +20,14 @@
 #include 
 #include 
 #include 
+#include 
+#include "../arch/common.h"
 
 const char *disassembler_style;
 const char *objdump_path;
 static regex_t  file_lineno;
 
-static struct ins *ins__find(const char *name);
+static struct ins *ins__find(const char *name, const char *norm_arch);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
 
 static void ins__delete(struct ins_operands *ops)
@@ -53,7 +55,8 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size,
return ins__raw_scnprintf(ins, bf, size, ops);
 }
 
-static int call__parse(struct ins_operands *ops)
+static int call__parse(struct ins_operands *ops,
+  __maybe_unused const char *norm_arch)
 {
char *endptr, *tok, *name;
 
@@ -65,10 +68,8 @@ static int call__parse(struct ins_operands *ops)
 
name++;
 
-#ifdef __arm__
-   if (strchr(name, '+'))
+   if (!strcmp(norm_arch, "arm") && strchr(name, '+'))
return -1;
-#endif
 
tok = strchr(name, '>');
if (tok == NULL)
@@ -117,7 +118,8 @@ bool ins__is_call(const struct ins *ins)
return ins->ops == _ops;
 }
 
-static int jump__parse(struct ins_operands *ops)
+static int jump__parse(struct ins_operands *ops,
+  __maybe_unused const char *norm_arch)
 {
const char *s = strchr(ops->raw, '+');
 
@@ -172,7 +174,7 @@ static int comment__symbol(char *raw, char *comment, u64 
*addrp, char **namep)
return 0;
 }
 
-static int lock__parse(struct ins_operands *ops)
+static int lock__parse(struct ins_operands *ops, const char *norm_arch)
 {
char *name;
 
@@ -183,7 +185,7 @@ static int lock__parse(struct ins_operands *ops)
if (disasm_line__parse(ops->raw, , >locked.ops->raw) < 0)
goto out_free_ops;
 
-   ops->locked.ins = ins__find(name);
+   ops->locked.ins = ins__find(name, norm_arch);
free(name);
 
if

[PATCH v3 1/4] perf: Utility function to fetch arch

2016-06-30 Thread Ravi Bangoria

Add Utility function to fetch arch using evsel. (evsel->env->arch)

Signed-off-by: Ravi Bangoria 
---
Change in v3:
  - No changes

 tools/perf/util/evsel.c | 7 +++
 tools/perf/util/evsel.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1d8f2bb..0fea724 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2422,3 +2422,10 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, 
struct target *target,
 err, strerror_r(err, sbuf, sizeof(sbuf)),
 perf_evsel__name(evsel));
 }
+
+char *perf_evsel__env_arch(struct perf_evsel *evsel)
+{
+   if (evsel && evsel->evlist && evsel->evlist->env)
+   return evsel->evlist->env->arch;
+   return NULL;
+}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 828ddd1..86fed7a 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -435,4 +435,6 @@ typedef int (*attr__fprintf_f)(FILE *, const char *, const 
char *, void *);
 int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 attr__fprintf_f attr__fprintf, void *priv);
 
+char *perf_evsel__env_arch(struct perf_evsel *evsel);
+
 #endif /* __PERF_EVSEL_H */
-- 
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 0/4] perf annotate: Enable cross arch annotate

2016-06-30 Thread Ravi Bangoria

Perf can currently only support code navigation (branches and calls) in
annotate when run on the same architecture where perf.data was recorded.
But cross arch annotate is not supported.

This patchset enables cross arch annotate. Currently I've used x86
and arm instructions which are already available and adding support
for powerpc as well. Adding support for other arch will be easy.

I've created this patch on top of acme/perf/core. And tested it with
x86 and powerpc only.

Example:

  Record on powerpc:
  $ ./perf record -a

  Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Changes in v3:
  - Optimized patch that enables annotate on powerpc
  - Corrected one memory leak

v2 link: https://lkml.org/lkml/2016/6/29/278

Naveen N. Rao (1):
  perf annotate: add powerpc support

Ravi Bangoria (4):
  perf: Utility function to fetch arch
  perf annotate: Enable cross arch annotate
  perf: Define macro for normalized arch names

 tools/perf/arch/common.c   |  36 ++---
 tools/perf/arch/common.h   |  11 ++
 tools/perf/builtin-top.c   |   2 +-
 tools/perf/ui/browsers/annotate.c  |   3 +-
 tools/perf/ui/gtk/annotate.c   |   2 +-
 tools/perf/util/annotate.c | 260 ++---
 tools/perf/util/annotate.h |   5 +-
 tools/perf/util/evsel.c|   7 +
 tools/perf/util/evsel.h|   2 +
 tools/perf/util/unwind-libunwind.c |   4 +-
 10 files changed, 260 insertions(+), 72 deletions(-)

--
2.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

80 matches

Mail list logo