Re: [PATCH 0/6] eBPF JIT for PPC64

2016-06-17 Thread mpe

On 2016-06-13 15:40, Naveen N. Rao wrote:

On 2016/06/10 10:47PM, David Miller wrote:

From: "Naveen N. Rao" 
Date: Tue,  7 Jun 2016 19:02:17 +0530

> Please note that patch [2] is a prerequisite for this patchset, and is
> not yet upstream.
 ...
> [1] http://thread.gmane.org/gmane.linux.kernel/2188694
> [2] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/96514

Because of #2 I don't think I can take this directly into the networking tree, right?

Therefore, how would you like this to be merged?


Hi David,
Thanks for asking. Yes, I think it is better to take this through the
powerpc tree as all the changes are contained within arch/powerpc,
unless Michael Ellerman feels differently.

Michael?


Yeah I was planning to take it.

I put it in my test tree last night but it broke the build for some configs.

Once that is fixed I'll take it via powerpc#next.

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v10,17/18] powerpc/powernv: Functions to get/set PCI slot state

2016-06-17 Thread Gavin Shan
On Fri, Jun 17, 2016 at 08:32:10PM +1000, Michael Ellerman wrote:
>On Fri, 2016-20-05 at 06:41:41 UTC, Gavin Shan wrote:
>> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>> index 9bb8ddf..2417c86 100644
>> --- a/arch/powerpc/include/asm/opal-api.h
>> +++ b/arch/powerpc/include/asm/opal-api.h
>> @@ -344,6 +348,18 @@ enum OpalPciResetState {
>>  OPAL_ASSERT_RESET   = 1
>>  };
>>  
>> +enum OpalPciSlotPresentenceState {
>
>In skiboot this is called "OpalPciSlotPresence".
>
>I've renamed it.
>
>> +OPAL_PCI_SLOT_EMPTY = 0,
>> +OPAL_PCI_SLOT_PRESENT   = 1
>> +};
>> +
>> +enum OpalPciSlotPowerState {
>
>In skiboot this is called "OpalPciSlotPower".
>
>I've renamed it.
>
>> +OPAL_PCI_SLOT_POWER_OFF = 0,
>> +OPAL_PCI_SLOT_POWER_ON  = 1,
>> +OPAL_PCI_SLOT_OFFLINE   = 2,
>> +OPAL_PCI_SLOT_ONLINE   = 3
>> +};
>> +
>>  enum OpalSlotLedType {
>>  OPAL_SLOT_LED_TYPE_ID = 0,  /* IDENTIFY LED */
>>  OPAL_SLOT_LED_TYPE_FAULT = 1,   /* FAULT LED */
>> @@ -378,6 +394,7 @@ enum opal_msg_type {
>>  OPAL_MSG_DPO = 5,
>>  OPAL_MSG_PRD = 6,
>>  OPAL_MSG_OCC = 7,
>> +OPAL_MSG_PCI_HOTPLUG = 8,
>
>I don't see this in skiboot?
>
>It also doesn't seem to be used, so I've dropped it.
>

Thanks, Michael. All the changes are correct. The enum names were changed
in the latest skiboot patchset, which was merged a couple of days ago. Also,
OPAL_MSG_PCI_HOTPLUG isn't needed, because PCI hotplug won't have a
dedicated message type; an asynchronous message is used instead.

Thanks,
Gavin


>>  OPAL_MSG_TYPE_MAX,
>>  };
>>  
>
>cheers
>


[PATCH] ibmvnic: fix to use list_for_each_safe() when deleting items

2016-06-17 Thread weiyj_lk
From: Wei Yongjun 

Since we remove items from the list using list_del(), we need to use
the safe version of the iterator, list_for_each_entry_safe(), which
caches the next entry before the loop body runs.

Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 864cb21..0b6a922 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3141,14 +3141,14 @@ static void handle_request_ras_comp_num_rsp(union ibmvnic_crq *crq,
 
 static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
 {
-   struct ibmvnic_inflight_cmd *inflight_cmd;
+   struct ibmvnic_inflight_cmd *inflight_cmd, *tmp1;
struct device *dev = &adapter->vdev->dev;
-   struct ibmvnic_error_buff *error_buff;
+   struct ibmvnic_error_buff *error_buff, *tmp2;
unsigned long flags;
unsigned long flags2;
 
spin_lock_irqsave(&adapter->inflight_lock, flags);
-   list_for_each_entry(inflight_cmd, &adapter->inflight, list) {
+   list_for_each_entry_safe(inflight_cmd, tmp1, &adapter->inflight, list) {
switch (inflight_cmd->crq.generic.cmd) {
case LOGIN:
dma_unmap_single(dev, adapter->login_buf_token,
@@ -3165,8 +3165,8 @@ static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
break;
case REQUEST_ERROR_INFO:
spin_lock_irqsave(&adapter->error_list_lock, flags2);
-   list_for_each_entry(error_buff, &adapter->errors,
-   list) {
+   list_for_each_entry_safe(error_buff, tmp2,
+&adapter->errors, list) {
dma_unmap_single(dev, error_buff->dma,
 error_buff->len,
 DMA_FROM_DEVICE);





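A userspace sketch of why the safe iterator matters, assuming a minimal hand-rolled circular list (struct node, list_del_node() and free_all_safe() are hypothetical and are not <linux/list.h>). The loop caches the next pointer before the body unlinks and frees the current entry, which is the same idea list_for_each_entry_safe() implements with its extra cursor (tmp1/tmp2 in the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry embedding a struct list_head. */
struct node {
	struct node *prev, *next;
	int value;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail_node(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_node(struct node *n)
{
	assert(n->next != NULL && n->prev != NULL);	/* catch double deletion */
	n->prev->next = n->next;
	n->next->prev = n->prev;
	/* Like list_del(), leave the unlinked node unusable (the kernel
	 * uses poison values; this sketch uses NULL). */
	n->prev = n->next = NULL;
}

/* Free every entry. tmp holds n->next before the body unlinks and frees
 * n, so the loop never reads freed memory. Advancing with n = n->next
 * after free(n) would be a use-after-free. Returns the count freed. */
static size_t free_all_safe(struct node *head)
{
	struct node *n, *tmp;
	size_t freed = 0;

	for (n = head->next, tmp = n->next; n != head; n = tmp, tmp = n->next) {
		list_del_node(n);
		free(n);
		freed++;
	}
	return freed;
}
```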

Re: [PATCH v12.update2 02/15] PCI: Let pci_mmap_page_range() take resource address

2016-06-17 Thread Yinghai Lu
On Fri, Jun 17, 2016 at 12:52 PM, Bjorn Helgaas  wrote:
>>
>> and respin the whole patchset today.
>
> I added your acks and pushed the result to pci/resource.  I'll also
> post these formally on the list so they're easier to find.

Please review patchset v13, which is based on your new pci/resource branch.

Thanks

Yinghai

[PATCH v13 10/16] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing

2016-06-17 Thread Yinghai Lu
When setting the PREF bit on device resources under a bridge's 64-bit
prefetchable window, we need to make sure PREF is only set for 64-bit
resources.

This patch sets IORESOURCE_MEM_64 for 64-bit resources during OF device
resource flags parsing.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
Signed-off-by: Yinghai Lu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Gavin Shan 
Cc: Yijing Wang 
Cc: Anton Blanchard 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/pci_of_scan.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 719f225..476b8ac5 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -44,8 +44,10 @@ static unsigned int pci_parse_of_flags(u32 addr0, int bridge)
 
if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
-   flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
+   if (addr0 & 0x01000000)
+   flags |= IORESOURCE_MEM_64
+| PCI_BASE_ADDRESS_MEM_TYPE_64;
if (addr0 & 0x40000000)
flags |= IORESOURCE_PREFETCH
 | PCI_BASE_ADDRESS_MEM_PREFETCH;
-- 
2.8.3

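As a rough illustration of the decoding, here is a standalone sketch of the phys.hi cell layout assumed from the Open Firmware PCI bus binding (npt000ss bbbbbbbb dddddfff rrrrrrrr), where ss = 0b10 is 32-bit memory and ss = 0b11 is 64-bit memory. The SKETCH_* flag values and sketch_parse_of_flags() are hypothetical and only mimic the intent of pci_parse_of_flags(): the 64-bit flag is set only when the space code marks the range as 64-bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the kernel uses IORESOURCE_* instead. */
#define SKETCH_RES_IO       0x1u
#define SKETCH_RES_MEM      0x2u
#define SKETCH_RES_MEM_64   0x4u
#define SKETCH_RES_PREFETCH 0x8u

/* Decode the phys.hi cell of an OF PCI address. */
static unsigned int sketch_parse_of_flags(uint32_t addr0)
{
	unsigned int flags = 0;
	uint32_t ss = (addr0 >> 24) & 0x3;	/* address space code */

	switch (ss) {
	case 1:					/* I/O space */
		flags = SKETCH_RES_IO;
		break;
	case 2:					/* 32-bit memory space */
	case 3:					/* 64-bit memory space */
		flags = SKETCH_RES_MEM;
		if (ss == 3)			/* same as addr0 & 0x01000000 here */
			flags |= SKETCH_RES_MEM_64;
		if (addr0 & 0x40000000)		/* 'p' bit: prefetchable */
			flags |= SKETCH_RES_PREFETCH;
		break;
	default:				/* config space: no resource */
		break;
	}
	return flags;
}
```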

[PATCH v13 02/16] PCI: Remove __pci_mmap_make_offset()

2016-06-17 Thread Yinghai Lu
After "PCI: Let pci_mmap_page_range() take resource address", there are
no remaining users of __pci_mmap_make_offset() in these architectures.

Remove them.

Signed-off-by: Yinghai Lu 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
---
 arch/microblaze/pci/pci-common.c |  63 --
 arch/powerpc/kernel/pci-common.c |  63 --
 arch/sparc/kernel/pci.c  | 113 ---
 arch/xtensa/kernel/pci.c |  62 -
 4 files changed, 301 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 9e3bc05..e7cd0ab 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -156,69 +156,6 @@ void pcibios_set_master(struct pci_dev *dev)
  */
 
 /*
- * Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static struct resource *__pci_mmap_make_offset(struct pci_dev *dev,
-  resource_size_t *offset,
-  enum pci_mmap_state mmap_state)
-{
-   struct pci_controller *hose = pci_bus_to_host(dev->bus);
-   unsigned long io_offset = 0;
-   int i, res_bit;
-
-   if (!hose)
-   return NULL;/* should never happen */
-
-   /* If memory, add on the PCI bridge address offset */
-   if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-   *offset += hose->pci_mem_offset;
-#endif
-   res_bit = IORESOURCE_MEM;
-   } else {
-   io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-   *offset += io_offset;
-   res_bit = IORESOURCE_IO;
-   }
-
-   /*
-* Check that the offset requested corresponds to one of the
-* resources of the device.
-*/
-   for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-   struct resource *rp = &dev->resource[i];
-   int flags = rp->flags;
-
-   /* treat ROM as memory (should be already) */
-   if (i == PCI_ROM_RESOURCE)
-   flags |= IORESOURCE_MEM;
-
-   /* Active and same type? */
-   if ((flags & res_bit) == 0)
-   continue;
-
-   /* In the range of this resource? */
-   if (*offset < (rp->start & PAGE_MASK) || *offset > rp->end)
-   continue;
-
-   /* found it! construct the final physical address */
-   if (mmap_state == pci_mmap_io)
-   *offset += hose->io_base_phys - io_offset;
-   return rp;
-   }
-
-   return NULL;
-}
-
-/*
  * This one is used by /dev/mem and fbdev who have no clue about the
  * PCI device, it tries to find the PCI device first and calls the
  * above routine
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 53ba098..14c183d 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -293,69 +293,6 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
  */
 
 /*
- * Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static struct resource *__pci_mmap_make_offset(struct pci_dev *dev,
-  resource_size_t *offset,
-  enum pci_mmap_state mmap_state)
-{
-   struct pci_controller *hose = pci_bus_to_host(dev->bus);
-   unsigned long io_offset = 0;
-   int i, res_bit;
-
-   if (hose == NULL)
-   return NULL;/* should never happen */
-
-   /* If memory, add on the PCI bridge address offset */
-   if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-   *offset += hose->pci_mem_offset;
-#endif
-   res_bit = IORESOURCE_MEM;
-   } else {
-   io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-  

[PATCH v13 01/16] PCI: Let pci_mmap_page_range() take resource address

2016-06-17 Thread Yinghai Lu
Commit 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files") added
a check of the exposed value against the resource start/end in the proc
mmap path:

|start = vma->vm_pgoff;
|size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
|pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
|pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
|if (start >= pci_start && start < pci_start + size &&
|start + nr <= pci_start + size)
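To make the page arithmetic concrete, the check quoted above can be sketched as a standalone function (sketch_mmap_fits() and the 4 KiB page size are assumptions for illustration). All values are page numbers, and size rounds the BAR length up to whole pages. If start is derived from a BAR (bus) value while pci_start comes from the CPU-side resource address, as on sparc, the two are in different address spaces and a valid mapping can be rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Does a mapping of nr pages starting at page 'start' fit inside a
 * region of 'len' bytes whose first page is 'pci_start'? */
static bool sketch_mmap_fits(uint64_t start, uint64_t nr,
			     uint64_t pci_start, uint64_t len)
{
	uint64_t size;

	assert(len > 0);
	size = ((len - 1) >> SKETCH_PAGE_SHIFT) + 1;	/* length in pages */
	return start >= pci_start && start < pci_start + size &&
	       start + nr <= pci_start + size;
}
```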

That breaks sparc, where the exposed value is the BAR value and needs to
be offset to get the resource address.

The original pci_mmap_page_range() takes the PCI BAR value, aka the user address.

Bjorn found that it would be much simpler to pass the resource address
directly and avoid the extra __pci_mmap_make_offset() step.

In this patch:
1. In the proc path, proc_bus_pci_mmap() converts the address back to a
   resource address before calling pci_mmap_page_range().
2. In the sysfs path, pci_mmap_resource() just offsets by the resource start.
3. All pci_mmap_page_range() implementations get vma->vm_pgoff within the
   resource range instead of the BAR value.
4. Skip calling __pci_mmap_make_offset(), as the check is done
   in pci_mmap_fits().
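For the I/O case, the per-arch code now only has to turn a resource address into a CPU physical address, as the microblaze and powerpc hunks below do. A sketch of that arithmetic, assuming a simplified host bridge (struct sketch_hose and the io_base parameter, which stands in for _IO_BASE, are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of struct pci_controller. */
struct sketch_hose {
	uint64_t io_base_phys;	/* CPU physical address of the I/O window */
	uint64_t io_base_virt;	/* kernel virtual address it is mapped at */
};

/* Convert an I/O resource address (relative to io_base) into the CPU
 * physical address that would end up in vma->vm_pgoff. */
static uint64_t sketch_io_resource_to_phys(const struct sketch_hose *hose,
					   uint64_t io_base, uint64_t offset)
{
	/* Offset of this bridge's window from the global I/O base. */
	uint64_t io_offset = hose->io_base_virt - io_base;

	assert(offset >= io_offset);	/* port must belong to this bridge */
	return (offset - io_offset) + hose->io_base_phys;
}
```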

-v2: add pci_user_to_resource and remove __pci_mmap_make_offset
-v3: pass resource pointer with pci_mmap_page_range()
-v4: move the __pci_mmap_make_offset() removal to the following patch
 separate the /sys I/O access alignment check into another patch
 update after Bjorn's pci_resource_to_user() changes.

Signed-off-by: Yinghai Lu 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
---
 arch/microblaze/pci/pci-common.c | 11 +---
 arch/powerpc/kernel/pci-common.c | 11 +---
 arch/sparc/kernel/pci.c  |  4 ---
 arch/xtensa/kernel/pci.c | 13 +++--
 drivers/pci/pci-sysfs.c  | 23 +--
 drivers/pci/pci.h|  2 +-
 drivers/pci/proc.c   | 60 ++--
 7 files changed, 91 insertions(+), 33 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 81556b8..9e3bc05 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -282,12 +282,15 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 {
resource_size_t offset =
((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;
-   struct resource *rp;
int ret;
 
-   rp = __pci_mmap_make_offset(dev, &offset, mmap_state);
-   if (rp == NULL)
-   return -EINVAL;
+   if (mmap_state == pci_mmap_io) {
+   struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+   /* hose should never be NULL */
+   offset += hose->io_base_phys -
+((unsigned long)hose->io_base_virt - _IO_BASE);
+   }
 
vma->vm_pgoff = offset >> PAGE_SHIFT;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 6de6e0e..53ba098 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -420,12 +420,15 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 {
resource_size_t offset =
((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;
-   struct resource *rp;
int ret;
 
-   rp = __pci_mmap_make_offset(dev, &offset, mmap_state);
-   if (rp == NULL)
-   return -EINVAL;
+   if (mmap_state == pci_mmap_io) {
+   struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+   /* hose should never be NULL */
+   offset += hose->io_base_phys -
+ ((unsigned long)hose->io_base_virt - _IO_BASE);
+   }
 
vma->vm_pgoff = offset >> PAGE_SHIFT;
if (write_combine)
diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 9c1878f..5f2d78e 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -868,10 +868,6 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 {
int ret;
 
-   ret = __pci_mmap_make_offset(dev, vma, mmap_state);
-   if (ret < 0)
-   return ret;
-
__pci_mmap_set_pgprot(dev, vma, mmap_state);
 
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
diff --git a/arch/xtensa/kernel/pci.c b/arch/xtensa/kernel/pci.c
index b848cc3..4c5f1fa 100644
--- a/arch/xtensa/kernel/pci.c
+++ b/arch/xtensa/kernel/pci.c
@@ -366,11 +366,18 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
enum pci_mmap_state mmap_state,
int write_combine)
 {
+   unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
int ret;
 
-   ret = __pci_mmap_make_offset(dev, vma, mmap_state);
-   if (ret < 0)
-   return ret;
+   if (mmap_state == pci_mmap_io) {
+ 

[PATCH v13 09/16] powerpc/PCI: Keep resource idx order with bridge register number

2016-06-17 Thread Yinghai Lu
This is the same as the sparc version.

Keep the bridge resources in a consistent order, as other architectures
do (or as pci_read_bridge_bases() produces directly), even when the
non-prefetchable MMIO window is missing or the firmware reports the
windows out of order.

Reserve index 1 for the non-prefetchable MMIO window and index 2 for
the prefetchable MMIO window.
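The slot selection can be sketched as follows (sketch_pick_slot(), the SKETCH_* flags and the four-slot limit are hypothetical; the real code works on bus->resource[] and IORESOURCE_* flags). The I/O window in slot 0 is not modelled here. Whatever order the firmware reports the windows in, prefetchable MMIO lands in slot 2 and non-prefetchable MMIO in slot 1, with extra windows starting at slot 3:

```c
#include <assert.h>

#define SKETCH_RES_MEM       0x2u
#define SKETCH_RES_PREFETCH  0x8u
#define SKETCH_NR_BRIDGE_RES 4	/* assumption: bridge windows 0..3 */

/* used[] marks occupied slots; *next is the next overflow slot.
 * Returns the chosen index, or -1 when no slot is left. */
static int sketch_pick_slot(unsigned int flags, const int used[], int *next)
{
	assert(*next >= 3);
	if ((flags & SKETCH_RES_PREFETCH) && !used[2])
		return 2;				/* prefetchable MMIO */
	if ((flags & (SKETCH_RES_MEM | SKETCH_RES_PREFETCH)) == SKETCH_RES_MEM &&
	    !used[1])
		return 1;				/* non-prefetchable MMIO */
	if (*next >= SKETCH_NR_BRIDGE_RES)
		return -1;				/* too many memory ranges */
	return (*next)++;
}
```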

Signed-off-by: Yinghai Lu 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/pci_of_scan.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 526ac67..719f225 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -252,7 +252,7 @@ void of_scan_pci_bridge(struct pci_dev *dev)
bus->resource[i] = res;
++res;
}
-   i = 1;
+   i = 3;
for (; len >= 32; len -= 32, ranges += 8) {
flags = pci_parse_of_flags(of_read_number(ranges, 1), 1);
size = of_read_number(&ranges[6], 2);
@@ -265,6 +265,12 @@ void of_scan_pci_bridge(struct pci_dev *dev)
   " for bridge %s\n", node->full_name);
continue;
}
+   } else if ((flags & IORESOURCE_PREFETCH) &&
+  !bus->resource[2]->flags) {
+   res = bus->resource[2];
+   } else if (((flags & (IORESOURCE_MEM | IORESOURCE_PREFETCH)) ==
+   IORESOURCE_MEM) && !bus->resource[1]->flags) {
+   res = bus->resource[1];
} else {
if (i >= PCI_NUM_RESOURCES - PCI_BRIDGE_RESOURCES) {
printk(KERN_ERR "PCI: too many memory ranges"
-- 
2.8.3


Re: [RFC 12/18] limits: track RLIMIT_MEMLOCK actual max

2016-06-17 Thread Doug Ledford
On 6/13/2016 3:44 PM, Topi Miettinen wrote:
> Track maximum size of locked memory, presented in /proc/self/limits.

You should probably have Cc:ed everyone on the cover letter and on
patch 1 of this series.  This patch is hard to decipher without the
additional context of those items.  That said, I think I see
what you are doing.  But the wording of your comments below is bad:

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index feb9bb7..d3f3c9f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -3378,10 +3378,16 @@ static inline unsigned long rlimit_max(unsigned int limit)
>   return task_rlimit_max(current, limit);
>  }
>  
> +static inline void task_bump_rlimit(struct task_struct *tsk,
> + unsigned int limit, unsigned long r)
> +{
> + if (READ_ONCE(tsk->signal->rlim_curmax[limit]) < r)
> + tsk->signal->rlim_curmax[limit] = r;
> +}
> +
>  static inline void bump_rlimit(unsigned int limit, unsigned long r)
>  {
> - if (READ_ONCE(current->signal->rlim_curmax[limit]) < r)
> - current->signal->rlim_curmax[limit] = r;
> + return task_bump_rlimit(current, limit, r);
>  }
>  
>  #ifdef CONFIG_CPU_FREQ
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 46ecce4..192001e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -76,6 +76,9 @@ static int bpf_map_charge_memlock(struct bpf_map *map)
>   return -EPERM;
>   }
>   map->user = user;
> + /* XXX resource limits apply per task, not per user */
> + bump_rlimit(RLIMIT_MEMLOCK, atomic_long_read(&user->locked_vm) <<
> + PAGE_SHIFT);

No, these resource limits do not apply per task.  They are per user.
However, you are doing maximum  usage accounting on a per-task basis by
adding a new counter to the signal struct of the task.  Fine, but your
comments need to reflect that instead of the confusing comment above.
In addition, your function name is horrible for what you are doing.  A
person reading this function will think that you are bumping the actual
rlimit on the task, which is not what you are doing.  You are performing
per-task accounting of MEMLOCK memory.  The actual permission checks are
per-user, and the primary accounting is per-user.  So, really, this is
just a nice little feature that provides a more granular per-task usage
(but not control) so a user can see where their overall memlock memory
is being used.  Fine.  I would reword the comment to something like this:

/* XXX resource is tracked and limit enforced on a per user basis,
   but we track it on a per-task basis as well so users can identify
   hogs of this resource, stats can be found in /proc//limits */

And I would rename bump_rlimit and task_bump_rlimit to something like
account_rlimit and task_account_rlimit.  Calling it bump just gives the
wrong idea entirely on first read.
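A minimal sketch of the semantics described above, using the suggested task_account_rlimit() naming (struct sketch_signal and SKETCH_RLIM_NLIMITS are hypothetical stand-ins for struct signal_struct and RLIM_NLIMITS). It only records a high-water mark and never changes the limit itself; unlike the RFC it is single-threaded, so it omits READ_ONCE():

```c
#include <assert.h>

#define SKETCH_RLIM_NLIMITS 16	/* assumption: stand-in for RLIM_NLIMITS */

/* Hypothetical per-process stats, like the RFC's rlim_curmax[] array. */
struct sketch_signal {
	unsigned long rlim_curmax[SKETCH_RLIM_NLIMITS];
};

/* Record the largest usage seen for 'limit'. Enforcement stays
 * elsewhere (per user); this is accounting only. */
static void sketch_task_account_rlimit(struct sketch_signal *sig,
				       unsigned int limit, unsigned long r)
{
	assert(limit < SKETCH_RLIM_NLIMITS);
	if (sig->rlim_curmax[limit] < r)
		sig->rlim_curmax[limit] = r;
}
```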

>   return 0;
>  }
>  
> @@ -601,6 +604,9 @@ static int bpf_prog_charge_memlock(struct bpf_prog *prog)
>   return -EPERM;
>   }
>   prog->aux->user = user;
> + /* XXX resource limits apply per task, not per user */
> + bump_rlimit(RLIMIT_MEMLOCK, atomic_long_read(&user->locked_vm) <<
> + PAGE_SHIFT);
>   return 0;
>  }

> @@ -798,6 +802,9 @@ int user_shm_lock(size_t size, struct user_struct *user)
>   get_uid(user);
>   user->locked_shm += locked;
>   allowed = 1;
> +
> + /* XXX resource limits apply per task, not per user */
> + bump_rlimit(RLIMIT_MEMLOCK, user->locked_shm << PAGE_SHIFT);
>  out:
>   spin_unlock(&shmlock_user_lock);
>   return allowed;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0963e7f..4e683dd 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2020,6 +2020,9 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
>   return -ENOMEM;
>  
>   bump_rlimit(RLIMIT_STACK, actual_size);
> + if (vma->vm_flags & VM_LOCKED)
> + bump_rlimit(RLIMIT_MEMLOCK,
> + (mm->locked_vm + grow) << PAGE_SHIFT);
>  
>   return 0;
>  }
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 1f157ad..ade3e13 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -394,6 +394,9 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
>   *p = charged;
>   }
>  
> + if (vma->vm_flags & VM_LOCKED)
> + bump_rlimit(RLIMIT_MEMLOCK, (mm->locked_vm << PAGE_SHIFT) +
> + new_len - old_len);
>   return vma;
>  }
>  
> 


-- 
Doug Ledford 
GPG Key ID: 0E572FDD




[PATCH] ppc: Fix BPF JIT for ABIv2

2016-06-17 Thread Thadeu Lima de Souza Cascardo
On Fri, Jun 17, 2016 at 10:53:21PM +1000, Michael Ellerman wrote:
> On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> > diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> > new file mode 100644
> > index 0000000..954ff53
> > --- /dev/null
> > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > @@ -0,0 +1,956 @@
> ...
> > +
> > +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> > +{
> > +   int *p = area;
> > +
> > +   /* Fill whole space with trap instructions */
> > +   while (p < (int *)((char *)area + size))
> > +   *p++ = BREAKPOINT_INSTRUCTION;
> > +}
> 
> This breaks the build for some configs, presumably you're missing a header:
> 
>   arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 'BREAKPOINT_INSTRUCTION' undeclared (first use in this function)
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
> 
> cheers
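For reference, a userspace sketch of what the fill loop does, assuming BREAKPOINT_INSTRUCTION expands to the PowerPC trap instruction 0x7fe00008 (tw 31,0,0); SKETCH_TRAP_INSN and sketch_fill_ill_insns() are hypothetical names. This version also stops at the last whole word, so it cannot write past the end of the area even if size is not a multiple of 4:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: encoding of "trap" (tw 31,0,0) on PowerPC. */
#define SKETCH_TRAP_INSN 0x7fe00008u

/* Fill a JIT area with trap instructions so a stray branch into unused
 * space faults instead of executing leftover bytes. Trailing bytes that
 * do not form a whole instruction word are left untouched. */
static void sketch_fill_ill_insns(void *area, size_t size)
{
	uint32_t *p = area;
	uint32_t *end = p + size / sizeof(*p);

	assert(area != NULL || size == 0);
	while (p < end)
		*p++ = SKETCH_TRAP_INSN;
}
```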

Hi, Michael and Naveen.

I noticed independently that there is a problem with BPF JIT and ABIv2, and
worked out the patch below before I noticed Naveen's patchset and the latest
changes in ppc tree for a better way to check for ABI versions.

However, since the issue described below affects mainline and stable kernels,
would you consider applying this fix before merging your two patchsets, so that
we can backport it more easily?

Thanks.
Cascardo.

---
From a984dc02b6317a1d3a3c2302385adba5227be5bd Mon Sep 17 00:00:00 2001
From: Thadeu Lima de Souza Cascardo 
Date: Wed, 15 Jun 2016 13:22:12 -0300
Subject: [PATCH] ppc: Fix BPF JIT for ABIv2

ABIv2, used for ppc64le, does not use function descriptors. Without this
patch, enabling the BPF JIT results in the crash below.

[root@ibm-p8-kvm-05-guest-02 ~]# echo 2 > /proc/sys/net/core/bpf_jit_enable
[root@ibm-p8-kvm-05-guest-02 ~]# tcpdump -n -i eth0 tcp port 22
device eth0 entered promiscuous mode
Pass 1: shrink = 0, seen = 0x0
Pass 2: shrink = 0, seen = 0x0
flen=1 proglen=8 pass=3 image=d000000005bb9018 from=tcpdump pid=11387
JIT code: 00000000: 00 00 60 38 20 00 80 4e
Pass 1: shrink = 0, seen = 0x3
Pass 2: shrink = 0, seen = 0x3
flen=20 proglen=524 pass=3 image=d000000005bbd018 from=tcpdump pid=11387
JIT code: 00000000: a6 02 08 7c 10 00 01 f8 70 ff c1 f9 78 ff e1 f9
JIT code: 00000010: e1 fe 21 f8 7c 00 e3 80 78 00 e3 81 50 78 e7 7d
JIT code: 00000020: c8 00 c3 e9 00 00 a0 38 00 c0 e0 3c c6 07 e7 78
JIT code: 00000030: 08 00 e7 64 54 1b e7 60 a6 03 e8 7c 0c 00 c0 38
JIT code: 00000040: 21 00 80 4e b0 01 80 41 00 00 00 60 dd 86 e0 38
JIT code: 00000050: 01 00 e7 3c 40 38 04 7c 9c 00 82 40 00 00 00 60
JIT code: 00000060: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 70 1b e7 60
JIT code: 00000070: a6 03 e8 7c 14 00 c0 38 21 00 80 4e 78 01 80 41
JIT code: 00000080: 00 00 00 60 06 00 04 28 68 01 82 40 00 00 00 60
JIT code: 00000090: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 54 1b e7 60
JIT code: 000000a0: a6 03 e8 7c 36 00 c0 38 21 00 80 4e 48 01 80 41
JIT code: 000000b0: 00 00 00 60 16 00 04 28 2c 01 82 41 00 00 00 60
JIT code: 000000c0: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 54 1b e7 60
JIT code: 000000d0: a6 03 e8 7c 38 00 c0 38 21 00 80 4e 18 01 80 41
JIT code: 000000e0: 00 00 00 60 16 00 04 28 fc 00 82 41 00 00 00 60
JIT code: 000000f0: 00 01 00 48 00 08 04 28 f8 00 82 40 00 00 00 60
JIT code: 00000100: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 70 1b e7 60
JIT code: 00000110: a6 03 e8 7c 17 00 c0 38 21 00 80 4e d8 00 80 41
JIT code: 00000120: 00 00 00 60 06 00 04 28 c8 00 82 40 00 00 00 60
JIT code: 00000130: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 54 1b e7 60
JIT code: 00000140: a6 03 e8 7c 14 00 c0 38 21 00 80 4e a8 00 80 41
JIT code: 00000150: 00 00 00 60 ff 1f 87 70 98 00 82 40 00 00 00 60
JIT code: 00000160: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 88 1b e7 60
JIT code: 00000170: a6 03 e8 7c 0e 00 c0 38 21 00 80 4e 78 00 80 41
JIT code: 00000180: 00 00 00 60 00 c0 e0 3c c6 07 e7 78 08 00 e7 64
JIT code: 00000190: 4c 1b e7 60 a6 03 e8 7c 0e 00 c5 38 21 00 80 4e
JIT code: 000001a0: 54 00 80 41 00 00 00 60 16 00 04 28 38 00 82 41
JIT code: 000001b0: 00 00 00 60 00 c0 e0 3c c6 07 e7 78 08 00 e7 64
JIT code: 000001c0: 4c 1b e7 60 a6 03 e8 7c 10 00 c5 38 21 00 80 4e
JIT code: 000001d0: 24 00 80 41 00 00 00 60 16 00 04 28 14 00 82 40
JIT code: 000001e0: 00 00 00 60 ff ff 60 38 01 00 63 3c 08 00 00 48
JIT code: 000001f0: 00 00 60 38 20 01 21 38 10 00 01 e8 a6 03 08 7c
JIT code: 00000200: 70 ff c1 e9 78 ff e1 e9 20 00 80 4e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
Oops: Exception in kernel mode, sig: 4 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in: virtio_balloon nfsd ip_tables x_tables autofs4 xfs libcrc32c 
virtio_console virtio_net virtio_pci virtio_ring virtio
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-00009-gdb06d75 #1
task: c0000004a9254500 ti: c0000004bffe4000 task.ti: c0000004a9260000
NIP: d000000005bbd000 LR: c0000000008bcad8 CTR: d000000005bbd000
REGS: c004bffe7

Re: [PATCH 12/12] leds: Only descend into leds directory when CONFIG_NEW_LEDS is set

2016-06-17 Thread Andrew F. Davis
On 06/15/2016 01:48 AM, Jacek Anaszewski wrote:
> Hi Andrew,
> 
> Thanks for the patch.
> 
> Please address the issue [1] raised by test bot and resubmit.
> 
> Thanks,
> Jacek Anaszewski
> 
> [1] https://lkml.org/lkml/2016/6/13/1091
> 

It looks like some systems use 'gpio_led_register_device' to make an
in-memory copy of their LED device table so the original can be removed
as .init.rodata. This doesn't necessarily depend on the LED subsystem
but it kind of seems useless when the rest of the subsystem is disabled.

One solution could be to use a dummy 'gpio_led_register_device' when the
subsystem is not enabled. Another is just to remove the five or so uses
of 'gpio_led_register_device' and have those systems register LED device
tables like other systems do.

If neither of these is acceptable, this patch can be dropped from
this series for now.
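The first option could look roughly like the sketch below, assuming a config symbol and simplified types (SKETCH_CONFIG_NEW_LEDS, struct sketch_led_pdata, struct sketch_pdev and sketch_gpio_led_register_device() are hypothetical stand-ins for CONFIG_NEW_LEDS, the LED platform data and gpio_led_register_device()). A kernel-side stub would more likely return ERR_PTR(-ENODEV); this userspace version returns NULL and sets errno:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real kernel types. */
struct sketch_led_pdata { int num_leds; };
struct sketch_pdev { int id; };

#ifdef SKETCH_CONFIG_NEW_LEDS
/* Real implementation provided by the LED subsystem when enabled. */
struct sketch_pdev *sketch_gpio_led_register_device(int id,
		const struct sketch_led_pdata *pdata);
#else
/* Dummy used when the LED subsystem is disabled: board code still
 * compiles and links, and the caller learns nothing was registered. */
static inline struct sketch_pdev *
sketch_gpio_led_register_device(int id, const struct sketch_led_pdata *pdata)
{
	(void)id;
	(void)pdata;
	errno = ENODEV;
	return NULL;
}
#endif
```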

Thanks,
Andrew

> On 06/13/2016 10:02 PM, Andrew F. Davis wrote:
>> When CONFIG_NEW_LEDS is not set, make will still descend into the leds
>> directory even though nothing will be built. This produces unneeded build
>> artifacts and messages in addition to slowing the build. Fix this here.
>>
>> Signed-off-by: Andrew F. Davis 
>> ---
>>   drivers/Makefile | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/Makefile b/drivers/Makefile
>> index 567e32c..fa514d5 100644
>> --- a/drivers/Makefile
>> +++ b/drivers/Makefile
>> @@ -127,7 +127,7 @@ obj-$(CONFIG_CPU_FREQ)+= cpufreq/
>>   obj-$(CONFIG_CPU_IDLE)+= cpuidle/
>>   obj-$(CONFIG_MMC)+= mmc/
>>   obj-$(CONFIG_MEMSTICK)+= memstick/
>> -obj-y+= leds/
>> +obj-$(CONFIG_NEW_LEDS)+= leds/
>>   obj-$(CONFIG_INFINIBAND)+= infiniband/
>>   obj-$(CONFIG_SGI_SN)+= sn/
>>   obj-y+= firmware/
>>
> 
> 

Re: [PATCH] tracing: Expose CPU physical addresses (resource values) for PCI devices

2016-06-17 Thread Benjamin Herrenschmidt
On Fri, 2016-06-17 at 17:59 -0400, Steven Rostedt wrote:
> Sorry for the late reply, this patch got pushed down in my INBOX.
> 
> Could I get someone from PPC to review this patch, just to be safe?

The patch makes sense. I can try to get somebody onto porting
mmiotrace one of these days.

Cheers,
Ben.

> Thanks!
> 
> -- Steve
> 
> 
> 
> On Wed, 11 May 2016 14:06:57 -0500
> Bjorn Helgaas  wrote:
> 
> > Previously, mmio_print_pcidev() put "user" addresses in the trace buffer.
> > On most architectures, these are the same as CPU physical addresses, but on
> > microblaze, mips, powerpc, and sparc, they may be something else, typically
> > a raw BAR value (a bus address as opposed to a CPU address).
> > 
> > Always expose the CPU physical address to avoid this arch-dependent
> > behavior.
> > 
> > This change should have no user-visible effect because this file currently
> > depends on CONFIG_HAVE_MMIOTRACE_SUPPORT, which is only defined for x86,
> > and pci_resource_to_user() is a no-op on x86.
> > 
> > Signed-off-by: Bjorn Helgaas 
> > ---
> >  kernel/trace/trace_mmiotrace.c |   10 +++---
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/trace/trace_mmiotrace.c b/kernel/trace/trace_mmiotrace.c
> > index 68f376c..cd7480d 100644
> > --- a/kernel/trace/trace_mmiotrace.c
> > +++ b/kernel/trace/trace_mmiotrace.c
> > @@ -68,19 +68,15 @@ static void mmio_print_pcidev(struct trace_seq *s, const struct pci_dev *dev)
> >     trace_seq_printf(s, "PCIDEV %02x%02x %04x%04x %x",
> >      dev->bus->number, dev->devfn,
> >      dev->vendor, dev->device, dev->irq);
> > -   /*
> > -    * XXX: is pci_resource_to_user() appropriate, since we are
> > -    * supposed to interpret the __ioremap() phys_addr argument based on
> > -    * these printed values?
> > -    */
> >     for (i = 0; i < 7; i++) {
> > -   pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
> > +   start = dev->resource[i].start;
> >     trace_seq_printf(s, " %llx",
> >     (unsigned long long)(start |
> >     (dev->resource[i].flags & PCI_REGION_FLAG_MASK)));
> >     }
> >     for (i = 0; i < 7; i++) {
> > -   pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
> > +   start = dev->resource[i].start;
> > +   end = dev->resource[i].end;
> >     trace_seq_printf(s, " %llx",
> >     dev->resource[i].start < dev->resource[i].end ?
> >     (unsigned long long)(end - start) + 1 : 0);
> 


Re: [PATCH] tracing: Expose CPU physical addresses (resource values) for PCI devices

2016-06-17 Thread Steven Rostedt

Sorry for the late reply, this patch got pushed down in my INBOX.

Could I get someone from PPC to review this patch, just to be safe?

Thanks!

-- Steve



On Wed, 11 May 2016 14:06:57 -0500
Bjorn Helgaas  wrote:

> Previously, mmio_print_pcidev() put "user" addresses in the trace buffer.
> On most architectures, these are the same as CPU physical addresses, but on
> microblaze, mips, powerpc, and sparc, they may be something else, typically
> a raw BAR value (a bus address as opposed to a CPU address).
> 
> Always expose the CPU physical address to avoid this arch-dependent
> behavior.
> 
> This change should have no user-visible effect because this file currently
> depends on CONFIG_HAVE_MMIOTRACE_SUPPORT, which is only defined for x86,
> and pci_resource_to_user() is a no-op on x86.
> 
> Signed-off-by: Bjorn Helgaas 
> ---
>  kernel/trace/trace_mmiotrace.c |   10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/trace/trace_mmiotrace.c b/kernel/trace/trace_mmiotrace.c
> index 68f376c..cd7480d 100644
> --- a/kernel/trace/trace_mmiotrace.c
> +++ b/kernel/trace/trace_mmiotrace.c
> @@ -68,19 +68,15 @@ static void mmio_print_pcidev(struct trace_seq *s, const struct pci_dev *dev)
>   trace_seq_printf(s, "PCIDEV %02x%02x %04x%04x %x",
>dev->bus->number, dev->devfn,
>dev->vendor, dev->device, dev->irq);
> - /*
> -  * XXX: is pci_resource_to_user() appropriate, since we are
> -  * supposed to interpret the __ioremap() phys_addr argument based on
> -  * these printed values?
> -  */
>   for (i = 0; i < 7; i++) {
> - pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
> + start = dev->resource[i].start;
>   trace_seq_printf(s, " %llx",
>   (unsigned long long)(start |
>   (dev->resource[i].flags & PCI_REGION_FLAG_MASK)));
>   }
>   for (i = 0; i < 7; i++) {
> - pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
> + start = dev->resource[i].start;
> + end = dev->resource[i].end;
>   trace_seq_printf(s, " %llx",
>   dev->resource[i].start < dev->resource[i].end ?
>   (unsigned long long)(end - start) + 1 : 0);
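A minimal userspace sketch of the two values the patched mmio_print_pcidev() emits for each BAR, now taken straight from the resource fields instead of from pci_resource_to_user(). `DEMO_REGION_FLAG_MASK` is an assumption mirroring the kernel's `PCI_REGION_FLAG_MASK` (0x0f), and `struct demo_resource` is a hypothetical stand-in for `struct resource`; the tracer itself prints both values with " %llx".

```c
#include <stdint.h>

/* Assumption: mirrors PCI_REGION_FLAG_MASK (0x0f) from include/linux/pci.h. */
#define DEMO_REGION_FLAG_MASK 0x0fULL

/* Hypothetical stand-in for the struct resource fields the tracer reads. */
struct demo_resource {
	uint64_t start;
	uint64_t end;   /* inclusive, as in struct resource */
	uint64_t flags;
};

/* First loop: CPU physical start address with the low region flag bits ORed in. */
static uint64_t demo_bar_addr(const struct demo_resource *r)
{
	return r->start | (r->flags & DEMO_REGION_FLAG_MASK);
}

/* Second loop: BAR size, or 0 for an unset/empty resource. */
static uint64_t demo_bar_size(const struct demo_resource *r)
{
	return r->start < r->end ? (r->end - r->start) + 1 : 0;
}
```

On x86, where pci_resource_to_user() is a no-op, these are the same values the old code printed, which is why the changelog expects no user-visible effect.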


Re: [PATCH v2 2/9] kexec_file: Generalize kexec_add_buffer.

2016-06-17 Thread Thiago Jung Bauermann
On Friday, 17 June 2016, 15:35:23, Dave Young wrote:
> On 06/16/16 at 05:39pm, Thiago Jung Bauermann wrote:
> > On Thursday, 16 June 2016, 09:58:53, Dave Young wrote:
> > > On 06/15/16 at 01:21pm, Thiago Jung Bauermann wrote:
> > > > +int __weak arch_kexec_walk_mem(unsigned int image_type, bool top_down,
> > > > +  void *data, int (*func)(u64, u64, void *))
> > > > +{
> > > 
> > > top_down is also not used?
> > 
> > It's unused in the default implementation, but the powerpc implementation
> > in patch 8 uses it:
> Well, arch_kexec_walk_mem uses kbuf as "data", so you can even drop
> "image_type", since kbuf has everything you want: kbuf->image->type and
> kbuf->top_down.
> 
> int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
>  int (*func)(u64, u64, void *))

Sounds good to me, but I had to move struct kexec_buf from
kernel/kexec_internal.h to include/linux/kexec.h.

Here's the updated patch. What do you think?

[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


kexec_file: Generalize kexec_add_buffer.

Allow architectures to specify different memory walking functions for
kexec_add_buffer. Intel uses iomem to track reserved memory ranges,
but PowerPC uses the memblock subsystem.

Signed-off-by: Thiago Jung Bauermann 
Cc: Eric Biederman 
Cc: Dave Young 
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index e8acb2b43dd9..d8df01107ae2 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -201,6 +201,20 @@ struct kimage {
 #endif
 };
 
+/*
+ * Keeps track of buffer parameters as provided by caller for requesting
+ * memory placement of buffer.
+ */
+struct kexec_buf {
+   struct kimage *image;
+   unsigned long mem;
+   unsigned long memsz;
+   unsigned long buf_align;
+   unsigned long buf_min;
+   unsigned long buf_max;
+   bool top_down;  /* allocate from top of memory hole */
+};
+
 /* kexec interface functions */
 extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
@@ -315,6 +329,8 @@ int __weak arch_kexec_apply_relocations_add(const Elf_Ehdr *ehdr,
       Elf_Shdr *sechdrs, unsigned int relsec);
 int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
       unsigned int relsec);
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+  int (*func)(u64, u64, void *));
 void arch_kexec_protect_crashkres(void);
 void arch_kexec_unprotect_crashkres(void);
 
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b6eec7527e9f..b1f1f6402518 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -428,6 +428,27 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
return locate_mem_hole_bottom_up(start, end, kbuf);
 }
 
+/**
+ * arch_kexec_walk_mem - call func(data) on free memory regions
+ * @kbuf:  Context info for the search. Also passed to @func.
+ * @func:  Function to call for each memory region.
+ *
+ * Return: The memory walk will stop when func returns a non-zero value
+ * and that value will be returned. If all free regions are visited without
+ * func returning non-zero, then zero will be returned.
+ */
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+  int (*func)(u64, u64, void *))
+{
+   if (kbuf->image->type == KEXEC_TYPE_CRASH)
+   return walk_iomem_res_desc(crashk_res.desc,
+  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+  crashk_res.start, crashk_res.end,
+  kbuf, func);
+   else
+   return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
+}
+
 /*
  * Helper function for placing a buffer in a kexec segment. This assumes
  * that kexec_mutex is held.
@@ -472,14 +493,7 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
kbuf->top_down = top_down;
 
/* Walk the RAM ranges and allocate a suitable range for the buffer */
-   if (image->type == KEXEC_TYPE_CRASH)
-   ret = walk_iomem_res_desc(crashk_res.desc,
-   IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
-   crashk_res.start, crashk_res.end, kbuf,
-   locate_mem_hole_callback);
-   else
-   ret = walk_system_ram_res(0, -1, kbuf,
- locate_mem_hole_callback);
+   ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
if (ret != 1) {
/* A suitable memory range could not be found for buffer */
return -EADDRNOTAVAIL;
diff --git a/kernel/kexec_internal.h b/kernel/kexec_interna
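The contract documented for arch_kexec_walk_mem() (stop at the first non-zero return from @func and propagate it; return 0 if every region was visited without one) can be illustrated with a small userspace sketch. Everything here is hypothetical: `struct demo_kbuf` is a trimmed stand-in for `struct kexec_buf`, the RAM map is a fixed array instead of iomem or memblock, and `demo_hole_cb()` is a simplified placeholder for locate_mem_hole_callback(), not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct kexec_buf. */
struct demo_kbuf {
	uint64_t memsz;      /* bytes needed */
	uint64_t buf_align;  /* power-of-two alignment */
	bool top_down;       /* search from the top of each region */
	uint64_t mem;        /* result: chosen start address */
};

/* Hypothetical RAM map; end is inclusive, as with struct resource. */
struct demo_ram_range { uint64_t start, end; };
static const struct demo_ram_range demo_ram[] = {
	{ 0x1000, 0x1fff },
	{ 0x10000, 0x1ffff },
};

/* Same contract as arch_kexec_walk_mem(): stop at the first non-zero return. */
static int demo_walk_mem(struct demo_kbuf *kbuf,
			 int (*func)(uint64_t, uint64_t, void *))
{
	for (size_t i = 0; i < sizeof(demo_ram) / sizeof(demo_ram[0]); i++) {
		int ret = func(demo_ram[i].start, demo_ram[i].end, kbuf);

		if (ret)
			return ret;
	}
	return 0;
}

/* Simplified placeholder for locate_mem_hole_callback(): 1 = found, 0 = keep going. */
static int demo_hole_cb(uint64_t start, uint64_t end, void *arg)
{
	struct demo_kbuf *kbuf = arg;
	uint64_t mask = kbuf->buf_align - 1;
	uint64_t cand;

	assert(kbuf->buf_align && (kbuf->buf_align & mask) == 0);
	if (end - start + 1 < kbuf->memsz)
		return 0;                                 /* region too small */
	if (kbuf->top_down)
		cand = (end + 1 - kbuf->memsz) & ~mask;   /* highest aligned fit */
	else
		cand = (start + mask) & ~mask;            /* lowest aligned fit */
	if (cand < start || cand + kbuf->memsz - 1 > end)
		return 0;
	kbuf->mem = cand;
	return 1;
}
```

Returning 1 from the callback is what kexec_add_buffer() tests for (`ret != 1` means no suitable range was found), so an architecture override mainly has to preserve this early-stop behaviour while walking its own memory map.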

Re: possible bug in powerpc LE compat syscalls with 64-bit args

2016-06-17 Thread Chris Metcalf

On 6/16/2016 5:42 PM, Andreas Schwab wrote:
> Chris Metcalf writes:
>
>> Reviewing what other platforms do, it seems like powerpc compat mode may
>> have the opposite problem in little-endian mode, since arguments are passed
>> in "hi, lo" order unconditionally in arch/powerpc/kernel/sys_ppc32.c.
>
> PPC32 is always big-endian.

Sounds good then.

Commit 422b9b9684db ("powerpc/compat: 32-bit little endian machine
name is ppcle, not ppc") made me think there was support for a
little-endian 32-bit mode.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
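For context on the "hi, lo" concern: a 32-bit caller passes a 64-bit syscall argument split across two 32-bit registers, and the compat code has to reassemble it in the order the ABI uses. A minimal sketch, assuming (hypothetically) that a little-endian 32-bit ABI would place the low word first; since PPC32 is always big-endian, only the "hi, lo" case applies on powerpc. `demo_merge_args()` is an illustrative helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rebuild a 64-bit syscall argument that a 32-bit
 * caller passed as two 32-bit registers. hi_first selects "hi, lo" order
 * (big-endian PPC32) versus the "lo, hi" order assumed here for a
 * little-endian 32-bit ABI.
 */
static uint64_t demo_merge_args(uint32_t first, uint32_t second, int hi_first)
{
	uint32_t hi = hi_first ? first : second;
	uint32_t lo = hi_first ? second : first;

	return ((uint64_t)hi << 32) | lo;
}
```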


[PATCH v1 4/4] sparc/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()

2016-06-17 Thread Bjorn Helgaas
"User" addresses are shown in /sys/devices/pci.../.../resource and
/proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F
files.  On sparc, these are PCI bus addresses, i.e., raw BAR values.

Previously pci_resource_to_user() computed the user address by
subtracting either pbm->io_space.start or pbm->mem_space.start from the
resource start.

We've already told the PCI core about those offsets here:

  pci_scan_one_pbm()
pci_add_resource_offset(&resources, &pbm->io_space, pbm->io_space.start);
pci_add_resource_offset(&resources, &pbm->mem_space, pbm->mem_space.start);
pci_add_resource_offset(&resources, &pbm->mem64_space, pbm->mem_space.start);

so pcibios_resource_to_bus() knows how to do that translation.

No functional change intended.

Signed-off-by: Bjorn Helgaas 
Acked-by: Yinghai Lu 
---
 arch/sparc/kernel/pci.c |   20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index c2b202d..9c1878f 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -986,16 +986,18 @@ void pci_resource_to_user(const struct pci_dev *pdev, int bar,
  const struct resource *rp, resource_size_t *start,
  resource_size_t *end)
 {
-   struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-   unsigned long offset;
-
-   if (rp->flags & IORESOURCE_IO)
-   offset = pbm->io_space.start;
-   else
-   offset = pbm->mem_space.start;
+   struct pci_bus_region region;
 
-   *start = rp->start - offset;
-   *end = rp->end - offset;
+   /*
+* "User" addresses are shown in /sys/devices/pci.../.../resource
+* and /proc/bus/pci/devices and used as mmap offsets for
+* /proc/bus/pci/BB/DD.F files (see proc_bus_pci_mmap()).
+*
+* On sparc, these are PCI bus addresses, i.e., raw BAR values.
+*/
+   pcibios_resource_to_bus(pdev->bus, &region, (struct resource *) rp);
+   *start = region.start;
+   *end = region.end;
 }
 
 void pcibios_set_master(struct pci_dev *dev)


[PATCH v1 3/4] powerpc/pci: Implement pci_resource_to_user() with pcibios_resource_to_bus()

2016-06-17 Thread Bjorn Helgaas
"User" addresses are shown in /sys/devices/pci.../.../resource and
/proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F
files.  For I/O port resources on powerpc, these are PCI bus addresses,
i.e., raw BAR values.

Previously pci_resource_to_user() computed the user address by subtracting
"hose->io_base_virt - _IO_BASE" from the resource start:

  pci_resource_to_user()
if (IO)
  offset = (unsigned long)hose->io_base_virt - _IO_BASE;
*start = rsrc->start - offset;

We've already told the PCI core about that "hose->io_base_virt - _IO_BASE"
offset:

  pcibios_setup_phb_resources()
res = &hose->io_resource;
offset = pcibios_io_space_offset();
/* i.e., "offset = hose->io_base_virt - _IO_BASE" */
pci_add_resource_offset(resources, res, offset);

so pcibios_resource_to_bus() knows how to do that translation.

No functional change intended.

Signed-off-by: Bjorn Helgaas 
Acked-by: Yinghai Lu 
---
 arch/powerpc/kernel/pci-common.c |   42 +-
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 8c6beb0..6de6e0e 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -581,39 +581,25 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
  const struct resource *rsrc,
  resource_size_t *start, resource_size_t *end)
 {
-   struct pci_controller *hose = pci_bus_to_host(dev->bus);
-   resource_size_t offset = 0;
+   struct pci_bus_region region;
 
-   if (hose == NULL)
+   if (rsrc->flags & IORESOURCE_IO) {
+   pcibios_resource_to_bus(dev->bus, &region,
+   (struct resource *) rsrc);
+   *start = region.start;
+   *end = region.end;
return;
+   }
 
-   if (rsrc->flags & IORESOURCE_IO)
-   offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-
-   /* We pass a fully fixed up address to userland for MMIO instead of
-* a BAR value because X is lame and expects to be able to use that
-* to pass to /dev/mem !
-*
-* That means that we'll have potentially 64 bits values where some
-* userland apps only expect 32 (like X itself since it thinks only
-* Sparc has 64 bits MMIO) but if we don't do that, we break it on
-* 32 bits CHRPs :-(
-*
-* Hopefully, the sysfs insterface is immune to that gunk. Once X
-* has been fixed (and the fix spread enough), we can re-enable the
-* 2 lines below and pass down a BAR value to userland. In that case
-* we'll also have to re-enable the matching code in
-* __pci_mmap_make_offset().
+   /* We pass a CPU physical address to userland for MMIO instead of a
+* BAR value because X is lame and expects to be able to use that
+* to pass to /dev/mem!
 *
-* BenH.
+* That means we may have 64-bit values where some apps only expect
+* 32 (like X itself since it thinks only Sparc has 64-bit MMIO).
 */
-#if 0
-   else if (rsrc->flags & IORESOURCE_MEM)
-   offset = hose->pci_mem_offset;
-#endif
-
-   *start = rsrc->start - offset;
-   *end = rsrc->end - offset;
+   *start = rsrc->start;
+   *end = rsrc->end;
 }
 
 /**


[PATCH v1 2/4] microblaze/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()

2016-06-17 Thread Bjorn Helgaas
"User" addresses are shown in /sys/devices/pci.../.../resource and
/proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F
files.  For I/O port resources on microblaze, these are PCI bus addresses,
i.e., raw BAR values.

Previously pci_resource_to_user() computed the user address by subtracting
"hose->io_base_virt - _IO_BASE" from the resource start:

  pci_resource_to_user()
if (IO)
  offset = (unsigned long)hose->io_base_virt - _IO_BASE;
*start = rsrc->start - offset;

We've already told the PCI core about that "hose->io_base_virt - _IO_BASE"
offset:

  pcibios_setup_phb_resources()
res = &hose->io_resource;
pci_add_resource_offset(resources, res, hose->io_base_virt - _IO_BASE);

so pcibios_resource_to_bus() knows how to do that translation.

No functional change intended.

Signed-off-by: Bjorn Helgaas 
Acked-by: Yinghai Lu 
---
 arch/microblaze/pci/pci-common.c |   42 +-
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 1974567..81556b8 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -444,39 +444,25 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
  const struct resource *rsrc,
  resource_size_t *start, resource_size_t *end)
 {
-   struct pci_controller *hose = pci_bus_to_host(dev->bus);
-   resource_size_t offset = 0;
+   struct pci_bus_region region;
 
-   if (hose == NULL)
+   if (rsrc->flags & IORESOURCE_IO) {
+   pcibios_resource_to_bus(dev->bus, &region,
+   (struct resource *) rsrc);
+   *start = region.start;
+   *end = region.end;
return;
+   }
 
-   if (rsrc->flags & IORESOURCE_IO)
-   offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-
-   /* We pass a fully fixed up address to userland for MMIO instead of
-* a BAR value because X is lame and expects to be able to use that
-* to pass to /dev/mem !
+   /* We pass a CPU physical address to userland for MMIO instead of a
+* BAR value because X is lame and expects to be able to use that
+* to pass to /dev/mem!
 *
-* That means that we'll have potentially 64 bits values where some
-* userland apps only expect 32 (like X itself since it thinks only
-* Sparc has 64 bits MMIO) but if we don't do that, we break it on
-* 32 bits CHRPs :-(
-*
-* Hopefully, the sysfs insterface is immune to that gunk. Once X
-* has been fixed (and the fix spread enough), we can re-enable the
-* 2 lines below and pass down a BAR value to userland. In that case
-* we'll also have to re-enable the matching code in
-* __pci_mmap_make_offset().
-*
-* BenH.
+* That means we may have 64-bit values where some apps only expect
+* 32 (like X itself since it thinks only Sparc has 64-bit MMIO).
 */
-#if 0
-   else if (rsrc->flags & IORESOURCE_MEM)
-   offset = hose->pci_mem_offset;
-#endif
-
-   *start = rsrc->start - offset;
-   *end = rsrc->end - offset;
+   *start = rsrc->start;
+   *end = rsrc->end;
 }
 
 /**


[PATCH v1 1/4] PCI: Unify pci_resource_to_user() declarations

2016-06-17 Thread Bjorn Helgaas
Replace the pci_resource_to_user() declarations in each arch that defines
HAVE_ARCH_PCI_RESOURCE_TO_USER with a single one in linux/pci.h.

Change the MIPS static inline implementation to a non-inline version so the
static inline doesn't conflict with the new non-static linux/pci.h
declaration.

No functional change intended.

Signed-off-by: Bjorn Helgaas 
---
 arch/microblaze/include/asm/pci.h |3 ---
 arch/mips/include/asm/pci.h   |   10 --
 arch/mips/pci/pci.c   |   10 ++
 arch/powerpc/include/asm/pci.h|3 ---
 arch/sparc/include/asm/pci_64.h   |3 ---
 include/linux/pci.h   |6 +-
 6 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/microblaze/include/asm/pci.h b/arch/microblaze/include/asm/pci.h
index fc3ecb5..2a120bb 100644
--- a/arch/microblaze/include/asm/pci.h
+++ b/arch/microblaze/include/asm/pci.h
@@ -82,9 +82,6 @@ extern pgprot_t   pci_phys_mem_access_prot(struct file *file,
 pgprot_t prot);
 
 #define HAVE_ARCH_PCI_RESOURCE_TO_USER
-extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
-const struct resource *rsrc,
-resource_size_t *start, resource_size_t *end);
 
 extern void pcibios_setup_bus_devices(struct pci_bus *bus);
 extern void pcibios_setup_bus_self(struct pci_bus *bus);
diff --git a/arch/mips/include/asm/pci.h b/arch/mips/include/asm/pci.h
index 86b239d..9b63cd4 100644
--- a/arch/mips/include/asm/pci.h
+++ b/arch/mips/include/asm/pci.h
@@ -80,16 +80,6 @@ extern int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 
 #define HAVE_ARCH_PCI_RESOURCE_TO_USER
 
-static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,
-   const struct resource *rsrc, resource_size_t *start,
-   resource_size_t *end)
-{
-   phys_addr_t size = resource_size(rsrc);
-
-   *start = fixup_bigphys_addr(rsrc->start, size);
-   *end = rsrc->start + size;
-}
-
 /*
  * Dynamic DMA mapping stuff.
  * MIPS has everything mapped statically.
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index f1b11f0..5717384 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -319,6 +319,16 @@ void pcibios_fixup_bus(struct pci_bus *bus)
 EXPORT_SYMBOL(PCIBIOS_MIN_IO);
 EXPORT_SYMBOL(PCIBIOS_MIN_MEM);
 
+void pci_resource_to_user(const struct pci_dev *dev, int bar,
+ const struct resource *rsrc, resource_size_t *start,
+ resource_size_t *end)
+{
+   phys_addr_t size = resource_size(rsrc);
+
+   *start = fixup_bigphys_addr(rsrc->start, size);
+   *end = rsrc->start + size;
+}
+
 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
enum pci_mmap_state mmap_state, int write_combine)
 {
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index a6f3ac0..e9bd6cf 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -136,9 +136,6 @@ extern pgprot_t pci_phys_mem_access_prot(struct file *file,
 pgprot_t prot);
 
 #define HAVE_ARCH_PCI_RESOURCE_TO_USER
-extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
-const struct resource *rsrc,
-resource_size_t *start, resource_size_t *end);
 
 extern resource_size_t pcibios_io_space_offset(struct pci_controller *hose);
 extern void pcibios_setup_bus_devices(struct pci_bus *bus);
diff --git a/arch/sparc/include/asm/pci_64.h b/arch/sparc/include/asm/pci_64.h
index 022d160..2303635 100644
--- a/arch/sparc/include/asm/pci_64.h
+++ b/arch/sparc/include/asm/pci_64.h
@@ -55,9 +55,6 @@ static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
 }
 
 #define HAVE_ARCH_PCI_RESOURCE_TO_USER
-void pci_resource_to_user(const struct pci_dev *dev, int bar,
- const struct resource *rsrc,
- resource_size_t *start, resource_size_t *end);
 #endif /* __KERNEL__ */
 
 #endif /* __SPARC64_PCI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b67e4df..9c201d4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1554,7 +1554,11 @@ static inline const char *pci_name(const struct pci_dev *pdev)
 /* Some archs don't want to expose struct resource to userland as-is
  * in sysfs and /proc
  */
-#ifndef HAVE_ARCH_PCI_RESOURCE_TO_USER
+#ifdef HAVE_ARCH_PCI_RESOURCE_TO_USER
+void pci_resource_to_user(const struct pci_dev *dev, int bar,
+ const struct resource *rsrc,
+ resource_size_t *start, resource_size_t *end);
+#else
 static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,
const struct resource *rsrc, resource_size_t *start,
resource_size_t *end)


[PATCH v1 0/4] PCI: pci_resource_to_user() cleanups

2016-06-17 Thread Bjorn Helgaas via Linuxppc-dev
The /sys/devices/pci.../.../resource and /proc/bus/pci/devices files
contain PCI BAR addresses.  On most architectures these addresses are
"resource" values, e.g., CPU physical memory addresses or Linux I/O port
numbers.  These may be offset from the raw PCI values if there are multiple
PCI host bridges.

On others (microblaze, mips, powerpc, sparc) they are raw PCI values as
they would appear on the PCI bus.  pci_resource_to_user() converts from the
struct resource to whatever the arch wants to expose.  It's a no-op on
most arches.

The PCI core provides a pcibios_resource_to_bus() function that converts
from struct resource values to raw PCI bus values.  These patches use that
when possible instead of the arch-specific hand-coded equivalent.

These shouldn't fix or break anything unless I've made a mistake.

---

Bjorn Helgaas (4):
  PCI: Unify pci_resource_to_user() declarations
  microblaze/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()
  powerpc/pci: Implement pci_resource_to_user() with pcibios_resource_to_bus()
  sparc/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()


 arch/microblaze/include/asm/pci.h |3 ---
 arch/microblaze/pci/pci-common.c  |   42 -
 arch/mips/include/asm/pci.h   |   10 -
 arch/mips/pci/pci.c   |   10 +
 arch/powerpc/include/asm/pci.h|3 ---
 arch/powerpc/kernel/pci-common.c  |   42 -
 arch/sparc/include/asm/pci_64.h   |3 ---
 arch/sparc/kernel/pci.c   |   20 ++
 include/linux/pci.h   |6 -
 9 files changed, 54 insertions(+), 85 deletions(-)
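The translation the series relies on can be sketched as follows: each host bridge window is registered with pci_add_resource_offset() together with an offset, and converting a resource to a bus address subtracts the offset of the window that contains it. This is a simplified userspace model under stated assumptions: `struct demo_window`, `struct demo_bus_region`, and `demo_resource_to_bus()` are hypothetical, and the real pcibios_resource_to_bus() walks the host bridge's window list using resource_contains(), which also compares resource types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host bridge window, as registered with pci_add_resource_offset(). */
struct demo_window {
	uint64_t start, end;  /* window in resource (CPU) space, inclusive */
	uint64_t offset;      /* resource address minus bus address */
};

/* Hypothetical stand-in for struct pci_bus_region. */
struct demo_bus_region { uint64_t start, end; };

/*
 * Simplified model of pcibios_resource_to_bus(): subtract the offset of the
 * first window containing the resource; with no match, the offset is 0.
 */
static void demo_resource_to_bus(const struct demo_window *win, size_t nwin,
				 uint64_t res_start, uint64_t res_end,
				 struct demo_bus_region *region)
{
	uint64_t offset = 0;

	for (size_t i = 0; i < nwin; i++) {
		if (res_start >= win[i].start && res_end <= win[i].end) {
			offset = win[i].offset;
			break;
		}
	}
	region->start = res_start - offset;
	region->end = res_end - offset;
}
```

With the window offset equal to the window's own start, as in the sparc pci_scan_one_pbm() calls, a resource at the bottom of the window maps to bus address 0, i.e., the raw BAR value that pci_resource_to_user() is expected to expose there.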

Re: [PATCH v1 2/3] powerpc/pci: Remove __pci_mmap_set_pgprot()

2016-06-17 Thread Bjorn Helgaas
On Thu, Jun 09, 2016 at 01:20:23PM -0500, Bjorn Helgaas wrote:
> From: Yinghai Lu 
> 
> The powerpc-specific __pci_mmap_set_pgprot() does two things:
> 
>   1) Disables write combining for I/O port space mappings
> 
>  This only affects procfs mappings.  The pci_mmap_resource() sysfs path
>  only requests write combining for resources with IORESOURCE_PREFETCH
>  set, which doesn't include I/O resources.
> 
>  The only way to request write combining for I/O port space mappings
>  was via the PCIIOC_WRITE_COMBINE ioctl and the proc_bus_pci_mmap()
>  path, and we recently changed that path to ignore write combining for
>  I/O, so this code in powerpc is no longer needed.
> 
>   2) Automatically enables write combining for mappings of prefetchable
>  resources, even if not requested by the user
> 
>  Both procfs (via PCIIOC_MMAP_IS_MEM and PCIIOC_WRITE_COMBINE ioctls)
>  and sysfs (via "resourceN_wc" files, which are created for resources
>  with IORESOURCE_PREFETCH) provide ways for the user to map PCI memory
>  space with write combining.
> 
>  Users that desire write combining should use one of those ways instead
>  of relying on powerpc-specific behavior.
> 
> Remove the powerpc-specific __pci_mmap_set_pgprot().
> 
> The user-visible effect of this change is that users mapping prefetchable
> PCI memory space via procfs without PCIIOC_WRITE_COMBINE or via sysfs
> "resourceN" (not "resourceN_wc") will get regular uncacheable mappings
> instead of the write combining mappings they used to get.
> 
> The new behavior matches the behavior on all other arches that support
> write combining mapping.

Powerpc folks, any thoughts on this?

It's currently on my pci/resource branch, and I plan to merge it for
v4.8 if there are no objections.

> [bhelgaas: changelog]
> Signed-off-by: Yinghai Lu 
> Signed-off-by: Bjorn Helgaas 
> ---
>  arch/powerpc/kernel/pci-common.c |   37 -
>  1 file changed, 4 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index 0f7a60f..8c6beb0 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -356,36 +356,6 @@ static struct resource *__pci_mmap_make_offset(struct pci_dev *dev,
>  }
>  
>  /*
> - * Set vm_page_prot of VMA, as appropriate for this architecture, for a pci
> - * device mapping.
> - */
> -static pgprot_t __pci_mmap_set_pgprot(struct pci_dev *dev, struct resource *rp,
> -   pgprot_t protection,
> -   enum pci_mmap_state mmap_state,
> -   int write_combine)
> -{
> -
> - /* Write combine is always 0 on non-memory space mappings. On
> -  * memory space, if the user didn't pass 1, we check for a
> -  * "prefetchable" resource. This is a bit hackish, but we use
> -  * this to workaround the inability of /sysfs to provide a write
> -  * combine bit
> -  */
> - if (mmap_state != pci_mmap_mem)
> - write_combine = 0;
> - else if (write_combine == 0) {
> - if (rp->flags & IORESOURCE_PREFETCH)
> - write_combine = 1;
> - }
> -
> - /* XXX would be nice to have a way to ask for write-through */
> - if (write_combine)
> - return pgprot_noncached_wc(protection);
> - else
> - return pgprot_noncached(protection);
> -}
> -
> -/*
>   * This one is used by /dev/mem and fbdev who have no clue about the
>   * PCI device, it tries to find the PCI device first and calls the
>   * above routine
> @@ -458,9 +428,10 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
>   return -EINVAL;
>  
>   vma->vm_pgoff = offset >> PAGE_SHIFT;
> - vma->vm_page_prot = __pci_mmap_set_pgprot(dev, rp,
> -   vma->vm_page_prot,
> -   mmap_state, write_combine);
> + if (write_combine)
> + vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot);
> + else
> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>  
>   ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
>  vma->vm_end - vma->vm_start, vma->vm_page_prot);
> 
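The user-visible change described in the changelog can be summarized as a small decision function. A minimal sketch under assumptions: `demo_old_wc()` restates the logic of the removed __pci_mmap_set_pgprot() (true meaning pgprot_noncached_wc(), false meaning pgprot_noncached()), `demo_new_wc()` restates the new code in pci_mmap_page_range(), and the enum and function names are hypothetical.

```c
#include <stdbool.h>

enum demo_mmap_state { DEMO_MMAP_IO, DEMO_MMAP_MEM };

/* Old powerpc rule (removed __pci_mmap_set_pgprot()): true = write combining. */
static bool demo_old_wc(enum demo_mmap_state state, bool write_combine,
			bool prefetchable)
{
	if (state != DEMO_MMAP_MEM)
		return false;                  /* never WC for I/O port space */
	return write_combine || prefetchable;  /* prefetchable implied WC */
}

/* New rule in pci_mmap_page_range(): honour only the caller's request. */
static bool demo_new_wc(bool write_combine)
{
	return write_combine;
}
```

The only case whose result changes is a prefetchable memory BAR mapped without an explicit write-combine request, which corresponds to the procfs-without-PCIIOC_WRITE_COMBINE and sysfs "resourceN" cases in the changelog.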

Re: [PATCH v12.update2 02/15] PCI: Let pci_mmap_page_range() take resource address

2016-06-17 Thread Bjorn Helgaas
On Fri, Jun 17, 2016 at 12:25:49PM -0700, Yinghai Lu wrote:
> On Thu, Jun 16, 2016 at 7:15 PM, Bjorn Helgaas  wrote:
> > On Thu, Jun 09, 2016 at 03:25:52PM -0700, Yinghai Lu wrote:
> >> In 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files"), we try
> >> to check the exposed value against the resource start/end in the proc mmap path.
> >>
> >> |start = vma->vm_pgoff;
> >> |size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
> >> |pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
> >> |pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
> >> |if (start >= pci_start && start < pci_start + size &&
> >> |start + nr <= pci_start + size)
> >>
> >> That breaks sparc, where the exposed value is a BAR value and needs to be
> >> offset to a resource address.
> >
> > I'm not quite sure what you're saying here.  Are you saying that sparc
> > is currently broken, and this patch fixes it?  If so, what exactly is
> > broken?  Can you give a small example of an mmap that is currently
> > broken?
> >
> >> The original pci_mmap_page_range() takes the PCI BAR value, aka usr_address.
> >>
> >> Bjorn found that it would be much simpler to pass the resource address
> >> directly and avoid the extra __pci_mmap_make_offset().
> >>
> >> In this patch:
> >> 1. In the proc path, proc_bus_pci_mmap() converts back to a resource
> >>    address before calling pci_mmap_page_range().
> >> 2. In the sysfs path, pci_mmap_resource() just offsets by the resource start.
> >> 3. Every pci_mmap_page_range() call gets a vma->vm_pgoff within the
> >>    resource range instead of the BAR value.
> >> 4. Remove __pci_mmap_make_offset(), as the checking is done
> >>    in pci_mmap_fits().
> >
> > This is a pretty big patch.  It would help a lot to split it up.
> 
> They look tightly coupled after the API change, since the meaning of vm_pgoff changes.
> 
> I could split item 4 into another patch, but the compiler could complain or
> even refuse to go on if static functions are defined but not used.

Yeah, I was afraid they might be too tightly coupled to split up.
Still, every little bit helps.

> > I think the comment about "re-enabling the 2 lines below" is pointless
> > because doing that would break applications, which I don't think we'll
> > do.
> >
> > I propose the microblaze, powerpc, and sparc patches below, which
> > simplify pci_resource_to_user() and clean up this comment.
> 
> Agreed. Actually, I already have the sparc/PCI change in patch 3,
>   sparc/PCI: Use correct offset for bus address to resource
> based on the previous review.

Sure enough, I see it there now.  I think it's easier to review when
split out, so I'll keep it separate, since it's not actually dependent
on the rest of the changes in "sparc/PCI: Use correct offset for bus
address to resource".

> I will drop the related change from "sparc/PCI: Use correct offset for bus
> address to resource"
>
> and respin the whole patchset today.

I added your acks and pushed the result to pci/resource.  I'll also
post these formally on the list so they're easier to find.

Bjorn
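To make the offset check quoted from 8c05cd08a7 concrete, here is a userspace restatement of it, together with the page-alignment test proposed for pci_mmap_resource(). Assumptions: 4 KiB pages (`DEMO_PAGE_SHIFT` = 12), and `demo_mmap_fits()` / `demo_start_page_aligned()` are hypothetical helpers that mirror the quoted expressions, not the kernel's pci_mmap_fits().

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

/*
 * Mirrors the quoted check: for procfs, vm_pgoff is compared against the
 * resource start (in pages); for sysfs, offsets are relative to the BAR (0).
 */
static bool demo_mmap_fits(uint64_t res_start, uint64_t res_len,
			   uint64_t vm_pgoff, uint64_t nr_pages, bool procfs)
{
	uint64_t size = ((res_len - 1) >> DEMO_PAGE_SHIFT) + 1;
	uint64_t pci_start = procfs ? res_start >> DEMO_PAGE_SHIFT : 0;

	return vm_pgoff >= pci_start && vm_pgoff < pci_start + size &&
	       vm_pgoff + nr_pages <= pci_start + size;
}

/* Proposed sysfs rule: refuse to mmap a resource whose start is not page aligned. */
static bool demo_start_page_aligned(uint64_t res_start)
{
	return (res_start & (DEMO_PAGE_SIZE - 1)) == 0;
}
```

As the quoted changelog describes, the procfs branch compares against the resource start, so an architecture that exposes a BAR value in procfs (sparc) hands in a vm_pgoff on a different base, and the check fails.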

Re: [PATCH v12.update2 02/15] PCI: Let pci_mmap_page_range() take resource address

2016-06-17 Thread Yinghai Lu
On Thu, Jun 16, 2016 at 7:15 PM, Bjorn Helgaas  wrote:
> On Thu, Jun 09, 2016 at 03:25:52PM -0700, Yinghai Lu wrote:
>> In 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files"), we try
>> to check the exposed value against the resource start/end in the proc mmap path.
>>
>> |start = vma->vm_pgoff;
>> |size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
>> |pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
>> |pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
>> |if (start >= pci_start && start < pci_start + size &&
>> |start + nr <= pci_start + size)
>>
>> That breaks sparc, where the exposed value is a BAR value and needs to be
>> offset to a resource address.
>
> I'm not quite sure what you're saying here.  Are you saying that sparc
> is currently broken, and this patch fixes it?  If so, what exactly is
> broken?  Can you give a small example of an mmap that is currently
> broken?
>
>> The original pci_mmap_page_range() takes the PCI BAR value, aka usr_address.
>>
>> Bjorn found that it would be much simpler to pass the resource address
>> directly and avoid the extra __pci_mmap_make_offset().
>>
>> In this patch:
>> 1. In the proc path, proc_bus_pci_mmap() converts back to a resource
>>    address before calling pci_mmap_page_range().
>> 2. In the sysfs path, pci_mmap_resource() just offsets by the resource start.
>> 3. Every pci_mmap_page_range() call gets a vma->vm_pgoff within the
>>    resource range instead of the BAR value.
>> 4. Remove __pci_mmap_make_offset(), as the checking is done
>>    in pci_mmap_fits().
>
> This is a pretty big patch.  It would help a lot to split it up.

They look tightly coupled after the API change, since the meaning of vm_pgoff changes.

I could split item 4 into another patch, but the compiler could complain or
even refuse to go on if static functions are defined but not used.

...
>
> I think the comment about "re-enabling the 2 lines below" is pointless
> because doing that would break applications, which I don't think we'll
> do.
>
> I propose the microblaze, powerpc, and sparc patches below, which
> simplify pci_resource_to_user() and clean up this comment.

Agreed. Actually, I already have the sparc/PCI change in patch 3,
  sparc/PCI: Use correct offset for bus address to resource
based on the previous review.

>> @@ -999,7 +1010,6 @@ static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
>>   struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
>>   struct resource *res = attr->private;
>>   enum pci_mmap_state mmap_type;
>> - resource_size_t start, end;
>>   int i;
>>
>>   for (i = 0; i < PCI_ROM_RESOURCE; i++)
>> @@ -1008,10 +1018,21 @@ static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
>>   if (i >= PCI_ROM_RESOURCE)
>>   return -ENODEV;
>>
>> + /*
>> +  * The resource start has to be PAGE_SIZE aligned, as we pass back
>> +  * a virtual address that includes the round-down of resource_start,
>> +  * which the caller cannot figure out directly.
>> +  * When it is not aligned, it is an I/O port, which should go through
>> +  * the pci_read_resource_io()/pci_write_resource_io() path.
>> +  */
>> + if (res->start & ~PAGE_MASK)
>> + return -EINVAL;
>
> It seems reasonable to require that the mmap start and end be
> page-aligned.  It seems like we ought to do the same for the sysfs and
> the procfs paths.
>
> Since we haven't enforced this in the past, there is the potential for
> breaking user programs, isn't there?
>
> The alignment enforcement should be in a patch by itself, so bisection
> would tell us something useful.

ok. will do that.
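To make the proposed alignment rule concrete, here is a small userspace sketch of the check. It assumes 4 KiB pages (the kernel's PAGE_SIZE depends on the configuration, and 64 KiB is common on ppc64), the sketch_* names are hypothetical, and it applies the same test to the mapping length, as suggested for both the sysfs and procfs paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel derives PAGE_MASK from PAGE_SHIFT,
 * which depends on the configuration (64 KiB is common on ppc64). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Refuse the mapping if either the start or the length has any bits set
 * below the page size, mirroring "res->start & ~PAGE_MASK". */
static int sketch_check_mmap_range(uint64_t start, uint64_t len)
{
	if ((start | len) & ~SKETCH_PAGE_MASK)
		return -EINVAL;
	return 0;
}
```

With these definitions a start of 0x80000000 with a length of 0x1000 passes, while a start of 0x80000010 is rejected with -EINVAL; keeping this in its own patch would let a bisection point straight at it if a user program breaks.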

>
> commit 3dbd970b6d9a96ab471b4b86650a0200a47d375d
> Author: Bjorn Helgaas 
> Date:   Thu May 5 11:39:04 2016 -0500
>
> microblaze/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()
>
> "User" addresses are shown in /sys/devices/pci.../.../resource and
> /proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F
> files.  For I/O port resources on microblaze, these are PCI bus addresses,
> i.e., raw BAR values.
>
> Previously pci_resource_to_user() computed the user address by subtracting
> "hose->io_base_virt - _IO_BASE" from the resource start:
>
>   pci_resource_to_user()
> if (IO)
>   offset = (unsigned long)hose->io_base_virt - _IO_BASE;
> *start = rsrc->start - offset;
>
> We've already told the PCI core about that "hose->io_base_virt - _IO_BASE"
> offset:
>
>   pcibios_setup_phb_resources()
> res = &hose->io_resource;
> pci_add_resource_offset(resources, res, hose->io_base_virt - _IO_BASE);
>
> so pcibios_resource_to_bus() knows how to do that translation.
>
> No functional change intended.
>
> Signed-off-by: Bjorn Helgaas 

Acked-by: Yinghai Lu 
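A rough userspace model of the translation the microblaze commit relies on: once the host bridge window is registered with an offset, converting between the CPU-side resource address and the bus ("user") address is a single subtraction or addition. The struct and function names are hypothetical, and the offset used in the checks is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a host bridge window registered with
 * pci_add_resource_offset(): resource address = bus address + offset. */
struct sketch_window {
	uint64_t offset;	/* e.g. hose->io_base_virt - _IO_BASE */
};

/* Resource (CPU-side) address to bus ("user") address: what the old
 * pci_resource_to_user() computed by hand. */
static uint64_t sketch_resource_to_bus(const struct sketch_window *w,
				       uint64_t res_addr)
{
	return res_addr - w->offset;
}

/* The reverse translation, so a round trip returns the original address. */
static uint64_t sketch_bus_to_resource(const struct sketch_window *w,
				       uint64_t bus_addr)
{
	return bus_addr + w->offset;
}
```

Because the offset lives in one place, the hand-written subtraction in pci_resource_to_user() becomes redundant, which is why the commit can claim no functional change.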

>
> diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
> index 1974567..e0dd64e 100

[PATCH] powerpc/mm: Prevent unlikely crash in copro_calculate_slb()

2016-06-17 Thread Frederic Barrat
If a cxl adapter faults on an invalid address for a kernel context, we
may enter copro_calculate_slb() with a NULL mm pointer (kernel
context) and an effective address which looks like a user
address, which causes a crash when mm is dereferenced. This is clearly
an AFU bug, but there's no reason to crash either, so return an error
so that cxl can ack the interrupt with an address error.

Signed-off-by: Frederic Barrat 
Cc: 
---
 arch/powerpc/mm/copro_fault.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 6527882..ddfd274 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -106,6 +106,8 @@ int copro_calculate_slb(struct mm_struct *mm, u64 ea, struct copro_slb *slb)
switch (REGION_ID(ea)) {
case USER_REGION_ID:
pr_devel("%s: 0x%llx -- USER_REGION_ID\n", __func__, ea);
+   if (mm == NULL)
+   return 1;
psize = get_slice_psize(mm, ea);
ssize = user_segment_size(ea);
vsid = get_vsid(mm->context.id, ea, ssize);
-- 
2.7.4
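A simplified, self-contained model of where the new check sits. It assumes the region ID is the top four bits of the effective address, with user addresses in region 0 and the kernel linear mapping in region 0xc, as on the 64-bit hash MMU of this period; the sketch_* types and the VSID stand-in are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration only (64-bit hash MMU layout of the time). */
#define SKETCH_REGION_ID(ea)	((unsigned int)((ea) >> 60))
#define SKETCH_USER_REGION_ID	0x0u
#define SKETCH_KERNEL_REGION_ID	0xcu

struct sketch_mm {
	unsigned long context_id;
};

/* Returns 0 on success and 1 on failure, like copro_calculate_slb(). */
static int sketch_calculate_slb(const struct sketch_mm *mm, uint64_t ea,
				unsigned long *vsid_out)
{
	switch (SKETCH_REGION_ID(ea)) {
	case SKETCH_USER_REGION_ID:
		/* Kernel context (no mm) faulting on a user-looking address:
		 * report an error instead of dereferencing NULL. */
		if (mm == NULL)
			return 1;
		*vsid_out = mm->context_id;	/* stand-in for get_vsid() */
		return 0;
	case SKETCH_KERNEL_REGION_ID:
		*vsid_out = 0;			/* kernel segments need no mm */
		return 0;
	default:
		return 1;
	}
}
```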

Re: [v6, 1/2] cxl: Add mechanism for delivering AFU driver specific events

2016-06-17 Thread Matthew R. Ochs
> On Jun 16, 2016, at 9:13 AM, Philippe Bergheaud wrote:
> 
> This adds an afu_driver_ops structure with fetch_event() and
> event_delivered() callbacks. An AFU driver such as cxlflash can fill
> this out and associate it with a context to enable passing custom
> AFU specific events to userspace.
> 
> This also adds a new kernel API function cxl_context_pending_events(),
> that the AFU driver can use to notify the cxl driver that new specific
> events are ready to be delivered, and wake up anyone waiting on the
> context wait queue.
> 
> The current count of AFU driver specific events is stored in the field
> afu_driver_events of the context structure.
> 
> The cxl driver checks the afu_driver_events count during poll, select,
> read, etc. calls to check if an AFU driver specific event is pending,
> and calls fetch_event() to obtain and deliver that event. This way,
> the cxl driver takes care of all the usual locking semantics around these
> calls and handles all the generic cxl events, so that the AFU driver only
> needs to worry about its own events.
> 
> fetch_event() returns a struct cxl_event_afu_driver_reserved, allocated
> by the AFU driver, and filled in with the specific event information and
> size. Data size should not be greater than CXL_MAX_EVENT_DATA_SIZE.
> 
> The cxl driver prepends an appropriate cxl event header, copies the event
> to userspace, and finally calls event_delivered() to return the status of
> the operation to the AFU driver. The event is identified by the context
> and cxl_event_afu_driver_reserved pointers.
> 
> Since AFU drivers provide their own means for userspace to obtain the
> AFU file descriptor (e.g. cxlflash uses an ioctl on their scsi file
> descriptor to obtain the AFU file descriptor) and the generic cxl driver
> will never use this event, the ABI of the event is up to each individual
> AFU driver.
> 
> Signed-off-by: Philippe Bergheaud 
> ---
> Changes since v5:
> - s/deliver_event/fetch_event/
> - Fixed the handling of fetch_event errors
> - Documented the return codes of event_delivered
> 
> Changes since v4:
> - Addressed comments from Vaibhav:
>  - Changed struct cxl_event_afu_driver_reserved from
>{ __u64 reserved[4]; } to { size_t data_size; u8 data[]; }
>  - Modified deliver_event to return a struct cxl_event_afu_driver_reserved
>  - Added new callback event_delivered
>  - Added static function afu_driver_event_copy
> 
> Changes since v3:
> - Removed driver ops callback ctx_event_pending
> - Created cxl function cxl_context_pending_events
> - Created cxl function cxl_unset_driver_ops
> - Added atomic event counter afu_driver_events
> 
> Changes since v2:
> - Fixed some typos spotted by Matt Ochs
> 
> Changes since v1:
> - Rebased on upstream
> - Bumped cxl api version to 3
> - Addressed comments from mpe:
>  - Clarified commit message & some comments
>  - Mentioned 'cxlflash' as a possible user of this event
>  - Check driver ops on registration and warn if missing calls
>  - Remove redundant checks where driver ops is used
>  - Simplified ctx_event_pending and removed underscore version
>  - Changed deliver_event to take the context as the first argument
> 
> drivers/misc/cxl/Kconfig |  5 +
> drivers/misc/cxl/api.c   | 27 ++
> drivers/misc/cxl/cxl.h   |  7 +-
> drivers/misc/cxl/file.c  | 58 +---
> include/misc/cxl.h   | 53 +++
> include/uapi/misc/cxl.h  | 21 ++
> 6 files changed, 162 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
> index 8756d06..560412c 100644
> --- a/drivers/misc/cxl/Kconfig
> +++ b/drivers/misc/cxl/Kconfig
> @@ -15,12 +15,17 @@ config CXL_EEH
>   bool
>   default n
> 
> +config CXL_AFU_DRIVER_OPS
> + bool
> + default n
> +
> config CXL
>   tristate "Support for IBM Coherent Accelerators (CXL)"
>   depends on PPC_POWERNV && PCI_MSI && EEH
>   select CXL_BASE
>   select CXL_KERNEL_API
>   select CXL_EEH
> + select CXL_AFU_DRIVER_OPS
>   default m
>   help
> Select this option to enable driver support for IBM Coherent
> diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
> index 6d228cc..23f98f4 100644
> --- a/drivers/misc/cxl/api.c
> +++ b/drivers/misc/cxl/api.c
> @@ -323,6 +323,33 @@ struct cxl_context *cxl_fops_get_context(struct file *file)
> }
> EXPORT_SYMBOL_GPL(cxl_fops_get_context);
> 
> +void cxl_set_driver_ops(struct cxl_context *ctx,
> + struct cxl_afu_driver_ops *ops)
> +{
> + WARN_ON(!ops->fetch_event || !ops->event_delivered);
> + atomic_set(&ctx->afu_driver_events, 0);
> + ctx->afu_driver_ops = ops;
> +}
> +EXPORT_SYMBOL_GPL(cxl_set_driver_ops);
> +
> +int cxl_unset_driver_ops(struct cxl_context *ctx)
> +{
> + if (atomic_read(&ctx->afu_driver_events))
> + return -EBUSY;
> +
> + ctx->afu_driver_ops = NULL;
> + r
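The counting scheme described in this patch can be sketched in userspace with C11 atomics. This is a hypothetical model, not the cxl code: the sketch_* names are invented, the wait-queue wake-up is only noted in a comment, and it assumes the counter is decremented once per claimed event.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of a context with AFU-driver event bookkeeping. */
struct sketch_ctx {
	atomic_int afu_driver_events;	/* pending AFU driver specific events */
	const void *afu_driver_ops;	/* opaque here; callbacks omitted */
};

static void sketch_set_driver_ops(struct sketch_ctx *ctx, const void *ops)
{
	atomic_store(&ctx->afu_driver_events, 0);
	ctx->afu_driver_ops = ops;
}

/* AFU driver side: announce new events. The real driver would also wake
 * up anyone waiting on the context wait queue here. */
static void sketch_pending_events(struct sketch_ctx *ctx, int new_events)
{
	atomic_fetch_add(&ctx->afu_driver_events, new_events);
}

/* cxl side (read/poll path): claim one pending event, if there is one,
 * without letting the counter go negative. */
static int sketch_claim_event(struct sketch_ctx *ctx)
{
	int cur = atomic_load(&ctx->afu_driver_events);

	while (cur > 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ctx->afu_driver_events,
						 &cur, cur - 1))
			return 1;
	}
	return 0;
}

/* Refuse to detach the ops while events are still pending. */
static int sketch_unset_driver_ops(struct sketch_ctx *ctx)
{
	if (atomic_load(&ctx->afu_driver_events))
		return -EBUSY;
	ctx->afu_driver_ops = NULL;
	return 0;
}
```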

Re: [PATCH 3/3] cxlflash: Shutdown notify support for CXL Flash cards

2016-06-17 Thread Manoj Kumar

On 6/15/2016 6:49 PM, Uma Krishnan wrote:

Some CXL Flash cards need notification of device shutdown
in order to flush pending I/Os.

A PCI notification hook for shutdown has been added where
the driver notifies the card and returns. When the device
is removed in the PCI remove path, notification code will
wait for shutdown processing to complete.

Signed-off-by: Uma Krishnan 


Acked-by: Manoj N. Kumar 



Re: [1/4] powerpc/perf: Initialise PMU related regs on POWER9

2016-06-17 Thread Michael Ellerman
On Sat, 2016-11-06 at 07:18:12 UTC, Madhavan Srinivasan wrote:
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
> index 584e119fa8b0..a2080fde0cc5 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -219,3 +223,16 @@ __init_PMU:
>   mtspr   SPRN_MMCR1,r5
>   mtspr   SPRN_MMCR2,r5
>   blr
> +
> +__init_PMU_HV_300:
> + li  r5,0
> + mtspr   SPRN_MMCRC,r5
> + blr
> +
> +__init_PMU_300:
> + li  r5,0
> + mtspr   SPRN_MMCRA,r5
> + mtspr   SPRN_MMCR0,r5
> + mtspr   SPRN_MMCR1,r5
> + mtspr   SPRN_MMCR2,r5
> + blr

With the exception of MMCRS, these are the same as the Power8 code, so it
should be possible to share some common code.

cheers
Re: [3/4] powerpc/perf: Power9 PMU support

2016-06-17 Thread Michael Ellerman
On Sat, 2016-11-06 at 07:18:14 UTC, Madhavan Srinivasan wrote:
> This patch adds base enablement for the power9 PMU.

This is almost line-for-line identical to the Power8 code, with the exception of
some things you haven't sent yet because they're not ready.

Can you try and factor out some of the common bits please?

Perhaps isa207-common.c? But I don't really mind.

cheers
Re: [6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF

2016-06-17 Thread Michael Ellerman
On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> new file mode 100644
> index 0000000..954ff53
> --- /dev/null
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -0,0 +1,956 @@
...
> +
> +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> +{
> + int *p = area;
> +
> + /* Fill whole space with trap instructions */
> + while (p < (int *)((char *)area + size))
> + *p++ = BREAKPOINT_INSTRUCTION;
> +}

This breaks the build for some configs; presumably you're missing a header:

  arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 'BREAKPOINT_INSTRUCTION' undeclared (first use in this function)

http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
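For reference, a userspace sketch of what bpf_jit_fill_ill_insns() does, with the trap encoding defined locally. It assumes 0x7fe00008 (the unconditional `trap`, i.e. `tw 31,0,0`) as a stand-in for BREAKPOINT_INSTRUCTION, whose actual definition comes from a kernel header the patch needs to include; unlike the original loop, it also rounds the size down to whole instructions instead of relying on size being a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the unconditional "trap" (tw 31,0,0) encodes as 0x7fe00008.
 * The patch uses BREAKPOINT_INSTRUCTION from a kernel header instead. */
#define SKETCH_TRAP_INSN 0x7fe00008u

/* Fill the area with trap instructions, one 32-bit word at a time.
 * Trailing bytes that do not make up a full instruction are left alone. */
static void sketch_fill_ill_insns(void *area, unsigned int size)
{
	uint32_t *p = area;
	size_t n = size / sizeof(*p);

	while (n--)
		*p++ = SKETCH_TRAP_INSN;
}
```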

cheers
Re: [PATCH] tools/perf: Fix the mask in regs_dump__printf

2016-06-17 Thread Jiri Olsa
On Fri, Jun 17, 2016 at 02:43:31PM +0530, Madhavan Srinivasan wrote:

SNIP

> 
>  
> if (data->user_regs.abi) {
> -   u64 mask = evsel->attr.sample_regs_user;
> +   u.val64 = evsel->attr.sample_regs_user;
>  
> -   sz = hweight_long(mask) * sizeof(u64);
> +   if (sizeof(u64) > sizeof(unsigned long)) {
> +   u64 mask = u.val64;
> +   u.val32[1] = mask >> 32;
> +   u.val32[0] = mask & ULONG_MAX;
> +   }
> +
> +   sz = hweight_long(u.val64) * sizeof(u64);
> OVERFLOW_CHECK(array, sz, max_size);
> -   data->user_regs.mask = mask;
> +   data->user_regs.mask = u.val64;
> data->user_regs.regs = (u64 *)array;
> 
> Issue I see is when printing the mask value in a 32bit perf
> on a 64bit kernel (big endian).
> 
> 442044948492 0xdc0 [0x188]: PERF_RECORD_SAMPLE(IP, 0x1): 7299/7299:
> 0xc0059200 period: 12599 addr: 0
> ... intr regs: mask 0x07ff ABI 32-bit
> ^^^ should have been 0x7ff
> 
> I agree it is better to fix this when reading, but
> do we then need to swap again when printing?

Hmm, sample data should already be swapped at this point;
see perf_session__process_event().

jirka
Re: [PATCH] powerpc/align: Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE

2016-06-17 Thread Arnd Bergmann
On Friday, June 17, 2016 1:35:35 PM CEST Daniel Axtens wrote:
> > It would be better to fix the sparse compilation so that the same
> > endianness is set as you get when calling gcc.
> 
> I will definitely work on a patch to sparse! I'd still like this or
> something like it to go in though, so we can keep working on reducing
> the sparse warning count while the sparse patch is in the works.

I think you just need to fix the Makefile so it sets the right
arguments when calling sparse.

Something like the (untested) patch below, similar to how we
already handle the word size and how some other architectures
handle setting __BIG_ENDIAN__.

Arnd

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 709a22a3e824..8617c71c3bdb 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -181,6 +181,11 @@ KBUILD_CFLAGS  += -pipe -Iarch/$(ARCH) $(CFLAGS-y)
 CPP= $(CC) -E $(KBUILD_CFLAGS)
 
 CHECKFLAGS += -m$(CONFIG_WORD_SIZE) -D__powerpc__ -D__powerpc$(CONFIG_WORD_SIZE)__
+ifdef CONFIG_CPU_BIG_ENDIAN
+CHECKFLAGS += -D__BIG_ENDIAN__
+else
+CHECKFLAGS += -D__LITTLE_ENDIAN__
+endif
 
 KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
 

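The reason sparse needs these flags is that endianness-dependent code is selected purely by predefined macros. A small sketch comparing the compile-time view (GCC/Clang's __BYTE_ORDER__, used here as an assumption in place of __BIG_ENDIAN__) with the byte order observed at run time: for a gcc build the two agree, whereas a checker that is not given the right macros may analyse the other branch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compile-time view, from a macro the compiler predefines. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_COMPILED_BIG_ENDIAN 1
#else
#define SKETCH_COMPILED_BIG_ENDIAN 0
#endif

/* Run-time view: look at the first byte in memory of a known value. */
static int sketch_runtime_big_endian(void)
{
	uint32_t v = 0x01020304u;
	unsigned char first;

	memcpy(&first, &v, 1);
	return first == 0x01;
}
```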
Re: [v10,17/18] powerpc/powernv: Functions to get/set PCI slot state

2016-06-17 Thread Michael Ellerman
On Fri, 2016-20-05 at 06:41:41 UTC, Gavin Shan wrote:
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index 9bb8ddf..2417c86 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -344,6 +348,18 @@ enum OpalPciResetState {
>   OPAL_ASSERT_RESET   = 1
>  };
>  
> +enum OpalPciSlotPresentenceState {

In skiboot this is called "OpalPciSlotPresence".

I've renamed it.

> + OPAL_PCI_SLOT_EMPTY = 0,
> + OPAL_PCI_SLOT_PRESENT   = 1
> +};
> +
> +enum OpalPciSlotPowerState {

In skiboot this is called "OpalPciSlotPower".

I've renamed it.

> + OPAL_PCI_SLOT_POWER_OFF = 0,
> + OPAL_PCI_SLOT_POWER_ON  = 1,
> + OPAL_PCI_SLOT_OFFLINE   = 2,
> + OPAL_PCI_SLOT_ONLINE= 3
> +};
> +
>  enum OpalSlotLedType {
>   OPAL_SLOT_LED_TYPE_ID = 0,  /* IDENTIFY LED */
>   OPAL_SLOT_LED_TYPE_FAULT = 1,   /* FAULT LED */
> @@ -378,6 +394,7 @@ enum opal_msg_type {
>   OPAL_MSG_DPO= 5,
>   OPAL_MSG_PRD= 6,
>   OPAL_MSG_OCC= 7,
> + OPAL_MSG_PCI_HOTPLUG= 8,

I don't see this in skiboot?

It also doesn't seem to be used, so I've dropped it.

>   OPAL_MSG_TYPE_MAX,
>  };
>  

cheers
Re: [PATCH] tools/perf: Fix the mask in regs_dump__printf

2016-06-17 Thread Madhavan Srinivasan


On Friday 17 June 2016 12:07 PM, Jiri Olsa wrote:
> On Fri, Jun 17, 2016 at 10:52:38AM +0530, Madhavan Srinivasan wrote:
>> When decoding the perf_regs mask in regs_dump__printf(),
>> we loop through the mask using find_first_bit and find_next_bit functions.
>> The mask is of type "u64", but it is passed as an "unsigned long *" to
>> the lib functions along with sizeof().
>>
>> While the existing code works fine in most cases, when using a
>> 32bit perf on a 64bit kernel (Big Endian), we end up reading the wrong word
>> in the u64 mask. This patch fixes the mask in regs_dump__printf().
>>
>> Suggested-by: Yury Norov 
>> Cc: Yury Norov 
>> Cc: Peter Zijlstra 
>> Cc: Ingo Molnar 
>> Cc: Arnaldo Carvalho de Melo 
>> Cc: Alexander Shishkin 
>> Cc: Jiri Olsa 
>> Cc: Adrian Hunter 
>> Cc: Kan Liang 
>> Cc: Wang Nan 
>> Cc: Michael Ellerman 
>> Signed-off-by: Madhavan Srinivasan 
>> ---
>>  tools/perf/util/session.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 5214974e841a..2eaa42a4832a 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -940,8 +940,13 @@ static void branch_stack__printf(struct perf_sample *sample)
>>  static void regs_dump__printf(u64 mask, u64 *regs)
>>  {
>>  unsigned rid, i = 0;
>> +unsigned long _mask[sizeof(mask)/sizeof(unsigned long)];
>>  
>> -for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
>> +_mask[0] = mask & ULONG_MAX;
>> +if (sizeof(mask) > sizeof(unsigned long))
>> +_mask[1] = mask >> 32;
>> +
> I think we should do this earlier when reading the mask,
> not at the moment we just print it, like we do for other
> types. Maybe we could do this as an extra bit for

OK, I tried something like this. Currently the
mask is read in perf_evsel__parse_sample(),
so I added it for both sample types (PERF_SAMPLE_REGS_USER
and PERF_SAMPLE_REGS_INTR):

 
if (data->user_regs.abi) {
-   u64 mask = evsel->attr.sample_regs_user;
+   u.val64 = evsel->attr.sample_regs_user;
 
-   sz = hweight_long(mask) * sizeof(u64);
+   if (sizeof(u64) > sizeof(unsigned long)) {
+   u64 mask = u.val64;
+   u.val32[1] = mask >> 32;
+   u.val32[0] = mask & ULONG_MAX;
+   }
+
+   sz = hweight_long(u.val64) * sizeof(u64);
OVERFLOW_CHECK(array, sz, max_size);
-   data->user_regs.mask = mask;
+   data->user_regs.mask = u.val64;
data->user_regs.regs = (u64 *)array;

Issue I see is when printing the mask value in a 32bit perf
on a 64bit kernel (big endian).

442044948492 0xdc0 [0x188]: PERF_RECORD_SAMPLE(IP, 0x1): 7299/7299:
0xc0059200 period: 12599 addr: 0
... intr regs: mask 0x07ff ABI 32-bit
^^^ should have been 0x7ff

I agree it is better to fix this when reading, but
do we then need to swap again when printing?

> perf_event__swap_ops[PERF_RECORD_SAMPLE] function
>
> also there's print_sample_iregs in builtin-script.c that's
> most likely affected as well

My bad. Should have seen this too.

Maddy
>
> thanks,
> jirka
>
>> +for_each_set_bit(rid, _mask, sizeof(mask) * 8) {
>>  u64 val = regs[i++];
>>  
>>  printf(" %-5s 0x%" PRIx64 "\n",
>> -- 
>> 1.9.1
>>
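A portable way to hand a u64 mask to bitmap helpers that take an unsigned long pointer, sketched in userspace. It fills the words by shifting instead of casting &mask, so word 0 always holds the low bits on both 32-bit big-endian and 64-bit hosts. The sketch_* helpers are hypothetical stand-ins for the tools/lib bitmap functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_WORDS    (sizeof(uint64_t) / sizeof(unsigned long))

/* Split a u64 mask into unsigned long words in bitmap order: word 0 holds
 * the lowest BITS_PER_LONG bits on every host. Casting &mask to
 * (unsigned long *) would instead make word 0 the high half of the u64
 * on a 32-bit big-endian host. */
static void sketch_u64_to_bitmap(uint64_t mask, unsigned long *words)
{
	for (size_t i = 0; i < SKETCH_MASK_WORDS; i++)
		words[i] = (unsigned long)(mask >> (i * SKETCH_BITS_PER_LONG));
}

/* Walk the bitmap word by word, as for_each_set_bit() would, and count
 * the registers it selects. */
static unsigned int sketch_count_regs(const unsigned long *words, size_t nwords)
{
	unsigned int n = 0;

	for (size_t w = 0; w < nwords; w++)
		for (size_t b = 0; b < SKETCH_BITS_PER_LONG; b++)
			if (words[w] & (1UL << b))
				n++;
	return n;
}
```

Doing the split once, where the sample is parsed, matches Jiri's suggestion; the printing code can then iterate the words without caring about host word size or byte order.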

Re: [PATCH v3] of: fix memory leak related to safe_name()

2016-06-17 Thread Mathieu Malaterre
v3 tested here multiple times! The memleak is now gone.

Tested-by: Mathieu Malaterre 

Thanks

On Thu, Jun 16, 2016 at 7:51 PM, Frank Rowand  wrote:
> From: Frank Rowand 
>
> Fix a memory leak resulting from memory allocation in safe_name().
> This patch fixes all call sites of safe_name().
>
> Mathieu Malaterre reported the memory leak on boot:
>
> On my PowerMac, the device tree would generate a duplicate name:
>
> [0.023043] device-tree: Duplicate name in PowerPC,G4@0, renamed to "l2-cache#1"
>
> In this case a newly allocated name is generated by `safe_name`;
> however, it is never deallocated.
>
> The bug was found using kmemleak, which reported:
>
> unreferenced object 0xdf532e60 (size 32):
>   comm "swapper", pid 1, jiffies 4294892300 (age 1993.532s)
>   hex dump (first 32 bytes):
> 6c 32 2d 63 61 63 68 65 23 31 00 dd e4 dd 1e c2  l2-cache#1..
> ec d4 ba ce 04 ec cc de 8e 85 e9 ca c4 ec cc 9e  
>   backtrace:
> [] kvasprintf+0x64/0xc8
> [] kasprintf+0x4c/0x5c
> [] safe_name.isra.1+0x80/0xc4
> [] __of_attach_node_sysfs+0x6c/0x11c
> [] of_core_init+0x8c/0xf8
> [] kernel_init_freeable+0xd4/0x208
> [] kernel_init+0x24/0x11c
> [] ret_from_kernel_thread+0x5c/0x64
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=120331
>
> Signed-off-by: Frank Rowand 
> Reported-by: mathieu.malate...@gmail.com
> ---
>
> changes from v1
>   Move the prototype of __of_sysfs_remove_bin_file() into of_private.h
>
> changes from v2
>   Add the kfree that was in Mathieu's original patch
>
>  drivers/of/base.c   |   30 +-
>  drivers/of/dynamic.c|2 +-
>  drivers/of/of_private.h |3 +++
>  3 files changed, 25 insertions(+), 10 deletions(-)
>
> Index: b/drivers/of/base.c
> ===
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -112,6 +112,7 @@ static ssize_t of_node_property_read(str
> return memory_read_from_buffer(buf, count, &offset, pp->value, pp->length);
>  }
>
> +/* always return newly allocated name, caller must free after use */
>  static const char *safe_name(struct kobject *kobj, const char *orig_name)
>  {
> const char *name = orig_name;
> @@ -126,9 +127,12 @@ static const char *safe_name(struct kobj
> name = kasprintf(GFP_KERNEL, "%s#%i", orig_name, ++i);
> }
>
> -   if (name != orig_name)
> +   if (name == orig_name) {
> +   name = kstrdup(orig_name, GFP_KERNEL);
> +   } else {
> pr_warn("device-tree: Duplicate name in %s, renamed to \"%s\"\n",
> kobject_name(kobj), name);
> +   }
> return name;
>  }
>
> @@ -159,6 +163,7 @@ int __of_add_property_sysfs(struct devic
>  int __of_attach_node_sysfs(struct device_node *np)
>  {
> const char *name;
> +   struct kobject *parent;
> struct property *pp;
> int rc;
>
> @@ -171,15 +176,16 @@ int __of_attach_node_sysfs(struct device
> np->kobj.kset = of_kset;
> if (!np->parent) {
> /* Nodes without parents are new top level trees */
> -   rc = kobject_add(&np->kobj, NULL, "%s",
> -safe_name(&of_kset->kobj, "base"));
> +   name = safe_name(&of_kset->kobj, "base");
> +   parent = NULL;
> } else {
> name = safe_name(&np->parent->kobj, kbasename(np->full_name));
> -   if (!name || !name[0])
> -   return -EINVAL;
> -
> -   rc = kobject_add(&np->kobj, &np->parent->kobj, "%s", name);
> +   parent = &np->parent->kobj;
> }
> +   if (!name)
> +   return -ENOMEM;
> +   rc = kobject_add(&np->kobj, parent, "%s", name);
> +   kfree(name);
> if (rc)
> return rc;
>
> @@ -1815,6 +1821,12 @@ int __of_remove_property(struct device_n
> return 0;
>  }
>
> +void __of_sysfs_remove_bin_file(struct device_node *np, struct property *prop)
> +{
> +   sysfs_remove_bin_file(&np->kobj, &prop->attr);
> +   kfree(prop->attr.attr.name);
> +}
> +
>  void __of_remove_property_sysfs(struct device_node *np, struct property *prop)
>  {
> if (!IS_ENABLED(CONFIG_SYSFS))
> @@ -1822,7 +1834,7 @@ void __of_remove_property_sysfs(struct d
>
> /* at early boot, bail here and defer setup to of_init() */
> if (of_kset && of_node_is_attached(np))
> -   sysfs_remove_bin_file(&np->kobj, &prop->attr);
> +   __of_sysfs_remove_bin_file(np, prop);
>  }
>
>  /**
> @@ -1895,7 +1907,7 @@ void __of_update_property_sysfs(struct d
> return;
>
> if (oldprop)
> -   sysfs_remove_bin_file(&np->kobj, &oldprop->attr);
> +   __of_sysfs_remove_bin_file(np, oldprop);
> __of_add_property_sysfs(np, newprop);
>  }
>
> Index: b/drivers/of/dynamic.c
> 

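The ownership rule the patch introduces ("always return newly allocated name, caller must free after use") can be sketched in userspace as follows. The existence callback is a hypothetical stand-in for the sysfs lookup done by safe_name(), malloc()/free() stand in for kasprintf()/kstrdup()/kfree(), and names longer than the fixed buffer are truncated in this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical predicate standing in for the sysfs lookup in safe_name():
 * does a sibling with this name already exist? */
typedef int (*sketch_exists_fn)(const char *name);

static char *sketch_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Always returns a newly allocated name (NULL on allocation failure), so
 * the caller must free() it on every path, renamed or not. */
static char *sketch_safe_name(sketch_exists_fn exists, const char *orig_name)
{
	char buf[64];
	int i = 0;

	if (!exists(orig_name))
		return sketch_strdup(orig_name);

	do {
		snprintf(buf, sizeof(buf), "%s#%i", orig_name, ++i);
	} while (exists(buf));

	fprintf(stderr, "duplicate name, renamed to \"%s\"\n", buf);
	return sketch_strdup(buf);
}

/* Demo predicate: pretend only "l2-cache" is already taken. */
static int sketch_demo_exists(const char *name)
{
	return strcmp(name, "l2-cache") == 0;
}
```

Returning a fresh allocation on both paths is what lets __of_attach_node_sysfs() call kfree() unconditionally after kobject_add().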
Re: [PATCH v2 2/9] kexec_file: Generalize kexec_add_buffer.

2016-06-17 Thread Dave Young
On 06/16/16 at 05:39pm, Thiago Jung Bauermann wrote:
> Am Donnerstag, 16 Juni 2016, 09:58:53 schrieb Dave Young:
> > On 06/15/16 at 01:21pm, Thiago Jung Bauermann wrote:
> > > +/**
> > > + * arch_kexec_walk_mem - call func(data) on free memory regions
> > > + * @image_type:  kimage.type
> > > + * @top_down:Start from the highest address?
> > > + * @data:Argument to pass to @func.
> > > + * @func:Function to call for each memory region.
> > > + *
> > > + * Return: The memory walk will stop when func returns a non-zero value
> > > + * and that value will be returned. If all free regions are visited without
> > > + * func returning non-zero, then zero will be returned.
> > > + */
> > > +int __weak arch_kexec_walk_mem(unsigned int image_type, bool top_down,
> > > +void *data, int (*func)(u64, u64, void *))
> > > +{
> > 
> > top_down is also not used?
> 
> It's unused in the default implementation, but the powerpc implementation in 
> patch 8 uses it:

Well, arch_kexec_walk_mem() uses kbuf as "data", so you can even drop
"image_type", since kbuf has everything you want: kbuf->image->type and
kbuf->top_down.

int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
   int (*func)(u64, u64, void *))
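A userspace sketch of the kbuf-based variant suggested here: the walker takes the whole buffer descriptor, reads the direction from it, and stops at the first non-zero callback return, passing that value back. The free-range table, struct layout and callback are hypothetical; the real implementations walk iomem resources or memblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for free memory ranges and for struct kexec_buf. */
struct sketch_range {
	uint64_t start, end;	/* inclusive */
};

struct sketch_kbuf {
	unsigned int image_type;	/* would be kbuf->image->type */
	int top_down;
	uint64_t memsz;
	uint64_t found;
};

static const struct sketch_range sketch_free[] = {
	{ 0x1000, 0x1fff }, { 0x10000, 0x1ffff }, { 0x40000, 0x7ffff },
};
#define SKETCH_NFREE (sizeof(sketch_free) / sizeof(sketch_free[0]))

/* Calls func on each free range in the direction requested by kbuf and
 * stops at the first non-zero return, which is passed back. */
static int sketch_walk_mem(struct sketch_kbuf *kbuf,
			   int (*func)(uint64_t, uint64_t, void *))
{
	int ret = 0;

	for (size_t n = 0; n < SKETCH_NFREE; n++) {
		size_t i = kbuf->top_down ? SKETCH_NFREE - 1 - n : n;

		ret = func(sketch_free[i].start, sketch_free[i].end, kbuf);
		if (ret)
			break;
	}
	return ret;
}

/* Example callback: accept the first range large enough for kbuf->memsz. */
static int sketch_fits(uint64_t start, uint64_t end, void *arg)
{
	struct sketch_kbuf *kbuf = arg;

	if (end - start + 1 < kbuf->memsz)
		return 0;
	kbuf->found = start;
	return 1;
}
```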
> 
> int arch_kexec_walk_mem(unsigned int type, bool top_down, void *data,
>   int (*func)(u64, u64, void *))
> {
>   int ret = 0;
>   u64 i;
>   phys_addr_t mstart, mend;
> 
>   if (top_down) {
>   for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0,
>   &mstart, &mend, NULL) {
>   ret = func(mstart, mend, data);
>   if (ret)
>   break;
>   }
>   } else {
>   for_each_free_mem_range(i, NUMA_NO_NODE, 0, &mstart, &mend,
>   NULL) {
>   ret = func(mstart, mend, data);
>   if (ret)
>   break;
>   }
>   }
> 
>   return ret;
> }
> 
> > On second thought, the argument name "type" would be better than
> > "image_type"; its meaning is obvious from the comments and the code,
> > and it is simpler.
> 
> Ok, renamed in the patch below.
> 
> []'s
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 
> 
> kexec_file: Generalize kexec_add_buffer.
> 
> Allow architectures to specify different memory walking functions for
> kexec_add_buffer. Intel uses iomem to track reserved memory ranges,
> but PowerPC uses the memblock subsystem.
> 
> Signed-off-by: Thiago Jung Bauermann 
> Cc: Eric Biederman 
> Cc: Dave Young 
> Cc: ke...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index e8acb2b43dd9..42b31f2e1d88 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -315,6 +315,8 @@ int __weak arch_kexec_apply_relocations_add(const Elf_Ehdr *ehdr,
>   Elf_Shdr *sechdrs, unsigned int relsec);
>  int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>   unsigned int relsec);
> +int __weak arch_kexec_walk_mem(unsigned int type, bool top_down, void *data,
> +int (*func)(u64, u64, void *));
>  void arch_kexec_protect_crashkres(void);
>  void arch_kexec_unprotect_crashkres(void);
>  
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index b6eec7527e9f..989647a324f2 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -428,6 +428,29 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
>   return locate_mem_hole_bottom_up(start, end, kbuf);
>  }
>  
> +/**
> + * arch_kexec_walk_mem - call func(data) on free memory regions
> + * @type:kimage.type
> + * @top_down:Start from the highest address?
> + * @data:Argument to pass to @func.
> + * @func:Function to call for each memory region.
> + *
> + * Return: The memory walk will stop when func returns a non-zero value
> + * and that value will be returned. If all free regions are visited without
> + * func returning non-zero, then zero will be returned.
> + */
> +int __weak arch_kexec_walk_mem(unsigned int type, bool top_down, void *data,
> +int (*func)(u64, u64, void *))
> +{
> + if (type == KEXEC_TYPE_CRASH)
> + return walk_iomem_res_desc(crashk_res.desc,
> +IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> +crashk_res.start, crashk_res.end,
> +data, func);
> + else
> + return walk_system_ram_res(0, ULONG_MAX, data, func);
> +}
> +
>  /*
>   * Helper function for placing a buffer in a kexec segment. This assumes
>   * that kexec_mutex is held.
> @@ -472,14 +495,8 @@ i

Re: [Patch v2 1/2] powerpc: Send SIGBUS on unaligned copy and paste

2016-06-17 Thread Balbir Singh


On 17/06/16 09:33, Chris Smart wrote:
> Executing the ISA 3.0 instructions copy, copy_first, paste and paste_last
> on data that is not 128-byte aligned generates an alignment fault. We
> catch this and send SIGBUS to the userspace process that caused it.
> 
> We do not emulate these because paste may contain additional metadata
> when pasting to a co-processor and paste_last is the synchronisation
> point for preceding copy/paste sequences.
> 
> Thanks to Michael Neuling  for his help.
> 
> Signed-off-by: Chris Smart 
> ---
> 
> Changes since v1:
> - define and use instruction for mask test
> 
> arch/powerpc/include/asm/ppc-opcode.h |  4 
> arch/powerpc/kernel/align.c   | 14 ++
> 2 files changed, 18 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> index 1d035c1cc889..7921d3e5704d 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -131,6 +131,8 @@
> /* sorted alphabetically */
> #define PPC_INST_BHRBE		0x7c00025c
> #define PPC_INST_CLRBHRB		0x7c00035c
> +#define PPC_INST_COPY		0x7c00060c
> +#define PPC_INST_COPY_FIRST		0x7c20060c
> #define PPC_INST_CP_ABORT		0x7c00068c
> #define PPC_INST_DCBA		0x7c0005ec
> #define PPC_INST_DCBA_MASK		0xfc0007fe
> @@ -159,6 +161,8 @@
> #define PPC_INST_MSGSNDP		0x7c00011c
> #define PPC_INST_MTTMR		0x7c0003dc
> #define PPC_INST_NOP		0x60000000
> +#define PPC_INST_PASTE		0x7c00070c
> +#define PPC_INST_PASTE_LAST		0x7c20070d
> #define PPC_INST_POPCNTB		0x7c0000f4
> #define PPC_INST_POPCNTB_MASK		0xfc0007fe
> #define PPC_INST_POPCNTD		0x7c0003f4
> diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
> index 8e7cb8e2b21a..6e0a1f8495f2 100644
> --- a/arch/powerpc/kernel/align.c
> +++ b/arch/powerpc/kernel/align.c
> @@ -875,6 +875,20 @@ int fix_alignment(struct pt_regs *regs)
> return emulate_vsx(addr, reg, areg, regs, flags, nb, elsize);
> }
> #endif
> +
> +/*
> + * ISA 3.0 (such as P9) copy, copy_first, paste and paste_last alignment
> + * check.
> + *
> + * Send a SIGBUS to the process that caused the fault.
> + *
> + * We do not emulate these because paste may contain additional metadata
> + * when pasting to a co-processor. Furthermore, paste_last is the
> + * synchronisation point for preceding copy/paste sequences.
> + */
> +if ((instruction & 0xfc0006fe) == PPC_INST_COPY)
> +return -EIO;

Should this all be under cpu_has_feature(CPU_FTR_ARCH_300)?

Balbir Singh.
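To check the single mask test in the align.c hunk, here is a small sketch that applies 0xfc0006fe to the four opcodes defined in the ppc-opcode.h hunk. It assumes the mask is meant to clear the RA/RB fields, the 0x00200000 bit that separates copy_first/paste_last from copy/paste, the 0x100 bit that separates paste from copy, and bit 0x1, so that all four forms compare equal to PPC_INST_COPY; the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values as defined in the ppc-opcode.h hunk of this patch. */
#define SKETCH_INST_COPY	0x7c00060cu
#define SKETCH_INST_COPY_FIRST	0x7c20060cu
#define SKETCH_INST_PASTE	0x7c00070cu
#define SKETCH_INST_PASTE_LAST	0x7c20070du
#define SKETCH_COPY_PASTE_MASK	0xfc0006feu

/* Same test as the align.c hunk: after masking, every copy/paste form
 * (with any RA/RB) folds onto the plain copy encoding. */
static int sketch_is_copy_paste(uint32_t insn)
{
	return (insn & SKETCH_COPY_PASTE_MASK) == SKETCH_INST_COPY;
}
```

A quick check like this also shows that an unrelated instruction such as popcntd (0x7c0003f4) does not match, although it says nothing about whether the test should additionally be gated on CPU_FTR_ARCH_300.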