Re: [PATCH 10/14] PCI: tegra: Move PCIe driver to drivers/pci/host

2013-01-29 Thread Andrew Murray
On Tue, Jan 22, 2013 at 07:29:01PM +, Jason Gunthorpe wrote:
> On Thu, Jan 17, 2013 at 04:22:18PM +0000, Andrew Murray wrote:
> 
> > > In either of those cases, does it make sense to use the MSI support
> > > outside the scope of the PCI infrastructure? That is, would devices
> > > other than PCI devices be able to generate an MSI?
> > 
> > I've come around to your way of thinking. Your approach sounds good for
> > registration of MSI ops - let the RC host driver do it (it probably has its
> > own), or use a helper for following a phandle to get ops that are not part
> > of the driver. MSIs won't be used outside of PCI devices.
> 
> Here is a bit of additional info on some MSI stuff..
> 
> This can be pretty complex. For instance on hyper transport systems
> the PCI to HT bridge has an MSI controller that maps between PCI and
> HT MSI formats, that mapping is configurable, so technically each
> bridge could be considered an MSI controller. Typically the mapping
> controllers are all setup the same so there is not much problem with
> this. However *native* HT devices (which are super rare) can use a
> different MSI format than PCI devices. From a linux perspective HT is
> just a variant of PCI.
>  
> On x86 the MSI is delivered to the CPU APIC complex which converts it
> into a vectored interrupt - part of the value of MSI is that the MSI
> data can vector the interrupt to a specific CPU, or group of CPUs or
> whatever.
> 
> Presumably SMP ARMs will evolve similar MSI based interrupt vectoring
> capabilities, and presumably on-chip, non-PCI peripherals will evolve
> options to use MSI as well (ie multi-queue ethernet). So it might be
> worth giving some thought to how things could migrate in that
> direction someday.
> 
> I have a bit hacky MSI driver for Kirkwood, this work you have to
> generalize the interface could let me actually upstream it :) The MSI
> is built using the Host2CPU doorbell registers, so it is entirely
> unrelated to the PCI-E RC driver.
> 
> However, my use of the MSI driver on kirkwood is to assign MSIs to a
> PCI-E device via non-standard registers, more like an on chip
> peripheral. This is because the Host2CPU doorbell doesn't fit 100%
> perfectly with the standard PCI MSI stuff, and the hardware has funny
> needs.. So an 'allocate a MSI interrupt' API would be snazzy too :)

Thanks for this. I believe Thierry may be working on improving the MSI
API - so perhaps we can see where that takes us.

Andrew Murray

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH v2] of/pci: Provide support for parsing PCI DT ranges property

2013-02-21 Thread Andrew Murray
DT bindings for PCI host bridges often use the ranges property to describe
memory and IO ranges - this binding tends to be the same across architectures
yet several parsing implementations exist, e.g. arch/mips/pci/pci.c,
arch/powerpc/kernel/pci-common.c, arch/sparc/kernel/pci.c and
arch/microblaze/pci/pci-common.c (clone of PPC). Some of these duplicate
functionality provided by drivers/of/address.c.

This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture specific
DT handling. This will make it easier to write architecture independent host
bridge drivers and mitigate against further duplication of DT parsing code.

This patch can be used in the following way:

	struct of_pci_range_iter iter;
	for_each_of_pci_range(&iter, np) {

		//directly access properties of the address range, e.g.:
		//iter.pci_space, iter.pci_addr, iter.cpu_addr, iter.size or
		//iter.flags

		//alternatively obtain a struct resource, e.g.:
		//struct resource res;
		//range_iter_fill_resource(iter, np, res);
	}

Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).

The modifications to microblaze, mips and powerpc have not been tested.

v2:
  - This follows on from suggestions made by Grant Likely
    (marc.info/?l=linux-kernel&m=136079602806328)

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
---
 arch/microblaze/pci/pci-common.c |  100 +++--
 arch/mips/pci/pci.c  |   44 -
 arch/powerpc/kernel/pci-common.c |   93 ++-
 drivers/of/address.c |   54 
 include/linux/of_address.h   |   30 +++
 5 files changed, 151 insertions(+), 170 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 4dbb505..ccc0d63 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -659,67 +659,37 @@ void __devinit pci_process_bridge_OF_ranges(struct pci_controller *hose,
struct device_node *dev,
int primary)
 {
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
unsigned long long isa_mb = 0;
struct resource *res;
+   struct of_pci_range_iter iter;
 
printk(KERN_INFO "PCI host bridge %s %s ranges:\n",
   dev->full_name, primary ? "(primary)" : "");
 
-   /* Get ranges property */
-   ranges = of_get_property(dev, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
-
-   /* Parse it */
pr_debug("Parsing ranges property...\n");
-   while ((rlen -= np * 4) >= 0) {
+   for_each_of_pci_range(&iter, dev) {
/* Read next ranges element */
-   pci_space = ranges[0];
-   pci_addr = of_read_number(ranges + 1, 2);
-   cpu_addr = of_translate_address(dev, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-
pr_debug("pci_space: 0x%08x pci_addr:0x%016llx "
"cpu_addr:0x%016llx size:0x%016llx\n",
-   pci_space, pci_addr, cpu_addr, size);
-
-   ranges += np;
+   iter.pci_space, iter.pci_addr, iter.cpu_addr,
+   iter.size);
 
/* If we failed translation or got a zero-sized region
 * (some FW try to feed us with non sensical zero sized regions
 * such as power3 which look like some kind of attempt
 * at exposing the VGA memory hole)
 */
-   if (cpu_addr == OF_BAD_ADDR || size == 0)
+   if (iter.cpu_addr == OF_BAD_ADDR || iter.size == 0)
continue;
 
-   /* Now consume following elements while they are contiguous */
-   for (; rlen >= np * sizeof(u32);
-ranges += np, rlen -= np * 4) {
-   if (ranges[0] != pci_space)
-   break;
-   pci_next = of_read_number(ranges + 1, 2);
-   cpu_next = of_translate_address(dev, ranges + 3);
-   if (pci_next != pci_addr + size ||
-   cpu_next != cpu_addr + size)
-

Re: [PATCH v6 1/5] x86, pci: add dummy pci device for early stage

2012-11-13 Thread Andrew Murray
Hello,

Some comments inline...

On 13 November 2012 09:07, Takao Indoh  wrote:
> From: Yinghai Lu 
>
> So we can pass pci_dev *dev to reuse some generic pci functions.
>
> Signed-off-by: Yinghai Lu 
> Signed-off-by: Takao Indoh 
> ---
>  arch/x86/include/asm/pci-direct.h |2 +
>  arch/x86/pci/early.c  |   75 
> +
>  2 files changed, 77 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/pci-direct.h b/arch/x86/include/asm/pci-direct.h
> index b1e7a45..b6360d3 100644
> --- a/arch/x86/include/asm/pci-direct.h
> +++ b/arch/x86/include/asm/pci-direct.h
> @@ -18,4 +18,6 @@ extern int early_pci_allowed(void);
>  extern unsigned int pci_early_dump_regs;
>  extern void early_dump_pci_device(u8 bus, u8 slot, u8 func);
>  extern void early_dump_pci_devices(void);
> +
> +struct pci_dev *get_early_pci_dev(int num, int slot, int func);
>  #endif /* _ASM_X86_PCI_DIRECT_H */
> diff --git a/arch/x86/pci/early.c b/arch/x86/pci/early.c
> index d1067d5..aea6b2b 100644
> --- a/arch/x86/pci/early.c
> +++ b/arch/x86/pci/early.c
> @@ -109,3 +109,78 @@ void early_dump_pci_devices(void)
> }
> }
>  }
> +
> +static __init int
> +early_pci_read(struct pci_bus *bus, unsigned int devfn, int where,
> +   int size, u32 *value)
> +{
> +   int num, slot, func;
> +
> +   num = bus->number;
> +   slot = devfn >> 3;
> +   func = devfn & 7;

You may want to use the PCI_SLOT and PCI_FUNC macros in
include/linux/pci.h to determine values for slot and func.

> +   switch (size) {
> +   case 1:
> +   *value = read_pci_config_byte(num, slot, func, where);
> +   break;
> +   case 2:
> +   *value = read_pci_config_16(num, slot, func, where);
> +   break;
> +   case 4:
> +   *value = read_pci_config(num, slot, func, where);
> +   break;
> +   }
> +
> +   return 0;
> +}
> +
> +static __init int
> +early_pci_write(struct pci_bus *bus, unsigned int devfn, int where,
> +   int size, u32 value)
> +{
> +   int num, slot, func;
> +
> +   num = bus->number;
> +   slot = devfn >> 3;
> +   func = devfn & 7;

As above.

> +   switch (size) {
> +   case 1:
> +   write_pci_config_byte(num, slot, func, where, (u8)value);
> +   break;
> +   case 2:
> +   write_pci_config_16(num, slot, func, where, (u16)value);
> +   break;
> +   case 4:
> +   write_pci_config(num, slot, func, where, (u32)value);
> +   break;
> +   }
> +
> +   return 0;
> +}
> +
> +static __initdata struct pci_ops pci_early_ops = {
> +   .read  = early_pci_read,
> +   .write = early_pci_write,
> +};
> +static __initdata struct pci_bus pci_early_bus = {
> +   .ops = &pci_early_ops,
> +};
> +static __initdata char pci_early_init_name[8];
> +static __initdata struct pci_dev pci_early_dev = {
> +   .bus = &pci_early_bus,
> +   .dev = {
> +   .init_name = pci_early_init_name,
> +   },
> +};
> +
> +__init struct pci_dev *get_early_pci_dev(int num, int slot, int func)
> +{
> +   struct pci_dev *pdev;
> +
> +   pdev = &pci_early_dev;
> +   pdev->devfn = (slot<<3) | (func & 7);

You can use PCI_DEVFN here.
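
For reference, here is a minimal user-space sketch of what these helpers
compute. The three macro definitions are copied into the example only so that
it builds outside the kernel (in-tree code would just include linux/pci.h),
and the wrapper names early_make_devfn(), early_devfn_slot() and
early_devfn_func() are made up for illustration.

```c
/* Same encoding as the kernel helpers: slot in bits 7:3, function in bits 2:0. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Hypothetical wrapper: replaces the open-coded (slot << 3) | (func & 7). */
unsigned int early_make_devfn(unsigned int slot, unsigned int func)
{
	return PCI_DEVFN(slot, func);
}

/* Hypothetical wrapper: replaces the open-coded devfn >> 3. */
unsigned int early_devfn_slot(unsigned int devfn)
{
	return PCI_SLOT(devfn);
}

/* Hypothetical wrapper: replaces the open-coded devfn & 7. */
unsigned int early_devfn_func(unsigned int devfn)
{
	return PCI_FUNC(devfn);
}
```

Unlike the open-coded shifts, the macros also mask the slot to five bits, so an
out-of-range slot cannot spill into the bits above the devfn byte.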

> +   pdev->bus->number = num;
> +   sprintf((char *)pdev->dev.init_name, "%02x:%02x.%01x", num, slot, func);
> +
> +   return pdev;
> +}
> --
> 1.7.1

Andrew Murray


Re: [PATCH 1/2] Boottime: A tool for automatic measurement of kernel/bootloader boot time

2012-11-15 Thread Andrew Murray
On 15 November 2012 10:04, Lee Jones  wrote:
> The overhead is very low and the results will be found under
> sysfs/boottime, as well as detailed results in debugfs under
> boottime/. The bootgraph* files are compatible with
> scripts/bootgraph.pl. The reason for this patch is to provide
> data (sysfs/boottime) suitable for automatic test-cases as
> well as help for developers to reduce the boot time (debugfs).
>
> Based heavily on the original driver by Jonas Aaberg.
>

> +
> +static LIST_HEAD(boottime_list);
> +static DEFINE_SPINLOCK(boottime_list_lock);
> +static struct boottime_timer boottime_timer;
> +static int num_const_boottime_list;
> +static struct boottime_list const_boottime_list[NUM_STATIC_BOOTTIME_ENTRIES];
> +static unsigned long time_kernel_done;
> +static unsigned long time_bootloader_done;
> +static bool system_up;
> +static bool boottime_done;
> +
> +int __attribute__((weak)) boottime_arch_startup(void)
> +{
> +   return 0;
> +}
> +
> +int __attribute__((weak)) boottime_bootloader_idle(void)
> +{
> +   return 0;
> +}

You may wish to use the __weak macro (include/linux/compiler*) instead
of directly using GCC attributes here.
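
Something along these lines, as a rough sketch. The #define is only there so
that the snippet builds standalone; it assumes the GCC definition of __weak
from the compiler headers, and in-tree code would pick that up through the
normal includes.

```c
/* Assumption: stand-in for the kernel's GCC definition of __weak. */
#ifndef __weak
#define __weak __attribute__((weak))
#endif

/* Weak default hook: an architecture can override it with a strong definition. */
int __weak boottime_arch_startup(void)
{
	return 0;
}

/* Weak default hook for the bootloader idle measurement. */
int __weak boottime_bootloader_idle(void)
{
	return 0;
}
```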

Andrew Murray


[PATCH v8 0/3] of/pci: Provide common support for PCI DT parsing

2013-04-22 Thread Andrew Murray
This patchset factors out duplicated code associated with parsing PCI
DT "ranges" properties across the architectures and introduces a
"ranges" parser. This parser "of_pci_range_parser" can be used directly
by ARM host bridge drivers enabling them to obtain ranges from device
trees.

I've included the Reviewed-by, Tested-by and Acked-by tags received for v5/v6/v7
in this patchset. Earlier versions of this patchset (v3) have been tested by:

Thierry Reding 
Jingoo Han 

I've tested that this patchset builds and runs on ARM and that it builds on
PowerPC, x86_64 and MIPS.

Compared to the v7 sent by Andrew Murray, the following changes have been made
(please note that the first patch is unchanged from v7):

 * Rename of_pci_range_parser to of_pci_range_parser_init and
   of_pci_process_ranges to of_pci_range_parser_one as suggested by Grant
   Likely.

 * Reverted back to using a switch statement instead of if/else in
   pci_process_bridge_OF_ranges. Grant Likely highlighted this change from
   the original code which was unnecessary.

 * Squashed in a patch provided by Gabor Juhos which fixes build errors on
   MIPS found in the last patchset.

Compared to the v6 sent by Andrew Murray, the following changes have
been made in response to build errors/warnings:

 * Inclusion of linux/of_address.h in of_pci.c as suggested by Michal
   Simek to prevent compilation failures on Microblaze (and others) and his
   ack.

 * Use of externs and static inlines, and a typo fix, in linux/of_address.h in
   response to linker errors (multiple definition) on x86_64 as spotted by a
   kbuild test robot on (jcooper/linux.git mvebu/drivers)

 * Add EXPORT_SYMBOL_GPL to of_pci_range_parser function to be consistent
   with of_pci_process_ranges function

Compared to the v5 sent by Andrew Murray, the following changes have
been made:

 * Use of CONFIG_64BIT instead of CONFIG_[a32bitarch] as suggested by
   Rob Herring in drivers/of/of_pci.c

 * Added forward declaration of struct pci_controller in linux/of_pci.h
   to prevent compiler warning as suggested by Thomas Petazzoni

 * Improved error checking (!range check), removal of unnecessary be32_to_cpup
   call, improved formatting of struct of_pci_range_parser layout and
   replacement of macro with a static inline. All suggested by Rob Herring.

Compared to the v4 (incorrectly labelled v3) sent by Andrew Murray,
the following changes have been made:

 * Split the patch as suggested by Rob Herring

Compared to the v3 sent by Andrew Murray, the following changes have
been made:

 * Unify and move duplicate pci_process_bridge_OF_ranges functions to
   drivers/of/of_pci.c as suggested by Rob Herring

 * Fix potential build errors with Microblaze/MIPS

Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT
ranges property", the following changes have been made:

 * Correct use of IORESOURCE_* as suggested by Russell King

 * Improved interface and naming as suggested by Thierry Reding

Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did:

 * Add a memset() on the struct of_pci_range_iter when starting the
   for loop in for_each_of_pci_range(). Otherwise, with an uninitialized
   of_pci_range_iter, of_pci_process_ranges() may crash.

 * Add parentheses around 'res', 'np' and 'iter' in the
   for_each_of_pci_range macro definitions. Otherwise, passing
   something like &foobar as 'res' didn't work.

 * Rebased on top of 3.9-rc2, which required fixing a few conflicts in
   the Microblaze code.
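
To illustrate the parenthesis fix in the list above, here is a small
standalone sketch. The struct demo_iter type and the ITER_SIZE macro are
hypothetical and are not taken from the patch; they only show why a macro
argument needs parentheses when the caller passes an address-of expression.

```c
/* Hypothetical iterator type used only for this illustration. */
struct demo_iter {
	unsigned long long size;
};

/*
 * The unparenthesised form is shown as a comment because it does not
 * compile with an address-of argument:
 *
 *   #define ITER_SIZE_BAD(iter)	iter->size
 *
 * ITER_SIZE_BAD(&it) expands to &it->size, which parses as &(it->size)
 * and fails because 'it' is a struct, not a pointer.
 */
#define ITER_SIZE(iter)	((iter)->size)

/* Passes &it to the macro, which is the case that broke before the fix. */
unsigned long long demo_iter_size(void)
{
	struct demo_iter it = { .size = 4096 };

	return ITER_SIZE(&it);
}
```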

v2:
  This follows on from suggestions made by Grant Likely
  (marc.info/?l=linux-kernel&m=136079602806328)

Andrew Murray (3):
  of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and
PowerPC
  of/pci: Provide support for parsing PCI DT ranges property
  of/pci: mips: convert to common of_pci_range_parser

 arch/microblaze/include/asm/pci-bridge.h |5 +-
 arch/microblaze/pci/pci-common.c |  192 --
 arch/mips/pci/pci.c  |   51 +++-
 arch/powerpc/include/asm/pci-bridge.h|5 +-
 arch/powerpc/kernel/pci-common.c |  192 --
 drivers/of/address.c |   67 +++
 drivers/of/of_pci.c  |  173 +++
 include/linux/of_address.h   |   48 
 include/linux/of_pci.h   |4 +
 9 files changed, 313 insertions(+), 424 deletions(-)



[PATCH v8 1/3] of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC

2013-04-22 Thread Andrew Murray
The pci_process_bridge_OF_ranges function, used to parse the "ranges"
property of a PCI host device, is found in both Microblaze and PowerPC
architectures. These implementations are nearly identical. This patch
moves the shared code to drivers/of/of_pci.c.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Reviewed-by: Rob Herring 
Tested-by: Thomas Petazzoni 
Tested-by: Linus Walleij 
Acked-by: Michal Simek 
Acked-by: Grant Likely 
---
 arch/microblaze/include/asm/pci-bridge.h |5 +-
 arch/microblaze/pci/pci-common.c |  192 
 arch/powerpc/include/asm/pci-bridge.h|5 +-
 arch/powerpc/kernel/pci-common.c |  192 
 drivers/of/of_pci.c  |  200 ++
 include/linux/of_pci.h   |4 +
 6 files changed, 206 insertions(+), 392 deletions(-)

diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h
index cb5d397..5783cd6 100644
--- a/arch/microblaze/include/asm/pci-bridge.h
+++ b/arch/microblaze/include/asm/pci-bridge.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device_node;
 
@@ -132,10 +133,6 @@ extern void setup_indirect_pci(struct pci_controller *hose,
 extern struct pci_controller *pci_find_hose_for_OF_device(
struct device_node *node);
 
-/* Fill up host controller resources from the OF node */
-extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
-   struct device_node *dev, int primary);
-
 /* Allocate & free a PCI host bridge structure */
 extern struct pci_controller *pcibios_alloc_controller(struct device_node *dev);
 extern void pcibios_free_controller(struct pci_controller *phb);
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 9ea521e..2735ad9 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -622,198 +622,6 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
*end = rsrc->end - offset;
 }
 
-/**
- * pci_process_bridge_OF_ranges - Parse PCI bridge resources from device tree
- * @hose: newly allocated pci_controller to be setup
- * @dev: device node of the host bridge
- * @primary: set if primary bus (32 bits only, soon to be deprecated)
- *
- * This function will parse the "ranges" property of a PCI host bridge device
- * node and setup the resource mapping of a pci controller based on its
- * content.
- *
- * Life would be boring if it wasn't for a few issues that we have to deal
- * with here:
- *
- *   - We can only cope with one IO space range and up to 3 Memory space
- * ranges. However, some machines (thanks Apple !) tend to split their
- * space into lots of small contiguous ranges. So we have to coalesce.
- *
- *   - We can only cope with all memory ranges having the same offset
- * between CPU addresses and PCI addresses. Unfortunately, some bridges
- * are setup for a large 1:1 mapping along with a small "window" which
- * maps PCI address 0 to some arbitrary high address of the CPU space in
- * order to give access to the ISA memory hole.
- * The way out of here that I've chosen for now is to always set the
- * offset based on the first resource found, then override it if we
- * have a different offset and the previous was set by an ISA hole.
- *
- *   - Some busses have IO space not starting at 0, which causes trouble with
- * the way we do our IO resource renumbering. The code somewhat deals with
- * it for 64 bits but I would expect problems on 32 bits.
- *
- *   - Some 32 bits platforms such as 4xx can have physical space larger than
- * 32 bits so we need to use 64 bits values for the parsing
- */
-void pci_process_bridge_OF_ranges(struct pci_controller *hose,
- struct device_node *dev, int primary)
-{
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
-   int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
-   unsigned long long isa_mb = 0;
-   struct resource *res;
-
-   pr_info("PCI host bridge %s %s ranges:\n",
-  dev->full_name, primary ? "(primary)" : "");
-
-   /* Get ranges property */
-   ranges = of_get_property(dev, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
-
-   /* Parse it */
-   pr_debug("Parsing ranges property...\n");
-   while ((rlen -= np * 4) >= 0) {
-   /* Read next ranges element */
-   pci_space = ranges[0];
-   pci_addr = of_read_number(ranges + 1, 2);
-   cpu_addr = of_translate_address(dev, ranges + 3);
-   s

[PATCH v8 3/3] of/pci: mips: convert to common of_pci_range_parser

2013-04-22 Thread Andrew Murray
This patch converts the pci_load_of_ranges function to use the new common
of_pci_range_parser.
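
As background, the open-coded loop being replaced picks the resource type from
the space code in the first cell of each entry, (pci_space >> 24) & 0x3. A
minimal standalone sketch of that decoding follows; the helper name
pci_space_to_restype() is hypothetical, and the two IORESOURCE_* values are
copied in only so that the example builds outside the kernel.

```c
#include <stdint.h>

/* Resource type values, copied here so the sketch is self-contained. */
#define IORESOURCE_IO	0x00000100
#define IORESOURCE_MEM	0x00000200

/* Hypothetical helper: map the PCI space code to a resource type. */
unsigned int pci_space_to_restype(uint32_t pci_space)
{
	switch ((pci_space >> 24) & 0x3) {
	case 1:		/* PCI IO space */
		return IORESOURCE_IO;
	case 2:		/* PCI Memory space */
	case 3:		/* PCI 64 bits Memory space */
		return IORESOURCE_MEM;
	default:	/* space code 0: neither IO nor memory */
		return 0;
	}
}
```

With the common parser this decoding is already done, so the caller only has
to switch on range.flags & IORESOURCE_TYPE_BITS.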

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Signed-off-by: Gabor Juhos 
Reviewed-by: Rob Herring 
Reviewed-by: Grant Likely 
Tested-by: Linus Walleij 
---
 arch/mips/pci/pci.c |   51 +++
 1 files changed, 19 insertions(+), 32 deletions(-)

diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 0872f12..4b09ca8 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -122,51 +122,38 @@ static void pcibios_scanbus(struct pci_controller *hose)
 #ifdef CONFIG_OF
 void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node)
 {
-   const __be32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(node);
-   int np = pna + 5;
+   struct of_pci_range range;
+   struct of_pci_range_parser parser;
+   u32 res_type;
 
pr_info("PCI host bridge %s ranges:\n", node->full_name);
-   ranges = of_get_property(node, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
hose->of_node = node;
 
-   while ((rlen -= np * 4) >= 0) {
-   u32 pci_space;
+   if (of_pci_range_parser_init(&parser, node))
+   return;
+
+   for_each_of_pci_range(&parser, &range) {
struct resource *res = NULL;
-   u64 addr, size;
-
-   pci_space = be32_to_cpup(&ranges[0]);
-   addr = of_translate_address(node, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-   ranges += np;
-   switch ((pci_space >> 24) & 0x3) {
-   case 1: /* PCI IO space */
+
+   switch (range.flags & IORESOURCE_TYPE_BITS) {
+   case IORESOURCE_IO:
pr_info("  IO 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr,
+   range.cpu_addr + range.size - 1);
hose->io_map_base =
-   (unsigned long)ioremap(addr, size);
+   (unsigned long)ioremap(range.cpu_addr,
+  range.size);
res = hose->io_resource;
-   res->flags = IORESOURCE_IO;
break;
-   case 2: /* PCI Memory space */
-   case 3: /* PCI 64 bits Memory space */
+   case IORESOURCE_MEM:
pr_info(" MEM 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr,
+   range.cpu_addr + range.size - 1);
res = hose->mem_resource;
-   res->flags = IORESOURCE_MEM;
break;
}
-   if (res != NULL) {
-   res->start = addr;
-   res->name = node->full_name;
-   res->end = res->start + size - 1;
-   res->parent = NULL;
-   res->sibling = NULL;
-   res->child = NULL;
-   }
+   if (res != NULL)
+   of_pci_range_to_resource(&range, node, res);
}
 }
 #endif
-- 
1.7.0.4



[PATCH v8 2/3] of/pci: Provide support for parsing PCI DT ranges property

2013-04-22 Thread Andrew Murray
This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture specific
DT handling. This will make it easier to write architecture independent host
bridge drivers and mitigate against further duplication of DT parsing code.

This patch can be used in the following way:

	struct of_pci_range_parser parser;
	struct of_pci_range range;

	if (of_pci_range_parser_init(&parser, np))
		; //no ranges property

	for_each_of_pci_range(&parser, &range) {

		/*
			directly access properties of the address range, e.g.:
			range.pci_space, range.pci_addr, range.cpu_addr,
			range.size, range.flags

			alternatively obtain a struct resource, e.g.:
			struct resource res;
			of_pci_range_to_resource(&range, np, &res);
		*/
	}

Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).
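
The merge rule can be sketched in isolation as follows. This is a simplified
user-space illustration that works on an already decoded array, not the
parser code itself; struct demo_range and demo_merge_ranges() are hypothetical
names. Two neighbouring entries are merged only when their flags match and
both the PCI and the CPU addresses continue exactly where the previous entry
ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded range, loosely modelled on struct of_pci_range. */
struct demo_range {
	uint32_t flags;
	uint64_t pci_addr;
	uint64_t cpu_addr;
	uint64_t size;
};

/* Coalesce contiguous entries in place and return the new entry count. */
size_t demo_merge_ranges(struct demo_range *r, size_t n)
{
	size_t out = 0, i;

	if (n == 0)
		return 0;

	for (i = 1; i < n; i++) {
		struct demo_range *cur = &r[out];

		if (r[i].flags == cur->flags &&
		    r[i].pci_addr == cur->pci_addr + cur->size &&
		    r[i].cpu_addr == cur->cpu_addr + cur->size) {
			/* Contiguous on both sides: grow the current entry. */
			cur->size += r[i].size;
		} else {
			/* Not contiguous: start a new output entry. */
			r[++out] = r[i];
		}
	}

	return out + 1;
}
```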

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Signed-off-by: Thomas Petazzoni 
Reviewed-by: Rob Herring 
Tested-by: Thomas Petazzoni 
Tested-by: Linus Walleij 
Acked-by: Grant Likely 
---
 drivers/of/address.c   |   67 ++
 drivers/of/of_pci.c|  113 +---
 include/linux/of_address.h |   48 +++
 3 files changed, 158 insertions(+), 70 deletions(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 04da786..fdd0636 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -227,6 +227,73 @@ int of_pci_address_to_resource(struct device_node *dev, int bar,
return __of_address_to_resource(dev, addrp, size, flags, NULL, r);
 }
 EXPORT_SYMBOL_GPL(of_pci_address_to_resource);
+
+int of_pci_range_parser_init(struct of_pci_range_parser *parser,
+   struct device_node *node)
+{
+   const int na = 3, ns = 2;
+   int rlen;
+
+   parser->node = node;
+   parser->pna = of_n_addr_cells(node);
+   parser->np = parser->pna + na + ns;
+
+   parser->range = of_get_property(node, "ranges", &rlen);
+   if (parser->range == NULL)
+   return -ENOENT;
+
+   parser->end = parser->range + rlen / sizeof(__be32);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(of_pci_range_parser_init);
+
+struct of_pci_range *of_pci_range_parser_one(struct of_pci_range_parser *parser,
+   struct of_pci_range *range)
+{
+   const int na = 3, ns = 2;
+
+   if (!range)
+   return NULL;
+
+   if (!parser->range || parser->range + parser->np > parser->end)
+   return NULL;
+
+   range->pci_space = parser->range[0];
+   range->flags = of_bus_pci_get_flags(parser->range);
+   range->pci_addr = of_read_number(parser->range + 1, ns);
+   range->cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   range->size = of_read_number(parser->range + parser->pna + na, ns);
+
+   parser->range += parser->np;
+
+   /* Now consume following elements while they are contiguous */
+   while (parser->range + parser->np <= parser->end) {
+   u32 flags, pci_space;
+   u64 pci_addr, cpu_addr, size;
+
+   pci_space = be32_to_cpup(parser->range);
+   flags = of_bus_pci_get_flags(parser->range);
+   pci_addr = of_read_number(parser->range + 1, ns);
+   cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   size = of_read_number(parser->range + parser->pna + na, ns);
+
+   if (flags != range->flags)
+   break;
+   if (pci_addr != range->pci_addr + range->size ||
+   cpu_addr != range->cpu_addr + range->size)
+   break;
+
+   range->size += size;
+   parser->range += parser->np;
+   }
+
+   return range;
+}
+EXPORT_SYMBOL_GPL(of_pci_range_parser_one);
+
 #endif /* CONFIG_PCI */
 
 /*
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 1626172..3c49ab2 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #if defined(CONFIG_PPC32) || defined(CONFIG_PPC64) || defined(CONFIG_MICROBLAZE)
@@ -82,67 +83,42 @@ EXPORT_SYMBOL_GPL(of_pci_find_child_device);
 void pci_process_bridge_OF_ranges(struct pci_controller *hose,
   

Re: [RFC 0/2] PCI: Introduce MSI chip infrastructure

2013-03-22 Thread Andrew Murray
On Fri, Mar 22, 2013 at 08:51:45AM +, Thierry Reding wrote:
> It
> is the responsibility of the PCI host bridge driver to setup the MSI
> chip for the root bus.

I think this could work well. In the future if the use of an independent MSI
controller is required, then new DT bindings for host-bridges could use
phandles to reference independent MSI controllers as their providers of
MSIs. I guess this functionality can be built on top of what you have proposed
later as the need arises.

Andrew Murray


Re: [RFC 1/2] PCI: Introduce new MSI chip infrastructure

2013-03-22 Thread Andrew Murray
On Fri, Mar 22, 2013 at 08:51:46AM +, Thierry Reding wrote:
> index ce93a34..ea4a5be 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -58,5 +58,15 @@ extern int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
>  extern void arch_teardown_msi_irqs(struct pci_dev *dev);
>  extern int arch_msi_check_device(struct pci_dev* dev, int nvec, int type);
>  
> +struct msi_chip {
> + struct module *owner;
> + struct device *dev;
> +
> + int (*setup_irq)(struct msi_chip *chip, struct pci_dev *dev,
> +  struct msi_desc *desc);
> + void (*teardown_irq)(struct msi_chip *chip, unsigned int irq);
> + int (*check_device)(struct msi_chip *chip, struct pci_dev *dev,
> + int nvec, int type);
> +};

Is there a need to add setup_irqs and teardown_irqs functions here? This would
allow your MSI chips to support multiple MSIs per requesting device.

What about restore_msi_irqs? Does this fit in here too?
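
To make the first question concrete, here is a rough standalone sketch of what
I have in mind. Everything in it is hypothetical: the types are cut-down
stand-ins rather than the real kernel structures, and the setup_irqs hook and
msi_chip_setup_irqs() helper are not part of the RFC. The idea is that a chip
which can allocate a block of vectors provides setup_irqs, and everything else
falls back to calling setup_irq once per vector.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, for illustration only. */
struct pci_dev { int id; };
struct msi_desc { unsigned int irq; };

struct msi_chip {
	int (*setup_irq)(struct msi_chip *chip, struct pci_dev *dev,
			 struct msi_desc *desc);
	/* Hypothetical multi-vector hook, not part of the quoted RFC. */
	int (*setup_irqs)(struct msi_chip *chip, struct pci_dev *dev,
			  struct msi_desc *descs, int nvec);
};

/* Use the multi-vector hook if present, else set up one vector at a time. */
int msi_chip_setup_irqs(struct msi_chip *chip, struct pci_dev *dev,
			struct msi_desc *descs, int nvec)
{
	int i, ret;

	if (chip->setup_irqs)
		return chip->setup_irqs(chip, dev, descs, nvec);

	for (i = 0; i < nvec; i++) {
		ret = chip->setup_irq(chip, dev, &descs[i]);
		if (ret)
			return ret;
	}

	return 0;
}

/* Demo single-vector implementation: hands out IRQ numbers from 32 upwards. */
int demo_setup_irq(struct msi_chip *chip, struct pci_dev *dev,
		   struct msi_desc *desc)
{
	static unsigned int next_irq = 32;

	(void)chip;
	(void)dev;
	desc->irq = next_irq++;
	return 0;
}
```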

Andrew Murray


Re: [RFC PATCH RESEND v2] of/pci: Provide support for parsing PCI DT ranges property

2013-03-22 Thread Andrew Murray
On Thu, Mar 21, 2013 at 04:06:25PM +, Thomas Petazzoni wrote:
> Dear Andrew Murray,
> 
> On Fri,  1 Mar 2013 12:23:36 +0000, Andrew Murray wrote:
> > This patch factors out common implementation patterns to reduce overall
> > kernel code and provide a means for host bridge drivers to directly obtain
> > struct resources from the DT's ranges property without relying on
> > architecture specific DT handling. This will make it easier to write
> > architecture independent host bridge drivers and mitigate against further
> > duplication of DT parsing code.
> > 
> > This patch can be used in the following way:
> > 
> > struct of_pci_range_iter iter;
> > for_each_of_pci_range(&iter, np) {
> > 
> > //directly access properties of the address range, e.g.:
> > //iter.pci_space, iter.pci_addr, iter.cpu_addr, iter.size or
> > //iter.flags
> > 
> > //alternatively obtain a struct resource, e.g.:
> > //struct resource res;
> > //range_iter_fill_resource(iter, np, res);
> > }
> > 
> > Additionally the implementation takes care of adjacent ranges and merges
> > them into a single range (as was the case with powerpc and microblaze).
> > 
> > The modifications to microblaze, mips and powerpc have not been tested.
> > 
> > v2:
> >   This follows on from suggestions made by Grant Likely
> >   (marc.info/?l=linux-kernel&m=136079602806328)
> > 
> > Signed-off-by: Andrew Murray 
> > Signed-off-by: Liviu Dudau 
> 
> Thanks, I've tested this successfully with the Marvell PCIe driver. I'm
> about to send a new version of the Marvell PCIe patch set that includes
> this RFC proposal.
> 
> I only made two small changes compared to your version, detailed below.

Thanks for the feedback, all looks good to me. Do I need to give ack?

Andrew Murray


Re: [PATCH 1/6] of/pci: Provide support for parsing PCI DT ranges property

2013-03-25 Thread Andrew Murray
On Sat, Mar 23, 2013 at 01:37:04PM +, Thomas Petazzoni wrote:
> 
> On Sat, 23 Mar 2013 10:41:56 +, Russell King - ARM Linux wrote:
> 
> > Please look at how IORESOURCE_* stuff is defined:
> > #define IORESOURCE_TYPE_BITS 0x1f00  /* Resource type */
> > #define IORESOURCE_IO   0x0100  /* PCI/ISA I/O ports */
> > #define IORESOURCE_MEM  0x0200
> > #define IORESOURCE_REG  0x0300  /* Register offsets */
> > #define IORESOURCE_IRQ  0x0400
> > #define IORESOURCE_DMA  0x0800
> > #define IORESOURCE_BUS  0x1000
> > 
> > Notice that it's not an array of bits.
> > 
> > So this should be:
> > if ((iter.flags & IORESOURCE_TYPE_BITS) == IORESOURCE_IO) {
> 
> What I've done for the Marvell PCIe driver is:
> 
> + for_each_of_pci_range(&iter, np) {
> + unsigned long restype = iter.flags & IORESOURCE_TYPE_BITS;
> + if (restype == IORESOURCE_IO) {
> [...]
> +     if (restype == IORESOURCE_MEM) {
> [...]

OK I'll update this patch and also include Thierry's suggestions.
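
For reference, the difference Russell points out can be checked in user space. This is only a minimal sketch: it assumes the IORESOURCE_* values quoted above are copied into a standalone program, and struct res_flags, is_io_bitwise() and is_io_masked() are hypothetical names used for illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values copied from the IORESOURCE_* definitions quoted above. */
#define IORESOURCE_TYPE_BITS 0x1f00UL
#define IORESOURCE_IO        0x0100UL
#define IORESOURCE_MEM       0x0200UL
#define IORESOURCE_REG       0x0300UL

/* Hypothetical stand-in for the flags member of a struct resource. */
struct res_flags {
	unsigned long flags;
};

/* The v2-style test: treats the type field as if it were a single bit. */
static bool is_io_bitwise(const struct res_flags *r)
{
	assert(r != NULL);
	return (r->flags & IORESOURCE_IO) != 0;
}

/* The suggested test: compares the whole type field against IORESOURCE_IO. */
static bool is_io_masked(const struct res_flags *r)
{
	assert(r != NULL);
	return (r->flags & IORESOURCE_TYPE_BITS) == IORESOURCE_IO;
}
```

With these definitions a resource of type IORESOURCE_REG (0x0300) passes the bitwise test, because it shares bit 0x0100 with IORESOURCE_IO, but is correctly rejected by the masked comparison.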

Andrew Murray


[PATCH v3] of/pci: Provide support for parsing PCI DT ranges property

2013-03-26 Thread Andrew Murray
This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture specific
DT handling. This will make it easier to write architecture independent host
bridge drivers and mitigate against further duplication of DT parsing code.

This patch can be used in the following way:

struct of_pci_range_parser parser;
struct of_pci_range range;

if (of_pci_range_parser(&parser, np))
; //no ranges property

for_each_of_pci_range(&parser, &range) {

/*
directly access properties of the address range, e.g.:
range.pci_space, range.pci_addr, range.cpu_addr,
range.size, range.flags

alternatively obtain a struct resource, e.g.:
struct resource res;
of_pci_range_to_resource(&range, np, &res);
*/
}

Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).
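
The merging step can be modelled outside the kernel roughly as follows. This is a sketch under assumptions, not the patch's code: struct pci_range is a hypothetical user-space stand-in for struct of_pci_range, merge_adjacent_ranges() is an illustrative name, and it assumes two entries are merged only when their flags match and the second begins exactly where the first ends in both PCI and CPU address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the patch's struct of_pci_range. */
struct pci_range {
	uint32_t flags;
	uint64_t pci_addr;
	uint64_t cpu_addr;
	uint64_t size;
};

/*
 * Coalesce adjacent entries in place and return the new entry count.
 * Two entries merge only if they have the same flags and the second
 * starts where the first ends in both PCI and CPU address space.
 */
static size_t merge_adjacent_ranges(struct pci_range *r, size_t n)
{
	size_t out = 0;

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (out > 0 &&
		    r[i].flags == r[out - 1].flags &&
		    r[i].pci_addr == r[out - 1].pci_addr + r[out - 1].size &&
		    r[i].cpu_addr == r[out - 1].cpu_addr + r[out - 1].size) {
			/* Contiguous with the previous entry: grow it. */
			r[out - 1].size += r[i].size;
			continue;
		}
		r[out++] = r[i];
	}

	return out;
}
```

Two 4 KiB memory ranges that sit back to back therefore come out as one 8 KiB range, while a following I/O range stays separate because its flags differ.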

The modifications to microblaze, mips and powerpc have not been tested.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Signed-off-by: Thomas Petazzoni 
---
Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT
ranges property", the following changes have been made:

 * Correct use of IORESOURCE_* as suggested by Russell King

 * Improved interface and naming as suggested by Thierry Reding

Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did:

 * Add a memset() on the struct of_pci_range_iter when starting the
   for loop in for_each_of_pci_range(). Otherwise, with an uninitialized
   of_pci_range_iter, of_pci_process_ranges() may crash.

 * Add parentheses around 'res', 'np' and 'iter' in the
   for_each_of_pci_range macro definitions. Otherwise, passing
   something like &foobar as 'res' didn't work.

 * Rebased on top of 3.9-rc2, which required fixing a few conflicts in
   the Microblaze code.

v2:
  This follows on from suggestions made by Grant Likely
  (marc.info/?l=linux-kernel&m=136079602806328)
---
 arch/microblaze/pci/pci-common.c |  110 +
 arch/mips/pci/pci.c  |   50 ++
 arch/powerpc/kernel/pci-common.c |   99 --
 drivers/of/address.c |   63 ++
 include/linux/of_address.h   |   42 ++
 5 files changed, 194 insertions(+), 170 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 9ea521e..17a7ad1 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -658,67 +658,43 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
 void pci_process_bridge_OF_ranges(struct pci_controller *hose,
  struct device_node *dev, int primary)
 {
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
unsigned long long isa_mb = 0;
struct resource *res;
+   struct of_pci_range range;
+   struct of_pci_range_parser parser;
+   u32 res_type;
 
pr_info("PCI host bridge %s %s ranges:\n",
   dev->full_name, primary ? "(primary)" : "");
 
-   /* Get ranges property */
-   ranges = of_get_property(dev, "ranges", &rlen);
-   if (ranges == NULL)
+   /* Check for ranges property */
+   if (of_pci_range_parser(&parser, dev))
return;
 
-   /* Parse it */
pr_debug("Parsing ranges property...\n");
-   while ((rlen -= np * 4) >= 0) {
+   for_each_of_pci_range(&parser, &range) {
/* Read next ranges element */
-   pci_space = ranges[0];
-   pci_addr = of_read_number(ranges + 1, 2);
-   cpu_addr = of_translate_address(dev, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-
-   pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ",
-   pci_space, pci_addr);
-   pr_debug("cpu_addr:0x%016llx size:0x%016llx\n",
-   cpu_addr, size);
-
-   ranges += np;
+   pr_debug("pci_space: 0x%08x pci_addr: 0x%016llx ",
+   range.pci_space, range.pci_addr);
+   pr_debug("cpu_addr: 0x%016llx size: 0x%016llx\n",
+   range

[RFC PATCH RESEND v2] of/pci: Provide support for parsing PCI DT ranges property

2013-03-01 Thread Andrew Murray
This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture specific
DT handling. This will make it easier to write architecture independent host
bridge drivers and mitigate against further duplication of DT parsing code.

This patch can be used in the following way:

struct of_pci_range_iter iter;
for_each_of_pci_range(&iter, np) {

//directly access properties of the address range, e.g.:
//iter.pci_space, iter.pci_addr, iter.cpu_addr, iter.size or
//iter.flags

//alternatively obtain a struct resource, e.g.:
//struct resource res;
//range_iter_fill_resource(iter, np, res);
}

Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).

The modifications to microblaze, mips and powerpc have not been tested.

v2:
  This follows on from suggestions made by Grant Likely
  (marc.info/?l=linux-kernel&m=136079602806328)

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
---
 arch/microblaze/pci/pci-common.c |  100 +++--
 arch/mips/pci/pci.c  |   44 -
 arch/powerpc/kernel/pci-common.c |   93 ++-
 drivers/of/address.c |   54 
 include/linux/of_address.h   |   30 +++
 5 files changed, 151 insertions(+), 170 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 4dbb505..ccc0d63 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -659,67 +659,37 @@ void __devinit pci_process_bridge_OF_ranges(struct pci_controller *hose,
struct device_node *dev,
int primary)
 {
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
unsigned long long isa_mb = 0;
struct resource *res;
+   struct of_pci_range_iter iter;
 
printk(KERN_INFO "PCI host bridge %s %s ranges:\n",
   dev->full_name, primary ? "(primary)" : "");
 
-   /* Get ranges property */
-   ranges = of_get_property(dev, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
-
-   /* Parse it */
pr_debug("Parsing ranges property...\n");
-   while ((rlen -= np * 4) >= 0) {
+   for_each_of_pci_range(&iter, dev) {
/* Read next ranges element */
-   pci_space = ranges[0];
-   pci_addr = of_read_number(ranges + 1, 2);
-   cpu_addr = of_translate_address(dev, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-
pr_debug("pci_space: 0x%08x pci_addr:0x%016llx "
"cpu_addr:0x%016llx size:0x%016llx\n",
-   pci_space, pci_addr, cpu_addr, size);
-
-   ranges += np;
+   iter.pci_space, iter.pci_addr, iter.cpu_addr,
+   iter.size);
 
/* If we failed translation or got a zero-sized region
 * (some FW try to feed us with non sensical zero sized regions
 * such as power3 which look like some kind of attempt
 * at exposing the VGA memory hole)
 */
-   if (cpu_addr == OF_BAD_ADDR || size == 0)
+   if (iter.cpu_addr == OF_BAD_ADDR || iter.size == 0)
continue;
 
-   /* Now consume following elements while they are contiguous */
-   for (; rlen >= np * sizeof(u32);
-ranges += np, rlen -= np * 4) {
-   if (ranges[0] != pci_space)
-   break;
-   pci_next = of_read_number(ranges + 1, 2);
-   cpu_next = of_translate_address(dev, ranges + 3);
-   if (pci_next != pci_addr + size ||
-   cpu_next != cpu_addr + size)
-   break;
-   size += of_read_number(ranges + pna + 3, 2);
-   }
-
/* Act based on address space type */
res = NULL;
-   switch ((pci_space >> 24) & 0x3) {
-   case 1: /* PCI IO space */
+   if (iter.flags & IORESOURCE_IO) {
printk(KERN_INFO
  

Re: [RFC PATCH RESEND v2] of/pci: Provide support for parsing PCI DT ranges property

2013-03-06 Thread Andrew Murray
On Fri, Mar 01, 2013 at 03:13:34PM +, Rob Herring wrote:
> On 03/01/2013 06:23 AM, Andrew Murray wrote:
> > This patch factors out common implementation patterns to reduce overall
> > kernel code and provide a means for host bridge drivers to directly obtain
> > struct resources from the DT's ranges property without relying on
> > architecture specific DT handling. This will make it easier to write
> > architecture independent host bridge drivers and mitigate against further
> > duplication of DT parsing code.
> > 
> > This patch can be used in the following way:
> > 
> > struct of_pci_range_iter iter;
> > for_each_of_pci_range(&iter, np) {
> > 
> > //directly access properties of the address range, e.g.:
> > //iter.pci_space, iter.pci_addr, iter.cpu_addr, iter.size or
> > //iter.flags
> > 
> > //alternatively obtain a struct resource, e.g.:
> > //struct resource res;
> > //range_iter_fill_resource(iter, np, res);
> > }
> > 
> > Additionally the implementation takes care of adjacent ranges and merges
> > them into a single range (as was the case with powerpc and microblaze).
> > 
> > The modifications to microblaze, mips and powerpc have not been tested.
> > 
> > v2:
> >   This follows on from suggestions made by Grant Likely
> >   (marc.info/?l=linux-kernel&m=136079602806328)
> > 
> > Signed-off-by: Andrew Murray 
> > Signed-off-by: Liviu Dudau 
> > ---
> >  arch/microblaze/pci/pci-common.c |  100 +++--
> >  arch/mips/pci/pci.c  |   44 -
> >  arch/powerpc/kernel/pci-common.c |   93 ++-
> >  drivers/of/address.c |   54 
> >  include/linux/of_address.h   |   30 +++
> >  5 files changed, 151 insertions(+), 170 deletions(-)
> 
> The thing is that this still leaves pci_process_bridge_OF_ranges
> basically identical for microblaze and powerpc which is really what
> needs to be moved out to common code. Obviously, struct pci_controller
> vs. struct pci_sys_data on ARM is an issue, but they all have
> fundamentally the same data.

Yes it does. To make things worse, struct pci_controller is duplicated and
pretty much identical between microblaze and powerpc. There is good scope
for getting rid of lots of code here :).

> 
> All these common fields should be in a common PCI controller struct.
> Perhaps introducing this with just what you need would work. Depending
> how invasive moving those fields to a new struct is, you could have a
> wrapper that just copies/translates the fields to the arch specific struct.

Yes, I see how this would be a good approach. Though my concern would be how
quirks are handled - if microblaze has the same quirks as powerpc then you'll
see the same duplicated code between those two architectures. Or you'd see
the architecture code pick apart the common pci controller struct... I'll
investigate and see what can be done.

A lack of an accepted way to parse DT ranges on ARM is blocking Thierry,
Thomas and Jingoo from upstreaming their drivers - do you think there is some
middle ground or temporary solution for these drivers?

> 
> There's also things like ioremap of the i/o range. ARM uses a fixed
> virtual address, so we need to do something different. Just returning
> the i/o cpu_addr and moving the ioremap out of this function would solve
> that.

Yes I've noticed this wasn't quite right. I'm not quite sure how this fits
in with the DT. I guess the DT ranges would contain 0 for the PCI address
and a physical address which represents the host bridge's I/O range. You
would then use these two addresses as inputs to pci_ioremap_io - and then
set the start address of the struct resource to 0 and pass to
pci_add_resource_offset with io_offset set to 0 - does this seem correct for
ARM?
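
To make the address arithmetic in that question concrete, here is a small user-space sketch. It assumes the usual convention that a host bridge window's offset is the resource address minus the bus address; struct io_window and window_to_bus() are hypothetical names for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one host bridge I/O window. */
struct io_window {
	uint64_t start;   /* first resource address in the window */
	uint64_t end;     /* last resource address, inclusive */
	uint64_t offset;  /* resource address minus bus address */
};

/* Translate a resource address inside the window to a PCI bus address. */
static uint64_t window_to_bus(const struct io_window *w, uint64_t res_addr)
{
	assert(w != NULL && res_addr >= w->start && res_addr <= w->end);
	return res_addr - w->offset;
}
```

With start 0 and offset 0, as proposed above, resource addresses and PCI I/O bus addresses are identical. A window whose resource addresses start higher would need a matching non-zero offset to reach bus address 0.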

Andrew Murray
> 
> Rob
> 
> 



[PATCH v3] of/pci: Provide support for parsing PCI DT ranges property

2013-04-08 Thread Andrew Murray
This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture specific
DT handling. This will make it easier to write architecture independent host
bridge drivers and mitigate against further duplication of DT parsing code.

This patch can be used in the following way:

struct of_pci_range_parser parser;
struct of_pci_range range;

if (of_pci_range_parser(&parser, np))
; //no ranges property

for_each_of_pci_range(&parser, &range) {

/*
directly access properties of the address range, e.g.:
range.pci_space, range.pci_addr, range.cpu_addr,
range.size, range.flags

alternatively obtain a struct resource, e.g.:
struct resource res;
of_pci_range_to_resource(&range, np, &res);
*/
}

Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).

The modifications to microblaze, mips and powerpc have not been tested.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Signed-off-by: Thomas Petazzoni 
---
Compared to the v3 sent by Andrew Murray, the following changes have
been made:

 * Unify and move duplicate pci_process_bridge_OF_ranges functions to
   drivers/of/of_pci.c as suggested by Rob Herring

 * Fix potential build errors with Microblaze/MIPS

Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT
ranges property", the following changes have been made:

 * Correct use of IORESOURCE_* as suggested by Russell King

 * Improved interface and naming as suggested by Thierry Reding

Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did:

 * Add a memset() on the struct of_pci_range_iter when starting the
   for loop in for_each_of_pci_range(). Otherwise, with an uninitialized
   of_pci_range_iter, of_pci_process_ranges() may crash.

 * Add parentheses around 'res', 'np' and 'iter' in the
   for_each_of_pci_range macro definitions. Otherwise, passing
   something like &foobar as 'res' didn't work.

 * Rebased on top of 3.9-rc2, which required fixing a few conflicts in
   the Microblaze code.

v2:
  This follows on from suggestions made by Grant Likely
  (marc.info/?l=linux-kernel&m=136079602806328)
---
 arch/microblaze/include/asm/pci-bridge.h |5 +-
 arch/microblaze/pci/pci-common.c |  192 --
 arch/mips/pci/pci.c  |   50 +++--
 arch/powerpc/include/asm/pci-bridge.h|5 +-
 arch/powerpc/kernel/pci-common.c |  192 --
 drivers/of/address.c |   63 ++
 drivers/of/of_pci.c  |  168 ++
 include/linux/of_address.h   |   42 +++
 include/linux/of_pci.h   |3 +
 9 files changed, 294 insertions(+), 426 deletions(-)

diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h
index cb5d397..5783cd6 100644
--- a/arch/microblaze/include/asm/pci-bridge.h
+++ b/arch/microblaze/include/asm/pci-bridge.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device_node;
 
@@ -132,10 +133,6 @@ extern void setup_indirect_pci(struct pci_controller *hose,
 extern struct pci_controller *pci_find_hose_for_OF_device(
struct device_node *node);
 
-/* Fill up host controller resources from the OF node */
-extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
-   struct device_node *dev, int primary);
-
 /* Allocate & free a PCI host bridge structure */
 extern struct pci_controller *pcibios_alloc_controller(struct device_node *dev);
 extern void pcibios_free_controller(struct pci_controller *phb);
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 9ea521e..2735ad9 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -622,198 +622,6 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
*end = rsrc->end - offset;
 }
 
-/**
- * pci_process_bridge_OF_ranges - Parse PCI bridge resources from device tree
- * @hose: newly allocated pci_controller to be setup
- * @dev: device node of the host bridge
- * @primary: set if primary bus (32 bits only, soon to be deprecated)
- *
- * This function will parse the "ranges" property of a PCI host bridge device
- * node and setup the resource mapping of a pci controller based on its
- * content.
- *
- * Life would be boring if it wasn't for a few issues that we have to deal
- * wi

[PATCH v5 2/3] of/pci: Provide support for parsing PCI DT ranges property

2013-04-10 Thread Andrew Murray
This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture specific
DT handling. This will make it easier to write architecture independent host
bridge drivers and mitigate against further duplication of DT parsing code.

This patch can be used in the following way:

struct of_pci_range_parser parser;
struct of_pci_range range;

if (of_pci_range_parser(&parser, np))
; //no ranges property

for_each_of_pci_range(&parser, &range) {

/*
directly access properties of the address range, e.g.:
range.pci_space, range.pci_addr, range.cpu_addr,
range.size, range.flags

alternatively obtain a struct resource, e.g.:
struct resource res;
of_pci_range_to_resource(&range, np, &res);
*/
}
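
The cursor-style iteration behind this interface can be tried out in user space with a simplified model. This sketch makes several assumptions that differ from the kernel code: cells are in host byte order rather than big-endian, the parent address is read as-is rather than translated, adjacent ranges are not merged, and the names range_cursor, range_entry, cursor_init() and cursor_next() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor: each entry is 3 PCI address cells,
 * pna parent address cells and 2 size cells. */
struct range_cursor {
	const uint32_t *pos;  /* next unread cell */
	const uint32_t *end;  /* one past the last cell */
	int pna;              /* parent (CPU) address cells */
	int np;               /* cells per entry: 3 + pna + 2 */
};

struct range_entry {
	uint32_t pci_space;
	uint64_t pci_addr;
	uint64_t parent_addr; /* untranslated in this model */
	uint64_t size;
};

/* Combine n 32-bit cells, most significant first, into one number. */
static uint64_t read_cells(const uint32_t *cell, int n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 32) | *cell++;
	return v;
}

static void cursor_init(struct range_cursor *cur, const uint32_t *cells,
			size_t ncells, int pna)
{
	assert(cur != NULL && cells != NULL && pna > 0);
	cur->pos = cells;
	cur->end = cells + ncells;
	cur->pna = pna;
	cur->np = 3 + pna + 2;
}

/* Return 1 and fill *out if a complete entry remains, otherwise 0. */
static int cursor_next(struct range_cursor *cur, struct range_entry *out)
{
	assert(cur != NULL && out != NULL);

	if (cur->end - cur->pos < cur->np)
		return 0;

	out->pci_space = cur->pos[0];
	out->pci_addr = read_cells(cur->pos + 1, 2);
	out->parent_addr = read_cells(cur->pos + 3, cur->pna);
	out->size = read_cells(cur->pos + 3 + cur->pna, 2);
	cur->pos += cur->np;
	return 1;
}
```

Keeping the position in a separate cursor structure lets the caller own the per-range storage, and a trailing partial entry is simply ignored because the bounds check requires a full entry's worth of cells.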

Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Signed-off-by: Thomas Petazzoni 
---
 drivers/of/address.c   |   63 +
 drivers/of/of_pci.c|  112 
 include/linux/of_address.h |   42 
 3 files changed, 145 insertions(+), 72 deletions(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 04da786..e87f45e 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -227,6 +227,69 @@ int of_pci_address_to_resource(struct device_node *dev, int bar,
return __of_address_to_resource(dev, addrp, size, flags, NULL, r);
 }
 EXPORT_SYMBOL_GPL(of_pci_address_to_resource);
+
+int of_pci_range_parser(struct of_pci_range_parser *parser,
+   struct device_node *node)
+{
+   const int na = 3, ns = 2;
+   int rlen;
+
+   parser->node = node;
+   parser->pna = of_n_addr_cells(node);
+   parser->np = parser->pna + na + ns;
+
+   parser->range = of_get_property(node, "ranges", &rlen);
+   if (parser->range == NULL)
+   return -ENOENT;
+
+   parser->end = parser->range + rlen / sizeof(__be32);
+
+   return 0;
+}
+
+struct of_pci_range *of_pci_process_ranges(struct of_pci_range_parser *parser,
+   struct of_pci_range *range)
+{
+   const int na = 3, ns = 2;
+
+   if (!parser->range || parser->range + parser->np > parser->end)
+   return NULL;
+
+   range->pci_space = be32_to_cpup(parser->range);
+   range->flags = of_bus_pci_get_flags(parser->range);
+   range->pci_addr = of_read_number(parser->range + 1, ns);
+   range->cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   range->size = of_read_number(parser->range + parser->pna + na, ns);
+
+   parser->range += parser->np;
+
+   /* Now consume following elements while they are contiguous */
+   while (parser->range + parser->np <= parser->end) {
+   u32 flags, pci_space;
+   u64 pci_addr, cpu_addr, size;
+
+   pci_space = be32_to_cpup(parser->range);
+   flags = of_bus_pci_get_flags(parser->range);
+   pci_addr = of_read_number(parser->range + 1, ns);
+   cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   size = of_read_number(parser->range + parser->pna + na, ns);
+
+   if (flags != range->flags)
+   break;
+   if (pci_addr != range->pci_addr + range->size ||
+   cpu_addr != range->cpu_addr + range->size)
+   break;
+
+   range->size += size;
+   parser->range += parser->np;
+   }
+
+   return range;
+}
+EXPORT_SYMBOL_GPL(of_pci_process_ranges);
+
 #endif /* CONFIG_PCI */
 
 /*
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0611248..9680dc6 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -82,67 +82,43 @@ EXPORT_SYMBOL_GPL(of_pci_find_child_device);
 void pci_process_bridge_OF_ranges(struct pci_controller *hose,
  struct device_node *dev, int primary)
 {
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
unsigned long long isa_mb = 0;
struct resource 

[PATCH v5 3/3] of/pci: mips: convert to common of_pci_range_parser

2013-04-10 Thread Andrew Murray
This patch converts the pci_load_of_ranges function to use the new common
of_pci_range_parser.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
---
 arch/mips/pci/pci.c |   50 --
 1 files changed, 16 insertions(+), 34 deletions(-)

diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 0872f12..bee49a4 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -122,51 +122,33 @@ static void pcibios_scanbus(struct pci_controller *hose)
 #ifdef CONFIG_OF
 void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node)
 {
-   const __be32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(node);
-   int np = pna + 5;
+   struct of_pci_range range;
+   struct of_pci_range_parser parser;
+   u32 res_type;
 
pr_info("PCI host bridge %s ranges:\n", node->full_name);
-   ranges = of_get_property(node, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
hose->of_node = node;
 
-   while ((rlen -= np * 4) >= 0) {
-   u32 pci_space;
+   if (of_pci_range_parser(&parser, node))
+   return;
+
+   for_each_of_pci_range(&parser, &range) {
struct resource *res = NULL;
-   u64 addr, size;
-
-   pci_space = be32_to_cpup(&ranges[0]);
-   addr = of_translate_address(node, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-   ranges += np;
-   switch ((pci_space >> 24) & 0x3) {
-   case 1: /* PCI IO space */
+
+   res_type = range.flags & IORESOURCE_TYPE_BITS;
+   if (res_type == IORESOURCE_IO) {
pr_info("  IO 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr, range.cpu_addr + range.size - 1);
hose->io_map_base =
-   (unsigned long)ioremap(addr, size);
+   (unsigned long)ioremap(range.cpu_addr, range.size);
res = hose->io_resource;
-   res->flags = IORESOURCE_IO;
-   break;
-   case 2: /* PCI Memory space */
-   case 3: /* PCI 64 bits Memory space */
+   } else if (res_type == IORESOURCE_MEM) {
pr_info(" MEM 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr, range.cpu_addr + range.size - 1);
res = hose->mem_resource;
-   res->flags = IORESOURCE_MEM;
-   break;
-   }
-   if (res != NULL) {
-   res->start = addr;
-   res->name = node->full_name;
-   res->end = res->start + size - 1;
-   res->parent = NULL;
-   res->sibling = NULL;
-   res->child = NULL;
}
+   if (res != NULL)
+   of_pci_range_to_resource(&range, node, res);
}
 }
 #endif
-- 
1.7.0.4
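
For readers comparing the removed switch with the new flag test, the old decoding can be sketched as a standalone function. This is an illustration only: pci_space_to_restype() is a hypothetical name, the IORESOURCE_* values are copied from the definitions quoted earlier in this thread, and only the two-bit space code from the removed switch is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as in the IORESOURCE_* definitions quoted earlier in this thread. */
#define IORESOURCE_IO  0x0100UL
#define IORESOURCE_MEM 0x0200UL

/*
 * Hypothetical helper: map the space code in bits 25:24 of the first
 * ranges cell to a resource type. Returns 0 on success, or -1 for a
 * code that the removed switch did not handle.
 */
static int pci_space_to_restype(uint32_t pci_space, unsigned long *restype)
{
	assert(restype != NULL);

	switch ((pci_space >> 24) & 0x3) {
	case 1:		/* PCI IO space */
		*restype = IORESOURCE_IO;
		return 0;
	case 2:		/* PCI Memory space */
	case 3:		/* PCI 64 bits Memory space */
		*restype = IORESOURCE_MEM;
		return 0;
	default:	/* code 0: no resource is set up */
		return -1;
	}
}
```

Both memory codes collapse to the same resource type, which is why the converted loop only needs to distinguish IORESOURCE_IO from IORESOURCE_MEM.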



[PATCH v5 0/3] of/pci: Provide common support for PCI DT parsing

2013-04-10 Thread Andrew Murray
This patchset factors out duplicated code associated with parsing PCI
DT "ranges" properties across the architectures and introduces a
"ranges" parser. This parser "of_pci_range_parser" can be used directly
by ARM host bridge drivers enabling them to obtain ranges from device
trees.

Compared to the v4 (incorrectly labelled v3) sent by Andrew Murray,
the following changes have been made:

 * Split the patch as suggested by Rob Herring

Compared to the v3 sent by Andrew Murray, the following changes have
been made:

 * Unify and move duplicate pci_process_bridge_OF_ranges functions to
   drivers/of/of_pci.c as suggested by Rob Herring

 * Fix potential build errors with Microblaze/MIPS

Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT
ranges property", the following changes have been made:

 * Correct use of IORESOURCE_* as suggested by Russell King

 * Improved interface and naming as suggested by Thierry Reding

Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did:

 * Add a memset() on the struct of_pci_range_iter when starting the
   for loop in for_each_of_pci_range(). Otherwise, with an uninitialized
   of_pci_range_iter, of_pci_process_ranges() may crash.

 * Add parentheses around 'res', 'np' and 'iter' in the
   for_each_of_pci_range macro definitions. Otherwise, passing
   something like &foobar as 'res' didn't work.

 * Rebased on top of 3.9-rc2, which required fixing a few conflicts in
   the Microblaze code.

v2:
  This follows on from suggestions made by Grant Likely
  (marc.info/?l=linux-kernel&m=136079602806328)

Andrew Murray (3):
  of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and
PowerPC
  of/pci: Provide support for parsing PCI DT ranges property
  of/pci: mips: convert to common of_pci_range_parser

 arch/microblaze/include/asm/pci-bridge.h |5 +-
 arch/microblaze/pci/pci-common.c |  192 --
 arch/mips/pci/pci.c  |   50 +++--
 arch/powerpc/include/asm/pci-bridge.h|5 +-
 arch/powerpc/kernel/pci-common.c |  192 --
 drivers/of/address.c |   63 ++
 drivers/of/of_pci.c  |  168 ++
 include/linux/of_address.h   |   42 +++
 include/linux/of_pci.h   |3 +
 9 files changed, 294 insertions(+), 426 deletions(-)



[PATCH v5 1/3] of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC

2013-04-10 Thread Andrew Murray
The pci_process_bridge_OF_ranges function, used to parse the "ranges"
property of a PCI host device, is found in both Microblaze and PowerPC
architectures. These implementations are nearly identical. This patch
moves this common code to a common place.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
---
 arch/microblaze/include/asm/pci-bridge.h |5 +-
 arch/microblaze/pci/pci-common.c |  192 
 arch/powerpc/include/asm/pci-bridge.h|5 +-
 arch/powerpc/kernel/pci-common.c |  192 
 drivers/of/of_pci.c  |  200 ++
 include/linux/of_pci.h   |3 +
 6 files changed, 205 insertions(+), 392 deletions(-)

diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h
index cb5d397..5783cd6 100644
--- a/arch/microblaze/include/asm/pci-bridge.h
+++ b/arch/microblaze/include/asm/pci-bridge.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device_node;
 
@@ -132,10 +133,6 @@ extern void setup_indirect_pci(struct pci_controller *hose,
 extern struct pci_controller *pci_find_hose_for_OF_device(
struct device_node *node);
 
-/* Fill up host controller resources from the OF node */
-extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
-   struct device_node *dev, int primary);
-
 /* Allocate & free a PCI host bridge structure */
 extern struct pci_controller *pcibios_alloc_controller(struct device_node *dev);
 extern void pcibios_free_controller(struct pci_controller *phb);
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 9ea521e..2735ad9 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -622,198 +622,6 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
*end = rsrc->end - offset;
 }
 
-/**
- * pci_process_bridge_OF_ranges - Parse PCI bridge resources from device tree
- * @hose: newly allocated pci_controller to be setup
- * @dev: device node of the host bridge
- * @primary: set if primary bus (32 bits only, soon to be deprecated)
- *
- * This function will parse the "ranges" property of a PCI host bridge device
- * node and setup the resource mapping of a pci controller based on its
- * content.
- *
- * Life would be boring if it wasn't for a few issues that we have to deal
- * with here:
- *
- *   - We can only cope with one IO space range and up to 3 Memory space
- * ranges. However, some machines (thanks Apple !) tend to split their
- * space into lots of small contiguous ranges. So we have to coalesce.
- *
- *   - We can only cope with all memory ranges having the same offset
- * between CPU addresses and PCI addresses. Unfortunately, some bridges
- * are setup for a large 1:1 mapping along with a small "window" which
- * maps PCI address 0 to some arbitrary high address of the CPU space in
- * order to give access to the ISA memory hole.
- * The way out of here that I've chosen for now is to always set the
- * offset based on the first resource found, then override it if we
- * have a different offset and the previous was set by an ISA hole.
- *
- *   - Some busses have IO space not starting at 0, which causes trouble with
- * the way we do our IO resource renumbering. The code somewhat deals with
- * it for 64 bits but I would expect problems on 32 bits.
- *
- *   - Some 32 bits platforms such as 4xx can have physical space larger than
- * 32 bits so we need to use 64 bits values for the parsing
- */
-void pci_process_bridge_OF_ranges(struct pci_controller *hose,
- struct device_node *dev, int primary)
-{
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
-   int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
-   unsigned long long isa_mb = 0;
-   struct resource *res;
-
-   pr_info("PCI host bridge %s %s ranges:\n",
-  dev->full_name, primary ? "(primary)" : "");
-
-   /* Get ranges property */
-   ranges = of_get_property(dev, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
-
-   /* Parse it */
-   pr_debug("Parsing ranges property...\n");
-   while ((rlen -= np * 4) >= 0) {
-   /* Read next ranges element */
-   pci_space = ranges[0];
-   pci_addr = of_read_number(ranges + 1, 2);
-   cpu_addr = of_translate_address(dev, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-
-   pr_debug("pci_space: 0x%08x pci_addr:0x%016llx "

[PATCH v9 2/3] of/pci: mips: convert to common of_pci_range_parser

2013-05-07 Thread Andrew Murray
This patch converts the pci_load_of_ranges function to use the new common
of_pci_range_parser.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Signed-off-by: Gabor Juhos 
Reviewed-by: Rob Herring 
Reviewed-by: Grant Likely 
Tested-by: Linus Walleij 
---
 arch/mips/pci/pci.c |   50 ++
 1 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 0872f12..0d291e9 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -122,51 +122,37 @@ static void pcibios_scanbus(struct pci_controller *hose)
 #ifdef CONFIG_OF
 void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node)
 {
-   const __be32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(node);
-   int np = pna + 5;
+   struct of_pci_range range;
+   struct of_pci_range_parser parser;
 
pr_info("PCI host bridge %s ranges:\n", node->full_name);
-   ranges = of_get_property(node, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
hose->of_node = node;
 
-   while ((rlen -= np * 4) >= 0) {
-   u32 pci_space;
+   if (of_pci_range_parser_init(&parser, node))
+   return;
+
+   for_each_of_pci_range(&parser, &range) {
struct resource *res = NULL;
-   u64 addr, size;
-
-   pci_space = be32_to_cpup(&ranges[0]);
-   addr = of_translate_address(node, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-   ranges += np;
-   switch ((pci_space >> 24) & 0x3) {
-   case 1: /* PCI IO space */
+
+   switch (range.flags & IORESOURCE_TYPE_BITS) {
+   case IORESOURCE_IO:
pr_info("  IO 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr,
+   range.cpu_addr + range.size - 1);
hose->io_map_base =
-   (unsigned long)ioremap(addr, size);
+   (unsigned long)ioremap(range.cpu_addr,
+  range.size);
res = hose->io_resource;
-   res->flags = IORESOURCE_IO;
break;
-   case 2: /* PCI Memory space */
-   case 3: /* PCI 64 bits Memory space */
+   case IORESOURCE_MEM:
pr_info(" MEM 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr,
+   range.cpu_addr + range.size - 1);
res = hose->mem_resource;
-   res->flags = IORESOURCE_MEM;
break;
}
-   if (res != NULL) {
-   res->start = addr;
-   res->name = node->full_name;
-   res->end = res->start + size - 1;
-   res->parent = NULL;
-   res->sibling = NULL;
-   res->child = NULL;
-   }
+   if (res != NULL)
+   of_pci_range_to_resource(&range, node, res);
}
 }
 #endif
-- 
1.7.0.4
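
For readers following the conversion above: the removed code tested
(pci_space >> 24) & 0x3 directly, while the new code compares range.flags
against IORESOURCE_* values. The sketch below shows one way that mapping can
be expressed in plain C. The SKETCH_* constants and sketch_pci_flags() are
hypothetical stand-ins written for this note, not the kernel's definitions;
the handling of the prefetchable bit (bit 30) is an assumption taken from the
PCI bus binding rather than from this patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's IORESOURCE_* flag values. */
#define SKETCH_RES_IO        0x00000100u
#define SKETCH_RES_MEM       0x00000200u
#define SKETCH_RES_PREFETCH  0x00002000u
#define SKETCH_RES_TYPE_BITS 0x00001f00u

/*
 * Translate the first cell of a PCI "ranges" entry into resource flags.
 * Bits 25:24 carry the space code: 1 = I/O, 2 = 32-bit memory,
 * 3 = 64-bit memory. Any other code yields no type bits.
 */
static unsigned int sketch_pci_flags(uint32_t phys_hi)
{
	unsigned int flags = 0;

	switch ((phys_hi >> 24) & 0x3) {
	case 1:	/* PCI IO space */
		flags |= SKETCH_RES_IO;
		break;
	case 2:	/* PCI Memory space */
	case 3:	/* PCI 64 bits Memory space */
		flags |= SKETCH_RES_MEM;
		break;
	}

	/* Assumption: bit 30 marks a prefetchable region. */
	if (phys_hi & 0x40000000u)
		flags |= SKETCH_RES_PREFETCH;

	return flags;
}
```

A caller would then switch on (flags & SKETCH_RES_TYPE_BITS), which is the
same shape as the switch in the converted pci_load_of_ranges().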

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v9 3/3] of/pci: microblaze: convert to common of_pci_range_parser

2013-05-07 Thread Andrew Murray
This patch converts the pci_process_bridge_OF_ranges function to use the new
common of_pci_range_parser.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
---
 arch/microblaze/pci/pci-common.c |  106 ++
 1 files changed, 38 insertions(+), 68 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 9ea521e..ba9e4a1 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -658,67 +658,42 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
 void pci_process_bridge_OF_ranges(struct pci_controller *hose,
  struct device_node *dev, int primary)
 {
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
unsigned long long isa_mb = 0;
struct resource *res;
+   struct of_pci_range range;
+   struct of_pci_range_parser parser;
 
pr_info("PCI host bridge %s %s ranges:\n",
   dev->full_name, primary ? "(primary)" : "");
 
-   /* Get ranges property */
-   ranges = of_get_property(dev, "ranges", &rlen);
-   if (ranges == NULL)
+   /* Check for ranges property */
+   if (of_pci_range_parser_init(&parser, dev))
return;
 
-   /* Parse it */
pr_debug("Parsing ranges property...\n");
-   while ((rlen -= np * 4) >= 0) {
+   for_each_of_pci_range(&parser, &range) {
/* Read next ranges element */
-   pci_space = ranges[0];
-   pci_addr = of_read_number(ranges + 1, 2);
-   cpu_addr = of_translate_address(dev, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-
pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ",
-   pci_space, pci_addr);
+   range.pci_space, range.pci_addr);
pr_debug("cpu_addr:0x%016llx size:0x%016llx\n",
-   cpu_addr, size);
-
-   ranges += np;
+   range.cpu_addr, range.size);
 
/* If we failed translation or got a zero-sized region
 * (some FW try to feed us with non sensical zero sized regions
 * such as power3 which look like some kind of attempt
 * at exposing the VGA memory hole)
 */
-   if (cpu_addr == OF_BAD_ADDR || size == 0)
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
continue;
 
-   /* Now consume following elements while they are contiguous */
-   for (; rlen >= np * sizeof(u32);
-ranges += np, rlen -= np * 4) {
-   if (ranges[0] != pci_space)
-   break;
-   pci_next = of_read_number(ranges + 1, 2);
-   cpu_next = of_translate_address(dev, ranges + 3);
-   if (pci_next != pci_addr + size ||
-   cpu_next != cpu_addr + size)
-   break;
-   size += of_read_number(ranges + pna + 3, 2);
-   }
-
/* Act based on address space type */
res = NULL;
-   switch ((pci_space >> 24) & 0x3) {
-   case 1: /* PCI IO space */
+   switch (range.flags & IORESOURCE_TYPE_BITS) {
+   case IORESOURCE_IO:
pr_info("  IO 0x%016llx..0x%016llx -> 0x%016llx\n",
-  cpu_addr, cpu_addr + size - 1, pci_addr);
+   range.cpu_addr, range.cpu_addr + range.size - 1,
+   range.pci_addr);
 
/* We support only one IO range */
if (hose->pci_io_size) {
@@ -726,11 +701,12 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
continue;
}
/* On 32 bits, limit I/O space to 16MB */
-   if (size > 0x01000000)
-   size = 0x01000000;
+   if (range.size > 0x01000000)
+   range.size = 0x01000000;
 
/* 32 bits needs to map IOs here */
-   hose->io_base_virt = ioremap(cpu_addr, size);
+   hose->io_base_virt = ioremap(range.cpu_addr,
+   range.size);
 
/* Expect trouble if pci_addr is not 0 */

[PATCH v9 0/3] of/pci: Provide common support for PCI DT parsing

2013-05-07 Thread Andrew Murray
This patchset factors out duplicated code associated with parsing PCI
DT "ranges" properties across the architectures and introduces a
"ranges" parser. This parser "of_pci_range_parser" can be used directly
by ARM host bridge drivers enabling them to obtain ranges from device
trees.

I've included the Reviewed-by, Tested-by and Acked-by tags received from
v5/v6/v7/v8 in this patchset. Earlier versions of this patchset (v3) were
tested by:

Thierry Reding 
Jingoo Han 

I've tested that this patchset builds and runs on ARM and that it builds on
PowerPC, x86_64, MIPS and Microblaze.

Compared to the v8 sent by Andrew Murray, the following changes have been made
(please note that the MIPS patch is unchanged from v8):

 * Remove the unification of pci_process_bridge_OF_ranges between PowerPC and
   Microblaze. Feedback from Bjorn and Benjamin (along with a NAK) suggested
   that this goes against their future direction (using more of struct
   pci_host_bridge and less of arch specific struct pci_controller).

Compared to the v7 sent by Andrew Murray, the following changes have been made
(please note that the first patch is unchanged from v7):

 * Rename of_pci_range_parser to of_pci_range_parser_init and
   of_pci_process_ranges to of_pci_range_parser_one as suggested by Grant
   Likely.

 * Reverted back to using a switch statement instead of if/else in
   pci_process_bridge_OF_ranges. Grant Likely highlighted this change from
   the original code which was unnecessary.

 * Squashed in a patch provided by Gabor Juhos which fixes build errors on
   MIPS found in the last patchset.

Compared to the v6 sent by Andrew Murray, the following changes have
been made in response to build errors/warnings:

 * Inclusion of linux/of_address.h in of_pci.c as suggested by Michal
   Simek to prevent compilation failures on Microblaze (and others) and his
   ack.

 * Use of externs and static inlines, plus a typo fix, in linux/of_address.h in
   response to linker errors (multiple definition) on x86_64 as spotted by a
   kbuild test robot on (jcooper/linux.git mvebu/drivers)

 * Add EXPORT_SYMBOL_GPL to of_pci_range_parser function to be consistent
   with of_pci_process_ranges function

Compared to the v5 sent by Andrew Murray, the following changes have
been made:

 * Use of CONFIG_64BIT instead of CONFIG_[a32bitarch] as suggested by
   Rob Herring in drivers/of/of_pci.c

 * Added forward declaration of struct pci_controller in linux/of_pci.h
   to prevent compiler warning as suggested by Thomas Petazzoni

 * Improved error checking (!range check), removal of unnecessary be32_to_cpup
   call, improved formatting of struct of_pci_range_parser layout and
   replacement of macro with a static inline. All suggested by Rob Herring.

Compared to the v4 (incorrectly labelled v3) sent by Andrew Murray,
the following changes have been made:

 * Split the patch as suggested by Rob Herring

Compared to the v3 sent by Andrew Murray, the following changes have
been made:

 * Unify and move duplicate pci_process_bridge_OF_ranges functions to
   drivers/of/of_pci.c as suggested by Rob Herring

 * Fix potential build errors with Microblaze/MIPS

Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT
ranges property", the following changes have been made:

 * Correct use of IORESOURCE_* as suggested by Russell King

 * Improved interface and naming as suggested by Thierry Reding

Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did:

 * Add a memset() on the struct of_pci_range_iter when starting the
   for loop in for_each_pci_range(). Otherwise, with an uninitialized
   of_pci_range_iter, of_pci_process_ranges() may crash.

 * Add parentheses around 'res', 'np' and 'iter' in the
   for_each_of_pci_range macro definitions. Otherwise, passing
   something like &foobar as 'res' didn't work.

 * Rebased on top of 3.9-rc2, which required fixing a few conflicts in
   the Microblaze code.
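
The macro-parenthesisation item in the list above is a general C pitfall, and
a small sketch may help. The names below (struct sketch_iter,
SKETCH_ITER_RESET) are hypothetical and are not the macros from this
patchset; they only illustrate why an argument such as &foobar needs
parentheses inside a macro body.

```c
/* Hypothetical iterator type, standing in for the parser state. */
struct sketch_iter {
	int pos;
};

/*
 * The argument is wrapped in parentheses, so (it)->pos expands correctly
 * whether the caller passes a pointer variable or an expression such as
 * &obj. Written without them, `&obj->pos` would parse as `&(obj->pos)`
 * because -> binds more tightly than unary &, and would not compile for a
 * non-pointer obj.
 */
#define SKETCH_ITER_RESET(it) ((it)->pos = 0)

/* Reset a stack-allocated iterator through its address. */
static int sketch_reset_and_get(void)
{
	struct sketch_iter iter = { .pos = 42 };

	SKETCH_ITER_RESET(&iter);
	return iter.pos;
}
```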

v2:
  This follows on from suggestions made by Grant Likely
  (marc.info/?l=linux-kernel&m=136079602806328)

Andrew Murray (3):
  of/pci: Provide support for parsing PCI DT ranges property
  of/pci: mips: convert to common of_pci_range_parser
  of/pci: microblaze: convert to common of_pci_range_parser

 arch/microblaze/pci/pci-common.c |  106 ++
 arch/mips/pci/pci.c  |   50 ++---
 drivers/of/address.c |   67 
 include/linux/of_address.h   |   48 +
 4 files changed, 171 insertions(+), 100 deletions(-)



[PATCH v9 1/3] of/pci: Provide support for parsing PCI DT ranges property

2013-05-07 Thread Andrew Murray
This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture specific
DT handling. This will make it easier to write architecture independent host
bridge drivers and mitigate against further duplication of DT parsing code.

This patch can be used in the following way:

struct of_pci_range_parser parser;
struct of_pci_range range;

if (of_pci_range_parser_init(&parser, np))
; //no ranges property

for_each_of_pci_range(&parser, &range) {

/*
directly access properties of the address range, e.g.:
range.pci_space, range.pci_addr, range.cpu_addr,
range.size, range.flags

alternatively obtain a struct resource, e.g.:
struct resource res;
of_pci_range_to_resource(&range, np, &res);
*/
}

Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).
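
The merging of adjacent ranges described above can be sketched outside the
kernel as follows. This is a simplified userspace illustration, not the code
from this patch: struct sketch_range and sketch_merge_ranges() are
hypothetical names, and the entries are assumed to be already decoded and
translated into CPU-endian values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct of_pci_range. */
struct sketch_range {
	uint32_t flags;
	uint64_t pci_addr;
	uint64_t cpu_addr;
	uint64_t size;
};

/*
 * Merge runs of contiguous entries from in[0..n) into out[] and return the
 * number of merged entries written. out must have room for n entries.
 * Two entries are merged only when their flags match and both the PCI and
 * the CPU address of the second follow directly on from the first.
 */
static size_t sketch_merge_ranges(const struct sketch_range *in, size_t n,
				  struct sketch_range *out)
{
	size_t i = 0, count = 0;

	while (i < n) {
		struct sketch_range cur = in[i++];

		/* Consume following entries while they are contiguous. */
		while (i < n &&
		       in[i].flags == cur.flags &&
		       in[i].pci_addr == cur.pci_addr + cur.size &&
		       in[i].cpu_addr == cur.cpu_addr + cur.size) {
			cur.size += in[i].size;
			i++;
		}

		out[count++] = cur;
	}

	return count;
}
```

The inner loop plays the same role as the "consume following elements while
they are contiguous" loop in of_pci_range_parser_one(), except that here the
whole table is processed in one call instead of one range per iteration.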

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Signed-off-by: Thomas Petazzoni 
Reviewed-by: Rob Herring 
Tested-by: Thomas Petazzoni 
Tested-by: Linus Walleij 
Tested-by: Jingoo Han 
Acked-by: Grant Likely 
---
 drivers/of/address.c   |   67 
 include/linux/of_address.h |   48 +++
 2 files changed, 115 insertions(+), 0 deletions(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 04da786..fdd0636 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -227,6 +227,73 @@ int of_pci_address_to_resource(struct device_node *dev, int bar,
return __of_address_to_resource(dev, addrp, size, flags, NULL, r);
 }
 EXPORT_SYMBOL_GPL(of_pci_address_to_resource);
+
+int of_pci_range_parser_init(struct of_pci_range_parser *parser,
+   struct device_node *node)
+{
+   const int na = 3, ns = 2;
+   int rlen;
+
+   parser->node = node;
+   parser->pna = of_n_addr_cells(node);
+   parser->np = parser->pna + na + ns;
+
+   parser->range = of_get_property(node, "ranges", &rlen);
+   if (parser->range == NULL)
+   return -ENOENT;
+
+   parser->end = parser->range + rlen / sizeof(__be32);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(of_pci_range_parser_init);
+
+struct of_pci_range *of_pci_range_parser_one(struct of_pci_range_parser *parser,
+   struct of_pci_range *range)
+{
+   const int na = 3, ns = 2;
+
+   if (!range)
+   return NULL;
+
+   if (!parser->range || parser->range + parser->np > parser->end)
+   return NULL;
+
+   range->pci_space = parser->range[0];
+   range->flags = of_bus_pci_get_flags(parser->range);
+   range->pci_addr = of_read_number(parser->range + 1, ns);
+   range->cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   range->size = of_read_number(parser->range + parser->pna + na, ns);
+
+   parser->range += parser->np;
+
+   /* Now consume following elements while they are contiguous */
+   while (parser->range + parser->np <= parser->end) {
+   u32 flags, pci_space;
+   u64 pci_addr, cpu_addr, size;
+
+   pci_space = be32_to_cpup(parser->range);
+   flags = of_bus_pci_get_flags(parser->range);
+   pci_addr = of_read_number(parser->range + 1, ns);
+   cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   size = of_read_number(parser->range + parser->pna + na, ns);
+
+   if (flags != range->flags)
+   break;
+   if (pci_addr != range->pci_addr + range->size ||
+   cpu_addr != range->cpu_addr + range->size)
+   break;
+
+   range->size += size;
+   parser->range += parser->np;
+   }
+
+   return range;
+}
+EXPORT_SYMBOL_GPL(of_pci_range_parser_one);
+
 #endif /* CONFIG_PCI */
 
 /*
diff --git a/include/linux/of_address.h b/include/linux/of_address.h
index 0506eb5..4c2e6f2 100644
--- a/include/linux/of_address.h
+++ b/include/linux/of_address.h
@@ -4,6 +4,36 @@
 #include 
 #include 
 
+struct of_pci_range_parser {
+   struct device_node *node;
+   const __be32 *range;
+   const __be32 *end;
+   int np;
+   int pna;
+};
+
+struct of_pci_range {
+   u32 pci_space;
+   u64 pci_addr;
+   u64 c

[RFC PATCH 0/3] Unify definations of struct pci_controller

2013-04-25 Thread Andrew Murray
PowerPC and Microblaze have nearly identical definitions of struct
pci_controller - this patch unifies them in asm-generic to reduce
code duplication and to allow new architectures to reuse them.

This patchset follows and depends on "of/pci: Provide common
support for PCI DT parsing" which provided common 'ranges' parsing
code which uses an architecture defined struct pci_controller. This
patch is currently in Jason Cooper's mvebu-next/pcie branch.

It is hoped this will pave the way for providing common
implementations of commonly duplicated functions found across the
architectures such as pcibios_alloc|free_controller and
pcibios_setup_phb_resources type functions.

Andrew Murray (3):
  powerpc: Move struct pci_controller to asm-generic
  microblaze: Use asm-generic version of pci_controller
  pci: Use common definations of INDIRECT_TYPE_*

 arch/microblaze/include/asm/pci-bridge.h |   70 +---
 arch/powerpc/include/asm/pci-bridge.h|   82 ---
 arch/powerpc/sysdev/fsl_pci.c|   16 +++---
 arch/powerpc/sysdev/indirect_pci.c   |   20 +++---
 arch/powerpc/sysdev/ppc4xx_pci.c |4 +-
 arch/powerpc/sysdev/xilinx_pci.c |2 +-
 include/asm-generic/pci-bridge.h |   90 ++
 7 files changed, 112 insertions(+), 172 deletions(-)



[RFC PATCH 2/3] microblaze: Use asm-generic version of pci_controller

2013-04-25 Thread Andrew Murray
This patch removes struct pci_controller from Microblaze and instead
uses struct pci_controller from asm-generic.

Signed-off-by: Andrew Murray 
---
 arch/microblaze/include/asm/pci-bridge.h |   75 ++
 include/asm-generic/pci-bridge.h |2 +-
 2 files changed, 16 insertions(+), 61 deletions(-)

diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h
index 5783cd6..0ee75dc 100644
--- a/arch/microblaze/include/asm/pci-bridge.h
+++ b/arch/microblaze/include/asm/pci-bridge.h
@@ -8,9 +8,9 @@
  * 2 of the License, or (at your option) any later version.
  */
 #include 
-#include 
 #include 
 #include 
+#include 
 
 struct device_node;
 
@@ -25,72 +25,27 @@ static inline int pcibios_vaddr_is_ioport(void __iomem *address)
 #endif
 
 /*
- * Structure of a PCI controller (host bridge)
+ * Used for variants of PCI indirect handling and possible quirks:
+ *  SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1
+ *  EXT_REG - provides access to PCI-e extended registers
+ *  SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS
+ *   on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS
+ *   to determine which bus number to match on when generating type0
+ *   config cycles
+ *  NO_PCIE_LINK - the Freescale PCI-e controllers have issues with
+ *   hanging if we don't have link and try to do config cycles to
+ *   anything but the PHB.  Only allow talking to the PHB if this is
+ *   set.
+ *  BIG_ENDIAN - cfg_addr is a big endian register
+ *  BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs
+ *   on the PLB4.  Effectively disable MRM commands by setting this.
  */
-struct pci_controller {
-   struct pci_bus *bus;
-   char is_dynamic;
-   struct device_node *dn;
-   struct list_head list_node;
-   struct device *parent;
-
-   int first_busno;
-   int last_busno;
-
-   int self_busno;
-
-   void __iomem *io_base_virt;
-   resource_size_t io_base_phys;
-
-   resource_size_t pci_io_size;
-
-   /* Some machines (PReP) have a non 1:1 mapping of
-* the PCI memory space in the CPU bus space
-*/
-   resource_size_t pci_mem_offset;
-
-   /* Some machines have a special region to forward the ISA
-* "memory" cycles such as VGA memory regions. Left to 0
-* if unsupported
-*/
-   resource_size_t isa_mem_phys;
-   resource_size_t isa_mem_size;
-
-   struct pci_ops *ops;
-   unsigned int __iomem *cfg_addr;
-   void __iomem *cfg_data;
-
-   /*
-* Used for variants of PCI indirect handling and possible quirks:
-*  SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1
-*  EXT_REG - provides access to PCI-e extended registers
-*  SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS
-*   on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS
-*   to determine which bus number to match on when generating type0
-*   config cycles
-*  NO_PCIE_LINK - the Freescale PCI-e controllers have issues with
-*   hanging if we don't have link and try to do config cycles to
-*   anything but the PHB.  Only allow talking to the PHB if this is
-*   set.
-*  BIG_ENDIAN - cfg_addr is a big endian register
-*  BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs
-*   on the PLB4.  Effectively disable MRM commands by setting this.
-*/
 #define INDIRECT_TYPE_SET_CFG_TYPE 0x0001
 #define INDIRECT_TYPE_EXT_REG  0x0002
 #define INDIRECT_TYPE_SURPRESS_PRIMARY_BUS 0x0004
 #define INDIRECT_TYPE_NO_PCIE_LINK 0x0008
 #define INDIRECT_TYPE_BIG_ENDIAN   0x0010
 #define INDIRECT_TYPE_BROKEN_MRM   0x0020
-   u32 indirect_type;
-
-   /* Currently, we limit ourselves to 1 IO range and 3 mem
-* ranges since the common pci_bus structure can't handle more
-*/
-   struct resource io_resource;
-   struct resource mem_resources[3];
-   int global_number;  /* PCI domain number */
-};
 
 #ifdef CONFIG_PCI
 static inline struct pci_controller *pci_bus_to_host(const struct pci_bus *bus)
diff --git a/include/asm-generic/pci-bridge.h b/include/asm-generic/pci-bridge.h
index e58830e..1a7f96d 100644
--- a/include/asm-generic/pci-bridge.h
+++ b/include/asm-generic/pci-bridge.h
@@ -46,7 +46,7 @@ struct device_node;
 /*
  * Structure of a PCI controller (host bridge)
  */
-#if defined(CONFIG_PPC32) || defined(CONFIG_PPC64)
+#if defined(CONFIG_PPC32) || defined(CONFIG_PPC64) || defined(CONFIG_MICROBLAZE)
 struct pci_controller {
struct pci_bus *bus;
char is_dynamic;
-- 
1.7.0.4


[RFC PATCH 3/3] pci: Use common definations of INDIRECT_TYPE_*

2013-04-25 Thread Andrew Murray
This patch unifies similar definitions of INDIRECT_TYPE_* between
PowerPC and Microblaze.

Signed-off-by: Andrew Murray 
---
 arch/microblaze/include/asm/pci-bridge.h |   23 ---
 arch/powerpc/include/asm/pci-bridge.h|   23 ---
 arch/powerpc/sysdev/fsl_pci.c|   16 
 arch/powerpc/sysdev/indirect_pci.c   |   20 ++--
 arch/powerpc/sysdev/ppc4xx_pci.c |4 ++--
 arch/powerpc/sysdev/xilinx_pci.c |2 +-
 include/asm-generic/pci-bridge.h |   22 ++
 7 files changed, 43 insertions(+), 67 deletions(-)

diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h
index 0ee75dc..acf8252 100644
--- a/arch/microblaze/include/asm/pci-bridge.h
+++ b/arch/microblaze/include/asm/pci-bridge.h
@@ -24,29 +24,6 @@ static inline int pcibios_vaddr_is_ioport(void __iomem *address)
 }
 #endif
 
-/*
- * Used for variants of PCI indirect handling and possible quirks:
- *  SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1
- *  EXT_REG - provides access to PCI-e extended registers
- *  SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS
- *   on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS
- *   to determine which bus number to match on when generating type0
- *   config cycles
- *  NO_PCIE_LINK - the Freescale PCI-e controllers have issues with
- *   hanging if we don't have link and try to do config cycles to
- *   anything but the PHB.  Only allow talking to the PHB if this is
- *   set.
- *  BIG_ENDIAN - cfg_addr is a big endian register
- *  BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs
- *   on the PLB4.  Effectively disable MRM commands by setting this.
- */
-#define INDIRECT_TYPE_SET_CFG_TYPE 0x0001
-#define INDIRECT_TYPE_EXT_REG  0x0002
-#define INDIRECT_TYPE_SURPRESS_PRIMARY_BUS 0x0004
-#define INDIRECT_TYPE_NO_PCIE_LINK 0x0008
-#define INDIRECT_TYPE_BIG_ENDIAN   0x0010
-#define INDIRECT_TYPE_BROKEN_MRM   0x0020
-
 #ifdef CONFIG_PCI
 static inline struct pci_controller *pci_bus_to_host(const struct pci_bus *bus)
 {
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 163bd40..b2bbf05 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -14,29 +14,6 @@
 
 struct device_node;
 
-/*
- * Used for variants of PCI indirect handling and possible quirks:
- *  SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1
- *  EXT_REG - provides access to PCI-e extended registers
- *  SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS
- *   on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS
- *   to determine which bus number to match on when generating type0
- *   config cycles
- *  NO_PCIE_LINK - the Freescale PCI-e controllers have issues with
- *   hanging if we don't have link and try to do config cycles to
- *   anything but the PHB.  Only allow talking to the PHB if this is
- *   set.
- *  BIG_ENDIAN - cfg_addr is a big endian register
- *  BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs on
- *   the PLB4.  Effectively disable MRM commands by setting this.
- */
-#define PPC_INDIRECT_TYPE_SET_CFG_TYPE 0x0001
-#define PPC_INDIRECT_TYPE_EXT_REG  0x0002
-#define PPC_INDIRECT_TYPE_SURPRESS_PRIMARY_BUS 0x0004
-#define PPC_INDIRECT_TYPE_NO_PCIE_LINK 0x0008
-#define PPC_INDIRECT_TYPE_BIG_ENDIAN   0x0010
-#define PPC_INDIRECT_TYPE_BROKEN_MRM   0x0020
-
 /* These are used for config access before all the PCI probing
has been done. */
 extern int early_read_config_byte(struct pci_controller *hose, int bus,
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 682084d..d73f94a 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -399,7 +399,7 @@ void fsl_pcibios_fixup_bus(struct pci_bus *bus)
 
if (fsl_pcie_bus_fixup)
is_pcie = early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP);
-   no_link = !!(hose->indirect_type & PPC_INDIRECT_TYPE_NO_PCIE_LINK);
+   no_link = !!(hose->indirect_type & INDIRECT_TYPE_NO_PCIE_LINK);
 
if (bus->parent == hose->bus && (is_pcie || no_link)) {
for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; ++i) {
@@ -462,7 +462,7 @@ int __init fsl_add_bridge(struct platform_device *pdev, int is_primary)
hose->last_busno = bus_range ? bus_range[1] : 0xff;
 
setup_indirect_pci(hose, rsrc.start, rsrc.start + 0x4,
-   PPC_INDIRECT_TYPE_BIG_ENDIAN);
+   INDIRECT_TYPE_BIG_ENDIAN);
 
if (early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP)) {
/* For PCIE read HEADER_TYPE to identify contro

[RFC PATCH 1/3] powerpc: Move struct pci_controller to asm-generic

2013-04-25 Thread Andrew Murray
This patch moves struct pci_controller into asm-generic to allow
for use by other architectures thus reducing code duplication in
the kernel.

Signed-off-by: Andrew Murray 
---
 arch/powerpc/include/asm/pci-bridge.h |   87 +---
 include/asm-generic/pci-bridge.h  |   68 +
 2 files changed, 82 insertions(+), 73 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 205bfba..163bd40 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -8,7 +8,6 @@
  * 2 of the License, or (at your option) any later version.
  */
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -16,85 +15,27 @@
 struct device_node;
 
 /*
- * Structure of a PCI controller (host bridge)
+ * Used for variants of PCI indirect handling and possible quirks:
+ *  SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1
+ *  EXT_REG - provides access to PCI-e extended registers
+ *  SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS
+ *   on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS
+ *   to determine which bus number to match on when generating type0
+ *   config cycles
+ *  NO_PCIE_LINK - the Freescale PCI-e controllers have issues with
+ *   hanging if we don't have link and try to do config cycles to
+ *   anything but the PHB.  Only allow talking to the PHB if this is
+ *   set.
+ *  BIG_ENDIAN - cfg_addr is a big endian register
+ *  BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs on
+ *   the PLB4.  Effectively disable MRM commands by setting this.
  */
-struct pci_controller {
-   struct pci_bus *bus;
-   char is_dynamic;
-#ifdef CONFIG_PPC64
-   int node;
-#endif
-   struct device_node *dn;
-   struct list_head list_node;
-   struct device *parent;
-
-   int first_busno;
-   int last_busno;
-   int self_busno;
-   struct resource busn;
-
-   void __iomem *io_base_virt;
-#ifdef CONFIG_PPC64
-   void *io_base_alloc;
-#endif
-   resource_size_t io_base_phys;
-   resource_size_t pci_io_size;
-
-   /* Some machines (PReP) have a non 1:1 mapping of
-* the PCI memory space in the CPU bus space
-*/
-   resource_size_t pci_mem_offset;
-
-   /* Some machines have a special region to forward the ISA
-* "memory" cycles such as VGA memory regions. Left to 0
-* if unsupported
-*/
-   resource_size_t isa_mem_phys;
-   resource_size_t isa_mem_size;
-
-   struct pci_ops *ops;
-   unsigned int __iomem *cfg_addr;
-   void __iomem *cfg_data;
-
-   /*
-* Used for variants of PCI indirect handling and possible quirks:
-*  SET_CFG_TYPE - used on 4xx or any PHB that does explicit type0/1
-*  EXT_REG - provides access to PCI-e extended registers
-*  SURPRESS_PRIMARY_BUS - we suppress the setting of PCI_PRIMARY_BUS
-*   on Freescale PCI-e controllers since they used the PCI_PRIMARY_BUS
-*   to determine which bus number to match on when generating type0
-*   config cycles
-*  NO_PCIE_LINK - the Freescale PCI-e controllers have issues with
-*   hanging if we don't have link and try to do config cycles to
-*   anything but the PHB.  Only allow talking to the PHB if this is
-*   set.
-*  BIG_ENDIAN - cfg_addr is a big endian register
-*  BROKEN_MRM - the 440EPx/GRx chips have an errata that causes hangs on
-*   the PLB4.  Effectively disable MRM commands by setting this.
-*/
 #define PPC_INDIRECT_TYPE_SET_CFG_TYPE 0x0001
 #define PPC_INDIRECT_TYPE_EXT_REG  0x0002
 #define PPC_INDIRECT_TYPE_SURPRESS_PRIMARY_BUS 0x0004
 #define PPC_INDIRECT_TYPE_NO_PCIE_LINK 0x0008
 #define PPC_INDIRECT_TYPE_BIG_ENDIAN   0x0010
 #define PPC_INDIRECT_TYPE_BROKEN_MRM   0x0020
-   u32 indirect_type;
-   /* Currently, we limit ourselves to 1 IO range and 3 mem
-* ranges since the common pci_bus structure can't handle more
-*/
-   struct resource io_resource;
-   struct resource mem_resources[3];
-   int global_number;  /* PCI domain number */
-
-   resource_size_t dma_window_base_cur;
-   resource_size_t dma_window_size;
-
-#ifdef CONFIG_PPC64
-   unsigned long buid;
-
-   void *private_data;
-#endif /* CONFIG_PPC64 */
-};
 
 /* These are used for config access before all the PCI probing
has been done. */
diff --git a/include/asm-generic/pci-bridge.h b/include/asm-generic/pci-bridge.h
index 20db2e5..e58830e 100644
--- a/include/asm-generic/pci-bridge.h
+++ b/include/asm-generic/pci-bridge.h
@@ -9,6 +9,9 @@
 
 #ifdef __KERNEL__
 
+#include 
+#include 
+
 enum {
/* Force re-assigning all resources (ignore firmware
 * setup completely

Re: [PATCH v5 1/3] of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC

2013-04-11 Thread Andrew Murray
On Wed, Apr 10, 2013 at 02:13:54PM +0100, Rob Herring wrote:
> Adding Ben H and Michal...
> 
> On 04/10/2013 02:29 AM, Andrew Murray wrote:
> > The pci_process_bridge_OF_ranges function, used to parse the "ranges"
> > property of a PCI host device, is found in both Microblaze and PowerPC
> > architectures. These implementations are nearly identical. This patch
> > moves this common code to a common place.
> > 
> > Signed-off-by: Andrew Murray 
> > Signed-off-by: Liviu Dudau 
> 
> One comment below. Otherwise,
> 
> Reviewed-by: Rob Herring 
> 
> You also need acks from Ben and Michal.
> 
> [...]
> 
> > +   /* Act based on address space type */
> > +   res = NULL;
> > +   switch ((pci_space >> 24) & 0x3) {
> > +   case 1: /* PCI IO space */
> > +   pr_info("  IO 0x%016llx..0x%016llx -> 0x%016llx\n",
> > +  cpu_addr, cpu_addr + size - 1, pci_addr);
> > +
> > +   /* We support only one IO range */
> > +   if (hose->pci_io_size) {
> > +   pr_info(" \\--> Skipped (too many) !\n");
> > +   continue;
> > +   }
> > +#if defined(CONFIG_PPC32) || defined(CONFIG_MICROBLAZE)
> 
> How about "if (!IS_ENABLED(CONFIG_64BIT))" instead.

OK, I'll add that in my next re-spin. Would "#ifndef CONFIG_64BIT" suffice?
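
For context on the two options being discussed, here is a small userspace
sketch of the difference between a C-level conditional on a compile-time
constant and a preprocessor conditional. SKETCH_CONFIG_64BIT and the
sketch_* helpers are hypothetical stand-ins; the kernel's real IS_ENABLED()
macro and CONFIG_64BIT symbol are not used here, and the 16MB limit is taken
from the comment in the quoted code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a Kconfig symbol; 0 models a 32-bit build. */
#define SKETCH_CONFIG_64BIT 0

/* 16MB I/O space limit, as described in the quoted comment. */
#define SKETCH_IO_LIMIT 0x01000000ULL

/*
 * Variant 1: ordinary C conditional. The body is always parsed and
 * type-checked, and the compiler can drop it when the condition is a
 * constant false. Every symbol used inside must exist in all builds.
 */
static uint64_t sketch_io_size_if(uint64_t size)
{
	if (!SKETCH_CONFIG_64BIT) {
		if (size > SKETCH_IO_LIMIT)
			size = SKETCH_IO_LIMIT;
	}
	return size;
}

/*
 * Variant 2: preprocessor conditional. The body is removed before
 * compilation, so it may reference symbols that only exist on 32-bit
 * builds, at the cost of never being compile-tested elsewhere.
 */
static uint64_t sketch_io_size_ifdef(uint64_t size)
{
#if !SKETCH_CONFIG_64BIT
	if (size > SKETCH_IO_LIMIT)
		size = SKETCH_IO_LIMIT;
#endif
	return size;
}
```

Both variants behave the same at run time; the choice mainly affects build
coverage and which symbols must be visible in every configuration.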

> 
> > +   /* On 32 bits, limit I/O space to 16MB */
> > +   if (size > 0x01000000)
> > +   size = 0x01000000;
> > +
> > +   /* 32 bits needs to map IOs here */
> > +   hose->io_base_virt = ioremap(cpu_addr, size);
> > +
> > +   /* Expect trouble if pci_addr is not 0 */
> > +   if (primary)
> > +   isa_io_base =
> > +   (unsigned long)hose->io_base_virt;
> > +#endif /* CONFIG_PPC32 || CONFIG_MICROBLAZE */
> > +   /* pci_io_size and io_base_phys always represent IO
> > +* space starting at 0 so we factor in pci_addr
> > +*/
> > +   hose->pci_io_size = pci_addr + size;
> > +   hose->io_base_phys = cpu_addr - pci_addr;
> 
> 
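
As a rough illustration of the two guards discussed in this exchange, the
sketch below contrasts a preprocessor "#ifndef" with an IS_ENABLED()-style
test. It is a userspace approximation, not kernel code: the SKETCH_* macros and
the clamp_* helpers are hypothetical stand-ins that mimic the kernel's
preprocessor trick for options defined to 1, and CONFIG_64BIT is deliberately
left undefined so that both variants take the 32-bit path.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's IS_ENABLED().
 * It evaluates to 1 when the option is defined to 1, and to 0 when
 * the option is not defined at all.
 */
#define SKETCH_PLACEHOLDER_1 0,
#define SKETCH_SECOND(ignored, val, ...) val
#define SKETCH_IS_DEFINED3(arg1_or_junk) SKETCH_SECOND(arg1_or_junk 1, 0, 0)
#define SKETCH_IS_DEFINED2(val) SKETCH_IS_DEFINED3(SKETCH_PLACEHOLDER_##val)
#define SKETCH_IS_ENABLED(option) SKETCH_IS_DEFINED2(option)

/* 16MB, the limit named in the quoted comment */
#define SKETCH_IO_LIMIT_32BIT 0x01000000ULL

/* Variant 1: the clamp is removed by the preprocessor on 64-bit builds. */
static uint64_t clamp_io_size_ifndef(uint64_t size)
{
#ifndef CONFIG_64BIT
    if (size > SKETCH_IO_LIMIT_32BIT)
        size = SKETCH_IO_LIMIT_32BIT;
#endif
    return size;
}

/*
 * Variant 2: the clamp is ordinary C, so it is parsed and type-checked
 * in every configuration; the compiler can drop the dead branch.
 */
static uint64_t clamp_io_size_is_enabled(uint64_t size)
{
    if (!SKETCH_IS_ENABLED(CONFIG_64BIT)) {
        if (size > SKETCH_IO_LIMIT_32BIT)
            size = SKETCH_IO_LIMIT_32BIT;
    }
    return size;
}
```

One trade-off worth noting: with the second variant every identifier used
inside the block has to be declared in all configurations, whereas the
"#ifndef" form hides the block from the compiler entirely.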
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 2/3] of/pci: Provide support for parsing PCI DT ranges property

2013-04-11 Thread Andrew Murray
On Wed, Apr 10, 2013 at 07:26:02PM +0100, Rob Herring wrote:
> On 04/10/2013 02:29 AM, Andrew Murray wrote:
> > This patch factors out common implementation patterns to reduce overall
> > kernel code and provide a means for host bridge drivers to directly obtain
> > struct resources from the DT's ranges property without relying on
> > architecture specific DT handling. This will make it easier to write
> > architecture independent host bridge drivers and mitigate against further
> > duplication of DT parsing code.
> >
> > This patch can be used in the following way:
> >
> >   struct of_pci_range_parser parser;
> >   struct of_pci_range range;
> >
> >   if (of_pci_range_parser(&parser, np))
> >   ; //no ranges property
> >
> >   for_each_of_pci_range(&parser, &range) {
> >
> >   /*
> >   directly access properties of the address range, e.g.:
> >   range.pci_space, range.pci_addr, range.cpu_addr,
> >   range.size, range.flags
> >
> >   alternatively obtain a struct resource, e.g.:
> >   struct resource res;
> >   of_pci_range_to_resource(&range, np, &res);
> >   */
> >   }
> >
> > Additionally the implementation takes care of adjacent ranges and merges
> > them into a single range (as was the case with powerpc and microblaze).
> >
> > Signed-off-by: Andrew Murray 
> > Signed-off-by: Liviu Dudau 
> > Signed-off-by: Thomas Petazzoni 
> > ---
> 
> A few minor things below, otherwise:
> 
> Reviewed-by: Rob Herring 
> 

Thanks for the feedback, I've included this in my next spin.

Andrew Murray
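
The parser/iterator interface and the merging of adjacent ranges described in
the quoted commit message can be sketched in userspace roughly as follows. This
is a simplified model under stated assumptions, not the kernel implementation:
cells are plain host-order uint32_t values (real device tree cells are
big-endian), the CPU address is read directly from the parent cells instead of
being translated, entries are compared on the whole pci_space word rather than
on derived flags, and all names (struct range, parser_init, parser_next) are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
    uint32_t pci_space;
    uint64_t pci_addr, cpu_addr, size;
};

struct range_parser {
    const uint32_t *cur, *end;
    int pna;    /* parent (CPU) address cells */
    int np;     /* cells per entry: 3 PCI + pna + 2 size */
};

/* Combine 'count' 32-bit cells into one number, most significant first. */
static uint64_t read_number(const uint32_t *cell, int count)
{
    uint64_t v = 0;

    while (count--)
        v = (v << 32) | *cell++;
    return v;
}

static void parser_init(struct range_parser *p, const uint32_t *cells,
                        size_t ncells, int pna)
{
    p->cur = cells;
    p->end = cells + ncells;
    p->pna = pna;
    p->np = 3 + pna + 2;
}

/* True while at least one complete entry remains. */
static int has_entry(const struct range_parser *p)
{
    return p->cur && (p->end - p->cur) >= p->np;
}

static void read_entry(const struct range_parser *p, struct range *r)
{
    r->pci_space = p->cur[0];
    r->pci_addr = read_number(p->cur + 1, 2);
    r->cpu_addr = read_number(p->cur + 3, p->pna);
    r->size = read_number(p->cur + 3 + p->pna, 2);
}

/* Return the next range, with contiguous followers folded into it. */
static struct range *parser_next(struct range_parser *p, struct range *out)
{
    if (!out || !has_entry(p))
        return NULL;

    read_entry(p, out);
    p->cur += p->np;

    while (has_entry(p)) {
        struct range next;

        read_entry(p, &next);
        if (next.pci_space != out->pci_space ||
            next.pci_addr != out->pci_addr + out->size ||
            next.cpu_addr != out->cpu_addr + out->size)
            break;

        out->size += next.size;
        p->cur += p->np;
    }
    return out;
}
```

A caller would initialise the parser once and then loop with
"while (parser_next(&p, &r))", which is the shape the for_each_of_pci_range
macro gives to host bridge drivers.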


[PATCH v6 0/3] of/pci: Provide common support for PCI DT parsing

2013-04-11 Thread Andrew Murray
This patchset factors out duplicated code associated with parsing PCI
DT "ranges" properties across the architectures and introduces a
"ranges" parser. This parser "of_pci_range_parser" can be used directly
by ARM host bridge drivers enabling them to obtain ranges from device
trees.

I've included the Reviewed-by and Tested-by tags received for v5 in this
patchset. Earlier versions of this patchset (v3) have been tested by:

Thierry Reding 
Jingoo Han 

I believe a version of this patchset has also been tested through its
inclusion in Thomas Petazzoni's Armada 370 and Armada XP SoCs PCIe support by:

Linus Walleij 

I've tested that this patchset builds and runs on ARM and that it builds on
PowerPC.

Compared to the v5 sent by Andrew Murray, the following changes have
been made:

 * Use of CONFIG_64BIT instead of CONFIG_[a32bitarch] as suggested by
   Rob Herring in drivers/of/of_pci.c

 * Added forward declaration of struct pci_controller in linux/of_pci.h
   to prevent compiler warning as suggested by Thomas Petazzoni

 * Improved error checking (!range check), removal of unnecessary be32_to_cpup
   call, improved formatting of struct of_pci_range_parser layout and
   replacement of macro with a static inline. All suggested by Rob Herring.

Compared to the v4 (incorrectly labelled v3) sent by Andrew Murray,
the following changes have been made:

 * Split the patch as suggested by Rob Herring

Compared to the v3 sent by Andrew Murray, the following changes have
been made:

 * Unify and move duplicate pci_process_bridge_OF_ranges functions to
   drivers/of/of_pci.c as suggested by Rob Herring

 * Fix potential build errors with Microblaze/MIPS

Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT
ranges property", the following changes have been made:

 * Correct use of IORESOURCE_* as suggested by Russell King

 * Improved interface and naming as suggested by Thierry Reding

Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did:

 * Add a memset() on the struct of_pci_range_iter when starting the
   for loop in for_each_pci_range(). Otherwise, with an uninitialized
   of_pci_range_iter, of_pci_process_ranges() may crash.

 * Add parentheses around 'res', 'np' and 'iter' in the
   for_each_of_pci_range macro definitions. Otherwise, passing
   something like &foobar as 'res' didn't work.

 * Rebased on top of 3.9-rc2, which required fixing a few conflicts in
   the Microblaze code.

v2:
  This follows on from suggestions made by Grant Likely
  (marc.info/?l=linux-kernel&m=136079602806328)

Andrew Murray (3):
  of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and
PowerPC
  of/pci: Provide support for parsing PCI DT ranges property
  of/pci: mips: convert to common of_pci_range_parser

 arch/microblaze/include/asm/pci-bridge.h |5 +-
 arch/microblaze/pci/pci-common.c |  192 --
 arch/mips/pci/pci.c  |   50 +++--
 arch/powerpc/include/asm/pci-bridge.h|5 +-
 arch/powerpc/kernel/pci-common.c |  192 --
 drivers/of/address.c |   66 ++
 drivers/of/of_pci.c  |  168 ++
 include/linux/of_address.h   |   46 +++
 include/linux/of_pci.h   |4 +
 9 files changed, 302 insertions(+), 426 deletions(-)
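
The macro fix credited to Thomas Petazzoni in the changelog above (parentheses
around the macro arguments so that an argument such as &foobar works) can be
illustrated with a small standalone sketch. The struct and macro names here are
hypothetical and unrelated to the real for_each_of_pci_range definition; only
the hygiene issue is the same.

```c
struct counter {
    int value;
    int limit;
};

/*
 * Without parentheses, COUNTER_DONE_BAD(&foobar) would expand to
 * (&foobar->value >= &foobar->limit). Since -> binds tighter than &,
 * that applies -> to a struct rather than a pointer and fails to compile.
 *
 * #define COUNTER_DONE_BAD(c) (c->value >= c->limit)
 */

/* With each argument use parenthesised, &foobar is handled correctly. */
#define COUNTER_DONE(c) ((c)->value >= (c)->limit)
#define for_each_count(c) \
    for ((c)->value = 0; !COUNTER_DONE(c); (c)->value++)

/* Sum 0..limit-1 while passing the address of a local struct. */
static int sum_counts(int limit)
{
    struct counter foobar = { 0, limit };
    int sum = 0;

    for_each_count(&foobar)
        sum += foobar.value;
    return sum;
}
```

Calling sum_counts(4) walks the values 0, 1, 2 and 3.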



[PATCH v6 3/3] of/pci: mips: convert to common of_pci_range_parser

2013-04-11 Thread Andrew Murray
This patch converts the pci_load_of_ranges function to use the new common
of_pci_range_parser.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Reviewed-by: Rob Herring 
---
 arch/mips/pci/pci.c |   50 --
 1 files changed, 16 insertions(+), 34 deletions(-)

diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 0872f12..bee49a4 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -122,51 +122,33 @@ static void pcibios_scanbus(struct pci_controller *hose)
 #ifdef CONFIG_OF
 void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node)
 {
-   const __be32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(node);
-   int np = pna + 5;
+   struct of_pci_range range;
+   struct of_pci_range_parser parser;
+   u32 res_type;
 
pr_info("PCI host bridge %s ranges:\n", node->full_name);
-   ranges = of_get_property(node, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
hose->of_node = node;
 
-   while ((rlen -= np * 4) >= 0) {
-   u32 pci_space;
+   if (of_pci_range_parser(&parser, node))
+   return;
+
+   for_each_of_pci_range(&parser, &range) {
struct resource *res = NULL;
-   u64 addr, size;
-
-   pci_space = be32_to_cpup(&ranges[0]);
-   addr = of_translate_address(node, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-   ranges += np;
-   switch ((pci_space >> 24) & 0x3) {
-   case 1: /* PCI IO space */
+
+   res_type = range.flags & IORESOURCE_TYPE_BITS;
+   if (res_type == IORESOURCE_IO) {
pr_info("  IO 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr, range.cpu_addr + range.size - 1);
hose->io_map_base =
-   (unsigned long)ioremap(addr, size);
+   (unsigned long)ioremap(range.cpu_addr, range.size);
res = hose->io_resource;
-   res->flags = IORESOURCE_IO;
-   break;
-   case 2: /* PCI Memory space */
-   case 3: /* PCI 64 bits Memory space */
+   } else if (res_type == IORESOURCE_MEM) {
pr_info(" MEM 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr, range.cpu_addr + range.size - 1);
res = hose->mem_resource;
-   res->flags = IORESOURCE_MEM;
-   break;
-   }
-   if (res != NULL) {
-   res->start = addr;
-   res->name = node->full_name;
-   res->end = res->start + size - 1;
-   res->parent = NULL;
-   res->sibling = NULL;
-   res->child = NULL;
}
+   if (res != NULL)
+   of_pci_range_to_resource(&range, node, res);
}
 }
 #endif
-- 
1.7.0.4
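
The patch above replaces an open-coded switch on the PCI space code with a test
on resource type flags. A minimal userspace sketch of that mapping is shown
below, assuming the same two-bit space code that the removed MIPS code used.
The SKETCH_RES_* constants and the function name are hypothetical placeholders
for illustration, not the kernel's IORESOURCE_* definitions.

```c
#include <stdint.h>

/* Hypothetical resource type markers for this sketch only. */
#define SKETCH_RES_NONE 0x000u
#define SKETCH_RES_IO   0x100u
#define SKETCH_RES_MEM  0x200u

/*
 * Map the first cell of a PCI "ranges" entry to a resource type.
 * Bits 25:24 hold the space code: 1 = I/O, 2 = 32-bit memory,
 * 3 = 64-bit memory, 0 = configuration space (not a window).
 */
static unsigned int space_to_res_type(uint32_t pci_space)
{
    switch ((pci_space >> 24) & 0x3) {
    case 1:
        return SKETCH_RES_IO;
    case 2:
    case 3:
        return SKETCH_RES_MEM;
    default:
        return SKETCH_RES_NONE;
    }
}
```

Doing this translation once in the common parser is what lets callers such as
pci_load_of_ranges compare against a resource type instead of repeating the
shift and mask.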



[PATCH v6 1/3] of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC

2013-04-11 Thread Andrew Murray
The pci_process_bridge_OF_ranges function, used to parse the "ranges"
property of a PCI host device, is found in both Microblaze and PowerPC
architectures. These implementations are nearly identical. This patch
moves this common code to a common place.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Reviewed-by: Rob Herring 
Tested-by: Thomas Petazzoni 
---
 arch/microblaze/include/asm/pci-bridge.h |5 +-
 arch/microblaze/pci/pci-common.c |  192 
 arch/powerpc/include/asm/pci-bridge.h|5 +-
 arch/powerpc/kernel/pci-common.c |  192 
 drivers/of/of_pci.c  |  200 ++
 include/linux/of_pci.h   |4 +
 6 files changed, 206 insertions(+), 392 deletions(-)

diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h
index cb5d397..5783cd6 100644
--- a/arch/microblaze/include/asm/pci-bridge.h
+++ b/arch/microblaze/include/asm/pci-bridge.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device_node;
 
@@ -132,10 +133,6 @@ extern void setup_indirect_pci(struct pci_controller *hose,
 extern struct pci_controller *pci_find_hose_for_OF_device(
struct device_node *node);
 
-/* Fill up host controller resources from the OF node */
-extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
-   struct device_node *dev, int primary);
-
 /* Allocate & free a PCI host bridge structure */
 extern struct pci_controller *pcibios_alloc_controller(struct device_node *dev);
 extern void pcibios_free_controller(struct pci_controller *phb);
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 9ea521e..2735ad9 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -622,198 +622,6 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
*end = rsrc->end - offset;
 }
 
-/**
- * pci_process_bridge_OF_ranges - Parse PCI bridge resources from device tree
- * @hose: newly allocated pci_controller to be setup
- * @dev: device node of the host bridge
- * @primary: set if primary bus (32 bits only, soon to be deprecated)
- *
- * This function will parse the "ranges" property of a PCI host bridge device
- * node and setup the resource mapping of a pci controller based on its
- * content.
- *
- * Life would be boring if it wasn't for a few issues that we have to deal
- * with here:
- *
- *   - We can only cope with one IO space range and up to 3 Memory space
- * ranges. However, some machines (thanks Apple !) tend to split their
- * space into lots of small contiguous ranges. So we have to coalesce.
- *
- *   - We can only cope with all memory ranges having the same offset
- * between CPU addresses and PCI addresses. Unfortunately, some bridges
- * are setup for a large 1:1 mapping along with a small "window" which
- * maps PCI address 0 to some arbitrary high address of the CPU space in
- * order to give access to the ISA memory hole.
- * The way out of here that I've chosen for now is to always set the
- * offset based on the first resource found, then override it if we
- * have a different offset and the previous was set by an ISA hole.
- *
- *   - Some busses have IO space not starting at 0, which causes trouble with
- * the way we do our IO resource renumbering. The code somewhat deals with
- * it for 64 bits but I would expect problems on 32 bits.
- *
- *   - Some 32 bits platforms such as 4xx can have physical space larger than
- * 32 bits so we need to use 64 bits values for the parsing
- */
-void pci_process_bridge_OF_ranges(struct pci_controller *hose,
- struct device_node *dev, int primary)
-{
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
-   int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
-   unsigned long long isa_mb = 0;
-   struct resource *res;
-
-   pr_info("PCI host bridge %s %s ranges:\n",
-  dev->full_name, primary ? "(primary)" : "");
-
-   /* Get ranges property */
-   ranges = of_get_property(dev, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
-
-   /* Parse it */
-   pr_debug("Parsing ranges property...\n");
-   while ((rlen -= np * 4) >= 0) {
-   /* Read next ranges element */
-   pci_space = ranges[0];
-   pci_addr = of_read_number(ranges + 1, 2);
-   cpu_addr = of_translate_address(dev, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-
-   pr_debug("pci_space: 0x%08x pci_addr:0x%0

[PATCH v6 2/3] of/pci: Provide support for parsing PCI DT ranges property

2013-04-11 Thread Andrew Murray
This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture specific
DT handling. This will make it easier to write architecture independent host
bridge drivers and mitigate against further duplication of DT parsing code.

This patch can be used in the following way:

struct of_pci_range_parser parser;
struct of_pci_range range;

if (of_pci_range_parser(&parser, np))
; //no ranges property

for_each_of_pci_range(&parser, &range) {

/*
directly access properties of the address range, e.g.:
range.pci_space, range.pci_addr, range.cpu_addr,
range.size, range.flags

alternatively obtain a struct resource, e.g.:
struct resource res;
of_pci_range_to_resource(&range, np, &res);
*/
}

Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Signed-off-by: Thomas Petazzoni 
Reviewed-by: Rob Herring 
Tested-by: Thomas Petazzoni 
---
 drivers/of/address.c   |   66 ++
 drivers/of/of_pci.c|  112 
 include/linux/of_address.h |   46 ++
 3 files changed, 152 insertions(+), 72 deletions(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 04da786..d3c4f2f 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -227,6 +227,72 @@ int of_pci_address_to_resource(struct device_node *dev, int bar,
return __of_address_to_resource(dev, addrp, size, flags, NULL, r);
 }
 EXPORT_SYMBOL_GPL(of_pci_address_to_resource);
+
+int of_pci_range_parser(struct of_pci_range_parser *parser,
+   struct device_node *node)
+{
+   const int na = 3, ns = 2;
+   int rlen;
+
+   parser->node = node;
+   parser->pna = of_n_addr_cells(node);
+   parser->np = parser->pna + na + ns;
+
+   parser->range = of_get_property(node, "ranges", &rlen);
+   if (parser->range == NULL)
+   return -ENOENT;
+
+   parser->end = parser->range + rlen / sizeof(__be32);
+
+   return 0;
+}
+
+struct of_pci_range *of_pci_process_ranges(struct of_pci_range_parser *parser,
+   struct of_pci_range *range)
+{
+   const int na = 3, ns = 2;
+
+   if (!range)
+   return NULL;
+
+   if (!parser->range || parser->range + parser->np > parser->end)
+   return NULL;
+
+   range->pci_space = parser->range[0];
+   range->flags = of_bus_pci_get_flags(parser->range);
+   range->pci_addr = of_read_number(parser->range + 1, ns);
+   range->cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   range->size = of_read_number(parser->range + parser->pna + na, ns);
+
+   parser->range += parser->np;
+
+   /* Now consume following elements while they are contiguous */
+   while (parser->range + parser->np <= parser->end) {
+   u32 flags, pci_space;
+   u64 pci_addr, cpu_addr, size;
+
+   pci_space = be32_to_cpup(parser->range);
+   flags = of_bus_pci_get_flags(parser->range);
+   pci_addr = of_read_number(parser->range + 1, ns);
+   cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   size = of_read_number(parser->range + parser->pna + na, ns);
+
+   if (flags != range->flags)
+   break;
+   if (pci_addr != range->pci_addr + range->size ||
+   cpu_addr != range->cpu_addr + range->size)
+   break;
+
+   range->size += size;
+   parser->range += parser->np;
+   }
+
+   return range;
+}
+EXPORT_SYMBOL_GPL(of_pci_process_ranges);
+
 #endif /* CONFIG_PCI */
 
 /*
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 1626172..3e428a1 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -82,67 +82,43 @@ EXPORT_SYMBOL_GPL(of_pci_find_child_device);
 void pci_process_bridge_OF_ranges(struct pci_controller *hose,
  struct device_node *dev, int primary)
 {
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_

[PATCH v7 2/3] of/pci: Provide support for parsing PCI DT ranges property

2013-04-16 Thread Andrew Murray
This patch factors out common implementation patterns to reduce overall kernel
code and provide a means for host bridge drivers to directly obtain struct
resources from the DT's ranges property without relying on architecture specific
DT handling. This will make it easier to write architecture independent host
bridge drivers and mitigate against further duplication of DT parsing code.

This patch can be used in the following way:

struct of_pci_range_parser parser;
struct of_pci_range range;

if (of_pci_range_parser(&parser, np))
; //no ranges property

for_each_of_pci_range(&parser, &range) {

/*
directly access properties of the address range, e.g.:
range.pci_space, range.pci_addr, range.cpu_addr,
range.size, range.flags

alternatively obtain a struct resource, e.g.:
struct resource res;
of_pci_range_to_resource(&range, np, &res);
*/
}

Additionally the implementation takes care of adjacent ranges and merges them
into a single range (as was the case with powerpc and microblaze).

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Signed-off-by: Thomas Petazzoni 
Reviewed-by: Rob Herring 
Tested-by: Thomas Petazzoni 
Tested-by: Linus Walleij 
---
 drivers/of/address.c   |   67 ++
 drivers/of/of_pci.c|  113 
 include/linux/of_address.h |   46 ++
 3 files changed, 154 insertions(+), 72 deletions(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 04da786..6eec70c 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -227,6 +227,73 @@ int of_pci_address_to_resource(struct device_node *dev, int bar,
return __of_address_to_resource(dev, addrp, size, flags, NULL, r);
 }
 EXPORT_SYMBOL_GPL(of_pci_address_to_resource);
+
+int of_pci_range_parser(struct of_pci_range_parser *parser,
+   struct device_node *node)
+{
+   const int na = 3, ns = 2;
+   int rlen;
+
+   parser->node = node;
+   parser->pna = of_n_addr_cells(node);
+   parser->np = parser->pna + na + ns;
+
+   parser->range = of_get_property(node, "ranges", &rlen);
+   if (parser->range == NULL)
+   return -ENOENT;
+
+   parser->end = parser->range + rlen / sizeof(__be32);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(of_pci_range_parser);
+
+struct of_pci_range *of_pci_process_ranges(struct of_pci_range_parser *parser,
+   struct of_pci_range *range)
+{
+   const int na = 3, ns = 2;
+
+   if (!range)
+   return NULL;
+
+   if (!parser->range || parser->range + parser->np > parser->end)
+   return NULL;
+
+   range->pci_space = parser->range[0];
+   range->flags = of_bus_pci_get_flags(parser->range);
+   range->pci_addr = of_read_number(parser->range + 1, ns);
+   range->cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   range->size = of_read_number(parser->range + parser->pna + na, ns);
+
+   parser->range += parser->np;
+
+   /* Now consume following elements while they are contiguous */
+   while (parser->range + parser->np <= parser->end) {
+   u32 flags, pci_space;
+   u64 pci_addr, cpu_addr, size;
+
+   pci_space = be32_to_cpup(parser->range);
+   flags = of_bus_pci_get_flags(parser->range);
+   pci_addr = of_read_number(parser->range + 1, ns);
+   cpu_addr = of_translate_address(parser->node,
+   parser->range + na);
+   size = of_read_number(parser->range + parser->pna + na, ns);
+
+   if (flags != range->flags)
+   break;
+   if (pci_addr != range->pci_addr + range->size ||
+   cpu_addr != range->cpu_addr + range->size)
+   break;
+
+   range->size += size;
+   parser->range += parser->np;
+   }
+
+   return range;
+}
+EXPORT_SYMBOL_GPL(of_pci_process_ranges);
+
 #endif /* CONFIG_PCI */
 
 /*
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 1626172..e5ab604 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #if defined(CONFIG_PPC32) || defined(CONFIG_PPC64) || defined(CONFIG_MICROBLAZE)
@@ -82,67 +83,43 @@ EXPORT_SYMBOL_GPL(of_pci_find_child_device);
 void pci_process_bridge_OF_ranges(struct pci_controller *hose,
  struct device_node *dev,

[PATCH v7 0/3] of/pci: Provide common support for PCI DT parsing

2013-04-16 Thread Andrew Murray
This patchset factors out duplicated code associated with parsing PCI
DT "ranges" properties across the architectures and introduces a
"ranges" parser. This parser "of_pci_range_parser" can be used directly
by ARM host bridge drivers enabling them to obtain ranges from device
trees.

I've included the Reviewed-by and Tested-by tags received for v5/v6 in this
patchset. Earlier versions of this patchset (v3) have been tested by:

Thierry Reding 
Jingoo Han 

I've tested that this patchset builds and runs on ARM and that it builds on
PowerPC and x86_64.

Compared to the v6 sent by Andrew Murray, the following changes have
been made in response to build errors/warnings:

 * Inclusion of linux/of_address.h in of_pci.c as suggested by Michal
   Simek to prevent compilation failures on Microblaze (and others) and his
   ack.

 * Use of externs and static inlines, plus a typo fix, in linux/of_address.h
   in response to linker errors (multiple definition) on x86_64 as spotted by
   a kbuild test robot on (jcooper/linux.git mvebu/drivers)

 * Add EXPORT_SYMBOL_GPL to of_pci_range_parser function to be consistent
   with of_pci_process_ranges function

Compared to the v5 sent by Andrew Murray, the following changes have
been made:

 * Use of CONFIG_64BIT instead of CONFIG_[a32bitarch] as suggested by
   Rob Herring in drivers/of/of_pci.c

 * Added forward declaration of struct pci_controller in linux/of_pci.h
   to prevent compiler warning as suggested by Thomas Petazzoni

 * Improved error checking (!range check), removal of unnecessary be32_to_cpup
   call, improved formatting of struct of_pci_range_parser layout and
   replacement of macro with a static inline. All suggested by Rob Herring.

Compared to the v4 (incorrectly labelled v3) sent by Andrew Murray,
the following changes have been made:

 * Split the patch as suggested by Rob Herring

Compared to the v3 sent by Andrew Murray, the following changes have
been made:

 * Unify and move duplicate pci_process_bridge_OF_ranges functions to
   drivers/of/of_pci.c as suggested by Rob Herring

 * Fix potential build errors with Microblaze/MIPS

Compared to "[PATCH v5 01/17] of/pci: Provide support for parsing PCI DT
ranges property", the following changes have been made:

 * Correct use of IORESOURCE_* as suggested by Russell King

 * Improved interface and naming as suggested by Thierry Reding

Compared to the v2 sent by Andrew Murray, Thomas Petazzoni did:

 * Add a memset() on the struct of_pci_range_iter when starting the
   for loop in for_each_pci_range(). Otherwise, with an uninitialized
   of_pci_range_iter, of_pci_process_ranges() may crash.

 * Add parentheses around 'res', 'np' and 'iter' in the
   for_each_of_pci_range macro definitions. Otherwise, passing
   something like &foobar as 'res' didn't work.

 * Rebased on top of 3.9-rc2, which required fixing a few conflicts in
   the Microblaze code.

v2:
  This follows on from suggestions made by Grant Likely
  (marc.info/?l=linux-kernel&m=136079602806328)

Andrew Murray (3):
  of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and
PowerPC
  of/pci: Provide support for parsing PCI DT ranges property
  of/pci: mips: convert to common of_pci_range_parser

 arch/microblaze/include/asm/pci-bridge.h |5 +-
 arch/microblaze/pci/pci-common.c |  192 --
 arch/mips/pci/pci.c  |   50 +++--
 arch/powerpc/include/asm/pci-bridge.h|5 +-
 arch/powerpc/kernel/pci-common.c |  192 --
 drivers/of/address.c |   67 +++
 drivers/of/of_pci.c  |  169 ++
 include/linux/of_address.h   |   46 +++
 include/linux/of_pci.h   |4 +
 9 files changed, 304 insertions(+), 426 deletions(-)
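
The memset() item in the changelog above (zeroing the iterator when the loop
starts so that the first call does not act on stack garbage) can be sketched in
isolation as follows. This is a hypothetical standalone iterator, not the
of_pci_range_iter from the earlier revisions: it assumes, for illustration,
that a NULL cursor is what marks "not started yet".

```c
#include <stddef.h>
#include <string.h>

struct cell_iter {
    const int *cur;
    const int *end;
};

/*
 * Return the next cell or NULL when exhausted. A NULL cursor means the
 * iteration has not started, so the struct must be zeroed before first
 * use; an uninitialised cursor would be dereferenced as if it were valid.
 */
static const int *cell_iter_next(struct cell_iter *it,
                                 const int *cells, size_t n)
{
    if (!it->cur) {
        it->cur = cells;
        it->end = cells + n;
    }
    if (it->cur == it->end)
        return NULL;
    return it->cur++;
}

/* The loop macro zeroes the iterator itself, so callers cannot forget. */
#define for_each_cell(it, cells, n, p) \
    for (memset((it), 0, sizeof(*(it))); \
         ((p) = cell_iter_next((it), (cells), (n))) != NULL; )

static int sum_cells(const int *cells, size_t n)
{
    struct cell_iter it;    /* deliberately left uninitialised here */
    const int *p;
    int sum = 0;

    for_each_cell(&it, cells, n, p)
        sum += *p;
    return sum;
}
```

Putting the reset in the loop's init clause keeps the requirement out of every
caller, which appears to be the intent of the change described above.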


[PATCH v7 3/3] of/pci: mips: convert to common of_pci_range_parser

2013-04-16 Thread Andrew Murray
This patch converts the pci_load_of_ranges function to use the new common
of_pci_range_parser.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Reviewed-by: Rob Herring 
---
 arch/mips/pci/pci.c |   50 --
 1 files changed, 16 insertions(+), 34 deletions(-)

diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 0872f12..bee49a4 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -122,51 +122,33 @@ static void pcibios_scanbus(struct pci_controller *hose)
 #ifdef CONFIG_OF
 void pci_load_of_ranges(struct pci_controller *hose, struct device_node *node)
 {
-   const __be32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(node);
-   int np = pna + 5;
+   struct of_pci_range range;
+   struct of_pci_range_parser parser;
+   u32 res_type;
 
pr_info("PCI host bridge %s ranges:\n", node->full_name);
-   ranges = of_get_property(node, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
hose->of_node = node;
 
-   while ((rlen -= np * 4) >= 0) {
-   u32 pci_space;
+   if (of_pci_range_parser(&parser, node))
+   return;
+
+   for_each_of_pci_range(&parser, &range) {
struct resource *res = NULL;
-   u64 addr, size;
-
-   pci_space = be32_to_cpup(&ranges[0]);
-   addr = of_translate_address(node, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-   ranges += np;
-   switch ((pci_space >> 24) & 0x3) {
-   case 1: /* PCI IO space */
+
+   res_type = range.flags & IORESOURCE_TYPE_BITS;
+   if (res_type == IORESOURCE_IO) {
pr_info("  IO 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr, range.cpu_addr + range.size - 1);
hose->io_map_base =
-   (unsigned long)ioremap(addr, size);
+   (unsigned long)ioremap(range.cpu_addr, range.size);
res = hose->io_resource;
-   res->flags = IORESOURCE_IO;
-   break;
-   case 2: /* PCI Memory space */
-   case 3: /* PCI 64 bits Memory space */
+   } else if (res_type == IORESOURCE_MEM) {
pr_info(" MEM 0x%016llx..0x%016llx\n",
-   addr, addr + size - 1);
+   range.cpu_addr, range.cpu_addr + range.size - 1);
res = hose->mem_resource;
-   res->flags = IORESOURCE_MEM;
-   break;
-   }
-   if (res != NULL) {
-   res->start = addr;
-   res->name = node->full_name;
-   res->end = res->start + size - 1;
-   res->parent = NULL;
-   res->sibling = NULL;
-   res->child = NULL;
}
+   if (res != NULL)
+   of_pci_range_to_resource(&range, node, res);
}
 }
 #endif
-- 
1.7.0.4



[PATCH v7 1/3] of/pci: Unify pci_process_bridge_OF_ranges from Microblaze and PowerPC

2013-04-16 Thread Andrew Murray
The pci_process_bridge_OF_ranges function, used to parse the "ranges"
property of a PCI host device, is found in both Microblaze and PowerPC
architectures. These implementations are nearly identical. This patch
moves this common code to a common place.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
Reviewed-by: Rob Herring 
Tested-by: Thomas Petazzoni 
Tested-by: Linus Walleij 
Acked-by: Michal Simek 
---
 arch/microblaze/include/asm/pci-bridge.h |5 +-
 arch/microblaze/pci/pci-common.c |  192 
 arch/powerpc/include/asm/pci-bridge.h|5 +-
 arch/powerpc/kernel/pci-common.c |  192 
 drivers/of/of_pci.c  |  200 ++
 include/linux/of_pci.h   |4 +
 6 files changed, 206 insertions(+), 392 deletions(-)

diff --git a/arch/microblaze/include/asm/pci-bridge.h b/arch/microblaze/include/asm/pci-bridge.h
index cb5d397..5783cd6 100644
--- a/arch/microblaze/include/asm/pci-bridge.h
+++ b/arch/microblaze/include/asm/pci-bridge.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device_node;
 
@@ -132,10 +133,6 @@ extern void setup_indirect_pci(struct pci_controller *hose,
 extern struct pci_controller *pci_find_hose_for_OF_device(
struct device_node *node);
 
-/* Fill up host controller resources from the OF node */
-extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
-   struct device_node *dev, int primary);
-
 /* Allocate & free a PCI host bridge structure */
 extern struct pci_controller *pcibios_alloc_controller(struct device_node *dev);
 extern void pcibios_free_controller(struct pci_controller *phb);
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 9ea521e..2735ad9 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -622,198 +622,6 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
*end = rsrc->end - offset;
 }
 
-/**
- * pci_process_bridge_OF_ranges - Parse PCI bridge resources from device tree
- * @hose: newly allocated pci_controller to be setup
- * @dev: device node of the host bridge
- * @primary: set if primary bus (32 bits only, soon to be deprecated)
- *
- * This function will parse the "ranges" property of a PCI host bridge device
- * node and setup the resource mapping of a pci controller based on its
- * content.
- *
- * Life would be boring if it wasn't for a few issues that we have to deal
- * with here:
- *
- *   - We can only cope with one IO space range and up to 3 Memory space
- * ranges. However, some machines (thanks Apple !) tend to split their
- * space into lots of small contiguous ranges. So we have to coalesce.
- *
- *   - We can only cope with all memory ranges having the same offset
- * between CPU addresses and PCI addresses. Unfortunately, some bridges
- * are setup for a large 1:1 mapping along with a small "window" which
- * maps PCI address 0 to some arbitrary high address of the CPU space in
- * order to give access to the ISA memory hole.
- * The way out of here that I've chosen for now is to always set the
- * offset based on the first resource found, then override it if we
- * have a different offset and the previous was set by an ISA hole.
- *
- *   - Some busses have IO space not starting at 0, which causes trouble with
- * the way we do our IO resource renumbering. The code somewhat deals with
- * it for 64 bits but I would expect problems on 32 bits.
- *
- *   - Some 32 bits platforms such as 4xx can have physical space larger than
- * 32 bits so we need to use 64 bits values for the parsing
- */
-void pci_process_bridge_OF_ranges(struct pci_controller *hose,
- struct device_node *dev, int primary)
-{
-   const u32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
-   int memno = 0, isa_hole = -1;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
-   unsigned long long isa_mb = 0;
-   struct resource *res;
-
-   pr_info("PCI host bridge %s %s ranges:\n",
-  dev->full_name, primary ? "(primary)" : "");
-
-   /* Get ranges property */
-   ranges = of_get_property(dev, "ranges", &rlen);
-   if (ranges == NULL)
-   return;
-
-   /* Parse it */
-   pr_debug("Parsing ranges property...\n");
-   while ((rlen -= np * 4) >= 0) {
-   /* Read next ranges element */
-   pci_space = ranges[0];
-   pci_addr = of_read_number(ranges + 1, 2);
-   cpu_addr = of_translate_address(dev, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-
- 

Re: [RFC PATCH RESEND v2] of/pci: Provide support for parsing PCI DT ranges property

2013-03-08 Thread Andrew Murray
On Fri, Mar 01, 2013 at 03:13:34PM +, Rob Herring wrote:
> On 03/01/2013 06:23 AM, Andrew Murray wrote:
> > This patch factors out common implementation patterns to reduce overall
> > kernel code and provide a means for host bridge drivers to directly obtain
> > struct resources from the DT's ranges property without relying on
> > architecture specific DT handling. This will make it easier to write
> > architecture independent host bridge drivers and mitigate against further
> > duplication of DT parsing code.
> > 
> > This patch can be used in the following way:
> > 
> > struct of_pci_range_iter iter;
> > for_each_of_pci_range(&iter, np) {
> > 
> > //directly access properties of the address range, e.g.:
> > //iter.pci_space, iter.pci_addr, iter.cpu_addr, iter.size or
> > //iter.flags
> > 
> > //alternatively obtain a struct resource, e.g.:
> > //struct resource res;
> > //range_iter_fill_resource(iter, np, res);
> > }
> > 
> > Additionally the implementation takes care of adjacent ranges and merges
> > them into a single range (as was the case with powerpc and microblaze).
> > 
> > The modifications to microblaze, mips and powerpc have not been tested.
> > 
> > v2:
> >   This follows on from suggestions made by Grant Likely
> >   (marc.info/?l=linux-kernel&m=136079602806328)
> > 
> > Signed-off-by: Andrew Murray 
> > Signed-off-by: Liviu Dudau 
> > ---
> >  arch/microblaze/pci/pci-common.c |  100 +++--
> >  arch/mips/pci/pci.c  |   44 -
> >  arch/powerpc/kernel/pci-common.c |   93 ++-
> >  drivers/of/address.c |   54 
> >  include/linux/of_address.h   |   30 +++
> >  5 files changed, 151 insertions(+), 170 deletions(-)
> 
> The thing is that this still leaves pci_process_bridge_OF_ranges
> basically identical for microblaze and powerpc which is really what
> needs to be moved out to common code. Obviously, struct pci_controller
> vs. struct pci_sys_data on ARM is an issue, but they all have
> fundamentally the same data.
> 
> All these common fields should be in a common PCI controller struct.
> Perhaps introducing this with just what you need would work. Depending
> how invasive moving those fields to a new struct is, you could have a
> wrapper that just copies/translates the fields to the arch specific struct.
> 
> There's also things like ioremap of the i/o range. ARM uses a fixed
> virtual address, so we need to do something different. Just returning
> the i/o cpu_addr and moving the ioremap out of this function would solve
> that.

This is my current thinking...

 - Move struct pci_controller from arch/powerpc/include/asm/pci-bridge.h to
   include/linux/pci-bridge and rename (struct pci_controller_generic). Remove
   struct pci_controller from arch/microblaze/include/asm/pci-bridge.h.

   The powerpc struct pci_controller is a superset of the microblaze struct
   pci_controller. Doing this will allow two architectures to share a common
   implementation of a struct pci_controller. #ifdef's can be used to remove
   extra powerpc fields in the structure (there aren't many).

 - Provide a common implementation of pci_process_bridge_OF_ranges. This would
   use the for_each_of_pci_range macro to populate a struct pci_controller,
   which would remove duplicate code between microblaze and powerpc. The common
   implementation could use a Kconfig option to enable/disable handling the ISA
   hole (for architectures that don't need/want it). The caller can worry
   about ioremap.

 - Other architectures (mips, ARM) could use this common implementation of
   pci_process_bridge_OF_ranges in the future but at present they can use
   for_each_of_pci_range (as shown in this patch).
   
This reduces duplicated code, gives ARM a means of parsing PCI DT and provides
a starting point for getting ARM's pci_sys_data more in line with powerpc and
microblaze. Perhaps with a common controller structure other areas of code
can also be factored out, for example functions like
pcibios_setup_phb_resources - these are probably only arch specific due to
their use of the arch specific pci_controller struct.
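
To make the second point above more concrete, here is a rough userspace
sketch of what a common range-processing helper could look like. All of the
names (struct pci_range, struct pci_controller_generic's fields,
process_bridge_ranges) are hypothetical and are not taken from the patch; the
ISA hole handling and the merging of adjacent ranges are left out, and the
ranges are assumed to have been decoded already:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; none of these are kernel API. */
enum range_type { RANGE_IO, RANGE_MEM };

struct pci_range {                /* one already-decoded "ranges" entry */
	enum range_type type;
	uint64_t pci_addr;
	uint64_t cpu_addr;
	uint64_t size;
};

struct simple_resource {
	uint64_t start, end;
	int valid;
};

#define MAX_MEM_RANGES 3

struct pci_controller_generic {
	struct simple_resource io;
	struct simple_resource mem[MAX_MEM_RANGES];
	uint64_t io_cpu_addr;     /* returned so the caller can ioremap it */
	uint64_t mem_offset;      /* CPU address minus PCI address */
};

/*
 * Populate the controller from decoded ranges: one I/O range and up to
 * MAX_MEM_RANGES memory ranges. Returns the number of ranges accepted.
 */
static int process_bridge_ranges(struct pci_controller_generic *hose,
				 const struct pci_range *r, size_t n)
{
	int memno = 0, used = 0;
	size_t i;

	assert(hose && (r || n == 0));

	for (i = 0; i < n; i++) {
		if (r[i].size == 0)
			continue;

		if (r[i].type == RANGE_IO) {
			if (hose->io.valid)
				continue;   /* only one I/O range is kept */
			hose->io.start = r[i].pci_addr;
			hose->io.end = r[i].pci_addr + r[i].size - 1;
			hose->io.valid = 1;
			hose->io_cpu_addr = r[i].cpu_addr;
			used++;
		} else {
			if (memno >= MAX_MEM_RANGES)
				continue;
			if (memno == 0)     /* offset taken from the first range */
				hose->mem_offset = r[i].cpu_addr - r[i].pci_addr;
			hose->mem[memno].start = r[i].cpu_addr;
			hose->mem[memno].end = r[i].cpu_addr + r[i].size - 1;
			hose->mem[memno].valid = 1;
			memno++;
			used++;
		}
	}
	return used;
}
```

The helper only records the I/O CPU address and never maps it, which is the
"caller can worry about ioremap" split described above.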

Do you think this is a sensible direction?

Andrew Murray

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] PCI: Document PCIE BUS MPS parameters

2013-01-23 Thread Andrew Murray
On Wed, Jan 23, 2013 at 08:01:36AM +, Yijing Wang wrote:
> Document PCIE BUS MPS parameters pcie_bus_tune_off, pcie_bus_safe,
> pcie_bus_peer2peer, pcie_bus_perf into Documentation/kernel-parameters.txt.
> These parameters were introduced by Jon Mason  at
> commit 5f39e6705 and commit b03e7495a8. Document these into 
> kernel-parameters.txt help users to understand and use the parameters.
> 
> Signed-off-by: Yijing Wang 
> ---
>  Documentation/kernel-parameters.txt |   13 +
>  1 files changed, 13 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index 363e348..4dfa8d2 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2227,6 +2227,19 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
>   This sorting is done to get a device
>   order compatible with older (<= 2.4) kernels.
>   nobfsortDon't sort PCI devices into breadth-first order.
> + pcie_bus_tune_off   [X86] Disable PCI-E MPS turning and 
> using
> + the BIOS configured MPS defaults.
> + pcie_bus_safe   [X86] Use the smallest common denominator MPS
> + of the entire tree below a root complex for 
> every device
> + on that fabric. Can avoid inconsistent mps 
> problem caused
> + by hotplug.
> + pcie_bus_perf   [X86] Configure pcie device MPS to the largest
> + allowable MPS based on its parent bus.Improve 
> performance
> + as much as possible.
> + pcie_bus_peer2peer  [X86] Make the system wide MPS the 
> smallest
> + possible value (128B).This configuration could 
> prevent it
> + from working by having the MPS on one root port 
> different
> + than the MPS on another.
>   cbiosize=nn[KMG]The fixed amount of bus space which is
>   reserved for the CardBus bridge's IO window.
>   The default value is 256 bytes.
>
I was searching for documentation on this the other day.

It's not just X86 that uses these options; PowerPC and Tile also use them (grep
for users of pcie_bus_configure_settings). I've also noticed a call to it from
hotplug as well...

In addition these options also have an effect on MRRS - I've not figured out
what effect they have, but you can look in drivers/pci/probe.c at the
pcie_write_mrrs function.

Andrew Murray
 



Re: [PATCH] PCI: Document PCIE BUS MPS parameters

2013-01-23 Thread Andrew Murray
On Wed, Jan 23, 2013 at 10:13:02AM +, Yijing Wang wrote:
> On 2013-01-23 17:21, Andrew Murray wrote:
> > On Wed, Jan 23, 2013 at 08:01:36AM +, Yijing Wang wrote:
> >> Document PCIE BUS MPS parameters pcie_bus_tune_off, pcie_bus_safe,
> >> pcie_bus_peer2peer, pcie_bus_perf into Documentation/kernel-parameters.txt.
> >> These parameters were introduced by Jon Mason  at
> >> commit 5f39e6705 and commit b03e7495a8. Document these into 
> >> kernel-parameters.txt help users to understand and use the parameters.
> >>
> >> Signed-off-by: Yijing Wang 
> >> ---
> >>  Documentation/kernel-parameters.txt |   13 +
> >>  1 files changed, 13 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/Documentation/kernel-parameters.txt 
> >> b/Documentation/kernel-parameters.txt
> >> index 363e348..4dfa8d2 100644
> >> --- a/Documentation/kernel-parameters.txt
> >> +++ b/Documentation/kernel-parameters.txt
> >> @@ -2227,6 +2227,19 @@ bytes respectively. Such letter suffixes can also 
> >> be entirely omitted.
> >>This sorting is done to get a device
> >>order compatible with older (<= 2.4) kernels.
> >>nobfsortDon't sort PCI devices into breadth-first order.
> >> +  pcie_bus_tune_off   [X86] Disable PCI-E MPS turning and 
> >> using
> >> +  the BIOS configured MPS defaults.
> >> +  pcie_bus_safe   [X86] Use the smallest common denominator MPS
> >> +  of the entire tree below a root complex for 
> >> every device
> >> +  on that fabric. Can avoid inconsistent mps 
> >> problem caused
> >> +  by hotplug.
> >> +  pcie_bus_perf   [X86] Configure pcie device MPS to the largest
> >> +  allowable MPS based on its parent bus.Improve 
> >> performance
> >> +  as much as possible.
> >> +  pcie_bus_peer2peer  [X86] Make the system wide MPS the 
> >> smallest
> >> +  possible value (128B).This configuration could 
> >> prevent it
> >> +  from working by having the MPS on one root port 
> >> different
> >> +  than the MPS on another.
> >>cbiosize=nn[KMG]The fixed amount of bus space which is
> >>reserved for the CardBus bridge's IO window.
> >>The default value is 256 bytes.
> >>
> > I was searching for documentation on this the other day.
> > 
> > It's not just X86 that use these options, PowerPC and Tile also use them 
> > (grep
> > for users of pcie_bus_configure_settings). I've also noticed a call to it 
> > from
> > hotplug as well...
> 
> Hi Andrew,
>Thanks for reminder! I will update this patch right now.
> 
> > 
> > In addition these options also have an effect on MRRS - I've not figured out
> > what effect they have, but you can look in drivers/pci/probe.c at the
> > pcie_write_mrrs function.
> 
> This is a separate issue. Andrew, can you provide the problem log or
> detailed information on the effect?
> That will help us to analyze this issue.

No, this isn't a bug. When pcie_bus_perf is set, not only does it change the MPS
as described in your documentation, it also changes the MRRS for better
performance. I felt this should also be included in your documentation of
pcie_bus_perf.

The pcie_write_mrrs function uses pcie_bus_config to determine if a change to
the MRRS should be made.

(What I don't understand is that the comments in this function suggest the MRRS
cannot be larger than the MPS - I thought it could be?)
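
For reference, both values live in the PCIe Device Control register: MPS in
bits 7:5 and MRRS in bits 14:12, each encoded as n where the size is
128 << n bytes. The sketch below decodes those fields and shows the kind of
policy the comments seem to describe (MRRS not left larger than MPS). The
macro and function names are made up for this example and this is not the
code from drivers/pci/probe.c:

```c
#include <assert.h>
#include <stdint.h>

/* Field masks in the PCIe Device Control register (names are local). */
#define DEVCTL_PAYLOAD_MASK 0x00e0u  /* bits 7:5, Max_Payload_Size */
#define DEVCTL_READRQ_MASK  0x7000u  /* bits 14:12, Max_Read_Request_Size */

/* Both fields encode a size of 128 << n bytes; n above 5 is reserved. */
static unsigned int devctl_mps_bytes(uint16_t devctl)
{
	return 128u << ((devctl & DEVCTL_PAYLOAD_MASK) >> 5);
}

static unsigned int devctl_mrrs_bytes(uint16_t devctl)
{
	return 128u << ((devctl & DEVCTL_READRQ_MASK) >> 12);
}

/* Return devctl with MRRS set to bytes (a power of two, 128..4096). */
static uint16_t devctl_set_mrrs(uint16_t devctl, unsigned int bytes)
{
	unsigned int field = 0;

	assert(bytes >= 128 && bytes <= 4096 && (bytes & (bytes - 1)) == 0);
	while ((128u << field) < bytes)
		field++;
	return (uint16_t)((devctl & ~DEVCTL_READRQ_MASK) | (field << 12));
}

/* Hypothetical "perf"-style policy: lower MRRS if it exceeds MPS. */
static uint16_t devctl_clamp_mrrs_to_mps(uint16_t devctl)
{
	unsigned int mps = devctl_mps_bytes(devctl);

	if (devctl_mrrs_bytes(devctl) > mps)
		devctl = devctl_set_mrrs(devctl, mps);
	return devctl;
}
```

Whether the clamp is actually required by the specification is exactly the
open question above; the sketch only shows what such a policy would do to the
register value.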

Andrew Murray 



Re: [PATCH 1/4] of/pci: Provide support for parsing PCI DT ranges property

2013-02-14 Thread Andrew Murray
On Wed, Feb 13, 2013 at 10:53:11PM +, Grant Likely wrote:
> On Mon, 11 Feb 2013 09:22:17 +0100, Thierry Reding 
>  wrote:
> > From: Andrew Murray 
> > 
> > DT bindings for PCI host bridges often use the ranges property to describe
> > memory and IO ranges - this binding tends to be the same across
> > architectures yet several parsing implementations exist, e.g.
> > arch/mips/pci/pci.c, arch/powerpc/kernel/pci-common.c,
> > arch/sparc/kernel/pci.c and arch/microblaze/pci/pci-common.c (clone of PPC).
> > Some of these duplicate functionality provided by drivers/of/address.c.
> 
> Hi Thierry,
> 
> Following from my comments on not merging these patches, here are my
> comments on this patch...
> 
> > This patch provides a common iterator-based parser for the ranges property;
> > it is hoped this will reduce DT representation differences between
> > architectures and that architectures will migrate in part to this new parser.
> > 
> > It is also hoped (and the motivation for the patch) that this patch will
> > reduce duplication of code when writing host bridge drivers that are
> > supported by multiple architectures.
> > 
> > This patch provides struct resources from a device tree node, e.g.:
> > 
> > const __be32 *last = NULL;
> > struct resource res;
> > while ((last = of_pci_process_ranges(np, &res, last))) {
> > 	//do something with res
> > }
> 
> The approach seems reasonable, but it isn't optimal. For one, the setup
> code ends up getting run every time of_pci_process_ranges() gets called.
> It would also be more user-friendly to wrap it up in a
> "for_each_of_pci_range()" macro.
> 
> > Platforms with quirks can then do what they like with the resource or
> > migrate common quirk handling to the parser. In an ideal world drivers can
> > just request the obtained resources and pass them on (e.g.
> > pci_add_resource_offset).
> > 
> > Signed-off-by: Andrew Murray 
> > Signed-off-by: Liviu Dudau 
> > Signed-off-by: Thierry Reding 
> > ---
> >  drivers/of/address.c   | 63 
> > ++
> >  include/linux/of_address.h |  9 +++
> >  2 files changed, 72 insertions(+)
> > 
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index 04da786..f607008 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -13,6 +13,7 @@
> >  #define OF_CHECK_COUNTS(na, ns)(OF_CHECK_ADDR_COUNT(na) && (ns) > 0)
> >  
> >  static struct of_bus *of_match_bus(struct device_node *np);
> > +static struct of_bus *of_find_bus(const char *name);
> >  static int __of_address_to_resource(struct device_node *dev,
> > const __be32 *addrp, u64 size, unsigned int flags,
> > const char *name, struct resource *r);
> > @@ -227,6 +228,57 @@ int of_pci_address_to_resource(struct device_node 
> > *dev, int bar,
> > return __of_address_to_resource(dev, addrp, size, flags, NULL, r);
> >  }
> >  EXPORT_SYMBOL_GPL(of_pci_address_to_resource);
> > +
> > +const __be32 *of_pci_process_ranges(struct device_node *node,
> > +   struct resource *res, const __be32 *from)
> > +{
> > +   const __be32 *start, *end;
> > +   int na, ns, np, pna;
> > +   int rlen;
> > +   struct of_bus *bus;
> > +
> > +   WARN_ON(!res);
> > +
> > +   bus = of_find_bus("pci");
> > +   bus->count_cells(node, &na, &ns);
> > +   if (!OF_CHECK_COUNTS(na, ns)) {
> > +   pr_err("Bad cell count for %s\n", node->full_name);
> > +   return NULL;
> > +   }
> 
> Looking up the pci of_bus structure isn't really warranted here. This
> function will only ever be used on PCI busses, and na/ns for PCI is
> always 3/2. Just use those numbers here. You could however validate that
> the node has the correct values in #address-cells and #size-cells.
> 
> > +
> > +   pna = of_n_addr_cells(node);
> > +   np = pna + na + ns;
> > +
> > +   start = of_get_property(node, "ranges", &rlen);
> > +   if (start == NULL)
> > +   return NULL;
> > +
> > +   end = start + rlen / sizeof(__be32);
> 
> The above structure means that the ranges property has to be looked up
> each and every time this function is called. It would be better to have
> a state structure that can look it up once and then keep track of the
> iteration.
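
The state structure suggested above could be sketched roughly as follows.
This is a userspace illustration only: the names (struct range_iter,
range_iter_init, range_iter_next, for_each_range) are hypothetical, the cells
are assumed to be in CPU byte order already, and cpu_addr is the raw parent
address rather than the result of of_translate_address(). The entry layout
assumed is 3 PCI address cells, pna parent address cells and 2 size cells:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Iteration state: set up once, then advanced one entry per call. */
struct range_iter {
	const uint32_t *cur, *end;  /* position within the ranges cells */
	int pna;                    /* parent #address-cells */
	uint32_t pci_space;
	uint64_t pci_addr, cpu_addr, size;
};

/* Combine n consecutive 32-bit cells into one number, high cell first. */
static uint64_t read_cells(const uint32_t *c, int n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 32) | *c++;
	return v;
}

/* The lookup of the property would happen here, once per iteration. */
static void range_iter_init(struct range_iter *it, const uint32_t *cells,
			    size_t ncells, int pna)
{
	assert(it && cells && pna >= 1 && pna <= 2);
	it->cur = cells;
	it->end = cells + ncells;
	it->pna = pna;
}

/* Decode the next entry; returns 0 when no complete entry is left. */
static int range_iter_next(struct range_iter *it)
{
	int np = 3 + it->pna + 2;   /* PCI addr (3) + parent addr + size (2) */

	if (it->end - it->cur < np)
		return 0;

	it->pci_space = it->cur[0];
	it->pci_addr = read_cells(it->cur + 1, 2);
	it->cpu_addr = read_cells(it->cur + 3, it->pna);
	it->size = read_cells(it->cur + 3 + it->pna, 2);
	it->cur += np;
	return 1;
}

#define for_each_range(it) while (range_iter_next(it))
```

A caller would run range_iter_init() once and then loop with
for_each_range(&it), reading it.pci_space, it.pci_addr, it.cpu_addr and
it.size inside the loop body.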
> 
> 

Re: [PATCH 1/4] of/pci: Provide support for parsing PCI DT ranges property

2013-02-18 Thread Andrew Murray
On Fri, Feb 15, 2013 at 01:16:17PM +, Linus Walleij wrote:
> On Thu, Feb 14, 2013 at 8:17 PM, Thierry Reding
>  wrote:
> > On Thu, Feb 14, 2013 at 04:53:41PM +0000, Andrew Murray wrote:
> >> Thierry,
> >>
> >> If you don't have much bandwidth I'd be quite happy to take this on - this
> >> would be beneficial for my eventual patchset. I can start by refactoring
> >> common implementations of pci_process_bridge_OF_ranges or similar across
> >> the architectures as per Grant's suggestion? I didn't do this when I first
> >> posted the patch as I was concerned about the testing effort.
> >
> > Absolutely! Since it was your patch in the first place you're just as
> > well suited to do this if you want to and have the time.
> 
> I am working on device tree patches for the Integrator/AP with its
> PCIv3 bridge, and I also follow this with great interest. I was almost
> going to start the copy/paste cycle but now I think it's better if
> I wait for this to happen.

No problem - I am currently looking at this.

Andrew Murray



Re: [PATCH 10/14] PCI: tegra: Move PCIe driver to drivers/pci/host

2013-01-14 Thread Andrew Murray
On Sun, Jan 13, 2013 at 09:58:06AM +, Thierry Reding wrote:
> On Sat, Jan 12, 2013 at 09:12:25PM +, Arnd Bergmann wrote:
> > On Saturday 12 January 2013, Thierry Reding wrote:
> > > > I already hinted at that in one of the other subthreads. Having such a
> > > > multiplex would also allow the driver to be built as a module. I had
> > > > already thought about this when I was working on an earlier version of
> > > > these patches. Basically these would be two ops attached to the host
> > > > bridge, and the generic arch_setup_msi_irq() could then look that up
> > > > given the struct pci_dev that is passed to it and call this new per-
> > > > host bridge .setup_msi_irq().
> > > 
> > > struct pci_ops looks like a good place to put these. They'll be
> > > available from each struct pci_bus, so should be easy to call from
> > > arch_setup_msi_irq().
> > > 
> > > Any objections?
> > > 
> > 
> > struct pci_ops has a long history of being specifically about
> > config space read/write operations, so on the one hand it does
> > not feel like the right place to put interrupt specific operations,
> > but on the other hand, the name sounds appropriate and I cannot
> > think of any other place to put this, so it's fine with me.
> > 
> > The only alternative I can think of is to introduce a new
> > structure next to it in struct pci_bus, but that feels a bit
> > pointless. Maybe Bjorn has a preference one way or the other.
> 
> The name pci_ops is certainly generic enough. Also the comment above the
> structure declaration says "Low-level architecture-dependent routines",
> which applies to the MSI functions as well.

I've previously looked into this. It seems that architectures handle this
in different ways, some use vector tables, others use a multiplex and others
just let the end user implement the callback directly.

I've made an attempt to find a more common way. My implementation, which I
will try to share later today for reference, provides a registration function
in drivers/pci/msi.c to provide implementations of the
(setup|teardown)_msi_irq(s) ops. This seems slightly better than the current
approach and doesn't break existing users - but is still ugly.

At present the PCI and MSI frameworks are largely uncoupled from each other and
so I was keen to not pollute PCI structures (e.g. pci_ops) with MSI ops. Just
because most PCI host bridges also provide MSI support I don't think there is a
reason why they should always come as a pair or be provided by the same chip.

Perhaps the solution is to support MSI controller drivers and a means to
associate them with PCI host controller drivers?

Andrew Murray




Re: [PATCH 10/14] PCI: tegra: Move PCIe driver to drivers/pci/host

2013-01-15 Thread Andrew Murray
On Tue, Jan 15, 2013 at 12:44:12PM +, Arnd Bergmann wrote:
> On Tuesday 15 January 2013, Thierry Reding wrote:
> > I'm not sure I follow your reasoning here. Is it possible to use MSIs
> > without PCI? If not then I think there's little sense in keeping the
> > implementations separate.
> 
> Conceptually, you can use MSI for any device, but the Linux interfaces
> for MSI are tied to PCI. If you use an MSI controller for a non-PCI
> device, it would probably just appear as a regular interrupt controller.
> 
> > Furthermore, if MSI controller and PCI host bridge are separate entities
> > how do you look up the MSI controller given a PCI device?
> 
> The host bridge can contain a pointer to the MSI controller. You can
> have multiple host bridges sharing a single MSI controller or you
> can have separate ones for each host.

Yes, and I hoped this relationship would be described by a device tree phandle,
as is done for relating devices to their interrupt-parent (where device trees
are used). This would provide (arguably unnecessarily) greater flexibility,
e.g. if you have two PCI/MSI controller pairs, each MSI controller only offers
a limited number of MSIs and you only use one PCI fabric - you could service
different parts of the fabric by different MSI controllers (assuming you relate
MSI controllers to part of the fabric and that you'd want to). Perhaps there
would be benefits for virtualisation as well?

Andrew Murray



Re: [PATCH 10/14] PCI: tegra: Move PCIe driver to drivers/pci/host

2013-01-16 Thread Andrew Murray
On Wed, Jan 16, 2013 at 02:00:26PM +, Arnd Bergmann wrote:
> On Tuesday 15 January 2013, Thierry Reding wrote:
> > Is there actually hardware that supports this? I assumed that the MSI
> > controller would have to be tightly coupled to the PCI host bridge in
> > order to raise an interrupt when an MSI is received via PCI.
> 
> No, as long as it's guaranteed that the MSI notification won't arrive
> at the CPU before any inbound DMA data before it, the MSI controller
> can be anywhere. Typically, the MSI controller is actually closer to
> the CPU core than to the PCI bridge. On X86, I believe the MSI address
> is normally on the "local APIC" on each CPU.

MSIs are indistinguishable from other memory-write transactions originating
from the RC other than the address they target. Anything that can capture
that write in the address space (even a page fault) could be an MSI controller
and call interrupt handlers. And so the RC / MSI controllers don't need to
be aware of each other.
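
As a toy illustration of that point (plain userspace C, every name and the
doorbell address are made up for the example), something that merely watches
inbound writes for one address can act as an MSI controller without knowing
anything about the RC:

```c
#include <assert.h>
#include <stdint.h>

#define MSI_DOORBELL_ADDR 0x10000000ull  /* arbitrary address for the sketch */
#define MSI_NR_VECTORS    32

typedef void (*msi_handler_t)(void *ctx);

/* Hypothetical "MSI controller": a table of handlers indexed by MSI data. */
struct msi_catcher {
	msi_handler_t handler[MSI_NR_VECTORS];
	void *ctx[MSI_NR_VECTORS];
};

/*
 * Called for every inbound 32-bit memory write. Returns 1 if the write
 * targeted the doorbell (and so was treated as an MSI), 0 if it was an
 * ordinary write such as DMA data.
 */
static int bus_write32(struct msi_catcher *mc, uint64_t addr, uint32_t data)
{
	assert(mc);

	if (addr != MSI_DOORBELL_ADDR)
		return 0;

	/* The written data selects which interrupt handler to call. */
	if (data < MSI_NR_VECTORS && mc->handler[data])
		mc->handler[data](mc->ctx[data]);
	return 1;
}

/* Example handler: counts how many times it was invoked. */
static void count_irq(void *ctx)
{
	++*(int *)ctx;
}
```

Only the target address distinguishes the two kinds of write here, which is
why the sketch needs no reference to the host bridge at all.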

Andrew Murray



[PATCH RFC] Provide MSI controller registration mechanism

2013-01-17 Thread Andrew Murray
This was my initial attempt at providing an architecture agnostic MSI
controller. Whilst it does not do anything to allow multiple MSI controllers on
a platform as has been recently discussed, it can prevent linker errors when
building multiplatform kernels.
---
Implementing MSI controller support for devices that can be supported across
multiple architectures is made difficult due to the inconsistent handling of
MSIs across the architectures. This would lead to unnecessary conditional code
in an architecture agnostic MSI driver.

At present each MSI supporting architecture implements MSI callbacks
(include/linux/msi.h). In some architectures (arm, mips, tile) the callbacks
are implemented directly for each end user (SoC), e.g.
arch/arm/mach-iop13xx/msi.c, arch/mips/pci/pci-xlr.c, arch/tile/kernel/pci_gx.c.
In these scenarios where multiple MSI controllers exist (e.g. mips) it would be
impossible to produce a kernel image with support for multiple targets due to
multiple implementations of the callbacks (linker errors).

In other architectures (ia64, x86, powerpc) the callbacks are abstracted
through machine vector tables (e.g. arch/ia64/include/asm/machvec.h,
arch/powerpc/include/asm/machdep.h) - this approach allows for a form of
registration for end-users (in the case of powerpc: /platforms/*/*msi.c,
/platforms/powernv/pci.c, /sysdev/fsl_msi.c, /sysdev/*msi.c). Where multiple
controllers exist, run-time warnings may appear or the latest registered wins.

This patch provides for registration of MSI support in a common way. It is
hoped that further development of this mechanism will allow for multiple MSI
controllers serving different PCI domains and/or host-bridges.

In the case of sparc it appears as though the MSI implementation used depends
on the requesting PCI device's host bridge controller. As such multiple MSI
controllers can co-exist and sparc will not immediately benefit from this
patch.

This patch preserves the existing arch_ callback mechanism such that MSI support
will not break where this registration mechanism is not used - though it is
hoped that architectures will transition to this new mechanism and the old
callbacks can be deprecated.

The existing mechanism resulted in linker errors when arch_setup_msi_irq /
arch_teardown_msi_irq were not defined or defined multiple times - this no
longer occurs and a WARN is issued instead.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
---
 drivers/pci/msi.c   |  113 +++
 include/linux/msi.h |   16 +++-
 2 files changed, 111 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a825d78..352d4fe 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -25,9 +25,21 @@
 #include "msi.h"
 
 static int pci_msi_enable = 1;
+static struct msi_controller *ops;
 
 /* Arch hooks */
 
+static int __arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
+{
+   if (!ops)
+   return arch_msi_check_device(dev, nvec, type);
+
+   if (ops->msi_check_device)
+   return ops->msi_check_device(dev, nvec, type);
+
+   return 0;
+}
+
 #ifndef arch_msi_check_device
 int arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
 {
@@ -35,12 +47,44 @@ int arch_msi_check_device(struct pci_dev *dev, int nvec, 
int type)
 }
 #endif
 
+int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
+{
+   WARN_ON_ONCE(1);
+   return 1;
+}
+
+static int __arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
+{
+   if (!ops)
+   return arch_setup_msi_irq(dev, desc);
+
+   if (ops->setup_msi_irq)
+   return ops->setup_msi_irq(dev, desc);
+
+   WARN_ON_ONCE(1);
+   return 1;
+}
+
+void __weak arch_teardown_msi_irq(unsigned int irq)
+{
+   WARN_ON_ONCE(1);
+}
+
+static void __arch_teardown_msi_irq(unsigned int irq)
+{
+   if (!ops)
+   return arch_teardown_msi_irq(irq);
+
+   if (ops->teardown_msi_irq)
+   return ops->teardown_msi_irq(irq);
+
+   WARN_ON_ONCE(1);
+}
+
 #ifndef arch_setup_msi_irqs
 # define arch_setup_msi_irqs default_setup_msi_irqs
-# define HAVE_DEFAULT_MSI_SETUP_IRQS
 #endif
 
-#ifdef HAVE_DEFAULT_MSI_SETUP_IRQS
 int default_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
struct msi_desc *entry;
@@ -54,7 +98,7 @@ int default_setup_msi_irqs(struct pci_dev *dev, int nvec, int 
type)
return 1;
 
list_for_each_entry(entry, &dev->msi_list, list) {
-   ret = arch_setup_msi_irq(dev, entry);
+   ret = __arch_setup_msi_irq(dev, entry);
if (ret < 0)
return ret;
if (ret > 0)
@@ -63,14 +107,22 @@ int default_setup_msi_irqs(struct pci_dev *dev, int nvec, 
int type)
 
return 0;
 }
-#endif
+
+static int __arch_setup_msi_irqs(struct pci_dev

Re: [PATCH 10/14] PCI: tegra: Move PCIe driver to drivers/pci/host

2013-01-17 Thread Andrew Murray
On Wed, Jan 16, 2013 at 06:31:01PM +, Thierry Reding wrote:
> Alright, putting the functions into pci_ops doesn't sound like a very
> good idea then. Or perhaps it would make sense for hardware where the
> root complex and the MSI controller are handled by the same driver.
> Basically it could be done as a shortcut and if those are not filled
> in, the drivers could still opt to look up an MSI controller from a
> phandle specified in DT.
> 
> Even another alternative would be to keep the functions within the
> struct pci_ops and use generic ones if an external MSI controller is
> used. Just tossing around ideas.

I think an ideal solution would be for additional logic in drivers/pci/msi.c
(e.g. in functions like msi_capability_init) to determine (based on the
passed in pci_dev) which MSI controller ops to use. I'm not sure the best
way to implement an association between an MSI controller and PCI busses
(I believe arch/sparc does something like this - perhaps there will be
inspiration there).

As you've pointed out, most RCs will have their own MSI controllers - so
it should be easy to register and associate both together.

I've submitted my previous work on MSI controller registration, but it
doesn't quite solve this problem - perhaps it can be a starting point?

Andrew Murray



Re: [PATCH 10/14] PCI: tegra: Move PCIe driver to drivers/pci/host

2013-01-17 Thread Andrew Murray
On Thu, Jan 17, 2013 at 04:05:02PM +, Thierry Reding wrote:
> On Thu, Jan 17, 2013 at 03:42:36PM +0000, Andrew Murray wrote:
> > On Wed, Jan 16, 2013 at 06:31:01PM +, Thierry Reding wrote:
> > > Alright, putting the functions into pci_ops doesn't sound like a very
> > > good idea then. Or perhaps it would make sense for hardware where the
> > > root complex and the MSI controller are handled by the same driver.
> > > Basically it could be done as a shortcut and if those are not filled
> > > in, the drivers could still opt to look up an MSI controller from a
> > > phandle specified in DT.
> > > 
> > > Even another alternative would be to keep the functions within the
> > > struct pci_ops and use generic ones if an external MSI controller is
> > > used. Just tossing around ideas.
> > 
> > I think an ideal solution would be for additional logic in drivers/pci/msi.c
> > (e.g. in functions like msi_capability_init) to determine (based on the
> > passed in pci_dev) which MSI controller ops to use. I'm not sure the best
> > way to implement an association between an MSI controller and PCI busses
> > (I believe arch/sparc does something like this - perhaps there will be
> > inspiration there).
> > 
> > As you've pointed out, most RCs will have their own MSI controllers - so
> > it should be easy to register and associate both together.
> > 
> > I've submitted my previous work on MSI controller registration, but it
> > doesn't quite solve this problem - perhaps it can be a starting point?
> 
> We basically have two cases:
> 
>   - The PCI host bridge contains registers for MSI support. In that case
> it makes little sense to uncouple the MSI implementation from the
> host bridge driver.
> 
>   - An MSI controller exists outside of the PCI host bridge. The PCI
> host bridge would in that case have to lookup an MSI controller (via
> DT phandle or some other method).
> 
> In either of those cases, does it make sense to use the MSI support
> outside the scope of the PCI infrastructure? That is, would devices
> other than PCI devices be able to generate an MSI?

I've come around to your way of thinking. Your approach sounds good for
registration of MSI ops - let the RC host driver do it (it probably has its
own), or use a helper for following a phandle to get ops that are not part
of the driver. MSIs won't be used outside of PCI devices.

Though existing drivers will use MSI framework functions to request MSIs, that
will result in callbacks to the arch_setup_msi_irqs type functions. These
functions would need to be updated to find these new ops if they exist, i.e. by
traversing the pci_dev structure up to the RC and finding a suitable structure.

Perhaps the msi ops could live alongside pci_ops in the pci_bus structure. This
way when traversing up the buses from the provided pci_dev - the first bus with
msi ops populated would be used?

If no ops are found, the standard arch callbacks can be called - thus preserving
existing functionality.
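
As a rough illustration of the lookup described above, here is a small user-space sketch. The struct msi_ops type, the msi field in pci_bus, and the helpers pci_find_msi_ops, setup_msi_irqs, rc_setup_irqs and arch_fallback_setup are all hypothetical names made up for this example, and the structures are cut-down stand-ins rather than the kernel's definitions.

```c
#include <stddef.h>

struct pci_dev;

/* Hypothetical per-controller MSI operations. */
struct msi_ops {
	int (*setup_irqs)(struct pci_dev *dev, int nvec);
};

/* Cut-down stand-ins: only the fields needed for the lookup. */
struct pci_bus {
	struct pci_bus *parent;	/* NULL for a root bus */
	struct msi_ops *msi;	/* hypothetical field living next to pci_ops */
};

struct pci_dev {
	struct pci_bus *bus;
};

/* Walk from the device's bus towards the root; the first bus with
 * msi ops populated wins. Returns NULL if no bus provides any. */
static struct msi_ops *pci_find_msi_ops(struct pci_dev *dev)
{
	struct pci_bus *bus;

	for (bus = dev->bus; bus; bus = bus->parent)
		if (bus->msi)
			return bus->msi;

	return NULL;
}

/* Example ops as an RC host driver might register them (hypothetical). */
static int rc_setup_irqs(struct pci_dev *dev, int nvec)
{
	(void)dev;
	return nvec;	/* pretend every requested vector was set up */
}

/* Stand-in for the existing arch callback path. */
static int arch_fallback_setup(struct pci_dev *dev, int nvec)
{
	(void)dev;
	(void)nvec;
	return -1;	/* pretend the arch has no MSI support */
}

/* Use the ops found on the way up, else fall back to the arch callback. */
static int setup_msi_irqs(struct pci_dev *dev, int nvec)
{
	struct msi_ops *ops = pci_find_msi_ops(dev);

	if (ops && ops->setup_irqs)
		return ops->setup_irqs(dev, nvec);

	return arch_fallback_setup(dev, nvec);
}
```

With ops set only on the root bus, a device on a child bus would still pick them up; with no ops anywhere, the arch path is taken, so existing platforms would be unaffected.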

Andrew Murray


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 10/14] PCI: tegra: Move PCIe driver to drivers/pci/host

2013-01-18 Thread Andrew Murray
On Thu, Jan 17, 2013 at 08:30:10PM +, Thierry Reding wrote:
> On Thu, Jan 17, 2013 at 04:22:18PM +0000, Andrew Murray wrote:
> > On Thu, Jan 17, 2013 at 04:05:02PM +, Thierry Reding wrote:
> > > On Thu, Jan 17, 2013 at 03:42:36PM +0000, Andrew Murray wrote:
> > > > On Wed, Jan 16, 2013 at 06:31:01PM +, Thierry Reding wrote:
> > > > > Alright, putting the functions into pci_ops doesn't sound like a very
> > > > > good idea then. Or perhaps it would make sense for hardware where the
> > > > > root complex and the MSI controller are handled by the same driver.
> > > > > Basically it could be done as a shortcut and if those are not filled
> > > > > in, the drivers could still opt to look up an MSI controller from a
> > > > > phandle specified in DT.
> > > > > 
> > > > > Even another alternative would be to keep the functions within the
> > > > > struct pci_ops and use generic ones if an external MSI controller is
> > > > > used. Just tossing around ideas.
> > > > 
> > > > I think an ideal solution would be for additional logic in
> > > > drivers/pci/msi.c (e.g. in functions like msi_capability_init) to
> > > > determine (based on the passed in pci_dev) which MSI controller ops to
> > > > use. I'm not sure the best way to implement an association between an
> > > > MSI controller and PCI busses (I believe arch/sparc does something like
> > > > this - perhaps there will be inspiration there).
> > > > 
> > > > As you've pointed out, most RCs will have their own MSI controllers - so
> > > > it should be easy to register and associate both together.
> > > > 
> > > > I've submitted my previous work on MSI controller registration, but it
> > > > doesn't quite solve this problem - perhaps it can be a starting point?
> > > 
> > > We basically have two cases:
> > > 
> > >   - The PCI host bridge contains registers for MSI support. In that case
> > > it makes little sense to uncouple the MSI implementation from the
> > > host bridge driver.
> > > 
> > >   - An MSI controller exists outside of the PCI host bridge. The PCI
> > > host bridge would in that case have to lookup an MSI controller (via
> > > DT phandle or some other method).
> > > 
> > > In either of those cases, does it make sense to use the MSI support
> > > outside the scope of the PCI infrastructure? That is, would devices
> > > other than PCI devices be able to generate an MSI?
> > 
> > I've come around to your way of thinking. Your approach sounds good for
> > registration of MSI ops - let the RC host driver do it (it probably has its
> > own), or use a helper for following a phandle to get ops that are not part
> > of the driver. MSIs won't be used outside of PCI devices.
> > 
> > > Though existing drivers will use MSI framework functions to request MSIs,
> > > that will result in callbacks to the arch_setup_msi_irqs type functions.
> > > These functions would need to be updated to find these new ops if they
> > > exist, i.e. by traversing the pci_dev structure up to the RC and finding a
> > > suitable structure.
> > > 
> > > Perhaps the msi ops could live alongside pci_ops in the pci_bus structure.
> > > This way when traversing up the buses from the provided pci_dev - the first
> > > bus with msi ops populated would be used?
> > > 
> > > If no ops are found, the standard arch callbacks can be called - thus
> > > preserving existing functionality.
> 
> Yes, what you describe is exactly what I had in mind. I've been thinking
> about a possible implementation and there may be some details that could
> prove difficult to resolve. For instance, we likely need to pass context
> around for the MSI ops, or else make sure that they can find the context
> from the struct pci_dev or by traversing upwards from it.
> 
> I think for the case where the MSI hardware is controlled by the same
> driver as the PCI host bridge, doing this is easy because the context
> could be part of the PCI host bridge context, which in case of Tegra is
> stored in struct pci_bus' sysdata field (which is actually an ARM struct
> pci_sys_data and in turn stores a pointer to the struct tegra_pcie in
> the .private_data field). Other drivers often just use a global variable
> assuming that there will only ever be a single instance of the PCI host
> bridge.

Yes.
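
For the same-driver case, the chain described above (pci_bus sysdata, then pci_sys_data, then private_data) could be followed roughly as in this user-space sketch. The structures are reduced models with only the fields mentioned in the discussion, the num_ports member is a placeholder, and the helper name bus_to_pcie is hypothetical. The sketch also assumes that child buses carry the same sysdata pointer as their root bus.

```c
#include <stddef.h>

/* Driver context; the member is only a placeholder for this sketch. */
struct tegra_pcie {
	int num_ports;
};

/* Reduced model of ARM's struct pci_sys_data. */
struct pci_sys_data {
	void *private_data;	/* platform controller private data */
};

/* Reduced model of struct pci_bus. */
struct pci_bus {
	struct pci_bus *parent;
	void *sysdata;
};

/* Recover the host bridge context from any bus below that bridge
 * (hypothetical helper). */
static struct tegra_pcie *bus_to_pcie(struct pci_bus *bus)
{
	struct pci_sys_data *sys = bus->sysdata;

	return sys ? sys->private_data : NULL;
}
```

MSI ops implemented by the host bridge driver could use a helper like this instead of a global variable, which would also keep working with more than one bridge instance.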


Re: [PATCH 10/14] PCI: tegra: Move PCIe driver to drivers/pci/host

2013-01-18 Thread Andrew Murray
On Wed, Jan 09, 2013 at 08:43:10PM +, Thierry Reding wrote:
> Move the PCIe driver from arch/arm/mach-tegra into the drivers/pci/host
> directory. The motivation is to collect various host controller drivers
> in the same location in order to facilitate refactoring.
> 
> The Tegra PCIe driver has been largely rewritten, both in order to turn
> it into a proper platform driver and to add MSI (based on code by
> Krishna Kishore) as well as device tree support.
> 
> Signed-off-by: Thierry Reding 

[snip]

> +static int tegra_pcie_enable(struct tegra_pcie *pcie)
> +{
> +   struct hw_pci hw;
> +
> +   memset(&hw, 0, sizeof(hw));
> +
> +   hw.nr_controllers = 1;
> +   hw.private_data = (void **)&pcie;
> +   hw.setup = tegra_pcie_setup;
> +   hw.scan = tegra_pcie_scan_bus;
> +   hw.map_irq = tegra_pcie_map_irq;
> +
> +   pci_common_init(&hw);
> +
> +   return 0;
> +}

[snip]

> +static int tegra_pcie_probe(struct platform_device *pdev)
> +{
> +	struct device_node *port;
> +	struct tegra_pcie *pcie;
> +	int err;
> +
> +	pcie = devm_kzalloc(&pdev->dev, sizeof(*pcie), GFP_KERNEL);
> +	if (!pcie)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&pcie->ports);
> +	pcie->dev = &pdev->dev;
> +
> +	err = tegra_pcie_parse_dt(pcie);
> +	if (err < 0)
> +		return err;
> +
> +	pcibios_min_mem = 0;
> +
> +	err = tegra_pcie_get_resources(pcie);
> +	if (err < 0) {
> +		dev_err(&pdev->dev, "failed to request resources: %d\n", err);
> +		return err;
> +	}
> +
> +	err = tegra_pcie_enable_controller(pcie);
> +	if (err)
> +		goto put_resources;
> +
> +	/* probe root ports */
> +	for_each_child_of_node(pdev->dev.of_node, port) {
> +		if (!of_device_is_available(port))
> +			continue;
> +
> +		err = tegra_pcie_add_port(pcie, port);
> +		if (err < 0) {
> +			dev_err(&pdev->dev, "failed to add port %s: %d\n",
> +				port->name, err);
> +		}
> +	}
> +
> +	/* setup the AFI address translations */
> +	tegra_pcie_setup_translations(pcie);
> +
> +	if (IS_ENABLED(CONFIG_PCI_MSI)) {
> +		err = tegra_pcie_enable_msi(pcie);
> +		if (err < 0) {
> +			dev_err(&pdev->dev,
> +				"failed to enable MSI support: %d\n",
> +				err);
> +			goto put_resources;
> +		}
> +	}
> +
> +	err = tegra_pcie_enable(pcie);
> +	if (err < 0) {
> +		dev_err(&pdev->dev, "failed to enable PCIe ports: %d\n", err);
> +		goto disable_msi;
> +	}
> +
> +	platform_set_drvdata(pdev, pcie);
> +	return 0;
> +
> +disable_msi:
> +	if (IS_ENABLED(CONFIG_PCI_MSI))
> +		tegra_pcie_disable_msi(pcie);
> +put_resources:
> +	tegra_pcie_put_resources(pcie);
> +	return err;
> +}
> +

[snip]

> +
> +static const struct of_device_id tegra_pcie_of_match[] = {
> +   { .compatible = "nvidia,tegra20-pcie", },
> +   { },
> +};
> +
> +static struct platform_driver tegra_pcie_driver = {
> +	.driver = {
> +		.name = "tegra-pcie",
> +		.owner = THIS_MODULE,
> +		.of_match_table = tegra_pcie_of_match,
> +	},
> +	.probe = tegra_pcie_probe,
> +	.remove = tegra_pcie_remove,
> +};
> +module_platform_driver(tegra_pcie_driver);

If you have multiple 'nvidia,tegra20-pcie's in your DT then you will end up
with multiple calls to tegra_pcie_probe/tegra_pcie_enable/pci_common_init.

However pci_common_init/pcibios_init_hw assumes it will only ever be called
once, and will thus result in trying to create multiple busses with the same
bus number. (The first root bus it creates is always zero provided you haven't
implemented hw->scan).
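
To illustrate the problem, here is a small user-space model. It is only a sketch: the function names, the root_busnr array and the "busnr++" stand-in for "last bus number used by this controller plus one" are assumptions made for the example, not the bios32 code itself.

```c
#include <stddef.h>

#define MAX_ROOT_BUSES 8

static int root_busnr[MAX_ROOT_BUSES];	/* root bus numbers handed out so far */
static size_t nr_roots;

static void record_root(int busnr)
{
	if (nr_roots < MAX_ROOT_BUSES)
		root_busnr[nr_roots++] = busnr;
}

/* Numbering restarts at zero on every call, which is only safe if the
 * function is called exactly once. */
static void init_hw_restarting(int nr_controllers)
{
	int nr, busnr;

	for (nr = busnr = 0; nr < nr_controllers; nr++) {
		record_root(busnr);
		busnr++;	/* stand-in for "last bus used + 1" */
	}
}

/* The counter is static, so numbering carries on across calls. */
static void init_hw_persistent(int nr_controllers)
{
	static int busnr;
	int nr;

	for (nr = 0; nr < nr_controllers; nr++) {
		record_root(busnr);
		busnr++;
	}
}

/* Returns 1 if any two recorded root buses share a number. */
static int has_duplicate_roots(void)
{
	size_t i, j;

	for (i = 0; i < nr_roots; i++)
		for (j = i + 1; j < nr_roots; j++)
			if (root_busnr[i] == root_busnr[j])
				return 1;
	return 0;
}
```

Calling init_hw_restarting(1) twice, as two probes of the same driver would, records bus 0 twice; the persistent variant hands out 0 and then 1.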

I have a patch for this if you want to fold it into your series? (I see you've
made changes to bios32 for per-controller data).

Andrew Murray



[PATCH RFC 1/2] Implementation of pci_fixup_irqs for descendants of a specified bus

2013-01-18 Thread Andrew Murray
Continuing from discussion with Thierry (lkml.org/lkml/2013/1/18/107) perhaps
this will be useful to fold into your patchset. 
---
This patch provides pci_bus_fixup_irqs which performs the same
function as pci_fixup_irqs but only to descendants of the specified
bus.

This can reduce unnecessary fixing up of device irqs when new buses
are added.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
---
 drivers/pci/setup-irq.c |   15 +++
 include/linux/pci.h |3 +++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/setup-irq.c b/drivers/pci/setup-irq.c
index eb219a1..ea91874 100644
--- a/drivers/pci/setup-irq.c
+++ b/drivers/pci/setup-irq.c
@@ -62,3 +62,18 @@ pci_fixup_irqs(u8 (*swizzle)(struct pci_dev *, u8 *),
 	for_each_pci_dev(dev)
 		pdev_fixup_irq(dev, swizzle, map_irq);
 }
+
+void __init
+pci_bus_fixup_irqs(struct pci_bus *bus,
+		   u8 (*swizzle)(struct pci_dev *, u8 *),
+		   int (*map_irq)(const struct pci_dev *, u8, u8))
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		pdev_fixup_irq(dev, swizzle, map_irq);
+
+		if (dev->subordinate)
+			pci_bus_fixup_irqs(dev->subordinate, swizzle, map_irq);
+	}
+}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5faa831..1b3c2eb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -953,6 +953,9 @@ void pdev_enable_device(struct pci_dev *);
 int pci_enable_resources(struct pci_dev *, int mask);
 void pci_fixup_irqs(u8 (*)(struct pci_dev *, u8 *),
 		    int (*)(const struct pci_dev *, u8, u8));
+void pci_bus_fixup_irqs(struct pci_bus *bus,
+			u8 (*swizzle)(struct pci_dev *, u8 *),
+			int (*map_irq)(const struct pci_dev *, u8, u8));
 #define HAVE_PCI_REQ_REGIONS   2
 int __must_check pci_request_regions(struct pci_dev *, const char *);
 int __must_check pci_request_regions_exclusive(struct pci_dev *, const char *);
-- 
1.7.0.4
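
The effect of restricting the fixup to one bus and its descendants can be seen in a toy user-space model. This is a sketch only: struct bus, struct dev, the fixed_up counter and the function names are invented for the example and use plain linked pointers instead of the kernel's list helpers.

```c
#include <stddef.h>

struct bus;

/* Toy device: part of a singly linked list of devices on one bus. */
struct dev {
	struct dev *next;		/* next device on the same bus */
	struct bus *subordinate;	/* bus behind this device, if a bridge */
	int fixed_up;			/* how many times the fixup has run */
};

struct bus {
	struct dev *devices;		/* first device on this bus */
};

/* Stand-in for the per-device IRQ fixup. */
static void fixup_one(struct dev *d)
{
	d->fixed_up++;
}

/* Visit every device on the bus, then recurse behind each bridge. */
static void bus_fixup_irqs(struct bus *bus)
{
	struct dev *d;

	for (d = bus->devices; d; d = d->next) {
		fixup_one(d);

		if (d->subordinate)
			bus_fixup_irqs(d->subordinate);
	}
}
```

Devices under a different root bus are never visited, so registering a second host bridge would not redo the fixup for the first one.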




[PATCH RFC 2/2] Improve bios32 support for DT PCI host bridge controllers

2013-01-18 Thread Andrew Murray
Continuing from discussion with Thierry (lkml.org/lkml/2013/1/18/107) perhaps
this will be useful to fold into your patchset - you may wish to remove the
overlap. 
---
This patch attempts to overcome two difficulties when providing DT PCI host
bridge controllers:

At present PCI controllers are registered via the pci_common_init call, which
results in callbacks (arch/arm/include/asm/mach/pci.h) that are used to set up
the controller. However, there is no trivial way to pass a device_node that is
known at the time of calling pci_common_init through to the callbacks. This is
required in order to add pci resources (pci_add_resource_offset) based on
information obtained from the device tree. This patch updates the hw_pci and
pci_sys_data structures such that drivers can provide a device_node to
pci_common_init and access it through the pci_sys_data argument of the
callbacks.

Additionally bios32 makes an assumption that all host controllers are
registered at the same time and handled by the same driver. This patch provides
support for calling pci_common_init multiple times to allow for one at a time
registration of PCI host controllers.

It also adds support for setting up of PCIe MPS and MRRS.

Signed-off-by: Andrew Murray 
Signed-off-by: Liviu Dudau 
---
 arch/arm/include/asm/mach/pci.h |2 ++
 arch/arm/kernel/bios32.c|   29 -
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/mach/pci.h b/arch/arm/include/asm/mach/pci.h
index 26c511f..845a6b7 100644
--- a/arch/arm/include/asm/mach/pci.h
+++ b/arch/arm/include/asm/mach/pci.h
@@ -27,6 +27,7 @@ struct hw_pci {
 	void		(*postinit)(void);
 	u8		(*swizzle)(struct pci_dev *dev, u8 *pin);
 	int		(*map_irq)(const struct pci_dev *dev, u8 slot, u8 pin);
+	struct device_node *of_node;
 };
 
 /*
@@ -47,6 +48,7 @@ struct pci_sys_data {
 					/* IRQ mapping				*/
 	int		(*map_irq)(const struct pci_dev *, u8, u8);
 	void		*private_data;	/* platform controller private data	*/
+	struct device_node *of_node;	/* device tree node			*/
 };
 
 /*
diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index 2b2f25e..bde4630 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -426,10 +427,10 @@ static int pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
 static void __init pcibios_init_hw(struct hw_pci *hw, struct list_head *head)
 {
 	struct pci_sys_data *sys = NULL;
+	static int busnr;
 	int ret;
-	int nr, busnr;
-
-	for (nr = busnr = 0; nr < hw->nr_controllers; nr++) {
+	int nr;
+	for (nr = 0; nr < hw->nr_controllers; nr++) {
 		sys = kzalloc(sizeof(struct pci_sys_data), GFP_KERNEL);
 		if (!sys)
 			panic("PCI: unable to allocate sys data!");
@@ -440,6 +441,7 @@ static void __init pcibios_init_hw(struct hw_pci *hw, struct list_head *head)
 		sys->busnr   = busnr;
 		sys->swizzle = hw->swizzle;
 		sys->map_irq = hw->map_irq;
+		sys->of_node = hw->of_node;
 		INIT_LIST_HEAD(&sys->resources);
 
 		ret = hw->setup(nr, sys);
@@ -484,10 +486,11 @@ void __init pci_common_init(struct hw_pci *hw)
 	if (hw->postinit)
 		hw->postinit();
 
-	pci_fixup_irqs(pcibios_swizzle, pcibios_map_irq);
-
 	list_for_each_entry(sys, &head, node) {
 		struct pci_bus *bus = sys->bus;
+		struct pci_bus *child;
+
+		pci_bus_fixup_irqs(bus, pcibios_swizzle, pcibios_map_irq);
 
 		if (!pci_has_flag(PCI_PROBE_ONLY)) {
 			/*
@@ -504,6 +507,16 @@ void __init pci_common_init(struct hw_pci *hw)
 			 * Enable bridges
 			 */
 			pci_enable_bridges(bus);
+
+			/*
+			 * Configure children (MPS, MRRS)
+			 */
+			list_for_each_entry(child, &bus->children, node) {
+				struct pci_dev *self = child->self;
+				if (!self)
+					continue;
+				pcie_bus_configure_settings(child, self->pcie_mpss);
+			}
 		}
 
 		/*
@@ -627,3 +640,9 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 
 	return 0;
 }
+
+struct device_node *pcibios_get_phb_of_node(struct pci_bus *bus)
+{
+	struct pci_sys_data *sys = bus->sysdata;
+	return of_node_get(sys->of_node);
+}
-- 
1.7.0.4



[PATCH] sh: sh7712 clock support

2007-12-31 Thread Andrew Murray
From: Andrew Murray <[EMAIL PROTECTED]>

This patch provides specific clock support for the SH7712. (This is my first 
ever patch, so apologies if I've not followed the procedure correctly!)

Signed-off-by: Andrew Murray <[EMAIL PROTECTED]>
---
diff -uprN -x sh-2.6/Documentation/dontdiff 
sh-2.6/arch/sh/kernel/cpu/sh3/clock-sh7712.c 
sh-2.6-devel/arch/sh/kernel/cpu/sh3/clock-sh7712.c
--- sh-2.6/arch/sh/kernel/cpu/sh3/clock-sh7712.c1970-01-01 
01:00:00.0 +0100
+++ sh-2.6-devel/arch/sh/kernel/cpu/sh3/clock-sh7712.c  2007-12-31 
15:04:51.0 +
@@ -0,0 +1,81 @@
+/*
+ * arch/sh/kernel/cpu/sh3/clock-sh7712.c
+ *
+ * SH7712 support for the clock framework
+ *
+ *  Copyright (C) 2007  Andrew Murray <[EMAIL PROTECTED]>
+ *
+ * Based on arch/sh/kernel/cpu/sh3/clock-sh3.c
+ *  Copyright (C) 2005  Paul Mundt
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int multipliers[] = { 1, 2, 3 };
+static int divisors[]= { 1, 2, 3, 4, 6 };
+
+static void master_clk_init(struct clk *clk)
+{
+   int frqcr = ctrl_inw(FRQCR);
+   int idx = (frqcr & 0x0300) >> 8;
+
+   clk->rate *= multipliers[idx];
+}
+
+static struct clk_ops sh7712_master_clk_ops = {
+   .init   = master_clk_init,
+};
+
+static void module_clk_recalc(struct clk *clk)
+{
+   int frqcr = ctrl_inw(FRQCR);
+   int idx = frqcr & 0x0007;
+
+   clk->rate = clk->parent->rate / divisors[idx];
+}
+
+static struct clk_ops sh7712_module_clk_ops = {
+   .recalc = module_clk_recalc,
+};
+
+static void bus_clk_init(struct clk *clk)
+{
+   clk->rate = CONFIG_SH_PCLK_FREQ;
+}
+
+static struct clk_ops sh7712_bus_clk_ops = {
+   .init   = bus_clk_init,
+};
+
+static void cpu_clk_recalc(struct clk *clk)
+{
+   int frqcr = ctrl_inw(FRQCR);
+   int idx = (frqcr & 0x0030) >> 4;
+
+   clk->rate = clk->parent->rate / divisors[idx];
+}
+
+static struct clk_ops sh7712_cpu_clk_ops = {
+   .recalc = cpu_clk_recalc,
+};
+
+static struct clk_ops *sh7712_clk_ops[] = {
+   &sh7712_master_clk_ops,
+   &sh7712_module_clk_ops,
+   &sh7712_bus_clk_ops,
+   &sh7712_cpu_clk_ops,
+};
+
+void __init arch_init_clk_ops(struct clk_ops **ops, int idx)
+{
+	if (idx < ARRAY_SIZE(sh7712_clk_ops))
+		*ops = sh7712_clk_ops[idx];
+}
+
diff -uprN -x sh-2.6/Documentation/dontdiff 
sh-2.6/arch/sh/kernel/cpu/sh3/Makefile 
sh-2.6-devel/arch/sh/kernel/cpu/sh3/Makefile
--- sh-2.6/arch/sh/kernel/cpu/sh3/Makefile  2007-12-31 14:47:32.0 
+
+++ sh-2.6-devel/arch/sh/kernel/cpu/sh3/Makefile2007-12-31 
15:01:15.0 +
@@ -22,5 +22,6 @@ clock-$(CONFIG_CPU_SUBTYPE_SH7706):= cl
 clock-$(CONFIG_CPU_SUBTYPE_SH7709) := clock-sh7709.o
 clock-$(CONFIG_CPU_SUBTYPE_SH7710) := clock-sh7710.o
 clock-$(CONFIG_CPU_SUBTYPE_SH7720) := clock-sh7710.o
+clock-$(CONFIG_CPU_SUBTYPE_SH7712) := clock-sh7712.o
 
 obj-y  += $(clock-y)
diff -uprN -x sh-2.6/Documentation/dontdiff 
sh-2.6/include/asm-sh/cpu-sh3/freq.h sh-2.6-devel/include/asm-sh/cpu-sh3/freq.h
--- sh-2.6/include/asm-sh/cpu-sh3/freq.h2007-12-31 14:47:47.0 
+
+++ sh-2.6-devel/include/asm-sh/cpu-sh3/freq.h  2007-12-31 15:02:30.0 
+
@@ -10,7 +10,12 @@
 #ifndef __ASM_CPU_SH3_FREQ_H
 #define __ASM_CPU_SH3_FREQ_H
 
+#ifdef CONFIG_CPU_SUBTYPE_SH7712
+#define FRQCR  0xA415FF80
+#else
 #define FRQCR  0xffffff80
+#endif
+
 #define MIN_DIVISOR_NR 0
 #define MAX_DIVISOR_NR 4


Internal Virus Database is out-of-date.
Checked by AVG Free Edition. 
Version: 7.5.516 / Virus Database: 269.17.8 - Release Date: 24/12/2007 00:00
 


RE: [PATCH] sh: sh7712 clock support

2007-12-31 Thread Andrew Murray
Hello,

Yes you are correct. Here is an updated patch. Happy New Year.

---
From: Andrew Murray <[EMAIL PROTECTED]>

This patch provides specific clock support for the SH7712. (This is my first 
ever patch, so apologies if I've not followed the procedure correctly!)

Signed-off-by: Andrew Murray <[EMAIL PROTECTED]>
---
diff -uprN -x sh-2.6/Documentation/dontdiff 
sh-2.6/arch/sh/kernel/cpu/sh3/clock-sh7712.c 
sh-2.6-devel/arch/sh/kernel/cpu/sh3/clock-sh7712.c
--- sh-2.6/arch/sh/kernel/cpu/sh3/clock-sh7712.c1970-01-01 
01:00:00.0 +0100
+++ sh-2.6-devel/arch/sh/kernel/cpu/sh3/clock-sh7712.c  2007-12-31 
15:04:51.0 +
@@ -0,0 +1,71 @@
+/*
+ * arch/sh/kernel/cpu/sh3/clock-sh7712.c
+ *
+ * SH7712 support for the clock framework
+ *
+ *  Copyright (C) 2007  Andrew Murray <[EMAIL PROTECTED]>
+ *
+ * Based on arch/sh/kernel/cpu/sh3/clock-sh3.c
+ *  Copyright (C) 2005  Paul Mundt
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int multipliers[] = { 1, 2, 3 };
+static int divisors[]= { 1, 2, 3, 4, 6 };
+
+static void master_clk_init(struct clk *clk)
+{
+   int frqcr = ctrl_inw(FRQCR);
+   int idx = (frqcr & 0x0300) >> 8;
+
+   clk->rate *= multipliers[idx];
+}
+
+static struct clk_ops sh7712_master_clk_ops = {
+   .init   = master_clk_init,
+};
+
+static void module_clk_recalc(struct clk *clk)
+{
+   int frqcr = ctrl_inw(FRQCR);
+   int idx = frqcr & 0x0007;
+
+   clk->rate = clk->parent->rate / divisors[idx];
+}
+
+static struct clk_ops sh7712_module_clk_ops = {
+   .recalc = module_clk_recalc,
+};
+
+static void cpu_clk_recalc(struct clk *clk)
+{
+   int frqcr = ctrl_inw(FRQCR);
+   int idx = (frqcr & 0x0030) >> 4;
+
+   clk->rate = clk->parent->rate / divisors[idx];
+}
+
+static struct clk_ops sh7712_cpu_clk_ops = {
+   .recalc = cpu_clk_recalc,
+};
+
+static struct clk_ops *sh7712_clk_ops[] = {
+   &sh7712_master_clk_ops,
+   &sh7712_module_clk_ops,
+   &sh7712_cpu_clk_ops,
+};
+
+void __init arch_init_clk_ops(struct clk_ops **ops, int idx)
+{
+	if (idx < ARRAY_SIZE(sh7712_clk_ops))
+		*ops = sh7712_clk_ops[idx];
+}
+
diff -uprN -x sh-2.6/Documentation/dontdiff 
sh-2.6/arch/sh/kernel/cpu/sh3/Makefile 
sh-2.6-devel/arch/sh/kernel/cpu/sh3/Makefile
--- sh-2.6/arch/sh/kernel/cpu/sh3/Makefile  2007-12-31 14:47:32.0 
+
+++ sh-2.6-devel/arch/sh/kernel/cpu/sh3/Makefile2007-12-31 
15:01:15.0 +
@@ -22,5 +22,6 @@ clock-$(CONFIG_CPU_SUBTYPE_SH7706):= cl
 clock-$(CONFIG_CPU_SUBTYPE_SH7709) := clock-sh7709.o
 clock-$(CONFIG_CPU_SUBTYPE_SH7710) := clock-sh7710.o
 clock-$(CONFIG_CPU_SUBTYPE_SH7720) := clock-sh7710.o
+clock-$(CONFIG_CPU_SUBTYPE_SH7712) := clock-sh7712.o
 
 obj-y  += $(clock-y)
diff -uprN -x sh-2.6/Documentation/dontdiff 
sh-2.6/include/asm-sh/cpu-sh3/freq.h sh-2.6-devel/include/asm-sh/cpu-sh3/freq.h
--- sh-2.6/include/asm-sh/cpu-sh3/freq.h2007-12-31 14:47:47.0 
+
+++ sh-2.6-devel/include/asm-sh/cpu-sh3/freq.h  2007-12-31 15:02:30.0 
+
@@ -10,7 +10,12 @@
 #ifndef __ASM_CPU_SH3_FREQ_H
 #define __ASM_CPU_SH3_FREQ_H
 
+#ifdef CONFIG_CPU_SUBTYPE_SH7712
+#define FRQCR  0xA415FF80
+#else
 #define FRQCR  0xffffff80
+#endif
+
 #define MIN_DIVISOR_NR 0
 #define MAX_DIVISOR_NR 4


-----Original Message-----
From: Paul Mundt [mailto:[EMAIL PROTECTED] 
Sent: 31 December 2007 16:50
To: Andrew Murray
Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sh: sh7712 clock support

On Mon, Dec 31, 2007 at 04:32:25PM -, Andrew Murray wrote:
> +static void bus_clk_init(struct clk *clk)
> +{
> + clk->rate = CONFIG_SH_PCLK_FREQ;
> +}
> +
> +static struct clk_ops sh7712_bus_clk_ops = {
> + .init   = bus_clk_init,
> +};
> +
This shouldn't be necessary, the bus clk already references the master
clk as its parent, and inherits the rate from there. If you have no
recalc work to do, then you should be able to just leave this out
altogether. If you actually need this dummy init, then something is
broken with the upper layer, which we should fix ;-)

Patch looks fine otherwise. Tidy this up and I'll apply it.
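
For reference, the FRQCR field decoding used by the patch can be modelled in user space as below. The masks, shifts and tables are taken from the patch; the helper names and the range checks that return 0 for an index outside a table are additions made for this sketch and are not part of the submitted code. Whether such index values can actually occur on the SH7712 is not something this sketch claims to know.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const int multipliers[] = { 1, 2, 3 };
static const int divisors[]    = { 1, 2, 3, 4, 6 };

/* Field extraction with the same masks and shifts as the patch. */
static unsigned int frqcr_master_idx(unsigned int frqcr)
{
	return (frqcr & 0x0300) >> 8;
}

static unsigned int frqcr_cpu_idx(unsigned int frqcr)
{
	return (frqcr & 0x0030) >> 4;
}

static unsigned int frqcr_module_idx(unsigned int frqcr)
{
	return frqcr & 0x0007;
}

/* Master clock: input rate times the selected multiplier.
 * Returns 0 if the index is outside the table (added check). */
static unsigned long master_rate(unsigned long input, unsigned int frqcr)
{
	unsigned int idx = frqcr_master_idx(frqcr);

	return idx < ARRAY_SIZE(multipliers) ? input * multipliers[idx] : 0;
}

/* CPU and module clocks: parent rate over the selected divisor.
 * Returns 0 if the index is outside the table (added check). */
static unsigned long divided_rate(unsigned long parent, unsigned int idx)
{
	return idx < ARRAY_SIZE(divisors) ? parent / divisors[idx] : 0;
}
```

A two-bit field can select four values and a three-bit field eight, while the tables hold three and five entries, which is why the sketch checks the index before using it.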




[PATCH] sh: sh7712 defconfig

2008-01-02 Thread Andrew Murray
From: Andrew Murray <[EMAIL PROTECTED]>

This patch provides a correct value for CONFIG_SH_PCLK_FREQ for the SH7712 
solution engine when used with the board's default factory settings. This 
results in the board running at its maximum CPU clock rate (200 MHz).

The board I have is a Japanese Solution Engine with a 66 MHz (extal) crystal. 
Is this the only variant of the Solution Engine for the SH7712?

Signed-off-by: Andrew Murray <[EMAIL PROTECTED]>
---
diff -uprN -X sh-2.6/Documentation/dontdiff 
sh-2.6/arch/sh/configs/se7712_defconfig 
sh-2.6-devel/arch/sh/configs/se7712_defconfig
--- sh-2.6/arch/sh/configs/se7712_defconfig 2008-01-02 10:13:44.0 
+
+++ sh-2.6-devel/arch/sh/configs/se7712_defconfig   2008-01-02 
10:28:32.0 +
@@ -237,7 +237,7 @@ CONFIG_CPU_HAS_SR_RB=y
 CONFIG_SH_TMU=y
 CONFIG_SH_TIMER_IRQ=16
 # CONFIG_NO_IDLE_HZ is not set
-CONFIG_SH_PCLK_FREQ=
+CONFIG_SH_PCLK_FREQ=

 #
 # CPU Frequency scaling



Re: [PATCH v1 1/1] arm64: Early boot time stamps

2018-11-20 Thread Andrew Murray
On Tue, Nov 20, 2018 at 09:40:10AM -0500, Pavel Tatashin wrote:
> > > +static __init void sched_clock_early_init(void)
> > > +{
> > > + u64 freq = arch_timer_get_cntfrq();
> > > + u64 (*read_time)(void) = arch_counter_get_cntvct;
> >
> > We already have arch_timer_read_counter which is exposed from
> > arm_arch_timer.h.
> 
> OK
> 
> >
> > > +
> > > + /* Early clock is available only on platforms with known freqs */
> >
> > This comment is misleading. It should read something like:
> >
> > /*
> >  * The arm64 boot protocol mandates that CNTFRQ_EL0 reflects
> >  * the timer frequency. To avoid breakage on misconfigured
> >  * systems, do not register the early sched_clock if the
> > >  * programmed value is zero. Other random values will just
> >  * result in random output.
> >  */
> >
> 
> OK
> 
> > > + if (!freq)
> > > + return;
> > > +
> > > + sched_clock_register(read_time, BITS_PER_LONG, freq);
> >
> > This doesn't seem right. The counter has an architected minimum of 56
> > bits, and you can't assume that it is going to be more than that.
> 
> Yeah, I saw 56 is used in arm_arch_timer.c, but I could not find where
> this minimum is defined in aarch64 specs. I will change it to 56.

See section G5.1.2 of the ARM ARM for details.
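
To show why the advertised width matters, here is a user-space sketch of wrap handling for a counter that implements fewer than 64 bits. The helper names are made up for this example, and the sketch assumes that elapsed ticks are computed by masking the difference of two raw reads with the advertised width; it is not the sched_clock implementation.

```c
#include <stdint.h>

/* Mask covering the low 'bits' bits of a 64-bit value. */
static uint64_t counter_mask(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* Ticks elapsed between two raw reads of a counter that is 'bits' wide,
 * tolerating a single wrap between the reads. */
static uint64_t counter_delta(uint64_t now, uint64_t last, unsigned int bits)
{
	return (now - last) & counter_mask(bits);
}

/* Whole seconds until a counter of the given width wraps at freq_hz. */
static uint64_t wrap_seconds(unsigned int bits, uint64_t freq_hz)
{
	return freq_hz ? counter_mask(bits) / freq_hz : 0;
}
```

If the hardware wraps at 56 bits but the delta is masked as if it were 64 bits wide, a read taken just after the wrap yields a huge bogus delta; masking with 56 gives the small, correct one.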

Thanks,

Andrew Murray

> 
> I will send v2 soon.
> 
> Thank you,
> Pasha


[PATCH 03/10] arm: perf: add additional validation to set_event_filter

2018-11-16 Thread Andrew Murray
The armv7pmu driver doesn't support host/guest mode exclusion so
let's report this when set_event_filter is called with these
exclusion flags set.

Signed-off-by: Andrew Murray 
---
 arch/arm/kernel/perf_event_v7.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index a4fb0f8..c4c9fbb 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1074,6 +1074,8 @@ static int armv7pmu_set_event_filter(struct hw_perf_event *event,
 
 	if (attr->exclude_idle)
 		return -EPERM;
+	if (attr->exclude_host || attr->exclude_guest)
+		return -EPERM;
 	if (attr->exclude_user)
 		config_base |= ARMV7_EXCLUDE_USER;
 	if (attr->exclude_kernel)
-- 
2.7.4



[PATCH 02/10] arm: perf/core: generalise event exclusion checking with perf macro

2018-11-16 Thread Andrew Murray
Replace checking of perf event exclusion flags with perf macro.

Signed-off-by: Andrew Murray 
---
 arch/arm/mach-imx/mmdc.c | 8 +---
 arch/arm/mm/cache-l2x0-pmu.c | 7 +--
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index 04b3bf7..d9d468f 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -293,13 +293,7 @@ static int mmdc_pmu_event_init(struct perf_event *event)
 		return -EOPNOTSUPP;
 	}
 
-	if (event->attr.exclude_user		||
-	    event->attr.exclude_kernel		||
-	    event->attr.exclude_hv		||
-	    event->attr.exclude_idle		||
-	    event->attr.exclude_host		||
-	    event->attr.exclude_guest		||
-	    event->attr.sample_period)
+	if (event_has_exclude_flags(event) || event->attr.sample_period)
 		return -EINVAL;
 
 	if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS)
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index afe5b4c..968fdf8 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -314,12 +314,7 @@ static int l2x0_pmu_event_init(struct perf_event *event)
 	    event->attach_state & PERF_ATTACH_TASK)
 		return -EINVAL;
 
-	if (event->attr.exclude_user   ||
-	    event->attr.exclude_kernel ||
-	    event->attr.exclude_hv     ||
-	    event->attr.exclude_idle   ||
-	    event->attr.exclude_host   ||
-	    event->attr.exclude_guest)
+	if (event_has_exclude_flags(event))
 		return -EINVAL;
 
 	if (event->cpu < 0)
-- 
2.7.4
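
The helper used in the hunks above is not defined in this patch. A minimal user-space sketch of what such a helper might look like follows; the reduced perf_event_attr and perf_event structures and the mmdc_check function are assumptions for illustration, and the real definition may differ.

```c
#include <errno.h>
#include <stdbool.h>

/* Reduced model of the exclusion bits in struct perf_event_attr. */
struct perf_event_attr {
	unsigned long long sample_period;
	unsigned int exclude_user:1,
		     exclude_kernel:1,
		     exclude_hv:1,
		     exclude_idle:1,
		     exclude_host:1,
		     exclude_guest:1;
};

struct perf_event {
	struct perf_event_attr attr;
};

/* True if any of the six exclusion flags is set (sketch of the helper). */
static inline bool event_has_exclude_flags(const struct perf_event *event)
{
	const struct perf_event_attr *attr = &event->attr;

	return attr->exclude_user || attr->exclude_kernel ||
	       attr->exclude_hv || attr->exclude_idle ||
	       attr->exclude_host || attr->exclude_guest;
}

/* Mirrors the shape of the check in mmdc_pmu_event_init after the change. */
static int mmdc_check(const struct perf_event *event)
{
	if (event_has_exclude_flags(event) || event->attr.sample_period)
		return -EINVAL;

	return 0;
}
```

Keeping the flag list in one place means a newly added exclusion flag only has to be handled in the helper, not in every driver.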



[PATCH v3 11/12] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray
For x86 PMUs that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that amd/iommu and amd/uncore will now also
indicate that they do not support exclude_{hv|idle} and intel/uncore
that it does not support exclude_{guest|host}.

Signed-off-by: Andrew Murray 
---
 arch/x86/events/amd/iommu.c| 6 +-
 arch/x86/events/amd/uncore.c   | 7 ++-
 arch/x86/events/intel/uncore.c | 9 +
 3 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 3210fee..7635c23 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -223,11 +223,6 @@ static int perf_iommu_event_init(struct perf_event *event)
 	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
 		return -EINVAL;
 
-	/* IOMMU counters do not have usr/os/guest/host bits */
-	if (event->attr.exclude_user || event->attr.exclude_kernel ||
-	    event->attr.exclude_host || event->attr.exclude_guest)
-		return -EINVAL;
-
 	if (event->cpu < 0)
 		return -EINVAL;
 
@@ -414,6 +409,7 @@ static const struct pmu iommu_pmu __initconst = {
 	.read		= perf_iommu_read,
 	.task_ctx_nr	= perf_invalid_context,
 	.attr_groups	= amd_iommu_attr_groups,
+	.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static __init int init_one_iommu(unsigned int idx)
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 8671de1..988cb9c 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -201,11 +201,6 @@ static int amd_uncore_event_init(struct perf_event *event)
 	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
 		return -EINVAL;
 
-	/* NB and Last level cache counters do not have usr/os/guest/host bits */
-	if (event->attr.exclude_user || event->attr.exclude_kernel ||
-	    event->attr.exclude_host || event->attr.exclude_guest)
-		return -EINVAL;
-
 	/* and we do not enable counter overflow interrupts */
 	hwc->config = event->attr.config & AMD64_RAW_EVENT_MASK_NB;
 	hwc->idx = -1;
@@ -307,6 +302,7 @@ static struct pmu amd_nb_pmu = {
 	.start		= amd_uncore_start,
 	.stop		= amd_uncore_stop,
 	.read		= amd_uncore_read,
+	.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static struct pmu amd_llc_pmu = {
@@ -317,6 +313,7 @@ static struct pmu amd_llc_pmu = {
 	.start		= amd_uncore_start,
 	.stop		= amd_uncore_stop,
 	.read		= amd_uncore_read,
+	.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static struct amd_uncore *amd_uncore_alloc(unsigned int cpu)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 27a4614..d516161 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -695,14 +695,6 @@ static int uncore_pmu_event_init(struct perf_event *event)
 	if (pmu->func_id < 0)
 		return -ENOENT;
 
-	/*
-	 * Uncore PMU does measure at all privilege level all the time.
-	 * So it doesn't make sense to specify any exclude bits.
-	 */
-	if (event->attr.exclude_user || event->attr.exclude_kernel ||
-	    event->attr.exclude_hv || event->attr.exclude_idle)
-		return -EINVAL;
-
 	/* Sampling not supported yet */
 	if (hwc->sample_period)
 		return -EINVAL;
@@ -800,6 +792,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
 			.stop		= uncore_pmu_event_stop,
 			.read		= uncore_pmu_event_read,
 			.module		= THIS_MODULE,
+			.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 		};
 	} else {
 		pmu->pmu = *pmu->type->pmu;
-- 
2.7.4



[PATCH v3 00/12] perf/core: Generalise event exclusion checking

2018-12-06 Thread Andrew Murray
Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the event's attribute flags.

However this approach requires that each time a new event modifier is
added to perf, all the perf drivers need to be modified to indicate that
they don't support the attribute. This results in additional boiler-plate
code common to many drivers that needs to be maintained. Furthermore the
drivers are not consistent with regard to the error value they return
when reporting unsupported attributes.

This patchset allows PMU drivers to advertise their inability to exclude
based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
allows the perf core to reject requests for exclusion events where there
is no support in the PMU.

This is a functional change, in particular:

 - Some drivers will now additionally (but correctly) report unsupported
   exclusion flags. It's typical for existing userspace tools such as
   perf to handle such errors by retrying the system call without the
   unsupported flags.

 - Drivers that do not support any exclusion that previously reported
   -EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent
   with the majority and results in userspace perf retrying without
   exclusion.
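
To make the intended behaviour concrete, here is a minimal userspace sketch
of the check the core performs. It is not the kernel code: the model_*
names and struct layouts are hypothetical stand-ins, and only the 0x80
capability value and the set of six exclude flags are taken from patches
02 and 03 of this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; only what the check needs. */
#define MODEL_PMU_CAP_NO_EXCLUDE 0x80	/* value used in patch 03 */

struct model_attr {
	unsigned int exclude_user   : 1;
	unsigned int exclude_kernel : 1;
	unsigned int exclude_hv     : 1;
	unsigned int exclude_idle   : 1;
	unsigned int exclude_host   : 1;
	unsigned int exclude_guest  : 1;
};

struct model_pmu {
	int capabilities;
};

/* Same test as event_has_any_exclude_flag() in patch 02. */
static bool model_has_any_exclude_flag(const struct model_attr *attr)
{
	return attr->exclude_idle || attr->exclude_user ||
	       attr->exclude_kernel || attr->exclude_hv ||
	       attr->exclude_guest || attr->exclude_host;
}

/*
 * driver_ret is whatever the driver's event_init returned. A successful
 * init is still rejected with -EINVAL when the PMU advertises that it
 * cannot exclude and the event asks for any exclusion.
 */
static int model_try_init_event(const struct model_pmu *pmu,
				const struct model_attr *attr, int driver_ret)
{
	if (driver_ret)
		return driver_ret;

	if ((pmu->capabilities & MODEL_PMU_CAP_NO_EXCLUDE) &&
	    model_has_any_exclude_flag(attr))
		return -EINVAL;

	return 0;
}
```

With this shape every exclusion-incapable PMU reports the same error for
all six flags, and a driver only has to set one capability bit.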

All drivers touched by this patchset have been compile tested.

Changes from v2:

 - Invert logic from CAP_EXCLUDE to CAP_NO_EXCLUDE

Changes from v1:

 - Changed approach from explicitly rejecting events in unsupporting PMU
   drivers to explicitly advertising a capability in PMU drivers that
   do support exclusion events

 - Added additional information to tools/perf/design.txt

 - Rename event_has_exclude_flags to event_has_any_exclude_flag and
   update commit log to reflect it's a function

Andrew Murray (12):
  perf/doc: update design.txt for exclude_{host|guest} flags
  perf/core: add function to test for event exclusion flags
  perf/core: add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs
  alpha: perf/core: use PERF_PMU_CAP_NO_EXCLUDE
  arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
  arm: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
  drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude
incapable PMUs
  drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude
incapable PMUs
  powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable
PMUs
  x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
  x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
  perf/core: remove unused perf_flags

 arch/alpha/kernel/perf_event.c|  7 +--
 arch/arm/mach-imx/mmdc.c  |  9 ++---
 arch/arm/mm/cache-l2x0-pmu.c  |  9 +
 arch/powerpc/perf/hv-24x7.c   | 10 +-
 arch/powerpc/perf/hv-gpci.c   | 10 +-
 arch/powerpc/perf/imc-pmu.c   | 19 +--
 arch/x86/events/amd/ibs.c | 13 +
 arch/x86/events/amd/iommu.c   |  6 +-
 arch/x86/events/amd/power.c   | 10 ++
 arch/x86/events/amd/uncore.c  |  7 ++-
 arch/x86/events/intel/cstate.c| 12 +++-
 arch/x86/events/intel/rapl.c  |  9 ++---
 arch/x86/events/intel/uncore.c|  9 +
 arch/x86/events/intel/uncore_snb.c|  9 ++---
 arch/x86/events/msr.c | 10 ++
 drivers/perf/arm-cci.c| 10 +-
 drivers/perf/arm-ccn.c|  6 ++
 drivers/perf/arm_dsu_pmu.c|  9 ++---
 drivers/perf/arm_pmu.c| 15 +--
 drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c |  1 +
 drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_pmu.c  |  9 -
 drivers/perf/qcom_l2_pmu.c|  9 +
 drivers/perf/qcom_l3_pmu.c|  8 +---
 drivers/perf/xgene_pmu.c  |  6 +-
 include/linux/perf_event.h| 10 ++
 include/uapi/linux/perf_event.h   |  2 --
 kernel/events/core.c  |  9 +
 tools/include/uapi/linux/perf_event.h |  2 --
 tools/perf/design.txt |  4 
 31 files changed, 62 insertions(+), 189 deletions(-)

-- 
2.7.4



[PATCH v3 02/12] perf/core: add function to test for event exclusion flags

2018-12-06 Thread Andrew Murray
Add a function that tests if any of the perf event exclusion flags
are set on a given event.

Signed-off-by: Andrew Murray 
---
 include/linux/perf_event.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f..b2e806f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
 extern void
 perf_log_lost_samples(struct perf_event *event, u64 lost);
 
+static inline bool event_has_any_exclude_flag(struct perf_event *event)
+{
+   struct perf_event_attr *attr = &event->attr;
+
+   return attr->exclude_idle || attr->exclude_user ||
+  attr->exclude_kernel || attr->exclude_hv ||
+  attr->exclude_guest || attr->exclude_host;
+}
+
 static inline bool is_sampling_event(struct perf_event *event)
 {
return event->attr.sample_period != 0;
-- 
2.7.4



[PATCH v3 01/12] perf/doc: update design.txt for exclude_{host|guest} flags

2018-12-06 Thread Andrew Murray
Update design.txt to reflect the presence of the exclude_host
and exclude_guest perf flags.

Signed-off-by: Andrew Murray 
---
 tools/perf/design.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index a28dca2..0453ba2 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
 way to request that counting of events be restricted to times when the
 CPU is in user, kernel and/or hypervisor mode.
 
+Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
+to request counting of events restricted to guest and host contexts when
+using Linux as the hypervisor.
+
 The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
 operations, these can be used to relate userspace IP addresses to actual
 code, even after the mapping (or even the whole process) is gone,
-- 
2.7.4



[PATCH v3 03/12] perf/core: add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs

2018-12-06 Thread Andrew Murray
Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the event's attribute flags. This approach is error prone and
often inconsistent.

Let's instead allow PMU drivers to advertise their inability to exclude
based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
allows the perf core to reject requests for exclusion events where
there is no support in the PMU.

Signed-off-by: Andrew Murray 
---
 include/linux/perf_event.h | 1 +
 kernel/events/core.c   | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b2e806f..fe92b89 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -244,6 +244,7 @@ struct perf_event;
 #define PERF_PMU_CAP_EXCLUSIVE 0x10
 #define PERF_PMU_CAP_ITRACE0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40
+#define PERF_PMU_CAP_NO_EXCLUDE0x80
 
 /**
  * struct pmu - generic performance monitoring unit
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5a97f34..5113cfc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9743,6 +9743,15 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
if (ctx)
perf_event_ctx_unlock(event->group_leader, ctx);
 
+   if (!ret) {
+   if (pmu->capabilities & PERF_PMU_CAP_NO_EXCLUDE &&
+   event_has_any_exclude_flag(event)) {
+   if (event->destroy)
+   event->destroy(event);
+   ret = -EINVAL;
+   }
+   }
+
if (ret)
module_put(pmu->module);
 
-- 
2.7.4



[PATCH v3 05/12] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2018-12-06 Thread Andrew Murray
The ARM PMU driver can be used to represent a variety of ARM based
PMUs. Some of these PMUs do not provide support for context
exclusion; where this is the case we advertise the
PERF_PMU_CAP_NO_EXCLUDE capability to ensure that perf prevents us
from handling events where any exclusion flags are set.
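
As a rough illustration of the conditional part, the sketch below models
the registration-time decision in plain C. The model_* types and the
model_filter_ok() helper are hypothetical; the only thing carried over
from the patch is the rule that a missing set_event_filter callback
results in the capability being set.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PMU_CAP_NO_EXCLUDE 0x80	/* value used in patch 03 */

struct model_attr {
	unsigned int exclude_user   : 1;
	unsigned int exclude_kernel : 1;
};

struct model_arm_pmu {
	int capabilities;
	/* NULL when the hardware offers no mode filtering at all. */
	int (*set_event_filter)(const struct model_attr *attr);
};

/* Hypothetical filter callback that accepts every request. */
static int model_filter_ok(const struct model_attr *attr)
{
	(void)attr;
	return 0;
}

/* Mirrors the hunk added to armpmu_register(): decide once, at registration. */
static void model_armpmu_register(struct model_arm_pmu *pmu)
{
	if (!pmu->set_event_filter)
		pmu->capabilities |= MODEL_PMU_CAP_NO_EXCLUDE;
}
```

PMUs that do provide a filter callback keep deciding per event which
exclusion combinations they accept, so the capability is left clear for
them.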

Signed-off-by: Andrew Murray 
---
 drivers/perf/arm_pmu.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 7f01f6f..ea69067 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 }
 
 static int
-event_requires_mode_exclusion(struct perf_event_attr *attr)
-{
-   return attr->exclude_idle || attr->exclude_user ||
-  attr->exclude_kernel || attr->exclude_hv;
-}
-
-static int
 __hw_perf_event_init(struct perf_event *event)
 {
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
@@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
/*
 * Check whether we need to exclude the counter from certain modes.
 */
-   if ((!armpmu->set_event_filter ||
-armpmu->set_event_filter(hwc, &event->attr)) &&
-event_requires_mode_exclusion(&event->attr)) {
+   if (armpmu->set_event_filter &&
+   armpmu->set_event_filter(hwc, &event->attr)) {
pr_debug("ARM performance counters do not support "
 "mode exclusion\n");
return -EOPNOTSUPP;
@@ -861,6 +853,9 @@ int armpmu_register(struct arm_pmu *pmu)
if (ret)
return ret;
 
+   if (!pmu->set_event_filter)
+   pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
+
ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
if (ret)
goto out_destroy;
-- 
2.7.4



[PATCH v3 04/12] alpha: perf/core: use PERF_PMU_CAP_NO_EXCLUDE

2018-12-06 Thread Andrew Murray
As the Alpha PMU doesn't support context exclusion let's advertise
the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that __hw_perf_event_init will now also
indicate that it doesn't support exclude_host and exclude_guest and
will now implicitly return -EINVAL instead of -EPERM. This is likely
more desirable as -EPERM will result in a kernel.perf_event_paranoid
related warning from the perf userspace utility.

Signed-off-by: Andrew Murray 
---
 arch/alpha/kernel/perf_event.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index 5613aa37..4341ccf 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -630,12 +630,6 @@ static int __hw_perf_event_init(struct perf_event *event)
return ev;
}
 
-   /* The EV67 does not support mode exclusion */
-   if (attr->exclude_kernel || attr->exclude_user
-   || attr->exclude_hv || attr->exclude_idle) {
-   return -EPERM;
-   }
-
/*
 * We place the event type in event_base here and leave calculation
 * of the codes to programme the PMU for alpha_pmu_enable() because
@@ -771,6 +765,7 @@ static struct pmu pmu = {
.start  = alpha_pmu_start,
.stop   = alpha_pmu_stop,
.read   = alpha_pmu_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 
-- 
2.7.4



[PATCH v3 08/12] drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that qcom_{l2|l3}_pmu will now also indicate that
they do not support exclude_{host|guest} and that xgene_pmu also does
not support exclude_idle and exclude_hv.

Note that for qcom_l2_pmu we now implicitly return -EINVAL instead
of -EOPNOTSUPP. This change will result in the perf userspace
utility retrying the perf_event_open system call with fallback
event attributes that do not fail.
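
The difference between the two error codes can be sketched with a small
userspace model. This is not the perf tool's code: the model_* names are
hypothetical, the open callback stands in for the system call, and which
flags the real tool drops on retry is an assumption made only for
illustration.

```c
#include <assert.h>
#include <errno.h>

struct model_attr {
	unsigned int exclude_host  : 1;
	unsigned int exclude_guest : 1;
};

/* Stand-in for the system call: returns 0 or a negative errno. */
typedef int (*model_open_fn)(const struct model_attr *attr);

/* Models a PMU handled by the core check: any exclusion flag gives -EINVAL. */
static int model_open_no_exclude_pmu(const struct model_attr *attr)
{
	return (attr->exclude_host || attr->exclude_guest) ? -EINVAL : 0;
}

/* Models a driver that answers -EOPNOTSUPP for the same request. */
static int model_open_legacy_pmu(const struct model_attr *attr)
{
	return (attr->exclude_host || attr->exclude_guest) ? -EOPNOTSUPP : 0;
}

/* Retry once without exclusion flags, but only when the error is -EINVAL. */
static int model_open_with_fallback(model_open_fn open_event,
				    struct model_attr *attr)
{
	int ret = open_event(attr);

	if (ret == -EINVAL && (attr->exclude_host || attr->exclude_guest)) {
		attr->exclude_host = 0;
		attr->exclude_guest = 0;
		ret = open_event(attr);
	}

	return ret;
}
```

In this model the -EINVAL case ends with a usable event after one retry,
while the -EOPNOTSUPP case is passed straight back to the caller.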

Signed-off-by: Andrew Murray 
---
 drivers/perf/qcom_l2_pmu.c | 9 +
 drivers/perf/qcom_l3_pmu.c | 8 +---
 drivers/perf/xgene_pmu.c   | 6 +-
 3 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 842135c..091b4d7 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -509,14 +509,6 @@ static int l2_cache_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   /* We cannot filter accurately so we just don't allow it. */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle) {
-   dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
-   "Can't exclude execution levels\n");
-   return -EOPNOTSUPP;
-   }
-
if (((L2_EVT_GROUP(event->attr.config) > L2_EVT_GROUP_MAX) ||
 ((event->attr.config & ~L2_EVT_MASK) != 0)) &&
(event->attr.config != L2CYCLE_CTR_RAW_CODE)) {
@@ -982,6 +974,7 @@ static int l2_cache_pmu_probe(struct platform_device *pdev)
.stop   = l2_cache_event_stop,
.read   = l2_cache_event_read,
.attr_groups= l2_cache_pmu_attr_grps,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
l2cache_pmu->num_counters = get_num_counters();
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 2dc63d6..5d70646 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -495,13 +495,6 @@ static int qcom_l3_cache__event_init(struct perf_event *event)
return -ENOENT;
 
/*
-* There are no per-counter mode filters in the PMU.
-*/
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle)
-   return -EINVAL;
-
-   /*
 * Sampling not supported since these events are not core-attributable.
 */
if (hwc->sample_period)
@@ -777,6 +770,7 @@ static int qcom_l3_cache_pmu_probe(struct platform_device *pdev)
.read   = qcom_l3_cache__event_read,
 
.attr_groups= qcom_l3_cache_pmu_attr_grps,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
memrc = platform_get_resource(pdev, IORESOURCE_MEM, 0);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0e31f13..dad6169 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -914,11 +914,6 @@ static int xgene_perf_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* SOC counters do not have usr/os/guest/host bits */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
/*
@@ -1133,6 +1128,7 @@ static int xgene_init_perf(struct xgene_pmu_dev *pmu_dev, char *name)
.start  = xgene_perf_start,
.stop   = xgene_perf_stop,
.read   = xgene_perf_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
/* Hardware counter init */
-- 
2.7.4



[PATCH v3 12/12] perf/core: remove unused perf_flags

2018-12-06 Thread Andrew Murray
Now that perf_flags is not used we remove it.

Signed-off-by: Andrew Murray 
---
 include/uapi/linux/perf_event.h   | 2 --
 tools/include/uapi/linux/perf_event.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f35eb72..ba89bd3 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
-#define perf_flags(attr)   (*(&(attr)->read_format + 1))
-
 /*
  * Ioctls that can be done on a perf event fd:
  */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index f35eb72..ba89bd3 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
-#define perf_flags(attr)   (*(&(attr)->read_format + 1))
-
 /*
  * Ioctls that can be done on a perf event fd:
  */
-- 
2.7.4



[PATCH v3 07/12] drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

Signed-off-by: Andrew Murray 
---
 drivers/perf/arm-cci.c| 10 +-
 drivers/perf/arm-ccn.c|  6 ++
 drivers/perf/arm_dsu_pmu.c|  9 ++---
 drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c |  1 +
 drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_pmu.c  |  9 -
 7 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 1bfeb16..bfd03e0 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1327,15 +1327,6 @@ static int cci_pmu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EOPNOTSUPP;
 
-   /* We have no filtering of any kind */
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle||
-   event->attr.exclude_host||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/*
 * Following the example set by other "uncore" PMUs, we accept any CPU
 * and rewrite its affinity dynamically rather than having perf core
@@ -1433,6 +1424,7 @@ static int cci_pmu_init(struct cci_pmu *cci_pmu, struct platform_device *pdev)
.stop   = cci_pmu_stop,
.read   = pmu_read,
.attr_groups= pmu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
cci_pmu->plat_device = pdev;
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 7dd850e..2ae7602 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   if (has_branch_stack(event) || event->attr.exclude_user ||
-   event->attr.exclude_kernel || event->attr.exclude_hv ||
-   event->attr.exclude_idle || event->attr.exclude_host ||
-   event->attr.exclude_guest) {
+   if (has_branch_stack(event)) {
dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
return -EINVAL;
}
@@ -1290,6 +1287,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
.read = arm_ccn_pmu_event_read,
.pmu_enable = arm_ccn_pmu_enable,
.pmu_disable = arm_ccn_pmu_disable,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
 
/* No overflow interrupt? Have to use a timer instead. */
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index 660cb8a..5851de5 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -562,13 +562,7 @@ static int dsu_pmu_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   if (has_branch_stack(event) ||
-   event->attr.exclude_user ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle ||
-   event->attr.exclude_host ||
-   event->attr.exclude_guest) {
+   if (has_branch_stack(event)) {
dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
return -EINVAL;
}
@@ -735,6 +729,7 @@ static int dsu_pmu_device_probe(struct platform_device *pdev)
.read   = dsu_pmu_read,
 
.attr_groups= dsu_pmu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
rc = perf_pmu_register(&dsu_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
index 1b10ea0..296fef8 100644
--- a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
@@ -396,6 +396,7 @@ static int hisi_ddrc_pmu_probe(struct platform_device *pdev)
.stop   = hisi_uncore_pmu_stop,
.read   = hisi_uncore_pmu_read,
.attr_groups= hisi_ddrc_pmu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
ret = perf_pmu_register(&ddrc_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
index 443906e..2553a84 100644
--- a/drivers/perf/hisil

[PATCH v3 06/12] arm: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

Signed-off-by: Andrew Murray 
---
 arch/arm/mach-imx/mmdc.c | 9 ++---
 arch/arm/mm/cache-l2x0-pmu.c | 9 +
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index 04b3bf7..3453838 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -293,13 +293,7 @@ static int mmdc_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle||
-   event->attr.exclude_host||
-   event->attr.exclude_guest   ||
-   event->attr.sample_period)
+   if (event->attr.sample_period)
return -EINVAL;
 
if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS)
@@ -455,6 +449,7 @@ static int mmdc_pmu_init(struct mmdc_pmu *pmu_mmdc,
.start  = mmdc_pmu_event_start,
.stop   = mmdc_pmu_event_stop,
.read   = mmdc_pmu_event_update,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
},
.mmdc_base = mmdc_base,
.dev = dev,
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index afe5b4c..99bcd07 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -314,14 +314,6 @@ static int l2x0_pmu_event_init(struct perf_event *event)
event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -544,6 +536,7 @@ static __init int l2x0_pmu_init(void)
.del = l2x0_pmu_event_del,
.event_init = l2x0_pmu_event_init,
.attr_groups = l2x0_pmu_attr_groups,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
 
l2x0_pmu_reset();
-- 
2.7.4



[PATCH v3 10/12] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

Signed-off-by: Andrew Murray 
---
 arch/x86/events/amd/ibs.c  | 13 +
 arch/x86/events/amd/power.c| 10 ++
 arch/x86/events/intel/cstate.c | 12 +++-
 arch/x86/events/intel/rapl.c   |  9 ++---
 arch/x86/events/intel/uncore_snb.c |  9 ++---
 arch/x86/events/msr.c  | 10 ++
 6 files changed, 12 insertions(+), 51 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d50bb4d..62f317c 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -253,15 +253,6 @@ static int perf_ibs_precise_event(struct perf_event *event, u64 *config)
return -EOPNOTSUPP;
 }
 
-static const struct perf_event_attr ibs_notsupp = {
-   .exclude_user   = 1,
-   .exclude_kernel = 1,
-   .exclude_hv = 1,
-   .exclude_idle   = 1,
-   .exclude_host   = 1,
-   .exclude_guest  = 1,
-};
-
 static int perf_ibs_init(struct perf_event *event)
 {
struct hw_perf_event *hwc = &event->hw;
@@ -282,9 +273,6 @@ static int perf_ibs_init(struct perf_event *event)
if (event->pmu != &perf_ibs->pmu)
return -ENOENT;
 
-   if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp))
-   return -EINVAL;
-
if (config & ~perf_ibs->config_mask)
return -EINVAL;
 
@@ -537,6 +525,7 @@ static struct perf_ibs perf_ibs_fetch = {
.start  = perf_ibs_start,
.stop   = perf_ibs_stop,
.read   = perf_ibs_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr= MSR_AMD64_IBSFETCHCTL,
.config_mask= IBS_FETCH_CONFIG_MASK,
diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 2aefacf..c5ff084 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -136,14 +136,7 @@ static int pmu_event_init(struct perf_event *event)
return -ENOENT;
 
/* Unsupported modes and filters. */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   /* no sampling */
-   event->attr.sample_period)
+   if (event->attr.sample_period)
return -EINVAL;
 
if (cfg != AMD_POWER_EVENTSEL_PKG)
@@ -226,6 +219,7 @@ static struct pmu pmu_class = {
.start  = pmu_event_start,
.stop   = pmu_event_stop,
.read   = pmu_event_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static int power_cpu_exit(unsigned int cpu)
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 9f8084f..15a1981 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -280,13 +280,7 @@ static int cstate_pmu_event_init(struct perf_event *event)
return -ENOENT;
 
/* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   event->attr.sample_period) /* no sampling */
+   if (event->attr.sample_period) /* no sampling */
return -EINVAL;
 
if (event->cpu < 0)
@@ -437,7 +431,7 @@ static struct pmu cstate_core_pmu = {
.start  = cstate_pmu_event_start,
.stop   = cstate_pmu_event_stop,
.read   = cstate_pmu_event_update,
-   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT,
+   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
 };
 
@@ -451,7 +445,7 @@ static struct pmu cstate_pkg_pmu = {
.start  = cstate_pmu_event_start,
.stop   = cstate_pmu_event_stop,
.read   = cstate_pmu_event_update,
-   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT,
+   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
 };
 
diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 32f3e94..18a5628 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -397,13 +397,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
 
/* unsupporte

[PATCH v3 09/12] powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray
For PowerPC PMUs that do not support context exclusion let's
advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that
perf will prevent us from handling events where any exclusion flags
are set. Let's also remove the now unnecessary check for exclusion
flags.

Signed-off-by: Andrew Murray 
---
 arch/powerpc/perf/hv-24x7.c | 10 +-
 arch/powerpc/perf/hv-gpci.c | 10 +-
 arch/powerpc/perf/imc-pmu.c | 19 +--
 3 files changed, 3 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 72238ee..d2b8e60 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1306,15 +1306,6 @@ static int h_24x7_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -1577,6 +1568,7 @@ static struct pmu h_24x7_pmu = {
.start_txn   = h_24x7_event_start_txn,
.commit_txn  = h_24x7_event_commit_txn,
.cancel_txn  = h_24x7_event_cancel_txn,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static int hv_24x7_init(void)
diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index 43fabb3..735e77b 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -232,15 +232,6 @@ static int h_gpci_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -285,6 +276,7 @@ static struct pmu h_gpci_pmu = {
.start   = h_gpci_event_start,
.stop= h_gpci_event_stop,
.read= h_gpci_event_update,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static int hv_gpci_init(void)
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 1fafc32b..1dbb0ee 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -473,15 +473,6 @@ static int nest_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -748,15 +739,6 @@ static int core_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -1069,6 +1051,7 @@ static int update_pmu_ops(struct imc_pmu *pmu)
pmu->pmu.stop = imc_event_stop;
pmu->pmu.read = imc_event_update;
pmu->pmu.attr_groups = pmu->attr_groups;
+   pmu->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
 
switch (pmu->domain) {
-- 
2.7.4



Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags

2018-12-12 Thread Andrew Murray
On Wed, Dec 12, 2018 at 09:07:42AM +0100, Christoffer Dall wrote:
> On Tue, Dec 11, 2018 at 01:59:03PM +0000, Andrew Murray wrote:
> > On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> > > [ Reviving old thread. ]
> > > 
> > > Andrew Murray  writes:
> > > > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> > > >> Andrew Murray  writes:
> > > >> 
> > > >> > Update design.txt to reflect the presence of the exclude_host
> > > >> > and exclude_guest perf flags.
> > > >> >
> > > >> > Signed-off-by: Andrew Murray 
> > > >> > ---
> > > >> >  tools/perf/design.txt | 4 
> > > >> >  1 file changed, 4 insertions(+)
> > > >> >
> > > >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> > > >> > index a28dca2..7de7d83 100644
> > > >> > --- a/tools/perf/design.txt
> > > >> > +++ b/tools/perf/design.txt
> > > >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 
> > > >> > 'exclude_hv' bits provide a
> > > >> >  way to request that counting of events be restricted to times when 
> > > >> > the
> > > >> >  CPU is in user, kernel and/or hypervisor mode.
> > > >> >  
> > > >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a 
> > > >> > way
> > > >> > +to request counting of events restricted to guest and host contexts 
> > > >> > when
> > > >> > +using virtualisation.
> > > >> 
> > > >> How does exclude_host differ from exclude_hv ?
> > > >
> > > > > I believe exclude_host / exclude_guest are intended to distinguish
> > > > between host and guest in the hosted hypervisor context (KVM).
> > > 
> > > OK yeah, from the perf-list man page:
> > > 
> > >u - user-space counting
> > >k - kernel counting
> > >h - hypervisor counting
> > >I - non idle counting
> > >G - guest counting (in KVM guests)
> > >H - host counting (not in KVM guests)
> > > 
> > > > Whereas exclude_hv allows to distinguish between guest and
> > > > hypervisor in the bare-metal type hypervisors.
> > > 
> > > Except that's exactly not how we use them on powerpc :)
> > > 
> > > We use exclude_hv to exclude "the hypervisor", regardless of whether
> > > it's KVM or PowerVM (which is a bare-metal hypervisor).
> > > 
> > > We don't use exclude_host / exclude_guest at all, which I guess is a
> > > bug, except I didn't know they existed until this thread.
> > > 
> > > eg, in a KVM guest:
> > > 
> > >   $ perf record -e cycles:G /bin/bash -c "for i in {0..10}; do :;done"
> > >   $ perf report -D | grep -Fc "dso: [hypervisor]"
> > >   16
> > > 
> > > 
> > > > In the case of arm64 - if VHE extensions are present then the host
> > > > kernel will run at a higher privilege to the guest kernel, in which
> > > > case there is no distinction between hypervisor and host so we ignore
> > > > exclude_hv. But where VHE extensions are not present then the host
> > > > kernel runs at the same privilege level as the guest and we use a
> > > > higher privilege level to switch between them - in this case we can
> > > > use exclude_hv to discount that hypervisor role of switching between
> > > > guests.
> > > 
> > > I couldn't find any arm64 perf code using exclude_host/guest at all?
> > 
> > Correct - but this is in flight as I am currently adding support for this
> > see [1].
> > 
> > > 
> > > And I don't see any x86 code using exclude_hv.
> > 
> > I can't find any either.
> > 
> > > 
> > > But maybe that's OK, I just worry this is confusing for users.
> > 
> > There is some extra context regarding this where exclude_guest/exclude_host
> > was added, see [2] and where exclude_hv was added, see [3]
> > 
> > Generally it seems that exclude_guest/exclude_host relies upon switching
> > counters off/on on guest/host switch code (which works well in the nested
> > virt case). Wherea

Re: [PATCH v3 00/12] perf/core: Generalise event exclusion checking

2018-12-10 Thread Andrew Murray
On Fri, Dec 07, 2018 at 05:25:17PM +, Will Deacon wrote:
> On Thu, Dec 06, 2018 at 04:47:17PM +0000, Andrew Murray wrote:
> > Many PMU drivers do not have the capability to exclude counting events
> > that occur in specific contexts such as idle, kernel, guest, etc. These
> > drivers indicate this by returning an error in their event_init upon
> > testing the events attribute flags.
> > 
> > However this approach requires that each time a new event modifier is
> > added to perf, all the perf drivers need to be modified to indicate that
> > they don't support the attribute. This results in additional boiler-plate
> > code common to many drivers that needs to be maintained. Furthermore the
> > drivers are not consistent with regards to the error value they return
> > when reporting unsupported attributes.
> > 
> > This patchset allows PMU drivers to advertise their inability to exclude
> > based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
> > allows the perf core to reject requests for exclusion events where there
> > is no support in the PMU.
> > 
> > This is a functional change, in particular:
> > 
> >  - Some drivers will now additionally (but correctly) report unsupported
> >exclusion flags. It's typical for existing userspace tools such as
> >perf to handle such errors by retrying the system call without the
> >unsupported flags.
> > 
> >  - Drivers that do not support any exclusion that previously reported
> >-EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent
> >with the majority and results in userspace perf retrying without
> >exclusion.
> > 
> > All drivers touched by this patchset have been compile tested.
> 
> For the bits under arch/arm/ and drivers/perf:
> 
> Acked-by: Will Deacon 
> 
> Note that I've queued the TX2 uncore PMU for 4.21 [1], which could also
> benefit from your new flag.

Ah thanks for pointing this out, I'll send a patch in due course.

Thanks,

Andrew Murray

> 
> Will
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/commit/?h=for-next/perf&id=69c32972d59388c041268e8206e8eb1acff29b9a


Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags

2018-12-11 Thread Andrew Murray
On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> [ Reviving old thread. ]
> 
> Andrew Murray  writes:
> > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> >> Andrew Murray  writes:
> >> 
> >> > Update design.txt to reflect the presence of the exclude_host
> >> > and exclude_guest perf flags.
> >> >
> >> > Signed-off-by: Andrew Murray 
> >> > ---
> >> >  tools/perf/design.txt | 4 
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> >> > index a28dca2..7de7d83 100644
> >> > --- a/tools/perf/design.txt
> >> > +++ b/tools/perf/design.txt
> >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 
> >> > 'exclude_hv' bits provide a
> >> >  way to request that counting of events be restricted to times when the
> >> >  CPU is in user, kernel and/or hypervisor mode.
> >> >  
> >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> >> > +to request counting of events restricted to guest and host contexts when
> >> > +using virtualisation.
> >> 
> >> How does exclude_host differ from exclude_hv ?
> >
> > > I believe exclude_host / exclude_guest are intended to distinguish
> > between host and guest in the hosted hypervisor context (KVM).
> 
> OK yeah, from the perf-list man page:
> 
>u - user-space counting
>k - kernel counting
>h - hypervisor counting
>I - non idle counting
>G - guest counting (in KVM guests)
>H - host counting (not in KVM guests)
> 
> > Whereas exclude_hv allows to distinguish between guest and
> > hypervisor in the bare-metal type hypervisors.
> 
> Except that's exactly not how we use them on powerpc :)
> 
> We use exclude_hv to exclude "the hypervisor", regardless of whether
> it's KVM or PowerVM (which is a bare-metal hypervisor).
> 
> We don't use exclude_host / exclude_guest at all, which I guess is a
> bug, except I didn't know they existed until this thread.
> 
> eg, in a KVM guest:
> 
>   $ perf record -e cycles:G /bin/bash -c "for i in {0..10}; do :;done"
>   $ perf report -D | grep -Fc "dso: [hypervisor]"
>   16
> 
> 
> > In the case of arm64 - if VHE extensions are present then the host
> > kernel will run at a higher privilege to the guest kernel, in which
> > case there is no distinction between hypervisor and host so we ignore
> > exclude_hv. But where VHE extensions are not present then the host
> > kernel runs at the same privilege level as the guest and we use a
> > higher privilege level to switch between them - in this case we can
> > use exclude_hv to discount that hypervisor role of switching between
> > guests.
> 
> I couldn't find any arm64 perf code using exclude_host/guest at all?

Correct - but this is in flight as I am currently adding support for this,
see [1].

> 
> And I don't see any x86 code using exclude_hv.

I can't find any either.

> 
> But maybe that's OK, I just worry this is confusing for users.

There is some extra context on this in the threads where
exclude_guest/exclude_host were added, see [2], and where exclude_hv was
added, see [3].

Generally it seems that exclude_guest/exclude_host relies upon switching
counters off/on on guest/host switch code (which works well in the nested
virt case). Whereas exclude_hv tends to rely solely on hardware capability
based on privilege level (which works well in the bare metal case where
the guest doesn't run at same privilege as the host).

I think from the user perspective exclude_hv allows you to see your overhead
if you are a guest (i.e. work done by bare metal hypervisor associated with
you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to
see events above you (i.e. the kernel hypervisor) if you are the guest...

At least that's how I read this, I've copied in others that may provide
more authoritative feedback.

[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html
[2] https://www.spinics.net/lists/kvm/msg53996.html
[3] https://lore.kernel.org/patchwork/patch/143918/

Thanks,

Andrew Murray

> 
> cheers


[PATCH v4 02/13] perf/core: add function to test for event exclusion flags

2019-01-07 Thread Andrew Murray
Add a function that tests if any of the perf event exclusion flags
are set on a given event.

Signed-off-by: Andrew Murray 
---
 include/linux/perf_event.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1d5c551..54a78d2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
 extern void
 perf_log_lost_samples(struct perf_event *event, u64 lost);
 
+static inline bool event_has_any_exclude_flag(struct perf_event *event)
+{
+   struct perf_event_attr *attr = &event->attr;
+
+   return attr->exclude_idle || attr->exclude_user ||
+  attr->exclude_kernel || attr->exclude_hv ||
+  attr->exclude_guest || attr->exclude_host;
+}
+
 static inline bool is_sampling_event(struct perf_event *event)
 {
return event->attr.sample_period != 0;
-- 
2.7.4



[PATCH v4 00/13] perf/core: Generalise event exclusion checking

2019-01-07 Thread Andrew Murray
Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the event's attribute flags.

However this approach requires that each time a new event modifier is
added to perf, all the perf drivers need to be modified to indicate that
they don't support the attribute. This results in additional boiler-plate
code common to many drivers that needs to be maintained. Furthermore the
drivers are not consistent with regards to the error value they return
when reporting unsupported attributes.

This patchset allows PMU drivers to advertise their inability to exclude
based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
allows the perf core to reject requests for exclusion events where there
is no support in the PMU.

This is a functional change, in particular:

 - Some drivers will now additionally (but correctly) report unsupported
   exclusion flags. It's typical for existing userspace tools such as
   perf to handle such errors by retrying the system call without the
   unsupported flags.

 - Drivers that do not support any exclusion that previously reported
   -EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent
   with the majority and results in userspace perf retrying without
   exclusion.
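
The retry behaviour described in the two points above can be modelled
roughly as below. This is only a sketch under stated assumptions: the
sketch_* names, the reduced attribute struct and the fake opener are all
hypothetical stand-ins (not struct perf_event_attr, perf_event_open or the
actual fallback code in the perf tool), chosen so the logic can be read
without a kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the exclusion bits of struct perf_event_attr. */
struct sketch_attr {
	unsigned int exclude_user   : 1;
	unsigned int exclude_kernel : 1;
	unsigned int exclude_hv     : 1;
	unsigned int exclude_idle   : 1;
	unsigned int exclude_host   : 1;
	unsigned int exclude_guest  : 1;
};

/* Hypothetical opener: returns an fd (>= 0) or a negative errno value. */
typedef int (*sketch_open_fn)(const struct sketch_attr *attr);

static int sketch_any_exclude(const struct sketch_attr *attr)
{
	return attr->exclude_user || attr->exclude_kernel ||
	       attr->exclude_hv || attr->exclude_idle ||
	       attr->exclude_host || attr->exclude_guest;
}

/* Stand-in for a PMU that advertises PERF_PMU_CAP_NO_EXCLUDE. */
static int sketch_no_exclude_pmu_open(const struct sketch_attr *attr)
{
	return sketch_any_exclude(attr) ? -EINVAL : 3;
}

/*
 * Try to open the event as requested; if that fails with -EINVAL while
 * exclusion flags are set, clear them and try once more.
 */
static int sketch_open_with_fallback(struct sketch_attr *attr,
				     sketch_open_fn open_fn)
{
	int fd;

	assert(attr != NULL && open_fn != NULL);

	fd = open_fn(attr);
	if (fd != -EINVAL || !sketch_any_exclude(attr))
		return fd;

	attr->exclude_user = 0;
	attr->exclude_kernel = 0;
	attr->exclude_hv = 0;
	attr->exclude_idle = 0;
	attr->exclude_host = 0;
	attr->exclude_guest = 0;

	return open_fn(attr);
}
```

With a single error value (-EINVAL) a caller only needs one retry path; an
-EPERM or -EOPNOTSUPP from the first attempt is passed through unchanged
in this model.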

All drivers touched by this patchset have been compile tested.

Changes from v3:

 - Added PERF_PMU_CAP_NO_EXCLUDE to Cavium TX2 PMU driver

Changes from v2:

 - Invert logic from CAP_EXCLUDE to CAP_NO_EXCLUDE

Changes from v1:

 - Changed approach from explicitly rejecting events in unsupporting PMU
   drivers to explicitly advertising a capability in PMU drivers that
   do support exclusion events

 - Added additional information to tools/perf/design.txt

 - Rename event_has_exclude_flags to event_has_any_exclude_flag and
   update commit log to reflect it's a function

Andrew Murray (13):
  perf/doc: update design.txt for exclude_{host|guest} flags
  perf/core: add function to test for event exclusion flags
  perf/core: add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs
  alpha: perf/core: use PERF_PMU_CAP_NO_EXCLUDE
  arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
  arm: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
  drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude
incapable PMUs
  drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude
incapable PMUs
  powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable
PMUs
  x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
  x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
  perf/core: remove unused perf_flags
  drivers/perf: use PERF_PMU_CAP_NO_EXCLUDE for Cavium TX2 PMU

 arch/alpha/kernel/perf_event.c|  7 +--
 arch/arm/mach-imx/mmdc.c  |  9 ++---
 arch/arm/mm/cache-l2x0-pmu.c  |  9 +
 arch/powerpc/perf/hv-24x7.c   | 10 +-
 arch/powerpc/perf/hv-gpci.c   | 10 +-
 arch/powerpc/perf/imc-pmu.c   | 19 +--
 arch/x86/events/amd/ibs.c | 13 +
 arch/x86/events/amd/iommu.c   |  6 +-
 arch/x86/events/amd/power.c   | 10 ++
 arch/x86/events/amd/uncore.c  |  7 ++-
 arch/x86/events/intel/cstate.c| 12 +++-
 arch/x86/events/intel/rapl.c  |  9 ++---
 arch/x86/events/intel/uncore.c|  9 +
 arch/x86/events/intel/uncore_snb.c|  9 ++---
 arch/x86/events/msr.c | 10 ++
 drivers/perf/arm-cci.c| 10 +-
 drivers/perf/arm-ccn.c|  6 ++
 drivers/perf/arm_dsu_pmu.c|  9 ++---
 drivers/perf/arm_pmu.c| 15 +--
 drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c |  1 +
 drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_pmu.c  |  9 -
 drivers/perf/qcom_l2_pmu.c|  9 +
 drivers/perf/qcom_l3_pmu.c|  8 +---
 drivers/perf/thunderx2_pmu.c  | 10 +-
 drivers/perf/xgene_pmu.c  |  6 +-
 include/linux/perf_event.h| 10 ++
 include/uapi/linux/perf_event.h   |  2 --
 kernel/events/core.c  |  9 +
 tools/include/uapi/linux/perf_event.h |  2 --
 tools/perf/design.txt |  4 
 32 files changed, 63 insertions(+), 198 deletions(-)

-- 
2.7.4



[PATCH v4 04/13] alpha: perf/core: use PERF_PMU_CAP_NO_EXCLUDE

2019-01-07 Thread Andrew Murray
As the Alpha PMU doesn't support context exclusion let's advertise
the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that __hw_perf_event_init will now also
indicate that it doesn't support exclude_host and exclude_guest and
will now implicitly return -EINVAL instead of -EPERM. This is likely
more desirable as -EPERM will result in a kernel.perf_event_paranoid
related warning from the perf userspace utility.

Signed-off-by: Andrew Murray 
---
 arch/alpha/kernel/perf_event.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index 5613aa37..4341ccf 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -630,12 +630,6 @@ static int __hw_perf_event_init(struct perf_event *event)
return ev;
}
 
-   /* The EV67 does not support mode exclusion */
-   if (attr->exclude_kernel || attr->exclude_user
-   || attr->exclude_hv || attr->exclude_idle) {
-   return -EPERM;
-   }
-
/*
 * We place the event type in event_base here and leave calculation
 * of the codes to programme the PMU for alpha_pmu_enable() because
@@ -771,6 +765,7 @@ static struct pmu pmu = {
.start  = alpha_pmu_start,
.stop   = alpha_pmu_stop,
.read   = alpha_pmu_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 
-- 
2.7.4



[PATCH v4 03/13] perf/core: add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs

2019-01-07 Thread Andrew Murray
Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the event's attribute flags. This approach is error prone and
often inconsistent.

Let's instead allow PMU drivers to advertise their inability to exclude
based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
allows the perf core to reject requests for exclusion events where
there is no support in the PMU.

Signed-off-by: Andrew Murray 
---
 include/linux/perf_event.h | 1 +
 kernel/events/core.c   | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 54a78d2..cec02dc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -244,6 +244,7 @@ struct perf_event;
 #define PERF_PMU_CAP_EXCLUSIVE 0x10
 #define PERF_PMU_CAP_ITRACE0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40
+#define PERF_PMU_CAP_NO_EXCLUDE0x80
 
 /**
  * struct pmu - generic performance monitoring unit
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3cd13a3..fbe59b7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9772,6 +9772,15 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
if (ctx)
perf_event_ctx_unlock(event->group_leader, ctx);
 
+   if (!ret) {
+   if (pmu->capabilities & PERF_PMU_CAP_NO_EXCLUDE &&
+   event_has_any_exclude_flag(event)) {
+   if (event->destroy)
+   event->destroy(event);
+   ret = -EINVAL;
+   }
+   }
+
if (ret)
module_put(pmu->module);
 
-- 
2.7.4
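
For readers who want to try the check outside the kernel, the core test
added above can be modelled in plain C roughly as follows. This is a
sketch only: the model_* types and MODEL_PMU_CAP_NO_EXCLUDE are
hypothetical stand-ins for struct pmu, struct perf_event_attr and
PERF_PMU_CAP_NO_EXCLUDE, and the event->destroy() / module_put() cleanup
from the patch is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as the new capability bit in the patch; the name is made up. */
#define MODEL_PMU_CAP_NO_EXCLUDE 0x80

/* Hypothetical stand-in carrying only the six exclusion bits. */
struct model_attr {
	unsigned int exclude_user   : 1;
	unsigned int exclude_kernel : 1;
	unsigned int exclude_hv     : 1;
	unsigned int exclude_idle   : 1;
	unsigned int exclude_host   : 1;
	unsigned int exclude_guest  : 1;
};

/* Hypothetical stand-in for struct pmu. */
struct model_pmu {
	int capabilities;
};

/* Mirrors event_has_any_exclude_flag() from patch 02. */
static bool model_has_any_exclude_flag(const struct model_attr *attr)
{
	return attr->exclude_idle || attr->exclude_user ||
	       attr->exclude_kernel || attr->exclude_hv ||
	       attr->exclude_guest || attr->exclude_host;
}

/*
 * Returns 0 when the event is acceptable, or -EINVAL when the PMU
 * cannot exclude and the event asks for any kind of exclusion.
 */
static int model_check_exclude(const struct model_pmu *pmu,
			       const struct model_attr *attr)
{
	assert(pmu != NULL && attr != NULL);

	if ((pmu->capabilities & MODEL_PMU_CAP_NO_EXCLUDE) &&
	    model_has_any_exclude_flag(attr))
		return -EINVAL;

	return 0;
}
```

A PMU that does not set the capability bit is unaffected by this check, so
drivers that implement exclusion themselves keep their existing behaviour
in this model.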



[PATCH v4 01/13] perf/doc: update design.txt for exclude_{host|guest} flags

2019-01-07 Thread Andrew Murray
Update design.txt to reflect the presence of the exclude_host
and exclude_guest perf flags.

Signed-off-by: Andrew Murray 
---
 tools/perf/design.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index a28dca2..0453ba2 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
 way to request that counting of events be restricted to times when the
 CPU is in user, kernel and/or hypervisor mode.
 
+Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
+to request counting of events restricted to guest and host contexts when
+using Linux as the hypervisor.
+
 The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
 operations, these can be used to relate userspace IP addresses to actual
 code, even after the mapping (or even the whole process) is gone,
-- 
2.7.4



[PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2019-01-07 Thread Andrew Murray
The ARM PMU driver can be used to represent a variety of ARM based
PMUs. Some of these PMUs do not provide support for context
exclusion, where this is the case we advertise the
PERF_PMU_CAP_NO_EXCLUDE capability to ensure that perf prevents us
from handling events where any exclusion flags are set.

Signed-off-by: Andrew Murray 
Acked-by: Will Deacon 
---
 drivers/perf/arm_pmu.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index d0b7dd8..eec75b9 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 }
 
 static int
-event_requires_mode_exclusion(struct perf_event_attr *attr)
-{
-   return attr->exclude_idle || attr->exclude_user ||
-  attr->exclude_kernel || attr->exclude_hv;
-}
-
-static int
 __hw_perf_event_init(struct perf_event *event)
 {
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
@@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
/*
 * Check whether we need to exclude the counter from certain modes.
 */
-   if ((!armpmu->set_event_filter ||
-armpmu->set_event_filter(hwc, &event->attr)) &&
-event_requires_mode_exclusion(&event->attr)) {
+   if (armpmu->set_event_filter &&
+   armpmu->set_event_filter(hwc, &event->attr)) {
pr_debug("ARM performance counters do not support "
 "mode exclusion\n");
return -EOPNOTSUPP;
@@ -867,6 +859,9 @@ int armpmu_register(struct arm_pmu *pmu)
if (ret)
return ret;
 
+   if (!pmu->set_event_filter)
+   pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
+
ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
if (ret)
goto out_destroy;
-- 
2.7.4
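
The two resulting rejection paths for ARM PMUs can be modelled roughly as
below. This is a sketch under assumptions, not the driver code: the
sketch_* names are hypothetical, the filter callback takes a simplified
signature, and the example filter (which refuses exclude_idle) is made up
for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same value as PERF_PMU_CAP_NO_EXCLUDE; the name is made up. */
#define SKETCH_CAP_NO_EXCLUDE 0x80

/* Hypothetical stand-in for the exclusion bits of the event attributes. */
struct sketch_attr {
	unsigned int exclude_user   : 1;
	unsigned int exclude_kernel : 1;
	unsigned int exclude_hv     : 1;
	unsigned int exclude_idle   : 1;
	unsigned int exclude_host   : 1;
	unsigned int exclude_guest  : 1;
};

/* Hypothetical stand-in for struct arm_pmu. */
struct sketch_arm_pmu {
	int capabilities;
	/* Returns non-zero if the requested filtering is not possible. */
	int (*set_event_filter)(const struct sketch_attr *attr);
};

static int sketch_any_exclude(const struct sketch_attr *attr)
{
	return attr->exclude_user || attr->exclude_kernel ||
	       attr->exclude_hv || attr->exclude_idle ||
	       attr->exclude_host || attr->exclude_guest;
}

/* Models the registration step: no filter callback means no exclusion. */
static void sketch_register(struct sketch_arm_pmu *pmu)
{
	assert(pmu != NULL);

	if (!pmu->set_event_filter)
		pmu->capabilities |= SKETCH_CAP_NO_EXCLUDE;
}

/* Models the driver check followed by the core capability check. */
static int sketch_event_init(const struct sketch_arm_pmu *pmu,
			     const struct sketch_attr *attr)
{
	if (pmu->set_event_filter && pmu->set_event_filter(attr))
		return -EOPNOTSUPP;

	if ((pmu->capabilities & SKETCH_CAP_NO_EXCLUDE) &&
	    sketch_any_exclude(attr))
		return -EINVAL;

	return 0;
}

/* Made-up filter: can handle everything except exclude_idle. */
static int sketch_filter_no_idle(const struct sketch_attr *attr)
{
	return attr->exclude_idle ? -1 : 0;
}
```

In this model a PMU with a filter callback still reports -EOPNOTSUPP for
combinations the callback rejects, while a PMU without one is rejected by
the capability check with -EINVAL.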



[PATCH v4 06/13] arm: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-07 Thread Andrew Murray
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

Signed-off-by: Andrew Murray 
Acked-by: Shawn Guo 
Acked-by: Will Deacon 
---
 arch/arm/mach-imx/mmdc.c | 9 ++---
 arch/arm/mm/cache-l2x0-pmu.c | 9 +
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index e49e068..fce4b42 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -294,13 +294,7 @@ static int mmdc_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle||
-   event->attr.exclude_host||
-   event->attr.exclude_guest   ||
-   event->attr.sample_period)
+   if (event->attr.sample_period)
return -EINVAL;
 
if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS)
@@ -456,6 +450,7 @@ static int mmdc_pmu_init(struct mmdc_pmu *pmu_mmdc,
.start  = mmdc_pmu_event_start,
.stop   = mmdc_pmu_event_stop,
.read   = mmdc_pmu_event_update,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
},
.mmdc_base = mmdc_base,
.dev = dev,
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index afe5b4c..99bcd07 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -314,14 +314,6 @@ static int l2x0_pmu_event_init(struct perf_event *event)
event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -544,6 +536,7 @@ static __init int l2x0_pmu_init(void)
.del = l2x0_pmu_event_del,
.event_init = l2x0_pmu_event_init,
.attr_groups = l2x0_pmu_attr_groups,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
 
l2x0_pmu_reset();
-- 
2.7.4



[PATCH v4 11/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-07 Thread Andrew Murray
For x86 PMUs that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that amd/iommu and amd/uncore will now also
indicate that they do not support exclude_{hv|idle} and intel/uncore
that it does not support exclude_{guest|host}.

Signed-off-by: Andrew Murray 
---
 arch/x86/events/amd/iommu.c| 6 +-
 arch/x86/events/amd/uncore.c   | 7 ++-
 arch/x86/events/intel/uncore.c | 9 +
 3 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 3210fee..7635c23 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -223,11 +223,6 @@ static int perf_iommu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* IOMMU counters do not have usr/os/guest/host bits */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -414,6 +409,7 @@ static const struct pmu iommu_pmu __initconst = {
.read   = perf_iommu_read,
.task_ctx_nr= perf_invalid_context,
.attr_groups= amd_iommu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static __init int init_one_iommu(unsigned int idx)
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 398df6e..79cfd3b 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -201,11 +201,6 @@ static int amd_uncore_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* NB and Last level cache counters do not have usr/os/guest/host bits */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
/* and we do not enable counter overflow interrupts */
hwc->config = event->attr.config & AMD64_RAW_EVENT_MASK_NB;
hwc->idx = -1;
@@ -307,6 +302,7 @@ static struct pmu amd_nb_pmu = {
.start  = amd_uncore_start,
.stop   = amd_uncore_stop,
.read   = amd_uncore_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static struct pmu amd_llc_pmu = {
@@ -317,6 +313,7 @@ static struct pmu amd_llc_pmu = {
.start  = amd_uncore_start,
.stop   = amd_uncore_stop,
.read   = amd_uncore_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static struct amd_uncore *amd_uncore_alloc(unsigned int cpu)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 27a4614..d516161 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -695,14 +695,6 @@ static int uncore_pmu_event_init(struct perf_event *event)
if (pmu->func_id < 0)
return -ENOENT;
 
-   /*
-* Uncore PMU does measure at all privilege level all the time.
-* So it doesn't make sense to specify any exclude bits.
-*/
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle)
-   return -EINVAL;
-
/* Sampling not supported yet */
if (hwc->sample_period)
return -EINVAL;
@@ -800,6 +792,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
.stop   = uncore_pmu_event_stop,
.read   = uncore_pmu_event_read,
.module = THIS_MODULE,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
} else {
pmu->pmu = *pmu->type->pmu;
-- 
2.7.4



[PATCH v4 12/13] perf/core: remove unused perf_flags

2019-01-07 Thread Andrew Murray
Now that perf_flags is not used we remove it.

Signed-off-by: Andrew Murray 
---
 include/uapi/linux/perf_event.h   | 2 --
 tools/include/uapi/linux/perf_event.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9de8780..ea19b5d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
-#define perf_flags(attr)   (*(&(attr)->read_format + 1))
-
 /*
  * Ioctls that can be done on a perf event fd:
  */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 9de8780..ea19b5d 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
-#define perf_flags(attr)   (*(&(attr)->read_format + 1))
-
 /*
  * Ioctls that can be done on a perf event fd:
  */
-- 
2.7.4



[PATCH v4 08/13] drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-07 Thread Andrew Murray
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that qcom_{l2|l3}_pmu will now also indicate that
they do not support exclude_{host|guest} and that xgene_pmu also does
not support exclude_idle and exclude_hv.

Note that for qcom_l2_pmu we now implicitly return -EINVAL instead
of -EOPNOTSUPP. This change will result in the perf userspace
utility retrying the perf_event_open system call with fallback
event attributes that do not fail.

Signed-off-by: Andrew Murray 
Acked-by: Will Deacon 
---
 drivers/perf/qcom_l2_pmu.c | 9 +
 drivers/perf/qcom_l3_pmu.c | 8 +---
 drivers/perf/xgene_pmu.c   | 6 +-
 3 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 842135c..091b4d7 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -509,14 +509,6 @@ static int l2_cache_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   /* We cannot filter accurately so we just don't allow it. */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle) {
-   dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
-   "Can't exclude execution levels\n");
-   return -EOPNOTSUPP;
-   }
-
if (((L2_EVT_GROUP(event->attr.config) > L2_EVT_GROUP_MAX) ||
 ((event->attr.config & ~L2_EVT_MASK) != 0)) &&
(event->attr.config != L2CYCLE_CTR_RAW_CODE)) {
@@ -982,6 +974,7 @@ static int l2_cache_pmu_probe(struct platform_device *pdev)
.stop   = l2_cache_event_stop,
.read   = l2_cache_event_read,
.attr_groups= l2_cache_pmu_attr_grps,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
l2cache_pmu->num_counters = get_num_counters();
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 2dc63d6..5d70646 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -495,13 +495,6 @@ static int qcom_l3_cache__event_init(struct perf_event *event)
return -ENOENT;
 
/*
-* There are no per-counter mode filters in the PMU.
-*/
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle)
-   return -EINVAL;
-
-   /*
 * Sampling not supported since these events are not core-attributable.
 */
if (hwc->sample_period)
@@ -777,6 +770,7 @@ static int qcom_l3_cache_pmu_probe(struct platform_device *pdev)
.read   = qcom_l3_cache__event_read,
 
.attr_groups= qcom_l3_cache_pmu_attr_grps,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
memrc = platform_get_resource(pdev, IORESOURCE_MEM, 0);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0dc9ff0..d4ec048 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -917,11 +917,6 @@ static int xgene_perf_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* SOC counters do not have usr/os/guest/host bits */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
/*
@@ -1136,6 +1131,7 @@ static int xgene_init_perf(struct xgene_pmu_dev *pmu_dev, char *name)
.start  = xgene_perf_start,
.stop   = xgene_perf_stop,
.read   = xgene_perf_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
/* Hardware counter init */
-- 
2.7.4



[PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-07 Thread Andrew Murray
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

Signed-off-by: Andrew Murray 
---
 arch/x86/events/amd/ibs.c  | 13 +
 arch/x86/events/amd/power.c| 10 ++
 arch/x86/events/intel/cstate.c | 12 +++-
 arch/x86/events/intel/rapl.c   |  9 ++---
 arch/x86/events/intel/uncore_snb.c |  9 ++---
 arch/x86/events/msr.c  | 10 ++
 6 files changed, 12 insertions(+), 51 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d50bb4d..62f317c 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -253,15 +253,6 @@ static int perf_ibs_precise_event(struct perf_event 
*event, u64 *config)
return -EOPNOTSUPP;
 }
 
-static const struct perf_event_attr ibs_notsupp = {
-   .exclude_user   = 1,
-   .exclude_kernel = 1,
-   .exclude_hv = 1,
-   .exclude_idle   = 1,
-   .exclude_host   = 1,
-   .exclude_guest  = 1,
-};
-
 static int perf_ibs_init(struct perf_event *event)
 {
struct hw_perf_event *hwc = &event->hw;
@@ -282,9 +273,6 @@ static int perf_ibs_init(struct perf_event *event)
if (event->pmu != &perf_ibs->pmu)
return -ENOENT;
 
-   if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp))
-   return -EINVAL;
-
if (config & ~perf_ibs->config_mask)
return -EINVAL;
 
@@ -537,6 +525,7 @@ static struct perf_ibs perf_ibs_fetch = {
.start  = perf_ibs_start,
.stop   = perf_ibs_stop,
.read   = perf_ibs_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr= MSR_AMD64_IBSFETCHCTL,
.config_mask= IBS_FETCH_CONFIG_MASK,
diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 2aefacf..c5ff084 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -136,14 +136,7 @@ static int pmu_event_init(struct perf_event *event)
return -ENOENT;
 
/* Unsupported modes and filters. */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   /* no sampling */
-   event->attr.sample_period)
+   if (event->attr.sample_period)
return -EINVAL;
 
if (cfg != AMD_POWER_EVENTSEL_PKG)
@@ -226,6 +219,7 @@ static struct pmu pmu_class = {
.start  = pmu_event_start,
.stop   = pmu_event_stop,
.read   = pmu_event_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static int power_cpu_exit(unsigned int cpu)
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index d2e7807..94a4b7f 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -280,13 +280,7 @@ static int cstate_pmu_event_init(struct perf_event *event)
return -ENOENT;
 
/* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   event->attr.sample_period) /* no sampling */
+   if (event->attr.sample_period) /* no sampling */
return -EINVAL;
 
if (event->cpu < 0)
@@ -437,7 +431,7 @@ static struct pmu cstate_core_pmu = {
.start  = cstate_pmu_event_start,
.stop   = cstate_pmu_event_stop,
.read   = cstate_pmu_event_update,
-   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT,
+   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
 };
 
@@ -451,7 +445,7 @@ static struct pmu cstate_pkg_pmu = {
.start  = cstate_pmu_event_start,
.stop   = cstate_pmu_event_stop,
.read   = cstate_pmu_event_update,
-   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT,
+   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
 };
 
diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 91039ff..94dc564 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -397,13 +397,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
 
/* unsupporte

[PATCH v4 13/13] drivers/perf: use PERF_PMU_CAP_NO_EXCLUDE for Cavium TX2 PMU

2019-01-07 Thread Andrew Murray
The Cavium ThunderX2 UNCORE PMU driver doesn't support any event
filtering. Let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability to
simplify the code.

Signed-off-by: Andrew Murray 
---
 drivers/perf/thunderx2_pmu.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index c9a1701..43d76c8 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -424,15 +424,6 @@ static int tx2_uncore_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* We have no filtering of any kind */
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle||
-   event->attr.exclude_host||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -572,6 +563,7 @@ static int tx2_uncore_pmu_register(
.start  = tx2_uncore_event_start,
.stop   = tx2_uncore_event_stop,
.read   = tx2_uncore_event_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
tx2_pmu->pmu.name = devm_kasprintf(dev, GFP_KERNEL,
-- 
2.7.4



[PATCH v4 09/13] powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-07 Thread Andrew Murray
For PowerPC PMUs that do not support context exclusion let's
advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that
perf will prevent us from handling events where any exclusion flags
are set. Let's also remove the now unnecessary check for exclusion
flags.

Signed-off-by: Andrew Murray 
Reviewed-by: Madhavan Srinivasan 
Acked-by: Michael Ellerman 
---
 arch/powerpc/perf/hv-24x7.c | 10 +-
 arch/powerpc/perf/hv-gpci.c | 10 +-
 arch/powerpc/perf/imc-pmu.c | 19 +--
 3 files changed, 3 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 72238ee..d2b8e60 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1306,15 +1306,6 @@ static int h_24x7_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -1577,6 +1568,7 @@ static struct pmu h_24x7_pmu = {
.start_txn   = h_24x7_event_start_txn,
.commit_txn  = h_24x7_event_commit_txn,
.cancel_txn  = h_24x7_event_cancel_txn,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static int hv_24x7_init(void)
diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index 43fabb3..735e77b 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -232,15 +232,6 @@ static int h_gpci_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -285,6 +276,7 @@ static struct pmu h_gpci_pmu = {
.start   = h_gpci_event_start,
.stop= h_gpci_event_stop,
.read= h_gpci_event_update,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static int hv_gpci_init(void)
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f292a3f..b1c37cc 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -473,15 +473,6 @@ static int nest_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -748,15 +739,6 @@ static int core_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -1069,6 +1051,7 @@ static int update_pmu_ops(struct imc_pmu *pmu)
pmu->pmu.stop = imc_event_stop;
pmu->pmu.read = imc_event_update;
pmu->pmu.attr_groups = pmu->attr_groups;
+   pmu->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
 
switch (pmu->domain) {
-- 
2.7.4



[PATCH v4 07/13] drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-07 Thread Andrew Murray
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

Signed-off-by: Andrew Murray 
Acked-by: Will Deacon 
---
 drivers/perf/arm-cci.c| 10 +-
 drivers/perf/arm-ccn.c|  6 ++
 drivers/perf/arm_dsu_pmu.c|  9 ++---
 drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c |  1 +
 drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_pmu.c  |  9 -
 7 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 1bfeb16..bfd03e0 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1327,15 +1327,6 @@ static int cci_pmu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EOPNOTSUPP;
 
-   /* We have no filtering of any kind */
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle||
-   event->attr.exclude_host||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/*
 * Following the example set by other "uncore" PMUs, we accept any CPU
 * and rewrite its affinity dynamically rather than having perf core
@@ -1433,6 +1424,7 @@ static int cci_pmu_init(struct cci_pmu *cci_pmu, struct 
platform_device *pdev)
.stop   = cci_pmu_stop,
.read   = pmu_read,
.attr_groups= pmu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
cci_pmu->plat_device = pdev;
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 7dd850e..2ae7602 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   if (has_branch_stack(event) || event->attr.exclude_user ||
-   event->attr.exclude_kernel || event->attr.exclude_hv ||
-   event->attr.exclude_idle || event->attr.exclude_host ||
-   event->attr.exclude_guest) {
+   if (has_branch_stack(event)) {
dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
return -EINVAL;
}
@@ -1290,6 +1287,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
.read = arm_ccn_pmu_event_read,
.pmu_enable = arm_ccn_pmu_enable,
.pmu_disable = arm_ccn_pmu_disable,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
 
/* No overflow interrupt? Have to use a timer instead. */
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index 660cb8a..5851de5 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -562,13 +562,7 @@ static int dsu_pmu_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   if (has_branch_stack(event) ||
-   event->attr.exclude_user ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle ||
-   event->attr.exclude_host ||
-   event->attr.exclude_guest) {
+   if (has_branch_stack(event)) {
dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
return -EINVAL;
}
@@ -735,6 +729,7 @@ static int dsu_pmu_device_probe(struct platform_device 
*pdev)
.read   = dsu_pmu_read,
 
.attr_groups= dsu_pmu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
rc = perf_pmu_register(&dsu_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c 
b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
index 69372e2..0eba947 100644
--- a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
@@ -396,6 +396,7 @@ static int hisi_ddrc_pmu_probe(struct platform_device *pdev)
.stop   = hisi_uncore_pmu_stop,
.read   = hisi_uncore_pmu_read,
.attr_groups= hisi_ddrc_pmu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
ret = perf_pmu_register(&ddrc_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c 
b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
index 443906e..2553a84 10064

Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2019-01-08 Thread Andrew Murray
On Tue, Jan 08, 2019 at 11:28:02AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:22PM +0000, Andrew Murray wrote:
> > @@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
> > /*
> >  * Check whether we need to exclude the counter from certain modes.
> >  */
> > +   if (armpmu->set_event_filter &&
> > +   armpmu->set_event_filter(hwc, &event->attr)) {
> > pr_debug("ARM performance counters do not support "
> >  "mode exclusion\n");
> > return -EOPNOTSUPP;
> 
> This then requires all set_event_filter() implementations to check all
> the various exclude options;

Yes, but this isn't a new requirement; this hunk uses the absence of
set_event_filter as a blanket indication that no exclusion flags are supported.
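
For illustration, here is a minimal user-space sketch of the behaviour under
discussion. It is a model only, not the kernel code: every model_* name and the
MODEL_CAP_NO_EXCLUDE value are hypothetical. It assumes the core rejects any
exclusion flag with -EINVAL when the capability is set, and that a failing
set_event_filter() is reported as -EOPNOTSUPP, as in the hunk quoted above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PERF_PMU_CAP_NO_EXCLUDE; the value is arbitrary. */
#define MODEL_CAP_NO_EXCLUDE 0x1u

/* Only the exclusion bits of the event attributes are modelled. */
struct model_attr {
	bool exclude_user, exclude_kernel, exclude_hv;
	bool exclude_idle, exclude_host, exclude_guest;
};

struct model_pmu {
	unsigned int capabilities;
	/* Optional: NULL means the PMU cannot filter at all. */
	int (*set_event_filter)(const struct model_attr *attr);
};

static bool model_has_any_exclude_flag(const struct model_attr *a)
{
	return a->exclude_user || a->exclude_kernel || a->exclude_hv ||
	       a->exclude_idle || a->exclude_host || a->exclude_guest;
}

/* Registration: no filter callback means no exclusion support. */
static void model_register(struct model_pmu *pmu)
{
	if (!pmu->set_event_filter)
		pmu->capabilities |= MODEL_CAP_NO_EXCLUDE;
}

/* Example filter that rejects only exclude_idle and ignores the rest. */
static int model_filter_no_idle(const struct model_attr *attr)
{
	return attr->exclude_idle ? -EPERM : 0;
}

static int model_event_init(const struct model_pmu *pmu,
			    const struct model_attr *attr)
{
	/* Driver-level check: the callback's own error value is not preserved. */
	if (pmu->set_event_filter && pmu->set_event_filter(attr))
		return -EOPNOTSUPP;

	/* Core-level check, applied once for every PMU with the capability. */
	if ((pmu->capabilities & MODEL_CAP_NO_EXCLUDE) &&
	    model_has_any_exclude_flag(attr))
		return -EINVAL;

	return 0;
}
```

In this model a PMU without a filter callback fails an exclude_idle event with
-EINVAL, while one using model_filter_no_idle fails the same event with
-EOPNOTSUPP and silently accepts exclude_host. Those are the two
inconsistencies raised in the review.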


> also, set_event_filter() failing then
> returns with -EOPNOTSUPP instead of the -EINVAL the CAP_NO_EXCLUDE
> generates, which is again inconsistent.

Yes, it's not ideal - but a step in the right direction. I wanted to limit
user visible changes as much as possible, where I've identified them I've
noted it in the commit log.

> 
> If I look at (the very first git-grep found me)
> armv7pmu_set_event_filter(), then I find it returning -EPERM (again
> inconsistent but irrelevant because the actual value is not preserved)
> for exclude_idle.
> 
> But it doesn't seem to check exclude_host at all for example.

Yes I found lots of examples like this across the tree whilst doing this
work. However I decided to initially start with simply removing duplicated
code as a result of adding this flag and attempting to preserve existing
functionality. I thought that if I add missing checks then the patchset
will get much bigger and be harder to merge. I would like to do this though
as another non-cross-arch series.

Can we limit this patch series to the minimal changes required to fully
use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
in subsequent patch sets?

Thanks,

Andrew Murray

> 
> > @@ -867,6 +859,9 @@ int armpmu_register(struct arm_pmu *pmu)
> > if (ret)
> > return ret;
> >  
> > +   if (!pmu->set_event_filter)
> > +   pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
> > +
> > ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
> > if (ret)
> > goto out_destroy;
> > -- 
> > 2.7.4
> > 


Re: [PATCH v4 11/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-08 Thread Andrew Murray
On Tue, Jan 08, 2019 at 11:49:40AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:28PM +0000, Andrew Murray wrote:
> 
> This patch has the exact same subject as the previous one.. that seems
> sub-optimal.

Ah yes, I'll update that in subsequent revisions. (The reason for two patches
was to separate functional vs non-functional changes).

Andrew Murray


Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-08 Thread Andrew Murray
On Tue, Jan 08, 2019 at 11:48:41AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:27PM +0000, Andrew Murray wrote:
> > For drivers that do not support context exclusion let's advertise the
> > PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
> > prevent us from handling events where any exclusion flags are set.
> > Let's also remove the now unnecessary check for exclusion flags.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/x86/events/amd/ibs.c  | 13 +
> >  arch/x86/events/amd/power.c| 10 ++
> >  arch/x86/events/intel/cstate.c | 12 +++-
> >  arch/x86/events/intel/rapl.c   |  9 ++---
> >  arch/x86/events/intel/uncore_snb.c |  9 ++---
> >  arch/x86/events/msr.c  | 10 ++
> >  6 files changed, 12 insertions(+), 51 deletions(-)
> 
> You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
> then you also don't check if it handles all the various exclude options
> correctly/consistently.
> 
> Now; I must admit that that is a bit of a maze, but I think we can at
> least add exclude_idle and exclude_hv fails in there, nothing uses those
> afaict.

Yes it took me some time to make sense of it.

As per my comments in the other patch, I think you're suggesting that I
add additional checks to x86. I think they are needed but I'd prefer to
make functional changes in a separate series, I'm happy to do this.

> 
> On the various exclude options; they are as follows (IIUC):
> 
>   - exclude_guest: we're a HV/host-kernel and we don't want the counter
>to run when we run a guest context.
> 
>   - exclude_host: we're a HV/host-kernel and we don't want the counter
>   to run when we run in host context.
> 
>   - exclude_hv: we're a guest and don't want the counter to run in HV
> context.
> 
> Now, KVM always implies exclude_hv afaict (for guests),

It certainly does for ARM.

> I'm not sure
> what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
> works on Xen) -- nor quite sure who to ask, Boris, Jeurgen?

Thanks,

Andrew Murray
> 


Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2019-01-08 Thread Andrew Murray
On Tue, Jan 08, 2019 at 02:10:31PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 08, 2019 at 01:07:41PM +0000, Andrew Murray wrote:
> 
> > Yes I found lots of examples like this across the tree whilst doing this
> > work. However I decided to initially start with simply removing duplicated
> > code as a result of adding this flag and attempting to preserve existing
> > functionality. I thought that if I add missing checks then the patchset
> > will get much bigger and be harder to merge. I would like to do this though
> > as another non-cross-arch series.
> > 
> > Can we limit this patch series to the minimal changes required to fully
> > use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
> > in subsequent patch sets?
> 
> Ok, but it would've been nice to see that mentioned somewhere.

I'll update the cover letter on any next revision. I'll try to be clearer next
time with my intentions.

Andrew Murray


Re: [PATCH] asm-generic: io: Fix ioport_map() for !CONFIG_GENERIC_IOMAP && CONFIG_INDIRECT_PIO

2018-09-17 Thread Andrew Murray
On Mon, Sep 17, 2018 at 03:42:32PM +0100, John Garry wrote:
> - dead e-mail addresses (Zhichang, Gabriele)
> 
> On 13/09/2018 13:48, Andrew Murray wrote:
> 
> Hi Andrew,
> 
> > The !CONFIG_GENERIC_IOMAP version of ioport_map uses MMIO_UPPER_LIMIT to
> > prevent users from making I/O accesses outside the expected I/O range -
> > however it erroneously treats MMIO_UPPER_LIMIT as a mask which is
> > contradictory to its other users.
> > 
> > The introduction of CONFIG_INDIRECT_PIO, which subtracts an arbitrary
> > amount from IO_SPACE_LIMIT to form MMIO_UPPER_LIMIT, results in ioport_map
> > mangling the given port rather than capping it.
> > 
> > We address this by aligning more closely with the CONFIG_GENERIC_IOMAP
> > implementation of ioport_map by using the comparison operator and
> > returning NULL where the port exceeds MMIO_UPPER_LIMIT. Though note that
> > we preserve the existing behavior of masking with IO_SPACE_LIMIT such that
> > we don't break existing buggy drivers that somehow rely on this masking.
> 
> I wouldn't say any drivers rely on this - for the only device driver which
> uses the "Indirect" IO space region above MMIO_UPPER_LIMIT (HiSilicon LPC),
> no child device driver for that host uses ioport_map() [those being ipmi si
> and 8250 generic+of drivers].

I was really referring to the existing !CONFIG_GENERIC_IOMAP && 
!CONFIG_INDIRECT_PIO use cases where there may be drivers (however unlikely)
that provide ioport_map an incorrect address which, due to the masking, gets
converted into a valid address. Returning NULL for these would result in new
run-time errors therefore it seemed safer to change this to support the new
"indirect IO" whilst not breaking existing bad drivers.

A more correct implementation would always return NULL if
port > IO_SPACE_LIMIT - it would fully align it with the CONFIG_GENERIC_IOMAP
implementation - in my view these two implementations should behave the same
with respect to error handling - at the moment they don't.
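
To make the difference concrete, here is a small user-space sketch of the old
and new port handling. It is a model under stated assumptions: the MODEL_*
constants are example values picked for illustration (an indirect region
carved off the top of the I/O space), not values taken from any particular
architecture, and offsets are returned instead of PCI_IOBASE-relative pointers.

```c
#include <stdbool.h>

/* Example values only; real limits are architecture and config specific. */
#define MODEL_IO_SPACE_LIMIT	0xffffffUL
#define MODEL_INDIRECT_SIZE	0x4000UL
#define MODEL_MMIO_UPPER_LIMIT	(MODEL_IO_SPACE_LIMIT - MODEL_INDIRECT_SIZE)

/*
 * Old behaviour: the upper limit is used as a bit mask. Once the limit is
 * no longer of the form 2^n - 1, valid ports lose bits.
 */
static unsigned long model_offset_masked(unsigned long port)
{
	return port & MODEL_MMIO_UPPER_LIMIT;
}

/*
 * New behaviour: keep the historical masking with the I/O space limit, then
 * compare against the upper limit. Returns false where the kernel version
 * would return NULL.
 */
static bool model_offset_capped(unsigned long port, unsigned long *offset)
{
	port &= MODEL_IO_SPACE_LIMIT;
	if (port > MODEL_MMIO_UPPER_LIMIT)
		return false;

	*offset = port;
	return true;
}
```

With these example values the masked variant maps port 0x4000 to offset 0,
because bit 14 is clear in the limit, whereas the capped variant leaves it at
0x4000 and only refuses ports that fall in the indirect region.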


> 
> Regardless of that, it seems better to return NULL when the port is
> out-of-range, rather than masking it.
> 
> Cheers
> 
> > 
> > Fixes: 5745392e0c2b ("PCI: Apply the new generic I/O management on PCI IO 
> > hosts")
> > Reported-by: Will Deacon 
> > Signed-off-by: Andrew Murray 
> 
> Reviewed-by: John Garry 

Thanks for the review.

Andrew Murray

> 
> > ---
> >  include/asm-generic/io.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
> > index 66d1d45..d356f80 100644
> > --- a/include/asm-generic/io.h
> > +++ b/include/asm-generic/io.h
> > @@ -1026,7 +1026,8 @@ static inline void __iomem *ioremap_wt(phys_addr_t 
> > offset, size_t size)
> >  #define ioport_map ioport_map
> >  static inline void __iomem *ioport_map(unsigned long port, unsigned int nr)
> >  {
> > -   return PCI_IOBASE + (port & MMIO_UPPER_LIMIT);
> > +   port &= IO_SPACE_LIMIT;
> > +   return (port > MMIO_UPPER_LIMIT) ? NULL : PCI_IOBASE + port;
> >  }
> >  #endif
> > 
> > 
> 
> 


Re: [PATCH] kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP

2019-04-02 Thread Andrew Murray
On Tue, Apr 02, 2019 at 12:06:16PM +0100, Suzuki K Poulose wrote:
> With commit a80868f398554842b14, we no longer ensure that the
> THP page is properly aligned in the guest IPA. Skip the stage2
> huge mapping for unaligned IPA backed by transparent hugepages.
> 
> Fixes: a80868f398554842b14 ("KVM: arm/arm64: Enforce PTE mappings at stage2 
> when needed")
> Reported-by: Eric Auger 
> Cc: Marc Zyngier 
> Cc: Chirstoffer Dall 
> Cc: Zenghui Yu 
> Cc: Zheng Xiang 
> Tested-by: Eric Auger 
> Signed-off-by: Suzuki K Poulose 
> ---
>  virt/kvm/arm/mmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 27c9583..4a22f5b 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1412,7 +1412,9 @@ static bool transparent_hugepage_adjust(kvm_pfn_t 
> *pfnp, phys_addr_t *ipap)
>* page accordingly.
>*/
>   mask = PTRS_PER_PMD - 1;
> - VM_BUG_ON((gfn & mask) != (pfn & mask));
> + /* Skip memslots with unaligned IPA and user address */
> + if ((gfn & mask) != (pfn & mask))
> + return false;
>   if (pfn & mask) {
>   *ipap &= PMD_MASK;
>   kvm_release_pfn_clean(pfn);
> -- 
> 2.7.4

I was able to reproduce this issue on v5.1-rc3 on a SoftIron Overdrive 1000
(AMD Seattle (Rev.B1)) with defconfig+ARM64_64K_PAGES with:

qemu-system-aarch64 -cpu host -machine type=virt,accel=kvm -nographic -smp 4
-m 4096 -kernel /boot/vmlinuz-4.9.0-7-arm64 --append "console=ttyAMA0
default_hugepagesz=2M hugepages=256"

The 'default_hugepagesz=2M hugepages=256' had no effect on reproducibility.
Without the patch the guest only intermittently failed to boot; with the
above patch applied the guest boots every time.
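
For reference, the alignment test the patch introduces can be sketched in user
space as below. This is a simplified model, not the kernel function:
model_thp_adjust and its parameters are hypothetical names, the check that the
page really is a transparent hugepage is left out, and ptrs_per_pmd is assumed
to be a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A block mapping is only possible when the guest frame number and the host
 * frame number sit at the same offset within a PMD-sized block. If they do,
 * round both down to the start of the block; otherwise leave them untouched
 * and report that a huge mapping must be skipped.
 */
static bool model_thp_adjust(uint64_t *gfn, uint64_t *pfn,
			     uint64_t ptrs_per_pmd)
{
	uint64_t mask = ptrs_per_pmd - 1;

	if ((*gfn & mask) != (*pfn & mask))
		return false;	/* previously a VM_BUG_ON() */

	*gfn &= ~mask;
	*pfn &= ~mask;
	return true;
}
```

With 512 entries per PMD, gfn 0x1234 and pfn 0x5634 share the offset 0x34 and
are rounded down to 0x1200 and 0x5600; gfn 0x1235 against the same pfn is
rejected and falls back to page-sized mappings.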

Tested-by: Andrew Murray 

Thanks,

Andrew Murray

> 
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 1/4] pci: OF: Fix the conversion of IO ranges into IO resources.

2014-02-27 Thread Andrew Murray
On 27 February 2014 13:06, Liviu Dudau  wrote:
>
> The ranges property for a host bridge controller in DT describes
> the mapping between the PCI bus address and the CPU physical address.
> The resources framework however expects that the IO resources start
> at a pseudo "port" address 0 (zero) and have a maximum size of 64kb.

Is this just in the case of ARM? (I've tried to keep up with the
conversation, but apologies if I've misunderstood).

> The conversion from pci ranges to resources failed to take that into
> account.
>
> In the process move the function into drivers/of/address.c as it
> now depends on pci_address_to_pio() code.
>
> Signed-off-by: Liviu Dudau 
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 1a54f1f..7cf2b16 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -719,3 +719,34 @@ void __iomem *of_iomap(struct device_node *np, int index)
> return ioremap(res.start, resource_size(&res));
>  }
>  EXPORT_SYMBOL(of_iomap);
> +
> +/**
> + * of_pci_range_to_resource - Create a resource from an of_pci_range
> + * @range: the PCI range that describes the resource
> + * @np:device node where the range belongs to
> + * @res:   pointer to a valid resource that will be updated to
> + *  reflect the values contained in the range.
> + * Note that if the range is an IO range, the resource will be converted
> + * using pci_address_to_pio() which can fail if it is called too early or
> + * if the range cannot be matched to any host bridge IO space.
> + */
> +void of_pci_range_to_resource(struct of_pci_range *range,
> +   struct device_node *np, struct resource *res)
> +{
> +   res->flags = range->flags;
> +   if (res->flags & IORESOURCE_IO) {
> +   unsigned long port;
> +   port = pci_address_to_pio(range->pci_addr);

Is this likely to break existing users of of_pci_range_to_resource?

For example arch/mips: IO_SPACE_LIMIT defaults to 0x and there is
no overridden implementation for pci_address_to_pio, therefore this
will set res->start to OF_BAD_ADDR whereas previously it would have
been the CPU address for I/O (assuming the cpu_addr was previously >
64K).

I have no idea if I/O previously worked for mips, but this patch seems
to change that behavior. It may be a similar story for microblaze and
powerpc.
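
A rough user-space sketch of the behaviour change being asked about follows.
It is a model only: all model_* names are hypothetical, the I/O space limit is
passed in as a parameter rather than assumed, and the conversion helper is
assumed to fail for any address above that limit when no architecture
override exists.

```c
#include <stdint.h>

#define MODEL_BAD_ADDR	UINT64_MAX	/* stands in for OF_BAD_ADDR */
#define MODEL_BAD_PIO	UINT64_MAX	/* stands in for (unsigned long)-1 */

struct model_range {
	uint64_t pci_addr, cpu_addr, size;
	int is_io;
};

struct model_res {
	uint64_t start, end;
};

/* Assumed default conversion: identity below the limit, failure above it. */
static uint64_t model_address_to_pio(uint64_t address, uint64_t io_limit)
{
	return address > io_limit ? MODEL_BAD_PIO : address;
}

/* Before the patch: the CPU address is used for every range type. */
static void model_range_to_resource_old(const struct model_range *r,
					struct model_res *res)
{
	res->start = r->cpu_addr;
	res->end = r->cpu_addr + r->size - 1;
}

/* After the patch: I/O ranges go through the port conversion first. */
static void model_range_to_resource_new(const struct model_range *r,
					uint64_t io_limit,
					struct model_res *res)
{
	if (r->is_io) {
		uint64_t port = model_address_to_pio(r->pci_addr, io_limit);

		if (port == MODEL_BAD_PIO) {
			res->start = MODEL_BAD_ADDR;
			res->end = MODEL_BAD_ADDR;
			return;
		}
		res->start = port;
	} else {
		res->start = r->cpu_addr;
	}
	res->end = res->start + r->size - 1;
}
```

In this model an I/O range whose converted address exceeds the limit ends up
with a bad-address resource after the patch, where the old helper would have
produced a resource based on the CPU address.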

Andrew Murray

> +   if (port == (unsigned long)-1) {
> +   res->start = (resource_size_t)OF_BAD_ADDR;
> +   res->end = (resource_size_t)OF_BAD_ADDR;
> +   return;
> +   }
> +   res->start = port;
> +   } else {
> +   res->start = range->cpu_addr;
> +   }
> +   res->end = res->start + range->size - 1;
> +   res->parent = res->child = res->sibling = NULL;
> +   res->name = np->full_name;
> +}
> diff --git a/include/linux/of_address.h b/include/linux/of_address.h
> index 5f6ed6b..a667762 100644
> --- a/include/linux/of_address.h
> +++ b/include/linux/of_address.h
> @@ -23,17 +23,8 @@ struct of_pci_range {
>  #define for_each_of_pci_range(parser, range) \
> for (; of_pci_range_parser_one(parser, range);)
>
> -static inline void of_pci_range_to_resource(struct of_pci_range *range,
> -   struct device_node *np,
> -   struct resource *res)
> -{
> -   res->flags = range->flags;
> -   res->start = range->cpu_addr;
> -   res->end = range->cpu_addr + range->size - 1;
> -   res->parent = res->child = res->sibling = NULL;
> -   res->name = np->full_name;
> -}
> -
> +extern void of_pci_range_to_resource(struct of_pci_range *range,
> +   struct device_node *np, struct resource *res);
>  /* Translate a DMA address from device space to CPU space */
>  extern u64 of_translate_dma_address(struct device_node *dev,
> const __be32 *in_addr);
> --
> 1.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 3/3] arm64: Add architecture support for PCI

2014-02-27 Thread Andrew Murray
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +
> +/*
> + * Called after each bus is probed, but before its children are examined
> + */
> +void pcibios_fixup_bus(struct pci_bus *bus)
> +{
> +   struct pci_dev *dev;
> +   struct resource *res;
> +   int i;
> +
> +   if (!pci_is_root_bus(bus)) {
> +   pci_read_bridge_bases(bus);
> +
> +   pci_bus_for_each_resource(bus, res, i) {
> +   if (!res || !res->flags || res->parent)
> +   continue;
> +
> +   /*
> +* If we are going to reassign everything, we can
> +* shrink the P2P resource to have zero size to
> +* save space
> +*/
> +   if (pci_has_flag(PCI_REASSIGN_ALL_RSRC)) {
> +   res->flags |= IORESOURCE_UNSET;
> +   res->start = 0;
> +   res->end = -1;
> +   continue;
> +   }
> +   }
> +   }
> +
> +   list_for_each_entry(dev, &bus->devices, bus_list) {
> +   /* Ignore fully discovered devices */
> +   if (dev->is_added)
> +   continue;
> +
> +   set_dev_node(&dev->dev, pcibus_to_node(dev->bus));
> +
> +   /* Read default IRQs and fixup if necessary */
> +   dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
> +   }
> +}
> +EXPORT_SYMBOL(pcibios_fixup_bus);
> +
> +/*
> + * We don't have to worry about legacy ISA devices, so nothing to do here
> + */
> +resource_size_t pcibios_align_resource(void *data, const struct resource 
> *res,
> +   resource_size_t size, resource_size_t align)
> +{
> +   return ALIGN(res->start, align);
> +}
> +EXPORT_SYMBOL(pcibios_align_resource);
> +
> +int pcibios_enable_device(struct pci_dev *dev, int mask)
> +{
> +   return pci_enable_resources(dev, mask);
> +}

It looks like you will soon be able to remove this and rely on the
shiny new weak implementation of pcibios_enable_device now
(http://www.spinics.net/lists/linux-pci/msg29387.html)

Andrew Murray

> +
> +void pcibios_fixup_bridge_ranges(struct list_head *resources)
> +{
> +}
> +
> +#define IO_SPACE_PAGES ((IO_SPACE_LIMIT + 1) / PAGE_SIZE)
> +static DECLARE_BITMAP(pci_iospace, IO_SPACE_PAGES);
> +
> +unsigned long pci_ioremap_io(const struct resource *res, phys_addr_t phys_addr)
> +{
> +   unsigned long start, len, virt_start;
> +   int err;
> +
> +   if (res->end > IO_SPACE_LIMIT)
> +      return -EINVAL;
> +
> +   /*
> +    * try finding free space for the whole size first,
> +    * fall back to 64K if not available
> +    */
> +   len = resource_size(res);
> +   start = bitmap_find_next_zero_area(pci_iospace, IO_SPACE_PAGES,
> +            res->start / PAGE_SIZE, len / PAGE_SIZE, 0);
> +   if (start == IO_SPACE_PAGES && len > SZ_64K) {
> +      len = SZ_64K;
> +      start = 0;
> +      start = bitmap_find_next_zero_area(pci_iospace, IO_SPACE_PAGES,
> +               start, len / PAGE_SIZE, 0);
> +   }
> +
> +   /* no 64K area found */
> +   if (start == IO_SPACE_PAGES)
> +      return -ENOMEM;
> +
> +   /* ioremap physical aperture to virtual aperture */
> +   virt_start = start * PAGE_SIZE + (unsigned long)PCI_IOBASE;
> +   err = ioremap_page_range(virt_start, virt_start + len,
> +            phys_addr, __pgprot(PROT_DEVICE_nGnRE));
> +   if (err)
> +      return err;
> +
> +   bitmap_set(pci_iospace, start, len / PAGE_SIZE);
> +
> +   /* return io_offset */
> +   return start * PAGE_SIZE - res->start;
> +}
> --
> 1.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RESEND: RFC PATCH 3/3] pcie: keystone: add pcie driver based on designware core driver

2014-04-02 Thread Andrew Murray
On 2 April 2014 16:43, Murali Karicheri  wrote:

> Keystone pcie driver is developed based on other dw based pcie drivers
> such as pci-exynos that uses subsys_initcall(). I am new to this list,
> probably Jingoo (copied) has some history on why we can't use module.
> For now I will keep it as is and can be re-visited in the next revisions.
> Also I will experiment with PCIE port driver as well.
>
>
> BTW, PCIE driver currently uses Legacy or MSI IRQ. Keystone PCI has
> a platform IRQ. Is DT based irq configuration is the appropriate way
> to add this capability?

As far as I am aware, the PCI standards define a particular way for
devices to describe which interrupt will be used for things like
hotplug, AER and PME. These interrupts are always PCI interrupts (i.e.
MSI/MSI-X/legacy). Thus the port services code in the kernel uses
standard configuration space accesses to determine the interrupt to
use. Also note that it's not just the host bridge that can provide
these services but any PCIe device; in this sense a host bridge is
treated like any other device.

If my understanding is correct, I don't believe the current port
services code allows exceptions to this, i.e. to say this host bridge
actually uses a platform IRQ for AER rather than an MSI. Such an
exception may be quite useful, though, as I suspect many host bridges
provide interrupts for things like PME through platform IRQs rather
than PCI interrupts.
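
To make that concrete, here is a toy userspace model of the selection
as I understand it. It is only a sketch: the demo_* names and the
struct layout are made up for illustration and are not the port
services code. The point is that nothing in the selection ever looks
at a platform IRQ.

```c
#include <assert.h>
#include <stddef.h>

/* Interrupt types a port service can end up with in this toy model. */
enum demo_irq_type {
	DEMO_IRQ_NONE,
	DEMO_IRQ_LEGACY,
	DEMO_IRQ_MSI,
	DEMO_IRQ_MSIX,
};

/* Hypothetical description of a port; not a kernel structure. */
struct demo_port {
	int has_msix;     /* MSI-X capability found in config space */
	int has_msi;      /* MSI capability found in config space */
	int legacy_irq;   /* INTx mapping, 0 if there is none */
	int platform_irq; /* SoC interrupt line, deliberately never read */
};

/*
 * Pick the interrupt for AER/PME/hotplug purely from what the device
 * advertises through standard PCI means: MSI-X, then MSI, then legacy.
 */
static enum demo_irq_type demo_pick_service_irq(const struct demo_port *p)
{
	assert(p != NULL);

	if (p->has_msix)
		return DEMO_IRQ_MSIX;
	if (p->has_msi)
		return DEMO_IRQ_MSI;
	if (p->legacy_irq)
		return DEMO_IRQ_LEGACY;
	return DEMO_IRQ_NONE;
}
```

A host bridge that only signals AER or PME through a platform IRQ ends
up with DEMO_IRQ_NONE in this model, which is the gap I am describing.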

Does the Keystone have platform IRQs for things like AER? Is that
because the IP makes these events available through platform IRQs in
addition to the standard PCI means?

Andrew Murray


Re: [PATCH v12 05/12] PCI: OF: Fix the conversion of IO ranges into IO resources.

2014-09-24 Thread Andrew Murray
On 24 September 2014 01:22, Bjorn Helgaas  wrote:
> [+cc Andrew]
>
> On Tue, Sep 23, 2014 at 08:01:07PM +0100, Liviu Dudau wrote:
>> The ranges property for a host bridge controller in DT describes
>> the mapping between the PCI bus address and the CPU physical address.
>> The resources framework however expects that the IO resources start
>> at a pseudo "port" address 0 (zero) and have a maximum size of 
>> IO_SPACE_LIMIT.
>> The conversion from pci ranges to resources failed to take that into account,
>> returning a CPU physical address instead of a port number.
>>
>> Also fix all the drivers that depend on the old behaviour by fetching
>> the CPU physical address based on the port number where it is being needed.
>>
>> Cc: Grant Likely 
>> Cc: Rob Herring 
>> Cc: Arnd Bergmann 
>> Acked-by: Linus Walleij 
>> Cc: Thierry Reding 
>> Cc: Simon Horman 
>> Cc: Catalin Marinas 
>> Signed-off-by: Liviu Dudau 
>> ---
>>  arch/arm/mach-integrator/pci_v3.c | 23 ++--
>>  drivers/of/address.c              | 44 +++
>>  drivers/pci/host/pci-tegra.c      | 10 ++---
>>  drivers/pci/host/pcie-rcar.c      | 21 +--
>>  include/linux/of_address.h        | 15 ++---
>>  5 files changed, 82 insertions(+), 31 deletions(-)
>> ...
>
> The of_pci_range_to_resource() implementation in drivers/of/address.c is
> always compiled when CONFIG_OF_ADDRESS=y, but when CONFIG_OF_ADDRESS=y and
> CONFIG_PCI is not set, we get the static inline version from
> include/linux/of_address.h as well, causing a redefinition error.
>
>> diff --git a/drivers/of/address.c b/drivers/of/address.c
>> @@ -957,12 +957,48 @@ bool of_dma_is_coherent(struct device_node *np)
>> ...
>> +int of_pci_range_to_resource(struct of_pci_range *range,
>> + struct device_node *np, struct resource *res)
>
>> diff --git a/include/linux/of_address.h b/include/linux/of_address.h
>> ...
>>  #else /* CONFIG_OF_ADDRESS && CONFIG_PCI */
>>  static inline int of_pci_address_to_resource(struct device_node *dev, int bar,
>>    struct resource *r)
>> @@ -144,6 +139,12 @@ static inline int of_pci_address_to_resource(struct device_node *dev, int bar,
>>   return -ENOSYS;
>>  }
>>
>> +static inline int of_pci_range_to_resource(struct of_pci_range *range,
>> + struct device_node *np, struct resource *res)
>> +{
>> + return -ENOSYS;
>> +}
>
> My proposal to fix it is the following three patches.  The first moves the
> inline version of of_pci_range_to_resource() into the existing "#if
> defined(CONFIG_OF_ADDRESS) && defined(CONFIG_PCI)" block.
>
> Andrew added it (and some other PCI-related things) with 29b635c00f3e
> ("of/pci: Provide support for parsing PCI DT ranges property") to
> of_address.h outside of any ifdefs, so it's always available.  Maybe
> there's a reason that's needed in the non-CONFIG_PCI case, but I didn't see
> it with a quick look.
>

There was no reason - it probably should have been inside a #ifdef
like the others.
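
For reference, the usual shape of this pattern looks something like
the sketch below. The DEMO_* macros and demo_* names are hypothetical
stand-ins for the CONFIG_ symbols and the of_pci_* helpers. The
prototype and the inline stub sit under the same condition, so only
one of them is ever visible; the redefinition error comes from the
real implementation being built under a different condition than the
one guarding the stub.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for CONFIG_OF_ADDRESS and CONFIG_PCI; toggle to experiment. */
#define DEMO_OF_ADDRESS 1
/* #define DEMO_PCI 1 */

/* Hypothetical, simplified versions of the range and resource types. */
struct demo_range {
	unsigned long cpu_addr;
	unsigned long size;
};

struct demo_resource {
	unsigned long start;
	unsigned long end;
};

#if defined(DEMO_OF_ADDRESS) && defined(DEMO_PCI)
/* Real implementation lives in a .c file built only for this config. */
int demo_range_to_resource(const struct demo_range *range,
			   struct demo_resource *res);
#else
/* Stub for every other configuration; callers still compile and link. */
static inline int demo_range_to_resource(const struct demo_range *range,
					  struct demo_resource *res)
{
	assert(range != NULL && res != NULL);
	return -ENOSYS;
}
#endif
```

With DEMO_PCI left undefined, as above, callers get the stub and see
-ENOSYS.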

Andrew Murray


Re: [RESEND: RFC PATCH 3/3] pcie: keystone: add pcie driver based on designware core driver

2014-03-25 Thread Andrew Murray
On 25 March 2014 10:35, Thierry Reding  wrote:
> On Tue, Mar 25, 2014 at 08:44:36AM +0100, Arnd Bergmann wrote:
>> On Monday 24 March 2014 20:35:26 Murali Karicheri wrote:
> [...]
>> > +/* Keystone PCIe driver does not allow module unload */
>> > +static int __init ks_pcie_init(void)
>> > +{
>> > +   return platform_driver_probe(&ks_pcie_driver, ks_pcie_probe);
>> > +}
>> > +subsys_initcall(ks_pcie_init);
>>
>> Why subsys_initcall?
>>
>> We should probably try to fix unloading soon.
>
> I did some work on this a few months ago but never got around to
> cleaning up the patches. Let me see if I can resurrect that work.

I think there may be merit in these drivers using subsys_initcall. I've
not had time to investigate, but as far as I can remember, moving away
from it causes issues with pcieport.

For ARM32 host drivers, pci_fixup_irqs (arch/arm/kernel/bios32.c) must
be called before init_service_irqs (portdrv_core.c), otherwise pcieport
acts on invalid information in dev->irq and breaks. It seems that it's
possible for the portbus driver to pick up new devices before bios32
has been able to fix up the IRQs. Making the host bridge drivers
subsys_initcall will overcome this. I guess this hasn't been an issue
in the past as host bridge drivers were always in the arch/ directories.
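
A toy userspace model of the ordering problem, in case it helps. This
is a sketch under assumptions: the demo_* names are made up, the level
numbers mirror the kernel's initcall levels (subsys is 4, device is 6),
and calls within one level are assumed to run in registration order,
which stands in for link order.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the initcall levels, lowest runs first. */
enum demo_level {
	DEMO_SUBSYS = 4,
	DEMO_DEVICE = 6,
};

struct demo_initcall {
	enum demo_level level;
	void (*fn)(void);
};

static int demo_dev_irq;      /* -1 until the host bridge fixes it up */
static int demo_port_saw_irq; /* what the port driver read at its init */

/* Host bridge probe: scans the bus and fixes up the IRQs. */
static void demo_host_bridge_init(void)
{
	demo_dev_irq = 42;
}

/* Port driver init: acts on whatever the IRQ field holds right now. */
static void demo_portdrv_init(void)
{
	demo_port_saw_irq = demo_dev_irq;
}

/* Run calls level by level, keeping registration order within a level. */
static void demo_run_initcalls(const struct demo_initcall *calls, size_t n)
{
	assert(calls != NULL);

	for (int level = 0; level <= 7; level++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == level)
				calls[i].fn();
}

/* Boot once with the host bridge at the given level; return the IRQ
 * value the port driver ended up using. */
static int demo_boot(enum demo_level host_level)
{
	const struct demo_initcall calls[] = {
		{ DEMO_DEVICE, demo_portdrv_init }, /* registered first */
		{ host_level, demo_host_bridge_init },
	};

	demo_dev_irq = -1;
	demo_port_saw_irq = -1;
	demo_run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
	return demo_port_saw_irq;
}
```

In this model demo_boot(DEMO_SUBSYS) gives the port driver the fixed-up
value, while demo_boot(DEMO_DEVICE) leaves it reading -1 because the
port driver happened to run first.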

In any case it may be worth testing this driver with PCIEPORTBUS enabled.

Andrew Murray

