date:20160128

Re: [PATCH v11 2/4] perf,kvm/{x86,s390}: Remove const from kvm_events_tp

2016-01-28 Thread David Ahern


On 1/27/16 11:33 PM, Hemant Kumar wrote:

This patch removes the "const" qualifier from kvm_events_tp declaration
to account for the fact that some architectures may need to update this
variable dynamically. For instance, powerpc will need to update this
variable dynamically depending on the machine type.

Signed-off-by: Hemant Kumar
---


Acked-by: David Ahern 
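
A minimal sketch of what dropping the qualifier permits, assuming the table is a NULL-terminated array of string pointers. The demo_ names and the event strings are hypothetical, not the identifiers used in perf:

```c
#include <stddef.h>

#define DEMO_NR_TPS 2

/*
 * The strings stay read-only (const char), but the slots of the array are
 * writable, so arch code can fill them in after detecting the machine type.
 * With "const char * const" the assignments below would not compile.
 */
const char *demo_events_tp[DEMO_NR_TPS + 1];

/* Hypothetical arch hook: choose event names at run time. */
void demo_setup_events(int is_alt_machine)
{
	demo_events_tp[0] = is_alt_machine ? "demo_alt:entry" : "demo:entry";
	demo_events_tp[1] = is_alt_machine ? "demo_alt:exit" : "demo:exit";
	demo_events_tp[2] = NULL; /* keep the table NULL-terminated */
}
```

Generic code can keep walking the table until it reaches NULL, without knowing which variant was selected.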
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v11 1/4] perf,kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h

2016-01-28 Thread David Ahern


On 1/27/16 11:33 PM, Hemant Kumar wrote:

It's better to remove the dependency on uapi/kvm_perf.h to allow dynamic
discovery of kvm events (if it's needed). To do this, some extern
variables have been introduced with which we can keep the generic
functions generic.

Signed-off-by: Hemant Kumar
Acked-by: Alexander Yarygin


Acked-by: David Ahern 

Re: [PATCH v9 3/6] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms.

2016-01-28 Thread Ganapatrao Kulkarni

On Thu, Jan 28, 2016 at 11:38 PM, Will Deacon  wrote:
> On Thu, Jan 28, 2016 at 10:42:17PM +0530, Ganapatrao Kulkarni wrote:
>> On Thu, Jan 28, 2016 at 8:09 PM, Will Deacon  wrote:
>> > On Tue, Jan 26, 2016 at 02:36:04PM -0600, Bjorn Helgaas wrote:
>> >> Subject is "arm64/arm, numa, dt: adding ..."  What is the significance
>> >> of the "arm" part?  The other patches only mention "arm64".
>> >>
>> >> General comment: the code below has little, if anything, that is
>> >> actually arm64-specific.  Maybe this is the first DT-based NUMA
>> >> platform?  I don't see other similar code for other arches, so maybe
>> >> it's too early to try to generalize it, but we should try to avoid
>> >> adding duplicates of this code if/when other arches do show up.
>> >
>> > Having it in the core code would allow us to share it with arch/arm/
>> > fairly straightforwardly.
>> This binding can be used for arm too.
>> however at this moment it is the need of arm64 platforms.
>> can we please keep this to arm64 as it's too early to try to
>> generalize it(as Bjorn suggested)
>> I prefer to keep it as it is, otherwise ok.
>> Please suggest.
>
> My suggestions time and time again on the NUMA patches from you have
> consistently been around consolidation of existing code, or moving things
> that aren't architecture-specific out of the architecture code.

thanks, i shall move this out to drivers/of
>
> Will

thanks
Ganapat

Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned

2016-01-28 Thread Alex Williamson

On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> When vfio passthrough a PCI device of which MMIO BARs
> are smaller than PAGE_SIZE, guest will not handle the
> mmio accesses to the BARs which leads to mmio emulations
> in host.
> 
> This is because vfio will not allow to passthrough one
> BAR's mmio page which may be shared with other BARs.
> 
> To solve this performance issue, this patch adds a kernel
> parameter "pci=resource_page_aligned=on" to enforce
> the alignment of all MMIO BARs to be at least PAGE_SIZE,
> so that one BAR's mmio page would not be shared with other
> BARs. We can also disable it through kernel parameter
> "pci=resource_page_aligned=off".
> 
> For the default value of the parameter, we think it should be
> arch-independent, so we add a macro
> HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
> define this macro to enable this parameter by default on PPC64
> platform which can easily hit this performance issue because
> its PAGE_SIZE is 64KB.
> 
> Note that the kernel parameter won't work if the kernel doesn't do
> resource reallocation.

And where do you account for this so that we know whether it's really in
effect?

> Signed-off-by: Yongji Xie 
> ---
>  Documentation/kernel-parameters.txt |5 +
>  arch/powerpc/include/asm/pci.h  |   11 +++
>  drivers/pci/pci.c   |   35 
> +++
>  drivers/pci/pci.h   |8 +++-
>  include/linux/pci.h |4 
>  5 files changed, 62 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index 742f69d..3f2a7c9 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
>   PAGE_SIZE is used as alignment.
>   PCI-PCI bridge can be specified, if resource
>   windows need to be expanded.
> + resource_page_aligned=  Enable/disable enforcing the alignment
> + of all PCI devices' memory resources to be
> + at least PAGE_SIZE if resources reallocation
> + is done by kernel.
> + Format: { "on" | "off" }
>   ecrc=   Enable/disable PCIe ECRC (transaction layer
>   end-to-end CRC checking).
>   bios: Use BIOS/firmware settings. This is the
> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> index 3453bd8..2d2b3ef 100644
> --- a/arch/powerpc/include/asm/pci.h
> +++ b/arch/powerpc/include/asm/pci.h
> @@ -136,6 +136,17 @@ extern pgprot_t  pci_phys_mem_access_prot(struct file 
> *file,
>    unsigned long pfn,
>    unsigned long size,
>    pgprot_t prot);
> +#ifdef CONFIG_PPC64
> +
> +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
> + * by default. This would be helpful to improve performance
> + * when we passthrough a PCI device of which BARs are smaller
> + * than PAGE_SIZE(64KB). And we can use kernel parameter
> + * "pci=resource_page_aligned=off" to disable it.
> + */
> +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED  1
> +
> +#endif
>  
>  #define HAVE_ARCH_PCI_RESOURCE_TO_USER
>  extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 314db8c..7b21238 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -99,6 +99,9 @@ u8 pci_cache_line_size;
>   */
>  unsigned int pcibios_max_latency = 255;
>  
> +bool pci_resources_page_aligned =
> + IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED);

I don't think this is proper use of IS_ENABLED, which seems to be
targeted at CONFIG_ type options.  You could define this as that in an
arch Kconfig.
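
For context, a self-contained sketch of the kind of preprocessor trick that a test like IS_ENABLED() can be built on. The SKETCH_/sketch_ names are invented for this example and are not the kernel's macros. The point is that such a test yields 1 only for a symbol defined to exactly 1, which is the convention Kconfig-generated CONFIG_ symbols follow:

```c
/* If the symbol expands to 1, pasting yields this placeholder: "0," */
#define SKETCH_ARG_PLACEHOLDER_1 0,

/* Return the second argument; the first is either "0" or junk. */
#define sketch_take_second(ignored, val, ...) val

/* Extra indirection so the symbol is expanded before token pasting. */
#define sketch_is_one(x)   sketch_is_one_(x)
#define sketch_is_one_(v)  sketch_is_one__(SKETCH_ARG_PLACEHOLDER_##v)

/*
 * Defined to 1:  sketch_take_second(0, 1, 0, 0)      -> 1
 * Anything else: sketch_take_second(<junk> 1, 0, 0)  -> 0
 */
#define sketch_is_one__(arg1_or_junk) sketch_take_second(arg1_or_junk 1, 0, 0)

/* Hypothetical symbols used to exercise the macro. */
#define DEMO_OPTION_ON 1
#define DEMO_OPTION_EMPTY
/* DEMO_OPTION_MISSING is deliberately left undefined. */
```

With these definitions, sketch_is_one(DEMO_OPTION_ON) is 1, while DEMO_OPTION_EMPTY and DEMO_OPTION_MISSING both give 0, so a HAVE_ style macro only works with such a test if it happens to be defined as 1.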

> +
>  /* If set, the PCIe ARI capability will not be used. */
>  static bool pcie_ari_disabled;
>  
> @@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_store(struct 
> bus_type *bus,
>  BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show,
>   pci_resource_alignment_store);
>  
> +static void pci_resources_get_page_aligned(char *str)
> +{
> + if (!strncmp(str, "off", 3))
> + pci_resources_page_aligned = false;
> + else if (!strncmp(str, "on", 2))
> + pci_resources_page_aligned = true;
> +}

"get"?

> +
> +/*
> + * This function checks whether PCI BARs' mmio page will be shared
> + * with other BARs.
> + */
> +bool pci_resources_share_page(struct pci_dev *dev, int resno)
> +{
> + struct resource *res = dev->resource + resno;
> +
> + if (resource_size(res) >=

[no subject]

2016-01-28 Thread David Rientjes via Linuxppc-dev

--- Begin Message ---
On Thu, 28 Jan 2016, Christian Borntraeger wrote:

> Indeed, I only touched the identity mapping and dump stack.
> The question is do we really want to change free_init_pages as well?
> The unmapping during runtime causes significant overhead, but the
> unmapping after init imposes almost no runtime overhead. Of course,
> things get fishy now as what is enabled and what not.
> 
> Kconfig after my patch "mm/debug_pagealloc: Ask users for default setting of 
> debug_pagealloc"
> (in mm) now states
> snip
> By default this option will have a small overhead, e.g. by not
> allowing the kernel mapping to be backed by large pages on some
> architectures. Even bigger overhead comes when the debugging is
> enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
> command line parameter.
> snip
> 
> So I am tempted to NOT change free_init_pages, but the x86 maintainers
> can certainly decide differently. Ingo, Thomas, H. Peter, please advise.
> 

I'm sorry, but I thought the discussion of the previous version of the 
patchset led to deciding that all CONFIG_DEBUG_PAGEALLOC behavior would be 
controlled by being enabled on the commandline and checked with 
debug_pagealloc_enabled().

I don't think we should have a CONFIG_DEBUG_PAGEALLOC that does some stuff 
and then a commandline parameter or CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT 
to enable more stuff.  It should either be all enabled by the commandline 
(or config option) or split into a separate entity.  
CONFIG_DEBUG_PAGEALLOC_LIGHT and CONFIG_DEBUG_PAGEALLOC would be fine, but 
the current state is very confusing about what is being done and what 
isn't.

It also wouldn't hurt to enumerate what is enabled and what isn't enabled 
in the Kconfig entry.
--- End Message ---
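
A minimal user-space sketch of the "one switch controls everything" arrangement argued for above. All demo_ names are hypothetical stand-ins; this is not the kernel's implementation of debug_pagealloc_enabled():

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical build-time default, standing in for a Kconfig choice. */
#ifndef DEMO_DEBUG_ENABLE_DEFAULT
#define DEMO_DEBUG_ENABLE_DEFAULT 0
#endif

static bool demo_debug_enabled_flag = DEMO_DEBUG_ENABLE_DEFAULT;

/* Hypothetical command-line hook: only exact "on"/"off" change the state. */
int demo_parse_debug_param(const char *arg)
{
	if (strcmp(arg, "on") == 0)
		demo_debug_enabled_flag = true;
	else if (strcmp(arg, "off") == 0)
		demo_debug_enabled_flag = false;
	else
		return -1; /* unknown value: leave the state untouched */
	return 0;
}

/* The single predicate that every call site consults. */
bool demo_debug_enabled(void)
{
	return demo_debug_enabled_flag;
}

/* A call site tests the predicate at run time instead of using #ifdef. */
const char *demo_die_banner(void)
{
	return demo_debug_enabled() ? "DEBUG_PAGEALLOC " : "";
}
```

Because every behavior goes through demo_debug_enabled(), there is no state in which part of the debugging is active and part is not.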

Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported

2016-01-28 Thread Alex Williamson

On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> MSI-X tables are not allowed to be mmapped in vfio-pci
> driver in case that user get to touch this directly.
> This will cause some performance issues when PCI
> adapters have critical registers in the same page as
> the MSI-X table.
> 
> However, some kind of PCI host bridge such as IODA bridge
> on Power support filtering of MSIs, which can ensure that a
> given pci device can only shoot the MSIs assigned for it.
> So we think it's safe to expose the MSI-X table to userspace
> if filtering of MSIs is supported because the exposed MSI-X
> table can't be used to do harm to other memory space.
> 
> To support this case, this patch adds a pci_host_bridge
> attribute to indicate if this PCI host bridge supports
> filtering of MSIs.
> 
> Signed-off-by: Yongji Xie 
> ---
>  drivers/pci/host-bridge.c |6 ++
>  include/linux/pci.h   |3 +++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> index 5f4a2e0..c029267 100644
> --- a/drivers/pci/host-bridge.c
> +++ b/drivers/pci/host-bridge.c
> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct 
> resource *res,
>   res->end = region->end + offset;
>  }
>  EXPORT_SYMBOL(pcibios_bus_to_resource);
> +
> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
> +{
> + return pci_find_host_bridge(pdev->bus)->msi_filtered;
> +}
> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index b640d65..b952b78 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -412,6 +412,7 @@ struct pci_host_bridge {
>   void (*release_fn)(struct pci_host_bridge *);
>   void *release_data;
>   unsigned int ignore_reset_delay:1;  /* for entire hierarchy */
> + unsigned int msi_filtered:1;/* support filtering of MSIs */
>   /* Resource alignment requirements */
>   resource_size_t (*align_resource)(struct pci_dev *dev,
>   const struct resource *res,
> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct pci_host_bridge 
> *bridge,
>  
>  int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
>  
> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
> +
>  /*
>   * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
>   * to P2P or CardBus bridge windows) go in a table.  Additional ones (for

Don't we already have a flag for this in the IOMMU space?

enum iommu_cap {
IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU can enforce cache coherent DMA
   transactions */
--->IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
};


Re: [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs

2016-01-28 Thread Alex Williamson

On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> Current vfio-pci implementation disallows to mmap MSI-X
> table in case that user get to touch this directly.
> 
> But we should allow to mmap these MSI-X tables if the PCI
> host bridge supports filtering of MSIs.
> 
> Signed-off-by: Yongji Xie 
> ---
>  drivers/vfio/pci/vfio_pci.c |6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 11fd0f0..4d68f6a 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -555,7 +555,8 @@ static long vfio_pci_ioctl(void *device_data,
>   IORESOURCE_MEM && !pci_resources_share_page(pdev,
>   info.index)) {
>   info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
> - if (info.index == vdev->msix_bar) {
> + if (!pci_host_bridge_msi_filtered_enabled(pdev) 
> &&
> + info.index == vdev->msix_bar) {
>   ret = msix_sparse_mmap_cap(vdev, );
>   if (ret)
>   return ret;
> @@ -967,7 +968,8 @@ static int vfio_pci_mmap(void *device_data, struct 
> vm_area_struct *vma)
>   if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
>   return -EINVAL;
>  
> - if (index == vdev->msix_bar) {
> + if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
> + index == vdev->msix_bar) {
>   /*
>    * Disallow mmaps overlapping the MSI-X table; users don't
>    * get to touch this directly.  We could find somewhere

What about read()/write() access, why would we allow mmap() but not
those?


Re: [PATCH] documentation: Add disclaimer

2016-01-28 Thread David Howells

Paul E. McKenney  wrote:

> Good point!  Would you be willing to add a Signed-off-by so I
> can take the combined change, assuming Peter and Will are good
> with it?

Sure!

David

[PATCH 29/31] Add debugger entry points for POWERPC

2016-01-28 Thread Jeffrey Merkey

This patch series adds an export which can be set by system debuggers to
direct the hard lockup and soft lockup detector to trigger a breakpoint
exception and enter a debugger if one is active.  It is assumed that if
someone sets this variable, then a breakpoint handler of some sort will
be actively loaded or registered via the notify die handler chain.

This addition is extremely useful for debugging hard and soft lockups
real time and quickly from a console debugger.

Signed-off-by: Jeffrey Merkey 
---
 arch/powerpc/include/asm/kdebug.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/kdebug.h 
b/arch/powerpc/include/asm/kdebug.h
index ae6d206..54f5ca8 100644
--- a/arch/powerpc/include/asm/kdebug.h
+++ b/arch/powerpc/include/asm/kdebug.h
@@ -11,5 +11,10 @@ enum die_val {
DIE_SSTEP,
 };
 
+static inline void arch_breakpoint(void)
+{
+   asm(".long 0x7d821008"); /* twge r2, r2 */
+}
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_KDEBUG_H */
-- 
1.8.3.1
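
The raw opcode in the patch can be cross-checked against its comment. The sketch below assembles an X-form "tw TO,RA,RB" word by hand, assuming the usual Power ISA layout (primary opcode 31, extended opcode 4, TO value 12 for "greater than or equal"). encode_tw and TO_GE are names made up for this example:

```c
#include <stdint.h>

/* TO field for "ge": trap if RA > RB or RA == RB (signed compare). */
#define TO_GE 12u

/*
 * Build the 32-bit word for "tw TO,RA,RB":
 * primary opcode 31 in the top six bits, then the TO, RA and RB fields
 * (five bits each), then extended opcode 4 above the reserved low bit.
 */
static uint32_t encode_tw(unsigned int to, unsigned int ra, unsigned int rb)
{
	return (31u << 26) |
	       ((to & 0x1fu) << 21) |
	       ((ra & 0x1fu) << 16) |
	       ((rb & 0x1fu) << 11) |
	       (4u << 1);
}
```

encode_tw(TO_GE, 2, 2) evaluates to 0x7d821008, which matches the constant and the "twge r2, r2" comment. Since r2 always equals itself, the trap is unconditional.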


Re: [PATCH v7 3/6] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path

2016-01-28 Thread Viresh Kumar

On 28-01-16, 12:55, Shilpasri G Bhat wrote:
> cpu_to_chip_id() does a DT walk through to find out the chip id by
> taking a contended device tree lock. This adds an unnecessary overhead
> in a hot path. So instead of calling cpu_to_chip_id() every time, cache
> the chip ids for all cores in the array 'core_to_chip_map' and use it
> in the hotpath.
> 
> Reported-by: Anton Blanchard 
> Signed-off-by: Shilpasri G Bhat 
> ---
> Changes from v6:
> - Minor changes to move the code 'cpumask_copy()' after 'core_to_chip_map'
>   is allocated.
> - Move 'kfree(chips)' to a separate patch.

See, you weren't that bad :)

Just that you missed saying that individual patches contain
version-log in cover-letter :)

Acked-by: Viresh Kumar 

-- 
viresh
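
The caching idea in the quoted commit message can be sketched as follows. This is a user-space illustration under made-up assumptions (16 CPUs, 4 threads per core, 2 cores per chip). The demo_ functions are hypothetical stand-ins for the driver's code and for the device tree walk:

```c
#define DEMO_NR_CPUS          16
#define DEMO_THREADS_PER_CORE 4
#define DEMO_CORES_PER_CHIP   2
#define DEMO_NR_CORES (DEMO_NR_CPUS / DEMO_THREADS_PER_CORE)

static int demo_slow_lookups;                    /* how often the slow path ran */
static int demo_core_to_chip_map[DEMO_NR_CORES]; /* filled once at init */

/* Stand-in for the expensive lookup (the real one walks the device tree). */
static int demo_slow_cpu_to_chip_id(int cpu)
{
	demo_slow_lookups++;
	return cpu / (DEMO_THREADS_PER_CORE * DEMO_CORES_PER_CHIP);
}

/* Run once at init: one slow lookup per core, using its first thread. */
void demo_init_chip_map(void)
{
	for (int core = 0; core < DEMO_NR_CORES; core++)
		demo_core_to_chip_map[core] =
			demo_slow_cpu_to_chip_id(core * DEMO_THREADS_PER_CORE);
}

/* Hot path: a plain array read, with no lock and no tree walk. */
int demo_cached_chip_id(int cpu)
{
	return demo_core_to_chip_map[cpu / DEMO_THREADS_PER_CORE];
}
```

After demo_init_chip_map() has run, any number of demo_cached_chip_id() calls leave demo_slow_lookups unchanged, which is the property the patch is after.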

Re: [PATCH v7 6/6] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-28 Thread Gautham R Shenoy

Hi Shilpa,

A minor nit.
On Thu, Jan 28, 2016 at 12:55:41PM +0530, Shilpasri G Bhat wrote:

[..snip..]
> +
> +What:
> /sys/devices/system/cpu/cpufreq/chip*/throttle_reasons/
> +Date:Jan 2016
> +Contact: Linux kernel mailing list 
> + Linux for PowerPC mailing list 
> +Description: CPU Frequency throttle reason stat for the chip
> +
> + This directory contains throttle reason files. Each file gives
> + the total number of times the max frequency is throttled, except
> + for 'unthrottle_count', which gives the total number of times
> + the max frequency is unthrottled after being throttled. Below
> + are the reason attributes.
> +
> + cpu_over_temperature: Throttled due to cpu over temperature
> +
> + occ_reset: Throttled due to reset of OCC
> +
> + over_current: Throttled due to over current

Overcurrent is a single word. No need of the extra space. You could
fix that and add my Reviewed-by.

--
Thanks and Regards
gautham.
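
The counters described in the quoted ABI text could be modelled roughly as below. This is only a sketch: the struct layout and function names are hypothetical, and it assumes every throttle event bumps its reason counter while unthrottle_count only counts transitions out of the throttled state:

```c
enum demo_throttle_reason {
	DEMO_CPU_OVER_TEMPERATURE,
	DEMO_OCC_RESET,
	DEMO_OVER_CURRENT,
	DEMO_NR_REASONS
};

/* One instance per chip; each field would back one sysfs file. */
struct demo_chip_stats {
	unsigned int reason[DEMO_NR_REASONS]; /* times throttled, per reason */
	unsigned int unthrottle_count;        /* times throttling was lifted */
	int throttled;                        /* non-zero while throttled */
};

/* Record that the max frequency was throttled for the given reason. */
void demo_record_throttle(struct demo_chip_stats *s,
			  enum demo_throttle_reason r)
{
	s->reason[r]++;
	s->throttled = 1;
}

/* Record an unthrottle, but only if the chip was actually throttled. */
void demo_record_unthrottle(struct demo_chip_stats *s)
{
	if (s->throttled) {
		s->unthrottle_count++;
		s->throttled = 0;
	}
}
```

Reading a counter is then a plain load, so nothing needs to be printed to the console when an event occurs.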


[PATCH 1/2] powerpc/mm: Enable HugeTLB page migration

2016-01-28 Thread Anshuman Khandual

This enables HugeTLB page migration for PPC64_BOOK3S systems which implement
HugeTLB page at the PMD level. It enables the kernel configuration option
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION by default which turns on the function
hugepage_migration_supported() during migration. After the recent changes
to the PTE format, HugeTLB page migration happens successfully.

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/Kconfig | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e4824fd..65d52a0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -82,6 +82,10 @@ config GENERIC_HWEIGHT
 config ARCH_HAS_DMA_SET_COHERENT_MASK
 bool
 
+config ARCH_ENABLE_HUGEPAGE_MIGRATION
+   def_bool y
+   depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
+
 config PPC
bool
default y
-- 
2.1.0
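
A stand-alone sketch of how a Kconfig-selected symbol can switch such a check on. The DEMO_ names are hypothetical and the shift value of 24 (16MB) is an assumption for illustration. This is not the kernel's definition of hugepage_migration_supported():

```c
#include <stdbool.h>

/* Hypothetical stand-in for the symbol the Kconfig entry would define. */
#define DEMO_ARCH_ENABLE_HUGEPAGE_MIGRATION 1

/* Hypothetical shift of a PMD-level mapping; 24 corresponds to 16MB. */
#define DEMO_PMD_SHIFT 24

/* Report whether a huge page of the given size (as a shift) may migrate. */
static inline bool demo_hugepage_migration_supported(unsigned int page_shift)
{
#ifdef DEMO_ARCH_ENABLE_HUGEPAGE_MIGRATION
	/* Only huge pages mapped at the PMD level are considered migratable. */
	return page_shift == DEMO_PMD_SHIFT;
#else
	(void)page_shift;
	return false; /* option off: migration is never attempted */
#endif
}
```

With the symbol defined, a 16MB page (shift 24) is reported as migratable and other sizes are not. Without it, the function collapses to a constant false.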


Re: [PATCH v7 1/6] cpufreq: powernv: Free 'chips' on module exit

2016-01-28 Thread Gautham R Shenoy

On Thu, Jan 28, 2016 at 12:55:36PM +0530, Shilpasri G Bhat wrote:
> This will free the dynamically allocated memory of 'chips' on
> module exit.
> 
> Signed-off-by: Shilpasri G Bhat 

Reviewed-by: Gautham R. Shenoy 

--
Thanks and Regards
gautham.


Re: [PATCH v7 0/6] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver

2016-01-28 Thread Viresh Kumar

On 28-01-16, 12:55, Shilpasri G Bhat wrote:
> In POWER8, OCC(On-Chip-Controller) can throttle the frequency of the
> CPU when the chip crosses its thermal and power limits. Currently,
> powernv-cpufreq driver detects and reports this event as a console
> message. Some machines may not sustain the max turbo frequency in all
> conditions and can be throttled frequently. This can lead to the
> flooding of console with throttle messages. So this patchset aims to
> redesign the presentation of this event via sysfs counters and
> tracepoints. And it also fixes a couple of bugs reported in the driver.
> 
> - Patch [1] fixes a memory leak bug
> - Patch [2] fixes the cpu hot-plug bug in powernv_cpufreq_work_fn().
> - Patch [3] solves a bug in powernv_cpufreq_throttle_check(), which
>   calls in to cpu_to_chip_id() in hot path which reads DT every time
>   to find the chip id.
> - Patches [4] to [6] will add a perf trace point
>   "power:powernv_throttle" and sysfs throttle counter stats in
>   /sys/devices/system/cpu/cpufreq/chipN.
> 
> Changes from v6:
> - Changes wrt comments from Balbir Singh and Viresh Kumar.

Who cares about these names in version-log ?? You have completely
missed what should have been present here. This is version log and
that's what should be present here :)

And because of that, I have to
- search for your earlier version in my mailbox
- Read all my comments
- Haven't read what Balbir have said

See ..

-- 
viresh

Re: [PATCH v7 6/6] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-28 Thread Viresh Kumar

On 28-01-16, 12:55, Shilpasri G Bhat wrote:
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
> b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index b683e8e..dea4620 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -271,3 +271,48 @@ Description: Parameters for the CPU cache attributes
>   - WriteBack: data is written only to the cache line and
>the modified cache line is written to main
>memory only when it is replaced
> +
> +What:/sys/devices/system/cpu/cpufreq/chip*/throttle_stats

What about the chip directory ? Shouldn't that be documented? And
shouldn't that mention that this is just for powerpc ?

And before that, I don't think that you are doing this properly. I am
sorry that I never came to a point where I could review it, and you
continued with it, version after version.

But, I really have strong objections to the way this is done. And you
are making things more complex than they are.

So, these stats are per-policy, right ?

Then why aren't they added on the policy->kobj instead, just like
cpufreq-stats? And maybe inside cpufreq-stats folder only?

That will solve many complexities you have in place here and will look
sane as well.

Right now, you have stats at two places, cpu/cpufreq/chip/ and
cpu/cpuX/cpufreq/stats/, which doesn't look wise and adds to
confusion.

What do you say?

-- 
viresh

Re: [PATCH v7 6/6] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-28 Thread Viresh Kumar

On 28-01-16, 15:06, Shilpasri G Bhat wrote:
> No these stats are not per-policy. They are per-chip. The throttle event is
> common for all cores in the chip.

How do you define a chip? And how is it different than the group of
CPUs represented by the policy ?

-- 
viresh

[PATCH 2/2] selfttest/powerpc: Add memory page migration tests

2016-01-28 Thread Anshuman Khandual

This adds two tests for memory page migration. One is for normal page
migration, which works on kernels with either a 4K or a 64K base page
size. The other is for 16MB huge page migration, which will work with
16MB huge pages on either a 4K or a 64K base page size as and when we
support huge page migration.

Signed-off-by: Anshuman Khandual 
---
Changes in V3:
- Minor changes to the code for considering skipped pages
- Enabled HugeTLB test in the script as it works now

Changes in V2:
- Changed the script to accommodate review comments from Michael
- Disabled huge page migration test till it is supported on POWER

Sample test result
==

Test HugeTLB vs THP

test: hugetlb_vs_thp
tags: git_version:v4.5-rc1-30-gda30491
success: hugetlb_vs_thp
[PASS]
.
Test subpage protection
.
test: subpage_prot_anon
tags: git_version:v4.5-rc1-30-gda30491
allocated malloc block of 0x400 bytes at 0x0x3fff8072
testing malloc block...
success: subpage_prot_anon
test: subpage_prot_file
tags: git_version:v4.5-rc1-30-gda30491
allocated tempfile for 0x1 bytes at 0x0x3fff8472
testing file map...
success: subpage_prot_file
[PASS]
...
Test normal page migration
...
test: page_migration
tags: git_version:v4.5-rc1-30-gda30491
Running on base page size 64K
64 moved 0 skipped 0 failed
1024 moved 0 skipped 0 failed
3328 moved 768 skipped 0 failed
4352 moved 3840 skipped 0 failed
8448 moved 7936 skipped 0 failed
16640 moved 16128 skipped 0 failed
success: page_migration
[PASS]
.
Test huge page migration
.
test: hugepage_migration
tags: git_version:v4.5-rc1-30-gda30491
Running on base page size 64K
1 moved 0 skipped 0 failed
16 moved 0 skipped 0 failed
32 moved 0 skipped 0 failed
success: hugepage_migration
[PASS]



 tools/testing/selftests/powerpc/mm/Makefile|  14 +-
 .../selftests/powerpc/mm/hugepage-migration.c  |  30 +++
 tools/testing/selftests/powerpc/mm/migration.h | 204 +
 .../testing/selftests/powerpc/mm/page-migration.c  |  33 
 tools/testing/selftests/powerpc/mm/run_mmtests | 104 +++
 5 files changed, 380 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/hugepage-migration.c
 create mode 100644 tools/testing/selftests/powerpc/mm/migration.h
 create mode 100644 tools/testing/selftests/powerpc/mm/page-migration.c
 create mode 100755 tools/testing/selftests/powerpc/mm/run_mmtests

diff --git a/tools/testing/selftests/powerpc/mm/Makefile 
b/tools/testing/selftests/powerpc/mm/Makefile
index ee179e2..c482614 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -1,12 +1,16 @@
 noarg:
$(MAKE) -C ../
 
-TEST_PROGS := hugetlb_vs_thp_test subpage_prot
-TEST_FILES := tempfile
+TEST_PROGS := run_mmtests
+TEST_FILES := hugetlb_vs_thp_test
+TEST_FILES += subpage_prot
+TEST_FILES += tempfile
+TEST_FILES += hugepage-migration
+TEST_FILES += page-migration
 
-all: $(TEST_PROGS) $(TEST_FILES)
+all: $(TEST_FILES)
 
-$(TEST_PROGS): ../harness.c
+$(TEST_FILES): ../harness.c
 
 include ../../lib.mk
 
@@ -14,4 +18,4 @@ tempfile:
dd if=/dev/zero of=tempfile bs=64k count=1
 
 clean:
-   rm -f $(TEST_PROGS) tempfile
+   rm -f $(TEST_FILES)
diff --git a/tools/testing/selftests/powerpc/mm/hugepage-migration.c 
b/tools/testing/selftests/powerpc/mm/hugepage-migration.c
new file mode 100644
index 000..b60bc10
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/hugepage-migration.c
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2015, Anshuman Khandual, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include "migration.h"
+
+static int hugepage_migration(void)
+{
+   int ret = 0;
+
+   if ((unsigned long)getpagesize() == 0x1000)
+   printf("Running on base page size 4K\n");
+
+   if ((unsigned long)getpagesize() == 0x10000)
+   printf("Running on base page size 64K\n");
+
+   ret = test_huge_migration(16 * MEM_MB);
+   ret |= test_huge_migration(256 * MEM_MB);
+   ret |= test_huge_migration(512 * MEM_MB);
+
+   return ret;
+}
+
+int main(void)
+{
+   return test_harness(hugepage_migration, "hugepage_migration");
+}
diff --git a/tools/testing/selftests/powerpc/mm/migration.h 
b/tools/testing/selftests/powerpc/mm/migration.h
new file mode 100644
index 000..fe35849
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/migration.h
@@ -0,0 +1,204 @@
+/*
+ * Copyright (C) 2015, Anshuman Khandual, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as

Re: [PATCH v7 3/6] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path

2016-01-28 Thread Gautham R Shenoy

On Thu, Jan 28, 2016 at 12:55:38PM +0530, Shilpasri G Bhat wrote:
> cpu_to_chip_id() does a DT walk through to find out the chip id by
> taking a contended device tree lock. This adds an unnecessary overhead
> in a hot path. So instead of calling cpu_to_chip_id() every time, cache
> the chip ids for all cores in the array 'core_to_chip_map' and use it
> in the hotpath.
> 
> Reported-by: Anton Blanchard 
> Signed-off-by: Shilpasri G Bhat 

Reviewed-by: Gautham R. Shenoy 

--
Thanks and Regards
gautham.


Re: [PATCH v7 5/6] cpufreq: powernv: Replace pr_info with trace print for throttle event

2016-01-28 Thread Gautham R Shenoy

On Thu, Jan 28, 2016 at 12:55:40PM +0530, Shilpasri G Bhat wrote:
> Currently we use printk message to notify the throttle event. But this
> can flood the console if the cpu is throttled frequently. So replace the
> printk with the tracepoint to notify the throttle event. And also events
> like throttle below nominal frequency and OCC_RESET are reduced to
> pr_warn/pr_warn_once as pointed by MFG to not mark them as critical
> messages. This patch adds 'throttle_reason' to struct chip to store the
> throttle reason.
> 
> Signed-off-by: Shilpasri G Bhat 

Reviewed-by: Gautham R. Shenoy 

--
Thanks and Regards
gautham.


Re: [PATCH v7 5/6] cpufreq: powernv: Replace pr_info with trace print for throttle event

2016-01-28 Thread Viresh Kumar

On 28-01-16, 12:55, Shilpasri G Bhat wrote:
> Currently we use printk message to notify the throttle event. But this
> can flood the console if the cpu is throttled frequently. So replace the
> printk with the tracepoint to notify the throttle event. And also events
> like throttle below nominal frequency and OCC_RESET are reduced to
> pr_warn/pr_warn_once as pointed by MFG to not mark them as critical
> messages. This patch adds 'throttle_reason' to struct chip to store the
> throttle reason.
> 
> Signed-off-by: Shilpasri G Bhat 
> ---
> Changes from v6:
> - Rename struct chip member 'throt_reason' to 'throttle_reason'

Acked-by: Viresh Kumar 

-- 
viresh

Re: [PATCH v7 6/6] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-28 Thread Shilpasri G Bhat

Hi Viresh,

On 01/28/2016 02:10 PM, Viresh Kumar wrote:
> On 28-01-16, 12:55, Shilpasri G Bhat wrote:
>> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
>> b/Documentation/ABI/testing/sysfs-devices-system-cpu
>> index b683e8e..dea4620 100644
>> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
>> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
>> @@ -271,3 +271,48 @@ Description:Parameters for the CPU cache attributes
>>  - WriteBack: data is written only to the cache line and
>>   the modified cache line is written to main
>>   memory only when it is replaced
>> +
>> +What:   /sys/devices/system/cpu/cpufreq/chip*/throttle_stats
> 
> What about the chip directory ? Shouldn't that be documented? And
> shouldn't that mention that this is just for powerpc ?
> 
> And before that, I don't think that you are doing this properly. I am
> sorry that I never came to a point where I could review it, and you
> continued with it, version after version.
> 
> But, I really have strong objections to the way this is done. And you
> are making things more complex than they are.
> 
> So, these stats are per-policy, right ?

First of all sorry about the version log.
No these stats are not per-policy. They are per-chip. The throttle event is
common for all cores in the chip.

> 
> Then why aren't they added on the policy->kobj instead, just like
> cpufreq-stats? And maybe inside cpufreq-stats folder only?
> 
> That will solve many complexities you have in place here and will look
> sane as well.
> 
> Right now, you have stats at two places, cpu/cpufreq/chip/ and
> cpu/cpuX/cpufreq/stats/, which doesn't look wise and adds to
> confusion.
> 
> What do you say?
> 

Yes, I agree that it will be much cleaner with policy->kobj. But using policy->kobj
will result in multiple copies of the per-chip throttle stats being exported, one
for each policy in the chip. And moving them to cpu/cpuX/cpufreq/stats/
will add a dependency on CONFIG_CPU_FREQ_STAT.

We want the throttle attributes to be either in cpu/cpufreq or cpu/cpuX/cpufreq. If
multiple copies are not an issue, then I will move them to cpu/cpuX/cpufreq.

Thanks and Regards,
Shilpa


Re: [PATCH v3 2/3] x86: query dynamic DEBUG_PAGEALLOC setting

2016-01-28 Thread Christian Borntraeger

On 01/27/2016 11:17 PM, David Rientjes wrote:
> On Wed, 27 Jan 2016, Christian Borntraeger wrote:
> 
>> We can use debug_pagealloc_enabled() to check if we can map
>> the identity mapping with 2MB pages. We can also add the state
>> into the dump_stack output.
>>
>> The patch does not touch the code for the 1GB pages, which ignored
>> CONFIG_DEBUG_PAGEALLOC. Do we need to fence this as well?
>>
>> Signed-off-by: Christian Borntraeger 
>> Reviewed-by: Thomas Gleixner 
>> ---
>>  arch/x86/kernel/dumpstack.c |  5 ++---
>>  arch/x86/mm/init.c  |  7 ---
>>  arch/x86/mm/pageattr.c  | 14 --
>>  3 files changed, 10 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
>> index 9c30acf..32e5699 100644
>> --- a/arch/x86/kernel/dumpstack.c
>> +++ b/arch/x86/kernel/dumpstack.c
>> @@ -265,9 +265,8 @@ int __die(const char *str, struct pt_regs *regs, long err)
>>  #ifdef CONFIG_SMP
>>  printk("SMP ");
>>  #endif
>> -#ifdef CONFIG_DEBUG_PAGEALLOC
>> -printk("DEBUG_PAGEALLOC ");
>> -#endif
>> +if (debug_pagealloc_enabled())
>> +printk("DEBUG_PAGEALLOC ");
>>  #ifdef CONFIG_KASAN
>>  printk("KASAN");
>>  #endif
>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>> index 493f541..39823fd 100644
>> --- a/arch/x86/mm/init.c
>> +++ b/arch/x86/mm/init.c
>> @@ -150,13 +150,14 @@ static int page_size_mask;
>>  
>>  static void __init probe_page_size_mask(void)
>>  {
>> -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK)
>> +#if !defined(CONFIG_KMEMCHECK)
>>  /*
>> - * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.
>> + * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will
>> + * use small pages.
>>   * This will simplify cpa(), which otherwise needs to support splitting
>>   * large pages into small in interrupt context, etc.
>>   */
>> -if (cpu_has_pse)
>> +if (cpu_has_pse && !debug_pagealloc_enabled())
>>  page_size_mask |= 1 << PG_LEVEL_2M;
>>  #endif
>>  
> 
> I would have thought free_init_pages() would be modified to use 
> debug_pagealloc_enabled() as well?


Indeed, I only touched the identity mapping and dump stack.
The question is do we really want to change free_init_pages as well?
The unmapping during runtime causes significant overhead, but the
unmapping after init imposes almost no runtime overhead. Of course,
things get fishy now as to what is enabled and what is not.

Kconfig after my patch "mm/debug_pagealloc: Ask users for default setting of
debug_pagealloc" (in mm) now states
snip
By default this option will have a small overhead, e.g. by not
allowing the kernel mapping to be backed by large pages on some
architectures. Even bigger overhead comes when the debugging is
enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
command line parameter.
snip

So I am tempted to NOT change free_init_pages, but the x86 maintainers
can certainly decide differently. Ingo, Thomas, H. Peter, please advise.
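
For readers following the thread, the runtime check on the identity mapping discussed above can be sketched in userspace roughly as follows. This is only an illustration: the enum, the function names and the two boolean parameters are stand-ins invented for the sketch (the kernel reads cpu_has_pse and debug_pagealloc_enabled() itself), and it is not the x86 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's page-level enum; the values are assumptions. */
enum pg_level_sketch {
	PG_LEVEL_NONE_SKETCH,
	PG_LEVEL_4K_SKETCH,
	PG_LEVEL_2M_SKETCH,
	PG_LEVEL_1G_SKETCH,
};

/* Bit used in the page size mask for a given mapping level. */
int page_size_bit(enum pg_level_sketch level)
{
	assert(level > PG_LEVEL_NONE_SKETCH && level <= PG_LEVEL_1G_SKETCH);
	return 1 << level;
}

/*
 * Hypothetical model of probe_page_size_mask() after the patch: 2MB
 * identity mappings are allowed only if the CPU supports them and
 * pagealloc debugging was not switched on at boot.
 */
int probe_page_size_mask_sketch(bool cpu_has_pse, bool debug_pagealloc)
{
	int mask = 0;

	if (cpu_has_pse && !debug_pagealloc)
		mask |= page_size_bit(PG_LEVEL_2M_SKETCH);

	return mask;
}
```

The point of the sketch is that the decision moves from build time (#ifdef) to boot time, so one kernel image can behave either way.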



Re: [PATCH v7 1/6] cpufreq: powernv: Free 'chips' on module exit

2016-01-28 Thread Viresh Kumar

On 28-01-16, 12:55, Shilpasri G Bhat wrote:
> This will free the dynamically allocated memory of'chips' on
> module exit.

Though there is a 'space' issue before 'chips', I don't really care
much about that, so you aren't required to resend unless you have
to send a v8 for something else.

> Signed-off-by: Shilpasri G Bhat 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index 547890f..53f980b 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -612,6 +612,7 @@ static void __exit powernv_cpufreq_exit(void)
>   unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
>   opal_message_notifier_unregister(OPAL_MSG_OCC,
>                                    &powernv_cpufreq_opal_nb);
> + kfree(chips);
>   cpufreq_unregister_driver(&powernv_cpufreq_driver);
>  }
>  module_exit(powernv_cpufreq_exit);

Acked-by: Viresh Kumar 

-- 
viresh

Re: [PATCH v7 0/6] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver

2016-01-28 Thread Balbir Singh

On Thu, Jan 28, 2016 at 6:25 PM, Shilpasri G Bhat wrote:
> In POWER8, OCC(On-Chip-Controller) can throttle the frequency of the
> CPU when the chip crosses its thermal and power limits. Currently,
> powernv-cpufreq driver detects and reports this event as a console
> message. Some machines may not sustain the max turbo frequency in all
> conditions and can be throttled frequently. This can lead to the
> flooding of console with throttle messages. So this patchset aims to
> redesign the presentation of this event via sysfs counters and
> tracepoints. And it also fixes couple of bugs reported in the driver.
>
> - Patch [1] fixes a memory leak bug
> - Patch [2] fixes the cpu hot-plug bug in powernv_cpufreq_work_fn().
> - Patch [3] solves a bug in powernv_cpufreq_throttle_check(), which
>   calls in to cpu_to_chip_id() in hot path which reads DT every time
>   to find the chip id.
> - Patches [4] to [6] will add a perf trace point
>   "power:powernv_throttle" and sysfs throttle counter stats in
>   /sys/devices/system/cpu/cpufreq/chipN.
>

Looks good to me. You've got the reviews and acks you need.

Balbir Singh

RE: [PATCH] B4860qds/B4420qds: Updates to device trees for B4860 for DSP clusters and their L2 caches

2016-01-28 Thread Ashish Kumar

Please ignore this mail. Will send another revision.

Regards
Ashish


Re: [PATCH 1/2] powerpc/mm: Enable HugeTLB page migration

2016-01-28 Thread Anshuman Khandual

On 01/28/2016 02:41 PM, Anshuman Khandual wrote:
> This enables HugeTLB page migration for PPC64_BOOK3S systems which implement
> HugeTLB page at the PMD level. It enables the kernel configuration option

As mentioned above, this works only for 16MB HugeTLB page migration on 64K base
pages, where the huge page is implemented directly at the PMD level as a single
PTE. It does not work for 16MB HugeTLB pages on 4K base pages, which are
implemented through a huge page directory. Because the generic VM migration code
does not look into the page table structure when initiating migration, migrating
a 16MB page on 4K base pages will just fail without stating the reason, which
can sometimes be confusing. I can edit the commit message to capture this
point if needed.


Re: [PATCH v6 1/9] ppc64 (le): prepare for -mprofile-kernel

2016-01-28 Thread Torsten Duwe

On Thu, Jan 28, 2016 at 03:26:59PM +1100, Michael Ellerman wrote:
> 
> That raises an interesting question, how does it work *without* DYNAMIC_FTRACE?
> 
> It looks like you haven't updated that version of _mcount at all? Or maybe I'm
> missing an #ifdef somewhere?

You didn't, I did. I haven't considered that combination.

> It doesn't look like that will work right with the -mprofile-kernel ABI. And
> indeed it doesn't boot.

The lean _mcount should handle it and boot, had I not misplaced it in
the #ifdefs, but then of course profiling wouldn't work.

> So we'll need to work that out. I guess the minimum would be to disable
> -mprofile-kernel if DYNAMIC_FTRACE is disabled.

I feel that supporting all combinations of ABIv1/ABIv2, FTRACE/DYNAMIC_FTRACE,
-p/-mprofile-kernel will get us into #ifdef hell, and at least one kernel
developer will go insane. That will probably be the one porting this
to ppc64be (ABIv1).

> Frankly I think we'd be happy to *only* support DYNAMIC_FTRACE, but the
> generic code doesn't let us do that at the moment.

Seconded.
I'll have a look at the Kconfigs.

> But it's better than the previous version which didn't boot :)

That's your fault, you picked the wrong compiler ;-)

> Also ftracetest fails at step 8:
>   ...
>   [8] ftrace - function graph filters with stack tracer
>   Unable to handle kernel paging request for data at address 
> 0xd33d7f70
[...]
> That doesn't happen without your series applied, though that doesn't 100% mean
> it's your bug. I haven't had time to dig any deeper.

Will check as well...

Torsten


[PATCH] B4860qds/B4420qds: Updates to device trees for B4860 for DSP clusters and their L2 caches

2016-01-28 Thread Ashish Kumar

B4860 has 1 PPC core cluster and 3 DSP core clusters.
Similarly B4420 has 1 PPC core cluster and 1 DSP core cluster.

Each DSP core cluster consists of 2 SC3900 cores and a shared L2 cache.

1. Add DSP clusters for B4420
2. Reorganize the L2 cache nodes so that they now appear only in the
SoC-specific dtsi files (b4860si-post.dtsi and b4420si-post.dtsi).
Earlier they were split between the common b4si-post.dtsi and the si-specific
b4860si-post.dtsi files.

Signed-off-by: Ashish Kumar 
Signed-off-by: Shaveta Leekha  
---
 arch/powerpc/boot/dts/fsl/b4420si-post.dtsi |8 
 arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi  |   23 
 arch/powerpc/boot/dts/fsl/b4860si-post.dtsi |   18 +
 arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi  |   52 +++
 arch/powerpc/boot/dts/fsl/b4si-post.dtsi|5 ---
 5 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
index 86161ae..c0fe250 100644
--- a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
@@ -102,5 +102,13 @@
 
L2: l2-cache-controller@c2 {
compatible = "fsl,b4420-l2-cache-controller";
+   reg = <0xc2 0x1000>;
+   next-level-cache = <>;
+   };
+/* Following is DSP L2 cache*/
+   L2_2: l2-cache-controller@c6 {
+   compatible = "fsl,b4420-l2-cache-controller";
+   reg = <0xc6 0x1000>;
+   next-level-cache = <>;
};
 };
diff --git a/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
index 338af7e..5fec4ea 100644
--- a/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
@@ -76,4 +76,27 @@
fsl,portid-mapping = <0x8000>;
};
};
+
+   dsp-clusters {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   dsp-cluster0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "fsl,sc3900-cluster";
+   reg = <0>;
+
+   dsp0: dsp@0 {
+   compatible = "fsl,sc3900";
+   reg = <0>;
+   next-level-cache = <&L2_2>;
+   };
+   dsp1: dsp@1 {
+   compatible = "fsl,sc3900";
+   reg = <1>;
+   next-level-cache = <&L2_2>;
+   };
+   };
+   };
 };
diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
index f35e9e0..19679d3 100644
--- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
@@ -204,5 +204,23 @@
 
L2: l2-cache-controller@c2 {
compatible = "fsl,b4860-l2-cache-controller";
+   reg = <0xc2 0x1000>;
+   next-level-cache = <>;
+   };
+/* Following are DSP L2 cache */
+   L2_2: l2-cache-controller@c6 {
+   compatible = "fsl,b4860-l2-cache-controller";
+   reg = <0xc6 0x1000>;
+   next-level-cache = <>;
+   };
+   L2_3: l2-cache-controller@ca {
+   compatible = "fsl,b4860-l2-cache-controller";
+   reg = <0xca 0x1000>;
+   next-level-cache = <>;
+   };
+   L2_4: l2-cache-controller@ce {
+   compatible = "fsl,b4860-l2-cache-controller";
+   reg = <0xce 0x1000>;
+   next-level-cache = <>;
};
 };
diff --git a/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
index 1948f73..2e5dcb6 100644
--- a/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
@@ -90,4 +90,56 @@
fsl,portid-mapping = <0x8000>;
};
};
+   dsp-clusters {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   dsp-cluster0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "fsl,sc3900-cluster";
+   reg = <0>;
+   dsp0: dsp@0 {
+   compatible = "fsl,sc3900";
+   reg = <0>;
+   next-level-cache = <&L2_2>;
+   };
+   dsp1: dsp@1 {
+   compatible = "fsl,sc3900";
+   reg = <1>;
+   next-level-cache = <&L2_2>;
+   };
+   };
+   dsp-cluster1 {
+

Re: [PATCH v6 0/9] ftrace with regs + live patching for ppc64 LE (ABI v2)

2016-01-28 Thread Torsten Duwe

On Thu, Jan 28, 2016 at 02:31:58PM +1100, Michael Ellerman wrote:
> 
> Looking at GCC history it looks like the fix is in 4.9.0 and anything later.

Good. But 4.8.5 has a buggy -mprofile-kernel, and there will be no 4.8.6. Bad.

> But a version check doesn't work with patched distro/vendor toolchains. So we
> probably need some sort of runtime check.

Agreed.

/bin/echo -e '#include \nnotrace int func() { return 0; }' |
 gcc -D__KERNEL__ -Iinclude -p -mprofile-kernel -x c -O2 - -S -o - | grep mcount

should be empty. If it yields "bl _mcount" your compiler is buggy.
I haven't looked at the kernel's "autoconf" yet, but it's probably capable
of testing this.

Torsten


Re: [PATCH v7 6/6] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-28 Thread Shilpasri G Bhat

On 01/28/2016 03:11 PM, Viresh Kumar wrote:
> On 28-01-16, 15:06, Shilpasri G Bhat wrote:
>> No these stats are not per-policy. They are per-chip. The throttle event is
>> common for all cores in the chip.
> 
> How do you define a chip? And how is it different than the group of
> CPUs represented by the policy ?
> 

A chip is a group of policies.
Hmm, yes, I see your point. We maintain per-policy frequency stats anyway, so
we might as well export the throttle stats per-policy, with each policy
pointing to the per-chip data.
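
A minimal sketch of that idea, assuming hypothetical structure and function names (these are not the driver's types): every policy on a chip holds a pointer to the same per-chip counters, so the data exists once even though it is visible through several policies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chip counters; one instance per chip. */
struct chip_stats_sketch {
	unsigned int id;
	int throttle_count;
};

/* Hypothetical per-policy view; several policies may share one chip. */
struct policy_sketch {
	unsigned int cpu;
	struct chip_stats_sketch *chip;
};

/* A throttle event is recorded once, on the chip. */
void record_throttle(struct chip_stats_sketch *chip)
{
	assert(chip != NULL);
	chip->throttle_count++;
}

/* Each policy reports the shared per-chip value. */
int policy_throttle_count(const struct policy_sketch *policy)
{
	assert(policy != NULL && policy->chip != NULL);
	return policy->chip->throttle_count;
}
```

With this layout the sysfs files are duplicated per policy, but the counters behind them are not.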


Re: [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform

2016-01-28 Thread Yongji Xie


Ping...

Alex, any comment?

Regards,
Yongji Xie

On 2016/1/15 15:06, Yongji Xie wrote:

The current vfio-pci implementation disallows mmapping
sub-page (size < PAGE_SIZE) MMIO BARs and the MSI-X table. This is because
a sub-page BAR's mmio page may be shared with other BARs, and the MSI-X table
should not be accessed directly from the guest for security reasons.

But these restrictions can easily cause performance issues for mmio accesses
in the guest when vfio passes through sub-page BARs or BARs containing the MSI-X
table on the PPC64 platform. This is because PAGE_SIZE is 64KB by default
on PPC64, so a page is likely to cover a sub-page MMIO BAR, which then cannot
be mmapped, and the mmio page in which the MSI-X table is located is left
unmapped as well, which leads to mmio emulation in the host.

For the sub-page MMIO BAR case, this patchset adds a kernel
parameter for the PCI resource allocator to enforce that all
MMIO BARs are aligned to at least PAGE_SIZE, and enables it by default on
the PPC64 platform, so that a sub-page BAR's mmio page will not be shared
with other BARs. We can then mmap sub-page MMIO BARs in the vfio-pci driver
when this parameter is enabled.
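
The effect of the enforced alignment can be sketched in userspace as below. The 64KB page size comes from the text; the function names and the idea of modelling a BAR as a (start, size) pair are assumptions made for the illustration, not the PCI core's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch: the 64KB PPC64 default mentioned above. */
#define SKETCH_PAGE_SIZE UINT64_C(0x10000)

/* Alignment an allocator would use for a BAR of the given power-of-two size. */
uint64_t bar_alignment(uint64_t size, bool enforce_page_align)
{
	assert(size != 0 && (size & (size - 1)) == 0);

	if (enforce_page_align && size < SKETCH_PAGE_SIZE)
		return SKETCH_PAGE_SIZE;
	return size;
}

/* True if the two BARs touch at least one common page. */
bool bars_share_page(uint64_t a_start, uint64_t a_size,
		     uint64_t b_start, uint64_t b_size)
{
	uint64_t a_first, a_last, b_first, b_last;

	assert(a_size != 0 && b_size != 0);

	a_first = a_start / SKETCH_PAGE_SIZE;
	a_last = (a_start + a_size - 1) / SKETCH_PAGE_SIZE;
	b_first = b_start / SKETCH_PAGE_SIZE;
	b_last = (b_start + b_size - 1) / SKETCH_PAGE_SIZE;

	/* Page ranges overlap if neither ends before the other begins. */
	return a_first <= b_last && b_first <= a_last;
}
```

Two 4KB BARs packed at their natural alignment land in the same 64KB page; once each is placed on its own page boundary, neither page is shared and each can be mmapped on its own.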

For the MSI-X table case, we think the MSI-X table is safe to access
directly from userspace if the PCI host bridge supports filtering of MSIs,
which ensures that a given PCI device can only raise the MSIs
assigned to it. So we add a pci_host_bridge attribute to indicate
whether the PCI host bridge supports filtering of MSIs. We can then mmap the
MSI-X table when this attribute is set.

With this patchset applied, we see almost a 100% performance improvement
for mmio accesses in our tests when we pass through sub-page BARs to the
guest.

The two vfio-related patches (patch 2 and patch 5) are based on the
proposed patchset[1].

Changelog v3:
- Rebase on new linux kernel mainline with the patchset[1] applied.
- Add a function to check whether a PCI BAR's mmio page is shared with
   other BARs.
- Add a host bridge attribute to indicate that the PCI host bridge supports
   filtering of MSIs.
- Use the new host bridge attribute to check if MSI-X table can
   be mmapped instead of CONFIG_EEH.
- Remove Kconfig option VFIO_PCI_MMAP_MSIX

Changelog v2:
- Rebase on v4.4-rc6 with the patchset[1] applied.
- Use kernel parameter to enforce all MMIO BARs to be page aligned
   on PCI core code instead of doing it on PPC64 arch code.
- Remove flags: VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED
 VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP
- Add a Kconfig option to support for mmapping MSI-X table.

[1] https://lkml.org/lkml/2015/11/23/748

Yongji Xie (5):
   PCI: Add support for enforcing all MMIO BARs to be page aligned
   vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive
   PCI: Add host bridge attribute to indicate filtering of MSIs is supported
   powerpc/powernv/pci-ioda: Enable msi_filtered bit for any IODA host bridge
   vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs

  Documentation/kernel-parameters.txt   |5 +
  arch/powerpc/include/asm/pci.h|   11 +
  arch/powerpc/platforms/powernv/pci-ioda.c |6 +
  drivers/pci/host-bridge.c |6 +
  drivers/pci/pci.c |   35 +
  drivers/pci/pci.h |8 ++-
  drivers/vfio/pci/vfio_pci.c   |   13 ---
  include/linux/pci.h   |7 ++
  8 files changed, 87 insertions(+), 4 deletions(-)




Re: [PATCH v7 6/6] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-28 Thread Viresh Kumar

On 28-01-16, 12:55, Shilpasri G Bhat wrote:
> Create sysfs attributes to export throttle information in
> /sys/devices/system/cpu/cpufreq/chipN. The newly added sysfs files are as
> follows:
> 
> 1)/sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
>   This gives the throttle stats for each of the available frequencies.
>   The throttle stat of a frequency is the total number of times the max
>   frequency is reduced to that frequency.
>   # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
>   4023000 0
>   3990000 0
>   3956000 1
>   3923000 0
>   3890000 0
>   3857000 2
>   3823000 0
>   3790000 0
>   3757000 2
>   3724000 1
>   3690000 1
>   ...
> 
> 2)/sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
>   This directory contains throttle reason files. Each file gives the
>   total number of times the max frequency is throttled, except for
>   'unthrottle_count', which gives the total number of times the max
>   frequency is unthrottled after being throttled.
>   # cd /sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
>   # cat cpu_over_temperature
>   7
>   # cat occ_reset
>   0
>   # cat over_current
>   0
>   # cat power_cap
>   0
>   # cat power_supply_failure
>   0
>   # cat unthrottle_count
>   7

Wouldn't it be better to keep a two-dimensional table here, something
like trans_table?

You can have "reasons" in the vertical dimension and frequencies in
the horizontal one, so that the user can see which frequencies got
throttled and why?
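
As an illustration of the suggested layout, here is a userspace sketch that formats one row per throttle reason and one column per frequency. The reason names, the three frequencies, and both function names are placeholders chosen for the example; a real sysfs show() callback would write into the page-sized buffer it is given instead.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define NR_REASONS 2
#define NR_FREQS   3

/* Placeholder reason names and frequencies (kHz), for illustration only. */
static const char *const reason_names[NR_REASONS] = { "power_cap", "over_temp" };
static const unsigned int freqs_khz[NR_FREQS] = { 4023000, 3956000, 3857000 };

/* Append formatted text at *pos; returns -1 instead of truncating. */
static int emit(char *buf, size_t len, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int w;

	if (*pos >= len)
		return -1;
	va_start(ap, fmt);
	w = vsnprintf(buf + *pos, len - *pos, fmt, ap);
	va_end(ap);
	if (w < 0 || (size_t)w >= len - *pos)
		return -1;
	*pos += (size_t)w;
	return 0;
}

/*
 * counts[r][f] is the number of times the max frequency was reduced to
 * freqs_khz[f] because of reason r. Returns the number of bytes written,
 * or -1 if the buffer is too small.
 */
int format_throttle_table(unsigned int counts[NR_REASONS][NR_FREQS],
			  char *buf, size_t len)
{
	size_t pos = 0;
	int r, f;

	assert(buf != NULL);

	/* Header row: one column per frequency. */
	if (emit(buf, len, &pos, "%-10s", "reason"))
		return -1;
	for (f = 0; f < NR_FREQS; f++)
		if (emit(buf, len, &pos, " %8u", freqs_khz[f]))
			return -1;
	if (emit(buf, len, &pos, "\n"))
		return -1;

	/* One row per throttle reason. */
	for (r = 0; r < NR_REASONS; r++) {
		if (emit(buf, len, &pos, "%-10s", reason_names[r]))
			return -1;
		for (f = 0; f < NR_FREQS; f++)
			if (emit(buf, len, &pos, " %8u", counts[r][f]))
				return -1;
		if (emit(buf, len, &pos, "\n"))
			return -1;
	}

	return (int)pos;
}
```

A single file in this shape would carry the same information as the separate throttle_frequencies file and throttle_reasons directory.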

> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index b683e8e..dea4620 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -271,3 +271,48 @@ Description: Parameters for the CPU cache attributes
>   - WriteBack: data is written only to the cache line and
>the modified cache line is written to main
>memory only when it is replaced
> +
> +What:/sys/devices/system/cpu/cpufreq/chip*/throttle_stats

You need documentation for chip*/ as well..

And how can a user know which policies or CPUs belong to a chip?

> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
>  static struct chip {
>   unsigned int id;
>   bool throttled;
> @@ -62,6 +72,11 @@ static struct chip {
>   u8 throttle_reason;
>   cpumask_t mask;
>   struct work_struct throttle;
> + int throttle_turbo;
> + int throttle_nominal;
> + int reason[OCC_MAX_REASON];
> + int *pstate_stat;
> + struct kobject *kobj;
>  } *chips;
>  
>  static int nr_chips;
> @@ -196,6 +211,126 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
>   NULL,
>  };
>  
> +static inline int get_chip_index(unsigned int id)
> +{
> + int i;
> +
> + for (i = 0; i < nr_chips; i++)
> + if (chips[i].id == id)
> + return i;
> +
> + return -EINVAL;
> +}
> +
> +static inline int get_chip_index_from_kobj(struct kobject *kobj)
> +{
> + int ret, id;
> + int len = strlen("chip");
> +
> + ret = kstrtoint(kobj->name + len, 0, &id);

That doesn't look nice. What about keeping the kobject in the 'struct
chip' and using container_of() here?

You can register the kobject with: kobject_init_and_add().
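
The container_of() approach can be sketched in userspace as follows. Everything here is a stand-in: fake_kobject replaces the kernel's struct kobject, chip_sketch is a trimmed hypothetical version of the driver's struct chip, and the macro is a simplified equivalent of the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct kobject, only here so the sketch compiles. */
struct fake_kobject {
	const char *name;
};

/* Hypothetical, trimmed-down struct chip with the kobject embedded. */
struct chip_sketch {
	unsigned int id;
	int throttle_turbo;
	struct fake_kobject kobj;	/* embedded, not a pointer */
};

/* Simplified userspace equivalent of the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the owning chip from the kobject passed to a show() callback. */
struct chip_sketch *kobj_to_chip(struct fake_kobject *kobj)
{
	assert(kobj != NULL);
	return container_of_sketch(kobj, struct chip_sketch, kobj);
}
```

Compared with parsing the number out of the "chipN" name, this needs no string handling and no lookup loop, because the kobject's address already identifies the chip.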

> + if (ret)
> + return ret;
> +
> + ret = get_chip_index(id);
> + if (ret < 0)
> + pr_warn_once("%s Matching chip-id not found %d\n", __func__,
> +  id);
> + return ret;
> +}
> +
> +static ssize_t throttle_freq_show(struct kobject *kobj,
> +   struct kobj_attribute *attr, char *buf)
> +{
> + int i, count = 0, id;
> +
> + id = get_chip_index_from_kobj(kobj);
> + if (id < 0)
> + return id;
> +
> + for (i = 0; i < powernv_pstate_info.nr_pstates; i++)
> + count += sprintf(&buf[count], "%d %d\n",
> +powernv_freqs[i].frequency,
> +chips[id].pstate_stat[i]);
> +
> + return count;
> +}
> +
> +static struct kobj_attribute attr_throttle_frequencies =
> +__ATTR(throttle_frequencies, 0444, throttle_freq_show, NULL);

Use DEVICE_ATTR_RO macro ?

> @@ -583,12 +736,38 @@ static int init_chip_info(void)
>   goto free_chip_map;
>  
>   for (i = 0; i < nr_chips; i++) {
> + char name[10];
> +
>   chips[i].id = chip[i];
>   cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
>   INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
> + chips[i].pstate_stat = kcalloc(powernv_pstate_info.nr_pstates,
> + sizeof(int), GFP_KERNEL);
> + if (!chips[i].pstate_stat)
> + goto free;
> +
> + sprintf(name, "chip%d", chips[i].id);

Re: [PATCH v9 3/6] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms.

2016-01-28 Thread Will Deacon

On Thu, Jan 28, 2016 at 10:42:17PM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Jan 28, 2016 at 8:09 PM, Will Deacon  wrote:
> > On Tue, Jan 26, 2016 at 02:36:04PM -0600, Bjorn Helgaas wrote:
> >> Subject is "arm64/arm, numa, dt: adding ..."  What is the significance
> >> of the "arm" part?  The other patches only mention "arm64".
> >>
> >> General comment: the code below has little, if anything, that is
> >> actually arm64-specific.  Maybe this is the first DT-based NUMA
> >> platform?  I don't see other similar code for other arches, so maybe
> >> it's too early to try to generalize it, but we should try to avoid
> >> adding duplicates of this code if/when other arches do show up.
> >
> > Having it in the core code would allow us to share it with arch/arm/
> > fairly straightforwardly.
> This binding can be used for arm too.
> however at this moment it is the need of arm64 platforms.
> can we please keep this to arm64 as it's too early to try to
> generalize it(as Bjorn suggested)
> I prefer to keep it as it is, otherwise ok.
> Please suggest.

My suggestions time and time again on the NUMA patches from you have
consistently been around consolidation of existing code, or moving things
that aren't architecture-specific out of the architecture code.

Will

Re: powerpc/mm: Fixup _HPAGE_CHG_MASK

2016-01-28 Thread Michael Ellerman

On Wed, 2016-27-01 at 06:34:20 UTC, "Aneesh Kumar K.V" wrote:
> This got wrongly updated by 7aa9a23c69eae5bfba4f1f92c58d89edc334c8ae
> ("powerpc, thp: remove infrastructure for handling splitting PMDs")
> during the last merge. Fix this up
> 
> Signed-off-by: Aneesh Kumar K.V 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/2d19fc639516dc7b4184450b31

cheers

Re: [PATCH 3/3] param: convert some "on"/"off" users to strtobool

2016-01-28 Thread Michael Ellerman

On Thu, 2016-01-28 at 06:17 -0800, Kees Cook wrote:

> This changes several users of manual "on"/"off" parsing to use strtobool.

You should probably point out that it's a slight behaviour change for some
users, i.e. parameters that previously *only* worked with "on"/"off" can now
also take 0/1/y/n etc.

But I don't think that's a show stopper.


> Signed-off-by: Kees Cook 
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s...@vger.kernel.org
> ---
>  arch/powerpc/kernel/rtasd.c  | 10 +++---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 11 +++

Acked-by: Michael Ellerman  (powerpc)


cheers


Re: [PATCH v9 3/6] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms.

2016-01-28 Thread Will Deacon

On Tue, Jan 26, 2016 at 02:36:04PM -0600, Bjorn Helgaas wrote:
> Subject is "arm64/arm, numa, dt: adding ..."  What is the significance
> of the "arm" part?  The other patches only mention "arm64".
> 
> General comment: the code below has little, if anything, that is
> actually arm64-specific.  Maybe this is the first DT-based NUMA
> platform?  I don't see other similar code for other arches, so maybe
> it's too early to try to generalize it, but we should try to avoid
> adding duplicates of this code if/when other arches do show up.

Having it in the core code would allow us to share it with arch/arm/
fairly straightforwardly.

Will

[PATCH 2/3] lib: add "on" and "off" to strtobool

2016-01-28 Thread Kees Cook

Several places in the kernel expect to use "on" and "off" for their
boolean signifiers, so add them to strtobool.

Signed-off-by: Kees Cook 
Cc: Rasmus Villemoes 
Cc: Daniel Borkmann 
---
 lib/string.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index 0323c0d5629a..091570708db7 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -635,12 +635,15 @@ EXPORT_SYMBOL(sysfs_streq);
  * @s: input string
  * @res: result
  *
- * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
- * Otherwise it will return -EINVAL.  Value pointed to by res is
- * updated upon finding a match.
+ * This routine returns 0 iff the first character is one of 'Yy1Nn0', or
+ * [oO][NnFf] for "on" and "off". Otherwise it will return -EINVAL.  Value
+ * pointed to by res is updated upon finding a match.
  */
 int strtobool(const char *s, bool *res)
 {
+   if (!s)
+   return -EINVAL;
+
switch (s[0]) {
case 'y':
case 'Y':
@@ -652,6 +655,21 @@ int strtobool(const char *s, bool *res)
case '0':
*res = false;
break;
+   case 'o':
+   case 'O':
+   switch (s[1]) {
+   case 'n':
+   case 'N':
+   *res = true;
+   break;
+   case 'f':
+   case 'F':
+   *res = false;
+   break;
+   default:
+   return -EINVAL;
+   }
+   break;
default:
return -EINVAL;
}
-- 
2.6.3
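
To try the parsing rules of this patch outside the kernel, a userspace sketch along the following lines may help. It mirrors the switch in the diff above, but the function name strtobool_sketch and the use of the standard errno.h value of EINVAL are choices made for this example; it is not the lib/string.c source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace sketch of the patched strtobool(); returns 0 or -EINVAL. */
int strtobool_sketch(const char *s, bool *res)
{
	assert(res != NULL);

	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y':
	case 'Y':
	case '1':
		*res = true;
		return 0;
	case 'n':
	case 'N':
	case '0':
		*res = false;
		return 0;
	case 'o':
	case 'O':
		/* "on" and "off" are told apart by the second character only. */
		switch (s[1]) {
		case 'n':
		case 'N':
			*res = true;
			return 0;
		case 'f':
		case 'F':
			*res = false;
			return 0;
		default:
			return -EINVAL;
		}
	default:
		return -EINVAL;
	}
}
```

Note that only the first one or two characters are inspected, so a string such as "offline" is accepted as false, just as "yes" and "y" are both accepted as true.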


[PATCH 3/3] param: convert some "on"/"off" users to strtobool

2016-01-28 Thread Kees Cook

This changes several users of manual "on"/"off" parsing to use strtobool.

Signed-off-by: Kees Cook 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
---
 arch/powerpc/kernel/rtasd.c  | 10 +++---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 11 +++
 arch/s390/kernel/time.c  |  8 ++--
 arch/s390/kernel/topology.c  |  8 +++-
 arch/x86/kernel/aperture_64.c| 13 +++--
 include/linux/tick.h |  2 +-
 kernel/time/hrtimer.c| 11 +++
 kernel/time/tick-sched.c | 11 +++
 8 files changed, 21 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 5a2c049c1c61..984e67e91ba3 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -49,7 +50,7 @@ static unsigned int rtas_error_log_buffer_max;
 static unsigned int event_scan;
 static unsigned int rtas_event_scan_rate;
 
-static int full_rtas_msgs = 0;
+static bool full_rtas_msgs;
 
 /* Stop logging to nvram after first fatal error */
 static int logging_enabled; /* Until we initialize everything,
@@ -592,11 +593,6 @@ __setup("surveillance=", surveillance_setup);
 
 static int __init rtasmsgs_setup(char *str)
 {
-   if (strcmp(str, "on") == 0)
-   full_rtas_msgs = 1;
-   else if (strcmp(str, "off") == 0)
-   full_rtas_msgs = 0;
-
-   return 1;
+   return strtobool(str, &full_rtas_msgs);
 }
 __setup("rtasmsgs=", rtasmsgs_setup);
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 32274f72fe3f..bb333e9fd77a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -47,20 +48,14 @@ static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = 
CPU_STATE_OFFLINE;
 
 static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
 
-static int cede_offline_enabled __read_mostly = 1;
+static bool cede_offline_enabled __read_mostly = true;
 
 /*
  * Enable/disable cede_offline when available.
  */
 static int __init setup_cede_offline(char *str)
 {
-   if (!strcmp(str, "off"))
-   cede_offline_enabled = 0;
-   else if (!strcmp(str, "on"))
-   cede_offline_enabled = 1;
-   else
-   return 0;
-   return 1;
+   return strtobool(str, &cede_offline_enabled);
 }
 
 __setup("cede_offline=", setup_cede_offline);
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 99f84ac31307..afc7fc9684ba 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -1433,7 +1433,7 @@ device_initcall(etr_init_sysfs);
 /*
  * Server Time Protocol (STP) code.
  */
-static int stp_online;
+static bool stp_online;
 static struct stp_sstpi stp_info;
 static void *stp_page;
 
@@ -1444,11 +1444,7 @@ static struct timer_list stp_timer;
 
 static int __init early_parse_stp(char *p)
 {
-   if (strncmp(p, "off", 3) == 0)
-   stp_online = 0;
-   else if (strncmp(p, "on", 2) == 0)
-   stp_online = 1;
-   return 0;
+   return strtobool(p, &stp_online);
 }
 early_param("stp", early_parse_stp);
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 40b8102fdadb..10e388216307 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -37,7 +38,7 @@ static void set_topology_timer(void);
 static void topology_work_fn(struct work_struct *work);
 static struct sysinfo_15_1_x *tl_info;
 
-static int topology_enabled = 1;
+static bool topology_enabled = true;
 static DECLARE_WORK(topology_work, topology_work_fn);
 
 /*
@@ -444,10 +445,7 @@ static const struct cpumask *cpu_book_mask(int cpu)
 
 static int __init early_parse_topology(char *p)
 {
-   if (strncmp(p, "off", 3))
-   return 0;
-   topology_enabled = 0;
-   return 0;
+   return strtobool(p, &topology_enabled);
 }
 early_param("topology", early_parse_topology);
 
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 6e85f713641d..6608b00a516a 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -227,19 +228,11 @@ static u32 __init search_agp_bridge(u32 *order, int 
*valid_agp)
return 0;
 }
 
-static int gart_fix_e820 __initdata = 1;
+static bool gart_fix_e820 __initdata = true;
 
 static int __init parse_gart_mem(char *p)
 {
-   if (!p)
-   return -EINVAL;
-
-   if (!strncmp(p, "off", 3))
-

Re: powerpc/mm: Allow user space to map rtas_rmo_buf

2016-01-28 Thread Michael Ellerman

On Thu, 2016-21-01 at 16:15:31 UTC, Vasant Hegde wrote:
> With commit 90a545e9 (restrict /dev/mem to idle io memory ranges) mapping
> rtas_rmo_buf from user space is failing. Hence we are not able to make
> RTAS syscall.
> 
> This patch calls page_is_rtas_user_buf before calling iomem_is_exclusive
> in  devmem_is_allowed(). This will allow user space to map rtas_rmo_buf
> and we are able to make RTAS syscall.
> 
> Reported-by: Bharata B Rao 
> CC: Dan Williams 
> CC: Nathan Fontenot 
> CC: Michael Ellerman 
> Signed-off-by: Vasant Hegde 
> Acked-by: Dan Williams 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/e256caa7d0515e301f8c8c6e7d

cheers

Re: powerpc/eeh: Fix PE location code

2016-01-28 Thread Michael Ellerman

On Wed, 2015-02-12 at 05:25:32 UTC, Gavin Shan wrote:
> In eeh_pe_loc_get(), the PE location code is retrieved from the
> "ibm,loc-code" property of the device node for the bridge of the
> PE's primary bus. It's not correct because the property indicates
> the parent PE's location code.
> 
> This reads the correct PE location code from "ibm,io-base-loc-code"
> or "ibm,slot-location-code" property of PE parent bus's device node.
> 
> Signed-off-by: Gavin Shan 
> Tested-by: Russell Currey 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/7e56f627768da4e6480986b514

cheers

Re: powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8

2016-01-28 Thread Michael Ellerman

On Mon, 2016-25-01 at 08:33:46 UTC, Madhavan Srinivasan wrote:
> Commit: 7a7868326d77 introduced PPMU_HAS_SSLOT flag to
>  remove assumption of MMCRA[SLOT] with respect to
>  PPMU_ALT_SIPR flag. Commit 7a7868326d77's message also
>  specifies that Power8 does not support MMCRA[SLOT].
>  But still PPMU_HAS_SSLOT flag managed to get into
>  Power8 code. Patch to remove the same from Power8 flags.
> 
> Signed-off-by: Madhavan Srinivasan 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/370f06c88528b3988fe24a372c

cheers

[PATCH] arch/PPC:B4860qds/B4420qds: Updates to device trees for B4860 for DSP clusters and their L2 caches

2016-01-28 Thread Ashish Kumar

B4860 has 1 PPC core cluster and 3 DSP core clusters.
Similarly B4420 has 1 PPC core cluster and 1 DSP core cluster.

Each DSP core cluster consists of 2 SC3900 cores and a shared L2 cache.

Add DSP clusters for B4420.
Arrange the L2 cache nodes such that they now appear only in the
SoC-specific dtsi files (b4860si-post.dtsi and b4420si-post.dtsi).

Signed-off-by: Ashish Kumar 
Signed-off-by: Shaveta Leekha  
---
 arch/powerpc/boot/dts/fsl/b4420si-post.dtsi |7 +++-
 arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi  |   23 
 arch/powerpc/boot/dts/fsl/b4860si-post.dtsi |   20 ++-
 arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi  |   52 +++
 4 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
index f996cce..cc70adb 100644
--- a/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
@@ -91,7 +91,12 @@
 
L2_1: l2-cache-controller@c2 {
compatible = "fsl,b4420-l2-cache-controller";
-   reg = <0xc2 0x4>;
+   reg = <0xc2 0x1000>;
+   next-level-cache = <>;
+   };
+   L2_2: l2-cache-controller@c6 {
+   compatible = "fsl,b4420-l2-cache-controller";
+   reg = <0xc6 0x1000>;
next-level-cache = <>;
};
 };
diff --git a/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi 
b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
index bc3bf93..87c2712 100644
--- a/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
@@ -81,4 +81,27 @@
fsl,portid-mapping = <0x8000>;
};
};
+
+   dsp-clusters {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   dsp-cluster0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "fsl,sc3900-cluster";
+   reg = <0>;
+
+   dsp0: dsp@0 {
+   compatible = "fsl,sc3900";
+   reg = <0>;
+   next-level-cache = <&L2_2>;
+   };
+   dsp1: dsp@1 {
+   compatible = "fsl,sc3900";
+   reg = <1>;
+   next-level-cache = <&L2_2>;
+   };
+   };
+   };
 };
diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
index 8687198..833d483 100644
--- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
@@ -278,7 +278,25 @@
 
L2_1: l2-cache-controller@c2 {
compatible = "fsl,b4860-l2-cache-controller";
-   reg = <0xc2 0x4>;
+   reg = <0xc2 0x1000>;
+   next-level-cache = <>;
+   };
+
+   L2_2: l2-cache-controller@c6 {
+   compatible = "fsl,b4860-l2-cache-controller";
+   reg = <0xc6 0x1000>;
+   next-level-cache = <>;
+   };
+
+   L2_3: l2-cache-controller@ca {
+   compatible = "fsl,b4860-l2-cache-controller";
+   reg = <0xca 0x1000>;
+   next-level-cache = <>;
+   };
+
+   L2_4: l2-cache-controller@ce {
+   compatible = "fsl,b4860-l2-cache-controller";
+   reg = <0xce 0x1000>;
next-level-cache = <>;
};
 };
diff --git a/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi 
b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
index 8797ce1..a45800d 100644
--- a/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
@@ -100,4 +100,56 @@
fsl,portid-mapping = <0x8000>;
};
};
+   dsp-clusters {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   dsp-cluster0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "fsl,sc3900-cluster";
+   reg = <0>;
+   dsp0: dsp@0 {
+   compatible = "fsl,sc3900";
+   reg = <0>;
+   next-level-cache = <&L2_2>;
+   };
+   dsp1: dsp@1 {
+   compatible = "fsl,sc3900";
+   reg = <1>;
+   next-level-cache = <&L2_2>;
+   };
+   };
+   dsp-cluster1 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "fsl,sc3900-cluster";
+

[PATCH 1/3] lib: fix callers of strtobool to use char array

2016-01-28 Thread Kees Cook

Some callers of strtobool were passing a pointer to unterminated strings.
This fixes the issue and consolidates some logic in cifs.

Signed-off-by: Kees Cook 
Cc: Amitkumar Karwar 
Cc: Nishant Sarmukadam 
Cc: Kalle Valo 
Cc: Steve French 
Cc: linux-c...@vger.kernel.org
---
 drivers/net/wireless/marvell/mwifiex/debugfs.c |   6 +-
 fs/cifs/cifs_debug.c   | 106 -
 fs/cifs/cifs_debug.h   |   2 +-
 fs/cifs/cifsfs.c   |   6 +-
 fs/cifs/cifsglob.h |   4 +-
 5 files changed, 58 insertions(+), 66 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/debugfs.c 
b/drivers/net/wireless/marvell/mwifiex/debugfs.c
index 0b9c580af988..76af60899c69 100644
--- a/drivers/net/wireless/marvell/mwifiex/debugfs.c
+++ b/drivers/net/wireless/marvell/mwifiex/debugfs.c
@@ -880,13 +880,13 @@ mwifiex_reset_write(struct file *file,
 {
struct mwifiex_private *priv = file->private_data;
struct mwifiex_adapter *adapter = priv->adapter;
-   char cmd;
+   char cmd[2] = { '\0' };
bool result;
 
-   if (copy_from_user(&cmd, ubuf, sizeof(cmd)))
+   if (copy_from_user(cmd, ubuf, sizeof(char)))
return -EFAULT;
 
-   if (strtobool(&cmd, &result))
+   if (strtobool(cmd, &result))
return -EINVAL;
 
if (!result)
diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index 50b268483302..2f7ffcc9e364 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -251,11 +251,29 @@ static const struct file_operations 
cifs_debug_data_proc_fops = {
.release= single_release,
 };
 
+static int get_user_bool(const char __user *buffer, bool *store)
+{
+   char c[2] = { '\0' };
+   bool bv;
+   int rc;
+
+   rc = get_user(c[0], buffer);
+   if (rc)
+   return rc;
+
+   rc = strtobool(c, &bv);
+   if (rc)
+   return rc;
+
+   *store = bv;
+
+   return 0;
+}
+
 #ifdef CONFIG_CIFS_STATS
 static ssize_t cifs_stats_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *ppos)
 {
-   char c;
bool bv;
int rc;
struct list_head *tmp1, *tmp2, *tmp3;
@@ -263,34 +281,32 @@ static ssize_t cifs_stats_proc_write(struct file *file,
struct cifs_ses *ses;
struct cifs_tcon *tcon;
 
-   rc = get_user(c, buffer);
+   rc = get_user_bool(buffer, &bv);
if (rc)
return rc;
 
-   if (strtobool(&c, &bv) == 0) {
 #ifdef CONFIG_CIFS_STATS2
-   atomic_set(, 0);
-   atomic_set(, 0);
+   atomic_set(, 0);
+   atomic_set(, 0);
 #endif /* CONFIG_CIFS_STATS2 */
-   spin_lock(_tcp_ses_lock);
-   list_for_each(tmp1, _tcp_ses_list) {
-   server = list_entry(tmp1, struct TCP_Server_Info,
-   tcp_ses_list);
-   list_for_each(tmp2, >smb_ses_list) {
-   ses = list_entry(tmp2, struct cifs_ses,
-smb_ses_list);
-   list_for_each(tmp3, >tcon_list) {
-   tcon = list_entry(tmp3,
- struct cifs_tcon,
- tcon_list);
-   atomic_set(>num_smbs_sent, 0);
-   if (server->ops->clear_stats)
-   server->ops->clear_stats(tcon);
-   }
+   spin_lock(_tcp_ses_lock);
+   list_for_each(tmp1, _tcp_ses_list) {
+   server = list_entry(tmp1, struct TCP_Server_Info,
+   tcp_ses_list);
+   list_for_each(tmp2, >smb_ses_list) {
+   ses = list_entry(tmp2, struct cifs_ses,
+smb_ses_list);
+   list_for_each(tmp3, >tcon_list) {
+   tcon = list_entry(tmp3,
+ struct cifs_tcon,
+ tcon_list);
+   atomic_set(>num_smbs_sent, 0);
+   if (server->ops->clear_stats)
+   server->ops->clear_stats(tcon);
}
}
-   spin_unlock(_tcp_ses_lock);
}
+   spin_unlock(_tcp_ses_lock);
 
return count;
 }
@@ -433,17 +449,17 @@ static int cifsFYI_proc_open(struct inode *inode, struct 
file *file)
 static ssize_t cifsFYI_proc_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
 {
-

[PATCH 0/3] lib: add "on" and "off" to strtobool

2016-01-28 Thread Kees Cook

This consolidates logic for handling "on"/"off" parsing for bools into
the existing strtobool function. This requires making sure callers are
passing NUL-terminated strings.

-Kees
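
The termination requirement can be illustrated with a short userspace sketch.
Once the parser may look at the second character, a pointer to a single char
is no longer a safe argument, so the one byte read from the user is placed in
a two-byte array whose second element is zero, as the mwifiex hunk in patch
1/3 does with "char cmd[2] = { '\0' }". The helper name byte_to_cstr() below
is hypothetical and not part of the series.

```c
#include <stddef.h>

/*
 * Copy at most one byte from a raw, unterminated buffer into a 2-byte
 * array and terminate it. byte_to_cstr() is a hypothetical helper name.
 */
void byte_to_cstr(const char *raw, size_t len, char out[2])
{
	out[0] = '\0';
	out[1] = '\0';	/* guarantees that reading out[1] is always valid */
	if (len > 0)
		out[0] = raw[0];
}
```

The resulting array can be handed to a parser that inspects s[0] and s[1]
without reading past the end of the object.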

Re: [PATCH 3/3] param: convert some "on"/"off" users to strtobool

2016-01-28 Thread Heiko Carstens

On Thu, Jan 28, 2016 at 06:17:07AM -0800, Kees Cook wrote:
> This changes several users of manual "on"/"off" parsing to use strtobool.
> 
> Signed-off-by: Kees Cook 
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s...@vger.kernel.org
> ---
>  arch/powerpc/kernel/rtasd.c  | 10 +++---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 11 +++
>  arch/s390/kernel/time.c  |  8 ++--
>  arch/s390/kernel/topology.c  |  8 +++-
>  arch/x86/kernel/aperture_64.c| 13 +++--
>  include/linux/tick.h |  2 +-
>  kernel/time/hrtimer.c| 11 +++
>  kernel/time/tick-sched.c | 11 +++
>  8 files changed, 21 insertions(+), 53 deletions(-)

For the s390 bits:

Acked-by: Heiko Carstens 

Re: [PATCH v9 3/6] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms.

2016-01-28 Thread Ganapatrao Kulkarni

Hi Will,


On Thu, Jan 28, 2016 at 8:09 PM, Will Deacon  wrote:
> On Tue, Jan 26, 2016 at 02:36:04PM -0600, Bjorn Helgaas wrote:
>> Subject is "arm64/arm, numa, dt: adding ..."  What is the significance
>> of the "arm" part?  The other patches only mention "arm64".
>>
>> General comment: the code below has little, if anything, that is
>> actually arm64-specific.  Maybe this is the first DT-based NUMA
>> platform?  I don't see other similar code for other arches, so maybe
>> it's too early to try to generalize it, but we should try to avoid
>> adding duplicates of this code if/when other arches do show up.
>
> Having it in the core code would allow us to share it with arch/arm/
> fairly straightforwardly.
This binding can be used for arm too.
However, at this moment only arm64 platforms need it.
Can we please keep this in arm64 for now, since it is too early to
generalize it (as Bjorn suggested)?
I would prefer to keep it as it is; otherwise I am OK with the change.
Please suggest.
>
> Will
thanks
Ganapat

Re: [PATCH 1/2] powerpc/mm: Enable HugeTLB page migration

2016-01-28 Thread Aneesh Kumar K.V

Anshuman Khandual  writes:

> This enables HugeTLB page migration for PPC64_BOOK3S systems which implement
> HugeTLB page at the PMD level. It enables the kernel configuration option
> CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION by default which turns on the function
> hugepage_migration_supported() during migration. After the recent changes
> to the PTE format, HugeTLB page migration happens successfully.
>
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/powerpc/Kconfig | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e4824fd..65d52a0 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -82,6 +82,10 @@ config GENERIC_HWEIGHT
>  config ARCH_HAS_DMA_SET_COHERENT_MASK
>  bool
>
> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
> + def_bool y
> + depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
> +
>  config PPC
>   bool
>   default y

Are you sure this is all that is needed? We will get a FOLL_GET with hugetlb
migration and our follow_huge_addr will BUG_ON on that. Look at
e66f17ff71772b209eed39de35aaa99ba819c93d ("mm/hugetlb: take page table
lock in follow_huge_pmd()").

Again this doesn't work with 4K page size. So if you are taking this
route, we will need that restriction here.

I would suggest we switch 64K page size hugetlb to generic
hugetlb and then do hugetlb migration on top of that.

Till you help me understand why that FOLL_GET issue is not valid for
powerpc,

NAK.

-aneesh

[RFCv2 0/9] PAPR hash page table resizing (guest side)

2016-01-28 Thread David Gibson

Here's a second prototype of the guest side work for runtime resizing
of the hash page table in PAPR guests.

This is now feature complete.  It implements the resizing, advertises
it with CAS, and will automatically invoke it to maintain a good HPT
size when memory is hot-added or hot-removed.

Patches 1-5 are standalone prerequisite cleanups that I'll be pushing
concurrently.

David Gibson (9):
  memblock: Don't mark memblock_phys_mem_size() as __init
  arch/powerpc: Clean up error handling for htab_remove_mapping
  arch/powerpc: Handle removing maybe-present bolted HPTEs
  arch/powerpc: Clean up memory hotplug failure paths
  arch/powerpc: Split hash page table sizing heuristic into a helper
  pseries: Add hypercall wrappers for hash page table resizing
  pseries: Add support for hash table resizing
  pseries: Advertise HPT resizing support via CAS
  pseries: Automatically resize HPT for memory hot add/remove

 arch/powerpc/include/asm/firmware.h   |   5 +-
 arch/powerpc/include/asm/hvcall.h |   2 +
 arch/powerpc/include/asm/machdep.h|   3 +-
 arch/powerpc/include/asm/mmu-hash64.h |   3 +
 arch/powerpc/include/asm/plpar_wrappers.h |  12 +++
 arch/powerpc/include/asm/prom.h   |   1 +
 arch/powerpc/include/asm/sparsemem.h  |   1 +
 arch/powerpc/kernel/prom_init.c   |   2 +-
 arch/powerpc/mm/hash_utils_64.c   | 121 --
 arch/powerpc/mm/init_64.c |  47 
 arch/powerpc/mm/mem.c |  14 +++-
 arch/powerpc/platforms/pseries/firmware.c |   1 +
 arch/powerpc/platforms/pseries/lpar.c | 117 -
 mm/memblock.c |   2 +-
 14 files changed, 281 insertions(+), 50 deletions(-)

-- 
2.5.0

[RFCv2 1/9] memblock: Don't mark memblock_phys_mem_size() as __init

2016-01-28 Thread David Gibson

At the moment memblock_phys_mem_size() is marked as __init, and so is
discarded after boot.  This is different from most of the memblock
functions which are marked __init_memblock, and are only discarded after
boot if memory hotplug is not configured.

To allow for upcoming code which will need memblock_phys_mem_size() in the
hotplug path, change it from __init to __init_memblock.

Signed-off-by: David Gibson 
---
 mm/memblock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index d2ed81e..dd79899 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1448,7 +1448,7 @@ void __init __memblock_free_late(phys_addr_t base, 
phys_addr_t size)
  * Remaining API functions
  */
 
-phys_addr_t __init memblock_phys_mem_size(void)
+phys_addr_t __init_memblock memblock_phys_mem_size(void)
 {
return memblock.memory.total_size;
 }
-- 
2.5.0

[RFCv2 2/9] arch/powerpc: Clean up error handling for htab_remove_mapping

2016-01-28 Thread David Gibson

Currently, the only error that htab_remove_mapping() can report is -EINVAL,
if removal of bolted HPTEs isn't implemented for this platform.  We make
a few cleanups to the handling of this:

 * EINVAL isn't really the right code - there's nothing wrong with the
   function's arguments - use ENODEV instead
 * We were also printing a warning message, but that's a decision better
   left up to the callers, so remove it
 * One caller is vmemmap_remove_mapping(), which will just BUG_ON() on
   error, making the warning message irrelevant, so no change is needed
   there.
 * The other caller is remove_section_mapping().  This is called in the
   memory hot remove path at a point after vmemmap_remove_mapping() so
   if hpte_removebolted isn't implemented, we'd expect to have already
   BUG()ed anyway.  Put a WARN_ON() here, in lieu of a printk() since this
   really shouldn't be happening.

Signed-off-by: David Gibson 
---
 arch/powerpc/mm/hash_utils_64.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index ba59d59..9f7d727 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -273,11 +273,8 @@ int htab_remove_mapping(unsigned long vstart, unsigned 
long vend,
shift = mmu_psize_defs[psize].shift;
step = 1 << shift;
 
-   if (!ppc_md.hpte_removebolted) {
-   printk(KERN_WARNING "Platform doesn't implement "
-   "hpte_removebolted\n");
-   return -EINVAL;
-   }
+   if (!ppc_md.hpte_removebolted)
+   return -ENODEV;
 
for (vaddr = vstart; vaddr < vend; vaddr += step)
ppc_md.hpte_removebolted(vaddr, psize, ssize);
@@ -641,8 +638,10 @@ int create_section_mapping(unsigned long start, unsigned 
long end)
 
 int remove_section_mapping(unsigned long start, unsigned long end)
 {
-   return htab_remove_mapping(start, end, mmu_linear_psize,
-   mmu_kernel_ssize);
+   int rc = htab_remove_mapping(start, end, mmu_linear_psize,
+mmu_kernel_ssize);
+   WARN_ON(rc < 0);
+   return rc;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
-- 
2.5.0

[RFCv2 3/9] arch/powerpc: Handle removing maybe-present bolted HPTEs

2016-01-28 Thread David Gibson

At the moment the hpte_removebolted callback in ppc_md returns void and
will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
place.  This is awkward for the case of cleaning up a mapping which was
partially made before failing.

So, we add a return value to hpte_removebolted, and have it return ENOENT
in the case that the HPTE to remove didn't exist in the first place.

In the (sole) caller, we propagate errors in hpte_removebolted to its
caller to handle.  However, we handle ENOENT specially, continuing to
complete the unmapping over the specified range before returning the error
to the caller.

This means that htab_remove_mapping() will work sanely on a partially
present mapping, removing any HPTEs which are present, while also returning
ENOENT to its caller in case it's important there.

There are two callers of htab_remove_mapping():
   - In remove_section_mapping() we already WARN_ON() any error return,
 which is reasonable - in this case the mapping should be fully
 present
   - In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
 just a WARN_ON() in the case of ENOENT, since failing to remove a
 mapping that wasn't there in the first place probably shouldn't be
 fatal.
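
The loop behaviour described above can be sketched in userspace as follows.
This is an illustration only: remove_fn, remove_range(), struct toy_table and
toy_remove() are hypothetical names standing in for ppc_md.hpte_removebolted,
htab_remove_mapping() and a real hash table. As in the diff, a missing entry
does not stop the walk, any other error aborts it, and the value returned at
the end is the result for the last address visited.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for ppc_md.hpte_removebolted. */
typedef long (*remove_fn)(unsigned long vaddr, void *ctx);

/*
 * Walk [vstart, vend) in steps of 'step'. A missing entry (-ENOENT) does
 * not stop the walk; any other negative result aborts immediately.
 */
int remove_range(unsigned long vstart, unsigned long vend,
		 unsigned long step, remove_fn remove, void *ctx)
{
	unsigned long vaddr;
	int rc = 0;

	if (!remove)
		return -ENODEV;

	for (vaddr = vstart; vaddr < vend; vaddr += step) {
		rc = (int)remove(vaddr, ctx);
		if (rc < 0 && rc != -ENOENT)
			return rc;
	}
	return rc;
}

/* Toy backend for illustration: eight slots, one optional failing slot. */
struct toy_table {
	unsigned char present[8];
	int fail_at;	/* slot that reports -EIO, or -1 for none */
};

long toy_remove(unsigned long vaddr, void *ctx)
{
	struct toy_table *t = ctx;

	if ((int)vaddr == t->fail_at)
		return -EIO;
	if (!t->present[vaddr])
		return -ENOENT;
	t->present[vaddr] = 0;
	return 0;
}
```

With a partially present range, every present slot is still cleared even
though some removals report -ENOENT along the way.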

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/machdep.h|  2 +-
 arch/powerpc/mm/hash_utils_64.c   | 10 +++---
 arch/powerpc/mm/init_64.c |  9 +
 arch/powerpc/platforms/pseries/lpar.c |  7 +--
 4 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 3f191f5..a7d3f66 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -54,7 +54,7 @@ struct machdep_calls {
   int psize, int apsize,
   int ssize);
long(*hpte_remove)(unsigned long hpte_group);
-   void(*hpte_removebolted)(unsigned long ea,
+   long(*hpte_removebolted)(unsigned long ea,
 int psize, int ssize);
void(*flush_hash_range)(unsigned long number, int local);
void(*hugepage_invalidate)(unsigned long vsid,
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 9f7d727..0737eae 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -269,6 +269,7 @@ int htab_remove_mapping(unsigned long vstart, unsigned long 
vend,
 {
unsigned long vaddr;
unsigned int step, shift;
+   int rc = 0;
 
shift = mmu_psize_defs[psize].shift;
step = 1 << shift;
@@ -276,10 +277,13 @@ int htab_remove_mapping(unsigned long vstart, unsigned 
long vend,
if (!ppc_md.hpte_removebolted)
return -ENODEV;
 
-   for (vaddr = vstart; vaddr < vend; vaddr += step)
-   ppc_md.hpte_removebolted(vaddr, psize, ssize);
+   for (vaddr = vstart; vaddr < vend; vaddr += step) {
+   rc = ppc_md.hpte_removebolted(vaddr, psize, ssize);
+   if ((rc < 0) && (rc != -ENOENT))
+   return rc;
+   }
 
-   return 0;
+   return rc;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 379a6a9..baa1a23 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -232,10 +232,11 @@ static void __meminit vmemmap_create_mapping(unsigned 
long start,
 static void vmemmap_remove_mapping(unsigned long start,
   unsigned long page_size)
 {
-   int mapped = htab_remove_mapping(start, start + page_size,
-mmu_vmemmap_psize,
-mmu_kernel_ssize);
-   BUG_ON(mapped < 0);
+   int rc = htab_remove_mapping(start, start + page_size,
+mmu_vmemmap_psize,
+mmu_kernel_ssize);
+   BUG_ON((rc < 0) && (rc != -ENOENT));
+   WARN_ON(rc == -ENOENT);
 }
 #endif
 
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 477290a..92d472d 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -505,7 +505,7 @@ static void pSeries_lpar_hugepage_invalidate(unsigned long 
vsid,
 }
 #endif
 
-static void pSeries_lpar_hpte_removebolted(unsigned long ea,
+static long pSeries_lpar_hpte_removebolted(unsigned long ea,
   int psize, int ssize)
 {
unsigned long vpn;
@@ -515,11 +515,14 @@ static void pSeries_lpar_hpte_removebolted(unsigned long 
ea,
vpn = hpt_vpn(ea, vsid, ssize);
 
slot = pSeries_lpar_hpte_find(vpn, psize, ssize);
-   BUG_ON(slot == -1);
+   if (slot == -1)
+   return -ENOENT;

[RFCv2 4/9] arch/powerpc: Clean up memory hotplug failure paths

2016-01-28 Thread David Gibson

This makes a number of cleanups to handling of mapping failures during
memory hotplug on Power:

For errors creating the linear mapping for the hot-added region:
  * This is now reported with EFAULT which is more appropriate than the
previous EINVAL (the failure is unlikely to be related to the
function's parameters)
  * An error in this path now prints a warning message, rather than just
silently failing to add the extra memory.
  * Previously a failure here could result in the region being partially
mapped.  We now clean up any partial mapping before failing.

For errors creating the vmemmap for the hot-added region:
  * This is now reported with EFAULT instead of causing a BUG() - this
    could happen for an external reason (e.g. full hash table) so it's
    better to handle this non-fatally
  * An error message is also printed, so the failure won't be silent
  * As above, a failure could leave the region partially mapped; we now
    clean this up.
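
The "clean up any partial mapping before failing" pattern from both lists can
be sketched in userspace as below. Everything here is hypothetical: struct
toy_map, its capacity limit and map_range_or_rollback() only model the idea,
and the sketch simply propagates the underlying error instead of converting
it to EFAULT as the patch description says the hotplug path does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_SLOTS 8

/* Toy mapping table; 'capacity' caps how many slots can be mapped. */
struct toy_map {
	bool mapped[TOY_SLOTS];
	int used;
	int capacity;
};

static int toy_map_one(struct toy_map *m, int slot)
{
	if (m->used >= m->capacity)
		return -ENOSPC;	/* hypothetical "table full" condition */
	m->mapped[slot] = true;
	m->used++;
	return 0;
}

static int toy_unmap_one(struct toy_map *m, int slot)
{
	if (!m->mapped[slot])
		return -ENOENT;
	m->mapped[slot] = false;
	m->used--;
	return 0;
}

/* Map [start, end); on failure undo whatever was mapped and report it. */
int map_range_or_rollback(struct toy_map *m, int start, int end)
{
	int slot, rc = 0;

	if (start < 0 || end > TOY_SLOTS || start > end)
		return -EINVAL;

	for (slot = start; slot < end && rc == 0; slot++)
		rc = toy_map_one(m, slot);

	if (rc < 0) {
		for (slot = start; slot < end; slot++) {
			int rc2 = toy_unmap_one(m, slot);

			/* Slots that were never mapped are expected here. */
			if (rc2 < 0 && rc2 != -ENOENT)
				return rc2;
		}
		fprintf(stderr, "mapping [%d, %d) failed: %d\n",
			start, end, rc);
	}
	return rc;
}
```

Tolerating -ENOENT during the rollback is what lets the cleanup run over the
whole requested range without knowing exactly where the mapping stopped.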

Signed-off-by: David Gibson 
---
 arch/powerpc/mm/hash_utils_64.c | 13 ++---
 arch/powerpc/mm/init_64.c   | 38 ++
 arch/powerpc/mm/mem.c   | 10 --
 3 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0737eae..e88a86e 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -635,9 +635,16 @@ static unsigned long __init htab_get_table_size(void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 int create_section_mapping(unsigned long start, unsigned long end)
 {
-   return htab_bolt_mapping(start, end, __pa(start),
-pgprot_val(PAGE_KERNEL), mmu_linear_psize,
-mmu_kernel_ssize);
+   int rc = htab_bolt_mapping(start, end, __pa(start),
+  pgprot_val(PAGE_KERNEL), mmu_linear_psize,
+  mmu_kernel_ssize);
+
+   if (rc < 0) {
+   int rc2 = htab_remove_mapping(start, end, mmu_linear_psize,
+ mmu_kernel_ssize);
+   BUG_ON(rc2 && (rc2 != -ENOENT));
+   }
+   return rc;
 }
 
 int remove_section_mapping(unsigned long start, unsigned long end)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index baa1a23..fbc9448 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -188,9 +188,9 @@ static int __meminit vmemmap_populated(unsigned long start, 
int page_size)
  */
 
 #ifdef CONFIG_PPC_BOOK3E
-static void __meminit vmemmap_create_mapping(unsigned long start,
-unsigned long page_size,
-unsigned long phys)
+static int __meminit vmemmap_create_mapping(unsigned long start,
+   unsigned long page_size,
+   unsigned long phys)
 {
/* Create a PTE encoding without page size */
unsigned long i, flags = _PAGE_PRESENT | _PAGE_ACCESSED |
@@ -208,6 +208,8 @@ static void __meminit vmemmap_create_mapping(unsigned long 
start,
 */
for (i = 0; i < page_size; i += PAGE_SIZE)
BUG_ON(map_kernel_page(start + i, phys, flags));
+
+   return 0;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -217,15 +219,20 @@ static void vmemmap_remove_mapping(unsigned long start,
 }
 #endif
 #else /* CONFIG_PPC_BOOK3E */
-static void __meminit vmemmap_create_mapping(unsigned long start,
-unsigned long page_size,
-unsigned long phys)
+static int __meminit vmemmap_create_mapping(unsigned long start,
+   unsigned long page_size,
+   unsigned long phys)
 {
-   int  mapped = htab_bolt_mapping(start, start + page_size, phys,
-   pgprot_val(PAGE_KERNEL),
-   mmu_vmemmap_psize,
-   mmu_kernel_ssize);
-   BUG_ON(mapped < 0);
+   int rc = htab_bolt_mapping(start, start + page_size, phys,
+  pgprot_val(PAGE_KERNEL),
+  mmu_vmemmap_psize, mmu_kernel_ssize);
+   if (rc < 0) {
+   int rc2 = htab_remove_mapping(start, start + page_size,
+ mmu_vmemmap_psize,
+ mmu_kernel_ssize);
+   BUG_ON(rc2 && (rc2 != -ENOENT));
+   }
+   return rc;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -304,6 +311,7 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node)
 
for (; start < end; start += page_size) {
void *p;
+   int rc;
 
if

[RFCv2 5/9] arch/powerpc: Split hash page table sizing heuristic into a helper

2016-01-28 Thread David Gibson

htab_get_table_size() either retrieves the size of the hash page table (HPT)
from the device tree - if the HPT size is determined by firmware - or
uses a heuristic to determine a good size based on RAM size if the kernel
is responsible for allocating the HPT.

To support a PAPR extension allowing resizing of the HPT, we're going to
want the memory size -> HPT size logic elsewhere, so split it out into a
helper function.
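
As a rough illustration of the memory size -> HPT size logic described above, here is a minimal userspace sketch. It is not the kernel code: the names ilog2_ul() and hpt_shift_for_mem_size() are made up for this example, the page shift is passed in explicitly (the kernel takes it from mmu_psize_defs[]), and the guards for zero or very small sizes are an addition.

```c
/* Floor of log2(x) for x > 0; userspace stand-in for the kernel's __ilog2(). */
static unsigned ilog2_ul(unsigned long x)
{
	unsigned shift = 0;

	while (x >>= 1)
		shift++;
	return shift;
}

/*
 * Hypothetical helper: returns log2 of the HPT size in bytes for a given
 * RAM size. page_shift is an explicit parameter here (an assumption made
 * for the sketch).
 */
unsigned hpt_shift_for_mem_size(unsigned long mem_size, unsigned page_shift)
{
	const unsigned min_shift = 18;	/* 256 KiB floor, as in the patch */
	unsigned memshift, pteg_shift;

	if (mem_size == 0)
		return min_shift;

	/* Round mem_size up to the next power of two. */
	memshift = ilog2_ul(mem_size);
	if ((1UL << memshift) < mem_size)
		memshift++;

	/* Guard against unsigned underflow for very small sizes. */
	if (memshift <= page_shift + 1)
		return min_shift;

	/* Aim for two pages per PTEG; each PTEG is 128 bytes (1 << 7). */
	pteg_shift = memshift - (page_shift + 1);
	return pteg_shift + 7 > min_shift ? pteg_shift + 7 : min_shift;
}
```

With 4 KiB pages (page_shift 12), 1 GiB of RAM gives a shift of 24, i.e. a 16 MiB HPT; anything just over 1 GiB rounds up to the next power of two and gives 25.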

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/mmu-hash64.h |  3 +++
 arch/powerpc/mm/hash_utils_64.c   | 30 +-
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 7352d3f..cf070fd 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -607,6 +607,9 @@ static inline unsigned long get_kernel_vsid(unsigned long ea, int ssize)
context = (MAX_USER_CONTEXT) + ((ea >> 60) - 0xc) + 1;
return get_vsid(context, ea, ssize);
 }
+
+unsigned htab_shift_for_mem_size(unsigned long mem_size);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_MMU_HASH64_H_ */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index e88a86e..d63f7dc 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -606,10 +606,24 @@ static int __init htab_dt_scan_pftsize(unsigned long node,
return 0;
 }
 
-static unsigned long __init htab_get_table_size(void)
+unsigned htab_shift_for_mem_size(unsigned long mem_size)
 {
-   unsigned long mem_size, rnd_mem_size, pteg_count, psize;
+   unsigned memshift = __ilog2(mem_size);
+   unsigned pshift = mmu_psize_defs[mmu_virtual_psize].shift;
+   unsigned pteg_shift;
+
+   /* round mem_size up to next power of 2 */
+   if ((1UL << memshift) < mem_size)
+   memshift += 1;
+
+   /* aim for 2 pages / pteg */
+   pteg_shift = memshift - (pshift + 1);
+
+   return max(pteg_shift + 7, 18U);
+}
 
+static unsigned long __init htab_get_table_size(void)
+{
/* If hash size isn't already provided by the platform, we try to
 * retrieve it from the device-tree. If it's not there neither, we
 * calculate it now based on the total RAM size
@@ -619,17 +633,7 @@ static unsigned long __init htab_get_table_size(void)
if (ppc64_pft_size)
return 1UL << ppc64_pft_size;
 
-   /* round mem_size up to next power of 2 */
-   mem_size = memblock_phys_mem_size();
-   rnd_mem_size = 1UL << __ilog2(mem_size);
-   if (rnd_mem_size < mem_size)
-   rnd_mem_size <<= 1;
-
-   /* # pages / 2 */
-   psize = mmu_psize_defs[mmu_virtual_psize].shift;
-   pteg_count = max(rnd_mem_size >> (psize + 1), 1UL << 11);
-
-   return pteg_count << 7;
+   return 1UL << htab_shift_for_mem_size(memblock_phys_mem_size());
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
2.5.0


[RFCv2 6/9] pseries: Add hypercall wrappers for hash page table resizing

2016-01-28 Thread David Gibson

This adds the hypercall numbers and wrapper functions for the hash page
table resizing hypercalls.

These are experimental "platform specific" values for now, until we have a
formal PAPR update.

It also adds a new firmware feature flag to track the presence of the
HPT resizing calls.
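
For readers unfamiliar with how such a feature flag is derived, the following is a small userspace sketch of a name-to-bit table lookup. It is only a model: the bit values are illustrative (FEAT_BEST_ENERGY in particular is made up), the function names are hypothetical, and it assumes a trailing '*' in a table entry denotes a prefix match.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative bit values only; the real ones live in asm/firmware.h. */
#define FEAT_HPT_RESIZE  0x0100UL
#define FEAT_VPHN        0x0400UL
#define FEAT_BEST_ENERGY 0x1000UL	/* hypothetical value */

struct fw_feature {
	unsigned long val;
	const char *name;
};

static const struct fw_feature feature_table[] = {
	{ FEAT_VPHN,        "hcall-vphn" },
	{ FEAT_BEST_ENERGY, "hcall-best-energy-1*" },
	{ FEAT_HPT_RESIZE,  "hcall-hpt-resize" },
};

/* Nonzero if s matches pattern; a trailing '*' means "prefix match". */
static int feature_name_matches(const char *pattern, const char *s)
{
	size_t len = strlen(pattern);

	if (len && pattern[len - 1] == '*')
		return strncmp(pattern, s, len - 1) == 0;
	return strcmp(pattern, s) == 0;
}

/* Build a feature bitmask from the names advertised by firmware. */
unsigned long fw_features_from_names(const char *const *names, size_t count)
{
	const size_t entries = sizeof(feature_table) / sizeof(feature_table[0]);
	unsigned long mask = 0;

	for (size_t i = 0; i < count; i++)
		for (size_t j = 0; j < entries; j++)
			if (feature_name_matches(feature_table[j].name, names[i]))
				mask |= feature_table[j].val;
	return mask;
}

/* Test a single feature bit, in the spirit of firmware_has_feature(). */
int fw_has_feature(unsigned long mask, unsigned long feature)
{
	return (mask & feature) != 0;
}
```

A caller would only attempt the resize hypercalls when fw_has_feature(mask, FEAT_HPT_RESIZE) is nonzero.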

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/firmware.h   |  5 +++--
 arch/powerpc/include/asm/hvcall.h |  2 ++
 arch/powerpc/include/asm/plpar_wrappers.h | 12 
 arch/powerpc/platforms/pseries/firmware.c |  1 +
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index b062924..32435d2 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -42,7 +42,7 @@
 #define FW_FEATURE_SPLPAR      ASM_CONST(0x0010)
 #define FW_FEATURE_LPAR        ASM_CONST(0x0040)
 #define FW_FEATURE_PS3_LV1     ASM_CONST(0x0080)
-/* Free                        ASM_CONST(0x0100) */
+#define FW_FEATURE_HPT_RESIZE  ASM_CONST(0x0100)
 #define FW_FEATURE_CMO         ASM_CONST(0x0200)
 #define FW_FEATURE_VPHN        ASM_CONST(0x0400)
 #define FW_FEATURE_XCMO        ASM_CONST(0x0800)
@@ -66,7 +66,8 @@ enum {
FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
-   FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN,
+   FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
+   FW_FEATURE_HPT_RESIZE,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e3b54dd..195e080 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -293,6 +293,8 @@
 
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS 0xf000
+#define H_RESIZE_HPT_PREPARE   0xf003
+#define H_RESIZE_HPT_COMMIT    0xf004
 
 /* "Platform specific hcalls", provided by PHYP */
 #define H_GET_24X7_CATALOG_PAGE        0xF078
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 1b39424..b7ee6d9 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -242,6 +242,18 @@ static inline long plpar_pte_protect(unsigned long flags, unsigned long ptex,
return plpar_hcall_norets(H_PROTECT, flags, ptex, avpn);
 }
 
+static inline long plpar_resize_hpt_prepare(unsigned long flags,
+   unsigned long shift)
+{
+   return plpar_hcall_norets(H_RESIZE_HPT_PREPARE, flags, shift);
+}
+
+static inline long plpar_resize_hpt_commit(unsigned long flags,
+  unsigned long shift)
+{
+   return plpar_hcall_norets(H_RESIZE_HPT_COMMIT, flags, shift);
+}
+
 static inline long plpar_tce_get(unsigned long liobn, unsigned long ioba,
unsigned long *tce_ret)
 {
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 8c80588..7b287be 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -63,6 +63,7 @@ hypertas_fw_features_table[] = {
{FW_FEATURE_VPHN,   "hcall-vphn"},
{FW_FEATURE_SET_MODE,   "hcall-set-mode"},
{FW_FEATURE_BEST_ENERGY,"hcall-best-energy-1*"},
+   {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
 };
 
 /* Build up the firmware features bitmask using the contents of
-- 
2.5.0


[RFCv2 7/9] pseries: Add support for hash table resizing

2016-01-28 Thread David Gibson

This adds support for using experimental hypercalls to change the size
of the main hash page table while running as a PAPR guest.  For now these
hypercalls are only in experimental qemu versions.

The interface has two parts: first, H_RESIZE_HPT_PREPARE is used to allocate
and prepare the new hash table.  This may be slow, but can be done
asynchronously.  Then, H_RESIZE_HPT_COMMIT is used to switch to the new
hash table.  This requires that no CPUs be concurrently updating the HPT,
and so must be run under stop_machine().

This also adds a debugfs file which can be used to manually control
HPT resizing for testing purposes.
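
The prepare/commit flow described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the patch's implementation: struct fake_hv and every function below are hypothetical stand-ins for the hypervisor and the hcall wrappers, the return codes are placeholders, and the real commit step additionally has to run under stop_machine().

```c
/* Placeholder return codes; the real values come from asm/hvcall.h. */
enum { HV_SUCCESS = 0, HV_BUSY_FIRST = 9900, HV_BUSY_LAST = 9905 };

/* Hypothetical model of the hypervisor side of a resize. */
struct fake_hv {
	unsigned busy_polls_left;	/* polls before prepare completes */
	unsigned long pending_shift;	/* shift being prepared, 0 if none */
	unsigned long active_shift;	/* shift of the HPT currently in use */
};

/* Prepare a new HPT; shift == 0 cancels an in-progress prepare. */
static long fake_prepare(struct fake_hv *hv, unsigned long shift)
{
	if (shift == 0) {
		hv->pending_shift = 0;
		return HV_SUCCESS;
	}
	hv->pending_shift = shift;
	if (hv->busy_polls_left > 0) {
		hv->busy_polls_left--;
		return HV_BUSY_FIRST + 1;	/* "retry in about 10 ms" */
	}
	return HV_SUCCESS;
}

/* Switch to the prepared HPT; fails if nothing matching was prepared. */
static long fake_commit(struct fake_hv *hv, unsigned long shift)
{
	if (hv->pending_shift != shift)
		return -1;
	hv->active_shift = shift;
	hv->pending_shift = 0;
	return HV_SUCCESS;
}

static int is_long_busy(long rc)
{
	return rc >= HV_BUSY_FIRST && rc <= HV_BUSY_LAST;
}

/* Suggested delay: 1 ms for the first busy code, 10x more for each next. */
static unsigned busy_msecs(long rc)
{
	unsigned ms = 1;

	for (long code = HV_BUSY_FIRST; code < rc; code++)
		ms *= 10;
	return ms;
}

/* Returns 0 on success, -1 on timeout, -2/-3 on prepare/commit failure. */
int resize_hpt(struct fake_hv *hv, unsigned long shift, unsigned timeout_ms)
{
	unsigned total_delay = 0;
	long rc = fake_prepare(hv, shift);

	while (is_long_busy(rc)) {
		total_delay += busy_msecs(rc);
		if (total_delay > timeout_ms) {
			fake_prepare(hv, 0);	/* cancel the pending resize */
			return -1;
		}
		/* Real code would sleep for the suggested delay here. */
		rc = fake_prepare(hv, shift);
	}
	if (rc != HV_SUCCESS)
		return -2;
	if (fake_commit(hv, shift) != HV_SUCCESS)
		return -3;
	return 0;
}
```

Splitting the work this way keeps the slow allocation outside the window in which all CPUs are stopped; only the short commit step needs exclusive access to the HPT.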

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/machdep.h|   1 +
 arch/powerpc/mm/hash_utils_64.c   |  28 +
 arch/powerpc/platforms/pseries/lpar.c | 110 ++
 3 files changed, 139 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index a7d3f66..532d795 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -61,6 +61,7 @@ struct machdep_calls {
   unsigned long addr,
   unsigned char *hpte_slot_array,
   int psize, int ssize, int local);
+   int (*resize_hpt)(unsigned long shift);
/*
 * Special for kexec.
 * To be called in real mode with interrupts disabled. No locks are
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index d63f7dc..882e409 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1578,3 +1579,30 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
/* Finally limit subsequent allocations */
memblock_set_current_limit(ppc64_rma_size);
 }
+
+static int ppc64_pft_size_get(void *data, u64 *val)
+{
+   *val = ppc64_pft_size;
+   return 0;
+}
+
+static int ppc64_pft_size_set(void *data, u64 val)
+{
+   if (!ppc_md.resize_hpt)
+   return -ENODEV;
+   return ppc_md.resize_hpt(val);
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_ppc64_pft_size,
+   ppc64_pft_size_get, ppc64_pft_size_set, "%llu\n");
+
+static int __init hash64_debugfs(void)
+{
+   if (!debugfs_create_file("pft-size", 0600, powerpc_debugfs_root,
+NULL, &fops_ppc64_pft_size)) {
+   pr_err("lpar: unable to create ppc64_pft_size debugfs file\n");
+   }
+
+   return 0;
+}
+machine_device_initcall(pseries, hash64_debugfs);
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 92d472d..ebf02e7 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -603,6 +605,113 @@ static int __init disable_bulk_remove(char *str)
 
 __setup("bulk_remove=", disable_bulk_remove);
 
+#define HPT_RESIZE_TIMEOUT 1 /* ms */
+
+struct hpt_resize_state {
+   unsigned long shift;
+   int commit_rc;
+};
+
+static int pseries_lpar_resize_hpt_commit(void *data)
+{
+   struct hpt_resize_state *state = data;
+
+   state->commit_rc = plpar_resize_hpt_commit(0, state->shift);
+   if (state->commit_rc != H_SUCCESS)
+   return -EIO;
+
+   /* Hypervisor has transitioned the HTAB, update our globals */
+   ppc64_pft_size = state->shift;
+   htab_size_bytes = 1UL << ppc64_pft_size;
+   htab_hash_mask = (htab_size_bytes >> 7) - 1;
+
+   return 0;
+}
+
+/* Must be called in user context */
+static int pseries_lpar_resize_hpt(unsigned long shift)
+{
+   struct hpt_resize_state state = {
+   .shift = shift,
+   .commit_rc = H_FUNCTION,
+   };
+   unsigned int delay, total_delay = 0;
+   int rc;
+   ktime_t t0, t1, t2;
+
+   might_sleep();
+
+   if (!firmware_has_feature(FW_FEATURE_HPT_RESIZE))
+   return -ENODEV;
+
+   printk(KERN_INFO "lpar: Attempting to resize HPT to shift %lu\n",
+  shift);
+
+   t0 = ktime_get();
+
+   rc = plpar_resize_hpt_prepare(0, shift);
+   while (H_IS_LONG_BUSY(rc)) {
+   delay = get_longbusy_msecs(rc);
+   total_delay += delay;
+   if (total_delay > HPT_RESIZE_TIMEOUT) {
+   /* prepare call with shift==0 cancels an
+* in-progress resize */
+   rc = plpar_resize_hpt_prepare(0, 0);
+   if (rc != H_SUCCESS)
+   printk(KERN_WARNING
+  "lpar: Unexpected error %d cancelling timed out HPT resize\n",
+

[RFCv2 8/9] pseries: Advertise HPT resizing support via CAS

2016-01-28 Thread David Gibson

The hypervisor needs to know a guest is capable of using the HPT resizing
PAPR extension in order to take full advantage of it for memory hotplug.

If the hypervisor knows the guest is HPT resize aware, it can size the
initial HPT based on the initial guest RAM size, relying on the guest to
resize the HPT when more memory is hot-added.  Without this, the hypervisor
must size the HPT for the maximum possible guest RAM, which can lead to
a huge waste of space if the guest never actually expands to that maximum
size.

This patch advertises the guest's support for HPT resizing via the
ibm,client-architecture-support OF interface.  Obviously, the actual
encoding in the CAS vector is tentative until the extension is officially
incorporated into PAPR.  For now we use bit 0 of (previously unused) byte 8
of option vector 5.
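
To make the "bit 0 of byte 8" encoding concrete, here is a small userspace sketch. It assumes the OV5_* constants pack the byte index in the high byte and the bit mask in the low byte, and that bits are numbered from the most significant end, so bit 0 is 0x80. The helper names are invented for this example, and how the index maps onto the on-the-wire vector (for instance whether a length byte precedes it) is not modelled.

```c
/* Assumed encoding: high byte = byte index, low byte = bit mask. */
#define OV5_HPT_RESIZE 0x0880

static unsigned ov5_index(unsigned feature)
{
	return feature >> 8;
}

static unsigned char ov5_mask(unsigned feature)
{
	return (unsigned char)(feature & 0xff);
}

/* IBM-style bit numbering within a byte: bit 0 is the MSB (0x80). */
unsigned char ibm_bit_mask(unsigned bit)
{
	return (unsigned char)(0x80u >> bit);
}

/* Advertise a feature in a plain byte array standing in for vector 5. */
void ov5_set(unsigned char *vec5, unsigned feature)
{
	vec5[ov5_index(feature)] |= ov5_mask(feature);
}

/* Check whether a feature is advertised in the array. */
int ov5_test(const unsigned char *vec5, unsigned feature)
{
	return (vec5[ov5_index(feature)] & ov5_mask(feature)) != 0;
}
```

Under these assumptions OV5_HPT_RESIZE decodes to index 8 and mask 0x80, which matches the description of bit 0 of byte 8.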

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/prom.h | 1 +
 arch/powerpc/kernel/prom_init.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 7f436ba..ef08208 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -151,6 +151,7 @@ struct of_drconf_cell {
 #define OV5_XCMO   0x0440  /* Page Coalescing */
 #define OV5_TYPE1_AFFINITY 0x0580  /* Type 1 NUMA affinity */
 #define OV5_PRRN   0x0540  /* Platform Resource Reassignment */
+#define OV5_HPT_RESIZE 0x0880  /* Hash Page Table resizing */
 #define OV5_PFO_HW_RNG 0x0E80  /* PFO Random Number Generator */
 #define OV5_PFO_HW_842 0x0E40  /* PFO Compression Accelerator */
 #define OV5_PFO_HW_ENCR        0x0E20  /* PFO Encryption Accelerator */
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index da51925..c6feafb 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -713,7 +713,7 @@ unsigned char ibm_architecture_vec[] = {
OV5_FEAT(OV5_TYPE1_AFFINITY) | OV5_FEAT(OV5_PRRN),
0,
0,
-   0,
+   OV5_FEAT(OV5_HPT_RESIZE),
/* WARNING: The offset of the "number of cores" field below
 * must match by the macro below. Update the definition if
 * the structure layout changes.
-- 
2.5.0


[RFCv2 9/9] pseries: Automatically resize HPT for memory hot add/remove

2016-01-28 Thread David Gibson

We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.

This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed.  This tries to always keep the HPT at
a reasonable size for our current memory size.
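
The resize policy can be summarised by a small predicate. The sketch below is illustrative only: hpt_needs_resize() is a made-up name, and it models the asymmetric rule that growing happens immediately while shrinking waits until the target is at least two below the current shift.

```c
#include <stdbool.h>

/*
 * Hypothetical decision helper: grow as soon as the target exceeds the
 * current shift, shrink only when the target is at least 2 below it.
 */
bool hpt_needs_resize(unsigned target_shift, unsigned current_shift)
{
	if (target_shift > current_shift)
		return true;	/* grow immediately */

	/* Written as an addition to avoid unsigned underflow at 0. */
	return target_shift + 1 < current_shift;
}
```

The gap between the grow and shrink thresholds means that memory hovering around a power-of-two boundary does not trigger a resize on every hot add or remove.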

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/sparsemem.h |  1 +
 arch/powerpc/mm/hash_utils_64.c  | 29 +
 arch/powerpc/mm/mem.c|  4 
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index f6fc0ee..737335c 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -16,6 +16,7 @@
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+extern void resize_hpt_for_hotplug(unsigned long new_mem_size);
 extern int create_section_mapping(unsigned long start, unsigned long end);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 #ifdef CONFIG_NUMA
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 882e409..18cc851 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -638,6 +638,35 @@ static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+void resize_hpt_for_hotplug(unsigned long new_mem_size)
+{
+   unsigned target_hpt_shift;
+
+   if (!ppc_md.resize_hpt)
+   return;
+
+   target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
+
+   /*
+* To avoid lots of HPT resizes if memory size is fluctuating
+* across a boundary, we deliberately have some hysteresis
+* here: we immediately increase the HPT size if the target
+* shift exceeds the current shift, but we won't attempt to
+* reduce unless the target shift is at least 2 below the
+* current shift
+*/
+   if ((target_hpt_shift > ppc64_pft_size)
+   || (target_hpt_shift < (ppc64_pft_size - 1))) {
+   int rc;
+
+   rc = ppc_md.resize_hpt(target_hpt_shift);
+   if (rc)
+   printk(KERN_WARNING
+  "Unable to resize hash page table to target order %d: %d\n",
+  target_hpt_shift, rc);
+   }
+}
+
 int create_section_mapping(unsigned long start, unsigned long end)
 {
int rc = htab_bolt_mapping(start, end, __pa(start),
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8ffc1e2..e77f36c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -121,6 +121,8 @@ int arch_add_memory(int nid, u64 start, u64 size, bool for_device)
unsigned long nr_pages = size >> PAGE_SHIFT;
int rc;
 
+   resize_hpt_for_hotplug(memblock_phys_mem_size());
+
pgdata = NODE_DATA(nid);
 
start = (unsigned long)__va(start);
@@ -161,6 +163,8 @@ int arch_remove_memory(u64 start, u64 size)
 */
vm_unmap_aliases();
 
+   resize_hpt_for_hotplug(memblock_phys_mem_size());
+
return ret;
 }
 #endif
-- 
2.5.0
