Re: [PATCH 2/3] drm:msm: Initial Add Writeback Support

2015-04-06 Thread Daniel Vetter
On Thu, Apr 02, 2015 at 10:29:52AM -0400, Rob Clark wrote:
> So, from a quick look, it seems like there is a lot of potential to
> split the v4l part out into some drm helpers.. it looks pretty
> generic(ish), or at least it could be with some strategically placed
> vfuncs in drm_v4l2_helper_funcs.
> 
> I do think we need to figure out the auth/security situation.  We
> probably don't want to let arbitrary processes open a v4l device and
> snoop on the screen contents.  We perhaps could re-use the dri2 drm
> auth stuff (v4l2_drm_get_magic ioctl?).  Or, well, it would be nice if
> the wb device could be made to not exist in /dev at all, and
> pre-open'd fd returned from an ioctl on the drm device, but not really
> sure if that is possible (or too weird).  Once the compositor process
> has the v4l device open and authenticated somehow, I expect it would
> use fd passing to pass the fd off to a trusted helper process.

Please don't resurrect the magic stuff ;-)

Anyway I discussed this a bit with Laurent and we figured the best way to
wire up writeback support is by using drm framebuffers. Then you can use
atomic flips to create a new snapshot. Of course that won't work with hw
where writeback is continuous, there v4l is a much better fit. And we also
have hardware where some v4l pipeline could directly feed into a drm
output pipeline, so we need a generic way to connect v4l and drm anyway.
For that I think we should add a new flag to addfb2 (or a new addfbv4l)
which creates a magic framebuffer from a v4l input/output. Some values
like stride don't make sense in such a virtual framebuffer, but pixel
format and size are all needed.

This way we don't need parallel abis for single-shot writeback directly
into framebuffers and for continuous writeback through v4l, we can reuse
the same drm framebuffer ones. And this also solves the security issues
since no one can start writeback without the drm device owner's consent,
so no need to reinvent anything there. And with atomic we already have
almost everything there: For the writeback framebuffer we only need a new
"WRITEBACK" property (which takes an fb id) and the small extension to
create v4l-backed framebuffers.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] mmc: sdhci-of-arasan: Call OF parsing for MMC

2015-04-06 Thread Michal Simek
Also check MMC OF properties. The controller supports MMC too.

Signed-off-by: Michal Simek 
---

 drivers/mmc/host/sdhci-of-arasan.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/mmc/host/sdhci-of-arasan.c b/drivers/mmc/host/sdhci-of-arasan.c
index bcb51e9..b8cfac1 100644
--- a/drivers/mmc/host/sdhci-of-arasan.c
+++ b/drivers/mmc/host/sdhci-of-arasan.c
@@ -173,6 +173,12 @@ static int sdhci_arasan_probe(struct platform_device *pdev)
 	pltfm_host->priv = sdhci_arasan;
 	pltfm_host->clk = clk_xin;
 
+	ret = mmc_of_parse(host->mmc);
+	if (ret) {
+		dev_err(&pdev->dev, "parsing dt failed (%u)\n", ret);
+		goto clk_disable_all;
+	}
+
 	ret = sdhci_add_host(host);
 	if (ret)
 		goto err_pltfm_free;
-- 
1.7.2.3



Re: Linux 3.18.11

2015-04-06 Thread Guenter Roeck
On Sun, Apr 05, 2015 at 10:31:52AM -0400, Sasha Levin wrote:
> I'm announcing the release of the 3.18.11 kernel.
> 
> All users of the 3.18 kernel series must upgrade.
> 
> The updated 3.18.y git tree can be found at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-3.18.y
> and can be browsed at the normal kernel.org git web browser:
>   http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
> 
Hi Sasha,

it would be great if you can Cc: me on the review announcements for 3.18.
This way I would be informed when you are about to release a new version,
and I could provide build and qemu test feedback.

Thanks,
Guenter


RE: [PATCH 6/7] thermal: Use bool function return values of true/false not 1/0

2015-04-06 Thread Zhang, Rui


> -Original Message-
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Tuesday, March 31, 2015 1:43 AM
> To: linux-kernel@vger.kernel.org; Zhang, Rui; Eduardo Valentin
> Cc: linux...@vger.kernel.org
> Subject: [PATCH 6/7] thermal: Use bool function return values of true/false 
> not
> 1/0
> Importance: High
> 
> Use the normal return values for bool functions
> 
> Signed-off-by: Joe Perches 

Applied.

Thanks,
rui
> ---
>  drivers/thermal/thermal_core.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index 9e20e4d..749d41a 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -115,7 +115,7 @@ static inline int of_thermal_get_ntrips(struct
> thermal_zone_device *tz)  static inline bool of_thermal_is_trip_valid(struct
> thermal_zone_device *tz,
>   int trip)
>  {
> - return 0;
> + return false;
>  }
>  static inline const struct thermal_trip *  of_thermal_get_trip_points(struct
> thermal_zone_device *tz)
> --
> 2.1.2



Re: [PATCH 3/3] PCI: Set pref for mem64 resource of pcie device

2015-04-06 Thread Yinghai Lu
On Mon, Apr 6, 2015 at 8:43 PM, Bjorn Helgaas  wrote:
> On Mon, Apr 6, 2015 at 8:13 PM, Yinghai Lu  wrote:
>> On Mon, Apr 6, 2015 at 3:49 PM, Bjorn Helgaas  wrote:
>>>
>>> For "[PATCH 1/3] PCI: Introduce pci_bus_addr_t", I'm waiting for an updated
>>> version with Kconfig tweaks so we don't break other arches.
>>
>> I was thinking that you will update it manually.
>
> I asked you for an updated version, incorporating the documentation
> updates, to make sure I got everything you intended.  But I did go
> ahead and do it manually for you.

Good.

http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=for-linus&id=f70899ff889a38f9697d3c153aaacaed25f501c3

Please consider splitting that into two patches.
The first one should include the changes in include/linux/types.h and
Documentation/DMA-API-HOWTO.txt,
and you should be the author of that one.


>
>>> For "[PATCH 2/3] sparc/PCI: Add mem64 resource parsing for root bus", I'm
>>> waiting for a version that fixes the other of_bus_pci_get_flags() and
>>> pci_parse_of_flags() implementations at the same time (or an explanation
>>> about why we should fix only the arch/sparc version).  I don't want to fix
>>> one place and leave the same bug in other places.
>>
>> I don't even know if other arch like powerpc support 64-bit bus address.
>>
>> No one from powerpc reported a problem, why should we mess it up now?
>>
>> I would like to see someone get access those three kinds of machine that
>> support of to unify of support code.
>
> Of course changes there should be tested on all the affected machines.
> I opened https://bugzilla.kernel.org/show_bug.cgi?id=96241 and
> assigned it to you as a reminder that there is nearly identical code
> in several other places that may have the same issue.

OK, I will work on that and have three patches to cover them.

>
> I pushed these two changes to for-linus.  I'll work on the third
> tomorrow.  The current changelog is very sparc64-centric, and it needs
> to be much more explicit about how the change will affect every arch.

Sure. That will affect all platforms.

Thanks

Yinghai


linux-next: manual merge of the drm tree with Linus' tree

2015-04-06 Thread Stephen Rothwell
Hi Dave,

Today's linux-next merge of the drm tree got a conflict in
drivers/gpu/drm/i915/intel_sprite.c between commit 840a1cf0cd53
("drm/i915: Reject the colorkey ioctls for primary and cursor planes")
from Linus' tree and commit a8265c59e22a ("drm/i915: Rip out
GET_SPRITE_COLORKEY ioctl") from the drm tree.

I fixed it up (The latter removed some of the code modified by the
former) and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!

2015-04-06 Thread Mike Galbraith
On Mon, 2015-04-06 at 21:59 -0400, Steven Rostedt wrote:
> 
> We really should have a rt_spin_trylock_in_irq() and not have the
> below if conditional.
> 
> The paths that will be executed in hard irq context are static. They
> should be labeled as such.

I did it as an explicitly labeled special purpose (naughty) pair.

---
 include/linux/spinlock_rt.h |2 ++
 kernel/locking/rtmutex.c|   31 ++-
 2 files changed, 32 insertions(+), 1 deletion(-)

--- a/include/linux/spinlock_rt.h
+++ b/include/linux/spinlock_rt.h
@@ -27,6 +27,8 @@ extern void __lockfunc rt_spin_unlock_wa
 extern int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags);
 extern int __lockfunc rt_spin_trylock_bh(spinlock_t *lock);
 extern int __lockfunc rt_spin_trylock(spinlock_t *lock);
+extern int __lockfunc rt_spin_trylock_in_irq(spinlock_t *lock);
+extern void __lockfunc rt_spin_trylock_in_irq_unlock(spinlock_t *lock);
 extern int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock);
 
 /*
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -87,7 +87,7 @@ static int rt_mutex_real_waiter(struct r
  * supports cmpxchg and if there's no debugging state to be set up
  */
 #if defined(__HAVE_ARCH_CMPXCHG) && !defined(CONFIG_DEBUG_RT_MUTEXES)
-# define rt_mutex_cmpxchg(l,c,n)	(cmpxchg(&l->owner, c, n) == c)
+# define rt_mutex_cmpxchg(l,c,n)	(cmpxchg(&(l)->owner, (c), (n)) == (c))
 static inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
 {
 	unsigned long owner, *p = (unsigned long *) &lock->owner;
@@ -1208,6 +1208,35 @@ int __lockfunc rt_spin_trylock_irqsave(s
 }
 EXPORT_SYMBOL(rt_spin_trylock_irqsave);
 
+/*
+ * Special purpose for locks taken in interrupt context: Take and hold
+ * ->wait_lock lest PI catching us with our fingers in the cookie jar.
+ * Do NOT abuse.
+ */
+int __lockfunc rt_spin_trylock_in_irq(spinlock_t *lock)
+{
+	struct task_struct *owner;
+	if (!raw_spin_trylock(&lock->lock.wait_lock))
+		return 0;
+	owner = idle_task(raw_smp_processor_id());
+	if (!(rt_mutex_cmpxchg(&lock->lock, NULL, owner))) {
+		raw_spin_unlock(&lock->lock.wait_lock);
+		return 0;
+	}
+	spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+	return 1;
+}
+
+/* ONLY for use with rt_spin_trylock_in_irq(), do NOT abuse. */
+void __lockfunc rt_spin_trylock_in_irq_unlock(spinlock_t *lock)
+{
+	struct task_struct *owner = idle_task(raw_smp_processor_id());
+	/* NOTE: we always pass in '1' for nested, for simplicity */
+	spin_release(&lock->dep_map, 1, _RET_IP_);
+	BUG_ON(!(rt_mutex_cmpxchg(&lock->lock, owner, NULL)));
+	raw_spin_unlock(&lock->lock.wait_lock);
+}
+
 int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock)
 {
/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
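
A side note on the rt_mutex_cmpxchg() hunk above: the new callers pass
&lock->lock as the first argument, which is why the macro arguments had to be
parenthesized. The following is only a standalone sketch with toy types (struct
box, struct wrap and OWNER_PTR are made up for illustration, not kernel code)
showing the same effect in user space.

```c
#include <assert.h>

/* Toy stand-ins, not the kernel's rt_mutex or spinlock_t. */
struct box {
	int owner;
};

struct wrap {
	struct box lock;
};

/*
 * With the argument parenthesized, OWNER_PTR(&w->lock) expands to
 * (&(&w->lock)->owner), which is well formed. Without the parentheses
 * the expansion would be (&&w->lock->owner), which does not compile.
 */
#define OWNER_PTR(l)	(&(l)->owner)

/* Pass an address-of expression as the macro argument, as the patch does. */
static int *owner_of(struct wrap *w)
{
	return OWNER_PTR(&w->lock);
}

/* Small self-check: the macro must yield the address of the nested field. */
static void owner_ptr_selfcheck(void)
{
	struct wrap w = { .lock = { .owner = 7 } };

	assert(owner_of(&w) == &w.lock.owner);
	assert(*owner_of(&w) == 7);
}
```

The same reasoning applies to the (c) and (n) arguments once callers start
passing expressions rather than plain identifiers.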


Re: [PATCH V2 2/6] i2c: qup: Add V2 tags support

2015-04-06 Thread Andy Gross
On Tue, Apr 07, 2015 at 12:01:03AM +0530, Sricharan R wrote:



> +static u32 qup_i2c_send_data(struct qup_i2c_dev *qup, int tlen, u8 *tbuf,
> +  int dlen, u8 *dbuf)
> +{
> + u32 val = 0, idx = 0, pos = 0, i = 0, t;
> + int  len = tlen + dlen;
> + u8 *buf = tbuf;
> +
> + while (len > 0) {
> + if (qup_i2c_wait_ready(qup, QUP_OUT_FULL, 0, 4)) {

Instead of 0 and 4 can we use some #defines?  This applies for all of the
i2c_wait_ready calls
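
As a rough sketch of what that could look like (everything here is an
assumption: the quoted hunk does not say what the literals 0 and 4 mean, so the
names RESET_BIT and FOUR_BYTES, the placeholder value of QUP_OUT_FULL and the
stubbed qup_i2c_wait_ready() signature below are hypothetical):

```c
#include <stdbool.h>

/* Hypothetical names and values, for illustration only. */
#define QUP_OUT_FULL	(1u << 6)	/* placeholder bit position */
#define RESET_BIT	false		/* assumed meaning of the literal 0 */
#define FOUR_BYTES	4		/* assumed meaning of the literal 4 */

/* Minimal stand-in for the driver state. */
struct qup_i2c_dev {
	unsigned int status;
};

/*
 * Stand-in for qup_i2c_wait_ready(): returns 0 when the status bit
 * selected by op already has the value val, -1 otherwise. The real
 * function polls hardware; len is unused in this stub.
 */
static int qup_i2c_wait_ready(struct qup_i2c_dev *qup, unsigned int op,
			      bool val, int len)
{
	bool bit_set = (qup->status & op) != 0;

	(void)len;
	return bit_set == val ? 0 : -1;
}

/* The call site now says what it waits for without magic numbers. */
static int wait_for_output_space(struct qup_i2c_dev *qup)
{
	return qup_i2c_wait_ready(qup, QUP_OUT_FULL, RESET_BIT, FOUR_BYTES);
}
```

Whatever names end up being used, the point is only that each call site should
read as "wait until this bit has this state" rather than a pair of bare
integers.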




-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



RE: [v4 0/8] Add VT-d Posted-Interrupts support - IOMMU part

2015-04-06 Thread Wu, Feng


> -Original Message-
> From: Li, ZhenHua [mailto:zhen-h...@hp.com]
> Sent: Friday, April 03, 2015 4:13 PM
> To: Wu, Feng
> Cc: Joerg Roedel; dw...@infradead.org; jiang@linux.intel.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org; Li, ZhenHua
> Subject: Re: [v4 0/8] Add VT-d Posted-Interrupts support - IOMMU part
> 
> Hi Feng Wu,
> In my patchset, I created a new member ir_table->base_old_phys; In the
> normal kernel, everything is the same. In kdump kernel, ir_table->base
> is used for a buffer, and  ir_table->base_old_phys is the physical
> address of the tables used by the old kernel, also being used by the
> current kernel.
> 
> I did this in modify_irte():
> 
>  set_64bit(&irte->high, irte_modified->high);
> +
> +#ifdef CONFIG_CRASH_DUMP
> +   if (is_kdump_kernel())
> +   __iommu_update_old_irte(iommu, index);
> +#endif
>  __iommu_flush_cache(iommu, irte, sizeof(*irte));
> 
> 
> Here the irte tables are stored in two places:
> iommu->ir_table->base : It is a buffer in kdump kernel, which is the
> running kernel;
> iommu->ir_table->base_old_phys : It is the irte used by the old kernel;
> 
> And function __iommu_update_old_irte is used to save the content of
> iommu->ir_table->base  to iommu->ir_table->base_old_phys. Because in
> kdump kernel, the vt-d is using ir_table->base_old_phys, not
> ir_table->base, so we need to copy the updated ir_table->base to
> ir_table->base_old_phys .
> 

Hi Zhenhua,

Thanks very much for your clarification! Basically, the main purpose of my
patch set is to provide an interface to KVM, so that KVM can update the irte
for posted-interrupts. This interface calls modify_irte(). I also went
through the relevant parts of your patch set, and I could not find
any conflicts with your patches. What do you think? Thanks a lot!

Thanks,
Feng

-


> 
> Thanks
> Zhenhua
> 
> On 04/02/2015 07:28 PM, Joerg Roedel wrote:
> > On Mon, Feb 02, 2015 at 04:06:56PM +0800, Feng Wu wrote:
> >> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> >> With VT-d Posted-Interrupts enabled, external interrupts from
> >> direct-assigned devices can be delivered to guests without VMM
> >> intervention when guest is running in non-root mode.
> >>
> >> You can find the VT-d Posted-Interrupts Spec. in the following URL:
> >>
> >> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> >>
> >> This series was part of
> >> http://thread.gmane.org/gmane.linux.kernel.iommu/7708. To make things
> >> clear, send out IOMMU part here.
> >
> > Besides the modify_irte() changes I asked for the patch-set looks good.
> > I just have some concerns what these changes mean for the VT-d kdump
> > improvements Zhen-Hua Li is working on. Can you please discuss the
> > implications of having both patch-sets applied with him and make sure
> > they work together? I think in its current form your patch-set breaks
> > the kdump support patches. I added Zhen-Hua to Cc.
> >
> > Thanks,
> >
> > Joerg
> >



[RFC 3/7] ia64/PCI: Use common struct resource_entry to replace struct iospace_resource

2015-04-06 Thread Jiang Liu
Use common struct resource_entry to replace private struct iospace_resource.

Signed-off-by: Jiang Liu 
---
 arch/ia64/include/asm/pci.h |5 -
 arch/ia64/pci/pci.c |   17 -
 2 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/ia64/include/asm/pci.h b/arch/ia64/include/asm/pci.h
index 52af5ed9f60b..5c10e0ec48d4 100644
--- a/arch/ia64/include/asm/pci.h
+++ b/arch/ia64/include/asm/pci.h
@@ -83,11 +83,6 @@ extern int pci_mmap_legacy_page_range(struct pci_bus *bus,
 #define pci_legacy_read platform_pci_legacy_read
 #define pci_legacy_write platform_pci_legacy_write
 
-struct iospace_resource {
-   struct list_head list;
-   struct resource res;
-};
-
 struct pci_controller {
struct acpi_device *companion;
void *iommu;
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index cafe8e47afb2..c5630dd5e181 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -154,14 +154,14 @@ new_space (u64 phys_base, int sparse)
 static int add_io_space(struct device *dev, struct pci_root_info *info,
struct resource_entry *entry)
 {
-   struct iospace_resource *iospace;
+   struct resource_entry *iospace;
struct resource *resource, *res = entry->res;
char *name;
unsigned long base, min, max, base_port;
unsigned int sparse = 0, space_nr, len;
 
len = strlen(info->name) + 32;
-   iospace = kzalloc(sizeof(*iospace) + len, GFP_KERNEL);
+   iospace = resource_list_create_entry(NULL, len);
if (!iospace) {
dev_err(dev, "PCI: No memory for %s I/O port space\n",
info->name);
@@ -190,7 +190,7 @@ static int add_io_space(struct device *dev, struct pci_root_info *info,
 	if (space_nr == 0)
 		sparse = 1;
 
-	resource = &iospace->res;
+	resource = iospace->res;
 	resource->name  = name;
 	resource->flags = IORESOURCE_MEM;
 	resource->start = base + (sparse ? IO_SPACE_SPARSE_ENCODING(min) : min);
@@ -205,12 +205,12 @@ static int add_io_space(struct device *dev, struct pci_root_info *info,
 	entry->offset = base_port;
 	res->start = min + base_port;
 	res->end = max + base_port;
-	list_add_tail(&iospace->list, &info->io_resources);
+	resource_list_add_tail(iospace, &info->io_resources);
 
 	return 0;
 
 free_resource:
-	kfree(iospace);
+	resource_list_free_entry(iospace);
 	return -ENOSPC;
 }
 
@@ -348,12 +348,11 @@ static void add_resources(struct pci_root_info *info, struct device *dev)
 static void __release_pci_root_info(struct pci_root_info *info)
 {
 	struct resource *res;
-	struct iospace_resource *iospace, *tmp;
 	struct resource_entry *entry, *tentry;
 
-	list_for_each_entry_safe(iospace, tmp, &info->io_resources, list) {
-		release_resource(&iospace->res);
-		kfree(iospace);
+	resource_list_for_each_entry_safe(entry, tentry, &info->io_resources) {
+		release_resource(entry->res);
+		resource_list_destroy_entry(entry);
 	}
 
 	resource_list_for_each_entry_safe(entry, tentry, &info->resources) {
-- 
1.7.10.4



[RFC 4/7] x86/PCI: Rename struct pci_sysdata as struct pci_controller

2015-04-06 Thread Jiang Liu
Rename struct pci_sysdata as struct pci_controller, so we could share
common code between IA64 and x86 later.

Signed-off-by: Jiang Liu 
---
 arch/x86/include/asm/pci.h|   12 ++--
 arch/x86/include/asm/pci_64.h |4 ++--
 arch/x86/pci/acpi.c   |8 
 arch/x86/pci/common.c |2 +-
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 4e370a5d8117..5fcdf53fcd54 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -11,15 +11,15 @@
 
 #ifdef __KERNEL__
 
-struct pci_sysdata {
-   int domain; /* PCI domain */
-   int node;   /* NUMA node */
+struct pci_controller {
 #ifdef CONFIG_ACPI
struct acpi_device *companion;  /* ACPI companion device */
 #endif
 #ifdef CONFIG_X86_64
void*iommu; /* IOMMU private data */
 #endif
+   int segment;/* PCI domain */
+   int node;   /* NUMA node */
 };
 
 extern int pci_routeirq;
@@ -31,8 +31,8 @@ extern int noioapicreroute;
 #ifdef CONFIG_PCI_DOMAINS
 static inline int pci_domain_nr(struct pci_bus *bus)
 {
-   struct pci_sysdata *sd = bus->sysdata;
-   return sd->domain;
+   struct pci_controller *sd = bus->sysdata;
+   return sd->segment;
 }
 
 static inline int pci_proc_domain(struct pci_bus *bus)
@@ -127,7 +127,7 @@ int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
 /* Returns the node based on pci bus */
 static inline int __pcibus_to_node(const struct pci_bus *bus)
 {
-   const struct pci_sysdata *sd = bus->sysdata;
+   const struct pci_controller *sd = bus->sysdata;
 
return sd->node;
 }
diff --git a/arch/x86/include/asm/pci_64.h b/arch/x86/include/asm/pci_64.h
index fe15cfb21b9b..dcbb6b52d4fd 100644
--- a/arch/x86/include/asm/pci_64.h
+++ b/arch/x86/include/asm/pci_64.h
@@ -6,13 +6,13 @@
 #ifdef CONFIG_CALGARY_IOMMU
 static inline void *pci_iommu(struct pci_bus *bus)
 {
-   struct pci_sysdata *sd = bus->sysdata;
+   struct pci_controller *sd = bus->sysdata;
return sd->iommu;
 }
 
 static inline void set_pci_iommu(struct pci_bus *bus, void *val)
 {
-   struct pci_sysdata *sd = bus->sysdata;
+   struct pci_controller *sd = bus->sysdata;
sd->iommu = val;
 }
 #endif /* CONFIG_CALGARY_IOMMU */
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 150774be0f3f..e7f35dc84801 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -10,7 +10,7 @@
 struct pci_root_info {
struct acpi_device *bridge;
char name[16];
-   struct pci_sysdata sd;
+   struct pci_controller sd;
 #ifdef CONFIG_PCI_MMCONFIG
bool mcfg_added;
u16 segment;
@@ -366,7 +366,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	LIST_HEAD(crs_res);
 	LIST_HEAD(resources);
 	struct pci_bus *bus;
-	struct pci_sysdata *sd;
+	struct pci_controller *sd;
 	int node;
 
 	if (pci_ignore_seg)
@@ -398,7 +398,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	}
 
 	sd = &info->sd;
-	sd->domain = domain;
+	sd->segment = domain;
 	sd->node = node;
 	sd->companion = device;
 
@@ -464,7 +464,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 
 int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
 {
-	struct pci_sysdata *sd = bridge->bus->sysdata;
+	struct pci_controller *sd = bridge->bus->sysdata;
 
 	ACPI_COMPANION_SET(&bridge->dev, sd->companion);
 	return 0;
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 2fb384724ebb..e87194bad7e6 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -475,7 +475,7 @@ void __init dmi_check_pciprobe(void)
 void pcibios_scan_root(int busnum)
 {
struct pci_bus *bus;
-   struct pci_sysdata *sd;
+   struct pci_controller *sd;
LIST_HEAD(resources);
 
sd = kzalloc(sizeof(*sd), GFP_KERNEL);
-- 
1.7.10.4



[RFC 6/7] x86/PCI/ACPI: Use common interface to support PCI host bridge

2015-04-06 Thread Jiang Liu
Use common interface to simplify ACPI PCI host bridge implementation.

Signed-off-by: Jiang Liu 
---
 arch/x86/pci/acpi.c |  281 +--
 1 file changed, 72 insertions(+), 209 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index e7f35dc84801..b52a4d96bc89 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -4,13 +4,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 struct pci_root_info {
-   struct acpi_device *bridge;
-   char name[16];
-   struct pci_controller sd;
+   struct acpi_pci_root_info_common common;
 #ifdef CONFIG_PCI_MMCONFIG
bool mcfg_added;
u16 segment;
@@ -165,14 +164,17 @@ static int check_segment(u16 seg, struct device *dev, char *estr)
 	return 0;
 }
 
-static int setup_mcfg_map(struct pci_root_info *info, u16 seg, u8 start,
-			  u8 end, phys_addr_t addr)
+static int setup_mcfg_map(struct acpi_pci_root_info_common *ci)
 {
 	int result;
-	struct device *dev = &info->bridge->dev;
+	struct pci_root_info *info;
+	struct acpi_pci_root *root = ci->root;
+	struct device *dev = &ci->bridge->dev;
+	int seg = ci->controller.segment;
 
-	info->start_bus = start;
-	info->end_bus = end;
+	info = container_of(ci, struct pci_root_info, common);
+	info->start_bus = (u8)root->secondary.start;
+	info->end_bus = (u8)root->secondary.end;
 	info->mcfg_added = false;
 
 	/* return success if MMCFG is not in use */
@@ -182,7 +184,8 @@ static int setup_mcfg_map(struct pci_root_info *info, u16 seg, u8 start,
if (!(pci_probe & PCI_PROBE_MMCONF))
return check_segment(seg, dev, "MMCONFIG is disabled,");
 
-   result = pci_mmconfig_insert(dev, seg, start, end, addr);
+   result = pci_mmconfig_insert(dev, seg, info->start_bus, info->end_bus,
+root->mcfg_addr);
if (result == 0) {
/* enable MMCFG if it hasn't been enabled yet */
if (raw_pci_ext_ops == NULL)
@@ -195,8 +198,11 @@ static int setup_mcfg_map(struct pci_root_info *info, u16 seg, u8 start,
return 0;
 }
 
-static void teardown_mcfg_map(struct pci_root_info *info)
+static void teardown_mcfg_map(struct acpi_pci_root_info_common *ci)
 {
+   struct pci_root_info *info;
+
+   info = container_of(ci, struct pci_root_info, common);
if (info->mcfg_added) {
pci_mmconfig_delete(info->segment, info->start_bus,
info->end_bus);
@@ -204,170 +210,80 @@ static void teardown_mcfg_map(struct pci_root_info *info)
}
 }
 #else
-static int setup_mcfg_map(struct pci_root_info *info,
-   u16 seg, u8 start, u8 end,
-   phys_addr_t addr)
+static int setup_mcfg_map(struct acpi_pci_root_info_common *ci)
 {
return 0;
 }
-static void teardown_mcfg_map(struct pci_root_info *info)
+
+static void teardown_mcfg_map(struct acpi_pci_root_info_common *ci)
 {
 }
 #endif
 
-static void validate_resources(struct device *dev, struct list_head *crs_res,
-			       unsigned long type)
+static int pci_acpi_root_get_node(struct acpi_pci_root *root)
 {
-	LIST_HEAD(list);
-	struct resource *res1, *res2, *root = NULL;
-	struct resource_entry *tmp, *entry, *entry2;
-
-	BUG_ON((type & (IORESOURCE_MEM | IORESOURCE_IO)) == 0);
-	root = (type & IORESOURCE_MEM) ? &iomem_resource : &ioport_resource;
-
-	list_splice_init(crs_res, &list);
-	resource_list_for_each_entry_safe(entry, tmp, &list) {
-		bool free = false;
-		resource_size_t end;
-
-		res1 = entry->res;
-		if (!(res1->flags & type))
-			goto next;
-
-		/* Exclude non-addressable range or non-addressable portion */
-		end = min(res1->end, root->end);
-		if (end <= res1->start) {
-			dev_info(dev, "host bridge window %pR (ignored, not CPU addressable)\n",
-				 res1);
-			free = true;
-			goto next;
-		} else if (res1->end != end) {
-			dev_info(dev, "host bridge window %pR ([%#llx-%#llx] ignored, not CPU addressable)\n",
-				 res1, (unsigned long long)end + 1,
-				 (unsigned long long)res1->end);
-			res1->end = end;
-		}
-
-		resource_list_for_each_entry(entry2, crs_res) {
-			res2 = entry2->res;
-			if (!(res2->flags & type))
-				continue;
-
-			/*
-			 * I don't like throwing away windows because then
-			 * our resources no longer match the ACPI _CRS, but
-
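
The patch above recovers the arch-specific pci_root_info from the embedded
common structure with container_of(). As a minimal user-space sketch of that
pattern (the struct names and fields below are simplified stand-ins, and this
container_of is a plain offsetof-based version rather than the kernel's
type-checked macro):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified offsetof-based variant; the kernel's macro also type-checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the shared acpi_pci_root_info_common. */
struct common_info {
	int segment;
};

/*
 * Stand-in for an arch's pci_root_info. The patch puts the common part
 * first (offset zero); it is placed second here to show that the
 * pattern does not depend on the member's position.
 */
struct root_info {
	int start_bus;
	struct common_info common;
};

/* What generic code would call: it only ever sees the common part. */
static struct root_info *to_root_info(struct common_info *ci)
{
	return container_of(ci, struct root_info, common);
}

/* Small self-check of the round trip from member back to container. */
static void container_of_selfcheck(void)
{
	struct root_info r = { .start_bus = 3, .common = { .segment = 1 } };

	assert(to_root_info(&r.common) == &r);
	assert(to_root_info(&r.common)->start_bus == 3);
}
```

This is what lets the shared code pass around only the common structure while
each architecture keeps its private fields next to it.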

[RFC 7/7] ia64/PCI/ACPI: Use common interface to support PCI host bridge

2015-04-06 Thread Jiang Liu
Use common interface to simplify PCI host bridge implementation.

Tested-by: Tony Luck 
Signed-off-by: Jiang Liu 
---
 arch/ia64/pci/pci.c |  243 +++
 1 file changed, 49 insertions(+), 194 deletions(-)

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index c5630dd5e181..efd78cea6a1e 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -116,15 +116,11 @@ struct pci_ops pci_root_ops = {
 };
 
 struct pci_root_info {
-   struct pci_controller controller;
-   struct acpi_device *bridge;
-   struct list_head resources;
+   struct acpi_pci_root_info_common common;
struct list_head io_resources;
-   char name[16];
 };
 
-static unsigned int
-new_space (u64 phys_base, int sparse)
+static unsigned int new_space(u64 phys_base, int sparse)
 {
u64 mmio_base;
int i;
@@ -160,11 +156,11 @@ static int add_io_space(struct device *dev, struct pci_root_info *info,
unsigned long base, min, max, base_port;
unsigned int sparse = 0, space_nr, len;
 
-   len = strlen(info->name) + 32;
+   len = strlen(info->common.name) + 32;
iospace = resource_list_create_entry(NULL, len);
if (!iospace) {
dev_err(dev, "PCI: No memory for %s I/O port space\n",
-   info->name);
+   info->common.name);
return -ENOMEM;
}
 
@@ -179,7 +175,7 @@ static int add_io_space(struct device *dev, struct pci_root_info *info,
max = res->end - entry->offset;
base = __pa(io_space[space_nr].mmio_base);
base_port = IO_SPACE_BASE(space_nr);
-   snprintf(name, len, "%s I/O Ports %08lx-%08lx", info->name,
+   snprintf(name, len, "%s I/O Ports %08lx-%08lx", info->common.name,
 base_port + min, base_port + max);
 
/*
@@ -214,216 +210,75 @@ free_resource:
return -ENOSPC;
 }
 
-static int
-probe_pci_root_info(struct pci_root_info *info, struct acpi_device *device,
-		    int busnum, int domain)
+static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info_common *ci,
+					   int status)
 {
-	int ret;
-	struct list_head *list = &info->resources;
+	struct device *dev = &ci->bridge->dev;
+	struct pci_root_info *info;
+	struct resource *res;
 	struct resource_entry *entry, *tmp;
 
-	ret = acpi_dev_get_resources(device, list,
-				     acpi_dev_filter_resource_type_cb,
-				     (void *)(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_WINDOW));
-	if (ret < 0)
-		dev_warn(&device->dev,
-			 "failed to parse _CRS method, error code %d\n", ret);
-	else if (ret == 0)
-		dev_dbg(&device->dev,
-			"no IO and memory resources present in _CRS\n");
-	else
-		resource_list_for_each_entry_safe(entry, tmp, list) {
-			if (entry->res->flags & IORESOURCE_DISABLED)
-				resource_list_destroy_entry(entry);
-			else
-				entry->res->name = info->name;
+	if (status > 0) {
+		info = container_of(ci, struct pci_root_info, common);
+		resource_list_for_each_entry_safe(entry, tmp, &ci->resources) {
+			res = entry->res;
+			if (res->flags & IORESOURCE_MEM) {
+				/*
+				 * HP's firmware has a hack to work around a
+				 * Windows bug. Ignore these tiny memory ranges.
+				 */
+				if (resource_size(res) <= 16) {
+					resource_list_del(entry);
+					insert_resource(&iomem_resource,
+							entry->res);
+					resource_list_add_tail(entry,
+							&info->io_resources);
+				}
+			} else if (res->flags & IORESOURCE_IO) {
+				if (add_io_space(dev, info, entry))
+					resource_list_destroy_entry(entry);
+			}
 		}
+	}
 
-	return ret;
+	return status;
 }
 
-static void validate_resources(struct device *dev, struct list_head *resources,
-			       unsigned long type)
+static int pci_acpi_root_init_info(struct acpi_pci_root_info_common *ci)
 {
-	LIST_HEAD(list);
-	struct resource *res1, *res2, *root = NULL;
-	struct resource_entry *tmp, *entry, *entry2;
-
-	BUG_ON((type & (IORESOURCE_MEM | IORESOURCE_IO)) == 0);
-	root = (type & IORESOURCE_MEM) ? &iomem_resource : &ioport_resource;
-
-	list_splice_init(resources, &list);
-   

[RFC 1/7] ACPI/PCI: Enhance ACPI core to support sparse IO space

2015-04-06 Thread Jiang Liu
Enhance ACPI resource parsing interfaces to support sparse IO space,
which will be used to share common code between x86 and IA64.

Tested-by: Tony Luck 
Signed-off-by: Jiang Liu 
---
 drivers/acpi/resource.c |9 ++---
 include/linux/ioport.h  |1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 0187e0e11bb8..5727c3bf98c9 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -123,7 +123,7 @@ bool acpi_dev_resource_memory(struct acpi_resource *ares, struct resource *res)
 EXPORT_SYMBOL_GPL(acpi_dev_resource_memory);
 
 static void acpi_dev_ioresource_flags(struct resource *res, u64 len,
- u8 io_decode)
+ u8 io_decode, u8 translation_type)
 {
res->flags = IORESOURCE_IO;
 
@@ -135,6 +135,8 @@ static void acpi_dev_ioresource_flags(struct resource *res, u64 len,
 
if (io_decode == ACPI_DECODE_16)
res->flags |= IORESOURCE_IO_16BIT_ADDR;
+   if (translation_type == ACPI_SPARSE_TRANSLATION)
+   res->flags |= IORESOURCE_IO_SPARSE;
 }
 
 static void acpi_dev_get_ioresource(struct resource *res, u64 start, u64 len,
@@ -142,7 +144,7 @@ static void acpi_dev_get_ioresource(struct resource *res, u64 start, u64 len,
 {
res->start = start;
res->end = start + len - 1;
-   acpi_dev_ioresource_flags(res, len, io_decode);
+   acpi_dev_ioresource_flags(res, len, io_decode, 0);
 }
 
 /**
@@ -227,7 +229,8 @@ static bool acpi_decode_space(struct resource_win *win,
acpi_dev_memresource_flags(res, len, wp);
break;
case ACPI_IO_RANGE:
-   acpi_dev_ioresource_flags(res, len, iodec);
+   acpi_dev_ioresource_flags(res, len, iodec,
+ addr->info.io.translation_type);
break;
case ACPI_BUS_NUMBER_RANGE:
res->flags = IORESOURCE_BUS;
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 2c525078..b9762760ca49 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -94,6 +94,7 @@ struct resource {
 /* PnP I/O specific bits (IORESOURCE_BITS) */
 #define IORESOURCE_IO_16BIT_ADDR   (1<<0)
 #define IORESOURCE_IO_FIXED(1<<1)
+#define IORESOURCE_IO_SPARSE   (1<<2)
 
 /* PCI ROM control bits (IORESOURCE_BITS) */
 #define IORESOURCE_ROM_ENABLE  (1<<0)  /* ROM is enabled, same as 
PCI_ROM_ADDRESS_ENABLE */
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC 5/7] PCI/ACPI: Consolidate common PCI host bridge code into ACPI core

2015-04-06 Thread Jiang Liu
Introduce common interface acpi_pci_root_create() and related data
structures to create PCI root bus for ACPI PCI host bridges. It will
be used to kill duplicated arch specific code for IA64 and x86. It may
also help ARM64 in future.

Tested-by: Tony Luck 
Signed-off-by: Jiang Liu 
---
 drivers/acpi/pci_root.c  |  215 ++
 include/linux/pci-acpi.h |   24 ++
 2 files changed, 239 insertions(+)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 68a5f712cd19..32d6a5ba5534 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -659,6 +659,221 @@ static void acpi_pci_root_remove(struct acpi_device *device)
kfree(root);
 }
 
+static void acpi_pci_root_validate_resources(struct device *dev,
+struct list_head *resources,
+unsigned long type)
+{
+   LIST_HEAD(list);
+   struct resource *res1, *res2, *root = NULL;
+   struct resource_entry *tmp, *entry, *entry2;
+
+   BUG_ON((type & (IORESOURCE_MEM | IORESOURCE_IO)) == 0);
+   root = (type & IORESOURCE_MEM) ? _resource : _resource;
+
+   list_splice_init(resources, );
+   resource_list_for_each_entry_safe(entry, tmp, ) {
+   bool free = false;
+   resource_size_t end;
+
+   res1 = entry->res;
+   if (!(res1->flags & type))
+   goto next;
+
+   /* Exclude non-addressable range or non-addressable portion */
+   end = min(res1->end, root->end);
+   if (end <= res1->start) {
+   dev_info(dev, "host bridge window %pR (ignored, not CPU 
addressable)\n",
+res1);
+   free = true;
+   goto next;
+   } else if (res1->end != end) {
+   dev_info(dev, "host bridge window %pR ([%#llx-%#llx] 
ignored, not CPU addressable)\n",
+res1, (unsigned long long)end + 1,
+(unsigned long long)res1->end);
+   res1->end = end;
+   }
+
+   resource_list_for_each_entry(entry2, resources) {
+   res2 = entry2->res;
+   if (!(res2->flags & type))
+   continue;
+
+   /*
+* I don't like throwing away windows because then
+* our resources no longer match the ACPI _CRS, but
+* the kernel resource tree doesn't allow overlaps.
+*/
+   if (resource_overlaps(res1, res2)) {
+   res2->start = min(res1->start, res2->start);
+   res2->end = max(res1->end, res2->end);
+   dev_info(dev, "host bridge window expanded to 
%pR; %pR ignored\n",
+res2, res1);
+   free = true;
+   goto next;
+   }
+   }
+
+next:
+   resource_list_del(entry);
+   if (free)
+   resource_list_free_entry(entry);
+   else
+   resource_list_add_tail(entry, resources);
+   }
+}
+
+static int acpi_pci_probe_root_resources(struct acpi_pci_root_info_common *info)
+{
+   int ret;
+   struct list_head *list = >resources;
+   struct acpi_device *device = info->bridge;
+   struct resource_entry *entry, *tmp;
+   unsigned long flags;
+
+   flags = IORESOURCE_IO | IORESOURCE_MEM |
+   IORESOURCE_WINDOW | IORESOURCE_MEM_8AND16BIT;
+   ret = acpi_dev_get_resources(device, list,
+acpi_dev_filter_resource_type_cb,
+(void *)flags);
+   if (ret < 0)
+   dev_warn(>dev,
+"failed to parse _CRS method, error code %d\n", ret);
+   else if (ret == 0)
+   dev_dbg(>dev,
+   "no IO and memory resources present in _CRS\n");
+   else {
+   resource_list_for_each_entry_safe(entry, tmp, list) {
+   if (entry->res->flags & IORESOURCE_DISABLED)
+   resource_list_destroy_entry(entry);
+   else
+   entry->res->name = info->name;
+   }
+   acpi_pci_root_validate_resources(>dev, list,
+IORESOURCE_MEM);
+   acpi_pci_root_validate_resources(>dev, list,
+IORESOURCE_IO);
+   }
+
+   return ret;
+}
+
+static void pci_acpi_root_add_resources(struct acpi_pci_root_info_common *info)
+{
+   struct resource_entry *entry, *tmp;
+   

[RFC 2/7] ia64/PCI/ACPI: Use common ACPI resource parsing interface for host bridge

2015-04-06 Thread Jiang Liu
Use common ACPI resource parsing interface to parse ACPI resources for
PCI host bridge, so we could share more code between IA64 and x86.
Later we will consolidate arch specific implementations into ACPI core.

Tested-by: Tony Luck 
Signed-off-by: Jiang Liu 
---
 arch/ia64/pci/pci.c |  388 ++-
 1 file changed, 168 insertions(+), 220 deletions(-)

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 48cc65705db4..cafe8e47afb2 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -115,29 +115,12 @@ struct pci_ops pci_root_ops = {
.write = pci_write,
 };
 
-/* Called by ACPI when it finds a new root bus.  */
-
-static struct pci_controller *alloc_pci_controller(int seg)
-{
-   struct pci_controller *controller;
-
-   controller = kzalloc(sizeof(*controller), GFP_KERNEL);
-   if (!controller)
-   return NULL;
-
-   controller->segment = seg;
-   return controller;
-}
-
 struct pci_root_info {
+   struct pci_controller controller;
struct acpi_device *bridge;
-   struct pci_controller *controller;
struct list_head resources;
-   struct resource *res;
-   resource_size_t *res_offset;
-   unsigned int res_num;
struct list_head io_resources;
-   char *name;
+   char name[16];
 };
 
 static unsigned int
@@ -168,11 +151,11 @@ new_space (u64 phys_base, int sparse)
return i;
 }
 
-static u64 add_io_space(struct pci_root_info *info,
-   struct acpi_resource_address64 *addr)
+static int add_io_space(struct device *dev, struct pci_root_info *info,
+   struct resource_entry *entry)
 {
struct iospace_resource *iospace;
-   struct resource *resource;
+   struct resource *resource, *res = entry->res;
char *name;
unsigned long base, min, max, base_port;
unsigned int sparse = 0, space_nr, len;
@@ -180,27 +163,24 @@ static u64 add_io_space(struct pci_root_info *info,
len = strlen(info->name) + 32;
iospace = kzalloc(sizeof(*iospace) + len, GFP_KERNEL);
if (!iospace) {
-   dev_err(>bridge->dev,
-   "PCI: No memory for %s I/O port space\n",
-   info->name);
-   goto out;
+   dev_err(dev, "PCI: No memory for %s I/O port space\n",
+   info->name);
+   return -ENOMEM;
}
 
-   name = (char *)(iospace + 1);
-
-   min = addr->address.minimum;
-   max = min + addr->address.address_length - 1;
-   if (addr->info.io.translation_type == ACPI_SPARSE_TRANSLATION)
+   if (res->flags & IORESOURCE_IO_SPARSE)
sparse = 1;
-
-   space_nr = new_space(addr->address.translation_offset, sparse);
+   space_nr = new_space(entry->offset, sparse);
if (space_nr == ~0)
goto free_resource;
 
+   name = (char *)(iospace + 1);
+   min = res->start - entry->offset;
+   max = res->end - entry->offset;
base = __pa(io_space[space_nr].mmio_base);
base_port = IO_SPACE_BASE(space_nr);
snprintf(name, len, "%s I/O Ports %08lx-%08lx", info->name,
-   base_port + min, base_port + max);
+base_port + min, base_port + max);
 
/*
 * The SDM guarantees the legacy 0-64K space is sparse, but if the
@@ -216,156 +196,174 @@ static u64 add_io_space(struct pci_root_info *info,
resource->start = base + (sparse ? IO_SPACE_SPARSE_ENCODING(min) : min);
resource->end   = base + (sparse ? IO_SPACE_SPARSE_ENCODING(max) : max);
if (insert_resource(_resource, resource)) {
-   dev_err(>bridge->dev,
-   "can't allocate host bridge io space resource  
%pR\n",
-   resource);
+   dev_err(dev,
+   "can't allocate host bridge io space resource  %pR\n",
+   resource);
goto free_resource;
}
 
+   entry->offset = base_port;
+   res->start = min + base_port;
+   res->end = max + base_port;
list_add_tail(>list, >io_resources);
-   return base_port;
+
+   return 0;
 
 free_resource:
kfree(iospace);
-out:
-   return ~0;
+   return -ENOSPC;
 }
 
-static acpi_status resource_to_window(struct acpi_resource *resource,
- struct acpi_resource_address64 *addr)
+static int
+probe_pci_root_info(struct pci_root_info *info, struct acpi_device *device,
+   int busnum, int domain)
 {
-   acpi_status status;
+   int ret;
+   struct list_head *list = >resources;
+   struct resource_entry *entry, *tmp;
 
-   /*
-* We're only interested in _CRS descriptors that are
-*  - address space descriptors for memory or I/O space
-*  - non-zero size
-*  - producers, i.e., 

[RFC 0/7] Consolidate common ACPI PCI host bridge code into ACPI core

2015-04-06 Thread Jiang Liu
As suggested by Bjorn, this patch set consolidates common ACPI PCI host
bridge code from x86 and IA64 into ACPI core. It may also help to
support ACPI PCI host bridges on ARM64 platforms in the future.

It introduces struct acpi_pci_root_ops and acpi_pci_root_create().
Arch code only needs to implement struct acpi_pci_root_ops and then
invoke acpi_pci_root_create() to parse ACPI resources and create
PCI root bus.

	struct acpi_pci_root_info_common {
		struct pci_controller		controller;
		struct acpi_pci_root		*root;
		struct acpi_device		*bridge;
		struct acpi_pci_root_ops	*ops;
		struct list_head		resources;
		char				name[16];
	};

	struct acpi_pci_root_ops {
		struct pci_ops *pci_ops;
		int (*init_info)(struct acpi_pci_root_info_common *info);
		void (*release_info)(struct acpi_pci_root_info_common *info);
		int (*prepare_resources)(struct acpi_pci_root_info_common *info,
					 int status);
	};

	extern struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
						    struct acpi_pci_root_ops *ops,
						    size_t extra_size);
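
To make the division of labour concrete, here is a rough userspace sketch of the callback pattern, under stated assumptions. Everything prefixed sketch_ is hypothetical, and the order in which the core invokes the callbacks is a guess made for illustration, not a description of acpi_pci_root_create() itself. The sketch shows a core routine that allocates the common info plus extra_size bytes, and an arch wrapper that reaches its private data with a container_of-style macro.

```c
#include <stddef.h>
#include <stdlib.h>

struct sketch_info_common;

/* Hypothetical, reduced analogue of struct acpi_pci_root_ops. */
struct sketch_root_ops {
	int (*init_info)(struct sketch_info_common *ci);
	void (*release_info)(struct sketch_info_common *ci);
	int (*prepare_resources)(struct sketch_info_common *ci, int status);
};

struct sketch_info_common {
	const struct sketch_root_ops *ops;
	int nr_windows;
};

/* Arch-private wrapper: common part first, arch fields in the extra space. */
struct sketch_arch_info {
	struct sketch_info_common common;
	int arch_private;
};

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int sketch_arch_init(struct sketch_info_common *ci)
{
	struct sketch_arch_info *info =
		sketch_container_of(ci, struct sketch_arch_info, common);

	info->arch_private = 1;
	return 0;
}

static int sketch_arch_prepare(struct sketch_info_common *ci, int status)
{
	(void)ci;
	/* An arch could drop or adjust windows here; pass the count through. */
	return status;
}

const struct sketch_root_ops sketch_arch_ops = {
	.init_info = sketch_arch_init,
	.prepare_resources = sketch_arch_prepare,
};

/* Core side: allocate, let the arch initialise, then let it vet resources. */
int sketch_root_create(const struct sketch_root_ops *ops, size_t extra_size,
		       int probed_windows)
{
	struct sketch_info_common *ci;
	int status = probed_windows;

	ci = calloc(1, sizeof(*ci) + extra_size);
	if (!ci)
		return -1;
	ci->ops = ops;

	if (ops->init_info && ops->init_info(ci)) {
		status = -1;
		goto out;
	}
	if (ops->prepare_resources)
		status = ops->prepare_resources(ci, status);
	if (status > 0)
		ci->nr_windows = status;
out:
	/* The sketch tears down immediately; a real caller would keep ci. */
	if (ops->release_info)
		ops->release_info(ci);
	free(ci);
	return status;
}
```

A caller would pass sizeof(struct sketch_arch_info) - sizeof(struct sketch_info_common) as extra_size so that the wrapper fits in the allocation.
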
It passes Fengguang's 0day test suite and has been tested on 
1) An Intel x86 4 socket platform
2) An Intel IA64 SDV
3) An HP IA64 platform

And you may access it at:
https://github.com/jiangliu/linux.git acpi_pci_root_v1

Thanks!
Gerry

Jiang Liu (7):
  ACPI/PCI: Enhance ACPI core to support sparse IO space
  ia64/PCI/ACPI: Use common ACPI resource parsing interface for host
bridge
  ia64/PCI: Use common struct resource_entry to replace struct
iospace_resource
  x86/PCI: Rename struct pci_sysdata as struct pci_controller
  PCI/ACPI: Consolidate common PCI host bridge code into ACPI core
  x86/PCI/ACPI: Use common interface to support PCI host bridge
  ia64/PCI/ACPI: Use common interface to support PCI host bridge

 arch/ia64/include/asm/pci.h   |5 -
 arch/ia64/pci/pci.c   |  360 ++---
 arch/x86/include/asm/pci.h|   12 +-
 arch/x86/include/asm/pci_64.h |4 +-
 arch/x86/pci/acpi.c   |  283 +---
 arch/x86/pci/common.c |2 +-
 drivers/acpi/pci_root.c   |  215 
 drivers/acpi/resource.c   |9 +-
 include/linux/ioport.h|1 +
 include/linux/pci-acpi.h  |   24 +++
 10 files changed, 409 insertions(+), 506 deletions(-)

-- 
1.7.10.4



[Bugfix v4] x86/PCI/ACPI: Fix regression caused by commit 63f1789ec716

2015-04-06 Thread Jiang Liu
Before commit 593669c2ac0f("Use common ACPI resource interfaces to
simplify implementation"), arch/x86/pci/acpi.c applied the following
rules when parsing ACPI resources for a PCI host bridge:
1) Ignore IO port resources defined by acpi_resource_io and
   acpi_resource_fixed_io, which should be used to define resource
   for PCI device instead of PCI bridge.
2) Accept IOMEM resources defined by acpi_resource_memory24,
   acpi_resource_memory32 and acpi_resource_fixed_memory32.
   These IOMEM resources are accepted to work around some BIOS issues,
   though they should be ignored. For example, the PC Engines APU.1C
   platform defines PCI host bridge IOMEM resources as:
Memory32Fixed (ReadOnly,
0x000A, // Address Base
0x0002, // Address Length
)
Memory32Fixed (ReadOnly,
0x, // Address Base
0x, // Address Length
_Y00)
3) Accept all IO port and IOMEM resources defined by
   acpi_resource_address{16,32,64,extended64}, regardless of whether they
   are marked as ACPI_CONSUMER or ACPI_PRODUCER.

Commit 593669c2ac0f("Use common ACPI resource interfaces to
simplify implementation") accepts all IO port and IOMEM resources
defined by acpi_resource_io, acpi_resource_fixed_io,
acpi_resource_memory24, acpi_resource_memory32,
acpi_resource_fixed_memory32 and
acpi_resource_address{16,32,64,extended64}, which causes IO port
resources consumed by the host bridge itself to be listed in the host
bridge resource list.

Then commit 63f1789ec716("Ignore resources consumed by host bridge
itself") ignores resources consumed by the host bridge itself by checking
the IORESOURCE_WINDOW flag, which accidentally removed the workaround
in 2) above for the BIOS bug.

It took us a long time to figure out the whole picture.
So we refine the interface acpi_dev_filter_resource_type as below,
which should be easier to maintain:
1) Caller specifies IORESOURCE_WINDOW flag to explicitly query resource
   for bridge(PRODUCER), otherwise it's querying resource for
   device(CONSUMER).
2) Ignore IO port resources defined by acpi_resource_io and
   acpi_resource_fixed_io if IORESOURCE_WINDOW is specified.
3) Accept IOMEM resources defined by acpi_resource_memory24,
   acpi_resource_memory32 and acpi_resource_fixed_memory32 if both
   IORESOURCE_WINDOW and IORESOURCE_MEM_8AND16BIT are specified to work
   around BIOS issues.
4) Accept IO port and IOMEM defined by acpi_resource_addressxx if
   a) IORESOURCE_WINDOW is specified and ACPI_PRODUCER is true
   b) IORESOURCE_WINDOW is not specified and ACPI_PRODUCER is false

Currently acpi_dev_filter_resource_type() is only used by ACPI pci
host bridge and IOAPIC driver, so it shouldn't affect other drivers.
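
The four refined rules listed above can be read as a small decision function. The following is a userspace sketch under stated assumptions, not the kernel's acpi_dev_filter_resource_type(): the enum, the SKETCH_ flags and the function name are hypothetical, the IO/MEM type matching of the real filter is left out, and the behaviour for small IO and memory descriptors when IORESOURCE_WINDOW is not requested (accept them for a device query) is my reading of the rules rather than something the text states outright.

```c
#include <stdbool.h>

/* Hypothetical grouping of the ACPI descriptor types named in the rules. */
enum sketch_desc_kind {
	SKETCH_DESC_IO,      /* acpi_resource_io, acpi_resource_fixed_io */
	SKETCH_DESC_MEMORY,  /* acpi_resource_memory24/32, fixed_memory32 */
	SKETCH_DESC_ADDRESS, /* acpi_resource_address{16,32,64,extended64} */
};

#define SKETCH_WANT_WINDOW     (1u << 0) /* stands in for IORESOURCE_WINDOW */
#define SKETCH_WANT_MEM_8AND16 (1u << 1) /* stands in for IORESOURCE_MEM_8AND16BIT */

/* Return true if a descriptor should be kept for the caller's query. */
bool sketch_filter_accepts(enum sketch_desc_kind kind, bool producer,
			   unsigned int want)
{
	/* Rule 1: the caller asks for bridge windows explicitly. */
	bool window = (want & SKETCH_WANT_WINDOW) != 0;

	switch (kind) {
	case SKETCH_DESC_IO:
		/* Rule 2: small IO descriptors are ignored for a bridge. */
		return !window;
	case SKETCH_DESC_MEMORY:
		/* Rule 3: kept for a bridge only with the workaround flag. */
		if (window)
			return (want & SKETCH_WANT_MEM_8AND16) != 0;
		return true;
	case SKETCH_DESC_ADDRESS:
		/* Rule 4: producer/consumer must match the kind of query. */
		return window == producer;
	}
	return false;
}
```

With both flags set, as in the probe_pci_root_info() hunk of this patch, fixed memory descriptors such as the Memory32Fixed entries quoted above are kept, while IO descriptors consumed by the bridge itself are dropped.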

Another possible fix is to only ignore IO resource consumed by host
bridge and keep IOMEM resource consumed by host bridge, please refer to:
http://www.spinics.net/lists/linux-pci/msg39706.html

Sample ACPI table are archived at:
https://bugzilla.kernel.org/show_bug.cgi?id=94221

V3->V4:
1) Improve comments
2) Use flag IORESOURCE_MEM_8AND16BIT to work around BIOS issue

Fixes: 63f1789ec716("Ignore resources consumed by host bridge itself")
Reported-and-Tested-by: Bernhard Thaler 
Signed-off-by: Jiang Liu 
---
 arch/x86/pci/acpi.c |8 +---
 drivers/acpi/resource.c |   52 +--
 2 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index e4695985f9de..150774be0f3f 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -332,12 +332,15 @@ static void probe_pci_root_info(struct pci_root_info *info,
 {
int ret;
struct resource_entry *entry, *tmp;
+   unsigned long res_flags;
 
sprintf(info->name, "PCI Bus %04x:%02x", domain, busnum);
info->bridge = device;
+   res_flags = IORESOURCE_IO | IORESOURCE_MEM |
+   IORESOURCE_WINDOW | IORESOURCE_MEM_8AND16BIT;
ret = acpi_dev_get_resources(device, list,
 acpi_dev_filter_resource_type_cb,
-(void *)(IORESOURCE_IO | IORESOURCE_MEM));
+(void *)res_flags);
if (ret < 0)
dev_warn(>dev,
 "failed to parse _CRS method, error code %d\n", ret);
@@ -346,8 +349,7 @@ static void probe_pci_root_info(struct pci_root_info *info,
"no IO and memory resources present in _CRS\n");
else
resource_list_for_each_entry_safe(entry, tmp, list) {
-   if ((entry->res->flags & IORESOURCE_WINDOW) == 0 ||
-   (entry->res->flags & IORESOURCE_DISABLED))
+   if (entry->res->flags & IORESOURCE_DISABLED)
resource_list_destroy_entry(entry);
else

Re: [PATCH v2 1/2] dt-bindings: Document the hi6220 thermal sensor bindings

2015-04-06 Thread Eduardo Valentin
On Tue, Apr 07, 2015 at 11:46:22AM +0800, Xinwei Kong wrote:
> 
> 
> On 2015/4/6 22:03, Matt Porter wrote:
> > On Tue, Mar 31, 2015 at 02:59:21PM +0800, Xinwei Kong wrote:
> >> From: kongxinwei 
> >>
> >> This adds documentation of device tree bindings for the
> >> thermal sensor controller of hi6220 SoC.
> >>
> >> Signed-off-by: Leo Yan 
> >> Signed-off-by: kongxinwei 
> >> ---
> >>  .../bindings/thermal/hisilicon-thermal.txt | 45 
> >> ++
> >>  1 file changed, 45 insertions(+)
> >>  create mode 100644 
> >> Documentation/devicetree/bindings/thermal/hisilicon-thermal.txt
> >>
> >> diff --git 
> >> a/Documentation/devicetree/bindings/thermal/hisilicon-thermal.txt 
> >> b/Documentation/devicetree/bindings/thermal/hisilicon-thermal.txt
> >> new file mode 100644
> >> index 000..ceb6e2e
> >> --- /dev/null
> >> +++ b/Documentation/devicetree/bindings/thermal/hisilicon-thermal.txt
> >> @@ -0,0 +1,45 @@
> >> +* Hisilicon Thermal
> >> +
> >> +This driver is for hi6220 SoC which contain 4 thermal sensor.
> >> +
> >> +  1. sensor 0: local sensor;
> >> +  2. sensor 1: remote sensor for ACPU cluster 1;
> >> +  3. sensor 2: remote sensor for ACPU cluster 2;
> >> +  4. sensor 3: remote sensor for GPU.
> >> +
> >> +Every sensor use one child node to represent it, so thermal sensor include
> >> +parent node and four child node. The parent node describe common feature 
> >> and
> >> +child node describe private feature for thermal sensor;
> >> +
> >> +** Required properties :
> >> +
> >> +- compatible: "hisilicon,tsensor".
> >> +- reg: physical base address of thermal sensor and length of memory mapped
> >> +  region.
> >> +- interrupt: The interrupt number to the cpu. Defines the interrupt used
> >> +  by SOCTHERM.
> >> +- clock-names: Input clock name, should be 'thermal_clk'.
> >> +- clocks: phandles for clock specified in "clock-names" property.
> >> +- #thermal-sensor-cells: Should be 1. See ./thermal.txt for a description.
> >> +
> >> +** Required properties for child nodes :
> >> +
> >> +- hisilicon,tsensor-id: the index of thermal sensor and use it to 
> >> distinguish
> >> +  thermal sensor. For example: <0> stands for local sensor; <1> stands for
> >> +  acpu1 sensor;
> > 
> > Please show an example illustrating why this property is needed. The
> > example below doesn't show any per sensor properties aside from the
> > sensor id. Other bindings with a similar sub-sensor hardware design like
> > tegra-soctherm and rockchip-thermal don't have a need for a vendor
> > specific property like this. Their drivers simply iterate over an id
> > index during thermal sensor registration.
> > 
> > -Matt
> > 
> The thermal IP of the hisilicon SoC can get four module temperatures: local
> sensor, ACPU0 sensor, ACPU1 sensor and gpu sensor. In order to use these
> sensors, this driver uses the sensor id to distinguish them.
> 
> Only one of these four sensors can be read at a time, because they share
> the same register and a different value is written to enable each
> sensor. The sensor id is therefore the key flag for these sensor modules.
> 
> If the sensor id is deleted, this driver will have to define the values that
> are written to the sensor register, and the register operations will be
> difficult to understand.

The above still does not explain why you need a specific property.

Could you please check
Documentation/devicetree/bindings/thermal/thermal.txt file?

There are several examples there on how to define DT nodes for the exact
case you describe above.

> 
> Thanks
> Xinwei
> 
> >> +
> >> +Example :
> >> +
> >> +  tsensor: tsensor@0,f7030700 {
> >> +  compatible = "hisilicon,tsensor";
> >> +  reg = <0x0 0xf7030700 0x0 0x1000>;
> >> +  interrupts = <0 7 0x4>;
> >> +  clocks = <_sys HI6220_TSENSOR_CLK>;
> >> +  clock-names = "thermal_clk";
> >> +  #thermal-sensor-cells = <1>;
> >> +
> >> +  local_sensor {
> >> +  hisilicon,tsensor-id = <0>;
> >> +  }
> >> +  ...
> >> +  }
> >> -- 
> >> 1.9.1
> >>
> >>
> >>
> >> ___
> >> linux-arm-kernel mailing list
> >> linux-arm-ker...@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > 
> > .
> > 
> 




[PATCH v2] phy: core: Check requested PHY status in _of_phy_get()

2015-04-06 Thread Axel Lin
This check is common to various drivers, so move it into
_of_phy_get().
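
As an illustration of the error-path ordering involved, here is a small userspace sketch. It is not the phy core: struct sketch_provider, sketch_phy_get() and SKETCH_EPROBE_DEFER are hypothetical stand-ins (EPROBE_DEFER is kernel-internal, so its value is only assumed here). The point it shows is that once the provider reference has been taken, the "disabled" exit must go through the label that drops it again.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_EPROBE_DEFER 517 /* assumed value of the kernel-internal code */

struct sketch_provider {
	int refs;       /* stands in for the module reference count */
	bool available; /* stands in for of_device_is_available() */
	int phy_id;     /* what the provider's xlate callback would yield */
};

/* Return a phy id (>= 0) on success or a negative error code. */
int sketch_phy_get(struct sketch_provider *p)
{
	int ret;

	if (!p) {
		/* No reference was taken, so skip the put. */
		ret = -SKETCH_EPROBE_DEFER;
		goto out;
	}
	p->refs++;

	if (!p->available) {
		/* Reference already held: must leave via out_put. */
		ret = -ENODEV;
		goto out_put;
	}

	ret = p->phy_id;

out_put:
	p->refs--;
out:
	return ret;
}
```

The success path falls through out_put as well, mirroring how the patched function reaches module_put() on both the success and the disabled-PHY paths.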

Signed-off-by: Axel Lin 
---
v2: Fix up the error path. It needs to call module_put() if the requested
PHY is disabled.
 drivers/phy/phy-core.c  | 12 ++--
 drivers/phy/phy-miphy28lp.c |  5 -
 drivers/phy/phy-miphy365x.c |  5 -
 drivers/phy/phy-rcar-gen2.c |  5 -
 4 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/drivers/phy/phy-core.c b/drivers/phy/phy-core.c
index 3791838f..f7403de 100644
--- a/drivers/phy/phy-core.c
+++ b/drivers/phy/phy-core.c
@@ -367,13 +367,21 @@ static struct phy *_of_phy_get(struct device_node *np, int index)
phy_provider = of_phy_provider_lookup(args.np);
if (IS_ERR(phy_provider) || !try_module_get(phy_provider->owner)) {
phy = ERR_PTR(-EPROBE_DEFER);
-   goto err0;
+   goto out_unlock;
+   }
+
+   if (!of_device_is_available(args.np)) {
+   dev_warn(phy_provider->dev, "Requested PHY is disabled\n");
+   phy = ERR_PTR(-ENODEV);
+   goto out_put_module;
}
 
phy = phy_provider->of_xlate(phy_provider->dev, );
+
+out_put_module:
module_put(phy_provider->owner);
 
-err0:
+out_unlock:
mutex_unlock(_provider_mutex);
of_node_put(args.np);
 
diff --git a/drivers/phy/phy-miphy28lp.c b/drivers/phy/phy-miphy28lp.c
index 9334352..cc87b3f 100644
--- a/drivers/phy/phy-miphy28lp.c
+++ b/drivers/phy/phy-miphy28lp.c
@@ -1107,11 +1107,6 @@ static struct phy *miphy28lp_xlate(struct device *dev,
struct device_node *phynode = args->np;
int ret, index = 0;
 
-   if (!of_device_is_available(phynode)) {
-   dev_warn(dev, "Requested PHY is disabled\n");
-   return ERR_PTR(-ENODEV);
-   }
-
if (args->args_count != 1) {
dev_err(dev, "Invalid number of cells in 'phy' property\n");
return ERR_PTR(-EINVAL);
diff --git a/drivers/phy/phy-miphy365x.c b/drivers/phy/phy-miphy365x.c
index 51b459d..d29b332 100644
--- a/drivers/phy/phy-miphy365x.c
+++ b/drivers/phy/phy-miphy365x.c
@@ -476,11 +476,6 @@ static struct phy *miphy365x_xlate(struct device *dev,
struct device_node *phynode = args->np;
int ret, index;
 
-   if (!of_device_is_available(phynode)) {
-   dev_warn(dev, "Requested PHY is disabled\n");
-   return ERR_PTR(-ENODEV);
-   }
-
if (args->args_count != 1) {
dev_err(dev, "Invalid number of cells in 'phy' property\n");
return ERR_PTR(-EINVAL);
diff --git a/drivers/phy/phy-rcar-gen2.c b/drivers/phy/phy-rcar-gen2.c
index 778276a..f47bfd8 100644
--- a/drivers/phy/phy-rcar-gen2.c
+++ b/drivers/phy/phy-rcar-gen2.c
@@ -206,11 +206,6 @@ static struct phy *rcar_gen2_phy_xlate(struct device *dev,
struct device_node *np = args->np;
int i;
 
-   if (!of_device_is_available(np)) {
-   dev_warn(dev, "Requested PHY is disabled\n");
-   return ERR_PTR(-ENODEV);
-   }
-
drv = dev_get_drvdata(dev);
if (!drv)
return ERR_PTR(-EINVAL);
-- 
1.9.1





linux-next: manual merge of the vfs tree with the v9fs tree

2015-04-06 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the vfs tree got a conflict in
net/9p/protocol.c between commit 6250a8badb31 ("9p: use unsigned
integers for nwqid/count") from the v9fs tree and commit 3f39ef33084b
("net/9p: switch the guts of p9_client_{read,write}() to iov_iter")
from the vfs tree.

I fixed it up (I think - see below) and can carry the fix as necessary
(no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc net/9p/protocol.c
index 305e4789f2cc,e9d0f0c1a048..
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@@ -437,23 -439,13 +439,13 @@@ p9pdu_vwritef(struct p9_fcall *pdu, in
 stbuf->extension, stbuf->n_uid,
 stbuf->n_gid, stbuf->n_muid);
} break;
-   case 'D':{
-   uint32_t count = va_arg(ap, uint32_t);
-   const void *data = va_arg(ap, const void *);
- 
-   errcode = p9pdu_writef(pdu, proto_version, "d",
-   count);
-   if (!errcode && pdu_write(pdu, data, count))
-   errcode = -EFAULT;
-   }
-   break;
-   case 'U':{
+   case 'V':{
 -  int32_t count = va_arg(ap, int32_t);
 +  uint32_t count = va_arg(ap, uint32_t);
-   const char __user *udata =
-   va_arg(ap, const void __user *);
+   struct iov_iter *from =
+   va_arg(ap, struct iov_iter *);
errcode = p9pdu_writef(pdu, proto_version, "d",
count);
-   if (!errcode && pdu_write_u(pdu, udata, count))
+   if (!errcode && pdu_write_u(pdu, from, count))
errcode = -EFAULT;
}
break;




linux-next: manual merge of the vfs tree with the ext4 tree

2015-04-06 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the vfs tree got a conflict in fs/ext4/inode.c 
between commit 72b8e0f9fa8a ("ext4: remove unused header files") from the ext4 
tree and commit e2e40f2c1ed4 ("fs: move struct kiocb to fs.h") from the vfs 
tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc fs/ext4/inode.c
index 7eb70b7a5c19,42c942a950e1..
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@@ -35,12 -36,10 +35,11 @@@
  #include 
  #include 
  #include 
- #include 
 -#include 
  #include 
 +#include 
  
  #include "ext4_jbd2.h"
 +#include "ext4_crypto.h"
  #include "xattr.h"
  #include "acl.h"
  #include "truncate.h"
@@@ -3136,14 -3033,11 +3135,14 @@@ static ssize_t ext4_ext_direct_IO(struc
get_block_func = ext4_get_block_write;
dio_flags = DIO_LOCKING;
}
 +#ifdef CONFIG_EXT4_FS_ENCRYPTION
 +  BUG_ON(ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode));
 +#endif
if (IS_DAX(inode))
-   ret = dax_do_io(rw, iocb, inode, iter, offset, get_block_func,
+   ret = dax_do_io(iocb, inode, iter, offset, get_block_func,
ext4_end_io_dio, dio_flags);
else
-   ret = __blockdev_direct_IO(rw, iocb, inode,
+   ret = __blockdev_direct_IO(iocb, inode,
   inode->i_sb->s_bdev, iter, offset,
   get_block_func,
   ext4_end_io_dio, NULL, dio_flags);




[PATCH] sched/deadline: fix dl bandwidth of root domain overflow after dl task dead

2015-04-06 Thread Wanpeng Li
The total used dl bandwidth of each root domain is reset to 0 when
sched domains are rebuilt after cpu hotplug, since the call path is:

_cpu_down
  cpuset_cpu_inactive() 
cpuset_update_active_cpus()
  partition_sched_domains()
build_sched_domains() 
  init_rootdomain() 
init_dl_bw() 

The bandwidth that a dl task occupies is released when the task dies:
it is subtracted from the total used dl bandwidth of its root domain.
However, since that total has already been reset to 0, the subtraction
overflows.

This patch fixes it by attaching the bandwidth that a dl task occupies to
the new root domain when the task is migrated because of cpu hotplug, and
by attaching the used dl bandwidth of all dl tasks to the new root domain
when sched domains are rebuilt.
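
The arithmetic behind the overflow can be shown with a tiny userspace model. This is a sketch only: struct sketch_dl_bw and the sketch_ helpers are hypothetical and merely mimic an unsigned 64-bit bandwidth counter; locking and the real scheduler data structures are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a root domain's used-bandwidth counter. */
struct sketch_dl_bw {
	uint64_t total_bw;
};

/* Account a task's bandwidth to the domain. */
void sketch_dl_add(struct sketch_dl_bw *b, uint64_t bw)
{
	b->total_bw += bw;
}

/* Release a task's bandwidth, e.g. when the task dies. */
void sketch_dl_clear(struct sketch_dl_bw *b, uint64_t bw)
{
	b->total_bw -= bw;
}

/* Rebuild without re-attaching: the counter simply starts from 0. */
void sketch_rebuild_domain(struct sketch_dl_bw *b)
{
	b->total_bw = 0;
}

/* Rebuild and re-attach the bandwidth of every task still present. */
void sketch_rebuild_and_attach(struct sketch_dl_bw *b,
			       const uint64_t *task_bw, size_t n)
{
	size_t i;

	b->total_bw = 0;
	for (i = 0; i < n; i++)
		b->total_bw += task_bw[i];
}
```

With a single task of bandwidth 100, sketch_rebuild_domain() followed by sketch_dl_clear() wraps the unsigned counter around to a huge value, whereas sketch_rebuild_and_attach() followed by the same clear leaves it at 0.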

Signed-off-by: Wanpeng Li 
---
 kernel/sched/core.c |  1 +
 kernel/sched/deadline.c | 25 +
 kernel/sched/sched.h|  1 +
 3 files changed, 27 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 28b0d75..c940999 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5586,6 +5586,7 @@ static void rq_attach_root(struct rq *rq, struct root_domain *rd)
rq->rd = rd;
 
cpumask_set_cpu(rq->cpu, rd->span);
+   attach_dl_bw(rq);
if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
set_rq_online(rq);
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5e95145..62680d7 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -224,6 +224,7 @@ static void dl_task_offline_migration(struct rq *rq, struct task_struct *p)
 {
struct rq *later_rq = NULL;
bool fallback = false;
+   struct dl_bw *dl_b;
 
later_rq = find_lock_later_rq(p, rq);
 
@@ -258,6 +259,11 @@ static void dl_task_offline_migration(struct rq *rq, struct task_struct *p)
set_task_cpu(p, later_rq->cpu);
activate_task(later_rq, p, ENQUEUE_REPLENISH);
 
+   dl_b = dl_bw_of(later_rq->cpu);
+   raw_spin_lock(_b->lock);
+   __dl_add(dl_b, p->dl.dl_bw);
+   raw_spin_unlock(_b->lock);
+
if (!fallback)
resched_curr(later_rq);
 
@@ -1776,6 +1782,25 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
switched_to_dl(rq, p);
 }
 
+void attach_dl_bw(struct rq *rq)
+{
+   struct rb_node *next_node = rq->dl.rb_leftmost;
+   struct sched_dl_entity *dl_se;
+   struct dl_bw *dl_b;
+
+   dl_b = dl_bw_of(rq->cpu);
+   raw_spin_lock(_b->lock);
+next_node:
+   if (next_node) {
+   dl_se = rb_entry(next_node, struct sched_dl_entity, rb_node);
+   __dl_add(dl_b, dl_se->dl_bw);
+   next_node = rb_next(next_node);
+
+   goto next_node;
+   }
+   raw_spin_unlock(_b->lock);
+}
+
 const struct sched_class dl_sched_class = {
.next   = _sched_class,
.enqueue_task   = enqueue_task_dl,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e0e1299..a7b1a59 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1676,6 +1676,7 @@ extern void init_dl_rq(struct dl_rq *dl_rq);
 
 extern void cfs_bandwidth_usage_inc(void);
 extern void cfs_bandwidth_usage_dec(void);
+void attach_dl_bw(struct rq *rq);
 
 #ifdef CONFIG_NO_HZ_COMMON
 enum rq_nohz_flag_bits {
-- 
1.9.1



Re: [PATCH 1/1] checkpatch: improve operator spacing check

2015-04-06 Thread Joe Perches
On Tue, 2015-04-07 at 13:36 +1000, Sam Bobroff wrote:
> Code such as:
>x = timercmp(, , <);
> Will currently trigger a checkpatch error. e.g.
> ERROR: spaces required around that '<'
> 
> This is because the "Ignore operators passed as parameters" check
> looks only for a comma following the operator. Improve the check by
> also looking for a close parenthesis.
> 
> Signed-off-by: Sam Bobroff 

Seems sensible, thanks.
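
For anyone who wants to try the widened pattern outside Perl, here is a sketch using POSIX regcomp(). It covers only the `$cc` half of the condition (the text that follows the operator); the `$ca` and `$op_type` tests are left out, and sketch_op_is_parameter() is a hypothetical name. POSIX extended syntax has no `\s`, so `[[:space:]]` is used as an approximation.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the text after an operator starts with optional
 * whitespace followed by ',' or ')', i.e. the operator looks like it
 * is being passed as a parameter.
 */
bool sketch_op_is_parameter(const char *after_op)
{
	regex_t re;
	bool match;

	if (regcomp(&re, "^[[:space:]]*[,)]", REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	match = regexec(&re, after_op, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

With the old comma-only pattern, a trailing operator argument followed by ");" would not have matched; with the bracket expression it does.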

> ---
> 
>  scripts/checkpatch.pl |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index d124359..f65c4de 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3565,7 +3565,7 @@ sub process {
>  
>   # Ignore operators passed as parameters.
>   if ($op_type ne 'V' &&
> - $ca =~ /\s$/ && $cc =~ /^\s*,/) {
> + $ca =~ /\s$/ && $cc =~ /^\s*[,\)]/) {
>  
>  ## Ignore comments
>  #} elsif ($op =~ /^$;+$/) {





Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-04-06 Thread Dave Young
On 04/03/15 at 02:05pm, Li, Zhen-Hua wrote:
> The hardware will do some verification, but not completely.  If people think 
> the OS should also do this, then it should be another patchset, I think.

If there is a chance of corrupting more memory, I do not think this is the
right way. We should think about a better solution instead of fixing it later.

Thanks
Dave


Re: [PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-04-06 Thread Dave Young
On 04/05/15 at 09:54am, Baoquan He wrote:
> On 04/03/15 at 05:21pm, Dave Young wrote:
> > On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> > > Hi Dave,
> > > 
> > > There may be some possibilities that the old iommu data is corrupted by
> > > some other modules. Currently we do not have a better solution for the
> > > dmar faults.
> > > 
> > > But I think when this happens, we need to fix the module that corrupted
> > > the old iommu data. I once met a similar problem in normal kernel, the
> > > queue used by the qi_* functions was written again by another module.
> > > The fix was in that module, not in iommu module.
> > 
> > It is too late, there will be no chance to save vmcore then.
> > 
> > Also, if using the old iommu tables can go on to corrupt other areas of
> > oldmem, it will cause more problems.
> > 
> > So I think the tables at least need some verification before being used.
> > 
> 
> Yes, it is good to think about this, and verification is also an
> interesting idea. kexec/kdump does a sha256 calculation on the loaded
> kernel and verifies it again in purgatory when a panic happens. This checks
> whether any code has stomped on the region reserved for kexec/kernel and
> corrupted the loaded kernel.
> 
> If we decide to do this, it should be an enhancement to the current
> patchset rather than a change of approach. Since this patchset is very
> close to what the maintainers expect, maybe it can be merged first and
> the enhancement considered afterwards. After all, without this patchset
> vt-d often raises error messages and hangs.

That does not convince me. We should do it right from the beginning instead
of introducing something wrong.

I wonder why the old DMA cannot be remapped to a specific page in the kdump
kernel so that it will not corrupt more memory. But I may have missed
something; I will look for the old threads and catch up.

Thanks
Dave
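
The verify-before-use idea raised in this thread (take a checksum while the
data is known to be good, check it again before trusting the data, as
kexec does with sha256 over the loaded kernel) can be illustrated with a
minimal user-space sketch. It is Python rather than kernel code;
`record_digest`, `verify_before_use` and the sample bytes are hypothetical
stand-ins, and nothing here reflects how the vt-d patchset or purgatory is
actually implemented.

```python
import hashlib
import hmac


def record_digest(blob: bytes) -> bytes:
    """Compute a SHA-256 digest while the data is still known to be good."""
    return hashlib.sha256(blob).digest()


def verify_before_use(blob: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    return hmac.compare_digest(hashlib.sha256(blob).digest(), expected)


# Hypothetical stand-in for a table saved by the first kernel.
table = bytes(range(64))
digest = record_digest(table)

# Simulate another module stomping on one byte of the saved table.
corrupted = bytearray(table)
corrupted[10] ^= 0xFF

print(verify_before_use(table, digest))             # intact copy
print(verify_before_use(bytes(corrupted), digest))  # corrupted copy
```

A check like this only detects corruption; what to do once it is detected is
the open question in the discussion above.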


Re: [PATCH v2 1/2] dt-bindings: Document the hi6220 thermal sensor bindings

2015-04-06 Thread Xinwei Kong


On 2015/4/6 22:03, Matt Porter wrote:
> On Tue, Mar 31, 2015 at 02:59:21PM +0800, Xinwei Kong wrote:
>> From: kongxinwei 
>>
>> This adds documentation of device tree bindings for the
>> thermal sensor controller of hi6220 SoC.
>>
>> Signed-off-by: Leo Yan 
>> Signed-off-by: kongxinwei 
>> ---
>>  .../bindings/thermal/hisilicon-thermal.txt | 45 
>> ++
>>  1 file changed, 45 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/thermal/hisilicon-thermal.txt
>>
>> diff --git a/Documentation/devicetree/bindings/thermal/hisilicon-thermal.txt 
>> b/Documentation/devicetree/bindings/thermal/hisilicon-thermal.txt
>> new file mode 100644
>> index 000..ceb6e2e
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/thermal/hisilicon-thermal.txt
>> @@ -0,0 +1,45 @@
>> +* Hisilicon Thermal
>> +
>> +This driver is for hi6220 SoC which contain 4 thermal sensor.
>> +
>> +1. sensor 0: local sensor;
>> +2. sensor 1: remote sensor for ACPU cluster 1;
>> +3. sensor 2: remote sensor for ACPU cluster 2;
>> +4. sensor 3: remote sensor for GPU.
>> +
>> +Every sensor use one child node to represent it, so thermal sensor include
>> +parent node and four child node. The parent node describe common feature and
>> +child node describe private feature for thermal sensor;
>> +
>> +** Required properties :
>> +
>> +- compatible: "hisilicon,tsensor".
>> +- reg: physical base address of thermal sensor and length of memory mapped
>> +  region.
>> +- interrupt: The interrupt number to the cpu. Defines the interrupt used
>> +  by SOCTHERM.
>> +- clock-names: Input clock name, should be 'thermal_clk'.
>> +- clocks: phandles for clock specified in "clock-names" property.
>> +- #thermal-sensor-cells: Should be 1. See ./thermal.txt for a description.
>> +
>> +** Required properties for child nodes :
>> +
>> +- hisilicon,tsensor-id: the index of thermal sensor and use it to 
>> distinguish
>> +  thermal sensor. For example: <0> stands for local sensor; <1> stands for
>> +  acpu1 sensor;
> 
> Please show an example illustrating why this property is needed. The
> example below doesn't show any per sensor properties aside from the
> sensor id. Other bindings with a similar sub-sensor hardware design like
> tegra-soctherm and rockchip-thermal don't have a need for a vendor
> specific property like this. Their drivers simply iterate over an id
> index during thermal sensor registration.
> 
> -Matt
> 
The thermal IP of the hisilicon SoC can report the temperature of four
modules: the local sensor, the ACPU0 sensor, the ACPU1 sensor and the GPU
sensor. To use these sensors, the driver relies on the sensor id to tell
them apart.

Only one of the four sensors can be read at a time, because they share the
same register and a different value is written to it to enable each sensor.
The sensor id is therefore the key that distinguishes these sensor modules.

If the sensor id were removed, the driver would have to define its own values
for programming the sensor register, and the register operations would become
harder to understand.

Thanks
Xinwei

>> +
>> +Example :
>> +
>> +tsensor: tsensor@0,f7030700 {
>> +compatible = "hisilicon,tsensor";
>> +reg = <0x0 0xf7030700 0x0 0x1000>;
>> +interrupts = <0 7 0x4>;
>> +clocks = <_sys HI6220_TSENSOR_CLK>;
>> +clock-names = "thermal_clk";
>> +#thermal-sensor-cells = <1>;
>> +
>> +local_sensor {
>> +hisilicon,tsensor-id = <0>;
>> +}
>> +...
>> +}
>> -- 
>> 1.9.1
>>
>>
>>
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 



Re: [PATCH 3/3] PCI: Set pref for mem64 resource of pcie device

2015-04-06 Thread Bjorn Helgaas
On Mon, Apr 6, 2015 at 8:13 PM, Yinghai Lu  wrote:
> On Mon, Apr 6, 2015 at 3:49 PM, Bjorn Helgaas  wrote:
>>
>> For "[PATCH 1/3] PCI: Introduce pci_bus_addr_t", I'm waiting for an updated
>> version with Kconfig tweaks so we don't break other arches.
>
> I was thinking that you will update it manually.

I asked you for an updated version, incorporating the documentation
updates, to make sure I got everything you intended.  But I did go
ahead and do it manually for you.

>> For "[PATCH 2/3] sparc/PCI: Add mem64 resource parsing for root bus", I'm
>> waiting for a version that fixes the other of_bus_pci_get_flags() and
>> pci_parse_of_flags() implementations at the same time (or an explanation
>> about why we should fix only the arch/sparc version).  I don't want to fix
>> one place and leave the same bug in other places.
>
> I don't even know if other arch like powerpc support 64-bit bus address.
>
> No one from powerpc reported a problem, why should we mess it up now?
>
> I would like to see someone who has access to those three kinds of
> machines that support OF unify the OF support code.

Of course changes there should be tested on all the affected machines.
I opened https://bugzilla.kernel.org/show_bug.cgi?id=96241 and
assigned it to you as a reminder that there is nearly identical code
in several other places that may have the same issue.

I pushed these two changes to for-linus.  I'll work on the third
tomorrow.  The current changelog is very sparc64-centric, and it needs
to be much more explicit about how the change will affect every arch.

Bjorn


Re: [PATCH v4] x86, selftests: Add sigreturn selftest

2015-04-06 Thread Michael Ellerman
On Mon, 2015-04-06 at 19:01 -0700, Andy Lutomirski wrote:
> This is my sigreturn test, added mostly unchanged from its old home.
> It exercises the sigreturn(2) syscall, specifically focusing on its
> interactions with various IRET corner cases.  It tests for correct
> behavior in several areas that were historically dangerously buggy.
> For example, it exercises espfix on kernels of both bitnesses under
> various conditions, and it contains exploits for several now-fixed
> bugs in IRET error handling.
> 
> If you run it on older kernels, your system will crash.  It probably
> won't eat your data in the process.
> 
> There is no released kernel on which the sigreturn_64 test will
> pass, but it passes on tip:x86/asm.
> 
> IMO it's unfortunate that I need to provide a special script to run
> tests.  I'd rather just list my targets.

If you use lib.mk you can.

  
  https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git/log/?h=next

See for example:

  https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git/commit/?h=next&id=5744de542dd4b963c2975e6f70844ce2899864e4

cheers




Why not build kernel with -O3

2015-04-06 Thread Pengfei Yuan
Hi,

I have conducted some experiments to compare kernels built with -O2
and -O3. Here are the results:

Application  Performance O2    Performance O3    Improvement
Apache       127814.14 req/s   130321.24 req/s    1.96%
Nginx        537589.08 req/s   556723.32 req/s    3.56%
MySQL         70661.38 tx/s     71008.47 tx/s     0.49%
PostgreSQL    79763.39 tx/s     79535.59 tx/s    -0.29%
Redis        352547.47 op/s    405417.24 op/s    15.0%
Memcached    844439.14 op/s    845321.79 op/s     0.10%

Geomean: +3.34%

Experiment environment: Linux 3.19.3, GCC 4.9.3 prerelease, Core-i7
4770, 32G RAM, 10GbE

LMbench microbenchmark also shows reduction in various latencies, as
well as increase of throughputs.

Why not add an option to build kernel with -O3?

Regards,
YUAN, Pengfei
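
For readers who want to check the summary figure, the geometric mean can be
recomputed from the throughput numbers in the table above. This is a small
sketch, assuming the geomean was taken over the six O3/O2 ratios; the
function name `geomean_speedup` is made up for illustration, and the result
should land close to the quoted +3.34%.

```python
import math

# (O2, O3) throughput pairs copied from the table above.
results = {
    "Apache": (127814.14, 130321.24),
    "Nginx": (537589.08, 556723.32),
    "MySQL": (70661.38, 71008.47),
    "PostgreSQL": (79763.39, 79535.59),
    "Redis": (352547.47, 405417.24),
    "Memcached": (844439.14, 845321.79),
}


def geomean_speedup(pairs):
    """Geometric mean of the O3/O2 ratios, computed via logarithms."""
    logs = [math.log(o3 / o2) for o2, o3 in pairs]
    return math.exp(sum(logs) / len(logs))


speedup = geomean_speedup(results.values())
print(f"Geomean: {100 * (speedup - 1):+.2f}%")
```

Note that the Redis row dominates the mean; dropping it from `results`
shows how much of the overall figure depends on that one benchmark.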


[PATCH 1/1] checkpatch: improve operator spacing check

2015-04-06 Thread Sam Bobroff
Code such as:
   x = timercmp(, , <);
Will currently trigger a checkpatch error. e.g.

ERROR: spaces required around that '<'

This is because the "Ignore operators passed as parameters" check
looks only for a comma following the operator. Improve the check by
also looking for a close parenthesis.

Signed-off-by: Sam Bobroff 
---

 scripts/checkpatch.pl |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index d124359..f65c4de 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3565,7 +3565,7 @@ sub process {
 
# Ignore operators passed as parameters.
if ($op_type ne 'V' &&
-   $ca =~ /\s$/ && $cc =~ /^\s*,/) {
+   $ca =~ /\s$/ && $cc =~ /^\s*[,\)]/) {
 
 #  # Ignore comments
 #  } elsif ($op =~ /^$;+$/) {
-- 
1.7.10.4
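
The effect of the one-line change can be tried outside checkpatch. The sketch
below ports the two patterns from the diff to Python's `re` module; it is
only an illustration, assuming `$ca` holds the text before the operator and
`$cc` the text after it. The helper `is_operator_param` and the argument
names `&a` and `&b` are hypothetical, and the `$op_type ne 'V'` condition is
left out.

```python
import re

# Python equivalents of the checkpatch patterns shown in the diff.
OLD_FOLLOW = re.compile(r"^\s*,")      # before the patch: comma only
NEW_FOLLOW = re.compile(r"^\s*[,\)]")  # after the patch: comma or ')'
TRAILING_WS = re.compile(r"\s$")       # the unchanged $ca =~ /\s$/ test


def is_operator_param(before, after, follow):
    """True when the operator looks like a bare macro argument."""
    return bool(TRAILING_WS.search(before) and follow.search(after))


# Hypothetical context around the '<' in a call such as
# timercmp(&a, &b, <): the text before the operator and the text after it.
before, after = "x = timercmp(&a, &b, ", ");"

print(is_operator_param(before, after, OLD_FOLLOW))   # old check
print(is_operator_param(before, after, NEW_FOLLOW))   # patched check
print(is_operator_param("f(a, ", ", b)", OLD_FOLLOW)) # operator in the middle
```

With the old pattern an operator passed as the last argument is not
recognised, which is what leads to the spacing error quoted in the commit
message.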



not syncing: Attempted to kill init! exitcode=0x00000004 ?

2015-04-06 Thread Masahiro Yamada
Hello experts,
I hope this is the correct ML to ask this question.

I am struggling to port Linux-4.0-rc7 onto my SoC/board,
based on ARM cortex-A9 (single CPU), but the kernel fails to boot
with the error:
"not syncing: Attempted to kill init! exitcode=0x00000004"


I want to use NS16550-compatible UART, Global Timer, and GIC.
I wrote a simple device tree source for my own board like this:

-->8--
/dts-v1/;
/include/ "skeleton.dtsi"

/ {
	compatible = "socionext,ph1-ld4";

	memory {
		device_type = "memory";
		reg = <0x8000 0x2000>;
	};

	chosen {
		bootargs = "root=/dev/ram0 console=ttyS0,115200";
	};

	aliases {
		serial0 = 
	};

	cpus {
		#size-cells = <0>;
		#address-cells = <1>;

		cpu@0 {
			device_type = "cpu";
			compatible = "arm,cortex-a9";
			reg = <0>;
		};
	};

	clocks {
		#address-cells = <1>;
		#size-cells = <0>;

		arm_timer_clk: arm_timer_clk {
			#clock-cells = <0>;
			compatible = "fixed-clock";
			clock-frequency = <5000>;
		};
	};

	soc: soc {
		compatible = "simple-bus";
		#address-cells = <1>;
		#size-cells = <1>;
		interrupt-parent = <>;
		ranges;


		uart0: uart@03fb {
			compatible = "ns16550";
			reg = <0x03fb 0x100>;
			clock-frequency = <12288000>;
			interrupts = <0 81 4>;
			reg-shift = <1>;
			fifo-size = <16>;
		};

		intc: interrupt-controller@60001000 {
			compatible = "arm,cortex-a9-gic";
			#interrupt-cells = <3>;
			#address-cells = <1>;
			interrupt-controller;
			reg = <0x60001000 0x1000>,
			      <0x6100 0x100>;
		};

		global_timer: timer@6200 {
			compatible = "arm,cortex-a9-global-timer";
			reg = <0x6200 0x20>;
			interrupts = <1 11 0x104>;
			interrupt-parent = <>;
			clocks = <_timer_clk>;
		};

	};
};
-8<


The Kernel configuration is based on multi_v7_defconfig.
The diffconfig against it is like this:
(ARCH_UNIPHIER is intended to enable my SoC)

-->8--
CONFIG_ARCH_UNIPHIER=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=16384
-8<---



I passed the kernel image, the device tree, and the initramdisk from U-Boot.

I guess I am almost there, but it looks like the kernel failed to run init.
The kernel log is as follows:


-->8--
[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 4.0.0-rc7-00032-g8c5ce71-dirty
(yamada@beagle) (gcc version 4.7.3 (Ubuntu/Linaro 4.7.3-12ubuntu1) )
#7 SMP Tue Apr 7 12:20:19 JST 2015
[0.00] CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=10c5387d
[0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
instruction cache
[0.00] Machine model: socionext,ph1-ld4
[0.00] cma: Reserved 64 MiB at 0x9b00
[0.00] Memory policy: Data cache writeback
[0.00] CPU: All CPU(s) started in SVC mode.
[0.00] PERCPU: Embedded 11 pages/cpu @dfbc9000 s12480 r8192
d24384 u45056
[0.00] Built 1 zonelists in Zone order, mobility grouping on.
Total pages: 130048
[0.00] Kernel command line: root=/dev/ram0 console=ttyS0,115200
[0.00] PID hash table entries: 2048 (order: 1, 8192 bytes)
[0.00] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[0.00] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[0.00] Memory: 435456K/524288K available (7830K kernel code,
1015K rwdata, 3412K rodata, 808K init, 316K bss, 23296K reserved,
65536K cma-reserved, 0K highmem)
[0.00] Virtual kernel memory layout:
[0.00] vector  : 0x - 0x1000   (   4 kB)
[0.00] fixmap  : 0xffc0 - 0xfff0   (3072 kB)
[0.00] vmalloc : 0xe080 - 0xff00   ( 488 MB)
[0.00] lowmem  : 0xc000 - 0xe000   ( 512 MB)
[0.00] pkmap   : 0xbfe0 - 0xc000   (   2 MB)
[0.00] modules : 0xbf00 - 0xbfe0   (  14 MB)
[0.00]   .text : 0xc0208000 - 0xc0d03b7c   (11247 kB)
[0.00]   

Re: [PATCH] x86/numa: kernel stack corruption fix

2015-04-06 Thread Dave Young
On 04/06/15 at 07:26am, Yasuaki Ishimatsu wrote:
> Hi,
> 
> On Fri, 3 Apr 2015 15:15:13 +0800
> Dave Young  wrote:
> 
> > Hi,
> > 
> > On 04/02/15 at 12:36pm, Yasuaki Ishimatsu wrote:
> > > 
> > > On Wed, 1 Apr 2015 12:53:46 +0800
> > > Dave Young  wrote:
> > > 
> > > > I got below kernel panic during kdump test on Thinkpad T420 laptop:
> > > > 
> > > > [0.00] No NUMA configuration found
> > > > [0.00] Faking a node at [mem 0x-0x37ba4fff]
> > > > [0.00] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 81d21910
> > > > [0.00]
> > > > [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.0.0-rc6+ #44
> > > > [0.00] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET76WW (1.46 ) 07/05/2013
> > > > [0.00]   c70296ddd809e4f6 81b67ce8 817c2a26
> > > > [0.00]   81a61c90 81b67d68 817bc8d2
> > > > [0.00]  0010 81b67d78 81b67d18 c70296ddd809e4f6
> > > > [0.00] Call Trace:
> > > > [0.00]  [] dump_stack+0x45/0x57
> > > > [0.00]  [] panic+0xd0/0x204
> > > > [0.00]  [] ? numa_clear_kernel_node_hotplug+0xe6/0xf2
> > > > [0.00]  [] __stack_chk_fail+0x1b/0x20
> > > > [0.00]  [] numa_clear_kernel_node_hotplug+0xe6/0xf2
> > > > [0.00]  [] numa_init+0x1a5/0x520
> > > > [0.00]  [] x86_numa_init+0x19/0x3d
> > > > [0.00]  [] initmem_init+0x9/0xb
> > > > [0.00]  [] setup_arch+0x94f/0xc82
> > > > [0.00]  [] ? early_idt_handlers+0x120/0x120
> > > > [0.00]  [] ? printk+0x55/0x6b
> > > > [0.00]  [] ? early_idt_handlers+0x120/0x120
> > > > [0.00]  [] start_kernel+0xe8/0x4d6
> > > > [0.00]  [] ? early_idt_handlers+0x120/0x120
> > > > [0.00]  [] ? early_idt_handlers+0x120/0x120
> > > > [0.00]  [] x86_64_start_reservations+0x2a/0x2c
> > > > [0.00]  [] x86_64_start_kernel+0x161/0x184
> > > > [0.00] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 81d21910
> > > > [0.00]
> > > > PANIC: early exception 0d rip 10:8105d2a6 error 7eb cr2 8800371dd000
> > > > [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.0.0-rc6+ #44
> > > > [0.00] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET76WW (1.46 ) 07/05/2013
> > > > [0.00]   c70296ddd809e4f6 81b67c60 817c2a26
> > > > [0.00]  0096 81a61c90 81b67d68 fff00084 0a0d 0a00
> > > > [0.00] Call Trace:
> > > > [0.00]  [] dump_stack+0x45/0x57
> > > > [0.00]  [] early_idt_handler+0x90/0xb7
> > > > [0.00]  [] ? native_irq_enable+0x6/0x10
> > > > [0.00]  [] ? panic+0x1c3/0x204
> > > > [0.00]  [] ? numa_clear_kernel_node_hotplug+0xe6/0xf2
> > > > [0.00]  [] __stack_chk_fail+0x1b/0x20
> > > > [0.00]  [] numa_clear_kernel_node_hotplug+0xe6/0xf2
> > > > [0.00]  [] numa_init+0x1a5/0x520
> > 

[PATCH] sdhci: rtsx: fix 64 BIT DMA quirks

2015-04-06 Thread micky_ching
From: Micky Ching 

The rts5250 chip fails to handle 64-bit ADMA for addresses below 4G.
Add the 64-bit DMA quirk to disable this feature.

Signed-off-by: Micky Ching 
---
 drivers/mmc/host/sdhci-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/host/sdhci-pci.c b/drivers/mmc/host/sdhci-pci.c
index 0342775..ae8e450 100644
--- a/drivers/mmc/host/sdhci-pci.c
+++ b/drivers/mmc/host/sdhci-pci.c
@@ -650,6 +650,7 @@ static int rtsx_probe_slot(struct sdhci_pci_slot *slot)
 
 static const struct sdhci_pci_fixes sdhci_rtsx = {
.quirks2= SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
+   SDHCI_QUIRK2_BROKEN_64_BIT_DMA |
SDHCI_QUIRK2_BROKEN_DDR50,
.probe_slot = rtsx_probe_slot,
 };
-- 
1.9.1



Re: [rtc-linux] [PATCH] rtc: OMAP: Add external 32k clock feature

2015-04-06 Thread Keerthy

Hi Andrew,

Apologies for replying late.

On Wednesday 25 March 2015 04:29 AM, Andrew Morton wrote:

On Tue, 3 Mar 2015 15:12:02 +0530 Keerthy  wrote:


Add external 32k clock feature. The internal clock will be gated during suspend.
Hence make use of the external 32k clock so that rtc is functional across
suspend/resume.

...

@@ -446,6 +449,7 @@ static const struct omap_rtc_device_type 
omap_rtc_default_type = {

  static const struct omap_rtc_device_type omap_rtc_am3352_type = {
.has_32kclk_en  = true,
+   .has_osc_ext_32k = true,
.has_kicker = true,
.has_irqwakeen  = true,
.has_pmic_mode  = true,
@@ -543,7 +547,16 @@ static int __init omap_rtc_probe(struct platform_device 
*pdev)
if (rtc->type->has_32kclk_en) {
reg = rtc_read(rtc, OMAP_RTC_OSC_REG);
rtc_writel(rtc, OMAP_RTC_OSC_REG,
-   reg | OMAP_RTC_OSC_32KCLK_EN);
+  reg | OMAP_RTC_OSC_32KCLK_EN);
+   }
+
+   /* Enable External clock as the source */
+
+   if (rtc->type->has_osc_ext_32k) {
+   rtc_writel(rtc, OMAP_RTC_OSC_REG,
+  (OMAP_RTC_OSC_EXT_32K |
+  rtc_read(rtc, OMAP_RTC_OSC_REG)) &
+  (~OMAP_RTC_OSC_OSC32K_GZ));
}


How do we know that all systems have this external clock and that it
works OK?



AM335 and AM43X have the external clock feature which we choose using 
RTC_OSC_REG. I verified it works OK by seeing the RTC seconds ticking 
even after switching the source to the external 32k Clock.


Regards,
Keerthy
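
The register update in the hunk above is a plain read-modify-write: set the
external-clock select bit, clear the GZ bit, and leave every other bit as it
was read. A minimal sketch of that arithmetic follows. The bit positions are
hypothetical placeholders chosen for illustration (the real values are
defined in the driver and are not shown in this thread), and
`select_external_32k` is a made-up helper name.

```python
# Hypothetical bit positions, chosen only for illustration; the real
# values live in the driver and are not shown in this thread.
OSC_EXT_32K = 1 << 3    # stand-in for OMAP_RTC_OSC_EXT_32K
OSC_OSC32K_GZ = 1 << 4  # stand-in for OMAP_RTC_OSC_OSC32K_GZ
OSC_32KCLK_EN = 1 << 6  # stand-in for OMAP_RTC_OSC_32KCLK_EN
REG_MASK = 0xFFFFFFFF   # model a 32-bit register


def select_external_32k(reg: int) -> int:
    """Mirror the patch: (EXT_32K | reg) & ~GZ, keeping all other bits."""
    return ((OSC_EXT_32K | reg) & ~OSC_OSC32K_GZ) & REG_MASK


# Start from a register value with the 32k clock enabled and GZ set.
before = OSC_32KCLK_EN | OSC_OSC32K_GZ
after = select_external_32k(before)

print(f"before={before:#010x} after={after:#010x}")
```

The sketch only shows that the enable bit written just before survives the
update; whether a given board actually has a working external 32k source is
the question Andrew raises above.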


Re: [PATCH 10/13] thermal: Make struct thermal_zone_device_ops const

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:57PM +0100, Sascha Hauer wrote:
> Now that the of thermal support no longer changes the
> thermal_zone_device_ops it can be const again.
> 
> Signed-off-by: Sascha Hauer 
> ---
>  drivers/thermal/thermal_core.c | 2 +-
>  include/linux/thermal.h| 6 +++---

I believe this change should be done together with the required updates to
the drivers that call this function.

>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index acf00b5..dcdf45e 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1455,7 +1455,7 @@ static void remove_trip_attrs(struct 
> thermal_zone_device *tz)
>   */
>  struct thermal_zone_device *thermal_zone_device_register(const char *type,
>   int trips, int mask, void *devdata,
> - struct thermal_zone_device_ops *ops,
> + const struct thermal_zone_device_ops *ops,
>   const struct thermal_zone_params *tzp,
>   int passive_delay, int polling_delay)
>  {
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 2f77091..ac2897c 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -185,7 +185,7 @@ struct thermal_zone_device {
>   unsigned long emul_temperature;
>   int passive;
>   unsigned int forced_passive;
> - struct thermal_zone_device_ops *ops;
> + const struct thermal_zone_device_ops *ops;
>   const struct thermal_zone_params *tzp;
>   struct thermal_governor *governor;
>   struct list_head thermal_instances;
> @@ -317,7 +317,7 @@ void thermal_zone_of_sensor_unregister(struct device *dev,
>  
>  #if IS_ENABLED(CONFIG_THERMAL)
>  struct thermal_zone_device *thermal_zone_device_register(const char *, int, 
> int,
> - void *, struct thermal_zone_device_ops *,
> + void *, const struct thermal_zone_device_ops *,
>   const struct thermal_zone_params *, int, int);
>  void thermal_zone_device_unregister(struct thermal_zone_device *);
>  
> @@ -345,7 +345,7 @@ void thermal_notify_framework(struct thermal_zone_device 
> *, int);
>  #else
>  static inline struct thermal_zone_device *thermal_zone_device_register(
>   const char *type, int trips, int mask, void *devdata,
> - struct thermal_zone_device_ops *ops,
> + const struct thermal_zone_device_ops *ops,
>   const struct thermal_zone_params *tzp,
>   int passive_delay, int polling_delay)
>  { return ERR_PTR(-ENODEV); }
> -- 
> 2.1.4
> 




Re: [PATCH 11/13] thermal: of: make of_thermal_ops const

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:58PM +0100, Sascha Hauer wrote:
> Now that we no longer modify the ops they can be const again. Also
> we no longer have to duplicate them.
> 
> Signed-off-by: Sascha Hauer 

Should this one be merged to patch 09/13?

> ---
>  drivers/thermal/of-thermal.c | 18 +++---
>  1 file changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index df14fdd..9b63193 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -365,7 +365,7 @@ static int of_thermal_get_crit_temp(struct 
> thermal_zone_device *tz,
>   return -EINVAL;
>  }
>  
> -static struct thermal_zone_device_ops of_thermal_ops = {
> +static const struct thermal_zone_device_ops of_thermal_ops = {
>   .get_temp = of_thermal_get_temp,
>   .get_trend = of_thermal_get_trend,
>   .set_emul_temp = of_thermal_set_emul_temp,
> @@ -539,10 +539,6 @@ void thermal_zone_of_sensor_unregister(struct device 
> *dev,
>   return;
>  
>   mutex_lock(>lock);
> - tzd->ops->get_temp = NULL;
> - tzd->ops->get_trend = NULL;
> - tzd->ops->set_emul_temp = NULL;
> -
>   tz->ops = NULL;
>   tz->sensor_data = NULL;
>   mutex_unlock(>lock);
> @@ -849,7 +845,6 @@ int __init of_parse_thermal_zones(void)
>  {
>   struct device_node *np, *child;
>   struct __thermal_zone *tz;
> - struct thermal_zone_device_ops *ops;
>  
>   np = of_find_node_by_name(NULL, "thermal-zones");
>   if (!np) {
> @@ -873,29 +868,22 @@ int __init of_parse_thermal_zones(void)
>   continue;
>   }
>  
> - ops = kmemdup(_thermal_ops, sizeof(*ops), GFP_KERNEL);
> - if (!ops)
> - goto exit_free;
> -
>   tzp = kzalloc(sizeof(*tzp), GFP_KERNEL);
> - if (!tzp) {
> - kfree(ops);
> + if (!tzp)
>   goto exit_free;
> - }
>  
>   /* No hwmon because there might be hwmon drivers registering */
>   tzp->no_hwmon = true;
>  
>   zone = thermal_zone_device_register(child->name, tz->ntrips,
>   0, tz,
> - ops, tzp,
> + _thermal_ops, tzp,
>   tz->passive_delay,
>   tz->polling_delay);
>   if (IS_ERR(zone)) {
>   pr_err("Failed to build %s zone %ld\n", child->name,
>  PTR_ERR(zone));
>   kfree(tzp);
> - kfree(ops);
>   of_thermal_free_zone(tz);
>   /* attempting to build remaining zones still */
>   }
> -- 
> 2.1.4
> 




Re: [PATCH 09/13] thermal: of: always set sensor related callbacks

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:56PM +0100, Sascha Hauer wrote:
> Now that the thermal core treats -ENOSYS like the callbacks were
> not present at all we no longer have to overwrite the ops during
> runtime but instead can always set them and return -ENOSYS if no
> sensor is registered.
> 
> Signed-off-by: Sascha Hauer 
> ---
>  drivers/thermal/of-thermal.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index b39e22f..df14fdd 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -91,7 +91,7 @@ static int of_thermal_get_temp(struct thermal_zone_device 
> *tz,
>  {
>   struct __thermal_zone *data = tz->devdata;
>  
> - if (!data->ops->get_temp)
> + if (!data->ops)
>   return -EINVAL;
>  
>   return data->ops->get_temp(data->sensor_data, temp);
> @@ -178,7 +178,7 @@ static int of_thermal_set_emul_temp(struct 
> thermal_zone_device *tz,
>   struct __thermal_zone *data = tz->devdata;
>  
>   if (!data->ops || !data->ops->set_emul_temp)
> - return -EINVAL;
> + return -ENOSYS;
>  
>   return data->ops->set_emul_temp(data->sensor_data, temp);
>  }
> @@ -189,8 +189,8 @@ static int of_thermal_get_trend(struct 
> thermal_zone_device *tz, int trip,
>   struct __thermal_zone *data = tz->devdata;
>   int r;
>  
> - if (!data->ops->get_trend)
> - return -EINVAL;
> + if (!data->ops || !data->ops->get_trend)
> + return -ENOSYS;
>  
>   r = data->ops->get_trend(data->sensor_data, trip, trend);
>   if (r)
> @@ -366,6 +366,10 @@ static int of_thermal_get_crit_temp(struct 
> thermal_zone_device *tz,
>  }
>  
>  static struct thermal_zone_device_ops of_thermal_ops = {
> + .get_temp = of_thermal_get_temp,
> + .get_trend = of_thermal_get_trend,
> + .set_emul_temp = of_thermal_set_emul_temp,
> +
>   .get_mode = of_thermal_get_mode,
>   .set_mode = of_thermal_set_mode,
>  
> @@ -399,13 +403,13 @@ thermal_zone_of_add_sensor(struct device_node *zone,
>   if (!ops)
>   return ERR_PTR(-EINVAL);
>  
> + if (!ops->get_temp)
> + return ERR_PTR(-EINVAL);
> +
>   mutex_lock(>lock);
>   tz->ops = ops;
>   tz->sensor_data = data;
>  
> - tzd->ops->get_temp = of_thermal_get_temp;
> - tzd->ops->get_trend = of_thermal_get_trend;
> - tzd->ops->set_emul_temp = of_thermal_set_emul_temp;

You may want to update the thermal_zone_of_sensor_unregister too.

>   mutex_unlock(>lock);
>  
>   return tzd;
> -- 
> 2.1.4
> 
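
The behaviour described in the commit message (callbacks are always
installed, return -ENOSYS when no sensor is registered, and the core treats
-ENOSYS as if the callback were absent) can be modelled in a few lines. This
is a toy Python model, not the kernel code: `ThermalZone`, `core_get_trend`
and the `"stable"` fallback are assumptions made for illustration only.

```python
import errno


class ThermalZone:
    """Toy stand-in for struct __thermal_zone: sensor ops may be unset."""

    def __init__(self):
        self.sensor_ops = None  # nothing registered yet


def of_thermal_get_trend(zone, trip):
    """Always-installed callback; reports -ENOSYS without a sensor."""
    ops = zone.sensor_ops
    if ops is None or "get_trend" not in ops:
        return -errno.ENOSYS, None
    return 0, ops["get_trend"](trip)


def core_get_trend(zone, trip, default="stable"):
    """Toy core: treat -ENOSYS like a missing callback and fall back."""
    ret, trend = of_thermal_get_trend(zone, trip)
    if ret == -errno.ENOSYS:
        return default
    return trend


zone = ThermalZone()
print(core_get_trend(zone, 0))  # no sensor registered yet
zone.sensor_ops = {"get_trend": lambda trip: "raising"}
print(core_get_trend(zone, 0))  # sensor registered
```

Because the ops table is never rewritten at runtime in this model, it could
be shared and read-only, which is what the later patches in the series rely
on.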




linux-next: manual merge of the net-next tree with the net tree

2015-04-06 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in
net/core/fib_rules.c between commit 419df12fb5fa ("net: move
fib_rules_unregister() under rtnl lock") from the net tree and commit
efd7ef1c1929 ("net: Kill hold_net release_net") from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc net/core/fib_rules.c
index e4fdc9dfb2c7,68ea6950cad1..
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@@ -175,10 -165,10 +165,10 @@@ void fib_rules_unregister(struct fib_ru
  
spin_lock(>rules_mod_lock);
list_del_rcu(>list);
 -  fib_rules_cleanup_ops(ops);
spin_unlock(>rules_mod_lock);
  
 +  fib_rules_cleanup_ops(ops);
-   call_rcu(>rcu, fib_rules_put_rcu);
+   kfree_rcu(ops, rcu);
  }
  EXPORT_SYMBOL_GPL(fib_rules_unregister);
  




Re: [PATCH 0/9] perf sched replay: Make some improvements and fixes

2015-04-06 Thread Yunlong Song
On 2015/3/31 21:46, Yunlong Song wrote:
> Hi,
>   Found some functions to improve and bugs to fix in perf sched replay.
> 
> Yunlong Song (9):
>   perf sched replay: Use struct task_desc instead of struct task_task
> for correct meaning
>   perf sched replay: Increase the MAX_PID value to fix assertion failure
> problem
>   perf sched replay: Alloc the memory of pid_to_task dynamically to
> adapt to the unexpected change of pid_max
>   perf sched replay: Realloc the memory of pid_to_task stepwise to adapt
> to the different pid_max configurations
>   perf sched replay: Fix the segmentation fault problem caused by pr_err
> in threads
>   perf sched replay: Handle the dead halt of sem_wait when
> create_tasks() fails for any task
>   perf sched replay: Fix the EMFILE error caused by the limitation of
> the maximum open files
>   perf sched replay: Support using -f to override perf.data file
> ownership
>   perf sched replay: Use replay_repeat to calculate the runavg of cpu
>     usage instead of the default value 10
> 
>  tools/perf/builtin-sched.c | 67 
> +++---
>  1 file changed, 52 insertions(+), 15 deletions(-)
> 

Ping...

-- 
Thanks,
Yunlong Song



Re: [PATCH 06/13] thermal: streamline get_trend callbacks

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:53PM +0100, Sascha Hauer wrote:
> The .get_trend callback in struct thermal_zone_device_ops has the prototype:
> 
>   int (*get_trend) (struct thermal_zone_device *, int,
> enum thermal_trend *);
> 
> whereas the .get_trend callback in struct thermal_zone_of_device_ops has:
> 
>   int (*get_trend)(void *, long *);
> 
> Streamline both prototypes and add the trip argument to the OF callback
> as well and use enum thermal_trend * instead of an integer pointer.
> 
> While the OF prototype may be the better one, this should be decided at
> framework level and not on OF level.
> 
> Signed-off-by: Sascha Hauer 
> ---
>  drivers/thermal/of-thermal.c                       | 11 +-
>  drivers/thermal/ti-soc-thermal/ti-thermal-common.c | 25 +++---
>  include/linux/thermal.h                            |  2 +-
>  3 files changed, 10 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index 668fb1b..b39e22f 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -187,24 +187,15 @@ static int of_thermal_get_trend(struct 
> thermal_zone_device *tz, int trip,
>   enum thermal_trend *trend)
>  {
>   struct __thermal_zone *data = tz->devdata;
> - long dev_trend;
>   int r;
>  
>   if (!data->ops->get_trend)
>   return -EINVAL;
>  
> - r = data->ops->get_trend(data->sensor_data, &dev_trend);
> + r = data->ops->get_trend(data->sensor_data, trip, trend);
>   if (r)
>   return r;
>  
> - /* TODO: These intervals might have some thresholds, but in core code */
> - if (dev_trend > 0)
> - *trend = THERMAL_TREND_RAISING;
> - else if (dev_trend < 0)
> - *trend = THERMAL_TREND_DROPPING;
> - else
> - *trend = THERMAL_TREND_STABLE;
> -
>   return 0;
>  }
>  
> diff --git a/drivers/thermal/ti-soc-thermal/ti-thermal-common.c 
> b/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
> index a38c175..7f8e5f3 100644
> --- a/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
> +++ b/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
> @@ -238,7 +238,7 @@ static int ti_thermal_get_trip_temp(struct 
> thermal_zone_device *thermal,
>   return 0;
>  }
>  
> -static int __ti_thermal_get_trend(void *p, long *trend)
> +static int __ti_thermal_get_trend(void *p, int trip, enum thermal_trend 
> *trend)
>  {
>   struct ti_thermal_data *data = p;
>   struct ti_bandgap *bgp;
> @@ -251,22 +251,6 @@ static int __ti_thermal_get_trend(void *p, long *trend)
>   if (ret)
>   return ret;
>  
> - *trend = tr;
> -
> - return 0;
> -}
> -
> -/* Get the temperature trend callback functions for thermal zone */
> -static int ti_thermal_get_trend(struct thermal_zone_device *thermal,
> - int trip, enum thermal_trend *trend)
> -{
> - int ret;
> - long tr;
> -
> - ret = __ti_thermal_get_trend(thermal->devdata, &tr);
> - if (ret)
> - return ret;
> -
>   if (tr > 0)
>   *trend = THERMAL_TREND_RAISING;
>   else if (tr < 0)
> @@ -277,6 +261,13 @@ static int ti_thermal_get_trend(struct 
> thermal_zone_device *thermal,
>   return 0;
>  }
>  
> +/* Get the temperature trend callback functions for thermal zone */
> +static int ti_thermal_get_trend(struct thermal_zone_device *thermal,
> + int trip, enum thermal_trend *trend)
> +{
> + return __ti_thermal_get_trend(thermal->devdata, trip, trend);
> +}
> +
>  /* Get critical temperature callback functions for thermal zone */
>  static int ti_thermal_get_crit_temp(struct thermal_zone_device *thermal,
>   unsigned long *temp)
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index db6c12b..ba2e29a 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -273,7 +273,7 @@ struct thermal_genl_event {
>   */
>  struct thermal_zone_of_device_ops {
>   int (*get_temp)(void *, long *);
> - int (*get_trend)(void *, long *);
> + int (*get_trend)(void *, int trend, enum thermal_trend *);

Could you please keep the kernel doc entry up to date?


Apart from that, looks good to me.

>   int (*set_emul_temp)(void *, unsigned long);
>  };
>  
> -- 
> 2.1.4
> 
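
For readers following the quoted patch, here is a minimal userspace sketch of
the streamlined callback shape. It is not the kernel code: the struct names
(of_ops_sketch, sensor_sketch) and the sketch_* functions are hypothetical
stand-ins, and only the enum constant names are taken from the patch. The
point it illustrates is that the sign-to-enum mapping moves from the OF core
into the driver callback, so the core wrapper just forwards trip and trend.

```c
/* Constant names mirror the quoted patch; the numeric values are illustrative. */
enum thermal_trend {
	THERMAL_TREND_STABLE,
	THERMAL_TREND_RAISING,
	THERMAL_TREND_DROPPING,
};

/* Hypothetical stand-in for struct thermal_zone_of_device_ops after the change. */
struct of_ops_sketch {
	int (*get_trend)(void *data, int trip, enum thermal_trend *trend);
};

/* Hypothetical sensor data: a raw slope, as the old long-based callback reported. */
struct sensor_sketch {
	long slope;
};

/* Driver-side callback: maps the sign of the slope to the enum itself. */
static int sketch_get_trend(void *data, int trip, enum thermal_trend *trend)
{
	const struct sensor_sketch *s = data;

	(void)trip;	/* this sketch reports the same trend for every trip point */

	if (s->slope > 0)
		*trend = THERMAL_TREND_RAISING;
	else if (s->slope < 0)
		*trend = THERMAL_TREND_DROPPING;
	else
		*trend = THERMAL_TREND_STABLE;
	return 0;
}

/* Core-side wrapper: forwards trip and trend unchanged. */
static int sketch_core_get_trend(const struct of_ops_sketch *ops, void *data,
				 int trip, enum thermal_trend *trend)
{
	if (!ops->get_trend)
		return -1;	/* the kernel code returns -EINVAL here */
	return ops->get_trend(data, trip, trend);
}
```

With this shape the core no longer needs a temporary long or its own
comparison chain, which is what the removed TODO block in of-thermal.c did.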




linux-next: manual merge of the net-next tree with the net tree

2015-04-06 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in
drivers/net/ethernet/mellanox/mlx4/cmd.c between commit fde913e25496
("net/mlx4_core: Fix error message deprecation for ConnectX-2 cards")
from the net tree and commit a130b5905732 ("net/mlx4: Add SET_PORT
opcode modifiers enumeration") from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/net/ethernet/mellanox/mlx4/cmd.c
index 546ca4226916,06993ea9e6ba..
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@@ -724,9 -725,9 +725,10 @@@ static int mlx4_cmd_wait(struct mlx4_de
 * on the host, we deprecate the error message for this
 * specific command/input_mod/opcode_mod/fw-status to be debug.
 */
 -  if (op == MLX4_CMD_SET_PORT && in_modifier == 1 &&
 +  if (op == MLX4_CMD_SET_PORT &&
 +  (in_modifier == 1 || in_modifier == 2) &&
-   op_modifier == 0 && context->fw_status == CMD_STAT_BAD_SIZE)
+   op_modifier == MLX4_SET_PORT_IB_OPCODE &&
+   context->fw_status == CMD_STAT_BAD_SIZE)
mlx4_dbg(dev, "command 0x%x failed: fw status = 0x%x\n",
 op, context->fw_status);
else




[PATCH] mac80211: Move message tracepoints to their own header

2015-04-06 Thread Steven Rostedt

Every tracing file must have its own TRACE_SYSTEM defined.
The mac80211 tracepoint header broke this rule: in the middle of the
file it had:

 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM mac80211_msg

Unfortunately, this broke new code in the ftrace infrastructure.
Moving the mac80211_msg into its own trace file with its own
TRACE_SYSTEM defined fixes the issue.

Cc: Johannes Berg 
Signed-off-by: Steven Rostedt 
---
 net/mac80211/trace.c |  1 +
 net/mac80211/trace.h | 38 --
 net/mac80211/trace_msg.h | 53 
 3 files changed, 54 insertions(+), 38 deletions(-)
 create mode 100644 net/mac80211/trace_msg.h

diff --git a/net/mac80211/trace.c b/net/mac80211/trace.c
index 386e45d8a958..edfe0c170a1c 100644
--- a/net/mac80211/trace.c
+++ b/net/mac80211/trace.c
@@ -8,6 +8,7 @@
 #include "debug.h"
 #define CREATE_TRACE_POINTS
 #include "trace.h"
+#include "trace_msg.h"
 
 #ifdef CONFIG_MAC80211_MESSAGE_TRACING
 void __sdata_info(const char *fmt, ...)
diff --git a/net/mac80211/trace.h b/net/mac80211/trace.h
index 263a9561eb26..755a5388dbca 100644
--- a/net/mac80211/trace.h
+++ b/net/mac80211/trace.h
@@ -2312,44 +2312,6 @@ TRACE_EVENT(drv_tdls_recv_channel_switch,
)
 );
 
-#ifdef CONFIG_MAC80211_MESSAGE_TRACING
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM mac80211_msg
-
-#define MAX_MSG_LEN 100
-
-DECLARE_EVENT_CLASS(mac80211_msg_event,
-   TP_PROTO(struct va_format *vaf),
-
-   TP_ARGS(vaf),
-
-   TP_STRUCT__entry(
-   __dynamic_array(char, msg, MAX_MSG_LEN)
-   ),
-
-   TP_fast_assign(
-   WARN_ON_ONCE(vsnprintf(__get_dynamic_array(msg),
-  MAX_MSG_LEN, vaf->fmt,
-  *vaf->va) >= MAX_MSG_LEN);
-   ),
-
-   TP_printk("%s", __get_str(msg))
-);
-
-DEFINE_EVENT(mac80211_msg_event, mac80211_info,
-   TP_PROTO(struct va_format *vaf),
-   TP_ARGS(vaf)
-);
-DEFINE_EVENT(mac80211_msg_event, mac80211_dbg,
-   TP_PROTO(struct va_format *vaf),
-   TP_ARGS(vaf)
-);
-DEFINE_EVENT(mac80211_msg_event, mac80211_err,
-   TP_PROTO(struct va_format *vaf),
-   TP_ARGS(vaf)
-);
-#endif
-
 #endif /* !__MAC80211_DRIVER_TRACE || TRACE_HEADER_MULTI_READ */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/net/mac80211/trace_msg.h b/net/mac80211/trace_msg.h
new file mode 100644
index ..768f7c22a190
--- /dev/null
+++ b/net/mac80211/trace_msg.h
@@ -0,0 +1,53 @@
+#ifdef CONFIG_MAC80211_MESSAGE_TRACING
+
+#if !defined(__MAC80211_MSG_DRIVER_TRACE) || defined(TRACE_HEADER_MULTI_READ)
+#define __MAC80211_MSG_DRIVER_TRACE
+
+#include 
+#include 
+#include "ieee80211_i.h"
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mac80211_msg
+
+#define MAX_MSG_LEN 100
+
+DECLARE_EVENT_CLASS(mac80211_msg_event,
+   TP_PROTO(struct va_format *vaf),
+
+   TP_ARGS(vaf),
+
+   TP_STRUCT__entry(
+   __dynamic_array(char, msg, MAX_MSG_LEN)
+   ),
+
+   TP_fast_assign(
+   WARN_ON_ONCE(vsnprintf(__get_dynamic_array(msg),
+  MAX_MSG_LEN, vaf->fmt,
+  *vaf->va) >= MAX_MSG_LEN);
+   ),
+
+   TP_printk("%s", __get_str(msg))
+);
+
+DEFINE_EVENT(mac80211_msg_event, mac80211_info,
+   TP_PROTO(struct va_format *vaf),
+   TP_ARGS(vaf)
+);
+DEFINE_EVENT(mac80211_msg_event, mac80211_dbg,
+   TP_PROTO(struct va_format *vaf),
+   TP_ARGS(vaf)
+);
+DEFINE_EVENT(mac80211_msg_event, mac80211_err,
+   TP_PROTO(struct va_format *vaf),
+   TP_ARGS(vaf)
+);
+#endif /* !__MAC80211_MSG_DRIVER_TRACE || TRACE_HEADER_MULTI_READ */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace_msg
+#include 
+
+#endif
-- 
1.8.3.1



Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

2015-04-06 Thread Mike Galbraith
On Mon, 2015-04-06 at 15:28 -0400, Rik van Riel wrote:
> On 04/03/2015 11:43 PM, Mike Galbraith wrote:
> 
> > Speaking of microsecond savers, the (ick) deferment experiment 
> > below
> > cut 60 core jitter in half.  Shooting the clocksource watchdog 
> > fixes
> > alternating ~15us/~5us tick on my desktop box.
> > 
> > With workqueue twiddles and whatnot floating around, the thing is
> > starting to look viable.
> 
> Doesn't look too bad to me, though the changes below
> could probably use some comments when turned into a
> final patch :)

The watchdog yeah, the tick thing may want to become.. anything else.

-Mike


[PATCH v15 01/15] qspinlock: A simple generic 4-byte queue spinlock

2015-04-06 Thread Waiman Long
This patch introduces a new generic queue spinlock implementation that
can serve as an alternative to the default ticket spinlock. Compared
with the ticket spinlock, this queue spinlock should be almost as fair
as the ticket spinlock. It has about the same speed in single-thread
and it can be much faster in high contention situations especially when
the spinlock is embedded within the data structure to be protected.

Only in light to moderate contention where the average queue depth
is around 1-3 will this queue spinlock be potentially a bit slower
due to the higher slowpath overhead.

This queue spinlock is especially suited to NUMA machines with a large
number of cores as the chance of spinlock contention is much higher
in those machines. The cost of contention is also higher because of
slower inter-node memory traffic.

Due to the fact that spinlocks are acquired with preemption disabled,
the process will not be migrated to another CPU while it is trying
to get a spinlock. Ignoring interrupt handling, a CPU can only be
contending in one spinlock at any one time. Counting soft IRQ, hard
IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting
activities.  By allocating a set of per-cpu queue nodes and using them
to form a waiting queue, we can encode the queue node address into a
much smaller 24-bit size (including CPU number and queue node index)
leaving one byte for the lock.

Please note that the queue node is only needed when waiting for the
lock. Once the lock is acquired, the queue node can be released to
be used later.
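
The tail encoding described above can be sketched in plain userspace C. This
is an illustration, not the patch's code: the sketch_* names are hypothetical,
and the bit positions (tail index at bits 9-10, tail CPU from bit 11 upward)
are an assumption taken from the bitfield comment that later patches in this
series show for the large-NR_CPUS case. The CPU number is stored as cpu + 1 so
that an all-zero tail can mean "no queue".

```c
#include <stdint.h>

/* Assumed layout: bits 0-7 locked byte, bit 8 pending,
 * bits 9-10 tail index, bits 11-31 tail CPU (+1). */
#define Q_TAIL_IDX_OFFSET	9
#define Q_TAIL_IDX_BITS		2
#define Q_TAIL_CPU_OFFSET	(Q_TAIL_IDX_OFFSET + Q_TAIL_IDX_BITS)
#define Q_TAIL_IDX_MASK		(((1U << Q_TAIL_IDX_BITS) - 1) << Q_TAIL_IDX_OFFSET)

/* Pack (cpu, idx) into the tail bits; cpu is stored +1 so 0 means "no tail". */
static inline uint32_t sketch_encode_tail(unsigned int cpu, unsigned int idx)
{
	return ((uint32_t)(cpu + 1) << Q_TAIL_CPU_OFFSET) |
	       ((uint32_t)idx << Q_TAIL_IDX_OFFSET);
}

/* Recover the CPU number from a non-zero tail code word. */
static inline unsigned int sketch_tail_cpu(uint32_t tail)
{
	return (tail >> Q_TAIL_CPU_OFFSET) - 1;
}

/* Recover the per-CPU queue node index (0-3: task, softirq, hardirq, NMI). */
static inline unsigned int sketch_tail_idx(uint32_t tail)
{
	return (tail & Q_TAIL_IDX_MASK) >> Q_TAIL_IDX_OFFSET;
}
```

Because only a (cpu, idx) pair is stored rather than a pointer, the waiter's
queue node can be looked up again from the per-cpu array when needed.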

Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/asm-generic/qspinlock.h   |  132 +
 include/asm-generic/qspinlock_types.h |   58 +
 kernel/Kconfig.locks  |7 +
 kernel/locking/Makefile   |1 +
 kernel/locking/mcs_spinlock.h |1 +
 kernel/locking/qspinlock.c|  209 +
 6 files changed, 408 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/qspinlock.h
 create mode 100644 include/asm-generic/qspinlock_types.h
 create mode 100644 kernel/locking/qspinlock.c

diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
new file mode 100644
index 000..315d6dc
--- /dev/null
+++ b/include/asm-generic/qspinlock.h
@@ -0,0 +1,132 @@
+/*
+ * Queue spinlock
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long 
+ */
+#ifndef __ASM_GENERIC_QSPINLOCK_H
+#define __ASM_GENERIC_QSPINLOCK_H
+
+#include 
+
+/**
+ * queue_spin_is_locked - is the spinlock locked?
+ * @lock: Pointer to queue spinlock structure
+ * Return: 1 if it is locked, 0 otherwise
+ */
+static __always_inline int queue_spin_is_locked(struct qspinlock *lock)
+{
+   return atomic_read(&lock->val);
+}
+
+/**
+ * queue_spin_value_unlocked - is the spinlock structure unlocked?
+ * @lock: queue spinlock structure
+ * Return: 1 if it is unlocked, 0 otherwise
+ *
+ * N.B. Whenever there are tasks waiting for the lock, it is considered
+ *  locked wrt the lockref code to avoid lock stealing by the lockref
+ *  code and change things underneath the lock. This also allows some
+ *  optimizations to be applied without conflict with lockref.
+ */
+static __always_inline int queue_spin_value_unlocked(struct qspinlock lock)
+{
+   return !atomic_read(&lock.val);
+}
+
+/**
+ * queue_spin_is_contended - check if the lock is contended
+ * @lock : Pointer to queue spinlock structure
+ * Return: 1 if lock contended, 0 otherwise
+ */
+static __always_inline int queue_spin_is_contended(struct qspinlock *lock)
+{
+   return atomic_read(&lock->val) & ~_Q_LOCKED_MASK;
+}
+/**
+ * queue_spin_trylock - try to acquire the queue spinlock
+ * @lock : Pointer to queue spinlock structure
+ * Return: 1 if lock acquired, 0 if failed
+ */
+static __always_inline int queue_spin_trylock(struct qspinlock *lock)
+{
+   if (!atomic_read(&lock->val) &&
+  (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) == 0))
+   return 1;
+   return 0;
+}
+
+extern void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+
+/**
+ * queue_spin_lock - acquire a queue spinlock
+ * @lock: Pointer to queue spinlock structure
+ */
+static __always_inline void queue_spin_lock(struct qspinlock *lock)
+{
+   u32 val;
+
+   val = atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL);
+   if (likely(val == 0))
+   

[PATCH v15 04/15] qspinlock: Extract out code snippets for the next patch

2015-04-06 Thread Waiman Long
This is a preparatory patch that extracts out the following 2 code
snippets to prepare for the next performance optimization patch.

 1) the logic for the exchange of new and previous tail code words
into a new xchg_tail() function.
 2) the logic for clearing the pending bit and setting the locked bit
into a new clear_pending_set_locked() function.

This patch also simplifies the trylock operation before queuing by
calling queue_spin_trylock() directly.
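
As a rough userspace illustration of the two helpers this patch factors out,
the sketch below uses C11 atomics in place of the kernel's atomic_t API. The
sketch_* names and the Q_* constants are hypothetical; the mask assumes a
locked byte in bits 0-7 and a single pending bit at bit 8.

```c
#include <stdatomic.h>
#include <stdint.h>

#define Q_LOCKED_VAL		(1U << 0)
#define Q_PENDING_VAL		(1U << 8)
#define Q_LOCKED_PENDING_MASK	0x1ffU	/* locked byte plus pending bit */

struct sketch_qspinlock {
	_Atomic uint32_t val;
};

/* *,1,0 -> *,0,1: only valid while pending is set and locked is clear,
 * so a single atomic add clears pending and sets locked together. */
static inline void sketch_clear_pending_set_locked(struct sketch_qspinlock *lock)
{
	atomic_fetch_add(&lock->val, Q_LOCKED_VAL - Q_PENDING_VAL);
}

/* p,*,* -> n,*,*: install a new tail, preserve locked/pending,
 * and return the previous lock word. */
static inline uint32_t sketch_xchg_tail(struct sketch_qspinlock *lock,
					uint32_t tail)
{
	uint32_t val = atomic_load(&lock->val);
	uint32_t new;

	do {
		new = (val & Q_LOCKED_PENDING_MASK) | tail;
		/* on failure, val is refreshed with the current lock word */
	} while (!atomic_compare_exchange_weak(&lock->val, &val, new));

	return val;
}
```

The add-based transition works because the caller is the only one allowed to
change the pending and locked bits in that state; other CPUs may still update
the tail concurrently, which the add leaves untouched.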

Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/asm-generic/qspinlock_types.h |2 +
 kernel/locking/qspinlock.c|   79 -
 2 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/include/asm-generic/qspinlock_types.h 
b/include/asm-generic/qspinlock_types.h
index 9c3f5c2..ef36613 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -58,6 +58,8 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_BITS   (32 - _Q_TAIL_CPU_OFFSET)
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
+#define _Q_TAIL_MASK   (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK)
+
 #define _Q_LOCKED_VAL  (1U << _Q_LOCKED_OFFSET)
 #define _Q_PENDING_VAL (1U << _Q_PENDING_OFFSET)
 
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 0351f78..11f6ad9 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -97,6 +97,42 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
 #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
 
 /**
+ * clear_pending_set_locked - take ownership and clear the pending bit.
+ * @lock: Pointer to queue spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+   atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
+}
+
+/**
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queue spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+   u32 old, new, val = atomic_read(&lock->val);
+
+   for (;;) {
+   new = (val & _Q_LOCKED_PENDING_MASK) | tail;
+   old = atomic_cmpxchg(&lock->val, val, new);
+   if (old == val)
+   break;
+
+   val = old;
+   }
+   return old;
+}
+
+/**
  * queue_spin_lock_slowpath - acquire the queue spinlock
  * @lock: Pointer to queue spinlock structure
  * @val: Current value of the queue spinlock 32-bit word
@@ -178,15 +214,7 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 
val)
 *
 * *,1,0 -> *,0,1
 */
-   for (;;) {
-   new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
-
-   old = atomic_cmpxchg(&lock->val, val, new);
-   if (old == val)
-   break;
-
-   val = old;
-   }
+   clear_pending_set_locked(lock);
return;
 
/*
@@ -203,37 +231,26 @@ queue:
node->next = NULL;
 
/*
-* We have already touched the queueing cacheline; don't bother with
-* pending stuff.
-*
-* trylock || xchg(lock, node)
-*
-* 0,0,0 -> 0,0,1 ; no tail, not locked -> no tail, locked.
-* p,y,x -> n,y,x ; tail was p -> tail is n; preserving locked.
+* We touched a (possibly) cold cacheline in the per-cpu queue node;
+* attempt the trylock once more in the hope someone let go while we
+* weren't watching.
 */
-   for (;;) {
-   new = _Q_LOCKED_VAL;
-   if (val)
-   new = tail | (val & _Q_LOCKED_PENDING_MASK);
-
-   old = atomic_cmpxchg(&lock->val, val, new);
-   if (old == val)
-   break;
-
-   val = old;
-   }
+   if (queue_spin_trylock(lock))
+   goto release;
 
/*
-* we won the trylock; forget about queueing.
+* We have already touched the queueing cacheline; don't bother with
+* pending stuff.
+*
+* p,*,* -> n,*,*
 */
-   if (new == _Q_LOCKED_VAL)
-   goto release;
+   old = xchg_tail(lock, tail);
 
/*
 * if there was a previous node; link it and wait until reaching the
 * head of the waitqueue.
 */
-   if (old & ~_Q_LOCKED_PENDING_MASK) {
+   if (old & _Q_TAIL_MASK) {
prev = decode_tail(old);
WRITE_ONCE(prev->next, node);
 
-- 
1.7.1


[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-06 Thread Waiman Long
Provide a separate (second) version of the spin_lock_slowpath for
paravirt along with a special unlock path.

The second slowpath is generated by adding a few pv hooks to the
normal slowpath, but where those will compile away for the native
case, they expand into special wait/wake code for the pv version.

The actual MCS queue can use extra storage in the mcs_nodes[] array to
keep track of state and therefore uses directed wakeups.

The head contender has no such storage directly visible to the
unlocker.  So the unlocker searches a hash table with open addressing
using a simple binary Galois linear feedback shift register.
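
The lookup idea in the paragraph above can be pictured with a toy example. The
patch's own hash code is cut off in this archive, so what follows is not that
code: it is a generic sketch of open addressing where the probe sequence comes
from a 4-bit Galois LFSR (polynomial x^4 + x^3 + 1), and every sketch_* name
and the 16-slot table size are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SLOTS 16

/* One step of a 4-bit Galois LFSR with tap mask 0xC. Starting from any
 * non-zero state it cycles through all 15 non-zero 4-bit values. */
static inline uint8_t sketch_lfsr4_step(uint8_t state)
{
	uint8_t lsb = state & 1u;

	state >>= 1;
	if (lsb)
		state ^= 0xCu;
	return state;
}

/* Toy table: slot 0 stays unused because the LFSR never produces 0. */
struct sketch_table {
	const void *lock[SKETCH_SLOTS];	/* NULL marks a free slot */
};

/* Walk the probe sequence starting at hash until a slot holding want is
 * found (pass NULL to find a free slot). Returns the slot, or -1 after
 * one full LFSR cycle without a match. */
static inline int sketch_probe(const struct sketch_table *t, const void *want,
			       uint8_t hash)
{
	uint8_t s = hash & 0xFu;
	int i;

	if (!s)
		s = 1;	/* state 0 is a fixed point of the LFSR */

	for (i = 0; i < SKETCH_SLOTS - 1; i++) {
		if (t->lock[s] == want)
			return s;
		s = sketch_lfsr4_step(s);
	}
	return -1;
}

/* Store lock in the first free slot along its probe sequence. */
static inline int sketch_insert(struct sketch_table *t, const void *lock,
				uint8_t hash)
{
	int slot = sketch_probe(t, NULL, hash);

	if (slot >= 0)
		t->lock[slot] = lock;
	return slot;
}
```

An LFSR gives a cheap pseudo-random probe order that still visits every usable
slot exactly once per cycle, so a lookup terminates after a bounded number of
steps.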

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock.c  |   69 -
 kernel/locking/qspinlock_paravirt.h |  321 +++
 2 files changed, 389 insertions(+), 1 deletions(-)
 create mode 100644 kernel/locking/qspinlock_paravirt.h

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index fc2e5ab..33b3f54 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -18,6 +18,9 @@
  * Authors: Waiman Long 
  *  Peter Zijlstra 
  */
+
+#ifndef _GEN_PV_LOCK_SLOWPATH
+
 #include 
 #include 
 #include 
@@ -65,13 +68,21 @@
 
 #include "mcs_spinlock.h"
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define MAX_NODES  8
+#else
+#define MAX_NODES  4
+#endif
+
 /*
  * Per-CPU queue node structures; we can never have more than 4 nested
  * contexts: task, softirq, hardirq, nmi.
  *
  * Exactly fits one 64-byte cacheline on a 64-bit architecture.
+ *
+ * PV doubles the storage and uses the second cacheline for PV state.
  */
-static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[4]);
+static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[MAX_NODES]);
 
 /*
  * We must be able to distinguish between no-tail and the tail at 0:0,
@@ -220,6 +231,33 @@ static __always_inline void set_locked(struct qspinlock 
*lock)
WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
 }
 
+
+/*
+ * Generate the native code for queue_spin_unlock_slowpath(); provide NOPs for
+ * all the PV callbacks.
+ */
+
+static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_kick_node(struct mcs_spinlock *node) { }
+
+static __always_inline void __pv_wait_head(struct qspinlock *lock,
+  struct mcs_spinlock *node) { }
+
+#define pv_enabled()   false
+
+#define pv_init_node   __pv_init_node
+#define pv_wait_node   __pv_wait_node
+#define pv_kick_node   __pv_kick_node
+
+#define pv_wait_head   __pv_wait_head
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define queue_spin_lock_slowpath   native_queue_spin_lock_slowpath
+#endif
+
+#endif /* _GEN_PV_LOCK_SLOWPATH */
+
 /**
  * queue_spin_lock_slowpath - acquire the queue spinlock
  * @lock: Pointer to queue spinlock structure
@@ -249,6 +287,9 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 
val)
 
BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+   if (pv_enabled())
+   goto queue;
+
if (virt_queue_spin_lock(lock))
return;
 
@@ -325,6 +366,7 @@ queue:
node += idx;
node->locked = 0;
node->next = NULL;
+   pv_init_node(node);
 
/*
 * We touched a (possibly) cold cacheline in the per-cpu queue node;
@@ -350,6 +392,7 @@ queue:
prev = decode_tail(old);
WRITE_ONCE(prev->next, node);
 
+   pv_wait_node(node);
arch_mcs_spin_lock_contended(&node->locked);
}
 
@@ -365,6 +408,7 @@ queue:
 * does not imply a full barrier.
 *
 */
+   pv_wait_head(lock, node);
while ((val = smp_load_acquire(&lock->val.counter)) & 
_Q_LOCKED_PENDING_MASK)
cpu_relax();
 
@@ -397,6 +441,7 @@ queue:
cpu_relax();
 
arch_mcs_spin_unlock_contended(&next->locked);
+   pv_kick_node(next);
 
 release:
/*
@@ -405,3 +450,25 @@ release:
this_cpu_dec(mcs_nodes[0].count);
 }
 EXPORT_SYMBOL(queue_spin_lock_slowpath);
+
+/*
+ * Generate the paravirt code for queue_spin_unlock_slowpath().
+ */
+#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
+#define _GEN_PV_LOCK_SLOWPATH
+
+#undef  pv_enabled
+#define pv_enabled()   true
+
+#undef pv_init_node
+#undef pv_wait_node
+#undef pv_kick_node
+#undef pv_wait_head
+
+#undef  queue_spin_lock_slowpath
+#define queue_spin_lock_slowpath   __pv_queue_spin_lock_slowpath
+
+#include "qspinlock_paravirt.h"
+#include "qspinlock.c"
+
+#endif
diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
new file mode 100644
index 000..49dbd39
--- /dev/null
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -0,0 +1,321 @@
+#ifndef _GEN_PV_LOCK_SLOWPATH
+#error "do not include this file"
+#endif
+
+/*
+ * 

[PATCH v15 05/15] qspinlock: Optimize for smaller NR_CPUS

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) 

When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16bit.
This means we can use xchg16 for the tail part and do away with all
the repeated cmpxchg() operations.

This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obviating one atomic operation entirely.

This optimization is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.

All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).
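
A userspace model of the narrower layout may help. It is a sketch under stated
assumptions, not the kernel implementation: instead of the patch's union over
one atomic_t it uses two separate 16-bit C11 atomics (so it stays within
portable C), and all sketch16_* names are hypothetical. With the pending field
widened to a byte, the tail occupies bits 16-31: 2 bits of node index plus 14
bits of cpu + 1, which is where the NR_CPUS < 2^14 limit comes from.

```c
#include <stdatomic.h>
#include <stdint.h>

#define Q_LOCKED_VAL	1U
#define Q_TAIL_OFFSET	16

/* Hypothetical model of the lock word as two independent halves. */
struct sketch16_qspinlock {
	_Atomic uint16_t locked_pending;	/* bits 0-15: locked + pending bytes */
	_Atomic uint16_t tail;			/* bits 16-31: tail index + cpu */
};

/* *,1,0 -> *,0,1 as one plain store to the low half. The kernel uses
 * WRITE_ONCE(); a relaxed store is the closest C11 analogue. */
static inline void sketch16_clear_pending_set_locked(struct sketch16_qspinlock *l)
{
	atomic_store_explicit(&l->locked_pending, Q_LOCKED_VAL,
			      memory_order_relaxed);
}

/* p,*,* -> n,*,* as one 16-bit exchange, with no cmpxchg retry loop. */
static inline uint32_t sketch16_xchg_tail(struct sketch16_qspinlock *l,
					  uint32_t tail)
{
	uint16_t prev = atomic_exchange(&l->tail,
					(uint16_t)(tail >> Q_TAIL_OFFSET));

	return (uint32_t)prev << Q_TAIL_OFFSET;
}
```

The exchange cannot disturb the locked and pending bytes because they live in
the other half, which is why no retry loop is needed.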

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Waiman Long 
---
 include/asm-generic/qspinlock_types.h |   13 ++
 kernel/locking/qspinlock.c|   69 -
 2 files changed, 81 insertions(+), 1 deletions(-)

diff --git a/include/asm-generic/qspinlock_types.h 
b/include/asm-generic/qspinlock_types.h
index ef36613..f01b55d 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -35,6 +35,14 @@ typedef struct qspinlock {
 /*
  * Bitfields in the atomic value:
  *
+ * When NR_CPUS < 16K
+ *  0- 7: locked byte
+ * 8: pending
+ *  9-15: not used
+ * 16-17: tail index
+ * 18-31: tail cpu (+1)
+ *
+ * When NR_CPUS >= 16K
  *  0- 7: locked byte
  * 8: pending
  *  9-10: tail index
@@ -47,7 +55,11 @@ typedef struct qspinlock {
 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED)
 
 #define _Q_PENDING_OFFSET  (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#if CONFIG_NR_CPUS < (1U << 14)
+#define _Q_PENDING_BITS    8
+#else
 #define _Q_PENDING_BITS    1
+#endif
 #define _Q_PENDING_MASK    _Q_SET_MASK(PENDING)
 
 #define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS)
@@ -58,6 +70,7 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_BITS   (32 - _Q_TAIL_CPU_OFFSET)
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
+#define _Q_TAIL_OFFSET _Q_TAIL_IDX_OFFSET
 #define _Q_TAIL_MASK   (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK)
 
 #define _Q_LOCKED_VAL  (1U << _Q_LOCKED_OFFSET)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 11f6ad9..bcc99e6 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -56,6 +57,10 @@
  * node; whereby avoiding the need to carry a node from lock to unlock, and
  * preserving existing lock API. This also makes the unlock code simpler and
  * faster.
+ *
+ * N.B. The current implementation only supports architectures that allow
+ *  atomic operations on smaller 8-bit and 16-bit data types.
+ *
  */
 
 #include "mcs_spinlock.h"
@@ -96,6 +101,62 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
 
 #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
 
+/*
+ * By using the whole 2nd least significant byte for the pending bit, we
+ * can allow better optimization of the lock acquisition for the pending
+ * bit holder.
+ */
+#if _Q_PENDING_BITS == 8
+
+struct __qspinlock {
+   union {
+   atomic_t val;
+   struct {
+#ifdef __LITTLE_ENDIAN
+   u16 locked_pending;
+   u16 tail;
+#else
+   u16 tail;
+   u16 locked_pending;
+#endif
+   };
+   };
+};
+
+/**
+ * clear_pending_set_locked - take ownership and clear the pending bit.
+ * @lock: Pointer to queue spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ *
+ * Lock stealing is not allowed if this function is used.
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+   struct __qspinlock *l = (void *)lock;
+
+   WRITE_ONCE(l->locked_pending, _Q_LOCKED_VAL);
+}
+
+/*
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queue spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+   struct __qspinlock *l = (void *)lock;
+
+   return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
+}
+
+#else /* _Q_PENDING_BITS == 8 */
+
 /**
  * clear_pending_set_locked - take ownership and clear the pending bit.
  * @lock: Pointer to queue spinlock structure
@@ -131,6 +192,7 @@ static __always_inline u32 xchg_tail(struct qspinlock 
*lock, u32 tail)
}
return old;
 }
+#endif /* _Q_PENDING_BITS == 8 */
 
 /**
  * queue_spin_lock_slowpath - acquire the queue spinlock
@@ -205,8 +267,13 @@ void 

[PATCH v15 02/15] qspinlock, x86: Enable x86-64 to use queue spinlock

2015-04-06 Thread Waiman Long
This patch makes the necessary changes at the x86 architecture
specific layer to enable the use of queue spinlock for x86-64. As
x86-32 machines are typically not multi-socket, the benefit of queue
spinlock may not be apparent there, so it is not enabled for x86-32.

Currently, there are some incompatibilities between the para-virtualized
spinlock code (which hard-codes the use of ticket spinlock) and the
queue spinlock. Therefore, the use of queue spinlock is disabled when
the para-virtualized spinlock is enabled.

The arch/x86/include/asm/qspinlock.h header file includes some x86
specific optimization which will make the queue spinlock code perform
better than the generic implementation.
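
The x86 optimization amounts to releasing the lock with a single release store
to the locked byte. A minimal C11 sketch of that pairing follows; the
sketch_* names are hypothetical, and the lock is modelled as a lone byte
rather than the kernel's 32-bit word.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical one-byte model of the locked byte. */
struct sketch_byte_lock {
	_Atomic uint8_t locked;
};

/* Try to take the lock: 0 -> 1 with acquire ordering on success.
 * Returns 1 if the lock was acquired, 0 otherwise. */
static inline int sketch_byte_trylock(struct sketch_byte_lock *l)
{
	uint8_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release the lock with a plain release store, no read-modify-write. */
static inline void sketch_byte_unlock(struct sketch_byte_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The release store orders the critical section before the unlock becomes
visible, and it avoids an atomic read-modify-write on the unlock path.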

Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/Kconfig  |1 +
 arch/x86/include/asm/qspinlock.h  |   20 
 arch/x86/include/asm/spinlock.h   |5 +
 arch/x86/include/asm/spinlock_types.h |4 
 4 files changed, 30 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/qspinlock.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca..49fecb1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -125,6 +125,7 @@ config X86
select MODULES_USE_ELF_RELA if X86_64
select CLONE_BACKWARDS if X86_32
select ARCH_USE_BUILTIN_BSWAP
+   select ARCH_USE_QUEUE_SPINLOCK
select ARCH_USE_QUEUE_RWLOCK
select OLD_SIGSUSPEND3 if X86_32 || IA32_EMULATION
select OLD_SIGACTION if X86_32
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
new file mode 100644
index 000..222995b
--- /dev/null
+++ b/arch/x86/include/asm/qspinlock.h
@@ -0,0 +1,20 @@
+#ifndef _ASM_X86_QSPINLOCK_H
+#define _ASM_X86_QSPINLOCK_H
+
+#include 
+
+#define queue_spin_unlock queue_spin_unlock
+/**
+ * queue_spin_unlock - release a queue spinlock
+ * @lock : Pointer to queue spinlock structure
+ *
+ * A smp_store_release() on the least-significant byte.
+ */
+static inline void queue_spin_unlock(struct qspinlock *lock)
+{
+   smp_store_release((u8 *)lock, 0);
+}
+
+#include 
+
+#endif /* _ASM_X86_QSPINLOCK_H */
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index cf87de3..a9c01fd 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -42,6 +42,10 @@
 extern struct static_key paravirt_ticketlocks_enabled;
 static __always_inline bool static_key_false(struct static_key *key);
 
+#ifdef CONFIG_QUEUE_SPINLOCK
+#include 
+#else
+
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
@@ -196,6 +200,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t 
*lock)
cpu_relax();
}
 }
+#endif /* CONFIG_QUEUE_SPINLOCK */
 
 /*
  * Read-write spinlocks, allowing multiple readers
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index 5f9d757..5d654a1 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -23,6 +23,9 @@ typedef u32 __ticketpair_t;
 
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 
+#ifdef CONFIG_QUEUE_SPINLOCK
+#include 
+#else
 typedef struct arch_spinlock {
union {
__ticketpair_t head_tail;
@@ -33,6 +36,7 @@ typedef struct arch_spinlock {
 } arch_spinlock_t;
 
 #define __ARCH_SPIN_LOCK_UNLOCKED  { { 0 } }
+#endif /* CONFIG_QUEUE_SPINLOCK */
 
 #include 
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v15 15/15] pvqspinlock: Add debug code to check for PV lock hash sanity

2015-04-06 Thread Waiman Long
The current code for PV lock hash table processing will panic the
system if pv_hash_find() can't find the desired hash bucket. However,
there is no check to see if there is more than one entry for a given
lock, which should never happen.

This patch adds a pv_hash_check_duplicate() function to do that check. It
is only enabled if CONFIG_DEBUG_SPINLOCK is defined because of
the performance overhead it introduces.
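
The check described above can be sketched in user space as a linear scan over
an open-addressed table. This is only an illustration under stated
assumptions: struct bucket and count_lock_entries() are hypothetical names,
and the bucket is reduced to a single lock pointer rather than the full
pv_hash_bucket used by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified bucket; a NULL lock marks a free bucket. */
struct bucket {
	const void *lock;
};

/*
 * Count how many buckets in table[0..nbuckets) refer to the given
 * (non-NULL) lock. A healthy table yields 0 or 1; anything larger is a
 * duplicate. If used is non-NULL it receives the number of occupied
 * buckets, which is what the "more than half used" warning looks at.
 */
static size_t count_lock_entries(const struct bucket *table, size_t nbuckets,
				 const void *lock, size_t *used)
{
	size_t count = 0, occupied = 0;

	for (size_t i = 0; i < nbuckets; i++) {
		if (table[i].lock)
			occupied++;
		if (table[i].lock == lock)
			count++;
	}
	if (used)
		*used = occupied;
	return count;
}
```

A caller would treat any return value above 1 as an error, and could compare
the occupied count against half the table size to flag a crowded table.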

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock_paravirt.h |   58 +++
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index a9fe10d..4d39c8b 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -107,6 +107,63 @@ static inline u32 hash_align(u32 hash)
 }
 
 /*
+ * Hash table debugging code
+ */
+#ifdef CONFIG_DEBUG_SPINLOCK
+
+#define _NODE_IDX(pn)  ((((unsigned long)pn) & (SMP_CACHE_BYTES - 1)) /\
+   sizeof(struct mcs_spinlock))
+/*
+ * Check if there are additional hash buckets with the same lock, which
+ * should not happen.
+ */
+static inline void pv_hash_check_duplicate(struct qspinlock *lock)
+{
+   struct pv_hash_bucket *hb, *end, *hb1 = NULL;
+   int count = 0, used = 0;
+
+   end = &pv_lock_hash[1 << pv_lock_hash_bits];
+   for (hb = pv_lock_hash; hb < end; hb++) {
+   struct qspinlock *l = READ_ONCE(hb->lock);
+   struct pv_node *pn;
+
+   if (l)
+   used++;
+   if (l != lock)
+   continue;
+   if (++count == 1) {
+   hb1 = hb;
+   continue;
+   }
+   WARN_ON(count == 2);
+   if (hb1) {
+   pn = READ_ONCE(hb1->node);
+   printk(KERN_ERR "PV lock hash error: duplicated entry "
+  "#%d - hash %ld, node %ld, cpu %d\n", 1,
+  hb1 - pv_lock_hash, _NODE_IDX(pn),
+  pn ? pn->cpu : -1);
+   hb1 = NULL;
+   }
+   pn = READ_ONCE(hb->node);
+   printk(KERN_ERR "PV lock hash error: duplicated entry #%d - "
+  "hash %ld, node %ld, cpu %d\n", count, hb - pv_lock_hash,
+  _NODE_IDX(pn), pn ? pn->cpu : -1);
+   }
+   /*
+* Warn if more than half of the buckets are used
+*/
+   if (used > (1 << (pv_lock_hash_bits - 1)))
+   printk(KERN_WARNING "PV lock hash warning: "
+  "%d hash entries used!\n", used);
+}
+
+#else /* CONFIG_DEBUG_SPINLOCK */
+
+static inline void pv_hash_check_duplicate(struct qspinlock *lock) {}
+
+#endif /* CONFIG_DEBUG_SPINLOCK */
+
+/*
  * Set up an entry in the lock hash table
  * This is not inlined to reduce size of generated code as it is included
  * twice and is used only in the slowest path of handling CPU halting.
@@ -141,6 +198,7 @@ pv_hash(struct qspinlock *lock, struct pv_node *node)
}
 
 done:
+   pv_hash_check_duplicate(lock);
return &hb->lock;
 }
 
-- 
1.7.1



[PATCH v15 07/15] qspinlock: Revert to test-and-set on hypervisors

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) 

When we detect a hypervisor (!paravirt, see qspinlock paravirt support
patches), revert to a simple test-and-set lock to avoid the horrors
of queue preemption.
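
As a rough user-space illustration of the fallback described above (not the
kernel code), a test-and-set lock can be sketched with C11 atomics. The names
tas_lock, tas_trylock(), tas_acquire() and tas_release() are made up for this
sketch, and LOCKED_VAL merely stands in for _Q_LOCKED_VAL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCKED_VAL 1u	/* stand-in for _Q_LOCKED_VAL (assumption) */

struct tas_lock {
	atomic_uint val;
};

/* One acquisition attempt: succeeds only if the word is currently 0. */
static bool tas_trylock(struct tas_lock *lock)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong_explicit(&lock->val, &expected,
			LOCKED_VAL, memory_order_acquire,
			memory_order_relaxed);
}

/*
 * Spin until the lock is taken. There is no queue, so a preempted
 * waiter never stalls the waiters behind it.
 */
static void tas_acquire(struct tas_lock *lock)
{
	while (!tas_trylock(lock))
		;	/* the kernel loop calls cpu_relax() here */
}

static void tas_release(struct tas_lock *lock)
{
	atomic_store_explicit(&lock->val, 0, memory_order_release);
}
```

The trade-off is fairness: any spinner may win the cmpxchg, so waiters are not
served in arrival order, which is the price paid for not depending on a
particular vCPU being scheduled.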

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Waiman Long 
---
 arch/x86/include/asm/qspinlock.h |   14 ++
 include/asm-generic/qspinlock.h  |7 +++
 kernel/locking/qspinlock.c   |3 +++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 222995b..64c925e 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_QSPINLOCK_H
 #define _ASM_X86_QSPINLOCK_H
 
+#include 
 #include 
 
 #define queue_spin_unlock queue_spin_unlock
@@ -15,6 +16,19 @@ static inline void queue_spin_unlock(struct qspinlock *lock)
smp_store_release((u8 *)lock, 0);
 }
 
+#define virt_queue_spin_lock virt_queue_spin_lock
+
+static inline bool virt_queue_spin_lock(struct qspinlock *lock)
+{
+   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+   return false;
+
+   while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
+   cpu_relax();
+
+   return true;
+}
+
 #include 
 
 #endif /* _ASM_X86_QSPINLOCK_H */
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 315d6dc..bcbbc5e 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -111,6 +111,13 @@ static inline void queue_spin_unlock_wait(struct qspinlock *lock)
cpu_relax();
 }
 
+#ifndef virt_queue_spin_lock
+static __always_inline bool virt_queue_spin_lock(struct qspinlock *lock)
+{
+   return false;
+}
+#endif
+
 /*
  * Initializier
  */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 99503ef..fc2e5ab 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -249,6 +249,9 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 
BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+   if (virt_queue_spin_lock(lock))
+   return;
+
/*
 * wait for in-progress pending->locked hand-overs
 *
-- 
1.7.1



[PATCH v15 13/15] pvqspinlock: Only kick CPU at unlock time

2015-04-06 Thread Waiman Long
Before this patch, a CPU may have been kicked twice before getting
the lock - once before it becomes queue head and once before it gets
the lock. All this CPU kicking and halting (VMEXIT) can be expensive
and slow down system performance, especially in an overcommitted guest.

This patch adds a new vCPU state (vcpu_hashed) which enables the code
to delay CPU kicking until unlock time. Once this state is set,
the new lock holder will set _Q_SLOW_VAL and fill in the hash table
on behalf of the halted queue head vCPU.
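
The state change at the heart of this patch can be sketched as a single
compare-and-swap: the new lock holder moves the next waiter from halted to
hashed, and does nothing if the waiter is still running. The sketch below is a
user-space approximation with C11 atomics; try_mark_hashed() is a hypothetical
name and the hash table fill-in is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum vcpu_state { vcpu_running = 0, vcpu_halted, vcpu_hashed };

/*
 * Lock-holder side: try the halted => hashed transition on the next
 * waiter. Returns true if the waiter was halted, in which case the
 * caller would go on to set _Q_SLOW_VAL and fill in the hash entry on
 * the waiter's behalf. Returns false, leaving the state untouched, if
 * the waiter was not halted.
 */
static bool try_mark_hashed(atomic_int *state)
{
	int expected = vcpu_halted;

	return atomic_compare_exchange_strong(state, &expected, vcpu_hashed);
}
```

Because the kick is deferred to unlock time, a waiter in the hashed state is
woken once, by the unlocker, instead of once per queue position change.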

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock.c  |   10 ++--
 kernel/locking/qspinlock_paravirt.h |   76 +--
 2 files changed, 59 insertions(+), 27 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 33b3f54..b9ba83b 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -239,8 +239,8 @@ static __always_inline void set_locked(struct qspinlock *lock)
 
 static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
 static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { }
-static __always_inline void __pv_kick_node(struct mcs_spinlock *node) { }
-
+static __always_inline void __pv_scan_next(struct qspinlock *lock,
+  struct mcs_spinlock *node) { }
 static __always_inline void __pv_wait_head(struct qspinlock *lock,
   struct mcs_spinlock *node) { }
 
@@ -248,7 +248,7 @@ static __always_inline void __pv_wait_head(struct qspinlock *lock,
 
 #define pv_init_node   __pv_init_node
 #define pv_wait_node   __pv_wait_node
-#define pv_kick_node   __pv_kick_node
+#define pv_scan_next   __pv_scan_next
 
 #define pv_wait_head   __pv_wait_head
 
@@ -441,7 +441,7 @@ queue:
cpu_relax();
 
arch_mcs_spin_unlock_contended(&next->locked);
-   pv_kick_node(next);
+   pv_scan_next(lock, next);
 
 release:
/*
@@ -462,7 +462,7 @@ EXPORT_SYMBOL(queue_spin_lock_slowpath);
 
 #undef pv_init_node
 #undef pv_wait_node
-#undef pv_kick_node
+#undef pv_scan_next
 #undef pv_wait_head
 
 #undef  queue_spin_lock_slowpath
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 49dbd39..a210061 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -18,9 +18,16 @@
 
 #define _Q_SLOW_VAL   (3U << _Q_LOCKED_OFFSET)
 
+/*
+ * vcpu_hashed is a special state that is set by the new lock holder on
+ * the new queue head to indicate that _Q_SLOW_VAL is set and the hash entry
+ * is filled. With this state, the queue head CPU will always be kicked, even
+ * if it is not halted, to avoid a potential race condition.
+ */
 enum vcpu_state {
vcpu_running = 0,
vcpu_halted,
+   vcpu_hashed
 };
 
 struct pv_node {
@@ -97,7 +104,13 @@ static inline u32 hash_align(u32 hash)
return hash & ~(PV_HB_PER_LINE - 1);
 }
 
-static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
+/*
+ * Set up an entry in the lock hash table
+ * This is not inlined to reduce size of generated code as it is included
+ * twice and is used only in the slowest path of handling CPU halting.
+ */
+static noinline struct qspinlock **
+pv_hash(struct qspinlock *lock, struct pv_node *node)
 {
unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits);
struct pv_hash_bucket *hb, *end;
@@ -178,7 +191,8 @@ static void pv_init_node(struct mcs_spinlock *node)
 
 /*
  * Wait for node->locked to become true, halt the vcpu after a short spin.
- * pv_kick_node() is used to wake the vcpu again.
+ * pv_scan_next() is used to set _Q_SLOW_VAL and fill in hash table on its
+ * behalf.
  */
 static void pv_wait_node(struct mcs_spinlock *node)
 {
@@ -189,7 +203,6 @@ static void pv_wait_node(struct mcs_spinlock *node)
for (loop = SPIN_THRESHOLD; loop; loop--) {
if (READ_ONCE(node->locked))
return;
-
cpu_relax();
}
 
@@ -198,17 +211,21 @@ static void pv_wait_node(struct mcs_spinlock *node)
 *
 * [S] pn->state = vcpu_halted    [S] next->locked = 1
 *     MB                             MB
-* [L] pn->locked               [RmW] pn->state = vcpu_running
+* [L] pn->locked               [RmW] pn->state = vcpu_hashed
 *
-* Matches the xchg() from pv_kick_node().
+* Matches the cmpxchg() from pv_scan_next().
 */
(void)xchg(&pn->state, vcpu_halted);
 
if (!READ_ONCE(node->locked))
pv_wait(&pn->state, vcpu_halted);
 
-   /* Make sure that state is correct for spurious wakeup */
-   WRITE_ONCE(pn->state, vcpu_running);
+   /*
+

[PATCH v15 08/15] lfsr: a simple binary Galois linear feedback shift register

2015-04-06 Thread Waiman Long
This patch is based on the code sent out by Peter Zijlstra as part
of his queue spinlock patch to provide a hashing function with open
addressing.  The lfsr() function can be used to return a sequence of
numbers that cycles through all the non-zero bit patterns (2^n - 1 of
them) of a given bit width n in a somewhat random fashion, depending
on the LFSR taps that are being used. Callers can provide their own
taps value or use the default.
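
To make the "cycles through all non-zero patterns" claim concrete, here is a
small user-space sketch of one Galois LFSR step, with the same shape as the
lfsr() in this patch but with the taps always passed explicitly. lfsr_step()
and lfsr_period() are names invented for the sketch; the taps 0x9 (4 bits) and
0x12 (5 bits) are the ones listed in the patch's table.

```c
#include <assert.h>
#include <stdint.h>

/* One Galois LFSR step: shift right, fold in taps if a 1 fell out. */
static uint32_t lfsr_step(uint32_t val, uint32_t taps)
{
	uint32_t bit = val & 1;

	val >>= 1;
	if (bit)
		val ^= taps;
	return val;
}

/*
 * Number of steps until the sequence returns to start. The top bit of
 * taps must be set for the chosen width so that every state lies on a
 * cycle; start must be non-zero because 0 maps to itself.
 */
static uint32_t lfsr_period(uint32_t start, uint32_t taps)
{
	uint32_t val = start, steps = 0;

	do {
		val = lfsr_step(val, taps);
		steps++;
	} while (val != start);
	return steps;
}
```

Starting from 1 with taps 0x9, the sequence is 9, 13, 15, 14, 7, 10, 5, 11,
12, 6, 3, 8, 4, 2, 1: fifteen steps, visiting every non-zero 4-bit value once.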

Signed-off-by: Waiman Long 
---
 include/linux/lfsr.h |   80 ++
 1 files changed, 80 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/lfsr.h

diff --git a/include/linux/lfsr.h b/include/linux/lfsr.h
new file mode 100644
index 000..f570819
--- /dev/null
+++ b/include/linux/lfsr.h
@@ -0,0 +1,80 @@
+#ifndef _LINUX_LFSR_H
+#define _LINUX_LFSR_H
+
+/*
+ * Simple Binary Galois Linear Feedback Shift Register
+ *
+ * http://en.wikipedia.org/wiki/Linear_feedback_shift_register
+ *
+ * This function currently supports only bit widths of 4-30. Callers
+ * that don't pass in a constant bits value can optionally define
+ * LFSR_MIN_BITS and LFSR_MAX_BITS before including the lfsr.h header file
+ * to reduce the size of the jump table in the compiled code, if desired.
+ */
+#ifndef LFSR_MIN_BITS
+#define LFSR_MIN_BITS  4
+#endif
+
+#ifndef LFSR_MAX_BITS
+#define LFSR_MAX_BITS  30
+#endif
+
+static __always_inline u32 lfsr_taps(int bits)
+{
+   BUG_ON((bits < LFSR_MIN_BITS) || (bits > LFSR_MAX_BITS));
+   BUILD_BUG_ON((LFSR_MIN_BITS < 4) || (LFSR_MAX_BITS > 30));
+
+#define _IF_BITS_EQ(x) \
+   if (((x) >= LFSR_MIN_BITS) && ((x) <= LFSR_MAX_BITS) && ((x) == bits))
+
+   /*
+* Feedback terms copied from
+* http://users.ece.cmu.edu/~koopman/lfsr/index.html
+*/
+   _IF_BITS_EQ(4)  return 0x0009;
+   _IF_BITS_EQ(5)  return 0x0012;
+   _IF_BITS_EQ(6)  return 0x0021;
+   _IF_BITS_EQ(7)  return 0x0041;
+   _IF_BITS_EQ(8)  return 0x008E;
+   _IF_BITS_EQ(9)  return 0x0108;
+   _IF_BITS_EQ(10) return 0x0204;
+   _IF_BITS_EQ(11) return 0x0402;
+   _IF_BITS_EQ(12) return 0x0829;
+   _IF_BITS_EQ(13) return 0x100D;
+   _IF_BITS_EQ(14) return 0x2015;
+   _IF_BITS_EQ(15) return 0x4122;
+   _IF_BITS_EQ(16) return 0x8112;
+   _IF_BITS_EQ(17) return 0x102C9;
+   _IF_BITS_EQ(18) return 0x20195;
+   _IF_BITS_EQ(19) return 0x403FE;
+   _IF_BITS_EQ(20) return 0x80637;
+   _IF_BITS_EQ(21) return 0x100478;
+   _IF_BITS_EQ(22) return 0x20069E;
+   _IF_BITS_EQ(23) return 0x4004B2;
+   _IF_BITS_EQ(24) return 0x800B87;
+   _IF_BITS_EQ(25) return 0x10004F3;
+   _IF_BITS_EQ(26) return 0x200072D;
+   _IF_BITS_EQ(27) return 0x40006AE;
+   _IF_BITS_EQ(28) return 0x80009E3;
+   _IF_BITS_EQ(29) return 0x1583;
+   _IF_BITS_EQ(30) return 0x2C92;
+#undef _IF_BITS_EQ
+
+   /* Unreachable */
+   return 0;
+}
+
+/*
+ * Please note that LFSR doesn't work with a start state of 0.
+ */
+static inline u32 lfsr(u32 val, int bits, u32 taps)
+{
+   u32 bit = val & 1;
+
+   val >>= 1;
+   if (bit)
+   val ^= taps ? taps : lfsr_taps(bits);
+   return val;
+}
+
+#endif /* _LINUX_LFSR_H */
-- 
1.7.1



[PATCH v15 12/15] pvqspinlock, x86: Enable PV qspinlock for Xen

2015-04-06 Thread Waiman Long
This patch adds the necessary Xen specific code to allow Xen to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.

Signed-off-by: Waiman Long 
---
 arch/x86/xen/spinlock.c |   63 ---
 kernel/Kconfig.locks|2 +-
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 956374c..728b45b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,6 +17,55 @@
 #include "xen-ops.h"
 #include "debugfs.h"
 
+static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(char *, irq_name);
+static bool xen_pvspin = true;
+
+#ifdef CONFIG_QUEUE_SPINLOCK
+
+#include 
+
+static void xen_qlock_kick(int cpu)
+{
+   xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+}
+
+/*
+ * Halt the current CPU & release it back to the host
+ */
+static void xen_qlock_wait(u8 *byte, u8 val)
+{
+   int irq = __this_cpu_read(lock_kicker_irq);
+
+   /* If kicker interrupts not initialized yet, just spin */
+   if (irq == -1)
+   return;
+
+   /* clear pending */
+   xen_clear_irq_pending(irq);
+
+   /*
+* We check the byte value after clearing pending IRQ to make sure
+* that we won't miss a wakeup event because of the clearing.
+*
+* The sync_clear_bit() call in xen_clear_irq_pending() is atomic.
+* So it is effectively a memory barrier for x86.
+*/
+   if (READ_ONCE(*byte) != val)
+   return;
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
+   /* Block until irq becomes pending (or perhaps a spurious wakeup) */
+   xen_poll_irq(irq);
+}
+
+#else /* CONFIG_QUEUE_SPINLOCK */
+
 enum xen_contention_stat {
TAKEN_SLOW,
TAKEN_SLOW_PICKUP,
@@ -100,12 +149,9 @@ struct xen_lock_waiting {
__ticket_t want;
 };
 
-static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
-static DEFINE_PER_CPU(char *, irq_name);
 static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
 static cpumask_t waiting_cpus;
 
-static bool xen_pvspin = true;
 __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 {
int irq = __this_cpu_read(lock_kicker_irq);
@@ -217,6 +263,7 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
}
}
 }
+#endif /* CONFIG_QUEUE_SPINLOCK */
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
@@ -280,8 +327,16 @@ void __init xen_init_spinlocks(void)
return;
}
printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
+#ifdef CONFIG_QUEUE_SPINLOCK
+   __pv_init_lock_hash();
+   pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath;
+   pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock);
+   pv_lock_ops.wait = xen_qlock_wait;
+   pv_lock_ops.kick = xen_qlock_kick;
+#else
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
+#endif
 }
 
 /*
@@ -310,7 +365,7 @@ static __init int xen_parse_nopvspin(char *arg)
 }
 early_param("xen_nopvspin", xen_parse_nopvspin);
 
-#ifdef CONFIG_XEN_DEBUG_FS
+#if defined(CONFIG_XEN_DEBUG_FS) && !defined(CONFIG_QUEUE_SPINLOCK)
 
 static struct dentry *d_spin_debug;
 
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 537b13e..0b42933 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -240,7 +240,7 @@ config ARCH_USE_QUEUE_SPINLOCK
 
 config QUEUE_SPINLOCK
def_bool y if ARCH_USE_QUEUE_SPINLOCK
-   depends on SMP && (!PARAVIRT_SPINLOCKS || !XEN)
+   depends on SMP
 
 config ARCH_USE_QUEUE_RWLOCK
bool
-- 
1.7.1



[PATCH v15 06/15] qspinlock: Use a simple write to grab the lock

2015-04-06 Thread Waiman Long
Currently, atomic_cmpxchg() is used to get the lock. However, this
is not really necessary if there is more than one task in the queue
and the queue head doesn't need to reset the tail code. For that case,
a simple write to set the lock bit is enough as the queue head will
be the only one eligible to get the lock as long as it checks that
both the lock and pending bits are not set. The current pending bit
waiting code will ensure that the bit will not be set as soon as the
tail code in the lock is set.
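
The idea can be sketched in user space by overlaying byte-sized fields on the
32-bit lock word, in the spirit of the struct __qspinlock union in this patch.
The names qsl_sketch and set_locked_sketch() are hypothetical, the layout
assumes an 8-bit pending field, and the plain volatile store only
approximates the kernel's WRITE_ONCE().

```c
#include <assert.h>
#include <stdint.h>

#define LOCKED_VAL 1u	/* stand-in for _Q_LOCKED_VAL (assumption) */

/* Lock word with byte-addressable fields; layout depends on endianness. */
union qsl_sketch {
	uint32_t val;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	struct {
		uint8_t  locked;
		uint8_t  pending;
		uint16_t tail;
	};
#else
	struct {
		uint16_t tail;
		uint8_t  pending;
		uint8_t  locked;
	};
#endif
};

/*
 * Take the lock with a plain byte store. Only the queue head may do
 * this, and only after seeing both locked and pending clear; the tail
 * bytes are not touched, so no cmpxchg on the whole word is needed.
 */
static void set_locked_sketch(union qsl_sketch *l)
{
	*(volatile uint8_t *)&l->locked = LOCKED_VAL;
}
```

The store leaves any tail code written by later queuers intact, which is why
it is safe in the case where the queue head is not the last entry.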

With that change, there is a slight improvement in the performance
of the queue spinlock in the 5M loop micro-benchmark run on a 4-socket
Westmere-EX machine as shown in the tables below.

[Standalone/Embedded - same node]
  # of tasks    Before patch    After patch     %Change
  ----------    ------------    -----------     -------
       3         2324/2321       2248/2265      -3%/-2%
       4         2890/2896       2819/2831      -2%/-2%
       5         3611/3595       3522/3512      -2%/-2%
       6         4281/4276       4173/4160      -3%/-3%
       7         5018/5001       4875/4861      -3%/-3%
       8         5759/5750       5563/5568      -3%/-3%

[Standalone/Embedded - different nodes]
  # of tasks    Before patch    After patch     %Change
  ----------    ------------    -----------     -------
       3        12242/12237     12087/12093     -1%/-1%
       4        10688/10696     10507/10521     -2%/-2%

It was also found that this change produced a much bigger performance
improvement in the newer IvyBridge-EX chip and was essential to closing
the performance gap between the ticket spinlock and queue spinlock.

The disk workload of the AIM7 benchmark was run on a 4-socket
Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
on a 3.14 based kernel. The results of the test runs were:

AIM7 XFS Disk Test
  kernel          JPM       Real Time   Sys Time    Usr Time
  ------          ---       ---------   --------    --------
  ticketlock    5678233        3.17       96.61       5.81
  qspinlock     5750799        3.13       94.83       5.97

AIM7 EXT4 Disk Test
  kernel          JPM       Real Time   Sys Time    Usr Time
  ------          ---       ---------   --------    --------
  ticketlock    1114551       16.15      509.72       7.11
  qspinlock     2184466        8.24      232.99       6.01

The ext4 filesystem run had a much higher spinlock contention than
the xfs filesystem run.

The "ebizzy -m" test was also run with the following results:

  kernel        records/s   Real Time   Sys Time    Usr Time
  ------        ---------   ---------   --------    --------
  ticketlock      2075        10.00      216.35       3.49
  qspinlock       3023        10.00      198.20       4.80

Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/locking/qspinlock.c |   66 +--
 1 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index bcc99e6..99503ef 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -105,24 +105,37 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
  * By using the whole 2nd least significant byte for the pending bit, we
  * can allow better optimization of the lock acquisition for the pending
  * bit holder.
+ *
+ * This internal structure is also used by the set_locked function which
+ * is not restricted to _Q_PENDING_BITS == 8.
  */
-#if _Q_PENDING_BITS == 8
-
 struct __qspinlock {
union {
atomic_t val;
-   struct {
 #ifdef __LITTLE_ENDIAN
+   struct {
+   u8  locked;
+   u8  pending;
+   };
+   struct {
u16 locked_pending;
u16 tail;
+   };
 #else
+   struct {
u16 tail;
u16 locked_pending;
-#endif
};
+   struct {
+   u8  reserved[2];
+   u8  pending;
+   u8  locked;
+   };
+#endif
};
 };
 
+#if _Q_PENDING_BITS == 8
 /**
  * clear_pending_set_locked - take ownership and clear the pending bit.
  * @lock: Pointer to queue spinlock structure
@@ -195,6 +208,19 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
 #endif /* _Q_PENDING_BITS == 8 */
 
 /**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queue spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+   struct __qspinlock *l = (void *)lock;
+
+   WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
+}
+
+/**
  * queue_spin_lock_slowpath - acquire the 

[PATCH v15 11/15] pvqspinlock, x86: Enable PV qspinlock for KVM

2015-04-06 Thread Waiman Long
This patch adds the necessary KVM specific code to allow KVM to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.

Signed-off-by: Waiman Long 
---
 arch/x86/kernel/kvm.c |   43 +++
 kernel/Kconfig.locks  |2 +-
 2 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e354cc6..4bb42c0 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -584,6 +584,39 @@ static void kvm_kick_cpu(int cpu)
kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
 }
 
+
+#ifdef CONFIG_QUEUE_SPINLOCK
+
+#include 
+
+static void kvm_wait(u8 *ptr, u8 val)
+{
+   unsigned long flags;
+
+   if (in_nmi())
+   return;
+
+   local_irq_save(flags);
+
+   if (READ_ONCE(*ptr) != val)
+   goto out;
+
+   /*
+* halt until it's our turn and kicked. Note that we do safe halt
+* for irq enabled case to avoid hang when lock info is overwritten
+* in irq spinlock slowpath and no spurious interrupt occurs to save us.
+*/
+   if (arch_irqs_disabled_flags(flags))
+   halt();
+   else
+   safe_halt();
+
+out:
+   local_irq_restore(flags);
+}
+
+#else /* !CONFIG_QUEUE_SPINLOCK */
+
 enum kvm_contention_stat {
TAKEN_SLOW,
TAKEN_SLOW_PICKUP,
@@ -817,6 +850,8 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
}
 }
 
+#endif /* !CONFIG_QUEUE_SPINLOCK */
+
 /*
  * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
  */
@@ -828,8 +863,16 @@ void __init kvm_spinlock_init(void)
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
 
+#ifdef CONFIG_QUEUE_SPINLOCK
+   __pv_init_lock_hash();
+   pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath;
+   pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock);
+   pv_lock_ops.wait = kvm_wait;
+   pv_lock_ops.kick = kvm_kick_cpu;
+#else /* !CONFIG_QUEUE_SPINLOCK */
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
pv_lock_ops.unlock_kick = kvm_unlock_kick;
+#endif
 }
 
 static __init int kvm_spinlock_init_jump(void)
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index c6a8f7c..537b13e 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -240,7 +240,7 @@ config ARCH_USE_QUEUE_SPINLOCK
 
 config QUEUE_SPINLOCK
def_bool y if ARCH_USE_QUEUE_SPINLOCK
-   depends on SMP && !PARAVIRT_SPINLOCKS
+   depends on SMP && (!PARAVIRT_SPINLOCKS || !XEN)
 
 config ARCH_USE_QUEUE_RWLOCK
bool
-- 
1.7.1



[PATCH v15 03/15] qspinlock: Add pending bit

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) 

Because the qspinlock needs to touch a second cacheline (the per-cpu
mcs_nodes[]), add a pending bit and allow a single in-word spinner
before we punt to the second cacheline.

It is possible to observe the pending bit without the locked bit when
the last owner has just released but the pending owner has not yet
taken ownership.

In this case we would normally queue -- because the pending bit is
already taken. However, in this case the pending bit is guaranteed
to be released 'soon', therefore wait for it and avoid queueing.
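
The lock word layout that the qspinlock_types.h hunk of this patch documents
(locked byte in bits 0-7, pending in bit 8, tail index in bits 9-10, tail cpu
+1 in bits 11-31) can be sketched as follows. The macro and function names
are shortened stand-ins invented for this sketch, and encode_tail_sketch() is
an assumption about how the two tail fields are packed, based only on that
comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCKED_OFFSET	0
#define LOCKED_BITS	8
#define PENDING_OFFSET	(LOCKED_OFFSET + LOCKED_BITS)
#define PENDING_BITS	1
#define TAIL_IDX_OFFSET	(PENDING_OFFSET + PENDING_BITS)
#define TAIL_IDX_BITS	2
#define TAIL_CPU_OFFSET	(TAIL_IDX_OFFSET + TAIL_IDX_BITS)
#define TAIL_CPU_BITS	(32 - TAIL_CPU_OFFSET)

/* Mask of a field that is 'bits' wide and starts at bit 'off'. */
#define FIELD_MASK(off, bits) \
	((uint32_t)(((UINT64_C(1) << (bits)) - 1) << (off)))

#define LOCKED_MASK	FIELD_MASK(LOCKED_OFFSET, LOCKED_BITS)
#define PENDING_MASK	FIELD_MASK(PENDING_OFFSET, PENDING_BITS)
#define TAIL_IDX_MASK	FIELD_MASK(TAIL_IDX_OFFSET, TAIL_IDX_BITS)
#define TAIL_CPU_MASK	FIELD_MASK(TAIL_CPU_OFFSET, TAIL_CPU_BITS)

#define LOCKED_VAL	(1u << LOCKED_OFFSET)
#define PENDING_VAL	(1u << PENDING_OFFSET)

/* Pack a cpu number (stored +1, so 0 means "no tail") and node index. */
static uint32_t encode_tail_sketch(unsigned int cpu, unsigned int idx)
{
	return ((uint32_t)(cpu + 1) << TAIL_CPU_OFFSET) |
	       ((uint32_t)idx << TAIL_IDX_OFFSET);
}

/*
 * The (0,1,0) state: pending set, not locked, no queue. The previous
 * owner has released and the pending spinner is about to take over,
 * so a newcomer waits briefly instead of queueing.
 */
static bool pending_handover_in_progress(uint32_t val)
{
	return val == PENDING_VAL;
}
```

With these widths the four masks are 0x000000ff, 0x00000100, 0x00000600 and
0xfffff800, which together cover the whole 32-bit word without overlap.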

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Waiman Long 
---
 include/asm-generic/qspinlock_types.h |   12 +++-
 kernel/locking/qspinlock.c|  119 +++--
 2 files changed, 107 insertions(+), 24 deletions(-)

diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h
index c9348d8..9c3f5c2 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -36,8 +36,9 @@ typedef struct qspinlock {
  * Bitfields in the atomic value:
  *
  *  0- 7: locked byte
- *  8- 9: tail index
- * 10-31: tail cpu (+1)
+ *     8: pending
+ *  9-10: tail index
+ * 11-31: tail cpu (+1)
  */
 #define _Q_SET_MASK(type)   (((1U << _Q_ ## type ## _BITS) - 1)\
  << _Q_ ## type ## _OFFSET)
@@ -45,7 +46,11 @@ typedef struct qspinlock {
 #define _Q_LOCKED_BITS 8
 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED)
 
-#define _Q_TAIL_IDX_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_OFFSET  (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_BITS    1
+#define _Q_PENDING_MASK    _Q_SET_MASK(PENDING)
+
+#define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS)
 #define _Q_TAIL_IDX_BITS   2
 #define _Q_TAIL_IDX_MASK   _Q_SET_MASK(TAIL_IDX)
 
@@ -54,5 +59,6 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
 #define _Q_LOCKED_VAL  (1U << _Q_LOCKED_OFFSET)
+#define _Q_PENDING_VAL (1U << _Q_PENDING_OFFSET)
 
 #endif /* __ASM_GENERIC_QSPINLOCK_TYPES_H */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 3456819..0351f78 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -94,24 +94,28 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
return per_cpu_ptr(&mcs_nodes[idx], cpu);
 }
 
+#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
+
 /**
  * queue_spin_lock_slowpath - acquire the queue spinlock
  * @lock: Pointer to queue spinlock structure
  * @val: Current value of the queue spinlock 32-bit word
  *
- * (queue tail, lock value)
- *
- *              fast      :    slow                                  :    unlock
- *                        :                                          :
- * uncontended  (0,0)   --:--> (0,1) --------------------------------:--> (*,0)
- *                        :       | ^--------.                    /  :
- *                        :       v           \                   |  :
- * uncontended            :    (n,x) --+--> (n,0)                 |  :
- *   queue                :       | ^--'                          |  :
- *                        :       v                               |  :
- * contended              :    (*,x) --+--> (*,0) -----> (*,1) ---'  :
- *   queue                :         ^--'                             :
+ * (queue tail, pending bit, lock value)
  *
+ *              fast     :    slow                                  :    unlock
+ *                       :                                          :
+ * uncontended  (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0)
+ *                       :       | ^--------.------.             /  :
+ *                       :       v           \      \            |  :
+ * pending               :    (0,1,1) +--> (0,1,0)   \           |  :
+ *                       :       | ^--'              |           |  :
+ *                       :       v                   |           |  :
+ * uncontended           :    (n,x,y) +--> (n,0,0) --'           |  :
+ *   queue               :       | ^--'                          |  :
+ *                       :       v                               |  :
+ * contended             :    (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
+ *   queue               :         ^--'                             :
  */
 void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 {
@@ -121,6 +125,75 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 
BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+   /*
+* wait for in-progress pending->locked hand-overs
+*
+* 0,1,0 -> 0,0,1
+*/
+   if (val == _Q_PENDING_VAL) {
+   while ((val = atomic_read(&lock->val)) == _Q_PENDING_VAL)
+   cpu_relax();
+   }
+
+ 

[PATCH v15 14/15] pvqspinlock: Improve slowpath performance by avoiding cmpxchg

2015-04-06 Thread Waiman Long
In the pv_scan_next() function, the slow cmpxchg atomic operation is
performed even if the other CPU is not even close to being halted. This
extra cmpxchg can harm slowpath performance.

This patch introduces the new mayhalt flag to indicate if the other
spinning CPU is close to being halted or not. The current threshold
for x86 is 2k cpu_relax() calls. If this flag is not set, the other
spinning CPU will have at least 2k more cpu_relax() calls before
it can enter the halt state. This should give enough time for the
setting of the locked flag in struct mcs_spinlock to propagate to
that CPU without using an atomic op.
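
The arithmetic and control flow can be sketched in a single-threaded,
user-space model. It assumes SPIN_THRESHOLD is (1 << 15), which is what makes
SPIN_THRESHOLD >> 4 come out at the 2k figure quoted above; the _SKETCH
macros, struct waiter and both functions are hypothetical names, and the
model deliberately ignores the memory-ordering questions the real code has to
answer.

```c
#include <assert.h>
#include <stdbool.h>

#define SPIN_THRESHOLD_SKETCH		(1 << 15)	/* assumed x86 value */
#define MAYHALT_THRESHOLD_SKETCH	(SPIN_THRESHOLD_SKETCH >> 4)

struct waiter {
	bool locked;		/* set by the previous holder on hand-off */
	bool mayhalt;		/* "I am close to halting" hint */
	int  spins_after_flag;	/* bookkeeping for this sketch only */
};

/*
 * Waiter side: spin for SPIN_THRESHOLD iterations, raising mayhalt
 * once only MAYHALT_THRESHOLD iterations remain. Returns true if the
 * lock was handed over while spinning, false if the waiter would now
 * go on to halt.
 */
static bool spin_before_halt(struct waiter *w)
{
	for (int loop = SPIN_THRESHOLD_SKETCH; loop; loop--) {
		if (w->locked)
			return true;
		if (loop == MAYHALT_THRESHOLD_SKETCH)
			w->mayhalt = true;
		if (w->mayhalt)
			w->spins_after_flag++;
	}
	return false;
}

/* Holder side: the atomic state change is needed only if mayhalt is set. */
static bool needs_atomic_handoff(const struct waiter *w)
{
	return w->mayhalt;
}
```

In this model a waiter that never sees locked set spins 2048 more times after
raising the flag, which is the window the patch relies on for the plain store
to next->locked to become visible.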

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock_paravirt.h |   28 +---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index a210061..a9fe10d 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -16,7 +16,8 @@
  * native_queue_spin_unlock().
  */
 
-#define _Q_SLOW_VAL   (3U << _Q_LOCKED_OFFSET)
+#define _Q_SLOW_VAL            (3U << _Q_LOCKED_OFFSET)
+#define MAYHALT_THRESHOLD  (SPIN_THRESHOLD >> 4)
 
 /*
  * The vcpu_hashed is a special state that is set by the new lock holder on
@@ -36,6 +37,7 @@ struct pv_node {
 
int cpu;
u8  state;
+   u8  mayhalt;
 };
 
 /*
@@ -187,6 +189,7 @@ static void pv_init_node(struct mcs_spinlock *node)
 
pn->cpu = smp_processor_id();
pn->state = vcpu_running;
+   pn->mayhalt = false;
 }
 
 /*
@@ -203,17 +206,27 @@ static void pv_wait_node(struct mcs_spinlock *node)
for (loop = SPIN_THRESHOLD; loop; loop--) {
if (READ_ONCE(node->locked))
return;
+   if (loop == MAYHALT_THRESHOLD)
+   xchg(&pn->mayhalt, true);
cpu_relax();
}
 
/*
-* Order pn->state vs pn->locked thusly:
+* Order pn->state/pn->mayhalt vs pn->locked thusly:
 *
-* [S] pn->state = vcpu_halted    [S] next->locked = 1
+* [S] pn->mayhalt = 1            [S] next->locked = 1
+*     MB, delay                      barrier()
+* [S] pn->state = vcpu_halted    [L] pn->mayhalt
 *     MB                             MB
 * [L] pn->locked               [RmW] pn->state = vcpu_hashed
 *
 * Matches the cmpxchg() from pv_scan_next().
+*
+* As the new lock holder may quit (when pn->mayhalt is not
+* set) without memory barrier, a sufficiently long delay is
+* inserted between the setting of pn->mayhalt and pn->state
+* to ensure that there is enough time for the new pn->locked
+* value to be propagated here to be checked below.
 */
(void)xchg(&pn->state, vcpu_halted);
 
@@ -226,6 +239,7 @@ static void pv_wait_node(struct mcs_spinlock *node)
 * needs to move on to pv_wait_head().
 */
(void)cmpxchg(&pn->state, vcpu_halted, vcpu_running);
+   pn->mayhalt = false;
}
 
/*
@@ -246,6 +260,14 @@ static void pv_scan_next(struct qspinlock *lock, struct mcs_spinlock *node)
struct __qspinlock *l = (void *)lock;
 
/*
+* If mayhalt is not set, there is enough time for the just set value
+* in pn->locked to be propagated to the other CPU before it is time
+* to halt.
+*/
+   if (!READ_ONCE(pn->mayhalt))
+   return;
+
+   /*
 * Transition CPU state: halted => hashed
 * Quit if the transition failed.
 */
-- 
1.7.1



[PATCH v15 10/15] pvqspinlock: Implement the paravirt qspinlock for x86

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) 

We use the regular paravirt call patching to switch between:

  native_queue_spin_lock_slowpath()    __pv_queue_spin_lock_slowpath()
  native_queue_spin_unlock()           __pv_queue_spin_unlock()

We use a callee saved call for the unlock function which reduces the
i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions
again.

We further optimize the unlock path by patching the direct call with a
"movb $0,%arg1" if we are indeed using the native unlock code. This
makes the unlock code almost as fast as the !PARAVIRT case.

This significantly lowers the overhead of having
CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Waiman Long 
---
 arch/x86/Kconfig  |2 +-
 arch/x86/include/asm/paravirt.h   |   28 +++-
 arch/x86/include/asm/paravirt_types.h |   10 ++
 arch/x86/include/asm/qspinlock.h  |   25 -
 arch/x86/kernel/paravirt-spinlocks.c  |   24 +++-
 arch/x86/kernel/paravirt_patch_32.c   |   22 ++
 arch/x86/kernel/paravirt_patch_64.c   |   22 ++
 7 files changed, 121 insertions(+), 12 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 49fecb1..a0946e7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -661,7 +661,7 @@ config PARAVIRT_DEBUG
 config PARAVIRT_SPINLOCKS
bool "Paravirtualization layer for spinlocks"
depends on PARAVIRT && SMP
-   select UNINLINE_SPIN_UNLOCK
+   select UNINLINE_SPIN_UNLOCK if !QUEUE_SPINLOCK
---help---
  Paravirtualized spinlocks allow a pvops backend to replace the
  spinlock implementation with something virtualization-friendly
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 965c47d..dd40269 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,6 +712,30 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
+#ifdef CONFIG_QUEUE_SPINLOCK
+
+static __always_inline void pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+{
+   PVOP_VCALL2(pv_lock_ops.queue_spin_lock_slowpath, lock, val);
+}
+
+static __always_inline void pv_queue_spin_unlock(struct qspinlock *lock)
+{
+   PVOP_VCALLEE1(pv_lock_ops.queue_spin_unlock, lock);
+}
+
+static __always_inline void pv_wait(u8 *ptr, u8 val)
+{
+   PVOP_VCALL2(pv_lock_ops.wait, ptr, val);
+}
+
+static __always_inline void pv_kick(int cpu)
+{
+   PVOP_VCALL1(pv_lock_ops.kick, cpu);
+}
+
+#else /* !CONFIG_QUEUE_SPINLOCK */
+
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
@@ -724,7 +748,9 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
-#endif
+#endif /* CONFIG_QUEUE_SPINLOCK */
+
+#endif /* SMP && PARAVIRT_SPINLOCKS */
 
 #ifdef CONFIG_X86_32
 #define PV_SAVE_REGS "pushl %ecx; pushl %edx;"
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 7549b8b..f6acaea 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -333,9 +333,19 @@ struct arch_spinlock;
 typedef u16 __ticket_t;
 #endif
 
+struct qspinlock;
+
 struct pv_lock_ops {
+#ifdef CONFIG_QUEUE_SPINLOCK
+   void (*queue_spin_lock_slowpath)(struct qspinlock *lock, u32 val);
+   struct paravirt_callee_save queue_spin_unlock;
+
+   void (*wait)(u8 *ptr, u8 val);
+   void (*kick)(int cpu);
+#else /* !CONFIG_QUEUE_SPINLOCK */
struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
+#endif /* !CONFIG_QUEUE_SPINLOCK */
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 64c925e..c8290db 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 #define queue_spin_unlock queue_spin_unlock
 /**
@@ -11,11 +12,33 @@
  *
  * A smp_store_release() on the least-significant byte.
  */
-static inline void queue_spin_unlock(struct qspinlock *lock)
+static inline void native_queue_spin_unlock(struct qspinlock *lock)
 {
smp_store_release((u8 *)lock, 0);
 }
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+extern void native_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_init_lock_hash(void);
+extern void __pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __raw_callee_save___pv_queue_spin_unlock(struct qspinlock *lock);
+
+static inline void queue_spin_lock_slowpath(struct qspinlock 

[PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support

2015-04-06 Thread Waiman Long
v14->v15:
 - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV
   qspinlock code by dynamically allocating the hash table as well
   as some other performance optimization.
 - Simplified the Xen PV qspinlock code as suggested by David Vrabel.
 - Add benchmarking data for 3.19 kernel to compare the performance
   of a spinlock heavy test with and without the qspinlock patch
   under different cpufreq drivers and scaling governors.

v13->v14:
 - Patches 1 & 2: Add queue_spin_unlock_wait() to accommodate commit
   78bff1c86 from Oleg Nesterov.
 - Fix the system hang problem when using PV qspinlock in an
   over-committed guest due to a race condition in the
   pv_set_head_in_tail() function.
 - Increase the MAYHALT_THRESHOLD from 10 to 1024.
 - Change kick_cpu into a regular function pointer instead of a
   callee-saved function.
 - Change lock statistics code to use separate bits for different
   statistics.

v12->v13:
 - Change patch 9 to generate separate versions of the
   queue_spin_lock_slowpath functions for bare metal and PV guest. This
   reduces the performance impact of the PV code on bare metal systems.

v11->v12:
 - Based on PeterZ's version of the qspinlock patch
   (https://lkml.org/lkml/2014/6/15/63).
 - Incorporated many of the review comments from Konrad Wilk and
   Paolo Bonzini.
 - The pvqspinlock code is largely from my previous version with
   PeterZ's way of going from queue tail to head and his idea of
   using callee saved calls to KVM and XEN codes.

v10->v11:
  - Use a simple test-and-set unfair lock to simplify the code,
but performance may suffer a bit for large guest with many CPUs.
  - Take out Raghavendra KT's test results as the unfair lock changes
may render some of his results invalid.
  - Add PV support without increasing the size of the core queue node
structure.
  - Other minor changes to address some of the feedback comments.

v9->v10:
  - Make some minor changes to qspinlock.c to accommodate review feedback.
  - Change author to PeterZ for 2 of the patches.
  - Include Raghavendra KT's test results in patch 18.

v8->v9:
  - Integrate PeterZ's version of the queue spinlock patch with some
modification:
http://lkml.kernel.org/r/20140310154236.038181...@infradead.org
  - Break the more complex patches into smaller ones to ease review effort.
  - Fix a race condition in the PV qspinlock code.

v7->v8:
  - Remove one unneeded atomic operation from the slowpath, thus
improving performance.
  - Simplify some of the code and add more comments.
  - Test for X86_FEATURE_HYPERVISOR CPU feature bit to enable/disable
unfair lock.
  - Reduce unfair lock slowpath lock stealing frequency depending
on its distance from the queue head.
  - Add performance data for IvyBridge-EX CPU.

v6->v7:
  - Remove an atomic operation from the 2-task contending code
  - Shorten the names of some macros
  - Make the queue waiter attempt to steal the lock when the unfair lock is
    enabled.
  - Remove lock holder kick from the PV code and fix a race condition
  - Run the unfair lock & PV code on overcommitted KVM guests to collect
performance data.

v5->v6:
 - Change the optimized 2-task contending code to make it fairer at the
   expense of a bit of performance.
 - Add a patch to support unfair queue spinlock for Xen.
 - Modify the PV qspinlock code to follow what was done in the PV
   ticketlock.
 - Add performance data for the unfair lock as well as the PV
   support code.

v4->v5:
 - Move the optimized 2-task contending code to the generic file to
   enable more architectures to use it without code duplication.
 - Address some of the style-related comments by PeterZ.
 - Allow the use of unfair queue spinlock in a real para-virtualized
   execution environment.
 - Add para-virtualization support to the qspinlock code by ensuring
   that the lock holder and queue head stay alive as much as possible.

v3->v4:
 - Remove debugging code and fix a configuration error
 - Simplify the qspinlock structure and streamline the code to make it
   perform a bit better
 - Add an x86 version of asm/qspinlock.h for holding x86 specific
   optimization.
 - Add an optimized x86 code path for 2 contending tasks to improve
   low contention performance.

v2->v3:
 - Simplify the code by using numerous mode only without an unfair option.
 - Use the latest smp_load_acquire()/smp_store_release() barriers.
 - Move the queue spinlock code to kernel/locking.
 - Make the use of queue spinlock the default for x86-64 without user
   configuration.
 - Additional performance tuning.

v1->v2:
 - Add some more comments to document what the code does.
 - Add a numerous CPU mode to support >= 16K CPUs
 - Add a configuration option to allow lock stealing which can further
   improve performance in many cases.
 - Enable wakeup of queue head CPU at unlock time for non-numerous
   CPU mode.

This patch set has 3 different sections:
 1) Patches 1-6: Introduces a queue-based spinlock 

Re: [PATCH 03/13] thermal: remove useless call to thermal_zone_device_set_polling

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:50PM +0100, Sascha Hauer wrote:
> When the thermal zone has no get_temp callback then 
> thermal_zone_device_register()
> calls thermal_zone_device_set_polling() with a polling delay of 0. This
> only cancels the poll_queue. Since the poll_queue hasn't been scheduled this
> is a no-op. Remove it.
> 
> Signed-off-by: Sascha Hauer 

This seems reasonable to me:

Acked-by: Eduardo Valentin 

> ---
>  drivers/thermal/thermal_core.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index c735ac4c..dcea909 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1571,9 +1571,6 @@ struct thermal_zone_device 
> *thermal_zone_device_register(const char *type,
>  
>   INIT_DELAYED_WORK(&(tz->poll_queue), thermal_zone_device_check);
>  
> - if (!tz->ops->get_temp)
> - thermal_zone_device_set_polling(tz, 0);
> -
>   thermal_zone_device_update(tz);
>  
>   return tz;
> -- 
> 2.1.4
> 


signature.asc
Description: Digital signature


Re: [PATCH v9 19/30] powerpc/pci: Use pci_scan_host_bridge() for simplicity

2015-04-06 Thread Yijing Wang
On 2015/4/7 7:35, Daniel Axtens wrote:
> I've been looking at this patch series for a while now, and I now
> believe it's ready on the PowerPC side.
> 
> I was originally concerned that it would break odd corner cases,
> particularly where similar code appears (namely kernel/pci_hotplug.c and
> kernel/pci_of_scan.c). However, upon further examination, talking with
> Yijing, and some testing, I'm now convinced that it is indeed restricted
> to the generic code, and doesn't change behaviour.
> 
> This is both a plus and a minus: because it's currently restricted to
> generic code, I'm confident it works, but the down side is that it
> doesn't yet simplify our arch-specific complexity. We'll need to do some
> more work on our side to reap the full benefits.
> 
> I tested this entire series on a PowerNV machine, including doing EEH
> injection to trigger PCI hotplug:
> Tested-by: Daniel Axtens 
> For completeness, it would be good to test it on Cell, as they are the
> only remaining user of pci_of_scan.c
> 
> In conclusion, this patch is
> Reviewed-by: Daniel Axtens 
> 


Thanks very much for your testing and review.

Thanks!
Yijing.

> 
> 
> On Fri, 2015-04-03 at 17:25 +0800, Yijing Wang wrote:
>> Now we could use pci_scan_host_bridge() to scan
>> pci buses, provide powerpc specific pci_host_bridge_ops.
>>
>> Signed-off-by: Yijing Wang 
>> CC: Benjamin Herrenschmidt 
>> CC: linuxppc-...@lists.ozlabs.org
>> ---
>>  arch/powerpc/kernel/pci-common.c |   62 
>> +++--
>>  1 files changed, 38 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/pci-common.c 
>> b/arch/powerpc/kernel/pci-common.c
>> index 2c58200..50b32f6 100644
>> --- a/arch/powerpc/kernel/pci-common.c
>> +++ b/arch/powerpc/kernel/pci-common.c
>> @@ -773,6 +773,29 @@ void pcibios_set_root_bus_speed(struct pci_host_bridge 
>> *bridge)
>>  return ppc_md.pcibios_set_root_bus_speed(bridge);
>>  }
>>  
>> +static int pci_host_scan_bus(struct pci_host_bridge *host)
>> +{
>> +int mode = PCI_PROBE_NORMAL;
>> +struct pci_bus *bus = host->bus;
>> +struct pci_controller *hose = dev_get_drvdata(>dev);
>> +
>> +/* Get probe mode and perform scan */
>> +if (hose->dn && ppc_md.pci_probe_mode)
>> +mode = ppc_md.pci_probe_mode(bus);
>> +
>> +pr_debug("probe mode: %d\n", mode);
>> +if (mode == PCI_PROBE_DEVTREE)
>> +of_scan_bus(hose->dn, bus);
>> +
>> +if (mode == PCI_PROBE_NORMAL) {
>> +pci_bus_update_busn_res_end(bus, 255);
>> +hose->last_busno = pci_scan_child_bus(bus);
>> +pci_bus_update_busn_res_end(bus, hose->last_busno);
>> +}
>> +
>> +return pci_bus_child_max_busnr(bus);
>> +}
>> +
>>  /* This header fixup will do the resource fixup for all devices as they are
>>   * probed, but not for bridge ranges
>>   */
>> @@ -1585,6 +1608,11 @@ struct device_node *pcibios_get_phb_of_node(struct 
>> pci_bus *bus)
>>  return of_node_get(hose->dn);
>>  }
>>  
>> +static struct pci_host_bridge_ops pci_host_ops = {
>> +.set_root_bus_speed = pcibios_set_root_bus_speed,
>> +.scan_bus = pci_host_scan_bus,
>> +};
>> +
>>  /**
>>   * pci_scan_phb - Given a pci_controller, setup and scan the PCI bus
>>   * @hose: Pointer to the PCI host controller instance structure
>> @@ -1592,9 +1620,8 @@ struct device_node *pcibios_get_phb_of_node(struct 
>> pci_bus *bus)
>>  void pcibios_scan_phb(struct pci_controller *hose)
>>  {
>>  LIST_HEAD(resources);
>> -struct pci_bus *bus;
>> +struct pci_host_bridge *host;
>>  struct device_node *node = hose->dn;
>> -int mode;
>>  
>>  pr_debug("PCI: Scanning PHB %s\n", of_node_full_name(node));
>>  
>> @@ -1609,30 +1636,17 @@ void pcibios_scan_phb(struct pci_controller *hose)
>>  hose->busn.flags = IORESOURCE_BUS;
>>  pci_add_resource(&resources, &hose->busn);
>>  
>> +pci_host_ops.pci_ops = hose->ops;
>>  /* Create an empty bus for the toplevel */
>> -bus = pci_create_root_bus(hose->parent, hose->global_number,
>> -hose->first_busno, hose->ops, hose, &resources);
>> -if (bus == NULL) {
>> -pr_err("Failed to create bus for PCI domain %04x\n",
>> -hose->global_number);
>> +host = pci_scan_host_bridge(hose->parent, hose->global_number,
>> +hose->first_busno, hose, &resources, &pci_host_ops);
>> +if (host == NULL) {
>> +pr_err("Failed to create host bridge for pci%04x:%02x\n",
>> +hose->global_number, hose->first_busno);
>>  pci_free_resource_list(&resources);
>>  return;
>>  }
>> -hose->bus = bus;
>> -
>> -/* Get probe mode and perform scan */
>> -mode = PCI_PROBE_NORMAL;
>> -if (node && ppc_md.pci_probe_mode)
>> -mode = ppc_md.pci_probe_mode(bus);
>> -pr_debug("probe mode: %d\n", mode);
>> -if (mode == PCI_PROBE_DEVTREE)
>> -of_scan_bus(node, bus);
>> -
>> -if (mode == PCI_PROBE_NORMAL) {

Re: [PATCH 1/6] block: export blkdev_reread_part()

2015-04-06 Thread Ming Lei
On Mon, Apr 6, 2015 at 9:42 PM, Jarod Wilson  wrote:
> On Mon, Apr 06, 2015 at 12:40:12AM +0800, Ming Lei wrote:
>> On Mon, Apr 6, 2015 at 12:12 AM, Christoph Hellwig  
>> wrote:
>> >> +/*
>> >> + * This is exported as API for block driver, can be called
>> >> + * with requiring bd_mutex or not.
>> >> + */
>> >> +int __blkdev_reread_part(struct block_device *bdev, bool lock)
>> >>  {
>> >>   struct gendisk *disk = bdev->bd_disk;
>> >>   int res;
>> >> @@ -159,12 +163,14 @@ static int blkdev_reread_part(struct block_device 
>> >> *bdev)
>> >>   return -EINVAL;
>> >>   if (!capable(CAP_SYS_ADMIN))
>> >>   return -EACCES;
>> >> - if (!mutex_trylock(&bdev->bd_mutex))
>> >> + if (lock && !mutex_trylock(&bdev->bd_mutex))
>> >>   return -EBUSY;
>> >
>> > Please don't add functions that do conditional locking, instead move
>> > all the code into blkdev_reread_part_nolock, and then wrap it:
>> >
>> > int blkdev_reread_part(struct block_device *bdev)
>> > {
>> >         if (!mutex_trylock(&bdev->bd_mutex))
>> >                 return -EBUSY;
>> >         blkdev_reread_part_nolock(bdev);
>> >         mutex_unlock(&bdev->bd_mutex);
>> > }
>>
>> Yes, it is cleaner, but it adds the cost of acquiring the lock in the
>> failure cases, especially when we replace trylock with mutex_lock().
>
> I was working on a version of this myself over the past few days, I
> actually removed blkdev_reread_part() entirely, renamed
> fs/partition-generic.c::reread_partitions() to __reread_partitions(), then
> moved the locking from blkdev_reread_part() into a new reread_partitions()
> that wrapped around __reread_partitions(). Same difference, I guess.
>
>> > Please also add a lockdep_assert_held to blkdev_reread_part_nolock to
>> > ensure callers actually do hold the lock.
>>
>> Good point!
>
> Looks like fs/block_dev.c::__blkdev_get() is the only thing that would be
> calling the _nolock variant of whichever route, as it handles bd_mutex
> acquisition within __blkdev_get().

I guess you forgot __blkdev_put() :-)

>
> As an aside, there's a piece of that function that could be worth
> duplicating over into loop.c as well:
>
> if (bdev->bd_invalidated) {
> if (!ret)
> rescan_partitions(bdev);
> else if (ret == -ENOMEDIUM)
> invalidate_partitions(disk, bdev);
>
> Might this possibly be put to use to help with the problem commit
> 8761a3dc1f07b163414e2215a2cadbb4cfe2a107 was trying to solve?

I wonder whether the problem claimed in this commit exists in reality;
at least fdisk needs to reread the partition table before adding a partition.

- LO_FLAGS_PARTSCAN is set for both 'losetup -P' and max_parts
- if max_parts isn't set, GENHD_FL_NO_PART_SCAN is set, so user
can't reread partition successfully because of disk_part_scan_enabled().

If the problem really exists, it can be fixed by exporting
rescan_partitions, or by the approach in commit 8761a3dc
without acquiring bd_mutex in release().

Thanks,
Ming Lei
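
For reference, the split Christoph suggests above (an unlocked worker plus a thin locking wrapper) can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct fake_bdev, fake_trylock(), fake_unlock(), reread_part_nolock() and reread_part() are hypothetical stand-ins, an atomic_bool plays the role of bd_mutex, and a plain assert() stands in for lockdep_assert_held(). It is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct block_device and its bd_mutex. */
struct fake_bdev {
	atomic_bool locked;	/* toy try-lock, not a real kernel mutex */
	int rescans;		/* counts how often partitions were re-read */
};

/* Try to take the toy lock; returns true on success. */
static bool fake_trylock(struct fake_bdev *bdev)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&bdev->locked, &expected, true);
}

static void fake_unlock(struct fake_bdev *bdev)
{
	atomic_store(&bdev->locked, false);
}

/*
 * Unlocked worker: the caller must already hold the lock.
 * The assert() plays the role of lockdep_assert_held().
 */
static int reread_part_nolock(struct fake_bdev *bdev)
{
	assert(atomic_load(&bdev->locked));
	bdev->rescans++;
	return 0;
}

/* Locking wrapper: take the lock or fail with -EBUSY, then delegate. */
static int reread_part(struct fake_bdev *bdev)
{
	int ret;

	if (!fake_trylock(bdev))
		return -EBUSY;
	ret = reread_part_nolock(bdev);
	fake_unlock(bdev);
	return ret;
}
```

A caller that already holds the lock (as __blkdev_get() and __blkdev_put() would in the discussion above) calls the _nolock variant directly; everyone else goes through the wrapper and sees -EBUSY on contention.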

>
> --
> Jarod Wilson
> ja...@redhat.com
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 13/13] thermal: of: implement .set_trips for device tree thermal zones

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:54:00PM +0100, Sascha Hauer wrote:
> Signed-off-by: Sascha Hauer 
> ---
>  drivers/thermal/of-thermal.c | 12 
>  include/linux/thermal.h  |  1 +
>  2 files changed, 13 insertions(+)

Can you please include at least one user of this callback in your patch
series?

> 
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index 9b63193..a3de5de 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -97,6 +97,17 @@ static int of_thermal_get_temp(struct thermal_zone_device 
> *tz,
>   return data->ops->get_temp(data->sensor_data, temp);
>  }
>  
> +static int of_thermal_set_trips(struct thermal_zone_device *tz,
> +unsigned long low, unsigned long high)
> +{
> + struct __thermal_zone *data = tz->devdata;
> +
> + if (!data->ops || !data->ops->set_trips)
> + return -ENOSYS;
> +
> + return data->ops->set_trips(data->sensor_data, low, high);
> +}
> +
>  /**
>   * of_thermal_get_ntrips - function to export number of available trip
>   *  points.
> @@ -367,6 +378,7 @@ static int of_thermal_get_crit_temp(struct 
> thermal_zone_device *tz,
>  
>  static const struct thermal_zone_device_ops of_thermal_ops = {
>   .get_temp = of_thermal_get_temp,
> + .set_trips = of_thermal_set_trips,
>   .get_trend = of_thermal_get_trend,
>   .set_emul_temp = of_thermal_set_emul_temp,
>  
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index b870702..84a5b5d 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -276,6 +276,7 @@ struct thermal_genl_event {
>   */
>  struct thermal_zone_of_device_ops {
>   int (*get_temp)(void *, unsigned long *);
> + int (*set_trips)(void *, unsigned long, unsigned long);

Could you please keep the kernel doc entry up to date? I know we do not
have entries for all structs, but I am working on improving this.

>   int (*get_trend)(void *, int trend, enum thermal_trend *);
>   int (*set_emul_temp)(void *, unsigned long);
>  };
> -- 
> 2.1.4
> 
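
As a rough illustration of the delegation pattern in of_thermal_set_trips() and of what a user of the callback might look like, here is a userspace sketch. Everything in it is assumed for the example: struct sensor_ops_sketch, struct zone_sketch, zone_set_trips(), struct demo_sensor and demo_set_trips() are hypothetical names, not part of the thermal framework.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical sensor ops table with an optional set_trips callback. */
struct sensor_ops_sketch {
	int (*set_trips)(void *sensor_data, unsigned long low,
			 unsigned long high);
};

/* Hypothetical zone: holds the ops table and the sensor's private data. */
struct zone_sketch {
	const struct sensor_ops_sketch *ops;
	void *sensor_data;
};

/* Forward to the sensor if it implements set_trips, else -ENOSYS. */
static int zone_set_trips(struct zone_sketch *z, unsigned long low,
			  unsigned long high)
{
	if (!z->ops || !z->ops->set_trips)
		return -ENOSYS;

	return z->ops->set_trips(z->sensor_data, low, high);
}

/* Hypothetical sensor driver that just records the programmed window. */
struct demo_sensor {
	unsigned long low;
	unsigned long high;
};

static int demo_set_trips(void *data, unsigned long low, unsigned long high)
{
	struct demo_sensor *s = data;

	s->low = low;
	s->high = high;
	return 0;
}
```

A real driver would program its threshold registers in the callback instead of storing the values in a struct.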




Re: [PATCH 12/13] thermal: Add support for hardware-tracked trip points

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:59PM +0100, Sascha Hauer wrote:
> This adds support for hardware-tracked trip points to the device tree
> thermal sensor framework.
> 
> The framework supports an arbitrary number of trip points. Whenever
> the current temperature is updated, the trip points immediately
> below and above the current temperature are found. A .set_trips
> callback is then called with the temperatures. If there is no trip
> point above or below the current temperature, the passed trip
> temperature will be ULONG_MAX or 0 respectively. In this callback,
> the driver should program the hardware such that it is notified
> when either of these trip points are triggered. When a trip point
> is triggered, the driver should call `thermal_zone_device_update'
> for the respective thermal zone. This will cause the trip points
> to be updated again.
> 
> If .set_trips is not implemented, the framework behaves as before.
> 
> This patch is based on an earlier version from Mikko Perttunen
> 
> 
> Signed-off-by: Sascha Hauer 
> ---
>  drivers/thermal/thermal_core.c | 41 +
>  include/linux/thermal.h|  3 +++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index dcdf45e..7138f8f 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -434,6 +434,45 @@ int thermal_zone_get_temp(struct thermal_zone_device 
> *tz, unsigned long *temp)
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_get_temp);
>  
> +static void thermal_zone_set_trips(struct thermal_zone_device *tz)
> +{
> + unsigned long low = 0;
> + unsigned long high = ULONG_MAX;
> + unsigned long trip_temp, hysteresis;
> + unsigned long temp = tz->temperature;
> + int i;
> +
> + if (!tz->ops->set_trips)
> + return;
> +
> + /* No need to change trip points */
> + if (temp > tz->prev_low_trip && temp < tz->prev_high_trip)
> + return;
> +
> + for (i = 0; i < tz->trips; i++) {
> + unsigned long trip_low;
> +
> + tz->ops->get_trip_temp(tz, i, &trip_temp);
> + tz->ops->get_trip_hyst(tz, i, &hysteresis);
> +
> + trip_low = trip_temp - hysteresis;
> +
> + if (trip_low < temp && trip_low > low)
> + low = trip_low;
> +
> + if (trip_temp > temp && trip_temp < high)
> + high = trip_temp;
> + }
> +
> + tz->prev_low_trip = low;
> + tz->prev_high_trip = high;
> +
> + dev_dbg(&tz->device, "new temperature boundaries: %lu < x < %lu\n",
> + low, high);
> +
> + tz->ops->set_trips(tz, low, high);
> +}
> +
>  void thermal_zone_device_update(struct thermal_zone_device *tz)
>  {
>   int count;
> @@ -460,6 +499,8 @@ void thermal_zone_device_update(struct 
> thermal_zone_device *tz)
>   dev_dbg(&tz->device, "last_temperature=%lu, current_temperature=%lu\n",
>   tz->last_temperature, tz->temperature);
>  
> + thermal_zone_set_trips(tz);

Do we need to lock the tz->lock to perform this operation of setting the
hardware trip points?

> +
>   for (count = 0; count < tz->trips; count++)
>   handle_thermal_trip(tz, count);
>  }
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index ac2897c..b870702 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -87,6 +87,7 @@ struct thermal_zone_device_ops {
>   int (*unbind) (struct thermal_zone_device *,
>  struct thermal_cooling_device *);
>   int (*get_temp) (struct thermal_zone_device *, unsigned long *);
> + int (*set_trips) (struct thermal_zone_device *, unsigned long, unsigned 
> long);
>   int (*get_mode) (struct thermal_zone_device *,
>enum thermal_device_mode *);
>   int (*set_mode) (struct thermal_zone_device *,
> @@ -183,6 +184,8 @@ struct thermal_zone_device {
>   unsigned long temperature;
>   unsigned long last_temperature;
>   unsigned long emul_temperature;
> + unsigned long prev_low_trip;
> + unsigned long prev_high_trip;
>   int passive;
>   unsigned int forced_passive;
>   const struct thermal_zone_device_ops *ops;
> -- 
> 2.1.4
> 
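
The window selection described in the commit message above can be sketched as a pure function in userspace C. This is a sketch under assumptions, not the patch itself: struct trip, struct trip_window and find_trip_window() are hypothetical names, temperatures are taken to be in millidegrees Celsius, and the clamp against unsigned underflow (hysteresis larger than the trip temperature) is an addition that the quoted patch does not contain.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical trip point: temperature and hysteresis. */
struct trip {
	unsigned long temp;
	unsigned long hyst;
};

/* Boundaries that would be handed to a .set_trips callback. */
struct trip_window {
	unsigned long low;
	unsigned long high;
};

/*
 * Find the closest trip boundaries below and above the current
 * temperature. With no trip below, low stays 0; with no trip above,
 * high stays ULONG_MAX, as the commit message describes.
 */
static struct trip_window find_trip_window(const struct trip *trips,
					   size_t n, unsigned long temp)
{
	struct trip_window w = { 0, ULONG_MAX };
	size_t i;

	for (i = 0; i < n; i++) {
		/* Assumption: clamp instead of wrapping below zero. */
		unsigned long trip_low = trips[i].hyst > trips[i].temp ?
					 0 : trips[i].temp - trips[i].hyst;

		if (trip_low < temp && trip_low > w.low)
			w.low = trip_low;

		if (trips[i].temp > temp && trips[i].temp < w.high)
			w.high = trips[i].temp;
	}

	return w;
}
```

With trips at 50000, 70000 and 90000 (hysteresis 2000, 2000 and 5000) and a current temperature of 60000, the window is 48000 < x < 70000.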




Re: [PATCH 4/4 V6] workqueue: Allow modifying low level unbound workqueue cpumask

2015-04-06 Thread Lai Jiangshan
On 04/07/2015 09:58 AM, Tejun Heo wrote:
> Hello, Lai.
> 
> On Tue, Apr 07, 2015 at 09:25:59AM +0800, Lai Jiangshan wrote:
>> On 04/06/2015 11:53 PM, Tejun Heo wrote:
>>> On Thu, Apr 02, 2015 at 07:14:42PM +0800, Lai Jiangshan wrote:
>>>>     /* make a copy of @attrs and sanitize it */
>>>>     copy_workqueue_attrs(new_attrs, attrs);
>>>> -   cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_global_cpumask);
>>>> +   copy_workqueue_attrs(pwq_attrs, attrs);
>>>> +   cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);
>>>> +   cpumask_and(pwq_attrs->cpumask, pwq_attrs->cpumask, unbound_cpumask);
>>>
>>> Hmmm... why do we need to keep track of both cpu_possible_mask and
>>> unbound_cpumask?  Can't we just make unbound_cpumask replace
>>> cpu_possible_mask for unbound workqueues?
>>>
>>
>> I want to save the original user-setting cpumask.
>>
>> Whenever wq_unbound_global_cpumask is changed,
>> the new effective cpumask is
>> the-original-user-setting-cpumask & wq_unbound_global_cpumask
>> instead of
>> the-last-effective-cpumask & wq_unbound_global_cpumask.
> 
> Yes, I get that, but that'd require just tracking the original

wq->unbound_attrs (new_attrs) saves the originally configured value,
which needs to be kept track of.
For sanity, it needs to be masked with cpu_possible_mask.

+   cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);

This changes the code back to what it was before this patchset.

In the next iteration, I will reduce the number of local variables to make
the code clearer.

> configured value and the unbound_cpumask masked value, no?  What am I
> missing?
> 
> Thanks.
> 
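
The point about keeping the original user setting can be shown with plain bitmasks. This is a minimal sketch, assuming 64-bit integers in place of struct cpumask; struct wq_attrs_sketch, wq_set_user_mask() and wq_apply_global_mask() are hypothetical names and not the workqueue API.

```c
#include <stdint.h>

/* Hypothetical per-workqueue state: one bit per CPU. */
struct wq_attrs_sketch {
	uint64_t user_mask;	 /* what the user configured, kept as-is */
	uint64_t effective_mask; /* user_mask & current global mask */
};

/* Recompute the effective mask from the saved user setting. */
static void wq_apply_global_mask(struct wq_attrs_sketch *wq,
				 uint64_t global_mask)
{
	wq->effective_mask = wq->user_mask & global_mask;
}

/* Store a sanitized user setting, then derive the effective mask. */
static void wq_set_user_mask(struct wq_attrs_sketch *wq, uint64_t requested,
			     uint64_t possible_mask, uint64_t global_mask)
{
	wq->user_mask = requested & possible_mask;
	wq_apply_global_mask(wq, global_mask);
}
```

If the global mask moves from CPUs 0-1 to CPUs 2-3 while the user asked for CPUs 0-3, recomputing from user_mask yields CPUs 2-3, whereas masking the previous effective value would yield an empty mask.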



Re: [PATCH 07/13] thermal: of: streamline .get_temp callbacks

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:54PM +0100, Sascha Hauer wrote:
> In the thermal framework it was decided that temperatures can't
> be negative, so let the .get_temp callback in struct
> thermal_zone_of_device_ops take an unsigned long pointer for
> the temperature like the .get_temp callback in
> struct thermal_zone_device_ops does.

This change is required. However, it would be better to move in the
direction of using a signed type for temperature, and we want int rather
than long.
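
To illustrate why a signed type matters here, a minimal userspace sketch follows. The helper names above_trip_unsigned() and above_trip_signed() are made up for this example and are not part of the thermal API; temperatures are assumed to be in millidegrees Celsius.

```c
/*
 * With an unsigned type, a sub-zero reading wraps around to a huge
 * value and compares as hotter than any trip point.
 */
static int above_trip_unsigned(unsigned long temp, unsigned long trip)
{
	return temp > trip;
}

/* With a signed int, the same comparison behaves as expected. */
static int above_trip_signed(int temp, int trip)
{
	return temp > trip;
}
```

Passing -5000 (minus five degrees) through an unsigned long and comparing it with a 90000 trip reports the trip as crossed, while the signed variant does not.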

> 
> Signed-off-by: Sascha Hauer 
> ---
>  drivers/hwmon/lm75.c   | 2 +-
>  drivers/hwmon/ntc_thermistor.c | 2 +-
>  drivers/hwmon/tmp102.c | 2 +-
>  drivers/input/touchscreen/sun4i-ts.c   | 2 +-
>  drivers/thermal/rockchip_thermal.c | 2 +-
>  drivers/thermal/samsung/exynos_tmu.c   | 2 +-
>  drivers/thermal/tegra_soctherm.c   | 2 +-
>  drivers/thermal/ti-soc-thermal/ti-thermal-common.c | 2 +-
>  include/linux/thermal.h| 2 +-
>  9 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/hwmon/lm75.c b/drivers/hwmon/lm75.c
> index fe41d5a..9df3ca3 100644
> --- a/drivers/hwmon/lm75.c
> +++ b/drivers/hwmon/lm75.c
> @@ -104,7 +104,7 @@ static inline long lm75_reg_to_mc(s16 temp, u8 resolution)
>  
>  /* sysfs attributes for hwmon */
>  
> -static int lm75_read_temp(void *dev, long *temp)
> +static int lm75_read_temp(void *dev, unsigned long *temp)
>  {
>   struct lm75_data *data = lm75_update_device(dev);
>  
> diff --git a/drivers/hwmon/ntc_thermistor.c b/drivers/hwmon/ntc_thermistor.c
> index 112e4d4..12cb333 100644
> --- a/drivers/hwmon/ntc_thermistor.c
> +++ b/drivers/hwmon/ntc_thermistor.c
> @@ -430,7 +430,7 @@ static int ntc_thermistor_get_ohm(struct ntc_data *data)
>   return -EINVAL;
>  }
>  
> -static int ntc_read_temp(void *dev, long *temp)
> +static int ntc_read_temp(void *dev, unsigned long *temp)
>  {
>   struct ntc_data *data = dev_get_drvdata(dev);
>   int ohm;
> diff --git a/drivers/hwmon/tmp102.c b/drivers/hwmon/tmp102.c
> index 9da2735..25bd72b 100644
> --- a/drivers/hwmon/tmp102.c
> +++ b/drivers/hwmon/tmp102.c
> @@ -98,7 +98,7 @@ static struct tmp102 *tmp102_update_device(struct device 
> *dev)
>   return tmp102;
>  }
>  
> -static int tmp102_read_temp(void *dev, long *temp)
> +static int tmp102_read_temp(void *dev, unsigned long *temp)
>  {
>   struct tmp102 *tmp102 = tmp102_update_device(dev);
>  
> diff --git a/drivers/input/touchscreen/sun4i-ts.c 
> b/drivers/input/touchscreen/sun4i-ts.c
> index b93a28b..26a7cf5 100644
> --- a/drivers/input/touchscreen/sun4i-ts.c
> +++ b/drivers/input/touchscreen/sun4i-ts.c
> @@ -198,7 +198,7 @@ static int sun4i_get_temp(const struct sun4i_ts_data *ts, 
> long *temp)
>   return 0;
>  }
>  
> -static int sun4i_get_tz_temp(void *data, long *temp)
> +static int sun4i_get_tz_temp(void *data, unsigned long *temp)
>  {
>   return sun4i_get_temp(data, temp);
>  }
> diff --git a/drivers/thermal/rockchip_thermal.c 
> b/drivers/thermal/rockchip_thermal.c
> index 3aa46ac..67dfc67 100644
> --- a/drivers/thermal/rockchip_thermal.c
> +++ b/drivers/thermal/rockchip_thermal.c
> @@ -366,7 +366,7 @@ static irqreturn_t rockchip_thermal_alarm_irq_thread(int 
> irq, void *dev)
>   return IRQ_HANDLED;
>  }
>  
> -static int rockchip_thermal_get_temp(void *_sensor, long *out_temp)
> +static int rockchip_thermal_get_temp(void *_sensor, unsigned long *out_temp)
>  {
>   struct rockchip_thermal_sensor *sensor = _sensor;
>   struct rockchip_thermal_data *thermal = sensor->thermal;
> diff --git a/drivers/thermal/samsung/exynos_tmu.c 
> b/drivers/thermal/samsung/exynos_tmu.c
> index 1d30b09..5f721f2 100644
> --- a/drivers/thermal/samsung/exynos_tmu.c
> +++ b/drivers/thermal/samsung/exynos_tmu.c
> @@ -713,7 +713,7 @@ static void exynos7_tmu_control(struct platform_device 
> *pdev, bool on)
>   writel(con, data->base + EXYNOS_TMU_REG_CONTROL);
>  }
>  
> -static int exynos_get_temp(void *p, long *temp)
> +static int exynos_get_temp(void *p, unsigned long *temp)
>  {
>   struct exynos_tmu_data *data = p;
>  
> diff --git a/drivers/thermal/tegra_soctherm.c 
> b/drivers/thermal/tegra_soctherm.c
> index 9197fc0..1c4e455 100644
> --- a/drivers/thermal/tegra_soctherm.c
> +++ b/drivers/thermal/tegra_soctherm.c
> @@ -306,7 +306,7 @@ static long translate_temp(u16 val)
>   return t;
>  }
>  
> -static int tegra_thermctl_get_temp(void *data, long *out_temp)
> +static int tegra_thermctl_get_temp(void *data, unsigned long *out_temp)
>  {
>   struct tegra_thermctl_zone *zone = data;
>   u32 val;
> diff --git a/drivers/thermal/ti-soc-thermal/ti-thermal-common.c 
> b/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
> index 7f8e5f3..f480a01 100644
> --- a/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
> +++ b/drivers/thermal/ti-soc-thermal/ti-thermal-common.c
> @@ -76,7 

Re: [PATCH 05/13] thermal: inline only once used function

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:52PM +0100, Sascha Hauer wrote:
> Inline update_temperature into its only caller to make the code
> more readable.

I am not sure I understand how this is improving readability, can you
please elaborate?

The way it is now it is more modular at least.

> 
> Signed-off-by: Sascha Hauer 
> ---
>  drivers/thermal/thermal_core.c | 16 +++++-----------
>  1 file changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index ebca854..6d0fdad 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -431,11 +431,15 @@ exit:
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_get_temp);
>  
> -static void update_temperature(struct thermal_zone_device *tz)
> +void thermal_zone_device_update(struct thermal_zone_device *tz)
>  {
> + int count;
>   unsigned long temp;
>   int ret;
>  
> + if (!tz->ops->get_temp)
> + return;
> +
>   ret = thermal_zone_get_temp(tz, &temp);
>   if (ret) {
>   dev_warn(&tz->device, "failed to read out thermal zone %d\n",
> @@ -451,16 +455,6 @@ static void update_temperature(struct 
> thermal_zone_device *tz)
>   trace_thermal_temperature(tz);
>   dev_dbg(&tz->device, "last_temperature=%lu, current_temperature=%lu\n",
>   tz->last_temperature, tz->temperature);
> -}
> -
> -void thermal_zone_device_update(struct thermal_zone_device *tz)
> -{
> - int count;
> -
> - if (!tz->ops->get_temp)
> - return;
> -
> - update_temperature(tz);
>  
>   for (count = 0; count < tz->trips; count++)
>   handle_thermal_trip(tz, count);
> -- 
> 2.1.4
> 




Re: [PATCH 1/6] block: export blkdev_reread_part()

2015-04-06 Thread Ming Lei
On Mon, Apr 6, 2015 at 10:50 PM, Christoph Hellwig  wrote:
> On Mon, Apr 06, 2015 at 12:40:12AM +0800, Ming Lei wrote:
>> > int blkdev_reread_part(struct block_device *bdev)
>> > {
>> > 	if (!mutex_trylock(&bdev->bd_mutex))
>> > 		return -EBUSY;
>> > 	blkdev_reread_part_nolock(bdev);
>> > 	mutex_unlock(&bdev->bd_mutex);
>> > }
>>
>> Yes, it is cleaner, but it adds the cost of acquiring the lock in the
>> failure cases, especially once we replace trylock with mutex_lock().
>
> It's just a few fairly trivial checks, so I'm not really worried about
> it, especially given that blkdev_reread_part isn't called from a fast
> path.

OK. Given that ordinary users have no privileges on block devices by
default in most distributions, they can't mount a DoS by running
ioctl(BLKRRPART) with this change.

I will change to this style in v1.

Thanks,
Ming Lei
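
The wrapper style agreed on above can be sketched in userspace C. This is
only an illustration under stated assumptions: struct fake_bdev,
fake_reread_part() and fake_reread_part_nolock() are hypothetical names, and
a C11 atomic_flag stands in for bd_mutex. It is not the kernel
implementation. Unlike the snippet quoted above, the sketch passes the
helper's return value back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device; not the kernel's struct. */
struct fake_bdev {
	atomic_flag busy;	/* plays the role of bd_mutex */
	int rescans;		/* counts completed partition rescans */
};

/* Caller must already hold the lock, like the _nolock helper in the thread. */
static int fake_reread_part_nolock(struct fake_bdev *bdev)
{
	bdev->rescans++;
	return 0;
}

/* Try to take the lock; report -EBUSY instead of waiting when it is held. */
static int fake_reread_part(struct fake_bdev *bdev)
{
	int ret;

	assert(bdev != NULL);
	if (atomic_flag_test_and_set(&bdev->busy))
		return -EBUSY;
	ret = fake_reread_part_nolock(bdev);
	atomic_flag_clear(&bdev->busy);
	return ret;
}
```

A caller that already holds the lock would call the _nolock variant directly;
everyone else goes through the wrapper and has to cope with -EBUSY.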
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 0/2] hrtimer: Iterate only over active clock-bases

2015-04-06 Thread Viresh Kumar
Hi,

'active_bases' indicates which clock-bases have active timers. While it
is updated (almost) correctly, it is hardly used.

And so this is an attempt to improve the code that iterates over all
clock-bases.

The first patch fixes an issue that would turn into a bug after the second commit,
and the second commit creates a macro for_each_active_base() and uses it at
multiple places.

V1->V2:
- Dropped ffs() and wrote own routine __next_bit().

Viresh Kumar (2):
  hrtimer: update '->active_bases' before calling
hrtimer_force_reprogram()
  hrtimer: create for_each_active_base() to iterate over active
clock-bases

 kernel/time/hrtimer.c | 70 ---
 1 file changed, 44 insertions(+), 26 deletions(-)

-- 
2.3.0.rc0.44.ga94655d
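
The idea of walking only the set bits of 'active_bases' can be sketched in
standalone C. This is a minimal illustration, not the macro from the series:
collect_active() and MAX_BASES are hypothetical names, and the sketch
assumes no bits are set at or above MAX_BASES.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BASES 4	/* illustrative; stands in for HRTIMER_MAX_CLOCK_BASES */

/*
 * Visit only the set bits of 'active', lowest first, and record their
 * indices in 'out'. Returns how many indices were written.
 */
static size_t collect_active(unsigned int active, int out[MAX_BASES])
{
	size_t n = 0;
	int bit;

	/* Bits at or above MAX_BASES would index past the array. */
	assert((active >> MAX_BASES) == 0);

	for (bit = 0; active; bit++) {
		if (!(active & (1U << bit)))
			continue;
		out[n++] = bit;
		active &= ~(1U << bit);	/* done with this base */
	}
	return n;
}
```

The loop stops as soon as the remaining mask is zero, so an empty mask costs
a single test instead of a full pass over every base.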



[PATCH V2 2/2] hrtimer: Iterate only over active clock-bases

2015-04-06 Thread Viresh Kumar
In several places we iterate over all possible clock-bases of a
particular cpu-base, whereas we only need to iterate over the active ones.

We already have a per-cpu-base 'active_bases' field, which is updated on
addition/removal of hrtimers.

This patch creates for_each_active_base(), which uses 'active_bases' to
iterate only over active bases.

This also updates code which iterates over clock-bases.

Signed-off-by: Viresh Kumar 
---
 kernel/time/hrtimer.c | 65 ---
 1 file changed, 41 insertions(+), 24 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 3152f327c988..9da63e9ee63b 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -110,6 +110,31 @@ static inline int hrtimer_clockid_to_base(clockid_t 
clock_id)
 }
 
 
+static inline int __next_bit(unsigned int active_bases, int bit)
+{
+   do {
+   if (active_bases & (1 << bit))
+   return bit;
+   } while (++bit < HRTIMER_MAX_CLOCK_BASES);
+
+   /* We should never reach here */
+   return 0;
+}
+
+/*
+ * for_each_active_base: iterate over all active clock bases
+ * @_bit: 'int' variable for internal purpose
+ * @_base: holds pointer to an active clock base
+ * @_cpu_base: cpu base to iterate on
+ * @_active_bases: 'unsigned int' variable for internal purpose
+ */
+#define for_each_active_base(_bit, _base, _cpu_base, _active_bases)\
+   for ((_active_bases) = (_cpu_base)->active_bases, (_bit) = -1;  \
+   (_active_bases) &&  \
+   ((_bit) = __next_bit(_active_bases, ++_bit),\
+   (_base) = (_cpu_base)->clock_base + _bit);  \
+   (_active_bases) &= ~(1 << (_bit)))
+
 /*
  * Get the coarse grained time at the softirq based on xtime and
  * wall_to_monotonic.
@@ -443,19 +468,15 @@ static inline void debug_deactivate(struct hrtimer *timer)
 #if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
 static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
 {
-   struct hrtimer_clock_base *base = cpu_base->clock_base;
+   struct hrtimer_clock_base *base;
ktime_t expires, expires_next = { .tv64 = KTIME_MAX };
+   struct hrtimer *timer;
+   unsigned int active_bases;
int i;
 
-   for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++, base++) {
-   struct timerqueue_node *next;
-   struct hrtimer *timer;
-
-   next = timerqueue_getnext(&base->active);
-   if (!next)
-   continue;
-
-   timer = container_of(next, struct hrtimer, node);
+   for_each_active_base(i, base, cpu_base, active_bases) {
+   timer = container_of(timerqueue_getnext(&base->active),
+struct hrtimer, node);
expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
if (expires.tv64 < expires_next.tv64)
expires_next = expires;
@@ -1245,6 +1266,8 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 {
struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
ktime_t expires_next, now, entry_time, delta;
+   struct hrtimer_clock_base *base;
+   unsigned int active_bases;
int i, retries = 0;
 
BUG_ON(!cpu_base->hres_active);
@@ -1264,15 +1287,10 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 */
cpu_base->expires_next.tv64 = KTIME_MAX;
 
-   for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
-   struct hrtimer_clock_base *base;
+   for_each_active_base(i, base, cpu_base, active_bases) {
struct timerqueue_node *node;
ktime_t basenow;
 
-   if (!(cpu_base->active_bases & (1 << i)))
-   continue;
-
-   base = cpu_base->clock_base + i;
basenow = ktime_add(now, base->offset);
 
while ((node = timerqueue_getnext(&base->active))) {
@@ -1435,16 +1453,13 @@ void hrtimer_run_queues(void)
struct timerqueue_node *node;
struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
struct hrtimer_clock_base *base;
+   unsigned int active_bases;
int index, gettime = 1;
 
if (hrtimer_hres_active())
return;
 
-   for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
-   base = &cpu_base->clock_base[index];
-   if (!timerqueue_getnext(&base->active))
-   continue;
-
+   for_each_active_base(index, base, cpu_base, active_bases) {
if (gettime) {
hrtimer_get_softirq_time(cpu_base);
gettime = 0;
@@ -1665,6 +1680,8 @@ static void migrate_hrtimer_list(struct 
hrtimer_clock_base *old_base,
 static void migrate_hrtimers(int scpu)
 {
struct hrtimer_cpu_base *old_base, *new_base;
+   

[PATCH V2 1/2] hrtimer: update '->active_bases' before calling hrtimer_force_reprogram()

2015-04-06 Thread Viresh Kumar
'active_bases' indicates which clock-bases have active timers. While it
is updated correctly, it is hardly used. Next commit will start using it
to make code more efficient, but before that we need to fix a problem.

While removing hrtimers, in __remove_hrtimer():
- We first remove the hrtimer from the queue.
- Then reprogram clockevent device if required
  (hrtimer_force_reprogram()).
- And then finally clear 'active_bases', if no more timers are pending
  on the current clock base (from which we are removing the hrtimer).

hrtimer_force_reprogram() needs to loop over all active clock bases to
find the next expiry event, and while doing so it will use
'active_bases' (after next commit). And it will find the current base
active, as we haven't cleared it until now, even if current clock base
has no more hrtimers queued.

The next commit will skip validating what timerqueue_getnext() returns,
as that is guaranteed to be valid for an active base, and the above
stated problem will result in a crash then (Because timerqueue_getnext()
will return NULL for the current clock base).

So, fix this issue by clearing active_bases before calling
hrtimer_force_reprogram().

Reviewed-by: Preeti U Murthy 
Signed-off-by: Viresh Kumar 
---
 kernel/time/hrtimer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index bee0c1f78091..3152f327c988 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -879,6 +879,9 @@ static void __remove_hrtimer(struct hrtimer *timer,
 
next_timer = timerqueue_getnext(&base->active);
timerqueue_del(&base->active, &timer->node);
+   if (!timerqueue_getnext(&base->active))
+   base->cpu_base->active_bases &= ~(1 << base->index);
+
if (&timer->node == next_timer) {
 #ifdef CONFIG_HIGH_RES_TIMERS
/* Reprogram the clock event device. if enabled */
@@ -892,8 +895,6 @@ static void __remove_hrtimer(struct hrtimer *timer,
}
 #endif
}
-   if (!timerqueue_getnext(&base->active))
-   base->cpu_base->active_bases &= ~(1 << base->index);
 out:
timer->state = newstate;
 }
-- 
2.3.0.rc0.44.ga94655d
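
Why the ordering matters can be shown with a toy model in standalone C. All
names here (struct toy_cpu_base, toy_next_event(), toy_remove()) are
hypothetical, and the model is deliberately simplified: it does not refresh
next_expiry when a base still has timers left. The point is only that the
scan trusts 'active_bases', so the bit has to be cleared before the scan runs.

```c
#include <assert.h>
#include <limits.h>

#define NBASES 2	/* illustrative number of clock bases */

struct toy_cpu_base {
	unsigned int active_bases;	/* bit i set => base i has timers */
	int queued[NBASES];		/* timers queued on each base */
	long next_expiry[NBASES];	/* earliest expiry, valid only if queued > 0 */
};

/* Trusts active_bases: an active base is assumed to have a queued timer. */
static long toy_next_event(const struct toy_cpu_base *cb)
{
	long next = LONG_MAX;
	int i;

	for (i = 0; i < NBASES; i++) {
		if (!(cb->active_bases & (1U << i)))
			continue;
		/* In the kernel this would be a NULL dereference. */
		assert(cb->queued[i] > 0);
		if (cb->next_expiry[i] < next)
			next = cb->next_expiry[i];
	}
	return next;
}

/* Remove one timer from 'base', clearing the bit before rescanning. */
static long toy_remove(struct toy_cpu_base *cb, int base)
{
	assert(base >= 0 && base < NBASES && cb->queued[base] > 0);
	if (--cb->queued[base] == 0)
		cb->active_bases &= ~(1U << base);
	return toy_next_event(cb);
}
```

If the bit were cleared only after toy_next_event() had run, the assertion
inside the scan would fire for a base whose last timer was just removed.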



Re: [PATCH 04/13] thermal: Fix not emulating critical temperatures

2015-04-06 Thread Eduardo Valentin
On Fri, Mar 27, 2015 at 06:23:18AM +0100, Sascha Hauer wrote:
> Hi Amit,
> 
> On Fri, Mar 27, 2015 at 08:35:50AM +0530, amit daniel kachhap wrote:
> > Hi Sascha,
> > 
> > > -#ifdef CONFIG_THERMAL_EMULATION
> > > -   if (!tz->emul_temperature)
> > > -   goto skip_emul;
> > > -
> > > -   for (count = 0; count < tz->trips; count++) {
> > > -   ret = tz->ops->get_trip_type(tz, count, &type);
> > > -   if (!ret && type == THERMAL_TRIP_CRITICAL) {
> > > -   ret = tz->ops->get_trip_temp(tz, count, 
> > > &crit_temp);
> > > -   break;
> > > -   }
> > > -   }
> > > -
> > > -   if (ret)
> > > -   goto skip_emul;
> > >
> > > -   if (*temp < crit_temp)
> > I guess this check is confusing. Instead of returning the emulated
> > temperature it returns the actual temperature, but the important
> > thing to note is that this happens only when the actual temperature
> > is higher than the critical temperature. So this check prevents the
> > user from masking a critical temperature and hence from burning up
> > the chip.
> 
> Indeed the check is confusing, but now it makes perfect sense. I'll
> look at the patch again and maybe turn it into a patch that just adds
> a comment to clarify this.

That will be great. Thanks Sascha.

> 
> Sascha
> 
> -- 
> Pengutronix e.K.   | |
> Industrial Linux Solutions | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
> Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
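
The intent of the check discussed in this thread can be sketched as a small
standalone C function. This is an illustration only: pick_temperature() and
its parameters are hypothetical, plain long values are assumed for
simplicity, and an emulated value of 0 is taken to mean "emulation off", as
in the quoted code.

```c
#include <assert.h>

/*
 * Pick the temperature to report. 'emul' == 0 means emulation is off,
 * mirroring how tz->emul_temperature is tested in the quoted code.
 */
static long pick_temperature(long real, long emul, long crit)
{
	assert(crit > 0);
	if (!emul)
		return real;
	/* Real reading already critical: do not let emulation hide it. */
	if (real >= crit)
		return real;
	return emul;
}
```

Emulation can therefore raise the reported value past the critical trip for
testing, but it cannot lower a reading that is already critical.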




Re: [PATCH 1/1] irqchip/gicv3-its: remove GITS_BASER_TYPE_CPU base on latest specification

2015-04-06 Thread leizhen
On 2015/4/3 22:46, Jason Cooper wrote:
> Zhen Lei,
> 
> On Fri, Apr 03, 2015 at 11:33:52AM +0800, Zhen Lei wrote:
>> Actually, "Interrupt Collections" and "Physical Processors" are the
>> same thing.
> 
> I'm sorry, but this isn't clear.
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/irqchip/irq-gic-v3-its.c   | 2 +-
>>  include/linux/irqchip/arm-gic-v3.h | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
>> b/drivers/irqchip/irq-gic-v3-its.c
>> index 9687f8a..a795aae 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -777,7 +777,7 @@ static int __init its_alloc_lpi_tables(void)
>>  static const char *its_base_type_string[] = {
>>  [GITS_BASER_TYPE_DEVICE]= "Devices",
>>  [GITS_BASER_TYPE_VCPU]  = "Virtual CPUs",
>> -[GITS_BASER_TYPE_CPU]   = "Physical CPUs",
>> +[GITS_BASER_TYPE_RESERVED3] = "Reserved (3)",
> 
> Are you fixing a bug?  Was the old information wrong?  Did the spec get
> revised?

In spec version 19.0, clause 5.12.13 (GITS_BASERn), the "Type" field defines
value 0x3 as:
0x3. Physical Processors. This register corresponds to a table that scales
according to the number of physical processors in the system and requires
(Entry-size * number-of-processors) bytes of memory.

In spec version 24.0, the same clause defines value 0x3 as reserved:
0x3. Reserved.

> 
> Please redo your commit message to explain why this change is necessary and
> what it's doing.

OK, thank you for your advice.

> 
>>  [GITS_BASER_TYPE_COLLECTION]= "Interrupt Collections",
>>  [GITS_BASER_TYPE_RESERVED5] = "Reserved (5)",
>>  [GITS_BASER_TYPE_RESERVED6] = "Reserved (6)",
>> diff --git a/include/linux/irqchip/arm-gic-v3.h 
>> b/include/linux/irqchip/arm-gic-v3.h
>> index ffbc034..67f5779 100644
>> --- a/include/linux/irqchip/arm-gic-v3.h
>> +++ b/include/linux/irqchip/arm-gic-v3.h
>> @@ -233,7 +233,7 @@
>>  #define GITS_BASER_TYPE_NONE0
>>  #define GITS_BASER_TYPE_DEVICE  1
>>  #define GITS_BASER_TYPE_VCPU2
>> -#define GITS_BASER_TYPE_CPU 3
>> +#define GITS_BASER_TYPE_RESERVED3   3
>>  #define GITS_BASER_TYPE_COLLECTION  4
>>  #define GITS_BASER_TYPE_RESERVED5   5
>>  #define GITS_BASER_TYPE_RESERVED6   6
>> --
>> 1.8.0
> 
> thx,
> 
> Jason.
> 
> .
> 




[PATCH] staging: lustre: Make cfs_sched_rehash static

2015-04-06 Thread Nickolaus Woodruff
This patch fixes the following sparse warning:

CHECK   drivers/staging/lustre/lustre/libcfs/hash.c
drivers/staging/lustre/lustre/libcfs/hash.c:119:21: warning: symbol
'cfs_sched_rehash' was not declared. Should it be static?

Signed-off-by: Nickolaus Woodruff 
---
 drivers/staging/lustre/lustre/libcfs/hash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/libcfs/hash.c 
b/drivers/staging/lustre/lustre/libcfs/hash.c
index a55567e..a4920a2 100644
--- a/drivers/staging/lustre/lustre/libcfs/hash.c
+++ b/drivers/staging/lustre/lustre/libcfs/hash.c
@@ -116,7 +116,7 @@ module_param(warn_on_depth, uint, 0644);
 MODULE_PARM_DESC(warn_on_depth, "warning when hash depth is high.");
 #endif

-struct cfs_wi_sched *cfs_sched_rehash;
+static struct cfs_wi_sched *cfs_sched_rehash;

 static inline void
 cfs_hash_nl_lock(union cfs_hash_lock *lock, int exclusive) {}
--
1.9.1



[PATCH v4] x86, selftests: Add sigreturn selftest

2015-04-06 Thread Andy Lutomirski
This is my sigreturn test, added mostly unchanged from its old home.
It exercises the sigreturn(2) syscall, specifically focusing on its
interactions with various IRET corner cases.  It tests for correct
behavior in several areas that were historically dangerously buggy.
For example, it exercises espfix on kernels of both bitnesses under
various conditions, and it contains exploits for several now-fixed
bugs in IRET error handling.

If you run it on older kernels, your system will crash.  It probably
won't eat your data in the process.

There is no released kernel on which the sigreturn_64 test will
pass, but it passes on tip:x86/asm.

IMO it's unfortunate that I need to provide a special script to run
tests.  I'd rather just list my targets.

I'm not using the ksft_ helpers at all yet.  I can do that later.

Signed-off-by: Andy Lutomirski 
---

Changes from v3:
 - Improve code clarity a bit and add tons of comments to the test case.

Changes from v2:
 - Improve changelog slightly.

Changes from v1:
 - Build and run the 64-bit test on 64-bit hosts, since the kernel
   prereq is now in tip:x86/asm.
 - Improve commit message.
 - Add a helpful diagnostic if -m32 doesn't work for user code.

 tools/testing/selftests/Makefile   |   1 +
 tools/testing/selftests/x86/.gitignore |   2 +
 tools/testing/selftests/x86/Makefile   |  48 ++
 tools/testing/selftests/x86/run_x86_tests.sh   |  11 +
 tools/testing/selftests/x86/sigreturn.c| 675 +
 .../testing/selftests/x86/trivial_32bit_program.c  |  14 +
 6 files changed, 751 insertions(+)
 create mode 100644 tools/testing/selftests/x86/.gitignore
 create mode 100644 tools/testing/selftests/x86/Makefile
 create mode 100755 tools/testing/selftests/x86/run_x86_tests.sh
 create mode 100644 tools/testing/selftests/x86/sigreturn.c
 create mode 100644 tools/testing/selftests/x86/trivial_32bit_program.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 4e511221a0c1..2ad56d451469 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -17,6 +17,7 @@ TARGETS += sysctl
 TARGETS += timers
 TARGETS += user
 TARGETS += vm
+TARGETS += x86
 #Please keep the TARGETS list alphabetically sorted
 
 TARGETS_HOTPLUG = cpu-hotplug
diff --git a/tools/testing/selftests/x86/.gitignore 
b/tools/testing/selftests/x86/.gitignore
new file mode 100644
index 000000000000..15034fef9698
--- /dev/null
+++ b/tools/testing/selftests/x86/.gitignore
@@ -0,0 +1,2 @@
+*_32
+*_64
diff --git a/tools/testing/selftests/x86/Makefile 
b/tools/testing/selftests/x86/Makefile
new file mode 100644
index 000000000000..f0a7918178dd
--- /dev/null
+++ b/tools/testing/selftests/x86/Makefile
@@ -0,0 +1,48 @@
+.PHONY: all all_32 all_64 check_build32 clean run_tests
+
+TARGETS_C_BOTHBITS := sigreturn
+
+BINARIES_32 := $(TARGETS_C_BOTHBITS:%=%_32)
+BINARIES_64 := $(TARGETS_C_BOTHBITS:%=%_64)
+
+CFLAGS := -O2 -g -std=gnu99 -pthread -Wall
+
+UNAME_P := $(shell uname -p)
+
+# Always build 32-bit tests
+all: all_32
+
+# If we're on a 64-bit host, build 64-bit tests as well
+ifeq ($(shell uname -p),x86_64)
+all: all_64
+endif
+
+all_32: check_build32 $(BINARIES_32)
+
+all_64: $(BINARIES_64)
+
+clean:
+   $(RM) $(BINARIES_32) $(BINARIES_64)
+
+run_tests:
+   ./run_x86_tests.sh
+
+$(TARGETS_C_BOTHBITS:%=%_32): %_32: %.c
+   $(CC) -m32 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl
+
+$(TARGETS_C_BOTHBITS:%=%_64): %_64: %.c
+   $(CC) -m64 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl
+
+check_build32:
+   @if ! $(CC) -m32 -o /dev/null trivial_32bit_program.c; then \
+ echo "Warning: you seem to have a broken 32-bit build" 2>&1;  \
+ echo "environment.  If you are using a Debian-like";  \
+ echo " distribution, try:";   \
+ echo "";  \
+ echo "  apt-get install gcc-multilib libc6-i386 libc6-dev-i386"; \
+ echo "";  \
+ echo "If you are using a Fedora-like distribution, try:"; \
+ echo "";  \
+ echo "  yum install glibc-devel.*i686";   \
+ exit 1;   \
+   fi
diff --git a/tools/testing/selftests/x86/run_x86_tests.sh 
b/tools/testing/selftests/x86/run_x86_tests.sh
new file mode 100755
index 000000000000..3d3ec65f3e7c
--- /dev/null
+++ b/tools/testing/selftests/x86/run_x86_tests.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+# This is deliberately minimal.  IMO kselftests should provide a standard
+# script here.
+./sigreturn_32 || exit 1
+
+if [[ "$(uname -p)" == "x86_64" ]]; then
+./sigreturn_64 || exit 1
+fi
+
+exit 0
diff --git a/tools/testing/selftests/x86/sigreturn.c 
b/tools/testing/selftests/x86/sigreturn.c
new file mode 100644

Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!

2015-04-06 Thread Steven Rostedt
On Mon,  6 Apr 2015 19:26:01 -0600
Thavatchai Makphaibulchoke  wrote:

> This patch fixes the problem that the ownership of a mutex acquired by an
> interrupt handler(IH) gets incorrectly attributed to the interrupted thread.
> 
> This could result in an incorrect deadlock detection in function
> rt_mutex_adjust_prio_chain(), causing thread to be killed and possibly leading
> up to a system hang.
> 
> Here is the approach taken: when calling from an interrupt handler, instead of
> attributing ownership to the interrupted task, use the idle task on the
> processor to indicate that the owner is an interrupt handler.  This approach
> avoids the above incorrect deadlock detection.
> 
> This also includes changes to disable priority boosting when lock owner is
> the idle_task, as it is not allowed.

Hmm, why is it not allowed?

If we just let it boost it, it will cut down on the code changes and
checks that add to the hot paths.

> 
> Kernel version 3.14.25 + patch-3.14.25-rt22
> 
> Signed-off-by: T. Makphaibulchoke 
> ---
> Changed in v2:
> - Use idle_task on the processor as rtmutex's owner instead of the
>   reserved interrupt handler task value.
> - Removed code to handle the reserved interrupt handler's task value.
>  kernel/locking/rtmutex.c | 77 
> 
>  1 file changed, 52 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 6c40660..ae5c13f 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -26,6 +26,9 @@
>  
>  #include "rtmutex_common.h"
>  
> +static int __sched __rt_mutex_trylock(struct rt_mutex *lock,
> + struct task_struct *caller);
> +
>  /*
>   * lock->owner state tracking:
>   *
> @@ -51,6 +54,9 @@
>   * waiters. This can happen when grabbing the lock in the slow path.
>   * To prevent a cmpxchg of the owner releasing the lock, we need to
>   * set this bit before looking at the lock.
> + *
> + * Owner can also be reserved value, INTERRUPT_HANDLER. In this case the 
> mutex
> + * is owned by idle_task on the processor.
>   */
>  
>  static void
> @@ -298,7 +304,7 @@ static void __rt_mutex_adjust_prio(struct task_struct 
> *task)
>  {
>   int prio = rt_mutex_getprio(task);
>  
> - if (task->prio != prio || dl_prio(prio))
> + if (!is_idle_task(task) && (task->prio != prio || dl_prio(prio)))
>   rt_mutex_setprio(task, prio);
>  }
>  
> @@ -730,7 +736,6 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
>   if (waiter == rt_mutex_top_waiter(lock)) {
>   rt_mutex_dequeue_pi(owner, top_waiter);
>   rt_mutex_enqueue_pi(owner, waiter);
> -

I don't think this whitespace change needs to be done. The space does
split up the dequeue and enqueue from the rest.

>   __rt_mutex_adjust_prio(owner);
>   if (rt_mutex_real_waiter(owner->pi_blocked_on))
>   chain_walk = 1;
> @@ -777,10 +782,11 @@ static int task_blocks_on_rt_mutex(struct rt_mutex 
> *lock,
>   */
>  static void wakeup_next_waiter(struct rt_mutex *lock)
>  {
> + struct task_struct *owner = rt_mutex_owner(lock);
>   struct rt_mutex_waiter *waiter;
>   unsigned long flags;
>  
> - raw_spin_lock_irqsave(&current->pi_lock, flags);
> + raw_spin_lock_irqsave(&owner->pi_lock, flags);
>  
>   waiter = rt_mutex_top_waiter(lock);
>  
> @@ -790,7 +796,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
>* boosted mode and go back to normal after releasing
>* lock->wait_lock.
>*/
> - rt_mutex_dequeue_pi(current, waiter);
> + rt_mutex_dequeue_pi(owner, waiter);
>  
>   /*
>* As we are waking up the top waiter, and the waiter stays
> @@ -802,7 +808,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
>*/
>   lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
>  
> - raw_spin_unlock_irqrestore(&current->pi_lock, flags);
> + raw_spin_unlock_irqrestore(&owner->pi_lock, flags);
>  
>   /*
>* It's safe to dereference waiter as it cannot go away as
> @@ -902,6 +908,8 @@ void rt_mutex_adjust_pi(struct task_struct *task)
>  static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
>void  (*slowfn)(struct rt_mutex *lock))
>  {
> + /* Might sleep, should not be called in interrupt context. */
> + BUG_ON(in_interrupt());

You're right it shouldn't. But that's why might_sleep() will give us a
nice big warning if it is. Don't add the BUG_ON().

>   might_sleep();
>  
>   if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
> @@ -911,12 +919,12 @@ static inline void rt_spin_lock_fastlock(struct 
> rt_mutex *lock,
>  }
>  
>  static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
> -void  (*slowfn)(struct rt_mutex 
> *lock))
> + void (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
>  {
>   if 

Re: [PATCH 1/3] mmc: dw_mmc: Increase cmd11 timeout to 500ms

2015-04-06 Thread Jaehoon Chung
Hi, Doug.

On 04/07/2015 04:32 AM, Doug Anderson wrote:
> Jaehoon,
> 
> On Mon, Apr 6, 2015 at 3:46 AM, Jaehoon Chung  wrote:
>> Hi, Doug.
>>
>> On 04/04/2015 03:13 AM, Doug Anderson wrote:
>>> The Designware databook claims that cmd11 should be finished in 2ms,
>>> but my testing showed that not to be the case in some situations.
>>> I've seen cmd11 timeouts of up to 130ms (!) during reboot tests.
>>> Let's bump the timeout way up so that we're absolutely sure.  CMD11 is
>>> only sent during card insertion, so this extra timeout shouldn't be
>>> terrible.
>>
>> Is it a h/w problem? Could you explain the "some situations" to me?
>> As you said, this timeout is only used during card insertion, so it's
>> not critical. But there is a big difference between 2ms and 500ms (or 130ms).
> 
> Very good question, and it makes sense to dig into this...
> 
> OK, I think I've got it.  Dang printk bites me again.  I have serial
> console enabled and my printouts were actually causing these delays.
> With serial console turned off I reliably get ~280us for the interrupt
> to fire (tested across SD and WiFi across 137 + 128 + 111 + 127 = 503
> reboots)

Oh, agreed. I also think the printouts can cause the delay.
Thanks for your explanation.

> 
> I think it makes sense to land this patch anyway, but with an updated
> description.  I'm happy to repost this or happy if you just want to
> update the description when applying.

To save your time, I will update the description when applying.

Best Regards,
Jaehoon Chung

> 
> ---
> 
> Although the cmd11 interrupt should come within 2ms, that's a very
> short time.  Let's increase the timeout to be really sure that we
> don't get an accidental timeout.  One case in particular where this is
> useful is if you've got a serial console and printk in just the right
> places.  Under that scenario I've seen delays of up to 130ms before
> the interrupt fired.
> 
> CMD11 is only sent during card insertion, so this extra timeout
> shouldn't be terrible.
> 
> ---
> 
> -Doug
> 



Re: [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part

2015-04-06 Thread Ming Lei
On Mon, Apr 6, 2015 at 9:51 PM, Jarod Wilson  wrote:
>>
>> Note: patch 6/6 in the series makes this whole while() loop pointless,
>> since the possibility of the -EBUSY return goes away.
>
> Minor clarification: the -EBUSY due to the trylock, which is why that
> retry loop exists, goes away. You *could* still get an -EBUSY through
> blkdev_reread_part()->rescan_partitions()->drop_partitions() if
> bdev->bd_part_count is non-zero.

Unlike the trylock(&bd_mutex) case, inside the kernel it should be the
driver's responsibility to avoid that before rescanning partitions. That
said, a retry may not be needed for this case.

Thanks,


Re: [PATCH 4/4 V6] workqueue: Allow modifying low level unbound workqueue cpumask

2015-04-06 Thread Tejun Heo
Hello, Lai.

On Tue, Apr 07, 2015 at 09:25:59AM +0800, Lai Jiangshan wrote:
> On 04/06/2015 11:53 PM, Tejun Heo wrote:
> > On Thu, Apr 02, 2015 at 07:14:42PM +0800, Lai Jiangshan wrote:
> >>/* make a copy of @attrs and sanitize it */
> >>copy_workqueue_attrs(new_attrs, attrs);
> >> -  cpumask_and(new_attrs->cpumask, new_attrs->cpumask, 
> >> wq_unbound_global_cpumask);
> >> +  copy_workqueue_attrs(pwq_attrs, attrs);
> >> +  cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);
> >> +  cpumask_and(pwq_attrs->cpumask, pwq_attrs->cpumask, unbound_cpumask);
> > 
> > Hmmm... why do we need to keep track of both cpu_possible_mask and
> > unbound_cpumask?  Can't we just make unbound_cpumask replace
> > cpu_possible_mask for unbound workqueues?
> > 
> 
> I want to save the original user-setting cpumask.
> 
> Any time wq_unbound_global_cpumask is changed,
> the new effective cpumask is
> the-original-user-setting-cpumask & wq_unbound_global_cpumask
> instead of
> the-last-effective-cpumask & wq_unbound_global_cpumask.

Yes, I get that, but that'd require just tracking the original
configured value and the unbound_cpumask masked value, no?  What am I
missing?

Thanks.

-- 
tejun
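
The distinction being discussed, recomputing from the saved user request
rather than from the last effective mask, can be sketched in standalone C.
This is only an illustration: struct toy_wq and toy_apply_global() are
hypothetical names, and plain unsigned long bitmasks stand in for cpumasks.

```c
#include <assert.h>
#include <stddef.h>

struct toy_wq {
	unsigned long user_mask;	/* what the user asked for; never masked */
	unsigned long effective_mask;	/* user_mask & current global mask */
};

/* Recompute from the saved request, not from the previous effective mask. */
static void toy_apply_global(struct toy_wq *wq, unsigned long global_mask)
{
	assert(wq != NULL);
	wq->effective_mask = wq->user_mask & global_mask;
}
```

With a user mask of 0xF, changing the global mask from 0x3 to 0xC yields
0xC here, whereas masking the previous effective value (0x3) would have
yielded 0.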


Re: + zram-fix-error-return-code.patch added to -mm tree

2015-04-06 Thread Sergey Senozhatsky
On (04/06/15 12:43), a...@linux-foundation.org wrote:
> From: Julia Lawall 
> Subject: zram: fix error return code
> 
> Return a negative error code on failure.
> 
[..]
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
> 
> Signed-off-by: Julia Lawall 
> Cc: Minchan Kim 
> Cc: Nitin Gupta 
> Signed-off-by: Andrew Morton 
> ---

good catch.
Acked-by: Sergey Senozhatsky 

>  drivers/block/zram/zram_drv.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff -puN drivers/block/zram/zram_drv.c~zram-fix-error-return-code 
> drivers/block/zram/zram_drv.c
> --- a/drivers/block/zram/zram_drv.c~zram-fix-error-return-code
> +++ a/drivers/block/zram/zram_drv.c
> @@ -1188,6 +1188,7 @@ static int zram_add(int device_id)
>   if (!queue) {
>   pr_err("Error allocating disk queue for device %d\n",
>   device_id);
> + ret = -ENOMEM;
>   goto out_free_idr;
>   }
>  
> @@ -1198,6 +1199,7 @@ static int zram_add(int device_id)
>   if (!zram->disk) {
>   pr_warn("Error allocating disk structure for device %d\n",
>   device_id);
> + ret = -ENOMEM;
>   goto out_free_queue;
>   }

I think we can drop the default `ret' value and just return an explicit
`-ENOMEM' in the !zram case.

---
 drivers/block/zram/zram_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index fe67ebb..f444c15 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1164,11 +1164,11 @@ static int zram_add(int device_id)
 {
struct zram *zram;
struct request_queue *queue;
-   int ret = -ENOMEM;
+   int ret;
 
zram = kzalloc(sizeof(struct zram), GFP_KERNEL);
if (!zram)
-   return ret;
+   return -ENOMEM;
 
if (device_id < 0) {
/* generate new device_id */
-- 
2.4.0.rc1
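
For readers following along, here is a minimal userspace sketch of the error
path pattern discussed above: no default value for `ret', and every failure
branch sets its own code before jumping to the matching unwind label. The
names `struct fake_dev' and `fake_dev_add()', and the fail_* switches, are
hypothetical stand-ins for illustration; this is not the zram code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device with two separately allocated parts. */
struct fake_dev {
	int *queue;
	int *disk;
};

/*
 * Allocate a device. fail_queue/fail_disk simulate allocation failures.
 * Returns 0 on success or a negative errno; *out is written only on success.
 */
static int fake_dev_add(struct fake_dev **out, int fail_queue, int fail_disk)
{
	struct fake_dev *dev;
	int ret;	/* no default: each failure path assigns it explicitly */

	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return -ENOMEM;	/* nothing to unwind yet */

	dev->queue = fail_queue ? NULL : malloc(sizeof(int));
	if (!dev->queue) {
		ret = -ENOMEM;
		goto out_free_dev;
	}

	dev->disk = fail_disk ? NULL : malloc(sizeof(int));
	if (!dev->disk) {
		ret = -ENOMEM;
		goto out_free_queue;
	}

	*out = dev;
	return 0;

out_free_queue:
	free(dev->queue);
out_free_dev:
	free(dev);
	return ret;
}
```

With no default value, a failure branch that forgets to set `ret' tends to
show up as a compiler warning about an uninitialized variable rather than as a
silently wrong return code.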



Re: [PATCH 02/13] thermal: trivial: fix typo in comment

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:49PM +0100, Sascha Hauer wrote:
> Signed-off-by: Sascha Hauer 

Acked-by: Eduardo Valentin 

> ---
>  drivers/thermal/thermal_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 0e4ad7c..c735ac4c 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -402,7 +402,7 @@ static void handle_thermal_trip(struct 
> thermal_zone_device *tz, int trip)
>  }
>  
>  /**
> - * thermal_zone_get_temp() - returns its the temperature of thermal zone
> + * thermal_zone_get_temp() - returns the temperature of a thermal zone
>   * @tz: a valid pointer to a struct thermal_zone_device
>   * @temp: a valid pointer to where to store the resulting temperature.
>   *
> -- 
> 2.1.4
> 




Re: [PATCH 01/13] thermal: Make temperatures consistently unsigned long

2015-04-06 Thread Eduardo Valentin
On Fri, Mar 27, 2015 at 08:07:50PM +0100, Sascha Hauer wrote:
> On Fri, Mar 27, 2015 at 10:18:14AM +, Punit Agrawal wrote:
> > Hi Sascha,
> > 
> > Sascha Hauer  writes:
> > 
> > > The thermal framework uses int, long and unsigned long for temperatures
> > > in millicelsius. The majority of functions uses unsigned long, so change
> > > > the remaining functions to use this type as well.
> > >
> > > Signed-off-by: Sascha Hauer 
> > 
> > I'd suggest changing to long instead. It would allow the use of the
> > thermal framework in environments where temperatures are below 0C -
> > quite easily reached in many parts of the world.
> 
> I agree to use a signed type. I also found it not so nice that the thermal
> core does not support negative temperatures. I only chose unsigned long
> because the patch got smallest that way, but I already expected this
> answer ;)
> We could also use int instead of long. INT_MAX °mC is still enough for using
> a computer on the surface of the sun (Not for the center though)

Agreed, int is the preferred type.
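
To make the type argument concrete, here is a small sketch, assuming a
platform where int is at least 32 bits wide. The helper names are hypothetical
and are not part of the thermal framework. It shows that a signed int in
millicelsius handles sub-zero readings, how much headroom INT_MAX leaves, and
what happens to a negative reading stored in an unsigned long.

```c
#include <limits.h>

/* Whole degrees Celsius from millicelsius; C division truncates toward zero. */
static int mcelsius_to_celsius(int temp_mc)
{
	return temp_mc / 1000;
}

/* Largest whole-degree temperature an int holding millicelsius can express. */
static int max_celsius(void)
{
	return INT_MAX / 1000;
}

/* What an unsigned long field ends up holding for a given signed reading. */
static unsigned long store_as_unsigned(int temp_mc)
{
	return (unsigned long)temp_mc;	/* negative values wrap modulo ULONG_MAX + 1 */
}
```

With a 32-bit int, max_celsius() is 2147483, while store_as_unsigned(-5000)
yields a huge positive number instead of -5 C.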


> 
> Sascha
> 
> -- 
> Pengutronix e.K.   | |
> Industrial Linux Solutions | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
> Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |




linux-next: manual merge of the arm64 tree with the arm-soc tree

2015-04-06 Thread Stephen Rothwell
Hi Catalin,

Today's linux-next merge of the arm64 tree got a conflict in
arch/arm64/configs/defconfig between commit d7f64a44356c ("arm64: qcom:
Add support for Qualcomm MSM8916 SoC") from the arm-soc tree and commit
475bfd3d67fa ("arm64: defconfig: updates for 4.1") from the arm64 tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/arm64/configs/defconfig
index 96bba367f80c,e07896c819ef..
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@@ -31,10 -31,12 +31,14 @@@ CONFIG_MODULES=
  CONFIG_MODULE_UNLOAD=y
  # CONFIG_BLK_DEV_BSG is not set
  # CONFIG_IOSCHED_DEADLINE is not set
+ CONFIG_ARCH_EXYNOS7=y
  CONFIG_ARCH_FSL_LS2085A=y
  CONFIG_ARCH_MEDIATEK=y
 +CONFIG_ARCH_QCOM=y
 +CONFIG_ARCH_SPRD=y
+ CONFIG_ARCH_SEATTLE=y
+ CONFIG_ARCH_TEGRA=y
+ CONFIG_ARCH_TEGRA_132_SOC=y
  CONFIG_ARCH_THUNDER=y
  CONFIG_ARCH_VEXPRESS=y
  CONFIG_ARCH_XGENE=y
@@@ -105,9 -104,10 +111,11 @@@ CONFIG_VIRTIO_CONSOLE=
  # CONFIG_HW_RANDOM is not set
  CONFIG_SPI=y
  CONFIG_SPI_PL022=y
 +CONFIG_PINCTRL_MSM8916=y
  CONFIG_GPIO_PL061=y
  CONFIG_GPIO_XGENE=y
+ CONFIG_POWER_RESET_XGENE=y
+ CONFIG_POWER_RESET_SYSCON=y
  # CONFIG_HWMON is not set
  CONFIG_REGULATOR=y
  CONFIG_REGULATOR_FIXED_VOLTAGE=y
@@@ -133,10 -133,9 +141,11 @@@ CONFIG_MMC_SPI=
  CONFIG_RTC_CLASS=y
  CONFIG_RTC_DRV_EFI=y
  CONFIG_RTC_DRV_XGENE=y
+ CONFIG_VIRTIO_PCI=y
  CONFIG_VIRTIO_BALLOON=y
  CONFIG_VIRTIO_MMIO=y
 +CONFIG_COMMON_CLK_QCOM=y
 +CONFIG_MSM_GCC_8916=y
  # CONFIG_IOMMU_SUPPORT is not set
  CONFIG_PHY_XGENE=y
  CONFIG_EXT2_FS=y




Re: [PATCH 01/13] thermal: Make temperatures consistently unsigned long

2015-04-06 Thread Eduardo Valentin
On Thu, Mar 26, 2015 at 04:53:48PM +0100, Sascha Hauer wrote:
> The thermal framework uses int, long and unsigned long for temperatures
> in millicelsius. The majority of functions uses unsigned long, so change
> the remaining functions to use this type as well.

I believe it makes sense to change them all to int. int covers the
required temperature range.

Rui is just introducing the concept of invalid temp, which is below 0 K.

> 
> Signed-off-by: Sascha Hauer 
> ---
>  drivers/thermal/thermal_core.c | 10 +-
>  include/linux/thermal.h|  6 +++---

This change is not as straightforward as it looks.

In order to standardize this, apart from the thermal core
we will need to make drivers aware of the change too.
That will require changing the thermal zone device ops and all its
users, for the trip temperature, temperature, and hysteresis cases.

>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 174d3bc..0e4ad7c 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -378,7 +378,7 @@ static void handle_critical_trips(struct 
> thermal_zone_device *tz,
>  
>   if (trip_type == THERMAL_TRIP_CRITICAL) {
>   dev_emerg(&tz->device,
> -   "critical temperature reached(%d C),shutting down\n",
> +   "critical temperature reached(%lu C),shutting down\n",
> tz->temperature / 1000);
>   orderly_poweroff(true);
>   }
> @@ -453,7 +453,7 @@ EXPORT_SYMBOL_GPL(thermal_zone_get_temp);
>  
>  static void update_temperature(struct thermal_zone_device *tz)
>  {
> - long temp;
> + unsigned long temp;
>   int ret;
>  
>   ret = thermal_zone_get_temp(tz, &temp);
> @@ -469,7 +469,7 @@ static void update_temperature(struct thermal_zone_device 
> *tz)
>   mutex_unlock(&tz->lock);
>  
>   trace_thermal_temperature(tz);
> - dev_dbg(&tz->device, "last_temperature=%d, current_temperature=%d\n",
> + dev_dbg(&tz->device, "last_temperature=%lu, current_temperature=%lu\n",
>   tz->last_temperature, tz->temperature);
>  }
>  
> @@ -512,7 +512,7 @@ static ssize_t
>  temp_show(struct device *dev, struct device_attribute *attr, char *buf)
>  {
>   struct thermal_zone_device *tz = to_thermal_zone(dev);
> - long temperature;
> + unsigned long temperature;
>   int ret;
>  
>   ret = thermal_zone_get_temp(tz, &temperature);
> @@ -520,7 +520,7 @@ temp_show(struct device *dev, struct device_attribute 
> *attr, char *buf)
>   if (ret)
>   return ret;
>  
> - return sprintf(buf, "%ld\n", temperature);
> + return sprintf(buf, "%lu\n", temperature);
>  }
>  
>  static ssize_t
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 5eac316..db6c12b 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -180,9 +180,9 @@ struct thermal_zone_device {
>   int trips;
>   int passive_delay;
>   int polling_delay;
> - int temperature;
> - int last_temperature;
> - int emul_temperature;
> + unsigned long temperature;
> + unsigned long last_temperature;
> + unsigned long emul_temperature;
>   int passive;
>   unsigned int forced_passive;
>   struct thermal_zone_device_ops *ops;

Something like the following would be required to do a standardization.
Of course, the code below requires changing drivers too:

diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 6bbe11c..84c9777 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -92,7 +92,7 @@ struct thermal_zone_device_ops {
 struct thermal_cooling_device *);
int (*unbind) (struct thermal_zone_device *,
   struct thermal_cooling_device *);
-   int (*get_temp) (struct thermal_zone_device *, unsigned long *);
+   int (*get_temp) (struct thermal_zone_device *, int *);
int (*get_mode) (struct thermal_zone_device *,
 enum thermal_device_mode *);
int (*set_mode) (struct thermal_zone_device *,
@@ -100,15 +100,15 @@ struct thermal_zone_device_ops {
int (*get_trip_type) (struct thermal_zone_device *, int,
enum thermal_trip_type *);
int (*get_trip_temp) (struct thermal_zone_device *, int,
- unsigned long *);
+ int *);
int (*set_trip_temp) (struct thermal_zone_device *, int,
- unsigned long);
+ int);
int (*get_trip_hyst) (struct thermal_zone_device *, int,
- unsigned long *);
+ int *);
int (*set_trip_hyst) (struct thermal_zone_device *, int,
- unsigned long);
-   int (*get_crit_temp) (struct thermal_zone_device *, unsigned long *);
-   int 

linux-next: build failure after merge of the omap tree

2015-04-06 Thread Stephen Rothwell
Hi Tony,

After merging the omap tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
make[2]: *** [arch/arm/boot/dts/am3517_mt_ventoux.dtb] Error 2
make[2]: *** [arch/arm/boot/dts/omap3430-sdp.dtb] Error 2
ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
make[2]: *** [arch/arm/boot/dts/omap3-beagle-xm.dtb] Error 2
ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
make[2]: *** [arch/arm/boot/dts/omap3-beagle-xm-ab.dtb] Error 2
ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
make[2]: *** [arch/arm/boot/dts/omap3-beagle.dtb] Error 2
make[2]: *** [arch/arm/boot/dts/omap3-devkit8000.dtb] Error 2
ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
make[2]: *** [arch/arm/boot/dts/omap3-cm-t3530.dtb] Error 2
ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
make[2]: *** [arch/arm/boot/dts/omap3-cm-t3730.dtb] Error 2
make[2]: *** [arch/arm/boot/dts/omap3-evm-37xx.dtb] Error 2
ERROR (phandle_references): Reference to non-existent node or label 
"omap3_scm_general"

ERROR: Input tree has errors, aborting (use -f to force output)
make[2]: *** [arch/arm/boot/dts/omap3-evm.dtb] Error 2

Presumably caused by commit b8845074cfbb ("ARM: dts: omap3: add minimal
l4 bus layout with control module support") interacting with commit
e52117638b79 ("ARM: dts: omap3: Add DT entries for OMAP 3 ISP") from
the arm-soc tree.

I applied the following merge fix patch for today (probably wrong, but
hopefully builds):

From: Stephen Rothwell 
Date: Tue, 7 Apr 2015 11:30:14 +1000
Subject: [PATCH] ARM: dts: omap3: fixup for merge conflict around
 omap3_scm_general

Signed-off-by: Stephen Rothwell 
---
 arch/arm/boot/dts/omap34xx.dtsi | 2 +-
 arch/arm/boot/dts/omap36xx.dtsi | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/omap34xx.dtsi b/arch/arm/boot/dts/omap34xx.dtsi
index 7bc8c0f72ddb..4f6b2d5b1902 100644
--- a/arch/arm/boot/dts/omap34xx.dtsi
+++ b/arch/arm/boot/dts/omap34xx.dtsi
@@ -46,7 +46,7 @@
   0x480bd800 0x017c>;
interrupts = <24>;
iommus = <&mmu_isp>;
-   syscon = <&omap3_scm_general 0xdc>;
+   syscon = <&scm_conf 0xdc>;
ti,phy-type = ;
#clock-cells = <1>;
ports {
diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi
index 3502fe00ec7d..86253de5a97a 100644
--- a/arch/arm/boot/dts/omap36xx.dtsi
+++ b/arch/arm/boot/dts/omap36xx.dtsi
@@ -78,7 +78,7 @@
   0x480bd800 0x0600>;
interrupts = <24>;
iommus = <&mmu_isp>;
-   syscon = <&omap3_scm_general 0x2f0>;
+   syscon = <&scm_conf 0x2f0>;
ti,phy-type = ;
#clock-cells = <1>;
ports {
-- 
2.1.4

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




[PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!

2015-04-06 Thread Thavatchai Makphaibulchoke
This patch fixes the problem that the ownership of a mutex acquired by an
interrupt handler(IH) gets incorrectly attributed to the interrupted thread.

This could result in an incorrect deadlock detection in function
rt_mutex_adjust_prio_chain(), causing a thread to be killed and possibly
leading to a system hang.

Here is the approach taken: when calling from an interrupt handler, instead of
attributing ownership to the interrupted task, use the idle task on the
processor to indicate that the owner is an interrupt handler.  This approach
avoids the above incorrect deadlock detection.

This also includes changes to disable priority boosting when lock owner is
the idle_task, as it is not allowed.

Kernel version 3.14.25 + patch-3.14.25-rt22

Signed-off-by: T. Makphaibulchoke 
---
Changed in v2:
- Use idle_task on the processor as rtmutex's owner instead of the
  reserved interrupt handler task value.
- Removed code to handle the reserved interrupt handler's task value.
 kernel/locking/rtmutex.c | 77 
 1 file changed, 52 insertions(+), 25 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6c40660..ae5c13f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -26,6 +26,9 @@
 
 #include "rtmutex_common.h"
 
+static int __sched __rt_mutex_trylock(struct rt_mutex *lock,
+   struct task_struct *caller);
+
 /*
  * lock->owner state tracking:
  *
@@ -51,6 +54,9 @@
  * waiters. This can happen when grabbing the lock in the slow path.
  * To prevent a cmpxchg of the owner releasing the lock, we need to
  * set this bit before looking at the lock.
+ *
+ * Owner can also be reserved value, INTERRUPT_HANDLER. In this case the mutex
+ * is owned by idle_task on the processor.
  */
 
 static void
@@ -298,7 +304,7 @@ static void __rt_mutex_adjust_prio(struct task_struct *task)
 {
int prio = rt_mutex_getprio(task);
 
-   if (task->prio != prio || dl_prio(prio))
+   if (!is_idle_task(task) && (task->prio != prio || dl_prio(prio)))
rt_mutex_setprio(task, prio);
 }
 
@@ -730,7 +736,6 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
if (waiter == rt_mutex_top_waiter(lock)) {
rt_mutex_dequeue_pi(owner, top_waiter);
rt_mutex_enqueue_pi(owner, waiter);
-
__rt_mutex_adjust_prio(owner);
if (rt_mutex_real_waiter(owner->pi_blocked_on))
chain_walk = 1;
@@ -777,10 +782,11 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
  */
 static void wakeup_next_waiter(struct rt_mutex *lock)
 {
+   struct task_struct *owner = rt_mutex_owner(lock);
struct rt_mutex_waiter *waiter;
unsigned long flags;
 
-   raw_spin_lock_irqsave(&current->pi_lock, flags);
+   raw_spin_lock_irqsave(&owner->pi_lock, flags);
 
waiter = rt_mutex_top_waiter(lock);
 
@@ -790,7 +796,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
 * boosted mode and go back to normal after releasing
 * lock->wait_lock.
 */
-   rt_mutex_dequeue_pi(current, waiter);
+   rt_mutex_dequeue_pi(owner, waiter);
 
/*
 * As we are waking up the top waiter, and the waiter stays
@@ -802,7 +808,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
 */
lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
 
-   raw_spin_unlock_irqrestore(&current->pi_lock, flags);
+   raw_spin_unlock_irqrestore(&owner->pi_lock, flags);
 
/*
 * It's safe to dereference waiter as it cannot go away as
@@ -902,6 +908,8 @@ void rt_mutex_adjust_pi(struct task_struct *task)
 static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
 void  (*slowfn)(struct rt_mutex *lock))
 {
+   /* Might sleep, should not be called in interrupt context. */
+   BUG_ON(in_interrupt());
might_sleep();
 
if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
@@ -911,12 +919,12 @@ static inline void rt_spin_lock_fastlock(struct rt_mutex 
*lock,
 }
 
 static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
-  void  (*slowfn)(struct rt_mutex 
*lock))
+   void (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
 {
if (likely(rt_mutex_cmpxchg(lock, current, NULL)))
rt_mutex_deadlock_account_unlock(current);
else
-   slowfn(lock);
+   slowfn(lock, current);
 }
 
 #ifdef CONFIG_SMP
@@ -1047,11 +1055,12 @@ static void  noinline __sched 
rt_spin_lock_slowlock(struct rt_mutex *lock)
 /*
  * Slow path to release a rt_mutex spin_lock style
  */
-static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock)
+static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock,
+   struct task_struct *task)
 {
debug_rt_mutex_unlock(lock);
 
-   

[PATCH v2 0/2] rtmutex Real-Time Linux: fix BUG at kernel/locking/rtmutex.c:997!

2015-04-06 Thread Thavatchai Makphaibulchoke
This patch series consists of two patches.

The first patch fixes the kernel BUG at kernel/locking/rtmutex.c:997!

The second patch makes some code optimizations in kernel/locking/rtmutex.c

Changed in v2:
- Use idle_task on the processor as rtmutex's owner instead of the
  reserved interrupt handler task value.
- Removed code to handle the reserved interrupt handler's task value.

Thavatchai Makphaibulchoke (2):
  rtmutex Real-Time Linux: Fixing kernel BUG at
kernel/locking/rtmutex.c:997!
  kernel/locking/rtmutex.c: some code optimization

 kernel/locking/rtmutex.c | 107 ++-
 1 file changed, 69 insertions(+), 38 deletions(-)

-- 
1.9.1



[PATCH v2 2/2] kernel/locking/rtmutex.c: some code optimization

2015-04-06 Thread Thavatchai Makphaibulchoke
Add the following code optimizations:

- Reducing the number of cmpxchgs.  Only call mark_rt_mutex_waiters() when
  needed, i.e. when the waiters bit is not already set.
- Reducing the hold time of wait_lock lock.
- Calling fixup_rt_mutex_waiters() only when needed.
- When unlocking rt_spin_lock in IRQ, alternate between attempting fast
  unlocking and attempting to lock mutex's wait_lock.
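
As an illustration of the first item in the list above, here is a minimal
userspace sketch using C11 atomics. It tests the waiters bit with a plain load
and attempts a compare-and-exchange only when the bit is clear. The names
`HAS_WAITERS' and `mark_waiters_if_needed()' are hypothetical stand-ins; this
is not the rtmutex implementation, and the attempt counter exists only to make
the saving visible.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Flag kept in the lowest bit of the owner word (hypothetical stand-in). */
#define HAS_WAITERS ((uintptr_t)1)

/*
 * Make sure the waiters bit is set in *owner.
 * Returns the number of compare-and-exchange attempts that were needed.
 */
static int mark_waiters_if_needed(_Atomic uintptr_t *owner)
{
	uintptr_t old = atomic_load(owner);
	int attempts = 0;

	/* Skip the atomic read-modify-write entirely if the bit is already set. */
	while (!(old & HAS_WAITERS)) {
		attempts++;
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(owner, &old, old | HAS_WAITERS))
			break;
	}
	return attempts;
}
```

A second call on the same word returns 0: the load sees the bit and no
compare-and-exchange is issued.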

Signed-off-by: T. Makphaibulchoke 
---
 kernel/locking/rtmutex.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index ae5c13f..cadba20 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -608,7 +608,8 @@ __try_to_take_rt_mutex(struct rt_mutex *lock, struct 
task_struct *task,
 * any more. This is fixed up when we take the ownership.
 * This is the transitional state explained at the top of this file.
 */
-   mark_rt_mutex_waiters(lock);
+   if (!((unsigned long)lock->owner & RT_MUTEX_HAS_WAITERS))
+   mark_rt_mutex_waiters(lock);
 
if (rt_mutex_owner(lock))
return 0;
@@ -832,8 +833,8 @@ static void remove_waiter(struct rt_mutex *lock,
struct rt_mutex *next_lock = NULL;
unsigned long flags;
 
-   raw_spin_lock_irqsave(&current->pi_lock, flags);
rt_mutex_dequeue(lock, waiter);
+   raw_spin_lock_irqsave(&current->pi_lock, flags);
current->pi_blocked_on = NULL;
raw_spin_unlock_irqrestore(&current->pi_lock, flags);
 
@@ -1019,11 +1020,11 @@ static void  noinline __sched 
rt_spin_lock_slowlock(struct rt_mutex *lock)
if (top_waiter != &waiter || adaptive_wait(lock, lock_owner))
schedule_rt_mutex(lock);
 
-   raw_spin_lock(&lock->wait_lock);
-
pi_lock(&self->pi_lock);
__set_current_state(TASK_UNINTERRUPTIBLE);
pi_unlock(&self->pi_lock);
+
+   raw_spin_lock(&lock->wait_lock);
}
 
/*
@@ -1038,12 +1039,6 @@ static void  noinline __sched 
rt_spin_lock_slowlock(struct rt_mutex *lock)
self->saved_state = TASK_RUNNING;
pi_unlock(&self->pi_lock);
 
-   /*
-* try_to_take_rt_mutex() sets the waiter bit
-* unconditionally. We might have to fix that up:
-*/
-   fixup_rt_mutex_waiters(lock);
-
BUG_ON(rt_mutex_has_waiters(lock) && &waiter == rt_mutex_top_waiter(lock));
BUG_ON(!RB_EMPTY_NODE(&waiter.tree_entry));
 
@@ -1096,7 +1091,14 @@ static inline void rt_spin_lock_fastunlock_in_irq(struct 
rt_mutex *lock,
return;
}
do {
+   /*
+* Alternate between fast acquire and try lock and proceed
+* to slow lock whichever succeeds first.
+*/
ret = raw_spin_trylock(&lock->wait_lock);
+   if (!ret && unlikely(rt_mutex_cmpxchg(lock, intr_owner,
+   NULL)))
+   return;
} while (!ret);
 
slowfn(lock, intr_owner);
@@ -1499,10 +1501,12 @@ rt_mutex_slowtrylock(struct rt_mutex *lock, struct 
task_struct *task)
 
ret = try_to_take_rt_mutex(lock, task, NULL);
/*
-* try_to_take_rt_mutex() sets the lock waiters
-* bit unconditionally. Clean this up.
+* try_to_take_rt_mutex() keeps the lock waiters
+* bit set when failed to grab lock. Clean this
+* in case of a failure.
 */
-   fixup_rt_mutex_waiters(lock);
+   if (!ret)
+   fixup_rt_mutex_waiters(lock);
}
 
raw_spin_unlock(&lock->wait_lock);
-- 
1.9.1



Re: [PATCH 4/4 V6] workqueue: Allow modifying low level unbound workqueue cpumask

2015-04-06 Thread Lai Jiangshan
On 04/06/2015 11:53 PM, Tejun Heo wrote:
> On Thu, Apr 02, 2015 at 07:14:42PM +0800, Lai Jiangshan wrote:
>>  /* make a copy of @attrs and sanitize it */
>>  copy_workqueue_attrs(new_attrs, attrs);
>> -cpumask_and(new_attrs->cpumask, new_attrs->cpumask, 
>> wq_unbound_global_cpumask);
>> +copy_workqueue_attrs(pwq_attrs, attrs);
>> +cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);
>> +cpumask_and(pwq_attrs->cpumask, pwq_attrs->cpumask, unbound_cpumask);
> 
> Hmmm... why do we need to keep track of both cpu_possible_mask and
> unbound_cpumask?  Can't we just make unbound_cpumask replace
> cpu_possible_mask for unbound workqueues?
> 

I want to save the original user-setting cpumask.

Any time the wq_unbound_global_cpumask is changed,
the new effective cpumask is
the-original-user-setting-cpumask & wq_unbound_global_cpumask
instead of
the-last-effective-cpumask & wq_unbound_global_cpumask.
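
The difference between the two formulas can be sketched in userspace as
follows, assuming CPU masks fit in one 64-bit word. `struct wq_masks',
`wq_set_user_mask()' and `wq_apply_global()' are hypothetical names, not the
workqueue API.

```c
#include <stdint.h>

/* Hypothetical stand-in for per-workqueue attrs: both masks are kept. */
struct wq_masks {
	uint64_t user;		/* cpumask as originally configured by the user */
	uint64_t effective;	/* user mask restricted by the global unbound mask */
};

/* Record the user's request and derive the effective mask from it. */
static void wq_set_user_mask(struct wq_masks *m, uint64_t user, uint64_t global)
{
	m->user = user;
	m->effective = user & global;
}

/* Recompute from the saved user mask, never from the last effective mask. */
static void wq_apply_global(struct wq_masks *m, uint64_t global)
{
	m->effective = m->user & global;
}
```

For example, with a user mask of 0x0f, narrowing the global mask to 0x03 and
later widening it back to 0xff restores an effective mask of 0x0f. Had the
recomputation started from the last effective mask, the result would have
stayed at 0x03.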

thanks,
Lai

> Thanks.
> 



Re: [PATCH 3/3] PCI: Set pref for mem64 resource of pcie device

2015-04-06 Thread Yinghai Lu
On Mon, Apr 6, 2015 at 3:49 PM, Bjorn Helgaas  wrote:
>
> For "[PATCH 1/3] PCI: Introduce pci_bus_addr_t", I'm waiting for an updated
> version with Kconfig tweaks so we don't break other arches.

I was thinking that you would update it manually.

>
> For "[PATCH 2/3] sparc/PCI: Add mem64 resource parsing for root bus", I'm
> waiting for a version that fixes the other of_bus_pci_get_flags() and
> pci_parse_of_flags() implementations at the same time (or an explanation
> about why we should fix only the arch/sparc version).  I don't want to fix
> one place and leave the same bug in other places.

I don't even know if other arches like powerpc support 64-bit bus addresses.

No one from powerpc reported a problem, why should we mess it up now?

I would like to see someone with access to those three kinds of machines that
support OF unify the OF support code.

Thanks

Yinghai


linux-next: build failure after merge of the arm-soc tree

2015-04-06 Thread Stephen Rothwell
Hi all,

After merging the arm-soc tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

In file included from arch/arm/boot/dts/mt8135.dtsi:18:0,
 from arch/arm/boot/dts/mt8135-evbp1.dts:16:
arch/arm/boot/dts/mt8135-pinfunc.h:18:40: fatal error: dt-bindings/pinctrl/mt65xx.h: No such file or directory
 #include <dt-bindings/pinctrl/mt65xx.h>
^

Caused by commit e6f219b8ec5e ("ARM: dts: mt8135: Add pinctrl/GPIO/EINT
node for mt8135").

I have reverted that commit for today.
-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




Re: [PATCH] add generic callbacks into compaction

2015-04-06 Thread Minchan Kim
Hello Gioh,

I wanted to have such a feature for zsmalloc.
Thanks for the work.

On Wed, Apr 01, 2015 at 08:11:30AM +0900, Gioh Kim wrote:
> I sent a patch about page allocation for less fragmentation.
> http://permalink.gmane.org/gmane.linux.kernel.mm/130599
> 
> It proposes a page allocator that allocates pages in the same pageblock
> for the drivers to move their unmovable pages. Some drivers which consume
> many pages and increase system fragmentation use the allocator to move their
> pages to decrease fragmentation.
> 
> I think I can try another approach.
> There is a compaction code for balloon pages.
> But the compaction code cannot migrate pages of other drivers.
> If there is a generic migration framework applicable to any drivers,
> drivers can register their migration functions.
> And the compaction can migrate movable pages and also driver's pages.
> 
> I'm not familiar with virtualization so I couldn't test this patch yet.
> But if mm developers agree with this approach, I will complete this patch.
> 
> I would appreciate any feedback.

Could you separate the patchset introducing the migration core from the
balloon patchset that uses the feature?

> 
> Signed-off-by: Gioh Kim 
> ---
>  drivers/virtio/virtio_balloon.c|2 ++
>  include/linux/balloon_compaction.h |   23 +---
>  include/linux/fs.h |3 ++
>  include/linux/pagemap.h|   26 ++
>  mm/balloon_compaction.c|   68 
> ++--
>  mm/compaction.c|7 ++--
>  mm/migrate.c   |   24 ++---
>  7 files changed, 129 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 0413157..cd9b8e4 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -486,6 +486,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  
>   balloon_devinfo_init(&vb->vb_dev_info);
>  #ifdef CONFIG_BALLOON_COMPACTION
> + vb->vb_dev_info.mapping = balloon_mapping_alloc(&vb->vb_dev_info,
> + &balloon_aops);
>   vb->vb_dev_info.migratepage = virtballoon_migratepage;
>  #endif
>  
> diff --git a/include/linux/balloon_compaction.h 
> b/include/linux/balloon_compaction.h
> index 9b0a15d..0af32b3 100644
> --- a/include/linux/balloon_compaction.h
> +++ b/include/linux/balloon_compaction.h
> @@ -62,6 +62,7 @@ struct balloon_dev_info {
>   struct list_head pages; /* Pages enqueued & handled to Host */
>   int (*migratepage)(struct balloon_dev_info *, struct page *newpage,
>   struct page *page, enum migrate_mode mode);
> + struct address_space *mapping;
>  };
>  
>  extern struct page *balloon_page_enqueue(struct balloon_dev_info 
> *b_dev_info);
> @@ -76,10 +77,22 @@ static inline void balloon_devinfo_init(struct 
> balloon_dev_info *balloon)
>  }
>  
>  #ifdef CONFIG_BALLOON_COMPACTION
> -extern bool balloon_page_isolate(struct page *page);
> -extern void balloon_page_putback(struct page *page);
> -extern int balloon_page_migrate(struct page *newpage,
> - struct page *page, enum migrate_mode mode);
> +extern const struct address_space_operations balloon_aops;
> +extern int balloon_page_isolate(struct page *page);
> +extern int balloon_page_putback(struct page *page);
> +extern int balloon_page_migrate(struct address_space *mapping,
> + struct page *newpage,
> + struct page *page,
> + enum migrate_mode mode);
> +
> +extern struct address_space
> +*balloon_mapping_alloc(struct balloon_dev_info *b_dev_info,
> +const struct address_space_operations *a_ops);
> +
> +static inline void balloon_mapping_free(struct address_space 
> *balloon_mapping)
> +{
> + kfree(balloon_mapping);
> +}
>  
>  /*
>   * __is_movable_balloon_page - helper to perform @page PageBalloon tests
> @@ -123,6 +136,7 @@ static inline bool isolated_balloon_page(struct page 
> *page)
>  static inline void balloon_page_insert(struct balloon_dev_info *balloon,
>  struct page *page)
>  {
> + page->mapping = balloon->mapping;
>   __SetPageBalloon(page);
>   SetPagePrivate(page);
>   set_page_private(page, (unsigned long)balloon);
> @@ -139,6 +153,7 @@ static inline void balloon_page_insert(struct 
> balloon_dev_info *balloon,
>   */
>  static inline void balloon_page_delete(struct page *page)
>  {
> + page->mapping = NULL;
>   __ClearPageBalloon(page);
>   set_page_private(page, 0);
>   if (PagePrivate(page)) {
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b4d71b5..de463b9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -368,6 +368,9 @@ struct address_space_operations {
>*/
>   int (*migratepage) (struct address_space *,
>   struct 

Re: [RFC PATCH 5 7/7] KEYS: exec request key within service thread of key creator

2015-04-06 Thread Ian Kent
On Thu, 2015-04-02 at 13:58 +0100, David Howells wrote:
> Ian Kent  wrote:
> 
> > +
> > +   /* Namespace token */
> > +   int umh_token;
> 
> If you could put it after data_len so that all the smaller-than-wordsize
> fields are together for better packing.

OK.

> 
> > +   umh_wq_put_token(key->umh_token);
> 
> Does gc.c need an extra #include for this?

Umm ... you'd think so, wonder how it compiled without kmod.h 

> 
> > +   /* If running within a container use the container namespace */
> > +   if (current->nsproxy->net_ns != &init_net)
> > +   key->umh_token = umh_wq_get_token(0, "keys");
> 
> So keys live in the networking namespace?

Perhaps checking the pid namespace would make more sense?

> 
> > -   ret = call_usermodehelper_keys(argv[0], argv, envp, keyring,
> > -  UMH_WAIT_PROC);
> > +   /* If running within a container use the container namespace */
> > +   if (key->umh_token)
> > +   ret = call_usermodehelper_keys_service(argv[0], argv, envp,
> > +  keyring, key->umh_token,
> > +  UMH_WAIT_PROC);
> > +   else
> > +   ret = call_usermodehelper_keys(argv[0], argv, envp,
> > +  keyring, UMH_WAIT_PROC);
> 
> call_usermodehelper_keys_service() would appear to be superfluous.  If
> key->umh_token is 0, you call call_usermodehelper_keys() which then calls
> call_usermodehelper_keys_service() with a 0 token...

Yeah, not really worth the additional function. IIRC there are no other
callers of call_usermodehelper_keys().

> 
> David


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 5 1/7] kmod - add workqueue service thread store

2015-04-06 Thread Ian Kent
On Thu, 2015-04-02 at 13:43 +0100, David Howells wrote:
> Ian Kent  wrote:
> 
> > +static struct umh_wq_entry *umh_wq_find_entry(int token)
> > +{
> > +   struct umh_wq_entry *this, *entry;
> > +   struct hlist_head *bucket;
> > +   unsigned int hash;
> > +
> > +   hash = hash_32((unsigned long) token, UMH_WQ_HASH_SHIFT);
> > +   bucket = &umh_wq_hash[hash];
> > +
> > +   entry = ERR_PTR(-ENOENT);
> > +   if (hlist_empty(bucket))
> > +   goto out;
> > +
> > +   hlist_for_each_entry(this, bucket, umh_wq_hlist) {
> > +   if (this->token == token) {
> > +   entry = this;
> > +   break;
> > +   }
> > +   }
> > +out:
> > +   return entry;
> > +}
> 
> Can "struct umh_wq_entry *" be used as the token?

Probably not.

For example, couldn't a user set a different workqueue_struct and have it
used for execution? I'm not sure what that would get the user, but it sounds
like the original reason we couldn't allow execution directly within the
caller's environment.
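
The concern can be sketched in userspace C. This is an illustration under assumptions, not the patch's code: the table, the modulo hash (standing in for hash_32()) and all names are hypothetical. The point is that an integer token only resolves through a table the trusted side owns, so a forged value finds nothing, whereas a raw pointer used as a token would be dereferenced as given.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_BUCKETS 8	/* hypothetical table size */

struct wq_entry {
	int token;
	const char *name;
	struct wq_entry *next;	/* bucket chain */
};

static struct wq_entry *wq_table[WQ_BUCKETS];

/* Simple modulo hash; a stand-in for the kernel's hash_32(). */
static unsigned int wq_hash(int token)
{
	return (unsigned int)token % WQ_BUCKETS;
}

/* Register an entry; only trusted code is expected to call this. */
static void wq_add(struct wq_entry *e)
{
	unsigned int h = wq_hash(e->token);

	e->next = wq_table[h];
	wq_table[h] = e;
}

/* Resolve a token; an unknown or forged token yields NULL. */
static struct wq_entry *wq_find(int token)
{
	struct wq_entry *e;

	for (e = wq_table[wq_hash(token)]; e; e = e->next)
		if (e->token == token)
			return e;
	return NULL;
}
```

A caller that invents a token gets NULL back from wq_find() and the request can be rejected, which is the property a pointer-as-token scheme would have to recreate with extra validation.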

> 
> David




Re: [PATCH 01/18 v3] tracing/drm: Remove unused TRACE_SYSTEM_STRING define

2015-04-06 Thread Masami Hiramatsu
(2015/04/03 10:38), Steven Rostedt wrote:
> From: Steven Rostedt 
> 
> The tracing infrastructure is adding a macro TRACE_SYSTEM_STRING, and
> hit the following build failure:
> 
>    In file included from include/trace/define_trace.h:90:0,
>                     from drivers/gpu/drm/.//radeon/radeon_trace.h:209,
>                     from drivers/gpu/drm/.//radeon/radeon_trace_points.c:9:
> >> include/trace/ftrace.h:28:0: warning: "TRACE_SYSTEM_STRING" redefined
>     #define TRACE_SYSTEM_STRING __app(TRACE_SYSTEM_VAR,__trace_system_name)
> 
> Seems that the DRM folks have added their own use to the
> TRACE_SYSTEM_STRING, with:
> 
>  #define TRACE_SYSTEM_STRING __stringify(TRACE_SYSTEM)
> 
> Although, I can not find its use anywhere. I could simply use another
> name, but if this macro is not being used, it should be removed.
> 
> Link: http://lkml.kernel.org/r/20150402123736.01eda...@gandalf.local.home
> 
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Jani Nikula 
> Reported-by: kbuild test robot 
> Signed-off-by: Steven Rostedt 

Reviewed-by: Masami Hiramatsu 

Thanks,

> ---
>  drivers/gpu/drm/drm_trace.h   | 1 -
>  drivers/gpu/drm/i915/i915_trace.h | 1 -
>  drivers/gpu/drm/radeon/radeon_trace.h | 1 -
>  3 files changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_trace.h b/drivers/gpu/drm/drm_trace.h
> index 27cc95f36381..ce3c42813fbb 100644
> --- a/drivers/gpu/drm/drm_trace.h
> +++ b/drivers/gpu/drm/drm_trace.h
> @@ -7,7 +7,6 @@
>  
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM drm
> -#define TRACE_SYSTEM_STRING __stringify(TRACE_SYSTEM)
>  #define TRACE_INCLUDE_FILE drm_trace
>  
>  TRACE_EVENT(drm_vblank_event,
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 6058a01b4443..d776621c8521 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -12,7 +12,6 @@
>  
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM i915
> -#define TRACE_SYSTEM_STRING __stringify(TRACE_SYSTEM)
>  #define TRACE_INCLUDE_FILE i915_trace
>  
>  /* pipe updates */
> diff --git a/drivers/gpu/drm/radeon/radeon_trace.h b/drivers/gpu/drm/radeon/radeon_trace.h
> index ce075cb08cb2..fdce4062901f 100644
> --- a/drivers/gpu/drm/radeon/radeon_trace.h
> +++ b/drivers/gpu/drm/radeon/radeon_trace.h
> @@ -9,7 +9,6 @@
>  
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM radeon
> -#define TRACE_SYSTEM_STRING __stringify(TRACE_SYSTEM)
>  #define TRACE_INCLUDE_FILE radeon_trace
>  
>  TRACE_EVENT(radeon_bo_create,
> -- 2.1.4
> 


-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu...@hitachi.com




Re: [PATCH 5/6] block: dasd_genhd: convert to blkdev_reread_part

2015-04-06 Thread Ming Lei
On Mon, Apr 6, 2015 at 9:46 PM, Jarod Wilson  wrote:
> On Sun, Apr 05, 2015 at 03:24:47PM +0800, Ming Lei wrote:
>> Also remove the obsolete comment.
>>
>> Signed-off-by: Ming Lei 
>> ---
>>  drivers/s390/block/dasd_genhd.c |    9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
>> index 90f39f7..2af4619 100644
>> --- a/drivers/s390/block/dasd_genhd.c
>> +++ b/drivers/s390/block/dasd_genhd.c
>> @@ -116,14 +116,11 @@ int dasd_scan_partitions(struct dasd_block *block)
>> rc);
>>   return -ENODEV;
>>   }
>> - /*
>> -  * See fs/partition/check.c:register_disk,rescan_partitions
>> -  * Can't call rescan_partitions directly. Use ioctl.
>> -  */
>> - rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
>> +
>> + rc = blkdev_reread_part(bdev);
>>   while (rc == -EBUSY && retry > 0) {
>>   schedule();
>> - rc = ioctl_by_bdev(bdev, BLKRRPART, 0);
>> + rc = blkdev_reread_part(bdev);
>>   retry--;
>>   DBF_DEV_EVENT(DBF_ERR, block->base,
>> "scan partitions error, retry %d rc %d",
>
> Note: patch 6/6 in the series makes this whole while() loop pointless,
> since the possibility of the -EBUSY return goes away.

Yes, I do see that, and the while() can be removed after this patchset
is merged.
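
For reference, the retry pattern under discussion can be sketched in userspace C. This is only an illustration: fake_reread_part() is a hypothetical stand-in for blkdev_reread_part(), and the schedule() call and debug logging from the driver are left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for blkdev_reread_part(): busy for the first N calls. */
static int busy_calls_left;

static int fake_reread_part(void)
{
	if (busy_calls_left > 0) {
		busy_calls_left--;
		return -EBUSY;
	}
	return 0;
}

/* Call once, then retry up to 'retry' more times while the result is -EBUSY. */
static int scan_with_retry(int retry)
{
	int rc = fake_reread_part();

	while (rc == -EBUSY && retry > 0) {
		rc = fake_reread_part();
		retry--;
	}
	return rc;
}
```

If the callee can never return -EBUSY, the loop condition is false on the first check and the body never runs, which is why the loop can be dropped once that is the case.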

Thanks,
Ming Lei


Re: [RFC][PATCH 00/17 v2] tracing: Use TRACE_DEFINE_ENUM() to show enum values

2015-04-06 Thread Masami Hiramatsu
(2015/04/02 10:56), Steven Rostedt wrote:
> As there are many tracepoints that use __print_symbolic() to translate
> numbers into ASCII strings, and several of these translate enums as
> well, it causes a problem for user space tools that read the tracepoint
> format files and have to translate the binary data to their associated
> strings.
> 
> For example, with the tlb_flush tracepoint, we have this in the format
> file:
> 
> print fmt: "pages:%ld reason:%s (%d)", REC->pages,
>  __print_symbolic(REC->reason,
>{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
>{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
>{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
>{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }), REC->reason
> 
> Now, userspace does not know what the value of TLB_REMOTE_SHOOTDOWN is.
> To solve this, a new macro is created as a helper to allow tracepoints
> to export enums they use to userspace. This macro is called,
> TRACE_DEFINE_ENUM(), such that
> 
>  TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
> 
> will convert the "print fmt"s in the format files to its actual value
> and no longer display the enum name.
> 
> On boot up (or module load), the enums saved via TRACE_DEFINE_ENUM()
> will be searched for in the TP_printk()s of the tracepoints. Logic
> knows enough to ignore quoted text.
> 
> For debugging, a new file is still added in the tracing directory
> to show what enums were added, their values and the TRACE_SYSTEM that
> added them:
> 
>  # cat /sys/kernel/debug/tracing/enum_map
> TLB_LOCAL_MM_SHOOTDOWN 3 (tlb)
> TLB_LOCAL_SHOOTDOWN 2 (tlb)
> TLB_REMOTE_SHOOTDOWN 1 (tlb)
> TLB_FLUSH_ON_TASK_SWITCH 0 (tlb)
> 
> And the output of the tlb_flush format is now:
> 
> print fmt: "pages:%ld reason:%s (%d)", REC->pages,
>  __print_symbolic(REC->reason,
>{ 0, "flush on task switch" },
>{ 1, "remote shootdown" },
>{ 2, "local shootdown" },
>{ 3, "local mm shootdown" }), REC->reason
> 
> And userspace tools can easily parse that without special handling.

Great! :)
Thank you for updating the series :D
I'm OK with this series now.
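
The substitution described in the quoted cover letter (replace an enum name in the "print fmt" with its value, while ignoring quoted text) can be sketched in userspace C. This is a rough illustration, not the kernel's implementation: replace_enum() and is_ident() are hypothetical names, and the quote handling is simplified (a quote preceded by a backslash is treated as escaped, nothing more).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int is_ident(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/*
 * Copy fmt into out, replacing each whole-word occurrence of name that
 * lies outside double quotes with the decimal value.
 * Returns 0 on success, -1 on bad arguments or if out is too small.
 */
static int replace_enum(const char *fmt, const char *name, long value,
			char *out, size_t outsz)
{
	size_t nlen = strlen(name), o = 0;
	int in_quote = 0;
	const char *p = fmt;

	if (nlen == 0 || outsz == 0)
		return -1;

	while (*p) {
		/* Toggle quote state on an unescaped double quote. */
		if (*p == '"' && (p == fmt || p[-1] != '\\'))
			in_quote = !in_quote;

		if (!in_quote && strncmp(p, name, nlen) == 0 &&
		    (p == fmt || !is_ident(p[-1])) && !is_ident(p[nlen])) {
			int n = snprintf(out + o, outsz - o, "%ld", value);

			if (n < 0 || (size_t)n >= outsz - o)
				return -1;
			o += (size_t)n;
			p += nlen;
			continue;
		}
		if (o + 1 >= outsz)
			return -1;
		out[o++] = *p++;
	}
	out[o] = '\0';
	return 0;
}
```

Applied to the fragment { TLB_REMOTE_SHOOTDOWN, "remote shootdown" } with the value 1, the sketch rewrites the identifier and leaves the quoted string untouched, matching the before and after format output shown in the cover letter.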

> 
> Local SHA1: a6862181206543b6493c73690f322868c86de0ea
> 
> 
> Steven Rostedt (Red Hat) (17):
>   tracing: Add TRACE_SYSTEM_VAR to intel-sst
>   tracing: Add TRACE_SYSTEM_VAR to kvm-s390
>   tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
>   tracing: Give system name a pointer
>   tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
>   tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
>   tracing: Allow for modules to export their trace enums as well
>   tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
>   tracing: Show the mapped enums in enum_map file
>   x86/tlb/trace: Export enums in used by tlb_flush tracepoint
>   net/9p/tracing: Export enums in tracepoints to userspace
>   f2fs: Export the enums in the tracepoints to userspace
>   irq/tracing: Export enums in tracepoints to user space
>   mm: tracing: Export enums in tracepoints to user space
>   SUNRPC: Export enums in tracepoints to user space
>   v4l: Export enums used by tracepoints to user space
>   writeback: Export enums used by tracepoint to user space
> 
> 
>  arch/s390/kvm/trace-s390.h                 |   7 +
>  drivers/usb/host/xhci-trace.h              |   7 +
>  include/asm-generic/vmlinux.lds.h          |   5 +-
>  include/linux/ftrace_event.h               |   4 +-
>  include/linux/module.h                     |   2 +
>  include/linux/tracepoint.h                 |   8 +
>  include/trace/events/9p.h                  | 157
>  include/trace/events/f2fs.h                |  30
>  include/trace/events/intel-sst.h           |   7 +
>  include/trace/events/irq.h                 |  39 ++--
>  include/trace/events/migrate.h             |  42 +++--
>  include/trace/events/sunrpc.h              |  62 +--
>  include/trace/events/tlb.h                 |  30 +++-
>  include/trace/events/v4l2.h                |  75 +---
>  include/trace/events/writeback.h           |  33 +++-
>  include/trace/ftrace.h                     |  41 -
>  kernel/module.c                            |   3 +
>  kernel/trace/trace.c                       | 276 -
>  kernel/trace/trace.h                       |   2 +
>  kernel/trace/trace_events.c                |  98 +-
>  samples/trace_events/trace-events-sample.h |  84 -
>  21 files changed, 853 insertions(+), 159 deletions(-)
> 
> 


-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu...@hitachi.com



Re: [PATCH 3/3] PCI: Set pref for mem64 resource of pcie device

2015-04-06 Thread David Miller
From: Bjorn Helgaas 
Date: Mon, 6 Apr 2015 17:06:38 -0500

> But this is a general change that affects all platforms, and it's late in
> the cycle for something as invasive as this.  I'd rather include your patch
> in the v4.1 merge window, and revert d63e2e1f3df9 ("sparc/PCI: Clip bridge
> windows to fit in upstream windows") for v4.0.

I would kindly ask that we not proceed this way, and instead use the
change that implements the fix properly.


Question about switch() of soc_mbus_config_compatible()

2015-04-06 Thread Kuninori Morimoto

Hi Mauro, Guennadi

I would like to ask you about the switch() in soc_mbus_config_compatible(),
in linux/drivers/media/platform/soc_camera/soc_mediabus.c:

unsigned int soc_mbus_config_compatible(const struct v4l2_mbus_config *cfg,
                                        unsigned int flags)
{
        ...

        switch (cfg->type) {
        case V4L2_MBUS_PARALLEL:
                hsync = common_flags & (V4L2_MBUS_HSYNC_ACTIVE_HIGH |
                                        V4L2_MBUS_HSYNC_ACTIVE_LOW);
                vsync = common_flags & (V4L2_MBUS_VSYNC_ACTIVE_HIGH |
=>                                      V4L2_MBUS_VSYNC_ACTIVE_LOW);
        case V4L2_MBUS_BT656:
                pclk = common_flags & (V4L2_MBUS_PCLK_SAMPLE_RISING |
                                       V4L2_MBUS_PCLK_SAMPLE_FALLING);
                data = common_flags & (V4L2_MBUS_DATA_ACTIVE_HIGH |
                                       V4L2_MBUS_DATA_ACTIVE_LOW);
                mode = common_flags & (V4L2_MBUS_MASTER | V4L2_MBUS_SLAVE);
                return (!hsync || !vsync || !pclk || !data || !mode) ?
                        0 : common_flags;
        ...
}

Here, after the V4L2_MBUS_PARALLEL case, there is no break, no return, and
no /* FALL THROUGH */ comment.
It is very confusing. What is the intention?
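
One possible reading, sketched below as a standalone C example, is that the fall-through is deliberate: the parallel case does its extra sync checks and then shares the remaining checks with BT656. Everything in the sketch is hypothetical (the enum, the flag names, and in particular the assumption that hsync and vsync start out true so the BT656 path passes those checks); it is not taken from soc_mediabus.c.

```c
#include <assert.h>
#include <stdbool.h>

enum bus_type { BUS_PARALLEL, BUS_BT656 };	/* hypothetical */

#define F_HSYNC (1u << 0)
#define F_VSYNC (1u << 1)
#define F_PCLK  (1u << 2)

/*
 * Return flags if every signal the bus type needs is present, else 0.
 * hsync/vsync default to true (an assumption of this sketch) so that
 * BT656, which skips the first case body, is not failed on sync.
 */
static unsigned int bus_compatible(enum bus_type type, unsigned int flags)
{
	bool hsync = true, vsync = true, pclk;

	switch (type) {
	case BUS_PARALLEL:
		hsync = flags & F_HSYNC;
		vsync = flags & F_VSYNC;
		/* fall through - parallel also needs the checks below */
	case BUS_BT656:
		pclk = flags & F_PCLK;
		return (!hsync || !vsync || !pclk) ? 0 : flags;
	}
	return 0;
}
```

If that reading is right, an explicit fall-through comment at the marked line would remove the ambiguity without changing behaviour.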


