date:20140120

VDSO support for 32bit time functions

2014-01-20 Thread Greg KH

Hi Stefani,

About a year ago you posted a big patch to implement VDSO support for
32bit functions, and the response was a request to clean it up a bit by
breaking up the generic bits into a series to make it easier to review /
apply.

The patch I'm referring to can be found here:
http://thread.gmane.org/gmane.linux.kernel/1411713

Did that ever happen?

If not, any specific reason why?  Do you have a newer version somewhere
for "modern" kernel versions?

If you're not interested in this anymore, mind if I take it up based on
your last version?

I'm getting some complaints that this type of thing would really be good
to have, as 32bit gettimeofday() on 64bit kernels is really slow (65
nanoseconds on 32bit vs. 17 nanoseconds on 64bit on a high-end i7 processor).

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
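
A rough way to reproduce the kind of per-call numbers quoted above is a
userspace loop like the one below. This is only a sketch: the helper name
gettimeofday_cost_ns() is made up for illustration, and the result depends
heavily on the CPU, the clocksource and whether the call goes through the
vDSO or a real syscall.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/*
 * Average cost of one gettimeofday() call in nanoseconds, measured over
 * `iterations` back-to-back calls. Hypothetical helper, not kernel code.
 * Returns a negative value if `iterations` is not positive.
 */
static double gettimeofday_cost_ns(long iterations)
{
	struct timespec start, end;
	struct timeval tv;
	long i;

	if (iterations <= 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++)
		gettimeofday(&tv, NULL);
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* Total elapsed nanoseconds divided by the number of calls. */
	return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
		(double)(end.tv_nsec - start.tv_nsec)) / (double)iterations;
}
```

Building the same file once as a 32bit and once as a 64bit binary (for
example with -m32 and -m64, where the toolchain supports both) and calling
gettimeofday_cost_ns(10000000) in each should show whether the gap
described in the mail is present on a given machine.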

Re: [PATCH v3] phy: Add new Exynos5 USB 3.0 PHY driver

2014-01-20 Thread Vivek Gautam

On Mon, Jan 20, 2014 at 7:12 PM, Vivek Gautam  wrote:
> Add a new driver for the USB 3.0 PHY on Exynos5 series of SoCs.
> The new driver uses the generic PHY framework and will interact
> with DWC3 controller present on Exynos5 series of SoCs.
> Thereby, removing old phy-samsung-usb3 driver and related code
> used until now which was based on usb/phy framework.
>
> Signed-off-by: Vivek Gautam 

Sorry, I forgot to add the Reviewed-by tag from Felipe. :-(
I will add it in the next version of the patch, after getting feedback on this one.

> ---
>
> Changes from v2:
> 1) Added support for multiple PHYs (UTMI+ and PIPE3) and
>related changes in the driver structuring.
> 2) Added a xlate function to get the required phy out of
>number of PHYs in multiple PHY scenario.
> 3) Changed the names of few structures and variables to
>have a clearer meaning.
> 4) Added 'usb3phy_config' structure to take care of multiple
>phys for a SoC having 'exynos5_usb3phy_drv_data' driver data.
> 5) Not deleting support for old driver 'phy-samsung-usb3' until
>required support for generic phy is added to DWC3.
>
>  .../devicetree/bindings/phy/samsung-phy.txt |  49 ++
>  drivers/phy/Kconfig                         |   8 +
>  drivers/phy/Makefile                        |   1 +
>  drivers/phy/phy-exynos5-usb3.c              | 621
>  4 files changed, 679 insertions(+)
>  create mode 100644 drivers/phy/phy-exynos5-usb3.c
>
> diff --git a/Documentation/devicetree/bindings/phy/samsung-phy.txt 
> b/Documentation/devicetree/bindings/phy/samsung-phy.txt
> index c0fccaa..57079f8 100644
> --- a/Documentation/devicetree/bindings/phy/samsung-phy.txt
> +++ b/Documentation/devicetree/bindings/phy/samsung-phy.txt
> @@ -20,3 +20,52 @@ Required properties:
>  - compatible : should be "samsung,exynos5250-dp-video-phy";
>  - reg : offset and length of the Display Port PHY register set;
>  - #phy-cells : from the generic PHY bindings, must be 0;
> +
> +Samsung Exynos5 SoC series USB 3.0 PHY controller
> +--
> +
> +Required properties:
> +- compatible : Should be set to one of the following supported values:
> +   - "samsung,exynos5250-usb3phy" - for exynos5250 SoC,
> +   - "samsung,exynos5420-usb3phy" - for exynos5420 SoC.
> +- reg : Register offset and length of USB 3.0 PHY register set;
> +- clocks: Clock IDs array as required by the controller
> +- clock-names: names of clocks corresponding to IDs in the clock property;
> +   Required clocks:
> +   - phy: main PHY clock (same as USB 3.0 controller IP clock),
> +   used for register access.
> +   - usb3phy_refclk: PHY's reference clock (usually crystal clock),
> +   associated by phy name, used to determine bit values for
> +   clock settings register.
> +   Additional clock required for Exynos5420:
> +   - usb30_sclk_100m: Additional special clock used for PHY operation
> +  depicted as 'sclk_usbphy30' in CMU of Exynos5420.
> +- samsung,syscon-phandle: phandle for syscon interface, which is used to
> +   control pmu registers for power isolation.
> +- #phy-cells : from the generic PHY bindings, must be 1;
> +
> +For "samsung,exynos5250-usb3phy" and "samsung,exynos5420-usb3phy" compatible
> +PHYs, the second cell in the PHY specifier identifies the PHY id, which is
> +interpreted as follows:
> +  0 - UTMI+ type phy,
> +  1 - PIPE3 type phy,
> +
> +Example:
> +   usb3_phy: usbphy@1210 {
> +   compatible = "samsung,exynos5250-usb3phy";
> +   reg = <0x1210 0x100>;
> +   clocks = < 286>, < 1>;
> +   clock-names = "phy", "usb3phy_refclk";
> +   samsung,syscon-phandle = <_syscon>;
> +   #phy-cells = <1>;
> +   };
> +
> +- aliases: For SoCs like Exynos5420 having multiple USB PHY controllers,
> +  'usb3_phy' nodes should have numbered alias in the aliases node,
> +  in the form of usb3phyN, N = 0, 1... (depending on number of
> +  controllers).
> +Example:
> +   aliases {
> +   usb3phy0 = _phy0;
> +   usb3phy1 = _phy1;
> +   };
> diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
> index 330ef2d..32f9f38 100644
> --- a/drivers/phy/Kconfig
> +++ b/drivers/phy/Kconfig
> @@ -51,4 +51,12 @@ config PHY_EXYNOS_DP_VIDEO
> help
>   Support for Display Port PHY found on Samsung EXYNOS SoCs.
>
> +config PHY_EXYNOS5_USB3
> +   tristate "Exynos5 SoC series USB 3.0 PHY driver"
> +   depends on ARCH_EXYNOS5
> +   select GENERIC_PHY
> +   select MFD_SYSCON
> +   help
> + Enable USB 3.0 PHY support for Exynos 5 SoC series
> +
>  endmenu
> diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile
> index d0caae9..9c06a61 100644
> --- a/drivers/phy/Makefile
> +++ b/drivers/phy/Makefile
> @@ -7,3 +7,4 @@

RE: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support

2014-01-20 Thread bharat.bhus...@freescale.com



> -Original Message-
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson
> Sent: Saturday, January 18, 2014 2:06 AM
> To: Sethi Varun-B16395
> Cc: io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support
> 
> RFC: This is not complete but I want to share with Varun the
> direction I'm thinking about.  In particular, I'm really not
> sure if we want to introduce a "v2" interface version with
> slightly different unmap semantics.  QEMU doesn't care about
> the difference, but other users might.  Be warned, I'm not even
> sure if this code works at the moment.  Thanks,
> 
> Alex
> 
> 
> We currently have a problem that we cannot support advanced features
> of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee
> that those features will be supported by all of the hardware units
> involved with the domain over its lifetime.  For instance, the Intel
> VT-d architecture does not require that all DRHDs support snoop
> control.  If we create a domain based on a device behind a DRHD that
> does support snoop control and enable SNP support via the IOMMU_CACHE
> mapping option, we cannot then add a device behind a DRHD which does
> not support snoop control or we'll get reserved bit faults from the
> SNP bit in the pagetables.  To add to the complexity, we can't know
> the properties of a domain until a device is attached.
> 
> We could pass this problem off to userspace and require that a
> separate vfio container be used, but we don't know how to handle page
> accounting in that case.  How do we know that a page pinned in one
> container is the same page as one pinned in a different container, and
> avoid double billing the user for the page?
> 
> The solution is therefore to support multiple IOMMU domains per
> container.  In the majority of cases, only one domain will be required
> since hardware is typically consistent within a system.  However, this
> provides us the ability to validate compatibility of domains and
> support mixed environments where page table flags can be different
> between domains.
> 
> To do this, our DMA tracking needs to change.  We currently try to
> coalesce user mappings into as few tracking entries as possible.  The
> problem then becomes that we lose granularity of user mappings.  We've
> never guaranteed that a user is able to unmap at a finer granularity
> than the original mapping, but we must honor the granularity of the
> original mapping.  This coalescing code is therefore removed, allowing
> only unmaps covering complete maps.  The change in accounting is
> fairly small here, a typical QEMU VM will start out with roughly a
> dozen entries, so it's arguable if this coalescing was ever needed.
> 
> We also move IOMMU domain creation to the point where a group is
> attached to the container.  An interesting side-effect of this is that
> we now have access to the device at the time of domain creation and
> can probe the devices within the group to determine the bus_type.
> This finally makes vfio_iommu_type1 completely device/bus agnostic.
> In fact, each IOMMU domain can host devices on different buses managed
> by different physical IOMMUs, and present a single DMA mapping
> interface to the user.  When a new domain is created, mappings are
> replayed to bring the IOMMU pagetables up to the state of the current
> container.  And of course, DMA mapping and unmapping automatically
> traverse all of the configured IOMMU domains.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  drivers/vfio/vfio_iommu_type1.c |  631 
> ---
>  include/uapi/linux/vfio.h   |1
>  2 files changed, 329 insertions(+), 303 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4fb7a8f..983aae5 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -30,7 +30,6 @@
>  #include 
>  #include 
>  #include 
> -#include/* pci_bus_type */
>  #include 
>  #include 
>  #include 
> @@ -55,11 +54,18 @@ MODULE_PARM_DESC(disable_hugepages,
>"Disable VFIO IOMMU support for IOMMU hugepages.");
> 
>  struct vfio_iommu {
> - struct iommu_domain *domain;
> + struct list_head domain_list;
>   struct mutex lock;
>   struct rb_root dma_list;
> + bool v2;
> +};
> +
> +struct vfio_domain {
> + struct iommu_domain *domain;
> + struct bus_type *bus;
> + struct list_head next;
>   struct list_head group_list;
> - bool cache;
> + int prot;   /* IOMMU_CACHE */
>  };
> 
>  struct vfio_dma {
> @@ -99,7 +105,7 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu
> *iommu,
>   return NULL;
>  }
> 
> -static void vfio_insert_dma(struct vfio_iommu *iommu, struct

Re: [PATCH 2/2] sched: add statistic for rq->max_idle_balance_cost

2014-01-20 Thread Jason Low

On Mon, Jan 20, 2014 at 9:33 PM, Alex Shi  wrote:
> It's useful to track this value in debug mode.
>
> Signed-off-by: Alex Shi 
> ---
>  kernel/sched/debug.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 1e43e70..f5c529a 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -315,6 +315,7 @@ do {  
>   \
> P(sched_goidle);
>  #ifdef CONFIG_SMP
> P64(avg_idle);
> +   p64(max_idle_balance_cost);

Hi Alex,

Does this need to be P64(max_idle_balance_cost)?

Re: [PATCH] Add HID's to hid-microsoft driver of Surface Type/Touch Cover 2 to fix bug

2014-01-20 Thread Reyad Attiyat

On Tue, Jan 21, 2014 at 12:47 AM, Jiri Kosina  wrote:
> On Mon, 20 Jan 2014, Reyad Attiyat wrote:
>
>> The below patch fixes a bug 64811
>> (https://bugzilla.kernel.org/show_bug.cgi?id=64811) of the Microsoft
>> Surface Type/Touch cover 2 devices being detected as a multitouch
>> device.
>> The fix adds the HID of the two devices to hid-microsoft driver. This
>> ensures that hid-input will eventually be used for the device and not
>> hid-multitouch.
>
> Hi,
>
> your patch is missing hid_have_special_driver[] entry, therefore correct
> driver binding is not guaranteed.
>
> --
> Jiri Kosina
> SUSE Labs
>

Hi,

Thanks for reminding me of hid_have_special_driver[]. I noticed that
this device has HID_DG_CONTACTID, and the comment above
hid_have_special_driver[] says:

* Please note that for multitouch devices (driven by hid-multitouch driver),
* there is a proper autodetection and autoloading in place (based on presence
* of HID_DG_CONTACTID), so those devices don't need to be added to this list,
* as we are doing the right thing in hid_scan_usage().

This device should not be driven by hid-multitouch, as that driver does
not handle keyboard/mouse input devices.
I have included a new patch below with the entries added. I believe the
devices should still be part of this array, in case this kind of
implementation is fixed/updated.

From 291742873dcf181faf9657b41279487f31302c73 Mon Sep 17 00:00:00 2001
From: Reyad Attiyat 
Date: Tue, 21 Jan 2014 01:22:25 -0600
Subject: [PATCH 1/1] Added in HID's for Microsoft Surface Type/Touch cover 2.
 This is to fix bug 64811 where this device is detected as a multitouch device

---
 drivers/hid/hid-core.c  | 3 +++
 drivers/hid/hid-ids.h   | 4 +++-
 drivers/hid/hid-microsoft.c | 4 
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index 253fe23..88eb4a6 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -1778,6 +1778,9 @@ static const struct hid_device_id
hid_have_special_driver[] = {
 { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT,
USB_DEVICE_ID_MS_PRESENTER_8K_USB) },
 { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT,
USB_DEVICE_ID_MS_DIGITAL_MEDIA_3K) },
 { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT,
USB_DEVICE_ID_WIRELESS_OPTICAL_DESKTOP_3_0) },
+{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TYPE_COVER_2) },
+{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT,
USB_DEVICE_ID_MS_TOUCH_COVER_2) },
+
 { HID_USB_DEVICE(USB_VENDOR_ID_MONTEREY, USB_DEVICE_ID_GENIUS_KB29E) },
 { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN) },
 { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG,
USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_1) },
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index f9304cb..b523a8b 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -611,7 +611,9 @@
 #define USB_DEVICE_ID_MS_PRESENTER_8K_USB 0x0713
 #define USB_DEVICE_ID_MS_DIGITAL_MEDIA_3K 0x0730
 #define USB_DEVICE_ID_MS_COMFORT_MOUSE_4500 0x076c
-
+#define USB_DEVICE_ID_MS_TOUCH_COVER_2 0x07a7
+#define USB_DEVICE_ID_MS_TYPE_COVER_2  0x07a9
+
 #define USB_VENDOR_ID_MOJO 0x8282
 #define USB_DEVICE_ID_RETRO_ADAPTER 0x3201

diff --git a/drivers/hid/hid-microsoft.c b/drivers/hid/hid-microsoft.c
index 551795b..2599de8 100644
--- a/drivers/hid/hid-microsoft.c
+++ b/drivers/hid/hid-microsoft.c
@@ -207,6 +207,10 @@ static const struct hid_device_id ms_devices[] = {
 .driver_data = MS_NOGET },
 { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT,
USB_DEVICE_ID_MS_COMFORT_MOUSE_4500),
 .driver_data = MS_DUPLICATE_USAGES },
+{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TYPE_COVER_2),
+.driver_data = 0 },
+{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TOUCH_COVER_2),
+.driver_data = 0 },

 { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT,
USB_DEVICE_ID_MS_PRESENTER_8K_BT),
 .driver_data = MS_PRESENTER },
-- 
1.8.4.2
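
The role of a "special driver" list such as the one discussed in this
thread can be illustrated with a small standalone lookup. This is a
sketch, not the hid-core implementation: the struct and function names
are hypothetical, the vendor ID 0x045e for Microsoft is an assumption
(the patch only shows the symbolic name), and the two product IDs are
taken from the hid-ids.h hunk above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a HID device ID entry. */
struct sketch_usb_id {
	uint16_t vendor;
	uint16_t product;
};

/*
 * Devices that should be claimed by a dedicated driver.
 * 0x045e as the Microsoft vendor ID is an assumption; 0x07a7 and 0x07a9
 * are the Touch Cover 2 and Type Cover 2 IDs added by the patch.
 */
static const struct sketch_usb_id sketch_special_ids[] = {
	{ 0x045e, 0x07a7 },
	{ 0x045e, 0x07a9 },
};

/* Return true if the vendor/product pair appears in the table. */
static bool sketch_has_special_driver(uint16_t vendor, uint16_t product)
{
	size_t i;
	size_t n = sizeof(sketch_special_ids) / sizeof(sketch_special_ids[0]);

	for (i = 0; i < n; i++) {
		if (sketch_special_ids[i].vendor == vendor &&
		    sketch_special_ids[i].product == product)
			return true;
	}
	return false;
}
```

A linear scan is enough here because such tables are small and are only
consulted when a device is added.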

Re: [PATCH 1/2] vfs: Add fchmodat4 syscall: fchmodat with flag argument

2014-01-20 Thread Florian Weimer


On 01/13/2012 02:53 AM, Andrew Ayer wrote:

> This adds a 4 argument version of fchmodat (fchmodat4) that
> supports a flag argument, as specified by POSIX.  It supports
> the same two flags as fchownat: AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH.


I don't think it's possible to emulate AT_EMPTY_PATH in user space, so I 
wonder if this could be applied, and if not, why.  Thanks.


--
Florian Weimer / Red Hat Product Security Team

RE: [PATCH] cpufreq: Align all CPUs to the same frequency if using shared clock

2014-01-20 Thread Li, Zhuangzhi

Thanks for reviewing.

Sorry for the misunderstanding. On our x86 platform, we want all the CPUs to
share one policy by setting CPUFREQ_SHARED_TYPE_ALL; they do not share one HW
clock line.
If the CPUs run at different frequencies at the init stage, then nothing aligns
them to the same frequency while they are being registered, and this conflicts
with the CPUs' actual P-states.

For example:
CPU0: P0
CPU1: P14
CPU2: P14
CPU3: P14
While the CPUs are being registered, the kernel considers all of them to be in
the P0 state because they share CPU0's policy, but the other three CPUs are
really in the P14 state.
The frequencies are only aligned once the governor changes CPU0's state based
on the shared policy.

Best Regards

Li zhuangzhi
Android System Integration Shanghai
Tel: +86 (0)21 6116 4323

-Original Message-
From: Viresh Kumar [mailto:viresh.ku...@linaro.org] 
Sent: Tuesday, January 21, 2014 2:35 PM
To: Li, Zhuangzhi
Cc: Rafael J. Wysocki; cpuf...@vger.kernel.org; linux...@vger.kernel.org; Linux 
Kernel Mailing List; Liu, Chuansheng
Subject: Re: [PATCH] cpufreq: Align all CPUs to the same frequency if using 
shared clock

On 21 January 2014 08:35, lizhuangzhi  wrote:
> Some SMP systems want to make all the possible CPUs share the clock, 
> if the CPUs init frequencies aren't the same, we need to align all the 
> CPUs to the same frequency while CPUs registing to avoid mismatched 
> CPU's P-states.
>
> Signed-off-by: lizhuangzhi 
> ---
>  drivers/cpufreq/cpufreq.c |2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c 
> index 8d19f7c..d00abb5 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -991,6 +991,8 @@ static int __cpufreq_add_dev(struct device *dev, struct 
> subsys_interface *sif,
>  * CPU because it is in the same boat. */
> policy = cpufreq_cpu_get(cpu);
> if (unlikely(policy)) {
> +   /* according present policy to align all the cpus frequencies 
> */
> +   cpufreq_driver->target(policy, policy->cur, 
> + CPUFREQ_RELATION_H);

I don't really understand why this is required. CPUs sharing clocks means that
the CPUs run on the same clock line, and so all of them must be running at the
same frequency. So, why do we need to call this routine? policy->cur must be
the current freq here for the CPU in question.

--
viresh

[patch] x86, AMD, NB: silence an underflow test

2014-01-20 Thread Dan Carpenter

This is under CAP_SYS_ADMIN but Smatch complains that mask comes from
the user and the test for "mask > 0xf" can underflow.  The fix is
simple.

Signed-off-by: Dan Carpenter 

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index a54ee1d054d9..0c4e3e47d462 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -19,7 +19,7 @@ extern int amd_cache_northbridges(void);
 extern void amd_flush_garts(void);
 extern int amd_numa_init(void);
 extern int amd_get_subcaches(int);
-extern int amd_set_subcaches(int, int);
+extern int amd_set_subcaches(int, unsigned long);
 
 struct amd_l3_cache {
unsigned indices;
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 59554dca96ec..d9fceb697322 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -179,7 +179,7 @@ int amd_get_subcaches(int cpu)
return (mask >> (4 * cuid)) & 0xf;
 }
 
-int amd_set_subcaches(int cpu, int mask)
+int amd_set_subcaches(int cpu, unsigned long mask)
 {
static unsigned int reset, ban;
struct amd_northbridge *nb = node_to_amd_nb(amd_get_nb_id(cpu));
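
The underflow that Smatch warns about can be shown in isolation. The two
helpers below are hypothetical and only mirror the "mask > 0xf" test named
in the changelog: with a signed parameter a negative value slips past the
upper-bound check, while with an unsigned parameter the same bit pattern
becomes a very large number and is rejected.

```c
#include <stdbool.h>

/* Signed version: -1 is not greater than 0xf, so it is NOT rejected. */
static bool mask_rejected_signed(int mask)
{
	return mask > 0xf;
}

/*
 * Unsigned version: a caller passing -1 ends up with ULONG_MAX after the
 * implicit conversion, which is greater than 0xf and therefore rejected.
 */
static bool mask_rejected_unsigned(unsigned long mask)
{
	return mask > 0xf;
}
```

This is why changing only the parameter type, as the patch does, is
enough to make the existing range check cover negative input as well.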

Re: More GPIO madness on iMX6 - and the crappy ARM port of Linux

2014-01-20 Thread Lothar Waßmann

Hi,

Alexandre Courbot wrote:
> On Sat, Jan 18, 2014 at 7:43 AM, Linus Walleij  
> wrote:
> > On Fri, Jan 17, 2014 at 9:53 PM, Russell King - ARM Linux
> >  wrote:
> >> On Fri, Jan 17, 2014 at 01:42:44PM -0700, Stephen Warren wrote:
> >
[...]
> > If the OPEN_DRAIN flag is set on that descriptor we should
> > always be able to read the input. But as this is not really what the
> > I2C core wants to know (it really would prefer not to bother with
> > such GPIO flag details) so is it better if we add a special call to
> > figure out if the input can be read? Like:
> >
> > bool gpiod_input_always_valid(const struct gpio_desc *desc);
> >
> > And leave it up to the core to look at flags, driver characteristics
> > etc and determine whether the input can be trusted?
> 
> I am personally a little bit scared by the number of exported
> functions in the GPIO framework. It is a pretty large number for
> something that is supposed to be simple, so I'd like to avoid adding
> more. :) How about the following:
> 
> 1) GPIOs configured as output without the open drain or open source
> flag either return -EINVAL on gpiod_get_value(), or a cached value
> tracked by gpiolib for consistency (probably the latter).
> 2) GPIOs configured as open drain or open source report the actual
> value read on the pin, like i2c-core needs. This requires that, for
> each GPIO that can be set open drain or open source,
> gpiod_input_always_valid() == true.
> 
I would not bind this to the open drain configuration. Any GPIO output
pin may actually be in a different state than programmed when the
output is forcefully driven by another source (short circuit).
So it makes sense to be able to read back the real state of the pad
even for push-pull outputs.


Lothar Waßmann
-- 
___

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Geschäftsführer: Matthias Kaussen
Handelsregistereintrag: Amtsgericht Aachen, HRB 4996

www.karo-electronics.de | i...@karo-electronics.de
___

Re: LSF/MM 2014 Call For Proposals

2014-01-20 Thread Michel Lespinasse

On Fri, Dec 20, 2013 at 1:30 AM, Mel Gorman  wrote:
> The annual Linux Storage, Filesystem and Memory Management Summit for
> 2014 will be held on March 24th and 25th before the Linux Foundation
> Collaboration summit at The Meritage Resort, Napa Valley, CA.
>
> 
> http://events.linuxfoundation.org/events/linux-storage-filesystem-and-mm-summit
> http://events.linuxfoundation.org/events/collaboration-summit

Just a reminder for anyone who wants to participate in LSF/MM: If you
haven't already done so, please send us your request and/or topic
proposals by January 31st...

> Note that we are running LSF/MM a little earlier in 2014 than in previous
> years.
>
> On behalf of the committee I would like to issue a call for agenda proposals
> that are suitable for cross-track discussion as well as more technical
> subjects for discussion in the breakout sessions.
>
> 1) Suggestions for agenda topics should be sent before January 31st
> 2014 to:
>
> lsf...@lists.linux-foundation.org
>
> and cc the Linux list or lists that are most interested in it:
>
> ATA: linux-...@vger.kernel.org
> FS: linux-fsde...@vger.kernel.org
> MM: linux...@kvack.org
> SCSI: linux-s...@vger.kernel.org
>
> People who need more time for visa applications should send proposals before
> January 15th. The committee will complete the first round of selections
> on that date to accommodate applications.
>
> Please remember to tag your subject with [LSF/MM TOPIC] to make it
> easier to track. Agenda topics and attendees will be selected by the
> program committee, but the final agenda will be formed by consensus of
> the attendees on the day.
>
> We will try to cap attendance at around 25-30 per track to facilitate
> discussions although the final numbers will depend on the room sizes at
> the venue.
>
> 2) Requests to attend the summit should be sent to:
>
> lsf...@lists.linux-foundation.org
>
> Please summarize what expertise you will bring to the meeting, and what
> you would like to discuss. Please also tag your email with [LSF/MM ATTEND]
> so there is less chance of it getting lost in the large mail pile.
>
> Presentations are allowed to guide discussion, but are strongly
> discouraged. There will be no recording or audio bridge. However, we expect
> that written minutes will be published as we did in previous years
>
> 2013:
> http://lwn.net/Articles/548089/
>
> 2012:
> http://lwn.net/Articles/490114/
> http://lwn.net/Articles/490501/
>
> 2011:
> http://lwn.net/Articles/436871/
> http://lwn.net/Articles/437066/
>
> 3) If you have feedback on last year's meeting that we can use to
> improve this year's, please also send that to:
>
> lsf...@lists.linux-foundation.org
>
> Thank you on behalf of the program committee:
>
> Storage:
> James Bottomley
> Martin K. Petersen
>
> Filesystems:
> Trond Myklebust
> Jeff Layton
> Dave Chinner
> Jan Kara
> Ted Ts'o
>
> MM:
> Rik van Riel
> Michel Lespinasse

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

Re: [PATCH V5 1/3] mm/nobootmem: Fix unused variable

2014-01-20 Thread Philipp Hachtmann



On Mon, 20 Jan 2014 22:16:33 -0800 (PST),
David Rientjes wrote:

> Not sure why you don't just do a one line patch:
> 
> - phys_addr_t size;
> + phys_addr_t size __maybe_unused;
> to fix it.

Just because I did not know about __maybe_unused.

Discussion of this fix seems to be moot because Andrew already took
the patch in the form he suggested: one #ifdef in the function, with a
basic block declaring size once inside.

Regards

Philipp


Re: [RFC PATCH 1/2] sched/update_avg: avoid negative time

2014-01-20 Thread Michael wang

Hi, Alex

On 01/21/2014 01:33 PM, Alex Shi wrote:
> rq->avg_idle try to reflect the average idle time between the cpu idle
> and first wakeup. But in the function, it maybe get a negative value
> if old avg_idle is too small. Then this negative value will be double
> counted in next time calculation. Guess that is not the original purpose,
> so recalibrate it to zero.
> 
> Signed-off-by: Alex Shi 
> ---
>  kernel/sched/core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 30eb011..af9121c6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1358,6 +1358,9 @@ static void update_avg(u64 *avg, u64 sample)
>  {
>   s64 diff = sample - *avg;
>   *avg += diff >> 3;
> +
> + if (*avg < 0)
> + *avg = 0;

This seems like it won't happen...

If 'diff' is negative, its absolute value won't be bigger than '*avg', not
to mention we only use 1/8 of it.

Regards,
Michael Wang

>  }
>  #endif
> 
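
The argument in the reply can be checked with a userspace copy of the
arithmetic. The sketch below uses fixed-width types in place of the
kernel's u64/s64 and a made-up function name; it assumes non-negative
samples that fit in a signed 64-bit value and a compiler that shifts
negative values arithmetically (as GCC and Clang do). Since the average
is unsigned, a "*avg < 0" test can never be true, and a negative step is
at most avg/8 rounded up, so the value cannot wrap below zero.

```c
#include <stdint.h>

/*
 * Userspace model of the quoted update_avg(): move the running average
 * one eighth of the way towards the new sample.
 */
static void update_avg_sketch(uint64_t *avg, uint64_t sample)
{
	/* Signed difference; negative when the sample is below the average. */
	int64_t diff = (int64_t)(sample - *avg);

	/*
	 * diff >> 3 rounds towards minus infinity on the assumed compilers,
	 * so for sample == 0 the step is -ceil(avg / 8), which never
	 * exceeds avg in magnitude.
	 */
	*avg += (uint64_t)(diff >> 3);
}
```

Starting from avg = 8 and feeding samples of 0, the average steps down to
7 and keeps decreasing until it settles at 0 without wrapping around.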


Re: [PATCH] Add HID's to hid-microsoft driver of Surface Type/Touch Cover 2 to fix bug

2014-01-20 Thread Jiri Kosina

On Mon, 20 Jan 2014, Reyad Attiyat wrote:

> The below patch fixes a bug 64811
> (https://bugzilla.kernel.org/show_bug.cgi?id=64811) of the Microsoft
> Surface Type/Touch cover 2 devices being detected as a multitouch
> device.
> The fix adds the HID of the two devices to hid-microsoft driver. This
> ensures that hid-input will eventually be used for the device and not
> hid-multitouch.

Hi,

your patch is missing hid_have_special_driver[] entry, therefore correct 
driver binding is not guaranteed.

-- 
Jiri Kosina
SUSE Labs


Re: [RFC PATCH V2] fs null_blk: Null pointer deference problem in alloc_page_buffers

2014-01-20 Thread Raghavendra K T


On 01/20/2014 10:02 PM, Matias Bjorling wrote:

> On 01/20/2014 04:58 AM, Raghavendra K T wrote:
>
>>   If we load the null_blk module with bs=8k we get following oops:
>> [ 3819.812190] BUG: unable to handle kernel NULL pointer dereference at 0008
>> [ 3819.812387] IP: [] create_empty_buffers+0x28/0xaf
>> [ 3819.812527] PGD 219244067 PUD 215a06067 PMD 0
>> [ 3819.812640] Oops:  [#1] SMP
>> [ 3819.812772] Modules linked in: null_blk(+)
>>
>>   Fix that by resetting block size to PAGE_SIZE if it is greater than PAGE_SIZE
>>   Also add sanity checks for block size > PAGE_SIZE.
>
> We should probably split the patch into two. Giving a better description
> to each of the changes.

Agreed.

[...]

>> diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
>> index a2e69d2..bcae726 100644
>> --- a/drivers/block/null_blk.c
>> +++ b/drivers/block/null_blk.c
>> @@ -622,6 +622,10 @@ static int __init null_init(void)
>> irqmode = NULL_IRQ_NONE;
>> }
>>   #endif
>> +   if (bs > PAGE_SIZE) {
>> +   pr_warn("Invalid block size. Setting it to %lu\n", PAGE_SIZE);
>> +   bs = PAGE_SIZE;
>> +   }
>
> Could the warning say something like:
>
> pr_warn("null_blk: invalid block size\n");
> pr_warn("null_blk: defaults block size to \n");
>
> then it follows the same patterns as the other errors.

Agree.

>> if (queue_mode == NULL_Q_MQ && use_per_node_hctx) {
>> if (submit_queues < nr_online_nodes) {
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 1e86823..2481d42 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1027,6 +1027,7 @@ void bd_set_size(struct block_device *bdev, loff_t size)
>> break;
>> bsize <<= 1;
>> }
>> +   BUG_ON(bsize > PAGE_SIZE);
>> bdev->bd_block_size = bsize;
>> bdev->bd_inode->i_blkbits = blksize_bits(bsize);
>>   }
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index 6024877..8b7ada1 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -1571,6 +1571,7 @@ void create_empty_buffers(struct page *page,
>> struct buffer_head *bh, *head, *tail;
>>
>> head = alloc_page_buffers(page, blocksize, 1);
>> +   BUG_ON(!head);
>> bh = head;
>> do {
>> bh->b_state |= b_state;
>
> For the check patch, the description could mention some text on why we
> hit this error case and why we check for it.



Thanks for the suggestions. Will post V3 with the changes.


Re: BUG: spinlock lockup

2014-01-20 Thread naveen yadav

Dear Will,

Thanks for your reply,

We are using Cortex A15.
Yes, this is with the ticket lock.

We will check the value of arch_spinlock_t and share it. It is a bit
difficult to reproduce this scenario.

If you have any ideas, please suggest how to reproduce it.

thanks

On Mon, Jan 20, 2014 at 3:50 PM, Will Deacon  wrote:
> On Sat, Jan 18, 2014 at 07:25:51AM +, naveen yadav wrote:
>> We are using 3.8.x  kernel on ARM, We are facing soft lockup issue.
>> Following are the logs.
>
> Which CPU/SoC are you using?


>
>> BUG: spinlock lockup suspected on CPU#0, process1/525
>> lock: 0xd8ac9a64, .magic: dead4ead, .owner: /-1, .owner_cpu: -1
>>
>>
>> 1. It looks like the lock is available, as the owner is -1. Why is
>> arch_spin_trylock failing?
>
> Is this with or without the ticket lock patches? Can you inspect the actual
> value of the arch_spinlock_t?

>
>> 2. There is a patch : ARM: spinlock: retry trylock operation if strex
>> fails on free lock
>> http://permalink.gmane.org/gmane.linux.ports.arm.kernel/240913
>> In this patch, a loop has been added around "strexeq %2, %0, [%3]".
>> {Comment "retry the trylock operation if the lock appears
>> to be free but the strex reported failure"}
>>
>> but arch_spin_trylock is called by __spin_lock_debug and it is already
>> being called in a loop. So what purpose does the patch serve?
>
> Does this patch help your issue? The purpose of it is to distinguish between
> two types of contention:
>
>   (1) The lock is actually taken
>   (2) The lock is free, but two people are doing a trylock at the same time
>
> In the case of (2), we do actually want to spin again otherwise you could
> potentially end up in a pathological case where the two CPUs repeatedly
> shoot down each other's monitor and forward progress isn't made until the
> sequence is broken by something like an interrupt.
>
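
The distinction between cases (1) and (2) can be illustrated in portable C11. This is only a sketch under assumptions: sketch_lock_t is a hypothetical one-word lock, not the ARM ticket lock, and atomic_compare_exchange_weak() stands in for the ldrex/strex pair because it is also allowed to fail spuriously.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-word lock: 0 means free, 1 means taken. */
typedef struct {
	atomic_uint locked;
} sketch_lock_t;

/*
 * Try to take the lock. A weak compare-exchange may fail even though the
 * lock is free, so a failure alone does not prove contention. On failure,
 * "expected" holds the value that was observed: retry while the lock
 * still looked free, give up once it was seen as taken.
 */
static bool sketch_trylock(sketch_lock_t *l)
{
	unsigned int expected;

	assert(l != NULL);
	do {
		expected = 0;
		if (atomic_compare_exchange_weak(&l->locked, &expected, 1))
			return true;	/* we now own the lock */
	} while (expected == 0);	/* case (2): free, but the store failed */

	return false;			/* case (1): the lock is actually taken */
}

static void sketch_unlock(sketch_lock_t *l)
{
	assert(l != NULL);
	atomic_store(&l->locked, 0);
}
```

A caller that loops around sketch_trylock() with a delay, as __spin_lock_debug() does, then only counts iterations in which the lock was really held by someone else.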
>> static void __spin_lock_debug(raw_spinlock_t *lock)
>> {
>> u64 i;
>> u64 loops = loops_per_jiffy * HZ;
>>
>> for (i = 0; i < loops; i++) {
>> if (arch_spin_trylock(&lock->raw_lock))
>> return;
>> __delay(1);
>> }
>> /* lockup suspected: */
>> spin_dump(lock, "lockup suspected");
>> }
>>
>> 3. Is this patch useful to us, How can we reproduce this scenario ?
>> Scenario : Lock is available but arch_spin_trylock  is returning as failure
>
> Potentially. Why can't you simply apply the patch and see if it resolves your
> issue?
>
> Will

Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages

2014-01-20 Thread Cai Liu

2014/1/21 Minchan Kim :
> Please check your MUA and don't break thread.
>
> On Tue, Jan 21, 2014 at 11:07:42AM +0800, Cai Liu wrote:
>> Thanks for your review.
>>
>> 2014/1/21 Minchan Kim :
>> > Hello Cai,
>> >
>> > On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote:
>> >> zswap can support multiple swapfiles. So we need to check
>> >> all zbud pool pages in zswap.
>> >>
>> >> Version 2:
>> >>   * add *total_zbud_pages* in zbud to record all the pages in pools
>> >>   * move the updating of pool pages statistics to
>> >> alloc_zbud_page/free_zbud_page to hide the details
>> >>
>> >> Signed-off-by: Cai Liu 
>> >> ---
>> >>  include/linux/zbud.h |2 +-
>> >>  mm/zbud.c|   44 
>> >>  mm/zswap.c   |4 ++--
>> >>  3 files changed, 35 insertions(+), 15 deletions(-)
>> >>
>> >> diff --git a/include/linux/zbud.h b/include/linux/zbud.h
>> >> index 2571a5c..1dbc13e 100644
>> >> --- a/include/linux/zbud.h
>> >> +++ b/include/linux/zbud.h
>> >> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle);
>> >>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
>> >>  void *zbud_map(struct zbud_pool *pool, unsigned long handle);
>> >>  void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
>> >> -u64 zbud_get_pool_size(struct zbud_pool *pool);
>> >> +u64 zbud_get_pool_size(void);
>> >>
>> >>  #endif /* _ZBUD_H_ */
>> >> diff --git a/mm/zbud.c b/mm/zbud.c
>> >> index 9451361..711aaf4 100644
>> >> --- a/mm/zbud.c
>> >> +++ b/mm/zbud.c
>> >> @@ -52,6 +52,13 @@
>> >>  #include 
>> >>  #include 
>> >>
>> >> +/*
>> >> +* statistics
>> >> +**/
>> >> +
>> >> +/* zbud pages in all pools */
>> >> +static u64 total_zbud_pages;
>> >> +
>> >>  /*
>> >>   * Structures
>> >>  */
>> >> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct page *page)
>> >>   return zhdr;
>> >>  }
>> >>
>> >> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp)
>> >> +{
>> >> + struct page *page;
>> >> +
>> >> + page = alloc_page(gfp);
>> >> +
>> >> + if (page) {
>> >> + pool->pages_nr++;
>> >> + total_zbud_pages++;
>> >
>> > Who protect race?
>>
>> Yes, here the pool->pages_nr and also the total_zbud_pages are not protected.
>> I will re-do it.
>>
>> I will change *total_zbud_pages* to atomic type.
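
As an illustration of the atomic-counter idea only, here is a userspace sketch using C11 atomics. All names are hypothetical stand-ins, malloc() replaces alloc_page(), and the 4096-byte size is an assumption. It shows how the counter update stays correct without a lock; it does not address the design objection raised in the reply that follows.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed global page counter. */
static atomic_ullong sketch_total_pages;

/* Allocate one "page" and account for it; safe to call concurrently. */
static void *sketch_alloc_page(void)
{
	void *page = malloc(4096);

	if (page)
		atomic_fetch_add(&sketch_total_pages, 1);
	return page;
}

/* Free one "page" and drop it from the count. */
static void sketch_free_page(void *page)
{
	assert(page != NULL);
	free(page);
	atomic_fetch_sub(&sketch_total_pages, 1);
}

/* Read the current total without taking any lock. */
static unsigned long long sketch_pool_pages(void)
{
	return atomic_load(&sketch_total_pages);
}
```

The read-modify-write operations are indivisible, so two threads allocating at the same time cannot lose an increment the way a plain "total_zbud_pages++" can.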
>
> Wait, it doesn't make sense. Now you assume the zbud allocator would be used
> only for zswap. That's true until now, but we can't be sure of it in the future.
> If other users start to use the zbud allocator, total_zbud_pages would be
> pointless.

Yes, you are right. ZBUD is a common module, so calculating the zswap pool
size in zbud, as this patch does, is not suitable.

>
> Another concern: what's your scenario for the two swaps above?
> How often do we need to call zbud_get_pool_size?
> In your previous patch you reduced the number of calls, so IIRC
> we only called it in zswap_is_full and for debugfs.

zbud_get_pool_size() is called frequently, whenever a zswap entry is added
or freed in zswap. This is why I added a counter in zbud in this patch, so
that zswap no longer needs to iterate over zswap_list to calculate the pool
size.

> Of course, it would need some lock or refcount to prevent destroy
> of zswap_tree in parallel with zswap_frontswap_invalidate_area but
> zswap_is_full doesn't need to be exact so RCU would be good fit.
>
> Most important point is that now zswap doesn't consider multiple swap.
> For example, let's assume you use two swaps A and B with different priorities,
> and A was already charged 19% a long time ago. Let's assume that swap A is
> full now, so the VM starts to use B, and B has been charged 1% recently.
> It means zswap, charged at (19% + 1%), is full by default.
>
> Then, if the VM wants to swap out more pages into B, zbud_reclaim_page
> would evict one of the pages in B's pool, and this would be repeated
> continuously. It's a total LRU reverse problem, and swap thrashing in B
> would happen.
>

The scenario is below:
There are 2 swaps, A and B, in the system. If the pool size of A reaches 19%
of RAM size and swap A is also full, then swap B will be used. The pool size
of B will increase until it hits 20% of RAM size. By now the zswap pool size
is about 39% of RAM size. If there are more than 2 swap files/devices, the
zswap pool will expand out of control and there may be no swapout happening.

I think the original intention of the zswap designer is to keep the total
size of the zswap pools below 20% of RAM size.
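
The arithmetic of this scenario can be sketched as follows. The function names are hypothetical, and the comparison (pool pages against a percentage of RAM pages) is an assumption about the shape of the check rather than a copy of zswap code; the 20% figure is the one quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The 20% limit quoted in the thread, expressed as a percentage of RAM. */
#define SKETCH_MAX_POOL_PERCENT 20UL

/* Per-pool check: one pool is compared with the limit on its own. */
static bool sketch_pool_is_full(unsigned long pool_pages,
				unsigned long ram_pages)
{
	return ram_pages * SKETCH_MAX_POOL_PERCENT / 100 < pool_pages;
}

/* Global check: the sum over all pools is compared with the limit. */
static bool sketch_all_pools_full(const unsigned long *pool_pages,
				  size_t npools, unsigned long ram_pages)
{
	unsigned long total = 0;
	size_t i;

	assert(pool_pages != NULL || npools == 0);
	for (i = 0; i < npools; i++)
		total += pool_pages[i];
	return sketch_pool_is_full(total, ram_pages);
}
```

With 1000 pages of RAM and pools of 190 and 200 pages (19% and 20%), neither pool exceeds the limit by itself, but together they hold 39% of RAM, which only the global check catches.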

Thanks.

> Please say your usecase scenario and if it's really problem,
> we need more surgery.
>
> Thanks.
>
>> For *pool->pages_nr*, one way is to use pool->lock to protect. But I
>> think it is too heavy.
>> So is it OK to change pages_nr to atomic type too?
>>
>>
>> >
>> >> + }
>> >> +
>> >> + return page;
>> >> +}
>> >> +
>> >> +
>> >>  /* Resets the

Re: [PATCH] cpufreq: Align all CPUs to the same frequency if using shared clock

2014-01-20 Thread Viresh Kumar

On 21 January 2014 08:35, lizhuangzhi  wrote:
> Some SMP systems want to make all the possible CPUs share the clock,
> if the CPUs init frequencies aren't the same, we need to align all the
> CPUs to the same frequency while the CPUs are registering, to avoid mismatched
> CPU's P-states.
>
> Signed-off-by: lizhuangzhi 
> ---
>  drivers/cpufreq/cpufreq.c |2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 8d19f7c..d00abb5 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -991,6 +991,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>  * CPU because it is in the same boat. */
> policy = cpufreq_cpu_get(cpu);
> if (unlikely(policy)) {
> +   /* according present policy to align all the cpus frequencies */
> +   cpufreq_driver->target(policy, policy->cur, CPUFREQ_RELATION_H);

I don't really understand why this is required. CPUs sharing clocks means
that the CPUs run on the same clock line, so all of them must be running
at the same frequency. So why do we need to call this routine? policy->cur
must be the current freq here for the CPU in question.

--
viresh

linux-next: Tree for Jan 21

2014-01-20 Thread Stephen Rothwell

Hi all,

This tree fails (more than usual) the powerpc allyesconfig build.

Changes since 20140117:

Dropped tree: sh (complex merge conflicts against very old commits)
imx-mxs (complex merge conflicts against the arm tree)

The imx-mxs tree gained conflicts against the arm tree (so I dropped it).

The powerpc tree still had its build failure.

The sound-asoc tree gained a conflict against the sound tree.

The kvm tree lost its build failure.

The gpio tree still had its build failure for which I reverted a commit.

Non-merge commits (relative to Linus' tree): 9729
 9131 files changed, 494215 insertions(+), 241500 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (minus CONFIG_PROFILE_ALL_BRANCHES - this fails its final
link) and i386, sparc, sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

I am currently merging 209 trees (counting Linus' and 29 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (c9cdd9a6ae49 Merge branch 'x86/mpx' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging fixes/master (b0031f227e47 Merge tag 's2mps11-build' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator)
Merging kbuild-current/rc-fixes (19514fc665ff arm, kbuild: make "make install" 
not depend on vmlinux)
Merging arc-current/for-curr (7e22e91102c6 Linux 3.13-rc8)
Merging arm-current/fixes (b25f3e1c3584 ARM: 7938/1: OMAP4/highbank: Flush L2 
cache before disabling)
Merging m68k-current/for-linus (56931d73697c m68k/mac: Make SCC reset work more 
reliably)
Merging metag-fixes/fixes (3b2f64d00c46 Linux 3.11-rc2)
Merging powerpc-merge/merge (b3084f4db3ae powerpc/thp: Fix crash on mremap)
Merging sparc/master (ef350bb7c5e0 Merge tag 'ext4_for_linus_stable' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4)
Merging net/master (7d0d46da750a Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (965cdea82569 dccp: catch failed request_module call in 
dccp_probe init)
Merging sound-current/for-linus (7552f34a7900 Merge tag 'asoc-v3.14-3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus)
Merging pci-current/for-linus (f0b75693cbb2 MAINTAINERS: Add DesignWare, i.MX6, 
Armada, R-Car PCI host maintainers)
Merging wireless/master (2eff7c791a18 Merge tag 'nfc-fixes-3.13-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-fixes)
Merging driver-core.current/driver-core-linus (413541dd66d5 Linux 3.13-rc5)
Merging tty.current/tty-linus (413541dd66d5 Linux 3.13-rc5)
Merging usb.current/usb-linus (413541dd66d5 Linux 3.13-rc5)
Merging staging.current/staging-linus (413541dd66d5 Linux 3.13-rc5)
Merging char-misc.current/char-misc-linus (802eee95bde7 Linux 3.13-rc6)
Merging input-current/for-linus (8e2f2325b73f Input: xpad - add new USB IDs for 
Logitech F310 and F710)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" 
stripe)
Merging crypto-current/master (efb753b8e013 crypto: ixp4xx - Fix kernel compile 
error)
Merging ide/master (c2f7d1e103ef ide: pmac: remove unnecessary 
pci_set_drvdata())
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging sh-current/sh-fixes-for-linus (44033109e99c SH: Convert out[bwl] macros 
to inline functions)
Merging devicetree-current/devicetree/merge (6f041e99fc7b of: Fix NULL

Re: [PATCH 1/2] USB: at91: fix the number of endpoint parameter

2014-01-20 Thread Jean-Christophe PLAGNIOL-VILLARD

On 11:39 Mon 20 Jan , Bo Shen wrote:
> Hi J,
> 
> On 01/18/2014 01:20 PM, Jean-Christophe PLAGNIOL-VILLARD wrote:
> >On 10:59 Fri 17 Jan , Bo Shen wrote:
> >>In sama5d3 SoC, there are 16 endpoints. As the USBA_NR_ENDPOINTS
> >>is only 7. So, fix it for sama5d3 SoC using the udc->num_ep.
> >>
> >>Signed-off-by: Bo Shen 
> >>---
> >>
> >>  drivers/usb/gadget/atmel_usba_udc.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/usb/gadget/atmel_usba_udc.c b/drivers/usb/gadget/atmel_usba_udc.c
> >>index 2cb52e0..7e67a81 100644
> >>--- a/drivers/usb/gadget/atmel_usba_udc.c
> >>+++ b/drivers/usb/gadget/atmel_usba_udc.c
> >>@@ -1670,7 +1670,7 @@ static irqreturn_t usba_udc_irq(int irq, void *devid)
> >>if (ep_status) {
> >>int i;
> >>
> >>-   for (i = 0; i < USBA_NR_ENDPOINTS; i++)
> >>+   for (i = 0; i < udc->num_ep; i++)
> >
> >No, the limit needs to be specified in the driver as a checkpoint, by the
> >compatible or platform driver id.
> 
> You mean, we should not trust the data passed from dt node or
> platform data? Or do you think we should do double confirm?

No. Based on the driver name or the compatible, you will know the MAX EP,

not based on the DT ep description,

as we do on pinctrl-at91.
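
A rough sketch of that idea, with loud caveats: the table, the struct and the "vendor,..." compatible strings below are made up for illustration and are not the atmel_usba_udc or pinctrl-at91 code. Only the two counts, 7 and 16, come from the patch description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-compatible capability entry. */
struct sketch_udc_caps {
	const char *compatible;
	int max_ep;
};

/* Illustrative table; the compatible strings are placeholders. */
static const struct sketch_udc_caps sketch_caps[] = {
	{ "vendor,udc-7ep", 7 },
	{ "vendor,udc-16ep", 16 },
};

/* Return the driver's endpoint limit for a compatible, or -1 if unknown. */
static int sketch_max_ep(const char *compatible)
{
	size_t i;

	assert(compatible != NULL);
	for (i = 0; i < sizeof(sketch_caps) / sizeof(sketch_caps[0]); i++)
		if (strcmp(sketch_caps[i].compatible, compatible) == 0)
			return sketch_caps[i].max_ep;
	return -1;
}

/* Accept a DT-provided endpoint count only if it fits the driver's limit. */
static int sketch_validate_num_ep(const char *compatible, int dt_num_ep)
{
	int max_ep = sketch_max_ep(compatible);

	if (max_ep < 0 || dt_num_ep < 1 || dt_num_ep > max_ep)
		return -1;
	return dt_num_ep;
}
```

The loop bound in the interrupt handler would then come from the driver's own table, and the endpoint count from the device tree is checked against it instead of being trusted.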

Best Regards,
J.
> >>if (ep_status & (1 << i)) {
> >>if (ep_is_control(&udc->usba_ep[i]))
> >>usba_control_irq(udc, &udc->usba_ep[i]);
> >>--
> >>1.8.5.2
> >>
> 
> Best Regards,
> Bo Shen

Re: [PATCH V5 1/3] mm/nobootmem: Fix unused variable

2014-01-20 Thread David Rientjes

On Mon, 20 Jan 2014, Philipp Hachtmann wrote:

> diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> index e2906a5..0215c77 100644
> --- a/mm/nobootmem.c
> +++ b/mm/nobootmem.c
> @@ -116,23 +116,29 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>  static unsigned long __init free_low_memory_core_early(void)
>  {
>   unsigned long count = 0;
> - phys_addr_t start, end, size;
> + phys_addr_t start, end;
>   u64 i;
>  
> +#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
> + phys_addr_t size;
> +#endif
> +
>   for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
>   count += __free_memory_core(start, end);
>  
>  #ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
> -
> - /* Free memblock.reserved array if it was allocated */
> - size = get_allocated_memblock_reserved_regions_info();
> - if (size)
> - count += __free_memory_core(start, start + size);
> -
> - /* Free memblock.memory array if it was allocated */
> - size = get_allocated_memblock_memory_regions_info();
> - if (size)
> - count += __free_memory_core(start, start + size);
> + {
> + phys_addr_t size;

I think you may have misunderstood Andrew's suggestion: "size" here is 
overloading the "size" you have already declared for this configuration.

Not sure why you don't just do a one line patch:

-   phys_addr_t size;
+   phys_addr_t size __maybe_unused;

to fix it.
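
To show the effect outside the kernel, here is a small sketch. It assumes GCC or Clang; sketch_maybe_unused mirrors what the kernel's __maybe_unused expands to on those compilers, and the function names and SKETCH_DISCARD_EXTRA are hypothetical.

```c
#include <assert.h>

/* Assumption: GCC/Clang attribute syntax, as behind __maybe_unused. */
#define sketch_maybe_unused __attribute__((unused))

/* Pretend to free [start, end) and return how much was freed. */
static unsigned long sketch_free_range(unsigned long start, unsigned long end)
{
	assert(end >= start);
	return end - start;
}

static unsigned long sketch_free_all(unsigned long start, unsigned long end)
{
	unsigned long count = 0;
	/* Only used when SKETCH_DISCARD_EXTRA is defined; no warning either way. */
	unsigned long size sketch_maybe_unused;

	count += sketch_free_range(start, end);

#ifdef SKETCH_DISCARD_EXTRA
	size = 16;
	count += sketch_free_range(start, start + size);
#endif

	return count;
}
```

The single declaration stays at the top of the function, so neither a second #ifdef around it nor a nested block with a shadowing "size" is needed.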

> + /* Free memblock.reserved array if it was allocated */
> + size = get_allocated_memblock_reserved_regions_info();
> + if (size)
> + count += __free_memory_core(start, start + size);
> + 
> + /* Free memblock.memory array if it was allocated */
> + size = get_allocated_memblock_memory_regions_info();
> + if (size)
> + count += __free_memory_core(start, start + size);
> + }
>  #endif
>  
>   return count;

Re: [PATCH] memcg: Do not hang on OOM when killed by userspace OOM access to memory reserves

2014-01-20 Thread David Rientjes

On Thu, 16 Jan 2014, Michal Hocko wrote:

> > The heuristic may have existed for ages, but the proposed memcg 
> > configuration for preserving memory such that userspace oom handlers may 
> > run such as
> > 
> >  _root__
> > /   \
> > user oom
> >/\   /   \
> >AB   a   b
> > 
> > where user/memory.limit_in_bytes == [amount of present RAM] + 
> > oom/memory.limit_in_bytes - [some fudge] causes all bypasses to be 
> > problematic, including Johannes' buggy bypass for charges in memcgs with 
> > pending memcgs that has since been fixed after I identified it.  This 
> > bypass is included.  Processes attached to "a" and "b" are userspace oom 
> > handlers for processes attached to "A" and "B", respectively.
> > 
> > The amount of memory you're talking about is proportional to the number of 
> > processes that have pending SIGKILLs (and now those with PF_EXITING set), 
> > the former of which is obviously more concerning since they could be 
> > charging memory at any point in the kernel that would succeed. 
> 
> I understand your concerns. Yes, excessive charges might be dangerous. I
> haven't dismissed that when you mentioned it earlier. I am just
> repeatedly asking how much memory are we talking about, how real is the
> issue and what are all the other consequences. And for some reason you
> are not providing that information (or maybe I am just not seeing that
> in your responses) and that is why we are stuck in circle.
> 

Wtf are you talking about?  You're adding a bypass in this patch and then 
you're asking me to go and see how much memory it could potentially bypass 
and take away from oom handlers under the above memcg configuration?  This 
seems like something you should provide before throwing out patches that 
nobody has tested if you want to make the argument that the above memcg 
configuration is valid for handling userspace oom notifications.

And you certainly have dismissed what I've mentioned earlier when I said 
that anybody can add memory allocation to the exit path later on and 
nobody is going to think about how much memory this is going to bypass to 
the root memcg and potentially take away from userspace oom handlers.

There are two possible ways forward on this:

 - avoid bypass to the root memcg in every possible case such that the
   above memcg configuration actually makes a guarantee to userspace oom
   handlers attached to it, or

 - provide per-memcg memory reserves such that userspace oom handlers can
   allocate and charge memory without the above memcg configuration so 
   there is a guarantee.

What's not acceptable, now or ever, is suggesting a solution to a problem 
that is supposed to guarantee some resource and then allow under some 
circumstances that resource to be completely depleted such that the 
solution never works.

> Yes, and apart from GFP_NOFAIL we are allowing to bypass only those that
> should terminate in a short time. I think that having a setup with a
> guarantee of never triggering the global OOM is too ambitious and I am
> even skeptical it would be achievable.
> 

"Short time" is meaningless if the memory allocation causes memory to not 
be available to userspace oom handlers.  If allocations are allowed to be 
charged because you're in the exit() path or because you have SIGKILL, 
that can result in a system oom condition that would prevent userspace 
from being able to handle them.

> > I'm debating both fatal_signal_pending() and PF_EXITING here since they 
> > are now both bypasses, we need to remove fatal_signal_pending().  My 
> > simple question with your patch: how do you guarantee memory to processes 
> > attached to "a" and "b"?
> 
> The only way you can get that _guarantee_ is to account all the memory
> allocations. And that is not implemented and I would even question
> whether it is worthwhile. So we still have to live with a possibility
> of triggering the global OOM killer. That's why I believe we need to be
> able to tell the kernel what is the user policy for oom killer (that is
> a different discussion though).
> 

So you're saying that Tejun's suggested userspace oom handler 
configuration is pointless, correct?  We can certainly provide a guarantee 
if memory is reserved specifically for userspace oom handling like I 
proposed, the same way that memory reserves are guaranteed for oom killed 
processes.

Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves

2014-01-20 Thread David Rientjes

On Mon, 20 Jan 2014, Greg Kroah-Hartmann wrote:

> > The patches getting proposed through -mm for stable boggles my mind
> > sometimes.
> 
> Do you have any objections to patches that I have taken for -stable?  If
> so, please let me know.
> 

You haven't taken the ones that I outlined in 
http://marc.info/?l=linux-kernel&m=138580717728759, so I'm happy that 
those could be prevented.  I'm identifying another patch here that is 
pending in -mm that obviously violates the stable kernel rules and I don't 
believe it should be annotated in a way that you'll scoop it up later.

The patch in question hasn't been tested by anybody and I don't think you 
want such things to ever be merged into a stable kernel series.

Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves

2014-01-20 Thread Greg Kroah-Hartmann

On Mon, Jan 20, 2014 at 09:58:28PM -0800, David Rientjes wrote:
> The patches getting proposed through -mm for stable boggles my mind
> sometimes.

Do you have any objections to patches that I have taken for -stable?  If
so, please let me know.

thanks,

greg k-h

Re: [RFC PATCH 1/2] sched/update_avg: avoid negative time

2014-01-20 Thread Alex Shi

On 01/21/2014 01:33 PM, Alex Shi wrote:
> rq->avg_idle tries to reflect the average idle time between the cpu going
> idle and the first wakeup. But in the function, it may get a negative value
> if the old avg_idle is too small. Then this negative value will be double
> counted in the next calculation. I guess that is not the original purpose,
> so recalibrate it to zero.

Forget this patch; avg_idle cannot become negative.
Sorry for the noise!
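
The reasoning can be checked with a small sketch. It assumes the average moves one eighth of the way toward each new sample; sketch_update_avg() is a hypothetical stand-in written for illustration, not the scheduler's function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Move the average one eighth of the way toward the sample.
 * With avg >= 0 and sample >= 0, diff is never below -avg, so the
 * result is never below avg - avg / 8, which is itself >= 0.
 */
static int64_t sketch_update_avg(int64_t avg, int64_t sample)
{
	int64_t diff;

	assert(avg >= 0 && sample >= 0);
	diff = sample - avg;
	return avg + diff / 8;
}
```

Even in the worst case of a zero sample, the average only shrinks by an eighth of its own value, so it approaches zero without crossing it.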


-- 
Thanks
Alex

Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves

2014-01-20 Thread David Rientjes

On Thu, 16 Jan 2014, Michal Hocko wrote:

> > This is concerning because it's merged in -mm without being tested by Eric 
> > and is marked for stable while violating the stable kernel rules criteria.
> 
> Are you questioning the patch fixes the described issue?
> 
> Please note that the exit_robust_list and PF_EXITING as a culprit has
> been identified by Eric. Of course I would prefer if it was tested by
> anybody who can reproduce it.

You're saying the patch hasn't been tested by anybody and that clearly 
violates the first rule in Documentation/stable_kernel_rules.txt:

 - It must be obviously correct and tested.

Adding Greg to the cc if this should be clarified further.  The patches 
getting proposed through -mm for stable boggles my mind sometimes.

Re: [PATCH] perf tools: Fix JIT profiling on heap

2014-01-20 Thread Namhyung Kim

Hi Arnaldo,

On Fri, 17 Jan 2014 11:34:04 -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jan 17, 2014 at 04:44:04PM +0900, Namhyung Kim escreveu:
>> On Thu, 16 Jan 2014 20:23:27 +, Gaurav Jain wrote:
>> > On 1/16/14, 9:37 AM, "Arnaldo Carvalho de Melo" 
>> > wrote:
>> >
>> >>Em Thu, Jan 16, 2014 at 10:49:31AM +0900, Namhyung Kim escreveu:
>> >>> Gaurav reported that perf cannot profile JIT program if it executes
>> >>> the code on heap.  This was because current map__new() only handle JIT
>> >>> on anon mappings - extends it to handle no_dso (heap, stack) case too.
>> >>> 
>> >>> This patch assumes JIT profiling only provides dynamic function
>> >>> symbols so check the mapping type to distinguish the case.  It'd
>> >>> provide no symbols for data mapping - if we need to support symbols on
>> >>> data mappings later it should be changed.
>> >>
>> >>But we do support symbols in data mappings, that is why we have
>> >>MAP__VARIABLE, etc, can you elaborate?
>
>> > Does perf support data mappings from perf map files? Could you please
>> > share an example of how I may be able to use this.
>
>> IIUC there's no difference between function and data mapping.  So you
>> can use same perf map file for both - in fact there's no way to use
>> different map file in a single task.  I guess perf will use it to find
>
> Does the /tmp/perf mapping have any per entry indication on the type of
> symbol it is (data, text) like ELF and kallsyms symtabs have?

Quoting Documentation/jit-interface.txt:

  Each line has the following format, fields separated with spaces:

  START SIZE symbolname

  START and SIZE are hex numbers without 0x.
  symbolname is the rest of the line, so it could contain special
  characters.
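
A parser for one such line might look like the sketch below. The struct and function names are hypothetical and the 256-byte name buffer is an arbitrary assumption. Note that the format carries no field that says whether an entry is text or data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one map-file entry. */
struct sketch_map_sym {
	unsigned long long start;
	unsigned long long size;
	char name[256];
};

/* Parse one "START SIZE symbolname" line; return 0 on success, -1 on error. */
static int sketch_parse_map_line(const char *line, struct sketch_map_sym *sym)
{
	int name_off = -1;
	size_t len;

	assert(line != NULL && sym != NULL);

	/* START and SIZE are hex; %n records where the symbol name begins. */
	if (sscanf(line, "%llx %llx %n", &sym->start, &sym->size,
		   &name_off) != 2)
		return -1;
	if (name_off < 0)
		return -1;

	/* The name is the rest of the line, spaces and punctuation included. */
	len = strcspn(line + name_off, "\n");
	if (len == 0 || len >= sizeof(sym->name))
		return -1;
	memcpy(sym->name, line + name_off, len);
	sym->name[len] = '\0';
	return 0;
}
```

Because the name is taken as everything after the second field, a symbol such as "JIT::foo(int, char*)" is kept intact.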

>
> It is possible for a function and a variable to have the same virtual
> address in some architectures (SPARC, iirc), that is why we have
> different MAP_ types (FUNCTION, VARIABLE) (which should really be
> renamed to TEXT, DATA).

Hmm.. didn't know that, interesting..

>
> So a 'struct map' for a data mmap should possibly point to a different
> 'dso' of the JIT /tmp/perf-... style if those maps don't have per entry
> indication of text/data.

Yes, but there's no way to do it currently.

>
>> only function symbols in function mappings and variables in data
>> mapping based on the address it accesses.
>
> Well, the lookup should figure out if the IP refers to TEXT or DATA and
> use MAP__{FUNCTION, VARIABLE} accordingly when asking for symbol
> resolution.

Right.  But in this case we cannot determine whether a symbol in the
/tmp/perf-... file is a function or variable.

>
>> What I wasn't sure is whether JIT program also produces some dynamic data.
>> And I think only perf mem command cares about data mappings, no?
>
> Well, I think it would be great to do that kind of data resolution for
> JITs the same way it is interesting to do for ELF ones :-)
>
> I need to stare harder at that patch, but with the above in mind, do we
> really have to check if the map is MAP__FUNCTION as IIRC this patch
> does?

Not sure.  For a JIT case, I guess the mapping is always executable and
we don't support data mapping yet, so it seems okay for now.

Thanks,
Namhyung

Re: [PATCH V2] fs: don't write pages when receiving a pending SIGKILL in __get_user_pages()

2014-01-20 Thread David Rientjes

On Sat, 18 Jan 2014, Xishi Qiu wrote:

> In the process IO direction, dio_refill_pages will call get_user_pages_fast 
> to map the page from user space. If ret is less than 0 and IO is write, the 
> function will create a zero page to fill data. This may work for some file 
> system, but in some device operate we prefer whole write or fail, not half 
> data half zero, e.g. fs metadata, like inode, identy.

So you're attempting to define a behavior for all users of direct IO for a 
problem that is filesystem or backing device dependent?  Perhaps if you 
elaborated on the problem that you're seeing then we could address it.

> This happens often when kill a process which is doing direct IO. Consider 
> the following cases, the process A is doing IO process, may enter 
> __get_user_pages 
> function, if other processes send process A SIG_KILL, A will enter the 
> following branches 
>   /*
>* If we have a pending SIGKILL, don't keep faulting
>* pages and potentially allocating memory.
>*/
>   if (unlikely(fatal_signal_pending(current)))
>   return i ? i : -ERESTARTSYS;
> Return the current pages. Direct IO will write those pages, and the subsequent
> pages, which could not be obtained, will be replaced by the zero page.
> 
> Signed-off-by: Xishi Qiu 
> Signed-off-by: Bin Yang 
> ---
>  fs/direct-io.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 0e04142..b74d565 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -174,6 +174,9 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
>   &dio->pages[0]);        /* Put results here */
>  
>   if (ret < 0 && sdio->blocks_available && (dio->rw & WRITE)) {
> + /* If task is killed, do not write anymore */
> + if (ret == -ERESTARTSYS)
> + goto out;
>   struct page *page = ZERO_PAGE(0);
>   /*
>* A memory fault, but the filesystem has some outstanding

We don't mix declarations and code; please try to compile your patches 
before proposing them.
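
For context on the review comment above: kernel C style keeps declarations at
the top of a block, ahead of any statements, and kernel builds warn about a
declaration that follows a statement.  The hunk puts an "if" ahead of
"struct page *page = ZERO_PAGE(0);".  A minimal userspace sketch of the
declarations-first ordering follows; toy_refill(), toy_zero_page and
TOY_ERESTARTSYS are hypothetical stand-ins, not the fs/direct-io.c code.

```c
#include <stddef.h>

/* Stand-in for the kernel-internal ERESTARTSYS, which userspace errno.h lacks. */
#define TOY_ERESTARTSYS 512

/* Hypothetical stand-in for ZERO_PAGE(0). */
static const char toy_zero_page[1] = { 0 };

/* Returns NULL when the task was killed, else the page to fill with. */
static const char *toy_refill(int ret)
{
	const char *page = toy_zero_page;	/* declarations come first */

	if (ret == -TOY_ERESTARTSYS)		/* statements follow */
		return NULL;

	return page;
}
```

Whether bailing out on -ERESTARTSYS is the right behavior for direct IO is the
open question in this thread; the sketch only shows the ordering.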

Re: [patch 9/9] mm: keep page cache radix tree nodes in check

2014-01-20 Thread Johannes Weiner

On Tue, Jan 21, 2014 at 02:03:58PM +1100, Dave Chinner wrote:
> On Mon, Jan 20, 2014 at 06:17:37PM -0500, Johannes Weiner wrote:
> > On Fri, Jan 17, 2014 at 11:05:17AM +1100, Dave Chinner wrote:
> > > On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote:
> > > > +   /* Only shadow entries in there, keep track of this node */
> > > > +   if (!(node->count & RADIX_TREE_COUNT_MASK) &&
> > > > +   list_empty(&node->private_list)) {
> > > > +   node->private_data = mapping;
> > > > +   list_lru_add(&workingset_shadow_nodes, 
> > > > &node->private_list);
> > > > +   }
> > > 
> > > You can't do this list_empty(&node->private_list) check safely
> > > externally to the list_lru code - only time that entry can be
> > > checked safely is under the LRU list locks. This is the reason that
> > > list_lru_add/list_lru_del return a boolean to indicate is the object
> > > was added/removed from the list - they do this list_empty() check
> > > internally. i.e. the correct, safe way to do conditionally update
> > > state iff the object was added to the LRU is:
> > > 
> > >   if (!(node->count & RADIX_TREE_COUNT_MASK)) {
> > >   if (list_lru_add(&workingset_shadow_nodes, &node->private_list))
> > >   node->private_data = mapping;
> > >   }
> > > 
> > > > +   radix_tree_replace_slot(slot, page);
> > > > +   mapping->nrpages++;
> > > > +   if (node) {
> > > > +   node->count++;
> > > > +   /* Installed page, can't be shadow-only anymore */
> > > > +   if (!list_empty(&node->private_list))
> > > > +   list_lru_del(&workingset_shadow_nodes,
> > > > +&node->private_list);
> > > > +   }
> > > 
> > > Same issue here:
> > > 
> > >   if (node) {
> > >   node->count++;
> > >   list_lru_del(&workingset_shadow_nodes, &node->private_list);
> > >   }
> > 
> > All modifications to node->private_list happen under
> > mapping->tree_lock, and modifications of a neighboring link should not
> > affect the outcome of the list_empty(), so I don't think the lru lock
> > is necessary.
> 
> Can you please add that as a comment somewhere explaining why it is
> safe to do this?

Absolutely.
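
For readers following the list_lru_add()/list_lru_del() exchange above, here is
a rough userspace model of the pattern Dave describes: the emptiness check
happens inside the add/del helpers, under the LRU lock, and the boolean return
tells the caller whether anything changed.  All toy_* names are hypothetical
and a pthread mutex stands in for the per-node spinlock; this is not the
kernel's list_lru implementation.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct toy_list {
	struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *l)
{
	l->next = l->prev = l;
}

static bool toy_list_empty(const struct toy_list *l)
{
	return l->next == l;
}

struct toy_lru {
	pthread_mutex_t lock;
	struct toy_list head;
	long nr_items;
};

static void toy_lru_init(struct toy_lru *lru)
{
	pthread_mutex_init(&lru->lock, NULL);
	toy_list_init(&lru->head);
	lru->nr_items = 0;
}

/* Returns true only if the item was off-list and has now been added. */
static bool toy_lru_add(struct toy_lru *lru, struct toy_list *item)
{
	bool added = false;

	pthread_mutex_lock(&lru->lock);
	if (toy_list_empty(item)) {	/* checked under the lock */
		item->prev = lru->head.prev;
		item->next = &lru->head;
		lru->head.prev->next = item;
		lru->head.prev = item;
		lru->nr_items++;
		added = true;
	}
	pthread_mutex_unlock(&lru->lock);
	return added;
}

/* Returns true only if the item was on the list and has now been removed. */
static bool toy_lru_del(struct toy_lru *lru, struct toy_list *item)
{
	bool removed = false;

	pthread_mutex_lock(&lru->lock);
	if (!toy_list_empty(item)) {	/* checked under the lock */
		item->prev->next = item->next;
		item->next->prev = item->prev;
		toy_list_init(item);
		lru->nr_items--;
		removed = true;
	}
	pthread_mutex_unlock(&lru->lock);
	return removed;
}
```

The sketch says nothing about whether mapping->tree_lock already makes the
external list_empty() check safe; that is the argument the comment Johannes
agreed to add would document.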

> > > > +   case LRU_REMOVED_RETRY:
> > > > if (--nlru->nr_items == 0)
> > > > node_clear(nid, lru->active_nodes);
> > > > WARN_ON_ONCE(nlru->nr_items < 0);
> > > > isolated++;
> > > > +   /*
> > > > +* If the lru lock has been dropped, our list
> > > > +* traversal is now invalid and so we have to
> > > > +* restart from scratch.
> > > > +*/
> > > > +   if (ret == LRU_REMOVED_RETRY)
> > > > +   goto restart;
> > > > break;
> > > > case LRU_ROTATE:
> > > > list_move_tail(item, &nlru->list);
> > > 
> > > I think that we need to assert that the list lru lock is correctly
> > > held here on return with LRU_REMOVED_RETRY. i.e.
> > > 
> > >   case LRU_REMOVED_RETRY:
> > >   assert_spin_locked(&nlru->lock);
> > >   case LRU_REMOVED:
> > 
> > Ah, good idea.  How about adding it to LRU_RETRY as well?
> 
> Yup, good idea.

Ok, will do.

> > > > +static struct shrinker workingset_shadow_shrinker = {
> > > > +   .count_objects = count_shadow_nodes,
> > > > +   .scan_objects = scan_shadow_nodes,
> > > > +   .seeks = DEFAULT_SEEKS * 4,
> > > > +   .flags = SHRINKER_NUMA_AWARE,
> > > > +};
> > > 
> > > Can you add a comment explaining how you calculated the .seeks
> > > value? It's important to document the weighings/importance
> > > we give to slab reclaim so we can determine if it's actually
> > > acheiving the desired balance under different loads...
> > 
> > This is not an exact science, to say the least.
> 
> I know, that's why I asked it be documented rather than be something
> kept in your head.
> 
> > The shadow entries are mostly self-regulated, so I don't want the
> > shrinker to interfere while the machine is just regularly trimming
> > caches during normal operation.
> > 
> > It should only kick in when either a) reclaim is picking up and the
> > scan-to-reclaim ratio increases due to mapped pages, dirty cache,
> > swapping etc. or b) the number of objects compared to LRU pages
> > becomes excessive.
> > 
> > I think that is what most shrinkers with an elevated seeks value want,
> > but this translates very awkwardly (and not completely) to the current
> > cost model, and we should probably rework that interface.
> > 
> > "Seeks" currently encodes 3 ratios:
> > 
> >   1. the cost of creating an object vs. a page
> > 
> >   2. the expected number of objects vs. pages
> 
> It doesn't encode that at all. If it did, then the default value
> wouldn't be "2".
>
> >   3. the cost of

[PATCH 2/2] sched: add statistic for rq->max_idle_balance_cost

2014-01-20 Thread Alex Shi

It's useful to track this value in debug mode.

Signed-off-by: Alex Shi 
---
 kernel/sched/debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1e43e70..f5c529a 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -315,6 +315,7 @@ do {
\
P(sched_goidle);
 #ifdef CONFIG_SMP
P64(avg_idle);
+   P64(max_idle_balance_cost);
 #endif
 
P(ttwu_count);
-- 
1.8.1.2


[RFC PATCH 1/2] sched/update_avg: avoid negative time

2014-01-20 Thread Alex Shi

rq->avg_idle tries to reflect the average idle time between the cpu going
idle and its first wakeup. But update_avg() may produce a negative value
if the old avg_idle is too small. That negative value is then double
counted in the next calculation. Presumably that is not the original intent,
so recalibrate it to zero.

Signed-off-by: Alex Shi 
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 30eb011..af9121c6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1358,6 +1358,9 @@ static void update_avg(u64 *avg, u64 sample)
 {
s64 diff = sample - *avg;
*avg += diff >> 3;
+
+   if (*avg < 0)
+   *avg = 0;
 }
 #endif
 
-- 
1.8.1.2
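
One thing worth checking in the hunk above: avg points to a u64, so the test
"*avg < 0" can never be true as written.  A minimal userspace sketch, assuming
the intent is to clamp the running average at zero, does the arithmetic in a
signed temporary first.  toy_update_avg() is a hypothetical name and this is
not the scheduler's code.

```c
#include <stdint.h>

/*
 * Move avg one eighth of the way toward sample, clamping at zero.
 * Right-shifting a negative value is implementation-defined in C;
 * gcc and clang shift arithmetically, which this sketch relies on.
 */
static void toy_update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);
	int64_t next = (int64_t)*avg + (diff >> 3);

	/* Clamp in the signed domain, before storing back as unsigned. */
	*avg = next < 0 ? 0 : (uint64_t)next;
}
```

With ordinary non-negative inputs the result stays non-negative on its own;
the clamp only matters if a sample arrives that is negative when read as
signed.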


Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?

2014-01-20 Thread David Rientjes

On Mon, 20 Jan 2014, Jianguo Wu wrote:

> When OOM happen, will dump buddy free areas info, hugetlb pages info,
> memory state of all eligible tasks, per-cpu memory info.
> But do not dump slab/vmalloc info, sometime, it's not enough to figure out the
> reason OOM happened.
> 
> So, my questions are:
> 1. Should dump slab/vmalloc info when OOM happen? Though we can get these 
> from proc file,
> but usually we do not monitor the logs and check proc file immediately when 
> OOM happened.
> 

The problem is that slabinfo becomes excessively verbose and dumping it 
all to the kernel log often times causes important messages to be lost.  
This is why we control things like the tasklist dump with a VM sysctl.  It 
would be possible to dump, say, the top ten slab caches with the highest 
memory usage, but it will only be helpful for slab leaks.  Typically there 
are better debugging tools available than analyzing the kernel log; if you 
see unusually high slab memory in the meminfo dump, you can enable it.

> 2. /proc/$pid/smaps and pagecache info also helpful when OOM, should also be 
> dumped?
> 

Also very verbose and would cause important messages to be lost; we try to 
avoid spamming the kernel log with all of this information as much as 
possible.

> 3. Without these info, usually how to figure out OOM reason?
> 

Analyze the memory usage in the meminfo and determine what is unusually 
high; if it's mostly anonymous memory, you can usually correlate it back 
to a high rss for a process in the tasklist that you didn't suspect to be 
using that much memory, for example.

[BTRFS-specific] Re: Dirty deleted files cause pointless I/O storms (unless truncated first)

2014-01-20 Thread Andy Lutomirski

[cc: btrfs]

On Mon, Jan 20, 2014 at 8:46 PM, Dave Chinner  wrote:
> On Mon, Jan 20, 2014 at 04:59:23PM -0800, Andy Lutomirski wrote:
>> The code below runs quickly for a few iterations, and then it slows
>> down and the whole system becomes laggy for far too long.
>>
>> Removing the sync_file_range call results in no I/O being performed at
>> all (which means that the kernel isn't totally screwing this up), and
>> changing "4096" to SIZE causes lots of I/O but without
>> the going-out-to-lunch bit (unsurprisingly).
>
> More details please. hardware, storage, kernel version, etc.

The kernel is 3.11.10-301.fc20.x86_64.  It's an excessively fast CPU
(Intel i7-3930K) with 16GB RAM and a Corsair Force 3 SSD (6Gb/s SATA).
The FS is btrfs on LVM on dm-crypt.

In that setup, this thing goes quickly for 100 iterations or so, at
which point even trying to Ctrl-C it lags out for ten seconds or so.

I clearly should have tested more thoroughly, though -- I can't
reproduce this problem on ext4.

>
> I can't reproduce any slowdown with the code as posted on a VM
> running 3.31-rc5 with 16GB RAM and an SSD w/ ext4 or XFS. The
> workload is only generating about 80 IOPS on ext4 so even a slow
> spindle should be able handle this without problems...
>
>> Surprisingly, uncommenting the ftruncate call seems to fix the
>> problem.  This suggests that all the necessary infrastructure to avoid
>> wasting time writing to deleted files is there but that it's not
>> getting used.
>
> Not surprising at all - if it's stuck in a writeback loop somewhere,
> truncating the file will terminate writeback because it end up being
> past EOF and so stops immediately...

Presumably ext4 and xfs are smart enough to stop writeback when the
inode is gone, but btrfs is still either keeping the inode alive or
just finishes writeback anyway.

--Andy

#define _GNU_SOURCE
/* Header names were stripped in the archive; these are the ones the code needs. */
#include <err.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIZE (16 * 1048576)

static void hammer(const char *name)
{
  int fd = open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
  if (fd == -1)
    err(1, "open");

  fallocate(fd, 0, 0, SIZE);

  /* Dirty the whole file through a shared mapping. */
  void *addr = mmap(NULL, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
  if (addr == MAP_FAILED)
    err(1, "mmap");

  memset(addr, 0, SIZE);

  if (munmap(addr, SIZE) != 0)
    err(1, "munmap");

  /* Force out only the first 4096 bytes and wait for them. */
  if (sync_file_range(fd, 0, 4096,
                      SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
                      SYNC_FILE_RANGE_WAIT_AFTER) != 0)
    err(1, "sync_file_range");

  if (unlink(name) != 0)
    err(1, "unlink");

  /* Uncommenting the truncate makes the problem go away. */
  //  if (ftruncate(fd, 0) != 0)
  //    err(1, "ftruncate");

  close(fd);
}

int main(int argc, char **argv)
{
  if (argc != 2) {
    printf("Usage: hammer_and_delete FILENAME\n");
    return 1;
  }

  while (true) {
    hammer(argv[1]);
    write(1, ".", 1);
  }
}

[PATCH v2] media: i2c: mt9p031: Check return value of clk_prepare_enable/clk_set_rate

2014-01-20 Thread Prabhakar Lad

From: "Lad, Prabhakar" 

The clk_set_rate() and clk_prepare_enable() functions can fail, so check
their return values to avoid surprises.

Signed-off-by: Lad, Prabhakar 
---
 Changes for v2:
 1: Called regulator_bulk_disable() in the error path
 
 drivers/media/i2c/mt9p031.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/media/i2c/mt9p031.c b/drivers/media/i2c/mt9p031.c
index e5ddf47..05278f5 100644
--- a/drivers/media/i2c/mt9p031.c
+++ b/drivers/media/i2c/mt9p031.c
@@ -222,12 +222,15 @@ static int mt9p031_clk_setup(struct mt9p031 *mt9p031)
 
    struct i2c_client *client = v4l2_get_subdevdata(&mt9p031->subdev);
struct mt9p031_platform_data *pdata = mt9p031->pdata;
+   int ret;
 
    mt9p031->clk = devm_clk_get(&client->dev, NULL);
if (IS_ERR(mt9p031->clk))
return PTR_ERR(mt9p031->clk);
 
-   clk_set_rate(mt9p031->clk, pdata->ext_freq);
+   ret = clk_set_rate(mt9p031->clk, pdata->ext_freq);
+   if (ret < 0)
+   return ret;
 
mt9p031->pll.ext_clock = pdata->ext_freq;
mt9p031->pll.pix_clock = pdata->target_freq;
@@ -286,8 +289,14 @@ static int mt9p031_power_on(struct mt9p031 *mt9p031)
return ret;
 
/* Emable clock */
-   if (mt9p031->clk)
-   clk_prepare_enable(mt9p031->clk);
+   if (mt9p031->clk) {
+   ret = clk_prepare_enable(mt9p031->clk);
+   if (ret) {
+   regulator_bulk_disable(ARRAY_SIZE(mt9p031->regulators),
+  mt9p031->regulators);
+   return ret;
+   }
+   }
 
/* Now RESET_BAR must be high */
if (gpio_is_valid(mt9p031->reset)) {
-- 
1.7.9.5
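
The v2 change in mt9p031_power_on() follows a common unwind pattern: when a
later power-up step fails, undo the earlier ones before returning the error.
A small userspace sketch of that shape, with hypothetical toy_* helpers
standing in for the regulator and clock calls (not the driver's code):

```c
#include <stdbool.h>

struct toy_sensor {
	bool regulators_on;
	bool clock_on;
	bool clock_should_fail;	/* test knob, purely illustrative */
};

static int toy_regulators_enable(struct toy_sensor *s)
{
	s->regulators_on = true;
	return 0;
}

static void toy_regulators_disable(struct toy_sensor *s)
{
	s->regulators_on = false;
}

static int toy_clock_enable(struct toy_sensor *s)
{
	if (s->clock_should_fail)
		return -5;	/* stand-in for a negative errno */
	s->clock_on = true;
	return 0;
}

static int toy_power_on(struct toy_sensor *s)
{
	int ret;

	ret = toy_regulators_enable(s);
	if (ret < 0)
		return ret;

	ret = toy_clock_enable(s);
	if (ret < 0) {
		toy_regulators_disable(s);	/* undo the earlier step */
		return ret;
	}

	return 0;
}
```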


Re: [GIT PULL] x86/kaslr for v3.14

2014-01-20 Thread Kees Cook

On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
 wrote:
> So I pulled this, but one question:
>
> On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin  wrote:
>> +config RANDOMIZE_BASE
>> +   bool "Randomize the address of the kernel image"
>> +   depends on RELOCATABLE
>> +   depends on !HIBERNATION
>
> How fundamental is that "!HIBERNATION" issue? Right now that
> anti-dependency on hibernation support will mean that no distro kernel
> will actually use the kernel address space randomization. Which
> long-term is a problem.
>
> I'm not sure HIBERNATION is really getting all that much use, but I
> suspect distros would still want to support it.
>
> Is it just a temporary "I wasn't able to make it work, need to get
> some PM people involved", or is it something really fundamental?

Right, this is a "need to get PM people involved" situation. When
kASLR was being designed, hibernation learning about the kernel base
looked like a separable problem, and given the very long list of
requirements for making it work at all, I carved this out as "future
work".

As for perf, it's similar -- it's another entirely solvable problem,
but perf needs to be untaught some of its assumptions.

We've had a static kernel base forever, so I'm expecting some bumps in
the road here. I'm hopeful none of it will be too painful, though.

-Kees

-- 
Kees Cook
Chrome OS Security

[tip:x86/x32] uapi: Use __kernel_ulong_t in shmid64_ds/shminfo64/shm_info

2014-01-20 Thread tip-bot for H.J. Lu

Commit-ID:  f8dcdf0130d3ba34f8f7531af7c45616efe1e32e
Gitweb: http://git.kernel.org/tip/f8dcdf0130d3ba34f8f7531af7c45616efe1e32e
Author: H.J. Lu 
AuthorDate: Fri, 27 Dec 2013 14:14:23 -0800
Committer:  H. Peter Anvin 
CommitDate: Mon, 20 Jan 2014 14:45:25 -0800

uapi: Use __kernel_ulong_t in shmid64_ds/shminfo64/shm_info

Both x32 and x86-64 use the same struct shmid64_ds/shminfo64/shm_info for
system calls.  But x32 long is 32-bit. This patch replaces unsigned long
with __kernel_ulong_t in struct shmid64_ds/shminfo64/shm_info.

Signed-off-by: H.J. Lu 
Link: http://lkml.kernel.org/r/1388182464-28428-8-git-send-email-hjl.to...@gmail.com
Signed-off-by: H. Peter Anvin 
---
 include/uapi/asm-generic/shmbuf.h | 24 ++++++++++++------------
 include/uapi/linux/shm.h          | 10 +++++-----
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/uapi/asm-generic/shmbuf.h b/include/uapi/asm-generic/shmbuf.h
index 5768fa6..7e9fb2f 100644
--- a/include/uapi/asm-generic/shmbuf.h
+++ b/include/uapi/asm-generic/shmbuf.h
@@ -39,21 +39,21 @@ struct shmid64_ds {
 #endif
__kernel_pid_t  shm_cpid;   /* pid of creator */
__kernel_pid_t  shm_lpid;   /* pid of last operator */
-   unsigned long   shm_nattch; /* no. of current attaches */
-   unsigned long   __unused4;
-   unsigned long   __unused5;
+   __kernel_ulong_t shm_nattch; /* no. of current attaches */
+   __kernel_ulong_t __unused4;
+   __kernel_ulong_t __unused5;
 };
 
 struct shminfo64 {
-   unsigned long   shmmax;
-   unsigned long   shmmin;
-   unsigned long   shmmni;
-   unsigned long   shmseg;
-   unsigned long   shmall;
-   unsigned long   __unused1;
-   unsigned long   __unused2;
-   unsigned long   __unused3;
-   unsigned long   __unused4;
+   __kernel_ulong_t shmmax;
+   __kernel_ulong_t shmmin;
+   __kernel_ulong_t shmmni;
+   __kernel_ulong_t shmseg;
+   __kernel_ulong_t shmall;
+   __kernel_ulong_t __unused1;
+   __kernel_ulong_t __unused2;
+   __kernel_ulong_t __unused3;
+   __kernel_ulong_t __unused4;
 };
 
 #endif /* __ASM_GENERIC_SHMBUF_H */
diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
index ec36fa1..78b6941 100644
--- a/include/uapi/linux/shm.h
+++ b/include/uapi/linux/shm.h
@@ -68,11 +68,11 @@ struct  shminfo {
 
 struct shm_info {
int used_ids;
-   unsigned long shm_tot;  /* total allocated shm */
-   unsigned long shm_rss;  /* total resident shm */
-   unsigned long shm_swp;  /* total swapped shm */
-   unsigned long swap_attempts;
-   unsigned long swap_successes;
+   __kernel_ulong_t shm_tot;   /* total allocated shm */
+   __kernel_ulong_t shm_rss;   /* total resident shm */
+   __kernel_ulong_t shm_swp;   /* total swapped shm */
+   __kernel_ulong_t swap_attempts;
+   __kernel_ulong_t swap_successes;
 };
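
The x32 changes in this series rest on one point: x32 userspace is ILP32, so
its long is 32 bits, while the kernel entry points it shares with x86-64 read
and write these structures with 64-bit fields.  A rough sketch using
fixed-width types shows how the two layouts disagree.  The struct names are
hypothetical; the field names are copied from struct shm_info.

```c
#include <stddef.h>
#include <stdint.h>

/* What an ILP32 userspace would see if the header said "unsigned long". */
struct toy_shm_info_long32 {
	int used_ids;
	uint32_t shm_tot;
	uint32_t shm_rss;
	uint32_t shm_swp;
	uint32_t swap_attempts;
	uint32_t swap_successes;
};

/* What the 64-bit kernel fills in, and what __kernel_ulong_t gives x32. */
struct toy_shm_info_ulong64 {
	int used_ids;
	uint64_t shm_tot;
	uint64_t shm_rss;
	uint64_t shm_swp;
	uint64_t swap_attempts;
	uint64_t swap_successes;
};

/* Bytes the kernel would write beyond a 32-bit-long caller's buffer. */
static size_t toy_layout_gap(void)
{
	return sizeof(struct toy_shm_info_ulong64) -
	       sizeof(struct toy_shm_info_long32);
}
```

A nonzero gap means a caller built against the 32-bit-long layout hands the
kernel a buffer that is too small and reads fields at the wrong offsets, which
is the mismatch the switch to __kernel_ulong_t removes.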
 
 

Re: [PATCH 2/3] mm/memcg: fix endless iteration in reclaim

2014-01-20 Thread Hugh Dickins

On Fri, 17 Jan 2014, Michal Hocko wrote:
> On Thu 16-01-14 11:15:36, Hugh Dickins wrote:
> 
> > I don't believe 19f39402864e was responsible for a reference leak,
> > that came later.  But I think it was responsible for the original
> > endless iteration (shrink_zone going around and around getting root
> > again and again from mem_cgroup_iter).
> 
> So your hang is not within mem_cgroup_iter but you are getting root all
> the time without any way out?

In the 3.10 and 3.11 cases, yes.

> 
> [3.10 code base]
> shrink_zone
>   [rmdir root]
>   mem_cgroup_iter(root, NULL, reclaim)
> // prev = NULL
> rcu_read_lock()
> last_visited = iter->last_visited // gets root || NULL
> css_tryget(last_visited)  // failed
> last_visited = NULL   [1]
> memcg = root = __mem_cgroup_iter_next(root, NULL)
> iter->last_visited = root;
> reclaim->generation = iter->generation
> 
>  mem_cgroup_iter(root, root, reclaim)
>// prev = root
>rcu_read_lock
> last_visited = iter->last_visited // gets root
> css_tryget(last_visited)  // failed
> [1]
> 
> So we indeed can loop here without any progress. I just fail
> to see how my patch could help. We even do not get down to
> cgroup_next_descendant_pre.
> 
> Or am I missing something?

Your patch to 3.12 and 3.13 mem_cgroup_iter_next() doesn't help
in 3.10 and 3.11, correct.  That's why I appended a different patch,
to mem_cgroup_iter(), for the 3.10 and 3.11 versions of the hang.

> 
> The following should fix this kind of endless loop:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 194721839cf5..168e5abcca92 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1221,7 +1221,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup 
> *root,
>   smp_rmb();
>   last_visited = iter->last_visited;
>   if (last_visited &&
> - !css_tryget(&last_visited->css))
> + last_visited != root &&
> +  !css_tryget(&last_visited->css))
>   last_visited = NULL;
>   }
>   }
> @@ -1229,7 +1230,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup 
> *root,
>   memcg = __mem_cgroup_iter_next(root, last_visited);
>  
>   if (reclaim) {
> - if (last_visited)
> + if (last_visited && last_visited != root)
>   css_put(&last_visited->css);
>  
>   iter->last_visited = memcg;

Right, that appears to fix 3.10, and seems a better alternative to the
patch I suggested.  I say "appears" because my success in reproducing
the hang is variable, so when I see that it's "fixed" I cannot be
quite sure.  I say "seems" because I think yours respects the intention
of the iterator better than mine, but I've never been convinced that
the iterator is as sensible as it intends in the face of races.
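
To make the 3.10 trace quoted above concrete, here is a toy userspace model of
it: a hierarchy containing only a dying root, an iterator that caches
last_visited, and a tryget that fails on the dying root.  Every toy_* name is
hypothetical and the model ignores RCU, generations and descendants; it only
mirrors the control flow in Michal's trace and the effect of skipping the
tryget when last_visited is the root.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_memcg {
	bool dying;		/* tryget fails once the group is being removed */
};

struct toy_iter {
	struct toy_memcg *last_visited;
};

static bool toy_tryget(const struct toy_memcg *m)
{
	return !m->dying;
}

/* Root-only hierarchy: after nothing comes root, after root comes the end. */
static struct toy_memcg *toy_next(struct toy_memcg *root,
				  struct toy_memcg *last)
{
	return last ? NULL : root;
}

static struct toy_memcg *toy_iter_step(struct toy_memcg *root,
				       struct toy_iter *it,
				       bool skip_tryget_for_root)
{
	struct toy_memcg *last = it->last_visited;
	struct toy_memcg *memcg;

	if (last && !(skip_tryget_for_root && last == root) &&
	    !toy_tryget(last))
		last = NULL;	/* the "[1]" step in the trace */

	memcg = toy_next(root, last);
	it->last_visited = memcg;
	return memcg;
}

/* Number of groups handed out before the walk ends, capped at cap. */
static int toy_walk(bool skip_tryget_for_root, int cap)
{
	struct toy_memcg root = { .dying = true };
	struct toy_iter it = { .last_visited = NULL };
	int steps = 0;

	while (steps < cap && toy_iter_step(&root, &it, skip_tryget_for_root))
		steps++;

	return steps;
}
```

In this model toy_walk(false, cap) hits the cap because the root is handed out
again on every step, while toy_walk(true, cap) hands it out once and stops.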

At the bottom I've appended the version of yours that I've been trying
on 3.11.  I did succeed in reproducing the hang twice on 3.11.10.3
(which I don't think differs in any essential from 3.11.0 for this issue,
but after my lack of success with 3.11.0 I tried harder with that.)

More so than in the 3.10 case, I haven't really given it long enough
with the patch to really assert that it's good; and Greg Thelen came
across a different reproduction case that I've yet to remind myself
of and try, I'll have to report back to you later in the week when
I've run that with your fix.

> 
> Not that I like it much :/

Well, I'm not in love with it, but I do think it's more appropriate
than mine, if it really does fix the issues.  It was only under
questioning from you that we arrived at the belief that the problem
is with the css_tryget of a root being removed: my patch was vaguer
than that, not identifying the root cause.

I suspect that the underlying problem is actually the "do {} while ()"
nature of the iteration loops, instead of "while () {}"s.  That places
us (not for the first time) in the awkward position of having to supply
something once (and once only) even when it doesn't really fit.

(I have wondered whether making mem_cgroup_invalidate_reclaim_iterators
visit the memcg as well as its parents, might provide another fix; nice
if it did, but I doubt it, and have spent so much time fiddling around
here that I've lost the will to try anything else.)

> 
> > But beware of my conclusion, please check for yourself: with my
> > separate kbuilds in separate /cg/cg/? memcgs, what "cg m" is doing
> > is very simple and segregated, can hardly be called testing reclaim
> > iteration, so I hope you have something better to check it.  Plus
> > I was testing on 3.10 and 3.11 vanilla, not latest stable versions.
> > 
> > (If I'm very honest, I'll admit that I

[tip:x86/x32] uapi: Use __kernel_long_t in struct mq_attr

2014-01-20 Thread tip-bot for H.J. Lu

Commit-ID:  63159f5dcccb3858d88aaef800c4ee0eb4cc8577
Gitweb: http://git.kernel.org/tip/63159f5dcccb3858d88aaef800c4ee0eb4cc8577
Author: H.J. Lu 
AuthorDate: Fri, 27 Dec 2013 14:14:24 -0800
Committer:  H. Peter Anvin 
CommitDate: Mon, 20 Jan 2014 14:45:33 -0800

uapi: Use __kernel_long_t in struct mq_attr

Both x32 and x86-64 use the same struct mq_attr for system calls.  But
x32 long is 32-bit. This patch replaces long with __kernel_long_t in
struct mq_attr.

Signed-off-by: H.J. Lu 
Link: http://lkml.kernel.org/r/1388182464-28428-9-git-send-email-hjl.to...@gmail.com
Signed-off-by: H. Peter Anvin 
---
 include/uapi/linux/mqueue.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h
index 8b5a796..d0a2b8e 100644
--- a/include/uapi/linux/mqueue.h
+++ b/include/uapi/linux/mqueue.h
@@ -23,11 +23,11 @@
 #define MQ_BYTES_MAX   819200
 
 struct mq_attr {
-   longmq_flags;   /* message queue flags  */
-   longmq_maxmsg;  /* maximum number of messages   */
-   longmq_msgsize; /* maximum message size */
-   longmq_curmsgs; /* number of messages currently queued  */
-   long__reserved[4];  /* ignored for input, zeroed for output */
+   __kernel_long_t mq_flags;   /* message queue flags  */
+   __kernel_long_t mq_maxmsg;  /* maximum number of messages   */
+   __kernel_long_t mq_msgsize; /* maximum message size */
+   __kernel_long_t mq_curmsgs; /* number of messages currently queued  */
+   __kernel_long_t __reserved[4];  /* ignored for input, zeroed for output */
 };
 
 /*

[tip:x86/x32] uapi: Use __kernel_ulong_t in struct msqid64_ds

2014-01-20 Thread tip-bot for H.J. Lu

Commit-ID:  b9cd5ca22d6739c61655d4fcf8b29669d5d177a3
Gitweb: http://git.kernel.org/tip/b9cd5ca22d6739c61655d4fcf8b29669d5d177a3
Author: H.J. Lu 
AuthorDate: Fri, 27 Dec 2013 14:14:21 -0800
Committer:  H. Peter Anvin 
CommitDate: Mon, 20 Jan 2014 14:45:01 -0800

uapi: Use __kernel_ulong_t in struct msqid64_ds

Both x32 and x86-64 use the same struct msqid64_ds for system calls.
But x32 long is 32-bit. This patch replaces unsigned long with
__kernel_ulong_t in struct msqid64_ds.

Signed-off-by: H.J. Lu 
Link: http://lkml.kernel.org/r/1388182464-28428-6-git-send-email-hjl.to...@gmail.com
Signed-off-by: H. Peter Anvin 
---
 include/uapi/asm-generic/msgbuf.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/uapi/asm-generic/msgbuf.h b/include/uapi/asm-generic/msgbuf.h
index aec850d..f55ecc4 100644
--- a/include/uapi/asm-generic/msgbuf.h
+++ b/include/uapi/asm-generic/msgbuf.h
@@ -35,13 +35,13 @@ struct msqid64_ds {
 #if __BITS_PER_LONG != 64
unsigned long   __unused3;
 #endif
-   unsigned long  msg_cbytes;  /* current number of bytes on queue */
-   unsigned long  msg_qnum;/* number of messages in queue */
-   unsigned long  msg_qbytes;  /* max number of bytes on queue */
+   __kernel_ulong_t msg_cbytes;/* current number of bytes on queue */
+   __kernel_ulong_t msg_qnum;  /* number of messages in queue */
+   __kernel_ulong_t msg_qbytes;/* max number of bytes on queue */
__kernel_pid_t msg_lspid;   /* pid of last msgsnd */
__kernel_pid_t msg_lrpid;   /* last receive pid */
-   unsigned long  __unused4;
-   unsigned long  __unused5;
+   __kernel_ulong_t __unused4;
+   __kernel_ulong_t __unused5;
 };
 
 #endif /* __ASM_GENERIC_MSGBUF_H */

[tip:x86/x32] x86, uapi, x32: Use __kernel_ulong_t in x86 struct semid64_ds

2014-01-20 Thread tip-bot for H.J. Lu

Commit-ID:  386916598e901e406c1f1fc801ade2646a1e8137
Gitweb: http://git.kernel.org/tip/386916598e901e406c1f1fc801ade2646a1e8137
Author: H.J. Lu 
AuthorDate: Fri, 27 Dec 2013 14:14:22 -0800
Committer:  H. Peter Anvin 
CommitDate: Mon, 20 Jan 2014 14:45:13 -0800

x86, uapi, x32: Use __kernel_ulong_t in x86 struct semid64_ds

Both x32 and x86-64 use the same struct semid64_ds for system calls.
But x32 long is 32-bit. This patch replaces unsigned long with
__kernel_ulong_t in x86 struct semid64_ds.

Signed-off-by: H.J. Lu 
Link: http://lkml.kernel.org/r/1388182464-28428-7-git-send-email-hjl.to...@gmail.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/uapi/asm/sembuf.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/uapi/asm/sembuf.h b/arch/x86/include/uapi/asm/sembuf.h
index ee50c80..cc2d6a3 100644
--- a/arch/x86/include/uapi/asm/sembuf.h
+++ b/arch/x86/include/uapi/asm/sembuf.h
@@ -13,12 +13,12 @@
 struct semid64_ds {
struct ipc64_perm sem_perm; /* permissions .. see ipc.h */
__kernel_time_t sem_otime;  /* last semop time */
-   unsigned long   __unused1;
+   __kernel_ulong_t __unused1;
__kernel_time_t sem_ctime;  /* last change time */
-   unsigned long   __unused2;
-   unsigned long   sem_nsems;  /* no. of semaphores in array */
-   unsigned long   __unused3;
-   unsigned long   __unused4;
+   __kernel_ulong_t __unused2;
+   __kernel_ulong_t sem_nsems; /* no. of semaphores in array */
+   __kernel_ulong_t __unused3;
+   __kernel_ulong_t __unused4;
 };
 
 #endif /* _ASM_X86_SEMBUF_H */

[tip:x86/x32] uapi, asm-generic: Use __kernel_ulong_t in uapi struct ipc64_perm

2014-01-20 Thread tip-bot for H.J. Lu

Commit-ID:  071ed2456f79722d0a54f51717e66aacbc7a5d26
Gitweb: http://git.kernel.org/tip/071ed2456f79722d0a54f51717e66aacbc7a5d26
Author: H.J. Lu 
AuthorDate: Fri, 27 Dec 2013 14:14:19 -0800
Committer:  H. Peter Anvin 
CommitDate: Mon, 20 Jan 2014 14:44:35 -0800

uapi, asm-generic: Use __kernel_ulong_t in uapi struct ipc64_perm

x32 IPC system call is the same as x86-64 IPC system call, which uses
64-bit integer for unsigned long in struct ipc64_perm.  But x32 long is
32 bit.  This patch replaces unsigned long in uapi struct ipc64_perm with
__kernel_ulong_t.

Signed-off-by: H.J. Lu 
Link: http://lkml.kernel.org/r/1388182464-28428-4-git-send-email-hjl.to...@gmail.com
Signed-off-by: H. Peter Anvin 
---
 include/uapi/asm-generic/ipcbuf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/asm-generic/ipcbuf.h b/include/uapi/asm-generic/ipcbuf.h
index 76982b2..3dbcc1e 100644
--- a/include/uapi/asm-generic/ipcbuf.h
+++ b/include/uapi/asm-generic/ipcbuf.h
@@ -27,8 +27,8 @@ struct ipc64_perm {
unsigned char   __pad1[4 - sizeof(__kernel_mode_t)];
unsigned short  seq;
unsigned short  __pad2;
-   unsigned long   __unused1;
-   unsigned long   __unused2;
+   __kernel_ulong_t __unused1;
+   __kernel_ulong_t __unused2;
 };
 
 #endif /* __ASM_GENERIC_IPCBUF_H */

[tip:x86/x32] uapi: Use __kernel_long_t in struct timex

2014-01-20 Thread tip-bot for H.J. Lu

Commit-ID:  7fb30128527a4220f181c2867edd9ac178175a87
Gitweb: http://git.kernel.org/tip/7fb30128527a4220f181c2867edd9ac178175a87
Author: H.J. Lu 
AuthorDate: Fri, 27 Dec 2013 14:14:17 -0800
Committer:  H. Peter Anvin 
CommitDate: Mon, 20 Jan 2014 14:44:05 -0800

uapi: Use __kernel_long_t in struct timex

x32 adjtimex system call is the same as x86-64 adjtimex system call,
which uses 64-bit integer for long in struct timex. But x32 long is
32 bit.  This patch replaces long in struct timex with __kernel_long_t.

Signed-off-by: H.J. Lu 
Link: http://lkml.kernel.org/r/1388182464-28428-2-git-send-email-hjl.to...@gmail.com
Signed-off-by: H. Peter Anvin 
---
 include/uapi/linux/timex.h | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/timex.h b/include/uapi/linux/timex.h
index a7ea81f..92685d8 100644
--- a/include/uapi/linux/timex.h
+++ b/include/uapi/linux/timex.h
@@ -63,27 +63,27 @@
  */
 struct timex {
unsigned int modes; /* mode selector */
-   long offset;/* time offset (usec) */
-   long freq;  /* frequency offset (scaled ppm) */
-   long maxerror;  /* maximum error (usec) */
-   long esterror;  /* estimated error (usec) */
+   __kernel_long_t offset; /* time offset (usec) */
+   __kernel_long_t freq;   /* frequency offset (scaled ppm) */
+   __kernel_long_t maxerror; /* maximum error (usec) */
+   __kernel_long_t esterror; /* estimated error (usec) */
int status; /* clock command/status */
-   long constant;  /* pll time constant */
-   long precision; /* clock precision (usec) (read only) */
-   long tolerance; /* clock frequency tolerance (ppm)
-* (read only)
-*/
+   __kernel_long_t constant;/* pll time constant */
+   __kernel_long_t precision;/* clock precision (usec) (read only) */
+   __kernel_long_t tolerance;/* clock frequency tolerance (ppm)
+  * (read only)
+  */
struct timeval time;/* (read only, except for ADJ_SETOFFSET) */
-   long tick;  /* (modified) usecs between clock ticks */
+   __kernel_long_t tick;   /* (modified) usecs between clock ticks */
 
-   long ppsfreq;   /* pps frequency (scaled ppm) (ro) */
-   long jitter;/* pps jitter (us) (ro) */
+   __kernel_long_t ppsfreq;/* pps frequency (scaled ppm) (ro) */
+   __kernel_long_t jitter; /* pps jitter (us) (ro) */
int shift;  /* interval duration (s) (shift) (ro) */
-   long stabil;/* pps stability (scaled ppm) (ro) */
-   long jitcnt;/* jitter limit exceeded (ro) */
-   long calcnt;/* calibration intervals (ro) */
-   long errcnt;/* calibration errors (ro) */
-   long stbcnt;/* stability limit exceeded (ro) */
+   __kernel_long_t stabil;/* pps stability (scaled ppm) (ro) */
+   __kernel_long_t jitcnt; /* jitter limit exceeded (ro) */
+   __kernel_long_t calcnt; /* calibration intervals (ro) */
+   __kernel_long_t errcnt; /* calibration errors (ro) */
+   __kernel_long_t stbcnt; /* stability limit exceeded (ro) */
 
int tai;/* TAI offset (ro) */
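
To make the size mismatch concrete, here is a small userspace sketch (not kernel code). The names `x32_user_long`, `kernel_long_model` and both struct names are hypothetical stand-ins: the first models `long` as x32 userspace sees it (32 bits), the second models `__kernel_long_t` as the 64-bit integer the x86-64 system call expects. Only the first three fields of struct timex are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, for illustration only. */
typedef int32_t x32_user_long;      /* `long` as seen by x32 userspace */
typedef int64_t kernel_long_model;  /* models __kernel_long_t on x32   */

/* Layout userspace would get if the header kept plain `long`. */
struct timex_head_plain_long {
    unsigned int  modes;
    x32_user_long offset;
    x32_user_long freq;
};

/* Layout after the patch: fields are as wide as the kernel's long. */
struct timex_head_kernel_long {
    unsigned int      modes;
    kernel_long_model offset;
    kernel_long_model freq;
};

/* The two layouts must differ, otherwise the patch would be a no-op. */
static_assert(sizeof(struct timex_head_kernel_long) >
              sizeof(struct timex_head_plain_long),
              "64-bit fields must make the struct larger");

size_t freq_offset_plain(void)
{
    return offsetof(struct timex_head_plain_long, freq);
}

size_t freq_offset_kernel(void)
{
    return offsetof(struct timex_head_kernel_long, freq);
}
```

With the plain `long` layout, the kernel would look for `freq` at a different offset than the one userspace wrote to, which is the mismatch the patch removes.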
 

[tip:x86/x32] uapi: Use __kernel_long_t in struct msgbuf

2014-01-20 Thread tip-bot for H.J. Lu

Commit-ID:  443d5670f77aab121cb95f45da60f0aad390bcb5
Gitweb: http://git.kernel.org/tip/443d5670f77aab121cb95f45da60f0aad390bcb5
Author: H.J. Lu 
AuthorDate: Fri, 27 Dec 2013 14:14:20 -0800
Committer:  H. Peter Anvin 
CommitDate: Mon, 20 Jan 2014 14:44:50 -0800

uapi: Use __kernel_long_t in struct msgbuf

x32 msgsnd/msgrcv system calls are the same as x86-64 msgsnd/msgrcv system
calls, which use 64-bit integer for long in struct msgbuf. But x32 long
is 32 bit.  This patch replaces long in struct msgbuf with __kernel_long_t.

Signed-off-by: H.J. Lu 
Link: http://lkml.kernel.org/r/1388182464-28428-5-git-send-email-hjl.to...@gmail.com
Signed-off-by: H. Peter Anvin 
---
 include/uapi/linux/msg.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/msg.h b/include/uapi/linux/msg.h
index 22d95c6..a703755 100644
--- a/include/uapi/linux/msg.h
+++ b/include/uapi/linux/msg.h
@@ -34,8 +34,8 @@ struct msqid_ds {
 
 /* message buffer for msgsnd and msgrcv calls */
 struct msgbuf {
-   long mtype; /* type of message */
-   char mtext[1];  /* message text */
+   __kernel_long_t mtype;  /* type of message */
+   char mtext[1];  /* message text */
 };
 
 /* buffer for msgctl calls IPC_INFO, MSG_INFO */

[tip:x86/x32] uapi: Use __kernel_long_t/__kernel_ulong_t in <linux/resource.h>

2014-01-20 Thread tip-bot for H.J. Lu

Commit-ID:  b684bfedc94d4b2efff09dc499a9985321c482f5
Gitweb: http://git.kernel.org/tip/b684bfedc94d4b2efff09dc499a9985321c482f5
Author: H.J. Lu 
AuthorDate: Fri, 27 Dec 2013 14:14:18 -0800
Committer:  H. Peter Anvin 
CommitDate: Mon, 20 Jan 2014 14:44:17 -0800

uapi: Use __kernel_long_t/__kernel_ulong_t in <linux/resource.h>

Both x32 and x86-64 use the same struct rusage and struct rlimit for
system calls.  But x32 long is 32-bit.  This patch changes uapi
<linux/resource.h> to use __kernel_long_t in struct rusage and
__kernel_ulong_t in struct rlimit.

Signed-off-by: H.J. Lu 
Link: http://lkml.kernel.org/r/1388182464-28428-3-git-send-email-hjl.to...@gmail.com
Signed-off-by: H. Peter Anvin 
---
 include/uapi/linux/resource.h | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/include/uapi/linux/resource.h b/include/uapi/linux/resource.h
index e0ed284..36fb3b5 100644
--- a/include/uapi/linux/resource.h
+++ b/include/uapi/linux/resource.h
@@ -23,25 +23,25 @@
 struct rusage {
struct timeval ru_utime;/* user time used */
struct timeval ru_stime;/* system time used */
-   long ru_maxrss;  /* maximum resident set size */
-   long ru_ixrss;   /* integral shared memory size */
-   long ru_idrss;   /* integral unshared data size */
-   long ru_isrss;   /* integral unshared stack size */
-   long ru_minflt;  /* page reclaims */
-   long ru_majflt;  /* page faults */
-   long ru_nswap;   /* swaps */
-   long ru_inblock; /* block input operations */
-   long ru_oublock; /* block output operations */
-   long ru_msgsnd;  /* messages sent */
-   long ru_msgrcv;  /* messages received */
-   long ru_nsignals;/* signals received */
-   long ru_nvcsw;   /* voluntary context switches */
-   long ru_nivcsw;  /* involuntary " */
+   __kernel_long_t ru_maxrss;  /* maximum resident set size */
+   __kernel_long_t ru_ixrss;   /* integral shared memory size */
+   __kernel_long_t ru_idrss;   /* integral unshared data size */
+   __kernel_long_t ru_isrss;   /* integral unshared stack size */
+   __kernel_long_t ru_minflt;  /* page reclaims */
+   __kernel_long_t ru_majflt;  /* page faults */
+   __kernel_long_t ru_nswap;   /* swaps */
+   __kernel_long_t ru_inblock; /* block input operations */
+   __kernel_long_t ru_oublock; /* block output operations */
+   __kernel_long_t ru_msgsnd;  /* messages sent */
+   __kernel_long_t ru_msgrcv;  /* messages received */
+   __kernel_long_t ru_nsignals;/* signals received */
+   __kernel_long_t ru_nvcsw;   /* voluntary context switches */
+   __kernel_long_t ru_nivcsw;  /* involuntary " */
 };
 
 struct rlimit {
-   unsigned long   rlim_cur;
-   unsigned long   rlim_max;
+   __kernel_ulong_t rlim_cur;
+   __kernel_ulong_t rlim_max;
 };
 
 #define RLIM64_INFINITY(~0ULL)

Re: [PATCH v3] arch: use ASM_NL instead of ';' for assembler new line character in the macro

2014-01-20 Thread Vineet Gupta

Hi Mike,

On Saturday 18 January 2014 03:14 PM, Chen Gang wrote:
> Hello Maintainers:
>
> Please help check this patch when you have time.
>
> Thanks.

Do you know whose tree this is going to go through? I can take it through ARC (but maybe
for 3.15; however, it would be better if it went through mm or some such).

-Vineet

>
> On 01/12/2014 09:59 AM, Chen Gang wrote:
>> For some assemblers, they use another character as newline in a macro
>> (e.g. arc uses '`'), so for generic assembly code, need use ASM_NL (a
>> macro) instead of ';' for it.
>>
>>
>> Signed-off-by: Chen Gang 
>> Acked-by: Vineet Gupta 
>> ---
>>  arch/arc/include/asm/linkage.h |  2 ++
>>  include/linux/linkage.h| 19 ---
>>  2 files changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arc/include/asm/linkage.h b/arch/arc/include/asm/linkage.h
>> index 0283e9e..66ee552 100644
>> --- a/arch/arc/include/asm/linkage.h
>> +++ b/arch/arc/include/asm/linkage.h
>> @@ -11,6 +11,8 @@
>>  
>>  #ifdef __ASSEMBLY__
>>  
>> +#define ASM_NL   `  /* use '`' to mark new line in macro */
>> +
>>  /* Can't use the ENTRY macro in linux/linkage.h
>>   * gas considers ';' as comment vs. newline
>>   */
>> diff --git a/include/linux/linkage.h b/include/linux/linkage.h
>> index d3e8ad2..a6a42dd 100644
>> --- a/include/linux/linkage.h
>> +++ b/include/linux/linkage.h
>> @@ -6,6 +6,11 @@
>>  #include 
>>  #include 
>>  
>> +/* Some toolchains use other characters (e.g. '`') to mark new line in macro */
>> +#ifndef ASM_NL
>> +#define ASM_NL   ;
>> +#endif
>> +
>>  #ifdef __cplusplus
>>  #define CPP_ASMLINKAGE extern "C"
>>  #else
>> @@ -75,21 +80,21 @@
>>  
>>  #ifndef ENTRY
>>  #define ENTRY(name) \
>> -  .globl name; \
>> -  ALIGN; \
>> -  name:
>> +.globl name ASM_NL \
>> +ALIGN ASM_NL \
>> +name:
>>  #endif
>>  #endif /* LINKER_SCRIPT */
>>  
>>  #ifndef WEAK
>>  #define WEAK(name) \
>> -.weak name;\
>> +.weak name ASM_NL   \
>>  name:
>>  #endif
>>  
>>  #ifndef END
>>  #define END(name) \
>> -  .size name, .-name
>> +.size name, .-name
>>  #endif
>>  
>>  /* If symbol 'name' is treated as a subroutine (gets called, and returns)
>> @@ -98,8 +103,8 @@
>>   */
>>  #ifndef ENDPROC
>>  #define ENDPROC(name) \
>> -  .type name, @function; \
>> -  END(name)
>> +.type name, @function ASM_NL \
>> +END(name)
>>  #endif
>>  
>>  #endif
>>
>
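
The fallback pattern in the quoted patch can be tried out with the plain C preprocessor. The sketch below is a userspace illustration only: `DEMO_ENTRY`, `DEMO_STR` and the two helper functions are hypothetical names, `ALIGN` is left as an unexpanded token, and the macro body is turned into a string so the expansion can be inspected instead of being fed to an assembler.

```c
#include <assert.h>
#include <stddef.h>

/* Same fallback as in the patch: an arch header may define ASM_NL first
 * (ARC uses '`'); everyone else gets ';'. */
#ifndef ASM_NL
#define ASM_NL ;
#endif

/* Two-level stringification so that ASM_NL is expanded before '#'. */
#define DEMO_STR_(x) #x
#define DEMO_STR(x)  DEMO_STR_(x)

/* Hypothetical copy of the ENTRY() shape from the patch. */
#define DEMO_ENTRY(name) .globl name ASM_NL ALIGN ASM_NL name:

/* Returns the expanded macro body as text. */
const char *demo_entry_text(void)
{
    return DEMO_STR(DEMO_ENTRY(foo));
}

/* Counts how often character c occurs in s. */
int demo_count_char(const char *s, char c)
{
    int n = 0;

    assert(s != NULL);
    for (; *s; s++)
        if (*s == c)
            n++;
    return n;
}
```

Because the separator is a macro, an architecture only has to define `ASM_NL` before the generic header is included; the generic `ENTRY`/`WEAK`/`ENDPROC` bodies stay unchanged.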


Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages

2014-01-20 Thread Minchan Kim

Please check your MUA and don't break the thread.

On Tue, Jan 21, 2014 at 11:07:42AM +0800, Cai Liu wrote:
> Thanks for your review.
> 
> 2014/1/21 Minchan Kim :
> > Hello Cai,
> >
> > On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote:
> >> zswap can support multiple swapfiles. So we need to check
> >> all zbud pool pages in zswap.
> >>
> >> Version 2:
> >>   * add *total_zbud_pages* in zbud to record all the pages in pools
> >>   * move the updating of pool pages statistics to
> >> alloc_zbud_page/free_zbud_page to hide the details
> >>
> >> Signed-off-by: Cai Liu 
> >> ---
> >>  include/linux/zbud.h |2 +-
> >>  mm/zbud.c|   44 
> >>  mm/zswap.c   |4 ++--
> >>  3 files changed, 35 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/include/linux/zbud.h b/include/linux/zbud.h
> >> index 2571a5c..1dbc13e 100644
> >> --- a/include/linux/zbud.h
> >> +++ b/include/linux/zbud.h
> >> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long 
> >> handle);
> >>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
> >>  void *zbud_map(struct zbud_pool *pool, unsigned long handle);
> >>  void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
> >> -u64 zbud_get_pool_size(struct zbud_pool *pool);
> >> +u64 zbud_get_pool_size(void);
> >>
> >>  #endif /* _ZBUD_H_ */
> >> diff --git a/mm/zbud.c b/mm/zbud.c
> >> index 9451361..711aaf4 100644
> >> --- a/mm/zbud.c
> >> +++ b/mm/zbud.c
> >> @@ -52,6 +52,13 @@
> >>  #include 
> >>  #include 
> >>
> >> +/*
> >> +* statistics
> >> +**/
> >> +
> >> +/* zbud pages in all pools */
> >> +static u64 total_zbud_pages;
> >> +
> >>  /*
> >>   * Structures
> >>  */
> >> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct 
> >> page *page)
> >>   return zhdr;
> >>  }
> >>
> >> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp)
> >> +{
> >> + struct page *page;
> >> +
> >> + page = alloc_page(gfp);
> >> +
> >> + if (page) {
> >> + pool->pages_nr++;
> >> + total_zbud_pages++;
> >
> > Who protect race?
> 
> Yes, here the pool->pages_nr and also the total_zbud_pages are not protected.
> I will re-do it.
> 
> I will change *total_zbud_pages* to atomic type.

Wait, that doesn't make sense. You are assuming the zbud allocator will be used
only by zswap. That has been true until now, but we can't be sure of it in the future.
If another user starts to use the zbud allocator, total_zbud_pages would be pointless.

Another concern: what is your scenario for the two swaps above?
How often do we need to call zbud_get_pool_size?
In your previous patch you reduced the number of calls, so IIRC
we only call it in zswap_is_full and for debugfs.
Of course, it would need some lock or refcount to prevent destruction
of zswap_tree in parallel with zswap_frontswap_invalidate_area, but
zswap_is_full doesn't need to be exact, so RCU would be a good fit.

The most important point is that zswap currently doesn't consider multiple swaps.
For example, let's assume you use two swaps, A and B, with different priorities.
A was charged 19% a long time ago, and A is now full, so the VM starts
to use B, and B has recently been charged 1%.
That means zswap has charged (19% + 1%), which is full by default.

Then, if the VM wants to swap out more pages into B, zbud_reclaim_page
would evict one of the pages in B's pool, and this would be repeated
continuously. It's a complete LRU inversion problem, and swap thrashing in B
would happen.

Please describe your use-case scenario; if it's really a problem,
we need more surgery.

Thanks.
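
The 19% + 1% arithmetic above can be written down as a small userspace sketch. Everything here is a model, not zswap code: `pool_over_limit` and `combined_pool_pages` are hypothetical helpers, the 20% limit is the default implied by the example, and the strict `<` comparison is an assumption about how the limit check is phrased.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when pool_pages exceeds max_percent of total RAM (assumed check). */
bool pool_over_limit(uint64_t total_ram_pages, uint64_t pool_pages,
                     unsigned int max_percent)
{
    assert(max_percent <= 100);
    return total_ram_pages * max_percent / 100 < pool_pages;
}

/* Sums the page counts of all per-swap pools. */
uint64_t combined_pool_pages(const uint64_t *per_swap_pages, int nr_swaps)
{
    uint64_t sum = 0;

    assert(per_swap_pages != 0 && nr_swaps >= 0);
    for (int i = 0; i < nr_swaps; i++)
        sum += per_swap_pages[i];
    return sum;
}
```

With 1,000,000 pages of RAM, A holding 190,000 pages and B holding 10,000, neither pool is over a 20% limit on its own, but the combined total sits exactly at the limit, so the next page stored in B tips it over and B's freshly stored pages become the eviction candidates.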

> For *pool->pages_nr*, one way is to use pool->lock to protect. But I
> think it is too heavy.
> So does it ok to change pages_nr to atomic type too?
> 
> 
> >
> >> + }
> >> +
> >> + return page;
> >> +}
> >> +
> >> +
> >>  /* Resets the struct page fields and frees the page */
> >> -static void free_zbud_page(struct zbud_header *zhdr)
> >> +static void free_zbud_page(struct zbud_pool *pool, struct zbud_header 
> >> *zhdr)
> >>  {
> >>   __free_page(virt_to_page(zhdr));
> >> +
> >> + pool->pages_nr--;
> >> + total_zbud_pages--;
> >>  }
> >>
> >>  /*
> >> @@ -279,11 +304,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, 
> >> gfp_t gfp,
> >>
> >>   /* Couldn't find unbuddied zbud page, create new one */
> >>   spin_unlock(&pool->lock);
> >> - page = alloc_page(gfp);
> >> + page = alloc_zbud_page(pool, gfp);
> >>   if (!page)
> >>   return -ENOMEM;
> >>   spin_lock(&pool->lock);
> >> - pool->pages_nr++;
> >>   zhdr = init_zbud_page(page);
> >>   bud = FIRST;
> >>
> >> @@ -349,8 +373,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long 
> >> handle)
> >>   if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> >>   /* zbud page is empty, free */
> >>   list_del(&zhdr->lru);

Re: Dirty deleted files cause pointless I/O storms (unless truncated first)

2014-01-20 Thread Dave Chinner

On Mon, Jan 20, 2014 at 04:59:23PM -0800, Andy Lutomirski wrote:
> The code below runs quickly for a few iterations, and then it slows
> down and the whole system becomes laggy for far too long.
> 
> Removing the sync_file_range call results in no I/O being performed at
> all (which means that the kernel isn't totally screwing this up), and
> changing "4096" to SIZE causes lots of I/O but without
> the going-out-to-lunch bit (unsurprisingly).

More details please. hardware, storage, kernel version, etc.

I can't reproduce any slowdown with the code as posted on a VM
running 3.13-rc5 with 16GB RAM and an SSD w/ ext4 or XFS. The
workload is only generating about 80 IOPS on ext4 so even a slow
spindle should be able to handle this without problems...

> Surprisingly, uncommenting the ftruncate call seems to fix the
> problem.  This suggests that all the necessary infrastructure to avoid
> wasting time writing to deleted files is there but that it's not
> getting used.

Not surprising at all - if it's stuck in a writeback loop somewhere,
truncating the file will terminate writeback because it ends up being
past EOF and so stops immediately...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
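
For readers who want to try the truncate-before-close workaround discussed above, here is a minimal userspace sketch. It is not Andy's test program (which is not reproduced in this message), it does not call sync_file_range, and the function name `scratch_write_and_discard` and the /tmp path are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes `size` bytes to a deleted scratch file, then truncates it to
 * zero length before closing. Returns 0 on success, -1 on error. */
int scratch_write_and_discard(size_t size)
{
    char path[] = "/tmp/scratch-demo-XXXXXX";
    char block[4096];
    int fd, rc;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* deleted, but still open */

    memset(block, 0xab, sizeof(block));
    for (size_t done = 0; done < size; done += sizeof(block)) {
        if (write(fd, block, sizeof(block)) != (ssize_t)sizeof(block)) {
            close(fd);
            return -1;
        }
    }

    /* After this, all previously dirtied pages lie past EOF. */
    rc = ftruncate(fd, 0);
    close(fd);
    return rc;
}
```

Whether this avoids the I/O on a given kernel and filesystem is exactly what the thread is trying to establish; the sketch only shows the call order.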

Re: [PATCH v3] input/uinput: add UI_GET_SYSNAME ioctl to retrieve the sysfs path

2014-01-20 Thread Dmitry Torokhov

On Tue, Jan 21, 2014 at 08:56:51AM +1000, Peter Hutterer wrote:
> On Mon, Jan 20, 2014 at 05:17:08PM -0500, Benjamin Tissoires wrote:
> > On Mon, Jan 20, 2014 at 4:53 PM, Dmitry Torokhov
> >  wrote:
> > > Hi Benjamin,
> > >
> > > On Fri, Jan 17, 2014 at 02:12:51PM -0500, Benjamin Tissoires wrote:
> > >> Evemu [1] uses uinput to replay devices traces it has recorded. However,
> > >> the way evemu uses uinput is slightly different from how uinput is
> > >> supposed to be used.
> > >> Evemu relies on libevdev, which creates the device node through uinput.
> > >> It then injects events through the input device node directly (and it
> > >> completely skips the uinput node).
> > >>
> > > >> Currently, libevdev relies on a heuristic to guess which input node was
> > > >> created. The problem is that this heuristic is subject to races between
> > > >> different uinput devices or even with physical devices. Having a way
> > > >> to retrieve the sysfs path allows us to find the event node without
> > > >> having to rely on this heuristic.
> > >
> > > > I have been thinking about it and I think that providing tight coupling
> > > > between uinput and the resulting event device is the wrong thing to do. We do
> > > > allow sending input events through the uinput interface and I think evemu
> > > > should be using it, instead of going halfway through uinput and halfway
> > > > through evdev. Replaying through uinput would actually be more correct as
> > > > it would involve the same code paths through input core as with using
> > > > real devices (see input_event() vs. input_inject_event() that is used by
> > > > input handlers).
> > >
> > 
> > Yes, I am perfectly aware of the fact that evemu is not using uinput
> > in the way it is intended to be.
> > I agree that it should be using the uinput node to inject events but
> > this means that only the process which has created the virtual device
> > can access it. It seems weird, I know, but the typical use of evemu is
> > the following:
> > - in a first terminal: $> sudo evemu-device mydevice.desc
> > - In a second: $> sudo evemu-play /dev/input/event12 < mydevice.events
> > 
> > It looks weird here, but it allows to inject different events
> > recording for the same virtual device node. 
> 
> it also allows replaying an event through the device it was recorded on.
> it's not always necessary or desirable to create a uinput device, sometimes
> replaying it through the actual device is better to reproduce a certain bug.

I was not saying that we should remove the ability to inject events through
evdev nodes, so I am not sure why you are bringing up your last point, but
from your and Benjamin's other mails I can see why going through evdev
(that has a separate device node) might be beneficial.

Benjamin, please clean up the issues brought up by David and I should be
able to apply the patch.

Thanks.

-- 
Dmitry
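
As an illustration of what the proposed ioctl would give userspace, here is a hedged sketch. It assumes the `UI_GET_SYSNAME(len)` macro from <linux/uinput.h> (the name in the patch subject; the block is guarded in case the installed headers lack it) and assumes the returned name lives under /sys/devices/virtual/input/. The helper names `format_sysfs_path` and `query_sysfs_path` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/uinput.h>

/* Builds the full sysfs path for a uinput-created device name.
 * Returns 0 on success, -1 if the buffer is too small. */
int format_sysfs_path(char *out, size_t out_len, const char *sysname)
{
    int n;

    assert(out != NULL && sysname != NULL && strlen(sysname) > 0);
    n = snprintf(out, out_len, "/sys/devices/virtual/input/%s", sysname);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

#ifdef UI_GET_SYSNAME
/* Asks an already-created uinput device (fd from /dev/uinput, after
 * UI_DEV_CREATE) for its sysfs name and expands it to a full path. */
int query_sysfs_path(int uinput_fd, char *out, size_t out_len)
{
    char sysname[64];

    if (ioctl(uinput_fd, UI_GET_SYSNAME(sizeof(sysname)), sysname) < 0)
        return -1;
    return format_sysfs_path(out, out_len, sysname);
}
#endif
```

From the resulting directory a library such as libevdev could look for the eventN child instead of guessing which node was created.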

Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages

2014-01-20 Thread Bob Liu

On Tue, Jan 21, 2014 at 11:07 AM, Cai Liu  wrote:
> Thanks for your review.
>
> 2014/1/21 Minchan Kim :
>> Hello Cai,
>>
>> On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote:
>>> zswap can support multiple swapfiles. So we need to check
>>> all zbud pool pages in zswap.
>>>
>>> Version 2:
>>>   * add *total_zbud_pages* in zbud to record all the pages in pools
>>>   * move the updating of pool pages statistics to
>>> alloc_zbud_page/free_zbud_page to hide the details
>>>
>>> Signed-off-by: Cai Liu 
>>> ---
>>>  include/linux/zbud.h |2 +-
>>>  mm/zbud.c|   44 
>>>  mm/zswap.c   |4 ++--
>>>  3 files changed, 35 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/include/linux/zbud.h b/include/linux/zbud.h
>>> index 2571a5c..1dbc13e 100644
>>> --- a/include/linux/zbud.h
>>> +++ b/include/linux/zbud.h
>>> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long 
>>> handle);
>>>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
>>>  void *zbud_map(struct zbud_pool *pool, unsigned long handle);
>>>  void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
>>> -u64 zbud_get_pool_size(struct zbud_pool *pool);
>>> +u64 zbud_get_pool_size(void);
>>>
>>>  #endif /* _ZBUD_H_ */
>>> diff --git a/mm/zbud.c b/mm/zbud.c
>>> index 9451361..711aaf4 100644
>>> --- a/mm/zbud.c
>>> +++ b/mm/zbud.c
>>> @@ -52,6 +52,13 @@
>>>  #include 
>>>  #include 
>>>
>>> +/*
>>> +* statistics
>>> +**/
>>> +
>>> +/* zbud pages in all pools */
>>> +static u64 total_zbud_pages;
>>> +
>>>  /*
>>>   * Structures
>>>  */
>>> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct page 
>>> *page)
>>>   return zhdr;
>>>  }
>>>
>>> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp)
>>> +{
>>> + struct page *page;
>>> +
>>> + page = alloc_page(gfp);
>>> +
>>> + if (page) {
>>> + pool->pages_nr++;
>>> + total_zbud_pages++;
>>
>> Who protect race?
>
> Yes, here the pool->pages_nr and also the total_zbud_pages are not protected.
> I will re-do it.
>
> I will change *total_zbud_pages* to atomic type.

And how about just adding total_zbud_pages++ and leaving pool->pages_nr in
its original place, which is already protected by pool->lock?

> For *pool->pages_nr*, one way is to use pool->lock to protect. But I
> think it is too heavy.
> So does it ok to change pages_nr to atomic type too?
>

-- 
Regards,
--Bob
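
The split suggested above (per-pool counter under the pool lock, global counter atomic) can be sketched in userspace C11. This is a model only: `struct pool_model`, the `atomic_flag` spinlock and all function names are hypothetical stand-ins for `struct zbud_pool`, `pool->lock` and the kernel's atomic types.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct pool_model {
    atomic_flag lock;     /* stands in for pool->lock */
    uint64_t pages_nr;    /* only touched with lock held */
};

/* Shared by all pools; updated without taking any pool lock. */
static atomic_uint_fast64_t total_pages_model;

static void pool_lock(struct pool_model *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;   /* spin */
}

static void pool_unlock(struct pool_model *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

void pool_init(struct pool_model *p)
{
    assert(p != 0);
    atomic_flag_clear(&p->lock);
    p->pages_nr = 0;
}

void pool_account_alloc(struct pool_model *p)
{
    pool_lock(p);
    p->pages_nr++;
    pool_unlock(p);
    atomic_fetch_add_explicit(&total_pages_model, 1, memory_order_relaxed);
}

void pool_account_free(struct pool_model *p)
{
    pool_lock(p);
    p->pages_nr--;
    pool_unlock(p);
    atomic_fetch_sub_explicit(&total_pages_model, 1, memory_order_relaxed);
}

uint64_t total_pages_read(void)
{
    return atomic_load_explicit(&total_pages_model, memory_order_relaxed);
}
```

Relaxed ordering is used on the global counter because, as noted elsewhere in the thread, a fullness check does not need an exact value.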

Re: [PATCH 06/20] ARM64 / ACPI: Introduce some PCI functions when PCI is enabled

2014-01-20 Thread Hanjun Guo

On 2014-1-21 2:39, Arnd Bergmann wrote:
> On Monday 20 January 2014, Hanjun Guo wrote:
>>>> acpi_register_ioapic()/acpi_unregister_ioapic() will be used for IOAPIC
>>>> hotplug and GIC distributor is something like IOAPIC on x86, so I think
>>>> these two functions can be reserved for future use.
>>> But GIC is not hotplugged, is it? It still sounds x86 specific to me.
>>
>> Well, if we want to do physical CPU hotplug on ARM/ARM64 (maybe years 
>> later?),
>> then GIC add/remove is needed because we have to remove GIC
>> on the SoC too when we remove the physical CPU.
> 
> In general, I recommend not planning for the future in kernel code when you
> don't know what is going to happen. It's always easy enough to change
> things once you get there, as long as no stable ABI is involved.

Ok, I agree with you.

> 
> I just looked at the caller of these functions, and found a self-contained
> PCI driver in drivers/pci/ioapic.c, which uses two separate PCI classes for
> ioapic and ioxapic. I think it's a safe assumption to say that even if we
> get ARM CPU+GIC hotplug, that would not use the same ioapic driver. This
> driver is currently marked X86-only, and that should probably stay this way,
> so you won't need the hooks.

Will find a suitable way to fix that in the next version, thanks for your comments :)

Hanjun


Re: [PATCH 1/3] ACPI / idle: Move idle_boot_override out of the arch directory

2014-01-20 Thread Hanjun Guo

On 2014-1-21 7:34, Rafael J. Wysocki wrote:
> On Monday, January 20, 2014 10:08:41 PM Hanjun Guo wrote:
>> On 2014-01-18 21:47, Rafael J. Wysocki wrote:
>>> On Saturday, January 18, 2014 11:52:18 AM Hanjun Guo wrote:
 On 2014-1-18 11:45, Hanjun Guo wrote:
> On 2014-1-17 20:06, Sudeep Holla wrote:
>> On 17/01/14 02:03, Hanjun Guo wrote:
>>> Move idle_boot_override out of the arch directory to be a single enum
>>> including both platforms values, this will make it rather easier to
>>> avoid ifdefs around which definitions are for which processor in
>>> generally used ACPI code.
>>>
>>> IDLE_FORCE_MWAIT for IA64 is not used anywhere, so remove it.
>>>
>>> No functional change in this patch.
>>>
>>> Suggested-by: Alan 
>>> Signed-off-by: Hanjun Guo 
>>> ---
 [...]
>>> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
>>> index 03e235ad..e324561 100644
>>> --- a/include/linux/cpu.h
>>> +++ b/include/linux/cpu.h
>>> @@ -220,6 +220,14 @@ void cpu_idle(void);
>>>   
>>>   void cpu_idle_poll_ctrl(bool enable);
>>>   
>>> +enum idle_boot_override {
>>> +   IDLE_NO_OVERRIDE = 0,
>>> +   IDLE_HALT,
>>> +   IDLE_NOMWAIT,
>>> +   IDLE_POLL,
>>> +   IDLE_POWERSAVE_OFF
>>> +};
>>> +
>> I do understand the idea behind this change, but IMO HALT and MWAIT are 
>> x86
>> specific and may not make sense for other architectures.
> yes, this is the strange part, the value is arch-dependent.
>
>> It will also require every architecture using ACPI to export
>> boot_option_idle_override which may not be really required.
> so, how about forget this patch and move boot_option_idle_override
> related code into arch directory such as arch/x86/acpi/boot.c for
> x86?
 The general idea is that we can move all the arch-dependent codes
 in ACPI driver to arch directory, then make codes in drivers/acpi/
 arch independent.
>>> Well, MWAIT is arch-dependent, so I'm not sure how IDLE_NOMWAIT fits into
>>> include/linux/cpu.h?
>>
>> So you are not happy with this patch and I should find another solution?
> 
> No, I'm not happy with it.
> 
> If you want to move that to an arch-agnostic header, the symbol names cannot
> be arch-dependent any more.

Ok, will find another solution for that, thanks for your comments :)

Hanjun


Re: linux-next: build failure after merge of the tip tree

2014-01-20 Thread Mike Galbraith

On Mon, 2014-01-20 at 22:51 +0100, Peter Zijlstra wrote:

> I'm still waiting for someone to explain what's wrong with:
> 
> static inline void mwait_idle(void)
> {
>   local_irq_enable();
>   mwait_idle_with_hints(0, 0);
> }

How about just doing that going forward (it works, and can always be fixed
if something turns up), and the below for stable once it hits mainline?

The Q6600 box is a happy camper in all trees.

From: Len Brown 

x86 idle: restore mwait_idle()

In Linux-3.9 we removed the mwait_idle() loop:
'x86 idle: remove mwait_idle() and "idle=mwait" cmdline param'
(69fb3676df3329a7142803bb3502fa59dc0db2e3)

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT loop,
until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
   MWAIT-C1 is preferred for optimal power and performance.
   But if they support just C1, cpuidle never loads and
   so they use the boot-time default idle loop forever.

2. Some laptops will boot-hang if HALT is used,
   but will boot successfully if MWAIT is used.
   This appears to be a hidden assumption in BIOS SMI,
   that is presumably valid on the proprietary OS
   where the BIOS was validated.

   https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop.  However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:
For 3.9, simply revert 69fb3676df
for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
For 3.11, 3.12, 3.13, this patch applies cleanly

Mike: reinstate polling, and add clflush barriers.

Cc: Mike Galbraith 
Cc: Ian Malone 
Cc: Josh Boyer 
Cc:  # 3.9, 3.10, 3.11, 3.12, 3.13
Signed-off-by: Mike Galbraith 
Signed-off-by: Len Brown 
---
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95ab8b5..c5db2a43e730 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -398,6 +398,52 @@ static void amd_e400_idle(void)
default_idle();
 }
 
+/*
+ * Intel Core2 and older machines prefer MWAIT over HALT for C1.
+ * We can't rely on cpuidle installing MWAIT, because it will not load
+ * on systems that support only C1 -- so the boot default must be MWAIT.
+ *
+ * Some AMD machines are the opposite, they depend on using HALT.
+ *
+ * So for default C1, which is used during boot until cpuidle loads,
+ * use MWAIT-C1 on Intel HW that has it, else use HALT.
+ */
+static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
+{
+   if (c->x86_vendor != X86_VENDOR_INTEL)
+   return 0;
+
+   if (!cpu_has(c, X86_FEATURE_MWAIT))
+   return 0;
+
+   return 1;
+}
+
+/*
+ * MONITOR/MWAIT with no hints, used for default C1 state.
+ * This invokes MWAIT with interrupts enabled and no flags,
+ * which is backwards compatible with the original MWAIT implementation.
+ */
+
+static void mwait_idle(void)
+{
+   if (!current_set_polling_and_test()) {
+   if (static_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) {
+   mb();
+   clflush((void *)&current_thread_info()->flags);
+   mb();
+   }
+
+   __monitor((void *)&current_thread_info()->flags, 0, 0);
+   if (!need_resched())
+   __sti_mwait(0, 0);
+   else
+   local_irq_enable();
+   } else
+   local_irq_enable();
+   __current_clr_polling();
+}
+
 void select_idle_routine(const struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
@@ -411,6 +457,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
/* E400: APIC timer interrupt does not wake up CPU from C1e */
pr_info("using AMD E400 aware idle routine\n");
x86_idle = amd_e400_idle;
+   } else if (prefer_mwait_c1_over_halt(c)) {
+   pr_info("using mwait in idle threads\n");
+   x86_idle = mwait_idle;
} else
x86_idle = default_idle;
 }
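
The selection order visible in the hunk above can be modelled in plain C. This is a userspace sketch, not the kernel function: the enums, `struct cpu_model` and its boolean fields are hypothetical simplifications of `struct cpuinfo_x86`, and only the tail of select_idle_routine() shown in the diff is covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cpu_vendor_model { VENDOR_MODEL_INTEL, VENDOR_MODEL_AMD, VENDOR_MODEL_OTHER };

enum idle_choice { IDLE_CHOICE_HALT, IDLE_CHOICE_AMD_E400, IDLE_CHOICE_MWAIT_C1 };

struct cpu_model {
    enum cpu_vendor_model vendor;
    bool has_mwait;          /* models X86_FEATURE_MWAIT */
    bool has_e400_erratum;   /* models the AMD E400 check */
};

/* Mirrors prefer_mwait_c1_over_halt(): Intel with MWAIT only. */
bool prefer_mwait_model(const struct cpu_model *c)
{
    assert(c != NULL);
    if (c->vendor != VENDOR_MODEL_INTEL)
        return false;
    return c->has_mwait;
}

/* Mirrors the if/else chain: E400 first, then MWAIT-C1, else HALT. */
enum idle_choice select_idle_model(const struct cpu_model *c)
{
    assert(c != NULL);
    if (c->has_e400_erratum)
        return IDLE_CHOICE_AMD_E400;
    if (prefer_mwait_model(c))
        return IDLE_CHOICE_MWAIT_C1;
    return IDLE_CHOICE_HALT;
}
```

The point of the ordering is that AMD parts never reach the MWAIT branch, matching the comment that some AMD machines depend on HALT.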
 


Re: 3.12-rc5 and overwritten partition table - by powertop?

2014-01-20 Thread Robert Hancock


On 10/29/2013 04:32 PM, John Twideldum wrote:

The first ~170kb of /dev/sda got blown away with what seems to be logging
output by Powertop, when I was playing with the tuneables.


So did you log the output to some file? I'm just trying to understand how
it could get onto your disk in the first place...


Attached a dump of the first 1Mb of the disk, HTH.
It looks like a powertop log?
(I have powertop 2.4)


Yes, likely. But it is strange the corruption doesn't even end at any
sensible boundary (data ends at offset 0x27b53). Shrug...


My recollection what I did is this:

I was looking into powertop and observing how -rc5 works now with Haswell.
I saw the tuneable parameters and quite a few were "bad", so I set them to "good".
Power usage dropped about one third - yay!
However, changing "SATA link power" threw up complaints:

Oct 29 09:09:21 localhost kernel: [ 3697.423868] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0xc action 0x6 frozen
Oct 29 09:09:21 localhost kernel: [ 3697.423873] ata1.00: irq_stat 0x0800, interface fatal error
Oct 29 09:09:21 localhost kernel: [ 3697.423877] ata1: SError: { CommWake 10B8B }
Oct 29 09:09:21 localhost kernel: [ 3697.423880] ata1.00: failed command: WRITE FPDMA QUEUED
Oct 29 09:09:21 localhost kernel: [ 3697.423886] ata1.00: cmd 61/38:00:01:9e:a4/01:00:00:00:00/40 tag 0 ncq 159744 out
Oct 29 09:09:21 localhost kernel: [ 3697.423886]  res 50/01:00:01:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
Oct 29 09:09:21 localhost kernel: [ 3697.423888] ata1.00: status: { DRDY }
Oct 29 09:09:21 localhost kernel: [ 3697.423894] ata1: hard resetting link
Oct 29 09:09:22 localhost kernel: [ 3697.743196] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 29 09:09:22 localhost kernel: [ 3697.744707] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.744719] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.744725] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.744813] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.745212] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
Oct 29 09:09:22 localhost kernel: [ 3697.746694] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.746705] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.746711] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.746779] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.747286] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
Oct 29 09:09:22 localhost kernel: [ 3697.747432] ata1.00: configured for UDMA/133
Oct 29 09:09:22 localhost kernel: [ 3697.763181] ata1: EH complete

I did not yet know what "frozen" means, so I did not investigate and
very soon powered down as I had to leave.
The next time I powered up, the machine did not boot.
So the amount of data is probably just what accumulated for as long as
I had powertop running...


(CCing linux-ide)

It seems like most likely either the SATA host controller or drive 
doesn't play nice with link power management enabled. Can you post the 
full dmesg boot log?
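
For anyone wanting to back the setting out while debugging, here is a minimal sketch of how the SATA link power policy can be switched back from C. It assumes the tunable powertop toggles is the libata sysfs attribute link_power_management_policy under /sys/class/scsi_host/hostN, and that the policy strings are min_power, medium_power and max_performance. The host number and all helper names are illustrative, not taken from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Policy names assumed to be accepted by the libata sysfs attribute. */
static const char *const lpm_policies[] = {
	"min_power", "medium_power", "max_performance",
};

/* Build the sysfs path for SCSI host number `host`; 0 on success, -1 if buf is too small. */
int lpm_policy_path(int host, char *buf, size_t len)
{
	int n = snprintf(buf, len,
			 "/sys/class/scsi_host/host%d/link_power_management_policy",
			 host);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Return 1 if `name` is one of the known policy strings, 0 otherwise. */
int lpm_policy_valid(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(lpm_policies) / sizeof(lpm_policies[0]); i++)
		if (strcmp(name, lpm_policies[i]) == 0)
			return 1;
	return 0;
}

/* Write `policy` to the attribute of `host`; 0 on success, -1 on any error (needs root). */
int lpm_set_policy(int host, const char *policy)
{
	char path[128];
	FILE *f;
	int ok;

	if (!lpm_policy_valid(policy) || lpm_policy_path(host, path, sizeof(path)))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(policy, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling lpm_set_policy(0, "max_performance") as root would, under these assumptions, turn link power management back off for the first host; which host number corresponds to ata1 depends on the system.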


[PATCH] cpufreq: Align all CPUs to the same frequency if using shared clock

2014-01-20 Thread lizhuangzhi

On some SMP systems all possible CPUs share one clock. If the CPUs'
initial frequencies are not the same, we need to align all the CPUs to
the same frequency while they are registering, to avoid mismatched CPU
P-states.

Signed-off-by: lizhuangzhi 
---
 drivers/cpufreq/cpufreq.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 8d19f7c..d00abb5 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -991,6 +991,8 @@ static int __cpufreq_add_dev(struct device *dev, struct 
subsys_interface *sif,
 * CPU because it is in the same boat. */
policy = cpufreq_cpu_get(cpu);
if (unlikely(policy)) {
+   /* according present policy to align all the cpus frequencies */
+   cpufreq_driver->target(policy, policy->cur, CPUFREQ_RELATION_H);
cpufreq_cpu_put(policy);
return 0;
}
-- 
1.7.9.5


Re: [PATCH RFC 4/6] net: rfkill: gpio: add device tree support

2014-01-20 Thread Alexandre Courbot

On Sat, Jan 18, 2014 at 8:11 AM, Linus Walleij  wrote:
> On Fri, Jan 17, 2014 at 6:43 PM, Chen-Yu Tsai  wrote:
>> On Sat, Jan 18, 2014 at 12:47 AM, Arnd Bergmann  wrote:
>
 +- NAME_shutdown-gpios  : GPIO phandle to shutdown control
 + (phandle must be the second)
 +- NAME_reset-gpios : GPIO phandle to reset control
 +
 +NAME must match the rfkill-name property. NAME_shutdown-gpios or
 +NAME_reset-gpios, or both, must be defined.
 +
>>>
>>> I don't understand this part. Why do you include the name in the
>>> gpios property, rather than just hardcoding the property strings
>>> to "shutdown-gpios" and "reset-gpios"?
>>
>> This quirk is a result of how gpiod_get_index implements device tree
>> lookup.
>
> Why can't it just have a single property "gpios", where the first
> element is the reset GPIO and the second is the shutdown GPIO?
>
> rfkill-gpio does this:
>
> gpio = devm_gpiod_get_index(>dev, rfkill->reset_name, 0);
> gpio = devm_gpiod_get_index(>dev, rfkill->shutdown_name, 1);
>
> The passed con ID name parameter is only there for the device
> tree case it seems. (ACPI ignores it.) So what about you just
> don't pass it at all and patch it to do like this instead:
>
> gpio = devm_gpiod_get_index(>dev, NULL, 0);
> gpio = devm_gpiod_get_index(>dev, NULL, 1);
>
> Heikki, are you OK with this change?
>
> I think this is actually necessary if the ACPI and DT unification
> pipe dream shall limp forward, we cannot have arguments passed
> that have a semantic effect on DT but not on ACPI... Drivers
> that are supposed to use both ACPI and DT will always
> have to pass NULL as con ID.

I agree that's how it should be done with the current API if your
driver can obtain GPIOs from both ACPI and DT. This is a potential
issue, as drivers are not supposed to make assumptions about who is
going to be their GPIO provider. Let's say you started a driver with
only DT in mind, and used gpio_get(dev, con_id) to get your GPIOs. DT
bindings are thus of the form "con_id-gpio = ", and set in
stone. Then later, someone wants to use your driver with ACPI. How do
you handle that gracefully?

I'm starting to wonder, now that ACPI is a first-class GPIO provider,
whether we should not start to encourage the deprecation of the
"con_id-gpio = " binding form in DT and only use a single
indexed GPIO property per device. The con_id parameter would then only
be used as a label, which would also have the nice side-effect that
all GPIOs used for a given function will be reported under the same
name no matter what the GPIO provider is.

From an aesthetic point of view, I definitely prefer using con_id to
identify GPIOs instead of indexes, but I don't see how we can make it
play nice with ACPI. Thoughts?
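
To make the con_id dependence concrete, here is a small userspace sketch of the naming rule as I understand the DT lookup in kernels of this period: a non-NULL con_id selects a "<con_id>-gpios" property, while NULL selects the plain "gpios" property. The function name is made up for illustration and this is not the gpiolib code itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative stand-in for the property name a DT lookup derives from
 * con_id: "<con_id>-gpios" when con_id is given, "gpios" when it is NULL.
 * Returns 0 on success, -1 if buf is too small.
 */
int dt_gpio_prop_name(const char *con_id, char *buf, size_t len)
{
	int n;

	if (con_id)
		n = snprintf(buf, len, "%s-gpios", con_id);
	else
		n = snprintf(buf, len, "gpios");

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

With a NULL con_id every provider sees the same request (an index into one list), which is the property the single indexed binding would rely on.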

Alex.

Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages

2014-01-20 Thread Cai Liu

Thanks for your review.

2014/1/21 Minchan Kim :
> Hello Cai,
>
> On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote:
>> zswap can support multiple swapfiles. So we need to check
>> all zbud pool pages in zswap.
>>
>> Version 2:
>>   * add *total_zbud_pages* in zbud to record all the pages in pools
>>   * move the updating of pool pages statistics to
>> alloc_zbud_page/free_zbud_page to hide the details
>>
>> Signed-off-by: Cai Liu 
>> ---
>>  include/linux/zbud.h |2 +-
>>  mm/zbud.c|   44 
>>  mm/zswap.c   |4 ++--
>>  3 files changed, 35 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/linux/zbud.h b/include/linux/zbud.h
>> index 2571a5c..1dbc13e 100644
>> --- a/include/linux/zbud.h
>> +++ b/include/linux/zbud.h
>> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long 
>> handle);
>>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
>>  void *zbud_map(struct zbud_pool *pool, unsigned long handle);
>>  void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
>> -u64 zbud_get_pool_size(struct zbud_pool *pool);
>> +u64 zbud_get_pool_size(void);
>>
>>  #endif /* _ZBUD_H_ */
>> diff --git a/mm/zbud.c b/mm/zbud.c
>> index 9451361..711aaf4 100644
>> --- a/mm/zbud.c
>> +++ b/mm/zbud.c
>> @@ -52,6 +52,13 @@
>>  #include 
>>  #include 
>>
>> +/*
>> +* statistics
>> +**/
>> +
>> +/* zbud pages in all pools */
>> +static u64 total_zbud_pages;
>> +
>>  /*
>>   * Structures
>>  */
>> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct page 
>> *page)
>>   return zhdr;
>>  }
>>
>> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp)
>> +{
>> + struct page *page;
>> +
>> + page = alloc_page(gfp);
>> +
>> + if (page) {
>> + pool->pages_nr++;
>> + total_zbud_pages++;
>
> Who protect race?

Yes, pool->pages_nr and total_zbud_pages are both unprotected here.
I will redo it.

I will change *total_zbud_pages* to an atomic type.
For *pool->pages_nr*, one way is to protect it with pool->lock, but I
think that is too heavy.
So is it OK to change pages_nr to an atomic type too?
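
A rough userspace sketch of what I have in mind, using C11 atomics as a stand-in for the kernel's atomic types. The struct and function names here are illustrative only and are not the zbud code; in the kernel the counters would use the kernel's own atomic helpers instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Per-pool counter, updated without taking the pool lock. */
struct pool_stats {
	_Atomic uint64_t pages_nr;
};

/* Pages in all pools, shared by every pool. */
static _Atomic uint64_t total_pages;

void pool_init(struct pool_stats *pool)
{
	atomic_init(&pool->pages_nr, 0);
}

/* Account one newly allocated page to the pool and to the global total. */
void pool_page_added(struct pool_stats *pool)
{
	atomic_fetch_add(&pool->pages_nr, 1);
	atomic_fetch_add(&total_pages, 1);
}

/* Account one freed page. */
void pool_page_freed(struct pool_stats *pool)
{
	atomic_fetch_sub(&pool->pages_nr, 1);
	atomic_fetch_sub(&total_pages, 1);
}

uint64_t pool_pages(struct pool_stats *pool)
{
	return atomic_load(&pool->pages_nr);
}

uint64_t total_pool_pages(void)
{
	return atomic_load(&total_pages);
}
```

Each counter is individually consistent this way, but a reader can still observe the two counters at slightly different moments; that seems acceptable for statistics.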


>
>> + }
>> +
>> + return page;
>> +}
>> +
>> +
>>  /* Resets the struct page fields and frees the page */
>> -static void free_zbud_page(struct zbud_header *zhdr)
>> +static void free_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr)
>>  {
>>   __free_page(virt_to_page(zhdr));
>> +
>> + pool->pages_nr--;
>> + total_zbud_pages--;
>>  }
>>
>>  /*
>> @@ -279,11 +304,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t 
>> gfp,
>>
>>   /* Couldn't find unbuddied zbud page, create new one */
>>   spin_unlock(>lock);
>> - page = alloc_page(gfp);
>> + page = alloc_zbud_page(pool, gfp);
>>   if (!page)
>>   return -ENOMEM;
>>   spin_lock(>lock);
>> - pool->pages_nr++;
>>   zhdr = init_zbud_page(page);
>>   bud = FIRST;
>>
>> @@ -349,8 +373,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long 
>> handle)
>>   if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
>>   /* zbud page is empty, free */
>>   list_del(>lru);
>> - free_zbud_page(zhdr);
>> - pool->pages_nr--;
>> + free_zbud_page(pool, zhdr);
>>   } else {
>>   /* Add to unbuddied list */
>>   freechunks = num_free_chunks(zhdr);
>> @@ -447,8 +470,7 @@ next:
>>* Both buddies are now free, free the zbud page and
>>* return success.
>>*/
>> - free_zbud_page(zhdr);
>> - pool->pages_nr--;
>> + free_zbud_page(pool, zhdr);
>>   spin_unlock(>lock);
>>   return 0;
>>   } else if (zhdr->first_chunks == 0 ||
>> @@ -496,14 +518,12 @@ void zbud_unmap(struct zbud_pool *pool, unsigned long 
>> handle)
>>
>>  /**
>>   * zbud_get_pool_size() - gets the zbud pool size in pages
>> - * @pool:pool whose size is being queried
>>   *
>> - * Returns: size in pages of the given pool.  The pool lock need not be
>> - * taken to access pages_nr.
>> + * Returns: size in pages of all the zbud pools.
>>   */
>> -u64 zbud_get_pool_size(struct zbud_pool *pool)
>> +u64 zbud_get_pool_size(void)
>>  {
>> - return pool->pages_nr;
>> + return total_zbud_pages;
>>  }
>>
>>  static int __init init_zbud(void)
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index 5a63f78..ef44d9d 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -291,7 +291,7 @@ static void zswap_free_entry(struct zswap_tree *tree,
>>   zbud_free(tree->pool, entry->handle);
>>   zswap_entry_cache_free(entry);
>>   atomic_dec(_stored_pages);
>> - zswap_pool_pages =

Re: [patch 9/9] mm: keep page cache radix tree nodes in check

2014-01-20 Thread Dave Chinner

On Mon, Jan 20, 2014 at 06:17:37PM -0500, Johannes Weiner wrote:
> On Fri, Jan 17, 2014 at 11:05:17AM +1100, Dave Chinner wrote:
> > On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote:
> > > + /* Only shadow entries in there, keep track of this node */
> > > + if (!(node->count & RADIX_TREE_COUNT_MASK) &&
> > > + list_empty(>private_list)) {
> > > + node->private_data = mapping;
> > > + list_lru_add(_shadow_nodes, >private_list);
> > > + }
> > 
> > You can't do this list_empty(>private_list) check safely
> > externally to the list_lru code - the only time that entry can be
> > checked safely is under the LRU list locks. This is the reason that
> > list_lru_add/list_lru_del return a boolean to indicate if the object
> > was added/removed from the list - they do this list_empty() check
> > internally. i.e. the correct, safe way to conditionally update
> > state iff the object was added to the LRU is:
> > 
> > if (!(node->count & RADIX_TREE_COUNT_MASK)) {
> > if (list_lru_add(_shadow_nodes, >private_list))
> > node->private_data = mapping;
> > }
> > 
> > > + radix_tree_replace_slot(slot, page);
> > > + mapping->nrpages++;
> > > + if (node) {
> > > + node->count++;
> > > + /* Installed page, can't be shadow-only anymore */
> > > + if (!list_empty(>private_list))
> > > + list_lru_del(_shadow_nodes,
> > > +  >private_list);
> > > + }
> > 
> > Same issue here:
> > 
> > if (node) {
> > node->count++;
> > list_lru_del(_shadow_nodes, >private_list);
> > }
> 
> All modifications to node->private_list happen under
> mapping->tree_lock, and modifications of a neighboring link should not
> affect the outcome of the list_empty(), so I don't think the lru lock
> is necessary.

Can you please add that as a comment somewhere explaining why it is
safe to do this?
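
For reference, the contract I'm describing can be sketched in userspace like this: the emptiness check lives inside the add/del helpers and is made with the list lock held, and the boolean return tells the caller whether anything changed. This is an illustration with made-up names and a toy spinlock, not the list_lru implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct list_node {
	struct list_node *prev, *next;
};

struct lru {
	atomic_flag lock;		/* toy spinlock standing in for the LRU lock */
	struct list_node head;
	long nr_items;
};

/* A node that points to itself is on no list. */
void node_init(struct list_node *n)
{
	n->prev = n->next = n;
}

void lru_init(struct lru *l)
{
	atomic_flag_clear(&l->lock);
	node_init(&l->head);
	l->nr_items = 0;
}

static void lru_lock(struct lru *l)
{
	while (atomic_flag_test_and_set(&l->lock))
		;			/* spin until the flag is released */
}

static void lru_unlock(struct lru *l)
{
	atomic_flag_clear(&l->lock);
}

/* Add item at the tail unless it is already listed; true if it was added. */
bool lru_add(struct lru *l, struct list_node *item)
{
	bool added = false;

	lru_lock(l);
	if (item->next == item) {	/* emptiness checked under the lock */
		item->prev = l->head.prev;
		item->next = &l->head;
		l->head.prev->next = item;
		l->head.prev = item;
		l->nr_items++;
		added = true;
	}
	lru_unlock(l);
	return added;
}

/* Remove item if it is listed; true if it was removed. */
bool lru_del(struct lru *l, struct list_node *item)
{
	bool removed = false;

	lru_lock(l);
	if (item->next != item) {
		item->prev->next = item->next;
		item->next->prev = item->prev;
		node_init(item);
		l->nr_items--;
		removed = true;
	}
	lru_unlock(l);
	return removed;
}
```

A caller then writes "if (lru_add(...)) update_state();" and never looks at the node's list pointers itself.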

> > > + case LRU_REMOVED_RETRY:
> > >   if (--nlru->nr_items == 0)
> > >   node_clear(nid, lru->active_nodes);
> > >   WARN_ON_ONCE(nlru->nr_items < 0);
> > >   isolated++;
> > > + /*
> > > +  * If the lru lock has been dropped, our list
> > > +  * traversal is now invalid and so we have to
> > > +  * restart from scratch.
> > > +  */
> > > + if (ret == LRU_REMOVED_RETRY)
> > > + goto restart;
> > >   break;
> > >   case LRU_ROTATE:
> > >   list_move_tail(item, >list);
> > 
> > I think that we need to assert that the list lru lock is correctly
> > held here on return with LRU_REMOVED_RETRY. i.e.
> > 
> > case LRU_REMOVED_RETRY:
> > assert_spin_locked(>lock);
> > case LRU_REMOVED:
> 
> Ah, good idea.  How about adding it to LRU_RETRY as well?

Yup, good idea.

> > > +static struct shrinker workingset_shadow_shrinker = {
> > > + .count_objects = count_shadow_nodes,
> > > + .scan_objects = scan_shadow_nodes,
> > > + .seeks = DEFAULT_SEEKS * 4,
> > > + .flags = SHRINKER_NUMA_AWARE,
> > > +};
> > 
> > Can you add a comment explaining how you calculated the .seeks
> > value? It's important to document the weightings/importance
> > we give to slab reclaim so we can determine if it's actually
> > achieving the desired balance under different loads...
> 
> This is not an exact science, to say the least.

I know, that's why I asked it be documented rather than be something
kept in your head.

> The shadow entries are mostly self-regulated, so I don't want the
> shrinker to interfere while the machine is just regularly trimming
> caches during normal operation.
> 
> It should only kick in when either a) reclaim is picking up and the
> scan-to-reclaim ratio increases due to mapped pages, dirty cache,
> swapping etc. or b) the number of objects compared to LRU pages
> becomes excessive.
> 
> I think that is what most shrinkers with an elevated seeks value want,
> but this translates very awkwardly (and not completely) to the current
> cost model, and we should probably rework that interface.
> 
> "Seeks" currently encodes 3 ratios:
> 
>   1. the cost of creating an object vs. a page
> 
>   2. the expected number of objects vs. pages

It doesn't encode that at all. If it did, then the default value
wouldn't be "2".

>   3. the cost of reclaiming an object vs. a page

Which, when you consider #3 in conjunction with #1, the actual
intended meaning of .seeks is "the cost of replacing this object in
the cache compared to the cost of replacing a page cache page."
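
To show how .seeks feeds into scan pressure, here is a small sketch of the proportional calculation as I recall it from the shrinker core of this period. Treat the exact expression as an assumption rather than a quote of the source; the function name is made up, and DEFAULT_SEEKS is the default of 2 mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_SEEKS 2

/*
 * Number of objects to scan for a cache holding `freeable` objects, given
 * that page reclaim scanned `nr_pages_scanned` out of `lru_pages` pages.
 * A larger `seeks` (which must be non-zero) shrinks the result, i.e. the
 * cache is treated as more expensive to rebuild and is scanned less.
 */
uint64_t shrinker_scan_delta(uint64_t nr_pages_scanned, uint64_t lru_pages,
			     uint64_t freeable, int seeks)
{
	uint64_t delta = (4 * nr_pages_scanned) / (uint64_t)seeks;

	delta *= freeable;
	delta /= lru_pages + 1;
	return delta;
}
```

Under that formula, scanning 1000 of 9999 LRU pages against 500 freeable objects asks for 100 objects at DEFAULT_SEEKS and 25 at DEFAULT_SEEKS * 4.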

> but they are not necessarily correlated.  How I would like to
> configure the shadow shrinker instead is:
> 
>   o scan objects when reclaim efficiency is down to 75%, because they
> are more valuable than use-once cache but less than workingset
> 
>   o scan objects

Re: More GPIO madness on iMX6 - and the crappy ARM port of Linux

2014-01-20 Thread Alexandre Courbot

On Sat, Jan 18, 2014 at 7:43 AM, Linus Walleij  wrote:
> On Fri, Jan 17, 2014 at 9:53 PM, Russell King - ARM Linux
>  wrote:
>> On Fri, Jan 17, 2014 at 01:42:44PM -0700, Stephen Warren wrote:
>
>>> I believe you want gpio_get_value() to return either the driven or
>>> actual pin value where it can on the current HW, but just e.g. hard-code
>>> 0 on other HW. That would introduce a core feature that works some
>>> places but not others, and hence make drivers that relied on the feature
>>> less portable between HW with different actual features.
>>
>> I can buy that argument, but there's an issue which stands squarely in
>> its way, and that is open-drain GPIOs.
>>
>> These are modelled just as any other GPIO, mainly so that both
>> gpio_set_value(gpio, 1) and gpio_direction_input(gpio) both result in
>> the signal being high.  The only combination which results in the
>> signal being driven low is outputting zero - and the state of the signal
>> can always be read back.
>>
>> The problem here is that such gpios are implemented in things like the
>> I2C driver such that they're _always_ outputs, and gpio_set_value() is
>> used to pull the signal down.  gpio_get_value() is used to read its
>> current state.
>>
>> So, if we say that gpio_get_value() is undefined, we force such
>> subsystems to always jump through the non-open-drain paths (using
>> gpio_direction_input() to set the line high and
>> gpio_direction_output(gpio, 0) to drive it low.)
>
> Incidentally that is what gpiolib is doing internally in
> gpiod_direction_output().
>
> You're absolutely right that it makes no sense to have open
> drain (or open source) unless the signal can be read back from
> the hardware.
>
> I'm thinking something like if the driver manages to obtain a
> GPIO with
>
> gpio_request_one(gpio, GPIOF_OPEN_DRAIN |
>  GPIOF_OUT_INIT_HIGH);
>
> As the I2C core does, and then when that call succeeds, it can
> expect that whatever comes back from gpio_get_value() is
> always what is actually on the line. If the driver cannot determine
> this it should not have allowed that flag to succeed in the first
> place, so this might be something we want to enforce.
>
> There are two white spots on the map here:
>
> 1. Today this OPEN_DRAIN flag is not even passed down to
> the driver so how could it say anything about it :-( it's a pure gpiolib
> internal flag. We don't know if the hardware can actually even
> do open drain, we just assume it can.
>
> What it should really do - in the best of worlds - is to check if
> it can cross-reference the GPIO line to a pin in the pin control
> subsystem, and if that is possible, then ask the pin if it
> is supporting open drain and set it. It currently has no such
> cross-calls, it is just assumed that the configuration is consistent,
> and the actual pin is set up as open drain. But it would make
> sense to add more cross-calls here, since GPIO is accepting
> these flags (OPEN_DRAIN/OPEN_SOURCE).

This would definitely work in the case of pinctrl-backed GPIOs, but
would not cover all GPIO chips. If we want to cover all cases we
should give drivers a way to report or enforce this capability,
and make the pinctrl cross-reference one of its implementations where
it can be done.
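
As a reminder of why read-back matters for open-drain lines, here is a tiny simulation of the model described earlier in the thread: writing 1 releases the line (like switching to input), writing 0 drives it low, and the value read is whatever is actually on the wire. All names are made up for illustration and none of this is gpiolib code.

```c
#include <assert.h>
#include <stdbool.h>

/* One wire with a pull-up, shared by us and one other device. */
struct od_line {
	bool driving_low;	/* our side actively pulls the line to 0 */
	bool peer_driving_low;	/* the other device pulls the line to 0 */
};

/* value 1: release the line (high-Z); value 0: drive it low. */
void od_set_value(struct od_line *line, int value)
{
	line->driving_low = (value == 0);
}

/* Wired-AND: the line reads high only if nobody pulls it low. */
int od_get_value(const struct od_line *line)
{
	return !(line->driving_low || line->peer_driving_low);
}
```

The case where we wrote 1 but read 0 (for example I2C clock stretching by the other device) is exactly what a driver loses if the controller cannot report the real line state.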

>
> Like:
> int pinctrl_gpio_set_flags(unsigned gpio, unsigned long flags);
>
> Where the pinctrl subsystem would attempt to cross reference
> and set the flag, and the pin controller backend will then have
> the option to return an error code.
>
> We could at least support that for the select pin controllers
> that use generic pin config. i.MX is another story, but I'm open
> to compromises.
>
> 2. In the new descriptor API this open drain setting would
> be set from the lookup table and be a property on the line,
> meaning this flag is not requested explicitly by the consumer,
> and the consumer needs to inspect the obtained descriptor
> to figure out if it is set to open drain.
>
> Alexandre: do you have plans for how to handle a dynamic
> consumer passing flags to its gpio request in the gpiod API?

Do you mean like passing OPEN_DRAIN or OPEN_SOURCE flags to
gpiod_get(), similarly to what is done for e.g. gpio_request_one()?

In the case of the gpiod API I would rather see these flags defined in
the GPIO mapping if possible. For platform data it is already possible
to specify open drain/open source, for DT this is trivial to add. ACPI
would be more of a problem here, but I'm not sure whether the problem
is relevant for ACPI GPIOs.

So the way I see it coming into shape would be something like:

1) GPIO drivers' request() function gets an extra flags argument that
is passed by the GPIO core with the flags of the mapping. There we can
define the whole range of properties that gpio_request_one() supported.
The driver's request() will fail if it cannot satisfy these
properties. That's where the pinctrl cross-reference would take place.

2) All properties accepted by gpio_request_one() can also be passed

[PATCH v4] ACPI: Fix acpi_evaluate_object() return value check

2014-01-20 Thread Yijing Wang

Since acpi_evaluate_object() returns acpi_status and not plain int,
ACPI_FAILURE() should be used for checking its return value.
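
The pattern the patch applies can be sketched in userspace as follows. The typedef, status values and macro are local stand-ins written for this illustration (prefixed fake_/FAKE_), not the ACPICA definitions. The point is only that a status code is tested with a failure predicate and then translated to a negative errno, instead of being returned as if it were one.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for illustration; not the real ACPICA definitions. */
typedef uint32_t fake_acpi_status;
#define FAKE_AE_OK		0x0000u
#define FAKE_AE_NOT_FOUND	0x0005u
#define FAKE_ACPI_FAILURE(s)	((s) != FAKE_AE_OK)

/* Translate a status code into the 0 / negative-errno convention. */
int fake_status_to_errno(fake_acpi_status status)
{
	if (FAKE_ACPI_FAILURE(status))
		return -EINVAL;	/* do not leak the raw status to errno users */
	return 0;
}
```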

Reviewed-by: Jani Nikula 
Signed-off-by: Yijing Wang 
---
v3->v4: Fix spell error, add Jani Nikula reviewed-by.
v2->v3: Fix compile error pointed out by Hanjun.
v1->v2: Add CC to related subsystem MAINTAINERS
---
 drivers/gpu/drm/i915/intel_acpi.c  |   24 ++--
 drivers/gpu/drm/nouveau/core/subdev/mxm/base.c |9 +
 drivers/gpu/drm/nouveau/nouveau_acpi.c |   23 +--
 drivers/pci/pci-label.c|9 ++---
 4 files changed, 38 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_acpi.c 
b/drivers/gpu/drm/i915/intel_acpi.c
index dfff090..87e8f74 100644
--- a/drivers/gpu/drm/i915/intel_acpi.c
+++ b/drivers/gpu/drm/i915/intel_acpi.c
@@ -35,7 +35,8 @@ static int intel_dsm(acpi_handle handle, int func)
union acpi_object params[4];
union acpi_object *obj;
u32 result;
-   int ret = 0;
+   acpi_status status;
+   int ret;
 
input.count = 4;
input.pointer = params;
@@ -50,10 +51,11 @@ static int intel_dsm(acpi_handle handle, int func)
params[3].package.count = 0;
params[3].package.elements = NULL;
 
-   ret = acpi_evaluate_object(handle, "_DSM", , );
-   if (ret) {
-   DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret);
-   return ret;
+   status = acpi_evaluate_object(handle, "_DSM", , );
+   if (ACPI_FAILURE(status)) {
+   DRM_DEBUG_DRIVER("failed to evaluate _DSM: %s\n",
+   acpi_format_exception(status));
+   return -EINVAL;
}
 
obj = (union acpi_object *)output.pointer;
@@ -141,7 +143,8 @@ static void intel_dsm_platform_mux_info(void)
struct acpi_object_list input;
union acpi_object params[4];
union acpi_object *pkg;
-   int i, ret;
+   acpi_status status;
+   int i;
 
input.count = 4;
input.pointer = params;
@@ -156,10 +159,11 @@ static void intel_dsm_platform_mux_info(void)
params[3].package.count = 0;
params[3].package.elements = NULL;
 
-   ret = acpi_evaluate_object(intel_dsm_priv.dhandle, "_DSM", ,
-  );
-   if (ret) {
-   DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret);
+   status = acpi_evaluate_object(intel_dsm_priv.dhandle,
+   "_DSM", , );
+   if (ACPI_FAILURE(status)) {
+   DRM_DEBUG_DRIVER("failed to evaluate _DSM: %s\n",
+   acpi_format_exception(status));
goto out;
}
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c 
b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
index 1291204..c5e7a2b 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
@@ -114,15 +114,16 @@ mxm_shadow_dsm(struct nouveau_mxm *mxm, u8 version)
struct acpi_buffer retn = { ACPI_ALLOCATE_BUFFER, NULL };
union acpi_object *obj;
acpi_handle handle;
-   int ret;
+   acpi_status status;
 
handle = ACPI_HANDLE(>pdev->dev);
if (!handle)
return false;
 
-   ret = acpi_evaluate_object(handle, "_DSM", , );
-   if (ret) {
-   nv_debug(mxm, "DSM MXMS failed: %d\n", ret);
+   status = acpi_evaluate_object(handle, "_DSM", , );
+   if (ACPI_FAILURE(status)) {
+   nv_debug(mxm, "DSM MXMS failed: %s\n",
+   acpi_format_exception(status));
return false;
}
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c 
b/drivers/gpu/drm/nouveau/nouveau_acpi.c
index ba0183f..de3068b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
+++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
@@ -82,7 +82,8 @@ static int nouveau_optimus_dsm(acpi_handle handle, int func, 
int arg, uint32_t *
struct acpi_object_list input;
union acpi_object params[4];
union acpi_object *obj;
-   int i, err;
+   acpi_status status;
+   int i;
char args_buff[4];
 
input.count = 4;
@@ -101,10 +102,11 @@ static int nouveau_optimus_dsm(acpi_handle handle, int 
func, int arg, uint32_t *
args_buff[i] = (arg >> i * 8) & 0xFF;
params[3].buffer.pointer = args_buff;
 
-   err = acpi_evaluate_object(handle, "_DSM", , );
-   if (err) {
-   printk(KERN_INFO "failed to evaluate _DSM: %d\n", err);
-   return err;
+   status = acpi_evaluate_object(handle, "_DSM", , );
+   if (ACPI_FAILURE(status)) {
+   pr_info("failed to evaluate _DSM: %s\n",
+   acpi_format_exception(status));
+   return -EINVAL;
}
 
obj = (union acpi_object *)output.pointer;
@@ -134,7 +136,7 @@

linux-next: manual merge of the sound-asoc tree with the tree

2014-01-20 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the sound-asoc tree got a conflict in
sound/soc/soc-compress.c between commit 2a99ef0fdb35 ("ASoC: compress:
Add suport for DPCM into compressed audio") from the sound tree and
commit 76063d340520 ("ASoC: compress: Add suport for DPCM into compressed
audio") from the sound-asoc tree.

The sound tree version had a later Author date, so I just used that
version - let me know if something else should be done.  Otherwise, the
sound-asoc tree needs to be cleaned up as this is the only change left in
it (relative to the sound tree).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages

2014-01-20 Thread Minchan Kim

Hello Cai,

On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote:
> zswap can support multiple swapfiles. So we need to check
> all zbud pool pages in zswap.
> 
> Version 2:
>   * add *total_zbud_pages* in zbud to record all the pages in pools
>   * move the updating of pool pages statistics to
> alloc_zbud_page/free_zbud_page to hide the details
> 
> Signed-off-by: Cai Liu 
> ---
>  include/linux/zbud.h |2 +-
>  mm/zbud.c|   44 
>  mm/zswap.c   |4 ++--
>  3 files changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/zbud.h b/include/linux/zbud.h
> index 2571a5c..1dbc13e 100644
> --- a/include/linux/zbud.h
> +++ b/include/linux/zbud.h
> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long 
> handle);
>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
>  void *zbud_map(struct zbud_pool *pool, unsigned long handle);
>  void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
> -u64 zbud_get_pool_size(struct zbud_pool *pool);
> +u64 zbud_get_pool_size(void);
>  
>  #endif /* _ZBUD_H_ */
> diff --git a/mm/zbud.c b/mm/zbud.c
> index 9451361..711aaf4 100644
> --- a/mm/zbud.c
> +++ b/mm/zbud.c
> @@ -52,6 +52,13 @@
>  #include 
>  #include 
>  
> +/*
> +* statistics
> +**/
> +
> +/* zbud pages in all pools */
> +static u64 total_zbud_pages;
> +
>  /*
>   * Structures
>  */
> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct page 
> *page)
>   return zhdr;
>  }
>  
> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp)
> +{
> + struct page *page;
> +
> + page = alloc_page(gfp);
> +
> + if (page) {
> + pool->pages_nr++;
> + total_zbud_pages++;

Who protect race?

> + }
> +
> + return page;
> +}
> +
> +
>  /* Resets the struct page fields and frees the page */
> -static void free_zbud_page(struct zbud_header *zhdr)
> +static void free_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr)
>  {
>   __free_page(virt_to_page(zhdr));
> +
> + pool->pages_nr--;
> + total_zbud_pages--;
>  }
>  
>  /*
> @@ -279,11 +304,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t 
> gfp,
>  
>   /* Couldn't find unbuddied zbud page, create new one */
>   spin_unlock(>lock);
> - page = alloc_page(gfp);
> + page = alloc_zbud_page(pool, gfp);
>   if (!page)
>   return -ENOMEM;
>   spin_lock(>lock);
> - pool->pages_nr++;
>   zhdr = init_zbud_page(page);
>   bud = FIRST;
>  
> @@ -349,8 +373,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long 
> handle)
>   if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
>   /* zbud page is empty, free */
>   list_del(>lru);
> - free_zbud_page(zhdr);
> - pool->pages_nr--;
> + free_zbud_page(pool, zhdr);
>   } else {
>   /* Add to unbuddied list */
>   freechunks = num_free_chunks(zhdr);
> @@ -447,8 +470,7 @@ next:
>* Both buddies are now free, free the zbud page and
>* return success.
>*/
> - free_zbud_page(zhdr);
> - pool->pages_nr--;
> + free_zbud_page(pool, zhdr);
>   spin_unlock(>lock);
>   return 0;
>   } else if (zhdr->first_chunks == 0 ||
> @@ -496,14 +518,12 @@ void zbud_unmap(struct zbud_pool *pool, unsigned long 
> handle)
>  
>  /**
>   * zbud_get_pool_size() - gets the zbud pool size in pages
> - * @pool:pool whose size is being queried
>   *
> - * Returns: size in pages of the given pool.  The pool lock need not be
> - * taken to access pages_nr.
> + * Returns: size in pages of all the zbud pools.
>   */
> -u64 zbud_get_pool_size(struct zbud_pool *pool)
> +u64 zbud_get_pool_size(void)
>  {
> - return pool->pages_nr;
> + return total_zbud_pages;
>  }
>  
>  static int __init init_zbud(void)
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 5a63f78..ef44d9d 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -291,7 +291,7 @@ static void zswap_free_entry(struct zswap_tree *tree,
>   zbud_free(tree->pool, entry->handle);
>   zswap_entry_cache_free(entry);
>   atomic_dec(_stored_pages);
> - zswap_pool_pages = zbud_get_pool_size(tree->pool);
> + zswap_pool_pages = zbud_get_pool_size();
>  }
>  
>  /* caller must hold the tree lock */
> @@ -716,7 +716,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t 
> offset,
>  
>   /* update stats */
>   atomic_inc(_stored_pages);
> - zswap_pool_pages = zbud_get_pool_size(tree->pool);
> + zswap_pool_pages = zbud_get_pool_size();
>  
>   return 0;
>  
> -- 
> 1.7.10.4
> 

Re: math_state_restore and kernel_fpu_end disable interrupts?

2014-01-20 Thread Nate Eldredge


On Sun, 19 Jan 2014, George Spelvin wrote:


It's credited to Suresh Siddha, whom I've cc'ed (along with others who
signed off).  Suresh, if you're still around, could you comment on why
math_state_restore always leaves interrupts disabled, regardless of their
state on entry?  Is there a deep reason or is it a bug?


What the comments seemed to be implying was that it was a bug to enter
this code with interrupts enabled.  So the problem may be a little bit
more systemic; expert counsel is required.


It would be kind of weird for code that requires disabled interrupts on 
entry to turn around and enable interrupts itself.  I agree that it would 
really help for a guru to take a look...
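
To pin down the behaviour in question, here is a toy simulation (no real interrupt control, all names made up) contrasting a routine that always returns with interrupts disabled against one that saves the caller's state and puts it back:

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled = true;	/* simulated CPU interrupt flag */

void sim_set_irqs(bool on)
{
	irqs_enabled = on;
}

bool sim_irqs_enabled(void)
{
	return irqs_enabled;
}

/* Disable "interrupts" and return the previous state. */
unsigned long sim_irq_save(void)
{
	unsigned long flags = irqs_enabled ? 1UL : 0UL;

	irqs_enabled = false;
	return flags;
}

/* Put back whatever state sim_irq_save() recorded. */
void sim_irq_restore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Always returns with interrupts off, whatever the caller had. */
void restore_state_unconditional(void)
{
	irqs_enabled = false;
	/* ... work that needs interrupts off ... */
}

/* Returns with the caller's interrupt state intact. */
void restore_state_preserving(void)
{
	unsigned long flags = sim_irq_save();

	/* ... work that needs interrupts off ... */
	sim_irq_restore(flags);
}
```

If the first form is intentional, callers that enter with interrupts enabled need to know about it; if callers are never supposed to enter that way, the two forms are indistinguishable.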


On which note, Suresh's email bounced :-(

--
Nate Eldredge
n...@thatsmathematics.com


Re: [PATCH v3 2/2] i2c: New bus driver for the QUP I2C controller

2014-01-20 Thread Stephen Boyd

On 01/17, Bjorn Andersson wrote:
> diff --git a/drivers/i2c/busses/i2c-qup.c b/drivers/i2c/busses/i2c-qup.c
> new file mode 100644
> index 000..2e0020e
> --- /dev/null
> +++ b/drivers/i2c/busses/i2c-qup.c
> @@ -0,0 +1,894 @@
> +/* Copyright (c) 2009-2013, The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 and
> + * only version 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* QUP Registers */
> +#define QUP_CONFIG             0x000
> +#define QUP_STATE              0x004
> +#define QUP_IO_MODE            0x008
> +#define QUP_SW_RESET           0x00c
> +#define QUP_OPERATIONAL        0x018
> +#define QUP_ERROR_FLAGS        0x01c
> +#define QUP_ERROR_FLAGS_EN     0x020
> +#define QUP_HW_VERSION         0x030
> +#define QUP_MX_OUTPUT_CNT      0x100
> +#define QUP_OUT_FIFO_BASE      0x110
> +#define QUP_MX_WRITE_CNT       0x150
> +#define QUP_MX_INPUT_CNT       0x200
> +#define QUP_MX_READ_CNT        0x208
> +#define QUP_IN_FIFO_BASE       0x218
> +#define QUP_I2C_CLK_CTL        0x400
> +#define QUP_I2C_STATUS         0x404
> +
> +/* QUP States and reset values */
> +#define QUP_RESET_STATE        0
> +#define QUP_RUN_STATE          1
> +#define QUP_PAUSE_STATE        3
> +#define QUP_STATE_MASK         3
> +
> +#define QUP_STATE_VALID        BIT(2)
> +#define QUP_I2C_MAST_GEN       BIT(4)
> +
> +#define QUP_OPERATIONAL_RESET  0x000ff0
> +#define QUP_I2C_STATUS_RESET   0xfc
> +
> +/* QUP OPERATIONAL FLAGS */
> +#define QUP_OUT_SVC_FLAG       BIT(8)
> +#define QUP_IN_SVC_FLAG        BIT(9)
> +#define QUP_MX_INPUT_DONE      BIT(11)
> +
> +/* I2C mini core related values */
> +#define I2C_MINI_CORE          (2 << 8)
> +#define I2C_N_VAL              15
> +/* Most significant word offset in FIFO port */
> +#define QUP_MSW_SHIFT          (I2C_N_VAL + 1)
> +#define QUP_CLOCK_AUTO_GATE    BIT(13)
> +
> +/* Packing/Unpacking words in FIFOs, and IO modes */
> +#define QUP_UNPACK_EN          BIT(14)
> +#define QUP_PACK_EN            BIT(15)
> +#define QUP_OUTPUT_BLK_MODE    BIT(10)
> +#define QUP_INPUT_BLK_MODE     BIT(12)
> +
> +#define QUP_REPACK_EN          (QUP_UNPACK_EN | QUP_PACK_EN)
> +
> +#define QUP_OUTPUT_BLOCK_SIZE(x)       (((x) & (0x03 << 0)) >> 0)
> +#define QUP_OUTPUT_FIFO_SIZE(x)        (((x) & (0x07 << 2)) >> 2)
> +#define QUP_INPUT_BLOCK_SIZE(x)        (((x) & (0x03 << 5)) >> 5)
> +#define QUP_INPUT_FIFO_SIZE(x)         (((x) & (0x07 << 7)) >> 7)
> +
> +/* QUP tags */
> +#define QUP_OUT_NOP            (0 << 8)
> +#define QUP_OUT_START          (1 << 8)
> +#define QUP_OUT_DATA           (2 << 8)
> +#define QUP_OUT_STOP           (3 << 8)
> +#define QUP_OUT_REC            (4 << 8)
> +#define QUP_IN_DATA            (5 << 8)
> +#define QUP_IN_STOP            (6 << 8)
> +#define QUP_IN_NACK            (7 << 8)
> +
> +/* Status, Error flags */
> +#define I2C_STATUS_WR_BUFFER_FULL      BIT(0)
> +#define I2C_STATUS_BUS_ACTIVE          BIT(8)
> +#define I2C_STATUS_BUS_MASTER          BIT(9)
> +#define I2C_STATUS_ERROR_MASK          0x38000fc
> +#define QUP_I2C_NACK_FLAG              BIT(3)
> +#define QUP_IN_NOT_EMPTY               BIT(5)
> +#define QUP_STATUS_ERROR_FLAGS         0x7c
> +
> +/* Master bus_err clock states */
> +#define I2C_CLK_RESET_BUSIDLE_STATE    0
> +#define I2C_CLK_FORCED_LOW_STATE       5
> +
> +#define QUP_MAX_CLK_STATE_RETRIES      300
> +#define QUP_MAX_QUP_STATE_RETRIES      100
> +#define I2C_STATUS_CLK_STATE           13
> +#define QUP_OUT_FIFO_NOT_EMPTY         0x10
> +#define QUP_READ_LIMIT                 256
> +
> +struct qup_i2c_dev {
> + struct device   *dev;
> + void __iomem    *base;
> + int irq;
> + struct clk  *clk;
> + struct clk  *pclk;
> + struct i2c_adapter  adap;
> +
> + int clk_ctl;
> + int one_bit_t;
> + int out_fifo_sz;
> + int in_fifo_sz;
> + int out_blk_sz;
> + int in_blk_sz;
> + unsigned long   xfer_time;
> + unsigned long   wait_idle;
> +
> + struct i2c_msg  *msg;
> + /* Current posion in user message buffer */

s/posion/position/

> + int pos;
> + /* Keep number of bytes left to be transmitted */
> + int cnt;
> + /*
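
For readers following the field-extraction macros quoted above, here is a minimal userspace sketch of how they pick the block-size and FIFO-size fields out of a single register value. The four QUP_*_SIZE macros are copied from the patch; struct qup_io_fields, qup_decode_io_mode() and the sample value are hypothetical and only there for illustration. Which register the driver actually reads these fields from is not shown in the quoted part of the patch.

```c
#include <stdint.h>

/* Field extractors, copied from the quoted patch. */
#define QUP_OUTPUT_BLOCK_SIZE(x)	(((x) & (0x03 << 0)) >> 0)
#define QUP_OUTPUT_FIFO_SIZE(x)		(((x) & (0x07 << 2)) >> 2)
#define QUP_INPUT_BLOCK_SIZE(x)		(((x) & (0x03 << 5)) >> 5)
#define QUP_INPUT_FIFO_SIZE(x)		(((x) & (0x07 << 7)) >> 7)

/* Hypothetical container for the raw field values (not in the patch). */
struct qup_io_fields {
	unsigned int out_blk;
	unsigned int out_fifo;
	unsigned int in_blk;
	unsigned int in_fifo;
};

/* Hypothetical helper: split one register value into its four fields. */
static struct qup_io_fields qup_decode_io_mode(uint32_t io_mode)
{
	struct qup_io_fields f = {
		.out_blk  = QUP_OUTPUT_BLOCK_SIZE(io_mode),
		.out_fifo = QUP_OUTPUT_FIFO_SIZE(io_mode),
		.in_blk   = QUP_INPUT_BLOCK_SIZE(io_mode),
		.in_fifo  = QUP_INPUT_FIFO_SIZE(io_mode),
	};

	return f;
}
```

With the made-up value 0x1a9, the helper yields the raw fields 1, 2, 1 and 3. How the driver turns those raw fields into byte counts is outside the quoted hunk, so the sketch stops at the raw values.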

Re: [ANNOUNCE] 3.12.6-rt9

2014-01-20 Thread Steven Rostedt

On Sat, 18 Jan 2014 04:15:29 +0100
Mike Galbraith  wrote:

> > So you also have the timers-do-not-raise-softirq-unconditionally.patch?
> 

People have been complaining that the latest 3.12-rt does not boot on
intel i7 boxes. And by reverting this patch, it boots fine.

I happen to have an i7 box to test on, and sure enough, the latest
3.12-rt locks up on boot and reverting the
timers-do-not-raise-softirq-unconditionally.patch, it boots fine.

Looking into it, I made this small update, and the box boots. Seems
checking "active_timers" is not enough to skip raising softirqs. I
haven't looked at why yet, but I would like others to test this patch
too.

I'll leave why this lets i7 boxes boot as an exercise for Thomas ;-)

-- Steve

Signed-off-by: Steven Rostedt 

diff --git a/kernel/timer.c b/kernel/timer.c
index 46467be..8212c10 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1464,13 +1464,11 @@ void run_local_timers(void)
raise_softirq(TIMER_SOFTIRQ);
return;
}
-   if (!base->active_timers)
-   goto out;

/* Check whether the next pending timer has expired */
if (time_before_eq(base->next_timer, jiffies))
raise_softirq(TIMER_SOFTIRQ);
-out:
+
rt_spin_unlock_after_trylock_in_irq(&base->lock);

 }
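
The check that stays in run_local_timers() after this change is the time_before_eq() comparison of base->next_timer against jiffies. As a rough userspace sketch of why that comparison is written with a signed difference rather than a plain "<=": tick_before_eq() below is a stand-in for the kernel macro (which also type-checks its arguments), and tick_before_eq_naive() and should_raise_timer_softirq() are hypothetical names used only here.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Stand-in for time_before_eq(a, b): true when tick count a is at or
 * before b, even if the counter wrapped around in between.
 */
static bool tick_before_eq(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

/* Naive comparison, shown only to contrast behaviour across a wrap. */
static bool tick_before_eq_naive(unsigned long a, unsigned long b)
{
	return a <= b;
}

/* Hypothetical helper mirroring the remaining check in the patch. */
static bool should_raise_timer_softirq(unsigned long next_timer,
				       unsigned long now)
{
	return tick_before_eq(next_timer, now);
}
```

A timer programmed for ULONG_MAX - 1 is treated as expired once the counter has wrapped to 2, which the naive comparison gets wrong. This says nothing about why the active_timers check misbehaves on the i7 boxes; it only illustrates the comparison that is left.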

Backlight driver for MacBook Air 6,1 and 6,2

2014-01-20 Thread Patrik Jakobsson

Hi Andrew and CCs

I've put together (rather quickly) a driver for directly handling the backlight
driver chip (LP8550) on the 2013 MacBook Air. It is needed to work around a bug
(likely in firmware) that occurs after suspend/resume. See:

https://bugs.freedesktop.org/show_bug.cgi?id=67454

This seems to fall outside of what the i915 driver should handle and thus needs a
separate driver. It's available at: https://github.com/patjak/mba6x_bl

The MacBook Air provides ACPI backlight methods but they also break after
suspend. I'm planning to mainline this and have a few questions.

1) I'm accessing the LP8550 on the SMBUS through ACPI methods. Should I access
the SMBUS directly instead or is this ok? I probably need to look at locking
around SMBUS accesses.

2) Is DMI the proper way of probing? Currently I'm just checking if the chip is
there and that it returns the proper contents in an identifier byte.

3) I assume the backlight type should be BACKLIGHT_PLATFORM (currently
BACKLIGHT_FIRMWARE) but do I also need to blacklist the ACPI backlight on these
devices? How do I get the proper precedence over other backlight devices?

Is there still time to get this into 3.14-rc1?

Thanks
Patrik

Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?

2014-01-20 Thread Lei Wen

On Mon, Jan 20, 2014 at 11:41 PM, Frederic Weisbecker wrote:
> On Mon, Jan 20, 2014 at 08:30:10PM +0530, Viresh Kumar wrote:
>> On 20 January 2014 19:29, Lei Wen  wrote:
>> > Hi Viresh,
>>
>> Hi Lei,
>>
>> > I have one question regarding unbounded workqueue migration in your case.
>> > You use hotplug to migrate the unbounded work to other cpus, but its cpu mask
>> > would still be 0xf, since cannot be changed by cpuset.
>> >
>> > My question is how you could prevent this unbounded work migrate back
>> > to your isolated cpu?
>> > Seems to me there is no such mechanism in kernel, am I understand wrong?
>>
>> These workqueues are normally queued back from workqueue handler. And we
>> normally queue them on the local cpu, that's the default behavior of workqueue
>> subsystem. And so they land up on the same CPU again and again.
>
> But for workqueues having a global affinity, I think they can be rescheduled later
> on the old CPUs. Although I'm not sure about that, I'm Cc'ing Tejun.

Agreed. Since the worker thread is allowed to run on all CPUs, nothing
prevents the scheduler from migrating it.

But here is one point: I see Viresh has already set up two cpusets with
scheduler load balancing disabled, so shouldn't that stop task migration
between those two groups, since the sched_domain changed?

What is more, I also did a similar test. I set up two such cpusets, with
cores 0-2 in cpuset1 and core 3 in cpuset2, and then hot-unplugged core 3.
I find that the cpuset's cpus member becomes NULL even after I hotplug core 3
back again.
So is it a bug?

Thanks,
Lei

>
> Also, one of the plan is to extend the sysfs interface of workqueues to override
> their affinity. If any of you guys want to try something there, that would be welcome.
> Also we want to work on the timer affinity. Perhaps we don't need a user interface
> for that, or maybe something on top of full dynticks to outline that we want the unbound
> timers to run on housekeeping CPUs only.

[PATCH v8 2/6] MCS Lock: Restructure the MCS lock defines and locking

2014-01-20 Thread Tim Chen

We will need the MCS lock code for doing optimistic spinning for rwsem
and queue rwlock.  Extracting the MCS code from mutex.c and putting it into
its own file allows us to reuse this code easily.

Note that the smp_load_acquire/smp_store_release pair used in
mcs_lock and mcs_unlock is not sufficient to form a full memory barrier
across cpus for many architectures (except x86).  For applications that
absolutely need a full barrier across multiple cpus with mcs_unlock and
mcs_lock pair, smp_mb__after_unlock_lock() should be used after mcs_lock.

Signed-off-by: Tim Chen 
Signed-off-by: Davidlohr Bueso 
---
 include/linux/mcs_spinlock.h | 81 
 include/linux/mutex.h|  5 +--
 kernel/locking/mutex.c   | 68 -
 3 files changed, 91 insertions(+), 63 deletions(-)
 create mode 100644 include/linux/mcs_spinlock.h

diff --git a/include/linux/mcs_spinlock.h b/include/linux/mcs_spinlock.h
new file mode 100644
index 000..23912cb
--- /dev/null
+++ b/include/linux/mcs_spinlock.h
@@ -0,0 +1,81 @@
+/*
+ * MCS lock defines
+ *
+ * This file contains the main data structure and API definitions of MCS lock.
+ *
+ * The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spin-lock
+ * with the desirable properties of being fair, and with each cpu trying
+ * to acquire the lock spinning on a local variable.
+ * It avoids expensive cache bouncings that common test-and-set spin-lock
+ * implementations incur.
+ */
+#ifndef __LINUX_MCS_SPINLOCK_H
+#define __LINUX_MCS_SPINLOCK_H
+
+struct mcs_spinlock {
+   struct mcs_spinlock *next;
+   int locked; /* 1 if lock acquired */
+};
+
+/*
+ * Note: the smp_load_acquire/smp_store_release pair is not
+ * sufficient to form a full memory barrier across
+ * cpus for many architectures (except x86) for mcs_unlock and mcs_lock.
+ * For applications that need a full barrier across multiple cpus
+ * with mcs_unlock and mcs_lock pair, smp_mb__after_unlock_lock() should be
+ * used after mcs_lock.
+ */
+
+/*
+ * We don't inline mcs_spin_lock() so that perf can correctly account for the
+ * time spent in this lock function.
+ */
+static noinline
+void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
+{
+   struct mcs_spinlock *prev;
+
+   /* Init node */
+   node->locked = 0;
+   node->next   = NULL;
+
+   prev = xchg(lock, node);
+   if (likely(prev == NULL)) {
+   /* Lock acquired */
+   node->locked = 1;
+   return;
+   }
+   ACCESS_ONCE(prev->next) = node;
+   /*
+* Wait until the lock holder passes the lock down.
+* Using smp_load_acquire() provides a memory barrier that
+* ensures subsequent operations happen after the lock is acquired.
+*/
+   while (!(smp_load_acquire(&node->locked)))
+   arch_mutex_cpu_relax();
+}
+
+static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
+{
+   struct mcs_spinlock *next = ACCESS_ONCE(node->next);
+
+   if (likely(!next)) {
+   /*
+* Release the lock by setting it to NULL
+*/
+   if (cmpxchg(lock, node, NULL) == node)
+   return;
+   /* Wait until the next pointer is set */
+   while (!(next = ACCESS_ONCE(node->next)))
+   arch_mutex_cpu_relax();
+   }
+   /*
+* Pass lock to next waiter.
+* smp_store_release() provides a memory barrier to ensure
+* all operations in the critical section has been completed
+* before unlocking.
+*/
+   smp_store_release(&next->locked, 1);
+}
+
+#endif /* __LINUX_MCS_SPINLOCK_H */
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index d318193..c482e1d 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -46,6 +46,7 @@
  * - detects multi-task circular deadlocks and prints out all affected
  *   locks and tasks (and only those tasks)
  */
+struct mcs_spinlock;
 struct mutex {
/* 1: unlocked, 0: locked, negative: locked, possible waiters */
atomic_t count;
@@ -55,7 +56,7 @@ struct mutex {
struct task_struct  *owner;
 #endif
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-   void*spin_mlock;/* Spinner MCS lock */
+   struct mcs_spinlock *mcs_lock;  /* Spinner MCS lock */
 #endif
 #ifdef CONFIG_DEBUG_MUTEXES
const char  *name;
@@ -179,4 +180,4 @@ extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
 # define arch_mutex_cpu_relax() cpu_relax()
 #endif
 
-#endif
+#endif /* __LINUX_MUTEX_H */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index fbbd2ed..45fe1b5 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * In the DEBUG case we are using the "NULL fastpath" for

[PATCH v8 4/6] MCS Lock: Move mcs_lock/unlock function into its own

2014-01-20 Thread Tim Chen

From: Waiman Long 

Create a new mcs_spinlock.c file to contain the
mcs_spin_lock() and mcs_spin_unlock() functions.

Signed-off-by: Waiman Long 
Signed-off-by: Tim Chen 
---
 include/linux/mcs_spinlock.h   | 77 ++
 kernel/locking/Makefile|  6 +-
 .../locking/mcs_spinlock.c | 27 
 3 files changed, 18 insertions(+), 92 deletions(-)
 copy include/linux/mcs_spinlock.h => kernel/locking/mcs_spinlock.c (82%)

diff --git a/include/linux/mcs_spinlock.h b/include/linux/mcs_spinlock.h
index bfe84c6..d54bb23 100644
--- a/include/linux/mcs_spinlock.h
+++ b/include/linux/mcs_spinlock.h
@@ -17,78 +17,9 @@ struct mcs_spinlock {
int locked; /* 1 if lock acquired */
 };
 
-/*
- * Note: the smp_load_acquire/smp_store_release pair is not
- * sufficient to form a full memory barrier across
- * cpus for many architectures (except x86) for mcs_unlock and mcs_lock.
- * For applications that need a full barrier across multiple cpus
- * with mcs_unlock and mcs_lock pair, smp_mb__after_unlock_lock() should be
- * used after mcs_lock.
- */
-
-/*
- * In order to acquire the lock, the caller should declare a local node and
- * pass a reference of the node to this function in addition to the lock.
- * If the lock has already been acquired, then this will proceed to spin
- * on this node->locked until the previous lock holder sets the node->locked
- * in mcs_spin_unlock().
- *
- * We don't inline mcs_spin_lock() so that perf can correctly account for the
- * time spent in this lock function.
- */
-static noinline
-void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
-{
-   struct mcs_spinlock *prev;
-
-   /* Init node */
-   node->locked = 0;
-   node->next   = NULL;
-
-   prev = xchg(lock, node);
-   if (likely(prev == NULL)) {
-   /* Lock acquired, don't need to set node->locked to 1
-* as lock owner and other contenders won't check this value.
-* If a debug mode is needed to audit lock status, then
-* set node->locked value here.
-*/
-   return;
-   }
-   ACCESS_ONCE(prev->next) = node;
-   /*
-* Wait until the lock holder passes the lock down.
-* Using smp_load_acquire() provides a memory barrier that
-* ensures subsequent operations happen after the lock is acquired.
-*/
-   while (!(smp_load_acquire(&node->locked)))
-   arch_mutex_cpu_relax();
-}
-
-/*
- * Releases the lock. The caller should pass in the corresponding node that
- * was used to acquire the lock.
- */
-static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
-{
-   struct mcs_spinlock *next = ACCESS_ONCE(node->next);
-
-   if (likely(!next)) {
-   /*
-* Release the lock by setting it to NULL
-*/
-   if (likely(cmpxchg(lock, node, NULL) == node))
-   return;
-   /* Wait until the next pointer is set */
-   while (!(next = ACCESS_ONCE(node->next)))
-   arch_mutex_cpu_relax();
-   }
-   /*
-* Pass lock to next waiter.
-* smp_store_release() provides a memory barrier to ensure
-* all operations in the critical section has been completed
-* before unlocking.
-*/
-   smp_store_release(&next->locked, 1);
-}
+extern
+void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node);
+extern
+void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node);
 
 #endif /* __LINUX_MCS_SPINLOCK_H */
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index baab8e5..20d9d5c 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -13,12 +13,12 @@ obj-$(CONFIG_LOCKDEP) += lockdep.o
 ifeq ($(CONFIG_PROC_FS),y)
 obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
 endif
-obj-$(CONFIG_SMP) += spinlock.o
-obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
+obj-$(CONFIG_SMP) += spinlock.o mcs_spinlock.o
+obj-$(CONFIG_PROVE_LOCKING) += spinlock.o mcs_spinlock.o
 obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
 obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o
 obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex-tester.o
-obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o
+obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o mcs_spinlock.o
 obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
 obj-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
 obj-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem-xadd.o
diff --git a/include/linux/mcs_spinlock.h b/kernel/locking/mcs_spinlock.c
similarity index 82%
copy from include/linux/mcs_spinlock.h
copy to kernel/locking/mcs_spinlock.c
index bfe84c6..c3ee9cf 100644
--- a/include/linux/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.c
@@ -1,7 +1,5 @@
 /*
- * MCS lock defines
- *
- * This file contains the main data structure and API definitions of MCS lock.
+ * MCS lock
  *
  * The

[PATCH v8 3/6] MCS Lock: optimizations and extra comments

2014-01-20 Thread Tim Chen

From: Jason Low 

Remove unnecessary operation to assign locked status to 1 if lock is
acquired without contention as this value will not be checked by lock
holder again and other potential lock contenders will not be looking at
their own lock status.

Make the cmpxchg(lock, node, NULL) == node check in mcs_spin_unlock()
likely() as it is likely that a race did not occur most of the time.

Also add in more comments describing how the local node is used in MCS locks.
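
To make the role of the local node concrete, here is a userspace sketch of the same queueing scheme written with C11 atomics and pthreads. It is an illustration under stated assumptions, not the kernel code: struct mcs_node, mcs_lock(), mcs_unlock(), demo_worker() and run_demo() are hypothetical names, sched_yield() stands in for arch_mutex_cpu_relax(), and the C11 acquire/release orderings are only an analogue of smp_load_acquire()/smp_store_release().

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_int locked;	/* set to 1 by the previous holder on handoff */
};

static void mcs_lock(_Atomic(struct mcs_node *) *lock, struct mcs_node *node)
{
	struct mcs_node *prev;

	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

	/* Swap ourselves in as the new tail of the queue. */
	prev = atomic_exchange_explicit(lock, node, memory_order_acq_rel);
	if (prev == NULL)
		return;		/* uncontended: node->locked is never read */

	/* Link behind the old tail, then spin on our own node only. */
	atomic_store_explicit(&prev->next, node, memory_order_release);
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		sched_yield();
}

static void mcs_unlock(_Atomic(struct mcs_node *) *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No known successor: try to reset the tail to NULL. */
		if (atomic_compare_exchange_strong_explicit(lock, &expected,
				NULL, memory_order_acq_rel,
				memory_order_acquire))
			return;
		/* A waiter swapped itself in but has not linked yet. */
		while ((next = atomic_load_explicit(&node->next,
				memory_order_acquire)) == NULL)
			sched_yield();
	}
	/* Hand over; release orders the critical section before it. */
	atomic_store_explicit(&next->locked, 1, memory_order_release);
}

#define NTHREADS 4
#define NITERS   10000

static _Atomic(struct mcs_node *) demo_lock;	/* NULL means unlocked */
static long demo_counter;			/* protected by demo_lock */

static void *demo_worker(void *arg)
{
	struct mcs_node node;	/* the caller-provided local node */

	(void)arg;
	for (int i = 0; i < NITERS; i++) {
		mcs_lock(&demo_lock, &node);
		demo_counter++;
		mcs_unlock(&demo_lock, &node);
	}
	return NULL;
}

/* Runs NTHREADS workers and returns the final counter value. */
static long run_demo(void)
{
	pthread_t t[NTHREADS];

	demo_counter = 0;
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, demo_worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	return demo_counter;
}
```

Each waiter spins only on the locked field of the node it passed in, which is why the uncontended path can skip writing node->locked: nobody else ever reads that field unless the node was queued behind a predecessor. Build with -pthread on toolchains that require it.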

Reviewed-by: Tim Chen 
Signed-off-by: Jason Low 
Signed-off-by: Tim Chen 
---
 include/linux/mcs_spinlock.h | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/linux/mcs_spinlock.h b/include/linux/mcs_spinlock.h
index 23912cb..bfe84c6 100644
--- a/include/linux/mcs_spinlock.h
+++ b/include/linux/mcs_spinlock.h
@@ -27,6 +27,12 @@ struct mcs_spinlock {
  */
 
 /*
+ * In order to acquire the lock, the caller should declare a local node and
+ * pass a reference of the node to this function in addition to the lock.
+ * If the lock has already been acquired, then this will proceed to spin
+ * on this node->locked until the previous lock holder sets the node->locked
+ * in mcs_spin_unlock().
+ *
  * We don't inline mcs_spin_lock() so that perf can correctly account for the
  * time spent in this lock function.
  */
@@ -41,8 +47,11 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
 
prev = xchg(lock, node);
if (likely(prev == NULL)) {
-   /* Lock acquired */
-   node->locked = 1;
+   /* Lock acquired, don't need to set node->locked to 1
+* as lock owner and other contenders won't check this value.
+* If a debug mode is needed to audit lock status, then
+* set node->locked value here.
+*/
return;
}
ACCESS_ONCE(prev->next) = node;
@@ -55,6 +64,10 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
arch_mutex_cpu_relax();
 }
 
+/*
+ * Releases the lock. The caller should pass in the corresponding node that
+ * was used to acquire the lock.
+ */
 static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
 {
struct mcs_spinlock *next = ACCESS_ONCE(node->next);
@@ -63,7 +76,7 @@ static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *nod
/*
 * Release the lock by setting it to NULL
 */
-   if (cmpxchg(lock, node, NULL) == node)
+   if (likely(cmpxchg(lock, node, NULL) == node))
return;
/* Wait until the next pointer is set */
while (!(next = ACCESS_ONCE(node->next)))
-- 
1.7.11.7




[PATCH v8 6/6] MCS Lock: Allow architecture specific asm files to be used for contended case

2014-01-20 Thread Tim Chen

From: Peter Zijlstra 

This patch allows each architecture to add its specific assembly optimized
arch_mcs_spin_lock_contended and arch_mcs_spin_unlock_contended for
MCS lock and unlock functions.

Signed-off-by: Tim Chen 
---
 arch/alpha/include/asm/Kbuild  |  1 +
 arch/arc/include/asm/Kbuild|  1 +
 arch/arm/include/asm/Kbuild|  1 +
 arch/arm64/include/asm/Kbuild  |  1 +
 arch/avr32/include/asm/Kbuild  |  1 +
 arch/blackfin/include/asm/Kbuild   |  1 +
 arch/c6x/include/asm/Kbuild|  1 +
 arch/cris/include/asm/Kbuild   |  1 +
 arch/frv/include/asm/Kbuild|  1 +
 arch/hexagon/include/asm/Kbuild|  1 +
 arch/ia64/include/asm/Kbuild   |  2 +-
 arch/m32r/include/asm/Kbuild   |  1 +
 arch/m68k/include/asm/Kbuild   |  1 +
 arch/metag/include/asm/Kbuild  |  1 +
 arch/microblaze/include/asm/Kbuild |  1 +
 arch/mips/include/asm/Kbuild   |  1 +
 arch/mn10300/include/asm/Kbuild|  1 +
 arch/openrisc/include/asm/Kbuild   |  1 +
 arch/parisc/include/asm/Kbuild |  1 +
 arch/powerpc/include/asm/Kbuild|  2 +-
 arch/s390/include/asm/Kbuild   |  1 +
 arch/score/include/asm/Kbuild  |  1 +
 arch/sh/include/asm/Kbuild |  1 +
 arch/sparc/include/asm/Kbuild  |  1 +
 arch/tile/include/asm/Kbuild   |  1 +
 arch/um/include/asm/Kbuild |  1 +
 arch/unicore32/include/asm/Kbuild  |  1 +
 arch/x86/include/asm/Kbuild|  1 +
 arch/xtensa/include/asm/Kbuild |  1 +
 include/asm-generic/mcs_spinlock.h | 13 +
 include/linux/mcs_spinlock.h   |  2 ++
 31 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 include/asm-generic/mcs_spinlock.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index f01fb50..14cbbbc 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
 generic-y += preempt.h
+generic-y += mcs_spinlock.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 9ae21c1..c0773a5 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -48,3 +48,4 @@ generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += mcs_spinlock.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index c38b58c..c68cfdd 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -34,3 +34,4 @@ generic-y += timex.h
 generic-y += trace_clock.h
 generic-y += unaligned.h
 generic-y += preempt.h
+generic-y += mcs_spinlock.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 519f89f..24a3c10 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -51,3 +51,4 @@ generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += mcs_spinlock.h
diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild
index 658001b..466e13d 100644
--- a/arch/avr32/include/asm/Kbuild
+++ b/arch/avr32/include/asm/Kbuild
@@ -18,3 +18,4 @@ generic-y   += sections.h
 generic-y   += topology.h
 generic-y  += trace_clock.h
 generic-y   += xor.h
+generic-y += mcs_spinlock.h
diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild
index f2b4347..0bd1c5c 100644
--- a/arch/blackfin/include/asm/Kbuild
+++ b/arch/blackfin/include/asm/Kbuild
@@ -45,3 +45,4 @@ generic-y += unaligned.h
 generic-y += user.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += mcs_spinlock.h
diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
index fc0b3c3..21d7100 100644
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -57,3 +57,4 @@ generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += mcs_spinlock.h
diff --git a/arch/cris/include/asm/Kbuild b/arch/cris/include/asm/Kbuild
index 199b1a9..c571cc1 100644
--- a/arch/cris/include/asm/Kbuild
+++ b/arch/cris/include/asm/Kbuild
@@ -13,3 +13,4 @@ generic-y += trace_clock.h
 generic-y += vga.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += mcs_spinlock.h
diff --git a/arch/frv/include/asm/Kbuild b/arch/frv/include/asm/Kbuild
index 74742dc..ccca92e 100644
--- a/arch/frv/include/asm/Kbuild
+++ b/arch/frv/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
 generic-y += preempt.h
+generic-y += mcs_spinlock.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index ada843c..553077d 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -55,3 +55,4 @@ generic-y += ucontext.h
 generic-y += unaligned.h
 generic-y += xor.h
 generic-y += preempt.h
+generic-y += mcs_spinlock.h
diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index f93ee08..25aed55 100644
--- a/arch/ia64/include/asm/Kbuild
+++

[PATCH v8 5/6] MCS Lock: allow architectures to hook in to contended

2014-01-20 Thread Tim Chen

From: Will Deacon 

When contended, architectures may be able to reduce the polling overhead
in ways which aren't expressible using a simple relax() primitive.

This patch allows architectures to hook into the mcs_{lock,unlock}
functions for the contended cases only.
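
The hook mechanism is the usual "#ifndef, else fall back to the generic version" pattern. A small userspace sketch of it follows; the macro names arch_spin_wait_contended and arch_spin_pass_contended, the override_calls counter and handoff_once() are all hypothetical, and C11 atomics stand in for the kernel primitives. Here a pretend architecture overrides only the unlock-side hook and leaves the lock-side hook to the generic fallback.

```c
#include <stdatomic.h>

static int override_calls;	/* counts uses of the pretend arch hook */

/*
 * Pretend architecture header: supplies its own unlock-side hook.
 * Because the name is already defined, the generic fallback below
 * is skipped for this hook.
 */
#define arch_spin_pass_contended(l)					\
do {									\
	override_calls++;						\
	atomic_store_explicit((l), 1, memory_order_release);		\
} while (0)

/* Generic code: only provides what the architecture did not. */
#ifndef arch_spin_wait_contended
#define arch_spin_wait_contended(l)					\
do {									\
	while (!atomic_load_explicit((l), memory_order_acquire))	\
		;	/* generic fallback: plain polling */		\
} while (0)
#endif

#ifndef arch_spin_pass_contended
#define arch_spin_pass_contended(l)					\
	atomic_store_explicit((l), 1, memory_order_release)
#endif

/* Single-threaded walk through both hooks; returns the flag value. */
static int handoff_once(void)
{
	atomic_int flag = 0;

	arch_spin_pass_contended(&flag);	/* uses the override */
	arch_spin_wait_contended(&flag);	/* uses the fallback */
	return atomic_load(&flag);
}
```

The generic file never needs to know which architectures override which hook; an architecture that defines neither macro gets both fallbacks unchanged.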

Signed-off-by: Will Deacon 
Signed-off-by: Tim Chen 
---
 kernel/locking/mcs_spinlock.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c
index c3ee9cf..e12ed32 100644
--- a/kernel/locking/mcs_spinlock.c
+++ b/kernel/locking/mcs_spinlock.c
@@ -16,6 +16,28 @@
 #include 
 #include 
 
+#ifndef arch_mcs_spin_lock_contended
+/*
+ * Using smp_load_acquire() provides a memory barrier that ensures
+ * subsequent operations happen after the lock is acquired.
+ */
+#define arch_mcs_spin_lock_contended(l)                                \
+do {   \
+   while (!(smp_load_acquire(l)))  \
+   arch_mutex_cpu_relax(); \
+} while (0)
+#endif
+
+#ifndef arch_mcs_spin_unlock_contended
+/*
+ * smp_store_release() provides a memory barrier to ensure all
+ * operations in the critical section has been completed before
+ * unlocking.
+ */
+#define arch_mcs_spin_unlock_contended(l)  \
+   smp_store_release((l), 1)
+#endif
+
 /*
  * Note: the smp_load_acquire/smp_store_release pair is not
  * sufficient to form a full memory barrier across
@@ -50,13 +72,9 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
return;
}
ACCESS_ONCE(prev->next) = node;
-   /*
-* Wait until the lock holder passes the lock down.
-* Using smp_load_acquire() provides a memory barrier that
-* ensures subsequent operations happen after the lock is acquired.
-*/
-   while (!(smp_load_acquire(&node->locked)))
-   arch_mutex_cpu_relax();
+
+   /* Wait until the lock holder passes the lock down. */
+   arch_mcs_spin_lock_contended(&node->locked);
 }
 EXPORT_SYMBOL_GPL(mcs_spin_lock);
 
@@ -78,12 +96,8 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
while (!(next = ACCESS_ONCE(node->next)))
arch_mutex_cpu_relax();
}
-   /*
-* Pass lock to next waiter.
-* smp_store_release() provides a memory barrier to ensure
-* all operations in the critical section has been completed
-* before unlocking.
-*/
-   smp_store_release(&next->locked, 1);
+
+   /* Pass lock to next waiter. */
+   arch_mcs_spin_unlock_contended(&next->locked);
 }
 EXPORT_SYMBOL_GPL(mcs_spin_unlock);
-- 
1.7.11.7




[PATCH v8 0/6] MCS Lock: MCS lock code cleanup and optimizations

2014-01-20 Thread Tim Chen

This update to the patch series reorganizes the order of the patches
by fixing MCS lock barrier leakage first before making standalone
MCS lock and unlock functions.  We also changed the hooks to architecture
specific mcs_spin_lock_contended and mcs_spin_unlock_contended from
needing Kconfig to asm-generic and putting arch specific asm headers
as needed.  Peter, please review the last patch and bless it with your
signed-off if it looks right.

This patch series fixes barriers of MCS lock and performs some optimizations.
Proper passing of the mcs lock is now done with smp_load_acquire() in
mcs_spin_lock() and smp_store_release() in mcs_spin_unlock.  Note that
this is not sufficient to form a full memory barrier across cpus on
many architectures (except x86) for the mcs_unlock and mcs_lock pair.
For code that needs a full memory barrier with mcs_unlock and mcs_lock
pair, smp_mb__after_unlock_lock() should be used after mcs_lock.

Will also added hooks to allow for architecture specific
implementation and optimization of the contended paths of
lock and unlock of mcs_spin_lock and mcs_spin_unlock functions.

The original mcs lock code has potential leaks between critical sections, which
was not a problem when MCS was embedded within the mutex but needs
to be corrected when allowing the MCS lock to be used by itself for
other locking purposes.  The MCS lock code was previously embedded in
the mutex.c and is now separated.  This allows for easier reuse of MCS
lock in other places like rwsem and qrwlock.

Tim

v8:
1. Move order of patches by putting barrier corrections first.
2. Use asm-generic headers for hooking in arch specific mcs_spin_lock_contended
and mcs_spin_unlock_contended function.
3. Some minor cleanup and comments added.

v7:
1. Update architecture specific hooks with concise architecture
specific arch_mcs_spin_lock_contended and arch_mcs_spin_unlock_contended
functions.

v6:
1. Fix a bug of improper xchg_acquire and extra space in barrier
fixing patch.
2. Added extra hooks to allow for architecture specific version
of mcs_spin_lock and mcs_spin_unlock to be used.

v5:
1. Rework barrier correction patch.  We now use smp_load_acquire()
in mcs_spin_lock() and smp_store_release() in
mcs_spin_unlock() to allow for architecture dependent barriers to be
automatically used.  This is clean and will provide the right
barriers for all architectures.

v4:
1. Move patch series to the latest tip after v3.12

v3:
1. modified memory barriers to support non x86 architectures that have
weak memory ordering.

v2:
1. change export mcs_spin_lock as a GPL export symbol
2. corrected mcs_spin_lock to references


Jason Low (1):
  MCS Lock: optimizations and extra comments

Peter Zijlstra (1):
  MCS Lock: Allow architecture specific asm files to be used for
contended case

Tim Chen (2):
  MCS Lock: Restructure the MCS lock defines and locking
  MCS Lock: allow architectures to hook in to contended

Waiman Long (2):
  MCS Lock: Barrier corrections
  MCS Lock: Move mcs_lock/unlock function into its own

 arch/alpha/include/asm/Kbuild  |   1 +
 arch/arc/include/asm/Kbuild|   1 +
 arch/arm/include/asm/Kbuild|   1 +
 arch/arm64/include/asm/Kbuild  |   1 +
 arch/avr32/include/asm/Kbuild  |   1 +
 arch/blackfin/include/asm/Kbuild   |   1 +
 arch/c6x/include/asm/Kbuild|   1 +
 arch/cris/include/asm/Kbuild   |   1 +
 arch/frv/include/asm/Kbuild|   1 +
 arch/hexagon/include/asm/Kbuild|   1 +
 arch/ia64/include/asm/Kbuild   |   2 +-
 arch/m32r/include/asm/Kbuild   |   1 +
 arch/m68k/include/asm/Kbuild   |   1 +
 arch/metag/include/asm/Kbuild  |   1 +
 arch/microblaze/include/asm/Kbuild |   1 +
 arch/mips/include/asm/Kbuild   |   1 +
 arch/mn10300/include/asm/Kbuild|   1 +
 arch/openrisc/include/asm/Kbuild   |   1 +
 arch/parisc/include/asm/Kbuild |   1 +
 arch/powerpc/include/asm/Kbuild|   2 +-
 arch/s390/include/asm/Kbuild   |   1 +
 arch/score/include/asm/Kbuild  |   1 +
 arch/sh/include/asm/Kbuild |   1 +
 arch/sparc/include/asm/Kbuild  |   1 +
 arch/tile/include/asm/Kbuild   |   1 +
 arch/um/include/asm/Kbuild |   1 +
 arch/unicore32/include/asm/Kbuild  |   1 +
 arch/x86/include/asm/Kbuild|   1 +
 arch/xtensa/include/asm/Kbuild |   1 +
 include/asm-generic/mcs_spinlock.h |  13 +
 include/linux/mcs_spinlock.h   |  27 ++
 include/linux/mutex.h  |   5 +-
 kernel/locking/Makefile|   6 +--
 kernel/locking/mcs_spinlock.c  | 103 +
 kernel/locking/mutex.c |  60 +++--
 35 files changed, 185 insertions(+), 60 deletions(-)
 create mode 100644 include/asm-generic/mcs_spinlock.h
 create mode 100644 include/linux/mcs_spinlock.h
 create mode 100644 kernel/locking/mcs_spinlock.c

-- 
1.7.11.7



[PATCH v8 1/6] MCS Lock: Barrier corrections

2014-01-20 Thread Tim Chen

From: Waiman Long 

This patch corrects the way memory barriers are used in the MCS lock
by switching to the smp_load_acquire() and smp_store_release() functions.
It removes the barriers that are not needed.

Suggested-by: Michel Lespinasse 
Signed-off-by: Waiman Long 
Signed-off-by: Jason Low 
Signed-off-by: Tim Chen 
---
 kernel/locking/mutex.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4dd6e4c..fbbd2ed 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -136,9 +136,12 @@ void mspin_lock(struct mspin_node **lock, struct mspin_node *node)
 		return;
 	}
 	ACCESS_ONCE(prev->next) = node;
-	smp_wmb();
-	/* Wait until the lock holder passes the lock down */
-	while (!ACCESS_ONCE(node->locked))
+	/*
+	 * Wait until the lock holder passes the lock down.
+	 * Using smp_load_acquire() provides a memory barrier that
+	 * ensures subsequent operations happen after the lock is acquired.
+	 */
+	while (!(smp_load_acquire(&node->locked)))
 		arch_mutex_cpu_relax();
 }
 
@@ -156,8 +159,13 @@ static void mspin_unlock(struct mspin_node **lock, struct mspin_node *node)
 		while (!(next = ACCESS_ONCE(node->next)))
 			arch_mutex_cpu_relax();
 	}
-	ACCESS_ONCE(next->locked) = 1;
-	smp_wmb();
+	/*
+	 * Pass lock to next waiter.
+	 * smp_store_release() provides a memory barrier to ensure
+	 * all operations in the critical section has been completed
+	 * before unlocking.
+	 */
+	smp_store_release(&next->locked, 1);
 }
 
 /*
-- 
1.7.11.7
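
For readers who want to experiment with the acquire/release pairing
outside the kernel, the following is a minimal userspace sketch of an
MCS lock using C11 <stdatomic.h>.  It is not the kernel code: struct
mcs_node, mcs_lock_t, mcs_lock() and mcs_unlock() are hypothetical
names, and the C11 memory orders are assumed stand-ins for
smp_load_acquire() and smp_store_release().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace queue node; not the kernel's struct mspin_node. */
struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_bool locked;
};

/* The lock itself is just a pointer to the tail of the waiter queue. */
typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *prev;

	atomic_store_explicit(&node->next, (struct mcs_node *)NULL,
			      memory_order_relaxed);
	atomic_store_explicit(&node->locked, false, memory_order_relaxed);

	/* Swap ourselves in as the new tail of the queue. */
	prev = atomic_exchange_explicit(lock, node, memory_order_acq_rel);
	if (prev == NULL)
		return;		/* queue was empty: lock acquired */

	/* Link behind the previous tail so it can hand the lock to us. */
	atomic_store_explicit(&prev->next, node, memory_order_release);

	/* Acquire load: pairs with the release store in mcs_unlock(). */
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		;	/* spin */
}

static void mcs_unlock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No visible successor: try to mark the queue empty. */
		if (atomic_compare_exchange_strong_explicit(
			    lock, &expected, (struct mcs_node *)NULL,
			    memory_order_release, memory_order_relaxed))
			return;

		/* A successor is mid-enqueue; wait until it links itself. */
		while (!(next = atomic_load_explicit(&node->next,
						     memory_order_acquire)))
			;	/* spin */
	}

	/* Release store: critical-section writes are visible before handoff. */
	atomic_store_explicit(&next->locked, true, memory_order_release);
}
```

Each thread passes its own node (typically on its stack) to mcs_lock()
and the same node to mcs_unlock(); waiters spin only on their own
node->locked flag.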




Re: ALSA: hda - Fix possible races in HDMI driver - lockup on shutdown when radeon.audio=1 after using audacity

2014-01-20 Thread Arthur Marsh




Takashi Iwai wrote, on 20/01/14 19:22:

At Sun, 19 Jan 2014 17:32:16 +1030,
Arthur Marsh wrote:


I have had reproducible lock-ups on shut-down (at the shutting down ALSA
stage) of my AMD64 machine (Asus M3A78Pro motherboard, BIOS 1701
01/27/2011, CPU AMD Athlon(tm) II X4 640 Processor) running the 64 bit
Linux kernel more recent than 3.12 when *both* radeon.audio=1 was set
and I had been running audacity 2.0.5. (iommu=noaperture is also set).

The problem was reproducible with the stock Debian kernel
linux-image-3.13-rc6-amd64 version 3.13~rc6-1~exp1.

The machine is using an ATI/AMD 3850HD video card with a DVI cable to a
DVI input on my monitor, and the default audio device is the
motherboard's on-board audio device:

00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00
Azalia (Intel HDA)

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] RV670 [Radeon HD 3690/3850]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] RV670/680
HDMI Audio [Radeon HD 3690/3800 Series]

$ git bisect bad
cbbaa603a03cc46681e24d6b2804b62fde95a2af is the first bad commit
commit cbbaa603a03cc46681e24d6b2804b62fde95a2af
Author: Takashi Iwai 
Date:   Thu Oct 17 18:03:24 2013 +0200

  ALSA: hda - Fix possible races in HDMI driver

  Some per_pin fields and ELD contents might be changed dynamically in
  multiple ways where the concurrent accesses are still opened in the
  current code.  This patch fixes such possible races by using eld->lock
  in appropriate places.

  Reported-by: Anssi Hannula 
  Signed-off-by: Takashi Iwai 

:04 04 0c29281f82a3ebd9a704d481114f9cfcefea07c8
d71fd101125cd29a628cb5e66c7ee4f56e28329b M  sound

When running audacity from the command line there was the following output:

ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
Expression 'stream->playback.pcm' failed in
'src/hostapi/alsa/pa_linux_alsa.c', line: 4611
Expression 'stream->playback.pcm' failed in
'src/hostapi/alsa/pa_linux_alsa.c', line: 4611
Expression 'stream->playback.pcm' failed in
'src/hostapi/alsa/pa_linux_alsa.c', line: 4611

I am happy to supply further information or run further tests to help in
isolating the problem and verifying a solution.


Could you build the kernel with lockdep kconfig and see whether it
reports errors?

Reverting the commit doesn't work cleanly.  Instead, you can try to
simply comment out all mutex_lock(&per_pin->lock) and
mutex_unlock(&per_pin->lock) calls in patch_hdmi.c to see whether it's
a mutex deadlock.


thanks,

Takashi



I rebuilt the kernel after commenting out all mutex_lock(&per_pin->lock)
and mutex_unlock(&per_pin->lock) calls in patch_hdmi.c, and the
resulting kernel shut down without hanging.


Regards,

Arthur.

Dirty deleted files cause pointless I/O storms (unless truncated first)

2014-01-20 Thread Andy Lutomirski

The code below runs quickly for a few iterations, and then it slows
down and the whole system becomes laggy for far too long.

Removing the sync_file_range call results in no I/O being performed at
all (which means that the kernel isn't totally screwing this up), and
changing "4096" to SIZE causes lots of I/O but without
the going-out-to-lunch bit (unsurprisingly).

Surprisingly, uncommenting the ftruncate call seems to fix the
problem.  This suggests that all the necessary infrastructure to avoid
wasting time writing to deleted files is there but that it's not
getting used.


#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIZE (16 * 1048576)

static void hammer(const char *name)
{
  int fd = open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
  if (fd == -1)
    err(1, "open");

  fallocate(fd, 0, 0, SIZE);

  /* Dirty every page of the file through a shared mapping. */
  void *addr = mmap(NULL, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
  if (addr == MAP_FAILED)
    err(1, "mmap");

  memset(addr, 0, SIZE);

  if (munmap(addr, SIZE) != 0)
    err(1, "munmap");

  /* Write back only the first 4096 bytes and wait for completion. */
  if (sync_file_range(fd, 0, 4096,
                      SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
                      SYNC_FILE_RANGE_WAIT_AFTER) != 0)
    err(1, "sync_file_range");

  if (unlink(name) != 0)
    err(1, "unlink");

  /* Uncommenting the ftruncate call makes the problem go away. */
  //  if (ftruncate(fd, 0) != 0)
  //    err(1, "ftruncate");

  close(fd);
}

int main(int argc, char **argv)
{
  if (argc != 2) {
    printf("Usage: hammer_and_delete FILENAME\n");
    return 1;
  }

  while (true) {
    hammer(argv[1]);
    write(1, ".", 1);
  }
}

Re: [PATCH] [V2] MAINTAINERS: Add dts files for r8 series to SHMOBILE

2014-01-20 Thread Simon Horman

Hi Ben,

thanks for your work on this.

On Mon, Jan 20, 2014 at 04:10:32PM +, Ben Dooks wrote:
> Add a number of files to the list of files covered by SHMOBILE
> so any changes to these can be reported with get_maintainer.pl
> for the current SHMOBILE architectures.

I'm fine with you only addressing r8 and r7 SoCs, I am happy
to make an incremental patch to cover the stragglers (sh7*, emev2).
And I would be even happier if this patch didn't suffer excessive
bike-shedding.  But I think there are some minor inconsistencies in this
patch.

Firstly, I think the subject should probably mention r7 and defconfigs.

> 
> Signed-off-by: Ben Dooks 
> ---
> v2:
>   - add defconfigs and r7 configurations
>   - fix path to dt-bindings
> 
> Cc: Joe Perches 
> Cc: Greg Kroah-Hartman 
> Cc: Andrew Morton 
> Cc: Magnus Damm 
> Cc: Simon Horman 
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@vger.kernel.org
> ---
>  MAINTAINERS | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6c20792..f74d830 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1239,7 +1239,14 @@ Q:	http://patchwork.kernel.org/project/linux-sh/list/
>  T:   git git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas.git next
>  S:   Supported
>  F:   arch/arm/mach-shmobile/
> +F:   arch/arm/boot/dts/r8*
> +F:   arch/arm/boot/dts/r7*
> +F:   arch/arm/configs/bockw_defconfig
> +F:   arch/arm/configs/genmai_defconfig
> +F:   arch/arm/configs/lager_defconfig
> +F:   arch/arm/configs/marzen_defconfig

I believe you should add the following as they are for boards
that use SoCs whose name matches r8*.

arch/arm/configs/armadillo800eva_defconfig
arch/arm/configs/ape6evm_defconfig
arch/arm/configs/koelsch_defconfig

>  F:   drivers/sh/
> +F:   include/dt-bindings/clock/r8a*
>  
>  ARM/SOCFPGA ARCHITECTURE
>  M:   Dinh Nguyen 
> -- 
> 1.8.5.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sh" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

[PATCH] Add HID's to hid-microsoft driver of Surface Type/Touch Cover 2 to fix bug

2014-01-20 Thread Reyad Attiyat

The patch below fixes bug 64811
(https://bugzilla.kernel.org/show_bug.cgi?id=64811), in which the
Microsoft Surface Type/Touch Cover 2 devices are detected as multitouch
devices.
The fix adds the HID device IDs of the two devices to the hid-microsoft
driver. This ensures that hid-input will eventually be used for the
devices and not hid-multitouch.

From 866c814f3f6740a5a79858fdf8bf5bbcdc3b57f8 Mon Sep 17 00:00:00 2001
From: Reyad Attiyat 
Date: Mon, 20 Jan 2014 16:24:49 -0600
Subject: [PATCH 1/2] Added in ID's for Surface Type/Touch cover 2 to the
 hid-microsoft driver

---
 drivers/hid/hid-ids.h   | 4 +++-
 drivers/hid/hid-microsoft.c | 4 
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index f9304cb..b523a8b 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -611,7 +611,9 @@
 #define USB_DEVICE_ID_MS_PRESENTER_8K_USB	0x0713
 #define USB_DEVICE_ID_MS_DIGITAL_MEDIA_3K	0x0730
 #define USB_DEVICE_ID_MS_COMFORT_MOUSE_4500	0x076c
-
+#define USB_DEVICE_ID_MS_TOUCH_COVER_2 0x07a7
+#define USB_DEVICE_ID_MS_TYPE_COVER_2  0x07a9
+
 #define USB_VENDOR_ID_MOJO		0x8282
 #define USB_DEVICE_ID_RETRO_ADAPTER	0x3201

diff --git a/drivers/hid/hid-microsoft.c b/drivers/hid/hid-microsoft.c
index 551795b..2599de8 100644
--- a/drivers/hid/hid-microsoft.c
+++ b/drivers/hid/hid-microsoft.c
@@ -207,6 +207,10 @@ static const struct hid_device_id ms_devices[] = {
 		.driver_data = MS_NOGET },
 	{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_COMFORT_MOUSE_4500),
 		.driver_data = MS_DUPLICATE_USAGES },
+	{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TYPE_COVER_2),
+		.driver_data = 0 },
+	{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TOUCH_COVER_2),
+		.driver_data = 0 },
 
 	{ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PRESENTER_8K_BT),
 		.driver_data = MS_PRESENTER },
-- 
1.8.4.2
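
As a rough illustration of what adding entries to a driver's ID table
means, here is a toy userspace model of a vendor/product table lookup.
It models only the table match, not how hid-core actually chooses
between hid-microsoft, hid-input and hid-multitouch.  struct hid_id,
ms_ids and hid_match() are hypothetical names; the product IDs are the
ones from the patch above, and 0x045e is Microsoft's USB vendor ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct hid_device_id. */
struct hid_id {
	uint16_t vendor;
	uint16_t product;
	unsigned long driver_data;
};

#define VENDOR_ID_MICROSOFT	0x045e
#define ARRAY_LEN(a)		(sizeof(a) / sizeof((a)[0]))

/* Toy ID table holding only the two devices added by the patch. */
static const struct hid_id ms_ids[] = {
	{ VENDOR_ID_MICROSOFT, 0x07a7, 0 },	/* Touch Cover 2 */
	{ VENDOR_ID_MICROSOFT, 0x07a9, 0 },	/* Type Cover 2 */
};

/* Return the matching table entry, or NULL if the device is not listed. */
static const struct hid_id *hid_match(const struct hid_id *table, size_t n,
				      uint16_t vendor, uint16_t product)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].vendor == vendor && table[i].product == product)
			return &table[i];
	return NULL;
}
```

A device whose vendor/product pair is found in the table would be
claimed by that driver in this model; one that is not found would be
left for other drivers to match.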

Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe

2014-01-20 Thread Russell King - ARM Linux

On Mon, Jan 20, 2014 at 04:26:23PM -0800, Greg KH wrote:
> On Tue, Jan 21, 2014 at 12:07:06AM +, Russell King - ARM Linux wrote:
> > On Mon, Jan 20, 2014 at 03:51:28PM -0800, Greg KH wrote:
> > > On Mon, Jan 20, 2014 at 11:16:03PM +, Russell King - ARM Linux wrote:
> > > > I don't believe the driver model has any locking to prevent a driver's
> > > > ->probe function running concurrently with its ->remove function for
> > > > two (or more) devices.
> > > 
> > > The bus prevents this from happening.
> > > 
> > > > The locking against this is done on a per-device basis, not a per-driver
> > > > basis.
> > > 
> > > No, on a per-bus basis.
> > 
> > I don't see it.
> > 
> > Let's start from driver_register().
> 
> Which happens from module probing, which is single-threaded, right?

Yes, to _some_ extent - the driver is added to the bus list of drivers
before existing drivers are probed, so it's always worth bearing in
mind that if a new device comes along, it's possible for that device
to be offered to even a driver which hasn't finished returning from
its module_init().

> > If you think there's a per-driver lock that's held over probes or removes,
> > please point it out.  I'm fairly certain that there isn't, because we have
> > to be able to deal with recursive probes (yes, we've had to deal with
> > those in the past.)
> 
> Hm, you are right, I think that's why we had to remove the locks.  The
> klist stuff handles us getting the needed locks for managing our
> internal lists of devices and drivers, and those should be fine.
> 
> So, let's go back to your original worry, what are you concerned about?
> A device being removed while probe() is called?

My concern is that we're turning something which should be simple into
something unnecessarily complex.  By that, I mean something along the
lines of:

static DEFINE_MUTEX(foo_mutex);
static unsigned foo_devices;

static int foo_probe(struct platform_device *pdev)
{
	int ret;

	mutex_lock(&foo_mutex);
	if (foo_devices++ == 0)
		uart_register_driver();

	ret = foo_really_probe_device(pdev);
	if (ret) {
		if (--foo_devices == 0)
			uart_unregister_driver();
	}
	mutex_unlock(&foo_mutex);

	return ret;
}

static int foo_remove(struct platform_device *pdev)
{
	mutex_lock(&foo_mutex);
	foo_really_remove(pdev);
	if (--foo_devices == 0)
		uart_unregister_driver();
	mutex_unlock(&foo_mutex);

	return 0;
}

in every single serial driver we have...  Wouldn't it just be better to
fix the major/minor number problem rather than have to add all that code
repetitively to all those drivers?

-- 
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up.  Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

Re: [PATCH 3.13-rc5] module: Add missing newline in printk call.

2014-01-20 Thread Rusty Russell

Tetsuo Handa  writes:
> Rusty, would you pick up this patch?
>
> This message was added in 3.13-rc1. Thus, should be fixed in 3.13.

Thanks, applied.  It's a bit trivial for a CC:stable though.

Cheers,
Rusty.

> Tetsuo Handa wrote:
>> From cc90e27d5cda227e7a0cbeb5de3cc1cbb1595dfa Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa 
>> Date: Mon, 23 Dec 2013 15:52:42 +0900
>> Subject: [PATCH] module: Add missing newline in printk call.
>> 
>> Add missing \n and also follow commit bddb12b3 "kernel/module.c: use pr_foo()".
>> 
>> Signed-off-by: Tetsuo Handa 
>> ---
>>  kernel/module.c |6 ++
>>  1 files changed, 2 insertions(+), 4 deletions(-)
>> 
>> diff --git a/kernel/module.c b/kernel/module.c
>> index f5a3b1e..d24fcf2 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -815,10 +815,8 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>>  		return -EFAULT;
>>  	name[MODULE_NAME_LEN-1] = '\0';
>>  
>> -	if (!(flags & O_NONBLOCK)) {
>> -		printk(KERN_WARNING
>> -		       "waiting module removal not supported: please upgrade");
>> -	}
>> +	if (!(flags & O_NONBLOCK))
>> +		pr_warn("waiting module removal not supported: please upgrade\n");
>>  
>>  	if (mutex_lock_interruptible(&module_mutex) != 0)
>>  		return -EINTR;
>> -- 
>> 1.7.1
>> 

Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe

2014-01-20 Thread Greg KH

On Tue, Jan 21, 2014 at 12:07:06AM +, Russell King - ARM Linux wrote:
> On Mon, Jan 20, 2014 at 03:51:28PM -0800, Greg KH wrote:
> > On Mon, Jan 20, 2014 at 11:16:03PM +, Russell King - ARM Linux wrote:
> > > I don't believe the driver model has any locking to prevent a driver's
> > > ->probe function running concurrently with its ->remove function for
> > > two (or more) devices.
> > 
> > The bus prevents this from happening.
> > 
> > > The locking against this is done on a per-device basis, not a per-driver
> > > basis.
> > 
> > No, on a per-bus basis.
> 
> I don't see it.
> 
> Let's start from driver_register().

Which happens from module probing, which is single-threaded, right?

Or from module_init callbacks, which is single-threaded.

Normally, busses never add devices (which is what drivers bind to),
except in a single-at-a-time fashion, unless they really know what they
are doing (i.e. PCI had multi-threaded device probing for a while, don't
remember if it still does...)


> This takes no locks and calls bus_add_driver().
> This also takes no locks and calls driver_attach().
> This walks the list of devices calling __driver_attach() for each.
> __driver_attach() tries to match the device against the driver,
> locks the parent device if one exists, and the device which is about
> to be probed.  It then calls driver_probe_device().
> driver_probe_device() inserts a runtime barrier and calls really_probe().
> really_probe() ultimately calls either the bus ->probe method or the
> driver ->probe method.
> 
> At no point in that sequence do I see anything which does any locking
> on a per-driver basis.  Let's look at device_add().
> 
> device_add() calls bus_probe_device(), which then calls device_attach().
> device_attach() takes the device's lock, and walks the list of drivers
> calling __device_attach() on each driver.  This then calls down into
> driver_probe_device(), and the path is the same as the above.
> 
> I don't see any per-driver locking here either.
> 
> I've checked the klist stuff, don't see anything there.  Ditto for
> bus_for_each_drv().
> 
> If you think there's a per-driver lock that's held over probes or removes,
> please point it out.  I'm fairly certain that there isn't, because we have
> to be able to deal with recursive probes (yes, we've had to deal with
> those in the past.)

Hm, you are right, I think that's why we had to remove the locks.  The
klist stuff handles us getting the needed locks for managing our
internal lists of devices and drivers, and those should be fine.

So, let's go back to your original worry, what are you concerned about?
A device being removed while probe() is called?

thanks,

greg k-h

Re: [PATCH net-next v5 8/9] xen-netback: Timeout packets in RX path

2014-01-20 Thread Zoltan Kiss


On 20/01/14 22:03, Wei Liu wrote:

On Mon, Jan 20, 2014 at 09:24:28PM +, Zoltan Kiss wrote:

@@ -557,12 +577,25 @@ void xenvif_disconnect(struct xenvif *vif)
  void xenvif_free(struct xenvif *vif)
  {
int i, unmap_timeout = 0;
+   /* Here we want to avoid timeout messages if an skb can be legitimatly
+* stucked somewhere else. Realisticly this could be an another vif's
+* internal or QDisc queue. That another vif also has this
+* rx_drain_timeout_msecs timeout, but the timer only ditches the
+* internal queue. After that, the QDisc queue can put in worst case
+* XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS skbs into that another vif's
+* internal queue, so we need several rounds of such timeouts until we
+* can be sure that no another vif should have skb's from us. We are
+* not sending more skb's, so newly stucked packets are not interesting
+* for us here.
+*/

You beat me to this. Was about to reply to your other email. :-)

It's also worth mentioning that the DIV_ROUND_UP part is merely an
estimation, as you cannot possibly know the maximum / minimum queue
length of all other vifs (as they can be changed during runtime). In
practice most users will stick with the default, but some advanced users
might want to tune this value for an individual vif (whether that's a
good idea or not is another topic).

So, in order to convince myself this is safe, I also did some analysis
on the impact of having a queue length other than the default value.  If
queue_len < XENVIF_QUEUE_LENGTH, that means you can queue fewer packets
in qdisc than default and drain it faster than calculated, which is
safe. On the other hand, if queue_len > XENVIF_QUEUE_LENGTH, it means
you actually need more time than calculated. I'm in two minds here. The
default value seems sensible to me but I'm still a bit worried about the
queue_len > XENVIF_QUEUE_LENGTH case.

An idea is to book-keep the maximum tx queue len among all vifs and use
that to calculate the worst-case scenario.

I don't think it should be that perfect. This is just a best effort
estimation; if someone changes the vif queue length and sees this message
because of that, nothing very drastic will happen. It is just a rate
limited warning message. Well, it is marked as error, because it is a
serious condition.
And also, the odds of seeing this message unnecessarily are quite low.
With default settings (256 slots, max 17 per skb, 32 queue length, 10
secs queue drain timeout) this delay is 20 seconds. You can raise the
queue length to 64 before getting a warning (see netif_napi_add), so it
would go up to 40 seconds, but anyway, if your vif is sitting on a
packet for more than 20 seconds, you deserve this message :)


Zoli

Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe

2014-01-20 Thread Russell King - ARM Linux

On Mon, Jan 20, 2014 at 11:47:34PM +, Alan Cox wrote:
> But yes I agree about the idiom, but a definite NAK to any attempts to
> plaster over this grand screwup by crapping in the tty core. Your turd,
> deal with it locally in the ARM code if you can't apply common sense and
> just go dynamic.

I believe at the time there was no one maintaining the device list to
_do_ that allocation - AMBA PL011 came along in 2005 after (I believe)
hpa stopped looking after that list.

So, please tell me how a number could be allocated properly without the
device numbers list being maintained?

I've no problem with going dynamic, and I suggest that you get a sense
of perspective rather than just spouting rubbish from on high.

-- 
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up.  Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

Re: [RFC PATCH 2/3] zram: introduce zram compressor operations struct

2014-01-20 Thread Minchan Kim

On Mon, Jan 20, 2014 at 01:03:48PM +0300, Sergey Senozhatsky wrote:
> On (01/20/14 14:12), Minchan Kim wrote:
> > Hello Sergey,
> > 
> > I reviewed this patchset and I suggest some things.
> > Please have a look and give me feedback. :)
> > 
> > 1. Let's define new file zram_comp.c
> > 2. zram_comp includes following field
> >.create
> >.compress
> >.decompress
> >.destroy
> >.name
> > 
> 
> alternatively, we can use crypto api, the same way as zswap does (that
> will require handling of cpu hotplug).
> 
>   -ss

I really doubt there is a benefit from the crypto API for zram.
Maybe that's because I'm not familiar with it, so let me ask some
silly questions.

1. What's the runtime overhead of using such a frontend?

   As you know, zram is an in-memory block device, so I don't want to
   add unnecessary overhead.

2. What's the memory footprint of using such a frontend?

   As you know, zram is very popular on small-memory embedded devices,
   so I don't want to consume more runtime memory and static memory
   because of CONFIG_CRYPTO and friends.

3. Is it flexible enough to allocate/handle multiple compressor buffers
   for our purpose? zswap and zcache have used it with a per-cpu
   buffer, but that would be a problem for write scalability if we use
   zlib, which takes a long time to compress.
   When I read the code, it looks like we could allocate multiple
   buffers by calling crypto_alloc_comp several times, but that would
   bring back problems 1) and 2).

So, what's the attractive point of using crypto?
One thing I can imagine is that it could let zram use a H/W compressor,
but I haven't heard of one, so if we don't have any special reason,
I'd like to go with the raw compressor so we can get a *base* number.
Then, if we really need the crypto API, we can change it easily and
benchmark. Finally, we would get a comparison number in the future and
it would make the decision easy.

Thanks.

-- 
Kind regards,
Minchan Kim
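
To make the backend interface from the quoted review concrete (.create,
.compress, .decompress, .destroy, .name), here is a small userspace
sketch of such an operations struct.  Only the field names come from
the thread; the function signatures, the "store" backend and every
identifier below are assumptions made for illustration, not the
eventual zram code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compressor operations; signatures are assumptions. */
struct zram_comp {
	const char *name;
	void *(*create)(void);		/* allocate backend private data */
	void (*destroy)(void *priv);	/* free backend private data */
	int (*compress)(void *priv, const unsigned char *src, size_t src_len,
			unsigned char *dst, size_t *dst_len);
	int (*decompress)(void *priv, const unsigned char *src, size_t src_len,
			  unsigned char *dst, size_t *dst_len);
};

/* Trivial "store" backend: copies data unchanged, for illustration only. */
static void *store_create(void)
{
	return malloc(1);	/* placeholder for a real working buffer */
}

static void store_destroy(void *priv)
{
	free(priv);
}

/* On entry *dst_len is the capacity of dst; on success, the bytes written. */
static int store_copy(void *priv, const unsigned char *src, size_t src_len,
		      unsigned char *dst, size_t *dst_len)
{
	(void)priv;
	if (*dst_len < src_len)
		return -1;	/* destination too small */
	memcpy(dst, src, src_len);
	*dst_len = src_len;
	return 0;
}

static const struct zram_comp zram_comp_store = {
	.name		= "store",
	.create		= store_create,
	.destroy	= store_destroy,
	.compress	= store_copy,
	.decompress	= store_copy,
};
```

A caller would pick a backend by name, call .create once per working
context, and go through .compress/.decompress without knowing which
algorithm is behind them, which is the indirection a raw compressor
backend and a crypto API backend would both have to fit into.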

Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe

2014-01-20 Thread Russell King - ARM Linux

On Mon, Jan 20, 2014 at 03:51:28PM -0800, Greg KH wrote:
> On Mon, Jan 20, 2014 at 11:16:03PM +, Russell King - ARM Linux wrote:
> > I don't believe the driver model has any locking to prevent a driver's
> > ->probe function running concurrently with its ->remove function for
> > two (or more) devices.
> 
> The bus prevents this from happening.
> 
> > The locking against this is done on a per-device basis, not a per-driver
> > basis.
> 
> No, on a per-bus basis.

I don't see it.

Let's start from driver_register().
This takes no locks and calls bus_add_driver().
This also takes no locks and calls driver_attach().
This walks the list of devices calling __driver_attach() for each.
__driver_attach() tries to match the device against the driver,
locks the parent device if one exists, and the device which is about
to be probed.  It then calls driver_probe_device().
driver_probe_device() inserts a runtime barrier and calls really_probe().
really_probe() ultimately calls either the bus ->probe method or the
driver ->probe method.

At no point in that sequence do I see anything which does any locking
on a per-driver basis.  Let's look at device_add().

device_add() calls bus_probe_device(), which then calls device_attach().
device_attach() takes the device's lock, and walks the list of drivers
calling __device_attach() on each driver.  This then calls down into
driver_probe_device(), and the path is the same as the above.

I don't see any per-driver locking here either.

I've checked the klist stuff, don't see anything there.  Ditto for
bus_for_each_drv().

If you think there's a per-driver lock that's held over probes or removes,
please point it out.  I'm fairly certain that there isn't, because we have
to be able to deal with recursive probes (yes, we've had to deal with
those in the past.)

-- 
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up.  Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

linux-next: manual merge of the imx-mxs tree with the arm tree

2014-01-20 Thread Stephen Rothwell

Hi Shawn,

Today's linux-next merge of the imx-mxs tree got conflicts in
arch/arm/boot/dts/Makefile, arch/arm/boot/dts/imx6dl-hummingboard.dts,
arch/arm/boot/dts/imx6qdl-microsom-ar8035.dtsi and
arch/arm/boot/dts/imx6qdl-microsom.dtsi between commits 728d5599f5d8
("ARM: imx: initial SolidRun HummingBoard support") and d79c363fd9cd
("ARM: imx: initial SolidRun Cubox-i support") from the arm tree and a
series of commits from the imx-mxs tree.

Russell told me that the imx-mxs tree changes have not yet been approved
for v3.13 inclusion and they conflict fairly badly, so for today I have
dropped the imx-mxs tree.  Let me know what I should do in the future.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: Deadlock in do_page_fault() on ARM (old kernel)

2014-01-20 Thread Alan Ott


On 01/17/2014 08:20 PM, Russell King - ARM Linux wrote:

On Fri, Jan 17, 2014 at 07:57:16PM -0500, Alan Ott wrote:

On 01/17/2014 08:46 AM, Russell King - ARM Linux wrote:

My suspicion therefore is that some other thread must have died while
holding the mmap_sem, so there's probably a kernel oops earlier...
that's my best guess at the moment without seeing the full backtrace.

There's no oops that I'm able to see.

Each of the tasks which lockdep reports as "holding" mmap_sem are
blocking for it. If some other task had taken it and then crashed, I
assume lockdep would list the crashed task as also holding the resource
in the printout.

My point is this:

- the five (or six) threads which are trying to take the mmap_sem in
   read-mode in the fault handler are all blocked on it - they haven't
   taken the lock, which will only happen because there's a pending writer.
- of these in your original post, there are two which faulted from
   __copy_to_user_std().  __copy_to_user_std() doesn't take the mmap_sem -
   this is the non-uaccess-with-memcpy path.
- the pending writers are the two threads in sys_mmap_pgoff(), both of
   which are blocked waiting to gain the write lock.
- there are no *other* threads holding the mmap_sem lock.


Yes, all true. I don't remember why I started looking at the memcpy() case.


So... there's a question here how we got into this state - and frankly
I don't know.  What I do see from your latest dump is that there's two
unknown modules there - something called rcu2m and another called
buttoms, and there are two threads inside ioctls there.  Both have
faulted from the function at 0xc0d2a394 (which won't appear in the
backtrace, but is most likely __copy_to_user_std.)


Yes, there are a handful of out-of-tree modules.


So, in the absence of you saying anything about there being any preceding
oopses, my conclusion now is that one of those modules is taking the
mmap_sem itself, and is the culprit inducing this deadlock.


Yes, I came to that as well. I had checked for the presence of mmap_sem 
in the sources of the out-of-tree modules and didn't see it. However, 
upon closer inspection, my grep-fu failed me as there were some backward 
symlinks I didn't account for. TI's cmemk module _is_ taking out 
mmap_sem. I wish I had seen this days ago. That's my new investigation path.



Note that your dump ([2]) in your reply was just the hung task detector
printing out the stacktrace for a few tasks, not the full all-threads
stack dump which I was expecting.


Yes, in a misguided attempt to keep the SNR high, I didn't include the 
full dump, but only what I thought was the interesting part. I did 
another capture and the full dump is at [1] .



So I'm pulling out these conclusions from the very little information
you're supplying.


I appreciate it. Thank you for taking the time to reply.

Alan.

[1] 
http://www.signal11.us/~alan/stack_dump_all_tasks_with_frame_pointers.txt



Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe

2014-01-20 Thread Greg Kroah-Hartman

On Mon, Jan 20, 2014 at 11:35:41PM +0000, Alan Cox wrote:
> > The first bit is easy... but we need to add locks to every serial
> > driver to prevent two probes operating concurrently...
> 
> The bus probe should already be serializing surely ?

Yes, it better be, otherwise that bus is badly broken.

greg k-h

Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe

2014-01-20 Thread Greg KH

On Mon, Jan 20, 2014 at 11:16:03PM +0000, Russell King - ARM Linux wrote:
> On Mon, Jan 20, 2014 at 03:11:41PM -0800, Greg KH wrote:
> > On Mon, Jan 20, 2014 at 09:32:06PM +0000, Russell King - ARM Linux wrote:
> > > On Mon, Jan 20, 2014 at 01:16:01PM -0800, Greg KH wrote:
> > > > On Mon, Jan 20, 2014 at 10:05:30AM +0000, Russell King - ARM Linux wrote:
> > > > > On Mon, Jan 20, 2014 at 02:32:34PM +0530, Tushar Behera wrote:
> > > > > > uart_register_driver call binds the driver to a specific device
> > > > > > node through tty_register_driver call. This should typically happen
> > > > > > during device probe call.
> > > > > > 
> > > > > > In a multiplatform scenario, it is possible that multiple serial
> > > > > > drivers are part of the kernel. Currently the driver registration 
> > > > > > fails
> > > > > > if multiple serial drivers with same default major/minor numbers are
> > > > > > included in the kernel.
> > > > > > 
> > > > > > A typical case is observed with amba-pl011 and samsung-uart drivers.
> > > > > 
> > > > > The samsung-uart driver is at fault here - the major/minor numbers 
> > > > > were
> > > > > officially registered to amba-pl011.  Samsung needs to be fixed 
> > > > > properly.
> > > > 
> > > > I agree, the Samsung driver is "broken" here, but that's no reason why
> > > > these two drivers can't register with the tty layer _after_ the hardware
> > > > is detected, and not before.
> > > > 
> > > > That saves resources on systems that build the drivers in, yet do not
> > > > have the hardware present, which is always a good thing.
> > > 
> > > Great, so what you're saying is that we need to wait until the first
> > > device calls into the probe function.  What about removal... how does
> > > a driver know when its last device has been removed to de-register
> > > that?
> > 
> > The "bus" that the device is on handles that, right?
> > 
> > > I guess it needs the driver model to provide some way to know when a
> > > driver is completely unbound - but isn't that racy?
> > 
> > How is it racy?  That's how the driver model works...
> 
> Think about what happens when the last device unregisters, but a new
> device comes along and is probed.
> 
> I don't believe the driver model has any locking to prevent a driver's
> ->probe function running concurrently with its ->remove function for
> two (or more) devices.

The bus prevents this from happening.

> The locking against this is done on a per-device basis, not a per-driver
> basis.

No, on a per-bus basis.

thanks,

greg k-h

Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe

2014-01-20 Thread Alan Cox

On Mon, 20 Jan 2014 23:14:57 +0000
Mark Brown  wrote:

> On Mon, Jan 20, 2014 at 09:43:05PM +0000, Alan Cox wrote:
> 
> > The dynamic major/minor is the right patch. If the userspace breaks then
> > the userspace was broken, but I see no evidence in the discussion that
> > the userspace broke.
> 
> The userspace breakage is that if someone has a static /dev that doesn't
> handle any dynamic devices then renumbering the device will cause that
> static /dev to stop matching the kernel.

Diddums and you've only provided theoretical cases not real world ones.

They should have followed proper practice and reserved their minors. The
device number belongs to the Altix. The driver should just move.

> > That's what the list says. Samsung should have followed the rules, they
> > didn't so they get to pick up the pieces. The Amba driver wants moving as
> > well. It's easy. If you want something to be ABI then make sure you get
> > it upstream first, if not you get to own all the pain down the line.
> 
> This stuff is all upstream already, a quick check suggests both drivers
> predate git - it's been noticed because the ARM multiplatform work has
> caused people to try booting kernels with both built in.

So ARM people didn't follow the policy on allocating device minors even
within their own community and got burned by it. That's despite having
previously been burned by abusing the ttyS0 8250 major/minor in the same
way, for the same purposes, and creating the same mess.

{facepalm...}

> > If the hardware isn't present then the driver shouldn't even register
> > with the tty layer in the first place so it doesn't make any resource
> > differences either for properly written code.
> 
> Right, that's not the idiom that has been followed by any of serial
> drivers though so needs fixing too.

Actually some drivers do get this right but not many. ehv-bc for example
does.

But yes I agree about the idiom, but a definite NAK to any attempts to
plaster over this grand screwup by crapping in the tty core. Your turd,
deal with it locally in the ARM code if you can't apply common sense and
just go dynamic.

And please, after screwing this up twice - *learn* from the mess.

Alan

[PATCH] vfs: Remove second variable named error in __dentry_path

2014-01-20 Thread Eric W. Biederman


In commit  232d2d60aa5469bb097f55728f65146bd49c1d25
Author: Waiman Long 
Date:   Mon Sep 9 12:18:13 2013 -0400

dcache: Translating dentry into pathname without taking rename_lock

The __dentry_path locking was changed and the variable error was
intended to be moved outside of the loop.  Unfortunately the inner
declaration of error was not removed, resulting in a version of
__dentry_path that will never return an error.

Remove the problematic inner declaration of error and allow
__dentry_path to return errors once again.

Cc: sta...@vger.kernel.org
Cc: Waiman Long 
Signed-off-by: "Eric W. Biederman" 
---
 fs/dcache.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index cb4a10690868..fdbe23027810 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3135,7 +3135,6 @@ restart:
 	read_seqbegin_or_lock(&rename_lock, &seq);
 	while (!IS_ROOT(dentry)) {
 		struct dentry *parent = dentry->d_parent;
-		int error;
 
 		prefetch(parent);
 		error = prepend_name(&end, &len, &dentry->d_name);
-- 
1.7.5.4
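
The effect of the stray inner declaration can be reproduced outside the kernel. The following is a minimal userspace sketch, not the kernel code: demo_prepend(), walk_shadowed() and walk_fixed() are hypothetical names invented for illustration, and the "buffer" is reduced to a remaining-bytes counter.

```c
/* Hypothetical helper: fails with -1 when a component does not fit. */
static int demo_prepend(int *remaining, int need)
{
	if (need > *remaining)
		return -1;
	*remaining -= need;
	return 0;
}

/* Buggy shape: the inner "error" hides the outer one. */
static int walk_shadowed(int remaining, const int *needs, int n)
{
	int error = 0;
	int i;

	for (i = 0; i < n; i++) {
		int error;	/* shadows the outer variable */

		error = demo_prepend(&remaining, needs[i]);
		if (error)
			break;
	}
	return error;	/* outer variable, never assigned in the loop */
}

/* Fixed shape: a single "error" is visible both inside and after the loop. */
static int walk_fixed(int remaining, const int *needs, int n)
{
	int error = 0;
	int i;

	for (i = 0; i < n; i++) {
		error = demo_prepend(&remaining, needs[i]);
		if (error)
			break;
	}
	return error;
}
```

With 4 bytes available and components needing 2 and 5 bytes, the shadowed version reports success while the fixed version reports the failure. Building with -Wshadow would probably have flagged the inner declaration.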


Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe

2014-01-20 Thread Alan Cox

> The first bit is easy... but we need to add locks to every serial
> driver to prevent two probes operating concurrently...

The bus probe should already be serializing surely ?

 

Re: [PATCH v7 6/6] MCS Lock: add Kconfig entries to allow arch-specific hooks

2014-01-20 Thread Tim Chen

On Mon, 2014-01-20 at 13:30 +0100, Peter Zijlstra wrote:

> 
> Then again, people seem to whinge if you don't keep these Kbuild files
> sorted, but manually sorting 29 files is just not something I like to
> do.
> 

Peter,

Can you clarify what exactly needs to be sorted?  The Kbuild files
spit out by git diff appear to be sorted already.

Tim

> ---
>  arch/alpha/include/asm/Kbuild  |  1 +
>  arch/arc/include/asm/Kbuild|  1 +
>  arch/arm/include/asm/Kbuild|  1 +
>  arch/arm64/include/asm/Kbuild  |  1 +
>  arch/avr32/include/asm/Kbuild  |  1 +
>  arch/blackfin/include/asm/Kbuild   |  1 +
>  arch/c6x/include/asm/Kbuild|  1 +
>  arch/cris/include/asm/Kbuild   |  1 +
>  arch/frv/include/asm/Kbuild|  1 +
>  arch/hexagon/include/asm/Kbuild|  1 +
>  arch/ia64/include/asm/Kbuild   |  2 +-
>  arch/m32r/include/asm/Kbuild   |  1 +
>  arch/m68k/include/asm/Kbuild   |  1 +
>  arch/metag/include/asm/Kbuild  |  1 +
>  arch/microblaze/include/asm/Kbuild |  1 +
>  arch/mips/include/asm/Kbuild   |  1 +
>  arch/mn10300/include/asm/Kbuild|  1 +
>  arch/openrisc/include/asm/Kbuild   |  1 +
>  arch/parisc/include/asm/Kbuild |  1 +
>  arch/powerpc/include/asm/Kbuild|  2 +-
>  arch/s390/include/asm/Kbuild   |  1 +
>  arch/score/include/asm/Kbuild  |  1 +
>  arch/sh/include/asm/Kbuild |  1 +
>  arch/sparc/include/asm/Kbuild  |  1 +
>  arch/tile/include/asm/Kbuild   |  1 +
>  arch/um/include/asm/Kbuild |  1 +
>  arch/unicore32/include/asm/Kbuild  |  1 +
>  arch/x86/include/asm/Kbuild|  1 +
>  arch/xtensa/include/asm/Kbuild |  1 +
>  include/asm-generic/mcs_spinlock.h | 13 +
>  30 files changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
> index f01fb505ad52..14cbbbcec01f 100644
> --- a/arch/alpha/include/asm/Kbuild
> +++ b/arch/alpha/include/asm/Kbuild
> @@ -4,3 +4,4 @@ generic-y += clkdev.h
>  generic-y += exec.h
>  generic-y += trace_clock.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
> index 9ae21c198007..c0773a5c2ca7 100644
> --- a/arch/arc/include/asm/Kbuild
> +++ b/arch/arc/include/asm/Kbuild
> @@ -48,3 +48,4 @@ generic-y += user.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
> index c38b58c80202..c68cfdde8783 100644
> --- a/arch/arm/include/asm/Kbuild
> +++ b/arch/arm/include/asm/Kbuild
> @@ -34,3 +34,4 @@ generic-y += timex.h
>  generic-y += trace_clock.h
>  generic-y += unaligned.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
> index 519f89f5b6a3..24a3c10cdf38 100644
> --- a/arch/arm64/include/asm/Kbuild
> +++ b/arch/arm64/include/asm/Kbuild
> @@ -51,3 +51,4 @@ generic-y += user.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild
> index 658001b52400..466e13d06bd3 100644
> --- a/arch/avr32/include/asm/Kbuild
> +++ b/arch/avr32/include/asm/Kbuild
> @@ -18,3 +18,4 @@ generic-y   += sections.h
>  generic-y   += topology.h
>  generic-y+= trace_clock.h
>  generic-y   += xor.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/blackfin/include/asm/Kbuild 
> b/arch/blackfin/include/asm/Kbuild
> index f2b43474b0e2..0bd1c5c688e3 100644
> --- a/arch/blackfin/include/asm/Kbuild
> +++ b/arch/blackfin/include/asm/Kbuild
> @@ -45,3 +45,4 @@ generic-y += unaligned.h
>  generic-y += user.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
> index fc0b3c356027..21d7100ddef9 100644
> --- a/arch/c6x/include/asm/Kbuild
> +++ b/arch/c6x/include/asm/Kbuild
> @@ -57,3 +57,4 @@ generic-y += user.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/cris/include/asm/Kbuild b/arch/cris/include/asm/Kbuild
> index 199b1a9dab89..c571cc12a4d2 100644
> --- a/arch/cris/include/asm/Kbuild
> +++ b/arch/cris/include/asm/Kbuild
> @@ -13,3 +13,4 @@ generic-y += trace_clock.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/frv/include/asm/Kbuild b/arch/frv/include/asm/Kbuild
> index 74742dc6a3da..ccca92eb782a 100644
> --- a/arch/frv/include/asm/Kbuild
> +++ b/arch/frv/include/asm/Kbuild
> @@ -3,3 +3,4 @@ generic-y += clkdev.h
>  generic-y += exec.h
>  generic-y += trace_clock.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
> index ada843c701ef..553077d0f50c 100644
> ---

Re: 3.12-rc5 and overwritten partition table - by powertop?

2014-01-20 Thread Stefan Agner

Am 2013-10-29 21:10, schrieb Jan Kara:
>> The first ~170kb of /dev/sda got blown away with what seems to be a logging
>> output by Powertop, when I was playing with the tuneables.
>> (Luckily the first partition starts later :-))
>   So did you log the output to some file? I'm just trying to understand how
> it could get onto your disk in the first place...
> 
>> Why is that I don't know, but maybe when turning on the SATA knobs
>> something goes wrong. I'm afraid to try again, but I accept rather higher
>> power use than data loss again :-/

I experienced the same on the very same hardware (Lenovo T440s). Like
John, I turned all those knobs in powertop, including the SATA ones.
Several times I ended up with a broken partition table. Once, even my EFI
System partition (first partition) was broken. However, since I use EFI
I was able to recover the partition table quite easily (gdisk asks for
recovery from the backup partition table, kudos to the designer of the GPT
format!).

This happens on Arch Linux with stock 3.12.7 as well as with the mainline
3.13 kernel. I use the latest T440s firmware (2.17).

Is it possible to disable that knob, or warn the user when it is used (at
least on the Lenovo T440s), to avoid leaving users with an unbootable system?

dmesg output:
[2.744398] ata1: SATA max UDMA/133 abar m2048@0xf063c000 port
0xf063c100 irq 59
[2.744400] ata2: DUMMY
[2.744401] ata3: DUMMY
[3.063804] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[3.064532] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES)
succeeded
[3.064536] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
LOCK) filtered out
[3.064538] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
filtered out
[3.064606] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES)
succeeded
[3.064926] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[3.064929] ata1.00: ATA-9: Samsung SSD 840 PRO Series, DXM05B0Q, max
UDMA/133
[3.064931] ata1.00: 500118192 sectors, multi 16: LBA48 NCQ (depth
31/32), AA
[3.065256] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES)
succeeded
[3.065259] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
LOCK) filtered out
[3.065261] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
filtered out
[3.065286] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES)
succeeded
[3.065545] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[3.065605] ata1.00: configured for UDMA/133
...
[  130.578789] ata1.00: exception Emask 0x0 SAct 0x7fff SErr 0x4
action 0x0
[  130.578794] ata1.00: irq_stat 0x4001
[  130.578796] ata1: SError: { CommWake }
[  130.578798] ata1.00: failed command: WRITE FPDMA QUEUED
[  130.578802] ata1.00: cmd 61/10:00:f0:29:05/00:00:00:00:00/40 tag 0
ncq 8192 out
[  130.578804] ata1.00: status: { DRDY ERR }
[  130.578806] ata1.00: error: { ABRT }
...
[  130.579011] ata1.00: failed command: WRITE FPDMA QUEUED
[  130.579014] ata1.00: cmd 61/10:f0:58:7c:0f/00:00:00:00:00/40 tag 30
ncq 8192 out
[  130.579016] ata1.00: status: { DRDY ERR }
[  130.579017] ata1.00: error: { ABRT }
[  130.579207] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[  130.579456] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[  130.579511] ata1.00: configured for UDMA/133
[  130.579583] ata1: EH complete

--
Stefan

Re: Poor network performance x86_64.. also with 3.13

2014-01-20 Thread Branimir Maksimovic


On 01/20/2014 11:37 PM, Borislav Petkov wrote:
> On Mon, Jan 20, 2014 at 11:27:25PM +0100, Daniel Exner wrote:
>> I just did the same procedure with Kernel Version 3.13: same poor rates.
>>
>> I think I will try to see if 3.12.6 was still ok and bisect from there.
>
> Or try something more coarse-grained like 3.11 first, then 3.12 and then
> the -rcs in between.

Hm, on my machine 3.13 (latest git) has double the throughput of 3.11 (distro
compiled) on the loopback interface. 68Gb vs 33Gb (iperf).




[PATCH] vfs: Is mounted should be testing mnt_ns for NULL or error.

2014-01-20 Thread Eric W. Biederman


A bug was introduced with the is_mounted helper function in
commit f7a99c5b7c8bd3d3f533c8b38274e33f3da9096e
Author: Al Viro 
Date:   Sat Jun 9 00:59:08 2012 -0400

get rid of ->mnt_longterm

it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()

Signed-off-by: Al Viro 

The intent was to test if the real_mount(vfsmount)->mnt_ns was
NULL_OR_ERR but the code is actually testing real_mount(vfsmount)
and always returning true.

The result is d_absolute_path returning paths it should be hiding.

Cc: sta...@vger.kernel.org
Signed-off-by: "Eric W. Biederman" 
---
 fs/mount.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index d64c594be6c4..a17458ca6f29 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -74,7 +74,7 @@ static inline int mnt_has_parent(struct mount *mnt)
 static inline int is_mounted(struct vfsmount *mnt)
 {
 	/* neither detached nor internal? */
-	return !IS_ERR_OR_NULL(real_mount(mnt));
+	return !IS_ERR_OR_NULL(real_mount(mnt)->mnt_ns);
 }
 
 extern struct mount *__lookup_mnt(struct vfsmount *, struct dentry *);
-- 
1.7.5.4
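
Why the old test always returned true can be shown in userspace. This is a minimal sketch under stated assumptions: the demo_* types and helpers are hypothetical stand-ins for struct mount, struct vfsmount, real_mount() and IS_ERR_OR_NULL(), and the 4095 limit mirrors the kernel's MAX_ERRNO convention for error pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a public part embedded in a private structure. */
struct demo_vfsmount {
	int flags;
};

struct demo_mount {
	void *mnt_ns;			/* NULL when detached */
	struct demo_vfsmount mnt;	/* embedded public part */
};

/* container_of-style conversion from the embedded member to its owner. */
static struct demo_mount *demo_real_mount(struct demo_vfsmount *m)
{
	return (struct demo_mount *)((char *)m - offsetof(struct demo_mount, mnt));
}

/* Mirrors the idea of IS_ERR_OR_NULL: NULL, or one of the top 4095 addresses. */
static int demo_is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-4095;
}

/* Buggy: tests the containing structure's address, which is a valid pointer. */
static int is_mounted_buggy(struct demo_vfsmount *m)
{
	return !demo_is_err_or_null(demo_real_mount(m));
}

/* Fixed: tests the namespace pointer stored inside the structure. */
static int is_mounted_fixed(struct demo_vfsmount *m)
{
	return !demo_is_err_or_null(demo_real_mount(m)->mnt_ns);
}
```

For a structure whose mnt_ns is NULL or an error-encoded pointer, is_mounted_buggy() still returns 1, because the address computed from a valid embedded member is itself valid; is_mounted_fixed() returns 0.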


Re: [PATCH] NFSv4.1: new layout stateid can not be overwrite by one out of date

2014-01-20 Thread Trond Myklebust

On Mon, 2014-01-20 at 16:15 +0800, shaobingqing wrote:
> If initiate_file_draining returned NFS4ERR_DELAY, all the lsegs of
> a file might be released before the retried cb_layout request arrives
> at the client. In this situation, the layoutget request for the file will
> use the open stateid to obtain a new layout stateid. And if the retried
> cb_layout request arrives at the client after the layoutget reply,
> the new layout stateid would be overwritten by an out-of-date one.
> 
> Signed-off-by: shaobingqing 
> ---
>  fs/nfs/callback_proc.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index ae2e87b..98fed13 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -174,7 +174,9 @@ static u32 initiate_file_draining(struct nfs_client *clp,
>   rv = NFS4ERR_DELAY;
>   else
>   rv = NFS4ERR_NOMATCHING_LAYOUT;
> - pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
> + if (memcmp(args->cbl_stateid.other, lo->plh_stateid.other,
> +  NFS4_STATEID_OTHER_SIZE) == 0)
> + pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);

Well...  We shouldn't really be calling
pnfs_mark_matching_lsegs_invalid() either in this case...

>   spin_unlock(&ino->i_lock);
>   pnfs_free_lseg_list(&free_me_list);
>   pnfs_put_layout_hdr(lo);


-- 
Trond Myklebust
Linux NFS client maintainer


Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe

2014-01-20 Thread Russell King - ARM Linux

On Mon, Jan 20, 2014 at 11:14:57PM +0000, Mark Brown wrote:
> On Mon, Jan 20, 2014 at 09:43:05PM +0000, Alan Cox wrote:
> > If the hardware isn't present then the driver shouldn't even register
> > with the tty layer in the first place so it doesn't make any resource
> > differences either for properly written code.
> 
> Right, that's not the idiom that has been followed by any of serial
> drivers though so needs fixing too.

It's not followed by serial drivers because it gets f*scking complicated
to do it that way.

In order to do it that way, what we need to do is:

1. On the first device probe, register the UART driver.
2. On subsequent device probes, don't register the UART driver because
   it's already registered.
3. When devices are removed, do nothing until the last device.
4. When the last device is removed, unregister the UART driver.

The first bit is easy... but we need to add locks to every serial
driver to prevent two probes operating concurrently...
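
The four steps above can be sketched in userspace as a mutex-protected device count. This is only an illustration under assumptions: fake_uart_register() and fake_uart_unregister() are hypothetical stand-ins for uart_register_driver() and uart_unregister_driver(), demo_probe() and demo_remove() stand in for a driver's ->probe and ->remove, and a pthread mutex replaces whatever lock a real driver would use.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the UART core registration calls. */
static int registered;		/* 1 while the "UART driver" is registered */
static int register_calls;	/* how often registration really ran */

static int fake_uart_register(void)
{
	registered = 1;
	register_calls++;
	return 0;
}

static void fake_uart_unregister(void)
{
	registered = 0;
}

static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int bound_devices;	/* devices currently probed */

/* Steps 1 and 2: register on the first probe only. */
static int demo_probe(void)
{
	int ret = 0;

	pthread_mutex_lock(&reg_lock);
	if (bound_devices == 0)
		ret = fake_uart_register();
	if (ret == 0)
		bound_devices++;
	pthread_mutex_unlock(&reg_lock);
	return ret;
}

/* Steps 3 and 4: unregister only when the last device goes away. */
static void demo_remove(void)
{
	pthread_mutex_lock(&reg_lock);
	if (bound_devices > 0 && --bound_devices == 0)
		fake_uart_unregister();
	pthread_mutex_unlock(&reg_lock);
}
```

The lock is what keeps a probe from observing a count of zero while another probe or a remove is part-way through. Whether the bus already provides that serialization is the open question in this thread.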

-- 
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up.  Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

Re: [PATCH 1/3] ACPI / idle: Move idle_boot_override out of the arch directory

2014-01-20 Thread Rafael J. Wysocki

On Monday, January 20, 2014 10:08:41 PM Hanjun Guo wrote:
> On 2014-01-18 21:47, Rafael J. Wysocki wrote:
> > On Saturday, January 18, 2014 11:52:18 AM Hanjun Guo wrote:
> >> On 2014-1-18 11:45, Hanjun Guo wrote:
> >>> On 2014-1-17 20:06, Sudeep Holla wrote:
>  On 17/01/14 02:03, Hanjun Guo wrote:
> > Move idle_boot_override out of the arch directory to be a single enum
> > including both platforms values, this will make it rather easier to
> > avoid ifdefs around which definitions are for which processor in
> > generally used ACPI code.
> >
> > IDLE_FORCE_MWAIT for IA64 is not used anywhere, so remove it.
> >
> > No functional change in this patch.
> >
> > Suggested-by: Alan 
> > Signed-off-by: Hanjun Guo 
> > ---
> >> [...]
> > diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> > index 03e235ad..e324561 100644
> > --- a/include/linux/cpu.h
> > +++ b/include/linux/cpu.h
> > @@ -220,6 +220,14 @@ void cpu_idle(void);
> >   
> >   void cpu_idle_poll_ctrl(bool enable);
> >   
> > +enum idle_boot_override {
> > +   IDLE_NO_OVERRIDE = 0,
> > +   IDLE_HALT,
> > +   IDLE_NOMWAIT,
> > +   IDLE_POLL,
> > +   IDLE_POWERSAVE_OFF
> > +};
> > +
>  I do understand the idea behind this change, but IMO HALT and MWAIT are 
>  x86
>  specific and may not make sense for other architectures.
> >>> yes, this is the strange part, the value is arch-dependent.
> >>>
>  It will also require every architecture using ACPI to export
>  boot_option_idle_override which may not be really required.
> >>> so, how about forget this patch and move boot_option_idle_override
> >>> related code into arch directory such as arch/x86/acpi/boot.c for
> >>> x86?
> >> The general idea is that we can move all the arch-dependent codes
> >> in ACPI driver to arch directory, then make codes in drivers/acpi/
> >> arch independent.
> > Well, MWAIT is arch-dependent, so I'm not sure how IDLE_NOMWAIT fits into
> > include/linux/cpu.h?
> 
> So you are not happy with this patch and I should find another solution?

No, I'm not happy with it.

If you want to move that to an arch-agnostic header, the symbol names cannot
be arch-dependent any more.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [patch 9/9] mm: keep page cache radix tree nodes in check

2014-01-20 Thread Johannes Weiner

On Fri, Jan 17, 2014 at 11:05:17AM +1100, Dave Chinner wrote:
> On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote:
> > Previously, page cache radix tree nodes were freed after reclaim
> > emptied out their page pointers.  But now reclaim stores shadow
> > entries in their place, which are only reclaimed when the inodes
> > themselves are reclaimed.  This is problematic for bigger files that
> > are still in use after they have a significant amount of their cache
> > reclaimed, without any of those pages actually refaulting.  The shadow
> > entries will just sit there and waste memory.  In the worst case, the
> > shadow entries will accumulate until the machine runs out of memory.
> > 
> > To get this under control, the VM will track radix tree nodes
> > exclusively containing shadow entries on a per-NUMA node list.
> > Per-NUMA rather than global because we expect the radix tree nodes
> > themselves to be allocated node-locally and we want to reduce
> > cross-node references of otherwise independent cache workloads.  A
> > simple shrinker will then reclaim these nodes on memory pressure.
> > 
> > A few things need to be stored in the radix tree node to implement the
> > shadow node LRU and allow tree deletions coming from the list:
> 
> Just a couple of things with the list_lru interfaces.
> 
> 
> > @@ -123,9 +129,39 @@ static void page_cache_tree_delete(struct 
> > address_space *mapping,
> >  * same time and miss a shadow entry.
> >  */
> > smp_wmb();
> > -   } else
> > -   radix_tree_delete(&mapping->page_tree, page->index);
> > +   }
> > mapping->nrpages--;
> > +
> > +   if (!node) {
> > +   /* Clear direct pointer tags in root node */
> > +   mapping->page_tree.gfp_mask &= __GFP_BITS_MASK;
> > +   radix_tree_replace_slot(slot, shadow);
> > +   return;
> > +   }
> > +
> > +   /* Clear tree tags for the removed page */
> > +   index = page->index;
> > +   offset = index & RADIX_TREE_MAP_MASK;
> > +   for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
> > +   if (test_bit(offset, node->tags[tag]))
> > +   radix_tree_tag_clear(&mapping->page_tree, index, tag);
> > +   }
> > +
> > +   /* Delete page, swap shadow entry */
> > +   radix_tree_replace_slot(slot, shadow);
> > +   node->count--;
> > +   if (shadow)
> > +   node->count += 1U << RADIX_TREE_COUNT_SHIFT;
> > +   else
> > +   if (__radix_tree_delete_node(&mapping->page_tree, node))
> > +   return;
> > +
> > +   /* Only shadow entries in there, keep track of this node */
> > +   if (!(node->count & RADIX_TREE_COUNT_MASK) &&
> > +   list_empty(&node->private_list)) {
> > +   node->private_data = mapping;
> > +   list_lru_add(&workingset_shadow_nodes, &node->private_list);
> > +   }
> 
> You can't do this list_empty(&node->private_list) check safely
> externally to the list_lru code - the only time that entry can be
> checked safely is under the LRU list locks. This is the reason that
> list_lru_add/list_lru_del return a boolean to indicate if the object
> was added/removed from the list - they do this list_empty() check
> internally. i.e. the correct, safe way to conditionally update
> state iff the object was added to the LRU is:
> 
>   if (!(node->count & RADIX_TREE_COUNT_MASK)) {
>   if (list_lru_add(&workingset_shadow_nodes, &node->private_list))
>   node->private_data = mapping;
>   }
> 
> > +   radix_tree_replace_slot(slot, page);
> > +   mapping->nrpages++;
> > +   if (node) {
> > +   node->count++;
> > +   /* Installed page, can't be shadow-only anymore */
> > +   if (!list_empty(&node->private_list))
> > +   list_lru_del(&workingset_shadow_nodes,
> > +&node->private_list);
> > +   }
> 
> Same issue here:
> 
>   if (node) {
>   node->count++;
>   list_lru_del(&workingset_shadow_nodes, &node->private_list);
>   }

All modifications to node->private_list happen under
mapping->tree_lock, and modifications of a neighboring link should not
affect the outcome of the list_empty(), so I don't think the lru lock
is necessary.

It would be cleaner to take it of course, but that would mean adding
an unconditional NUMAnode-wide lock to every page cache population.
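
For reference, the pattern Dave describes, where the add and delete helpers perform the list_empty() check themselves under the list lock and report whether they changed anything, can be sketched in userspace as follows. This is an illustration only: demo_list, demo_lru and the demo_* functions are hypothetical names, a pthread mutex stands in for the per-node LRU lock, and nothing here is the actual list_lru implementation.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *l)
{
	l->next = l->prev = l;
}

static bool demo_list_empty(const struct demo_list *l)
{
	return l->next == l;
}

/* Hypothetical LRU: one lock protecting one list and its item count. */
struct demo_lru {
	pthread_mutex_t lock;
	struct demo_list head;
	long nr_items;
};

static void demo_lru_init(struct demo_lru *lru)
{
	pthread_mutex_init(&lru->lock, NULL);
	demo_list_init(&lru->head);
	lru->nr_items = 0;
}

/* Link the item only if it is not on a list yet; report what happened. */
static bool demo_lru_add(struct demo_lru *lru, struct demo_list *item)
{
	bool added = false;

	pthread_mutex_lock(&lru->lock);
	if (demo_list_empty(item)) {
		item->prev = lru->head.prev;
		item->next = &lru->head;
		lru->head.prev->next = item;
		lru->head.prev = item;
		lru->nr_items++;
		added = true;
	}
	pthread_mutex_unlock(&lru->lock);
	return added;
}

/* Unlink the item only if it is on a list; report what happened. */
static bool demo_lru_del(struct demo_lru *lru, struct demo_list *item)
{
	bool removed = false;

	pthread_mutex_lock(&lru->lock);
	if (!demo_list_empty(item)) {
		item->prev->next = item->next;
		item->next->prev = item->prev;
		demo_list_init(item);
		lru->nr_items--;
		removed = true;
	}
	pthread_mutex_unlock(&lru->lock);
	return removed;
}
```

A caller can then make its own state change conditional on the return value instead of testing list_empty() itself. The cost is the one raised above: the lock is taken on every call, including calls that end up changing nothing.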

> >  static int __add_to_page_cache_locked(struct page *page,
> > diff --git a/mm/list_lru.c b/mm/list_lru.c
> > index 72f9decb0104..47a9faf4070b 100644
> > --- a/mm/list_lru.c
> > +++ b/mm/list_lru.c
> > @@ -88,10 +88,18 @@ restart:
> > ret = isolate(item, &nlru->lock, cb_arg);
> > switch (ret) {
> > case LRU_REMOVED:
> > +   case LRU_REMOVED_RETRY:
> > if (--nlru->nr_items == 0)
> > node_clear(nid, lru->active_nodes);
> > WARN_ON_ONCE(nlru->nr_items < 0);
> > isolated++;
> > +   /*
> > +* If the lru lock has been dropped, our list
> >

Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff

2014-01-20 Thread Trond Myklebust

On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
> In the current code, only one struct rpc_rqst is preallocated. If one
> callback request is received in two sk_buffs, xprt_alloc_bc_request
> would be executed twice with the same transport->xid. The first time,
> xprt_alloc_bc_request will allocate one struct rpc_rqst and the TCP_RCV_COPY_DATA
> bit of transport->tcp_flags will not be cleared. The second time,
> xprt_alloc_bc_request cannot allocate a struct rpc_rqst any more and a NULL
> pointer will be returned, then xprt_force_disconnect occurs. I think one
> callback request should be allowed to be received in two sk_buffs.
> 
> Signed-off-by: shaobingqing 
> ---
>  net/sunrpc/xprtsock.c |   11 +--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index ee03d35..606950d 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt 
> *xprt,
>   struct sock_xprt *transport =
>   container_of(xprt, struct sock_xprt, xprt);
>   struct rpc_rqst *req;
> + static struct rpc_rqst *req_partial;
> +
> + if (req_partial == NULL)
> + req = xprt_alloc_bc_request(xprt);
> + else if (req_partial->rq_xid == transport->tcp_xid)
> + req = req_partial;

What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
req will be undefined. Either way, you cannot use a static variable for
storage here: that isn't re-entrant.

> - req = xprt_alloc_bc_request(xprt);
>   if (req == NULL) {
>   printk(KERN_WARNING "Callback slot table overflowed\n");
>   xprt_force_disconnect(xprt);
> @@ -1285,6 +1290,7 @@ static inline int xs_tcp_read_callback(struct rpc_xprt 
> *xprt,
>  
>   if (!(transport->tcp_flags & TCP_RCV_COPY_DATA)) {
>   struct svc_serv *bc_serv = xprt->bc_serv;
> + req_partial = NULL;
>  
>   /*
>* Add callback request to callback list.  The callback
> @@ -1297,7 +1303,8 @@ static inline int xs_tcp_read_callback(struct rpc_xprt 
> *xprt,
>   list_add(&req->rq_bc_list, &bc_serv->sv_cb_list);
>   spin_unlock(&bc_serv->sv_cb_lock);
>   wake_up(&bc_serv->sv_cb_waitq);
> - }
> + } else
> + req_partial = req;
>  
>   req->rq_private_buf.len = transport->tcp_copied;
>  


-- 
Trond Myklebust
Linux NFS client maintainer

