VDSO support for 32bit time functions
Hi Stefani, About a year ago you posted a big patch to implement VDSO support for 32bit functions, and the response was a request to clean it up a bit by breaking up the generic bits into a series to make it easier to review / apply. The patch I'm referring to can be found here: http://thread.gmane.org/gmane.linux.kernel/1411713 Did that ever happen? If not, any specific reason why? Do you have a newer version somewhere for "modern" kernel versions? If you're not interested in this anymore, mind if I take it up based on your last version? I'm getting some complaints that this type of thing would really be good as 32bit gettimeofday() on 64bit kernels is really slow (65 nanoseconds on 32bit vs. 17 nanoseconds on 64bit on a high-end i7 processor.) thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] phy: Add new Exynos5 USB 3.0 PHY driver
On Mon, Jan 20, 2014 at 7:12 PM, Vivek Gautam wrote:
> Add a new driver for the USB 3.0 PHY on Exynos5 series of SoCs.
> The new driver uses the generic PHY framework and will interact
> with DWC3 controller present on Exynos5 series of SoCs.
> Thereby, removing old phy-samsung-usb3 driver and related code
> used until now which was based on usb/phy framework.
>
> Signed-off-by: Vivek Gautam

Sorry, forgot to add a Reviewed-by tag from Felipe. :-(
Will update in the next version of the patch after getting feedback on this patch.

> ---
>
> Changes from v2:
> 1) Added support for multiple PHYs (UTMI+ and PIPE3) and
>    related changes in the driver structuring.
> 2) Added a xlate function to get the required phy out of
>    number of PHYs in multiple PHY scenario.
> 3) Changed the names of few structures and variables to
>    have a clearer meaning.
> 4) Added 'usb3phy_config' structure to take care of multiple
>    phys for a SoC having 'exynos5_usb3phy_drv_data' driver data.
> 5) Not deleting support for old driver 'phy-samsung-usb3' until
>    required support for generic phy is added to DWC3.
> > .../devicetree/bindings/phy/samsung-phy.txt| 49 ++ > drivers/phy/Kconfig|8 + > drivers/phy/Makefile |1 + > drivers/phy/phy-exynos5-usb3.c | 621 > > 4 files changed, 679 insertions(+) > create mode 100644 drivers/phy/phy-exynos5-usb3.c > > diff --git a/Documentation/devicetree/bindings/phy/samsung-phy.txt > b/Documentation/devicetree/bindings/phy/samsung-phy.txt > index c0fccaa..57079f8 100644 > --- a/Documentation/devicetree/bindings/phy/samsung-phy.txt > +++ b/Documentation/devicetree/bindings/phy/samsung-phy.txt > @@ -20,3 +20,52 @@ Required properties: > - compatible : should be "samsung,exynos5250-dp-video-phy"; > - reg : offset and length of the Display Port PHY register set; > - #phy-cells : from the generic PHY bindings, must be 0; > + > +Samsung Exynos5 SoC series USB 3.0 PHY controller > +-- > + > +Required properties: > +- compatible : Should be set to one of the following supported values: > + - "samsung,exynos5250-usb3phy" - for exynos5250 SoC, > + - "samsung,exynos5420-usb3phy" - for exynos5420 SoC. > +- reg : Register offset and length of USB 3.0 PHY register set; > +- clocks: Clock IDs array as required by the controller > +- clock-names: names of clocks correseponding to IDs in the clock property; > + Required clocks: > + - phy: main PHY clock (same as USB 3.0 controller IP clock), > + used for register access. > + - usb3phy_refclk: PHY's reference clock (usually crystal clock), > + associated by phy name, used to determine bit values for > + clock settings register. > + Additional clock required for Exynos5420: > + - usb30_sclk_100m: Additional special clock used for PHY operation > + depicted as 'sclk_usbphy30' in CMU of Exynos5420. > +- samsung,syscon-phandle: phandle for syscon interface, which is used to > + control pmu registers for power isolation. 
> +- #phy-cells : from the generic PHY bindings, must be 1; > + > +For "samsung,exynos5250-usb3phy" and "samsung,exynos5420-usb3phy" compatible > +PHYs, the second cell in the PHY specifier identifies the PHY id, which is > +interpreted as follows: > + 0 - UTMI+ type phy, > + 1 - PIPE3 type phy, > + > +Example: > + usb3_phy: usbphy@1210 { > + compatible = "samsung,exynos5250-usb3phy"; > + reg = <0x1210 0x100>; > + clocks = < 286>, < 1>; > + clock-names = "phy", "usb3phy_refclk"; > + samsung,syscon-phandle = <_syscon>; > + #phy-cells = <1>; > + }; > + > +- aliases: For SoCs like Exynos5420 having multiple USB PHY controllers, > + 'usb3_phy' nodes should have numbered alias in the aliases node, > + in the form of usb3phyN, N = 0, 1... (depending on number of > + controllers). > +Example: > + aliases { > + usb3phy0 = _phy0; > + usb3phy1 = _phy1; > + }; > diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig > index 330ef2d..32f9f38 100644 > --- a/drivers/phy/Kconfig > +++ b/drivers/phy/Kconfig > @@ -51,4 +51,12 @@ config PHY_EXYNOS_DP_VIDEO > help > Support for Display Port PHY found on Samsung EXYNOS SoCs. > > +config PHY_EXYNOS5_USB3 > + tristate "Exynos5 SoC series USB 3.0 PHY driver" > + depends on ARCH_EXYNOS5 > + select GENERIC_PHY > + select MFD_SYSCON > + help > + Enable USB 3.0 PHY support for Exynos 5 SoC series > + > endmenu > diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile > index d0caae9..9c06a61 100644 > --- a/drivers/phy/Makefile > +++ b/drivers/phy/Makefile > @@ -7,3 +7,4 @@
RE: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support
> -Original Message- > From: iommu-boun...@lists.linux-foundation.org [mailto:iommu- > boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson > Sent: Saturday, January 18, 2014 2:06 AM > To: Sethi Varun-B16395 > Cc: io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org > Subject: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support > > RFC: This is not complete but I want to share with Varun the > dirrection I'm thinking about. In particular, I'm really not > sure if we want to introduce a "v2" interface version with > slightly different unmap semantics. QEMU doesn't care about > the difference, but other users might. Be warned, I'm not even > sure if this code works at the moment. Thanks, > > Alex > > > We currently have a problem that we cannot support advanced features > of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee > that those features will be supported by all of the hardware units > involved with the domain over its lifetime. For instance, the Intel > VT-d architecture does not require that all DRHDs support snoop > control. If we create a domain based on a device behind a DRHD that > does support snoop control and enable SNP support via the IOMMU_CACHE > mapping option, we cannot then add a device behind a DRHD which does > not support snoop control or we'll get reserved bit faults from the > SNP bit in the pagetables. To add to the complexity, we can't know > the properties of a domain until a device is attached. > > We could pass this problem off to userspace and require that a > separate vfio container be used, but we don't know how to handle page > accounting in that case. How do we know that a page pinned in one > container is the same page as a different container and avoid double > billing the user for the page. > > The solution is therefore to support multiple IOMMU domains per > container. In the majority of cases, only one domain will be required > since hardware is typically consistent within a system. 
However, this > provides us the ability to validate compatibility of domains and > support mixed environments where page table flags can be different > between domains. > > To do this, our DMA tracking needs to change. We currently try to > coalesce user mappings into as few tracking entries as possible. The > problem then becomes that we lose granularity of user mappings. We've > never guaranteed that a user is able to unmap at a finer granularity > than the original mapping, but we must honor the granularity of the > original mapping. This coalescing code is therefore removed, allowing > only unmaps covering complete maps. The change in accounting is > fairly small here, a typical QEMU VM will start out with roughly a > dozen entries, so it's arguable if this coalescing was ever needed. > > We also move IOMMU domain creation to the point where a group is > attached to the container. An interesting side-effect of this is that > we now have access to the device at the time of domain creation and > can probe the devices within the group to determine the bus_type. > This finally makes vfio_iommu_type1 completely device/bus agnostic. > In fact, each IOMMU domain can host devices on different buses managed > by different physical IOMMUs, and present a single DMA mapping > interface to the user. When a new domain is created, mappings are > replayed to bring the IOMMU pagetables up to the state of the current > container. And of course, DMA mapping and unmapping automatically > traverse all of the configured IOMMU domains. 
> > Signed-off-by: Alex Williamson > --- > > drivers/vfio/vfio_iommu_type1.c | 631 > --- > include/uapi/linux/vfio.h |1 > 2 files changed, 329 insertions(+), 303 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 4fb7a8f..983aae5 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -30,7 +30,6 @@ > #include > #include > #include > -#include/* pci_bus_type */ > #include > #include > #include > @@ -55,11 +54,18 @@ MODULE_PARM_DESC(disable_hugepages, >"Disable VFIO IOMMU support for IOMMU hugepages."); > > struct vfio_iommu { > - struct iommu_domain *domain; > + struct list_headdomain_list; > struct mutexlock; > struct rb_root dma_list; > + bool v2; > +}; > + > +struct vfio_domain { > + struct iommu_domain *domain; > + struct bus_type *bus; > + struct list_headnext; > struct list_headgroup_list; > - boolcache; > + int prot; /* IOMMU_CACHE */ > }; > > struct vfio_dma { > @@ -99,7 +105,7 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu > *iommu, > return NULL; > } > > -static void vfio_insert_dma(struct vfio_iommu *iommu, struct
Re: [PATCH 2/2] sched: add statistic for rq->max_idle_balance_cost
On Mon, Jan 20, 2014 at 9:33 PM, Alex Shi wrote:
> It's useful to track this value in debug mode.
>
> Signed-off-by: Alex Shi
> ---
> kernel/sched/debug.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 1e43e70..f5c529a 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -315,6 +315,7 @@ do { \
> P(sched_goidle);
> #ifdef CONFIG_SMP
> P64(avg_idle);
> + p64(max_idle_balance_cost);

Hi Alex,

Does this need to be P64(max_idle_balance_cost)?
Re: [PATCH] Add HID's to hid-microsoft driver of Surface Type/Touch Cover 2 to fix bug
On Tue, Jan 21, 2014 at 12:47 AM, Jiri Kosina wrote: > On Mon, 20 Jan 2014, Reyad Attiyat wrote: > >> The below patch fixes a bug 64811 >> (https://bugzilla.kernel.org/show_bug.cgi?id=64811) of the Microsoft >> Surface Type/Touch cover 2 devices being detected as a multitouch >> device. >> The fix adds the HID of the two devices to hid-microsoft driver. This >> ensures that hid-input will eventually be used for the device and not >> hid-multitouch. > > Hi, > > your patch is missing hid_have_special_driver[] entry, therefore correct > driver binding is not guaranteed. > > -- > Jiri Kosina > SUSE Labs > Hi, Thanks for reminding me of hid_have_special_driver[]. I noticed that this device has the HID_DG_CONTACTID and in the comment of the hid_have_sepcial_driver[] * Please note that for multitouch devices (driven by hid-multitouch driver), * there is a proper autodetection and autoloading in place (based on presence * of HID_DG_CONTACTID), so those devices don't need to be added to this list, * as we are doing the right thing in hid_scan_usage(). This device should not be driven by hid-multitouch as it does not handle keyboard/mouse input devices. I submitted a new patch below with it added. I believe it should still be part of this array, in case this kind of implementation is fixed/updated. >From 291742873dcf181faf9657b41279487f31302c73 Mon Sep 17 00:00:00 2001 From: Reyad Attiyat Date: Tue, 21 Jan 2014 01:22:25 -0600 Subject: [PATCH 1/1] Added in HID's for Microsoft Surface Type/Touch cover 2. 
This is to fix bug 64811 where this device is detected as a multitouch device --- drivers/hid/hid-core.c | 3 +++ drivers/hid/hid-ids.h | 4 +++- drivers/hid/hid-microsoft.c | 4 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index 253fe23..88eb4a6 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -1778,6 +1778,9 @@ static const struct hid_device_id hid_have_special_driver[] = { { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PRESENTER_8K_USB) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_DIGITAL_MEDIA_3K) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_WIRELESS_OPTICAL_DESKTOP_3_0) }, +{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TYPE_COVER_2) }, +{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TOUCH_COVER_2) }, + { HID_USB_DEVICE(USB_VENDOR_ID_MONTEREY, USB_DEVICE_ID_GENIUS_KB29E) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN) }, { HID_USB_DEVICE(USB_VENDOR_ID_NTRIG, USB_DEVICE_ID_NTRIG_TOUCH_SCREEN_1) }, diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index f9304cb..b523a8b 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -611,7 +611,9 @@ #define USB_DEVICE_ID_MS_PRESENTER_8K_USB0x0713 #define USB_DEVICE_ID_MS_DIGITAL_MEDIA_3K0x0730 #define USB_DEVICE_ID_MS_COMFORT_MOUSE_45000x076c - +#define USB_DEVICE_ID_MS_TOUCH_COVER_2 0x07a7 +#define USB_DEVICE_ID_MS_TYPE_COVER_2 0x07a9 + #define USB_VENDOR_ID_MOJO0x8282 #define USB_DEVICE_ID_RETRO_ADAPTER0x3201 diff --git a/drivers/hid/hid-microsoft.c b/drivers/hid/hid-microsoft.c index 551795b..2599de8 100644 --- a/drivers/hid/hid-microsoft.c +++ b/drivers/hid/hid-microsoft.c @@ -207,6 +207,10 @@ static const struct hid_device_id ms_devices[] = { .driver_data = MS_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_COMFORT_MOUSE_4500), .driver_data = MS_DUPLICATE_USAGES }, +{ 
HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TYPE_COVER_2), +.driver_data = 0 }, +{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TOUCH_COVER_2), +.driver_data = 0 }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PRESENTER_8K_BT), .driver_data = MS_PRESENTER }, -- 1.8.4.2
Re: [PATCH 1/2] vfs: Add fchmodat4 syscall: fchmodat with flag argument
On 01/13/2012 02:53 AM, Andrew Ayer wrote:

> This adds a 4 argument version of fchmodat (fchmodat4) that supports a flag
> argument, as specified by POSIX. It supports the same two flags as fchownat:
> AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH.

I don't think it's possible to emulate AT_EMPTY_PATH in user space, so I wonder if this could be applied, and if not, why.

Thanks.

--
Florian Weimer / Red Hat Product Security Team
RE: [PATCH] cpufreq: Align all CPUs to the same frequency if using shared clock
Thanks for reviewing.

Sorry for the misunderstanding: on our x86 platform, we want all the CPUs to share one policy by setting CPUFREQ_SHARED_TYPE_ALL, not to share one HW clock line. If the CPUs run at different frequencies at the init stage, then while the CPUs are registering there is no policy that aligns them all to the same frequency, which conflicts with the CPUs' current P-states.

For example:
CPU0: P0
CPU1: P14
CPU2: P14
CPU3: P14

While all the CPUs are registering, the kernel assumes that they all run at P0 because they share CPU0's policy, but the other three CPUs are really running at P14. Their frequencies are only aligned once the governor changes CPU0's state based on the shared policy.

Best Regards
Li zhuangzhi
Android System Integration Shanghai
Tel: +86 (0)21 6116 4323

-----Original Message-----
From: Viresh Kumar [mailto:viresh.ku...@linaro.org]
Sent: Tuesday, January 21, 2014 2:35 PM
To: Li, Zhuangzhi
Cc: Rafael J. Wysocki; cpuf...@vger.kernel.org; linux...@vger.kernel.org; Linux Kernel Mailing List; Liu, Chuansheng
Subject: Re: [PATCH] cpufreq: Align all CPUs to the same frequency if using shared clock

On 21 January 2014 08:35, lizhuangzhi wrote:
> Some SMP systems want to make all the possible CPUs share the clock,
> if the CPUs init frequencies aren't the same, we need to align all the
> CPUs to the same frequency while CPUs registering to avoid mismatched
> CPU's P-states.
>
> Signed-off-by: lizhuangzhi
> ---
> drivers/cpufreq/cpufreq.c |2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 8d19f7c..d00abb5 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -991,6 +991,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
> * CPU because it is in the same boat.
*/
> policy = cpufreq_cpu_get(cpu);
> if (unlikely(policy)) {
> + /* according present policy to align all the cpus frequencies */
> + cpufreq_driver->target(policy, policy->cur,
> + CPUFREQ_RELATION_H);

I don't really understand why is this required? CPUs sharing clocks means that CPUs runs on the same clock line and so all of them must be running on same frequency. So, why do we need to call this routine? policy->cur must be the current freq here for CPU in question.

--
viresh
[patch] x86, AMD, NB: silence an underflow test
This is under CAP_SYS_ADMIN but Smatch complains that mask comes from the user and the test for "mask > 0xf" can underflow.

The fix is simple.

Signed-off-by: Dan Carpenter

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index a54ee1d054d9..0c4e3e47d462 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -19,7 +19,7 @@ extern int amd_cache_northbridges(void);
 extern void amd_flush_garts(void);
 extern int amd_numa_init(void);
 extern int amd_get_subcaches(int);
-extern int amd_set_subcaches(int, int);
+extern int amd_set_subcaches(int, unsigned long);

 struct amd_l3_cache {
 	unsigned indices;
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 59554dca96ec..d9fceb697322 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -179,7 +179,7 @@ int amd_get_subcaches(int cpu)
 	return (mask >> (4 * cuid)) & 0xf;
 }

-int amd_set_subcaches(int cpu, int mask)
+int amd_set_subcaches(int cpu, unsigned long mask)
 {
 	static unsigned int reset, ban;
 	struct amd_northbridge *nb = node_to_amd_nb(amd_get_nb_id(cpu));
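The Smatch complaint in the patch above comes down to a signed comparison: with an `int` parameter, a negative value passes a lone `mask > 0xf` upper-bound test. A small stand-alone sketch of that effect follows. The helper names `mask_ok_signed` and `mask_ok_unsigned` are made up for this illustration and are not kernel functions; only the comparison itself is taken from the patch description.

```c
#include <stdbool.h>

/* Old prototype style: signed mask, upper-bound test only. */
static bool mask_ok_signed(int mask)
{
	/* A negative mask is not greater than 0xf, so it is accepted. */
	return !(mask > 0xf);
}

/* New prototype style: unsigned mask, same upper-bound test. */
static bool mask_ok_unsigned(unsigned long mask)
{
	/* What used to be negative is now a huge value and is rejected. */
	return !(mask > 0xf);
}
```

With these, `mask_ok_signed(-1)` is true while `mask_ok_unsigned((unsigned long)-1)` is false, which is why changing only the parameter type is enough and no extra lower-bound test is needed.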
Re: More GPIO madness on iMX6 - and the crappy ARM port of Linux
Hi, Alexandre Courbot wrote: > On Sat, Jan 18, 2014 at 7:43 AM, Linus Walleij > wrote: > > On Fri, Jan 17, 2014 at 9:53 PM, Russell King - ARM Linux > > wrote: > >> On Fri, Jan 17, 2014 at 01:42:44PM -0700, Stephen Warren wrote: > > [...] > > If the OPEN_DRAIN flag is set on that descriptor we should > > always be able to read the input. But as this is not really what the > > I2C core wants to know (it really would prefer not to bother with > > such GPIO flag details) so is it better if we add a special call to > > figure out if the input can be read? Like: > > > > bool gpiod_input_always_valid(const struct gpio_desc *desc); > > > > And leave it up to the core to look at flags, driver characteristics > > etc and determine whether the input can be trusted? > > I am personally a little bit scared by the number of exported > functions in the GPIO framework. It is a pretty large number for > something that is supposed to be simple, so I'd like to avoid adding > more. :) How about the following: > > 1) GPIOs configured as output without the open drain or open source > flag either return -EINVAL on gpiod_get_value(), or a cached value > tracked by gpiolib for consistency (probably the latter). > 2) GPIOs configured as open drain or open source report the actual > value read on the pin, like i2c-core needs. This requires that, for > each GPIO that can be set open drain or open source, > gpiod_input_always_valid() == true. > I would not bind this to the open drain configuration. Any GPIO output pin may actually be in a different state than programmed when the output is forcefully driven by another source (shortcut). So it makes sense to be able to read back the real state of the pad even for push pull outputs. 
Lothar Waßmann -- ___ Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10 Geschäftsführer: Matthias Kaussen Handelsregistereintrag: Amtsgericht Aachen, HRB 4996 www.karo-electronics.de | i...@karo-electronics.de ___
Re: LSF/MM 2014 Call For Proposals
On Fri, Dec 20, 2013 at 1:30 AM, Mel Gorman wrote: > The annual Linux Storage, Filesystem and Memory Management Summit for > 2014 will be held on March 24th and 25th before the Linux Foundation > Collaboration summit at The Meritage Resort, Napa Valley, CA. > > > http://events.linuxfoundation.org/events/linux-storage-filesystem-and-mm-summit > http://events.linuxfoundation.org/events/collaboration-summit Just a reminder for anyone who wants to participate in LSF/MM: If you haven't already done so, please send us your request and/or topic proposals by January 31st... > Note that we are running LSF/MM a little earlier in 2014 than in previous > years. > > On behalf of the committee I would like to issue a call for agenda proposals > that are suitable for cross-track discussion as well as more technical > subjects for discussion in the breakout sessions. > > 1) Suggestions for agenda topics should be sent before January 31st > 2014 to: > > lsf...@lists.linux-foundation.org > > and cc the Linux list or lists that are most interested in it: > > ATA: linux-...@vger.kernel.org > FS: linux-fsde...@vger.kernel.org > MM: linux...@kvack.org > SCSI: linux-s...@vger.kernel.org > > People who need more time for visa applications should send proposals before > January 15th. The committee will complete the first round of selections > on that date to accommodate applications. > > Please remember to tag your subject with [LSF/MM TOPIC] to make it > easier to track. Agenda topics and attendees will be selected by the > program committee, but the final agenda will be formed by consensus of > the attendees on the day. > > We will try to cap attendance at around 25-30 per track to facilitate > discussions although the final numbers will depend on the room sizes at > the venue. > > 2) Requests to attend the summit should be sent to: > > lsf...@lists.linux-foundation.org > > Please summarize what expertise you will bring to the meeting, and what > you would like to discuss. 
Please also tag your email with [LSF/MM ATTEND] > so there is less chance of it getting lost in the large mail pile. > > Presentations are allowed to guide discussion, but are strongly > discouraged. There will be no recording or audio bridge. However, we expect > that written minutes will be published as we did in previous years > > 2013: > http://lwn.net/Articles/548089/ > > 2012: > http://lwn.net/Articles/490114/ > http://lwn.net/Articles/490501/ > > 2011: > http://lwn.net/Articles/436871/ > http://lwn.net/Articles/437066/ > > 3) If you have feedback on last year's meeting that we can use to > improve this year's, please also send that to: > > lsf...@lists.linux-foundation.org > > Thank you on behalf of the program committee: > > Storage: > James Bottomley > Martin K. Petersen > > Filesystems: > Trond Myklebust > Jeff Layton > Dave Chinner > Jan Kara > Ted Ts'o > > MM: > Rik van Riel > Michel Lespinasse -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies.
Re: [PATCH V5 1/3] mm/nobootmem: Fix unused variable
On Mon, 20 Jan 2014 22:16:33 -0800 (PST), David Rientjes wrote:

> Not sure why you don't just do a one line patch:
>
> - phys_addr_t size;
> + phys_addr_t size __maybe_unused;
> to fix it.

Just because I did not know that __maybe_unused thing.

Discussion of this fix seems to be obsolete because Andrew already took the patch in the form he suggested: one #ifdef in the function with a basic block declaring size once inside.

Regards
Philipp
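For readers who, like Philipp, have not met `__maybe_unused` before: in the kernel it is a thin wrapper around the compiler's `unused` attribute, which silences the unused-variable warning when a variable is only referenced under some configurations. The user-space sketch below is an illustration under stated assumptions, not the nobootmem code: `DEMO_MAYBE_UNUSED`, `DEMO_USE_SIZE` and `demo_range_nonempty` are hypothetical names, and the macro is defined locally because the kernel header is not available outside the kernel tree.

```c
#include <assert.h>

/* Local stand-in for the kernel's __maybe_unused (GCC/Clang attribute). */
#define DEMO_MAYBE_UNUSED __attribute__((unused))

/* Define DEMO_USE_SIZE to compile in the branch that actually reads `size`. */
/* #define DEMO_USE_SIZE */

static int demo_range_nonempty(unsigned long start, unsigned long end)
{
	/* Without the attribute, this would warn when DEMO_USE_SIZE is unset. */
	unsigned long size DEMO_MAYBE_UNUSED;

	assert(end >= start);

#ifdef DEMO_USE_SIZE
	size = end - start;
	return size != 0;
#else
	return 0;
#endif
}
```

The alternative that was merged, declaring the variable inside the `#ifdef` block itself, avoids the attribute entirely at the cost of one more preprocessor conditional in the function body.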
Re: [RFC PATCH 1/2] sched/update_avg: avoid negative time
Hi, Alex

On 01/21/2014 01:33 PM, Alex Shi wrote:
> rq->avg_idle try to reflect the average idle time between the cpu idle
> and first wakeup. But in the function, it maybe get a negative value
> if old avg_idle is too small. Then this negative value will be double
> counted in next time calculation. Guess that is not the original purpose,
> so recalibrate it to zero.
>
> Signed-off-by: Alex Shi
> ---
> kernel/sched/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 30eb011..af9121c6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1358,6 +1358,9 @@ static void update_avg(u64 *avg, u64 sample)
> {
> s64 diff = sample - *avg;
> *avg += diff >> 3;
> +
> + if (*avg < 0)
> + *avg = 0;

It seems this can't happen: if 'diff' is negative, its absolute value won't be bigger than '*avg', not to mention that we only use 1/8 of it.

Regards,
Michael Wang

> }
> #endif
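Michael's argument can be checked in user space. The sketch below copies the two-line body of update_avg() from the quoted diff, with `uint64_t`/`int64_t` standing in for the kernel's u64/s64 (an assumption so it builds outside the kernel), and relies on the arithmetic right shift of negative values that GCC and Clang provide. The helper `decay_to_zero` is hypothetical and exists only to exercise the function. Note also that `*avg` is unsigned, so a test of the form `*avg < 0` could never be true in the first place.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel helper quoted above. */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	/* Move the average 1/8 of the way toward the new sample. */
	*avg += (uint64_t)(diff >> 3);
}

/*
 * Hypothetical helper: feed `rounds` zero-valued samples into an average
 * that starts at `start` and return where it ends up.
 */
static uint64_t decay_to_zero(uint64_t start, int rounds)
{
	uint64_t avg = start;

	while (rounds-- > 0) {
		uint64_t before = avg;

		update_avg(&avg, 0);
		/* The average only shrinks; it never wraps below zero. */
		assert(avg <= before);
	}
	return avg;
}
```

Starting from 100 and feeding zero samples, the average steps down 100, 87, 76, ... and settles at exactly 0, because each step subtracts at most ceil(avg / 8), which is never more than avg itself.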
Re: [PATCH] Add HID's to hid-microsoft driver of Surface Type/Touch Cover 2 to fix bug
On Mon, 20 Jan 2014, Reyad Attiyat wrote:

> The below patch fixes a bug 64811
> (https://bugzilla.kernel.org/show_bug.cgi?id=64811) of the Microsoft
> Surface Type/Touch cover 2 devices being detected as a multitouch
> device.
> The fix adds the HID of the two devices to hid-microsoft driver. This
> ensures that hid-input will eventually be used for the device and not
> hid-multitouch.

Hi,

your patch is missing hid_have_special_driver[] entry, therefore correct driver binding is not guaranteed.

--
Jiri Kosina
SUSE Labs
Re: [RFC PATCH V2] fs null_blk: Null pointer deference problem in alloc_page_buffers
On 01/20/2014 10:02 PM, Matias Bjorling wrote: On 01/20/2014 04:58 AM, Raghavendra K T wrote: If we load the null_blk module with bs=8k we get following oops: [ 3819.812190] BUG: unable to handle kernel NULL pointer dereference at 0008 [ 3819.812387] IP: [] create_empty_buffers+0x28/0xaf [ 3819.812527] PGD 219244067 PUD 215a06067 PMD 0 [ 3819.812640] Oops: [#1] SMP [ 3819.812772] Modules linked in: null_blk(+) Fix that by resetting block size to PAGE_SIZE if it is greater than PAGE_SIZE Also add sanity checks for block size > PAGE_SIZE. We should probably split the patch into two. Giving a better description to each of the changes. Agreed. [...] diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index a2e69d2..bcae726 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -622,6 +622,10 @@ static int __init null_init(void) irqmode = NULL_IRQ_NONE; } #endif + if (bs > PAGE_SIZE) { + pr_warn("Invalid block size. Setting it to %lu\n", PAGE_SIZE); + bs = PAGE_SIZE; + } Could the warning say something like: pr_warn("null_blk: invalid block size\n"); pr_warn("null_blk: defaults block size to \n"); then it follows the same patterns as the other errors. Agree. 
if (queue_mode == NULL_Q_MQ && use_per_node_hctx) { if (submit_queues < nr_online_nodes) { diff --git a/fs/block_dev.c b/fs/block_dev.c index 1e86823..2481d42 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1027,6 +1027,7 @@ void bd_set_size(struct block_device *bdev, loff_t size) break; bsize <<= 1; } + BUG_ON(bsize > PAGE_SIZE); bdev->bd_block_size = bsize; bdev->bd_inode->i_blkbits = blksize_bits(bsize); } diff --git a/fs/buffer.c b/fs/buffer.c index 6024877..8b7ada1 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1571,6 +1571,7 @@ void create_empty_buffers(struct page *page, struct buffer_head *bh, *head, *tail; head = alloc_page_buffers(page, blocksize, 1); + BUG_ON(!head); bh = head; do { bh->b_state |= b_state; For the check patch, the description could mention some text on why we hit this error case and why we check for it. Thanks for the suggestions. Will post V3 with the changes.
Re: BUG: spinlock lockup
Dear Will,

Thanks for your reply. We are using a Cortex-A15. Yes, this is with the ticket lock. We will check the value of arch_spinlock_t and share it. It is a bit difficult to reproduce this scenario. If you have any ideas, please suggest how to reproduce it.

Thanks

On Mon, Jan 20, 2014 at 3:50 PM, Will Deacon wrote:
> On Sat, Jan 18, 2014 at 07:25:51AM +, naveen yadav wrote:
>> We are using 3.8.x kernel on ARM, We are facing soft lockup issue.
>> Following are the logs.
>
> Which CPU/SoC are you using?
>
>> BUG: spinlock lockup suspected on CPU#0, process1/525
>> lock: 0xd8ac9a64, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
>>
>>
>> 1 . Looks like lock is available as owner is -1, why arch_spin_trylock
>> is getting failed ?
>
> Is this with or without the ticket lock patches? Can you inspect the actual
> value of the arch_spinlock_t?
>
>> 2. There is a patch : ARM: spinlock: retry trylock operation if strex
>> fails on free lock
>> http://permalink.gmane.org/gmane.linux.ports.arm.kernel/240913
>> In this patch, A loop has been added around strexeq %2, %0, [%3]".
>> {Comment "retry the trylock operation if the lock appears
>> to be free but the strex reported failure"}
>>
>> but arch_spin_trylock is called by __spin_lock_debug and its already
>> getting called in loops. So what purpose is resolves?
>
> Does this patch help your issue? The purpose of it is to distinguish between
> two types of contention:
>
> (1) The lock is actually taken
> (2) The lock is free, but two people are doing a trylock at the same time
>
> In the case of (2), we do actually want to spin again otherwise you could
> potentially end up in a pathological case where the two CPUs repeatedly
> shoot down each other's monitor and forward progress isn't made until the
> sequence is broken by something like an interrupt.
>
>> static void __spin_lock_debug(raw_spinlock_t *lock)
>> {
>>         u64 i;
>>         u64 loops = loops_per_jiffy * HZ;
>>
>>         for (i = 0; i < loops; i++) {
>>                 if (arch_spin_trylock(&lock->raw_lock))
>>                         return;
>>                 __delay(1);
>>         }
>>         /* lockup suspected: */
>>         spin_dump(lock, "lockup suspected");
>> }
>>
>> 3. Is this patch useful to us, How can we reproduce this scenario ?
>> Scenario : Lock is available but arch_spin_trylock is returning as failure
>
> Potentially. Why can't you simply apply the patch and see if it resolves your
> issue?
>
> Will
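The two kinds of trylock failure that Will describes can be illustrated in userspace with C11 atomics. This is only a sketch under stated assumptions, not the kernel's arch_spin_trylock(): the names `demo_ticket_lock`, `demo_lock_init`, `demo_trylock` and `demo_unlock` are hypothetical, the 16/16 owner/next layout is chosen for the demo, and `atomic_compare_exchange_weak_explicit()` stands in for the ldrex/strex pair because it, too, is allowed to fail even when the lock word has not changed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical demo lock: "owner" in the low 16 bits, "next" in the high 16. */
struct demo_ticket_lock {
	_Atomic uint32_t slock;
};

static void demo_lock_init(struct demo_ticket_lock *l)
{
	atomic_init(&l->slock, 0);
}

static bool demo_trylock(struct demo_ticket_lock *l)
{
	uint32_t old = atomic_load_explicit(&l->slock, memory_order_relaxed);

	for (;;) {
		uint16_t owner = (uint16_t)(old & 0xffffu);
		uint16_t next = (uint16_t)(old >> 16);
		uint32_t newval = old + (1u << 16);	/* take the next ticket */

		/* Case (1): the lock really is held, so give up at once. */
		if (owner != next)
			return false;

		if (atomic_compare_exchange_weak_explicit(&l->slock, &old, newval,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;

		/*
		 * Case (2): the store failed although the lock looked free.
		 * "old" now holds the current value, so loop and re-check
		 * instead of reporting a failure to the caller.
		 */
	}
}

static void demo_unlock(struct demo_ticket_lock *l)
{
	uint32_t old = atomic_load_explicit(&l->slock, memory_order_relaxed);
	uint32_t newval;

	/* Advance "owner" without letting a carry spill into "next". */
	do {
		newval = (old & 0xffff0000u) | ((old + 1u) & 0xffffu);
	} while (!atomic_compare_exchange_weak_explicit(&l->slock, &old, newval,
							memory_order_release,
							memory_order_relaxed));
}
```

With this shape, a caller that loops around `demo_trylock()` only sees a `false` return when the lock was genuinely held at the time of the check, which is the property the retry patch is after.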
Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages
2014/1/21 Minchan Kim : > Please check your MUA and don't break thread. > > On Tue, Jan 21, 2014 at 11:07:42AM +0800, Cai Liu wrote: >> Thanks for your review. >> >> 2014/1/21 Minchan Kim : >> > Hello Cai, >> > >> > On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote: >> >> zswap can support multiple swapfiles. So we need to check >> >> all zbud pool pages in zswap. >> >> >> >> Version 2: >> >> * add *total_zbud_pages* in zbud to record all the pages in pools >> >> * move the updating of pool pages statistics to >> >> alloc_zbud_page/free_zbud_page to hide the details >> >> >> >> Signed-off-by: Cai Liu >> >> --- >> >> include/linux/zbud.h |2 +- >> >> mm/zbud.c| 44 >> >> mm/zswap.c |4 ++-- >> >> 3 files changed, 35 insertions(+), 15 deletions(-) >> >> >> >> diff --git a/include/linux/zbud.h b/include/linux/zbud.h >> >> index 2571a5c..1dbc13e 100644 >> >> --- a/include/linux/zbud.h >> >> +++ b/include/linux/zbud.h >> >> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long >> >> handle); >> >> int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries); >> >> void *zbud_map(struct zbud_pool *pool, unsigned long handle); >> >> void zbud_unmap(struct zbud_pool *pool, unsigned long handle); >> >> -u64 zbud_get_pool_size(struct zbud_pool *pool); >> >> +u64 zbud_get_pool_size(void); >> >> >> >> #endif /* _ZBUD_H_ */ >> >> diff --git a/mm/zbud.c b/mm/zbud.c >> >> index 9451361..711aaf4 100644 >> >> --- a/mm/zbud.c >> >> +++ b/mm/zbud.c >> >> @@ -52,6 +52,13 @@ >> >> #include >> >> #include >> >> >> >> +/* >> >> +* statistics >> >> +**/ >> >> + >> >> +/* zbud pages in all pools */ >> >> +static u64 total_zbud_pages; >> >> + >> >> /* >> >> * Structures >> >> */ >> >> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct >> >> page *page) >> >> return zhdr; >> >> } >> >> >> >> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp) >> >> +{ >> >> + struct page *page; >> >> + >> >> + page = alloc_page(gfp); >> >> + 
>> >> + if (page) { >> >> + pool->pages_nr++; >> >> + total_zbud_pages++; >> > >> > Who protect race? >> >> Yes, here the pool->pages_nr and also the total_zbud_pages are not protected. >> I will re-do it. >> >> I will change *total_zbud_pages* to atomic type. > > Wait, it doesn't make sense. Now, you assume zbud allocator would be used > for only zswap. It's true until now but we couldn't make sure it in future. > If other user start to use zbud allocator, total_zbud_pages would be > pointless. Yes, you are right. ZBUD is a common module. So in this patch calculate the zswap pool size in zbud is not suitable. > > Another concern is that what's your scenario for above two swap? > How often we need to call zbud_get_pool_size? > In previous your patch, you reduced the number of call so IIRC, > we only called it in zswap_is_full and for debugfs. zbud_get_pool_size() is called frequently when adding/freeing zswap entry happen in zswap . This is why in this patch I added a counter in zbud, and then in zswap the iteration of zswap_list to calculate the pool size will not be needed. > Of course, it would need some lock or refcount to prevent destroy > of zswap_tree in parallel with zswap_frontswap_invalidate_area but > zswap_is_full doesn't need to be exact so RCU would be good fit. > > Most important point is that now zswap doesn't consider multiple swap. > For example, Let's assume you uses two swap A and B with different priority > and A already has charged 19% long time ago and let's assume that A swap is > full now so VM start to use B so that B has charged 1% recently. > It menas zswap charged (19% + 1%)i is full by default. > > Then, if VM want to swap out more pages into B, zbud_reclaim_page > would be evict one of pages in B's pool and it would be repeated > continuously. It's totally LRU reverse problem and swap thrashing in B > would happen. > The scenario is below: There are 2 swap A, B in system. 
If pool size of A reach 19% of ram size and swap A is also full. Then swap B will be used. Pool size of B will be increased until it hit the 20% of the ram size. By now zswap pool size is about 39% of ram size. If there are more than 2 swap file/device, zswap pool will expand out of control and there may be no swapout happened. I think the original intention of zswap designer is to keep the total zswap pools size below 20% of RAM size. Thanks. > Please say your usecase scenario and if it's really problem, > we need more surgery. > > Thanks. > >> For *pool->pages_nr*, one way is to use pool->lock to protect. But I >> think it is too heavy. >> So does it ok to change pages_nr to atomic type too? >> >> >> > >> >> + } >> >> + >> >> + return page; >> >> +} >> >> + >> >> + >> >> /* Resets the
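The two-device scenario described in this message comes down to simple arithmetic, sketched below. Everything here is hypothetical and for illustration only (`demo_pool`, `demo_pool_is_full`, `demo_total_pages`, `demo_total_is_full`); it is not zswap or zbud code. It only shows that a 20% limit checked against one pool at a time lets the combined size reach 39% of RAM, while a check against the sum does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one compressed pool (one per swap device). */
struct demo_pool {
	uint64_t pages;
};

/* Limit applied to a single pool in isolation. */
static bool demo_pool_is_full(const struct demo_pool *pool,
			      uint64_t ram_pages, unsigned int max_percent)
{
	return pool->pages * 100 > ram_pages * max_percent;
}

/* Sum of the pages held by all pools. */
static uint64_t demo_total_pages(const struct demo_pool *pools, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += pools[i].pages;
	return total;
}

/* Limit applied to the sum over all pools. */
static bool demo_total_is_full(const struct demo_pool *pools, size_t n,
			       uint64_t ram_pages, unsigned int max_percent)
{
	return demo_total_pages(pools, n) * 100 > ram_pages * max_percent;
}
```

For 1000 pages of RAM and pools of 190 and 200 pages, neither pool exceeds a 20% limit on its own, yet together they hold 390 pages, so only the summed check reports the limit as exceeded.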
Re: [PATCH] cpufreq: Align all CPUs to the same frequency if using shared clock
On 21 January 2014 08:35, lizhuangzhi wrote:
> Some SMP systems want to make all the possible CPUs share the clock,
> if the CPUs init frequencies aren't the same, we need to align all the
> CPUs to the same frequency while CPUs registing to avoid mismatched
> CPU's P-states.
>
> Signed-off-by: lizhuangzhi
> ---
>  drivers/cpufreq/cpufreq.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 8d19f7c..d00abb5 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -991,6 +991,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>          * CPU because it is in the same boat. */
>         policy = cpufreq_cpu_get(cpu);
>         if (unlikely(policy)) {
> +               /* according present policy to align all the cpus frequencies */
> +               cpufreq_driver->target(policy, policy->cur, CPUFREQ_RELATION_H);

I don't really understand why this is required. CPUs sharing a clock means
that they run on the same clock line, so all of them must already be running
at the same frequency.

So why do we need to call this routine? policy->cur must already be the
current frequency of the CPU in question here.

--
viresh
linux-next: Tree for Jan 21
Hi all, This tree fails (more than usual) the powerpc allyesconfig build. Changes since 20140117: Dropped tree: sh (complex merge conflicts against very old commits) imx-mxs (complex merge conflicts against the arm tree) The imx-mxs tree gained conflicts against the arm tree (so I dropped it). The powerpc tree still had its build failure. The sound-asoc tree gained a conflict against the sound tree. The kvm tree lost its build failure. The gpio tree still had its build failure for which I reverted a commit. Non-merge commits (relative to Linus' tree): 9729 9131 files changed, 494215 insertions(+), 241500 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" as mentioned in the FAQ on the wiki (see below). You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a multi_v7_defconfig for arm. After the final fixups (if any), it is also built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and allyesconfig (minus CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc, sparc64 and arm defconfig. These builds also have CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and CONFIG_DEBUG_INFO disabled when necessary. Below is a summary of the state of the merge. I am currently merging 209 trees (counting Linus' and 29 trees of patches pending for Linus' tree). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . 
Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. There is a wiki covering stuff to do with linux-next at http://linux.f-seidel.de/linux-next/pmwiki/ . Thanks to Frank Seidel. -- Cheers, Stephen Rothwells...@canb.auug.org.au $ git checkout master $ git reset --hard stable Merging origin/master (c9cdd9a6ae49 Merge branch 'x86/mpx' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip) Merging fixes/master (b0031f227e47 Merge tag 's2mps11-build' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator) Merging kbuild-current/rc-fixes (19514fc665ff arm, kbuild: make "make install" not depend on vmlinux) Merging arc-current/for-curr (7e22e91102c6 Linux 3.13-rc8) Merging arm-current/fixes (b25f3e1c3584 ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling) Merging m68k-current/for-linus (56931d73697c m68k/mac: Make SCC reset work more reliably) Merging metag-fixes/fixes (3b2f64d00c46 Linux 3.11-rc2) Merging powerpc-merge/merge (b3084f4db3ae powerpc/thp: Fix crash on mremap) Merging sparc/master (ef350bb7c5e0 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4) Merging net/master (7d0d46da750a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net) Merging ipsec/master (965cdea82569 dccp: catch failed request_module call in dccp_probe init) Merging sound-current/for-linus (7552f34a7900 Merge tag 'asoc-v3.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus) Merging pci-current/for-linus (f0b75693cbb2 MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers) Merging wireless/master (2eff7c791a18 Merge tag 'nfc-fixes-3.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-fixes) Merging 
driver-core.current/driver-core-linus (413541dd66d5 Linux 3.13-rc5) Merging tty.current/tty-linus (413541dd66d5 Linux 3.13-rc5) Merging usb.current/usb-linus (413541dd66d5 Linux 3.13-rc5) Merging staging.current/staging-linus (413541dd66d5 Linux 3.13-rc5) Merging char-misc.current/char-misc-linus (802eee95bde7 Linux 3.13-rc6) Merging input-current/for-linus (8e2f2325b73f Input: xpad - add new USB IDs for Logitech F310 and F710) Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" stripe) Merging crypto-current/master (efb753b8e013 crypto: ixp4xx - Fix kernel compile error) Merging ide/master (c2f7d1e103ef ide: pmac: remove unnecessary pci_set_drvdata()) Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff) Merging sh-current/sh-fixes-for-linus (44033109e99c SH: Convert out[bwl] macros to inline functions) Merging devicetree-current/devicetree/merge (6f041e99fc7b of: Fix NULL
Re: [PATCH 1/2] USB: at91: fix the number of endpoint parameter
On 11:39 Mon 20 Jan , Bo Shen wrote:
> Hi J,
>
> On 01/18/2014 01:20 PM, Jean-Christophe PLAGNIOL-VILLARD wrote:
> >On 10:59 Fri 17 Jan , Bo Shen wrote:
> >>In sama5d3 SoC, there are 16 endpoints. As the USBA_NR_ENDPOINTS
> >>is only 7. So, fix it for sama5d3 SoC using the udc->num_ep.
> >>
> >>Signed-off-by: Bo Shen
> >>---
> >>
> >> drivers/usb/gadget/atmel_usba_udc.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/usb/gadget/atmel_usba_udc.c b/drivers/usb/gadget/atmel_usba_udc.c
> >>index 2cb52e0..7e67a81 100644
> >>--- a/drivers/usb/gadget/atmel_usba_udc.c
> >>+++ b/drivers/usb/gadget/atmel_usba_udc.c
> >>@@ -1670,7 +1670,7 @@ static irqreturn_t usba_udc_irq(int irq, void *devid)
> >>        if (ep_status) {
> >>                int i;
> >>
> >>-               for (i = 0; i < USBA_NR_ENDPOINTS; i++)
> >>+               for (i = 0; i < udc->num_ep; i++)
> >
> >no the limit need to specified in the driver as a checkpoint by the
> >compatible or platform driver id
>
> You mean, we should not trust the data passed from dt node or
> platform data? Or do you think we should do double confirm?

No. Based on the driver name or the compatible string you will know the
maximum number of endpoints; it should not be based on the DT endpoint
description. This is what we do in pinctrl-at91.

Best Regards,
J.

> >>                        if (ep_status & (1 << i)) {
> >>                                if (ep_is_control(&udc->usba_ep[i]))
> >>                                        usba_control_irq(udc, &udc->usba_ep[i]);
> >>--
> >>1.8.5.2
> >>
>
> Best Regards,
> Bo Shen
Re: [PATCH V5 1/3] mm/nobootmem: Fix unused variable
On Mon, 20 Jan 2014, Philipp Hachtmann wrote:

> diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> index e2906a5..0215c77 100644
> --- a/mm/nobootmem.c
> +++ b/mm/nobootmem.c
> @@ -116,23 +116,29 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>  static unsigned long __init free_low_memory_core_early(void)
>  {
>         unsigned long count = 0;
> -       phys_addr_t start, end, size;
> +       phys_addr_t start, end;
>         u64 i;
>
> +#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
> +       phys_addr_t size;
> +#endif
> +
>         for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
>                 count += __free_memory_core(start, end);
>
>  #ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
> -
> -       /* Free memblock.reserved array if it was allocated */
> -       size = get_allocated_memblock_reserved_regions_info(&start);
> -       if (size)
> -               count += __free_memory_core(start, start + size);
> -
> -       /* Free memblock.memory array if it was allocated */
> -       size = get_allocated_memblock_memory_regions_info(&start);
> -       if (size)
> -               count += __free_memory_core(start, start + size);
> +       {
> +               phys_addr_t size;

I think you may have misunderstood Andrew's suggestion: "size" here is
overloading the "size" you have already declared for this configuration.

Not sure why you don't just do a one line patch:

-       phys_addr_t size;
+       phys_addr_t size __maybe_unused;

to fix it.

> +               /* Free memblock.reserved array if it was allocated */
> +               size = get_allocated_memblock_reserved_regions_info(&start);
> +               if (size)
> +                       count += __free_memory_core(start, start + size);
> +
> +               /* Free memblock.memory array if it was allocated */
> +               size = get_allocated_memblock_memory_regions_info(&start);
> +               if (size)
> +                       count += __free_memory_core(start, start + size);
> +       }
>  #endif
>
>         return count;
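The one-line alternative suggested in this reply can be tried outside the kernel with a small sketch. The assumptions are spelled out in the comments: `__maybe_unused` is redefined locally as the GCC/Clang `unused` attribute, and `demo_region_size`, `demo_free_early` and `DEMO_DISCARD_MEMBLOCK` are hypothetical stand-ins for the memblock helpers and the config option.

```c
/* Local definition for userspace; the kernel provides its own __maybe_unused. */
#define __maybe_unused __attribute__((unused))

/* Hypothetical stand-in for get_allocated_memblock_*_regions_info(). */
static unsigned long __maybe_unused demo_region_size(void)
{
	return 16;
}

static unsigned long demo_free_early(void)
{
	unsigned long count = 0;
	/* Only used when DEMO_DISCARD_MEMBLOCK is defined, hence the annotation. */
	unsigned long size __maybe_unused;

#ifdef DEMO_DISCARD_MEMBLOCK
	size = demo_region_size();
	if (size)
		count += size;
#endif

	return count;
}
```

Built without `-DDEMO_DISCARD_MEMBLOCK`, the variable is never referenced, yet the compiler should stay quiet under `-Wall` because of the attribute; no extra `#ifdef` around the declaration and no nested block are needed.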
Re: [PATCH] memcg: Do not hang on OOM when killed by userspace OOM access to memory reserves
On Thu, 16 Jan 2014, Michal Hocko wrote: > > The heuristic may have existed for ages, but the proposed memcg > > configuration for preserving memory such that userspace oom handlers may > > run such as > > > > _root__ > > / \ > > user oom > >/\ / \ > >AB a b > > > > where user/memory.limit_in_bytes == [amount of present RAM] + > > oom/memory.limit_in_bytes - [some fudge] causes all bypasses to be > > problematic, including Johannes' buggy bypass for charges in memcgs with > > pending memcgs that has since been fixed after I identified it. This > > bypass is included. Processes attached to "a" and "b" are userspace oom > > handlers for processes attached to "A" and "B", respectively. > > > > The amount of memory you're talking about is proportional to the number of > > processes that have pending SIGKILLs (and now those with PF_EXITING set), > > the former of which is obviously more concerning since they could be > > charging memory at any point in the kernel that would succeed. > > I understand your concerns. Yes, excessive charges might be dangerous. I > haven't dismissed that when you mentioned it earlier. I am just > repeatedly asking how much memory are we talking about, how real is the > issue and what are all the other conseqeunces. And for some reason you > are not providing that information (or maybe I am just not seeing that > in your responses) and that is why we are stuck in circle. > Wtf are you talking about? You're adding a bypass in this patch and then you're asking me to go and see how much memory it could potentially bypass and take away from oom handlers under the above memcg configuration? This seems like something you should provide before throwing out patches that nobody has tested if you want to make the argument that the above memcg configuration is valid for handling userspace oom notifications. 
And you certainly have dismissed what I've mentioned earlier when I said that anybody can add memory allocation to the exit path later on and nobody is going to think about how much memory this is going to bypass to the root memcg and potentially take away from userspace oom handlers. There's two possible ways to forward this: - avoid bypass to the root memcg in every possible case such that the above memcg configuration actually makes a guarantee to userspace oom handlers attached to it, or - provide per-memcg memory reserves such that userspace oom handlers can allocate and charge memory without the above memcg configuration so there is a guarantee. What's not acceptable, now or ever, is suggesting a solution to a problem that is supposed to guarantee some resource and then allow under some circumstances that resource to be completely depleted such that the solution never works. > Yes, and apart from GFP_NOFAIL we are allowing to bypass only those that > should terminate in a short time. I think that having a setup with a > guarantee of never triggering the global OOM is too ambitious and I am > even skeptical it would be achievable. > "Short time" is meaningless if the memory allocation causes memory to not be available to userspace oom handlers. If allocations are allowed to be charged because you're in the exit() path or because you have SIGKILL, that can result in a system oom condition that would prevent userspace from being able to handle them. > > I'm debating both fatal_signal_pending() and PF_EXITING here since they > > are now both bypasses, we need to remove fatal_signal_pending(). My > > simple question with your patch: how do you guarantee memory to processes > > attached to "a" and "b"? > > The only way you can get that _guarantee_ is to account all the memory > allocations. And that is not implemented and I would even question > whether it is worthwhile. So we still have to live with a possibility > of triggering the global OOM killer. 
> That's why I believe we need to be
> able to tell the kernel what is the user policy for oom killer (that is
> a different discussion though).
>

So you're saying that Tejun's suggested userspace oom handler
configuration is pointless, correct? We can certainly provide a guarantee
if memory is reserved specifically for userspace oom handling like I
proposed, the same way that memory reserves are guaranteed for oom killed
processes.
Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves
On Mon, 20 Jan 2014, Greg Kroah-Hartman wrote:

> > The patches getting proposed through -mm for stable boggles my mind
> > sometimes.
>
> Do you have any objections to patches that I have taken for -stable? If
> so, please let me know.
>

You haven't taken the ones that I outlined in
http://marc.info/?l=linux-kernel&m=138580717728759, so I'm happy that those
could be prevented.

I'm identifying another patch here that is pending in -mm that obviously
violates the stable kernel rules and I don't believe it should be annotated
in a way that you'll scoop it up later. The patch in question hasn't been
tested by anybody and I don't think you want such things to ever be merged
into a stable kernel series.
Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves
On Mon, Jan 20, 2014 at 09:58:28PM -0800, David Rientjes wrote:
> The patches getting proposed through -mm for stable boggles my mind
> sometimes.

Do you have any objections to patches that I have taken for -stable? If
so, please let me know.

thanks,

greg k-h
Re: [RFC PATCH 1/2] sched/update_avg: avoid negative time
On 01/21/2014 01:33 PM, Alex Shi wrote:
> rq->avg_idle try to reflect the average idle time between the cpu idle
> and first wakeup. But in the function, it maybe get a negative value
> if old avg_idle is too small. Then this negative value will be double
> counted in next time calculation. Guess that is not the original purpose,
> so recalibrate it to zero.

Please disregard this patch: avg_idle cannot become negative.

Sorry for the noise!

--
Thanks
Alex
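The retraction above can be checked with a small userspace sketch of the same arithmetic. The names `demo_update_avg` and `demo_avg_stays_nonnegative` are hypothetical, and the sketch assumes samples and averages well below 2^63 and an arithmetic right shift of negative values (what GCC and Clang do; the C standard leaves it implementation-defined). Under those assumptions the step `avg += (sample - avg) >> 3` removes at most one eighth of `avg`, rounded up, so the result cannot drop below zero. Separately, since `avg` points to an unsigned 64-bit value, a test of the form `*avg < 0` can never be true.

```c
#include <stdint.h>

/* Same arithmetic as the quoted update_avg(), using userspace types. */
static void demo_update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	/* Assumes an arithmetic right shift for negative diff (GCC/Clang). */
	*avg += (uint64_t)(diff >> 3);
}

/*
 * Feed the worst case (sample == 0) repeatedly and report whether the
 * average, read as a signed value, ever dips below zero.
 * Returns 1 if it stayed non-negative for all steps, 0 otherwise.
 */
static int demo_avg_stays_nonnegative(uint64_t start, unsigned int steps)
{
	uint64_t avg = start;
	unsigned int i;

	for (i = 0; i < steps; i++) {
		demo_update_avg(&avg, 0);
		if ((int64_t)avg < 0)
			return 0;
	}
	return 1;
}
```

For example, starting from 8 with a sample of 0 gives 7, starting from 1 gives 0, and starting from 0 stays at 0.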
Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves
On Thu, 16 Jan 2014, Michal Hocko wrote:

> > This is concerning because it's merged in -mm without being tested by Eric
> > and is marked for stable while violating the stable kernel rules criteria.
>
> Are you questioning the patch fixes the described issue?
>
> Please note that the exit_robust_list and PF_EXITING as a culprit has
> been identified by Eric. Of course I would prefer if it was tested by
> anybody who can reproduce it.

You're saying the patch hasn't been tested by anybody and that clearly
violates the first rule in Documentation/stable_kernel_rules.txt:

 - It must be obviously correct and tested.

Adding Greg to the cc if this should be clarified further.

The patches getting proposed through -mm for stable boggles my mind
sometimes.
Re: [PATCH] perf tools: Fix JIT profiling on heap
Hi Arnaldo, On Fri, 17 Jan 2014 11:34:04 -0300, Arnaldo Carvalho de Melo wrote: > Em Fri, Jan 17, 2014 at 04:44:04PM +0900, Namhyung Kim escreveu: >> On Thu, 16 Jan 2014 20:23:27 +, Gaurav Jain wrote: >> > On 1/16/14, 9:37 AM, "Arnaldo Carvalho de Melo" >> > wrote: >> > >> >>Em Thu, Jan 16, 2014 at 10:49:31AM +0900, Namhyung Kim escreveu: >> >>> Gaurav reported that perf cannot profile JIT program if it executes >> >>> the code on heap. This was because current map__new() only handle JIT >> >>> on anon mappings - extends it to handle no_dso (heap, stack) case too. >> >>> >> >>> This patch assumes JIT profiling only provides dynamic function >> >>> symbols so check the mapping type to distinguish the case. It'd >> >>> provide no symbols for data mapping - if we need to support symbols on >> >>> data mappings later it should be changed. >> >> >> >>But we do support symbols in data mappings, that is why we have >> >>MAP__VARIABLE, etc, can you elaborate? > >> > Does perf support data mappings from perf map files? Could you please >> > share an example of how I may be able to use this. > >> IIUC there's no difference between function and data mapping. So you >> can use same perf map file for both - in fact there's no way to use >> different map file in a single task. I guess perf will use it to find > > Do the /tmp/perf mapping has any per entry indication on the type of > symbol it is (data, text) like ELF and kallsyms symtabs have? Quoting Documentation/jit-interface.txt: Each line has the following format, fields separated with spaces: START SIZE symbolname START and SIZE are hex numbers without 0x. symbolname is the rest of the line, so it could contain special characters. > > It is possible for a function and a variable to have the same virtual > address in some architectures (SPARC, iirc), that is why we have > different MAP_ types (FUNCTION, VARIABLE) (which should really be > renamed to TEXT, DATA). Hmm.. didn't know that, interesting.. 
>
> So a 'struct map' for a data mmap should possibly point to a different
> 'dso' of the JIT /tmp/perf-... style if those maps don't have per entry
> indication of text/data.

Yes, but there's no way to do it currently.

>
>> only function symbols in function mappings and variables in data
>> mapping based on the address it accesses.
>
> Well, the lookup should figure out if the IP refers to TEXT or DATA and
> use MAP__{FUNCTION, VARIABLE} accordingly when asking for symbol
> resolution.

Right. But in this case we cannot determine whether a symbol in the
/tmp/perf-... file is a function or variable.

>
>> What I wasn't sure is whether JIT program also produces some dynamic data.
>> And I think only perf mem command cares about data mappings, no?
>
> Well, I think it would be great to do that kind of data resolution for
> JITs the same way it is interesting to do for ELF ones :-)
>
> I need to stare harder at that patch, but with the above in mind, do we
> really have to check if the map is MAP__FUNCTION as IIRC this patch
> does?

Not sure. For a JIT case, I guess the mapping is always executable and we
don't support data mapping yet, so it seems okay for now.

Thanks,
Namhyung
Re: [PATCH V2] fs: don't write pages when receiving a pending SIGKILL in __get_user_pages()
On Sat, 18 Jan 2014, Xishi Qiu wrote:

> In the process IO direction, dio_refill_pages will call get_user_pages_fast
> to map the page from user space. If ret is less than 0 and IO is write, the
> function will create a zero page to fill data. This may work for some file
> system, but in some device operate we prefer whole write or fail, not half
> data half zero, e.g. fs metadata, like inode, identy.

So you're attempting to define a behavior for all users of direct IO for a
problem that is filesystem or backing device dependent? Perhaps if you
elaborated on the problem that you're seeing then we could address it.

> This happens often when kill a process which is doing direct IO. Consider
> the following cases, the process A is doing IO process, may enter
> __get_user_pages
> function, if other processes send process A SIG_KILL, A will enter the
> following branches
>               /*
>                * If we have a pending SIGKILL, don't keep faulting
>                * pages and potentially allocating memory.
>                */
>               if (unlikely(fatal_signal_pending(current)))
>                       return i ? i : -ERESTARTSYS;
> Return current pages. direct IO will write the pages, the subsequent pages
> which can't get will use zero page instead.
>
> Signed-off-by: Xishi Qiu
> Signed-off-by: Bin Yang
> ---
>  fs/direct-io.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 0e04142..b74d565 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -174,6 +174,9 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
>               &dio->pages[0]);                /* Put results here */
>
>       if (ret < 0 && sdio->blocks_available && (dio->rw & WRITE)) {
> +             /* If task is killed, do not write anymore */
> +             if (ret == -ERESTARTSYS)
> +                     goto out;
>               struct page *page = ZERO_PAGE(0);
>               /*
>                * A memory fault, but the filesystem has some outstanding

We don't mix declarations and code; please try to compile your patches
before proposing them.
Re: [patch 9/9] mm: keep page cache radix tree nodes in check
On Tue, Jan 21, 2014 at 02:03:58PM +1100, Dave Chinner wrote:
> On Mon, Jan 20, 2014 at 06:17:37PM -0500, Johannes Weiner wrote:
> > On Fri, Jan 17, 2014 at 11:05:17AM +1100, Dave Chinner wrote:
> > > On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote:
> > > > +       /* Only shadow entries in there, keep track of this node */
> > > > +       if (!(node->count & RADIX_TREE_COUNT_MASK) &&
> > > > +           list_empty(&node->private_list)) {
> > > > +               node->private_data = mapping;
> > > > +               list_lru_add(&workingset_shadow_nodes,
> > > > +                            &node->private_list);
> > > > +       }
> > >
> > > You can't do this list_empty(&node->private_list) check safely
> > > externally to the list_lru code - only time that entry can be
> > > checked safely is under the LRU list locks. This is the reason that
> > > list_lru_add/list_lru_del return a boolean to indicate is the object
> > > was added/removed from the list - they do this list_empty() check
> > > internally. i.e. the correct, safe way to do conditionally update
> > > state iff the object was added to the LRU is:
> > >
> > >       if (!(node->count & RADIX_TREE_COUNT_MASK)) {
> > >               if (list_lru_add(&workingset_shadow_nodes, &node->private_list))
> > >                       node->private_data = mapping;
> > >       }
> > >
> > > > +       radix_tree_replace_slot(slot, page);
> > > > +       mapping->nrpages++;
> > > > +       if (node) {
> > > > +               node->count++;
> > > > +               /* Installed page, can't be shadow-only anymore */
> > > > +               if (!list_empty(&node->private_list))
> > > > +                       list_lru_del(&workingset_shadow_nodes,
> > > > +                                    &node->private_list);
> > > > +       }
> > >
> > > Same issue here:
> > >
> > >       if (node) {
> > >               node->count++;
> > >               list_lru_del(&workingset_shadow_nodes, &node->private_list);
> > >       }
> >
> > All modifications to node->private_list happen under
> > mapping->tree_lock, and modifications of a neighboring link should not
> > affect the outcome of the list_empty(), so I don't think the lru lock
> > is necessary.
>
> Can you please add that as a comment somewhere explaining why it is
> safe to do this?

Absolutely.
> > > > +               case LRU_REMOVED_RETRY:
> > > >                         if (--nlru->nr_items == 0)
> > > >                                 node_clear(nid, lru->active_nodes);
> > > >                         WARN_ON_ONCE(nlru->nr_items < 0);
> > > >                         isolated++;
> > > > +                       /*
> > > > +                        * If the lru lock has been dropped, our list
> > > > +                        * traversal is now invalid and so we have to
> > > > +                        * restart from scratch.
> > > > +                        */
> > > > +                       if (ret == LRU_REMOVED_RETRY)
> > > > +                               goto restart;
> > > >                         break;
> > > >                 case LRU_ROTATE:
> > > >                         list_move_tail(item, &nlru->list);
> > >
> > > I think that we need to assert that the list lru lock is correctly
> > > held here on return with LRU_REMOVED_RETRY. i.e.
> > >
> > >               case LRU_REMOVED_RETRY:
> > >                       assert_spin_locked(&nlru->lock);
> > >               case LRU_REMOVED:
> >
> > Ah, good idea. How about adding it to LRU_RETRY as well?
>
> Yup, good idea.

Ok, will do.

> > > > +static struct shrinker workingset_shadow_shrinker = {
> > > > +       .count_objects = count_shadow_nodes,
> > > > +       .scan_objects = scan_shadow_nodes,
> > > > +       .seeks = DEFAULT_SEEKS * 4,
> > > > +       .flags = SHRINKER_NUMA_AWARE,
> > > > +};
> > >
> > > Can you add a comment explaining how you calculated the .seeks
> > > value? It's important to document the weighings/importance
> > > we give to slab reclaim so we can determine if it's actually
> > > acheiving the desired balance under different loads...
> >
> > This is not an exact science, to say the least.
>
> I know, that's why I asked it be documented rather than be something
> kept in your head.
>
> > The shadow entries are mostly self-regulated, so I don't want the
> > shrinker to interfere while the machine is just regularly trimming
> > caches during normal operation.
> >
> > It should only kick in when either a) reclaim is picking up and the
> > scan-to-reclaim ratio increases due to mapped pages, dirty cache,
> > swapping etc. or b) the number of objects compared to LRU pages
> > becomes excessive.
> > > > I think that is what most shrinkers with an elevated seeks value want, > > but this translates very awkwardly (and not completely) to the current > > cost model, and we should probably rework that interface. > > > > "Seeks" currently encodes 3 ratios: > > > > 1. the cost of creating an object vs. a page > > > > 2. the expected number of objects vs. pages > > It doesn't encode that at all. If it did, then the default value > wouldn't be "2". > > > 3. the cost of
[PATCH 2/2] sched: add statistic for rq->max_idle_balance_cost
It's useful to track this value in debug mode. Signed-off-by: Alex Shi --- kernel/sched/debug.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 1e43e70..f5c529a 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -315,6 +315,7 @@ do { \ P(sched_goidle); #ifdef CONFIG_SMP P64(avg_idle); + P64(max_idle_balance_cost); #endif P(ttwu_count); -- 1.8.1.2
[RFC PATCH 1/2] sched/update_avg: avoid negative time
rq->avg_idle tries to reflect the average idle time between the CPU going idle and its first wakeup. But in this function, it may get a negative value if the old avg_idle is too small. That negative value would then be double-counted in the next calculation. I guess that is not the original intent, so recalibrate it to zero. Signed-off-by: Alex Shi --- kernel/sched/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 30eb011..af9121c6 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1358,6 +1358,9 @@ static void update_avg(u64 *avg, u64 sample) { s64 diff = sample - *avg; *avg += diff >> 3; + + if (*avg < 0) + *avg = 0; } #endif -- 1.8.1.2
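A note for anyone reviewing the hunk above: *avg is a u64, so a comparison such as *avg < 0 cannot be true as written. The following is only a userspace sketch of the arithmetic, assuming uint64_t/int64_t stand in for u64/s64 and that the compiler uses an arithmetic right shift for negative values (as GCC and Clang do). The names update_avg_model() and update_avg_clamped() are made up for illustration; the clamped variant is one possible way to express the intended clamp through a signed intermediate, not the patch's method.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of update_avg(): avg moves 1/8 of the way toward sample.
 * Assumes values well below 2^62 and an arithmetic right shift of negative
 * numbers (implementation-defined in C, true for GCC and Clang).
 */
static void update_avg_model(uint64_t *avg, uint64_t sample)
{
    int64_t diff = (int64_t)(sample - *avg);

    *avg += (uint64_t)(diff >> 3);
}

/*
 * Hypothetical variant: compute the next value as a signed number so that
 * a "below zero" test is meaningful, then clamp before storing.
 */
static void update_avg_clamped(uint64_t *avg, uint64_t sample)
{
    int64_t diff = (int64_t)(sample - *avg);
    int64_t next = (int64_t)*avg + (diff >> 3);

    *avg = next < 0 ? 0 : (uint64_t)next;
}
```

In this model a small average decays toward zero one step at a time (4 becomes 3, 1 becomes 0) rather than wrapping, at least for the inputs tried, so it may be worth confirming the negative case with real numbers before adding a clamp.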
Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
On Mon, 20 Jan 2014, Jianguo Wu wrote: > When OOM happen, will dump buddy free areas info, hugetlb pages info, > memory state of all eligible tasks, per-cpu memory info. > But do not dump slab/vmalloc info, sometime, it's not enough to figure out the > reason OOM happened. > > So, my questions are: > 1. Should dump slab/vmalloc info when OOM happen? Though we can get these > from proc file, > but usually we do not monitor the logs and check proc file immediately when > OOM happened. > The problem is that slabinfo becomes excessively verbose and dumping it all to the kernel log oftentimes causes important messages to be lost. This is why we control things like the tasklist dump with a VM sysctl. It would be possible to dump, say, the top ten slab caches with the highest memory usage, but it will only be helpful for slab leaks. Typically there are better debugging tools available than analyzing the kernel log; if you see unusually high slab memory in the meminfo dump, you can enable it. > 2. /proc/$pid/smaps and pagecache info also helpful when OOM, should also be > dumped? > Also very verbose and would cause important messages to be lost; we try to avoid spamming the kernel log with all of this information as much as possible. > 3. Without these info, usually how to figure out OOM reason? > Analyze the memory usage in the meminfo and determine what is unusually high; if it's mostly anonymous memory, you can usually correlate it back to a high rss for a process in the tasklist that you didn't suspect to be using that much memory, for example.
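For the "top ten slab caches" idea, the same ranking can be approximated from userspace without touching the OOM path. Below is a rough sketch, assuming the /proc/slabinfo 2.x line layout (name, active_objs, num_objs, objsize, ...). slab_cache_bytes() is a made-up helper name, and num_objs * objsize ignores per-slab overhead, so treat the result as an estimate only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one data line in the assumed /proc/slabinfo 2.x layout:
 *   name <active_objs> <num_objs> <objsize> ...
 * Returns 1 and fills name/bytes on success, 0 for header or malformed
 * lines. Hypothetical helper, not a kernel or libc interface.
 */
static int slab_cache_bytes(const char *line, char name[64],
                            unsigned long long *bytes)
{
    unsigned long active, total, objsize;

    if (sscanf(line, "%63s %lu %lu %lu", name, &active, &total, &objsize) != 4)
        return 0;

    /* Approximate footprint: allocated objects times object size. */
    *bytes = (unsigned long long)total * objsize;
    return 1;
}
```

A caller would read /proc/slabinfo line by line (it is usually readable by root only), collect the (name, bytes) pairs, sort them by bytes and print the first ten.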
[BTRFS-specific] Re: Dirty deleted files cause pointless I/O storms (unless truncated first)
[cc: btrfs] On Mon, Jan 20, 2014 at 8:46 PM, Dave Chinner wrote: > On Mon, Jan 20, 2014 at 04:59:23PM -0800, Andy Lutomirski wrote: >> The code below runs quickly for a few iterations, and then it slows >> down and the whole system becomes laggy for far too long. >> >> Removing the sync_file_range call results in no I/O being performed at >> all (which means that the kernel isn't totally screwing this up), and >> changing "4096" to SIZE causes lots of I/O but without >> the going-out-to-lunch bit (unsurprisingly). > > More details please. hardware, storage, kernel version, etc. The kernel is 3.11.10-301.fc20.x86_64. It's an excessively fast CPU (Intel i7-3930K) with 16GB RAM and a Corsair Force 3 SSD (6Gb/s SATA). The FS is btrfs on LVM on dm-crypt. In that setup, this thing goes quickly for 100 iterations or so, at which point even trying to Ctrl-C it lags out for ten seconds or so. I clearly should have tested more thoroughly, though -- I can't reproduce this problem on ext4. > > I can't reproduce any slowdown with the code as posted on a VM > running 3.31-rc5 with 16GB RAM and an SSD w/ ext4 or XFS. The > workload is only generating about 80 IOPS on ext4 so even a slow > spindle should be able to handle this without problems... > >> Surprisingly, uncommenting the ftruncate call seems to fix the >> problem. This suggests that all the necessary infrastructure to avoid >> wasting time writing to deleted files is there but that it's not >> getting used. > > Not surprising at all - if it's stuck in a writeback loop somewhere, > truncating the file will terminate writeback because it ends up being > past EOF and so stops immediately... Presumably ext4 and xfs are smart enough to stop writeback when the inode is gone, but btrfs is still either keeping the inode alive or just finishes writeback anyway.
--Andy

#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIZE (16 * 1048576)

static void hammer(const char *name)
{
    int fd = open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        err(1, "open");

    fallocate(fd, 0, 0, SIZE);

    /* Dirty the whole file through a shared mapping. */
    void *addr = mmap(NULL, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        err(1, "mmap");
    memset(addr, 0, SIZE);
    if (munmap(addr, SIZE) != 0)
        err(1, "munmap");

    /* Write back only the first 4096 bytes, then delete the file. */
    if (sync_file_range(fd, 0, 4096,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER) != 0)
        err(1, "sync_file_range");

    if (unlink(name) != 0)
        err(1, "unlink");

    // if (ftruncate(fd, 0) != 0)
    //     err(1, "ftruncate");

    close(fd);
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        printf("Usage: hammer_and_delete FILENAME\n");
        return 1;
    }

    while (true) {
        hammer(argv[1]);
        write(1, ".", 1);
    }
}
[PATCH v2] media: i2c: mt9p031: Check return value of clk_prepare_enable/clk_set_rate
From: "Lad, Prabhakar" clk_set_rate(), clk_prepare_enable() functions can fail, so check the return values to avoid surprises. Signed-off-by: Lad, Prabhakar --- Changes for v2: 1: Called regulator_bulk_disable() in the error path drivers/media/i2c/mt9p031.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/media/i2c/mt9p031.c b/drivers/media/i2c/mt9p031.c index e5ddf47..05278f5 100644 --- a/drivers/media/i2c/mt9p031.c +++ b/drivers/media/i2c/mt9p031.c @@ -222,12 +222,15 @@ static int mt9p031_clk_setup(struct mt9p031 *mt9p031) struct i2c_client *client = v4l2_get_subdevdata(&mt9p031->subdev); struct mt9p031_platform_data *pdata = mt9p031->pdata; + int ret; mt9p031->clk = devm_clk_get(&client->dev, NULL); if (IS_ERR(mt9p031->clk)) return PTR_ERR(mt9p031->clk); - clk_set_rate(mt9p031->clk, pdata->ext_freq); + ret = clk_set_rate(mt9p031->clk, pdata->ext_freq); + if (ret < 0) + return ret; mt9p031->pll.ext_clock = pdata->ext_freq; mt9p031->pll.pix_clock = pdata->target_freq; @@ -286,8 +289,14 @@ static int mt9p031_power_on(struct mt9p031 *mt9p031) return ret; /* Emable clock */ - if (mt9p031->clk) - clk_prepare_enable(mt9p031->clk); + if (mt9p031->clk) { + ret = clk_prepare_enable(mt9p031->clk); + if (ret) { + regulator_bulk_disable(ARRAY_SIZE(mt9p031->regulators), + mt9p031->regulators); + return ret; + } + } /* Now RESET_BAR must be high */ if (gpio_is_valid(mt9p031->reset)) { -- 1.7.9.5
Re: [GIT PULL] x86/kaslr for v3.14
On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds wrote: > So I pulled this, but one question: > > On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin wrote: >> +config RANDOMIZE_BASE >> + bool "Randomize the address of the kernel image" >> + depends on RELOCATABLE >> + depends on !HIBERNATION > > How fundamental is that "!HIBERNATION" issue? Right now that > anti-dependency on hibernation support will mean that no distro kernel > will actually use the kernel address space randomization. Which > long-term is a problem. > > I'm not sure HIBERNATION is really getting all that much use, but I > suspect distros would still want to support it. > > Is it just a temporary "I wasn't able to make it work, need to get > some PM people involved", or is it something really fundamental? Right, this is a "need to get PM people involved" situation. When kASLR was being designed, hibernation learning about the kernel base looked like a separable problem, and given the very long list of requirements for making it work at all, I carved this out as "future work". As for perf, it's similar -- it's another entirely solvable problem, but perf needs to be untaught some of its assumptions. We've had a static kernel base forever, so I'm expecting some bumps in the road here. I'm hopeful none of it will be too painful, though. -Kees -- Kees Cook Chrome OS Security
[tip:x86/x32] uapi: Use __kernel_ulong_t in shmid64_ds/shminfo64/shm_info
Commit-ID: f8dcdf0130d3ba34f8f7531af7c45616efe1e32e Gitweb: http://git.kernel.org/tip/f8dcdf0130d3ba34f8f7531af7c45616efe1e32e Author: H.J. Lu AuthorDate: Fri, 27 Dec 2013 14:14:23 -0800 Committer: H. Peter Anvin CommitDate: Mon, 20 Jan 2014 14:45:25 -0800 uapi: Use __kernel_ulong_t in shmid64_ds/shminfo64/shm_info Both x32 and x86-64 use the same struct shmid64_ds/shminfo64/shm_info for system calls. But x32 long is 32-bit. This patch replaces unsigned long with __kernel_ulong_t in struct shmid64_ds/shminfo64/shm_info. Signed-off-by: H.J. Lu Link: http://lkml.kernel.org/r/1388182464-28428-8-git-send-email-hjl.to...@gmail.com Signed-off-by: H. Peter Anvin --- include/uapi/asm-generic/shmbuf.h | 24 include/uapi/linux/shm.h | 10 +- 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/include/uapi/asm-generic/shmbuf.h b/include/uapi/asm-generic/shmbuf.h index 5768fa6..7e9fb2f 100644 --- a/include/uapi/asm-generic/shmbuf.h +++ b/include/uapi/asm-generic/shmbuf.h @@ -39,21 +39,21 @@ struct shmid64_ds { #endif __kernel_pid_t shm_cpid; /* pid of creator */ __kernel_pid_t shm_lpid; /* pid of last operator */ - unsigned long shm_nattch; /* no. of current attaches */ - unsigned long __unused4; - unsigned long __unused5; + __kernel_ulong_t shm_nattch; /* no. of current attaches */ + __kernel_ulong_t __unused4; + __kernel_ulong_t __unused5; }; struct shminfo64 { - unsigned long shmmax; - unsigned long shmmin; - unsigned long shmmni; - unsigned long shmseg; - unsigned long shmall; - unsigned long __unused1; - unsigned long __unused2; - unsigned long __unused3; - unsigned long __unused4; + __kernel_ulong_t shmmax; + __kernel_ulong_t shmmin; + __kernel_ulong_t shmmni; + __kernel_ulong_t shmseg; + __kernel_ulong_t shmall; + __kernel_ulong_t __unused1; + __kernel_ulong_t __unused2; + __kernel_ulong_t __unused3; + __kernel_ulong_t __unused4; }; #endif /* __ASM_GENERIC_SHMBUF_H */ diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h index ec36fa1..78b6941 100644 --- a/include/uapi/linux/shm.h +++ b/include/uapi/linux/shm.h @@ -68,11 +68,11 @@ struct shminfo { struct shm_info { int used_ids; - unsigned long shm_tot; /* total allocated shm */ - unsigned long shm_rss; /* total resident shm */ - unsigned long shm_swp; /* total swapped shm */ - unsigned long swap_attempts; - unsigned long swap_successes; + __kernel_ulong_t shm_tot; /* total allocated shm */ + __kernel_ulong_t shm_rss; /* total resident shm */ + __kernel_ulong_t shm_swp; /* total swapped shm */ + __kernel_ulong_t swap_attempts; + __kernel_ulong_t swap_successes; };
Re: [PATCH 2/3] mm/memcg: fix endless iteration in reclaim
On Fri, 17 Jan 2014, Michal Hocko wrote: > On Thu 16-01-14 11:15:36, Hugh Dickins wrote: > > > I don't believe 19f39402864e was responsible for a reference leak, > > that came later. But I think it was responsible for the original > > endless iteration (shrink_zone going around and around getting root > > again and again from mem_cgroup_iter). > > So your hang is not within mem_cgroup_iter but you are getting root all > the time without any way out? In the 3.10 and 3.11 cases, yes. > > [3.10 code base] > shrink_zone > [rmdir root] > mem_cgroup_iter(root, NULL, reclaim) > // prev = NULL > rcu_read_lock() > last_visited = iter->last_visited // gets root || NULL > css_tryget(last_visited) // failed > last_visited = NULL [1] > memcg = root = __mem_cgroup_iter_next(root, NULL) > iter->last_visited = root; > reclaim->generation = iter->generation > > mem_cgroup_iter(root, root, reclaim) >// prev = root >rcu_read_lock > last_visited = iter->last_visited // gets root > css_tryget(last_visited) // failed > [1] > > So we indeed can loop here without any progress. I just fail > to see how my patch could help. We even do not get down to > cgroup_next_descendant_pre. > > Or am I missing something? Your patch to 3.12 and 3.13 mem_cgroup_iter_next() doesn't help in 3.10 and 3.11, correct. That's why I appended a different patch, to mem_cgroup_iter(), for the 3.10 and 3.11 versions of the hang. 
> > The following should fix this kind of endless loop: > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 194721839cf5..168e5abcca92 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1221,7 +1221,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup > *root, > smp_rmb(); > last_visited = iter->last_visited; > if (last_visited && > - !css_tryget(&last_visited->css)) > + last_visited != root && > + !css_tryget(&last_visited->css)) > last_visited = NULL; > } > } > @@ -1229,7 +1230,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup > *root, > memcg = __mem_cgroup_iter_next(root, last_visited); > > if (reclaim) { > - if (last_visited) > + if (last_visited && last_visited != root) > css_put(&last_visited->css); > > iter->last_visited = memcg; Right, that appears to fix 3.10, and seems a better alternative to the patch I suggested. I say "appears" because my success in reproducing the hang is variable, so when I see that it's "fixed" I cannot be quite sure. I say "seems" because I think yours respects the intention of the iterator better than mine, but I've never been convinced that the iterator is as sensible as it intends in the face of races. At the bottom I've appended the version of yours that I've been trying on 3.11. I did succeed in reproducing the hang twice on 3.11.10.3 (which I don't think differs in any essential from 3.11.0 for this issue, but after my lack of success with 3.11.0 I tried harder with that.) More so than in the 3.10 case, I haven't really given it long enough with the patch to really assert that it's good; and Greg Thelen came across a different reproduction case that I've yet to remind myself of and try, I'll have to report back to you later in the week when I've run that with your fix. > > Not that I like it much :/ Well, I'm not in love with it, but I do think it's more appropriate than mine, if it really does fix the issues.
It was only under questioning from you that we arrived at the belief that the problem is with the css_tryget of a root being removed: my patch was vaguer than that, not identifying the root cause. I suspect that the underlying problem is actually the "do {} while ()" nature of the iteration loops, instead of "while () {}"s. That places us (not for the first time) in the awkward position of having to supply something once (and once only) even when it doesn't really fit. (I have wondered whether making mem_cgroup_invalidate_reclaim_iterators visit the memcg as well as its parents, might provide another fix; nice if it did, but I doubt it, and have spent so much time fiddling around here that I've lost the will to try anything else.) > > > But beware of my conclusion, please check for yourself: with my > > separate kbuilds in separate /cg/cg/? memcgs, what "cg m" is doing > > is very simple and segregated, can hardly be called testing reclaim > > iteration, so I hope you have something better to check it. Plus > > I was testing on 3.10 and 3.11 vanilla, not latest stable versions. > > > > (If I'm very honest, I'll admit that I
[tip:x86/x32] uapi: Use __kernel_long_t in struct mq_attr
Commit-ID: 63159f5dcccb3858d88aaef800c4ee0eb4cc8577 Gitweb: http://git.kernel.org/tip/63159f5dcccb3858d88aaef800c4ee0eb4cc8577 Author: H.J. Lu AuthorDate: Fri, 27 Dec 2013 14:14:24 -0800 Committer: H. Peter Anvin CommitDate: Mon, 20 Jan 2014 14:45:33 -0800 uapi: Use __kernel_long_t in struct mq_attr Both x32 and x86-64 use the same struct mq_attr for system calls. But x32 long is 32-bit. This patch replaces long with __kernel_long_t in struct mq_attr. Signed-off-by: H.J. Lu Link: http://lkml.kernel.org/r/1388182464-28428-9-git-send-email-hjl.to...@gmail.com Signed-off-by: H. Peter Anvin --- include/uapi/linux/mqueue.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h index 8b5a796..d0a2b8e 100644 --- a/include/uapi/linux/mqueue.h +++ b/include/uapi/linux/mqueue.h @@ -23,11 +23,11 @@ #define MQ_BYTES_MAX 819200 struct mq_attr { - long mq_flags; /* message queue flags */ - long mq_maxmsg; /* maximum number of messages */ - long mq_msgsize; /* maximum message size */ - long mq_curmsgs; /* number of messages currently queued */ - long __reserved[4]; /* ignored for input, zeroed for output */ + __kernel_long_t mq_flags; /* message queue flags */ + __kernel_long_t mq_maxmsg; /* maximum number of messages */ + __kernel_long_t mq_msgsize; /* maximum message size */ + __kernel_long_t mq_curmsgs; /* number of messages currently queued */ + __kernel_long_t __reserved[4]; /* ignored for input, zeroed for output */ }; /*
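To make the motivation concrete, here is a small userspace sketch of the layout mismatch the commit message describes. It assumes fixed-width types as stand-ins: int32_t for a 32-bit x32 long and int64_t for the 64-bit value the x86-64 system call expects. The struct names mq_attr_ilp32 and mq_attr_kernel and the helper layout_mismatch_bytes() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout an x32 program would see if the header kept plain "long". */
struct mq_attr_ilp32 {
    int32_t mq_flags;
    int32_t mq_maxmsg;
    int32_t mq_msgsize;
    int32_t mq_curmsgs;
    int32_t reserved[4];
};

/* Layout the shared x86-64 system call reads and writes. */
struct mq_attr_kernel {
    int64_t mq_flags;
    int64_t mq_maxmsg;
    int64_t mq_msgsize;
    int64_t mq_curmsgs;
    int64_t reserved[4];
};

/* Number of bytes by which the two views of the structure disagree. */
static size_t layout_mismatch_bytes(void)
{
    return sizeof(struct mq_attr_kernel) - sizeof(struct mq_attr_ilp32);
}
```

With these stand-ins the user-side structure is 32 bytes and the kernel-side one is 64, and every field after the first sits at a different offset, which is the kind of mismatch a 64-bit member type in the uapi header avoids.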
[tip:x86/x32] uapi: Use __kernel_ulong_t in struct msqid64_ds
Commit-ID: b9cd5ca22d6739c61655d4fcf8b29669d5d177a3 Gitweb: http://git.kernel.org/tip/b9cd5ca22d6739c61655d4fcf8b29669d5d177a3 Author: H.J. Lu AuthorDate: Fri, 27 Dec 2013 14:14:21 -0800 Committer: H. Peter Anvin CommitDate: Mon, 20 Jan 2014 14:45:01 -0800 uapi: Use __kernel_ulong_t in struct msqid64_ds Both x32 and x86-64 use the same struct msqid64_ds for system calls. But x32 long is 32-bit. This patch replaces unsigned long with __kernel_ulong_t in struct msqid64_ds. Signed-off-by: H.J. Lu Link: http://lkml.kernel.org/r/1388182464-28428-6-git-send-email-hjl.to...@gmail.com Signed-off-by: H. Peter Anvin --- include/uapi/asm-generic/msgbuf.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include/uapi/asm-generic/msgbuf.h b/include/uapi/asm-generic/msgbuf.h index aec850d..f55ecc4 100644 --- a/include/uapi/asm-generic/msgbuf.h +++ b/include/uapi/asm-generic/msgbuf.h @@ -35,13 +35,13 @@ struct msqid64_ds { #if __BITS_PER_LONG != 64 unsigned long __unused3; #endif - unsigned long msg_cbytes; /* current number of bytes on queue */ - unsigned long msg_qnum; /* number of messages in queue */ - unsigned long msg_qbytes; /* max number of bytes on queue */ + __kernel_ulong_t msg_cbytes; /* current number of bytes on queue */ + __kernel_ulong_t msg_qnum; /* number of messages in queue */ + __kernel_ulong_t msg_qbytes; /* max number of bytes on queue */ __kernel_pid_t msg_lspid; /* pid of last msgsnd */ __kernel_pid_t msg_lrpid; /* last receive pid */ - unsigned long __unused4; - unsigned long __unused5; + __kernel_ulong_t __unused4; + __kernel_ulong_t __unused5; }; #endif /* __ASM_GENERIC_MSGBUF_H */
[tip:x86/x32] x86, uapi, x32: Use __kernel_ulong_t in x86 struct semid64_ds
Commit-ID: 386916598e901e406c1f1fc801ade2646a1e8137 Gitweb: http://git.kernel.org/tip/386916598e901e406c1f1fc801ade2646a1e8137 Author: H.J. Lu AuthorDate: Fri, 27 Dec 2013 14:14:22 -0800 Committer: H. Peter Anvin CommitDate: Mon, 20 Jan 2014 14:45:13 -0800 x86, uapi, x32: Use __kernel_ulong_t in x86 struct semid64_ds Both x32 and x86-64 use the same struct semid64_ds for system calls. But x32 long is 32-bit. This patch replaces unsigned long with __kernel_ulong_t in x86 struct semid64_ds. Signed-off-by: H.J. Lu Link: http://lkml.kernel.org/r/1388182464-28428-7-git-send-email-hjl.to...@gmail.com Signed-off-by: H. Peter Anvin --- arch/x86/include/uapi/asm/sembuf.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/uapi/asm/sembuf.h b/arch/x86/include/uapi/asm/sembuf.h index ee50c80..cc2d6a3 100644 --- a/arch/x86/include/uapi/asm/sembuf.h +++ b/arch/x86/include/uapi/asm/sembuf.h @@ -13,12 +13,12 @@ struct semid64_ds { struct ipc64_perm sem_perm; /* permissions .. see ipc.h */ __kernel_time_t sem_otime; /* last semop time */ - unsigned long __unused1; + __kernel_ulong_t __unused1; __kernel_time_t sem_ctime; /* last change time */ - unsigned long __unused2; - unsigned long sem_nsems; /* no. of semaphores in array */ - unsigned long __unused3; - unsigned long __unused4; + __kernel_ulong_t __unused2; + __kernel_ulong_t sem_nsems; /* no. of semaphores in array */ + __kernel_ulong_t __unused3; + __kernel_ulong_t __unused4; }; #endif /* _ASM_X86_SEMBUF_H */
[tip:x86/x32] uapi, asm-generic: Use __kernel_ulong_t in uapi struct ipc64_perm
Commit-ID: 071ed2456f79722d0a54f51717e66aacbc7a5d26 Gitweb: http://git.kernel.org/tip/071ed2456f79722d0a54f51717e66aacbc7a5d26 Author: H.J. Lu AuthorDate: Fri, 27 Dec 2013 14:14:19 -0800 Committer: H. Peter Anvin CommitDate: Mon, 20 Jan 2014 14:44:35 -0800 uapi, asm-generic: Use __kernel_ulong_t in uapi struct ipc64_perm x32 IPC system call is the same as x86-64 IPC system call, which uses 64-bit integer for unsigned long in struct ipc64_perm. But x32 long is 32 bit. This patch replaces unsigned long in uapi struct ipc64_perm with __kernel_ulong_t. Signed-off-by: H.J. Lu Link: http://lkml.kernel.org/r/1388182464-28428-4-git-send-email-hjl.to...@gmail.com Signed-off-by: H. Peter Anvin --- include/uapi/asm-generic/ipcbuf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/asm-generic/ipcbuf.h b/include/uapi/asm-generic/ipcbuf.h index 76982b2..3dbcc1e 100644 --- a/include/uapi/asm-generic/ipcbuf.h +++ b/include/uapi/asm-generic/ipcbuf.h @@ -27,8 +27,8 @@ struct ipc64_perm { unsigned char __pad1[4 - sizeof(__kernel_mode_t)]; unsigned short seq; unsigned short __pad2; - unsigned long __unused1; - unsigned long __unused2; + __kernel_ulong_t __unused1; + __kernel_ulong_t __unused2; }; #endif /* __ASM_GENERIC_IPCBUF_H */
[tip:x86/x32] uapi: Use __kernel_long_t in struct timex
Commit-ID: 7fb30128527a4220f181c2867edd9ac178175a87 Gitweb: http://git.kernel.org/tip/7fb30128527a4220f181c2867edd9ac178175a87 Author: H.J. Lu AuthorDate: Fri, 27 Dec 2013 14:14:17 -0800 Committer: H. Peter Anvin CommitDate: Mon, 20 Jan 2014 14:44:05 -0800 uapi: Use __kernel_long_t in struct timex x32 adjtimex system call is the same as x86-64 adjtimex system call, which uses 64-bit integer for long in struct timex. But x32 long is 32 bit. This patch replaces long in struct timex with __kernel_long_t. Signed-off-by: H.J. Lu Link: http://lkml.kernel.org/r/1388182464-28428-2-git-send-email-hjl.to...@gmail.com Signed-off-by: H. Peter Anvin --- include/uapi/linux/timex.h | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/include/uapi/linux/timex.h b/include/uapi/linux/timex.h index a7ea81f..92685d8 100644 --- a/include/uapi/linux/timex.h +++ b/include/uapi/linux/timex.h @@ -63,27 +63,27 @@ */ struct timex { unsigned int modes; /* mode selector */ - long offset; /* time offset (usec) */ - long freq; /* frequency offset (scaled ppm) */ - long maxerror; /* maximum error (usec) */ - long esterror; /* estimated error (usec) */ + __kernel_long_t offset; /* time offset (usec) */ + __kernel_long_t freq; /* frequency offset (scaled ppm) */ + __kernel_long_t maxerror; /* maximum error (usec) */ + __kernel_long_t esterror; /* estimated error (usec) */ int status; /* clock command/status */ - long constant; /* pll time constant */ - long precision; /* clock precision (usec) (read only) */ - long tolerance; /* clock frequency tolerance (ppm) - * (read only) - */ + __kernel_long_t constant; /* pll time constant */ + __kernel_long_t precision; /* clock precision (usec) (read only) */ + __kernel_long_t tolerance; /* clock frequency tolerance (ppm) + * (read only) + */ struct timeval time; /* (read only, except for ADJ_SETOFFSET) */ - long tick; /* (modified) usecs between clock ticks */ + __kernel_long_t tick; /* (modified) usecs between clock ticks */ - long ppsfreq; /* pps frequency (scaled ppm) (ro) */ - long jitter; /* pps jitter (us) (ro) */ + __kernel_long_t ppsfreq; /* pps frequency (scaled ppm) (ro) */ + __kernel_long_t jitter; /* pps jitter (us) (ro) */ int shift; /* interval duration (s) (shift) (ro) */ - long stabil; /* pps stability (scaled ppm) (ro) */ - long jitcnt; /* jitter limit exceeded (ro) */ - long calcnt; /* calibration intervals (ro) */ - long errcnt; /* calibration errors (ro) */ - long stbcnt; /* stability limit exceeded (ro) */ + __kernel_long_t stabil; /* pps stability (scaled ppm) (ro) */ + __kernel_long_t jitcnt; /* jitter limit exceeded (ro) */ + __kernel_long_t calcnt; /* calibration intervals (ro) */ + __kernel_long_t errcnt; /* calibration errors (ro) */ + __kernel_long_t stbcnt; /* stability limit exceeded (ro) */ int tai; /* TAI offset (ro) */
[tip:x86/x32] uapi: Use __kernel_long_t in struct msgbuf
Commit-ID: 443d5670f77aab121cb95f45da60f0aad390bcb5 Gitweb: http://git.kernel.org/tip/443d5670f77aab121cb95f45da60f0aad390bcb5 Author: H.J. Lu AuthorDate: Fri, 27 Dec 2013 14:14:20 -0800 Committer: H. Peter Anvin CommitDate: Mon, 20 Jan 2014 14:44:50 -0800 uapi: Use __kernel_long_t in struct msgbuf x32 msgsnd/msgrcv system calls are the same as x86-64 msgsnd/msgrcv system calls, which use 64-bit integer for long in struct msgbuf. But x32 long is 32 bit. This patch replaces long in struct msgbuf with __kernel_long_t. Signed-off-by: H.J. Lu Link: http://lkml.kernel.org/r/1388182464-28428-5-git-send-email-hjl.to...@gmail.com Signed-off-by: H. Peter Anvin --- include/uapi/linux/msg.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/msg.h b/include/uapi/linux/msg.h index 22d95c6..a703755 100644 --- a/include/uapi/linux/msg.h +++ b/include/uapi/linux/msg.h @@ -34,8 +34,8 @@ struct msqid_ds { /* message buffer for msgsnd and msgrcv calls */ struct msgbuf { - long mtype; /* type of message */ - char mtext[1]; /* message text */ + __kernel_long_t mtype; /* type of message */ + char mtext[1]; /* message text */ }; /* buffer for msgctl calls IPC_INFO, MSG_INFO */
[tip:x86/x32] uapi: Use __kernel_long_t/__kernel_ulong_t in <linux/resource.h>
Commit-ID: b684bfedc94d4b2efff09dc499a9985321c482f5 Gitweb: http://git.kernel.org/tip/b684bfedc94d4b2efff09dc499a9985321c482f5 Author: H.J. Lu AuthorDate: Fri, 27 Dec 2013 14:14:18 -0800 Committer: H. Peter Anvin CommitDate: Mon, 20 Jan 2014 14:44:17 -0800 uapi: Use __kernel_long_t/__kernel_ulong_t in <linux/resource.h> Both x32 and x86-64 use the same struct rusage and struct rlimit for system calls. But x32 long is 32-bit. This patch changes uapi to use __kernel_long_t in struct rusage and __kernel_ulong_t in struct rlimit. Signed-off-by: H.J. Lu Link: http://lkml.kernel.org/r/1388182464-28428-3-git-send-email-hjl.to...@gmail.com Signed-off-by: H. Peter Anvin --- include/uapi/linux/resource.h | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/include/uapi/linux/resource.h b/include/uapi/linux/resource.h index e0ed284..36fb3b5 100644 --- a/include/uapi/linux/resource.h +++ b/include/uapi/linux/resource.h @@ -23,25 +23,25 @@ struct rusage { struct timeval ru_utime; /* user time used */ struct timeval ru_stime; /* system time used */ - long ru_maxrss; /* maximum resident set size */ - long ru_ixrss; /* integral shared memory size */ - long ru_idrss; /* integral unshared data size */ - long ru_isrss; /* integral unshared stack size */ - long ru_minflt; /* page reclaims */ - long ru_majflt; /* page faults */ - long ru_nswap; /* swaps */ - long ru_inblock; /* block input operations */ - long ru_oublock; /* block output operations */ - long ru_msgsnd; /* messages sent */ - long ru_msgrcv; /* messages received */ - long ru_nsignals; /* signals received */ - long ru_nvcsw; /* voluntary context switches */ - long ru_nivcsw; /* involuntary " */ + __kernel_long_t ru_maxrss; /* maximum resident set size */ + __kernel_long_t ru_ixrss; /* integral shared memory size */ + __kernel_long_t ru_idrss; /* integral unshared data size */ + __kernel_long_t ru_isrss; /* integral unshared stack size */ + __kernel_long_t ru_minflt; /* page reclaims */ + __kernel_long_t ru_majflt; /* page faults */ + __kernel_long_t ru_nswap; /* swaps */ + __kernel_long_t ru_inblock; /* block input operations */ + __kernel_long_t ru_oublock; /* block output operations */ + __kernel_long_t ru_msgsnd; /* messages sent */ + __kernel_long_t ru_msgrcv; /* messages received */ + __kernel_long_t ru_nsignals; /* signals received */ + __kernel_long_t ru_nvcsw; /* voluntary context switches */ + __kernel_long_t ru_nivcsw; /* involuntary " */ }; struct rlimit { - unsigned long rlim_cur; - unsigned long rlim_max; + __kernel_ulong_t rlim_cur; + __kernel_ulong_t rlim_max; }; #define RLIM64_INFINITY (~0ULL)
Re: [PATCH v3] arch: use ASM_NL instead of ';' for assembler new line character in the macro
Hi Mike,

On Saturday 18 January 2014 03:14 PM, Chen Gang wrote:
> Hello Maintainers:
>
> Please help check this patch when you have time.
>
> Thanks.

Do you know whose tree this is going to go through? I can take it through
ARC (but maybe for 3.15; however, it would be better if it went through mm
or some such).

-Vineet

>
> On 01/12/2014 09:59 AM, Chen Gang wrote:
>> For some assemblers, they use another character as newline in a macro
>> (e.g. arc uses '`'), so for generic assembly code, need use ASM_NL (a
>> macro) instead of ';' for it.
>>
>>
>> Signed-off-by: Chen Gang
>> Acked-by: Vineet Gupta
>> ---
>>  arch/arc/include/asm/linkage.h |  2 ++
>>  include/linux/linkage.h        | 19 ++++++++++++-------
>>  2 files changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arc/include/asm/linkage.h b/arch/arc/include/asm/linkage.h
>> index 0283e9e..66ee552 100644
>> --- a/arch/arc/include/asm/linkage.h
>> +++ b/arch/arc/include/asm/linkage.h
>> @@ -11,6 +11,8 @@
>>
>>  #ifdef __ASSEMBLY__
>>
>> +#define ASM_NL		 `	/* use '`' to mark new line in macro */
>> +
>>  /* Can't use the ENTRY macro in linux/linkage.h
>>   * gas considers ';' as comment vs. newline
>>   */
>> diff --git a/include/linux/linkage.h b/include/linux/linkage.h
>> index d3e8ad2..a6a42dd 100644
>> --- a/include/linux/linkage.h
>> +++ b/include/linux/linkage.h
>> @@ -6,6 +6,11 @@
>>  #include
>>  #include
>>
>> +/* Some toolchains use other characters (e.g. '`') to mark new line in macro */
>> +#ifndef ASM_NL
>> +#define ASM_NL		 ;
>> +#endif
>> +
>>  #ifdef __cplusplus
>>  #define CPP_ASMLINKAGE extern "C"
>>  #else
>> @@ -75,21 +80,21 @@
>>
>>  #ifndef ENTRY
>>  #define ENTRY(name) \
>> -  .globl name; \
>> -  ALIGN; \
>> -  name:
>> +	.globl name ASM_NL \
>> +	ALIGN ASM_NL \
>> +	name:
>>  #endif
>>  #endif /* LINKER_SCRIPT */
>>
>>  #ifndef WEAK
>>  #define WEAK(name)	   \
>> -	.weak name;	   \
>> +	.weak name ASM_NL   \
>>  	name:
>>  #endif
>>
>>  #ifndef END
>>  #define END(name) \
>> -  .size name, .-name
>> +	.size name, .-name
>>  #endif
>>
>>  /* If symbol 'name' is treated as a subroutine (gets called, and returns)
>> @@ -98,8 +103,8 @@
>>   */
>>  #ifndef ENDPROC
>>  #define ENDPROC(name) \
>> -  .type name, @function; \
>> -  END(name)
>> +	.type name, @function ASM_NL \
>> +	END(name)
>>  #endif
>>
>>  #endif
>>
>
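To see what the ASM_NL indirection buys, the expansion can be inspected from plain C with the preprocessor's stringification operator. This is a sketch, not part of the patch: STRINGIFY and entry_expansion() are hypothetical helpers, ENTRY() is collapsed onto one line, and ALIGN is left as an undefined token so it appears literally in the output.

```c
#include <string.h>

/* Default separator, as in the quoted include/linux/linkage.h hunk.
 * An architecture header may define ASM_NL first to override it. */
#ifndef ASM_NL
#define ASM_NL ;
#endif

/* Two-level stringification so the argument is macro-expanded first. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Same shape as the patched ENTRY(), collapsed to a single line. */
#define ENTRY(name) .globl name ASM_NL ALIGN ASM_NL name:

/* Hypothetical helper: the text the assembler would be handed for ENTRY(foo). */
static const char *entry_expansion(void)
{
	return STRINGIFY(ENTRY(foo));
}
```

Building the same file with a different ASM_NL defined ahead of the #ifndef changes only the separator in the returned string, which is the effect the arch/arc override relies on.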
Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages
Please check your MUA and don't break thread. On Tue, Jan 21, 2014 at 11:07:42AM +0800, Cai Liu wrote: > Thanks for your review. > > 2014/1/21 Minchan Kim : > > Hello Cai, > > > > On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote: > >> zswap can support multiple swapfiles. So we need to check > >> all zbud pool pages in zswap. > >> > >> Version 2: > >> * add *total_zbud_pages* in zbud to record all the pages in pools > >> * move the updating of pool pages statistics to > >> alloc_zbud_page/free_zbud_page to hide the details > >> > >> Signed-off-by: Cai Liu > >> --- > >> include/linux/zbud.h |2 +- > >> mm/zbud.c| 44 > >> mm/zswap.c |4 ++-- > >> 3 files changed, 35 insertions(+), 15 deletions(-) > >> > >> diff --git a/include/linux/zbud.h b/include/linux/zbud.h > >> index 2571a5c..1dbc13e 100644 > >> --- a/include/linux/zbud.h > >> +++ b/include/linux/zbud.h > >> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long > >> handle); > >> int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries); > >> void *zbud_map(struct zbud_pool *pool, unsigned long handle); > >> void zbud_unmap(struct zbud_pool *pool, unsigned long handle); > >> -u64 zbud_get_pool_size(struct zbud_pool *pool); > >> +u64 zbud_get_pool_size(void); > >> > >> #endif /* _ZBUD_H_ */ > >> diff --git a/mm/zbud.c b/mm/zbud.c > >> index 9451361..711aaf4 100644 > >> --- a/mm/zbud.c > >> +++ b/mm/zbud.c > >> @@ -52,6 +52,13 @@ > >> #include > >> #include > >> > >> +/* > >> +* statistics > >> +**/ > >> + > >> +/* zbud pages in all pools */ > >> +static u64 total_zbud_pages; > >> + > >> /* > >> * Structures > >> */ > >> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct > >> page *page) > >> return zhdr; > >> } > >> > >> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp) > >> +{ > >> + struct page *page; > >> + > >> + page = alloc_page(gfp); > >> + > >> + if (page) { > >> + pool->pages_nr++; > >> + total_zbud_pages++; > > > > Who protect 
race?
>
> Yes, here the pool->pages_nr and also the total_zbud_pages are not protected.
> I will re-do it.
>
> I will change *total_zbud_pages* to atomic type.

Wait, it doesn't make sense. Now you assume the zbud allocator will be used
only by zswap. That has been true until now, but we can't be sure of it in
the future. If another user starts to use the zbud allocator,
total_zbud_pages would be pointless.

Another concern: what's your scenario for the above two swaps? How often do
we need to call zbud_get_pool_size? In your previous patch you reduced the
number of calls so, IIRC, we only call it in zswap_is_full and for debugfs.
Of course, it would need some lock or refcount to prevent destruction of
zswap_tree in parallel with zswap_frontswap_invalidate_area, but
zswap_is_full doesn't need to be exact, so RCU would be a good fit.

The most important point is that zswap currently doesn't consider multiple
swaps. For example, assume you use two swaps, A and B, with different
priorities; A was charged 19% a long time ago and is now full, so the VM
starts to use B, and B has been charged 1% recently. That means zswap has
charged (19% + 1%) and is full by default. Then, if the VM wants to swap
out more pages into B, zbud_reclaim_page would evict one of the pages in
B's pool, and that would repeat continuously. It's a total LRU inversion
problem, and swap thrashing in B would happen.

Please describe your use case scenario, and if it's really a problem, we
need more surgery.

Thanks.

> For *pool->pages_nr*, one way is to use pool->lock to protect. But I
> think it is too heavy.
> So does it ok to change pages_nr to atomic type too?
> > > > > >> + } > >> + > >> + return page; > >> +} > >> + > >> + > >> /* Resets the struct page fields and frees the page */ > >> -static void free_zbud_page(struct zbud_header *zhdr) > >> +static void free_zbud_page(struct zbud_pool *pool, struct zbud_header > >> *zhdr) > >> { > >> __free_page(virt_to_page(zhdr)); > >> + > >> + pool->pages_nr--; > >> + total_zbud_pages--; > >> } > >> > >> /* > >> @@ -279,11 +304,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, > >> gfp_t gfp, > >> > >> /* Couldn't find unbuddied zbud page, create new one */ > >> spin_unlock(>lock); > >> - page = alloc_page(gfp); > >> + page = alloc_zbud_page(pool, gfp); > >> if (!page) > >> return -ENOMEM; > >> spin_lock(>lock); > >> - pool->pages_nr++; > >> zhdr = init_zbud_page(page); > >> bud = FIRST; > >> > >> @@ -349,8 +373,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long > >> handle) > >> if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { > >> /* zbud page is empty, free */ > >> list_del(>lru);
Re: Dirty deleted files cause pointless I/O storms (unless truncated first)
On Mon, Jan 20, 2014 at 04:59:23PM -0800, Andy Lutomirski wrote:
> The code below runs quickly for a few iterations, and then it slows
> down and the whole system becomes laggy for far too long.
>
> Removing the sync_file_range call results in no I/O being performed at
> all (which means that the kernel isn't totally screwing this up), and
> changing "4096" to SIZE causes lots of I/O but without
> the going-out-to-lunch bit (unsurprisingly).

More details, please: hardware, storage, kernel version, etc.

I can't reproduce any slowdown with the code as posted on a VM running
3.13-rc5 with 16GB RAM and an SSD w/ ext4 or XFS. The workload is only
generating about 80 IOPS on ext4, so even a slow spindle should be able
to handle this without problems...

> Surprisingly, uncommenting the ftruncate call seems to fix the
> problem. This suggests that all the necessary infrastructure to avoid
> wasting time writing to deleted files is there but that it's not
> getting used.

Not surprising at all - if it's stuck in a writeback loop somewhere,
truncating the file will terminate writeback because it ends up being
past EOF and so stops immediately...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
Re: [PATCH v3] input/uinput: add UI_GET_SYSNAME ioctl to retrieve the sysfs path
On Tue, Jan 21, 2014 at 08:56:51AM +1000, Peter Hutterer wrote: > On Mon, Jan 20, 2014 at 05:17:08PM -0500, Benjamin Tissoires wrote: > > On Mon, Jan 20, 2014 at 4:53 PM, Dmitry Torokhov > > wrote: > > > Hi Benjamin, > > > > > > On Fri, Jan 17, 2014 at 02:12:51PM -0500, Benjamin Tissoires wrote: > > >> Evemu [1] uses uinput to replay devices traces it has recorded. However, > > >> the way evemu uses uinput is slightly different from how uinput is > > >> supposed to be used. > > >> Evemu relies on libevdev, which creates the device node through uinput. > > >> It then injects events through the input device node directly (and it > > >> completely skips the uinput node). > > >> > > >> Currently, libevdev relies on an heuristic to guess which input node was > > >> created. The problem is that is heuristic is subjected to races between > > >> different uinput devices or even with physical devices. Having a way > > >> to retrieve the sysfs path allows us to find the event node without > > >> having to rely on this heuristic. > > > > > > I have been thinking about it and I think that providing tight coupling > > > between uinput and resulting event device is wrong thing to do. We do > > > allow sending input events through uinput interface and I think evemu > > > should be using it, instead of going halfway through uinput and halfway > > > though evdev. Replaying though uinput would actually be more correct as > > > it would involve the same code paths throgugh input core as with using > > > real devices (see input_event() vs. input_inject_event() that is used by > > > input handlers). > > > > > > > Yes, I am perfectly aware of the fact that evemu is not using uinput > > in the way it is intended to be. > > I agree that it should be using the uinput node to inject events but > > this means that only the process which has created the virtual device > > can access it. 
> > It seems weird, I know, but the typical use of evemu is
> > the following:
> > - in a first terminal: $> sudo evemu-device mydevice.desc
> > - In a second: $> sudo evemu-play /dev/input/event12 < mydevice.events
> >
> > It looks weird here, but it allows to inject different events
> > recording for the same virtual device node.
>
> it also allows replaying an event through the device it was recorded on.
> it's not always necessary or desirable to create a uinput device, sometimes
> replaying it through the actual device is better to reproduce a certain bug.

I was not saying that we should remove the ability to inject events through
evdev nodes, so I am not sure why you are bringing up your last point, but
from your and Benjamin's other mails I can see why going through evdev
(which has a separate device node) might be beneficial.

Benjamin, please clean up the issues brought up by David and I should be
able to apply the patch.

Thanks.

-- 
Dmitry
Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages
On Tue, Jan 21, 2014 at 11:07 AM, Cai Liu wrote: > Thanks for your review. > > 2014/1/21 Minchan Kim : >> Hello Cai, >> >> On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote: >>> zswap can support multiple swapfiles. So we need to check >>> all zbud pool pages in zswap. >>> >>> Version 2: >>> * add *total_zbud_pages* in zbud to record all the pages in pools >>> * move the updating of pool pages statistics to >>> alloc_zbud_page/free_zbud_page to hide the details >>> >>> Signed-off-by: Cai Liu >>> --- >>> include/linux/zbud.h |2 +- >>> mm/zbud.c| 44 >>> mm/zswap.c |4 ++-- >>> 3 files changed, 35 insertions(+), 15 deletions(-) >>> >>> diff --git a/include/linux/zbud.h b/include/linux/zbud.h >>> index 2571a5c..1dbc13e 100644 >>> --- a/include/linux/zbud.h >>> +++ b/include/linux/zbud.h >>> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long >>> handle); >>> int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries); >>> void *zbud_map(struct zbud_pool *pool, unsigned long handle); >>> void zbud_unmap(struct zbud_pool *pool, unsigned long handle); >>> -u64 zbud_get_pool_size(struct zbud_pool *pool); >>> +u64 zbud_get_pool_size(void); >>> >>> #endif /* _ZBUD_H_ */ >>> diff --git a/mm/zbud.c b/mm/zbud.c >>> index 9451361..711aaf4 100644 >>> --- a/mm/zbud.c >>> +++ b/mm/zbud.c >>> @@ -52,6 +52,13 @@ >>> #include >>> #include >>> >>> +/* >>> +* statistics >>> +**/ >>> + >>> +/* zbud pages in all pools */ >>> +static u64 total_zbud_pages; >>> + >>> /* >>> * Structures >>> */ >>> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct page >>> *page) >>> return zhdr; >>> } >>> >>> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp) >>> +{ >>> + struct page *page; >>> + >>> + page = alloc_page(gfp); >>> + >>> + if (page) { >>> + pool->pages_nr++; >>> + total_zbud_pages++; >> >> Who protect race? > > Yes, here the pool->pages_nr and also the total_zbud_pages are not protected. > I will re-do it. 
>
> I will change *total_zbud_pages* to atomic type.

And how about just adding total_zbud_pages++ and leaving pool->pages_nr in
its original place, where it is already protected by pool->lock?

> For *pool->pages_nr*, one way is to use pool->lock to protect. But I
> think it is too heavy.
> So does it ok to change pages_nr to atomic type too?
>

-- 
Regards,
--Bob
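The split suggested in this reply (per-pool count under the pool lock, global total as an atomic) can be modelled in userspace roughly as follows. This is a sketch under loud assumptions: every name here is hypothetical, a pthread mutex stands in for pool->lock, and C11 atomics stand in for the kernel's atomic types; it is not the zbud code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct zbud_pool. */
struct model_pool {
	pthread_mutex_t lock;	/* plays the role of pool->lock */
	uint64_t pages_nr;	/* only touched with lock held */
};

/* Pages in all pools; atomic so that no lock is shared between pools. */
static atomic_uint_fast64_t model_total_pages;

static void model_pool_init(struct model_pool *pool)
{
	pthread_mutex_init(&pool->lock, NULL);
	pool->pages_nr = 0;
}

/* Account one newly allocated page to the pool and to the global total. */
static void model_account_alloc(struct model_pool *pool)
{
	pthread_mutex_lock(&pool->lock);
	pool->pages_nr++;
	pthread_mutex_unlock(&pool->lock);
	atomic_fetch_add(&model_total_pages, 1);
}

/* Undo the accounting when a page is freed. */
static void model_account_free(struct model_pool *pool)
{
	pthread_mutex_lock(&pool->lock);
	pool->pages_nr--;
	pthread_mutex_unlock(&pool->lock);
	atomic_fetch_sub(&model_total_pages, 1);
}

/* Lock-free read of the global total, analogous to a pool-size query. */
static uint64_t model_total(void)
{
	return atomic_load(&model_total_pages);
}
```

The design point is that the per-pool counter never needs to become atomic, since it is only updated where the pool lock is taken anyway, while readers of the total take no lock at all.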
Re: [PATCH 06/20] ARM64 / ACPI: Introduce some PCI functions when PCI is enabled
On 2014-1-21 2:39, Arnd Bergmann wrote:
> On Monday 20 January 2014, Hanjun Guo wrote:
>>>> acpi_register_ioapic()/acpi_unregister_ioapic() will be used for IOAPIC
>>>> hotplug and GIC distributor is something like IOAPIC on x86, so I think
>>>> these two functions can be reserved for future use.
>>> But GIC is not hotplugged, is it? It still sounds x86 specific to me.
>>
>> Well, if we want to do physical CPU hotplug on ARM/ARM64 (maybe years
>> later?),
>> then GIC add/remove is needed because we have to remove GIC
>> on the SoC too when we remove the physical CPU.
>
> In general, I recommend not planning for the future in kernel code when you
> don't know what is going to happen. It's always easy enough to change
> things once you get there, as long as no stable ABI is involved.

Ok, I agree with you.

>
> I just looked at the caller of these functions, and found a self-contained
> PCI driver in drivers/pci/ioapic.c, which uses two separate PCI classes for
> ioapic and ioxapic. I think it's a safe assumption to say that even if we
> get ARM CPU+GIC hotplug, that would not use the same ioapic driver. This
> driver is currently marked X86-only, and that should probably stay this way,
> so you won't need the hooks.

Will find a suitable way to fix that in the next version, thanks for your
comments :)

Hanjun
Re: [PATCH 1/3] ACPI / idle: Move idle_boot_override out of the arch directory
On 2014-1-21 7:34, Rafael J. Wysocki wrote: > On Monday, January 20, 2014 10:08:41 PM Hanjun Guo wrote: >> On 2014年01月18日 21:47, Rafael J. Wysocki wrote: >>> On Saturday, January 18, 2014 11:52:18 AM Hanjun Guo wrote: On 2014-1-18 11:45, Hanjun Guo wrote: > On 2014-1-17 20:06, Sudeep Holla wrote: >> On 17/01/14 02:03, Hanjun Guo wrote: >>> Move idle_boot_override out of the arch directory to be a single enum >>> including both platforms values, this will make it rather easier to >>> avoid ifdefs around which definitions are for which processor in >>> generally used ACPI code. >>> >>> IDLE_FORCE_MWAIT for IA64 is not used anywhere, so romove it. >>> >>> No functional change in this patch. >>> >>> Suggested-by: Alan >>> Signed-off-by: Hanjun Guo >>> --- [...] >>> diff --git a/include/linux/cpu.h b/include/linux/cpu.h >>> index 03e235ad..e324561 100644 >>> --- a/include/linux/cpu.h >>> +++ b/include/linux/cpu.h >>> @@ -220,6 +220,14 @@ void cpu_idle(void); >>> >>> void cpu_idle_poll_ctrl(bool enable); >>> >>> +enum idle_boot_override { >>> + IDLE_NO_OVERRIDE = 0, >>> + IDLE_HALT, >>> + IDLE_NOMWAIT, >>> + IDLE_POLL, >>> + IDLE_POWERSAVE_OFF >>> +}; >>> + >> I do understand the idea behind this change, but IMO HALT and MWAIT are >> x86 >> specific and may not make sense for other architectures. > yes, this is the strange part, the value is arch-dependent. > >> It will also require every architecture using ACPI to export >> boot_option_idle_override which may not be really required. > so, how about forget this patch and move boot_option_idle_override > related code into arch directory such as arch/x86/acpi/boot.c for > x86? The general idea is that we can move all the arch-dependent codes in ACPI driver to arch directory, then make codes in drivers/acpi/ arch independent. >>> Well, MWAIT is arch-dependent, so I'm not sure how IDLE_NOMWAIT fits into >>> include/linux/cpu.h? >> >> So you will not happy with this patch and should find another solution? 
>
> No, I'm not happy with it.
>
> If you want to move that to an arch-agnostic header, the symbol names cannot
> be arch-dependent any more.

Ok, will find another solution for that, thanks for your comments :)

Hanjun
Re: linux-next: build failure after merge of the tip tree
On Mon, 2014-01-20 at 22:51 +0100, Peter Zijlstra wrote:

> I'm still waiting for someone to explain what's wrong with:
>
> static inline void mwait_idle(void)
> {
> 	local_irq_enable();
> 	mwait_idle_with_hints(0, 0);
> }

How about just doing that going forward (it works, and can always be fixed
if something turns up), and the below for stable once it hits mainline?
The Q6600 box is a happy camper in all trees.

From: Len Brown

x86 idle: restore mwait_idle()

In Linux-3.9 we removed the mwait_idle() loop:
'x86 idle: remove mwait_idle() and "idle=mwait" cmdline param'
(69fb3676df3329a7142803bb3502fa59dc0db2e3)

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT loop,
until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
   MWAIT-C1 is preferred for optimal power and performance.
   But if they support just C1, cpuidle never loads and
   so they use the boot-time default idle loop forever.

2. Some laptops will boot-hang if HALT is used,
   but will boot successfully if MWAIT is used.
   This appears to be a hidden assumption in BIOS SMI,
   that is presumably valid on the proprietary OS
   where the BIOS was validated.

       https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:
For 3.9, simply revert 69fb3676df
for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
For 3.11, 3.12, 3.13, this patch applies cleanly

Mike: reinstate polling, and add clflush barriers.

Cc: Mike Galbraith
Cc: Ian Malone
Cc: Josh Boyer
Cc: # 3.9, 3.10, 3.11, 3.12, 3.13
Signed-off-by: Mike Galbraith
Signed-off-by: Len Brown
---
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95ab8b5..c5db2a43e730 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -398,6 +398,52 @@ static void amd_e400_idle(void)
 		default_idle();
 }
 
+/*
+ * Intel Core2 and older machines prefer MWAIT over HALT for C1.
+ * We can't rely on cpuidle installing MWAIT, because it will not load
+ * on systems that support only C1 -- so the boot default must be MWAIT.
+ *
+ * Some AMD machines are the opposite, they depend on using HALT.
+ *
+ * So for default C1, which is used during boot until cpuidle loads,
+ * use MWAIT-C1 on Intel HW that has it, else use HALT.
+ */
+static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
+{
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		return 0;
+
+	if (!cpu_has(c, X86_FEATURE_MWAIT))
+		return 0;
+
+	return 1;
+}
+
+/*
+ * MONITOR/MWAIT with no hints, used for default C1 state.
+ * This invokes MWAIT with interrupts enabled and no flags,
+ * which is backwards compatible with the original MWAIT implementation.
+ */
+
+static void mwait_idle(void)
+{
+	if (!current_set_polling_and_test()) {
+		if (static_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) {
+			mb();
+			clflush((void *)&current_thread_info()->flags);
+			mb();
+		}
+
+		__monitor((void *)&current_thread_info()->flags, 0, 0);
+		if (!need_resched())
+			__sti_mwait(0, 0);
+		else
+			local_irq_enable();
+	} else
+		local_irq_enable();
+	__current_clr_polling();
+}
+
 void select_idle_routine(const struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
@@ -411,6 +457,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
 		/* E400: APIC timer interrupt does not wake up CPU from C1e */
 		pr_info("using AMD E400 aware idle routine\n");
 		x86_idle = amd_e400_idle;
+	} else if (prefer_mwait_c1_over_halt(c)) {
+		pr_info("using mwait in idle threads\n");
+		x86_idle = mwait_idle;
 	} else
 		x86_idle = default_idle;
 }
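The selection order that the second hunk introduces can be restated as a small userspace model. This is only a sketch: the struct, enums, and function names are hypothetical stand-ins for the kernel's cpuinfo_x86 and feature checks, and only the branch visible in the quoted hunk is modelled (any earlier checks in select_idle_routine() are left out).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's vendor IDs. */
enum model_vendor { MODEL_VENDOR_INTEL, MODEL_VENDOR_AMD, MODEL_VENDOR_OTHER };

/* Hypothetical stand-in for the parts of cpuinfo_x86 the decision reads. */
struct model_cpu {
	enum model_vendor vendor;
	bool has_mwait;		/* models cpu_has(c, X86_FEATURE_MWAIT) */
	bool has_e400_erratum;	/* models the AMD E400 check */
};

enum model_idle { MODEL_IDLE_HALT, MODEL_IDLE_E400, MODEL_IDLE_MWAIT };

/* Mirrors prefer_mwait_c1_over_halt(): Intel CPUs with MWAIT only. */
static bool model_prefer_mwait(const struct model_cpu *c)
{
	return c->vendor == MODEL_VENDOR_INTEL && c->has_mwait;
}

/* Mirrors the order of checks in the patched if/else chain. */
static enum model_idle model_select_idle(const struct model_cpu *c)
{
	if (c->has_e400_erratum)
		return MODEL_IDLE_E400;
	if (model_prefer_mwait(c))
		return MODEL_IDLE_MWAIT;
	return MODEL_IDLE_HALT;
}
```

Note that an AMD CPU advertising MWAIT still falls through to HALT in this model, matching the comment in the patch that some AMD machines depend on it.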
Re: 3.12-rc5 and overwritten partition table - by powertop?
On 10/29/2013 04:32 PM, John Twideldum wrote: The first ~170kb of /dev/sda got blown away with what seems to be a logging output by Powertop, when I was playing with the tuneables. So did you log the output to some file? I'm just trying to understand how it could get onto your disk in the first place... Attached a dump of the first 1Mb of the disk, HTH. It looks like a powertop log? (I have powertop 2.4) Yes, likely. But it is strange the corruption doesn't even end at any sensible boundary (data ends at offset 0x27b53). Shrug... My recollection what I did is this: I was looking into powertop and observing how -rc5 works now with Haswell. I saw the tuneable parameters and quite a few were "bad", so I set them to "good". Power usage dropped about one third - yay! However, changing "SATA link power" threw up complaints: Oct 29 09:09:21 localhost kernel: [ 3697.423868] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0xc action 0x6 frozen Oct 29 09:09:21 localhost kernel: [ 3697.423873] ata1.00: irq_stat 0x0800, interface fatal error Oct 29 09:09:21 localhost kernel: [ 3697.423877] ata1: SError: { CommWake 10B8B } Oct 29 09:09:21 localhost kernel: [ 3697.423880] ata1.00: failed command: WRITE FPDMA QUEUED Oct 29 09:09:21 localhost kernel: [ 3697.423886] ata1.00: cmd 61/38:00:01:9e:a4/01:00:00:00:00/40 tag 0 ncq 159744 out Oct 29 09:09:21 localhost kernel: [ 3697.423886] res 50/01:00:01:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) Oct 29 09:09:21 localhost kernel: [ 3697.423888] ata1.00: status: { DRDY } Oct 29 09:09:21 localhost kernel: [ 3697.423894] ata1: hard resetting link Oct 29 09:09:22 localhost kernel: [ 3697.743196] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Oct 29 09:09:22 localhost kernel: [ 3697.744707] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded Oct 29 09:09:22 localhost kernel: [ 3697.744719] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out Oct 29 09:09:22 localhost kernel: [ 3697.744725] 
ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.744813] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.745212] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
Oct 29 09:09:22 localhost kernel: [ 3697.746694] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.746705] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.746711] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.746779] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.747286] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
Oct 29 09:09:22 localhost kernel: [ 3697.747432] ata1.00: configured for UDMA/133
Oct 29 09:09:22 localhost kernel: [ 3697.763181] ata1: EH complete

I did not know yet what "frozen" means, so I did not investigate and
very soon powered down as I had to leave. The next time I powered up, it
did not boot. So the data is probably just that size because that is how
long I had powertop running...

(CCing linux-ide)

It seems most likely that either the SATA host controller or the drive
doesn't play nice with link power management enabled. Can you post the
full dmesg boot log?
[PATCH] cpufreq: Align all CPUs to the same frequency if using shared clock
Some SMP systems make all possible CPUs share one clock. If the CPUs'
initial frequencies are not the same, we need to align all the CPUs to
the same frequency while the CPUs are registering, to avoid mismatched
CPU P-states.

Signed-off-by: lizhuangzhi
---
 drivers/cpufreq/cpufreq.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 8d19f7c..d00abb5 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -991,6 +991,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
 	 * CPU because it is in the same boat. */
 	policy = cpufreq_cpu_get(cpu);
 	if (unlikely(policy)) {
+		/* align this CPU's frequency to the existing policy */
+		cpufreq_driver->target(policy, policy->cur, CPUFREQ_RELATION_H);
 		cpufreq_cpu_put(policy);
 		return 0;
 	}
-- 
1.7.9.5
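The intent of the two added lines, bringing a late-registering CPU to the frequency the shared policy already holds, can be sketched as a userspace model. All names below are hypothetical; the model only tracks numbers and does not call any cpufreq driver.

```c
#include <stddef.h>

/* Hypothetical model of a policy shared by CPUs on one clock. */
struct model_policy {
	unsigned int cur;	/* frequency the policy currently holds, in kHz */
};

/* Hypothetical per-CPU state: the frequency each CPU came up with. */
struct model_cpu_clk {
	unsigned int freq;
};

/*
 * Bring one registering CPU to the policy's current frequency.
 * Returns 1 if the CPU's frequency had to change, 0 otherwise.
 */
static int model_align_cpu(const struct model_policy *policy,
			   struct model_cpu_clk *cpu)
{
	if (cpu->freq == policy->cur)
		return 0;
	cpu->freq = policy->cur;
	return 1;
}

/* Align every CPU in the array; returns how many were changed. */
static size_t model_align_all(const struct model_policy *policy,
			      struct model_cpu_clk *cpus, size_t n)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++)
		changed += (size_t)model_align_cpu(policy, &cpus[i]);
	return changed;
}
```

Running the alignment a second time changes nothing, which is the property the patch is after: once registered, no CPU on the shared clock disagrees with policy->cur.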
Re: [PATCH RFC 4/6] net: rfkill: gpio: add device tree support
On Sat, Jan 18, 2014 at 8:11 AM, Linus Walleij wrote:
> On Fri, Jan 17, 2014 at 6:43 PM, Chen-Yu Tsai wrote:
>> On Sat, Jan 18, 2014 at 12:47 AM, Arnd Bergmann wrote:
>>>> +- NAME_shutdown-gpios : GPIO phandle to shutdown control
>>>> +	(phandle must be the second)
>>>> +- NAME_reset-gpios : GPIO phandle to reset control
>>>> +
>>>> +NAME must match the rfkill-name property. NAME_shutdown-gpios or
>>>> +NAME_reset-gpios, or both, must be defined.
>>>> +
>>>
>>> I don't understand this part. Why do you include the name in the
>>> gpios property, rather than just hardcoding the property strings
>>> to "shutdown-gpios" and "reset-gpios"?
>>
>> This quirk is a result of how gpiod_get_index implements device tree
>> lookup.
>
> Why can't it just have a single property "gpios", where the first
> element is the reset GPIO and the second is the shutdown GPIO?
>
> rfkill-gpio does this:
>
> gpio = devm_gpiod_get_index(&pdev->dev, rfkill->reset_name, 0);
> gpio = devm_gpiod_get_index(&pdev->dev, rfkill->shutdown_name, 1);
>
> The passed con ID name parameter is only there for the device
> tree case it seems. (ACPI ignores it.) So what about you just
> don't pass it at all and patch it to do like this instead:
>
> gpio = devm_gpiod_get_index(&pdev->dev, NULL, 0);
> gpio = devm_gpiod_get_index(&pdev->dev, NULL, 1);
>
> Heikki, are you OK with this change?
>
> I think this is actually necessary if the ACPI and DT unification
> pipe dream shall limp forward, we cannot have arguments passed
> that have a semantic effect on DT but not on ACPI... Drivers
> that are supposed to use both ACPI and DT will always
> have to pass NULL as con ID.

I agree that's how it should be done with the current API if your
driver can obtain GPIOs from both ACPI and DT.

This is a potential issue, as drivers are not supposed to make
assumptions about who is going to be their GPIO provider. Let's say you
started a driver with only DT in mind, and used gpio_get(dev, con_id) to
get your GPIOs.
DT bindings are thus of the form "con_id-gpio = ", and set in stone.
Then later, someone wants to use your driver with ACPI. How do you
handle that gracefully?

I'm starting to wonder, now that ACPI is a first-class GPIO provider,
whether we should start to encourage the deprecation of the
"con_id-gpio = " binding form in DT and only use a single indexed GPIO
property per device. The con_id parameter would then only be used as a
label, which would also have the nice side-effect that all GPIOs used for
a given function will be reported under the same name no matter what the
GPIO provider is.

From an aesthetic point of view, I definitely prefer using con_id to
identify GPIOs instead of indexes, but I don't see how we can make it
play nice with ACPI. Thoughts?

Alex.
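The "single indexed property, con_id as label only" idea floated above can be sketched as a userspace model. Everything here is hypothetical: model_gpiod_get_index() is not the kernel's gpiod API, and the table stands in for whatever list a DT "gpios" property or an ACPI resource would provide.

```c
#include <stddef.h>

/* Hypothetical flattened view of a device's single "gpios" list. */
struct model_gpio_desc {
	int line;		/* provider-specific line number */
	const char *label;	/* name reported to the user, may be NULL */
};

/* Hypothetical device owning that list. */
struct model_device {
	struct model_gpio_desc *gpios;
	size_t ngpios;
};

/*
 * Look a GPIO up purely by index. con_id is recorded as a label only and
 * has no effect on which descriptor is returned, so the result is the
 * same regardless of which firmware interface supplied the list.
 */
static struct model_gpio_desc *model_gpiod_get_index(struct model_device *dev,
						      const char *con_id,
						      size_t idx)
{
	if (idx >= dev->ngpios)
		return NULL;
	dev->gpios[idx].label = con_id;
	return &dev->gpios[idx];
}
```

Under this scheme a driver could pass "reset" or NULL and still get the same line back, so the name becomes purely cosmetic while the binding is fixed by position.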
Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages
Thanks for your review. 2014/1/21 Minchan Kim : > Hello Cai, > > On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote: >> zswap can support multiple swapfiles. So we need to check >> all zbud pool pages in zswap. >> >> Version 2: >> * add *total_zbud_pages* in zbud to record all the pages in pools >> * move the updating of pool pages statistics to >> alloc_zbud_page/free_zbud_page to hide the details >> >> Signed-off-by: Cai Liu >> --- >> include/linux/zbud.h |2 +- >> mm/zbud.c| 44 >> mm/zswap.c |4 ++-- >> 3 files changed, 35 insertions(+), 15 deletions(-) >> >> diff --git a/include/linux/zbud.h b/include/linux/zbud.h >> index 2571a5c..1dbc13e 100644 >> --- a/include/linux/zbud.h >> +++ b/include/linux/zbud.h >> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long >> handle); >> int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries); >> void *zbud_map(struct zbud_pool *pool, unsigned long handle); >> void zbud_unmap(struct zbud_pool *pool, unsigned long handle); >> -u64 zbud_get_pool_size(struct zbud_pool *pool); >> +u64 zbud_get_pool_size(void); >> >> #endif /* _ZBUD_H_ */ >> diff --git a/mm/zbud.c b/mm/zbud.c >> index 9451361..711aaf4 100644 >> --- a/mm/zbud.c >> +++ b/mm/zbud.c >> @@ -52,6 +52,13 @@ >> #include >> #include >> >> +/* >> +* statistics >> +**/ >> + >> +/* zbud pages in all pools */ >> +static u64 total_zbud_pages; >> + >> /* >> * Structures >> */ >> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct page >> *page) >> return zhdr; >> } >> >> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp) >> +{ >> + struct page *page; >> + >> + page = alloc_page(gfp); >> + >> + if (page) { >> + pool->pages_nr++; >> + total_zbud_pages++; > > Who protect race? Yes, here the pool->pages_nr and also the total_zbud_pages are not protected. I will re-do it. I will change *total_zbud_pages* to atomic type. For *pool->pages_nr*, one way is to use pool->lock to protect. But I think it is too heavy. 
So would it be OK to change pages_nr to an atomic type too? > >> + } >> + >> + return page; >> +} >> + >> + >> /* Resets the struct page fields and frees the page */ >> -static void free_zbud_page(struct zbud_header *zhdr) >> +static void free_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr) >> { >> __free_page(virt_to_page(zhdr)); >> + >> + pool->pages_nr--; >> + total_zbud_pages--; >> } >> >> /* >> @@ -279,11 +304,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t >> gfp, >> >> /* Couldn't find unbuddied zbud page, create new one */ >> spin_unlock(&pool->lock); >> - page = alloc_page(gfp); >> + page = alloc_zbud_page(pool, gfp); >> if (!page) >> return -ENOMEM; >> spin_lock(&pool->lock); >> - pool->pages_nr++; >> zhdr = init_zbud_page(page); >> bud = FIRST; >> >> @@ -349,8 +373,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long >> handle) >> if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { >> /* zbud page is empty, free */ >> list_del(&zhdr->lru); >> - free_zbud_page(zhdr); >> - pool->pages_nr--; >> + free_zbud_page(pool, zhdr); >> } else { >> /* Add to unbuddied list */ >> freechunks = num_free_chunks(zhdr); >> @@ -447,8 +470,7 @@ next: >>* Both buddies are now free, free the zbud page and >>* return success. >>*/ >> - free_zbud_page(zhdr); >> - pool->pages_nr--; >> + free_zbud_page(pool, zhdr); >> spin_unlock(&pool->lock); >> return 0; >> } else if (zhdr->first_chunks == 0 || >> @@ -496,14 +518,12 @@ void zbud_unmap(struct zbud_pool *pool, unsigned long >> handle) >> >> /** >> * zbud_get_pool_size() - gets the zbud pool size in pages >> - * @pool:pool whose size is being queried >> * >> - * Returns: size in pages of the given pool. The pool lock need not be >> - * taken to access pages_nr. >> + * Returns: size in pages of all the zbud pools. 
>> */ >> -u64 zbud_get_pool_size(struct zbud_pool *pool) >> +u64 zbud_get_pool_size(void) >> { >> - return pool->pages_nr; >> + return total_zbud_pages; >> } >> >> static int __init init_zbud(void) >> diff --git a/mm/zswap.c b/mm/zswap.c >> index 5a63f78..ef44d9d 100644 >> --- a/mm/zswap.c >> +++ b/mm/zswap.c >> @@ -291,7 +291,7 @@ static void zswap_free_entry(struct zswap_tree *tree, >> zbud_free(tree->pool, entry->handle); >> zswap_entry_cache_free(entry); >> atomic_dec(_stored_pages); >> - zswap_pool_pages =
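For readers following the race discussion in this thread, here is a minimal userspace sketch of the atomic-counter idea from the reply, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic types. The names pool_model, model_alloc_page, model_free_page and model_get_total_size are hypothetical and are not zbud code. The only point is that the per-pool and global counters are each updated with an atomic read-modify-write, so neither needs the pool lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-pool page counter. */
struct pool_model {
	atomic_uint_fast64_t pages_nr;
};

/* Hypothetical stand-in for the global count across all pools. */
static atomic_uint_fast64_t total_pages;

static void model_pool_init(struct pool_model *pool)
{
	atomic_init(&pool->pages_nr, 0);
}

/*
 * alloc_ok models whether the underlying page allocation succeeded;
 * the counters are only touched on success.
 */
static bool model_alloc_page(struct pool_model *pool, bool alloc_ok)
{
	if (!alloc_ok)
		return false;
	atomic_fetch_add(&pool->pages_nr, 1);
	atomic_fetch_add(&total_pages, 1);
	return true;
}

static void model_free_page(struct pool_model *pool)
{
	atomic_fetch_sub(&pool->pages_nr, 1);
	atomic_fetch_sub(&total_pages, 1);
}

/* Size in pages of all pools, read without taking any lock. */
static uint64_t model_get_total_size(void)
{
	return atomic_load(&total_pages);
}
```

Note that the two counters are updated by two separate atomic operations, so a concurrent reader can briefly see one updated and not the other. That is usually acceptable for statistics, but it is an assumption of this sketch rather than something settled in the thread.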
Re: [patch 9/9] mm: keep page cache radix tree nodes in check
On Mon, Jan 20, 2014 at 06:17:37PM -0500, Johannes Weiner wrote: > On Fri, Jan 17, 2014 at 11:05:17AM +1100, Dave Chinner wrote: > > On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote: > > > + /* Only shadow entries in there, keep track of this node */ > > > + if (!(node->count & RADIX_TREE_COUNT_MASK) && > > > + list_empty(&node->private_list)) { > > > + node->private_data = mapping; > > > + list_lru_add(&workingset_shadow_nodes, &node->private_list); > > > + } > > > > You can't do this list_empty(&node->private_list) check safely > > externally to the list_lru code - only time that entry can be > > checked safely is under the LRU list locks. This is the reason that > > list_lru_add/list_lru_del return a boolean to indicate if the object > > was added/removed from the list - they do this list_empty() check > > internally. i.e. the correct, safe way to conditionally update > > state iff the object was added to the LRU is: > > > > if (!(node->count & RADIX_TREE_COUNT_MASK)) { > > if (list_lru_add(&workingset_shadow_nodes, &node->private_list)) > > node->private_data = mapping; > > } > > > > > + radix_tree_replace_slot(slot, page); > > > + mapping->nrpages++; > > > + if (node) { > > > + node->count++; > > > + /* Installed page, can't be shadow-only anymore */ > > > + if (!list_empty(&node->private_list)) > > > + list_lru_del(&workingset_shadow_nodes, > > > + &node->private_list); > > > + } > > > > Same issue here: > > > > if (node) { > > node->count++; > > list_lru_del(&workingset_shadow_nodes, &node->private_list); > > } > > All modifications to node->private_list happen under > mapping->tree_lock, and modifications of a neighboring link should not > affect the outcome of the list_empty(), so I don't think the lru lock > is necessary. Can you please add that as a comment somewhere explaining why it is safe to do this? 
> > > + case LRU_REMOVED_RETRY: > > > if (--nlru->nr_items == 0) > > > node_clear(nid, lru->active_nodes); > > > WARN_ON_ONCE(nlru->nr_items < 0); > > > isolated++; > > > + /* > > > + * If the lru lock has been dropped, our list > > > + * traversal is now invalid and so we have to > > > + * restart from scratch. > > > + */ > > > + if (ret == LRU_REMOVED_RETRY) > > > + goto restart; > > > break; > > > case LRU_ROTATE: > > > list_move_tail(item, &nlru->list); > > > > I think that we need to assert that the list lru lock is correctly > > held here on return with LRU_REMOVED_RETRY. i.e. > > > > case LRU_REMOVED_RETRY: > > assert_spin_locked(&nlru->lock); > > case LRU_REMOVED: > > Ah, good idea. How about adding it to LRU_RETRY as well? Yup, good idea. > > > +static struct shrinker workingset_shadow_shrinker = { > > > + .count_objects = count_shadow_nodes, > > > + .scan_objects = scan_shadow_nodes, > > > + .seeks = DEFAULT_SEEKS * 4, > > > + .flags = SHRINKER_NUMA_AWARE, > > > +}; > > > > Can you add a comment explaining how you calculated the .seeks > > value? It's important to document the weightings/importance > > we give to slab reclaim so we can determine if it's actually > > achieving the desired balance under different loads... > > This is not an exact science, to say the least. I know, that's why I asked it be documented rather than be something kept in your head. > The shadow entries are mostly self-regulated, so I don't want the > shrinker to interfere while the machine is just regularly trimming > caches during normal operation. > > It should only kick in when either a) reclaim is picking up and the > scan-to-reclaim ratio increases due to mapped pages, dirty cache, > swapping etc. or b) the number of objects compared to LRU pages > becomes excessive. > > I think that is what most shrinkers with an elevated seeks value want, > but this translates very awkwardly (and not completely) to the current > cost model, and we should probably rework that interface. 
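The "let the add/del call report what it did" pattern argued for in the review above can be sketched in userspace C. This is a toy model under stated assumptions: lru_model, lru_model_add and lru_model_del are hypothetical names, the spinlock is a C11 atomic_flag standing in for the LRU lock, and nothing here is the kernel's list_lru implementation. It only shows why the caller can key its side effect off the boolean return instead of testing list emptiness outside the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct link {
	struct link *prev, *next;
};

struct lru_model {
	atomic_flag lock;	/* toy spinlock standing in for the LRU lock */
	struct link head;
	long nr_items;
};

/* A self-linked node means "not on any list". */
static void link_init(struct link *l)
{
	l->prev = l->next = l;
}

static void lru_model_init(struct lru_model *lru)
{
	atomic_flag_clear(&lru->lock);
	link_init(&lru->head);
	lru->nr_items = 0;
}

static void lru_lock(struct lru_model *lru)
{
	while (atomic_flag_test_and_set_explicit(&lru->lock,
						 memory_order_acquire))
		;	/* spin */
}

static void lru_unlock(struct lru_model *lru)
{
	atomic_flag_clear_explicit(&lru->lock, memory_order_release);
}

/* Returns true only if the item was off-list and has now been added. */
static bool lru_model_add(struct lru_model *lru, struct link *item)
{
	bool added = false;

	lru_lock(lru);
	if (item->next == item) {	/* emptiness test made under the lock */
		item->prev = lru->head.prev;
		item->next = &lru->head;
		lru->head.prev->next = item;
		lru->head.prev = item;
		lru->nr_items++;
		added = true;
	}
	lru_unlock(lru);
	return added;
}

/* Returns true only if the item was on the list and has been removed. */
static bool lru_model_del(struct lru_model *lru, struct link *item)
{
	bool removed = false;

	lru_lock(lru);
	if (item->next != item) {
		item->prev->next = item->next;
		item->next->prev = item->prev;
		link_init(item);
		lru->nr_items--;
		removed = true;
	}
	lru_unlock(lru);
	return removed;
}
```

A caller would then write "if (lru_model_add(&lru, &item)) owner = mapping;" so the dependent state is set exactly when the add happened. Whether an outer lock (mapping->tree_lock in the thread) already makes the external check safe is the open question between the two posters; this sketch does not settle it.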
> > "Seeks" currently encodes 3 ratios: > > 1. the cost of creating an object vs. a page > > 2. the expected number of objects vs. pages It doesn't encode that at all. If it did, then the default value wouldn't be "2". > 3. the cost of reclaiming an object vs. a page Which, when you consider #3 in conjunction with #1, the actual intended meaning of .seeks is "the cost of replacing this object in the cache compared to the cost of replacing a page cache page." > but they are not necessarily correlated. How I would like to > configure the shadow shrinker instead is: > > o scan objects when reclaim efficiency is down to 75%, because they > are more valuable than use-once cache but less than workingset > > o scan objects
Re: More GPIO madness on iMX6 - and the crappy ARM port of Linux
On Sat, Jan 18, 2014 at 7:43 AM, Linus Walleij wrote: > On Fri, Jan 17, 2014 at 9:53 PM, Russell King - ARM Linux > wrote: >> On Fri, Jan 17, 2014 at 01:42:44PM -0700, Stephen Warren wrote: > >>> I believe you want gpio_get_value() to return either the driven or >>> actual pin value where it can on the current HW, but just e.g. hard-code >>> 0 on other HW. That would introduce a core feature that works some >>> places but not others, and hence make drivers that relied on the feature >>> less portable between HW with different actual features. >> >> I can buy that argument, but there's an issue which stands squarely in >> its way, and that is open-drain GPIOs. >> >> These are modelled just as any other GPIO, mainly so that both >> gpio_set_value(gpio, 1) and gpio_direction_input(gpio) both result in >> the signal being high. The only combination which results in the >> signal being driven low is outputting zero - and the state of the signal >> can aways be read back. >> >> The problem here is that such gpios are implemented in things like the >> I2C driver such that they're _always_ outputs, and gpio_set_value() is >> used to pull the signal down. gpio_get_value() is used to read its >> current state. >> >> So, if we say that gpio_get_value() is undefined, we force such >> subsystems to always jump through the non-open-drain paths (using >> gpio_direction_input() to set the line high and >> gpio_direction_output(gpio, 0) to drive it low.) > > Incidentally that is what gpiolib is doing internally in > gpiod_direction_output(). > > You're absolutely right that it makes no sense to have open > drain (or open source) unless the signal can be read back from > the hardware. 
> > I'm thinking something like if the driver manages to obtain a > GPIO with > > gpio_request_one(gpio, GPIOF_OPEN_DRAIN | > GPIOF_OUT_INIT_HIGH); > > As the I2C core does, and then when that call succeeds, it can > expect that whatever comes back from gpio_get_value() is > always what is actually on the line. If the driver cannot determine > this it should not have allowed that flag to succeed in the first > place, so this might be something we want to enforce. > > There are two white spots on the map here: > > 1. Today this OPEN_DRAIN flag is not even passed down to > the driver so how could it say anything about it :-( it's a pure gpiolib > internal flag. We don't know if the hardware can actually even > do open drain, we just assume it can. > > What it should really do - in the best of worlds - is to check if > it can cross-reference the GPIO line to a pin in the pin control > subsystem, and if that is possible, then ask the pin if it > is supporting open drain and set it. It currently has no such > cross-calls, it is just assumed that the configuration is consistent, > and the actual pin is set up as open drain. But it would make > sense to add more cross-calls here, since GPIO is accepting > these flags (OPEN_DRAIN/OPEN_SOURCE). This would definitely work in the case of pinctrl-backed GPIOs, but would not cover all GPIO chips. If we want to cover all cases we should give drivers a way to way to report or enforce this capability, and make the pinctrl cross-reference one of its implementations where it can be done. > > Like: > int pinctrl_gpio_set_flags(unsigned gpio, unsigned long flags); > > Where the pinctrl subsystem would attempt to cross reference > and set the flag, and the pin controller backend will then have > the option to return an error code. > > We could atleast support that for the select pin controllers > that use generic pin config. i.MX is another story, but I'm open > to compromises. > > 2. 
In the new descriptor API this open drain setting would > be set from the lookup table and be a property on the line, > meaning this flag is not requested explicitly by the consumer, > and the consumer needs to inspect the obtained descriptor > to figure out if it is set to open drain. > > Alexandre: do you have plans for how to handle a dynamic > consumer passing flags to its gpio request in the gpiod API? Do you mean like passing OPEN_DRAIN or OPEN_SOURCE flags to gpiod_get(), similarly to what is done for e.g. gpio_request_one()? In the case of the gpiod API I would rather see these flags defined in the GPIO mapping if possible. For platform data it is already possible to specify open drain/open source, for DT this is trivial to add. ACPI would be more of a problem here, but I'm not sure whether the problem is relevant for ACPI GPIOs. So the way I see it coming into shape would be something like: 1) GPIO drivers' request() function gets an extra flags argument that is passed by the GPIO core with the flags of the mapping. There we can define the whole range of properties that gpio_request_one() supported. The driver's request() will fail if it cannot satisfy these properties. That's where the pinctrl cross-reference would take place. 2) All properties accepted by gpio_request_one() can also be passed
[PATCH v4] ACPI: Fix acpi_evaluate_object() return value check
Since acpi_evaluate_object() returns acpi_status and not plain int, ACPI_FAILURE() should be used for checking its return value. Reviewed-by: Jani Nikula Signed-off-by: Yijing Wang --- v3->v4: Fix spell error, add Jani Nikula reviewed-by. v2->v3: Fix compile error pointed out by Hanjun. v1->v2: Add CC to related subsystem MAINTAINERS --- drivers/gpu/drm/i915/intel_acpi.c | 24 ++-- drivers/gpu/drm/nouveau/core/subdev/mxm/base.c |9 + drivers/gpu/drm/nouveau/nouveau_acpi.c | 23 +-- drivers/pci/pci-label.c|9 ++--- 4 files changed, 38 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_acpi.c b/drivers/gpu/drm/i915/intel_acpi.c index dfff090..87e8f74 100644 --- a/drivers/gpu/drm/i915/intel_acpi.c +++ b/drivers/gpu/drm/i915/intel_acpi.c @@ -35,7 +35,8 @@ static int intel_dsm(acpi_handle handle, int func) union acpi_object params[4]; union acpi_object *obj; u32 result; - int ret = 0; + acpi_status status; + int ret; input.count = 4; input.pointer = params; @@ -50,10 +51,11 @@ static int intel_dsm(acpi_handle handle, int func) params[3].package.count = 0; params[3].package.elements = NULL; - ret = acpi_evaluate_object(handle, "_DSM", , ); - if (ret) { - DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret); - return ret; + status = acpi_evaluate_object(handle, "_DSM", , ); + if (ACPI_FAILURE(status)) { + DRM_DEBUG_DRIVER("failed to evaluate _DSM: %s\n", + acpi_format_exception(status)); + return -EINVAL; } obj = (union acpi_object *)output.pointer; @@ -141,7 +143,8 @@ static void intel_dsm_platform_mux_info(void) struct acpi_object_list input; union acpi_object params[4]; union acpi_object *pkg; - int i, ret; + acpi_status status; + int i; input.count = 4; input.pointer = params; @@ -156,10 +159,11 @@ static void intel_dsm_platform_mux_info(void) params[3].package.count = 0; params[3].package.elements = NULL; - ret = acpi_evaluate_object(intel_dsm_priv.dhandle, "_DSM", , - ); - if (ret) { - DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret); + 
acpi_status = acpi_evaluate_object(intel_dsm_priv.dhandle, + "_DSM", , ); + if (ACPI_FAILURE(status)) { + DRM_DEBUG_DRIVER("failed to evaluate _DSM: %s\n", + acpi_format_exception(status)); goto out; } diff --git a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c index 1291204..c5e7a2b 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c +++ b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c @@ -114,15 +114,16 @@ mxm_shadow_dsm(struct nouveau_mxm *mxm, u8 version) struct acpi_buffer retn = { ACPI_ALLOCATE_BUFFER, NULL }; union acpi_object *obj; acpi_handle handle; - int ret; + acpi_status status; handle = ACPI_HANDLE(>pdev->dev); if (!handle) return false; - ret = acpi_evaluate_object(handle, "_DSM", , ); - if (ret) { - nv_debug(mxm, "DSM MXMS failed: %d\n", ret); + status = acpi_evaluate_object(handle, "_DSM", , ); + if (ACPI_FAILURE(status)) { + nv_debug(mxm, "DSM MXMS failed: %s\n", + acpi_format_exception(status)); return false; } diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c index ba0183f..de3068b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c @@ -82,7 +82,8 @@ static int nouveau_optimus_dsm(acpi_handle handle, int func, int arg, uint32_t * struct acpi_object_list input; union acpi_object params[4]; union acpi_object *obj; - int i, err; + acpi_status status; + int i; char args_buff[4]; input.count = 4; @@ -101,10 +102,11 @@ static int nouveau_optimus_dsm(acpi_handle handle, int func, int arg, uint32_t * args_buff[i] = (arg >> i * 8) & 0xFF; params[3].buffer.pointer = args_buff; - err = acpi_evaluate_object(handle, "_DSM", , ); - if (err) { - printk(KERN_INFO "failed to evaluate _DSM: %d\n", err); - return err; + status = acpi_evaluate_object(handle, "_DSM", , ); + if (ACPI_FAILURE(status)) { + pr_info("failed to evaluate _DSM: %s\n", + acpi_format_exception(status)); + return -EINVAL; } obj = (union acpi_object 
*)output.pointer; @@ -134,7 +136,7 @@
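As a rough illustration of the convention this patch applies, here is a self-contained userspace model. It is a sketch only: model_acpi_status, the MODEL_AE_* values, MODEL_ACPI_FAILURE and model_evaluate are hypothetical stand-ins invented for this example, not the ACPICA definitions. What it shows is the shape of the change: keep the firmware status in its own type, test it with a failure macro, and translate it to a negative errno at the boundary instead of returning the raw status as an int.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical status type and codes; placeholders, not ACPICA values. */
typedef uint32_t model_acpi_status;
#define MODEL_AE_OK		0x0000u
#define MODEL_AE_NOT_FOUND	0x0005u
#define MODEL_ACPI_FAILURE(s)	((s) != MODEL_AE_OK)

/* Stand-in for evaluating a firmware method such as _DSM. */
static model_acpi_status model_evaluate(int method_present)
{
	return method_present ? MODEL_AE_OK : MODEL_AE_NOT_FOUND;
}

/*
 * Caller in the style of the patched functions: the status never leaks
 * out as an int; failure is mapped to a negative errno.
 */
static int model_dsm(int method_present)
{
	model_acpi_status status = model_evaluate(method_present);

	if (MODEL_ACPI_FAILURE(status))
		return -EINVAL;
	return 0;
}
```

The design point is that callers of model_dsm() can rely on the usual "0 or negative errno" contract, while a raw positive status code returned as int would slip past an "if (ret < 0)" check.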
linux-next: manual merge of the sound-asoc tree with the sound tree
Hi all, Today's linux-next merge of the sound-asoc tree got a conflict in sound/soc/soc-compress.c between commit 2a99ef0fdb35 ("ASoC: compress: Add suport for DPCM into compressed audio") from the sound tree and commit 76063d340520 ("ASoC: compress: Add suport for DPCM into compressed audio") from the sound-asoc tree. The sound tree version had a later Author date, so I just used that version - let me know if something else should be done. Otherwise, the sound-asoc tree needs to be cleaned up as this is the only change left in it (relative to the sound tree). -- Cheers, Stephen Rothwell s...@canb.auug.org.au
Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages
Hello Cai, On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote: > zswap can support multiple swapfiles. So we need to check > all zbud pool pages in zswap. > > Version 2: > * add *total_zbud_pages* in zbud to record all the pages in pools > * move the updating of pool pages statistics to > alloc_zbud_page/free_zbud_page to hide the details > > Signed-off-by: Cai Liu > --- > include/linux/zbud.h |2 +- > mm/zbud.c| 44 > mm/zswap.c |4 ++-- > 3 files changed, 35 insertions(+), 15 deletions(-) > > diff --git a/include/linux/zbud.h b/include/linux/zbud.h > index 2571a5c..1dbc13e 100644 > --- a/include/linux/zbud.h > +++ b/include/linux/zbud.h > @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long > handle); > int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries); > void *zbud_map(struct zbud_pool *pool, unsigned long handle); > void zbud_unmap(struct zbud_pool *pool, unsigned long handle); > -u64 zbud_get_pool_size(struct zbud_pool *pool); > +u64 zbud_get_pool_size(void); > > #endif /* _ZBUD_H_ */ > diff --git a/mm/zbud.c b/mm/zbud.c > index 9451361..711aaf4 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -52,6 +52,13 @@ > #include > #include > > +/* > +* statistics > +**/ > + > +/* zbud pages in all pools */ > +static u64 total_zbud_pages; > + > /* > * Structures > */ > @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct page > *page) > return zhdr; > } > > +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp) > +{ > + struct page *page; > + > + page = alloc_page(gfp); > + > + if (page) { > + pool->pages_nr++; > + total_zbud_pages++; Who protect race? 
> + } > + > + return page; > +} > + > + > /* Resets the struct page fields and frees the page */ > -static void free_zbud_page(struct zbud_header *zhdr) > +static void free_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr) > { > __free_page(virt_to_page(zhdr)); > + > + pool->pages_nr--; > + total_zbud_pages--; > } > > /* > @@ -279,11 +304,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t > gfp, > > /* Couldn't find unbuddied zbud page, create new one */ > spin_unlock(&pool->lock); > - page = alloc_page(gfp); > + page = alloc_zbud_page(pool, gfp); > if (!page) > return -ENOMEM; > spin_lock(&pool->lock); > - pool->pages_nr++; > zhdr = init_zbud_page(page); > bud = FIRST; > > @@ -349,8 +373,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long > handle) > if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { > /* zbud page is empty, free */ > list_del(&zhdr->lru); > - free_zbud_page(zhdr); > - pool->pages_nr--; > + free_zbud_page(pool, zhdr); > } else { > /* Add to unbuddied list */ > freechunks = num_free_chunks(zhdr); > @@ -447,8 +470,7 @@ next: >* Both buddies are now free, free the zbud page and >* return success. >*/ > - free_zbud_page(zhdr); > - pool->pages_nr--; > + free_zbud_page(pool, zhdr); > spin_unlock(&pool->lock); > return 0; > } else if (zhdr->first_chunks == 0 || > @@ -496,14 +518,12 @@ void zbud_unmap(struct zbud_pool *pool, unsigned long > handle) > > /** > * zbud_get_pool_size() - gets the zbud pool size in pages > - * @pool:pool whose size is being queried > * > - * Returns: size in pages of the given pool. The pool lock need not be > - * taken to access pages_nr. > + * Returns: size in pages of all the zbud pools. 
> */ > -u64 zbud_get_pool_size(struct zbud_pool *pool) > +u64 zbud_get_pool_size(void) > { > - return pool->pages_nr; > + return total_zbud_pages; > } > > static int __init init_zbud(void) > diff --git a/mm/zswap.c b/mm/zswap.c > index 5a63f78..ef44d9d 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -291,7 +291,7 @@ static void zswap_free_entry(struct zswap_tree *tree, > zbud_free(tree->pool, entry->handle); > zswap_entry_cache_free(entry); > atomic_dec(_stored_pages); > - zswap_pool_pages = zbud_get_pool_size(tree->pool); > + zswap_pool_pages = zbud_get_pool_size(); > } > > /* caller must hold the tree lock */ > @@ -716,7 +716,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t > offset, > > /* update stats */ > atomic_inc(_stored_pages); > - zswap_pool_pages = zbud_get_pool_size(tree->pool); > + zswap_pool_pages = zbud_get_pool_size(); > > return 0; > > -- > 1.7.10.4 > > -- > To unsubscribe, send a message with 'unsubscribe
Re: math_state_restore and kernel_fpu_end disable interrupts?
On Sun, 19 Jan 2014, George Spelvin wrote: It's credited to Suresh Siddha, whom I've cc'ed (along with others who signed off). Suresh, if you're still around, could you comment on why math_state_restore always leaves interrupts disabled, regardless of their state on entry? Is there a deep reason or is it a bug? What the comments seemed to be implying was that it was a bug to enter this code with interrupts enabled. So the problem may be a little bit more systemic; expert counsel is required. It would be kind of weird for code that requires disabled interrupts on entry to turn around and enable interrupts itself. I agree that it would really help for a guru to take a look... On which note, Suresh's email bounced :-( -- Nate Eldredge n...@thatsmathematics.com
Re: [PATCH v3 2/2] i2c: New bus driver for the QUP I2C controller
On 01/17, Bjorn Andersson wrote: > diff --git a/drivers/i2c/busses/i2c-qup.c b/drivers/i2c/busses/i2c-qup.c > new file mode 100644 > index 000..2e0020e > --- /dev/null > +++ b/drivers/i2c/busses/i2c-qup.c > @@ -0,0 +1,894 @@ > +/* Copyright (c) 2009-2013, The Linux Foundation. All rights reserved. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 and > + * only version 2 as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +/* QUP Registers */ > +#define QUP_CONFIG 0x000 > +#define QUP_STATE0x004 > +#define QUP_IO_MODE 0x008 > +#define QUP_SW_RESET 0x00c > +#define QUP_OPERATIONAL 0x018 > +#define QUP_ERROR_FLAGS 0x01c > +#define QUP_ERROR_FLAGS_EN 0x020 > +#define QUP_HW_VERSION 0x030 > +#define QUP_MX_OUTPUT_CNT0x100 > +#define QUP_OUT_FIFO_BASE0x110 > +#define QUP_MX_WRITE_CNT 0x150 > +#define QUP_MX_INPUT_CNT 0x200 > +#define QUP_MX_READ_CNT 0x208 > +#define QUP_IN_FIFO_BASE 0x218 > +#define QUP_I2C_CLK_CTL 0x400 > +#define QUP_I2C_STATUS 0x404 > + > +/* QUP States and reset values */ > +#define QUP_RESET_STATE 0 > +#define QUP_RUN_STATE1 > +#define QUP_PAUSE_STATE 3 > +#define QUP_STATE_MASK 3 > + > +#define QUP_STATE_VALID BIT(2) > +#define QUP_I2C_MAST_GEN BIT(4) > + > +#define QUP_OPERATIONAL_RESET0x000ff0 > +#define QUP_I2C_STATUS_RESET 0xfc > + > +/* QUP OPERATIONAL FLAGS */ > +#define QUP_OUT_SVC_FLAG BIT(8) > +#define QUP_IN_SVC_FLAG BIT(9) > +#define QUP_MX_INPUT_DONEBIT(11) > + > +/* I2C mini core related values */ > +#define I2C_MINI_CORE(2 << 8) > +#define 
I2C_N_VAL15 > +/* Most significant word offset in FIFO port */ > +#define QUP_MSW_SHIFT(I2C_N_VAL + 1) > +#define QUP_CLOCK_AUTO_GATE BIT(13) > + > +/* Packing/Unpacking words in FIFOs, and IO modes */ > +#define QUP_UNPACK_ENBIT(14) > +#define QUP_PACK_EN BIT(15) > +#define QUP_OUTPUT_BLK_MODE BIT(10) > +#define QUP_INPUT_BLK_MODE BIT(12) > + > +#define QUP_REPACK_EN(QUP_UNPACK_EN | QUP_PACK_EN) > + > +#define QUP_OUTPUT_BLOCK_SIZE(x)(((x) & (0x03 << 0)) >> 0) > +#define QUP_OUTPUT_FIFO_SIZE(x) (((x) & (0x07 << 2)) >> 2) > +#define QUP_INPUT_BLOCK_SIZE(x) (((x) & (0x03 << 5)) >> 5) > +#define QUP_INPUT_FIFO_SIZE(x) (((x) & (0x07 << 7)) >> 7) > + > +/* QUP tags */ > +#define QUP_OUT_NOP (0 << 8) > +#define QUP_OUT_START(1 << 8) > +#define QUP_OUT_DATA (2 << 8) > +#define QUP_OUT_STOP (3 << 8) > +#define QUP_OUT_REC (4 << 8) > +#define QUP_IN_DATA (5 << 8) > +#define QUP_IN_STOP (6 << 8) > +#define QUP_IN_NACK (7 << 8) > + > +/* Status, Error flags */ > +#define I2C_STATUS_WR_BUFFER_FULLBIT(0) > +#define I2C_STATUS_BUS_ACTIVEBIT(8) > +#define I2C_STATUS_BUS_MASTERBIT(9) > +#define I2C_STATUS_ERROR_MASK0x38000fc > +#define QUP_I2C_NACK_FLAGBIT(3) > +#define QUP_IN_NOT_EMPTY BIT(5) > +#define QUP_STATUS_ERROR_FLAGS 0x7c > + > +/* Master bus_err clock states */ > +#define I2C_CLK_RESET_BUSIDLE_STATE 0 > +#define I2C_CLK_FORCED_LOW_STATE 5 > + > +#define QUP_MAX_CLK_STATE_RETRIES300 > +#define QUP_MAX_QUP_STATE_RETRIES100 > +#define I2C_STATUS_CLK_STATE 13 > +#define QUP_OUT_FIFO_NOT_EMPTY 0x10 > +#define QUP_READ_LIMIT 256 > + > +struct qup_i2c_dev { > + struct device *dev; > + void __iomem*base; > + int irq; > + struct clk *clk; > + struct clk *pclk; > + struct i2c_adapter adap; > + > + int clk_ctl; > + int one_bit_t; > + int out_fifo_sz; > + int in_fifo_sz; > + int out_blk_sz; > + int in_blk_sz; > + unsigned long xfer_time; > + unsigned long wait_idle; > + > + struct i2c_msg *msg; > + /* Current posion in user message buffer */ s/posion/position/ > + int pos; > + /* 
Keep number of bytes left to be transmitted */ > + int cnt; > + /*
Re: [ANNOUNCE] 3.12.6-rt9
On Sat, 18 Jan 2014 04:15:29 +0100 Mike Galbraith wrote: > > So you also have the timers-do-not-raise-softirq-unconditionally.patch? > People have been complaining that the latest 3.12-rt does not boot on intel i7 boxes. And by reverting this patch, it boots fine. I happen to have an i7 box to test on, and sure enough, the latest 3.12-rt locks up on boot, and after reverting the timers-do-not-raise-softirq-unconditionally.patch, it boots fine. Looking into it, I made this small update, and the box boots. Seems checking "active_timers" is not enough to skip raising softirqs. I haven't looked at why yet, but I would like others to test this patch too. I'll leave why this lets i7 boxes boot as an exercise for Thomas ;-) -- Steve Signed-off-by: Steven Rostedt diff --git a/kernel/timer.c b/kernel/timer.c index 46467be..8212c10 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -1464,13 +1464,11 @@ void run_local_timers(void) raise_softirq(TIMER_SOFTIRQ); return; } - if (!base->active_timers) - goto out; /* Check whether the next pending timer has expired */ if (time_before_eq(base->next_timer, jiffies)) raise_softirq(TIMER_SOFTIRQ); -out: + rt_spin_unlock_after_trylock_in_irq(&base->lock); }
Backlight driver for MacBook Air 6,1 and 6,2
Hi Andrew and CCs I've put together (rather quickly) a driver for directly handling the backlight driver chip (LP8550) on the 2013 MacBook Air. It is needed to work around a bug (likely in firmware) that occurs after suspend/resume. See: https://bugs.freedesktop.org/show_bug.cgi?id=67454 This seems to fall outside of what the i915 driver should handle and thus needs a separate driver. It's available at: https://github.com/patjak/mba6x_bl The MacBook Air provides ACPI backlight methods but they also break after suspend. I'm planning to mainline this and have a few questions. 1) I'm accessing the LP8550 on the SMBUS through ACPI methods. Should I access the SMBUS directly instead or is this ok? I probably need to look at locking around SMBUS accesses. 2) Is DMI the proper way of probing? Currently I'm just checking if the chip is there and that it returns the proper contents in an identifier byte. 3) I assume the backlight type should be BACKLIGHT_PLATFORM (currently BACKLIGHT_FIRMWARE) but do I also need to blacklist the ACPI backlight on these devices? How do I get the proper precedence over other backlight devices? Is there still time to get this into 3.14-rc1? Thanks Patrik
Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?
On Mon, Jan 20, 2014 at 11:41 PM, Frederic Weisbecker wrote: > On Mon, Jan 20, 2014 at 08:30:10PM +0530, Viresh Kumar wrote: >> On 20 January 2014 19:29, Lei Wen wrote: >> > Hi Viresh, >> >> Hi Lei, >> >> > I have one question regarding unbounded workqueue migration in your case. >> > You use hotplug to migrate the unbounded work to other cpus, but its cpu >> > mask >> > would still be 0xf, since cannot be changed by cpuset. >> > >> > My question is how you could prevent this unbounded work migrate back >> > to your isolated cpu? >> > Seems to me there is no such mechanism in kernel, am I understand wrong? >> >> These workqueues are normally queued back from workqueue handler. And we >> normally queue them on the local cpu, that's the default behavior of >> workqueue >> subsystem. And so they land up on the same CPU again and again. > > But for workqueues having a global affinity, I think they can be rescheduled > later > on the old CPUs. Although I'm not sure about that, I'm Cc'ing Tejun. Agreed: since the worker thread is allowed to run on all CPUs, nothing prevents the scheduler from migrating it. One point, though: I see Viresh has already set up two cpusets with scheduler load balancing disabled, so shouldn't that stop task migration between those two groups, since the sched_domain changed? What is more, I ran a similar test: I set up two such cpuset groups, with cores 0-2 in cpuset1 and core 3 in cpuset2, and then hot-unplugged core 3. I find that the cpuset's cpus member becomes empty, and it stays empty even after I hotplug core 3 back in. So is that a bug? Thanks, Lei > > Also, one of the plan is to extend the sysfs interface of workqueues to > override > their affinity. If any of you guys want to try something there, that would be > welcome. > Also we want to work on the timer affinity. Perhaps we don't need a user > interface > for that, or maybe something on top of full dynticks to outline that we want > the unbound > timers to run on housekeeping CPUs only. 
[PATCH v8 2/6] MCS Lock: Restructure the MCS lock defines and locking
We will need the MCS lock code for doing optimistic spinning for rwsem and queue rwlock. Extracting the MCS code from mutex.c and put into its own file allow us to reuse this code easily. Note that using the smp_load_acquire/smp_store_release pair used in mcs_lock and mcs_unlock is not sufficient to form a full memory barrier across cpus for many architectures (except x86). For applications that absolutely need a full barrier across multiple cpus with mcs_unlock and mcs_lock pair, smp_mb__after_unlock_lock() should be used after mcs_lock. Signed-off-by: Tim Chen Signed-off-by: Davidlohr Bueso --- include/linux/mcs_spinlock.h | 81 include/linux/mutex.h| 5 +-- kernel/locking/mutex.c | 68 - 3 files changed, 91 insertions(+), 63 deletions(-) create mode 100644 include/linux/mcs_spinlock.h diff --git a/include/linux/mcs_spinlock.h b/include/linux/mcs_spinlock.h new file mode 100644 index 000..23912cb --- /dev/null +++ b/include/linux/mcs_spinlock.h @@ -0,0 +1,81 @@ +/* + * MCS lock defines + * + * This file contains the main data structure and API definitions of MCS lock. + * + * The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spin-lock + * with the desirable properties of being fair, and with each cpu trying + * to acquire the lock spinning on a local variable. + * It avoids expensive cache bouncings that common test-and-set spin-lock + * implementations incur. + */ +#ifndef __LINUX_MCS_SPINLOCK_H +#define __LINUX_MCS_SPINLOCK_H + +struct mcs_spinlock { + struct mcs_spinlock *next; + int locked; /* 1 if lock acquired */ +}; + +/* + * Note: the smp_load_acquire/smp_store_release pair is not + * sufficient to form a full memory barrier across + * cpus for many architectures (except x86) for mcs_unlock and mcs_lock. + * For applications that need a full barrier across multiple cpus + * with mcs_unlock and mcs_lock pair, smp_mb__after_unlock_lock() should be + * used after mcs_lock. 
+ */ + +/* + * We don't inline mcs_spin_lock() so that perf can correctly account for the + * time spent in this lock function. + */ +static noinline +void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node) +{ + struct mcs_spinlock *prev; + + /* Init node */ + node->locked = 0; + node->next = NULL; + + prev = xchg(lock, node); + if (likely(prev == NULL)) { + /* Lock acquired */ + node->locked = 1; + return; + } + ACCESS_ONCE(prev->next) = node; + /* +* Wait until the lock holder passes the lock down. +* Using smp_load_acquire() provides a memory barrier that +* ensures subsequent operations happen after the lock is acquired. +*/ + while (!(smp_load_acquire(&node->locked))) + arch_mutex_cpu_relax(); +} + +static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node) +{ + struct mcs_spinlock *next = ACCESS_ONCE(node->next); + + if (likely(!next)) { + /* +* Release the lock by setting it to NULL +*/ + if (cmpxchg(lock, node, NULL) == node) + return; + /* Wait until the next pointer is set */ + while (!(next = ACCESS_ONCE(node->next))) + arch_mutex_cpu_relax(); + } + /* +* Pass lock to next waiter. +* smp_store_release() provides a memory barrier to ensure +* all operations in the critical section has been completed +* before unlocking.
+*/ + smp_store_release(&next->locked, 1); +} + +#endif /* __LINUX_MCS_SPINLOCK_H */ diff --git a/include/linux/mutex.h b/include/linux/mutex.h index d318193..c482e1d 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -46,6 +46,7 @@ * - detects multi-task circular deadlocks and prints out all affected * locks and tasks (and only those tasks) */ +struct mcs_spinlock; struct mutex { /* 1: unlocked, 0: locked, negative: locked, possible waiters */ atomic_t count; @@ -55,7 +56,7 @@ struct mutex { struct task_struct *owner; #endif #ifdef CONFIG_MUTEX_SPIN_ON_OWNER - void *spin_mlock; /* Spinner MCS lock */ + struct mcs_spinlock *mcs_lock; /* Spinner MCS lock */ #endif #ifdef CONFIG_DEBUG_MUTEXES const char *name; @@ -179,4 +180,4 @@ extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); # define arch_mutex_cpu_relax() cpu_relax() #endif -#endif +#endif /* __LINUX_MUTEX_H */ diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index fbbd2ed..45fe1b5 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -25,6 +25,7 @@ #include #include #include +#include /* * In the DEBUG case we are using the "NULL fastpath" for
[PATCH v8 4/6] MCS Lock: Move mcs_lock/unlock function into its own
From: Waiman Long Create a new mcs_spinlock.c file to contain the mcs_spin_lock() and mcs_spin_unlock() functions. Signed-off-by: Waiman Long Signed-off-by: Tim Chen --- include/linux/mcs_spinlock.h | 77 ++ kernel/locking/Makefile | 6 +- .../locking/mcs_spinlock.c | 27 3 files changed, 18 insertions(+), 92 deletions(-) copy include/linux/mcs_spinlock.h => kernel/locking/mcs_spinlock.c (82%) diff --git a/include/linux/mcs_spinlock.h b/include/linux/mcs_spinlock.h index bfe84c6..d54bb23 100644 --- a/include/linux/mcs_spinlock.h +++ b/include/linux/mcs_spinlock.h @@ -17,78 +17,9 @@ struct mcs_spinlock { int locked; /* 1 if lock acquired */ }; -/* - * Note: the smp_load_acquire/smp_store_release pair is not - * sufficient to form a full memory barrier across - * cpus for many architectures (except x86) for mcs_unlock and mcs_lock. - * For applications that need a full barrier across multiple cpus - * with mcs_unlock and mcs_lock pair, smp_mb__after_unlock_lock() should be - * used after mcs_lock. - */ - -/* - * In order to acquire the lock, the caller should declare a local node and - * pass a reference of the node to this function in addition to the lock. - * If the lock has already been acquired, then this will proceed to spin - * on this node->locked until the previous lock holder sets the node->locked - * in mcs_spin_unlock(). - * - * We don't inline mcs_spin_lock() so that perf can correctly account for the - * time spent in this lock function. - */ -static noinline -void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node) -{ - struct mcs_spinlock *prev; - - /* Init node */ - node->locked = 0; - node->next = NULL; - - prev = xchg(lock, node); - if (likely(prev == NULL)) { - /* Lock acquired, don't need to set node->locked to 1 -* as lock owner and other contenders won't check this value. -* If a debug mode is needed to audit lock status, then -* set node->locked value here.
-*/ - return; - } - ACCESS_ONCE(prev->next) = node; - /* -* Wait until the lock holder passes the lock down. -* Using smp_load_acquire() provides a memory barrier that -* ensures subsequent operations happen after the lock is acquired. -*/ - while (!(smp_load_acquire(&node->locked))) - arch_mutex_cpu_relax(); -} - -/* - * Releases the lock. The caller should pass in the corresponding node that - * was used to acquire the lock. - */ -static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node) -{ - struct mcs_spinlock *next = ACCESS_ONCE(node->next); - - if (likely(!next)) { - /* -* Release the lock by setting it to NULL -*/ - if (likely(cmpxchg(lock, node, NULL) == node)) - return; - /* Wait until the next pointer is set */ - while (!(next = ACCESS_ONCE(node->next))) - arch_mutex_cpu_relax(); - } - /* -* Pass lock to next waiter. -* smp_store_release() provides a memory barrier to ensure -* all operations in the critical section has been completed -* before unlocking. -*/ - smp_store_release(&next->locked, 1); -} +extern +void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node); +extern +void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node); #endif /* __LINUX_MCS_SPINLOCK_H */ diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index baab8e5..20d9d5c 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -13,12 +13,12 @@ obj-$(CONFIG_LOCKDEP) += lockdep.o ifeq ($(CONFIG_PROC_FS),y) obj-$(CONFIG_LOCKDEP) += lockdep_proc.o endif -obj-$(CONFIG_SMP) += spinlock.o -obj-$(CONFIG_PROVE_LOCKING) += spinlock.o +obj-$(CONFIG_SMP) += spinlock.o mcs_spinlock.o +obj-$(CONFIG_PROVE_LOCKING) += spinlock.o mcs_spinlock.o obj-$(CONFIG_RT_MUTEXES) += rtmutex.o obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex-tester.o -obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o +obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o mcs_spinlock.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
obj-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o obj-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem-xadd.o diff --git a/include/linux/mcs_spinlock.h b/kernel/locking/mcs_spinlock.c similarity index 82% copy from include/linux/mcs_spinlock.h copy to kernel/locking/mcs_spinlock.c index bfe84c6..c3ee9cf 100644 --- a/include/linux/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.c @@ -1,7 +1,5 @@ /* - * MCS lock defines - * - * This file contains the main data structure and API definitions of MCS lock. + * MCS lock * * The
[PATCH v8 3/6] MCS Lock: optimizations and extra comments
From: Jason Low Remove unnecessary operation to assign locked status to 1 if lock is acquired without contention as this value will not be checked by lock holder again and other potential lock contenders will not be looking at their own lock status. Make the cmpxchg(lock, node, NULL) == node check in mcs_spin_unlock() likely() as it is likely that a race did not occur most of the time. Also add in more comments describing how the local node is used in MCS locks. Reviewed-by: Tim Chen Signed-off-by: Jason Low Signed-off-by: Tim Chen --- include/linux/mcs_spinlock.h | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/include/linux/mcs_spinlock.h b/include/linux/mcs_spinlock.h index 23912cb..bfe84c6 100644 --- a/include/linux/mcs_spinlock.h +++ b/include/linux/mcs_spinlock.h @@ -27,6 +27,12 @@ struct mcs_spinlock { */ /* + * In order to acquire the lock, the caller should declare a local node and + * pass a reference of the node to this function in addition to the lock. + * If the lock has already been acquired, then this will proceed to spin + * on this node->locked until the previous lock holder sets the node->locked + * in mcs_spin_unlock(). + * * We don't inline mcs_spin_lock() so that perf can correctly account for the * time spent in this lock function. */ @@ -41,8 +47,11 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node) prev = xchg(lock, node); if (likely(prev == NULL)) { - /* Lock acquired */ - node->locked = 1; + /* Lock acquired, don't need to set node->locked to 1 +* as lock owner and other contenders won't check this value. +* If a debug mode is needed to audit lock status, then +* set node->locked value here. +*/ return; } ACCESS_ONCE(prev->next) = node; @@ -55,6 +64,10 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node) arch_mutex_cpu_relax(); } +/* + * Releases the lock. The caller should pass in the corresponding node that + * was used to acquire the lock. 
+ */ static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node) { struct mcs_spinlock *next = ACCESS_ONCE(node->next); @@ -63,7 +76,7 @@ static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *nod /* * Release the lock by setting it to NULL */ - if (cmpxchg(lock, node, NULL) == node) + if (likely(cmpxchg(lock, node, NULL) == node)) return; /* Wait until the next pointer is set */ while (!(next = ACCESS_ONCE(node->next))) -- 1.7.11.7
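For readers following the local-node convention described in the comments above, here is a minimal userspace sketch of the same queueing scheme written with C11 atomics. It is an illustration under stated assumptions, not the kernel code: `struct mcs_node`, `mcs_lock_t`, `mcs_lock()`, `mcs_unlock()` and `mcs_selfcheck()` are hypothetical names, `atomic_exchange` and `atomic_compare_exchange_strong` stand in for the kernel's `xchg()` and `cmpxchg()`, and an empty spin body stands in for `arch_mutex_cpu_relax()`. The self-check is single-threaded and plays the part of a second waiter by hand so that the hand-off path can be exercised deterministically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for struct mcs_spinlock (hypothetical name). */
struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_int locked;	/* set to 1 by the previous holder on hand-off */
};

/* The lock itself is only a pointer to the tail of the waiter queue. */
typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *prev;

	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

	/* Become the new tail; the kernel code uses xchg() here. */
	prev = atomic_exchange_explicit(lock, node, memory_order_acq_rel);
	if (prev == NULL)
		return;		/* uncontended: the lock is ours */

	/* Link behind the old tail, then spin on our own node only. */
	atomic_store_explicit(&prev->next, node, memory_order_release);
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		;		/* the kernel calls arch_mutex_cpu_relax() here */
}

static void mcs_unlock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No visible successor: try to reset the tail to NULL. */
		if (atomic_compare_exchange_strong_explicit(lock, &expected, NULL,
				memory_order_release, memory_order_relaxed))
			return;
		/* A waiter swapped itself in but has not linked yet. */
		while ((next = atomic_load_explicit(&node->next,
				memory_order_acquire)) == NULL)
			;
	}
	/* Pass the lock to the next waiter. */
	atomic_store_explicit(&next->locked, 1, memory_order_release);
}

/* Single-threaded walk through the uncontended and hand-off paths. */
int mcs_selfcheck(void)
{
	mcs_lock_t lock;
	struct mcs_node a, b;
	struct mcs_node *prev;

	atomic_init(&lock, NULL);

	mcs_lock(&lock, &a);			/* uncontended acquire */
	assert(atomic_load(&lock) == &a);

	/* Play the part of a second CPU queueing behind a. */
	atomic_store(&b.next, NULL);
	atomic_store(&b.locked, 0);
	prev = atomic_exchange(&lock, &b);
	assert(prev == &a);
	atomic_store(&a.next, &b);

	mcs_unlock(&lock, &a);			/* hand-off to b */
	assert(atomic_load(&b.locked) == 1);
	assert(atomic_load(&lock) == &b);

	mcs_unlock(&lock, &b);			/* last holder clears the tail */
	assert(atomic_load(&lock) == NULL);
	(void)prev;
	return 0;
}
```

In real use each thread would declare its own `struct mcs_node` on the stack and pass it to both `mcs_lock()` and `mcs_unlock()`, which is the "local node" the patch comments refer to.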
[PATCH v8 6/6] MCS Lock: Allow architecture specific asm files to be used for contended case
From: Peter Zijlstra This patch allows each architecture to add its specific assembly optimized arch_mcs_spin_lock_contended and arch_mcs_spin_unlock_contended for MCS lock and unlock functions. Signed-off-by: Tim Chen --- arch/alpha/include/asm/Kbuild | 1 + arch/arc/include/asm/Kbuild| 1 + arch/arm/include/asm/Kbuild| 1 + arch/arm64/include/asm/Kbuild | 1 + arch/avr32/include/asm/Kbuild | 1 + arch/blackfin/include/asm/Kbuild | 1 + arch/c6x/include/asm/Kbuild| 1 + arch/cris/include/asm/Kbuild | 1 + arch/frv/include/asm/Kbuild| 1 + arch/hexagon/include/asm/Kbuild| 1 + arch/ia64/include/asm/Kbuild | 2 +- arch/m32r/include/asm/Kbuild | 1 + arch/m68k/include/asm/Kbuild | 1 + arch/metag/include/asm/Kbuild | 1 + arch/microblaze/include/asm/Kbuild | 1 + arch/mips/include/asm/Kbuild | 1 + arch/mn10300/include/asm/Kbuild| 1 + arch/openrisc/include/asm/Kbuild | 1 + arch/parisc/include/asm/Kbuild | 1 + arch/powerpc/include/asm/Kbuild| 2 +- arch/s390/include/asm/Kbuild | 1 + arch/score/include/asm/Kbuild | 1 + arch/sh/include/asm/Kbuild | 1 + arch/sparc/include/asm/Kbuild | 1 + arch/tile/include/asm/Kbuild | 1 + arch/um/include/asm/Kbuild | 1 + arch/unicore32/include/asm/Kbuild | 1 + arch/x86/include/asm/Kbuild| 1 + arch/xtensa/include/asm/Kbuild | 1 + include/asm-generic/mcs_spinlock.h | 13 + include/linux/mcs_spinlock.h | 2 ++ 31 files changed, 44 insertions(+), 2 deletions(-) create mode 100644 include/asm-generic/mcs_spinlock.h diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index f01fb50..14cbbbc 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -4,3 +4,4 @@ generic-y += clkdev.h generic-y += exec.h generic-y += trace_clock.h generic-y += preempt.h +generic-y += mcs_spinlock.h diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index 9ae21c1..c0773a5 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -48,3 +48,4 @@ generic-y += user.h generic-y += vga.h generic-y +=
xor.h generic-y += preempt.h +generic-y += mcs_spinlock.h diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index c38b58c..c68cfdd 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -34,3 +34,4 @@ generic-y += timex.h generic-y += trace_clock.h generic-y += unaligned.h generic-y += preempt.h +generic-y += mcs_spinlock.h diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 519f89f..24a3c10 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -51,3 +51,4 @@ generic-y += user.h generic-y += vga.h generic-y += xor.h generic-y += preempt.h +generic-y += mcs_spinlock.h diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild index 658001b..466e13d 100644 --- a/arch/avr32/include/asm/Kbuild +++ b/arch/avr32/include/asm/Kbuild @@ -18,3 +18,4 @@ generic-y += sections.h generic-y += topology.h generic-y += trace_clock.h generic-y += xor.h +generic-y += mcs_spinlock.h diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild index f2b4347..0bd1c5c 100644 --- a/arch/blackfin/include/asm/Kbuild +++ b/arch/blackfin/include/asm/Kbuild @@ -45,3 +45,4 @@ generic-y += unaligned.h generic-y += user.h generic-y += xor.h generic-y += preempt.h +generic-y += mcs_spinlock.h diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild index fc0b3c3..21d7100 100644 --- a/arch/c6x/include/asm/Kbuild +++ b/arch/c6x/include/asm/Kbuild @@ -57,3 +57,4 @@ generic-y += user.h generic-y += vga.h generic-y += xor.h generic-y += preempt.h +generic-y += mcs_spinlock.h diff --git a/arch/cris/include/asm/Kbuild b/arch/cris/include/asm/Kbuild index 199b1a9..c571cc1 100644 --- a/arch/cris/include/asm/Kbuild +++ b/arch/cris/include/asm/Kbuild @@ -13,3 +13,4 @@ generic-y += trace_clock.h generic-y += vga.h generic-y += xor.h generic-y += preempt.h +generic-y += mcs_spinlock.h diff --git a/arch/frv/include/asm/Kbuild b/arch/frv/include/asm/Kbuild index 
74742dc..ccca92e 100644 --- a/arch/frv/include/asm/Kbuild +++ b/arch/frv/include/asm/Kbuild @@ -3,3 +3,4 @@ generic-y += clkdev.h generic-y += exec.h generic-y += trace_clock.h generic-y += preempt.h +generic-y += mcs_spinlock.h diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild index ada843c..553077d 100644 --- a/arch/hexagon/include/asm/Kbuild +++ b/arch/hexagon/include/asm/Kbuild @@ -55,3 +55,4 @@ generic-y += ucontext.h generic-y += unaligned.h generic-y += xor.h generic-y += preempt.h +generic-y += mcs_spinlock.h diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild index f93ee08..25aed55 100644 --- a/arch/ia64/include/asm/Kbuild +++
[PATCH v8 5/6] MCS Lock: allow architectures to hook in to contended
From: Will Deacon When contended, architectures may be able to reduce the polling overhead in ways which aren't expressible using a simple relax() primitive. This patch allows architectures to hook into the mcs_{lock,unlock} functions for the contended cases only. Signed-off-by: Will Deacon Signed-off-by: Tim Chen --- kernel/locking/mcs_spinlock.c | 42 -- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c index c3ee9cf..e12ed32 100644 --- a/kernel/locking/mcs_spinlock.c +++ b/kernel/locking/mcs_spinlock.c @@ -16,6 +16,28 @@ #include #include +#ifndef arch_mcs_spin_lock_contended +/* + * Using smp_load_acquire() provides a memory barrier that ensures + * subsequent operations happen after the lock is acquired. + */ +#define arch_mcs_spin_lock_contended(l) \ +do { \ + while (!(smp_load_acquire(l))) \ + arch_mutex_cpu_relax(); \ +} while (0) +#endif + +#ifndef arch_mcs_spin_unlock_contended +/* + * smp_store_release() provides a memory barrier to ensure all + * operations in the critical section has been completed before + * unlocking. + */ +#define arch_mcs_spin_unlock_contended(l) \ + smp_store_release((l), 1) +#endif + /* * Note: the smp_load_acquire/smp_store_release pair is not * sufficient to form a full memory barrier across @@ -50,13 +72,9 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node) return; } ACCESS_ONCE(prev->next) = node; - /* -* Wait until the lock holder passes the lock down. -* Using smp_load_acquire() provides a memory barrier that -* ensures subsequent operations happen after the lock is acquired. -*/ - while (!(smp_load_acquire(&node->locked))) - arch_mutex_cpu_relax(); + + /* Wait until the lock holder passes the lock down.
*/ + arch_mcs_spin_lock_contended(&node->locked); } EXPORT_SYMBOL_GPL(mcs_spin_lock); @@ -78,12 +96,8 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node) while (!(next = ACCESS_ONCE(node->next))) arch_mutex_cpu_relax(); } - /* -* Pass lock to next waiter. -* smp_store_release() provides a memory barrier to ensure -* all operations in the critical section has been completed -* before unlocking. -*/ - smp_store_release(&next->locked, 1); + + /* Pass lock to next waiter. */ + arch_mcs_spin_unlock_contended(&next->locked); } EXPORT_SYMBOL_GPL(mcs_spin_unlock); -- 1.7.11.7
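The `#ifndef` guards in the patch above are what let an architecture header supply its own contended-path primitive ahead of the generic definition. A small userspace sketch of that override pattern follows, assuming C11 atomics in place of `smp_load_acquire()` and `smp_store_release()`. All `demo_*` names are hypothetical and exist only for this illustration: the "architecture" override here just counts how often it is used, whereas a real one would presumably use a cheaper wait or wake mechanism.

```c
#include <assert.h>
#include <stdatomic.h>

static int demo_override_calls;	/* how often the override below ran */
static atomic_int demo_flag;	/* plays the role of node->locked */

/* Pretend this definition came from an architecture header. */
#define demo_unlock_contended(l)				\
do {								\
	demo_override_calls++;					\
	atomic_store_explicit((l), 1, memory_order_release);	\
} while (0)

/* Generic code: only provides a default when nothing was defined above. */
#ifndef demo_unlock_contended
#define demo_unlock_contended(l)				\
	atomic_store_explicit((l), 1, memory_order_release)
#endif

#ifndef demo_lock_contended
#define demo_lock_contended(l)					\
do {								\
	while (!atomic_load_explicit((l), memory_order_acquire))	\
		;						\
} while (0)
#endif

/* Returns 1 if the override was picked up and the flag was handed over. */
int demo_hooks(void)
{
	demo_unlock_contended(&demo_flag);	/* expands to the override */
	demo_lock_contended(&demo_flag);	/* generic default, returns at once */
	return demo_override_calls == 1 && atomic_load(&demo_flag) == 1;
}
```

The design point is that the uncontended fast path stays in common code, and only the two macros on the contended path are replaceable.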
[PATCH v8 0/6] MCS Lock: MCS lock code cleanup and optimizations
This update to the patch series reorganizes the order of the patches by fixing the MCS lock barrier leakage first, before making standalone MCS lock and unlock functions. We also changed the hooks for the architecture-specific arch_mcs_spin_lock_contended and arch_mcs_spin_unlock_contended from needing Kconfig to using asm-generic, with arch-specific asm headers added as needed. Peter, please review the last patch and bless it with your Signed-off-by if it looks right.

This patch series fixes the barriers of the MCS lock and performs some optimizations. Proper passing of the MCS lock is now done with smp_load_acquire() in mcs_spin_lock() and smp_store_release() in mcs_spin_unlock(). Note that this is not sufficient to form a full memory barrier across cpus on many architectures (except x86) for the mcs_unlock and mcs_lock pair. For code that needs a full memory barrier with the mcs_unlock and mcs_lock pair, smp_mb__after_unlock_lock() should be used after mcs_lock.

Will Deacon also added hooks to allow for architecture-specific implementation and optimization of the contended paths of the mcs_spin_lock and mcs_spin_unlock functions.

The original MCS lock code has potential leaks between critical sections. This was not a problem when MCS was embedded within the mutex, but it needs to be corrected when allowing the MCS lock to be used by itself for other locking purposes. The MCS lock code was previously embedded in mutex.c and is now separated. This allows for easier reuse of the MCS lock in other places like rwsem and qrwlock.

Tim

v8:
1. Move order of patches by putting barrier corrections first.
2. Use asm-generic headers for hooking in the arch-specific arch_mcs_spin_lock_contended and arch_mcs_spin_unlock_contended functions.
3. Some minor cleanup and comments added.

v7:
1. Update architecture-specific hooks with concise architecture-specific arch_mcs_spin_lock_contended and arch_mcs_spin_unlock_contended functions.

v6:
1. Fix a bug of improper xchg_acquire and extra space in barrier fixing patch.
2. Added extra hooks to allow for architecture-specific versions of mcs_spin_lock and mcs_spin_unlock to be used.

v5:
1. Rework barrier correction patch. We now use smp_load_acquire() in mcs_spin_lock() and smp_store_release() in mcs_spin_unlock() to allow for architecture-dependent barriers to be automatically used. This is clean and will provide the right barriers for all architectures.

v4:
1. Move patch series to the latest tip after v3.12

v3:
1. modified memory barriers to support non-x86 architectures that have weak memory ordering.

v2:
1. change export mcs_spin_lock as a GPL export symbol
2. corrected mcs_spin_lock to references

Jason Low (1):
  MCS Lock: optimizations and extra comments

Peter Zijlstra (1):
  MCS Lock: Allow architecture specific asm files to be used for contended case

Tim Chen (2):
  MCS Lock: Restructure the MCS lock defines and locking
  MCS Lock: allow architectures to hook in to contended

Waiman Long (2):
  MCS Lock: Barrier corrections
  MCS Lock: Move mcs_lock/unlock function into its own

 arch/alpha/include/asm/Kbuild      | 1 +
 arch/arc/include/asm/Kbuild        | 1 +
 arch/arm/include/asm/Kbuild        | 1 +
 arch/arm64/include/asm/Kbuild      | 1 +
 arch/avr32/include/asm/Kbuild      | 1 +
 arch/blackfin/include/asm/Kbuild   | 1 +
 arch/c6x/include/asm/Kbuild        | 1 +
 arch/cris/include/asm/Kbuild       | 1 +
 arch/frv/include/asm/Kbuild        | 1 +
 arch/hexagon/include/asm/Kbuild    | 1 +
 arch/ia64/include/asm/Kbuild       | 2 +-
 arch/m32r/include/asm/Kbuild       | 1 +
 arch/m68k/include/asm/Kbuild       | 1 +
 arch/metag/include/asm/Kbuild      | 1 +
 arch/microblaze/include/asm/Kbuild | 1 +
 arch/mips/include/asm/Kbuild       | 1 +
 arch/mn10300/include/asm/Kbuild    | 1 +
 arch/openrisc/include/asm/Kbuild   | 1 +
 arch/parisc/include/asm/Kbuild     | 1 +
 arch/powerpc/include/asm/Kbuild    | 2 +-
 arch/s390/include/asm/Kbuild       | 1 +
 arch/score/include/asm/Kbuild      | 1 +
 arch/sh/include/asm/Kbuild         | 1 +
 arch/sparc/include/asm/Kbuild      | 1 +
 arch/tile/include/asm/Kbuild       | 1 +
 arch/um/include/asm/Kbuild         | 1 +
 arch/unicore32/include/asm/Kbuild  | 1 +
 arch/x86/include/asm/Kbuild        | 1 +
 arch/xtensa/include/asm/Kbuild     | 1 +
 include/asm-generic/mcs_spinlock.h | 13 +
 include/linux/mcs_spinlock.h       | 27 ++
 include/linux/mutex.h              | 5 +-
 kernel/locking/Makefile            | 6 +--
 kernel/locking/mcs_spinlock.c      | 103 +
 kernel/locking/mutex.c             | 60 +++--
 35 files changed, 185 insertions(+), 60 deletions(-)
 create mode 100644 include/asm-generic/mcs_spinlock.h
 create mode 100644 include/linux/mcs_spinlock.h
 create mode 100644 kernel/locking/mcs_spinlock.c

--
1.7.11.7
[PATCH v8 1/6] MCS Lock: Barrier corrections
From: Waiman Long This patch corrects the way memory barriers are used in the MCS lock with the smp_load_acquire and smp_store_release functions. It removes barriers that are not needed. Suggested-by: Michel Lespinasse Signed-off-by: Waiman Long Signed-off-by: Jason Low Signed-off-by: Tim Chen --- kernel/locking/mutex.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 4dd6e4c..fbbd2ed 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -136,9 +136,12 @@ void mspin_lock(struct mspin_node **lock, struct mspin_node *node) return; } ACCESS_ONCE(prev->next) = node; - smp_wmb(); - /* Wait until the lock holder passes the lock down */ - while (!ACCESS_ONCE(node->locked)) + /* +* Wait until the lock holder passes the lock down. +* Using smp_load_acquire() provides a memory barrier that +* ensures subsequent operations happen after the lock is acquired. +*/ + while (!(smp_load_acquire(&node->locked))) arch_mutex_cpu_relax(); } @@ -156,8 +159,13 @@ static void mspin_unlock(struct mspin_node **lock, struct mspin_node *node) while (!(next = ACCESS_ONCE(node->next))) arch_mutex_cpu_relax(); } - ACCESS_ONCE(next->locked) = 1; - smp_wmb(); + /* +* Pass lock to next waiter. +* smp_store_release() provides a memory barrier to ensure +* all operations in the critical section has been completed +* before unlocking. +*/ + smp_store_release(&next->locked, 1); } /* -- 1.7.11.7
Re: ALSA: hda - Fix possible races in HDMI driver - lockup on shutdown when radeon.audio=1 after using audacity
Takashi Iwai wrote, on 20/01/14 19:22:
> At Sun, 19 Jan 2014 17:32:16 +1030,
> Arthur Marsh wrote:
> > I have had reproducible lock-ups on shut-down (at the shutting down ALSA stage) of my AMD64 machine (Asus M3A78Pro motherboard, BIOS 1701 01/27/2011, CPU AMD Athlon(tm) II X4 640 Processor) running the 64 bit Linux kernel more recent than 3.12 when *both* radeon.audio=1 was set and I had been running audacity 2.0.5. (iommu=noaperture is also set).
> >
> > The problem was reproducible with the stock Debian kernel linux-image-3.13-rc6-amd64 version 3.13~rc6-1~exp1.
> >
> > The machine is using an ATI/AMD 3850HD video card with a DVI cable to a DVI input on my monitor, and the default audio device is the motherboard's on-board audio device:
> >
> > 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA)
> > 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV670 [Radeon HD 3690/3850]
> > 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] RV670/680 HDMI Audio [Radeon HD 3690/3800 Series]
> >
> > $ git bisect bad
> > cbbaa603a03cc46681e24d6b2804b62fde95a2af is the first bad commit
> > commit cbbaa603a03cc46681e24d6b2804b62fde95a2af
> > Author: Takashi Iwai
> > Date: Thu Oct 17 18:03:24 2013 +0200
> >
> >     ALSA: hda - Fix possible races in HDMI driver
> >
> >     Some per_pin fields and ELD contents might be changed dynamically in multiple ways where the concurrent accesses are still opened in the current code. This patch fixes such possible races by using eld->lock in appropriate places.
> >
> >     Reported-by: Anssi Hannula
> >     Signed-off-by: Takashi Iwai
> >
> > :040000 040000 0c29281f82a3ebd9a704d481114f9cfcefea07c8 d71fd101125cd29a628cb5e66c7ee4f56e28329b M sound
> >
> > When running audacity from the command line there was the following output:
> >
> > ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
> > ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
> > ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
> > ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
> > ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
> > ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
> > Expression 'stream->playback.pcm' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 4611
> > Expression 'stream->playback.pcm' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 4611
> > Expression 'stream->playback.pcm' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 4611
> >
> > I am happy to supply further information or run further tests to help in isolating the problem and verifying a solution.
>
> Could you build the kernel with lockdep kconfig and see whether it reports errors?
>
> Reverting the commit doesn't work cleanly. Instead, you can try to simply comment out all mutex_lock(&per_pin->lock) and mutex_unlock(&per_pin->lock) calls in patch_hdmi.c to see whether it's a mutex deadlock.
>
> thanks,
>
> Takashi

I rebuilt the kernel after commenting out all mutex_lock(&per_pin->lock) and mutex_unlock(&per_pin->lock) calls in patch_hdmi.c, and the resulting kernel shut down without hanging.

Regards,

Arthur.
Dirty deleted files cause pointless I/O storms (unless truncated first)
The code below runs quickly for a few iterations, and then it slows down and the whole system becomes laggy for far too long.

Removing the sync_file_range call results in no I/O being performed at all (which means that the kernel isn't totally screwing this up), and changing "4096" to SIZE causes lots of I/O but without the going-out-to-lunch bit (unsurprisingly).

Surprisingly, uncommenting the ftruncate call seems to fix the problem. This suggests that all the necessary infrastructure to avoid wasting time writing to deleted files is there but that it's not getting used.

#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIZE (16 * 1048576)

static void hammer(const char *name)
{
	int fd = open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		err(1, "open");

	fallocate(fd, 0, 0, SIZE);

	/* Dirty every page of the file through a shared mapping. */
	void *addr = mmap(NULL, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		err(1, "mmap");
	memset(addr, 0, SIZE);
	if (munmap(addr, SIZE) != 0)
		err(1, "munmap");

	/* Write back only the first 4096 bytes; the rest stays dirty. */
	if (sync_file_range(fd, 0, 4096,
			    SYNC_FILE_RANGE_WAIT_BEFORE |
			    SYNC_FILE_RANGE_WRITE |
			    SYNC_FILE_RANGE_WAIT_AFTER) != 0)
		err(1, "sync_file_range");

	if (unlink(name) != 0)
		err(1, "unlink");

	// if (ftruncate(fd, 0) != 0)
	//	err(1, "ftruncate");

	close(fd);
}

int main(int argc, char **argv)
{
	if (argc != 2) {
		printf("Usage: hammer_and_delete FILENAME\n");
		return 1;
	}

	while (true) {
		hammer(argv[1]);
		write(1, ".", 1);
	}
}
Re: [PATCH] [V2] MAINTAINERS: Add dts files for r8 series to SHMOBILE
Hi Ben,

thanks for your work on this.

On Mon, Jan 20, 2014 at 04:10:32PM +0000, Ben Dooks wrote:
> Add a number of files to the list of files covered by SHMOBILE
> so any changes to these can be reported with get_maintailers.pl
> for the current SHMOILE architectures.

I'm fine with you only addressing r8 and r7 SoCs; I am happy to make an incremental patch to cover the stragglers (sh7*, emev2). And I would be even happier if this patch didn't suffer excessive bike-shedding.

But I think there are some minor inconsistencies in this patch. Firstly, I think the subject should probably mention r7 and defconfigs.

>
> Signed-off-by: Ben Dooks
> ---
> v2:
> - add defconfigs and r7 configurations
> - fix path to dt-bindings
>
> Cc: Joe Perches
> Cc: Greg Kroah-Hartman
> Cc: Andrew Morton
> Cc: Magnus Damm
> Cc: Simon Horman
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@vger.kernel.org
> ---
> MAINTAINERS | 7 +++
> 1 file changed, 7 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6c20792..f74d830 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1239,7 +1239,14 @@ Q: http://patchwork.kernel.org/project/linux-sh/list/
> T: git git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas.git next
> S: Supported
> F: arch/arm/mach-shmobile/
> +F: arch/arm/boot/dts/r8*
> +F: arch/arm/boot/dts/r7*
> +F: arch/arm/configs/bockw_defconfig
> +F: arch/arm/configs/genmai_defconfig
> +F: arch/arm/configs/lager_defconfig
> +F: arch/arm/configs/marzen_defconfig

I believe you should add the following, as they are for boards that use SoCs whose names match r8*.

arch/arm/configs/armadillo800eva_defconfig
arch/arm/configs/ape6evm_defconfig
arch/arm/configs/koelsch_defconfig

> F: drivers/sh/
> +F: include/dt-bindings/clock/r8a*
>
> ARM/SOCFPGA ARCHITECTURE
> M: Dinh Nguyen
> --
> 1.8.5.2
[PATCH] Add HID's to hid-microsoft driver of Surface Type/Touch Cover 2 to fix bug
The patch below fixes bug 64811 (https://bugzilla.kernel.org/show_bug.cgi?id=64811), in which the Microsoft Surface Type/Touch Cover 2 devices are detected as multitouch devices. The fix adds the HID IDs of the two devices to the hid-microsoft driver. This ensures that hid-input will eventually be used for the devices and not hid-multitouch.

From 866c814f3f6740a5a79858fdf8bf5bbcdc3b57f8 Mon Sep 17 00:00:00 2001
From: Reyad Attiyat
Date: Mon, 20 Jan 2014 16:24:49 -0600
Subject: [PATCH 1/2] Added in ID's for Surface Type/Touch cover 2 to the hid-microsoft driver

---
 drivers/hid/hid-ids.h       | 4 +++-
 drivers/hid/hid-microsoft.c | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index f9304cb..b523a8b 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -611,7 +611,9 @@
 #define USB_DEVICE_ID_MS_PRESENTER_8K_USB 0x0713
 #define USB_DEVICE_ID_MS_DIGITAL_MEDIA_3K 0x0730
 #define USB_DEVICE_ID_MS_COMFORT_MOUSE_4500 0x076c
-
+#define USB_DEVICE_ID_MS_TOUCH_COVER_2 0x07a7
+#define USB_DEVICE_ID_MS_TYPE_COVER_2 0x07a9
+
 #define USB_VENDOR_ID_MOJO 0x8282
 #define USB_DEVICE_ID_RETRO_ADAPTER 0x3201
diff --git a/drivers/hid/hid-microsoft.c b/drivers/hid/hid-microsoft.c
index 551795b..2599de8 100644
--- a/drivers/hid/hid-microsoft.c
+++ b/drivers/hid/hid-microsoft.c
@@ -207,6 +207,10 @@ static const struct hid_device_id ms_devices[] = {
 		.driver_data = MS_NOGET },
 	{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_COMFORT_MOUSE_4500),
 		.driver_data = MS_DUPLICATE_USAGES },
+	{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TYPE_COVER_2),
+		.driver_data = 0 },
+	{ HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TOUCH_COVER_2),
+		.driver_data = 0 },
 	{ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PRESENTER_8K_BT),
 		.driver_data = MS_PRESENTER },
--
1.8.4.2
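As a rough illustration of why adding two table entries is enough to claim the devices, here is a minimal user-space sketch of ID-table matching. It is not the kernel's implementation: `struct id_entry` and `match_id()` are hypothetical simplifications (the real `struct hid_device_id` also carries bus and group fields), and only the two product IDs come from the patch above. The vendor ID 0x045e is Microsoft's USB vendor ID.

```c
#include <stddef.h>

#define USB_VENDOR_ID_MICROSOFT        0x045e
#define USB_DEVICE_ID_MS_TOUCH_COVER_2 0x07a7  /* from the patch */
#define USB_DEVICE_ID_MS_TYPE_COVER_2  0x07a9  /* from the patch */

/* Hypothetical, simplified stand-in for the kernel's hid_device_id. */
struct id_entry {
	unsigned short vendor;
	unsigned short product;
	unsigned long driver_data;
};

/* Devices this (pretend) driver claims; a zero vendor ends the table. */
static const struct id_entry ms_ids[] = {
	{ USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TYPE_COVER_2, 0 },
	{ USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_TOUCH_COVER_2, 0 },
	{ 0, 0, 0 }
};

/* Return the matching table entry, or NULL if the driver does not claim it. */
static const struct id_entry *match_id(const struct id_entry *table,
				       unsigned short vendor,
				       unsigned short product)
{
	for (; table->vendor; table++)
		if (table->vendor == vendor && table->product == product)
			return table;
	return NULL;
}
```

Calling `match_id(ms_ids, 0x045e, 0x07a9)` returns the Type Cover 2 entry, while an ID that is not listed yields NULL and would be left for another driver to claim.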
Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe
On Mon, Jan 20, 2014 at 04:26:23PM -0800, Greg KH wrote: > On Tue, Jan 21, 2014 at 12:07:06AM +, Russell King - ARM Linux wrote: > > On Mon, Jan 20, 2014 at 03:51:28PM -0800, Greg KH wrote: > > > On Mon, Jan 20, 2014 at 11:16:03PM +, Russell King - ARM Linux wrote: > > > > I don't believe the driver model has any locking to prevent a drivers > > > > ->probe function running concurrently with it's ->remove function for > > > > two (or more) devices. > > > > > > The bus prevents this from happening. > > > > > > > The locking against this is done on a per-device basis, not a per-driver > > > > basis. > > > > > > No, on a per-bus basis. > > > > I don't see it. > > > > Let's start from driver_register(). > > Which happens from module probing, which is single-threaded, right? Yes, to _some_ extent - the driver is added to the bus list of drivers before existing drivers are probed, so it's always worth bearing in mind that if a new device comes along, it's possible for that device to be offered to even a driver which hasn't finished returning from its module_init(). > > If you think there's a per-driver lock that's held over probes or removes, > > please point it out. I'm fairly certain that there isn't, because we have > > to be able to deal with recursive probes (yes, we've had to deal with > > those in the past.) > > Hm, you are right, I think that's why we had to remove the locks. The > klist stuff handles us getting the needed locks for managing our > internal lists of devices and drivers, and those should be fine. > > So, let's go back to your original worry, what are you concerned about? > A device being removed while probe() is called? My concern is that we're turning something which should be simple into something unnecessarily complex. 
By that, I mean something along the lines of:

static DEFINE_MUTEX(foo_mutex);
static unsigned foo_devices;

static int foo_probe(struct platform_device *pdev)
{
	int ret;

	mutex_lock(&foo_mutex);
	if (foo_devices++ == 0)
		uart_register_driver();
	ret = foo_really_probe_device(pdev);
	if (ret) {
		if (--foo_devices == 0)
			uart_unregister_driver();
	}
	mutex_unlock(&foo_mutex);
	return ret;
}

static int foo_remove(struct platform_device *pdev)
{
	mutex_lock(&foo_mutex);
	foo_really_remove(pdev);
	if (--foo_devices == 0)
		uart_unregister_driver();
	mutex_unlock(&foo_mutex);
	return 0;
}

in every single serial driver we have... Wouldn't it just be better to fix the major/minor number problem rather than have to add all that code repetitively to all those drivers?

-- 
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".
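The register-on-first-probe pattern from the message above can be exercised in user space. The following is only a sketch under stated assumptions: a pthread mutex stands in for the kernel mutex, `fake_register_driver()` and `fake_unregister_driver()` are hypothetical stand-ins for the uart registration calls, and the `probe_ok` parameter simulates whether the device-specific probe succeeds.

```c
#include <pthread.h>

/* Hypothetical stand-ins for uart_register_driver()/uart_unregister_driver(). */
static int registered;      /* 1 while the pretend driver is registered */
static int register_calls;  /* how many times registration has run */

static void fake_register_driver(void)   { registered = 1; register_calls++; }
static void fake_unregister_driver(void) { registered = 0; }

static pthread_mutex_t foo_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int foo_devices;  /* devices bound or currently probing */

/* probe_ok simulates the result of the device-specific probe. */
static int foo_probe(int probe_ok)
{
	int ret;

	pthread_mutex_lock(&foo_mutex);
	if (foo_devices++ == 0)          /* first device: register the driver */
		fake_register_driver();

	ret = probe_ok ? 0 : -1;
	if (ret) {
		if (--foo_devices == 0)  /* failed and nobody else is bound */
			fake_unregister_driver();
	}
	pthread_mutex_unlock(&foo_mutex);
	return ret;
}

static void foo_remove(void)
{
	pthread_mutex_lock(&foo_mutex);
	if (--foo_devices == 0)          /* last device gone: unregister */
		fake_unregister_driver();
	pthread_mutex_unlock(&foo_mutex);
}
```

The mutex is what makes the count and the registration state change together; without it, a probe and a remove running at the same time on two devices could observe the counter and the registration out of step, which is the race being discussed in this thread.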
Re: [PATCH 3.13-rc5] module: Add missing newline in printk call.
Tetsuo Handa writes:
> Rusty, would you pick up this patch?
>
> This message was added in 3.13-rc1. Thus, should be fixed in 3.13.

Thanks, applied. It's a bit trivial for a CC:stable though.

Cheers,
Rusty.

> Tetsuo Handa wrote:
>> From cc90e27d5cda227e7a0cbeb5de3cc1cbb1595dfa Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa
>> Date: Mon, 23 Dec 2013 15:52:42 +0900
>> Subject: [PATCH] module: Add missing newline in printk call.
>>
>> Add missing \n and also follow commit bddb12b3 "kernel/module.c: use
>> pr_foo()".
>>
>> Signed-off-by: Tetsuo Handa
>> ---
>>  kernel/module.c |    6 ++----
>>  1 files changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index f5a3b1e..d24fcf2 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -815,10 +815,8 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>>  		return -EFAULT;
>>  	name[MODULE_NAME_LEN-1] = '\0';
>>
>> -	if (!(flags & O_NONBLOCK)) {
>> -		printk(KERN_WARNING
>> -		       "waiting module removal not supported: please upgrade");
>> -	}
>> +	if (!(flags & O_NONBLOCK))
>> +		pr_warn("waiting module removal not supported: please upgrade\n");
>>
>>  	if (mutex_lock_interruptible(&module_mutex) != 0)
>>  		return -EINTR;
>> --
>> 1.7.1
Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe
On Tue, Jan 21, 2014 at 12:07:06AM +0000, Russell King - ARM Linux wrote:
> On Mon, Jan 20, 2014 at 03:51:28PM -0800, Greg KH wrote:
> > On Mon, Jan 20, 2014 at 11:16:03PM +0000, Russell King - ARM Linux wrote:
> > > I don't believe the driver model has any locking to prevent a drivers
> > > ->probe function running concurrently with it's ->remove function for
> > > two (or more) devices.
> >
> > The bus prevents this from happening.
> >
> > > The locking against this is done on a per-device basis, not a per-driver
> > > basis.
> >
> > No, on a per-bus basis.
>
> I don't see it.
>
> Let's start from driver_register().

Which happens from module probing, which is single-threaded, right? Or from module_init callbacks, which is single-threaded.

Normally, busses never add devices (which is what drivers bind to), except in a single-at-a-time fashion, unless they really know what they are doing (i.e. PCI had multi-threaded device probing for a while, don't remember if it still does...)

> This takes no locks and calls bus_add_driver().
> This also takes no locks and calls driver_attach().
> This walks the list of devices calling __driver_attach() for each.
> __driver_attach() tries to match the device against the driver,
> locks the parent device if one exists, and the device which is about
> to be probed. It then calls driver_probe_device().
> driver_probe_device() inserts a runtime barrier and calls really_probe().
> really_probe() ultimately calls either the bus ->probe method or the
> driver ->probe method.
>
> At no point in that sequence do I see anything which does any locking
> on a per-driver basis. Let's look at device_add().
>
> device_add() calls bus_probe_device(), which then calls device_attach().
> device_attach() takes the device's lock, and walks the list of drivers
> calling __device_attach() on each driver. This then calls down into
> driver_probe_device(), and the path is the same as the above.
>
> I don't see any per-driver locking here either.
>
> I've checked the klist stuff, don't see anything there. Ditto for
> bus_for_each_drv().
>
> If you think there's a per-driver lock that's held over probes or removes,
> please point it out. I'm fairly certain that there isn't, because we have
> to be able to deal with recursive probes (yes, we've had to deal with
> those in the past.)

Hm, you are right, I think that's why we had to remove the locks. The klist stuff handles us getting the needed locks for managing our internal lists of devices and drivers, and those should be fine.

So, let's go back to your original worry, what are you concerned about? A device being removed while probe() is called?

thanks,

greg k-h
Re: [PATCH net-next v5 8/9] xen-netback: Timeout packets in RX path
On 20/01/14 22:03, Wei Liu wrote: On Mon, Jan 20, 2014 at 09:24:28PM +, Zoltan Kiss wrote: @@ -557,12 +577,25 @@ void xenvif_disconnect(struct xenvif *vif) void xenvif_free(struct xenvif *vif) { int i, unmap_timeout = 0; + /* Here we want to avoid timeout messages if an skb can be legitimatly +* stucked somewhere else. Realisticly this could be an another vif's +* internal or QDisc queue. That another vif also has this +* rx_drain_timeout_msecs timeout, but the timer only ditches the +* internal queue. After that, the QDisc queue can put in worst case +* XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS skbs into that another vif's +* internal queue, so we need several rounds of such timeouts until we +* can be sure that no another vif should have skb's from us. We are +* not sending more skb's, so newly stucked packets are not interesting +* for us here. +*/ You beat me to this. Was about to reply to your other email. :-) It's also worth mentioning that DIV_ROUND_UP part is merely estimation, as you cannot possible know the maximum / miminum queue length of all other vifs (as they can be changed during runtime). In practice most users will stick with the default, but some advanced users might want to tune this value for individual vif (whether that's a good idea or not is another topic). So, in order to convince myself this is safe. I also did some analysis on the impact of having queue length other than default value. If queue_len < XENVIF_QUEUE_LENGTH, that means you can queue less packets in qdisc than default and drain it faster than calculated, which is safe. On the other hand if queue_len > XENVIF_QUEUE_LENGTH, it means actually you need more time than calculated. I'm in two minded here. The default value seems sensible to me but I'm still a bit worried about the queue_len > XENVIF_QUEUE_LENGTH case. An idea is to book-keep maximum tx queue len among all vifs and use that to calculate worst scenario. I don't think it should be that perfect. 
This is just a best effort estimation, if someone changes the vif queue length and see this message because of that, nothing very drastic will happen. It is just a rate limited warning message. Well, it is marked as error, because it is a serious condition. And also, the odds of seeing this message unnecessarily are quite low. With default settings (256 slots, max 17 per skb, 32 queue length, 10 secs queue drain timeout) this delay is 20 seconds. You can raise the queue length to 64 before getting warning (see netif_napi_add), so it would go up to 40 seconds, but anyway, if your vif is sitting on a packet more than 20 seconds, you deserve this message :) Zoli
Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe
On Mon, Jan 20, 2014 at 11:47:34PM +0000, Alan Cox wrote:
> But yes I agree about the idiom, but a definite NAK to any attempts to
> plaster over this grand screwup by crapping in the tty core. Your turd,
> deal with it locally in the ARM code if you can't apply common sense and
> just go dynamic.

I believe at the time there was no one maintaining the device list to _do_ that allocation - AMBA PL011 came along in 2005 after (I believe) hpa stopped looking after that list. So, please tell me how a number could be allocated properly without the device numbers list being maintained?

I've no problem with going dynamic, and I suggest that you get a sense of perspective rather than just spouting rubbish from on high.

-- 
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".
Re: [RFC PATCH 2/3] zram: introduce zram compressor operations struct
On Mon, Jan 20, 2014 at 01:03:48PM +0300, Sergey Senozhatsky wrote: > On (01/20/14 14:12), Minchan Kim wrote: > > Hello Sergey, > > > > I reviewed this patchset and I suggest somethings. > > Please have a look and feedback to me. :) > > > > 1. Let's define new file zram_comp.c > > 2. zram_comp includes following field > >.create > >.compress > >.decompress. > >.destroy > >.name > > > > alternatively, we can use crypto api, the same way as zswap does (that > will require handling of cpu hotplug). > > -ss I really doubt what's the benefit from crypto API for zram. It's maybe since I'm not familiar with it so I should ask a silly question. 1. What's the runtime overhead for using such frontend? As you know, zram is in-memory block device so I don't want to add unnecessary overhead to optimize. 2. What's the memory footprint for using such frontend? As you know, zram is very popular for small-memory embedded device so I don't want to consume more runtime memory and static memory due to CONFIG_CRYPTO friend. 3. Is it a flexible to alloc/handle multiple compressor buffer for the our purpose? zswap and zcache have been used it with per-cpu buffer but it would a problem for write scalabitliy if we uses zlib which takes long time to compress. When I read code, maybe we can allocate multiple buffers through cryptop_alloc_compo several time but it would cause 1) and 2) problem again. So, what's the attractive point for using crypto? One of thing I could imagine is that it could make zram H/W compressor but I don't have heard about it so if we don't have any special reason, I'd like to go with raw compressor so we can get a *base* number. Then, if we really need crypto API, we can change it easily and benchmark. Finally, we could get a comparision number in future and it would make the decision easily. Thanks. 
-- 
Kind regards,
Minchan Kim
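The backend interface proposed in the quoted review (.create, .compress, .decompress, .destroy, .name) could look roughly like the sketch below. Everything here is hypothetical: `struct comp_ops`, its signatures, and the trivial "store" backend are illustrative assumptions written for user space, not the interface that zram actually adopted.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical backend interface; names are illustrative, not zram's API. */
struct comp_ops {
	const char *name;
	void *(*create)(void);        /* allocate private workspace */
	void (*destroy)(void *priv);  /* release private workspace */
	/*
	 * *dst_len holds the capacity of dst on entry and the number of
	 * bytes written on success. Both callbacks return 0 on success.
	 */
	int (*compress)(void *priv, const unsigned char *src, size_t src_len,
			unsigned char *dst, size_t *dst_len);
	int (*decompress)(void *priv, const unsigned char *src, size_t src_len,
			  unsigned char *dst, size_t *dst_len);
};

/* Trivial "store" backend that copies data unchanged. */
static void *store_create(void)
{
	return malloc(1);
}

static void store_destroy(void *priv)
{
	free(priv);
}

static int store_copy(void *priv, const unsigned char *src, size_t src_len,
		      unsigned char *dst, size_t *dst_len)
{
	(void)priv;
	if (*dst_len < src_len)
		return -1;  /* output buffer too small */
	memcpy(dst, src, src_len);
	*dst_len = src_len;
	return 0;
}

static const struct comp_ops store_ops = {
	.name       = "store",
	.create     = store_create,
	.destroy    = store_destroy,
	.compress   = store_copy,
	.decompress = store_copy,
};
```

A caller would pick a backend by name, call `create()` once per workspace, and then go through the function pointers only. That indirection is what would allow a raw compressor to be measured first and swapped for a crypto API frontend later, as the message suggests.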
Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe
On Mon, Jan 20, 2014 at 03:51:28PM -0800, Greg KH wrote: > On Mon, Jan 20, 2014 at 11:16:03PM +, Russell King - ARM Linux wrote: > > I don't believe the driver model has any locking to prevent a drivers > > ->probe function running concurrently with it's ->remove function for > > two (or more) devices. > > The bus prevents this from happening. > > > The locking against this is done on a per-device basis, not a per-driver > > basis. > > No, on a per-bus basis. I don't see it. Let's start from driver_register(). This takes no locks and calls bus_add_driver(). This also takes no locks and calls driver_attach(). This walks the list of devices calling __driver_attach() for each. __driver_attach() tries to match the device against the driver, locks the parent device if one exists, and the device which is about to be probed. It then calls driver_probe_device(). driver_probe_device() inserts a runtime barrier and calls really_probe(). really_probe() ultimately calls either the bus ->probe method or the driver ->probe method. At no point in that sequence do I see anything which does any locking on a per-driver basis. Let's look at device_add(). device_add() calls bus_probe_device(), which then calls device_attach(). device_attach() takes the device's lock, and walks the list of drivers calling __device_attach() on each driver. This then calls down into driver_probe_device(), and the path is the same as the above. I don't see any per-driver locking here either. I've checked the klist stuff, don't see anything there. Ditto for bus_for_each_drv(). If you think there's a per-driver lock that's held over probes or removes, please point it out. I'm fairly certain that there isn't, because we have to be able to deal with recursive probes (yes, we've had to deal with those in the past.) -- FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad. Estimate before purchase was "up to 13.2Mbit". 
linux-next: manual merge of the imx-mxs tree with the arm tree
Hi Shawn,

Today's linux-next merge of the imx-mxs tree got conflicts in arch/arm/boot/dts/Makefile, arch/arm/boot/dts/imx6dl-hummingboard.dts, arch/arm/boot/dts/imx6qdl-microsom-ar8035.dtsi and arch/arm/boot/dts/imx6qdl-microsom.dtsi between commits 728d5599f5d8 ("ARM: imx: initial SolidRun HummingBoard support") and d79c363fd9cd ("ARM: imx: initial SolidRun Cubox-i support") from the arm tree and a series of commits from the imx-mxs tree.

Russell told me that the imx-mxs tree changes have not yet been approved for v3.13 inclusion and they conflict fairly badly, so for today I have dropped the imx-mxs tree. Let me know what I should do in the future.

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: Deadlock in do_page_fault() on ARM (old kernel)
On 01/17/2014 08:20 PM, Russell King - ARM Linux wrote: On Fri, Jan 17, 2014 at 07:57:16PM -0500, Alan Ott wrote: On 01/17/2014 08:46 AM, Russell King - ARM Linux wrote: My suspicion therefore is that some other thread must have died while holding the mmap_sem, so there's probably a kernel oops earlier... that's my best guess at the moment without seeing the full backtrace. There's no oops that I'm able to see. Each of the tasks which lockdep reports as "holding" mmap_sem are blocking for it. If some other task had taken it and then crashed, I assume lockdep would list the crashed task as also holding the resource in the printout. My point is this: - the five (or six) threads which are trying to take the mmap_sem in read-mode in the fault handler are all blocked on it - they haven't taken the lock, which will only happen because there's a pending writer. - of these in your original post, there are two which faulted from __copy_to_user_std(). __copy_to_user_std() doesn't take the mmap_sem - this is the non-uaccess-with-memcpy path. - the pending writers are the two threads in sys_mmap_pgoff(), both of which are blocked waiting to gain the write lock. - there are no *other* threads holding the mmap_sem lock. Yes, all true. I don't remember why I started looking at the memcpy() case. So... there's a question here how we got into this state - and frankly I don't know. What I do see from your latest dump is that there's two unknown modules there - something called rcu2m and another called buttoms, and there are two threads inside ioctls there. Both have faulted from the function at 0xc0d2a394 (which won't appear in the backtrace, but is most likely __copy_to_user_std.) Yes, there are a handful of out-of-tree modules. So, in the absence of you saying anything about there being any preceding oopses, my conclusion now is that one of those modules is taking the mmap_sem itself, and is the culpret inducing this deadlock. Yes, I came to that as well. 
I had checked for the presence of mmap_sem in the sources of the out-of-tree modules and didn't see it. However, upon closer inspection, my grep-fu failed me as there were some backward symlinks I didn't account for. TI's cmemk module _is_ taking out mmap_sem. I wish I had seen this days ago. That's my new investigation path. Note that your dump ([2]) in your reply was just the hung task detector printing out the stacktrace for a few tasks, not the full all-threads stack dump which I was expecting. Yes, in a misguided attempt to keep the SNR high, I didn't include the full dump, but only what I thought was the interesting part. I did another capture and the full dump is at [1]. So I'm pulling out these conclusions from the very little information you're supplying. I appreciate it. Thank you for taking the time to reply. Alan. [1] http://www.signal11.us/~alan/stack_dump_all_tasks_with_frame_pointers.txt
Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe
On Mon, Jan 20, 2014 at 11:35:41PM +0000, Alan Cox wrote:
> > The first bit is easy... but we need to add locks to every serial
> > driver to prevent two probes operating concurrently...
>
> The bus probe should already be serializing surely ?

Yes, it better be, otherwise that bus is badly broken.

greg k-h
Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe
On Mon, Jan 20, 2014 at 11:16:03PM +0000, Russell King - ARM Linux wrote:
> On Mon, Jan 20, 2014 at 03:11:41PM -0800, Greg KH wrote:
> > On Mon, Jan 20, 2014 at 09:32:06PM +0000, Russell King - ARM Linux wrote:
> > > On Mon, Jan 20, 2014 at 01:16:01PM -0800, Greg KH wrote:
> > > > On Mon, Jan 20, 2014 at 10:05:30AM +0000, Russell King - ARM Linux wrote:
> > > > > On Mon, Jan 20, 2014 at 02:32:34PM +0530, Tushar Behera wrote:
> > > > > > uart_register_driver call binds the driver to a specific device
> > > > > > node through tty_register_driver call. This should typically happen
> > > > > > during device probe call.
> > > > > >
> > > > > > In a multiplatform scenario, it is possible that multiple serial
> > > > > > drivers are part of the kernel. Currently the driver registration fails
> > > > > > if multiple serial drivers with same default major/minor numbers are
> > > > > > included in the kernel.
> > > > > >
> > > > > > A typical case is observed with amba-pl011 and samsung-uart drivers.
> > > > >
> > > > > The samsung-uart driver is at fault here - the major/minor numbers were
> > > > > officially registered to amba-pl011. Samsung needs to be fixed properly.
> > > >
> > > > I agree, the Samsung driver is "broken" here, but that's no reason why
> > > > these two drivers can't register with the tty layer _after_ the hardware
> > > > is detected, and not before.
> > > >
> > > > That saves resources on systems that build the drivers in, yet do not
> > > > have the hardware present, which is always a good thing.
> > >
> > > Great, so what you're saying is that we need to wait until the first
> > > device calls into the probe function. What about removal... how does
> > > a driver know when it's last device has been removed to de-register
> > > that?
> >
> > The "bus" that the device is on handles that, right?
> >
> > > I guess it needs the driver model to provide some way to know when a
> > > driver is completely unbound - but isn't that racy?
> >
> > How is it racy? That's how the driver model works...
>
> Think about what happens when the last device unregisters, but a new
> device comes along and is probed.
>
> I don't believe the driver model has any locking to prevent a drivers
> ->probe function running concurrently with it's ->remove function for
> two (or more) devices.

The bus prevents this from happening.

> The locking against this is done on a per-device basis, not a per-driver
> basis.

No, on a per-bus basis.

thanks,

greg k-h
Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe
On Mon, 20 Jan 2014 23:14:57 + Mark Brown wrote: > On Mon, Jan 20, 2014 at 09:43:05PM +, Alan Cox wrote: > > > The dynamic major/minor is the right patch. If the userspace breaks then > > the userspace was broken, but I see no evidence in the discussion that > > the userspace broke. > > The userspace breakage is that if someone has a static /dev that doesn't > handle any dynamic devices then renumbering the device will cause that > static /dev to stop matching the kernel. Diddums and you've only provided theoretical cases not real world ones. They should have followed proper practice and reserved their minors. The device number belongs to the Altix. The driver should just move. > > Thats what the list says. Samsung should have followed the rules, they > > didn't so they get to pick up the pieces. The Amba driver wants moving as > > well. It's easy. If you want something to be ABI then make sure you get > > it upstream first, if not you get to own all the pain down the line. > > This stuff is all upstream already, a quick check suggests both drivers > predate git - it's been noticed because the ARM multiplatform work has > caused people to try booting kernels with both built in. So ARM people didn't follow the policy on allocating device minors even within their own community and got burned by it. That's despite having previously been burned by abusing the ttyS0 8250 major/minor in the same way, for the same purposes, and creating the same mess. {facepalm...} > > If the hardware isn't present then the driver shouldn't even register > > with the tty layer in the first place so it doesn't make any resource > > differeneces either for properly written code. > > Right, that's not the idiom that has been followed by any of serial > drivers though so needs fixing too. Actally some drivers do get this right but not many. ehv-bc for example does. 
But yes I agree about the idiom, but a definite NAK to any attempts to plaster over this grand screwup by crapping in the tty core. Your turd, deal with it locally in the ARM code if you can't apply common sense and just go dynamic. And please, after screwing this up twice - *learn* from the mess.

Alan
[PATCH] vfs: Remove second variable named error in __dentry_path
In commit 232d2d60aa5469bb097f55728f65146bd49c1d25
Author: Waiman Long
Date:   Mon Sep 9 12:18:13 2013 -0400

    dcache: Translating dentry into pathname without taking rename_lock

The __dentry_path locking was changed and the variable error was intended to be moved outside of the loop. Unfortunately the inner declaration of error was not removed, resulting in a version of __dentry_path that will never return an error.

Remove the problematic inner declaration of error and allow __dentry_path to return errors once again.

Cc: sta...@vger.kernel.org
Cc: Waiman Long
Signed-off-by: "Eric W. Biederman"
---
 fs/dcache.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index cb4a10690868..fdbe23027810 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3135,7 +3135,6 @@ restart:
 	read_seqbegin_or_lock(&rename_lock, &seq);
 	while (!IS_ROOT(dentry)) {
 		struct dentry *parent = dentry->d_parent;
-		int error;

 		prefetch(parent);
 		error = prepend_name(&end, &len, &dentry->d_name);
--
1.7.5.4
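The effect of the stray inner declaration can be reproduced in a few lines of ordinary C. This is a minimal sketch, not the dcache code: `step()`, `walk_buggy()` and `walk_fixed()` are hypothetical names, and `step()` simply fails at a chosen iteration so the difference is visible.

```c
/* Hypothetical helper: returns -1 on iteration fail_at, 0 otherwise. */
static int step(int i, int fail_at)
{
	return i == fail_at ? -1 : 0;
}

/* Buggy variant: the inner declaration shadows the outer 'error'. */
static int walk_buggy(int steps, int fail_at)
{
	int error = 0;
	int i;

	for (i = 0; i < steps; i++) {
		int error;  /* shadows the outer variable */

		error = step(i, fail_at);
		if (error)
			break;
	}
	return error;       /* outer variable, never written: always 0 */
}

/* Fixed variant: only one 'error', so the failure propagates. */
static int walk_fixed(int steps, int fail_at)
{
	int error = 0;
	int i;

	for (i = 0; i < steps; i++) {
		error = step(i, fail_at);
		if (error)
			break;
	}
	return error;
}
```

With a failure injected at iteration 1, `walk_buggy(3, 1)` still returns 0 while `walk_fixed(3, 1)` returns -1. Building with a shadowing warning enabled (for example GCC's `-Wshadow`) flags the inner declaration.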
Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe
> The first bit is easy... but we need to add locks to every serial
> driver to prevent two probes operating concurrently...

The bus probe should already be serializing surely ?
Re: [PATCH v7 6/6] MCS Lock: add Kconfig entries to allow arch-specific hooks
On Mon, 2014-01-20 at 13:30 +0100, Peter Zijlstra wrote:
>
> Then again, people seem to whinge if you don't keep these Kbuild files
> sorted, but manually sorting 29 files is just not something I like to
> do.
>

Peter,

Can you clarify what exactly needs to be sorted? The Kbuild files
spit out by git diff appear to be sorted already.

Tim

> ---
>  arch/alpha/include/asm/Kbuild      |  1 +
>  arch/arc/include/asm/Kbuild        |  1 +
>  arch/arm/include/asm/Kbuild        |  1 +
>  arch/arm64/include/asm/Kbuild      |  1 +
>  arch/avr32/include/asm/Kbuild      |  1 +
>  arch/blackfin/include/asm/Kbuild   |  1 +
>  arch/c6x/include/asm/Kbuild        |  1 +
>  arch/cris/include/asm/Kbuild       |  1 +
>  arch/frv/include/asm/Kbuild        |  1 +
>  arch/hexagon/include/asm/Kbuild    |  1 +
>  arch/ia64/include/asm/Kbuild       |  2 +-
>  arch/m32r/include/asm/Kbuild       |  1 +
>  arch/m68k/include/asm/Kbuild       |  1 +
>  arch/metag/include/asm/Kbuild      |  1 +
>  arch/microblaze/include/asm/Kbuild |  1 +
>  arch/mips/include/asm/Kbuild       |  1 +
>  arch/mn10300/include/asm/Kbuild    |  1 +
>  arch/openrisc/include/asm/Kbuild   |  1 +
>  arch/parisc/include/asm/Kbuild     |  1 +
>  arch/powerpc/include/asm/Kbuild    |  2 +-
>  arch/s390/include/asm/Kbuild       |  1 +
>  arch/score/include/asm/Kbuild      |  1 +
>  arch/sh/include/asm/Kbuild         |  1 +
>  arch/sparc/include/asm/Kbuild      |  1 +
>  arch/tile/include/asm/Kbuild       |  1 +
>  arch/um/include/asm/Kbuild         |  1 +
>  arch/unicore32/include/asm/Kbuild  |  1 +
>  arch/x86/include/asm/Kbuild        |  1 +
>  arch/xtensa/include/asm/Kbuild     |  1 +
>  include/asm-generic/mcs_spinlock.h | 13 +
>  30 files changed, 42 insertions(+), 2 deletions(-)
>
> diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
> index f01fb505ad52..14cbbbcec01f 100644
> --- a/arch/alpha/include/asm/Kbuild
> +++ b/arch/alpha/include/asm/Kbuild
> @@ -4,3 +4,4 @@ generic-y += clkdev.h
>  generic-y += exec.h
>  generic-y += trace_clock.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
> index 9ae21c198007..c0773a5c2ca7 100644
> --- a/arch/arc/include/asm/Kbuild
> +++ b/arch/arc/include/asm/Kbuild
> @@ -48,3 +48,4 @@ generic-y += user.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
> index c38b58c80202..c68cfdde8783 100644
> --- a/arch/arm/include/asm/Kbuild
> +++ b/arch/arm/include/asm/Kbuild
> @@ -34,3 +34,4 @@ generic-y += timex.h
>  generic-y += trace_clock.h
>  generic-y += unaligned.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
> index 519f89f5b6a3..24a3c10cdf38 100644
> --- a/arch/arm64/include/asm/Kbuild
> +++ b/arch/arm64/include/asm/Kbuild
> @@ -51,3 +51,4 @@ generic-y += user.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild
> index 658001b52400..466e13d06bd3 100644
> --- a/arch/avr32/include/asm/Kbuild
> +++ b/arch/avr32/include/asm/Kbuild
> @@ -18,3 +18,4 @@ generic-y += sections.h
>  generic-y += topology.h
>  generic-y+= trace_clock.h
>  generic-y += xor.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild
> index f2b43474b0e2..0bd1c5c688e3 100644
> --- a/arch/blackfin/include/asm/Kbuild
> +++ b/arch/blackfin/include/asm/Kbuild
> @@ -45,3 +45,4 @@ generic-y += unaligned.h
>  generic-y += user.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
> index fc0b3c356027..21d7100ddef9 100644
> --- a/arch/c6x/include/asm/Kbuild
> +++ b/arch/c6x/include/asm/Kbuild
> @@ -57,3 +57,4 @@ generic-y += user.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/cris/include/asm/Kbuild b/arch/cris/include/asm/Kbuild
> index 199b1a9dab89..c571cc12a4d2 100644
> --- a/arch/cris/include/asm/Kbuild
> +++ b/arch/cris/include/asm/Kbuild
> @@ -13,3 +13,4 @@ generic-y += trace_clock.h
>  generic-y += vga.h
>  generic-y += xor.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/frv/include/asm/Kbuild b/arch/frv/include/asm/Kbuild
> index 74742dc6a3da..ccca92eb782a 100644
> --- a/arch/frv/include/asm/Kbuild
> +++ b/arch/frv/include/asm/Kbuild
> @@ -3,3 +3,4 @@ generic-y += clkdev.h
>  generic-y += exec.h
>  generic-y += trace_clock.h
>  generic-y += preempt.h
> +generic-y += mcs_spinlock.h
> diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
> index ada843c701ef..553077d0f50c 100644
> ---
Re: 3.12-rc5 and overwritten partition table - by powertop?
On 2013-10-29 21:10, Jan Kara wrote:
>> The first ~170kb of /dev/sda got blown away with what seems to be a logging
>> output by Powertop, when I was playing with the tuneables.
>> (Luckily the first partition starts later :-))
> So did you log the output to some file? I'm just trying to understand how
> it could get onto your disk in the first place...
>
>> Why is that I don't know, but maybe when turning on the SATA knobs
>> something goes wrong. I'm afraid to try again, but I accept rather higher
>> power use than data loss again :-/

I experienced the same on the very same hardware (Lenovo T440s). Like
John, I turned all those knobs in powertop, including the SATA ones.
Several times I ended up with a broken partition table. Once, even my EFI
System partition (the first partition) was broken. However, since I use EFI
I was able to recover the partition table quite easily (gdisk offers to
recover from the backup partition table, kudos to the designer of the GPT
format!).

This happens on Arch Linux with the stock 3.12.7 kernel as well as with a
mainline 3.13 kernel. I use the latest T440s firmware (2.17).

Is it possible to disable that knob, or to warn the user when it is used
(at least on the Lenovo T440s), to avoid leaving users with an unbootable
system?

dmesg output:
[    2.744398] ata1: SATA max UDMA/133 abar m2048@0xf063c000 port 0xf063c100 irq 59
[    2.744400] ata2: DUMMY
[    2.744401] ata3: DUMMY
[    3.063804] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    3.064532] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[    3.064536] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[    3.064538] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[    3.064606] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[    3.064926] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[    3.064929] ata1.00: ATA-9: Samsung SSD 840 PRO Series, DXM05B0Q, max UDMA/133
[    3.064931] ata1.00: 500118192 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    3.065256] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[    3.065259] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[    3.065261] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[    3.065286] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[    3.065545] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[    3.065605] ata1.00: configured for UDMA/133
...
[  130.578789] ata1.00: exception Emask 0x0 SAct 0x7fff SErr 0x4 action 0x0
[  130.578794] ata1.00: irq_stat 0x4001
[  130.578796] ata1: SError: { CommWake }
[  130.578798] ata1.00: failed command: WRITE FPDMA QUEUED
[  130.578802] ata1.00: cmd 61/10:00:f0:29:05/00:00:00:00:00/40 tag 0 ncq 8192 out
[  130.578804] ata1.00: status: { DRDY ERR }
[  130.578806] ata1.00: error: { ABRT }
...
[  130.579011] ata1.00: failed command: WRITE FPDMA QUEUED
[  130.579014] ata1.00: cmd 61/10:f0:58:7c:0f/00:00:00:00:00/40 tag 30 ncq 8192 out
[  130.579016] ata1.00: status: { DRDY ERR }
[  130.579017] ata1.00: error: { ABRT }
[  130.579207] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[  130.579456] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[  130.579511] ata1.00: configured for UDMA/133
[  130.579583] ata1: EH complete

--
Stefan
Re: Poor network performance x86_64.. also with 3.13
On 01/20/2014 11:37 PM, Borislav Petkov wrote:
> On Mon, Jan 20, 2014 at 11:27:25PM +0100, Daniel Exner wrote:
>> I just did the same procedure with Kernel Version 3.13: same poor rates.
>>
>> I think I will try to see if 3.12.6 was still ok and bisect from there.
>
> Or try something more coarse-grained like 3.11 first, then 3.12 and then
> the -rcs in between.

Hm, on my machine 3.13 (latest git) has double the throughput of 3.11
(distro compiled) on the loopback interface: 68Gb vs 33Gb (iperf).
[PATCH] vfs: Is mounted should be testing mnt_ns for NULL or error.
A bug was introduced with the is_mounted helper function in

commit f7a99c5b7c8bd3d3f533c8b38274e33f3da9096e
Author: Al Viro
Date:   Sat Jun 9 00:59:08 2012 -0400

    get rid of ->mnt_longterm

    it's enough to set ->mnt_ns of internal vfsmounts to something
    distinct from all struct mnt_namespace out there; then we can
    just use the check for ->mnt_ns != NULL in the fast path of
    mntput_no_expire()

    Signed-off-by: Al Viro

The intent was to test if the real_mount(vfsmount)->mnt_ns was
NULL_OR_ERR but the code is actually testing real_mount(vfsmount)
and always returning true.

The result is d_absolute_path returning paths it should be hiding.

Cc: sta...@vger.kernel.org
Signed-off-by: "Eric W. Biederman"
---
 fs/mount.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index d64c594be6c4..a17458ca6f29 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -74,7 +74,7 @@ static inline int mnt_has_parent(struct mount *mnt)
 static inline int is_mounted(struct vfsmount *mnt)
 {
 	/* neither detached nor internal? */
-	return !IS_ERR_OR_NULL(real_mount(mnt));
+	return !IS_ERR_OR_NULL(real_mount(mnt)->mnt_ns);
 }
 
 extern struct mount *__lookup_mnt(struct vfsmount *, struct dentry *);
--
1.7.5.4
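
To illustrate why the old check always returned true, here is a minimal
user-space sketch. Everything in it is a simplified stand-in rather than
the kernel's definitions: is_err_or_null() approximates IS_ERR_OR_NULL(),
real_mount_sketch() approximates real_mount(), and the two structs carry
only the fields needed for the demonstration. The point is that a pointer
computed by stepping back from an embedded member of a valid object is
itself never NULL or an error value, so only the ->mnt_ns field can carry
the "detached or internal" information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's error-pointer convention. */
#define MAX_ERRNO 4095
#define IS_ERR_VALUE(x) ((uintptr_t)(x) >= (uintptr_t)-MAX_ERRNO)

static inline int is_err_or_null(const void *p)
{
	return !p || IS_ERR_VALUE(p);
}

struct mnt_namespace { int dummy; };
struct vfsmount { int mnt_flags; };

/* Hypothetical cut-down struct mount: only the fields used here. */
struct mount {
	struct mnt_namespace *mnt_ns;
	struct vfsmount mnt;
};

/* Same idea as real_mount(): step back from the embedded member. */
static inline struct mount *real_mount_sketch(struct vfsmount *v)
{
	assert(v != NULL);
	return (struct mount *)((char *)v - offsetof(struct mount, mnt));
}

/* Old check: tests the container pointer, which is valid whenever v is. */
static inline int is_mounted_buggy(struct vfsmount *v)
{
	return !is_err_or_null(real_mount_sketch(v));
}

/* Fixed check: tests the namespace pointer stored inside the mount. */
static inline int is_mounted_fixed(struct vfsmount *v)
{
	return !is_err_or_null(real_mount_sketch(v)->mnt_ns);
}
```

With a mount whose mnt_ns is NULL, is_mounted_buggy() still reports it as
mounted while is_mounted_fixed() does not, which matches the behaviour
described in the commit message.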
Re: [PATCH] NFSv4.1: new layout stateid can not be overwrite by one out of date
On Mon, 2014-01-20 at 16:15 +0800, shaobingqing wrote:
> If initiate_file_draining returned NFS4ERR_DELAY, all the lsegs of
> a file might be released before the retried cb_layout request arrives
> at the client. In this situation, the layoutget request for the file will
> use the open stateid to obtain a new layout stateid. And if the retried
> cb_layout request arrives at the client after the layoutget reply, the
> new layout stateid would be overwritten by an out-of-date one.
>
> Signed-off-by: shaobingqing
> ---
>  fs/nfs/callback_proc.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index ae2e87b..98fed13 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -174,7 +174,9 @@ static u32 initiate_file_draining(struct nfs_client *clp,
>  		rv = NFS4ERR_DELAY;
>  	else
>  		rv = NFS4ERR_NOMATCHING_LAYOUT;
> -	pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
> +	if (memcmp(args->cbl_stateid.other, lo->plh_stateid.other,
> +				NFS4_STATEID_OTHER_SIZE) == 0)
> +		pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);

Well... We shouldn't really be calling
pnfs_mark_matching_lsegs_invalid() either in this case...

>  	spin_unlock(&ino->i_lock);
>  	pnfs_free_lseg_list(&free_me_list);
>  	pnfs_put_layout_hdr(lo);

--
Trond Myklebust
Linux NFS client maintainer
Re: [PATCH 1/2] serial: samsung: Move uart_register_driver call to device probe
On Mon, Jan 20, 2014 at 11:14:57PM +, Mark Brown wrote:
> On Mon, Jan 20, 2014 at 09:43:05PM +, Alan Cox wrote:
> > If the hardware isn't present then the driver shouldn't even register
> > with the tty layer in the first place so it doesn't make any resource
> > differences either for properly written code.
>
> Right, that's not the idiom that has been followed by any of serial
> drivers though so needs fixing too.

It's not followed by serial drivers because it gets f*scking complicated
to do it that way. In order to do it that way, what we need to do is:

1. On the first device probe, register the UART driver.
2. On subsequent device probes, don't register the UART driver because
   it's already registered.
3. When devices are removed, do nothing until the last device.
4. When the last device is removed, unregister the UART driver.

The first bit is easy... but we need to add locks to every serial driver
to prevent two probes operating concurrently...

--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".
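
The four steps listed above, together with the locking concern, could be
sketched roughly as follows. This is a user-space illustration only and
not code from any serial driver: fake_register_driver() and
fake_unregister_driver() are hypothetical stand-ins for
uart_register_driver() and uart_unregister_driver(), a pthread mutex
stands in for whatever lock a real driver would use, and sketch_probe()
and sketch_remove() are made-up names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for uart_register_driver()/uart_unregister_driver(). */
static int driver_registered;	/* 1 while the "driver" is registered */
static int register_calls;	/* number of times registration ran */

static int fake_register_driver(void)
{
	driver_registered = 1;
	register_calls++;
	return 0;
}

static void fake_unregister_driver(void)
{
	driver_registered = 0;
}

/* One lock guards the count so that two probes cannot race. */
static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int bound_devices;

/* Steps 1 and 2: register on the first probe only. */
static int sketch_probe(void)
{
	int ret = 0;

	pthread_mutex_lock(&reg_lock);
	if (bound_devices == 0)
		ret = fake_register_driver();
	if (ret == 0)
		bound_devices++;
	pthread_mutex_unlock(&reg_lock);
	return ret;
}

/* Steps 3 and 4: unregister only when the last device goes away. */
static void sketch_remove(void)
{
	pthread_mutex_lock(&reg_lock);
	assert(bound_devices > 0);
	if (--bound_devices == 0)
		fake_unregister_driver();
	pthread_mutex_unlock(&reg_lock);
}
```

Even in this reduced form, every driver would need its own counter and
lock, which is the per-driver overhead being objected to.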
Re: [PATCH 1/3] ACPI / idle: Move idle_boot_override out of the arch directory
On Monday, January 20, 2014 10:08:41 PM Hanjun Guo wrote:
> On 2014-01-18 21:47, Rafael J. Wysocki wrote:
> > On Saturday, January 18, 2014 11:52:18 AM Hanjun Guo wrote:
> >> On 2014-1-18 11:45, Hanjun Guo wrote:
> >>> On 2014-1-17 20:06, Sudeep Holla wrote:
> >>>> On 17/01/14 02:03, Hanjun Guo wrote:
> >>>>> Move idle_boot_override out of the arch directory to be a single enum
> >>>>> including both platforms values, this will make it rather easier to
> >>>>> avoid ifdefs around which definitions are for which processor in
> >>>>> generally used ACPI code.
> >>>>>
> >>>>> IDLE_FORCE_MWAIT for IA64 is not used anywhere, so remove it.
> >>>>>
> >>>>> No functional change in this patch.
> >>>>>
> >>>>> Suggested-by: Alan
> >>>>> Signed-off-by: Hanjun Guo
> >>>>> ---
> >> [...]
> >>>>> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> >>>>> index 03e235ad..e324561 100644
> >>>>> --- a/include/linux/cpu.h
> >>>>> +++ b/include/linux/cpu.h
> >>>>> @@ -220,6 +220,14 @@ void cpu_idle(void);
> >>>>>
> >>>>>  void cpu_idle_poll_ctrl(bool enable);
> >>>>>
> >>>>> +enum idle_boot_override {
> >>>>> +	IDLE_NO_OVERRIDE = 0,
> >>>>> +	IDLE_HALT,
> >>>>> +	IDLE_NOMWAIT,
> >>>>> +	IDLE_POLL,
> >>>>> +	IDLE_POWERSAVE_OFF
> >>>>> +};
> >>>>> +
> >>>> I do understand the idea behind this change, but IMO HALT and MWAIT are
> >>>> x86 specific and may not make sense for other architectures.
> >>> yes, this is the strange part, the value is arch-dependent.
> >>>
> >>>> It will also require every architecture using ACPI to export
> >>>> boot_option_idle_override which may not be really required.
> >>> so, how about forget this patch and move boot_option_idle_override
> >>> related code into arch directory such as arch/x86/acpi/boot.c for
> >>> x86?
> >> The general idea is that we can move all the arch-dependent codes
> >> in ACPI driver to arch directory, then make codes in drivers/acpi/
> >> arch independent.
> > Well, MWAIT is arch-dependent, so I'm not sure how IDLE_NOMWAIT fits into
> > include/linux/cpu.h?
>
> So you are not happy with this patch and I should find another solution?

No, I'm not happy with it. If you want to move that to an arch-agnostic
header, the symbol names cannot be arch-dependent any more.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [patch 9/9] mm: keep page cache radix tree nodes in check
On Fri, Jan 17, 2014 at 11:05:17AM +1100, Dave Chinner wrote:
> On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote:
> > Previously, page cache radix tree nodes were freed after reclaim
> > emptied out their page pointers. But now reclaim stores shadow
> > entries in their place, which are only reclaimed when the inodes
> > themselves are reclaimed. This is problematic for bigger files that
> > are still in use after they have a significant amount of their cache
> > reclaimed, without any of those pages actually refaulting. The shadow
> > entries will just sit there and waste memory. In the worst case, the
> > shadow entries will accumulate until the machine runs out of memory.
> >
> > To get this under control, the VM will track radix tree nodes
> > exclusively containing shadow entries on a per-NUMA node list.
> > Per-NUMA rather than global because we expect the radix tree nodes
> > themselves to be allocated node-locally and we want to reduce
> > cross-node references of otherwise independent cache workloads. A
> > simple shrinker will then reclaim these nodes on memory pressure.
> >
> > A few things need to be stored in the radix tree node to implement the
> > shadow node LRU and allow tree deletions coming from the list:
>
> Just a couple of things with the list_lru interfaces.
>
> > @@ -123,9 +129,39 @@ static void page_cache_tree_delete(struct address_space *mapping,
> >  		 * same time and miss a shadow entry.
> >  		 */
> >  		smp_wmb();
> > -	} else
> > -		radix_tree_delete(&mapping->page_tree, page->index);
> > +	}
> >  	mapping->nrpages--;
> > +
> > +	if (!node) {
> > +		/* Clear direct pointer tags in root node */
> > +		mapping->page_tree.gfp_mask &= __GFP_BITS_MASK;
> > +		radix_tree_replace_slot(slot, shadow);
> > +		return;
> > +	}
> > +
> > +	/* Clear tree tags for the removed page */
> > +	index = page->index;
> > +	offset = index & RADIX_TREE_MAP_MASK;
> > +	for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
> > +		if (test_bit(offset, node->tags[tag]))
> > +			radix_tree_tag_clear(&mapping->page_tree, index, tag);
> > +	}
> > +
> > +	/* Delete page, swap shadow entry */
> > +	radix_tree_replace_slot(slot, shadow);
> > +	node->count--;
> > +	if (shadow)
> > +		node->count += 1U << RADIX_TREE_COUNT_SHIFT;
> > +	else
> > +		if (__radix_tree_delete_node(&mapping->page_tree, node))
> > +			return;
> > +
> > +	/* Only shadow entries in there, keep track of this node */
> > +	if (!(node->count & RADIX_TREE_COUNT_MASK) &&
> > +	    list_empty(&node->private_list)) {
> > +		node->private_data = mapping;
> > +		list_lru_add(&workingset_shadow_nodes, &node->private_list);
> > +	}
>
> You can't do this list_empty(&node->private_list) check safely
> externally to the list_lru code - the only time that entry can be
> checked safely is under the LRU list locks. This is the reason that
> list_lru_add/list_lru_del return a boolean to indicate if the object
> was added/removed from the list - they do this list_empty() check
> internally. i.e. the correct, safe way to conditionally update
> state iff the object was added to the LRU is:
>
> 	if (!(node->count & RADIX_TREE_COUNT_MASK)) {
> 		if (list_lru_add(&workingset_shadow_nodes, &node->private_list))
> 			node->private_data = mapping;
> 	}
>
> > +	radix_tree_replace_slot(slot, page);
> > +	mapping->nrpages++;
> > +	if (node) {
> > +		node->count++;
> > +		/* Installed page, can't be shadow-only anymore */
> > +		if (!list_empty(&node->private_list))
> > +			list_lru_del(&workingset_shadow_nodes,
> > +				     &node->private_list);
> > +	}
>
> Same issue here:
>
> 	if (node) {
> 		node->count++;
> 		list_lru_del(&workingset_shadow_nodes, &node->private_list);
> 	}

All modifications to node->private_list happen under
mapping->tree_lock, and modifications of a neighboring link should not
affect the outcome of the list_empty(), so I don't think the lru lock
is necessary.

It would be cleaner to take it of course, but that would mean adding
an unconditional NUMA node-wide lock to every page cache population.

> >  static int __add_to_page_cache_locked(struct page *page,
> > diff --git a/mm/list_lru.c b/mm/list_lru.c
> > index 72f9decb0104..47a9faf4070b 100644
> > --- a/mm/list_lru.c
> > +++ b/mm/list_lru.c
> > @@ -88,10 +88,18 @@ restart:
> >  		ret = isolate(item, &nlru->lock, cb_arg);
> >  		switch (ret) {
> >  		case LRU_REMOVED:
> > +		case LRU_REMOVED_RETRY:
> >  			if (--nlru->nr_items == 0)
> >  				node_clear(nid, lru->active_nodes);
> >  			WARN_ON_ONCE(nlru->nr_items < 0);
> >  			isolated++;
> > +			/*
> > +			 * If the lru lock has been dropped, our list
> >
Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff
On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
> In the current code, only one struct rpc_rqst is preallocated. If one
> callback request is received from two sk_buffs, xprt_alloc_bc_request
> would be executed twice with the same transport->xid. The first time
> xprt_alloc_bc_request will allocate one struct rpc_rqst and the
> TCP_RCV_COPY_DATA bit of transport->tcp_flags will not be cleared. The
> second time xprt_alloc_bc_request cannot allocate a struct rpc_rqst any
> more and a NULL pointer will be returned, then xprt_force_disconnect
> occurs. I think one callback request can be allowed to be received from
> two sk_buffs.
>
> Signed-off-by: shaobingqing
> ---
>  net/sunrpc/xprtsock.c |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index ee03d35..606950d 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>  	struct sock_xprt *transport =
>  				container_of(xprt, struct sock_xprt, xprt);
>  	struct rpc_rqst *req;
> +	static struct rpc_rqst *req_partial;
> +
> +	if (req_partial == NULL)
> +		req = xprt_alloc_bc_request(xprt);
> +	else if (req_partial->rq_xid == transport->tcp_xid)
> +		req = req_partial;

What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
req will be undefined.

Either way, you cannot use a static variable for storage here: that
isn't re-entrant.

> -	req = xprt_alloc_bc_request(xprt);
>  	if (req == NULL) {
>  		printk(KERN_WARNING "Callback slot table overflowed\n");
>  		xprt_force_disconnect(xprt);
> @@ -1285,6 +1290,7 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>
>  	if (!(transport->tcp_flags & TCP_RCV_COPY_DATA)) {
>  		struct svc_serv *bc_serv = xprt->bc_serv;
> +		req_partial = NULL;
>
>  		/*
>  		 * Add callback request to callback list.  The callback
> @@ -1297,7 +1303,8 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>  		list_add(&req->rq_bc_list, &bc_serv->sv_cb_list);
>  		spin_unlock(&bc_serv->sv_cb_lock);
>  		wake_up(&bc_serv->sv_cb_waitq);
> -	}
> +	} else
> +		req_partial = req;
>
>  	req->rq_private_buf.len = transport->tcp_copied;
>

--
Trond Myklebust
Linux NFS client maintainer
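
The two objections raised in the review (req left unassigned on an XID
mismatch, and a function-static pointer shared by every transport) can be
illustrated with a small user-space sketch. It is one possible shape under
stated assumptions, not the fix that was eventually applied and not real
SUNRPC code: struct sketch_rqst and struct sketch_xprt are hypothetical
cut-down stand-ins for struct rpc_rqst and struct sock_xprt, the
bc_partial field is invented for the illustration, and sketch_alloc()
merely imitates a single preallocated slot (it never releases it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct rpc_rqst. */
struct sketch_rqst {
	uint32_t rq_xid;
};

/* Hypothetical stand-in for struct sock_xprt. */
struct sketch_xprt {
	uint32_t tcp_xid;		/* XID of the record being received */
	struct sketch_rqst *bc_partial;	/* kept per transport, not static */
	struct sketch_rqst slot;	/* the single preallocated request */
	int slot_in_use;
};

/* Imitates a one-slot allocator: hands out the slot once, then NULL. */
static struct sketch_rqst *sketch_alloc(struct sketch_xprt *x)
{
	if (x->slot_in_use)
		return NULL;
	x->slot_in_use = 1;
	x->slot.rq_xid = x->tcp_xid;
	return &x->slot;
}

/* Choose the request for the current fragment; every path assigns req. */
static struct sketch_rqst *sketch_get_request(struct sketch_xprt *x)
{
	struct sketch_rqst *req;

	assert(x != NULL);
	if (x->bc_partial == NULL)
		req = sketch_alloc(x);
	else if (x->bc_partial->rq_xid == x->tcp_xid)
		req = x->bc_partial;
	else
		req = NULL;	/* XID mismatch: fail explicitly */
	return req;
}

/* Remember the request only while more data for it is still expected. */
static void sketch_fragment_done(struct sketch_xprt *x,
				 struct sketch_rqst *req, int more_data)
{
	x->bc_partial = more_data ? req : NULL;
}
```

Because the partial-request pointer lives in the transport structure, two
transports receiving callbacks at the same time do not share state, and
the mismatch case yields NULL instead of an uninitialised pointer.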