Re: [PATCH 4.1 099/267] phy: twl4030-usb: remove incorrect pm_runtime_get_sync() in probe function.
Hello, this patch killed the musb-host functionality on my classic Beagleboard (rev c4). The symptom was that there was a message I don't remember and the attached device didn't enumerate anymore (likely because of missing power, but I'm not sure). A simple revert has fixed it; I haven't looked further into the problem. Regards, Alexander Holler On 31.07.2015 21:39, Greg Kroah-Hartman wrote: 4.1-stable review patch. If anyone has any objections, please let me know. -- From: NeilBrown commit 4724e27114c4a7ec07db227a17fcab6f165c upstream. The USB phy should initialize with power-off, and will be powered on by the USB system when a cable connection is detected. Having this pm_runtime_get_sync() during probe causes the phy to *always* be powered on. Removing it returns to sensible power management. Fixes: 96be39ab34b77c6f6f5cd6ae03aac6c6449ee5c4 Signed-off-by: NeilBrown Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Greg Kroah-Hartman --- drivers/phy/phy-twl4030-usb.c | 1 - 1 file changed, 1 deletion(-) --- a/drivers/phy/phy-twl4030-usb.c +++ b/drivers/phy/phy-twl4030-usb.c @@ -711,7 +711,6 @@ static int twl4030_usb_probe(struct plat pm_runtime_use_autosuspend(&pdev->dev); pm_runtime_set_autosuspend_delay(&pdev->dev, 2000); pm_runtime_enable(&pdev->dev); - pm_runtime_get_sync(&pdev->dev); /* Our job is to use irqs and status from the power module * to keep the transceiver disabled when nothing's connected. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
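The probe-time pm_runtime_get_sync() matters because runtime PM is reference-counted: a reference taken in probe and never dropped pins the device on forever, no matter what the USB core does on connect/disconnect. A minimal user-space sketch of that refcount behaviour (the struct and function names are made up for illustration; this is not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of runtime-PM reference counting (not the kernel API). */
struct toy_dev {
	int usage_count;
	bool powered;
};

void toy_pm_get(struct toy_dev *d)
{
	if (d->usage_count++ == 0)
		d->powered = true;	/* resume on first user */
}

void toy_pm_put(struct toy_dev *d)
{
	if (--d->usage_count == 0)
		d->powered = false;	/* suspend when last user drops */
}

/* Returns true if the phy is still powered after a cable
 * connect/disconnect cycle, i.e. when a probe-time reference
 * pins it on the way the removed pm_runtime_get_sync() did. */
bool powered_after_cable_cycle(bool probe_takes_reference)
{
	struct toy_dev phy = { 0, false };

	if (probe_takes_reference)
		toy_pm_get(&phy);	/* the get_sync() done in probe */

	toy_pm_get(&phy);		/* cable connected: USB core resumes phy */
	toy_pm_put(&phy);		/* cable removed: USB core drops its ref */
	return phy.powered;
}
```

With the probe-time reference the phy never powers down (Alexander's board instead lost power because other parts of the driver assumed the reference was held); without it, power follows the cable state, which is what the patch intended.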
Re: [PATCH v4 2/2] usb: Add support for ACPI identification to xhci-platform
Hello Greg, On Sat, Aug 8, 2015 at 3:29 AM, Greg KH wrote: > On Fri, Aug 07, 2015 at 06:03:36PM -0700, Duc Dang wrote: >> Provide the methods to let ACPI identify the need to use >> xhci-platform. Change the Kconfig files so the >> xhci-plat.o file is selectable during kernel config. >> >> This has been tested on an ARM64 machine with platform XHCI, an >> x86_64 machine with XHCI, and an x86_64 machine without XHCI. >> There were no regressions or error messages on the machines >> without platform XHCI. >> >> [dhdang: regenerate the patch over 4.2-rc5] >> Signed-off-by: Mark Langsdorf >> Signed-off-by: Duc Dang >> >> --- >> Changes from v3: >> Regenerate the patch over 4.2-rc5 >> No code change >> >> Changes from v2 >> Replaced tristate with a boolean as the driver doesn't >> compile as a module >> Correct --help-- to ---help--- >> >> Changes from v1 >> Renamed from "add support for APM X-Gene to xhci-platform" >> Removed changes to arm64/Kconfig >> Made CONFIG_USB_XHCI_PLATFORM a user selectable config option >> >> drivers/usb/host/Kconfig | 7 ++- >> drivers/usb/host/xhci-plat.c | 11 +++ >> 2 files changed, 17 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig >> index 8afc3c1..96231ee 100644 >> --- a/drivers/usb/host/Kconfig >> +++ b/drivers/usb/host/Kconfig >> @@ -32,7 +32,12 @@ config USB_XHCI_PCI >> default y >> >> config USB_XHCI_PLATFORM >> - tristate >> + tristate "xHCI platform driver support" >> + ---help--- >> + Say 'Y' to enable the support for the xHCI host controller >> + as a platform device. Many ARM SoCs provide USB this way. >> + >> + If unsure, say 'Y'. 
>> >> config USB_XHCI_MVEBU >> tristate "xHCI support for Marvell Armada 375/38x" >> diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c >> index 5d03f8b..14b40d2 100644 >> --- a/drivers/usb/host/xhci-plat.c >> +++ b/drivers/usb/host/xhci-plat.c >> @@ -19,6 +19,7 @@ >> #include >> #include >> #include >> +#include <linux/acpi.h> >> >> #include "xhci.h" >> #include "xhci-mvebu.h" >> @@ -262,6 +263,15 @@ static const struct of_device_id usb_xhci_of_match[] = { >> MODULE_DEVICE_TABLE(of, usb_xhci_of_match); >> #endif >> >> +#ifdef CONFIG_ACPI > > You shouldn't need this #ifdef, right? > Why is it not needed? The driver does .acpi_match_table = ACPI_PTR(usb_xhci_acpi_match) and ACPI_PTR() is NULL if CONFIG_ACPI is not enabled. Which can happen AFAIU since the driver also supports OF. So without the #ifdef guards, .acpi_match_table = NULL and the struct acpi_device_id usb_xhci_acpi_match[] will be built but not used. Or am I missing something? > thanks, > > greg k-h > Best regards, Javier
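The behaviour Javier describes follows from how ACPI_PTR() is defined in include/linux/acpi.h: with CONFIG_ACPI unset it evaluates to NULL, so an unguarded match table would be referenced nowhere and merely waste space (and trigger "defined but not used" warnings) unless it is compiled out too. A compilable user-space sketch of the same pattern, with toy stand-ins for the driver structures (the "PNP0D10" ID is the generic xHCI ACPI ID from this patch context):

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the kernel's ACPI_PTR() from include/linux/acpi.h:
 * the pointer survives only when ACPI support is compiled in. */
#ifdef CONFIG_ACPI
#define ACPI_PTR(ptr)	(ptr)
#else
#define ACPI_PTR(ptr)	NULL
#endif

struct toy_acpi_device_id { const char *id; };
struct toy_driver { const struct toy_acpi_device_id *acpi_match_table; };

#ifdef CONFIG_ACPI
/* Guarded like the patch does: without this guard the table would be
 * built but left unreferenced whenever CONFIG_ACPI is off. */
static const struct toy_acpi_device_id usb_xhci_acpi_match[] = {
	{ "PNP0D10" },	/* generic xHCI host controller */
	{ NULL },
};
#endif

struct toy_driver xhci_plat_driver = {
	/* NULL when CONFIG_ACPI is not defined at compile time */
	.acpi_match_table = ACPI_PTR(usb_xhci_acpi_match),
};
```

Compiled without -DCONFIG_ACPI (as below), the match table pointer is NULL and the table itself never exists, which is exactly the situation the #ifdef guard handles.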
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 07, 2015 at 10:17:43PM -0700, Ming Lin wrote: > On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet > > Ideally we'd get upper layers out of the business of knowing about the queue > > limits at all - that was the point of the patch series, after all. > > > > Instead of using UINT_MAX, would it work to just make the max 1 << 31 > > sectors? > > 1 << 31 = 2G bytes = 0x400000 sectors. > > Yes, that works as long as it's a multiple of the granularity. Is granularity required to be a power of two? One would hope, but looking at the code that doesn't appear to be a requirement... ugh, that's terrible...
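Kent's question — whether discard granularity must be a power of two — is the classic single-bit check; the block layer could enforce it with something like the kernel's is_power_of_2() helper. A stand-alone version, assuming only that granularity is expressed in sectors:

```c
#include <assert.h>
#include <stdbool.h>

/* True if n is a power of two: a power of two has exactly one bit
 * set, so clearing the lowest set bit (n & (n - 1)) must leave zero.
 * Zero is explicitly excluded. */
bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

If granularity were guaranteed to be a power of two, any power-of-two size cap (such as the proposed 1 << 31) would automatically be a multiple of it; since the code does not enforce that, the cap has to be rounded to the granularity explicitly.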
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet wrote: > On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote: >> >> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: >> > I'm for solution 3: >> > >> > - keep blk_bio_{discard,write_same}_split, but ensure we never build >> > a > 4GB bio in blkdev_issue_{discard,write_same}. >> >> This has a problem, as I mentioned in solution 1. >> We need to also make sure the max discard size is of proper granularity. >> See the example below. >> >> 4G: 8388608 sectors >> UINT_MAX: 8388607 sectors >> >> dm-thinp block size = default discard granularity = 128 sectors >> >> blkdev_issue_discard(sector=0, nr_sectors=8388608) >> >> 1. Only ensure bi_size does not overflow >> >> It doesn't work. >> >> [start_sector, end_sector] >> [0, 8388607] >> [0, 8388606], then dm-thinp splits it to 2 bios >> [0, 8388479] >> [8388480, 8388606] ---> this has a problem in process_discard_bio(), >> because the discard size (127 sectors) covers >> less than a block (128 sectors) >> [8388607, 8388607] ---> same problem >> >> 2. Ensure bi_size does not overflow and the max discard size is of proper granularity > > Ideally we'd get upper layers out of the business of knowing about the queue > limits at all - that was the point of the patch series, after all. > > Instead of using UINT_MAX, would it work to just make the max 1 << 31 > sectors? 1 << 31 = 2G bytes = 0x400000 sectors. Yes, that works as long as it's a multiple of the granularity.
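Ming's requirement — that the split boundary be a multiple of the discard granularity — amounts to rounding the size limit down, which is why a power-of-two cap like 1 << 31 bytes (0x400000 sectors) works for dm-thinp's 128-sector granularity while UINT_MAX bytes (8388607 sectors) does not. A small sketch using the numbers from the example above:

```c
#include <assert.h>

/* Round limit down to the nearest multiple of granularity (both in
 * sectors), mirroring the alignment blkdev_issue_discard() must keep
 * so dm-thinp never sees a discard covering a partial block. */
unsigned int align_down(unsigned int limit, unsigned int granularity)
{
	return limit - (limit % granularity);
}
```

With the unaligned UINT_MAX-derived limit of 8388607 sectors, rounding down yields 8388480 — exactly the boundary of the problematic [8388480, 8388606] split in the example — whereas 0x400000 sectors is already a multiple of 128 and passes through unchanged.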
Re: [lkp] [staging] 68905a14e49: kernel BUG at drivers/base/driver.c:153!
On Fri, Aug 07, 2015 at 11:01:13AM -0700, Greg Kroah-Hartman wrote: > On Fri, Aug 07, 2015 at 12:45:53PM +0530, Sudip Mukherjee wrote: > > On Thu, Aug 06, 2015 at 07:28:12PM -0700, Greg Kroah-Hartman wrote: > > > > > > > > > On Sat, Aug 01, 2015 at 01:12:05PM +0800, kernel test robot wrote: > > > > > > [ 12.276231] RIP > > > > [ 12.276231] RIP [] driver_register+0xa8/0xe0 > > > > [] driver_register+0xa8/0xe0 > > > > [ 12.276231] RSP > > > > [ 12.276231] RSP > > > > [ 12.403330] ---[ end trace 075c805d53b02836 ]--- > > > > [ 12.403330] ---[ end trace 075c805d53b02836 ]--- > > > > > > David, any ideas of what needs to be done to fix this? > > > > Hi Greg, > > I was having a quick look and it looks like visorchipset is not yet > > initialised and the bus is not yet registered. If you remember, we had a > > long discussion in another thread "usb: ulpi: don't register drivers > > if bus doesn't exist"; I sent a proposed patch also. For your reference: > > "http://www.spinics.net/lists/linux-usb/msg125368.html". > > I think this one is a perfect example of that case. > > Great, then it needs to be fixed in the unisys driver code, like I said > should be done for busses to prevent this type of issue :) Oops, then I have missed that mail. The last mail I have from you in that thread said "I have some time on Saturday (very long plane ride), where I'll take a look at this.", so I thought that it was still on your todo list. Then I need to fix parport also, as it can have this same problem. regards sudip
Re: [PATCH V9 3/3] audit: add audit by children of executable path
On 15/08/07, Paul Moore wrote: > On Fri, Aug 7, 2015 at 12:03 PM, Richard Guy Briggs wrote: > > On 15/08/07, Paul Moore wrote: > >> On Friday, August 07, 2015 02:37:15 AM Richard Guy Briggs wrote: > >> > On 15/08/06, Paul Moore wrote: > >> > > >> > > I guess what I'm saying is that I'm not currently convinced that > >> > > there is enough value in this to offset the risk I feel the loop > >> > > presents. I understand the use cases that you are mentioning; they > >> > > are the same as the last time we discussed this, but I'm going to > >> > > need something better than that. > >> > > >> > Can you better describe the loop that concerns you? I don't quite see > >> > it. > >> > >> It would be the only loop in the patch; look at the for loop in > >> audit_filter_rules() which iterates up the process' parent chain. > > > > Sorry, I should reword that... What risk do you see in that loop? It > > works up the task ancestry tree until it triggers, or hits init for that > > PID namespace, which terminates the loop. Do you see a risk in the > > numerical pids rolling underneath the loop? > > I suppose there is some risk of PID overlap, and while that is a > concern, it isn't my first. > > My main concern is that a malicious user could add an extra level of > burden to the system by making an absurdly tall process tree and then > hammering the system with trivial, short-lived syscalls. Granted, there > are userspace limits which would bound the impact to some extent, but > there is no way to really reduce the risk. You could further put hard > limits on the loop, but what good would that do? Malicious users > would just know to blow past that limit before they did their Evil > Deeds. > > I'll say it again; I'm not completely opposed to something like this - > perhaps in some modified form - but I have yet to see a need for this > functionality that is great enough to counter the risk. I am not going to lobby hard for it.
I split this part of the patch out to avoid jeopardizing the acceptability of the rest of the patchset, and to isolate it to make it easier to focus on its issues and apply it later once they are addressed. I'll reflect on this concern and see if I can come up with any ways to minimize this danger. This issue is related to the request to list the chain of processes back to the first ancestor in each record. You can make a best effort to record or track the entire chain, but at some point you need to put a limit on it to avoid a DoS, at which point there is no point in listing the information, since it is incomplete. (Too many "point"s in that last sentence...) > > I *do* notice that find_task_by_vpid(pid_t) must be replaced with > > find_task_by_pid_ns(pid_t, &init_pid_ns), since task_struct->pid is > > always stored in the initial PID namespace. > > Another thing that needs to be resolved. I've already fixed it in my tree: ptsk = find_task_by_pid_ns(ptsk->parent->pid, &init_pid_ns); > paul moore - RGB -- Richard Guy Briggs Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat Remote, Ottawa, Canada Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
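Paul's DoS concern is about the unbounded walk up task->parent in audit_filter_rules(). A user-space model of that loop, including the kind of hard depth cap he is skeptical of (MAX_DEPTH and the struct are illustrative stand-ins, not the audit code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 32	/* illustrative cap, not from the patch */

struct toy_task {
	int pid;
	struct toy_task *parent;	/* init is its own parent here */
};

/* Walk toward init looking for an ancestor with the given pid,
 * giving up after MAX_DEPTH levels the way a hardened filter might.
 * An attacker who builds a tree taller than the cap simply escapes
 * the match - which is Paul's point about hard limits. */
bool ancestor_matches(const struct toy_task *tsk, int pid)
{
	for (int depth = 0; depth < MAX_DEPTH; depth++) {
		if (tsk->pid == pid)
			return true;
		if (tsk->parent == tsk)	/* reached init, chain exhausted */
			return false;
		tsk = tsk->parent;
	}
	return false;	/* tree deeper than we are willing to scan */
}
```

The cap bounds the per-syscall cost, but as the thread notes, any limit low enough to be safe is also low enough for a malicious process tree to outgrow, which is why the record of the chain would be incomplete either way.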
[PATCH v2 1/8] watchdog: watchdog_dev: Use single variable name for struct watchdog_device
The current code uses 'wdd', 'wddev', and 'watchdog' as variable names for struct watchdog_device. This is confusing and makes it difficult to enhance the code. Replace it all with 'wdd'. Cc: Timo Kokkonen Cc: Uwe Kleine-König Acked-by: Uwe Kleine-König Signed-off-by: Guenter Roeck --- v2: No changes --- drivers/watchdog/watchdog_dev.c | 151 1 file changed, 75 insertions(+), 76 deletions(-) diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c index 6aaefbad303e..06171c73daf5 100644 --- a/drivers/watchdog/watchdog_dev.c +++ b/drivers/watchdog/watchdog_dev.c @@ -51,7 +51,7 @@ static struct watchdog_device *old_wdd; /* * watchdog_ping: ping the watchdog. - * @wddev: the watchdog device to ping + * @wdd: the watchdog device to ping * * If the watchdog has no own ping operation then it needs to be * restarted via the start operation. This wrapper function does @@ -59,65 +59,65 @@ static struct watchdog_device *old_wdd; * We only ping when the watchdog device is running. */ -static int watchdog_ping(struct watchdog_device *wddev) +static int watchdog_ping(struct watchdog_device *wdd) { int err = 0; - mutex_lock(&wddev->lock); + mutex_lock(&wdd->lock); - if (test_bit(WDOG_UNREGISTERED, &wddev->status)) { + if (test_bit(WDOG_UNREGISTERED, &wdd->status)) { err = -ENODEV; goto out_ping; } - if (!watchdog_active(wddev)) + if (!watchdog_active(wdd)) goto out_ping; - if (wddev->ops->ping) - err = wddev->ops->ping(wddev); /* ping the watchdog */ + if (wdd->ops->ping) + err = wdd->ops->ping(wdd); /* ping the watchdog */ else - err = wddev->ops->start(wddev); /* restart watchdog */ + err = wdd->ops->start(wdd); /* restart watchdog */ out_ping: - mutex_unlock(&wddev->lock); + mutex_unlock(&wdd->lock); return err; } /* * watchdog_start: wrapper to start the watchdog. - * @wddev: the watchdog device to start + * @wdd: the watchdog device to start * * Start the watchdog if it is not active and mark it active. * This function returns zero on success or a negative errno code for * failure.
*/ -static int watchdog_start(struct watchdog_device *wddev) +static int watchdog_start(struct watchdog_device *wdd) { int err = 0; - mutex_lock(&wddev->lock); + mutex_lock(&wdd->lock); - if (test_bit(WDOG_UNREGISTERED, &wddev->status)) { + if (test_bit(WDOG_UNREGISTERED, &wdd->status)) { err = -ENODEV; goto out_start; } - if (watchdog_active(wddev)) + if (watchdog_active(wdd)) goto out_start; - err = wddev->ops->start(wddev); + err = wdd->ops->start(wdd); if (err == 0) - set_bit(WDOG_ACTIVE, &wddev->status); + set_bit(WDOG_ACTIVE, &wdd->status); out_start: - mutex_unlock(&wddev->lock); + mutex_unlock(&wdd->lock); return err; } /* * watchdog_stop: wrapper to stop the watchdog. - * @wddev: the watchdog device to stop + * @wdd: the watchdog device to stop * * Stop the watchdog if it is still active and unmark it active. * This function returns zero on success or a negative errno code for @@ -125,155 +125,154 @@ out_start: * If the 'nowayout' feature was set, the watchdog cannot be stopped. */ -static int watchdog_stop(struct watchdog_device *wddev) +static int watchdog_stop(struct watchdog_device *wdd) { int err = 0; - mutex_lock(&wddev->lock); + mutex_lock(&wdd->lock); - if (test_bit(WDOG_UNREGISTERED, &wddev->status)) { + if (test_bit(WDOG_UNREGISTERED, &wdd->status)) { err = -ENODEV; goto out_stop; } - if (!watchdog_active(wddev)) + if (!watchdog_active(wdd)) goto out_stop; - if (test_bit(WDOG_NO_WAY_OUT, &wddev->status)) { - dev_info(wddev->dev, "nowayout prevents watchdog being stopped!\n"); + if (test_bit(WDOG_NO_WAY_OUT, &wdd->status)) { + dev_info(wdd->dev, "nowayout prevents watchdog being stopped!\n"); err = -EBUSY; goto out_stop; } - err = wddev->ops->stop(wddev); + err = wdd->ops->stop(wdd); if (err == 0) - clear_bit(WDOG_ACTIVE, &wddev->status); + clear_bit(WDOG_ACTIVE, &wdd->status); out_stop: - mutex_unlock(&wddev->lock); + mutex_unlock(&wdd->lock); return err; } /* * watchdog_get_status: wrapper to get the watchdog status - * @wddev: the watchdog device to get the status from + * @wdd: the watchdog device to get the status from * @status: the status of the
watchdog device * * Get the watchdog's status flags. */ -static int watchdog_get_status(struct watchdog_device *wddev, +static int
[PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core
Introduce an optional hardware maximum timeout in the watchdog core. The hardware maximum timeout can be lower than the maximum timeout. Drivers can set the maximum hardware timeout value in the watchdog data structure. If the configured timeout exceeds the maximum hardware timeout, the watchdog core enables a timer function to assist sending keepalive requests to the watchdog driver. Cc: Timo Kokkonen Cc: Uwe Kleine-König Signed-off-by: Guenter Roeck --- v2: - Improved and hopefully clarified documentation. - Rearranged variables in struct watchdog_device such that internal variables come last. - The code now ensures that the watchdog times out <timeout> seconds after the most recent keepalive sent from user space. - The internal keepalive now stops silently and no longer generates a warning message. The reason is that it will now stop early, while there may still be a substantial amount of time for keepalives from user space to arrive. If such keepalives arrive late (for example, if user space is configured to send keepalives just a few seconds before the watchdog times out), the message would just be noise and not provide any value.
--- Documentation/watchdog/watchdog-kernel-api.txt | 23 +++- drivers/watchdog/watchdog_dev.c| 140 ++--- include/linux/watchdog.h | 26 +++-- 3 files changed, 163 insertions(+), 26 deletions(-) diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt index d8b0d3367706..25b00b878a7b 100644 --- a/Documentation/watchdog/watchdog-kernel-api.txt +++ b/Documentation/watchdog/watchdog-kernel-api.txt @@ -53,9 +53,12 @@ struct watchdog_device { unsigned int timeout; unsigned int min_timeout; unsigned int max_timeout; + unsigned int max_hw_timeout_ms; void *driver_data; - struct mutex lock; unsigned long status; + struct mutex lock; + unsigned long last_keepalive; + struct delayed_work work; struct list_head deferred; }; @@ -73,18 +76,28 @@ It contains following fields: additional information about the watchdog timer itself. (Like it's unique name) * ops: a pointer to the list of watchdog operations that the watchdog supports. * timeout: the watchdog timer's timeout value (in seconds). + This is the time after which the system will reboot if user space does + not send a heartbeat request if WDOG_ACTIVE is set. * min_timeout: the watchdog timer's minimum timeout value (in seconds). * max_timeout: the watchdog timer's maximum timeout value (in seconds). +* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ + from max_timeout. If set to a value larger than max_timeout, the + infrastructure will send a heartbeat to the watchdog driver if 'timeout' + is larger than 'max_hw_timeout / 2', unless WDOG_ACTIVE is set and user + space failed to send a heartbeat for at least 'timeout' seconds. * bootstatus: status of the device after booting (reported with watchdog WDIOF_* status bits). * driver_data: a pointer to the drivers private data of a watchdog device. This data should only be accessed via the watchdog_set_drvdata and watchdog_get_drvdata routines. 
-* lock: Mutex for WatchDog Timer Driver Core internal use only. * status: this field contains a number of status bits that give extra information about the status of the device (Like: is the watchdog timer running/active, is the nowayout bit set, is the device opened via the /dev/watchdog interface or not, ...). +* lock: Mutex for WatchDog Timer Driver Core internal use only. +* last_keepalive: Time of most recent keepalive triggered from user space, + in jiffies. +* work: Worker data structure for WatchDog Timer Driver Core internal use only. * deferred: entry in wtd_deferred_reg_list which is used to register early initialized watchdogs. @@ -160,7 +173,11 @@ they are supported. These optional routines/operations are: and -EIO for "could not write value to the watchdog". On success this routine should set the timeout value of the watchdog_device to the achieved timeout value (which may be different from the requested one - because the watchdog does not necessarily has a 1 second resolution). + because the watchdog does not necessarily have a 1 second resolution). + Drivers implementing hw_max_timeout_ms set the hardware watchdog timeout + to the minimum of timeout and hw_max_timeout_ms. Those drivers set the + timeout value of the watchdog_device either to the requested timeout value + (if it is larger than hw_max_timeout_ms), or to the achieved timeout value. (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the watchdog's info structure). * get_timeleft: this routines returns the time that's left before a reset. diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c index 06171c73daf5..c04ba1a98cc8 100644 ---
[PATCH v2 4/8] watchdog: Make set_timeout function optional
For some watchdogs, the hardware timeout is fixed, and the watchdog driver depends on the watchdog core to handle the actual timeout. In this situation, the watchdog driver might only set the 'timeout' variable but do nothing else. This can as well be handled by the infrastructure, so make the set_timeout callback optional. If WDIOF_SETTIMEOUT is configured but the .set_timeout callback is not available, update the timeout variable in the infrastructure code. Signed-off-by: Guenter Roeck --- v2: No changes --- Documentation/watchdog/watchdog-kernel-api.txt | 5 + drivers/watchdog/watchdog_dev.c| 9 ++--- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt index 6a54dc15a556..49dce3a5477b 100644 --- a/Documentation/watchdog/watchdog-kernel-api.txt +++ b/Documentation/watchdog/watchdog-kernel-api.txt @@ -181,6 +181,11 @@ they are supported. These optional routines/operations are: (if it is larger than hw_max_timeout_ms), or to the achieved timeout value. (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the watchdog's info structure). + If the watchdog driver does not have to perform any action but setting the + watchdog_device.timeout, this callback can be omitted. + If set_timeout is not provided but WDIOF_SETTIMEOUT is set, the watchdog + infrastructure updates the timeout value of the watchdog_device internally + to the requested value. * get_timeleft: this routines returns the time that's left before a reset. * ref: the operation that calls kref_get on the kref of a dynamically allocated watchdog_device struct.
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c index 676e233d5e7b..752de264a244 100644 --- a/drivers/watchdog/watchdog_dev.c +++ b/drivers/watchdog/watchdog_dev.c @@ -282,9 +282,9 @@ out_status: static int watchdog_set_timeout(struct watchdog_device *wdd, unsigned int timeout) { - int err; + int err = 0; - if (!wdd->ops->set_timeout || !(wdd->info->options & WDIOF_SETTIMEOUT)) + if (!(wdd->info->options & WDIOF_SETTIMEOUT)) return -EOPNOTSUPP; if (watchdog_timeout_invalid(wdd, timeout)) @@ -297,7 +297,10 @@ static int watchdog_set_timeout(struct watchdog_device *wdd, goto out_timeout; } - err = wdd->ops->set_timeout(wdd, timeout); + if (wdd->ops->set_timeout) + err = wdd->ops->set_timeout(wdd, timeout); + else + wdd->timeout = timeout; watchdog_update_worker(wdd, true, false); -- 2.1.4
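The fallback this patch adds is easy to model: with WDIOF_SETTIMEOUT set but no driver callback, the core simply stores the new value itself. A compilable sketch of that dispatch (the toy_* structures are simplified stand-ins for the kernel's watchdog_device and watchdog_ops; WDIOF_SETTIMEOUT's value matches include/uapi/linux/watchdog.h):

```c
#include <assert.h>
#include <stddef.h>

#define WDIOF_SETTIMEOUT 0x0080	/* capability bit, as in the kernel UAPI */

struct toy_wdd;
struct toy_ops {
	int (*set_timeout)(struct toy_wdd *wdd, unsigned int t);	/* optional */
};
struct toy_wdd {
	unsigned int options;
	unsigned int timeout;
	const struct toy_ops *ops;
};

/* Mirrors watchdog_set_timeout() after the patch: only the
 * WDIOF_SETTIMEOUT capability bit is mandatory; the callback is not. */
int toy_set_timeout(struct toy_wdd *wdd, unsigned int timeout)
{
	if (!(wdd->options & WDIOF_SETTIMEOUT))
		return -1;	/* stands in for -EOPNOTSUPP */
	if (wdd->ops->set_timeout)
		return wdd->ops->set_timeout(wdd, timeout);
	wdd->timeout = timeout;	/* core handles fixed-hardware-timeout drivers */
	return 0;
}
```

This is what lets drivers like retu_wdt drop their own bookkeeping in the later patches of the series: a driver whose hardware period is fixed can omit .set_timeout entirely and still honor WDIOF_SETTIMEOUT.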
[PATCH v2 6/8] watchdog: retu: Convert to use infrastructure triggered keepalives
The watchdog infrastructure now supports handling watchdog keepalive if the watchdog is running while the watchdog device is closed. Convert the driver to use this infrastructure. Signed-off-by: Guenter Roeck --- v2: No changes --- drivers/watchdog/retu_wdt.c | 78 - 1 file changed, 7 insertions(+), 71 deletions(-) diff --git a/drivers/watchdog/retu_wdt.c b/drivers/watchdog/retu_wdt.c index b7c68e275aeb..ce2982a7670c 100644 --- a/drivers/watchdog/retu_wdt.c +++ b/drivers/watchdog/retu_wdt.c @@ -28,69 +28,22 @@ /* Watchdog timer values in seconds */ #define RETU_WDT_MAX_TIMER 63 -struct retu_wdt_dev { - struct retu_dev *rdev; - struct device *dev; - struct delayed_work ping_work; -}; - -/* - * Since Retu watchdog cannot be disabled in hardware, we must kick it - * with a timer until userspace watchdog software takes over. If - * CONFIG_WATCHDOG_NOWAYOUT is set, we never start the feeding. - */ -static void retu_wdt_ping_enable(struct retu_wdt_dev *wdev) -{ - retu_write(wdev->rdev, RETU_REG_WATCHDOG, RETU_WDT_MAX_TIMER); - schedule_delayed_work(&wdev->ping_work, - round_jiffies_relative(RETU_WDT_MAX_TIMER * HZ / 2)); -} - -static void retu_wdt_ping_disable(struct retu_wdt_dev *wdev) -{ - retu_write(wdev->rdev, RETU_REG_WATCHDOG, RETU_WDT_MAX_TIMER); - cancel_delayed_work_sync(&wdev->ping_work); -} - -static void retu_wdt_ping_work(struct work_struct *work) -{ - struct retu_wdt_dev *wdev = container_of(to_delayed_work(work), - struct retu_wdt_dev, ping_work); - retu_wdt_ping_enable(wdev); -} - static int retu_wdt_start(struct watchdog_device *wdog) { - struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog); + struct retu_dev *rdev = watchdog_get_drvdata(wdog); - retu_wdt_ping_disable(wdev); + set_bit(WDOG_RUNNING, &wdog->status); - return retu_write(wdev->rdev, RETU_REG_WATCHDOG, wdog->timeout); -} - -static int retu_wdt_stop(struct watchdog_device *wdog) -{ - struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog); - - retu_wdt_ping_enable(wdev); - - return 0; -} - -static int
retu_wdt_ping(struct watchdog_device *wdog) -{ - struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog); - - return retu_write(wdev->rdev, RETU_REG_WATCHDOG, wdog->timeout); + return retu_write(rdev, RETU_REG_WATCHDOG, wdog->timeout); } static int retu_wdt_set_timeout(struct watchdog_device *wdog, unsigned int timeout) { - struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog); + struct retu_dev *rdev = watchdog_get_drvdata(wdog); wdog->timeout = timeout; - return retu_write(wdev->rdev, RETU_REG_WATCHDOG, wdog->timeout); + return retu_write(rdev, RETU_REG_WATCHDOG, wdog->timeout); } static const struct watchdog_info retu_wdt_info = { @@ -101,8 +54,6 @@ static const struct watchdog_info retu_wdt_info = { static const struct watchdog_ops retu_wdt_ops = { .owner = THIS_MODULE, .start = retu_wdt_start, - .stop = retu_wdt_stop, - .ping = retu_wdt_ping, .set_timeout= retu_wdt_set_timeout, }; @@ -111,39 +62,26 @@ static int retu_wdt_probe(struct platform_device *pdev) struct retu_dev *rdev = dev_get_drvdata(pdev->dev.parent); bool nowayout = WATCHDOG_NOWAYOUT; struct watchdog_device *retu_wdt; - struct retu_wdt_dev *wdev; int ret; retu_wdt = devm_kzalloc(&pdev->dev, sizeof(*retu_wdt), GFP_KERNEL); if (!retu_wdt) return -ENOMEM; - wdev = devm_kzalloc(&pdev->dev, sizeof(*wdev), GFP_KERNEL); - if (!wdev) - return -ENOMEM; - retu_wdt->info = &retu_wdt_info; retu_wdt->ops = &retu_wdt_ops; retu_wdt->timeout = RETU_WDT_MAX_TIMER; retu_wdt->min_timeout = 0; retu_wdt->max_timeout = RETU_WDT_MAX_TIMER; - watchdog_set_drvdata(retu_wdt, wdev); + watchdog_set_drvdata(retu_wdt, rdev); watchdog_set_nowayout(retu_wdt, nowayout); - wdev->rdev = rdev; - wdev->dev = &pdev->dev; - - INIT_DELAYED_WORK(&wdev->ping_work, retu_wdt_ping_work); - ret = watchdog_register_device(retu_wdt); if (ret < 0) return ret; - if (nowayout) - retu_wdt_ping(retu_wdt); - else - retu_wdt_ping_enable(wdev); + retu_wdt_start(retu_wdt); platform_set_drvdata(pdev, retu_wdt); @@ -153,10 +91,8 @@ static int retu_wdt_probe(struct platform_device *pdev)
static int retu_wdt_remove(struct platform_device *pdev) { struct watchdog_device *wdog = platform_get_drvdata(pdev); - struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog); watchdog_unregister_device(wdog); -
[PATCH v2 5/8] watchdog: imx2: Convert to use infrastructure triggered keepalives
The watchdog infrastructure now supports handling watchdog keepalive if the watchdog is running while the watchdog device is closed. Convert the driver to use this infrastructure. Signed-off-by: Guenter Roeck --- v2: No changes --- drivers/watchdog/imx2_wdt.c | 72 - 1 file changed, 12 insertions(+), 60 deletions(-) diff --git a/drivers/watchdog/imx2_wdt.c b/drivers/watchdog/imx2_wdt.c index 0bb1a1d1b170..66feef254661 100644 --- a/drivers/watchdog/imx2_wdt.c +++ b/drivers/watchdog/imx2_wdt.c @@ -25,7 +25,6 @@ #include #include #include -#include #include #include #include @@ -34,7 +33,6 @@ #include #include #include -#include #include #define DRIVER_NAME "imx2-wdt" @@ -62,7 +60,6 @@ struct imx2_wdt_device { struct clk *clk; struct regmap *regmap; - struct timer_list timer;/* Pings the watchdog when closed */ struct watchdog_device wdog; struct notifier_block restart_handler; }; @@ -151,16 +148,6 @@ static int imx2_wdt_ping(struct watchdog_device *wdog) return 0; } -static void imx2_wdt_timer_ping(unsigned long arg) -{ - struct watchdog_device *wdog = (struct watchdog_device *)arg; - struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog); - - /* ping it every wdog->timeout / 2 seconds to prevent reboot */ - imx2_wdt_ping(wdog); - mod_timer(&wdev->timer, jiffies + wdog->timeout * HZ / 2); -} - static int imx2_wdt_set_timeout(struct watchdog_device *wdog, unsigned int new_timeout) { @@ -177,40 +164,19 @@ static int imx2_wdt_start(struct watchdog_device *wdog) { struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog); - if (imx2_wdt_is_running(wdev)) { - /* delete the timer that pings the watchdog after close */ - del_timer_sync(&wdev->timer); + if (imx2_wdt_is_running(wdev)) imx2_wdt_set_timeout(wdog, wdog->timeout); - } else + else imx2_wdt_setup(wdog); - return imx2_wdt_ping(wdog); -} - -static int imx2_wdt_stop(struct watchdog_device *wdog) -{ - /* -* We don't need a clk_disable, it cannot be disabled once started.
-* We use a timer to ping the watchdog while /dev/watchdog is closed -*/ - imx2_wdt_timer_ping((unsigned long)wdog); - return 0; -} - -static inline void imx2_wdt_ping_if_active(struct watchdog_device *wdog) -{ - struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog); + set_bit(WDOG_RUNNING, &wdog->status); - if (imx2_wdt_is_running(wdev)) { - imx2_wdt_set_timeout(wdog, wdog->timeout); - imx2_wdt_timer_ping((unsigned long)wdog); - } + return imx2_wdt_ping(wdog); } static const struct watchdog_ops imx2_wdt_ops = { .owner = THIS_MODULE, .start = imx2_wdt_start, - .stop = imx2_wdt_stop, .ping = imx2_wdt_ping, .set_timeout = imx2_wdt_set_timeout, }; @@ -277,9 +243,10 @@ static int __init imx2_wdt_probe(struct platform_device *pdev) watchdog_set_nowayout(wdog, nowayout); watchdog_init_timeout(wdog, timeout, &pdev->dev); - setup_timer(&wdev->timer, imx2_wdt_timer_ping, (unsigned long)wdog); - - imx2_wdt_ping_if_active(wdog); + if (imx2_wdt_is_running(wdev)) { + imx2_wdt_set_timeout(wdog, wdog->timeout); + set_bit(WDOG_RUNNING, &wdog->status); + } /* * Disable the watchdog power down counter at boot. Otherwise the power @@ -320,7 +287,6 @@ static int __exit imx2_wdt_remove(struct platform_device *pdev) watchdog_unregister_device(wdog); if (imx2_wdt_is_running(wdev)) { - del_timer_sync(&wdev->timer); imx2_wdt_ping(wdog); dev_crit(&pdev->dev, "Device removed: Expect reboot!\n"); } @@ -334,10 +300,9 @@ static void imx2_wdt_shutdown(struct platform_device *pdev) if (imx2_wdt_is_running(wdev)) { /* -* We are running, we need to delete the timer but will -* give max timeout before reboot will take place +* We are running, configure max timeout before reboot +* will take place.
*/ - del_timer_sync(&wdev->timer); imx2_wdt_set_timeout(wdog, IMX2_WDT_MAX_TIME); imx2_wdt_ping(wdog); dev_crit(&pdev->dev, "Device shutdown: Expect reboot!\n"); @@ -355,10 +320,6 @@ static int imx2_wdt_suspend(struct device *dev) if (imx2_wdt_is_running(wdev)) { imx2_wdt_set_timeout(wdog, IMX2_WDT_MAX_TIME); imx2_wdt_ping(wdog); - - /* The watchdog is not active */ - if (!watchdog_active(wdog)) - del_timer_sync(&wdev->timer); } clk_disable_unprepare(wdev->clk); @@ -384,19 +345,10 @@ static int imx2_wdt_resume(struct device
[PATCH v2 3/8] watchdog: Introduce WDOG_RUNNING flag
The WDOG_RUNNING flag is expected to be set by watchdog drivers if the hardware watchdog is running. If the flag is set, the watchdog subsystem will ping the watchdog even if the watchdog device is closed. The watchdog driver stop function is now optional and may be omitted if the watchdog can not be stopped. If stopping the watchdog is not possible but the driver implements a stop function, it is responsible for setting the WDOG_RUNNING flag in its stop function. Cc: Timo Kokkonen Cc: Uwe Kleine-König Signed-off-by: Guenter Roeck --- v2: Improved documentation --- Documentation/watchdog/watchdog-kernel-api.txt | 29 --- drivers/watchdog/watchdog_core.c | 2 +- drivers/watchdog/watchdog_dev.c| 40 -- include/linux/watchdog.h | 7 + 4 files changed, 59 insertions(+), 19 deletions(-) diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt index 25b00b878a7b..6a54dc15a556 100644 --- a/Documentation/watchdog/watchdog-kernel-api.txt +++ b/Documentation/watchdog/watchdog-kernel-api.txt @@ -143,17 +143,18 @@ are: device. The routine needs a pointer to the watchdog timer device structure as a parameter. It returns zero on success or a negative errno code for failure. -* stop: with this routine the watchdog timer device is being stopped. - The routine needs a pointer to the watchdog timer device structure as a - parameter. It returns zero on success or a negative errno code for failure. - Some watchdog timer hardware can only be started and not be stopped. The - driver supporting this hardware needs to make sure that a start and stop - routine is being provided. This can be done by using a timer in the driver - that regularly sends a keepalive ping to the watchdog timer hardware. Not all watchdog timer hardware supports the same functionality. That's why all other routines/operations are optional. They only need to be provided if they are supported. 
These optional routines/operations are: +* stop: with this routine the watchdog timer device is being stopped. + The routine needs a pointer to the watchdog timer device structure as a + parameter. It returns zero on success or a negative errno code for failure. + Some watchdog timer hardware can only be started and not be stopped. A + driver supporting such hardware does not have to implement the stop routine. + If a driver has no stop function, the watchdog core will set WDOG_RUNNING and + start calling the driver's keepalive pings function after the watchdog device + is closed. * ping: this is the routine that sends a keepalive ping to the watchdog timer hardware. The routine needs a pointer to the watchdog timer device structure as a @@ -193,9 +194,12 @@ they are supported. These optional routines/operations are: The status bits should (preferably) be set with the set_bit and clear_bit alike bit-operations. The status bits that are defined are: * WDOG_ACTIVE: this status bit indicates whether or not a watchdog timer device - is active or not. When the watchdog is active after booting, then you should - set this status bit (Note: when you register the watchdog timer device with - this bit set, then opening /dev/watchdog will skip the start operation) + is active or not from user perspective. User space is expected to send + heartbeat requests to the driver while this flag is set. If the watchdog + is active after booting, and you don't want the infrastructure to send + heartbeats to the watchdog driver, then you should set this status bit. + Note: when you register the watchdog timer device with this bit set, + then opening /dev/watchdog will skip the start operation. * WDOG_DEV_OPEN: this status bit shows whether or not the watchdog device was opened via /dev/watchdog. (This bit should only be used by the WatchDog Timer Driver Core). @@ -209,6 +213,11 @@ bit-operations. 
The status bits that are defined are: any watchdog_ops, so that you can be sure that no operations (other then unref) will get called after unregister, even if userspace still holds a reference to /dev/watchdog +* WDOG_RUNNING: Set by the watchdog driver if the hardware watchdog is running. + The bit must be set if the watchdog timer hardware can not be stopped. + The bit may also be set if the watchdog timer is running after booting, + before the watchdog device is opened. If set, the watchdog infrastructure + will send keepalives to the watchdog hardware while WDOG_ACTIVE is not set. To set the WDOG_NO_WAY_OUT status bit (before registering your watchdog timer device) you can either: diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c index 1a8059455413..b38d1b7ae10e 100644 --- a/drivers/watchdog/watchdog_core.c +++ b/drivers/watchdog/watchdog_core.c @@ -145,7 +145,7 @@ static int __watchdog_register_device(struct watchdog_device *wdd)
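To make the WDOG_RUNNING/WDOG_ACTIVE interplay concrete, here is a minimal user-space C sketch of the decision the core has to make. The function and bit values are illustrative, not the actual watchdog_dev.c code:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative status bits; the kernel's real bit numbers may differ. */
enum { WDOG_ACTIVE_BIT = 0, WDOG_RUNNING_BIT = 4 };

/*
 * The infrastructure pings on the driver's behalf only while the
 * hardware watchdog is running (WDOG_RUNNING set) but no user-space
 * daemon owns the device (WDOG_ACTIVE clear).
 */
bool core_should_ping(unsigned long status)
{
	bool active  = status & (1UL << WDOG_ACTIVE_BIT);
	bool running = status & (1UL << WDOG_RUNNING_BIT);

	return running && !active;
}
```

A driver without a stop function would leave WDOG_RUNNING set when the device is closed, so core_should_ping() stays true and the core keeps refreshing the hardware.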
[PATCH v2 7/8] watchdog: gpio_wdt: Convert to use infrastructure triggered keepalives
The watchdog infrastructure now supports handling watchdog keepalive if the watchdog is running while the watchdog device is closed. The infrastructure now also supports generating additional heartbeats if the maximum hardware timeout is smaller than or close to the configured timeout. Convert the driver to use this infrastructure. Signed-off-by: Guenter Roeck --- v2: No changes --- drivers/watchdog/gpio_wdt.c | 65 - 1 file changed, 11 insertions(+), 54 deletions(-) diff --git a/drivers/watchdog/gpio_wdt.c b/drivers/watchdog/gpio_wdt.c index 1687cc2d7122..cbbdae440bfa 100644 --- a/drivers/watchdog/gpio_wdt.c +++ b/drivers/watchdog/gpio_wdt.c @@ -32,12 +32,8 @@ struct gpio_wdt_priv { boolactive_low; boolstate; boolalways_running; - boolarmed; unsigned inthw_algo; - unsigned inthw_margin; - unsigned long last_jiffies; struct notifier_block notifier; - struct timer_list timer; struct watchdog_device wdd; }; @@ -50,20 +46,12 @@ static void gpio_wdt_disable(struct gpio_wdt_priv *priv) gpio_direction_input(priv->gpio); } -static void gpio_wdt_start_impl(struct gpio_wdt_priv *priv) -{ - priv->state = priv->active_low; - gpio_direction_output(priv->gpio, priv->state); - priv->last_jiffies = jiffies; - mod_timer(>timer, priv->last_jiffies + priv->hw_margin); -} - static int gpio_wdt_start(struct watchdog_device *wdd) { struct gpio_wdt_priv *priv = watchdog_get_drvdata(wdd); - gpio_wdt_start_impl(priv); - priv->armed = true; + priv->state = priv->active_low; + gpio_direction_output(priv->gpio, priv->state); return 0; } @@ -72,10 +60,9 @@ static int gpio_wdt_stop(struct watchdog_device *wdd) { struct gpio_wdt_priv *priv = watchdog_get_drvdata(wdd); - priv->armed = false; if (!priv->always_running) { - mod_timer(>timer, 0); gpio_wdt_disable(priv); + clear_bit(WDOG_RUNNING, >wdd.status); } return 0; @@ -85,32 +72,6 @@ static int gpio_wdt_ping(struct watchdog_device *wdd) { struct gpio_wdt_priv *priv = watchdog_get_drvdata(wdd); - priv->last_jiffies = jiffies; - - return 0; -} - 
-static int gpio_wdt_set_timeout(struct watchdog_device *wdd, unsigned int t) -{ - wdd->timeout = t; - - return gpio_wdt_ping(wdd); -} - -static void gpio_wdt_hwping(unsigned long data) -{ - struct watchdog_device *wdd = (struct watchdog_device *)data; - struct gpio_wdt_priv *priv = watchdog_get_drvdata(wdd); - - if (priv->armed && time_after(jiffies, priv->last_jiffies + - msecs_to_jiffies(wdd->timeout * 1000))) { - dev_crit(wdd->dev, "Timer expired. System will reboot soon!\n"); - return; - } - - /* Restart timer */ - mod_timer(>timer, jiffies + priv->hw_margin); - switch (priv->hw_algo) { case HW_ALGO_TOGGLE: /* Toggle output pin */ @@ -124,6 +85,8 @@ static void gpio_wdt_hwping(unsigned long data) gpio_set_value_cansleep(priv->gpio, priv->active_low); break; } + + return 0; } static int gpio_wdt_notify_sys(struct notifier_block *nb, unsigned long code, @@ -132,12 +95,10 @@ static int gpio_wdt_notify_sys(struct notifier_block *nb, unsigned long code, struct gpio_wdt_priv *priv = container_of(nb, struct gpio_wdt_priv, notifier); - mod_timer(>timer, 0); - switch (code) { case SYS_HALT: case SYS_POWER_OFF: - gpio_wdt_disable(priv); + gpio_wdt_stop(>wdd); break; default: break; @@ -157,7 +118,6 @@ static const struct watchdog_ops gpio_wdt_ops = { .start = gpio_wdt_start, .stop = gpio_wdt_stop, .ping = gpio_wdt_ping, - .set_timeout= gpio_wdt_set_timeout, }; static int gpio_wdt_probe(struct platform_device *pdev) @@ -205,9 +165,6 @@ static int gpio_wdt_probe(struct platform_device *pdev) if (hw_margin < 2 || hw_margin > 65535) return -EINVAL; - /* Use safe value (1/2 of real timeout) */ - priv->hw_margin = msecs_to_jiffies(hw_margin / 2); - priv->always_running = of_property_read_bool(pdev->dev.of_node, "always-running"); @@ -217,11 +174,15 @@ static int gpio_wdt_probe(struct platform_device *pdev) priv->wdd.ops = _wdt_ops; priv->wdd.min_timeout = SOFT_TIMEOUT_MIN; priv->wdd.max_timeout = SOFT_TIMEOUT_MAX; + priv->wdd.max_hw_timeout_ms = hw_margin;
[PATCH v2 8/8] watchdog: at91sam9: Convert to use infrastructure triggered keepalives
The watchdog infrastructure now supports handling watchdog keepalive if the watchdog is running while the watchdog device is closed. The infrastructure now also supports generating additional heartbeats if the maximum hardware timeout is smaller than or close to the configured timeout. Convert the driver to use this infrastructure. Signed-off-by: Guenter Roeck --- v2: No changes --- drivers/watchdog/at91sam9_wdt.c | 102 +--- 1 file changed, 11 insertions(+), 91 deletions(-) diff --git a/drivers/watchdog/at91sam9_wdt.c b/drivers/watchdog/at91sam9_wdt.c index e4698f7c5f93..0de39b52962c 100644 --- a/drivers/watchdog/at91sam9_wdt.c +++ b/drivers/watchdog/at91sam9_wdt.c @@ -29,7 +29,6 @@ #include #include #include -#include #include #include #include @@ -48,8 +47,8 @@ * use this to convert a watchdog * value from/to milliseconds. */ -#define ticks_to_hz_rounddown(t) t) + 1) * HZ) >> 8) -#define ticks_to_hz_roundup(t) (t) + 1) * HZ) + 255) >> 8) +#define ticks_to_ms_rounddown(t) t) + 1) * 1000) >> 8) +#define ticks_to_ms_roundup(t) (t) + 1) * 1000) + 255) >> 8) #define ticks_to_secs(t) (((t) + 1) >> 8) #define secs_to_ticks(s) ((s) ? (((s) << 8) - 1) : 0) @@ -64,9 +63,6 @@ /* Hardware timeout in seconds */ #define WDT_HW_TIMEOUT 2 -/* Timer heartbeat (500ms) */ -#define WDT_TIMEOUT(HZ/2) - /* User land timeout */ #define WDT_HEARTBEAT 15 static int heartbeat; @@ -83,11 +79,8 @@ MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started " struct at91wdt { struct watchdog_device wdd; void __iomem *base; - unsigned long next_heartbeat; /* the next_heartbeat for the timer */ - struct timer_list timer;/* The timer that pings the watchdog */ u32 mr; u32 mr_mask; - unsigned long heartbeat;/* WDT heartbeat in jiffies */ bool nowayout; unsigned int irq; }; @@ -107,47 +100,13 @@ static irqreturn_t wdt_interrupt(int irq, void *dev_id) return IRQ_HANDLED; } -/* - * Reload the watchdog timer. 
(ie, pat the watchdog) - */ -static inline void at91_wdt_reset(struct at91wdt *wdt) -{ - wdt_write(wdt, AT91_WDT_CR, AT91_WDT_KEY | AT91_WDT_WDRSTT); -} - -/* - * Timer tick - */ -static void at91_ping(unsigned long data) -{ - struct at91wdt *wdt = (struct at91wdt *)data; - if (time_before(jiffies, wdt->next_heartbeat) || - !watchdog_active(>wdd)) { - at91_wdt_reset(wdt); - mod_timer(>timer, jiffies + wdt->heartbeat); - } else { - pr_crit("I will reset your machine !\n"); - } -} - static int at91_wdt_start(struct watchdog_device *wdd) { struct at91wdt *wdt = to_wdt(wdd); - /* calculate when the next userspace timeout will be */ - wdt->next_heartbeat = jiffies + wdd->timeout * HZ; - return 0; -} -static int at91_wdt_stop(struct watchdog_device *wdd) -{ - /* The watchdog timer hardware can not be stopped... */ - return 0; -} + wdt_write(wdt, AT91_WDT_CR, AT91_WDT_KEY | AT91_WDT_WDRSTT); -static int at91_wdt_set_timeout(struct watchdog_device *wdd, unsigned int new_timeout) -{ - wdd->timeout = new_timeout; - return at91_wdt_start(wdd); + return 0; } static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt) @@ -157,8 +116,8 @@ static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt) u32 value; int err; u32 mask = wdt->mr_mask; - unsigned long min_heartbeat = 1; - unsigned long max_heartbeat; + unsigned int min_timeout = jiffies_to_msecs(1); + unsigned int hw_timeout; struct device *dev = >dev; tmp = wdt_read(wdt, AT91_WDT_MR); @@ -180,31 +139,15 @@ static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt) delta = (tmp & AT91_WDT_WDD) >> 16; if (delta < value) - min_heartbeat = ticks_to_hz_roundup(value - delta); + min_timeout = ticks_to_ms_roundup(value - delta); - max_heartbeat = ticks_to_hz_rounddown(value); - if (!max_heartbeat) { + hw_timeout = ticks_to_ms_rounddown(value); + if (hw_timeout < min_timeout * 2) { dev_err(dev, "heartbeat is too small for the system to handle it correctly\n"); return -EINVAL; } - - 
/* -* Try to reset the watchdog counter 4 or 2 times more often than -* actually requested, to avoid spurious watchdog reset. -* If this is not possible because of the min_heartbeat value, reset -* it at the min_heartbeat period. -*/ - if ((max_heartbeat / 4) >= min_heartbeat) - wdt->heartbeat = max_heartbeat / 4; - else if ((max_heartbeat / 2) >= min_heartbeat) - wdt->heartbeat =
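The tick conversion macros in the at91sam9 hunk above lost their leading parentheses in this archive. A reconstruction consistent with the surviving ticks_to_secs()/secs_to_ticks() pair (the AT91 watchdog counter runs at 256 Hz, hence the ">> 8") can be checked in user space; the exact grouping is inferred, so treat this as a sketch:

```c
#include <assert.h>

/* Reconstructed from the garbled hunk; parenthesis grouping inferred. */
#define ticks_to_ms_rounddown(t)	((((t) + 1) * 1000) >> 8)
#define ticks_to_ms_roundup(t)		(((((t) + 1) * 1000) + 255) >> 8)
#define ticks_to_secs(t)		(((t) + 1) >> 8)
#define secs_to_ticks(s)		((s) ? (((s) << 8) - 1) : 0)
```

With these definitions a 2-second timeout maps to 511 ticks and back to 2 seconds, and the roundup/rounddown pair differ by at most one millisecond, which matches how the patch uses them for min/max timeout bounds.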
[PATCH v2 0/8] watchdog: Add support for keepalives triggered by infrastructure
The watchdog infrastructure is currently purely passive, meaning it only passes information from user space to drivers and vice versa. Since watchdog hardware tends to have its own quirks, this can result in quite complex watchdog drivers. A number of scenarios are especially common.

- A watchdog is always active and can not be disabled, or can not be disabled once enabled. To support such hardware, watchdog drivers have to implement their own timers and use those timers to trigger watchdog keepalives while the watchdog device is not or not yet opened.
- A variant of this is the desire to enable a watchdog as soon as its driver has been instantiated, to protect the system while it is still booting up, but the watchdog daemon is not yet running.
- Some watchdogs have a very short maximum timeout, in the range of just a few seconds. Such low timeouts are difficult if not impossible to support from user space. Drivers supporting such watchdog hardware need to implement a timer function to augment heartbeats from user space.

This patch set solves the above problems while keeping changes to the watchdog core minimal.

- A new status flag, WDOG_RUNNING, informs the watchdog subsystem that a watchdog is running, and that the watchdog subsystem needs to generate heartbeat requests while the associated watchdog device is closed.
- A new parameter in the watchdog data structure, max_hw_timeout_ms, informs the watchdog subsystem about a maximum hardware timeout. The watchdog subsystem uses this information together with the configured timeout and the maximum permitted timeout to determine if it needs to generate additional heartbeat requests.

As part of this patchset, the semantics of the 'timeout' variable and of the WDOG_ACTIVE flag are changed slightly. Per the current watchdog kernel API, the 'timeout' variable is supposed to reflect the actual hardware watchdog timeout. WDOG_ACTIVE is supposed to reflect if the hardware watchdog is running or not. 
Unfortunately, this does not always reflect reality. In drivers which solve the above-mentioned problems internally, 'timeout' is the watchdog timeout as seen from user space, and WDOG_ACTIVE reflects that user space is expected to send keepalive requests to the watchdog driver. After this patch set is applied, this so far unofficial interpretation is the 'official' semantics for the timeout variable and the WDOG_ACTIVE flag. In other words, both values no longer reflect the hardware watchdog status, but its status as seen from user space.

Patch #1 is a preparatory patch.

Patch #2 adds timer functionality to the watchdog core. It solves the problem of short maximum hardware timeouts by augmenting heartbeats triggered from user space with internally triggered heartbeats.

Patch #3 adds functionality to generate heartbeats while the watchdog device is closed. It handles situations where the watchdog is running after the driver has been instantiated, but the device is not yet opened, and post-close situations necessary if a watchdog can not be stopped.

Patch #4 makes the set_timeout function optional. This is now possible since timeout changes can now be completely handled in the watchdog core, for example if the hardware watchdog timeout is fixed.

Patches #5 to #8 are example conversions of some watchdog drivers. Those patches will require testing.

The patch set is also available in branch watchdog-timer of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git.

This patch set does not solve all limitations of the watchdog subsystem. Specifically, it does not add support for the following features.

- It is desirable to be able to specify a maximum early timeout, from booting the system to opening the watchdog device.
- Some watchdogs may require a minimum period of time between heartbeats. Examples are DA9062 and possibly AT91SAM9x.

This and other features will be addressed with subsequent patches. 
The patch set is inspired by an earlier patch set from Timo Kokkonen.

v2:
- Rebased to v4.2-rc5
- Improved and hopefully clarified documentation.
- Rearranged variables in struct watchdog_device such that internal variables come last.
- The code now ensures that the watchdog times out seconds after the most recent keepalive sent from user space.
- The internal keepalive now stops silently and no longer generates a warning message. Reason is that it will now stop early, while there may still be a substantial amount of time for keepalives from user space to arrive. If such keepalives arrive late (for example if user space is configured to send keepalives just a few seconds before the watchdog times out), the message would just be noise and not provide any value.

-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
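As a rough illustration of what the max_hw_timeout_ms bookkeeping implies, the sketch below derives the period at which the core would have to issue internal heartbeats. The function name is hypothetical (this is not the actual core code), and the "ping at half the hardware period" margin is a common driver convention — the removed imx2_wdt timer used wdog->timeout * HZ / 2 — not something this cover letter mandates:

```c
#include <assert.h>

/*
 * Hypothetical sketch: if the hardware cannot cover the configured
 * (user-visible) timeout, the core must refresh the hardware on its
 * own. Pinging at half the hardware period leaves slack for
 * scheduling delays. Returns 0 when no internal pings are needed.
 */
unsigned int keepalive_interval_ms(unsigned int timeout_ms,
				   unsigned int max_hw_timeout_ms)
{
	/* No hardware limit, or hardware covers the full timeout. */
	if (!max_hw_timeout_ms || max_hw_timeout_ms >= timeout_ms)
		return 0;

	return max_hw_timeout_ms / 2;
}
```

For example, a gpio_wdt-style device with a 2-second hardware margin and a 60-second user-space timeout would need an internal ping roughly every second while user space is keeping the watchdog alive.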
Re: [PATCH v2 4/8] drm: rockchip/dp: add rockchip platform dp driver
Hi Heiko, On 2015/8/8 6:46, Heiko Stübner wrote: Hi Yakir, I think this Rockchip portion is missing a devicetree binding. Oh, thanks, I would complete it in the next version ;) You have the ability to power down the actual edp phy by using grf_edp_iddq_en from GRF_SOC_CON12. This is similar to how the rk3288 usb-phy gets put into a deeper state. So maybe you could provide a phy driver (drivers/phy) for this similar to what the exynos-dp does. Okay, so I need to add a new phy-rockchip-dp.c, include power_on/power_off ops, thanks. Some more stuff inline. But I guess by no means complete, as I'm still trying to integrate this into my development-tree. On Friday, 7 August 2015, 05:46:20, Yakir Yang wrote: Rockchip has three clocks for the dp controller; we leave pclk_edp to analogix_dp driver control, and keep the sclk_edp_24m and sclk_edp in the platform driver. Signed-off-by: Yakir Yang --- Changes in v2: None drivers/gpu/drm/rockchip/Kconfig| 10 + drivers/gpu/drm/rockchip/Makefile | 1 + drivers/gpu/drm/rockchip/analogix_dp-rockchip.c | 419 3 files changed, 430 insertions(+) create mode 100644 drivers/gpu/drm/rockchip/analogix_dp-rockchip.c diff --git a/drivers/gpu/drm/rockchip/Kconfig b/drivers/gpu/drm/rockchip/Kconfig index 35215f6..096ed77 100644 --- a/drivers/gpu/drm/rockchip/Kconfig +++ b/drivers/gpu/drm/rockchip/Kconfig @@ -25,3 +25,13 @@ config ROCKCHIP_DW_HDMI for the Synopsys DesignWare HDMI driver. If you want to enable HDMI on RK3288 based SoC, you should selet this option. + + nit: double blank line Done, +config ROCKCHIP_ANALOGIX_DP +tristate "Rockchip specific extensions for Analogix DP driver" +depends on DRM_ROCKCHIP +select DRM_ANALOGIX_DP +help + This selects support for Rockchip SoC specific extensions + for the Analogix Core DP driver. If you want to enable DP + on RK3288 based SoC, you should selet this option. [...] 
diff --git a/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c new file mode 100644 index 000..2f86e5e --- /dev/null +++ b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c @@ -0,0 +1,419 @@ +/* + * Rockchip SoC DP (Display Port) interface driver. + * + * Copyright (C) Fuzhou Rockchip Electronics Co., Ltd. + * Author: Andy Yan + * Yakir Yang + * Jeff Chen + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. + */ +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include +#include + +#include + +#include "rockchip_drm_drv.h" +#include "rockchip_drm_vop.h" + +#define encoder_to_dp(c) \ + container_of(c, struct rockchip_dp_device, encoder) + +#define plat_data_to_dp(pd) \ + container_of(pd, struct rockchip_dp_device, plat_data) + +/* dp grf register offset */ +#define DP_VOP_SEL 0x025c /* grf_soc_con6 */ +#define DP_REF_CLK_SEL 0x0274 /* grf_soc_con12 */ + +#define GRF_DP_REF_CLK_SEL_INTERBIT(4) +#define DP_SEL_VOP_LIT BIT(5) + +struct rockchip_dp_device { + struct drm_device*drm_dev; + struct device*dev; + struct drm_encoder encoder; + struct drm_display_mode mode; + + struct clk *clk_dp; + struct clk *clk_24m_parent; this clk_24m_parent does not seem used at all Done, + struct clk *clk_24m; + struct regmap*grf; + struct reset_control *rst; + + struct analogix_dp_plat_data plat_data; +}; + [...] 
+static int rockchip_dp_init(struct rockchip_dp_device *dp) +{ + struct device *dev = dp->dev; + struct device_node *np = dev->of_node; + int ret; + + dp->grf = syscon_regmap_lookup_by_phandle(np, "rockchip,grf"); + if (IS_ERR(dp->grf)) { + dev_err(dev, + "rk3288-dp needs rockchip,grf property\n"); + return PTR_ERR(dp->grf); + } + + dp->clk_dp = devm_clk_get(dev, "clk_dp"); I've looked at the manual, but couldn't find an actual clock-name used there. Is it really "clk_dp" or should it just be "dp"? This should be "clk_dp", not "dp". Because analogix_dp_core would need a clock name with "dp", so I would rather paste my rockchip-dp node here before I add the dt-bindings in the next version ;) edp: edp@ff97 { compatible = "rockchip,rk3288-dp"; reg = <0xff97 0x4000>; interrupts = ; clocks = < SCLK_EDP>, < SCLK_EDP_24M>, < PCLK_EDP_CTRL>;
[PATCH v5 2/2] usb: Add support for ACPI identification to xhci-platform
Provide the methods to let ACPI identify the need to use xhci-platform. Change the Kconfig files so the xhci-plat.o file is selectable during kernel config. This has been tested on an ARM64 machine with platform XHCI, an x86_64 machine with XHCI, and an x86_64 machine without XHCI. There were no regressions or error messages on the machines without platform XHCI. Signed-off-by: Mark Langsdorf Signed-off-by: Duc Dang --- Changes from v4: Remove #ifdef CONFIG_ACPI Changes from v3: Regenerate the patch over 4.2-rc5 No code change Changes from v2 Replaced tristate with a boolean as the driver doesn't compile as a module Correct --help-- to ---help--- Changes from v1 Renamed from "add support for APM X-Gene to xhci-platform" Removed changes to arm64/Kconfig Made CONFIG_USB_XHCI_PLATFORM a user selectable config option drivers/usb/host/Kconfig | 7 ++- drivers/usb/host/xhci-plat.c | 9 + 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig index 8afc3c1..96231ee 100644 --- a/drivers/usb/host/Kconfig +++ b/drivers/usb/host/Kconfig @@ -32,7 +32,12 @@ config USB_XHCI_PCI default y config USB_XHCI_PLATFORM - tristate + tristate "xHCI platform driver support" + ---help--- + Say 'Y' to enable the support for the xHCI host controller + as a platform device. Many ARM SoCs provide USB this way. + + If unsure, say 'Y'. 
config USB_XHCI_MVEBU tristate "xHCI support for Marvell Armada 375/38x" diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 5d03f8b..bd282cd 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "xhci.h" #include "xhci-mvebu.h" @@ -262,6 +263,13 @@ static const struct of_device_id usb_xhci_of_match[] = { MODULE_DEVICE_TABLE(of, usb_xhci_of_match); #endif +static const struct acpi_device_id usb_xhci_acpi_match[] = { + /* APM X-Gene USB Controller */ + { "PNP0D10", }, + { } +}; +MODULE_DEVICE_TABLE(acpi, usb_xhci_acpi_match); + static struct platform_driver usb_xhci_driver = { .probe = xhci_plat_probe, .remove = xhci_plat_remove, @@ -269,6 +277,7 @@ static struct platform_driver usb_xhci_driver = { .name = "xhci-hcd", .pm = DEV_PM_OPS, .of_match_table = of_match_ptr(usb_xhci_of_match), + .acpi_match_table = ACPI_PTR(usb_xhci_acpi_match), }, }; MODULE_ALIAS("platform:xhci-hcd"); -- 1.9.1
[PATCH v5 6/6] locking/pvqspinlock: Queue node adaptive spinning
In an overcommitted guest where some vCPUs have to be halted to make forward progress in other areas, it is highly likely that a vCPU later in the spinlock queue will be spinning while the ones earlier in the queue would have been halted. The spinning in the later vCPUs is then just a waste of precious CPU cycles because they are not going to get the lock soon, as the earlier ones have to be woken up and take their turn to get the lock.

Reducing the spinning threshold is found to improve performance in an overcommitted VM guest, but decrease performance when there is no overcommitment.

This patch implements an adaptive spinning mechanism where the vCPU will call pv_wait() earlier if all the following conditions are true:

 1) the vCPU has not been halted before;
 2) the previous vCPU is in the halted state;
 3) there have been a lot of pv_wait() calls for the current vCPU recently.

Linux kernel builds were run in a KVM guest on an 8-socket, 4 cores/socket Westmere-EX system and a 4-socket, 8 cores/socket Haswell-EX system. Both systems are configured to have 32 physical CPUs. The kernel build times before and after the patch were:

                       Westmere                 Haswell
  Patch            32 vCPUs   48 vCPUs     32 vCPUs   48 vCPUs
  ------------     --------   --------     --------   --------
  Before patch     3m03.2s    9m21.1s      2m08.9s    16m14.8s
  After patch      3m04.1s    9m28.5s      2m09.5s    8m29.3s

This patch seemed to cause a tiny bit of performance degradation for 32 vCPUs. For 48 vCPUs, there wasn't much change for Westmere, but a pretty big performance jump for Haswell. 
Signed-off-by: Waiman Long --- kernel/locking/qspinlock.c |5 +- kernel/locking/qspinlock_paravirt.h | 111 +- 2 files changed, 110 insertions(+), 6 deletions(-) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 94fdd27..da39d43 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -258,7 +258,8 @@ static __always_inline void set_locked(struct qspinlock *lock) */ static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_wait_node(struct mcs_spinlock *node, + struct mcs_spinlock *prev) { } static __always_inline void __pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node) { } static __always_inline void __pv_wait_head(struct qspinlock *lock, @@ -415,7 +416,7 @@ queue: prev = decode_tail(old); WRITE_ONCE(prev->next, node); - pv_wait_node(node); + pv_wait_node(node, prev); arch_mcs_spin_lock_contended(>locked); } diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h index 9996609..f03bd7a 100644 --- a/kernel/locking/qspinlock_paravirt.h +++ b/kernel/locking/qspinlock_paravirt.h @@ -31,6 +31,38 @@ #define PENDING_SPIN_THRESHOLD (SPIN_THRESHOLD >> 5) /* + * Queue Node Adaptive Spinning + * + * A queue node vCPU will spin less if the following conditions are all true: + * 1) vCPU in the previous node is halted + * 2) it has not been halted before + * 3) there is a lot of pv_wait() in the curent vCPU recently + * + * The last condition is being monitored by the wait_hist field in the pv_node + * structure which tracks the history of pv_wait() relative to slowpath calls. + * Each pv_wait will increment this field by PV_WAITHIST_INC until it exceeds + * PV_WAITHIST_MAX. Each slowpath lock call will decrement it by 1 until it + * reaches PV_WAITHIST_MIN. If its value is higher than PV_WAITHIST_THRESHOLD, + * the vCPU will spin less. 
The reason for this adaptive spinning is to try + * to enable wait-early mode only on over-committed guest which helps + * performance. However, it shouldn't be enabled when the guest is not + * over-committed as it will hurt performance. + * + * With PV_WAITHIST_INC set to 4, each pv_wait() while not in wait-early mode + * will increment wait_hist by 3. Each slowpath call without pv_wait() will + * decrement wait_hist by 1. The threshold is set at about 3/4 of the range + * so that about 10 steps from the edges in either direction will reach the + * threshold. If, on average, more than 1/4 of all slowpath calls results in + * a pv_wait(), it should stay in the wait-early mode. + */ +#define PV_WAITHIST_MASK 0xff +#define PV_WAITHIST_INC4 +#define PV_WAITHIST_MIN1 +#define PV_WAITHIST_MAX40 +#define PV_WAITHIST_THRESHOLD 30 +#define PV_CAN_WAIT_EARLY(w) ((w)->wait_hist > PV_WAITHIST_THRESHOLD) + +/* * Queue node uses: vcpu_running & vcpu_halted. * Queue head uses: vcpu_running & vcpu_hashed. */ @@ -46,6 +78,8 @@ struct pv_node { int cpu; u8
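The wait_hist arithmetic described in the comment above can be modeled in plain C. This is a user-space simulation of the documented rules (decay by 1 per slowpath entry, bump by PV_WAITHIST_INC per pv_wait(), clamped between MIN and MAX), not the kernel's actual pv_node code:

```c
#include <assert.h>

#define PV_WAITHIST_INC		4
#define PV_WAITHIST_MIN		1
#define PV_WAITHIST_MAX		40
#define PV_WAITHIST_THRESHOLD	30

/* Each slowpath lock call decays the history by 1, down to MIN... */
void slowpath_enter(int *wait_hist)
{
	if (*wait_hist > PV_WAITHIST_MIN)
		(*wait_hist)--;
}

/* ...and each pv_wait() bumps it by INC, until it exceeds MAX. */
void pv_wait_event(int *wait_hist)
{
	if (*wait_hist < PV_WAITHIST_MAX)
		*wait_hist += PV_WAITHIST_INC;
}

/* Mirrors PV_CAN_WAIT_EARLY from the patch. */
int can_wait_early(int wait_hist)
{
	return wait_hist > PV_WAITHIST_THRESHOLD;
}
```

A slowpath call that ends in pv_wait() is a net +3 (−1 then +4), so starting from PV_WAITHIST_MIN it takes about ten such calls to cross the threshold — matching the "about 10 steps from the edges" remark in the comment.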
[PATCH v5 2/6] locking/pvqspinlock: Add pending bit support
Like the native qspinlock, using the pending bit when the lock is lightly loaded to acquire the lock is faster than going through the PV queuing process, which is even slower than the native queuing process. It also avoids loading two additional cachelines (the MCS and PV nodes). This patch adds the pending bit support for PV qspinlock. The pending bit code has a smaller spin threshold (1<<10). It will fall back to the queuing method if it cannot acquire the lock within a certain time limit.

On a VM with 32 vCPUs on a 32-core Westmere-EX box, the kernel build times on 4.2-rc1 based kernels were:

  Kernel        Build Time    Sys Time
  ----------    ----------    --------
  w/o patch     3m28.5s       28m17.5s
  with patch    3m19.3s       23m55.7s

Using a locking microbenchmark on the same system, the locking rates in (kops/s) were:

  Threads            Rate w/o patch    Rate with patch
  ----------------   --------------    ---------------
  2 (same socket)    6,515,265         7,077,476
  2 (diff sockets)   2,967,145         4,353,851

Signed-off-by: Waiman Long --- kernel/locking/qspinlock.c | 27 - kernel/locking/qspinlock_paravirt.h | 73 +++ 2 files changed, 99 insertions(+), 1 deletions(-) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 337c881..94fdd27 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -162,6 +162,17 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock) WRITE_ONCE(l->locked_pending, _Q_LOCKED_VAL); } +/** + * clear_pending - clear the pending bit. + * @lock: Pointer to queued spinlock structure + */ +static __always_inline void clear_pending(struct qspinlock *lock) +{ + struct __qspinlock *l = (void *)lock; + + WRITE_ONCE(l->pending, 0); +} + /* * xchg_tail - Put in the new queue tail code word & retrieve previous one * @lock : Pointer to queued spinlock structure @@ -193,6 +204,15 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock) } /**
+ * @lock: Pointer to queued spinlock structure
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+	atomic_add(-_Q_PENDING_VAL, &lock->val);
+}
+
+/**
  * xchg_tail - Put in the new queue tail code word & retrieve previous one
  * @lock : Pointer to queued spinlock structure
  * @tail : The new queue tail code word
@@ -245,6 +265,7 @@ static __always_inline void __pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node) { }

 #define pv_enabled()		false
+#define pv_pending_lock(l, v)	false

 #define pv_init_node		__pv_init_node
 #define pv_wait_node		__pv_wait_node
@@ -286,8 +307,11 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));

-	if (pv_enabled())
+	if (pv_enabled()) {
+		if (pv_pending_lock(lock, val))
+			return;	/* Got the lock via pending bit */
		goto queue;
+	}

	if (virt_queued_spin_lock(lock))
		return;
@@ -463,6 +487,7 @@ EXPORT_SYMBOL(queued_spin_lock_slowpath);
 #undef pv_wait_node
 #undef pv_kick_node
 #undef pv_wait_head
+#undef pv_pending_lock

 #undef queued_spin_lock_slowpath
 #define queued_spin_lock_slowpath	__pv_queued_spin_lock_slowpath

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 6eafb9e..94f9adf 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -23,6 +23,14 @@
 #define _Q_SLOW_VAL	(3U << _Q_LOCKED_OFFSET)

 /*
+ * Queued Spinlock Spin Threshold
+ *
+ * The vCPU will spin a relatively short time in pending mode before falling
+ * back to queuing.
+ */
+#define PENDING_SPIN_THRESHOLD	(SPIN_THRESHOLD >> 5)
+
+/*
  * Queue node uses: vcpu_running & vcpu_halted.
  * Queue head uses: vcpu_running & vcpu_hashed.
  */
@@ -157,6 +165,71 @@ static void pv_init_node(struct mcs_spinlock *node)
 }

 /*
+ * Try to acquire the lock and wait using the pending bit within a certain
+ * threshold as specified by PENDING_SPIN_THRESHOLD.
If the threshold has
+ * been exceeded without getting the lock, we fall back to queuing.
+ */
+static int pv_pending_lock(struct qspinlock *lock, u32 val)
+{
+	int loop = PENDING_SPIN_THRESHOLD;
+	u32 new, old;
+
+	/*
+	 * wait for in-progress pending->locked hand-overs
+	 */
+	while ((val == _Q_PENDING_VAL) && loop) {
+		cpu_relax();
+		val = atomic_read(&lock->val);
+		loop--;
+	}
+
+	/*
+	 * trylock || pending
+	 *
+	 * This loop does a trylock if lock is free or sets the pending bit
+	 * if lock is taken until the cmpxchg succeeds. As it is expected
+	 * that
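The trylock-or-pending step above can be sketched as a standalone C11 model. This is an illustrative simplification, not the kernel code: the lock word is modeled as a plain atomic word with bit 0 = locked, bit 8 = pending, and higher bits standing in for the queue tail, and only a single pending waiter is assumed:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical simplified lock-word layout (not the kernel's exact one). */
#define Q_LOCKED_VAL	1u
#define Q_PENDING_VAL	(1u << 8)

/*
 * Model of the trylock-or-pending path: returns true if the lock was
 * acquired via trylock or the pending bit, false if the caller should
 * fall back to the queuing slowpath.
 */
static bool pending_lock(atomic_uint *lock, int threshold)
{
	unsigned int val = atomic_load(lock);
	unsigned int new;

	for (;;) {
		if (val & ~Q_LOCKED_VAL)
			return false;	/* tail or pending already set: queue */
		/* lock free -> trylock; lock held -> set the pending bit */
		new = val ? (val | Q_PENDING_VAL) : Q_LOCKED_VAL;
		if (atomic_compare_exchange_strong(lock, &val, new))
			break;		/* on failure, val is reloaded */
	}
	if (new == Q_LOCKED_VAL)
		return true;		/* trylock succeeded */

	/* We own the pending bit: spin for the holder, bounded by threshold. */
	while (threshold-- > 0) {
		if (!(atomic_load(lock) & Q_LOCKED_VAL)) {
			/* clear pending + set locked in one atomic step */
			atomic_fetch_add(lock, Q_LOCKED_VAL - Q_PENDING_VAL);
			return true;
		}
	}
	/* Timed out: give up the pending bit and fall back to queuing. */
	atomic_fetch_sub(lock, Q_PENDING_VAL);
	return false;
}
```

A caller that gets `false` back would then enter the MCS queuing slowpath, mirroring the structure of `pv_pending_lock()` in the patch.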
[PATCH v5 3/6] locking/pvqspinlock: Collect slowpath lock statistics
This patch enables the accumulation of kicking and waiting related PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration option is selected. It also enables the collection of kicking and wakeup latencies, which depend heavily on the CPUs being used. The measured latencies for different CPUs are:

	CPU		Wakeup		Kicking
	---		------		-------
	Haswell-EX	89.8us		 7.4us
	Westmere-EX	67.6us		 9.3us

The measured latencies varied a bit from run to run. The wakeup latency is much higher than the kicking latency. A sample of statistics counts after a kernel build (no CPU overcommit) was:

	hash_hops_count=576912
	kick_latencies=5258025484
	kick_unlock_count=576911
	kick_wait_count=576903
	pending_fail_count=10722
	pending_lock_count=6123545
	spurious_wakeup=92
	wait_again_count=75
	wait_head_count=60
	wait_node_count=576936
	wake_latencies=37061460652

Signed-off-by: Waiman Long
---
 arch/x86/Kconfig                    |    7 ++
 kernel/locking/qspinlock_paravirt.h |  178 ++-
 2 files changed, 180 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4c9c8b8..86bf53e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -719,6 +719,13 @@ config PARAVIRT_SPINLOCKS

	  If you are unsure how to answer this question, answer Y.

+config QUEUED_LOCK_STAT
+	bool "Paravirt queued lock statistics"
+	depends on PARAVIRT && DEBUG_FS && QUEUED_SPINLOCKS
+	---help---
+	  Enable the collection of statistical data on the behavior of
+	  paravirtualized queued spinlocks and report them on debugfs.
+
 source "arch/x86/xen/Kconfig"

 config KVM_GUEST

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 94f9adf..5eb5dea 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -49,6 +49,151 @@ struct pv_node {
 };

 /*
+ * PV qspinlock statistics
+ */
+enum pv_qlock_stat {
+	pvstat_wait_head,
+	pvstat_wait_node,
+	pvstat_wait_again,
+	pvstat_kick_wait,
+	pvstat_kick_unlock,
+	pvstat_pend_lock,
+	pvstat_pend_fail,
+	pvstat_spurious,
+	pvstat_hops,
+	pvstat_num	/* Total number of statistics counts */
+};
+
+#ifdef CONFIG_QUEUED_LOCK_STAT
+/*
+ * Collect pvqspinlock statistics
+ */
+#include
+#include
+
+static const char * const stat_fsnames[pvstat_num] = {
+	[pvstat_wait_head]   = "wait_head_count",
+	[pvstat_wait_node]   = "wait_node_count",
+	[pvstat_wait_again]  = "wait_again_count",
+	[pvstat_kick_wait]   = "kick_wait_count",
+	[pvstat_kick_unlock] = "kick_unlock_count",
+	[pvstat_pend_lock]   = "pending_lock_count",
+	[pvstat_pend_fail]   = "pending_fail_count",
+	[pvstat_spurious]    = "spurious_wakeup",
+	[pvstat_hops]        = "hash_hops_count",
+};
+
+static atomic_t pvstats[pvstat_num];
+
+/*
+ * pv_kick_latencies = sum of all pv_kick latencies in ns
+ * pv_wake_latencies = sum of all wakeup latencies in ns
+ *
+ * Avg kick latency   = pv_kick_latencies/kick_unlock_count
+ * Avg wake latency   = pv_wake_latencies/kick_wait_count
+ * Avg # of hops/hash = hash_hops_count/kick_unlock_count
+ */
+static atomic64_t pv_kick_latencies, pv_wake_latencies;
+static DEFINE_PER_CPU(u64, pv_kick_time);
+
+/*
+ * Reset all the statistics counts if set
+ */
+static bool reset_cnts __read_mostly;
+
+/*
+ * Initialize debugfs for the PV qspinlock statistics
+ */
+static int __init pv_qspinlock_debugfs(void)
+{
+	struct dentry *d_pvqlock = debugfs_create_dir("pv-qspinlock", NULL);
+	int i;
+
+	if (!d_pvqlock)
+		pr_warn("Could not create 'pv-qspinlock' debugfs directory\n");
+
+	for (i = 0; i < pvstat_num; i++)
+		debugfs_create_u32(stat_fsnames[i], 0444, d_pvqlock,
+				   (u32 *)&pvstats[i]);
+	debugfs_create_u64("kick_latencies", 0444, d_pvqlock,
+			   (u64 *)&pv_kick_latencies);
+	debugfs_create_u64("wake_latencies", 0444, d_pvqlock,
+			   (u64 *)&pv_wake_latencies);
+	debugfs_create_bool("reset_cnts", 0644, d_pvqlock, (u32 *)&reset_cnts);
+	return 0;
+}
+fs_initcall(pv_qspinlock_debugfs);
+
+/*
+ * Reset all the counts
+ */
+static noinline void pvstat_reset(void)
+{
+	int i;
+
+	for (i = 0; i < pvstat_num; i++)
+		atomic_set(&pvstats[i], 0);
+	atomic64_set(&pv_kick_latencies, 0);
+	atomic64_set(&pv_wake_latencies, 0);
+	reset_cnts = 0;
+}
+
+/*
+ * Increment the PV qspinlock statistics counts
+ */
+static inline void pvstat_inc(enum pv_qlock_stat stat)
+{
+	atomic_inc(&pvstats[stat]);
+	if (unlikely(reset_cnts))
+		pvstat_reset();
+}
+
+/*
+ * PV hash hop count
+ */
+static inline void pvstat_hop(int hopcnt)
+{
+	atomic_add(hopcnt, &pvstats[pvstat_hops]);
+}
+
+/*
+ *
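The derived averages documented in the comment block are plain divisions over the raw debugfs counters. A small illustrative helper (not part of the patch) that guards against an empty counter:

```c
#include <assert.h>

/*
 * Average latency in ns given a latency sum and an event count.
 * Returns 0 for an empty counter (illustrative helper, not kernel code).
 */
static unsigned long long avg_latency_ns(unsigned long long total_ns,
					 unsigned long long count)
{
	return count ? total_ns / count : 0;
}
```

With the sample counts above, `kick_latencies / kick_unlock_count` comes out to roughly 9.1 us per kick, consistent with the Westmere-EX kicking latency in the table.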
[PATCH v5 5/6] locking/pvqspinlock: Allow vCPUs kick-ahead
Frequent CPU halting (vmexit) and CPU kicking (vmenter) lengthen critical sections and block forward progress. This patch implements a kick-ahead mechanism where the unlocker will kick the queue head vCPU as well as up to four additional vCPUs next to the queue head if they were halted. The kicks are done after exiting the critical section to improve parallelism.

The amount of kick-ahead allowed depends on the number of vCPUs in the VM guest. Currently it allows up to 1 vCPU kick-ahead per 4 vCPUs available, up to a maximum of PV_KICK_AHEAD_MAX (4). There are diminishing returns in increasing the maximum value. The current value of 4 is a compromise that gets a nice performance boost without penalizing too much the one vCPU that is doing all the kicking.

Linux kernel builds were run in KVM guests on an 8-socket, 4 cores/socket Westmere-EX system and a 4-socket, 8 cores/socket Haswell-EX system. Both systems were configured to have 32 physical CPUs. The kernel build times before and after the patch were:

	                Westmere                Haswell
	Patch           32 vCPUs   48 vCPUs     32 vCPUs   48 vCPUs
	------          --------   --------     --------   --------
	Before patch    3m21.9s    11m20.6s     2m08.6s    17m12.8s
	After patch     3m03.2s     9m21.1s     2m08.9s    16m14.8s

This improves performance quite substantially on Westmere, but not so much on Haswell.
Signed-off-by: Waiman Long
---
 kernel/locking/qspinlock_paravirt.h |   71 +-
 1 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 7c9d6ed..9996609 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -57,6 +57,7 @@ enum pv_qlock_stat {
	pvstat_wait_again,
	pvstat_kick_wait,
	pvstat_kick_unlock,
+	pvstat_kick_ahead,
	pvstat_pend_lock,
	pvstat_pend_fail,
	pvstat_spurious,
@@ -77,6 +78,7 @@ static const char * const stat_fsnames[pvstat_num] = {
	[pvstat_wait_again]  = "wait_again_count",
	[pvstat_kick_wait]   = "kick_wait_count",
	[pvstat_kick_unlock] = "kick_unlock_count",
+	[pvstat_kick_ahead]  = "kick_ahead_count",
	[pvstat_pend_lock]   = "pending_lock_count",
	[pvstat_pend_fail]   = "pending_fail_count",
	[pvstat_spurious]    = "spurious_wakeup",
@@ -89,7 +91,7 @@ static atomic_t pvstats[pvstat_num];
  * pv_kick_latencies = sum of all pv_kick latencies in ns
  * pv_wake_latencies = sum of all wakeup latencies in ns
  *
- * Avg kick latency   = pv_kick_latencies/kick_unlock_count
+ * Avg kick latency   = pv_kick_latencies/(kick_unlock_count + kick_ahead_count)
  * Avg wake latency   = pv_wake_latencies/kick_wait_count
  * Avg # of hops/hash = hash_hops_count/kick_unlock_count
  */
@@ -221,6 +223,18 @@ static struct pv_hash_entry *pv_lock_hash;
 static unsigned int pv_lock_hash_bits __read_mostly;

 /*
+ * Allow kick-ahead of vCPUs at unlock time
+ *
+ * The pv_kick_ahead value is set by a simple formula that 1 vCPU kick-ahead
+ * is allowed per 4 vCPUs available up to a maximum of PV_KICK_AHEAD_MAX.
+ * There are diminishing returns in increasing PV_KICK_AHEAD_MAX. The current
+ * value of 4 is a good compromise that gives a good performance boost without
+ * penalizing the vCPU that is doing the kicking by too much.
+ */
+#define PV_KICK_AHEAD_MAX	4
+static int pv_kick_ahead __read_mostly;
+
+/*
  * Allocate memory for the PV qspinlock hash buckets
  *
  * This function should be called from the paravirt spinlock initialization
@@ -228,7 +242,8 @@ static unsigned int pv_lock_hash_bits __read_mostly;
  */
 void __init __pv_init_lock_hash(void)
 {
-	int pv_hash_size = ALIGN(4 * num_possible_cpus(), PV_HE_PER_LINE);
+	int ncpus = num_possible_cpus();
+	int pv_hash_size = ALIGN(4 * ncpus, PV_HE_PER_LINE);

	if (pv_hash_size < PV_HE_MIN)
		pv_hash_size = PV_HE_MIN;
@@ -242,6 +257,13 @@ void __init __pv_init_lock_hash(void)
					       pv_hash_size, 0, HASH_EARLY,
					       &pv_lock_hash_bits, NULL,
					       pv_hash_size, pv_hash_size);
+	/*
+	 * Enable the unlock kick ahead mode according to the number of
+	 * vCPUs available.
+	 */
+	pv_kick_ahead = min(ncpus/4, PV_KICK_AHEAD_MAX);
+	if (pv_kick_ahead)
+		pr_info("PV unlock kick ahead max count = %d\n", pv_kick_ahead);
 }

 #define for_each_hash_entry(he, offset, hash) \
@@ -551,6 +573,26 @@ static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
 }

 /*
+ * Helper to get the address of the next kickable node
+ *
+ * The node has to be in the halted state. The state will then be
+ * transitioned to the running state. If no kickable node is
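The kick-ahead budget formula can be seen in isolation. This is a sketch mirroring the `min(ncpus/4, PV_KICK_AHEAD_MAX)` expression; a result of 0 means kick-ahead stays disabled:

```c
#include <assert.h>

#define PV_KICK_AHEAD_MAX	4

/* Kick-ahead budget: 1 kick-ahead per 4 vCPUs, capped at the maximum. */
static int kick_ahead_budget(int ncpus)
{
	int n = ncpus / 4;

	return n < PV_KICK_AHEAD_MAX ? n : PV_KICK_AHEAD_MAX;
}
```

So a 2-vCPU guest gets no kick-ahead at all, while any guest with 16 or more vCPUs hits the cap of 4.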
[PATCH v5 4/6] locking/pvqspinlock, x86: Optimize PV unlock code path
The unlock function in queued spinlocks was optimized for better performance on bare metal systems at the expense of virtualized guests. For x86-64 systems, the unlock call needs to go through a PV_CALLEE_SAVE_REGS_THUNK() which saves and restores 8 64-bit registers before calling the real __pv_queued_spin_unlock() function. The thunk code may also be in a separate cacheline from __pv_queued_spin_unlock().

This patch optimizes the PV unlock code path by:

 1) Moving the unlock slowpath code from the fastpath into a separate
    __pv_queued_spin_unlock_slowpath() function to make the fastpath as
    simple as possible.

 2) For x86-64, hand-coding an assembly function to combine the register
    saving thunk code with the fastpath code. Only registers that are used
    in the fastpath will be saved and restored. If the fastpath fails, the
    slowpath function will be called via another PV_CALLEE_SAVE_REGS_THUNK().
    For 32-bit, it falls back to the C __pv_queued_spin_unlock() code as the
    thunk saves and restores only one 32-bit register.

With a microbenchmark of 5M lock-unlock loops, the table below shows the execution times before and after the patch with different numbers of threads in a VM running on a 32-core Westmere-EX box with x86-64 4.2-rc1 based kernels:

	Threads    Before patch    After patch    % Change
	-------    ------------    -----------    --------
	1          134.1 ms        119.3 ms       -11%
	2          1286 ms          953 ms        -26%
	3          3715 ms         3480 ms        -6.3%
	4          4092 ms         3764 ms        -8.0%

Signed-off-by: Waiman Long
---
 arch/x86/include/asm/qspinlock_paravirt.h |   59 +
 kernel/locking/qspinlock_paravirt.h       |   43 +
 2 files changed, 86 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock_paravirt.h b/arch/x86/include/asm/qspinlock_paravirt.h
index b002e71..3001972 100644
--- a/arch/x86/include/asm/qspinlock_paravirt.h
+++ b/arch/x86/include/asm/qspinlock_paravirt.h
@@ -1,6 +1,65 @@
 #ifndef __ASM_QSPINLOCK_PARAVIRT_H
 #define __ASM_QSPINLOCK_PARAVIRT_H

+/*
+ * For x86-64, PV_CALLEE_SAVE_REGS_THUNK() saves and restores 8 64-bit
+ * registers.
For i386, however, only 1 32-bit register needs to be saved
+ * and restored. So an optimized version of __pv_queued_spin_unlock() is
+ * hand-coded for 64-bit, but it isn't worthwhile to do it for 32-bit.
+ */
+#ifdef CONFIG_64BIT
+
+PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath);
+#define __pv_queued_spin_unlock	__pv_queued_spin_unlock
+#define PV_UNLOCK		"__raw_callee_save___pv_queued_spin_unlock"
+#define PV_UNLOCK_SLOWPATH	"__raw_callee_save___pv_queued_spin_unlock_slowpath"
+
+/*
+ * Optimized assembly version of __raw_callee_save___pv_queued_spin_unlock
+ * which combines the register-saving thunk and the body of the following
+ * C code:
+ *
+ *	void __pv_queued_spin_unlock(struct qspinlock *lock)
+ *	{
+ *		struct __qspinlock *l = (void *)lock;
+ *		u8 lockval = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
+ *
+ *		if (likely(lockval == _Q_LOCKED_VAL))
+ *			return;
+ *		pv_queued_spin_unlock_slowpath(lock, lockval);
+ *	}
+ *
+ * For x86-64,
+ *	rdi = lock    (first argument)
+ *	rsi = lockval (second argument)
+ *	rdx = internal variable (set to 0)
+ */
+asm(".pushsection .text;"
+    ".globl " PV_UNLOCK ";"
+    ".align 4,0x90;"
+    PV_UNLOCK ": "
+    "push %rdx;"
+    "mov $0x1,%eax;"
+    "xor %edx,%edx;"
+    "lock cmpxchg %dl,(%rdi);"
+    "cmp $0x1,%al;"
+    "jne .slowpath;"
+    "pop %rdx;"
+    "ret;"
+    ".slowpath: "
+    "push %rsi;"
+    "movzbl %al,%esi;"
+    "call " PV_UNLOCK_SLOWPATH ";"
+    "pop %rsi;"
+    "pop %rdx;"
+    "ret;"
+    ".size " PV_UNLOCK ", .-" PV_UNLOCK ";"
+    ".popsection");
+
+#else /* CONFIG_64BIT */
+
+extern void __pv_queued_spin_unlock(struct qspinlock *lock);
 PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock);
+
+#endif /* CONFIG_64BIT */

 #endif

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 5eb5dea..7c9d6ed 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -551,23 +551,14 @@ static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
 }

 /*
- * PV version of the unlock function to be used in
stead of
- * queued_spin_unlock().
+ * PV versions of the unlock fastpath and slowpath functions to be used
+ * instead of queued_spin_unlock().
  */
-__visible void __pv_queued_spin_unlock(struct qspinlock *lock)
+__visible void
+__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
 {
	struct __qspinlock *l = (void *)lock;
	struct pv_node *node;
-	u8 locked;
-
-	/*
-	 * We must not unlock if SLOW, because in that case we must first
-	 * unhash. Otherwise it would be possible to
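The split fastpath/slowpath structure can be modeled in portable C: cmpxchg the locked byte from _Q_LOCKED_VAL to 0, and only call the slowpath when some other value (e.g. _Q_SLOW_VAL) was observed. This is a standalone sketch of the control flow, not the kernel implementation; the slowpath stub stands in for the real unhash-and-kick work:

```c
#include <assert.h>
#include <stdatomic.h>

#define Q_LOCKED_VAL	1
#define Q_SLOW_VAL	3

static int slowpath_calls;	/* test instrumentation only */

/* Stand-in for __pv_queued_spin_unlock_slowpath(): unhash and kick. */
static void unlock_slowpath(atomic_uchar *locked, unsigned char lockval)
{
	(void)lockval;		/* a real slowpath inspects this */
	slowpath_calls++;
	atomic_store(locked, 0);
}

static void unlock(atomic_uchar *locked)
{
	unsigned char expected = Q_LOCKED_VAL;

	/* Fastpath: plain locked value observed, just release. */
	if (atomic_compare_exchange_strong(locked, &expected, 0))
		return;
	/* Slowpath: the observed value (e.g. Q_SLOW_VAL) is in `expected`. */
	unlock_slowpath(locked, expected);
}
```

Keeping the fastpath this small is what makes the hand-coded assembly version above worthwhile: only the registers the cmpxchg needs have to be saved.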
[PATCH v5 1/2] usb: make xhci platform driver use 64 bit or 32 bit DMA
The xhci platform driver needs to work on systems that either only support 64-bit DMA or only support 32-bit DMA. Attempt to set a coherent dma mask for 64-bit DMA, and attempt again with 32-bit DMA if that fails.

[dhdang: Regenerate the patch over 4.2-rc5]
Signed-off-by: Mark Langsdorf
Tested-by: Mark Salter
Signed-off-by: Duc Dang
---
Changes from v4:
	None
Changes from v3:
	Re-generate the patch over 4.2-rc5
	No code change.
Changes from v2:
	None
Changes from v1:
	Consolidated to use dma_set_mask_and_coherent
	Got rid of the check against sizeof(dma_addr_t)

 drivers/usb/host/xhci-plat.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 890ad9d..5d03f8b 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -93,14 +93,14 @@ static int xhci_plat_probe(struct platform_device *pdev)
	if (irq < 0)
		return -ENODEV;

-	/* Initialize dma_mask and coherent_dma_mask to 32-bits */
-	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
-	if (ret)
-		return ret;
-	if (!pdev->dev.dma_mask)
-		pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
-	else
-		dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
+	/* Try setting the coherent_dma_mask to 64 bits, then try 32 bits */
+	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+	if (ret) {
+		ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+		if (ret)
+			return ret;
+	}

	hcd = usb_create_hcd(driver, &pdev->dev, dev_name(&pdev->dev));
	if (!hcd)
-- 
1.9.1
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
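The try-64-then-32 pattern generalizes to "attempt the widest mask first, then narrow on failure." A standalone sketch with a stubbed device, where the hypothetical `set_dma_mask()` helper stands in for `dma_set_mask_and_coherent()`:

```c
#include <assert.h>
#include <stdint.h>

#define DMA_BIT_MASK(n)	((n) == 64 ? ~0ULL : (1ULL << (n)) - 1)
#define EIO		5

struct fake_dev {
	uint64_t supported_mask;	/* what the bus/platform can address */
	uint64_t dma_mask;		/* what the driver negotiated */
};

/*
 * Stand-in for dma_set_mask_and_coherent(): fail if the requested
 * mask exceeds what the platform supports.
 */
static int set_dma_mask(struct fake_dev *dev, uint64_t mask)
{
	if (mask & ~dev->supported_mask)
		return -EIO;
	dev->dma_mask = mask;
	return 0;
}

/* Try 64-bit DMA first, fall back to 32-bit, as in xhci_plat_probe(). */
static int negotiate_dma_mask(struct fake_dev *dev)
{
	int ret = set_dma_mask(dev, DMA_BIT_MASK(64));

	if (ret)
		ret = set_dma_mask(dev, DMA_BIT_MASK(32));
	return ret;
}
```

This matches the patch's shape: the only error returned to the caller is the 32-bit failure, since a 64-bit failure is expected on 32-bit-only platforms.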
[PATCH v5 0/6] locking/qspinlock: Enhance pvqspinlock performance
v4->v5:
 - Rebased the patch to the latest tip tree.
 - Corrected the comments and commit log for patch 1.
 - Removed the v4 patch 5 as PV kick deferment is no longer needed with
   the new tip tree.
 - Simplified the adaptive spinning patch (patch 6) & improved its
   performance a bit further.
 - Re-ran the benchmark test with the new patch.

v3->v4:
 - Patch 1: add comment about possible racing condition in PV unlock.
 - Patch 2: simplified the pv_pending_lock() function as suggested by
   Davidlohr.
 - Move PV unlock optimization patch forward to patch 4 & rerun
   performance test.

v2->v3:
 - Moved deferred kicking enablement patch forward & moved back the
   kick-ahead patch to make the effect of kick-ahead more visible.
 - Reworked patch 6 to make it more readable.
 - Reverted back to use state as a tri-state variable instead of adding
   an additional bistate variable.
 - Added performance data for different values of PV_KICK_AHEAD_MAX.
 - Added a new patch to optimize PV unlock code path performance.

v1->v2:
 - Take out the queued unfair lock patches.
 - Add a patch to simplify the PV unlock code.
 - Move pending bit and statistics collection patches to the front.
 - Keep vCPU kicking in pv_kick_node(), but defer it to unlock time
   when appropriate.
 - Change the wait-early patch to use adaptive spinning to better
   balance the different effects on normal and over-committed guests.
 - Add patch-to-patch performance changes in the patch commit logs.

This patchset tries to improve the performance of both normal and over-committed VM guests. The kick-ahead and adaptive spinning patches are inspired by the "Do Virtual Machines Really Scale?" blog from Sanidhya Kashyap.

Patch 1 simplifies the unlock code by removing the unnecessary state check.

Patch 2 adds pending bit support to pvqspinlock, improving performance at light load.

Patch 3 allows the collection of various count data that are useful to see what is happening in the system. They do add a bit of overhead when enabled, slowing performance a tiny bit.
Patch 4 optimizes the PV unlock code path performance for the x86-64 architecture.

Patch 5 enables multiple vCPU kick-aheads at unlock time, outside of the critical section, which can improve performance in over-committed guests and sometimes even in normal guests.

Patch 6 enables adaptive spinning in the queue nodes. This patch can lead to a pretty big performance increase in over-committed guests at the expense of a slight performance hit in normal guests.

Patches 2 & 4 improve performance of common uncontended and lightly contended cases. Patches 5-6 are for improving performance in over-committed VM guests.

Performance measurements were done on 32-CPU Westmere-EX and Haswell-EX systems. The Westmere-EX system got the most performance gain from patch 5, whereas the Haswell-EX system got the most gain from patch 6 for over-committed guests.

The table below shows the Linux kernel build times for various values of PV_KICK_AHEAD_MAX on an over-committed 48-vCPU guest on the Westmere-EX system:

	PV_KICK_AHEAD_MAX    Patches 1-5    Patches 1-6
	-----------------    -----------    -----------
	1                    9m46.9s        11m10.1s
	2                    9m40.2s        10m08.3s
	3                    9m36.8s         9m49.8s
	4                    9m35.9s         9m38.7s
	5                    9m35.1s         9m33.0s
	6                    9m35.7s         9m28.5s

With patches 1-5, the performance wasn't very sensitive to different PV_KICK_AHEAD_MAX values. Adding patch 6 into the mix, however, changes the picture quite dramatically. There is a performance regression if PV_KICK_AHEAD_MAX is too small. Starting with a value of 4, increasing PV_KICK_AHEAD_MAX only gets us a minor benefit.
Waiman Long (6): locking/pvqspinlock: Unconditional PV kick with _Q_SLOW_VAL locking/pvqspinlock: Add pending bit support locking/pvqspinlock: Collect slowpath lock statistics locking/pvqspinlock, x86: Optimize PV unlock code path locking/pvqspinlock: Allow vCPUs kick-ahead locking/pvqspinlock: Queue node adaptive spinning arch/x86/Kconfig |7 + arch/x86/include/asm/qspinlock_paravirt.h | 59 kernel/locking/qspinlock.c| 32 ++- kernel/locking/qspinlock_paravirt.h | 475 +++-- 4 files changed, 542 insertions(+), 31 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v5 1/6] locking/pvqspinlock: Unconditional PV kick with _Q_SLOW_VAL
If _Q_SLOW_VAL has been set, the vCPU state must have been vcpu_hashed. The extra check at the end of __pv_queued_spin_unlock() is unnecessary and so is removed. Signed-off-by: Waiman Long Reviewed-by: Davidlohr Bueso --- kernel/locking/qspinlock_paravirt.h |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h index c8e6e9a..6eafb9e 100644 --- a/kernel/locking/qspinlock_paravirt.h +++ b/kernel/locking/qspinlock_paravirt.h @@ -364,8 +364,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock) * vCPU is harmless other than the additional latency in completing * the unlock. */ - if (READ_ONCE(node->state) == vcpu_hashed) - pv_kick(node->cpu); + pv_kick(node->cpu); } /* * Include the architecture specific callee-save thunk of the -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 2/2] usb: Add support for ACPI identification to xhci-platform
On Fri, Aug 7, 2015 at 6:29 PM, Greg KH wrote: > On Fri, Aug 07, 2015 at 06:03:36PM -0700, Duc Dang wrote: >> Provide the methods to let ACPI identify the need to use >> xhci-platform. Change the Kconfig files so the >> xhci-plat.o file is selectable during kernel config. >> >> This has been tested on an ARM64 machine with platform XHCI, an >> x86_64 machine with XHCI, and an x86_64 machine without XHCI. >> There were no regressions or error messages on the machines >> without platform XHCI. >> >> [dhdang: regenerate the patch over 4.2-rc5] >> Signed-off-by: Mark Langsdorf >> Signed-off-by: Duc Dang >> >> --- >> Changes from v3: >> Regenerate the patch over 4.2-rc5 >> No code change >> >> Changes from v2 >> Replaced tristate with a boolean as the driver doesn't >> compile as a module >> Correct --help-- to ---help--- >> >> Changes from v1 >> Renamed from "add support for APM X-Gene to xhci-platform" >> Removed changes to arm64/Kconfig >> Made CONFIG_USB_XHCI_PLATFORM a user selectable config option >> >> drivers/usb/host/Kconfig | 7 ++- >> drivers/usb/host/xhci-plat.c | 11 +++ >> 2 files changed, 17 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig >> index 8afc3c1..96231ee 100644 >> --- a/drivers/usb/host/Kconfig >> +++ b/drivers/usb/host/Kconfig >> @@ -32,7 +32,12 @@ config USB_XHCI_PCI >> default y >> >> config USB_XHCI_PLATFORM >> - tristate >> + tristate "xHCI platform driver support" >> + ---help--- >> + Say 'Y' to enable the support for the xHCI host controller >> + as a platform device. Many ARM SoCs provide USB this way. >> + >> + If unsure, say 'Y'. 
>> >> config USB_XHCI_MVEBU >> tristate "xHCI support for Marvell Armada 375/38x" >> diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c >> index 5d03f8b..14b40d2 100644 >> --- a/drivers/usb/host/xhci-plat.c >> +++ b/drivers/usb/host/xhci-plat.c >> @@ -19,6 +19,7 @@ >> #include >> #include >> #include >> +#include >> >> #include "xhci.h" >> #include "xhci-mvebu.h" >> @@ -262,6 +263,15 @@ static const struct of_device_id usb_xhci_of_match[] = { >> MODULE_DEVICE_TABLE(of, usb_xhci_of_match); >> #endif >> >> +#ifdef CONFIG_ACPI > > You shoudn't need this #ifdef, right? You are correct, Greg. I will post a new version that remove this #ifdef CONFIG_ACPI shortly > > thanks, > > greg k-h -- Regards, Duc Dang. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf, tools, report: Add support for srcfile sort key
On Fri, Aug 07, 2015 at 09:02:15PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Aug 07, 2015 at 08:51:45PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Aug 07, 2015 at 03:54:24PM -0700, Andi Kleen escreveu:
> > > From: Andi Kleen
> > >
> > > In some cases it's useful to characterize samples by file. This is useful
> > > to get a higher level categorization, for example to map cost to
> > > subsystems.
> > >
> > > Add a srcfile sort key to perf report. It builds on top of the existing
> > > srcline support.
> >
> > Applied
>
> Humm, holding this up a bit, further testing showed some oddities:
> on fedora21, the width of the column is being limited to the length
> of the header

Yes, I've seen that; I just use -w normally. It also happens with --sort srcline. The column sizing code could probably be somewhat smarter and always allow the last column to become as wide as needed. But that's something that should be done separately; I don't think it belongs in this patch.

> and there are some DWARF errors, have you noticed those?

No, I didn't. It's likely a generic issue, probably happening with srcline and perhaps even objdump -S too. Find out with strace what file it is.

-Andi
Re: [PATCH v3] cpufreq-dt: make scaling_boost_freqs sysfs attr available when boost is enabled
On 07-08-15, 13:59, Bartlomiej Zolnierkiewicz wrote:
> Make the scaling_boost_freqs sysfs attribute available when the
> cpufreq-dt driver is used and boost support is enabled.
>
> Cc: Thomas Abraham
> Cc: Javier Martinez Canillas
> Cc: Krzysztof Kozlowski
> Suggested-by: Viresh Kumar
> Acked-by: Viresh Kumar
> Signed-off-by: Bartlomiej Zolnierkiewicz
> ---
> v2: Removed leftover prototype.
> v3: Added missing Acked-by.
>
> Sorry for two resends, this is not my day..
>
> Rafael, please apply for v4.3.

Yeah, please apply only this patch from the series.

-- 
viresh
Re: [PATCH v4 0/6] cpufreq: use generic cpufreq drivers for Exynos4x12 platform
On 08-08-15, 00:24, Rafael J. Wysocki wrote:
> OK, so please let me know which patches you want me to pick up.
>
> Ideally, I'd prefer them to be resent in a separate series with ACKs and all
> with a cover letter clearly stating whose tree they are being targeted at.

He already sent it separately; let me send a confirmatory mail in reply to that.

-- 
viresh
Re: [PATCH v4 1/6] cpufreq: make scaling_boost_freqs sysfs attr available when boost is enabled
On 08-08-15, 00:21, Rafael J. Wysocki wrote:
> > Acked-by: Viresh Kumar
>
> And what exactly am I supposed to do with this?
>
> Have a robot that will pick up all patches ACKed by you magically or what? :)

That's why I asked Bartlomiej specifically to send it separately to the PM list as a cpufreq patch. I wasn't expecting you to apply it right from here.

-- 
viresh
[PATCH 1/2] ARM: VDSO: fix coarse clock monotonicity regression
Since 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the real timekeeper last") it has become possible on ARM to:

- Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp
  via syscall.
- Subsequently obtain a timestamp for the same clock ID via VDSO which
  predates the first timestamp (by one jiffy).

This is because ARM's update_vsyscall is deriving the coarse time using the __current_kernel_time interface, when it should really be using the timekeeper object provided to it by the timekeeping core. It happened to work before only because __current_kernel_time would access the same timekeeper object which had been passed to update_vsyscall. This is no longer the case.

Signed-off-by: Nathan Lynch
---
 arch/arm/kernel/vdso.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/vdso.c b/arch/arm/kernel/vdso.c
index efe17dd9b921..c8b243c1aef8 100644
--- a/arch/arm/kernel/vdso.c
+++ b/arch/arm/kernel/vdso.c
@@ -296,7 +296,6 @@ static bool tk_is_cntvct(const struct timekeeper *tk)
  */
 void update_vsyscall(struct timekeeper *tk)
 {
-	struct timespec xtime_coarse;
	struct timespec64 *wtm = &tk->wall_to_monotonic;

	if (!cntvct_ok) {
@@ -308,10 +307,10 @@ void update_vsyscall(struct timekeeper *tk)

	vdso_write_begin(vdso_data);

-	xtime_coarse = __current_kernel_time();
	vdso_data->tk_is_cntvct		= tk_is_cntvct(tk);
-	vdso_data->xtime_coarse_sec	= xtime_coarse.tv_sec;
-	vdso_data->xtime_coarse_nsec	= xtime_coarse.tv_nsec;
+	vdso_data->xtime_coarse_sec	= tk->xtime_sec;
+	vdso_data->xtime_coarse_nsec	= tk->tkr_mono.xtime_nsec >>
+					  tk->tkr_mono.shift;
	vdso_data->wtm_clock_sec	= wtm->tv_sec;
	vdso_data->wtm_clock_nsec	= wtm->tv_nsec;
-- 
2.1.0
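The key change is computing the coarse nanoseconds from the timekeeper's shifted xtime_nsec field rather than from __current_kernel_time(). A minimal standalone sketch of that derivation (field names mirror the patch; this is a model, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

struct coarse_ts {
	uint64_t sec;
	uint64_t nsec;
};

/*
 * Derive the coarse timestamp the way update_vsyscall() now does:
 * the timekeeper keeps xtime_nsec left-shifted by `shift`, so the
 * coarse nanosecond value is simply the right-shifted field.
 */
static struct coarse_ts coarse_from_tk(uint64_t xtime_sec,
				       uint64_t xtime_nsec,
				       unsigned int shift)
{
	struct coarse_ts ts = {
		.sec  = xtime_sec,
		.nsec = xtime_nsec >> shift,
	};
	return ts;
}
```

Because both values now come from the single timekeeper object passed in, the VDSO can no longer publish a coarse time older than what a concurrent syscall reads from that same timekeeper.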
[PATCH 0/2] fix vdso coarse clock monotonicity regressions
Commit 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the real timekeeper last") made it so the user can observe the coarse clocks going backwards on arm and arm64, if they're really looking for it. Technically these are fixing regressions versus 4.1, but I won't be bothered if they don't make 4.2 final at this late stage, since only the (seldom-used?) coarse clocks are affected. I'd like to collect review/acks for these now and make sure they at least make it into 4.3-rc1 (and -stable after that). Nathan Lynch (2): ARM: VDSO: fix coarse clock monotonicity regression arm64: VDSO: fix coarse clock monotonicity regression arch/arm/kernel/vdso.c | 7 +++ arch/arm64/kernel/vdso.c | 7 +++ 2 files changed, 6 insertions(+), 8 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] arm64: VDSO: fix coarse clock monotonicity regression
Since 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the real timekeeper last") it has become possible on arm64 to: - Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp via syscall. - Subsequently obtain a timestamp for the same clock ID via VDSO which predates the first timestamp (by one jiffy). This is because arm64's update_vsyscall is deriving the coarse time using the __current_kernel_time interface, when it should really be using the timekeeper object provided to it by the timekeeping core. It happened to work before only because __current_kernel_time would access the same timekeeper object which had been passed to update_vsyscall. This is no longer the case. Signed-off-by: Nathan Lynch --- arch/arm64/kernel/vdso.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index ec37ab3f524f..97bc68f4c689 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -199,16 +199,15 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, */ void update_vsyscall(struct timekeeper *tk) { - struct timespec xtime_coarse; u32 use_syscall = strcmp(tk->tkr_mono.clock->name, "arch_sys_counter"); ++vdso_data->tb_seq_count; smp_wmb(); - xtime_coarse = __current_kernel_time(); vdso_data->use_syscall = use_syscall; - vdso_data->xtime_coarse_sec = xtime_coarse.tv_sec; - vdso_data->xtime_coarse_nsec = xtime_coarse.tv_nsec; + vdso_data->xtime_coarse_sec = tk->xtime_sec; + vdso_data->xtime_coarse_nsec = tk->tkr_mono.xtime_nsec >> + tk->tkr_mono.shift; vdso_data->wtm_clock_sec = tk->wall_to_monotonic.tv_sec; vdso_data->wtm_clock_nsec = tk->wall_to_monotonic.tv_nsec; -- 2.1.0
[PATCH RT 2/3] xfs: Disable percpu SB on PREEMPT_RT_FULL
3.2.69-rt102-rc1 stable review patch. If anyone has any objections, please let me know. -- From: Steven Rostedt Running a test on a large CPU count box with xfs, I hit a live lock with the following backtraces on several CPUs: Call Trace: [] __const_udelay+0x28/0x30 [] xfs_icsb_lock_cntr+0x2a/0x40 [xfs] [] xfs_icsb_modify_counters+0x71/0x280 [xfs] [] xfs_trans_reserve+0x171/0x210 [xfs] [] xfs_create+0x24d/0x6f0 [xfs] [] ? avc_has_perm_flags+0xfb/0x1e0 [] xfs_vn_mknod+0xbb/0x1e0 [xfs] [] xfs_vn_create+0x13/0x20 [xfs] [] vfs_create+0xcd/0x130 [] do_last+0xb8f/0x1240 [] path_openat+0xc2/0x490 Looking at the code I see it was stuck at: STATIC void xfs_icsb_lock_cntr( xfs_icsb_cnts_t *icsbp) { while (test_and_set_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags)) { ndelay(1000); } } In xfs_icsb_modify_counters() the code is fine. There's a preempt_disable() called when taking this bit spinlock and a preempt_enable() after it is released. The issue is that not all locations are protected by preempt_disable() when PREEMPT_RT is set. Namely the places that grab all CPU cntr locks. STATIC void xfs_icsb_lock_all_counters( xfs_mount_t *mp) { xfs_icsb_cnts_t *cntp; int i; for_each_online_cpu(i) { cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i); xfs_icsb_lock_cntr(cntp); } } STATIC void xfs_icsb_disable_counter() { [...] xfs_icsb_lock_all_counters(mp); [...] xfs_icsb_unlock_all_counters(mp); } STATIC void xfs_icsb_balance_counter_locked() { [...] xfs_icsb_disable_counter(); [...] } STATIC void xfs_icsb_balance_counter( xfs_mount_t *mp, xfs_sb_field_t fields, int min_per_cpu) { spin_lock(&mp->m_sb_lock); xfs_icsb_balance_counter_locked(mp, fields, min_per_cpu); spin_unlock(&mp->m_sb_lock); } Now, when PREEMPT_RT is not enabled, that spin_lock() disables preemption. But for PREEMPT_RT, it does not. 
With my test box I was not able to capture the task state of all tasks, but I'm assuming that some task called xfs_icsb_lock_all_counters(), was preempted by an RT task, and could not finish, causing all callers of that lock to block indefinitely. Dave Chinner has stated that the scalability of that code will probably be negated by PREEMPT_RT, and that it is probably best to just disable the code in question. Also, this code has been rewritten in newer kernels. Link: http://lkml.kernel.org/r/20150504004844.GA21261@dastard Cc: stable...@vger.kernel.org Suggested-by: Dave Chinner Signed-off-by: Steven Rostedt --- fs/xfs/xfs_linux.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index 828662f70d64..13d86a8dae43 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -97,7 +97,7 @@ /* * Feature macros (disable/enable) */ -#ifdef CONFIG_SMP +#if defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT_FULL) #define HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */ #else #undef HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */ -- 2.4.6
[PATCH RT 0/3] Linux 3.2.69-rt102-rc1
Dear RT Folks, This is the RT stable review cycle of patch 3.2.69-rt102-rc1. Please scream at me if I messed something up. Please test the patches too. The -rc release will be uploaded to kernel.org and will be deleted when the final release is out. This is just a review release (or release candidate). The pre-releases will not be pushed to the git repository, only the final release is. If all goes well, this patch will be converted to the next main release on 8/10/2015. Enjoy, -- Steve To build 3.2.69-rt102-rc1 directly, the following patches should be applied: http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.2.tar.xz http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.2.69.xz http://www.kernel.org/pub/linux/kernel/projects/rt/3.2/patch-3.2.69-rt102-rc1.patch.xz You can also build from 3.2.69-rt101 by applying the incremental patch: http://www.kernel.org/pub/linux/kernel/projects/rt/3.2/incr/patch-3.2.69-rt101-rt102-rc1.patch.xz Changes from 3.2.69-rt101: --- Steven Rostedt (1): xfs: Disable percpu SB on PREEMPT_RT_FULL Steven Rostedt (Red Hat) (1): Linux 3.2.69-rt102-rc1 Thomas Gleixner (1): mm/slub: move slab initialization into irq enabled region fs/xfs/xfs_linux.h | 2 +- localversion-rt| 2 +- mm/slub.c | 77 ++ 3 files changed, 39 insertions(+), 42 deletions(-)
[PATCH RT 3/3] Linux 3.2.69-rt102-rc1
3.2.69-rt102-rc1 stable review patch. If anyone has any objections, please let me know. -- From: "Steven Rostedt (Red Hat)" --- localversion-rt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/localversion-rt b/localversion-rt index 9ea5981fa956..04456d92828d 100644 --- a/localversion-rt +++ b/localversion-rt @@ -1 +1 @@ --rt101 +-rt102-rc1 -- 2.4.6
[PATCH RT 1/3] mm/slub: move slab initialization into irq enabled region
3.2.69-rt102-rc1 stable review patch. If anyone has any objections, please let me know. -- From: Thomas Gleixner Initializing a new slab can introduce rather large latencies because most of the initialization runs always with interrupts disabled. There is no point in doing so. The newly allocated slab is not visible yet, so there is no reason to protect it against concurrent alloc/free. Move the expensive parts of the initialization into allocate_slab(), so for all allocations with GFP_WAIT set, interrupts are enabled. Signed-off-by: Thomas Gleixner Acked-by: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Sebastian Andrzej Siewior Cc: Steven Rostedt Cc: Peter Zijlstra Signed-off-by: Andrew Morton Signed-off-by: Steven Rostedt --- mm/slub.c | 77 ++- 1 file changed, 37 insertions(+), 40 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 6a4c2fb83839..513c55ddbc3d 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1258,6 +1258,14 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x) {} #endif /* CONFIG_SLUB_DEBUG */ +static void setup_object(struct kmem_cache *s, struct page *page, + void *object) +{ + setup_object_debug(s, page, object); + if (unlikely(s->ctor)) + s->ctor(object); +} + /* * Slab allocation and freeing */ @@ -1279,6 +1287,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) struct page *page; struct kmem_cache_order_objects oo = s->oo; gfp_t alloc_gfp; + void *start, *last, *p; + int idx, order; flags &= gfp_allowed_mask; @@ -1301,17 +1311,11 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) * Try a lower order alloc if possible */ page = alloc_slab_page(flags, node, oo); - - if (page) - stat(s, ORDER_FALLBACK); + if (unlikely(!page)) + goto out; + stat(s, ORDER_FALLBACK); } - if (flags & __GFP_WAIT) - local_irq_disable(); - - if (!page) - return NULL; - if (kmemcheck_enabled && !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS))) { int pages = 1 << 
oo_order(oo); @@ -1329,37 +1333,6 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) } page->objects = oo_objects(oo); - mod_zone_page_state(page_zone(page), - (s->flags & SLAB_RECLAIM_ACCOUNT) ? - NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE, - 1 << oo_order(oo)); - - return page; -} - -static void setup_object(struct kmem_cache *s, struct page *page, - void *object) -{ - setup_object_debug(s, page, object); - if (unlikely(s->ctor)) - s->ctor(object); -} - -static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) -{ - struct page *page; - void *start; - void *last; - void *p; - - BUG_ON(flags & GFP_SLAB_BUG_MASK); - - page = allocate_slab(s, - flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node); - if (!page) - goto out; - - inc_slabs_node(s, page_to_nid(page), page->objects); page->slab = s; page->flags |= 1 << PG_slab; @@ -1380,10 +1353,34 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) page->freelist = start; page->inuse = page->objects; page->frozen = 1; + out: + if (flags & __GFP_WAIT) + local_irq_disable(); + if (!page) + return NULL; + + mod_zone_page_state(page_zone(page), - (s->flags & SLAB_RECLAIM_ACCOUNT) ? + (s->flags & SLAB_RECLAIM_ACCOUNT) ? + NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE, + 1 << oo_order(oo)); + + inc_slabs_node(s, page_to_nid(page), page->objects); + return page; } +static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) +{ + if (unlikely(flags & GFP_SLAB_BUG_MASK)) { + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); + BUG(); + } + + return allocate_slab(s, + flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node); +} + static void __free_slab(struct kmem_cache *s, struct page *page) { int order = compound_order(page); -- 2.4.6
Re: [PATCH] dma: omap-dma: add support for pause of non-cyclic transfers
On 08/07/2015 02:32 PM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 02:21:59PM -0400, Peter Hurley wrote: >> [ + Heikki ] >> >> On 08/07/2015 12:33 PM, Russell King - ARM Linux wrote: >>> What you have is a race condition in the code you are responsible for >>> maintaining, caused by poorly implemented code. Fix it, rather than >>> whinging about drivers outside of your subsystem having never implemented >>> _optional_ things that you choose to merge broken code which relied upon >>> it _without_ checking that the operation succeeded. >>> >>> It is _entirely_ your code which is wrong here. >>> >>> I will wait for that to be fixed before acking the omap-dma change since >>> you obviously need something to test with. >> >> I'm not sure to what you're referring here. >> >> A WARNing fixes nothing. > > The warning can wait. > >> If you mean some patch, as yet unwritten, that handles the dma cases when >> dmaengine_pause() is unimplemented without data loss, ok, but please confirm >> that's what you mean. > > But the regression needs fixing. I too would prefer the bug to be fixed. But calling it a regression is incorrect. There is no previous SHA in which this problem didn't exist, except before either 8250_dma or 8250_omap was added. From the outset, both the 8250 dma code and the 8250_omap driver (mistakenly) relied on dmaengine_pause. >> However, at some point one must look at the api and wonder if the separation >> of concern has been drawn in the right place. > > It _is_ in the right place. dmaengine_pause() always has been permitted > to fail. It's the responsibility of the user of this API to _check_ the > return code to find out whether it had the desired effect. Not checking > the return code is a bug in the caller's code. > > If that wasn't the case, dmaengine_pause() would have a void return type. > It doesn't. It has an 'int' to allow failure A resource error is significantly different than ENOSYS or EINVAL. 
> or to allow non- > implementation for cases where the underlying hardware can't pause the > channel without causing data loss. That's your assertion; I've seen no documentation to back that up (other than the de facto commit). And quite frankly, that's absurd. 1. No other driver implements _only some_ use-cases of dmaengine_pause(). 2. The number of users expecting dmaengine_pause to be implemented for non-cyclic dma transfers _dwarfs_ cyclic users. 3. There's a dedicated query interface, dma_get_slave_caps(), for which omap-dma returns /true/ -- not /maybe/ -- to indicate dmaengine_pause() is implemented. As a consumer of the api, I'd much rather opt-out at device initialization time knowing that a required feature is unimplemented, than discover it at i/o time when it's too late. > What would you think is better: an API which silently loses data, or > one which refuses to stop the transfer and reports an error code back > to the caller. An api which provides a means of determining if necessary functionality is implemented _during setup_. That way the consumer of the api can determine if the feature is supportable. For example, dma_get_slave_caps() could differentiate * pause for cyclic support * pause for non-cyclic support * pause and resume support * pause and terminate support > You seem to be arguing for the former, and as such, there's no way I > can take you seriously. Leaping to conclusions. > In any case, Greg has now commented on the patch adding the feature, > basically refusing it for stable tree inclusion. So the matter is > settled: omap-dma isn't going to get the pause feature added in stable > trees any time soon. So a different solution now needs to be found, > which is what I've been saying all along... 
While Sebastian's initial patch is a good first-cut at addressing 8250_omap's use of omap-dma, none of the patches address the general design problem I have outlined above; namely, that simply returning an error at use time for an unimplemented slave transaction is fundamentally flawed. Regards, Peter Hurley
Re: [PATCH] fs: create and use seq_show_option for escaping
On Fri, Aug 07, 2015 at 04:41:50PM -0700, Kees Cook wrote: > Many file systems that implement the show_options hook fail to correctly > escape their output which could lead to unescaped characters (e.g. new > lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This > could lead to confusion, spoofed entries (resulting in things like > systemd issuing false d-bus "mount" notifications), and who knows > what else. This looks like it would only be the root user stepping on > themselves, but it's possible weird things could happen in containers > or in other situations with delegated mount privileges. > > Here's an example using overlay with setuid fusermount trusting the > contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of > "sudo" is something more sneaky: > > $ BASE="ovl" > $ MNT="$BASE/mnt" > $ LOW="$BASE/lower" > $ UP="$BASE/upper" > $ WORK="$BASE/work/ 0 0 > none /proc fuse.pwn user_id=1000" > $ mkdir -p "$LOW" "$UP" "$WORK" > $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none > /mnt > $ cat /proc/mounts > none /root/ovl/mnt overlay > rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 > none /proc fuse.pwn user_id=1000 0 0 > $ fusermount -u /proc > $ cat /proc/mounts > cat: /proc/mounts: No such file or directory > > This fixes the problem by adding new seq_show_option and seq_show_option_n > helpers, and updating the vulnerable show_option handlers to use them as > needed. Some, like SELinux, need to be open coded due to unusual existing > escape mechanisms. 
> > Signed-off-by: Kees Cook > Cc: sta...@vger.kernel.org Acked-by: Serge Hallyn > --- > fs/ceph/super.c | 2 +- > fs/cifs/cifsfs.c | 6 +++--- > fs/ext3/super.c | 4 ++-- > fs/ext4/super.c | 4 ++-- > fs/gfs2/super.c | 6 +++--- > fs/hfs/super.c | 4 ++-- > fs/hfsplus/options.c | 4 ++-- > fs/hostfs/hostfs_kern.c | 2 +- > fs/ocfs2/super.c | 4 ++-- > fs/overlayfs/super.c | 6 +++--- > fs/reiserfs/super.c | 8 +--- > fs/xfs/xfs_super.c | 4 ++-- > include/linux/seq_file.h | 34 ++ > kernel/cgroup.c | 7 --- > net/ceph/ceph_common.c | 7 +-- > security/selinux/hooks.c | 2 +- > 16 files changed, 72 insertions(+), 32 deletions(-) > > diff --git a/fs/ceph/super.c b/fs/ceph/super.c > index d1c833c321b9..7b6bfcbf801c 100644 > --- a/fs/ceph/super.c > +++ b/fs/ceph/super.c > @@ -479,7 +479,7 @@ static int ceph_show_options(struct seq_file *m, struct > dentry *root) > if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT) > seq_printf(m, ",readdir_max_bytes=%d", > fsopt->max_readdir_bytes); > if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT)) > - seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name); > + seq_show_option(m, "snapdirname", fsopt->snapdir_name); > > return 0; > } > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c > index 0a9fb6b53126..6a1119e87fbb 100644 > --- a/fs/cifs/cifsfs.c > +++ b/fs/cifs/cifsfs.c > @@ -394,17 +394,17 @@ cifs_show_options(struct seq_file *s, struct dentry > *root) > struct sockaddr *srcaddr; > srcaddr = (struct sockaddr *)&tcon->ses->server->srcaddr; > > - seq_printf(s, ",vers=%s", tcon->ses->server->vals->version_string); > + seq_show_option(s, "vers", tcon->ses->server->vals->version_string); > cifs_show_security(s, tcon->ses); > cifs_show_cache_flavor(s, cifs_sb); > > if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER) > seq_puts(s, ",multiuser"); > else if (tcon->ses->user_name) > - seq_printf(s, ",username=%s", tcon->ses->user_name); > + seq_show_option(s, "username", tcon->ses->user_name); > > if (tcon->ses->domainName) > - 
seq_printf(s, ",domain=%s", tcon->ses->domainName); > + seq_show_option(s, "domain", tcon->ses->domainName); > > if (srcaddr->sa_family != AF_UNSPEC) { > struct sockaddr_in *saddr4; > diff --git a/fs/ext3/super.c b/fs/ext3/super.c > index 5ed0044fbb37..e9312494f3ee 100644 > --- a/fs/ext3/super.c > +++ b/fs/ext3/super.c > @@ -578,10 +578,10 @@ static inline void ext3_show_quota_options(struct > seq_file *seq, struct super_bl > } > > if (sbi->s_qf_names[USRQUOTA]) > - seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]); > + seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]); > > if (sbi->s_qf_names[GRPQUOTA]) > - seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]); > + seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]); > > if (test_opt(sb, USRQUOTA)) > seq_puts(seq, ",usrquota"); > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 58987b5c514b..9981064c4a54 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c >
Re: [PATCH v4 2/2] usb: Add support for ACPI identification to xhci-platform
On Fri, Aug 07, 2015 at 06:03:36PM -0700, Duc Dang wrote: > Provide the methods to let ACPI identify the need to use > xhci-platform. Change the Kconfig files so the > xhci-plat.o file is selectable during kernel config. > > This has been tested on an ARM64 machine with platform XHCI, an > x86_64 machine with XHCI, and an x86_64 machine without XHCI. > There were no regressions or error messages on the machines > without platform XHCI. > > [dhdang: regenerate the patch over 4.2-rc5] > Signed-off-by: Mark Langsdorf > Signed-off-by: Duc Dang > > --- > Changes from v3: > Regenerate the patch over 4.2-rc5 > No code change > > Changes from v2 > Replaced tristate with a boolean as the driver doesn't > compile as a module > Correct --help-- to ---help--- > > Changes from v1 > Renamed from "add support for APM X-Gene to xhci-platform" > Removed changes to arm64/Kconfig > Made CONFIG_USB_XHCI_PLATFORM a user selectable config option > > drivers/usb/host/Kconfig | 7 ++- > drivers/usb/host/xhci-plat.c | 11 +++ > 2 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig > index 8afc3c1..96231ee 100644 > --- a/drivers/usb/host/Kconfig > +++ b/drivers/usb/host/Kconfig > @@ -32,7 +32,12 @@ config USB_XHCI_PCI > default y > > config USB_XHCI_PLATFORM > - tristate > + tristate "xHCI platform driver support" > + ---help--- > + Say 'Y' to enable the support for the xHCI host controller > + as a platform device. Many ARM SoCs provide USB this way. > + > + If unsure, say 'Y'. 
> > config USB_XHCI_MVEBU > tristate "xHCI support for Marvell Armada 375/38x" > diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c > index 5d03f8b..14b40d2 100644 > --- a/drivers/usb/host/xhci-plat.c > +++ b/drivers/usb/host/xhci-plat.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > #include "xhci.h" > #include "xhci-mvebu.h" > @@ -262,6 +263,15 @@ static const struct of_device_id usb_xhci_of_match[] = { > MODULE_DEVICE_TABLE(of, usb_xhci_of_match); > #endif > > +#ifdef CONFIG_ACPI You shouldn't need this #ifdef, right? thanks, greg k-h
Re: [PATCH 6/6] scatterlist: use sg_phys()
On Fri, Aug 7, 2015 at 9:15 AM, Christoph Hellwig wrote: > From: Dan Williams > > Coccinelle cleanup to replace open coded sg to physical address > translations. This is in preparation for introducing scatterlists that > reference __pfn_t. > > // sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg) > // usage: make coccicheck COCCI=sg_phys.cocci MODE=patch > > virtual patch > virtual report > virtual org > > @@ > struct scatterlist *sg; > @@ > > - page_to_phys(sg_page(sg)) + sg->offset > + sg_phys(sg) > > @@ > struct scatterlist *sg; > @@ > > - page_to_phys(sg_page(sg)) > + sg_phys(sg) - sg->offset So this ends up being an awkward conversion, a better one is: - page_to_phys(sg_page(sg)) + sg_phys(sg) & PAGE_MASK [..] > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 5852df8..e0faecc 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -2098,7 +2098,7 @@ static int __domain_mapping(struct dmar_domain *domain, > unsigned long iov_pfn, > sg_res = aligned_nrpages(sg->offset, sg->length); > sg->dma_address = ((dma_addr_t)iov_pfn << > VTD_PAGE_SHIFT) + sg->offset; > sg->dma_length = sg->length; > - pteval = page_to_phys(sg_page(sg)) | prot; > + pteval = (sg_phys(sg) - sg->offset) | prot; For example here, as Joerg pointed out. [1]: https://lkml.org/lkml/2015/6/10/230
[PATCH 6/9] x86, perf: Add Top Down events to Intel Core
From: Andi Kleen Add declarations for the events needed for TopDown to the Intel big core CPUs starting with Sandy Bridge. We need to report different values if HyperThreading is on or off. The only thing this patch does is to export some events in sysfs. TopDown level 1 uses a set of abstracted metrics which are generic to out of order CPU cores (although some CPUs may not implement all of them): topdown-total-slots Available slots in the pipeline topdown-slots-issued Slots issued into the pipeline topdown-slots-retired Slots successfully retired topdown-fetch-bubbles Pipeline gaps in the frontend topdown-recovery-bubbles Pipeline gaps during recovery from misspeculation These events then allow computing four useful metrics: FrontendBound, BackendBound, Retiring, BadSpeculation. The formulas to compute the metrics are generic; they only change based on the availability of the abstracted input values. The kernel declares the events supported by the current CPU and perf stat then computes the formulas based on the available metrics. Some events need a multiplier. To handle this I redefined ".scale" slightly to let a negative value mean "multiply by". For HyperThreading the any bit is needed to get accurate values when both threads are executing. This implies that the events can only be collected as root or with perf_event_paranoid=-1 for now. Hyper Threading also requires averaging events from both threads together (the CPU cannot measure them independently). In perf stat this is done by using per-core mode, and then forcing a divisor of two to get the average. The new .agg-per-core attribute is added to the events, which then forces perf stat to enable --per-core. When hyperthreading is disabled the attribute has the value 0. The basic scheme is based on the following paper: Ahmad Yasin, "A Top-Down Method for Performance Analysis and Counters Architecture", ISPASS 2014 (PDF available via Google), with some extensions to handle HyperThreading. 
Signed-off-by: Andi Kleen --- arch/x86/kernel/cpu/perf_event_intel.c | 82 ++ 1 file changed, 82 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index a478e3c..65b58cb 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -217,9 +217,70 @@ struct attribute *nhm_events_attrs[] = { NULL, }; +/* + * TopDown events for Core. + * + * With Hyper Threading on, TopDown metrics are averaged between the + * threads of a core: (count_core0 + count_core1) / 2. The 2 is expressed + * as a scale parameter. We also tell perf to aggregate per core + * by setting the .agg-per-core attribute for the alias to 1. + * + * Some events need to be multiplied by the pipeline width (4), which + * is expressed as a negative scale. In HT we cancel the factor 4 + * with the 2 dividend for the core average, so we use -2. + */ + +EVENT_ATTR_STR_HT(topdown-total-slots, td_total_slots, + "event=0x3c,umask=0x0", /* cpu_clk_unhalted.thread */ + "event=0x3c,umask=0x0,any=1"); /* cpu_clk_unhalted.thread_any */ +EVENT_ATTR_STR_HT(topdown-total-slots.scale, td_total_slots_scale, + "-4", "-2"); +EVENT_ATTR_STR_HT(topdown-total-slots.agg-per-core, td_total_slots_pc, + "0", "1"); +EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued, + "event=0xe,umask=0x1"); /* uops_issued.any */ +EVENT_ATTR_STR_HT(topdown-slots-issued.agg-per-core, td_slots_issued_pc, + "0", "1"); +EVENT_ATTR_STR_HT(topdown-slots-issued.scale, td_slots_issued_scale, + "0", "2"); +EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired, + "event=0xc2,umask=0x2");/* uops_retired.retire_slots */ +EVENT_ATTR_STR_HT(topdown-slots-retired.agg-per-core, td_slots_retired_pc, + "0", "1"); +EVENT_ATTR_STR_HT(topdown-slots-retired.scale, td_slots_retired_scale, + "0", "2"); +EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles, + "event=0x9c,umask=0x1");/* idq_uops_not_delivered_core */ +EVENT_ATTR_STR_HT(topdown-fetch-bubbles.agg-per-core, 
td_fetch_bubbles_pc, + "0", "1"); +EVENT_ATTR_STR_HT(topdown-fetch-bubbles.scale, td_fetch_bubbles_scale, + "0", "2"); +EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles, + "event=0xd,umask=0x3,cmask=1", /* int_misc.recovery_cycles */ + "event=0xd,umask=0x3,cmask=1,any=1"); /* int_misc.recovery_cycles_any */ +EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale, + "-4", "-2"); +EVENT_ATTR_STR_HT(topdown-recovery-bubbles.agg-per-core, td_recovery_bubbles_pc, + "0", "1"); + struct attribute *snb_events_attrs[] = { EVENT_PTR(mem_ld_snb), EVENT_PTR(mem_st_snb), + EVENT_PTR(td_slots_issued), + EVENT_PTR(td_slots_issued_scale), + EVENT_PTR(td_slots_issued_pc), +
[PATCH 2/9] perf, tools, stat: Support up-scaling of events
From: Andi Kleen TopDown needs to multiply events by constants (for example the CPU Pipeline Width) to get the correct results. The kernel needs to export this factor. Today *.scale is only used to scale down metrics (divide), for example to scale bytes to MB. Repurpose negative scale to mean scaling up, that is multiplying. Implement the code for this in perf stat. Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 27 +++ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index ea5298a..2590c75 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -179,6 +179,17 @@ static inline int nsec_counter(struct perf_evsel *evsel) return 0; } +static double scale_val(struct perf_evsel *counter, u64 val) +{ + double uval = val; + + if (counter->scale < 0) + uval = val * (-counter->scale); + else if (counter->scale) + uval = val / counter->scale; + return uval; +} + /* * Read out the results of a single counter: * do not aggregate counts across CPUs in system-wide mode @@ -630,12 +641,12 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg) const char *fmt; if (csv_output) { - fmt = sc != 1.0 ? "%.2f%s" : "%.0f%s"; + fmt = (sc != 1.0 && sc > 0) ? "%.2f%s" : "%.0f%s"; } else { if (big_num) - fmt = sc != 1.0 ? "%'18.2f%s" : "%'18.0f%s"; + fmt = (sc != 1.0 && sc > 0) ? "%'18.2f%s" : "%'18.0f%s"; else - fmt = sc != 1.0 ? "%18.2f%s" : "%18.0f%s"; + fmt = (sc != 1.0 && sc > 0) ? 
"%18.2f%s" : "%18.0f%s"; } aggr_printout(evsel, id, nr); @@ -750,7 +761,7 @@ static void aggr_update_shadow(void) continue; val += perf_counts(counter->counts, cpu, 0)->val; } - val = val * counter->scale; + val = scale_val(counter, val); perf_stat__update_shadow_stats(counter, &val, first_shadow_cpu(counter, id)); } @@ -788,7 +799,7 @@ static void print_aggr(char *prefix) if (prefix) fprintf(output, "%s", prefix); - uval = val * counter->scale; + uval = scale_val(counter, val); printout(id, nr, counter, uval, prefix, run, ena, 1.0); fputc('\n', output); } @@ -815,7 +826,7 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix) if (prefix) fprintf(output, "%s", prefix); - uval = val * counter->scale; + uval = scale_val(counter, val); printout(thread, 0, counter, uval, prefix, run, ena, 1.0); fputc('\n', output); } @@ -860,7 +871,7 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix) return; } - uval = avg * counter->scale; + uval = scale_val(counter, avg); printout(-1, 0, counter, uval, prefix, avg_running, avg_enabled, avg); fprintf(output, "\n"); } @@ -884,7 +895,7 @@ static void print_counter(struct perf_evsel *counter, char *prefix) if (prefix) fprintf(output, "%s", prefix); - uval = val * counter->scale; + uval = scale_val(counter, val); printout(cpu, 0, counter, uval, prefix, run, ena, 1.0); fputc('\n', output); -- 2.4.3
Re: [PATCH 2/2] PM / AVS: rockchip-io: add io selectors and supplies for rk3368
On Friday, August 07, 2015 04:04:53 PM Kevin Hilman wrote: > Heiko Stübner writes: > > > This adds the necessary data for handling io voltage domains on the rk3368. > > As interesting tidbit, the rk3368 contains two separate iodomain areas. > > One in the regular General Register Files (GRF) and one in PMUGRF in the > > pmu power domain. > > > > Signed-off-by: Heiko Stuebner > > Acked-by: Kevin Hilman > > Rafael, feel free to take this series directly, or let me know if you > want a pull request. I don't have any other AVS stuff queued up > for v4.3. I'll take it directly, thanks! Rafael
[PATCH 5/9] x86, perf: Support sysfs files depending on SMT status
From: Andi Kleen Add a way to show different sysfs event attributes depending on whether Hyper-Threading is on or off. This is difficult to determine early at boot, so we just do it dynamically when the sysfs attribute is read. Signed-off-by: Andi Kleen --- arch/x86/kernel/cpu/perf_event.c | 34 ++ arch/x86/kernel/cpu/perf_event.h | 10 ++ include/linux/perf_event.h | 7 +++ 3 files changed, 51 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 8bac4bb..a1313ed 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -1590,6 +1590,40 @@ ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr, return x86_pmu.events_sysfs_show(page, config); } +ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr, + char *page) +{ + struct perf_pmu_events_ht_attr *pmu_attr = + container_of(attr, struct perf_pmu_events_ht_attr, attr); + bool ht_on = false; + int cpu; + + /* + * Report conditional events depending on Hyper-Threading. + * + * Check all online CPUs if any have a thread sibling, + * as perf may measure any of them. + * + * This is overly conservative as usually the HT special + * handling is not needed if the other CPU thread is idle. + * + * Note this does not (cannot) handle the case when thread + * siblings are invisible, for example with virtualization + * if they are owned by some other guest. The user tool + * has to re-read when a thread sibling gets onlined later. + */ + for_each_online_cpu (cpu) { + ht_on = cpumask_weight(topology_sibling_cpumask(cpu)) > 1; + if (ht_on) + break; + } + + return sprintf(page, "%s", + ht_on ?
+ pmu_attr->event_str_ht : + pmu_attr->event_str_noht); +} + EVENT_ATTR(cpu-cycles, CPU_CYCLES ); EVENT_ATTR(instructions, INSTRUCTIONS); EVENT_ATTR(cache-references, CACHE_REFERENCES); diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h index 5edf6d8..3df86d9 100644 --- a/arch/x86/kernel/cpu/perf_event.h +++ b/arch/x86/kernel/cpu/perf_event.h @@ -664,6 +664,14 @@ static struct perf_pmu_events_attr event_attr_##v = { \ .event_str = str, \ }; +#define EVENT_ATTR_STR_HT(_name, v, noht, ht) \ +static struct perf_pmu_events_ht_attr event_attr_##v = { \ + .attr = __ATTR(_name, 0444, events_ht_sysfs_show, NULL),\ + .id = 0,\ + .event_str_noht = noht, \ + .event_str_ht = ht, \ +}; + extern struct x86_pmu x86_pmu __read_mostly; static inline bool x86_pmu_has_lbr_callstack(void) @@ -923,6 +931,8 @@ int knc_pmu_init(void); ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr, char *page); +ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr, + char *page); static inline int is_ht_workaround_enabled(void) { diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 2027809..5e9ee24 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1068,6 +1068,13 @@ struct perf_pmu_events_attr { const char *event_str; }; +struct perf_pmu_events_ht_attr { + struct device_attribute attr; + u64 id; + const char *event_str_ht; + const char *event_str_noht; +}; + ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr, char *page); -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 9/9] perf, tools, stat: Force --per-core mode for .agg-per-core aliases
From: Andi Kleen When an event alias is used that the kernel marked as .agg-per-core, force --per-core mode (and also require -a and forbid cgroups or per-thread mode). This in turn means --topdown forces --per-core mode. This is needed for TopDown in SMT mode, because it needs to measure all threads in a core together and merge the values to compute the correct percentages of how the pipeline is limited. We do this if any alias is agg-per-core. Add the code to parse the .agg-per-core attributes and propagate the information to the evsel. Then the main stat code does the necessary checks and forces per-core mode. Open issue: in combination with -C ... we get wrong values. I think that's an existing bug that needs to be debugged/fixed separately. Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 18 ++ tools/perf/util/evsel.h | 1 + tools/perf/util/parse-events.c | 1 + tools/perf/util/pmu.c | 23 +++ tools/perf/util/pmu.h | 2 ++ 5 files changed, 45 insertions(+) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index eec6c16..0df0aff 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -1382,6 +1382,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused) bool append_file = false; int output_fd = 0; const char *output_name = NULL; + struct perf_evsel *counter; const struct option options[] = { OPT_BOOLEAN('T', "transaction", &transaction_run, "hardware transaction statistics"), @@ -1563,6 +1564,23 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused) if (add_default_attributes()) goto out; + evlist__for_each (evsel_list, counter) { + /* Enable per core mode if only a single event requires it.
*/ + if (counter->agg_per_core) { + if (stat_config.aggr_mode != AGGR_GLOBAL && + stat_config.aggr_mode != AGGR_CORE) { + pr_err("per core event configuration requires per core mode\n"); + goto out; + } + stat_config.aggr_mode = AGGR_CORE; + if (nr_cgroups || !target__has_cpu(&target)) { + pr_err("per core event configuration requires system-wide mode (-a)\n"); + goto out; + } + break; + } + } + target__validate(&target); if (perf_evlist__create_maps(evsel_list, &target) < 0) { diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index 6a12908..85f02b8 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -100,6 +100,7 @@ struct perf_evsel { bool system_wide; bool tracking; bool per_pkg; + bool agg_per_core; /* parse modifier helper */ int exclude_GH; int nr_members; diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 828936d..d2a5938 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -759,6 +759,7 @@ int parse_events_add_pmu(struct parse_events_evlist *data, evsel->unit = info.unit; evsel->scale = info.scale; evsel->per_pkg = info.per_pkg; + evsel->agg_per_core = info.agg_per_core; evsel->snapshot = info.snapshot; } diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index ce56354..abedb6a 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -189,6 +189,23 @@ perf_pmu__parse_per_pkg(struct perf_pmu_alias *alias, char *dir, char *name) return 0; } +static void +perf_pmu__parse_agg_per_core(struct perf_pmu_alias *alias, char *dir, char *name) +{ + char path[PATH_MAX]; + FILE *f; + int flag; + + snprintf(path, PATH_MAX, "%s/%s.agg-per-core", dir, name); + + f = fopen(path, "r"); + if (f && fscanf(f, "%d", &flag) == 1) { + alias->agg_per_core = flag != 0; + fclose(f); + } +} + + static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias, char *dir, char *name) { @@ -237,6 +254,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
perf_pmu__parse_scale(alias, dir, name); perf_pmu__parse_per_pkg(alias, dir, name); perf_pmu__parse_snapshot(alias, dir, name); + perf_pmu__parse_agg_per_core(alias, dir, name); } list_add_tail(&alias->list, list); @@ -271,6 +289,8 @@ static inline bool pmu_alias_info_file(char *name) return true; if (len > 9
[PATCH 7/9] x86, perf: Add Top Down events to Intel Atom
From: Andi Kleen Add topdown event declarations to Silvermont / Airmont. These cores do not support the full Top Down metrics, but a useful subset (FrontendBound, Retiring, Backend Bound/Bad Speculation). The perf stat tool automatically handles the missing events and combines the available metrics. Signed-off-by: Andi Kleen --- arch/x86/kernel/cpu/perf_event_intel.c | 24 1 file changed, 24 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 65b58cb..1f08603 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -1380,6 +1380,29 @@ static __initconst const u64 atom_hw_cache_event_ids }, }; +EVENT_ATTR_STR(topdown-total-slots, td_total_slots_slm, "event=0x3c"); +EVENT_ATTR_STR(topdown-total-slots.scale, td_total_slots_scale_slm, "-2"); +/* no_alloc_cycles.not_delivered */ +EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles_slm, + "event=0xca,umask=0x50"); +EVENT_ATTR_STR(topdown-fetch-bubbles.scale, td_fetch_bubbles_scale_slm, "-2"); +/* uops_retired.all */ +EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued_slm, + "event=0xc2,umask=0x10"); +/* uops_retired.all */ +EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired_slm, + "event=0xc2,umask=0x10"); + +struct attribute *slm_events_attrs[] = { + EVENT_PTR(td_total_slots_slm), + EVENT_PTR(td_total_slots_scale_slm), + EVENT_PTR(td_fetch_bubbles_slm), + EVENT_PTR(td_fetch_bubbles_scale_slm), + EVENT_PTR(td_slots_issued_slm), + EVENT_PTR(td_slots_retired_slm), + NULL +}; + static struct extra_reg intel_slm_extra_regs[] __read_mostly = { /* must define OFFCORE_RSP_X first, see intel_fixup_er() */ @@ -3401,6 +3424,7 @@ __init int intel_pmu_init(void) x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints; x86_pmu.extra_regs = intel_slm_extra_regs; x86_pmu.flags |= PMU_FL_HAS_RSP_1; + x86_pmu.cpu_events = slm_events_attrs; pr_cont("Silvermont events, "); break; -- 2.4.3
[PATCH 1/9] perf, tools: Dont stop PMU parsing on alias parse error
From: Andi Kleen When an error happens during alias parsing currently the complete parsing of all attributes of the PMU is stopped. This breaks old perf on a newer kernel that may have not-yet-known alias attributes (such as .scale or .per-pkg). Continue when some attribute is unparseable. This is IMHO a stable candidate and should be backported to older versions to avoid problems with newer kernels. Signed-off-by: Andi Kleen --- tools/perf/util/pmu.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index d4b0e64..ce56354 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -283,13 +283,12 @@ static int pmu_aliases_parse(char *dir, struct list_head *head) { struct dirent *evt_ent; DIR *event_dir; - int ret = 0; event_dir = opendir(dir); if (!event_dir) return -EINVAL; - while (!ret && (evt_ent = readdir(event_dir))) { + while ((evt_ent = readdir(event_dir))) { char path[PATH_MAX]; char *name = evt_ent->d_name; FILE *file; @@ -305,17 +304,16 @@ static int pmu_aliases_parse(char *dir, struct list_head *head) snprintf(path, PATH_MAX, "%s/%s", dir, name); - ret = -EINVAL; file = fopen(path, "r"); if (!file) - break; + continue; - ret = perf_pmu__new_alias(head, dir, name, file); + perf_pmu__new_alias(head, dir, name, file); fclose(file); } closedir(event_dir); - return ret; + return 0; } /* -- 2.4.3
[PATCH 8/9] perf, tools, stat: Add extra output of counter values with -v
From: Andi Kleen Add debug output of raw counter values per CPU when perf stat -v is specified, together with their cpu numbers. This is very useful to debug problems with per core counters, where we can normally only see aggregated values. Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index a83f26f..eec6c16 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -223,6 +223,13 @@ static int read_counter(struct perf_evsel *counter) count = perf_counts(counter->counts, cpu, thread); if (perf_evsel__read(counter, cpu, thread, count)) return -1; + if (verbose) { + fprintf(stat_config.output, + "%s: %d: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n", + perf_evsel__name(counter), + cpu, + count->val, count->ena, count->run); + } } } -- 2.4.3
Add top down metrics to perf stat
This patchkit adds support for TopDown to perf stat. It applies on top of my earlier metrics patchkit, posted separately. TopDown is intended to replace the frontend cycles idle / backend cycles idle metrics in standard perf stat output. These metrics are not reliable in many workloads, due to out of order effects. This implements a new --topdown mode in perf stat (similar to --transaction) that measures the pipeline bottlenecks using standardized formulas. The measurement can all be done with 5 counters (one of them a fixed counter). The result is four metrics: FrontendBound, BackendBound, BadSpeculation, Retiring, that describe the CPU pipeline behavior on a high level. FrontendBound and BackendBound show which half of the pipeline limits the workload, while BadSpeculation is a higher-level measure of work wasted on mispredicted paths. The full top down methodology has many hierarchical metrics. This implementation only supports level 1, which can be collected without multiplexing. A full implementation of top down on top of perf is available in pmu-tools toplev (http://github.com/andikleen/pmu-tools). The current version works on Intel Core CPUs starting with Sandy Bridge, and Atom CPUs starting with Silvermont. In principle the generic metrics should also be implementable on other out of order CPUs. TopDown level 1 uses a set of abstracted metrics which are generic to out of order CPU cores (although some CPUs may not implement all of them):

topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery from misspeculation

These events then make it possible to compute four useful metrics: FrontendBound, BackendBound, Retiring, BadSpeculation. The formulas to compute the metrics are generic; they only change based on the availability of the abstracted input values. The kernel declares the events supported by the current CPU and perf stat then computes the formulas based on the available metrics.
Example output:

$ ./perf stat --topdown -a ./BC1s

 Performance counter stats for 'system wide':

 S0-C0  2        19650790       topdown-total-slots                          (100.00%)
 S0-C0  2         4445680.00    topdown-fetch-bubbles    # 22.62% frontend bound  (100.00%)
 S0-C0  2         1743552.00    topdown-slots-retired                        (100.00%)
 S0-C0  2          622954       topdown-recovery-bubbles                     (100.00%)
 S0-C0  2         2025498.00    topdown-slots-issued     # 63.90% backend bound
 S0-C1  2     16685216540       topdown-total-slots                          (100.00%)
 S0-C1  2       962557931.00    topdown-fetch-bubbles                        (100.00%)
 S0-C1  2      4175583320.00    topdown-slots-retired                        (100.00%)
 S0-C1  2      1743329246       topdown-recovery-bubbles # 22.22% bad speculation (100.00%)
 S0-C1  2      6138901193.50    topdown-slots-issued     # 46.99% backend bound

 1.535832673 seconds time elapsed

On Hyper-Threaded CPUs, TopDown computes metrics per core instead of per logical CPU. In this case perf stat automatically enables --per-core mode, and also requires global mode (-a) and no other filters (no cgroup mode). One side effect is that this may require root rights or a kernel.perf_event_paranoid=-1 setting. On systems without Hyper-Threading it can be used per process. Full tree available in git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-2
[PATCH 4/9] perf, tools, stat: Add computation of TopDown formulas
From: Andi Kleen Implement the TopDown formulas in perf stat. The topdown basic metrics reported by the kernel are collected, and the formulas are computed and output as normal metrics. See the kernel commit exporting the events for details on the used metrics. Signed-off-by: Andi Kleen --- tools/perf/util/stat-shadow.c | 119 +- tools/perf/util/stat.c | 5 ++ tools/perf/util/stat.h | 8 ++- 3 files changed, 130 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index 073e66f7..2158a0e 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -28,6 +28,11 @@ static struct stats runtime_dtlb_cache_stats[NUM_CTX][MAX_NR_CPUS]; static struct stats runtime_cycles_in_tx_stats[NUM_CTX][MAX_NR_CPUS]; static struct stats runtime_transaction_stats[NUM_CTX][MAX_NR_CPUS]; static struct stats runtime_elision_stats[NUM_CTX][MAX_NR_CPUS]; +static struct stats runtime_topdown_total_slots[NUM_CTX][MAX_NR_CPUS]; +static struct stats runtime_topdown_slots_issued[NUM_CTX][MAX_NR_CPUS]; +static struct stats runtime_topdown_slots_retired[NUM_CTX][MAX_NR_CPUS]; +static struct stats runtime_topdown_fetch_bubbles[NUM_CTX][MAX_NR_CPUS]; +static struct stats runtime_topdown_recovery_bubbles[NUM_CTX][MAX_NR_CPUS]; struct stats walltime_nsecs_stats; @@ -68,6 +73,11 @@ void perf_stat__reset_shadow_stats(void) sizeof(runtime_transaction_stats)); memset(runtime_elision_stats, 0, sizeof(runtime_elision_stats)); memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats)); + memset(runtime_topdown_total_slots, 0, sizeof(runtime_topdown_total_slots)); + memset(runtime_topdown_slots_retired, 0, sizeof(runtime_topdown_slots_retired)); + memset(runtime_topdown_slots_issued, 0, sizeof(runtime_topdown_slots_issued)); + memset(runtime_topdown_fetch_bubbles, 0, sizeof(runtime_topdown_fetch_bubbles)); + memset(runtime_topdown_recovery_bubbles, 0, sizeof(runtime_topdown_recovery_bubbles)); } /* @@ -90,6 +100,16 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 *count, update_stats(&runtime_transaction_stats[ctx][cpu], count[0]); else if (perf_stat_evsel__is(counter, ELISION_START)) update_stats(&runtime_elision_stats[ctx][cpu], count[0]); + else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS)) + update_stats(&runtime_topdown_total_slots[ctx][cpu], count[0]); + else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED)) + update_stats(&runtime_topdown_slots_issued[ctx][cpu], count[0]); + else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED)) + update_stats(&runtime_topdown_slots_retired[ctx][cpu], count[0]); + else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES)) + update_stats(&runtime_topdown_fetch_bubbles[ctx][cpu], count[0]); + else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES)) + update_stats(&runtime_topdown_recovery_bubbles[ctx][cpu], count[0]); else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) update_stats(&runtime_stalled_cycles_front_stats[ctx][cpu], count[0]); else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND)) @@ -293,11 +313,70 @@ static void print_ll_cache_misses(int cpu, print_metric(ctxp, color, "%7.2f%%", "of all LL-cache hits", ratio); } +/* + * For an explanation of the formulas see: + * Yasin, A Top Down Method for Performance analysis and Counter architecture + * ISPASS14 + */ + +static double td_total_slots(int ctx, int cpu) +{ + return avg_stats(&runtime_topdown_total_slots[ctx][cpu]); +} + +static double td_bad_spec(int ctx, int cpu) +{ + double bad_spec = 0; + double total_slots; + double total; + + total = avg_stats(&runtime_topdown_slots_issued[ctx][cpu]) - + avg_stats(&runtime_topdown_slots_retired[ctx][cpu]) + + avg_stats(&runtime_topdown_recovery_bubbles[ctx][cpu]); + total_slots = td_total_slots(ctx, cpu); + if (total_slots) + bad_spec = total / total_slots; + return bad_spec; +} + +static double td_retiring(int ctx, int cpu) +{ + double retiring = 0; + double total_slots = td_total_slots(ctx, cpu); + double ret_slots = avg_stats(&runtime_topdown_slots_retired[ctx][cpu]); + + if (total_slots) + retiring = ret_slots / total_slots; + return retiring; +} + +static double td_fe_bound(int ctx, int cpu) +{ + double fe_bound = 0; + double total_slots = td_total_slots(ctx, cpu); + double fetch_bub = avg_stats(&runtime_topdown_fetch_bubbles[ctx][cpu]); + + if (total_slots) + fe_bound = fetch_bub / total_slots; + return fe_bound; +} + +static double td_be_bound(int ctx, int cpu) +{ + double sum = (td_fe_bound(ctx, cpu) + + td_bad_spec(ctx,
[PATCH 3/9] perf, tools, stat: Basic support for TopDown in perf stat
From: Andi Kleen Add basic plumbing for TopDown in perf stat. Add a new --topdown option to enable the events. When --topdown is specified, set up events for all topdown events supported by the kernel. Add topdown-* as a special case to the event parser, as is needed for all events containing '-'. The actual code to compute the metrics is in follow-on patches. Signed-off-by: Andi Kleen --- tools/perf/Documentation/perf-stat.txt | 8 +++ tools/perf/builtin-stat.c | 124 - tools/perf/util/parse-events.l | 1 + 3 files changed, 131 insertions(+), 2 deletions(-) diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 47469ab..86c03e9 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -158,6 +158,14 @@ filter out the startup phase of the program, which is often very different. Print statistics of transactional execution if supported. +--topdown:: + +Print top down level 1 metrics if supported by the CPU. This allows one to +determine bottlenecks in the CPU pipeline for CPU-bound workloads, +by breaking it down into frontend bound, backend bound, bad speculation +and retiring. Specifying the option multiple times shows metrics even +if they don't cross a threshold.
+ EXAMPLES diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 2590c75..a83f26f 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -93,6 +93,15 @@ static const char * transaction_limited_attrs = { "}" }; +static const char * topdown_attrs[] = { + "topdown-total-slots", + "topdown-fetch-bubbles", + "topdown-slots-retired", + "topdown-recovery-bubbles", + "topdown-slots-issued", + NULL, +}; + static struct perf_evlist *evsel_list; static struct target target = { @@ -105,6 +114,7 @@ static volatile pid_t child_pid = -1; static bool null_run = false; static int detailed_run = 0; static bool transaction_run; +static int topdown_run = 0; static bool big_num = true; static int big_num_opt = -1; static const char *csv_sep = NULL; @@ -735,7 +745,8 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval, first_shadow_cpu(counter, id), pm, nl, - &os); + &os, + topdown_run); if (!csv_output) { print_noise(counter, noise); @@ -1093,12 +1104,90 @@ static int perf_stat_init_aggr_mode(void) return 0; } +static void filter_events(const char **attr, char **str, bool use_group) +{ + int off = 0; + int i; + int len = 0; + char *s; + + for (i = 0; attr[i]; i++) { + if (pmu_have_event("cpu", attr[i])) { + len += strlen(attr[i]) + 1; + attr[i - off] = attr[i]; + } else + off++; + } + attr[i - off] = NULL; + + *str = malloc(len + 1 + 2); + if (!*str) + return; + s = *str; + if (i - off == 0) { + *s = 0; + return; + } + if (use_group) + *s++ = '{'; + for (i = 0; attr[i]; i++) { + strcpy(s, attr[i]); + s += strlen(s); + *s++ = ','; + } + if (use_group) { + s[-1] = '}'; + *s = 0; + } else + s[-1] = 0; +} + +/* Caller must free result */ +static char *sysctl_read(const char *fn) +{ + int n; + char *line = NULL; + size_t linelen = 0; + FILE *f = fopen(fn, "r"); + if (!f) + return NULL; + n = getline(&line, &linelen, f); + fclose(f); + if (n > 0) + return line; + free(line); + return NULL; +} + +/* + * Check whether we can use a group for top down.
+ * Without a group we may get bad results. + */ +static bool check_group(bool *warn) +{ + char *v = sysctl_read("/proc/sys/kernel/nmi_watchdog"); + int n; + + *warn = false; + if (v) { + bool res = sscanf(v, "%d", &n) == 1 && n != 0; + free(v); + if (res) { + *warn = true; + return false; + } + return true; + } + return false; /* Don't know, so don't use group */ +} + /* * Add default attributes, if there were no attributes specified or * if -d/--detailed, -d -d or -d -d -d is used:
[PATCH v4 2/2] usb: Add support for ACPI identification to xhci-platform
Provide the methods to let ACPI identify the need to use xhci-platform. Change the Kconfig files so the xhci-plat.o file is selectable during kernel config. This has been tested on an ARM64 machine with platform XHCI, an x86_64 machine with XHCI, and an x86_64 machine without XHCI. There were no regressions or error messages on the machines without platform XHCI. [dhdang: regenerate the patch over 4.2-rc5] Signed-off-by: Mark Langsdorf Signed-off-by: Duc Dang --- Changes from v3: Regenerate the patch over 4.2-rc5 No code change Changes from v2 Replaced tristate with a boolean as the driver doesn't compile as a module Correct --help-- to ---help--- Changes from v1 Renamed from "add support for APM X-Gene to xhci-platform" Removed changes to arm64/Kconfig Made CONFIG_USB_XHCI_PLATFORM a user selectable config option drivers/usb/host/Kconfig | 7 ++- drivers/usb/host/xhci-plat.c | 11 +++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig index 8afc3c1..96231ee 100644 --- a/drivers/usb/host/Kconfig +++ b/drivers/usb/host/Kconfig @@ -32,7 +32,12 @@ config USB_XHCI_PCI default y config USB_XHCI_PLATFORM - tristate + tristate "xHCI platform driver support" + ---help--- + Say 'Y' to enable the support for the xHCI host controller + as a platform device. Many ARM SoCs provide USB this way. + + If unsure, say 'Y'. 
config USB_XHCI_MVEBU tristate "xHCI support for Marvell Armada 375/38x" diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 5d03f8b..14b40d2 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "xhci.h" #include "xhci-mvebu.h" @@ -262,6 +263,15 @@ static const struct of_device_id usb_xhci_of_match[] = { MODULE_DEVICE_TABLE(of, usb_xhci_of_match); #endif +#ifdef CONFIG_ACPI +static const struct acpi_device_id usb_xhci_acpi_match[] = { + /* APM X-Gene USB Controller */ + { "PNP0D10", }, + { } +}; +MODULE_DEVICE_TABLE(acpi, usb_xhci_acpi_match); +#endif + static struct platform_driver usb_xhci_driver = { .probe = xhci_plat_probe, .remove = xhci_plat_remove, @@ -269,6 +279,7 @@ static struct platform_driver usb_xhci_driver = { .name = "xhci-hcd", .pm = DEV_PM_OPS, .of_match_table = of_match_ptr(usb_xhci_of_match), + .acpi_match_table = ACPI_PTR(usb_xhci_acpi_match), }, }; MODULE_ALIAS("platform:xhci-hcd"); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 1/2] usb: make xhci platform driver use 64 bit or 32 bit DMA
The xhci platform driver needs to work on systems that either only support 64-bit DMA or only support 32-bit DMA. Attempt to set a coherent dma mask for 64-bit DMA, and attempt again with 32-bit DMA if that fails. [dhdang: regenerate the patch over 4.2-rc5] Signed-off-by: Mark Langsdorf Tested-by: Mark Salter Signed-off-by: Duc Dang --- Changes from v3: Re-generate the patch over 4.2-rc5 No code change. Changes from v2: None Changes from v1: Consolidated to use dma_set_mask_and_coherent Got rid of the check against sizeof(dma_addr_t) drivers/usb/host/xhci-plat.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 890ad9d..5d03f8b 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -93,14 +93,14 @@ static int xhci_plat_probe(struct platform_device *pdev) if (irq < 0) return -ENODEV; - /* Initialize dma_mask and coherent_dma_mask to 32-bits */ - ret = dma_set_coherent_mask(>dev, DMA_BIT_MASK(32)); - if (ret) - return ret; - if (!pdev->dev.dma_mask) - pdev->dev.dma_mask = >dev.coherent_dma_mask; - else - dma_set_mask(>dev, DMA_BIT_MASK(32)); + /* Try setting the coherent_dma_mask to 64 bits, then try 32 bits */ + ret = dma_set_mask_and_coherent(>dev, DMA_BIT_MASK(64)); + if (ret) { + ret = dma_set_mask_and_coherent(>dev, DMA_BIT_MASK(32)); + if (ret) + return ret; + } + hcd = usb_create_hcd(driver, >dev, dev_name(>dev)); if (!hcd) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 6/7] perf, tools, stat: Move non counting counter printing to printout
From: Andi Kleen Move the special case printing for non-running counters to printout, so it can be shared by all the output options. Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 73 --- 1 file changed, 24 insertions(+), 49 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 7801665..d777bb6 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -662,6 +662,30 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval, os.ena = ena; } + if (run == 0 || ena == 0) { + aggr_printout(counter, id, nr); + + fprintf(stat_config.output, "%*s%s", + csv_output ? 0 : 18, + counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED, + csv_sep); + + fprintf(stat_config.output, "%-*s%s", + csv_output ? 0 : unit_width, + counter->unit, csv_sep); + + fprintf(stat_config.output, "%*s", + csv_output ? 0 : -25, + perf_evsel__name(counter)); + + if (counter->cgrp) + fprintf(stat_config.output, "%s%s", + csv_sep, counter->cgrp->name); + + print_running(run, ena); + return; + } + if (nsec_counter(counter)) nsec_printout(id, nr, counter, uval); else @@ -710,30 +734,6 @@ static void print_aggr(char *prefix) if (prefix) fprintf(output, "%s", prefix); - if (run == 0 || ena == 0) { - aggr_printout(counter, id, nr); - - fprintf(output, "%*s%s", - csv_output ? 0 : 18, - counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED, - csv_sep); - - fprintf(output, "%-*s%s", - csv_output ? 0 : unit_width, - counter->unit, csv_sep); - - fprintf(output, "%*s", - csv_output ? 
0 : -25, - perf_evsel__name(counter)); - - if (counter->cgrp) - fprintf(output, "%s%s", - csv_sep, counter->cgrp->name); - - print_running(run, ena); - fputc('\n', output); - continue; - } uval = val * counter->scale; printout(id, nr, counter, uval, prefix, run, ena, 1.0); fputc('\n', output); @@ -830,31 +830,6 @@ static void print_counter(struct perf_evsel *counter, char *prefix) if (prefix) fprintf(output, "%s", prefix); - if (run == 0 || ena == 0) { - fprintf(output, "CPU%*d%s%*s%s", - csv_output ? 0 : -4, - perf_evsel__cpus(counter)->map[cpu], csv_sep, - csv_output ? 0 : 18, - counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED, - csv_sep); - - fprintf(output, "%-*s%s", - csv_output ? 0 : unit_width, - counter->unit, csv_sep); - - fprintf(output, "%*s", - csv_output ? 0 : -25, - perf_evsel__name(counter)); - - if (counter->cgrp) - fprintf(output, "%s%s", - csv_sep, counter->cgrp->name); - - print_running(run, ena); - fputc('\n', output); - continue; - } - uval = val * counter->scale; printout(cpu, 0, counter, uval, prefix, run, ena, 1.0); -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/7] perf, tools, stat: Add support for metrics in interval mode
From: Andi Kleen Now that we can modify the metrics printout functions easily, it's straight forward to support metric printing for interval mode. All that is needed is to print the time stamp on every new line. Pass the prefix into the context and print it out. Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index bf6c02c..30c2046 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -485,6 +485,7 @@ static void aggr_printout(struct perf_evsel *evsel, int id, int nr) struct outstate { FILE *fh; + const char *prefix; }; #define BASE_INDENT 41 @@ -495,13 +496,13 @@ struct outstate { static void new_line_no_aggr_std(void *ctx) { struct outstate *os = ctx; - fprintf(os->fh, "\n%*s", BASE_INDENT + NA_INDENT, ""); + fprintf(os->fh, "\n%s%-*s", os->prefix, BASE_INDENT + NA_INDENT, ""); } static void new_line_std(void *ctx) { struct outstate *os = ctx; - fprintf(os->fh, "\n%-*s", BASE_INDENT + AGGR_INDENT, ""); + fprintf(os->fh, "\n%s%-*s", os->prefix, BASE_INDENT + AGGR_INDENT, ""); } static void print_metric_std(void *ctx, const char *color, const char *fmt, @@ -582,9 +583,13 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg) fprintf(output, "%s%s", csv_sep, evsel->cgrp->name); } -static void printout(int id, int nr, struct perf_evsel *counter, double uval) +static void printout(int id, int nr, struct perf_evsel *counter, double uval, +char *prefix) { - struct outstate os = { .fh = output }; + struct outstate os = { + .fh = stat_config.output, + .prefix = prefix ? 
prefix : "" + }; print_metric_t pm = print_metric_std; void (*nl)(void *); @@ -660,7 +665,7 @@ static void print_aggr(char *prefix) continue; } uval = val * counter->scale; - printout(id, nr, counter, uval); + printout(id, nr, counter, uval, prefix); if (!csv_output) print_noise(counter, 1.0); @@ -691,7 +696,7 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix) fprintf(output, "%s", prefix); uval = val * counter->scale; - printout(thread, 0, counter, uval); + printout(thread, 0, counter, uval, prefix); if (!csv_output) print_noise(counter, 1.0); @@ -741,7 +746,7 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix) } uval = avg * counter->scale; - printout(-1, 0, counter, uval); + printout(-1, 0, counter, uval, prefix); print_noise(counter, avg); @@ -794,7 +799,7 @@ static void print_counter(struct perf_evsel *counter, char *prefix) } uval = val * counter->scale; - printout(cpu, 0, counter, uval); + printout(cpu, 0, counter, uval, prefix); if (!csv_output) print_noise(counter, 1.0); -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/7] perf, tools, stat: Move noise/running printing into printout
From: Andi Kleen Move the running/noise printing into printout to avoid duplicated code in the callers. Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 33 +++-- 1 file changed, 11 insertions(+), 22 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 30c2046..0174aca 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -584,7 +584,7 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg) } static void printout(int id, int nr, struct perf_evsel *counter, double uval, -char *prefix) +char *prefix, u64 run, u64 ena, double noise) { struct outstate os = { .fh = stat_config.output, @@ -609,6 +609,12 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval, pm, nl, ); + + if (!csv_output) { + print_noise(counter, noise); + if (run != ena) + fprintf(stat_config.output, " (%.2f%%)", 100.0 * run / ena); + } } static void print_aggr(char *prefix) @@ -665,11 +671,7 @@ static void print_aggr(char *prefix) continue; } uval = val * counter->scale; - printout(id, nr, counter, uval, prefix); - if (!csv_output) - print_noise(counter, 1.0); - - print_running(run, ena); + printout(id, nr, counter, uval, prefix, run, ena, 1.0); fputc('\n', output); } } @@ -696,12 +698,7 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix) fprintf(output, "%s", prefix); uval = val * counter->scale; - printout(thread, 0, counter, uval, prefix); - - if (!csv_output) - print_noise(counter, 1.0); - - print_running(run, ena); + printout(thread, 0, counter, uval, prefix, run, ena, 1.0); fputc('\n', output); } } @@ -746,11 +743,7 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix) } uval = avg * counter->scale; - printout(-1, 0, counter, uval, prefix); - - print_noise(counter, avg); - - print_running(avg_running, avg_enabled); + printout(-1, 0, counter, uval, prefix, avg_running, avg_enabled, avg); fprintf(output, "\n"); } @@ -799,11 +792,7 @@ static 
void print_counter(struct perf_evsel *counter, char *prefix) } uval = val * counter->scale; - printout(cpu, 0, counter, uval, prefix); - - if (!csv_output) - print_noise(counter, 1.0); - print_running(run, ena); + printout(cpu, 0, counter, uval, prefix, run, ena, 1.0); fputc('\n', output); } -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 7/7] perf, tools: Support metrics in --per-core/socket mode
From: Andi Kleen Enable metrics printing in --per-core / --per-socket mode. We need to save the shadow metrics in a unique place. Always use the first CPU in the aggregation. Then use the same CPU to retrieve the shadow value later. Example output: % perf stat --per-core -a ./BC1s Performance counter stats for 'system wide': S0-C0 22966.020381 task-clock (msec) #2.004 CPUs utilized(100.00%) S0-C0 2 49 context-switches #0.017 K/sec(100.00%) S0-C0 2 4 cpu-migrations#0.001 K/sec(100.00%) S0-C0 2467 page-faults #0.157 K/sec S0-C0 2 4,599,061,773 cycles#1.551 GHz (100.00%) S0-C0 2 9,755,886,883 instructions #2.12 insn per cycle (100.00%) S0-C0 2 1,906,272,125 branches # 642.704 M/sec(100.00%) S0-C0 2 81,180,867 branch-misses #4.26% of all branches S0-C1 22965.995373 task-clock (msec) #2.003 CPUs utilized(100.00%) S0-C1 2 62 context-switches #0.021 K/sec(100.00%) S0-C1 2 8 cpu-migrations#0.003 K/sec(100.00%) S0-C1 2281 page-faults #0.095 K/sec S0-C1 2 6,347,290 cycles#0.002 GHz (100.00%) S0-C1 2 4,654,156 instructions #0.73 insn per cycle (100.00%) S0-C1 2947,121 branches #0.319 M/sec(100.00%) S0-C1 2 37,322 branch-misses #3.94% of all branches 1.480409747 seconds time elapsed Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 62 --- 1 file changed, 58 insertions(+), 4 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index d777bb6..ea5298a 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -488,6 +488,8 @@ struct outstate { const char *prefix; int nfields; u64 run, ena; + int id, nr; + struct perf_evsel *evsel; }; #define BASE_INDENT 41 @@ -498,13 +500,19 @@ struct outstate { static void new_line_no_aggr_std(void *ctx) { struct outstate *os = ctx; - fprintf(os->fh, "\n%s%-*s", os->prefix, BASE_INDENT + NA_INDENT, ""); + + fprintf(os->fh, "\n%s", os->prefix); + aggr_printout(os->evsel, os->id, os->nr); + fprintf(os->fh, "%-*s", BASE_INDENT + NA_INDENT, ""); } static void new_line_std(void *ctx) { struct outstate 
*os = ctx; - fprintf(os->fh, "\n%s%-*s", os->prefix, BASE_INDENT + AGGR_INDENT, ""); + + fprintf(os->fh, "\n%s", os->prefix); + aggr_printout(os->evsel, os->id, os->nr); + fprintf(os->fh, "%-*s", BASE_INDENT + AGGR_INDENT, ""); } static void print_metric_std(void *ctx, const char *color, const char *fmt, @@ -535,6 +543,7 @@ static void new_line_csv(void *ctx) fputc('\n', os->fh); if (os->prefix) fprintf(os->fh, "%s%s", os->prefix, csv_sep); + aggr_printout(os->evsel, os->id, os->nr); for (i = 0; i < os->nfields; i++) fputs(csv_sep, os->fh); } @@ -598,6 +607,22 @@ static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg) fprintf(output, "%s%s", csv_sep, evsel->cgrp->name); } +static int first_shadow_cpu(struct perf_evsel *evsel, int id) +{ + int i; + + if (aggr_get_id == NULL) + return 0; + + for (i = 0; i < perf_evsel__nr_cpus(evsel); i++) { + int cpu2 = perf_evsel__cpus(evsel)->map[i]; + + if (aggr_get_id(evsel_list->cpus, cpu2) == id) + return cpu2; + } + return 0; +} + static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg) { FILE *output = stat_config.output; @@ -633,7 +658,10 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval, { struct outstate os = { .fh = stat_config.output, - .prefix = prefix ? prefix : "" + .prefix = prefix ? prefix : "", + .id = id, + .nr = nr, + .evsel = counter, }; print_metric_t pm = print_metric_std; void (*nl)(void *); @@ -693,7 +721,7 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval, perf_stat__print_shadow_stats(counter,
[PATCH 2/7] perf, tools, stat: Abstract stat metrics printing
From: Andi Kleen Abstract the printing of shadow metrics. Instead of every metric calling fprintf directly and taking care of indentation, use two call backs: one to print metrics and another to start a new line. This will allow adding metrics to CSV mode and also using them for other purposes. The computation of padding is now done in the central callback, instead of every metric doing it manually. This makes it easier to add new metrics. v2: Refactor functions, printout now does more. Move shadow printing. Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 96 + tools/perf/util/stat-shadow.c | 158 ++ tools/perf/util/stat.h| 10 ++- 3 files changed, 157 insertions(+), 107 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index fac0c50..bf6c02c 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -483,6 +483,47 @@ static void aggr_printout(struct perf_evsel *evsel, int id, int nr) } } +struct outstate { + FILE *fh; +}; + +#define BASE_INDENT 41 +#define AGGR_INDENT 8 +#define METRIC_LEN 35 +#define NA_INDENT 16 + +static void new_line_no_aggr_std(void *ctx) +{ + struct outstate *os = ctx; + fprintf(os->fh, "\n%*s", BASE_INDENT + NA_INDENT, ""); +} + +static void new_line_std(void *ctx) +{ + struct outstate *os = ctx; + fprintf(os->fh, "\n%-*s", BASE_INDENT + AGGR_INDENT, ""); +} + +static void print_metric_std(void *ctx, const char *color, const char *fmt, +const char *unit, double val) +{ + struct outstate *os = ctx; + FILE *out = os->fh; + int n; + + if (unit == NULL) { + fprintf(out, "%-*s", METRIC_LEN, ""); + return; + } + + n = fprintf(out, " # "); + if (color) + n += color_fprintf(out, color, fmt, val); + else + n += fprintf(out, fmt, val); + fprintf(out, " %-*s", METRIC_LEN - n - 1, unit); +} + static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg) { FILE *output = stat_config.output; @@ -516,7 +557,6 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double 
avg) FILE *output = stat_config.output; double sc = evsel->scale; const char *fmt; - int cpu = cpu_map__id_to_cpu(id); if (csv_output) { fmt = sc != 1.0 ? "%.2f%s" : "%.0f%s"; @@ -529,9 +569,6 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg) aggr_printout(evsel, id, nr); - if (stat_config.aggr_mode == AGGR_GLOBAL) - cpu = 0; - fprintf(output, fmt, avg, csv_sep); if (evsel->unit) @@ -543,12 +580,30 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg) if (evsel->cgrp) fprintf(output, "%s%s", csv_sep, evsel->cgrp->name); +} - if (csv_output || stat_config.interval) - return; +static void printout(int id, int nr, struct perf_evsel *counter, double uval) +{ + struct outstate os = { .fh = output }; + print_metric_t pm = print_metric_std; + void (*nl)(void *); - perf_stat__print_shadow_stats(output, evsel, avg, cpu, - stat_config.aggr_mode); + if (stat_config.aggr_mode == AGGR_NONE) + nl = new_line_no_aggr_std; + else + nl = new_line_std; + + if (nsec_counter(counter)) + nsec_printout(id, nr, counter, uval); + else + abs_printout(id, nr, counter, uval); + + perf_stat__print_shadow_stats(counter, uval, + stat_config.aggr_mode == AGGR_GLOBAL ? 0 : + cpu_map__id_to_cpu(id), + pm, + nl, + ); } static void print_aggr(char *prefix) @@ -605,12 +660,7 @@ static void print_aggr(char *prefix) continue; } uval = val * counter->scale; - - if (nsec_counter(counter)) - nsec_printout(id, nr, counter, uval); - else - abs_printout(id, nr, counter, uval); - + printout(id, nr, counter, uval); if (!csv_output) print_noise(counter, 1.0); @@ -641,11 +691,7 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix) fprintf(output, "%s", prefix); uval = val * counter->scale; - - if (nsec_counter(counter)) - nsec_printout(thread, 0, counter, uval); - else - abs_printout(thread, 0, counter, uval); +
perf, tools: Refactor and support interval and CSV metrics v3
[v2: Addressed (near) all review feedback. No manpage updates so far. Add support for --per-core metrics. Various cleanups.] [v3: Everything compiles again. Some more cleanups. Updated to latest tip tree.] Currently perf stat does not support printing computed metrics for interval (-I xxx) or CSV (-x,) mode. For example IPC or TSX metrics over time are quite useful to know. This patch implements them. The main obstacle was that the metrics printing was all open coded all over the metrics computation code. The second patch refactors the metrics printing to work through call backs that can be more easily changed. This also cleans up the metrics printing significantly. The indentation is now handled through printf, no more need to manually count spaces. Then based on that it implements metrics printing for CSV and interval mode. Example output: % perf stat -I1000 -a sleep 1 # time counts unit eventsmetric multiplex 1.001301370 12020.049593 task-clock (msec) (100.00%) 1.001301370 3,952 context-switches #0.329 K/sec(100.00%) 1.001301370 69 cpu-migrations#0.006 K/sec(100.00%) 1.001301370 76 page-faults #0.006 K/sec 1.001301370386,582,789 cycles#0.032 GHz (100.00%) 1.001301370716,441,544 stalled-cycles-frontend # 185.33% frontend cycles idle (100.00%) 1.001301370 stalled-cycles-backend 1.001301370101,751,678 instructions #0.26 insn per cycle 1.001301370 #7.04 stalled cycles per insn (100.00%) 1.001301370 20,914,692 branches #1.740 M/sec(100.00%) 1.001301370 1,943,630 branch-misses #9.29% of all branches CSV mode % perf stat -x, -I1000 -a sleep 1 1.000852081,12016.143006,,task-clock 1.000852081,4457,,context-switches,12015168277,100.00,0.371,K/sec 1.000852081,50,,cpu-migrations,12014024424,100.00,0.004,K/sec 1.000852081,76,,page-faults,12013076716,100.00,0.006,K/sec 1.000852081,515854373,,cycles,12011235336,100.00,0.043,GHz 1.000852081,1030742150,,stalled-cycles-frontend,12010984057,100.00,199.81,frontend cycles idle 1.000852081,,,stalled-cycles-backend,0,100.00 
1.000852081,116782495,,instructions,12011130729,100.00,0.23,insn per cycle 1.00085208112011130729,100.00,8.83,stalled cycles per insn 1.000852081,23748237,,branches,12010745125,100.00,1.976,M/sec 1.000852081,1976560,,branch-misses,12010501884,100.00,8.32,of all branches Available in git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6 perf/stat-metrics-3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/7] perf, tools, stat: Implement CSV metrics output
From: Andi Kleen Now support CSV output for metrics. With the new output callbacks this is relatively straight forward by creating new callbacks. The new line callback needs to know the number of fields to skip them correctly v2: Split out function argument changes Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 64 +++ 1 file changed, 64 insertions(+) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 0174aca..7801665 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -486,6 +486,8 @@ static void aggr_printout(struct perf_evsel *evsel, int id, int nr) struct outstate { FILE *fh; const char *prefix; + int nfields; + u64 run, ena; }; #define BASE_INDENT 41 @@ -525,6 +527,49 @@ static void print_metric_std(void *ctx, const char *color, const char *fmt, fprintf(out, " %-*s", METRIC_LEN - n - 1, unit); } +static void new_line_csv(void *ctx) +{ + struct outstate *os = ctx; + int i; + + fputc('\n', os->fh); + if (os->prefix) + fprintf(os->fh, "%s%s", os->prefix, csv_sep); + for (i = 0; i < os->nfields; i++) + fputs(csv_sep, os->fh); +} + +static void print_metric_csv(void *ctx, +const char *color __maybe_unused, +const char *fmt, const char *unit, double val) +{ + struct outstate *os = ctx; + FILE *out = os->fh; + char buf[64], *vals, *ends; + + if (unit == NULL) { + fprintf(out, "%s%s%s%s", csv_sep, csv_sep, csv_sep, csv_sep); + return; + } + fprintf(out, "%s%" PRIu64 "%s%.2f%s", + csv_sep, + os->run, + csv_sep, + os->ena ? 
100.0 * os->run / os->ena : 100.0, + csv_sep); + snprintf(buf, sizeof(buf), fmt, val); + vals = buf; + while (isspace(*vals)) + vals++; + ends = vals; + while (isdigit(*ends) || *ends == '.') + ends++; + *ends = 0; + while (isspace(*unit)) + unit++; + fprintf(out, "%s%s%s", vals, csv_sep, unit); +} + static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg) { FILE *output = stat_config.output; @@ -598,6 +643,25 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval, else nl = new_line_std; + if (csv_output) { + static int aggr_fields[] = { + [AGGR_GLOBAL] = 0, + [AGGR_THREAD] = 1, + [AGGR_NONE] = 1, + [AGGR_SOCKET] = 2, + [AGGR_CORE] = 2, + }; + + pm = print_metric_csv; + nl = new_line_csv; + os.nfields = 1; + os.nfields += aggr_fields[stat_config.aggr_mode]; + if (counter->cgrp) + os.nfields++; + os.run = run; + os.ena = ena; + } + if (nsec_counter(counter)) nsec_printout(id, nr, counter, uval); else -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/7] perf, tools, stat: Move sw clock metrics printout to stat-shadow
From: Andi Kleen The sw clock metrics printing was missed in the earlier move to stat-shadow of all the other metric printouts. Move it too. Signed-off-by: Andi Kleen --- tools/perf/builtin-stat.c | 9 - tools/perf/util/stat-shadow.c | 4 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index a054ddc..fac0c50 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -509,15 +509,6 @@ static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg) if (evsel->cgrp) fprintf(output, "%s%s", csv_sep, evsel->cgrp->name); - - if (csv_output || stat_config.interval) - return; - - if (perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK)) - fprintf(output, " # %8.3f CPUs utilized ", - avg / avg_stats(_nsecs_stats)); - else - fprintf(output, " "); } static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg) diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index 2a5d8d7..fc970db 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -413,6 +413,10 @@ void perf_stat__print_shadow_stats(FILE *out, struct perf_evsel *evsel, ratio = total / avg; fprintf(out, " # %8.0f cycles / elision ", ratio); + } else if (perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK) && + (ratio = avg_stats(_nsecs_stats)) != 0) { + fprintf(output, " # %8.3f CPUs utilized ", +avg / ratio); } else if (runtime_nsecs_stats[cpu].n != 0) { char unit = 'M'; -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v12 17/21] h8300: clock driver
Hello Sato-san, Unfortunately this patch did not Cc myself, Stephen Boyd or the linux-...@vger.kernel.org mailing list. As such Stephen and I did not have a chance to review it. Even more unfortunate was that it was ninja merged by maintainers without our ack. :-/ Quoting Yoshinori Sato (2015-05-10 23:26:36) > Signed-off-by: Yoshinori Sato > --- > .../bindings/clock/renesas,h8300-div-clock.txt | 24 > .../bindings/clock/renesas,h8s2678-pll-clock.txt | 23 > drivers/clk/Makefile | 1 + > drivers/clk/h8300/Makefile | 2 + > drivers/clk/h8300/clk-div.c| 53 > drivers/clk/h8300/clk-h8s2678.c| 147 > + > 6 files changed, 250 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/clock/renesas,h8300-div-clock.txt > create mode 100644 > Documentation/devicetree/bindings/clock/renesas,h8s2678-pll-clock.txt > create mode 100644 drivers/clk/h8300/Makefile > create mode 100644 drivers/clk/h8300/clk-div.c > create mode 100644 drivers/clk/h8300/clk-h8s2678.c > > diff --git > a/Documentation/devicetree/bindings/clock/renesas,h8300-div-clock.txt > b/Documentation/devicetree/bindings/clock/renesas,h8300-div-clock.txt > new file mode 100644 > index 000..36c2b52 > --- /dev/null > +++ b/Documentation/devicetree/bindings/clock/renesas,h8300-div-clock.txt > @@ -0,0 +1,24 @@ > +* Renesas H8/300 divider clock > + > +Required Properties: > + > + - compatible: Must be "renesas,sh73a0-h8300-div-clock" > + > + - clocks: Reference to the parent clocks ("extal1" and "extal2") > + > + - #clock-cells: Must be 1 > + > + - reg: Base address and length of the divide rate selector > + > + - renesas,width: bit width of selector > + > +Example > +--- > + > + cclk: cclk { > + compatible = "renesas,h8300-div-clock"; > + clocks = <>; > + #clock-cells = <0>; > + reg = <0xfee01b 2>; > + renesas,width = <2>; > + }; I could not find any info on this clock in the H8S/2678 reference manual[0]. Could you point me to the right documentation? 
> diff --git > a/Documentation/devicetree/bindings/clock/renesas,h8s2678-pll-clock.txt > b/Documentation/devicetree/bindings/clock/renesas,h8s2678-pll-clock.txt > new file mode 100644 > index 000..500cdadb > --- /dev/null > +++ b/Documentation/devicetree/bindings/clock/renesas,h8s2678-pll-clock.txt > @@ -0,0 +1,23 @@ > +Renesas H8S2678 PLL clock > + > +This device is Clock multiplyer > + > +Required Properties: > + > + - compatible: Must be "renesas,h8s2678-pll-clock" > + > + - clocks: Reference to the parent clocks > + > + - #clock-cells: Must be 0 > + > + - reg: Two rate selector (Multiply / Divide) register address > + > +Example > +--- > + > + pllclk: pllclk { > + compatible = "renesas,h8s2678-pll-clock"; > + clocks = <>; > + #clock-cells = <0>; > + reg = <0xfee03b 2>, <0xfee045 2>; > + }; Is there really only one clock output? According to figure 21.1 there is the "System clock to φ pin" output and the "Internal clock to peripheral modules" output. I am wondering if clock-cells should be 1 instead of zero and support both of these output signals? As a nitpick, I think it would have been better to name the node "cpg" as it is listed in Section 21. pllclk is only one of the two registers that make up the cpg. Something like: cpg: clock-controller@fee03b If you do decide to have clock-cells greater than zero, you might find the following threads helpful. They describe how to craft a clock-controller style binding: http://lkml.kernel.org/r/<20150411001231.18916.93186@quantum> http://lkml.kernel.org/r/<20150724034229.642.88156@quantum> As an additional thought, it looks like the module stop registers are mixed in with the clock registers. When you decide to write a reset driver for these platforms you might want to re-use this existing dt binding description and put the reset code into your clock provider driver. Grep for reset.h in the drivers/clk/ directory for some examples. 
[0] http://documentation.renesas.com/doc/products/mpumcu/rej09b0283_2678hm.pdf Regards, Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] dma: add __must_check annotation for dmaengine_pause()
On 08/07/2015 04:00 PM, Sebastian Andrzej Siewior wrote: > In 8250-omap I learned it the hard way that ignoring the return code > of dmaengine_pause() might be bad because the underlying DMA driver > might not support the function at all and so not doing what one is > expecting. > This patch adds the __must_check annotation as suggested by Russell King. > > Signed-off-by: Sebastian Andrzej Siewior > --- > include/linux/dmaengine.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h > index 8ad9a4e839f6..4eac4716bded 100644 > --- a/include/linux/dmaengine.h > +++ b/include/linux/dmaengine.h > @@ -825,7 +825,7 @@ static inline int dmaengine_terminate_all(struct dma_chan > *chan) > return -ENOSYS; > } > > -static inline int dmaengine_pause(struct dma_chan *chan) > +static inline int __must_check dmaengine_pause(struct dma_chan *chan) > { > if (chan->device->device_pause) > return chan->device->device_pause(chan); > Not that this is your responsibility, Sebastian, but considering there are fewer than 20 users of dmaengine_pause() in the entire tree, we should add WARN_ON_ONCE() around those uses with this patch to avoid a bunch needless one-off "fixes". Regards, Peter Hurley -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Input: synaptics-rmi4: Add device tree support for RMI4 I2C devices
Add devicetree binding for I2C devices and add bindings for optional parameters in the function drivers. Signed-off-by: Andrew Duggan --- I saw Benjamin Tissoires's email about the lack of a devicetree implementation for rmi_i2c.c. I decided to clean up and add documentation to the implementaion which I have been using and submit it for review. This patch applies to the current implementation of Dmitry's synaptics-rmi4 branch in the input repository. If Benjamin's patchset gets applied before this I can rebase this patch. Thanks, Andrew .../devicetree/bindings/input/rmi4/rmi_f01.txt | 34 ++ .../devicetree/bindings/input/rmi4/rmi_f11.txt | 51 .../devicetree/bindings/input/rmi4/rmi_i2c.txt | 40 +++ .../devicetree/bindings/vendor-prefixes.txt| 1 + drivers/input/rmi4/rmi_bus.c | 49 drivers/input/rmi4/rmi_bus.h | 8 +- drivers/input/rmi4/rmi_driver.c| 46 ++- drivers/input/rmi4/rmi_f01.c | 50 +++- drivers/input/rmi4/rmi_f11.c | 133 - drivers/input/rmi4/rmi_i2c.c | 60 +- include/linux/rmi.h| 2 +- 11 files changed, 465 insertions(+), 9 deletions(-) create mode 100644 Documentation/devicetree/bindings/input/rmi4/rmi_f01.txt create mode 100644 Documentation/devicetree/bindings/input/rmi4/rmi_f11.txt create mode 100644 Documentation/devicetree/bindings/input/rmi4/rmi_i2c.txt diff --git a/Documentation/devicetree/bindings/input/rmi4/rmi_f01.txt b/Documentation/devicetree/bindings/input/rmi4/rmi_f01.txt new file mode 100644 index 000..53846e2 --- /dev/null +++ b/Documentation/devicetree/bindings/input/rmi4/rmi_f01.txt @@ -0,0 +1,34 @@ +Synaptics RMI4 F01 Device Binding + +The Synaptics RMI4 core is able to support RMI4 devices using differnet +transports and differnet functions. This file describes the device tree +bindings for devices which contain Function 1. Complete documentation +for transports and other functions can be found in: +Documentation/devicetree/bindings/input/rmi4. 
+ +Additional documentation for F01 can be found at: +http://www.synaptics.com/sites/default/files/511-000136-01-Rev-E-RMI4-Interfacing-Guide.pdf + +Optional Properties: +- syna,f01-nosleep: If set the device will run at full power without sleeping. +- syna,f01-wakeup-threshold: Defines the amplitude of the disturbance to the + background capacitance that will cause the + device to wake from dozing. +- syna,f01-doze-holdoff: The delay to wait after the last finger lift and the + first doze cycle (in 0.1 second units). +- syna,f01-doze-interval: The time period that the device sleeps between finger + activity (in 10 ms units). + + +Example of a RMI4 I2C device with F01: +{ + rmi-i2c-dev@2c { + compatible = "syna,rmi-i2c"; + reg = <0x2c>; + syna,sensor-name="TM1949"; + syna,attn-gpio = <4 2>; + syna,attn-polarity = <0>; + syna,level-triggered = <1>; + syna,f01-nosleep = <1>; + }; + }; diff --git a/Documentation/devicetree/bindings/input/rmi4/rmi_f11.txt b/Documentation/devicetree/bindings/input/rmi4/rmi_f11.txt new file mode 100644 index 000..2405523 --- /dev/null +++ b/Documentation/devicetree/bindings/input/rmi4/rmi_f11.txt @@ -0,0 +1,51 @@ +Synaptics RMI4 F11 Device Binding + +The Synaptics RMI4 core is able to support RMI4 devices using differnet +transports and differnet functions. This file describes the device tree +bindings for devices which contain Function 11. Complete documentation +for transports and other functions can be found in: +Documentation/devicetree/bindings/input/rmi4. + +RMI4 Function 11 is for 2D touch position sensing. Additional documentation for +F11 can be found at: +http://www.synaptics.com/sites/default/files/511-000136-01-Rev-E-RMI4-Interfacing-Guide.pdf + +Optional Properties: +- syna,f11-swap-axes: Swap X and Y positions when reporting. +- syna,f11-flip-x: Reverse the direction of X. +- syna,f11-flip-y: Reverse the direction of Y. +- syna,f11-clip-x-low: Sets a minimum value for X. +- syna,f11-clip-y-low: Sets a minimum value for Y. 
+- syna,f11-clip-x-high: Sets a maximum value for X. +- syna,f11-clip-y-high: Sets a maximum value for Y. +- syna,f11-offset-x: Add an offset to X. +- syna,f11-offset_y: Add an offset to Y. +- syna,f11-delta-x-threshold: Set the minimum distance on the X axis required + to generate an interrupt in reduced reporting + mode. +- syna,f11-delta-y-threshold: Set the minimum distance on the Y axis required +
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
Hi David, On Fri, Aug 7, 2015 at 8:14 PM, David Daney wrote: > On 08/07/2015 07:54 AM, Graeme Gregory wrote: >> >> On Thu, Aug 06, 2015 at 05:33:10PM -0700, David Daney wrote: >>> >>> From: David Daney >>> >>> Find out which PHYs belong to which BGX instance in the ACPI way. >>> >>> Set the MAC address of the device as provided by ACPI tables. This is >>> similar to the implementation for devicetree in >>> of_get_mac_address(). The table is searched for the device property >>> entries "mac-address", "local-mac-address" and "address" in that >>> order. The address is provided in a u64 variable and must contain a >>> valid 6 bytes-len mac addr. >>> >>> Based on code from: Narinder Dhillon >>> Tomasz Nowicki >>> Robert Richter >>> >>> Signed-off-by: Tomasz Nowicki >>> Signed-off-by: Robert Richter >>> Signed-off-by: David Daney >>> --- >>> drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 137 >>> +- >>> 1 file changed, 135 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c >>> b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c >>> index 615b2af..2056583 100644 >>> --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c >>> +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c > > [...] >>> >>> + >>> +static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst) >>> +{ >>> + const union acpi_object *prop; >>> + u64 mac_val; >>> + u8 mac[ETH_ALEN]; >>> + int i, j; >>> + int ret; >>> + >>> + for (i = 0; i < ARRAY_SIZE(addr_propnames); i++) { >>> + ret = acpi_dev_get_property(adev, addr_propnames[i], >>> + ACPI_TYPE_INTEGER, ); >> >> >> Shouldn't this be trying to use device_property_read_* API and making >> the DT/ACPI path the same where possible? >> > > Ideally, something like you suggest would be possible. However, there are a > couple of problems trying to do it in the kernel as it exists today: > > 1) There is no 'struct device *' here, so device_property_read_* is not > applicable. 
> 2) There is no standard ACPI binding for MAC addresses, so it is
> impossible to create a hypothetical fw_get_mac_address(), which would
> be analogous to of_get_mac_address().
>
> Other e-mail threads have suggested that the path to an elegant
> solution is to inter-mix a bunch of calls to acpi_dev_get_property*()
> and fwnode_property_read*() as to use these more generic
> fwnode_property_read*() functions wherever possible. I rejected this
> approach as it seems cleaner to me to consistently use a single set of
> APIs.

Actually, that wasn't my intention. I wanted to say that once you'd got
an ACPI device pointer (struct acpi_device), you could easily convert it
to a struct fwnode_handle pointer and operate on that going forward when
accessing properties.

That at least would help with the properties that do not differ between
DT and ACPI.

Thanks,
Rafael
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote:
> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
> > I'm for solution 3:
> >
> >  - keep blk_bio_{discard,write_same}_split, but ensure we never build
> >    a > 4GB bio in blkdev_issue_{discard,write_same}.
>
> This has a problem, as I mentioned in solution 1.
> We need to also make sure max discard size is of proper granularity.
> See below example.
>
> 4G: 8388608 sectors
> UINT_MAX: 8388607 sectors
>
> dm-thinp block size = default discard granularity = 128 sectors
>
> blkdev_issue_discard(sector=0, nr_sectors=8388608)
>
> 1. Only ensure bi_size not overflow
>
> It doesn't work.
>
> [start_sector, end_sector]
> [0, 8388607]
>   [0, 8388606], then dm-thinp splits it to 2 bios
>     [0, 8388479]
>     [8388480, 8388606] ---> this has problem in process_discard_bio(),
>                             because the discard size (127 sectors) covers
>                             less than a block (128 sectors)
>   [8388607, 8388607] ---> same problem
>
> 2. Ensure bi_size not overflow and max discard size is of proper granularity

Ideally we'd get upper layers out of the business of knowing about the
queue limits at all - that was the point of the patch series, after all.

Instead of using UINT_MAX, would it work to just make the max 1 << 31
sectors?
Re: [PATCH 1/3] tty: serial: 8250_omap: do not use RX DMA if pause is not supported
On 08/07/2015 04:00 PM, Sebastian Andrzej Siewior wrote:
> The 8250-omap driver requires the DMA-engine driver to support the pause
> command in order to properly turn off programmed RX transfer before the
> driver starts manually reading from the FIFO.
> The lacking support of the requirement has been discovered recently. In
> order to stay safe here we disable support for RX-DMA as soon as we
> notice that it does not work. This should happen very early.
> If the user does not want to see this backtrace he can either disable
> DMA support (completely or RX-only) or backport the required patches for
> edma / omap-dma once they hit mainline.
>
> Cc:
> Signed-off-by: Sebastian Andrzej Siewior
> ---
>  drivers/tty/serial/8250/8250_omap.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
> index 0340ee6ba970..07a11e0935e4 100644
> --- a/drivers/tty/serial/8250/8250_omap.c
> +++ b/drivers/tty/serial/8250/8250_omap.c
> @@ -112,6 +112,7 @@ struct omap8250_priv {
>  	struct work_struct qos_work;
>  	struct uart_8250_dma omap8250_dma;
>  	spinlock_t rx_dma_lock;
> +	bool rx_dma_broken;
>  };
>
>  static u32 uart_read(struct uart_8250_port *up, u32 reg)
> @@ -761,6 +762,7 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>  	struct omap8250_priv	*priv = p->port.private_data;
>  	struct uart_8250_dma	*dma = p->dma;
>  	unsigned long flags;
> +	int ret;
>
>  	spin_lock_irqsave(&priv->rx_dma_lock, flags);
>
> @@ -769,7 +771,9 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>  		return;
>  	}
>
> -	dmaengine_pause(dma->rxchan);
> +	ret = dmaengine_pause(dma->rxchan);
> +	if (WARN_ON_ONCE(ret))
> +		priv->rx_dma_broken = true;

No offense, Sebastian, but it boggles my mind that anyone could defend
this as solid api design. We're in the middle of an interrupt handler
and the slave dma driver is /just/ telling us now that it doesn't
implement this functionality?!!?

The dmaengine api has _so much_ setup and none of it contemplates
telling the consumer that critical functionality is missing? Even
dma_get_slave_caps() returns _true_ for cmd_pause support; ok, that
interface is pointless.

Rather than losing /critical data/ here, the interrupt handler should
just busy-wait until dmaengine_tx_status() returns DMA_COMPLETE for the
rx_cookie.

Regards,
Peter Hurley

>  	spin_unlock_irqrestore(&priv->rx_dma_lock, flags);
>
> @@ -813,6 +817,9 @@ static int omap_8250_rx_dma(struct uart_8250_port *p, unsigned int iir)
>  		break;
>  	}
>
> +	if (priv->rx_dma_broken)
> +		return -EINVAL;
> +
>  	spin_lock_irqsave(&priv->rx_dma_lock, flags);
>
>  	if (dma->rx_running)
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
Hi David,

On Sat, Aug 8, 2015 at 2:11 AM, David Daney wrote:
> On 08/07/2015 05:05 PM, Rafael J. Wysocki wrote:

[cut]

>> It is actually useful to people as far as I can say.
>>
>> Also, if somebody is going to use properties with ACPI, why would
>> they use a different set of properties with DT?
>>
>> Wouldn't it be more reasonable to use the same set in both cases?
>
> Yes, but there is still quite a bit of leeway to screw things up.

That I have to agree with, unfortunately.

On the other hand, this is a fairly new concept and we need to gain some
experience with it to be able to come up with best practices and so on.
Cases like yours are really helpful here.

> FWIW: http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf
>
> This actually seems to have been adopted by the UEFI people as a
> "Standard", I am not sure where a record of this is kept though.

Work on this is in progress, but far from completion. Essentially,
what's needed is more pressure from vendors who want to use properties
in their firmware.

> So, we are changing our firmware to use this standard (which is quite
> similar to the DT with respect to MAC addresses).

Cool. :-)

Thanks,
Rafael
Re: [PATCH 1/3] Powerpc: mpc85xx: refactor the PM operations
On Fri, 2015-08-07 at 11:19 +0800, Chenhui Zhao wrote: > On Fri, Aug 7, 2015 at 2:02 AM, Scott Wood > wrote: > > On Thu, 2015-08-06 at 13:54 +0800, Chenhui Zhao wrote: > > > On Thu, Aug 6, 2015 at 1:46 PM, Scott Wood > > > wrote: > > > > On Thu, 2015-08-06 at 12:20 +0800, Chenhui Zhao wrote: > > > > > On Thu, Aug 6, 2015 at 10:57 AM, Scott Wood > > > > > > > > > > wrote: > > > > > > On Wed, 2015-08-05 at 18:11 +0800, Chenhui Zhao wrote: > > > > > > > On Tue, Aug 4, 2015 at 4:26 AM, Scott Wood > > > > > > > > > > > > wrote: > > > > > > > > On Mon, 2015-08-03 at 19:32 +0800, Chenhui Zhao wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sat, Aug 1, 2015 at 7:59 AM, Scott Wood > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Could you explain irq_mask()? Why would there > > > still be > > > > > IRQs > > > > > > > > > destined > > > > > > > > > > for > > > > > > > > > > this CPU at this point? > > > > > > > > > > > > > > > > > > This function just masks irq by setting the > > > registers in > > > > > RCPM > > > > > > > (for > > > > > > > > > example, RCPM_CPMIMR, RCPM_CPMCIMR). Actually, all > > > irqs to > > > > > > > this CPU > > > > > > > > > have been migrated to other CPUs. > > > > > > > > > > > > > > > > So why do we need to set those bits in RCPM? Is it just > > > > > caution? > > > > > > > > > > > > > > Setting these bits can mask interrupts signalled to RCPM > > > from > > > > > MPIC > > > > > > > as a > > > > > > > means of > > > > > > > waking up from a lower power state. So, cores will not be > > > > > waked up > > > > > > > unexpectedly. > > > > > > > > > > > > Why would the MPIC be signalling those interrupts if they've > > > been > > > > > > masked at > > > > > > the MPIC? > > > > > > > > > > > > -Scott > > > > > > > > > > > > > > > > The interrupts to RCPM from MPIC are IRQ, Machine Check, NMI > > > and > > > > > Critical interrupts. Some of them didn't be masked in MPIC. 
> > > > > > > > What interrupt could actually happen to a sleeping cpu that this > > > > protects > > > > against? > > > > > > > > -Scott > > > > > > Not sure. Maybe spurious interrupts or hardware exceptions. > > > > Spurious interrupts happen due to race conditions. They don't happen > > because > > the MPIC is bored and decides to ring a CPU's doorbell and hide in > > the bushes. > > > > If by "hardware exceptions" you mean machine checks, how would such a > > machine > > check be generated by a core that is off? > > > > > However, setting them make sure dead cpus can not be waked up > > > unexpectedly. > > > > I'm not seeing enough value here to warrant resurrecting the old > > sleep node > > stuff. > > > > -Scott > > My guess maybe not accurate. My point is that electronic parts don't > always work as expected. Taking preventative measures can make the > system more robust. In addition, this step is required in deep sleep > procedure. The deep sleep part is more convincing -- so MPIC masking is not effective during deep sleep? -Scott -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
On 08/07/2015 05:05 PM, Rafael J. Wysocki wrote: Hi Mark, On Fri, Aug 7, 2015 at 7:51 PM, Mark Rutland wrote: [Correcting the devicetree list address, which I typo'd in my original reply] +static const char * const addr_propnames[] = { + "mac-address", + "local-mac-address", + "address", +}; If these are going to be generally necessary, then we should get them adopted as standardised _DSD properties (ideally just one of them). As far as I can tell, and please correct me if I am wrong, ACPI-6.0 doesn't contemplate MAC addresses. Today we are using "mac-address", which is an Integer containing the MAC address in its lowest order 48 bits in Little-Endian byte order. The hardware and ACPI tables are here today, and we would like to support it. If some future ACPI specification specifies a standard way to do this, we will probably adapt the code to do this in a standard manner. [...] +static acpi_status bgx_acpi_register_phy(acpi_handle handle, + u32 lvl, void *context, void **rv) +{ + struct acpi_reference_args args; + const union acpi_object *prop; + struct bgx *bgx = context; + struct acpi_device *adev; + struct device *phy_dev; + u32 phy_id; + + if (acpi_bus_get_device(handle, )) + goto out; + + SET_NETDEV_DEV(>lmac[bgx->lmac_count].netdev, >pdev->dev); + + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); + + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; + + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, )) + goto out; + + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, )) + goto out; Likewise for any inter-device properties, so that we can actually handle them in a generic fashion, and avoid / learn from the mistakes we've already handled with DT. This is the fallacy of the ACPI is superior to DT argument. The specification of PHY topology and MAC addresses is well standardized in DT, there is no question about what the proper way to specify it is. 
Under ACPI, it is the Wild West, there is no specification, so each system design is forced to invent something, and everybody comes up with an incompatible implementation. Indeed. If ACPI is going to handle it, it should handle it properly. I really don't see the point in bodging properties together in a less standard manner than DT, especially for inter-device relationships. Doing so is painful for _everyone_, and it's extremely unlikely that other ACPI-aware OSs will actually support these custom descriptions, making this Linux-specific, and breaking the rationale for using ACPI in the first place -- a standard that says "just do non-standard stuff" is not a usable standard. For intra-device properties, we should standardise what we can, but vendor-specific stuff is ok -- this can be self-contained within a driver. For inter-device relationships ACPI _must_ gain a better model of componentised devices. It's simply unworkable otherwise, and as you point out it's fallacious to say that because ACPI is being used that something is magically industry standard, portable, etc. This is not your problem in particular; the entire handling of _DSD so far is a joke IMO. It is actually useful to people as far as I can say. Also, if somebody is going to use properties with ACPI, why whould they use a different set of properties with DT? Wouldn't it be more reasonable to use the same set in both cases? Yes, but there is still quite a bit of leeway to screw things up. FWIW: http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf This actually seems to have been adopted by the UEFI people as a "Standard", I am not sure where a record of this is kept though. So, we are changing our firmware to use this standard (which is quite similar the the DT with respect to MAC addresses). 
Thanks,
David Daney
Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.
Hi Mark, On Fri, Aug 7, 2015 at 7:51 PM, Mark Rutland wrote: > [Correcting the devicetree list address, which I typo'd in my original > reply] > >> >> +static const char * const addr_propnames[] = { >> >> + "mac-address", >> >> + "local-mac-address", >> >> + "address", >> >> +}; >> > >> > If these are going to be generally necessary, then we should get them >> > adopted as standardised _DSD properties (ideally just one of them). >> >> As far as I can tell, and please correct me if I am wrong, ACPI-6.0 >> doesn't contemplate MAC addresses. >> >> Today we are using "mac-address", which is an Integer containing the MAC >> address in its lowest order 48 bits in Little-Endian byte order. >> >> The hardware and ACPI tables are here today, and we would like to >> support it. If some future ACPI specification specifies a standard way >> to do this, we will probably adapt the code to do this in a standard manner. >> >> >> > >> > [...] >> > >> >> +static acpi_status bgx_acpi_register_phy(acpi_handle handle, >> >> + u32 lvl, void *context, void **rv) >> >> +{ >> >> + struct acpi_reference_args args; >> >> + const union acpi_object *prop; >> >> + struct bgx *bgx = context; >> >> + struct acpi_device *adev; >> >> + struct device *phy_dev; >> >> + u32 phy_id; >> >> + >> >> + if (acpi_bus_get_device(handle, )) >> >> + goto out; >> >> + >> >> + SET_NETDEV_DEV(>lmac[bgx->lmac_count].netdev, >pdev->dev); >> >> + >> >> + acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); >> >> + >> >> + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; >> >> + >> >> + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, )) >> >> + goto out; >> >> + >> >> + if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, >> >> )) >> >> + goto out; >> > >> > Likewise for any inter-device properties, so that we can actually handle >> > them in a generic fashion, and avoid / learn from the mistakes we've >> > already handled with DT. 
>> This is the fallacy of the ACPI is superior to DT argument. The
>> specification of PHY topology and MAC addresses is well standardized
>> in DT, there is no question about what the proper way to specify it
>> is. Under ACPI, it is the Wild West, there is no specification, so
>> each system design is forced to invent something, and everybody comes
>> up with an incompatible implementation.
>
> Indeed.
>
> If ACPI is going to handle it, it should handle it properly. I really
> don't see the point in bodging properties together in a less standard
> manner than DT, especially for inter-device relationships.
>
> Doing so is painful for _everyone_, and it's extremely unlikely that
> other ACPI-aware OSs will actually support these custom descriptions,
> making this Linux-specific, and breaking the rationale for using ACPI
> in the first place -- a standard that says "just do non-standard
> stuff" is not a usable standard.
>
> For intra-device properties, we should standardise what we can, but
> vendor-specific stuff is ok -- this can be self-contained within a
> driver.
>
> For inter-device relationships ACPI _must_ gain a better model of
> componentised devices. It's simply unworkable otherwise, and as you
> point out it's fallacious to say that because ACPI is being used that
> something is magically industry standard, portable, etc.
>
> This is not your problem in particular; the entire handling of _DSD so
> far is a joke IMO.

It is actually useful to people as far as I can say.

Also, if somebody is going to use properties with ACPI, why would they
use a different set of properties with DT?

Wouldn't it be more reasonable to use the same set in both cases?

Thanks,
Rafael
Re: [PATCH] perf, tools, report: Add support for srcfile sort key
Em Fri, Aug 07, 2015 at 08:51:45PM -0300, Arnaldo Carvalho de Melo escreveu: > Em Fri, Aug 07, 2015 at 03:54:24PM -0700, Andi Kleen escreveu: > > From: Andi Kleen > > > > In some cases it's useful to characterize samples by file. This is useful > > to get a higher level categorization, for example to map cost to > > subsystems. > > > > Add a srcfile sort key to perf report. It builds on top of the existing > > srcline support. > > Applied Humm, holding this up a bit, further testing showed some oddities, fedora21, the width of the column is being limited to the lenght of the header and there are some DWARF errors, have you noticed those? [root@zoo ~]# rpm -q binutils-devel binutils-devel-2.24-32.fc21.x86_64 # perf report --stdio -s srcfile Failed to open /tmp/perf-2268.map, continuing without symbols # To display the perf.data header info, please use --header/--header-only options. # BFD: Dwarf Error: Offset (124839) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (111062) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (111062) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (70405) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (113124) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (113124) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (113124) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (113124) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (113124) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (70405) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (70405) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (70405) greater than or equal to .debug_str size (22876). BFD: Dwarf Error: Offset (70405) greater than or equal to .debug_str size (22876). 
[...]
Re: [PATCH 1/3] clk: berlin: add cpuclk
On 07/30, Antoine Tenart wrote:
> Add cpuclk in the Berlin BG2Q clock driver. This clk has a divider
> fixed to 1.
>
> Signed-off-by: Antoine Tenart
> ---

I didn't get anything besides 1/3, but this looks ok. If you don't want
us to take this through the clk tree, feel free to take my ack

Acked-by: Stephen Boyd

>  drivers/clk/berlin/bg2q.c            | 14 +++---
>  include/dt-bindings/clock/berlin2q.h |  3 ++-
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/clk/berlin/bg2q.c b/drivers/clk/berlin/bg2q.c
> index 221f40c2b850..72d2f3500db8 100644
> --- a/drivers/clk/berlin/bg2q.c
> +++ b/drivers/clk/berlin/bg2q.c
> @@ -45,7 +45,7 @@
>  #define REG_SDIO0XIN_CLKCTL	0x0158
>  #define REG_SDIO1XIN_CLKCTL	0x015c
>
> -#define MAX_CLKS 27
> +#define MAX_CLKS 28

Maybe you should move this into the dt-bindings.h file.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Re: linux-next: build failure after merge of the clk tree
On 08/03, Tony Lindgren wrote:
>
> Sorry for the delay in replying, I was offline for a few days. Thanks
> for fixing it up. I assume no need to apply this fix into arm-soc
> for-next but please let me know if you prefer that.
>

Can we apply it into arm-soc? I'm worried we're going to forget about
it, and having the include there in arm-soc (where a3314e9cf69c isn't
present) doesn't hurt anything from what I can tell.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Re: [PATCH] fs: create and use seq_show_option for escaping
On Fri, Aug 7, 2015 at 4:41 PM, Kees Cook wrote: > Many file systems that implement the show_options hook fail to correctly > escape their output which could lead to unescaped characters (e.g. new > lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This > could lead to confusion, spoofed entries (resulting in things like > systemd issuing false d-bus "mount" notifications), and who knows > what else. This looks like it would only be the root user stepping on > themselves, but it's possible weird things could happen in containers > or in other situations with delegated mount privileges. > > Here's an example using overlay with setuid fusermount trusting the > contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of > "sudo" is something more sneaky: > > $ BASE="ovl" > $ MNT="$BASE/mnt" > $ LOW="$BASE/lower" > $ UP="$BASE/upper" > $ WORK="$BASE/work/ 0 0 > none /proc fuse.pwn user_id=1000" > $ mkdir -p "$LOW" "$UP" "$WORK" > $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none > /mnt > $ cat /proc/mounts > none /root/ovl/mnt overlay > rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 > none /proc fuse.pwn user_id=1000 0 0 > $ fusermount -u /proc > $ cat /proc/mounts > cat: /proc/mounts: No such file or directory > > This fixes the problem by adding new seq_show_option and seq_show_option_n > helpers, and updating the vulnerable show_option handlers to use them as > needed. Some, like SELinux, need to be open coded due to unusual existing > escape mechanisms. 
> > Signed-off-by: Kees Cook > Cc: sta...@vger.kernel.org > --- > fs/ceph/super.c | 2 +- > fs/cifs/cifsfs.c | 6 +++--- > fs/ext3/super.c | 4 ++-- > fs/ext4/super.c | 4 ++-- > fs/gfs2/super.c | 6 +++--- > fs/hfs/super.c | 4 ++-- > fs/hfsplus/options.c | 4 ++-- > fs/hostfs/hostfs_kern.c | 2 +- > fs/ocfs2/super.c | 4 ++-- > fs/overlayfs/super.c | 6 +++--- > fs/reiserfs/super.c | 8 +--- > fs/xfs/xfs_super.c | 4 ++-- > include/linux/seq_file.h | 34 ++ > kernel/cgroup.c | 7 --- > net/ceph/ceph_common.c | 7 +-- > security/selinux/hooks.c | 2 +- > 16 files changed, 72 insertions(+), 32 deletions(-) > > diff --git a/fs/ceph/super.c b/fs/ceph/super.c > index d1c833c321b9..7b6bfcbf801c 100644 > --- a/fs/ceph/super.c > +++ b/fs/ceph/super.c > @@ -479,7 +479,7 @@ static int ceph_show_options(struct seq_file *m, struct > dentry *root) > if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT) > seq_printf(m, ",readdir_max_bytes=%d", > fsopt->max_readdir_bytes); > if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT)) > - seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name); > + seq_show_option(m, "snapdirname", fsopt->snapdir_name); > > return 0; > } > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c > index 0a9fb6b53126..6a1119e87fbb 100644 > --- a/fs/cifs/cifsfs.c > +++ b/fs/cifs/cifsfs.c > @@ -394,17 +394,17 @@ cifs_show_options(struct seq_file *s, struct dentry > *root) > struct sockaddr *srcaddr; > srcaddr = (struct sockaddr *)>ses->server->srcaddr; > > - seq_printf(s, ",vers=%s", tcon->ses->server->vals->version_string); > + seq_show_option(s, "vers", tcon->ses->server->vals->version_string); > cifs_show_security(s, tcon->ses); > cifs_show_cache_flavor(s, cifs_sb); > > if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER) > seq_puts(s, ",multiuser"); > else if (tcon->ses->user_name) > - seq_printf(s, ",username=%s", tcon->ses->user_name); > + seq_show_option(s, "username", tcon->ses->user_name); > > if (tcon->ses->domainName) > - seq_printf(s, ",domain=%s", 
tcon->ses->domainName); > + seq_show_option(s, "domain", tcon->ses->domainName); > > if (srcaddr->sa_family != AF_UNSPEC) { > struct sockaddr_in *saddr4; > diff --git a/fs/ext3/super.c b/fs/ext3/super.c > index 5ed0044fbb37..e9312494f3ee 100644 > --- a/fs/ext3/super.c > +++ b/fs/ext3/super.c > @@ -578,10 +578,10 @@ static inline void ext3_show_quota_options(struct > seq_file *seq, struct super_bl > } > > if (sbi->s_qf_names[USRQUOTA]) > - seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]); > + seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]); > > if (sbi->s_qf_names[GRPQUOTA]) > - seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]); > + seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]); > > if (test_opt(sb, USRQUOTA)) > seq_puts(seq, ",usrquota"); > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 58987b5c514b..9981064c4a54 100644 > --- a/fs/ext4/super.c > +++
Re: [PATCH v2] clk: pxa: pxa3xx: fix CKEN register access
On 08/04, Robert Jarzmik wrote:
> Clocks 0 to 31 are on CKENA, and not CKENB. The clock register names
> were inadequately inverted. As a consequence, all clock operations were
> happening on CKENB, because almost all but 2 clocks are on CKENA.
>
> As the clocks were activated by the bootloader in the former tests, it
> escaped the testing that the wrong clock gate was manipulated. The error
> was revealed by changing the pxa3xx-nand driver to a module, where upon
> unloading the wrong clock was disabled in CKENB.
>
> Fixes: 9bbb8a338fb2 ("clk: pxa: add pxa3xx clock driver")
> Signed-off-by: Robert Jarzmik
> ---

Applied to clk-fixes. Sorry, got busy last few days.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Re: [PATCH] perf, tools: Support static linking with libdw
Em Fri, Aug 07, 2015 at 03:18:48PM -0700, Andi Kleen escreveu:
> From: Andi Kleen
>
> The Fedora 22 version of libdw requires a couple of extra libraries
> to link. With a dynamic link the dependencies are pulled in
> automatically, but this doesn't work for static linking. Add the
> needed libraries explicitly to the feature probe and the Makefile.

Thanks, applied.

- Arnaldo
Re: Potential data race in SyS_swapon
Em 07-08-2015 13:14, Andrey Konovalov escreveu:
> Hi!
>
> We are working on a dynamic data race detector for the Linux kernel
> called KernelThreadSanitizer (ktsan)
> (https://github.com/google/ktsan/wiki). While running ktsan on the
> upstream revision 21bdb584af8c with trinity we got a few reports from
> SyS_swapon, here is one of them:
>
> [...]
>
> The race is happening when accessing the swap_file field of a
> swap_info_struct struct.
>
> 2392	for (i = 0; i < nr_swapfiles; i++) {
> 2393		struct swap_info_struct *q = swap_info[i];
> 2394
> 2395		if (q == p || !q->swap_file)
> 2396			continue;
> 2397		if (mapping == q->swap_file->f_mapping) {
> 2398			error = -EBUSY;
> 2399			goto bad_swap;
> 2400		}
> 2401	}
>
> 2539	spin_lock(&swap_lock);
> 2540	p->swap_file = NULL;
> 2541	p->flags = 0;
> 2542	spin_unlock(&swap_lock);

There's another (more important) place which sets the swap_file field to
NULL, it's within swapoff. It's also protected by swap_lock.

> Since the swap_lock lock is not taken in the first snippet, it's
> possible for q->swap_file to be assigned to NULL and reloaded between
> executing lines 2395 and 2397, which might lead to a null pointer
> dereference.

I agree with that analysis. It should be possible to hit by racing
swapon of a file with swapoff of another.

> Looks like the swap_lock should be taken when iterating through the
> swap_info array on lines 2392 - 2401.

I'd take that lock a couple of lines earlier, so that every place that
sets the swap_file field on a swap_info_struct is behind swap_lock, for
simplicity.

--
Cesar Eduardo Barros
ces...@cesarb.eti.br
Re: [PATCH] irqchip: gic-v3: Explicitly include linux/io.h
On Fri, Aug 7, 2015 at 1:40 AM, Marc Zyngier wrote: > Hi Ben, > > On 07/08/15 06:31, Ben Zhang wrote: >> linux/io.h is needed because the driver uses: >> readl_relaxed >> writel_relaxed >> writeq_relaxed >> readq_relaxed >> iounmap >> >> The header was implicitly included by an unrelated >> commit 332fd7c4fef5 >> ("genirq: Generic chip: Change irq_reg_{readl,writel} arguments") >> from the path below: >> include/linux/io.h >> include/linux/irq.h >> include/linux/of_irq.h >> drivers/irqchip/irq-gic-v3.c >> >> Signed-off-by: Ben Zhang >> --- >> drivers/irqchip/irq-gic-v3.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c >> index e406bc5..3350b8d 100644 >> --- a/drivers/irqchip/irq-gic-v3.c >> +++ b/drivers/irqchip/irq-gic-v3.c >> @@ -19,6 +19,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> #include >> > > This seems valid, but out of curiosity: Has this triggered any > observable build failure? Or is that something you found by inspection? > > Thanks, > > M. > -- > Jazz is not dead. It just smells funny... Hi Marc, We have seen a build failure on the v3.18-based kernel for ChromeOS. After cherry-picking commit 74d23cc704d1 ("time: move the timecounter/cyclecounter code into its own file.") into the kernel, irq-gic-v3 build is broken: /mnt/host/source/src/third_party/kernel/v3.18/drivers/irqchip/irq-gic-v3.c: In function 'gic_do_wait_for_rwp': /mnt/host/source/src/third_party/kernel/v3.18/drivers/irqchip/irq-gic-v3.c:84:2: error: implicit declaration of function 'readl_relaxed' [-Werror=implicit-function-declaration] while (readl_relaxed(base + GICD_CTLR) & GICD_CTLR_RWP) { ^ ... The timecounter patch removes #include in include/clocksource/arm_arch_timer.h, and breaks all include paths to the arm64 io.h header, e.g. 
arch/arm64/include/asm/io.h include/linux/clocksource.h include/clocksource/arm_arch_timer.h arch/arm64/include/asm/arch_timer.h arch/arm64/include/asm/timex.h include/linux/timex.h include/linux/sched.h arch/arm64/include/asm/compat.h arch/arm64/include/asm/stat.h include/linux/stat.h include/linux/sysfs.h include/linux/kobject.h include/linux/device.h include/linux/node.h include/linux/cpu.h drivers/irqchip/irq-gic-v3.c Mainline build is fine because commit 332fd7c4fef5("genirq: Generic chip: Change irq_reg_{readl,writel} arguments") landed before the timecounter patch, and it creates a new include path to the io.h header: arch/arm64/include/asm/io.h include/linux/io.h include/linux/irq.h include/linux/of_irq.h drivers/irqchip/irq-gic-v3.c Thanks, Ben
Re: [PATCH v4 4/9] dax: fix mapping lifetime handling, convert to __pfn_t + kmap_atomic_pfn_t()
On Sat, Jun 6, 2015 at 4:58 AM, Matthew Wilcox wrote: > On Fri, Jun 05, 2015 at 05:19:24PM -0400, Dan Williams wrote: >> @@ -35,13 +35,16 @@ int dax_clear_blocks(struct inode *inode, sector_t >> block, long size) >> might_sleep(); >> do { >> void *addr; >> - unsigned long pfn; >> + __pfn_t pfn; >> long count; >> >> - count = bdev_direct_access(bdev, sector, &addr, &pfn, size); >> + count = bdev_direct_access(bdev, sector, &pfn, size); >> if (count < 0) >> return count; >> BUG_ON(size < count); >> + addr = kmap_atomic_pfn_t(pfn); >> + if (!addr) >> + return -EIO; >> while (count > 0) { >> unsigned pgsz = PAGE_SIZE - offset_in_page(addr); >> if (pgsz > count) > > This part is incomplete. When bdev_direct_access() could return an > address, it was possible for that address to be unaligned (eg when > 'sector' was not a multiple of 8). DAX has never had full support for > devices that weren't a 4k sector size, but I was trying to not make that > assumption in more places than I had to. So this function needs a lot > more simplification (or it needs to add '(sector & 7) << 9' to addr ... > assuming that the partition this bdev represents actually starts at a > multiple of 8 ... bleh!). Isn't this already handled by the: if (sector % (PAGE_SIZE / 512)) return -EINVAL; ...check in bdev_direct_access()? As long as the driver's mapping is 4K aligned, which appears to be the case for all DAX-enabled drivers, then we should be good to go. >> >> -static long dax_get_addr(struct buffer_head *bh, void **addr, unsigned >> blkbits) >> +static long dax_get_pfn(struct buffer_head *bh, __pfn_t *pfn, unsigned >> blkbits) >> { >> - unsigned long pfn; >> sector_t sector = bh->b_blocknr << (blkbits - 9); >> - return bdev_direct_access(bh->b_bdev, sector, addr, &pfn, bh->b_size); >> + return bdev_direct_access(bh->b_bdev, sector, pfn, bh->b_size); >> } > > This function should just be deleted. It offers essentially nothing > over just calling bdev_direct_access(). Ok.
Re: [PATCH] perf, tools, report: Add support for srcfile sort key
Em Fri, Aug 07, 2015 at 03:54:24PM -0700, Andi Kleen escreveu: > From: Andi Kleen > > In some cases it's useful to characterize samples by file. This is useful > to get a higher level categorization, for example to map cost to > subsystems. > > Add a srcfile sort key to perf report. It builds on top of the existing > srcline support. Applied > Signed-off-by: Andi Kleen > --- > tools/perf/Documentation/perf-report.txt | 2 ++ > tools/perf/util/hist.c | 2 ++ > tools/perf/util/hist.h | 1 + > tools/perf/util/sort.c | 52 > > tools/perf/util/sort.h | 2 ++ > 5 files changed, 59 insertions(+) > > diff --git a/tools/perf/Documentation/perf-report.txt > b/tools/perf/Documentation/perf-report.txt > index aabb1b4..724ab3f 100644 > --- a/tools/perf/Documentation/perf-report.txt > +++ b/tools/perf/Documentation/perf-report.txt > @@ -81,6 +81,8 @@ OPTIONS > - cpu: cpu number the task ran at the time of sample > - srcline: filename and line number executed at the time of sample. The > DWARF debugging info must be provided. > + - srcfile: file name of the source file of the same. Requires dwarf > + information. > - weight: Event specific weight, e.g. memory latency or transaction > abort cost. This is the global weight. > - local_weight: Local weight version of the weight above. 
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c > index 6f28d53..37dd8ae 100644 > --- a/tools/perf/util/hist.c > +++ b/tools/perf/util/hist.c > @@ -944,6 +944,8 @@ void hist_entry__delete(struct hist_entry *he) > > zfree(>stat_acc); > free_srcline(he->srcline); > + if (he->srcfile && he->srcfile[0]) > + free(he->srcfile); > free_callchain(he->callchain); > free(he); > } > diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h > index 5ed8d9c..3be8087 100644 > --- a/tools/perf/util/hist.h > +++ b/tools/perf/util/hist.h > @@ -30,6 +30,7 @@ enum hist_column { > HISTC_PARENT, > HISTC_CPU, > HISTC_SRCLINE, > + HISTC_SRCFILE, > HISTC_MISPREDICT, > HISTC_IN_TX, > HISTC_ABORT, > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c > index 4c65a14..e3e8b13 100644 > --- a/tools/perf/util/sort.c > +++ b/tools/perf/util/sort.c > @@ -319,6 +319,57 @@ struct sort_entry sort_srcline = { > .se_width_idx = HISTC_SRCLINE, > }; > > +/* --sort srcfile */ > + > +static char no_srcfile[1]; > + > +static char *get_srcfile(struct hist_entry *e) > +{ > + char *sf, *p; > + struct map *map = e->ms.map; > + > + sf = get_srcline(map->dso, map__rip_2objdump(map, e->ip), > + e->ms.sym, true); > + p = strchr(sf, ':'); > + if (p && *sf) { > + *p = 0; > + return sf; > + } > + free(sf); > + return no_srcfile; > +} > + > +static int64_t > +sort__srcfile_cmp(struct hist_entry *left, struct hist_entry *right) > +{ > + if (!left->srcfile) { > + if (!left->ms.map) > + left->srcfile = no_srcfile; > + else > + left->srcfile = get_srcfile(left); > + } > + if (!right->srcfile) { > + if (!right->ms.map) > + right->srcfile = no_srcfile; > + else > + right->srcfile = get_srcfile(right); > + } > + return strcmp(right->srcfile, left->srcfile); > +} > + > +static int hist_entry__srcfile_snprintf(struct hist_entry *he, char *bf, > + size_t size, unsigned int width) > +{ > + return repsep_snprintf(bf, size, "%-*.*s", width, width, he->srcfile); > +} > + > +struct sort_entry sort_srcfile 
= { > + .se_header = "Source File", > + .se_cmp = sort__srcfile_cmp, > + .se_snprintf= hist_entry__srcfile_snprintf, > + .se_width_idx = HISTC_SRCFILE, > +}; > + > /* --sort parent */ > > static int64_t > @@ -1173,6 +1224,7 @@ static struct sort_dimension common_sort_dimensions[] = > { > DIM(SORT_PARENT, "parent", sort_parent), > DIM(SORT_CPU, "cpu", sort_cpu), > DIM(SORT_SRCLINE, "srcline", sort_srcline), > + DIM(SORT_SRCFILE, "srcfile", sort_srcfile), > DIM(SORT_LOCAL_WEIGHT, "local_weight", sort_local_weight), > DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight), > DIM(SORT_TRANSACTION, "transaction", sort_transaction), > diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h > index e97cd47..13705b2 100644 > --- a/tools/perf/util/sort.h > +++ b/tools/perf/util/sort.h > @@ -114,6 +114,7 @@ struct hist_entry { > }; > }; > char*srcline; > + char*srcfile; > struct symbol *parent; > struct rb_root sorted_chain; > struct branch_info *branch_info; > @@ -172,6 +173,7 @@ enum
Re: [PATCH v2 1/4] Add generic correlated clocksource code and ART to TSC conversion code
On 08/07/2015 04:01 PM, Christopher Hall wrote: Original patch description: Subject: ptp: Get sync timestamps From: Thomas Gleixner Date: Wed, 29 Jul 2015 10:52:06 +0200 The ART stuff wants to be splitted out. Changes === Add struct correlated_cs (clocksource) with pointer to original clocksource and function pointer to convert correlated clocksource to the original Add struct correlated_ts (timestamp) with function pointer to read correlated clocksource, device and system (in terms of correlated clocksource) counter values (input) with resulting converted real and monotonic raw system times (output) Add get_correlated_timestamp() function which given specific correlated_cs and correlated_ts convert correlated counter value to system time Add art_to_tsc conversion function translated Always Running Timer (ART) to TSC value --- arch/x86/kernel/tsc.c | 31 ++ include/linux/clocksource.h | 30 + include/linux/timekeeping.h | 4 +++ kernel/time/timekeeping.c | 63 + 4 files changed, 128 insertions(+) diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 7437b41..a90aa6a 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1059,6 +1059,27 @@ int unsynchronized_tsc(void) return 0; } +static u32 tsc_numerator; +static u32 tsc_denominator; +/* + * CHECKME: Do we need the adjust value? It should be 0, but if we run + * in a VM this might be a different story. + */ +static u64 tsc_adjust; + +static u64 art_to_tsc(u64 cycles) +{ + u64 tmp, res = tsc_adjust; + + res += (cycles / tsc_denominator) * tsc_numerator; + tmp = (cycles % tsc_denominator) * tsc_numerator; + res += tmp / tsc_denominator; + return res; Nice trick! 
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 278dd27..2ed3d0c 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -258,4 +258,34 @@ void acpi_generic_timer_init(void); static inline void acpi_generic_timer_init(void) { } #endif +/** + * struct correlated_cs - Descriptor for a clocksource correlated to another clocksource + * @related_cs:Pointer to the related timekeeping clocksource + * @convert: Conversion function to convert a timestamp from + * the correlated clocksource to cycles of the related + * timekeeping clocksource + */ +struct correlated_cs { + struct clocksource *related_cs; + u64 (*convert)(u64 cycles); Should the name make it clearer which way it converts? For example, convert_to_related? We might also want convert_from_related. --Andy
Re: [PATCH v2 00/11] test_user_copy improvements
On Fri, Aug 7, 2015 at 8:21 AM, James Hogan wrote: > These patches extend the test_user_copy test module to handle lots more > cases of user accessors which architectures can override separately, and > in particular those which are important for checking the MIPS Enhanced > Virtual Addressing (EVA) implementations, which need to handle > overlapping user and kernel address spaces, with special instructions > for accessing user address space from kernel mode. > > - Checking that kernel pointers are accepted when user address limit is > set to KERNEL_DS, as done by the kernel when it internally invokes > system calls with kernel pointers. > - Checking of the unchecked accessors (which don't call access_ok()). > Some of the tests are special cased for EVA at the moment which has > stricter hardware guarantees for bad user accesses than other > configurations. > - Checking of other sets of user accessors, including the inatomic user > copies, clear_user, compatibility accessors (copy_in_user and > _unaligned), the user string accessors, and the user checksum > functions, all of which need special handling in arch code with EVA. > > Tested on MIPS with and without EVA, and on x86_64. > > Only build tested for arm, blackfin, metag, microblaze, openrisc, > parisc, powerpc, sh, sparc, tile, i386 & xtensa. > > All arches were audited for the appropriate exports, only score is known > to still be missing some. > > Changes in v2: > - Add arch exports (patches 1-4). > - Reorder patches slightly. > - Patch 9: Drop strlen_user test. Microblaze doesn't define it, and > nothing actually uses it. IMO it should be removed, and there's no > point testing it in the mean time. > - Patch 10: Conditionalise on CONFIG_COMPAT, otherwise it breaks build > on some 32-bit arches e.g. i386 (kbuild test robot). > - Patch 10: Add testing of _unaligned accessors, which are also > conditional upon CONFIG_COMPAT. 
> - Patch 11: Only test csum_partial_copy_from_user #ifndef > _HAVE_ARCH_COPY_AND_CSUM_FROM_USER, fixing powerpc64 build (Stephen > Rothwell) Thanks for the fixes! Acked-by: Kees Cook -Kees > > James Hogan (11): > microblaze: Export __strnlen_user to modules > nios2: Export strncpy_from_user / strnlen_user to modules > openrisc: Export __clear_user to modules > xtensa: Export __strnlen_user to modules > test_user_copy: Check legit kernel accesses > test_user_copy: Check unchecked accessors > test_user_copy: Check __copy_{to,from}_user_inatomic() > test_user_copy: Check __clear_user()/clear_user() > test_user_copy: Check user string accessors > test_user_copy: Check user compatibility accessors > test_user_copy: Check user checksum functions > > arch/microblaze/kernel/microblaze_ksyms.c | 1 + > arch/nios2/mm/uaccess.c | 2 + > arch/openrisc/kernel/or32_ksyms.c | 1 + > arch/xtensa/kernel/xtensa_ksyms.c | 1 + > lib/test_user_copy.c | 251 > ++ > 5 files changed, 256 insertions(+) > > Cc: Kees Cook > Cc: Andrew Morton > Cc: Michal Simek > Cc: Ley Foon Tan > Cc: Jonas Bonn > Cc: Chris Zankel > Cc: Max Filippov > Cc: nios2-...@lists.rocketboards.org > Cc: li...@lists.openrisc.net > Cc: linux-xte...@linux-xtensa.org > -- > 2.3.6 > -- Kees Cook Chrome OS Security
Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2
On Fri, 2015-08-07 at 16:26 -0700, Luis R. Rodriguez wrote: > On Fri, Aug 7, 2015 at 4:08 PM, Toshi Kani wrote: > > On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote: > > > On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani wrote: > > > > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote: > > > > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani > > > > > wrote: > > > > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote: > > > > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani > > > > > > > > > > > > > > wrote: : > > > > > > > > > > Its a bit more than that though. Since you agree that the OS can > > > > > live without MTRR code I was hoping to then see if we can fold out > > > > > PAT Linux code from under the MTRR dependency on Linux and make > > > > > PAT a first class citizen, maybe at least for x86-64. Right now > > > > > you can only get PAT support on Linux if you have MTRR code, but > > > > > I'd like to see if instead we can rip MTRR code out completely > > > > > under its own Kconfig and let it start rotting away. > > > > > > > > > > Code-wise the only issue I saw was that PAT code also relies on > > > > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I > > > > > found no other obvious issues. > > > > > > > > We can rip of the MTTR code that modifies the MTRR setup, but not > > > > mtrr_type_lookup(). This function provides necessary checks per > > > > documented > > > > in commit 7f0431e3dc89 as follows. > > > > > > > > 1) reserve_memtype() tracks an effective memory type in case > > > >a request type is WB (ex. /dev/mem blindly uses WB). Missing > > > >to track with its effective type causes a subsequent request > > > >to map the same range with the effective type to fail. > > > > > > > > 2) pud_set_huge() and pmd_set_huge() check if a requested range > > > >has any overlap with MTRRs. Missing to detect an overlap may > > > >cause a performance penalty or undefined behavior. 
> > > > > > > > mtrr_type_lookup() is still admittedly awkward, but I do not think > > > > we > > > > have an immediate issue in PAT code calling it. I do not think it > > > > makes > > > > PAT code a second class citizen. > > > > > > OK since we know that if MTRR set up code ends up disabled and would > > > return MTRR_TYPE_INVALID what if we just static inline this for the > > > no-MTRR Kconfig build option immediately, and only then have the full > > > blown implementation for the case where MTRR Kconfig option is > > > enabled? > > > > Yes, the MTRR code could be disabled by Kconfig with such inline stubs > > OK thanks. > > > as > > long as the kernel is built specifically for a particular platform with > > MTRR disabled, such as Xen guest kernel. > > Sure. > > > However, since MTRR is a CPU feature enabled on most of the systems, I > > am not sure if it makes sense to be configurable with Kconfig, though. > > To me this is about making PAT a first class citizen in code though > and validating through Kconfig the option then to opt-out of MTRR from > OS code. Perhaps we can recommend to enable it but having the options > to split out PAT from MTRR is what I was aiming for. Since we have CONFIG_MTRR already, we do not need to argue over this option. :-) It makes sense since when MTRR code was introduced, there were CPUs without this capability... > > > > > Platform firmware and SMIs seems to be the only other possible > > > > > issue. More on this below. > > > > > > > > > > > > For those type of OSes... > > > > > > > could it be possible to negotiate or hint to the platform > > > > > > > through an attribute somehow that the OS has such capability > > > > > > > to not use MTRR? > > > > > > > > > > > > The OS can disable MTRR. However, this can also cause a problem > > > > > > in firmware, which may rely on MTRR. > > > > > > > > > > Can you describe what type of issues we could expect ? 
I tend to > > > > > care more about this for 64-bit systems so if 32-bit platforms > > > > > would be more of the ones which could cause an issue would > > > > > restricting disabling MTRR only for 64-bit help? > > > > > > > > The SMI handler runs in real-mode and relies on MTRR being effective > > > > to provide right cache types. It does not matter if it is 64-bit or > > > > not. > > > > > > I see... since I have no visibility to what goes under the hood, can > > > you provide one example use case where an SMI handler would require > > > getting a cache type through MTRR ? I realize this can vary, vendor by > > > vendor, but any example would do just to satisfy my curiosity. > > > > For fan control, it would need UC access to its registers. > > OK thanks! To follow up with the example, since the platform firmware > would have set up the MTRRs anyway, the SMI should still work, even if > the OS didn't do anything, right? Yes, MTRR works without the OS code. However, mtrr_type_lookup() is necessary to
Re: [PATCH 17/26] clk: rockchip: Convert to clk_hw based provider APIs
On 08/04, Heiko Stübner wrote: > Am Freitag, 31. Juli 2015, 10:03:57 schrieb Stephen Boyd: > > We're removing struct clk from the clk provider API, so switch > > this code to using the clk_hw based provider APIs. > > > > Cc: Heiko Stuebner > > Signed-off-by: Stephen Boyd > > Looks correct to me and there are no functional changes in here anyway > Reviewed-by: Heiko Stuebner Thanks. There was one bug, but I fixed it ;-) ---8<--- diff --git a/drivers/clk/rockchip/clk-pll.c b/drivers/clk/rockchip/clk-pll.c index 198261e001fa..eab430452c52 100644 --- a/drivers/clk/rockchip/clk-pll.c +++ b/drivers/clk/rockchip/clk-pll.c @@ -295,7 +295,7 @@ static void rockchip_rk3066_pll_init(struct clk_hw *hw) if (!parent) { pr_warn("%s: parent of %s not available\n", - __func__, clk_hw_get_parent(hw)); + __func__, clk_hw_get_name(hw)); return; } -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH] fs: create and use seq_show_option for escaping
Many file systems that implement the show_options hook fail to correctly escape their output which could lead to unescaped characters (e.g. new lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This could lead to confusion, spoofed entries (resulting in things like systemd issuing false d-bus "mount" notifications), and who knows what else. This looks like it would only be the root user stepping on themselves, but it's possible weird things could happen in containers or in other situations with delegated mount privileges. Here's an example using overlay with setuid fusermount trusting the contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of "sudo" is something more sneaky: $ BASE="ovl" $ MNT="$BASE/mnt" $ LOW="$BASE/lower" $ UP="$BASE/upper" $ WORK="$BASE/work/ 0 0 none /proc fuse.pwn user_id=1000" $ mkdir -p "$LOW" "$UP" "$WORK" $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt $ cat /proc/mounts none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 none /proc fuse.pwn user_id=1000 0 0 $ fusermount -u /proc $ cat /proc/mounts cat: /proc/mounts: No such file or directory This fixes the problem by adding new seq_show_option and seq_show_option_n helpers, and updating the vulnerable show_option handlers to use them as needed. Some, like SELinux, need to be open coded due to unusual existing escape mechanisms. 
Signed-off-by: Kees Cook Cc: sta...@vger.kernel.org --- fs/ceph/super.c | 2 +- fs/cifs/cifsfs.c | 6 +++--- fs/ext3/super.c | 4 ++-- fs/ext4/super.c | 4 ++-- fs/gfs2/super.c | 6 +++--- fs/hfs/super.c | 4 ++-- fs/hfsplus/options.c | 4 ++-- fs/hostfs/hostfs_kern.c | 2 +- fs/ocfs2/super.c | 4 ++-- fs/overlayfs/super.c | 6 +++--- fs/reiserfs/super.c | 8 +--- fs/xfs/xfs_super.c | 4 ++-- include/linux/seq_file.h | 34 ++ kernel/cgroup.c | 7 --- net/ceph/ceph_common.c | 7 +-- security/selinux/hooks.c | 2 +- 16 files changed, 72 insertions(+), 32 deletions(-) diff --git a/fs/ceph/super.c b/fs/ceph/super.c index d1c833c321b9..7b6bfcbf801c 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -479,7 +479,7 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root) if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT) seq_printf(m, ",readdir_max_bytes=%d", fsopt->max_readdir_bytes); if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT)) - seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name); + seq_show_option(m, "snapdirname", fsopt->snapdir_name); return 0; } diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index 0a9fb6b53126..6a1119e87fbb 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -394,17 +394,17 @@ cifs_show_options(struct seq_file *s, struct dentry *root) struct sockaddr *srcaddr; srcaddr = (struct sockaddr *)>ses->server->srcaddr; - seq_printf(s, ",vers=%s", tcon->ses->server->vals->version_string); + seq_show_option(s, "vers", tcon->ses->server->vals->version_string); cifs_show_security(s, tcon->ses); cifs_show_cache_flavor(s, cifs_sb); if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER) seq_puts(s, ",multiuser"); else if (tcon->ses->user_name) - seq_printf(s, ",username=%s", tcon->ses->user_name); + seq_show_option(s, "username", tcon->ses->user_name); if (tcon->ses->domainName) - seq_printf(s, ",domain=%s", tcon->ses->domainName); + seq_show_option(s, "domain", tcon->ses->domainName); if (srcaddr->sa_family != AF_UNSPEC) { 
struct sockaddr_in *saddr4; diff --git a/fs/ext3/super.c b/fs/ext3/super.c index 5ed0044fbb37..e9312494f3ee 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -578,10 +578,10 @@ static inline void ext3_show_quota_options(struct seq_file *seq, struct super_bl } if (sbi->s_qf_names[USRQUOTA]) - seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]); + seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]); if (sbi->s_qf_names[GRPQUOTA]) - seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]); + seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]); if (test_opt(sb, USRQUOTA)) seq_puts(seq, ",usrquota"); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 58987b5c514b..9981064c4a54 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1763,10 +1763,10 @@ static inline void ext4_show_quota_options(struct seq_file *seq, } if (sbi->s_qf_names[USRQUOTA]) - seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]); + seq_show_option(seq,
Re: [RFC PATCH v4 1/9] mmc: dw_mmc: Add external dma interface support
On 2015/8/8 5:32, Joachim Eastwood wrote: Hi Shawn, On 6 August 2015 at 08:44, Shawn Lin wrote: DesignWare MMC Controller can supports two types of DMA mode: external dma and internal dma. We get a RK312x platform integrated dw_mmc and ARM pl330 dma controller. This patch add edmac ops to support these platforms. I've tested it on RK312x platform with edmac mode and RK3288 platform with idmac mode. Signed-off-by: Shawn Lin @@ -2256,26 +2373,30 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id) } -#ifdef CONFIG_MMC_DW_IDMAC - /* Handle DMA interrupts */ - if (host->dma_64bit_address == 1) { - pending = mci_readl(host, IDSTS64); - if (pending & (SDMMC_IDMAC_INT_TI | SDMMC_IDMAC_INT_RI)) { - mci_writel(host, IDSTS64, SDMMC_IDMAC_INT_TI | - SDMMC_IDMAC_INT_RI); - mci_writel(host, IDSTS64, SDMMC_IDMAC_INT_NI); - host->dma_ops->complete(host); - } - } else { - pending = mci_readl(host, IDSTS); - if (pending & (SDMMC_IDMAC_INT_TI | SDMMC_IDMAC_INT_RI)) { - mci_writel(host, IDSTS, SDMMC_IDMAC_INT_TI | - SDMMC_IDMAC_INT_RI); - mci_writel(host, IDSTS, SDMMC_IDMAC_INT_NI); - host->dma_ops->complete(host); + if (host->use_dma == TRANS_MODE_IDMAC) { Doing: if (host->use_dma != TRANS_MODE_IDMAC) return IRQ_HANDLED; Okay. Could save you the extra level of indentation you add below.
+ /* Handle DMA interrupts */ + if (host->dma_64bit_address == 1) { + pending = mci_readl(host, IDSTS64); + if (pending & (SDMMC_IDMAC_INT_TI | + SDMMC_IDMAC_INT_RI)) { + mci_writel(host, IDSTS64, + SDMMC_IDMAC_INT_TI | + SDMMC_IDMAC_INT_RI); + mci_writel(host, IDSTS64, SDMMC_IDMAC_INT_NI); + host->dma_ops->complete((void *)host); + } + } else { + pending = mci_readl(host, IDSTS); + if (pending & (SDMMC_IDMAC_INT_TI | + SDMMC_IDMAC_INT_RI)) { + mci_writel(host, IDSTS, + SDMMC_IDMAC_INT_TI | + SDMMC_IDMAC_INT_RI); + mci_writel(host, IDSTS, SDMMC_IDMAC_INT_NI); + host->dma_ops->complete((void *)host); + } } } -#endif return IRQ_HANDLED; } @@ -2437,6 +2567,21 @@ static void dw_mci_cleanup_slot(struct dw_mci_slot *slot, unsigned int id) static void dw_mci_init_dma(struct dw_mci *host) { int addr_config; + int trans_mode; + struct device *dev = host->dev; + struct device_node *np = dev->of_node; + + /* Check tansfer mode */ + trans_mode = (mci_readl(host, HCON) >> 16) & 0x3; I think it would be nice if you could add some defines for 16 and 0x03 or add a macro like SDMMC_GET_FCNT() that is in dw_mmc.h. yes, it's better to avoid magic number for register operation to make others understand w/o checking databook for details. And might more than one (e.g "Check ADDR_CONFIG bit in HCON to find IDMAC address bus width") should be modified. Although one patch only do one thing, I will drop by to make it in v5. + if (trans_mode == 0) { + trans_mode = TRANS_MODE_IDMAC; + } else if (trans_mode == 1 || trans_mode == 2) { + trans_mode = TRANS_MODE_EDMAC; + } else { + trans_mode = TRANS_MODE_PIO; + goto no_dma; + } + /* Check ADDR_CONFIG bit in HCON to find IDMAC address bus width */ addr_config = (mci_readl(host, HCON) >> 27) & 0x01; I'll try to get this patch tested on my lpc18xx platform soon. btw, the HCON reg on lpc18xx reads as 0x00e42cc1 (address 0x40004070). yes, HCON[17:16] is 2b'00 means your lpc18xx use IDMAC. 
regard, Joachim Eastwood -- Shawn Lin
Re: [PATCH] perf, tools: Support full source file paths for srcline
Em Fri, Aug 07, 2015 at 03:24:05PM -0700, Andi Kleen escreveu: > From: Andi Kleen > > For perf report/script srcline currently only the base file name > of the source file is printed. This is a good default because > it usually fits on the screen. > > But in some cases we want to know the full file name, > for example to aggregate hits per file. > > In the latter case we need more than the base file name > to resolve file naming collisions: for example the kernel source > has ~70 files named "core.c" > > It's also useful as input to post processing tools which > want to point to the right file. > > Add a flag to allow full file name output. > > Add an option to perf report/script to enable this option. Applied
[GIT PULL] MODSIGN: Use PKCS#7 for module signatures [ver #7]
Hi James, Can you pull this into security/next please? Its aim is twofold: firstly, make the module signatures of PKCS#7/CMS format rather than a home-brewed format and secondly to pave the way for use of the signing code for firmware signatures (to follow later). To this end, the patchset effects the following changes: (1) Extracts both parts of an X.509 AuthorityKeyIdentifier (AKID) extension. We already extract the bit that can match the subjectKeyIdentifier (SKID) of the parent X.509 cert, but we currently ignore the bits that can match the issuer and serialNumber. Looks up an X.509 cert by issuer and serialNumber if those are provided in the AKID. If the keyIdentifier is also provided, checks that the subjectKeyIdentifier of the cert found matches that also. If no issuer and serialNumber are provided in the AKID, looks up an X.509 cert by SKID using the AKID keyIdentifier. This allows module signing to be done with certificates that don't have an SKID by which they can be looked up. (2) Makes use of the PKCS#7 facility to provide module signatures. sign-file is replaced with a program that generates a PKCS#7 message that has no X.509 certs embedded and that has detached data (the module content) and adds it onto the message with magic string and descriptor. (3) The PKCS#7 message supplies all the information that is needed to select the X.509 cert to be used to verify the signature by standard means (including selection of digest algorithm and public key algorithm). No kernel-specific magic values are required. (4) Makes it possible to get sign-file to just write out a file containing the PKCS#7 signature blob. This can be used for debugging and potentially for firmware signing. (5) Extracts the function that does PKCS#7 signature verification on a blob from the module signing code and put it somewhere more general so that other things, such as firmware signing, can make use of it without depending on module config options. 
(6) Adds support for CMS messages in place of PKCS#7 (the ASN.1 is very similar) and makes sign-file create CMS messages instead of PKCS#7. This allows signatures to refer to the verifying key by X.509 cert SKID instead of X.509 cert issuer and serial number. (7) Adds support for supplying a password/pin for an encrypted private key to sign-file. (8) Makes it possible to use PKCS#11 with sign-file, thus allowing the use of cryptographic hardware. (9) Overhauls the way the module signing key is handled. If the name in CONFIG_MODULE_SIG_KEY is "signing_key.pem" then a key will be automatically generated and placed in the build directory. If the name is different, autogeneration is suppressed and the file is presumed to be a PEM file containing both the private key and X.509 certificate. (10) Overhauls the way auxiliary trusted keys are added to the kernel. Files matching the pattern "*.x509" are no longer just gathered up and cat'd together. Now CONFIG_SYSTEM_TRUSTED_KEYS must be set to point to a single PEM file containing a set of X.509 certs cat'd together if this facility is desired. (11) Severely restricts what authenticatedAttributes are permitted in a PKCS#7 or CMS message and what content type may be used. This is selected by the in-kernel user with the appropriate VERIFYING_*_SIGNATURE constant. Note that the revised sign-file program no longer supports the "-s " option to add an externally generated signature. This is deprecated in favour of using PKCS#11. Note also that the format of the signature file that would be passed to -s has changed. 
Thanks, David --- The following changes since commit 459c15e53cf7e4e88a78ecfb109af5a267c5500a: Merge tag 'asn1-fixes-20150805' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next (2015-08-07 13:27:58 +1000) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/modsign-pkcs7-20150807 for you to fetch changes up to d158553fdadb0501551a50fcb9537f73d3ed8e01: PKCS#7: Appropriately restrict authenticated attributes and content type (2015-08-07 16:26:15 +0100) Module signing with PKCS#7 David Howells (16): ASN.1: Add an ASN.1 compiler option to dump the element tree ASN.1: Copy string names to tokens in ASN.1 compiler X.509: Extract both parts of the AuthorityKeyIdentifier X.509: Support X.509 lookup by Issuer+Serial form AuthorityKeyIdentifier PKCS#7: Allow detached data to be supplied for signature checking purposes MODSIGN: Provide a utility to append a PKCS#7 signature to a module MODSIGN: Use PKCS#7
[PATCH] clk: si5351: Convert __clk_get_name() to clk_hw_get_name()
This driver uses __clk_get_name() when it's dealing with struct clk_hw pointers. Use the simpler form so that we can get rid of the clk member in struct clk_hw one day. Cc: Sebastian Hesselbarth Signed-off-by: Stephen Boyd --- This depends on the series that introduces clk_hw_get_name()[1]. [1] http://lkml.kernel.org/r/<1438362246-6664-1-git-send-email-sb...@codeaurora.org> drivers/clk/clk-si5351.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/clk/clk-si5351.c b/drivers/clk/clk-si5351.c index 4b2ee39f05fd..5596c0aac22f 100644 --- a/drivers/clk/clk-si5351.c +++ b/drivers/clk/clk-si5351.c @@ -439,7 +439,7 @@ static unsigned long si5351_pll_recalc_rate(struct clk_hw *hw, dev_dbg(&hwdata->drvdata->client->dev, "%s - %s: p1 = %lu, p2 = %lu, p3 = %lu, parent_rate = %lu, rate = %lu\n", - __func__, __clk_get_name(hwdata->hw.clk), + __func__, clk_hw_get_name(hw), hwdata->params.p1, hwdata->params.p2, hwdata->params.p3, parent_rate, (unsigned long)rate); @@ -497,7 +497,7 @@ static long si5351_pll_round_rate(struct clk_hw *hw, unsigned long rate, dev_dbg(&hwdata->drvdata->client->dev, "%s - %s: a = %lu, b = %lu, c = %lu, parent_rate = %lu, rate = %lu\n", - __func__, __clk_get_name(hwdata->hw.clk), a, b, c, + __func__, clk_hw_get_name(hw), a, b, c, *parent_rate, rate); return rate; @@ -521,7 +521,7 @@ static int si5351_pll_set_rate(struct clk_hw *hw, unsigned long rate, dev_dbg(&hwdata->drvdata->client->dev, "%s - %s: p1 = %lu, p2 = %lu, p3 = %lu, parent_rate = %lu, rate = %lu\n", - __func__, __clk_get_name(hwdata->hw.clk), + __func__, clk_hw_get_name(hw), hwdata->params.p1, hwdata->params.p2, hwdata->params.p3, parent_rate, rate); @@ -632,7 +632,7 @@ static unsigned long si5351_msynth_recalc_rate(struct clk_hw *hw, dev_dbg(&hwdata->drvdata->client->dev, "%s - %s: p1 = %lu, p2 = %lu, p3 = %lu, m = %lu, parent_rate = %lu, rate = %lu\n", - __func__, __clk_get_name(hwdata->hw.clk), + __func__, clk_hw_get_name(hw), hwdata->params.p1, hwdata->params.p2, hwdata->params.p3, m, parent_rate, (unsigned long)rate); @@ -745,7 +745,7 @@ static long si5351_msynth_round_rate(struct clk_hw *hw, unsigned long rate, dev_dbg(&hwdata->drvdata->client->dev, "%s - %s: a = %lu, b = %lu, c = %lu, divby4 = %d, parent_rate = %lu, rate = %lu\n", - __func__, __clk_get_name(hwdata->hw.clk), a, b, c, divby4, + __func__, clk_hw_get_name(hw), a, b, c, divby4, *parent_rate, rate); return rate; @@ -777,7 +777,7 @@ static int si5351_msynth_set_rate(struct clk_hw *hw, unsigned long rate, dev_dbg(&hwdata->drvdata->client->dev, "%s - %s: p1 = %lu, p2 = %lu, p3 = %lu, divby4 = %d, parent_rate = %lu, rate = %lu\n", - __func__, __clk_get_name(hwdata->hw.clk), + __func__, clk_hw_get_name(hw), hwdata->params.p1, hwdata->params.p2, hwdata->params.p3, divby4, parent_rate, rate); @@ -1042,7 +1042,7 @@ static long si5351_clkout_round_rate(struct clk_hw *hw, unsigned long rate, dev_dbg(&hwdata->drvdata->client->dev, "%s - %s: rdiv = %u, parent_rate = %lu, rate = %lu\n", - __func__, __clk_get_name(hwdata->hw.clk), (1 << rdiv), + __func__, clk_hw_get_name(hw), (1 << rdiv), *parent_rate, rate); return rate; @@ -1093,7 +1093,7 @@ static int si5351_clkout_set_rate(struct clk_hw *hw, unsigned long rate, dev_dbg(&hwdata->drvdata->client->dev, "%s - %s: rdiv = %u, parent_rate = %lu, rate = %lu\n", - __func__, __clk_get_name(hwdata->hw.clk), (1 << rdiv), + __func__, clk_hw_get_name(hw), (1 << rdiv), parent_rate, rate); return 0; -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: > I'm for solution 3: > > - keep blk_bio_{discard,write_same}_split, but ensure we never build > a > 4GB bio in blkdev_issue_{discard,write_same}. This has the problem I mentioned in solution 1: we also need to make sure the max discard size is of the proper granularity. See the example below. 4G: 8388608 sectors UINT_MAX: 8388607 sectors dm-thinp block size = default discard granularity = 128 sectors blkdev_issue_discard(sector=0, nr_sectors=8388608) 1. Only ensure bi_size does not overflow It doesn't work. [start_sector, end_sector] [0, 8388607] [0, 8388606], then dm-thinp splits it to 2 bios [0, 8388479] [8388480, 8388606] ---> this has a problem in process_discard_bio(), because the discard size (127 sectors) covers less than a block (128 sectors) [8388607, 8388607] ---> same problem 2. Ensure bi_size does not overflow and the max discard size is of the proper granularity It works. [start_sector, end_sector] [0, 8388607] [0, 8388479] [8388480, 8388607] So how about the patch below? commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7 Author: Ming Lin Date: Fri Aug 7 15:07:07 2015 -0700 block: remove split code in blkdev_issue_{discard,write_same} The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure the bio size doesn't overflow. For discard, we ensure max_discard_sectors is of the proper granularity. So if the discard size is > 4G, blkdev_issue_discard() always sends requests that are a multiple of the granularity to the lower level, except that the last one may not be a multiple of the granularity. 
Signed-off-by: Ming Lin --- block/blk-lib.c | 37 + 1 file changed, 9 insertions(+), 28 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7688ee3..e178a07 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; unsigned int max_discard_sectors, granularity; - int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, /* Zero-sector (unknown) and one-sector granularities are the same. */ granularity = max(q->limits.discard_granularity >> 9, 1U); - alignment = (bdev_discard_alignment(bdev) >> 9) % granularity; /* -* Ensure that max_discard_sectors is of the proper -* granularity, so that requests stay aligned after a split. -*/ - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); +* Ensure that max_discard_sectors doesn't overflow bi_size and is of +* the proper granularity. So if discard size > 4G, blkdev_issue_discard() +* always split and send multiple granularity requests to lower level, +* except that the last one may be not multiple granularity. + */ + max_discard_sectors = UINT_MAX >> 9; max_discard_sectors -= max_discard_sectors % granularity; - if (unlikely(!max_discard_sectors)) { - /* Avoid infinite loop below. Being cautious never hurts. 
*/ - return -EOPNOTSUPP; - } if (flags & BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(&plug); while (nr_sects) { unsigned int req_sects; - sector_t end_sect, tmp; + sector_t end_sect; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, } req_sects = min_t(sector_t, nr_sects, max_discard_sectors); - - /* -* If splitting a request, and the next starting sector would be -* misaligned, stop the discard at the previous aligned sector. -*/ end_sect = sector + req_sects; - tmp = end_sect; - if (req_sects < nr_sects && - sector_div(tmp, granularity) != alignment) { - end_sect = end_sect - alignment; - sector_div(end_sect, granularity); - end_sect = end_sect * granularity + alignment; - req_sects = end_sect - sector; - } bio->bi_iter.bi_sector = sector; bio->bi_end_io = bio_batch_end_io; @@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, if (!q) return -ENXIO; -
[mm/slab_common] BUG: kernel early-boot crashed early console in setup code
Greetings, 0day kernel testing robot got the below dmesg and the first bad commit is git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master commit 4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77 Author: Gavin Guo AuthorDate: Wed Jun 24 16:55:54 2015 -0700 Commit: Linus Torvalds CommitDate: Wed Jun 24 17:49:40 2015 -0700 mm/slab_common: support the slub_debug boot option on specific object size The slub_debug=PU,kmalloc-xx cannot work because in the create_kmalloc_caches() the s->name is created after the create_kmalloc_cache() is called. The name is NULL in the create_kmalloc_cache() so the kmem_cache_flags() would not set the slub_debug flags to the s->flags. The fix here set up a kmalloc_names string array for the initialization purpose and delete the dynamic name creation of kmalloc_caches. [a...@linux-foundation.org: s/kmalloc_names/kmalloc_info/, tweak comment text] Signed-off-by: Gavin Guo Acked-by: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

+-----------------------------------------------------------+------------+------------+------------+
|                                                           | 3693a84d3b | 4066c33d03 | e4bc13adfd |
+-----------------------------------------------------------+------------+------------+------------+
| boot_successes                                            | 377        | 14         | 2          |
| boot_failures                                             | 0          | 116        | 27         |
| BUG:kernel_early-boot_crashed_early_console_in_setup_code | 0          | 116        | 22         |
| IP-Config:Auto-configuration_of_network_failed            | 0          | 0          | 5          |
+-----------------------------------------------------------+------------+------------+------------+

early console in setup code Elapsed time: 10 BUG: kernel early-boot crashed early console in setup code Linux version 4.1.0-03324-g4066c33 #5 Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw 
link=/kbuild-tests/run-queue/kvm/i386-randconfig-sb0-08051454/linux-devel:devel-spot-201508051145:4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77:bisect-linux-5/.vmlinuz-4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77-20150808052740-53-ivb41 branch=linux-devel/devel-spot-201508051145 BOOT_IMAGE=/pkg/linux/i386-randconfig-sb0-08051454/gcc-4.9/4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77/vmlinuz-4.1.0-03324-g4066c33 drbd.minor_count=8 qemu-system-x86_64 -enable-kvm -cpu kvm64 -kernel /pkg/linux/i386-randconfig-sb0-08051454/gcc-4.9/4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77/vmlinuz-4.1.0-03324-g4066c33 -append 'hung_task_panic=1 earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw link=/kbuild-tests/run-queue/kvm/i386-randconfig-sb0-08051454/linux-devel:devel-spot-201508051145:4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77:bisect-linux-5/.vmlinuz-4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77-20150808052740-53-ivb41 branch=linux-devel/devel-spot-201508051145 BOOT_IMAGE=/pkg/linux/i386-randconfig-sb0-08051454/gcc-4.9/4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77/vmlinuz-4.1.0-03324-g4066c33 drbd.minor_count=8' -initrd /osimage/quantal/quantal-core-i386.cgz -m 300 -smp 2 -device e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot -watchdog i6300esb -rtc base=localtime -drive file=/fs/sda5/disk0-quantal-ivb41-22,media=disk,if=virtio -drive file=/fs/sda5/disk1-quantal-ivb41-22,media=disk,if=virtio -drive file=/fs/sda5/disk2-quantal-ivb41-22,media=disk,if=virtio -drive file=/fs/sda5/disk3-quantal-ivb41-22,media=disk,if=virtio -drive file=/fs/sda5/disk4-quantal-ivb41-22,media=disk,if=virtio -drive file=/fs/sda5/disk5-quantal-ivb41-22,media=disk,if=virtio -drive file=/fs/sda5/disk6-quantal-ivb41-22,media=disk,if=virtio -pidfile /dev/shm/kboot/pid-quantal-ivb41-22 -serial 
file:/dev/shm/kboot/serial-quantal-ivb41-22 -daemonize -display none -monitor null git bisect start e4bc13adfd016fc1036838170288b5680d1a98b0 v4.1 -- git bisect good acd53127c4adbd34570b221e7ea1f7fc94aea923 # 04:28 22+ 0 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi git bisect good e0456717e483bb8a9431b80a5bdc99a928b9b003 # 04:34 22+ 0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next git bisect good 266da6f14232638b9caafb7facf2a7333895dd05 # 04:39 22+ 0 Merge tag
Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2
On Fri, Aug 7, 2015 at 4:08 PM, Toshi Kani wrote: > On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote: >> On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani wrote: >> > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote: >> > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani wrote: >> > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote: >> > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani >> > > > > wrote: > : >> > > > >> > > > No, there is no OS support necessary to use MTRR. After firmware >> > > > sets it up, CPUs continue to use it without any OS support. I think >> > > > the Linux change you are referring is to obsolete legacy interfaces >> > > > that modify the MTRR setup. I agree that Linux should not modify >> > > > MTRR. >> > > >> > > Its a bit more than that though. Since you agree that the OS can live >> > > without MTRR code I was hoping to then see if we can fold out PAT >> > > Linux code from under the MTRR dependency on Linux and make PAT a >> > > first class citizen, maybe at least for x86-64. Right now you can only >> > > get PAT support on Linux if you have MTRR code, but I'd like to see if >> > > instead we can rip MTRR code out completely under its own Kconfig and >> > > let it start rotting away. >> > > >> > > Code-wise the only issue I saw was that PAT code also relies on >> > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I found >> > > no other obvious issues. >> > >> > We can rip of the MTTR code that modifies the MTRR setup, but not >> > mtrr_type_lookup(). This function provides necessary checks per >> > documented >> > in commit 7f0431e3dc89 as follows. >> > >> > 1) reserve_memtype() tracks an effective memory type in case >> >a request type is WB (ex. /dev/mem blindly uses WB). Missing >> >to track with its effective type causes a subsequent request >> >to map the same range with the effective type to fail. 
>> > >> > 2) pud_set_huge() and pmd_set_huge() check if a requested range >> >has any overlap with MTRRs. Missing to detect an overlap may >> >cause a performance penalty or undefined behavior. >> > >> > mtrr_type_lookup() is still admittedly awkward, but I do not think we >> > have an immediate issue in PAT code calling it. I do not think it makes >> > PAT code a second class citizen. >> >> OK since we know that if MTRR set up code ends up disabled and would >> return MTRR_TYPE_INVALID what if we just static inline this for the >> no-MTRR Kconfig build option immediately, and only then have the full >> blown implementation for the case where MTRR Kconfig option is >> enabled? > > Yes, the MTRR code could be disabled by Kconfig with such inline stubs OK thanks. > as > long as the kernel is built specifically for a particular platform with MTRR > disabled, such as Xen guest kernel. Sure. > However, since MTRR is a CPU feature enabled on most of the systems, I am > not sure if it makes sense to be configurable with Kconfig, though. To me this is about making PAT a first class citizen in code though and validating through Kconfig the option then to opt-out of MTRR from OS code. Perhaps we can recommend to enable it but having the options to split out PAT from MTRR is what I was aiming for. >> > > Platform firmware and SMIs seems to be the only other possible issue. >> > > More on this below. >> > > >> > > > > For those type of OSes... >> > > > > could it be possible to negotiate or hint to the platform through >> > > > > an attribute somehow that the OS has such capability to not use >> > > > > MTRR? >> > > > >> > > > The OS can disable MTRR. However, this can also cause a problem in >> > > > firmware, which may rely on MTRR. >> > > >> > > Can you describe what type of issues we could expect ? 
I tend to care >> > > more about this for 64-bit systems so if 32-bit platforms would be >> > > more of the ones which could cause an issue would restricting >> > > disabling MTRR only for 64-bit help? >> > >> > The SMI handler runs in real-mode and relies on MTRR being effective to >> > provide right cache types. It does not matter if it is 64-bit or not. >> >> I see... since I have no visibility to what goes under the hood, can >> you provide one example use case where an SMI handler would require >> getting a cache type through MTRR ? I realize this can vary, vendor by >> vendor, but any example would do just to satisfy my curiosity. > > For fan control, it would need UC access to its registers. OK thanks! To follow up with the example, since the platform firmware would have set up the MTRRs anyway, the SMI should still work, even if the OS didn't do anything, right? >> > > > > Then, only if this bit is set, the platform could then avoid such >> > > > > MTRR settings, and if we have issues you can throw rocks at us. >> > > > >> > > > > And if that's not possible how about a new platform setting that >> > > > > would need to be set at the platform level to enable disabling >> > > >
Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2
On Fri, 2015-08-07 at 17:08 -0600, Toshi Kani wrote: > On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote: > > On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani wrote: > > > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote: > > > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani > > > > wrote: > > > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote: > > > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani > > > > > > wrote: > : > > > > > > > > > > No, there is no OS support necessary to use MTRR. After firmware > > > > > sets it up, CPUs continue to use it without any OS support. I > > > > > think the Linux change you are referring is to obsolete legacy > > > > > interfaces that modify the MTRR setup. I agree that Linux should > > > > > not modify MTRR. > > > > > > > > Its a bit more than that though. Since you agree that the OS can > > > > live without MTRR code I was hoping to then see if we can fold out > > > > PAT Linux code from under the MTRR dependency on Linux and make PAT > > > > a first class citizen, maybe at least for x86-64. Right now you can > > > > only get PAT support on Linux if you have MTRR code, but I'd like to > > > > see if instead we can rip MTRR code out completely under its own > > > > Kconfig and let it start rotting away. > > > > > > > > Code-wise the only issue I saw was that PAT code also relies on > > > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I > > > > found no other obvious issues. > > > > > > We can rip of the MTTR code that modifies the MTRR setup, but not > > > mtrr_type_lookup(). This function provides necessary checks per > > > documented in commit 7f0431e3dc89 as follows. > > > > > > 1) reserve_memtype() tracks an effective memory type in case > > >a request type is WB (ex. /dev/mem blindly uses WB). Missing > > >to track with its effective type causes a subsequent request > > >to map the same range with the effective type to fail. 
> > > > > > 2) pud_set_huge() and pmd_set_huge() check if a requested range > > >has any overlap with MTRRs. Missing to detect an overlap may > > >cause a performance penalty or undefined behavior. > > > > > > mtrr_type_lookup() is still admittedly awkward, but I do not think we > > > have an immediate issue in PAT code calling it. I do not think it > > > makes > > > PAT code a second class citizen. > > > > OK since we know that if MTRR set up code ends up disabled and would > > return MTRR_TYPE_INVALID what if we just static inline this for the > > no-MTRR Kconfig build option immediately, and only then have the full > > blown implementation for the case where MTRR Kconfig option is > > enabled? > > Yes, the MTRR code could be disabled by Kconfig with such inline stubs as > long as the kernel is built specifically for a particular platform with > MTRR disabled, such as Xen guest kernel. Noticed that we do have CONFIG_MTRR and mtrr_type_lookup() inline stub returns MTRR_INVALID. -Toshi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tools/thermal: tmon: Allow overriding pkg-config
On Fri, Aug 07, 2015 at 03:58:02PM -0700, Florian Fainelli wrote: > Some build systems might not ship with a proper pkg-config > infrastructure, so picking up the host pkg-config might cause us to link > with ncursesw which may not be present in the build environment. > > Allow an external build system to override the pkg-config value (e.g: > set it to /bin/false if we do not have anything). Is that really the best approach? I see other Makefiles that leave PKG_CONFIG as a variable (which is fine), but not to allow full override; they do something like: PKG_CONFIG := $(CROSS_COMPILE)pkg-config So, I suppose you could then alias your my-foo-cross-pkg-config to /bin/false, or something like that. Another alternative: supply empty $PKG_CONFIG_LIBDIR and $PKG_CONFIG_PATH variables, so pkg-config finds nothing. (Bonus: no patch needed here!) Brian > Fixes: 96a0d99c72cc ("tools/thermal: tmon: use pkg-config to determine > library dependencies") > Signed-off-by: Florian Fainelli > --- > tools/thermal/tmon/Makefile | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/tools/thermal/tmon/Makefile b/tools/thermal/tmon/Makefile > index 2e83dd3655a2..3777ab59d08c 100644 > --- a/tools/thermal/tmon/Makefile > +++ b/tools/thermal/tmon/Makefile > @@ -11,6 +11,7 @@ TARGET=tmon > > INSTALL_PROGRAM=install -m 755 -p > DEL_FILE=rm -f > +PKG_CONFIG?=pkg-config > > # Static builds might require -ltinfo, for instance > ifneq ($(findstring -static, $(LDFLAGS)),) > @@ -18,8 +19,8 @@ STATIC := --static > endif > > TMON_LIBS=-lm -lpthread > -TMON_LIBS += $(shell pkg-config --libs $(STATIC) panelw ncursesw 2> > /dev/null || \ > - pkg-config --libs $(STATIC) panel ncurses 2> /dev/null || \ > +TMON_LIBS += $(shell $(PKG_CONFIG) --libs $(STATIC) panelw ncursesw 2> > /dev/null || \ > + $(PKG_CONFIG) --libs $(STATIC) panel ncurses 2> /dev/null > || \ >echo -lpanel -lncurses) > > OBJS = tmon.o tui.o sysfs.o pid.o > -- > 2.1.0 > -- To unsubscribe from this list: send the line 
"unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2
On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote: > On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani wrote: > > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote: > > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani wrote: > > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote: > > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani > > > > > wrote: : > > > > > > > > No, there is no OS support necessary to use MTRR. After firmware > > > > sets it up, CPUs continue to use it without any OS support. I think > > > > the Linux change you are referring is to obsolete legacy interfaces > > > > that modify the MTRR setup. I agree that Linux should not modify > > > > MTRR. > > > > > > Its a bit more than that though. Since you agree that the OS can live > > > without MTRR code I was hoping to then see if we can fold out PAT > > > Linux code from under the MTRR dependency on Linux and make PAT a > > > first class citizen, maybe at least for x86-64. Right now you can only > > > get PAT support on Linux if you have MTRR code, but I'd like to see if > > > instead we can rip MTRR code out completely under its own Kconfig and > > > let it start rotting away. > > > > > > Code-wise the only issue I saw was that PAT code also relies on > > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I found > > > no other obvious issues. > > > > We can rip of the MTTR code that modifies the MTRR setup, but not > > mtrr_type_lookup(). This function provides necessary checks per > > documented > > in commit 7f0431e3dc89 as follows. > > > > 1) reserve_memtype() tracks an effective memory type in case > >a request type is WB (ex. /dev/mem blindly uses WB). Missing > >to track with its effective type causes a subsequent request > >to map the same range with the effective type to fail. > > > > 2) pud_set_huge() and pmd_set_huge() check if a requested range > >has any overlap with MTRRs. 
Missing to detect an overlap may > >cause a performance penalty or undefined behavior. > > > > mtrr_type_lookup() is still admittedly awkward, but I do not think we > > have an immediate issue in PAT code calling it. I do not think it makes > > PAT code a second class citizen. > > OK since we know that if MTRR set up code ends up disabled and would > return MTRR_TYPE_INVALID what if we just static inline this for the > no-MTRR Kconfig build option immediately, and only then have the full > blown implementation for the case where MTRR Kconfig option is > enabled? Yes, the MTRR code could be disabled by Kconfig with such inline stubs as long as the kernel is built specifically for a particular platform with MTRR disabled, such as Xen guest kernel. However, since MTRR is a CPU feature enabled on most of the systems, I am not sure if it makes sense to be configurable with Kconfig, though. > > > Platform firmware and SMIs seems to be the only other possible issue. > > > More on this below. > > > > > > > > For those type of OSes... > > > > > could it be possible to negotiate or hint to the platform through > > > > > an attribute somehow that the OS has such capability to not use > > > > > MTRR? > > > > > > > > The OS can disable MTRR. However, this can also cause a problem in > > > > firmware, which may rely on MTRR. > > > > > > Can you describe what type of issues we could expect ? I tend to care > > > more about this for 64-bit systems so if 32-bit platforms would be > > > more of the ones which could cause an issue would restricting > > > disabling MTRR only for 64-bit help? > > > > The SMI handler runs in real-mode and relies on MTRR being effective to > > provide right cache types. It does not matter if it is 64-bit or not. > > I see... since I have no visibility to what goes under the hood, can > you provide one example use case where an SMI handler would require > getting a cache type through MTRR ? 
I realize this can vary, vendor by > vendor, but any example would do just to satisfy my curiosity. For fan control, it would need UC access to its registers. > > > > > Then, only if this bit is set, the platform could then avoid such > > > > > MTRR settings, and if we have issues you can throw rocks at us. > > > > > > > > > And if that's not possible how about a new platform setting that > > > > > would need to be set at the platform level to enable disabling > > > > > this junk? > > > > > Then only folks who know what they are doing would enable it, and > > > > > if the customer set it, the issue would not be on the platform. > > > > > > > > > Could this also be used to prevent SMIs with MTRRs? > > > > > > > > ACPI _OSI could be used for firmware to implement some OS-specific > > > > features, but it may be too late for firmware to make major changes > > > > and > > > > is generally useless unless OS requirements are described in a spec > > > > backed by logo certification. > > > > > > I see.. So there are no guarantees that
Re: [PATCH v2] staging: ion: Add a default struct device for cma heap
On 08/07/2015 11:05 AM, Greg Kroah-Hartman wrote: On Fri, Aug 07, 2015 at 11:50:04PM +0800, Feng Tang wrote: On Fri, Aug 07, 2015 at 04:48:28PM +0200, Michal Nazarewicz wrote: On Fri, Aug 07 2015, Feng Tang wrote: As I described above, the dummy struct device is only needed for the dma request; its lifetime is aligned with the cma_heap itself. Again, this is from the perspective of someone who is unfamiliar with ION, but perhaps a viable solution is to bypass the DMA API and just call cma_alloc directly? For the ion cma heap, the buffer allocation func ion_cma_allocate() will call dma_alloc_coherent(dev, ...). And dma_alloc_coherent() is implemented by each architecture (arm/m68k/x86 etc), and many arch implementations don't use CMA but use alloc_pages()-like APIs. So I'm afraid we can't call cma_alloc directly here. Ick. But using a "fake" struct device here, for no real reason, makes me very nervous that you are going to hit a codepath somewhere that assumes this is a "real" struct device and tries to do something with it (dev_printk(), look up what bus it is on, change the name of it, etc.) Trying to fake out the subsystem in this manner is a sign that something is really wrong here. Please either make this a real device, or fix up the api to not need this type of thing. I think this issue represents one of the many current issues with Ion. When the void * == struct dev was added, everything was working off of board files. We now have devicetree, which makes the device association even more awkward to pull off. Every vendor out there is doing something different right now, so the assertion in the commit text about 'normal' is not true; existing code has managed to work with the (not super great) API. There is going to be an Ion session at Plumbers in a few weeks. 
I'd like to propose holding off on merging anything until after Plumbers, when there can be some more discussion about what would be a reasonable API, taking into consideration the points brought up in this patch series.

Thanks,
Laura
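For context on Greg's "make this a real device" suggestion: one hypothetical shape is to register an actual platform device for the heap rather than hand-rolling a bare struct device. This is only an editorial sketch; the name "ion-cma-heap" and the helper below are invented for illustration, not taken from the patch series.

```c
/*
 * Hypothetical sketch: back the CMA heap with a real, registered
 * platform device, so that dev_printk(), sysfs, bus lookups, etc.
 * all see a fully initialized struct device.
 */
static struct platform_device *ion_cma_heap_pdev;

static int ion_cma_heap_register_dev(void)
{
	/* Registers a real device on the platform bus. */
	ion_cma_heap_pdev = platform_device_register_simple("ion-cma-heap",
							    -1, NULL, 0);
	if (IS_ERR(ion_cma_heap_pdev))
		return PTR_ERR(ion_cma_heap_pdev);

	/* dma_alloc_coherent() can now be passed &ion_cma_heap_pdev->dev. */
	return 0;
}
```

Whether this or a reworked API is the better route is exactly what the Plumbers discussion would need to settle.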
Re: [PATCH v2 4/8] drm: rockchip/dp: add rockchip platform dp driver
Hi Yakir,

I think this Rockchip portion is missing a devicetree binding. You have the ability to power down the actual edp phy by using grf_edp_iddq_en from GRF_SOC_CON12. This is similar to how the rk3288 usb-phy gets put into a deeper state. So maybe you could provide a phy driver (drivers/phy) for this, similar to what the exynos-dp does.

Some more stuff inline, but by no means complete, as I'm still trying to integrate this into my development tree.

On Friday, 7 August 2015, 05:46:20, Yakir Yang wrote:
> Rockchip have three clocks for the dp controller; we leave pclk_edp to analogix_dp driver control, and keep the sclk_edp_24m and sclk_edp in the platform driver.
>
> Signed-off-by: Yakir Yang
> ---
> Changes in v2: None
>
> drivers/gpu/drm/rockchip/Kconfig                |  10 +
> drivers/gpu/drm/rockchip/Makefile               |   1 +
> drivers/gpu/drm/rockchip/analogix_dp-rockchip.c | 419
> 3 files changed, 430 insertions(+)
> create mode 100644 drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
>
> diff --git a/drivers/gpu/drm/rockchip/Kconfig b/drivers/gpu/drm/rockchip/Kconfig
> index 35215f6..096ed77 100644
> --- a/drivers/gpu/drm/rockchip/Kconfig
> +++ b/drivers/gpu/drm/rockchip/Kconfig
> @@ -25,3 +25,13 @@ config ROCKCHIP_DW_HDMI
>  	  for the Synopsys DesignWare HDMI driver. If you want to
>  	  enable HDMI on RK3288 based SoC, you should select this
>  	  option.
> +

nit: double blank line

> +config ROCKCHIP_ANALOGIX_DP
> +	tristate "Rockchip specific extensions for Analogix DP driver"
> +	depends on DRM_ROCKCHIP
> +	select DRM_ANALOGIX_DP
> +	help
> +	  This selects support for Rockchip SoC specific extensions
> +	  for the Analogix Core DP driver. If you want to enable DP
> +	  on RK3288 based SoC, you should select this option.

[...]
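To make the phy-driver suggestion above concrete, here is a rough editorial sketch of what a drivers/phy driver gating the eDP analog block through the GRF might look like. The bit position of the iddq control below is a placeholder (not taken from the TRM), and only the power_on/power_off shape of the generic PHY framework is assumed.

```c
/*
 * Hypothetical sketch of a drivers/phy driver for the rk3288 eDP
 * analog block, as suggested above.  EDP_IDDQ_EN's bit position is
 * a placeholder; only the GRF offset comes from the quoted driver.
 */
#define RK3288_GRF_SOC_CON12	0x0274
#define EDP_IDDQ_EN		BIT(0)	/* placeholder bit position */

static int rockchip_edp_phy_power_on(struct phy *phy)
{
	struct regmap *grf = phy_get_drvdata(phy);

	/* rk3288 GRF writes carry a write-enable mask in the upper 16 bits. */
	return regmap_write(grf, RK3288_GRF_SOC_CON12,
			    (EDP_IDDQ_EN << 16) | 0);
}

static int rockchip_edp_phy_power_off(struct phy *phy)
{
	struct regmap *grf = phy_get_drvdata(phy);

	/* Assert iddq to put the analog block into its deep power-down state. */
	return regmap_write(grf, RK3288_GRF_SOC_CON12,
			    (EDP_IDDQ_EN << 16) | EDP_IDDQ_EN);
}

static const struct phy_ops rockchip_edp_phy_ops = {
	.power_on	= rockchip_edp_phy_power_on,
	.power_off	= rockchip_edp_phy_power_off,
	.owner		= THIS_MODULE,
};
```

The dp driver would then just do phy_power_on()/phy_power_off() at the appropriate points, mirroring how exynos-dp consumes its phy.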
> diff --git a/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
> new file mode 100644
> index 000..2f86e5e
> --- /dev/null
> +++ b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
> @@ -0,0 +1,419 @@
> +/*
> + * Rockchip SoC DP (Display Port) interface driver.
> + *
> + * Copyright (C) Fuzhou Rockchip Electronics Co., Ltd.
> + * Author: Andy Yan
> + *         Yakir Yang
> + *         Jeff Chen
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 2 of the License, or (at your
> + * option) any later version.
> + */
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +#include
> +
> +#include
> +
> +#include "rockchip_drm_drv.h"
> +#include "rockchip_drm_vop.h"
> +
> +#define encoder_to_dp(c) \
> +	container_of(c, struct rockchip_dp_device, encoder)
> +
> +#define plat_data_to_dp(pd) \
> +	container_of(pd, struct rockchip_dp_device, plat_data)
> +
> +/* dp grf register offset */
> +#define DP_VOP_SEL	0x025c /* grf_soc_con6 */
> +#define DP_REF_CLK_SEL	0x0274 /* grf_soc_con12 */
> +
> +#define GRF_DP_REF_CLK_SEL_INTER	BIT(4)
> +#define DP_SEL_VOP_LIT			BIT(5)
> +
> +struct rockchip_dp_device {
> +	struct drm_device	*drm_dev;
> +	struct device		*dev;
> +	struct drm_encoder	encoder;
> +	struct drm_display_mode	mode;
> +
> +	struct clk		*clk_dp;
> +	struct clk		*clk_24m_parent;

this clk_24m_parent does not seem used at all

> +	struct clk		*clk_24m;
> +	struct regmap		*grf;
> +	struct reset_control	*rst;
> +
> +	struct analogix_dp_plat_data plat_data;
> +};
> +

[...]
> +static int rockchip_dp_init(struct rockchip_dp_device *dp)
> +{
> +	struct device *dev = dp->dev;
> +	struct device_node *np = dev->of_node;
> +	int ret;
> +
> +	dp->grf = syscon_regmap_lookup_by_phandle(np, "rockchip,grf");
> +	if (IS_ERR(dp->grf)) {
> +		dev_err(dev,
> +			"rk3288-dp needs rockchip,grf property\n");
> +		return PTR_ERR(dp->grf);
> +	}
> +
> +	dp->clk_dp = devm_clk_get(dev, "clk_dp");

I've looked at the manual, but couldn't find an actual clock-name used there. Is it really "clk_dp" or should it just be "dp"?

> +	if (IS_ERR(dp->clk_dp)) {
> +		dev_err(dev, "cannot get clk_dp\n");
> +		return PTR_ERR(dp->clk_dp);
> +	}
> +
> +	dp->clk_24m = devm_clk_get(dev, "clk_dp_24m");

Same here, maybe "dp_24m".

> +	if (IS_ERR(dp->clk_24m)) {
> +		dev_err(dev, "cannot get clk_dp_24m\n");
> +		return PTR_ERR(dp->clk_24m);
> +	}

I think you're missing the pclk here (PCLK_EDP_CTRL) or is this part of something else?

>
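Pulling the clock-name comments together: if the pclk is handled in this driver after all, the missing devicetree binding might end up looking roughly like the fragment below. This is only an editorial guess following the review comments; the node address, compatible string, clock-names, and CRU clock identifiers are all placeholders until a binding document is posted.

```
edp: dp@ff970000 {
	compatible = "rockchip,rk3288-edp";	/* placeholder compatible */
	reg = <0xff970000 0x4000>;		/* placeholder address/size */
	rockchip,grf = <&grf>;
	clocks = <&cru SCLK_EDP>, <&cru SCLK_EDP_24M>, <&cru PCLK_EDP_CTRL>;
	clock-names = "dp", "dp_24m", "pclk";	/* names as suggested in review */
	resets = <&cru SRST_EDP>;
	reset-names = "dp";
};
```

Whatever names land in the binding, the devm_clk_get() strings in rockchip_dp_init() would need to match them exactly.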