Re: [PATCH 4.1 099/267] phy: twl4030-usb: remove incorrect pm_runtime_get_sync() in probe function.

2015-08-07 Thread Alexander Holler

Hello,

this patch killed the musb-host functionality on my classic Beagleboard
(rev c4). The symptom was a message I don't remember, and the attached
device didn't enumerate anymore (likely because of missing power, but
I'm not sure).


A simple revert has fixed it, I haven't looked further into the problem.

Regards,

Alexander Holler

Am 31.07.2015 um 21:39 schrieb Greg Kroah-Hartman:

4.1-stable review patch.  If anyone has any objections, please let me know.

--

From: NeilBrown 

commit 4724e27114c4a7ec07db227a17fcab6f165c upstream.

The USB phy should initialize with power-off, and will be powered on
by the USB system when a cable connection is detected.

Having this pm_runtime_get_sync() during probe causes the phy to
*always* be powered on.
Removing it returns to sensible power management.

Fixes: 96be39ab34b77c6f6f5cd6ae03aac6c6449ee5c4
Signed-off-by: NeilBrown 
Signed-off-by: Kishon Vijay Abraham I 
Signed-off-by: Greg Kroah-Hartman 

---
  drivers/phy/phy-twl4030-usb.c |1 -
  1 file changed, 1 deletion(-)

--- a/drivers/phy/phy-twl4030-usb.c
+++ b/drivers/phy/phy-twl4030-usb.c
@@ -711,7 +711,6 @@ static int twl4030_usb_probe(struct plat
pm_runtime_use_autosuspend(&pdev->dev);
pm_runtime_set_autosuspend_delay(&pdev->dev, 2000);
pm_runtime_enable(&pdev->dev);
-   pm_runtime_get_sync(&pdev->dev);

/* Our job is to use irqs and status from the power module
 * to keep the transceiver disabled when nothing's connected.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/





Re: [PATCH v4 2/2] usb: Add support for ACPI identification to xhci-platform

2015-08-07 Thread Javier Martinez Canillas
Hello Greg,

On Sat, Aug 8, 2015 at 3:29 AM, Greg KH  wrote:
> On Fri, Aug 07, 2015 at 06:03:36PM -0700, Duc Dang wrote:
>> Provide the methods to let ACPI identify the need to use
>> xhci-platform. Change the Kconfig files so the
>> xhci-plat.o file is selectable during kernel config.
>>
>> This has been tested on an ARM64 machine with platform XHCI, an
>> x86_64 machine with XHCI, and an x86_64 machine without XHCI.
>> There were no regressions or error messages on the machines
>> without platform XHCI.
>>
>> [dhdang: regenerate the patch over 4.2-rc5]
>> Signed-off-by: Mark Langsdorf 
>> Signed-off-by: Duc Dang 
>>
>> ---
>> Changes from v3:
>>   Regenerate the patch over 4.2-rc5
>>   No code change
>>
>> Changes from v2
>>   Replaced tristate with a boolean as the driver doesn't
>>   compile as a module
>>   Correct --help-- to ---help---
>>
>> Changes from v1
>>   Renamed from "add support for APM X-Gene to xhci-platform"
>>   Removed changes to arm64/Kconfig
>>   Made CONFIG_USB_XHCI_PLATFORM a user selectable config option
>>
>>  drivers/usb/host/Kconfig |  7 ++-
>>  drivers/usb/host/xhci-plat.c | 11 +++
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
>> index 8afc3c1..96231ee 100644
>> --- a/drivers/usb/host/Kconfig
>> +++ b/drivers/usb/host/Kconfig
>> @@ -32,7 +32,12 @@ config USB_XHCI_PCI
>> default y
>>
>>  config USB_XHCI_PLATFORM
>> - tristate
>> + tristate "xHCI platform driver support"
>> + ---help---
>> +   Say 'Y' to enable the support for the xHCI host controller
>> +   as a platform device. Many ARM SoCs provide USB this way.
>> +
>> +   If unsure, say 'Y'.
>>
>>  config USB_XHCI_MVEBU
>>   tristate "xHCI support for Marvell Armada 375/38x"
>> diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
>> index 5d03f8b..14b40d2 100644
>> --- a/drivers/usb/host/xhci-plat.c
>> +++ b/drivers/usb/host/xhci-plat.c
>> @@ -19,6 +19,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "xhci.h"
>>  #include "xhci-mvebu.h"
>> @@ -262,6 +263,15 @@ static const struct of_device_id usb_xhci_of_match[] = {
>>  MODULE_DEVICE_TABLE(of, usb_xhci_of_match);
>>  #endif
>>
>> +#ifdef CONFIG_ACPI
>
> You shoudn't need this #ifdef, right?
>

Why is it not needed?

The driver does .acpi_match_table = ACPI_PTR(usb_xhci_acpi_match) and
ACPI_PTR() is NULL if CONFIG_ACPI is not enabled. Which can happen
AFAIU since the driver also supports OF. So without the #ifdef guards,
.acpi_match_table = NULL and the struct acpi_device_id
usb_xhci_acpi_match[] will be built but not used.

Or am I missing something?

> thanks,
>
> greg k-h
>

Best regards,
Javier


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Kent Overstreet
On Fri, Aug 07, 2015 at 10:17:43PM -0700, Ming Lin wrote:
> On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet
> > Ideally we'd get upper layers out of the business of knowing about the queue
> > limits at all - that was the point of the patch series, after all.
> >
> > Instead of using UINT_MAX, would it work to just make the max 1 << 31
> > sectors?
> 
> 1 << 31 = 2G bytes = 0x400000 sectors.
> 
> Yes, that works as long as it's a multiple of the granularity.

Is granularity required to be a power of two? One would hope, but looking at the
code that doesn't appear to be a requirement... ugh, that's terrible...


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Ming Lin
On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet
 wrote:
> On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote:
>>
>> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
>> > I'm for solution 3:
>> >
>> >  - keep blk_bio_{discard,write_same}_split, but ensure we never built
>> >a > 4GB bio in blkdev_issue_{discard,write_same}.
>>
>> This has problem as I mentioned in solution 1.
>> We need to also make sure max discard size is of proper granularity.
>> See below example.
>>
>>   4G: 8388608 sectors
>> UINT_MAX: 8388607 sectors
>>
>> dm-thinp block size = default discard granularity = 128 sectors
>>
>> blkdev_issue_discard(sector=0, nr_sectors=8388608)
>>
>> 1. Only ensure bi_size not overflow
>>
>> It doesn't work.
>>
>> [start_sector, end_sector]
>> [0, 8388607]
>> [0, 8388606], then dm-thinp splits it to 2 bios
>> [0, 8388479]
>> [8388480, 8388606] ---> this has a problem in process_discard_bio(),
>>                         because the discard size (127 sectors) covers
>>                         less than a block (128 sectors)
>> [8388607, 8388607] ---> same problem
>>
>> 2. Ensure bi_size not overflow and max discard size is of proper granularity
>
> Ideally we'd get upper layers out of the business of knowing about the queue
> limits at all - that was the point of the patch series, after all.
>
> Instead of using UINT_MAX, would it work to just make the max 1 << 31
> sectors?

1 << 31 = 2G bytes = 0x400000 sectors.

Yes, that works as long as it's a multiple of the granularity.


Re: [lkp] [staging] 68905a14e49: kernel BUG at drivers/base/driver.c:153!

2015-08-07 Thread Sudip Mukherjee
On Fri, Aug 07, 2015 at 11:01:13AM -0700, Greg Kroah-Hartman wrote:
> On Fri, Aug 07, 2015 at 12:45:53PM +0530, Sudip Mukherjee wrote:
> > On Thu, Aug 06, 2015 at 07:28:12PM -0700, Greg Kroah-Hartman wrote:
> > > 
> > > 
> > > On Sat, Aug 01, 2015 at 01:12:05PM +0800, kernel test robot wrote:
> > 
> > > > [   12.276231] RIP 
> > > > [   12.276231] RIP  [] driver_register+0xa8/0xe0
> > > >  [] driver_register+0xa8/0xe0
> > > > [   12.276231]  RSP 
> > > > [   12.276231]  RSP 
> > > > [   12.403330] ---[ end trace 075c805d53b02836 ]---
> > > > [   12.403330] ---[ end trace 075c805d53b02836 ]---
> > > 
> > > David, any ideas of what needs to be done to fix this?
> > 
> > Hi Greg,
> > I was having a quick look and it looks like visorchipset is not yet
> > initialised and the bus is not yet registered. If you remember we had a
> > long discussion in another thread "usb: ulpi: don't register drivers
> > if bus doesn't exist", I sent a proposed patch also. For your reference:
> > "http://www.spinics.net/lists/linux-usb/msg125368.html".
> > I think this one is the perfect example of that case here.
> 
> Great, then it needs to be fixed in the unisys driver code, like I said
> should be done for busses to prevent this type of issue :)
Oops, then I missed that mail. The last mail I have from you in that
thread said "I have some time on Saturday (very long plane ride),
where I'll take a look at this.", so I thought it was still on your
todo list. Then I need to fix parport as well, since it can have the
same problem.

regards
sudip


Re: [PATCH V9 3/3] audit: add audit by children of executable path

2015-08-07 Thread Richard Guy Briggs
On 15/08/07, Paul Moore wrote:
> On Fri, Aug 7, 2015 at 12:03 PM, Richard Guy Briggs  wrote:
> > On 15/08/07, Paul Moore wrote:
> >> On Friday, August 07, 2015 02:37:15 AM Richard Guy Briggs wrote:
> >> > On 15/08/06, Paul Moore wrote:
> >> >
> >> > > I guess what I'm saying is that I'm not currently convinced that
> >> > > there is enough value in this to offset the risk I feel the loop
> >> > > presents. I understand the use cases that you are mentioning, they
> >> > > are the same as the last time we discussed this, but I'm going to
> >> > > need something better than that.
> >> >
> >> > Can you better describe the loop that concerns you?  I don't quite see
> >> > it.
> >>
> >> It would be the only loop in the patch, look at the for loop in
> >> audit_filter_rules() which iterates up the process' parent chain.
> >
> > Sorry, I should reword that...  What risk do you see in that loop?  It
> > works up the task ancestry tree until it triggers, or hits init for that
> > PID namespace that terminates the loop.  Do you see a risk in the
> > numerical pids rolling underneath the loop?
> 
> I suppose there is some risk of PID overlap, and while that is a
> concern, it isn't my first.
> 
> My main concern is that a malicious user could add an extra level of
> burden to the system by making an absurdly tall process tree and then
> hammer the system with trivial, short lived syscalls.  Granted, there
> are userspace limits which would bound the impact to some extent, but
> there is no way to really reduce the risk.  You could further put hard
> limits on the loop, but what good would that do?  Malicious users
> would just know to blow past that limit before they did their Evil
> Deeds.
> 
> I'll say it again; I'm not completely opposed to something like this -
> perhaps in some modified form - but I have yet to see a need for this
> functionality that is great enough to counter the risk.

I am not going to lobby hard for it.  I split this part of the patch out
to avoid jeopardizing the acceptability of the rest of the patchset and
to isolate it to make it easier to focus on its issues and apply it
later once they are addressed.

I'll reflect on this concern and see if I can come up with any ways to
minimize this danger.  This issue is related to the request to list the
chain of processes back to the first ancestor in each record.  You can
make a best effort to record or track the entire chain, but at some
point need to put a limit on it to avoid a DoS, at which point there is
no point in listing the information since it is incomplete.  (Too many
"point"s in that last sentence...)

> > I *do* notice that find_task_by_vpid(pid_t) must be replaced with
> > find_task_by_pid_ns(pid_t, &init_pid_ns), since task_struct->pid is
> > always stored in the initial PID namespace.
> 
> Another thing that needs to be resolved.

I've already fixed it in my tree:
ptsk = find_task_by_pid_ns(ptsk->parent->pid, &init_pid_ns);

> paul moore

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545


[PATCH v2 1/8] watchdog: watchdog_dev: Use single variable name for struct watchdog_device

2015-08-07 Thread Guenter Roeck
The current code uses 'wdd', wddev', and 'watchdog' as variable names
for struct watchdog_device. This is confusing and makes it difficult
to enhance the code. Replace it all with 'wdd'.

Cc: Timo Kokkonen 
Cc: Uwe Kleine-König 
Acked-by: Uwe Kleine-König 
Signed-off-by: Guenter Roeck 

---
v2: No changes
---
 drivers/watchdog/watchdog_dev.c | 151 
 1 file changed, 75 insertions(+), 76 deletions(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 6aaefbad303e..06171c73daf5 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -51,7 +51,7 @@ static struct watchdog_device *old_wdd;
 
 /*
  * watchdog_ping: ping the watchdog.
- * @wddev: the watchdog device to ping
+ * @wdd: the watchdog device to ping
  *
  * If the watchdog has no own ping operation then it needs to be
  * restarted via the start operation. This wrapper function does
@@ -59,65 +59,65 @@ static struct watchdog_device *old_wdd;
  * We only ping when the watchdog device is running.
  */
 
-static int watchdog_ping(struct watchdog_device *wddev)
+static int watchdog_ping(struct watchdog_device *wdd)
 {
int err = 0;
 
-   mutex_lock(&wddev->lock);
+   mutex_lock(&wdd->lock);
 
-   if (test_bit(WDOG_UNREGISTERED, &wddev->status)) {
+   if (test_bit(WDOG_UNREGISTERED, &wdd->status)) {
err = -ENODEV;
goto out_ping;
}
 
-   if (!watchdog_active(wddev))
+   if (!watchdog_active(wdd))
goto out_ping;
 
-   if (wddev->ops->ping)
-   err = wddev->ops->ping(wddev);  /* ping the watchdog */
+   if (wdd->ops->ping)
+   err = wdd->ops->ping(wdd);  /* ping the watchdog */
else
-   err = wddev->ops->start(wddev); /* restart watchdog */
+   err = wdd->ops->start(wdd); /* restart watchdog */
 
 out_ping:
-   mutex_unlock(&wddev->lock);
+   mutex_unlock(&wdd->lock);
return err;
 }
 
 /*
  * watchdog_start: wrapper to start the watchdog.
- * @wddev: the watchdog device to start
+ * @wdd: the watchdog device to start
  *
  * Start the watchdog if it is not active and mark it active.
  * This function returns zero on success or a negative errno code for
  * failure.
  */
 
-static int watchdog_start(struct watchdog_device *wddev)
+static int watchdog_start(struct watchdog_device *wdd)
 {
int err = 0;
 
-   mutex_lock(&wddev->lock);
+   mutex_lock(&wdd->lock);
 
-   if (test_bit(WDOG_UNREGISTERED, &wddev->status)) {
+   if (test_bit(WDOG_UNREGISTERED, &wdd->status)) {
err = -ENODEV;
goto out_start;
}
 
-   if (watchdog_active(wddev))
+   if (watchdog_active(wdd))
goto out_start;
 
-   err = wddev->ops->start(wddev);
+   err = wdd->ops->start(wdd);
if (err == 0)
-   set_bit(WDOG_ACTIVE, &wddev->status);
+   set_bit(WDOG_ACTIVE, &wdd->status);
 
 out_start:
-   mutex_unlock(&wddev->lock);
+   mutex_unlock(&wdd->lock);
return err;
 }
 
 /*
  * watchdog_stop: wrapper to stop the watchdog.
- * @wddev: the watchdog device to stop
+ * @wdd: the watchdog device to stop
  *
  * Stop the watchdog if it is still active and unmark it active.
  * This function returns zero on success or a negative errno code for
@@ -125,155 +125,154 @@ out_start:
  * If the 'nowayout' feature was set, the watchdog cannot be stopped.
  */
 
-static int watchdog_stop(struct watchdog_device *wddev)
+static int watchdog_stop(struct watchdog_device *wdd)
 {
int err = 0;
 
-   mutex_lock(&wddev->lock);
+   mutex_lock(&wdd->lock);
 
-   if (test_bit(WDOG_UNREGISTERED, &wddev->status)) {
+   if (test_bit(WDOG_UNREGISTERED, &wdd->status)) {
err = -ENODEV;
goto out_stop;
}
 
-   if (!watchdog_active(wddev))
+   if (!watchdog_active(wdd))
goto out_stop;
 
-   if (test_bit(WDOG_NO_WAY_OUT, &wddev->status)) {
-   dev_info(wddev->dev, "nowayout prevents watchdog being stopped!\n");
+   if (test_bit(WDOG_NO_WAY_OUT, &wdd->status)) {
+   dev_info(wdd->dev, "nowayout prevents watchdog being stopped!\n");
err = -EBUSY;
goto out_stop;
}
 
-   err = wddev->ops->stop(wddev);
+   err = wdd->ops->stop(wdd);
if (err == 0)
-   clear_bit(WDOG_ACTIVE, &wddev->status);
+   clear_bit(WDOG_ACTIVE, &wdd->status);
 
 out_stop:
-   mutex_unlock(&wddev->lock);
+   mutex_unlock(&wdd->lock);
return err;
 }
 
 /*
  * watchdog_get_status: wrapper to get the watchdog status
- * @wddev: the watchdog device to get the status from
+ * @wdd: the watchdog device to get the status from
  * @status: the status of the watchdog device
  *
  * Get the watchdog's status flags.
  */
 
-static int watchdog_get_status(struct watchdog_device *wddev,
+static int 

[PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

2015-08-07 Thread Guenter Roeck
Introduce an optional hardware maximum timeout in the watchdog core.
The hardware maximum timeout can be lower than the maximum timeout.

Drivers can set the maximum hardware timeout value in the watchdog data
structure. If the configured timeout exceeds the maximum hardware timeout,
the watchdog core enables a timer function to assist sending keepalive
requests to the watchdog driver.

Cc: Timo Kokkonen 
Cc: Uwe Kleine-König 
Signed-off-by: Guenter Roeck 
---
v2:
- Improved and hopefully clarified documentation.
- Rearranged variables in struct watchdog_device such that internal variables
  come last.
- The code now ensures that the watchdog times out 'timeout' seconds after
  the most recent keepalive sent from user space.
- The internal keepalive now stops silently and no longer generates a
  warning message. Reason is that it will now stop early, while there
  may still be a substantial amount of time for keepalives from user space
  to arrive. If such keepalives arrive late (for example if user space
  is configured to send keepalives just a few seconds before the watchdog
  times out), the message would just be noise and not provide any value.
---
 Documentation/watchdog/watchdog-kernel-api.txt |  23 +++-
 drivers/watchdog/watchdog_dev.c| 140 ++---
 include/linux/watchdog.h   |  26 +++--
 3 files changed, 163 insertions(+), 26 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
index d8b0d3367706..25b00b878a7b 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -53,9 +53,12 @@ struct watchdog_device {
unsigned int timeout;
unsigned int min_timeout;
unsigned int max_timeout;
+   unsigned int max_hw_timeout_ms;
void *driver_data;
-   struct mutex lock;
unsigned long status;
+   struct mutex lock;
+   unsigned long last_keepalive;
+   struct delayed_work work;
struct list_head deferred;
 };
 
@@ -73,18 +76,28 @@ It contains following fields:
   additional information about the watchdog timer itself. (Like it's unique 
name)
 * ops: a pointer to the list of watchdog operations that the watchdog supports.
 * timeout: the watchdog timer's timeout value (in seconds).
+  This is the time after which the system will reboot if user space does
+  not send a heartbeat request if WDOG_ACTIVE is set.
 * min_timeout: the watchdog timer's minimum timeout value (in seconds).
 * max_timeout: the watchdog timer's maximum timeout value (in seconds).
+* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ
+  from max_timeout. If set to a value larger than max_timeout, the
+  infrastructure will send a heartbeat to the watchdog driver if 'timeout'
+  is larger than 'max_hw_timeout_ms / 2', unless WDOG_ACTIVE is set and user
+  space failed to send a heartbeat for at least 'timeout' seconds.
 * bootstatus: status of the device after booting (reported with watchdog
   WDIOF_* status bits).
 * driver_data: a pointer to the drivers private data of a watchdog device.
   This data should only be accessed via the watchdog_set_drvdata and
   watchdog_get_drvdata routines.
-* lock: Mutex for WatchDog Timer Driver Core internal use only.
 * status: this field contains a number of status bits that give extra
   information about the status of the device (Like: is the watchdog timer
   running/active, is the nowayout bit set, is the device opened via
   the /dev/watchdog interface or not, ...).
+* lock: Mutex for WatchDog Timer Driver Core internal use only.
+* last_keepalive: Time of most recent keepalive triggered from user space,
+  in jiffies.
+* work: Worker data structure for WatchDog Timer Driver Core internal use only.
 * deferred: entry in wtd_deferred_reg_list which is used to
   register early initialized watchdogs.
 
@@ -160,7 +173,11 @@ they are supported. These optional routines/operations are:
   and -EIO for "could not write value to the watchdog". On success this
   routine should set the timeout value of the watchdog_device to the
   achieved timeout value (which may be different from the requested one
-  because the watchdog does not necessarily has a 1 second resolution).
+  because the watchdog does not necessarily have a 1 second resolution).
+  Drivers implementing max_hw_timeout_ms set the hardware watchdog timeout
+  to the minimum of timeout and max_hw_timeout_ms. Those drivers set the
+  timeout value of the watchdog_device either to the requested timeout value
+  (if it is larger than max_hw_timeout_ms), or to the achieved timeout value.
   (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
   watchdog's info structure).
 * get_timeleft: this routines returns the time that's left before a reset.
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 06171c73daf5..c04ba1a98cc8 100644
--- 

[PATCH v2 4/8] watchdog: Make set_timeout function optional

2015-08-07 Thread Guenter Roeck
For some watchdogs, the hardware timeout is fixed, and the
watchdog driver depends on the watchdog core to handle the
actual timeout. In this situation, the watchdog driver might
only set the 'timeout' variable but do nothing else.
This can as well be handled by the infrastructure, so make
the set_timeout callback optional. If WDIOF_SETTIMEOUT is
configured but the .set_timeout callback is not available,
update the timeout variable in the infrastructure code.

Signed-off-by: Guenter Roeck 
---
v2: No changes
---
 Documentation/watchdog/watchdog-kernel-api.txt | 5 +
 drivers/watchdog/watchdog_dev.c| 9 ++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
index 6a54dc15a556..49dce3a5477b 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -181,6 +181,11 @@ they are supported. These optional routines/operations are:
   (if it is larger than hw_max_timeout_ms), or to the achieved timeout value.
   (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
   watchdog's info structure).
+  If the watchdog driver does not have to perform any action but setting the
+  watchdog_device.timeout, this callback can be omitted.
+  If set_timeout is not provided but WDIOF_SETTIMEOUT is set, the watchdog
+  infrastructure updates the timeout value of the watchdog_device internally
+  to the requested value.
 * get_timeleft: this routines returns the time that's left before a reset.
 * ref: the operation that calls kref_get on the kref of a dynamically
   allocated watchdog_device struct.
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 676e233d5e7b..752de264a244 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -282,9 +282,9 @@ out_status:
 static int watchdog_set_timeout(struct watchdog_device *wdd,
unsigned int timeout)
 {
-   int err;
+   int err = 0;
 
-   if (!wdd->ops->set_timeout || !(wdd->info->options & WDIOF_SETTIMEOUT))
+   if (!(wdd->info->options & WDIOF_SETTIMEOUT))
return -EOPNOTSUPP;
 
if (watchdog_timeout_invalid(wdd, timeout))
@@ -297,7 +297,10 @@ static int watchdog_set_timeout(struct watchdog_device *wdd,
goto out_timeout;
}
 
-   err = wdd->ops->set_timeout(wdd, timeout);
+   if (wdd->ops->set_timeout)
+   err = wdd->ops->set_timeout(wdd, timeout);
+   else
+   wdd->timeout = timeout;
 
watchdog_update_worker(wdd, true, false);
 
-- 
2.1.4



[PATCH v2 6/8] watchdog: retu: Convert to use infrastructure triggered keepalives

2015-08-07 Thread Guenter Roeck
The watchdog infrastructure now supports handling watchdog keepalive
if the watchdog is running while the watchdog device is closed.
Convert the driver to use this infrastructure.

Signed-off-by: Guenter Roeck 
---
v2: No changes
---
 drivers/watchdog/retu_wdt.c | 78 -
 1 file changed, 7 insertions(+), 71 deletions(-)

diff --git a/drivers/watchdog/retu_wdt.c b/drivers/watchdog/retu_wdt.c
index b7c68e275aeb..ce2982a7670c 100644
--- a/drivers/watchdog/retu_wdt.c
+++ b/drivers/watchdog/retu_wdt.c
@@ -28,69 +28,22 @@
 /* Watchdog timer values in seconds */
 #define RETU_WDT_MAX_TIMER 63
 
-struct retu_wdt_dev {
-   struct retu_dev *rdev;
-   struct device   *dev;
-   struct delayed_work ping_work;
-};
-
-/*
- * Since Retu watchdog cannot be disabled in hardware, we must kick it
- * with a timer until userspace watchdog software takes over. If
- * CONFIG_WATCHDOG_NOWAYOUT is set, we never start the feeding.
- */
-static void retu_wdt_ping_enable(struct retu_wdt_dev *wdev)
-{
-   retu_write(wdev->rdev, RETU_REG_WATCHDOG, RETU_WDT_MAX_TIMER);
-   schedule_delayed_work(&wdev->ping_work,
-   round_jiffies_relative(RETU_WDT_MAX_TIMER * HZ / 2));
-}
-
-static void retu_wdt_ping_disable(struct retu_wdt_dev *wdev)
-{
-   retu_write(wdev->rdev, RETU_REG_WATCHDOG, RETU_WDT_MAX_TIMER);
-   cancel_delayed_work_sync(&wdev->ping_work);
-}
-
-static void retu_wdt_ping_work(struct work_struct *work)
-{
-   struct retu_wdt_dev *wdev = container_of(to_delayed_work(work),
-   struct retu_wdt_dev, ping_work);
-   retu_wdt_ping_enable(wdev);
-}
-
 static int retu_wdt_start(struct watchdog_device *wdog)
 {
-   struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog);
+   struct retu_dev *rdev = watchdog_get_drvdata(wdog);
 
-   retu_wdt_ping_disable(wdev);
+   set_bit(WDOG_RUNNING, &wdog->status);
 
-   return retu_write(wdev->rdev, RETU_REG_WATCHDOG, wdog->timeout);
-}
-
-static int retu_wdt_stop(struct watchdog_device *wdog)
-{
-   struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog);
-
-   retu_wdt_ping_enable(wdev);
-
-   return 0;
-}
-
-static int retu_wdt_ping(struct watchdog_device *wdog)
-{
-   struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog);
-
-   return retu_write(wdev->rdev, RETU_REG_WATCHDOG, wdog->timeout);
+   return retu_write(rdev, RETU_REG_WATCHDOG, wdog->timeout);
 }
 
 static int retu_wdt_set_timeout(struct watchdog_device *wdog,
unsigned int timeout)
 {
-   struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog);
+   struct retu_dev *rdev = watchdog_get_drvdata(wdog);
 
wdog->timeout = timeout;
-   return retu_write(wdev->rdev, RETU_REG_WATCHDOG, wdog->timeout);
+   return retu_write(rdev, RETU_REG_WATCHDOG, wdog->timeout);
 }
 
 static const struct watchdog_info retu_wdt_info = {
@@ -101,8 +54,6 @@ static const struct watchdog_info retu_wdt_info = {
 static const struct watchdog_ops retu_wdt_ops = {
.owner  = THIS_MODULE,
.start  = retu_wdt_start,
-   .stop   = retu_wdt_stop,
-   .ping   = retu_wdt_ping,
.set_timeout= retu_wdt_set_timeout,
 };
 
@@ -111,39 +62,26 @@ static int retu_wdt_probe(struct platform_device *pdev)
struct retu_dev *rdev = dev_get_drvdata(pdev->dev.parent);
bool nowayout = WATCHDOG_NOWAYOUT;
struct watchdog_device *retu_wdt;
-   struct retu_wdt_dev *wdev;
int ret;
 
retu_wdt = devm_kzalloc(&pdev->dev, sizeof(*retu_wdt), GFP_KERNEL);
if (!retu_wdt)
return -ENOMEM;
 
-   wdev = devm_kzalloc(&pdev->dev, sizeof(*wdev), GFP_KERNEL);
-   if (!wdev)
-   return -ENOMEM;
-
retu_wdt->info  = &retu_wdt_info;
retu_wdt->ops   = &retu_wdt_ops;
retu_wdt->timeout   = RETU_WDT_MAX_TIMER;
retu_wdt->min_timeout   = 0;
retu_wdt->max_timeout   = RETU_WDT_MAX_TIMER;
 
-   watchdog_set_drvdata(retu_wdt, wdev);
+   watchdog_set_drvdata(retu_wdt, rdev);
watchdog_set_nowayout(retu_wdt, nowayout);
 
-   wdev->rdev  = rdev;
-   wdev->dev   = &pdev->dev;
-
-   INIT_DELAYED_WORK(&wdev->ping_work, retu_wdt_ping_work);
-
ret = watchdog_register_device(retu_wdt);
if (ret < 0)
return ret;
 
-   if (nowayout)
-   retu_wdt_ping(retu_wdt);
-   else
-   retu_wdt_ping_enable(wdev);
+   retu_wdt_start(retu_wdt);
 
platform_set_drvdata(pdev, retu_wdt);
 
@@ -153,10 +91,8 @@ static int retu_wdt_probe(struct platform_device *pdev)
 static int retu_wdt_remove(struct platform_device *pdev)
 {
struct watchdog_device *wdog = platform_get_drvdata(pdev);
-   struct retu_wdt_dev *wdev = watchdog_get_drvdata(wdog);
 
watchdog_unregister_device(wdog);
-   

[PATCH v2 5/8] watchdog: imx2: Convert to use infrastructure triggered keepalives

2015-08-07 Thread Guenter Roeck
The watchdog infrastructure now supports handling watchdog keepalive
if the watchdog is running while the watchdog device is closed.
Convert the driver to use this infrastructure.

Signed-off-by: Guenter Roeck 
---
v2: No changes
---
 drivers/watchdog/imx2_wdt.c | 72 -
 1 file changed, 12 insertions(+), 60 deletions(-)

diff --git a/drivers/watchdog/imx2_wdt.c b/drivers/watchdog/imx2_wdt.c
index 0bb1a1d1b170..66feef254661 100644
--- a/drivers/watchdog/imx2_wdt.c
+++ b/drivers/watchdog/imx2_wdt.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -34,7 +33,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #define DRIVER_NAME "imx2-wdt"
@@ -62,7 +60,6 @@
 struct imx2_wdt_device {
struct clk *clk;
struct regmap *regmap;
-   struct timer_list timer;/* Pings the watchdog when closed */
struct watchdog_device wdog;
struct notifier_block restart_handler;
 };
@@ -151,16 +148,6 @@ static int imx2_wdt_ping(struct watchdog_device *wdog)
return 0;
 }
 
-static void imx2_wdt_timer_ping(unsigned long arg)
-{
-   struct watchdog_device *wdog = (struct watchdog_device *)arg;
-   struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog);
-
-   /* ping it every wdog->timeout / 2 seconds to prevent reboot */
-   imx2_wdt_ping(wdog);
-   mod_timer(&wdev->timer, jiffies + wdog->timeout * HZ / 2);
-}
-
 static int imx2_wdt_set_timeout(struct watchdog_device *wdog,
unsigned int new_timeout)
 {
@@ -177,40 +164,19 @@ static int imx2_wdt_start(struct watchdog_device *wdog)
 {
struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog);
 
-   if (imx2_wdt_is_running(wdev)) {
-   /* delete the timer that pings the watchdog after close */
-   del_timer_sync(&wdev->timer);
+   if (imx2_wdt_is_running(wdev))
imx2_wdt_set_timeout(wdog, wdog->timeout);
-   } else
+   else
imx2_wdt_setup(wdog);
 
-   return imx2_wdt_ping(wdog);
-}
-
-static int imx2_wdt_stop(struct watchdog_device *wdog)
-{
-   /*
-* We don't need a clk_disable, it cannot be disabled once started.
-* We use a timer to ping the watchdog while /dev/watchdog is closed
-*/
-   imx2_wdt_timer_ping((unsigned long)wdog);
-   return 0;
-}
-
-static inline void imx2_wdt_ping_if_active(struct watchdog_device *wdog)
-{
-   struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog);
+   set_bit(WDOG_RUNNING, &wdog->status);
 
-   if (imx2_wdt_is_running(wdev)) {
-   imx2_wdt_set_timeout(wdog, wdog->timeout);
-   imx2_wdt_timer_ping((unsigned long)wdog);
-   }
+   return imx2_wdt_ping(wdog);
 }
 
 static const struct watchdog_ops imx2_wdt_ops = {
.owner = THIS_MODULE,
.start = imx2_wdt_start,
-   .stop = imx2_wdt_stop,
.ping = imx2_wdt_ping,
.set_timeout = imx2_wdt_set_timeout,
 };
@@ -277,9 +243,10 @@ static int __init imx2_wdt_probe(struct platform_device *pdev)
watchdog_set_nowayout(wdog, nowayout);
watchdog_init_timeout(wdog, timeout, &pdev->dev);
 
-   setup_timer(&wdev->timer, imx2_wdt_timer_ping, (unsigned long)wdog);
-
-   imx2_wdt_ping_if_active(wdog);
+   if (imx2_wdt_is_running(wdev)) {
+   imx2_wdt_set_timeout(wdog, wdog->timeout);
+   set_bit(WDOG_RUNNING, &wdog->status);
+   }
 
/*
 * Disable the watchdog power down counter at boot. Otherwise the power
@@ -320,7 +287,6 @@ static int __exit imx2_wdt_remove(struct platform_device *pdev)
watchdog_unregister_device(wdog);
 
if (imx2_wdt_is_running(wdev)) {
-   del_timer_sync(&wdev->timer);
imx2_wdt_ping(wdog);
dev_crit(&pdev->dev, "Device removed: Expect reboot!\n");
}
@@ -334,10 +300,9 @@ static void imx2_wdt_shutdown(struct platform_device *pdev)
 
if (imx2_wdt_is_running(wdev)) {
/*
-* We are running, we need to delete the timer but will
-* give max timeout before reboot will take place
+* We are running, configure max timeout before reboot
+* will take place.
 */
-   del_timer_sync(&wdev->timer);
imx2_wdt_set_timeout(wdog, IMX2_WDT_MAX_TIME);
imx2_wdt_ping(wdog);
dev_crit(&pdev->dev, "Device shutdown: Expect reboot!\n");
@@ -355,10 +320,6 @@ static int imx2_wdt_suspend(struct device *dev)
if (imx2_wdt_is_running(wdev)) {
imx2_wdt_set_timeout(wdog, IMX2_WDT_MAX_TIME);
imx2_wdt_ping(wdog);
-
-   /* The watchdog is not active */
-   if (!watchdog_active(wdog))
-   del_timer_sync(&wdev->timer);
}
 
clk_disable_unprepare(wdev->clk);
@@ -384,19 +345,10 @@ static int imx2_wdt_resume(struct device 

[PATCH v2 3/8] watchdog: Introduce WDOG_RUNNING flag

2015-08-07 Thread Guenter Roeck
The WDOG_RUNNING flag is expected to be set by watchdog drivers if
the hardware watchdog is running. If the flag is set, the watchdog
subsystem will ping the watchdog even if the watchdog device is closed.

The watchdog driver's stop function is now optional and may be omitted
if the watchdog cannot be stopped. If stopping the watchdog is not
possible but the driver nevertheless implements a stop function, that
function is responsible for setting the WDOG_RUNNING flag.

Cc: Timo Kokkonen 
Cc: Uwe Kleine-König 
Signed-off-by: Guenter Roeck 
---
v2: Improved documentation
---
 Documentation/watchdog/watchdog-kernel-api.txt | 29 ---
 drivers/watchdog/watchdog_core.c   |  2 +-
 drivers/watchdog/watchdog_dev.c| 40 --
 include/linux/watchdog.h   |  7 +
 4 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
index 25b00b878a7b..6a54dc15a556 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -143,17 +143,18 @@ are:
   device.
   The routine needs a pointer to the watchdog timer device structure as a
   parameter. It returns zero on success or a negative errno code for failure.
-* stop: with this routine the watchdog timer device is being stopped.
-  The routine needs a pointer to the watchdog timer device structure as a
-  parameter. It returns zero on success or a negative errno code for failure.
-  Some watchdog timer hardware can only be started and not be stopped. The
-  driver supporting this hardware needs to make sure that a start and stop
-  routine is being provided. This can be done by using a timer in the driver
-  that regularly sends a keepalive ping to the watchdog timer hardware.
 
 Not all watchdog timer hardware supports the same functionality. That's why
 all other routines/operations are optional. They only need to be provided if
 they are supported. These optional routines/operations are:
+* stop: with this routine the watchdog timer device is being stopped.
+  The routine needs a pointer to the watchdog timer device structure as a
+  parameter. It returns zero on success or a negative errno code for failure.
+  Some watchdog timer hardware can only be started and not be stopped. A
+  driver supporting such hardware does not have to implement the stop routine.
+  If a driver has no stop function, the watchdog core will set WDOG_RUNNING
+  and start calling the driver's ping function after the watchdog device
+  is closed.
 * ping: this is the routine that sends a keepalive ping to the watchdog timer
   hardware.
   The routine needs a pointer to the watchdog timer device structure as a
@@ -193,9 +194,12 @@ they are supported. These optional routines/operations are:
 The status bits should (preferably) be set with the set_bit and clear_bit alike
 bit-operations. The status bits that are defined are:
 * WDOG_ACTIVE: this status bit indicates whether or not a watchdog timer device
-  is active or not. When the watchdog is active after booting, then you should
-  set this status bit (Note: when you register the watchdog timer device with
-  this bit set, then opening /dev/watchdog will skip the start operation)
+  is active or not from the user space perspective. User space is expected to send
+  heartbeat requests to the driver while this flag is set. If the watchdog
+  is active after booting, and you don't want the infrastructure to send
+  heartbeats to the watchdog driver, then you should set this status bit.
+  Note: when you register the watchdog timer device with this bit set,
+  then opening /dev/watchdog will skip the start operation.
 * WDOG_DEV_OPEN: this status bit shows whether or not the watchdog device
   was opened via /dev/watchdog.
   (This bit should only be used by the WatchDog Timer Driver Core).
@@ -209,6 +213,11 @@ bit-operations. The status bits that are defined are:
  any watchdog_ops, so that you can be sure that no operations (other than
   unref) will get called after unregister, even if userspace still holds a
   reference to /dev/watchdog
+* WDOG_RUNNING: Set by the watchdog driver if the hardware watchdog is running.
+  The bit must be set if the watchdog timer hardware can not be stopped.
+  The bit may also be set if the watchdog timer is running after booting,
+  before the watchdog device is opened. If set, the watchdog infrastructure
+  will send keepalives to the watchdog hardware while WDOG_ACTIVE is not set.
 
   To set the WDOG_NO_WAY_OUT status bit (before registering your watchdog
   timer device) you can either:
diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index 1a8059455413..b38d1b7ae10e 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -145,7 +145,7 @@ static int __watchdog_register_device(struct watchdog_device *wdd)

[PATCH v2 7/8] watchdog: gpio_wdt: Convert to use infrastructure triggered keepalives

2015-08-07 Thread Guenter Roeck
The watchdog infrastructure now supports handling watchdog keepalive
if the watchdog is running while the watchdog device is closed.
The infrastructure now also supports generating additional heartbeats
if the maximum hardware timeout is smaller than or close to the
configured timeout. Convert the driver to use this infrastructure.

Signed-off-by: Guenter Roeck 
---
v2: No changes
---
 drivers/watchdog/gpio_wdt.c | 65 -
 1 file changed, 11 insertions(+), 54 deletions(-)

diff --git a/drivers/watchdog/gpio_wdt.c b/drivers/watchdog/gpio_wdt.c
index 1687cc2d7122..cbbdae440bfa 100644
--- a/drivers/watchdog/gpio_wdt.c
+++ b/drivers/watchdog/gpio_wdt.c
@@ -32,12 +32,8 @@ struct gpio_wdt_priv {
boolactive_low;
boolstate;
boolalways_running;
-   boolarmed;
unsigned inthw_algo;
-   unsigned inthw_margin;
-   unsigned long   last_jiffies;
struct notifier_block   notifier;
-   struct timer_list   timer;
struct watchdog_device  wdd;
 };
 
@@ -50,20 +46,12 @@ static void gpio_wdt_disable(struct gpio_wdt_priv *priv)
gpio_direction_input(priv->gpio);
 }
 
-static void gpio_wdt_start_impl(struct gpio_wdt_priv *priv)
-{
-   priv->state = priv->active_low;
-   gpio_direction_output(priv->gpio, priv->state);
-   priv->last_jiffies = jiffies;
-   mod_timer(&priv->timer, priv->last_jiffies + priv->hw_margin);
-}
-
 static int gpio_wdt_start(struct watchdog_device *wdd)
 {
struct gpio_wdt_priv *priv = watchdog_get_drvdata(wdd);
 
-   gpio_wdt_start_impl(priv);
-   priv->armed = true;
+   priv->state = priv->active_low;
+   gpio_direction_output(priv->gpio, priv->state);
 
return 0;
 }
@@ -72,10 +60,9 @@ static int gpio_wdt_stop(struct watchdog_device *wdd)
 {
struct gpio_wdt_priv *priv = watchdog_get_drvdata(wdd);
 
-   priv->armed = false;
if (!priv->always_running) {
-   mod_timer(&priv->timer, 0);
gpio_wdt_disable(priv);
+   clear_bit(WDOG_RUNNING, &priv->wdd.status);
}
 
return 0;
@@ -85,32 +72,6 @@ static int gpio_wdt_ping(struct watchdog_device *wdd)
 {
struct gpio_wdt_priv *priv = watchdog_get_drvdata(wdd);
 
-   priv->last_jiffies = jiffies;
-
-   return 0;
-}
-
-static int gpio_wdt_set_timeout(struct watchdog_device *wdd, unsigned int t)
-{
-   wdd->timeout = t;
-
-   return gpio_wdt_ping(wdd);
-}
-
-static void gpio_wdt_hwping(unsigned long data)
-{
-   struct watchdog_device *wdd = (struct watchdog_device *)data;
-   struct gpio_wdt_priv *priv = watchdog_get_drvdata(wdd);
-
-   if (priv->armed && time_after(jiffies, priv->last_jiffies +
- msecs_to_jiffies(wdd->timeout * 1000))) {
-   dev_crit(wdd->dev, "Timer expired. System will reboot soon!\n");
-   return;
-   }
-
-   /* Restart timer */
-   mod_timer(&priv->timer, jiffies + priv->hw_margin);
-
switch (priv->hw_algo) {
case HW_ALGO_TOGGLE:
/* Toggle output pin */
@@ -124,6 +85,8 @@ static void gpio_wdt_hwping(unsigned long data)
gpio_set_value_cansleep(priv->gpio, priv->active_low);
break;
}
+
+   return 0;
 }
 
 static int gpio_wdt_notify_sys(struct notifier_block *nb, unsigned long code,
@@ -132,12 +95,10 @@ static int gpio_wdt_notify_sys(struct notifier_block *nb, unsigned long code,
struct gpio_wdt_priv *priv = container_of(nb, struct gpio_wdt_priv,
  notifier);
 
-   mod_timer(&priv->timer, 0);
-
switch (code) {
case SYS_HALT:
case SYS_POWER_OFF:
-   gpio_wdt_disable(priv);
+   gpio_wdt_stop(&priv->wdd);
break;
default:
break;
@@ -157,7 +118,6 @@ static const struct watchdog_ops gpio_wdt_ops = {
.start  = gpio_wdt_start,
.stop   = gpio_wdt_stop,
.ping   = gpio_wdt_ping,
-   .set_timeout= gpio_wdt_set_timeout,
 };
 
 static int gpio_wdt_probe(struct platform_device *pdev)
@@ -205,9 +165,6 @@ static int gpio_wdt_probe(struct platform_device *pdev)
if (hw_margin < 2 || hw_margin > 65535)
return -EINVAL;
 
-   /* Use safe value (1/2 of real timeout) */
-   priv->hw_margin = msecs_to_jiffies(hw_margin / 2);
-
priv->always_running = of_property_read_bool(pdev->dev.of_node,
 "always-running");
 
@@ -217,11 +174,15 @@ static int gpio_wdt_probe(struct platform_device *pdev)
priv->wdd.ops   = &gpio_wdt_ops;
priv->wdd.min_timeout   = SOFT_TIMEOUT_MIN;
priv->wdd.max_timeout   = SOFT_TIMEOUT_MAX;
+   priv->wdd.max_hw_timeout_ms = hw_margin;
 
 

[PATCH v2 8/8] watchdog: at91sam9: Convert to use infrastructure triggered keepalives

2015-08-07 Thread Guenter Roeck
The watchdog infrastructure now supports handling watchdog keepalive
if the watchdog is running while the watchdog device is closed.
The infrastructure now also supports generating additional heartbeats
if the maximum hardware timeout is smaller than or close to the
configured timeout. Convert the driver to use this
infrastructure.

Signed-off-by: Guenter Roeck 
---
v2: No changes
---
 drivers/watchdog/at91sam9_wdt.c | 102 +---
 1 file changed, 11 insertions(+), 91 deletions(-)

diff --git a/drivers/watchdog/at91sam9_wdt.c b/drivers/watchdog/at91sam9_wdt.c
index e4698f7c5f93..0de39b52962c 100644
--- a/drivers/watchdog/at91sam9_wdt.c
+++ b/drivers/watchdog/at91sam9_wdt.c
@@ -29,7 +29,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -48,8 +47,8 @@
  * use this to convert a watchdog
  * value from/to milliseconds.
  */
-#define ticks_to_hz_rounddown(t)   ((((t) + 1) * HZ) >> 8)
-#define ticks_to_hz_roundup(t) (((((t) + 1) * HZ) + 255) >> 8)
+#define ticks_to_ms_rounddown(t)   ((((t) + 1) * 1000) >> 8)
+#define ticks_to_ms_roundup(t) (((((t) + 1) * 1000) + 255) >> 8)
 #define ticks_to_secs(t)   (((t) + 1) >> 8)
 #define secs_to_ticks(s)   ((s) ? (((s) << 8) - 1) : 0)
 
@@ -64,9 +63,6 @@
 /* Hardware timeout in seconds */
 #define WDT_HW_TIMEOUT 2
 
-/* Timer heartbeat (500ms) */
-#define WDT_TIMEOUT(HZ/2)
-
 /* User land timeout */
 #define WDT_HEARTBEAT 15
 static int heartbeat;
@@ -83,11 +79,8 @@ MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started "
 struct at91wdt {
struct watchdog_device wdd;
void __iomem *base;
-   unsigned long next_heartbeat;   /* the next_heartbeat for the timer */
-   struct timer_list timer;/* The timer that pings the watchdog */
u32 mr;
u32 mr_mask;
-   unsigned long heartbeat;/* WDT heartbeat in jiffies */
bool nowayout;
unsigned int irq;
 };
@@ -107,47 +100,13 @@ static irqreturn_t wdt_interrupt(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-/*
- * Reload the watchdog timer.  (ie, pat the watchdog)
- */
-static inline void at91_wdt_reset(struct at91wdt *wdt)
-{
-   wdt_write(wdt, AT91_WDT_CR, AT91_WDT_KEY | AT91_WDT_WDRSTT);
-}
-
-/*
- * Timer tick
- */
-static void at91_ping(unsigned long data)
-{
-   struct at91wdt *wdt = (struct at91wdt *)data;
-   if (time_before(jiffies, wdt->next_heartbeat) ||
-   !watchdog_active(&wdt->wdd)) {
-   at91_wdt_reset(wdt);
-   mod_timer(&wdt->timer, jiffies + wdt->heartbeat);
-   } else {
-   pr_crit("I will reset your machine !\n");
-   }
-}
-
 static int at91_wdt_start(struct watchdog_device *wdd)
 {
struct at91wdt *wdt = to_wdt(wdd);
-   /* calculate when the next userspace timeout will be */
-   wdt->next_heartbeat = jiffies + wdd->timeout * HZ;
-   return 0;
-}
 
-static int at91_wdt_stop(struct watchdog_device *wdd)
-{
-   /* The watchdog timer hardware can not be stopped... */
-   return 0;
-}
+   wdt_write(wdt, AT91_WDT_CR, AT91_WDT_KEY | AT91_WDT_WDRSTT);
 
-static int at91_wdt_set_timeout(struct watchdog_device *wdd, unsigned int new_timeout)
-{
-   wdd->timeout = new_timeout;
-   return at91_wdt_start(wdd);
+   return 0;
 }
 
 static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt)
@@ -157,8 +116,8 @@ static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt)
u32 value;
int err;
u32 mask = wdt->mr_mask;
-   unsigned long min_heartbeat = 1;
-   unsigned long max_heartbeat;
+   unsigned int min_timeout = jiffies_to_msecs(1);
+   unsigned int hw_timeout;
struct device *dev = &pdev->dev;
 
tmp = wdt_read(wdt, AT91_WDT_MR);
@@ -180,31 +139,15 @@ static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt)
delta = (tmp & AT91_WDT_WDD) >> 16;
 
if (delta < value)
-   min_heartbeat = ticks_to_hz_roundup(value - delta);
+   min_timeout = ticks_to_ms_roundup(value - delta);
 
-   max_heartbeat = ticks_to_hz_rounddown(value);
-   if (!max_heartbeat) {
+   hw_timeout = ticks_to_ms_rounddown(value);
+   if (hw_timeout < min_timeout * 2) {
dev_err(dev,
"heartbeat is too small for the system to handle it 
correctly\n");
return -EINVAL;
}
-
-   /*
-* Try to reset the watchdog counter 4 or 2 times more often than
-* actually requested, to avoid spurious watchdog reset.
-* If this is not possible because of the min_heartbeat value, reset
-* it at the min_heartbeat period.
-*/
-   if ((max_heartbeat / 4) >= min_heartbeat)
-   wdt->heartbeat = max_heartbeat / 4;
-   else if ((max_heartbeat / 2) >= min_heartbeat)
-   wdt->heartbeat = 

[PATCH v2 0/8] watchdog: Add support for keepalives triggered by infrastructure

2015-08-07 Thread Guenter Roeck
The watchdog infrastructure is currently purely passive, meaning
it only passes information from user space to drivers and vice versa.

Since watchdog hardware tends to have its own quirks, this can result
in quite complex watchdog drivers. A number of scenarios are especially common.

- A watchdog is always active and can not be disabled, or can not be disabled
  once enabled. To support such hardware, watchdog drivers have to implement
  their own timers and use those timers to trigger watchdog keepalives while
  the watchdog device is not or not yet opened.
- A variant of this is the desire to enable a watchdog as soon as its driver
  has been instantiated, to protect the system while it is still booting up,
  but the watchdog daemon is not yet running.
- Some watchdogs have a very short maximum timeout, in the range of just a few
  seconds. Such low timeouts are difficult if not impossible to support from
  user space. Drivers supporting such watchdog hardware need to implement
  a timer function to augment heartbeats from user space.

This patch set solves the above problems while keeping changes to the
watchdog core minimal.

- A new status flag, WDOG_RUNNING, informs the watchdog subsystem that a
  watchdog is running, and that the watchdog subsystem needs to generate
  heartbeat requests while the associated watchdog device is closed.
- A new parameter in the watchdog data structure, max_hw_timeout_ms, informs
  the watchdog subsystem about a maximum hardware timeout. The watchdog
  subsystem uses this information together with the configured timeout
  and the maximum permitted timeout to determine if it needs to generate
  additional heartbeat requests.

As part of this patchset, the semantics of the 'timeout' variable and of
the WDOG_ACTIVE flag are changed slightly.

Per the current watchdog kernel API, the 'timeout' variable is supposed
to reflect the actual hardware watchdog timeout, and WDOG_ACTIVE is
supposed to reflect whether the hardware watchdog is running or not.

Unfortunately, this does not always reflect reality. In drivers which solve
the above mentioned problems internally, 'timeout' is the watchdog timeout
as seen from user space, and WDOG_ACTIVE reflects that user space is expected
to send keepalive requests to the watchdog driver.

After this patch set is applied, this previously unofficial interpretation
becomes the 'official' semantics of the timeout variable and the WDOG_ACTIVE
flag. In other words, both values no longer reflect the hardware watchdog
status, but its status as seen from user space.

Patch #1 is a preparatory patch.

Patch #2 adds timer functionality to the watchdog core. It solves the problem
of short maximum hardware timeouts by augmenting heartbeats triggered from
user space with internally triggered heartbeats.

Patch #3 adds functionality to generate heartbeats while the watchdog device is
closed. It handles situations where the watchdog is running after
the driver has been instantiated, but the device is not yet opened,
and post-close situations necessary if a watchdog can not be stopped.

Patch #4 makes the set_timeout function optional. This is now possible since
timeout changes can now be completely handled in the watchdog core, for
example if the hardware watchdog timeout is fixed.

Patch #5 to #8 are example conversions of some watchdog drivers.
Those patches will require testing.

The patch set is also available in branch watchdog-timer of
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git.

This patch set does not solve all limitations of the watchdog subsystem.
Specifically, it does not add support for the following features.

- It is desirable to be able to specify a maximum early timeout,
  from booting the system to opening the watchdog device.
- Some watchdogs may require a minimum period of time between
  heartbeats. Examples are DA9062 and possibly AT91SAM9x.

This and other features will be addressed with subsequent patches.

The patch set is inspired by an earlier patch set from Timo Kokkonen.

v2:
- Rebased to v4.2-rc5
- Improved and hopefully clarified documentation.
- Rearranged variables in struct watchdog_device such that internal variables
  come last.
- The code now ensures that the watchdog times out  seconds after
  the most recent keepalive sent from user space.
- The internal keepalive now stops silently and no longer generates a
  warning message. Reason is that it will now stop early, while there
  may still be a substantial amount of time for keepalives from user space
  to arrive. If such keepalives arrive late (for example if user space
  is configured to send keepalives just a few seconds before the watchdog
  times out), the message would just be noise and not provide any value.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH v2 4/8] drm: rockchip/dp: add rockchip platform dp driver

2015-08-07 Thread Yakir Yang

Hi Heiko,

On 2015/8/8 6:46, Heiko Stübner wrote:

Hi Yakir,


I think this Rockchip portion is missing a devicetree binding.


Oh, thanks, I will complete it in the next version ;)


You have the ability to power down the actual edp phy by using
grf_edp_iddq_en from GRF_SOC_CON12. This is similar to how the
rk3288 usb-phy gets put into a deeper state. So maybe you could
provide a phy driver (drivers/phy) for this similar to what the
exynos-dp does.
Okay, so I need to add a new phy-rockchip-dp.c including 
power_on/power_off ops, thanks.




Some more stuff inline. But I guess by no means complete, as I'm
still trying to integrate this into my development-tree.


Am Freitag, 7. August 2015, 05:46:20 schrieb Yakir Yang:

Rockchip have three clocks for dp controller, we leave pclk_edp
to analogix_dp driver control, and keep the sclk_edp_24m and
sclk_edp in platform driver.

Signed-off-by: Yakir Yang 
---
Changes in v2: None

  drivers/gpu/drm/rockchip/Kconfig|  10 +
  drivers/gpu/drm/rockchip/Makefile   |   1 +
  drivers/gpu/drm/rockchip/analogix_dp-rockchip.c | 419
 3 files changed, 430 insertions(+)
  create mode 100644 drivers/gpu/drm/rockchip/analogix_dp-rockchip.c

diff --git a/drivers/gpu/drm/rockchip/Kconfig
b/drivers/gpu/drm/rockchip/Kconfig index 35215f6..096ed77 100644
--- a/drivers/gpu/drm/rockchip/Kconfig
+++ b/drivers/gpu/drm/rockchip/Kconfig
@@ -25,3 +25,13 @@ config ROCKCHIP_DW_HDMI
  for the Synopsys DesignWare HDMI driver. If you want to
  enable HDMI on RK3288 based SoC, you should select this
  option.
+
+

nit: double blank line

Done,




+config ROCKCHIP_ANALOGIX_DP
+tristate "Rockchip specific extensions for Analogix DP driver"
+depends on DRM_ROCKCHIP
+select DRM_ANALOGIX_DP
+help
+ This selects support for Rockchip SoC specific extensions
+ for the Analogix Core DP driver. If you want to enable DP
+ on RK3288 based SoC, you should select this option.

[...]



diff --git a/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c new file mode 100644
index 000..2f86e5e
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
@@ -0,0 +1,419 @@
+/*
+ * Rockchip SoC DP (Display Port) interface driver.
+ *
+ * Copyright (C) Fuzhou Rockchip Electronics Co., Ltd.
+ * Author: Andy Yan 
+ * Yakir Yang 
+ * Jeff Chen 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+#include "rockchip_drm_drv.h"
+#include "rockchip_drm_vop.h"
+
+#define encoder_to_dp(c) \
+   container_of(c, struct rockchip_dp_device, encoder)
+
+#define plat_data_to_dp(pd) \
+   container_of(pd, struct rockchip_dp_device, plat_data)
+
+/* dp grf register offset */
+#define DP_VOP_SEL  0x025c /* grf_soc_con6 */
+#define DP_REF_CLK_SEL  0x0274 /* grf_soc_con12 */
+
+#define GRF_DP_REF_CLK_SEL_INTERBIT(4)
+#define DP_SEL_VOP_LIT BIT(5)
+
+struct rockchip_dp_device {
+   struct drm_device*drm_dev;
+   struct device*dev;
+   struct drm_encoder   encoder;
+   struct drm_display_mode  mode;
+
+   struct clk   *clk_dp;
+   struct clk   *clk_24m_parent;

this clk_24m_parent does not seem used at all

Done,

+   struct clk   *clk_24m;
+   struct regmap*grf;
+   struct reset_control *rst;
+
+   struct analogix_dp_plat_data plat_data;
+};
+

[...]


+static int rockchip_dp_init(struct rockchip_dp_device *dp)
+{
+   struct device *dev = dp->dev;
+   struct device_node *np = dev->of_node;
+   int ret;
+
+   dp->grf = syscon_regmap_lookup_by_phandle(np, "rockchip,grf");
+   if (IS_ERR(dp->grf)) {
+   dev_err(dev,
+   "rk3288-dp needs rockchip,grf property\n");
+   return PTR_ERR(dp->grf);
+   }
+
+   dp->clk_dp = devm_clk_get(dev, "clk_dp");

I've looked at the manual, but couldn't find an actual clock-name
used there. Is it really "clk_dp" or should it just be "dp"?

This should be "clk_dp", not "dp".
Because analogix_dp_core needs a clock named "dp", I would rather
paste my rockchip-dp node here before I add the dt-bindings in the
next version ;)

edp: edp@ff97 {
compatible = "rockchip,rk3288-dp";
reg = <0xff97 0x4000>;
interrupts = ;

clocks = <&cru SCLK_EDP>, <&cru SCLK_EDP_24M>, <&cru PCLK_EDP_CTRL>;


[PATCH v5 2/2] usb: Add support for ACPI identification to xhci-platform

2015-08-07 Thread Duc Dang
Provide the methods to let ACPI identify the need to use
xhci-platform. Change the Kconfig files so the
xhci-plat.o file is selectable during kernel config.

This has been tested on an ARM64 machine with platform XHCI, an
x86_64 machine with XHCI, and an x86_64 machine without XHCI.
There were no regressions or error messages on the machines
without platform XHCI.

Signed-off-by: Mark Langsdorf 
Signed-off-by: Duc Dang 

---
Changes from v4:
Remove #ifdef CONFIG_ACPI

Changes from v3:
Regenerate the patch over 4.2-rc5
No code change

Changes from v2
Replaced tristate with a boolean as the driver doesn't
compile as a module
Correct --help-- to ---help---

Changes from v1
Renamed from "add support for APM X-Gene to xhci-platform"
Removed changes to arm64/Kconfig
Made CONFIG_USB_XHCI_PLATFORM a user selectable config option

 drivers/usb/host/Kconfig | 7 ++-
 drivers/usb/host/xhci-plat.c | 9 +
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
index 8afc3c1..96231ee 100644
--- a/drivers/usb/host/Kconfig
+++ b/drivers/usb/host/Kconfig
@@ -32,7 +32,12 @@ config USB_XHCI_PCI
default y
 
 config USB_XHCI_PLATFORM
-   tristate
+   tristate "xHCI platform driver support"
+   ---help---
+ Say 'Y' to enable the support for the xHCI host controller
+ as a platform device. Many ARM SoCs provide USB this way.
+
+ If unsure, say 'Y'.
 
 config USB_XHCI_MVEBU
tristate "xHCI support for Marvell Armada 375/38x"
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 5d03f8b..bd282cd 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "xhci.h"
 #include "xhci-mvebu.h"
@@ -262,6 +263,13 @@ static const struct of_device_id usb_xhci_of_match[] = {
 MODULE_DEVICE_TABLE(of, usb_xhci_of_match);
 #endif
 
+static const struct acpi_device_id usb_xhci_acpi_match[] = {
+   /* APM X-Gene USB Controller */
+   { "PNP0D10", },
+   { }
+};
+MODULE_DEVICE_TABLE(acpi, usb_xhci_acpi_match);
+
 static struct platform_driver usb_xhci_driver = {
.probe  = xhci_plat_probe,
.remove = xhci_plat_remove,
@@ -269,6 +277,7 @@ static struct platform_driver usb_xhci_driver = {
.name = "xhci-hcd",
.pm = DEV_PM_OPS,
.of_match_table = of_match_ptr(usb_xhci_of_match),
+   .acpi_match_table = ACPI_PTR(usb_xhci_acpi_match),
},
 };
 MODULE_ALIAS("platform:xhci-hcd");
-- 
1.9.1



[PATCH v5 6/6] locking/pvqspinlock: Queue node adaptive spinning

2015-08-07 Thread Waiman Long
In an overcommitted guest where some vCPUs have to be halted to make
forward progress in other areas, it is highly likely that a vCPU later
in the spinlock queue will be spinning while the ones earlier in the
queue would have been halted. The spinning in the later vCPUs is then
just a waste of precious CPU cycles because they are not going to
get the lock soon as the earlier ones have to be woken up and take
their turn to get the lock.

Reducing the spinning threshold is found to improve performance in
an overcommitted VM guest, but decrease performance when there is
no overcommitment.

This patch implements an adaptive spinning mechanism where the vCPU
will call pv_wait() earlier if all the following conditions are true:

 1) the vCPU has not been halted before;
 2) the previous vCPU is in the halted state;
 3) there are a lot of pv_wait() for the current vCPU recently.

Linux kernel builds were run in KVM guest on an 8-socket, 4
cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
Haswell-EX system. Both systems are configured to have 32 physical
CPUs. The kernel build times before and after the patch were:

WestmereHaswell
  Patch 32 vCPUs48 vCPUs32 vCPUs48 vCPUs
  - 
  Before patch   3m03.2s 9m21.1s 2m08.9s16m14.8s
  After patch3m04.1s 9m28.5s 2m09.5s 8m29.3s

This patch seemed to cause a tiny bit of performance degradation
for 32 vCPUs. For 48 vCPUs, there wasn't much change for Westmere,
but a pretty big performance jump for Haswell.

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock.c  |5 +-
 kernel/locking/qspinlock_paravirt.h |  111 +-
 2 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 94fdd27..da39d43 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -258,7 +258,8 @@ static __always_inline void set_locked(struct qspinlock 
*lock)
  */
 
 static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
-static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_wait_node(struct mcs_spinlock *node,
+  struct mcs_spinlock *prev) { }
 static __always_inline void __pv_kick_node(struct qspinlock *lock,
   struct mcs_spinlock *node) { }
 static __always_inline void __pv_wait_head(struct qspinlock *lock,
@@ -415,7 +416,7 @@ queue:
prev = decode_tail(old);
WRITE_ONCE(prev->next, node);
 
-   pv_wait_node(node);
+   pv_wait_node(node, prev);
	arch_mcs_spin_lock_contended(&node->locked);
}
 
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 9996609..f03bd7a 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -31,6 +31,38 @@
 #define PENDING_SPIN_THRESHOLD (SPIN_THRESHOLD >> 5)
 
 /*
+ * Queue Node Adaptive Spinning
+ *
+ * A queue node vCPU will spin less if the following conditions are all true:
+ * 1) vCPU in the previous node is halted
+ * 2) it has not been halted before
+ * 3) there have been a lot of pv_wait() calls on the current vCPU recently
+ *
+ * The last condition is being monitored by the wait_hist field in the pv_node
+ * structure which tracks the history of pv_wait() relative to slowpath calls.
+ * Each pv_wait will increment this field by PV_WAITHIST_INC until it exceeds
+ * PV_WAITHIST_MAX. Each slowpath lock call will decrement it by 1 until it
+ * reaches PV_WAITHIST_MIN. If its value is higher than PV_WAITHIST_THRESHOLD,
+ * the vCPU will spin less. The reason for this adaptive spinning is to try
+ * to enable wait-early mode only on over-committed guest which helps
+ * performance. However, it shouldn't be enabled when the guest is not
+ * over-committed as it will hurt performance.
+ *
+ * With PV_WAITHIST_INC set to 4, each pv_wait() while not in wait-early mode
+ * will increment wait_hist by 3. Each slowpath call without pv_wait() will
+ * decrement wait_hist by 1. The threshold is set at about 3/4 of the range
+ * so that about 10 steps from the edges in either direction will reach the
+ * threshold. If, on average, more than 1/4 of all slowpath calls result in
+ * a pv_wait(), it should stay in the wait-early mode.
+ */
+#define PV_WAITHIST_MASK   0xff
+#define PV_WAITHIST_INC4
+#define PV_WAITHIST_MIN1
+#define PV_WAITHIST_MAX40
+#define PV_WAITHIST_THRESHOLD  30
+#define PV_CAN_WAIT_EARLY(w)   ((w)->wait_hist > PV_WAITHIST_THRESHOLD)
+
+/*
  * Queue node uses: vcpu_running & vcpu_halted.
  * Queue head uses: vcpu_running & vcpu_hashed.
  */
@@ -46,6 +78,8 @@ struct pv_node {
 
int cpu;
u8   
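The wait_hist hysteresis described in the comment above can be sketched in plain C (illustrative only; the exact clamping behavior here is an assumption, not the kernel implementation):

```c
#include <assert.h>

/* Constants mirror the patch; the helper itself is a sketch. */
#define PV_WAITHIST_INC 4
#define PV_WAITHIST_MIN 1
#define PV_WAITHIST_MAX 40

/*
 * Each pv_wait() bumps the history up by PV_WAITHIST_INC; each slowpath
 * call without a pv_wait() decays it by one. The value is kept within
 * [PV_WAITHIST_MIN, PV_WAITHIST_MAX].
 */
static int wait_hist_update(int wait_hist, int did_pv_wait)
{
	if (did_pv_wait) {
		wait_hist += PV_WAITHIST_INC;
		if (wait_hist > PV_WAITHIST_MAX)
			wait_hist = PV_WAITHIST_MAX;
	} else if (wait_hist > PV_WAITHIST_MIN) {
		wait_hist--;
	}
	return wait_hist;
}
```

With INC=4 and a decay of 1, a run where more than about a quarter of slowpath calls hit pv_wait() keeps the counter above the threshold, which is the stated design goal.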

[PATCH v5 2/6] locking/pvqspinlock: Add pending bit support

2015-08-07 Thread Waiman Long
Like the native qspinlock, using the pending bit to acquire the lock
when it is lightly loaded is faster than going through the PV queuing
process, which is even slower than the native queuing process. It also
avoids loading two additional cachelines (the MCS and PV nodes).

This patch adds pending bit support to the PV qspinlock. The pending
bit code has a smaller spin threshold (1<<10). It falls back to the
queuing method if it cannot acquire the lock within a certain time
limit.
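As an illustration of the protocol described above, here is a user-space C11 sketch of pending-bit acquisition with a spin threshold (the constants and helper are hypothetical stand-ins for the kernel's cmpxchg-based code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative lock-word layout and threshold; not kernel code. */
#define Q_LOCKED_VAL		1U
#define Q_PENDING_VAL		(1U << 8)
#define PENDING_SPIN_THRESHOLD	(1 << 10)

/* Returns true if the lock was acquired; false means fall back to queuing. */
static bool pending_lock_sketch(atomic_uint *lock)
{
	int loop = PENDING_SPIN_THRESHOLD;
	unsigned int val = atomic_load(lock);

	/* wait for an in-progress pending->locked hand-over */
	while (val == Q_PENDING_VAL && loop) {
		val = atomic_load(lock);
		loop--;
	}

	/* trylock if the lock is free, or set the pending bit if only locked */
	for (;;) {
		unsigned int new;

		if (val & ~Q_LOCKED_VAL)	/* pending or queued waiters */
			return false;
		new = val ? (val | Q_PENDING_VAL) : Q_LOCKED_VAL;
		if (atomic_compare_exchange_weak(lock, &val, new))
			break;
	}
	if (!val)
		return true;			/* trylock succeeded */

	/* spin as the pending waiter until the holder drops the lock */
	while (loop-- > 0) {
		val = atomic_load(lock);
		if (!(val & Q_LOCKED_VAL)) {
			atomic_store(lock, Q_LOCKED_VAL);  /* pending -> locked */
			return true;
		}
	}
	/* threshold exceeded: clear the pending bit and fall back to queuing */
	atomic_fetch_sub(lock, Q_PENDING_VAL);
	return false;
}
```

The key point is the bounded spin: a pending waiter that cannot get the lock within the threshold cleans up after itself and joins the queue instead.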

On a VM with 32 vCPUs on a 32-core Westmere-EX box, the kernel
build times on 4.2-rc1 based kernels were:

  Kernel       Build Time   Sys Time
  ------       ----------   --------
  w/o patch     3m28.5s     28m17.5s
  with patch    3m19.3s     23m55.7s

Using a locking microbenchmark on the same system, the locking
rates in (kops/s) were:

  Threads   Rate w/o patch  Rate with patch
  ---   --  ---
  2 (same socket) 6,515,265   7,077,476
  2 (diff sockets)2,967,145   4,353,851

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock.c  |   27 -
 kernel/locking/qspinlock_paravirt.h |   73 +++
 2 files changed, 99 insertions(+), 1 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 337c881..94fdd27 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -162,6 +162,17 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
WRITE_ONCE(l->locked_pending, _Q_LOCKED_VAL);
 }
 
+/**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+   struct __qspinlock *l = (void *)lock;
+
+   WRITE_ONCE(l->pending, 0);
+}
+
 /*
  * xchg_tail - Put in the new queue tail code word & retrieve previous one
  * @lock : Pointer to queued spinlock structure
@@ -193,6 +204,15 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
 }
 
 /**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+	atomic_add(-_Q_PENDING_VAL, &lock->val);
+}
+
+/**
  * xchg_tail - Put in the new queue tail code word & retrieve previous one
  * @lock : Pointer to queued spinlock structure
  * @tail : The new queue tail code word
@@ -245,6 +265,7 @@ static __always_inline void __pv_wait_head(struct qspinlock *lock,
   struct mcs_spinlock *node) { }
 
 #define pv_enabled()   false
+#define pv_pending_lock(l, v)  false
 
 #define pv_init_node   __pv_init_node
 #define pv_wait_node   __pv_wait_node
@@ -286,8 +307,11 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 
BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
-   if (pv_enabled())
+   if (pv_enabled()) {
+   if (pv_pending_lock(lock, val))
+   return; /* Got the lock via pending bit */
goto queue;
+   }
 
if (virt_queued_spin_lock(lock))
return;
@@ -463,6 +487,7 @@ EXPORT_SYMBOL(queued_spin_lock_slowpath);
 #undef pv_wait_node
 #undef pv_kick_node
 #undef pv_wait_head
+#undef pv_pending_lock
 
 #undef  queued_spin_lock_slowpath
 #define queued_spin_lock_slowpath  __pv_queued_spin_lock_slowpath
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 6eafb9e..94f9adf 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -23,6 +23,14 @@
 #define _Q_SLOW_VAL	(3U << _Q_LOCKED_OFFSET)
 
 /*
+ * Queued Spinlock Spin Threshold
+ *
+ * The vCPU will spin a relatively short time in pending mode before falling
+ * back to queuing.
+ */
+#define PENDING_SPIN_THRESHOLD (SPIN_THRESHOLD >> 5)
+
+/*
  * Queue node uses: vcpu_running & vcpu_halted.
  * Queue head uses: vcpu_running & vcpu_hashed.
  */
@@ -157,6 +165,71 @@ static void pv_init_node(struct mcs_spinlock *node)
 }
 
 /*
+ * Try to acquire the lock and wait using the pending bit within a certain
+ * threshold as specified by PENDING_SPIN_THRESHOLD. If the threshold has
+ * been exceeded without getting the lock, we fall back to queuing.
+ */
+static int pv_pending_lock(struct qspinlock *lock, u32 val)
+{
+   int loop = PENDING_SPIN_THRESHOLD;
+   u32 new, old;
+
+   /*
+* wait for in-progress pending->locked hand-overs
+*/
+   while ((val == _Q_PENDING_VAL) && loop) {
+   cpu_relax();
+		val = atomic_read(&lock->val);
+   loop--;
+   }
+
+   /*
+* trylock || pending
+*
+* This loop does a trylock if lock is free or sets the pending bit
+* if lock is taken until the cmpxchg succeeds. As it is expected
+* that 

[PATCH v5 3/6] locking/pvqspinlock: Collect slowpath lock statistics

2015-08-07 Thread Waiman Long
This patch enables the accumulation of kicking and waiting related
PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration
option is selected. It also enables the collection of kicking and
wakeup latencies which have a heavy dependency on the CPUs being used.

The measured latencies for different CPUs are:

CPU          Wakeup   Kicking
---          ------   -------
Haswell-EX   89.8us    7.4us
Westmere-EX  67.6us    9.3us

The measured latencies varied a bit from run-to-run. The wakeup
latency is much higher than the kicking latency.

A sample of statistics counts after a kernel build (no CPU overcommit)
was:

hash_hops_count=576912
kick_latencies=5258025484
kick_unlock_count=576911
kick_wait_count=576903
pending_fail_count=10722
pending_lock_count=6123545
spurious_wakeup=92
wait_again_count=75
wait_head_count=60
wait_node_count=576936
wake_latencies=37061460652
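The averages referred to in the patch can be derived from these totals (illustrative arithmetic: avg kick latency = kick_latencies / kick_unlock_count, avg wake latency = wake_latencies / kick_wait_count; the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Average latency in ns from an accumulated total and an event count. */
static uint64_t avg_ns(uint64_t total_ns, uint64_t count)
{
	return count ? total_ns / count : 0;
}
```

For the sample above, avg_ns(5258025484, 576911) is about 9.1us per kick and avg_ns(37061460652, 576903) is about 64us per wakeup, in line with the measured Westmere-EX latencies.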

Signed-off-by: Waiman Long 
---
 arch/x86/Kconfig|7 ++
 kernel/locking/qspinlock_paravirt.h |  178 ++-
 2 files changed, 180 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4c9c8b8..86bf53e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -719,6 +719,13 @@ config PARAVIRT_SPINLOCKS
 
  If you are unsure how to answer this question, answer Y.
 
+config QUEUED_LOCK_STAT
+   bool "Paravirt queued lock statistics"
+   depends on PARAVIRT && DEBUG_FS && QUEUED_SPINLOCKS
+   ---help---
+ Enable the collection of statistical data on the behavior of
+ paravirtualized queued spinlocks and report them on debugfs.
+
 source "arch/x86/xen/Kconfig"
 
 config KVM_GUEST
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 94f9adf..5eb5dea 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -49,6 +49,151 @@ struct pv_node {
 };
 
 /*
+ * PV qspinlock statistics
+ */
+enum pv_qlock_stat {
+   pvstat_wait_head,
+   pvstat_wait_node,
+   pvstat_wait_again,
+   pvstat_kick_wait,
+   pvstat_kick_unlock,
+   pvstat_pend_lock,
+   pvstat_pend_fail,
+   pvstat_spurious,
+   pvstat_hops,
+   pvstat_num  /* Total number of statistics counts */
+};
+
+#ifdef CONFIG_QUEUED_LOCK_STAT
+/*
+ * Collect pvqspinlock statistics
+ */
+#include 
+#include 
+
+static const char * const stat_fsnames[pvstat_num] = {
+   [pvstat_wait_head]   = "wait_head_count",
+   [pvstat_wait_node]   = "wait_node_count",
+   [pvstat_wait_again]  = "wait_again_count",
+   [pvstat_kick_wait]   = "kick_wait_count",
+   [pvstat_kick_unlock] = "kick_unlock_count",
+   [pvstat_pend_lock]   = "pending_lock_count",
+   [pvstat_pend_fail]   = "pending_fail_count",
+   [pvstat_spurious]= "spurious_wakeup",
+   [pvstat_hops]= "hash_hops_count",
+};
+
+static atomic_t pvstats[pvstat_num];
+
+/*
+ * pv_kick_latencies = sum of all pv_kick latencies in ns
+ * pv_wake_latencies = sum of all wakeup latencies in ns
+ *
+ * Avg kick latency   = pv_kick_latencies/kick_unlock_count
+ * Avg wake latency   = pv_wake_latencies/kick_wait_count
+ * Avg # of hops/hash = hash_hops_count/kick_unlock_count
+ */
+static atomic64_t pv_kick_latencies, pv_wake_latencies;
+static DEFINE_PER_CPU(u64, pv_kick_time);
+
+/*
+ * Reset all the statistics counts if set
+ */
+static bool reset_cnts __read_mostly;
+
+/*
+ * Initialize debugfs for the PV qspinlock statistics
+ */
+static int __init pv_qspinlock_debugfs(void)
+{
+   struct dentry *d_pvqlock = debugfs_create_dir("pv-qspinlock", NULL);
+   int i;
+
+   if (!d_pvqlock)
+   pr_warn("Could not create 'pv-qspinlock' debugfs directory\n");
+
+   for (i = 0; i < pvstat_num; i++)
+   debugfs_create_u32(stat_fsnames[i], 0444, d_pvqlock,
+				   (u32 *)&pvstats[i]);
+	debugfs_create_u64("kick_latencies", 0444, d_pvqlock,
+			   (u64 *)&pv_kick_latencies);
+	debugfs_create_u64("wake_latencies", 0444, d_pvqlock,
+			   (u64 *)&pv_wake_latencies);
+	debugfs_create_bool("reset_cnts", 0644, d_pvqlock, (u32 *)&reset_cnts);
+   return 0;
+}
+fs_initcall(pv_qspinlock_debugfs);
+
+/*
+ * Reset all the counts
+ */
+static noinline void pvstat_reset(void)
+{
+   int i;
+
+   for (i = 0; i < pvstat_num; i++)
+		atomic_set(&pvstats[i], 0);
+	atomic64_set(&pv_kick_latencies, 0);
+	atomic64_set(&pv_wake_latencies, 0);
+   reset_cnts = 0;
+}
+
+/*
+ * Increment the PV qspinlock statistics counts
+ */
+static inline void pvstat_inc(enum pv_qlock_stat stat)
+{
+	atomic_inc(&pvstats[stat]);
+   if (unlikely(reset_cnts))
+   pvstat_reset();
+}
+
+/*
+ * PV hash hop count
+ */
+static inline void pvstat_hop(int hopcnt)
+{
+	atomic_add(hopcnt, &pvstats[pvstat_hops]);
+}
+
+/*
+ * 

[PATCH v5 5/6] locking/pvqspinlock: Allow vCPUs kick-ahead

2015-08-07 Thread Waiman Long
Frequent CPU halting (vmexit) and CPU kicking (vmenter) lengthen the
critical section and block forward progress.  This patch implements
a kick-ahead mechanism where the unlocker will kick the queue head
vCPU as well as up to four additional vCPUs next to the queue head
if they were halted.  The kicks are done after exiting the critical
section to improve parallelism.

The amount of kick-ahead allowed depends on the number of vCPUs
in the VM guest. Currently it allows up to 1 vCPU kick-ahead per
4 vCPUs available up to a maximum of PV_KICK_AHEAD_MAX (4). There
are diminishing returns in increasing the maximum value. The current
value of 4 is a compromise of getting a nice performance boost without
penalizing too much on the one vCPU that is doing all the kickings.
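The sizing formula can be sketched as follows (the helper name is hypothetical; the constant mirrors the patch):

```c
#include <assert.h>

/* Maximum number of extra vCPUs kicked ahead of the queue head. */
#define PV_KICK_AHEAD_MAX 4

/* 1 kick-ahead per 4 available vCPUs, capped at PV_KICK_AHEAD_MAX. */
static int kick_ahead_for(int nvcpus)
{
	int n = nvcpus / 4;

	return n < PV_KICK_AHEAD_MAX ? n : PV_KICK_AHEAD_MAX;
}
```

A guest with fewer than 4 vCPUs gets a value of 0, which disables kick-ahead entirely.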

Linux kernel builds were run in KVM guest on an 8-socket, 4
cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
Haswell-EX system. Both systems are configured to have 32 physical
CPUs. The kernel build times before and after the patch were:

                 Westmere               Haswell
  Patch          32 vCPUs   48 vCPUs   32 vCPUs   48 vCPUs
  -----          --------   --------   --------   --------
  Before patch    3m21.9s   11m20.6s    2m08.6s   17m12.8s
  After patch     3m03.2s    9m21.1s    2m08.9s   16m14.8s

This improves performance quite substantially on Westmere, but not
so much on Haswell.

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock_paravirt.h |   71 +-
 1 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 7c9d6ed..9996609 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -57,6 +57,7 @@ enum pv_qlock_stat {
pvstat_wait_again,
pvstat_kick_wait,
pvstat_kick_unlock,
+   pvstat_kick_ahead,
pvstat_pend_lock,
pvstat_pend_fail,
pvstat_spurious,
@@ -77,6 +78,7 @@ static const char * const stat_fsnames[pvstat_num] = {
[pvstat_wait_again]  = "wait_again_count",
[pvstat_kick_wait]   = "kick_wait_count",
[pvstat_kick_unlock] = "kick_unlock_count",
+   [pvstat_kick_ahead]  = "kick_ahead_count",
[pvstat_pend_lock]   = "pending_lock_count",
[pvstat_pend_fail]   = "pending_fail_count",
[pvstat_spurious]= "spurious_wakeup",
@@ -89,7 +91,7 @@ static atomic_t pvstats[pvstat_num];
  * pv_kick_latencies = sum of all pv_kick latencies in ns
  * pv_wake_latencies = sum of all wakeup latencies in ns
  *
- * Avg kick latency   = pv_kick_latencies/kick_unlock_count
+ * Avg kick latency   = pv_kick_latencies/(kick_unlock_count + kick_ahead_count)
  * Avg wake latency   = pv_wake_latencies/kick_wait_count
  * Avg # of hops/hash = hash_hops_count/kick_unlock_count
  */
@@ -221,6 +223,18 @@ static struct pv_hash_entry *pv_lock_hash;
 static unsigned int pv_lock_hash_bits __read_mostly;
 
 /*
+ * Allow kick-ahead of vCPUs at unlock time
+ *
+ * The pv_kick_ahead value is set by a simple formula that 1 vCPU kick-ahead
+ * is allowed per 4 vCPUs available up to a maximum of PV_KICK_AHEAD_MAX.
+ * There are diminishing returns in increasing PV_KICK_AHEAD_MAX. The current
+ * value of 4 is a good compromise that gives a good performance boost without
+ * penalizing the vCPU that is doing the kicking by too much.
+ */
+#define PV_KICK_AHEAD_MAX  4
+static int pv_kick_ahead __read_mostly;
+
+/*
  * Allocate memory for the PV qspinlock hash buckets
  *
  * This function should be called from the paravirt spinlock initialization
@@ -228,7 +242,8 @@ static unsigned int pv_lock_hash_bits __read_mostly;
  */
 void __init __pv_init_lock_hash(void)
 {
-   int pv_hash_size = ALIGN(4 * num_possible_cpus(), PV_HE_PER_LINE);
+   int ncpus = num_possible_cpus();
+   int pv_hash_size = ALIGN(4 * ncpus, PV_HE_PER_LINE);
 
if (pv_hash_size < PV_HE_MIN)
pv_hash_size = PV_HE_MIN;
@@ -242,6 +257,13 @@ void __init __pv_init_lock_hash(void)
   pv_hash_size, 0, HASH_EARLY,
					   &pv_lock_hash_bits, NULL,
   pv_hash_size, pv_hash_size);
+   /*
+* Enable the unlock kick ahead mode according to the number of
+* vCPUs available.
+*/
+   pv_kick_ahead = min(ncpus/4, PV_KICK_AHEAD_MAX);
+   if (pv_kick_ahead)
+   pr_info("PV unlock kick ahead max count = %d\n", pv_kick_ahead);
 }
 
 #define for_each_hash_entry(he, offset, hash)	\
@@ -551,6 +573,26 @@ static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
 }
 
 /*
+ * Helper to get the address of the next kickable node
+ *
+ * The node has to be in the halted state. The state will then be
+ * transitioned to the running state. If no kickable node is 

[PATCH v5 4/6] locking/pvqspinlock, x86: Optimize PV unlock code path

2015-08-07 Thread Waiman Long
The unlock function in queued spinlocks was optimized for better
performance on bare metal systems at the expense of virtualized guests.

For x86-64 systems, the unlock call needs to go through a
PV_CALLEE_SAVE_REGS_THUNK() which saves and restores 8 64-bit
registers before calling the real __pv_queued_spin_unlock()
function. The thunk code may also be in a separate cacheline from
__pv_queued_spin_unlock().

This patch optimizes the PV unlock code path by:
 1) Moving the unlock slowpath code from the fastpath into a separate
    __pv_queued_spin_unlock_slowpath() function to make the fastpath
    as simple as possible.
 2) For x86-64, hand-coding an assembly function that combines the
    register-saving thunk code with the fastpath code. Only registers
    that are used in the fastpath will be saved and restored. If the
    fastpath fails, the slowpath function will be called via another
    PV_CALLEE_SAVE_REGS_THUNK(). For 32-bit, it falls back to the C
    __pv_queued_spin_unlock() code as the thunk saves and restores
    only one 32-bit register.
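The fastpath/slowpath split described above can be sketched in user-space C (illustrative only; the slowpath stub below merely counts calls, whereas the real slowpath unhashes the lock and kicks the queue head):

```c
#include <assert.h>
#include <stdatomic.h>

#define Q_LOCKED_VAL 1

static int slowpath_calls;

/* Stub standing in for __pv_queued_spin_unlock_slowpath(). */
static void unlock_slowpath_sketch(atomic_uchar *locked, unsigned char lockval)
{
	(void)locked;
	(void)lockval;
	slowpath_calls++;
}

/*
 * Fastpath: a single byte-wide cmpxchg releases an uncontended lock;
 * any other lock-byte value (e.g. _Q_SLOW_VAL) takes the slowpath.
 */
static void pv_unlock_sketch(atomic_uchar *locked)
{
	unsigned char expected = Q_LOCKED_VAL;

	if (atomic_compare_exchange_strong(locked, &expected, 0))
		return;
	unlock_slowpath_sketch(locked, expected);
}
```

The hand-coded assembly in the patch implements exactly this shape: the cmpxchg plus compare-and-branch inline, with the slowpath behind a callee-save thunk.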

With a microbenchmark of 5M lock-unlock loop, the table below shows
the execution times before and after the patch with different number
of threads in a VM running on a 32-core Westmere-EX box with x86-64
4.2-rc1 based kernels:

  Threads   Before patch   After patch   % Change
  -------   ------------   -----------   --------
     1        134.1 ms      119.3 ms      -11%
     2       1286   ms      953   ms      -26%
     3       3715   ms     3480   ms      -6.3%
     4       4092   ms     3764   ms      -8.0%

Signed-off-by: Waiman Long 
---
 arch/x86/include/asm/qspinlock_paravirt.h |   59 +
 kernel/locking/qspinlock_paravirt.h   |   43 +
 2 files changed, 86 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock_paravirt.h b/arch/x86/include/asm/qspinlock_paravirt.h
index b002e71..3001972 100644
--- a/arch/x86/include/asm/qspinlock_paravirt.h
+++ b/arch/x86/include/asm/qspinlock_paravirt.h
@@ -1,6 +1,65 @@
 #ifndef __ASM_QSPINLOCK_PARAVIRT_H
 #define __ASM_QSPINLOCK_PARAVIRT_H
 
+/*
+ * For x86-64, PV_CALLEE_SAVE_REGS_THUNK() saves and restores 8 64-bit
+ * registers. For i386, however, only 1 32-bit register needs to be saved
+ * and restored. So an optimized version of __pv_queued_spin_unlock() is
+ * hand-coded for 64-bit, but it isn't worthwhile to do it for 32-bit.
+ */
+#ifdef CONFIG_64BIT
+
+PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath);
+#define __pv_queued_spin_unlock	__pv_queued_spin_unlock
+#define PV_UNLOCK  "__raw_callee_save___pv_queued_spin_unlock"
+#define PV_UNLOCK_SLOWPATH	"__raw_callee_save___pv_queued_spin_unlock_slowpath"
+
+/*
+ * Optimized assembly version of __raw_callee_save___pv_queued_spin_unlock
+ * which combines the register-saving thunk and the body of the following
+ * C code:
+ *
+ * void __pv_queued_spin_unlock(struct qspinlock *lock)
+ * {
+ * struct __qspinlock *l = (void *)lock;
+ *	u8 lockval = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
+ *
+ * if (likely(lockval == _Q_LOCKED_VAL))
+ * return;
+ * pv_queued_spin_unlock_slowpath(lock, lockval);
+ * }
+ *
+ * For x86-64,
+ *   rdi = lock(first argument)
+ *   rsi = lockval (second argument)
+ *   rdx = internal variable (set to 0)
+ */
+asm(".pushsection .text;"
+".globl " PV_UNLOCK ";"
+".align 4,0x90;"
+PV_UNLOCK ": "
+"push  %rdx;"
+"mov   $0x1,%eax;"
+"xor   %edx,%edx;"
+"lock cmpxchg %dl,(%rdi);"
+"cmp   $0x1,%al;"
+"jne   .slowpath;"
+"pop   %rdx;"
+"ret;"
+".slowpath: "
+"push   %rsi;"
+"movzbl %al,%esi;"
+"call " PV_UNLOCK_SLOWPATH ";"
+"pop%rsi;"
+"pop%rdx;"
+"ret;"
+".size " PV_UNLOCK ", .-" PV_UNLOCK ";"
+".popsection");
+
+#else /* CONFIG_64BIT */
+
+extern void __pv_queued_spin_unlock(struct qspinlock *lock);
 PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock);
 
+#endif /* CONFIG_64BIT */
 #endif
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 5eb5dea..7c9d6ed 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -551,23 +551,14 @@ static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
 }
 
 /*
- * PV version of the unlock function to be used in stead of
- * queued_spin_unlock().
+ * PV versions of the unlock fastpath and slowpath functions to be used
+ * instead of queued_spin_unlock().
  */
-__visible void __pv_queued_spin_unlock(struct qspinlock *lock)
+__visible void
+__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
 {
struct __qspinlock *l = (void *)lock;
struct pv_node *node;
-   u8 locked;
-
-   /*
-* We must not unlock if SLOW, because in that case we must first
-* unhash. Otherwise it would be possible to 

[PATCH v5 1/2] usb: make xhci platform driver use 64 bit or 32 bit DMA

2015-08-07 Thread Duc Dang
The xhci platform driver needs to work on systems that
either only support 64-bit DMA or only support 32-bit DMA.
Attempt to set a coherent dma mask for 64-bit DMA, and
attempt again with 32-bit DMA if that fails.

[dhdang: Regenerate the patch over 4.2-rc5]
Signed-off-by: Mark Langsdorf 
Tested-by: Mark Salter 
Signed-off-by: Duc Dang 

---
Changes from v4:
None

Changes from v3:
Re-generate the patch over 4.2-rc5
No code change.

Changes from v2:
None

Changes from v1:
Consolidated to use dma_set_mask_and_coherent
Got rid of the check against sizeof(dma_addr_t)

 drivers/usb/host/xhci-plat.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 890ad9d..5d03f8b 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -93,14 +93,14 @@ static int xhci_plat_probe(struct platform_device *pdev)
if (irq < 0)
return -ENODEV;
 
-   /* Initialize dma_mask and coherent_dma_mask to 32-bits */
-	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
-	if (ret)
-		return ret;
-	if (!pdev->dev.dma_mask)
-		pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
-	else
-		dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
+	/* Try setting the coherent_dma_mask to 64 bits, then try 32 bits */
+	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+	if (ret) {
+		ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+		if (ret)
+			return ret;
+	}
+
 
 	hcd = usb_create_hcd(driver, &pdev->dev, dev_name(&pdev->dev));
if (!hcd)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 0/6] locking/qspinlock: Enhance pvqspinlock performance

2015-08-07 Thread Waiman Long
v4->v5:
 - Rebased the patch to the latest tip tree.
 - Corrected the comments and commit log for patch 1.
 - Removed the v4 patch 5 as PV kick deferment is no longer needed with
   the new tip tree.
 - Simplified the adaptive spinning patch (patch 6) & improve its
   performance a bit further.
 - Re-ran the benchmark test with the new patch.

v3->v4:
 - Patch 1: add comment about possible racing condition in PV unlock.
 - Patch 2: simplified the pv_pending_lock() function as suggested by
   Davidlohr.
 - Move PV unlock optimization patch forward to patch 4 & rerun
   performance test.

v2->v3:
 - Moved deferred kicking enablement patch forward & move back
   the kick-ahead patch to make the effect of kick-ahead more visible.
 - Reworked patch 6 to make it more readable.
 - Reverted back to use state as a tri-state variable instead of
   adding an additional bistate variable.
 - Added performance data for different values of PV_KICK_AHEAD_MAX.
 - Add a new patch to optimize PV unlock code path performance.

v1->v2:
 - Take out the queued unfair lock patches
 - Add a patch to simplify the PV unlock code
 - Move pending bit and statistics collection patches to the front
 - Keep vCPU kicking in pv_kick_node(), but defer it to unlock time
   when appropriate.
 - Change the wait-early patch to use adaptive spinning to better
   balance the differing effects on normal and over-committed guests.
 - Add patch-to-patch performance changes in the patch commit logs.

This patchset tries to improve the performance of both normal and
over-committed VM guests. The kick-ahead and adaptive spinning
patches are inspired by the "Do Virtual Machines Really Scale?" blog
from Sanidhya Kashyap.

Patch 1 simplifies the unlock code by removing the unnecessary
state check.

Patch 2 adds pending bit support to pvqspinlock improving performance
at light load.

Patch 3 allows the collection of various count data that are useful
for seeing what is happening in the system. They do add a bit of
overhead when enabled, slowing performance slightly.

Patch 4 optimizes the PV unlock code path performance for x86-64
architecture.

Patch 5 enables multiple vCPU kick-aheads at unlock time, outside of
the critical section, which can improve performance in overcommitted
guests and sometimes even in normal guests.

Patch 6 enables adaptive spinning in the queue nodes. This patch can
lead to a pretty big performance increase in over-committed guests at
the expense of a slight performance hit in normal guests.

Patches 2 & 4 improve performance of the common uncontended and lightly
contended cases. Patches 5-6 are for improving performance in
over-committed VM guests.

Performance measurements were done on a 32-CPU Westmere-EX and
Haswell-EX systems. The Westmere-EX system got the most performance
gain from patch 5, whereas the Haswell-EX system got the most gain
from patch 6 for over-committed guests.

The table below shows the Linux kernel build times for various
values of PV_KICK_AHEAD_MAX on an over-committed 48-vCPU guest on
the Westmere-EX system:

  PV_KICK_AHEAD_MAX   Patches 1-5   Patches 1-6
  -----------------   -----------   -----------
          1             9m46.9s      11m10.1s
          2             9m40.2s      10m08.3s
          3             9m36.8s       9m49.8s
          4             9m35.9s       9m38.7s
          5             9m35.1s       9m33.0s
          6             9m35.7s       9m28.5s

With patches 1-5, the performance wasn't very sensitive to different
PV_KICK_AHEAD_MAX values. Adding patch 6 into the mix, however, changes
the picture quite dramatically. There is a performance regression if
PV_KICK_AHEAD_MAX is too small. Starting with a value of 4, increasing
PV_KICK_AHEAD_MAX only gets us a minor benefit.

Waiman Long (6):
  locking/pvqspinlock: Unconditional PV kick with _Q_SLOW_VAL
  locking/pvqspinlock: Add pending bit support
  locking/pvqspinlock: Collect slowpath lock statistics
  locking/pvqspinlock, x86: Optimize PV unlock code path
  locking/pvqspinlock: Allow vCPUs kick-ahead
  locking/pvqspinlock: Queue node adaptive spinning

 arch/x86/Kconfig  |7 +
 arch/x86/include/asm/qspinlock_paravirt.h |   59 
 kernel/locking/qspinlock.c|   32 ++-
 kernel/locking/qspinlock_paravirt.h   |  475 +++--
 4 files changed, 542 insertions(+), 31 deletions(-)



[PATCH v5 1/6] locking/pvqspinlock: Unconditional PV kick with _Q_SLOW_VAL

2015-08-07 Thread Waiman Long
If _Q_SLOW_VAL has been set, the vCPU state must have been vcpu_hashed.
The extra check at the end of __pv_queued_spin_unlock() is unnecessary
and so is removed.

Signed-off-by: Waiman Long 
Reviewed-by: Davidlohr Bueso 
---
 kernel/locking/qspinlock_paravirt.h |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
index c8e6e9a..6eafb9e 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -364,8 +364,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock 
*lock)
 * vCPU is harmless other than the additional latency in completing
 * the unlock.
 */
-   if (READ_ONCE(node->state) == vcpu_hashed)
-   pv_kick(node->cpu);
+   pv_kick(node->cpu);
 }
 /*
  * Include the architecture specific callee-save thunk of the
-- 
1.7.1



Re: [PATCH v4 2/2] usb: Add support for ACPI identification to xhci-platform

2015-08-07 Thread Duc Dang
On Fri, Aug 7, 2015 at 6:29 PM, Greg KH  wrote:
> On Fri, Aug 07, 2015 at 06:03:36PM -0700, Duc Dang wrote:
>> Provide the methods to let ACPI identify the need to use
>> xhci-platform. Change the Kconfig files so the
>> xhci-plat.o file is selectable during kernel config.
>>
>> This has been tested on an ARM64 machine with platform XHCI, an
>> x86_64 machine with XHCI, and an x86_64 machine without XHCI.
>> There were no regressions or error messages on the machines
>> without platform XHCI.
>>
>> [dhdang: regenerate the patch over 4.2-rc5]
>> Signed-off-by: Mark Langsdorf 
>> Signed-off-by: Duc Dang 
>>
>> ---
>> Changes from v3:
>>   Regenerate the patch over 4.2-rc5
>>   No code change
>>
>> Changes from v2
>>   Replaced tristate with a boolean as the driver doesn't
>>   compile as a module
>>   Correct --help-- to ---help---
>>
>> Changes from v1
>>   Renamed from "add support for APM X-Gene to xhci-platform"
>>   Removed changes to arm64/Kconfig
>>   Made CONFIG_USB_XHCI_PLATFORM a user selectable config option
>>
>>  drivers/usb/host/Kconfig |  7 ++-
>>  drivers/usb/host/xhci-plat.c | 11 +++
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
>> index 8afc3c1..96231ee 100644
>> --- a/drivers/usb/host/Kconfig
>> +++ b/drivers/usb/host/Kconfig
>> @@ -32,7 +32,12 @@ config USB_XHCI_PCI
>> default y
>>
>>  config USB_XHCI_PLATFORM
>> - tristate
>> + tristate "xHCI platform driver support"
>> + ---help---
>> +   Say 'Y' to enable the support for the xHCI host controller
>> +   as a platform device. Many ARM SoCs provide USB this way.
>> +
>> +   If unsure, say 'Y'.
>>
>>  config USB_XHCI_MVEBU
>>   tristate "xHCI support for Marvell Armada 375/38x"
>> diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
>> index 5d03f8b..14b40d2 100644
>> --- a/drivers/usb/host/xhci-plat.c
>> +++ b/drivers/usb/host/xhci-plat.c
>> @@ -19,6 +19,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "xhci.h"
>>  #include "xhci-mvebu.h"
>> @@ -262,6 +263,15 @@ static const struct of_device_id usb_xhci_of_match[] = {
>>  MODULE_DEVICE_TABLE(of, usb_xhci_of_match);
>>  #endif
>>
>> +#ifdef CONFIG_ACPI
>
> You shoudn't need this #ifdef, right?

You are correct, Greg.

I will post a new version that removes this #ifdef CONFIG_ACPI shortly.

>
> thanks,
>
> greg k-h



-- 
Regards,
Duc Dang.


Re: [PATCH] perf, tools, report: Add support for srcfile sort key

2015-08-07 Thread Andi Kleen
On Fri, Aug 07, 2015 at 09:02:15PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Aug 07, 2015 at 08:51:45PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Aug 07, 2015 at 03:54:24PM -0700, Andi Kleen escreveu:
> > > From: Andi Kleen 
> > > 
> > > In some cases it's useful to characterize samples by file. This is useful
> > > to get a higher level categorization, for example to map cost to
> > > subsystems.
> > > 
> > > Add a srcfile sort key to perf report. It builds on top of the existing
> > > srcline support.
> > 
> > Applied
> 
> Humm, holding this up a bit, further testing showed some oddities,
> fedora21, the width of the column is being limited to the length of the
> header

Yes I've seen that, I just use -w normally. It also happens with --sort
srcline. The column sizing code could probably be somewhat smarter and
always allow the last column to become as wide as needed. But that's
something that should be done separately; I don't think it belongs
into this patch.

> and there are some DWARF errors, have you noticed those?

No, I didn't. It's likely a generic issue, probably also happening with
srcline and perhaps even objdump -S. You can find out with strace which
file it is.

-Andi



Re: [PATCH v3] cpufreq-dt: make scaling_boost_freqs sysfs attr available when boost is enabled

2015-08-07 Thread Viresh Kumar
On 07-08-15, 13:59, Bartlomiej Zolnierkiewicz wrote:
> Make the scaling_boost_freqs sysfs attribute available when the
> cpufreq-dt driver is used and boost support is enabled.
> 
> Cc: Thomas Abraham 
> Cc: Javier Martinez Canillas 
> Cc: Krzysztof Kozlowski 
> Suggested-by: Viresh Kumar 
> Acked-by: Viresh Kumar 
> Signed-off-by: Bartlomiej Zolnierkiewicz 
> ---
> v2: Removed leftover prototype.
> v3: added missing Acked-by
> Sorry for two resends, this is not my day..
> Rafael, please apply for v4.3.

Yeah, please apply only this patch from the series.

-- 
viresh


Re: [PATCH v4 0/6] cpufreq: use generic cpufreq drivers for Exynos4x12 platform

2015-08-07 Thread Viresh Kumar
On 08-08-15, 00:24, Rafael J. Wysocki wrote:
> OK, so please let me know which patches you want me to pick up.
> 
> Ideally, I'd prefer them to be resent in a separate series with ACKs and all
> with a cover letter clearly stating whose tree they are being targeted at.

He already sent it separately, let me send a confirmatory mail in
reply to that.

-- 
viresh


Re: [PATCH v4 1/6] cpufreq: make scaling_boost_freqs sysfs attr available when boost is enabled

2015-08-07 Thread Viresh Kumar
On 08-08-15, 00:21, Rafael J. Wysocki wrote:
> > Acked-by: Viresh Kumar 
> 
> And what exactly am I supposed to do with this?
> 
> Have a robot that will pick up all patches ACKed by you magically or what?

:)

That's why I have asked Bartlomiej specifically to send it separately
to the pm list as a cpufreq patch. I wasn't expecting you to apply
right from here.

-- 
viresh


[PATCH 1/2] ARM: VDSO: fix coarse clock monotonicity regression

2015-08-07 Thread Nathan Lynch
Since 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the
real timekeeper last") it has become possible on ARM to:

- Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp
  via syscall.
- Subsequently obtain a timestamp for the same clock ID via VDSO which
  predates the first timestamp (by one jiffy).

This is because ARM's update_vsyscall is deriving the coarse time
using the __current_kernel_time interface, when it should really be
using the timekeeper object provided to it by the timekeeping core.
It happened to work before only because __current_kernel_time would
access the same timekeeper object which had been passed to
update_vsyscall.  This is no longer the case.

Signed-off-by: Nathan Lynch 
---
 arch/arm/kernel/vdso.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/vdso.c b/arch/arm/kernel/vdso.c
index efe17dd9b921..c8b243c1aef8 100644
--- a/arch/arm/kernel/vdso.c
+++ b/arch/arm/kernel/vdso.c
@@ -296,7 +296,6 @@ static bool tk_is_cntvct(const struct timekeeper *tk)
  */
 void update_vsyscall(struct timekeeper *tk)
 {
-   struct timespec xtime_coarse;
struct timespec64 *wtm = &tk->wall_to_monotonic;
 
if (!cntvct_ok) {
@@ -308,10 +307,10 @@ void update_vsyscall(struct timekeeper *tk)
 
vdso_write_begin(vdso_data);
 
-   xtime_coarse = __current_kernel_time();
vdso_data->tk_is_cntvct = tk_is_cntvct(tk);
-   vdso_data->xtime_coarse_sec = xtime_coarse.tv_sec;
-   vdso_data->xtime_coarse_nsec= xtime_coarse.tv_nsec;
+   vdso_data->xtime_coarse_sec = tk->xtime_sec;
+   vdso_data->xtime_coarse_nsec= tk->tkr_mono.xtime_nsec >>
+   tk->tkr_mono.shift;
vdso_data->wtm_clock_sec= wtm->tv_sec;
vdso_data->wtm_clock_nsec   = wtm->tv_nsec;
 
-- 
2.1.0
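The patch replaces the __current_kernel_time() snapshot with values taken directly from the timekeeper argument, converting the shifted nanoseconds field on the fly. A minimal userspace model of that conversion (the struct and field names below are simplified stand-ins for the kernel's tk_read_base, where xtime_nsec is stored left-shifted by shift):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's tk_read_base: xtime_nsec is
 * stored left-shifted by 'shift', so the coarse nanoseconds value is
 * recovered with a right shift, exactly as the patch does. */
struct tkr_mono_model {
	uint64_t xtime_nsec;	/* shifted nanoseconds */
	uint32_t shift;
};

static uint64_t coarse_nsec(const struct tkr_mono_model *tkr)
{
	return tkr->xtime_nsec >> tkr->shift;
}
```

With this scheme the coarse fields always agree with the timekeeper that update_vsyscall() was handed, rather than with whatever __current_kernel_time() happens to observe.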



[PATCH 0/2] fix vdso coarse clock monotonicity regressions

2015-08-07 Thread Nathan Lynch
Commit 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the
real timekeeper last") made it so the user can observe the coarse
clocks going backwards on arm and arm64, if they're really looking for
it.

Technically these are fixing regressions versus 4.1, but I won't be
bothered if they don't make 4.2 final at this late stage, since only
the (seldom-used?) coarse clocks are affected.

I'd like to collect review/acks for these now and make sure they at
least make it into 4.3-rc1 (and -stable after that).


Nathan Lynch (2):
  ARM: VDSO: fix coarse clock monotonicity regression
  arm64: VDSO: fix coarse clock monotonicity regression

 arch/arm/kernel/vdso.c   | 7 +++
 arch/arm64/kernel/vdso.c | 7 +++
 2 files changed, 6 insertions(+), 8 deletions(-)

-- 
2.1.0



[PATCH 2/2] arm64: VDSO: fix coarse clock monotonicity regression

2015-08-07 Thread Nathan Lynch
Since 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the
real timekeeper last") it has become possible on arm64 to:

- Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp
  via syscall.
- Subsequently obtain a timestamp for the same clock ID via VDSO which
  predates the first timestamp (by one jiffy).

This is because arm64's update_vsyscall is deriving the coarse time
using the __current_kernel_time interface, when it should really be
using the timekeeper object provided to it by the timekeeping core.
It happened to work before only because __current_kernel_time would
access the same timekeeper object which had been passed to
update_vsyscall.  This is no longer the case.

Signed-off-by: Nathan Lynch 
---
 arch/arm64/kernel/vdso.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index ec37ab3f524f..97bc68f4c689 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -199,16 +199,15 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
  */
 void update_vsyscall(struct timekeeper *tk)
 {
-   struct timespec xtime_coarse;
u32 use_syscall = strcmp(tk->tkr_mono.clock->name, "arch_sys_counter");
 
++vdso_data->tb_seq_count;
smp_wmb();
 
-   xtime_coarse = __current_kernel_time();
vdso_data->use_syscall  = use_syscall;
-   vdso_data->xtime_coarse_sec = xtime_coarse.tv_sec;
-   vdso_data->xtime_coarse_nsec= xtime_coarse.tv_nsec;
+   vdso_data->xtime_coarse_sec = tk->xtime_sec;
+   vdso_data->xtime_coarse_nsec= tk->tkr_mono.xtime_nsec >>
+   tk->tkr_mono.shift;
vdso_data->wtm_clock_sec= tk->wall_to_monotonic.tv_sec;
vdso_data->wtm_clock_nsec   = tk->wall_to_monotonic.tv_nsec;
 
-- 
2.1.0



[PATCH RT 2/3] xfs: Disable percpu SB on PREEMPT_RT_FULL

2015-08-07 Thread Steven Rostedt
3.2.69-rt102-rc1 stable review patch.
If anyone has any objections, please let me know.

--

From: Steven Rostedt 

Running a test on a large CPU count box with xfs, I hit a livelock
with the following backtraces on several CPUs:

 Call Trace:
  [] __const_udelay+0x28/0x30
  [] xfs_icsb_lock_cntr+0x2a/0x40 [xfs]
  [] xfs_icsb_modify_counters+0x71/0x280 [xfs]
  [] xfs_trans_reserve+0x171/0x210 [xfs]
  [] xfs_create+0x24d/0x6f0 [xfs]
  [] ? avc_has_perm_flags+0xfb/0x1e0
  [] xfs_vn_mknod+0xbb/0x1e0 [xfs]
  [] xfs_vn_create+0x13/0x20 [xfs]
  [] vfs_create+0xcd/0x130
  [] do_last+0xb8f/0x1240
  [] path_openat+0xc2/0x490

Looking at the code I see it was stuck at:

STATIC void
xfs_icsb_lock_cntr(
xfs_icsb_cnts_t *icsbp)
{
while (test_and_set_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags)) {
ndelay(1000);
}
}

In xfs_icsb_modify_counters() the code is fine. There's a
preempt_disable() called when taking this bit spinlock and a
preempt_enable() after it is released. The issue is that not all
locations are protected by preempt_disable() when PREEMPT_RT is set.
Namely the places that grab all CPU cntr locks.

STATIC void
xfs_icsb_lock_all_counters(
xfs_mount_t *mp)
{
xfs_icsb_cnts_t *cntp;
int i;

for_each_online_cpu(i) {
cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
xfs_icsb_lock_cntr(cntp);
}
}

STATIC void
xfs_icsb_disable_counter()
{
[...]
xfs_icsb_lock_all_counters(mp);
[...]
xfs_icsb_unlock_all_counters(mp);
}

STATIC void
xfs_icsb_balance_counter_locked()
{
[...]
xfs_icsb_disable_counter();
[...]
}

STATIC void
xfs_icsb_balance_counter(
xfs_mount_t *mp,
xfs_sb_field_t  fields,
int min_per_cpu)
{
spin_lock(&mp->m_sb_lock);
xfs_icsb_balance_counter_locked(mp, fields, min_per_cpu);
spin_unlock(&mp->m_sb_lock);
}

Now, when PREEMPT_RT is not enabled, that spin_lock() disables
preemption. But for PREEMPT_RT, it does not. With my test box I was
not able to capture the state of all tasks, but I'm assuming that
some task called xfs_icsb_lock_all_counters() and was preempted by
an RT task and could not finish, causing all callers of that lock to
block indefinitely.

Dave Chinner has stated that the scalability of that code will probably
be negated by PREEMPT_RT, and that it is probably best to just disable
the code in question. Also, this code has been rewritten in newer kernels.

Link: http://lkml.kernel.org/r/20150504004844.GA21261@dastard

Cc: stable...@vger.kernel.org
Suggested-by: Dave Chinner 
Signed-off-by: Steven Rostedt 
---
 fs/xfs/xfs_linux.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 828662f70d64..13d86a8dae43 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -97,7 +97,7 @@
 /*
  * Feature macros (disable/enable)
  */
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT_FULL)
 #define HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */
 #else
 #undef  HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */
-- 
2.4.6
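The bit-spinlock at the core of the livelock can be modelled in userspace with C11 atomics. This is an illustrative sketch of the test_and_set_bit() pattern above, not the kernel code; the point is that if the holder can be preempted while the bit is set (as under PREEMPT_RT without preempt_disable()), every waiter spins in ndelay() forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define ICSB_FLAG_LOCK 0	/* bit index, mirroring XFS_ICSB_FLAG_LOCK */

/* Userspace sketch of the xfs_icsb_lock_cntr() bit-spinlock. */
static bool bit_trylock(atomic_ulong *flags)
{
	unsigned long mask = 1UL << ICSB_FLAG_LOCK;

	/* true if the bit was clear and we just set it */
	return (atomic_fetch_or(flags, mask) & mask) == 0;
}

static void bit_unlock(atomic_ulong *flags)
{
	atomic_fetch_and(flags, ~(1UL << ICSB_FLAG_LOCK));
}
```

A real waiter loops on bit_trylock() with a delay, which is exactly the shape that cannot make progress once the holder is descheduled.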




[PATCH RT 0/3] Linux 3.2.69-rt102-rc1

2015-08-07 Thread Steven Rostedt

Dear RT Folks,

This is the RT stable review cycle of patch 3.2.69-rt102-rc1.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 8/10/2015.

Enjoy,

-- Steve


To build 3.2.69-rt102-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.2.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.2.69.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/3.2/patch-3.2.69-rt102-rc1.patch.xz

You can also build from 3.2.69-rt101 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/3.2/incr/patch-3.2.69-rt101-rt102-rc1.patch.xz


Changes from 3.2.69-rt101:

---


Steven Rostedt (1):
  xfs: Disable percpu SB on PREEMPT_RT_FULL

Steven Rostedt (Red Hat) (1):
  Linux 3.2.69-rt102-rc1

Thomas Gleixner (1):
  mm/slub: move slab initialization into irq enabled region


 fs/xfs/xfs_linux.h |  2 +-
 localversion-rt|  2 +-
 mm/slub.c  | 77 ++
 3 files changed, 39 insertions(+), 42 deletions(-)


[PATCH RT 3/3] Linux 3.2.69-rt102-rc1

2015-08-07 Thread Steven Rostedt
3.2.69-rt102-rc1 stable review patch.
If anyone has any objections, please let me know.

--

From: "Steven Rostedt (Red Hat)" 

---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 9ea5981fa956..04456d92828d 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt101
+-rt102-rc1
-- 
2.4.6




[PATCH RT 1/3] mm/slub: move slab initialization into irq enabled region

2015-08-07 Thread Steven Rostedt
3.2.69-rt102-rc1 stable review patch.
If anyone has any objections, please let me know.

--

From: Thomas Gleixner 

Initializing a new slab can introduce rather large latencies because most
of the initialization always runs with interrupts disabled.

There is no point in doing so.  The newly allocated slab is not visible
yet, so there is no reason to protect it against concurrent alloc/free.

Move the expensive parts of the initialization into allocate_slab(), so
for all allocations with GFP_WAIT set, interrupts are enabled.

Signed-off-by: Thomas Gleixner 
Acked-by: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Sebastian Andrzej Siewior 
Cc: Steven Rostedt 
Cc: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Steven Rostedt 
---
 mm/slub.c | 77 ++-
 1 file changed, 37 insertions(+), 40 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 6a4c2fb83839..513c55ddbc3d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1258,6 +1258,14 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x) {}
 
 #endif /* CONFIG_SLUB_DEBUG */
 
+static void setup_object(struct kmem_cache *s, struct page *page,
+   void *object)
+{
+   setup_object_debug(s, page, object);
+   if (unlikely(s->ctor))
+   s->ctor(object);
+}
+
 /*
  * Slab allocation and freeing
  */
@@ -1279,6 +1287,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
struct page *page;
struct kmem_cache_order_objects oo = s->oo;
gfp_t alloc_gfp;
+   void *start, *last, *p;
+   int idx, order;
 
flags &= gfp_allowed_mask;
 
@@ -1301,17 +1311,11 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 * Try a lower order alloc if possible
 */
page = alloc_slab_page(flags, node, oo);
-
-   if (page)
-   stat(s, ORDER_FALLBACK);
+   if (unlikely(!page))
+   goto out;
+   stat(s, ORDER_FALLBACK);
}
 
-   if (flags & __GFP_WAIT)
-   local_irq_disable();
-
-   if (!page)
-   return NULL;
-
if (kmemcheck_enabled
&& !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS))) {
int pages = 1 << oo_order(oo);
@@ -1329,37 +1333,6 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
}
 
page->objects = oo_objects(oo);
-   mod_zone_page_state(page_zone(page),
-   (s->flags & SLAB_RECLAIM_ACCOUNT) ?
-   NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
-   1 << oo_order(oo));
-
-   return page;
-}
-
-static void setup_object(struct kmem_cache *s, struct page *page,
-   void *object)
-{
-   setup_object_debug(s, page, object);
-   if (unlikely(s->ctor))
-   s->ctor(object);
-}
-
-static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
-{
-   struct page *page;
-   void *start;
-   void *last;
-   void *p;
-
-   BUG_ON(flags & GFP_SLAB_BUG_MASK);
-
-   page = allocate_slab(s,
-   flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
-   if (!page)
-   goto out;
-
-   inc_slabs_node(s, page_to_nid(page), page->objects);
page->slab = s;
page->flags |= 1 << PG_slab;
 
@@ -1380,10 +1353,34 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
page->freelist = start;
page->inuse = page->objects;
page->frozen = 1;
+
 out:
+   if (flags & __GFP_WAIT)
+   local_irq_disable();
+   if (!page)
+   return NULL;
+
+   mod_zone_page_state(page_zone(page),
+   (s->flags & SLAB_RECLAIM_ACCOUNT) ?
+   NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
+   1 << oo_order(oo));
+
+   inc_slabs_node(s, page_to_nid(page), page->objects);
+
return page;
 }
 
+static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
+{
+   if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
+   pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+   BUG();
+   }
+
+   return allocate_slab(s,
+   flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
+}
+
 static void __free_slab(struct kmem_cache *s, struct page *page)
 {
int order = compound_order(page);
-- 
2.4.6




Re: [PATCH] dma: omap-dma: add support for pause of non-cyclic transfers

2015-08-07 Thread Peter Hurley
On 08/07/2015 02:32 PM, Russell King - ARM Linux wrote:
> On Fri, Aug 07, 2015 at 02:21:59PM -0400, Peter Hurley wrote:
>> [ + Heikki ]
>>
>> On 08/07/2015 12:33 PM, Russell King - ARM Linux wrote:
>>> What you have is a race condition in the code you a responsible for
>>> maintaining, caused by poorly implemented code.  Fix it, rather than
>>> whinging about drivers outside of your subsystem having never implemented
>>> _optional_ things that you choose to merge broken code which relied upon
>>> it _without_ checking that the operation succeeded.
>>>
>>> It is _entirely_ your code which is wrong here.
>>>
>>> I will wait for that to be fixed before acking the omap-dma change since
>>> you obviously need something to test with.
>>
>> I'm not sure to what you're referring here.
>>
>> A WARNing fixes nothing.
> 
> The warning can wait.
> 
>> If you mean some patch, as yet unwritten, that handles the dma cases when
>> dmaengine_pause() is unimplemented without data loss, ok, but please confirm
>> that's what you mean.
> 
> But the regression needs fixing.

I too would prefer the bug to be fixed.

But calling it a regression is incorrect. There is no previous SHA in which this
problem didn't exist, except before either 8250_dma or 8250_omap was added.

From the outset, both the 8250 dma code and the 8250_omap driver (mistakenly)
relied on dmaengine_pause.


>> However, at some point one must look at the api and wonder if the separation
>> of concern has been drawn in the right place.
> 
> It _is_ in the right place.  dmaengine_pause() always has been permitted
> to fail.  It's the responsibility of the user of this API to _check_ the
> return code to find out whether it had the desired effect.  Not checking
> the return code is a bug in the caller's code.
> 
> If that wasn't the case, dmaengine_pause() would have a void return type.
> It doesn't.  It has an 'int' to allow failure

A resource error is significantly different than ENOSYS or EINVAL.


> or to allow non-
> implementation for cases where the underlying hardware can't pause the
> channel without causing data loss.


That's your assertion; I've seen no documentation to back that up
(other than the de facto commit).

And quite frankly, that's absurd.

1. No other driver implements _only some_ use-cases of dmaengine_pause().
2. The number of users expecting dmaengine_pause to be implemented for
   non-cyclic dma transfers _dwarfs_ cyclic users.
3. There's a dedicated query interface, dma_get_slave_caps(), for which
   omap-dma returns /true/ -- not /maybe/ -- to indicate dmaengine_pause()
   is implemented.

As a consumer of the api, I'd much rather opt-out at device initialization
time knowing that a required feature is unimplemented, than discover it
at i/o time when it's too late.


> What would you think is better: an API which silently loses data, or
> one which refuses to stop the transfer and reports an error code back
> to the caller.

An api which provides a means of determining if necessary functionality
is implemented _during setup_. That way the consumer of the api can
determine if the feature is supportable.

For example, dma_get_slave_caps() could differentiate
* pause for cyclic support
* pause for non-cyclic support
* pause and resume support
* pause and terminate support
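
As a sketch of that setup-time check, using entirely hypothetical capability fields (the upstream struct dma_slave_caps has only a single cmd_pause flag; the split below does not exist):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical split of the single cmd_pause capability into the
 * per-use-case flags proposed above; none of these fields exist in
 * the real struct dma_slave_caps. */
struct dma_slave_caps_model {
	bool pause_cyclic;
	bool pause_noncyclic;
	bool resume;
};

/* An 8250-style consumer could then opt out of DMA during device
 * initialization instead of discovering the gap at I/O time. */
static bool uart_dma_usable(const struct dma_slave_caps_model *caps)
{
	return caps->pause_noncyclic && caps->resume;
}
```

The design choice is the same one argued for above: fail early and fall back to PIO at probe time, rather than lose data on a pause that silently does nothing.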



> You seem to be arguing for the former, and as such, there's no way I
> can take you seriously.

Leaping to conclusions.


> In any case, Greg has now commented on the patch adding the feature,
> basically refusing it for stable tree inclusion.  So the matter is
> settled: omap-dma isn't going to get the pause feature added in stable
> trees any time soon.  So a different solution now needs to be found,
> which is what I've been saying all along...

While Sebastian's initial patch is a good first-cut at addressing
8250_omap's use of omap-dma, none of the patches address the general
design problem I have outlined above; namely, that simply returning
an error at use time for an unimplemented slave transaction is
fundamentally flawed.

Regards,
Peter Hurley




Re: [PATCH] fs: create and use seq_show_option for escaping

2015-08-07 Thread Serge E. Hallyn
On Fri, Aug 07, 2015 at 04:41:50PM -0700, Kees Cook wrote:
> Many file systems that implement the show_options hook fail to correctly
> escape their output which could lead to unescaped characters (e.g. new
> lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
> could lead to confusion, spoofed entries (resulting in things like
> systemd issuing false d-bus "mount" notifications), and who knows
> what else. This looks like it would only be the root user stepping on
> themselves, but it's possible weird things could happen in containers
> or in other situations with delegated mount privileges.
> 
> Here's an example using overlay with setuid fusermount trusting the
> contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of
> "sudo" is something more sneaky:
> 
> $ BASE="ovl"
> $ MNT="$BASE/mnt"
> $ LOW="$BASE/lower"
> $ UP="$BASE/upper"
> $ WORK="$BASE/work/ 0 0
> none /proc fuse.pwn user_id=1000"
> $ mkdir -p "$LOW" "$UP" "$WORK"
> $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
> $ cat /proc/mounts
> none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
> none /proc fuse.pwn user_id=1000 0 0
> $ fusermount -u /proc
> $ cat /proc/mounts
> cat: /proc/mounts: No such file or directory
> 
> This fixes the problem by adding new seq_show_option and seq_show_option_n
> helpers, and updating the vulnerable show_option handlers to use them as
> needed. Some, like SELinux, need to be open coded due to unusual existing
> escape mechanisms.
> 
> Signed-off-by: Kees Cook 
> Cc: sta...@vger.kernel.org

Acked-by: Serge Hallyn 

> ---
>  fs/ceph/super.c  |  2 +-
>  fs/cifs/cifsfs.c |  6 +++---
>  fs/ext3/super.c  |  4 ++--
>  fs/ext4/super.c  |  4 ++--
>  fs/gfs2/super.c  |  6 +++---
>  fs/hfs/super.c   |  4 ++--
>  fs/hfsplus/options.c |  4 ++--
>  fs/hostfs/hostfs_kern.c  |  2 +-
>  fs/ocfs2/super.c |  4 ++--
>  fs/overlayfs/super.c |  6 +++---
>  fs/reiserfs/super.c  |  8 +---
>  fs/xfs/xfs_super.c   |  4 ++--
>  include/linux/seq_file.h | 34 ++
>  kernel/cgroup.c  |  7 ---
>  net/ceph/ceph_common.c   |  7 +--
>  security/selinux/hooks.c |  2 +-
>  16 files changed, 72 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index d1c833c321b9..7b6bfcbf801c 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -479,7 +479,7 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
>   if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT)
>   seq_printf(m, ",readdir_max_bytes=%d", fsopt->max_readdir_bytes);
>   if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT))
> - seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name);
> + seq_show_option(m, "snapdirname", fsopt->snapdir_name);
>  
>   return 0;
>  }
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 0a9fb6b53126..6a1119e87fbb 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -394,17 +394,17 @@ cifs_show_options(struct seq_file *s, struct dentry *root)
>   struct sockaddr *srcaddr;
>   srcaddr = (struct sockaddr *)&tcon->ses->server->srcaddr;
>  
> - seq_printf(s, ",vers=%s", tcon->ses->server->vals->version_string);
> + seq_show_option(s, "vers", tcon->ses->server->vals->version_string);
>   cifs_show_security(s, tcon->ses);
>   cifs_show_cache_flavor(s, cifs_sb);
>  
>   if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER)
>   seq_puts(s, ",multiuser");
>   else if (tcon->ses->user_name)
> - seq_printf(s, ",username=%s", tcon->ses->user_name);
> + seq_show_option(s, "username", tcon->ses->user_name);
>  
>   if (tcon->ses->domainName)
> - seq_printf(s, ",domain=%s", tcon->ses->domainName);
> + seq_show_option(s, "domain", tcon->ses->domainName);
>  
>   if (srcaddr->sa_family != AF_UNSPEC) {
>   struct sockaddr_in *saddr4;
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index 5ed0044fbb37..e9312494f3ee 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -578,10 +578,10 @@ static inline void ext3_show_quota_options(struct seq_file *seq, struct super_bl
>   }
>  
>   if (sbi->s_qf_names[USRQUOTA])
> - seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]);
> + seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]);
>  
>   if (sbi->s_qf_names[GRPQUOTA])
> - seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);
> + seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]);
>  
>   if (test_opt(sb, USRQUOTA))
>   seq_puts(seq, ",usrquota");
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 58987b5c514b..9981064c4a54 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> 
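
The escaping the new helpers perform can be sketched in userspace. This is an illustration of the idea (octal-escaping the characters that can corrupt /proc/mounts parsing), not the kernel's exact seq_show_option implementation:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Octal-escape the characters that would break /proc/mounts parsing
 * (separators, whitespace, newline, backslash). Illustration only. */
static void escape_option(const char *in, char *out, size_t outsz)
{
	const char *unsafe = ",= \t\n\\";
	size_t o = 0;

	for (; *in && o + 5 < outsz; in++) {
		if (strchr(unsafe, *in))
			o += snprintf(out + o, outsz - o, "\\%03o",
				      (unsigned char)*in);
		else
			out[o++] = *in;
	}
	out[o] = '\0';
}
```

With escaping like this, a hostile workdir value containing spaces and newlines renders as a single token in the options field instead of spoofed extra mount entries.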

Re: [PATCH v4 2/2] usb: Add support for ACPI identification to xhci-platform

2015-08-07 Thread Greg KH
On Fri, Aug 07, 2015 at 06:03:36PM -0700, Duc Dang wrote:
> Provide the methods to let ACPI identify the need to use
> xhci-platform. Change the Kconfig files so the
> xhci-plat.o file is selectable during kernel config.
> 
> This has been tested on an ARM64 machine with platform XHCI, an
> x86_64 machine with XHCI, and an x86_64 machine without XHCI.
> There were no regressions or error messages on the machines
> without platform XHCI.
> 
> [dhdang: regenerate the patch over 4.2-rc5]
> Signed-off-by: Mark Langsdorf 
> Signed-off-by: Duc Dang 
> 
> ---
> Changes from v3:
>   Regenerate the patch over 4.2-rc5
>   No code change
> 
> Changes from v2
>   Replaced tristate with a boolean as the driver doesn't
>   compile as a module
>   Correct --help-- to ---help---
> 
> Changes from v1
>   Renamed from "add support for APM X-Gene to xhci-platform"
>   Removed changes to arm64/Kconfig
>   Made CONFIG_USB_XHCI_PLATFORM a user selectable config option
> 
>  drivers/usb/host/Kconfig |  7 ++-
>  drivers/usb/host/xhci-plat.c | 11 +++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
> index 8afc3c1..96231ee 100644
> --- a/drivers/usb/host/Kconfig
> +++ b/drivers/usb/host/Kconfig
> @@ -32,7 +32,12 @@ config USB_XHCI_PCI
> default y
>  
>  config USB_XHCI_PLATFORM
> - tristate
> + tristate "xHCI platform driver support"
> + ---help---
> +   Say 'Y' to enable the support for the xHCI host controller
> +   as a platform device. Many ARM SoCs provide USB this way.
> +
> +   If unsure, say 'Y'.
>  
>  config USB_XHCI_MVEBU
>   tristate "xHCI support for Marvell Armada 375/38x"
> diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
> index 5d03f8b..14b40d2 100644
> --- a/drivers/usb/host/xhci-plat.c
> +++ b/drivers/usb/host/xhci-plat.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "xhci.h"
>  #include "xhci-mvebu.h"
> @@ -262,6 +263,15 @@ static const struct of_device_id usb_xhci_of_match[] = {
>  MODULE_DEVICE_TABLE(of, usb_xhci_of_match);
>  #endif
>  
> +#ifdef CONFIG_ACPI

You shoudn't need this #ifdef, right?

thanks,

greg k-h


Re: [PATCH 6/6] scatterlist: use sg_phys()

2015-08-07 Thread Dan Williams
On Fri, Aug 7, 2015 at 9:15 AM, Christoph Hellwig  wrote:
> From: Dan Williams 
>
> Coccinelle cleanup to replace open coded sg to physical address
> translations.  This is in preparation for introducing scatterlists that
> reference __pfn_t.
>
> // sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg)
> // usage: make coccicheck COCCI=sg_phys.cocci MODE=patch
>
> virtual patch
> virtual report
> virtual org
>
> @@
> struct scatterlist *sg;
> @@
>
> - page_to_phys(sg_page(sg)) + sg->offset
> + sg_phys(sg)
>
> @@
> struct scatterlist *sg;
> @@
>
> - page_to_phys(sg_page(sg))
> + sg_phys(sg) - sg->offset

So this ends up being an awkward conversion, a better one is:

- page_to_phys(sg_page(sg))
+ sg_phys(sg) & PAGE_MASK
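
The equivalence behind the suggested form is easy to model: for any in-page offset, masking the sg_phys() value with PAGE_MASK gives the same result as subtracting sg->offset (the page size and addresses below are arbitrary example values):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* What sg_phys(sg) computes: the page's physical address plus the
 * intra-page offset of the segment. */
static uint64_t sg_phys_model(uint64_t page_phys, unsigned int offset)
{
	return page_phys + offset;
}
```

Masking reads as "the page-aligned base of this segment", while subtracting the offset forces the reader to re-derive the same fact, which is why the masked form is the less awkward conversion.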

[..]
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5852df8..e0faecc 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2098,7 +2098,7 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
> sg_res = aligned_nrpages(sg->offset, sg->length);
>     sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + sg->offset;
> sg->dma_length = sg->length;
> -   pteval = page_to_phys(sg_page(sg)) | prot;
> +   pteval = (sg_phys(sg) - sg->offset) | prot;

For example here, as Joerg pointed out.

[1]: https://lkml.org/lkml/2015/6/10/230


[PATCH 6/9] x86, perf: Add Top Down events to Intel Core

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Add declarations for the events needed for TopDown to the
Intel big core CPUs starting with Sandy Bridge. We need
to report different values if HyperThreading is on or off.

The only thing this patch does is to export some events
in sysfs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out-of-order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.
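
As a sketch of how the abstracted values combine, here are the level 1 formulas perf stat computes from them (our restatement of the Yasin paper's formulas; the sample numbers are purely illustrative):

```python
# TopDown level 1: express each pipeline bottleneck as a fraction of the
# available issue slots. Variable names are ours, not the perf source.
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    fe_bound = fetch_bubbles / total_slots
    bad_spec = (slots_issued - slots_retired + recovery_bubbles) / total_slots
    retiring = slots_retired / total_slots
    # Backend bound is whatever share the other three don't explain.
    be_bound = 1.0 - (fe_bound + bad_spec + retiring)
    return fe_bound, bad_spec, retiring, be_bound

fe, bad, ret, be = topdown_level1(100.0, 60.0, 40.0, 30.0, 5.0)
assert abs((fe + bad + ret + be) - 1.0) < 1e-9  # the four shares sum to 1
```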

Some events need a multiplier. To handle this I redefined
".scale" slightly to let a negative value mean multiply by.

For HyperThreading the any bit is needed to get accurate
values when both threads are executing. This implies that
the events can only be collected as root or with
perf_event_paranoid=-1 for now.

Hyper Threading also requires averaging events from both
threads together (the CPU cannot measure them independently).
In perf stat this is done by using per core mode, and then
forcing a divisor of two to get the average. The
new .agg-per-core attribute is added to the events, which
then forces perf stat to enable --per-core.
When hyperthreading is disabled the attribute has the value 0.
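
A numeric sketch of why the HT scale for the slots events is -2 rather than -4 (our reading of the scheme described above; illustrative arithmetic, not kernel code):

```python
# With HT on, cpu_clk_unhalted.thread_any counts core cycles on *each*
# hyperthread, so perf stat's per-core mode (which sums the siblings)
# doubles the count. Multiplying by 2 (scale "-2") then yields
# pipeline_width (4) * core_cycles, the true number of slots.
PIPELINE_WIDTH = 4

core_cycles = 1000
count_t0 = core_cycles  # thread_any as seen on sibling 0
count_t1 = core_cycles  # thread_any as seen on sibling 1

per_core_sum = count_t0 + count_t1   # perf stat --per-core sums siblings
total_slots = per_core_sum * 2       # scale "-2" means multiply by 2
assert total_slots == PIPELINE_WIDTH * core_cycles
```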

The basic scheme is based on the following paper:
Yasin,
A Top Down Method for Performance analysis and Counter architecture
ISPASS14
(pdf available via google)

with some extensions to handle HyperThreading.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index a478e3c..65b58cb 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -217,9 +217,70 @@ struct attribute *nhm_events_attrs[] = {
NULL,
 };
 
+/*
+ * TopDown events for Core.
+ *
+ * With Hyper Threading on, TopDown metrics are averaged between the
+ * threads of a core: (count_core0 + count_core1) / 2. The 2 is expressed
+ * as a scale parameter. We also tell perf to aggregate per core
+ * by setting the .agg-per-core attribute for the alias to 1.
+ *
+ * Some events need to be multiplied by the pipeline width (4), which
+ * is expressed as a negative scale. In HT we cancel the factor 4
+ * with the 2 dividend for the core average, so we use -2.
+ */
+
+EVENT_ATTR_STR_HT(topdown-total-slots, td_total_slots,
+   "event=0x3c,umask=0x0", /* cpu_clk_unhalted.thread */
+   "event=0x3c,umask=0x0,any=1");  /* cpu_clk_unhalted.thread_any 
*/
+EVENT_ATTR_STR_HT(topdown-total-slots.scale, td_total_slots_scale,
+   "-4", "-2");
+EVENT_ATTR_STR_HT(topdown-total-slots.agg-per-core, td_total_slots_pc,
+   "0", "1");
+EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued,
+   "event=0xe,umask=0x1"); /* uops_issued.any */
+EVENT_ATTR_STR_HT(topdown-slots-issued.agg-per-core, td_slots_issued_pc,
+   "0", "1");
+EVENT_ATTR_STR_HT(topdown-slots-issued.scale, td_slots_issued_scale,
+   "0", "2");
+EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired,
+   "event=0xc2,umask=0x2");/* uops_retired.retire_slots */
+EVENT_ATTR_STR_HT(topdown-slots-retired.agg-per-core, td_slots_retired_pc,
+   "0", "1");
+EVENT_ATTR_STR_HT(topdown-slots-retired.scale, td_slots_retired_scale,
+   "0", "2");
+EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles,
+   "event=0x9c,umask=0x1");/* idq_uops_not_delivered_core 
*/
+EVENT_ATTR_STR_HT(topdown-fetch-bubbles.agg-per-core, td_fetch_bubbles_pc,
+   "0", "1");
+EVENT_ATTR_STR_HT(topdown-fetch-bubbles.scale, td_fetch_bubbles_scale,
+   "0", "2");
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
+   "event=0xd,umask=0x3,cmask=1",  /* int_misc.recovery_cycles */
+   "event=0xd,umask=0x3,cmask=1,any=1");   /* int_misc.recovery_cycles_any 
*/
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
+   "-4", "-2");
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles.agg-per-core, 
td_recovery_bubbles_pc,
+   "0", "1");
+
 struct attribute *snb_events_attrs[] = {
EVENT_PTR(mem_ld_snb),
EVENT_PTR(mem_st_snb),
+   EVENT_PTR(td_slots_issued),
+   EVENT_PTR(td_slots_issued_scale),
+   EVENT_PTR(td_slots_issued_pc),
+   

[PATCH 2/9] perf, tools, stat: Support up-scaling of events

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

TopDown needs to multiply events by constants (for example
the CPU Pipeline Width) to get the correct results.
The kernel needs to export this factor.

Today *.scale is only used to scale down metrics (divide), for example
to scale bytes to MB.

Repurpose negative scale to mean scaling up, that is multiplying.
Implement the code for this in perf stat.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index ea5298a..2590c75 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -179,6 +179,17 @@ static inline int nsec_counter(struct perf_evsel *evsel)
return 0;
 }
 
+static double scale_val(struct perf_evsel *counter, u64 val)
+{
+   double uval = val;
+
+   if (counter->scale < 0)
+   uval = val * (-counter->scale);
+   else if (counter->scale)
+   uval = val / counter->scale;
+   return uval;
+}
+
 /*
  * Read out the results of a single counter:
  * do not aggregate counts across CPUs in system-wide mode
@@ -630,12 +641,12 @@ static void abs_printout(int id, int nr, struct 
perf_evsel *evsel, double avg)
const char *fmt;
 
if (csv_output) {
-   fmt = sc != 1.0 ?  "%.2f%s" : "%.0f%s";
+   fmt = (sc != 1.0 && sc > 0) ?  "%.2f%s" : "%.0f%s";
} else {
if (big_num)
-   fmt = sc != 1.0 ? "%'18.2f%s" : "%'18.0f%s";
+   fmt = (sc != 1.0 && sc > 0) ? "%'18.2f%s" : "%'18.0f%s";
else
-   fmt = sc != 1.0 ? "%18.2f%s" : "%18.0f%s";
+   fmt = (sc != 1.0 && sc > 0) ? "%18.2f%s" : "%18.0f%s";
}
 
aggr_printout(evsel, id, nr);
@@ -750,7 +761,7 @@ static void aggr_update_shadow(void)
continue;
val += perf_counts(counter->counts, cpu, 
0)->val;
}
-   val = val * counter->scale;
+   val = scale_val(counter, val);
	perf_stat__update_shadow_stats(counter, &val,
				       first_shadow_cpu(counter, id));

}
@@ -788,7 +799,7 @@ static void print_aggr(char *prefix)
if (prefix)
fprintf(output, "%s", prefix);
 
-   uval = val * counter->scale;
+   uval = scale_val(counter, val);
printout(id, nr, counter, uval, prefix, run, ena, 1.0);
fputc('\n', output);
}
@@ -815,7 +826,7 @@ static void print_aggr_thread(struct perf_evsel *counter, 
char *prefix)
if (prefix)
fprintf(output, "%s", prefix);
 
-   uval = val * counter->scale;
+   uval = scale_val(counter, val);
printout(thread, 0, counter, uval, prefix, run, ena, 1.0);
fputc('\n', output);
}
@@ -860,7 +871,7 @@ static void print_counter_aggr(struct perf_evsel *counter, 
char *prefix)
return;
}
 
-   uval = avg * counter->scale;
+   uval = scale_val(counter, avg);
printout(-1, 0, counter, uval, prefix, avg_running, avg_enabled, avg);
fprintf(output, "\n");
 }
@@ -884,7 +895,7 @@ static void print_counter(struct perf_evsel *counter, char 
*prefix)
if (prefix)
fprintf(output, "%s", prefix);
 
-   uval = val * counter->scale;
+   uval = scale_val(counter, val);
printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
 
fputc('\n', output);
-- 
2.4.3



Re: [PATCH 2/2] PM / AVS: rockchip-io: add io selectors and supplies for rk3368

2015-08-07 Thread Rafael J. Wysocki
On Friday, August 07, 2015 04:04:53 PM Kevin Hilman wrote:
> Heiko Stübner  writes:
> 
> > This adds the necessary data for handling io voltage domains on the rk3368.
> > As an interesting tidbit, the rk3368 contains two separate iodomain areas.
> > One in the regular General Register Files (GRF) and one in PMUGRF in the
> > pmu power domain.
> >
> > Signed-off-by: Heiko Stuebner 
> 
> Acked-by: Kevin Hilman 
> 
> Rafael, feel free to take this series directly, or let me know if you
> want a pull request.  I don't have anything other AVS stuff queued up
> for v4.3.

I'll take it directly, thanks!

Rafael



[PATCH 5/9] x86, perf: Support sysfs files depending on SMT status

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Add a way to show different sysfs event attributes depending on
whether HyperThreading is on or off. This is difficult to determine
early at boot, so we just do it dynamically when the sysfs
attribute is read.
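
The check the show routine performs in-kernel (does any online CPU have a thread sibling?) can be sketched from topology sibling lists in userspace (illustrative only; the parsing helper and names are ours, not a perf API):

```python
# HT is considered on if any CPU's sibling set contains more than one CPU.
# parse_cpu_list handles "thread_siblings_list"-style strings: "0,4" or "0-1".
def parse_cpu_list(s):
    cpus = set()
    for part in s.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus

def smt_active(sibling_lists):
    return any(len(parse_cpu_list(s)) > 1 for s in sibling_lists)

assert smt_active(["0,4", "1,5"])       # HT sibling pairs -> on
assert not smt_active(["0", "1", "2"])  # one thread per core -> off
```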

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.c | 34 ++
 arch/x86/kernel/cpu/perf_event.h | 10 ++
 include/linux/perf_event.h   |  7 +++
 3 files changed, 51 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8bac4bb..a1313ed 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1590,6 +1590,40 @@ ssize_t events_sysfs_show(struct device *dev, struct 
device_attribute *attr,
return x86_pmu.events_sysfs_show(page, config);
 }
 
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+ char *page)
+{
+   struct perf_pmu_events_ht_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_events_ht_attr, attr);
+   bool ht_on = false;
+   int cpu;
+
+   /*
+* Report conditional events depending on Hyper-Threading.
+*
+* Check all online CPUs if any have a thread sibling,
+* as perf may measure any of them.
+*
+* This is overly conservative as usually the HT special
+* handling is not needed if the other CPU thread is idle.
+*
+* Note this does not (cannot) handle the case when thread
+* siblings are invisible, for example with virtualization
+* if they are owned by some other guest.  The user tool
+* has to re-read when a thread sibling gets onlined later.
+*/
+   for_each_online_cpu (cpu) {
+   ht_on = cpumask_weight(topology_sibling_cpumask(cpu)) > 1;
+   if (ht_on)
+   break;
+   }
+
+   return sprintf(page, "%s",
+   ht_on ?
+   pmu_attr->event_str_ht :
+   pmu_attr->event_str_noht);
+}
+
 EVENT_ATTR(cpu-cycles, CPU_CYCLES  );
 EVENT_ATTR(instructions,   INSTRUCTIONS);
 EVENT_ATTR(cache-references,   CACHE_REFERENCES);
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 5edf6d8..3df86d9 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -664,6 +664,14 @@ static struct perf_pmu_events_attr event_attr_##v = {  
\
.event_str  = str,  \
 };
 
+#define EVENT_ATTR_STR_HT(_name, v, noht, ht)  \
+static struct perf_pmu_events_ht_attr event_attr_##v = {   \
+   .attr   = __ATTR(_name, 0444, events_ht_sysfs_show, NULL),\
+   .id = 0,\
+   .event_str_noht = noht, \
+   .event_str_ht   = ht,   \
+};
+
 extern struct x86_pmu x86_pmu __read_mostly;
 
 static inline bool x86_pmu_has_lbr_callstack(void)
@@ -923,6 +931,8 @@ int knc_pmu_init(void);
 
 ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
  char *page);
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+ char *page);
 
 static inline int is_ht_workaround_enabled(void)
 {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2027809..5e9ee24 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1068,6 +1068,13 @@ struct perf_pmu_events_attr {
const char *event_str;
 };
 
+struct perf_pmu_events_ht_attr {
+   struct device_attribute attr;
+   u64 id;
+   const char *event_str_ht;
+   const char *event_str_noht;
+};
+
 ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute 
*attr,
  char *page);
 
-- 
2.4.3



[PATCH 9/9] perf, tools, stat: Force --per-core mode for .agg-per-core aliases

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

When an event alias is used that the kernel marked as .agg-per-core, force
--per-core mode (and also require -a and forbid cgroups or per thread mode).
This in turn means that --topdown forces --per-core mode.

This is needed for TopDown in SMT mode, because it needs to measure
all threads in a core together and merge the values to compute the correct
percentages of how the pipeline is limited.

We do this if any alias is agg-per-core.

Add the code to parse the .agg-per-core attributes and propagate
the information to the evsel. Then the main stat code does
the necessary checks and forces per core mode.

Open issue: in combination with -C ... we get wrong values. I think that's
an existing bug that needs to be debugged/fixed separately.
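
The constraints described above can be summarized as a small decision function (a sketch of our reading of the rules; names are ours, not the perf source):

```python
# An .agg-per-core event forces per-core aggregation and requires
# system-wide mode without cgroups; other aggregation modes are rejected.
def resolve_aggr_mode(agg_per_core, aggr_mode, system_wide, nr_cgroups):
    if not agg_per_core:
        return aggr_mode  # nothing to enforce
    if aggr_mode not in ("global", "core"):
        raise ValueError("per core event configuration requires per core mode")
    if nr_cgroups or not system_wide:
        raise ValueError(
            "per core event configuration requires system-wide mode (-a)")
    return "core"

assert resolve_aggr_mode(True, "global", True, 0) == "core"
```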

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c  | 18 ++
 tools/perf/util/evsel.h|  1 +
 tools/perf/util/parse-events.c |  1 +
 tools/perf/util/pmu.c  | 23 +++
 tools/perf/util/pmu.h  |  2 ++
 5 files changed, 45 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index eec6c16..0df0aff 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1382,6 +1382,7 @@ int cmd_stat(int argc, const char **argv, const char 
*prefix __maybe_unused)
bool append_file = false;
int output_fd = 0;
const char *output_name = NULL;
+   struct perf_evsel *counter;
const struct option options[] = {
	OPT_BOOLEAN('T', "transaction", &transaction_run,
		    "hardware transaction statistics"),
@@ -1563,6 +1564,23 @@ int cmd_stat(int argc, const char **argv, const char 
*prefix __maybe_unused)
if (add_default_attributes())
goto out;
 
+   evlist__for_each (evsel_list, counter) {
+   /* Enable per core mode if only a single event requires it. */
+   if (counter->agg_per_core) {
+   if (stat_config.aggr_mode != AGGR_GLOBAL &&
+   stat_config.aggr_mode != AGGR_CORE) {
+   pr_err("per core event configuration requires 
per core mode\n");
+   goto out;
+   }
+   stat_config.aggr_mode = AGGR_CORE;
+			if (nr_cgroups || !target__has_cpu(&target)) {
+   pr_err("per core event configuration requires 
system-wide mode (-a)\n");
+   goto out;
+   }
+   break;
+   }
+   }
+
target__validate();
 
	if (perf_evlist__create_maps(evsel_list, &target) < 0) {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 6a12908..85f02b8 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -100,6 +100,7 @@ struct perf_evsel {
boolsystem_wide;
booltracking;
boolper_pkg;
+   boolagg_per_core;
/* parse modifier helper */
int exclude_GH;
int nr_members;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 828936d..d2a5938 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -759,6 +759,7 @@ int parse_events_add_pmu(struct parse_events_evlist *data,
evsel->unit = info.unit;
evsel->scale = info.scale;
evsel->per_pkg = info.per_pkg;
+   evsel->agg_per_core = info.agg_per_core;
evsel->snapshot = info.snapshot;
}
 
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index ce56354..abedb6a 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -189,6 +189,23 @@ perf_pmu__parse_per_pkg(struct perf_pmu_alias *alias, char 
*dir, char *name)
return 0;
 }
 
+static void
+perf_pmu__parse_agg_per_core(struct perf_pmu_alias *alias, char *dir, char 
*name)
+{
+   char path[PATH_MAX];
+   FILE *f;
+   int flag;
+
+   snprintf(path, PATH_MAX, "%s/%s.agg-per-core", dir, name);
+
+   f = fopen(path, "r");
+	if (f && fscanf(f, "%d", &flag) == 1) {
+   alias->agg_per_core = flag != 0;
+   fclose(f);
+   }
+}
+
+
 static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias,
char *dir, char *name)
 {
@@ -237,6 +254,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
perf_pmu__parse_scale(alias, dir, name);
perf_pmu__parse_per_pkg(alias, dir, name);
perf_pmu__parse_snapshot(alias, dir, name);
+   perf_pmu__parse_agg_per_core(alias, dir, name);
}
 
list_add_tail(>list, list);
@@ -271,6 +289,8 @@ static inline bool pmu_alias_info_file(char *name)
return true;
if (len > 9 

[PATCH 7/9] x86, perf: Add Top Down events to Intel Atom

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Add topdown event declarations to Silvermont / Airmont.
These cores do not support the full Top Down metrics, but a useful
subset (FrontendBound, Retiring, Backend Bound/Bad Speculation).

The perf stat tool automatically handles the missing events
and combines the available metrics.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 65b58cb..1f08603 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1380,6 +1380,29 @@ static __initconst const u64 atom_hw_cache_event_ids
  },
 };
 
+EVENT_ATTR_STR(topdown-total-slots, td_total_slots_slm, "event=0x3c");
+EVENT_ATTR_STR(topdown-total-slots.scale, td_total_slots_scale_slm, "-2");
+/* no_alloc_cycles.not_delivered */
+EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles_slm,
+  "event=0xca,umask=0x50");
+EVENT_ATTR_STR(topdown-fetch-bubbles.scale, td_fetch_bubbles_scale_slm, "-2");
+/* uops_retired.all */
+EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued_slm,
+  "event=0xc2,umask=0x10");
+/* uops_retired.all */
+EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired_slm,
+  "event=0xc2,umask=0x10");
+
+struct attribute *slm_events_attrs[] = {
+   EVENT_PTR(td_total_slots_slm),
+   EVENT_PTR(td_total_slots_scale_slm),
+   EVENT_PTR(td_fetch_bubbles_slm),
+   EVENT_PTR(td_fetch_bubbles_scale_slm),
+   EVENT_PTR(td_slots_issued_slm),
+   EVENT_PTR(td_slots_retired_slm),
+   NULL
+};
+
 static struct extra_reg intel_slm_extra_regs[] __read_mostly =
 {
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
@@ -3401,6 +3424,7 @@ __init int intel_pmu_init(void)
x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints;
x86_pmu.extra_regs = intel_slm_extra_regs;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+   x86_pmu.cpu_events = slm_events_attrs;
pr_cont("Silvermont events, ");
break;
 
-- 
2.4.3



[PATCH 1/9] perf, tools: Dont stop PMU parsing on alias parse error

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

When an error happens during alias parsing, currently the complete
parsing of all attributes of the PMU is stopped. This breaks an
old perf on a newer kernel that may have not-yet-known
alias attributes (such as .scale or .per-pkg).

Continue when some attribute is unparseable.

This is IMHO a stable candidate and should be backported
to older versions to avoid problems with newer kernels.
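
The relaxed scan amounts to skip-on-error instead of abort-on-error; a minimal sketch (illustrative Python, not the perf code — 'files' stands in for the sysfs events directory):

```python
# An unreadable or unparseable alias file is skipped instead of aborting
# the whole PMU directory scan.
def parse_aliases(files, parse):
    aliases = {}
    for name, text in files.items():
        try:
            aliases[name] = parse(text)
        except ValueError:
            continue  # e.g. an old perf meeting a not-yet-known attribute
    return aliases

def parse(text):
    # Toy parser: only understands "event=..." definitions.
    if not text.startswith("event="):
        raise ValueError(text)
    return text

files = {"cycles": "event=0x3c", "cycles.per-pkg": "1"}
# The unknown ".per-pkg" file is skipped; "cycles" still parses:
assert list(parse_aliases(files, parse)) == ["cycles"]
```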

Signed-off-by: Andi Kleen 
---
 tools/perf/util/pmu.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index d4b0e64..ce56354 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -283,13 +283,12 @@ static int pmu_aliases_parse(char *dir, struct list_head 
*head)
 {
struct dirent *evt_ent;
DIR *event_dir;
-   int ret = 0;
 
event_dir = opendir(dir);
if (!event_dir)
return -EINVAL;
 
-   while (!ret && (evt_ent = readdir(event_dir))) {
+   while ((evt_ent = readdir(event_dir))) {
char path[PATH_MAX];
char *name = evt_ent->d_name;
FILE *file;
@@ -305,17 +304,16 @@ static int pmu_aliases_parse(char *dir, struct list_head 
*head)
 
snprintf(path, PATH_MAX, "%s/%s", dir, name);
 
-   ret = -EINVAL;
file = fopen(path, "r");
if (!file)
-   break;
+   continue;
 
-   ret = perf_pmu__new_alias(head, dir, name, file);
+   perf_pmu__new_alias(head, dir, name, file);
fclose(file);
}
 
closedir(event_dir);
-   return ret;
+   return 0;
 }
 
 /*
-- 
2.4.3



[PATCH 8/9] perf, tools, stat: Add extra output of counter values with -v

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Add debug output of raw counter values per CPU when
perf stat -v is specified, together with their cpu numbers.
This is very useful to debug problems with per core counters,
where we can normally only see aggregated values.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a83f26f..eec6c16 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -223,6 +223,13 @@ static int read_counter(struct perf_evsel *counter)
count = perf_counts(counter->counts, cpu, thread);
if (perf_evsel__read(counter, cpu, thread, count))
return -1;
+   if (verbose) {
+   fprintf(stat_config.output,
+   "%s: %d: %" PRIu64 " %" PRIu64 " %" 
PRIu64 "\n",
+   perf_evsel__name(counter),
+   cpu,
+   count->val, count->ena, 
count->run);
+   }
}
}
 
-- 
2.4.3



Add top down metrics to perf stat

2015-08-07 Thread Andi Kleen
This patchkit adds support for TopDown to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one of them a fixed counter).

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots   Available slots in the pipeline
topdown-slots-issued  Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
  from misspeculation

These metrics then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ ./perf stat --topdown -a ./BC1s 

 Performance counter stats for 'system wide':

S0-C0   2          19650790  topdown-total-slots                            (100.00%)
S0-C0   2        4445680.00  topdown-fetch-bubbles     #  22.62% frontend bound   (100.00%)
S0-C0   2        1743552.00  topdown-slots-retired                          (100.00%)
S0-C0   2            622954  topdown-recovery-bubbles                       (100.00%)
S0-C0   2        2025498.00  topdown-slots-issued      #  63.90% backend bound
S0-C1   2       16685216540  topdown-total-slots                            (100.00%)
S0-C1   2      962557931.00  topdown-fetch-bubbles                          (100.00%)
S0-C1   2     4175583320.00  topdown-slots-retired                          (100.00%)
S0-C1   2        1743329246  topdown-recovery-bubbles  #  22.22% bad speculation  (100.00%)
S0-C1   2     6138901193.50  topdown-slots-issued      #  46.99% backend bound

   1.535832673 seconds time elapsed
 
On Hyper Threaded CPUs Top Down computes metrics per core instead of per
logical CPU. In this case perf stat automatically enables --per-core mode,
and also requires global mode (-a) and forbids other filters (no cgroup
mode).

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.  

On systems without Hyper Threading it can be used per process.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-2



[PATCH 4/9] perf, tools, stat: Add computation of TopDown formulas

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Implement the TopDown formulas in perf stat. The topdown basic metrics
reported by the kernel are collected, and the formulas are computed
and output as normal metrics.

See the kernel commit exporting the events for details on the used
metrics.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/stat-shadow.c | 119 +-
 tools/perf/util/stat.c|   5 ++
 tools/perf/util/stat.h|   8 ++-
 3 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 073e66f7..2158a0e 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -28,6 +28,11 @@ static struct stats 
runtime_dtlb_cache_stats[NUM_CTX][MAX_NR_CPUS];
 static struct stats runtime_cycles_in_tx_stats[NUM_CTX][MAX_NR_CPUS];
 static struct stats runtime_transaction_stats[NUM_CTX][MAX_NR_CPUS];
 static struct stats runtime_elision_stats[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_total_slots[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_slots_issued[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_slots_retired[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_fetch_bubbles[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_recovery_bubbles[NUM_CTX][MAX_NR_CPUS];
 
 struct stats walltime_nsecs_stats;
 
@@ -68,6 +73,11 @@ void perf_stat__reset_shadow_stats(void)
sizeof(runtime_transaction_stats));
memset(runtime_elision_stats, 0, sizeof(runtime_elision_stats));
	memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
+	memset(runtime_topdown_total_slots, 0, sizeof(runtime_topdown_total_slots));
+	memset(runtime_topdown_slots_retired, 0, sizeof(runtime_topdown_slots_retired));
+	memset(runtime_topdown_slots_issued, 0, sizeof(runtime_topdown_slots_issued));
+	memset(runtime_topdown_fetch_bubbles, 0, sizeof(runtime_topdown_fetch_bubbles));
+	memset(runtime_topdown_recovery_bubbles, 0, sizeof(runtime_topdown_recovery_bubbles));
 }
 
 /*
@@ -90,6 +100,16 @@ void perf_stat__update_shadow_stats(struct perf_evsel 
*counter, u64 *count,
		update_stats(&runtime_transaction_stats[ctx][cpu], count[0]);
	else if (perf_stat_evsel__is(counter, ELISION_START))
		update_stats(&runtime_elision_stats[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS))
+		update_stats(&runtime_topdown_total_slots[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED))
+		update_stats(&runtime_topdown_slots_issued[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED))
+		update_stats(&runtime_topdown_slots_retired[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES))
+		update_stats(&runtime_topdown_fetch_bubbles[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
+		update_stats(&runtime_topdown_recovery_bubbles[ctx][cpu], count[0]);
	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
		update_stats(&runtime_stalled_cycles_front_stats[ctx][cpu], count[0]);
	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
@@ -293,11 +313,70 @@ static void print_ll_cache_misses(int cpu,
print_metric(ctxp, color, "%7.2f%%", "of all LL-cache hits", ratio);
 }
 
+/*
+ * For an explanation of the formulas see:
+ * Yasin, A Top Down Method for Performance analysis and Counter architecture
+ * ISPASS14
+ */
+
+static double td_total_slots(int ctx, int cpu)
+{
+	return avg_stats(&runtime_topdown_total_slots[ctx][cpu]);
+}
+
+static double td_bad_spec(int ctx, int cpu)
+{
+   double bad_spec = 0;
+   double total_slots;
+   double total;
+
+	total = avg_stats(&runtime_topdown_slots_issued[ctx][cpu]) -
+		avg_stats(&runtime_topdown_slots_retired[ctx][cpu]) +
+		avg_stats(&runtime_topdown_recovery_bubbles[ctx][cpu]);
+   total_slots = td_total_slots(ctx, cpu);
+   if (total_slots)
+   bad_spec = total / total_slots;
+   return bad_spec;
+}
+
+static double td_retiring(int ctx, int cpu)
+{
+   double retiring = 0;
+   double total_slots = td_total_slots(ctx, cpu);
+	double ret_slots = avg_stats(&runtime_topdown_slots_retired[ctx][cpu]);
+
+   if (total_slots)
+   retiring = ret_slots / total_slots;
+   return retiring;
+}
+
+static double td_fe_bound(int ctx, int cpu)
+{
+   double fe_bound = 0;
+   double total_slots = td_total_slots(ctx, cpu);
+	double fetch_bub = avg_stats(&runtime_topdown_fetch_bubbles[ctx][cpu]);
+
+   if (total_slots)
+   fe_bound = fetch_bub / total_slots;
+   return fe_bound;
+}
+
+static double td_be_bound(int ctx, int cpu)
+{
+   double sum = (td_fe_bound(ctx, cpu) +
+ td_bad_spec(ctx, 

[PATCH 3/9] perf, tools, stat: Basic support for TopDown in perf stat

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Add basic plumbing for TopDown in perf stat

Add a new --topdown options to enable events.
When --topdown is specified set up events for all topdown
events supported by the kernel.
Add topdown-* as a special case to the event parser, as is
needed for all events containing -.

The actual code to compute the metrics is in follow-on patches.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-stat.txt |   8 +++
 tools/perf/builtin-stat.c  | 124 -
 tools/perf/util/parse-events.l |   1 +
 3 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index 47469ab..86c03e9 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -158,6 +158,14 @@ filter out the startup phase of the program, which is 
often very different.
 
 Print statistics of transactional execution if supported.
 
+--topdown::
+
+Print top down level 1 metrics if supported by the CPU. This allows
+determining bottlenecks in the CPU pipeline for CPU bound workloads,
+by breaking execution down into frontend bound, backend bound, bad
+speculation and retiring.  Specifying the option multiple times shows
+metrics even if they don't cross a threshold.
+
 EXAMPLES
 
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2590c75..a83f26f 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -93,6 +93,15 @@ static const char * transaction_limited_attrs = {
"}"
 };
 
+static const char * topdown_attrs[] = {
+   "topdown-total-slots",
+   "topdown-fetch-bubbles",
+   "topdown-slots-retired",
+   "topdown-recovery-bubbles",
+   "topdown-slots-issued",
+   NULL,
+};
+
 static struct perf_evlist  *evsel_list;
 
 static struct target target = {
@@ -105,6 +114,7 @@ static volatile pid_t   child_pid   
= -1;
 static boolnull_run=  false;
 static int detailed_run=  0;
 static booltransaction_run;
+static int topdown_run = 0;
 static boolbig_num =  true;
 static int big_num_opt =  -1;
 static const char  *csv_sep= NULL;
@@ -735,7 +745,8 @@ static void printout(int id, int nr, struct perf_evsel 
*counter, double uval,
first_shadow_cpu(counter, id),
pm,
nl,
-   &os);
+   &os,
+   topdown_run);
 
if (!csv_output) {
print_noise(counter, noise);
@@ -1093,12 +1104,90 @@ static int perf_stat_init_aggr_mode(void)
return 0;
 }
 
+static void filter_events(const char **attr, char **str, bool use_group)
+{
+   int off = 0;
+   int i;
+   int len = 0;
+   char *s;
+
+   for (i = 0; attr[i]; i++) {
+   if (pmu_have_event("cpu", attr[i])) {
+   len += strlen(attr[i]) + 1;
+   attr[i - off] = attr[i];
+   } else
+   off++;
+   }
+   attr[i - off] = NULL;
+
+   *str = malloc(len + 1 + 2);
+   if (!*str)
+   return;
+   s = *str;
+   if (i - off == 0) {
+   *s = 0;
+   return;
+   }
+   if (use_group)
+   *s++ = '{';
+   for (i = 0; attr[i]; i++) {
+   strcpy(s, attr[i]);
+   s += strlen(s);
+   *s++ = ',';
+   }
+   if (use_group) {
+   s[-1] = '}';
+   *s = 0;
+   } else
+   s[-1] = 0;
+}
+
+/* Caller must free result */
+static char *sysctl_read(const char *fn)
+{
+   int n;
+   char *line = NULL;
+   size_t linelen = 0;
+   FILE *f = fopen(fn, "r");
+   if (!f)
+   return NULL;
+   n = getline(&line, &linelen, f);
+   fclose(f);
+   if (n > 0)
+   return line;
+   free(line);
+   return NULL;
+}
+
+/*
+ * Check whether we can use a group for top down.
+ * Without a group we may get bad results.
+ */
+static bool check_group(bool *warn)
+{
+   char *v = sysctl_read("/proc/sys/kernel/nmi_watchdog");
+   int n;
+
+   *warn = false;
+   if (v) {
+   bool res = sscanf(v, "%d", &n) == 1 && n != 0;
+   free(v);
+   if (res) {
+   *warn = true;
+   return false;
+   }
+   return true;
+   }
+   return false; /* Don't know, so don't use group */
+}
+
 /*
  * Add default attributes, if there were no attributes specified or
  * if -d/--detailed, -d -d or -d -d -d is used:
  
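[Editorial note: the metrics the follow-on patches compute from the five topdown-* events above follow the standard level-1 TopDown decomposition. A minimal userspace C sketch of those formulas, with illustrative variable names (this is not the actual perf code):]

```c
#include <assert.h>
#include <math.h>

/* Illustrative only: level-1 TopDown formulas computed from the five
 * topdown-* events listed in topdown_attrs[].  Field names mirror the
 * event names; the real computation lives in the follow-on patches. */
struct topdown {
	double total_slots, fetch_bubbles, slots_retired,
	       recovery_bubbles, slots_issued;
};

static double td_fe_bound(const struct topdown *t)
{
	return t->fetch_bubbles / t->total_slots;
}

static double td_bad_spec(const struct topdown *t)
{
	return (t->slots_issued - t->slots_retired + t->recovery_bubbles) /
	       t->total_slots;
}

static double td_retiring(const struct topdown *t)
{
	return t->slots_retired / t->total_slots;
}

/* Backend bound is whatever slot fraction the other three don't cover. */
static double td_be_bound(const struct topdown *t)
{
	return 1.0 - (td_fe_bound(t) + td_bad_spec(t) + td_retiring(t));
}
```

Because backend bound is derived as the remainder, the four fractions always sum to 1.0 by construction.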

[PATCH v4 2/2] usb: Add support for ACPI identification to xhci-platform

2015-08-07 Thread Duc Dang
Provide the methods to let ACPI identify the need to use
xhci-platform. Change the Kconfig files so the
xhci-plat.o file is selectable during kernel config.

This has been tested on an ARM64 machine with platform XHCI, an
x86_64 machine with XHCI, and an x86_64 machine without XHCI.
There were no regressions or error messages on the machines
without platform XHCI.

[dhdang: regenerate the patch over 4.2-rc5]
Signed-off-by: Mark Langsdorf 
Signed-off-by: Duc Dang 

---
Changes from v3:
Regenerate the patch over 4.2-rc5
No code change

Changes from v2
Replaced tristate with a boolean as the driver doesn't
compile as a module
Correct --help-- to ---help---

Changes from v1
Renamed from "add support for APM X-Gene to xhci-platform"
Removed changes to arm64/Kconfig
Made CONFIG_USB_XHCI_PLATFORM a user selectable config option

 drivers/usb/host/Kconfig |  7 ++-
 drivers/usb/host/xhci-plat.c | 11 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
index 8afc3c1..96231ee 100644
--- a/drivers/usb/host/Kconfig
+++ b/drivers/usb/host/Kconfig
@@ -32,7 +32,12 @@ config USB_XHCI_PCI
default y
 
 config USB_XHCI_PLATFORM
-   tristate
+   tristate "xHCI platform driver support"
+   ---help---
+ Say 'Y' to enable the support for the xHCI host controller
+ as a platform device. Many ARM SoCs provide USB this way.
+
+ If unsure, say 'Y'.
 
 config USB_XHCI_MVEBU
tristate "xHCI support for Marvell Armada 375/38x"
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 5d03f8b..14b40d2 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "xhci.h"
 #include "xhci-mvebu.h"
@@ -262,6 +263,15 @@ static const struct of_device_id usb_xhci_of_match[] = {
 MODULE_DEVICE_TABLE(of, usb_xhci_of_match);
 #endif
 
+#ifdef CONFIG_ACPI
+static const struct acpi_device_id usb_xhci_acpi_match[] = {
+   /* APM X-Gene USB Controller */
+   { "PNP0D10", },
+   { }
+};
+MODULE_DEVICE_TABLE(acpi, usb_xhci_acpi_match);
+#endif
+
 static struct platform_driver usb_xhci_driver = {
.probe  = xhci_plat_probe,
.remove = xhci_plat_remove,
@@ -269,6 +279,7 @@ static struct platform_driver usb_xhci_driver = {
.name = "xhci-hcd",
.pm = DEV_PM_OPS,
.of_match_table = of_match_ptr(usb_xhci_of_match),
+   .acpi_match_table = ACPI_PTR(usb_xhci_acpi_match),
},
 };
 MODULE_ALIAS("platform:xhci-hcd");
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 1/2] usb: make xhci platform driver use 64 bit or 32 bit DMA

2015-08-07 Thread Duc Dang
The xhci platform driver needs to work on systems that
either only support 64-bit DMA or only support 32-bit DMA. 
Attempt to set a coherent dma mask for 64-bit DMA, and 
attempt again with 32-bit DMA if that fails.

[dhdang: regenerate the patch over 4.2-rc5]
Signed-off-by: Mark Langsdorf 
Tested-by: Mark Salter 
Signed-off-by: Duc Dang 

---
Changes from v3:
Re-generate the patch over 4.2-rc5
No code change.

Changes from v2:
None

Changes from v1:
Consolidated to use dma_set_mask_and_coherent
Got rid of the check against sizeof(dma_addr_t)

 drivers/usb/host/xhci-plat.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 890ad9d..5d03f8b 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -93,14 +93,14 @@ static int xhci_plat_probe(struct platform_device *pdev)
if (irq < 0)
return -ENODEV;
 
-   /* Initialize dma_mask and coherent_dma_mask to 32-bits */
-   ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
-   if (ret)
-   return ret;
-   if (!pdev->dev.dma_mask)
-   pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
-   else
-   dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
+   /* Try setting the coherent_dma_mask to 64 bits, then try 32 bits */
+   ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+   if (ret) {
+   ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+   if (ret)
+   return ret;
+   }
+
 
hcd = usb_create_hcd(driver, &pdev->dev, dev_name(&pdev->dev));
if (!hcd)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
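[Editorial note: the 64-then-32-bit fallback above can be modeled in plain userspace C. This is a hedged sketch: set_mask() merely stands in for dma_set_mask_and_coherent(), and "supported" mimics what the platform would accept; it is not kernel code.]

```c
#include <assert.h>
#include <stdint.h>

/* Records the last mask the fake platform accepted. */
static uint64_t accepted_mask;

/* Stand-in for dma_set_mask_and_coherent(): reject any mask wider
 * than what the platform supports. */
static int set_mask(uint64_t mask, uint64_t supported)
{
	if (mask & ~supported)
		return -1;	/* platform rejects this mask */
	accepted_mask = mask;
	return 0;
}

/* Mirror of the probe logic in the patch: try 64-bit first, fall
 * back to 32-bit, and fail only if neither is accepted. */
static int pick_dma_mask(uint64_t supported)
{
	if (set_mask(UINT64_MAX, supported) == 0)
		return 0;
	return set_mask(UINT32_MAX, supported);
}
```

The point of the ordering is that a 64-bit-capable platform never gets silently restricted to 32-bit addressing, while 32-bit-only platforms still work.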


[PATCH 6/7] perf, tools, stat: Move non counting counter printing to printout

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Move the special case printing for non-running counters to
printout, so it can be shared by all the output options.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c | 73 ---
 1 file changed, 24 insertions(+), 49 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7801665..d777bb6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -662,6 +662,30 @@ static void printout(int id, int nr, struct perf_evsel 
*counter, double uval,
os.ena = ena;
}
 
+   if (run == 0 || ena == 0) {
+   aggr_printout(counter, id, nr);
+
+   fprintf(stat_config.output, "%*s%s",
+   csv_output ? 0 : 18,
+   counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
+   csv_sep);
+
+   fprintf(stat_config.output, "%-*s%s",
+   csv_output ? 0 : unit_width,
+   counter->unit, csv_sep);
+
+   fprintf(stat_config.output, "%*s",
+   csv_output ? 0 : -25,
+   perf_evsel__name(counter));
+
+   if (counter->cgrp)
+   fprintf(stat_config.output, "%s%s",
+   csv_sep, counter->cgrp->name);
+
+   print_running(run, ena);
+   return;
+   }
+
if (nsec_counter(counter))
nsec_printout(id, nr, counter, uval);
else
@@ -710,30 +734,6 @@ static void print_aggr(char *prefix)
if (prefix)
fprintf(output, "%s", prefix);
 
-   if (run == 0 || ena == 0) {
-   aggr_printout(counter, id, nr);
-
-   fprintf(output, "%*s%s",
-   csv_output ? 0 : 18,
-   counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
-   csv_sep);
-
-   fprintf(output, "%-*s%s",
-   csv_output ? 0 : unit_width,
-   counter->unit, csv_sep);
-
-   fprintf(output, "%*s",
-   csv_output ? 0 : -25,
-   perf_evsel__name(counter));
-
-   if (counter->cgrp)
-   fprintf(output, "%s%s",
-   csv_sep, counter->cgrp->name);
-
-   print_running(run, ena);
-   fputc('\n', output);
-   continue;
-   }
uval = val * counter->scale;
printout(id, nr, counter, uval, prefix, run, ena, 1.0);
fputc('\n', output);
@@ -830,31 +830,6 @@ static void print_counter(struct perf_evsel *counter, char 
*prefix)
if (prefix)
fprintf(output, "%s", prefix);
 
-   if (run == 0 || ena == 0) {
-   fprintf(output, "CPU%*d%s%*s%s",
-   csv_output ? 0 : -4,
-   perf_evsel__cpus(counter)->map[cpu], csv_sep,
-   csv_output ? 0 : 18,
-   counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
-   csv_sep);
-
-   fprintf(output, "%-*s%s",
-   csv_output ? 0 : unit_width,
-   counter->unit, csv_sep);
-
-   fprintf(output, "%*s",
-   csv_output ? 0 : -25,
-   perf_evsel__name(counter));
-
-   if (counter->cgrp)
-   fprintf(output, "%s%s",
-   csv_sep, counter->cgrp->name);
-
-   print_running(run, ena);
-   fputc('\n', output);
-   continue;
-   }
-
uval = val * counter->scale;
printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/7] perf, tools, stat: Add support for metrics in interval mode

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Now that we can modify the metrics printout functions easily,
it's straightforward to support metric printing for interval mode.
All that is needed is to print the time stamp on every new line.
Pass the prefix into the context and print it out.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index bf6c02c..30c2046 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -485,6 +485,7 @@ static void aggr_printout(struct perf_evsel *evsel, int id, 
int nr)
 
 struct outstate {
FILE *fh;
+   const char *prefix;
 };
 
 #define BASE_INDENT 41
@@ -495,13 +496,13 @@ struct outstate {
 static void new_line_no_aggr_std(void *ctx)
 {
struct outstate *os = ctx;
-   fprintf(os->fh, "\n%*s", BASE_INDENT + NA_INDENT, "");
+   fprintf(os->fh, "\n%s%-*s", os->prefix, BASE_INDENT + NA_INDENT, "");
 }
 
 static void new_line_std(void *ctx)
 {
struct outstate *os = ctx;
-   fprintf(os->fh, "\n%-*s", BASE_INDENT + AGGR_INDENT, "");
+   fprintf(os->fh, "\n%s%-*s", os->prefix, BASE_INDENT + AGGR_INDENT, "");
 }
 
 static void print_metric_std(void *ctx, const char *color, const char *fmt,
@@ -582,9 +583,13 @@ static void abs_printout(int id, int nr, struct perf_evsel 
*evsel, double avg)
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
 }
 
-static void printout(int id, int nr, struct perf_evsel *counter, double uval)
+static void printout(int id, int nr, struct perf_evsel *counter, double uval,
+char *prefix)
 {
-   struct outstate os = { .fh = output };
+   struct outstate os = {
+   .fh = stat_config.output,
+   .prefix = prefix ? prefix : ""
+   };
print_metric_t pm = print_metric_std;
void (*nl)(void *);
 
@@ -660,7 +665,7 @@ static void print_aggr(char *prefix)
continue;
}
uval = val * counter->scale;
-   printout(id, nr, counter, uval);
+   printout(id, nr, counter, uval, prefix);
if (!csv_output)
print_noise(counter, 1.0);
 
@@ -691,7 +696,7 @@ static void print_aggr_thread(struct perf_evsel *counter, 
char *prefix)
fprintf(output, "%s", prefix);
 
uval = val * counter->scale;
-   printout(thread, 0, counter, uval);
+   printout(thread, 0, counter, uval, prefix);
 
if (!csv_output)
print_noise(counter, 1.0);
@@ -741,7 +746,7 @@ static void print_counter_aggr(struct perf_evsel *counter, 
char *prefix)
}
 
uval = avg * counter->scale;
-   printout(-1, 0, counter, uval);
+   printout(-1, 0, counter, uval, prefix);
 
print_noise(counter, avg);
 
@@ -794,7 +799,7 @@ static void print_counter(struct perf_evsel *counter, char 
*prefix)
}
 
uval = val * counter->scale;
-   printout(cpu, 0, counter, uval);
+   printout(cpu, 0, counter, uval, prefix);
 
if (!csv_output)
print_noise(counter, 1.0);
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
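[Editorial note: the mechanism in this patch — stashing the interval timestamp prefix in the callback context so every continuation line reprints it — can be sketched in self-contained C. Names are hypothetical and the output goes to a buffer instead of a FILE; this is not the perf code.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the outstate/new_line pattern: the caller stores the
 * per-interval prefix once, and the new-line callback replays it at
 * the start of every continuation line. */
struct outstate {
	char buf[256];		/* stand-in for the output FILE */
	size_t len;
	const char *prefix;
};

static void new_line(void *ctx)
{
	struct outstate *os = ctx;

	os->len += (size_t)snprintf(os->buf + os->len,
				    sizeof(os->buf) - os->len,
				    "\n%s", os->prefix);
}

static void emit_metric(struct outstate *os, const char *metric)
{
	new_line(os);
	os->len += (size_t)snprintf(os->buf + os->len,
				    sizeof(os->buf) - os->len,
				    "%s", metric);
}
```

Every extra metric line then starts with the same timestamp prefix, which is exactly what -I interval mode needs.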


[PATCH 4/7] perf, tools, stat: Move noise/running printing into printout

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Move the running/noise printing into printout to avoid
duplicated code in the callers.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c | 33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 30c2046..0174aca 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -584,7 +584,7 @@ static void abs_printout(int id, int nr, struct perf_evsel 
*evsel, double avg)
 }
 
 static void printout(int id, int nr, struct perf_evsel *counter, double uval,
-char *prefix)
+char *prefix, u64 run, u64 ena, double noise)
 {
struct outstate os = {
.fh = stat_config.output,
@@ -609,6 +609,12 @@ static void printout(int id, int nr, struct perf_evsel 
*counter, double uval,
pm,
nl,
&os);
+
+   if (!csv_output) {
+   print_noise(counter, noise);
+   if (run != ena)
+   fprintf(stat_config.output, "  (%.2f%%)", 100.0 * run / ena);
+   }
 }
 
 static void print_aggr(char *prefix)
@@ -665,11 +671,7 @@ static void print_aggr(char *prefix)
continue;
}
uval = val * counter->scale;
-   printout(id, nr, counter, uval, prefix);
-   if (!csv_output)
-   print_noise(counter, 1.0);
-
-   print_running(run, ena);
+   printout(id, nr, counter, uval, prefix, run, ena, 1.0);
fputc('\n', output);
}
}
@@ -696,12 +698,7 @@ static void print_aggr_thread(struct perf_evsel *counter, 
char *prefix)
fprintf(output, "%s", prefix);
 
uval = val * counter->scale;
-   printout(thread, 0, counter, uval, prefix);
-
-   if (!csv_output)
-   print_noise(counter, 1.0);
-
-   print_running(run, ena);
+   printout(thread, 0, counter, uval, prefix, run, ena, 1.0);
fputc('\n', output);
}
 }
@@ -746,11 +743,7 @@ static void print_counter_aggr(struct perf_evsel *counter, 
char *prefix)
}
 
uval = avg * counter->scale;
-   printout(-1, 0, counter, uval, prefix);
-
-   print_noise(counter, avg);
-
-   print_running(avg_running, avg_enabled);
+   printout(-1, 0, counter, uval, prefix, avg_running, avg_enabled, avg);
fprintf(output, "\n");
 }
 
@@ -799,11 +792,7 @@ static void print_counter(struct perf_evsel *counter, char 
*prefix)
}
 
uval = val * counter->scale;
-   printout(cpu, 0, counter, uval, prefix);
-
-   if (!csv_output)
-   print_noise(counter, 1.0);
-   print_running(run, ena);
+   printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
 
fputc('\n', output);
}
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 7/7] perf, tools: Support metrics in --per-core/socket mode

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Enable metrics printing in --per-core / --per-socket mode. We need
to save the shadow metrics in a unique place. Always use the first
CPU in the aggregation. Then use the same CPU to retrieve the
shadow value later.

Example output:

% perf stat --per-core -a ./BC1s

 Performance counter stats for 'system wide':

S0-C0           2       2966.020381  task-clock (msec)         #    2.004 CPUs utilized            (100.00%)
S0-C0           2                49  context-switches          #    0.017 K/sec                    (100.00%)
S0-C0           2                 4  cpu-migrations            #    0.001 K/sec                    (100.00%)
S0-C0           2               467  page-faults               #    0.157 K/sec
S0-C0           2     4,599,061,773  cycles                    #    1.551 GHz                      (100.00%)
S0-C0           2     9,755,886,883  instructions              #    2.12  insn per cycle           (100.00%)
S0-C0           2     1,906,272,125  branches                  #  642.704 M/sec                    (100.00%)
S0-C0           2        81,180,867  branch-misses             #    4.26% of all branches
S0-C1           2       2965.995373  task-clock (msec)         #    2.003 CPUs utilized            (100.00%)
S0-C1           2                62  context-switches          #    0.021 K/sec                    (100.00%)
S0-C1           2                 8  cpu-migrations            #    0.003 K/sec                    (100.00%)
S0-C1           2               281  page-faults               #    0.095 K/sec
S0-C1           2         6,347,290  cycles                    #    0.002 GHz                      (100.00%)
S0-C1           2         4,654,156  instructions              #    0.73  insn per cycle           (100.00%)
S0-C1           2           947,121  branches                  #    0.319 M/sec                    (100.00%)
S0-C1           2            37,322  branch-misses             #    3.94% of all branches

       1.480409747 seconds time elapsed

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c | 62 ---
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d777bb6..ea5298a 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -488,6 +488,8 @@ struct outstate {
const char *prefix;
int  nfields;
u64  run, ena;
+   int  id, nr;
+   struct perf_evsel *evsel;
 };
 
 #define BASE_INDENT 41
@@ -498,13 +500,19 @@ struct outstate {
 static void new_line_no_aggr_std(void *ctx)
 {
struct outstate *os = ctx;
-   fprintf(os->fh, "\n%s%-*s", os->prefix, BASE_INDENT + NA_INDENT, "");
+
+   fprintf(os->fh, "\n%s", os->prefix);
+   aggr_printout(os->evsel, os->id, os->nr);
+   fprintf(os->fh, "%-*s", BASE_INDENT + NA_INDENT, "");
 }
 
 static void new_line_std(void *ctx)
 {
struct outstate *os = ctx;
-   fprintf(os->fh, "\n%s%-*s", os->prefix, BASE_INDENT + AGGR_INDENT, "");
+
+   fprintf(os->fh, "\n%s", os->prefix);
+   aggr_printout(os->evsel, os->id, os->nr);
+   fprintf(os->fh, "%-*s", BASE_INDENT + AGGR_INDENT, "");
 }
 
 static void print_metric_std(void *ctx, const char *color, const char *fmt,
@@ -535,6 +543,7 @@ static void new_line_csv(void *ctx)
fputc('\n', os->fh);
if (os->prefix)
fprintf(os->fh, "%s%s", os->prefix, csv_sep);
+   aggr_printout(os->evsel, os->id, os->nr);
for (i = 0; i < os->nfields; i++)
fputs(csv_sep, os->fh);
 }
@@ -598,6 +607,22 @@ static void nsec_printout(int id, int nr, struct 
perf_evsel *evsel, double avg)
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
 }
 
+static int first_shadow_cpu(struct perf_evsel *evsel, int id)
+{
+   int i;
+
+   if (aggr_get_id == NULL)
+   return 0;
+
+   for (i = 0; i < perf_evsel__nr_cpus(evsel); i++) {
+   int cpu2 = perf_evsel__cpus(evsel)->map[i];
+
+   if (aggr_get_id(evsel_list->cpus, cpu2) == id)
+   return cpu2;
+   }
+   return 0;
+}
+
 static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
 {
FILE *output = stat_config.output;
@@ -633,7 +658,10 @@ static void printout(int id, int nr, struct perf_evsel 
*counter, double uval,
 {
struct outstate os = {
.fh = stat_config.output,
-   .prefix = prefix ? prefix : ""
+   .prefix = prefix ? prefix : "",
+   .id = id,
+   .nr = nr,
+   .evsel = counter,
};
print_metric_t pm = print_metric_std;
void (*nl)(void *);
@@ -693,7 +721,7 @@ static void printout(int id, int nr, struct perf_evsel 
*counter, double uval,
 
perf_stat__print_shadow_stats(counter, 

[PATCH 2/7] perf, tools, stat: Abstract stat metrics printing

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Abstract the printing of shadow metrics. Instead of every
metric calling fprintf directly and taking care of indentation,
use two call backs: one to print metrics and another to
start a new line.

This will allow adding metrics to CSV mode and also
using them for other purposes.

The computation of padding is now done in the central
callback, instead of every metric doing it manually.
This makes it easier to add new metrics.

v2: Refactor functions, printout now does more. Move
shadow printing.
Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c |  96 +
 tools/perf/util/stat-shadow.c | 158 ++
 tools/perf/util/stat.h|  10 ++-
 3 files changed, 157 insertions(+), 107 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index fac0c50..bf6c02c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -483,6 +483,47 @@ static void aggr_printout(struct perf_evsel *evsel, int 
id, int nr)
}
 }
 
+struct outstate {
+   FILE *fh;
+};
+
+#define BASE_INDENT 41
+#define AGGR_INDENT  8
+#define METRIC_LEN  35
+#define NA_INDENT   16
+
+static void new_line_no_aggr_std(void *ctx)
+{
+   struct outstate *os = ctx;
+   fprintf(os->fh, "\n%*s", BASE_INDENT + NA_INDENT, "");
+}
+
+static void new_line_std(void *ctx)
+{
+   struct outstate *os = ctx;
+   fprintf(os->fh, "\n%-*s", BASE_INDENT + AGGR_INDENT, "");
+}
+
+static void print_metric_std(void *ctx, const char *color, const char *fmt,
+const char *unit, double val)
+{
+   struct outstate *os = ctx;
+   FILE *out = os->fh;
+   int n;
+
+   if (unit == NULL) {
+   fprintf(out, "%-*s", METRIC_LEN, "");
+   return;
+   }
+
+   n = fprintf(out, " # ");
+   if (color)
+   n += color_fprintf(out, color, fmt, val);
+   else
+   n += fprintf(out, fmt, val);
+   fprintf(out, " %-*s", METRIC_LEN - n - 1, unit);
+}
+
 static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
 {
FILE *output = stat_config.output;
@@ -516,7 +557,6 @@ static void abs_printout(int id, int nr, struct perf_evsel 
*evsel, double avg)
FILE *output = stat_config.output;
double sc =  evsel->scale;
const char *fmt;
-   int cpu = cpu_map__id_to_cpu(id);
 
if (csv_output) {
fmt = sc != 1.0 ?  "%.2f%s" : "%.0f%s";
@@ -529,9 +569,6 @@ static void abs_printout(int id, int nr, struct perf_evsel 
*evsel, double avg)
 
aggr_printout(evsel, id, nr);
 
-   if (stat_config.aggr_mode == AGGR_GLOBAL)
-   cpu = 0;
-
fprintf(output, fmt, avg, csv_sep);
 
if (evsel->unit)
@@ -543,12 +580,30 @@ static void abs_printout(int id, int nr, struct 
perf_evsel *evsel, double avg)
 
if (evsel->cgrp)
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
+}
 
-   if (csv_output || stat_config.interval)
-   return;
+static void printout(int id, int nr, struct perf_evsel *counter, double uval)
+{
+   struct outstate os = { .fh = output };
+   print_metric_t pm = print_metric_std;
+   void (*nl)(void *);
 
-   perf_stat__print_shadow_stats(output, evsel, avg, cpu,
- stat_config.aggr_mode);
+   if (stat_config.aggr_mode == AGGR_NONE)
+   nl = new_line_no_aggr_std;
+   else
+   nl = new_line_std;
+
+   if (nsec_counter(counter))
+   nsec_printout(id, nr, counter, uval);
+   else
+   abs_printout(id, nr, counter, uval);
+
+   perf_stat__print_shadow_stats(counter, uval,
+   stat_config.aggr_mode == AGGR_GLOBAL ? 0 :
+   cpu_map__id_to_cpu(id),
+   pm,
+   nl,
+   &os);
 }
 
 static void print_aggr(char *prefix)
@@ -605,12 +660,7 @@ static void print_aggr(char *prefix)
continue;
}
uval = val * counter->scale;
-
-   if (nsec_counter(counter))
-   nsec_printout(id, nr, counter, uval);
-   else
-   abs_printout(id, nr, counter, uval);
-
+   printout(id, nr, counter, uval);
if (!csv_output)
print_noise(counter, 1.0);
 
@@ -641,11 +691,7 @@ static void print_aggr_thread(struct perf_evsel *counter, 
char *prefix)
fprintf(output, "%s", prefix);
 
uval = val * counter->scale;
-
-   if (nsec_counter(counter))
-   nsec_printout(thread, 0, counter, uval);
-   else
-   abs_printout(thread, 0, counter, uval);
+   

perf, tools: Refactor and support interval and CSV metrics v3

2015-08-07 Thread Andi Kleen

[v2: Addressed (near) all review feedback. No manpage updates so far.
 Add support for --per-core metrics. Various cleanups.]
[v3: Everything compiles again. Some more cleanups. Updated to latest tip tree.]

Currently perf stat does not support printing computed metrics in interval
(-I xxx) or CSV (-x,) mode. For example, IPC or TSX metrics over time are
quite useful to know.

This patch series implements them. The main obstacle was that the metrics
printing was open coded all over the metrics computation code. The second
patch refactors the metrics printing to work through callbacks that can be
changed more easily. This also cleans up the metrics printing significantly:
the indentation is now handled through printf, with no more need to count
spaces manually.

Based on that, it then implements metrics printing for CSV and interval mode.

Example output:

% perf stat  -I1000 -a sleep 1
#           time             counts unit events                    metric                          multiplex
     1.001301370      12020.049593  task-clock (msec)                                              (100.00%)
     1.001301370             3,952  context-switches          #    0.329 K/sec                     (100.00%)
     1.001301370                69  cpu-migrations            #    0.006 K/sec                     (100.00%)
     1.001301370                76  page-faults               #    0.006 K/sec
     1.001301370       386,582,789  cycles                    #    0.032 GHz                       (100.00%)
     1.001301370       716,441,544  stalled-cycles-frontend   #  185.33% frontend cycles idle      (100.00%)
     1.001301370                    stalled-cycles-backend
     1.001301370       101,751,678  instructions              #    0.26  insn per cycle
     1.001301370                                              #    7.04  stalled cycles per insn   (100.00%)
     1.001301370        20,914,692  branches                  #    1.740 M/sec                     (100.00%)
     1.001301370         1,943,630  branch-misses             #    9.29% of all branches

CSV mode

% perf stat  -x, -I1000 -a sleep 1
 1.000852081,12016.143006,,task-clock
 1.000852081,4457,,context-switches,12015168277,100.00,0.371,K/sec
 1.000852081,50,,cpu-migrations,12014024424,100.00,0.004,K/sec
 1.000852081,76,,page-faults,12013076716,100.00,0.006,K/sec
 1.000852081,515854373,,cycles,12011235336,100.00,0.043,GHz
 1.000852081,1030742150,,stalled-cycles-frontend,12010984057,100.00,199.81,frontend cycles idle
 1.000852081,,,stalled-cycles-backend,0,100.00
 1.000852081,116782495,,instructions,12011130729,100.00,0.23,insn per cycle
 1.000852081,,,,12011130729,100.00,8.83,stalled cycles per insn
 1.000852081,23748237,,branches,12010745125,100.00,1.976,M/sec
 1.000852081,1976560,,branch-misses,12010501884,100.00,8.32,of all branches

Available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6 
perf/stat-metrics-3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/7] perf, tools, stat: Implement CSV metrics output

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

Now support CSV output for metrics. With the new output callbacks
this is relatively straightforward, done by creating new callbacks.

The new-line callback needs to know the number of fields so it can
skip them correctly.

v2: Split out function argument changes
Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c | 64 +++
 1 file changed, 64 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 0174aca..7801665 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -486,6 +486,8 @@ static void aggr_printout(struct perf_evsel *evsel, int id, 
int nr)
 struct outstate {
FILE *fh;
const char *prefix;
+   int  nfields;
+   u64  run, ena;
 };
 
 #define BASE_INDENT 41
@@ -525,6 +527,49 @@ static void print_metric_std(void *ctx, const char *color, 
const char *fmt,
fprintf(out, " %-*s", METRIC_LEN - n - 1, unit);
 }
 
+static void new_line_csv(void *ctx)
+{
+   struct outstate *os = ctx;
+   int i;
+
+   fputc('\n', os->fh);
+   if (os->prefix)
+   fprintf(os->fh, "%s%s", os->prefix, csv_sep);
+   for (i = 0; i < os->nfields; i++)
+   fputs(csv_sep, os->fh);
+}
+
+static void print_metric_csv(void *ctx,
+const char *color __maybe_unused,
+const char *fmt, const char *unit, double val)
+{
+   struct outstate *os = ctx;
+   FILE *out = os->fh;
+   char buf[64], *vals, *ends;
+
+   if (unit == NULL) {
+   fprintf(out, "%s%s%s%s", csv_sep, csv_sep, csv_sep, csv_sep);
+   return;
+   }
+   fprintf(out, "%s%" PRIu64 "%s%.2f%s",
+   csv_sep,
+   os->run,
+   csv_sep,
+   os->ena ? 100.0 * os->run / os->ena : 100.0,
+   csv_sep);
+   snprintf(buf, sizeof(buf), fmt, val);
+   vals = buf;
+   while (isspace(*vals))
+   vals++;
+   ends = vals;
+   while (isdigit(*ends) || *ends == '.')
+   ends++;
+   *ends = 0;
+   while (isspace(*unit))
+   unit++;
+   fprintf(out, "%s%s%s", vals, csv_sep, unit);
+}
+
 static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
 {
FILE *output = stat_config.output;
@@ -598,6 +643,25 @@ static void printout(int id, int nr, struct perf_evsel 
*counter, double uval,
else
nl = new_line_std;
 
+   if (csv_output) {
+   static int aggr_fields[] = {
+   [AGGR_GLOBAL] = 0,
+   [AGGR_THREAD] = 1,
+   [AGGR_NONE] = 1,
+   [AGGR_SOCKET] = 2,
+   [AGGR_CORE] = 2,
+   };
+
+   pm = print_metric_csv;
+   nl = new_line_csv;
+   os.nfields = 1;
+   os.nfields += aggr_fields[stat_config.aggr_mode];
+   if (counter->cgrp)
+   os.nfields++;
+   os.run = run;
+   os.ena = ena;
+   }
+
if (nsec_counter(counter))
nsec_printout(id, nr, counter, uval);
else
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
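[Editorial note: the value-cleanup loop in print_metric_csv above is easy to get wrong, so here is the same logic as a standalone userspace C sketch (hypothetical helper name, not the perf code): metric values are formatted with printf-style fmt strings that may pad with spaces, so for CSV we drop leading whitespace and cut the string at the first character that is not part of the number.]

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Mirror of the trimming in print_metric_csv: format val with fmt into
 * buf, skip leading whitespace, then terminate at the first character
 * that is neither a digit nor a dot.  *vals points at the cleaned
 * number on return. */
static void csv_trim_value(char *buf, size_t bufsz, const char *fmt,
			   double val, char **vals)
{
	char *ends;

	snprintf(buf, bufsz, fmt, val);
	*vals = buf;
	while (isspace((unsigned char)**vals))
		(*vals)++;
	ends = *vals;
	while (isdigit((unsigned char)*ends) || *ends == '.')
		ends++;
	*ends = 0;
}
```

Note that, like the loop in the patch, this keeps only digits and dots, so a negative value's leading minus sign would terminate the scan immediately and empty the field; that may be worth a follow-up.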


[PATCH 1/7] perf, tools, stat: Move sw clock metrics printout to stat-shadow

2015-08-07 Thread Andi Kleen
From: Andi Kleen 

The sw clock metrics printing was missed in the earlier move to
stat-shadow of all the other metric printouts. Move it too.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-stat.c | 9 -
 tools/perf/util/stat-shadow.c | 4 
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a054ddc..fac0c50 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -509,15 +509,6 @@ static void nsec_printout(int id, int nr, struct 
perf_evsel *evsel, double avg)
 
if (evsel->cgrp)
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
-
-   if (csv_output || stat_config.interval)
-   return;
-
-   if (perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK))
-   fprintf(output, " # %8.3f CPUs utilized  ",
-   avg / avg_stats(&walltime_nsecs_stats));
-   else
-   fprintf(output, "   ");
 }
 
 static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 2a5d8d7..fc970db 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -413,6 +413,10 @@ void perf_stat__print_shadow_stats(FILE *out, struct 
perf_evsel *evsel,
ratio = total / avg;
 
fprintf(out, " # %8.0f cycles / elision   ", ratio);
+   } else if (perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK) &&
+  (ratio = avg_stats(&walltime_nsecs_stats)) != 0) {
+   fprintf(out, " # %8.3f CPUs utilized  ",
+avg / ratio);
} else if (runtime_nsecs_stats[cpu].n != 0) {
char unit = 'M';
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
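[Editorial note: the metric being moved here is just task-clock time divided by wall-clock time. A one-function C sketch of that arithmetic (illustrative names, not the perf code):]

```c
#include <assert.h>
#include <math.h>

/* "CPUs utilized" as printed by perf stat for SW_TASK_CLOCK: total
 * task-clock nanoseconds divided by elapsed wall-clock nanoseconds.
 * A value of 2.0 means that, on average, two CPUs were busy. */
static double cpus_utilized(double task_clock_ns, double walltime_ns)
{
	return walltime_ns != 0 ? task_clock_ns / walltime_ns : 0.0;
}
```

The guard against a zero wall time matches the `!= 0` check that the patch adds around avg_stats().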


Re: [PATCH v12 17/21] h8300: clock driver

2015-08-07 Thread Michael Turquette
Hello Sato-san,

Unfortunately this patch did not Cc myself, Stephen Boyd or the
linux-...@vger.kernel.org mailing list. As such Stephen and I did not
have a chance to review it. Even more unfortunate was that it was ninja
merged by maintainers without our ack. :-/

Quoting Yoshinori Sato (2015-05-10 23:26:36)
> Signed-off-by: Yoshinori Sato 
> ---
>  .../bindings/clock/renesas,h8300-div-clock.txt |  24 
>  .../bindings/clock/renesas,h8s2678-pll-clock.txt   |  23 
>  drivers/clk/Makefile   |   1 +
>  drivers/clk/h8300/Makefile |   2 +
>  drivers/clk/h8300/clk-div.c|  53 
>  drivers/clk/h8300/clk-h8s2678.c| 147 +
>  6 files changed, 250 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/clock/renesas,h8300-div-clock.txt
>  create mode 100644 Documentation/devicetree/bindings/clock/renesas,h8s2678-pll-clock.txt
>  create mode 100644 drivers/clk/h8300/Makefile
>  create mode 100644 drivers/clk/h8300/clk-div.c
>  create mode 100644 drivers/clk/h8300/clk-h8s2678.c
> 
> diff --git a/Documentation/devicetree/bindings/clock/renesas,h8300-div-clock.txt b/Documentation/devicetree/bindings/clock/renesas,h8300-div-clock.txt
> new file mode 100644
> index 000..36c2b52
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/clock/renesas,h8300-div-clock.txt
> @@ -0,0 +1,24 @@
> +* Renesas H8/300 divider clock
> +
> +Required Properties:
> +
> +  - compatible: Must be "renesas,sh73a0-h8300-div-clock"
> +
> +  - clocks: Reference to the parent clocks ("extal1" and "extal2")
> +
> +  - #clock-cells: Must be 1
> +
> +  - reg: Base address and length of the divide rate selector
> +
> +  - renesas,width: bit width of selector
> +
> +Example
> +---
> +
> +   cclk: cclk {
> +   compatible = "renesas,h8300-div-clock";
> +   clocks = <>;
> +   #clock-cells = <0>;
> +   reg = <0xfee01b 2>;
> +   renesas,width = <2>;
> +   };

I could not find any info on this clock in the H8S/2678 reference
manual[0]. Could you point me to the right documentation?

> diff --git a/Documentation/devicetree/bindings/clock/renesas,h8s2678-pll-clock.txt b/Documentation/devicetree/bindings/clock/renesas,h8s2678-pll-clock.txt
> new file mode 100644
> index 000..500cdadb
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/clock/renesas,h8s2678-pll-clock.txt
> @@ -0,0 +1,23 @@
> +Renesas H8S2678 PLL clock
> +
> +This device is a clock multiplier.
> +
> +Required Properties:
> +
> +  - compatible: Must be "renesas,h8s2678-pll-clock"
> +
> +  - clocks: Reference to the parent clocks
> +
> +  - #clock-cells: Must be 0
> +
> +  - reg: Addresses of the two rate selector (multiply / divide) registers
> +
> +Example
> +---
> +
> +   pllclk: pllclk {
> +   compatible = "renesas,h8s2678-pll-clock";
> +   clocks = <>;
> +   #clock-cells = <0>;
> +   reg = <0xfee03b 2>, <0xfee045 2>;
> +   };

Is there really only one clock output? According to figure 21.1 there is
the "System clock to φ pin" output and the "Internal clock to peripheral
modules" output.

I am wondering if clock-cells should be 1 instead of zero and support
both of these output signals?

As a nitpick, I think it would have been better to name the node "cpg"
as it is listed in Section 21. pllclk is only one of the two registers
that make up the cpg. Something like:

cpg: clock-controller@fee03b

If you do decide to have clock-cells greater than zero, you might find
the following threads helpful. They describe how to craft a
clock-controller style binding:

http://lkml.kernel.org/r/<20150411001231.18916.93186@quantum>

http://lkml.kernel.org/r/<20150724034229.642.88156@quantum>

As an additional thought, it looks like the module stop registers are
mixed in with the clock registers. When you decide to write a reset
driver for these platforms you might want to re-use this existing dt
binding description and put the reset code into your clock provider
driver. Grep for reset.h in the drivers/clk/ directory for some
examples.

[0] http://documentation.renesas.com/doc/products/mpumcu/rej09b0283_2678hm.pdf

Regards,
Mike


Re: [PATCH 2/3] dma: add __must_check annotation for dmaengine_pause()

2015-08-07 Thread Peter Hurley
On 08/07/2015 04:00 PM, Sebastian Andrzej Siewior wrote:
> In 8250-omap I learned the hard way that ignoring the return code
> of dmaengine_pause() might be bad, because the underlying DMA driver
> might not support the function at all and so will not do what one
> is expecting.
> This patch adds the __must_check annotation as suggested by Russell King.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
>  include/linux/dmaengine.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 8ad9a4e839f6..4eac4716bded 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -825,7 +825,7 @@ static inline int dmaengine_terminate_all(struct dma_chan *chan)
>   return -ENOSYS;
>  }
>  
> -static inline int dmaengine_pause(struct dma_chan *chan)
> +static inline int __must_check dmaengine_pause(struct dma_chan *chan)
>  {
>   if (chan->device->device_pause)
>   return chan->device->device_pause(chan);
> 

Not that this is your responsibility, Sebastian, but considering there are
fewer than 20 users of dmaengine_pause() in the entire tree, we should add
WARN_ON_ONCE() around those uses with this patch to avoid a bunch of needless
one-off "fixes".

Regards,
Peter Hurley


[PATCH] Input: synaptics-rmi4: Add device tree support for RMI4 I2C devices

2015-08-07 Thread Andrew Duggan
Add devicetree binding for I2C devices and add bindings for optional
parameters in the function drivers.

Signed-off-by: Andrew Duggan 
---
I saw Benjamin Tissoires's email about the lack of a devicetree implementation
for rmi_i2c.c. I decided to clean up and add documentation to the implementation
which I have been using and submit it for review.

This patch applies to the current implementation of Dmitry's synaptics-rmi4
branch in the input repository. If Benjamin's patchset gets applied before
this I can rebase this patch.

Thanks,
Andrew

 .../devicetree/bindings/input/rmi4/rmi_f01.txt |  34 ++
 .../devicetree/bindings/input/rmi4/rmi_f11.txt |  51 
 .../devicetree/bindings/input/rmi4/rmi_i2c.txt |  40 +++
 .../devicetree/bindings/vendor-prefixes.txt|   1 +
 drivers/input/rmi4/rmi_bus.c   |  49 
 drivers/input/rmi4/rmi_bus.h   |   8 +-
 drivers/input/rmi4/rmi_driver.c|  46 ++-
 drivers/input/rmi4/rmi_f01.c   |  50 +++-
 drivers/input/rmi4/rmi_f11.c   | 133 -
 drivers/input/rmi4/rmi_i2c.c   |  60 +-
 include/linux/rmi.h|   2 +-
 11 files changed, 465 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/input/rmi4/rmi_f01.txt
 create mode 100644 Documentation/devicetree/bindings/input/rmi4/rmi_f11.txt
 create mode 100644 Documentation/devicetree/bindings/input/rmi4/rmi_i2c.txt

diff --git a/Documentation/devicetree/bindings/input/rmi4/rmi_f01.txt b/Documentation/devicetree/bindings/input/rmi4/rmi_f01.txt
new file mode 100644
index 000..53846e2
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/rmi4/rmi_f01.txt
@@ -0,0 +1,34 @@
+Synaptics RMI4 F01 Device Binding
+
+The Synaptics RMI4 core is able to support RMI4 devices using different
+transports and different functions. This file describes the device tree
+bindings for devices which contain Function 1. Complete documentation
+for transports and other functions can be found in:
+Documentation/devicetree/bindings/input/rmi4.
+
+Additional documentation for F01 can be found at:
+http://www.synaptics.com/sites/default/files/511-000136-01-Rev-E-RMI4-Interfacing-Guide.pdf
+
+Optional Properties:
+- syna,f01-nosleep: If set, the device will run at full power without sleeping.
+- syna,f01-wakeup-threshold: Defines the amplitude of the disturbance to the
+   background capacitance that will cause the
+   device to wake from dozing.
+- syna,f01-doze-holdoff: The delay to wait after the last finger lift and the
+   first doze cycle (in 0.1 second units).
+- syna,f01-doze-interval: The time period that the device sleeps between finger
+   activity (in 10 ms units).
+
+
+Example of a RMI4 I2C device with F01:
+{
+   rmi-i2c-dev@2c {
+   compatible = "syna,rmi-i2c";
+   reg = <0x2c>;
+   syna,sensor-name="TM1949";
+   syna,attn-gpio = <4 2>;
+   syna,attn-polarity = <0>;
+   syna,level-triggered = <1>;
+   syna,f01-nosleep = <1>;
+   };
+   };
diff --git a/Documentation/devicetree/bindings/input/rmi4/rmi_f11.txt b/Documentation/devicetree/bindings/input/rmi4/rmi_f11.txt
new file mode 100644
index 000..2405523
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/rmi4/rmi_f11.txt
@@ -0,0 +1,51 @@
+Synaptics RMI4 F11 Device Binding
+
+The Synaptics RMI4 core is able to support RMI4 devices using different
+transports and different functions. This file describes the device tree
+bindings for devices which contain Function 11. Complete documentation
+for transports and other functions can be found in:
+Documentation/devicetree/bindings/input/rmi4.
+
+RMI4 Function 11 is for 2D touch position sensing. Additional documentation for
+F11 can be found at:
+http://www.synaptics.com/sites/default/files/511-000136-01-Rev-E-RMI4-Interfacing-Guide.pdf
+
+Optional Properties:
+- syna,f11-swap-axes: Swap X and Y positions when reporting.
+- syna,f11-flip-x: Reverse the direction of X.
+- syna,f11-flip-y: Reverse the direction of Y.
+- syna,f11-clip-x-low: Sets a minimum value for X.
+- syna,f11-clip-y-low: Sets a minimum value for Y.
+- syna,f11-clip-x-high: Sets a maximum value for X.
+- syna,f11-clip-y-high: Sets a maximum value for Y.
+- syna,f11-offset-x: Add an offset to X.
+- syna,f11-offset-y: Add an offset to Y.
+- syna,f11-delta-x-threshold: Set the minimum distance on the X axis required
+   to generate an interrupt in reduced reporting
+   mode.
+- syna,f11-delta-y-threshold: Set the minimum distance on the Y axis required
+   

Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.

2015-08-07 Thread Rafael J. Wysocki
Hi David,

On Fri, Aug 7, 2015 at 8:14 PM, David Daney  wrote:
> On 08/07/2015 07:54 AM, Graeme Gregory wrote:
>>
>> On Thu, Aug 06, 2015 at 05:33:10PM -0700, David Daney wrote:
>>>
>>> From: David Daney 
>>>
>>> Find out which PHYs belong to which BGX instance in the ACPI way.
>>>
>>> Set the MAC address of the device as provided by ACPI tables. This is
>>> similar to the implementation for devicetree in
>>> of_get_mac_address(). The table is searched for the device property
>>> entries "mac-address", "local-mac-address" and "address" in that
>>> order. The address is provided in a u64 variable and must contain a
>>> valid 6 bytes-len mac addr.
>>>
>>> Based on code from: Narinder Dhillon 
>>>  Tomasz Nowicki 
>>>  Robert Richter 
>>>
>>> Signed-off-by: Tomasz Nowicki 
>>> Signed-off-by: Robert Richter 
>>> Signed-off-by: David Daney 
>>> ---
>>>   drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 137 +-
>>>   1 file changed, 135 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
>>> index 615b2af..2056583 100644
>>> --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
>>> +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
>
> [...]
>>>
>>> +
>>> +static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst)
>>> +{
>>> +   const union acpi_object *prop;
>>> +   u64 mac_val;
>>> +   u8 mac[ETH_ALEN];
>>> +   int i, j;
>>> +   int ret;
>>> +
>>> +   for (i = 0; i < ARRAY_SIZE(addr_propnames); i++) {
>>> +   ret = acpi_dev_get_property(adev, addr_propnames[i],
>>> +   ACPI_TYPE_INTEGER, &prop);
>>
>>
>> Shouldn't this be trying to use device_property_read_* API and making
>> the DT/ACPI path the same where possible?
>>
>
> Ideally, something like you suggest would be possible.  However, there are a
> couple of problems trying to do it in the kernel as it exists today:
>
> 1) There is no 'struct device *' here, so device_property_read_* is not
> applicable.
>
> 2) There is no standard ACPI binding for MAC addresses, so it is impossible
> to create a hypothetical fw_get_mac_address(), which would be analogous to
> of_get_mac_address().
>
> Other e-mail threads have suggested that the path to an elegant solution is
> to inter-mix a bunch of calls to acpi_dev_get_property*() and
> fwnode_property_read*() as to use these more generic fwnode_property_read*()
> functions whereever possible.  I rejected this approach as it seems cleaner
> to me to consistently use a single set of APIs.

Actually, that wasn't my intention.

I wanted to say that once you'd got an ACPI device pointer (struct
acpi_device), you could easly convert it to a struct fwnode_handle
pointer and operate that going forward when accessing properties.
That at least would help with the properties that do not differ
between DT and ACPI.

Thanks,
Rafael


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Kent Overstreet
On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote:
> 
> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
> > I'm for solution 3:
> > 
> >  - keep blk_bio_{discard,write_same}_split, but ensure we never built
> >a > 4GB bio in blkdev_issue_{discard,write_same}.
> 
> This has problem as I mentioned in solution 1.
> We need to also make sure max discard size is of proper granularity.
> See the example below.
> 
>   4G: 8388608 sectors
> UINT_MAX: 8388607 sectors
> 
> dm-thinp block size = default discard granularity = 128 sectors
> 
> blkdev_issue_discard(sector=0, nr_sectors=8388608)
> 
> 1. Only ensure bi_size not overflow
> 
> It doesn't work.
> 
> [start_sector, end_sector]
> [0, 8388607]
> [0, 8388606], then dm-thinp splits it to 2 bios
> [0, 8388479]
> [8388480, 8388606] ---> this has problem in process_discard_bio(),
> because the discard size(7 sectors) covers 
> less than a block(128 sectors)
> [8388607, 8388607] ---> same problem 
> 
> 2. Ensure bi_size not overflow and max discard size is of proper granularity

Ideally we'd get upper layers out of the business of knowing about the queue
limits at all - that was the point of the patch series, after all.

Instead of using UINT_MAX, would it work to just make the max 1 << 31 sectors?


Re: [PATCH 1/3] tty: serial: 8250_omap: do not use RX DMA if pause is not supported

2015-08-07 Thread Peter Hurley
On 08/07/2015 04:00 PM, Sebastian Andrzej Siewior wrote:
> The 8250-omap driver requires the DMA-engine driver to support the pause
> command in order to properly turn off programmed RX transfer before the
> driver starts manually reading from the FIFO.
> The lack of this support was only discovered recently. In
> order to stay safe here we disable support for RX-DMA as soon as we
> notice that it does not work. This should happen very early.
> If the user does not want to see this backtrace he can either disable
> DMA support (completely or RX-only) or backport the required patches for
> edma / omap-dma once they hit mainline.
> 
> Cc: 
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
>  drivers/tty/serial/8250/8250_omap.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
> index 0340ee6ba970..07a11e0935e4 100644
> --- a/drivers/tty/serial/8250/8250_omap.c
> +++ b/drivers/tty/serial/8250/8250_omap.c
> @@ -112,6 +112,7 @@ struct omap8250_priv {
>   struct work_struct qos_work;
>   struct uart_8250_dma omap8250_dma;
>   spinlock_t rx_dma_lock;
> + bool rx_dma_broken;
>  };
>  
>  static u32 uart_read(struct uart_8250_port *up, u32 reg)
> @@ -761,6 +762,7 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>   struct omap8250_priv*priv = p->port.private_data;
>   struct uart_8250_dma*dma = p->dma;
>   unsigned long   flags;
> + int ret;
>  
>   spin_lock_irqsave(&priv->rx_dma_lock, flags);
>  
> @@ -769,7 +771,9 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
>   return;
>   }
>  
> - dmaengine_pause(dma->rxchan);
> + ret = dmaengine_pause(dma->rxchan);
> + if (WARN_ON_ONCE(ret))
> + priv->rx_dma_broken = true;

No offense, Sebastian, but it boggles my mind that anyone could defend this
as solid api design. We're in the middle of an interrupt handler and the
slave dma driver is /just/ telling us now that it doesn't implement this
functionality?!!?

The dmaengine api has _so much_ setup and none of it contemplates telling the
consumer that critical functionality is missing?

Even dma_get_slave_caps() returns _true_ for cmd_pause support; ok, that
interface is pointless.

Rather than losing /critical data/ here, the interrupt handler should just
busy-wait until dmaengine_tx_status() returns DMA_COMPLETE for the rx_cookie.

Regards,
Peter Hurley

>   spin_unlock_irqrestore(&priv->rx_dma_lock, flags);
>  
> @@ -813,6 +817,9 @@ static int omap_8250_rx_dma(struct uart_8250_port *p, unsigned int iir)
>   break;
>   }
>  
> + if (priv->rx_dma_broken)
> + return -EINVAL;
> +
>   spin_lock_irqsave(&priv->rx_dma_lock, flags);
>  
>   if (dma->rx_running)
> 



Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.

2015-08-07 Thread Rafael J. Wysocki
Hi David,

On Sat, Aug 8, 2015 at 2:11 AM, David Daney  wrote:
> On 08/07/2015 05:05 PM, Rafael J. Wysocki wrote:

[cut]

>>
>> It is actually useful to people as far as I can say.
>>
>> Also, if somebody is going to use properties with ACPI, why would
>> they use a different set of properties with DT?
>>
>> Wouldn't it be more reasonable to use the same set in both cases?
>
>
> Yes, but there is still quite a bit of leeway to screw things up.

That I have to agree with, unfortunately.

On the other hand, this is a fairly new concept and we need to gain
some experience with it to be able to come up with best practices and
so on.  Cases like yours are really helpful here.

> FWIW:  http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf
>
> This actually seems to have been adopted by the UEFI people as a
> "Standard", I am not sure where a record of this is kept though.

Work on this is in progress, but far from completion.  Essentially,
what's needed is more pressure from vendors who want to use properties
in their firmware.

> So, we are changing our firmware to use this standard (which is quite
> similar to the DT with respect to MAC addresses).

Cool. :-)

Thanks,
Rafael


Re: [PATCH 1/3] Powerpc: mpc85xx: refactor the PM operations

2015-08-07 Thread Scott Wood
On Fri, 2015-08-07 at 11:19 +0800, Chenhui Zhao wrote:
> On Fri, Aug 7, 2015 at 2:02 AM, Scott Wood  
> wrote:
> > On Thu, 2015-08-06 at 13:54 +0800, Chenhui Zhao wrote:
> > >  On Thu, Aug 6, 2015 at 1:46 PM, Scott Wood 
> > >  wrote:
> > >  > On Thu, 2015-08-06 at 12:20 +0800, Chenhui Zhao wrote:
> > >  > >  On Thu, Aug 6, 2015 at 10:57 AM, Scott Wood
> > >  > > 
> > >  > >  wrote:
> > >  > >  > On Wed, 2015-08-05 at 18:11 +0800, Chenhui Zhao wrote:
> > >  > >  > >  On Tue, Aug 4, 2015 at 4:26 AM, Scott Wood
> > >  > > 
> > >  > >  > >  wrote:
> > >  > >  > >  > On Mon, 2015-08-03 at 19:32 +0800, Chenhui Zhao wrote:
> > >  > >  > >  > >  >
> > >  > >  > >  >
> > >  > >  > >  > >  On Sat, Aug 1, 2015 at 7:59 AM, Scott Wood
> > >  > >  > > 
> > >  > >  > >  > >  wrote:
> > >  > >  > >  >
> > >  > >  > >  > >  >
> > >  > >  > >  > >  > Could you explain irq_mask()?  Why would there 
> > > still be
> > >  > > IRQs
> > >  > >  > >  > > destined
> > >  > >  > >  > >  > for
> > >  > >  > >  > >  > this CPU at this point?
> > >  > >  > >  > >
> > >  > >  > >  > >  This function just masks irq by setting the 
> > > registers in
> > >  > > RCPM
> > >  > >  > > (for
> > >  > >  > >  > >  example, RCPM_CPMIMR, RCPM_CPMCIMR). Actually, all 
> > > irqs to
> > >  > >  > > this CPU
> > >  > >  > >  > >  have been migrated to other CPUs.
> > >  > >  > >  >
> > >  > >  > >  > So why do we need to set those bits in RCPM?  Is it just
> > >  > > caution?
> > >  > >  > >
> > >  > >  > >  Setting these bits can mask interrupts signalled to RCPM 
> > > from
> > >  > > MPIC
> > >  > >  > > as a
> > >  > >  > >  means of
> > >  > >  > >  waking up from a lower power state. So, cores will not be
> > >  > > woken up
> > >  > >  > >  unexpectedly.
> > >  > >  >
> > >  > >  > Why would the MPIC be signalling those interrupts if they've 
> > > been
> > >  > >  > masked at
> > >  > >  > the MPIC?
> > >  > >  >
> > >  > >  > -Scott
> > >  > >  >
> > >  > >
> > >  > >  The interrupts to RCPM from MPIC are IRQ, Machine Check, NMI 
> > > and
> > >  > >  Critical interrupts. Some of them were not masked in the MPIC.
> > >  >
> > >  > What interrupt could actually happen to a sleeping cpu that this
> > >  > protects
> > >  > against?
> > >  >
> > >  > -Scott
> > > 
> > >  Not sure. Maybe spurious interrupts or hardware exceptions.
> > 
> > Spurious interrupts happen due to race conditions.  They don't happen 
> > because
> > the MPIC is bored and decides to ring a CPU's doorbell and hide in 
> > the bushes.
> > 
> > If by "hardware exceptions" you mean machine checks, how would such a 
> > machine
> > check be generated by a core that is off?
> > 
> > >   However, setting them makes sure dead cpus cannot be woken up 
> > > unexpectedly.
> > 
> > I'm not seeing enough value here to warrant resurrecting the old 
> > sleep node
> > stuff.
> > 
> > -Scott
> 
> My guess may not be accurate. My point is that electronic parts don't 
> always work as expected. Taking preventative measures can make the 
> system more robust. In addition, this step is required in the deep sleep 
> procedure.

The deep sleep part is more convincing -- so MPIC masking is not effective 
during deep sleep?

-Scott



Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.

2015-08-07 Thread David Daney

On 08/07/2015 05:05 PM, Rafael J. Wysocki wrote:

Hi Mark,

On Fri, Aug 7, 2015 at 7:51 PM, Mark Rutland  wrote:

[Correcting the devicetree list address, which I typo'd in my original
reply]


+static const char * const addr_propnames[] = {
+  "mac-address",
+  "local-mac-address",
+  "address",
+};


If these are going to be generally necessary, then we should get them
adopted as standardised _DSD properties (ideally just one of them).


As far as I can tell, and please correct me if I am wrong, ACPI-6.0
doesn't contemplate MAC addresses.

Today we are using "mac-address", which is an Integer containing the MAC
address in its lowest order 48 bits in Little-Endian byte order.

The hardware and ACPI tables are here today, and we would like to
support it.  If some future ACPI specification specifies a standard way
to do this, we will probably adapt the code to do this in a standard manner.




[...]


+static acpi_status bgx_acpi_register_phy(acpi_handle handle,
+   u32 lvl, void *context, void **rv)
+{
+  struct acpi_reference_args args;
+  const union acpi_object *prop;
+  struct bgx *bgx = context;
+  struct acpi_device *adev;
+  struct device *phy_dev;
+  u32 phy_id;
+
+  if (acpi_bus_get_device(handle, &adev))
+  goto out;
+
+  SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev);
+
+  acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac);
+
+  bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count;
+
+  if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args))
+  goto out;
+
+  if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, &prop))
+  goto out;


Likewise for any inter-device properties, so that we can actually handle
them in a generic fashion, and avoid / learn from the mistakes we've
already handled with DT.


This is the fallacy of the ACPI is superior to DT argument.  The
specification of PHY topology and MAC addresses is well standardized in
DT, there is no question about what the proper way to specify it is.
Under ACPI, it is the Wild West, there is no specification, so each
system design is forced to invent something, and everybody comes up with
an incompatible implementation.


Indeed.

If ACPI is going to handle it, it should handle it properly. I really
don't see the point in bodging properties together in a less standard
manner than DT, especially for inter-device relationships.

Doing so is painful for _everyone_, and it's extremely unlikely that
other ACPI-aware OSs will actually support these custom descriptions,
making this Linux-specific, and breaking the rationale for using ACPI in
the first place -- a standard that says "just do non-standard stuff" is
not a usable standard.

For intra-device properties, we should standardise what we can, but
vendor-specific stuff is ok -- this can be self-contained within a
driver.

For inter-device relationships ACPI _must_ gain a better model of
componentised devices. It's simply unworkable otherwise, and as you
point out it's fallacious to say that because ACPI is being used that
something is magically industry standard, portable, etc.

This is not your problem in particular; the entire handling of _DSD so
far is a joke IMO.


It is actually useful to people as far as I can say.

Also, if somebody is going to use properties with ACPI, why would
they use a different set of properties with DT?

Wouldn't it be more reasonable to use the same set in both cases?


Yes, but there is still quite a bit of leeway to screw things up.


FWIW:  http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf

This actually seems to have been adopted by the UEFI people as a
"Standard", I am not sure where a record of this is kept though.

So, we are changing our firmware to use this standard (which is quite
similar to the DT with respect to MAC addresses).

Thanks,
David Daney


Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.

2015-08-07 Thread Rafael J. Wysocki
Hi Mark,

On Fri, Aug 7, 2015 at 7:51 PM, Mark Rutland  wrote:
> [Correcting the devicetree list address, which I typo'd in my original
> reply]
>
>> >> +static const char * const addr_propnames[] = {
>> >> +  "mac-address",
>> >> +  "local-mac-address",
>> >> +  "address",
>> >> +};
>> >
>> > If these are going to be generally necessary, then we should get them
>> > adopted as standardised _DSD properties (ideally just one of them).
>>
>> As far as I can tell, and please correct me if I am wrong, ACPI-6.0
>> doesn't contemplate MAC addresses.
>>
>> Today we are using "mac-address", which is an Integer containing the MAC
>> address in its lowest order 48 bits in Little-Endian byte order.
>>
>> The hardware and ACPI tables are here today, and we would like to
>> support it.  If some future ACPI specification specifies a standard way
>> to do this, we will probably adapt the code to do this in a standard manner.
>>
>>
>> >
>> > [...]
>> >
>> >> +static acpi_status bgx_acpi_register_phy(acpi_handle handle,
>> >> +   u32 lvl, void *context, void **rv)
>> >> +{
>> >> +  struct acpi_reference_args args;
>> >> +  const union acpi_object *prop;
>> >> +  struct bgx *bgx = context;
>> >> +  struct acpi_device *adev;
>> >> +  struct device *phy_dev;
>> >> +  u32 phy_id;
>> >> +
>> >> +  if (acpi_bus_get_device(handle, &adev))
>> >> +  goto out;
>> >> +
>> >> +  SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev);
>> >> +
>> >> +  acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac);
>> >> +
>> >> +  bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count;
>> >> +
>> >> +  if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args))
>> >> +  goto out;
>> >> +
>> >> +  if (acpi_dev_get_property(args.adev, "phy-channel", ACPI_TYPE_INTEGER, &prop))
>> >> +  goto out;
>> >
>> > Likewise for any inter-device properties, so that we can actually handle
>> > them in a generic fashion, and avoid / learn from the mistakes we've
>> > already handled with DT.
>>
>> This is the fallacy of the ACPI is superior to DT argument.  The
>> specification of PHY topology and MAC addresses is well standardized in
>> DT, there is no question about what the proper way to specify it is.
>> Under ACPI, it is the Wild West, there is no specification, so each
>> system design is forced to invent something, and everybody comes up with
>> an incompatible implementation.
>
> Indeed.
>
> If ACPI is going to handle it, it should handle it properly. I really
> don't see the point in bodging properties together in a less standard
> manner than DT, especially for inter-device relationships.
>
> Doing so is painful for _everyone_, and it's extremely unlikely that
> other ACPI-aware OSs will actually support these custom descriptions,
> making this Linux-specific, and breaking the rationale for using ACPI in
> the first place -- a standard that says "just do non-standard stuff" is
> not a usable standard.
>
> For intra-device properties, we should standardise what we can, but
> vendor-specific stuff is ok -- this can be self-contained within a
> driver.
>
> For inter-device relationships ACPI _must_ gain a better model of
> componentised devices. It's simply unworkable otherwise, and as you
> point out it's fallacious to say that because ACPI is being used that
> something is magically industry standard, portable, etc.
>
> This is not your problem in particular; the entire handling of _DSD so
> far is a joke IMO.

It is actually useful to people as far as I can say.

Also, if somebody is going to use properties with ACPI, why would
they use a different set of properties with DT?

Wouldn't it be more reasonable to use the same set in both cases?

Thanks,
Rafael


Re: [PATCH] perf, tools, report: Add support for srcfile sort key

2015-08-07 Thread Arnaldo Carvalho de Melo
Em Fri, Aug 07, 2015 at 08:51:45PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Aug 07, 2015 at 03:54:24PM -0700, Andi Kleen escreveu:
> > From: Andi Kleen 
> > 
> > In some cases it's useful to characterize samples by file. This is useful
> > to get a higher level categorization, for example to map cost to
> > subsystems.
> > 
> > Add a srcfile sort key to perf report. It builds on top of the existing
> > srcline support.
> 
> Applied

Humm, holding this up a bit, further testing showed some oddities,
fedora21, the width of the column is being limited to the length of the
header and there are some DWARF errors, have you noticed those?

[root@zoo ~]# rpm -q binutils-devel
binutils-devel-2.24-32.fc21.x86_64

# perf report --stdio -s srcfile
Failed to open /tmp/perf-2268.map, continuing without symbols
# To display the perf.data header info, please use --header/--header-only options.
#
BFD: Dwarf Error: Offset (124839) greater than or equal to .debug_str size (22876).
BFD: Dwarf Error: Offset (111062) greater than or equal to .debug_str size (22876).
BFD: Dwarf Error: Offset (70405) greater than or equal to .debug_str size (22876).
BFD: Dwarf Error: Offset (113124) greater than or equal to .debug_str size (22876).
[the same errors repeat dozens of times, also against .debug_str sizes (58106) and (17648)]
BFD: Dwarf Error: Offset (58380) greater than or equal to .debug_str size (17648).
BFD: Dwarf Error: Offset 

Re: [PATCH 1/3] clk: berlin: add cpuclk

2015-08-07 Thread Stephen Boyd
On 07/30, Antoine Tenart wrote:
> Add cpuclk in the Berlin BG2Q clock driver. This clk has a divider
> fixed to 1.
> 
> Signed-off-by: Antoine Tenart 
> ---

I didn't get anything besides 1/3, but this looks ok. If you
don't want us to take this through the clk tree, feel free to take my ack:

Acked-by: Stephen Boyd 

>  drivers/clk/berlin/bg2q.c| 14 +++---
>  include/dt-bindings/clock/berlin2q.h |  3 ++-
>  2 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/clk/berlin/bg2q.c b/drivers/clk/berlin/bg2q.c
> index 221f40c2b850..72d2f3500db8 100644
> --- a/drivers/clk/berlin/bg2q.c
> +++ b/drivers/clk/berlin/bg2q.c
> @@ -45,7 +45,7 @@
>  #define REG_SDIO0XIN_CLKCTL  0x0158
>  #define REG_SDIO1XIN_CLKCTL  0x015c
>  
> -#define  MAX_CLKS 27
> +#define  MAX_CLKS 28

Maybe you should move this into the dt-bindings.h file.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: linux-next: build failure after merge of the clk tree

2015-08-07 Thread Stephen Boyd
On 08/03, Tony Lindgren wrote:
> 
> Sorry for the delay in replying, I was offline for a few days. Thanks
> for fixing it up. I assume no need to apply this fix into arm-soc
> for-next but please let me know if you prefer that.
> 

Can we apply it in arm-soc? I'm worried we're going to forget
about it, and having the include there in arm-soc (where
a3314e9cf69c isn't present) doesn't hurt anything from what I can
tell.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH] fs: create and use seq_show_option for escaping

2015-08-07 Thread Kees Cook
On Fri, Aug 7, 2015 at 4:41 PM, Kees Cook  wrote:
> Many file systems that implement the show_options hook fail to correctly
> escape their output which could lead to unescaped characters (e.g. new
> lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
> could lead to confusion, spoofed entries (resulting in things like
> systemd issuing false d-bus "mount" notifications), and who knows
> what else. This looks like it would only be the root user stepping on
> themselves, but it's possible weird things could happen in containers
> or in other situations with delegated mount privileges.
>
> Here's an example using overlay with setuid fusermount trusting the
> contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of
> "sudo" is something more sneaky:
>
> $ BASE="ovl"
> $ MNT="$BASE/mnt"
> $ LOW="$BASE/lower"
> $ UP="$BASE/upper"
> $ WORK="$BASE/work/ 0 0
> none /proc fuse.pwn user_id=1000"
> $ mkdir -p "$LOW" "$UP" "$WORK"
> $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none 
> /mnt
> $ cat /proc/mounts
> none /root/ovl/mnt overlay 
> rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
> none /proc fuse.pwn user_id=1000 0 0
> $ fusermount -u /proc
> $ cat /proc/mounts
> cat: /proc/mounts: No such file or directory
>
> This fixes the problem by adding new seq_show_option and seq_show_option_n
> helpers, and updating the vulnerable show_option handlers to use them as
> needed. Some, like SELinux, need to be open coded due to unusual existing
> escape mechanisms.
>
> Signed-off-by: Kees Cook 
> Cc: sta...@vger.kernel.org
> ---
>  fs/ceph/super.c  |  2 +-
>  fs/cifs/cifsfs.c |  6 +++---
>  fs/ext3/super.c  |  4 ++--
>  fs/ext4/super.c  |  4 ++--
>  fs/gfs2/super.c  |  6 +++---
>  fs/hfs/super.c   |  4 ++--
>  fs/hfsplus/options.c |  4 ++--
>  fs/hostfs/hostfs_kern.c  |  2 +-
>  fs/ocfs2/super.c |  4 ++--
>  fs/overlayfs/super.c |  6 +++---
>  fs/reiserfs/super.c  |  8 +---
>  fs/xfs/xfs_super.c   |  4 ++--
>  include/linux/seq_file.h | 34 ++
>  kernel/cgroup.c  |  7 ---
>  net/ceph/ceph_common.c   |  7 +--
>  security/selinux/hooks.c |  2 +-
>  16 files changed, 72 insertions(+), 32 deletions(-)
>
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index d1c833c321b9..7b6bfcbf801c 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -479,7 +479,7 @@ static int ceph_show_options(struct seq_file *m, struct 
> dentry *root)
> if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT)
> seq_printf(m, ",readdir_max_bytes=%d", 
> fsopt->max_readdir_bytes);
> if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT))
> -   seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name);
> +   seq_show_option(m, "snapdirname", fsopt->snapdir_name);
>
> return 0;
>  }
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 0a9fb6b53126..6a1119e87fbb 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -394,17 +394,17 @@ cifs_show_options(struct seq_file *s, struct dentry 
> *root)
> struct sockaddr *srcaddr;
> srcaddr = (struct sockaddr *)&tcon->ses->server->srcaddr;
>
> -   seq_printf(s, ",vers=%s", tcon->ses->server->vals->version_string);
> +   seq_show_option(s, "vers", tcon->ses->server->vals->version_string);
> cifs_show_security(s, tcon->ses);
> cifs_show_cache_flavor(s, cifs_sb);
>
> if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER)
> seq_puts(s, ",multiuser");
> else if (tcon->ses->user_name)
> -   seq_printf(s, ",username=%s", tcon->ses->user_name);
> +   seq_show_option(s, "username", tcon->ses->user_name);
>
> if (tcon->ses->domainName)
> -   seq_printf(s, ",domain=%s", tcon->ses->domainName);
> +   seq_show_option(s, "domain", tcon->ses->domainName);
>
> if (srcaddr->sa_family != AF_UNSPEC) {
> struct sockaddr_in *saddr4;
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index 5ed0044fbb37..e9312494f3ee 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -578,10 +578,10 @@ static inline void ext3_show_quota_options(struct 
> seq_file *seq, struct super_bl
> }
>
> if (sbi->s_qf_names[USRQUOTA])
> -   seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]);
> +   seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]);
>
> if (sbi->s_qf_names[GRPQUOTA])
> -   seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);
> +   seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]);
>
> if (test_opt(sb, USRQUOTA))
> seq_puts(seq, ",usrquota");
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 58987b5c514b..9981064c4a54 100644
> --- a/fs/ext4/super.c
> +++ 

Re: [PATCH v2] clk: pxa: pxa3xx: fix CKEN register access

2015-08-07 Thread Stephen Boyd
On 08/04, Robert Jarzmik wrote:
> Clocks 0 to 31 are on CKENA, and not CKENB. The clock register names
> were inadvertently inverted. As a consequence, all clock operations were
> happening on CKENB, because almost all but 2 clocks are on CKENA.
> 
> As the clocks were activated by the bootloader in the former tests, it
> escaped the testing that the wrong clock gate was manipulated. The error
> was revealed by changing the pxa3xx-nand driver to a module, where upon
> unloading the wrong clock was disabled in CKENB.
> 
> Fixes: 9bbb8a338fb2 ("clk: pxa: add pxa3xx clock driver")
> Signed-off-by: Robert Jarzmik 
> ---

Applied to clk-fixes. Sorry, got busy last few days.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH] perf, tools: Support static linking with libdw

2015-08-07 Thread Arnaldo Carvalho de Melo
Em Fri, Aug 07, 2015 at 03:18:48PM -0700, Andi Kleen escreveu:
> From: Andi Kleen 
> 
> The Fedora 22 version of libdw requires a couple of extra libraries
> to link. With a dynamic link the dependencies are pulled in automatically,
> but this doesn't work for static linking. Add the needed libraries
> explicitly to the feature probe and the Makefile.

Thanks, applied.

- Arnaldo


Re: Potential data race in SyS_swapon

2015-08-07 Thread Cesar Eduardo Barros

Em 07-08-2015 13:14, Andrey Konovalov escreveu:

Hi!

We are working on a dynamic data race detector for the Linux kernel
called KernelThreadSanitizer (ktsan)
(https://github.com/google/ktsan/wiki).

While running ktsan on the upstream revision 21bdb584af8c with trinity
we got a few reports from SyS_swapon, here is one of them:


[...]


The race is happening when accessing the swap_file field of a
swap_info_struct struct.

2392 for (i = 0; i < nr_swapfiles; i++) {
2393 struct swap_info_struct *q = swap_info[i];
2394
2395 if (q == p || !q->swap_file)
2396 continue;
2397 if (mapping == q->swap_file->f_mapping) {
2398 error = -EBUSY;
2399 goto bad_swap;
2400 }
2401 }

2539 spin_lock(&swap_lock);
2540 p->swap_file = NULL;
2541 p->flags = 0;
2542 spin_unlock(&swap_lock);


There's another (more important) place which sets the swap_file field to 
NULL: within swapoff. It's also protected by swap_lock.



Since the swap_lock lock is not taken in the first snippet, it's
possible for q->swap_file to be assigned to NULL and reloaded between
executing lines 2395 and 2397, which might lead to a null pointer
dereference.


I agree with that analysis. It should be possible to hit it by racing 
swapon of a file with swapoff of another.



Looks like the swap_lock should be taken when iterating through the
swap_info array on lines 2392 - 2401.


I'd take that lock a couple of lines earlier, so that every place that 
sets the swap_file field on a swap_info_struct is behind swap_lock, for 
simplicity.


--
Cesar Eduardo Barros
ces...@cesarb.eti.br


Re: [PATCH] irqchip: gic-v3: Explicitly include linux/io.h

2015-08-07 Thread Ben Zhang
On Fri, Aug 7, 2015 at 1:40 AM, Marc Zyngier  wrote:
> Hi Ben,
>
> On 07/08/15 06:31, Ben Zhang wrote:
>> linux/io.h is needed because the driver uses:
>> readl_relaxed
>> writel_relaxed
>> writeq_relaxed
>> readq_relaxed
>> iounmap
>>
>> The header was implicitly included by an unrelated
>> commit 332fd7c4fef5
>> ("genirq: Generic chip: Change irq_reg_{readl,writel} arguments")
>> from the path below:
>> include/linux/io.h
>> include/linux/irq.h
>> include/linux/of_irq.h
>> drivers/irqchip/irq-gic-v3.c
>>
>> Signed-off-by: Ben Zhang 
>> ---
>>  drivers/irqchip/irq-gic-v3.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>> index e406bc5..3350b8d 100644
>> --- a/drivers/irqchip/irq-gic-v3.c
>> +++ b/drivers/irqchip/irq-gic-v3.c
>> @@ -19,6 +19,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>>
>
> This seems valid, but out of curiosity: Has this triggered any
> observable build failure? Or is that something you found by inspection?
>
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...


Hi Marc,

We have seen a build failure on the v3.18-based kernel for ChromeOS.
After cherry-picking commit 74d23cc704d1 ("time: move the
timecounter/cyclecounter code into its own file.") into the kernel,
irq-gic-v3 build is broken:

/mnt/host/source/src/third_party/kernel/v3.18/drivers/irqchip/irq-gic-v3.c:
In function 'gic_do_wait_for_rwp':
/mnt/host/source/src/third_party/kernel/v3.18/drivers/irqchip/irq-gic-v3.c:84:2:
error: implicit declaration of function 'readl_relaxed'
[-Werror=implicit-function-declaration]
  while (readl_relaxed(base + GICD_CTLR) & GICD_CTLR_RWP) {
  ^
...

The timecounter patch removes #include  in
include/clocksource/arm_arch_timer.h, and breaks all include paths to
the arm64 io.h header, e.g.
arch/arm64/include/asm/io.h
include/linux/clocksource.h
include/clocksource/arm_arch_timer.h
arch/arm64/include/asm/arch_timer.h
arch/arm64/include/asm/timex.h
include/linux/timex.h
include/linux/sched.h
arch/arm64/include/asm/compat.h
arch/arm64/include/asm/stat.h
include/linux/stat.h
include/linux/sysfs.h
include/linux/kobject.h
include/linux/device.h
include/linux/node.h
include/linux/cpu.h
drivers/irqchip/irq-gic-v3.c

Mainline build is fine because commit 332fd7c4fef5("genirq: Generic
chip: Change irq_reg_{readl,writel} arguments") landed before the
timecounter patch, and it creates a new include path to the io.h
header:
arch/arm64/include/asm/io.h
include/linux/io.h
include/linux/irq.h
include/linux/of_irq.h
drivers/irqchip/irq-gic-v3.c

Thanks,
Ben


Re: [PATCH v4 4/9] dax: fix mapping lifetime handling, convert to __pfn_t + kmap_atomic_pfn_t()

2015-08-07 Thread Dan Williams
On Sat, Jun 6, 2015 at 4:58 AM, Matthew Wilcox  wrote:
> On Fri, Jun 05, 2015 at 05:19:24PM -0400, Dan Williams wrote:
>> @@ -35,13 +35,16 @@ int dax_clear_blocks(struct inode *inode, sector_t 
>> block, long size)
>>   might_sleep();
>>   do {
>>   void *addr;
>> - unsigned long pfn;
>> + __pfn_t pfn;
>>   long count;
>>
>> - count = bdev_direct_access(bdev, sector, &addr, &pfn, size);
>> + count = bdev_direct_access(bdev, sector, &pfn, size);
>>   if (count < 0)
>>   return count;
>>   BUG_ON(size < count);
>> + addr = kmap_atomic_pfn_t(pfn);
>> + if (!addr)
>> + return -EIO;
>>   while (count > 0) {
>>   unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
>>   if (pgsz > count)
>
> This part is incomplete.  When bdev_direct_access() could return an
> address, it was possible for that address to be unaligned (eg when
> 'sector' was not a multiple of 8).  DAX has never had full support for
> devices that weren't a 4k sector size, but I was trying to not make that
> assumption in more places than I had to.  So this function needs a lot
> more simplification (or it needs to add '(sector & 7) << 9' to addr ...
> assuming that the partition this bdev represents actually starts at a
> multiple of 8 ... bleh!).

Isn't this already handled by the:

if (sector % (PAGE_SIZE / 512))
return -EINVAL;

...check in bdev_direct_access()?  As long as the driver's mapping is
4K aligned, which appears to be the case for all DAX-enabled drivers,
then we should be good to go.

>>
>> -static long dax_get_addr(struct buffer_head *bh, void **addr, unsigned 
>> blkbits)
>> +static long dax_get_pfn(struct buffer_head *bh, __pfn_t *pfn, unsigned 
>> blkbits)
>>  {
>> - unsigned long pfn;
>>   sector_t sector = bh->b_blocknr << (blkbits - 9);
>> - return bdev_direct_access(bh->b_bdev, sector, addr, &pfn, bh->b_size);
>> + return bdev_direct_access(bh->b_bdev, sector, pfn, bh->b_size);
>>  }
>
> This function should just be deleted.  It offers essentially nothing
> over just calling bdev_direct_access().

Ok.


Re: [PATCH] perf, tools, report: Add support for srcfile sort key

2015-08-07 Thread Arnaldo Carvalho de Melo
Em Fri, Aug 07, 2015 at 03:54:24PM -0700, Andi Kleen escreveu:
> From: Andi Kleen 
> 
> In some cases it's useful to characterize samples by file. This is useful
> to get a higher level categorization, for example to map cost to
> subsystems.
> 
> Add a srcfile sort key to perf report. It builds on top of the existing
> srcline support.

Applied
 
> Signed-off-by: Andi Kleen 
> ---
>  tools/perf/Documentation/perf-report.txt |  2 ++
>  tools/perf/util/hist.c   |  2 ++
>  tools/perf/util/hist.h   |  1 +
>  tools/perf/util/sort.c   | 52 
> 
>  tools/perf/util/sort.h   |  2 ++
>  5 files changed, 59 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-report.txt 
> b/tools/perf/Documentation/perf-report.txt
> index aabb1b4..724ab3f 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -81,6 +81,8 @@ OPTIONS
>   - cpu: cpu number the task ran at the time of sample
>   - srcline: filename and line number executed at the time of sample.  The
>   DWARF debugging info must be provided.
> + - srcfile: file name of the source file of the same. Requires dwarf
> + information.
>   - weight: Event specific weight, e.g. memory latency or transaction
>   abort cost. This is the global weight.
>   - local_weight: Local weight version of the weight above.
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 6f28d53..37dd8ae 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -944,6 +944,8 @@ void hist_entry__delete(struct hist_entry *he)
>  
>   zfree(&he->stat_acc);
>   free_srcline(he->srcline);
> + if (he->srcfile && he->srcfile[0])
> + free(he->srcfile);
>   free_callchain(he->callchain);
>   free(he);
>  }
> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> index 5ed8d9c..3be8087 100644
> --- a/tools/perf/util/hist.h
> +++ b/tools/perf/util/hist.h
> @@ -30,6 +30,7 @@ enum hist_column {
>   HISTC_PARENT,
>   HISTC_CPU,
>   HISTC_SRCLINE,
> + HISTC_SRCFILE,
>   HISTC_MISPREDICT,
>   HISTC_IN_TX,
>   HISTC_ABORT,
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 4c65a14..e3e8b13 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -319,6 +319,57 @@ struct sort_entry sort_srcline = {
>   .se_width_idx   = HISTC_SRCLINE,
>  };
>  
> +/* --sort srcfile */
> +
> +static char no_srcfile[1];
> +
> +static char *get_srcfile(struct hist_entry *e)
> +{
> + char *sf, *p;
> + struct map *map = e->ms.map;
> +
> + sf = get_srcline(map->dso, map__rip_2objdump(map, e->ip),
> +  e->ms.sym, true);
> + p = strchr(sf, ':');
> + if (p && *sf) {
> + *p = 0;
> + return sf;
> + }
> + free(sf);
> + return no_srcfile;
> +}
> +
> +static int64_t
> +sort__srcfile_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> + if (!left->srcfile) {
> + if (!left->ms.map)
> + left->srcfile = no_srcfile;
> + else
> + left->srcfile = get_srcfile(left);
> + }
> + if (!right->srcfile) {
> + if (!right->ms.map)
> + right->srcfile = no_srcfile;
> + else
> + right->srcfile = get_srcfile(right);
> + }
> + return strcmp(right->srcfile, left->srcfile);
> +}
> +
> +static int hist_entry__srcfile_snprintf(struct hist_entry *he, char *bf,
> + size_t size, unsigned int width)
> +{
> + return repsep_snprintf(bf, size, "%-*.*s", width, width, he->srcfile);
> +}
> +
> +struct sort_entry sort_srcfile = {
> + .se_header  = "Source File",
> + .se_cmp = sort__srcfile_cmp,
> + .se_snprintf= hist_entry__srcfile_snprintf,
> + .se_width_idx   = HISTC_SRCFILE,
> +};
> +
>  /* --sort parent */
>  
>  static int64_t
> @@ -1173,6 +1224,7 @@ static struct sort_dimension common_sort_dimensions[] = 
> {
>   DIM(SORT_PARENT, "parent", sort_parent),
>   DIM(SORT_CPU, "cpu", sort_cpu),
>   DIM(SORT_SRCLINE, "srcline", sort_srcline),
> + DIM(SORT_SRCFILE, "srcfile", sort_srcfile),
>   DIM(SORT_LOCAL_WEIGHT, "local_weight", sort_local_weight),
>   DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
>   DIM(SORT_TRANSACTION, "transaction", sort_transaction),
> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> index e97cd47..13705b2 100644
> --- a/tools/perf/util/sort.h
> +++ b/tools/perf/util/sort.h
> @@ -114,6 +114,7 @@ struct hist_entry {
>   };
>   };
>   char*srcline;
> + char*srcfile;
>   struct symbol   *parent;
>   struct rb_root  sorted_chain;
>   struct branch_info  *branch_info;
> @@ -172,6 +173,7 @@ enum 

Re: [PATCH v2 1/4] Add generic correlated clocksource code and ART to TSC conversion code

2015-08-07 Thread Andy Lutomirski

On 08/07/2015 04:01 PM, Christopher Hall wrote:

Original patch description:

Subject: ptp: Get sync timestamps
From: Thomas Gleixner 
Date: Wed, 29 Jul 2015 10:52:06 +0200

The ART stuff wants to be split out.

=== Changes ===

Add struct correlated_cs (clocksource) with pointer to original clocksource
and function pointer to convert correlated clocksource to the original

Add struct correlated_ts (timestamp) with function pointer to read correlated
clocksource, device and system (in terms of correlated clocksource)
counter values (input) with resulting converted real and monotonic raw
system times (output)

Add get_correlated_timestamp() function which given specific correlated_cs
and correlated_ts convert correlated counter value to system time

Add art_to_tsc conversion function translating an Always Running Timer (ART)
value to a TSC value
---
  arch/x86/kernel/tsc.c   | 31 ++
  include/linux/clocksource.h | 30 +
  include/linux/timekeeping.h |  4 +++
  kernel/time/timekeeping.c   | 63 +
  4 files changed, 128 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7437b41..a90aa6a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1059,6 +1059,27 @@ int unsynchronized_tsc(void)
return 0;
  }

+static u32 tsc_numerator;
+static u32 tsc_denominator;
+/*
+ * CHECKME: Do we need the adjust value? It should be 0, but if we run
+ * in a VM this might be a different story.
+ */
+static u64 tsc_adjust;
+
+static u64 art_to_tsc(u64 cycles)
+{
+   u64 tmp, res = tsc_adjust;
+
+   res += (cycles / tsc_denominator) * tsc_numerator;
+   tmp = (cycles % tsc_denominator) * tsc_numerator;
+   res += tmp / tsc_denominator;
+   return res;


Nice trick!


diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 278dd27..2ed3d0c 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -258,4 +258,34 @@ void acpi_generic_timer_init(void);
  static inline void acpi_generic_timer_init(void) { }
  #endif

+/**
+ * struct correlated_cs - Descriptor for a clocksource correlated to another 
clocksource
+ * @related_cs:Pointer to the related timekeeping clocksource
+ * @convert:   Conversion function to convert a timestamp from
+ * the correlated clocksource to cycles of the related
+ * timekeeping clocksource
+ */
+struct correlated_cs {
+   struct clocksource  *related_cs;
+   u64 (*convert)(u64 cycles);


Should the name make it clearer which way it converts?  For example, 
convert_to_related?  We might also want convert_from_related.


--Andy


Re: [PATCH v2 00/11] test_user_copy improvements

2015-08-07 Thread Kees Cook
On Fri, Aug 7, 2015 at 8:21 AM, James Hogan  wrote:
> These patches extend the test_user_copy test module to handle lots more
> cases of user accessors which architectures can override separately, and
> in particular those which are important for checking the MIPS Enhanced
> Virtual Addressing (EVA) implementations, which need to handle
> overlapping user and kernel address spaces, with special instructions
> for accessing user address space from kernel mode.
>
> - Checking that kernel pointers are accepted when user address limit is
>   set to KERNEL_DS, as done by the kernel when it internally invokes
>   system calls with kernel pointers.
> - Checking of the unchecked accessors (which don't call access_ok()).
>   Some of the tests are special cased for EVA at the moment which has
>   stricter hardware guarantees for bad user accesses than other
>   configurations.
> - Checking of other sets of user accessors, including the inatomic user
>   copies, clear_user, compatibility accessors (copy_in_user and
>   _unaligned), the user string accessors, and the user checksum
>   functions, all of which need special handling in arch code with EVA.
>
> Tested on MIPS with and without EVA, and on x86_64.
>
> Only build tested for arm, blackfin, metag, microblaze, openrisc,
> parisc, powerpc, sh, sparc, tile, i386 & xtensa.
>
> All arches were audited for the appropriate exports, only score is known
> to still be missing some.
>
> Changes in v2:
> - Add arch exports (patches 1-4).
> - Reorder patches slightly.
> - Patch 9: Drop strlen_user test. Microblaze doesn't define it, and
>   nothing actually uses it. IMO it should be removed, and there's no
>   point testing it in the mean time.
> - Patch 10: Conditionalise on CONFIG_COMPAT, otherwise it breaks build
>   on some 32-bit arches e.g. i386 (kbuild test robot).
> - Patch 10: Add testing of _unaligned accessors, which are also
>   conditional upon CONFIG_COMPAT.
> - Patch 11: Only test csum_partial_copy_from_user #ifndef
>   _HAVE_ARCH_COPY_AND_CSUM_FROM_USER, fixing powerpc64 build (Stephen
>   Rothwell)

Thanks for the fixes!

Acked-by: Kees Cook 

-Kees

>
> James Hogan (11):
>   microblaze: Export __strnlen_user to modules
>   nios2: Export strncpy_from_user / strnlen_user to modules
>   openrisc: Export __clear_user to modules
>   xtensa: Export __strnlen_user to modules
>   test_user_copy: Check legit kernel accesses
>   test_user_copy: Check unchecked accessors
>   test_user_copy: Check __copy_{to,from}_user_inatomic()
>   test_user_copy: Check __clear_user()/clear_user()
>   test_user_copy: Check user string accessors
>   test_user_copy: Check user compatibility accessors
>   test_user_copy: Check user checksum functions
>
>  arch/microblaze/kernel/microblaze_ksyms.c |   1 +
>  arch/nios2/mm/uaccess.c   |   2 +
>  arch/openrisc/kernel/or32_ksyms.c |   1 +
>  arch/xtensa/kernel/xtensa_ksyms.c |   1 +
>  lib/test_user_copy.c  | 251 
> ++
>  5 files changed, 256 insertions(+)
>
> Cc: Kees Cook 
> Cc: Andrew Morton 
> Cc: Michal Simek 
> Cc: Ley Foon Tan 
> Cc: Jonas Bonn 
> Cc: Chris Zankel 
> Cc: Max Filippov 
> Cc: nios2-...@lists.rocketboards.org
> Cc: li...@lists.openrisc.net
> Cc: linux-xte...@linux-xtensa.org
> --
> 2.3.6
>



-- 
Kees Cook
Chrome OS Security


Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

2015-08-07 Thread Toshi Kani
On Fri, 2015-08-07 at 16:26 -0700, Luis R. Rodriguez wrote:
> On Fri, Aug 7, 2015 at 4:08 PM, Toshi Kani  wrote:
> > On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote:
> > > On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani  wrote:
> > > > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
> > > > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani  
> > > > > wrote:
> > > > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
> > > > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani  > > > > > > >
> > > > > > > wrote:
 :
> > > > > 
> > > > > Its a bit more than that though. Since you agree that the OS can 
> > > > > live without MTRR code I was hoping to then see if we can fold out 
> > > > > PAT Linux code from under the MTRR dependency on Linux and make 
> > > > > PAT a first class citizen, maybe at least for x86-64. Right now 
> > > > > you can only get PAT support on Linux if you have MTRR code, but 
> > > > > I'd like to see if instead we can rip MTRR code out completely 
> > > > > under its own Kconfig and let it start rotting away.
> > > > > 
> > > > > Code-wise the only issue I saw was that PAT code also relies on
> > > > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I 
> > > > > found no other obvious issues.
> > > > 
> > > > We can rip out the MTRR code that modifies the MTRR setup, but not
> > > > mtrr_type_lookup().  This function provides necessary checks, as
> > > > documented
> > > > in commit 7f0431e3dc89, as follows.
> > > > 
> > > > 1) reserve_memtype() tracks an effective memory type in case
> > > >a request type is WB (ex. /dev/mem blindly uses WB). Missing
> > > >to track with its effective type causes a subsequent request
> > > >to map the same range with the effective type to fail.
> > > > 
> > > > 2) pud_set_huge() and pmd_set_huge() check if a requested range
> > > >has any overlap with MTRRs. Missing to detect an overlap may
> > > >cause a performance penalty or undefined behavior.
> > > > 
> > > > mtrr_type_lookup() is still admittedly awkward, but I do not think 
> > > > we
> > > > have an immediate issue in PAT code calling it.  I do not think it 
> > > > makes
> > > > PAT code a second class citizen.
> > > 
> > > OK since we know that if MTRR set up code ends up disabled and would
> > > return MTRR_TYPE_INVALID what if we just static inline this for the
> > > no-MTRR Kconfig build option immediately, and only then have the full
> > > blown implementation for the case where MTRR Kconfig option is
> > > enabled?
> > 
> > Yes, the MTRR code could be disabled by Kconfig with such inline stubs
> 
> OK thanks.
> 
> > as
> > long as the kernel is built specifically for a particular platform with 
> > MTRR disabled, such as Xen guest kernel.
> 
> Sure.
> 
> > However, since MTRR is a CPU feature enabled on most of the systems, I 
> > am not sure if it makes sense to be configurable with Kconfig, though.
> 
> To me this is about making PAT a first-class citizen in code, though,
> and providing through Kconfig the option to opt out of MTRR in
> OS code. Perhaps we can recommend enabling it, but having the option
> to split PAT out from MTRR is what I was aiming for.

Since we have CONFIG_MTRR already, we do not need to argue over this option.
:-)  It makes sense since when MTRR code was introduced, there were CPUs
without this capability...

> > > > > Platform firmware and SMIs seems to be the only other possible 
> > > > > issue. More on this below.
> > > > > 
> > > > > > > For those type of OSes...
> > > > > > > could it be possible to negotiate or hint to the platform 
> > > > > > > through an attribute somehow that the OS has such capability 
> > > > > > > to not use MTRR?
> > > > > > 
> > > > > > The OS can disable MTRR.  However, this can also cause a problem 
> > > > > > in firmware, which may rely on MTRR.
> > > > > 
> > > > > Can you describe what type of issues we could expect? I tend to
> > > > > care more about this for 64-bit systems, so if 32-bit platforms
> > > > > are more likely to be the ones with an issue, would
> > > > > restricting the disabling of MTRR to 64-bit only help?
> > > > 
> > > > The SMI handler runs in real-mode and relies on MTRR being effective 
> > > > to provide the right cache types.  It does not matter if it is 64-bit or
> > > > not.
> > > 
> > > I see... since I have no visibility to what goes under the hood, can
> > > you provide one example use case where an SMI handler would require
> > > getting a cache type through MTRR ? I realize this can vary, vendor by
> > > vendor, but any example would do just to satisfy my curiosity.
> > 
> > For fan control, it would need UC access to its registers.
> 
> OK thanks! To follow up with the example, since the platform firmware
> would have set up the MTRRs anyway, the SMI should still work, even if
> the OS didn't do anything, right?

Yes, MTRR works without the OS code.  However, mtrr_type_lookup() is
necessary to 

Re: [PATCH 17/26] clk: rockchip: Convert to clk_hw based provider APIs

2015-08-07 Thread Stephen Boyd
On 08/04, Heiko Stübner wrote:
> Am Freitag, 31. Juli 2015, 10:03:57 schrieb Stephen Boyd:
> > We're removing struct clk from the clk provider API, so switch
> > this code to using the clk_hw based provider APIs.
> > 
> > Cc: Heiko Stuebner 
> > Signed-off-by: Stephen Boyd 
> 
> Looks correct to me and there are no functional changes in here anyway
> Reviewed-by: Heiko Stuebner 

Thanks. There was one bug, but I fixed it ;-)

---8<---
diff --git a/drivers/clk/rockchip/clk-pll.c b/drivers/clk/rockchip/clk-pll.c
index 198261e001fa..eab430452c52 100644
--- a/drivers/clk/rockchip/clk-pll.c
+++ b/drivers/clk/rockchip/clk-pll.c
@@ -295,7 +295,7 @@ static void rockchip_rk3066_pll_init(struct clk_hw *hw)
 
if (!parent) {
pr_warn("%s: parent of %s not available\n",
-   __func__, clk_hw_get_parent(hw));
+   __func__, clk_hw_get_name(hw));
return;
}
 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH] fs: create and use seq_show_option for escaping

2015-08-07 Thread Kees Cook
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows
what else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers
or in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of
"sudo" is something more sneaky:

$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay 
rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and seq_show_option_n
helpers, and updating the vulnerable show_option handlers to use them as
needed. Some, like SELinux, need to be open coded due to unusual existing
escape mechanisms.

Signed-off-by: Kees Cook 
Cc: sta...@vger.kernel.org
---
 fs/ceph/super.c  |  2 +-
 fs/cifs/cifsfs.c |  6 +++---
 fs/ext3/super.c  |  4 ++--
 fs/ext4/super.c  |  4 ++--
 fs/gfs2/super.c  |  6 +++---
 fs/hfs/super.c   |  4 ++--
 fs/hfsplus/options.c |  4 ++--
 fs/hostfs/hostfs_kern.c  |  2 +-
 fs/ocfs2/super.c |  4 ++--
 fs/overlayfs/super.c |  6 +++---
 fs/reiserfs/super.c  |  8 +---
 fs/xfs/xfs_super.c   |  4 ++--
 include/linux/seq_file.h | 34 ++
 kernel/cgroup.c  |  7 ---
 net/ceph/ceph_common.c   |  7 +--
 security/selinux/hooks.c |  2 +-
 16 files changed, 72 insertions(+), 32 deletions(-)

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index d1c833c321b9..7b6bfcbf801c 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -479,7 +479,7 @@ static int ceph_show_options(struct seq_file *m, struct 
dentry *root)
if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT)
seq_printf(m, ",readdir_max_bytes=%d", 
fsopt->max_readdir_bytes);
if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT))
-   seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name);
+   seq_show_option(m, "snapdirname", fsopt->snapdir_name);
 
return 0;
 }
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 0a9fb6b53126..6a1119e87fbb 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -394,17 +394,17 @@ cifs_show_options(struct seq_file *s, struct dentry *root)
struct sockaddr *srcaddr;
srcaddr = (struct sockaddr *)>ses->server->srcaddr;
 
-   seq_printf(s, ",vers=%s", tcon->ses->server->vals->version_string);
+   seq_show_option(s, "vers", tcon->ses->server->vals->version_string);
cifs_show_security(s, tcon->ses);
cifs_show_cache_flavor(s, cifs_sb);
 
if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER)
seq_puts(s, ",multiuser");
else if (tcon->ses->user_name)
-   seq_printf(s, ",username=%s", tcon->ses->user_name);
+   seq_show_option(s, "username", tcon->ses->user_name);
 
if (tcon->ses->domainName)
-   seq_printf(s, ",domain=%s", tcon->ses->domainName);
+   seq_show_option(s, "domain", tcon->ses->domainName);
 
if (srcaddr->sa_family != AF_UNSPEC) {
struct sockaddr_in *saddr4;
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 5ed0044fbb37..e9312494f3ee 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -578,10 +578,10 @@ static inline void ext3_show_quota_options(struct 
seq_file *seq, struct super_bl
}
 
if (sbi->s_qf_names[USRQUOTA])
-   seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]);
+   seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]);
 
if (sbi->s_qf_names[GRPQUOTA])
-   seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);
+   seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]);
 
if (test_opt(sb, USRQUOTA))
seq_puts(seq, ",usrquota");
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 58987b5c514b..9981064c4a54 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1763,10 +1763,10 @@ static inline void ext4_show_quota_options(struct 
seq_file *seq,
}
 
if (sbi->s_qf_names[USRQUOTA])
-   seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]);
+   seq_show_option(seq, 

Re: [RFC PATCH v4 1/9] mmc: dw_mmc: Add external dma interface support

2015-08-07 Thread Shawn Lin

在 2015/8/8 5:32, Joachim Eastwood 写道:

Hi Shawn,

On 6 August 2015 at 08:44, Shawn Lin  wrote:

The DesignWare MMC Controller can support two types of DMA
mode: external DMA and internal DMA. We have a RK312x platform
that integrates dw_mmc and an ARM pl330 DMA controller. This patch adds
edmac ops to support such platforms. I've tested it on a RK312x
platform in edmac mode and a RK3288 platform in idmac mode.

Signed-off-by: Shawn Lin 



@@ -2256,26 +2373,30 @@ static irqreturn_t dw_mci_interrupt(int irq, void 
*dev_id)

 }

-#ifdef CONFIG_MMC_DW_IDMAC
-   /* Handle DMA interrupts */
-   if (host->dma_64bit_address == 1) {
-   pending = mci_readl(host, IDSTS64);
-   if (pending & (SDMMC_IDMAC_INT_TI | SDMMC_IDMAC_INT_RI)) {
-   mci_writel(host, IDSTS64, SDMMC_IDMAC_INT_TI |
-   SDMMC_IDMAC_INT_RI);
-   mci_writel(host, IDSTS64, SDMMC_IDMAC_INT_NI);
-   host->dma_ops->complete(host);
-   }
-   } else {
-   pending = mci_readl(host, IDSTS);
-   if (pending & (SDMMC_IDMAC_INT_TI | SDMMC_IDMAC_INT_RI)) {
-   mci_writel(host, IDSTS, SDMMC_IDMAC_INT_TI |
-   SDMMC_IDMAC_INT_RI);
-   mci_writel(host, IDSTS, SDMMC_IDMAC_INT_NI);
-   host->dma_ops->complete(host);
+   if (host->use_dma == TRANS_MODE_IDMAC) {


Doing:
if (host->use_dma != TRANS_MODE_IDMAC)
 return IRQ_HANDLED;



Okay.


It could save you the extra level of indentation you add below.


+   /* Handle DMA interrupts */
+   if (host->dma_64bit_address == 1) {
+   pending = mci_readl(host, IDSTS64);
+   if (pending & (SDMMC_IDMAC_INT_TI |
+  SDMMC_IDMAC_INT_RI)) {
+   mci_writel(host, IDSTS64,
+  SDMMC_IDMAC_INT_TI |
+  SDMMC_IDMAC_INT_RI);
+   mci_writel(host, IDSTS64, SDMMC_IDMAC_INT_NI);
+   host->dma_ops->complete((void *)host);
+   }
+   } else {
+   pending = mci_readl(host, IDSTS);
+   if (pending & (SDMMC_IDMAC_INT_TI |
+  SDMMC_IDMAC_INT_RI)) {
+   mci_writel(host, IDSTS,
+  SDMMC_IDMAC_INT_TI |
+  SDMMC_IDMAC_INT_RI);
+   mci_writel(host, IDSTS, SDMMC_IDMAC_INT_NI);
+   host->dma_ops->complete((void *)host);
+   }
 }
 }
-#endif

 return IRQ_HANDLED;
  }




@@ -2437,6 +2567,21 @@ static void dw_mci_cleanup_slot(struct dw_mci_slot 
*slot, unsigned int id)
  static void dw_mci_init_dma(struct dw_mci *host)
  {
 int addr_config;
+   int trans_mode;
+   struct device *dev = host->dev;
+   struct device_node *np = dev->of_node;
+
+   /* Check transfer mode */
+   trans_mode = (mci_readl(host, HCON) >> 16) & 0x3;


I think it would be nice if you could add some defines for 16 and 0x03
or add a macro like SDMMC_GET_FCNT() that is in dw_mmc.h.



Yes, it's better to avoid magic numbers for register operations so that
others can understand them without checking the databook for details. And more
than one place (e.g. "Check ADDR_CONFIG bit in HCON to find IDMAC address bus
width") might need to be modified.


Although one patch should only do one thing, I will make this change in v5.


+   if (trans_mode == 0) {
+   trans_mode = TRANS_MODE_IDMAC;
+   } else if (trans_mode == 1 || trans_mode == 2) {
+   trans_mode = TRANS_MODE_EDMAC;
+   } else {
+   trans_mode = TRANS_MODE_PIO;
+   goto no_dma;
+   }
+
 /* Check ADDR_CONFIG bit in HCON to find IDMAC address bus width */
 addr_config = (mci_readl(host, HCON) >> 27) & 0x01;


I'll try to get this patch tested on my lpc18xx platform soon.
btw, the HCON reg on lpc18xx reads as 0x00e42cc1 (address 0x40004070).



Yes, HCON[17:16] being 2'b00 means your lpc18xx uses the IDMAC.



regards,
Joachim Eastwood






--
Shawn Lin



Re: [PATCH] perf, tools: Support full source file paths for srcline

2015-08-07 Thread Arnaldo Carvalho de Melo
On Fri, Aug 07, 2015 at 03:24:05PM -0700, Andi Kleen wrote:
> From: Andi Kleen 
> 
> For perf report/script srcline currently only the base file name
> of the source file is printed. This is a good default because
> it usually fits on the screen.
> 
> But in some cases we want to know the full file name,
> for example to aggregate hits per file.
> 
> In the latter case we need more than the base file name
> to resolve file naming collisions: for example the kernel source
> has ~70 files named "core.c"
> 
> It's also useful as input to post processing tools which
> want to point to the right file.
> 
> Add a flag to allow full file name output.
> 
> Add an option to perf report/script to enable this option.

Applied


[GIT PULL] MODSIGN: Use PKCS#7 for module signatures [ver #7]

2015-08-07 Thread David Howells
Hi James,

Can you pull this into security/next please?  Its aim is twofold: firstly,
to make the module signatures PKCS#7/CMS format rather than a home-brewed
format, and secondly, to pave the way for use of the signing code for
firmware signatures (to follow later).

To this end, the patchset effects the following changes:

 (1) Extracts both parts of an X.509 AuthorityKeyIdentifier (AKID)
 extension.  We already extract the bit that can match the
 subjectKeyIdentifier (SKID) of the parent X.509 cert, but we currently
 ignore the bits that can match the issuer and serialNumber.

 Looks up an X.509 cert by issuer and serialNumber if those are
 provided in the AKID.  If the keyIdentifier is also provided, checks
 that the subjectKeyIdentifier of the cert found matches that also.

 If no issuer and serialNumber are provided in the AKID, looks up an
 X.509 cert by SKID using the AKID keyIdentifier.

 This allows module signing to be done with certificates that don't
 have an SKID by which they can be looked up.

 (2) Makes use of the PKCS#7 facility to provide module signatures.

 sign-file is replaced with a program that generates a PKCS#7 message
 that has no X.509 certs embedded and that has detached data (the
 module content) and adds it onto the message with magic string and
 descriptor.

 (3) The PKCS#7 message supplies all the information that is needed to
 select the X.509 cert to be used to verify the signature by standard
 means (including selection of digest algorithm and public key
 algorithm).  No kernel-specific magic values are required.

 (4) Makes it possible to get sign-file to just write out a file containing
 the PKCS#7 signature blob.  This can be used for debugging and
 potentially for firmware signing.

 (5) Extracts the function that does PKCS#7 signature verification on a
 blob from the module signing code and put it somewhere more general so
 that other things, such as firmware signing, can make use of it
 without depending on module config options.

 (6) Adds support for CMS messages in place of PKCS#7 (they're very similar
 ASN.1) and makes sign-file create CMS messages instead of PKCS#7.
 This allows signatures to refer to the verifying key by X.509 cert
 SKID instead of X.509 cert issuer and serial number.

 (7) Provides support for providing a password/pin for an encrypted private
 key to sign-file.

 (8) Makes it possible to use PKCS#11 with sign-file, thus allowing the use
 of cryptographic hardware.

 (9) Overhauls the way the module signing key is handled.  If the name in
 CONFIG_MODULE_SIG_KEY is "signing_key.pem" then a key will be
 automatically generated and placed in the build directory.  If the
 name is different, autogeneration is suppressed and the file is
 presumed to be a PEM file containing both the private key and X.509
 certificate.

(10) Overhauls the way auxiliary trusted keys are added to the kernel.
 Files matching the pattern "*.x509" are no longer just gathered up and
 cat'd together.  Now CONFIG_SYSTEM_TRUSTED_KEYS must be set to point
 to a single PEM file containing a set of X.509 certs cat'd together if
 this facility is desired.

(11) Severely restricts what authenticatedAttributes are permitted in a PKCS#7
 or CMS message and what content type may be used.  This is selected by
 the in-kernel user with the appropriate VERIFYING_*_SIGNATURE constant.

Note that the revised sign-file program no longer supports the "-s
" option to add an externally generated signature.  This is
deprecated in favour of using PKCS#11.  Note also that the format of the
signature file that would be passed to -s has changed.

Thanks,
David
---
The following changes since commit 459c15e53cf7e4e88a78ecfb109af5a267c5500a:

  Merge tag 'asn1-fixes-20150805' of 
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next 
(2015-08-07 13:27:58 +1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git 
tags/modsign-pkcs7-20150807

for you to fetch changes up to d158553fdadb0501551a50fcb9537f73d3ed8e01:

  PKCS#7: Appropriately restrict authenticated attributes and content type 
(2015-08-07 16:26:15 +0100)


Module signing with PKCS#7


David Howells (16):
  ASN.1: Add an ASN.1 compiler option to dump the element tree
  ASN.1: Copy string names to tokens in ASN.1 compiler
  X.509: Extract both parts of the AuthorityKeyIdentifier
  X.509: Support X.509 lookup by Issuer+Serial form AuthorityKeyIdentifier
  PKCS#7: Allow detached data to be supplied for signature checking purposes
  MODSIGN: Provide a utility to append a PKCS#7 signature to a module
  MODSIGN: Use PKCS#7

[PATCH] clk: si5351: Convert __clk_get_name() to clk_hw_get_name()

2015-08-07 Thread Stephen Boyd
This driver uses __clk_get_name() when it's dealing with struct
clk_hw pointers. Use the simpler form so that we can get rid of
the clk member in struct clk_hw one day.

Cc: Sebastian Hesselbarth 
Signed-off-by: Stephen Boyd 
---

This depends on the series that introduces clk_hw_get_name()[1].

[1] 
http://lkml.kernel.org/r/<1438362246-6664-1-git-send-email-sb...@codeaurora.org>

 drivers/clk/clk-si5351.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/clk/clk-si5351.c b/drivers/clk/clk-si5351.c
index 4b2ee39f05fd..5596c0aac22f 100644
--- a/drivers/clk/clk-si5351.c
+++ b/drivers/clk/clk-si5351.c
@@ -439,7 +439,7 @@ static unsigned long si5351_pll_recalc_rate(struct clk_hw 
*hw,
 
dev_dbg(>drvdata->client->dev,
"%s - %s: p1 = %lu, p2 = %lu, p3 = %lu, parent_rate = %lu, rate 
= %lu\n",
-   __func__, __clk_get_name(hwdata->hw.clk),
+   __func__, clk_hw_get_name(hw),
hwdata->params.p1, hwdata->params.p2, hwdata->params.p3,
parent_rate, (unsigned long)rate);
 
@@ -497,7 +497,7 @@ static long si5351_pll_round_rate(struct clk_hw *hw, 
unsigned long rate,
 
dev_dbg(>drvdata->client->dev,
"%s - %s: a = %lu, b = %lu, c = %lu, parent_rate = %lu, rate = 
%lu\n",
-   __func__, __clk_get_name(hwdata->hw.clk), a, b, c,
+   __func__, clk_hw_get_name(hw), a, b, c,
*parent_rate, rate);
 
return rate;
@@ -521,7 +521,7 @@ static int si5351_pll_set_rate(struct clk_hw *hw, unsigned 
long rate,
 
dev_dbg(>drvdata->client->dev,
"%s - %s: p1 = %lu, p2 = %lu, p3 = %lu, parent_rate = %lu, rate 
= %lu\n",
-   __func__, __clk_get_name(hwdata->hw.clk),
+   __func__, clk_hw_get_name(hw),
hwdata->params.p1, hwdata->params.p2, hwdata->params.p3,
parent_rate, rate);
 
@@ -632,7 +632,7 @@ static unsigned long si5351_msynth_recalc_rate(struct 
clk_hw *hw,
 
dev_dbg(>drvdata->client->dev,
"%s - %s: p1 = %lu, p2 = %lu, p3 = %lu, m = %lu, parent_rate = 
%lu, rate = %lu\n",
-   __func__, __clk_get_name(hwdata->hw.clk),
+   __func__, clk_hw_get_name(hw),
hwdata->params.p1, hwdata->params.p2, hwdata->params.p3,
m, parent_rate, (unsigned long)rate);
 
@@ -745,7 +745,7 @@ static long si5351_msynth_round_rate(struct clk_hw *hw, 
unsigned long rate,
 
dev_dbg(>drvdata->client->dev,
"%s - %s: a = %lu, b = %lu, c = %lu, divby4 = %d, parent_rate = 
%lu, rate = %lu\n",
-   __func__, __clk_get_name(hwdata->hw.clk), a, b, c, divby4,
+   __func__, clk_hw_get_name(hw), a, b, c, divby4,
*parent_rate, rate);
 
return rate;
@@ -777,7 +777,7 @@ static int si5351_msynth_set_rate(struct clk_hw *hw, 
unsigned long rate,
 
dev_dbg(>drvdata->client->dev,
"%s - %s: p1 = %lu, p2 = %lu, p3 = %lu, divby4 = %d, 
parent_rate = %lu, rate = %lu\n",
-   __func__, __clk_get_name(hwdata->hw.clk),
+   __func__, clk_hw_get_name(hw),
hwdata->params.p1, hwdata->params.p2, hwdata->params.p3,
divby4, parent_rate, rate);
 
@@ -1042,7 +1042,7 @@ static long si5351_clkout_round_rate(struct clk_hw *hw, 
unsigned long rate,
 
dev_dbg(>drvdata->client->dev,
"%s - %s: rdiv = %u, parent_rate = %lu, rate = %lu\n",
-   __func__, __clk_get_name(hwdata->hw.clk), (1 << rdiv),
+   __func__, clk_hw_get_name(hw), (1 << rdiv),
*parent_rate, rate);
 
return rate;
@@ -1093,7 +1093,7 @@ static int si5351_clkout_set_rate(struct clk_hw *hw, 
unsigned long rate,
 
dev_dbg(>drvdata->client->dev,
"%s - %s: rdiv = %u, parent_rate = %lu, rate = %lu\n",
-   __func__, __clk_get_name(hwdata->hw.clk), (1 << rdiv),
+   __func__, clk_hw_get_name(hw), (1 << rdiv),
parent_rate, rate);
 
return 0;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Ming Lin

On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
> I'm for solution 3:
> 
>  - keep blk_bio_{discard,write_same}_split, but ensure we never built
>a > 4GB bio in blkdev_issue_{discard,write_same}.

This has a problem, as I mentioned in solution 1.
We also need to make sure the max discard size is of the proper granularity.
See below example.

  4G: 8388608 sectors
UINT_MAX: 8388607 sectors

dm-thinp block size = default discard granularity = 128 sectors

blkdev_issue_discard(sector=0, nr_sectors=8388608)

1. Only ensure bi_size not overflow

It doesn't work.

[start_sector, end_sector]
[0, 8388607]
[0, 8388606], then dm-thinp splits it into 2 bios
[0, 8388479]
[8388480, 8388606] ---> this has a problem in process_discard_bio(),
because the discard size (127 sectors) covers less
than a block (128 sectors)
[8388607, 8388607] ---> same problem 

2. Ensure bi_size not overflow and max discard size is of proper granularity

It works.

[start_sector, end_sector]
[0, 8388607]
[0, 8388479]
[8388480, 8388607]


So how about below patch?

commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7
Author: Ming Lin 
Date:   Fri Aug 7 15:07:07 2015 -0700

block: remove split code in blkdev_issue_{discard,write_same}

The split code in blkdev_issue_{discard,write_same} can go away
now that any driver that cares does the split. We have to make
sure bio size doesn't overflow.

For discard, we ensure max_discard_sectors is of the proper
granularity. So if the discard size is > 4G, blkdev_issue_discard() always
sends granularity-aligned requests to the lower level, except that the
last one may not be a multiple of the granularity.

Signed-off-by: Ming Lin 
---
 block/blk-lib.c | 37 +
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..e178a07 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
unsigned int max_discard_sectors, granularity;
-   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
 
/* Zero-sector (unknown) and one-sector granularities are the same.  */
granularity = max(q->limits.discard_granularity >> 9, 1U);
-   alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
 
/*
-* Ensure that max_discard_sectors is of the proper
-* granularity, so that requests stay aligned after a split.
-*/
-   max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+* Ensure that max_discard_sectors doesn't overflow bi_size and is of
+* the proper granularity. So if discard size > 4G, 
blkdev_issue_discard()
+* always split and send multiple granularity requests to lower level,
+* except that the last one may be not multiple granularity.
+ */
+   max_discard_sectors = UINT_MAX >> 9;
max_discard_sectors -= max_discard_sectors % granularity;
-   if (unlikely(!max_discard_sectors)) {
-   /* Avoid infinite loop below. Being cautious never hurts. */
-   return -EOPNOTSUPP;
-   }
 
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
@@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
blk_start_plug();
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect, tmp;
+   sector_t end_sect;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
}
 
req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-   /*
-* If splitting a request, and the next starting sector would be
-* misaligned, stop the discard at the previous aligned sector.
-*/
end_sect = sector + req_sects;
-   tmp = end_sect;
-   if (req_sects < nr_sects &&
-   sector_div(tmp, granularity) != alignment) {
-   end_sect = end_sect - alignment;
-   sector_div(end_sect, granularity);
-   end_sect = end_sect * granularity + alignment;
-   req_sects = end_sect - sector;
-   }
 
bio->bi_iter.bi_sector = sector;
bio->bi_end_io = bio_batch_end_io;
@@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device *bdev, 
sector_t sector,
if (!q)
return -ENXIO;
 
-   

[mm/slab_common] BUG: kernel early-boot crashed early console in setup code

2015-08-07 Thread kernel test robot
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit 4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77
Author: Gavin Guo 
AuthorDate: Wed Jun 24 16:55:54 2015 -0700
Commit: Linus Torvalds 
CommitDate: Wed Jun 24 17:49:40 2015 -0700

mm/slab_common: support the slub_debug boot option on specific object size

The slub_debug=PU,kmalloc-xx cannot work because in the
create_kmalloc_caches() the s->name is created after the
create_kmalloc_cache() is called.  The name is NULL in the
create_kmalloc_cache() so the kmem_cache_flags() would not set the
slub_debug flags to the s->flags.  The fix here sets up a kmalloc_names
string array for initialization purposes and deletes the dynamic name
creation of kmalloc_caches.

[a...@linux-foundation.org: s/kmalloc_names/kmalloc_info/, tweak comment 
text]
Signed-off-by: Gavin Guo 
Acked-by: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

+-----------------------------------------------------------+------------+------------+------------+
|                                                           | 3693a84d3b | 4066c33d03 | e4bc13adfd |
+-----------------------------------------------------------+------------+------------+------------+
| boot_successes                                            | 377        | 14         | 2          |
| boot_failures                                             | 0          | 116        | 27         |
| BUG:kernel_early-boot_crashed_early_console_in_setup_code | 0          | 116        | 22         |
| IP-Config:Auto-configuration_of_network_failed            | 0          | 0          | 5          |
+-----------------------------------------------------------+------------+------------+------------+

early console in setup code

Elapsed time: 10
BUG: kernel early-boot crashed early console in setup code
Linux version 4.1.0-03324-g4066c33 #5
Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 systemd.log_level=err 
debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 
panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal  root=/dev/ram0 
rw 
link=/kbuild-tests/run-queue/kvm/i386-randconfig-sb0-08051454/linux-devel:devel-spot-201508051145:4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77:bisect-linux-5/.vmlinuz-4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77-20150808052740-53-ivb41
 branch=linux-devel/devel-spot-201508051145 
BOOT_IMAGE=/pkg/linux/i386-randconfig-sb0-08051454/gcc-4.9/4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77/vmlinuz-4.1.0-03324-g4066c33
 drbd.minor_count=8
qemu-system-x86_64 -enable-kvm -cpu kvm64 -kernel 
/pkg/linux/i386-randconfig-sb0-08051454/gcc-4.9/4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77/vmlinuz-4.1.0-03324-g4066c33
 -append 'hung_task_panic=1 earlyprintk=ttyS0,115200 systemd.log_level=err 
debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 
panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal  root=/dev/ram0 
rw 
link=/kbuild-tests/run-queue/kvm/i386-randconfig-sb0-08051454/linux-devel:devel-spot-201508051145:4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77:bisect-linux-5/.vmlinuz-4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77-20150808052740-53-ivb41
 branch=linux-devel/devel-spot-201508051145 
BOOT_IMAGE=/pkg/linux/i386-randconfig-sb0-08051454/gcc-4.9/4066c33d0308f87e9a3b0c7fafb9141c0bfbfa77/vmlinuz-4.1.0-03324-g4066c33
 drbd.minor_count=8'  -initrd /osimage/quantal/quantal-core-i386.cgz -m 300 
-smp 2 -device e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot 
-watchdog i6300esb -rtc base=localtime -drive 
file=/fs/sda5/disk0-quantal-ivb41-22,media=disk,if=virtio -drive 
file=/fs/sda5/disk1-quantal-ivb41-22,media=disk,if=virtio -drive 
file=/fs/sda5/disk2-quantal-ivb41-22,media=disk,if=virtio -drive 
file=/fs/sda5/disk3-quantal-ivb41-22,media=disk,if=virtio -drive 
file=/fs/sda5/disk4-quantal-ivb41-22,media=disk,if=virtio -drive 
file=/fs/sda5/disk5-quantal-ivb41-22,media=disk,if=virtio -drive 
file=/fs/sda5/disk6-quantal-ivb41-22,media=disk,if=virtio -pidfile 
/dev/shm/kboot/pid-quantal-ivb41-22 -serial 
file:/dev/shm/kboot/serial-quantal-ivb41-22 -daemonize -display none -monitor 
null 

git bisect start e4bc13adfd016fc1036838170288b5680d1a98b0 v4.1 --
git bisect good acd53127c4adbd34570b221e7ea1f7fc94aea923  # 04:28 22+  
0  Merge tag 'scsi-misc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good e0456717e483bb8a9431b80a5bdc99a928b9b003  # 04:34 22+  
0  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good 266da6f14232638b9caafb7facf2a7333895dd05  # 04:39 22+  
0  Merge tag 

Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

2015-08-07 Thread Luis R. Rodriguez
On Fri, Aug 7, 2015 at 4:08 PM, Toshi Kani  wrote:
> On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote:
>> On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani  wrote:
>> > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
>> > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani  wrote:
>> > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
>> > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani 
>> > > > > wrote:
>  :
>> > > >
>> > > > No, there is no OS support necessary to use MTRR.  After firmware
>> > > > sets it up, CPUs continue to use it without any OS support.  I think
>> > > > the Linux change you are referring is to obsolete legacy interfaces
>> > > > that modify the MTRR setup.  I agree that Linux should not modify
>> > > > MTRR.
>> > >
>> > > It's a bit more than that, though. Since you agree that the OS can live
>> > > without MTRR code, I was hoping to then see if we can fold the PAT
>> > > Linux code out from under the MTRR dependency and make PAT a
>> > > first-class citizen, maybe at least for x86-64. Right now you can only
>> > > get PAT support on Linux if you have MTRR code, but I'd like to see if
>> > > instead we can rip the MTRR code out completely under its own Kconfig
>> > > and let it start rotting away.
>> > >
>> > > Code-wise the only issue I saw was that PAT code also relies on
>> > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I found
>> > > no other obvious issues.
>> >
>> > We can rip out the MTRR code that modifies the MTRR setup, but not
>> > mtrr_type_lookup().  This function provides necessary checks, as
>> > documented in commit 7f0431e3dc89:
>> >
>> > 1) reserve_memtype() tracks an effective memory type in case
>> >a request type is WB (ex. /dev/mem blindly uses WB). Missing
>> >to track with its effective type causes a subsequent request
>> >to map the same range with the effective type to fail.
>> >
>> > 2) pud_set_huge() and pmd_set_huge() check if a requested range
>> >has any overlap with MTRRs. Missing to detect an overlap may
>> >cause a performance penalty or undefined behavior.
>> >
>> > mtrr_type_lookup() is still admittedly awkward, but I do not think we
>> > have an immediate issue in PAT code calling it.  I do not think it makes
>> > PAT code a second class citizen.
>>
>> OK, since we know that with the MTRR setup code disabled
>> mtrr_type_lookup() would return MTRR_TYPE_INVALID, what if we make it a
>> static inline stub for the no-MTRR Kconfig build option immediately, and
>> only have the full-blown implementation when the MTRR Kconfig option is
>> enabled?
>
> Yes, the MTRR code could be disabled by Kconfig with such inline stubs

OK thanks.

> as
> long as the kernel is built specifically for a particular platform with MTRR
> disabled, such as Xen guest kernel.

Sure.

> However, since MTRR is a CPU feature enabled on most systems, I am not
> sure it makes sense to make it configurable with Kconfig, though.

To me, though, this is about making PAT a first-class citizen in code,
and using Kconfig to validate the option of opting out of MTRR in OS
code. Perhaps we can recommend enabling MTRR, but having the option to
split PAT out from MTRR is what I was aiming for.

>> > > Platform firmware and SMIs seem to be the only other possible issue.
>> > > More on this below.
>> > >
>> > > > > For those type of OSes...
>> > > > > could it be possible to negotiate or hint to the platform through
>> > > > > an attribute somehow that the OS has such capability to not use
>> > > > > MTRR?
>> > > >
>> > > > The OS can disable MTRR.  However, this can also cause a problem in
>> > > > firmware, which may rely on MTRR.
>> > >
>> > > Can you describe what type of issues we could expect? I care more
>> > > about this for 64-bit systems, so if 32-bit platforms are the ones
>> > > more likely to have issues, would restricting the ability to disable
>> > > MTRR to 64-bit help?
>> >
>> > The SMI handler runs in real-mode and relies on MTRR being effective to
>> > provide right cache types.  It does not matter if it is 64-bit or not.
>>
>> I see... since I have no visibility into what goes on under the hood,
>> can you provide one example use case where an SMI handler would require
>> getting a cache type through MTRR? I realize this can vary, vendor by
>> vendor, but any example would do just to satisfy my curiosity.
>
> For fan control, it would need UC access to its registers.

OK thanks! To follow up with the example, since the platform firmware
would have set up the MTRRs anyway, the SMI should still work, even if
the OS didn't do anything, right?

>> > > > > Then, only if this bit is set, the platform could then avoid such
>> > > > > MTRR settings, and if we have issues you can throw rocks at us.
>> > > >
>> > > > > And if that's not possible how about a new platform setting that
>> > > > > would need to be set at the platform level to enable disabling
>> > > > 

Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

2015-08-07 Thread Toshi Kani
On Fri, 2015-08-07 at 17:08 -0600, Toshi Kani wrote:
> On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote:
> > On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani  wrote:
> > > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
> > > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani  
> > > > wrote:
> > > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
> > > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani  
> > > > > > wrote:
>  :
> > > > > 
> > > > > No, there is no OS support necessary to use MTRR.  After firmware 
> > > > > sets it up, CPUs continue to use it without any OS support.  I 
> > > > > think the Linux change you are referring is to obsolete legacy
> > > > > interfaces that modify the MTRR setup.  I agree that Linux should 
> > > > > not modify MTRR.
> > > > 
> > > > It's a bit more than that, though. Since you agree that the OS can
> > > > live without MTRR code, I was hoping to then see if we can fold the
> > > > PAT Linux code out from under the MTRR dependency and make PAT
> > > > a first-class citizen, maybe at least for x86-64. Right now you can
> > > > only get PAT support on Linux if you have MTRR code, but I'd like to
> > > > see if instead we can rip the MTRR code out completely under its own
> > > > Kconfig and let it start rotting away.
> > > > 
> > > > Code-wise the only issue I saw was that PAT code also relies on
> > > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I 
> > > > found no other obvious issues.
> > > 
> > > We can rip out the MTRR code that modifies the MTRR setup, but not
> > > mtrr_type_lookup().  This function provides necessary checks, as
> > > documented in commit 7f0431e3dc89:
> > > 
> > > 1) reserve_memtype() tracks an effective memory type in case
> > >a request type is WB (ex. /dev/mem blindly uses WB). Missing
> > >to track with its effective type causes a subsequent request
> > >to map the same range with the effective type to fail.
> > > 
> > > 2) pud_set_huge() and pmd_set_huge() check if a requested range
> > >has any overlap with MTRRs. Missing to detect an overlap may
> > >cause a performance penalty or undefined behavior.
> > > 
> > > mtrr_type_lookup() is still admittedly awkward, but I do not think we 
> > > have an immediate issue in PAT code calling it.  I do not think it 
> > > makes 
> > > PAT code a second class citizen.
> > 
> > OK, since we know that with the MTRR setup code disabled
> > mtrr_type_lookup() would return MTRR_TYPE_INVALID, what if we make it
> > a static inline stub for the no-MTRR Kconfig build option immediately,
> > and only have the full-blown implementation when the MTRR Kconfig
> > option is enabled?
> 
> Yes, the MTRR code could be disabled by Kconfig with such inline stubs as
> long as the kernel is built specifically for a particular platform with 
> MTRR disabled, such as Xen guest kernel.

I noticed that we do already have CONFIG_MTRR, and that the
mtrr_type_lookup() inline stub returns MTRR_TYPE_INVALID.

-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tools/thermal: tmon: Allow overriding pkg-config

2015-08-07 Thread Brian Norris
On Fri, Aug 07, 2015 at 03:58:02PM -0700, Florian Fainelli wrote:
> Some build systems might not ship with a proper pkg-config
> infrastructure, so picking up the host pkg-config might cause us to link
> with ncursesw which may not be present in the build environment.
> 
> Allow an external build system to override the pkg-config value (e.g:
> set it to /bin/false if we do not have anything).

Is that really the best approach? I see other Makefiles that leave
PKG_CONFIG as a variable (which is fine) but don't allow a full
override; they do something like:

PKG_CONFIG := $(CROSS_COMPILE)pkg-config

So, I suppose you could then alias your my-foo-cross-pkg-config to
/bin/false, or something like that.

Another alternative: supply empty $PKG_CONFIG_LIBDIR and
$PKG_CONFIG_PATH variables, so pkg-config finds nothing. (Bonus: no
patch needed here!)

Brian

> Fixes: 96a0d99c72cc ("tools/thermal: tmon: use pkg-config to determine 
> library dependencies")
> Signed-off-by: Florian Fainelli 
> ---
>  tools/thermal/tmon/Makefile | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/thermal/tmon/Makefile b/tools/thermal/tmon/Makefile
> index 2e83dd3655a2..3777ab59d08c 100644
> --- a/tools/thermal/tmon/Makefile
> +++ b/tools/thermal/tmon/Makefile
> @@ -11,6 +11,7 @@ TARGET=tmon
>  
>  INSTALL_PROGRAM=install -m 755 -p
>  DEL_FILE=rm -f
> +PKG_CONFIG?=pkg-config
>  
>  # Static builds might require -ltinfo, for instance
>  ifneq ($(findstring -static, $(LDFLAGS)),)
> @@ -18,8 +19,8 @@ STATIC := --static
>  endif
>  
>  TMON_LIBS=-lm -lpthread
> -TMON_LIBS += $(shell pkg-config --libs $(STATIC) panelw ncursesw 2> 
> /dev/null || \
> -  pkg-config --libs $(STATIC) panel ncurses 2> /dev/null || \
> +TMON_LIBS += $(shell $(PKG_CONFIG) --libs $(STATIC) panelw ncursesw 2> 
> /dev/null || \
> +  $(PKG_CONFIG) --libs $(STATIC) panel ncurses 2> /dev/null 
> || \
>echo -lpanel -lncurses)
>  
>  OBJS = tmon.o tui.o sysfs.o pid.o
> -- 
> 2.1.0
> 


Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

2015-08-07 Thread Toshi Kani
On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote:
> On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani  wrote:
> > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
> > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani  wrote:
> > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
> > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani  
> > > > > wrote:
 :
> > > > 
> > > > No, there is no OS support necessary to use MTRR.  After firmware 
> > > > sets it up, CPUs continue to use it without any OS support.  I think 
> > > > the Linux change you are referring is to obsolete legacy interfaces 
> > > > that modify the MTRR setup.  I agree that Linux should not modify 
> > > > MTRR.
> > > 
> > > It's a bit more than that, though. Since you agree that the OS can live
> > > without MTRR code, I was hoping to then see if we can fold the PAT
> > > Linux code out from under the MTRR dependency and make PAT a
> > > first-class citizen, maybe at least for x86-64. Right now you can only
> > > get PAT support on Linux if you have MTRR code, but I'd like to see if
> > > instead we can rip the MTRR code out completely under its own Kconfig
> > > and let it start rotting away.
> > > 
> > > Code-wise the only issue I saw was that PAT code also relies on
> > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I found
> > > no other obvious issues.
> > 
> > We can rip out the MTRR code that modifies the MTRR setup, but not
> > mtrr_type_lookup().  This function provides necessary checks, as
> > documented in commit 7f0431e3dc89:
> > 
> > 1) reserve_memtype() tracks an effective memory type in case
> >a request type is WB (ex. /dev/mem blindly uses WB). Missing
> >to track with its effective type causes a subsequent request
> >to map the same range with the effective type to fail.
> > 
> > 2) pud_set_huge() and pmd_set_huge() check if a requested range
> >has any overlap with MTRRs. Missing to detect an overlap may
> >cause a performance penalty or undefined behavior.
> > 
> > mtrr_type_lookup() is still admittedly awkward, but I do not think we 
> > have an immediate issue in PAT code calling it.  I do not think it makes 
> > PAT code a second class citizen.
> 
> OK, since we know that with the MTRR setup code disabled
> mtrr_type_lookup() would return MTRR_TYPE_INVALID, what if we make it a
> static inline stub for the no-MTRR Kconfig build option immediately, and
> only have the full-blown implementation when the MTRR Kconfig option is
> enabled?

Yes, the MTRR code could be disabled by Kconfig with such inline stubs as
long as the kernel is built specifically for a particular platform with MTRR
disabled, such as Xen guest kernel.

However, since MTRR is a CPU feature enabled on most systems, I am not
sure it makes sense to make it configurable with Kconfig, though.

> > > Platform firmware and SMIs seem to be the only other possible issue.
> > > More on this below.
> > > 
> > > > > For those type of OSes...
> > > > > could it be possible to negotiate or hint to the platform through 
> > > > > an attribute somehow that the OS has such capability to not use 
> > > > > MTRR?
> > > > 
> > > > The OS can disable MTRR.  However, this can also cause a problem in
> > > > firmware, which may rely on MTRR.
> > > 
> > > Can you describe what type of issues we could expect? I care more
> > > about this for 64-bit systems, so if 32-bit platforms are the ones
> > > more likely to have issues, would restricting the ability to disable
> > > MTRR to 64-bit help?
> > 
> > The SMI handler runs in real-mode and relies on MTRR being effective to
> > provide right cache types.  It does not matter if it is 64-bit or not.
> 
> I see... since I have no visibility into what goes on under the hood,
> can you provide one example use case where an SMI handler would require
> getting a cache type through MTRR? I realize this can vary, vendor by
> vendor, but any example would do just to satisfy my curiosity.

For fan control, it would need UC access to its registers.

> > > > > Then, only if this bit is set, the platform could then avoid such 
> > > > > MTRR settings, and if we have issues you can throw rocks at us.
> > > > 
> > > > > And if that's not possible how about a new platform setting that 
> > > > > would need to be set at the platform level to enable disabling 
> > > > > this junk?
> > > > > Then only folks who know what they are doing would enable it, and 
> > > > > if the customer set it, the issue would not be on the platform.
> > > > 
> > > > > Could this also be used to prevent SMIs with MTRRs?
> > > > 
> > > > ACPI _OSI could be used for firmware to implement some OS-specific
> > > > features, but it may be too late for firmware to make major changes 
> > > > and
> > > > is generally useless unless OS requirements are described in a spec
> > > > backed by logo certification.
> > > 
> > > I see.. So there are no guarantees that 

Re: [PATCH v2] staging: ion: Add a default struct device for cma heap

2015-08-07 Thread Laura Abbott

On 08/07/2015 11:05 AM, Greg Kroah-Hartman wrote:

On Fri, Aug 07, 2015 at 11:50:04PM +0800, Feng Tang wrote:

On Fri, Aug 07, 2015 at 04:48:28PM +0200, Michal Nazarewicz wrote:

On Fri, Aug 07 2015, Feng Tang wrote:

As I described above, the dummy struct device is only needed for DMA
requests; its lifetime is aligned with the cma_heap itself.


Again, this is from perspective of someone who is unfamiliar with ION,
but perhaps a viable solution is to bypass DMA API and just call
cma_alloc directly?


For the ion cma heap, the buffer allocation function ion_cma_allocate()
calls dma_alloc_coherent(dev, ...). And dma_alloc_coherent() is
implemented by each architecture (arm/m68k/x86, etc.), and many
architectures' implementations don't use CMA but use alloc_pages()-like
APIs. So I'm afraid we can't call cma_alloc() directly here.


Ick.  But using a "fake" struct device here, for no real reason,
makes me very nervous that you are going to hit a codepath somewhere
that assumes this is a "real" struct device and tries to do something
with it (dev_printk(), look up what bus it is on, change the name of it,
etc.)  Trying to fake out the subsystem in this manner is a sign that
something is really wrong here.

Please either make this a real device, or fix up the api to not need
this type of thing.



I think this issue represents one of the many current issues with Ion.
When the void * == struct dev was added, everything was working off of
board files. We now have devicetree, which makes the device association
even more awkward to pull off. Every vendor out there is doing something
different right now, so the assertion in the commit text about 'normal'
is not true; existing code has managed to work with the (not super great)
API.

There is going to be an Ion session at Plumbers in a few weeks. I'd like
to propose holding off on merging anything until after plumbers when
there can be some more discussion about what would be a reasonable API,
taking into consideration the points brought up in this patch series.

Thanks,
Laura


Re: [PATCH v2 4/8] drm: rockchip/dp: add rockchip platform dp driver

2015-08-07 Thread Heiko Stübner
Hi Yakir,


I think this Rockchip portion is missing a devicetree binding.

You have the ability to power down the actual edp phy by using
grf_edp_iddq_en from GRF_SOC_CON12. This is similar to how the
rk3288 usb-phy gets put into a deeper state. So maybe you could
provide a phy driver (drivers/phy) for this similar to what the
exynos-dp does.

Some more stuff inline. But I guess by no means complete, as I'm
still trying to integrate this into my development-tree.


Am Freitag, 7. August 2015, 05:46:20 schrieb Yakir Yang:
> Rockchip has three clocks for the DP controller; we leave pclk_edp
> under analogix_dp driver control, and keep sclk_edp_24m and
> sclk_edp in the platform driver.
> 
> Signed-off-by: Yakir Yang 
> ---
> Changes in v2: None
> 
>  drivers/gpu/drm/rockchip/Kconfig|  10 +
>  drivers/gpu/drm/rockchip/Makefile   |   1 +
>  drivers/gpu/drm/rockchip/analogix_dp-rockchip.c | 419
>  3 files changed, 430 insertions(+)
>  create mode 100644 drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
> 
> diff --git a/drivers/gpu/drm/rockchip/Kconfig
> b/drivers/gpu/drm/rockchip/Kconfig index 35215f6..096ed77 100644
> --- a/drivers/gpu/drm/rockchip/Kconfig
> +++ b/drivers/gpu/drm/rockchip/Kconfig
> @@ -25,3 +25,13 @@ config ROCKCHIP_DW_HDMI
> for the Synopsys DesignWare HDMI driver. If you want to
> enable HDMI on RK3288 based SoC, you should selet this
> option.
> +
> +

nit: double blank line

> +config ROCKCHIP_ANALOGIX_DP
> +tristate "Rockchip specific extensions for Analogix DP driver"
> +depends on DRM_ROCKCHIP
> +select DRM_ANALOGIX_DP
> +help
> +   This selects support for Rockchip SoC specific extensions
> +   for the Analogix Core DP driver. If you want to enable DP
> +   on RK3288 based SoC, you should select this option.

[...]


> diff --git a/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
> b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c new file mode 100644
> index 000..2f86e5e
> --- /dev/null
> +++ b/drivers/gpu/drm/rockchip/analogix_dp-rockchip.c
> @@ -0,0 +1,419 @@
> +/*
> + * Rockchip SoC DP (Display Port) interface driver.
> + *
> + * Copyright (C) Fuzhou Rockchip Electronics Co., Ltd.
> + * Author: Andy Yan 
> + * Yakir Yang 
> + * Jeff Chen 
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 2 of the License, or (at your
> + * option) any later version.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "rockchip_drm_drv.h"
> +#include "rockchip_drm_vop.h"
> +
> +#define encoder_to_dp(c) \
> + container_of(c, struct rockchip_dp_device, encoder)
> +
> +#define plat_data_to_dp(pd) \
> + container_of(pd, struct rockchip_dp_device, plat_data)
> +
> +/* dp grf register offset */
> +#define DP_VOP_SEL  0x025c /* grf_soc_con6 */
> +#define DP_REF_CLK_SEL  0x0274 /* grf_soc_con12 */
> +
> +#define GRF_DP_REF_CLK_SEL_INTERBIT(4)
> +#define DP_SEL_VOP_LIT   BIT(5)
> +
> +struct rockchip_dp_device {
> + struct drm_device*drm_dev;
> + struct device*dev;
> + struct drm_encoder   encoder;
> + struct drm_display_mode  mode;
> +
> + struct clk   *clk_dp;
> + struct clk   *clk_24m_parent;

this clk_24m_parent does not seem used at all

> + struct clk   *clk_24m;
> + struct regmap*grf;
> + struct reset_control *rst;
> +
> + struct analogix_dp_plat_data plat_data;
> +};
> +

[...]

> +static int rockchip_dp_init(struct rockchip_dp_device *dp)
> +{
> + struct device *dev = dp->dev;
> + struct device_node *np = dev->of_node;
> + int ret;
> +
> + dp->grf = syscon_regmap_lookup_by_phandle(np, "rockchip,grf");
> + if (IS_ERR(dp->grf)) {
> + dev_err(dev,
> + "rk3288-dp needs rockchip,grf property\n");
> + return PTR_ERR(dp->grf);
> + }
> +
> + dp->clk_dp = devm_clk_get(dev, "clk_dp");

I've looked at the manual, but couldn't find an actual clock-name
used there. Is it really "clk_dp" or should it just be "dp"?


> + if (IS_ERR(dp->clk_dp)) {
> + dev_err(dev, "cannot get clk_dp\n");
> + return PTR_ERR(dp->clk_dp);
> + }
> +
> + dp->clk_24m = devm_clk_get(dev, "clk_dp_24m");

Same here, maybe "dp_24m".


> + if (IS_ERR(dp->clk_24m)) {
> + dev_err(dev, "cannot get clk_dp_24m\n");
> + return PTR_ERR(dp->clk_24m);
> + }

I think you're missing the pclk here (PCLK_EDP_CTRL) or is this part of
something else?



> 
