Re: [PATCH v2 2/2] arm64: dts: exynos: add OF graph between USB-PHY and MUIC

2018-05-15 Thread Felipe Balbi
Krzysztof Kozlowski  writes:

> On Tue, May 15, 2018 at 2:12 PM, Andrzej Hajda  wrote:
>> OF graph describes USB data lanes between USB-PHY and respective MUIC.
>> Since graph is present and DWC driver can use it to get extcon, obsolete
>> extcon property can be removed.
>>
>> Signed-off-by: Andrzej Hajda 
>> ---
>>  .../dts/exynos/exynos5433-tm2-common.dtsi | 19 ++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> As we discussed for v1 - since this was not split into two, I'll apply
> it once first patch hits mainline.

I just took patch 1 to my tree, fyi

-- 
balbi


signature.asc
Description: PGP signature


Re: [PATCH 1/2] Convert target drivers to use sbitmap

2018-05-15 Thread Felipe Balbi

Hi,

Matthew Wilcox  writes:
> From: Matthew Wilcox 
>
> The sbitmap and the percpu_ida perform essentially the same task,
> allocating tags for commands.  Since the sbitmap is more widely used
> than the percpu_ida, convert the percpu_ida users to the sbitmap API.
>
> Signed-off-by: Matthew Wilcox 
> ---

[...]

>  drivers/usb/gadget/function/f_tcm.c  |  8 +++---

for drivers/usb/gadget/function/f_tcm.c

Acked-by: Felipe Balbi 

-- 
balbi


signature.asc
Description: PGP signature
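
(For context on the API the series converts to, here is a minimal,
illustrative sketch of the sbitmap tag-allocation pattern: allocate a
tag, use it to index a preallocated command slot, release it.  It is
not taken from the patch; the tag_pool name, the depth of 128, and the
helper functions are made up for illustration.)

#include <linux/sbitmap.h>
#include <linux/gfp.h>
#include <linux/numa.h>

/* Illustrative only: a pool of 128 command tags. */
static struct sbitmap_queue tag_pool;

static int example_init_tags(void)
{
	/* shift = -1 lets sbitmap pick the per-word granularity. */
	return sbitmap_queue_init_node(&tag_pool, 128, -1, false,
				       GFP_KERNEL, NUMA_NO_NODE);
}

static void example_use_one_tag(void)
{
	unsigned int cpu;
	int tag = sbitmap_queue_get(&tag_pool, &cpu);

	if (tag < 0)
		return;		/* no free tag right now */

	/* ... use @tag to index a preallocated command slot ... */

	sbitmap_queue_clear(&tag_pool, tag, cpu);	/* release the tag */
}

Roughly, a percpu_ida_alloc()/percpu_ida_free() pair becomes a
sbitmap_queue_get()/sbitmap_queue_clear() pair in this conversion.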


Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring

2018-05-15 Thread Tiwei Bie
On Wed, May 16, 2018 at 01:01:04PM +0800, Jason Wang wrote:
> On 2018年04月25日 13:15, Tiwei Bie wrote:
[...]
> > @@ -1143,10 +1160,17 @@ static unsigned 
> > virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > /* We optimistically turn back on interrupts, then check if there was
> >  * more to do. */
> > +   /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > +* either clear the flags bit or point the event index at the next
> > +* entry. Always update the event index to keep code simple. */
> > +
> > +   vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > +   vq->last_used_idx | (vq->wrap_counter << 15));
> 
> 
> Using vq->wrap_counter seems incorrect; what we need is the wrap counter
> for last_used_idx, not next_avail_idx.

Yes, you're right. I have fixed it in my local repo,
but haven't sent out a new version yet.

I'll try to send out a new RFC today.

> 
> And I think there's even no need to bother with event idx here, how about
> just set VRING_EVENT_F_ENABLE?

We had a similar discussion before. Michael prefers
to use VRING_EVENT_F_DESC when possible to avoid
extra interrupts if host is fast:

https://lkml.org/lkml/2018/4/16/1085
"""
I suspect this will lead to extra interrupts if host is fast.
So I think for now we should always use VRING_EVENT_F_DESC
if EVENT_IDX is negotiated.
"""

> 
> > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > virtio_wmb(vq->weak_barriers);
> > -   vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > +   vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > +VRING_EVENT_F_ENABLE;
> > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > vq->event_flags_shadow);
> > }
> > @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue 
> > *_vq, unsigned last_used_idx)
> >   static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> >   {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > +   u16 bufs, used_idx, wrap_counter;
> > START_USE(vq);
> > /* We optimistically turn back on interrupts, then check if there was
> >  * more to do. */
> > +   /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > +* either clear the flags bit or point the event index at the next
> > +* entry. Always update the event index to keep code simple. */
> > +
> > +   /* TODO: tune this threshold */
> > +   bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
> 
> bufs could be more than vq->num here, is this intended?

Yes, you're right. Like the above one -- I have fixed
it in my local repo, but haven't sent out a new version
yet. Thanks for spotting this!

> 
> > +
> > +   used_idx = vq->last_used_idx + bufs;
> > +   wrap_counter = vq->wrap_counter;
> > +
> > +   if (used_idx >= vq->vring_packed.num) {
> > +   used_idx -= vq->vring_packed.num;
> > +   wrap_counter ^= 1;
> 
> When used_idx is greater than or equal to vq->num, there's no need to flip
> the wrap_counter bit since it should match next_avail_idx.
> 
> And we also need to care about the case when next_avail wraps but used_idx
> does not, so we probably need:
> 
> else if (vq->next_avail_idx < used_idx) {
>     wrap_counter ^= 1;
> }
> 
> I think maybe it's time to add some sample code in the spec to avoid
> duplicating the effort (and the bugs).

+1

Best regards,
Tiwei Bie
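
(For readers following the wrap-counter point above, here is a minimal
illustrative sketch of how the packed-ring event-suppression word
("off_wrap") is encoded: bits 0-14 carry the ring offset and bit 15
carries the wrap counter that belongs to that offset, i.e. the used
side, rather than the avail-side vq->wrap_counter questioned above.
The used_wrap_counter name is assumed for illustration and is not
taken from the patch.)

#include <linux/types.h>

/*
 * Illustrative sketch, not the driver code under review: build the
 * 16-bit event suppression word from a used-ring offset and the wrap
 * counter that belongs to that offset.
 */
static inline u16 packed_ring_off_wrap(u16 used_idx, u16 used_wrap_counter)
{
	return (used_idx & 0x7fff) | ((used_wrap_counter & 1) << 15);
}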


Re: [PATCH v10 02/27] clk: davinci: da850-pll: change PLL0 to CLK_OF_DECLARE

2018-05-15 Thread Sekhar Nori
On Tuesday 15 May 2018 09:12 PM, David Lechner wrote:
> On 05/15/2018 08:31 AM, Sekhar Nori wrote:
>> On Wednesday 09 May 2018 10:55 PM, David Lechner wrote:
>>> +void of_da850_pll0_init(struct device_node *node)
>>>   {
>>> -    return of_davinci_pll_init(dev, dev->of_node, _pll0_info,
>>> -   _pll0_obsclk_info,
>>> -   da850_pll0_sysclk_info, 7, base, cfgchip);
>>> +    void __iomem *base;
>>> +    struct regmap *cfgchip;
>>> +
>>> +    base = of_iomap(node, 0);
>>> +    if (!base) {
>>> +    pr_err("%s: ioremap failed\n", __func__);
>>> +    return;
>>> +    }
>>> +
>>> +    cfgchip = syscon_regmap_lookup_by_compatible("ti,da830-cfgchip");
> 
> In your previous review, you pointed out that the error did not need to
> be handled here because it is handled later in davinci_pll_clk_register().
> We get a warning there because cfgchip is only needed for unlocking the
> PLL for CPU frequency scaling and is not critical for operation of the
> clocks.

Oops, forgot about that :)

Reviewed-by: Sekhar Nori 

Thanks,
Sekhar


[PATCH v4 0/2] tpm: improving granularity in poll sleep times

2018-05-15 Thread Nayna Jain
The existing TPM polling code sleeps in each loop iteration for a time
ranging from 1 msec to 5 msecs. However, many of the TPM commands
complete much faster, resulting in unnecessary delays.

This set of patches identifies such iterations and optimizes the sleep
time. The first patch replaces TPM_POLL_SLEEP with TPM_TIMEOUT_POLL and
moves it from tpm_tis_core.c to tpm.h as an enum with value 1 msecs. The
second patch further reduces the TPM poll sleep time in get_burstcount()
and wait_for_tpm_stat() in tpm_tis_core.c by calling usleep_range()
directly.

The change is only in the polling time; the maximum timeout remains the
same. Thus, it should not affect the overall existing behavior.

Changelog:

v4:
tpm: reduce poll sleep time in tpm_transmit()
* added Reviewed-by, Tested-by and Acked-by tags

tpm: reduce polling time to usecs for even finer granularity
* included Jarkko's feedback
* added Acked-by

v3:

tpm: reduce poll sleep time in tpm_transmit()
* added testing platform information
* updated patch description for more clarity on reasoning

tpm: reduce polling time to usecs for even finer granularity
* added testing platform information
* added Jarkko's and Mimi's Reviewed-by

v2:

tpm: reduce poll sleep time in tpm_transmit()
* merged previously defined two patches into this.
* updated patch description as per Jarkko's feedback

tpm: reduce polling time to usecs for even finer granularity
* directly use usleep_range with finer granularity less than 1msec

Nayna Jain (2):
  tpm: reduce poll sleep time in tpm_transmit()
  tpm: reduce polling time to usecs for even finer granularity

 drivers/char/tpm/tpm-interface.c |  2 +-
 drivers/char/tpm/tpm.h   |  5 -
 drivers/char/tpm/tpm_tis_core.c  | 11 +++
 3 files changed, 8 insertions(+), 10 deletions(-)

-- 
2.13.3



[PATCH v4 1/2] tpm: reduce poll sleep time in tpm_transmit()

2018-05-15 Thread Nayna Jain
tpm_try_transmit currently checks TPM status every 5 msecs between
send and recv. It does so in a loop for the maximum timeout as defined
in the TPM Interface Specification. However, the TPM may return before
5 msecs. Thus the polling interval for each iteration can be reduced,
which improves overall performance. This patch changes the polling sleep
time from 5 msecs to 1 msec.

Additionally, this patch renames TPM_POLL_SLEEP to TPM_TIMEOUT_POLL and
moves it to tpm.h as an enum value.

After this change, performance on a system[1] with a TPM 1.2 with an 8-byte
burstcount improved from ~14 sec to ~10.7 sec for 1000 extend operations.

[1] All tests are performed on an x86 based, locked down, single purpose
closed system. It has Infineon TPM 1.2 using LPC Bus.

Signed-off-by: Nayna Jain 
Reviewed-by: Mimi Zohar 
Acked-by: Jay Freyensee 
Reviewed-by: Jarkko Sakkinen 
Tested-by: Jarkko Sakkinen 
---
 drivers/char/tpm/tpm-interface.c |  2 +-
 drivers/char/tpm/tpm.h   |  3 ++-
 drivers/char/tpm/tpm_tis_core.c  | 10 ++
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 9e80a953d693..a676d8ad5992 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -470,7 +470,7 @@ ssize_t tpm_transmit(struct tpm_chip *chip, struct 
tpm_space *space,
goto out;
}
 
-   tpm_msleep(TPM_TIMEOUT);
+   tpm_msleep(TPM_TIMEOUT_POLL);
rmb();
} while (time_before(jiffies, stop));
 
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index f895fba4e20d..7e797377e1eb 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -53,7 +53,8 @@ enum tpm_const {
 enum tpm_timeout {
TPM_TIMEOUT = 5,/* msecs */
TPM_TIMEOUT_RETRY = 100, /* msecs */
-   TPM_TIMEOUT_RANGE_US = 300  /* usecs */
+   TPM_TIMEOUT_RANGE_US = 300, /* usecs */
+   TPM_TIMEOUT_POLL = 1/* msecs */
 };
 
 /* TPM addresses */
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index da074e3db19b..021e6b68f2db 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -31,12 +31,6 @@
 #include "tpm.h"
 #include "tpm_tis_core.h"
 
-/* This is a polling delay to check for status and burstcount.
- * As per ddwg input, expectation is that status check and burstcount
- * check should return within few usecs.
- */
-#define TPM_POLL_SLEEP 1  /* msec */
-
 static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
 
 static bool wait_for_tpm_stat_cond(struct tpm_chip *chip, u8 mask,
@@ -90,7 +84,7 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
}
} else {
do {
-   tpm_msleep(TPM_POLL_SLEEP);
+   tpm_msleep(TPM_TIMEOUT_POLL);
status = chip->ops->status(chip);
if ((status & mask) == mask)
return 0;
@@ -232,7 +226,7 @@ static int get_burstcount(struct tpm_chip *chip)
burstcnt = (value >> 8) & 0xFFFF;
if (burstcnt)
return burstcnt;
-   tpm_msleep(TPM_POLL_SLEEP);
+   tpm_msleep(TPM_TIMEOUT_POLL);
} while (time_before(jiffies, stop));
return -EBUSY;
 }
-- 
2.13.3



[PATCH] Makefile: disable PIE before testing asm goto

2018-05-15 Thread Michal Kubecek
Since commit e501ce957a78 ("x86: Force asm-goto"), aarch64 build on
distributions which enable PIE by default (e.g. openSUSE Tumbleweed) does
not detect support for asm goto correctly. The problem is that the ARM
specific part of scripts/gcc-goto.sh fails with PIE even with recent gcc
versions. Moving the asm goto detection up in the Makefile put it before
the place where we disable PIE; as a result, the kernel is built without
jump label support.

Move the lines disabling PIE before the asm goto test to make it work.

Fixes: e501ce957a78 ("x86: Force asm-goto")
Reported-by: Andreas Faerber 
Signed-off-by: Michal Kubecek 
---
 Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index ba3106b36597..5532a15c4781 100644
--- a/Makefile
+++ b/Makefile
@@ -500,6 +500,9 @@ RETPOLINE_CFLAGS_CLANG := -mretpoline-external-thunk
 RETPOLINE_CFLAGS := $(call cc-option,$(RETPOLINE_CFLAGS_GCC),$(call 
cc-option,$(RETPOLINE_CFLAGS_CLANG)))
 export RETPOLINE_CFLAGS
 
+KBUILD_CFLAGS  += $(call cc-option,-fno-PIE)
+KBUILD_AFLAGS  += $(call cc-option,-fno-PIE)
+
 # check for 'asm goto'
 ifeq ($(call shell-cached,$(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC) 
$(KBUILD_CFLAGS)), y)
   CC_HAVE_ASM_GOTO := 1
@@ -621,8 +624,6 @@ endif # $(dot-config)
 # Defaults to vmlinux, but the arch makefile usually adds further targets
 all: vmlinux
 
-KBUILD_CFLAGS  += $(call cc-option,-fno-PIE)
-KBUILD_AFLAGS  += $(call cc-option,-fno-PIE)
 CFLAGS_GCOV:= -fprofile-arcs -ftest-coverage -fno-tree-loop-im $(call 
cc-disable-warning,maybe-uninitialized,)
 export CFLAGS_GCOV CFLAGS_KCOV
 
-- 
2.16.3



[PATCH v4 2/2] tpm: reduce polling time to usecs for even finer granularity

2018-05-15 Thread Nayna Jain
The TPM burstcount and status commands are supposed to return very
quickly [2][3]. This patch further reduces the TPM poll sleep time to usecs
in get_burstcount() and wait_for_tpm_stat() by calling usleep_range()
directly.

After this change, performance on a system[1] with a TPM 1.2 with an 8-byte
burstcount improved from ~10.7 sec to ~7 sec for 1000 extend operations.

[1] All tests are performed on an x86 based, locked down, single purpose
closed system. It has Infineon TPM 1.2 using LPC Bus.

[2] From the TCG Specification "TCG PC Client Specific TPM Interface
Specification (TIS), Family 1.2":

"NOTE : It takes roughly 330 ns per byte transfer on LPC. 256 bytes would
take 84 us, which is a long time to stall the CPU. Chipsets may not be
designed to post this much data to LPC; therefore, the CPU itself is
stalled for much of this time. Sending 1 kB would take 350 μs. Therefore,
even if the TPM_STS_x.burstCount field is a high value, software SHOULD
be interruptible during this period."

[3] From the TCG Specification 2.0, "TCG PC Client Platform TPM Profile
(PTP) Specification":

"It takes roughly 330 ns per byte transfer on LPC. 256 bytes would take
84 us. Chipsets may not be designed to post this much data to LPC;
therefore, the CPU itself is stalled for much of this time. Sending 1 kB
would take 350 us. Therefore, even if the TPM_STS_x.burstCount field is a
high value, software should be interruptible during this period. For SPI,
assuming 20MHz clock and 64-byte transfers, it would take about 120 usec
to move 256B of data. Sending 1kB would take about 500 usec. If the
transactions are done using 4 bytes at a time, then it would take about
1 msec. to transfer 1kB of data."

Signed-off-by: Nayna Jain 
Reviewed-by: Mimi Zohar 
Reviewed-by: Jarkko Sakkinen 
Acked-by: Jay Freyensee 
---
 drivers/char/tpm/tpm.h  | 4 +++-
 drivers/char/tpm/tpm_tis_core.c | 5 +++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 7e797377e1eb..f0e4d290c347 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -54,7 +54,9 @@ enum tpm_timeout {
TPM_TIMEOUT = 5,/* msecs */
TPM_TIMEOUT_RETRY = 100, /* msecs */
TPM_TIMEOUT_RANGE_US = 300, /* usecs */
-   TPM_TIMEOUT_POLL = 1/* msecs */
+   TPM_TIMEOUT_POLL = 1,   /* msecs */
+   TPM_TIMEOUT_USECS_MIN = 100,  /* usecs */
+   TPM_TIMEOUT_USECS_MAX = 500  /* usecs */
 };
 
 /* TPM addresses */
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 021e6b68f2db..bbd8eed30e57 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -84,7 +84,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
}
} else {
do {
-   tpm_msleep(TPM_TIMEOUT_POLL);
+   usleep_range(TPM_TIMEOUT_USECS_MIN,
+TPM_TIMEOUT_USECS_MAX);
status = chip->ops->status(chip);
if ((status & mask) == mask)
return 0;
@@ -226,7 +227,7 @@ static int get_burstcount(struct tpm_chip *chip)
burstcnt = (value >> 8) & 0xFFFF;
if (burstcnt)
return burstcnt;
-   tpm_msleep(TPM_TIMEOUT_POLL);
+   usleep_range(TPM_TIMEOUT_USECS_MIN, TPM_TIMEOUT_USECS_MAX);
} while (time_before(jiffies, stop));
return -EBUSY;
 }
-- 
2.13.3



Re: [PATCH v16 3/9] PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices

2018-05-15 Thread poza

On 2018-05-16 05:29, Bjorn Helgaas wrote:

On Fri, May 11, 2018 at 06:43:22AM -0400, Oza Pawandeep wrote:

This patch alters the behavior of handling of ERR_FATAL, where removal
of devices is initiated, followed by reset link, followed by
re-enumeration.

So the errors are handled in a different way as follows:
ERR_NONFATAL => call driver recovery entry points
ERR_FATAL=> remove and re-enumerate

please refer to Documentation/PCI/pci-error-recovery.txt for more 
details.


Signed-off-by: Oza Pawandeep 
Reviewed-by: Keith Busch 

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c

index 0ea5acc..649dd1f 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include "aerdrv.h"
+#include "../../pci.h"

 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE 
| \

 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -475,35 +476,84 @@ static pci_ers_result_t reset_link(struct 
pci_dev *dev)

 }

 /**
- * do_recovery - handle nonfatal/fatal error recovery process
+ * do_fatal_recovery - handle fatal error recovery process
+ * @dev: pointer to a pci_dev data structure of agent detecting an 
error

+ *
+ * Invoked when an error is fatal. Once being invoked, removes the 
devices
+ * beneath this AER agent, followed by reset link e.g. secondary bus 
reset

+ * followed by re-enumeration of devices.
+ */
+
+static void do_fatal_recovery(struct pci_dev *dev)
+{
+   struct pci_dev *udev;
+   struct pci_bus *parent;
+   struct pci_dev *pdev, *temp;
+   pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
+   struct aer_broadcast_data result_data;
+
+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   udev = dev;
+   else
+   udev = dev->bus->self;
+
+   parent = udev->subordinate;
+   pci_lock_rescan_remove();
+   list_for_each_entry_safe_reverse(pdev, temp, >devices,
+bus_list) {
+   pci_dev_get(pdev);
+   pci_dev_set_disconnected(pdev, NULL);
+   if (pci_has_subordinate(pdev))
+   pci_walk_bus(pdev->subordinate,
+pci_dev_set_disconnected, NULL);
+   pci_stop_and_remove_bus_device(pdev);
+   pci_dev_put(pdev);
+   }
+
+   result = reset_link(udev);


I don't like the fact that for DPC, the link reset happens before we
call the driver .remove() methods, while for AER, the reset happens
*after* the .remove() methods.  That means the .remove() methods may
work differently for AER vs. DPC, e.g., they may be able to access the
device if AER is handling the error, but they would not be able to
access it if DPC is handling it.

I don't know how to fix this, and I think we can keep this patch as it
is for now, but I think we should fix it eventually.


point noted, will see to this eventually.




+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   /*
+* If the error is reported by a bridge, we think this error
+* is related to the downstream link of the bridge, so we
+* do error recovery on all subordinates of the bridge instead
+* of the bridge and clear the error status of the bridge.
+*/
+   pci_walk_bus(dev->subordinate, report_resume, _data);
+   pci_cleanup_aer_uncorrect_error_status(dev);
+   }
+
+   if (result == PCI_ERS_RESULT_RECOVERED) {
+   if (pcie_wait_for_link(udev, true))
+   pci_rescan_bus(udev->bus);
+   } else {
+   pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
+   pci_info(dev, "AER: Device recovery failed\n");
+   }
+
+   pci_unlock_rescan_remove();
+}
+
+/**
+ * do_nonfatal_recovery - handle nonfatal error recovery process
  * @dev: pointer to a pci_dev data structure of agent detecting an 
error

- * @severity: error severity type
  *
  * Invoked when an error is nonfatal/fatal. Once being invoked, 
broadcast
  * error detected message to all downstream drivers within a 
hierarchy in

  * question and return the returned code.
  */
-static void do_recovery(struct pci_dev *dev, int severity)
+static void do_nonfatal_recovery(struct pci_dev *dev)
 {
-   pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
+   pci_ers_result_t status;
enum pci_channel_state state;

-   if (severity == AER_FATAL)
-   state = pci_channel_io_frozen;
-   else
-   state = pci_channel_io_normal;
+   state = pci_channel_io_normal;

status = broadcast_error_message(dev,
state,
"error_detected",
report_error_detected);

-   if (severity == AER_FATAL) {
-   result = reset_link(dev);
-   if 

[PATCH 04/14] mm: remove the unused device_private_entry_fault export

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 kernel/memremap.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index db4e1a373e5f..59ee3b604b39 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -65,7 +65,6 @@ int device_private_entry_fault(struct vm_area_struct *vma,
 */
return page->pgmap->page_fault(vma, addr, page, flags, pmdp);
 }
-EXPORT_SYMBOL(device_private_entry_fault);
 #endif /* CONFIG_DEVICE_PRIVATE */
 
 static void pgmap_radix_release(struct resource *res, unsigned long end_pgoff)
-- 
2.17.0



Re: [PATCH] usbip: usbip_host: fix bad unlock balance during stub_probe()

2018-05-15 Thread Greg KH
On Tue, May 15, 2018 at 05:57:23PM -0600, Shuah Khan (Samsung OSG) wrote:
> stub_probe() calls put_busid_priv() in an error path when device isn't
> found in the busid_table. Fix it by making put_busid_priv() safe to be
> called with null struct bus_id_priv pointer.
> 
> This problem happens when "usbip bind" is run without loading the
> usbip_host driver and then running modprobe. The first failed bind
> attempt unbinds the device from the original driver; when usbip_host is
> modprobed, stub_probe() runs, doesn't find the device in its busid
> table, and calls put_busid_priv() with a null bus_id_priv pointer.
> 
> usbip-host 3-10.2: 3-10.2 is not in match_busid table...  skip!
> 
> [  367.359679] =
> [  367.359681] WARNING: bad unlock balance detected!
> [  367.359683] 4.17.0-rc4+ #5 Not tainted
> [  367.359685] -
> [  367.359688] modprobe/2768 is trying to release lock (
> [  367.359689]
> ==
> [  367.359696] BUG: KASAN: null-ptr-deref in
> print_unlock_imbalance_bug+0x99/0x110
> [  367.359699] Read of size 8 at addr 0058 by task
> modprobe/2768

Minor nit, no need to line-wrap this.

> [  367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5
> 
> Fixes: 22076557b07c ("usbip: usbip_host: fix NULL-ptr deref and
> use-after-free errors") in usb-linus

Nor this, and the extra blank line isn't needed here either.  I'll fix
it up by hand when I queue this up later today, thanks.

greg k-h
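
(For reference, the shape of the change described above is roughly the
following. This is a sketch only; the struct layout and the busid_lock
field are assumed from the discussion rather than copied from the
usbip_host source.)

#include <linux/spinlock.h>

/* Minimal stand-in for the usbip_host bus_id_priv (fields assumed). */
struct bus_id_priv {
	spinlock_t busid_lock;
	/* ... other fields elided ... */
};

/* NULL-tolerant put: safe to call from error paths that found no entry. */
static void put_busid_priv(struct bus_id_priv *bid)
{
	if (bid)
		spin_unlock(&bid->busid_lock);
}

With a guard like this, error paths such as the stub_probe() one above
can call the helper unconditionally.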


[PATCH 06/14] btrfs: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/inode.c | 19 ++-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1485cd130e2b..02a0de73c1d1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3203,7 +3203,7 @@ int btrfs_merge_bio_hook(struct page *page, unsigned long 
offset,
 size_t size, struct bio *bio,
 unsigned long bio_flags);
 void btrfs_set_range_writeback(void *private_data, u64 start, u64 end);
-int btrfs_page_mkwrite(struct vm_fault *vmf);
+vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf);
 int btrfs_readpage(struct file *file, struct page *page);
 void btrfs_evict_inode(struct inode *inode);
 int btrfs_write_inode(struct inode *inode, struct writeback_control *wbc);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ec9db248c499..f4f03f0f4556 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8824,7 +8824,7 @@ static void btrfs_invalidatepage(struct page *page, 
unsigned int offset,
  * beyond EOF, then the page is guaranteed safe against truncation until we
  * unlock the page.
  */
-int btrfs_page_mkwrite(struct vm_fault *vmf)
+vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 {
struct page *page = vmf->page;
struct inode *inode = file_inode(vmf->vma->vm_file);
@@ -8836,7 +8836,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
char *kaddr;
unsigned long zero_start;
loff_t size;
-   int ret;
+   vm_fault_t ret;
+   int err;
int reserved = 0;
u64 reserved_space;
u64 page_start;
@@ -8858,14 +8859,14 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 * end up waiting indefinitely to get a lock on the page currently
 * being processed by btrfs_page_mkwrite() function.
 */
-   ret = btrfs_delalloc_reserve_space(inode, _reserved, page_start,
+   err = btrfs_delalloc_reserve_space(inode, _reserved, page_start,
   reserved_space);
-   if (!ret) {
-   ret = file_update_time(vmf->vma->vm_file);
+   if (!err) {
+   err = file_update_time(vmf->vma->vm_file);
reserved = 1;
}
-   if (ret) {
-   if (ret == -ENOMEM)
+   if (err) {
+   if (err == -ENOMEM)
ret = VM_FAULT_OOM;
else /* -ENOSPC, -EIO, etc */
ret = VM_FAULT_SIGBUS;
@@ -8927,9 +8928,9 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
  0, 0, _state);
 
-   ret = btrfs_set_extent_delalloc(inode, page_start, end, 0,
+   err = btrfs_set_extent_delalloc(inode, page_start, end, 0,
_state, 0);
-   if (ret) {
+   if (err) {
unlock_extent_cached(io_tree, page_start, page_end,
 _state);
ret = VM_FAULT_SIGBUS;
-- 
2.17.0



[PATCH 10/14] vgem: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
And streamline the code in vgem_fault with early returns so that it is
a little bit more readable.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/vgem/vgem_drv.c | 51 +++--
 1 file changed, 23 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index 2524ff116f00..a261e0aab83a 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -61,12 +61,13 @@ static void vgem_gem_free_object(struct drm_gem_object *obj)
kfree(vgem_obj);
 }
 
-static int vgem_gem_fault(struct vm_fault *vmf)
+static vm_fault_t vgem_gem_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct drm_vgem_gem_object *obj = vma->vm_private_data;
/* We don't use vmf->pgoff since that has the fake offset */
unsigned long vaddr = vmf->address;
+   struct page *page;
int ret;
loff_t num_pages;
pgoff_t page_offset;
@@ -85,35 +86,29 @@ static int vgem_gem_fault(struct vm_fault *vmf)
ret = 0;
}
mutex_unlock(>pages_lock);
-   if (ret) {
-   struct page *page;
-
-   page = shmem_read_mapping_page(
-   file_inode(obj->base.filp)->i_mapping,
-   page_offset);
-   if (!IS_ERR(page)) {
-   vmf->page = page;
-   ret = 0;
-   } else switch (PTR_ERR(page)) {
-   case -ENOSPC:
-   case -ENOMEM:
-   ret = VM_FAULT_OOM;
-   break;
-   case -EBUSY:
-   ret = VM_FAULT_RETRY;
-   break;
-   case -EFAULT:
-   case -EINVAL:
-   ret = VM_FAULT_SIGBUS;
-   break;
-   default:
-   WARN_ON(PTR_ERR(page));
-   ret = VM_FAULT_SIGBUS;
-   break;
-   }
+   if (!ret)
+   return 0;
+
+   page = shmem_read_mapping_page(file_inode(obj->base.filp)->i_mapping,
+   page_offset);
+   if (!IS_ERR(page)) {
+   vmf->page = page;
+   return 0;
+   }
 
+   switch (PTR_ERR(page)) {
+   case -ENOSPC:
+   case -ENOMEM:
+   return VM_FAULT_OOM;
+   case -EBUSY:
+   return VM_FAULT_RETRY;
+   case -EFAULT:
+   case -EINVAL:
+   return VM_FAULT_SIGBUS;
+   default:
+   WARN_ON(PTR_ERR(page));
+   return VM_FAULT_SIGBUS;
}
-   return ret;
 }
 
 static const struct vm_operations_struct vgem_gem_vm_ops = {
-- 
2.17.0



[PATCH 11/14] ttm: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 42 +
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 8eba95b3c737..255e7801f62c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -43,10 +43,11 @@
 
 #define TTM_BO_VM_NUM_PREFAULT 16
 
-static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
+static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
struct vm_fault *vmf)
 {
-   int ret = 0;
+   vm_fault_t ret = 0;
+   int err = 0;
 
if (likely(!bo->moving))
goto out_unlock;
@@ -77,8 +78,8 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
/*
 * Ordinary wait.
 */
-   ret = dma_fence_wait(bo->moving, true);
-   if (unlikely(ret != 0)) {
+   err = dma_fence_wait(bo->moving, true);
+   if (unlikely(err != 0)) {
ret = (ret != -ERESTARTSYS) ? VM_FAULT_SIGBUS :
VM_FAULT_NOPAGE;
goto out_unlock;
@@ -104,7 +105,7 @@ static unsigned long ttm_bo_io_mem_pfn(struct 
ttm_buffer_object *bo,
+ page_offset;
 }
 
-static int ttm_bo_vm_fault(struct vm_fault *vmf)
+static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct ttm_buffer_object *bo = (struct ttm_buffer_object *)
@@ -115,7 +116,8 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
unsigned long pfn;
struct ttm_tt *ttm = NULL;
struct page *page;
-   int ret;
+   vm_fault_t ret;
+   int err;
int i;
unsigned long address = vmf->address;
struct ttm_mem_type_manager *man =
@@ -128,9 +130,9 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
 * for reserve, and if it fails, retry the fault after waiting
 * for the buffer to become unreserved.
 */
-   ret = ttm_bo_reserve(bo, true, true, NULL);
-   if (unlikely(ret != 0)) {
-   if (ret != -EBUSY)
+   err = ttm_bo_reserve(bo, true, true, NULL);
+   if (unlikely(err != 0)) {
+   if (err != -EBUSY)
return VM_FAULT_NOPAGE;
 
if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
@@ -162,8 +164,8 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
}
 
if (bdev->driver->fault_reserve_notify) {
-   ret = bdev->driver->fault_reserve_notify(bo);
-   switch (ret) {
+   err = bdev->driver->fault_reserve_notify(bo);
+   switch (err) {
case 0:
break;
case -EBUSY:
@@ -191,13 +193,13 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
goto out_unlock;
}
 
-   ret = ttm_mem_io_lock(man, true);
-   if (unlikely(ret != 0)) {
+   err = ttm_mem_io_lock(man, true);
+   if (unlikely(err != 0)) {
ret = VM_FAULT_NOPAGE;
goto out_unlock;
}
-   ret = ttm_mem_io_reserve_vm(bo);
-   if (unlikely(ret != 0)) {
+   err = ttm_mem_io_reserve_vm(bo);
+   if (unlikely(err != 0)) {
ret = VM_FAULT_SIGBUS;
goto out_io_unlock;
}
@@ -265,21 +267,21 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
}
 
if (vma->vm_flags & VM_MIXEDMAP)
-   ret = vm_insert_mixed(, address,
+   err = vm_insert_mixed(, address,
__pfn_to_pfn_t(pfn, PFN_DEV));
else
-   ret = vm_insert_pfn(, address, pfn);
+   err = vm_insert_pfn(, address, pfn);
 
/*
 * Somebody beat us to this PTE or prefaulting to
 * an already populated PTE, or prefaulting error.
 */
 
-   if (unlikely((ret == -EBUSY) || (ret != 0 && i > 0)))
+   if (unlikely((err == -EBUSY) || (err != 0 && i > 0)))
break;
-   else if (unlikely(ret != 0)) {
+   else if (unlikely(err != 0)) {
ret =
-   (ret == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
+   (err == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
goto out_io_unlock;
}
 
-- 
2.17.0



[PATCH 09/14] ubifs: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/ubifs/file.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 1acb2ff505e6..7c1a2e1c3de5 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1513,7 +1513,7 @@ static int ubifs_releasepage(struct page *page, gfp_t 
unused_gfp_flags)
  * mmap()d file has taken write protection fault and is being made writable.
  * UBIFS must ensure page is budgeted for.
  */
-static int ubifs_vm_page_mkwrite(struct vm_fault *vmf)
+static vm_fault_t ubifs_vm_page_mkwrite(struct vm_fault *vmf)
 {
struct page *page = vmf->page;
struct inode *inode = file_inode(vmf->vma->vm_file);
@@ -1521,6 +1521,7 @@ static int ubifs_vm_page_mkwrite(struct vm_fault *vmf)
struct timespec now = current_time(inode);
struct ubifs_budget_req req = { .new_page = 1 };
int err, update_time;
+   vm_fault_t ret = 0;
 
dbg_gen("ino %lu, pg %lu, i_size %lld", inode->i_ino, page->index,
i_size_read(inode));
@@ -1601,8 +1602,8 @@ static int ubifs_vm_page_mkwrite(struct vm_fault *vmf)
unlock_page(page);
ubifs_release_budget(c, );
if (err)
-   err = VM_FAULT_SIGBUS;
-   return err;
+   ret = VM_FAULT_SIGBUS;
+   return ret;
 }
 
 static const struct vm_operations_struct ubifs_file_vm_ops = {
-- 
2.17.0



[PATCH 12/14] lustre: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 .../staging/lustre/lustre/llite/llite_mmap.c  | 37 +++
 .../lustre/lustre/llite/vvp_internal.h|  2 +-
 2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c 
b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index 214b07554e62..061d98871959 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -231,23 +231,18 @@ static int ll_page_mkwrite0(struct vm_area_struct *vma, 
struct page *vmpage,
return result;
 }
 
-static inline int to_fault_error(int result)
+static inline vm_fault_t to_fault_error(int result)
 {
switch (result) {
case 0:
-   result = VM_FAULT_LOCKED;
-   break;
+   return VM_FAULT_LOCKED;
case -EFAULT:
-   result = VM_FAULT_NOPAGE;
-   break;
+   return VM_FAULT_NOPAGE;
case -ENOMEM:
-   result = VM_FAULT_OOM;
-   break;
+   return VM_FAULT_OOM;
default:
-   result = VM_FAULT_SIGBUS;
-   break;
+   return VM_FAULT_SIGBUS;
}
-   return result;
 }
 
 /**
@@ -261,7 +256,7 @@ static inline int to_fault_error(int result)
  * \retval VM_FAULT_ERROR on general error
  * \retval NOPAGE_OOM not have memory for allocate new page
  */
-static int ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf)
+static vm_fault_t ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
struct lu_env  *env;
struct cl_io*io;
@@ -269,7 +264,7 @@ static int ll_fault0(struct vm_area_struct *vma, struct 
vm_fault *vmf)
struct page  *vmpage;
unsigned long   ra_flags;
int   result = 0;
-   int   fault_ret = 0;
+   vm_fault_tfault_ret = 0;
u16 refcheck;
 
env = cl_env_get();
@@ -323,7 +318,7 @@ static int ll_fault0(struct vm_area_struct *vma, struct 
vm_fault *vmf)
return fault_ret;
 }
 
-static int ll_fault(struct vm_fault *vmf)
+static vm_fault_t ll_fault(struct vm_fault *vmf)
 {
int count = 0;
bool printed = false;
@@ -364,7 +359,7 @@ static int ll_fault(struct vm_fault *vmf)
return result;
 }
 
-static int ll_page_mkwrite(struct vm_fault *vmf)
+static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
int count = 0;
@@ -390,22 +385,16 @@ static int ll_page_mkwrite(struct vm_fault *vmf)
switch (result) {
case 0:
LASSERT(PageLocked(vmf->page));
-   result = VM_FAULT_LOCKED;
-   break;
+   return VM_FAULT_LOCKED;
case -ENODATA:
case -EAGAIN:
case -EFAULT:
-   result = VM_FAULT_NOPAGE;
-   break;
+   return VM_FAULT_NOPAGE;
case -ENOMEM:
-   result = VM_FAULT_OOM;
-   break;
+   return VM_FAULT_OOM;
default:
-   result = VM_FAULT_SIGBUS;
-   break;
+   return VM_FAULT_SIGBUS;
}
-
-   return result;
 }
 
 /**
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h 
b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 7d3abb43584a..c194966a3d82 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -83,7 +83,7 @@ struct vvp_io {
/**
 * fault API used bitflags for return code.
 */
-   unsigned intft_flags;
+   vm_fault_tft_flags;
/**
 * check that flags are from filemap_fault
 */
-- 
2.17.0



[PATCH 13/14] mm: move arch specific VM_FAULT_* flags to mm.h

2018-05-15 Thread Christoph Hellwig
Various architectures define their own internal flags.  Not sure a public
header like mm.h is a good place, but keeping them inside the arch code
with possible conflicts also seems like a bad idea.  Maybe we just need
to stop overloading the value instead.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/fault.c   | 3 ---
 arch/arm64/mm/fault.c | 3 ---
 arch/s390/mm/fault.c  | 6 --
 arch/unicore32/mm/fault.c | 3 ---
 include/linux/mm.h| 7 +++
 5 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 32034543f49c..b696eabccf60 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -201,9 +201,6 @@ void do_bad_area(unsigned long addr, unsigned int fsr, 
struct pt_regs *regs)
 }
 
 #ifdef CONFIG_MMU
-#define VM_FAULT_BADMAP0x01
-#define VM_FAULT_BADACCESS 0x02
-
 /*
  * Check that the permissions on the VMA allow for the fault which occurred.
  * If we encountered a write fault, we must have write permission, otherwise
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 91c53a7d2575..3d0b1f8eacce 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -318,9 +318,6 @@ static void do_bad_area(unsigned long addr, unsigned int 
esr, struct pt_regs *re
}
 }
 
-#define VM_FAULT_BADMAP0x01
-#define VM_FAULT_BADACCESS 0x02
-
 static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
   unsigned int mm_flags, unsigned long vm_flags,
   struct task_struct *tsk)
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index e074480d3598..48c781ae25d0 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -44,12 +44,6 @@
 #define __SUBCODE_MASK 0x0600
 #define __PF_RES_FIELD 0x8000ULL
 
-#define VM_FAULT_BADCONTEXT0x01
-#define VM_FAULT_BADMAP0x02
-#define VM_FAULT_BADACCESS 0x04
-#define VM_FAULT_SIGNAL0x08
-#define VM_FAULT_PFAULT0x10
-
 enum fault_type {
KERNEL_FAULT,
USER_FAULT,
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 381473412937..6c3c1a82925f 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -148,9 +148,6 @@ void do_bad_area(unsigned long addr, unsigned int fsr, 
struct pt_regs *regs)
__do_kernel_fault(mm, addr, fsr, regs);
 }
 
-#define VM_FAULT_BADMAP0x01
-#define VM_FAULT_BADACCESS 0x02
-
 /*
  * Check that the permissions on the VMA allow for the fault which occurred.
  * If we encountered a write fault, we must have write permission, otherwise
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 338b8a1afb02..64d09e3afc24 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1250,6 +1250,13 @@ static inline void clear_page_pfmemalloc(struct page 
*page)
 * and needs fsync() to complete (for
 * synchronous page faults in DAX) */
 
+/* Only for use in architecture specific page fault handling: */
+#define VM_FAULT_BADMAP0x01
+#define VM_FAULT_BADACCESS 0x02
+#define VM_FAULT_BADCONTEXT0x04
+#define VM_FAULT_SIGNAL0x08
+#define VM_FAULT_PFAULT0x10
+
 #define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \
 VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \
 VM_FAULT_FALLBACK)
-- 
2.17.0



[PATCH 14/14] mm: turn on vm_fault_t type checking

2018-05-15 Thread Christoph Hellwig
Switch vm_fault_t to an unsigned int with __bitwise annotations.
This both catches any old ->fault or ->page_mkwrite instance with plain
compiler type checking and finds more intricate problems with sparse.

Signed-off-by: Christoph Hellwig 
---
 arch/alpha/mm/fault.c |  2 +-
 arch/arc/mm/fault.c   |  3 +-
 arch/arm/mm/fault.c   |  5 +-
 arch/arm64/mm/fault.c |  7 +-
 arch/hexagon/mm/vm_fault.c|  2 +-
 arch/ia64/mm/fault.c  |  2 +-
 arch/m68k/mm/fault.c  |  2 +-
 arch/microblaze/mm/fault.c|  2 +-
 arch/mips/mm/fault.c  |  2 +-
 arch/nds32/mm/fault.c |  2 +-
 arch/nios2/mm/fault.c |  2 +-
 arch/openrisc/mm/fault.c  |  2 +-
 arch/parisc/mm/fault.c|  2 +-
 arch/powerpc/include/asm/copro.h  |  2 +-
 arch/powerpc/mm/copro_fault.c |  2 +-
 arch/powerpc/mm/fault.c   | 10 +--
 arch/powerpc/platforms/cell/spufs/fault.c |  2 +-
 arch/riscv/mm/fault.c |  3 +-
 arch/s390/kernel/vdso.c   |  2 +-
 arch/s390/mm/fault.c  |  2 +-
 arch/sh/mm/fault.c|  2 +-
 arch/sparc/mm/fault_32.c  |  4 +-
 arch/sparc/mm/fault_64.c  |  3 +-
 arch/um/kernel/trap.c |  2 +-
 arch/unicore32/mm/fault.c | 10 +--
 arch/x86/entry/vdso/vma.c |  4 +-
 arch/x86/mm/fault.c   | 11 +--
 arch/xtensa/mm/fault.c|  2 +-
 drivers/dax/device.c  | 21 +++---
 drivers/gpu/drm/drm_vm.c  | 10 +--
 drivers/gpu/drm/etnaviv/etnaviv_drv.h |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c   |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.h   |  2 +-
 drivers/gpu/drm/gma500/framebuffer.c  |  6 +-
 drivers/gpu/drm/gma500/gem.c  |  2 +-
 drivers/gpu/drm/gma500/psb_drv.h  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h   |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   | 21 ++
 drivers/gpu/drm/msm/msm_drv.h |  2 +-
 drivers/gpu/drm/msm/msm_gem.c |  2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c |  4 +-
 drivers/gpu/drm/radeon/radeon_ttm.c   |  2 +-
 drivers/gpu/drm/udl/udl_drv.h |  2 +-
 drivers/gpu/drm/udl/udl_gem.c |  2 +-
 drivers/gpu/drm/vc4/vc4_bo.c  |  2 +-
 drivers/gpu/drm/vc4/vc4_drv.h |  2 +-
 drivers/hwtracing/intel_th/msu.c  |  2 +-
 drivers/iommu/amd_iommu_v2.c  |  2 +-
 drivers/iommu/intel-svm.c |  3 +-
 drivers/misc/cxl/fault.c  |  2 +-
 drivers/misc/ocxl/context.c   |  6 +-
 drivers/misc/ocxl/link.c  |  2 +-
 drivers/misc/ocxl/sysfs.c |  2 +-
 drivers/scsi/cxlflash/superpipe.c |  4 +-
 drivers/staging/ncpfs/mmap.c  |  2 +-
 drivers/xen/privcmd.c |  2 +-
 fs/9p/vfs_file.c  |  2 +-
 fs/afs/internal.h |  2 +-
 fs/afs/write.c|  2 +-
 fs/f2fs/file.c| 10 +--
 fs/fuse/file.c|  2 +-
 fs/gfs2/file.c|  2 +-
 fs/iomap.c|  2 +-
 fs/nfs/file.c |  4 +-
 fs/nilfs2/file.c  |  2 +-
 fs/proc/vmcore.c  |  2 +-
 fs/userfaultfd.c  |  4 +-
 fs/xfs/xfs_file.c | 12 ++--
 include/linux/huge_mm.h   | 13 ++--
 include/linux/hugetlb.h   |  2 +-
 include/linux/iomap.h |  4 +-
 include/linux/mm.h| 67 +
 include/linux/mm_types.h  |  5 +-
 include/linux/oom.h   |  2 +-
 include/linux/swapops.h   |  4 +-
 include/linux/userfaultfd_k.h |  5 +-
 ipc/shm.c |  2 +-
 kernel/events/core.c  |  4 +-
 mm/gup.c  |  7 +-
 mm/hmm.c  |  2 +-
 mm/huge_memory.c  | 29 
 mm/hugetlb.c  | 25 +++
 mm/internal.h |  2 +-
 mm/khugepaged.c   |  3 +-
 mm/ksm.c  |  2 +-
 mm/memory.c   | 88 ---
 mm/mmap.c |  4 +-
 mm/shmem.c|  9 +--
 samples/vfio-mdev/mbochs.c|  4 +-
 virt/kvm/kvm_main.c   |  2 +-
 91 files 

[PATCH 08/14] ocfs2: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/ocfs2/mmap.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
index fb9a20e3d608..e75c1fc5333e 100644
--- a/fs/ocfs2/mmap.c
+++ b/fs/ocfs2/mmap.c
@@ -44,11 +44,11 @@
 #include "ocfs2_trace.h"
 
 
-static int ocfs2_fault(struct vm_fault *vmf)
+static vm_fault_t ocfs2_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
sigset_t oldset;
-   int ret;
+   vm_fault_t ret;
 
ocfs2_block_signals();
ret = filemap_fault(vmf);
@@ -59,10 +59,10 @@ static int ocfs2_fault(struct vm_fault *vmf)
return ret;
 }
 
-static int __ocfs2_page_mkwrite(struct file *file, struct buffer_head *di_bh,
-   struct page *page)
+static vm_fault_t __ocfs2_page_mkwrite(struct file *file,
+   struct buffer_head *di_bh, struct page *page)
 {
-   int ret = VM_FAULT_NOPAGE;
+   vm_fault_t ret = VM_FAULT_NOPAGE;
struct inode *inode = file_inode(file);
struct address_space *mapping = inode->i_mapping;
loff_t pos = page_offset(page);
@@ -71,6 +71,7 @@ static int __ocfs2_page_mkwrite(struct file *file, struct 
buffer_head *di_bh,
struct page *locked_page = NULL;
void *fsdata;
loff_t size = i_size_read(inode);
+   int err;
 
last_index = (size - 1) >> PAGE_SHIFT;
 
@@ -105,12 +106,12 @@ static int __ocfs2_page_mkwrite(struct file *file, struct 
buffer_head *di_bh,
if (page->index == last_index)
len = ((size - 1) & ~PAGE_MASK) + 1;
 
-   ret = ocfs2_write_begin_nolock(mapping, pos, len, OCFS2_WRITE_MMAP,
+   err = ocfs2_write_begin_nolock(mapping, pos, len, OCFS2_WRITE_MMAP,
   _page, , di_bh, page);
-   if (ret) {
-   if (ret != -ENOSPC)
-   mlog_errno(ret);
-   if (ret == -ENOMEM)
+   if (err) {
+   if (err != -ENOSPC)
+   mlog_errno(err);
+   if (err == -ENOMEM)
ret = VM_FAULT_OOM;
else
ret = VM_FAULT_SIGBUS;
@@ -121,20 +122,21 @@ static int __ocfs2_page_mkwrite(struct file *file, struct 
buffer_head *di_bh,
ret = VM_FAULT_NOPAGE;
goto out;
}
-   ret = ocfs2_write_end_nolock(mapping, pos, len, len, fsdata);
-   BUG_ON(ret != len);
+   err = ocfs2_write_end_nolock(mapping, pos, len, len, fsdata);
+   BUG_ON(err != len);
ret = VM_FAULT_LOCKED;
 out:
return ret;
 }
 
-static int ocfs2_page_mkwrite(struct vm_fault *vmf)
+static vm_fault_t ocfs2_page_mkwrite(struct vm_fault *vmf)
 {
struct page *page = vmf->page;
struct inode *inode = file_inode(vmf->vma->vm_file);
struct buffer_head *di_bh = NULL;
sigset_t oldset;
-   int ret;
+   vm_fault_t ret = 0;
+   int err;
 
sb_start_pagefault(inode->i_sb);
ocfs2_block_signals();
@@ -144,10 +146,10 @@ static int ocfs2_page_mkwrite(struct vm_fault *vmf)
 * node. Taking the data lock will also ensure that we don't
 * attempt page truncation as part of a downconvert.
 */
-   ret = ocfs2_inode_lock(inode, _bh, 1);
-   if (ret < 0) {
+   err = ocfs2_inode_lock(inode, _bh, 1);
+   if (err < 0) {
mlog_errno(ret);
-   if (ret == -ENOMEM)
+   if (err == -ENOMEM)
ret = VM_FAULT_OOM;
else
ret = VM_FAULT_SIGBUS;
-- 
2.17.0



[PATCH 01/14] orangefs: don't return errno values from ->fault

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/orangefs/file.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 26358efbf794..b4a25cd4f3fa 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -528,18 +528,16 @@ static long orangefs_ioctl(struct file *file, unsigned 
int cmd, unsigned long ar
return ret;
 }
 
-static int orangefs_fault(struct vm_fault *vmf)
+static vm_fault_t orangefs_fault(struct vm_fault *vmf)
 {
struct file *file = vmf->vma->vm_file;
int rc;
-   rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
-   STATX_SIZE);
-   if (rc == -ESTALE)
-   rc = -EIO;
+
+   rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1, STATX_SIZE);
if (rc) {
gossip_err("%s: orangefs_inode_getattr failed, "
"rc:%d:.\n", __func__, rc);
-   return rc;
+   return VM_FAULT_SIGBUS;
}
return filemap_fault(vmf);
 }
-- 
2.17.0



[PATCH 05/14] ceph: untangle ceph_filemap_fault

2018-05-15 Thread Christoph Hellwig
Streamline the code to have a somewhat natural flow, and separate the
errno values from the VM_FAULT_* values.

Signed-off-by: Christoph Hellwig 
---
 fs/ceph/addr.c | 100 +
 1 file changed, 51 insertions(+), 49 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 5f7ad3d0df2e..6e80894ca073 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1428,15 +1428,18 @@ static void ceph_restore_sigs(sigset_t *oldset)
 /*
  * vm ops
  */
-static int ceph_filemap_fault(struct vm_fault *vmf)
+static vm_fault_t ceph_filemap_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct inode *inode = file_inode(vma->vm_file);
+   struct address_space *mapping = inode->i_mapping;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_file_info *fi = vma->vm_file->private_data;
-   struct page *pinned_page = NULL;
+   struct page *pinned_page = NULL, *page;
loff_t off = vmf->pgoff << PAGE_SHIFT;
-   int want, got, ret;
+   int want, got, err = 0;
+   vm_fault_t ret = 0;
+   bool did_fault = false;
sigset_t oldset;
 
ceph_block_sigs();
@@ -1449,9 +1452,9 @@ static int ceph_filemap_fault(struct vm_fault *vmf)
want = CEPH_CAP_FILE_CACHE;
 
got = 0;
-   ret = ceph_get_caps(ci, CEPH_CAP_FILE_RD, want, -1, , _page);
-   if (ret < 0)
-   goto out_restore;
+   err = ceph_get_caps(ci, CEPH_CAP_FILE_RD, want, -1, , _page);
+   if (err < 0)
+   goto out_errno;
 
dout("filemap_fault %p %llu~%zd got cap refs on %s\n",
 inode, off, (size_t)PAGE_SIZE, ceph_cap_string(got));
@@ -1462,8 +1465,8 @@ static int ceph_filemap_fault(struct vm_fault *vmf)
ceph_add_rw_context(fi, _ctx);
ret = filemap_fault(vmf);
ceph_del_rw_context(fi, _ctx);
-   } else
-   ret = -EAGAIN;
+   did_fault = true;
+   }
 
dout("filemap_fault %p %llu~%zd dropping cap refs on %s ret %d\n",
 inode, off, (size_t)PAGE_SIZE, ceph_cap_string(got), ret);
@@ -1471,57 +1474,55 @@ static int ceph_filemap_fault(struct vm_fault *vmf)
put_page(pinned_page);
ceph_put_cap_refs(ci, got);
 
-   if (ret != -EAGAIN)
+   if (did_fault)
goto out_restore;
 
/* read inline data */
if (off >= PAGE_SIZE) {
/* does not support inline data > PAGE_SIZE */
ret = VM_FAULT_SIGBUS;
+   goto out_restore;
+   }
+
+   page = find_or_create_page(mapping, 0,
+   mapping_gfp_constraint(mapping, ~__GFP_FS));
+   if (!page) {
+   ret = VM_FAULT_OOM;
+   goto out_inline;
+   }
+
+   err = __ceph_do_getattr(inode, page, CEPH_STAT_CAP_INLINE_DATA, true);
+   if (err < 0 || off >= i_size_read(inode)) {
+   unlock_page(page);
+   put_page(page);
+   if (err < 0)
+   goto out_errno;
+   ret = VM_FAULT_SIGBUS;
} else {
-   int ret1;
-   struct address_space *mapping = inode->i_mapping;
-   struct page *page = find_or_create_page(mapping, 0,
-   mapping_gfp_constraint(mapping,
-   ~__GFP_FS));
-   if (!page) {
-   ret = VM_FAULT_OOM;
-   goto out_inline;
-   }
-   ret1 = __ceph_do_getattr(inode, page,
-CEPH_STAT_CAP_INLINE_DATA, true);
-   if (ret1 < 0 || off >= i_size_read(inode)) {
-   unlock_page(page);
-   put_page(page);
-   if (ret1 < 0)
-   ret = ret1;
-   else
-   ret = VM_FAULT_SIGBUS;
-   goto out_inline;
-   }
-   if (ret1 < PAGE_SIZE)
-   zero_user_segment(page, ret1, PAGE_SIZE);
+   if (err < PAGE_SIZE)
+   zero_user_segment(page, err, PAGE_SIZE);
else
flush_dcache_page(page);
SetPageUptodate(page);
vmf->page = page;
ret = VM_FAULT_MAJOR | VM_FAULT_LOCKED;
-out_inline:
-   dout("filemap_fault %p %llu~%zd read inline data ret %d\n",
-inode, off, (size_t)PAGE_SIZE, ret);
}
+
+out_inline:
+   dout("filemap_fault %p %llu~%zd read inline data ret %d\n",
+inode, off, (size_t)PAGE_SIZE, ret);
 out_restore:
ceph_restore_sigs();
-   if (ret < 0)
-   ret = (ret == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
-
return ret;
+out_errno:
+   ret = (err == -ENOMEM) 
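
The conversions in this series all follow the same basic shape as the
ceph change above: keep an int for what errno-returning helpers hand
back, keep a separate vm_fault_t for what the fault handler itself must
return, and translate between the two at the boundary. A minimal sketch
of that pattern (the lock helpers below are made up for illustration and
are not APIs from any of these patches):

static vm_fault_t example_fault(struct vm_fault *vmf)
{
        int err;                /* errno space: locking, I/O setup, helpers */
        vm_fault_t ret;         /* fault space: what ->fault must return */

        err = grab_example_lock(vmf);   /* hypothetical errno-returning helper */
        if (err)
                return (err == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;

        ret = filemap_fault(vmf);       /* filemap_fault() already returns vm_fault_t */
        drop_example_lock(vmf);         /* hypothetical */
        return ret;
}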

[PATCH 07/14] ext4: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/ext4/ext4.h  |  4 ++--
 fs/ext4/inode.c | 30 +++---
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index fa52b7dd4542..48592d0edf3e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2463,8 +2463,8 @@ extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
 extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 loff_t lstart, loff_t lend);
-extern int ext4_page_mkwrite(struct vm_fault *vmf);
-extern int ext4_filemap_fault(struct vm_fault *vmf);
+extern vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf);
+extern vm_fault_t ext4_filemap_fault(struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
 extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 95bc48f5c88b..fe49045a2832 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6106,27 +6106,27 @@ static int ext4_bh_unmapped(handle_t *handle, struct 
buffer_head *bh)
return !buffer_mapped(bh);
 }
 
-int ext4_page_mkwrite(struct vm_fault *vmf)
+vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct page *page = vmf->page;
loff_t size;
unsigned long len;
-   int ret;
+   vm_fault_t ret;
struct file *file = vma->vm_file;
struct inode *inode = file_inode(file);
struct address_space *mapping = inode->i_mapping;
handle_t *handle;
get_block_t *get_block;
-   int retries = 0;
+   int retries = 0, err;
 
sb_start_pagefault(inode->i_sb);
file_update_time(vma->vm_file);
 
down_read(_I(inode)->i_mmap_sem);
 
-   ret = ext4_convert_inline_data(inode);
-   if (ret)
+   err = ext4_convert_inline_data(inode);
+   if (err)
goto out_ret;
 
/* Delalloc case is easy... */
@@ -6134,9 +6134,9 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
!ext4_should_journal_data(inode) &&
!ext4_nonda_switch(inode->i_sb)) {
do {
-   ret = block_page_mkwrite(vma, vmf,
+   err = block_page_mkwrite(vma, vmf,
   ext4_da_get_block_prep);
-   } while (ret == -ENOSPC &&
+   } while (err == -ENOSPC &&
   ext4_should_retry_alloc(inode->i_sb, ));
goto out_ret;
}
@@ -6181,8 +6181,8 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
ret = VM_FAULT_SIGBUS;
goto out;
}
-   ret = block_page_mkwrite(vma, vmf, get_block);
-   if (!ret && ext4_should_journal_data(inode)) {
+   err = block_page_mkwrite(vma, vmf, get_block);
+   if (!err && ext4_should_journal_data(inode)) {
if (ext4_walk_page_buffers(handle, page_buffers(page), 0,
  PAGE_SIZE, NULL, do_journal_get_write_access)) {
unlock_page(page);
@@ -6193,24 +6193,24 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
}
ext4_journal_stop(handle);
-   if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, ))
+   if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, ))
goto retry_alloc;
 out_ret:
-   ret = block_page_mkwrite_return(ret);
+   ret = block_page_mkwrite_return(err);
 out:
up_read(_I(inode)->i_mmap_sem);
sb_end_pagefault(inode->i_sb);
return ret;
 }
 
-int ext4_filemap_fault(struct vm_fault *vmf)
+vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
 {
struct inode *inode = file_inode(vmf->vma->vm_file);
-   int err;
+   vm_fault_t ret;
 
down_read(_I(inode)->i_mmap_sem);
-   err = filemap_fault(vmf);
+   ret = filemap_fault(vmf);
up_read(_I(inode)->i_mmap_sem);
 
-   return err;
+   return ret;
 }
-- 
2.17.0



[PATCH 03/14] dax: make the dax_iomap_fault prototype consistent

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 include/linux/dax.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index dc65ece825ee..a292bccdc274 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -183,7 +183,7 @@ void dax_flush(struct dax_device *dax_dev, void *addr, 
size_t size);
 
 ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops);
-int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
+vm_fault_t dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
pfn_t *pfnp, int *errp, const struct iomap_ops *ops);
 vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
enum page_entry_size pe_size, pfn_t pfn);
-- 
2.17.0



[PATCH 02/14] fs: make the filemap_page_mkwrite prototype consistent

2018-05-15 Thread Christoph Hellwig
The !CONFIG_MMU version didn't agree with the rest of the kernel.

Signed-off-by: Christoph Hellwig 
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 52517f28e6f4..cf21ced98eff 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2748,7 +2748,7 @@ int generic_file_readonly_mmap(struct file *file, struct 
vm_area_struct *vma)
return generic_file_mmap(file, vma);
 }
 #else
-int filemap_page_mkwrite(struct vm_fault *vmf)
+vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf)
 {
return -ENOSYS;
 }
-- 
2.17.0



vm_fault_t conversion, for real

2018-05-15 Thread Christoph Hellwig
Hi all,

this series tries to actually turn vm_fault_t into a type that can be
typechecked and checks the fallout instead of sprinkling random
annotations without context.

The first one fixes a real bug in orangefs, the second and third fix
mismatched existing vm_fault_t annotations on the same function, the
fourth removes an unused export that was in the chain.  The remaining
patches before the last one do some not quite trivial conversions, and the last
one does the trivial mass annotation and flips vm_fault_t to a __bitwise
unsigned int - the unsigned means we also get plain compiler type
checking for the new ->fault signature even without sparse.

This has survived an x86 allyesconfig build, and got a SUCCESS from the
buildbot that I don't really trust - I'm pretty sure there are bits
and pieces hiding in other architectures that it hasn't caught up to.

The sparse annotations are manually verified for the core MM code and
a few other interesting bits (e.g. DAX and the x86 fault code).

The series is against linux-next as of 2018/05/15 to make sure any
annotations in subsystem trees are picked up.
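
As a rough illustration of what the __bitwise switch buys, here is a
minimal sketch (not code from the series; the fallback defines only
mirror what the kernel's compiler-type headers already provide, and the
function names are made up):

#ifdef __CHECKER__
#define __bitwise       __attribute__((bitwise))
#define __force         __attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef unsigned int __bitwise vm_fault_t;

#define VM_FAULT_OOM    ((__force vm_fault_t)0x0001)
#define VM_FAULT_SIGBUS ((__force vm_fault_t)0x0002)

/* sparse flags the old errno-returning style ... */
static vm_fault_t old_style_fault(void)
{
        return -5;      /* e.g. -EIO: "incorrect type in return expression" */
}

/* ... while a converted handler checks clean. */
static vm_fault_t converted_fault(void)
{
        return VM_FAULT_SIGBUS;
}

An unconverted ->fault instance that still returns plain int then shows
up as an incompatible pointer type when wired into vm_operations_struct,
which is the compiler-only checking mentioned above.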


[RFC] powerpc/emulate_step: Fix kernel crash when VSX is not present

2018-05-15 Thread Ravi Bangoria
emulate_step() does not check the runtime VSX feature flag before
emulating an instruction. This can cause a kernel oops when the kernel
is compiled with CONFIG_VSX=y but runs on a machine where VSX is
not supported or disabled. For example, while running emulate_step tests
on a P6 machine:

  ...
  emulate_step_test: lvx: PASS
  emulate_step_test: stvx   : PASS
  Oops: Exception in kernel mode, sig: 4 [#1]
  NIP [c0095c24] .load_vsrn+0x28/0x54
  LR [c0094bdc] .emulate_loadstore+0x167c/0x17b0
  Call Trace:
  [c000494c3770] [40fe240c7ae147ae] 0x40fe240c7ae147ae (unreliable)
  [c000494c37f0] [c0094bdc] .emulate_loadstore+0x167c/0x17b0
  [c000494c3900] [c0094f6c] .emulate_step+0x25c/0x5bc
  [c000494c39b0] [c0ec90dc] .test_lxvd2x_stxvd2x+0x64/0x154
  [c000494c3b90] [c0ec9204] .test_emulate_step+0x38/0x4c
  [c000494c3c00] [c000de2c] .do_one_initcall+0x5c/0x2c0
  [c000494c3cd0] [c0eb4bf8] .kernel_init_freeable+0x314/0x4cc
  [c000494c3db0] [c000e1e4] .kernel_init+0x24/0x160
  [c000494c3e30] [c000bc24] .ret_from_kernel_thread+0x58/0xb4

With fix:
  ...
  emulate_step_test: stvx   : PASS
  emulate_step_test: lxvd2x : FAIL
  emulate_step_test: stxvd2x: FAIL

Fixes: https://github.com/linuxppc/linux/issues/148

Reported-by: Michael Ellerman 
Signed-off-by: Ravi Bangoria 
---
Note: VSX was first introduced on P6. But there are many VSX instructions
  which were introduced in later versions. So ideally, analyse_instr()
  should check for ISA version along with opcode.

 arch/powerpc/lib/sstep.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 34d68f1b1b40..470942f9c567 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1173,6 +1173,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
unsigned long int val, val2;
unsigned int mb, me, sh;
long ival;
+   int type;
 
op->type = COMPUTE;
 
@@ -2544,6 +2545,15 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
 #endif /* __powerpc64__ */
 
}
+
+#ifdef CONFIG_VSX
+   type = op->type & INSTR_TYPE_MASK;
+   if ((type == LOAD_VSX || type == STORE_VSX) &&
+   !cpu_has_feature(CPU_FTR_VSX)) {
+   return -1;
+   }
+#endif /* CONFIG_VSX */
+
return 0;
 
  logical_done:
-- 
2.16.2



Re: [PATCH v4 2/2] locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN

2018-05-15 Thread Amir Goldstein
On Wed, May 16, 2018 at 12:49 AM, Waiman Long  wrote:
> The filesystem freezing code needs to transfer ownership of a rwsem
> embedded in a percpu-rwsem from the task that does the freezing to
> another one that does the thawing by calling percpu_rwsem_release()
> after freezing and percpu_rwsem_acquire() before thawing.
>
> However, the new rwsem debug code runs afoul with this scheme by warning
> that the task that releases the rwsem isn't the one that acquires it.
>
> [   20.302978] [ cut here ]
> [   20.305016] DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> [   20.305029] WARNING: CPU: 1 PID: 1401 at
> /home/amir/build/src/linux/kernel/locking/rwsem.c:133 up_write+0x59/0x79
> [   20.311252] CPU: 1 PID: 1401 Comm: fsfreeze Not tainted 
> 4.17.0-rc3-xfstests-00049-g39e47bf59eb3 #3276
> [   20.314808] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [   20.318403] RIP: 0010:up_write+0x59/0x79
> [   20.320928] RSP: 0018:c9717e48 EFLAGS: 00010286
> [   20.322955] RAX: 0030 RBX: 880078f1c680 RCX: 
> 880078e42200
> [   20.325665] RDX: 810cc9c1 RSI: 0001 RDI: 
> 0202
> [   20.328844] RBP: c9717e80 R08: 0001 R09: 
> 0001
> [   20.332340] R10: c9717c58 R11: 836807ad R12: 
> 880078f1c388
> [   20.335095] R13: 880078a8b980 R14:  R15: 
> fff7
> [   20.338009] FS:  7fb61ca42700() GS:88007f40() 
> knlGS:
> [   20.341423] CS:  0010 DS:  ES:  CR0: 80050033
> [   20.343772] CR2: 7fb61c559b30 CR3: 78da6000 CR4: 
> 06e0
> [   20.346463] DR0:  DR1:  DR2: 
> 
> [   20.349201] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [   20.351960] Call Trace:
> [   20.352911]  percpu_up_write+0x1f/0x28
> [   20.354344]  thaw_super_locked+0xdf/0x120
> [   20.355944]  do_vfs_ioctl+0x270/0x5f1
> [   20.357390]  ? __se_sys_newfstat+0x2e/0x39
> [   20.358969]  ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
> [   20.360991]  ksys_ioctl+0x52/0x71
> [   20.362384]  __x64_sys_ioctl+0x16/0x19
> [   20.363702]  do_syscall_64+0x5d/0x167
> [   20.365099]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> To work properly with the rwsem debug code, we need to annotate that the
> rwsem ownership is unknown during the transfer period until a brave soul
> comes forward to acquire the ownership. During that period, optimistic
> spinning will be disabled.
>
> Signed-off-by: Waiman Long 

Looks good and tested

Thanks,
Amir.

> ---
>  include/linux/percpu-rwsem.h | 6 +-
>  include/linux/rwsem.h| 6 ++
>  kernel/locking/rwsem-xadd.c  | 2 ++
>  3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
> index b1f37a8..79b99d6 100644
> --- a/include/linux/percpu-rwsem.h
> +++ b/include/linux/percpu-rwsem.h
> @@ -133,7 +133,7 @@ static inline void percpu_rwsem_release(struct 
> percpu_rw_semaphore *sem,
> lock_release(&sem->rw_sem.dep_map, 1, ip);
>  #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
> if (!read)
> -   sem->rw_sem.owner = NULL;
> +   sem->rw_sem.owner = RWSEM_OWNER_UNKNOWN;
>  #endif
>  }
>
> @@ -141,6 +141,10 @@ static inline void percpu_rwsem_acquire(struct 
> percpu_rw_semaphore *sem,
> bool read, unsigned long ip)
>  {
> lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip);
> +#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
> +   if (!read)
> +   sem->rw_sem.owner = current;
> +#endif
>  }
>
>  #endif
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 56707d5..ab93b6e 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -44,6 +44,12 @@ struct rw_semaphore {
>  #endif
>  };
>
> +/*
> + * Setting bit 0 of the owner field with other non-zero bits will indicate
> + * that the rwsem is writer-owned with an unknown owner.
> + */
> +#define RWSEM_OWNER_UNKNOWN((struct task_struct *)-1L)
> +
>  extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
>  extern struct rw_semaphore *rwsem_down_read_failed_killable(struct 
> rw_semaphore *sem);
>  extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore 
> *sem);
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 604d247..a903367 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -352,6 +352,8 @@ static inline bool rwsem_can_spin_on_owner(struct 
> rw_semaphore *sem)
> struct task_struct *owner;
> bool ret = true;
>
> +   BUILD_BUG_ON(!rwsem_has_anonymous_owner(RWSEM_OWNER_UNKNOWN));
> +
> if (need_resched())
> return false;
>
> --
> 1.8.3.1
>
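
For context, the ownership hand-off this patch annotates looks roughly like
the following on the freeze/thaw path (a simplified sketch of the calling
pattern, not the literal fs/super.c code):

	/* freezing task: take the lock for writing, then disown it */
	static void freeze_side(struct super_block *sb, int level)
	{
		percpu_down_write(&sb->s_writers.rw_sem[level]);
		/* owner becomes RWSEM_OWNER_UNKNOWN; spinning is disabled */
		percpu_rwsem_release(&sb->s_writers.rw_sem[level], false, _RET_IP_);
	}

	/* thawing task (possibly a different one): re-claim, then release */
	static void thaw_side(struct super_block *sb, int level)
	{
		percpu_rwsem_acquire(&sb->s_writers.rw_sem[level], false, _RET_IP_);
		percpu_up_write(&sb->s_writers.rw_sem[level]);
	}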


Re: [PATCH v6 04/17] media: rkisp1: add Rockchip MIPI Synopsys DPHY driver

2018-05-15 Thread Laurent Pinchart
Hi Jacob,

Thank you for the patch.

On Thursday, 8 March 2018 11:47:54 EEST Jacob Chen wrote:
> From: Jacob Chen 
> 
> This commit adds a subdev driver for Rockchip MIPI Synopsys DPHY driver

Should this really be a subdev driver ? After a quick look at the code, the 
only parameters you need to configure the PHY is the number of lanes and the 
data rate. Implementing the whole subdev API seems overcomplicated to me, 
especially given that the D-PHY doesn't deal with video streams as such, but 
operates one level down. Shouldn't we model the D-PHY using the Linux PHY 
framework ? I believe all the features you need are there except for a D-PHY-
specific configuration function that should be very easy to add.
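
(For illustration only, hooking the D-PHY into the generic PHY framework
would look roughly like the sketch below. Function and ops names are
hypothetical, and the D-PHY-specific configuration op mentioned above does
not exist yet, so lane count / data rate would still need a new callback or
platform-specific glue.)

	#include <linux/module.h>
	#include <linux/phy/phy.h>
	#include <linux/platform_device.h>

	static int rkisp1_dphy_power_on(struct phy *phy)
	{
		/* program lanes, THS-SETTLE, enable the receiver here */
		return 0;
	}

	static const struct phy_ops rkisp1_dphy_ops = {
		.power_on = rkisp1_dphy_power_on,
		.owner = THIS_MODULE,
	};

	static int rkisp1_dphy_probe(struct platform_device *pdev)
	{
		struct phy *phy;
		struct phy_provider *provider;

		phy = devm_phy_create(&pdev->dev, pdev->dev.of_node,
				      &rkisp1_dphy_ops);
		if (IS_ERR(phy))
			return PTR_ERR(phy);

		provider = devm_of_phy_provider_register(&pdev->dev,
							 of_phy_simple_xlate);
		return PTR_ERR_OR_ZERO(provider);
	}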

> Signed-off-by: Jacob Chen 
> Signed-off-by: Shunqian Zheng 
> Signed-off-by: Tomasz Figa 
> ---
>  .../media/platform/rockchip/isp1/mipi_dphy_sy.c| 868 ++
>  .../media/platform/rockchip/isp1/mipi_dphy_sy.h|  15 +
>  2 files changed, 883 insertions(+)
>  create mode 100644 drivers/media/platform/rockchip/isp1/mipi_dphy_sy.c
>  create mode 100644 drivers/media/platform/rockchip/isp1/mipi_dphy_sy.h
> 
> diff --git a/drivers/media/platform/rockchip/isp1/mipi_dphy_sy.c
> b/drivers/media/platform/rockchip/isp1/mipi_dphy_sy.c new file mode 100644
> index ..32140960557a
> --- /dev/null
> +++ b/drivers/media/platform/rockchip/isp1/mipi_dphy_sy.c
> @@ -0,0 +1,868 @@
> +// SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +/*
> + * Rockchip MIPI Synopsys DPHY driver
> + *
> + * Copyright (C) 2017 Fuzhou Rockchip Electronics Co., Ltd.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define RK3288_GRF_SOC_CON6  0x025c
> +#define RK3288_GRF_SOC_CON8  0x0264
> +#define RK3288_GRF_SOC_CON9  0x0268
> +#define RK3288_GRF_SOC_CON10 0x026c
> +#define RK3288_GRF_SOC_CON14 0x027c
> +#define RK3288_GRF_SOC_STATUS21  0x02d4
> +#define RK3288_GRF_IO_VSEL   0x0380
> +#define RK3288_GRF_SOC_CON15 0x03a4
> +
> +#define RK3399_GRF_SOC_CON9  0x6224
> +#define RK3399_GRF_SOC_CON21 0x6254
> +#define RK3399_GRF_SOC_CON22 0x6258
> +#define RK3399_GRF_SOC_CON23 0x625c
> +#define RK3399_GRF_SOC_CON24 0x6260
> +#define RK3399_GRF_SOC_CON25 0x6264
> +#define RK3399_GRF_SOC_STATUS1   0xe2a4
> +
> +#define CLOCK_LANE_HS_RX_CONTROL 0x34
> +#define LANE0_HS_RX_CONTROL  0x44
> +#define LANE1_HS_RX_CONTROL  0x54
> +#define LANE2_HS_RX_CONTROL  0x84
> +#define LANE3_HS_RX_CONTROL  0x94
> +#define HS_RX_DATA_LANES_THS_SETTLE_CONTROL  0x75
> +
> +/*
> + * CSI HOST
> + */
> +#define CSIHOST_PHY_TEST_CTRL0   0x30
> +#define CSIHOST_PHY_TEST_CTRL1   0x34
> +#define CSIHOST_PHY_SHUTDOWNZ0x08
> +#define CSIHOST_DPHY_RSTZ0x0c
> +
> +#define PHY_TESTEN_ADDR  (0x1 << 16)
> +#define PHY_TESTEN_DATA  (0x0 << 16)
> +#define PHY_TESTCLK  (0x1 << 1)
> +#define PHY_TESTCLR  (0x1 << 0)
> +#define THS_SETTLE_COUNTER_THRESHOLD 0x04
> +
> +#define HIWORD_UPDATE(val, mask, shift) \
> + ((val) << (shift) | (mask) << ((shift) + 16))
> +
> +enum mipi_dphy_sy_pads {
> + MIPI_DPHY_SY_PAD_SINK = 0,
> + MIPI_DPHY_SY_PAD_SOURCE,
> + MIPI_DPHY_SY_PADS_NUM,
> +};
> +
> +enum dphy_reg_id {
> + GRF_DPHY_RX0_TURNDISABLE = 0,
> + GRF_DPHY_RX0_FORCERXMODE,
> + GRF_DPHY_RX0_FORCETXSTOPMODE,
> + GRF_DPHY_RX0_ENABLE,
> + GRF_DPHY_RX0_TESTCLR,
> + GRF_DPHY_RX0_TESTCLK,
> + GRF_DPHY_RX0_TESTEN,
> + GRF_DPHY_RX0_TESTDIN,
> + GRF_DPHY_RX0_TURNREQUEST,
> + GRF_DPHY_RX0_TESTDOUT,
> + GRF_DPHY_TX0_TURNDISABLE,
> + GRF_DPHY_TX0_FORCERXMODE,
> + GRF_DPHY_TX0_FORCETXSTOPMODE,
> + GRF_DPHY_TX0_TURNREQUEST,
> + GRF_DPHY_TX1RX1_TURNDISABLE,
> + GRF_DPHY_TX1RX1_FORCERXMODE,
> + GRF_DPHY_TX1RX1_FORCETXSTOPMODE,
> + GRF_DPHY_TX1RX1_ENABLE,
> + GRF_DPHY_TX1RX1_MASTERSLAVEZ,
> + GRF_DPHY_TX1RX1_BASEDIR,
> + GRF_DPHY_TX1RX1_ENABLECLK,
> + GRF_DPHY_TX1RX1_TURNREQUEST,
> + GRF_DPHY_RX1_SRC_SEL,
> + /* rk3288 only */
> + GRF_CON_DISABLE_ISP,
> + GRF_CON_ISP_DPHY_SEL,
> + GRF_DSI_CSI_TESTBUS_SEL,
> + GRF_DVP_V18SEL,
> + /* below is for rk3399 only */
> + GRF_DPHY_RX0_CLK_INV_SEL,
> + GRF_DPHY_RX1_CLK_INV_SEL,
> +};
> +
> +struct dphy_reg {
> + u32 offset;
> + u32 mask;
> + u32 shift;
> +};
> +
> +#define PHY_REG(_offset, _width, _shift) \
> + { .offset = _offset, .mask = BIT(_width) - 1, .shift = _shift, }
> +
> +static const struct dphy_reg rk3399_grf_dphy_regs[] = {
> + [GRF_DPHY_RX0_TURNREQUEST] = PHY_REG(RK3399_GRF_SOC_CON9, 4, 0),
> + [GRF_DPHY_RX0_CLK_INV_SEL] = PHY_REG(RK3399_GRF_SOC_CON9, 1, 10),
> + [GRF_DPHY_RX1_CLK_INV_SEL] = PHY_REG(RK3399_GRF_SOC_CON9, 

Re: [PATCH] clk: stm32mp1: Add CLK_IGNORE_UNUSED to ck_sys_dbg clock

2018-05-15 Thread Gabriel FERNANDEZ
Thanks Stephen


On 05/15/2018 08:23 PM, Stephen Boyd wrote:
> Quoting gabriel.fernan...@st.com (2018-04-24 00:58:43)
>> From: Gabriel Fernandez 
>>
>> Don't disable the dbg clock if was set by bootloader.
>>
>> Signed-off-by: Gabriel Fernandez 
>> ---
> Applied to clk-next
>


Re: [PATCH 2/2] powerpc: Enable ASYM_SMT on interleaved big-core systems

2018-05-15 Thread Gautham R Shenoy
On Mon, May 14, 2018 at 01:22:07PM +1000, Michael Neuling wrote:
> On Fri, 2018-05-11 at 16:47 +0530, Gautham R. Shenoy wrote:
> > From: "Gautham R. Shenoy" 
> > 
> > Each of the SMT4 cores forming a fused-core are more or less
> > independent units. Thus when multiple tasks are scheduled to run on
> > the fused core, we get the best performance when the tasks are spread
> > across the pair of SMT4 cores.
> > 
> > Since the threads in the pair of SMT4 cores of an interleaved big-core
> > are numbered {0,2,4,6} and {1,3,5,7} respectively, enable ASYM_SMT on
> > such interleaved big-cores that will bias the load-balancing of tasks
> > on smaller numbered threads, which will automatically result in
> > spreading the tasks uniformly across the associated pair of SMT4
> > cores.
> > 
> > Signed-off-by: Gautham R. Shenoy 
> > ---
> >  arch/powerpc/kernel/smp.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 9ca7148..0153f01 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -1082,7 +1082,7 @@ static int powerpc_smt_flags(void)
> >  {
> > int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> >  
> > -   if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
> > +   if (cpu_has_feature(CPU_FTR_ASYM_SMT) || has_interleaved_big_core) {
> 
> Shouldn't we just set CPU_FTR_ASYM_SMT and leave this code
> unchanged?

Yes, that would have the same effect. I refrained from doing that
since I thought CPU_FTR_ASYM_SMT has the "lower numbered threads
expedite thread-folding" connotation from the POWER7 generation.

If it is ok to overload CPU_FTR_ASYM_SMT, we can do what you suggest
and have all the changes in setup-common.c

--
Thanks and Regards
gautham.
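
(For reference, overloading the existing flag as suggested would reduce to
something like this at boot time -- a sketch only, assuming the
has_interleaved_big_core detection from patch 1 of this series:

	/* arch/powerpc/kernel/setup-common.c */
	if (has_interleaved_big_core) {
		cur_cpu_spec->cpu_features |= CPU_FTR_ASYM_SMT;
		pr_info("Detected interleaved big-cores\n");
	}

leaving powerpc_smt_flags() untouched.)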



Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring

2018-05-15 Thread Jason Wang



On 2018年04月25日 13:15, Tiwei Bie wrote:

This commit introduces the event idx support in packed
ring. This feature is temporarily disabled, because the
implementation in this patch may not work as expected,
and some further discussions on the implementation are
needed, e.g. do we have to check the wrap counter when
checking whether a kick is needed?

Signed-off-by: Tiwei Bie 
---
  drivers/virtio/virtio_ring.c | 53 
  1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0181e93897be..b1039c2985b9 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
  static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
  {
struct vring_virtqueue *vq = to_vvq(_vq);
-   u16 flags;
+   u16 new, old, off_wrap, flags;
bool needs_kick;
u32 snapshot;
  
@@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)

 * suppressions. */
virtio_mb(vq->weak_barriers);
  
+	old = vq->next_avail_idx - vq->num_added;

+   new = vq->next_avail_idx;
+   vq->num_added = 0;
+
snapshot = *(u32 *)vq->vring_packed.device;
+   off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
  
  #ifdef DEBUG

@@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct 
virtqueue *_vq)
vq->last_add_time_valid = false;
  #endif
  
-	needs_kick = (flags != VRING_EVENT_F_DISABLE);

+   if (flags == VRING_EVENT_F_DESC)
+   needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
+   else
+   needs_kick = (flags != VRING_EVENT_F_DISABLE);
END_USE(vq);
return needs_kick;
  }
@@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct 
virtqueue *_vq,
if (vq->last_used_idx >= vq->vring_packed.num)
vq->last_used_idx -= vq->vring_packed.num;
  
+	/* If we expect an interrupt for the next entry, tell host

+* by writing event index and flush out the write before
+* the read in the next get_buf call. */
+   if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
+   virtio_store_mb(vq->weak_barriers,
+   &vq->vring_packed.driver->off_wrap,
+   cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
+   (vq->wrap_counter << 15)));
+
  #ifdef DEBUG
vq->last_add_time_valid = false;
  #endif
@@ -1143,10 +1160,17 @@ static unsigned 
virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
  
  	/* We optimistically turn back on interrupts, then check if there was

 * more to do. */
+   /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+* either clear the flags bit or point the event index at the next
+* entry. Always update the event index to keep code simple. */
+
+   vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+   vq->last_used_idx | (vq->wrap_counter << 15));



Using vq->wrap_counter seems not correct, what we need is the wrap
counter for the last_used_idx not next_avail_idx.


And I think there's even no need to bother with event idx here, how 
about just set VRING_EVENT_F_ENABLE?
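
(For context, the event-index check used on the kick side earlier in this
patch is the existing vring_need_event() helper from
include/uapi/linux/virtio_ring.h, which the off_wrap value feeds into:

	static inline int vring_need_event(__u16 event_idx, __u16 new_idx,
					   __u16 old)
	{
		return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
	}

It only compares 16-bit indices, which is why the wrap-counter bit is
masked off before the comparison in virtqueue_kick_prepare_packed().)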


  
  	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {

virtio_wmb(vq->weak_barriers);
-   vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+   vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+VRING_EVENT_F_ENABLE;
vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
vq->event_flags_shadow);
}
@@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue 
*_vq, unsigned last_used_idx)
  static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
  {
struct vring_virtqueue *vq = to_vvq(_vq);
+   u16 bufs, used_idx, wrap_counter;
  
  	START_USE(vq);
  
  	/* We optimistically turn back on interrupts, then check if there was

 * more to do. */
+   /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+* either clear the flags bit or point the event index at the next
+* entry. Always update the event index to keep code simple. */
+
+   /* TODO: tune this threshold */
+   bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;


bufs could be more than vq->num here, is this intended?


+
+   used_idx = vq->last_used_idx + bufs;
+   wrap_counter = vq->wrap_counter;
+
+   if (used_idx >= vq->vring_packed.num) {
+   used_idx -= vq->vring_packed.num;
+   wrap_counter ^= 

Re: [PATCH 8/9] perf/breakpoint: Split breakpoint "check" and "commit"

2018-05-15 Thread Andy Lutomirski


> On May 15, 2018, at 8:11 PM, Frederic Weisbecker  wrote:
> 
>> On Wed, May 09, 2018 at 11:17:03AM +0200, Peter Zijlstra wrote:
>>> On Sun, May 06, 2018 at 09:19:54PM +0200, Frederic Weisbecker wrote:
>>> arch/arm/include/asm/hw_breakpoint.h |  5 -
>>> arch/arm/kernel/hw_breakpoint.c  | 22 +++---
>>> arch/arm64/include/asm/hw_breakpoint.h   |  5 -
>>> arch/arm64/kernel/hw_breakpoint.c| 22 +++---
>>> arch/powerpc/include/asm/hw_breakpoint.h |  5 -
>>> arch/powerpc/kernel/hw_breakpoint.c  | 22 +++---
>>> arch/sh/include/asm/hw_breakpoint.h |  5 -
>>> arch/sh/kernel/hw_breakpoint.c   | 22 +++---
>>> arch/x86/include/asm/hw_breakpoint.h |  5 -
>>> arch/x86/kernel/hw_breakpoint.c  | 23 +++
>>> arch/xtensa/include/asm/hw_breakpoint.h  |  5 -
>>> arch/xtensa/kernel/hw_breakpoint.c   | 22 +++---
>> 
>> Because of those ^,
>> 
>>> kernel/events/hw_breakpoint.c| 11 ++-
>> 
>> would it not make sense to have a prelimenary patch doing something
>> like:
>> 
>> __weak int hw_breakpoint_arch_check(struct perf_event *bp)
>> {
>>return arch_validate_hwbkpt_settings(bp);
>> }
> 
> So eventually I fear I can't do that, due to linking order.
> 
> Say I convert x86 to implement hw_breakpoint_arch_check(), so I
> remove arch_validate_hwbkpt_settings(). On build time, the weak version
> is still compiled and can't find a declaration for 
> arch_validate_hwbkpt_settings().
> 
> I tried to keep the declaration while the definition has been removed but
> it seems the weak version is linked first before it gets later replaced by
> the overriden arch version. So I get a build error.
> 
> I could keep arch_validate_hwbkpt_settings() around on all archs and remove 
> it in
> the end with the weak version but that would defeat the purpose of removing
> the mid-state in the current patch.

How about just not using weak functions?  Weak functions have annoying issues 
like this, and they have trouble generating good code. I much prefer the 
pattern:

in arch header:
extern void arch_func(whatever);
#define arch_func arch_func

in generic header:
#ifndef arch_func
static inline void arch_func(whatever) ...
#endif
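
Applied to the hook discussed in this thread, that pattern would look
roughly like this (a sketch using the names from the thread, not a tested
patch):

	/* arch/x86/include/asm/hw_breakpoint.h */
	extern int hw_breakpoint_arch_check(struct perf_event *bp);
	#define hw_breakpoint_arch_check hw_breakpoint_arch_check

	/* include/linux/hw_breakpoint.h (generic fallback) */
	#ifndef hw_breakpoint_arch_check
	static inline int hw_breakpoint_arch_check(struct perf_event *bp)
	{
		return arch_validate_hwbkpt_settings(bp);
	}
	#endif

Architectures that provide their own implementation define the macro, so
the generic fallback (and its call to arch_validate_hwbkpt_settings()) is
never compiled for them, avoiding the problem described above.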


[PATCH v5 1/2] perf: uncore: Adding documentation for ThunderX2 pmu uncore driver

2018-05-15 Thread Ganapatrao Kulkarni
Documentation for the UNCORE PMUs on Cavium's ThunderX2 SoC.
The SoC has PMU support in its L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

Signed-off-by: Ganapatrao Kulkarni 
---
 Documentation/perf/thunderx2-pmu.txt | 66 
 1 file changed, 66 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt

diff --git a/Documentation/perf/thunderx2-pmu.txt 
b/Documentation/perf/thunderx2-pmu.txt
new file mode 100644
index 000..7d89935
--- /dev/null
+++ b/Documentation/perf/thunderx2-pmu.txt
@@ -0,0 +1,66 @@
+
+Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
+==
+
+ThunderX2 SoC PMU consists of independent system wide per Socket PMUs such
+as Level 3 Cache(L3C) and DDR4 Memory Controller(DMC).
+
+It has 8 independent DMC PMUs to capture performance events corresponding
+to 8 channels of DDR4 Memory Controller. There are 16 independent L3C PMUs
+to capture events corresponding to 16 tiles of L3 cache. Each PMU supports
+up to 4 counters.
+
+Counters are independently programmable and can be started and stopped
+individually. Each counter can be set to sample specific perf events.
+Counters are 32 bit and do not support overflow interrupt; they are
+sampled every 2 seconds. The counter register accesses are multiplexed
+across channels of DMC and L3C. The muxing (channel select) is done through
+a write to a secure register using SMC calls.
+
+PMU UNCORE (perf) driver:
+
+The thunderx2-pmu driver registers several perf PMUs for DMC and L3C devices.
+Each of the PMUs provides description of its available events
+and configuration options in sysfs.
+   see /sys/devices/uncore_
+
+S is socket id and X represents channel number.
+Each PMU can be used to sample up to 4 events simultaneously.
+
+The "format" directory describes format of the config (event ID).
+The "events" directory provides configuration templates for all
+supported event types that can be used with perf tool.
+
+For example, "uncore_dmc_0_0/cnt_cycles/" is an
+equivalent of "uncore_dmc_0_0/config=0x1/".
+
+Each perf driver also provides a "cpumask" sysfs attribute, which contains a
+single CPU ID of the processor which is likely to be used to handle all the
+PMU events. It will be the first online CPU from the NUMA node of PMU device.
+
+Example for perf tool use:
+
+perf stat -a -e \
+uncore_dmc_0_0/cnt_cycles/,\
+uncore_dmc_0_1/cnt_cycles/,\
+uncore_dmc_0_2/cnt_cycles/,\
+uncore_dmc_0_3/cnt_cycles/,\
+uncore_dmc_0_4/cnt_cycles/,\
+uncore_dmc_0_5/cnt_cycles/,\
+uncore_dmc_0_6/cnt_cycles/,\
+uncore_dmc_0_7/cnt_cycles/ sleep 1
+
+perf stat -a -e \
+uncore_dmc_0_0/cancelled_read_txns/,\
+uncore_dmc_0_0/cnt_cycles/,\
+uncore_dmc_0_0/consumed_read_txns/,\
+uncore_dmc_0_0/data_transfers/ sleep 1
+
+perf stat -a -e \
+uncore_l3c_0_0/l3_retry/,\
+uncore_l3c_0_0/read_hit/,\
+uncore_l3c_0_0/read_request/,\
+uncore_l3c_0_0/inv_request/ sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task (without "-a") perf sessions are not supported.
-- 
2.9.4




[PATCH v5 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-15 Thread Ganapatrao Kulkarni
This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
Controller(DMC) and Level 3 Cache(L3C).

ThunderX2 has 8 independent DMC PMUs to capture performance events
corresponding to 8 channels of DDR4 Memory Controller and 16 independent
L3C PMUs to capture events corresponding to 16 tiles of L3 cache.
Each PMU supports up to 4 counters. All counters lack overflow interrupt
and are sampled periodically.

Signed-off-by: Ganapatrao Kulkarni 
---
 drivers/perf/Kconfig |   8 +
 drivers/perf/Makefile|   1 +
 drivers/perf/thunderx2_pmu.c | 965 +++
 include/linux/cpuhotplug.h   |   1 +
 4 files changed, 975 insertions(+)
 create mode 100644 drivers/perf/thunderx2_pmu.c

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 28bb5a0..eafd0fc 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -85,6 +85,14 @@ config QCOM_L3_PMU
   Adds the L3 cache PMU into the perf events subsystem for
   monitoring L3 cache events.
 
+config THUNDERX2_PMU
+bool "Cavium ThunderX2 SoC PMU UNCORE"
+depends on ARCH_THUNDER2 && PERF_EVENTS && ACPI
+   help
+ Provides support for ThunderX2 UNCORE events.
+ The SoC has PMU support in its L3 cache controller (L3C) and
+ in the DDR4 Memory Controller (DMC).
+
 config XGENE_PMU
 depends on ARCH_XGENE
 bool "APM X-Gene SoC PMU"
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index b3902bd..909f27f 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -7,5 +7,6 @@ obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
 obj-$(CONFIG_HISI_PMU) += hisilicon/
 obj-$(CONFIG_QCOM_L2_PMU)  += qcom_l2_pmu.o
 obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
+obj-$(CONFIG_THUNDERX2_PMU) += thunderx2_pmu.o
 obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
 obj-$(CONFIG_ARM_SPE_PMU) += arm_spe_pmu.o
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
new file mode 100644
index 000..0401443
--- /dev/null
+++ b/drivers/perf/thunderx2_pmu.c
@@ -0,0 +1,965 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * CAVIUM THUNDERX2 SoC PMU UNCORE
+ *
+ * Copyright (C) 2018 Cavium Inc.
+ * Author: Ganapatrao Kulkarni 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* L3C and DMC have 16 and 8 channels per socket respectively.
+ * Each channel supports an UNCORE PMU device and consists of
+ * 4 independent programmable counters. Counters are 32 bit
+ * and do not support overflow interrupt; they need to be
+ * sampled before overflow (i.e., every 2 seconds).
+ */
+
+#define UNCORE_MAX_COUNTERS4
+#define UNCORE_L3_MAX_TILES16
+#define UNCORE_DMC_MAX_CHANNELS8
+
+#define UNCORE_HRTIMER_INTERVAL(2 * NSEC_PER_SEC)
+#define GET_EVENTID(ev)((ev->hw.config) & 0x1ff)
+#define GET_COUNTERID(ev)  ((ev->hw.idx) & 0xf)
+#define GET_CHANNELID(pmu_uncore)  (pmu_uncore->channel)
+#define DMC_EVENT_CFG(idx, val)((val) << (((idx) * 8) + 1))
+
+#define DMC_COUNTER_CTL0x234
+#define DMC_COUNTER_DATA   0x240
+#define L3C_COUNTER_CTL0xA8
+#define L3C_COUNTER_DATA   0xAC
+
+#define THUNDERX2_SMC_CALL_ID  0xC200FF00
+#define THUNDERX2_SMC_SET_CHANNEL  0xB010
+
+enum thunderx2_uncore_l3_events {
+   L3_EVENT_NONE,
+   L3_EVENT_NBU_CANCEL,
+   L3_EVENT_DIB_RETRY,
+   L3_EVENT_DOB_RETRY,
+   L3_EVENT_DIB_CREDIT_RETRY,
+   L3_EVENT_DOB_CREDIT_RETRY,
+   L3_EVENT_FORCE_RETRY,
+   L3_EVENT_IDX_CONFLICT_RETRY,
+   L3_EVENT_EVICT_CONFLICT_RETRY,
+   L3_EVENT_BANK_CONFLICT_RETRY,
+   L3_EVENT_FILL_ENTRY_RETRY,
+   L3_EVENT_EVICT_NOT_READY_RETRY,
+   L3_EVENT_L3_RETRY,
+   L3_EVENT_READ_REQ,
+   L3_EVENT_WRITE_BACK_REQ,
+   L3_EVENT_INVALIDATE_NWRITE_REQ,
+   L3_EVENT_INV_REQ,
+   L3_EVENT_SELF_REQ,
+   L3_EVENT_REQ,
+   L3_EVENT_EVICT_REQ,
+   L3_EVENT_INVALIDATE_NWRITE_HIT,
+   L3_EVENT_INVALIDATE_HIT,
+   L3_EVENT_SELF_HIT,
+   L3_EVENT_READ_HIT,
+   L3_EVENT_MAX,
+};
+
+enum thunderx2_uncore_dmc_events {
+   DMC_EVENT_NONE,
+   DMC_EVENT_COUNT_CYCLES,
+   DMC_EVENT_RES2,
+   DMC_EVENT_RES3,
+   DMC_EVENT_RES4,
+   DMC_EVENT_RES5,
+   

[PATCH v5 0/2] Add ThunderX2 SoC Performance Monitoring Unit driver

2018-05-15 Thread Ganapatrao Kulkarni
This patchset adds a PMU driver for Cavium's ThunderX2 SoC UNCORE devices.
The SoC has PMU support in its L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

v5:
 -Incorporated review comments from Mark Rutland[2]
v4:
 -Incorporated review comments from Mark Rutland[1]

[1] https://www.spinics.net/lists/arm-kernel/msg588563.html
[2] https://lkml.org/lkml/2018/4/26/376

v3:
 - fixed warning reported by kbuild robot

v2:
 - rebased to 4.12-rc1
 - Removed Arch VULCAN dependency.
 - update SMC call parameters as per latest firmware.

v1:
 -Initial patch

Ganapatrao Kulkarni (2):
  perf: uncore: Adding documentation for ThunderX2 pmu uncore driver
  ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

 Documentation/perf/thunderx2-pmu.txt |  66 +++
 drivers/perf/Kconfig |   8 +
 drivers/perf/Makefile|   1 +
 drivers/perf/thunderx2_pmu.c | 965 +++
 include/linux/cpuhotplug.h   |   1 +
 5 files changed, 1041 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt
 create mode 100644 drivers/perf/thunderx2_pmu.c

-- 
2.9.4



[RFC/RFT] [PATCH 02/10] cpufreq: intel_pstate: Conditional frequency invariant accounting

2018-05-15 Thread Srinivas Pandruvada
intel_pstate has two operating modes: active and passive. In "active"
mode, the in-built scaling governor is used; in "passive" mode, the
driver can be used with any governor, such as "schedutil". In "active"
mode the utilization values from schedutil are not used, and there is
a requirement from high-performance computing use cases not to read
any APERF/MPERF MSRs, so there is no need to spend CPU cycles on
frequency-invariant accounting by reading those MSRs.
With this change, frequency-invariant accounting is only enabled in
"passive" mode.

Signed-off-by: Srinivas Pandruvada 
---
[Note: The tick will be enabled later in the series when hwp dynamic
boost is enabled]

 drivers/cpufreq/intel_pstate.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 17e566af..f686bbe 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2040,6 +2040,8 @@ static int intel_pstate_register_driver(struct 
cpufreq_driver *driver)
 {
int ret;
 
+   x86_arch_scale_freq_tick_disable();
+
memset(&global, 0, sizeof(global));
global.max_perf_pct = 100;
 
@@ -2052,6 +2054,9 @@ static int intel_pstate_register_driver(struct 
cpufreq_driver *driver)
 
global.min_perf_pct = min_perf_pct_min();
 
+   if (driver == &intel_cpufreq)
+   x86_arch_scale_freq_tick_enable();
+
return 0;
 }
 
-- 
2.9.5



[PATCH] ARM: dts: imx7d: use operating-points-v2 for cpu

2018-05-15 Thread Anson Huang
This patch uses "operating-points-v2" instead of
"operating-points" to better fit the cpufreq-dt
driver.

Signed-off-by: Anson Huang 
---
 arch/arm/boot/dts/imx7d.dtsi | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/arm/boot/dts/imx7d.dtsi b/arch/arm/boot/dts/imx7d.dtsi
index 4c9877e..28980c8 100644
--- a/arch/arm/boot/dts/imx7d.dtsi
+++ b/arch/arm/boot/dts/imx7d.dtsi
@@ -9,12 +9,8 @@
 / {
cpus {
cpu0: cpu@0 {
-   operating-points = <
-   /* KHz  uV */
-   996000  1075000
-   792000  975000
-   >;
clock-frequency = <996000000>;
+   operating-points-v2 = <&cpu0_opp_table>;
};
 
cpu1: cpu@1 {
@@ -22,6 +18,24 @@
device_type = "cpu";
reg = <1>;
clock-frequency = <996000000>;
+   operating-points-v2 = <&cpu0_opp_table>;
+   };
+   };
+
+   cpu0_opp_table: opp_table0 {
+   compatible = "operating-points-v2";
+   opp-shared;
+
+   opp-792000000 {
+   opp-hz = /bits/ 64 <792000000>;
+   opp-microvolt = <975000>;
+   clock-latency-ns = <15>;
+   };
+   opp-996000000 {
+   opp-hz = /bits/ 64 <996000000>;
+   opp-microvolt = <1075000>;
+   clock-latency-ns = <15>;
+   opp-suspend;
};
};
 
-- 
2.7.4



[RFC/RFT] [PATCH 01/10] x86,sched: Add support for frequency invariance

2018-05-15 Thread Srinivas Pandruvada
From: Peter Zijlstra 

Implement arch_scale_freq_capacity() for 'modern' x86. This function
is used by the scheduler to correctly account usage in the face of
DVFS.

For example, suppose a CPU has two frequencies: 500 and 1000 MHz. When
running a task that would consume 1/3rd of a CPU at 1000 MHz, it would
appear to consume 2/3rd (or 66.6%) when running at 500 MHz, giving the
false impression this CPU is almost at capacity, even though it can go
faster [*].

Since modern x86 has hardware control over the actual frequency we run
at (because amongst other things, Turbo-Mode), we cannot simply use
the frequency as requested through cpufreq.

Instead we use the APERF/MPERF MSRs to compute the effective frequency
over the recent past. Also, because reading MSRs is expensive, don't
do so every time we need the value, but amortize the cost by doing it
every tick.

[*] this assumes a linear frequency/performance relation; which
everybody knows to be false, but given realities it's the best
approximation we can make.
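
As a rough sketch (not part of the patch), the scheduler consumes this
ratio through arch_scale_freq_capacity(), which reports the recent
APERF/MPERF-derived frequency on the 0..SCHED_CAPACITY_SCALE (1024)
range, so raw utilization is scaled down when the CPU runs below its
maximum frequency:

	/* Illustration only: a CPU observed at half of its maximum
	 * frequency reports ratio = 512, so the same amount of runtime
	 * accounts for half as much utilization and no longer looks
	 * "almost at capacity".
	 */
	unsigned long ratio = arch_scale_freq_capacity(cpu);	/* 0..1024 */
	unsigned long util  = (util_raw * ratio) >> SCHED_CAPACITY_SHIFT;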

Cc: Thomas Gleixner 
Cc: Suravee Suthikulpanit 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Srinivas Pandruvada 
---

Changes on top of Peter's patch:
-Ported to the latest 4.17-rc4
-Added KNL/KNM related changes
-Account for Turbo boost disabled on a system in BIOS
-New interface to disable tick processing when we don't want

 arch/x86/include/asm/topology.h |  29 ++
 arch/x86/kernel/smpboot.c   | 196 +++-
 kernel/sched/core.c |   1 +
 kernel/sched/sched.h|   7 ++
 4 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index c1d2a98..3fb5346 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -172,4 +172,33 @@ static inline void sched_clear_itmt_support(void)
 }
 #endif /* CONFIG_SCHED_MC_PRIO */
 
+#ifdef CONFIG_SMP
+#include 
+
+#define arch_scale_freq_tick arch_scale_freq_tick
+#define arch_scale_freq_capacity arch_scale_freq_capacity
+
+DECLARE_PER_CPU(unsigned long, arch_cpu_freq);
+
+static inline long arch_scale_freq_capacity(int cpu)
+{
+   if (static_cpu_has(X86_FEATURE_APERFMPERF))
+   return per_cpu(arch_cpu_freq, cpu);
+
+   return 1024 /* SCHED_CAPACITY_SCALE */;
+}
+
+extern void arch_scale_freq_tick(void);
+extern void x86_arch_scale_freq_tick_enable(void);
+extern void x86_arch_scale_freq_tick_disable(void);
+#else
+static inline void x86_arch_scale_freq_tick_enable(void)
+{
+}
+
+static inline void x86_arch_scale_freq_tick_disable(void)
+{
+}
+#endif
+
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0f1cbb0..9e2cb82 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -148,6 +148,8 @@ static inline void smpboot_restore_warm_reset_vector(void)
*((volatile u32 *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0;
 }
 
+static void set_cpu_max_freq(void);
+
 /*
  * Report back to the Boot Processor during boot time or to the caller 
processor
  * during CPU online.
@@ -189,6 +191,8 @@ static void smp_callin(void)
 */
set_cpu_sibling_map(raw_smp_processor_id());
 
+   set_cpu_max_freq();
+
/*
 * Get our bogomips.
 * Update loops_per_jiffy in cpu_data. Previous call to
@@ -1259,7 +1263,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
set_sched_topology(x86_topology);
 
set_cpu_sibling_map(0);
-
+   set_cpu_max_freq();
smp_sanity_check();
 
switch (apic_intr_mode) {
@@ -1676,3 +1680,193 @@ void native_play_dead(void)
 }
 
 #endif
+
+/*
+ * APERF/MPERF frequency ratio computation.
+ *
+ * The scheduler wants to do frequency invariant accounting and needs a <1
+ * ratio to account for the 'current' frequency.
+ *
+ * Since the frequency on x86 is controlled by micro-controller and our P-state
+ * setting is little more than a request/hint, we need to observe the effective
+ * frequency. We do this with APERF/MPERF.
+ *
+ * One complication is that the APERF/MPERF ratio can be >1, specifically
+ * APERF/MPERF gives the ratio relative to the max non-turbo P-state. Therefore
+ * we need to re-normalize the ratio.
+ *
+ * We do this by tracking the max APERF/MPERF ratio previously observed and
+ * scaling our MPERF delta with that. Every time our ratio goes over 1, we
+ * proportionally scale up our old max.
+ *
+ * The down-side to this runtime max search is that you have to trigger the
+ * actual max frequency before your scale is right. Therefore allow
+ * architectures to initialize the max ratio on CPU bringup.
+ */
+
+static DEFINE_PER_CPU(u64, arch_prev_aperf);
+static DEFINE_PER_CPU(u64, arch_prev_mperf);
+static DEFINE_PER_CPU(u64, arch_prev_max_freq) = SCHED_CAPACITY_SCALE;
+
+static bool turbo_disabled(void)
+{
+   u64 misc_en;
+   int err;
+
+   err = 

[RFC/RFT] [PATCH 06/10] cpufreq / sched: Add interface to get utilization values

2018-05-15 Thread Srinivas Pandruvada
Added cpufreq_get_sched_util() to get the CFS, DL and max utilization
values for a CPU. This is required so that cpufreq drivers outside of
the kernel/sched folder can obtain utilization values.
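
For illustration only (this mirrors what a later patch in the series
does; cpu is the target CPU number and busy_pct a local int), a cpufreq
driver could turn these values into a busy percentage:

	unsigned long util_cfs, util_dl, max, util;

	cpufreq_get_sched_util(cpu, &util_cfs, &util_dl, &max);
	util = min(util_cfs + util_dl, max);
	busy_pct = util * 100 / max;	/* 0..100 */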

Signed-off-by: Srinivas Pandruvada 
---
 include/linux/sched/cpufreq.h |  2 ++
 kernel/sched/cpufreq.c| 23 +++
 2 files changed, 25 insertions(+)

diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index 5966744..a366600 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -20,6 +20,8 @@ void cpufreq_add_update_util_hook(int cpu, struct 
update_util_data *data,
void (*func)(struct update_util_data *data, u64 time,
unsigned int flags));
 void cpufreq_remove_update_util_hook(int cpu);
+void cpufreq_get_sched_util(int cpu, unsigned long *util_cfs,
+   unsigned long *util_dl, unsigned long *max);
 #endif /* CONFIG_CPU_FREQ */
 
 #endif /* _LINUX_SCHED_CPUFREQ_H */
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index 5e54cbc..36e2839 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -60,3 +60,26 @@ void cpufreq_remove_update_util_hook(int cpu)
rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
 }
 EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
+
+/**
+ * cpufreq_get_sched_util - Get utilization values.
+ * @cpu: The targeted CPU.
+ *
+ * Get the CFS, DL and max utilization.
+ * This function allows cpufreq driver outside the kernel/sched to access
+ * utilization value for a CPUs run queue.
+ */
+void cpufreq_get_sched_util(int cpu, unsigned long *util_cfs,
+   unsigned long *util_dl, unsigned long *max)
+{
+#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
+   struct rq *rq = cpu_rq(cpu);
+
+   *max = arch_scale_cpu_capacity(NULL, cpu);
+   *util_cfs = cpu_util_cfs(rq);
+   *util_dl  = cpu_util_dl(rq);
+#else
+   *util_cfs = *util_dl = 1;
+#endif
+}
+EXPORT_SYMBOL_GPL(cpufreq_get_sched_util);
-- 
2.9.5



[RFC/RFT] [PATCH 05/10] cpufreq: intel_pstate: HWP boost performance on IO Wake

2018-05-15 Thread Srinivas Pandruvada
When a task is woken up from IO wait, boost HWP performance to max. This
helps IO workloads on servers with per-core P-states. But changing limits
has the extra overhead of issuing a new HWP Request MSR write, which takes
1000+ cycles, so this change rate-limits setting the HWP Request MSR. Also,
the request can be for a remote CPU.
Rate control in setting HWP Requests:
- If the current performance is already around P1, simply ignore the IO flag.
- Once the boost is set, wait until the hold time expires before removing
  it. If another IO flag is notified while the boost is on, it prolongs
  the boost.
- If the IO flags are notified multiple ticks apart, this may not be an
  IO-bound task. Otherwise an idle system gets periodic boosts for one
  IO wake.
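
A rough timeline under these rules, assuming a 1000 Hz tick (TICK_NSEC = 1 ms)
and the default 3 ms hold time (numbers are illustrative only):

	t = 0 ms     IO wake #1: last_io_update recorded, no boost yet
	t = 1 ms     IO wake #2 within 2 ticks: iowait_boost set, HWP boosted
	t = 1..n ms  further IO wakes keep refreshing last_update
	t > n+3 ms   no IO wake for the hold time: the next callback on the
	             local CPU restores the cached HWP request, boost removed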

Signed-off-by: Srinivas Pandruvada 
---
 drivers/cpufreq/intel_pstate.c | 75 ++
 1 file changed, 75 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index e200887..d418265 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -224,6 +225,8 @@ struct global_params {
  * @hwp_req_cached:Cached value of the last HWP request MSR
  * @csd:   A structure used to issue SMP async call, which
  * defines callback and arguments
+ * @hwp_boost_active:  HWP performance is boosted on this CPU
+ * @last_io_update:Last time when IO wake flag was set
  *
  * This structure stores per CPU instance data for all CPUs.
  */
@@ -258,6 +261,8 @@ struct cpudata {
s16 epp_saved;
u64 hwp_req_cached;
call_single_data_t csd;
+   bool hwp_boost_active;
+   u64 last_io_update;
 };
 
 static struct cpudata **all_cpu_data;
@@ -1421,10 +1426,80 @@ static void csd_init(struct cpudata *cpu)
cpu->csd.info = cpu;
 }
 
+/*
+ * Long hold time will keep high perf limits for long time,
+ * which negatively impacts perf/watt for some workloads,
+ * like specpower. 3ms is based on experiements on some
+ * workoads.
+ */
+static int hwp_boost_hold_time_ms = 3;
+
+/* Default: This will roughly around P1 on SKX */
+#define BOOST_PSTATE_THRESHOLD (SCHED_CAPACITY_SCALE / 2)
+static int hwp_boost_pstate_threshold = BOOST_PSTATE_THRESHOLD;
+
+static inline bool intel_pstate_check_boost_threhold(struct cpudata *cpu)
+{
+   /*
+* If the last performance is above threshold, then return false,
+* so that caller can ignore boosting.
+*/
+   if (arch_scale_freq_capacity(cpu->cpu) > hwp_boost_pstate_threshold)
+   return false;
+
+   return true;
+}
+
 static inline void intel_pstate_update_util_hwp(struct update_util_data *data,
u64 time, unsigned int flags)
 {
+   struct cpudata *cpu = container_of(data, struct cpudata, update_util);
+
+   if (flags & SCHED_CPUFREQ_IOWAIT) {
+   /*
+* Set iowait_boost flag and update time. Since IO WAIT flag
+* is set all the time, we can't just conclude that there is
+* some IO bound activity is scheduled on this CPU with just
+* one occurrence. If we receive at least two in two
+* consecutive ticks, then we start treating as IO. So
+* there will be one tick latency.
+*/
+   if (time_before64(time, cpu->last_io_update + 2 * TICK_NSEC) &&
+   intel_pstate_check_boost_threhold(cpu))
+   cpu->iowait_boost = true;
+
+   cpu->last_io_update = time;
+   cpu->last_update = time;
+   }
 
+   /*
+* If the boost is active, we will remove it after timeout on local
+* CPU only.
+*/
+   if (cpu->hwp_boost_active) {
+   if (smp_processor_id() == cpu->cpu) {
+   bool expired;
+
+   expired = time_after64(time, cpu->last_update +
+  (hwp_boost_hold_time_ms * 
NSEC_PER_MSEC));
+   if (expired) {
+   intel_pstate_hwp_boost_down(cpu);
+   cpu->hwp_boost_active = false;
+   cpu->iowait_boost = false;
+   }
+   }
+   return;
+   }
+
+   cpu->last_update = time;
+
+   if (cpu->iowait_boost) {
+   cpu->hwp_boost_active = true;
+   if (smp_processor_id() == cpu->cpu)
+   intel_pstate_hwp_boost_up(cpu);
+   else
+   smp_call_function_single_async(cpu->cpu, &cpu->csd);
+   }
 }
 
 static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
-- 
2.9.5



[RFC/RFT] [PATCH 03/10] cpufreq: intel_pstate: Utility functions to boost HWP performance limits

2018-05-15 Thread Srinivas Pandruvada
Set up the necessary infrastructure to be able to boost HWP performance on
a remote CPU. First initialize the data structure to be able to use
smp_call_function_single_async(). The boost-up function simply sets HWP
min to the HWP max value and EPP to 0. The boost-down function simply
restores the last cached HWP Request value.

To avoid reading the HWP Request MSR during dynamic update, the HWP Request
MSR value is cached in local memory. This caching is done whenever the
HWP Request MSR is modified, during driver init or in the setpolicy()
callback path.
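
For reference (field layout per the Intel SDM; not part of the patch), the
IA32_HWP_REQUEST (MSR 0x774) fields touched by the helpers below are:

	/*   bits  7:0   minimum performance
	 *   bits 15:8   maximum performance
	 *   bits 23:16  desired performance
	 *   bits 31:24  energy/performance preference (EPP, 0 = max performance)
	 *
	 * boost up:   copy bits 15:8 into bits 7:0 and clear bits 31:24
	 * boost down: write back the cached pre-boost value
	 */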

Signed-off-by: Srinivas Pandruvada 
---
 drivers/cpufreq/intel_pstate.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index f686bbe..dc7dfa9 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -221,6 +221,9 @@ struct global_params {
  * preference/bias
  * @epp_saved: Saved EPP/EPB during system suspend or CPU offline
  * operation
+ * @hwp_req_cached:Cached value of the last HWP request MSR
+ * @csd:   A structure used to issue SMP async call, which
+ * defines callback and arguments
  *
  * This structure stores per CPU instance data for all CPUs.
  */
@@ -253,6 +256,8 @@ struct cpudata {
s16 epp_policy;
s16 epp_default;
s16 epp_saved;
+   u64 hwp_req_cached;
+   call_single_data_t csd;
 };
 
 static struct cpudata **all_cpu_data;
@@ -763,6 +768,7 @@ static void intel_pstate_hwp_set(unsigned int cpu)
intel_pstate_set_epb(cpu, epp);
}
 skip_epp:
+   cpu_data->hwp_req_cached = value;
wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
 }
 
@@ -1381,6 +1387,39 @@ static void intel_pstate_get_cpu_pstates(struct cpudata 
*cpu)
intel_pstate_set_min_pstate(cpu);
 }
 
+
+static inline void intel_pstate_hwp_boost_up(struct cpudata *cpu)
+{
+   u64 hwp_req;
+   u8 max;
+
+   max = (u8) (cpu->hwp_req_cached >> 8);
+
+   hwp_req = cpu->hwp_req_cached & ~GENMASK_ULL(31, 24);
+   hwp_req = (hwp_req & ~GENMASK_ULL(7, 0)) | max;
+
+   wrmsrl(MSR_HWP_REQUEST, hwp_req);
+}
+
+static inline void intel_pstate_hwp_boost_down(struct cpudata *cpu)
+{
+   wrmsrl(MSR_HWP_REQUEST, cpu->hwp_req_cached);
+}
+
+static void intel_pstate_hwp_boost_up_local(void *arg)
+{
+   struct cpudata *cpu = arg;
+
+   intel_pstate_hwp_boost_up(cpu);
+}
+
+static void csd_init(struct cpudata *cpu)
+{
+   cpu->csd.flags = 0;
+   cpu->csd.func = intel_pstate_hwp_boost_up_local;
+   cpu->csd.info = cpu;
+}
+
 static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
 {
struct sample *sample = &cpu->sample;
@@ -1894,6 +1933,9 @@ static int __intel_pstate_cpu_init(struct cpufreq_policy 
*policy)
 
policy->fast_switch_possible = true;
 
+   if (hwp_active)
+   csd_init(cpu);
+
return 0;
 }
 
-- 
2.9.5



[RFC/RFT] [PATCH 07/10] cpufreq: intel_pstate: HWP boost performance on busy task migrate

2018-05-15 Thread Srinivas Pandruvada
When a busy task migrates to a new CPU, boost HWP performance to max. This
helps workloads on servers with per-core P-states, which saturate all
CPUs and then migrate frequently. But changing limits has the extra
overhead of issuing a new HWP Request MSR write, which takes 1000+
cycles, so this change rate-limits setting the HWP Request MSR.
Rate control in setting HWP Requests:
- If the current performance is already around P1, simply ignore.
- Once the boost is set, wait until the hold time expires before removing
  it. If another flag is notified while the boost is on, it prolongs the
  boost.
- The migrating task needs to have some utilization, above a threshold
  utilization, that will trigger a P-state above minimum.

Signed-off-by: Srinivas Pandruvada 
---
 drivers/cpufreq/intel_pstate.c | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index d418265..ec455af 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -227,6 +227,7 @@ struct global_params {
  * defines callback and arguments
  * @hwp_boost_active:  HWP performance is boosted on this CPU
  * @last_io_update:Last time when IO wake flag was set
+ * @migrate_hint:  Set when scheduler indicates thread migration
  *
  * This structure stores per CPU instance data for all CPUs.
  */
@@ -263,6 +264,7 @@ struct cpudata {
call_single_data_t csd;
bool hwp_boost_active;
u64 last_io_update;
+   bool migrate_hint;
 };
 
 static struct cpudata **all_cpu_data;
@@ -1438,6 +1440,8 @@ static int hwp_boost_hold_time_ms = 3;
 #define BOOST_PSTATE_THRESHOLD (SCHED_CAPACITY_SCALE / 2)
 static int hwp_boost_pstate_threshold = BOOST_PSTATE_THRESHOLD;
 
+static int hwp_boost_threshold_busy_pct;
+
 static inline bool intel_pstate_check_boost_threhold(struct cpudata *cpu)
 {
/*
@@ -1450,12 +1454,32 @@ static inline bool 
intel_pstate_check_boost_threhold(struct cpudata *cpu)
return true;
 }
 
+static inline int intel_pstate_get_sched_util(struct cpudata *cpu)
+{
+   unsigned long util_cfs, util_dl, max, util;
+
+   cpufreq_get_sched_util(cpu->cpu, &util_cfs, &util_dl, &max);
+   util = min(util_cfs + util_dl, max);
+   return util * 100 / max;
+}
+
 static inline void intel_pstate_update_util_hwp(struct update_util_data *data,
u64 time, unsigned int flags)
 {
struct cpudata *cpu = container_of(data, struct cpudata, update_util);
 
-   if (flags & SCHED_CPUFREQ_IOWAIT) {
+   if (flags & SCHED_CPUFREQ_MIGRATION) {
+   if (intel_pstate_check_boost_threhold(cpu))
+   cpu->migrate_hint = true;
+
+   cpu->last_update = time;
+   /*
+* The rq utilization data is not migrated yet to the new CPU
+* rq, so wait for call on local CPU to boost.
+*/
+   if (smp_processor_id() != cpu->cpu)
+   return;
+   } else if (flags & SCHED_CPUFREQ_IOWAIT) {
/*
 * Set iowait_boost flag and update time. Since IO WAIT flag
 * is set all the time, we can't just conclude that there is
@@ -1499,6 +1523,17 @@ static inline void intel_pstate_update_util_hwp(struct 
update_util_data *data,
intel_pstate_hwp_boost_up(cpu);
else
smp_call_function_single_async(cpu->cpu, &cpu->csd);
+   return;
+   }
+
+   /* Ignore if the migrated thread has low utilization */
+   if (cpu->migrate_hint && smp_processor_id() == cpu->cpu) {
+   int util = intel_pstate_get_sched_util(cpu);
+
+   if (util >= hwp_boost_threshold_busy_pct) {
+   cpu->hwp_boost_active = true;
+   intel_pstate_hwp_boost_up(cpu);
+   }
}
 }
 
-- 
2.9.5



[RFC/RFT] [PATCH 00/10] Intel_pstate: HWP Dynamic performance boost

2018-05-15 Thread Srinivas Pandruvada
This series tries to address some concerns about performance, particularly
with IO workloads (reported by Mel Gorman), when HWP is used with the
intel_pstate powersave policy.

Background
HWP performance can be controlled by user space using the sysfs interface for
max/min frequency limits and energy-performance preference settings. Based on
workload characteristics these can be adjusted from user space. These limits
are not changed dynamically by the kernel based on workload.

By default HWP uses an energy-performance preference value of 0x80 on the
majority of platforms (the scale is 0-255, 0 is max performance and 255 is
min). This value offers the best performance/watt, and for the majority of
server workloads performance doesn't suffer. Users also always have the
option to use the performance policy of intel_pstate to get the best
performance. But users tend to run with the out-of-box configuration, which
is the powersave policy on most distros.

In some cases it is possible to dynamically adjust performance, for example
when a CPU is woken up due to IO completion or a thread migrates to a new CPU.
In these cases the HWP algorithm will take some time to build up utilization
and ramp up P-states, which may result in lower performance for some IO
workloads and workloads that tend to migrate. The idea of this patch series is
to temporarily boost performance dynamically in these cases. This is applicable
only when the user is using the powersave policy, not the performance policy.

Results on a Skylake server:

Benchmark   Improvement %
--
dbench  50.36
thread IO bench (tiobench)  10.35
File IO 9.81
sqlite  15.76
X264 -104 cores 9.75

Spec Power  (Negligible impact 7382 Vs. 7378)
Idle Power  No change observed
---

HWP brings the best performance/watt at EPP=0x80. Since we are boosting
EPP here to 0, the performance/watt drops by up to 10%, so there is a power
penalty to these changes.

Mel Gorman also provided test results on a prior patchset, which show
the benefits of this series.

Peter Zijlstra (1):
  x86,sched: Add support for frequency invariance

Srinivas Pandruvada (9):
  cpufreq: intel_pstate: Conditional frequency invariant accounting
  cpufreq: intel_pstate: Utility functions to boost HWP performance
limits
  cpufreq: intel_pstate: Add update_util_hook for HWP
  cpufreq: intel_pstate: HWP boost performance on IO Wake
  cpufreq / sched: Add interface to get utilization values
  cpufreq: intel_pstate: HWP boost performance on busy task migrate
  cpufreq: intel_pstate: Dynamically update busy pct
  cpufreq: intel_pstate: New sysfs entry to control HWP boost
  cpufreq: intel_pstate: enable boost for SKX

 arch/x86/include/asm/topology.h |  29 +
 arch/x86/kernel/smpboot.c   | 196 +-
 drivers/cpufreq/intel_pstate.c  | 260 +++-
 include/linux/sched/cpufreq.h   |   2 +
 kernel/sched/core.c |   1 +
 kernel/sched/cpufreq.c  |  23 
 kernel/sched/sched.h|   7 ++
 7 files changed, 513 insertions(+), 5 deletions(-)

-- 
2.9.5



[RFC/RFT] [PATCH 10/10] cpufreq: intel_pstate: enable boost for SKX

2018-05-15 Thread Srinivas Pandruvada
Enable HWP boost on Skylake server platform.

Signed-off-by: Srinivas Pandruvada 
---
 drivers/cpufreq/intel_pstate.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 65d11d2..827c003 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1863,6 +1863,11 @@ static const struct x86_cpu_id 
intel_pstate_cpu_ee_disable_ids[] = {
{}
 };
 
+static const struct x86_cpu_id intel_pstate_hwp_boost_ids[] __initconst = {
+   ICPU(INTEL_FAM6_SKYLAKE_X, core_funcs),
+   {}
+};
+
 static int intel_pstate_init_cpu(unsigned int cpunum)
 {
struct cpudata *cpu;
@@ -1893,6 +1898,10 @@ static int intel_pstate_init_cpu(unsigned int cpunum)
intel_pstate_disable_ee(cpunum);
 
intel_pstate_hwp_enable(cpu);
+
+   id = x86_match_cpu(intel_pstate_hwp_boost_ids);
+   if (id)
+   hwp_boost = true;
}
 
intel_pstate_get_cpu_pstates(cpu);
-- 
2.9.5



[RFC/RFT] [PATCH 09/10] cpufreq: intel_pstate: New sysfs entry to control HWP boost

2018-05-15 Thread Srinivas Pandruvada
A new attribute is added to intel_pstate sysfs to enable/disable
HWP dynamic performance boost.
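
For example (assuming the usual intel_pstate sysfs location), the boost can
then be toggled at run time with:

	# echo 1 > /sys/devices/system/cpu/intel_pstate/hwp_dynamic_boost
	# echo 0 > /sys/devices/system/cpu/intel_pstate/hwp_dynamic_boost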

Signed-off-by: Srinivas Pandruvada 
---
 drivers/cpufreq/intel_pstate.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index c43edce..65d11d2 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1034,6 +1034,30 @@ static ssize_t store_min_perf_pct(struct kobject *a, 
struct attribute *b,
return count;
 }
 
+static ssize_t show_hwp_dynamic_boost(struct kobject *kobj,
+   struct attribute *attr, char *buf)
+{
+   return sprintf(buf, "%u\n", hwp_boost);
+}
+
+static ssize_t store_hwp_dynamic_boost(struct kobject *a, struct attribute *b,
+  const char *buf, size_t count)
+{
+   unsigned int input;
+   int ret;
+
+   ret = kstrtouint(buf, 10, &input);
+   if (ret)
+   return ret;
+
+   mutex_lock(&intel_pstate_driver_lock);
+   hwp_boost = !!input;
+   intel_pstate_update_policies();
+   mutex_unlock(&intel_pstate_driver_lock);
+
+   return count;
+}
+
 show_one(max_perf_pct, max_perf_pct);
 show_one(min_perf_pct, min_perf_pct);
 
@@ -1043,6 +1067,7 @@ define_one_global_rw(max_perf_pct);
 define_one_global_rw(min_perf_pct);
 define_one_global_ro(turbo_pct);
 define_one_global_ro(num_pstates);
+define_one_global_rw(hwp_dynamic_boost);
 
 static struct attribute *intel_pstate_attributes[] = {
,
@@ -1083,6 +1108,11 @@ static void __init intel_pstate_sysfs_expose_params(void)
rc = sysfs_create_file(intel_pstate_kobject, _perf_pct.attr);
WARN_ON(rc);
 
+   if (hwp_active) {
+   rc = sysfs_create_file(intel_pstate_kobject,
+  &hwp_dynamic_boost.attr);
+   WARN_ON(rc);
+   }
 }
 /** sysfs end /
 
-- 
2.9.5



[RFC/RFT] [PATCH 08/10] cpufreq: intel_pstate: Dynamically update busy pct

2018-05-15 Thread Srinivas Pandruvada
Calculate hwp_boost_threshold_busy_pct (the task busy percentage that is
worth boosting) and hwp_boost_pstate_threshold (don't boost if the
CPU already has some performance) based on the platform's min, max and
turbo frequencies.
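
As a worked example with hypothetical frequencies (min 1200 MHz, P1 (max
non-turbo) 2100 MHz, turbo 3700 MHz; numbers are illustrative only), the
formulas below give:

	hwp_boost_threshold_busy_pct = 1200 * 100 / 3700 + 20 = 52    (percent)
	hwp_boost_pstate_threshold   = 2100 * 1024 / 3700     = 581   (of SCHED_CAPACITY_SCALE)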

Signed-off-by: Srinivas Pandruvada 
---
 drivers/cpufreq/intel_pstate.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index ec455af..c43edce 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1463,6 +1463,42 @@ static inline int intel_pstate_get_sched_util(struct 
cpudata *cpu)
return util * 100 / max;
 }
 
+
+static inline void intel_pstate_update_busy_threshold(struct cpudata *cpu)
+{
+   if (!hwp_boost_threshold_busy_pct) {
+   int min_freq, max_freq;
+
+   min_freq  = cpu->pstate.min_pstate * cpu->pstate.scaling;
+   update_turbo_state();
+   max_freq =  global.turbo_disabled || global.no_turbo ?
+   cpu->pstate.max_freq : cpu->pstate.turbo_freq;
+
+   /*
+* We are guranteed to get atleast min P-state. If we assume
+* P-state is proportional to load (such that 10% load
+* increase will result in 10% P-state increase), we will
+* get at least min P-state till we have atleast
+* (min * 100/max) percent cpu load. So any load less than
+* than this this we shouldn't do any boost. Then boosting
+* is not free, we will add atleast 20% offset.
+*/
+   hwp_boost_threshold_busy_pct = min_freq * 100 / max_freq;
+   hwp_boost_threshold_busy_pct += 20;
+   pr_debug("hwp_boost_threshold_busy_pct = %d\n",
+hwp_boost_threshold_busy_pct);
+   }
+
+   /* P1 percent out of total range of P-states */
+   if (cpu->pstate.max_freq != cpu->pstate.turbo_freq) {
+   hwp_boost_pstate_threshold =
+   cpu->pstate.max_freq * SCHED_CAPACITY_SCALE / 
cpu->pstate.turbo_freq;
+   pr_debug("hwp_boost_pstate_threshold = %d\n",
+hwp_boost_pstate_threshold);
+   }
+
+}
+
 static inline void intel_pstate_update_util_hwp(struct update_util_data *data,
u64 time, unsigned int flags)
 {
@@ -2061,8 +2097,10 @@ static int __intel_pstate_cpu_init(struct cpufreq_policy 
*policy)
 
policy->fast_switch_possible = true;
 
-   if (hwp_active)
+   if (hwp_active) {
csd_init(cpu);
+   intel_pstate_update_busy_threshold(cpu);
+   }
 
return 0;
 }
-- 
2.9.5



[RFC/RFT] [PATCH 04/10] cpufreq: intel_pstate: Add update_util_hook for HWP

2018-05-15 Thread Srinivas Pandruvada
When HWP dynamic boost is active, set the HWP-specific update util
hook. Also start and stop processing in frequency-invariant accounting
based on the HWP dynamic boost setting.

Signed-off-by: Srinivas Pandruvada 
---
 drivers/cpufreq/intel_pstate.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index dc7dfa9..e200887 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -290,6 +290,7 @@ static struct pstate_funcs pstate_funcs __read_mostly;
 
 static int hwp_active __read_mostly;
 static bool per_cpu_limits __read_mostly;
+static bool hwp_boost __read_mostly;
 
 static struct cpufreq_driver *intel_pstate_driver __read_mostly;
 
@@ -1420,6 +1421,12 @@ static void csd_init(struct cpudata *cpu)
cpu->csd.info = cpu;
 }
 
+static inline void intel_pstate_update_util_hwp(struct update_util_data *data,
+   u64 time, unsigned int flags)
+{
+
+}
+
 static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
 {
struct sample *sample = &cpu->sample;
@@ -1723,7 +1730,7 @@ static void intel_pstate_set_update_util_hook(unsigned int cpu_num)
 {
struct cpudata *cpu = all_cpu_data[cpu_num];
 
-   if (hwp_active)
+   if (hwp_active && !hwp_boost)
return;
 
if (cpu->update_util_set)
@@ -1731,8 +1738,12 @@ static void intel_pstate_set_update_util_hook(unsigned int cpu_num)
 
/* Prevent intel_pstate_update_util() from using stale data. */
cpu->sample.time = 0;
-   cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
-intel_pstate_update_util);
+   if (hwp_active)
+   cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
+intel_pstate_update_util_hwp);
+   else
+   cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
+intel_pstate_update_util);
cpu->update_util_set = true;
 }
 
@@ -1844,8 +1855,15 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
intel_pstate_set_update_util_hook(policy->cpu);
}
 
-   if (hwp_active)
+   if (hwp_active) {
+   if (hwp_boost) {
+   x86_arch_scale_freq_tick_enable();
+   } else {
+   intel_pstate_clear_update_util_hook(policy->cpu);
+   x86_arch_scale_freq_tick_disable();
+   }
intel_pstate_hwp_set(policy->cpu);
+   }
 
mutex_unlock(&intel_pstate_limits_lock);
 
-- 
2.9.5
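
For readers unfamiliar with the hook being swapped in here, below is a
minimal, hypothetical sketch of the cpufreq update_util pattern (it is not
the intel_pstate code itself): a per-CPU structure embeds struct
update_util_data, the callback recovers it with container_of(), and the
callback runs from scheduler context, so it must be fast and must not sleep.

#include <linux/sched/cpufreq.h>
#include <linux/kernel.h>

struct my_cpu_ctx {
	struct update_util_data update_util;	/* embedded, like struct cpudata */
	u64 last_update;
};

static void my_update_util(struct update_util_data *data, u64 time,
			   unsigned int flags)
{
	struct my_cpu_ctx *ctx = container_of(data, struct my_cpu_ctx,
					      update_util);

	/* Invoked on every scheduler utilization update for this CPU. */
	ctx->last_update = time;
}

static void my_hook_register(int cpu, struct my_cpu_ctx *ctx)
{
	cpufreq_add_update_util_hook(cpu, &ctx->update_util, my_update_util);
}

static void my_hook_unregister(int cpu)
{
	cpufreq_remove_update_util_hook(cpu);
}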



Re: [PATCH v2] ipc: Adding new return type vm_fault_t

2018-05-15 Thread Souptick Joarder
On Thu, May 10, 2018 at 7:34 PM, Souptick Joarder  wrote:
> On Wed, Apr 25, 2018 at 10:04 AM, Souptick Joarder  
> wrote:
>> Use new return type vm_fault_t for fault handler. For
>> now, this is just documenting that the function returns
>> a VM_FAULT value rather than an errno. Once all instances
>> are converted, vm_fault_t will become a distinct type.
>>
>> Commit 1c8f422059ae ("mm: change return type to vm_fault_t")
>>
>> Signed-off-by: Souptick Joarder 
>> Reviewed-by: Matthew Wilcox 
>> ---
>> v2: Updated the change log
>>
>>  ipc/shm.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/ipc/shm.c b/ipc/shm.c
>> index 4643865..2ba0cfc 100644
>> --- a/ipc/shm.c
>> +++ b/ipc/shm.c
>> @@ -378,7 +378,7 @@ void exit_shm(struct task_struct *task)
>> up_write(&shm_ids(ns).rwsem);
>>  }
>>
>> -static int shm_fault(struct vm_fault *vmf)
>> +static vm_fault_t shm_fault(struct vm_fault *vmf)
>>  {
>> struct file *file = vmf->vma->vm_file;
>> struct shm_file_data *sfd = shm_file_data(file);
>> --
>> 1.9.1
>>
>
> Any comment for this patch ?

If no further comment, we would like to get this
patch in queue for 4.18.
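
As background for the conversion, a minimal, hypothetical sketch of the
pattern the vm_fault_t work targets (a made-up driver, not the ipc/shm
code): the .fault handler returns VM_FAULT_* codes rather than an errno,
and the new typedef documents exactly that.

#include <linux/mm.h>
#include <linux/mm_types.h>

/* my_lookup_page() is a hypothetical helper standing in for the driver's
 * own page lookup. */
static struct page *my_lookup_page(pgoff_t pgoff);

static vm_fault_t my_fault(struct vm_fault *vmf)
{
	struct page *page = my_lookup_page(vmf->pgoff);

	if (!page)
		return VM_FAULT_SIGBUS;	/* a VM_FAULT_* code, not -EFAULT */

	get_page(page);
	vmf->page = page;
	return 0;
}

static const struct vm_operations_struct my_vm_ops = {
	.fault = my_fault,
};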


[PATCH] ASoC: codecs: fix pcm1789.c build errors

2018-05-15 Thread Randy Dunlap
From: Randy Dunlap 

Fix build errors in pcm1789.c.
The source file needs to #include <linux/gpio/consumer.h> since it
uses interfaces and macros that are provided by it.
However, it does not need to #include <linux/gpio.h>, so drop it.

Fixes these build errors:

../sound/soc/codecs/pcm1789.c: In function 'pcm1789_common_init':
../sound/soc/codecs/pcm1789.c:247:2: error: implicit declaration of function 'devm_gpiod_get_optional' [-Werror=implicit-function-declaration]
  pcm1789->reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH);
../sound/soc/codecs/pcm1789.c:247:57: error: 'GPIOD_OUT_HIGH' undeclared (first use in this function)
  pcm1789->reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH);
../sound/soc/codecs/pcm1789.c:251:2: error: implicit declaration of function 'gpiod_set_value_cansleep' [-Werror=implicit-function-declaration]
  gpiod_set_value_cansleep(pcm1789->reset, 0);

Fixes: 4ae340d1be36 ("ASoC: codecs: Add support for PCM1789")
Reported-by: kbuild test robot 
Signed-off-by: Randy Dunlap 
Cc: Mylène Josserand 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: alsa-de...@alsa-project.org
---
 sound/soc/codecs/pcm1789.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

BTW, MODULE_AUTHOR() still uses @free-electrons.com.

--- lnx-417-rc4.orig/sound/soc/codecs/pcm1789.c
+++ lnx-417-rc4/sound/soc/codecs/pcm1789.c
@@ -3,7 +3,7 @@
 // Copyright (C) 2018 Bootlin
 // Mylène Josserand 
 
-#include <linux/gpio.h>
+#include <linux/gpio/consumer.h>
 #include 
 #include 
 

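A minimal, hypothetical sketch of the consumer-side gpiod calls the driver
relies on, which is why <linux/gpio/consumer.h> (and not the legacy GPIO
header) is the include it needs:

#include <linux/gpio/consumer.h>
#include <linux/device.h>
#include <linux/err.h>

static int my_reset_deassert(struct device *dev)
{
	struct gpio_desc *reset;

	/* Optional "reset" GPIO, requested as an output in its active state. */
	reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH);
	if (IS_ERR(reset))
		return PTR_ERR(reset);

	/* De-assert reset; the line may sit behind a sleeping bus, hence the
	 * _cansleep variant. */
	gpiod_set_value_cansleep(reset, 0);
	return 0;
}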


