Re: fsl_espi errors on v5.7.15

2020-08-25 Thread Chris Packham

On 26/08/20 10:22 am, Chris Packham wrote:
> On 25/08/20 7:22 pm, Heiner Kallweit wrote:
>
> 
>>> I've been staring at spi-fsl-espi.c for a while now and I think I've
>>> identified a couple of deficiencies that may or may not be related to
>>> my issue.
>>>
>>> First I think the 'Transfer done but SPIE_DON isn't set' message can be
>>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE register.
>>> We also write back to it to clear the current events. We re-read it in
>>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
>>> naturally end up in that situation if we're doing a large read.
>>> Consider the messages for reading a block of data from a spi-nor chip:
>>>
>>>    tx = READ_OP + ADDR
>>>    rx = data
>>>
>>> We set up the transfer and pump out the tx_buf. The first interrupt goes
>>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
>>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
>>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
>>> continues until we've received all the data and we finish with
>>> ESPI_SPIE having only SPIM_RXT set. When we re-read it we complain
>>> that SPIE_DON isn't set.
>>>
>>> The other deficiency is that we only get an interrupt when the amount
>>> of data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer
>>> than FSL_ESPI_RXTHR bytes left to be received we will never pull them
>>> out of the fifo.
>>>
>> SPIM_DON will trigger an interrupt once the last characters have been
>> transferred, and read the remaining characters from the FIFO.
>
> The T2080RM that I have says the following about the DON bit
>
> "Last character was transmitted. The last character was transmitted 
> and a new command can be written for the next frame."
>
> That does at least seem to fit with my assertion that it's all about 
> the TX direction. But the fact that it doesn't happen all the time 
> throws some doubt on it.
>
>>> I think the reason I'm seeing some variability is because of how fast
>>> (or slow) the interrupts get processed and how fast the spi-nor chip
>>> can fill the CPU's rx fifo.
>>>
>> To rule out timing issues at high bus frequencies I initially asked
>> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
>> or even less, then timing shouldn't be an issue.
> Yes I've currently got spi-max-frequency = <1000000>; in my dts. I
> would also expect a slower frequency would fit my "DON is for TX"
> narrative.
>> Last relevant functional changes have been done almost 4 years ago.
>> And yours is the first such report I see. So the question is what could
>> be so special about your setup that it seems you're the only one being
>> affected. The scenarios you describe are standard, therefore many more
>> people should be affected in case of a driver bug.
> Agreed. But even on my hardware (which may have a latent issue despite 
> being in the field for going on 5 years) the issue only triggers under 
> some fairly specific circumstances.
>> You said that kernel config impacts how frequently the issue happens.
>> Therefore question is what's the diff in kernel config, and how could
>> the differences be related to SPI.
>
> It did seem to be somewhat random. Things like CONFIG_PREEMPT have an
> impact but every time I found something that seemed to be having an
> impact I've been able to disprove it. I actually think it's about how
> busy the system is, which may or may not affect when we get round to
> processing the interrupts.
>
> I have managed to get the 'Transfer done but SPIE_DON isn't set!' to 
> occur on the T2080RDB.
>
> I've had to add the following to expose the environment as an mtd
> partition:
>
> diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi 
> b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> index ff87e67c70da..fbf95fc1fd68 100644
> --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
> @@ -116,6 +116,15 @@ flash@0 {
>     compatible = "micron,n25q512ax3", 
> "jedec,spi-nor";
>     reg = <0>;
>     spi-max-frequency = <10000000>; /* input clock */
> +
> +   partition@u-boot {
> +    reg = <0x 0x0010>;
> +    label = "u-boot";
> +    };
> +    partition@u-boot-env {
> +    reg = <0x0010 0x0001>;
> +    label = "u-boot-env";
> +    };
>     };
>     };
>
> And I'm using the following script to poke at the environment (warning:
> if anyone does try this and the bug hits, it can render your u-boot
> environment invalid).
>
> cat flash/fw_env_test.sh
> #!/bin/sh
>
> generate_fw_env_config()
> {
>   cat 

Re: [PATCH net v3] ibmvnic fix NULL tx_pools and rx_pools issue at do_reset

2020-08-25 Thread Mingming Cao



> On Aug 25, 2020, at 5:31 PM, David Miller  wrote:
> 
> From: Dany Madden 
> Date: Tue, 25 Aug 2020 13:26:41 -0400
> 
>> From: Mingming Cao 
>> 
>> At the time of do_reset, ibmvnic tries to re-initialize the tx_pools
>> and rx_pools to avoid re-allocating the long term buffer. However
>> there is a window inside do_reset where the tx_pools and rx_pools are
>> freed before being re-initialized, making it possible to dereference
>> null pointers.
>> 
>> This patch fixes the issue by always checking that tx_pool and rx_pool
>> are not NULL after ibmvnic_login, and re-allocating the pools if they
>> are. This avoids calling reset_tx/rx_pools with a NULL adapter
>> tx_pools/rx_pools pointer. Also add null pointer checks in
>> reset_tx_pools and reset_rx_pools to safely handle the NULL pointer
>> case.
>> 
>> Signed-off-by: Mingming Cao 
>> Signed-off-by: Dany Madden 
> 
> Applied, but:
> 
>> +if (!adapter->rx_pool)
>> +return -1;
>> +
> 
> This driver has poor error code usage; it's a random mix of hypervisor
> error codes, normal error codes like -EINVAL, and internal error codes,
> sometimes all used in the same function.
> 

Agree, this needs improving. For this patch/fix, -1 was chosen to follow other
parts of the driver that check a NULL pointer and return -1. We should go
through all of the -1 cases and replace them with proper error codes. That
should be a separate patch.

> For example:
> 
> static int ibmvnic_send_crq(struct ibmvnic_adapter *adapter,
>   union ibmvnic_crq *crq)
> ...
>   if (!adapter->crq.active &&
>   crq->generic.first != IBMVNIC_CRQ_INIT_CMD) {
>   dev_warn(dev, "Invalid request detected while CRQ is inactive, possible device state change during reset\n");
>   return -EINVAL;
>   }
> ...
>   rc = plpar_hcall_norets(H_SEND_CRQ, ua,
>   cpu_to_be64(u64_crq[0]),
>   cpu_to_be64(u64_crq[1]));
> 
>   if (rc) {
>   if (rc == H_CLOSED) {
> ...
>   return rc;
> 
> So obviously this function returns a mix of negative error codes
> and hypervisor codes such as H_CLOSED.
> 
> And stuff like:
> 
>   rc = __ibmvnic_open(netdev);
>   if (rc)
>   return IBMVNIC_OPEN_FAILED;

Agree. 

Mingming

Re: [PATCH net v3] ibmvnic fix NULL tx_pools and rx_pools issue at do_reset

2020-08-25 Thread David Miller
From: Dany Madden 
Date: Tue, 25 Aug 2020 13:26:41 -0400

> From: Mingming Cao 
> 
> At the time of do_reset, ibmvnic tries to re-initialize the tx_pools
> and rx_pools to avoid re-allocating the long term buffer. However
> there is a window inside do_reset where the tx_pools and rx_pools are
> freed before being re-initialized, making it possible to dereference
> null pointers.
> 
> This patch fixes the issue by always checking that tx_pool and rx_pool
> are not NULL after ibmvnic_login, and re-allocating the pools if they
> are. This avoids calling reset_tx/rx_pools with a NULL adapter
> tx_pools/rx_pools pointer. Also add null pointer checks in
> reset_tx_pools and reset_rx_pools to safely handle the NULL pointer
> case.
> 
> Signed-off-by: Mingming Cao 
> Signed-off-by: Dany Madden 

Applied, but:

> + if (!adapter->rx_pool)
> + return -1;
> +

This driver has poor error code usage; it's a random mix of hypervisor
error codes, normal error codes like -EINVAL, and internal error codes,
sometimes all used in the same function.

For example:

static int ibmvnic_send_crq(struct ibmvnic_adapter *adapter,
union ibmvnic_crq *crq)
 ...
if (!adapter->crq.active &&
crq->generic.first != IBMVNIC_CRQ_INIT_CMD) {
dev_warn(dev, "Invalid request detected while CRQ is inactive, 
possible device state change during reset\n");
return -EINVAL;
}
 ...
rc = plpar_hcall_norets(H_SEND_CRQ, ua,
cpu_to_be64(u64_crq[0]),
cpu_to_be64(u64_crq[1]));

if (rc) {
if (rc == H_CLOSED) {
 ...
return rc;

So obviously this function returns a mix of negative error codes
and hypervisor codes such as H_CLOSED.

And stuff like:

rc = __ibmvnic_open(netdev);
if (rc)
return IBMVNIC_OPEN_FAILED;
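
To illustrate the kind of normalization being asked for here -- keeping
hypervisor return codes internal and handing the core only negative errnos --
a sketch (the H_* constants are real PAPR codes from asm/hvcall.h, but the
helper and the specific errno choices are assumptions, not driver code):

	/* translate hcall return codes to errnos at the boundary */
	static int ibmvnic_hrc_to_errno(long hrc)
	{
		switch (hrc) {
		case H_SUCCESS:
			return 0;
		case H_CLOSED:		/* CRQ partner closed */
			return -ENOTCONN;
		case H_PARAMETER:
			return -EINVAL;
		default:
			return -EIO;
		}
	}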


Re: [RFT][PATCH 0/7] Avoid overflow at boundary_size

2020-08-25 Thread Nicolin Chen
Hi Niklas,

On Tue, Aug 25, 2020 at 12:16:27PM +0200, Niklas Schnelle wrote:
> On 8/21/20 1:19 AM, Nicolin Chen wrote:
> > We are expanding the default DMA segmentation boundary to its
> > possible maximum value (ULONG_MAX) to indicate that a device
> > doesn't specify a boundary limit. So all dma_get_seg_boundary
> > callers should take precautions with the return value since
> > it can easily overflow.
> > 
> > I scanned the entire kernel tree for all the existing callers
> > and found that most of the callers may overflow in two ways:
> > either doing "+ 1" or passing it to ALIGN(), which does "+ mask".
> > 
> > According to kernel defines:
> > #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
> > #define ALIGN(x, a) ALIGN_MASK(x, (typeof(x))(a) - 1)
> > 
> > We can simplify the logic here:
> >   ALIGN(boundary + 1, 1 << shift) >> shift
> > = ALIGN_MASK(b + 1, (1 << s) - 1) >> s
> > = {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
> > = [b + 1 + (1 << s) - 1] >> s
> > = [b + (1 << s)] >> s
> > = (b >> s) + 1
> > 
> > So this series of patches fix the potential overflow with this
> > overflow-free shortcut.
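
The overflow the shortcut avoids is easy to reproduce in a standalone
program (a sketch; the shift value is just an example):

	#include <stdio.h>

	int main(void)
	{
		unsigned long b = ~0UL;	/* boundary = ULONG_MAX, i.e. "no limit" */
		unsigned int s = 12;	/* example page shift */

		/* ALIGN(b + 1, 1 << s) >> s: b + 1 wraps to 0, result is 0 */
		unsigned long overflowed =
			(((b + 1) + ((1UL << s) - 1)) & ~((1UL << s) - 1)) >> s;
		/* the overflow-free equivalent derived above */
		unsigned long correct = (b >> s) + 1;

		printf("overflowed=%lu correct=%lu\n", overflowed, correct);
		return 0;
	}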
 
> I haven't seen any other feedback from other maintainers,

I am wondering about this too... whether I sent it correctly or not.

> so I guess you will resend this?

Do I need to? Though I won't mind doing so if it's necessary...

> On first glance it seems to make sense.
> I'm a little confused why it is only a "potential overflow"
> while this part
> 
> "We are expending the default DMA segmentation boundary to its
>  possible maximum value (ULONG_MAX) to indicate that a device
>  doesn't specify a boundary limit"
> 
> sounds to me like ULONG_MAX is actually used. Does that
> mean there are currently no devices which do not specify a
> boundary limit?

Sorry for the confusion. We actually applied the ULONG_MAX change
last week but reverted it right after, due to a bug report at
one of these "potential" overflows. So at this moment the top
of the tree doesn't set the default boundary to ULONG_MAX yet.

Thanks
Nic


Please apply commit 0828137e8f16 ("powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()") to v4.14.y, v4.19.y, v5.4.y, v5.7.y

2020-08-25 Thread Thadeu Lima de Souza Cascardo
After commit 912c0a7f2b5daa3cbb2bc10f303981e493de73bd ("powerpc/64s: Save FSCR
to init_task.thread.fscr after feature init"), which has been applied to the
referred branches, when userspace sets the user DSCR SPR, it won't be inherited
or restored during context switch, because the facility unavailable interrupt
won't trigger.

Applying 0828137e8f16721842468e33df0460044a0c588b ("powerpc/64s: Don't init
FSCR_DSCR in __init_FSCR()") will fix it.

Cascardo.
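
For anyone wanting to check a kernel quickly, the regression can be
demonstrated from userspace along these lines (a sketch modeled on the
powerpc DSCR selftests; SPR 3 is the problem-state DSCR on ISA 2.07+, and
the test itself is an assumption, not one of the in-tree selftests):

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/wait.h>

	#define SPRN_DSCR_USR	3	/* problem-state DSCR */

	int main(void)
	{
		unsigned long want = 1, got;
		int status;

		/* set the user DSCR, then see if a child inherits it */
		asm volatile("mtspr %0, %1" : : "i"(SPRN_DSCR_USR), "r"(want));
		if (fork() == 0) {
			asm volatile("mfspr %0, %1" : "=r"(got) : "i"(SPRN_DSCR_USR));
			exit(got == want ? 0 : 1);	/* affected kernels lose it */
		}
		wait(&status);
		printf("DSCR %s inherited\n", WEXITSTATUS(status) ? "NOT" : "was");
		return WEXITSTATUS(status);
	}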


Re: fsl_espi errors on v5.7.15

2020-08-25 Thread Chris Packham
On 25/08/20 7:22 pm, Heiner Kallweit wrote:


>> I've been staring at spi-fsl-espi.c for a while now and I think I've
>> identified a couple of deficiencies that may or may not be related to my
>> issue.
>>
>> First I think the 'Transfer done but SPIE_DON isn't set' message can be
>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE register.
>> We also write back to it to clear the current events. We re-read it in
>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
>> naturally end up in that situation if we're doing a large read. Consider
>> the messages for reading a block of data from a spi-nor chip:
>>
>>    tx = READ_OP + ADDR
>>    rx = data
>>
>> We set up the transfer and pump out the tx_buf. The first interrupt goes
>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
>> continues until we've received all the data and we finish with ESPI_SPIE
>> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
>> isn't set.
>>
>> The other deficiency is that we only get an interrupt when the amount of
>> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
>> FSL_ESPI_RXTHR bytes left to be received we will never pull them out of the fifo.
>>
> SPIM_DON will trigger an interrupt once the last characters have been
> transferred, and read the remaining characters from the FIFO.
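
In code, that suggestion amounts to something like the following sketch (the
espi_*_reg() and fsl_espi_fill_rx() helpers and the struct members are
hypothetical, not the actual spi-fsl-espi.c handler; register and bit names
follow the discussion):

	/* always drain RX on any event; treat SPIE_DON as "last character
	 * transmitted", not "transfer complete" */
	static irqreturn_t fsl_espi_irq_sketch(int irq, void *context_data)
	{
		struct fsl_espi *espi = context_data;
		u32 events = espi_read_reg(espi, ESPI_SPIE);

		if (!(events & (SPIM_DON | SPIM_RXT)))
			return IRQ_NONE;

		/* drain whatever is in the FIFO, even below FSL_ESPI_RXTHR */
		fsl_espi_fill_rx(espi);

		espi_write_reg(espi, ESPI_SPIE, events);	/* ack the events */

		/* complete only once TX is done and all RX bytes have arrived */
		if ((events & SPIM_DON) && espi->rx_done >= espi->rx_len)
			complete(&espi->done);

		return IRQ_HANDLED;
	}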

The T2080RM that I have says the following about the DON bit

"Last character was transmitted. The last character was transmitted and 
a new command can be written for the next frame."

That does at least seem to fit with my assertion that it's all about the 
TX direction. But the fact that it doesn't happen all the time throws 
some doubt on it.

>> I think the reason I'm seeing some variability is because of how fast
>> (or slow) the interrupts get processed and how fast the spi-nor chip can
>> fill the CPU's rx fifo.
>>
> To rule out timing issues at high bus frequencies I initially asked
> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
> or even less, then timing shouldn't be an issue.
Yes I've currently got spi-max-frequency = <1000000>; in my dts. I would
also expect a slower frequency would fit my "DON is for TX" narrative.
> Last relevant functional changes have been done almost 4 years ago.
> And yours is the first such report I see. So the question is what could be so
> special about your setup that it seems you're the only one being affected.
> The scenarios you describe are standard, therefore many more people
> should be affected in case of a driver bug.
Agreed. But even on my hardware (which may have a latent issue despite 
being in the field for going on 5 years) the issue only triggers under 
some fairly specific circumstances.
> You said that kernel config impacts how frequently the issue happens.
> Therefore question is what's the diff in kernel config, and how could
> the differences be related to SPI.

It did seem to be somewhat random. Things like CONFIG_PREEMPT have an
impact but every time I found something that seemed to be having an
impact I've been able to disprove it. I actually think it's about how
busy the system is, which may or may not affect when we get round to
processing the interrupts.

I have managed to get the 'Transfer done but SPIE_DON isn't set!' to 
occur on the T2080RDB.

I've had to add the following to expose the environment as an mtd partition:

diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi 
b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
index ff87e67c70da..fbf95fc1fd68 100644
--- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
@@ -116,6 +116,15 @@ flash@0 {
     compatible = "micron,n25q512ax3", 
"jedec,spi-nor";
     reg = <0>;
     spi-max-frequency = <10000000>; /* input clock */
+
+   partition@u-boot {
+    reg = <0x 0x0010>;
+    label = "u-boot";
+    };
+    partition@u-boot-env {
+    reg = <0x0010 0x0001>;
+    label = "u-boot-env";
+    };
     };
     };

And I'm using the following script to poke at the environment (warning:
if anyone does try this and the bug hits, it can render your u-boot
environment invalid).

cat flash/fw_env_test.sh
#!/bin/sh

generate_fw_env_config()
{
   cat /proc/mtd | sed 's/[:"]//g' | while read dev size erasesize name ; do
  echo "$dev $size $erasesize $name"
  [ "$name" = "u-boot-env" ] && echo "/dev/$dev 0x 0x2000 
$erasesize" >/flash/fw_env.config
   done
}

cycles=10
[ $# -ge 
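
For reference, the /proc/mtd lines the function above parses look like this
(device numbers and sizes are illustrative); the sed strips the ':' and '"'
characters and the read splits each line into dev, size, erasesize and name:

	dev:    size   erasesize  name
	mtd0: 00100000 00010000 "u-boot"
	mtd1: 00010000 00010000 "u-boot-env"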

Re: [PATCH net v2] ibmvnic fix NULL tx_pools and rx_pools issue at do_reset

2020-08-25 Thread Mingming Cao
> On Aug 25, 2020, at 10:08 AM, David Miller wrote:
> 
> From: Dany Madden 
> Date: Tue, 25 Aug 2020 12:56:06 -0400
> 
>> @@ -2011,7 +2017,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>> 		    adapter->req_rx_add_entries_per_subcrq !=
>> 		    old_num_rx_slots ||
>> 		    adapter->req_tx_entries_per_subcrq !=
>> -		    old_num_tx_slots) {
>> +		    old_num_tx_slots ||
>> +			!adapter->rx_pool ||
>> +			!adapter->tso_pool ||
>> +			!adapter->tx_pool) {
> 
> Please don't over indent these new lines, indent them identically as the
> lines above where you are adding new conditions.
> 
> Thank you.

Okay, good catch. Thanks!

Mingming

[PATCH net v3] ibmvnic fix NULL tx_pools and rx_pools issue at do_reset

2020-08-25 Thread Dany Madden
From: Mingming Cao 

At the time of do_reset, ibmvnic tries to re-initialize the tx_pools
and rx_pools to avoid re-allocating the long term buffer. However
there is a window inside do_reset where the tx_pools and rx_pools are
freed before being re-initialized, making it possible to dereference
null pointers.

This patch fixes the issue by always checking that tx_pool and rx_pool
are not NULL after ibmvnic_login, and re-allocating the pools if they
are. This avoids calling reset_tx/rx_pools with a NULL adapter
tx_pools/rx_pools pointer. Also add null pointer checks in
reset_tx_pools and reset_rx_pools to safely handle the NULL pointer
case.

Signed-off-by: Mingming Cao 
Signed-off-by: Dany Madden 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 5afb3c9c52d2..d3a774331afc 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -479,6 +479,9 @@ static int reset_rx_pools(struct ibmvnic_adapter *adapter)
int i, j, rc;
u64 *size_array;
 
+   if (!adapter->rx_pool)
+   return -1;
+
size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size));
 
@@ -649,6 +652,9 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
int tx_scrqs;
int i, rc;
 
+   if (!adapter->tx_pool)
+   return -1;
+
tx_scrqs = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs);
for (i = 0; i < tx_scrqs; i++) {
		rc = reset_one_tx_pool(adapter, &adapter->tso_pool[i]);
@@ -2011,7 +2017,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
adapter->req_rx_add_entries_per_subcrq !=
old_num_rx_slots ||
adapter->req_tx_entries_per_subcrq !=
-   old_num_tx_slots) {
+   old_num_tx_slots ||
+   !adapter->rx_pool ||
+   !adapter->tso_pool ||
+   !adapter->tx_pool) {
release_rx_pools(adapter);
release_tx_pools(adapter);
release_napi(adapter);
@@ -2024,10 +2033,14 @@ static int do_reset(struct ibmvnic_adapter *adapter,
} else {
rc = reset_tx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset tx pools 
failed (%d)\n",
+   rc);
goto out;
 
rc = reset_rx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset rx pools 
failed (%d)\n",
+   rc);
goto out;
}
ibmvnic_disable_irqs(adapter);
-- 
2.18.2



Re: [PATCH net v2] ibmvnic fix NULL tx_pools and rx_pools issue at do_reset

2020-08-25 Thread David Miller
From: Dany Madden 
Date: Tue, 25 Aug 2020 12:56:06 -0400

> @@ -2011,7 +2017,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>   adapter->req_rx_add_entries_per_subcrq !=
>   old_num_rx_slots ||
>   adapter->req_tx_entries_per_subcrq !=
> - old_num_tx_slots) {
> + old_num_tx_slots ||
> + !adapter->rx_pool ||
> + !adapter->tso_pool ||
> + !adapter->tx_pool) {

Please don't over indent these new lines, indent them identically as the
lines above where you are adding new conditions.

Thank you.


[PATCH net v2] ibmvnic fix NULL tx_pools and rx_pools issue at do_reset

2020-08-25 Thread Dany Madden
From: Mingming Cao 

At the time of do_reset, ibmvnic tries to re-initialize the tx_pools
and rx_pools to avoid re-allocating the long term buffer. However
there is a window inside do_reset where the tx_pools and rx_pools are
freed before being re-initialized, making it possible to dereference
null pointers.

This patch fixes the issue by always checking that tx_pool and rx_pool
are not NULL after ibmvnic_login, and re-allocating the pools if they
are. This avoids calling reset_tx/rx_pools with a NULL adapter
tx_pools/rx_pools pointer. Also add null pointer checks in
reset_tx_pools and reset_rx_pools to safely handle the NULL pointer
case.

Signed-off-by: Mingming Cao 
Signed-off-by: Dany Madden 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 5afb3c9c52d2..52feee97821e 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -479,6 +479,9 @@ static int reset_rx_pools(struct ibmvnic_adapter *adapter)
int i, j, rc;
u64 *size_array;
 
+   if (!adapter->rx_pool)
+   return -1;
+
size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size));
 
@@ -649,6 +652,9 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
int tx_scrqs;
int i, rc;
 
+   if (!adapter->tx_pool)
+   return -1;
+
tx_scrqs = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs);
for (i = 0; i < tx_scrqs; i++) {
		rc = reset_one_tx_pool(adapter, &adapter->tso_pool[i]);
@@ -2011,7 +2017,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
adapter->req_rx_add_entries_per_subcrq !=
old_num_rx_slots ||
adapter->req_tx_entries_per_subcrq !=
-   old_num_tx_slots) {
+   old_num_tx_slots ||
+   !adapter->rx_pool ||
+   !adapter->tso_pool ||
+   !adapter->tx_pool) {
release_rx_pools(adapter);
release_tx_pools(adapter);
release_napi(adapter);
@@ -2024,10 +2033,14 @@ static int do_reset(struct ibmvnic_adapter *adapter,
} else {
rc = reset_tx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset tx pools 
failed (%d)\n",
+   rc);
goto out;
 
rc = reset_rx_pools(adapter);
if (rc)
+   netdev_dbg(adapter->netdev, "reset rx pools 
failed (%d)\n",
+   rc);
goto out;
}
ibmvnic_disable_irqs(adapter);
-- 
2.18.2



[PATCH v7 12/12] powerpc/64s/radix: Enable huge vmalloc mappings

2020-08-25 Thread Nicholas Piggin
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 ++
 arch/powerpc/Kconfig| 1 +
 2 files changed, 3 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index bdc1f33fd3d1..6f0b41289a90 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3190,6 +3190,8 @@
 
nohugeiomap [KNL,X86,PPC] Disable kernel huge I/O mappings.
 
+   nohugevmalloc   [PPC] Disable kernel huge vmalloc mappings.
+
nosmt   [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..9171d25ad7dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -175,6 +175,7 @@ config PPC
select GENERIC_TIME_VSYSCALL
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && 
PPC_RADIX_MMU
+   select HAVE_ARCH_HUGE_VMALLOC   if HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
-- 
2.23.0



[PATCH v7 11/12] mm/vmalloc: Hugepage vmalloc mappings

2020-08-25 Thread Nicholas Piggin
Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
support PMD-sized vmap mappings.

vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or
larger, and fall back to small pages if that was unsuccessful.

Allocations that do not use PAGE_KERNEL prot are not permitted to use huge
pages, because not all callers expect this (e.g., module allocations vs
strict module rwx).

This reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.

This can result in more internal fragmentation and memory overhead for a
given allocation, so an option, nohugevmalloc, is added to disable it at boot.
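
As a sketch of that allocation policy (not the series' actual allocator
changes; the helper name and the exact checks are illustrative):

	/* pick the page order to try first; caller falls back to order 0 */
	static unsigned int vmalloc_page_order(unsigned long size, pgprot_t prot)
	{
		if (!vmap_allow_huge)
			return 0;
		/* huge pages only for PAGE_KERNEL, e.g. not module rwx mappings */
		if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL))
			return 0;
		if (size < PMD_SIZE)
			return 0;
		return PMD_SHIFT - PAGE_SHIFT;
	}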

Signed-off-by: Nicholas Piggin 
---
 arch/Kconfig|   4 +
 include/linux/vmalloc.h |   1 +
 mm/page_alloc.c |   5 +-
 mm/vmalloc.c| 180 ++--
 4 files changed, 145 insertions(+), 45 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..b2b89d629317 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -616,6 +616,10 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 config HAVE_ARCH_HUGE_VMAP
bool
 
+config HAVE_ARCH_HUGE_VMALLOC
+   depends on HAVE_ARCH_HUGE_VMAP
+   bool
+
 config ARCH_WANT_HUGE_PMD_SHARE
bool
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 15adb9a14fb6..a7449064fe35 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -58,6 +58,7 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
+   unsigned intpage_order;
unsigned intnr_pages;
phys_addr_t phys_addr;
const void  *caller;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e2bab486fea..b6427cc7b838 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -8102,6 +8103,7 @@ void *__init alloc_large_system_hash(const char 
*tablename,
void *table = NULL;
gfp_t gfp_flags;
bool virt;
+   bool huge;
 
/* allow the kernel cmdline to have a say */
if (!numentries) {
@@ -8169,6 +8171,7 @@ void *__init alloc_large_system_hash(const char 
*tablename,
} else if (get_order(size) >= MAX_ORDER || hashdist) {
table = __vmalloc(size, gfp_flags);
virt = true;
+   huge = (find_vm_area(table)->page_order > 0);
} else {
/*
 * If bucketsize is not a power-of-two, we may free
@@ -8185,7 +8188,7 @@ void *__init alloc_large_system_hash(const char 
*tablename,
 
pr_info("%s hash table entries: %ld (order: %d, %lu bytes, %s)\n",
tablename, 1UL << log2qty, ilog2(size) - PAGE_SHIFT, size,
-   virt ? "vmalloc" : "linear");
+   virt ? (huge ? "vmalloc hugepage" : "vmalloc") : "linear");
 
if (_hash_shift)
*_hash_shift = log2qty;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1d6cad16bda3..8db53c2d7f72 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -44,6 +44,19 @@
 #include "internal.h"
 #include "pgalloc-track.h"
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
+static bool __ro_after_init vmap_allow_huge = true;
+
+static int __init set_nohugevmalloc(char *str)
+{
+   vmap_allow_huge = false;
+   return 0;
+}
+early_param("nohugevmalloc", set_nohugevmalloc);
+#else /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
+static const bool vmap_allow_huge = false;
+#endif /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
+
 bool is_vmalloc_addr(const void *x)
 {
unsigned long addr = (unsigned long)x;
@@ -477,31 +490,12 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long 
addr,
return 0;
 }
 
-/**
- * map_kernel_range_noflush - map kernel VM area with the specified pages
- * @addr: start of the VM area to map
- * @size: size of the VM area to map
- * @prot: page protection flags to use
- * @pages: pages to map
- *
- * Map PFN_UP(@size) pages at @addr.  The VM area @addr and @size specify 
should
- * have been allocated using get_vm_area() and its friends.
- *
- * NOTE:
- * This function does NOT do any cache flushing.  The caller is responsible for
- * calling flush_cache_vmap() on to-be-mapped areas before calling this
- * function.
- *
- * RETURNS:
- * 0 on success, -errno on failure.
- */
-int map_kernel_range_noflush(unsigned long addr, unsigned long size,
-pgprot_t prot, struct page **pages)
+static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
+		pgprot_t prot, struct page **pages)
 {
unsigned long start = addr;
-   unsigned long end = addr + size;
-   unsigned long next;
pgd_t *pgd;
+   

[PATCH v7 10/12] mm/vmalloc: add vmap_range_noflush variant

2020-08-25 Thread Nicholas Piggin
As a side-effect, the order of the flush_cache_vmap() and
arch_sync_kernel_mappings() calls is switched, but that now matches
the other callers in this file.

Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 256554d598e6..1d6cad16bda3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -237,7 +237,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, 
unsigned long end,
return 0;
 }
 
-int vmap_range(unsigned long addr, unsigned long end,
+static int vmap_range_noflush(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot,
unsigned int max_page_shift)
 {
@@ -259,14 +259,24 @@ int vmap_range(unsigned long addr, unsigned long end,
break;
} while (pgd++, phys_addr += (next - addr), addr = next, addr != end);
 
-   flush_cache_vmap(start, end);
-
if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
arch_sync_kernel_mappings(start, end);
 
return err;
 }
 
+int vmap_range(unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   unsigned int max_page_shift)
+{
+   int err;
+
+   err = vmap_range_noflush(addr, end, phys_addr, prot, max_page_shift);
+   flush_cache_vmap(addr, end);
+
+   return err;
+}
+
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 pgtbl_mod_mask *mask)
 {
-- 
2.23.0



[PATCH v7 09/12] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c

2020-08-25 Thread Nicholas Piggin
This is a generic kernel virtual memory mapper, not specific to ioremap.

Signed-off-by: Nicholas Piggin 
---
 include/linux/vmalloc.h |   3 +
 mm/ioremap.c| 197 
 mm/vmalloc.c| 196 +++
 3 files changed, 199 insertions(+), 197 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3f6bba4cc9bc..15adb9a14fb6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -177,6 +177,9 @@ extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 
 #ifdef CONFIG_MMU
+int vmap_range(unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   unsigned int max_page_shift);
 extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
pgprot_t prot, struct page **pages);
 int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot,
diff --git a/mm/ioremap.c b/mm/ioremap.c
index c67f91164401..d1dcc7e744ac 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -28,203 +28,6 @@ early_param("nohugeiomap", set_nohugeiomap);
 static const bool iomap_max_page_shift = PAGE_SHIFT;
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
-static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
-{
-   pte_t *pte;
-   u64 pfn;
-
-   pfn = phys_addr >> PAGE_SHIFT;
-   pte = pte_alloc_kernel_track(pmd, addr, mask);
-   if (!pte)
-   return -ENOMEM;
-   do {
-   BUG_ON(!pte_none(*pte));
-   set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
-   pfn++;
-   } while (pte++, addr += PAGE_SIZE, addr != end);
-   *mask |= PGTBL_PTE_MODIFIED;
-   return 0;
-}
-
-static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift)
-{
-   if (max_page_shift < PMD_SHIFT)
-   return 0;
-
-   if (!arch_vmap_pmd_supported(prot))
-   return 0;
-
-   if ((end - addr) != PMD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(addr, PMD_SIZE))
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PMD_SIZE))
-   return 0;
-
-   if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
-   return 0;
-
-   return pmd_set_huge(pmd, phys_addr, prot);
-}
-
-static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift, pgtbl_mod_mask *mask)
-{
-   pmd_t *pmd;
-   unsigned long next;
-
-   pmd = pmd_alloc_track(&init_mm, pud, addr, mask);
-   if (!pmd)
-   return -ENOMEM;
-   do {
-   next = pmd_addr_end(addr, end);
-
-   if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot, 
max_page_shift)) {
-   *mask |= PGTBL_PMD_MODIFIED;
-   continue;
-   }
-
-   if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask))
-   return -ENOMEM;
-   } while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
-   return 0;
-}
-
-static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift)
-{
-   if (max_page_shift < PUD_SHIFT)
-   return 0;
-
-   if (!arch_vmap_pud_supported(prot))
-   return 0;
-
-   if ((end - addr) != PUD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(addr, PUD_SIZE))
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PUD_SIZE))
-   return 0;
-
-   if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
-   return 0;
-
-   return pud_set_huge(pud, phys_addr, prot);
-}
-
-static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift, pgtbl_mod_mask *mask)
-{
-   pud_t *pud;
-   unsigned long next;
-
-   pud = pud_alloc_track(&init_mm, p4d, addr, mask);
-   if (!pud)
-   return -ENOMEM;
-   do {
-   next = pud_addr_end(addr, end);
-
-   if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot, 
max_page_shift)) {
-   *mask |= PGTBL_PUD_MODIFIED;
-   continue;
-   }
-
-   if (vmap_pmd_range(pud, addr, next, phys_addr, prot, 
max_page_shift, mask))
-   return -ENOMEM;
-   } while 

[PATCH v7 08/12] x86: inline huge vmap supported functions

2020-08-25 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so
p4d_free_pud_page can be removed because it's no longer linked to.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Signed-off-by: Nicholas Piggin 
---

Ack or objection if this goes via the -mm tree?

 arch/x86/include/asm/vmalloc.h | 22 +++---
 arch/x86/mm/ioremap.c  | 19 ---
 arch/x86/mm/pgtable.c  | 13 -
 3 files changed, 19 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
index 094ea2b565f3..e714b00fc0ca 100644
--- a/arch/x86/include/asm/vmalloc.h
+++ b/arch/x86/include/asm/vmalloc.h
@@ -1,13 +1,29 @@
 #ifndef _ASM_X86_VMALLOC_H
 #define _ASM_X86_VMALLOC_H
 
+#include 
 #include 
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+#ifdef CONFIG_X86_64
+   return boot_cpu_has(X86_FEATURE_GBPAGES);
+#else
+   return false;
+#endif
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return boot_cpu_has(X86_FEATURE_PSE);
+}
 #endif
 
 #endif /* _ASM_X86_VMALLOC_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 159bfca757b9..1465a22a9bfb 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -481,25 +481,6 @@ void iounmap(volatile void __iomem *addr)
 }
 EXPORT_SYMBOL(iounmap);
 
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-#ifdef CONFIG_X86_64
-   return boot_cpu_has(X86_FEATURE_GBPAGES);
-#else
-   return false;
-#endif
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   return boot_cpu_has(X86_FEATURE_PSE);
-}
-
 /*
  * Convert a physical pointer to a virtual kernel pointer for /dev/mem
  * access
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index dfd82f51ba66..801c418ee97d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -780,14 +780,6 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
 }
 
-/*
- * Until we support 512GB pages, skip them in the vmap area.
- */
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;
-}
-
 #ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
@@ -859,11 +851,6 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud, unsigned long addr)
-{
-   return pud_none(*pud);
-}
-
 /*
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
-- 
2.23.0



[PATCH v7 07/12] arm64: inline huge vmap supported functions

2020-08-25 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so
p4d_free_pud_page can be removed because it's no longer linked to.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Nicholas Piggin 
---

Ack or objection if this goes via the -mm tree?

 arch/arm64/include/asm/vmalloc.h | 23 ---
 arch/arm64/mm/mmu.c  | 26 --
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 597b40405319..fc9a12d6cc1a 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -4,9 +4,26 @@
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   /*
+* Only 4k granule supports level 1 block mappings.
+* SW table walks can't handle removal of intermediate entries.
+*/
+   return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
+  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   /* See arch_vmap_pud_supported() */
+   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+}
 #endif
 
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9df7e0058c78..07093e148957 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1304,27 +1304,6 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int 
*size, pgprot_t prot)
return dt_virt;
 }
 
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-   /*
-* Only 4k granule supports level 1 block mappings.
-* SW table walks can't handle removal of intermediate entries.
-*/
-   return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
-  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   /* See arch_vmap_pud_supported() */
-   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
-}
-
 int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
 {
pud_t new_pud = pfn_pud(__phys_to_pfn(phys), mk_pud_sect_prot(prot));
@@ -1416,11 +1395,6 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
return 1;
 }
 
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;   /* Don't attempt a block mapping */
-}
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 {
-- 
2.23.0



[PATCH v7 06/12] powerpc: inline huge vmap supported functions

2020-08-25 Thread Nicholas Piggin
This allows unsupported levels to be constant folded away, and so
p4d_free_pud_page can be removed because it's no longer linked to.

Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin 
---

Ack or objection if this goes via the -mm tree? 

 arch/powerpc/include/asm/vmalloc.h   | 19 ---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 21 -
 2 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/vmalloc.h 
b/arch/powerpc/include/asm/vmalloc.h
index 105abb73f075..3f0c153befb0 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -1,12 +1,25 @@
 #ifndef _ASM_POWERPC_VMALLOC_H
 #define _ASM_POWERPC_VMALLOC_H
 
+#include 
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   /* HPT does not cope with large pages in the vmalloc area */
+   return radix_enabled();
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return radix_enabled();
+}
 #endif
 
 #endif /* _ASM_POWERPC_VMALLOC_H */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index eca83a50bf2e..27f5837cf145 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1134,22 +1134,6 @@ void radix__ptep_modify_prot_commit(struct 
vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
 }
 
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-   /* HPT does not cope with large pages in the vmalloc area */
-   return radix_enabled();
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   return radix_enabled();
-}
-
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;
-}
-
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
pte_t *ptep = (pte_t *)pud;
@@ -1233,8 +1217,3 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
return 1;
 }
-
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-- 
2.23.0



[PATCH v7 05/12] mm: HUGE_VMAP arch support cleanup

2020-08-25 Thread Nicholas Piggin
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.

This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.

This also adds a prot argument to the arch query. This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).
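
For example, an architecture could use the prot argument along these lines
(a hypothetical policy; the powerpc hunks below ignore prot for now, and the
flag test is an assumption):

	/* refuse huge vmap mappings for non-cacheable protections */
	static inline bool arch_vmap_pmd_supported(pgprot_t prot)
	{
		if (pgprot_val(prot) & _PAGE_NO_CACHE)
			return false;
		return radix_enabled();
	}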

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Signed-off-by: Nicholas Piggin 
---

Ack or objection from arch maintainers if this goes via the -mm tree?

 arch/arm64/include/asm/vmalloc.h |  8 +++
 arch/arm64/mm/mmu.c  | 10 +--
 arch/powerpc/include/asm/vmalloc.h   |  8 +++
 arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +--
 arch/x86/include/asm/vmalloc.h   |  7 ++
 arch/x86/mm/ioremap.c| 10 +--
 include/linux/io.h   |  9 ---
 include/linux/vmalloc.h  |  6 ++
 init/main.c  |  1 -
 mm/ioremap.c | 88 +---
 10 files changed, 77 insertions(+), 78 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 2ca708ab9b20..597b40405319 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -1,4 +1,12 @@
 #ifndef _ASM_ARM64_VMALLOC_H
 #define _ASM_ARM64_VMALLOC_H
 
+#include 
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62fea1b6..9df7e0058c78 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1304,12 +1304,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int 
*size, pgprot_t prot)
return dt_virt;
 }
 
-int __init arch_ioremap_p4d_supported(void)
+bool arch_vmap_p4d_supported(pgprot_t prot)
 {
-   return 0;
+   return false;
 }
 
-int __init arch_ioremap_pud_supported(void)
+bool arch_vmap_pud_supported(pgprot_t prot)
 {
/*
 * Only 4k granule supports level 1 block mappings.
@@ -1319,9 +1319,9 @@ int __init arch_ioremap_pud_supported(void)
   !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
-int __init arch_ioremap_pmd_supported(void)
+bool arch_vmap_pmd_supported(pgprot_t prot)
 {
-   /* See arch_ioremap_pud_supported() */
+   /* See arch_vmap_pud_supported() */
return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
diff --git a/arch/powerpc/include/asm/vmalloc.h 
b/arch/powerpc/include/asm/vmalloc.h
index b992dfaaa161..105abb73f075 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -1,4 +1,12 @@
 #ifndef _ASM_POWERPC_VMALLOC_H
 #define _ASM_POWERPC_VMALLOC_H
 
+#include 
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_POWERPC_VMALLOC_H */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 28c784976bed..eca83a50bf2e 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1134,13 +1134,13 @@ void radix__ptep_modify_prot_commit(struct 
vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
 }
 
-int __init arch_ioremap_pud_supported(void)
+bool arch_vmap_pud_supported(pgprot_t prot)
 {
/* HPT does not cope with large pages in the vmalloc area */
return radix_enabled();
 }
 
-int __init arch_ioremap_pmd_supported(void)
+bool arch_vmap_pmd_supported(pgprot_t prot)
 {
return radix_enabled();
 }
@@ -1234,7 +1234,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
return 1;
 }
 
-int __init arch_ioremap_p4d_supported(void)
+bool arch_vmap_p4d_supported(pgprot_t prot)
 {
-   return 0;
+   return false;
 }
diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
index 29837740b520..094ea2b565f3 100644
--- a/arch/x86/include/asm/vmalloc.h
+++ b/arch/x86/include/asm/vmalloc.h
@@ -1,6 +1,13 @@
 #ifndef _ASM_X86_VMALLOC_H
 #define _ASM_X86_VMALLOC_H
 
+#include 
 #include 
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_X86_VMALLOC_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 84d85dbd1dad..159bfca757b9 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -481,21 +481,21 @@ void iounmap(volatile 

[PATCH v7 04/12] mm/ioremap: rename ioremap_*_range to vmap_*_range

2020-08-25 Thread Nicholas Piggin
This will be used as a generic kernel virtual mapping function, so
rename it in preparation.

Signed-off-by: Nicholas Piggin 
---
 mm/ioremap.c | 64 +++-
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/mm/ioremap.c b/mm/ioremap.c
index 5fa1ab41d152..3f4d36f9745a 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -61,9 +61,9 @@ static inline int ioremap_pud_enabled(void) { return 0; }
 static inline int ioremap_pmd_enabled(void) { return 0; }
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
-static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pte_t *pte;
u64 pfn;
@@ -81,9 +81,8 @@ static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
return 0;
 }
 
-static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_pmd_enabled())
return 0;
@@ -103,9 +102,9 @@ static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long 
addr,
return pmd_set_huge(pmd, phys_addr, prot);
 }
 
-static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pmd_t *pmd;
unsigned long next;
@@ -116,20 +115,19 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned 
long addr,
do {
next = pmd_addr_end(addr, end);
 
-   if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
+   if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
*mask |= PGTBL_PMD_MODIFIED;
continue;
}
 
-   if (ioremap_pte_range(pmd, addr, next, phys_addr, prot, mask))
+   if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask))
return -ENOMEM;
} while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
return 0;
 }
 
-static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_pud_enabled())
return 0;
@@ -149,9 +147,9 @@ static int ioremap_try_huge_pud(pud_t *pud, unsigned long 
addr,
return pud_set_huge(pud, phys_addr, prot);
 }
 
-static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pud_t *pud;
unsigned long next;
@@ -162,20 +160,19 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned 
long addr,
do {
next = pud_addr_end(addr, end);
 
-   if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
+   if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
*mask |= PGTBL_PUD_MODIFIED;
continue;
}
 
-   if (ioremap_pmd_range(pud, addr, next, phys_addr, prot, mask))
+   if (vmap_pmd_range(pud, addr, next, phys_addr, prot, mask))
return -ENOMEM;
} while (pud++, phys_addr += (next - addr), addr = next, addr != end);
return 0;
 }
 
-static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_p4d_enabled())
return 0;
@@ -195,9 +192,9 @@ static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long 
addr,
return p4d_set_huge(p4d, phys_addr, prot);
 }
 
-static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
-   

[PATCH v7 03/12] mm/vmalloc: rename vmap_*_range vmap_pages_*_range

2020-08-25 Thread Nicholas Piggin
The vmalloc mapper operates on a struct page * array rather than a
linear physical address; rename it to make this distinction clear.

Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4e9b21adc73d..45cd80ec7eeb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -189,7 +189,7 @@ void unmap_kernel_range_noflush(unsigned long start, 
unsigned long size)
arch_sync_kernel_mappings(start, end);
 }
 
-static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
+static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -217,7 +217,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
return 0;
 }
 
-static int vmap_pmd_range(pud_t *pud, unsigned long addr,
+static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -229,13 +229,13 @@ static int vmap_pmd_range(pud_t *pud, unsigned long addr,
return -ENOMEM;
do {
next = pmd_addr_end(addr, end);
-   if (vmap_pte_range(pmd, addr, next, prot, pages, nr, mask))
+		if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask))
return -ENOMEM;
} while (pmd++, addr = next, addr != end);
return 0;
 }
 
-static int vmap_pud_range(p4d_t *p4d, unsigned long addr,
+static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -247,13 +247,13 @@ static int vmap_pud_range(p4d_t *p4d, unsigned long addr,
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
-   if (vmap_pmd_range(pud, addr, next, prot, pages, nr, mask))
+		if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask))
return -ENOMEM;
} while (pud++, addr = next, addr != end);
return 0;
 }
 
-static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
+static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -265,7 +265,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
return -ENOMEM;
do {
next = p4d_addr_end(addr, end);
-   if (vmap_pud_range(p4d, addr, next, prot, pages, nr, mask))
+		if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask))
return -ENOMEM;
} while (p4d++, addr = next, addr != end);
return 0;
@@ -306,7 +306,7 @@ int map_kernel_range_noflush(unsigned long addr, unsigned 
long size,
next = pgd_addr_end(addr, end);
if (pgd_bad(*pgd))
mask |= PGTBL_PGD_MODIFIED;
-		err = vmap_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
+		err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
if (err)
return err;
} while (pgd++, addr = next, addr != end);
-- 
2.23.0



[PATCH v7 02/12] mm: apply_to_pte_range warn and fail if a large pte is encountered

2020-08-25 Thread Nicholas Piggin
apply_to_pte_range might mistake a large pte for bad, or treat it as a
page table, resulting in a crash or corruption. Add a test to warn and
return an error if large entries are found.

Signed-off-by: Nicholas Piggin 
---
 mm/memory.c | 60 +++--
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 602f4283122f..995b2e790b79 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2262,13 +2262,20 @@ static int apply_to_pmd_range(struct mm_struct *mm, 
pud_t *pud,
}
do {
next = pmd_addr_end(addr, end);
-   if (create || !pmd_none_or_clear_bad(pmd)) {
-   err = apply_to_pte_range(mm, pmd, addr, next, fn, data,
-create);
-   if (err)
-   break;
+   if (pmd_none(*pmd) && !create)
+   continue;
+   if (WARN_ON_ONCE(pmd_leaf(*pmd)))
+   return -EINVAL;
+   if (!pmd_none(*pmd) && WARN_ON_ONCE(pmd_bad(*pmd))) {
+   if (!create)
+   continue;
+   pmd_clear_bad(pmd);
}
+   err = apply_to_pte_range(mm, pmd, addr, next, fn, data, create);
+   if (err)
+   break;
} while (pmd++, addr = next, addr != end);
+
return err;
 }
 
@@ -2289,13 +2296,20 @@ static int apply_to_pud_range(struct mm_struct *mm, 
p4d_t *p4d,
}
do {
next = pud_addr_end(addr, end);
-   if (create || !pud_none_or_clear_bad(pud)) {
-   err = apply_to_pmd_range(mm, pud, addr, next, fn, data,
-create);
-   if (err)
-   break;
+   if (pud_none(*pud) && !create)
+   continue;
+   if (WARN_ON_ONCE(pud_leaf(*pud)))
+   return -EINVAL;
+   if (!pud_none(*pud) && WARN_ON_ONCE(pud_bad(*pud))) {
+   if (!create)
+   continue;
+   pud_clear_bad(pud);
}
+   err = apply_to_pmd_range(mm, pud, addr, next, fn, data, create);
+   if (err)
+   break;
} while (pud++, addr = next, addr != end);
+
return err;
 }
 
@@ -2316,13 +2330,20 @@ static int apply_to_p4d_range(struct mm_struct *mm, 
pgd_t *pgd,
}
do {
next = p4d_addr_end(addr, end);
-   if (create || !p4d_none_or_clear_bad(p4d)) {
-   err = apply_to_pud_range(mm, p4d, addr, next, fn, data,
-create);
-   if (err)
-   break;
+   if (p4d_none(*p4d) && !create)
+   continue;
+   if (WARN_ON_ONCE(p4d_leaf(*p4d)))
+   return -EINVAL;
+   if (!p4d_none(*p4d) && WARN_ON_ONCE(p4d_bad(*p4d))) {
+   if (!create)
+   continue;
+   p4d_clear_bad(p4d);
}
+   err = apply_to_pud_range(mm, p4d, addr, next, fn, data, create);
+   if (err)
+   break;
} while (p4d++, addr = next, addr != end);
+
return err;
 }
 
@@ -2341,8 +2362,15 @@ static int __apply_to_page_range(struct mm_struct *mm, 
unsigned long addr,
pgd = pgd_offset(mm, addr);
do {
next = pgd_addr_end(addr, end);
-   if (!create && pgd_none_or_clear_bad(pgd))
+   if (pgd_none(*pgd) && !create)
continue;
+   if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+   return -EINVAL;
+   if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) {
+   if (!create)
+   continue;
+   pgd_clear_bad(pgd);
+   }
err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create);
if (err)
break;
-- 
2.23.0



[PATCH v7 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

2020-08-25 Thread Nicholas Piggin
vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
Whether or not a vmap is huge depends on the architecture details,
alignments, boot options, etc., which the caller cannot be expected
to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.

This change teaches vmalloc_to_page about larger pages, and returns
the struct page that corresponds to the offset within the large page.
This makes the API agnostic to mapping implementation details.

[*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
fail gracefully on unexpected huge vmap mappings")
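
A worked example of the new offset arithmetic (illustration only, assuming
a 2MB PMD leaf and PAGE_SHIFT == 12):

	/*
	 * addr lies 0x205000 bytes into the huge mapping:
	 *	addr & ~PMD_MASK        == 0x205000  (byte offset in the huge page)
	 *	0x205000 >> PAGE_SHIFT  == 0x205     (small-page index)
	 * so pmd_page(*pmd) + 0x205 is the tail page that a small-page
	 * mapping of the same address would have yielded.
	 */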

Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 41 ++---
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b482d240f9a2..4e9b21adc73d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -36,7 +36,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 #include 
 #include 
@@ -343,7 +343,9 @@ int is_vmalloc_or_module_addr(const void *x)
 }
 
 /*
- * Walk a vmap address to the struct page it maps.
+ * Walk a vmap address to the struct page it maps. Huge vmap mappings will
+ * return the tail page that corresponds to the base page address, which
+ * matches small vmap mappings.
  */
 struct page *vmalloc_to_page(const void *vmalloc_addr)
 {
@@ -363,25 +365,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 
if (pgd_none(*pgd))
return NULL;
+   if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+   return NULL; /* XXX: no allowance for huge pgd */
+   if (WARN_ON_ONCE(pgd_bad(*pgd)))
+   return NULL;
+
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d))
return NULL;
-   pud = pud_offset(p4d, addr);
+   if (p4d_leaf(*p4d))
+   return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(p4d_bad(*p4d)))
+   return NULL;
 
-   /*
-* Don't dereference bad PUD or PMD (below) entries. This will also
-* identify huge mappings, which we may encounter on architectures
-* that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
-* identified as vmalloc addresses by is_vmalloc_addr(), but are
-* not [unambiguously] associated with a struct page, so there is
-* no correct value to return for them.
-*/
-   WARN_ON_ONCE(pud_bad(*pud));
-   if (pud_none(*pud) || pud_bad(*pud))
+   pud = pud_offset(p4d, addr);
+   if (pud_none(*pud))
+   return NULL;
+   if (pud_leaf(*pud))
+   return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(pud_bad(*pud)))
return NULL;
+
pmd = pmd_offset(pud, addr);
-   WARN_ON_ONCE(pmd_bad(*pmd));
-   if (pmd_none(*pmd) || pmd_bad(*pmd))
+   if (pmd_none(*pmd))
+   return NULL;
+   if (pmd_leaf(*pmd))
+   return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(pmd_bad(*pmd)))
return NULL;
 
ptep = pte_offset_map(pmd, addr);
@@ -389,6 +399,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
if (pte_present(pte))
page = pte_page(pte);
pte_unmap(ptep);
+
return page;
 }
 EXPORT_SYMBOL(vmalloc_to_page);
-- 
2.23.0



[PATCH v7 00/12] huge vmalloc mappings

2020-08-25 Thread Nicholas Piggin
I think it's ready to go into -mm if it gets acks for the arch
changes.

Thanks,
Nick

Since v6:
- Fixed a false positive warning introduced in patch 2, found by
  kbuild test robot.

Since v5:
- Split arch changes out better and make the constant folding work
- Avoid most of the 80 column wrap, fix a reference to lib/ioremap.c
- Fix compile error on some archs

Since v4:
- Fixed an off-by-page-order bug in v4
- Several minor cleanups.
- Added page order to /proc/vmallocinfo
- Added hugepage to alloc_large_system_hash output.
- Made an architecture config option, powerpc only for now.

Since v3:
- Fixed an off-by-one bug in a loop
- Fix !CONFIG_HAVE_ARCH_HUGE_VMAP build fail
- Hopefully this time fix the arm64 vmap stack bug, thanks Jonathan
  Cameron for debugging the cause of this (hopefully).

Since v2:
- Rebased on vmalloc cleanups, split series into simpler pieces.
- Fixed several compile errors and warnings
- Keep the page array and accounting in small page units because
  struct vm_struct is an interface (this should fix x86 vmap stack debug
  assert). [Thanks Zefan]


Nicholas Piggin (12):
  mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
  mm: apply_to_pte_range warn and fail if a large pte is encountered
  mm/vmalloc: rename vmap_*_range vmap_pages_*_range
  mm/ioremap: rename ioremap_*_range to vmap_*_range
  mm: HUGE_VMAP arch support cleanup
  powerpc: inline huge vmap supported functions
  arm64: inline huge vmap supported functions
  x86: inline huge vmap supported functions
  mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c
  mm/vmalloc: add vmap_range_noflush variant
  mm/vmalloc: Hugepage vmalloc mappings
  powerpc/64s/radix: Enable huge vmalloc mappings

 .../admin-guide/kernel-parameters.txt |   2 +
 arch/Kconfig  |   4 +
 arch/arm64/include/asm/vmalloc.h  |  25 +
 arch/arm64/mm/mmu.c   |  26 -
 arch/powerpc/Kconfig  |   1 +
 arch/powerpc/include/asm/vmalloc.h|  21 +
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  21 -
 arch/x86/include/asm/vmalloc.h|  23 +
 arch/x86/mm/ioremap.c |  19 -
 arch/x86/mm/pgtable.c |  13 -
 include/linux/io.h|   9 -
 include/linux/vmalloc.h   |  10 +
 init/main.c   |   1 -
 mm/ioremap.c  | 225 +
 mm/memory.c   |  60 ++-
 mm/page_alloc.c   |   5 +-
 mm/vmalloc.c  | 443 +++---
 17 files changed, 515 insertions(+), 393 deletions(-)

-- 
2.23.0



Re: [PATCH 17/29] fs_enet: Avoid comma separated statements

2020-08-25 Thread David Miller
From: Joe Perches 
Date: Mon, 24 Aug 2020 21:56:14 -0700

> Use semicolons and braces.
> 
> Signed-off-by: Joe Perches 

Applied.


Re: [PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()

2020-08-25 Thread Christophe Leroy




Le 04/08/2020 à 13:17, Christophe Leroy a écrit :



On 07/16/2020 02:59 AM, Michael Ellerman wrote:

Christophe Leroy  writes:

The VDSO datapage and the text pages are always located immediately
next to each other, so it can be hardcoded without an indirection
through __kernel_datapage_offset

In order to ease things, move the data page in front like other
arches, that way there is no need to know the size of the library
to locate the data page.


[...]



I merged this but then realised it breaks the display of the vdso in 
/proc/self/maps.


ie. the vdso vma gets no name:

   # cat /proc/self/maps


[...]




And it's also going to break the logic in arch_unmap() to detect if
we're unmapping (part of) the VDSO. And it will break arch_remap() too.

And the logic to recognise the signal trampoline in
arch/powerpc/perf/callchain_*.c as well.


I don't think it breaks that one, because ->vdso_base is still the start 
of text.




So I'm going to rebase and drop this for now.

Basically we have a bunch of places that assume that vdso_base is == the
start of the VDSO vma, and also that the code starts there. So that will
need some work to tease out all those assumptions and make them work
with this change.


Ok, one day I need to look at it in more details and see how other 
architectures handle it etc ...




I just sent out a series which switches powerpc to the new 
_install_special_mapping() API, the one powerpc uses being deprecated 
since commit a62c34bd2a8a ("x86, mm: Improve _install_special_mapping

and fix x86 vdso naming")

arch_remap() gets replaced by vdso_remap()

For arch_unmap(), I'm wondering how/what other architectures do, because 
powerpc seems to be the only one to erase the vdso context pointer when 
unmapping the vdso. So far I updated it to take into account the pages 
switch.


Everything else is not impacted because our vdso_base is still the base 
of the text and that's what those things (signal trampoline, callchain, 
...) expect.


Maybe we should change it to 'void *vdso' in the same way as other 
architectures, as it is no longer the exact vdso_base but the start of 
the VDSO text.


Note that the series applies on top of the generic C VDSO implementation 
series. However, all but the last commit apply cleanly without that 
series. As that last commit is just a follow-up cleanup, it can come in 
a second step.


Christophe


[PATCH v1 8/9] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()

2020-08-25 Thread Christophe Leroy
The VDSO datapage and the text pages are always located immediately
next to each other, so it can be hardcoded without an indirection
through __kernel_datapage_offset

Before:
clock-getres-realtime-coarse:vdso: 714 nsec/call
clock-gettime-realtime-coarse:vdso: 792 nsec/call
clock-gettime-realtime:vdso: 1243 nsec/call

After:
clock-getres-realtime-coarse:vdso: 699 nsec/call
clock-gettime-realtime-coarse:vdso: 784 nsec/call
clock-gettime-realtime:vdso: 1231 nsec/call

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/vdso_datapage.h |  8 +++--
 arch/powerpc/kernel/vdso.c   | 37 
 arch/powerpc/kernel/vdso32/datapage.S|  3 --
 arch/powerpc/kernel/vdso32/vdso32.lds.S  |  7 ++---
 arch/powerpc/kernel/vdso64/datapage.S|  3 --
 arch/powerpc/kernel/vdso64/vdso64.lds.S  |  7 ++---
 6 files changed, 9 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index c4d320504d26..2bc415f7714c 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -104,10 +104,12 @@ extern struct vdso_arch_data *vdso_data;
 
 .macro get_datapage ptr, tmp
bcl 20, 31, .+4
+999:
	mflr	\ptr
-   addi	\ptr, \ptr, (__kernel_datapage_offset - (.-4))@l
-   lwz \tmp, 0(\ptr)
-   add \ptr, \tmp, \ptr
+#if CONFIG_PPC_PAGE_SHIFT > 14
+   addis   \ptr, \ptr, (_vdso_datapage - 999b)@ha
+#endif
+   addi	\ptr, \ptr, (_vdso_datapage - 999b)@l
 .endm
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 1d72c4b7672f..e2568d9ecdff 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -500,40 +500,6 @@ static __init void vdso_setup_trampolines(struct 
lib32_elfinfo *v32,
vdso32_rt_sigtramp = find_function32(v32, "__kernel_sigtramp_rt32");
 }
 
-static __init int vdso_fixup_datapage(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64)
-{
-#ifdef CONFIG_VDSO32
-   Elf32_Sym *sym32;
-#endif
-#ifdef CONFIG_PPC64
-   Elf64_Sym *sym64;
-
-   sym64 = find_symbol64(v64, "__kernel_datapage_offset");
-   if (sym64 == NULL) {
-   printk(KERN_ERR "vDSO64: Can't find symbol "
-  "__kernel_datapage_offset !\n");
-   return -1;
-   }
-   *((int *)(vdso64_kbase + sym64->st_value - VDSO64_LBASE)) =
-   (sym64->st_value - VDSO64_LBASE) - PAGE_SIZE;
-#endif /* CONFIG_PPC64 */
-
-#ifdef CONFIG_VDSO32
-   sym32 = find_symbol32(v32, "__kernel_datapage_offset");
-   if (sym32 == NULL) {
-   printk(KERN_ERR "vDSO32: Can't find symbol "
-  "__kernel_datapage_offset !\n");
-   return -1;
-   }
-   *((int *)(vdso32_kbase + (sym32->st_value - VDSO32_LBASE))) =
-   (sym32->st_value - VDSO32_LBASE) - PAGE_SIZE;
-#endif
-
-   return 0;
-}
-
-
 static __init int vdso_fixup_features(struct lib32_elfinfo *v32,
  struct lib64_elfinfo *v64)
 {
@@ -634,9 +600,6 @@ static __init int vdso_setup(void)
	if (vdso_do_find_sections(&v32, &v64))
return -1;
 
-   if (vdso_fixup_datapage(&v32, &v64))
-   return -1;
-
	if (vdso_fixup_features(&v32, &v64))
return -1;
 
diff --git a/arch/powerpc/kernel/vdso32/datapage.S 
b/arch/powerpc/kernel/vdso32/datapage.S
index 217bb630f8f9..5513a4f8253e 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -13,9 +13,6 @@
 #include 
 
.text
-   .global __kernel_datapage_offset;
-__kernel_datapage_offset:
-   .long   0
 
 /*
  * void *__kernel_get_syscall_map(unsigned int *syscall_count) ;
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S 
b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 582c5b046cc9..25be27b47a9f 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -4,6 +4,7 @@
  * library
  */
 #include <asm/vdso.h>
+#include <asm/page.h>
 
 #ifdef __LITTLE_ENDIAN__
 OUTPUT_FORMAT("elf32-powerpcle", "elf32-powerpcle", "elf32-powerpcle")
@@ -15,6 +16,7 @@ ENTRY(_start)
 
 SECTIONS
 {
+   PROVIDE(_vdso_datapage = . - PAGE_SIZE);
. = VDSO32_LBASE + SIZEOF_HEADERS;
 
.hash   : { *(.hash) }  :text
@@ -139,11 +141,6 @@ VERSION
 {
VDSO_VERSION_STRING {
global:
-   /*
-* Has to be there for the kernel to find
-*/
-   __kernel_datapage_offset;
-
__kernel_get_syscall_map;
 #ifndef CONFIG_PPC_BOOK3S_601
__kernel_gettimeofday;
diff --git a/arch/powerpc/kernel/vdso64/datapage.S 
b/arch/powerpc/kernel/vdso64/datapage.S
index 067247d3efb9..03bb72c440dc 100644
--- a/arch/powerpc/kernel/vdso64/datapage.S
+++ b/arch/powerpc/kernel/vdso64/datapage.S

[PATCH v1 6/9] powerpc/vdso: Provide vdso_remap()

2020-08-25 Thread Christophe Leroy
Provide vdso_remap() through _install_special_mapping() and
drop arch_remap().

This adds a test of the size and returns -EINVAL if the size
is not correct.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/mm-arch-hooks.h | 25 -
 arch/powerpc/kernel/vdso.c   | 28 
 2 files changed, 28 insertions(+), 25 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h

diff --git a/arch/powerpc/include/asm/mm-arch-hooks.h 
b/arch/powerpc/include/asm/mm-arch-hooks.h
deleted file mode 100644
index dce274be824a..000000000000
--- a/arch/powerpc/include/asm/mm-arch-hooks.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Architecture specific mm hooks
- *
- * Copyright (C) 2015, IBM Corporation
- * Author: Laurent Dufour 
- */
-
-#ifndef _ASM_POWERPC_MM_ARCH_HOOKS_H
-#define _ASM_POWERPC_MM_ARCH_HOOKS_H
-
-static inline void arch_remap(struct mm_struct *mm,
- unsigned long old_start, unsigned long old_end,
- unsigned long new_start, unsigned long new_end)
-{
-   /*
-* mremap() doesn't allow moving multiple vmas so we can limit the
-* check to old_start == vdso_base.
-*/
-   if (old_start == mm->context.vdso_base)
-   mm->context.vdso_base = new_start;
-}
-#define arch_remap arch_remap
-
-#endif /* _ASM_POWERPC_MM_ARCH_HOOKS_H */
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4ccfc0dc96b5..b9270923452e 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -114,13 +114,41 @@ struct lib64_elfinfo
unsigned long   text;
 };
 
+static int vdso_mremap(unsigned long vdso_pages,
+  const struct vm_special_mapping *sm,
+  struct vm_area_struct *new_vma)
+{
+   unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
+   unsigned long vdso_size = (vdso_pages + 1) << PAGE_SHIFT;
+
+   if (new_size != vdso_size)
+   return -EINVAL;
+
+   current->mm->context.vdso_base = (unsigned long)new_vma->vm_start;
+
+   return 0;
+}
+
+static int vdso32_mremap(const struct vm_special_mapping *sm,
+struct vm_area_struct *new_vma)
+{
+   return vdso_mremap(vdso32_pages, sm, new_vma);
+}
+
+static int vdso64_mremap(const struct vm_special_mapping *sm,
+struct vm_area_struct *new_vma)
+{
+   return vdso_mremap(vdso64_pages, sm, new_vma);
+}
 
 static struct vm_special_mapping vdso32_spec __ro_after_init = {
.name = "[vdso]",
+   .mremap = vdso32_mremap,
 };
 
 static struct vm_special_mapping vdso64_spec __ro_after_init = {
.name = "[vdso]",
+   .mremap = vdso64_mremap,
 };
 
 /*
-- 
2.25.0



[PATCH v1 7/9] powerpc/vdso: Move vdso datapage up front

2020-08-25 Thread Christophe Leroy
Move the vdso datapage in front of the VDSO area,
before vdso test.

This will allow removing the __kernel_datapage_offset symbol
and simplifying __get_datapage() in the following patch.
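
For illustration (not from the patch), the resulting layout change:

	before:  [ VDSO text pages ... ][ data page ]
	after:   [ data page ][ VDSO text pages ... ]

vdso_base keeps pointing at the text in both cases; after the change the
mapping simply starts one page below it.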

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/mmu_context.h |  4 +++-
 arch/powerpc/kernel/vdso.c | 22 ++
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 7f3658a97384..be18ad12bb54 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -262,7 +262,9 @@ extern void arch_exit_mmap(struct mm_struct *mm);
 static inline void arch_unmap(struct mm_struct *mm,
  unsigned long start, unsigned long end)
 {
-   if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
+   unsigned long vdso_base = mm->context.vdso_base - PAGE_SIZE;
+
+   if (start <= vdso_base && vdso_base < end)
mm->context.vdso_base = 0;
 }
 
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index b9270923452e..1d72c4b7672f 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -124,7 +124,7 @@ static int vdso_mremap(unsigned long vdso_pages,
if (new_size != vdso_size)
return -EINVAL;
 
-   current->mm->context.vdso_base = (unsigned long)new_vma->vm_start;
+   current->mm->context.vdso_base = (unsigned long)new_vma->vm_start + 
PAGE_SIZE;
 
return 0;
 }
@@ -217,7 +217,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
 * install_special_mapping or the perf counter mmap tracking code
 * will fail to recognise it as a vDSO (since arch_vma_name fails).
 */
-   current->mm->context.vdso_base = vdso_base;
+   current->mm->context.vdso_base = vdso_base + PAGE_SIZE;
 
/*
 * our vma flags don't have VM_WRITE so by default, the process isn't
@@ -516,8 +516,7 @@ static __init int vdso_fixup_datapage(struct lib32_elfinfo 
*v32,
return -1;
}
*((int *)(vdso64_kbase + sym64->st_value - VDSO64_LBASE)) =
-   (vdso64_pages << PAGE_SHIFT) -
-   (sym64->st_value - VDSO64_LBASE);
+   (sym64->st_value - VDSO64_LBASE) - PAGE_SIZE;
 #endif /* CONFIG_PPC64 */
 
 #ifdef CONFIG_VDSO32
@@ -528,8 +527,7 @@ static __init int vdso_fixup_datapage(struct lib32_elfinfo 
*v32,
return -1;
}
*((int *)(vdso32_kbase + (sym32->st_value - VDSO32_LBASE))) =
-   (vdso32_pages << PAGE_SHIFT) -
-   (sym32->st_value - VDSO32_LBASE);
+   (sym32->st_value - VDSO32_LBASE) - PAGE_SIZE;
 #endif
 
return 0;
@@ -771,10 +769,10 @@ static int __init vdso_init(void)
if (!pagelist)
goto alloc_failed;
 
-   for (i = 0; i < vdso32_pages; i++)
-   pagelist[i] = virt_to_page(vdso32_kbase + i * 
PAGE_SIZE);
+   pagelist[0] = virt_to_page(vdso_data);
 
-   pagelist[i++] = virt_to_page(vdso_data);
+   for (i = 0; i < vdso32_pages; i++)
+   pagelist[i + 1] = virt_to_page(vdso32_kbase + i * 
PAGE_SIZE);
 
vdso32_spec.pages = pagelist;
}
@@ -784,10 +782,10 @@ static int __init vdso_init(void)
if (!pagelist)
goto alloc_failed;
 
-   for (i = 0; i < vdso64_pages; i++)
-   pagelist[i] = virt_to_page(vdso64_kbase + i * 
PAGE_SIZE);
+   pagelist[0] = virt_to_page(vdso_data);
 
-   pagelist[i++] = virt_to_page(vdso_data);
+   for (i = 0; i < vdso64_pages; i++)
+   pagelist[i + 1] = virt_to_page(vdso64_kbase + i * 
PAGE_SIZE);
 
vdso64_spec.pages = pagelist;
}
-- 
2.25.0



[PATCH v1 9/9] powerpc/vdso: Remove unused \tmp param in __get_datapage()

2020-08-25 Thread Christophe Leroy
The \tmp param is not used anymore, remove it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/vdso/gettimeofday.h | 4 ++--
 arch/powerpc/include/asm/vdso_datapage.h | 2 +-
 arch/powerpc/kernel/vdso32/cacheflush.S  | 2 +-
 arch/powerpc/kernel/vdso32/datapage.S| 4 ++--
 arch/powerpc/kernel/vdso64/cacheflush.S  | 2 +-
 arch/powerpc/kernel/vdso64/datapage.S| 4 ++--
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
index 59a609a48b63..8602f1243e8d 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -22,7 +22,7 @@
 #ifdef CONFIG_PPC64
PPC_STL r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
 #endif
-   get_datapage	r5, r0
+   get_datapage	r5
	addi	r5, r5, VDSO_DATA_OFFSET
bl  \funct
PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
@@ -51,7 +51,7 @@
 #ifdef CONFIG_PPC64
PPC_STL r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
 #endif
-   get_datapage	r4, r0
+   get_datapage	r4
	addi	r4, r4, VDSO_DATA_OFFSET
bl  \funct
PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index 2bc415f7714c..71f44598f392 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -102,7 +102,7 @@ extern struct vdso_arch_data *vdso_data;
 
 #else /* __ASSEMBLY__ */
 
-.macro get_datapage ptr, tmp
+.macro get_datapage ptr
bcl 20, 31, .+4
 999:
	mflr	\ptr
diff --git a/arch/powerpc/kernel/vdso32/cacheflush.S 
b/arch/powerpc/kernel/vdso32/cacheflush.S
index 3440ddf21c8b..017843bf5382 100644
--- a/arch/powerpc/kernel/vdso32/cacheflush.S
+++ b/arch/powerpc/kernel/vdso32/cacheflush.S
@@ -27,7 +27,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
 #ifdef CONFIG_PPC64
	mflr	r12
   .cfi_register lr,r12
-   get_datapage	r10, r0
+   get_datapage	r10
	mtlr	r12
 #endif
 
diff --git a/arch/powerpc/kernel/vdso32/datapage.S 
b/arch/powerpc/kernel/vdso32/datapage.S
index 5513a4f8253e..0513a2eabec8 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -28,7 +28,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map)
	mflr	r12
   .cfi_register lr,r12
	mr.	r4,r3
-   get_datapage	r3, r0
+   get_datapage	r3
	mtlr	r12
	addi	r3,r3,CFG_SYSCALL_MAP32
beqlr
@@ -49,7 +49,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq)
   .cfi_startproc
	mflr	r12
   .cfi_register lr,r12
-   get_datapage	r3, r0
+   get_datapage	r3
	lwz	r4,(CFG_TB_TICKS_PER_SEC + 4)(r3)
	lwz	r3,CFG_TB_TICKS_PER_SEC(r3)
	mtlr	r12
diff --git a/arch/powerpc/kernel/vdso64/cacheflush.S 
b/arch/powerpc/kernel/vdso64/cacheflush.S
index cab14324242b..61985de5758f 100644
--- a/arch/powerpc/kernel/vdso64/cacheflush.S
+++ b/arch/powerpc/kernel/vdso64/cacheflush.S
@@ -25,7 +25,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
   .cfi_startproc
	mflr	r12
   .cfi_register lr,r12
-   get_datapage	r10, r0
+   get_datapage	r10
	mtlr	r12
 
lwz r7,CFG_DCACHE_BLOCKSZ(r10)
diff --git a/arch/powerpc/kernel/vdso64/datapage.S 
b/arch/powerpc/kernel/vdso64/datapage.S
index 03bb72c440dc..00760dc69d68 100644
--- a/arch/powerpc/kernel/vdso64/datapage.S
+++ b/arch/powerpc/kernel/vdso64/datapage.S
@@ -28,7 +28,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map)
	mflr	r12
   .cfi_register lr,r12
	mr	r4,r3
-   get_datapage	r3, r0
+   get_datapage	r3
	mtlr	r12
	addi	r3,r3,CFG_SYSCALL_MAP64
cmpldi  cr0,r4,0
@@ -50,7 +50,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq)
   .cfi_startproc
	mflr	r12
   .cfi_register lr,r12
-   get_datapage	r3, r0
+   get_datapage	r3
	ld	r3,CFG_TB_TICKS_PER_SEC(r3)
	mtlr	r12
crclr   cr0*4+so
-- 
2.25.0



[PATCH v1 5/9] powerpc/vdso: move to _install_special_mapping() and remove arch_vma_name()

2020-08-25 Thread Christophe Leroy
From commit 2fea7f6c98f5 ("arm64: vdso: move to
_install_special_mapping and remove arch_vma_name").

Use the new _install_special_mapping() API added by
commit a62c34bd2a8a ("x86, mm: Improve _install_special_mapping
and fix x86 vdso naming") which obsoletes install_special_mapping().

And remove arch_vma_name() as the name is handled by the new API.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso.c | 59 +++---
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index bbb69832fd46..4ccfc0dc96b5 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -47,7 +47,6 @@
 
 static unsigned int vdso32_pages;
 static void *vdso32_kbase;
-static struct page **vdso32_pagelist;
 unsigned long vdso32_sigtramp;
 unsigned long vdso32_rt_sigtramp;
 
@@ -56,7 +55,6 @@ extern char vdso32_start, vdso32_end;
 extern char vdso64_start, vdso64_end;
 static void *vdso64_kbase = &vdso64_start;
 static unsigned int vdso64_pages;
-static struct page **vdso64_pagelist;
 #ifdef CONFIG_PPC64
 unsigned long vdso64_rt_sigtramp;
 #endif /* CONFIG_PPC64 */
@@ -117,6 +115,14 @@ struct lib64_elfinfo
 };
 
 
+static struct vm_special_mapping vdso32_spec __ro_after_init = {
+   .name = "[vdso]",
+};
+
+static struct vm_special_mapping vdso64_spec __ro_after_init = {
+   .name = "[vdso]",
+};
+
 /*
  * This is called from binfmt_elf, we create the special vma for the
  * vDSO and insert it into the mm struct tree
@@ -124,7 +130,8 @@ struct lib64_elfinfo
 int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 {
struct mm_struct *mm = current->mm;
-   struct page **vdso_pagelist;
+   struct vm_special_mapping *vdso_spec;
+   struct vm_area_struct *vma;
unsigned long vdso_pages;
unsigned long vdso_base;
int rc;
@@ -133,11 +140,11 @@ int arch_setup_additional_pages(struct linux_binprm 
*bprm, int uses_interp)
return 0;
 
if (is_32bit_task()) {
-   vdso_pagelist = vdso32_pagelist;
+   vdso_spec = &vdso32_spec;
vdso_pages = vdso32_pages;
vdso_base = VDSO32_MBASE;
} else {
-   vdso_pagelist = vdso64_pagelist;
+   vdso_spec = &vdso64_spec;
vdso_pages = vdso64_pages;
/*
 * On 64bit we don't have a preferred map address. This
@@ -194,12 +201,12 @@ int arch_setup_additional_pages(struct linux_binprm 
*bprm, int uses_interp)
 * It's fine to use that for setting breakpoints in the vDSO code
 * pages though.
 */
-   rc = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
-VM_READ|VM_EXEC|
-VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
-vdso_pagelist);
-   if (rc) {
+   vma = _install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+  VM_READ | VM_EXEC | VM_MAYREAD |
+  VM_MAYWRITE | VM_MAYEXEC, vdso_spec);
+   if (IS_ERR(vma)) {
current->mm->context.vdso_base = 0;
+   rc = PTR_ERR(vma);
goto fail_mmapsem;
}
 
@@ -211,15 +218,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
return rc;
 }
 
-const char *arch_vma_name(struct vm_area_struct *vma)
-{
-   if (vma->vm_mm && vma->vm_start == vma->vm_mm->context.vdso_base)
-   return "[vdso]";
-   return NULL;
-}
-
-
-
 #ifdef CONFIG_VDSO32
 static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname,
  unsigned long *size)
@@ -685,6 +683,7 @@ early_initcall(vdso_getcpu_init);
 static int __init vdso_init(void)
 {
int i;
+   struct page **pagelist;
 
 #ifdef CONFIG_PPC64
/*
@@ -740,27 +739,29 @@ static int __init vdso_init(void)
 
if (IS_ENABLED(CONFIG_VDSO32)) {
/* Make sure pages are in the correct state */
-   vdso32_pagelist = kcalloc(vdso32_pages + 1, sizeof(struct page 
*),
- GFP_KERNEL);
-   if (!vdso32_pagelist)
+   pagelist = kcalloc(vdso32_pages + 1, sizeof(struct page *), 
GFP_KERNEL);
+   if (!pagelist)
goto alloc_failed;
 
for (i = 0; i < vdso32_pages; i++)
-   vdso32_pagelist[i] = virt_to_page(vdso32_kbase + i * 
PAGE_SIZE);
+   pagelist[i] = virt_to_page(vdso32_kbase + i * 
PAGE_SIZE);
+
+   pagelist[i++] = virt_to_page(vdso_data);
 
-   vdso32_pagelist[i] = virt_to_page(vdso_data);
+   vdso32_spec.pages = pagelist;
}
 
if (IS_ENABLED(CONFIG_PPC64)) {
-   vdso64_pagelist = kcalloc(vdso64_pages + 1, 

[PATCH v1 4/9] powerpc/vdso: Remove unnecessary ifdefs in vdso_pagelist initialization

2020-08-25 Thread Christophe Leroy
No need for all those #ifdefs around the pagelist initialisation;
use IS_ENABLED(), and GCC will discard the unused static variables.
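
A minimal sketch of the pattern (illustration only, not the patch itself):

	/*
	 * Both arms are still parsed and type-checked, but the dead arm is
	 * eliminated at compile time, and static variables referenced only
	 * from the dead arm are discarded, so no #ifdef is needed.
	 */
	if (IS_ENABLED(CONFIG_VDSO32)) {
		/* 32-bit pagelist setup, dropped when CONFIG_VDSO32=n */
	}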

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso.c | 57 +++---
 1 file changed, 22 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index daef14a284a3..bbb69832fd46 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -51,15 +51,13 @@ static struct page **vdso32_pagelist;
 unsigned long vdso32_sigtramp;
 unsigned long vdso32_rt_sigtramp;
 
-#ifdef CONFIG_VDSO32
 extern char vdso32_start, vdso32_end;
-#endif
 
-#ifdef CONFIG_PPC64
 extern char vdso64_start, vdso64_end;
 static void *vdso64_kbase = &vdso64_start;
 static unsigned int vdso64_pages;
 static struct page **vdso64_pagelist;
+#ifdef CONFIG_PPC64
 unsigned long vdso64_rt_sigtramp;
 #endif /* CONFIG_PPC64 */
 
@@ -134,7 +132,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
if (!vdso_ready)
return 0;
 
-#ifdef CONFIG_PPC64
if (is_32bit_task()) {
vdso_pagelist = vdso32_pagelist;
vdso_pages = vdso32_pages;
@@ -149,11 +146,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
 */
vdso_base = 0;
}
-#else
-   vdso_pagelist = vdso32_pagelist;
-   vdso_pages = vdso32_pages;
-   vdso_base = VDSO32_MBASE;
-#endif
 
current->mm->context.vdso_base = 0;
 
@@ -718,16 +710,14 @@ static int __init vdso_init(void)
vdso_data->icache_block_size = ppc64_caches.l1i.block_size;
vdso_data->dcache_log_block_size = ppc64_caches.l1d.log_block_size;
vdso_data->icache_log_block_size = ppc64_caches.l1i.log_block_size;
+#endif /* CONFIG_PPC64 */
 
/*
 * Calculate the size of the 64 bits vDSO
 */
	vdso64_pages = (&vdso64_end - &vdso64_start) >> PAGE_SHIFT;
DBG("vdso64_kbase: %p, 0x%x pages\n", vdso64_kbase, vdso64_pages);
-#endif /* CONFIG_PPC64 */
 
-
-#ifdef CONFIG_VDSO32
	vdso32_kbase = &vdso32_start;
 
/*
@@ -735,8 +725,6 @@ static int __init vdso_init(void)
 */
	vdso32_pages = (&vdso32_end - &vdso32_start) >> PAGE_SHIFT;
DBG("vdso32_kbase: %p, 0x%x pages\n", vdso32_kbase, vdso32_pages);
-#endif
-
 
/*
	 * Setup the syscall map in the vDSO
@@ -750,30 +738,30 @@ static int __init vdso_init(void)
if (vdso_setup())
goto setup_failed;
 
-#ifdef CONFIG_VDSO32
-   /* Make sure pages are in the correct state */
-   vdso32_pagelist = kcalloc(vdso32_pages + 1, sizeof(struct page *),
- GFP_KERNEL);
-   if (!vdso32_pagelist)
-   goto alloc_failed;
+   if (IS_ENABLED(CONFIG_VDSO32)) {
+   /* Make sure pages are in the correct state */
+   vdso32_pagelist = kcalloc(vdso32_pages + 1, sizeof(struct page 
*),
+ GFP_KERNEL);
+   if (!vdso32_pagelist)
+   goto alloc_failed;
 
-   for (i = 0; i < vdso32_pages; i++)
-   vdso32_pagelist[i] = virt_to_page(vdso32_kbase + i * PAGE_SIZE);
+   for (i = 0; i < vdso32_pages; i++)
+   vdso32_pagelist[i] = virt_to_page(vdso32_kbase + i * 
PAGE_SIZE);
 
-   vdso32_pagelist[i] = virt_to_page(vdso_data);
-#endif
+   vdso32_pagelist[i] = virt_to_page(vdso_data);
+   }
 
-#ifdef CONFIG_PPC64
-   vdso64_pagelist = kcalloc(vdso64_pages + 1, sizeof(struct page *),
- GFP_KERNEL);
-   if (!vdso64_pagelist)
-   goto alloc_failed;
+   if (IS_ENABLED(CONFIG_PPC64)) {
+   vdso64_pagelist = kcalloc(vdso64_pages + 1, sizeof(struct page 
*),
+ GFP_KERNEL);
+   if (!vdso64_pagelist)
+   goto alloc_failed;
 
-   for (i = 0; i < vdso64_pages; i++)
-   vdso64_pagelist[i] = virt_to_page(vdso64_kbase + i * PAGE_SIZE);
+   for (i = 0; i < vdso64_pages; i++)
+   vdso64_pagelist[i] = virt_to_page(vdso64_kbase + i * 
PAGE_SIZE);
 
-   vdso64_pagelist[i] = virt_to_page(vdso_data);
-#endif /* CONFIG_PPC64 */
+   vdso64_pagelist[i] = virt_to_page(vdso_data);
+   }
 
smp_wmb();
vdso_ready = 1;
@@ -784,9 +772,8 @@ static int __init vdso_init(void)
pr_err("vDSO setup failure, not enabled !\n");
 alloc_failed:
vdso32_pages = 0;
-#ifdef CONFIG_PPC64
vdso64_pages = 0;
-#endif
+
return 0;
 }
 arch_initcall(vdso_init);
-- 
2.25.0



[PATCH v1 2/9] powerpc/vdso: Remove get_page() in vdso_pagelist initialization

2020-08-25 Thread Christophe Leroy
Partly copied from commit 16fb1a9bec61 ("arm64: vdso: clean up
vdso_pagelist initialization").

No need to get_page() the vdso text/data - these are part of the
kernel image.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 88a4a02ed4c4..3bc4d5b1980b 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -757,11 +757,9 @@ static int __init vdso_init(void)
if (!vdso32_pagelist)
goto alloc_failed;
 
-   for (i = 0; i < vdso32_pages; i++) {
-   struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
-   get_page(pg);
-   vdso32_pagelist[i] = pg;
-   }
+   for (i = 0; i < vdso32_pages; i++)
+   vdso32_pagelist[i] = virt_to_page(vdso32_kbase + i * PAGE_SIZE);
+
vdso32_pagelist[i++] = virt_to_page(vdso_data);
vdso32_pagelist[i] = NULL;
 #endif
@@ -772,17 +770,13 @@ static int __init vdso_init(void)
if (!vdso64_pagelist)
goto alloc_failed;
 
-   for (i = 0; i < vdso64_pages; i++) {
-   struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
-   get_page(pg);
-   vdso64_pagelist[i] = pg;
-   }
+   for (i = 0; i < vdso64_pages; i++)
+   vdso64_pagelist[i] = virt_to_page(vdso64_kbase + i * PAGE_SIZE);
+
vdso64_pagelist[i++] = virt_to_page(vdso_data);
vdso64_pagelist[i] = NULL;
 #endif /* CONFIG_PPC64 */
 
-   get_page(virt_to_page(vdso_data));
-
smp_wmb();
vdso_ready = 1;
 
-- 
2.25.0



[PATCH v1 3/9] powerpc/vdso: Remove NULL termination element in vdso_pagelist

2020-08-25 Thread Christophe Leroy
No need for a NULL last element in the pagelists; install_special_mapping()
knows how long the list is.

Remove that element.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 3bc4d5b1980b..daef14a284a3 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -752,7 +752,7 @@ static int __init vdso_init(void)
 
 #ifdef CONFIG_VDSO32
/* Make sure pages are in the correct state */
-   vdso32_pagelist = kcalloc(vdso32_pages + 2, sizeof(struct page *),
+   vdso32_pagelist = kcalloc(vdso32_pages + 1, sizeof(struct page *),
  GFP_KERNEL);
if (!vdso32_pagelist)
goto alloc_failed;
@@ -760,12 +760,11 @@ static int __init vdso_init(void)
for (i = 0; i < vdso32_pages; i++)
vdso32_pagelist[i] = virt_to_page(vdso32_kbase + i * PAGE_SIZE);
 
-   vdso32_pagelist[i++] = virt_to_page(vdso_data);
-   vdso32_pagelist[i] = NULL;
+   vdso32_pagelist[i] = virt_to_page(vdso_data);
 #endif
 
 #ifdef CONFIG_PPC64
-   vdso64_pagelist = kcalloc(vdso64_pages + 2, sizeof(struct page *),
+   vdso64_pagelist = kcalloc(vdso64_pages + 1, sizeof(struct page *),
  GFP_KERNEL);
if (!vdso64_pagelist)
goto alloc_failed;
@@ -773,8 +772,7 @@ static int __init vdso_init(void)
for (i = 0; i < vdso64_pages; i++)
vdso64_pagelist[i] = virt_to_page(vdso64_kbase + i * PAGE_SIZE);
 
-   vdso64_pagelist[i++] = virt_to_page(vdso_data);
-   vdso64_pagelist[i] = NULL;
+   vdso64_pagelist[i] = virt_to_page(vdso_data);
 #endif /* CONFIG_PPC64 */
 
smp_wmb();
-- 
2.25.0



[PATCH v1 1/9] powerpc/vdso: Remove BUG_ON() in vdso_init()

2020-08-25 Thread Christophe Leroy
If we are not able to allocate memory for the pagelists, bail out.

There is no reason to crash the machine, just have vdso init fail.

Signed-off-by: Christophe Leroy 
---
This series is based on top of the series switching to C VDSO implementation,
but in fact only the last patch depends on that series and is not vital as
it is just afterwork cleanup.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 23208a051af5..88a4a02ed4c4 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -747,20 +747,16 @@ static int __init vdso_init(void)
 * Initialize the vDSO images in memory, that is do necessary
 * fixups of vDSO symbols, locate trampolines, etc...
 */
-   if (vdso_setup()) {
-   printk(KERN_ERR "vDSO setup failure, not enabled !\n");
-   vdso32_pages = 0;
-#ifdef CONFIG_PPC64
-   vdso64_pages = 0;
-#endif
-   return 0;
-   }
+   if (vdso_setup())
+   goto setup_failed;
 
 #ifdef CONFIG_VDSO32
/* Make sure pages are in the correct state */
vdso32_pagelist = kcalloc(vdso32_pages + 2, sizeof(struct page *),
  GFP_KERNEL);
-   BUG_ON(vdso32_pagelist == NULL);
+   if (!vdso32_pagelist)
+   goto alloc_failed;
+
for (i = 0; i < vdso32_pages; i++) {
struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
get_page(pg);
@@ -773,7 +769,9 @@ static int __init vdso_init(void)
 #ifdef CONFIG_PPC64
vdso64_pagelist = kcalloc(vdso64_pages + 2, sizeof(struct page *),
  GFP_KERNEL);
-   BUG_ON(vdso64_pagelist == NULL);
+   if (!vdso64_pagelist)
+   goto alloc_failed;
+
for (i = 0; i < vdso64_pages; i++) {
struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
get_page(pg);
@@ -789,5 +787,14 @@ static int __init vdso_init(void)
vdso_ready = 1;
 
return 0;
+
+setup_failed:
+   pr_err("vDSO setup failure, not enabled !\n");
+alloc_failed:
+   vdso32_pages = 0;
+#ifdef CONFIG_PPC64
+   vdso64_pages = 0;
+#endif
+   return 0;
 }
 arch_initcall(vdso_init);
-- 
2.25.0



Re: [PATCH net] ibmvnic fix NULL tx_pools and rx_tools issue at do_reset

2020-08-25 Thread Brian W Hart
On Mon, Aug 24, 2020 at 07:49:23PM -0400, Dany Madden wrote:
> From: Mingming Cao 
> 
> At the time of do_reset, ibmvnic tries to re-initialize the tx_pools
> and rx_pools to avoid re-allocating the long term buffer. However
> there is a window inside do_reset where the tx_pools and
> rx_pools are freed before being re-initialized, making it possible to
> dereference NULL pointers.
> 
> This patch fixes this issue by checking that the tx_pool
> and rx_pool are not NULL after ibmvnic_login. If so, re-allocate
> the pools. This will avoid calling reset_tx/rx_pools with a
> NULL adapter tx_pools/rx_pools pointer. Also add a NULL pointer check in
> reset_tx_pools and reset_rx_pools to safely handle the NULL pointer case.
> 
> Signed-off-by: Mingming Cao 
> Signed-off-by: Dany Madden 
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
> b/drivers/net/ethernet/ibm/ibmvnic.c
> index 5afb3c9c52d2..5ff48e55308b 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -479,6 +479,9 @@ static int reset_rx_pools(struct ibmvnic_adapter *adapter)
>   int i, j, rc;
>   u64 *size_array;
> 
> + if (!adapter->tx_pool)
> + return -1;
> +

Should this one be testing rx_pool?

brian

>   size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
>   be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size));
> 
> @@ -649,6 +652,9 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
>   int tx_scrqs;
>   int i, rc;
> 
> + if (!adapter->tx_pool)
> + return -1;
> +
>   tx_scrqs = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs);
>   for (i = 0; i < tx_scrqs; i++) {
>   rc = reset_one_tx_pool(adapter, &adapter->tso_pool[i]);
> @@ -2011,7 +2017,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>   adapter->req_rx_add_entries_per_subcrq !=
>   old_num_rx_slots ||
>   adapter->req_tx_entries_per_subcrq !=
> - old_num_tx_slots) {
> + old_num_tx_slots ||
> + !adapter->rx_pool ||
> + !adapter->tso_pool ||
> + !adapter->tx_pool) {
>   release_rx_pools(adapter);
>   release_tx_pools(adapter);
>   release_napi(adapter);
> @@ -2024,10 +2033,14 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>   } else {
>   rc = reset_tx_pools(adapter);
>   if (rc)
> + netdev_dbg(adapter->netdev, "reset tx pools 
> failed (%d)\n",
> + rc);
>   goto out;
> 
>   rc = reset_rx_pools(adapter);
>   if (rc)
> + netdev_dbg(adapter->netdev, "reset rx pools 
> failed (%d)\n",
> + rc);
>   goto out;
>   }
>   ibmvnic_disable_irqs(adapter);
> -- 
> 2.18.2
> 


Re: [PATCH] powerpc: Update documentation of ISA versions for Power10

2020-08-25 Thread Jordan Niethe
On Tue, Aug 25, 2020 at 10:41 PM Gabriel Paubert  wrote:
>
> On Tue, Aug 25, 2020 at 09:45:07PM +1000, Jordan Niethe wrote:
> > Update the CPU to ISA Version Mapping document to include Power10 and
> > ISA v3.1.
> >
> > Signed-off-by: Jordan Niethe 
> > ---
> >  Documentation/powerpc/isa-versions.rst | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/Documentation/powerpc/isa-versions.rst 
> > b/Documentation/powerpc/isa-versions.rst
> > index a363d8c1603c..72aff1eaaea1 100644
> > --- a/Documentation/powerpc/isa-versions.rst
> > +++ b/Documentation/powerpc/isa-versions.rst
> > @@ -7,6 +7,7 @@ Mapping of some CPU versions to relevant ISA versions.
> >  = 
> > 
> >  CPU   Architecture version
> >  = 
> > 
> > +Power10   Power ISA v3.1
> >  Power9Power ISA v3.0B
> >  Power8Power ISA v2.07
> >  Power7Power ISA v2.06
> > @@ -32,6 +33,7 @@ Key Features
> >  == ==
> >  CPUVMX (aka. Altivec)
> >  == ==
> > +Power10Yes
> >  Power9 Yes
> >  Power8 Yes
> >  Power7 Yes
> > @@ -47,6 +49,7 @@ PPC970 Yes
> >  == 
> >  CPUVSX
> >  == 
> > +Power10Yes
> >  Power9 Yes
> >  Power8 Yes
> >  Power7 Yes
> > @@ -62,6 +65,7 @@ PPC970 No
> >  == 
> >  CPUTransactional Memory
> >  == 
> > +Power10Yes
> >  Power9 Yes (* see transactional_memory.txt)
> >  Power8 Yes
> >  Power7 No
>
> Huh?
>
> Transactional memory has been removed from the architecture for Power10.
Yeah you're right, I confused myself looking at CPU_FTRS_POWER10...
#define CPU_FTRS_POWER10 (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
CPU_FTR_DSCR | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \
CPU_FTR_DAWR | CPU_FTR_DAWR1)

CPU_FTR_TM_COMP should not be in there.

>
> Gabriel
>
>


[PATCH v11 5/5] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

2020-08-25 Thread Christophe Leroy
Provides __kernel_clock_gettime64() on vdso32. This is the
64-bit version of __kernel_clock_gettime(), which is
y2038 compliant.
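
For illustration (not part of the patch), a userspace caller would go
through a pointer resolved from the vDSO; the typedef below is an
assumption matching the prototype added here:

	#include <time.h>
	#include <linux/time_types.h>

	typedef int (*gettime64_fn)(clockid_t clock, struct __kernel_timespec *ts);

	/* fn is assumed to have been looked up in the vDSO (e.g. by a test
	 * harness such as vdsotest). __kernel_timespec carries a 64-bit
	 * tv_sec even on 32-bit powerpc, which is what makes this y2038 safe. */
	static int sample(gettime64_fn fn)
	{
		struct __kernel_timespec ts;

		return fn(CLOCK_MONOTONIC, &ts);
	}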

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso32/gettimeofday.S  | 9 +
 arch/powerpc/kernel/vdso32/vdso32.lds.S| 1 +
 arch/powerpc/kernel/vdso32/vgettimeofday.c | 6 ++
 3 files changed, 16 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index fd7b01c51281..a6e29f880e0e 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -35,6 +35,15 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
cvdso_call __c_kernel_clock_gettime
 V_FUNCTION_END(__kernel_clock_gettime)
 
+/*
+ * Exact prototype of clock_gettime64()
+ *
+ * int __kernel_clock_gettime64(clockid_t clock_id, struct __timespec64 *ts);
+ *
+ */
+V_FUNCTION_BEGIN(__kernel_clock_gettime64)
+   cvdso_call __c_kernel_clock_gettime64
+V_FUNCTION_END(__kernel_clock_gettime64)
 
 /*
  * Exact prototype of clock_getres()
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S 
b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 4c985467a668..582c5b046cc9 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -148,6 +148,7 @@ VERSION
 #ifndef CONFIG_PPC_BOOK3S_601
__kernel_gettimeofday;
__kernel_clock_gettime;
+   __kernel_clock_gettime64;
__kernel_clock_getres;
__kernel_time;
__kernel_get_tbfreq;
diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c 
b/arch/powerpc/kernel/vdso32/vgettimeofday.c
index 0b9ab4c22ef2..f7f71fecf4ed 100644
--- a/arch/powerpc/kernel/vdso32/vgettimeofday.c
+++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c
@@ -11,6 +11,12 @@ int __c_kernel_clock_gettime(clockid_t clock, struct 
old_timespec32 *ts,
return __cvdso_clock_gettime32_data(vd, clock, ts);
 }
 
+int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,
+  const struct vdso_data *vd)
+{
+   return __cvdso_clock_gettime_data(vd, clock, ts);
+}
+
 int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone 
*tz,
const struct vdso_data *vd)
 {
-- 
2.25.0



[PATCH v11 4/5] powerpc/vdso: Switch VDSO to generic C implementation.

2020-08-25 Thread Christophe Leroy
For VDSO32 on PPC64, we create a fake 32-bit config, on the same
principle as MIPS architecture, in order to get the correct parts of
the different asm header files.

With the C VDSO, the performance is slightly lower, but it is worth
it as it will ease maintenance and evolution, and also brings clocks
that are not supported with the ASM VDSO.

On an 8xx at 132 MHz, vdsotest with the ASM VDSO:
gettimeofday:vdso: 828 nsec/call
clock-getres-realtime-coarse:vdso: 391 nsec/call
clock-gettime-realtime-coarse:vdso: 614 nsec/call
clock-getres-realtime:vdso: 460 nsec/call
clock-gettime-realtime:vdso: 876 nsec/call
clock-getres-monotonic-coarse:vdso: 399 nsec/call
clock-gettime-monotonic-coarse:vdso: 691 nsec/call
clock-getres-monotonic:vdso: 460 nsec/call
clock-gettime-monotonic:vdso: 1026 nsec/call

On an 8xx at 132 MHz, vdsotest with the C VDSO:
gettimeofday:vdso: 955 nsec/call
clock-getres-realtime-coarse:vdso: 545 nsec/call
clock-gettime-realtime-coarse:vdso: 592 nsec/call
clock-getres-realtime:vdso: 545 nsec/call
clock-gettime-realtime:vdso: 941 nsec/call
clock-getres-monotonic-coarse:vdso: 545 nsec/call
clock-gettime-monotonic-coarse:vdso: 591 nsec/call
clock-getres-monotonic:vdso: 545 nsec/call
clock-gettime-monotonic:vdso: 940 nsec/call

It is even better for gettime with monotonic clocks.

Unsupported clocks with ASM VDSO:
clock-gettime-boottime:vdso: 3851 nsec/call
clock-gettime-tai:vdso: 3852 nsec/call
clock-gettime-monotonic-raw:vdso: 3396 nsec/call

Same clocks with C VDSO:
clock-gettime-tai:vdso: 941 nsec/call
clock-gettime-monotonic-raw:vdso: 1001 nsec/call
clock-gettime-monotonic-coarse:vdso: 591 nsec/call

On an 8321E at 333 MHz, vdsotest with the ASM VDSO:
gettimeofday:vdso: 220 nsec/call
clock-getres-realtime-coarse:vdso: 102 nsec/call
clock-gettime-realtime-coarse:vdso: 178 nsec/call
clock-getres-realtime:vdso: 129 nsec/call
clock-gettime-realtime:vdso: 235 nsec/call
clock-getres-monotonic-coarse:vdso: 105 nsec/call
clock-gettime-monotonic-coarse:vdso: 208 nsec/call
clock-getres-monotonic:vdso: 129 nsec/call
clock-gettime-monotonic:vdso: 274 nsec/call

On an 8321E at 333 MHz, vdsotest with the C VDSO:
gettimeofday:vdso: 272 nsec/call
clock-getres-realtime-coarse:vdso: 160 nsec/call
clock-gettime-realtime-coarse:vdso: 184 nsec/call
clock-getres-realtime:vdso: 166 nsec/call
clock-gettime-realtime:vdso: 281 nsec/call
clock-getres-monotonic-coarse:vdso: 160 nsec/call
clock-gettime-monotonic-coarse:vdso: 184 nsec/call
clock-getres-monotonic:vdso: 169 nsec/call
clock-gettime-monotonic:vdso: 275 nsec/call

Signed-off-by: Christophe Leroy 
---
v9:
- Rebased (Impact on arch/powerpc/kernel/vdso??/Makefile

v7:
- Split out preparatory changes in a new preceding patch
- Added -fasynchronous-unwind-tables to CC flags.

v6:
- Added missing prototypes in asm/vdso/gettimeofday.h for __c_kernel_ functions.
- Using STACK_FRAME_OVERHEAD instead of INT_FRAME_SIZE
- Rebased on powerpc/merge as of 7 Apr 2020
- Fixed build failure with gcc 9
- Added a patch to create asm/vdso/processor.h and more cpu_relax() in it

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |   2 +
 arch/powerpc/include/asm/vdso/vsyscall.h   |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h   |  40 +--
 arch/powerpc/kernel/asm-offsets.c  |  49 +---
 arch/powerpc/kernel/time.c |  91 +--
 arch/powerpc/kernel/vdso.c |   5 +-
 arch/powerpc/kernel/vdso32/Makefile|  32 ++-
 arch/powerpc/kernel/vdso32/config-fake32.h |  34 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S  | 291 +
 arch/powerpc/kernel/vdso64/Makefile|  23 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S  | 242 +
 11 files changed, 143 insertions(+), 691 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..5bd22f4b38c3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -173,6 +173,7 @@ config PPC
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
+   select GENERIC_GETTIMEOFDAY
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && 
PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
@@ -203,6 +204,7 @@ config PPC
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200   # 
plugin support on gcc <= 5.1 is buggy on PPC
+   select HAVE_GENERIC_VDSO
select HAVE_HW_BREAKPOINT   if PERF_EVENTS && (PPC_BOOK3S 
|| PPC_8xx)
select HAVE_IDE
select HAVE_IOREMAP_PROT
diff --git 

[PATCH v11 3/5] powerpc/vdso: Save and restore TOC pointer on PPC64

2020-08-25 Thread Christophe Leroy
On PPC64, the TOC pointer needs to be saved and restored.

Suggested-by: Michael Ellerman 
Signed-off-by: Christophe Leroy 
---
v9: New.

I'm not sure this is really needed, I can't see the VDSO C code doing
anything with r2, at least on ppc64_defconfig.

So I let you decide whether you take it or not.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/vdso/gettimeofday.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
index dce9d5051259..59a609a48b63 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -19,10 +19,16 @@
   .cfi_register lr, r0
	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+   PPC_STL r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
	get_datapage	r5, r0
	addi	r5, r5, VDSO_DATA_OFFSET
bl  \funct
PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+   PPC_LL  r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
cmpwi   r3, 0
	mtlr	r0
   .cfi_restore lr
@@ -42,10 +48,16 @@
   .cfi_register lr, r0
	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+   PPC_STL r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
	get_datapage	r4, r0
	addi	r4, r4, VDSO_DATA_OFFSET
bl  \funct
PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+#ifdef CONFIG_PPC64
+   PPC_LL  r2, STACK_FRAME_OVERHEAD + STK_GOT(r1)
+#endif
crclr   so
	mtlr	r0
   .cfi_restore lr
-- 
2.25.0



[PATCH v11 1/5] powerpc/processor: Move cpu_relax() into asm/vdso/processor.h

2020-08-25 Thread Christophe Leroy
cpu_relax() needs to be in asm/vdso/processor.h to be used by
the C VDSO generic library.

Move it there.

Signed-off-by: Christophe Leroy 
---
v9: Forgot to remove cpu_relax() from processor.h in v8
---
 arch/powerpc/include/asm/processor.h  | 13 ++---
 arch/powerpc/include/asm/vdso/processor.h | 23 +++
 2 files changed, 25 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/processor.h

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..c1ba9c8d9b90 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -6,6 +6,8 @@
  * Copyright (C) 2001 PPC 64 Team, IBM Corp
  */
 
+#include 
+
 #include 
 
 #ifdef CONFIG_VSX
@@ -63,14 +65,6 @@ extern int _chrp_type;
 
 #endif /* defined(__KERNEL__) && defined(CONFIG_PPC32) */
 
-/* Macros for adjusting thread priority (hardware multi-threading) */
-#define HMT_very_low()   asm volatile("or 31,31,31   # very low priority")
-#define HMT_low()   asm volatile("or 1,1,1  # low priority")
-#define HMT_medium_low() asm volatile("or 6,6,6  # medium low priority")
-#define HMT_medium()asm volatile("or 2,2,2  # medium priority")
-#define HMT_medium_high() asm volatile("or 5,5,5  # medium high priority")
-#define HMT_high()  asm volatile("or 3,3,3  # high priority")
-
 #ifdef __KERNEL__
 
 #ifdef CONFIG_PPC64
@@ -350,7 +344,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 }
 
 #ifdef CONFIG_PPC64
-#define cpu_relax()do { HMT_low(); HMT_medium(); barrier(); } while (0)
 
 #define spin_begin()   HMT_low()
 
@@ -369,8 +362,6 @@ do {
\
}   \
 } while (0)
 
-#else
-#define cpu_relax()barrier()
 #endif
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
diff --git a/arch/powerpc/include/asm/vdso/processor.h 
b/arch/powerpc/include/asm/vdso/processor.h
new file mode 100644
index 000000000000..39b9beace9ca
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/processor.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_VDSO_PROCESSOR_H
+#define __ASM_VDSO_PROCESSOR_H
+
+#ifndef __ASSEMBLY__
+
+/* Macros for adjusting thread priority (hardware multi-threading) */
+#define HMT_very_low()		asm volatile("or 31, 31, 31	# very low priority")
+#define HMT_low()		asm volatile("or 1, 1, 1	# low priority")
+#define HMT_medium_low()	asm volatile("or 6, 6, 6	# medium low priority")
+#define HMT_medium()		asm volatile("or 2, 2, 2	# medium priority")
+#define HMT_medium_high()	asm volatile("or 5, 5, 5	# medium high priority")
+#define HMT_high()		asm volatile("or 3, 3, 3	# high priority")
+
+#ifdef CONFIG_PPC64
+#define cpu_relax()do { HMT_low(); HMT_medium(); barrier(); } while (0)
+#else
+#define cpu_relax()barrier()
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_PROCESSOR_H */
-- 
2.25.0



[PATCH v11 0/5] powerpc: switch VDSO to C implementation

2020-08-25 Thread Christophe Leroy
This is the eleventh version of a series to switch the powerpc VDSO to
the generic C implementation.

Changes in v11:
- Rebased to today's powerpc/merge branch
- Prototype of __arch_get_hw_counter() was modified in mainline (patch 2)

Changes in v10 are:
- Added a comment explaining the reason for the double stack frame
- Moved back .cfi_register lr next to mflr

Main changes in v9 are:
- Dropped the patches which put the VDSO datapage in front of VDSO text in the 
mapping
- Adds a second stack frame because the caller doesn't set one, at least on 
PPC64
- Saving the TOC pointer on PPC64 (is that really needed ?)

This series applies on today's powerpc/merge branch.

See the last patches for details on changes and performance.

Christophe Leroy (5):
  powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
  powerpc/vdso: Prepare for switching VDSO to generic C implementation.
  powerpc/vdso: Save and restore TOC pointer on PPC64
  powerpc/vdso: Switch VDSO to generic C implementation.
  powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

 arch/powerpc/Kconfig |   2 +
 arch/powerpc/include/asm/clocksource.h   |   7 +
 arch/powerpc/include/asm/processor.h |  13 +-
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 198 
 arch/powerpc/include/asm/vdso/processor.h|  23 ++
 arch/powerpc/include/asm/vdso/vsyscall.h |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h |  40 +--
 arch/powerpc/kernel/asm-offsets.c|  49 +--
 arch/powerpc/kernel/time.c   |  91 +-
 arch/powerpc/kernel/vdso.c   |   5 +-
 arch/powerpc/kernel/vdso32/Makefile  |  32 +-
 arch/powerpc/kernel/vdso32/config-fake32.h   |  34 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S| 300 +--
 arch/powerpc/kernel/vdso32/vdso32.lds.S  |   1 +
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  35 +++
 arch/powerpc/kernel/vdso64/Makefile  |  23 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S| 242 +--
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 ++
 19 files changed, 454 insertions(+), 702 deletions(-)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/include/asm/vdso/processor.h
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c

-- 
2.25.0



[PATCH v11 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-25 Thread Christophe Leroy
Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for the VDSO as well as system calls in the
way that it requires setting the CR SO bit, which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retrieving the vdso data pointer there is lighter.

Implement __arch_vdso_capable() and:
- When the timebase is used, make it always return true.
- When the RTC clock is used, make it always return false.

Provide vdso_shift_ns(), as the generic x >> s gives the following
bad result:

  18:   35 25 ff e0 addic.  r9,r5,-32
  1c:   41 80 00 10 blt 2c 
  20:   7c 64 4c 30 srw r4,r3,r9
  24:   38 60 00 00 li  r3,0
...
  2c:   54 69 08 3c rlwinm  r9,r3,1,0,30
  30:   21 45 00 1f subfic  r10,r5,31
  34:   7c 84 2c 30 srw r4,r4,r5
  38:   7d 29 50 30 slw r9,r9,r10
  3c:   7c 63 2c 30 srw r3,r3,r5
  40:   7d 24 23 78 or  r4,r9,r4

In our case the shift is always <= 32. In addition, the upper 32 bits
of the result are likely zero. Let GCC know it; it also optimises the
following calculations.

With the patch, we get:
   0:   21 25 00 20 subfic  r9,r5,32
   4:   7c 69 48 30 slw r9,r3,r9
   8:   7c 84 2c 30 srw r4,r4,r5
   c:   7d 24 23 78 or  r4,r9,r4
  10:   7c 63 2c 30 srw r3,r3,r5

Signed-off-by: Christophe Leroy 
---
v11:
- Changed of __arch_get_hw_counter() to adapt to 4c5a116ada95
("vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter()")

v10:
- Added a comment to explain the reason for the two stack frames.
- Moved back the .cfi_register next to mflr

v9:
- No more modification of __get_datapage(). Offset is added after.
- Adding a second stack frame because the PPC VDSO ABI doesn't force
the caller to set one.

v8:
- New, split out of the last patch of the series

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/clocksource.h   |   7 +
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 186 +++
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  29 +++
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 +++
 5 files changed, 258 insertions(+)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c

diff --git a/arch/powerpc/include/asm/clocksource.h 
b/arch/powerpc/include/asm/clocksource.h
new file mode 100644
index ..482185566b0c
--- /dev/null
+++ b/arch/powerpc/include/asm/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_CLOCKSOURCE_H
+#define _ASM_CLOCKSOURCE_H
+
+#include <asm/vdso/clocksource.h>
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/clocksource.h 
b/arch/powerpc/include/asm/vdso/clocksource.h
new file mode 100644
index ..ec5d672d2569
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSOCLOCKSOURCE_H
+#define __ASM_VDSOCLOCKSOURCE_H
+
+#define VDSO_ARCH_CLOCKMODES   VDSO_CLOCKMODE_ARCHTIMER
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
new file mode 100644
index ..dce9d5051259
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,186 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#include 
+
+#ifdef __ASSEMBLY__
+
+/*
+ * The macro sets two stack frames, one for the caller and one for the callee
+ * because there is no requirement for the caller to set a stack frame when
+ * calling VDSO so it may have omitted to set one, especially on PPC64
+ */
+
+.macro cvdso_call funct
+  .cfi_startproc
+	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
+	mflr	r0
+  .cfi_register lr, r0
+	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
+	PPC_STL	r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+	get_datapage	r5, r0
+	addi	r5, r5, VDSO_DATA_OFFSET
+	bl	\funct
+	PPC_LL	r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+	cmpwi	r3, 0
+	mtlr	r0
+  .cfi_restore lr
+	addi	r1, r1, 2 * STACK_FRAME_OVERHEAD
+   

[PATCH V2] ASoC: fsl: imx-es8328: add missing put_device() call in imx_es8328_probe()

2020-08-25 Thread Yu Kuai
If of_find_device_by_node() succeeds, imx_es8328_probe() doesn't have
a corresponding put_device(). Thus add a jump target to fix the exception
handling for this function implementation.

Fixes: 7e7292dba215 ("ASoC: fsl: add imx-es8328 machine driver")
Signed-off-by: Yu Kuai 
---
Changes from V1:
 - remove the first patch in patch series

 sound/soc/fsl/imx-es8328.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/sound/soc/fsl/imx-es8328.c b/sound/soc/fsl/imx-es8328.c
index 15a27a2cd0ca..fad1eb6253d5 100644
--- a/sound/soc/fsl/imx-es8328.c
+++ b/sound/soc/fsl/imx-es8328.c
@@ -145,13 +145,13 @@ static int imx_es8328_probe(struct platform_device *pdev)
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
if (!data) {
ret = -ENOMEM;
-   goto fail;
+   goto put_device;
}
 
comp = devm_kzalloc(dev, 3 * sizeof(*comp), GFP_KERNEL);
if (!comp) {
ret = -ENOMEM;
-   goto fail;
+   goto put_device;
}
 
data->dev = dev;
@@ -182,12 +182,12 @@ static int imx_es8328_probe(struct platform_device *pdev)
	ret = snd_soc_of_parse_card_name(&data->card, "model");
if (ret) {
dev_err(dev, "Unable to parse card name\n");
-   goto fail;
+   goto put_device;
}
	ret = snd_soc_of_parse_audio_routing(&data->card, "audio-routing");
if (ret) {
dev_err(dev, "Unable to parse routing: %d\n", ret);
-   goto fail;
+   goto put_device;
}
data->card.num_links = 1;
data->card.owner = THIS_MODULE;
@@ -196,10 +196,12 @@ static int imx_es8328_probe(struct platform_device *pdev)
	ret = snd_soc_register_card(&data->card);
if (ret) {
dev_err(dev, "Unable to register: %d\n", ret);
-   goto fail;
+   goto put_device;
}
 
platform_set_drvdata(pdev, data);
+put_device:
+	put_device(&ssi_pdev->dev);
 fail:
of_node_put(ssi_np);
of_node_put(codec_np);
-- 
2.25.4



Re: [PATCH 1/2] ASoC: fsl: imx-es8328: add missing kfree() call in imx_es8328_probe()

2020-08-25 Thread yukuai (C)



On 2020/08/25 20:11, Mark Brown wrote:

On Tue, Aug 25, 2020 at 08:05:30PM +0800, Yu Kuai wrote:

If memory allocation for 'data' or 'comp' succeeds, imx_es8328_probe()
doesn't have a corresponding kfree() in exception handling. Thus add
kfree() for this function implementation.



@@ -151,7 +151,7 @@ static int imx_es8328_probe(struct platform_device *pdev)
comp = devm_kzalloc(dev, 3 * sizeof(*comp), GFP_KERNEL);
if (!comp) {


The allocation is being done using devm_ which means no explicit kfree()
is needed, the allocation will be automatically unwound when the device
is unbound.


Hi,

Thanks for pointing it out, I'll remove this patch.

Best regards,
Yu Kuai



[PATCH 2/2] ASoC: fsl: imx-es8328: add missing put_device() call in imx_es8328_probe()

2020-08-25 Thread Yu Kuai
If of_find_device_by_node() succeeds, imx_es8328_probe() doesn't have
a corresponding put_device(). Thus add a jump target to fix the exception
handling for this function implementation.

Fixes: 7e7292dba215 ("ASoC: fsl: add imx-es8328 machine driver")
Signed-off-by: Yu Kuai 
---
 sound/soc/fsl/imx-es8328.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sound/soc/fsl/imx-es8328.c b/sound/soc/fsl/imx-es8328.c
index 8f71ed3a6f75..a3f121939a83 100644
--- a/sound/soc/fsl/imx-es8328.c
+++ b/sound/soc/fsl/imx-es8328.c
@@ -145,7 +145,7 @@ static int imx_es8328_probe(struct platform_device *pdev)
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
if (!data) {
ret = -ENOMEM;
-   goto fail;
+   goto put_device;
}
 
comp = devm_kzalloc(dev, 3 * sizeof(*comp), GFP_KERNEL);
@@ -204,6 +204,8 @@ static int imx_es8328_probe(struct platform_device *pdev)
kfree(comp);
 free_data:
kfree(data);
+put_device:
+	put_device(&ssi_pdev->dev);
 fail:
of_node_put(ssi_np);
of_node_put(codec_np);
-- 
2.25.4



[PATCH 0/2] do exception handling appropriately in imx_es8328_probe()

2020-08-25 Thread Yu Kuai
Yu Kuai (2):
  ASoC: fsl: imx-es8328: add missing kfree() call in imx_es8328_probe()
  ASoC: fsl: imx-es8328: add missing put_device() call in
imx_es8328_probe()

 sound/soc/fsl/imx-es8328.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

-- 
2.25.4



[PATCH 1/2] ASoC: fsl: imx-es8328: add missing kfree() call in imx_es8328_probe()

2020-08-25 Thread Yu Kuai
If memory allocation for 'data' or 'comp' succeeds, imx_es8328_probe()
doesn't have a corresponding kfree() in exception handling. Thus add
kfree() for this function implementation.

Fixes: 7e7292dba215 ("ASoC: fsl: add imx-es8328 machine driver")
Signed-off-by: Yu Kuai 
---
 sound/soc/fsl/imx-es8328.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/imx-es8328.c b/sound/soc/fsl/imx-es8328.c
index 15a27a2cd0ca..8f71ed3a6f75 100644
--- a/sound/soc/fsl/imx-es8328.c
+++ b/sound/soc/fsl/imx-es8328.c
@@ -151,7 +151,7 @@ static int imx_es8328_probe(struct platform_device *pdev)
comp = devm_kzalloc(dev, 3 * sizeof(*comp), GFP_KERNEL);
if (!comp) {
ret = -ENOMEM;
-   goto fail;
+   goto free_data;
}
 
data->dev = dev;
@@ -182,12 +182,12 @@ static int imx_es8328_probe(struct platform_device *pdev)
	ret = snd_soc_of_parse_card_name(&data->card, "model");
if (ret) {
dev_err(dev, "Unable to parse card name\n");
-   goto fail;
+   goto free_comp;
}
	ret = snd_soc_of_parse_audio_routing(&data->card, "audio-routing");
if (ret) {
dev_err(dev, "Unable to parse routing: %d\n", ret);
-   goto fail;
+   goto free_comp;
}
data->card.num_links = 1;
data->card.owner = THIS_MODULE;
@@ -196,10 +196,14 @@ static int imx_es8328_probe(struct platform_device *pdev)
	ret = snd_soc_register_card(&data->card);
if (ret) {
dev_err(dev, "Unable to register: %d\n", ret);
-   goto fail;
+   goto free_comp;
}
 
platform_set_drvdata(pdev, data);
+free_comp:
+   kfree(comp);
+free_data:
+   kfree(data);
 fail:
of_node_put(ssi_np);
of_node_put(codec_np);
-- 
2.25.4



Re: [PATCH] powerpc: Update documentation of ISA versions for Power10

2020-08-25 Thread Gabriel Paubert
On Tue, Aug 25, 2020 at 09:45:07PM +1000, Jordan Niethe wrote:
> Update the CPU to ISA Version Mapping document to include Power10 and
> ISA v3.1.
> 
> Signed-off-by: Jordan Niethe 
> ---
>  Documentation/powerpc/isa-versions.rst | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/powerpc/isa-versions.rst 
> b/Documentation/powerpc/isa-versions.rst
> index a363d8c1603c..72aff1eaaea1 100644
> --- a/Documentation/powerpc/isa-versions.rst
> +++ b/Documentation/powerpc/isa-versions.rst
> @@ -7,6 +7,7 @@ Mapping of some CPU versions to relevant ISA versions.
>  ========= ====================
>  CPU       Architecture version
>  ========= ====================
> +Power10   Power ISA v3.1
>  Power9    Power ISA v3.0B
>  Power8    Power ISA v2.07
>  Power7    Power ISA v2.06
> @@ -32,6 +33,7 @@ Key Features
>  ========== ==================
>  CPU        VMX (aka. Altivec)
>  ========== ==================
> +Power10    Yes
>  Power9     Yes
>  Power8     Yes
>  Power7     Yes
> @@ -47,6 +49,7 @@ PPC970 Yes
>  ========== ====
>  CPU        VSX
>  ========== ====
> +Power10    Yes
>  Power9     Yes
>  Power8     Yes
>  Power7     Yes
> @@ -62,6 +65,7 @@ PPC970 No
>  ========== ====================
>  CPU        Transactional Memory
>  ========== ====================
> +Power10    Yes
>  Power9     Yes (* see transactional_memory.txt)
>  Power8     Yes
>  Power7     No

Huh? 

Transactional memory has been removed from the architecture for Power10. 

Gabriel
 



Re: [PATCH 1/2] ASoC: fsl: imx-es8328: add missing kfree() call in imx_es8328_probe()

2020-08-25 Thread Mark Brown
On Tue, Aug 25, 2020 at 08:05:30PM +0800, Yu Kuai wrote:
> If memory allocation for 'data' or 'comp' succeeds, imx_es8328_probe()
> doesn't have a corresponding kfree() in exception handling. Thus add
> kfree() for this function implementation.

> @@ -151,7 +151,7 @@ static int imx_es8328_probe(struct platform_device *pdev)
>   comp = devm_kzalloc(dev, 3 * sizeof(*comp), GFP_KERNEL);
>   if (!comp) {

The allocation is being done using devm_ which means no explicit kfree()
is needed, the allocation will be automatically unwound when the device
is unbound.
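
For anyone following along: a minimal, hypothetical probe() illustrating the
devm_ lifetime model (driver and struct names here are made up, not taken
from imx-es8328):

	#include <linux/platform_device.h>

	struct foo_data { int value; };	/* hypothetical driver state */

	static int foo_probe(struct platform_device *pdev)
	{
		struct foo_data *data;

		/* devm_kzalloc() ties the allocation to the device: it is
		 * freed automatically if probe fails or when the device is
		 * later unbound. */
		data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL);
		if (!data)
			return -ENOMEM;	/* no kfree() needed on this path */

		platform_set_drvdata(pdev, data);
		return 0;	/* nor on success: cleanup is automatic */
	}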




Re: [PATCH v5 5/8] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N

2020-08-25 Thread Christophe Leroy




Le 25/08/2020 à 13:07, Ravi Bangoria a écrit :

Hi Christophe,

diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c

index 57a0ab822334..866597b407bc 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -286,11 +286,16 @@ long ppc_del_hwdebug(struct task_struct *child, 
long data)

  }
  return ret;
  #else /* CONFIG_HAVE_HW_BREAKPOINT */
+    if (child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED)


I think child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED
should go inside additional ()


Not sure I follow.


Neither do I 

I thought that GCC would emit a warning for that, but in fact it only
emits warnings for things like:

if (flags & HW_BRK_FLAG_DISABLED == HW_BRK_FLAG_DISABLED)

(== binds more tightly than &, so that parses as
flags & (HW_BRK_FLAG_DISABLED == HW_BRK_FLAG_DISABLED))






+    goto del;
+
  if (child->thread.hw_brk[data - 1].address == 0)
  return -ENOENT;


What about replacing the above if by:
	if (!(child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED) &&
	    child->thread.hw_brk[data - 1].address == 0)
		return -ENOENT;

okay.. that's more compact.

But more importantly, what I wanted to know is whether 
CONFIG_HAVE_HW_BREAKPOINT
is set or not in production/distro builds for 8xx. Because I see it's 
not set in

8xx defconfigs.


Yes in our production configs with have CONFIG_PERF_EVENTS, that implies 
CONFIG_HAVE_HW_BREAKPOINT


Christophe


Re: [PATCH v5 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c

2020-08-25 Thread Christophe Leroy




Le 25/08/2020 à 13:08, Ravi Bangoria a écrit :

Hi Christophe,


+static int cache_op_size(void)
+{
+#ifdef __powerpc64__
+    return ppc64_caches.l1d.block_size;
+#else
+    return L1_CACHE_BYTES;
+#endif
+}


You've got l1_dcache_bytes() in arch/powerpc/include/asm/cache.h to do 
that.
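
A sketch of that suggestion (assuming l1_dcache_bytes() resolves to
ppc64_caches.l1d.block_size on 64-bit and L1_CACHE_BYTES on 32-bit, as the
comment above implies):

	#include <asm/cache.h>

	/* Same result as the #ifdef version quoted above, without the #ifdef. */
	static int cache_op_size(void)
	{
		return l1_dcache_bytes();
	}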



+
+void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
+ int *type, int *size, unsigned long *ea)
+{
+    struct instruction_op op;
+
+    if (__get_user_instr_inatomic(*instr, (void __user *)regs->nip))
+    return;
+
+    analyse_instr(&op, regs, *instr);
+    *type = GETTYPE(op.type);
+    *ea = op.ea;
+#ifdef __powerpc64__
+    if (!(regs->msr & MSR_64BIT))
+    *ea &= 0xffffffffUL;
+#endif


This #ifdef is unneeded, it should build fine on a 32 bits too.


This patch is just a code movement from one file to another.
I don't really change the logic. Would you mind if I do a
separate patch for these changes (not a part of this series)?


Sure, do it in a separate patch.

Christophe


[PATCH] powerpc: Update documentation of ISA versions for Power10

2020-08-25 Thread Jordan Niethe
Update the CPU to ISA Version Mapping document to include Power10 and
ISA v3.1.

Signed-off-by: Jordan Niethe 
---
 Documentation/powerpc/isa-versions.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/powerpc/isa-versions.rst 
b/Documentation/powerpc/isa-versions.rst
index a363d8c1603c..72aff1eaaea1 100644
--- a/Documentation/powerpc/isa-versions.rst
+++ b/Documentation/powerpc/isa-versions.rst
@@ -7,6 +7,7 @@ Mapping of some CPU versions to relevant ISA versions.
 ========= ====================
 CPU       Architecture version
 ========= ====================
+Power10   Power ISA v3.1
 Power9    Power ISA v3.0B
 Power8    Power ISA v2.07
 Power7    Power ISA v2.06
@@ -32,6 +33,7 @@ Key Features
 ========== ==================
 CPU        VMX (aka. Altivec)
 ========== ==================
+Power10    Yes
 Power9     Yes
 Power8     Yes
 Power7     Yes
@@ -47,6 +49,7 @@ PPC970 Yes
 ========== ====
 CPU        VSX
 ========== ====
+Power10    Yes
 Power9     Yes
 Power8     Yes
 Power7     Yes
@@ -62,6 +65,7 @@ PPC970 No
 ========== ====================
 CPU        Transactional Memory
 ========== ====================
+Power10    Yes
 Power9     Yes (* see transactional_memory.txt)
 Power8     Yes
 Power7     No
-- 
2.17.1



Re: [PATCH v5 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c

2020-08-25 Thread Ravi Bangoria

Hi Christophe,


+static int cache_op_size(void)
+{
+#ifdef __powerpc64__
+    return ppc64_caches.l1d.block_size;
+#else
+    return L1_CACHE_BYTES;
+#endif
+}


You've got l1_dcache_bytes() in arch/powerpc/include/asm/cache.h to do that.


+
+void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
+ int *type, int *size, unsigned long *ea)
+{
+    struct instruction_op op;
+
+    if (__get_user_instr_inatomic(*instr, (void __user *)regs->nip))
+    return;
+
+    analyse_instr(&op, regs, *instr);
+    *type = GETTYPE(op.type);
+    *ea = op.ea;
+#ifdef __powerpc64__
+    if (!(regs->msr & MSR_64BIT))
+    *ea &= 0xffffffffUL;
+#endif


This #ifdef is unneeded, it should build fine on a 32 bits too.


This patch is just a code movement from one file to another.
I don't really change the logic. Would you mind if I do a
separate patch for these changes (not a part of this series)?

Thanks for review,
Ravi


Re: [PATCH v5 5/8] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N

2020-08-25 Thread Ravi Bangoria

Hi Christophe,


diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index 57a0ab822334..866597b407bc 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -286,11 +286,16 @@ long ppc_del_hwdebug(struct task_struct *child, long data)
  }
  return ret;
  #else /* CONFIG_HAVE_HW_BREAKPOINT */
+    if (child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED)


I think child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED should go
inside additional ()


Not sure I follow.




+    goto del;
+
  if (child->thread.hw_brk[data - 1].address == 0)
  return -ENOENT;


What about replacing the above if by:
	if (!(child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED) &&
	    child->thread.hw_brk[data - 1].address == 0)
		return -ENOENT;

okay.. that's more compact.

But more importantly, what I wanted to know is whether CONFIG_HAVE_HW_BREAKPOINT
is set or not in production/distro builds for 8xx. Because I see it's not set in
8xx defconfigs.

Thanks,
Ravi


Re: [RFT][PATCH 0/7] Avoid overflow at boundary_size

2020-08-25 Thread Niklas Schnelle



On 8/21/20 1:19 AM, Nicolin Chen wrote:
> We are extending the default DMA segmentation boundary to its
> possible maximum value (ULONG_MAX) to indicate that a device
> doesn't specify a boundary limit. So all dma_get_seg_boundary
> callers should take a precaution with the return values since
> it would easily get overflowed.
> 
> I scanned the entire kernel tree for all the existing callers
> and found that most of callers may get overflowed in two ways:
> either "+ 1" or passing it to ALIGN() that does "+ mask".
> 
> According to kernel defines:
> #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
> #define ALIGN(x, a)   ALIGN_MASK(x, (typeof(x))(a) - 1)
> 
> We can simplify the logic here:
>   ALIGN(boundary + 1, 1 << shift) >> shift
> = ALIGN_MASK(b + 1, (1 << s) - 1) >> s
> = {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
> = [b + 1 + (1 << s) - 1] >> s
> = [b + (1 << s)] >> s
> = (b >> s) + 1
> 
> So this series of patches fixes the potential overflow with this
> overflow-free shortcut.
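
In other words, the series replaces the ALIGN()-based computation with the
shortcut derived above. A sketch of the overflow-free form (the helper name
is made up for illustration):

	#include <linux/dma-mapping.h>

	/* Number of IOMMU pages covered by the segment boundary. The
	 * ALIGN()-based form wraps to 0 when dma_get_seg_boundary()
	 * returns ULONG_MAX; shifting first cannot overflow. */
	static unsigned long seg_boundary_pages(struct device *dev,
						unsigned int shift)
	{
		return (dma_get_seg_boundary(dev) >> shift) + 1;
	}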

Hi Nicolin,

haven't seen any other feedback from other maintainers,
so I guess you will resend this?
On first glance it seems to make sense.
I'm a little confused why it is only a "potential overflow"
while this part

"We are expending the default DMA segmentation boundary to its
 possible maximum value (ULONG_MAX) to indicate that a device
 doesn't specify a boundary limit"

sounds to me like ULONG_MAX is actually used, does that
mean there are currently no devices which do not specify a
boundary limit?


> 
> As I don't think that I have these platforms, marking RFT.
> 
> Thanks
> Nic
> 
> Nicolin Chen (7):
>   powerpc/iommu: Avoid overflow at boundary_size
>   alpha: Avoid overflow at boundary_size
>   ia64/sba_iommu: Avoid overflow at boundary_size
>   s390/pci_dma: Avoid overflow at boundary_size
>   sparc: Avoid overflow at boundary_size
>   x86/amd_gart: Avoid overflow at boundary_size
>   parisc: Avoid overflow at boundary_size
> 
>  arch/alpha/kernel/pci_iommu.c| 10 --
>  arch/ia64/hp/common/sba_iommu.c  |  4 ++--
>  arch/powerpc/kernel/iommu.c  | 11 +--
>  arch/s390/pci/pci_dma.c  |  4 ++--
>  arch/sparc/kernel/iommu-common.c |  9 +++--
>  arch/sparc/kernel/iommu.c|  4 ++--
>  arch/sparc/kernel/pci_sun4v.c|  4 ++--
>  arch/x86/kernel/amd_gart_64.c|  4 ++--
>  drivers/parisc/ccio-dma.c|  4 ++--
>  drivers/parisc/sba_iommu.c   |  4 ++--
>  10 files changed, 26 insertions(+), 32 deletions(-)
> 


Re: [PATCH] usb: gadget: fsl: Fix unsigned expression compared with zero in fsl_udc_probe

2020-08-25 Thread Joakim Tjernlund
On Tue, 2020-08-25 at 11:53 +0300, Felipe Balbi wrote:
Joakim Tjernlund <joakim.tjernl...@infinera.com> writes:

> On Mon, 2020-08-24 at 16:58 +0300, Felipe Balbi wrote:
>> Joakim Tjernlund <joakim.tjernl...@infinera.com> writes:
>>
>> > On Mon, 2020-08-24 at 10:21 +0200, Greg KH wrote:
>> > >
>> > > On Mon, Aug 24, 2020 at 04:04:37PM +0800, Ye Bin wrote:
>> > > > Signed-off-by: Ye Bin <yebi...@huawei.com>
>> > >
>> > > I can't take patches without any changelog text, sorry.
>> >
>> > Still taking patches for fsl_udc_core.c ?
>> > I figured this driver was obsolete and should be moved to one of the 
>> > Chipidea drivers.
>>
>> Nobody sent any patches to switch over the users of this driver to
>> chipidea. I would love to delete this driver :-)
>
> Me too, I got a few local patches here as the driver is quite buggy.
> Got too little USB knowledge to switch it over though :(

this wouldn't require USB knowledge. It only requires some minor DTS
knowledge and HW for testing.

hmm, OK. If it is that simple I may take a crack at it (but then why hasn't NXP
already done that?)
I would need some guidance as to what the involved files are?

Jocke



Re: [PATCH v5 5/8] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N

2020-08-25 Thread Christophe Leroy




Le 25/08/2020 à 06:36, Ravi Bangoria a écrit :

On powerpc, ptrace watchpoint works in one-shot mode. i.e. kernel
disables event every time it fires and user has to re-enable it.
Also, in case of ptrace watchpoint, kernel notifies ptrace user
before executing instruction.

With CONFIG_HAVE_HW_BREAKPOINT=N, kernel is missing to disable
ptrace event and thus it's causing infinite loop of exceptions.
This is especially harmful when user watches on a data which is
also read/written by kernel, eg syscall parameters. In such case,
infinite exceptions happens in kernel mode which causes soft-lockup.

Fixes: 9422de3e953d ("powerpc: Hardware breakpoints rewrite to handle non DABR 
breakpoint registers")
Reported-by: Pedro Miraglia Franco de Carvalho 
Signed-off-by: Ravi Bangoria 
---
  arch/powerpc/include/asm/hw_breakpoint.h  |  3 ++
  arch/powerpc/kernel/process.c | 48 +++
  arch/powerpc/kernel/ptrace/ptrace-noadv.c |  5 +++
  3 files changed, 56 insertions(+)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 2eca3dd54b55..c72263214d3f 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -18,6 +18,7 @@ struct arch_hw_breakpoint {
u16 type;
u16 len; /* length of the target data symbol */
u16 hw_len; /* length programmed in hw */
+   u8  flags;
  };
  
  /* Note: Don't change the first 6 bits below as they are in the same order

@@ -37,6 +38,8 @@ struct arch_hw_breakpoint {
  #define HW_BRK_TYPE_PRIV_ALL  (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \
 HW_BRK_TYPE_HYP)
  
+#define HW_BRK_FLAG_DISABLED	0x1

+
  /* Minimum granularity */
  #ifdef CONFIG_PPC_8xx
  #define HW_BREAKPOINT_SIZE  0x4
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 016bd831908e..160fbbf41d40 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -636,6 +636,44 @@ void do_send_trap(struct pt_regs *regs, unsigned long 
address,
(void __user *)address);
  }
  #else /* !CONFIG_PPC_ADV_DEBUG_REGS */
+
+static void do_break_handler(struct pt_regs *regs)
+{
+   struct arch_hw_breakpoint null_brk = {0};
+   struct arch_hw_breakpoint *info;
+   struct ppc_inst instr = ppc_inst(0);
+   int type = 0;
+   int size = 0;
+   unsigned long ea;
+   int i;
+
+   /*
+* If underneath hw supports only one watchpoint, we know it
+* caused exception. 8xx also falls into this category.
+*/
+   if (nr_wp_slots() == 1) {
+		__set_breakpoint(0, &null_brk);
+   current->thread.hw_brk[0] = null_brk;
+   current->thread.hw_brk[0].flags |= HW_BRK_FLAG_DISABLED;
+   return;
+   }
+
+	/* Otherwise find out which DAWR caused exception and disable it. */
+	wp_get_instr_detail(regs, &instr, &type, &size, &ea);
+
+	for (i = 0; i < nr_wp_slots(); i++) {
+		info = &current->thread.hw_brk[i];
+   if (!info->address)
+   continue;
+
+   if (wp_check_constraints(regs, instr, ea, type, size, info)) {
+			__set_breakpoint(i, &null_brk);
+   current->thread.hw_brk[i] = null_brk;
+   current->thread.hw_brk[i].flags |= HW_BRK_FLAG_DISABLED;
+   }
+   }
+}
+
  void do_break (struct pt_regs *regs, unsigned long address,
unsigned long error_code)
  {
@@ -647,6 +685,16 @@ void do_break (struct pt_regs *regs, unsigned long address,
if (debugger_break_match(regs))
return;
  
+	/*
+	 * We reach here only when watchpoint exception is generated by ptrace
+	 * event (or hw is buggy!). Now if CONFIG_HAVE_HW_BREAKPOINT is set,
+	 * watchpoint is already handled by hw_breakpoint_handler() so we don't
+	 * have to do anything. But when CONFIG_HAVE_HW_BREAKPOINT is not set,
+	 * we need to manually handle the watchpoint here.
+	 */
+	if (!IS_ENABLED(CONFIG_HAVE_HW_BREAKPOINT))
+		do_break_handler(regs);
+
/* Deliver the signal to userspace */
force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)address);
  }
diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c 
b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
index 57a0ab822334..866597b407bc 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c
@@ -286,11 +286,16 @@ long ppc_del_hwdebug(struct task_struct *child, long data)
}
return ret;
  #else /* CONFIG_HAVE_HW_BREAKPOINT */
+   if (child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED)


I think child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED
should go inside additional ()



+   goto del;
+
if (child->thread.hw_brk[data - 

[PATCH] powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode

2020-08-25 Thread Michael Ellerman
The recent commit 01eb01877f33 ("powerpc/64s: Fix restore_math
unnecessarily changing MSR") changed some of the handling of floating
point/vector restore.

In particular it caused current->thread.fpexc_mode to be copied into
the current MSR (via msr_check_and_set()), rather than just into
regs->msr (which is moved into MSR on return to userspace).

This can lead to a crash in the kernel if we take a floating point
exception when restoring FPSCR:

  Oops: Exception in kernel mode, sig: 8 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 3 PID: 101213 Comm: ld64.so.2 Not tainted 
5.9.0-rc1-00098-g18445bf405cb-dirty #9
  NIP:  c000fbb4 LR: c001a7ac CTR: c0183570
  REGS: c016b7cfb3b0 TRAP: 0700   Not tainted  
(5.9.0-rc1-00098-g18445bf405cb-dirty)
  MSR:  9290b933   CR: 44002444  
XER: 
  CFAR: c001a7a8 IRQMASK: 1
  GPR00: c001ae40 c016b7cfb640 c11b7f00 c01542a0f740
  GPR04: c01542a0f720 c01542a0eb00 0900 c01542a0eb00
  GPR08: 000a 2000 90009033 
  GPR12: 4000 c017d900 0001 c0df5a58
  GPR16: c0e19c18 c10e1123 0001 c0e1a638
  GPR20:  c44b1d00  c01542a0f2a0
  GPR24: 0016c7fe c01542a0f720 c1c93da0 c0fe5f28
  GPR28: c01542a0f720 0080 c016b7cfbe90 02802900
  NIP load_fp_state+0x4/0x214
  LR  restore_math+0x17c/0x1f0
  Call Trace:
0xc016b7cfb680 (unreliable)
__switch_to+0x330/0x460
__schedule+0x318/0x920
schedule+0x74/0x140
schedule_timeout+0x318/0x3f0
wait_for_completion+0xc8/0x210
call_usermodehelper_exec+0x234/0x280
do_coredump+0xedc/0x13c0
get_signal+0x1d4/0xbe0
do_notify_resume+0x1a0/0x490
interrupt_exit_user_prepare+0x1c4/0x230
interrupt_return+0x14/0x1c0
  Instruction dump:
  ebe10168 e88101a0 7c8ff120 382101e0 e8010010 7c0803a6 4e800020 790605c4
  782905c4 7c0008a8 7c0008a8 c8030200  4888 c803 c8230010

Fix it by only loading the fpexc_mode value into regs->msr.

Also add a comment to explain that although VSX is subject to the
value of fpexc_mode, we don't have to handle that separately because
we only allow VSX to be enabled if FP is also enabled.

Fixes: 01eb01877f33 ("powerpc/64s: Fix restore_math unnecessarily changing MSR")
Reported-by: Milton Miller 
Signed-off-by: Michael Ellerman 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kernel/process.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 016bd831908e..73a57043ee66 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -548,7 +548,7 @@ void notrace restore_math(struct pt_regs *regs)
 * are live for the user thread).
 */
if ((!(msr & MSR_FP)) && should_restore_fp())
-   new_msr |= MSR_FP | current->thread.fpexc_mode;
+   new_msr |= MSR_FP;
 
if ((!(msr & MSR_VEC)) && should_restore_altivec())
new_msr |= MSR_VEC;
@@ -559,11 +559,17 @@ void notrace restore_math(struct pt_regs *regs)
}
 
if (new_msr) {
+   unsigned long fpexc_mode = 0;
+
msr_check_and_set(new_msr);
 
-   if (new_msr & MSR_FP)
+   if (new_msr & MSR_FP) {
do_restore_fp();
 
+   // This also covers VSX, because VSX implies FP
+   fpexc_mode = current->thread.fpexc_mode;
+   }
+
if (new_msr & MSR_VEC)
do_restore_altivec();
 
@@ -572,7 +578,7 @@ void notrace restore_math(struct pt_regs *regs)
 
msr_check_and_clear(new_msr);
 
-   regs->msr |= new_msr;
+   regs->msr |= new_msr | fpexc_mode;
}
 }
 #endif
-- 
2.25.1



Re: [PATCH v5 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c

2020-08-25 Thread Christophe Leroy




Le 25/08/2020 à 06:36, Ravi Bangoria a écrit :

Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused
the exception. So we have a sw logic to detect that in hw_breakpoint.c.
But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y.
Move DAWR detection logic outside of hw_breakpoint.c so that it can be
reused when CONFIG_HAVE_HW_BREAKPOINT is not set.

Signed-off-by: Ravi Bangoria 
---
  arch/powerpc/include/asm/hw_breakpoint.h  |   8 +
  arch/powerpc/kernel/Makefile  |   3 +-
  arch/powerpc/kernel/hw_breakpoint.c   | 159 +
  .../kernel/hw_breakpoint_constraints.c| 162 ++
  4 files changed, 174 insertions(+), 158 deletions(-)
  create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c



[...]


diff --git a/arch/powerpc/kernel/hw_breakpoint_constraints.c 
b/arch/powerpc/kernel/hw_breakpoint_constraints.c
new file mode 100644
index ..867ee4aa026a
--- /dev/null
+++ b/arch/powerpc/kernel/hw_breakpoint_constraints.c
@@ -0,0 +1,162 @@


[...]


+
+static int cache_op_size(void)
+{
+#ifdef __powerpc64__
+   return ppc64_caches.l1d.block_size;
+#else
+   return L1_CACHE_BYTES;
+#endif
+}


You've got l1_dcache_bytes() in arch/powerpc/include/asm/cache.h to do that.


+
+void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
+int *type, int *size, unsigned long *ea)
+{
+   struct instruction_op op;
+
+   if (__get_user_instr_inatomic(*instr, (void __user *)regs->nip))
+   return;
+
+	analyse_instr(&op, regs, *instr);
+   *type = GETTYPE(op.type);
+   *ea = op.ea;
+#ifdef __powerpc64__
+   if (!(regs->msr & MSR_64BIT))
+		*ea &= 0xffffffffUL;
+#endif


This #ifdef is unneeded, it should build fine on a 32 bits too.


+
+   *size = GETSIZE(op.type);
+   if (*type == CACHEOP) {
+   *size = cache_op_size();
+   *ea &= ~(*size - 1);
+   } else if (*type == LOAD_VMX || *type == STORE_VMX) {
+   *ea &= ~(*size - 1);
+   }
+}



Christophe


Re: [PATCH] usb: gadget: fsl: Fix unsigned expression compared with zero in fsl_udc_probe

2020-08-25 Thread Felipe Balbi
Joakim Tjernlund  writes:

> On Mon, 2020-08-24 at 16:58 +0300, Felipe Balbi wrote:
>> Joakim Tjernlund  writes:
>> 
>> > On Mon, 2020-08-24 at 10:21 +0200, Greg KH wrote:
>> > > 
>> > > On Mon, Aug 24, 2020 at 04:04:37PM +0800, Ye Bin wrote:
>> > > > Signed-off-by: Ye Bin 
>> > > 
>> > > I can't take patches without any changelog text, sorry.
>> > 
>> > Still taking patches for fsl_udc_core.c ?
>> > I figured this driver was obsolete and should be moved to one of the 
>> > Chipidea drivers.
>> 
>> Nobody sent any patches to switch over the users of this driver to
>> chipidea. I would love to delete this driver :-)
>
> Me too, I got a few local patches here as the driver is quite buggy.
> Got too little USB knowledge to switch it over though :(

this wouldn't require USB knowledge. It only requires some minor DTS
knowledge and HW for testing.

-- 
balbi




Re: Build regressions/improvements in v5.9-rc2

2020-08-25 Thread Geert Uytterhoeven
On Tue, Aug 25, 2020 at 10:23 AM Geert Uytterhoeven
 wrote:
> JFYI, when comparing v5.9-rc2[1] to v5.9-rc1[3], the summaries are:
>   - build errors: +12/-0

  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:
error: implicit declaration of function 'disable_kernel_vsx'
[-Werror=implicit-function-declaration]:  => 676:2
  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:
error: implicit declaration of function 'enable_kernel_vsx'
[-Werror=implicit-function-declaration]:  => 640:2

powerpc-gcc4.9/ppc64_book3e_allmodconfig

  + error: arch/sparc/kernel/head_32.o: relocation truncated to fit:
R_SPARC_WDISP22 against `.init.text':  => (.head.text+0x5040),
(.head.text+0x5100)
  + error: arch/sparc/kernel/head_32.o: relocation truncated to fit:
R_SPARC_WDISP22 against symbol `leon_smp_cpu_startup' defined in .text
section in arch/sparc/kernel/trampoline_32.o:  => (.init.text+0xa4)
  + error: arch/sparc/kernel/process_32.o: relocation truncated to
fit: R_SPARC_WDISP22 against `.text':  => (.fixup+0x4), (.fixup+0xc)
  + error: arch/sparc/kernel/signal_32.o: relocation truncated to fit:
R_SPARC_WDISP22 against `.text':  => (.fixup+0x28), (.fixup+0x1c),
(.fixup+0x34), (.fixup+0x10), (.fixup+0x4)

sparc64/sparc-allmodconfig

  + error: modpost: "devm_ioremap"
[drivers/net/ethernet/xilinx/ll_temac.ko] undefined!:  => N/A
  + error: modpost: "devm_ioremap_resource"
[drivers/net/ethernet/xilinx/xilinx_emac.ko] undefined!:  => N/A
  + error: modpost: "devm_of_iomap"
[drivers/net/ethernet/xilinx/ll_temac.ko] undefined!:  => N/A
  + error: modpost: "devm_platform_ioremap_resource"
[drivers/iio/adc/adi-axi-adc.ko] undefined!:  => N/A
  + error: modpost: "devm_platform_ioremap_resource"
[drivers/ptp/ptp_ines.ko] undefined!:  => N/A
  + error: modpost: "devm_platform_ioremap_resource_byname"
[drivers/net/ethernet/xilinx/ll_temac.ko] undefined!:  => N/A

um-x86_64/um-all{mod,yes}config

> [1] 
> http://kisskb.ellerman.id.au/kisskb/branch/linus/head/d012a7190fc1fd72ed48911e77ca97ba4521bccd/
>  (all 192 configs)
> [3] 
> http://kisskb.ellerman.id.au/kisskb/branch/linus/head/9123e3a74ec7b934a4a099e98af6a61c2f80bbf5/
>  (all 192 configs)
Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH] powerpc/pseries: add new branch prediction security bits for link stack

2020-08-25 Thread Nicholas Piggin
The hypervisor interface has defined branch prediction security bits for
handling the link stack. Wire them up.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/hvcall.h  | 2 ++
 arch/powerpc/platforms/pseries/setup.c | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index fbb377055471..e66627fc1972 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -375,11 +375,13 @@
 #define H_CPU_CHAR_THREAD_RECONFIG_CTRL(1ull << 57) // IBM bit 6
 #define H_CPU_CHAR_COUNT_CACHE_DISABLED(1ull << 56) // IBM bit 7
 #define H_CPU_CHAR_BCCTR_FLUSH_ASSIST  (1ull << 54) // IBM bit 9
+#define H_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST (1ull << 52) // IBM bit 11
 
 #define H_CPU_BEHAV_FAVOUR_SECURITY(1ull << 63) // IBM bit 0
 #define H_CPU_BEHAV_L1D_FLUSH_PR   (1ull << 62) // IBM bit 1
 #define H_CPU_BEHAV_BNDS_CHK_SPEC_BAR  (1ull << 61) // IBM bit 2
 #define H_CPU_BEHAV_FLUSH_COUNT_CACHE  (1ull << 58) // IBM bit 5
+#define H_CPU_BEHAV_FLUSH_LINK_STACK   (1ull << 57) // IBM bit 6
 
 /* Flag values used in H_REGISTER_PROC_TBL hcall */
 #define PROC_TABLE_OP_MASK 0x18
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 2f4ee0a90284..633c45ec406d 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -519,9 +519,15 @@ static void init_cpu_char_feature_flags(struct 
h_cpu_char_result *result)
if (result->character & H_CPU_CHAR_BCCTR_FLUSH_ASSIST)
security_ftr_set(SEC_FTR_BCCTR_FLUSH_ASSIST);
 
+   if (result->character & H_CPU_CHAR_BCCTR_LINK_FLUSH_ASSIST)
+   security_ftr_set(SEC_FTR_BCCTR_LINK_FLUSH_ASSIST);
+
if (result->behaviour & H_CPU_BEHAV_FLUSH_COUNT_CACHE)
security_ftr_set(SEC_FTR_FLUSH_COUNT_CACHE);
 
+   if (result->behaviour & H_CPU_BEHAV_FLUSH_LINK_STACK)
+   security_ftr_set(SEC_FTR_FLUSH_LINK_STACK);
+
/*
 * The features below are enabled by default, so we instead look to see
 * if firmware has *disabled* them, and clear them if so.
-- 
2.23.0



[PATCH] powerpc/64s: handle ISA v3.1 local copy-paste context switches

2020-08-25 Thread Nicholas Piggin
In ISA v3.1 the copy-paste facility has a new memory move functionality
which allows the copy buffer to be pasted to domestic memory (RAM) as
opposed to foreign memory (accelerator).

This means the POWER9 trick of avoiding the cp_abort on context switch if
the process had not mapped foreign memory does not work on POWER10. Do the
cp_abort unconditionally there.

KVM must also cp_abort on guest exit to prevent copy buffer state leaking
between contexts.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/process.c   | 16 +---
 arch/powerpc/kvm/book3s_hv.c|  7 +++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 016bd831908e..1a572c811ca5 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1250,15 +1250,17 @@ struct task_struct *__switch_to(struct task_struct 
*prev,
restore_math(current->thread.regs);
 
/*
-* The copy-paste buffer can only store into foreign real
-* addresses, so unprivileged processes can not see the
-* data or use it in any way unless they have foreign real
-* mappings. If the new process has the foreign real address
-* mappings, we must issue a cp_abort to clear any state and
-* prevent snooping, corruption or a covert channel.
+* On POWER9 the copy-paste buffer can only paste into
+* foreign real addresses, so unprivileged processes can not
+* see the data or use it in any way unless they have
+* foreign real mappings. If the new process has the foreign
+* real address mappings, we must issue a cp_abort to clear
+* any state and prevent snooping, corruption or a covert
+* channel. ISA v3.1 supports paste into local memory.
 */
if (current->mm &&
-	    atomic_read(&current->mm->context.vas_windows))
+	    (cpu_has_feature(CPU_FTR_ARCH_31) ||
+	    atomic_read(&current->mm->context.vas_windows)))
asm volatile(PPC_CP_ABORT);
}
 #endif /* CONFIG_PPC_BOOK3S_64 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4ba06a2a306c..3bd3118c7633 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3530,6 +3530,13 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
 */
asm volatile("eieio; tlbsync; ptesync");
 
+   /*
+* cp_abort is required if the processor supports local copy-paste
+* to clear the copy buffer that was under control of the guest.
+*/
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   asm volatile(PPC_CP_ABORT);
+
mtspr(SPRN_LPID, vcpu->kvm->arch.host_lpid);/* restore host LPID */
isync();
 
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 799d6d0f4ead..cd9995ee8441 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1830,6 +1830,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_RADIX_PREFETCH_BUG)
 2:
 #endif /* CONFIG_PPC_RADIX_MMU */
 
+   /*
+* cp_abort is required if the processor supports local copy-paste
+* to clear the copy buffer that was under control of the guest.
+*/
+BEGIN_FTR_SECTION
+   PPC_CP_ABORT
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_31)
+
/*
 * POWER7/POWER8 guest -> host partition switch code.
 * We don't have to lock against tlbies but we do
-- 
2.23.0



[PATCH] powerpc/64s: Add cp_abort after tlbiel to invalidate copy-buffer address

2020-08-25 Thread Nicholas Piggin
The copy buffer is implemented as a real address in the nest which is
translated from EA by copy, and used for memory access by paste. This
requires that it be invalidated by TLB invalidation.

TLBIE does invalidate the copy buffer, but TLBIEL does not. Add cp_abort
to the tlbiel sequence.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/synch.h   | 13 +
 arch/powerpc/mm/book3s64/hash_native.c |  8 
 arch/powerpc/mm/book3s64/radix_tlb.c   | 12 ++--
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h
index aca70fb43147..47d036d32828 100644
--- a/arch/powerpc/include/asm/synch.h
+++ b/arch/powerpc/include/asm/synch.h
@@ -3,7 +3,9 @@
 #define _ASM_POWERPC_SYNCH_H 
 #ifdef __KERNEL__
 
+#include 
 #include 
+#include 
 #include 
 
 #ifndef __ASSEMBLY__
@@ -20,6 +22,17 @@ static inline void isync(void)
 {
__asm__ __volatile__ ("isync" : : : "memory");
 }
+
+static inline void ppc_after_tlbiel_barrier(void)
+{
+	asm volatile("ptesync": : :"memory");
+	/*
+	 * POWER9, POWER10 need a cp_abort after tlbiel. For POWER9 this could
+	 * possibly be limited to tasks which have mapped foreign address,
+	 * similar to cp_abort in context switch.
+	 */
+	asm volatile(ASM_FTR_IFSET(PPC_CP_ABORT, "", %0) : : "i" (CPU_FTR_ARCH_300) : "memory");
+}
 #endif /* __ASSEMBLY__ */
 
 #if defined(__powerpc64__)
diff --git a/arch/powerpc/mm/book3s64/hash_native.c 
b/arch/powerpc/mm/book3s64/hash_native.c
index cf20e5229ce1..0203cdf48c54 100644
--- a/arch/powerpc/mm/book3s64/hash_native.c
+++ b/arch/powerpc/mm/book3s64/hash_native.c
@@ -82,7 +82,7 @@ static void tlbiel_all_isa206(unsigned int num_sets, unsigned 
int is)
for (set = 0; set < num_sets; set++)
tlbiel_hash_set_isa206(set, is);
 
-   asm volatile("ptesync": : :"memory");
+   ppc_after_tlbiel_barrier();
 }
 
 static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is)
@@ -110,7 +110,7 @@ static void tlbiel_all_isa300(unsigned int num_sets, 
unsigned int is)
 */
tlbiel_hash_set_isa300(0, is, 0, 2, 1);
 
-   asm volatile("ptesync": : :"memory");
+   ppc_after_tlbiel_barrier();
 
asm volatile(PPC_ISA_3_0_INVALIDATE_ERAT "; isync" : : :"memory");
 }
@@ -303,7 +303,7 @@ static inline void tlbie(unsigned long vpn, int psize, int 
apsize,
asm volatile("ptesync": : :"memory");
if (use_local) {
__tlbiel(vpn, psize, apsize, ssize);
-   asm volatile("ptesync": : :"memory");
+   ppc_after_tlbiel_barrier();
} else {
__tlbie(vpn, psize, apsize, ssize);
fixup_tlbie_vpn(vpn, psize, apsize, ssize);
@@ -879,7 +879,7 @@ static void native_flush_hash_range(unsigned long number, 
int local)
__tlbiel(vpn, psize, psize, ssize);
} pte_iterate_hashed_end();
}
-   asm volatile("ptesync":::"memory");
+   ppc_after_tlbiel_barrier();
} else {
int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
b/arch/powerpc/mm/book3s64/radix_tlb.c
index 0d233763441f..5c9d2fccacc7 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -65,7 +65,7 @@ static void tlbiel_all_isa300(unsigned int num_sets, unsigned 
int is)
for (set = 1; set < num_sets; set++)
tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 1);
 
-   asm volatile("ptesync": : :"memory");
+   ppc_after_tlbiel_barrier();
 }
 
 void radix__tlbiel_all(unsigned int action)
@@ -296,7 +296,7 @@ static __always_inline void _tlbiel_pid(unsigned long pid, 
unsigned long ric)
 
/* For PWC, only one flush is needed */
if (ric == RIC_FLUSH_PWC) {
-   asm volatile("ptesync": : :"memory");
+   ppc_after_tlbiel_barrier();
return;
}
 
@@ -304,7 +304,7 @@ static __always_inline void _tlbiel_pid(unsigned long pid, 
unsigned long ric)
for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
__tlbiel_pid(pid, set, RIC_FLUSH_TLB);
 
-   asm volatile("ptesync": : :"memory");
+   ppc_after_tlbiel_barrier();
asm volatile(PPC_RADIX_INVALIDATE_ERAT_USER "; isync" : : :"memory");
 }
 
@@ -431,7 +431,7 @@ static __always_inline void _tlbiel_va(unsigned long va, 
unsigned long pid,
 
asm volatile("ptesync": : :"memory");
__tlbiel_va(va, pid, ap, ric);
-   asm volatile("ptesync": : :"memory");
+   ppc_after_tlbiel_barrier();
 }
 
 static inline void _tlbiel_va_range(unsigned long start, unsigned long end,
@@ -442,7 +442,7 @@ static inline void _tlbiel_va_range(unsigned long start, 
unsigned long end,
if (also_pwc)
__tlbiel_pid(pid, 0, 

[PATCH] powerpc/64s: scv entry should set PPR

2020-08-25 Thread Nicholas Piggin
Kernel entry sets PPR to HMT_MEDIUM by convention. The scv entry
path missed this.

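(For context, and to the best of my understanding of asm/ppc_asm.h — treat
this as an assumption, not part of the patch: the HMT_* macros are the
SMT-priority nops, e.g.

	#define HMT_MEDIUM	or	2,2,2

and PPR is the Program Priority Register those nops write, so kernel entry
resets the priority that userspace may have lowered.)
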
Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv 
instructions")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 33a42e42c56f..733e40eba4eb 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -113,6 +113,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
ld  r11,exception_marker@toc(r2)
std r11,-16(r10)/* "regshere" marker */
 
+BEGIN_FTR_SECTION
+   HMT_MEDIUM
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+
/*
 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
 * would clobber syscall parameters. Also we always enter with IRQs
-- 
2.23.0



Re: fsl_espi errors on v5.7.15

2020-08-25 Thread Heiner Kallweit
On 25.08.2020 05:54, Chris Packham wrote:
> 
> On 25/08/20 10:04 am, Chris Packham wrote:
>>
>> On 20/08/20 9:08 am, Chris Packham wrote:
>>>
>>> On 19/08/20 6:15 pm, Heiner Kallweit wrote:
 On 19.08.2020 00:44, Chris Packham wrote:
> Hi Again,
>
> On 17/08/20 9:09 am, Chris Packham wrote:
>
>> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
>>> On 14.08.2020 04:48, Chris Packham wrote:
 Hi,

 I'm seeing a problem with accessing spi-nor after upgrading a T2081
 based system to linux v5.7.15

 For this board u-boot and the u-boot environment live on spi-nor.

 When I use fw_setenv from userspace I get the following kernel logs

 # fw_setenv foo=1
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
 fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
 fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
 ...

>>> This error reporting doesn't exist yet in 4.4. So you may have an 
>>> issue
>>> under 4.4 too, it's just not reported.
>>> Did you verify that under 4.4 fw_setenv actually has an effect?
>> Just double checked and yes under 4.4 the setting does get saved.
 If I run fw_printenv (before getting it into a bad state) it is 
 able to
 display the content of the boards u-boot environment.

>>> This might indicate an issue with spi being locked. I've seen 
>>> related
>>> questions, just use the search engine of your choice and check for
>>> fw_setenv and locked.
>> I'm running a version of fw_setenv which includes
>> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it 
>> shouldn't
>> be locking things unnecessarily.
 If been unsuccessful in producing a setup for bisecting the 
 issue. I do
 know the issue doesn't occur on the old 4.4.x based kernel but 
 that's
 probably not much help.

 Any pointers on what the issue (and/or solution) might be.
> I finally managed to get our board running with a vanilla kernel. With
> corenet64_smp_defconfig I occasionally see
>
>     fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>
> other than the message things seem to be working.
>
> With a custom defconfig I see
>
>     fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>     fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>     fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>     ...
>
> and access to the spi-nor does not work until the board is reset.
>
> I'll try and pick apart the differences between the two defconfigs.
>>>
>>> I now think my earlier testing is invalid. I have seen the problem 
>>> with either defconfig if I try hard enough. I had convinced myself 
>>> that the problem was CONFIG_PREEMPT but that was before I found 
>>> boot-to-boot differences with the same kernel.
>>>
>>> It's possible that I'm chasing multiple issues with the same symptom.
>>>
>>> The error I'm most concerned with is in the sequence
>>> 1. boot with old image
>>> 2. write environment
>>> 3. boot with new image
>>> 4. write environment
>>> 5. write fails and environment is corrupted
>>>
>>> After I recover the system things sometimes seem fine. Until I repeat 
>>> the sequence above.
>>>
 Also relevant may be:
 - Which dts are you using?
>>> Custom but based heavily on the t2080rdb.
 - What's the spi-nor type, and at which frequency are you operating it?
>>> The board has several alternate parts for the spi-nor so the dts just 
>>> specifies