[PATCH 1/1] powerpc: Update MAINTAINERS for ibmvnic and VAS

2022-04-13 Thread Sukadev Bhattiprolu
Signed-off-by: Sukadev Bhattiprolu 
---
 MAINTAINERS | 2 --
 1 file changed, 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 61d9f114c37f..cf96ac858cc3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9336,14 +9336,12 @@ F:  drivers/pci/hotplug/rpaphp*
 
 IBM Power SRIOV Virtual NIC Device Driver
 M: Dany Madden 
-M: Sukadev Bhattiprolu 
 R: Thomas Falcon 
 L: net...@vger.kernel.org
 S: Supported
 F: drivers/net/ethernet/ibm/ibmvnic.*
 
 IBM Power Virtual Accelerator Switchboard
-M: Sukadev Bhattiprolu 
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
 F: arch/powerpc/include/asm/vas.h
-- 
2.27.0



Re: [5.16.0-rc5][ppc][net] kernel oops when hotplug remove of vNIC interface

2022-01-06 Thread Sukadev Bhattiprolu
> >> [...]ble (net/core/dev.c:6966).
> >> 6961    void napi_enable(struct napi_struct *n)
> >> 6962    {
> >> 6963        unsigned long val, new;
> >> 6964
> >> 6965        do {
> >> 6966            val = READ_ONCE(n->state);
> >
> > If n is NULL here that's gotta be a driver problem.
> 
> Definitely looks like it, the disassembly is:
> 
>   not r9,r9
>   clrldi  r3,r9,63
>   blr # end of previous function
>   nop
>   addis   r2,r12,491  # function entry
>   addi    r2,r2,14816
>   stdu    r1,-48(r1)  # stack frame creation
>   li  r5,-10
>   ld  r9,4352(r13)
>   std r9,40(r1)
>   li  r9,0
>   ld  r8,16(r3)   # load from r3 (n) + 16
> 
> 
> The register dump shows that r3 is NULL, and it comes directly from the
> caller. So we've been called with n = NULL.

Yeah, good catch, Abdul.

I suspect it's due to the release_resources() in __ibmvnic_open(). The
problem is hard to reproduce, but we are testing the following patch with
error injection. Will formally submit after testing/review.

---
From 8a78083e5ec6914be197352f391bfa17420a147c Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Wed, 5 Jan 2022 16:22:58 -0500
Subject: [PATCH 1/1] ibmvnic: don't release napi in __ibmvnic_open()

If __ibmvnic_open() encounters an error such as when setting link state,
it calls release_resources() which frees the napi structures needlessly.
Instead, have __ibmvnic_open() only clean up the work it did so far (i.e.
disable napi and irqs) and leave the rest to the callers.

If the caller of __ibmvnic_open() is ibmvnic_open(), it should release
the resources immediately. If the caller is do_reset() or do_hard_reset(),
they will release the resources on the next reset.

Reported-by: Abdul Haleem 
Signed-off-by: Sukadev Bhattiprolu 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 0bb3911dd014..34efba6c117b 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -110,6 +110,7 @@ static void ibmvnic_tx_scrq_clean_buffer(struct ibmvnic_adapter *adapter,
 struct ibmvnic_sub_crq_queue *tx_scrq);
 static void free_long_term_buff(struct ibmvnic_adapter *adapter,
struct ibmvnic_long_term_buff *ltb);
+static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter);
 
 struct ibmvnic_stat {
char name[ETH_GSTRING_LEN];
@@ -1418,7 +1419,7 @@ static int __ibmvnic_open(struct net_device *netdev)
rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);
if (rc) {
ibmvnic_napi_disable(adapter);
-   release_resources(adapter);
+   ibmvnic_disable_irqs(adapter);
return rc;
}
 
@@ -1468,9 +1469,6 @@ static int ibmvnic_open(struct net_device *netdev)
rc = init_resources(adapter);
if (rc) {
netdev_err(netdev, "failed to initialize resources\n");
-   release_resources(adapter);
-   release_rx_pools(adapter);
-   release_tx_pools(adapter);
goto out;
}
}
@@ -1487,6 +1485,12 @@ static int ibmvnic_open(struct net_device *netdev)
adapter->state = VNIC_OPEN;
rc = 0;
}
+   if (rc) {
+   release_resources(adapter);
+   release_rx_pools(adapter);
+   release_tx_pools(adapter);
+   }
+
return rc;
 }
 
-- 
2.27.0

> 
> cheers
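A minimal user-space sketch of the ownership rule the patch above follows: the inner open undoes only what it did (napi and irqs), and whoever allocated the resources releases them. The `fake_adapter` struct and the `fake_init_resources()`, `fake_release_resources()`, `inner_open()` and `outer_open()` helpers are hypothetical stand-ins, not ibmvnic code; the `link_up_fails` flag mimics an injected set_link_state() failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified adapter (not struct ibmvnic_adapter). */
struct fake_adapter {
	int *napi;           /* stands in for the napi array */
	bool napi_enabled;
	bool irqs_enabled;
	bool link_up_fails;  /* error injection: set_link_state() fails */
};

static int fake_init_resources(struct fake_adapter *a)
{
	a->napi = calloc(4, sizeof(*a->napi));
	return a->napi ? 0 : -1;
}

static void fake_release_resources(struct fake_adapter *a)
{
	free(a->napi);
	a->napi = NULL;
}

/* Inner open: on failure, undo only what this function did. */
static int inner_open(struct fake_adapter *a)
{
	assert(a->napi != NULL);   /* enabling napi needs the structures */
	a->napi_enabled = true;
	a->irqs_enabled = true;

	if (a->link_up_fails) {
		a->napi_enabled = false;   /* like ibmvnic_napi_disable() */
		a->irqs_enabled = false;   /* like ibmvnic_disable_irqs() */
		return -1;                 /* napi is left for the caller */
	}
	return 0;
}

/* Outer open: it allocated the resources, so it frees them on failure. */
static int outer_open(struct fake_adapter *a)
{
	int rc = fake_init_resources(a);

	if (!rc)
		rc = inner_open(a);
	if (rc)
		fake_release_resources(a);
	return rc;
}
```

After a failed `inner_open()` during a reset, `napi` is still allocated, so the next attempt to enable napi does not dereference NULL, which is the failure mode in the oops above.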


Re: [PATCH V2 net] ibmvnic: Continue with reset if set link down failed

2021-04-22 Thread Sukadev Bhattiprolu
Lijun Pan [lijunp...@gmail.com] wrote:
> > Now, sure we can attempt a "thorough hard reset" which also does
> > the same hcalls to reestablish the connection. Is there any
> > other magic in do_hard_reset()? But in addition, it also frees a lot
> > more Linux kernel buffers and reallocates them, for instance.
> 
> Working around everything in do_reset will make the code very difficult

We are not working around everything. We are doing in do_reset()
exactly what we would do in hard reset for this error (ignore the
set link down error and try to reestablish the connection with the
VIOS).

What we are avoiding is unnecessary work on the Linux side for a
communication problem on the VIOS side.

> to manage. Ultimately do_reset can do anything I am afraid, and do_hard_reset
> can be removed completely or merged into do_reset.
> 
> >
> > If we are having a communication problem with the VIOS, what is
> > the point of freeing and reallocating Linux kernel buffers? Beside
> > being inefficient, it would expose us to even more errors during
> > reset under heavy workloads?
> 
> No real customer runs the system under that heavy load created by
> HTX stress test, which can tear down any working system.

We need to talk to capacity planning and test architects about that,
but all I want to know is what hard reset would do differently to
fix this communication error with VIOS.

Sukadev


Re: [PATCH V2 net] ibmvnic: Continue with reset if set link down failed

2021-04-21 Thread Sukadev Bhattiprolu
Lijun Pan [l...@linux.vnet.ibm.com] wrote:
> 
> 
> > On Apr 20, 2021, at 4:35 PM, Dany Madden  wrote:
> > 
> > When ibmvnic gets a FATAL error message from the vnicserver, it marks
> > the Command Response Queue (CRQ) inactive and resets the adapter. If this
> > FATAL reset fails and a transmission timeout reset follows, the CRQ is
> > still inactive, so ibmvnic's attempt to set link down will also fail. If
> > ibmvnic abandons the reset because of this failed set link down and this
> > is the last reset in the workqueue, then this adapter will be left in an
> > inoperable state.
> > 
> > Instead, make the driver ignore this link down failure and continue to
> > free and re-register CRQ so that the adapter has an opportunity to
> > recover.
> 
> This v2 does not address the concerns mentioned in v1.
> And I think it is better to exit with an error from do_reset, and schedule a
> thorough do_hard_reset if the adapter is already in an unstable state.

We had a FATAL error and when handling it, we failed to send a 
link-down message to the VIOS. So what we need to try next is to 
reset the connection with the VIOS. For this we must talk to the 
firmware using the H_FREE_CRQ and H_REG_CRQ hcalls. do_reset()
does just that in ibmvnic_reset_crq().

Now, sure we can attempt a "thorough hard reset" which also does
the same hcalls to reestablish the connection. Is there any
other magic in do_hard_reset()? But in addition, it also frees a lot
more Linux kernel buffers and reallocates them, for instance.

If we are having a communication problem with the VIOS, what is
the point of freeing and reallocating Linux kernel buffers? Besides
being inefficient, wouldn't it expose us to even more errors during
reset under heavy workloads?

From what I understand so far, do_reset() is complicated because
it is attempting some optimizations. If we are going to fall back
to hard reset for every error, we might as well drop do_reset()
and just do the "thorough hard reset" every time, right?

The protocol spec is ambiguous, and so far I have not gotten a clear
answer on whether the link-down is even needed. If it is needed,
should we add it to do_hard_reset() also? If not, we should
remove it completely (as you mentioned earlier), but I am
waiting for confirmation on that. The git history has not been helpful.

While there are other rough edges around do_reset() that we are
working on fixing separately (e.g. ignoring the error return from
__ibmvnic_close() right above this change), I see a benefit to
the customer with this patch.

I am not convinced we should perform a hard reset just because
the link down failed.

Sukadev
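To make the argument concrete, here is a toy model contrasting a reset that gives up when set-link-down fails with one that ignores that error and still re-registers the CRQ, as the patch proposes. All names are hypothetical; the real driver issues the H_FREE_CRQ/H_REG_CRQ hcalls inside ibmvnic_reset_crq().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the adapter's connection to the VIOS. */
struct fake_vnic {
	bool crq_active;   /* false after a FATAL error */
	int  crq_resets;   /* times the CRQ was freed and re-registered */
};

static int set_link_down(struct fake_vnic *v)
{
	/* Sending any message fails while the CRQ is inactive. */
	return v->crq_active ? 0 : -1;
}

static int reset_crq(struct fake_vnic *v)
{
	/* Models H_FREE_CRQ + H_REG_CRQ re-establishing the connection. */
	v->crq_resets++;
	v->crq_active = true;
	return 0;
}

/* Before: abandon the reset if link-down fails. */
static int reset_strict(struct fake_vnic *v)
{
	if (set_link_down(v))
		return -1;
	return reset_crq(v);
}

/* After: ignore the link-down failure and still reset the CRQ. */
static int reset_tolerant(struct fake_vnic *v)
{
	(void)set_link_down(v);   /* error deliberately ignored */
	return reset_crq(v);
}
```

In the strict variant an inactive CRQ can never become active again through this path, which is the "inoperable state" the patch description worries about.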


Re: [PATCH] ibmvnic: remove excessive irqsave

2021-03-04 Thread Sukadev Bhattiprolu
angkery [angk...@163.com] wrote:
> From: Junlin Yang 
> 
> ibmvnic_remove locks multiple spinlocks while disabling interrupts:
> spin_lock_irqsave(&adapter->state_lock, flags);
> spin_lock_irqsave(&adapter->rwi_lock, flags);
> 
> There is no need for the second irqsave, since interrupts are disabled
> at that point, so remove the second irqsave:
> spin_lock_irqsave(&adapter->state_lock, flags);
> spin_lock(&adapter->rwi_lock);
> 
> Generated by: ./scripts/coccinelle/locks/flags.cocci
> ./drivers/net/ethernet/ibm/ibmvnic.c:5413:1-18:
> ERROR: nested lock+irqsave that reuses flags from line 5404.
> 

Thanks. Please add

Fixes: 4a41c421f367 ("ibmvnic: serialize access to work queue on remove")

> Signed-off-by: Junlin Yang 

Reviewed-by: Sukadev Bhattiprolu 
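Why the Coccinelle rule flags this: a user-space model (not the kernel API; `irqs_on`, `model_irqsave()` and `model_irqrestore()` are made up) of both lock levels saving into the same `flags`. In this simplified model the inner save overwrites the outer "interrupts were on" state, so the outer restore leaves interrupts off.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated local interrupt state; not the kernel's. */
static bool irqs_on = true;

static void model_irqsave(unsigned long *flags)
{
	*flags = irqs_on;   /* remember the previous state */
	irqs_on = false;    /* then disable */
}

static void model_irqrestore(unsigned long flags)
{
	irqs_on = flags != 0;
}

/* Buggy pattern: both levels save into the same flags variable. */
static void nested_shared_flags(void)
{
	unsigned long flags;

	model_irqsave(&flags);     /* outer: flags = 1 (was on) */
	model_irqsave(&flags);     /* inner: flags overwritten with 0 */
	model_irqrestore(flags);   /* inner unlock: stays off */
	model_irqrestore(flags);   /* outer unlock: restores 0, stays off */
}

/* Fixed pattern: the inner lock is a plain spin_lock(). */
static void nested_plain_inner(void)
{
	unsigned long flags;

	model_irqsave(&flags);     /* outer: flags = 1 (was on) */
	/* inner spin_lock()/spin_unlock() leave the irq state alone */
	model_irqrestore(flags);   /* outer unlock: back on */
}
```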


Re: [PATCH] ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning.

2021-03-02 Thread Sukadev Bhattiprolu
Michal Suchanek [msucha...@suse.de] wrote:
> GCC 7.5 reports:
> ../drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_reset_init':
> ../drivers/net/ethernet/ibm/ibmvnic.c:5373:51: warning: 'old_num_tx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]
> ../drivers/net/ethernet/ibm/ibmvnic.c:5373:6: warning: 'old_num_rx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 
> The variable is initialized only if(reset) and used only if(reset &&
> something) so this is a false positive. However, there is no reason to
> not initialize the variables unconditionally avoiding the warning.

Yeah, it's a false positive, but initializing doesn't hurt.
> 
> Fixes: 635e442f4a48 ("ibmvnic: merge ibmvnic_reset_init and ibmvnic_init")
> Signed-off-by: Michal Suchanek 

Reviewed-by: Sukadev Bhattiprolu 
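A reduced, hypothetical version of the pattern GCC warns about (the function and variable names are illustrative, not the driver's): the variable is set only under `reset` and read only under `reset && ...`, which is safe but hard for the compiler to prove, so an unconditional initializer is the simplest fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of the ibmvnic_reset_init() pattern. */
static int queues_changed(bool reset, int cur_tx, int new_tx)
{
	int old_tx = 0;   /* unconditional init silences -Wmaybe-uninitialized */

	if (reset)
		old_tx = cur_tx;

	/* ... re-negotiation with the server would happen here ... */

	/* Only read when reset is true, so old_tx was always set on this path. */
	return reset && old_tx != new_tx;
}
```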


Re: [PATCH] vio: make remove callback return void

2021-01-28 Thread Sukadev Bhattiprolu


Uwe Kleine-König [u...@kleine-koenig.org] wrote:
> The driver core ignores the return value of struct bus_type::remove()
> because there is only little that can be done. To simplify the quest to
> make this function return void, let struct vio_driver::remove() return
> void, too. All users already unconditionally return 0, this commit makes
> it obvious that returning an error code is a bad idea and makes it
> obvious for future driver authors that returning an error code isn't
> intended.

Slightly off-topic: should ndo_stop() also return void? Its return value
seems to be mostly ignored, and __dev_close_many() has:

	/*
	 *	Call the device specific close. This cannot fail.
	 *	Only if device is UP
	 *
	 *	We allow it to be called even after a DETACH hot-plug
	 *	event.
	 */
	if (ops->ndo_stop)
		ops->ndo_stop(dev);

Sukadev
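A tiny sketch of the shape the patch moves toward, with hypothetical `fake_driver` and `bus_remove()` names (not the real struct vio_driver or driver core): when the caller cannot act on a failure, a `void` callback makes that explicit. The same reasoning prompts the ndo_stop() question above.

```c
#include <assert.h>

/* Hypothetical driver ops; not struct vio_driver. */
struct fake_driver {
	void (*remove)(void *dev);   /* void: nothing useful to report */
};

static int removed_count;

static void fake_remove(void *dev)
{
	(void)dev;
	removed_count++;   /* tear down unconditionally */
}

/* The "bus" calls remove if present; it could not handle an error anyway. */
static void bus_remove(const struct fake_driver *drv, void *dev)
{
	if (drv->remove)
		drv->remove(dev);
}
```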


Re: [PATCH net] ibmvnic: device remove has higher precedence over reset

2021-01-21 Thread Sukadev Bhattiprolu
Lijun Pan [lijunp...@gmail.com] wrote:
> > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c
> > > b/drivers/net/ethernet/ibm/ibmvnic.c
> > > index aed985e08e8a..11f28fd03057 100644
> > > --- a/drivers/net/ethernet/ibm/ibmvnic.c
> > > +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> > > @@ -2235,8 +2235,7 @@ static void __ibmvnic_reset(struct work_struct *work)
> > >   while (rwi) {
> > >   spin_lock_irqsave(&adapter->state_lock, flags);
> > >
> > > - if (adapter->state == VNIC_REMOVING ||
> > > - adapter->state == VNIC_REMOVED) {
> > > + if (adapter->state == VNIC_REMOVED) {

If the adapter is in REMOVING state, there is no point going
through the reset process. We could just bail out here. We
should also drain any other resets in the queue (something
my other patch set was addressing).

Sukadev

> >
> > If we do get here, we would crash because ibmvnic_remove() happened. It
> > frees the adapter struct already.
> 
> Not exactly. viodev is gone; netdev is gone; ibmvnic_adapter is still there.
> 
> Lijun
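One way the early bail-out and draining could look, as a single-threaded sketch with hypothetical types. The real worker would hold `state_lock`/`rwi_lock` around these checks, which is omitted here. Once the adapter is REMOVING or REMOVED, every queued reset is dropped instead of being processed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states and work items; names only loosely follow ibmvnic. */
enum fake_state { FAKE_OPEN, FAKE_REMOVING, FAKE_REMOVED };

struct fake_rwi {
	int reason;
	struct fake_rwi *next;
};

struct fake_adapter {
	enum fake_state state;
	struct fake_rwi *rwi_list;   /* pending reset work items */
	int resets_done;
	int resets_dropped;
};

static struct fake_rwi *next_rwi(struct fake_adapter *a)
{
	struct fake_rwi *r = a->rwi_list;

	if (r)
		a->rwi_list = r->next;
	return r;
}

static void reset_worker(struct fake_adapter *a)
{
	struct fake_rwi *r;

	while ((r = next_rwi(a)) != NULL) {
		if (a->state == FAKE_REMOVING || a->state == FAKE_REMOVED) {
			/* Device is going away: drain instead of resetting. */
			a->resets_dropped++;
			continue;
		}
		a->resets_done++;   /* a real reset would run here */
	}
}
```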



Re: CONFIG_PPC_VAS depends on 64k pages...?

2020-11-30 Thread Sukadev Bhattiprolu


Christophe Leroy [christophe.le...@csgroup.eu] wrote:
> Hi,
> 
> On 19/11/2020 at 11:58, Will Springer wrote:
> > I learned about the POWER9 gzip accelerator a few months ago when the
> > support hit upstream Linux 5.8. However, for some reason the Kconfig
> > dictates that VAS depends on a 64k page size, which is problematic as I
> > run Void Linux, which uses a 4k-page kernel.
> > 
> > Some early poking by others indicated there wasn't an obvious page size
> > dependency in the code, and suggested I try modifying the config to switch
> > it on. I did so, but was stopped by a minor complaint of an "unexpected DT
> > configuration" by the VAS code. I wasn't equipped to figure out exactly what
> > this meant, even after finding the offending condition, so after writing a
> > very drawn-out forum post asking for help, I dropped the subject.
> > 
> > Fast forward to today, when I was reminded of the whole thing again, and
> > decided to debug a bit further. Apparently the VAS platform device
> > (derived from the DT node) has 5 resources on my 4k kernel, instead of 4
> > (which evidently works for others who have had success on 64k kernels). I
> > have no idea what this means in practice (I don't know how to introspect
> > it), but after making a tiny patch[1], everything came up smoothly and I
> > was doing blazing-fast gzip (de)compression in no time.
> > 
> > Everything seems to work fine on 4k pages. So, what's up? Are there
> > pitfalls lurking around that I've yet to stumble over? More reasonably,
> > I'm curious as to why the feature supposedly depends on 64k pages, or if
> > there's anything else I should be concerned about.

Will,

The reason I put in that config check is because we were only able to
test 64K pages at that point.

It is interesting that it is working for you. The following code in skiboot
(https://github.com/open-power/skiboot/blob/master/hw/vas.c) should restrict
it to 64K pages. IIRC, there is also a corresponding change in some NX
registers that would also need to be configured to allow 4K pages.


static int init_north_ctl(struct proc_chip *chip)
{
	uint64_t val = 0ULL;

	val = SETFIELD(VAS_64K_MODE_MASK, val, true);
	val = SETFIELD(VAS_ACCEPT_PASTE_MASK, val, true);
	val = SETFIELD(VAS_ENABLE_WC_MMIO_BAR, val, true);
	val = SETFIELD(VAS_ENABLE_UWC_MMIO_BAR, val, true);
	val = SETFIELD(VAS_ENABLE_RMA_MMIO_BAR, val, true);

	return vas_scom_write(chip, VAS_MISC_N_CTL, val);
}

I am copying Bulent Albali and Haren Myneni who have been working with
VAS/NX for their thoughts/experience.
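For readers unfamiliar with skiboot's register helpers: as I understand it, `SETFIELD(mask, val, field)` replaces the bits selected by `mask` in `val` with `field` shifted into position. Below is a standalone C equivalent written under that assumption; the two demo masks are placeholders, not the real `VAS_64K_MODE_MASK`/`VAS_ACCEPT_PASTE_MASK` values, and `__builtin_ctzll` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Shift of the lowest set bit in a non-zero mask (GCC/Clang builtin). */
static unsigned mask_shift(uint64_t mask)
{
	return (unsigned)__builtin_ctzll(mask);
}

/* Replace the bits selected by mask in v with field, shifted into place. */
static uint64_t setfield(uint64_t mask, uint64_t v, uint64_t field)
{
	return (v & ~mask) | ((field << mask_shift(mask)) & mask);
}

/* Placeholder single-bit masks; the real VAS_* values live in skiboot. */
#define DEMO_64K_MODE_MASK     (1ULL << 63)
#define DEMO_ACCEPT_PASTE_MASK (1ULL << 62)
```

Clearing the 64K-mode field is then `setfield(DEMO_64K_MODE_MASK, val, 0)`, which leaves the other fields intact.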

> > 
> 
> Maybe ask Sukadev who did the implementation and is maintaining it ?
> 
> > I do have to say I'm quite satisfied with the results of the NX
> > accelerator, though. Being able to shuffle data to a RaptorCS box over gigE
> > and get compressed data back faster than most software gzip could ever
> > hope to achieve is no small feat, let alone the instantaneous results
> > locally. :)
> > 
> > Cheers,
> > Will Springer [she/her]
> > 
> > [1]: https://github.com/Skirmisher/void-packages/blob/vas-4k-pages/srcpkgs/linux5.9/patches/ppc-vas-on-4k.patch
> > 
> 
> 
> Christophe


Re: [PATCH v2 1/8] powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 events run

2020-02-22 Thread Sukadev Bhattiprolu
Kajol Jain [kj...@linux.ibm.com] wrote:
> Commit 2b206ee6b0df ("powerpc/perf/hv-24x7: Display change in counter
> values") added printing of the _change_ in the counter value rather than
> the raw value for 24x7 counters. In case of transactions, the event count
> is set to 0 at the beginning of the transaction. It also sets
> the event's prev_count to the raw value at the time of initialization.
> Because the event count is set to 0, we are seeing some weird behaviour
> whenever we run multiple 24x7 events at a time.

Interesting. Are we taking the delta of a delta and ending up with large
negative values in the -I case? However...



> 
> Signed-off-by: Kajol Jain 
> ---
>  arch/powerpc/perf/hv-24x7.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index 573e0b309c0c..6dbbf70232aa 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -1409,7 +1409,7 @@ static void h_24x7_event_read(struct perf_event *event)
>* that would require issuing a hcall, which would then
>* defeat the purpose of using the txn interface.
>*/
> - local64_set(&event->count, 0);
> + local64_add(0, &event->count);

... I'm not sure how adding zero to the count helps. Should we just remove
the line (and the comment block above it)? Or does it help to clear the
event count in ->start_txn() rather than on read()?

How does the change impact the counts when run without -I?

Thanks for chasing this down.

Sukadev
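A simplified model of delta-style counting, assuming the usual `count += now - prev_count; prev_count = now` update (struct and function names are hypothetical, not the hv-24x7 code). Zeroing `count` mid-stream discards everything accumulated so far, while `local64_add(0, ...)` leaves it untouched, i.e. it behaves the same as deleting the line.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event; not the kernel's struct perf_event. */
struct fake_event {
	int64_t  count;       /* accumulated delta reported to perf */
	uint64_t prev_count;  /* raw counter value at the last update */
};

/* Add the change since the previous raw reading. */
static void update_event(struct fake_event *e, uint64_t now)
{
	e->count += (int64_t)(now - e->prev_count);
	e->prev_count = now;
}
```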


[PATCH] powerpc/xmon: Fix compile error in print_insn* functions

2020-01-22 Thread Sukadev Bhattiprolu
From 72a7497a8673c93a4b80aa4fc38b88a8e90aa650 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Wed, 22 Jan 2020 18:57:18 -0600
Subject: [PATCH 1/1] powerpc/xmon: Fix compile error in print_insn* functions

Fix a couple of compile errors I stumbled upon with CONFIG_XMON=y and
CONFIG_XMON_DISASSEMBLY=n.

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/xmon/dis-asm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/xmon/dis-asm.h b/arch/powerpc/xmon/dis-asm.h
index c4d246ebca37..c4c982d6402e 100644
--- a/arch/powerpc/xmon/dis-asm.h
+++ b/arch/powerpc/xmon/dis-asm.h
@@ -13,13 +13,13 @@ extern int print_insn_spu(unsigned long insn, unsigned long memaddr);
 #else
 static inline int print_insn_powerpc(unsigned long insn, unsigned long memaddr)
 {
-   printf("%.8x", insn);
+   printf("%.8lx", insn);
return 0;
 }
 
 static inline int print_insn_spu(unsigned long insn, unsigned long memaddr)
 {
-   printf("%.8x", insn);
+   printf("%.8lx", insn);
return 0;
 }
 #endif
-- 
2.18.1
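The fix matches the conversion to the argument type: `insn` is `unsigned long`, so `%x` needs the `l` length modifier. A standalone check using standard `snprintf()`; the `format_insn()` helper is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an instruction word like the fixed fallback: %.8lx for unsigned long. */
static void format_insn(char *buf, size_t len, unsigned long insn)
{
	snprintf(buf, len, "%.8lx", insn);   /* at least 8 hex digits, zero-padded */
}
```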



[PATCH v4 1/2] powerpc/pseries/svm: Use FW_FEATURE to detect SVM

2020-01-21 Thread Sukadev Bhattiprolu
Use FW_FEATURE_SVM to detect a secure guest (SVM). This would be
more efficient than calling mfmsr() frequently.

Suggested-by: Michael Ellerman 
Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/firmware.h   | 3 ++-
 arch/powerpc/include/asm/svm.h| 6 +-
 arch/powerpc/kernel/paca.c| 6 +-
 arch/powerpc/platforms/pseries/firmware.c | 3 +++
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index b3e214a97f3a..23cffcec8a55 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -51,6 +51,7 @@
 #define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0010)
 #define FW_FEATURE_PAPR_SCM ASM_CONST(0x0020)
 #define FW_FEATURE_ULTRAVISOR  ASM_CONST(0x0040)
+#define FW_FEATURE_SVM ASM_CONST(0x0080)
 
 #ifndef __ASSEMBLY__
 
@@ -69,7 +70,7 @@ enum {
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
-   FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR,
+   FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR | FW_FEATURE_SVM,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/include/asm/svm.h b/arch/powerpc/include/asm/svm.h
index 85580b30aba4..1d056c70fa87 100644
--- a/arch/powerpc/include/asm/svm.h
+++ b/arch/powerpc/include/asm/svm.h
@@ -10,9 +10,13 @@
 
 #ifdef CONFIG_PPC_SVM
 
+/*
+ * Note that this is not usable in early boot - before FW
+ * features were probed
+ */
 static inline bool is_secure_guest(void)
 {
-   return mfmsr() & MSR_S;
+   return firmware_has_feature(FW_FEATURE_SVM);
 }
 
 void dtl_cache_ctor(void *addr);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 949eceb254d8..3cba33a99549 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -120,7 +120,11 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned long limit)
if (early_cpu_has_feature(CPU_FTR_HVMODE))
return NULL;
 
-   if (is_secure_guest())
+   /*
+* Firmware features may not have been probed yet, so check
+* MSR rather than FW_FEATURE_SVM in is_secure_guest().
+*/
+   if (mfmsr() & MSR_S)
lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
else
lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index d4a8f1702417..c98527fb4937 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -175,4 +175,7 @@ static int __init probe_fw_features(unsigned long node, const char *uname, int
 void __init pseries_probe_fw_features(void)
 {
of_scan_flat_dt(probe_fw_features, NULL);
+
+   if (mfmsr() & MSR_S)
+   powerpc_firmware_features |= FW_FEATURE_SVM;
 }
-- 
2.17.2
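The caching idea in a few lines of C: read the MSR once during feature probing, record the result as a feature bit, and answer later queries from the bitmask. Everything here is simulated: `demo_mfmsr()` stands in for `mfmsr()`, and placing MSR[S] at bit 22 is an assumption. It also shows why the new comment warns that `is_secure_guest()` is not usable before the probe runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated values; the real ones come from asm/reg.h and asm/firmware.h. */
#define DEMO_MSR_S          (1UL << 22)   /* assumed position of MSR[S] */
#define DEMO_FW_FEATURE_SVM 0x0080UL

static unsigned long demo_msr;           /* stands in for the MSR */
static unsigned long demo_fw_features;   /* powerpc_firmware_features */

static unsigned long demo_mfmsr(void)
{
	return demo_msr;
}

/* Probe once at boot: cache MSR[S] as a firmware feature bit. */
static void demo_probe_fw_features(void)
{
	if (demo_mfmsr() & DEMO_MSR_S)
		demo_fw_features |= DEMO_FW_FEATURE_SVM;
}

/* Cheap check afterwards; returns false if called before the probe. */
static bool demo_is_secure_guest(void)
{
	return demo_fw_features & DEMO_FW_FEATURE_SVM;
}
```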



[PATCH v4 2/2] powerpc/pseries/svm: Disable BHRB/EBB/PMU access

2020-01-21 Thread Sukadev Bhattiprolu
Ultravisor disables some CPU features like BHRB, EBB and PMU in secure
virtual machines (SVMs) for now. Skip accessing those registers in
SVMs to avoid getting a Program Interrupt.

Basic performance monitoring in SVMs is likely to be enabled in the future
after adding the necessary security mechanisms in Ultravisor. Some features,
like BHRB or monitoring event counts in HV-mode (e.g. perf stat -e cycles:h)
may still be restricted for the longer term.

Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Madhavan Srinivasan 
---
Changelog[v4]
- [Paul Mackerras] Drop is_secure_guest() checks in HV-only code
  and indicate if the disabling of PMU is temporary.
- For consistency, also skip registering PMUs in secure guests.

Changelog[v2]
- [Michael Ellerman] Optimize the code using FW_FEATURE_SVM
- Merged EBB/BHRB and PMU patches into one and reorganized code.
- Fix some build errors reported by 
---
 arch/powerpc/kernel/cpu_setup_power.S | 21 +++
 arch/powerpc/kernel/process.c | 23 
 arch/powerpc/perf/power9-pmu.c| 10 +
 arch/powerpc/xmon/xmon.c  | 30 +--
 4 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index a460298c7ddb..9e895d8db468 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -206,14 +206,35 @@ __init_PMU_HV_ISA207:
blr
 
 __init_PMU:
+#ifdef CONFIG_PPC_SVM
+   /*
+* SVM's are restricted from accessing PMU, so skip.
+*/
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip1
+#endif
li  r5,0
mtspr   SPRN_MMCRA,r5
mtspr   SPRN_MMCR0,r5
mtspr   SPRN_MMCR1,r5
mtspr   SPRN_MMCR2,r5
+skip1:
blr
 
 __init_PMU_ISA207:
+
+#ifdef CONFIG_PPC_SVM
+   /*
+* SVM's are restricted from accessing PMU, so skip.
+   */
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip2
+#endif
li  r5,0
mtspr   SPRN_MMCRS,r5
+skip2:
blr
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 639ceae7da9d..83c7c4119305 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1059,9 +1060,11 @@ static inline void save_sprs(struct thread_struct *t)
t->dscr = mfspr(SPRN_DSCR);
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   t->bescr = mfspr(SPRN_BESCR);
-   t->ebbhr = mfspr(SPRN_EBBHR);
-   t->ebbrr = mfspr(SPRN_EBBRR);
+   if (!is_secure_guest()) {
+   t->bescr = mfspr(SPRN_BESCR);
+   t->ebbhr = mfspr(SPRN_EBBHR);
+   t->ebbrr = mfspr(SPRN_EBBRR);
+   }
 
t->fscr = mfspr(SPRN_FSCR);
 
@@ -1097,12 +1100,14 @@ static inline void restore_sprs(struct thread_struct *old_thread,
}
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   if (old_thread->bescr != new_thread->bescr)
-   mtspr(SPRN_BESCR, new_thread->bescr);
-   if (old_thread->ebbhr != new_thread->ebbhr)
-   mtspr(SPRN_EBBHR, new_thread->ebbhr);
-   if (old_thread->ebbrr != new_thread->ebbrr)
-   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   if (!is_secure_guest()) {
+   if (old_thread->bescr != new_thread->bescr)
+   mtspr(SPRN_BESCR, new_thread->bescr);
+   if (old_thread->ebbhr != new_thread->ebbhr)
+   mtspr(SPRN_EBBHR, new_thread->ebbhr);
+   if (old_thread->ebbrr != new_thread->ebbrr)
+   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   }
 
if (old_thread->fscr != new_thread->fscr)
mtspr(SPRN_FSCR, new_thread->fscr);
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 08c3ef796198..c6eca682180d 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -10,6 +10,7 @@
 #define pr_fmt(fmt) "power9-pmu: " fmt
 
 #include "isa207-common.h"
+#include 
 
 /*
  * Raw event encoding for Power9:
@@ -446,6 +447,15 @@ int init_power9_pmu(void)
strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power9"))
return -ENODEV;
 
+   /*
+* Disable PMUs in secure guests until we evaluate security
+* exposure and add relevant functionality in Ultravisor.
+*/
+   if (is_sec
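The guard added to save_sprs() boils down to the following. This sketch uses hypothetical accessors that only count EBB SPR accesses (which, per the changelog, would raise a Program Interrupt in an SVM) instead of real `mfspr()` calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thread state; not struct thread_struct. */
struct demo_thread {
	unsigned long bescr, ebbhr, ebbrr, fscr;
};

static bool secure_guest;
static int ebb_spr_accesses;   /* would trap in an SVM */

static unsigned long demo_mfspr_ebb(unsigned long hw_value)
{
	ebb_spr_accesses++;
	return hw_value;
}

static void demo_save_sprs(struct demo_thread *t, unsigned long hw_value)
{
	if (!secure_guest) {
		t->bescr = demo_mfspr_ebb(hw_value);
		t->ebbhr = demo_mfspr_ebb(hw_value);
		t->ebbrr = demo_mfspr_ebb(hw_value);
	}
	t->fscr = hw_value;   /* FSCR is still saved in both cases */
}
```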

[PATCH v3 1/2] powerpc/pseries/svm: Use FW_FEATURE to detect SVM

2020-01-09 Thread Sukadev Bhattiprolu
Use FW_FEATURE_SVM to detect a secure guest (SVM). This would be
more efficient than calling mfmsr() frequently.

Suggested-by: Michael Ellerman 
Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/firmware.h   | 3 ++-
 arch/powerpc/include/asm/svm.h| 6 +-
 arch/powerpc/kernel/paca.c| 6 +-
 arch/powerpc/platforms/pseries/firmware.c | 3 +++
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index b3e214a97f3a..23cffcec8a55 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -51,6 +51,7 @@
 #define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0010)
 #define FW_FEATURE_PAPR_SCM ASM_CONST(0x0020)
 #define FW_FEATURE_ULTRAVISOR  ASM_CONST(0x0040)
+#define FW_FEATURE_SVM ASM_CONST(0x0080)
 
 #ifndef __ASSEMBLY__
 
@@ -69,7 +70,7 @@ enum {
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
-   FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR,
+   FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR | FW_FEATURE_SVM,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/include/asm/svm.h b/arch/powerpc/include/asm/svm.h
index 85580b30aba4..1d056c70fa87 100644
--- a/arch/powerpc/include/asm/svm.h
+++ b/arch/powerpc/include/asm/svm.h
@@ -10,9 +10,13 @@
 
 #ifdef CONFIG_PPC_SVM
 
+/*
+ * Note that this is not usable in early boot - before FW
+ * features were probed
+ */
 static inline bool is_secure_guest(void)
 {
-   return mfmsr() & MSR_S;
+   return firmware_has_feature(FW_FEATURE_SVM);
 }
 
 void dtl_cache_ctor(void *addr);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 949eceb254d8..3cba33a99549 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -120,7 +120,11 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned long limit)
if (early_cpu_has_feature(CPU_FTR_HVMODE))
return NULL;
 
-   if (is_secure_guest())
+   /*
+* Firmware features may not have been probed yet, so check
+* MSR rather than FW_FEATURE_SVM in is_secure_guest().
+*/
+   if (mfmsr() & MSR_S)
lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
else
lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index d4a8f1702417..c98527fb4937 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -175,4 +175,7 @@ static int __init probe_fw_features(unsigned long node, const char *uname, int
 void __init pseries_probe_fw_features(void)
 {
of_scan_flat_dt(probe_fw_features, NULL);
+
+   if (mfmsr() & MSR_S)
+   powerpc_firmware_features |= FW_FEATURE_SVM;
 }
-- 
2.17.2



[PATCH v3 2/2] powerpc/pseries/svm: Disable BHRB/EBB/PMU access

2020-01-09 Thread Sukadev Bhattiprolu
Ultravisor disables some CPU features like BHRB, EBB and PMU in
secure virtual machines (SVMs). Skip accessing those registers
in SVMs to avoid getting a Program Interrupt.

Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Madhavan Srinivasan 
---
Changelog[v2]
- [Michael Ellerman] Optimize the code using FW_FEATURE_SVM
- Merged EBB/BHRB and PMU patches into one and reorganized code.
- Fix some build errors reported by 
---
 arch/powerpc/kernel/cpu_setup_power.S   | 21 
 arch/powerpc/kernel/process.c   | 23 ++---
 arch/powerpc/kvm/book3s_hv.c| 33 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 32 +++-
 arch/powerpc/kvm/book3s_hv_tm_builtin.c | 21 ++--
 arch/powerpc/perf/core-book3s.c |  6 +
 arch/powerpc/xmon/xmon.c| 30 +-
 7 files changed, 114 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index a460298c7ddb..9e895d8db468 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -206,14 +206,35 @@ __init_PMU_HV_ISA207:
blr
 
 __init_PMU:
+#ifdef CONFIG_PPC_SVM
+   /*
+* SVM's are restricted from accessing PMU, so skip.
+*/
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip1
+#endif
li  r5,0
mtspr   SPRN_MMCRA,r5
mtspr   SPRN_MMCR0,r5
mtspr   SPRN_MMCR1,r5
mtspr   SPRN_MMCR2,r5
+skip1:
blr
 
 __init_PMU_ISA207:
+
+#ifdef CONFIG_PPC_SVM
+   /*
+* SVM's are restricted from accessing PMU, so skip.
+   */
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip2
+#endif
li  r5,0
mtspr   SPRN_MMCRS,r5
+skip2:
blr
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 639ceae7da9d..83c7c4119305 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1059,9 +1060,11 @@ static inline void save_sprs(struct thread_struct *t)
t->dscr = mfspr(SPRN_DSCR);
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   t->bescr = mfspr(SPRN_BESCR);
-   t->ebbhr = mfspr(SPRN_EBBHR);
-   t->ebbrr = mfspr(SPRN_EBBRR);
+   if (!is_secure_guest()) {
+   t->bescr = mfspr(SPRN_BESCR);
+   t->ebbhr = mfspr(SPRN_EBBHR);
+   t->ebbrr = mfspr(SPRN_EBBRR);
+   }
 
t->fscr = mfspr(SPRN_FSCR);
 
@@ -1097,12 +1100,14 @@ static inline void restore_sprs(struct thread_struct *old_thread,
}
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   if (old_thread->bescr != new_thread->bescr)
-   mtspr(SPRN_BESCR, new_thread->bescr);
-   if (old_thread->ebbhr != new_thread->ebbhr)
-   mtspr(SPRN_EBBHR, new_thread->ebbhr);
-   if (old_thread->ebbrr != new_thread->ebbrr)
-   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   if (!is_secure_guest()) {
+   if (old_thread->bescr != new_thread->bescr)
+   mtspr(SPRN_BESCR, new_thread->bescr);
+   if (old_thread->ebbhr != new_thread->ebbhr)
+   mtspr(SPRN_EBBHR, new_thread->ebbhr);
+   if (old_thread->ebbrr != new_thread->ebbrr)
+   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   }
 
if (old_thread->fscr != new_thread->fscr)
mtspr(SPRN_FSCR, new_thread->fscr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 709cf1fd4cf4..29a2640108d1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -3568,9 +3569,11 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
mtspr(SPRN_PSPB, vcpu->arch.pspb);
mtspr(SPRN_FSCR, vcpu->arch.fscr);
mtspr(SPRN_TAR, vcpu->arch.tar);
-   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
-   mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
-   mtspr(SPRN_BESCR, vcpu->arch.bescr);
+   if (!is_secure_guest()) {
+   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
+   mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
+   mtspr(SPRN_BESCR, vcpu->arch.bescr);
+   }
mtspr(SPRN_WORT, vcpu->arch.wort);
mtspr(SPRN_TIDR, vcpu->arch.tid);
mtspr(SPRN_DAR, vcpu->arch.shregs.dar);
@@ -36

Re: [PATCH 2/2] powerpc/pseries/svm: Disable BHRB/EBB/PMU access

2020-01-09 Thread Sukadev Bhattiprolu
maddy [ma...@linux.ibm.com] wrote:
> 
> >   __init_PMU:
> > +#ifdef CONFIG_PPC_SVM
> > +   /*
> > +* SVM's are restricted from accessing PMU, so skip.
> > +*/
> > +   mfmsr   r5
> > +   rldicl  r5, r5, 64-MSR_S_LG, 62
> > +   cmpwi   r5,1
> > +   beq skip1
> 
> I know all MMCR* are loaded with 0. But
> it is better if PEF code load the MMCR0
> with freeze bits on. I will send a separate
> patch to handle in the non-svm case.

Quick question: 
By PEF code, you mean the Ultravisor and not here in
the SVM, right? SVMs cannot access PMU registers.
> 
> Rest looks good.
> Acked-by: Madhavan Srinivasan 

Cool, Thanks,

Sukadev
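As a sketch of what the quoted `rldicl r5, r5, 64-MSR_S_LG, 62` / `cmpwi r5,1` sequence computes, the C model below rotates the MSR right by `MSR_S_LG` and keeps the low two bits (a mask-begin of 62 keeps IBM bits 62 and 63). The value 22 for `MSR_S_LG` and the function names are assumptions for illustration only. Because two bits survive the mask, the comparison with 1 is true only when the MSR_S bit is set and the next-higher bit is clear, whereas a C test such as `mfmsr() & MSR_S` looks at the single bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: MSR_S bit position counted from the LSB. */
#define MSR_S_LG 22
#define MSR_S    (1ULL << MSR_S_LG)

/* Models "rldicl r5,r5,64-MSR_S_LG,62; cmpwi r5,1":
 * rotate left by 64-MSR_S_LG (== rotate right by MSR_S_LG),
 * keep the low two bits, then compare with 1. */
static bool asm_style_check(uint64_t msr)
{
	uint64_t rot = (msr >> MSR_S_LG) | (msr << (64 - MSR_S_LG));

	return (rot & 0x3) == 1;
}

/* Single-bit test, like "mfmsr() & MSR_S" in C. */
static bool single_bit_check(uint64_t msr)
{
	return (msr & MSR_S) != 0;
}
```

The two checks agree whenever the bit above MSR_S is clear and differ when it is set.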


powerpc/xmon: don't access ASDR in VMs

2020-01-06 Thread Sukadev Bhattiprolu
From 91a77dbea3c909ff15c66cded37f1334304a293d Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Mon, 6 Jan 2020 13:50:02 -0600
Subject: [PATCH 1/1] powerpc/xmon: don't access ASDR in VMs

ASDR is HV-privileged and must only be accessed in HV-mode.
Fixes a Program Check (0x700) when xmon in a VM dumps SPRs.

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/xmon/xmon.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 02fae453c2ec..b8d179b5cf4f 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1949,15 +1949,14 @@ static void dump_300_sprs(void)
 
printf("pidr   = %.16lx  tidr  = %.16lx\n",
mfspr(SPRN_PID), mfspr(SPRN_TIDR));
-   printf("asdr   = %.16lx  psscr = %.16lx\n",
-   mfspr(SPRN_ASDR), hv ? mfspr(SPRN_PSSCR)
-   : mfspr(SPRN_PSSCR_PR));
+   printf("psscr  = %.16lx\n",
+   hv ? mfspr(SPRN_PSSCR) : mfspr(SPRN_PSSCR_PR));
 
if (!hv)
return;
 
-   printf("ptcr   = %.16lx\n",
-   mfspr(SPRN_PTCR));
+   printf("ptcr   = %.16lx  asdr  = %.16lx\n",
+   mfspr(SPRN_PTCR), mfspr(SPRN_ASDR));
 #endif
 }
 
-- 
2.17.2
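The reordering above keeps the HV-privileged ASDR read behind the existing `if (!hv) return;`. A userspace sketch of that gating follows; `fake_mfspr()` is a hypothetical stand-in that only records whether ASDR was requested, and no real SPRs are read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool asdr_was_read;	/* set if the dump ever asks for ASDR */

/* Hypothetical stand-in for mfspr(): returns a deterministic value. */
static unsigned long fake_mfspr(const char *spr)
{
	if (strcmp(spr, "asdr") == 0)
		asdr_was_read = true;
	return (unsigned long)strlen(spr);
}

/* Format the SPR dump into buf; ASDR is only touched in HV mode. */
static void dump_sprs(char *buf, size_t len, bool hv)
{
	int n = snprintf(buf, len, "psscr  = %.16lx\n",
			 hv ? fake_mfspr("psscr") : fake_mfspr("psscr_pr"));

	if (!hv || n < 0 || (size_t)n >= len)
		return;

	snprintf(buf + n, len - (size_t)n, "ptcr   = %.16lx  asdr  = %.16lx\n",
		 fake_mfspr("ptcr"), fake_mfspr("asdr"));
}
```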



Re: [PATCH v4 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

2020-01-06 Thread Sukadev Bhattiprolu
Ram Pai [linux...@us.ibm.com] wrote:
>
> One small comment.. H_STATE is a better return code than H_UNSUPPORTED.
> 

Here is the updated patch - we now return H_STATE if the abort call is
made after the VM has gone secure.
---
From 73fe1fa5aff2829f2fae6a339169e56dc0bbae06 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Fri, 27 Sep 2019 14:30:36 -0500
Subject: [PATCH 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
abort an SVM after it has issued the H_SVM_INIT_START and before the
H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
encounters security violations or other errors when starting an SVM.

Note that this hcall is different from UV_SVM_TERMINATE ucall which
is used by HV to terminate/clean up a VM that has become secure.

The H_SVM_INIT_ABORT should basically undo operations that were done
since the H_SVM_INIT_START hcall, i.e. page out all the VM pages back
to normal memory, and terminate the SVM.

(If we do not bring the pages back to normal memory, the text/data
of the VM would be stuck in secure memory and since the SVM did not
go secure, its MSR_S bit will be clear and the VM won't be able to
access its pages even to do a clean exit).

Based on patches and discussion with Paul Mackerras, Ram Pai and
Bharata Rao.

Signed-off-by: Ram Pai 
Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Bharata B Rao 
---
Changelog[v4]:
- [Bharata Rao] Add missing rcu locking
- [Paul Mackerras] simplify code that walks memslots
- Add a check to ensure that H_SVM_INIT_ABORT is called before
  H_SVM_INIT_DONE hcall (i.e. the SVM is not already secure).
- [Ram Pai] Return H_STATE if hcall is called after *INIT_DONE.

Changelog[v3]:
- Rather than pass the NIP/MSR as parameters, load them into
  SRR0/SRR1 (like we do with other registers) and terminate
  the VM after paging out pages
- Move the code to add a skip_page_out parameter into a
  separate patch.

Changelog[v2]:
[Paul Mackerras] avoid returning to UV "one last time" after
the state is cleaned up.  So, we now have H_SVM_INIT_ABORT:
- take the VM's NIP/MSR register states as parameters
- inherit the state of other registers as at UV_ESM call.
After cleaning up the partial state, HV uses these to return
directly to the VM with a failed UV_ESM call.
---
 Documentation/powerpc/ultravisor.rst| 60 +
 arch/powerpc/include/asm/hvcall.h   |  1 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  6 +++
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s_hv.c|  3 ++
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 28 ++
 6 files changed, 99 insertions(+)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index 730854f73830..363736d7fd36 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -948,6 +948,66 @@ Use cases
 up its internal state for this virtual machine.
 
 
+H_SVM_INIT_ABORT
+----------------
+
+Abort the process of securing an SVM.
+
+Syntax
+~~
+
+.. code-block:: c
+
+   uint64_t hypercall(const uint64_t H_SVM_INIT_ABORT)
+
+Return values
+~~~~~~~~~~~~~
+
+One of the following values:
+
+   * H_PARAMETER   on successfully cleaning up the state,
+   Hypervisor will return this value to the
+   **guest**, to indicate that the underlying
+   UV_ESM ultracall failed.
+
+   * H_STATE   if called after a VM has gone secure (i.e.
+   H_SVM_INIT_DONE hypercall was successful).
+
+   * H_UNSUPPORTED if called from a wrong context (e.g. from a
+   normal VM).
+
+Description
+~~~~~~~~~~~
+
+Abort the process of securing a virtual machine. This call must
+be made after a prior call to ``H_SVM_INIT_START`` hypercall and
+before a call to ``H_SVM_INIT_DONE``.
+
+On entry into this hypercall the non-volatile GPRs and FPRs are
+expected to contain the values they had at the time the VM issued
+the UV_ESM ultracall. Further, ``SRR0`` is expected to contain the
+address of the instruction after the ``UV_ESM`` ultracall and ``SRR1``
+the MSR value with which to return to the VM.
+
+This hypercall will clean up any partial state that was established for
+the VM since the prior ``H_SVM_INIT_START`` hypercall, including paging
+out pages that were paged into secure memory, and issue the
+``UV_SVM_TERMINATE`` ultracall to terminate the VM.
+
+After the partial state is cleaned up, control returns to the VM
+(**not Ultravisor**), at the address specified in ``SRR0`` with the
+MSR values set to the value in 

Re: [PATCH V3 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

2020-01-02 Thread Sukadev Bhattiprolu
Ram Pai [linux...@us.ibm.com] wrote:
> > +unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
> > +{
> > +   int i;
> > +
> > +   if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
> > +   return H_UNSUPPORTED;
> 
> It should also return H_UNSUPPORTED when 
> (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE) is true.

If KVMPPC_SECURE_INIT_DONE is set, KVMPPC_SECURE_INIT_START is also
set; we never clear KVMPPC_SECURE_INIT_START, right?

Sukadev
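Since INIT_DONE implies INIT_START, testing INIT_START alone cannot tell a still-initializing VM from one that has already gone secure; a second test on INIT_DONE is needed, which is what v4 does by returning H_STATE. A minimal sketch of that ordering follows. The flag values and return-code names are made up and only stand in for the kernel's `KVMPPC_SECURE_INIT_*` and `H_*` constants.

```c
#include <assert.h>

/* Illustrative stand-ins, not the kernel's values. */
#define SECURE_INIT_START 0x1UL
#define SECURE_INIT_DONE  0x2UL

enum hret { RET_PARAMETER = 1, RET_STATE, RET_UNSUPPORTED };

static enum hret init_abort_state_check(unsigned long secure_guest)
{
	if (!(secure_guest & SECURE_INIT_START))
		return RET_UNSUPPORTED;	/* never started: wrong context */
	if (secure_guest & SECURE_INIT_DONE)
		return RET_STATE;	/* already secure: too late to abort */
	/* ... page out secure pages, terminate the SVM ... */
	return RET_PARAMETER;		/* guest sees its UV_ESM call fail */
}
```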


Re: [PATCH 2/2] powerpc/pseries/svm: Disable BHRB/EBB/PMU access

2019-12-26 Thread Sukadev Bhattiprolu
Sukadev Bhattiprolu [suka...@linux.ibm.com] wrote:
> Ultravisor disables some CPU features like BHRB, EBB and PMU in
> secure virtual machines (SVMs). Skip accessing those registers
> in SVMs to avoid getting a Program Interrupt.

Here is an updated patch that explicitly includes  in
some files to fix build errors reported by .
---

From: Sukadev Bhattiprolu 
Date: Thu, 16 May 2019 20:57:12 -0500
Subject: [PATCH 2/2] powerpc/pseries/svm: Disable BHRB/EBB/PMU access

Ultravisor disables some CPU features like BHRB, EBB and PMU in
secure virtual machines (SVMs). Skip accessing those registers
in SVMs to avoid getting a Program Interrupt.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v2]
- [Michael Ellerman] Optimize the code using FW_FEATURE_SVM
- Merged EBB/BHRB and PMU patches into one and reorganized code.
- Fix some build errors reported by 
---
 arch/powerpc/kernel/cpu_setup_power.S   | 21 
 arch/powerpc/kernel/process.c   | 23 ++---
 arch/powerpc/kvm/book3s_hv.c| 33 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 32 +++-
 arch/powerpc/kvm/book3s_hv_tm_builtin.c | 21 ++--
 arch/powerpc/perf/core-book3s.c |  6 +
 arch/powerpc/xmon/xmon.c| 30 +-
 7 files changed, 114 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
b/arch/powerpc/kernel/cpu_setup_power.S
index a460298c7ddb..9e895d8db468 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -206,14 +206,35 @@ __init_PMU_HV_ISA207:
blr
 
 __init_PMU:
+#ifdef CONFIG_PPC_SVM
+   /*
+* SVMs are restricted from accessing PMU, so skip.
+*/
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip1
+#endif
li  r5,0
mtspr   SPRN_MMCRA,r5
mtspr   SPRN_MMCR0,r5
mtspr   SPRN_MMCR1,r5
mtspr   SPRN_MMCR2,r5
+skip1:
blr
 
 __init_PMU_ISA207:
+
+#ifdef CONFIG_PPC_SVM
+   /*
+* SVMs are restricted from accessing PMU, so skip.
+   */
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip2
+#endif
li  r5,0
mtspr   SPRN_MMCRS,r5
+skip2:
blr
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 639ceae7da9d..83c7c4119305 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1059,9 +1060,11 @@ static inline void save_sprs(struct thread_struct *t)
t->dscr = mfspr(SPRN_DSCR);
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   t->bescr = mfspr(SPRN_BESCR);
-   t->ebbhr = mfspr(SPRN_EBBHR);
-   t->ebbrr = mfspr(SPRN_EBBRR);
+   if (!is_secure_guest()) {
+   t->bescr = mfspr(SPRN_BESCR);
+   t->ebbhr = mfspr(SPRN_EBBHR);
+   t->ebbrr = mfspr(SPRN_EBBRR);
+   }
 
t->fscr = mfspr(SPRN_FSCR);
 
@@ -1097,12 +1100,14 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
}
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   if (old_thread->bescr != new_thread->bescr)
-   mtspr(SPRN_BESCR, new_thread->bescr);
-   if (old_thread->ebbhr != new_thread->ebbhr)
-   mtspr(SPRN_EBBHR, new_thread->ebbhr);
-   if (old_thread->ebbrr != new_thread->ebbrr)
-   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   if (!is_secure_guest()) {
+   if (old_thread->bescr != new_thread->bescr)
+   mtspr(SPRN_BESCR, new_thread->bescr);
+   if (old_thread->ebbhr != new_thread->ebbhr)
+   mtspr(SPRN_EBBHR, new_thread->ebbhr);
+   if (old_thread->ebbrr != new_thread->ebbrr)
+   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   }
 
if (old_thread->fscr != new_thread->fscr)
mtspr(SPRN_FSCR, new_thread->fscr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 709cf1fd4cf4..29a2640108d1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -3568,9 +3569,11 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
mtspr(SPRN_PSPB, vcpu->arch.pspb);
mtspr(SPRN_FSCR, vcpu->arch.fscr);
mtspr(SPRN_TAR, vcpu->arch.tar);
-   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
-   mt

[PATCH 2/2] powerpc/pseries/svm: Disable BHRB/EBB/PMU access

2019-12-24 Thread Sukadev Bhattiprolu
Ultravisor disables some CPU features like BHRB, EBB and PMU in
secure virtual machines (SVMs). Skip accessing those registers
in SVMs to avoid getting a Program Interrupt.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v2]
- [Michael Ellerman] Optimize the code using FW_FEATURE_SVM
- Merged EBB/BHRB and PMU patches into one and reorganized code.
---
 arch/powerpc/kernel/cpu_setup_power.S   | 21 
 arch/powerpc/kernel/process.c   | 22 ++---
 arch/powerpc/kvm/book3s_hv.c| 32 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 32 +++--
 arch/powerpc/kvm/book3s_hv_tm_builtin.c | 21 +---
 arch/powerpc/perf/core-book3s.c |  5 
 arch/powerpc/xmon/xmon.c| 29 +-
 7 files changed, 110 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
b/arch/powerpc/kernel/cpu_setup_power.S
index a460298c7ddb..9e895d8db468 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -206,14 +206,35 @@ __init_PMU_HV_ISA207:
blr
 
 __init_PMU:
+#ifdef CONFIG_PPC_SVM
+   /*
+* SVMs are restricted from accessing PMU, so skip.
+*/
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip1
+#endif
li  r5,0
mtspr   SPRN_MMCRA,r5
mtspr   SPRN_MMCR0,r5
mtspr   SPRN_MMCR1,r5
mtspr   SPRN_MMCR2,r5
+skip1:
blr
 
 __init_PMU_ISA207:
+
+#ifdef CONFIG_PPC_SVM
+   /*
+* SVMs are restricted from accessing PMU, so skip.
+   */
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip2
+#endif
li  r5,0
mtspr   SPRN_MMCRS,r5
+skip2:
blr
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 639ceae7da9d..e24b9c740596 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1059,9 +1059,11 @@ static inline void save_sprs(struct thread_struct *t)
t->dscr = mfspr(SPRN_DSCR);
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   t->bescr = mfspr(SPRN_BESCR);
-   t->ebbhr = mfspr(SPRN_EBBHR);
-   t->ebbrr = mfspr(SPRN_EBBRR);
+   if (!is_secure_guest()) {
+   t->bescr = mfspr(SPRN_BESCR);
+   t->ebbhr = mfspr(SPRN_EBBHR);
+   t->ebbrr = mfspr(SPRN_EBBRR);
+   }
 
t->fscr = mfspr(SPRN_FSCR);
 
@@ -1097,12 +1099,14 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
}
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   if (old_thread->bescr != new_thread->bescr)
-   mtspr(SPRN_BESCR, new_thread->bescr);
-   if (old_thread->ebbhr != new_thread->ebbhr)
-   mtspr(SPRN_EBBHR, new_thread->ebbhr);
-   if (old_thread->ebbrr != new_thread->ebbrr)
-   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   if (!is_secure_guest()) {
+   if (old_thread->bescr != new_thread->bescr)
+   mtspr(SPRN_BESCR, new_thread->bescr);
+   if (old_thread->ebbhr != new_thread->ebbhr)
+   mtspr(SPRN_EBBHR, new_thread->ebbhr);
+   if (old_thread->ebbrr != new_thread->ebbrr)
+   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   }
 
if (old_thread->fscr != new_thread->fscr)
mtspr(SPRN_FSCR, new_thread->fscr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 709cf1fd4cf4..ced0460afafe 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3568,9 +3568,11 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
mtspr(SPRN_PSPB, vcpu->arch.pspb);
mtspr(SPRN_FSCR, vcpu->arch.fscr);
mtspr(SPRN_TAR, vcpu->arch.tar);
-   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
-   mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
-   mtspr(SPRN_BESCR, vcpu->arch.bescr);
+   if (!is_secure_guest()) {
+   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
+   mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
+   mtspr(SPRN_BESCR, vcpu->arch.bescr);
+   }
mtspr(SPRN_WORT, vcpu->arch.wort);
mtspr(SPRN_TIDR, vcpu->arch.tid);
mtspr(SPRN_DAR, vcpu->arch.shregs.dar);
@@ -3641,9 +3643,11 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
vcpu->arch.pspb = mfspr(SPRN_PSPB);
vcpu->arch.fscr = mfspr(SPRN_FSCR);
vcpu->arch.tar = mfspr(SPRN_TAR);
-   vcpu->arch.ebbhr = mfsp

[PATCH 1/2] powerpc/pseries/svm: Use FW_FEATURE to detect SVM

2019-12-24 Thread Sukadev Bhattiprolu
Use FW_FEATURE_SVM to detect a secure guest (SVM). This would be
more efficient than calling mfmsr() frequently.

Suggested-by: Michael Ellerman 
Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/firmware.h   | 3 ++-
 arch/powerpc/include/asm/svm.h| 6 +-
 arch/powerpc/kernel/paca.c| 6 +-
 arch/powerpc/platforms/pseries/firmware.c | 3 +++
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index b3e214a97f3a..23cffcec8a55 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -51,6 +51,7 @@
 #define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0010)
 #define FW_FEATURE_PAPR_SCMASM_CONST(0x0020)
 #define FW_FEATURE_ULTRAVISOR  ASM_CONST(0x0040)
+#define FW_FEATURE_SVM ASM_CONST(0x0080)
 
 #ifndef __ASSEMBLY__
 
@@ -69,7 +70,7 @@ enum {
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
-   FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR,
+   FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR | FW_FEATURE_SVM,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/include/asm/svm.h b/arch/powerpc/include/asm/svm.h
index 85580b30aba4..1d056c70fa87 100644
--- a/arch/powerpc/include/asm/svm.h
+++ b/arch/powerpc/include/asm/svm.h
@@ -10,9 +10,13 @@
 
 #ifdef CONFIG_PPC_SVM
 
+/*
+ * Note that this is not usable in early boot - before FW
+ * features were probed
+ */
 static inline bool is_secure_guest(void)
 {
-   return mfmsr() & MSR_S;
+   return firmware_has_feature(FW_FEATURE_SVM);
 }
 
 void dtl_cache_ctor(void *addr);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 949eceb254d8..3cba33a99549 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -120,7 +120,11 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned 
long limit)
if (early_cpu_has_feature(CPU_FTR_HVMODE))
return NULL;
 
-   if (is_secure_guest())
+   /*
+* Firmware features may not have been probed yet, so check
+* MSR rather than FW_FEATURE_SVM in is_secure_guest().
+*/
+   if (mfmsr() & MSR_S)
lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
else
lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
diff --git a/arch/powerpc/platforms/pseries/firmware.c 
b/arch/powerpc/platforms/pseries/firmware.c
index d4a8f1702417..c98527fb4937 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -175,4 +175,7 @@ static int __init probe_fw_features(unsigned long node, 
const char *uname, int
 void __init pseries_probe_fw_features(void)
 {
of_scan_flat_dt(probe_fw_features, NULL);
+
+   if (mfmsr() & MSR_S)
+   powerpc_firmware_features |= FW_FEATURE_SVM;
 }
-- 
2.17.2
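The idea in this patch is to read the MSR once at probe time and cache the result as a firmware-feature bit, so the hot-path check becomes a mask of a cached word. The userspace sketch below models that under assumed names (`fake_mfmsr()`, `FAKE_FW_FEATURE_SVM`, illustrative bit values). It also shows why the cached check is not usable before probing, which is why `new_lppaca()` keeps testing the MSR directly.

```c
#include <assert.h>
#include <stdbool.h>

#define FAKE_MSR_S          (1UL << 22)	/* illustrative MSR bit */
#define FAKE_FW_FEATURE_SVM (1UL << 0)	/* illustrative feature bit */

static unsigned long fake_msr;		/* stands in for the MSR */
static unsigned long fw_features;	/* cached feature mask */
static int msr_reads;			/* counts simulated mfmsr() calls */

static unsigned long fake_mfmsr(void)
{
	msr_reads++;
	return fake_msr;
}

/* Run once at probe time: record MSR_S as a firmware-feature bit. */
static void probe_fw_features(void)
{
	if (fake_mfmsr() & FAKE_MSR_S)
		fw_features |= FAKE_FW_FEATURE_SVM;
}

/* Hot-path check: only tests the cached mask, never reads the MSR. */
static bool is_secure_guest_cached(void)
{
	return (fw_features & FAKE_FW_FEATURE_SVM) != 0;
}
```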



[PATCH v4 1/2] KVM: PPC: Add skip_page_out parameter

2019-12-19 Thread Sukadev Bhattiprolu
Add 'skip_page_out' parameter to kvmppc_uvmem_drop_pages() so the
callers can specify whether or not to skip paging out pages. This
will be needed in a follow-on patch that implements H_SVM_INIT_ABORT
hcall.

Reviewed-by: Paul Mackerras 
Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h | 4 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c  | 2 +-
 arch/powerpc/kvm/book3s_hv.c| 2 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 50204e228f16..3cf8425b9838 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -20,7 +20,7 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
 int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-struct kvm *kvm);
+struct kvm *kvm, bool skip_page_out);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -69,6 +69,6 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
unsigned long gfn)
 
 static inline void
 kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-   struct kvm *kvm) { }
+   struct kvm *kvm, bool skip_page_out) { }
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index da857c8ba6e4..744dba98e5d1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1102,7 +1102,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
unsigned int shift;
 
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
-   kvmppc_uvmem_drop_pages(memslot, kvm);
+   kvmppc_uvmem_drop_pages(memslot, kvm, true);
 
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
return;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 597f4bfecf0e..66d5312be16b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5493,7 +5493,7 @@ static int kvmhv_svm_off(struct kvm *kvm)
continue;
 
kvm_for_each_memslot(memslot, slots) {
-   kvmppc_uvmem_drop_pages(memslot, kvm);
+   kvmppc_uvmem_drop_pages(memslot, kvm, true);
uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
}
}
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index f24ac3cfb34c..9a5bbad7d87e 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -259,7 +259,7 @@ unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
  * QEMU page table with normal PTEs from newly allocated pages.
  */
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-struct kvm *kvm)
+struct kvm *kvm, bool skip_page_out)
 {
int i;
struct kvmppc_uvmem_page_pvt *pvt;
@@ -277,7 +277,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot 
*free,
 
uvmem_page = pfn_to_page(uvmem_pfn);
pvt = uvmem_page->zone_device_data;
-   pvt->skip_page_out = true;
+   pvt->skip_page_out = skip_page_out;
mutex_unlock(&kvm->arch.uvmem_lock);
 
pfn = gfn_to_pfn(kvm, gfn);
-- 
2.17.2
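To illustrate how the new parameter is meant to be used: the teardown callers in this patch pass `true`, while the H_SVM_INIT_ABORT code quoted later in this thread passes `false` so that page contents come back to normal memory. The sketch below is a hypothetical, heavily simplified model of that choice; `struct page_pvt`, `release_page()` and `drop_pages()` are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-page private data. */
struct page_pvt {
	bool skip_page_out;
	bool in_secure_mem;
};

/* Simulated release of one page: copy it back unless told to skip. */
static bool release_page(struct page_pvt *pvt)
{
	bool paged_out = !pvt->skip_page_out;

	pvt->in_secure_mem = false;
	return paged_out;
}

/* Model of dropping a slot's pages; the caller picks the policy. */
static size_t drop_pages(struct page_pvt *pages, size_t n, bool skip_page_out)
{
	size_t paged_out = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].in_secure_mem)
			continue;	/* nothing to drop for this page */
		pages[i].skip_page_out = skip_page_out;
		if (release_page(&pages[i]))
			paged_out++;
	}
	return paged_out;
}
```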



[PATCH v4 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

2019-12-19 Thread Sukadev Bhattiprolu
Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
abort an SVM after it has issued the H_SVM_INIT_START and before the
H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
encounters security violations or other errors when starting an SVM.

Note that this hcall is different from UV_SVM_TERMINATE ucall which
is used by HV to terminate/clean up a VM that has become secure.

The H_SVM_INIT_ABORT should basically undo operations that were done
since the H_SVM_INIT_START hcall, i.e. page out all the VM pages back
to normal memory, and terminate the SVM.

(If we do not bring the pages back to normal memory, the text/data
of the VM would be stuck in secure memory and since the SVM did not
go secure, its MSR_S bit will be clear and the VM won't be able to
access its pages even to do a clean exit).

Based on patches and discussion with Paul Mackerras, Ram Pai and
Bharata Rao.

Signed-off-by: Ram Pai 
Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Bharata B Rao 
---
Changelog[v4]:
- [Bharata Rao] Add missing rcu locking
- [Paul Mackerras] simplify code that walks memslots
- Add a check to ensure that H_SVM_INIT_ABORT is called before
  H_SVM_INIT_DONE hcall (i.e. the SVM is not already secure).

Changelog[v3]:
- Rather than pass the NIP/MSR as parameters, load them into
  SRR0/SRR1 (like we do with other registers) and terminate
  the VM after paging out pages
- Move the code to add a skip_page_out parameter into a
  separate patch.

Changelog[v2]:
[Paul Mackerras] avoid returning to UV "one last time" after
the state is cleaned up.  So, we now have H_SVM_INIT_ABORT:
- take the VM's NIP/MSR register states as parameters
- inherit the state of other registers as at UV_ESM call.
After cleaning up the partial state, HV uses these to return
directly to the VM with a failed UV_ESM call.
---
 Documentation/powerpc/ultravisor.rst| 57 +
 arch/powerpc/include/asm/hvcall.h   |  1 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  6 +++
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s_hv.c|  3 ++
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 26 ++
 6 files changed, 94 insertions(+)

diff --git a/Documentation/powerpc/ultravisor.rst 
b/Documentation/powerpc/ultravisor.rst
index 730854f73830..8c114c071bfa 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -948,6 +948,63 @@ Use cases
 up its internal state for this virtual machine.
 
 
+H_SVM_INIT_ABORT
+----------------
+
+Abort the process of securing an SVM.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+   uint64_t hypercall(const uint64_t H_SVM_INIT_ABORT)
+
+Return values
+~~~~~~~~~~~~~
+
+One of the following values:
+
+   * H_PARAMETER   on successfully cleaning up the state,
+   Hypervisor will return this value to the
+   **guest**, to indicate that the underlying
+   UV_ESM ultracall failed.
+
+   * H_UNSUPPORTED if called from the wrong context (e.g. from
+   an SVM or before an H_SVM_INIT_START hypercall).
+
+Description
+~~~~~~~~~~~
+
+Abort the process of securing a virtual machine. This call must
+be made after a prior call to ``H_SVM_INIT_START`` hypercall and
+before a call to ``H_SVM_INIT_DONE``.
+
+On entry into this hypercall the non-volatile GPRs and FPRs are
+expected to contain the values they had at the time the VM issued
+the UV_ESM ultracall. Further, ``SRR0`` is expected to contain the
+address of the instruction after the ``UV_ESM`` ultracall and ``SRR1``
+the MSR value with which to return to the VM.
+
+This hypercall will clean up any partial state that was established for
+the VM since the prior ``H_SVM_INIT_START`` hypercall, including paging
+out pages that were paged into secure memory, and issue the
+``UV_SVM_TERMINATE`` ultracall to terminate the VM.
+
+After the partial state is cleaned up, control returns to the VM
+(**not Ultravisor**), at the address specified in ``SRR0`` with the
+MSR values set to the value in ``SRR1``.
+
+Use cases
+~~~~~~~~~
+
+If after a successful call to ``H_SVM_INIT_START``, the Ultravisor
+encounters an error while securing a virtual machine, either due
+to lack of resources or because the VM's security information could
+not be validated, Ultravisor informs the Hypervisor about it.
+Hypervisor should use this call to clean up any internal state for
+this virtual machine and return to the VM.
+
 H_SVM_PAGE_IN
 -------------
 
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 13bd870609c3..e90c073e437e 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/ar

Re: [PATCH V3 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

2019-12-19 Thread Sukadev Bhattiprolu
Paul Mackerras [pau...@ozlabs.org] wrote:
> On Sat, Dec 14, 2019 at 06:12:08PM -0800, Sukadev Bhattiprolu wrote:
> > 
> > Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
> > abort an SVM after it has issued the H_SVM_INIT_START and before the
> > H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
> > encounters security violations or other errors when starting an SVM.
> > 
> > Note that this hcall is different from UV_SVM_TERMINATE ucall which
> > is used by HV to terminate/cleanup an VM that has becore secure.
> > 
> > The H_SVM_INIT_ABORT should basically undo operations that were done
> > since the H_SVM_INIT_START hcall - i.e page-out all the VM pages back
> > to normal memory, and terminate the SVM.
> > 
> > (If we do not bring the pages back to normal memory, the text/data
> > of the VM would be stuck in secure memory and since the SVM did not
> > go secure, its MSR_S bit will be clear and the VM wont be able to
> > access its pages even to do a clean exit).
> > 
> > Based on patches and discussion with Paul Mackerras, Ram Pai and
> > Bharata Rao.
> > 
> > Signed-off-by: Ram Pai 
> > Signed-off-by: Sukadev Bhattiprolu 
> > Signed-off-by: Bharata B Rao 
> 
> Minor comment below, but not a showstopper.  Also, as Bharata noted
> you need to hold the srcu lock for reading.

Yes, I fixed that.

> 
> > +   for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > +   struct kvm_memory_slot *memslot;
> > +   struct kvm_memslots *slots = __kvm_memslots(kvm, i);
> > +
> > +   if (!slots)
> > +   continue;
> > +
> > +   kvm_for_each_memslot(memslot, slots)
> > +   kvmppc_uvmem_drop_pages(memslot, kvm, false);
> > +   }
> 
> Since we use the default KVM_ADDRESS_SPACE_NUM, which is 1, this code
> isn't wrong but it is more verbose than it needs to be.  It could be
> 
>   kvm_for_each_memslot(kvm_memslots(kvm), slots)
>   kvmppc_uvmem_drop_pages(memslot, kvm, false);

and simplified this.

Thanks.

Sukadev


Re: [PATCH V3 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

2019-12-19 Thread Sukadev Bhattiprolu
Bharata B Rao [bhar...@linux.ibm.com] wrote:
> On Sat, Dec 14, 2019 at 06:12:08PM -0800, Sukadev Bhattiprolu wrote:
> > +unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
> > +{
> > +   int i;
> > +
> > +   if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
> > +   return H_UNSUPPORTED;
> > +
> > +   for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > +   struct kvm_memory_slot *memslot;
> > +   struct kvm_memslots *slots = __kvm_memslots(kvm, i);
> > +
> > +   if (!slots)
> > +   continue;
> > +
> > +   kvm_for_each_memslot(memslot, slots)
> > +   kvmppc_uvmem_drop_pages(memslot, kvm, false);
> > +   }
> 
> You need to hold srcu_read_lock(&kvm->srcu) here.

Yes, thanks! Fixed in the next version.

Sukadev



Re: [PATCH 1/2] powerpc/pseries/svm: Don't access some SPRs

2019-12-18 Thread Sukadev Bhattiprolu
Michael Ellerman [m...@ellerman.id.au] wrote:
> 
> eg. here.
> 
> This is the fast path of context switch.
> 
> That expands to:
> 
>   if (!(mfmsr() & MSR_S))
>   asm volatile("mfspr %0, SPRN_BESCR" : "=r" (rval));
>   if (!(mfmsr() & MSR_S))
>   asm volatile("mfspr %0, SPRN_EBBHR" : "=r" (rval));
>   if (!(mfmsr() & MSR_S))
>   asm volatile("mfspr %0, SPRN_EBBRR" : "=r" (rval));
> 

Yes, should have optimized this at least :-)
> 
> If the Ultravisor is going to disable EBB and BHRB then we need new
> CPU_FTR bits for those, and the code that accesses those registers
> needs to be put behind cpu_has_feature(EBB) etc.

Will try the cpu_has_feature(). Would it be OK to use a single feature
bit, like UV, or should it be per register group? The latter could need
more feature bits.

Thanks,

Sukadev
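The cost Michael points out can be made concrete by counting MSR reads. In this userspace sketch, with hypothetical `fake_mfmsr()`/`fake_mfspr()` stand-ins and an illustrative bit value, guarding each SPR access separately costs one MSR read per register, while hoisting the test over the EBB group costs one. The later v2 patch goes further and replaces the MSR read with a cached `FW_FEATURE_SVM` check.

```c
#include <assert.h>

#define FAKE_MSR_S (1UL << 22)	/* illustrative bit, not taken from reg.h */

static unsigned long fake_msr;	/* stands in for the MSR */
static int msr_reads;		/* counts simulated mfmsr() calls */

static unsigned long fake_mfmsr(void)
{
	msr_reads++;
	return fake_msr;
}

/* Stand-in for mfspr(): returns a value derived from the SPR id. */
static unsigned long fake_mfspr(int spr)
{
	return 0x100UL + (unsigned long)spr;
}

struct ebb_sprs {
	unsigned long bescr, ebbhr, ebbrr;
};

/* Guard per access, like the mfspr_r() expansion: three MSR reads. */
static void save_guard_per_spr(struct ebb_sprs *t)
{
	t->bescr = (fake_mfmsr() & FAKE_MSR_S) ? 0 : fake_mfspr(1);
	t->ebbhr = (fake_mfmsr() & FAKE_MSR_S) ? 0 : fake_mfspr(2);
	t->ebbrr = (fake_mfmsr() & FAKE_MSR_S) ? 0 : fake_mfspr(3);
}

/* Guard hoisted over the whole group: one MSR read. */
static void save_guard_once(struct ebb_sprs *t)
{
	if (fake_mfmsr() & FAKE_MSR_S)
		return;		/* secure guest: leave saved values alone */
	t->bescr = fake_mfspr(1);
	t->ebbhr = fake_mfspr(2);
	t->ebbrr = fake_mfspr(3);
}
```

One behavioral difference: the per-access form stores 0 in a secure guest, while the hoisted form leaves the previously saved values untouched.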


[PATCH 2/2] powerpc/pseries/svm: Disable PMUs in SVMs

2019-12-17 Thread Sukadev Bhattiprolu
For now, disable hardware PMU facilities in secure virtual
machines (SVMs) to prevent any information leak between SVMs
and the (untrusted) HV.

With this, a simple 'myperf' program that uses perf_event_open()
fails for SVMs (with the corresponding fix to UV). In normal VMs and
on the bare-metal HV, the syscall and performance counters work.
Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/kernel/cpu_setup_power.S | 22 ++
 arch/powerpc/perf/core-book3s.c   |  6 ++
 2 files changed, 28 insertions(+)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
b/arch/powerpc/kernel/cpu_setup_power.S
index a460298c7ddb..d5eb06e20b5a 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -206,14 +206,36 @@ __init_PMU_HV_ISA207:
blr
 
 __init_PMU:
+#ifdef CONFIG_PPC_SVM
+   /*
+* For now, SVMs are restricted from accessing PMU
+* features, so skip accordingly.
+*/
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip1
+#endif
li  r5,0
mtspr   SPRN_MMCRA,r5
mtspr   SPRN_MMCR0,r5
mtspr   SPRN_MMCR1,r5
mtspr   SPRN_MMCR2,r5
+skip1:
blr
 
 __init_PMU_ISA207:
+#ifdef CONFIG_PPC_SVM
+   /*
+* For now, SVMs are restricted from accessing PMU
+* features, so skip accordingly.
+*/
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip2
+#endif
li  r5,0
mtspr   SPRN_MMCRS,r5
+skip2:
blr
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 4e76b2251801..9e6a9f1803f6 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2275,6 +2275,12 @@ static int power_pmu_prepare_cpu(unsigned int cpu)
 
 int register_power_pmu(struct power_pmu *pmu)
 {
+   /*
+* PMU events are not currently supported in SVMs
+*/
+   if (is_secure_guest())
+   return -ENOSYS;
+
if (ppmu)
return -EBUSY;  /* something's already registered */
 
-- 
2.17.2



[PATCH 1/2] powerpc/pseries/svm: Don't access some SPRs

2019-12-17 Thread Sukadev Bhattiprolu
Ultravisor disables some CPU features like EBB and BHRB in the HFSCR
for secure virtual machines (SVMs). If the SVMs attempt to access
related registers, they will get a Program Interrupt.

Use macros/wrappers to skip accessing EBB and BHRB registers in secure
VMs.
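As a rough user-space model of the behavior `mfspr_r()`/`mtspr_r()` are meant to have (not the kernel macros themselves), the sketch below treats the restricted SPRs as a small array and gates each access on the MSR[S] bit. It assumes MSR[S] is bit 22 (`MSR_S_LG`); `struct spr_file` and the `model_*` helpers are hypothetical names.

```c
#include <stdint.h>

/* Assumption: MSR[S] (secure state) is bit 22, i.e. MSR_S_LG == 22. */
#define MSR_S_LG 22
#define MSR_S    (UINT64_C(1) << MSR_S_LG)

/* Hypothetical model of a few restricted SPRs (e.g. BESCR, EBBHR, EBBRR). */
struct spr_file {
    uint64_t regs[3];
};

/* Like mfspr_r(): a secure guest reads 0 instead of touching the SPR. */
static uint64_t model_mfspr_r(const struct spr_file *f, uint64_t msr, int rn)
{
    if (msr & MSR_S)
        return 0;
    return f->regs[rn];
}

/* Like mtspr_r(): a secure guest's write is silently dropped. */
static void model_mtspr_r(struct spr_file *f, uint64_t msr, int rn,
                          uint64_t v)
{
    if (!(msr & MSR_S))
        f->regs[rn] = v;
}
```

In the context-switch code this means an SVM saves zeros for BESCR/EBBHR/EBBRR and never writes them back, instead of taking a Program Interrupt.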

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/reg.h  | 35 ++
 arch/powerpc/kernel/process.c   | 12 +++
 arch/powerpc/kvm/book3s_hv.c| 24 ++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 48 ++---
 arch/powerpc/kvm/book3s_hv_tm_builtin.c |  6 ++--
 arch/powerpc/perf/core-book3s.c |  5 +--
 arch/powerpc/xmon/xmon.c|  2 +-
 7 files changed, 96 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index b3cbb1136bce..026eb20f6d13 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1379,6 +1379,41 @@ static inline void msr_check_and_clear(unsigned long bits)
__msr_check_and_clear(bits);
 }
 
+#ifdef CONFIG_PPC_SVM
+/*
+ * Move from some "restricted" sprs.
+ * Secure VMs should not access some registers as the related features
+ * are disabled in the CPU. If an SVM is attempting to read from the given
+ * SPR, return 0. Otherwise behave like a normal mfspr.
+ */
+#define mfspr_r(rn)\
+({ \
+   unsigned long rval = 0ULL;  \
+   \
+   if (!(mfmsr() & MSR_S)) \
+   asm volatile("mfspr %0," __stringify(rn)\
+   : "=r" (rval)); \
+   rval;   \
+})
+
+/*
+ * Move to some "restricted" sprs.
+ * Secure VMs should not access some registers as the related features
+ * are disabled in the CPU. If an SVM is attempting to write to the given
+ * SPR, ignore the write. Otherwise behave like a normal mtspr.
+ */
+#define mtspr_r(rn, v) \
+({ \
+   if (!(mfmsr() & MSR_S)) \
+   asm volatile("mtspr " __stringify(rn) ",%0" :   \
+: "r" ((unsigned long)(v)) \
+: "memory");   \
+})
+#else
+#define mfspr_r mfspr
+#define mtspr_r mtspr
+#endif
+
 #ifdef __powerpc64__
 #if defined(CONFIG_PPC_CELL) || defined(CONFIG_PPC_FSL_BOOK3E)
 #define mftb() ({unsigned long rval;   \
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 639ceae7da9d..9a691452ea3b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1059,9 +1059,9 @@ static inline void save_sprs(struct thread_struct *t)
t->dscr = mfspr(SPRN_DSCR);
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   t->bescr = mfspr(SPRN_BESCR);
-   t->ebbhr = mfspr(SPRN_EBBHR);
-   t->ebbrr = mfspr(SPRN_EBBRR);
+   t->bescr = mfspr_r(SPRN_BESCR);
+   t->ebbhr = mfspr_r(SPRN_EBBHR);
+   t->ebbrr = mfspr_r(SPRN_EBBRR);
 
t->fscr = mfspr(SPRN_FSCR);
 
@@ -1098,11 +1098,11 @@ static inline void restore_sprs(struct thread_struct *old_thread,
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
if (old_thread->bescr != new_thread->bescr)
-   mtspr(SPRN_BESCR, new_thread->bescr);
+   mtspr_r(SPRN_BESCR, new_thread->bescr);
if (old_thread->ebbhr != new_thread->ebbhr)
-   mtspr(SPRN_EBBHR, new_thread->ebbhr);
+   mtspr_r(SPRN_EBBHR, new_thread->ebbhr);
if (old_thread->ebbrr != new_thread->ebbrr)
-   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   mtspr_r(SPRN_EBBRR, new_thread->ebbrr);
 
if (old_thread->fscr != new_thread->fscr)
mtspr(SPRN_FSCR, new_thread->fscr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 709cf1fd4cf4..dba21b0e1d22 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3568,9 +3568,9 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
mtspr(SPRN_PSPB, vcpu->arch.pspb);
mtspr(SPRN_FSCR, vcpu->arch.fscr);
mtspr(SPRN_TAR, vcpu->arch.tar);
-   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
-   mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
-   mtspr(SPRN_BESCR, v

[PATCH V3 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

2019-12-14 Thread Sukadev Bhattiprolu


Implement the H_SVM_INIT_ABORT hcall, which the Ultravisor can use to
abort an SVM after it has issued the H_SVM_INIT_START hcall and before
the H_SVM_INIT_DONE hcall. The Ultravisor can use this hcall when it
encounters security violations or other errors while starting an SVM.

Note that this hcall is different from the UV_SVM_TERMINATE ucall, which
is used by the HV to terminate/clean up a VM that has become secure.

The H_SVM_INIT_ABORT hcall should basically undo the operations that were
done since the H_SVM_INIT_START hcall, i.e. page out all the VM pages back
to normal memory and terminate the SVM.

(If we do not bring the pages back to normal memory, the text/data
of the VM would be stuck in secure memory, and since the SVM did not
go secure, its MSR_S bit will be clear and the VM won't be able to
access its pages even to do a clean exit.)

Based on patches and discussion with Paul Mackerras, Ram Pai and
Bharata Rao.
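To make the intended state checks concrete, here is a minimal model of H_SVM_INIT_ABORT as described in the documentation hunk of this patch: valid only after H_SVM_INIT_START and before H_SVM_INIT_DONE, and, after cleanup, the guest sees H_PARAMETER as the result of its UV_ESM ultracall. The progress flags are hypothetical (modeled on `kvm->arch.secure_guest`), the return codes are stand-ins rather than the real `asm/hvcall.h` values, and the page-out step is reduced to a counter.

```c
/* Hypothetical progress flags, modeled on kvm->arch.secure_guest. */
#define SECURE_INIT_START 0x1u
#define SECURE_INIT_DONE  0x2u

/* Stand-in return codes; not the real asm/hvcall.h values. */
enum hret { RET_PARAMETER = 1, RET_UNSUPPORTED = 2 };

struct vm_model {
    unsigned int secure_guest;   /* SECURE_INIT_* flags */
    unsigned int secure_pages;   /* guest pages currently in secure memory */
};

static enum hret model_svm_init_abort(struct vm_model *vm)
{
    /* Only valid after INIT_START and before INIT_DONE. */
    if (!(vm->secure_guest & SECURE_INIT_START) ||
        (vm->secure_guest & SECURE_INIT_DONE))
        return RET_UNSUPPORTED;

    /* Page every secure page back out to normal memory. */
    vm->secure_pages = 0;

    /* Simplification: forget the in-progress securing state. */
    vm->secure_guest = 0;

    /* Returned to the guest as the result of its failed UV_ESM. */
    return RET_PARAMETER;
}
```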

Signed-off-by: Ram Pai 
Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Bharata B Rao 
---
Changelog[v3]:
- Rather than pass the NIP/MSR as parameters, load them into
  SRR0/SRR1 (like we do with other registers) and terminate
  the VM after paging out pages
- Move the code to add a skip_page_out parameter into a
  separate patch.

Changelog[v2]:
[Paul Mackerras] avoid returning to UV "one last time" after
the state is cleaned up.  So, we now have H_SVM_INIT_ABORT:
- take the VM's NIP/MSR register states as parameters
- inherit the state of other registers as at UV_ESM call.
After cleaning up the partial state, HV uses these to return
directly to the VM with a failed UV_ESM call.
---
 Documentation/powerpc/ultravisor.rst| 57 +
 arch/powerpc/include/asm/hvcall.h   |  1 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  6 +++
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s_hv.c|  3 ++
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 24 +
 6 files changed, 92 insertions(+)

diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
index 730854f73830..8c114c071bfa 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -948,6 +948,63 @@ Use cases
 up its internal state for this virtual machine.
 
 
+H_SVM_INIT_ABORT
+----------------
+
+Abort the process of securing an SVM.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+   uint64_t hypercall(const uint64_t H_SVM_INIT_ABORT)
+
+Return values
+~~~~~~~~~~~~~
+
+One of the following values:
+
+   * H_PARAMETER   on successfully cleaning up the state,
+   Hypervisor will return this value to the
+   **guest**, to indicate that the underlying
+   UV_ESM ultracall failed.
+
+   * H_UNSUPPORTED if called from the wrong context (e.g. from
+   an SVM or before an H_SVM_INIT_START hypercall).
+
+Description
+~~~~~~~~~~~
+
+Abort the process of securing a virtual machine. This call must
+be made after a prior call to ``H_SVM_INIT_START`` hypercall and
+before a call to ``H_SVM_INIT_DONE``.
+
+On entry into this hypercall, the non-volatile GPRs and FPRs are
+expected to contain the values they had at the time the VM issued
+the ``UV_ESM`` ultracall. Further, ``SRR0`` is expected to contain the
+address of the instruction after the ``UV_ESM`` ultracall, and ``SRR1``
+the MSR value with which to return to the VM.
+
+This hypercall will clean up any partial state that was established for
+the VM since the prior ``H_SVM_INIT_START`` hypercall, including paging
+out pages that were paged into secure memory, and issue the
+``UV_SVM_TERMINATE`` ultracall to terminate the VM.
+
+After the partial state is cleaned up, control returns to the VM
+(**not Ultravisor**), at the address specified in ``SRR0``, with the
+MSR set to the value in ``SRR1``.
+
+Use cases
+~~~~~~~~~
+
+If, after a successful call to ``H_SVM_INIT_START``, the Ultravisor
+encounters an error while securing a virtual machine, either due
+to a lack of resources or because the VM's security information could
+not be validated, the Ultravisor informs the Hypervisor about it.
+The Hypervisor should use this call to clean up any internal state for
+this virtual machine and return to the VM.
+
 H_SVM_PAGE_IN
 -------------
 
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 13bd870609c3..e90c073e437e 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -350,6 +350,7 @@
 #define H_SVM_PAGE_OUT 0xEF04
 #define H_SVM_INIT_START   0xEF08
 #define H_SVM_INIT_DONE0xEF0C
+#define H_SVM_INIT_ABORT   0xEF14
 
 /* Values for 2nd argument to H_SET_MODE *

[PATCH V3 1/2] KVM: PPC: Add skip_page_out parameter

2019-12-14 Thread Sukadev Bhattiprolu


This patch is based on Bharata's v11 KVM patches for secure guests:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-November/200918.html
---

From: Sukadev Bhattiprolu 
Date: Fri, 13 Dec 2019 15:06:16 -0600
Subject: [PATCH V3 1/2] KVM: PPC: Add skip_page_out parameter

Add 'skip_page_out' parameter to kvmppc_uvmem_drop_pages() which will
be needed in a follow-on patch that implements H_SVM_INIT_ABORT hcall.
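The effect of the new parameter can be sketched as follows. This is a simplified model, not the kernel code: `struct page_pvt` stands in for `struct kvmppc_uvmem_page_pvt`, and a counter stands in for `uv_page_out()`. The existing callers pass `true`, which keeps their current behavior; an abort path that must bring guest contents back to normal memory would presumably pass `false`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvmppc_uvmem_page_pvt. */
struct page_pvt {
    bool secure;         /* page currently lives in secure memory */
    bool skip_page_out;  /* migrate back without copying contents out */
};

/* Counts the UV_PAGE_OUT ucalls this model would issue. */
static size_t page_out_calls;

/* Models the migrate-to-RAM path taken when the HV touches a secure page. */
static void model_migrate_to_ram(struct page_pvt *pvt)
{
    if (!pvt->skip_page_out)
        page_out_calls++;        /* stands in for uv_page_out() */
    pvt->secure = false;
}

/* Models kvmppc_uvmem_drop_pages(slot, kvm, skip_page_out). */
static void model_drop_pages(struct page_pvt *pages, size_t n,
                             bool skip_page_out)
{
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].secure)
            continue;
        pages[i].skip_page_out = skip_page_out;
        model_migrate_to_ram(&pages[i]);
    }
}
```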

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h | 4 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c  | 2 +-
 arch/powerpc/kvm/book3s_hv.c| 2 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 50204e228f16..3cf8425b9838 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -20,7 +20,7 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
 int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-struct kvm *kvm);
+struct kvm *kvm, bool skip_page_out);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -69,6 +69,6 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
 
 static inline void
 kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-   struct kvm *kvm) { }
+   struct kvm *kvm, bool skip_page_out) { }
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index da857c8ba6e4..744dba98e5d1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1102,7 +1102,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
unsigned int shift;
 
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
-   kvmppc_uvmem_drop_pages(memslot, kvm);
+   kvmppc_uvmem_drop_pages(memslot, kvm, true);
 
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
return;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 597f4bfecf0e..66d5312be16b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5493,7 +5493,7 @@ static int kvmhv_svm_off(struct kvm *kvm)
continue;
 
kvm_for_each_memslot(memslot, slots) {
-   kvmppc_uvmem_drop_pages(memslot, kvm);
+   kvmppc_uvmem_drop_pages(memslot, kvm, true);
uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
}
}
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index f24ac3cfb34c..9a5bbad7d87e 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -259,7 +259,7 @@ unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
  * QEMU page table with normal PTEs from newly allocated pages.
  */
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
-struct kvm *kvm)
+struct kvm *kvm, bool skip_page_out)
 {
int i;
struct kvmppc_uvmem_page_pvt *pvt;
@@ -277,7 +277,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 
uvmem_page = pfn_to_page(uvmem_pfn);
pvt = uvmem_page->zone_device_data;
-   pvt->skip_page_out = true;
+   pvt->skip_page_out = skip_page_out;
mutex_unlock(&kvm->arch.uvmem_lock);
 
pfn = gfn_to_pfn(kvm, gfn);
-- 
2.17.2



[PATCH v2 1/1] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

2019-12-12 Thread Sukadev Bhattiprolu


This patch is based on Bharata's v11 KVM patches for secure guests:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-November/200918.html

---
From c0826bac72a658312f3d87e0bb18ecaf08ac2b2e Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Fri, 27 Sep 2019 14:30:36 -0500
Subject: [PATCH v2 1/1] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

Implement the H_SVM_INIT_ABORT hcall, which the Ultravisor can use to
abort an SVM after it has issued the H_SVM_INIT_START hcall and before
the H_SVM_INIT_DONE hcall. The Ultravisor can use this hcall when it
encounters security violations or other errors while starting an SVM.

Note that this hcall is different from the UV_SVM_TERMINATE ucall, which
is used by the HV to terminate/clean up a VM that has become secure.

The H_SVM_INIT_ABORT hcall should basically undo the operations that were
done since the H_SVM_INIT_START hcall, i.e. page out all the VM pages back
to normal memory, unregister memslots, etc.

(If we do not bring the pages back to normal memory, the text/data
of the VM would be stuck in secure memory, and since the SVM did not
go secure, its MSR_S bit will be clear and the VM won't be able to
access its pages even to do a clean exit.)

Based on patches and discussion with Paul Mackerras, Ram Pai and
Bharata Rao.

Signed-off-by: Ram Pai 
Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Bharata B Rao 
---
Changelog[v2]:
[Paul Mackerras] avoid returning to UV "one last time" after
the state is cleaned up.  So, we now have H_SVM_INIT_ABORT:
- take the VM's NIP/MSR register states as parameters
- inherit the state of other registers as at UV_ESM call.
After cleaning up the partial state, HV uses these to return
directly to the VM with a failed UV_ESM call.
---
 Documentation/powerpc/ultravisor.rst| 56 +
 arch/powerpc/include/asm/hvcall.h   |  1 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h | 10 +++-
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c  |  2 +-
 arch/powerpc/kvm/book3s_hv.c|  5 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c  | 48 +-
 7 files changed, 117 insertions(+), 6 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
index 730854f73830..ef49c9192775 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -948,6 +948,62 @@ Use cases
 up its internal state for this virtual machine.
 
 
+H_SVM_INIT_ABORT
+----------------
+
+Abort the process of securing an SVM.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+   uint64_t hypercall(const uint64_t H_SVM_INIT_ABORT,
+   uint64_t guest_pc,  /* guest NIP to return to */
+   uint64_t guest_msr) /* guest MSR value */
+
+Return values
+~~~~~~~~~~~~~
+
+One of the following values:
+
+   * H_PARAMETER   on successfully cleaning up the state, Hypervisor will
+ return this value to the **guest**, to indicate that the underlying
+ UV_ESM ultracall failed.
+
+   * H_UNSUPPORTED if called from the wrong context (e.g. from an SVM
+ or before an H_SVM_INIT_START hypercall). This will return to the
+ Ultravisor, which incorrectly issued the hcall.
+
+
+Description
+~~~~~~~~~~~
+
+Abort the process of securing a virtual machine. This call must
+be made after a prior call to ``H_SVM_INIT_START`` hypercall and
+before a call to ``H_SVM_INIT_DONE``.
+
+This hcall will clean up any partial state that was established for
+the VM since the prior ``H_SVM_INIT_START`` hcall, including paging
+out pages that were paged into secure memory, unregistering memory
+slots, etc.
+
+After the partial state is cleaned up, control returns to the address
+specified in ``guest_pc`` with the MSR set to ``guest_msr``.
+These parameters are expected to match the state of the NIP and MSR
+registers of the VM at the time it issued the ``UV_ESM`` ultracall
+to transition to a secure VM.
+
+Use cases
+~~~~~~~~~
+
+If, after a successful call to ``H_SVM_INIT_START``, the Ultravisor
+encounters an error while securing a virtual machine, either due
+to a lack of resources or because the VM's security information could
+not be validated, the Ultravisor informs the Hypervisor about it.
+The Hypervisor should use this call to clean up any internal state for
+this virtual machine and return to the VM.
+
 H_SVM_PAGE_IN
 -------------
 
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 13bd870609c3..e90c073e437e 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -350,6 +350,7 @@
 #define H_SVM_PAGE_OUT 0xEF04
 #define H_SVM_INIT_START   0xEF08
 #define H_SVM_INIT_DONE0xEF0C
+#define H_SVM_INIT_ABORT   0xEF14
 
 /* Values for

Re: [PATCH v8 8/8] KVM: PPC: Ultravisor: Add PPC_UV config option

2019-09-17 Thread Sukadev Bhattiprolu
Bharata B Rao [bhar...@linux.ibm.com] wrote:
> From: Anshuman Khandual 
> 
> CONFIG_PPC_UV adds support for ultravisor.
> 
> Signed-off-by: Anshuman Khandual 
> Signed-off-by: Bharata B Rao 
> Signed-off-by: Ram Pai 
> [ Update config help and commit message ]
> Signed-off-by: Claudio Carvalho 

Except for one question in Patch 2, the patch series looks good to me.

Reviewed-by: Sukadev Bhattiprolu 


Re: [PATCH v8 4/8] kvmppc: H_SVM_INIT_START and H_SVM_INIT_DONE hcalls

2019-09-17 Thread Sukadev Bhattiprolu


> +unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
> +{
> + if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))

Minor: Should we also check if KVMPPC_SECURE_INIT_DONE is set here (since
both can be set)?
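One possible reading of this suggestion, as a hypothetical sketch with made-up flag values and stand-in return codes (not the real `KVMPPC_SECURE_INIT_*` or `H_*` definitions):

```c
/* Hypothetical flag values standing in for kvm->arch.secure_guest bits. */
#define SECURE_INIT_START 0x1u
#define SECURE_INIT_DONE  0x2u

/* Stand-in return codes; not the real asm/hvcall.h values. */
enum { RET_SUCCESS = 0, RET_UNSUPPORTED = 1 };

/*
 * Stricter H_SVM_INIT_DONE check: reject the hcall if INIT_START was
 * never issued, or if INIT_DONE has already been recorded.
 */
static int model_svm_init_done(unsigned int *secure_guest)
{
    if (!(*secure_guest & SECURE_INIT_START))
        return RET_UNSUPPORTED;
    if (*secure_guest & SECURE_INIT_DONE)
        return RET_UNSUPPORTED;   /* the additional check */
    *secure_guest |= SECURE_INIT_DONE;
    return RET_SUCCESS;
}
```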


Re: [PATCH v8 3/8] kvmppc: Shared pages support for secure guests

2019-09-17 Thread Sukadev Bhattiprolu


> A secure guest will share some of its pages with the hypervisor (e.g. virtio
> bounce buffers). Support sharing of pages between the hypervisor and
> the ultravisor.

Would a brief note about what a shared page is help (a page belonging
to the SVM, but in normal memory and with decrypted contents)? Either
here or in the function header of kvmppc_h_svm_page_in(), where we
handle shared pages.
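For reference, the flag handling this patch adds to `kvmppc_h_svm_page_in()` can be modeled as below. The enum values are hypothetical labels for the patch's outcomes (`H_P3`, `H_P2`, the `kvmppc_share_page()` path, and the normal page-in to secure memory), and `host_page_shift` stands in for `PAGE_SHIFT`.

```c
#define H_PAGE_IN_SHARED 0x1UL   /* flag value defined by the patch */

/* Hypothetical labels for the possible outcomes. */
enum page_in_path {
    PATH_BAD_SHIFT,   /* H_P3 in the patch */
    PATH_BAD_FLAGS,   /* H_P2 in the patch */
    PATH_SHARE,       /* kvmppc_share_page(): page stays in normal memory */
    PATH_SECURE,      /* normal move into secure memory */
};

static enum page_in_path classify_page_in(unsigned long flags,
                                          unsigned long page_shift,
                                          unsigned long host_page_shift)
{
    if (page_shift != host_page_shift)
        return PATH_BAD_SHIFT;
    if (flags & ~H_PAGE_IN_SHARED)      /* any unknown flag bit */
        return PATH_BAD_FLAGS;
    if (flags & H_PAGE_IN_SHARED)
        return PATH_SHARE;
    return PATH_SECURE;
}
```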

> 
> Once a secure page is converted to shared page, the device page is

It may be useful to add "the device page (representing the secure page) is ...".

> unmapped from the HV side page tables.
> 
> Signed-off-by: Bharata B Rao 
> ---
>  arch/powerpc/include/asm/hvcall.h  |  3 ++
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 65 --
>  2 files changed, 65 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index 2595d0144958..4e98dd992bd1 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -342,6 +342,9 @@
>  #define H_TLB_INVALIDATE 0xF808
>  #define H_COPY_TOFROM_GUEST  0xF80C
> 
> +/* Flags for H_SVM_PAGE_IN */
> +#define H_PAGE_IN_SHARED0x1
> +
>  /* Platform-specific hcalls used by the Ultravisor */
>  #define H_SVM_PAGE_IN0xEF00
>  #define H_SVM_PAGE_OUT   0xEF04
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index a1eccb065ba9..bcecb643a730 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -46,6 +46,7 @@ struct kvmppc_uvmem_page_pvt {
>   unsigned long *rmap;
>   unsigned int lpid;
>   unsigned long gpa;
> + bool skip_page_out;
>  };
> 
>  /*
> @@ -159,6 +160,53 @@ kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
>   return ret;
>  }
> 
> +/*
> + * Shares the page with HV, thus making it a normal page.
> + *
> + * - If the page is already secure, then provision a new page and share
> + * - If the page is a normal page, share the existing page
> + *
> + * In the former case, uses the dev_pagemap_ops migrate_to_ram handler
> + * to unmap the device page from QEMU's page tables.
> + */
> +static unsigned long
> +kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
> +{
> +
> + int ret = H_PARAMETER;
> + struct page *uvmem_page;
> + struct kvmppc_uvmem_page_pvt *pvt;
> + unsigned long pfn;
> + unsigned long *rmap;
> + struct kvm_memory_slot *slot;
> + unsigned long gfn = gpa >> page_shift;
> + int srcu_idx;
> +
> + srcu_idx = srcu_read_lock(&kvm->srcu);
> + slot = gfn_to_memslot(kvm, gfn);
> + if (!slot)
> + goto out;
> +
> + rmap = &slot->arch.rmap[gfn - slot->base_gfn];
> + if (kvmppc_rmap_type(rmap) == KVMPPC_RMAP_UVMEM_PFN) {
> + uvmem_page = pfn_to_page(*rmap & ~KVMPPC_RMAP_UVMEM_PFN);
> + pvt = (struct kvmppc_uvmem_page_pvt *)
> + uvmem_page->zone_device_data;
> + pvt->skip_page_out = true;
> + }
> +
> + pfn = gfn_to_pfn(kvm, gfn);
> + if (is_error_noslot_pfn(pfn))
> + goto out;
> +
> + if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0, page_shift))
> + ret = H_SUCCESS;
> + kvm_release_pfn_clean(pfn);
> +out:
> + srcu_read_unlock(&kvm->srcu, srcu_idx);
> + return ret;
> +}
> +
>  /*
>   * H_SVM_PAGE_IN: Move page from normal memory to secure memory.

It would help to mention (or remind the reader) here what a shared page is.

>   */
> @@ -177,9 +225,12 @@ kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
>   if (page_shift != PAGE_SHIFT)
>   return H_P3;
> 
> - if (flags)
> + if (flags & ~H_PAGE_IN_SHARED)
>   return H_P2;
> 
> + if (flags & H_PAGE_IN_SHARED)
> + return kvmppc_share_page(kvm, gpa, page_shift);
> +
>   ret = H_PARAMETER;
>   srcu_idx = srcu_read_lock(&kvm->srcu);
>   down_read(&kvm->mm->mmap_sem);
> @@ -252,8 +303,16 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
>   pvt = spage->zone_device_data;
>   pfn = page_to_pfn(dpage);
> 
> - ret = uv_page_out(pvt->lpid, pfn << page_shift, pvt->gpa, 0,
> -   page_shift);
> + /*
> +  * This function is used in two cases:
> +  * - When HV touches a secure page, for which we do UV_PAGE_OUT
> +  * - When a secure page is converted to shared page, we touch
> +  *   the page to essentially unmap the device page. In this
> +  *   case we skip page-out.
> +  */
> + if (!pvt->skip_page_out)
> + ret = uv_page_out(pvt->lpid, pfn << page_shift, pvt->gpa, 0,
> +   page_shift);
> 
>   if (ret == U_SUCCESS)
>   *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
> -- 
> 2.21.0


Re: [PATCH v8 2/8] kvmppc: Movement of pages between normal and secure memory

2019-09-17 Thread Sukadev Bhattiprolu


In the subject line s/Movement of/Move/? Some minor comments below.

Bharata B Rao [bhar...@linux.ibm.com] wrote:

> Manage migration of pages between normal and secure memory of a secure
> guest by implementing the H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls.
> 
> H_SVM_PAGE_IN: Move the content of a normal page to a secure page
> H_SVM_PAGE_OUT: Move the content of a secure page to a normal page
> 
> Private ZONE_DEVICE memory equal to the amount of secure memory
> available in the platform for running secure guests is created.
> Whenever a page belonging to the guest becomes secure, a page from
> this private device memory is used to represent and track that secure
> page on the HV side. The movement of pages between normal and secure
> memory is done via migrate_vma_pages() using UV_PAGE_IN and
> UV_PAGE_OUT ucalls.
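A toy model of how a fixed pool of device pages could represent secure pages on the HV side. The pool size, the single 64-bit bitmap, and the function names are assumptions for illustration; per the description above, the patch itself uses private ZONE_DEVICE memory and moves contents with `migrate_vma_pages()` plus the UV_PAGE_IN/UV_PAGE_OUT ucalls.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool size: 64 device pages, one bitmap bit each. */
#define NR_DEVICE_PAGES 64
#define NO_DEVICE_PAGE  SIZE_MAX

struct uvmem_pool {
    uint64_t used;   /* bit i set => device page i tracks a secure page */
};

/* Page-in side: grab a free device page to represent the secure page. */
static size_t pool_alloc(struct uvmem_pool *p)
{
    for (size_t i = 0; i < NR_DEVICE_PAGES; i++) {
        if (!(p->used & (UINT64_C(1) << i))) {
            p->used |= UINT64_C(1) << i;
            return i;   /* contents would then be moved via UV_PAGE_IN */
        }
    }
    return NO_DEVICE_PAGE;   /* secure memory exhausted */
}

/* Page-out side: after UV_PAGE_OUT, return the device page to the pool. */
static void pool_free(struct uvmem_pool *p, size_t i)
{
    p->used &= ~(UINT64_C(1) << i);
}
```

Sizing the pool to the platform's secure memory, as the commit message describes, means running out of device pages corresponds directly to running out of secure memory.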
> 
> Signed-off-by: Bharata B Rao 
> ---
>  arch/powerpc/include/asm/hvcall.h   |   4 +
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |  29 ++
>  arch/powerpc/include/asm/kvm_host.h |  12 +
>  arch/powerpc/include/asm/ultravisor-api.h   |   2 +
>  arch/powerpc/include/asm/ultravisor.h   |  14 +
>  arch/powerpc/kvm/Makefile   |   3 +
>  arch/powerpc/kvm/book3s_hv.c|  19 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c  | 431 
>  8 files changed, 514 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h
>  create mode 100644 arch/powerpc/kvm/book3s_hv_uvmem.c
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index 2023e327..2595d0144958 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -342,6 +342,10 @@
>  #define H_TLB_INVALIDATE 0xF808
>  #define H_COPY_TOFROM_GUEST  0xF80C
> 
> +/* Platform-specific hcalls used by the Ultravisor */
> +#define H_SVM_PAGE_IN0xEF00
> +#define H_SVM_PAGE_OUT   0xEF04
> +
>  /* Values for 2nd argument to H_SET_MODE */
>  #define H_SET_MODE_RESOURCE_SET_CIABR1
>  #define H_SET_MODE_RESOURCE_SET_DAWR 2
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> new file mode 100644
> index ..9603c2b48d67
> --- /dev/null
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __POWERPC_KVM_PPC_HMM_H__
> +#define __POWERPC_KVM_PPC_HMM_H__
> +
> +#ifdef CONFIG_PPC_UV
> +unsigned long kvmppc_h_svm_page_in(struct kvm *kvm,
> +unsigned long gra,
> +unsigned long flags,
> +unsigned long page_shift);
> +unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
> + unsigned long gra,
> + unsigned long flags,
> + unsigned long page_shift);
> +#else
> +static inline unsigned long
> +kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra,
> +  unsigned long flags, unsigned long page_shift)
> +{
> + return H_UNSUPPORTED;
> +}
> +
> +static inline unsigned long
> +kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gra,
> +   unsigned long flags, unsigned long page_shift)
> +{
> + return H_UNSUPPORTED;
> +}
> +#endif /* CONFIG_PPC_UV */
> +#endif /* __POWERPC_KVM_PPC_HMM_H__ */
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 81cd221ccc04..16633ad3be45 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -869,4 +869,16 @@ static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
> 
> +#ifdef CONFIG_PPC_UV
> +int kvmppc_uvmem_init(void);
> +void kvmppc_uvmem_free(void);
> +#else
> +static inline int kvmppc_uvmem_init(void)
> +{
> + return 0;
> +}
> +
> +static inline void kvmppc_uvmem_free(void) {}
> +#endif /* CONFIG_PPC_UV */
> +
>  #endif /* __POWERPC_KVM_HOST_H__ */
> diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h
> index 6a0f9c74f959..1cd1f595fd81 100644
> --- a/arch/powerpc/include/asm/ultravisor-api.h
> +++ b/arch/powerpc/include/asm/ultravisor-api.h
> @@ -25,5 +25,7 @@
>  /* opcodes */
>  #define UV_WRITE_PATE0xF104
>  #define UV_RETURN0xF11C
> +#define UV_PAGE_IN   0xF128
> +#define UV_PAGE_OUT  0xF12C
> 
>  #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
> diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
> index d7aa97aa7834..0fc4a974b2e8 100644
> --- a/arch/powerpc/include/asm/ultravisor.h
> +++ b/arch/powerpc/include/asm/ultravisor.h
> @@ -31,4 +31,18 @@ 

Re: [PATCH v8 7/8] kvmppc: Support reset of secure guest

2019-09-17 Thread Sukadev Bhattiprolu
Bharata B Rao [bhar...@linux.ibm.com] wrote:
> Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF.
> This ioctl will be issued by QEMU during reset and includes
> the following steps:
> 
> - Ask UV to terminate the guest via UV_SVM_TERMINATE ucall
> - Unpin the VPA pages so that they can be migrated back to the secure
>   side when the guest becomes secure again. This is required because
>   pinned pages can't be migrated.
> - Reinitialize the guest's partition-scoped page tables. These are
>   freed when the guest becomes secure (H_SVM_INIT_DONE).
> - Release all device pages of the secure guest.
> 
> After these steps, guest is ready to issue UV_ESM call once again
> to switch to secure mode.
> 
> Signed-off-by: Bharata B Rao 
> Signed-off-by: Sukadev Bhattiprolu 
>   [Implementation of uv_svm_terminate() and its call from
>   guest shutdown path]
> Signed-off-by: Ram Pai 
>   [Unpinning of VPA pages]
> ---
>  Documentation/virt/kvm/api.txt  | 19 ++
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |  7 ++
>  arch/powerpc/include/asm/kvm_ppc.h  |  2 +
>  arch/powerpc/include/asm/ultravisor-api.h   |  1 +
>  arch/powerpc/include/asm/ultravisor.h   |  5 ++
>  arch/powerpc/kvm/book3s_hv.c| 74 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c  | 62 -
>  arch/powerpc/kvm/powerpc.c  | 12 
>  include/uapi/linux/kvm.h|  1 +
>  9 files changed, 182 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> index 2d067767b617..8e7a02e547e9 100644
> --- a/Documentation/virt/kvm/api.txt
> +++ b/Documentation/virt/kvm/api.txt
> @@ -4111,6 +4111,25 @@ Valid values for 'action':
>  #define KVM_PMU_EVENT_ALLOW 0
>  #define KVM_PMU_EVENT_DENY 1
> 
> +4.121 KVM_PPC_SVM_OFF
> +
> +Capability: basic
> +Architectures: powerpc
> +Type: vm ioctl
> +Parameters: none
> +Returns: 0 on successful completion,
> +Errors:
> +  EINVAL:if ultravisor failed to terminate the secure guest
> +  ENOMEM:if hypervisor failed to allocate new radix page tables for guest
> +
> +This ioctl is used to turn off the secure mode of the guest or transition
> +the guest from secure mode to normal mode. This is invoked when the guest
> +is reset. This has no effect if called for a normal guest.
> +
> +This ioctl issues an ultravisor call to terminate the secure guest,
> +unpins the VPA pages, reinitializes the guest's partition-scoped page
> +tables and releases all the device pages that are used to track the
> +secure pages by hypervisor.
> 
>  5. The kvm_run structure
>  ------------------------
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index fc924ef00b91..6b8cc8edd0ab 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -13,6 +13,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
>   unsigned long page_shift);
>  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
>  unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
> +void kvmppc_uvmem_free_memslot_pfns(struct kvm *kvm,
> + struct kvm_memslots *slots);
>  #else
>  static inline unsigned long
>  kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra,
> @@ -37,5 +39,10 @@ static inline unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
>  {
>   return H_UNSUPPORTED;
>  }
> +
> +static inline void kvmppc_uvmem_free_memslot_pfns(struct kvm *kvm,
> +   struct kvm_memslots *slots)
> +{
> +}
>  #endif /* CONFIG_PPC_UV */
>  #endif /* __POWERPC_KVM_PPC_HMM_H__ */
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 2484e6a8f5ca..e4093d067354 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -177,6 +177,7 @@ extern void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
>  extern int kvmppc_switch_mmu_to_hpt(struct kvm *kvm);
>  extern int kvmppc_switch_mmu_to_radix(struct kvm *kvm);
>  extern void kvmppc_setup_partition_table(struct kvm *kvm);
> +extern int kvmppc_reinit_partition_table(struct kvm *kvm);
> 
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>   struct kvm_create_spapr_tce_64 *args);
> @@ -321,6 +322,7 @@ struct kvmppc_ops {
>  int size);
>   int (*store_to_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
> int size);
&

Re: [RFC PATCH 00/11] Secure Virtual Machine Enablement

2019-09-03 Thread Sukadev Bhattiprolu
Thiago Jung Bauermann [bauer...@linux.ibm.com] wrote:
> [ Some people didn't receive all the patches in this series, even though
>   the linuxppc-dev list did, so I'm trying to send again. This is exactly the
>   same series I posted yesterday. Sorry for the clutter. ]
> 
> This series contains preliminary work to enable Secure Virtual Machines
> (SVM) on powerpc. SVMs request to be migrated to secure memory very early in
> the boot process (in prom_init()), so by default all of their memory is
> inaccessible to the hypervisor. There is an ultravisor call that the VM can
> use to request certain pages to be made accessible (aka shared).

I would like to piggy-back on this series (since it provides the
context) to add another patch we need for SVMs :-) 

Appreciate any comments. 

---

From ed93a0e36ec886483a72fdb8d99380bbe6607f37 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Thu, 16 May 2019 20:57:12 -0500
Subject: [PATCH 1/1] powerpc/pseries/svm: don't access some SPRs

Ultravisor disables some CPU features like EBB and BHRB in the HFSCR
for secure virtual machines (SVMs). If the SVMs attempt to access
related registers, they will get a Program Interrupt.

Use macros/wrappers to skip accessing EBB and BHRB related registers
for secure VMs.

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/reg.h  | 35 
 arch/powerpc/kernel/process.c   | 12 -
 arch/powerpc/kvm/book3s_hv.c| 24 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 48 -
 arch/powerpc/kvm/book3s_hv_tm_builtin.c |  6 ++---
 arch/powerpc/perf/core-book3s.c |  4 +--
 arch/powerpc/xmon/xmon.c|  2 +-
 7 files changed, 95 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index ec3714c..1397cb3 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1376,6 +1376,41 @@ static inline void msr_check_and_clear(unsigned long bits)
__msr_check_and_clear(bits);
 }
 
+#ifdef CONFIG_PPC_SVM
+/*
+ * Move from some "restricted" sprs.
+ * Secure VMs should not access some registers as the related features
+ * are disabled in the CPU. If an SVM is attempting to read from the given
+ * SPR, return 0. Otherwise behave like a normal mfspr.
+ */
+#define mfspr_r(rn)\
+({ \
+   unsigned long rval = 0ULL;  \
+   \
+   if (!(mfmsr() & MSR_S)) \
+   asm volatile("mfspr %0," __stringify(rn)\
+   : "=r" (rval)); \
+   rval;   \
+})
+
+/*
+ * Move to some "restricted" sprs.
+ * Secure VMs should not access some registers as the related features
+ * are disabled in the CPU. If an SVM is attempting to write to the given
+ * SPR, ignore the write. Otherwise behave like a normal mtspr.
+ */
+#define mtspr_r(rn, v) \
+({ \
+   if (!(mfmsr() & MSR_S)) \
+   asm volatile("mtspr " __stringify(rn) ",%0" :   \
+: "r" ((unsigned long)(v)) \
+: "memory");   \
+})
+#else
+#define mfspr_r	mfspr
+#define mtspr_r	mtspr
+#endif
+
 #ifdef __powerpc64__
 #if defined(CONFIG_PPC_CELL) || defined(CONFIG_PPC_FSL_BOOK3E)
 #define mftb() ({unsigned long rval;   \
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 8fc4de0..d5e7386 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1072,9 +1072,9 @@ static inline void save_sprs(struct thread_struct *t)
t->dscr = mfspr(SPRN_DSCR);
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   t->bescr = mfspr(SPRN_BESCR);
-   t->ebbhr = mfspr(SPRN_EBBHR);
-   t->ebbrr = mfspr(SPRN_EBBRR);
+   t->bescr = mfspr_r(SPRN_BESCR);
+   t->ebbhr = mfspr_r(SPRN_EBBHR);
+   t->ebbrr = mfspr_r(SPRN_EBBRR);
 
t->fscr = mfspr(SPRN_FSCR);
 
@@ -,11 +,11 @@ static inline void restore_sprs(struct thread_struct *old_thread,
 
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
if (old_thread->bescr != new_thread->bescr)
-   mtspr(SPRN_BESCR, new_thread->bescr);
+   mtspr_r(SPRN_BESCR, new_thread->bescr);
if (old_threa

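The wrapper idea in the patch above can be modeled in user space to show the behavior on both sides of the MSR_S check. This is a sketch only: `sim_msr`, `sim_spr[]`, `SIM_MSR_S` and the `sim_*` functions are hypothetical stand-ins, and the real mfspr_r()/mtspr_r() macros use inline mfspr/mtspr asm with the SPR number as an immediate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins: the kernel reads the real MSR with mfmsr() and
 * tests MSR_S; the bit position and SPR array here are illustrative only.
 */
#define SIM_MSR_S  (1ULL << 22)
#define SIM_NSPRS  1024

static uint64_t sim_msr;             /* simulated machine state register */
static uint64_t sim_spr[SIM_NSPRS];  /* simulated SPR file */

/* Like mfspr_r(): a secure VM reads 0 instead of taking a Program Interrupt. */
static uint64_t sim_mfspr_r(unsigned int rn)
{
	uint64_t rval = 0;

	if (!(sim_msr & SIM_MSR_S))
		rval = sim_spr[rn];
	return rval;
}

/* Like mtspr_r(): a write from a secure VM is silently dropped. */
static void sim_mtspr_r(unsigned int rn, uint64_t v)
{
	if (!(sim_msr & SIM_MSR_S))
		sim_spr[rn] = v;
}
```

Because the check reads the MSR at run time, one kernel binary behaves correctly both inside and outside a secure VM; the cost is an extra mfmsr on each restricted access.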
Re: [PATCH v7 1/7] kvmppc: Driver to manage pages of secure guest

2019-08-29 Thread Sukadev Bhattiprolu
Bharata B Rao [bhar...@linux.ibm.com] wrote:
> On Wed, Aug 28, 2019 at 08:02:19PM -0700, Sukadev Bhattiprolu wrote:
> > Some minor comments/questions below. Overall, the patches look
> > fine to me.
> > 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +static struct dev_pagemap kvmppc_devm_pgmap;
> > > +static unsigned long *kvmppc_devm_pfn_bitmap;
> > > +static DEFINE_SPINLOCK(kvmppc_devm_pfn_lock);
> > 
> > Is this lock protecting just the pfn_bitmap?
> 
> Yes.
> 
> > 
> > > +
> > > +struct kvmppc_devm_page_pvt {
> > > + unsigned long *rmap;
> > > + unsigned int lpid;
> > > + unsigned long gpa;
> > > +};
> > > +
> > > +/*
> > > + * Get a free device PFN from the pool
> > > + *
> > > + * Called when a normal page is moved to secure memory (UV_PAGE_IN). Device
> > > + * PFN will be used to keep track of the secure page on HV side.
> > > + *
> > > + * @rmap here is the slot in the rmap array that corresponds to @gpa.
> > > + * Thus a non-zero rmap entry indicates that the corresponding guest
> > > + * page has become secure, and is not mapped on the HV side.
> > > + *
> > > + * NOTE: In this and subsequent functions, we pass around and access
> > > + * individual elements of kvm_memory_slot->arch.rmap[] without any
> > > + * protection. Should we use lock_rmap() here?

Where do we serialize two threads attempting to H_SVM_PAGE_IN the same gfn
at the same time? Or one thread issuing a H_SVM_PAGE_IN and another a
H_SVM_PAGE_OUT for the same page?

> > > + */
> > > +static struct page *kvmppc_devm_get_page(unsigned long *rmap, unsigned long gpa,
> > > +  unsigned int lpid)
> > > +{
> > > + struct page *dpage = NULL;
> > > + unsigned long bit, devm_pfn;
> > > + unsigned long flags;
> > > + struct kvmppc_devm_page_pvt *pvt;
> > > + unsigned long pfn_last, pfn_first;
> > > +
> > > + if (kvmppc_rmap_is_devm_pfn(*rmap))
> > > + return NULL;
> > > +
> > > + pfn_first = kvmppc_devm_pgmap.res.start >> PAGE_SHIFT;
> > > + pfn_last = pfn_first +
> > > +(resource_size(&kvmppc_devm_pgmap.res) >> PAGE_SHIFT);
> > > + spin_lock_irqsave(&kvmppc_devm_pfn_lock, flags);
> > 
> > Blank lines around spin_lock() would help.
> 
> You mean blank line before lock and after unlock to clearly see
> where the lock starts and ends?
> 
> > 
> > > + bit = find_first_zero_bit(kvmppc_devm_pfn_bitmap, pfn_last - pfn_first);
> > > + if (bit >= (pfn_last - pfn_first))
> > > + goto out;
> > > +
> > > + bitmap_set(kvmppc_devm_pfn_bitmap, bit, 1);
> > > + devm_pfn = bit + pfn_first;
> > 
> > Can we drop the kvmppc_devm_pfn_lock here or after the trylock_page()?
> > Or does it also protect the '->zone_device_data' assignment below?
> > If so, maybe drop the 'pfn_' from the name of the lock?
> > 
> > Besides, we don't seem to hold this lock when accessing ->zone_device_data
> > in kvmppc_share_page(). Maybe kvmppc_devm_pfn_lock just protects the
> > bitmap?
> 
> Will move the unlock appropriately.
> 
> > 
> > 
> > > + dpage = pfn_to_page(devm_pfn);
> > 
> > Does this code and hence CONFIG_PPC_UV depend on a specific model like
> > CONFIG_SPARSEMEM_VMEMMAP?
> 
> I don't think so. Irrespective of that pfn_to_page() should just work
> for us.
> 
> > > +
> > > + if (!trylock_page(dpage))
> > > + goto out_clear;
> > > +
> > > + *rmap = devm_pfn | KVMPPC_RMAP_DEVM_PFN;
> > > + pvt = kzalloc(sizeof(*pvt), GFP_ATOMIC);
> > > + if (!pvt)
> > > + goto out_unlock;

If we fail to alloc, we don't clear the KVMPPC_RMAP_DEVM_PFN?

Also, when/where do we clear this flag on an uv-page-out?
kvmppc_devm_drop_pages() drops the flag on a local variable but not
in the rmap? If we don't clear the flag on page-out, would the
subsequent H_SVM_PAGE_IN of this page fail?
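One way to address the failed-allocation question is to allocate the tracking data before writing the rmap entry, so that the KVMPPC_RMAP_DEVM_PFN flag is published only on success. A user-space sketch under that assumption; `SIM_RMAP_DEVM_PFN`, `struct sim_page_pvt` and `sim_claim_page()` are hypothetical, and the `fail_alloc` argument only simulates a kzalloc() failure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag marking an rmap entry that holds a device PFN. */
#define SIM_RMAP_DEVM_PFN (1ULL << 63)

struct sim_page_pvt {
	uint64_t *rmap;
	uint64_t gpa;
	unsigned int lpid;
};

/*
 * Allocate the private data before touching *rmap, so an allocation
 * failure leaves the rmap entry unchanged. fail_alloc simulates a
 * kzalloc() failure.
 */
static struct sim_page_pvt *sim_claim_page(uint64_t *rmap, uint64_t devm_pfn,
					   uint64_t gpa, unsigned int lpid,
					   int fail_alloc)
{
	struct sim_page_pvt *pvt = fail_alloc ? NULL : calloc(1, sizeof(*pvt));

	if (!pvt)
		return NULL;                     /* *rmap untouched */

	pvt->rmap = rmap;
	pvt->gpa = gpa;
	pvt->lpid = lpid;
	*rmap = devm_pfn | SIM_RMAP_DEVM_PFN;    /* publish only on success */
	return pvt;
}
```

Resetting *rmap in the out_unlock error path would be an equivalent fix if the existing ordering is kept.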

> > > + pvt->rmap = rmap;
> > > + pvt->gpa = gpa;
> > > + pvt->lpid = lpid;
> > > + dpage->zone_device_data = pvt;
> > 
> > ->zone_device_data is set after locking the dpage here, but in
> > kvmppc_share_page() and kvmppc_devm_fault_migrate_alloc_and_copy()
> > it is accessed without locking the page?
> > 
> > > + spin_unlock_irqrestore(&kvmppc_devm_pfn_l

Re: [PATCH v7 5/7] kvmppc: Radix changes for secure guest

2019-08-28 Thread Sukadev Bhattiprolu
> - After the guest becomes secure, when we handle a page fault of a page
>   belonging to SVM in HV, send that page to UV via UV_PAGE_IN.
> - Whenever a page is unmapped on the HV side, inform UV via UV_PAGE_INVAL.
> - Ensure all those routines that walk the secondary page tables of
>   the guest don't do so in case of secure VM. For secure guest, the
>   active secondary page tables are in secure memory and the secondary
>   page tables in HV are freed when guest becomes secure.
> 
> Signed-off-by: Bharata B Rao 
> ---
>  arch/powerpc/include/asm/kvm_host.h   | 12 
>  arch/powerpc/include/asm/ultravisor-api.h |  1 +
>  arch/powerpc/include/asm/ultravisor.h |  5 +
>  arch/powerpc/kvm/book3s_64_mmu_radix.c| 22 ++
>  arch/powerpc/kvm/book3s_hv_devm.c | 20 
>  5 files changed, 60 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 66e5cc8c9759..29333e8de1c4 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -867,6 +867,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>  #ifdef CONFIG_PPC_UV
>  extern int kvmppc_devm_init(void);
>  extern void kvmppc_devm_free(void);
> +extern bool kvmppc_is_guest_secure(struct kvm *kvm);
> +extern int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gpa);
>  #else
>  static inline int kvmppc_devm_init(void)
>  {
> @@ -874,6 +876,16 @@ static inline int kvmppc_devm_init(void)
>  }
> 
>  static inline void kvmppc_devm_free(void) {}
> +
> +static inline bool kvmppc_is_guest_secure(struct kvm *kvm)
> +{
> + return false;
> +}
> +
> +static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gpa)
> +{
> + return -EFAULT;
> +}
>  #endif /* CONFIG_PPC_UV */
> 
>  #endif /* __POWERPC_KVM_HOST_H__ */
> diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h
> index 46b1ee381695..cf200d4ce703 100644
> --- a/arch/powerpc/include/asm/ultravisor-api.h
> +++ b/arch/powerpc/include/asm/ultravisor-api.h
> @@ -29,5 +29,6 @@
>  #define UV_UNREGISTER_MEM_SLOT   0xF124
>  #define UV_PAGE_IN   0xF128
>  #define UV_PAGE_OUT  0xF12C
> +#define UV_PAGE_INVAL	0xF138
> 
>  #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
> diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
> index 719c0c3930b9..b333241bbe4c 100644
> --- a/arch/powerpc/include/asm/ultravisor.h
> +++ b/arch/powerpc/include/asm/ultravisor.h
> @@ -57,4 +57,9 @@ static inline int uv_unregister_mem_slot(u64 lpid, u64 slotid)
>   return ucall_norets(UV_UNREGISTER_MEM_SLOT, lpid, slotid);
>  }
> 
> +static inline int uv_page_inval(u64 lpid, u64 gpa, u64 page_shift)
> +{
> + return ucall_norets(UV_PAGE_INVAL, lpid, gpa, page_shift);
> +}
> +
>  #endif   /* _ASM_POWERPC_ULTRAVISOR_H */
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 2d415c36a61d..93ad34e63045 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -19,6 +19,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  /*
>   * Supported radix tree geometry.
> @@ -915,6 +917,9 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>   if (!(dsisr & DSISR_PRTABLE_FAULT))
>   gpa |= ea & 0xfff;
> 
> + if (kvmppc_is_guest_secure(kvm))
> + return kvmppc_send_page_to_uv(kvm, gpa & PAGE_MASK);
> +
>   /* Get the corresponding memslot */
>   memslot = gfn_to_memslot(kvm, gfn);
> 
> @@ -972,6 +977,11 @@ int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
>   unsigned long gpa = gfn << PAGE_SHIFT;
>   unsigned int shift;
> 
> + if (kvmppc_is_guest_secure(kvm)) {
> + uv_page_inval(kvm->arch.lpid, gpa, PAGE_SIZE);
> + return 0;
> + }

If it is a page we share with UV, won't we need to drop the HV mapping
for the page?
> +
>   ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);
>   if (ptep && pte_present(*ptep))
>   kvmppc_unmap_pte(kvm, ptep, gpa, shift, memslot,
> @@ -989,6 +999,9 @@ int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
>   int ref = 0;
>   unsigned long old, *rmapp;
> 
> + if (kvmppc_is_guest_secure(kvm))
> + return ref;
> +
>   ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);
>   if (ptep && pte_present(*ptep) && pte_young(*ptep)) {
>   old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_ACCESSED, 0,
> @@ -1013,6 +1026,9 @@ int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
>   unsigned int shift;
>   int ref = 0;
> 
> + if (kvmppc_is_guest_secure(kvm))
> + return ref;
> +
>  

Re: [PATCH v7 2/7] kvmppc: Shared pages support for secure guests

2019-08-28 Thread Sukadev Bhattiprolu
> A secure guest will share some of its pages with hypervisor (Eg. virtio
> bounce buffers etc). Support sharing of pages between hypervisor and
> ultravisor.
> 
> Once a secure page is converted to shared page, the device page is
> unmapped from the HV side page tables.
> 
> Signed-off-by: Bharata B Rao 
> ---
>  arch/powerpc/include/asm/hvcall.h |  3 ++
>  arch/powerpc/kvm/book3s_hv_devm.c | 70 +--
>  2 files changed, 69 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index 2f6b952deb0f..05b8536f6653 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -337,6 +337,9 @@
>  #define H_TLB_INVALIDATE 0xF808
>  #define H_COPY_TOFROM_GUEST  0xF80C
> 
> +/* Flags for H_SVM_PAGE_IN */
> +#define H_PAGE_IN_SHARED	0x1
> +
>  /* Platform-specific hcalls used by the Ultravisor */
>  #define H_SVM_PAGE_IN	0xEF00
>  #define H_SVM_PAGE_OUT   0xEF04
> diff --git a/arch/powerpc/kvm/book3s_hv_devm.c b/arch/powerpc/kvm/book3s_hv_devm.c
> index 13722f27fa7d..6a3229b78fed 100644
> --- a/arch/powerpc/kvm/book3s_hv_devm.c
> +++ b/arch/powerpc/kvm/book3s_hv_devm.c
> @@ -46,6 +46,7 @@ struct kvmppc_devm_page_pvt {
>   unsigned long *rmap;
>   unsigned int lpid;
>   unsigned long gpa;
> + bool skip_page_out;
>  };
> 
>  /*
> @@ -139,6 +140,54 @@ kvmppc_devm_migrate_alloc_and_copy(struct migrate_vma *mig,
>   return 0;
>  }
> 
> +/*
> + * Shares the page with HV, thus making it a normal page.
> + *
> + * - If the page is already secure, then provision a new page and share
> + * - If the page is a normal page, share the existing page
> + *
> + * In the former case, uses the dev_pagemap_ops migrate_to_ram handler
> + * to unmap the device page from QEMU's page tables.
> + */
> +static unsigned long
> +kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
> +{
> +
> + int ret = H_PARAMETER;
> + struct page *devm_page;
> + struct kvmppc_devm_page_pvt *pvt;
> + unsigned long pfn;
> + unsigned long *rmap;
> + struct kvm_memory_slot *slot;
> + unsigned long gfn = gpa >> page_shift;
> + int srcu_idx;
> +
> + srcu_idx = srcu_read_lock(&kvm->srcu);
> + slot = gfn_to_memslot(kvm, gfn);
> + if (!slot)
> + goto out;
> +
> + rmap = &slot->arch.rmap[gfn - slot->base_gfn];
> + if (kvmppc_rmap_is_devm_pfn(*rmap)) {
> + devm_page = pfn_to_page(*rmap & ~KVMPPC_RMAP_DEVM_PFN);
> + pvt = (struct kvmppc_devm_page_pvt *)
> + devm_page->zone_device_data;
> + pvt->skip_page_out = true;
> + }
> +
> + pfn = gfn_to_pfn(kvm, gpa >> page_shift);

Use 'gfn'?

> + if (is_error_noslot_pfn(pfn))
> + goto out;
> +
> + ret = uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0, page_shift);
> + if (ret == U_SUCCESS)
> + ret = H_SUCCESS;
> + kvm_release_pfn_clean(pfn);

Nit: Blank line?
> +out:
> + srcu_read_unlock(&kvm->srcu, srcu_idx);
> + return ret;
> +}
> +
>  /*
>   * Move page from normal memory to secure memory.
>   */
> @@ -159,9 +208,12 @@ kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
>   if (page_shift != PAGE_SHIFT)
>   return H_P3;
> 
> - if (flags)
> + if (flags & ~H_PAGE_IN_SHARED)
>   return H_P2;
> 
> + if (flags & H_PAGE_IN_SHARED)
> + return kvmppc_share_page(kvm, gpa, page_shift);
> +
>   ret = H_PARAMETER;
>   down_read(&kvm->mm->mmap_sem);
>   srcu_idx = srcu_read_lock(&kvm->srcu);
> @@ -211,7 +263,7 @@ kvmppc_devm_fault_migrate_alloc_and_copy(struct migrate_vma *mig,
>   struct page *dpage, *spage;
>   struct kvmppc_devm_page_pvt *pvt;
>   unsigned long pfn;
> - int ret;
> + int ret = U_SUCCESS;
> 
>   spage = migrate_pfn_to_page(*mig->src);
>   if (!spage || !(*mig->src & MIGRATE_PFN_MIGRATE))
> @@ -226,8 +278,18 @@ kvmppc_devm_fault_migrate_alloc_and_copy(struct migrate_vma *mig,
>   pvt = spage->zone_device_data;
> 
>   pfn = page_to_pfn(dpage);
> - ret = uv_page_out(pvt->lpid, pfn << page_shift, pvt->gpa, 0,
> -   page_shift);
> +
> + /*
> +  * This same function is used in two cases:

Nit: s/same//

> +  * - When HV touches a secure page, for which we do page-out

Better to qualify page-out with "uv page-out"? It's kind of counterintuitive
to do a page-out on a fault!

> +  * - When a secure page is converted to shared page, we touch
> +  *   the page to essentially unmap the device page. In this
> +  *   case we skip page-out.
> +  */
> + if (!pvt->skip_page_out)
> + ret = uv_page_out(pvt->lpid, pfn << page_shift, pvt->gpa, 0,
> +   page_shift);
> +
>   if (ret == U_SUCCESS)
>   *mig->dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
>   

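The flag check added to kvmppc_h_svm_page_in() rejects unknown bits before acting on H_PAGE_IN_SHARED. A small sketch of that decision logic, assuming the flag value from the patch (0x1); `enum sim_page_in_action` and `sim_classify_page_in()` are hypothetical names, and the real function returns hcall status codes such as H_P2 rather than an enum.

```c
#include <assert.h>

/* Flag value taken from the patch; everything else is illustrative. */
#define SIM_H_PAGE_IN_SHARED 0x1UL

enum sim_page_in_action {
	SIM_PAGE_IN_REJECT,   /* unknown flag bits: the patch returns H_P2 */
	SIM_PAGE_IN_SHARE,    /* kvmppc_share_page() path */
	SIM_PAGE_IN_SECURE,   /* normal migrate-to-secure path */
};

/* Reject unknown bits first, then route shared vs. secure page-in. */
static enum sim_page_in_action sim_classify_page_in(unsigned long flags)
{
	if (flags & ~SIM_H_PAGE_IN_SHARED)
		return SIM_PAGE_IN_REJECT;
	if (flags & SIM_H_PAGE_IN_SHARED)
		return SIM_PAGE_IN_SHARE;
	return SIM_PAGE_IN_SECURE;
}
```

Checking `flags & ~KNOWN` first means a future flag bit cannot be silently ignored by an older hypervisor.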
Re: [PATCH v7 1/7] kvmppc: Driver to manage pages of secure guest

2019-08-28 Thread Sukadev Bhattiprolu
Some minor comments/questions below. Overall, the patches look
fine to me.

> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static struct dev_pagemap kvmppc_devm_pgmap;
> +static unsigned long *kvmppc_devm_pfn_bitmap;
> +static DEFINE_SPINLOCK(kvmppc_devm_pfn_lock);

Is this lock protecting just the pfn_bitmap?

> +
> +struct kvmppc_devm_page_pvt {
> + unsigned long *rmap;
> + unsigned int lpid;
> + unsigned long gpa;
> +};
> +
> +/*
> + * Get a free device PFN from the pool
> + *
> + * Called when a normal page is moved to secure memory (UV_PAGE_IN). Device
> + * PFN will be used to keep track of the secure page on HV side.
> + *
> + * @rmap here is the slot in the rmap array that corresponds to @gpa.
> + * Thus a non-zero rmap entry indicates that the corresponding guest
> + * page has become secure, and is not mapped on the HV side.
> + *
> + * NOTE: In this and subsequent functions, we pass around and access
> + * individual elements of kvm_memory_slot->arch.rmap[] without any
> + * protection. Should we use lock_rmap() here?
> + */
> +static struct page *kvmppc_devm_get_page(unsigned long *rmap, unsigned long gpa,
> +  unsigned int lpid)
> +{
> + struct page *dpage = NULL;
> + unsigned long bit, devm_pfn;
> + unsigned long flags;
> + struct kvmppc_devm_page_pvt *pvt;
> + unsigned long pfn_last, pfn_first;
> +
> + if (kvmppc_rmap_is_devm_pfn(*rmap))
> + return NULL;
> +
> + pfn_first = kvmppc_devm_pgmap.res.start >> PAGE_SHIFT;
> + pfn_last = pfn_first +
> +(resource_size(&kvmppc_devm_pgmap.res) >> PAGE_SHIFT);
> + spin_lock_irqsave(&kvmppc_devm_pfn_lock, flags);

Blank lines around spin_lock() would help.

> + bit = find_first_zero_bit(kvmppc_devm_pfn_bitmap, pfn_last - pfn_first);
> + if (bit >= (pfn_last - pfn_first))
> + goto out;
> +
> + bitmap_set(kvmppc_devm_pfn_bitmap, bit, 1);
> + devm_pfn = bit + pfn_first;

Can we drop the kvmppc_devm_pfn_lock here or after the trylock_page()?
Or does it also protect the '->zone_device_data' assignment below?
If so, maybe drop the 'pfn_' from the name of the lock?

Besides, we don't seem to hold this lock when accessing ->zone_device_data
in kvmppc_share_page(). Maybe kvmppc_devm_pfn_lock just protects the bitmap?

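If the lock is meant to protect only the bitmap, the critical section can be limited to finding and setting (or clearing) a bit. A user-space sketch of that narrower scope, assuming a C11 atomic_flag spinlock in place of the kernel spinlock and a hypothetical 128-PFN pool; all `sim_*` names are illustrative. Per-page data such as ->zone_device_data would then need separate protection.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SIM_POOL_PFNS  128   /* hypothetical pool size */
#define SIM_WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define SIM_POOL_WORDS ((SIM_POOL_PFNS + SIM_WORD_BITS - 1) / SIM_WORD_BITS)

static unsigned long sim_pfn_bitmap[SIM_POOL_WORDS];
/* Guards sim_pfn_bitmap and nothing else. */
static atomic_flag sim_pfn_lock = ATOMIC_FLAG_INIT;

static void sim_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&sim_pfn_lock,
						 memory_order_acquire))
		;   /* spin until the holder clears the flag */
}

static void sim_unlock(void)
{
	atomic_flag_clear_explicit(&sim_pfn_lock, memory_order_release);
}

/* Claim the first free PFN in [pfn_first, pfn_first + SIM_POOL_PFNS). */
static bool sim_alloc_pfn(unsigned long pfn_first, unsigned long *pfn)
{
	bool ok = false;

	sim_lock();
	for (unsigned long bit = 0; bit < SIM_POOL_PFNS; bit++) {
		unsigned long mask = 1UL << (bit % SIM_WORD_BITS);

		if (!(sim_pfn_bitmap[bit / SIM_WORD_BITS] & mask)) {
			sim_pfn_bitmap[bit / SIM_WORD_BITS] |= mask;
			*pfn = pfn_first + bit;
			ok = true;
			break;
		}
	}
	sim_unlock();
	return ok;
}

/* Return a PFN to the pool (the out_clear path in the quoted code). */
static void sim_free_pfn(unsigned long pfn_first, unsigned long pfn)
{
	unsigned long bit = pfn - pfn_first;

	sim_lock();
	sim_pfn_bitmap[bit / SIM_WORD_BITS] &= ~(1UL << (bit % SIM_WORD_BITS));
	sim_unlock();
}
```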

> + dpage = pfn_to_page(devm_pfn);

Does this code and hence CONFIG_PPC_UV depend on a specific model like
CONFIG_SPARSEMEM_VMEMMAP?
> +
> + if (!trylock_page(dpage))
> + goto out_clear;
> +
> + *rmap = devm_pfn | KVMPPC_RMAP_DEVM_PFN;
> + pvt = kzalloc(sizeof(*pvt), GFP_ATOMIC);
> + if (!pvt)
> + goto out_unlock;
> + pvt->rmap = rmap;
> + pvt->gpa = gpa;
> + pvt->lpid = lpid;
> + dpage->zone_device_data = pvt;

->zone_device_data is set after locking the dpage here, but in
kvmppc_share_page() and kvmppc_devm_fault_migrate_alloc_and_copy()
it is accessed without locking the page?

> + spin_unlock_irqrestore(&kvmppc_devm_pfn_lock, flags);
> +
> + get_page(dpage);
> + return dpage;
> +
> +out_unlock:
> + unlock_page(dpage);
> +out_clear:
> + bitmap_clear(kvmppc_devm_pfn_bitmap, devm_pfn - pfn_first, 1);
> +out:
> + spin_unlock_irqrestore(&kvmppc_devm_pfn_lock, flags);
> + return NULL;
> +}
> +
> +/*
> + * Alloc a PFN from private device memory pool and copy page from normal
> + * memory to secure memory.
> + */
> +static int
> +kvmppc_devm_migrate_alloc_and_copy(struct migrate_vma *mig,
> +unsigned long *rmap, unsigned long gpa,
> +unsigned int lpid, unsigned long page_shift)
> +{
> + struct page *spage = migrate_pfn_to_page(*mig->src);
> + unsigned long pfn = *mig->src >> MIGRATE_PFN_SHIFT;
> + struct page *dpage;
> +
> + *mig->dst = 0;
> + if (!spage || !(*mig->src & MIGRATE_PFN_MIGRATE))
> + return 0;
> +
> + dpage = kvmppc_devm_get_page(rmap, gpa, lpid);
> + if (!dpage)
> + return -EINVAL;
> +
> + if (spage)
> + uv_page_in(lpid, pfn << page_shift, gpa, 0, page_shift);
> +
> + *mig->dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + return 0;
> +}
> +
> +/*
> + * Move page from normal memory to secure memory.
> + */
> +unsigned long
> +kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
> +  unsigned long flags, unsigned long page_shift)
> +{
> + unsigned long addr, end;
> + unsigned long src_pfn, dst_pfn;

These are the host frame numbers correct? Trying to distinguish them
from 'gfn' and 'gpa' used in the function.

> + struct migrate_vma mig;
> + struct vm_area_struct *vma;
> + int srcu_idx;
> + unsigned long gfn = gpa >> page_shift;
> + struct kvm_memory_slot *slot;
> + unsigned long *rmap;
> + int ret;
> +
> + if (page_shift != PAGE_SHIFT)
> + return H_P3;
> +
> + if (flags)
> + return H_P2;
> +
> 

Re: [PATCH v5 4/7] powerpc/mm: Use UV_WRITE_PATE ucall to register a PATE

2019-08-20 Thread Sukadev Bhattiprolu
nsigned int lpid, unsigned long dw0,
> > + unsigned long dw1)
> > +{
> > +   unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
> > +
> > +   partition_tb[lpid].patb0 = cpu_to_be64(dw0);
> > +   partition_tb[lpid].patb1 = cpu_to_be64(dw1);
> 
> ie. here we always update the copy of the partition table, regardless of
> whether we're running under an ultravisor or not. So the copy is a
> complete copy isn't it?

Yes.
> 
> > +   /*
> > +* In ultravisor enabled systems, the ultravisor maintains the partition
> > +* table in secure memory where we don't have access, therefore, we have
> > +* to do a ucall to set an entry.
> > +*/
> > +   if (firmware_has_feature(FW_FEATURE_ULTRAVISOR)) {
> > +   uv_register_pate(lpid, dw0, dw1);
> > +   pr_info("PATE registered by ultravisor: dw0 = 0x%lx, dw1 = 0x%lx\n",
> > +   dw0, dw1);
> > +   } else {
> > +   flush_partition(lpid, old);
> > +   }
> 
> What is different is whether we flush or not.

The only differences are where the partition table used by the hardware is
stored (in secure memory) and where it is updated (in the UV, with higher
privilege).

> 
> And don't we still need to do the flush for the nestMMU? I assume we're
> saying the ultravisor will broadcast a flush for us, which will also
> handle the nestMMU case?

The same sequence of instructions as in the HV is used in uv_register_pate()
to flush partition- and process-scoped entries (so the nest MMU would also
be covered when the NMMU sees the tlbie?)

Thanks,

Sukadev

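The point about the copy can be made concrete: the HV-side entry is written on both paths, and only the follow-up action differs. A simplified sketch, assuming counters in place of uv_register_pate() and flush_partition() and omitting the big-endian conversion; `struct sim_pate` and `sim_set_pate()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_MAX_LPID 16

struct sim_pate {
	uint64_t patb0, patb1;
};

static struct sim_pate sim_partition_tb[SIM_MAX_LPID]; /* HV-side copy */
static int sim_uv_calls, sim_flushes;   /* stand-ins for side effects */
static uint64_t sim_last_flush_old;

static void sim_set_pate(unsigned int lpid, uint64_t dw0, uint64_t dw1,
			 bool has_ultravisor)
{
	uint64_t old = sim_partition_tb[lpid].patb0;

	/* The HV copy is updated on both paths, so it stays complete. */
	sim_partition_tb[lpid].patb0 = dw0;
	sim_partition_tb[lpid].patb1 = dw1;

	if (has_ultravisor) {
		sim_uv_calls++;            /* stands in for uv_register_pate() */
	} else {
		sim_flushes++;             /* stands in for flush_partition(lpid, old) */
		sim_last_flush_old = old;
	}
}
```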

Re: [PATCH v4 7/8] KVM: PPC: Ultravisor: Enter a secure guest

2019-07-17 Thread Sukadev Bhattiprolu
Michael Ellerman [m...@ellerman.id.au] wrote:
> Claudio Carvalho  writes:
> > From: Sukadev Bhattiprolu 
> >
> > To enter a secure guest, we have to go through the ultravisor, therefore
> > we do a ucall when we are entering a secure guest.
> >
> > This change is needed for any sort of entry to the secure guest from the
> > hypervisor, whether it is a return from an hcall, a return from a
> > hypervisor interrupt, or the first time that a secure guest vCPU is run.
> >
> > If we are returning from an hcall, the results are already in the
> > appropriate registers R3:12, except for R3, R6 and R7. R3 has the status
> > of the reflected hcall, therefore we move it to R0 for the ultravisor and
> > set R3 to the UV_RETURN ucall number. R6,7 were used as temporary
> > registers, hence we restore them.
> 
> This is another case where some documentation would help people to
> review the code.
> 
> > Have fast_guest_return check the kvm_arch.secure_guest field so that a
> > new CPU enters UV when started (in response to a RTAS start-cpu call).
> >
> > Thanks to input from Paul Mackerras, Ram Pai and Mike Anderson.
> >
> > Signed-off-by: Sukadev Bhattiprolu 
> > [ Pass SRR1 in r11 for UV_RETURN, fix kvmppc_msr_interrupt to preserve
> >   the MSR_S bit ]
> > Signed-off-by: Paul Mackerras 
> > [ Fix UV_RETURN ucall number and arch.secure_guest check ]
> > Signed-off-by: Ram Pai 
> > [ Save the actual R3 in R0 for the ultravisor and use R3 for the
> >   UV_RETURN ucall number. Update commit message and ret_to_ultra comment ]
> > Signed-off-by: Claudio Carvalho 
> > ---
> >  arch/powerpc/include/asm/kvm_host.h   |  1 +
> >  arch/powerpc/include/asm/ultravisor-api.h |  1 +
> >  arch/powerpc/kernel/asm-offsets.c |  1 +
> >  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 40 +++
> >  4 files changed, 37 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index cffb365d9d02..89813ca987c2 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -36,6 +36,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  /* Sign-extend HDEC if not on POWER9 */
> >  #define EXTEND_HDEC(reg)   \
> > @@ -1092,16 +1093,12 @@ BEGIN_FTR_SECTION
> >  END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> >  
> > ld  r5, VCPU_LR(r4)
> > -   ld  r6, VCPU_CR(r4)
> > mtlr	r5
> > -   mtcr	r6
> >  
> > ld  r1, VCPU_GPR(R1)(r4)
> > ld  r2, VCPU_GPR(R2)(r4)
> > ld  r3, VCPU_GPR(R3)(r4)
> > ld  r5, VCPU_GPR(R5)(r4)
> > -   ld  r6, VCPU_GPR(R6)(r4)
> > -   ld  r7, VCPU_GPR(R7)(r4)
> > ld  r8, VCPU_GPR(R8)(r4)
> > ld  r9, VCPU_GPR(R9)(r4)
> > ld  r10, VCPU_GPR(R10)(r4)
> > @@ -1119,10 +1116,38 @@ BEGIN_FTR_SECTION
> > mtspr   SPRN_HDSISR, r0
> >  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> >  
> > +   ld  r6, VCPU_KVM(r4)
> > +   lbz r7, KVM_SECURE_GUEST(r6)
> > +   cmpdi   r7, 0
> 
> You could hoist the load of r6 and r7 to here?

We could move 'ld r7' here. r6 is used to restore CR below, so it (r6)
has to stay there?

> 
> > +   bne ret_to_ultra
> > +
> > +   lwz r6, VCPU_CR(r4)
> > +   mtcr	r6
> > +
> > +   ld  r7, VCPU_GPR(R7)(r4)
> > +   ld  r6, VCPU_GPR(R6)(r4)
> > ld  r0, VCPU_GPR(R0)(r4)
> > ld  r4, VCPU_GPR(R4)(r4)
> > HRFI_TO_GUEST
> > b   .
> > +/*
> > + * We are entering a secure guest, so we have to invoke the ultravisor to do
> > + * that. If we are returning from a hcall, the results are already in the
> > + * appropriate registers R3:12, except for R3, R6 and R7. R3 has the status of
> > + * the reflected hcall, therefore we move it to R0 for the ultravisor and set
> > + * R3 to the UV_RETURN ucall number. R6,7 were used as temporary registers
> > + * above, hence we restore them.
> > + */
> > +ret_to_ultra:
> > +   lwz r6, VCPU_CR(r4)
> > +   mtcr	r6
> > +   mfspr   r11, SPRN_SRR1
> > +   mr  r0, r3
> > +   LOAD_REG_IMMEDIATE(r3, UV_RETURN)
> 
> Worth open coding to save three instructions?

Yes, good point:

-   LOAD_REG_IMMEDIATE(r3, UV_RETURN)
+
+   li  r3, 0
+   oris	r3, r3, (UV_RETURN)@__AS_ATHIGH
+   ori r3, r3, (UV_RETURN)@l

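Why the open-coded sequence works for any 32-bit constant: li sign-extends its 16-bit immediate, so it cannot by itself load a value with bit 15 set, while oris and ori OR in zero-extended halves. A C simulation of the register effects, assuming a 64-bit GPR; the `sim_*` helpers mirror the mnemonics but are hypothetical C functions, and `@__AS_ATHIGH`/`@l` correspond to the high and low 16-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* li rD,imm: load a sign-extended 16-bit immediate. */
static uint64_t sim_li(int16_t imm)
{
	return (uint64_t)(int64_t)imm;
}

/* oris rA,rS,imm: OR in the immediate shifted left by 16 (zero-extended). */
static uint64_t sim_oris(uint64_t rs, uint16_t imm)
{
	return rs | ((uint64_t)imm << 16);
}

/* ori rA,rS,imm: OR in the zero-extended 16-bit immediate. */
static uint64_t sim_ori(uint64_t rs, uint16_t imm)
{
	return rs | imm;
}

/* "li r3,0; oris r3,r3,X@high; ori r3,r3,X@l" for a 32-bit constant X. */
static uint64_t sim_load_u32(uint32_t x)
{
	uint64_t r = sim_li(0);

	r = sim_oris(r, (uint16_t)(x >> 16));       /* high half */
	return sim_ori(r, (uint16_t)(x & 0xffff));  /* low half */
}
```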
MAINTAINERS: Remove non-existent VAS file

2019-04-10 Thread Sukadev Bhattiprolu


The file arch/powerpc/include/uapi/asm/vas.h was considered but
never merged and should be removed from the MAINTAINERS file.

While here, add missing email address.

Reported-by: Joe Perches 
Signed-off-by: Sukadev Bhattiprolu 
---
 MAINTAINERS | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3671fde..e3bf3d5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7386,13 +7386,12 @@ S:  Supported
 F: drivers/net/ethernet/ibm/ibmvnic.*
 
 IBM Power Virtual Accelerator Switchboard
-M: Sukadev Bhattiprolu
+M: Sukadev Bhattiprolu 
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
 F: arch/powerpc/platforms/powernv/vas*
 F: arch/powerpc/platforms/powernv/copy-paste.h
 F: arch/powerpc/include/asm/vas.h
-F: arch/powerpc/include/uapi/asm/vas.h
 
 IBM Power Virtual Ethernet Device Driver
 M: Thomas Falcon 
-- 
1.8.3.1



Re: [PATCH v2 3/7] powerpc: use task_pid_nr() for TID allocation

2018-04-24 Thread Sukadev Bhattiprolu
Andrew Donnellan [andrew.donnel...@au1.ibm.com] wrote:
> [+ Sukadev, Christophe]
> 
> On 18/04/18 11:08, Alastair D'Silva wrote:
> > From: Alastair D'Silva <alast...@d-silva.org>
> > 
> > The current implementation of TID allocation, using a global IDR, may
> > result in an errant process starving the system of available TIDs.
> > Instead, use task_pid_nr(), as mentioned by the original author. The
> > scenario described which prevented its use is not applicable, as
> > set_thread_tidr can only be called after the task struct has been
> > populated.
> > 
> > Signed-off-by: Alastair D'Silva <alast...@d-silva.org>
> 
> So it's too late in the evening for me to completely get my head around
> what's going on here enough to give my Reviewed-by:, but my current thinking
> is:
> 
> - In the first version of the patch to add TIDR support
> (https://patchwork.ozlabs.org/patch/799494/), it was originally proposed to
> call assign_thread_id() (as it was then called) from copy_thread()
> 
> - The comment block documents the reason why we can't use task_pid_nr() but
> assumes that we're trying to assign a TIDR from within copy_thread()
> 
> - The final patch that was accepted
> (https://patchwork.ozlabs.org/patch/835552/,
> ec233ede4c8654894610ea54f4dae7adc954ac62) instead sets the TIDR to 0 from
> copy_thread(), so the original reasoning regarding not using task_pid_nr()
> within copy_thread() is no longer applicable.
> 
> Sukadev: does this sound right?

Yes. As with PIDR, I was initially trying to assign a TIDR to all threads.
But since only a subset of threads need/use TIDR, we can assign the
value later (when set_thread_tidr() is called). So we should be able to
use task_pid_nr() then.

Sukadev



[GIT PULL] Please pull JSON files for POWER9 PMU events

2018-03-13 Thread Sukadev Bhattiprolu

Hi Arnaldo,

Please pull an update to the JSON files for POWER9 PMU events.

The following changes since commit 90d2614c4d10c2f9d0ada9a3b01e5f43ca8d1ae3:

  perf test: Fix exit code for record+probe_libc_inet_pton.sh (2018-03-13 15:14:43 -0300)

are available in the git repository at:

  https://github.com/sukadev/linux/ p9-json-v5

for you to fetch changes up to 99c9dff949f2502964005f9afa8d60c89b446f2c:

  perf vendor events: Update POWER9 events (2018-03-13 16:48:12 -0500)


Sukadev Bhattiprolu (1):
  perf vendor events: Update POWER9 events

 .../perf/pmu-events/arch/powerpc/power9/cache.json |  25 ---
 .../pmu-events/arch/powerpc/power9/frontend.json   |  10 -
 .../pmu-events/arch/powerpc/power9/marked.json |   5 -
 .../pmu-events/arch/powerpc/power9/memory.json |   5 -
 .../perf/pmu-events/arch/powerpc/power9/other.json | 241 ++---
 .../pmu-events/arch/powerpc/power9/pipeline.json   |  50 ++---
 tools/perf/pmu-events/arch/powerpc/power9/pmc.json |   5 -
 .../arch/powerpc/power9/translation.json   |  10 +-
 8 files changed, 178 insertions(+), 173 deletions(-)



[PATCH] Fix cleanup when VAS is not configured

2018-02-13 Thread Sukadev Bhattiprolu
From: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Date: Fri, 9 Feb 2018 11:49:06 -0600
Subject: [PATCH 1/1] powerpc/vas: Fix cleanup when VAS is not configured

When VAS is not configured, unregister the platform driver. Also simplify
cleanup by delaying vas debugfs init until we know VAS is configured.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog[v2]
- [Michael Ellerman] Move vas_init_dbgdir() into a lower level
  function to keep vas_init() cleaner.
---
 arch/powerpc/platforms/powernv/vas-debug.c | 11 +++
 arch/powerpc/platforms/powernv/vas.c   |  6 +++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-debug.c b/arch/powerpc/platforms/powernv/vas-debug.c
index b4de4c6..4f7276e 100644
--- a/arch/powerpc/platforms/powernv/vas-debug.c
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -179,6 +179,7 @@ void vas_instance_init_dbgdir(struct vas_instance *vinst)
 {
struct dentry *d;
 
+   vas_init_dbgdir();
if (!vas_debugfs)
return;
 
@@ -201,8 +202,18 @@ void vas_instance_init_dbgdir(struct vas_instance *vinst)
vinst->dbgdir = NULL;
 }
 
+/*
+ * Set up the "root" VAS debugfs dir. Return if we already set it up
+ * (or failed to) in an earlier instance of VAS.
+ */
 void vas_init_dbgdir(void)
 {
+   static bool first_time = true;
+
+   if (!first_time)
+   return;
+
+   first_time = false;
vas_debugfs = debugfs_create_dir("vas", NULL);
if (IS_ERR(vas_debugfs))
vas_debugfs = NULL;
diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index aebbe95..5a2b24c 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -160,8 +160,6 @@ static int __init vas_init(void)
int found = 0;
struct device_node *dn;
 
-   vas_init_dbgdir();
-
platform_driver_register(_driver);
 
for_each_compatible_node(dn, NULL, "ibm,vas") {
@@ -169,8 +167,10 @@ static int __init vas_init(void)
found++;
}
 
-   if (!found)
+   if (!found) {
+   platform_driver_unregister(_driver);
return -ENODEV;
+   }
 
pr_devel("Found %d instances\n", found);
 
-- 
2.7.4



Re: [PATCH 2/4] powerpc/vas: Fix cleanup when VAS is not configured

2018-02-12 Thread Sukadev Bhattiprolu
Michael Ellerman [m...@ellerman.id.au] wrote:
> Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com> writes:
> 
> > When VAS is not configured in the system, make sure to remove
> > the VAS debugfs directory and unregister the platform driver.
> >
> > Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
> ...
> > diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
> > index aebbe95..f83e27d8 100644
> > --- a/arch/powerpc/platforms/powernv/vas.c
> > +++ b/arch/powerpc/platforms/powernv/vas.c
> > @@ -169,8 +169,11 @@ static int __init vas_init(void)
> > found++;
> > }
> >  
> > -   if (!found)
> > +   if (!found) {
> > +   platform_driver_unregister(_driver);
> > +   vas_cleanup_dbgdir();
> > return -ENODEV;
> > +   }
> 
> The better patch would be to move the call to vas_init_dbgdir() down
> here, where we know we have successfully registered the driver.

Well, when VAS is configured, init_vas_instance() expects the top level
"vas" debugfs dir to already be setup.

We could have each init_vas_instance() assume it is the first and
unconditionally call vas_init_dbgdir(). vas_init_dbgdir() could make
sure to initialize only once.

Or, we could make a separate pass counting "ibm,vas" nodes. If there are
none, skip both steps (dbgdir and registering platform driver).
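
Here is a minimal user-space sketch of the first option. Every per-instance init calls a root-directory initializer that does its work only once. The names fake_create_dir(), instance_init(), vas_root and create_calls are hypothetical stand-ins for debugfs_create_dir(), init_vas_instance() and the "vas" root dentry. Like a plain static flag in the kernel, the sketch assumes the calls are serialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static void *vas_root;   /* hypothetical "vas" root dir; NULL = absent/failed */
static int create_calls; /* counts creation attempts, for illustration only */

/* Hypothetical stand-in for debugfs_create_dir("vas", NULL). */
static void *fake_create_dir(void)
{
	static int dummy;

	create_calls++;
	return &dummy;
}

/* Only the first call attempts creation; a failed first attempt is not retried. */
static void init_root_once(void)
{
	static bool first_time = true;

	if (!first_time)
		return;
	first_time = false;
	vas_root = fake_create_dir();
}

/* Hypothetical per-instance init: set up the root lazily, then bail out if
 * it is unavailable. Returns 0 if the instance directory could be added. */
static int instance_init(void)
{
	init_root_once();
	if (!vas_root)
		return -1;
	/* ... create the per-instance directory under vas_root ... */
	return 0;
}
```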

Sukadev



Re: [PATCH] powerpc/vas: do not set uses_vas for kernel windows

2018-02-09 Thread Sukadev Bhattiprolu
Nicholas Piggin [npig...@gmail.com] wrote:
> cp_abort is only required for user windows, because kernel context
> must not be preempted between a copy/paste pair.

Yes, that is a good optimization.

> 
> Without this patch, the init task gets used_vas set when it runs
> the nx842_powernv_init initcall, which opens windows for kernel
> usage.
> 
> used_vas is then never cleared anywhere, so it gets propagated
> into all other tasks. It's a property of the address space, so it
> should really be cleared when a new mm is created (or in dup_mmap
> if the mmaps are marked as VM_DONTCOPY). For now we seem to have
> no such driver, so leave that for another patch.

If the parent process has the paste address mapped, the child inherits
those mappings, so we can't clear ->used_vas in a process until it has
unmapped all the send windows, right?

If VM_DONTCOPY is set, then we can clear it.
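
The condition under discussion fits in a small predicate: a forked child needs used_vas only if at least one paste mapping is actually duplicated into it. struct vma_model and child_needs_used_vas() are hypothetical. They illustrate the rule only and say nothing about how dup_mmap() is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one mapping in the parent. */
struct vma_model {
	bool is_paste; /* maps a VAS paste address */
	bool dontcopy; /* VM_DONTCOPY: not duplicated into the child on fork */
};

/* The child inherits used_vas only if some paste mapping survives the fork. */
static bool child_needs_used_vas(const struct vma_model *vmas, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (vmas[i].is_paste && !vmas[i].dontcopy)
			return true;
	return false;
}
```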

> 
> Cc: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
> Signed-off-by: Nicholas Piggin <npig...@gmail.com>

Reviewed-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>



[PATCH 4/4] powerpc/vas: Add a couple of trace points

2018-02-09 Thread Sukadev Bhattiprolu
Add a couple of trace points in the VAS driver

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog [v2]
- Make TRACE_INCLUDE_PATH relative to 
---
 arch/powerpc/platforms/powernv/vas-trace.h  | 112 
 arch/powerpc/platforms/powernv/vas-window.c |   9 +++
 2 files changed, 121 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/vas-trace.h

diff --git a/arch/powerpc/platforms/powernv/vas-trace.h b/arch/powerpc/platforms/powernv/vas-trace.h
new file mode 100644
index 000..939d85d
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/vas-trace.h
@@ -0,0 +1,112 @@
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM   vas
+
+#if !defined(_VAS_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+
+#define _VAS_TRACE_H
+#include 
+#include 
+#include 
+
+TRACE_EVENT(   vas_rx_win_open,
+
+   TP_PROTO(struct task_struct *tsk,
+int vasid,
+int cop,
+struct vas_rx_win_attr *rxattr),
+
+   TP_ARGS(tsk, vasid, cop, rxattr),
+
+   TP_STRUCT__entry(
+   __field(struct task_struct *, tsk)
+   __field(int, pid)
+   __field(int, cop)
+   __field(int, vasid)
+   __field(struct vas_rx_win_attr *, rxattr)
+   __field(int, lnotify_lpid)
+   __field(int, lnotify_pid)
+   __field(int, lnotify_tid)
+   ),
+
+   TP_fast_assign(
+   __entry->pid = tsk->pid;
+   __entry->vasid = vasid;
+   __entry->cop = cop;
+   __entry->lnotify_lpid = rxattr->lnotify_lpid;
+   __entry->lnotify_pid = rxattr->lnotify_pid;
+   __entry->lnotify_tid = rxattr->lnotify_tid;
+   ),
+
+   TP_printk("pid=%d, vasid=%d, cop=%d, lpid=%d, pid=%d, tid=%d",
+   __entry->pid, __entry->vasid, __entry->cop,
+   __entry->lnotify_lpid, __entry->lnotify_pid,
+   __entry->lnotify_tid)
+);
+
+TRACE_EVENT(   vas_tx_win_open,
+
+   TP_PROTO(struct task_struct *tsk,
+int vasid,
+int cop,
+struct vas_tx_win_attr *txattr),
+
+   TP_ARGS(tsk, vasid, cop, txattr),
+
+   TP_STRUCT__entry(
+   __field(struct task_struct *, tsk)
+   __field(int, pid)
+   __field(int, cop)
+   __field(int, vasid)
+   __field(struct vas_tx_win_attr *, txattr)
+   __field(int, lpid)
+   __field(int, pidr)
+   ),
+
+   TP_fast_assign(
+   __entry->pid = tsk->pid;
+   __entry->vasid = vasid;
+   __entry->cop = cop;
+   __entry->lpid = txattr->lpid;
+   __entry->pidr = txattr->pidr;
+   ),
+
+   TP_printk("pid=%d, vasid=%d, cop=%d, lpid=%d, pidr=%d",
+   __entry->pid, __entry->vasid, __entry->cop,
+   __entry->lpid, __entry->pidr)
+);
+
+TRACE_EVENT(   vas_paste_crb,
+
+   TP_PROTO(struct task_struct *tsk,
+   struct vas_window *win),
+
+   TP_ARGS(tsk, win),
+
+   TP_STRUCT__entry(
+   __field(struct task_struct *, tsk)
+   __field(struct vas_window *, win)
+   __field(int, pid)
+   __field(int, vasid)
+   __field(int, winid)
+   __field(unsigned long, paste_kaddr)
+   ),
+
+   TP_fast_assign(
+   __entry->pid = tsk->pid;
+   __entry->vasid = win->vinst->vas_id;
+   __entry->winid = win->winid;
+   __entry->paste_kaddr = (unsigned long)win->paste_kaddr
+   ),
+
+   TP_printk("pid=%d, vasid=%d, winid=%d, paste_kaddr=0x%016lx",
+   __entry->pid, __entry->vasid, __entry->winid,
+   __entry->paste_kaddr)
+);
+
+#endif /* _VAS_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH ../../arch/powerpc/platforms/powernv
+#define TRACE_INCLUDE_FILE vas-trace
+#include 
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 2b3eb01..6b2de9e 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -21,6 +21,9 @@
 #inclu

[PATCH RESEND 3/4] powerpc/vas: Remove a stray line in Makefile

2018-02-09 Thread Sukadev Bhattiprolu
Remove a bogus line from arch/powerpc/platforms/powernv/Makefile that
was added by commit ece4e51 ("powerpc/vas: Export HVWC to debugfs").

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 6c9d519..703a350 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -16,5 +16,4 @@ obj-$(CONFIG_OPAL_PRD)+= opal-prd.o
 obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
 obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
-obj-$(CONFIG_PPC_FTW)  += nx-ftw.o
 obj-$(CONFIG_OCXL_BASE)+= ocxl.o
-- 
2.7.4



[PATCH 2/4] powerpc/vas: Fix cleanup when VAS is not configured

2018-02-09 Thread Sukadev Bhattiprolu
When VAS is not configured in the system, make sure to remove
the VAS debugfs directory and unregister the platform driver.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-debug.c | 5 +
 arch/powerpc/platforms/powernv/vas.c   | 5 -
 arch/powerpc/platforms/powernv/vas.h   | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/vas-debug.c b/arch/powerpc/platforms/powernv/vas-debug.c
index b4de4c6..e6e4067 100644
--- a/arch/powerpc/platforms/powernv/vas-debug.c
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -207,3 +207,8 @@ void vas_init_dbgdir(void)
if (IS_ERR(vas_debugfs))
vas_debugfs = NULL;
 }
+
+void vas_cleanup_dbgdir(void)
+{
+   debugfs_remove_recursive(vas_debugfs);
+}
diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index aebbe95..f83e27d8 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -169,8 +169,11 @@ static int __init vas_init(void)
found++;
}
 
-   if (!found)
+   if (!found) {
+   platform_driver_unregister(_driver);
+   vas_cleanup_dbgdir();
return -ENODEV;
+   }
 
pr_devel("Found %d instances\n", found);
 
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index ae0100f..2645613 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -406,6 +406,7 @@ extern struct mutex vas_mutex;
 
 extern struct vas_instance *find_vas_instance(int vasid);
 extern void vas_init_dbgdir(void);
+extern void vas_cleanup_dbgdir(void);
 extern void vas_instance_init_dbgdir(struct vas_instance *vinst);
 extern void vas_window_init_dbgdir(struct vas_window *win);
 extern void vas_window_free_dbgdir(struct vas_window *win);
-- 
2.7.4



[PATCH RESEND 1/4] powerpc/vas: Fix order of cleanup in debugfs dir

2018-02-09 Thread Sukadev Bhattiprolu
Fix the order of cleanup to ensure we free the name buffer in case
of an error creating 'hvwc' or 'info' files.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-debug.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-debug.c b/arch/powerpc/platforms/powernv/vas-debug.c
index ca22f1e..b4de4c6 100644
--- a/arch/powerpc/platforms/powernv/vas-debug.c
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -166,13 +166,13 @@ void vas_window_init_dbgdir(struct vas_window *window)
 
return;
 
-free_name:
-   kfree(window->dbgname);
-   window->dbgname = NULL;
-
 remove_dir:
debugfs_remove_recursive(window->dbgdir);
window->dbgdir = NULL;
+
+free_name:
+   kfree(window->dbgname);
+   window->dbgname = NULL;
 }
 
 void vas_instance_init_dbgdir(struct vas_instance *vinst)
-- 
2.7.4
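
The corrected ordering follows the usual goto-unwind idiom. Cleanup labels appear in reverse order of acquisition, so a failure after the directory exists enters at remove_dir and falls through to free_name. The sketch below is hypothetical user-space code: struct win_dbg, win_dbg_init() and the fail_at switch are made up, and malloc()/free() stand in for the name allocation and the debugfs calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for window->dbgname and window->dbgdir. */
struct win_dbg {
	char *dbgname;
	char *dbgdir;
};

/*
 * fail_at: 0 = no failure, 1 = creating the directory fails,
 * 2 = creating a file ('hvwc' or 'info') inside it fails.
 * Labels undo work in reverse order, so remove_dir falls through to free_name.
 */
static int win_dbg_init(struct win_dbg *w, int fail_at)
{
	w->dbgdir = NULL;
	w->dbgname = malloc(16);
	if (!w->dbgname)
		return -1;

	if (fail_at != 1)
		w->dbgdir = malloc(16);
	if (!w->dbgdir)
		goto free_name;

	if (fail_at == 2)
		goto remove_dir;

	return 0;

remove_dir:
	free(w->dbgdir);
	w->dbgdir = NULL;

free_name:
	free(w->dbgname);
	w->dbgname = NULL;
	return -1;
}
```

With the pre-patch order (free_name placed before remove_dir), a jump to remove_dir would skip the free of the name and leak it.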



Re: [PATCH 5/5] powerpc/ftw: Document FTW API/usage

2018-01-24 Thread Sukadev Bhattiprolu
Randy Dunlap [rdun...@infradead.org] wrote:

> > +struct ftw_setup_attr ftwattr;
> > +
> > +fd = open("/dev/ftw", O_RDWR);
> > +
> > +memset(, 0, sizeof(rxattr));
> 
> Is that supposed to be ftwattr (2x above)?

Yes. I agree with your other comments as well and will send a new version.

Thanks for the detailed review.

Sukadev
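
For reference, here is a sketch of the documented open/ioctl/mmap sequence with the memset() applied to ftwattr, as Randy points out. The struct mirrors the proposed include/uapi/misc/ftw.h. Several details are assumptions: leaving version at 0, setting vas_id to -1, mapping one page, and using PROT_READ | PROT_WRITE. ftw_open_channel() is a hypothetical helper, and the COPY/PASTE helpers are not shown. On a system without /dev/ftw it returns -1 with errno set.

```c
#define _DEFAULT_SOURCE /* expose POSIX/Linux declarations under strict -std modes */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Mirrors struct ftw_setup_attr from the proposed uapi header. */
struct ftw_setup_attr {
	int16_t  version;
	int16_t  vas_id;    /* specific VAS instance, or -1 for default */
	uint32_t reserved;
	uint64_t reserved1;
	uint64_t flags;
	uint64_t reserved2;
};

#define FTW_SETUP _IOW('v', 0x20, struct ftw_setup_attr)

/* Hypothetical helper: set up a channel and map the paste address.
 * Returns the fd (caller closes it) or -1 with errno set. */
static int ftw_open_channel(const char *path, void **paste_addr)
{
	struct ftw_setup_attr ftwattr;
	int saved, fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;

	memset(&ftwattr, 0, sizeof(ftwattr)); /* zero the struct actually passed */
	ftwattr.vas_id = -1;                  /* assumption: let the kernel pick */

	if (ioctl(fd, FTW_SETUP, &ftwattr) < 0)
		goto fail;

	*paste_addr = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE),
			   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*paste_addr == MAP_FAILED)
		goto fail;

	return fd;

fail:
	saved = errno; /* close() must not clobber the original error */
	close(fd);
	errno = saved;
	return -1;
}
```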



Re: [PATCH 3/5] powerpc/ftw: Implement a simple FTW driver

2018-01-18 Thread Sukadev Bhattiprolu
Randy Dunlap [rdun...@infradead.org] wrote:
> > +
> > +   default:
> > +   return -EINVAL;
> > +   }
> > +}
> 
> Nit:  some versions of gcc (or maybe clang) complain about a typed function
> not always having a return value in code like above, so it is often done as:

Ok.
> 
> > +static long ftw_ioctl(struct file *fp, unsigned int cmd, unsigned long arg)
> > +{
> > +   switch (cmd) {
> > +
> > +   case FTW_SETUP:
> > +   return ftw_ioc_ftw_setup(fp, arg);
> > +
> > +   default:
> > +   break;
> > +   }
> 
>   return -EINVAL;
> > +}
> 
> Do you expect to implement more ioctls?  If not, just change the switch to
> an if ().
Maybe a couple more, but I changed it to an 'if' for now (and fixed an
error handling issue in ftw_file_init()).
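
The shape Randy suggests uses an if () and a single trailing return, so every path returns a value. DEMO_SETUP, demo_setup() and demo_ioctl() are hypothetical placeholders for FTW_SETUP, ftw_ioc_ftw_setup() and ftw_ioctl(). The -EINVAL fallback matches the patch.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_SETUP 0x20u /* hypothetical command number */

/* Hypothetical handler: treat a zero argument as a bad user pointer. */
static long demo_setup(unsigned long arg)
{
	return arg ? 0 : -EFAULT;
}

/* Single-command dispatch: no switch, and no path falls off the end. */
static long demo_ioctl(unsigned int cmd, unsigned long arg)
{
	if (cmd == DEMO_SETUP)
		return demo_setup(arg);

	return -EINVAL;
}
```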

Here is the updated patch.

---
>From 344ffbcc2cd1e64dd87249d508cf6000e6e41a0c Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Date: Fri, 4 Aug 2017 16:45:34 -0500
Subject: [PATCH 3/5] powerpc/ftw: Implement a simple FTW driver

The Fast Thread Wake-up (FTW) driver provides user space applications an
interface to the low latency Core-to-Core wakeup functionality in POWER9.

This mechanism allows a thread on one core to efficiently send a message
to a "waiting thread" on another core on the same chip, using the Virtual
Accelerator Switchboard (VAS) subsystem.

This initial FTW driver implements the ioctl and mmap operations on an
FTW device node. Using these operations, a pair of application threads
can establish a "communication channel" and use the COPY, PASTE and WAIT
instructions to wait/wake up.

PATCH 5/5 documents the API and includes an example of the usage.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog[v2]
- [Michael Neuling] Drop "nx" from name "nx-ftw".
- [Michael Neuling] Use a single VAS_FTW_SETUP ioctl to simplify
  interface.
- [Michael Ellerman] To work with paste emulation patch, mark
  PTE dirty in ->mmap() to ensure there is no fault on paste
  (the emulation patch must disable pagefaults when updating
  thread reconfig registers).
- [Randy Dunlap] Minor cleanup in ftw_ioctl().
- Fix cleanup code in ftw_file_init()
- Check return value from set_thread_tidr().
- Move driver to drivers/misc/ftw.
---
 drivers/misc/Kconfig  |   1 +
 drivers/misc/Makefile |   1 +
 drivers/misc/ftw/Kconfig  |  16 +++
 drivers/misc/ftw/Makefile |   4 +
 drivers/misc/ftw/ftw.c| 346 ++
 5 files changed, 368 insertions(+)
 create mode 100644 drivers/misc/ftw/Kconfig
 create mode 100644 drivers/misc/ftw/Makefile
 create mode 100644 drivers/misc/ftw/ftw.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index f1a5c23..a9b161f 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -508,4 +508,5 @@ source "drivers/misc/mic/Kconfig"
 source "drivers/misc/genwqe/Kconfig"
 source "drivers/misc/echo/Kconfig"
 source "drivers/misc/cxl/Kconfig"
+source "drivers/misc/ftw/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 5ca5f64..338668c 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_GENWQE)  += genwqe/
 obj-$(CONFIG_ECHO) += echo/
 obj-$(CONFIG_VEXPRESS_SYSCFG)  += vexpress-syscfg.o
 obj-$(CONFIG_CXL_BASE) += cxl/
+obj-$(CONFIG_PPC_FTW)  += ftw/
 obj-$(CONFIG_ASPEED_LPC_CTRL)  += aspeed-lpc-ctrl.o
 obj-$(CONFIG_ASPEED_LPC_SNOOP) += aspeed-lpc-snoop.o
 obj-$(CONFIG_PCI_ENDPOINT_TEST)+= pci_endpoint_test.o
diff --git a/drivers/misc/ftw/Kconfig b/drivers/misc/ftw/Kconfig
new file mode 100644
index 000..5454d40
--- /dev/null
+++ b/drivers/misc/ftw/Kconfig
@@ -0,0 +1,16 @@
+
+config PPC_FTW
+   tristate "IBM Fast Thread-Wakeup (FTW)"
+   depends on PPC_VAS
+   default n
+   help
+  This enables support for IBM Fast Thread-Wakeup driver.
+
+  The FTW driver allows applications to utilize a low overhead
+  core-to-core wake up mechanism in the IBM Virtual Accelerator
+  Switchboard (VAS) to improve performance.
+
+  VAS adapters are found in POWER9 based systems and are required
+  for the FTW driver to be operational.
+
+  If unsure, say N.
diff --git a/drivers/misc/ftw/Makefile b/drivers/misc/ftw/Makefile
new file mode 100644
index 000..2cfe566
--- /dev/null
+++ b/drivers/misc/ftw/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+ccflags-y  := $(call cc-disable-warning, unused-const-variable)
+ccflags-$(CONFIG_PPC_WERROR)   += -Werror
+obj-$(CONFIG_PPC_FTW)  += ftw.o

Re: [PATCH 2/5] powerpc/ftw: Define FTW_SETUP ioctl API

2018-01-18 Thread Sukadev Bhattiprolu
Randy Dunlap [rdun...@infradead.org] wrote:

> > +#define FTW_FLAGS_PIN_WINDOW   0x1
> > +
> > +#define FTW_SETUP  _IOW('v', 1, struct ftw_setup_attr)
> 
> ioctls should be documented in Documentation/ioctl/ioctl-number.txt.
> Please update that file.

Ok. Here is the updated patch.

Thanks for the review.

Sukadev
---
>From 1f347c199a0b1bbc528705c8e9ddd11c825a80fc Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Date: Thu, 2 Feb 2017 06:20:07 -0500
Subject: [PATCH 2/5] powerpc/ftw: Define FTW_SETUP ioctl API

Define the FTW_SETUP ioctl interface for fast thread wakeup (FTW). A
follow-on patch will implement the FTW driver and ioctl.

Thanks to input from Ben Herrenschmidt, Michael Neuling, Michael Ellerman.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog[v2]
- [Michael Neuling] Use a single VAS_FTW_SETUP ioctl and simplify
  the interface.
- [Randy Dunlap] Reserve/document the ioctl number used.
---
 Documentation/ioctl/ioctl-number.txt |  1 +
 include/uapi/misc/ftw.h  | 35 +++
 2 files changed, 36 insertions(+)
 create mode 100644 include/uapi/misc/ftw.h

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 3e3fdae..b0f323c 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -277,6 +277,7 @@ Code  Seq#(hex) Include FileComments
 'v'00-1F   linux/fs.h  conflict!
 'v'00-0F   linux/sonypi.h  conflict!
 'v'C0-FF   linux/meye.hconflict!
+'v'20-27   include/uapi/misc/ftw.h
 'w'all CERN SCI driver
 'y'00-1F   packet based user level communications
<mailto:zap...@interlan.net>
diff --git a/include/uapi/misc/ftw.h b/include/uapi/misc/ftw.h
new file mode 100644
index 000..99676b2
--- /dev/null
+++ b/include/uapi/misc/ftw.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright 2018 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_MISC_FTW_H
+#define _UAPI_MISC_FTW_H
+
+#include 
+#include 
+
+#define FTW_FLAGS_PIN_WINDOW   0x1
+
+/*
+ * Note: The range 0x20-27 for letter 'v' are reserved for FTW ioctls in
+ *  Documentation/ioctl/ioctl-number.txt.
+ */
+#define FTW_SETUP  _IOW('v', 0x20, struct ftw_setup_attr)
+
+struct ftw_setup_attr {
+   __s16   version;
+   __s16   vas_id; /* specific instance of vas or -1 for default */
+   __u32   reserved;
+
+   __u64   reserved1;
+
+   __u64   flags;
+   __u64   reserved2;
+};
+
+#endif /* _UAPI_MISC_FTW_H */
-- 
2.7.4
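
Because struct ftw_setup_attr is user ABI, its size and the encoded ioctl number can be checked at build or test time. The sketch below does this with the _IOC_* helpers from the Linux ioctl headers, reached here through <sys/ioctl.h>. The field layout is copied from the patch, with fixed-width C types in place of __s16/__u32/__u64. The 32-byte expectation follows from those field sizes.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Layout copied from the proposed include/uapi/misc/ftw.h. */
struct ftw_setup_attr {
	int16_t  version;
	int16_t  vas_id;    /* specific instance of vas or -1 for default */
	uint32_t reserved;
	uint64_t reserved1;
	uint64_t flags;
	uint64_t reserved2;
};

/* 'v' 0x20 falls in the 0x20-0x27 range reserved in ioctl-number.txt. */
#define FTW_SETUP _IOW('v', 0x20, struct ftw_setup_attr)

/* Fail the build if padding ever changes the ABI size. */
_Static_assert(sizeof(struct ftw_setup_attr) == 32,
	       "struct ftw_setup_attr must stay 32 bytes");
```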



Re: [PATCH 1/2] powerpc: export thread-tidr interfaces

2018-01-17 Thread Sukadev Bhattiprolu
Frederic Barrat [fbar...@linux.vnet.ibm.com] wrote:
> Hi,
> 
> 
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 2010e4c..f20c1ad 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -1560,6 +1560,7 @@ void clear_thread_tidr(struct task_struct *t)
> > free_thread_tidr(t->thread.tidr);
> > t->thread.tidr = 0;
> >   }
> > +EXPORT_SYMBOL_GPL(clear_thread_tidr);
> 
> Isn't it dangerous to export clear_thread_tidr()? Other modules may also
> have assigned the TIDR by calling set_thread_tidr(), so clearing it could
> potentially break those other modules. My understanding is that once the
> TIDR is assigned, there's no safe way to reclaim it other than the thread
> exiting. Or we would need some kind of reference counter.

Yes, the FTW driver avoids calling clear_thread_tidr() for the same reasons.
I don't have a strong case for exporting clear_thread_tidr(). Here is the
updated patch, exporting just the set_thread_tidr().
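
Frederic's alternative, a reference count on the assigned TIDR, might look roughly like the model below. It is hypothetical and not kernel API: struct tidr_model, tidr_get() and tidr_put() are made up, and next_tidr stands in for the kernel's TIDR allocator.

```c
#include <assert.h>

/* Hypothetical per-thread TIDR state, not the kernel's thread_struct. */
struct tidr_model {
	int tidr;  /* 0 = unassigned */
	int users; /* number of drivers currently relying on the TIDR */
};

static int next_tidr = 1; /* stand-in for the real TIDR allocator */

/* The first user allocates a TIDR; later users share the same value. */
static int tidr_get(struct tidr_model *t)
{
	if (t->users++ == 0)
		t->tidr = next_tidr++;
	return t->tidr;
}

/* Only the last user clears it, so one module cannot pull the TIDR
 * out from under another. */
static void tidr_put(struct tidr_model *t)
{
	assert(t->users > 0);
	if (--t->users == 0)
		t->tidr = 0;
}
```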

Thanks,

Sukadev
---
>From 204ee3c918f8dad46c1e40d2d3730b07c10a87a3 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Date: Mon, 15 Jan 2018 13:43:18 -0600
Subject: [PATCH 1/2] powerpc: export set_thread_tidr()

Export set_thread_tidr() so it can be used by external modules.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog: [Frederic Barrat] Don't export clear_thread_tidr()

---
 arch/powerpc/kernel/process.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 2010e4c..20df2cb2 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1592,6 +1592,7 @@ int set_thread_tidr(struct task_struct *t)
 
return 0;
 }
+EXPORT_SYMBOL_GPL(set_thread_tidr);
 
 #endif /* CONFIG_PPC64 */
 
-- 
1.8.3.1



[PATCH 5/5] powerpc/ftw: Document FTW API/usage

2018-01-16 Thread Sukadev Bhattiprolu
Document the usage of the VAS Fast thread-wakeup API and add an entry in
MAINTAINERS file.

Thanks for input/comments from Benjamin Herrenschmidt, Michael Neuling,
Michael Ellerman, Robert Blackmore, Ian Munsie, Haren Myneni and Paul
Mackerras.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---

Changelog[v2]
- [Michael Neuling] Update API to use a single VAS_FTW_SETUP ioctl
  rather than two ioctls.
- [Michael Neuling] Drop "nx" from name "nx-ftw".

---
 Documentation/powerpc/ftw-api.txt | 283 ++
 MAINTAINERS   |   8 ++
 2 files changed, 291 insertions(+)
 create mode 100644 Documentation/powerpc/ftw-api.txt

diff --git a/Documentation/powerpc/ftw-api.txt b/Documentation/powerpc/ftw-api.txt
new file mode 100644
index 000..a107628
--- /dev/null
+++ b/Documentation/powerpc/ftw-api.txt
@@ -0,0 +1,283 @@
+Virtual Accelerator Switchboard and Fast Thread-Wakeup API
+
+The Power9 processor supports a hardware subsystem known as the Virtual
+Accelerator Switchboard (VAS) which allows two entities in the Power9
+system to efficiently exchange messages. Messages must be formatted as
+Coprocessor Request Blocks (CRB) and be submitted using the COPY/PASTE
+instructions (new in Power9).
+
+Usage of VAS depends on the entities exchanging the messages, and
+currently two usages have been identified.
+
+The first usage of VAS, referred to as VAS/NX, involves a software
+thread submitting data compression requests to a co-processor
+(hardware/nest accelerator), aka the NX engine. This usage is not yet
+available to user applications.
+
+Alternatively, VAS can be used by two software threads to efficiently
+exchange messages. Initially, this mechanism is intended to wake up a
+waiting thread quickly, i.e. "fast thread wake-up (FTW)". This document
+describes the user API for this VAS/FTW mechanism.
+
+Application access to the FTW mechanism is provided through the FTW
+device node (/dev/ftw) implemented by the FTW device driver.
+
+A multi-threaded software process that intends to use the FTW
+mechanism must first set up a channel (consisting of a pair of VAS
+windows) for the waiting and waking threads to communicate. The
+channel is set up by opening the FTW device and issuing the FTW_SETUP
+ioctl. Upon successful return from the ioctl, the waiting side of the
+channel is complete and a thread can issue the "Wait" instruction
+to wait for an event.
+
+After the successful return from the FTW_SETUP ioctl, the waking
+thread must use the mmap() system call on the same file descriptor and
+obtain a virtual address known as the "paste address".
+
+Once the mmap() call succeeds, the setup of the "waking" side of the
+channel is complete. To wake up a waiting thread, the waking thread
+should use the "COPY" and "PASTE" instructions to write a zero-filled
+CRB to the paste address.
+
+The wait and wake up operations can be repeated as long as the paste
+address and the FTW file descriptor are valid (i.e. until munmap() of
+the paste address or a close() of the FTW fd).
+
+1. FTW Device Node
+
+There is one /dev/ftw node in the system and it provides access to the
+VAS/FTW functionality.
+
+The only valid operations (system calls) on the FTW node are:
+
+- open() the device for read and write.
+
+- issue the FTW_SETUP ioctl to set up a channel.
+
+- mmap() the file descriptor.
+
+- close the device node.
+
+Other file operations on the FTW node are undefined.
+
+Note that the COPY and PASTE operations go directly to the hardware
+and do not involve system calls or go through the FTW device.
+
+Although a system may have several instances of VAS (typically,
+one per P9 chip), there is just one FTW device node in
+the system.
+
+When the FTW device node is opened, the kernel assigns a suitable
+instance of VAS to the process. The kernel makes a best-effort attempt
+to assign an optimal instance of VAS for the process, based on the CPU/
+chip that the process is running on. In the initial release, the kernel
+does not support migrating the VAS instance if the process migrates from
+a CPU on one chip to a CPU on another chip.
+
+Applications may choose a specific instance of the VAS using the 'vas_id'
+field in the FTW_SETUP ioctl as detailed below.
+
+2. Open FTW node
+
+The device should be opened for read and write. No special privileges
+are needed to open the device. The device may be opened multiple times.
+
+Each open() of the FTW device is associated with one channel of
+communication. There is a system-wide limit (currently 64K windows per
+chip and since some are reserved for hardware, there are a

[PATCH 4/5] powerpc/ftw: Add a couple of trace points

2018-01-16 Thread Sukadev Bhattiprolu
Add a couple of trace points in the FTW driver

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 drivers/misc/ftw/ftw-trace.h | 75 
 drivers/misc/ftw/ftw.c   |  6 
 2 files changed, 81 insertions(+)
 create mode 100644 drivers/misc/ftw/ftw-trace.h

diff --git a/drivers/misc/ftw/ftw-trace.h b/drivers/misc/ftw/ftw-trace.h
new file mode 100644
index 000..0d96046
--- /dev/null
+++ b/drivers/misc/ftw/ftw-trace.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright 2018 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM   ftw
+
+#if !defined(_FTW_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+
+#define _FTW_TRACE_H
+#include 
+#include 
+
+TRACE_EVENT(   ftw_open_event,
+
+   TP_PROTO(struct task_struct *tsk,
+int instid),
+
+   TP_ARGS(tsk, instid),
+
+   TP_STRUCT__entry(
+   __field(struct task_struct *, tsk)
+   __field(int, instid)
+   __field(int, pid)
+   ),
+
+   TP_fast_assign(
+   __entry->pid = tsk->pid;
+   __entry->instid = instid;
+   ),
+
+   TP_printk("pid=%d, inst=%d", __entry->pid, __entry->instid)
+);
+
+TRACE_EVENT(   ftw_mmap_event,
+
+   TP_PROTO(struct task_struct *tsk,
+int instid,
+unsigned long paste_addr,
+unsigned long vma_start),
+
+   TP_ARGS(tsk, instid, paste_addr, vma_start),
+
+   TP_STRUCT__entry(
+   __field(struct task_struct *, tsk)
+   __field(int, pid)
+   __field(int, instid)
+   __field(unsigned long, paste_addr)
+   __field(unsigned long, vma_start)
+   ),
+
+   TP_fast_assign(
+   __entry->pid = tsk->pid;
+   __entry->instid = instid;
+   __entry->paste_addr = paste_addr;
+   __entry->vma_start = vma_start;
+   ),
+
+   TP_printk(
+   "pid=%d, inst=%d, pasteaddr=0x%016lx, vma_start=0x%016lx",
+   __entry->pid, __entry->instid, __entry->paste_addr,
+   __entry->vma_start)
+);
+
+#endif /* _FTW_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE ftw-trace
+#include 
diff --git a/drivers/misc/ftw/ftw.c b/drivers/misc/ftw/ftw.c
index 6fcb4e2..a01c9e6 100644
--- a/drivers/misc/ftw/ftw.c
+++ b/drivers/misc/ftw/ftw.c
@@ -21,6 +21,9 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include "ftw-trace.h"
+
 /*
  * FTW is a device driver used to provide user space access to the
  * Core-to-Core aka Fast Thread Wakeup (FTW) functionality provided by
@@ -81,6 +84,8 @@ static int ftw_open(struct inode *inode, struct file *fp)
 
fp->private_data = instance;
 
+   trace_ftw_open_event(current, instance->id);
+
return 0;
 }
 
@@ -234,6 +239,7 @@ static int ftw_mmap(struct file *fp, struct vm_area_struct *vma)
 
pr_devel("paste addr %llx at %lx, rc %d\n", paste_addr, vma->vm_start,
rc);
+   trace_ftw_mmap_event(current, instance->id, paste_addr, vma->vm_start);
 
set_thread_uses_vas();
 
-- 
2.7.4
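
Once the patch is applied, the new events should appear under tracefs as events/ftw/<event>/enable. That path follows from TRACE_SYSTEM ftw and the TRACE_EVENT() names (ftw_open_event, ftw_mmap_event). The sketch below builds such a path and switches an event on. trace_enable_path() and trace_enable() are hypothetical helpers. The tracefs mount point (commonly /sys/kernel/tracing) is an assumption, and writing the file requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<tracefs>/events/<system>/<event>/enable"; event == NULL gives the
 * system-wide switch. Returns 0 on success, -1 on error or truncation. */
static int trace_enable_path(char *buf, size_t len, const char *tracefs,
			     const char *system, const char *event)
{
	int n = event ?
		snprintf(buf, len, "%s/events/%s/%s/enable", tracefs, system, event) :
		snprintf(buf, len, "%s/events/%s/enable", tracefs, system);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to an enable file; fails if tracefs is absent or not writable. */
static int trace_enable(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs("1", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```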



[PATCH 3/5] powerpc/ftw: Implement a simple FTW driver

2018-01-16 Thread Sukadev Bhattiprolu
The Fast Thread Wake-up (FTW) driver provides user space applications an
interface to the low latency Core-to-Core wakeup functionality in POWER9.

This mechanism allows a thread on one core to efficiently send a message
to a "waiting thread" on another core on the same chip, using the Virtual
Accelerator Switchboard (VAS) subsystem.

This initial FTW driver implements the ioctl and mmap operations on an
FTW device node. Using these operations, a pair of application threads
can establish a "communication channel" and use the COPY, PASTE and WAIT
instructions to wait/wake up.

PATCH 5/5 documents the API and includes an example of the usage.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog[v2]
- [Michael Neuling] Drop "nx" from name "nx-ftw".
- [Michael Neuling] Use a single VAS_FTW_SETUP ioctl to simplify
  interface.
- [Michael Ellerman] To work with paste emulation patch, mark
  PTE dirty in ->mmap() to ensure there is no fault on paste
  (the emulation patch must disable pagefaults when updating
  thread reconfig registers).
- Check return value from set_thread_tidr().
- Move driver to drivers/misc/ftw.

---
 drivers/misc/Kconfig  |   1 +
 drivers/misc/Makefile |   1 +
 drivers/misc/ftw/Kconfig  |  16 +++
 drivers/misc/ftw/Makefile |   4 +
 drivers/misc/ftw/ftw.c| 346 ++
 5 files changed, 368 insertions(+)
 create mode 100644 drivers/misc/ftw/Kconfig
 create mode 100644 drivers/misc/ftw/Makefile
 create mode 100644 drivers/misc/ftw/ftw.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index f1a5c23..a9b161f 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -508,4 +508,5 @@ source "drivers/misc/mic/Kconfig"
 source "drivers/misc/genwqe/Kconfig"
 source "drivers/misc/echo/Kconfig"
 source "drivers/misc/cxl/Kconfig"
+source "drivers/misc/ftw/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 5ca5f64..338668c 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_GENWQE)  += genwqe/
 obj-$(CONFIG_ECHO) += echo/
 obj-$(CONFIG_VEXPRESS_SYSCFG)  += vexpress-syscfg.o
 obj-$(CONFIG_CXL_BASE) += cxl/
+obj-$(CONFIG_PPC_FTW)  += ftw/
 obj-$(CONFIG_ASPEED_LPC_CTRL)  += aspeed-lpc-ctrl.o
 obj-$(CONFIG_ASPEED_LPC_SNOOP) += aspeed-lpc-snoop.o
 obj-$(CONFIG_PCI_ENDPOINT_TEST)+= pci_endpoint_test.o
diff --git a/drivers/misc/ftw/Kconfig b/drivers/misc/ftw/Kconfig
new file mode 100644
index 000..5454d40
--- /dev/null
+++ b/drivers/misc/ftw/Kconfig
@@ -0,0 +1,16 @@
+
+config PPC_FTW
+   tristate "IBM Fast Thread-Wakeup (FTW)"
+   depends on PPC_VAS
+   default n
+   help
+  This enables support for IBM Fast Thread-Wakeup driver.
+
+  The FTW driver allows applications to utilize a low overhead
+  core-to-core wake up mechanism in the IBM Virtual Accelerator
+  Switchboard (VAS) to improve performance.
+
+  VAS adapters are found in POWER9 based systems and are required
+  for the FTW driver to be operational.
+
+  If unsure, say N.
diff --git a/drivers/misc/ftw/Makefile b/drivers/misc/ftw/Makefile
new file mode 100644
index 000..2cfe566
--- /dev/null
+++ b/drivers/misc/ftw/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+ccflags-y  := $(call cc-disable-warning, unused-const-variable)
+ccflags-$(CONFIG_PPC_WERROR)   += -Werror
+obj-$(CONFIG_PPC_FTW)  += ftw.o
diff --git a/drivers/misc/ftw/ftw.c b/drivers/misc/ftw/ftw.c
new file mode 100644
index 0000000..6fcb4e2
--- /dev/null
+++ b/drivers/misc/ftw/ftw.c
@@ -0,0 +1,346 @@
+/*
+ * Copyright 2018 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#define pr_fmt(fmt) "ftw: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * FTW is a device driver used to provide user space access to the
+ * Core-to-Core aka Fast Thread Wakeup (FTW) functionality provided by
+ * the Virtual Accelerator Switchboard (VAS) in POWER9 systems. See also
+ * arch/powerpc/platforms/powernv/vas*.
+ *
+ * The driver creates the device /dev/ftw that can be used as follows:
+ *
+ * fd = open("/dev/ftw", O_RDWR);
+ * rc = ioctl(fd, FTW_SETUP, );
+ * paste_addr = mmap(NULL, PAGE_SIZE, prot, MAP_SHARED, fd, 0ULL);
+ * vas_copy(, 0, 1);
+ * vas_paste(paste_addr, 0, 1);
+ *
+ * where "vas_copy" and "vas_past

[PATCH 2/5] powerpc/ftw: Define FTW_SETUP ioctl API

2018-01-16 Thread Sukadev Bhattiprolu
Define the FTW_SETUP ioctl interface for fast thread wakeup (FTW). A
follow-on patch will implement the FTW driver and ioctl.

Thanks to input from Ben Herrenschmidt, Michael Neuling, Michael Ellerman.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog[v2]
- [Michael Neuling] Use a single VAS_FTW_SETUP ioctl and simplify
  the interface.
---
 include/uapi/misc/ftw.h | 31 +++
 1 file changed, 31 insertions(+)
 create mode 100644 include/uapi/misc/ftw.h

diff --git a/include/uapi/misc/ftw.h b/include/uapi/misc/ftw.h
new file mode 100644
index 0000000..f233f51
--- /dev/null
+++ b/include/uapi/misc/ftw.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright 2018 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_MISC_FTW_H
+#define _UAPI_MISC_FTW_H
+
+#include 
+#include 
+
+#define FTW_FLAGS_PIN_WINDOW   0x1
+
+#define FTW_SETUP  _IOW('v', 1, struct ftw_setup_attr)
+
+struct ftw_setup_attr {
+   __s16   version;
+   __s16   vas_id; /* specific instance of vas or -1 for default */
+   __u32   reserved;
+
+   __u64   reserved1;
+
+   __u64   flags;
+   __u64   reserved2;
+};
+
+#endif /* _UAPI_MISC_FTW_H */
-- 
2.7.4
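Every field of the FTW_SETUP argument defined above is naturally aligned. A minimal user-space sketch, assuming a C11 compiler, mirrors `struct ftw_setup_attr` with `<stdint.h>` types (standing in for `__s16`/`__u32`/`__u64`) and checks at compile time that the layout has no hidden padding. The helper `ftw_setup_attr_init()` and the `version` value a caller passes to it are hypothetical; the patch does not say what `version` should contain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FTW_FLAGS_PIN_WINDOW 0x1

/* Mirror of the uapi struct, with <stdint.h> types standing in for
 * __s16/__u32/__u64 so the sketch builds without kernel headers. */
struct ftw_setup_attr {
	int16_t  version;
	int16_t  vas_id;     /* specific instance of VAS, or -1 for default */
	uint32_t reserved;

	uint64_t reserved1;

	uint64_t flags;
	uint64_t reserved2;
};

/* Every field sits at a multiple of its own size, so no padding is
 * inserted and the struct is 32 bytes on common 32- and 64-bit ABIs. */
_Static_assert(sizeof(struct ftw_setup_attr) == 32, "unexpected size");
_Static_assert(offsetof(struct ftw_setup_attr, flags) == 16,
	       "unexpected offset of flags");

/* Hypothetical helper: clear everything, including the reserved
 * fields, then fill in the caller's choices. */
static void ftw_setup_attr_init(struct ftw_setup_attr *attr,
				int16_t version, int16_t vas_id, int pin)
{
	memset(attr, 0, sizeof(*attr));
	attr->version = version;
	attr->vas_id = vas_id;
	attr->flags = pin ? FTW_FLAGS_PIN_WINDOW : 0;
}
```

Zeroing the whole structure before filling it in keeps the reserved fields at 0, a common uapi convention that leaves room to give them a meaning later without breaking older callers.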



[PATCH 1/5] powerpc/vas: Remove a stray line in Makefile

2018-01-16 Thread Sukadev Bhattiprolu
Remove a bogus line from arch/powerpc/platforms/powernv/Makefile that
was added by commit ece4e51 ("powerpc/vas: Export HVWC to debugfs").

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 3732118..ca94488 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -16,4 +16,3 @@ obj-$(CONFIG_OPAL_PRD)+= opal-prd.o
 obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
 obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
-obj-$(CONFIG_PPC_FTW)  += nx-ftw.o
-- 
2.7.4



[PATCH 0/5] Implement FTW driver

2018-01-16 Thread Sukadev Bhattiprolu
The Virtual Accelerator Switchboard (VAS) subsystem in the POWER9 processor
provides a low-latency "Core-to-core wakeup" mechanism which allows a thread
on one core of the processor to efficiently send a message to a thread waiting
on another core.

This Fast Thread-Wakeup (FTW) driver provides user space applications with an
interface to the Core-to-core wakeup mechanism. The FTW driver uses the
"external" interfaces provided by the VAS driver to interact with the VAS
hardware.

PATCH 5/5 documents the API.

The ftw-next branch on my github has some initial test cases for the
driver:

https://github.com/sukadev/linux/tree/ftw-next

Thanks to input from Ben Herrenschmidt, Michael Ellerman, Michael
Neuling and Robert Blackmore.

Sukadev Bhattiprolu (5):
  powerpc/vas: Remove a stray line in Makefile
  powerpc/ftw: Define FTW_SETUP ioctl API
  powerpc/ftw: Implement a simple FTW driver
  powerpc/ftw: Add a couple of trace points
  powerpc/ftw: Document FTW API/usage

 Documentation/powerpc/ftw-api.txt   | 283 +
 MAINTAINERS |   8 +
 arch/powerpc/platforms/powernv/Makefile |   1 -
 drivers/misc/Kconfig|   1 +
 drivers/misc/Makefile   |   1 +
 drivers/misc/ftw/Kconfig|  16 ++
 drivers/misc/ftw/Makefile   |   4 +
 drivers/misc/ftw/ftw-trace.h|  75 +++
 drivers/misc/ftw/ftw.c  | 352 
 include/uapi/misc/ftw.h |  31 +++
 10 files changed, 771 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/powerpc/ftw-api.txt
 create mode 100644 drivers/misc/ftw/Kconfig
 create mode 100644 drivers/misc/ftw/Makefile
 create mode 100644 drivers/misc/ftw/ftw-trace.h
 create mode 100644 drivers/misc/ftw/ftw.c
 create mode 100644 include/uapi/misc/ftw.h

-- 
2.7.4



[PATCH 1/1] powerpc/vas: Add a couple of trace points

2018-01-16 Thread Sukadev Bhattiprolu
Add a couple of trace points in the VAS driver.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-trace.h  | 112 
 arch/powerpc/platforms/powernv/vas-window.c |   9 +++
 2 files changed, 121 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/vas-trace.h

diff --git a/arch/powerpc/platforms/powernv/vas-trace.h b/arch/powerpc/platforms/powernv/vas-trace.h
new file mode 100644
index 0000000..c937191
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/vas-trace.h
@@ -0,0 +1,112 @@
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM   vas
+
+#if !defined(_VAS_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+
+#define _VAS_TRACE_H
+#include 
+#include 
+#include 
+
+TRACE_EVENT(   vas_rx_win_open,
+
+   TP_PROTO(struct task_struct *tsk,
+int vasid,
+int cop,
+struct vas_rx_win_attr *rxattr),
+
+   TP_ARGS(tsk, vasid, cop, rxattr),
+
+   TP_STRUCT__entry(
+   __field(struct task_struct *, tsk)
+   __field(int, pid)
+   __field(int, cop)
+   __field(int, vasid)
+   __field(struct vas_rx_win_attr *, rxattr)
+   __field(int, lnotify_lpid)
+   __field(int, lnotify_pid)
+   __field(int, lnotify_tid)
+   ),
+
+   TP_fast_assign(
+   __entry->pid = tsk->pid;
+   __entry->vasid = vasid;
+   __entry->cop = cop;
+   __entry->lnotify_lpid = rxattr->lnotify_lpid;
+   __entry->lnotify_pid = rxattr->lnotify_pid;
+   __entry->lnotify_tid = rxattr->lnotify_tid;
+   ),
+
+   TP_printk("pid=%d, vasid=%d, cop=%d, lpid=%d, pid=%d, tid=%d",
+   __entry->pid, __entry->vasid, __entry->cop,
+   __entry->lnotify_lpid, __entry->lnotify_pid,
+   __entry->lnotify_tid)
+);
+
+TRACE_EVENT(   vas_tx_win_open,
+
+   TP_PROTO(struct task_struct *tsk,
+int vasid,
+int cop,
+struct vas_tx_win_attr *txattr),
+
+   TP_ARGS(tsk, vasid, cop, txattr),
+
+   TP_STRUCT__entry(
+   __field(struct task_struct *, tsk)
+   __field(int, pid)
+   __field(int, cop)
+   __field(int, vasid)
+   __field(struct vas_tx_win_attr *, txattr)
+   __field(int, lpid)
+   __field(int, pidr)
+   ),
+
+   TP_fast_assign(
+   __entry->pid = tsk->pid;
+   __entry->vasid = vasid;
+   __entry->cop = cop;
+   __entry->lpid = txattr->lpid;
+   __entry->pidr = txattr->pidr;
+   ),
+
+   TP_printk("pid=%d, vasid=%d, cop=%d, lpid=%d, pidr=%d",
+   __entry->pid, __entry->vasid, __entry->cop,
+   __entry->lpid, __entry->pidr)
+);
+
+TRACE_EVENT(   vas_paste_crb,
+
+   TP_PROTO(struct task_struct *tsk,
+   struct vas_window *win),
+
+   TP_ARGS(tsk, win),
+
+   TP_STRUCT__entry(
+   __field(struct task_struct *, tsk)
+   __field(struct vas_window *, win)
+   __field(int, pid)
+   __field(int, vasid)
+   __field(int, winid)
+   __field(unsigned long, paste_kaddr)
+   ),
+
+   TP_fast_assign(
+   __entry->pid = tsk->pid;
+   __entry->vasid = win->vinst->vas_id;
+   __entry->winid = win->winid;
+   __entry->paste_kaddr = (unsigned long)win->paste_kaddr
+   ),
+
+   TP_printk("pid=%d, vasid=%d, winid=%d, paste_kaddr=0x%016lx\n",
+   __entry->pid, __entry->vasid, __entry->winid,
+   __entry->paste_kaddr)
+);
+
+#endif /* _VAS_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE vas-trace
+#include 
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 2b3eb01..6b2de9e 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -21,6 +21,9 @@
 #include "vas.h"
 #include "copy-paste.h"
 
+#define CREATE_TRACE_POINTS
+#incl

[PATCH 2/2] powerpc: export set_thread_uses_vas()

2018-01-16 Thread Sukadev Bhattiprolu
Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/process.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index f20c1ad..d22055b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1475,6 +1475,7 @@ int set_thread_uses_vas(void)
 #endif /* CONFIG_PPC_BOOK3S_64 */
return 0;
 }
+EXPORT_SYMBOL_GPL(set_thread_uses_vas);
 
 #ifdef CONFIG_PPC64
 static DEFINE_SPINLOCK(vas_thread_id_lock);
-- 
2.7.4



[PATCH 1/2] powerpc: export thread-tidr interfaces

2018-01-16 Thread Sukadev Bhattiprolu
Export set_thread_tidr() and clear_thread_tidr() interfaces so they
can be used by external modules.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 2010e4c..f20c1ad 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1560,6 +1560,7 @@ void clear_thread_tidr(struct task_struct *t)
free_thread_tidr(t->thread.tidr);
t->thread.tidr = 0;
 }
+EXPORT_SYMBOL_GPL(clear_thread_tidr);
 
 void arch_release_task_struct(struct task_struct *t)
 {
@@ -1592,6 +1593,7 @@ int set_thread_tidr(struct task_struct *t)
 
return 0;
 }
+EXPORT_SYMBOL_GPL(set_thread_tidr);
 
 #endif /* CONFIG_PPC64 */
 
-- 
2.7.4



[PATCH 1/1] powerpc: Emulate paste instruction

2017-12-19 Thread Sukadev Bhattiprolu
From: Michael Neuling <mi...@neuling.org>

On POWER9 DD2.1 and below there are issues when the paste instruction
generates an error. If an error occurs while thread reconfiguration
happens (i.e. another thread in the core goes into or out of powersave),
the core may hang.

To avoid this, a special sequence is required which stops thread
reconfiguration so that the paste can be safely executed.

This patch assumes that paste instructions executed in userspace are
trapped into the illegal instruction exception at 0xe40.

Here we re-execute the paste instruction but with the required
sequence to ensure thread reconfiguration doesn't occur.

Signed-off-by: Michael Neuling <mi...@neuling.org>
Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog[v4]:
- We need to disable pagefaults after all when modifying the thread
  reconfig registers. Use a mutex, rather than a spinlock around
  the thread reconfig registers. Acquire the mutex first then block
  interrupts so we don't sleep on the mutex with interrupts disabled.

Changelog[v3]:
- [Michael Ellerman] We don't need to disable/enable pagefaults
  when emulating paste;
- [Michael Ellerman, Aneesh Kumar] Fix retval from emulate_paste()

Changelog[v2]:
[Sukadev]: Use PPC_PASTE() rather than the paste instruction since
in older versions the instruction required a third parameter.
---
 arch/powerpc/include/asm/emulated_ops.h |  1 +
 arch/powerpc/include/asm/ppc-opcode.h   |  1 +
 arch/powerpc/include/asm/reg.h  |  2 +
 arch/powerpc/kernel/traps.c | 73 +
 4 files changed, 77 insertions(+)

diff --git a/arch/powerpc/include/asm/emulated_ops.h b/arch/powerpc/include/asm/emulated_ops.h
index 651e135..fdc95cf 100644
--- a/arch/powerpc/include/asm/emulated_ops.h
+++ b/arch/powerpc/include/asm/emulated_ops.h
@@ -59,6 +59,7 @@ extern struct ppc_emulated {
struct ppc_emulated_entry lxvh8x;
struct ppc_emulated_entry lxvd2x;
struct ppc_emulated_entry lxvb16x;
+   struct ppc_emulated_entry paste;
 #endif
 } ppc_emulated;
 
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index ce0930d..a55d2ef 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -229,6 +229,7 @@
 #define PPC_INST_MTTMR 0x7c0003dc
 #define PPC_INST_NOP   0x60000000
 #define PPC_INST_PASTE 0x7c20070d
+#define PPC_INST_PASTE_MASK    0xfc2007ff
 #define PPC_INST_POPCNTB   0x7c0000f4
 #define PPC_INST_POPCNTB_MASK  0xfc0007fe
 #define PPC_INST_POPCNTD   0x7c0003f4
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index b779f3c..3495ecf 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -469,6 +469,8 @@
 #define SPRN_DBAT7U    0x23E   /* Data BAT 7 Upper Register */
 #define SPRN_PPR   0x380   /* SMT Thread status Register */
 #define SPRN_TSCR  0x399   /* Thread Switch Control Register */
+#define SPRN_TRIG1 0x371   /* WAT Trigger 1 */
+#define SPRN_TRIG2 0x372   /* WAT Trigger 2 */
 
 #define SPRN_DEC   0x016   /* Decrement Register */
 #define SPRN_DER   0x095   /* Debug Enable Register */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index f3eb61b..e1ea3be 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1153,6 +1153,74 @@ static inline bool tm_abort_check(struct pt_regs *regs, int reason)
 }
 #endif
 
+static DEFINE_MUTEX(paste_emulation_mutex);
+
+static inline int paste(void *i)
+{
+   int cr;
+   long retval = 0;
+
+   /* Need per core lock to ensure trig1/2 writes don't race */
+   mutex_lock(&paste_emulation_mutex);
+
+   hard_irq_disable();
+
+   mtspr(SPRN_TRIG1, 0); /* data doesn't matter */
+   mtspr(SPRN_TRIG1, 0); /* HW says do this twice */
+   asm volatile(
+   "1: " PPC_PASTE(0, %2) "\n"
+   "2: mfcr %1\n"
+   ".section .fixup,\"ax\"\n"
+   "3: li %0,%3\n"
+   "   li %2,0\n"
+   "   b 2b\n"
+   ".previous\n"
+   EX_TABLE(1b, 3b)
+   : "=r" (retval), "=r" (cr)
+   : "b" (i), "i" (-EFAULT), "0" (retval));
+   mtspr(SPRN_TRIG2, 0);
+
+   local_irq_enable();
+
+   mutex_unlock(&paste_emulation_mutex);
+
+   return retval ? retval : cr;
+}
+
+static int emulate_paste(struct pt_regs *regs, u32 instword)
+{
+   const void __user *addr;
+   unsigned long ea;
+   u8 ra, rb;
+   int rc;
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   return -EINVAL;
+
+   ra = (instword &

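The truncated `emulate_paste()` above starts by recognising the trapped instruction and working out which registers it names. A minimal sketch, assuming the standard X-form layout (RA in bits 16-20 and RB in bits 11-15, counted from the least significant bit, with EA = (RA|0) + RB), shows why `PPC_INST_PASTE_MASK` matches a paste with any RA/RB. `is_paste()` and `paste_ea()` are hypothetical helpers, not the code in traps.c, which is cut off here.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_INST_PASTE      0x7c20070d
#define PPC_INST_PASTE_MASK 0xfc2007ff

/* The mask clears the RA and RB fields, so any paste matches. */
static int is_paste(uint32_t instword)
{
	return (instword & PPC_INST_PASTE_MASK) == PPC_INST_PASTE;
}

/* X-form effective address: (RA|0) + RB, i.e. RA == 0 means the
 * value zero, not the contents of r0. */
static uint64_t paste_ea(uint32_t instword, const uint64_t gpr[32])
{
	unsigned int ra = (instword >> 16) & 0x1f;
	unsigned int rb = (instword >> 11) & 0x1f;

	return (ra ? gpr[ra] : 0) + gpr[rb];
}
```

For example, `paste. r3,r4` with r3 = 0x1000 and r4 = 0x20 decodes to an effective address of 0x1020.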
[PATCH 1/1] vas: vas_window_init_dbgdir: fix order of cleanup.

2017-12-19 Thread Sukadev Bhattiprolu
Fix the order of cleanup to ensure we free the name buffer in case
of an error creating 'hvwc' or 'info' files.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-debug.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-debug.c b/arch/powerpc/platforms/powernv/vas-debug.c
index ca22f1e..b4de4c6 100644
--- a/arch/powerpc/platforms/powernv/vas-debug.c
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -166,13 +166,13 @@ void vas_window_init_dbgdir(struct vas_window *window)
 
return;
 
-free_name:
-   kfree(window->dbgname);
-   window->dbgname = NULL;
-
 remove_dir:
debugfs_remove_recursive(window->dbgdir);
window->dbgdir = NULL;
+
+free_name:
+   kfree(window->dbgname);
+   window->dbgname = NULL;
 }
 
 void vas_instance_init_dbgdir(struct vas_instance *vinst)
-- 
2.7.4
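The fix above relies on the usual goto-unwind idiom: error labels appear in the reverse order of the setup steps, so jumping to a label undoes that step and then falls through to undo every earlier one. A minimal user-space sketch, in which `struct toy_window`, `toy_init_dbgdir()` and the `fail_at` switch are hypothetical stand-ins for the debugfs calls:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fields vas_window_init_dbgdir() sets. */
struct toy_window {
	char *dbgname;
	int dbgdir;          /* 1 while the "debugfs directory" exists */
};

/*
 * fail_at: 0 = every step succeeds, 1 = creating the directory fails,
 *          2 = creating the 'hvwc'/'info' files fails.
 */
static void toy_init_dbgdir(struct toy_window *w, int fail_at)
{
	w->dbgname = malloc(16);
	if (!w->dbgname)
		return;
	strcpy(w->dbgname, "w0");

	if (fail_at == 1)
		goto free_name;          /* only the name to undo */
	w->dbgdir = 1;

	if (fail_at == 2)
		goto remove_dir;         /* undo the directory, then the name */

	return;

remove_dir:
	w->dbgdir = 0;
	/* fall through: the name was allocated before the directory */
free_name:
	free(w->dbgname);
	w->dbgname = NULL;
}
```

With the pre-fix ordering, `free_name:` came before `remove_dir:`, so a jump to `remove_dir` skipped the `kfree()` and leaked the name buffer.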



[PATCH 1/1]: powerpc: block interrupts when updating TIDR

2017-12-01 Thread Sukadev Bhattiprolu
From: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Date: Tue, 28 Nov 2017 13:39:43 -0600
Subject: [PATCH 1/1]: powerpc: block interrupts when updating TIDR

clear_thread_tidr() is called in interrupt context as part of the delayed
put of the task structure (i.e. as part of a timer interrupt). To prevent
a deadlock, block interrupts when holding vas_thread_id_lock to set or
clear TIDR for a task.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/process.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bfdd783..aa8dbb9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1509,14 +1509,15 @@ static int assign_thread_tidr(void)
 {
int index;
int err;
+   unsigned long flags;
 
 again:
if (!ida_pre_get(&vas_thread_ida, GFP_KERNEL))
return -ENOMEM;
 
-   spin_lock(&vas_thread_id_lock);
+   spin_lock_irqsave(&vas_thread_id_lock, flags);
err = ida_get_new_above(&vas_thread_ida, 1, &index);
-   spin_unlock(&vas_thread_id_lock);
+   spin_unlock_irqrestore(&vas_thread_id_lock, flags);
 
if (err == -EAGAIN)
goto again;
@@ -1524,9 +1525,9 @@ static int assign_thread_tidr(void)
return err;
 
if (index > MAX_THREAD_CONTEXT) {
-   spin_lock(&vas_thread_id_lock);
+   spin_lock_irqsave(&vas_thread_id_lock, flags);
ida_remove(&vas_thread_ida, index);
-   spin_unlock(&vas_thread_id_lock);
+   spin_unlock_irqrestore(&vas_thread_id_lock, flags);
return -ENOMEM;
}
 
@@ -1535,9 +1536,11 @@ static int assign_thread_tidr(void)
 
 static void free_thread_tidr(int id)
 {
-   spin_lock(&vas_thread_id_lock);
+   unsigned long flags;
+
+   spin_lock_irqsave(&vas_thread_id_lock, flags);
ida_remove(&vas_thread_ida, id);
-   spin_unlock(&vas_thread_id_lock);
+   spin_unlock_irqrestore(&vas_thread_id_lock, flags);
 }
 
 /*
-- 
2.7.4
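The deadlock described above happens when code holding a lock is interrupted on the same CPU by a handler that needs the same lock. A rough user-space analogy, assuming POSIX signals, uses a signal handler in place of the timer interrupt and `sigprocmask()` in place of `spin_lock_irqsave()`/`spin_unlock_irqrestore()`. The flag-based "lock" and all `toy_*` names are hypothetical; instead of actually hanging, the handler records whether it would have spun on a held lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t toy_lock_held;   /* stands in for vas_thread_id_lock */
static volatile sig_atomic_t would_deadlock;  /* handler found the lock held */
static volatile sig_atomic_t handler_ran;

/* Stands in for the interrupt-context path into clear_thread_tidr(). */
static void toy_irq_handler(int sig)
{
	(void)sig;
	handler_ran = 1;
	if (toy_lock_held)
		would_deadlock = 1;  /* a real spinlock would spin forever here */
}

/* Take the toy lock and let the "interrupt" arrive while it is held.
 * Returns 1 if the handler would have deadlocked, 0 otherwise. */
static int toy_update_tidr(int block_irqs)
{
	struct sigaction sa;
	sigset_t set, old;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = toy_irq_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);

	handler_ran = 0;
	would_deadlock = 0;

	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	if (block_irqs)
		sigprocmask(SIG_BLOCK, &set, &old);    /* ~ spin_lock_irqsave() */

	toy_lock_held = 1;
	raise(SIGUSR1);          /* the "interrupt" fires while the lock is held */
	toy_lock_held = 0;

	if (block_irqs)
		sigprocmask(SIG_SETMASK, &old, NULL);  /* ~ spin_unlock_irqrestore() */

	return would_deadlock;
}
```

With the signal blocked, it stays pending and is only delivered once the mask is restored, by which time the toy lock has been released; this mirrors how saving and restoring the interrupt state defers the timer interrupt until `vas_thread_id_lock` is dropped.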



Re: [PATCH v3] powerpc: Avoid signed to unsigned conversion in set_thread_tidr()

2017-11-27 Thread Sukadev Bhattiprolu
Vaibhav Jain [vaib...@linux.vnet.ibm.com] wrote:
> There is an unsafe signed to unsigned conversion in set_thread_tidr()
> that may cause an error value to be assigned to SPRN_TIDR register and
> used as thread-id.

Thanks for fixing this. I have a comment below
> 
> The issue happens as assign_thread_tidr() returns an int and
> thread.tidr is an unsigned-long. So a negative error code returned
> from assign_thread_tidr() will fail the error check and gets assigned
> as tidr as a large positive value.
> 
> To fix this the patch assigns the return value of assign_thread_tidr()
> to a temporary int and assigns it to thread.tidr iff its '> 0'.
> 
> The patch shouldn't impact the calling convention of set_thread_tidr()
> i.e all -ve return-values are error codes and a return value of '0'
> indicates success.
> 
> Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR")
> Signed-off-by: Vaibhav Jain <vaib...@linux.vnet.ibm.com>
> 
> ---
> Changelog:
> 
> v3  ->  Updated the patch to not impact the calling convention [Mpe, Christophe]
> 
> v2  ->* Update the patch description to document the calling
>   convention of set_thread_tidr(). [Mpe]
>   * Fix a tidr allocation leak.
> ---
>  arch/powerpc/kernel/process.c | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index bfdd783e3916..9fb69211a3d4 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1569,19 +1569,22 @@ void arch_release_task_struct(struct task_struct *t)
>   */
>  int set_thread_tidr(struct task_struct *t)
>  {
> + int rc;
> +
>   if (!cpu_has_feature(CPU_FTR_ARCH_300))
>   return -EINVAL;
> 
>   if (t != current)
>   return -EINVAL;
> 
> - t->thread.tidr = assign_thread_tidr();
> - if (t->thread.tidr < 0)
> - return t->thread.tidr;
> -
> - mtspr(SPRN_TIDR, t->thread.tidr);
> -
> - return 0;
> + rc = assign_thread_tidr();
> + if (rc > 0) {
> + t->thread.tidr = rc;
> + mtspr(SPRN_TIDR, t->thread.tidr);
> + return 0;
> + } else {
> + return rc;
> + }

We can eliminate the 'else' and be consistent with existing code if
we check for an error (i.e. rc < 0) and return rc. assign_thread_tidr() will
not return 0, but even if it did, setting the register and thread.tidr
to 0 should not be a problem.

Sukadev
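The signed-to-unsigned problem discussed in this thread can be reproduced outside the kernel. A minimal sketch, assuming a stand-in `toy_assign_tidr()` that returns either a positive id or a negative errno (all `toy_*` names are hypothetical), contrasts the original pattern with the error-check-first shape suggested above:

```c
#include <assert.h>
#include <errno.h>

struct toy_thread {
	unsigned long tidr;   /* unsigned, like thread.tidr */
};

/* Stand-in for assign_thread_tidr(): an id > 0, or -errno on failure. */
static int toy_assign_tidr(int fail)
{
	return fail ? -ENOMEM : 5;
}

/* Original pattern: the negative int becomes a huge unsigned value,
 * so the "< 0" test can never fire and the error is stored as an id. */
static int toy_set_tidr_buggy(struct toy_thread *t, int fail)
{
	t->tidr = toy_assign_tidr(fail);
	if (t->tidr < 0)              /* always false for an unsigned type */
		return (int)t->tidr;
	return 0;
}

/* Suggested shape: check the int for an error first, no else branch. */
static int toy_set_tidr_fixed(struct toy_thread *t, int fail)
{
	int rc = toy_assign_tidr(fail);

	if (rc < 0)
		return rc;
	t->tidr = rc;
	return 0;
}
```

Compilers typically flag the buggy comparison with `-Wextra` ("comparison of unsigned expression < 0 is always false"), which is a cheap way to catch this class of bug.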



[PATCH] powerpc/vas, export chip_to_vas_id()

2017-11-20 Thread Sukadev Bhattiprolu
From 958f8db089f4b89407fc4b89bccd3eaef585aa96 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Date: Mon, 20 Nov 2017 12:53:15 -0600
Subject: [PATCH 1/1] powerpc/vas, export chip_to_vas_id()

Export the symbol chip_to_vas_id() to fix a build failure when
CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV=m.

Reported-by: Haren Myneni <hb...@us.ibm.com>
Reported-by: Josh Boyer <jwbo...@fedoraproject.org>
Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---

This was broken by the patch https://lkml.org/lkml/2017/11/7/915.

---
 arch/powerpc/platforms/powernv/vas.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index c488621..aebbe95 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -135,6 +135,7 @@ int chip_to_vas_id(int chipid)
}
return -1;
 }
+EXPORT_SYMBOL(chip_to_vas_id);
 
 static int vas_probe(struct platform_device *pdev)
 {
-- 
2.7.4



[GIT PULL] Please pull JSON files for POWER9 PMU events

2017-11-08 Thread Sukadev Bhattiprolu
Hi Arnaldo,

Please pull an update to the JSON files for POWER9 PMU events.

The following changes since commit 148b43a3540bf25875bb5ab695a446950dc8d559:

  tools headers: Synchronize kernel ABI headers wrt SPDX tags (2017-11-07 13:41:35 -0300)

are available in the git repository at:

  https://github.com/sukadev/linux p9-json-v4

for you to fetch changes up to 4afb062d7d306bf56dbae9b5291e3515ccfede4c:

  perf vendor events powerpc: Update POWER9 events (2017-11-08 18:42:03 -0500)


Sukadev Bhattiprolu (1):
  perf vendor events powerpc: Update POWER9 events

 .../perf/pmu-events/arch/powerpc/power9/cache.json |   5 -
 .../pmu-events/arch/powerpc/power9/frontend.json   |   7 +-
 .../pmu-events/arch/powerpc/power9/marked.json |  27 +-
 .../perf/pmu-events/arch/powerpc/power9/other.json | 276 ++---
 .../pmu-events/arch/powerpc/power9/pipeline.json   |  14 +-
 tools/perf/pmu-events/arch/powerpc/power9/pmc.json |   2 +-
 .../arch/powerpc/power9/translation.json   |   5 -
 7 files changed, 88 insertions(+), 248 deletions(-)



[PATCH v3 18/18] powerpc/vas: Add support for user receive window

2017-11-07 Thread Sukadev Bhattiprolu
Add support for user space receive windows (for the Fast thread-wakeup
coprocessor type).

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog[v3]
- [Nick Piggin] Drop CP_ABORT since set_thread_uses_vas() does
  that now (in earlier patch) and add a check for return value.
---
 arch/powerpc/platforms/powernv/vas-window.c | 56 +
 1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 8275492..2b3eb01 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -16,7 +16,8 @@
 #include 
 #include 
 #include 
-
+#include 
+#include 
 #include "vas.h"
 #include "copy-paste.h"
 
@@ -598,6 +599,32 @@ static void put_rx_win(struct vas_window *rxwin)
 }
 
 /*
+ * Find the user space receive window given the @pswid.
+ *  - We must have a valid vasid and it must belong to this instance.
+ *(so both send and receive windows are on the same VAS instance)
+ *  - The window must refer to an OPEN, FTW, RECEIVE window.
+ *
+ * NOTE: We access ->windows[] table and assume that vinst->mutex is held.
+ */
+static struct vas_window *get_user_rxwin(struct vas_instance *vinst, u32 pswid)
+{
+   int vasid, winid;
+   struct vas_window *rxwin;
+
+   decode_pswid(pswid, &vasid, &winid);
+
+   if (vinst->vas_id != vasid)
+   return ERR_PTR(-EINVAL);
+
+   rxwin = vinst->windows[winid];
+
+   if (!rxwin || rxwin->tx_win || rxwin->cop != VAS_COP_TYPE_FTW)
+   return ERR_PTR(-EINVAL);
+
+   return rxwin;
+}
+
+/*
  * Get the VAS receive window associated with NX engine identified
  * by @cop and if applicable, @pswid.
  *
@@ -610,10 +637,10 @@ static struct vas_window *get_vinst_rxwin(struct vas_instance *vinst,
 
mutex_lock(&vinst->mutex);
 
-   if (cop == VAS_COP_TYPE_842 || cop == VAS_COP_TYPE_842_HIPRI)
-   rxwin = vinst->rxwin[cop] ?: ERR_PTR(-EINVAL);
+   if (cop == VAS_COP_TYPE_FTW)
+   rxwin = get_user_rxwin(vinst, pswid);
else
-   rxwin = ERR_PTR(-EINVAL);
+   rxwin = vinst->rxwin[cop] ?: ERR_PTR(-EINVAL);
 
if (!IS_ERR(rxwin))
atomic_inc(&rxwin->num_txwins);
@@ -937,10 +964,9 @@ static void init_winctx_for_txwin(struct vas_window *txwin,
winctx->tx_word_mode = txattr->tx_win_ord_mode;
winctx->rsvd_txbuf_count = txattr->rsvd_txbuf_count;
 
-   if (winctx->nx_win) {
+   winctx->intr_disable = true;
+   if (winctx->nx_win)
winctx->data_stamp = true;
-   winctx->intr_disable = true;
-   }
 
winctx->lpid = txattr->lpid;
winctx->pidr = txattr->pidr;
@@ -985,6 +1011,14 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
if (!tx_win_args_valid(cop, attr))
return ERR_PTR(-EINVAL);
 
+   /*
+* If caller did not specify a vasid but specified the PSWID of a
+* receive window (applicable only to FTW windows), use the vasid
+* from that receive window.
+*/
+   if (vasid == -1 && attr->pswid)
+   decode_pswid(attr->pswid, &vasid, NULL);
+
vinst = find_vas_instance(vasid);
if (!vinst) {
pr_devel("vasid %d not found!\n", vasid);
@@ -1031,6 +1065,14 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
}
}
 
+   /*
+* Now that we have a send window, ensure context switch issues
+* CP_ABORT for this thread.
+*/
+   rc = -EINVAL;
+   if (set_thread_uses_vas() < 0)
+   goto free_window;
+
set_vinst_win(vinst, txwin);
 
return txwin;
-- 
2.7.4
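The receive-window lookup added above checks three things: the VAS id encoded in the PSWID matches this instance, a window exists in that slot, and it is an FTW receive window rather than a send window or an NX window. A minimal sketch, assuming a toy instance with a four-entry window table and returning NULL plus an error code instead of `ERR_PTR()`; all `toy_*` names are hypothetical, and the `winid` bounds check exists only because the toy table is small.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_WINDOWS 4

enum toy_cop { TOY_COP_842, TOY_COP_FTW };

struct toy_window {
	int tx_win;              /* nonzero for a send window */
	enum toy_cop cop;
};

struct toy_vinst {
	int vas_id;
	struct toy_window *windows[TOY_NR_WINDOWS];
};

/* VAS id in the top byte, window id in the low 16 bits. */
static void toy_decode_pswid(uint32_t pswid, int *vasid, int *winid)
{
	*vasid = (pswid >> 24) & 0xFF;
	*winid = pswid & 0xFFFF;
}

static struct toy_window *toy_get_user_rxwin(struct toy_vinst *vinst,
					     uint32_t pswid, int *err)
{
	int vasid, winid;
	struct toy_window *rxwin;

	toy_decode_pswid(pswid, &vasid, &winid);
	*err = -EINVAL;

	/* Send and receive windows must be on the same VAS instance. */
	if (vinst->vas_id != vasid || winid >= TOY_NR_WINDOWS)
		return NULL;

	/* Must be an existing FTW receive window. */
	rxwin = vinst->windows[winid];
	if (!rxwin || rxwin->tx_win || rxwin->cop != TOY_COP_FTW)
		return NULL;

	*err = 0;
	return rxwin;
}
```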



[PATCH v3 16/18] powerpc/vas: Define vas_win_paste_addr()

2017-11-07 Thread Sukadev Bhattiprolu
Define an interface that the NX drivers can use to find the physical
paste address of a send window. This interface is expected to be used
with the mmap() operation of the NX driver's device, i.e. the user space
process can use the driver's mmap() operation to map the send window's paste
address into its address space and then use copy and paste instructions
to submit the CRBs to the NX engine.

Note that kernel drivers will use vas_paste_crb() directly and don't need
this interface.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/vas.h  |  7 +++
 arch/powerpc/platforms/powernv/vas-window.c | 10 ++
 2 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index 044748f..f98ade8 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -10,6 +10,8 @@
 #ifndef _ASM_POWERPC_VAS_H
 #define _ASM_POWERPC_VAS_H
 
+struct vas_window;
+
 /*
  * Min and max FIFO sizes are based on Version 1.05 Section 3.1.4.25
  * (Local FIFO Size Register) of the VAS workbook.
@@ -165,4 +167,9 @@ int vas_copy_crb(void *crb, int offset);
  */
 int vas_paste_crb(struct vas_window *win, int offset, bool re);
 
+/*
+ * Return the power bus paste address associated with @win so the caller
+ * can map that address into their address space.
+ */
+extern u64 vas_win_paste_addr(struct vas_window *win);
 #endif /* __ASM_POWERPC_VAS_H */
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index c030d4c..d7d0653 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -40,6 +40,16 @@ static void compute_paste_address(struct vas_window *window, u64 *addr, int *len
pr_debug("Txwin #%d: Paste addr 0x%llx\n", winid, *addr);
 }
 
+u64 vas_win_paste_addr(struct vas_window *win)
+{
+   u64 addr;
+
+   compute_paste_address(win, &addr, NULL);
+
+   return addr;
+}
+EXPORT_SYMBOL(vas_win_paste_addr);
+
 static inline void get_hvwc_mmio_bar(struct vas_window *window,
u64 *start, int *len)
 {
-- 
2.7.4



[PATCH v3 17/18] powerpc/vas: Define vas_win_id()

2017-11-07 Thread Sukadev Bhattiprolu
Define an interface to return a system-wide unique id for a given VAS
window.

vas_win_id() will be used in a follow-on patch to generate a unique
handle for a user space receive window. Applications can use this handle
to pair send and receive windows for fast thread-wakeup.

The hardware refers to this system-wide unique id as a Partition Send
Window ID which is expected to be used during fault handling. Hence the
"pswid" in the function names.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/vas.h  |  5 +
 arch/powerpc/platforms/powernv/vas-window.c |  9 +
 arch/powerpc/platforms/powernv/vas.h| 28 
 3 files changed, 42 insertions(+)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index f98ade8..7714562 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -168,6 +168,11 @@ int vas_copy_crb(void *crb, int offset);
 int vas_paste_crb(struct vas_window *win, int offset, bool re);
 
 /*
+ * Return a system-wide unique id for the VAS window @win.
+ */
+extern u32 vas_win_id(struct vas_window *win);
+
+/*
  * Return the power bus paste address associated with @win so the caller
  * can map that address into their address space.
  */
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index d7d0653..8275492 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1235,3 +1235,12 @@ int vas_win_close(struct vas_window *window)
return 0;
 }
 EXPORT_SYMBOL_GPL(vas_win_close);
+
+/*
+ * Return a system-wide unique window id for the window @win.
+ */
+u32 vas_win_id(struct vas_window *win)
+{
+   return encode_pswid(win->vinst->vas_id, win->winid);
+}
+EXPORT_SYMBOL_GPL(vas_win_id);
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index 756cbc5..ae0100f 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -447,4 +447,32 @@ static inline u64 read_hvwc_reg(struct vas_window *win,
return in_be64(win->hvwc_map+reg);
 }
 
+/*
+ * Encode/decode the Partition Send Window ID (PSWID) for a window in
+ * a way that we can uniquely identify any window in the system. i.e.
+ * we should be able to locate the 'struct vas_window' given the PSWID.
+ *
+ * Bits     Usage
+ * 0:7      VAS id (8 bits)
+ * 8:15     Unused, 0 (8 bits)
+ * 16:31    Window id (16 bits)
+ */
+static inline u32 encode_pswid(int vasid, int winid)
+{
+   u32 pswid = 0;
+
+   pswid |= vasid << (31 - 7);
+   pswid |= winid;
+
+   return pswid;
+}
+
+static inline void decode_pswid(u32 pswid, int *vasid, int *winid)
+{
+   if (vasid)
+   *vasid = pswid >> (31 - 7) & 0xFF;
+
+   if (winid)
+   *winid = pswid & 0xFFFF;
+}
 #endif /* _VAS_H */
-- 
2.7.4
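The PSWID layout in the comment above uses IBM (MSB-0) bit numbering, so bits 0:7 are the most significant byte and bits 16:31 are the low half-word; shifting by `31 - 7` (24) places the VAS id in the top byte. A minimal stand-alone sketch of the same encode/decode pair, assuming `uint32_t` in place of `u32`; the explicit casts and the `0xFFFF` mask in the encoder are additions for this sketch, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static inline uint32_t encode_pswid(int vasid, int winid)
{
	uint32_t pswid = 0;

	pswid |= (uint32_t)vasid << (31 - 7);  /* bits 0:7 (MSB-0) = top byte */
	pswid |= (uint32_t)winid & 0xFFFF;     /* bits 16:31 = low 16 bits */

	return pswid;
}

static inline void decode_pswid(uint32_t pswid, int *vasid, int *winid)
{
	if (vasid)
		*vasid = (pswid >> (31 - 7)) & 0xFF;

	if (winid)
		*winid = pswid & 0xFFFF;
}
```

For example, VAS id 3 and window id 42 encode to 0x0300002A, and decoding that value returns the same pair.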



[PATCH v3 15/18] powerpc: Emulate paste instruction

2017-11-07 Thread Sukadev Bhattiprolu
From: Michael Neuling <mi...@neuling.org>

On POWER9 DD2.1 and below there are issues when the paste instruction
generates an error. If an error occurs while thread reconfiguration
happens (i.e. another thread in the core goes into or out of powersave),
the core may hang.

To avoid this, a special sequence is required which stops thread
reconfiguration so that the paste can be safely executed.

This patch assumes that paste instructions executed in userspace are
trapped into the illegal instruction exception at 0xe40.

Here we re-execute the paste instruction but with the required
sequence to ensure thread reconfiguration doesn't occur.

Cc: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog[v3]:
- [Michael Ellerman] We don't need to disable/enable pagefaults
  when emulating paste;
- [Michael Ellerman, Aneesh Kumar] Fix retval from emulate_paste()

Edit by Sukadev: Use PPC_PASTE() rather than the paste instruction since
in older versions the instruction required a third parameter.
---
 arch/powerpc/include/asm/emulated_ops.h |  1 +
 arch/powerpc/include/asm/ppc-opcode.h   |  1 +
 arch/powerpc/include/asm/reg.h  |  2 +
 arch/powerpc/kernel/traps.c | 67 +
 4 files changed, 71 insertions(+)

diff --git a/arch/powerpc/include/asm/emulated_ops.h b/arch/powerpc/include/asm/emulated_ops.h
index f00e10e..9247af9 100644
--- a/arch/powerpc/include/asm/emulated_ops.h
+++ b/arch/powerpc/include/asm/emulated_ops.h
@@ -55,6 +55,7 @@ extern struct ppc_emulated {
struct ppc_emulated_entry mfdscr;
struct ppc_emulated_entry mtdscr;
struct ppc_emulated_entry lq_stq;
+   struct ppc_emulated_entry paste;
 #endif
 } ppc_emulated;
 
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index ce0930d..a55d2ef 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -229,6 +229,7 @@
 #define PPC_INST_MTTMR 0x7c0003dc
 #define PPC_INST_NOP           0x60000000
 #define PPC_INST_PASTE         0x7c20070d
+#define PPC_INST_PASTE_MASK    0xfc2007ff
 #define PPC_INST_POPCNTB       0x7c0000f4
 #define PPC_INST_POPCNTB_MASK  0xfc0007fe
 #define PPC_INST_POPCNTD       0x7c0003f4
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index b779f3c..3495ecf 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -469,6 +469,8 @@
 #define SPRN_DBAT7U   0x23E   /* Data BAT 7 Upper Register */
 #define SPRN_PPR   0x380   /* SMT Thread status Register */
 #define SPRN_TSCR  0x399   /* Thread Switch Control Register */
+#define SPRN_TRIG1 0x371   /* WAT Trigger 1 */
+#define SPRN_TRIG2 0x372   /* WAT Trigger 2 */
 
 #define SPRN_DEC   0x016   /* Decrement Register */
 #define SPRN_DER   0x095   /* Debug Enable Register */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 13c9dcd..c2cce25 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -956,6 +956,68 @@ static inline bool tm_abort_check(struct pt_regs *regs, int reason)
 }
 #endif
 
+static DEFINE_SPINLOCK(paste_emulation_lock);
+
+static inline int paste(void *i)
+{
+   int cr;
+   long retval = 0;
+
+   /* Need per core lock to ensure trig1/2 writes don't race */
+   spin_lock(&paste_emulation_lock);
+   mtspr(SPRN_TRIG1, 0); /* data doesn't matter */
+   mtspr(SPRN_TRIG1, 0); /* HW says do this twice */
+   asm volatile(
+   "1: " PPC_PASTE(0, %2) "\n"
+   "2: mfcr %1\n"
+   ".section .fixup,\"ax\"\n"
+   "3: li %0,%3\n"
+   "   li %2,0\n"
+   "   b 2b\n"
+   ".previous\n"
+   EX_TABLE(1b, 3b)
+   : "=r" (retval), "=r" (cr)
+   : "b" (i), "i" (-EFAULT), "0" (retval));
+   mtspr(SPRN_TRIG2, 0);
+   spin_unlock(&paste_emulation_lock);
+
+   return retval ?: cr;
+}
+
+static int emulate_paste(struct pt_regs *regs, u32 instword)
+{
+   const void __user *addr;
+   unsigned long ea;
+   u8 ra, rb;
+   int rc;
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   return -EINVAL;
+
+   ra = (instword >> 16) & 0x1f;
+   rb = (instword >> 11) & 0x1f;
+
+   ea = regs->gpr[rb] + (ra ? regs->gpr[ra] : 0ul);
+   if (is_32bit_task())
+   ea &= 0xffffffffUL;
+   addr = (__force const void __user *)ea;
+
+   if (!access_ok(VERIFY_WRITE, addr, 128)) // cacheline size == 128
+   return -

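The instruction decoding in emulate_paste() can be illustrated in user space. This sketch assumes the trap handler recognises a paste with `(instword & PPC_INST_PASTE_MASK) == PPC_INST_PASTE` (the excerpt is truncated before that check). `is_paste()`, `paste_ea()` and the `gpr` array are hypothetical stand-ins for the handler and `regs->gpr[]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PPC_INST_PASTE      0x7c20070dU
#define PPC_INST_PASTE_MASK 0xfc2007ffU  /* RA and RB fields are masked out */

/* True if instword is a paste. instruction with any RA/RB operands. */
static bool is_paste(uint32_t instword)
{
	return (instword & PPC_INST_PASTE_MASK) == PPC_INST_PASTE;
}

/*
 * Effective address computed the same way as in emulate_paste():
 * EA = (RA ? GPR[RA] : 0) + GPR[RB], truncated to 32 bits for 32-bit tasks.
 */
static uint64_t paste_ea(const uint64_t gpr[32], uint32_t instword, bool is_32bit)
{
	uint8_t ra = (instword >> 16) & 0x1f;
	uint8_t rb = (instword >> 11) & 0x1f;
	uint64_t ea = gpr[rb] + (ra ? gpr[ra] : 0);

	if (is_32bit)
		ea &= 0xffffffffULL;
	return ea;
}
```

Note that RA == 0 means "use zero", not the contents of r0, which is why the sketch (like the patch) tests `ra` before indexing.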
[PATCH v3 14/18] powerpc: Define set_thread_uses_vas()

2017-11-07 Thread Sukadev Bhattiprolu
A CP_ABORT instruction is required in processes that have mapped a VAS
"paste address" with the intention of using COPY/PASTE instructions.
But since CP_ABORT is expensive, we want to restrict it to only processes
that use/intend to use COPY/PASTE.

Define an interface, set_thread_uses_vas(), that VAS can use to indicate
that the current process opened a send window. During context switch,
issue CP_ABORT only for processes that have the flag set.

Thanks for input from Nick Piggin, Michael Ellerman.

Cc: Nicholas Piggin <npig...@gmail.com>
Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
Changelog[v3]:
- [Nick Piggin] Rename interface to set_thread_uses_vas(), tweak
  comment and move the CP_ABORT from callers into the interface.
---
 arch/powerpc/include/asm/processor.h |  2 ++
 arch/powerpc/include/asm/switch_to.h |  2 ++
 arch/powerpc/kernel/process.c        | 41 +++-
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 58cc212..bdab3b74 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -341,7 +341,9 @@ struct thread_struct {
unsigned long   sier;
unsigned long   mmcr2;
unsigned        mmcr0;
+
unsigned        used_ebb;
+   unsigned intused_vas;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index ad2d762..c3ca42c 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -92,6 +92,8 @@ static inline void clear_task_ebb(struct task_struct *t)
 #endif
 }
 
+extern int set_thread_uses_vas(void);
+
 extern int set_thread_tidr(struct task_struct *t);
 extern void clear_thread_tidr(struct task_struct *t);
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index d861fcd..395ca80 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1234,17 +1234,17 @@ struct task_struct *__switch_to(struct task_struct *prev,
 * The copy-paste buffer can only store into foreign real
 * addresses, so unprivileged processes can not see the
 * data or use it in any way unless they have foreign real
-* mappings. We don't have a VAS driver that allocates those
-* yet, so no cpabort is required.
+* mappings. If the new process has the foreign real address
+* mappings, we must issue a cp_abort to clear any state and
+* prevent snooping, corruption or a covert channel.
+*
+* DD1 allows paste into normal system memory so we do an
+* unpaired copy, rather than cp_abort, to clear the buffer,
+* since cp_abort is quite expensive.
 */
-   if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
-   /*
-* DD1 allows paste into normal system memory, so we
-* do an unpaired copy here to clear the buffer and
-* prevent a covert channel being set up.
-*
-* cpabort is not used because it is quite expensive.
-*/
+   if (new_thread->used_vas) {
+   asm volatile(PPC_CP_ABORT);
+   } else if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
asm volatile(PPC_COPY(%0, %1)
: : "r"(dummy_copy_buffer), "r"(0));
}
@@ -1445,6 +1445,27 @@ void flush_thread(void)
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 }
 
+int set_thread_uses_vas(void)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   return -EINVAL;
+
+   current->thread.used_vas = 1;
+
+   /*
+* Even a process that has no foreign real address mapping can use
+* an unpaired COPY instruction (to no real effect). Issue CP_ABORT
+* to clear any pending COPY and prevent a covert channel.
+*
+* __switch_to() will issue CP_ABORT on future context switches.
+*/
+   asm volatile(PPC_CP_ABORT);
+
+#endif /* CONFIG_PPC_BOOK3S_64 */
+   return 0;
+}
+
 #ifdef CONFIG_PPC64
 static DEFINE_SPINLOCK(vas_thread_id_lock);
 static DEFINE_IDA(vas_thread_ida);
-- 
2.7.4
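The context-switch policy this patch sets up can be summarised as a small decision function. This is a hypothetical model: `enum cp_action`, `switch_cp_action()` and its boolean inputs stand in for `new_thread->used_vas` and `cpu_has_feature(CPU_FTR_POWER9_DD1)`, and it only models which action `__switch_to()` chooses, not the instructions themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Which copy/paste clean-up __switch_to() performs for the incoming thread. */
enum cp_action {
	CP_NONE,          /* nothing to clear */
	CP_ABORT,         /* issue the (expensive) cp_abort */
	CP_UNPAIRED_COPY, /* DD1: an unpaired copy clears the buffer cheaply */
};

static enum cp_action switch_cp_action(bool new_used_vas, bool cpu_is_p9_dd1)
{
	/* Threads that opened a send window may have foreign real mappings. */
	if (new_used_vas)
		return CP_ABORT;
	/* DD1 can paste into normal memory, so clear the buffer anyway. */
	if (cpu_is_p9_dd1)
		return CP_UNPAIRED_COPY;
	return CP_NONE;
}
```

Checking `used_vas` first means a VAS user on DD1 still gets the full cp_abort, matching the if/else-if order in the `__switch_to()` hunk; everyone else avoids the cost.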



[PATCH v3 13/18] powerpc: Add support for setting SPRN_TIDR

2017-11-07 Thread Sukadev Bhattiprolu
We need the SPRN_TIDR to be set for use with fast thread-wakeup (core-
to-core wakeup) and also with CAPI.

Each thread in a process needs to have a unique id within the process.
But as explained below, for now, we assign globally unique thread ids
to all threads in the system.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Signed-off-by: Philippe Bergheaud <fe...@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clomb...@linux.vnet.ibm.com>
---
Changelog[v3]
- Merge changes with, and address comments on, Christophe's patch
  (i.e. drop CONFIG_PPC_VAS; use CONFIG_PPC64; check CPU_FTR_ARCH_300
  before setting TIDR). Defer the following to separate patches:
- emulation parts of Christophe's patch,
- setting TIDR for tasks other than 'current'
- setting feature bit in AT_HWCAP2

Changelog[v2]
- Michael Ellerman: Use an interface to assign TIDR so it is
assigned to only threads that need it; move assignment to
restore_sprs(). Drop lint from rebase;
---
 arch/powerpc/include/asm/processor.h |   1 +
 arch/powerpc/include/asm/switch_to.h |   3 +
 arch/powerpc/kernel/process.c        | 122 +++
 3 files changed, 126 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index fab7ff8..58cc212 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -329,6 +329,7 @@ struct thread_struct {
 */
int dscr_inherit;
unsigned long   ppr;/* used to save/restore SMT priority */
+   unsigned long   tidr;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
unsigned long   tar;
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index bf820f5..ad2d762 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -92,4 +92,7 @@ static inline void clear_task_ebb(struct task_struct *t)
 #endif
 }
 
+extern int set_thread_tidr(struct task_struct *t);
+extern void clear_thread_tidr(struct task_struct *t);
+
 #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 37ed60b..d861fcd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1120,6 +1120,13 @@ static inline void restore_sprs(struct thread_struct *old_thread,
mtspr(SPRN_TAR, new_thread->tar);
}
 #endif
+#ifdef CONFIG_PPC64
+   if (old_thread->tidr != new_thread->tidr) {
+   /* TIDR should be non-zero only with ISA3.0. */
+   WARN_ON_ONCE(!cpu_has_feature(CPU_FTR_ARCH_300));
+   mtspr(SPRN_TIDR, new_thread->tidr);
+   }
+#endif
 }
 
 #ifdef CONFIG_PPC_BOOK3S_64
@@ -1438,9 +1445,117 @@ void flush_thread(void)
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 }
 
+#ifdef CONFIG_PPC64
+static DEFINE_SPINLOCK(vas_thread_id_lock);
+static DEFINE_IDA(vas_thread_ida);
+
+/*
+ * We need to assign a unique thread id to each thread in a process.
+ *
+ * This thread id, referred to as TIDR and separate from Linux's tgid,
+ * is intended to be used to direct an ASB_Notify from the hardware to the
+ * thread when a suitable event occurs in the system.
+ *
+ * One such event is a "paste" instruction in the context of Fast Thread
+ * Wakeup (aka Core-to-core wake up) in the Virtual Accelerator Switchboard
+ * (VAS) in POWER9.
+ *
+ * To get a unique TIDR per process we could simply reuse task_pid_nr(), but
+ * task_pid_nr() is not yet available when copy_thread() is called. Fixing
+ * that would require more intrusive changes to arch-neutral code in the
+ * copy_process() path.
+ *
+ * Further, to assign unique TIDRs within each process, we need an atomic
+ * field (or an IDR) in task_struct, which again intrudes into the arch-
+ * neutral code. So try to assign globally unique TIDRs for now.
+ *
+ * NOTE: TIDR 0 indicates that the thread does not need a TIDR value.
+ *  For now, only threads that expect to be notified by the VAS
+ *  hardware need a TIDR value and we assign values > 0 for those.
+ */
+#define MAX_THREAD_CONTEXT ((1 << 16) - 1)
+static int assign_thread_tidr(void)
+{
+   int index;
+   int err;
+
+again:
+   if (!ida_pre_get(&vas_thread_ida, GFP_KERNEL))
+   return -ENOMEM;
+
+   spin_lock(&vas_thread_id_lock);
+   err = ida_get_new_above(&vas_thread_ida, 1, &index);
+   spin_unlock(&vas_thread_id_lock);
+
+   if (err == -EAGAIN)
+   goto again;
+   else if (err)
+   return err;
+
+   if (index > MAX_THREAD_CONTEXT) {
+   spin_lock(&vas_thread_id_lock);
+   ida_remove(&vas_thread_ida, index);
+   spin_unlock(&vas_thread_id_lock);
+   return -ENOMEM;
+   }
+
+   return index;
+}
+
+stati

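assign_thread_tidr() hands out the lowest free id starting at 1 and rejects anything above MAX_THREAD_CONTEXT, so 0 stays reserved as the "no TIDR" value. The sketch below models that policy with a plain array in user space. It is not the IDA API: `tidr_alloc()` and `tidr_free()` are hypothetical names, locking is omitted (the patch uses vas_thread_id_lock), and it bounds the search up front instead of allocating and then calling ida_remove() on overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_THREAD_CONTEXT ((1 << 16) - 1)

/* tidr_used[0] is never handed out: TIDR 0 means "no TIDR needed". */
static bool tidr_used[MAX_THREAD_CONTEXT + 1];

/*
 * Return the lowest free id in [1, MAX_THREAD_CONTEXT], or -ENOMEM when
 * the range is exhausted. Single-threaded sketch, no locking.
 */
static int tidr_alloc(void)
{
	for (int id = 1; id <= MAX_THREAD_CONTEXT; id++) {
		if (!tidr_used[id]) {
			tidr_used[id] = true;
			return id;
		}
	}
	return -ENOMEM;
}

/* Release an id so it can be reused (compare ida_remove() above). */
static void tidr_free(int id)
{
	if (id > 0 && id <= MAX_THREAD_CONTEXT)
		tidr_used[id] = false;
}
```

Because the lowest free id is always chosen, a freed TIDR is reused by the next allocation, which keeps the values small and within the 16-bit limit.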
[PATCH v3 12/18] powerpc: have copy depend on CONFIG_BOOK3S_64

2017-11-07 Thread Sukadev Bhattiprolu
Have the COPY/PASTE instructions depend on CONFIG_BOOK3S_64 rather than
CONFIG_PPC_STD_MMU_64.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/process.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bb..37ed60b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1215,10 +1215,14 @@ struct task_struct *__switch_to(struct task_struct *prev,
batch = this_cpu_ptr(&ppc64_tlb_batch);
batch->active = 1;
}
+#endif /* CONFIG_PPC_STD_MMU_64 */
 
if (current_thread_info()->task->thread.regs) {
+#ifdef CONFIG_PPC_STD_MMU_64
restore_math(current_thread_info()->task->thread.regs);
+#endif /* CONFIG_PPC_STD_MMU_64 */
 
+#ifdef CONFIG_PPC_BOOK3S_64
/*
 * The copy-paste buffer can only store into foreign real
 * addresses, so unprivileged processes can not see the
@@ -1237,8 +1241,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
asm volatile(PPC_COPY(%0, %1)
: : "r"(dummy_copy_buffer), "r"(0));
}
+#endif /* CONFIG_PPC_BOOK3S_64 */
}
-#endif /* CONFIG_PPC_STD_MMU_64 */
 
return last;
 }
-- 
2.7.4



[PATCH v3 11/18] powerpc/vas: Export HVWC to debugfs

2017-11-07 Thread Sukadev Bhattiprolu
Export the VAS Window context information to debugfs.

We need to hold a mutex when closing the window to prevent a race
with the debugfs read(). Rather than introduce a per-instance mutex,
we use the global vas_mutex for now, since it is not heavily contended.

The window->cop field is only relevant to a receive window so we were
not setting it for a send window (which is paired with a receive window
anyway). But to simplify reporting in debugfs, set the 'cop' field for the
send window also.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>

---
Changelog[v3]:
- [Michael Ellerman] Fix a couple of uninitialized variables
---
 arch/powerpc/platforms/powernv/Makefile |   3 +-
 arch/powerpc/platforms/powernv/vas-debug.c  | 209 
 arch/powerpc/platforms/powernv/vas-window.c |  34 -
 arch/powerpc/platforms/powernv/vas.c        |   6 +-
 arch/powerpc/platforms/powernv/vas.h        |  14 ++
 5 files changed, 257 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/vas-debug.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 7a31c26..3732118 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -15,4 +15,5 @@ obj-$(CONFIG_TRACEPOINTS) += opal-tracepoints.o
 obj-$(CONFIG_OPAL_PRD) += opal-prd.o
 obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
-obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o
+obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
+obj-$(CONFIG_PPC_FTW)  += nx-ftw.o
diff --git a/arch/powerpc/platforms/powernv/vas-debug.c b/arch/powerpc/platforms/powernv/vas-debug.c
new file mode 100644
index 000..ca22f1e
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright 2016-17 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "vas: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include "vas.h"
+
+static struct dentry *vas_debugfs;
+
+static char *cop_to_str(int cop)
+{
+   switch (cop) {
+   case VAS_COP_TYPE_FAULT:        return "Fault";
+   case VAS_COP_TYPE_842:          return "NX-842 Normal Priority";
+   case VAS_COP_TYPE_842_HIPRI:    return "NX-842 High Priority";
+   case VAS_COP_TYPE_GZIP:         return "NX-GZIP Normal Priority";
+   case VAS_COP_TYPE_GZIP_HIPRI:   return "NX-GZIP High Priority";
+   case VAS_COP_TYPE_FTW:          return "Fast Thread-wakeup";
+   default:                        return "Unknown";
+   }
+}
+
+static int info_dbg_show(struct seq_file *s, void *private)
+{
+   struct vas_window *window = s->private;
+
+   mutex_lock(&vas_mutex);
+
+   /* ensure window is not unmapped */
+   if (!window->hvwc_map)
+   goto unlock;
+
+   seq_printf(s, "Type: %s, %s\n", cop_to_str(window->cop),
+   window->tx_win ? "Send" : "Receive");
+   seq_printf(s, "Pid : %d\n", window->pid);
+
+unlock:
+   mutex_unlock(&vas_mutex);
+   return 0;
+}
+
+static int info_dbg_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, info_dbg_show, inode->i_private);
+}
+
+static const struct file_operations info_fops = {
+   .open   = info_dbg_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release = single_release,
+};
+
+static inline void print_reg(struct seq_file *s, struct vas_window *win,
+   char *name, u32 reg)
+{
+   seq_printf(s, "0x%016llx %s\n", read_hvwc_reg(win, name, reg), name);
+}
+
+static int hvwc_dbg_show(struct seq_file *s, void *private)
+{
+   struct vas_window *window = s->private;
+
+   mutex_lock(&vas_mutex);
+
+   /* ensure window is not unmapped */
+   if (!window->hvwc_map)
+   goto unlock;
+
+   print_reg(s, window, VREG(LPID));
+   print_reg(s, window, VREG(PID));
+   print_reg(s, window, VREG(XLATE_MSR));
+   print_reg(s, window, VREG(XLATE_LPCR));
+   print_reg(s, window, VREG(XLATE_CTL));
+   print_reg(s, window, VREG(AMR));
+   print_reg(s, window, VREG(SEIDR));
+   print_reg(s, window, VREG(FAULT_TX_WIN));
+   print_reg(s, window, VREG(OSU_INTR_SRC_RA));
+   print_reg(s, window, VREG(HV_INTR_SRC_RA));
+   print_reg(s, window, VREG(PSWID));
+   print_reg(s, window, VREG(LFIFO_BAR));
+   print_reg(s, window, VREG(LDATA_STAMP_CTL));
+   print_reg(s, window, VREG(LDMA_CACHE_CTL));
+   

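The debugfs show functions above take vas_mutex and bail out if `hvwc_map` is already NULL, so a concurrent vas_win_close() cannot unmap the window context in the middle of a read. A user-space analogue of that check-under-lock pattern follows. It assumes the close path clears the mapping while holding the same mutex (the commit message says close holds the mutex, but that code is not in this excerpt); `struct win`, `win_show()` and `win_close()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct win {
	int *ctx_map;  /* stands in for window->hvwc_map */
	int  pid;
};

/* One global lock, like vas_mutex, shared by readers and the close path. */
static pthread_mutex_t win_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Reader: touch the mapping only while holding the lock and only if it is set. */
static int win_show(struct win *w, char *buf, size_t len)
{
	int n = 0;

	pthread_mutex_lock(&win_mutex);
	if (w->ctx_map)
		n = snprintf(buf, len, "Pid : %d ctx %d\n", w->pid, *w->ctx_map);
	pthread_mutex_unlock(&win_mutex);
	return n;
}

/* Closer: free and clear under the same lock so no reader uses a stale map. */
static void win_close(struct win *w)
{
	pthread_mutex_lock(&win_mutex);
	free(w->ctx_map);
	w->ctx_map = NULL;
	pthread_mutex_unlock(&win_mutex);
}
```

A single global lock is coarse, but as the commit message notes, vas_mutex is not heavily contended, so it avoids adding a per-instance mutex.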
[PATCH v3 10/18] powerpc/vas, nx-842: Define and use chip_to_vas_id()

2017-11-07 Thread Sukadev Bhattiprolu
Define a helper, chip_to_vas_id(), to map a given chip id to the
corresponding VAS id.

Normally, callers of vas_rx_win_open() and vas_tx_win_open() want the VAS
window to be on the same chip where the calling thread is executing. These
callers can pass in -1 for the VAS id.

This interface will be useful if a thread running on one chip wants to open
a window on another chip (like the NX-842 driver does during start up).

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/vas.h   |  9 +
 arch/powerpc/platforms/powernv/vas.c | 11 +++
 drivers/crypto/nx/nx-842-powernv.c   | 18 +++---
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index fd5963a..044748f 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -104,6 +104,15 @@ struct vas_tx_win_attr {
 };
 
 /*
+ * Helper to map a chip id to VAS id.
+ * For POWER9, this is a 1:1 mapping. In the future this may be a 1:N
+ * mapping, in which case we will need to update this helper.
+ *
+ * Return the VAS id or -1 if no matching vasid is found.
+ */
+int chip_to_vas_id(int chipid);
+
+/*
  * Helper to initialize receive window attributes to defaults for an
  * NX window.
  */
diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index abb7090..cd9a733 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -123,6 +123,17 @@ struct vas_instance *find_vas_instance(int vasid)
return NULL;
 }
 
+int chip_to_vas_id(int chipid)
+{
+   int cpu;
+
+   for_each_possible_cpu(cpu) {
+   if (cpu_to_chip_id(cpu) == chipid)
+   return per_cpu(cpu_vas_id, cpu);
+   }
+   return -1;
+}
+
 static int vas_probe(struct platform_device *pdev)
 {
return init_vas_instance(pdev);
diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
index 874ddf5..eb221ed 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -847,24 +847,12 @@ static int __init nx842_powernv_probe_vas(struct device_node *pn)
return -EINVAL;
}
 
-   for_each_compatible_node(dn, NULL, "ibm,power9-vas-x") {
-   if (of_get_ibm_chip_id(dn) == chip_id)
-   break;
-   }
-
-   if (!dn) {
-   pr_err("Missing VAS device node\n");
+   vasid = chip_to_vas_id(chip_id);
+   if (vasid < 0) {
+   pr_err("Unable to map chip_id %d to vasid\n", chip_id);
return -EINVAL;
}
 
-   if (of_property_read_u32(dn, "ibm,vas-id", &vasid)) {
-   pr_err("Missing ibm,vas-id device property\n");
-   of_node_put(dn);
-   return -EINVAL;
-   }
-
-   of_node_put(dn);
-
for_each_child_of_node(pn, dn) {
if (of_device_is_compatible(dn, "ibm,p9-nx-842")) {
ret = vas_cfg_coproc_info(dn, chip_id, vasid);
-- 
2.7.4
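chip_to_vas_id() is a linear scan over the possible CPUs that returns the VAS id recorded for the first CPU found on the requested chip, or -1. A user-space model with fixed-size arrays follows; `NR_CPUS_DEMO`, `cpu_chip[]`, `cpu_vas[]` and the topology are hypothetical stand-ins for for_each_possible_cpu(), cpu_to_chip_id() and the cpu_vas_id per-cpu variable.

```c
#include <assert.h>

#define NR_CPUS_DEMO 8

/* Hypothetical topology: CPUs 0-3 on chip 0, CPUs 4-7 on chip 8. */
static const int cpu_chip[NR_CPUS_DEMO] = { 0, 0, 0, 0, 8, 8, 8, 8 };
/* VAS id recorded per CPU (the per-cpu cpu_vas_id in the kernel). */
static const int cpu_vas[NR_CPUS_DEMO]  = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Return the VAS id for chipid, or -1 if no CPU sits on that chip. */
static int chip_to_vas_id_demo(int chipid)
{
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
		if (cpu_chip[cpu] == chipid)
			return cpu_vas[cpu];
	}
	return -1;
}
```

The -1 return is what lets nx842_powernv_probe_vas() replace its device-tree walk with a single `vasid < 0` check.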



[PATCH v3 09/18] powerpc/vas: Create cpu to vas id mapping

2017-11-07 Thread Sukadev Bhattiprolu
Create a cpu to vasid mapping so callers can specify -1 instead of
trying to find a VAS id.

Changelog[v2]
[Michael Ellerman] Use per-cpu variables to simplify code.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index 565a487..abb7090 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -18,15 +18,18 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vas.h"
 
 static DEFINE_MUTEX(vas_mutex);
 static LIST_HEAD(vas_instances);
 
+static DEFINE_PER_CPU(int, cpu_vas_id);
+
 static int init_vas_instance(struct platform_device *pdev)
 {
-   int rc, vasid;
+   int rc, cpu, vasid;
struct resource *res;
struct vas_instance *vinst;
struct device_node *dn = pdev->dev.of_node;
@@ -74,6 +77,11 @@ static int init_vas_instance(struct platform_device *pdev)
"paste_win_id_shift 0x%llx\n", pdev->name, vasid,
vinst->paste_base_addr, vinst->paste_win_id_shift);
 
+   for_each_possible_cpu(cpu) {
+   if (cpu_to_chip_id(cpu) == of_get_ibm_chip_id(dn))
+   per_cpu(cpu_vas_id, cpu) = vasid;
+   }
+
mutex_lock(&vas_mutex);
list_add(&vinst->node, &vas_instances);
mutex_unlock(&vas_mutex);
@@ -98,6 +106,10 @@ struct vas_instance *find_vas_instance(int vasid)
struct vas_instance *vinst;
 
mutex_lock(&vas_mutex);
+
+   if (vasid == -1)
+   vasid = per_cpu(cpu_vas_id, smp_processor_id());
+
list_for_each(ent, &vas_instances) {
vinst = list_entry(ent, struct vas_instance, node);
if (vinst->vas_id == vasid) {
-- 
2.7.4
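The two hunks in this patch work together: init_vas_instance() records the instance's vasid for every CPU on the same chip, and find_vas_instance() replaces a vasid of -1 with the entry for the current CPU. The sketch below models that in user space; `current_cpu` replaces smp_processor_id(), and all names and the topology are hypothetical.

```c
#include <assert.h>

#define NR_CPUS_DEMO 4

/* Hypothetical topology: CPUs 0-1 on chip 0, CPUs 2-3 on chip 1. */
static const int cpu_chip[NR_CPUS_DEMO] = { 0, 0, 1, 1 };
static int cpu_vas_id[NR_CPUS_DEMO];  /* per-cpu cpu_vas_id stand-in */

/* Like the loop added to init_vas_instance(): tag CPUs on this chip. */
static void record_vas_instance(int chip_id, int vasid)
{
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		if (cpu_chip[cpu] == chip_id)
			cpu_vas_id[cpu] = vasid;
}

/* Like the -1 handling added to find_vas_instance(). */
static int resolve_vasid(int vasid, int current_cpu)
{
	if (vasid == -1)
		vasid = cpu_vas_id[current_cpu];
	return vasid;
}
```

An explicit vasid passes through unchanged, so callers that need a window on another chip are unaffected by the -1 shortcut.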



[PATCH v3 08/18] powerpc/vas: poll for return of window credits

2017-11-07 Thread Sukadev Bhattiprolu
Normally, the NX driver waits for the CRBs to be processed before closing
the window. But it is better to ensure that the credits are returned before
the window gets reassigned later.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-window.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index a59a187..23c13a7 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1063,6 +1063,49 @@ int vas_paste_crb(struct vas_window *txwin, int offset, bool re)
 EXPORT_SYMBOL_GPL(vas_paste_crb);
 
 /*
+ * If credit checking is enabled for this window, poll for the return
+ * of window credits (i.e. for NX engines to process any outstanding CRBs).
+ * Since NX-842 waits for the CRBs to be processed before closing the
+ * window, we should not have to wait for too long.
+ *
+ * TODO: We retry in 10ms intervals now. We could/should probably peek at
+ * the VAS_LRFIFO_PUSH_OFFSET register to get an estimate of pending
+ * CRBs on the FIFO and compute the delay dynamically on each retry.
+ * But that is not really needed until we support NX-GZIP access from
+ * user space. (NX-842 driver waits for CSB and Fast thread-wakeup
+ * doesn't use credit checking).
+ */
+static void poll_window_credits(struct vas_window *window)
+{
+   u64 val;
+   int creds, mode;
+
+   val = read_hvwc_reg(window, VREG(WINCTL));
+   if (window->tx_win)
+   mode = GET_FIELD(VAS_WINCTL_TX_WCRED_MODE, val);
+   else
+   mode = GET_FIELD(VAS_WINCTL_RX_WCRED_MODE, val);
+
+   if (!mode)
+   return;
+retry:
+   if (window->tx_win) {
+   val = read_hvwc_reg(window, VREG(TX_WCRED));
+   creds = GET_FIELD(VAS_TX_WCRED, val);
+   } else {
+   val = read_hvwc_reg(window, VREG(LRX_WCRED));
+   creds = GET_FIELD(VAS_LRX_WCRED, val);
+   }
+
+   if (creds < window->wcreds_max) {
+   val = 0;
+   set_current_state(TASK_UNINTERRUPTIBLE);
+   schedule_timeout(msecs_to_jiffies(10));
+   goto retry;
+   }
+}
+
+/*
  * Wait for the window to go to "not-busy" state. It should only take a
  * short time to queue a CRB, so window should not be busy for too long.
  * Trying 5ms intervals.
@@ -1149,6 +1192,8 @@ int vas_win_close(struct vas_window *window)
 
unpin_close_window(window);
 
+   poll_window_credits(window);
+
poll_window_castout(window);
 
/* if send window, drop reference to matching receive window */
-- 
2.7.4
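poll_window_credits() is a poll-and-sleep loop: if credit checking is enabled, it re-reads the credit count every 10 ms until all `wcreds_max` credits have been returned. A user-space sketch of the same loop follows, with nanosleep() in place of schedule_timeout() and a hypothetical `read_creds` callback in place of the TX_WCRED/LRX_WCRED register reads. `demo_creds_returning()` is simulated hardware for demonstration only.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical source of the current credit count (a register read in VAS). */
typedef int (*read_creds_fn)(void *ctx);

/* Sleep for roughly ms milliseconds, like schedule_timeout(msecs_to_jiffies(ms)). */
static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Returns the number of 10 ms sleeps taken before all credits were back. */
static int poll_credits(bool credit_mode, int wcreds_max,
			read_creds_fn read_creds, void *ctx)
{
	int sleeps = 0;

	if (!credit_mode)  /* credit checking disabled: nothing to wait for */
		return 0;

	while (read_creds(ctx) < wcreds_max) {
		sleep_ms(10);
		sleeps++;
	}
	return sleeps;
}

/* Simulated hardware: each read reports one more returned credit. */
static int demo_creds_returning(void *ctx)
{
	int *creds = ctx;

	return (*creds)++;
}
```

As in the patch, the loop has no upper bound; it relies on the NX engine eventually draining outstanding CRBs.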



[PATCH v3 07/18] powerpc/vas: Save configured window credits

2017-11-07 Thread Sukadev Bhattiprolu
Save the configured max window credits for a window in the vas_window
structure. We will need this when polling for return of window credits.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-window.c | 6 --
 arch/powerpc/platforms/powernv/vas.h        | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 1422cdd..a59a187 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -674,7 +674,7 @@ static void init_winctx_for_rxwin(struct vas_window *rxwin,
 
winctx->rx_fifo = rxattr->rx_fifo;
winctx->rx_fifo_size = rxattr->rx_fifo_size;
-   winctx->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
+   winctx->wcreds_max = rxwin->wcreds_max;
winctx->pin_win = rxattr->pin_win;
 
winctx->nx_win = rxattr->nx_win;
@@ -844,6 +844,7 @@ struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop,
rxwin->nx_win = rxattr->nx_win;
rxwin->user_win = rxattr->user_win;
rxwin->cop = cop;
+   rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
if (rxattr->user_win)
rxwin->pid = task_pid_vnr(current);
 
@@ -893,7 +894,7 @@ static void init_winctx_for_txwin(struct vas_window *txwin,
 */
memset(winctx, 0, sizeof(struct vas_winctx));
 
-   winctx->wcreds_max = txattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
+   winctx->wcreds_max = txwin->wcreds_max;
 
winctx->user_win = txattr->user_win;
winctx->nx_win = txwin->rxwin->nx_win;
@@ -978,6 +979,7 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
txwin->nx_win = txwin->rxwin->nx_win;
txwin->pid = attr->pid;
txwin->user_win = attr->user_win;
+   txwin->wcreds_max = attr->wcreds_max ?: VAS_WCREDS_DEFAULT;
 
init_winctx_for_txwin(txwin, attr, );
 
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index 63e8e03..02d8a31 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -332,6 +332,7 @@ struct vas_window {
void *hvwc_map; /* HV window context */
void *uwc_map;  /* OS/User window context */
pid_t pid;  /* Linux process id of owner */
+   int wcreds_max; /* Window credits */
 
/* Fields applicable only to send windows */
void *paste_kaddr;
-- 
2.7.4
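The `a ?: b` form used for `wcreds_max` is a GNU C extension (conditionals with omitted operands): it yields `a` if `a` is non-zero, otherwise `b`, and evaluates `a` only once. After this patch the defaulting happens once, when the window is opened, and init_winctx_for_rxwin()/init_winctx_for_txwin() copy the saved value. A portable-C equivalent is sketched below; `VAS_WCREDS_DEFAULT_DEMO` is a placeholder value, not the kernel's constant.

```c
#include <assert.h>

#define VAS_WCREDS_DEFAULT_DEMO 1024  /* placeholder value for illustration */

/* Portable equivalent of: wcreds_max = attr_wcreds ?: VAS_WCREDS_DEFAULT; */
static int pick_wcreds_max(int attr_wcreds)
{
	return attr_wcreds ? attr_wcreds : VAS_WCREDS_DEFAULT_DEMO;
}
```

Storing the result in the window, rather than recomputing it from the attributes, is what allows poll_window_credits() to compare against the configured maximum at close time.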



[PATCH v3 06/18] powerpc/vas: Reduce polling interval for busy state

2017-11-07 Thread Sukadev Bhattiprolu
A VAS window is normally in "busy" state for only a short duration.
Reduce the time we wait for the window to go to "not-busy" state to
speed-up vas_win_close() a bit.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-window.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 95622a9..1422cdd 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1060,21 +1060,23 @@ int vas_paste_crb(struct vas_window *txwin, int offset, bool re)
 }
 EXPORT_SYMBOL_GPL(vas_paste_crb);
 
+/*
+ * Wait for the window to go to "not-busy" state. It should only take a
+ * short time to queue a CRB, so window should not be busy for too long.
+ * Trying 5ms intervals.
+ */
 static void poll_window_busy_state(struct vas_window *window)
 {
int busy;
u64 val;
 
 retry:
-   /*
-* Poll Window Busy flag
-*/
val = read_hvwc_reg(window, VREG(WIN_STATUS));
busy = GET_FIELD(VAS_WIN_BUSY, val);
if (busy) {
val = 0;
set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(HZ);
+   schedule_timeout(msecs_to_jiffies(5));
goto retry;
}
 }
-- 
2.7.4
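The change from `schedule_timeout(HZ)` (one second) to `schedule_timeout(msecs_to_jiffies(5))` matters because timeouts are counted in jiffies, and msecs_to_jiffies() rounds up to whole ticks. The sketch below approximates that rounding; it is a simplified model, not the kernel's implementation, which has separate code paths depending on the HZ value.

```c
#include <assert.h>

/* Round-up conversion from milliseconds to timer ticks (simplified model). */
static unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}
```

For example, on a HZ=250 kernel a 5 ms request becomes 2 ticks (8 ms) instead of the previous 250 ticks, and on HZ=100 it becomes a single 10 ms tick.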



[PATCH v3 04/18] powerpc/vas: Drop poll_window_cast_out().

2017-11-07 Thread Sukadev Bhattiprolu
Polling for window cast out is listed in the spec, but it turns out that
it is not strictly necessary and slows down window close. Make it a
stub for now.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-window.c | 34 ++---
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 67ffc5d..8ab8a82 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1079,25 +1079,25 @@ static void poll_window_busy_state(struct vas_window *window)
}
 }
 
+/*
+ * Have the hardware cast a window out of cache and wait for it to
+ * be completed.
+ *
+ * NOTE: It can take a relatively long time to cast the window context
+ * out of the cache. It is not strictly necessary to cast out if:
+ *
+ * - we clear the "Pin Window" bit (so hardware is free to evict)
+ *
+ * - we re-initialize the window context when it is reassigned.
+ *
+ * We do the former in vas_win_close() and the latter in vas_win_open().
+ * So, ignoring the cast-out for now. We can add it as needed. If
+ * casting out becomes necessary we should consider offloading the
+ * job to a worker thread, so the window close can proceed quickly.
+ */
 static void poll_window_castout(struct vas_window *window)
 {
-   int cached;
-   u64 val;
-
-   /* Cast window context out of the cache */
-retry:
-   val = read_hvwc_reg(window, VREG(WIN_CTX_CACHING_CTL));
-   cached = GET_FIELD(VAS_WIN_CACHE_STATUS, val);
-   if (cached) {
-   val = 0ULL;
-   val = SET_FIELD(VAS_CASTOUT_REQ, val, 1);
-   val = SET_FIELD(VAS_PUSH_TO_MEM, val, 0);
-   write_hvwc_reg(window, VREG(WIN_CTX_CACHING_CTL), val);
-
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(HZ);
-   goto retry;
-   }
+   /* stub for now */
 }
 
 /*
-- 
2.7.4



[PATCH v3 05/18] powerpc/vas: Use helper to unpin/close window

2017-11-07 Thread Sukadev Bhattiprolu
Use a helper to have the hardware unpin and mark a window closed.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-window.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 8ab8a82..95622a9 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1101,6 +1101,20 @@ static void poll_window_castout(struct vas_window *window)
 }
 
 /*
+ * Unpin and close a window so no new requests are accepted and the
+ * hardware can evict this window from cache if necessary.
+ */
+static void unpin_close_window(struct vas_window *window)
+{
+   u64 val;
+
+   val = read_hvwc_reg(window, VREG(WINCTL));
+   val = SET_FIELD(VAS_WINCTL_PIN, val, 0);
+   val = SET_FIELD(VAS_WINCTL_OPEN, val, 0);
+   write_hvwc_reg(window, VREG(WINCTL), val);
+}
+
+/*
  * Close a window.
  *
  * See Section 1.12.1 of VAS workbook v1.05 for details on closing window:
@@ -1114,8 +1128,6 @@ static void poll_window_castout(struct vas_window *window)
  */
 int vas_win_close(struct vas_window *window)
 {
-   u64 val;
-
if (!window)
return 0;
 
@@ -1131,11 +1143,7 @@ int vas_win_close(struct vas_window *window)
 
poll_window_busy_state(window);
 
-   /* Unpin window from cache and close it */
-   val = read_hvwc_reg(window, VREG(WINCTL));
-   val = SET_FIELD(VAS_WINCTL_PIN, val, 0);
-   val = SET_FIELD(VAS_WINCTL_OPEN, val, 0);
-   write_hvwc_reg(window, VREG(WINCTL), val);
+   unpin_close_window(window);
 
poll_window_castout(window);
 
-- 
2.7.4
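unpin_close_window() is a read-modify-write of WINCTL: it reads the register, clears only the PIN and OPEN fields with SET_FIELD(), and writes the result back. The user-space sketch below illustrates that pattern. It is not the kernel code. It assumes IBM (MSB-0) bit numbering as used by PPC_BITMASK(), re-implements GET_FIELD()/SET_FIELD() in the spirit of the VAS helpers using the GCC/Clang builtin __builtin_ctzll() (the kernel's exact definitions may differ), and uses hypothetical DEMO_WINCTL_* bit positions instead of the real VAS_WINCTL_* layout.

```c
#include <assert.h>
#include <stdint.h>

/* IBM (MSB-0) bit numbering: bit 0 is the most significant bit. */
#define PPC_BIT(b)		(0x8000000000000000ULL >> (b))
#define PPC_BITMASK(bs, be)	((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* Shift of a (non-zero) mask's least-significant set bit. */
#define MASK_LSH(m)		(__builtin_ctzll(m))
#define GET_FIELD(m, v)		(((v) & (m)) >> MASK_LSH(m))
#define SET_FIELD(m, v, val)	\
	(((v) & ~(m)) | ((((uint64_t)(val)) << MASK_LSH(m)) & (m)))

/* HYPOTHETICAL field layout, for illustration only. */
#define DEMO_WINCTL_OPEN	PPC_BIT(0)
#define DEMO_WINCTL_PIN		PPC_BIT(2)
#define DEMO_WINCTL_OTHER	PPC_BITMASK(8, 15)

_Static_assert(DEMO_WINCTL_OTHER == 0x00FF000000000000ULL,
	       "MSB-0 bits 8:15 are the second byte from the top");

/* Clear PIN and OPEN, preserving every other field of the register value. */
static uint64_t demo_unpin_close(uint64_t winctl)
{
	winctl = SET_FIELD(DEMO_WINCTL_PIN, winctl, 0);
	winctl = SET_FIELD(DEMO_WINCTL_OPEN, winctl, 0);
	return winctl;
}
```

Starting from a value with OPEN, PIN and an unrelated 8-bit field set, demo_unpin_close() clears only OPEN and PIN. This is why the helper reads WINCTL first instead of writing a constant.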



[PATCH v3 03/18] powerpc/vas: Cleanup some debug code

2017-11-07 Thread Sukadev Bhattiprolu
Clean up vas.h and the debug code around ifdef vas_debug.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>

---
Changelog[v3]
- Minor tweak to a debug message
---
 arch/powerpc/platforms/powernv/vas-window.c |  8 +++--
 arch/powerpc/platforms/powernv/vas.h        | 54 ++---
 2 files changed, 17 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index a2fe120..67ffc5d 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -726,7 +726,10 @@ static void init_winctx_for_rxwin(struct vas_window *rxwin,
 static bool rx_win_args_valid(enum vas_cop_type cop,
struct vas_rx_win_attr *attr)
 {
-   dump_rx_win_attr(attr);
+   pr_debug("Rxattr: fault %d, notify %d, intr %d, early %d, fifo %d\n",
+   attr->fault_win, attr->notify_disable,
+   attr->intr_disable, attr->notify_early,
+   attr->rx_fifo_size);
 
if (cop >= VAS_COP_TYPE_MAX)
return false;
@@ -1050,7 +1053,8 @@ int vas_paste_crb(struct vas_window *txwin, int offset, bool re)
else
rc = -EINVAL;
 
-   print_fifo_msg_count(txwin);
+   pr_debug("Txwin #%d: Msg count %llu\n", txwin->winid,
+   read_hvwc_reg(txwin, VREG(LRFIFO_PUSH)));
 
return rc;
 }
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index fea0de4..63e8e03 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -259,6 +259,16 @@
 #define VAS_NX_UTIL_ADDER  PPC_BITMASK(32, 63)
 
 /*
+ * VREG(x):
+ * Expand a register's short name (eg: LPID) into two parameters:
+ * - the register's short name in string form ("LPID"), and
+ * - the name of the macro (eg: VAS_LPID_OFFSET), defining the
+ *   register's offset in the window context
+ */
+#define VREG_SFX(n, s) __stringify(n), VAS_##n##s
+#define VREG(r)        VREG_SFX(r, _OFFSET)
+
+/*
  * Local Notify Scope Control Register. (Receive windows only).
  */
 enum vas_notify_scope {
@@ -385,43 +395,15 @@ struct vas_winctx {
 
 extern struct vas_instance *find_vas_instance(int vasid);
 
-/*
- * VREG(x):
- * Expand a register's short name (eg: LPID) into two parameters:
- * - the register's short name in string form ("LPID"), and
- * - the name of the macro (eg: VAS_LPID_OFFSET), defining the
- *   register's offset in the window context
- */
-#define VREG_SFX(n, s) __stringify(n), VAS_##n##s
-#define VREG(r)        VREG_SFX(r, _OFFSET)
-
-#ifdef vas_debug
-static inline void dump_rx_win_attr(struct vas_rx_win_attr *attr)
-{
-   pr_err("fault %d, notify %d, intr %d early %d\n",
-   attr->fault_win, attr->notify_disable,
-   attr->intr_disable, attr->notify_early);
-
-   pr_err("rx_fifo_size %d, max value %d\n",
-   attr->rx_fifo_size, VAS_RX_FIFO_SIZE_MAX);
-}
-
 static inline void vas_log_write(struct vas_window *win, char *name,
void *regptr, u64 val)
 {
if (val)
-   pr_err("%swin #%d: %s reg %p, val 0x%016llx\n",
+   pr_debug("%swin #%d: %s reg %p, val 0x%016llx\n",
win->tx_win ? "Tx" : "Rx", win->winid, name,
regptr, val);
 }
 
-#else  /* vas_debug */
-
-#define vas_log_write(win, name, reg, val)
-#define dump_rx_win_attr(attr)
-
-#endif /* vas_debug */
-
 static inline void write_uwc_reg(struct vas_window *win, char *name,
s32 reg, u64 val)
 {
@@ -450,18 +432,4 @@ static inline u64 read_hvwc_reg(struct vas_window *win,
return in_be64(win->hvwc_map+reg);
 }
 
-#ifdef vas_debug
-
-static void print_fifo_msg_count(struct vas_window *txwin)
-{
-   uint64_t read_hvwc_reg(struct vas_window *w, char *n, uint64_t o);
-   pr_devel("Winid %d, Msg count %llu\n", txwin->winid,
-   (uint64_t)read_hvwc_reg(txwin, VREG(LRFIFO_PUSH)));
-}
-#else  /* vas_debug */
-
-#define print_fifo_msg_count(window)
-
-#endif /* vas_debug */
-
 #endif /* _VAS_H */
-- 
2.7.4
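VREG() lets a single token supply both leading arguments of the register accessors: the register's name as a string (used by the pr_debug() in vas_log_write()) and its offset macro. A minimal user-space sketch of the expansion follows. In it, __stringify() is re-created locally in the style of <linux/stringify.h>, and the values of VAS_LPID_OFFSET and VAS_WINCTL_OFFSET are hypothetical, not the real window-context offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Two-level stringify, in the style of <linux/stringify.h>. */
#define __stringify_1(x)	#x
#define __stringify(x)		__stringify_1(x)

/* Same definitions as in the patch. */
#define VREG_SFX(n, s)	__stringify(n), VAS_##n##s
#define VREG(r)		VREG_SFX(r, _OFFSET)

/* HYPOTHETICAL offsets, for illustration only. */
#define VAS_LPID_OFFSET		0x010
#define VAS_WINCTL_OFFSET	0x040

/* Stand-in for a register accessor: logs by name, returns the offset. */
static int32_t demo_reg_offset(const char *name, int32_t reg)
{
	assert(reg >= 0);
	printf("%s at offset 0x%03x\n", name, (unsigned int)reg);
	return reg;
}

/* Reports whether the name half of a VREG() pair equals @want. */
static int demo_reg_name_is(const char *name, int32_t reg, const char *want)
{
	(void)reg;
	return strcmp(name, want) == 0;
}
```

With these definitions, demo_reg_offset(VREG(WINCTL)) is preprocessed into demo_reg_offset("WINCTL", 0x040). This is why every accessor in vas.h takes a name parameter ahead of the offset.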



[PATCH v3 02/18] powerpc/vas: Validate window credits

2017-11-07 Thread Sukadev Bhattiprolu
NX-842, the only user of VAS, sets the window credits to default values,
but VAS should still check the credits against the maximum allowed values.

The VAS_WCREDS_MIN is not needed and can be dropped.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-window.c | 6 ++
 arch/powerpc/platforms/powernv/vas.h        | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index cec7ab7..a2fe120 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -738,6 +738,9 @@ static bool rx_win_args_valid(enum vas_cop_type cop,
if (attr->rx_fifo_size > VAS_RX_FIFO_SIZE_MAX)
return false;
 
+   if (attr->wcreds_max > VAS_RX_WCREDS_MAX)
+   return false;
+
if (attr->nx_win) {
/* cannot be fault or user window if it is nx */
if (attr->fault_win || attr->user_win)
@@ -927,6 +930,9 @@ static bool tx_win_args_valid(enum vas_cop_type cop,
if (cop > VAS_COP_TYPE_MAX)
return false;
 
+   if (attr->wcreds_max > VAS_TX_WCREDS_MAX)
+   return false;
+
if (attr->user_win &&
(cop != VAS_COP_TYPE_FTW || attr->rsvd_txbuf_count))
return false;
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index 38dee5d..fea0de4 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -106,8 +106,8 @@
  *
  * TODO: Needs tuning for per-process credits
  */
-#define VAS_WCREDS_MIN 16
-#define VAS_WCREDS_MAX ((64 << 10) - 1)
+#define VAS_RX_WCREDS_MAX  ((64 << 10) - 1)
+#define VAS_TX_WCREDS_MAX  ((4 << 10) - 1)
 #define VAS_WCREDS_DEFAULT (1 << 10)
 
 /*
-- 
2.7.4
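The new checks are plain upper bounds. A receive window may request at most 64K - 1 credits and a send window at most 4K - 1, while the default of 1K passes both. The sketch below restates those bounds in user space. It uses a hypothetical struct demo_win_attr in place of the kernel's rx/tx window attribute structures and leaves out every other check that rx_win_args_valid() and tx_win_args_valid() perform.

```c
#include <assert.h>
#include <stdbool.h>

#define VAS_RX_WCREDS_MAX	((64 << 10) - 1)	/* 65535 */
#define VAS_TX_WCREDS_MAX	((4 << 10) - 1)		/* 4095 */
#define VAS_WCREDS_DEFAULT	(1 << 10)		/* 1024 */

_Static_assert(VAS_WCREDS_DEFAULT <= VAS_TX_WCREDS_MAX,
	       "the default must be valid for both window types");

/* HYPOTHETICAL stand-in holding only the credit attribute. */
struct demo_win_attr {
	int wcreds_max;
};

/* Upper-bound check added to rx_win_args_valid(). */
static bool demo_rx_creds_valid(const struct demo_win_attr *attr)
{
	return attr->wcreds_max <= VAS_RX_WCREDS_MAX;
}

/* Upper-bound check added to tx_win_args_valid(). */
static bool demo_tx_creds_valid(const struct demo_win_attr *attr)
{
	return attr->wcreds_max <= VAS_TX_WCREDS_MAX;
}
```

For example, a request for 5000 credits is accepted for a receive window but rejected for a send window.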



[PATCH v3 01/18] powerpc/vas: init missing fields from [rt]xattr

2017-11-07 Thread Sukadev Bhattiprolu
Initialize a few missing window context fields from the window attributes
specified by the caller. These fields are currently set to their default
values by the caller (NX-842), but it would be good to apply them anyway.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-window.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 5aae845..cec7ab7 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -679,10 +679,13 @@ static void init_winctx_for_rxwin(struct vas_window *rxwin,
 
winctx->nx_win = rxattr->nx_win;
winctx->fault_win = rxattr->fault_win;
+   winctx->user_win = rxattr->user_win;
+   winctx->rej_no_credit = rxattr->rej_no_credit;
winctx->rx_word_mode = rxattr->rx_win_ord_mode;
winctx->tx_word_mode = rxattr->tx_win_ord_mode;
winctx->rx_wcred_mode = rxattr->rx_wcred_mode;
winctx->tx_wcred_mode = rxattr->tx_wcred_mode;
+   winctx->notify_early = rxattr->notify_early;
 
if (winctx->nx_win) {
winctx->data_stamp = true;
@@ -889,11 +892,14 @@ static void init_winctx_for_txwin(struct vas_window *txwin,
winctx->user_win = txattr->user_win;
winctx->nx_win = txwin->rxwin->nx_win;
winctx->pin_win = txattr->pin_win;
+   winctx->rej_no_credit = txattr->rej_no_credit;
+   winctx->rsvd_txbuf_enable = txattr->rsvd_txbuf_enable;
 
winctx->rx_wcred_mode = txattr->rx_wcred_mode;
winctx->tx_wcred_mode = txattr->tx_wcred_mode;
winctx->rx_word_mode = txattr->rx_win_ord_mode;
winctx->tx_word_mode = txattr->tx_win_ord_mode;
+   winctx->rsvd_txbuf_count = txattr->rsvd_txbuf_count;
 
if (winctx->nx_win) {
winctx->data_stamp = true;
-- 
2.7.4



[PATCH v3 00/18] powerpc/vas: Add support for FTW

2017-11-07 Thread Sukadev Bhattiprolu
The first 10 patches in this set sanitize the cpu/chip id to VAS id mapping,
improve vas_win_close() performance, add a check for the return of credits
and clean up some code.

Patch 11 adds debugfs support for the VAS window contexts.

Patches 12-18 add support for user space aka Fast thread-wakeup windows
in VAS. These include a patch from Michael Neuling to support emulating
the paste instruction.

Changelog[v3]:
- [Michael Ellerman] We don't need to disable/enable pagefaults
  when emulating paste;
- [Michael Ellerman, Aneesh Kumar] Fix retval from emulate_paste()
  and paste().
- [Nick Piggin] Drop CP_ABORT since set_thread_uses_vas() does
  that now (in earlier patch) and add a check for return value.
- [Michael Ellerman] Fix a couple of uninitialized variables
- Minor tweak to a debug message

Michael Neuling (1):
  powerpc: Emulate paste instruction

Sukadev Bhattiprolu (17):
  powerpc/vas: init missing fields from [rt]xattr
  powerpc/vas: Validate window credits
  powerpc/vas: Cleanup some debug code
  powerpc/vas: Drop poll_window_cast_out().
  powerpc/vas: Use helper to unpin/close window
  powerpc/vas: Reduce polling interval for busy state
  powerpc/vas: Save configured window credits
  powerpc/vas: poll for return of window credits
  powerpc/vas: Create cpu to vas id mapping
  powerpc/vas, nx-842: Define and use chip_to_vas_id()
  powerpc/vas: Export HVWC to debugfs
  powerpc: have copy depend on CONFIG_BOOK3S_64
  powerpc: Add support for setting SPRN_TIDR
  powerpc: Define set_thread_uses_vas()
  powerpc/vas: Define vas_win_paste_addr()
  powerpc/vas: Define vas_win_id()
  powerpc/vas: Add support for user receive window

 arch/powerpc/include/asm/emulated_ops.h     |   1 +
 arch/powerpc/include/asm/ppc-opcode.h       |   1 +
 arch/powerpc/include/asm/processor.h        |   3 +
 arch/powerpc/include/asm/reg.h              |   2 +
 arch/powerpc/include/asm/switch_to.h        |   5 +
 arch/powerpc/include/asm/vas.h              |  21 +++
 arch/powerpc/kernel/process.c               | 169 +--
 arch/powerpc/kernel/traps.c                 |  67
 arch/powerpc/platforms/powernv/Makefile     |   3 +-
 arch/powerpc/platforms/powernv/vas-debug.c  | 209
 arch/powerpc/platforms/powernv/vas-window.c | 242 +++-
 arch/powerpc/platforms/powernv/vas.c        |  31 +++-
 arch/powerpc/platforms/powernv/vas.h        |  93 ++-
 drivers/crypto/nx/nx-842-powernv.c          |  18 +--
 14 files changed, 751 insertions(+), 114 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/vas-debug.c

-- 
2.7.4



[PATCH v2 18/18] powerpc/vas: Add support for user receive window

2017-10-06 Thread Sukadev Bhattiprolu
Add support for a user space receive window (for the Fast thread-wakeup
coprocessor type).

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/vas-window.c | 59 +
 1 file changed, 52 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 1d08b64..99642ec 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -16,7 +16,8 @@
 #include 
 #include 
 #include 
-
+#include 
+#include 
 #include "vas.h"
 #include "copy-paste.h"
 
@@ -602,6 +603,32 @@ static void put_rx_win(struct vas_window *rxwin)
 }
 
 /*
+ * Find the user space receive window given the @pswid.
+ *  - We must have a valid vasid and it must belong to this instance.
+ *(so both send and receive windows are on the same VAS instance)
+ *  - The window must refer to an OPEN, FTW, RECEIVE window.
+ *
+ * NOTE: We access ->windows[] table and assume that vinst->mutex is held.
+ */
+static struct vas_window *get_user_rxwin(struct vas_instance *vinst, u32 pswid)
+{
+   int vasid, winid;
+   struct vas_window *rxwin;
+
+   decode_pswid(pswid, &vasid, &winid);
+
+   if (vinst->vas_id != vasid)
+   return ERR_PTR(-EINVAL);
+
+   rxwin = vinst->windows[winid];
+
+   if (!rxwin || rxwin->tx_win || rxwin->cop != VAS_COP_TYPE_FTW)
+   return ERR_PTR(-EINVAL);
+
+   return rxwin;
+}
+
+/*
  * Get the VAS receive window associated with NX engine identified
  * by @cop and if applicable, @pswid.
  *
@@ -614,10 +641,10 @@ static struct vas_window *get_vinst_rxwin(struct vas_instance *vinst,
 
mutex_lock(&vinst->mutex);
 
-   if (cop == VAS_COP_TYPE_842 || cop == VAS_COP_TYPE_842_HIPRI)
-   rxwin = vinst->rxwin[cop] ?: ERR_PTR(-EINVAL);
+   if (cop == VAS_COP_TYPE_FTW)
+   rxwin = get_user_rxwin(vinst, pswid);
else
-   rxwin = ERR_PTR(-EINVAL);
+   rxwin = vinst->rxwin[cop] ?: ERR_PTR(-EINVAL);
 
if (!IS_ERR(rxwin))
atomic_inc(&rxwin->num_txwins);
@@ -941,10 +968,9 @@ static void init_winctx_for_txwin(struct vas_window *txwin,
winctx->tx_word_mode = txattr->tx_win_ord_mode;
winctx->rsvd_txbuf_count = txattr->rsvd_txbuf_count;
 
-   if (winctx->nx_win) {
+   winctx->intr_disable = true;
+   if (winctx->nx_win)
winctx->data_stamp = true;
-   winctx->intr_disable = true;
-   }
 
winctx->lpid = txattr->lpid;
winctx->pidr = txattr->pidr;
@@ -989,6 +1015,14 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
if (!tx_win_args_valid(cop, attr))
return ERR_PTR(-EINVAL);
 
+   /*
+* If caller did not specify a vasid but specified the PSWID of a
+* receive window (applicable only to FTW windows), use the vasid
+* from that receive window.
+*/
+   if (vasid == -1 && attr->pswid)
+   decode_pswid(attr->pswid, &vasid, NULL);
+
vinst = find_vas_instance(vasid);
if (!vinst) {
pr_devel("vasid %d not found!\n", vasid);
@@ -1037,6 +1071,17 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
 
set_vinst_win(vinst, txwin);
 
+   set_thread_used_vas();
+
+   /*
+* Even a process that has no foreign real address mapping can use
+* an unpaired COPY instruction (to no real effect). Issue CP_ABORT
+* to clear any pending COPY and prevent a covert channel.
+*
+* __switch_to() will issue CP_ABORT on future context switches.
+*/
+   asm volatile(PPC_CP_ABORT);
+
return txwin;
 
 free_window:
-- 
2.7.4
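get_user_rxwin() accepts a PSWID only if it decodes to this VAS instance and names an FTW receive window present in the instance's window table. The following user-space sketch mirrors those checks and the vasid == -1 fallback added to vas_tx_win_open(). The demo_* types are hypothetical simplifications of struct vas_instance and struct vas_window, and NULL stands in for ERR_PTR(-EINVAL). The winid bounds check is an addition needed only because the demo window table is small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL simplified types, for illustration only. */
enum demo_cop { DEMO_COP_842, DEMO_COP_FTW };

struct demo_window {
	bool tx_win;
	enum demo_cop cop;
};

#define DEMO_MAX_WINDOWS 8
_Static_assert(DEMO_MAX_WINDOWS <= 0x10000, "window ids are 16 bits");

struct demo_instance {
	int vas_id;
	struct demo_window *windows[DEMO_MAX_WINDOWS];
};

/* Same layout as decode_pswid(): VAS id in the top byte, winid in the low 16 bits. */
static void demo_decode_pswid(uint32_t pswid, int *vasid, int *winid)
{
	if (vasid)
		*vasid = (pswid >> 24) & 0xFF;
	if (winid)
		*winid = pswid & 0xFFFF;
}

/* The checks of get_user_rxwin(); NULL stands in for ERR_PTR(-EINVAL). */
static struct demo_window *demo_get_user_rxwin(struct demo_instance *vinst,
					       uint32_t pswid)
{
	int vasid, winid;
	struct demo_window *rxwin;

	demo_decode_pswid(pswid, &vasid, &winid);

	/* The winid bound exists only because the demo table is small. */
	if (vinst->vas_id != vasid || winid >= DEMO_MAX_WINDOWS)
		return NULL;

	rxwin = vinst->windows[winid];
	if (!rxwin || rxwin->tx_win || rxwin->cop != DEMO_COP_FTW)
		return NULL;

	return rxwin;
}

/* vas_tx_win_open(): with vasid == -1, take the VAS id from the PSWID. */
static int demo_pick_vasid(int vasid, uint32_t pswid)
{
	if (vasid == -1 && pswid)
		demo_decode_pswid(pswid, &vasid, NULL);
	return vasid;
}
```

A send window, an NX (842) receive window, an empty slot and a PSWID from another VAS instance are all rejected. Only a PSWID naming an FTW receive window on this instance is returned.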



[PATCH v2 17/18] powerpc/vas: Define vas_win_id()

2017-10-06 Thread Sukadev Bhattiprolu
Define an interface to return a system-wide unique id for a given VAS
window.

vas_win_id() will be used in a follow-on patch to generate a unique
handle for a user space receive window. Applications can use this handle
to pair send and receive windows for fast thread-wakeup.

The hardware refers to this system-wide unique id as a Partition Send
Window ID which is expected to be used during fault handling. Hence the
"pswid" in the function names.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/vas.h              |  5 +
 arch/powerpc/platforms/powernv/vas-window.c |  9 +
 arch/powerpc/platforms/powernv/vas.h        | 28
 3 files changed, 42 insertions(+)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index f98ade8..7714562 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -168,6 +168,11 @@ int vas_copy_crb(void *crb, int offset);
 int vas_paste_crb(struct vas_window *win, int offset, bool re);
 
 /*
+ * Return a system-wide unique id for the VAS window @win.
+ */
+extern u32 vas_win_id(struct vas_window *win);
+
+/*
  * Return the power bus paste address associated with @win so the caller
  * can map that address into their address space.
  */
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index e4a9c7b..1d08b64 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1239,3 +1239,12 @@ int vas_win_close(struct vas_window *window)
return 0;
 }
 EXPORT_SYMBOL_GPL(vas_win_close);
+
+/*
+ * Return a system-wide unique window id for the window @win.
+ */
+u32 vas_win_id(struct vas_window *win)
+{
+   return encode_pswid(win->vinst->vas_id, win->winid);
+}
+EXPORT_SYMBOL_GPL(vas_win_id);
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index 145749a..78a8926 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -447,4 +447,32 @@ static inline u64 read_hvwc_reg(struct vas_window *win,
return in_be64(win->hvwc_map+reg);
 }
 
+/*
+ * Encode/decode the Partition Send Window ID (PSWID) for a window in
+ * a way that we can uniquely identify any window in the system. i.e.
+ * we should be able to locate the 'struct vas_window' given the PSWID.
+ *
+ * Bits    Usage
+ * 0:7     VAS id (8 bits)
+ * 8:15    Unused, 0 (8 bits)
+ * 16:31   Window id (16 bits)
+ */
+static inline u32 encode_pswid(int vasid, int winid)
+{
+   u32 pswid = 0;
+
+   pswid |= vasid << (31 - 7);
+   pswid |= winid;
+
+   return pswid;
+}
+
+static inline void decode_pswid(u32 pswid, int *vasid, int *winid)
+{
+   if (vasid)
+   *vasid = pswid >> (31 - 7) & 0xFF;
+
+   if (winid)
+   *winid = pswid & 0xFFFF;
+}
 #endif /* _VAS_H */
-- 
2.7.4
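encode_pswid() puts the VAS id in IBM bits 0:7 (a left shift by 31 - 7 = 24) and the window id in bits 16:31, so decode_pswid() can recover both halves with a shift and a mask. A small user-space round-trip sketch follows. It differs from the patch in one deliberate way: the VAS id is cast to uint32_t before shifting, because shifting an int value of 128 or more into bit 31 is undefined behaviour in ISO C. The range asserts are also additions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * PSWID layout (IBM MSB-0 numbering, as in the vas.h comment):
 *   bits 0:7    VAS id     -> value bits 31..24
 *   bits 8:15   unused, 0  -> value bits 23..16
 *   bits 16:31  window id  -> value bits 15..0
 */
static uint32_t demo_encode_pswid(int vasid, int winid)
{
	assert(vasid >= 0 && vasid <= 0xFF);
	assert(winid >= 0 && winid <= 0xFFFF);

	/* Cast first: shifting an int of 128 or more into bit 31 is UB. */
	return ((uint32_t)vasid << (31 - 7)) | (uint32_t)winid;
}

static void demo_decode_pswid(uint32_t pswid, int *vasid, int *winid)
{
	if (vasid)
		*vasid = (pswid >> (31 - 7)) & 0xFF;
	if (winid)
		*winid = pswid & 0xFFFF;
}
```

For example, VAS id 1 and window id 5 encode to 0x01000005. Either output pointer of demo_decode_pswid() may be NULL, matching how vas_tx_win_open() asks only for the VAS id.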



[PATCH v2 16/18] powerpc/vas: Define vas_win_paste_addr()

2017-10-06 Thread Sukadev Bhattiprolu
Define an interface that the NX drivers can use to find the physical
paste address of a send window. This interface is expected to be used
with the mmap() operation of the NX driver's device, i.e. the user space
process can use the driver's mmap() operation to map the send window's paste
address into its address space and then use copy and paste instructions
to submit the CRBs to the NX engine.

Note that kernel drivers will use vas_paste_crb() directly and don't need
this interface.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/vas.h              |  7 +++
 arch/powerpc/platforms/powernv/vas-window.c | 10 ++
 2 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index 044748f..f98ade8 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -10,6 +10,8 @@
 #ifndef _ASM_POWERPC_VAS_H
 #define _ASM_POWERPC_VAS_H
 
+struct vas_window;
+
 /*
  * Min and max FIFO sizes are based on Version 1.05 Section 3.1.4.25
  * (Local FIFO Size Register) of the VAS workbook.
@@ -165,4 +167,9 @@ int vas_copy_crb(void *crb, int offset);
  */
 int vas_paste_crb(struct vas_window *win, int offset, bool re);
 
+/*
+ * Return the power bus paste address associated with @win so the caller
+ * can map that address into their address space.
+ */
+extern u64 vas_win_paste_addr(struct vas_window *win);
 #endif /* __ASM_POWERPC_VAS_H */
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 088ce56..e4a9c7b 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -40,6 +40,16 @@ static void compute_paste_address(struct vas_window *window, u64 *addr, int *len
pr_debug("Txwin #%d: Paste addr 0x%llx\n", winid, *addr);
 }
 
+u64 vas_win_paste_addr(struct vas_window *win)
+{
+   u64 addr;
+
+   compute_paste_address(win, &addr, NULL);
+
+   return addr;
+}
+EXPORT_SYMBOL(vas_win_paste_addr);
+
 static inline void get_hvwc_mmio_bar(struct vas_window *window,
u64 *start, int *len)
 {
-- 
2.7.4



[PATCH v2 15/18] powerpc: Emulate paste instruction

2017-10-06 Thread Sukadev Bhattiprolu
From: Michael Neuling <mi...@neuling.org>

On POWER9 DD2.1 and below there are issues when the paste instruction
generates an error. If an error occurs when thread reconfiguration
happens (i.e. another thread in the core goes into/out of powersave) the
core may hang.

To avoid this, a special sequence is required which stops thread
reconfiguration so that the paste can be safely executed.

This patch assumes pastes executed in userspace are trapped into the
illegal instruction exception at 0xe40.

Here we re-execute the paste instruction but with the required
sequence to ensure thread reconfiguration doesn't occur.

Signed-off-by: Michael Neuling <mi...@neuling.org>
Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---

Edit by Sukadev: Use PPC_PASTE() rather than the paste instruction since
in older versions the instruction required a third parameter.
---
 arch/powerpc/include/asm/emulated_ops.h |  1 +
 arch/powerpc/include/asm/ppc-opcode.h   |  1 +
 arch/powerpc/include/asm/reg.h          |  2 ++
 arch/powerpc/kernel/traps.c             | 64 +
 4 files changed, 68 insertions(+)

diff --git a/arch/powerpc/include/asm/emulated_ops.h b/arch/powerpc/include/asm/emulated_ops.h
index f00e10e..9247af9 100644
--- a/arch/powerpc/include/asm/emulated_ops.h
+++ b/arch/powerpc/include/asm/emulated_ops.h
@@ -55,6 +55,7 @@ extern struct ppc_emulated {
struct ppc_emulated_entry mfdscr;
struct ppc_emulated_entry mtdscr;
struct ppc_emulated_entry lq_stq;
+   struct ppc_emulated_entry paste;
 #endif
 } ppc_emulated;
 
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index ce0930d..a55d2ef 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -229,6 +229,7 @@
 #define PPC_INST_MTTMR 0x7c0003dc
 #define PPC_INST_NOP   0x60000000
 #define PPC_INST_PASTE 0x7c20070d
+#define PPC_INST_PASTE_MASK 0xfc2007ff
 #define PPC_INST_POPCNTB   0x7c0000f4
 #define PPC_INST_POPCNTB_MASK  0xfc0007fe
 #define PPC_INST_POPCNTD   0x7c0003f4
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f92eaf7..5cde1c4 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -468,6 +468,8 @@
 #define SPRN_DBAT7U 0x23E   /* Data BAT 7 Upper Register */
 #define SPRN_PPR   0x380   /* SMT Thread status Register */
 #define SPRN_TSCR  0x399   /* Thread Switch Control Register */
+#define SPRN_TRIG1 0x371   /* WAT Trigger 1 */
+#define SPRN_TRIG2 0x372   /* WAT Trigger 2 */
 
 #define SPRN_DEC   0x016   /* Decrement Register */
 #define SPRN_DER   0x095   /* Debug Enable Register */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 13c9dcd..7e6b1fe 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -956,6 +956,65 @@ static inline bool tm_abort_check(struct pt_regs *regs, int reason)
 }
 #endif
 
+static DEFINE_SPINLOCK(paste_emulation_lock);
+
+static inline int paste(void *i)
+{
+   int cr;
+   long retval = 0;
+
+   /* Need per core lock to ensure trig1/2 writes don't race */
+   spin_lock(&paste_emulation_lock);
+   mtspr(SPRN_TRIG1, 0); /* data doesn't matter */
+   mtspr(SPRN_TRIG1, 0); /* HW says do this twice */
+   asm volatile(
+   "1: " PPC_PASTE(0, %2) "\n"
+   "2: mfcr %1\n"
+   ".section .fixup,\"ax\"\n"
+   "3: li %0,%3\n"
+   "   li %2,0\n"
+   "   b 2b\n"
+   ".previous\n"
+   EX_TABLE(1b, 3b)
+   : "=r" (retval), "=r" (cr)
+   : "b" (i), "i" (-EFAULT), "0" (retval));
+   mtspr(SPRN_TRIG2, 0);
+   spin_unlock(&paste_emulation_lock);
+   return cr;
+}
+
+static int emulate_paste(struct pt_regs *regs, u32 instword)
+{
+   const void __user *addr;
+   unsigned long ea;
+   u8 ra, rb;
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   return -EINVAL;
+
+   ra = (instword >> 16) & 0x1f;
+   rb = (instword >> 11) & 0x1f;
+
+   ea = regs->gpr[rb] + (ra ? regs->gpr[ra] : 0ul);
+   if (is_32bit_task())
+   ea &= 0xfffffffful;
+   addr = (__force const void __user *)ea;
+
+   if (!access_ok(VERIFY_WRITE, addr, 128)) // cacheline size == 128
+   return -EFAULT;
+
+   hard_irq_disable(); /* FIXME: could we just soft disable ?? */
+   pagefault_disable();
+
+   PPC_WARN_EMULATED(paste, regs);
+   regs->ccr = paste((void *)addr);
+
+   pagefault_enable();
+   may_hard_irq_enable();
+
+   return 0;
+}
+
 stat
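emulate_paste() needs two things from the trapped instruction word: confirmation that it is a paste (via PPC_INST_PASTE_MASK) and the RA/RB fields used to form the effective address. The user-space sketch below reproduces only that decoding. The TRIG1/TRIG2 SPR sequence, the spinlock and access_ok() are kernel-only and are left out, and the gpr array and is_32bit flag are hypothetical stand-ins for struct pt_regs and is_32bit_task().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PPC_INST_PASTE		0x7c20070d
#define PPC_INST_PASTE_MASK	0xfc2007ff

_Static_assert((PPC_INST_PASTE & PPC_INST_PASTE_MASK) == PPC_INST_PASTE,
	       "the mask keeps every fixed bit of the paste. encoding");

/* Ignore RA/RB (and the other bits outside the mask) when matching. */
static bool demo_is_paste(uint32_t instword)
{
	return (instword & PPC_INST_PASTE_MASK) == PPC_INST_PASTE;
}

/*
 * EA = (RA ? GPR[RA] : 0) + GPR[RB], truncated to 32 bits for a
 * 32-bit task, as in emulate_paste().
 */
static uint64_t demo_paste_ea(const uint64_t gpr[32], uint32_t instword,
			      bool is_32bit)
{
	unsigned int ra = (instword >> 16) & 0x1f;
	unsigned int rb = (instword >> 11) & 0x1f;
	uint64_t ea = gpr[rb] + (ra ? gpr[ra] : 0);

	if (is_32bit)
		ea &= 0xffffffffULL;
	return ea;
}
```

For example, paste. with RA = 3 and RB = 4 encodes as 0x7c23270d. With RA = 0 the first operand is treated as zero rather than as the contents of r0, which is the usual Power ISA convention for RA|0 addressing.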

[PATCH v2 14/18] powerpc: Define set_thread_used_vas()

2017-10-06 Thread Sukadev Bhattiprolu
A CP_ABORT instruction is required in processes that have mapped a VAS
"paste address" with the intention of using COPY/PASTE instructions.
But since CP_ABORT is expensive, we want to restrict it to only processes
that use/intend to use COPY/PASTE.

Define an interface, set_thread_used_vas(), that VAS can use to indicate
that the current process opened a send window. During context switch,
issue CP_ABORT only for processes that have the flag set.

Thanks for input from Nick Piggin, Michael Ellerman.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/processor.h |  2 ++
 arch/powerpc/include/asm/switch_to.h |  2 ++
 arch/powerpc/kernel/process.c        | 32 ++--
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 58cc212..bdab3b74 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -341,7 +341,9 @@ struct thread_struct {
unsigned long   sier;
unsigned long   mmcr2;
unsigned        mmcr0;
+
unsigned        used_ebb;
+   unsigned int    used_vas;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index f5da32f..aeb305b 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -91,6 +91,8 @@ static inline void clear_task_ebb(struct task_struct *t)
 #endif
 }
 
+extern int set_thread_used_vas(void);
+
 extern int set_thread_tidr(struct task_struct *t);
 extern void clear_thread_tidr(struct task_struct *t);
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index d861fcd..cb5f108 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1234,17 +1234,17 @@ struct task_struct *__switch_to(struct task_struct *prev,
 * The copy-paste buffer can only store into foreign real
 * addresses, so unprivileged processes can not see the
 * data or use it in any way unless they have foreign real
-* mappings. We don't have a VAS driver that allocates those
-* yet, so no cpabort is required.
+* mappings. If the new process has the foreign real address
+* mappings, we must issue a cp_abort to clear any state and
+* prevent a covert channel being setup.
+*
+* DD1 allows paste into normal system memory so we do an
+* unpaired copy, rather than cp_abort, to clear the buffer,
+* since cp_abort is quite expensive.
 */
-   if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
-   /*
-* DD1 allows paste into normal system memory, so we
-* do an unpaired copy here to clear the buffer and
-* prevent a covert channel being set up.
-*
-* cpabort is not used because it is quite expensive.
-*/
+   if (new_thread->used_vas) {
+   asm volatile(PPC_CP_ABORT);
+   } else if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
asm volatile(PPC_COPY(%0, %1)
: : "r"(dummy_copy_buffer), "r"(0));
}
@@ -1445,6 +1445,18 @@ void flush_thread(void)
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 }
 
+int set_thread_used_vas(void)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   return -EINVAL;
+
+   current->thread.used_vas = 1;
+
+#endif /* CONFIG_PPC_BOOK3S_64 */
+   return 0;
+}
+
 #ifdef CONFIG_PPC64
 static DEFINE_SPINLOCK(vas_thread_id_lock);
 static DEFINE_IDA(vas_thread_ida);
-- 
2.7.4
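After this patch, __switch_to() picks one of three actions for the incoming thread: cp_abort if the thread has opened a send window, the cheaper unpaired copy on POWER9 DD1, or nothing. A user-space sketch of that decision and of set_thread_used_vas() follows. The demo_* names and the boolean CPU-feature parameters are hypothetical stand-ins for cpu_has_feature() and thread_struct, and the CONFIG_PPC_BOOK3S_64 #ifdef is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* What the context switch does to the copy-paste buffer. */
enum demo_cp_action {
	DEMO_CP_NONE,		/* no action */
	DEMO_CP_ABORT,		/* cp_abort: thread opened a send window */
	DEMO_CP_UNPAIRED_COPY,	/* unpaired copy: cheaper clear on POWER9 DD1 */
};

/* HYPOTHETICAL stand-in for the used_vas field of thread_struct. */
struct demo_thread {
	bool used_vas;
};

/* Like set_thread_used_vas(), minus the CONFIG_PPC_BOOK3S_64 #ifdef. */
static int demo_set_thread_used_vas(struct demo_thread *t, bool cpu_is_arch300)
{
	if (!cpu_is_arch300)
		return -EINVAL;
	t->used_vas = true;
	return 0;
}

/* The if/else-if chain in __switch_to() for the incoming thread. */
static enum demo_cp_action demo_switch_cp_action(bool new_used_vas,
						 bool cpu_is_p9_dd1)
{
	if (new_used_vas)
		return DEMO_CP_ABORT;
	if (cpu_is_p9_dd1)
		return DEMO_CP_UNPAIRED_COPY;
	return DEMO_CP_NONE;
}
```

Checking used_vas first means a VAS user always gets the full cp_abort, even on DD1, while every other thread avoids its cost.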



[PATCH v2 13/18] powerpc: Add support for setting SPRN_TIDR

2017-10-06 Thread Sukadev Bhattiprolu
We need the SPRN_TIDR to be set for use with fast thread-wakeup (core-
to-core wakeup) and also with CAPI.

Each thread in a process needs to have a unique id within the process.
But as explained below, for now, we assign globally unique thread ids
to all threads in the system.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Signed-off-by: Philippe Bergheaud <fe...@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clomb...@linux.vnet.ibm.com>
---
Changelog[v3]
- Merge changes with Christophe's patch and address review comments on it
  (i.e. drop CONFIG_PPC_VAS; use CONFIG_PPC64; check CPU_ARCH_300
  before setting TIDR). Defer the following to separate patches:
- emulation parts of Christophe's patch,
- setting TIDR for tasks other than 'current'
- setting feature bit in AT_HWCAP2

Changelog[v2]
- Michael Ellerman: Use an interface to assign TIDR so it is
assigned to only threads that need it; move assignment to
restore_sprs(). Drop lint from rebase;
---
 arch/powerpc/include/asm/processor.h |   1 +
 arch/powerpc/include/asm/switch_to.h |   3 +
 arch/powerpc/kernel/process.c        | 122 +++
 3 files changed, 126 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index fab7ff8..58cc212 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -329,6 +329,7 @@ struct thread_struct {
 */
int dscr_inherit;
unsigned long   ppr;/* used to save/restore SMT priority */
+   unsigned long   tidr;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
unsigned long   tar;
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 17c8380..f5da32f 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -91,4 +91,7 @@ static inline void clear_task_ebb(struct task_struct *t)
 #endif
 }
 
+extern int set_thread_tidr(struct task_struct *t);
+extern void clear_thread_tidr(struct task_struct *t);
+
 #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 37ed60b..d861fcd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1120,6 +1120,13 @@ static inline void restore_sprs(struct thread_struct *old_thread,
mtspr(SPRN_TAR, new_thread->tar);
}
 #endif
+#ifdef CONFIG_PPC64
+   if (old_thread->tidr != new_thread->tidr) {
+   /* TIDR should be non-zero only with ISA3.0. */
+   WARN_ON_ONCE(!cpu_has_feature(CPU_FTR_ARCH_300));
+   mtspr(SPRN_TIDR, new_thread->tidr);
+   }
+#endif
 }
 
 #ifdef CONFIG_PPC_BOOK3S_64
@@ -1438,9 +1445,117 @@ void flush_thread(void)
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 }
 
+#ifdef CONFIG_PPC64
+static DEFINE_SPINLOCK(vas_thread_id_lock);
+static DEFINE_IDA(vas_thread_ida);
+
+/*
+ * We need to assign a unique thread id to each thread in a process.
+ *
+ * This thread id, referred to as TIDR, and separate from Linux's tgid,
+ * is intended to be used to direct an ASB_Notify from the hardware to the
+ * thread, when a suitable event occurs in the system.
+ *
+ * One such event is a "paste" instruction in the context of Fast Thread
+ * Wakeup (aka Core-to-core wake up) in the Virtual Accelerator Switchboard
+ * (VAS) in POWER9.
+ *
+ * To get a unique TIDR per process we could simply reuse task_pid_nr() but
+ * the problem is that task_pid_nr() is not yet available when copy_thread()
+ * is called. Fixing that would require more intrusive changes to the
+ * arch-neutral code path in copy_process().
+ *
+ * Further, to assign unique TIDRs within each process, we need an atomic
+ * field (or an IDR) in task_struct, which again intrudes into the arch-
+ * neutral code. So try to assign globally unique TIDRs for now.
+ *
+ * NOTE: TIDR 0 indicates that the thread does not need a TIDR value.
+ *  For now, only threads that expect to be notified by the VAS
+ *  hardware need a TIDR value and we assign values > 0 for those.
+ */
+#define MAX_THREAD_CONTEXT ((1 << 16) - 1)
+static int assign_thread_tidr(void)
+{
+   int index;
+   int err;
+
+again:
+   if (!ida_pre_get(&vas_thread_ida, GFP_KERNEL))
+   return -ENOMEM;
+
+   spin_lock(&vas_thread_id_lock);
+   err = ida_get_new_above(&vas_thread_ida, 1, &index);
+   spin_unlock(&vas_thread_id_lock);
+
+   if (err == -EAGAIN)
+   goto again;
+   else if (err)
+   return err;
+
+   if (index > MAX_THREAD_CONTEXT) {
+   spin_lock(&vas_thread_id_lock);
+   ida_remove(&vas_thread_ida, index);
+   spin_unlock(&vas_thread_id_lock);
+   return -ENOMEM;
+   }
+
+   return index;
+}
+
+stati
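assign_thread_tidr() hands out IDs from an IDA starting at 1, so that 0 keeps meaning "no TIDR", and gives back any ID above MAX_THREAD_CONTEXT. The sketch below is a simplified user-space stand-in, not the IDA API: it uses a flag array and returns the lowest free ID. It is single-threaded, so the vas_thread_id_lock spinlock has no counterpart, and demo_free_tidr() is a hypothetical release helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_THREAD_CONTEXT	((1 << 16) - 1)

/* One flag per possible TIDR; TIDR 0 means "no TIDR" and is never used. */
static bool demo_tidr_used[MAX_THREAD_CONTEXT + 1];

/* Return the lowest free TIDR >= 1, or -ENOMEM if all are taken. */
static int demo_assign_tidr(void)
{
	for (int id = 1; id <= MAX_THREAD_CONTEXT; id++) {
		if (!demo_tidr_used[id]) {
			demo_tidr_used[id] = true;
			return id;
		}
	}
	return -ENOMEM;
}

/* HYPOTHETICAL release helper, the counterpart of ida_remove(). */
static void demo_free_tidr(int id)
{
	assert(id >= 1 && id <= MAX_THREAD_CONTEXT);
	demo_tidr_used[id] = false;
}
```

Because IDs are globally unique rather than per process, a freed ID becomes available to any thread in the system. The linear scan is only for illustration; the IDA avoids it.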

[PATCH v2 12/18] powerpc: have copy depend on CONFIG_BOOK3S_64

2017-10-06 Thread Sukadev Bhattiprolu
Have the COPY/PASTE instructions depend on CONFIG_BOOK3S_64 rather than
CONFIG_PPC_STD_MMU_64.

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/process.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bb..37ed60b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1215,10 +1215,14 @@ struct task_struct *__switch_to(struct task_struct *prev,
batch = this_cpu_ptr(_tlb_batch);
batch->active = 1;
}
+#endif /* CONFIG_PPC_STD_MMU_64 */
 
if (current_thread_info()->task->thread.regs) {
+#ifdef CONFIG_PPC_STD_MMU_64
restore_math(current_thread_info()->task->thread.regs);
+#endif /* CONFIG_PPC_STD_MMU_64 */
 
+#ifdef CONFIG_PPC_BOOK3S_64
/*
 * The copy-paste buffer can only store into foreign real
 * addresses, so unprivileged processes can not see the
@@ -1237,8 +1241,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
asm volatile(PPC_COPY(%0, %1)
: : "r"(dummy_copy_buffer), "r"(0));
}
+#endif /* CONFIG_PPC_BOOK3S_64 */
}
-#endif /* CONFIG_PPC_STD_MMU_64 */
 
return last;
 }
-- 
2.7.4
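To see what splitting the single CONFIG_PPC_STD_MMU_64 block into two guards buys, here is a user-space sketch of the resulting layout. The stub functions and the hard-coded `#define` are hypothetical: they stand in for restore_math(), the dummy PPC_COPY and Kconfig, and the combination chosen here (only CONFIG_PPC_BOOK3S_64 set) is picked purely to show that the two steps now toggle independently, not to claim that such a kernel configuration exists.

```c
#include <assert.h>

/*
 * Hypothetical Kconfig state for this sketch: only the Book3S-64 symbol is
 * set, so the two guarded steps below can be seen to toggle independently.
 */
#define CONFIG_PPC_BOOK3S_64 1
/* #define CONFIG_PPC_STD_MMU_64 1 */

static int math_restored;	/* counts calls to the restore_math() stand-in */
static int copy_issued;		/* counts calls to the dummy-copy stand-in */

static void restore_math_stub(void) { math_restored++; }
static void dummy_copy_stub(void)   { copy_issued++; }

/* Mirrors the guard layout the patch gives the tail of __switch_to(). */
static void switch_to_tail(int has_regs)
{
	if (has_regs) {
#ifdef CONFIG_PPC_STD_MMU_64
		restore_math_stub();
#endif
#ifdef CONFIG_PPC_BOOK3S_64
		dummy_copy_stub();
#endif
	}
}
```

With this configuration, a task that has `regs` gets the copy-buffer step but not the math restore; defining both symbols would run both, as before the patch.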



[PATCH v2 11/18] powerpc/vas: Export HVWC to debugfs

2017-10-06 Thread Sukadev Bhattiprolu
Export the VAS Window context information to debugfs.

We need to hold a mutex when closing the window to prevent a race
with the debugfs read(). Rather than introduce a per-instance mutex,
we use the global vas_mutex for now, since it is not heavily contended.

The window->cop field is only relevant to a receive window, so we were
not setting it for a send window (which is paired with a receive window
anyway). But to simplify reporting in debugfs, set the 'cop' field for the
send window as well.
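The locking rule described above (clear the mapping under the mutex on close, and test it under the same mutex in the debugfs show routine) can be sketched in user space with a pthread mutex. `struct window`, `show()` and `close_window()` are hypothetical simplifications of vas_window, info_dbg_show() and the window-close path; only the check-under-lock pattern is meant to carry over.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct vas_window: just what show() needs. */
struct window {
	unsigned long *hvwc_map;	/* NULL once the window is unmapped */
	int pid;
};

/* Plays the role of the global vas_mutex. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Reader side, like info_dbg_show(): returns chars written, 0 if unmapped. */
static int show(struct window *w, char *buf, size_t len)
{
	int n = 0;

	pthread_mutex_lock(&global_mutex);
	if (w->hvwc_map)		/* ensure window is not unmapped */
		n = snprintf(buf, len, "Pid : %d\n", w->pid);
	pthread_mutex_unlock(&global_mutex);
	return n;
}

/* Close side: clear the mapping under the same mutex, so show() either
 * runs entirely before the unmap or sees hvwc_map == NULL afterwards. */
static void close_window(struct window *w)
{
	assert(w != NULL);
	pthread_mutex_lock(&global_mutex);
	w->hvwc_map = NULL;
	pthread_mutex_unlock(&global_mutex);
}
```

Using one global lock rather than a per-window one matches the trade-off stated above: it is simpler, and the lock is not expected to be heavily contended.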

Signed-off-by: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile     |   3 +-
 arch/powerpc/platforms/powernv/vas-debug.c  | 209 
 arch/powerpc/platforms/powernv/vas-window.c |  34 -
 arch/powerpc/platforms/powernv/vas.c        |   6 +-
 arch/powerpc/platforms/powernv/vas.h        |  14 ++
 5 files changed, 259 insertions(+), 7 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/vas-debug.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 37d60f7..17921c4 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -14,4 +14,5 @@ obj-$(CONFIG_TRACEPOINTS) += opal-tracepoints.o
 obj-$(CONFIG_OPAL_PRD) += opal-prd.o
 obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
-obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o
+obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
+obj-$(CONFIG_PPC_FTW)  += nx-ftw.o
diff --git a/arch/powerpc/platforms/powernv/vas-debug.c b/arch/powerpc/platforms/powernv/vas-debug.c
new file mode 100644
index 000..ca22f1e
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright 2016-17 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "vas: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include "vas.h"
+
+static struct dentry *vas_debugfs;
+
+static char *cop_to_str(int cop)
+{
+	switch (cop) {
+	case VAS_COP_TYPE_FAULT:	return "Fault";
+	case VAS_COP_TYPE_842:		return "NX-842 Normal Priority";
+	case VAS_COP_TYPE_842_HIPRI:	return "NX-842 High Priority";
+	case VAS_COP_TYPE_GZIP:		return "NX-GZIP Normal Priority";
+	case VAS_COP_TYPE_GZIP_HIPRI:	return "NX-GZIP High Priority";
+	case VAS_COP_TYPE_FTW:		return "Fast Thread-wakeup";
+	default:			return "Unknown";
+	}
+}
+
+static int info_dbg_show(struct seq_file *s, void *private)
+{
+	struct vas_window *window = s->private;
+
+	mutex_lock(&vas_mutex);
+
+	/* ensure window is not unmapped */
+	if (!window->hvwc_map)
+		goto unlock;
+
+	seq_printf(s, "Type: %s, %s\n", cop_to_str(window->cop),
+				window->tx_win ? "Send" : "Receive");
+	seq_printf(s, "Pid : %d\n", window->pid);
+
+unlock:
+	mutex_unlock(&vas_mutex);
+	return 0;
+}
+
+static int info_dbg_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, info_dbg_show, inode->i_private);
+}
+
+static const struct file_operations info_fops = {
+	.open		= info_dbg_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static inline void print_reg(struct seq_file *s, struct vas_window *win,
+			char *name, u32 reg)
+{
+	seq_printf(s, "0x%016llx %s\n", read_hvwc_reg(win, name, reg), name);
+}
+
+static int hvwc_dbg_show(struct seq_file *s, void *private)
+{
+	struct vas_window *window = s->private;
+
+	mutex_lock(&vas_mutex);
+
+	/* ensure window is not unmapped */
+	if (!window->hvwc_map)
+		goto unlock;
+
+	print_reg(s, window, VREG(LPID));
+	print_reg(s, window, VREG(PID));
+	print_reg(s, window, VREG(XLATE_MSR));
+	print_reg(s, window, VREG(XLATE_LPCR));
+	print_reg(s, window, VREG(XLATE_CTL));
+	print_reg(s, window, VREG(AMR));
+	print_reg(s, window, VREG(SEIDR));
+	print_reg(s, window, VREG(FAULT_TX_WIN));
+	print_reg(s, window, VREG(OSU_INTR_SRC_RA));
+	print_reg(s, window, VREG(HV_INTR_SRC_RA));
+	print_reg(s, window, VREG(PSWID));
+	print_reg(s, window, VREG(LFIFO_BAR));
+	print_reg(s, window, VREG(LDATA_STAMP_CTL));
+	print_reg(s, window, VREG(LDMA_CACHE_CTL));
+	print_reg(s, window, VREG(LRFIFO_PUSH));
+	print_reg(s, window, VRE
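print_reg() takes a register name and an offset, yet each call above passes a single VREG(...) argument, so VREG() presumably expands to a `"name", offset` pair. The sketch below assumes that shape; the offsets, `read_reg()` and `format_reg()` are hypothetical, and output goes to a buffer instead of a seq_file so it can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register offsets; the real ones live in vas.h. */
#define VAS_LPID_OFFSET	0x010
#define VAS_PID_OFFSET	0x018

/* Assumed shape of VREG(): stringify the name and paste the offset macro. */
#define VREG(r)	#r, VAS_##r##_OFFSET

/* Hypothetical register reader: returns a fake value derived from the offset. */
static uint64_t read_reg(const char *name, uint32_t reg)
{
	assert(name != NULL);
	return 0x1000u + reg;
}

/* Same format string as print_reg() in the patch, written into a buffer. */
static int format_reg(char *buf, size_t len, const char *name, uint32_t reg)
{
	return snprintf(buf, len, "0x%016llx %s\n",
			(unsigned long long)read_reg(name, reg), name);
}
```

Because format_reg() is a function rather than a macro, the comma produced by `VREG(LPID)` simply separates two ordinary arguments, so `format_reg(buf, sizeof(buf), VREG(LPID))` writes `0x0000000000001010 LPID`.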
