Re: [PATCH] powerpc/devicetrees: Change 'gpios' to 'cs-gpios' on fsl,spi nodes

2019-12-13 Thread Christophe Leroy




On 13/12/2019 at 22:34, Rob Herring wrote:

On Thu, Nov 28, 2019 at 12:16:35PM +, Christophe Leroy wrote:

Since commit 0f0581b24bd0 ("spi: fsl: Convert to use CS GPIO
descriptors"), the preferred way to define chipselect GPIOs is to use the
'cs-gpios' property instead of the legacy 'gpios' property.


This will break using a new dtb on a kernel without the above commit. Or
with any OS that never made the change.


Why would anybody use a new dtb on an old kernel? I have not tagged 
this change for stable, so it will only apply to DTBs in new kernels, 
won't it?


That's not the first time DTS files have had to change for new kernels. For 
instance, some time ago I had to replace every 'gpios' property with a set 
of 'rdy-gpio', 'nce-gpio', 'ale-gpio' and 'cle-gpio' properties to 
continue using the 'gpio-control-nand' driver.




I'm fine with the doc change, but you should keep 'gpios' as deprecated.


Ok

Christophe


[PATCH v11 23/25] mm/gup: track FOLL_PIN pages

2019-12-13 Thread John Hubbard
Add tracking of pages that were pinned via FOLL_PIN.

As mentioned in the FOLL_PIN documentation, callers who effectively set
FOLL_PIN are required to ultimately free such pages via unpin_user_page().
The effect is similar to FOLL_GET, and may be thought of as "FOLL_GET
for DIO and/or RDMA use".

Pages that have been pinned via FOLL_PIN are identifiable via a
new function call:

   bool page_dma_pinned(struct page *page);

What to do in response to encountering such a page is left to later
patchsets. There is discussion about this in [1], [2], and [3].

This also changes a BUG_ON() to a WARN_ON() in follow_page_mask().

[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/

Suggested-by: Jan Kara 
Suggested-by: Jérôme Glisse 
Cc: Kirill A. Shutemov 
Signed-off-by: John Hubbard 
---

Hi Jan,

This should address all of your comments for patch 23!

thanks,
John Hubbard
NVIDIA

 Documentation/core-api/pin_user_pages.rst |   2 +-
 include/linux/mm.h|  83 -
 include/linux/mmzone.h|   2 +
 include/linux/page_ref.h  |  10 +
 mm/gup.c  | 409 +-
 mm/huge_memory.c  |  28 +-
 mm/hugetlb.c  |  38 +-
 mm/vmstat.c   |   2 +
 8 files changed, 438 insertions(+), 136 deletions(-)

diff --git a/Documentation/core-api/pin_user_pages.rst 
b/Documentation/core-api/pin_user_pages.rst
index 1d490155ecd7..2db14df1f2d7 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -53,7 +53,7 @@ Which flags are set by each wrapper
 For these pin_user_pages*() functions, FOLL_PIN is OR'd in with whatever gup
 flags the caller provides. The caller is required to pass in a non-null struct
 pages* array, and the function then pin pages by incrementing each by a special
-value. For now, that value is +1, just like get_user_pages*().::
+value: GUP_PIN_COUNTING_BIAS.::
 
  Function
  
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6a1a357e7d86..bb44c4d2ada7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1016,6 +1016,8 @@ static inline void get_page(struct page *page)
page_ref_inc(page);
 }
 
+bool __must_check try_grab_page(struct page *page, unsigned int flags);
+
 static inline __must_check bool try_get_page(struct page *page)
 {
page = compound_head(page);
@@ -1044,29 +1046,80 @@ static inline void put_page(struct page *page)
__put_page(page);
 }
 
-/**
- * unpin_user_page() - release a gup-pinned page
- * @page:pointer to page to be released
+/*
+ * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload
+ * the page's refcount so that two separate items are tracked: the original 
page
+ * reference count, and also a new count of how many pin_user_pages() calls 
were
+ * made against the page. ("gup-pinned" is another term for the latter).
+ *
+ * With this scheme, pin_user_pages() becomes special: such pages are marked as
+ * distinct from normal pages. As such, the unpin_user_page() call (and its
+ * variants) must be used in order to release gup-pinned pages.
+ *
+ * Choice of value:
+ *
+ * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference
+ * counts with respect to pin_user_pages() and unpin_user_page() becomes
+ * simpler, due to the fact that adding an even power of two to the page
+ * refcount has the effect of using only the upper N bits, for the code that
+ * counts up using the bias value. This means that the lower bits are left for
+ * the exclusive use of the original code that increments and decrements by one
+ * (or at least, by much smaller values than the bias value).
  *
- * Pages that were pinned via pin_user_pages*() must be released via either
- * unpin_user_page(), or one of the unpin_user_pages*() routines. This is so
- * that eventually such pages can be separately tracked and uniquely handled. 
In
- * particular, interactions with RDMA and filesystems need special handling.
+ * Of course, once the lower bits overflow into the upper bits (and this is
+ * OK, because subtraction recovers the original values), then visual 
inspection
+ * no longer suffices to directly view the separate counts. However, for normal
+ * applications that don't have huge page reference counts, this won't be an
+ * issue.
  *
- * unpin_user_page() and put_page() are not interchangeable, despite this early
- * implementation that makes them look the same. unpin_user_page() calls must
- * be perfectly matched up with pin*() calls.
+ * Locking: the lockless algorithm described in page_cache_get_speculative()
+ * and page_cache_gup_pin_speculative() provides safe 
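
As a rough illustration of the bias idea described in the comment above -- an
illustration only, not the mm/gup.c implementation -- the following standalone
sketch models how a single reference count can carry both an ordinary refcount
and a pin count once pinning adds a power-of-two bias. The helper names and the
example bias value of 1024 are assumptions made for the sketch:

#include <stdbool.h>
#include <stdio.h>

#define GUP_PIN_COUNTING_BIAS 1024	/* must be a power of two */

struct fake_page {
	int refcount;
};

static void fake_get_page(struct fake_page *p)   { p->refcount += 1; }
static void fake_put_page(struct fake_page *p)   { p->refcount -= 1; }
static void fake_pin_page(struct fake_page *p)   { p->refcount += GUP_PIN_COUNTING_BIAS; }
static void fake_unpin_page(struct fake_page *p) { p->refcount -= GUP_PIN_COUNTING_BIAS; }

/* Mirrors the idea behind page_dma_pinned(): any bias left means "pinned". */
static bool fake_page_dma_pinned(struct fake_page *p)
{
	return p->refcount >= GUP_PIN_COUNTING_BIAS;
}

int main(void)
{
	struct fake_page page = { .refcount = 1 };	/* the original reference */

	fake_get_page(&page);	/* an ordinary FOLL_GET-style reference */
	fake_pin_page(&page);	/* a pin_user_pages()-style reference   */
	printf("refcount=%d pinned=%d\n", page.refcount, fake_page_dma_pinned(&page));

	fake_unpin_page(&page);	/* must be paired with the pin, not with put_page() */
	fake_put_page(&page);
	printf("refcount=%d pinned=%d\n", page.refcount, fake_page_dma_pinned(&page));
	return 0;
}

Because the bias is much larger than any realistic ordinary refcount, the upper
bits effectively count pins and the lower bits count normal references, which
is exactly the property the comment argues for.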

Re: [PATCH net v2] net/ibmvnic: Fix typo in retry check

2019-12-13 Thread Jakub Kicinski
On Wed, 11 Dec 2019 09:38:39 -0600, Thomas Falcon wrote:
> This conditional is missing a bang, with the intent
> being to break when the retry count reaches zero.
> 
> Fixes: 476d96ca9c ("ibmvnic: Bound waits for device queries")
> Suggested-by: Juliet Kim 
> Signed-off-by: Thomas Falcon 

Ah damn, looks like this originates from my pseudo code.

I had to fix the fixes tag:

Commit: 847496ccfa22 ("net/ibmvnic: Fix typo in retry check")
Fixes tag: Fixes: 476d96ca9c ("ibmvnic: Bound waits for device queries")
Has these problem(s):
- SHA1 should be at least 12 digits long
  Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
  or later) just making sure it is not set (or set to "auto").

Applied to net, thanks!

> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
> b/drivers/net/ethernet/ibm/ibmvnic.c
> index efb0f10..2d84523 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -184,7 +184,7 @@ static int ibmvnic_wait_for_completion(struct 
> ibmvnic_adapter *adapter,
>   netdev_err(netdev, "Device down!\n");
>   return -ENODEV;
>   }
> - if (retry--)
> + if (!retry--)
>   break;
>   if (wait_for_completion_timeout(comp_done, div_timeout))
>   return 0;
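
For reference, a minimal model of the loop shape (names simplified; this is not
the driver code) showing why the bang matters: without it, 'if (retry--)' is
true on the first pass and the loop gives up immediately, whereas the corrected
test only breaks once the retry budget is exhausted:

#include <errno.h>
#include <stdbool.h>

/* Stand-in for wait_for_completion_timeout(); always times out here. */
static bool try_once(void)
{
	return false;
}

/* Simplified model of the retry loop in ibmvnic_wait_for_completion(). */
static int wait_with_retries(int retry)
{
	while (true) {
		if (!retry--)		/* corrected test: give up only at zero */
			break;
		if (try_once())
			return 0;
	}
	return -ETIMEDOUT;
}

int main(void)
{
	return wait_with_retries(5) == -ETIMEDOUT ? 0 : 1;
}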



Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

2019-12-13 Thread Linus Torvalds
On Fri, Dec 13, 2019 at 1:33 PM Arnd Bergmann  wrote:
>
> A few hundred randconfig (x86, arm32 and arm64) builds later I
> still only found one other instance:

Just send me the pull request to make READ_ONCE() and WRITE_ONCE() be
arithmetic types, and your two trivial fixes, and let's get this over
with.

With that, you can remove the 'volatile' with my simple
'typeof(0+*(p))' trick, and we're all good, and we don't need to worry
about compiler versions either.

I'm willing to take that after the merge window as a "sanity fix".

 Linus
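
For anyone following along, a minimal sketch of what the 'typeof(0+*(p))' trick
buys -- a sketch only, not the kernel's READ_ONCE(): the '0 +' forces lvalue
conversion and integer promotion, so the local temporary loses the volatile
qualifier while the load itself still goes through a volatile-qualified
pointer. It only works for arithmetic types, which is exactly why restricting
READ_ONCE()/WRITE_ONCE() as above matters:

#include <stdio.h>

#define READ_ONCE_SKETCH(p)						\
({									\
	/* __val has the promoted, non-volatile type of *(p). */	\
	typeof(0 + *(p)) __val = *(volatile typeof(*(p)) *)(p);		\
	__val;								\
})

static volatile int shared_counter = 42;

int main(void)
{
	int v = READ_ONCE_SKETCH(&shared_counter);	/* v is a plain int */

	printf("%d\n", v);
	return 0;
}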


Re: [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2019-12-13 Thread Balbir Singh



On 13/12/19 2:16 am, Daniel Axtens wrote:
> powerpc has a variable number of PTRS_PER_*, set at runtime based
> on the MMU that the kernel is booted under.
> 
> This means the PTRS_PER_* values are no longer compile-time constants,
> which breaks the build.
> 
> Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
> As KASAN is the only user at the moment, just define them in the kasan
> header, and have them default to PTRS_PER_* unless overridden in arch
> code.
> 
> Suggested-by: Christophe Leroy 
> Suggested-by: Balbir Singh 
> Signed-off-by: Daniel Axtens 
> ---
Reviewed-by: Balbir Singh 

Balbir


Re: [PATCH] powerpc/devicetrees: Change 'gpios' to 'cs-gpios' on fsl,spi nodes

2019-12-13 Thread Rob Herring
On Thu, Nov 28, 2019 at 12:16:35PM +, Christophe Leroy wrote:
> Since commit 0f0581b24bd0 ("spi: fsl: Convert to use CS GPIO
> descriptors"), the preferred way to define chipselect GPIOs is to use the
> 'cs-gpios' property instead of the legacy 'gpios' property.

This will break using a new dtb on a kernel without the above commit. Or 
with any OS that never made the change.

I'm fine with the doc change, but you should keep 'gpios' as deprecated.

> 
> Signed-off-by: Christophe Leroy 
> ---
>  Documentation/devicetree/bindings/spi/fsl-spi.txt | 8 
>  arch/powerpc/boot/dts/mgcoge.dts  | 2 +-
>  arch/powerpc/boot/dts/mpc832x_rdb.dts | 2 +-
>  arch/powerpc/boot/dts/mpc8610_hpcd.dts| 2 +-
>  4 files changed, 7 insertions(+), 7 deletions(-)


Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

2019-12-13 Thread Arnd Bergmann
On Fri, Dec 13, 2019 at 2:17 PM Arnd Bergmann  wrote:
>
> On Thu, Dec 12, 2019 at 9:50 PM Linus Torvalds
>  wrote:
> I'll have my randconfig builder look for instances; so far I found one,
> see below. My feeling is that it would be better to at least enforce
> that the size is 1, 2, 4 or 8 bytes, to avoid cases where someone thinks
> the access is atomic but it falls back on a memcpy.
>
>   Arnd
>
> diff --git a/drivers/xen/time.c b/drivers/xen/time.c
> index 0968859c29d0..adb492c0aa34 100644
> --- a/drivers/xen/time.c
> +++ b/drivers/xen/time.c
> @@ -64,7 +64,7 @@ static void xen_get_runstate_snapshot_cpu_delta(
> do {
> state_time = get64(&state->state_entry_time);
> rmb();  /* Hypervisor might update data. */
> -   *res = READ_ONCE(*state);
> +   memcpy(res, state, sizeof(*res));
> rmb();  /* Hypervisor might update data. */
> } while (get64(&state->state_entry_time) != state_time ||
>  (state_time & XEN_RUNSTATE_UPDATE));


A few hundred randconfig (x86, arm32 and arm64) builds later I
still only found one other instance:

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index eddae4688862..1c1f33447e96 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -304,7 +304,9 @@ static inline struct xdp_desc
*xskq_validate_desc(struct xsk_queue *q,
struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
unsigned int idx = q->cons_tail & q->ring_mask;

-   *desc = READ_ONCE(ring->desc[idx]);
+   barrier();
+   memcpy(desc, &ring->desc[idx], sizeof(*desc));
+   barrier();
if (xskq_is_valid_desc(q, desc, umem))
return desc;

   Arnd


Re: [PATCH v5 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-13 Thread Michael Ellerman
On Fri, 2019-12-13 at 03:50:35 UTC, Michael Ellerman wrote:
> From: Srikar Dronamraju 
> 
> With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on
> pre-empted vCPUs"), the scheduler avoids scheduling tasks on preempted
> vCPUs at wakeup. This leads to a wrong choice of CPU, which in turn
> leads to larger wakeup latencies. Eventually, it leads to performance
> regression in latency-sensitive benchmarks like soltp, schbench etc.
> 
> On Powerpc, vcpu_is_preempted() only looks at yield_count. If the
> yield_count is odd, the vCPU is assumed to be preempted. However,
> yield_count is incremented whenever the LPAR enters CEDE state (idle).
> So any CPU that has entered CEDE state is assumed to be preempted.
> 
> Even if the vCPU of a dedicated LPAR is preempted/donated, it should
> have the right of first use, since the LPAR is supposed to own the vCPU.
...
> 
> Waiman Long suggested using static_keys.
> 
> Fixes: 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs")
> Cc: sta...@vger.kernel.org # v4.18+
> Reported-by: Parth Shah 
> Reported-by: Ihor Pasichnyk 
> Tested-by: Juri Lelli 
> Acked-by: Waiman Long 
> Reviewed-by: Gautham R. Shenoy 
> Signed-off-by: Srikar Dronamraju 
> Acked-by: Phil Auld 
> Reviewed-by: Vaidyanathan Srinivasan 
> Tested-by: Parth Shah 
> [mpe: Move the key and setting of the key to pseries/setup.c]
> Signed-off-by: Michael Ellerman 

Series applied to powerpc fixes.

https://git.kernel.org/powerpc/c/14c73bd344da60abaf7da3ea2e7733ddda35bbac

cheers
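
For readers who don't want to chase the commit, a simplified sketch of the
logic described in the changelog (kernel types and the static key are modelled
with plain C here, so this is not the applied patch verbatim): a flag set at
boot records whether the LPAR runs on shared processors, and
vcpu_is_preempted() short-circuits to false on dedicated processors instead of
misreading an odd yield_count (bumped by CEDE/idle) as "preempted":

#include <stdbool.h>
#include <stdio.h>

/* Would be a static_key, flipped from pseries/setup.c at boot. */
static bool shared_processor;

struct vcpu_state {
	unsigned int yield_count;	/* odd while the vCPU is not running */
};

static bool vcpu_is_preempted_sketch(const struct vcpu_state *v)
{
	if (!shared_processor)
		return false;	/* dedicated processors: never report preempted */
	return v->yield_count & 1;
}

int main(void)
{
	struct vcpu_state idle_vcpu = { .yield_count = 3 };	/* odd: in CEDE */

	shared_processor = false;	/* dedicated LPAR */
	printf("dedicated: %d\n", vcpu_is_preempted_sketch(&idle_vcpu));

	shared_processor = true;	/* shared LPAR */
	printf("shared:    %d\n", vcpu_is_preempted_sketch(&idle_vcpu));
	return 0;
}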


Re: [PATCH] ocxl: Fix concurrent AFU open and device removal

2019-12-13 Thread Michael Ellerman
On Mon, 2019-06-24 at 14:41:48 UTC, Frederic Barrat wrote:
> If an ocxl device is unbound through sysfs at the same time its AFU is
> being opened by a user process, the open code may dereference freed
> structures, which can lead to kernel oops messages. You'd have to hit a
> tiny time window, but it's possible. It's fairly easy to test by
> making the time window bigger artificially.
> 
> Fix it with a combination of 2 changes:
> - when an AFU device is found in the IDR by looking for the device
> minor number, we should hold a reference on the device until after the
> context is allocated. A reference on the AFU structure is kept when
> the context is allocated, so we can release the reference on the
> device after the context allocation.
> - with the fix above, there's still another even tinier window,
> between the time the AFU device is found in the IDR and the reference
> on the device is taken. We can fix this one by removing the IDR entry
> earlier, when the device setup is removed, instead of waiting for the
> 'release' device callback. With proper locking around the IDR.
> 
> Fixes: 75ca758adbaf ("ocxl: Create a clear delineation between ocxl backend & 
> frontend")
> Signed-off-by: Frederic Barrat 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/a58d37bce0d21cf7fbd589384c619e465ef2f927

cheers


Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

2019-12-13 Thread Michael Ellerman
Segher Boessenkool  writes:
> Hi!
>
> On Fri, Dec 13, 2019 at 11:07:55PM +1100, Michael Ellerman wrote:
>> I tried this:
>> 
>> > @@ -295,6 +296,23 @@ void __write_once_size(volatile void *p, void *res, 
>> > int size)
>> >   */
>> >  #define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0)
>> >  
>> > +#else /* GCC_VERSION < 40800 */
>> > +
>> > +#define READ_ONCE_NOCHECK(x)  
>> > \
>> > +({
>> > \
>> > +  typeof(x) __x = *(volatile typeof(x))&(x);  \
>> 
>> Didn't compile, needed:
>> 
>>  typeof(x) __x = *(volatile typeof(x) *)&(x); \
>> 
>> 
>> > +  smp_read_barrier_depends(); \
>> > +  __x;
>> > +})
>> 
>> 
>> And that works for me. No extra stack check stuff.
>> 
>> I guess the question is does that version of READ_ONCE() implement the
>> read once semantics. Do we have a good way to test that?
>> 
>> The only differences are because of the early return in the generic
>> test_and_set_bit_lock():
>
> No, there is another difference:
>
>>   30 ld  r10,560(r9)
>>   31 std r10,104(r1)
>>   32 ld  r10,104(r1)
>>   33 andi.   r10,r10,1
>>   34 bne29 bne   
>>   
>
> The stack var is volatile, so it is read back immediately after writing
> it, here.  This is a bad idea for performance, in general.

Argh, yuck. Thanks, I shouldn't try to read asm listings at 11pm.

So that just confirms what Will was saying further up the thread about
the volatile pointer, rather than READ_ONCE() per se.

cheers


Re: [PATCH kernel 2/3] powerpc/pseries: Allow not having ibm,hypertas-functions::hcall-multi-tce for DDW

2019-12-13 Thread Ram Pai
On Fri, Dec 13, 2019 at 07:45:36PM +1100, Alexey Kardashevskiy wrote:
> By default a pseries guest supports a H_PUT_TCE hypercall which maps
> a single IOMMU page in a DMA window. Additionally the hypervisor may
> support H_PUT_TCE_INDIRECT/H_STUFF_TCE which update multiple TCEs at once;
> this is advertised via the device tree /rtas/ibm,hypertas-functions
> property which Linux converts to FW_FEATURE_MULTITCE.

Thanks Alexey for the patches!

> 
> FW_FEATURE_MULTITCE is checked when dma_iommu_ops is used; however
> the code managing the huge DMA window (DDW) ignores it and calls
> H_PUT_TCE_INDIRECT even if it is explicitly disabled via
> the "multitce=off" kernel command line parameter.

Also, H_PUT_TCE_INDIRECT should not be called in a secure VM, even when
"multitce=on". Right? Or does it get disabled somehow in the
secure guest?


> 
> This adds FW_FEATURE_MULTITCE checking to the DDW code path.
> 
> This changes tce_build_pSeriesLP to take the liobn and page size, as
> the huge window does not have an iommu_table descriptor, which is
> usually the place to store these numbers.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> The idea is then set FW_FEATURE_MULTITCE in init_svm() and have the guest

I think, you mean unset FW_FEATURE_MULTITCE.

> use H_PUT_TCE without sharing a (temporary) page for H_PUT_TCE_INDIRECT.
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 44 ++
>  1 file changed, 30 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index df7db33ca93b..f6e9b87c82fc 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -132,10 +132,10 @@ static unsigned long tce_get_pseries(struct iommu_table 
> *tbl, long index)
>   return be64_to_cpu(*tcep);
>  }
> 
> -static void tce_free_pSeriesLP(struct iommu_table*, long, long);
> +static void tce_free_pSeriesLP(unsigned long liobn, long, long);
>  static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
> 
> -static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
> +static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long 
> tceshift,
>   long npages, unsigned long uaddr,
>   enum dma_data_direction direction,
>   unsigned long attrs)
> @@ -146,25 +146,25 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, 
> long tcenum,
>   int ret = 0;
>   long tcenum_start = tcenum, npages_start = npages;
> 
> - rpn = __pa(uaddr) >> TCE_SHIFT;
> + rpn = __pa(uaddr) >> tceshift;
>   proto_tce = TCE_PCI_READ;
>   if (direction != DMA_TO_DEVICE)
>   proto_tce |= TCE_PCI_WRITE;
> 
>   while (npages--) {
> - tce = proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT;
> - rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, tce);
> + tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
> + rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
> 
>   if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
>   ret = (int)rc;
> - tce_free_pSeriesLP(tbl, tcenum_start,
> + tce_free_pSeriesLP(liobn, tcenum_start,
>  (npages_start - (npages + 1)));
>   break;
>   }
> 
>   if (rc && printk_ratelimit()) {
>   printk("tce_build_pSeriesLP: plpar_tce_put failed. 
> rc=%lld\n", rc);
> - printk("\tindex   = 0x%llx\n", (u64)tbl->it_index);
> + printk("\tindex   = 0x%llx\n", (u64)liobn);
>   printk("\ttcenum  = 0x%llx\n", (u64)tcenum);
>   printk("\ttce val = 0x%llx\n", tce );
>   dump_stack();
> @@ -193,7 +193,8 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
> *tbl, long tcenum,
>   unsigned long flags;
> 
>   if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE)) {
> - return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
> + return tce_build_pSeriesLP(tbl->it_index, tcenum,
> +tbl->it_page_shift, npages, uaddr,
>  direction, attrs);
>   }
> 
> @@ -209,8 +210,9 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
> *tbl, long tcenum,
>   /* If allocation fails, fall back to the loop implementation */
>   if (!tcep) {
>   local_irq_restore(flags);
> - return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
> - direction, attrs);
> + return tce_build_pSeriesLP(tbl->it_index, tcenum,
> + tbl->it_page_shift,
> + 

Re: [PATCH 00/58] serial/sysrq: Cleanup ifdeffery

2019-12-13 Thread Dmitry Safonov
Hi Christophe,

On 12/13/19 5:47 AM, Christophe Leroy wrote:
> On 13/12/2019 at 01:05, Dmitry Safonov wrote:
[..]
> 
> powerpc patchwork didn't get the full series, see
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=148198

Yes, I was under the impression that architecture mailing lists only want
the related patches. But now I see that from the patchwork point of view
it's better to have the whole series in the inbox.

> Can't find them on linux-serial patchwork either
> (https://patches.linaro.org/project/linux-serial/list/)

I'm not sure - maybe the frequency of checking is low?
I see all patches in linux-serial ml:
https://marc.info/?l=linux-serial=1=201912=2

> It is impossible to review/test powerpc bits without the first patches
> of the series, where can the entire series be found ?

Sorry for the inconvenience.
I can resend the whole series just to the ppc mailing list, without Cc'ing
everyone, if that works for you. Or you can clone it directly from my github:
https://github.com/0x7f454c46/linux/tree/sysrq-serial-seq-v1

Thanks,
  Dmitry


Re: [PATCH] libbpf: fix readelf output parsing on powerpc with recent binutils

2019-12-13 Thread Ben Hutchings
On Thu, 2019-12-12 at 11:53 +1100, Michael Ellerman wrote:
> Thadeu Lima de Souza Cascardo  writes:
[...]
> > This is a patch on binutils carried by Fedora:
> > 
> > https://src.fedoraproject.org/rpms/binutils/c/b8265c46f7ddae23a792ee8306fbaaeacba83bf8
> > 
> > " b8265c Have readelf display extra symbol information at the end of the 
> > line. "
> > 
> > It has the following comment:
> > 
> > # FIXME:The proper fix would be to update the scripts that are expecting
> > #   a fixed output from readelf.  But it seems that some of them are
> > #   no longer being maintained.
> > 
> > This commit is from 2017; had it been in binutils upstream, maybe the
> > situation right now would be different.
> 
> Bleeping bleep.
> 
> Looks like it was actually ruby that was the original problem:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=1479302
> 
> 
> Why it wasn't hacked around in the ruby package I don't know; doing it in
> the distro binutils package is not ideal.

That wouldn't help people building Ruby from upstream.

Any tool generating tabular output like this should add new fields at
the end (or show them only if requested), since there are bound to be
scripts that parse the output like this.  So I think Fedora's change to
readelf was reasonable, but should have been pushed upstream as soon as
possible.

Now everyone is going to have to deal with both formats.

Ben.

-- 
Ben Hutchings
Horngren's Observation:
  Among economists, the real world is often a special case.






[PATCH v2] powerpc/32: add support of KASAN_VMALLOC

2019-12-13 Thread Christophe Leroy
Add support of KASAN_VMALLOC on PPC32.

To allow this, the early shadow covering the VMALLOC space
needs to be removed once the high_memory variable is set and before
memblock is freed.

And the VMALLOC area needs to be aligned such that its boundaries
are covered by a full shadow page.

Signed-off-by: Christophe Leroy 

---
v2: rebased ; exclude specific module handling when CONFIG_KASAN_VMALLOC is set.
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
 arch/powerpc/include/asm/kasan.h |  2 ++
 arch/powerpc/include/asm/nohash/32/pgtable.h |  5 +
 arch/powerpc/mm/kasan/kasan_init_32.c| 33 +++-
 arch/powerpc/mm/mem.c|  3 +++
 6 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1ec34e16ed65..a247bbfb03d4 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -173,6 +173,7 @@ config PPC
select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && 
PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32
+   select HAVE_ARCH_KASAN_VMALLOC  if PPC32
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 0796533d37dd..5b39c11e884a 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -193,7 +193,12 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, 
pgprot_t prot);
 #else
 #define VMALLOC_START ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
 #endif
+
+#ifdef CONFIG_KASAN_VMALLOC
+#define VMALLOC_END	ALIGN_DOWN(ioremap_bot, PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
+#else
 #define VMALLOC_END	ioremap_bot
+#endif
 
 #ifndef __ASSEMBLY__
 #include 
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 296e51c2f066..fbff9ff9032e 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -31,9 +31,11 @@
 void kasan_early_init(void);
 void kasan_mmu_init(void);
 void kasan_init(void);
+void kasan_late_init(void);
 #else
 static inline void kasan_init(void) { }
 static inline void kasan_mmu_init(void) { }
+static inline void kasan_late_init(void) { }
 #endif
 
 #endif /* __ASSEMBLY */
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 552b96eef0c8..60c4d829152e 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -114,7 +114,12 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, 
pgprot_t prot);
 #else
 #define VMALLOC_START ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
 #endif
+
+#ifdef CONFIG_KASAN_VMALLOC
+#define VMALLOC_END	ALIGN_DOWN(ioremap_bot, PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
+#else
 #define VMALLOC_END	ioremap_bot
+#endif
 
 /*
  * Bits in a linux-style PTE.  These match the bits in the
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index 0e6ed4413eea..88036fb88350 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -129,6 +129,31 @@ static void __init kasan_remap_early_shadow_ro(void)
flush_tlb_kernel_range(KASAN_SHADOW_START, KASAN_SHADOW_END);
 }
 
+static void __init kasan_unmap_early_shadow_vmalloc(void)
+{
+   unsigned long k_start = (unsigned long)kasan_mem_to_shadow((void 
*)VMALLOC_START);
+   unsigned long k_end = (unsigned long)kasan_mem_to_shadow((void 
*)VMALLOC_END);
+   unsigned long k_cur;
+   phys_addr_t pa = __pa(kasan_early_shadow_page);
+
+   if (!early_mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
+   int ret = kasan_init_shadow_page_tables(k_start, k_end);
+
+   if (ret)
+   panic("kasan: kasan_init_shadow_page_tables() failed");
+   }
+   for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
+   pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), 
k_cur);
+   pte_t *ptep = pte_offset_kernel(pmd, k_cur);
+
+   if ((pte_val(*ptep) & PTE_RPN_MASK) != pa)
+   continue;
+
+   __set_pte_at(&init_mm, k_cur, ptep, __pte(0), 0);
+   }
+   flush_tlb_kernel_range(k_start, k_end);
+}
+
 void __init kasan_mmu_init(void)
 {
int ret;
@@ -165,7 +190,13 @@ void __init kasan_init(void)
pr_info("KASAN init done\n");
 }
 
-#ifdef CONFIG_MODULES
+void __init kasan_late_init(void)
+{
+   if (IS_ENABLED(CONFIG_KASAN_VMALLOC))
+   kasan_unmap_early_shadow_vmalloc();
+}
+
+#if defined(CONFIG_MODULES) && !defined(CONFIG_KASAN_VMALLOC)
 void *module_alloc(unsigned long size)
 {
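
To make the alignment requirement stated in the cover letter concrete, here is
a small standalone arithmetic sketch (example values only; the real constants
come from the kernel headers): each shadow byte covers
2^KASAN_SHADOW_SCALE_SHIFT bytes, so one PAGE_SIZE shadow page covers
PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT bytes of address space, and rounding
VMALLOC_END down to that granule keeps the vmalloc shadow boundary on a whole
shadow page, which is what lets kasan_unmap_early_shadow_vmalloc() operate
page by page:

#include <stdio.h>

#define PAGE_SIZE			4096UL
#define KASAN_SHADOW_SCALE_SHIFT	3
#define ALIGN_DOWN(x, a)		((x) & ~((a) - 1))	/* 'a' must be a power of two */

int main(void)
{
	unsigned long ioremap_bot = 0xff400321UL;	/* arbitrary example value */
	unsigned long granule = PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT;	/* 32 KB here */
	unsigned long vmalloc_end = ALIGN_DOWN(ioremap_bot, granule);

	printf("shadow granule : %#lx\n", granule);
	printf("VMALLOC_END    : %#lx -> %#lx\n", ioremap_bot, vmalloc_end);
	return 0;
}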

Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

2019-12-13 Thread Luc Van Oostenryck
On Fri, Dec 13, 2019 at 01:56:18PM +0100, Peter Zijlstra wrote:
> 
> Excellent! I had to change it to something like:
> 
> #define unqual_typeof(x)	typeof(({_Atomic typeof(x) ___x __maybe_unused; ___x; }))
> 
> but that does indeed work!
> 
> Now I suppose we should wrap that in a symbol that indicates our
> compiler does indeed support _Atomic, otherwise things will come apart.
> 
> That is, my gcc-4.6 doesn't seem to have it, while gcc-4.8 does, which
> is exactly the range that needs the daft READ_ONCE() construct, how
> convenient :/
> 
> Something a little like this perhaps?

Yes, this looks good to me.
Just a small nit here below.

> ---
> 
> diff --git a/arch/arm64/include/asm/barrier.h 
> b/arch/arm64/include/asm/barrier.h
> index 7d9cc5ec4971..c389af602da8 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -75,9 +75,9 @@ static inline unsigned long 
> array_index_mask_nospec(unsigned long idx,
>  
>  #define __smp_store_release(p, v)\
>  do { \
> - typeof(p) __p = (p);\
> - union { typeof(*p) __val; char __c[1]; } __u =  \
> - { .__val = (__force typeof(*p)) (v) };  \
> + unqual_typeof(p) __p = (p); \
> + union { unqual_typeof(*p) __val; char __c[1]; } __u =   \
> + { .__val = (__force unqual_typeof(*p)) (v) };   \

The two trailing backslashes are now off by one tab.

-- Luc 


Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

2019-12-13 Thread Segher Boessenkool
Hi!

On Fri, Dec 13, 2019 at 11:07:55PM +1100, Michael Ellerman wrote:
> I tried this:
> 
> > @@ -295,6 +296,23 @@ void __write_once_size(volatile void *p, void *res, 
> > int size)
> >   */
> >  #define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0)
> >  
> > +#else /* GCC_VERSION < 40800 */
> > +
> > +#define READ_ONCE_NOCHECK(x)   
> > \
> > +({ \
> > +   typeof(x) __x = *(volatile typeof(x))&(x);  \
> 
> Didn't compile, needed:
> 
>   typeof(x) __x = *(volatile typeof(x) *)&(x); \
> 
> 
> > +   smp_read_barrier_depends(); \
> > +   __x;
> > +})
> 
> 
> And that works for me. No extra stack check stuff.
> 
> I guess the question is does that version of READ_ONCE() implement the
> read once semantics. Do we have a good way to test that?
> 
> The only differences are because of the early return in the generic
> test_and_set_bit_lock():

No, there is another difference:

>   30 ld  r10,560(r9)
>   31 std r10,104(r1)
>   32 ld  r10,104(r1)
>   33 andi.   r10,r10,1
>   34 bne29 bne
>  

The stack var is volatile, so it is read back immediately after writing
it, here.  This is a bad idea for performance, in general.


Segher


Re: [PATCH 4/4] powerpc: Book3S 64-bit "heavyweight" KASAN support

2019-12-13 Thread Daniel Axtens
Hi Christophe,

- We run a lot of code at boot in real mode. This includes stuff like
  printk(), so it's not feasible to just disable instrumentation
  around it.
>>>
Have you definitely given up on the idea of doing a standard implementation
of KASAN like the other 64-bit arches have done?

Isn't it possible to set up an early 1:1 mapping and go into virtual mode
earlier? What is so different between book3s64 and book3e64?
On book3e64, we've been able to set up KASAN before printing anything
(except when using EARLY_DEBUG). Isn't it feasible on book3s64 too?
>> 
So I got this pretty wrong when trying to explain it. The problem isn't
that we run code at boot, as I said; it's that a bunch of the KVM
code runs in real mode.
>
> Ok.
>
> Does it mean we would be able to implement it the standard way when 
> CONFIG_KVM is not selected ?

I suppose, but KVM is pretty important to me!

- disabled reporting when we're checking the stack for exception
  frames. The behaviour isn't wrong, just incompatible with KASAN.
>>>
Does this apply to / impact PPC32 at all?
>> 
>> It should. I found that when doing stack walks, the code would touch
memory that KASAN hadn't unpoisoned. I'm a bit surprised you haven't
>> seen it arise, tbh.
>
> How do you trigger that ?
>
I've tried to trigger some faults with LKDTM that provoke BUG dumps, but 
> it doesn't trip.
> I also performed task state listing via sysrq, and I don't get anything 
> wrong either.

I'll try to disable this and see if I can trigger it again.

- Dropped old module stuff in favour of KASAN_VMALLOC.
>>>
You said in the cover that this is done to avoid having to split modules
out of the VMALLOC area. Would it be an issue to perform that split?
I can understand it is not easy on 32-bit because the vmalloc space is
rather small, but on 64-bit don't we have enough virtual space to
comfortably split modules out of vmalloc? 64-bit already splits
ioremap away from vmalloc, whereas 32-bit has them merged.
>> 
>> I could have done this. Maybe I should have done this. But now I have
>> done vmalloc space support.
>
So you force the use of KASAN_VMALLOC? Doesn't it have a performance
impact?

It has a performance impact when allocating and freeing virtual address
space in the vmalloc region, yes. There should be no discernible impact
when using vmalloc space.

My team is actively working on vmap-stack support for ppc64, with the
end goal of running syzkaller with vmap-stack and kasan. vmap-stack plus
kasan requires kasan-vmalloc, so for my purposes doing things in this
order makes sense.

I'd be happy to have a later series introduce the split and then make
KASAN_VMALLOC optional. I would need to understand the implications of
splitting the address space from a KASLR point of view: I don't want to
accidentally overly restrict the available randomness.

Regards,
Daniel

>
> Christophe


Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

2019-12-13 Thread Arnd Bergmann
On Thu, Dec 12, 2019 at 9:50 PM Linus Torvalds
 wrote:
> On Thu, Dec 12, 2019 at 11:34 AM Will Deacon  wrote:
> > The root of my concern in all of this, and what started me looking at it in
> > the first place, is the interaction with 'typeof()'. Inheriting 'volatile'
> > for a pointer means that local variables in macros declared using typeof()
> > suddenly start generating *hideous* code, particularly when pointless stack
> > spills get stackprotector all excited.
>
> Yeah, removing volatile can be a bit annoying.
>
> For the particular case of the bitops, though, it's not an issue.
> Since you know the type there, you can just cast it.
>
> And if we had the rule that READ_ONCE() was an arithmetic type, you could do
>
> typeof(0+(*p)) __var;
>
> since you might as well get the integer promotion anyway (on the
> non-volatile result).
>
> But that doesn't work with structures or unions, of course.
>
> I'm not entirely sure we have READ_ONCE() with a struct. I do know we
> have it with 64-bit entities on 32-bit machines, but that's ok with
> the "0+" trick.

I'll have my randconfig builder look for instances; so far I found one,
see below. My feeling is that it would be better to at least enforce
that the size is 1, 2, 4 or 8 bytes, to avoid cases where someone thinks
the access is atomic but it falls back on a memcpy.

  Arnd

diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 0968859c29d0..adb492c0aa34 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -64,7 +64,7 @@ static void xen_get_runstate_snapshot_cpu_delta(
do {
state_time = get64(&state->state_entry_time);
rmb();  /* Hypervisor might update data. */
-   *res = READ_ONCE(*state);
+   memcpy(res, state, sizeof(*res));
rmb();  /* Hypervisor might update data. */
} while (get64(&state->state_entry_time) != state_time ||
 (state_time & XEN_RUNSTATE_UPDATE));
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 5e88e7e33abe..f4ae360efdba 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -179,6 +179,8 @@ void ftrace_likely_update(struct
ftrace_likely_data *f, int val,

 #include 

+extern void __broken_access_once(void *, const void *, unsigned long);
+
 #define __READ_ONCE_SIZE   \
 ({ \
switch (size) { \
@@ -187,9 +189,7 @@ void ftrace_likely_update(struct
ftrace_likely_data *f, int val,
case 4: *(__u32 *)res = *(volatile __u32 *)p; break;\
case 8: *(__u64 *)res = *(volatile __u64 *)p; break;\
default:\
-   barrier();  \
-   __builtin_memcpy((void *)res, (const void *)p, size);   \
-   barrier();  \
+   __broken_access_once((void *)res, (const void *)p,
size);   \
}   \
 })

@@ -225,9 +225,7 @@ static __always_inline void
__write_once_size(volatile void *p, void *res, int s
case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
default:
-   barrier();
-   __builtin_memcpy((void *)p, (const void *)res, size);
-   barrier();
+   __broken_access_once((void *)p, (const void *)res, size);
}
 }
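
A stand-alone model of the fallback change above, to show the effect (it
mirrors the idea of the diff, not the kernel's compiler.h verbatim): scalar
sizes still get single-copy loads, while any other size now references the
deliberately undefined __broken_access_once() and fails at link time instead
of silently degrading to a memcpy. Build it with optimisation enabled (as the
kernel always is), so the unreachable default branch is dropped and only
genuinely over-sized accesses end up referencing the undefined symbol:

/* Deliberately never defined: referencing it turns misuse into a link error. */
extern void __broken_access_once(void *dst, const void *src, unsigned long size);

#define READ_ONCE_SIZED(x)						\
({									\
	union { typeof(x) __val; char __c[sizeof(x)]; } __u;		\
	switch (sizeof(x)) {						\
	case 1: case 2: case 4: case 8:					\
		__u.__val = *(volatile typeof(x) *)&(x);		\
		break;							\
	default:							\
		/* e.g. a 24-byte struct would land here and fail to link */ \
		__broken_access_once(__u.__c, &(x), sizeof(x));		\
	}								\
	__u.__val;							\
})

int main(void)
{
	int counter = 5;
	int v = READ_ONCE_SIZED(counter);	/* size 4: links and runs fine */

	return v - 5;
}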


Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

2019-12-13 Thread Peter Zijlstra
On Fri, Dec 13, 2019 at 11:47:06AM +0100, Luc Van Oostenryck wrote:
> On Thu, Dec 12, 2019 at 09:53:38PM +0100, Peter Zijlstra wrote:
> > Now, looking at the current GCC source:
> > 
> >   
> > https://github.com/gcc-mirror/gcc/blob/97d7270f894395e513667a031a0c309d1819d05e/gcc/c/c-parser.c#L3707
> > 
> > it seems that __typeof__() is supposed to strip all qualifiers from
> > _Atomic types. That lead me to try:
> > 
> > typeof(_Atomic typeof(p)) __p = (p);
> > 
> > But alas, I still get the same junk you got for ool_store_release() :/
> 
> I was checking this to see if Sparse was ready to support this.
> I was a bit surprised because at first sight GCC was doing as
> it claims (typeof stripping const & volatile on _Atomic types)
> but your example wasn't working. But it's working if an
> intermediate var is used:
>   _Atomic typeof(p) tmp;
>   typeof(tmp) __p = (p);
> or, uglier but probably more practical:
>   typeof(({_Atomic typeof(p) tmp; })) __p = (p);
> 
> Go figure!

Excellent! I had to change it to something like:

#define unqual_typeof(x)	typeof(({_Atomic typeof(x) ___x __maybe_unused; ___x; }))

but that does indeed work!

Now I suppose we should wrap that in a symbol that indicates our
compiler does indeed support _Atomic, otherwise things will come apart.

That is, my gcc-4.6 doesn't seem to have it, while gcc-4.8 does, which
is exactly the range that needs the daft READ_ONCE() construct, how
convenient :/

Something a little like this perhaps?

---

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 7d9cc5ec4971..c389af602da8 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -75,9 +75,9 @@ static inline unsigned long array_index_mask_nospec(unsigned 
long idx,
 
 #define __smp_store_release(p, v)  \
 do {   \
-   typeof(p) __p = (p);\
-   union { typeof(*p) __val; char __c[1]; } __u =  \
-   { .__val = (__force typeof(*p)) (v) };  \
+   unqual_typeof(p) __p = (p); \
+   union { unqual_typeof(*p) __val; char __c[1]; } __u =   \
+   { .__val = (__force unqual_typeof(*p)) (v) };   \
compiletime_assert_atomic_type(*p); \
kasan_check_write(__p, sizeof(*p)); \
switch (sizeof(*p)) {   \
@@ -110,8 +110,8 @@ do {
\
 
 #define __smp_load_acquire(p)  \
 ({ \
-   union { typeof(*p) __val; char __c[1]; } __u;   \
-   typeof(p) __p = (p);\
+   union { unqual_typeof(*p) __val; char __c[1]; } __u;\
+   unqual_typeof(p) __p = (p); \
compiletime_assert_atomic_type(*p); \
kasan_check_read(__p, sizeof(*p));  \
switch (sizeof(*p)) {   \
@@ -141,8 +141,8 @@ do {
\
 
 #define smp_cond_load_relaxed(ptr, cond_expr)  \
 ({ \
-   typeof(ptr) __PTR = (ptr);  \
-   typeof(*ptr) VAL;   \
+   unqual_typeof(ptr) __PTR = (ptr);   \
+   unqual_typeof(*ptr) VAL;\
for (;;) {  \
VAL = READ_ONCE(*__PTR);\
if (cond_expr)  \
@@ -154,8 +154,8 @@ do {
\
 
 #define smp_cond_load_acquire(ptr, cond_expr)  \
 ({ \
-   typeof(ptr) __PTR = (ptr);  \
-   typeof(*ptr) VAL;   \
+   unqual_typeof(ptr) __PTR = (ptr);   \
+   unqual_typeof(*ptr) VAL;\
for (;;) {  \
VAL = smp_load_acquire(__PTR);  \
if (cond_expr)  \
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 

Re: [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support

2019-12-13 Thread Christophe Leroy




On 12/12/2019 at 16:16, Daniel Axtens wrote:

KASAN support on Book3S is a bit tricky to get right:

  - It would be good to support inline instrumentation so as to be able to
catch stack issues that cannot be caught with outline mode.

  - Inline instrumentation requires a fixed offset.

  - Book3S runs code in real mode after booting. Most notably a lot of KVM
runs in real mode, and it would be good to be able to instrument it.

  - Because code runs in real mode after boot, the offset has to point to
valid memory both in and out of real mode.

[For those not immersed in ppc64, in real mode, the top nibble or 2 bits
(depending on radix/hash mmu) of the address is ignored. The linear
mapping is placed at 0xc000. This means that a pointer to
part of the linear mapping will work both in real mode, where it will be
interpreted as a physical address of the form 0x000..., and out of real
mode, where it will go via the linear mapping.]

One approach is just to give up on inline instrumentation. This way all
checks can be delayed until after everything set is up correctly, and the
address-to-shadow calculations can be overridden. However, the features and
speed boost provided by inline instrumentation are worth trying to do
better.

If _at compile time_ it is known how much contiguous physical memory a
system has, the top 1/8th of the first block of physical memory can be set
aside for the shadow. This is a big hammer and comes with 3 big
consequences:

  - there's no nice way to handle physically discontiguous memory, so only
the first physical memory block can be used.

  - kernels will simply fail to boot on machines with less memory than
specified when compiling.

  - kernels running on machines with more memory than specified when
compiling will simply ignore the extra memory.

Implement and document KASAN this way. The current implementation is Radix
only.

Despite the limitations, it can still find bugs,
e.g. http://patchwork.ozlabs.org/patch/1103775/

At the moment, this physical memory limit must be set _even for outline
mode_. This may be changed in a later series - a different implementation
could be added for outline mode that dynamically allocates shadow at a
fixed offset. For example, see https://patchwork.ozlabs.org/patch/795211/

Suggested-by: Michael Ellerman 
Cc: Balbir Singh  # ppc64 out-of-line radix version
Cc: Christophe Leroy  # ppc32 version
Signed-off-by: Daniel Axtens 

---
Changes since v2:

  - Address feedback from Christophe around cleanups and docs.
  - Address feedback from Balbir: at this point I don't have a good solution
for the issues you identify around the limitations of the inline 
implementation
but I think that it's worth trying to get the stack instrumentation support.
I'm happy to have an alternative and more flexible outline mode - I had
envisioned this would be called 'lightweight' mode as it imposes fewer 
restrictions.
I've linked to your implementation. I think it's best to add it in a 
follow-up series.
  - Made the default PHYS_MEM_SIZE_FOR_KASAN value 1024MB. I think most people 
have
guests with at least that much memory in the Radix 64s case so it's a much
saner default - it means that if you just turn on KASAN without reading the
docs you're much more likely to have a bootable kernel, which you will never
have if the value is set to zero! I'm happy to bikeshed the value if we 
want.

Changes since v1:
  - Landed kasan vmalloc support upstream
  - Lots of feedback from Christophe.

Changes since the rfc:

  - Boots real and virtual hardware, kvm works.

  - disabled reporting when we're checking the stack for exception
frames. The behaviour isn't wrong, just incompatible with KASAN.

  - Documentation!

  - Dropped old module stuff in favour of KASAN_VMALLOC.

The bugs with ftrace and kuap were due to kernel bloat pushing
prom_init calls to be done via the plt. Because we did not have
a relocatable kernel, and they are done very early, this caused
everything to explode. Compile with CONFIG_RELOCATABLE!
---
  Documentation/dev-tools/kasan.rst |   8 +-
  Documentation/powerpc/kasan.txt   | 112 +-
  arch/powerpc/Kconfig  |   3 +
  arch/powerpc/Kconfig.debug|  21 
  arch/powerpc/Makefile |  11 ++
  arch/powerpc/include/asm/book3s/64/hash.h |   4 +
  arch/powerpc/include/asm/book3s/64/pgtable.h  |   7 ++
  arch/powerpc/include/asm/book3s/64/radix.h|   5 +
  arch/powerpc/include/asm/kasan.h  |  21 +++-
  arch/powerpc/kernel/process.c |   8 ++
  arch/powerpc/kernel/prom.c|  64 +-
  arch/powerpc/mm/kasan/Makefile|   3 +-
  .../mm/kasan/{kasan_init_32.c => init_32.c}   |   0
  arch/powerpc/mm/kasan/init_book3s_64.c|  72 +++
  14 files changed, 330 
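
As a back-of-the-envelope illustration of the "top 1/8th of the first memory
block" scheme described above (example numbers only; the real constants and
the exact shadow layout live in the patch itself): with the default
PHYS_MEM_SIZE_FOR_KASAN of 1024 MB and the usual one-shadow-byte-per-8-bytes
scale, 128 MB at the top of that block is reserved for shadow, and because it
sits inside the linear mapping the same compile-time offset works both in and
out of real mode:

#include <stdio.h>

#define PHYS_MEM_SIZE_FOR_KASAN		(1024ULL << 20)	/* example: 1024 MB */
#define KASAN_SHADOW_SCALE_SHIFT	3		/* 1 shadow byte per 8 bytes */
#define LINEAR_MAP_BASE			0xc000000000000000ULL

int main(void)
{
	unsigned long long shadow_size  = PHYS_MEM_SIZE_FOR_KASAN >> KASAN_SHADOW_SCALE_SHIFT;
	unsigned long long shadow_start = LINEAR_MAP_BASE + PHYS_MEM_SIZE_FOR_KASAN - shadow_size;

	printf("shadow size   : %llu MB\n", shadow_size >> 20);
	printf("shadow start  : %#llx (inside the linear map)\n", shadow_start);
	printf("usable memory : %llu MB\n", (PHYS_MEM_SIZE_FOR_KASAN - shadow_size) >> 20);
	return 0;
}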

Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

2019-12-13 Thread Michael Ellerman
Peter Zijlstra  writes:
> On Thu, Dec 12, 2019 at 10:07:56AM +, Will Deacon wrote:
>
>> > So your proposed change _should_ be fine. Will, I'm assuming you never
>> > saw this on your ARGH64 builds when you did this code ?
>> 
>> I did see it, but (a) looking at the code out-of-line makes it look a lot
>> worse than it actually is (so the ext4 example is really helpful -- thanks
>> Michael!) and (b) I chalked it up to a crappy compiler.
>> 
>> However, see this comment from Arnd on my READ_ONCE series from the other
>> day:
>> 
>> https://lore.kernel.org/lkml/CAK8P3a0f=wvsqsbq4t0fmekcfe_mc3oarxaetvitsksa-d2...@mail.gmail.com
>> 
>> In which case, I'm thinking that we should be doing better in READ_ONCE()
>> for non-buggy compilers which would also keep the KCSAN folks happy for this
>> code (and would help with [1] too).
>
> So something like this then? Although I suppose that should be moved
> into compiler-gcc.h and then guarded by #ifndef READ_ONCE or so.

I tried this:

> @@ -295,6 +296,23 @@ void __write_once_size(volatile void *p, void *res, int 
> size)
>   */
>  #define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0)
>  
> +#else /* GCC_VERSION < 40800 */
> +
> +#define READ_ONCE_NOCHECK(x) \
> +({   \
> + typeof(x) __x = *(volatile typeof(x))&(x);  \

Didn't compile, needed:

typeof(x) __x = *(volatile typeof(x) *)&(x); \


> + smp_read_barrier_depends(); \
> + __x;
> +})


And that works for me. No extra stack check stuff.

I guess the question is does that version of READ_ONCE() implement the
read once semantics. Do we have a good way to test that?

The only differences are because of the early return in the generic
test_and_set_bit_lock():

   1 :1 
:
   2 addis   r2,r12,281  2 addis   
r2,r12,281
   3 addir2,r2,-223683 addi
r2,r2,-22064
   4 mflrr0  4 mflr
r0
   5 bl  <_mcount>   5 bl  
<_mcount>
   6 mflrr0  6 mflr
r0
   7 std r31,-8(r1)  7 std 
r31,-8(r1)
   8 std r30,-16(r1) 8 std 
r30,-16(r1)
   9 mr  r31,r3  9 mr  
r31,r3
  10 li  r3,24  10 li  
r3,24
  11 std r0,16(r1)  11 std 
r0,16(r1)
  12 stdur1,-128(r1)12 stdu
r1,-112(r1)
  13 ld  r30,920(r31)   13 ld  
r30,920(r31)
  14 bl14 bl  

  15 nop15 nop
  16 cmpdi   cr7,r3,0   16 cmpdi   
cr7,r3,0
  17 beq cr7,   17 beq 
cr7,
  18 ld  r9,920(r31)18 ld  
r9,920(r31)
  19 ld  r10,96(r30)19 ld  
r10,96(r30)
  20 lwz r7,84(r30) 20 lwz 
r7,84(r30)
  21 ld  r8,104(r9) 21 ld  
r8,104(r9)
  22 ld  r10,24(r10)22 ld  
r10,24(r10)
  23 lwz r8,20(r8)  23 lwz 
r8,20(r8)
  24 srd r10,r10,r7 24 srd 
r10,r10,r7
  25 cmpdcr7,r10,r8 25 cmpd
cr7,r10,r8
  26 bne cr7,  26 bne 
cr7,
  27 lhz r10,160(r9)27 lhz 
r10,160(r9)
  28 andi.   r10,r10,2  28 andi.   
r10,r10,2
  29 bne 
  30 ld  r10,560(r9)
  31 std r10,104(r1)
  32 ld  r10,104(r1)
  33 andi.   r10,r10,1
  34 bne29 bne 

  35 addir7,r9,560  30 addi
r9,r9,560
  36 li  r8,1   31 li  
r10,1
  37 ldarx   r10,0,r7   32 ldarx   
r3,0,r9,1
  38 or  r6,r8,r10  33 or  
r8,r3,r10
  39 stdcx.  r6,0,r734 stdcx.  
r8,0,r9
  40 bne-   35 bne-

  41 isync  36 

Re: [PATCH 4/4] powerpc: Book3S 64-bit "heavyweight" KASAN support

2019-12-13 Thread Christophe Leroy




On 10/12/2019 at 06:10, Daniel Axtens wrote:

Christophe Leroy  writes:


On 07/08/2019 at 01:38, Daniel Axtens wrote:

KASAN support on powerpc64 is interesting:

   - We want to be able to support inline instrumentation so as to be
 able to catch global and stack issues.

   - We run a lot of code at boot in real mode. This includes stuff like
 printk(), so it's not feasible to just disable instrumentation
 around it.


Have you definitely given up on the idea of doing a standard implementation
of KASAN like the other 64-bit arches have done?

Isn't it possible to set up an early 1:1 mapping and go into virtual mode
earlier? What is so different between book3s64 and book3e64?
On book3e64, we've been able to set up KASAN before printing anything
(except when using EARLY_DEBUG). Isn't it feasible on book3s64 too?


So I got this pretty wrong when trying to explain it. The problem isn't
that we run code at boot, as I said; it's that a bunch of the KVM
code runs in real mode.


Ok.

Does it mean we would be able to implement it the standard way when 
CONFIG_KVM is not selected ?





   - disabled reporting when we're checking the stack for exception
 frames. The behaviour isn't wrong, just incompatible with KASAN.


Does this apply to / impact PPC32 at all?


It should. I found that when doing stack walks, the code would touch
memory that KASAN hadn't unpoisoned. I'm a bit surprised you haven't
seen it arise, tbh.


How do you trigger that ?

I've tried to trigger some faults with LKDTM that provoke BUG dumps, but 
it doesn't trip.
I also performed task state listing via sysrq, and I don't get anything 
wrong either.





   - Dropped old module stuff in favour of KASAN_VMALLOC.


You said in the cover that this is done to avoid having to split modules
out of the VMALLOC area. Would it be an issue to perform that split?
I can understand it is not easy on 32-bit because the vmalloc space is
rather small, but on 64-bit don't we have enough virtual space to
comfortably split modules out of vmalloc? 64-bit already splits
ioremap away from vmalloc, whereas 32-bit has them merged.


I could have done this. Maybe I should have done this. But now I have
done vmalloc space support.


So you force the use of KASAN_VMALLOC? Doesn't it have a performance
impact?



Christophe


Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

2019-12-13 Thread Luc Van Oostenryck
On Thu, Dec 12, 2019 at 09:53:38PM +0100, Peter Zijlstra wrote:
> Now, looking at the current GCC source:
> 
>   
> https://github.com/gcc-mirror/gcc/blob/97d7270f894395e513667a031a0c309d1819d05e/gcc/c/c-parser.c#L3707
> 
> it seems that __typeof__() is supposed to strip all qualifiers from
> _Atomic types. That lead me to try:
> 
>   typeof(_Atomic typeof(p)) __p = (p);
> 
> But alas, I still get the same junk you got for ool_store_release() :/

I was checking this to see if Sparse was ready to support this.
I was a bit surprised because at first sight GCC was doing as
it claims (typeof stripping const & volatile on _Atomic types)
but your example wasn't working. But it's working if an
intermediate var is used:
_Atomic typeof(p) tmp;
typeof(tmp) __p = (p);
or, uglier but probably more practical:
typeof(({_Atomic typeof(p) tmp; })) __p = (p);

Go figure!

OTOH, at least on GCC 8.3, it seems to always do the same with
volatiles than it does with consts.

-- Luc Van Oostenryck
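
A minimal standalone demo of the trick discussed in this sub-thread -- it
assumes a compiler with _Atomic support (roughly GCC >= 4.9) and relies on the
typeof-strips-qualifiers-from-_Atomic behaviour described above:

#define unqual_typeof(x) \
	typeof(({ _Atomic typeof(x) ___x __attribute__((unused)); ___x; }))

static volatile int hw_register;

int main(void)
{
	unqual_typeof(hw_register) tmp;	/* declared as plain int, not volatile int */

	tmp = hw_register;	/* the load from hw_register is still volatile */
	tmp += 1;		/* but tmp itself can live in a register */
	return tmp - hw_register - 1;
}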


Re: [PATCH kernel 1/3] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VM.

2019-12-13 Thread Michael Ellerman
Alexey Kardashevskiy  writes:

> From: Ram Pai 
>
> Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
>   secure guests")
> disabled the dma_iommu_ops path for secure VMs. Disabling the dma_iommu_ops
> path for secure VMs helped enable the dma_direct path. This enabled
> support for bounce-buffering through SWIOTLB. However it fails to
> operate when an IOMMU is enabled, since I/O pages are not TCE mapped.
>
> Re-enable the dma_iommu_ops path for pseries secure VMs. It handles all
> cases, including TCE mapping I/O pages in the presence of an IOMMU.
>
> Signed-off-by: Ram Pai 
> Signed-off-by: Alexey Kardashevskiy 

This seems like it should have a Fixes tag?

cheers

> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 6ba081dd61c9..df7db33ca93b 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -36,7 +36,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "pseries.h"
>  
> @@ -1320,15 +1319,7 @@ void iommu_init_early_pSeries(void)
>   of_reconfig_notifier_register(&iommu_reconfig_nb);
>   register_memory_notifier(&iommu_mem_nb);
>  
> - /*
> -  * Secure guest memory is inacessible to devices so regular DMA isn't
> -  * possible.
> -  *
> -  * In that case keep devices' dma_map_ops as NULL so that the generic
> -  * DMA code path will use SWIOTLB to bounce buffers for DMA.
> -  */
> - if (!is_secure_guest())
> - set_pci_dma_ops(&dma_iommu_ops);
> + set_pci_dma_ops(&dma_iommu_ops);
>  }
>  
>  static int __init disable_multitce(char *str)
> -- 
> 2.17.1


Re: [PATCH kernel 3/3] powerpc/pseries/iommu: Do not use H_PUT_TCE_INDIRECT in secure VM

2019-12-13 Thread Michael Ellerman
Alexey Kardashevskiy  writes:

> H_PUT_TCE_INDIRECT uses a 4K page with TCEs to allow handling up to 512
> TCEs per hypercall. While it is a decent optimization, we would rather
> share less secure-VM memory, so let's avoid the sharing.
>
> This only allows H_PUT_TCE_INDIRECT for normal (not secure) VMs.
>
> This keeps using H_STUFF_TCE as it does not require sharing.
>
> Signed-off-by: Alexey Kardashevskiy 
> ---
>
> Possible alternatives are:
>
> 1. define FW_FEATURE_STUFFTCE (to allow H_STUFF_TCE) in addition to
> FW_FEATURE_MULTITCE (make it only enable H_PUT_TCE_INDIRECT) and enable
> only FW_FEATURE_STUFFTCE for SVM; pro = no SVM mention in iommu.c,
> con = a FW feature bit with very limited use

Yes let's do that please.

Actually let's rename FW_FEATURE_MULTITCE to FW_FEATURE_PUT_TCE_IND (?)
to make it clear that's what it controls now.

You should just be able to add two entries to hypertas_fw_features_table
that both look for "hcall-multi-tce". And then the SVM code disables the
PUT_TCE_IND feature at some point.

cheers

>
> 2. disable FW_FEATURE_MULTITCE and lose H_STUFF_TCE, which adds a delay
> to the boot process
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index f6e9b87c82fc..2334a67c7614 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -192,7 +192,8 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
> *tbl, long tcenum,
>   int ret = 0;
>   unsigned long flags;
>  
> - if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE)) {
> + if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE) ||
> + is_secure_guest()) {
>   return tce_build_pSeriesLP(tbl->it_index, tcenum,
>  tbl->it_page_shift, npages, uaddr,
>  direction, attrs);
> @@ -402,7 +403,8 @@ static int tce_setrange_multi_pSeriesLP(unsigned long 
> start_pfn,
>   u64 rc = 0;
>   long l, limit;
>  
> - if (!firmware_has_feature(FW_FEATURE_MULTITCE)) {
> + if (!firmware_has_feature(FW_FEATURE_MULTITCE) ||
> + is_secure_guest()) {
>   unsigned long tceshift = be32_to_cpu(maprange->tce_shift);
>   unsigned long dmastart = (start_pfn << PAGE_SHIFT) +
>   be64_to_cpu(maprange->dma_base);
> -- 
> 2.17.1
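
To visualise the approach suggested above, here is a toy model (the
FW_FEATURE_PUT_TCE_IND / FW_FEATURE_STUFF_TCE names are the ones proposed in
this thread and the table handling is heavily simplified, so treat it as a
sketch rather than the eventual patch): the single "hcall-multi-tce" hypertas
string feeds two separate feature bits, and a secure guest then clears only
the indirect-put bit while keeping H_STUFF_TCE usable.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FW_FEATURE_PUT_TCE_IND	(1UL << 0)
#define FW_FEATURE_STUFF_TCE	(1UL << 1)

struct hypertas_feature {
	unsigned long val;
	const char *name;
};

/* Both entries latch onto the same hypertas string. */
static const struct hypertas_feature table[] = {
	{ FW_FEATURE_PUT_TCE_IND, "hcall-multi-tce" },
	{ FW_FEATURE_STUFF_TCE,   "hcall-multi-tce" },
};

static unsigned long firmware_features;

static void probe_hypertas(const char *prop)
{
	for (unsigned int i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strstr(prop, table[i].name))
			firmware_features |= table[i].val;
}

int main(void)
{
	bool secure_guest = true;	/* pretend we booted as an SVM */

	probe_hypertas("hcall-tce hcall-multi-tce hcall-sprg0");
	if (secure_guest)
		firmware_features &= ~FW_FEATURE_PUT_TCE_IND;	/* keep H_STUFF_TCE only */

	printf("indirect=%d stuff=%d\n",
	       !!(firmware_features & FW_FEATURE_PUT_TCE_IND),
	       !!(firmware_features & FW_FEATURE_STUFF_TCE));
	return 0;
}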


[PATCH] libbpf: Fix readelf output parsing for Fedora

2019-12-13 Thread Thadeu Lima de Souza Cascardo
Fedora binutils has been patched to show "other info" for a symbol at the
end of the line. This was done in order to support unmaintained scripts
that would break with the extra info. [1]

[1] 
https://src.fedoraproject.org/rpms/binutils/c/b8265c46f7ddae23a792ee8306fbaaeacba83bf8

This in turn has been done to fix the build of ruby, because of checksec.
[2] Thanks Michael Ellerman for the pointer.

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1479302

As the libbpf Makefile is not unmaintained, we can simply deal with either
output format by just removing the "other info" field, as it always comes
inside brackets.

Cc: Aurelien Jarno 
Fixes: 3464afdf11f9 (libbpf: Fix readelf output parsing on powerpc with recent 
binutils)
Reported-by: Justin Forbes 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 tools/lib/bpf/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index defae23a0169..23ae06c43d08 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -147,6 +147,7 @@ TAGS_PROG := $(if $(shell which etags 2>/dev/null),etags,ctags)
 
 GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN_SHARED) | \
   cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' | \
+  sed 's/\[.*\]//' | \
   awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}' | \
   sort -u | wc -l)
 VERSIONED_SYM_COUNT = $(shell readelf -s --wide $(OUTPUT)libbpf.so | \
@@ -213,6 +214,7 @@ check_abi: $(OUTPUT)libbpf.so
 "versioned in $(VERSION_SCRIPT)." >&2;  \
readelf -s --wide $(BPF_IN_SHARED) | \
cut -d "@" -f1 | sed 's/_v[0-9]_[0-9]_[0-9].*//' |   \
+   sed 's/\[.*\]//' |   \
awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'|  \
sort -u > $(OUTPUT)libbpf_global_syms.tmp;   \
readelf -s --wide $(OUTPUT)libbpf.so |   \
-- 
2.24.0



Re: [PATCH v5 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-13 Thread Michael Ellerman
Srikar Dronamraju  writes:
> * Michael Ellerman  [2019-12-13 14:50:35]:
>
>> Waiman Long suggested using static_keys.
>> 
>> Fixes: 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs")
>> Cc: sta...@vger.kernel.org # v4.18+
>> Reported-by: Parth Shah 
>> Reported-by: Ihor Pasichnyk 
>> Tested-by: Juri Lelli 
>> Acked-by: Waiman Long 
>> Reviewed-by: Gautham R. Shenoy 
>> Signed-off-by: Srikar Dronamraju 
>> Acked-by: Phil Auld 
>> Reviewed-by: Vaidyanathan Srinivasan 
>> Tested-by: Parth Shah 
>> [mpe: Move the key and setting of the key to pseries/setup.c]
>> Signed-off-by: Michael Ellerman 
>> ---
>>  arch/powerpc/include/asm/spinlock.h| 4 +++-
>>  arch/powerpc/platforms/pseries/setup.c | 7 +++
>>  2 files changed, 10 insertions(+), 1 deletion(-)
>> 
>
> Tested with your version of the patch and it's working as expected

Thanks, I took the values below and put them in the changelog for the
patched case.

cheers

> Latency percentiles (usec)
> 50.0th: 45
> 75.0th: 63
> 90.0th: 74
> 95.0th: 78
> *99.0th: 82
> 99.5th: 83
> 99.9th: 86
> min=0, max=96
> Latency percentiles (usec)
> 50.0th: 46
> 75.0th: 64
> 90.0th: 75
> 95.0th: 79
> *99.0th: 83
> 99.5th: 85
> 99.9th: 91
> min=0, max=117
> Latency percentiles (usec)
> 50.0th: 46
> 75.0th: 64
> 90.0th: 75
> 95.0th: 79
> *99.0th: 83
> 99.5th: 84
> 99.9th: 90
> min=0, max=95
> Latency percentiles (usec)
> 50.0th: 47
> 75.0th: 65
> 90.0th: 75
> 95.0th: 79
> *99.0th: 84
> 99.5th: 85
> 99.9th: 90
> min=0, max=117
> Latency percentiles (usec)
> 50.0th: 45
> 75.0th: 64
> 90.0th: 75
> 95.0th: 79
> *99.0th: 82
> 99.5th: 83
> 99.9th: 93
> min=0, max=111
>
>
> -- 
> Thanks and Regards
> Srikar Dronamraju
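
For reference, a rough sketch of the static-key approach discussed in this
thread (simplified and abridged; guard #ifdefs and includes are omitted,
and the setup hook placement is only illustrative, not a quote from the
patch):

/* arch/powerpc/include/asm/spinlock.h (sketch) */
DECLARE_STATIC_KEY_FALSE(shared_processor);

#define vcpu_is_preempted vcpu_is_preempted
static inline bool vcpu_is_preempted(int cpu)
{
	/* Dedicated-processor LPARs never have their vCPUs preempted. */
	if (!static_branch_unlikely(&shared_processor))
		return false;
	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
}

/* arch/powerpc/platforms/pseries/setup.c (sketch) */
DEFINE_STATIC_KEY_FALSE(shared_processor);

/* during platform setup (illustrative placement): */
if (lppaca_shared_proc(get_lppaca()))
	static_branch_enable(&shared_processor);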


[PATCH kernel 0/3] Enable IOMMU support for pseries Secure VMs

2019-12-13 Thread Alexey Kardashevskiy
This enables IOMMU support for SVM. Instead of sharing
an H_PUT_TCE_INDIRECT page, this uses H_PUT_TCE for mapping
and H_STUFF_TCE for clearing TCEs, which should give
acceptable performance at boot time; after boot things
run at the same speed anyway.

Please comment. Thanks.



Alexey Kardashevskiy (2):
  powerpc/pseries: Allow not having
ibm,hypertas-functions::hcall-multi-tce for DDW
  powerpc/pseries/iommu: Do not use H_PUT_TCE_INDIRECT in secure VM

Ram Pai (1):
  powerpc/pseries/iommu: Use dma_iommu_ops for Secure VM.

 arch/powerpc/platforms/pseries/iommu.c | 59 +++---
 1 file changed, 34 insertions(+), 25 deletions(-)

-- 
2.17.1



[PATCH kernel 3/3] powerpc/pseries/iommu: Do not use H_PUT_TCE_INDIRECT in secure VM

2019-12-13 Thread Alexey Kardashevskiy
H_PUT_TCE_INDIRECT uses a 4K page with TCEs to allow handling up to 512
TCEs per hypercall. While it is a decent optimization, we would rather
share less of secure VM memory, so let's avoid sharing.

This only allows H_PUT_TCE_INDIRECT for normal (not secure) VMs.

This keeps using H_STUFF_TCE as it does not require sharing.
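
(For reference, the 512 figure follows directly from the page size: a 4K
page holds 4096 / 8 = 512 eight-byte TCE entries.)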

Signed-off-by: Alexey Kardashevskiy 
---

Possible alternatives are:

1. define FW_FEATURE_STUFFTCE (to allow H_STUFF_TCE) in addition to
FW_FEATURE_MULTITCE (make it only enable H_PUT_TCE_INDIRECT) and enable
only FW_FEATURE_STUFFTCE for SVM; pro = no SVM mention in iommu.c,
con = a FW feature bit with very limited use

2. disable FW_FEATURE_MULTITCE and lose H_STUFF_TCE, which adds a delay
to the boot process
---
 arch/powerpc/platforms/pseries/iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index f6e9b87c82fc..2334a67c7614 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -192,7 +192,8 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
int ret = 0;
unsigned long flags;
 
-   if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE)) {
+   if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE) ||
+   is_secure_guest()) {
return tce_build_pSeriesLP(tbl->it_index, tcenum,
   tbl->it_page_shift, npages, uaddr,
   direction, attrs);
@@ -402,7 +403,8 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
u64 rc = 0;
long l, limit;
 
-   if (!firmware_has_feature(FW_FEATURE_MULTITCE)) {
+   if (!firmware_has_feature(FW_FEATURE_MULTITCE) ||
+   is_secure_guest()) {
unsigned long tceshift = be32_to_cpu(maprange->tce_shift);
unsigned long dmastart = (start_pfn << PAGE_SHIFT) +
be64_to_cpu(maprange->dma_base);
-- 
2.17.1



[PATCH kernel 2/3] powerpc/pseries: Allow not having ibm,hypertas-functions::hcall-multi-tce for DDW

2019-12-13 Thread Alexey Kardashevskiy
By default a pseries guest supports an H_PUT_TCE hypercall which maps
a single IOMMU page in a DMA window. Additionally the hypervisor may
support H_PUT_TCE_INDIRECT/H_STUFF_TCE which update multiple TCEs at once;
this is advertised via the device tree /rtas/ibm,hypertas-functions
property which Linux converts to FW_FEATURE_MULTITCE.

FW_FEATURE_MULTITCE is checked when dma_iommu_ops is used; however
the code managing the huge DMA window (DDW) ignores it and calls
H_PUT_TCE_INDIRECT even if it is explicitly disabled via
the "multitce=off" kernel command line parameter.

This adds FW_FEATURE_MULTITCE checking to the DDW code path.

This changes tce_build_pSeriesLP to take the liobn and page size directly,
as the huge window does not have an iommu_table descriptor, which is
usually the place where these numbers are stored.
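
For instance (an illustrative call only, not a quote from the patch), the
huge-window code can then pass the values it reads from the dynamic DMA
window property instead of an iommu_table:

	tce_build_pSeriesLP(be32_to_cpu(maprange->liobn), tcenum,
			    be32_to_cpu(maprange->tce_shift),
			    npages, uaddr, DMA_BIDIRECTIONAL, 0);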

Signed-off-by: Alexey Kardashevskiy 
---

The idea is to then set FW_FEATURE_MULTITCE in init_svm() and have the guest
use H_PUT_TCE without sharing a (temporary) page for H_PUT_TCE_INDIRECT.
---
 arch/powerpc/platforms/pseries/iommu.c | 44 ++
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index df7db33ca93b..f6e9b87c82fc 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -132,10 +132,10 @@ static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
return be64_to_cpu(*tcep);
 }
 
-static void tce_free_pSeriesLP(struct iommu_table*, long, long);
+static void tce_free_pSeriesLP(unsigned long liobn, long, long);
 static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
 
-static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
+static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
long npages, unsigned long uaddr,
enum dma_data_direction direction,
unsigned long attrs)
@@ -146,25 +146,25 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
int ret = 0;
long tcenum_start = tcenum, npages_start = npages;
 
-   rpn = __pa(uaddr) >> TCE_SHIFT;
+   rpn = __pa(uaddr) >> tceshift;
proto_tce = TCE_PCI_READ;
if (direction != DMA_TO_DEVICE)
proto_tce |= TCE_PCI_WRITE;
 
while (npages--) {
-   tce = proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT;
-   rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, tce);
+   tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
+   rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
 
if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
ret = (int)rc;
-   tce_free_pSeriesLP(tbl, tcenum_start,
+   tce_free_pSeriesLP(liobn, tcenum_start,
   (npages_start - (npages + 1)));
break;
}
 
if (rc && printk_ratelimit()) {
printk("tce_build_pSeriesLP: plpar_tce_put failed. 
rc=%lld\n", rc);
-   printk("\tindex   = 0x%llx\n", (u64)tbl->it_index);
+   printk("\tindex   = 0x%llx\n", (u64)liobn);
printk("\ttcenum  = 0x%llx\n", (u64)tcenum);
printk("\ttce val = 0x%llx\n", tce );
dump_stack();
@@ -193,7 +193,8 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
unsigned long flags;
 
if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE)) {
-   return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
+   return tce_build_pSeriesLP(tbl->it_index, tcenum,
+  tbl->it_page_shift, npages, uaddr,
   direction, attrs);
}
 
@@ -209,8 +210,9 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
/* If allocation fails, fall back to the loop implementation */
if (!tcep) {
local_irq_restore(flags);
-   return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
-   direction, attrs);
+   return tce_build_pSeriesLP(tbl->it_index, tcenum,
+   tbl->it_page_shift,
+   npages, uaddr, direction, attrs);
}
__this_cpu_write(tce_page, tcep);
}
@@ -261,16 +263,16 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
return ret;
 }
 
-static void tce_free_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages)
+static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long npages)
 {
   

[PATCH kernel 1/3] powerpc/pseries/iommu: Use dma_iommu_ops for Secure VM.

2019-12-13 Thread Alexey Kardashevskiy
From: Ram Pai 

Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
secure guests") disabled the dma_iommu_ops path for secure VMs, which
enabled the dma_direct path and with it support for bounce-buffering
through SWIOTLB. However, that fails to operate when an IOMMU is
enabled, since I/O pages are not TCE mapped.

Re-enable the dma_iommu_ops path for pseries secure VMs. It handles all
cases, including TCE mapping of I/O pages, in the presence of an IOMMU.

Signed-off-by: Ram Pai 
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/pseries/iommu.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 6ba081dd61c9..df7db33ca93b 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -36,7 +36,6 @@
 #include <asm/udbg.h>
 #include <asm/mmzone.h>
 #include <asm/plpar_wrappers.h>
-#include <asm/svm.h>
 
 #include "pseries.h"
 
@@ -1320,15 +1319,7 @@ void iommu_init_early_pSeries(void)
of_reconfig_notifier_register(&iommu_reconfig_nb);
register_memory_notifier(&iommu_mem_nb);
 
-   /*
-* Secure guest memory is inacessible to devices so regular DMA isn't
-* possible.
-*
-* In that case keep devices' dma_map_ops as NULL so that the generic
-* DMA code path will use SWIOTLB to bounce buffers for DMA.
-*/
-   if (!is_secure_guest())
-   set_pci_dma_ops(&dma_iommu_ops);
+   set_pci_dma_ops(&dma_iommu_ops);
 }
 
 static int __init disable_multitce(char *str)
-- 
2.17.1