Re: [PATCH v3 9/9] efi: move screen_info into efi init code

2023-10-10 Thread Catalin Marinas
On Mon, Oct 09, 2023 at 11:18:45PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> After the vga console no longer relies on global screen_info, there are
> only two remaining use cases:
> 
>  - on the x86 architecture, it is used for multiple boot methods
>(bzImage, EFI, Xen, kexec) to commucate the initial VGA or framebuffer
>settings to a number of device drivers.
> 
>  - on other architectures, it is only used as part of the EFI stub,
>and only for the three sysfb framebuffers (simpledrm, simplefb, efifb).
> 
> Remove the duplicate data structure definitions by moving it into the
> efi-init.c file that sets it up initially for the EFI case, leaving x86
> as an exception that retains its own definition for non-EFI boots.
> 
> The added #ifdefs here are optional, I added them to further limit the
> reach of screen_info to configurations that have at least one of the
> users enabled.
> 
> Reviewed-by: Ard Biesheuvel 
> Reviewed-by: Javier Martinez Canillas 
> Acked-by: Helge Deller 
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/arm/kernel/setup.c   |  4 
>  arch/arm64/kernel/efi.c   |  4 
>  arch/arm64/kernel/image-vars.h|  2 ++

It's more Ard's thing and he reviewed it already but if you need another
ack:

Acked-by: Catalin Marinas 


Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers

2023-05-24 Thread Catalin Marinas
On Wed, May 17, 2023 at 01:27:48PM +0200, Petr Tesařík wrote:
> On Wed, 17 May 2023 08:35:10 +0200
> Petr Tesařík  wrote:
> > Anyway, my greatest objection to allocating additional swiotlb chunks is
> > that _all_ of them must be searched to determine that the physical
> > address does _not_ belong to a swiotlb, incurring performance penalty
> 
> I thought about this part again, and I overlooked one option. We can
> track only the _active_ swiotlbs for each device. If a device never
> needs a swiotlb, there is no active swiotlb, and is_swiotlb_buffer()
> short-circuits to false. This should avoid all collateral damage to
> innocent devices.

Does this work with dma-buf or does dma-buf not allow swiotlb bouncing?

-- 
Catalin


Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers

2023-05-18 Thread Catalin Marinas
On Wed, May 17, 2023 at 11:58:21AM +0200, Petr Tesařík wrote:
> On Wed, 17 May 2023 10:41:19 +0100
> Catalin Marinas  wrote:
> > On Wed, May 17, 2023 at 08:56:53AM +0200, Christoph Hellwig wrote:
> > > Just thinking out loud:
> > > 
> > >  - what if we always way overallocate the swiotlb buffer
> > >  - and then mark the second half / two thirds /  > >of the thin air> slots as used, and make that region available
> > >through a special CMA mechanism as ZONE_MOVABLE (but not allowing
> > >other CMA allocations to dip into it).
> > > 
> > > This allows us to have a single slot management for the entire
> > > area, but allow reclaiming from it.  We'd probably also need to make
> > > this CMA variant irq safe.  
> > 
> > I think this could work. It doesn't need to be ZONE_MOVABLE (and we
> > actually need this buffer in ZONE_DMA). But we can introduce a new
> > migrate type, MIGRATE_SWIOTLB, and movable page allocations can use this
> > range. The CMA allocations go to free_list[MIGRATE_CMA], so they won't
> > overlap.
> > 
> > One of the downsides is that migrating movable pages still needs a
> > sleep-able context.
> 
> Pages can be migrated by a separate worker thread when the number of
> free slots reaches a low watermark.

Indeed, you still need such worker thread.

> > Another potential confusion is is_swiotlb_buffer() for pages in this
> > range allocated through the normal page allocator. We may need to check
> > the slots as well rather than just the buffer boundaries.
> 
> Ah, yes, I forgot about this part; thanks for the reminder.
> 
> Indeed, movable pages can be used for the page cache, and drivers do
> DMA to/from buffers in the page cache.
> 
> Let me recap:
> 
> - Allocated chunks must still be tracked with this approach.
> - The pool of available slots cannot be grown from interrupt context.
> 
> So, what exactly is the advantage compared to allocating additional
> swiotlb chunks from CMA?

This would work as well but it depends on how many other drivers
allocate from the CMA range. Maybe it's simpler to this initially (I
haven't got to your other emails yet).

> > > This could still be combined with more aggressive use of per-device
> > > swiotlb area, which is probably a good idea based on some hints.
> > > E.g. device could hint an amount of inflight DMA to the DMA layer,
> > > and if there are addressing limitations and the amout is large enough
> > > that could cause the allocation of a per-device swiotlb area.  
> > 
> > If we go for one large-ish per-device buffer for specific cases, maybe
> > something similar to the rmem_swiotlb_setup() but which can be
> > dynamically allocated at run-time and may live alongside the default
> > swiotlb. The advantage is that it uses a similar slot tracking to the
> > default swiotlb, no need to invent another. This per-device buffer could
> > also be allocated from the MIGRATE_SWIOTLB range if we make it large
> > enough at boot. It would be seen just a local accelerator for devices
> > that use bouncing frequently or from irq context.
> 
> A per-device pool could also be used for small buffers. IIRC somebody
> was interested in that.

That was me ;) but TBH, I don't care how large the bounce buffer is,
only that it can bounce small structures. If there's some critical path,
people can change the kmalloc() allocation for those structures to make
them cacheline-aligned.

-- 
Catalin


Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers

2023-05-18 Thread Catalin Marinas
On Wed, May 17, 2023 at 08:56:53AM +0200, Christoph Hellwig wrote:
> Just thinking out loud:
> 
>  - what if we always way overallocate the swiotlb buffer
>  - and then mark the second half / two thirds / of the thin air> slots as used, and make that region available
>through a special CMA mechanism as ZONE_MOVABLE (but not allowing
>other CMA allocations to dip into it).
> 
> This allows us to have a single slot management for the entire
> area, but allow reclaiming from it.  We'd probably also need to make
> this CMA variant irq safe.

I think this could work. It doesn't need to be ZONE_MOVABLE (and we
actually need this buffer in ZONE_DMA). But we can introduce a new
migrate type, MIGRATE_SWIOTLB, and movable page allocations can use this
range. The CMA allocations go to free_list[MIGRATE_CMA], so they won't
overlap.

One of the downsides is that migrating movable pages still needs a
sleep-able context.

Another potential confusion is is_swiotlb_buffer() for pages in this
range allocated through the normal page allocator. We may need to check
the slots as well rather than just the buffer boundaries.

(we are actually looking at a MIGRATE_METADATA type for the arm64 memory
tagging extension which uses a 3% carveout of the RAM for storing the
tags and people want that reused somehow; we have some WIP patches but
we'll post them later this summer)

> This could still be combined with more aggressive use of per-device
> swiotlb area, which is probably a good idea based on some hints.
> E.g. device could hint an amount of inflight DMA to the DMA layer,
> and if there are addressing limitations and the amout is large enough
> that could cause the allocation of a per-device swiotlb area.

If we go for one large-ish per-device buffer for specific cases, maybe
something similar to the rmem_swiotlb_setup() but which can be
dynamically allocated at run-time and may live alongside the default
swiotlb. The advantage is that it uses a similar slot tracking to the
default swiotlb, no need to invent another. This per-device buffer could
also be allocated from the MIGRATE_SWIOTLB range if we make it large
enough at boot. It would be seen just a local accelerator for devices
that use bouncing frequently or from irq context.

-- 
Catalin


Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers

2023-05-17 Thread Catalin Marinas
On Tue, May 16, 2023 at 08:39:42AM +0200, Petr Tesařík wrote:
> On Tue, 16 May 2023 08:13:09 +0200
> Christoph Hellwig  wrote:
> > On Mon, May 15, 2023 at 07:43:52PM +, Michael Kelley (LINUX) wrote:
> > > FWIW, I don't think the approach you have implemented here will be
> > > practical to use for CoCo VMs (SEV, TDX, whatever else).  The problem
> > > is that dma_direct_alloc_pages() and dma_direct_free_pages() must
> > > call dma_set_decrypted() and dma_set_encrypted(), respectively.  In CoCo
> > > VMs, these calls are expensive because they require a hypercall to the 
> > > host,
> > > and the operation on the host isn't trivial either.  I haven't measured 
> > > the
> > > overhead, but doing a hypercall on every DMA map operation and on
> > > every unmap operation has long been something we thought we must
> > > avoid.  The fixed swiotlb bounce buffer space solves this problem by
> > > doing set_decrypted() in batch at boot time, and never
> > > doing set_encrypted().  
> > 
> > I also suspect it doesn't really scale too well due to the number of
> > allocations.  I suspect a better way to implement things would be to
> > add more large chunks that are used just like the main swiotlb buffers.
> > 
> > That is when we run out of space try to allocate another chunk of the
> > same size in the background, similar to what we do with the pool in
> > dma-pool.c.  This means we'll do a fairly large allocation, so we'll
> > need compaction or even CMA to back it up, but the other big upside
> > is that it also reduces the number of buffers that need to be checked
> > in is_swiotlb_buffer or the free / sync side.
> 
> I have considered this approach. The two main issues I ran into were:
> 
> 1. MAX_ORDER allocations were too small (at least with 4K pages), and
>even then they would often fail.
> 
> 2. Allocating from CMA did work but only from process context.
>I made a stab at modifying the CMA allocator to work from interrupt
>context, but there are non-trivial interactions with the buddy
>allocator. Making them safe from interrupt context looked like a
>major task.

Can you kick off a worker thread when the swiotlb allocation gets past
some reserve limit? It still has a risk of failing to bounce until the
swiotlb buffer is extended.

> I also had some fears about the length of the dynamic buffer list. I
> observed maximum length for block devices, and then it roughly followed
> the queue depth. Walking a few hundred buffers was still fast enough.
> I admit the list length may become an issue with high-end NVMe and
> I/O-intensive applications.

You could replace the list with an rbtree, O(log n) look-up vs O(n),
could be faster if you have many bounces active.

-- 
Catalin


Re: [PATCH v2 RESEND 7/7] swiotlb: per-device flag if there are dynamically allocated buffers

2023-05-17 Thread Catalin Marinas
On Tue, May 16, 2023 at 09:55:12AM +0200, Petr Tesařík wrote:
> On Mon, 15 May 2023 17:28:38 +0100
> Catalin Marinas  wrote:
> > There is another scenario to take into account on the list_del() side.
> > Let's assume that there are other elements on the list, so
> > list_empty() == false:
> > 
> > P0:
> > list_del(paddr);
> > /* the memory gets freed, added to some slab or page free list */
> > WRITE_ONCE(slab_free_list, __va(paddr));
> > 
> > P1:
> > paddr = __pa(READ_ONCE(slab_free_list));/* re-allocating paddr freed on 
> > P0 */
> > if (!list_empty()) {/* assuming other elements on 
> > the list */
> > /* searching the list */
> > list_for_each() {
> > if (pos->paddr) == __pa(vaddr))
> > /* match */
> > }
> > }
> > 
> > On P0, you want the list update to be visible before the memory is freed
> > (and potentially reallocated on P1). An smp_wmb() on P0 would do. For
> > P1, we don't care about list_empty() as there can be other elements
> > already. But we do want any list elements reading during the search to
> > be ordered after the slab_free_list reading. The smp_rmb() you'd add for
> > the case above would suffice.
> 
> Yes, but to protect against concurrent insertions/deletions, a spinlock
> is held while searching the list. The spin lock provides the necessary
> memory barriers implicitly.

Well, mostly. The spinlock acquire/release semantics ensure that
accesses within the locked region are not observed outside the
lock/unlock. But it doesn't guarantee anything about accesses outside
such region in relation to the accesses within the region. For example:

P0:
spin_lock_irqsave(_dyn_lock);
list_del(paddr);
spin_unlock_irqrestore(_dyn_lock);

/* the blah write below can be observed before list_del() above */
WRITE_ONCE(blah, paddr);

/* that's somewhat tricker but slab_free_list update can also be
 * seen before list_del() above on certain architectures */
spin_lock_irqsave(_lock);
WRITE_ONCE(slab_free_list, __va(paddr));
spin_unlock_irqrestore(_lock);

On most architectures, the writing of the pointer to a slab structure
(assuming some spinlocks) would be ordered against the list_del() from
the swiotlb code. Apart from powerpc where the spin_unlock() is not
necessarily ordered against the subsequent spin_lock(). The architecture
selects ARCH_WEAK_RELEASE_ACQUIRE which in turns makes
smp_mb__after_unlock_lock() an smp_mb() (rather than no-op on all the
other architectures).

On arm64 we have smp_mb__after_spinlock() which ensures that memory
accesses prior to spin_lock() are not observed after accesses within the
locked region. I don't think this matters for your case but I thought
I'd mention it.

-- 
Catalin


Re: [PATCH v2 RESEND 7/7] swiotlb: per-device flag if there are dynamically allocated buffers

2023-05-16 Thread Catalin Marinas
(some of you replies may have been filtered to various of my mailboxes,
depending on which lists you cc'ed; replying here)

On Mon, May 15, 2023 at 12:00:54PM +0200, Petr Tesařík wrote:
> On Mon, 15 May 2023 10:48:47 +0200
> Petr Tesařík  wrote:
> > On Sun, 14 May 2023 19:54:27 +0100
> > Catalin Marinas  wrote:
> > > Now, thinking about the list_head access and the flag ordering, since it
> > > doesn't matter, you might as well not bother with the flag at all and
> > > rely on list_add() and list_empty() ordering vs the hypothetical 'blah'
> > > access. Both of these use READ/WRITE_ONCE() for setting
> > > dma_io_tlb_dyn_slots.next. You only need an smp_wmb() after the
> > > list_add() and an smp_rmb() before a list_empty() check in
>   ^
> Got it, finally. Well, that's exactly something I don't want to do.
> For example, on arm64 (seeing your email address), smp_rmb() translates
> to a "dsb ld" instruction. I would expect that this is more expensive
> than a "ldar", generated by smp_load_acquire().

It translates to a dmb ishld which is on par with ldar (dsb is indeed a
lot more expensive but that's not generated here).

> > > is_swiotlb_buffer(), no dma_iotlb_have_dyn variable.  
> > 
> > Wait, let me check that I understand you right. Do you suggest that I
> > convert dma_io_tlb_dyn_slots to a lockless list and get rid of the
> > spinlock?
> > 
> > I'm sure it can be done for list_add() and list_del(). I'll have
> > to think about list_move().
> 
> Hm, even the documentation of llist_empty() says that it is "not
> guaranteed to be accurate or up to date". If it could be, I'm quite
> sure the authors would have gladly implemented it as such.

It doesn't but neither does your flag. If you want a guarantee, you'd
need locks because a llist_empty() on its own can race with other
llist_add/del_*() that may not yet be visible to a CPU at exactly that
moment. BTW, the llist implementation cannot delete a random element, so
not sure this is suitable for your implementation (it can only delete
the first element or the whole list).

I do think you need to change your invariants and not rely on an
absolute list_empty() or some flag:

P0:
list_add(paddr);
WRITE_ONCE(blah, paddr);

P1:
paddr = READ_ONCE(blah);
list_empty();

Your invariant (on P1) should be blah == paddr => !list_empty(). If
there is another P2 removing paddr from the list, this wouldn't work
(nor your flag) but the assumption is that a correctly written driver
that still has a reference to paddr doesn't use it after being removed
from the list (i.e. it doesn't do a dma_unmap(paddr) and still continue
to use this paddr for e.g. dma_sync()).

For such invariant, you'd need ordering between list_add() and the
write of paddr (smp_wmb() would do). On P1, you need an smp_rmb() before
list_empty() since the implementation does a READ_ONCE only).

You still need the locks for list modifications and list traversal as I
don't see how you can use the llist implementation with random element
removal.


There is another scenario to take into account on the list_del() side.
Let's assume that there are other elements on the list, so
list_empty() == false:

P0:
list_del(paddr);
/* the memory gets freed, added to some slab or page free list */
WRITE_ONCE(slab_free_list, __va(paddr));

P1:
paddr = __pa(READ_ONCE(slab_free_list));/* re-allocating paddr freed on 
P0 */
if (!list_empty()) {/* assuming other elements on 
the list */
/* searching the list */
list_for_each() {
if (pos->paddr) == __pa(vaddr))
/* match */
}
}

On P0, you want the list update to be visible before the memory is freed
(and potentially reallocated on P1). An smp_wmb() on P0 would do. For
P1, we don't care about list_empty() as there can be other elements
already. But we do want any list elements reading during the search to
be ordered after the slab_free_list reading. The smp_rmb() you'd add for
the case above would suffice.

-- 
Catalin


Re: [PATCH v2 RESEND 7/7] swiotlb: per-device flag if there are dynamically allocated buffers

2023-05-15 Thread Catalin Marinas
On Tue, May 09, 2023 at 11:18:19AM +0200, Petr Tesarik wrote:
> diff --git a/include/linux/device.h b/include/linux/device.h
> index d1d2b8557b30..e340e0f06dce 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -516,6 +516,9 @@ struct device_physical_location {
>   * @dma_io_tlb_dyn_slots:
>   *   Dynamically allocated bounce buffers for this device.
>   *   Not for driver use.
> + * @dma_io_tlb_have_dyn:
> + *   Does this device have any dynamically allocated bounce
> + *   buffers? Not for driver use.
>   * @archdata:For arch-specific additions.
>   * @of_node: Associated device tree node.
>   * @fwnode:  Associated device node supplied by platform firmware.
> @@ -623,6 +626,7 @@ struct device {
>   struct io_tlb_mem *dma_io_tlb_mem;
>   spinlock_t dma_io_tlb_dyn_lock;
>   struct list_head dma_io_tlb_dyn_slots;
> + bool dma_io_tlb_have_dyn;
>  #endif
>   /* arch specific additions */
>   struct dev_archdata archdata;
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index daa2064f2ede..8cbb0bebb0bc 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -152,7 +152,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>  
>   return mem &&
>   (is_swiotlb_fixed(mem, paddr) ||
> -  (mem->allow_dyn && is_swiotlb_dyn(dev, paddr)));
> +  /* Pairs with smp_store_release() in swiotlb_dyn_map()
> +   * and swiotlb_dyn_unmap().
> +   */
> +  (smp_load_acquire(>dma_io_tlb_have_dyn) &&
> +   is_swiotlb_dyn(dev, paddr)));
>  }
>  
>  static inline bool is_swiotlb_force_bounce(struct device *dev)
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 81eab1c72c50..e8be3ee50f18 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -642,6 +642,9 @@ static phys_addr_t swiotlb_dyn_map(struct device *dev, 
> phys_addr_t orig_addr,
>  
>   spin_lock_irqsave(>dma_io_tlb_dyn_lock, flags);
>   list_add(>node, >dma_io_tlb_dyn_slots);
> + if (!dev->dma_io_tlb_have_dyn)
> + /* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
> + smp_store_release(>dma_io_tlb_have_dyn, true);
>   spin_unlock_irqrestore(>dma_io_tlb_dyn_lock, flags);

I'm not sure this works. What this seems to do is that if the caller of
is_swiotlb_buffer() sees the flag set, it's guaranteed that something
was added to the dma_io_tlb_dyn_slots list. But the reverse is not
necessarily true. IOW, if something was added to the list, there is a
brief window where the dma_io_tlb_have_dyn flag is still false. In the
general case, I doubt any ordering between list_add() and the flag
setting changes anything since neither of them may be visible to another
CPU.

What you need is for a 'paddr' added to the dynamic list to be correctly
identified by another CPU as dynamic swiotlb. That other CPU doesn't
check random addresses but only those returned by the DMA API. Such
values would be normally passed through a memory location (e.g. driver
local structures) and that's what you want to order against.

What I mean is that a 'dev->blah = paddr' needs to be ordered _after_
your flag setting. So you need the either the 'blah = paddr' assignment
to have release semantics or the flag setting to be an
smp_store_acquire() (but we don't have such thing). You'd have to use an
explicit smp_wmb() barrier after the flag setting (it can be outside the
lock). The spin_unlock() is not sufficient since it only has release
semantics. I also don't think the ordering between list_add() and flag
setting matters since the smp_wmb() would ensure that both are visible
when the 'paddr' value made it to the other CPU.

Similarly on the is_swiotlb_buffer() side, you want the flag reading to
be ordered after the 'blah = paddr' is observed. Here the
smp_load_acquire() is sufficient.

>   return page_to_phys(slot->page);
> @@ -668,6 +671,9 @@ static void swiotlb_dyn_unmap(struct device *dev, 
> phys_addr_t tlb_addr,
>   unsigned long flags;
>  
>   spin_lock_irqsave(>dma_io_tlb_dyn_lock, flags);
> + if (list_is_singular(>dma_io_tlb_dyn_slots))
> + /* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
> + smp_store_release(>dma_io_tlb_have_dyn, false);
>   slot = lookup_dyn_slot_locked(dev, tlb_addr);
>   list_del(>node);
>   spin_unlock_irqrestore(>dma_io_tlb_dyn_lock, flags);

As with the map case, I don't think the ordering between list_del() and
the flag setting matters. If you unmap the last dynamic buffer, the
worst that can happen is that an is_swiotlb_buffer() call attempts a
read of the list but the flag will eventually become visible. There
shouldn't be another caller trying to unmap the same paddr (in well
behaved drivers).

Now, thinking about the list_head access and the flag ordering, since it
doesn't matter, you might 

Re: (subset) [PATCH 00/35] Documentation: correct lots of spelling errors (series 1)

2023-02-01 Thread Catalin Marinas
On Thu, 26 Jan 2023 22:39:30 -0800, Randy Dunlap wrote:
> Correct many spelling errors in Documentation/ as reported by codespell.
> 
> Maintainers of specific kernel subsystems are only Cc-ed on their
> respective patches, not the entire series. [if all goes well]
> 
> These patches are based on linux-next-20230125.
> 
> [...]

Applied to arm64 (for-next/misc), thanks!

[01/35] Documentation: arm64: correct spelling
https://git.kernel.org/arm64/c/a70f00e7f1a3

-- 
Catalin



Re: [PATCH 1/3] iommu/dma: Clean up Kconfig

2022-09-02 Thread Catalin Marinas
On Tue, Aug 16, 2022 at 06:28:03PM +0100, Robin Murphy wrote:
> Although iommu-dma is a per-architecture chonce, that is currently
> implemented in a rather haphazard way. Selecting from the arch Kconfig
> was the original logical approach, but is complicated by having to
> manage dependencies; conversely, selecting from drivers ends up hiding
> the architecture dependency *too* well. Instead, let's just have it
> enable itself automatically when IOMMU API support is enabled for the
> relevant architectures. It can't get much clearer than that.
> 
> Signed-off-by: Robin Murphy 
> ---
>  arch/arm64/Kconfig  | 1 -

For this change:

Acked-by: Catalin Marinas 


Re: [PATCH 2/3] iommu/dma: Move public interfaces to linux/iommu.h

2022-09-02 Thread Catalin Marinas
On Tue, Aug 16, 2022 at 06:28:04PM +0100, Robin Murphy wrote:
> The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with
> only a couple more public interfaces left pertaining to MSI integration.
> Since these depend on the main IOMMU API header anyway, move their
> declarations there, taking the opportunity to update the half-baked
> comments to proper kerneldoc along the way.
> 
> Signed-off-by: Robin Murphy 
> ---
> 
> Note that iommu_setup_dma_ops() should also become internal in a future
> phase of the great IOMMU API upheaval - for now as the last bit of true
> arch code glue I consider it more "necessarily exposed" than "public".
> 
>  arch/arm64/mm/dma-mapping.c   |  2 +-

And here:

Acked-by: Catalin Marinas 


Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)

2022-05-20 Thread Catalin Marinas
On Wed, May 11, 2022 at 07:04:51PM +0900, Hyeonggon Yoo wrote:
> On Wed, May 11, 2022 at 08:39:29AM +0900, Byungchul Park wrote:
> > On Tue, May 10, 2022 at 08:18:12PM +0900, Hyeonggon Yoo wrote:
> > > On Mon, May 09, 2022 at 09:16:37AM +0900, Byungchul Park wrote:
> > > > CASE 1.
> > > > 
> > > >lock L with depth n
> > > >lock_nested L' with depth n + 1
> > > >...
> > > >unlock L'
> > > >unlock L
> > > > 
> > > > This case is allowed by Lockdep.
> > > > This case is allowed by DEPT cuz it's not a deadlock.
> > > > 
> > > > CASE 2.
> > > > 
> > > >lock L with depth n
> > > >lock A
> > > >lock_nested L' with depth n + 1
> > > >...
> > > >unlock L'
> > > >unlock A
> > > >unlock L
> > > > 
> > > > This case is allowed by Lockdep.
> > > > This case is *NOT* allowed by DEPT cuz it's a *DEADLOCK*.
> > > 
> > > Yeah, in previous threads we discussed this [1]
> > > 
> > > And the case was:
> > >   scan_mutex -> object_lock -> kmemleak_lock -> object_lock
> > > And dept reported:
> > >   object_lock -> kmemleak_lock, kmemleak_lock -> object_lock as
> > >   deadlock.
> > > 
> > > But IIUC - What DEPT reported happens only under scan_mutex and it
> > > is not simple just not to take them because the object can be
> > > removed from the list and freed while scanning via kmemleak_free()
> > > without kmemleak_lock and object_lock.

The above kmemleak sequence shouldn't deadlock since those locks, even
if taken in a different order, are serialised by scan_mutex. For various
reasons, trying to reduce the latency, I ended up with some
fine-grained, per-object locking.

For object allocation (rbtree modification) and tree search, we use
kmemleak_lock. During scanning (which can take minutes under
scan_mutex), we want to prevent (a) long latencies and (b) freeing the
object being scanned. We release the locks regularly for (a) and hold
the object->lock for (b).

In another thread Byungchul mentioned:

|context X  context Y
| 
|lock mutex A   lock mutex A
|lock B lock C
|lock C lock B
|unlock C   unlock B
|unlock B   unlock C
|unlock mutex A unlock mutex A
| 
| In my opinion, lock B and lock C are unnecessary if they are always
| along with lock mutex A. Or we should keep correct lock order across all
| the code.

If these are the only two places, yes, locks B and C would be
unnecessary. But we have those locks acquired (not nested) on the
allocation path (kmemleak_lock) and freeing path (object->lock). We
don't want to block those paths while scan_mutex is held.

That said, we may be able to use a single kmemleak_lock for everything.
The object freeing path may be affected slightly during scanning but the
code does release it every MAX_SCAN_SIZE bytes. It may even get slightly
faster as we'd hammer a single lock (I'll do some benchmarks).

But from a correctness perspective, I think the DEPT tool should be
improved a bit to detect when such out of order locking is serialised by
an enclosing lock/mutex.

-- 
Catalin


Re: [PATCH v19 00/15] arm64: untag user pointers passed to the kernel

2019-08-05 Thread Catalin Marinas
On Thu, Aug 01, 2019 at 08:36:47AM -0700, Dave Hansen wrote:
> On 8/1/19 5:48 AM, Andrey Konovalov wrote:
> > On Thu, Aug 1, 2019 at 2:11 PM Kevin Brodsky  wrote:
> >> On 31/07/2019 17:50, Dave Hansen wrote:
> >>> On 7/23/19 10:58 AM, Andrey Konovalov wrote:
>  The mmap and mremap (only new_addr) syscalls do not currently accept
>  tagged addresses. Architectures may interpret the tag as a background
>  colour for the corresponding vma.
> >>>
> >>> What the heck is a "background colour"? :)
> >>
> >> Good point, this is some jargon that we started using for MTE, the idea 
> >> being that
> >> the kernel could set a tag value (specified during mmap()) as "background 
> >> colour" for
> >> anonymous pages allocated in that range.
> >>
> >> Anyway, this patch series is not about MTE. Andrey, for v20 (if any), I 
> >> think it's
> >> best to drop this last sentence to avoid any confusion.

Indeed, the part with the "background colour" and even the "currently"
adverb should be dropped.

Also, if we merge the patches via different trees anyway, I don't think
there is a need for Andrey to integrate them with his series. We can
pick them up directly in the arm64 tree (once the review finished).

> OK, but what does that mean for tagged addresses getting passed to
> mmap/mremap?  That sentence read to me like "architectures might allow
> tags for ...something...".  So do we accept tagged addresses into those
> syscalls?

If mmap() does not return a tagged address, the reasoning is that it
should not accept one as an address hint (with or without MAP_FIXED).
Note that these docs should only describe the top-byte-ignore ABI while
leaving the memory tagging for a future patchset.

In that future patchset, we may want to update the mmap() ABI to allow,
only in conjunction with PROT_MTE, a tagged pointer as an address
argument. In such case mmap() will return a tagged address and the pages
pre-coloured (on fault) with the tag requested by the user. As I said,
that's to be discussed later in the year.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v19 02/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-08-05 Thread Catalin Marinas
On Thu, Aug 01, 2019 at 09:45:05AM -0700, Dave Hansen wrote:
> On 8/1/19 5:38 AM, Kevin Brodsky wrote:
> > This patch series only changes what is allowed or not at the syscall
> > interface. It does not change the address space size. On arm64, TBI (Top
> > Byte Ignore) has always been enabled for userspace, so it has never been
> > possible to use the upper 8 bits of user pointers for addressing.
> 
> Oh, so does the address space that's available already chop that out?

Yes. Currently the hardware only supports 52-bit virtual addresses. It
could be expanded (though it needs a 5th page table level) to 56-bit VA
but it's not currently on our (hardware) plans. Beyond 56-bit, it cannot
be done without breaking the software expectations (and hopefully I'll
retire before we need this ;)).

> > If other architectures were to support a similar functionality, then I
> > agree that a common and more generic interface (if needed) would be
> > helpful, but as it stands this is an arm64-specific prctl, and on arm64
> > the address tag is defined by the architecture as bits [63:56].
> 
> It should then be an arch_prctl(), no?

I guess you just want renaming SET_TAGGED_ADDR_CTRL() to
arch_prctl_tagged_addr_ctrl_set()? (similarly for 'get')

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v18 00/15] arm64: untag user pointers passed to the kernel

2019-06-27 Thread Catalin Marinas
Hi Andrew,

On Mon, Jun 24, 2019 at 04:32:45PM +0200, Andrey Konovalov wrote:
> Andrey Konovalov (14):
>   arm64: untag user pointers in access_ok and __uaccess_mask_ptr
>   lib: untag user pointers in strn*_user
>   mm: untag user pointers passed to memory syscalls
>   mm: untag user pointers in mm/gup.c
>   mm: untag user pointers in get_vaddr_frames
>   fs/namespace: untag user pointers in copy_mount_options
>   userfaultfd: untag user pointers
>   drm/amdgpu: untag user pointers
>   drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
>   IB/mlx4: untag user pointers in mlx4_get_umem_mr
>   media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
>   tee/shm: untag user pointers in tee_shm_register
>   vfio/type1: untag user pointers in vaddr_get_pfn
>   selftests, arm64: add a selftest for passing tagged pointers to kernel
> 
> Catalin Marinas (1):
>   arm64: Introduce prctl() options to control the tagged user addresses
> ABI
> 
>  arch/arm64/Kconfig|  9 +++
>  arch/arm64/include/asm/processor.h|  8 +++
>  arch/arm64/include/asm/thread_info.h  |  1 +
>  arch/arm64/include/asm/uaccess.h  | 12 +++-
>  arch/arm64/kernel/process.c   | 72 +++
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  2 +
>  drivers/gpu/drm/radeon/radeon_gem.c   |  2 +
>  drivers/infiniband/hw/mlx4/mr.c   |  7 +-
>  drivers/media/v4l2-core/videobuf-dma-contig.c |  9 +--
>  drivers/tee/tee_shm.c |  1 +
>  drivers/vfio/vfio_iommu_type1.c   |  2 +
>  fs/namespace.c|  2 +-
>  fs/userfaultfd.c  | 22 +++---
>  include/uapi/linux/prctl.h|  5 ++
>  kernel/sys.c  | 12 
>  lib/strncpy_from_user.c   |  3 +-
>  lib/strnlen_user.c|  3 +-
>  mm/frame_vector.c |  2 +
>  mm/gup.c  |  4 ++
>  mm/madvise.c  |  2 +
>  mm/mempolicy.c|  3 +
>  mm/migrate.c  |  2 +-
>  mm/mincore.c  |  2 +
>  mm/mlock.c|  4 ++
>  mm/mprotect.c |  2 +
>  mm/mremap.c   |  7 ++
>  mm/msync.c|  2 +
>  tools/testing/selftests/arm64/.gitignore  |  1 +
>  tools/testing/selftests/arm64/Makefile| 11 +++
>  .../testing/selftests/arm64/run_tags_test.sh  | 12 
>  tools/testing/selftests/arm64/tags_test.c | 29 
>  32 files changed, 232 insertions(+), 25 deletions(-)

It looks like we got to an agreement on how to deal with tagged user
addresses between SPARC ADI and ARM Memory Tagging. If there are no
other objections, what's your preferred way of getting this series into
-next first and then mainline? Are you ok to merge them into the mm
tree?

Thanks.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v18 08/15] userfaultfd: untag user pointers

2019-06-25 Thread Catalin Marinas
On Mon, Jun 24, 2019 at 04:32:53PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends kernel ABI to allow to pass
> tagged user pointers (with the top byte set to something else other than
> 0x00) as syscall arguments.
> 
> userfaultfd code use provided user pointers for vma lookups, which can
> only by done with untagged pointers.
> 
> Untag user pointers in validate_range().
> 
> Reviewed-by: Vincenzo Frascino 
> Reviewed-by: Catalin Marinas 
> Reviewed-by: Kees Cook 
> Signed-off-by: Andrey Konovalov 
> ---
>  fs/userfaultfd.c | 22 --
>  1 file changed, 12 insertions(+), 10 deletions(-)

Same here, it needs an ack from Al Viro.

> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index ae0b8b5f69e6..c2be36a168ca 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1261,21 +1261,23 @@ static __always_inline void wake_userfault(struct 
> userfaultfd_ctx *ctx,
>  }
>  
>  static __always_inline int validate_range(struct mm_struct *mm,
> -   __u64 start, __u64 len)
> +   __u64 *start, __u64 len)
>  {
>   __u64 task_size = mm->task_size;
>  
> - if (start & ~PAGE_MASK)
> + *start = untagged_addr(*start);
> +
> + if (*start & ~PAGE_MASK)
>   return -EINVAL;
>   if (len & ~PAGE_MASK)
>   return -EINVAL;
>   if (!len)
>   return -EINVAL;
> - if (start < mmap_min_addr)
> + if (*start < mmap_min_addr)
>   return -EINVAL;
> - if (start >= task_size)
> + if (*start >= task_size)
>   return -EINVAL;
> - if (len > task_size - start)
> + if (len > task_size - *start)
>   return -EINVAL;
>   return 0;
>  }
> @@ -1325,7 +1327,7 @@ static int userfaultfd_register(struct userfaultfd_ctx 
> *ctx,
>   goto out;
>   }
>  
> - ret = validate_range(mm, uffdio_register.range.start,
> + ret = validate_range(mm, _register.range.start,
>uffdio_register.range.len);
>   if (ret)
>   goto out;
> @@ -1514,7 +1516,7 @@ static int userfaultfd_unregister(struct 
> userfaultfd_ctx *ctx,
>   if (copy_from_user(_unregister, buf, sizeof(uffdio_unregister)))
>   goto out;
>  
> - ret = validate_range(mm, uffdio_unregister.start,
> + ret = validate_range(mm, _unregister.start,
>uffdio_unregister.len);
>   if (ret)
>   goto out;
> @@ -1665,7 +1667,7 @@ static int userfaultfd_wake(struct userfaultfd_ctx *ctx,
>   if (copy_from_user(_wake, buf, sizeof(uffdio_wake)))
>   goto out;
>  
> - ret = validate_range(ctx->mm, uffdio_wake.start, uffdio_wake.len);
> + ret = validate_range(ctx->mm, _wake.start, uffdio_wake.len);
>   if (ret)
>   goto out;
>  
> @@ -1705,7 +1707,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
>  sizeof(uffdio_copy)-sizeof(__s64)))
>   goto out;
>  
> - ret = validate_range(ctx->mm, uffdio_copy.dst, uffdio_copy.len);
> + ret = validate_range(ctx->mm, _copy.dst, uffdio_copy.len);
>   if (ret)
>   goto out;
>   /*
> @@ -1761,7 +1763,7 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx 
> *ctx,
>  sizeof(uffdio_zeropage)-sizeof(__s64)))
>   goto out;
>  
> - ret = validate_range(ctx->mm, uffdio_zeropage.range.start,
> + ret = validate_range(ctx->mm, _zeropage.range.start,
>uffdio_zeropage.range.len);
>   if (ret)
>   goto out;
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v18 11/15] IB/mlx4: untag user pointers in mlx4_get_umem_mr

2019-06-25 Thread Catalin Marinas
On Mon, Jun 24, 2019 at 04:32:56PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends kernel ABI to allow to pass
> tagged user pointers (with the top byte set to something else other than
> 0x00) as syscall arguments.
> 
> mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can
> only by done with untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  drivers/infiniband/hw/mlx4/mr.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Acked-by: Catalin Marinas 

This patch also needs an ack from the infiniband maintainers (Jason).

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v18 07/15] fs/namespace: untag user pointers in copy_mount_options

2019-06-25 Thread Catalin Marinas
On Mon, Jun 24, 2019 at 04:32:52PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends kernel ABI to allow to pass
> tagged user pointers (with the top byte set to something else other than
> 0x00) as syscall arguments.
> 
> In copy_mount_options a user address is being subtracted from TASK_SIZE.
> If the address is lower than TASK_SIZE, the size is calculated to not
> allow the exact_copy_from_user() call to cross TASK_SIZE boundary.
> However if the address is tagged, then the size will be calculated
> incorrectly.
> 
> Untag the address before subtracting.
> 
> Reviewed-by: Khalid Aziz 
> Reviewed-by: Vincenzo Frascino 
> Reviewed-by: Kees Cook 
> Reviewed-by: Catalin Marinas 
> Signed-off-by: Andrey Konovalov 
> ---
>  fs/namespace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7660c2749c96..ec78f7223917 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2994,7 +2994,7 @@ void *copy_mount_options(const void __user * data)
>* the remainder of the page.
>*/
>   /* copy_from_user cannot cross TASK_SIZE ! */
> - size = TASK_SIZE - (unsigned long)data;
> + size = TASK_SIZE - (unsigned long)untagged_addr(data);
>   if (size > PAGE_SIZE)
>   size = PAGE_SIZE;

I think this patch needs an ack from Al Viro (cc'ed).

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v18 15/15] selftests, arm64: add a selftest for passing tagged pointers to kernel

2019-06-25 Thread Catalin Marinas
On Mon, Jun 24, 2019 at 04:33:00PM +0200, Andrey Konovalov wrote:
> --- /dev/null
> +++ b/tools/testing/selftests/arm64/tags_test.c
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define SHIFT_TAG(tag)   ((uint64_t)(tag) << 56)
> +#define SET_TAG(ptr, tag)(((uint64_t)(ptr) & ~SHIFT_TAG(0xff)) | \
> + SHIFT_TAG(tag))
> +
> +int main(void)
> +{
> + static int tbi_enabled = 0;
> + struct utsname *ptr, *tagged_ptr;
> + int err;
> +
> + if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0) == 0)
> + tbi_enabled = 1;

Nitpick: with the latest prctl() patch, you can skip the last three
arguments as they are ignored.

Either way:

Reviewed-by: Catalin Marinas 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-20 Thread Catalin Marinas
On Wed, Jun 19, 2019 at 04:45:02PM +0200, Andrey Konovalov wrote:
> On Wed, Jun 12, 2019 at 1:43 PM Andrey Konovalov  
> wrote:
> > From: Catalin Marinas 
> >
> > It is not desirable to relax the ABI to allow tagged user addresses into
> > the kernel indiscriminately. This patch introduces a prctl() interface
> > for enabling or disabling the tagged ABI with a global sysctl control
> > for preventing applications from enabling the relaxed ABI (meant for
> > testing user-space prctl() return error checking without reconfiguring
> > the kernel). The ABI properties are inherited by threads of the same
> > application and fork()'ed children but cleared on execve().
> >
> > The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
> > MTE-specific settings like imprecise vs precise exceptions.
> >
> > Signed-off-by: Catalin Marinas 
> 
> Catalin, would you like to do the requested changes to this patch
> yourself and send it to me or should I do that?

I'll send you an updated version this week.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-18 Thread Catalin Marinas
On Wed, Jun 12, 2019 at 01:43:20PM +0200, Andrey Konovalov wrote:
> From: Catalin Marinas 
> 
> It is not desirable to relax the ABI to allow tagged user addresses into
> the kernel indiscriminately. This patch introduces a prctl() interface
> for enabling or disabling the tagged ABI with a global sysctl control
> for preventing applications from enabling the relaxed ABI (meant for
> testing user-space prctl() return error checking without reconfiguring
> the kernel). The ABI properties are inherited by threads of the same
> application and fork()'ed children but cleared on execve().
> 
> The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
> MTE-specific settings like imprecise vs precise exceptions.
> 
> Signed-off-by: Catalin Marinas 

A question for the user-space folk: if an application opts in to this
ABI, would you want the sigcontext.fault_address and/or siginfo.si_addr
to contain the tag? We currently clear it early in the arm64 entry.S but
we could find a way to pass it down if needed.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-18 Thread Catalin Marinas
On Mon, Jun 17, 2019 at 09:57:36AM -0700, Evgenii Stepanov wrote:
> On Mon, Jun 17, 2019 at 6:56 AM Catalin Marinas  
> wrote:
> > On Wed, Jun 12, 2019 at 01:43:20PM +0200, Andrey Konovalov wrote:
> > > From: Catalin Marinas 
> > >
> > > It is not desirable to relax the ABI to allow tagged user addresses into
> > > the kernel indiscriminately. This patch introduces a prctl() interface
> > > for enabling or disabling the tagged ABI with a global sysctl control
> > > for preventing applications from enabling the relaxed ABI (meant for
> > > testing user-space prctl() return error checking without reconfiguring
> > > the kernel). The ABI properties are inherited by threads of the same
> > > application and fork()'ed children but cleared on execve().
> > >
> > > The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
> > > MTE-specific settings like imprecise vs precise exceptions.
> > >
> > > Signed-off-by: Catalin Marinas 
> >
> > A question for the user-space folk: if an application opts in to this
> > ABI, would you want the sigcontext.fault_address and/or siginfo.si_addr
> > to contain the tag? We currently clear it early in the arm64 entry.S but
> > we could find a way to pass it down if needed.
> 
> For HWASan this would not be useful because we instrument memory
> accesses with explicit checks anyway. For MTE, on the other hand, it
> would be very convenient to know the fault address tag without
> disassembling the code.

I could as this differently: does anything break if, once the user
opts in to TBI, fault_address and/or si_addr have non-zero top byte?

Alternatively, we could present the original FAR_EL1 register as a
separate field as we do with ESR_EL1, independently of whether the user
opted in to TBI or not.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-14 Thread Catalin Marinas
On Thu, Jun 13, 2019 at 04:45:54PM +0100, Vincenzo Frascino wrote:
> On 13/06/2019 16:35, Catalin Marinas wrote:
> > On Thu, Jun 13, 2019 at 12:16:59PM +0100, Dave P Martin wrote:
> >> On Wed, Jun 12, 2019 at 01:43:20PM +0200, Andrey Konovalov wrote:
> >>> +
> >>> +/*
> >>> + * Control the relaxed ABI allowing tagged user addresses into the 
> >>> kernel.
> >>> + */
> >>> +static unsigned int tagged_addr_prctl_allowed = 1;
> >>> +
> >>> +long set_tagged_addr_ctrl(unsigned long arg)
> >>> +{
> >>> + if (!tagged_addr_prctl_allowed)
> >>> + return -EINVAL;
> >>
> >> So, tagging can actually be locked on by having a process enable it and
> >> then some possibly unrelated process clearing tagged_addr_prctl_allowed.
> >> That feels a bit weird.
> > 
> > The problem is that if you disable the ABI globally, lots of
> > applications would crash. This sysctl is meant as a way to disable the
> > opt-in to the TBI ABI. Another option would be a kernel command line
> > option (I'm not keen on a Kconfig option).
> 
> Why you are not keen on a Kconfig option?

Because I don't want to rebuild the kernel/reboot just to be able to
test how user space handles the ABI opt-in. I'm ok with a Kconfig option
to disable this globally in addition to a run-time option (if actually
needed, I'm not sure).

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-14 Thread Catalin Marinas
Hi Dave,

On Thu, Jun 13, 2019 at 12:02:35PM +0100, Dave P Martin wrote:
> On Wed, Jun 12, 2019 at 01:43:20PM +0200, Andrey Konovalov wrote:
> > +/*
> > + * Global sysctl to disable the tagged user addresses support. This control
> > + * only prevents the tagged address ABI enabling via prctl() and does not
> > + * disable it for tasks that already opted in to the relaxed ABI.
> > + */
> > +static int zero;
> > +static int one = 1;
> 
> !!!
> 
> And these can't even be const without a cast.  Yuk.
> 
> (Not your fault though, but it would be nice to have a proc_dobool() to
> avoid this.)

I had the same reaction. Maybe for another patch sanitising this pattern
across the kernel.

> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -229,4 +229,9 @@ struct prctl_mm_map {
> >  # define PR_PAC_APDBKEY(1UL << 3)
> >  # define PR_PAC_APGAKEY(1UL << 4)
> >  
> > +/* Tagged user address controls for arm64 */
> > +#define PR_SET_TAGGED_ADDR_CTRL55
> > +#define PR_GET_TAGGED_ADDR_CTRL56
> > +# define PR_TAGGED_ADDR_ENABLE (1UL << 0)
> > +
> 
> Do we expect this prctl to be applicable to other arches, or is it
> strictly arm64-specific?

I kept it generic, at least the tagged address part. The MTE bits later
on would be arm64-specific.

> > @@ -2492,6 +2498,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, 
> > arg2, unsigned long, arg3,
> > return -EINVAL;
> > error = PAC_RESET_KEYS(me, arg2);
> > break;
> > +   case PR_SET_TAGGED_ADDR_CTRL:
> > +   if (arg3 || arg4 || arg5)
> 
> 
> 
> How do you anticipate these arguments being used in the future?

I don't expect them to be used at all. But since I'm not sure, I'd force
them as zero for now rather than ignored. The GET is supposed to return
the SET arg2, hence I'd rather not used the other arguments.

> For the SVE prctls I took the view that "get" could only ever mean one
> thing, and "put" already had a flags argument with spare bits for future
> expansion anyway, so forcing the extra arguments to zero would be
> unnecessary.
> 
> Opinions seem to differ on whether requiring surplus arguments to be 0
> is beneficial for hygiene, but the glibc prototype for prctl() is
> 
>   int prctl (int __option, ...);
> 
> so it seemed annoying to have to pass extra arguments to it just for the
> sake of it.  IMHO this also makes the code at the call site less
> readable, since it's not immediately apparent that all those 0s are
> meaningless.

It's fine by me to ignore the other arguments. I just followed the
pattern of some existing prctl options. I don't have a strong opinion
either way.

> > +   return -EINVAL;
> > +   error = SET_TAGGED_ADDR_CTRL(arg2);
> > +   break;
> > +   case PR_GET_TAGGED_ADDR_CTRL:
> > +   if (arg2 || arg3 || arg4 || arg5)
> > +   return -EINVAL;
> > +   error = GET_TAGGED_ADDR_CTRL();
> 
> Having a "get" prctl is probably a good idea, but is there a clear
> usecase for it?

Not sure, maybe some other library (e.g. a JIT compiler) would like to
check whether tagged addresses have been enabled during application
start and decide to generate tagged pointers for itself. It seemed
pretty harmless, unless we add more complex things to the prctl() that
cannot be returned in one request).

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-14 Thread Catalin Marinas
On Thu, Jun 13, 2019 at 12:16:59PM +0100, Dave P Martin wrote:
> On Wed, Jun 12, 2019 at 01:43:20PM +0200, Andrey Konovalov wrote:
> > From: Catalin Marinas 
> > 
> > It is not desirable to relax the ABI to allow tagged user addresses into
> > the kernel indiscriminately. This patch introduces a prctl() interface
> > for enabling or disabling the tagged ABI with a global sysctl control
> > for preventing applications from enabling the relaxed ABI (meant for
> > testing user-space prctl() return error checking without reconfiguring
> > the kernel). The ABI properties are inherited by threads of the same
> > application and fork()'ed children but cleared on execve().
> > 
> > The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
> > MTE-specific settings like imprecise vs precise exceptions.
> > 
> > Signed-off-by: Catalin Marinas 
> > ---
> >  arch/arm64/include/asm/processor.h   |  6 +++
> >  arch/arm64/include/asm/thread_info.h |  1 +
> >  arch/arm64/include/asm/uaccess.h |  3 +-
> >  arch/arm64/kernel/process.c  | 67 
> >  include/uapi/linux/prctl.h   |  5 +++
> >  kernel/sys.c | 16 +++
> >  6 files changed, 97 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/include/asm/processor.h 
> > b/arch/arm64/include/asm/processor.h
> > index fcd0e691b1ea..fee457456aa8 100644
> > --- a/arch/arm64/include/asm/processor.h
> > +++ b/arch/arm64/include/asm/processor.h
> > @@ -307,6 +307,12 @@ extern void __init minsigstksz_setup(void);
> >  /* PR_PAC_RESET_KEYS prctl */
> >  #define PAC_RESET_KEYS(tsk, arg)   ptrauth_prctl_reset_keys(tsk, arg)
> >  
> > +/* PR_TAGGED_ADDR prctl */
> 
> (A couple of comments I missed in my last reply:)
> 
> Name mismatch?

Yeah, it went through several names but it seems that I didn't update
all places.

> > +long set_tagged_addr_ctrl(unsigned long arg);
> > +long get_tagged_addr_ctrl(void);
> > +#define SET_TAGGED_ADDR_CTRL(arg)  set_tagged_addr_ctrl(arg)
> > +#define GET_TAGGED_ADDR_CTRL() get_tagged_addr_ctrl()
> > +
> 
> [...]
> 
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index 3767fb21a5b8..69d0be1fc708 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -30,6 +30,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -323,6 +324,7 @@ void flush_thread(void)
> > fpsimd_flush_thread();
> > tls_thread_flush();
> > flush_ptrace_hw_breakpoint(current);
> > +   clear_thread_flag(TIF_TAGGED_ADDR);
> >  }
> >  
> >  void release_thread(struct task_struct *dead_task)
> > @@ -552,3 +554,68 @@ void arch_setup_new_exec(void)
> >  
> > ptrauth_thread_init_user(current);
> >  }
> > +
> > +/*
> > + * Control the relaxed ABI allowing tagged user addresses into the kernel.
> > + */
> > +static unsigned int tagged_addr_prctl_allowed = 1;
> > +
> > +long set_tagged_addr_ctrl(unsigned long arg)
> > +{
> > +   if (!tagged_addr_prctl_allowed)
> > +   return -EINVAL;
> 
> So, tagging can actually be locked on by having a process enable it and
> then some possibly unrelated process clearing tagged_addr_prctl_allowed.
> That feels a bit weird.

The problem is that if you disable the ABI globally, lots of
applications would crash. This sysctl is meant as a way to disable the
opt-in to the TBI ABI. Another option would be a kernel command line
option (I'm not keen on a Kconfig option).

> Do we want to allow a process that has tagging on to be able to turn
> it off at all?  Possibly things like CRIU might want to do that.

I left it in for symmetry but I don't expect it to be used. A potential
use-case is doing it per subsequent threads in an application.

> > +   if (is_compat_task())
> > +   return -EINVAL;
> > +   if (arg & ~PR_TAGGED_ADDR_ENABLE)
> > +   return -EINVAL;
> 
> How do we expect this argument to be extended in the future?

Yes, for MTE. That's why I wouldn't allow random bits here.

> I'm wondering whether this is really a bitmask or an enum, or a mixture
> of the two.  Maybe it doesn't matter.

User may want to set PR_TAGGED_ADDR_ENABLE | PR_MTE_PRECISE in a single
call.

> > +   if (arg & PR_TAGGED_ADDR_ENABLE)
> > +   set_thread_flag(TIF_TAGGED_ADDR);
> > +   else
> > +   clear_thread_flag(TIF_TAGGED_ADDR);
> 
> I think update_thread_flag() could be used here.

Yes. I forgot you added this.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v17 15/15] selftests, arm64: add a selftest for passing tagged pointers to kernel

2019-06-13 Thread Catalin Marinas
On Wed, Jun 12, 2019 at 01:30:36PM +0100, Szabolcs Nagy wrote:
> On 12/06/2019 12:43, Andrey Konovalov wrote:
> > --- /dev/null
> > +++ b/tools/testing/selftests/arm64/tags_lib.c
> > @@ -0,0 +1,62 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include 
> > +#include 
> > +
> > +#define TAG_SHIFT  (56)
> > +#define TAG_MASK   (0xffUL << TAG_SHIFT)
> > +
> > +#define PR_SET_TAGGED_ADDR_CTRL55
> > +#define PR_GET_TAGGED_ADDR_CTRL56
> > +#define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
> > +
> > +void *__libc_malloc(size_t size);
> > +void __libc_free(void *ptr);
> > +void *__libc_realloc(void *ptr, size_t size);
> > +void *__libc_calloc(size_t nmemb, size_t size);
> 
> this does not work on at least musl.
> 
> the most robust solution would be to implement
> the malloc apis with mmap/munmap/mremap, if that's
> too cumbersome then use dlsym RTLD_NEXT (although
> that has the slight wart that in glibc it may call
> calloc so wrapping calloc that way is tricky).
> 
> in simple linux tests i'd just use static or
> stack allocations or mmap.
> 
> if a generic preloadable lib solution is needed
> then do it properly with pthread_once to avoid
> races etc.

Thanks for the feedback Szabolcs. I guess we can go back to the initial
simple test that Andrey had and drop the whole LD_PRELOAD hack (I'll
just use it for my internal testing).

BTW, when you get some time, please review Vincenzo's ABI documentation
patches from a user/libc perspective. Once agreed, they should become
part of this series.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-13 Thread Catalin Marinas
On Wed, Jun 12, 2019 at 01:43:20PM +0200, Andrey Konovalov wrote:
> From: Catalin Marinas 
> 
> It is not desirable to relax the ABI to allow tagged user addresses into
> the kernel indiscriminately. This patch introduces a prctl() interface
> for enabling or disabling the tagged ABI with a global sysctl control
> for preventing applications from enabling the relaxed ABI (meant for
> testing user-space prctl() return error checking without reconfiguring
> the kernel). The ABI properties are inherited by threads of the same
> application and fork()'ed children but cleared on execve().
> 
> The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
> MTE-specific settings like imprecise vs precise exceptions.
> 
> Signed-off-by: Catalin Marinas 

You need your signed-off-by here since you are contributing it. And
thanks for adding the comment to the TIF definition.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v16 02/16] arm64: untag user pointers in access_ok and __uaccess_mask_ptr

2019-06-13 Thread Catalin Marinas
On Wed, Jun 12, 2019 at 01:03:10PM +0200, Andrey Konovalov wrote:
> On Tue, Jun 11, 2019 at 7:39 PM Catalin Marinas  
> wrote:
> > On Tue, Jun 11, 2019 at 07:09:46PM +0200, Andrey Konovalov wrote:
> > > Should I drop access_ok() change from my patch, since yours just reverts 
> > > it?
> >
> > Not necessary, your patch just relaxes the ABI for all apps, mine
> > tightens it. You could instead move the untagging to __range_ok() and
> > rebase my patch accordingly.
> 
> OK, will do. I'll also add a comment next to TIF_TAGGED_ADDR as Vincenzo 
> asked.

Thanks.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v16 12/16] IB, arm64: untag user pointers in ib_uverbs_(re)reg_mr()

2019-06-13 Thread Catalin Marinas
On Tue, Jun 04, 2019 at 03:09:26PM +0200, Andrey Konovalov wrote:
> On Tue, Jun 4, 2019 at 3:02 PM Jason Gunthorpe  wrote:
> > On Tue, Jun 04, 2019 at 02:45:32PM +0200, Andrey Konovalov wrote:
> > > On Tue, Jun 4, 2019 at 2:27 PM Jason Gunthorpe  wrote:
> > > > On Tue, Jun 04, 2019 at 02:18:19PM +0200, Andrey Konovalov wrote:
> > > > > On Mon, Jun 3, 2019 at 7:46 PM Jason Gunthorpe  wrote:
> > > > > > On Mon, Jun 03, 2019 at 06:55:14PM +0200, Andrey Konovalov wrote:
> > > > > > > This patch is a part of a series that extends arm64 kernel ABI to 
> > > > > > > allow to
> > > > > > > pass tagged user pointers (with the top byte set to something 
> > > > > > > else other
> > > > > > > than 0x00) as syscall arguments.
> > > > > > >
> > > > > > > ib_uverbs_(re)reg_mr() use provided user pointers for vma lookups 
> > > > > > > (through
> > > > > > > e.g. mlx4_get_umem_mr()), which can only by done with untagged 
> > > > > > > pointers.
> > > > > > >
> > > > > > > Untag user pointers in these functions.
> > > > > > >
> > > > > > > Signed-off-by: Andrey Konovalov 
> > > > > > >  drivers/infiniband/core/uverbs_cmd.c | 4 
> > > > > > >  1 file changed, 4 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/infiniband/core/uverbs_cmd.c 
> > > > > > > b/drivers/infiniband/core/uverbs_cmd.c
> > > > > > > index 5a3a1780ceea..f88ee733e617 100644
> > > > > > > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > > > > > > @@ -709,6 +709,8 @@ static int ib_uverbs_reg_mr(struct 
> > > > > > > uverbs_attr_bundle *attrs)
> > > > > > >   if (ret)
> > > > > > >   return ret;
> > > > > > >
> > > > > > > + cmd.start = untagged_addr(cmd.start);
> > > > > > > +
> > > > > > >   if ((cmd.start & ~PAGE_MASK) != (cmd.hca_va & ~PAGE_MASK))
> > > > > > >   return -EINVAL;
> > > > > >
> > > > > > I feel like we shouldn't thave to do this here, surely the cmd.start
> > > > > > should flow unmodified to get_user_pages, and gup should untag it?
> > > > > >
> > > > > > ie, this sort of direction for the IB code (this would be a giant
> > > > > > patch, so I didn't have time to write it all, but I think it is much
> > > > > > saner):
> > > > >
> > > > > ib_uverbs_reg_mr() passes cmd.start to mlx4_get_umem_mr(), which calls
> > > > > find_vma(), which only accepts untagged addresses. Could you explain
> > > > > how your patch helps?
> > > >
> > > > That mlx4 is just a 'weird duck', it is not the normal flow, and I
> > > > don't think the core code should be making special consideration for
> > > > it.
> > >
> > > How do you think we should do untagging (or something else) to deal
> > > with this 'weird duck' case?
> >
> > mlx4 should handle it around the call to find_vma like other patches
> > do, ideally as part of the cast from a void __user * to the unsigned
> > long that find_vma needs
> 
> So essentially what we had a few versions ago
> (https://lkml.org/lkml/2019/4/30/785) plus changing unsigned longs to
> __user * across all IB code? I think the second part is something
> that's not related to this series and needs to be done separately. I
> can move untagging back to mlx4_get_umem_mr() though.
> 
> Catalin, you've initially asked to to move untagging out of
> mlx4_get_umem_mr(), do you have any comments on this?

It's fine by me either way. My original reasoning was to untag this at
the higher level as tags may not be relevant to the mlx4 code. If that's
what Jason prefers, go for it.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v16 09/16] fs, arm64: untag user pointers in fs/userfaultfd.c

2019-06-13 Thread Catalin Marinas
On Mon, Jun 03, 2019 at 06:55:11PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> userfaultfd code use provided user pointers for vma lookups, which can
> only by done with untagged pointers.
> 
> Untag user pointers in validate_range().
> 
> Signed-off-by: Andrey Konovalov 

Reviewed-by: Catalin Marinas 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v16 15/16] vfio/type1, arm64: untag user pointers in vaddr_get_pfn

2019-06-13 Thread Catalin Marinas
On Mon, Jun 03, 2019 at 06:55:17PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> vaddr_get_pfn() uses provided user pointers for vma lookups, which can
> only by done with untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 

Reviewed-by: Catalin Marinas 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v16 02/16] arm64: untag user pointers in access_ok and __uaccess_mask_ptr

2019-06-13 Thread Catalin Marinas
Hi Vincenzo,

On Tue, Jun 11, 2019 at 06:09:10PM +0100, Vincenzo Frascino wrote:
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index 3767fb21a5b8..69d0be1fc708 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -30,6 +30,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -323,6 +324,7 @@ void flush_thread(void)
> > fpsimd_flush_thread();
> > tls_thread_flush();
> > flush_ptrace_hw_breakpoint(current);
> > +   clear_thread_flag(TIF_TAGGED_ADDR);
> 
> Nit: in line we the other functions in thread_flush we could have something 
> like
> "tagged_addr_thread_flush", maybe inlined.

The other functions do a lot more than clearing a TIF flag, so they
deserved their own place. We could do this when adding MTE support. I
think we also need to check what other TIF flags we may inadvertently
pass on execve(), maybe have a mask clearing.

> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index 094bb03b9cc2..2e927b3e9d6c 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -229,4 +229,9 @@ struct prctl_mm_map {
> >  # define PR_PAC_APDBKEY(1UL << 3)
> >  # define PR_PAC_APGAKEY(1UL << 4)
> >  
> > +/* Tagged user address controls for arm64 */
> > +#define PR_SET_TAGGED_ADDR_CTRL55
> > +#define PR_GET_TAGGED_ADDR_CTRL56
> > +# define PR_TAGGED_ADDR_ENABLE (1UL << 0)
> > +
> >  #endif /* _LINUX_PRCTL_H */
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 2969304c29fe..ec48396b4943 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -124,6 +124,12 @@
> >  #ifndef PAC_RESET_KEYS
> >  # define PAC_RESET_KEYS(a, b)  (-EINVAL)
> >  #endif
> > +#ifndef SET_TAGGED_ADDR_CTRL
> > +# define SET_TAGGED_ADDR_CTRL(a)   (-EINVAL)
> > +#endif
> > +#ifndef GET_TAGGED_ADDR_CTRL
> > +# define GET_TAGGED_ADDR_CTRL()(-EINVAL)
> > +#endif
> >  
> >  /*
> >   * this is where the system-wide overflow UID and GID are defined, for
> > @@ -2492,6 +2498,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, 
> > arg2, unsigned long, arg3,
> > return -EINVAL;
> > error = PAC_RESET_KEYS(me, arg2);
> > break;
> > +   case PR_SET_TAGGED_ADDR_CTRL:
> > +   if (arg3 || arg4 || arg5)
> > +   return -EINVAL;
> > +   error = SET_TAGGED_ADDR_CTRL(arg2);
> > +   break;
> > +   case PR_GET_TAGGED_ADDR_CTRL:
> > +   if (arg2 || arg3 || arg4 || arg5)
> > +   return -EINVAL;
> > +   error = GET_TAGGED_ADDR_CTRL();
> > +   break;
> 
> Why do we need two prctl here? We could have only one and use arg2 as set/get
> and arg3 as a parameter. What do you think?

This follows the other PR_* options, e.g. PR_SET_VL/GET_VL,
PR_*_FP_MODE. We will use other bits in arg2, for example to set the
precise vs imprecise MTE trapping.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v16 02/16] arm64: untag user pointers in access_ok and __uaccess_mask_ptr

2019-06-12 Thread Catalin Marinas
On Tue, Jun 11, 2019 at 07:09:46PM +0200, Andrey Konovalov wrote:
> On Tue, Jun 11, 2019 at 4:57 PM Catalin Marinas  
> wrote:
> >
> > On Mon, Jun 10, 2019 at 06:53:27PM +0100, Catalin Marinas wrote:
> > > On Mon, Jun 03, 2019 at 06:55:04PM +0200, Andrey Konovalov wrote:
> > > > diff --git a/arch/arm64/include/asm/uaccess.h 
> > > > b/arch/arm64/include/asm/uaccess.h
> > > > index e5d5f31c6d36..9164ecb5feca 100644
> > > > --- a/arch/arm64/include/asm/uaccess.h
> > > > +++ b/arch/arm64/include/asm/uaccess.h
> > > > @@ -94,7 +94,7 @@ static inline unsigned long __range_ok(const void 
> > > > __user *addr, unsigned long si
> > > > return ret;
> > > >  }
> > > >
> > > > -#define access_ok(addr, size)  __range_ok(addr, size)
> > > > +#define access_ok(addr, size)  __range_ok(untagged_addr(addr), 
> > > > size)
> > >
> > > I'm going to propose an opt-in method here (RFC for now). We can't have
> > > a check in untagged_addr() since this is already used throughout the
> > > kernel for both user and kernel addresses (khwasan) but we can add one
> > > in __range_ok(). The same prctl() option will be used for controlling
> > > the precise/imprecise mode of MTE later on. We can use a TIF_ flag here
> > > assuming that this will be called early on and any cloned thread will
> > > inherit this.
> >
> > Updated patch, inlining it below. Once we agreed on the approach, I
> > think Andrey can insert in in this series, probably after patch 2. The
> > differences from the one I posted yesterday:
> >
> > - renamed PR_* macros together with get/set variants and the possibility
> >   to disable the relaxed ABI
> >
> > - sysctl option - /proc/sys/abi/tagged_addr to disable the ABI globally
> >   (just the prctl() opt-in, tasks already using it won't be affected)
> >
> > And, of course, it needs more testing.
> 
> Sure, I'll add it to the series.
> 
> Should I drop access_ok() change from my patch, since yours just reverts it?

Not necessary, your patch just relaxes the ABI for all apps, mine
tightens it. You could instead move the untagging to __range_ok() and
rebase my patch accordingly.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v16 16/16] selftests, arm64: add a selftest for passing tagged pointers to kernel

2019-06-12 Thread Catalin Marinas
On Tue, Jun 11, 2019 at 07:18:04PM +0200, Andrey Konovalov wrote:
> On Tue, Jun 11, 2019 at 5:01 PM Catalin Marinas  
> wrote:
> > static void *tag_ptr(void *ptr)
> > {
> > static int tagged_addr_err = 1;
> > unsigned long tag = 0;
> >
> > if (tagged_addr_err == 1)
> > tagged_addr_err = prctl(PR_SET_TAGGED_ADDR_CTRL,
> > PR_TAGGED_ADDR_ENABLE, 0, 0, 0);
> 
> I think this requires atomics. malloc() can be called from multiple threads.

It's slightly racy but I assume in a real libc it can be initialised
earlier than the hook calls while still in single-threaded mode (I had
a quick attempt with __attribute__((constructor)) but didn't get far).

Even with the race, under normal circumstances calling the prctl() twice
is not a problem. I think the risk here is that someone disables the ABI
via sysctl and the ABI is enabled for some of the threads only.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v16 02/16] arm64: untag user pointers in access_ok and __uaccess_mask_ptr

2019-06-12 Thread Catalin Marinas
On Mon, Jun 10, 2019 at 06:53:27PM +0100, Catalin Marinas wrote:
> On Mon, Jun 03, 2019 at 06:55:04PM +0200, Andrey Konovalov wrote:
> > diff --git a/arch/arm64/include/asm/uaccess.h 
> > b/arch/arm64/include/asm/uaccess.h
> > index e5d5f31c6d36..9164ecb5feca 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -94,7 +94,7 @@ static inline unsigned long __range_ok(const void __user 
> > *addr, unsigned long si
> > return ret;
> >  }
> >  
> > -#define access_ok(addr, size)  __range_ok(addr, size)
> > +#define access_ok(addr, size)  __range_ok(untagged_addr(addr), size)
> 
> I'm going to propose an opt-in method here (RFC for now). We can't have
> a check in untagged_addr() since this is already used throughout the
> kernel for both user and kernel addresses (khwasan) but we can add one
> in __range_ok(). The same prctl() option will be used for controlling
> the precise/imprecise mode of MTE later on. We can use a TIF_ flag here
> assuming that this will be called early on and any cloned thread will
> inherit this.

Updated patch, inlining it below. Once we agreed on the approach, I
think Andrey can insert in in this series, probably after patch 2. The
differences from the one I posted yesterday:

- renamed PR_* macros together with get/set variants and the possibility
  to disable the relaxed ABI

- sysctl option - /proc/sys/abi/tagged_addr to disable the ABI globally
  (just the prctl() opt-in, tasks already using it won't be affected)

And, of course, it needs more testing.

-----8<----
From 7c624777a4e545522dec1b34e60f0229cb2bd59f Mon Sep 17 00:00:00 2001
From: Catalin Marinas 
Date: Tue, 11 Jun 2019 13:03:38 +0100
Subject: [PATCH] arm64: Introduce prctl() options to control the tagged user
 addresses ABI

It is not desirable to relax the ABI to allow tagged user addresses into
the kernel indiscriminately. This patch introduces a prctl() interface
for enabling or disabling the tagged ABI with a global sysctl control
for preventing applications from enabling the relaxed ABI (meant for
testing user-space prctl() return error checking without reconfiguring
the kernel). The ABI properties are inherited by threads of the same
application and fork()'ed children but cleared on execve().

The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
MTE-specific settings like imprecise vs precise exceptions.

Signed-off-by: Catalin Marinas 
---
 arch/arm64/include/asm/processor.h   |  6 +++
 arch/arm64/include/asm/thread_info.h |  1 +
 arch/arm64/include/asm/uaccess.h |  5 ++-
 arch/arm64/kernel/process.c  | 67 
 include/uapi/linux/prctl.h   |  5 +++
 kernel/sys.c | 16 +++
 6 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index fcd0e691b1ea..fee457456aa8 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -307,6 +307,12 @@ extern void __init minsigstksz_setup(void);
 /* PR_PAC_RESET_KEYS prctl */
 #define PAC_RESET_KEYS(tsk, arg)   ptrauth_prctl_reset_keys(tsk, arg)
 
+/* PR_TAGGED_ADDR prctl */
+long set_tagged_addr_ctrl(unsigned long arg);
+long get_tagged_addr_ctrl(void);
+#define SET_TAGGED_ADDR_CTRL(arg)  set_tagged_addr_ctrl(arg)
+#define GET_TAGGED_ADDR_CTRL() get_tagged_addr_ctrl()
+
 /*
  * For CONFIG_GCC_PLUGIN_STACKLEAK
  *
diff --git a/arch/arm64/include/asm/thread_info.h 
b/arch/arm64/include/asm/thread_info.h
index c285d1ce7186..7263d4c973ce 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -101,6 +101,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_SVE23  /* Scalable Vector Extension in 
use */
 #define TIF_SVE_VL_INHERIT 24  /* Inherit sve_vl_onexec across exec */
 #define TIF_SSBD   25  /* Wants SSB mitigation */
+#define TIF_TAGGED_ADDR26
 
 #define _TIF_SIGPENDING(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED  (1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 9164ecb5feca..995b9ea11a89 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -73,6 +73,9 @@ static inline unsigned long __range_ok(const void __user 
*addr, unsigned long si
 {
unsigned long ret, limit = current_thread_info()->addr_limit;
 
+   if (test_thread_flag(TIF_TAGGED_ADDR))
+   addr = untagged_addr(addr);
+
__chk_user_ptr(addr);
asm volatile(
// A + B <= C + 1 for all A,B,C, in four easy steps:
@@ -94,7 +97,7 @@ static inline unsigned long __range_ok(const void __user 
*addr, unsigned long si
return re

Re: [PATCH v16 16/16] selftests, arm64: add a selftest for passing tagged pointers to kernel

2019-06-12 Thread Catalin Marinas
On Mon, Jun 03, 2019 at 06:55:18PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> This patch adds a simple test, that calls the uname syscall with a
> tagged user pointer as an argument. Without the kernel accepting tagged
> user pointers the test fails with EFAULT.
> 
> Signed-off-by: Andrey Konovalov 

BTW, you could add

Co-developed-by: Catalin Marinas 

since I wrote the malloc() etc. hooks.


> +static void *tag_ptr(void *ptr)
> +{
> + unsigned long tag = rand() & 0xff;
> + if (!ptr)
> + return ptr;
> + return (void *)((unsigned long)ptr | (tag << TAG_SHIFT));
> +}

With the prctl() option, this function becomes (if you have a better
idea, fine by me):

--8<---
#include 
#include 

#define TAG_SHIFT   (56)
#define TAG_MASK(0xffUL << TAG_SHIFT)

#define PR_SET_TAGGED_ADDR_CTRL 55
#define PR_GET_TAGGED_ADDR_CTRL 56
# define PR_TAGGED_ADDR_ENABLE  (1UL << 0)

void *__libc_malloc(size_t size);
void __libc_free(void *ptr);
void *__libc_realloc(void *ptr, size_t size);
void *__libc_calloc(size_t nmemb, size_t size);

static void *tag_ptr(void *ptr)
{
static int tagged_addr_err = 1;
unsigned long tag = 0;

if (tagged_addr_err == 1)
tagged_addr_err = prctl(PR_SET_TAGGED_ADDR_CTRL,
PR_TAGGED_ADDR_ENABLE, 0, 0, 0);

if (!ptr)
return ptr;
if (!tagged_addr_err)
tag = rand() & 0xff;

return (void *)((unsigned long)ptr | (tag << TAG_SHIFT));
}

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v16 05/16] arm64: untag user pointers passed to memory syscalls

2019-06-12 Thread Catalin Marinas
On Tue, Jun 11, 2019 at 05:35:31PM +0200, Andrey Konovalov wrote:
> On Mon, Jun 10, 2019 at 4:28 PM Catalin Marinas  
> wrote:
> > On Mon, Jun 03, 2019 at 06:55:07PM +0200, Andrey Konovalov wrote:
> > > This patch is a part of a series that extends arm64 kernel ABI to allow to
> > > pass tagged user pointers (with the top byte set to something else other
> > > than 0x00) as syscall arguments.
> > >
> > > This patch allows tagged pointers to be passed to the following memory
> > > syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect,
> > > mremap, msync, munlock.
> > >
> > > Signed-off-by: Andrey Konovalov 
> >
> > I would add in the commit log (and possibly in the code with a comment)
> > that mremap() and mmap() do not currently accept tagged hint addresses.
> > Architectures may interpret the hint tag as a background colour for the
> > corresponding vma. With this:
> 
> I'll change the commit log. Where do you you think I should put this
> comment? Before mmap and mremap definitions in mm/?

On arm64 we use our own sys_mmap(). I'd say just add a comment on the
generic mremap() just before the untagged_addr() along the lines that
new_address is not untagged for preserving similar behaviour to mmap().

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v14 13/17] IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr

2019-05-06 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 03:25:09PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can
> only by done with untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> Reviewed-by: Leon Romanovsky 
> ---
>  drivers/infiniband/hw/mlx4/mr.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
> index 395379a480cb..9a35ed2c6a6f 100644
> --- a/drivers/infiniband/hw/mlx4/mr.c
> +++ b/drivers/infiniband/hw/mlx4/mr.c
> @@ -378,6 +378,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata 
> *udata, u64 start,
>* again
>*/
>   if (!ib_access_writable(access_flags)) {
> + unsigned long untagged_start = untagged_addr(start);
>   struct vm_area_struct *vma;
>  
>   down_read(>mm->mmap_sem);
> @@ -386,9 +387,9 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata 
> *udata, u64 start,
>* cover the memory, but for now it requires a single vma to
>* entirely cover the MR to support RO mappings.
>*/
> - vma = find_vma(current->mm, start);
> - if (vma && vma->vm_end >= start + length &&
> - vma->vm_start <= start) {
> + vma = find_vma(current->mm, untagged_start);
> + if (vma && vma->vm_end >= untagged_start + length &&
> + vma->vm_start <= untagged_start) {
>   if (vma->vm_flags & VM_WRITE)
>   access_flags |= IB_ACCESS_LOCAL_WRITE;
>   } else {

Discussion ongoing on the previous version of the patch but I'm more
inclined to do this in ib_uverbs_(re)reg_mr() on cmd.start.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v14 10/17] fs, arm64: untag user pointers in fs/userfaultfd.c

2019-05-06 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 03:25:06PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> userfaultfd_register() and userfaultfd_unregister() use provided user
> pointers for vma lookups, which can only by done with untagged pointers.
> 
> Untag user pointers in these functions.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  fs/userfaultfd.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index f5de1e726356..fdee0db0e847 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1325,6 +1325,9 @@ static int userfaultfd_register(struct userfaultfd_ctx 
> *ctx,
>   goto out;
>   }
>  
> + uffdio_register.range.start =
> + untagged_addr(uffdio_register.range.start);
> +
>   ret = validate_range(mm, uffdio_register.range.start,
>uffdio_register.range.len);
>   if (ret)
> @@ -1514,6 +1517,8 @@ static int userfaultfd_unregister(struct 
> userfaultfd_ctx *ctx,
>   if (copy_from_user(_unregister, buf, sizeof(uffdio_unregister)))
>   goto out;
>  
> + uffdio_unregister.start = untagged_addr(uffdio_unregister.start);
> +
>   ret = validate_range(mm, uffdio_unregister.start,
>uffdio_unregister.len);
>   if (ret)

Wouldn't it be easier to do this in validate_range()? There are a few
more calls in this file, though I didn't check whether a tagged address
would cause issues.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v14 08/17] mm, arm64: untag user pointers in get_vaddr_frames

2019-05-06 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 03:25:04PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> get_vaddr_frames uses provided user pointers for vma lookups, which can
> only by done with untagged pointers. Instead of locating and changing
> all callers of this function, perform untagging in it.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  mm/frame_vector.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> index c64dca6e27c2..c431ca81dad5 100644
> --- a/mm/frame_vector.c
> +++ b/mm/frame_vector.c
> @@ -46,6 +46,8 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> nr_frames,
>   if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
>   nr_frames = vec->nr_allocated;
>  
> + start = untagged_addr(start);
> +
>   down_read(>mmap_sem);
>   locked = 1;
>   vma = find_vma_intersection(mm, start, start + 1);

Is this some buffer that the user may have malloc'ed? I got lost when
trying to track down the provenience of this buffer.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 10/20] kernel, arm64: untag user pointers in prctl_set_mm*

2019-04-29 Thread Catalin Marinas
On Mon, Apr 01, 2019 at 06:44:34PM +0200, Andrey Konovalov wrote:
> On Fri, Mar 22, 2019 at 4:41 PM Catalin Marinas  
> wrote:
> > On Wed, Mar 20, 2019 at 03:51:24PM +0100, Andrey Konovalov wrote:
> > > @@ -2120,13 +2135,14 @@ static int prctl_set_mm(int opt, unsigned long 
> > > addr,
> > >   if (opt == PR_SET_MM_AUXV)
> > >   return prctl_set_auxv(mm, addr, arg4);
> > >
> > > - if (addr >= TASK_SIZE || addr < mmap_min_addr)
> > > + if (untagged_addr(addr) >= TASK_SIZE ||
> > > + untagged_addr(addr) < mmap_min_addr)
> > >   return -EINVAL;
> > >
> > >   error = -EINVAL;
> > >
> > >   down_write(>mmap_sem);
> > > - vma = find_vma(mm, addr);
> > > + vma = find_vma(mm, untagged_addr(addr));
> > >
> > >   prctl_map.start_code= mm->start_code;
> > >   prctl_map.end_code  = mm->end_code;
> >
> > Does this mean that we are left with tagged addresses for the
> > mm->start_code etc. values? I really don't think we should allow this,
> > I'm not sure what the implications are in other parts of the kernel.
> >
> > Arguably, these are not even pointer values but some address ranges. I
> > know we decided to relax this notion for mmap/mprotect/madvise() since
> > the user function prototypes take pointer as arguments but it feels like
> > we are overdoing it here (struct prctl_mm_map doesn't even have
> > pointers).
> >
> > What is the use-case for allowing tagged addresses here? Can user space
> > handle untagging?
> 
> I don't know any use cases for this. I did it because it seems to be
> covered by the relaxed ABI. I'm not entirely sure what to do here,
> should I just drop this patch?

If we allow tagged addresses to be passed here, we'd have to untag them
before they end up in the mm->start_code etc. members.

I know we are trying to relax the ABI here w.r.t. address ranges but
mostly because we couldn't figure out a way to document unambiguously
the difference between a user pointer that may be dereferenced by the
kernel (tags allowed) and an address typically used for managing the
address space layout. Suggestions welcomed.

I'd say just drop this patch and capture it in the ABI document.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 10/20] kernel, arm64: untag user pointers in prctl_set_mm*

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:24PM +0100, Andrey Konovalov wrote:
> @@ -2120,13 +2135,14 @@ static int prctl_set_mm(int opt, unsigned long addr,
>   if (opt == PR_SET_MM_AUXV)
>   return prctl_set_auxv(mm, addr, arg4);
>  
> - if (addr >= TASK_SIZE || addr < mmap_min_addr)
> + if (untagged_addr(addr) >= TASK_SIZE ||
> + untagged_addr(addr) < mmap_min_addr)
>   return -EINVAL;
>  
>   error = -EINVAL;
>  
>   down_write(>mmap_sem);
> - vma = find_vma(mm, addr);
> + vma = find_vma(mm, untagged_addr(addr));
>  
>   prctl_map.start_code= mm->start_code;
>   prctl_map.end_code  = mm->end_code;

Does this mean that we are left with tagged addresses for the
mm->start_code etc. values? I really don't think we should allow this,
I'm not sure what the implications are in other parts of the kernel.

Arguably, these are not even pointer values but some address ranges. I
know we decided to relax this notion for mmap/mprotect/madvise() since
the user function prototypes take pointer as arguments but it feels like
we are overdoing it here (struct prctl_mm_map doesn't even have
pointers).

What is the use-case for allowing tagged addresses here? Can user space
handle untagging?

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 04/20] mm, arm64: untag user pointers passed to memory syscalls

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:18PM +0100, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> This patch allows tagged pointers to be passed to the following memory
> syscalls: madvise, mbind, get_mempolicy, mincore, mlock, mlock2, brk,
> mmap_pgoff, old_mmap, munmap, remap_file_pages, mprotect, pkey_mprotect,
> mremap, msync and shmdt.
> 
> This is done by untagging pointers passed to these syscalls in the
> prologues of their handlers.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  ipc/shm.c  | 2 ++
>  mm/madvise.c   | 2 ++
>  mm/mempolicy.c | 5 +
>  mm/migrate.c   | 1 +
>  mm/mincore.c   | 2 ++
>  mm/mlock.c | 5 +
>  mm/mmap.c  | 7 +++
>  mm/mprotect.c  | 1 +
>  mm/mremap.c| 2 ++
>  mm/msync.c | 2 ++
>  10 files changed, 29 insertions(+)

I wonder whether it's better to keep these as wrappers in the arm64
code.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 14/20] drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:28PM +0100, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> amdgpu_ttm_tt_get_user_pages() uses provided user pointers for vma
> lookups, which can only by done with untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 73e71e61dc99..891b027fa33b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -751,10 +751,11 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, 
> struct page **pages)
>* check that we only use anonymous memory to prevent problems
>* with writeback
>*/
> - unsigned long end = gtt->userptr + ttm->num_pages * PAGE_SIZE;
> + unsigned long userptr = untagged_addr(gtt->userptr);
> + unsigned long end = userptr + ttm->num_pages * PAGE_SIZE;
>   struct vm_area_struct *vma;
>  
> - vma = find_vma(mm, gtt->userptr);
> + vma = find_vma(mm, userptr);
>   if (!vma || vma->vm_file || vma->vm_end < end) {
>   up_read(>mmap_sem);
>   return -EPERM;

I tried to track this down but I failed to see whether user could
provide an tagged pointer here (under the restrictions as per Vincenzo's
ABI document).

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 12/20] uprobes, arm64: untag user pointers in find_active_uprobe

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:26PM +0100, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> find_active_uprobe() uses user pointers (obtained via
> instruction_pointer(regs)) for vma lookups, which can only by done with
> untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  kernel/events/uprobes.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index c5cde87329c7..d3a2716a813a 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1992,6 +1992,8 @@ static struct uprobe *find_active_uprobe(unsigned long 
> bp_vaddr, int *is_swbp)
>   struct uprobe *uprobe = NULL;
>   struct vm_area_struct *vma;
>  
> + bp_vaddr = untagged_addr(bp_vaddr);
> +
>   down_read(>mmap_sem);
>   vma = find_vma(mm, bp_vaddr);
>   if (vma && vma->vm_start <= bp_vaddr) {

Similarly here, that's a breakpoint address, hence instruction pointer
(PC) which is untagged.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 09/20] net, arm64: untag user pointers in tcp_zerocopy_receive

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:23PM +0100, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> tcp_zerocopy_receive() uses provided user pointers for vma lookups, which
> can only by done with untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  net/ipv4/tcp.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6baa6dc1b13b..855a1f68c1ea 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1761,6 +1761,8 @@ static int tcp_zerocopy_receive(struct sock *sk,
>   if (address & (PAGE_SIZE - 1) || address != zc->address)
>   return -EINVAL;
>  
> + address = untagged_addr(address);
> +
>   if (sk->sk_state == TCP_LISTEN)
>   return -ENOTCONN;

I don't think we need this patch if we stick to Vincenzo's ABI
restrictions. Can zc->address be an anonymous mmap()? My understanding
of TCP_ZEROCOPY_RECEIVE is that this is an mmap() on a socket, so user
should not tag such pointer.

We want to allow tagged pointers to work transparently only for heap and
stack, hence the restriction to anonymous mmap() and those addresses
below sbrk(0).

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 13/20] bpf, arm64: untag user pointers in stack_map_get_build_id_offset

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:27PM +0100, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> stack_map_get_build_id_offset() uses provided user pointers for vma
> lookups, which can only by done with untagged pointers.
> 
> Untag user pointers in this function for doing the lookup and
> calculating the offset, but save as is in the bpf_stack_build_id
> struct.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  kernel/bpf/stackmap.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 950ab2f28922..bb89341d3faf 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -320,7 +320,9 @@ static void stack_map_get_build_id_offset(struct 
> bpf_stack_build_id *id_offs,
>   }
>  
>   for (i = 0; i < trace_nr; i++) {
> - vma = find_vma(current->mm, ips[i]);
> + u64 untagged_ip = untagged_addr(ips[i]);
> +
> + vma = find_vma(current->mm, untagged_ip);
>   if (!vma || stack_map_get_build_id(vma, id_offs[i].build_id)) {
>   /* per entry fall back to ips */
>   id_offs[i].status = BPF_STACK_BUILD_ID_IP;
> @@ -328,7 +330,7 @@ static void stack_map_get_build_id_offset(struct 
> bpf_stack_build_id *id_offs,
>   memset(id_offs[i].build_id, 0, BPF_BUILD_ID_SIZE);
>   continue;
>   }
> - id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ips[i]
> + id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + untagged_ip
>   - vma->vm_start;
>   id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
>   }

Can the ips[*] here ever be tagged?

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 15/20] drm/radeon, arm64: untag user pointers in radeon_ttm_tt_pin_userptr

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:29PM +0100, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> radeon_ttm_tt_pin_userptr() uses provided user pointers for vma
> lookups, which can only by done with untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
> b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 9920a6fc11bf..872a98796117 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -497,9 +497,10 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
>   if (gtt->userflags & RADEON_GEM_USERPTR_ANONONLY) {
>   /* check that we only pin down anonymous memory
>  to prevent problems with writeback */
> - unsigned long end = gtt->userptr + ttm->num_pages * PAGE_SIZE;
> + unsigned long userptr = untagged_addr(gtt->userptr);
> + unsigned long end = userptr + ttm->num_pages * PAGE_SIZE;
>   struct vm_area_struct *vma;
> - vma = find_vma(gtt->usermm, gtt->userptr);
> + vma = find_vma(gtt->usermm, userptr);
>   if (!vma || vma->vm_file || vma->vm_end < end)
>   return -EPERM;
>   }

Same comment as on the previous patch.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 17/20] media/v4l2-core, arm64: untag user pointers in videobuf_dma_contig_user_get

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:31PM +0100, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> videobuf_dma_contig_user_get() uses provided user pointers for vma
> lookups, which can only by done with untagged pointers.
> 
> Untag the pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  drivers/media/v4l2-core/videobuf-dma-contig.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c 
> b/drivers/media/v4l2-core/videobuf-dma-contig.c
> index e1bf50df4c70..8a1ddd146b17 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
> @@ -160,6 +160,7 @@ static void videobuf_dma_contig_user_put(struct 
> videobuf_dma_contig_memory *mem)
>  static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory 
> *mem,
>   struct videobuf_buffer *vb)
>  {
> + unsigned long untagged_baddr = untagged_addr(vb->baddr);
>   struct mm_struct *mm = current->mm;
>   struct vm_area_struct *vma;
>   unsigned long prev_pfn, this_pfn;
> @@ -167,22 +168,22 @@ static int videobuf_dma_contig_user_get(struct 
> videobuf_dma_contig_memory *mem,
>   unsigned int offset;
>   int ret;
>  
> - offset = vb->baddr & ~PAGE_MASK;
> + offset = untagged_baddr & ~PAGE_MASK;
>   mem->size = PAGE_ALIGN(vb->size + offset);
>   ret = -EINVAL;
>  
>   down_read(>mmap_sem);
>  
> - vma = find_vma(mm, vb->baddr);
> + vma = find_vma(mm, untagged_baddr);
>   if (!vma)
>   goto out_up;
>  
> - if ((vb->baddr + mem->size) > vma->vm_end)
> + if ((untagged_baddr + mem->size) > vma->vm_end)
>   goto out_up;
>  
>   pages_done = 0;
>   prev_pfn = 0; /* kill warning */
> - user_address = vb->baddr;
> + user_address = untagged_baddr;
>  
>   while (pages_done < (mem->size >> PAGE_SHIFT)) {
>   ret = follow_pfn(vma, user_address, _pfn);

I don't think vb->baddr here is anonymous mmap() but worth checking the
call paths.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 18/20] tee/optee, arm64: untag user pointers in check_mem_type

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:32PM +0100, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> check_mem_type() uses provided user pointers for vma lookups (via
> __check_mem_type()), which can only by done with untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  drivers/tee/optee/call.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> index a5afbe6dee68..e3be20264092 100644
> --- a/drivers/tee/optee/call.c
> +++ b/drivers/tee/optee/call.c
> @@ -563,6 +563,7 @@ static int check_mem_type(unsigned long start, size_t 
> num_pages)
>   int rc;
>  
>   down_read(>mmap_sem);
> + start = untagged_addr(start);
>   rc = __check_mem_type(find_vma(mm, start),
> start + num_pages * PAGE_SIZE);
>   up_read(>mmap_sem);

I guess we could just untag this in tee_shm_register(). The tag is not
relevant to a TEE implementation (firmware) anyway.

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v13 11/20] tracing, arm64: untag user pointers in seq_print_user_ip

2019-03-24 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 03:51:25PM +0100, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> seq_print_user_ip() uses provided user pointers for vma lookups, which
> can only by done with untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  kernel/trace/trace_output.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 54373d93e251..6376bee93c84 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -370,6 +370,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct 
> mm_struct *mm,
>  {
>   struct file *file = NULL;
>   unsigned long vmstart = 0;
> + unsigned long untagged_ip = untagged_addr(ip);
>   int ret = 1;
>  
>   if (s->full)
> @@ -379,7 +380,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct 
> mm_struct *mm,
>   const struct vm_area_struct *vma;
>  
>   down_read(>mmap_sem);
> - vma = find_vma(mm, ip);
> + vma = find_vma(mm, untagged_ip);
>   if (vma) {
>   file = vma->vm_file;
>   vmstart = vma->vm_start;
> @@ -388,7 +389,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct 
> mm_struct *mm,
>   ret = trace_seq_path(s, >f_path);
>   if (ret)
>   trace_seq_printf(s, "[+0x%lx]",
> -  ip - vmstart);
> +  untagged_ip - vmstart);
>   }
>   up_read(>mmap_sem);
>   }

How would we end up with a tagged address here? Does "ip" here imply
instruction pointer, which we wouldn't tag?

-- 
Catalin
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel