Re: [PATCH 0/2] swiotlb: rework fix info leak with DMA_FROM_DEVICE

2022-03-04 Thread Matthew Wilcox
On Fri, Mar 04, 2022 at 05:29:08PM +0100, Halil Pasic wrote:
> No problem, I can do that. It isn't hard to squash things together, but
> when I was about to write the commit message, I had the feeling doing
> a revert is cleaner.
> 
> Any other opinions?

One patch, not two.


Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-25 Thread Matthew Wilcox
On Sat, Dec 25, 2021 at 09:16:55AM +, Hyeonggon Yoo wrote:
> # mm: Convert struct page to struct slab in functions used by other subsystems
> I'm not familiar with kasan, but let me ask:
> Does kasan_slab_free() detect an invalid free if someone frees
> an object that was not allocated from slab?
> 
> @@ -341,7 +341,7 @@ static inline bool kasan_slab_free(struct kmem_cache *cache, void *object,
> -	if (unlikely(nearest_obj(cache, virt_to_head_page(object), object) !=
> +	if (unlikely(nearest_obj(cache, virt_to_slab(object), object) !=
> 			object)) {
> 		kasan_report_invalid_free(tagged_object, ip);
> 		return true;
> 
> I'm asking this because virt_to_slab() will return NULL if folio_test_slab()
> returns false.  That will cause a NULL pointer dereference in nearest_obj().
> I don't think this change is intended.

You need to track down how this could happen.  As far as I can tell,
it's always called when we know the object is part of a slab.  That's
where the cachep pointer is deduced from.
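
If you do find a path where the object isn't backed by a slab and want to be
defensive about it, a minimal sketch (purely illustrative, using the
virt_to_slab() conversion from this series) would be:

	struct slab *slab = virt_to_slab(object);

	if (unlikely(!slab || nearest_obj(cache, slab, object) != object)) {
		kasan_report_invalid_free(tagged_object, ip);
		return true;
	}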



Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-19 Thread Matthew Wilcox
On Mon, Dec 20, 2021 at 01:47:54AM +0100, Vlastimil Babka wrote:
> > * mm/slub: Convert print_page_info() to print_slab_info()
> > Do we really need to explicitly convert slab_folio()'s result to (struct folio *)?
> 
> Unfortunately yes, as long as folio_flags() don't take const struct folio *,
> which will need some yak shaving.

In case anyone's interested ...

folio_flags calls VM_BUG_ON_PGFLAGS() which would need its second
argument to be const.

That means dump_page() needs to take a const struct page, which
means __dump_page() needs its argument to be const.

That calls ...

is_migrate_cma_page()
page_mapping()
page_mapcount()
page_ref_count()
page_to_pgoff()
page_to_pfn()
hpage_pincount_available()
head_compound_mapcount()
head_compound_pincount()
compound_order()
PageKsm()
PageAnon()
PageCompound()

... and at that point, I ran out of motivation to track down some parts
of this tarbaby that could be fixed.  I did do:

mm: constify page_count and page_ref_count
mm: constify get_pfnblock_flags_mask and get_pfnblock_migratetype
mm: make compound_head const-preserving
mm/page_owner: constify dump_page_owner

so some of those are already done.  But a lot of them just need to be
done at the same time.  For example, page_mapping() calls
folio_mapping() which calls folio_test_slab() which calls folio_flags(),
so dump_page() and page_mapping() need to be done at the same time.

One bit that could be broken off easily (I think ...) is PageTransTail(),
PageTail(), PageCompound(), PageHuge(), page_to_pgoff() and
page_to_index().  One wrinkle is that it needs a temporary
TESTPAGEFLAGS_FALSE_CONST.  But I haven't tried it yet.
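
As a rough illustration of the kind of change involved (not the actual patch),
a const-taking tail-page test might end up looking something like:

/* Illustrative only; the real definition goes through the pageflag macros. */
static __always_inline int PageTail(const struct page *page)
{
	return READ_ONCE(page->compound_head) & 1;
}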



Re: [PATCH v2 29/33] iommu: Use put_pages_list

2021-12-01 Thread Matthew Wilcox
On Wed, Dec 01, 2021 at 07:15:06PM +0100, Vlastimil Babka wrote:
> From: "Matthew Wilcox (Oracle)" 
> 
> page->freelist is for the use of slab.  We already have the ability
> to free a list of pages in the core mm, but it requires the use of a
> list_head and for the pages to be chained together through page->lru.
> Switch the iommu code over to using put_pages_list().

FYI, this is going to have conflicts with
https://lore.kernel.org/lkml/cover.1637671820.git.robin.mur...@arm.com/

I'm not sure what the appropriate resolution is going to be here;
maybe hold back part of this patch series to the following merge
window to give the iommu people time to merge their own patches?


Re: [RFC] iommu: Use put_pages_list

2021-10-14 Thread Matthew Wilcox
On Thu, Oct 14, 2021 at 05:17:18PM +0100, Robin Murphy wrote:
> On 2021-10-14 12:52, John Garry wrote:
> > On 14/10/2021 12:20, Matthew Wilcox wrote:
> > > I'm going to keep pinging this patch weekly.
> > > 
> > > On Thu, Oct 07, 2021 at 07:17:02PM +0100, Matthew Wilcox wrote:
> > > > ping?
> > 
> > Robin, were you checking this? You mentioned "I got
> > side-tracked trying to make io-pgtable use that freelist properly" in
> > another thread, which seems related.
> 
> Ooh, thanks for the heads-up John - I'm still only just starting to catch up
> on my mailing list folders since I got back off holiday.
> 
> Indeed I already started untangling the freelist handling in the flush queue
> code (to make the move into iommu-dma smaller). Once I'd figured out how it
> worked I did wonder whether there was any more "standard" field to borrow,
> since page->freelist did seem very much in the minority. If page->lru is it
> then great! From a quick skim of the patch I think I'd only have a few
> trivial review comments to make - certainly no objection to the fundamental
> change itself (indeed I hit a point in io-pgtable-arm where adding to the
> pointer chain got rather awkward, so having proper lists to splice would be
> lovely).

Great to hear!

> Matthew - is this something getting in the way of mm development, or just a
> nice cleanup? I'd be happy either to pursue merging it on its own, or to
> pick it up and work it into a series with my stuff.

This is probably going to get in the way of MM development in ~6 months
time.  I'm happy for you to pick it up and put it in a series of your own!
BTW, the optimisation of the implementation of put_pages_list() is sitting
in akpm's tree, so if you see a performance problem, please give that
a try.
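
For anyone picking this up, the calling convention the patch moves to is
roughly this (just a sketch; 'table' stands in for a page-table page):

	LIST_HEAD(freelist);
	struct page *p = virt_to_page(table);

	list_add_tail(&p->lru, &freelist);	/* chain via page->lru, not page->freelist */
	put_pages_list(&freelist);		/* free the whole list in one call */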


Re: [RFC] iommu: Use put_pages_list

2021-10-14 Thread Matthew Wilcox
I'm going to keep pinging this patch weekly.

On Thu, Oct 07, 2021 at 07:17:02PM +0100, Matthew Wilcox wrote:
> ping?
> 
> On Thu, Sep 30, 2021 at 05:20:42PM +0100, Matthew Wilcox (Oracle) wrote:
> > page->freelist is for the use of slab.  We already have the ability
> > to free a list of pages in the core mm, but it requires the use of a
> > list_head and for the pages to be chained together through page->lru.
> > Switch the iommu code over to using put_pages_list().
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) 
> > ---
> >  drivers/iommu/amd/io_pgtable.c | 99 +++---
> >  drivers/iommu/dma-iommu.c  | 11 +---
> >  drivers/iommu/intel/iommu.c| 89 +++---
> >  include/linux/iommu.h  |  3 +-
> >  4 files changed, 77 insertions(+), 125 deletions(-)
> > 
> > diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
> > index 182c93a43efd..8dfa6ee58b76 100644
> > --- a/drivers/iommu/amd/io_pgtable.c
> > +++ b/drivers/iommu/amd/io_pgtable.c
> > @@ -74,49 +74,37 @@ static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
> >   *
> >   ****************************************************************************/
> >  
> > -static void free_page_list(struct page *freelist)
> > -{
> > -        while (freelist != NULL) {
> > -                unsigned long p = (unsigned long)page_address(freelist);
> > -
> > -                freelist = freelist->freelist;
> > -                free_page(p);
> > -        }
> > -}
> > -
> > -static struct page *free_pt_page(unsigned long pt, struct page *freelist)
> > +static void free_pt_page(unsigned long pt, struct list_head *list)
> >  {
> >          struct page *p = virt_to_page((void *)pt);
> >  
> > -        p->freelist = freelist;
> > -
> > -        return p;
> > +        list_add_tail(&p->lru, list);
> >  }
> >  
> >  #define DEFINE_FREE_PT_FN(LVL, FN)                                      \
> > -static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist)  \
> > -{                                                                       \
> > -        unsigned long p;                                                \
> > -        u64 *pt;                                                        \
> > -        int i;                                                          \
> > -                                                                        \
> > -        pt = (u64 *)__pt;                                               \
> > -                                                                        \
> > -        for (i = 0; i < 512; ++i) {                                     \
> > -                /* PTE present? */                                      \
> > -                if (!IOMMU_PTE_PRESENT(pt[i]))                          \
> > -                        continue;                                       \
> > -                                                                        \
> > -                /* Large PTE? */                                        \
> > -                if (PM_PTE_LEVEL(pt[i]) == 0 ||                         \
> > -                    PM_PTE_LEVEL(pt[i]) == 7)                           \
> > -                        continue;                                       \
> > -                                                                        \
> > -                p = (unsigned long)IOMMU_PTE_PAGE(pt[i]);               \
> > -                freelist = FN(p, freelist);                             \
> > -        }                                                               \
> > -                                                                        \
> > -        return free_pt_page((unsigned long)pt, freelist);               \
> > +static void free_pt_##LVL (unsigned long __pt, struct list_head *list) \
> > +{                                                                       \
> > +        unsigned long p;                                                \
> > +        u64 *pt;                                                        \
> > +        int i;                                                          \
> 

Re: [RFC] iommu: Use put_pages_list

2021-10-07 Thread Matthew Wilcox
ping?

On Thu, Sep 30, 2021 at 05:20:42PM +0100, Matthew Wilcox (Oracle) wrote:
> page->freelist is for the use of slab.  We already have the ability
> to free a list of pages in the core mm, but it requires the use of a
> list_head and for the pages to be chained together through page->lru.
> Switch the iommu code over to using put_pages_list().
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>  drivers/iommu/amd/io_pgtable.c | 99 +++---
>  drivers/iommu/dma-iommu.c  | 11 +---
>  drivers/iommu/intel/iommu.c| 89 +++---
>  include/linux/iommu.h  |  3 +-
>  4 files changed, 77 insertions(+), 125 deletions(-)
> 
> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
> index 182c93a43efd..8dfa6ee58b76 100644
> --- a/drivers/iommu/amd/io_pgtable.c
> +++ b/drivers/iommu/amd/io_pgtable.c
> @@ -74,49 +74,37 @@ static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
>   *
>   ****************************************************************************/
>  
> -static void free_page_list(struct page *freelist)
> -{
> -        while (freelist != NULL) {
> -                unsigned long p = (unsigned long)page_address(freelist);
> -
> -                freelist = freelist->freelist;
> -                free_page(p);
> -        }
> -}
> -
> -static struct page *free_pt_page(unsigned long pt, struct page *freelist)
> +static void free_pt_page(unsigned long pt, struct list_head *list)
>  {
>          struct page *p = virt_to_page((void *)pt);
>  
> -        p->freelist = freelist;
> -
> -        return p;
> +        list_add_tail(&p->lru, list);
>  }
>  
>  #define DEFINE_FREE_PT_FN(LVL, FN)                                      \
> -static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist)  \
> -{                                                                       \
> -        unsigned long p;                                                \
> -        u64 *pt;                                                        \
> -        int i;                                                          \
> -                                                                        \
> -        pt = (u64 *)__pt;                                               \
> -                                                                        \
> -        for (i = 0; i < 512; ++i) {                                     \
> -                /* PTE present? */                                      \
> -                if (!IOMMU_PTE_PRESENT(pt[i]))                          \
> -                        continue;                                       \
> -                                                                        \
> -                /* Large PTE? */                                        \
> -                if (PM_PTE_LEVEL(pt[i]) == 0 ||                         \
> -                    PM_PTE_LEVEL(pt[i]) == 7)                           \
> -                        continue;                                       \
> -                                                                        \
> -                p = (unsigned long)IOMMU_PTE_PAGE(pt[i]);               \
> -                freelist = FN(p, freelist);                             \
> -        }                                                               \
> -                                                                        \
> -        return free_pt_page((unsigned long)pt, freelist);               \
> +static void free_pt_##LVL (unsigned long __pt, struct list_head *list) \
> +{                                                                       \
> +        unsigned long p;                                                \
> +        u64 *pt;                                                        \
> +        int i;                                                          \
> +                                                                        \
> +        pt = (u64 *)__pt;                                               \
> +                                                                        \
> +        for (i = 0; i < 512; ++i) {                                     \
> +                /* PTE present? */                                      \
> +                if (!IOMMU_PTE_PRESENT(pt[i]))                          \
> +                        continue;

[RFC] iommu: Use put_pages_list

2021-09-30 Thread Matthew Wilcox (Oracle)
page->freelist is for the use of slab.  We already have the ability
to free a list of pages in the core mm, but it requires the use of a
list_head and for the pages to be chained together through page->lru.
Switch the iommu code over to using put_pages_list().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 drivers/iommu/amd/io_pgtable.c | 99 +++---
 drivers/iommu/dma-iommu.c  | 11 +---
 drivers/iommu/intel/iommu.c| 89 +++---
 include/linux/iommu.h  |  3 +-
 4 files changed, 77 insertions(+), 125 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 182c93a43efd..8dfa6ee58b76 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -74,49 +74,37 @@ static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
  *
  ****************************************************************************/
 
-static void free_page_list(struct page *freelist)
-{
-   while (freelist != NULL) {
-   unsigned long p = (unsigned long)page_address(freelist);
-
-   freelist = freelist->freelist;
-   free_page(p);
-   }
-}
-
-static struct page *free_pt_page(unsigned long pt, struct page *freelist)
+static void free_pt_page(unsigned long pt, struct list_head *list)
 {
struct page *p = virt_to_page((void *)pt);
 
-   p->freelist = freelist;
-
-   return p;
+        list_add_tail(&p->lru, list);
 }
 
 #define DEFINE_FREE_PT_FN(LVL, FN)                                      \
-static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist)  \
-{                                                                       \
-        unsigned long p;                                                \
-        u64 *pt;                                                        \
-        int i;                                                          \
-                                                                        \
-        pt = (u64 *)__pt;                                               \
-                                                                        \
-        for (i = 0; i < 512; ++i) {                                     \
-                /* PTE present? */                                      \
-                if (!IOMMU_PTE_PRESENT(pt[i]))                          \
-                        continue;                                       \
-                                                                        \
-                /* Large PTE? */                                        \
-                if (PM_PTE_LEVEL(pt[i]) == 0 ||                         \
-                    PM_PTE_LEVEL(pt[i]) == 7)                           \
-                        continue;                                       \
-                                                                        \
-                p = (unsigned long)IOMMU_PTE_PAGE(pt[i]);               \
-                freelist = FN(p, freelist);                             \
-        }                                                               \
-                                                                        \
-        return free_pt_page((unsigned long)pt, freelist);               \
+static void free_pt_##LVL (unsigned long __pt, struct list_head *list) \
+{                                                                       \
+        unsigned long p;                                                \
+        u64 *pt;                                                        \
+        int i;                                                          \
+                                                                        \
+        pt = (u64 *)__pt;                                               \
+                                                                        \
+        for (i = 0; i < 512; ++i) {                                     \
+                /* PTE present? */                                      \
+                if (!IOMMU_PTE_PRESENT(pt[i]))                          \
+                        continue;                                       \
+                                                                        \
+                /* Large PTE? */                                        \
+                if (PM_PTE_LEVEL(pt[i]) == 0 ||                         \
+                    PM_PTE_LEVEL(pt[i]) == 7)                           \
+                        continue;                                       \
+                                                                        \
+                p = (unsigned long)IOMMU_PTE_PAGE(pt[i]);               \

Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin

2021-02-10 Thread Matthew Wilcox
On Tue, Feb 09, 2021 at 08:20:18PM +0800, Zhou Wang wrote:
> Agree, will add it in next version.

No, don't do another version.  Jason is right, this approach is wrong.
The point of SVA is that it doesn't require the application to do
anything special.  If jitter from too-frequent page migration is actually
a problem, then fix the frequency of page migration.  Don't pretend that
this particular application is so important that it prevents the kernel
from doing its housekeeping.


Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin

2021-02-07 Thread Matthew Wilcox
On Sun, Feb 07, 2021 at 10:24:28PM +, Song Bao Hua (Barry Song) wrote:
> > > In high-performance I/O cases, accelerators might want to perform
> > > I/O on memory without IO page faults, which can otherwise result in
> > > dramatically increased latency. Current memory-related APIs cannot
> > > achieve this; e.g. mlock can only prevent memory from being swapped
> > > out to a backing device, and page migration can still trigger IO
> > > page faults.
> > 
> > Well ... we have two requirements.  The application wants to not take
> > page faults.  The system wants to move the application to a different
> > NUMA node in order to optimise overall performance.  Why should the
> > application's desires take precedence over the kernel's desires?  And why
> > should it be done this way rather than by the sysadmin using numactl to
> > lock the application to a particular node?
> 
> The NUMA balancer is just one of many reasons for page migration. Even a
> simple alloc_pages() can cause memory migration within a single NUMA
> node or on a UMA system.
> 
> The other reasons for page migration include, but are not limited to:
> * memory moves due to CMA
> * memory moves due to huge page creation
> 
> We can hardly ask users to disable compaction, CMA and huge pages
> for the whole system.

You're dodging the question.  Should the CMA allocation fail because
another application is using SVA?

I would say no.  The application using SVA should take the one-time
performance hit from having its memory moved around.


Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin

2021-02-07 Thread Matthew Wilcox
On Sun, Feb 07, 2021 at 04:18:03PM +0800, Zhou Wang wrote:
> SVA (shared virtual addressing) offers a way for a device to share a
> process's virtual address space safely, which makes user-space device
> driver coding more convenient. However, IO page faults may happen when
> doing DMA operations. As the latency of an IO page fault is relatively
> large, DMA performance is affected severely when IO page faults occur.
> In the long term, DMA performance will not be stable.
> 
> In high-performance I/O cases, accelerators might want to perform
> I/O on memory without IO page faults, which can otherwise result in
> dramatically increased latency. Current memory-related APIs cannot
> achieve this; e.g. mlock can only prevent memory from being swapped
> out to a backing device, and page migration can still trigger IO
> page faults.

Well ... we have two requirements.  The application wants to not take
page faults.  The system wants to move the application to a different
NUMA node in order to optimise overall performance.  Why should the
application's desires take precedence over the kernel's desires?  And why
should it be done this way rather than by the sysadmin using numactl to
lock the application to a particular node?

> +struct mem_pin_container {
> + struct xarray array;
> + struct mutex lock;
> +};

I don't understand what the lock actually protects.

> +struct pin_pages {
> + unsigned long first;
> + unsigned long nr_pages;
> + struct page **pages;
> +};

I don't think you need 'first', and I think you can embed the pages
array into this struct, removing one allocation.

> + xa_for_each(>array, idx, p) {
> + unpin_user_pages(p->pages, p->nr_pages);
> + xa_erase(>array, p->first);
> + vfree(p->pages);
> + kfree(p);
> + }
> +
> + mutex_destroy(>lock);
> + xa_destroy(>array);

If you just called xa_erase() on every element of the array, you don't need
to call xa_destroy().

> + if (!can_do_mlock())
> + return -EPERM;

You check for can_do_mlock(), but you don't account the pages to this
rlimit.

> + first = (addr->addr & PAGE_MASK) >> PAGE_SHIFT;

You don't need to mask off the bits, the shift will remove them.

> + last = ((addr->addr + addr->size - 1) & PAGE_MASK) >> PAGE_SHIFT;

DIV_ROUND_UP()?
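
i.e. something along these lines (just a sketch of the same page-index
arithmetic):

	first = addr->addr >> PAGE_SHIFT;
	last = DIV_ROUND_UP(addr->addr + addr->size, PAGE_SIZE) - 1;
	nr_pages = last - first + 1;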

> + pages = vmalloc(nr_pages * sizeof(struct page *));

kvmalloc().  vmalloc() always allocates at least a page, so we want to
use kmalloc() if the size is small.  Also, use array_size() -- I know this
can't overflow, but let's be clear.
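
In other words, roughly this (a sketch; kvfree() would then replace vfree()
on the teardown path):

	pages = kvmalloc(array_size(nr_pages, sizeof(*pages)), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;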

> + ret = pin_user_pages_fast(addr->addr & PAGE_MASK, nr_pages,
> +   flags | FOLL_LONGTERM, pages);
> + if (ret != nr_pages) {
> + pr_err("mempinfd: Failed to pin page\n");

No.  You mustn't allow the user to be able to generate messages to syslog,
just by passing garbage to a syscall.

> + ret = xa_insert(>array, p->first, p, GFP_KERNEL);
> + if (ret)
> + goto unpin_pages;

Hmm.  So we can't pin two ranges which start at the same address, but we
can pin two overlapping ranges.  Is that OK?



Re: a saner API for allocating DMA addressable pages v2

2020-09-14 Thread Matthew Wilcox
On Mon, Sep 14, 2020 at 04:44:16PM +0200, Christoph Hellwig wrote:
> I'm still a little unsure about the API naming, as alloc_pages sort of
> implies a struct page return value, but we return a kernel virtual
> address.

Erm ... dma_alloc_pages() returns a struct page, so is this sentence
stale?

From patch 14:

+struct page *dma_alloc_pages(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);

> The other alternative would be to name the API
> dma_alloc_noncoherent, but the whole non-coherent naming seems to put
> people off.

You say that like it's a bad thing.  I think the problem is more that
people don't understand what non-coherent means and think they're
supporting it when they're not.

dma_alloc_manual_flushing()?

> As a follow up I plan to move the implementation of the
> DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given
> that is also is a fundamentally non coherent allocation.  The replacement
> for that flag would then return a struct page, as it is allowed to
> actually return pages without a kernel mapping as the name suggested
> (although most of the time they will actually have a kernel mapping..)

If the page doesn't have a kernel mapping, shouldn't it return a PFN
or a phys_addr?



Re: [PATCH 11/17] sgiseeq: convert to dma_alloc_noncoherent

2020-09-14 Thread Matthew Wilcox
On Mon, Sep 14, 2020 at 04:44:27PM +0200, Christoph Hellwig wrote:
>  drivers/net/ethernet/i825xx/lasi_82596.c |  25 ++---
>  drivers/net/ethernet/i825xx/lib82596.c   | 114 ++-
>  drivers/net/ethernet/i825xx/sni_82596.c  |   4 -
>  drivers/net/ethernet/seeq/sgiseeq.c  |  28 --
>  drivers/scsi/53c700.c|   9 +-
>  5 files changed, 103 insertions(+), 77 deletions(-)

I think your patch slicing-and-dicing went wrong here ;-(


Re: [PATCH 07/28] 53c700: improve non-coherent DMA handling

2020-09-01 Thread Matthew Wilcox
On Tue, Sep 01, 2020 at 06:41:12PM +0200, Helge Deller wrote:
> > I still have a zoo of machines running for such testing, including a
> > 715/64 and two 730.
> > I'm going to test this git tree on the 715/64:

The 715/64 is a 7100LC machine though.  I think you need to boot on
the 730 to test the non-coherent path.



Re: [PATCH 07/28] 53c700: improve non-coherent DMA handling

2020-09-01 Thread Matthew Wilcox
On Tue, Sep 01, 2020 at 07:52:40AM -0700, James Bottomley wrote:
> I think this looks mostly OK, except for one misnamed parameter below. 
> Unfortunately, the last non-coherent parisc was the 700 series and I no
> longer own a box, so I can't test that part of it (I can fire up the
> C360 to test it on a coherent arch).

I have a 715/50 that probably hasn't been powered on in 15 years if you
need something that old to test on (I believe the 725/100 uses the 7100LC
and so is coherent).  I'll need to set up a cross-compiler ...


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Matthew Wilcox
On Wed, Apr 08, 2020 at 05:12:03PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 08, 2020 at 08:01:00AM -0700, Randy Dunlap wrote:
> > Hi,
> > 
> > On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 36949a9425b8..614cc786b519 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -702,7 +702,7 @@ config ZSMALLOC
> > >  
> > >  config ZSMALLOC_PGTABLE_MAPPING
> > >   bool "Use page table mapping to access object in zsmalloc"
> > > - depends on ZSMALLOC
> > > + depends on ZSMALLOC=y
> > 
> > It's a bool so this shouldn't matter... not needed.
> 
> My mm/Kconfig has:
> 
> config ZSMALLOC
>   tristate "Memory allocator for compressed pages"
>   depends on MMU
> 
> which I think means it can be modular, no?

Randy means that ZSMALLOC_PGTABLE_MAPPING is a bool, so I think hch's patch
is wrong ... if ZSMALLOC is 'm' then ZSMALLOC_PGTABLE_MAPPING would become
'n' instead of 'y'.


What is the meaning of PASID_MIN?

2019-02-11 Thread Matthew Wilcox


I'm looking at commit 562831747f6299abd481b5b00bd4fa19d5c8a259
which fails to adequately explain why we can't use PASID 0.  Commit
af39507305fb83a5d3c475c2851f4d59545d8a18 also doesn't explain why PASID
0 is no longer usable for the intel-svm driver.

There are a load of simplifications that could be made to this, but I
don't know which ones to suggest without a clear understanding of the
problem you're actually trying to solve.


Re: [PATCHv2 1/9] mm: Introduce new vm_insert_range and vm_insert_range_buggy API

2019-02-07 Thread Matthew Wilcox
On Thu, Feb 07, 2019 at 09:19:47PM +0530, Souptick Joarder wrote:
> Just thought to take opinion for documentation before placing it in v3.
> Does it looks fine ?
> 
> +/**
> + * __vm_insert_range - insert range of kernel pages into user vma
> + * @vma: user vma to map to
> + * @pages: pointer to array of source kernel pages
> + * @num: number of pages in page array
> + * @offset: user's requested vm_pgoff
> + *
> + * This allow drivers to insert range of kernel pages into a user vma.
> + *
> + * Return: 0 on success and error code otherwise.
> + */
> +static int __vm_insert_range(struct vm_area_struct *vma, struct page **pages,
> +   unsigned long num, unsigned long offset)

For static functions, I prefer to leave off the second '*', ie make it
formatted like a docbook comment, but not be processed like a docbook
comment.  That avoids cluttering the html with descriptions of internal
functions that people can't actually call.

> +/**
> + * vm_insert_range - insert range of kernel pages starts with non zero offset
> + * @vma: user vma to map to
> + * @pages: pointer to array of source kernel pages
> + * @num: number of pages in page array
> + *
> + * Maps an object consisting of `num' `pages', catering for the user's

Rather than using `num', you should use @num.

> + * requested vm_pgoff
> + *
> + * If we fail to insert any page into the vma, the function will return
> + * immediately leaving any previously inserted pages present.  Callers
> + * from the mmap handler may immediately return the error as their caller
> + * will destroy the vma, removing any successfully inserted pages. Other
> + * callers should make their own arrangements for calling unmap_region().
> + *
> + * Context: Process context. Called by mmap handlers.
> + * Return: 0 on success and error code otherwise.
> + */
> +int vm_insert_range(struct vm_area_struct *vma, struct page **pages,
> +   unsigned long num)
> 
> 
> +/**
> + * vm_insert_range_buggy - insert range of kernel pages starts with zero offset
> + * @vma: user vma to map to
> + * @pages: pointer to array of source kernel pages
> + * @num: number of pages in page array
> + *
> + * Similar to vm_insert_range(), except that it explicitly sets @vm_pgoff to

But vm_pgoff isn't a parameter, so it's misleading to format it as such.

> + * 0. This function is intended for the drivers that did not consider
> + * @vm_pgoff.
> + *
> + * Context: Process context. Called by mmap handlers.
> + * Return: 0 on success and error code otherwise.
> + */
> +int vm_insert_range_buggy(struct vm_area_struct *vma, struct page **pages,
> +   unsigned long num)

I don't think we should call it 'buggy'.  'zero' would make more sense
as a suffix.

Given how this interface has evolved, I'm no longer sure than
'vm_insert_range' makes sense as the name for it.  Is it perhaps
'vm_map_object' or 'vm_map_pages'?



Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Matthew Wilcox
On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
> > +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> > +   struct page **pages, unsigned long page_count)
> > +{
> > +   unsigned long uaddr = addr;
> > +   int ret = 0, i;
> 
> Some of the sites being replaced were effectively ensuring that vma and
> pages were mutually compatible as an initial condition - would it be worth
> adding something here for robustness, e.g.:
> 
> + if (page_count != vma_pages(vma))
> + return -ENXIO;

I think we want to allow this to be used to populate part of a VMA.
So perhaps:

if (page_count > vma_pages(vma))
return -ENXIO;
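
For concreteness, folding that check into the helper as quoted above would
look roughly like this (sketch only, not the actual patch):

int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
		struct page **pages, unsigned long page_count)
{
	unsigned long uaddr = addr;
	unsigned long i;
	int ret;

	if (page_count > vma_pages(vma))
		return -ENXIO;

	for (i = 0; i < page_count; i++) {
		ret = vm_insert_page(vma, uaddr, pages[i]);
		if (ret < 0)
			return ret;
		uaddr += PAGE_SIZE;
	}

	return 0;
}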



Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-07 Thread Matthew Wilcox
On Fri, Dec 07, 2018 at 02:16:19PM +0800, Nicolas Boichat wrote:
> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32

This name doesn't make any sense.  Why not ARM_V7S_TABLE_SLAB_FLAGS ?

> +#else
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA

Can you remind me again why it is, on machines which don't support
ZONE_DMA32, why we have to allocate from ZONE_DMA?  My understanding
is that 64-bit machines have ZONE_DMA32 and 32-bit machines don't.
So shouldn't this rather be GFP_KERNEL?

Actually, maybe we could centralise this in gfp.h:

#ifdef CONFIG_64BIT
# ifdef CONFIG_ZONE_DMA32
#define GFP_32BIT   GFP_DMA32
# else
#define GFP_32BIT   GFP_DMA
# endif
#else /* 32-bit */
#define GFP_32BIT   GFP_KERNEL
#endif
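
A caller would then just do something like (hypothetical, using the GFP_32BIT
sketched above):

	table = (void *)__get_free_pages(gfp | GFP_32BIT, get_order(size));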



Re: [PATCH v4 9/9] dmapool: debug: prevent endless loop in case of corruption

2018-12-04 Thread Matthew Wilcox
On Tue, Dec 04, 2018 at 12:28:54PM -0800, Andrew Morton wrote:
> On Tue, 4 Dec 2018 12:18:01 -0800 Matthew Wilcox  wrote:
> > I only had a review comment on 8/9, which I then withdrew during my review
> > of patch 9/9.  Unless I missed something during my re-review of my 
> > responses?
> 
> And in 0/9, that 1.3MB allocation.
> 
> Maybe it's using kvmalloc, I didn't look.

Oh!  That's the mptsas driver doing something utterly awful.  Not the
fault of this patchset, in any way.


Re: [PATCH v4 9/9] dmapool: debug: prevent endless loop in case of corruption

2018-12-04 Thread Matthew Wilcox
On Tue, Dec 04, 2018 at 12:14:43PM -0800, Andrew Morton wrote:
> On Tue, 4 Dec 2018 11:22:34 -0500 Tony Battersby  wrote:
> 
> > On 11/13/18 1:36 AM, Matthew Wilcox wrote:
> > > On Mon, Nov 12, 2018 at 10:46:35AM -0500, Tony Battersby wrote:
> > >> Prevent a possible endless loop with DMAPOOL_DEBUG enabled if a buggy
> > >> driver corrupts DMA pool memory.
> > >>
> > >> Signed-off-by: Tony Battersby 
> > > I like it!  Also, here you're using blks_per_alloc in a way which isn't
> > > normally in the performance path, but might be with the right config
> > > options.  With that, I withdraw my objection to the previous patch and
> > >
> > > Acked-by: Matthew Wilcox 
> > >
> > > Andrew, can you funnel these in through your tree?  If you'd rather not,
> > > I don't mind stuffing them into a git tree and asking Linus to pull
> > > for 4.21.
> > >
> > No reply for 3 weeks, so adding Andrew Morton to recipient list.
> > 
> > Andrew, I have 9 dmapool patches ready for merging in 4.21.  See Matthew
> > Wilcox's request above.
> > 
> 
> I'll take a look, but I see that this v4 series has several review
> comments from Matthew which remain unresponded to.  Please attend to
> that.

I only had a review comment on 8/9, which I then withdrew during my review
of patch 9/9.  Unless I missed something during my re-review of my responses?

> Also, Andy had issues with the v2 series so it would be good to hear an
> update from him?

Certainly.


Re: [PATCH 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range

2018-11-23 Thread Matthew Wilcox
On Fri, Nov 23, 2018 at 05:23:06PM +, Robin Murphy wrote:
> On 15/11/2018 15:49, Souptick Joarder wrote:
> > Convert to use vm_insert_range() to map range of kernel
> > memory to user vma.
> > 
> > Signed-off-by: Souptick Joarder 
> > Reviewed-by: Matthew Wilcox 
> > ---
> >   drivers/iommu/dma-iommu.c | 12 ++--
> >   1 file changed, 2 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index d1b0475..69c66b1 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -622,17 +622,9 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
> >  
> >  int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
> >  {
> > -        unsigned long uaddr = vma->vm_start;
> > -        unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> > -        int ret = -ENXIO;
> > +        unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> > -        for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
> > -                ret = vm_insert_page(vma, uaddr, pages[i]);
> > -                if (ret)
> > -                        break;
> > -                uaddr += PAGE_SIZE;
> > -        }
> > -        return ret;
> > +        return vm_insert_range(vma, vma->vm_start, pages, count);
> 
> AFIACS, vm_insert_range() doesn't respect vma->vm_pgoff, so doesn't this
> break partial mmap()s of a large buffer? (which I believe can be a thing)

Whoops.  That should have been:

return vm_insert_range(vma, vma->vm_start, pages + vma->vm_pgoff, count);

I suppose.

Although arguably we should respect vm_pgoff inside vm_insert_range()
and then callers automatically get support for vm_pgoff without having
to think about it ... although we should then also pass in the length
of the pages array to avoid pages being mapped in which aren't part of
the allocated array.

Hm.  More thought required.


Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-22 Thread Matthew Wilcox
On Thu, Nov 22, 2018 at 12:26:02AM -0800, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 06:35:58PM -0800, Matthew Wilcox wrote:
> > I think you should look at using the page_frag allocator here.  You can
> > use whatever GFP_DMA flags you like.
> 
> So I actually tried to use page_frag to solve the XFS unaligned kmalloc
> allocations problem, and I don't think it is the right hammer for this
> nail (or any other nail outside of networking).
> 
> The problem with the page_frag allocator is that it never reuses
> fragments returned to the page, but only frees the page once all
> fragments are freed.  This means that if you have some long(er)-term
> allocations you are effectively creating memory leaks.

Yes, your allocations from the page_frag allocator have to have similar
lifetimes.  I thought that would be ideal for XFS though; as I understood
the problem, these were per-IO allocations, and IOs to the same filesystem
tend to take roughly the same amount of time.  Sure, in an error case,
some IOs will take a long time before timing out, but it should be OK
to have pages unavailable during that time in these rare situations.
What am I missing?


Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-21 Thread Matthew Wilcox
On Wed, Nov 21, 2018 at 10:26:26PM +, Robin Murphy wrote:
> These are IOMMU page tables, rather than CPU ones, so we're already well
> outside arch code - indeed the original motivation of io-pgtable was to be
> entirely independent of the p*d types and arch-specific MM code (this Armv7
> short-descriptor format is already "non-native" when used by drivers in an
> arm64 kernel).

There was quite a lot of explanation missing from this patch description!

> There are various efficiency reasons for using regular kernel memory instead
> of coherent DMA allocations - for the most part it works well, we just have
> the odd corner case like this one where the 32-bit format gets used on
> 64-bit systems such that the tables themselves still need to be allocated
> below 4GB (although the final output address can point at higher memory by
> virtue of the IOMMU in question not implementing permissions and repurposing
> some of those PTE fields as extra address bits).
> 
> TBH, if this DMA32 stuff is going to be contentious we could possibly just
> rip out the offending kmem_cache - it seemed like good practice for the
> use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> give the same 1KB alignment and chance of succeeding as the equivalent
> kmem_cache_alloc(), then we could quite easily make do with that instead.

I think you should look at using the page_frag allocator here.  You can
use whatever GFP_DMA flags you like.
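
For reference, the interface I mean is roughly this (a sketch; where the
page_frag_cache actually lives is up to the io-pgtable code):

	struct page_frag_cache nc = {};
	void *table;

	/* 1KB fragment, allocated below 4GB */
	table = page_frag_alloc(&nc, SZ_1K, GFP_ATOMIC | __GFP_DMA32);

	/* and later, when the table is no longer needed: */
	page_frag_free(table);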


Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-21 Thread Matthew Wilcox
On Wed, Nov 21, 2018 at 06:20:02PM +, Christopher Lameter wrote:
> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
> 
> > This is a follow-up to the discussion in [1], to make sure that the page
> > tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> > physical address space.
> 
> Page tables? This means you need a page frame? Why go through the slab
> allocators?

Because this particular architecture has sub-page-size PMD page tables.
We desperately need to hoist page table allocation out of the architectures;
there're a bunch of different implementations and they're mostly bad,
one way or another.

For each level of page table we generally have three cases:

1. single page
2. sub-page, naturally aligned
3. multiple pages, naturally aligned

for 1 and 3, the page allocator will do just fine.
for 2, we should have a per-MM page_frag allocator.  s390 already has
something like this, although it's more complicated.  ppc also has
something a little more complex for the cases when it's configured with
a 64k page size but wants to use a 4k page table entry.

I'd like x86 to be able to simply do:

#define pte_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
#define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
#define pud_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
#define p4d_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)

An architecture with 4k page size and needing a 16k PMD would do:

#define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)

while an architecture with a 64k page size needing a 4k PTE would do:

#define ARCH_PAGE_TABLE_FRAG
#define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)

I haven't had time to work on this, but perhaps someone with a problem
that needs fixing would like to, instead of burying yet another awful
implementation away in arch/ somewhere.


Re: [PATCH 1/9] mm: Introduce new vm_insert_range API

2018-11-21 Thread Matthew Wilcox
On Wed, Nov 21, 2018 at 04:19:11AM -0700, William Kucharski wrote:
> Could you add a line to the description explicitly stating that a failure
> to insert any page in the range will fail the entire routine, something
> like:
> 
> > * This allows drivers to insert range of kernel pages they've allocated
> > * into a user vma. This is a generic function which drivers can use
> > * rather than using their own way of mapping range of kernel pages into
> > * user vma.
> > *
> > * A failure to insert any page in the range will fail the call as a whole.
> 
> It's obvious when reading the code, but it would be self-documenting to
> state it outright.

It's probably better to be more explicit and answer Randy's question:

 * If we fail to insert any page into the vma, the function will return
 * immediately leaving any previously-inserted pages present.  Callers
 * from the mmap handler may immediately return the error as their
 * caller will destroy the vma, removing any successfully-inserted pages.
 * Other callers should make their own arrangements for calling unmap_region().

Although unmap_region() is static so there clearly isn't any code in the
kernel today other than in mmap handlers (or fault handlers) that needs to
insert pages into a VMA.


Re: [PATCH 1/9] mm: Introduce new vm_insert_range API

2018-11-17 Thread Matthew Wilcox
On Sat, Nov 17, 2018 at 12:26:38PM +0530, Souptick Joarder wrote:
> On Fri, Nov 16, 2018 at 11:59 PM Mike Rapoport  wrote:
> > > + * vm_insert_range - insert range of kernel pages into user vma
> > > + * @vma: user vma to map to
> > > + * @addr: target user address of this page
> > > + * @pages: pointer to array of source kernel pages
> > > + * @page_count: no. of pages need to insert into user vma
> > > + *
> > > + * This allows drivers to insert range of kernel pages they've allocated
> > > + * into a user vma. This is a generic function which drivers can use
> > > + * rather than using their own way of mapping range of kernel pages into
> > > + * user vma.
> >
> > Please add the return value and context descriptions.
> >
> 
> Sure I will wait for some time to get additional review comments and
> add all of those requested changes in v2.

You could send your proposed wording now which might remove the need
for a v3 if we end up arguing about the wording.


Re: [PATCH 1/9] mm: Introduce new vm_insert_range API

2018-11-15 Thread Matthew Wilcox
On Fri, Nov 16, 2018 at 11:00:30AM +0530, Souptick Joarder wrote:
> On Thu, Nov 15, 2018 at 11:44 PM Randy Dunlap  wrote:
> > On 11/15/18 7:45 AM, Souptick Joarder wrote:
> > What is the opposite of vm_insert_range() or even of vm_insert_page()?
> > or is there no need for that?
> 
> There is no opposite function of vm_insert_range() / vm_insert_page().
> My understanding is, in case of any error, mmap handlers will return the
> err to user process and user space will decide the next action. So next
> time when mmap handler is getting invoked it will map from the beginning.
> Correct me if I am wrong.

The opposite function, I suppose, is unmap_region().

> > s/no./number/
> 
> I didn't get it ??

This is a 'sed' expression.  's' is the 'substitute' command; the /
is a separator, 'no.' is what you wrote, and 'number' is what Randy
is recommending instead.

> > > + for (i = 0; i < page_count; i++) {
> > > + ret = vm_insert_page(vma, uaddr, pages[i]);
> > > + if (ret < 0)
> > > + return ret;
> >
> > For a non-trivial value of page_count:
> > Is it a problem if vm_insert_page() succeeds for several pages
> > and then fails?
> 
> No, it will be considered as total failure and mmap handler will return
> the err to user space.

I think what Randy means is "What happens to the inserted pages?" and
the answer is that mmap_region() jumps to the 'unmap_and_free_vma'
label, which is an accurate name.

Thanks for looking, Randy.


Re: [PATCH v4 0/9] mpt3sas and dmapool scalability

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:40:57AM -0500, Tony Battersby wrote:
> I posted v3 on August 7.  Nobody acked or merged the patches, and then
> I got too busy with other stuff to repost until now.

Thanks for resending.  They were in my pile of things to look at, but
that's an ever-growing pile.

> I believe these patches are ready for merging.

I agree.

> cat /sys/devices/pci:80/:80:07.0/:85:00.0/pools
> (manually cleaned up column alignment)
> poolinfo - 0.1
> reply_post_free_array pool  1  21 192 1
> reply_free pool 1  1  41728   1
> reply pool  1  1  1335296 1
> sense pool  1  1  970272  1
> chain pool  373959 386048 128 12064
> reply_post_free pool12 12 166528  12

That reply pool ... 1 object of 1.3MB?  That's a lot of strain to put
on the page allocator.  I wonder if anything can be done about that.

(I'm equally non-thrilled about the sense pool, the reply_post_free pool
and the reply_free pool, but they seem a little less stressful than the
reply pool)


Re: [PATCH v4 9/9] dmapool: debug: prevent endless loop in case of corruption

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:46:35AM -0500, Tony Battersby wrote:
> Prevent a possible endless loop with DMAPOOL_DEBUG enabled if a buggy
> driver corrupts DMA pool memory.
> 
> Signed-off-by: Tony Battersby 

I like it!  Also, here you're using blks_per_alloc in a way which isn't
normally in the performance path, but might be with the right config
options.  With that, I withdraw my objection to the previous patch and

Acked-by: Matthew Wilcox 

Andrew, can you funnel these in through your tree?  If you'd rather not,
I don't mind stuffing them into a git tree and asking Linus to pull
for 4.21.


Re: [PATCH v4 8/9] dmapool: improve accuracy of debug statistics

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:45:58AM -0500, Tony Battersby wrote:
> +++ linux/mm/dmapool.c	2018-08-06 17:52:53.000000000 -0400
> @@ -61,6 +61,7 @@ struct dma_pool {   /* the pool */
>   struct device *dev;
>   unsigned int allocation;
>   unsigned int boundary;
> + unsigned int blks_per_alloc;
>   char name[32];
>   struct list_head pools;
>  };

This one I'm not totally happy with.  You're storing this value when
it could be easily calculated each time through the show_pools() code.
I appreciate this is a topic where reasonable people might have different
opinions about which solution is preferable.

> @@ -182,6 +182,9 @@ struct dma_pool *dma_pool_create(const c
>   retval->size = size;
>   retval->boundary = boundary;
>   retval->allocation = allocation;
> + retval->blks_per_alloc =
> + (allocation / boundary) * (boundary / size) +
> + (allocation % boundary) / size;
>  
>   INIT_LIST_HEAD(>pools);
>  
> 
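
To make that concrete: with, say, allocation = 4096, boundary = 2048 and
size = 192 (made-up numbers), blks_per_alloc works out to
(4096 / 2048) * (2048 / 192) + (4096 % 2048) / 192 = 2 * 10 + 0 = 20 blocks
per underlying allocation.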


Re: [PATCH v4 7/9] dmapool: cleanup integer types

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:45:21AM -0500, Tony Battersby wrote:
> To represent the size of a single allocation, dmapool currently uses
> 'unsigned int' in some places and 'size_t' in other places.  Standardize
> on 'unsigned int' to reduce overhead, but use 'size_t' when counting all
> the blocks in the entire pool.
> 
> Signed-off-by: Tony Battersby 

Acked-by: Matthew Wilcox 


Re: [PATCH v4 6/9] dmapool: improve scalability of dma_pool_free

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:44:40AM -0500, Tony Battersby wrote:
> dma_pool_free() scales poorly when the pool contains many pages because
> pool_find_page() does a linear scan of all allocated pages.  Improve its
> scalability by replacing the linear scan with virt_to_page() and storing
> dmapool private data directly in 'struct page', thereby eliminating
> 'struct dma_page'.  In big O notation, this improves the algorithm from
> O(n^2) to O(n) while also reducing memory usage.
> 
> Thanks to Matthew Wilcox for the suggestion to use struct page.
> 
> Signed-off-by: Tony Battersby 

Acked-by: Matthew Wilcox 


Re: [PATCH v4 5/9] dmapool: rename fields in dma_page

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:44:02AM -0500, Tony Battersby wrote:
> Rename fields in 'struct dma_page' in preparation for moving them into
> 'struct page'.  No functional changes.
> 
> in_use -> dma_in_use
> offset -> dma_free_off
> 
> Signed-off-by: Tony Battersby 

Acked-by: Matthew Wilcox 


Re: [PATCH v4 4/9] dmapool: improve scalability of dma_pool_alloc

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:43:25AM -0500, Tony Battersby wrote:
> dma_pool_alloc() scales poorly when allocating a large number of pages
> because it does a linear scan of all previously-allocated pages before
> allocating a new one.  Improve its scalability by maintaining a separate
> list of pages that have free blocks ready to (re)allocate.  In big O
> notation, this improves the algorithm from O(n^2) to O(n).
> 
> Signed-off-by: Tony Battersby 

Acked-by: Matthew Wilcox 


Re: [PATCH v4 3/9] dmapool: cleanup dma_pool_destroy

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:42:48AM -0500, Tony Battersby wrote:
> Remove a small amount of code duplication between dma_pool_destroy() and
> pool_free_page() in preparation for adding more code without having to
> duplicate it.  No functional changes.
> 
> Signed-off-by: Tony Battersby 

Acked-by: Matthew Wilcox 


Re: [PATCH v4 2/9] dmapool: remove checks for dev == NULL

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:42:12AM -0500, Tony Battersby wrote:
> dmapool originally tried to support pools without a device because
> dma_alloc_coherent() supports allocations without a device.  But nobody
> ended up using dma pools without a device, so the current checks in
> dmapool.c for pool->dev == NULL are both insufficient and causing bloat.
> Remove them.
> 
> Signed-off-by: Tony Battersby 

Acked-by: Matthew Wilcox 


Re: [PATCH v4 1/9] dmapool: fix boundary comparison

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 10:41:34AM -0500, Tony Battersby wrote:
> Fixes: e34f44b3517f ("pool: Improve memory usage for devices which can't cross boundaries")
> Signed-off-by: Tony Battersby 

Acked-by: Matthew Wilcox 


Re: [PATCH v2 2/9] dmapool: cleanup error messages

2018-08-03 Thread Matthew Wilcox
On Fri, Aug 03, 2018 at 02:43:07PM -0400, Tony Battersby wrote:
> Out of curiosity, I just tried to create a dmapool with a NULL dev and
> it crashed on this:
> 
> static inline int dev_to_node(struct device *dev)
> {
>   return dev->numa_node;
> }
> 
> struct dma_pool *dma_pool_create(const char *name, struct device *dev,
>size_t size, size_t align, size_t boundary)
> {
>   ...
>   retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, dev_to_node(dev));
>   ...
> }
> 
> So either it needs more special cases for supporting a NULL dev, or the
> special cases can be removed since no one does that anyway.

Actually, it's worse.  dev_to_node() works with a NULL dev ... unless
CONFIG_NUMA is set.  So we're leaving a timebomb by pretending to
allow it.  Let's just 'if (!dev) return NULL;' early in create.



Re: [PATCH v2 6/9] dmapool: improve scalability of dma_pool_free

2018-08-03 Thread Matthew Wilcox
On Fri, Aug 03, 2018 at 04:05:35PM -0400, Tony Battersby wrote:
> For v3 of the patchset, I was also considering to add a note to the
> kernel-doc comments for dma_pool_create() to use dma_alloc_coherent()
> directly instead of a dma pool if the driver intends to allow userspace
> to mmap() the returned pages, due to the new use of the _mapcount union
> in struct page.  Would you consider that useful information or pointless
> trivia?

If userspace is going to map the pages, it's going to expose other things
to userspace than the dma pages.  I'd suggest they not do this; they
should do their own sub-allocation which only exposes to an individual
task the data they're sure is OK for each task to see.



Re: [PATCH v2 2/9] dmapool: cleanup error messages

2018-08-03 Thread Matthew Wilcox
On Fri, Aug 03, 2018 at 06:59:20PM +0300, Andy Shevchenko wrote:
> >>> I'm pretty sure this was created in order to avoid the bad-looking (and
> >>> in some cases frightening) "NULL device *" part.
> 
> JFYI: git log --no-merges --grep 'NULL device \*'

I think those commits actually argue in favour of Tony's patch to remove
the special casing.  Is it really useful to create dma pools with a NULL
device?



Re: [PATCH v2 8/9] dmapool: reduce footprint in struct page

2018-08-02 Thread Matthew Wilcox
On Thu, Aug 02, 2018 at 04:01:12PM -0400, Tony Battersby wrote:
> This is my attempt to shrink 'dma_free_o' and 'dma_in_use' in 'struct
> page' (originally 'offset' and 'in_use' in 'struct dma_page') to 16-bit
> so that it is unnecessary to use the '_mapcount' field of 'struct
> page'.  However, it adds complexity and makes allocating and freeing up
> to 20% slower for little gain, so I am NOT recommending that it be
> merged at this time.  I am posting it just for reference in case someone
> finds it useful in the future.

I spy some interesting pieces in here that I'd love you to submit as
patches for merging.

> One of the nice things about this is that dma_pool_free() can do some
> additional sanity checks:
> *) Check that the offset of the passed-in address corresponds to a valid
> block offset.

Can't we do that already?  Subtract the base address of the page from
the passed-in vaddr and check it's a multiple of pool->size?
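
Something like this, I suppose (a sketch that ignores multi-page
allocations):

	void *base = page_address(virt_to_page(vaddr));

	if ((vaddr - base) % pool->size) {
		dev_err(pool->dev, "%s: invalid vaddr %p\n", __func__, vaddr);
		return;
	}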

>  struct dma_pool {/* the pool */
>  #define POOL_FULL_IDX   0
>  #define POOL_AVAIL_IDX  1
>  #define POOL_N_LISTS2
>   struct list_head page_list[POOL_N_LISTS];
>   spinlock_t lock;
> - size_t size;
>   struct device *dev;
> - size_t allocation;
> - size_t boundary;
> + unsigned int size;
> + unsigned int allocation;
> + unsigned int boundary_shift;
> + unsigned int blks_per_boundary;
> + unsigned int blks_per_alloc;

s/size_t/unsigned int/ is a good saving on 64-bit systems.  We recently
did something similar for slab/slub.

> @@ -141,6 +150,7 @@ static DEVICE_ATTR(pools, 0444, show_pool
>  struct dma_pool *dma_pool_create(const char *name, struct device *dev,
>size_t size, size_t align, size_t boundary)
>  {

We should change the API here too.



Re: [PATCH v2 5/9] dmapool: rename fields in dma_page

2018-08-02 Thread Matthew Wilcox
On Thu, Aug 02, 2018 at 03:59:15PM -0400, Tony Battersby wrote:
> Rename fields in 'struct dma_page' in preparation for moving them into
> 'struct page'.  No functional changes.
> 
> in_use -> dma_in_use
> offset -> dma_free_o

I don't like dma_free_o.  dma_free_off is OK by me.


Re: [PATCH v2 4/9] dmapool: improve scalability of dma_pool_alloc

2018-08-02 Thread Matthew Wilcox
On Thu, Aug 02, 2018 at 03:58:40PM -0400, Tony Battersby wrote:
> @@ -339,11 +360,16 @@ void *dma_pool_alloc(struct dma_pool *po
>  
>  	spin_lock_irqsave(&pool->lock, flags);
>  
> -	list_add(&page->page_list, &pool->page_list);
> +	list_add(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);
>   ready:
>  	page->in_use++;
>  	offset = page->offset;
>  	page->offset = *(int *)(page->vaddr + offset);
> +	if (page->offset >= pool->allocation) {
> +		/* Move page from the "available" list to the "full" list. */
> +		list_del(&page->dma_list);
> +		list_add(&page->dma_list, &pool->page_list[POOL_FULL_IDX]);

I think this should be:

	list_move_tail(&page->dma_list, &pool->page_list[POOL_FULL_IDX]);

> @@ -444,6 +476,11 @@ void dma_pool_free(struct dma_pool *pool
>  #endif
>  
>   page->in_use--;
> + if (page->offset >= pool->allocation) {
> + /* Move page from the "full" list to the "available" list. */
> +		list_del(&page->dma_list);
> +		list_add(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);

This one probably wants to be
list_move(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);

so that it's first-in-line to be allocated from for cache warmth purposes.


Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

2018-07-27 Thread Matthew Wilcox
On Fri, Jul 27, 2018 at 09:23:30AM -0400, Tony Battersby wrote:
> On 07/26/2018 08:07 PM, Matthew Wilcox wrote:
> > If you're up for more major surgery, then I think we can put all the
> > information currently stored in dma_page into struct page.  Something
> > like this:
> >
> > +++ b/include/linux/mm_types.h
> > @@ -152,6 +152,12 @@ struct page {
> > unsigned long hmm_data;
> > unsigned long _zd_pad_1;/* uses mapping */
> > };
> > +   struct {/* dma_pool pages */
> > +   struct list_head dma_list;
> > +   unsigned short in_use;
> > +   unsigned short offset;
> > +   dma_addr_t dma;
> > +   };
> >  
> > /** @rcu_head: You can use this to free a page by RCU. */
> > struct rcu_head rcu_head;
> >
> > page_list -> dma_list
> > vaddr goes away (page_to_virt() exists)
> > dma -> dma
> > in_use and offset shrink from 4 bytes to 2.
> >
> > Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
> > this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
> > 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
> >
> >
> offset at least needs more bits, since allocations can be multi-page. 

Ah, rats.  That means we have to use the mapcount union too:

+++ b/include/linux/mm_types.h
@@ -152,6 +152,11 @@ struct page {
unsigned long hmm_data;
unsigned long _zd_pad_1;/* uses mapping */
};
+   struct {/* dma_pool pages */
+   struct list_head dma_list;
+   unsigned int dma_in_use;
+   dma_addr_t dma;
+   };
 
/** @rcu_head: You can use this to free a page by RCU. */
struct rcu_head rcu_head;
@@ -174,6 +179,7 @@ struct page {
 
unsigned int active;/* SLAB */
int units;  /* SLOB */
+   unsigned int dma_offset;/* dma_pool */
};
 
/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */


> See the following from mpt3sas:
> 
> cat /sys/devices/pci:80/:80:07.0/:85:00.0/pools
> (manually cleaned up column alignment)
> poolinfo - 0.1
> reply_post_free_array pool  1  21 192 1
> reply_free pool 1  1  41728   1
> reply pool  1  1  1335296 1
> sense pool  1  1  970272  1
> chain pool  373959 386048 128 12064
> reply_post_free pool12 12 166528  12
>   ^size^

Wow, that's a pretty weird way to use the dmapool.  It'd be more efficient
to just call dma_alloc_coherent() directly.



Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

2018-07-26 Thread Matthew Wilcox
On Thu, Jul 26, 2018 at 04:06:05PM -0400, Tony Battersby wrote:
> On 07/26/2018 03:42 PM, Matthew Wilcox wrote:
> > On Thu, Jul 26, 2018 at 02:54:56PM -0400, Tony Battersby wrote:
> >> dma_pool_free() scales poorly when the pool contains many pages because
> >> pool_find_page() does a linear scan of all allocated pages.  Improve its
> >> scalability by replacing the linear scan with a red-black tree lookup. 
> >> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).
> > This is a lot of code to get us to O(n * log(n)) when we can use less
> > code to go to O(n).  dma_pool_free() is passed the virtual address.
> > We can go from virtual address to struct page with virt_to_page().
> > In struct page, we have 5 words available (20/40 bytes), and it's trivial
> > to use one of them to point to the struct dma_page.
> >
> Thanks for the tip.  I will give that a try.

If you're up for more major surgery, then I think we can put all the
information currently stored in dma_page into struct page.  Something
like this:

+++ b/include/linux/mm_types.h
@@ -152,6 +152,12 @@ struct page {
unsigned long hmm_data;
unsigned long _zd_pad_1;/* uses mapping */
};
+   struct {/* dma_pool pages */
+   struct list_head dma_list;
+   unsigned short in_use;
+   unsigned short offset;
+   dma_addr_t dma;
+   };
 
/** @rcu_head: You can use this to free a page by RCU. */
struct rcu_head rcu_head;

page_list -> dma_list
vaddr goes away (page_to_virt() exists)
dma -> dma
in_use and offset shrink from 4 bytes to 2.

Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).



Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

2018-07-26 Thread Matthew Wilcox
On Thu, Jul 26, 2018 at 02:54:56PM -0400, Tony Battersby wrote:
> dma_pool_free() scales poorly when the pool contains many pages because
> pool_find_page() does a linear scan of all allocated pages.  Improve its
> scalability by replacing the linear scan with a red-black tree lookup. 
> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).

This is a lot of code to get us to O(n * log(n)) when we can use less
code to go to O(n).  dma_pool_free() is passed the virtual address.
We can go from virtual address to struct page with virt_to_page().
In struct page, we have 5 words available (20/40 bytes), and it's trivial
to use one of them to point to the struct dma_page.
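
A strawman of what I mean (the 'dma_page' pointer is a made-up field name
for whichever word of struct page we repurpose, and I'm ignoring multi-page
allocations for the moment):

	/* dma_pool_alloc(), once we know which dma_page the block came from: */
	virt_to_page(retval)->dma_page = page;

	/* dma_pool_free(), replacing pool_find_page(): */
	struct dma_page *page = virt_to_page(vaddr)->dma_page;

That turns the lookup into O(1) with no tree to maintain.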


Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-25 Thread Matthew Wilcox
On Thu, May 24, 2018 at 05:29:43PM +0200, Michal Hocko wrote:
> > ie if we had more,
> > could we solve our pain by making them more generic?
> 
> Well, if you have more you will consume more bits in the struct pages,
> right?

Not necessarily ... the zone number is stored in the struct page
currently, so either two or three bits are used right now.  In my
proposal, one can infer the zone of a page from its PFN, except for
ZONE_MOVABLE.  So we could trim down to just one bit per struct page
for 32-bit machines while using 3 bits on 64-bit machines, where there
is plenty of space.
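
To make that concrete, the lookup could be something along these lines
(illustrative sketch only, not a worked-out patch; 'pfn_zone' is a made-up
name):

	static enum zone_type pfn_zone(unsigned long pfn, bool movable)
	{
		struct pglist_data *pgdat = NODE_DATA(pfn_to_nid(pfn));
		enum zone_type z;

		/* the single bit we'd still keep in struct page */
		if (movable)
			return ZONE_MOVABLE;

		for (z = 0; z < MAX_NR_ZONES; z++)
			if (zone_spans_pfn(&pgdat->node_zones[z], pfn))
				return z;

		return ZONE_NORMAL;
	}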

> > it more-or-less sucks that the devices with 28-bit DMA limits are forced
> > to allocate from the low 16MB when they're perfectly capable of using the
> > low 256MB.
> 
> Do we actually care all that much about those? If yes then we should
> probably follow the ZONE_DMA (x86) path and use a CMA region for them.
> I mean most devices should be good with very limited addressability or
> below 4G, no?

Sure.  One other thing I meant to mention was the media devices
(TV capture cards and so on) which want a vmalloc_32() allocation.
On 32-bit machines right now, we allocate from LOWMEM, when we really
should be allocating from the 1GB-4GB region.  32-bit machines generally
don't have a ZONE_DMA32 today.


Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-24 Thread Matthew Wilcox
On Thu, May 24, 2018 at 02:23:23PM +0200, Michal Hocko wrote:
> > If we had eight ZONEs, we could offer:
> 
> No, please no more zones. What we have is quite a maint. burden on its
> own. Ideally we should only have lowmem, highmem and special/device
> zones for directly kernel accessible memory, the one that the kernel
> cannot or must not use and completely special memory managed out of
> the page allocator. All the remaining constrains should better be
> implemented on top.

I believe you when you say that they're a maintenance pain.  Is that
maintenance pain because they're so specialised?  ie if we had more,
could we solve our pain by making them more generic?

> > ZONE_16M// 24 bit
> > ZONE_256M   // 28 bit
> > ZONE_LOWMEM // CONFIG_32BIT only
> > ZONE_4G // 32 bit
> > ZONE_64G// 36 bit
> > ZONE_1T // 40 bit
> > ZONE_ALL// everything larger
> > ZONE_MOVABLE// movable allocations; no physical address guarantees
> > 
> > #ifdef CONFIG_64BIT
> > #define ZONE_NORMAL ZONE_ALL
> > #else
> > #define ZONE_NORMAL ZONE_LOWMEM
> > #endif
> > 
> > This would cover most driver DMA mask allocations; we could tweak the
> > offered zones based on analysis of what people need.
> 
> But those already do have a proper API, IIUC. So do we really need to
> make our GFP_*/Zone API more complicated than it already is?

I don't want to change the driver API (setting the DMA mask, etc),
but we don't actually have a good API to the page allocator for the
implementation of dma_alloc_foo() to request pages.  More or less,
architectures do:

if (mask < 4GB)
alloc_page(GFP_DMA)
else if (mask < 64EB)
alloc_page(GFP_DMA32)
else
alloc_page(GFP_HIGHMEM)

it more-or-less sucks that the devices with 28-bit DMA limits are forced
to allocate from the low 16MB when they're perfectly capable of using the
low 256MB.  Sure, my proposal doesn't help 27 or 26 bit DMA mask devices,
but those are pretty rare.

I'm sure you don't need reminding what a mess vmalloc_32 is, and the
implementation of saa7146_vmalloc_build_pgtable() just hurts.

> > #define GFP_HIGHUSER(GFP_USER | ZONE_ALL)
> > #define GFP_HIGHUSER_MOVABLE(GFP_USER | ZONE_MOVABLE)
> > 
> > One other thing I want to see is that fallback from zones happens from
> > highest to lowest normally (ie if you fail to allocate in 1T, then you
> > try to allocate from 64G), but movable allocations happen from lowest
> > to highest.  So ZONE_16M ends up full of page cache pages which are
> > readily evictable for the rare occasions when we need to allocate memory
> > below 16MB.
> > 
> > I'm sure there are lots of good reasons why this won't work, which is
> > why I've been hesitant to propose it before now.
> 
> I am worried you are playing with a can of worms...

Yes.  Me too.


Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-23 Thread Matthew Wilcox
On Tue, May 22, 2018 at 08:37:28PM +0200, Michal Hocko wrote:
> So why is this any better than the current code. Sure I am not a great
> fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> doesn't look too much better, yet we are losing a check for incompatible
> gfp flags. The diffstat looks really sound but then you just look and
> see that the large part is the comment that at least explained the gfp
> zone modifiers somehow and the debugging code. So what is the selling
> point?

I have a plan, but it's not exactly fully-formed yet.

One of the big problems we have today is that we have a lot of users
who have constraints on the physical memory they want to allocate,
but we have very limited abilities to provide them with what they're
asking for.  The various different ZONEs have different meanings on
different architectures and are generally a mess.

If we had eight ZONEs, we could offer:

ZONE_16M// 24 bit
ZONE_256M   // 28 bit
ZONE_LOWMEM // CONFIG_32BIT only
ZONE_4G // 32 bit
ZONE_64G// 36 bit
ZONE_1T // 40 bit
ZONE_ALL// everything larger
ZONE_MOVABLE// movable allocations; no physical address guarantees

#ifdef CONFIG_64BIT
#define ZONE_NORMAL ZONE_ALL
#else
#define ZONE_NORMAL ZONE_LOWMEM
#endif

This would cover most driver DMA mask allocations; we could tweak the
offered zones based on analysis of what people need.

#define GFP_HIGHUSER(GFP_USER | ZONE_ALL)
#define GFP_HIGHUSER_MOVABLE(GFP_USER | ZONE_MOVABLE)

One other thing I want to see is that fallback from zones happens from
highest to lowest normally (ie if you fail to allocate in 1T, then you
try to allocate from 64G), but movable allocations happen from lowest
to highest.  So ZONE_16M ends up full of page cache pages which are
readily evictable for the rare occasions when we need to allocate memory
below 16MB.

I'm sure there are lots of good reasons why this won't work, which is
why I've been hesitant to propose it before now.


Re: [RFC PATCH v2 10/12] mm/zsmalloc: update usage of address zone modifiers

2018-05-22 Thread Matthew Wilcox
On Mon, May 21, 2018 at 11:20:31PM +0800, Huaisheng Ye wrote:
> @@ -343,7 +343,7 @@ static void destroy_cache(struct zs_pool *pool)
>  static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
>  {
>   return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
> - gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
> + gfp & ~__GFP_ZONE_MOVABLE);
>  }

This should be & ~GFP_ZONEMASK

Actually, we should probably have a function to clear those bits rather
than have every driver manipulating the gfp mask like this.  Maybe

#define gfp_normal(gfp) ((gfp) & ~GFP_ZONEMASK)

return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-   gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
+   gfp_normal(gfp));
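
Or as a static inline, if we prefer typechecking over a #define (just a
sketch; the point either way is to have one obvious helper instead of
open-coded masks in every driver):

	static inline gfp_t gfp_normal(gfp_t gfp)
	{
		return gfp & ~GFP_ZONEMASK;
	}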



Re: [PATCH v6 00/99] XArray version 6

2018-01-18 Thread Matthew Wilcox
On Thu, Jan 18, 2018 at 05:56:12PM +0100, David Sterba wrote:
> On Thu, Jan 18, 2018 at 08:48:43AM -0800, Matthew Wilcox wrote:
> > Thank you!  I shall attempt to debug.  Was this with a btrfs root
> > filesystem?  I'm most suspicious of those patches right now, since they've
> > received next to no testing.  I'm going to put together a smaller patchset
> > which just does the page cache conversion and nothing else in the hope
> > that we can get that merged this year.
> 
> No, the root is ext3 and there was no btrfs filesytem mounted at the
> time.

Found it; I was missing a prerequisite patch.  New (smaller) patch series
coming soon.


Re: [PATCH v6 00/99] XArray version 6

2018-01-18 Thread Matthew Wilcox
On Thu, Jan 18, 2018 at 05:07:50PM +0100, David Sterba wrote:
> On Wed, Jan 17, 2018 at 12:20:24PM -0800, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawil...@microsoft.com>
> > 
> > This version of the XArray has no known bugs.
> 
> I've booted this patchset on 2 boxes, both had random problems during
> boot. On one I was not able to diagnose what went wrong. On the other
> one the system booted up to userspace and failed to set up networking.
> Serial console worked and the network service complained about wrong
> format of /usr/share/wicked/schema/team.xml . That's supposed to be a
> text file, though hexdump showed me lots of zeros. Trimmed output:
> 
>   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> *
> (similar output here)
> *
> 0a10  00 00 00 00 00 00 00 00  11 03 00 00 00 00 00 00  ||
> 0a20  20 8b 7f 01 00 00 00 00  a0 84 7d 01 00 00 00 00  | .}.|
> 0a30  00 00 00 00 00 00 00 00  10 89 7f 01 00 00 00 00  ||
> 0a40  a0 84 7d 01 00 00 00 00  00 00 00 00 00 00 00 00  |..}.|
> 0a50  80 8a 7f 01 00 00 00 00  e0 cf 7d 01 00 00 00 00  |..}.|
> 0a60  00 00 00 00 00 00 00 00  60 8a 7f 01 00 00 00 00  |`...|
> 0a70  a0 84 7d 01 00 00 00 00  00 00 00 00 00 00 00 00  |..}.|
> 0a80  30 89 7f 01 00 00 00 00  a0 84 7d 01 00 00 00 00  |0.}.|
> 0a90  00 00 00 00 00 00 00 00  60 f2 7f 01 00 00 00 00  |`...|
> 0aa0  40 fd 7e 01 00 00 00 00  00 00 00 00 00 00 00 00  |@.~.|
> 0ab0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> *
> 1000  3e 0a 20 20 3c 2f 6d 65  74 68 6f 64 3e 0a 3c 2f  |>.  </method>.</|
> 1010  73 65 72 76 69 63 65 3e  0a                       |service>.|
> 
> There's something at the end of the file that does look like an xml fragment.
> The file size is 4121. This looks to me like exactly the first page of the 
> file
> was not read correctly.
> 
> The xml file is supposed to be read-only during startup, so there was no write
> in flight. 'rpm -Vv' reported only this file corrupted. Booting to other
> kernels was fine, network up, and the file was ok again. So the
> corruption happened only in memory, which leads me to conclusion that
> there is an unknown bug in your patchset.

Thank you!  I shall attempt to debug.  Was this with a btrfs root
filesystem?  I'm most suspicious of those patches right now, since they've
received next to no testing.  I'm going to put together a smaller patchset
which just does the page cache conversion and nothing else in the hope
that we can get that merged this year.


[PATCH v6 56/99] lustre: Convert to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 drivers/staging/lustre/lustre/llite/glimpse.c   | 12 +---
 drivers/staging/lustre/lustre/mdc/mdc_request.c | 16 
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c 
b/drivers/staging/lustre/lustre/llite/glimpse.c
index 5f2843da911c..25232fdf5797 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -57,7 +57,7 @@ static const struct cl_lock_descr whole_file = {
 };
 
 /*
- * Check whether file has possible unwriten pages.
+ * Check whether file has possible unwritten pages.
  *
  * \retval 1file is mmap-ed or has dirty pages
  *  0otherwise
@@ -66,16 +66,14 @@ blkcnt_t dirty_cnt(struct inode *inode)
 {
blkcnt_t cnt = 0;
struct vvp_object *vob = cl_inode2vvp(inode);
-   void  *results[1];
 
-   if (inode->i_mapping)
-   cnt += radix_tree_gang_lookup_tag(>i_mapping->pages,
- results, 0, 1,
- PAGECACHE_TAG_DIRTY);
+   if (inode->i_mapping && xa_tagged(>i_mapping->pages,
+   PAGECACHE_TAG_DIRTY))
+   cnt = 1;
if (cnt == 0 && atomic_read(>vob_mmap_cnt) > 0)
cnt = 1;
 
-   return (cnt > 0) ? 1 : 0;
+   return cnt;
 }
 
 int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c 
b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 2ec79a6b17da..ea23247e9e02 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -934,17 +934,18 @@ static struct page *mdc_page_locate(struct address_space 
*mapping, __u64 *hash,
 * hash _smaller_ than one we are looking for.
 */
unsigned long offset = hash_x_index(*hash, hash64);
+   XA_STATE(xas, >pages, offset);
struct page *page;
-   int found;
 
-   xa_lock_irq(>pages);
-   found = radix_tree_gang_lookup(>pages,
-  (void **), offset, 1);
-   if (found > 0 && !xa_is_value(page)) {
+   xas_lock_irq();
+   page = xas_find(, ULONG_MAX);
+   if (xa_is_value(page))
+   page = NULL;
+   if (page) {
struct lu_dirpage *dp;
 
get_page(page);
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
/*
 * In contrast to find_lock_page() we are sure that directory
 * page cannot be truncated (while DLM lock is held) and,
@@ -992,8 +993,7 @@ static struct page *mdc_page_locate(struct address_space 
*mapping, __u64 *hash,
page = ERR_PTR(-EIO);
}
} else {
-   xa_unlock_irq(>pages);
-   page = NULL;
+   xas_unlock_irq();
}
return page;
 }
-- 
2.15.1



[PATCH v6 89/99] btrfs: Convert buffer_radix to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Eliminate the buffer_lock as the internal xa_lock provides all the
necessary protection.  We can remove the radix_tree_preload calls, but
I can't find a good way to use the 'exists' result from xa_cmpxchg().
We could resort to the advanced API to improve this, but it's a really
unlikely case (nothing in the xarray when we first look; something there
when we try to add the newly-allocated extent buffer), so I think it's
not worth optimising for.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/btrfs/ctree.h |  5 ++-
 fs/btrfs/disk-io.c   |  3 +-
 fs/btrfs/extent_io.c | 82 ++--
 fs/btrfs/tests/btrfs-tests.c | 26 +++---
 4 files changed, 40 insertions(+), 76 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 272d099bed7e..87984ce3a4c2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1058,9 +1058,8 @@ struct btrfs_fs_info {
/* readahead works cnt */
atomic_t reada_works_cnt;
 
-   /* Extent buffer radix tree */
-   spinlock_t buffer_lock;
-   struct radix_tree_root buffer_radix;
+   /* Extent buffer array */
+   struct xarray buffer_array;
 
/* next backup root to be overwritten */
int backup_root_index;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1eae29045d43..650d1350b64d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2429,7 +2429,7 @@ int open_ctree(struct super_block *sb,
}
 
xa_init(_info->fs_roots);
-   INIT_RADIX_TREE(_info->buffer_radix, GFP_ATOMIC);
+   xa_init(_info->buffer_array);
INIT_LIST_HEAD(_info->trans_list);
INIT_LIST_HEAD(_info->dead_roots);
INIT_LIST_HEAD(_info->delayed_iputs);
@@ -2442,7 +2442,6 @@ int open_ctree(struct super_block *sb,
spin_lock_init(_info->tree_mod_seq_lock);
spin_lock_init(_info->super_lock);
spin_lock_init(_info->qgroup_op_lock);
-   spin_lock_init(_info->buffer_lock);
spin_lock_init(_info->unused_bgs_lock);
rwlock_init(_info->tree_mod_log_lock);
mutex_init(_info->unused_bg_unpin_mutex);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fd5e9d887328..2b43fa11c9e2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4884,8 +4884,7 @@ struct extent_buffer *find_extent_buffer(struct 
btrfs_fs_info *fs_info,
struct extent_buffer *eb;
 
rcu_read_lock();
-   eb = radix_tree_lookup(_info->buffer_radix,
-  start >> PAGE_SHIFT);
+   eb = xa_load(_info->buffer_array, start >> PAGE_SHIFT);
if (eb && atomic_inc_not_zero(>refs)) {
rcu_read_unlock();
/*
@@ -4919,31 +4918,24 @@ struct extent_buffer *find_extent_buffer(struct 
btrfs_fs_info *fs_info,
 struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
u64 start)
 {
-   struct extent_buffer *eb, *exists = NULL;
-   int ret;
+   struct extent_buffer *exists, *eb = NULL;
 
-   eb = find_extent_buffer(fs_info, start);
-   if (eb)
-   return eb;
-   eb = alloc_dummy_extent_buffer(fs_info, start);
-   if (!eb)
-   return NULL;
-   eb->fs_info = fs_info;
 again:
-   ret = radix_tree_preload(GFP_NOFS);
-   if (ret)
+   exists = find_extent_buffer(fs_info, start);
+   if (exists)
goto free_eb;
-   spin_lock(_info->buffer_lock);
-   ret = radix_tree_insert(_info->buffer_radix,
-   start >> PAGE_SHIFT, eb);
-   spin_unlock(_info->buffer_lock);
-   radix_tree_preload_end();
-   if (ret == -EEXIST) {
-   exists = find_extent_buffer(fs_info, start);
-   if (exists)
+   if (!eb)
+   eb = alloc_dummy_extent_buffer(fs_info, start);
+   if (!eb)
+   return NULL;
+   exists = xa_cmpxchg(_info->buffer_array, start >> PAGE_SHIFT,
+   NULL, eb, GFP_NOFS);
+   if (unlikely(exists)) {
+   if (xa_is_err(exists)) {
+   exists = NULL;
goto free_eb;
-   else
-   goto again;
+   }
+   goto again;
}
check_buffer_tree_ref(eb);
set_bit(EXTENT_BUFFER_IN_TREE, >bflags);
@@ -4957,7 +4949,8 @@ struct extent_buffer *alloc_test_extent_buffer(struct 
btrfs_fs_info *fs_info,
atomic_inc(>refs);
return eb;
 free_eb:
-   btrfs_release_extent_buffer(eb);
+   if (eb)
+   btrfs_release_extent_buffer(eb);
return exists;
 }
 #endif
@@ -4969,22 +4962,24 @@ struct extent_buffer *alloc_extent_buffer(struct 
btrfs_fs_info *fs_info,
uns

[PATCH v6 91/99] btrfs: Convert name_cache to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

This is a very straightforward conversion.  The handling of collisions
in the namecache could be better handled with an hlist, but that's a
patch for another day.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/btrfs/send.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 20d3300bd268..3891a8e958fa 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -23,7 +23,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -118,7 +118,7 @@ struct send_ctx {
struct list_head new_refs;
struct list_head deleted_refs;
 
-   struct radix_tree_root name_cache;
+   struct xarray name_cache;
struct list_head name_cache_list;
int name_cache_size;
 
@@ -2021,8 +2021,7 @@ static int name_cache_insert(struct send_ctx *sctx,
int ret = 0;
struct list_head *nce_head;
 
-   nce_head = radix_tree_lookup(>name_cache,
-   (unsigned long)nce->ino);
+   nce_head = xa_load(>name_cache, (unsigned long)nce->ino);
if (!nce_head) {
nce_head = kmalloc(sizeof(*nce_head), GFP_KERNEL);
if (!nce_head) {
@@ -2031,7 +2030,8 @@ static int name_cache_insert(struct send_ctx *sctx,
}
INIT_LIST_HEAD(nce_head);
 
-   ret = radix_tree_insert(>name_cache, nce->ino, nce_head);
+   ret = xa_insert(>name_cache, nce->ino, nce_head,
+   GFP_KERNEL);
if (ret < 0) {
kfree(nce_head);
kfree(nce);
@@ -2050,8 +2050,7 @@ static void name_cache_delete(struct send_ctx *sctx,
 {
struct list_head *nce_head;
 
-   nce_head = radix_tree_lookup(>name_cache,
-   (unsigned long)nce->ino);
+   nce_head = xa_load(>name_cache, (unsigned long)nce->ino);
if (!nce_head) {
btrfs_err(sctx->send_root->fs_info,
  "name_cache_delete lookup failed ino %llu cache size %d, leaking 
memory",
@@ -2066,7 +2065,7 @@ static void name_cache_delete(struct send_ctx *sctx,
 * We may not get to the final release of nce_head if the lookup fails
 */
if (nce_head && list_empty(nce_head)) {
-   radix_tree_delete(>name_cache, (unsigned long)nce->ino);
+   xa_erase(>name_cache, (unsigned long)nce->ino);
kfree(nce_head);
}
 }
@@ -2077,7 +2076,7 @@ static struct name_cache_entry *name_cache_search(struct 
send_ctx *sctx,
struct list_head *nce_head;
struct name_cache_entry *cur;
 
-   nce_head = radix_tree_lookup(>name_cache, (unsigned long)ino);
+   nce_head = xa_load(>name_cache, (unsigned long)ino);
if (!nce_head)
return NULL;
 
@@ -6526,7 +6525,7 @@ long btrfs_ioctl_send(struct file *mnt_file, struct 
btrfs_ioctl_send_args *arg)
 
INIT_LIST_HEAD(>new_refs);
INIT_LIST_HEAD(>deleted_refs);
-   INIT_RADIX_TREE(>name_cache, GFP_KERNEL);
+   xa_init(>name_cache);
INIT_LIST_HEAD(>name_cache_list);
 
sctx->flags = arg->flags;
-- 
2.15.1



[PATCH v6 57/99] dax: Convert dax_unlock_mapping_entry to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Replace slot_locked() with dax_locked() and inline unlock_slot() into
its only caller.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/dax.c | 48 
 1 file changed, 16 insertions(+), 32 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 5097a606da1a..f3463d93a6ce 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -73,6 +73,11 @@ fs_initcall(init_dax_wait_table);
 #define DAX_ZERO_PAGE  (1UL << 2)
 #define DAX_EMPTY  (1UL << 3)
 
+static bool dax_locked(void *entry)
+{
+   return xa_to_value(entry) & DAX_ENTRY_LOCK;
+}
+
 static unsigned long dax_radix_sector(void *entry)
 {
return xa_to_value(entry) >> DAX_SHIFT;
@@ -180,16 +185,6 @@ static void dax_wake_mapping_entry_waiter(struct 
address_space *mapping,
__wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, );
 }
 
-/*
- * Check whether the given slot is locked.  Must be called with xa_lock held.
- */
-static inline int slot_locked(struct address_space *mapping, void **slot)
-{
-   unsigned long entry = xa_to_value(
-   radix_tree_deref_slot_protected(slot, >pages.xa_lock));
-   return entry & DAX_ENTRY_LOCK;
-}
-
 /*
  * Mark the given slot as locked.  Must be called with xa_lock held.
  */
@@ -202,18 +197,6 @@ static inline void *lock_slot(struct address_space 
*mapping, void **slot)
return entry;
 }
 
-/*
- * Mark the given slot as unlocked.  Must be called with xa_lock held.
- */
-static inline void *unlock_slot(struct address_space *mapping, void **slot)
-{
-   unsigned long v = xa_to_value(
-   radix_tree_deref_slot_protected(slot, >pages.xa_lock));
-   void *entry = xa_mk_value(v & ~DAX_ENTRY_LOCK);
-   radix_tree_replace_slot(>pages, slot, entry);
-   return entry;
-}
-
 /*
  * Lookup entry in radix tree, wait for it to become unlocked if it is
  * a DAX entry and return it. The caller must call
@@ -237,8 +220,7 @@ static void *get_unlocked_mapping_entry(struct 
address_space *mapping,
entry = __radix_tree_lookup(>pages, index, NULL,
  );
if (!entry ||
-   WARN_ON_ONCE(!xa_is_value(entry)) ||
-   !slot_locked(mapping, slot)) {
+   WARN_ON_ONCE(!xa_is_value(entry)) || !dax_locked(entry)) {
if (slotp)
*slotp = slot;
return entry;
@@ -257,17 +239,19 @@ static void *get_unlocked_mapping_entry(struct 
address_space *mapping,
 static void dax_unlock_mapping_entry(struct address_space *mapping,
 pgoff_t index)
 {
-   void *entry, **slot;
+   XA_STATE(xas, >pages, index);
+   void *entry;
 
-   xa_lock_irq(>pages);
-   entry = __radix_tree_lookup(>pages, index, NULL, );
-   if (WARN_ON_ONCE(!entry || !xa_is_value(entry) ||
-!slot_locked(mapping, slot))) {
-   xa_unlock_irq(>pages);
+   xas_lock_irq();
+   entry = xas_load();
+   if (WARN_ON_ONCE(!entry || !xa_is_value(entry) || !dax_locked(entry))) {
+   xas_unlock_irq();
return;
}
-   unlock_slot(mapping, slot);
-   xa_unlock_irq(>pages);
+   entry = xa_mk_value(xa_to_value(entry) & ~DAX_ENTRY_LOCK);
+   xas_store(, entry);
+   /* Safe to not call xas_pause here -- we don't touch the array after */
+   xas_unlock_irq();
dax_wake_mapping_entry_waiter(mapping, index, entry, false);
 }
 
-- 
2.15.1



[PATCH v6 55/99] f2fs: Convert to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

This is a straightforward conversion.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/f2fs/data.c   |  3 +--
 fs/f2fs/dir.c|  5 +
 fs/f2fs/inline.c |  6 +-
 fs/f2fs/node.c   | 10 ++
 4 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c8f6d9806896..1f3f192f152f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2175,8 +2175,7 @@ void f2fs_set_page_dirty_nobuffers(struct page *page)
xa_lock_irqsave(>pages, flags);
WARN_ON_ONCE(!PageUptodate(page));
account_page_dirtied(page, mapping);
-   radix_tree_tag_set(>pages,
-   page_index(page), PAGECACHE_TAG_DIRTY);
+   __xa_set_tag(>pages, page_index(page), PAGECACHE_TAG_DIRTY);
xa_unlock_irqrestore(>pages, flags);
unlock_page_memcg(page);
 
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index b5515ea6bb2f..296070016ec9 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -708,7 +708,6 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, 
struct page *page,
unsigned int bit_pos;
int slots = GET_DENTRY_SLOTS(le16_to_cpu(dentry->name_len));
struct address_space *mapping = page_mapping(page);
-   unsigned long flags;
int i;
 
f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
@@ -739,10 +738,8 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, 
struct page *page,
 
if (bit_pos == NR_DENTRY_IN_BLOCK &&
!truncate_hole(dir, page->index, page->index + 1)) {
-   xa_lock_irqsave(>pages, flags);
-   radix_tree_tag_clear(>pages, page_index(page),
+   xa_clear_tag(>pages, page_index(page),
 PAGECACHE_TAG_DIRTY);
-   xa_unlock_irqrestore(>pages, flags);
 
clear_page_dirty_for_io(page);
ClearPagePrivate(page);
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 7858b8e15f33..d3c3f84beca9 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -204,7 +204,6 @@ int f2fs_write_inline_data(struct inode *inode, struct page 
*page)
void *src_addr, *dst_addr;
struct dnode_of_data dn;
struct address_space *mapping = page_mapping(page);
-   unsigned long flags;
int err;
 
set_new_dnode(, inode, NULL, NULL, 0);
@@ -226,10 +225,7 @@ int f2fs_write_inline_data(struct inode *inode, struct 
page *page)
kunmap_atomic(src_addr);
set_page_dirty(dn.inode_page);
 
-   xa_lock_irqsave(>pages, flags);
-   radix_tree_tag_clear(>pages, page_index(page),
-PAGECACHE_TAG_DIRTY);
-   xa_unlock_irqrestore(>pages, flags);
+   xa_clear_tag(>pages, page_index(page), PAGECACHE_TAG_DIRTY);
 
set_inode_flag(inode, FI_APPEND_WRITE);
set_inode_flag(inode, FI_DATA_EXIST);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 6b64a3009d55..0a6d5c2f996e 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -88,14 +88,10 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int 
type)
 static void clear_node_page_dirty(struct page *page)
 {
struct address_space *mapping = page->mapping;
-   unsigned int long flags;
 
if (PageDirty(page)) {
-   xa_lock_irqsave(>pages, flags);
-   radix_tree_tag_clear(>pages,
-   page_index(page),
+   xa_clear_tag(>pages, page_index(page),
PAGECACHE_TAG_DIRTY);
-   xa_unlock_irqrestore(>pages, flags);
 
clear_page_dirty_for_io(page);
dec_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_NODES);
@@ -1142,9 +1138,7 @@ void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid)
return;
f2fs_bug_on(sbi, check_nid_range(sbi, nid));
 
-   rcu_read_lock();
-   apage = radix_tree_lookup(_MAPPING(sbi)->pages, nid);
-   rcu_read_unlock();
+   apage = xa_load(_MAPPING(sbi)->pages, nid);
if (apage)
return;
 
-- 
2.15.1



[PATCH v6 35/99] mm: Convert __do_page_cache_readahead to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

This one is trivial.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/readahead.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index f64b31b3a84a..66bcaffd47f0 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -174,9 +174,7 @@ int __do_page_cache_readahead(struct address_space 
*mapping, struct file *filp,
if (page_offset > end_index)
break;
 
-   rcu_read_lock();
-   page = radix_tree_lookup(&mapping->pages, page_offset);
-   rcu_read_unlock();
+   page = xa_load(&mapping->pages, page_offset);
if (page && !xa_is_value(page))
continue;
 
-- 
2.15.1



[PATCH v6 37/99] mm: Convert huge_memory to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Quite a straightforward conversion.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/huge_memory.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f71dd3e7d8cd..5c275295bbd3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2379,7 +2379,7 @@ static void __split_huge_page_tail(struct page *head, int 
tail,
if (PageAnon(head) && !PageSwapCache(head)) {
page_ref_inc(page_tail);
} else {
-   /* Additional pin to radix tree */
+   /* Additional pin to page cache */
page_ref_add(page_tail, 2);
}
 
@@ -2450,13 +2450,13 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
ClearPageCompound(head);
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
-   /* Additional pin to radix tree of swap cache */
+   /* Additional pin to swap cache */
if (PageSwapCache(head))
page_ref_add(head, 2);
else
page_ref_inc(head);
} else {
-   /* Additional pin to radix tree */
+   /* Additional pin to page cache */
page_ref_add(head, 2);
xa_unlock(>mapping->pages);
}
@@ -2568,7 +2568,7 @@ bool can_split_huge_page(struct page *page, int 
*pextra_pins)
 {
int extra_pins;
 
-   /* Additional pins from radix tree */
+   /* Additional pins from page cache */
if (PageAnon(page))
extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0;
else
@@ -2664,17 +2664,14 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
spin_lock_irqsave(zone_lru_lock(page_zone(head)), flags);
 
if (mapping) {
-   void **pslot;
+   XA_STATE(xas, >pages, page_index(head));
 
-   xa_lock(>pages);
-   pslot = radix_tree_lookup_slot(>pages,
-   page_index(head));
/*
-* Check if the head page is present in radix tree.
+* Check if the head page is present in page cache.
 * We assume all tail are present too, if head is there.
 */
-   if (radix_tree_deref_slot_protected(pslot,
-   >pages.xa_lock) != head)
+   xa_lock(>pages);
+   if (xas_load() != head)
goto fail;
}
 
-- 
2.15.1



[PATCH v6 93/99] f2fs: Convert ino_root to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

I did a fairly major rewrite of __add_ino_entry(); please check carefully.
Also, we can remove ino_list unless it's important to write out orphan
inodes in the order they were orphaned.  It may also make more sense to
combine the array of inode_management structures into a single XArray
with tags, but that would be a job for someone who understands this
filesystem better than I do.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/f2fs/checkpoint.c | 85 +++-
 fs/f2fs/f2fs.h   |  3 +-
 2 files changed, 38 insertions(+), 50 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 4aa69bc1c70a..04d69679da13 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -403,33 +403,30 @@ static void __add_ino_entry(struct f2fs_sb_info *sbi, 
nid_t ino,
struct inode_management *im = >im[type];
struct ino_entry *e, *tmp;
 
-   tmp = f2fs_kmem_cache_alloc(ino_entry_slab, GFP_NOFS);
-
-   radix_tree_preload(GFP_NOFS | __GFP_NOFAIL);
-
-   spin_lock(>ino_lock);
-   e = radix_tree_lookup(>ino_root, ino);
-   if (!e) {
-   e = tmp;
-   if (unlikely(radix_tree_insert(>ino_root, ino, e)))
-   f2fs_bug_on(sbi, 1);
-
-   memset(e, 0, sizeof(struct ino_entry));
-   e->ino = ino;
-
-   list_add_tail(>list, >ino_list);
-   if (type != ORPHAN_INO)
-   im->ino_num++;
+   xa_lock(>ino_root);
+   e = xa_load(>ino_root, ino);
+   if (e)
+   goto found;
+   xa_unlock(>ino_root);
+
+   tmp = f2fs_kmem_cache_alloc(ino_entry_slab, GFP_NOFS | __GFP_ZERO);
+   xa_lock(>ino_root);
+   e = __xa_cmpxchg(>ino_root, ino, NULL, tmp,
+   GFP_NOFS | __GFP_NOFAIL);
+   if (e) {
+   kmem_cache_free(ino_entry_slab, tmp);
+   goto found;
}
+   e = tmp;
 
+   e->ino = ino;
+   list_add_tail(>list, >ino_list);
+   if (type != ORPHAN_INO)
+   im->ino_num++;
+found:
if (type == FLUSH_INO)
f2fs_set_bit(devidx, (char *)>dirty_device);
-
-   spin_unlock(>ino_lock);
-   radix_tree_preload_end();
-
-   if (e != tmp)
-   kmem_cache_free(ino_entry_slab, tmp);
+   xa_unlock(>ino_root);
 }
 
 static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
@@ -437,17 +434,14 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, 
nid_t ino, int type)
struct inode_management *im = >im[type];
struct ino_entry *e;
 
-   spin_lock(>ino_lock);
-   e = radix_tree_lookup(>ino_root, ino);
+   xa_lock(>ino_root);
+   e = __xa_erase(>ino_root, ino);
if (e) {
list_del(>list);
-   radix_tree_delete(>ino_root, ino);
im->ino_num--;
-   spin_unlock(>ino_lock);
kmem_cache_free(ino_entry_slab, e);
-   return;
}
-   spin_unlock(>ino_lock);
+   xa_unlock(>ino_root);
 }
 
 void add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
@@ -466,12 +460,8 @@ void remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, 
int type)
 bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode)
 {
struct inode_management *im = >im[mode];
-   struct ino_entry *e;
 
-   spin_lock(>ino_lock);
-   e = radix_tree_lookup(>ino_root, ino);
-   spin_unlock(>ino_lock);
-   return e ? true : false;
+   return xa_load(>ino_root, ino) ? true : false;
 }
 
 void release_ino_entry(struct f2fs_sb_info *sbi, bool all)
@@ -482,14 +472,14 @@ void release_ino_entry(struct f2fs_sb_info *sbi, bool all)
for (i = all ? ORPHAN_INO : APPEND_INO; i < MAX_INO_ENTRY; i++) {
struct inode_management *im = >im[i];
 
-   spin_lock(>ino_lock);
+   xa_lock(>ino_root);
list_for_each_entry_safe(e, tmp, >ino_list, list) {
list_del(>list);
-   radix_tree_delete(>ino_root, e->ino);
+   __xa_erase(>ino_root, e->ino);
kmem_cache_free(ino_entry_slab, e);
im->ino_num--;
}
-   spin_unlock(>ino_lock);
+   xa_unlock(>ino_root);
}
 }
 
@@ -506,11 +496,11 @@ bool is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
struct ino_entry *e;
bool is_dirty = false;
 
-   spin_lock(>ino_lock);
-   e = radix_tree_lookup(>ino_root, ino);
+   xa_lock(>ino_root);
+   e = xa_load(>ino_root, ino);
if (e && f2fs_test_bit(dev

[PATCH v6 36/99] mm: Convert page migration to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/migrate.c | 41 -
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 75d19904dd9a..7122fec9b075 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -322,7 +322,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t 
*ptep,
page = migration_entry_to_page(entry);
 
/*
-* Once radix-tree replacement of page migration started, page_count
+* Once page cache replacement of page migration started, page_count
 * *must* be zero. And, we don't want to call wait_on_page_locked()
 * against a page without get_page().
 * So, we use get_page_unless_zero(), here. Even failed, page fault
@@ -437,10 +437,10 @@ int migrate_page_move_mapping(struct address_space 
*mapping,
struct buffer_head *head, enum migrate_mode mode,
int extra_count)
 {
+   XA_STATE(xas, >pages, page_index(page));
struct zone *oldzone, *newzone;
int dirty;
int expected_count = 1 + extra_count;
-   void **pslot;
 
/*
 * Device public or private pages have an extra refcount as they are
@@ -466,21 +466,16 @@ int migrate_page_move_mapping(struct address_space 
*mapping,
oldzone = page_zone(page);
newzone = page_zone(newpage);
 
-   xa_lock_irq(>pages);
-
-   pslot = radix_tree_lookup_slot(>pages,
-   page_index(page));
+   xas_lock_irq();
 
expected_count += 1 + page_has_private(page);
-   if (page_count(page) != expected_count ||
-   radix_tree_deref_slot_protected(pslot,
-   >pages.xa_lock) != page) {
-   xa_unlock_irq(>pages);
+   if (page_count(page) != expected_count || xas_load() != page) {
+   xas_unlock_irq();
return -EAGAIN;
}
 
if (!page_ref_freeze(page, expected_count)) {
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
return -EAGAIN;
}
 
@@ -494,7 +489,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
if (mode == MIGRATE_ASYNC && head &&
!buffer_migrate_lock_buffers(head, mode)) {
page_ref_unfreeze(page, expected_count);
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
return -EAGAIN;
}
 
@@ -522,7 +517,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
SetPageDirty(newpage);
}
 
-   radix_tree_replace_slot(>pages, pslot, newpage);
+   xas_store(, newpage);
 
/*
 * Drop cache reference from old page by unfreezing
@@ -531,7 +526,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 */
page_ref_unfreeze(page, expected_count - 1);
 
-   xa_unlock(>pages);
+   xas_unlock();
/* Leave irq disabled to prevent preemption while updating stats */
 
/*
@@ -571,22 +566,18 @@ EXPORT_SYMBOL(migrate_page_move_mapping);
 int migrate_huge_page_move_mapping(struct address_space *mapping,
   struct page *newpage, struct page *page)
 {
+   XA_STATE(xas, >pages, page_index(page));
int expected_count;
-   void **pslot;
-
-   xa_lock_irq(>pages);
-
-   pslot = radix_tree_lookup_slot(>pages, page_index(page));
 
+   xas_lock_irq();
expected_count = 2 + page_has_private(page);
-   if (page_count(page) != expected_count ||
-   radix_tree_deref_slot_protected(pslot, >pages.xa_lock) 
!= page) {
-   xa_unlock_irq(>pages);
+   if (page_count(page) != expected_count || xas_load() != page) {
+   xas_unlock_irq();
return -EAGAIN;
}
 
if (!page_ref_freeze(page, expected_count)) {
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
return -EAGAIN;
}
 
@@ -595,11 +586,11 @@ int migrate_huge_page_move_mapping(struct address_space 
*mapping,
 
get_page(newpage);
 
-   radix_tree_replace_slot(>pages, pslot, newpage);
+   xas_store(, newpage);
 
page_ref_unfreeze(page, expected_count - 1);
 
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
 
return MIGRATEPAGE_SUCCESS;
 }
-- 
2.15.1



[PATCH v6 92/99] f2fs: Convert pids radix tree to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

The XArray API works out rather well for this user.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/f2fs/super.c |  2 --
 fs/f2fs/trace.c | 60 -
 fs/f2fs/trace.h |  2 --
 3 files changed, 4 insertions(+), 60 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 708155d9c2e4..d608edffe69e 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2831,8 +2831,6 @@ static int __init init_f2fs_fs(void)
 {
int err;
 
-   f2fs_build_trace_ios();
-
err = init_inodecache();
if (err)
goto fail;
diff --git a/fs/f2fs/trace.c b/fs/f2fs/trace.c
index bccbbf2616d2..f316a42c547f 100644
--- a/fs/f2fs/trace.c
+++ b/fs/f2fs/trace.c
@@ -16,8 +16,7 @@
 #include "f2fs.h"
 #include "trace.h"
 
-static RADIX_TREE(pids, GFP_ATOMIC);
-static spinlock_t pids_lock;
+static DEFINE_XARRAY(pids);
 static struct last_io_info last_io;
 
 static inline void __print_last_io(void)
@@ -57,28 +56,13 @@ void f2fs_trace_pid(struct page *page)
 {
struct inode *inode = page->mapping->host;
pid_t pid = task_pid_nr(current);
-   void *p;
 
set_page_private(page, (unsigned long)pid);
 
-   if (radix_tree_preload(GFP_NOFS))
-   return;
-
-   spin_lock(_lock);
-   p = radix_tree_lookup(, pid);
-   if (p == current)
-   goto out;
-   if (p)
-   radix_tree_delete(, pid);
-
-   f2fs_radix_tree_insert(, pid, current);
-
-   trace_printk("%3x:%3x %4x %-16s\n",
+   if (xa_store(, pid, current, GFP_NOFS) != current)
+   trace_printk("%3x:%3x %4x %-16s\n",
MAJOR(inode->i_sb->s_dev), MINOR(inode->i_sb->s_dev),
pid, current->comm);
-out:
-   spin_unlock(_lock);
-   radix_tree_preload_end();
 }
 
 void f2fs_trace_ios(struct f2fs_io_info *fio, int flush)
@@ -120,43 +104,7 @@ void f2fs_trace_ios(struct f2fs_io_info *fio, int flush)
return;
 }
 
-void f2fs_build_trace_ios(void)
-{
-   spin_lock_init(_lock);
-}
-
-#define PIDVEC_SIZE128
-static unsigned int gang_lookup_pids(pid_t *results, unsigned long first_index,
-   unsigned int max_items)
-{
-   struct radix_tree_iter iter;
-   void **slot;
-   unsigned int ret = 0;
-
-   if (unlikely(!max_items))
-   return 0;
-
-   radix_tree_for_each_slot(slot, , , first_index) {
-   results[ret] = iter.index;
-   if (++ret == max_items)
-   break;
-   }
-   return ret;
-}
-
 void f2fs_destroy_trace_ios(void)
 {
-   pid_t pid[PIDVEC_SIZE];
-   pid_t next_pid = 0;
-   unsigned int found;
-
-   spin_lock(_lock);
-   while ((found = gang_lookup_pids(pid, next_pid, PIDVEC_SIZE))) {
-   unsigned idx;
-
-   next_pid = pid[found - 1] + 1;
-   for (idx = 0; idx < found; idx++)
-   radix_tree_delete(, pid[idx]);
-   }
-   spin_unlock(_lock);
+   xa_destroy();
 }
diff --git a/fs/f2fs/trace.h b/fs/f2fs/trace.h
index 67db24ac1e85..157e4564e48b 100644
--- a/fs/f2fs/trace.h
+++ b/fs/f2fs/trace.h
@@ -34,12 +34,10 @@ struct last_io_info {
 
 extern void f2fs_trace_pid(struct page *);
 extern void f2fs_trace_ios(struct f2fs_io_info *, int);
-extern void f2fs_build_trace_ios(void);
 extern void f2fs_destroy_trace_ios(void);
 #else
 #define f2fs_trace_pid(p)
 #define f2fs_trace_ios(i, n)
-#define f2fs_build_trace_ios()
 #define f2fs_destroy_trace_ios()
 
 #endif
-- 
2.15.1



[PATCH v6 94/99] f2fs: Convert extent_tree_root to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Rename it to extent_array and use the xa_lock in place of the
extent_tree_lock mutex.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/f2fs/extent_cache.c | 59 +-
 fs/f2fs/f2fs.h |  3 +--
 2 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
index ff2352a0ed15..da5f3bd1808d 100644
--- a/fs/f2fs/extent_cache.c
+++ b/fs/f2fs/extent_cache.c
@@ -250,25 +250,25 @@ static struct extent_tree *__grab_extent_tree(struct 
inode *inode)
struct extent_tree *et;
nid_t ino = inode->i_ino;
 
-   mutex_lock(>extent_tree_lock);
-   et = radix_tree_lookup(>extent_tree_root, ino);
-   if (!et) {
-   et = f2fs_kmem_cache_alloc(extent_tree_slab, GFP_NOFS);
-   f2fs_radix_tree_insert(>extent_tree_root, ino, et);
-   memset(et, 0, sizeof(struct extent_tree));
-   et->ino = ino;
-   et->root = RB_ROOT;
-   et->cached_en = NULL;
-   rwlock_init(>lock);
-   INIT_LIST_HEAD(>list);
-   atomic_set(>node_cnt, 0);
-   atomic_inc(>total_ext_tree);
-   } else {
+   et = xa_load(>extent_array, ino);
+   if (et) {
atomic_dec(>total_zombie_tree);
list_del_init(>list);
+   goto out;
}
-   mutex_unlock(>extent_tree_lock);
 
+   et = f2fs_kmem_cache_alloc(extent_tree_slab, GFP_NOFS | __GFP_ZERO);
+   et->ino = ino;
+   et->root = RB_ROOT;
+   et->cached_en = NULL;
+   rwlock_init(>lock);
+   INIT_LIST_HEAD(>list);
+   atomic_set(>node_cnt, 0);
+
+   xa_store(>extent_array, ino, et, GFP_NOFS);
+   atomic_inc(>total_ext_tree);
+
+out:
/* never died until evict_inode */
F2FS_I(inode)->extent_tree = et;
 
@@ -622,7 +622,7 @@ unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info 
*sbi, int nr_shrink)
if (!atomic_read(>total_zombie_tree))
goto free_node;
 
-   if (!mutex_trylock(>extent_tree_lock))
+   if (!xa_trylock(>extent_array))
goto out;
 
/* 1. remove unreferenced extent tree */
@@ -634,7 +634,7 @@ unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info 
*sbi, int nr_shrink)
}
f2fs_bug_on(sbi, atomic_read(>node_cnt));
list_del_init(>list);
-   radix_tree_delete(>extent_tree_root, et->ino);
+   xa_erase(>extent_array, et->ino);
kmem_cache_free(extent_tree_slab, et);
atomic_dec(>total_ext_tree);
atomic_dec(>total_zombie_tree);
@@ -642,13 +642,13 @@ unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info 
*sbi, int nr_shrink)
 
if (node_cnt + tree_cnt >= nr_shrink)
goto unlock_out;
-   cond_resched();
+   cond_resched_lock(>extent_array.xa_lock);
}
-   mutex_unlock(>extent_tree_lock);
+   xa_unlock(>extent_array);
 
 free_node:
/* 2. remove LRU extent entries */
-   if (!mutex_trylock(>extent_tree_lock))
+   if (!xa_trylock(>extent_array))
goto out;
 
remained = nr_shrink - (node_cnt + tree_cnt);
@@ -678,7 +678,7 @@ unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info 
*sbi, int nr_shrink)
spin_unlock(>extent_lock);
 
 unlock_out:
-   mutex_unlock(>extent_tree_lock);
+   xa_unlock(>extent_array);
 out:
trace_f2fs_shrink_extent_tree(sbi, node_cnt, tree_cnt);
 
@@ -725,23 +725,23 @@ void f2fs_destroy_extent_tree(struct inode *inode)
 
if (inode->i_nlink && !is_bad_inode(inode) &&
atomic_read(>node_cnt)) {
-   mutex_lock(>extent_tree_lock);
+   xa_lock(>extent_array);
list_add_tail(>list, >zombie_list);
atomic_inc(>total_zombie_tree);
-   mutex_unlock(>extent_tree_lock);
+   xa_unlock(>extent_array);
return;
}
 
/* free all extent info belong to this extent tree */
node_cnt = f2fs_destroy_extent_node(inode);
 
-   /* delete extent tree entry in radix tree */
-   mutex_lock(>extent_tree_lock);
+   /* delete extent from array */
+   xa_lock(>extent_array);
f2fs_bug_on(sbi, atomic_read(>node_cnt));
-   radix_tree_delete(>extent_tree_root, inode->i_ino);
-   kmem_cache_free(extent_tree_slab, et);
+   __xa_erase(>extent_array, inode->i_ino);
atomic_dec(>total_ext_tree);
-   mutex_unlock(>extent_tree_lock);
+   xa_unlock(>extent_array);
+   kmem_ca

[PATCH v6 49/99] shmem: Convert shmem_partial_swap_usage to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Simpler code because the xarray takes care of things like the limit and
dereferencing the slot.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/shmem.c | 18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5a2226e06f8c..4dbcfb436bd1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -658,29 +658,17 @@ static int shmem_free_swap(struct address_space *mapping,
 unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end)
 {
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, &mapping->pages, start);
struct page *page;
unsigned long swapped = 0;
 
rcu_read_lock();
-
-   radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
-   if (iter.index >= end)
-   break;
-
-   page = radix_tree_deref_slot(slot);
-
-   if (radix_tree_deref_retry(page)) {
-   slot = radix_tree_iter_retry(&iter);
-   continue;
-   }
-
+   xas_for_each(&xas, page, end - 1) {
if (xa_is_value(page))
swapped++;
 
if (need_resched()) {
-   slot = radix_tree_iter_resume(slot, );
+   xas_pause();
cond_resched_rcu();
}
}
-- 
2.15.1



[PATCH v6 51/99] btrfs: Convert page cache to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/btrfs/compression.c | 4 +---
 fs/btrfs/extent_io.c   | 6 ++
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index e687d06cd97c..4174b166e235 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -449,9 +449,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
if (pg_index > end_index)
break;
 
-   rcu_read_lock();
-   page = radix_tree_lookup(&mapping->pages, pg_index);
-   rcu_read_unlock();
+   page = xa_load(&mapping->pages, pg_index);
if (page && !xa_is_value(page)) {
misses++;
if (misses > 4)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4301cbf4e31f..fd5e9d887328 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5197,11 +5197,9 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb)
 
clear_page_dirty_for_io(page);
xa_lock_irq(&page->mapping->pages);
-   if (!PageDirty(page)) {
-   radix_tree_tag_clear(&page->mapping->pages,
-   page_index(page),
+   if (!PageDirty(page))
+   __xa_clear_tag(&page->mapping->pages, page_index(page),
PAGECACHE_TAG_DIRTY);
-   }
xa_unlock_irq(&page->mapping->pages);
ClearPageError(page);
unlock_page(page);
-- 
2.15.1



[PATCH v6 50/99] shmem: Comment fixups

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Remove the last mentions of radix tree from various comments.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/shmem.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 4dbcfb436bd1..5110848885d4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -743,7 +743,7 @@ void shmem_unlock_mapping(struct address_space *mapping)
 }
 
 /*
- * Remove range of pages and swap entries from radix tree, and free them.
+ * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
  */
 static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
@@ -1118,10 +1118,10 @@ static int shmem_unuse_inode(struct shmem_inode_info 
*info,
 * We needed to drop mutex to make that restrictive page
 * allocation, but the inode might have been freed while we
 * dropped it: although a racing shmem_evict_inode() cannot
-* complete without emptying the radix_tree, our page lock
+* complete without emptying the page cache, our page lock
 * on this swapcache page is not enough to prevent that -
 * free_swap_and_cache() of our swap entry will only
-* trylock_page(), removing swap from radix_tree whatever.
+* trylock_page(), removing swap from page cache whatever.
 *
 * We must not proceed to shmem_add_to_page_cache() if the
 * inode has been freed, but of course we cannot rely on
@@ -1187,7 +1187,7 @@ int shmem_unuse(swp_entry_t swap, struct page *page)
false);
if (error)
goto out;
-   /* No radix_tree_preload: swap entry keeps a place for page in tree */
+   /* No memory allocation: swap entry occupies the slot for the page */
error = -EAGAIN;
 
mutex_lock(_swaplist_mutex);
@@ -1863,7 +1863,7 @@ alloc_nohuge: page = 
shmem_alloc_and_acct_page(gfp, inode,
spin_unlock_irq(>lock);
goto repeat;
}
-   if (error == -EEXIST)   /* from above or from radix_tree_insert */
+   if (error == -EEXIST)
goto repeat;
return error;
 }
@@ -2475,7 +2475,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, 
struct iov_iter *to)
 }
 
 /*
- * llseek SEEK_DATA or SEEK_HOLE through the radix_tree.
+ * llseek SEEK_DATA or SEEK_HOLE through the page cache.
  */
 static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
pgoff_t index, pgoff_t end, int whence)
@@ -2563,7 +2563,7 @@ static loff_t shmem_file_llseek(struct file *file, loff_t 
offset, int whence)
 }
 
 /*
- * We need a tag: a new tag would expand every radix_tree_node by 8 bytes,
+ * We need a tag: a new tag would expand every xa_node by 8 bytes,
  * so reuse a tag which we firmly believe is never set or cleared on shmem.
  */
 #define SHMEM_TAG_PINNEDPAGECACHE_TAG_TOWRITE
-- 
2.15.1



[PATCH v6 62/99] dax: Convert dax_insert_pfn_mkwrite to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/dax.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index b66b8c896ed8..e6b25ef112f2 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1497,21 +1497,21 @@ static int dax_insert_pfn_mkwrite(struct vm_fault *vmf,
void *entry;
int vmf_ret, error;
 
-   xa_lock_irq(&mapping->pages);
+   xas_lock_irq(&xas);
entry = get_unlocked_mapping_entry(&xas);
/* Did we race with someone splitting entry or so? */
if (!entry ||
(pe_size == PE_SIZE_PTE && !dax_is_pte_entry(entry)) ||
(pe_size == PE_SIZE_PMD && !dax_is_pmd_entry(entry))) {
put_unlocked_mapping_entry(&xas, entry);
-   xa_unlock_irq(&mapping->pages);
+   xas_unlock_irq(&xas);
trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf,
  VM_FAULT_NOPAGE);
return VM_FAULT_NOPAGE;
}
-   radix_tree_tag_set(&mapping->pages, index, PAGECACHE_TAG_DIRTY);
+   xas_set_tag(&xas, PAGECACHE_TAG_DIRTY);
entry = lock_slot(&xas);
-   xa_unlock_irq(&mapping->pages);
+   xas_unlock_irq(&xas);
switch (pe_size) {
case PE_SIZE_PTE:
error = vm_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
-- 
2.15.1



[PATCH v6 64/99] dax: Convert grab_mapping_entry to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/dax.c | 98 +---
 1 file changed, 26 insertions(+), 72 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 494e8fb7a98f..3eb0cf176d69 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -44,6 +44,7 @@
 
 /* The 'colour' (ie low bits) within a PMD of a page offset.  */
 #define PG_PMD_COLOUR  ((PMD_SIZE >> PAGE_SHIFT) - 1)
+#define PMD_ORDER  (PMD_SHIFT - PAGE_SHIFT)
 
 static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
 
@@ -89,10 +90,10 @@ static void *dax_radix_locked_entry(sector_t sector, 
unsigned long flags)
DAX_ENTRY_LOCK);
 }
 
-static unsigned int dax_radix_order(void *entry)
+static unsigned int dax_entry_order(void *entry)
 {
if (xa_to_value(entry) & DAX_PMD)
-   return PMD_SHIFT - PAGE_SHIFT;
+   return PMD_ORDER;
return 0;
 }
 
@@ -299,10 +300,11 @@ static void *grab_mapping_entry(struct address_space 
*mapping, pgoff_t index,
 {
XA_STATE(xas, >pages, index);
bool pmd_downgrade = false; /* splitting 2MiB entry into 4k entries? */
-   void *entry, **slot;
+   void *entry;
 
+   xas_set_order(, index, size_flag ? PMD_ORDER : 0);
 restart:
-   xa_lock_irq(>pages);
+   xas_lock_irq();
entry = get_unlocked_mapping_entry();
 
if (WARN_ON_ONCE(entry && !xa_is_value(entry))) {
@@ -326,84 +328,36 @@ static void *grab_mapping_entry(struct address_space 
*mapping, pgoff_t index,
}
}
 
-   /* No entry for given index? Make sure radix tree is big enough. */
-   if (!entry || pmd_downgrade) {
-   int err;
-
-   if (pmd_downgrade) {
-   /*
-* Make sure 'entry' remains valid while we drop
-* xa_lock.
-*/
-   entry = lock_slot();
-   }
-
-   xa_unlock_irq(>pages);
+   if (pmd_downgrade) {
+   entry = lock_slot();
/*
 * Besides huge zero pages the only other thing that gets
 * downgraded are empty entries which don't need to be
 * unmapped.
 */
-   if (pmd_downgrade && dax_is_zero_entry(entry))
+   if (dax_is_zero_entry(entry)) {
+   xas_pause();
+   xas_unlock_irq();
unmap_mapping_range(mapping,
(index << PAGE_SHIFT) & PMD_MASK, PMD_SIZE, 0);
-
-   err = radix_tree_preload(
-   mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM);
-   if (err) {
-   if (pmd_downgrade)
-   put_locked_mapping_entry(mapping, index);
-   return ERR_PTR(err);
+   xas_lock_irq();
}
-   xa_lock_irq(>pages);
-
-   if (!entry) {
-   /*
-* We needed to drop the pages lock while calling
-* radix_tree_preload() and we didn't have an entry to
-* lock.  See if another thread inserted an entry at
-* our index during this time.
-*/
-   entry = __radix_tree_lookup(>pages, index,
-   NULL, );
-   if (entry) {
-   radix_tree_preload_end();
-   xa_unlock_irq(>pages);
-   goto restart;
-   }
-   }
-
-   if (pmd_downgrade) {
-   radix_tree_delete(>pages, index);
-   mapping->nrexceptional--;
-   dax_wake_entry(, entry, true);
-   }
-
+   xas_store(, NULL);
+   mapping->nrexceptional--;
+   dax_wake_entry(, entry, true);
+   }
+   if (!entry || pmd_downgrade) {
entry = dax_radix_locked_entry(0, size_flag | DAX_EMPTY);
-
-   err = __radix_tree_insert(>pages, index,
-   dax_radix_order(entry), entry);
-   radix_tree_preload_end();
-   if (err) {
-   xa_unlock_irq(>pages);
-   /*
-* Our insertion of a DAX entry failed, most likely
-* because we were inserting a PMD entry and it
-* collided with a PTE sized entry at a different
-* index in the PMD range.  We haven't inserted
-

[PATCH v6 73/99] xfs: Convert mru cache to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

This eliminates a call to radix_tree_preload().

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
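Not part of the commit: the xfs_mru_cache_insert() hunk is cut off at
the end of this mail, so here is a minimal sketch of the usual pattern
that replaces radix_tree_preload() + insert-under-spinlock.  Names are
illustrative only; the point is that xas_nomem() drops out of the
locked section to allocate with GFP_NOFS and retries, so no
preallocation pool is needed:

	XA_STATE(xas, &mru->store, key);
	int error;

	do {
		xas_lock(&xas);
		xas_store(&xas, elem);	/* sets the xas error on -ENOMEM */
		xas_unlock(&xas);
	} while (xas_nomem(&xas, GFP_NOFS));	/* allocate, then retry */
	error = xas_error(&xas);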
 fs/xfs/xfs_mru_cache.c | 72 +++---
 1 file changed, 33 insertions(+), 39 deletions(-)

diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index f8a674d7f092..2179bede5396 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -101,10 +101,9 @@
  * an infinite loop in the code.
  */
 struct xfs_mru_cache {
-   struct radix_tree_root  store; /* Core storage data structure.  */
+   struct xarray   store; /* Core storage data structure.  */
struct list_head*lists;/* Array of lists, one per grp.  */
struct list_headreap_list; /* Elements overdue for reaping. */
-   spinlock_t  lock;  /* Lock to protect this struct.  */
unsigned intgrp_count; /* Number of discrete groups.*/
unsigned intgrp_time;  /* Time period spanned by grps.  */
unsigned intlru_grp;   /* Group containing time zero.   */
@@ -232,22 +231,21 @@ _xfs_mru_cache_list_insert(
  * data store, removing it from the reap list, calling the client's free
  * function and deleting the element from the element zone.
  *
- * We get called holding the mru->lock, which we drop and then reacquire.
- * Sparse need special help with this to tell it we know what we are doing.
+ * We get called holding the mru->store lock, which we drop and then reacquire.
+ * Sparse needs special help with this to tell it we know what we are doing.
  */
 STATIC void
 _xfs_mru_cache_clear_reap_list(
struct xfs_mru_cache*mru)
-   __releases(mru->lock) __acquires(mru->lock)
+   __releases(mru->store) __acquires(mru->store)
 {
struct xfs_mru_cache_elem *elem, *next;
struct list_headtmp;
 
INIT_LIST_HEAD();
list_for_each_entry_safe(elem, next, >reap_list, list_node) {
-
/* Remove the element from the data store. */
-   radix_tree_delete(>store, elem->key);
+   __xa_erase(>store, elem->key);
 
/*
 * remove to temp list so it can be freed without
@@ -255,14 +253,14 @@ _xfs_mru_cache_clear_reap_list(
 */
list_move(>list_node, );
}
-   spin_unlock(>lock);
+   xa_unlock(>store);
 
list_for_each_entry_safe(elem, next, , list_node) {
list_del_init(>list_node);
mru->free_func(elem);
}
 
-   spin_lock(>lock);
+   xa_lock(>store);
 }
 
 /*
@@ -284,7 +282,7 @@ _xfs_mru_cache_reap(
if (!mru || !mru->lists)
return;
 
-   spin_lock(>lock);
+   xa_lock(>store);
next = _xfs_mru_cache_migrate(mru, jiffies);
_xfs_mru_cache_clear_reap_list(mru);
 
@@ -298,7 +296,7 @@ _xfs_mru_cache_reap(
queue_delayed_work(xfs_mru_reap_wq, >work, next);
}
 
-   spin_unlock(>lock);
+   xa_unlock(>store);
 }
 
 int
@@ -358,13 +356,8 @@ xfs_mru_cache_create(
for (grp = 0; grp < mru->grp_count; grp++)
INIT_LIST_HEAD(mru->lists + grp);
 
-   /*
-* We use GFP_KERNEL radix tree preload and do inserts under a
-* spinlock so GFP_ATOMIC is appropriate for the radix tree itself.
-*/
-   INIT_RADIX_TREE(>store, GFP_ATOMIC);
+   xa_init(>store);
INIT_LIST_HEAD(>reap_list);
-   spin_lock_init(>lock);
INIT_DELAYED_WORK(>work, _xfs_mru_cache_reap);
 
mru->grp_time  = grp_time;
@@ -394,17 +387,17 @@ xfs_mru_cache_flush(
if (!mru || !mru->lists)
return;
 
-   spin_lock(>lock);
+   xa_lock(>store);
if (mru->queued) {
-   spin_unlock(>lock);
+   xa_unlock(>store);
cancel_delayed_work_sync(>work);
-   spin_lock(>lock);
+   xa_lock(>store);
}
 
_xfs_mru_cache_migrate(mru, jiffies + mru->grp_count * mru->grp_time);
_xfs_mru_cache_clear_reap_list(mru);
 
-   spin_unlock(>lock);
+   xa_unlock(>store);
 }
 
 void
@@ -431,24 +424,24 @@ xfs_mru_cache_insert(
unsigned long   key,
struct xfs_mru_cache_elem *elem)
 {
+   XA_STATE(xas, >store, key);
int error;
 
ASSERT(mru && mru->lists);
if (!mru || !mru->lists)
return -EINVAL;
 
-   if (radix_tree_preload(GFP_NOFS))
-   return -ENOMEM;
-
INIT_LIST_HEAD(>list_node);
elem->key = key;
 
-   spin_lock(>lock);
-   error = radix_tree_insert(>store, key, elem);
-   radix_tree_preload_end(

[PATCH v6 74/99] usb: Convert xhci-mem to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

The XArray API is a slightly better fit for xhci_insert_segment_mapping()
than the radix tree API was.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
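Not part of the commit: a quick note on why xa_cmpxchg() is the better
fit here.  "Insert this ring only if the slot is currently empty" is a
single call; an already-present entry is simply returned instead of
needing a separate lookup, and the allocation happens inside the call
instead of via radix_tree_maybe_preload().  Roughly (gfp comes from the
caller):

	void *old = xa_cmpxchg(trb_address_map, trb_index(seg->dma),
			       NULL, ring, gfp);

	if (xa_is_err(old))
		return xa_err(old);	/* e.g. -ENOMEM */
	/* NULL: newly inserted; non-NULL: segment was already mapped */
	return 0;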
 drivers/usb/host/xhci-mem.c | 68 +++--
 drivers/usb/host/xhci.h |  6 ++--
 2 files changed, 32 insertions(+), 42 deletions(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 3a29b32a3bd0..a2e15a9abc30 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -149,70 +149,60 @@ static void xhci_link_rings(struct xhci_hcd *xhci, struct 
xhci_ring *ring,
 }
 
 /*
- * We need a radix tree for mapping physical addresses of TRBs to which stream
- * ID they belong to.  We need to do this because the host controller won't 
tell
+ * We need to map physical addresses of TRBs to the stream ID they belong to.
+ * We need to do this because the host controller won't tell
  * us which stream ring the TRB came from.  We could store the stream ID in an
  * event data TRB, but that doesn't help us for the cancellation case, since 
the
  * endpoint may stop before it reaches that event data TRB.
  *
- * The radix tree maps the upper portion of the TRB DMA address to a ring
+ * The xarray maps the upper portion of the TRB DMA address to a ring
  * segment that has the same upper portion of DMA addresses.  For example, say 
I
  * have segments of size 1KB, that are always 1KB aligned.  A segment may
  * start at 0x10c91000 and end at 0x10c913f0.  If I use the upper 10 bits, the
- * key to the stream ID is 0x43244.  I can use the DMA address of the TRB to
- * pass the radix tree a key to get the right stream ID:
+ * index of the stream ID is 0x43244.  I can use the DMA address of the TRB as
+ * the xarray index to get the right stream ID:
  *
  * 0x10c90fff >> 10 = 0x43243
  * 0x10c912c0 >> 10 = 0x43244
  * 0x10c91400 >> 10 = 0x43245
  *
  * Obviously, only those TRBs with DMA addresses that are within the segment
- * will make the radix tree return the stream ID for that ring.
+ * will make the xarray return the stream ID for that ring.
  *
- * Caveats for the radix tree:
+ * Caveats for the xarray:
  *
- * The radix tree uses an unsigned long as a key pair.  On 32-bit systems, an
+ * The xarray uses an unsigned long for the index.  On 32-bit systems, an
  * unsigned long will be 32-bits; on a 64-bit system an unsigned long will be
  * 64-bits.  Since we only request 32-bit DMA addresses, we can use that as the
- * key on 32-bit or 64-bit systems (it would also be fine if we asked for 
64-bit
- * PCI DMA addresses on a 64-bit system).  There might be a problem on 32-bit
- * extended systems (where the DMA address can be bigger than 32-bits),
+ * index on 32-bit or 64-bit systems (it would also be fine if we asked for
+ * 64-bit PCI DMA addresses on a 64-bit system).  There might be a problem on
+ * 32-bit extended systems (where the DMA address can be bigger than 32-bits),
  * if we allow the PCI dma mask to be bigger than 32-bits.  So don't do that.
  */
-static int xhci_insert_segment_mapping(struct radix_tree_root *trb_address_map,
+
+static unsigned long trb_index(dma_addr_t dma)
+{
+   return (unsigned long)(dma >> TRB_SEGMENT_SHIFT);
+}
+
+static int xhci_insert_segment_mapping(struct xarray *trb_address_map,
struct xhci_ring *ring,
struct xhci_segment *seg,
-   gfp_t mem_flags)
+   gfp_t gfp)
 {
-   unsigned long key;
-   int ret;
-
-   key = (unsigned long)(seg->dma >> TRB_SEGMENT_SHIFT);
/* Skip any segments that were already added. */
-   if (radix_tree_lookup(trb_address_map, key))
-   return 0;
-
-   ret = radix_tree_maybe_preload(mem_flags);
-   if (ret)
-   return ret;
-   ret = radix_tree_insert(trb_address_map,
-   key, ring);
-   radix_tree_preload_end();
-   return ret;
+   return xa_err(xa_cmpxchg(trb_address_map, trb_index(seg->dma), NULL,
+   ring, gfp));
 }
 
-static void xhci_remove_segment_mapping(struct radix_tree_root 
*trb_address_map,
+static void xhci_remove_segment_mapping(struct xarray *trb_address_map,
struct xhci_segment *seg)
 {
-   unsigned long key;
-
-   key = (unsigned long)(seg->dma >> TRB_SEGMENT_SHIFT);
-   if (radix_tree_lookup(trb_address_map, key))
-   radix_tree_delete(trb_address_map, key);
+   xa_erase(trb_address_map, trb_index(seg->dma));
 }
 
 static int xhci_update_stream_segment_mapping(
-   struct radix_tree_root *trb_address_map,
+   struct xarray *trb_address_map,
struct xhci_ring *ring,
struct xhci_segment *first_seg,
struct xhci_segment *last_seg,
@@ -574,8 +564,8 @@ stru

[PATCH v6 63/99] dax: Convert dax_insert_mapping_entry to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/dax.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index e6b25ef112f2..494e8fb7a98f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -498,9 +498,9 @@ static void *dax_insert_mapping_entry(struct address_space 
*mapping,
  void *entry, sector_t sector,
  unsigned long flags, bool dirty)
 {
-   struct radix_tree_root *pages = &mapping->pages;
void *new_entry;
pgoff_t index = vmf->pgoff;
+   XA_STATE(xas, &mapping->pages, index);
 
if (dirty)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
@@ -516,7 +516,7 @@ static void *dax_insert_mapping_entry(struct address_space 
*mapping,
PAGE_SIZE, 0);
}
 
-   xa_lock_irq(&mapping->pages);
+   xas_lock_irq(&xas);
new_entry = dax_radix_locked_entry(sector, flags);
 
if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
@@ -528,21 +528,15 @@ static void *dax_insert_mapping_entry(struct 
address_space *mapping,
 * existing entry is a PMD, we will just leave the PMD in the
 * tree and dirty it if necessary.
 */
-   struct radix_tree_node *node;
-   void **slot;
-   void *ret;
-
-   ret = __radix_tree_lookup(pages, index, &node, &slot);
-   WARN_ON_ONCE(ret != entry);
-   __radix_tree_replace(pages, node, slot,
-new_entry, NULL);
+   void *prev = xas_store(&xas, new_entry);
+   WARN_ON_ONCE(prev != entry);
entry = new_entry;
}
 
if (dirty)
-   radix_tree_tag_set(pages, index, PAGECACHE_TAG_DIRTY);
+   xas_set_tag(&xas, PAGECACHE_TAG_DIRTY);
 
-   xa_unlock_irq(&mapping->pages);
+   xas_unlock_irq(&xas);
return entry;
 }
 
-- 
2.15.1



[PATCH v6 71/99] xfs: Convert pag_ici_root to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Rename pag_ici_root to pag_ici_xa and use XArray APIs instead of radix
tree APIs.  Shorter code, typechecking on tag numbers, better error
checking in xfs_reclaim_inode(), and eliminates a call to
radix_tree_preload().

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/xfs/libxfs/xfs_sb.c |   2 +-
 fs/xfs/libxfs/xfs_sb.h |   2 +-
 fs/xfs/xfs_icache.c| 111 +++--
 fs/xfs/xfs_icache.h|   5 +--
 fs/xfs/xfs_inode.c |  24 ---
 fs/xfs/xfs_mount.c |   3 +-
 fs/xfs/xfs_mount.h |   3 +-
 7 files changed, 56 insertions(+), 94 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 3b0b65eb8224..8fb7c216c761 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -76,7 +76,7 @@ struct xfs_perag *
 xfs_perag_get_tag(
struct xfs_mount*mp,
xfs_agnumber_t  first,
-   int tag)
+   xa_tag_ttag)
 {
XA_STATE(xas, >m_perag_xa, first);
struct xfs_perag*pag;
diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h
index 961e6475a309..d2de90b8f39c 100644
--- a/fs/xfs/libxfs/xfs_sb.h
+++ b/fs/xfs/libxfs/xfs_sb.h
@@ -23,7 +23,7 @@
  */
 extern struct xfs_perag *xfs_perag_get(struct xfs_mount *, xfs_agnumber_t);
 extern struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *, xfs_agnumber_t,
-  int tag);
+  xa_tag_t tag);
 extern voidxfs_perag_put(struct xfs_perag *pag);
 extern int xfs_initialize_perag_data(struct xfs_mount *, xfs_agnumber_t);
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 65a8b91b2e70..10c76209227b 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -186,7 +186,7 @@ xfs_perag_set_reclaim_tag(
 {
struct xfs_mount*mp = pag->pag_mount;
 
-   lockdep_assert_held(>pag_ici_lock);
+   lockdep_assert_held(>pag_ici_xa.xa_lock);
if (pag->pag_ici_reclaimable++)
return;
 
@@ -205,7 +205,7 @@ xfs_perag_clear_reclaim_tag(
 {
struct xfs_mount*mp = pag->pag_mount;
 
-   lockdep_assert_held(>pag_ici_lock);
+   lockdep_assert_held(>pag_ici_xa.xa_lock);
if (--pag->pag_ici_reclaimable)
return;
 
@@ -228,16 +228,16 @@ xfs_inode_set_reclaim_tag(
struct xfs_perag*pag;
 
pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-   spin_lock(>pag_ici_lock);
+   xa_lock(>pag_ici_xa);
spin_lock(>i_flags_lock);
 
-   radix_tree_tag_set(>pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
+   __xa_set_tag(>pag_ici_xa, XFS_INO_TO_AGINO(mp, ip->i_ino),
   XFS_ICI_RECLAIM_TAG);
xfs_perag_set_reclaim_tag(pag);
__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 
spin_unlock(>i_flags_lock);
-   spin_unlock(>pag_ici_lock);
+   xa_unlock(>pag_ici_xa);
xfs_perag_put(pag);
 }
 
@@ -246,7 +246,7 @@ xfs_inode_clear_reclaim_tag(
struct xfs_perag*pag,
xfs_ino_t   ino)
 {
-   radix_tree_tag_clear(>pag_ici_root,
+   __xa_clear_tag(>pag_ici_xa,
 XFS_INO_TO_AGINO(pag->pag_mount, ino),
 XFS_ICI_RECLAIM_TAG);
xfs_perag_clear_reclaim_tag(pag);
@@ -367,8 +367,8 @@ xfs_iget_cache_hit(
/*
 * We need to set XFS_IRECLAIM to prevent xfs_reclaim_inode
 * from stomping over us while we recycle the inode.  We can't
-* clear the radix tree reclaimable tag yet as it requires
-* pag_ici_lock to be held exclusive.
+* clear the xarray reclaimable tag yet as it requires
+* pag_ici_xa.xa_lock to be held exclusive.
 */
ip->i_flags |= XFS_IRECLAIM;
 
@@ -393,7 +393,7 @@ xfs_iget_cache_hit(
goto out_error;
}
 
-   spin_lock(>pag_ici_lock);
+   xa_lock(>pag_ici_xa);
spin_lock(>i_flags_lock);
 
/*
@@ -410,7 +410,7 @@ xfs_iget_cache_hit(
init_rwsem(>i_rwsem);
 
spin_unlock(>i_flags_lock);
-   spin_unlock(>pag_ici_lock);
+   xa_unlock(>pag_ici_xa);
} else {
/* If the VFS inode is being torn down, pause and try again. */
if (!igrab(inode)) {
@@ -471,17 +471,6 @@ xfs_iget_cache_miss(
goto out_destroy;
}
 
-   /*
-* Preload the radix tree so we can insert safely under the
-* write spinlock. Note that we cannot sleep inside the preload
-* region. Since we can be called from transaction context, don't
-* recurse into

[PATCH v6 53/99] fs: Convert writeback to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

A couple of short loops.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/fs-writeback.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e2c1ca667d9a..897a89489fe9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -339,9 +339,9 @@ static void inode_switch_wbs_work_fn(struct work_struct 
*work)
struct address_space *mapping = inode->i_mapping;
struct bdi_writeback *old_wb = inode->i_wb;
struct bdi_writeback *new_wb = isw->new_wb;
-   struct radix_tree_iter iter;
+   XA_STATE(xas, &mapping->pages, 0);
+   struct page *page;
bool switched = false;
-   void **slot;
 
/*
 * By the time control reaches here, RCU grace period has passed
@@ -375,25 +375,18 @@ static void inode_switch_wbs_work_fn(struct work_struct 
*work)
 * to possibly dirty pages while PAGECACHE_TAG_WRITEBACK points to
 * pages actually under writeback.
 */
-   radix_tree_for_each_tagged(slot, &mapping->pages, &iter, 0,
-  PAGECACHE_TAG_DIRTY) {
-   struct page *page = radix_tree_deref_slot_protected(slot,
-   &mapping->pages.xa_lock);
-   if (likely(page) && PageDirty(page)) {
+   xas_for_each_tag(&xas, page, ULONG_MAX, PAGECACHE_TAG_DIRTY) {
+   if (PageDirty(page)) {
+   if (PageDirty(page)) {
dec_wb_stat(old_wb, WB_RECLAIMABLE);
inc_wb_stat(new_wb, WB_RECLAIMABLE);
}
}
 
-   radix_tree_for_each_tagged(slot, &mapping->pages, &iter, 0,
-  PAGECACHE_TAG_WRITEBACK) {
-   struct page *page = radix_tree_deref_slot_protected(slot,
-   &mapping->pages.xa_lock);
-   if (likely(page)) {
-   WARN_ON_ONCE(!PageWriteback(page));
-   dec_wb_stat(old_wb, WB_WRITEBACK);
-   inc_wb_stat(new_wb, WB_WRITEBACK);
-   }
+   xas_set(&xas, 0);
+   xas_for_each_tag(&xas, page, ULONG_MAX, PAGECACHE_TAG_WRITEBACK) {
+   WARN_ON_ONCE(!PageWriteback(page));
+   dec_wb_stat(old_wb, WB_WRITEBACK);
+   inc_wb_stat(new_wb, WB_WRITEBACK);
}
 
wb_get(new_wb);
-- 
2.15.1



[PATCH v6 76/99] irqdomain: Convert to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

In a non-critical path, irqdomain wants to know how many entries are
stored in the xarray, so add xa_count().  This is a pretty straightforward
conversion; mostly just removing now-redundant locking.  The only thing
of note is just how much simpler irq_domain_fix_revmap() becomes.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
Acked-by: Marc Zyngier <marc.zyng...@arm.com>
---
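Not part of the commit: this mail is truncated before the lib/xarray.c
hunk that implements xa_count(), so treat the following as a rough
equivalent built only from the lookup API declared in this patch.  The
real helper can presumably walk the xa_nodes instead of visiting every
entry:

	unsigned long xa_count_slow(struct xarray *xa)
	{
		unsigned long index = 0, count = 0;
		void *entry;

		for (entry = xa_find(xa, &index, ULONG_MAX, XA_PRESENT);
		     entry;
		     entry = xa_find_after(xa, &index, ULONG_MAX,
					   XA_PRESENT))
			count++;
		return count;
	}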
 include/linux/irqdomain.h | 10 --
 include/linux/xarray.h|  1 +
 kernel/irq/irqdomain.c| 39 ++-
 lib/xarray.c  | 25 +
 4 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 48c7e86bb556..6c69d9141709 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -33,8 +33,7 @@
 #include 
 #include 
 #include 
-#include 
-#include 
+#include 
 
 struct device_node;
 struct irq_domain;
@@ -151,7 +150,7 @@ struct irq_domain_chip_generic;
  * @revmap_direct_max_irq: The largest hwirq that can be set for controllers 
that
  * support direct mapping
  * @revmap_size: Size of the linear map table @linear_revmap[]
- * @revmap_tree: Radix map tree for hwirqs that don't fit in the linear map
+ * @revmap_array: hwirqs that don't fit in the linear map
  * @linear_revmap: Linear table of hwirq->virq reverse mappings
  */
 struct irq_domain {
@@ -177,8 +176,7 @@ struct irq_domain {
irq_hw_number_t hwirq_max;
unsigned int revmap_direct_max_irq;
unsigned int revmap_size;
-   struct radix_tree_root revmap_tree;
-   struct mutex revmap_tree_mutex;
+   struct xarray revmap_array;
unsigned int linear_revmap[];
 };
 
@@ -378,7 +376,7 @@ extern void irq_dispose_mapping(unsigned int virq);
  * This is a fast path alternative to irq_find_mapping() that can be
  * called directly by irq controller code to save a handful of
  * instructions. It is always safe to call, but won't find irqs mapped
- * using the radix tree.
+ * using the xarray.
  */
 static inline unsigned int irq_linear_revmap(struct irq_domain *domain,
 irq_hw_number_t hwirq)
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index c3f7405c5517..892288fe9595 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -269,6 +269,7 @@ void *xa_find_after(struct xarray *xa, unsigned long *index,
unsigned long max, xa_tag_t) __attribute__((nonnull(2)));
 unsigned int xa_extract(struct xarray *, void **dst, unsigned long start,
unsigned long max, unsigned int n, xa_tag_t);
+unsigned long xa_count(struct xarray *);
 void xa_destroy(struct xarray *);
 
 /**
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 62068ad46930..d6da3a8eadd2 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -114,7 +114,7 @@ EXPORT_SYMBOL_GPL(irq_domain_free_fwnode);
 /**
  * __irq_domain_add() - Allocate a new irq_domain data structure
  * @fwnode: firmware node for the interrupt controller
- * @size: Size of linear map; 0 for radix mapping only
+ * @size: Size of linear map; 0 for xarray mapping only
  * @hwirq_max: Maximum number of interrupts supported by controller
  * @direct_max: Maximum value of direct maps; Use ~0 for no limit; 0 for no
  *  direct mapping
@@ -209,8 +209,7 @@ struct irq_domain *__irq_domain_add(struct fwnode_handle 
*fwnode, int size,
of_node_get(of_node);
 
/* Fill structure */
-   INIT_RADIX_TREE(>revmap_tree, GFP_KERNEL);
-   mutex_init(>revmap_tree_mutex);
+   xa_init(>revmap_array);
domain->ops = ops;
domain->host_data = host_data;
domain->hwirq_max = hwirq_max;
@@ -241,7 +240,7 @@ void irq_domain_remove(struct irq_domain *domain)
mutex_lock(_domain_mutex);
debugfs_remove_domain_dir(domain);
 
-   WARN_ON(!radix_tree_empty(>revmap_tree));
+   WARN_ON(!xa_empty(>revmap_array));
 
list_del(>link);
 
@@ -462,9 +461,7 @@ static void irq_domain_clear_mapping(struct irq_domain 
*domain,
if (hwirq < domain->revmap_size) {
domain->linear_revmap[hwirq] = 0;
} else {
-   mutex_lock(>revmap_tree_mutex);
-   radix_tree_delete(>revmap_tree, hwirq);
-   mutex_unlock(>revmap_tree_mutex);
+   xa_erase(>revmap_array, hwirq);
}
 }
 
@@ -475,9 +472,7 @@ static void irq_domain_set_mapping(struct irq_domain 
*domain,
if (hwirq < domain->revmap_size) {
domain->linear_revmap[hwirq] = irq_data->irq;
} else {
-   mutex_lock(>revmap_tree_mutex);
-   radix_tree_insert(>revmap_tree, hwirq, irq_data);
-   mutex_unlock(>revmap_tree_mutex);
+   xa_store(>rev

[PATCH v6 72/99] xfs: Convert xfs dquot to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

This is a pretty straightforward conversion.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/xfs/xfs_dquot.c | 38 +-
 fs/xfs/xfs_qm.c| 36 ++--
 fs/xfs/xfs_qm.h| 18 +-
 3 files changed, 48 insertions(+), 44 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index e2a466df5dd1..c6832db23ca8 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -44,7 +44,7 @@
  * Lock order:
  *
  * ip->i_lock
- *   qi->qi_tree_lock
+ *   qi->qi_xa_lock
  * dquot->q_qlock (xfs_dqlock() and friends)
  *   dquot->q_flush (xfs_dqflock() and friends)
  *   qi->qi_lru_lock
@@ -752,8 +752,8 @@ xfs_qm_dqget(
xfs_dquot_t **O_dqpp) /* OUT : locked incore dquot */
 {
struct xfs_quotainfo*qi = mp->m_quotainfo;
-   struct radix_tree_root *tree = xfs_dquot_tree(qi, type);
-   struct xfs_dquot*dqp;
+   struct xarray   *xa = xfs_dquot_xa(qi, type);
+   struct xfs_dquot*dqp, *duplicate;
int error;
 
ASSERT(XFS_IS_QUOTA_RUNNING(mp));
@@ -772,23 +772,24 @@ xfs_qm_dqget(
}
 
 restart:
-   mutex_lock(>qi_tree_lock);
-   dqp = radix_tree_lookup(tree, id);
+   mutex_lock(>qi_xa_lock);
+   dqp = xa_load(xa, id);
+found:
if (dqp) {
xfs_dqlock(dqp);
if (dqp->dq_flags & XFS_DQ_FREEING) {
xfs_dqunlock(dqp);
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
trace_xfs_dqget_freeing(dqp);
delay(1);
goto restart;
}
 
-   /* uninit / unused quota found in radix tree, keep looking  */
+   /* uninit / unused quota found, keep looking  */
if (flags & XFS_QMOPT_DQNEXT) {
if (XFS_IS_DQUOT_UNINITIALIZED(dqp)) {
xfs_dqunlock(dqp);
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
error = xfs_dq_get_next_id(mp, type, );
if (error)
return error;
@@ -797,14 +798,14 @@ xfs_qm_dqget(
}
 
dqp->q_nrefs++;
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
 
trace_xfs_dqget_hit(dqp);
XFS_STATS_INC(mp, xs_qm_dqcachehits);
*O_dqpp = dqp;
return 0;
}
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
XFS_STATS_INC(mp, xs_qm_dqcachemisses);
 
/*
@@ -854,20 +855,23 @@ xfs_qm_dqget(
}
}
 
-   mutex_lock(>qi_tree_lock);
-   error = radix_tree_insert(tree, id, dqp);
-   if (unlikely(error)) {
-   WARN_ON(error != -EEXIST);
+   mutex_lock(>qi_xa_lock);
+   duplicate = xa_cmpxchg(xa, id, NULL, dqp, GFP_NOFS);
+   if (unlikely(duplicate)) {
+   if (xa_is_err(duplicate)) {
+   mutex_unlock(>qi_xa_lock);
+   return xa_err(duplicate);
+   }
 
/*
 * Duplicate found. Just throw away the new dquot and start
 * over.
 */
-   mutex_unlock(>qi_tree_lock);
trace_xfs_dqget_dup(dqp);
xfs_qm_dqdestroy(dqp);
XFS_STATS_INC(mp, xs_qm_dquot_dups);
-   goto restart;
+   dqp = duplicate;
+   goto found;
}
 
/*
@@ -877,7 +881,7 @@ xfs_qm_dqget(
dqp->q_nrefs = 1;
 
qi->qi_dquots++;
-   mutex_unlock(>qi_tree_lock);
+   mutex_unlock(>qi_xa_lock);
 
/* If we are asked to find next active id, keep looking */
if (flags & XFS_QMOPT_DQNEXT) {
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index b897b11afb2c..000b207762d6 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -67,7 +67,7 @@ xfs_qm_dquot_walk(
void*data)
 {
struct xfs_quotainfo*qi = mp->m_quotainfo;
-   struct radix_tree_root  *tree = xfs_dquot_tree(qi, type);
+   struct xarray   *xa = xfs_dquot_xa(qi, type);
uint32_tnext_index;
int last_error = 0;
int skipped;
@@ -83,11 +83,11 @@ xfs_qm_dquot_walk(
int error = 0;
int i;
 
-   mutex_lock(>qi_tree_lock);
-   nr_found = radix_tree_gang_lookup(tree, (void **)batch,
-   

[PATCH v6 75/99] md: Convert raid5-cache to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

This is the first user of the radix tree I've converted that was
storing numbers rather than pointers.  I'm fairly pleased with how
well it came out.  There's less boilerplate involved than there was
with the radix tree, so that's a win.  It does use the advanced API,
and I think that's a signal that there needs to be a separate API for
using the XArray to store only integers.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
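Not part of the commit: the r5c_try_caching_write() hunk is cut off at
the end of this mail.  As an illustration of storing bare counters, the
per-big_stripe refcount increment comes out roughly like this with the
advanced API, assuming the xa_mk_value()/xa_to_value() helpers from
earlier in the series (no more shifting the count by
R5C_RADIX_COUNT_SHIFT to disguise it as a pointer; the xas_nomem()
allocation retry is omitted for brevity):

	XA_STATE(xas, &log->big_stripe, r5c_index(conf, sh->sector));
	void *entry;
	uintptr_t refcount = 0;

	xas_lock(&xas);
	entry = xas_load(&xas);
	if (entry)
		refcount = xa_to_value(entry);
	xas_store(&xas, xa_mk_value(refcount + 1));
	xas_unlock(&xas);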
 drivers/md/raid5-cache.c | 119 ---
 1 file changed, 40 insertions(+), 79 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 39f31f07ffe9..2c8ad0ed9b48 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -158,9 +158,8 @@ struct r5l_log {
/* to disable write back during in degraded mode */
struct work_struct disable_writeback_work;
 
-   /* to for chunk_aligned_read in writeback mode, details below */
-   spinlock_t tree_lock;
-   struct radix_tree_root big_stripe_tree;
+   /* for chunk_aligned_read in writeback mode, details below */
+   struct xarray big_stripe;
 };
 
 /*
@@ -170,9 +169,8 @@ struct r5l_log {
  * chunk contains 64 4kB-page, so this chunk contain 64 stripes). For
  * chunk_aligned_read, these stripes are grouped into one "big_stripe".
  * For each big_stripe, we count how many stripes of this big_stripe
- * are in the write back cache. These data are tracked in a radix tree
- * (big_stripe_tree). We use radix_tree item pointer as the counter.
- * r5c_tree_index() is used to calculate keys for the radix tree.
+ * are in the write back cache. This counter is tracked in an xarray
+ * (big_stripe). r5c_index() is used to calculate the index.
  *
  * chunk_aligned_read() calls r5c_big_stripe_cached() to look up
  * big_stripe of each chunk in the tree. If this big_stripe is in the
@@ -180,9 +178,9 @@ struct r5l_log {
  * rcu_read_lock().
  *
  * It is necessary to remember whether a stripe is counted in
- * big_stripe_tree. Instead of adding new flag, we reuses existing flags:
+ * big_stripe. Instead of adding new flag, we reuses existing flags:
  * STRIPE_R5C_PARTIAL_STRIPE and STRIPE_R5C_FULL_STRIPE. If either of these
- * two flags are set, the stripe is counted in big_stripe_tree. This
+ * two flags are set, the stripe is counted in big_stripe. This
  * requires moving set_bit(STRIPE_R5C_PARTIAL_STRIPE) to
  * r5c_try_caching_write(); and moving clear_bit of
  * STRIPE_R5C_PARTIAL_STRIPE and STRIPE_R5C_FULL_STRIPE to
@@ -190,23 +188,13 @@ struct r5l_log {
  */
 
 /*
- * radix tree requests lowest 2 bits of data pointer to be 2b'00.
- * So it is necessary to left shift the counter by 2 bits before using it
- * as data pointer of the tree.
- */
-#define R5C_RADIX_COUNT_SHIFT 2
-
-/*
- * calculate key for big_stripe_tree
+ * calculate key for big_stripe
  *
  * sect: align_bi->bi_iter.bi_sector or sh->sector
  */
-static inline sector_t r5c_tree_index(struct r5conf *conf,
- sector_t sect)
+static inline sector_t r5c_index(struct r5conf *conf, sector_t sect)
 {
-   sector_t offset;
-
-   offset = sector_div(sect, conf->chunk_sectors);
+   sector_div(sect, conf->chunk_sectors);
return sect;
 }
 
@@ -2646,10 +2634,6 @@ int r5c_try_caching_write(struct r5conf *conf,
int i;
struct r5dev *dev;
int to_cache = 0;
-   void **pslot;
-   sector_t tree_index;
-   int ret;
-   uintptr_t refcount;
 
BUG_ON(!r5c_is_writeback(log));
 
@@ -2697,39 +2681,29 @@ int r5c_try_caching_write(struct r5conf *conf,
}
}
 
-   /* if the stripe is not counted in big_stripe_tree, add it now */
+   /* if the stripe is not counted in big_stripe, add it now */
if (!test_bit(STRIPE_R5C_PARTIAL_STRIPE, >state) &&
!test_bit(STRIPE_R5C_FULL_STRIPE, >state)) {
-   tree_index = r5c_tree_index(conf, sh->sector);
-   spin_lock(>tree_lock);
-   pslot = radix_tree_lookup_slot(>big_stripe_tree,
-  tree_index);
-   if (pslot) {
-   refcount = (uintptr_t)radix_tree_deref_slot_protected(
-   pslot, >tree_lock) >>
-   R5C_RADIX_COUNT_SHIFT;
-   radix_tree_replace_slot(
-   >big_stripe_tree, pslot,
-   (void *)((refcount + 1) << 
R5C_RADIX_COUNT_SHIFT));
-   } else {
-   /*
-* this radix_tree_insert can fail safely, so no
-* need to call radix_tree_preload()
-*/
-   ret = radix_tree_insert(
-   >big_stripe_tree, t

[PATCH v6 77/99] fscache: Convert to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Removes another user of radix_tree_preload().

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/fscache/cookie.c |   6 +-
 fs/fscache/internal.h   |   2 +-
 fs/fscache/object.c |   2 +-
 fs/fscache/page.c   | 152 +---
 fs/fscache/stats.c  |   6 +-
 include/linux/fscache.h |   8 +--
 6 files changed, 76 insertions(+), 100 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index e9054e0c1a49..6d45134d609e 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -109,9 +109,7 @@ struct fscache_cookie *__fscache_acquire_cookie(
cookie->netfs_data  = netfs_data;
cookie->flags   = (1 << FSCACHE_COOKIE_NO_DATA_YET);
 
-   /* radix tree insertion won't use the preallocation pool unless it's
-* told it may not wait */
-   INIT_RADIX_TREE(>stores, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
+   xa_init(>stores);
 
switch (cookie->def->type) {
case FSCACHE_COOKIE_TYPE_INDEX:
@@ -608,7 +606,7 @@ void __fscache_relinquish_cookie(struct fscache_cookie 
*cookie, bool retire)
/* Clear pointers back to the netfs */
cookie->netfs_data  = NULL;
cookie->def = NULL;
-   BUG_ON(!radix_tree_empty(>stores));
+   BUG_ON(!xa_empty(>stores));
 
if (cookie->parent) {
ASSERTCMP(atomic_read(>parent->usage), >, 0);
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 0ff4b49a0037..468d9bd7f8c3 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -200,7 +200,7 @@ extern atomic_t fscache_n_stores_oom;
 extern atomic_t fscache_n_store_ops;
 extern atomic_t fscache_n_store_calls;
 extern atomic_t fscache_n_store_pages;
-extern atomic_t fscache_n_store_radix_deletes;
+extern atomic_t fscache_n_store_xarray_deletes;
 extern atomic_t fscache_n_store_pages_over_limit;
 
 extern atomic_t fscache_n_store_vmscan_not_storing;
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index aa0e71f02c33..ed165736a358 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -956,7 +956,7 @@ static const struct fscache_state 
*_fscache_invalidate_object(struct fscache_obj
 * retire the object instead.
 */
if (!fscache_use_cookie(object)) {
-   ASSERT(radix_tree_empty(>cookie->stores));
+   ASSERT(xa_empty(>cookie->stores));
set_bit(FSCACHE_OBJECT_RETIRED, >flags);
_leave(" [no cookie]");
return transit_to(KILL_OBJECT);
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 961029e04027..315e2745f822 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -22,13 +22,7 @@
  */
 bool __fscache_check_page_write(struct fscache_cookie *cookie, struct page 
*page)
 {
-   void *val;
-
-   rcu_read_lock();
-   val = radix_tree_lookup(>stores, page->index);
-   rcu_read_unlock();
-
-   return val != NULL;
+   return xa_load(>stores, page->index) != NULL;
 }
 EXPORT_SYMBOL(__fscache_check_page_write);
 
@@ -64,15 +58,15 @@ bool __fscache_maybe_release_page(struct fscache_cookie 
*cookie,
  struct page *page,
  gfp_t gfp)
 {
+   XA_STATE(xas, >stores, page->index);
struct page *xpage;
-   void *val;
 
_enter("%p,%p,%x", cookie, page, gfp);
 
 try_again:
rcu_read_lock();
-   val = radix_tree_lookup(>stores, page->index);
-   if (!val) {
+   xpage = xas_load();
+   if (!xpage) {
rcu_read_unlock();
fscache_stat(_n_store_vmscan_not_storing);
__fscache_uncache_page(cookie, page);
@@ -81,31 +75,32 @@ bool __fscache_maybe_release_page(struct fscache_cookie 
*cookie,
 
/* see if the page is actually undergoing storage - if so we can't get
 * rid of it till the cache has finished with it */
-   if (radix_tree_tag_get(>stores, page->index,
-  FSCACHE_COOKIE_STORING_TAG)) {
+   if (xas_get_tag(, FSCACHE_COOKIE_STORING_TAG)) {
rcu_read_unlock();
+   xas_retry(, XA_RETRY_ENTRY);
goto page_busy;
}
 
/* the page is pending storage, so we attempt to cancel the store and
 * discard the store request so that the page can be reclaimed */
-   spin_lock(>stores_lock);
+   xas_retry(, XA_RETRY_ENTRY);
+   xas_lock();
rcu_read_unlock();
 
-   if (radix_tree_tag_get(>stores, page->index,
-  FSCACHE_COOKIE_STORING_TAG)) {
+   xpage = xas_load();
+   if (xas_get_tag(, FSCACHE_COOKIE_STORING_TAG)) {
/* the page started to undergo storage whilst we were looking,
 *

[PATCH v6 20/99] ida: Convert to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Use the XArray infrastructure like we used the radix tree infrastructure.
This lets us get rid of idr_get_free() from the radix tree code.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
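Not part of the commit: a two-line illustration of the indexing scheme
described in the developer's notes below.  Each XArray entry covers one
IDA leaf bitmap, so an ID splits into an array index and a bit within
that leaf:

	unsigned long index = id / IDA_BITMAP_BITS;	/* which leaf entry */
	unsigned int  bit   = id % IDA_BITMAP_BITS;	/* which bit in it  */

	/* As an optimisation, a leaf whose set bits are all small may be
	 * stored directly in the entry instead of as a 128-byte bitmap. */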
 include/linux/idr.h|   8 +-
 include/linux/radix-tree.h |   4 -
 lib/idr.c  | 320 ++---
 lib/radix-tree.c   | 119 -
 4 files changed, 187 insertions(+), 264 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 9064ae5f0abc..ad4199247301 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -232,11 +232,11 @@ struct ida_bitmap {
 DECLARE_PER_CPU(struct ida_bitmap *, ida_bitmap);
 
 struct ida {
-   struct radix_tree_root  ida_rt;
+   struct xarray   ida_xa;
 };
 
 #define IDA_INIT(name) {   \
-   .ida_rt = RADIX_TREE_INIT(name, IDR_INIT_FLAGS | GFP_NOWAIT),   \
+   .ida_xa = XARRAY_INIT_FLAGS(name.ida_xa, IDR_INIT_FLAGS)\
 }
 #define DEFINE_IDA(name)   struct ida name = IDA_INIT(name)
 
@@ -251,7 +251,7 @@ void ida_simple_remove(struct ida *ida, unsigned int id);
 
 static inline void ida_init(struct ida *ida)
 {
-   INIT_RADIX_TREE(>ida_rt, IDR_INIT_FLAGS | GFP_NOWAIT);
+   xa_init_flags(>ida_xa, IDR_INIT_FLAGS);
 }
 
 /**
@@ -268,6 +268,6 @@ static inline int ida_get_new(struct ida *ida, int *p_id)
 
 static inline bool ida_is_empty(const struct ida *ida)
 {
-   return radix_tree_empty(>ida_rt);
+   return xa_empty(>ida_xa);
 }
 #endif /* _LINUX_IDR_H */
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index f64beb9ba175..4c5c36414a80 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -302,10 +302,6 @@ int radix_tree_split(struct radix_tree_root *, unsigned 
long index,
 int radix_tree_join(struct radix_tree_root *, unsigned long index,
unsigned new_order, void *);
 
-void __rcu **idr_get_free(struct radix_tree_root *root,
- struct radix_tree_iter *iter, gfp_t gfp,
- unsigned long max);
-
 enum {
RADIX_TREE_ITER_TAG_MASK = 0x0f,/* tag index in lower nybble */
RADIX_TREE_ITER_TAGGED   = 0x10,/* lookup tagged slots */
diff --git a/lib/idr.c b/lib/idr.c
index 379eaa8cb75b..7e9a8850b613 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -13,7 +13,6 @@
 #include 
 
 DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap);
-static DEFINE_SPINLOCK(simple_ida_lock);
 
 /* In radix-tree.c temporarily */
 extern bool idr_nomem(struct xa_state *, gfp_t);
@@ -337,26 +336,23 @@ EXPORT_SYMBOL_GPL(idr_replace);
 /*
  * Developer's notes:
  *
- * The IDA uses the functionality provided by the IDR & radix tree to store
- * bitmaps in each entry.  The XA_FREE_TAG tag means there is at least one bit
- * free, unlike the IDR where it means at least one entry is free.
- *
- * I considered telling the radix tree that each slot is an order-10 node
- * and storing the bit numbers in the radix tree, but the radix tree can't
- * allow a single multiorder entry at index 0, which would significantly
- * increase memory consumption for the IDA.  So instead we divide the index
- * by the number of bits in the leaf bitmap before doing a radix tree lookup.
- *
- * As an optimisation, if there are only a few low bits set in any given
- * leaf, instead of allocating a 128-byte bitmap, we store the bits
+ * The IDA uses the functionality provided by the IDR & XArray to store
+ * bitmaps in each entry.  The XA_FREE_TAG tag is used to mean that there
+ * is at least one bit free, unlike the IDR where it means at least one
+ * array entry is free.
+ *
+ * The XArray supports multi-index entries, so I considered teaching the
+ * XArray that each slot is an order-10 node and indexing the XArray by the
+ * ID.  The XArray has the significant optimisation of storing the first
+ * entry in the struct xarray and avoiding allocating an xa_node.
+ * Unfortunately, it can't do that for multi-order entries.
+ * So instead the XArray index is the ID divided by the number of bits in
+ * the bitmap
+ *
+ * As a further optimisation, if there are only a few low bits set in any
+ * given leaf, instead of allocating a 128-byte bitmap, we store the bits
  * directly in the entry.
  *
- * We allow the radix tree 'exceptional' count to get out of date.  Nothing
- * in the IDA nor the radix tree code checks it.  If it becomes important
- * to maintain an accurate exceptional count, switch the rcu_assign_pointer()
- * calls to radix_tree_iter_replace() which will correct the exceptional
- * count.
- *
  * The IDA always requires a lock to alloc/free.  If we add a 'test_bit'
  * equivalent, it will still need locking.  Going to RCU lookup would require
  * using RCU to free bitmaps, and that's not trivial without embedding an
@@ -366,104 +362,114 

[PATCH v6 88/99] btrfs: Convert reada_tree to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Rename reada_tree to reada_array.  Use the xa_lock in reada_array to
replace reada_lock.  This has to use a nested spinlock as we take the
xa_lock of the reada_extents and reada_zones xarrays while holding
the reada_lock.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/btrfs/ctree.h   |  15 +--
 fs/btrfs/disk-io.c |   3 +-
 fs/btrfs/reada.c   | 119 +
 3 files changed, 70 insertions(+), 67 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 173d72dfaab6..272d099bed7e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1052,9 +1052,8 @@ struct btrfs_fs_info {
 
struct btrfs_delayed_root *delayed_root;
 
-   /* readahead tree */
-   spinlock_t reada_lock;
-   struct radix_tree_root reada_tree;
+   /* readahead extents */
+   struct xarray reada_array;
 
/* readahead works cnt */
atomic_t reada_works_cnt;
@@ -1102,6 +1101,16 @@ struct btrfs_fs_info {
 #endif
 };
 
+static inline void reada_lock(struct btrfs_fs_info *fs_info)
+{
+   spin_lock_nested(_info->reada_array.xa_lock, SINGLE_DEPTH_NESTING);
+}
+
+static inline void reada_unlock(struct btrfs_fs_info *fs_info)
+{
+   spin_unlock(_info->reada_array.xa_lock);
+}
+
 static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
 {
return sb->s_fs_info;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62995a55d112..1eae29045d43 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2478,8 +2478,7 @@ int open_ctree(struct super_block *sb,
fs_info->commit_interval = BTRFS_DEFAULT_COMMIT_INTERVAL;
fs_info->avg_delayed_ref_runtime = NSEC_PER_SEC >> 6; /* div by 64 */
/* readahead state */
-   INIT_RADIX_TREE(_info->reada_tree, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
-   spin_lock_init(_info->reada_lock);
+   xa_init(_info->reada_array);
btrfs_init_ref_verify(fs_info);
 
fs_info->thread_pool_size = min_t(unsigned long,
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 8100f1565250..89ba0063903f 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -215,12 +215,11 @@ int btree_readahead_hook(struct extent_buffer *eb, int 
err)
struct reada_extent *re;
 
/* find extent */
-   spin_lock(_info->reada_lock);
-   re = radix_tree_lookup(_info->reada_tree,
-  eb->start >> PAGE_SHIFT);
+   reada_lock(fs_info);
+   re = xa_load(_info->reada_array, eb->start >> PAGE_SHIFT);
if (re)
re->refcnt++;
-   spin_unlock(_info->reada_lock);
+   reada_unlock(fs_info);
if (!re) {
ret = -1;
goto start_machine;
@@ -246,15 +245,15 @@ static struct reada_zone *reada_find_zone(struct 
btrfs_device *dev, u64 logical,
unsigned long index = logical >> PAGE_SHIFT;
int i;
 
-   spin_lock(_info->reada_lock);
+   reada_lock(fs_info);
zone = xa_find(>reada_zones, , ULONG_MAX, XA_PRESENT);
if (zone && logical >= zone->start && logical <= zone->end) {
kref_get(>refcnt);
-   spin_unlock(_info->reada_lock);
+   reada_unlock(fs_info);
return zone;
}
 
-   spin_unlock(_info->reada_lock);
+   reada_unlock(fs_info);
 
cache = btrfs_lookup_block_group(fs_info, logical);
if (!cache)
@@ -289,7 +288,7 @@ static struct reada_zone *reada_find_zone(struct 
btrfs_device *dev, u64 logical,
}
zone->ndevs = bbio->num_stripes;
 
-   spin_lock(_info->reada_lock);
+   reada_lock(fs_info);
curr = xa_cmpxchg(>reada_zones,
(unsigned long)(zone->end >> PAGE_SHIFT),
NULL, zone, GFP_NOWAIT | __GFP_NOWARN);
@@ -301,7 +300,7 @@ static struct reada_zone *reada_find_zone(struct 
btrfs_device *dev, u64 logical,
else
zone = NULL;
}
-   spin_unlock(_info->reada_lock);
+   reada_unlock(fs_info);
 
return zone;
 }
@@ -323,11 +322,11 @@ static struct reada_extent *reada_find_extent(struct 
btrfs_fs_info *fs_info,
int dev_replace_is_ongoing;
int have_zone = 0;
 
-   spin_lock(_info->reada_lock);
-   re = radix_tree_lookup(_info->reada_tree, index);
+   reada_lock(fs_info);
+   re = xa_load(_info->reada_array, index);
if (re)
re->refcnt++;
-   spin_unlock(_info->reada_lock);
+   reada_unlock(fs_info);
 
if (re)
return re;
@@ -378,38 +377,32 @@ static struct reada_extent *reada_find_extent(struct 
btrfs_fs_info *fs_info,
kref_get(>refcnt);
++zone->el

[PATCH v6 24/99] page cache: Add and replace pages using the XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Use the XArray APIs to add and replace pages in the page cache.  This
removes two uses of the radix tree preload API and is significantly
shorter code.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
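Not part of the commit: the reason replace_page_cache_page() can no
longer fail is that xas_store() over an index which is already occupied
reuses the existing slot, so no node allocation (and therefore no
radix_tree_preload()) is needed.  Trimmed sketch of the new core, with
the accounting and freepage handling elided:

	XA_STATE(xas, &mapping->pages, old->index);

	xas_lock_irqsave(&xas, flags);
	xas_store(&xas, new);		/* slot already exists for old */
	old->mapping = NULL;
	/* ... statistics, mem_cgroup_migrate(), freepage(old) ... */
	xas_unlock_irqrestore(&xas, flags);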
 include/linux/swap.h |   8 ++-
 mm/filemap.c | 143 ++-
 2 files changed, 67 insertions(+), 84 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index c2b8128799c1..394957963c4b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -299,8 +299,12 @@ void *workingset_eviction(struct address_space *mapping, 
struct page *page);
 bool workingset_refault(void *shadow);
 void workingset_activation(struct page *page);
 
-/* Do not use directly, use workingset_lookup_update */
-void workingset_update_node(struct radix_tree_node *node);
+/* Only track the nodes of mappings with shadow entries */
+void workingset_update_node(struct xa_node *node);
+#define mapping_set_update(xas, mapping) do {  \
+   if (!dax_mapping(mapping) && !shmem_mapping(mapping))   \
+   xas_set_update(xas, workingset_update_node);\
+} while (0)
 
 /* Returns workingset_update_node() if the mapping has shadow entries. */
 #define workingset_lookup_update(mapping)  \
diff --git a/mm/filemap.c b/mm/filemap.c
index f1b4480723dd..e6371b551de1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -112,35 +112,6 @@
  *   ->tasklist_lock(memory_failure, collect_procs_ao)
  */
 
-static int page_cache_tree_insert(struct address_space *mapping,
- struct page *page, void **shadowp)
-{
-   struct radix_tree_node *node;
-   void **slot;
-   int error;
-
-   error = __radix_tree_create(>pages, page->index, 0,
-   , );
-   if (error)
-   return error;
-   if (*slot) {
-   void *p;
-
-   p = radix_tree_deref_slot_protected(slot,
-   >pages.xa_lock);
-   if (!xa_is_value(p))
-   return -EEXIST;
-
-   mapping->nrexceptional--;
-   if (shadowp)
-   *shadowp = p;
-   }
-   __radix_tree_replace(>pages, node, slot, page,
-workingset_lookup_update(mapping));
-   mapping->nrpages++;
-   return 0;
-}
-
 static void page_cache_tree_delete(struct address_space *mapping,
   struct page *page, void *shadow)
 {
@@ -776,51 +747,44 @@ EXPORT_SYMBOL(file_write_and_wait_range);
  * locked.  This function does not add the new page to the LRU, the
  * caller must do that.
  *
- * The remove + add is atomic.  The only way this function can fail is
- * memory allocation failure.
+ * The remove + add is atomic.  This function cannot fail.
  */
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 {
-   int error;
+   struct address_space *mapping = old->mapping;
+   void (*freepage)(struct page *) = mapping->a_ops->freepage;
+   pgoff_t offset = old->index;
+   XA_STATE(xas, &mapping->pages, offset);
+   unsigned long flags;
 
VM_BUG_ON_PAGE(!PageLocked(old), old);
VM_BUG_ON_PAGE(!PageLocked(new), new);
VM_BUG_ON_PAGE(new->mapping, new);
 
-   error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
-   if (!error) {
-   struct address_space *mapping = old->mapping;
-   void (*freepage)(struct page *);
-   unsigned long flags;
-
-   pgoff_t offset = old->index;
-   freepage = mapping->a_ops->freepage;
-
-   get_page(new);
-   new->mapping = mapping;
-   new->index = offset;
+   get_page(new);
+   new->mapping = mapping;
+   new->index = offset;
 
-   xa_lock_irqsave(&mapping->pages, flags);
-   __delete_from_page_cache(old, NULL);
-   error = page_cache_tree_insert(mapping, new, NULL);
-   BUG_ON(error);
+   xas_lock_irqsave(&xas, flags);
+   xas_store(&xas, new);
 
-   /*
-* hugetlb pages do not participate in page cache accounting.
-*/
-   if (!PageHuge(new))
-   __inc_node_page_state(new, NR_FILE_PAGES);
-   if (PageSwapBacked(new))
-   __inc_node_page_state(new, NR_SHMEM);
-   xa_unlock_irqrestore(>pages, flags);
-   mem_cgroup_migrate(old, new);
-   radix_tree_preload_end();
-   if (freepage)
-   freepage(old);
-   put_page(old);
-   }
+   old->mapping = NULL;
+   /* hugetlb pages do not participate in page cache acc

[PATCH v6 38/99] mm: Convert collapse_shmem to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

I found another victim of the radix tree being hard to use.  Because
there was no call to radix_tree_preload(), khugepaged was allocating
radix_tree_nodes using GFP_ATOMIC.

I also converted a local_irq_save()/restore() pair to
disable()/enable().

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
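Not part of the commit: the GFP_ATOMIC problem disappears because the
new code reserves every slot in the range up front, and is allowed to
drop the lock and sleep for the allocation.  The retry loop from the
hunk below, annotated:

	do {
		xas_lock_irq(&xas);
		xas_create_range(&xas, end - 1);	/* preallocate slots */
		if (!xas_error(&xas))
			break;				/* range fully built */
		xas_unlock_irq(&xas);
		if (!xas_nomem(&xas, GFP_KERNEL))	/* sleepable alloc */
			goto out;			/* real failure */
	} while (1);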
 mm/khugepaged.c | 158 +++-
 1 file changed, 65 insertions(+), 93 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 55ade70c33bb..9f49d0cd61c2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1282,17 +1282,17 @@ static void retract_page_tables(struct address_space 
*mapping, pgoff_t pgoff)
  *
  * Basic scheme is simple, details are more complex:
  *  - allocate and freeze a new huge page;
- *  - scan over radix tree replacing old pages the new one
+ *  - scan page cache replacing old pages with the new one
  *+ swap in pages if necessary;
  *+ fill in gaps;
- *+ keep old pages around in case if rollback is required;
- *  - if replacing succeed:
+ *+ keep old pages around in case rollback is required;
+ *  - if replacing succeeds:
  *+ copy data over;
  *+ free old pages;
  *+ unfreeze huge page;
  *  - if replacing failed;
  *+ put all pages back and unfreeze them;
- *+ restore gaps in the radix-tree;
+ *+ restore gaps in the page cache;
  *+ free huge page;
  */
 static void collapse_shmem(struct mm_struct *mm,
@@ -1300,12 +1300,11 @@ static void collapse_shmem(struct mm_struct *mm,
struct page **hpage, int node)
 {
gfp_t gfp;
-   struct page *page, *new_page, *tmp;
+   struct page *new_page;
struct mem_cgroup *memcg;
pgoff_t index, end = start + HPAGE_PMD_NR;
LIST_HEAD(pagelist);
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, &mapping->pages, start);
int nr_none = 0, result = SCAN_SUCCEED;
 
VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
@@ -1330,48 +1329,48 @@ static void collapse_shmem(struct mm_struct *mm,
__SetPageLocked(new_page);
BUG_ON(!page_ref_freeze(new_page, 1));
 
-
/*
-* At this point the new_page is 'frozen' (page_count() is zero), locked
-* and not up-to-date. It's safe to insert it into radix tree, because
-* nobody would be able to map it or use it in other way until we
-* unfreeze it.
+* At this point the new_page is 'frozen' (page_count() is zero),
+* locked and not up-to-date. It's safe to insert it into the page
+* cache, because nobody would be able to map it or use it in other
+* way until we unfreeze it.
 */
 
-   index = start;
-   xa_lock_irq(&mapping->pages);
-   radix_tree_for_each_slot(slot, &mapping->pages, &iter, start) {
-   int n = min(iter.index, end) - index;
-
-   /*
-* Handle holes in the radix tree: charge it from shmem and
-* insert relevant subpage of new_page into the radix-tree.
-*/
-   if (n && !shmem_charge(mapping->host, n)) {
-   result = SCAN_FAIL;
+   /* This will be less messy when we use multi-index entries */
+   do {
+   xas_lock_irq();
+   xas_create_range(, end - 1);
+   if (!xas_error())
break;
-   }
-   nr_none += n;
-   for (; index < min(iter.index, end); index++) {
-   radix_tree_insert(&mapping->pages, index,
-   new_page + (index % HPAGE_PMD_NR));
-   }
+   xas_unlock_irq(&xas);
+   if (!xas_nomem(&xas, GFP_KERNEL))
+   goto out;
+   } while (1);
 
-   /* We are done. */
-   if (index >= end)
-   break;
+   for (index = start; index < end; index++) {
+   struct page *page = xas_next(&xas);
+
+   VM_BUG_ON(index != xas.xa_index);
+   if (!page) {
+   if (!shmem_charge(mapping->host, 1)) {
+   result = SCAN_FAIL;
+   break;
+   }
+   xas_store(&xas, new_page + (index % HPAGE_PMD_NR));
+   nr_none++;
+   continue;
+   }
 
-   page = radix_tree_deref_slot_protected(slot,
-   &mapping->pages.xa_lock);
if (xa_is_value(page) || !PageUptodate(page)) {
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
/* swap in or instantiate fallocated page */
if (shmem_getpage(mapping->host, index, &page,
SGP_NOHUGE)) {

[PATCH v6 52/99] fs: Convert buffer to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Mostly comment fixes, but one use of __xa_set_tag.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/buffer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 1a6ae530156b..e1d18307d5c8 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -592,7 +592,7 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct 
inode *inode)
 EXPORT_SYMBOL(mark_buffer_dirty_inode);
 
 /*
- * Mark the page dirty, and set it dirty in the radix tree, and mark the inode
+ * Mark the page dirty, and set it dirty in the page cache, and mark the inode
  * dirty.
  *
  * If warn is true, then emit a warning if the page is not uptodate and has
@@ -609,8 +609,8 @@ void __set_page_dirty(struct page *page, struct 
address_space *mapping,
if (page->mapping) {/* Race with truncate? */
WARN_ON_ONCE(warn && !PageUptodate(page));
account_page_dirtied(page, mapping);
-   radix_tree_tag_set(&mapping->pages,
-   page_index(page), PAGECACHE_TAG_DIRTY);
+   __xa_set_tag(&mapping->pages, page_index(page),
+   PAGECACHE_TAG_DIRTY);
}
xa_unlock_irqrestore(>pages, flags);
 }
@@ -1072,7 +1072,7 @@ __getblk_slow(struct block_device *bdev, sector_t block,
  * The relationship between dirty buffers and dirty pages:
  *
  * Whenever a page has any dirty buffers, the page's dirty bit is set, and
- * the page is tagged dirty in its radix tree.
+ * the page is tagged dirty in the page cache.
  *
  * At all times, the dirtiness of the buffers represents the dirtiness of
  * subsections of the page.  If the page has buffers, the page dirty bit is
@@ -1095,9 +1095,9 @@ __getblk_slow(struct block_device *bdev, sector_t block,
  * mark_buffer_dirty - mark a buffer_head as needing writeout
  * @bh: the buffer_head to mark dirty
  *
- * mark_buffer_dirty() will set the dirty bit against the buffer, then set its
- * backing page dirty, then tag the page as dirty in its address_space's radix
- * tree and then attach the address_space's inode to its superblock's dirty
+ * mark_buffer_dirty() will set the dirty bit against the buffer, then set
+ * its backing page dirty, then tag the page as dirty in the page cache
+ * and then attach the address_space's inode to its superblock's dirty
  * inode list.
  *
  * mark_buffer_dirty() is atomic.  It takes bh->b_page->mapping->private_lock,
-- 
2.15.1


[PATCH v6 39/99] mm: Convert khugepaged_scan_shmem to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Slightly shorter and easier to read code.
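
For reference, a minimal sketch of the xas_for_each() walk this patch
switches to (illustrative names; not part of the patch):

#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/xarray.h>

static void example_walk(struct xarray *xa, unsigned long first,
                         unsigned long last)
{
        XA_STATE(xas, xa, first);
        void *entry;

        rcu_read_lock();
        xas_for_each(&xas, entry, last) {
                if (xas_retry(&xas, entry))     /* skip internal retry entries */
                        continue;
                /* ... inspect entry; xa_is_value() spots exceptional ones ... */
                if (need_resched()) {
                        xas_pause(&xas);        /* safe to drop RCU and resume */
                        cond_resched_rcu();
                }
        }
        rcu_read_unlock();
}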

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/khugepaged.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 9f49d0cd61c2..15f1b2d81a69 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1534,8 +1534,7 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
pgoff_t start, struct page **hpage)
 {
struct page *page = NULL;
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, >pages, start);
int present, swap;
int node = NUMA_NO_NODE;
int result = SCAN_SUCCEED;
@@ -1544,17 +1543,11 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
swap = 0;
memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
rcu_read_lock();
-   radix_tree_for_each_slot(slot, >pages, , start) {
-   if (iter.index >= start + HPAGE_PMD_NR)
-   break;
-
-   page = radix_tree_deref_slot(slot);
-   if (radix_tree_deref_retry(page)) {
-   slot = radix_tree_iter_retry();
+   xas_for_each(, page, start + HPAGE_PMD_NR - 1) {
+   if (xas_retry(, page))
continue;
-   }
 
-   if (radix_tree_exception(page)) {
+   if (xa_is_value(page)) {
if (++swap > khugepaged_max_ptes_swap) {
result = SCAN_EXCEED_SWAP_PTE;
break;
@@ -1593,7 +1586,7 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
present++;
 
if (need_resched()) {
-   slot = radix_tree_iter_resume(slot, );
+   xas_pause();
cond_resched_rcu();
}
}
-- 
2.15.1


[PATCH v6 87/99] btrfs: Convert reada_extents to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Straightforward conversion.
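
For reference, a minimal sketch of the xa_insert()/xa_find() calls the hunks
below lean on (function and variable names are illustrative):

#include <linux/xarray.h>

static int example_insert_then_find(struct xarray *xa, unsigned long index,
                                    void *item)
{
        int ret;

        /* Insert only if the slot is empty; NOFS context, no direct reclaim. */
        ret = xa_insert(xa, index, item, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
        if (ret)
                return ret;

        /* Find the first present entry at or after @index. */
        item = xa_find(xa, &index, ULONG_MAX, XA_PRESENT);
        return item ? 0 : -ENOENT;
}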

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/btrfs/reada.c   | 32 +---
 fs/btrfs/volumes.c |  2 +-
 fs/btrfs/volumes.h |  2 +-
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index ef8e84ff2012..8100f1565250 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -438,13 +438,14 @@ static struct reada_extent *reada_find_extent(struct 
btrfs_fs_info *fs_info,
continue;
}
prev_dev = dev;
-   ret = radix_tree_insert(>reada_extents, index, re);
+   ret = xa_insert(>reada_extents, index, re,
+   GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
if (ret) {
while (--nzones >= 0) {
dev = re->zones[nzones]->device;
BUG_ON(dev == NULL);
/* ignore whether the entry was inserted */
-   radix_tree_delete(>reada_extents, index);
+   xa_erase(>reada_extents, index);
}
radix_tree_delete(_info->reada_tree, index);
spin_unlock(_info->reada_lock);
@@ -504,7 +505,7 @@ static void reada_extent_put(struct btrfs_fs_info *fs_info,
for (i = 0; i < re->nzones; ++i) {
struct reada_zone *zone = re->zones[i];
 
-   radix_tree_delete(>device->reada_extents, index);
+   xa_erase(>device->reada_extents, index);
}
 
spin_unlock(_info->reada_lock);
@@ -644,6 +645,7 @@ static int reada_start_machine_dev(struct btrfs_device *dev)
int mirror_num = 0;
struct extent_buffer *eb = NULL;
u64 logical;
+   unsigned long index;
int ret;
int i;
 
@@ -660,19 +662,19 @@ static int reada_start_machine_dev(struct btrfs_device 
*dev)
 * a contiguous block of extents, we could also coagulate them or use
 * plugging to speed things up
 */
-   ret = radix_tree_gang_lookup(>reada_extents, (void **),
-dev->reada_next >> PAGE_SHIFT, 1);
-   if (ret == 0 || re->logical > dev->reada_curr_zone->end) {
+   index = dev->reada_next >> PAGE_SHIFT;
+   re = xa_find(>reada_extents, , ULONG_MAX, XA_PRESENT);
+   if (!re || re->logical > dev->reada_curr_zone->end) {
ret = reada_pick_zone(dev);
if (!ret) {
spin_unlock(_info->reada_lock);
return 0;
}
-   re = NULL;
-   ret = radix_tree_gang_lookup(>reada_extents, (void **),
-   dev->reada_next >> PAGE_SHIFT, 1);
+   index = dev->reada_next >> PAGE_SHIFT;
+   re = xa_find(>reada_extents, , ULONG_MAX,
+   XA_PRESENT);
}
-   if (ret == 0) {
+   if (!re) {
spin_unlock(_info->reada_lock);
return 0;
}
@@ -828,11 +830,11 @@ static void dump_devs(struct btrfs_fs_info *fs_info, int 
all)
cnt = 0;
index = 0;
while (all) {
-   struct reada_extent *re = NULL;
+   struct reada_extent *re;
 
-   ret = radix_tree_gang_lookup(>reada_extents,
-(void **), index, 1);
-   if (ret == 0)
+   re = xa_find(>reada_extents, , ULONG_MAX,
+   XA_PRESENT);
+   if (!re)
break;
pr_debug("  re: logical %llu size %u empty %d scheduled 
%d",
re->logical, fs_info->nodesize,
@@ -848,7 +850,7 @@ static void dump_devs(struct btrfs_fs_info *fs_info, int 
all)
}
}
pr_cont("\n");
-   index = (re->logical >> PAGE_SHIFT) + 1;
+   index++;
if (++cnt > 15)
break;
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8e683799b436..304c2ef4c557 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -248,7 +248,7 @@ static struct btrfs_device *__alloc_device(void)
atomic_set(>dev_stats_ccnt, 0);
btrfs_device_data_ordered_init(dev);
xa_init(>reada_zones);
-   INIT_RADIX_TREE(>re

[PATCH v6 30/99] mm: Convert page-writeback to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Includes moving mapping_tagged() to fs.h as a static inline, and
changing it to return bool.
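
For reference, a plain-form sketch of the tagged walk with periodic lock
drops that tag_pages_for_writeback() switches to below (illustrative, not
part of the patch):

#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/xarray.h>

static void example_retag(struct address_space *mapping,
                          pgoff_t start, pgoff_t end)
{
        XA_STATE(xas, &mapping->pages, start);
        unsigned int tagged = 0;
        void *page;

        xas_lock_irq(&xas);
        xas_for_each_tag(&xas, page, end, PAGECACHE_TAG_DIRTY) {
                xas_set_tag(&xas, PAGECACHE_TAG_TOWRITE);
                if (++tagged % XA_CHECK_SCHED)
                        continue;
                /* Every XA_CHECK_SCHED entries, drop the lock and reschedule. */
                xas_pause(&xas);
                xas_unlock_irq(&xas);
                cond_resched();
                xas_lock_irq(&xas);
        }
        xas_unlock_irq(&xas);
}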

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 include/linux/fs.h  | 17 +--
 mm/page-writeback.c | 62 +++--
 2 files changed, 32 insertions(+), 47 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index e4345c13e237..c58bc3c619bf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -470,15 +470,18 @@ struct block_device {
struct mutexbd_fsfreeze_mutex;
 } __randomize_layout;
 
+/* XArray tags, for tagging dirty and writeback pages in the pagecache. */
+#define PAGECACHE_TAG_DIRTYXA_TAG_0
+#define PAGECACHE_TAG_WRITEBACKXA_TAG_1
+#define PAGECACHE_TAG_TOWRITE  XA_TAG_2
+
 /*
- * Radix-tree tags, for tagging dirty and writeback pages within the pagecache
- * radix trees
+ * Returns true if any of the pages in the mapping are marked with the tag.
  */
-#define PAGECACHE_TAG_DIRTY0
-#define PAGECACHE_TAG_WRITEBACK1
-#define PAGECACHE_TAG_TOWRITE  2
-
-int mapping_tagged(struct address_space *mapping, int tag);
+static inline bool mapping_tagged(struct address_space *mapping, xa_tag_t tag)
+{
+   return xa_tagged(>pages, tag);
+}
 
 static inline void i_mmap_lock_write(struct address_space *mapping)
 {
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 588ce729d199..0407436a8305 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2098,33 +2098,25 @@ void __init page_writeback_init(void)
  * dirty pages in the file (thus it is important for this function to be quick
  * so that it can tag pages faster than a dirtying process can create them).
  */
-/*
- * We tag pages in batches of WRITEBACK_TAG_BATCH to reduce xa_lock latency.
- */
 void tag_pages_for_writeback(struct address_space *mapping,
 pgoff_t start, pgoff_t end)
 {
-#define WRITEBACK_TAG_BATCH 4096
-   unsigned long tagged = 0;
-   struct radix_tree_iter iter;
-   void **slot;
+   XA_STATE(xas, >pages, start);
+   unsigned int tagged = 0;
+   void *page;
 
-   xa_lock_irq(>pages);
-   radix_tree_for_each_tagged(slot, >pages, , start,
-   PAGECACHE_TAG_DIRTY) {
-   if (iter.index > end)
-   break;
-   radix_tree_iter_tag_set(>pages, ,
-   PAGECACHE_TAG_TOWRITE);
-   tagged++;
-   if ((tagged % WRITEBACK_TAG_BATCH) != 0)
+   xas_lock_irq();
+   xas_for_each_tag(, page, end, PAGECACHE_TAG_DIRTY) {
+   xas_set_tag(, PAGECACHE_TAG_TOWRITE);
+   if (++tagged % XA_CHECK_SCHED)
continue;
-   slot = radix_tree_iter_resume(slot, );
-   xa_unlock_irq(>pages);
+
+   xas_pause();
+   xas_unlock_irq();
cond_resched();
-   xa_lock_irq(>pages);
+   xas_lock_irq();
}
-   xa_unlock_irq(>pages);
+   xas_unlock_irq();
 }
 EXPORT_SYMBOL(tag_pages_for_writeback);
 
@@ -2164,7 +2156,7 @@ int write_cache_pages(struct address_space *mapping,
pgoff_t done_index;
int cycled;
int range_whole = 0;
-   int tag;
+   xa_tag_t tag;
 
pagevec_init();
if (wbc->range_cyclic) {
@@ -2445,7 +2437,7 @@ void account_page_cleaned(struct page *page, struct 
address_space *mapping,
 
 /*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
- * its radix tree.
+ * the xarray.
  *
  * This is also used when a single buffer is being dirtied: we want to set the
  * page dirty in that case, but not all the buffers.  This is a "bottom-up"
@@ -2471,7 +2463,7 @@ int __set_page_dirty_nobuffers(struct page *page)
BUG_ON(page_mapping(page) != mapping);
WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
account_page_dirtied(page, mapping);
-   radix_tree_tag_set(>pages, page_index(page),
+   __xa_set_tag(>pages, page_index(page),
   PAGECACHE_TAG_DIRTY);
xa_unlock_irqrestore(>pages, flags);
unlock_page_memcg(page);
@@ -2634,13 +2626,13 @@ EXPORT_SYMBOL(__cancel_dirty_page);
  * Returns true if the page was previously dirty.
  *
  * This is for preparing to put the page under writeout.  We leave the page
- * tagged as dirty in the radix tree so that a concurrent write-for-sync
+ * tagged as dirty in the xarray so that a concurrent write-for-sync
  * can discover it via a PAGECACHE_TAG_DIRTY walk.  The ->writepage
  * implementation will run either set_page_writeback() or set_page_dirty(),
- * at which stage we bring the page's dirty fl

[PATCH v6 32/99] mm: Convert truncate to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

This is essentially xa_cmpxchg() with the locking handled above us,
and it doesn't have to handle replacing a NULL entry.
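
For reference, a minimal sketch of what __clear_shadow_entry() below boils
down to: a compare-and-store done by hand under the caller's xa_lock (names
illustrative):

#include <linux/xarray.h>

/* Caller holds the array's lock; clear @index only if it still holds
 * @expected -- roughly what xa_cmpxchg(..., expected, NULL, ...) would do
 * with its own locking. */
static void example_clear_if_match(struct xarray *xa, unsigned long index,
                                   void *expected)
{
        XA_STATE(xas, xa, index);

        if (xas_load(&xas) != expected)
                return;
        xas_store(&xas, NULL);
}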

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/truncate.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 69bb743dd7e5..70323c347298 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -33,15 +33,12 @@
 static inline void __clear_shadow_entry(struct address_space *mapping,
pgoff_t index, void *entry)
 {
-   struct radix_tree_node *node;
-   void **slot;
+   XA_STATE(xas, >pages, index);
 
-   if (!__radix_tree_lookup(>pages, index, , ))
+   xas_set_update(, workingset_update_node);
+   if (xas_load() != entry)
return;
-   if (*slot != entry)
-   return;
-   __radix_tree_replace(>pages, node, slot, NULL,
-workingset_update_node);
+   xas_store(, NULL);
mapping->nrexceptional--;
 }
 
@@ -746,10 +743,10 @@ int invalidate_inode_pages2_range(struct address_space 
*mapping,
index++;
}
/*
-* For DAX we invalidate page tables after invalidating radix tree.  We
+* For DAX we invalidate page tables after invalidating page cache.  We
 * could invalidate page tables while invalidating each entry however
 * that would be expensive. And doing range unmapping before doesn't
-* work as we have no cheap way to find whether radix tree entry didn't
+* work as we have no cheap way to find whether page cache entry didn't
 * get remapped later.
 */
if (dax_mapping(mapping)) {
-- 
2.15.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 79/99] blk-cgroup: Convert to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

This call to radix_tree_preload is awkward.  At the point of allocation,
we're not only under a local lock but also under the queue lock, so we
can't back out, drop the locks and retry the allocation.  Replace this
preload call with a call to xa_reserve() which will ensure the memory is
allocated.
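
For reference, a minimal sketch of the reserve-then-store pattern (names
illustrative; the real code below stores at q->id under blkcg's lock):

#include <linux/xarray.h>

static int example_reserve_then_store(struct xarray *xa, unsigned long index,
                                      void *item)
{
        int ret;

        /* May sleep; guarantees the later store at @index needs no memory. */
        ret = xa_reserve(xa, index, GFP_KERNEL);
        if (ret)
                return ret;

        xa_lock(xa);
        ret = xa_err(__xa_store(xa, index, item, GFP_NOWAIT));
        xa_unlock(xa);
        return ret;
}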

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 block/bfq-cgroup.c |  4 ++--
 block/blk-cgroup.c | 52 ++
 block/cfq-iosched.c|  4 ++--
 include/linux/blk-cgroup.h |  5 ++---
 4 files changed, 31 insertions(+), 34 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index da1525ec4c87..0648aaa6498b 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -860,7 +860,7 @@ static int bfq_io_set_weight_legacy(struct 
cgroup_subsys_state *css,
return ret;
 
ret = 0;
-   spin_lock_irq(>lock);
+   xa_lock_irq(>blkg_array);
bfqgd->weight = (unsigned short)val;
hlist_for_each_entry(blkg, >blkg_list, blkcg_node) {
struct bfq_group *bfqg = blkg_to_bfqg(blkg);
@@ -894,7 +894,7 @@ static int bfq_io_set_weight_legacy(struct 
cgroup_subsys_state *css,
bfqg->entity.prio_changed = 1;
}
}
-   spin_unlock_irq(>lock);
+   xa_unlock_irq(>blkg_array);
 
return ret;
 }
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 4117524ca45b..37962d52f1a8 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -146,12 +146,12 @@ struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg,
struct blkcg_gq *blkg;
 
/*
-* Hint didn't match.  Look up from the radix tree.  Note that the
+* Hint didn't match.  Fetch from the xarray.  Note that the
 * hint can only be updated under queue_lock as otherwise @blkg
-* could have already been removed from blkg_tree.  The caller is
+* could have already been removed from blkg_array.  The caller is
 * responsible for grabbing queue_lock if @update_hint.
 */
-   blkg = radix_tree_lookup(>blkg_tree, q->id);
+   blkg = xa_load(>blkg_array, q->id);
if (blkg && blkg->q == q) {
if (update_hint) {
lockdep_assert_held(q->queue_lock);
@@ -223,8 +223,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
}
 
/* insert */
-   spin_lock(>lock);
-   ret = radix_tree_insert(>blkg_tree, q->id, blkg);
+   xa_lock(>blkg_array);
+   ret = xa_err(__xa_store(>blkg_array, q->id, blkg, GFP_NOWAIT));
if (likely(!ret)) {
hlist_add_head_rcu(>blkcg_node, >blkg_list);
list_add(>q_node, >blkg_list);
@@ -237,7 +237,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
}
}
blkg->online = true;
-   spin_unlock(>lock);
+   xa_unlock(>blkg_array);
 
if (!ret)
return blkg;
@@ -314,7 +314,7 @@ static void blkg_destroy(struct blkcg_gq *blkg)
int i;
 
lockdep_assert_held(blkg->q->queue_lock);
-   lockdep_assert_held(>lock);
+   lockdep_assert_held(>blkg_array.xa_lock);
 
/* Something wrong if we are trying to remove same group twice */
WARN_ON_ONCE(list_empty(>q_node));
@@ -334,7 +334,7 @@ static void blkg_destroy(struct blkcg_gq *blkg)
 
blkg->online = false;
 
-   radix_tree_delete(>blkg_tree, blkg->q->id);
+   xa_erase(>blkg_array, blkg->q->id);
list_del_init(>q_node);
hlist_del_init_rcu(>blkcg_node);
 
@@ -368,9 +368,9 @@ static void blkg_destroy_all(struct request_queue *q)
list_for_each_entry_safe(blkg, n, >blkg_list, q_node) {
struct blkcg *blkcg = blkg->blkcg;
 
-   spin_lock(>lock);
+   xa_lock(>blkg_array);
blkg_destroy(blkg);
-   spin_unlock(>lock);
+   xa_unlock(>blkg_array);
}
 
q->root_blkg = NULL;
@@ -443,7 +443,7 @@ static int blkcg_reset_stats(struct cgroup_subsys_state 
*css,
int i;
 
mutex_lock(_pol_mutex);
-   spin_lock_irq(>lock);
+   xa_lock_irq(>blkg_array);
 
/*
 * Note that stat reset is racy - it doesn't synchronize against
@@ -462,7 +462,7 @@ static int blkcg_reset_stats(struct cgroup_subsys_state 
*css,
}
}
 
-   spin_unlock_irq(>lock);
+   xa_unlock_irq(>blkg_array);
mutex_unlock(_pol_mutex);
return 0;
 }
@@ -1012,7 +1012,7 @@ static void blkcg_css_offline(struct cgroup_subsys_state 
*css)
 {
struct blkcg *blkcg = css_to_blkcg(css);
 
-   spin_lock_irq(>lock);
+   xa_lock_irq(>blkg_array);
 
while (!hlist

[PATCH v6 43/99] shmem: Convert find_swap_entry to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

This is a 1:1 conversion.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/shmem.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 654f367aca90..ce285ae635ea 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1076,28 +1076,27 @@ static void shmem_evict_inode(struct inode *inode)
clear_inode(inode);
 }
 
-static unsigned long find_swap_entry(struct radix_tree_root *root, void *item)
+static unsigned long find_swap_entry(struct xarray *xa, void *item)
 {
-   struct radix_tree_iter iter;
-   void **slot;
-   unsigned long found = -1;
+   XA_STATE(xas, xa, 0);
unsigned int checked = 0;
+   void *entry;
 
rcu_read_lock();
-   radix_tree_for_each_slot(slot, root, , 0) {
-   if (*slot == item) {
-   found = iter.index;
+   xas_for_each(, entry, ULONG_MAX) {
+   if (xas_retry(, entry))
+   continue;
+   if (entry == item)
break;
-   }
checked++;
-   if ((checked % 4096) != 0)
+   if ((checked % XA_CHECK_SCHED) != 0)
continue;
-   slot = radix_tree_iter_resume(slot, );
+   xas_pause();
cond_resched_rcu();
}
-
rcu_read_unlock();
-   return found;
+
+   return xas_invalid() ? -1 : xas.xa_index;
 }
 
 /*
-- 
2.15.1


[PATCH v6 31/99] mm: Convert workingset to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

We construct a fake XA_STATE and use it to delete the node with xa_store()
rather than adding a special function for this unique use case.
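
For reference, a minimal sketch of deleting an entry through an XA_STATE
while keeping the node-accounting callback attached (how the real code
points the state at one specific node is omitted here; names illustrative):

#include <linux/swap.h>
#include <linux/xarray.h>

static void example_delete_with_update(struct xarray *xa, unsigned long index)
{
        XA_STATE(xas, xa, index);

        xas_set_update(&xas, workingset_update_node);
        xas_lock_irq(&xas);
        xas_store(&xas, NULL);          /* deletion; the callback sees the node */
        xas_unlock_irq(&xas);
}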

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 include/linux/swap.h |  9 -
 mm/workingset.c  | 51 ++-
 2 files changed, 22 insertions(+), 38 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 394957963c4b..e519554730fa 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -306,15 +306,6 @@ void workingset_update_node(struct xa_node *node);
xas_set_update(xas, workingset_update_node);\
 } while (0)
 
-/* Returns workingset_update_node() if the mapping has shadow entries. */
-#define workingset_lookup_update(mapping)  \
-({ \
-   radix_tree_update_node_t __helper = workingset_update_node; \
-   if (dax_mapping(mapping) || shmem_mapping(mapping)) \
-   __helper = NULL;\
-   __helper;   \
-})
-
 /* linux/mm/page_alloc.c */
 extern unsigned long totalram_pages;
 extern unsigned long totalreserve_pages;
diff --git a/mm/workingset.c b/mm/workingset.c
index 91b6e16ad4c1..f7ca6ea5d8b1 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -148,7 +148,7 @@
  * and activations is maintained (node->inactive_age).
  *
  * On eviction, a snapshot of this counter (along with some bits to
- * identify the node) is stored in the now empty page cache radix tree
+ * identify the node) is stored in the now empty page cache
  * slot of the evicted page.  This is called a shadow entry.
  *
  * On cache misses for which there are shadow entries, an eligible
@@ -162,7 +162,7 @@
 
 /*
  * Eviction timestamps need to be able to cover the full range of
- * actionable refaults. However, bits are tight in the radix tree
+ * actionable refaults. However, bits are tight in the xarray
  * entry, and after storing the identifier for the lruvec there might
  * not be enough left to represent every single actionable refault. In
  * that case, we have to sacrifice granularity for distance, and group
@@ -338,7 +338,7 @@ void workingset_activation(struct page *page)
 
 static struct list_lru shadow_nodes;
 
-void workingset_update_node(struct radix_tree_node *node)
+void workingset_update_node(struct xa_node *node)
 {
/*
 * Track non-empty nodes that contain only shadow entries;
@@ -370,7 +370,7 @@ static unsigned long count_shadow_nodes(struct shrinker 
*shrinker,
local_irq_enable();
 
/*
-* Approximate a reasonable limit for the radix tree nodes
+* Approximate a reasonable limit for the nodes
 * containing shadow entries. We don't need to keep more
 * shadow entries than possible pages on the active list,
 * since refault distances bigger than that are dismissed.
@@ -385,11 +385,11 @@ static unsigned long count_shadow_nodes(struct shrinker 
*shrinker,
 * worst-case density of 1/8th. Below that, not all eligible
 * refaults can be detected anymore.
 *
-* On 64-bit with 7 radix_tree_nodes per page and 64 slots
+* On 64-bit with 7 xa_nodes per page and 64 slots
 * each, this will reclaim shadow entries when they consume
 * ~1.8% of available memory:
 *
-* PAGE_SIZE / radix_tree_nodes / node_entries * 8 / PAGE_SIZE
+* PAGE_SIZE / xa_nodes / node_entries * 8 / PAGE_SIZE
 */
if (sc->memcg) {
cache = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
@@ -398,7 +398,7 @@ static unsigned long count_shadow_nodes(struct shrinker 
*shrinker,
cache = node_page_state(NODE_DATA(sc->nid), NR_ACTIVE_FILE) +
node_page_state(NODE_DATA(sc->nid), NR_INACTIVE_FILE);
}
-   max_nodes = cache >> (RADIX_TREE_MAP_SHIFT - 3);
+   max_nodes = cache >> (XA_CHUNK_SHIFT - 3);
 
if (nodes <= max_nodes)
return 0;
@@ -408,11 +408,11 @@ static unsigned long count_shadow_nodes(struct shrinker 
*shrinker,
 static enum lru_status shadow_lru_isolate(struct list_head *item,
  struct list_lru_one *lru,
  spinlock_t *lru_lock,
- void *arg)
+ void *arg) __must_hold(lru_lock)
 {
+   XA_STATE(xas, NULL, 0);
struct address_space *mapping;
-   struct radix_tree_node *node;
-   unsigned int i;
+   struct xa_node *node;
int ret;
 
/*
@@ -420,7 +420,7 @@ static enum lru_status shadow_lru_isolate(struct list_head 
*item,
 * the sha

[PATCH v6 82/99] s390: Convert gmap to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

The three radix trees in gmap are all converted to the XArray.
This is another case where holding multiple locks mandates the use
of the xa_reserve() API.  The gmap_insert_rmap() function is
considerably simplified by using the advanced API;
gmap_radix_tree_free() turns out to just be xa_destroy(), and
gmap_rmap_radix_tree_free() is a nice little iteration followed
by xa_destroy().
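
For reference, a minimal sketch of the "free every entry, then tear the
array down" shape described above (names illustrative; the rmap list
walking in the real function is omitted):

#include <linux/slab.h>
#include <linux/xarray.h>

static void example_free_all(struct xarray *xa)
{
        unsigned long index = 0;
        void *entry;

        while ((entry = xa_find(xa, &index, ULONG_MAX, XA_PRESENT)) != NULL) {
                xa_erase(xa, index);
                kfree(entry);
        }
        xa_destroy(xa);
}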

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 arch/s390/include/asm/gmap.h |  12 ++--
 arch/s390/mm/gmap.c  | 133 +++
 2 files changed, 51 insertions(+), 94 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index e07cce88dfb0..7695a01d19d7 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -14,14 +14,14 @@
  * @list: list head for the mm->context gmap list
  * @crst_list: list of all crst tables used in the guest address space
  * @mm: pointer to the parent mm_struct
- * @guest_to_host: radix tree with guest to host address translation
- * @host_to_guest: radix tree with pointer to segment table entries
+ * @guest_to_host: guest to host address translation
+ * @host_to_guest: pointers to segment table entries
  * @guest_table_lock: spinlock to protect all entries in the guest page table
  * @ref_count: reference counter for the gmap structure
  * @table: pointer to the page directory
  * @asce: address space control element for gmap page table
  * @pfault_enabled: defines if pfaults are applicable for the guest
- * @host_to_rmap: radix tree with gmap_rmap lists
+ * @host_to_rmap: gmap_rmap lists
  * @children: list of shadow gmap structures
  * @pt_list: list of all page tables used in the shadow guest address space
  * @shadow_lock: spinlock to protect the shadow gmap list
@@ -35,8 +35,8 @@ struct gmap {
struct list_head list;
struct list_head crst_list;
struct mm_struct *mm;
-   struct radix_tree_root guest_to_host;
-   struct radix_tree_root host_to_guest;
+   struct xarray guest_to_host;
+   struct xarray host_to_guest;
spinlock_t guest_table_lock;
atomic_t ref_count;
unsigned long *table;
@@ -45,7 +45,7 @@ struct gmap {
void *private;
bool pfault_enabled;
/* Additional data for shadow guest address spaces */
-   struct radix_tree_root host_to_rmap;
+   struct xarray host_to_rmap;
struct list_head children;
struct list_head pt_list;
spinlock_t shadow_lock;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 05d459b638f5..818a5e80914d 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -60,9 +60,9 @@ static struct gmap *gmap_alloc(unsigned long limit)
INIT_LIST_HEAD(>crst_list);
INIT_LIST_HEAD(>children);
INIT_LIST_HEAD(>pt_list);
-   INIT_RADIX_TREE(>guest_to_host, GFP_KERNEL);
-   INIT_RADIX_TREE(>host_to_guest, GFP_ATOMIC);
-   INIT_RADIX_TREE(>host_to_rmap, GFP_ATOMIC);
+   xa_init(>guest_to_host);
+   xa_init(>host_to_guest);
+   xa_init(>host_to_rmap);
spin_lock_init(>guest_table_lock);
spin_lock_init(>shadow_lock);
atomic_set(>ref_count, 1);
@@ -121,55 +121,16 @@ static void gmap_flush_tlb(struct gmap *gmap)
__tlb_flush_global();
 }
 
-static void gmap_radix_tree_free(struct radix_tree_root *root)
-{
-   struct radix_tree_iter iter;
-   unsigned long indices[16];
-   unsigned long index;
-   void __rcu **slot;
-   int i, nr;
-
-   /* A radix tree is freed by deleting all of its entries */
-   index = 0;
-   do {
-   nr = 0;
-   radix_tree_for_each_slot(slot, root, , index) {
-   indices[nr] = iter.index;
-   if (++nr == 16)
-   break;
-   }
-   for (i = 0; i < nr; i++) {
-   index = indices[i];
-   radix_tree_delete(root, index);
-   }
-   } while (nr > 0);
-}
-
-static void gmap_rmap_radix_tree_free(struct radix_tree_root *root)
+static void gmap_rmap_free(struct xarray *xa)
 {
struct gmap_rmap *rmap, *rnext, *head;
-   struct radix_tree_iter iter;
-   unsigned long indices[16];
-   unsigned long index;
-   void __rcu **slot;
-   int i, nr;
-
-   /* A radix tree is freed by deleting all of its entries */
-   index = 0;
-   do {
-   nr = 0;
-   radix_tree_for_each_slot(slot, root, , index) {
-   indices[nr] = iter.index;
-   if (++nr == 16)
-   break;
-   }
-   for (i = 0; i < nr; i++) {
-   index = indices[i];
-   head = radix_tree_delete(root, index);
-

[PATCH v6 42/99] shmem: Convert shmem_confirm_swap to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

xa_load has its own RCU locking, so we can eliminate it here.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/shmem.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index fad6c9e7402e..654f367aca90 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -348,12 +348,7 @@ static int shmem_xa_replace(struct address_space *mapping,
 static bool shmem_confirm_swap(struct address_space *mapping,
   pgoff_t index, swp_entry_t swap)
 {
-   void *item;
-
-   rcu_read_lock();
-   item = radix_tree_lookup(>pages, index);
-   rcu_read_unlock();
-   return item == swp_to_radix_entry(swap);
+   return xa_load(>pages, index) == swp_to_radix_entry(swap);
 }
 
 /*
-- 
2.15.1


[PATCH v6 81/99] i915: Convert handles_vma to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Straightforward conversion.

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_context.c   | 12 +---
 drivers/gpu/drm/i915/i915_gem_context.h   |  4 ++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c|  6 +++---
 drivers/gpu/drm/i915/selftests/mock_context.c |  2 +-
 5 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 25ce7bcf9988..69e944f4dfce 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3351,7 +3351,7 @@ void i915_gem_close_object(struct drm_gem_object *gem, 
struct drm_file *file)
if (ctx->file_priv != fpriv)
continue;
 
-   vma = radix_tree_delete(>handles_vma, lut->handle);
+   vma = xa_erase(>handles_vma, lut->handle);
GEM_BUG_ON(vma->obj != obj);
 
/* We allow the process to have multiple handles to the same
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index f782cf2069c1..1aff35ba6e18 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -95,9 +95,9 @@
 
 static void lut_close(struct i915_gem_context *ctx)
 {
+   XA_STATE(xas, >handles_vma, 0);
struct i915_lut_handle *lut, *ln;
-   struct radix_tree_iter iter;
-   void __rcu **slot;
+   struct i915_vma *vma;
 
list_for_each_entry_safe(lut, ln, >handles_list, ctx_link) {
list_del(>obj_link);
@@ -105,10 +105,8 @@ static void lut_close(struct i915_gem_context *ctx)
}
 
rcu_read_lock();
-   radix_tree_for_each_slot(slot, >handles_vma, , 0) {
-   struct i915_vma *vma = rcu_dereference_raw(*slot);
-
-   radix_tree_iter_delete(>handles_vma, , slot);
+   xas_for_each(, vma, ULONG_MAX) {
+   xas_store(, NULL);
__i915_gem_object_release_unless_active(vma->obj);
}
rcu_read_unlock();
@@ -276,7 +274,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
ctx->i915 = dev_priv;
ctx->priority = I915_PRIORITY_NORMAL;
 
-   INIT_RADIX_TREE(>handles_vma, GFP_KERNEL);
+   xa_init(>handles_vma);
INIT_LIST_HEAD(>handles_list);
 
/* Default context will never have a file_priv */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h 
b/drivers/gpu/drm/i915/i915_gem_context.h
index 44688e22a5c2..8e3e0d002f77 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -181,11 +181,11 @@ struct i915_gem_context {
/** remap_slice: Bitmask of cache lines that need remapping */
u8 remap_slice;
 
-   /** handles_vma: rbtree to look up our context specific obj/vma for
+   /** handles_vma: lookup our context specific obj/vma for
 * the user handle. (user handles are per fd, but the binding is
 * per vm, which may be one per context or shared with the global GTT)
 */
-   struct radix_tree_root handles_vma;
+   struct xarray handles_vma;
 
/** handles_list: reverse list of all the rbtree entries in use for
 * this context, which allows us to free all the allocations on
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 435ed95df144..828f4b5473ea 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -683,7 +683,7 @@ static int eb_select_context(struct i915_execbuffer *eb)
 
 static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
-   struct radix_tree_root *handles_vma = >ctx->handles_vma;
+   struct xarray *handles_vma = >ctx->handles_vma;
struct drm_i915_gem_object *obj;
unsigned int i;
int err;
@@ -702,7 +702,7 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
struct i915_lut_handle *lut;
struct i915_vma *vma;
 
-   vma = radix_tree_lookup(handles_vma, handle);
+   vma = xa_load(handles_vma, handle);
if (likely(vma))
goto add_vma;
 
@@ -724,7 +724,7 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
goto err_obj;
}
 
-   err = radix_tree_insert(handles_vma, handle, vma);
+   err = xa_err(xa_store(handles_vma, handle, vma, GFP_KERNEL));
if (unlikely(err)) {
kfree(lut);
goto err_obj;
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c 
b/drivers/gpu/drm/i915/selftests/mock_context.c
index bbf80d42e793..b664a7159242 100644
--- a/drivers/gpu/drm/i915/selftests/mock_c

[PATCH v6 78/99] sh: intc: Convert to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

The radix tree was being protected by a raw spinlock.  I believe that
was not necessary, and the new internal regular spinlock will be
adequate for this array.
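
For reference, a minimal sketch of the point above: xa_store() takes the
array's internal spinlock itself, so the external raw spinlock around the
old radix_tree_insert() can go away (names illustrative):

#include <linux/xarray.h>

static int example_store_atomic(struct xarray *xa, unsigned long id,
                                void *entry)
{
        /* GFP_ATOMIC: callers may not be able to sleep here. */
        return xa_err(xa_store(xa, id, entry, GFP_ATOMIC));
}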

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 drivers/sh/intc/core.c  |  9 ++
 drivers/sh/intc/internals.h |  5 ++--
 drivers/sh/intc/virq.c  | 72 +
 3 files changed, 25 insertions(+), 61 deletions(-)

diff --git a/drivers/sh/intc/core.c b/drivers/sh/intc/core.c
index 8e72bcbd3d6d..356a423d9dcb 100644
--- a/drivers/sh/intc/core.c
+++ b/drivers/sh/intc/core.c
@@ -30,7 +30,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include "internals.h"
@@ -78,11 +77,8 @@ static void __init intc_register_irq(struct intc_desc *desc,
struct intc_handle_int *hp;
struct irq_data *irq_data;
unsigned int data[2], primary;
-   unsigned long flags;
 
-   raw_spin_lock_irqsave(_big_lock, flags);
-   radix_tree_insert(>tree, enum_id, intc_irq_xlate_get(irq));
-   raw_spin_unlock_irqrestore(_big_lock, flags);
+   xa_store(>array, enum_id, intc_irq_xlate_get(irq), GFP_ATOMIC);
 
/*
 * Prefer single interrupt source bitmap over other combinations:
@@ -196,8 +192,7 @@ int __init register_intc_controller(struct intc_desc *desc)
INIT_LIST_HEAD(>list);
list_add_tail(>list, _list);
 
-   raw_spin_lock_init(>lock);
-   INIT_RADIX_TREE(>tree, GFP_ATOMIC);
+   xa_init(>array);
 
d->index = nr_intc_controllers;
 
diff --git a/drivers/sh/intc/internals.h b/drivers/sh/intc/internals.h
index fa73c173b56a..9b6fd07e99a6 100644
--- a/drivers/sh/intc/internals.h
+++ b/drivers/sh/intc/internals.h
@@ -5,7 +5,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #define _INTC_MK(fn, mode, addr_e, addr_d, width, shift) \
@@ -54,8 +54,7 @@ struct intc_subgroup_entry {
 struct intc_desc_int {
struct list_head list;
struct device dev;
-   struct radix_tree_root tree;
-   raw_spinlock_t lock;
+   struct xarray array;
unsigned int index;
unsigned long *reg;
 #ifdef CONFIG_SMP
diff --git a/drivers/sh/intc/virq.c b/drivers/sh/intc/virq.c
index a638c3048207..801c9c8b7556 100644
--- a/drivers/sh/intc/virq.c
+++ b/drivers/sh/intc/virq.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include "internals.h"
@@ -27,10 +26,7 @@ struct intc_virq_list {
 #define for_each_virq(entry, head) \
for (entry = head; entry; entry = entry->next)
 
-/*
- * Tags for the radix tree
- */
-#define INTC_TAG_VIRQ_NEEDS_ALLOC  0
+#define INTC_TAG_VIRQ_NEEDS_ALLOC  XA_TAG_0
 
 void intc_irq_xlate_set(unsigned int irq, intc_enum id, struct intc_desc_int 
*d)
 {
@@ -54,23 +50,18 @@ int intc_irq_lookup(const char *chipname, intc_enum enum_id)
int irq = -1;
 
list_for_each_entry(d, _list, list) {
-   int tagged;
-
if (strcmp(d->chip.name, chipname) != 0)
continue;
 
/*
 * Catch early lookups for subgroup VIRQs that have not
-* yet been allocated an IRQ. This already includes a
-* fast-path out if the tree is untagged, so there is no
-* need to explicitly test the root tree.
+* yet been allocated an IRQ.
 */
-   tagged = radix_tree_tag_get(>tree, enum_id,
-   INTC_TAG_VIRQ_NEEDS_ALLOC);
-   if (unlikely(tagged))
+   if (unlikely(xa_get_tag(>array, enum_id,
+   INTC_TAG_VIRQ_NEEDS_ALLOC)))
break;
 
-   ptr = radix_tree_lookup(>tree, enum_id);
+   ptr = xa_load(>array, enum_id);
if (ptr) {
irq = ptr - intc_irq_xlate;
break;
@@ -148,22 +139,16 @@ static void __init intc_subgroup_init_one(struct 
intc_desc *desc,
 {
struct intc_map_entry *mapped;
unsigned int pirq;
-   unsigned long flags;
int i;
 
-   mapped = radix_tree_lookup(>tree, subgroup->parent_id);
-   if (!mapped) {
-   WARN_ON(1);
+   mapped = xa_load(>array, subgroup->parent_id);
+   if (WARN_ON(!mapped))
return;
-   }
 
pirq = mapped - intc_irq_xlate;
 
-   raw_spin_lock_irqsave(>lock, flags);
-
for (i = 0; i < ARRAY_SIZE(subgroup->enum_ids); i++) {
struct intc_subgroup_entry *entry;
-   int err;
 
if (!subgroup->enum_ids[i])
continue;
@@ -176,15 +161,14 @@ static void __init intc_subgroup_init_one(struct 
intc_desc *desc,
entry-&

[PATCH v6 80/99] blk-ioc: Convert to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Skip converting the lock to use xa_lock; I think this code can live with
the double-locking.
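
For reference, a minimal sketch of the create path below: reserve the slot
before taking the queue and ioc locks, store under them, and give the
reservation back with xa_erase() if the race is lost (names illustrative):

#include <linux/xarray.h>

static void *example_create(struct xarray *xa, unsigned long id,
                            void *new, gfp_t gfp)
{
        if (xa_reserve(xa, id, gfp))
                return NULL;

        /* ... take the existing external locks here ... */

        if (!xa_store(xa, id, new, GFP_ATOMIC | __GFP_HIGH))
                return new;     /* stored into the reserved (empty) slot */

        xa_erase(xa, id);       /* lost the race; drop the reservation */
        return NULL;
}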

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 block/blk-ioc.c   | 13 +++--
 include/linux/iocontext.h |  6 +++---
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index f23311e4b201..baf83c8ac503 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -68,7 +68,7 @@ static void ioc_destroy_icq(struct io_cq *icq)
 
lockdep_assert_held(>lock);
 
-   radix_tree_delete(>icq_tree, icq->q->id);
+   xa_erase(>icq_array, icq->q->id);
hlist_del_init(>ioc_node);
list_del_init(>q_node);
 
@@ -278,7 +278,7 @@ int create_task_io_context(struct task_struct *task, gfp_t 
gfp_flags, int node)
atomic_set(>nr_tasks, 1);
atomic_set(>active_ref, 1);
spin_lock_init(>lock);
-   INIT_RADIX_TREE(>icq_tree, GFP_ATOMIC | __GFP_HIGH);
+   xa_init_flags(>icq_array, XA_FLAGS_LOCK_IRQ);
INIT_HLIST_HEAD(>icq_list);
INIT_WORK(>release_work, ioc_release_fn);
 
@@ -363,7 +363,7 @@ struct io_cq *ioc_lookup_icq(struct io_context *ioc, struct 
request_queue *q)
if (icq && icq->q == q)
goto out;
 
-   icq = radix_tree_lookup(>icq_tree, q->id);
+   icq = xa_load(>icq_array, q->id);
if (icq && icq->q == q)
rcu_assign_pointer(ioc->icq_hint, icq); /* allowed to race */
else
@@ -398,7 +398,7 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct 
request_queue *q,
if (!icq)
return NULL;
 
-   if (radix_tree_maybe_preload(gfp_mask) < 0) {
+   if (xa_reserve(>icq_array, q->id, gfp_mask)) {
kmem_cache_free(et->icq_cache, icq);
return NULL;
}
@@ -412,7 +412,8 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct 
request_queue *q,
spin_lock_irq(q->queue_lock);
spin_lock(>lock);
 
-   if (likely(!radix_tree_insert(>icq_tree, q->id, icq))) {
+   if (likely(!xa_store(>icq_array, q->id, icq,
+   GFP_ATOMIC | __GFP_HIGH))) {
hlist_add_head(>ioc_node, >icq_list);
list_add(>q_node, >icq_list);
if (et->uses_mq && et->ops.mq.init_icq)
@@ -421,6 +422,7 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct 
request_queue *q,
et->ops.sq.elevator_init_icq_fn(icq);
} else {
kmem_cache_free(et->icq_cache, icq);
+   xa_erase(>icq_array, q->id);
icq = ioc_lookup_icq(ioc, q);
if (!icq)
printk(KERN_ERR "cfq: icq link failed!\n");
@@ -428,7 +430,6 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct 
request_queue *q,
 
spin_unlock(>lock);
spin_unlock_irq(q->queue_lock);
-   radix_tree_preload_end();
return icq;
 }
 
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index dba15ca8e60b..e16224f70084 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -2,9 +2,9 @@
 #ifndef IOCONTEXT_H
 #define IOCONTEXT_H
 
-#include 
 #include 
 #include 
+#include 
 
 enum {
ICQ_EXITED  = 1 << 2,
@@ -56,7 +56,7 @@ enum {
  * - ioc->icq_list and icq->ioc_node are protected by ioc lock.
  *   q->icq_list and icq->q_node by q lock.
  *
- * - ioc->icq_tree and ioc->icq_hint are protected by ioc lock, while icq
+ * - ioc->icq_array and ioc->icq_hint are protected by ioc lock, while icq
  *   itself is protected by q lock.  However, both the indexes and icq
  *   itself are also RCU managed and lookup can be performed holding only
  *   the q lock.
@@ -111,7 +111,7 @@ struct io_context {
int nr_batch_requests; /* Number of requests left in the batch */
unsigned long last_waited; /* Time last woken after wait for request */
 
-   struct radix_tree_root  icq_tree;
+   struct xarray   icq_array;
struct io_cq __rcu  *icq_hint;
struct hlist_head   icq_list;
 
-- 
2.15.1


[PATCH v6 54/99] nilfs2: Convert to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

I'm not 100% convinced that the rewrite of nilfs_copy_back_pages is
correct, but it will at least have different bugs from the current
version.
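
For reference, the btnode hunk below leans on xa_cmpxchg(); a minimal
sketch of its return-value handling (names illustrative):

#include <linux/xarray.h>

/* Store @new at @index only if the slot is currently empty. */
static int example_insert_if_empty(struct xarray *xa, unsigned long index,
                                   void *new)
{
        void *entry = xa_cmpxchg(xa, index, NULL, new, GFP_NOFS);

        if (!entry)
                return 0;               /* slot was empty, @new is now stored */
        if (xa_is_err(entry))
                return xa_err(entry);   /* e.g. allocation failure */
        return -EEXIST;                 /* someone else's entry is there */
}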

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/nilfs2/btnode.c | 37 +++-
 fs/nilfs2/page.c   | 72 +++---
 2 files changed, 56 insertions(+), 53 deletions(-)

diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 9e2a00207436..b5997e8c5441 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -177,42 +177,36 @@ int nilfs_btnode_prepare_change_key(struct address_space 
*btnc,
ctxt->newbh = NULL;
 
if (inode->i_blkbits == PAGE_SHIFT) {
-   lock_page(obh->b_page);
-   /*
-* We cannot call radix_tree_preload for the kernels older
-* than 2.6.23, because it is not exported for modules.
-*/
+   void *entry;
+   struct page *opage = obh->b_page;
+   lock_page(opage);
 retry:
-   err = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
-   if (err)
-   goto failed_unlock;
/* BUG_ON(oldkey != obh->b_page->index); */
-   if (unlikely(oldkey != obh->b_page->index))
-   NILFS_PAGE_BUG(obh->b_page,
+   if (unlikely(oldkey != opage->index))
+   NILFS_PAGE_BUG(opage,
   "invalid oldkey %lld (newkey=%lld)",
   (unsigned long long)oldkey,
   (unsigned long long)newkey);
 
-   xa_lock_irq(>pages);
-   err = radix_tree_insert(>pages, newkey, obh->b_page);
-   xa_unlock_irq(>pages);
+   entry = xa_cmpxchg(>pages, newkey, NULL, opage, GFP_NOFS);
/*
 * Note: page->index will not change to newkey until
 * nilfs_btnode_commit_change_key() will be called.
 * To protect the page in intermediate state, the page lock
 * is held.
 */
-   radix_tree_preload_end();
-   if (!err)
+   if (!entry)
return 0;
-   else if (err != -EEXIST)
+   if (xa_is_err(entry)) {
+   err = xa_err(entry);
goto failed_unlock;
+   }
 
err = invalidate_inode_pages2_range(btnc, newkey, newkey);
if (!err)
goto retry;
/* fallback to copy mode */
-   unlock_page(obh->b_page);
+   unlock_page(opage);
}
 
nbh = nilfs_btnode_create_block(btnc, newkey);
@@ -252,9 +246,8 @@ void nilfs_btnode_commit_change_key(struct address_space 
*btnc,
mark_buffer_dirty(obh);
 
xa_lock_irq(>pages);
-   radix_tree_delete(>pages, oldkey);
-   radix_tree_tag_set(>pages, newkey,
-  PAGECACHE_TAG_DIRTY);
+   __xa_erase(>pages, oldkey);
+   __xa_set_tag(>pages, newkey, PAGECACHE_TAG_DIRTY);
xa_unlock_irq(>pages);
 
opage->index = obh->b_blocknr = newkey;
@@ -283,9 +276,7 @@ void nilfs_btnode_abort_change_key(struct address_space 
*btnc,
return;
 
if (nbh == NULL) {  /* blocksize == pagesize */
-   xa_lock_irq(>pages);
-   radix_tree_delete(>pages, newkey);
-   xa_unlock_irq(>pages);
+   xa_erase(>pages, newkey);
unlock_page(ctxt->bh->b_page);
} else
brelse(nbh);
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 1c6703efde9e..31d20f624971 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -304,10 +304,10 @@ int nilfs_copy_dirty_pages(struct address_space *dmap,
 void nilfs_copy_back_pages(struct address_space *dmap,
   struct address_space *smap)
 {
+   XA_STATE(xas, >pages, 0);
struct pagevec pvec;
unsigned int i, n;
pgoff_t index = 0;
-   int err;
 
pagevec_init();
 repeat:
@@ -317,43 +317,56 @@ void nilfs_copy_back_pages(struct address_space *dmap,
 
for (i = 0; i < pagevec_count(); i++) {
struct page *page = pvec.pages[i], *dpage;
-   pgoff_t offset = page->index;
+   xas_set(, page->index);
 
lock_page(page);
-   dpage = find_lock_page(dmap, offset);
+   do {
+   xas_lock_irq();
+   dpage = xas_create();
+   if (!xas_error())

[PATCH v6 66/99] page cache: Finish XArray conversion

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

With no more radix tree API users left, we can drop the GFP flags
and use xa_init() instead of INIT_RADIX_TREE().
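
For reference, the initialisation this boils down to (sketch;
XA_FLAGS_LOCK_IRQ records that the lock is taken with interrupts disabled):

#include <linux/xarray.h>

static void example_init_mapping_array(struct xarray *xa)
{
        /* No GFP flags baked into the structure any more. */
        xa_init_flags(xa, XA_FLAGS_LOCK_IRQ);
}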

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/inode.c | 2 +-
 include/linux/fs.h | 2 +-
 mm/swap_state.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index c7b00573c10d..f5680b805336 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -348,7 +348,7 @@ EXPORT_SYMBOL(inc_nlink);
 void address_space_init_once(struct address_space *mapping)
 {
memset(mapping, 0, sizeof(*mapping));
-   INIT_RADIX_TREE(>pages, GFP_ATOMIC | __GFP_ACCOUNT);
+   xa_init_flags(>pages, XA_FLAGS_LOCK_IRQ);
init_rwsem(>i_mmap_rwsem);
INIT_LIST_HEAD(>private_list);
spin_lock_init(>private_lock);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c58bc3c619bf..b459bf4ddb62 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -410,7 +410,7 @@ int pagecache_write_end(struct file *, struct address_space 
*mapping,
  */
 struct address_space {
struct inode*host;
-   struct radix_tree_root  pages;
+   struct xarray   pages;
gfp_t   gfp_mask;
atomic_ti_mmap_writable;
struct rb_root_cached   i_mmap;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 219e3b4f09e6..25f027d0bb00 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -573,7 +573,7 @@ int init_swap_address_space(unsigned int type, unsigned 
long nr_pages)
return -ENOMEM;
for (i = 0; i < nr; i++) {
space = spaces + i;
-   INIT_RADIX_TREE(>pages, GFP_ATOMIC|__GFP_NOWARN);
+   xa_init_flags(>pages, XA_FLAGS_LOCK_IRQ);
atomic_set(>i_mmap_writable, 0);
space->a_ops = _aops;
/* swap cache doesn't use writeback related tags */
-- 
2.15.1


[PATCH v6 33/99] mm: Convert add_to_swap_cache to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Combine __add_to_swap_cache and add_to_swap_cache into one function
since there is no more need to preload.
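
For reference, a minimal sketch of the allocate-on-demand loop that replaces
preloading: try under the lock, and if the XArray needs memory, drop the
lock, let xas_nomem() allocate, and retry (names illustrative;
xas_create_range() here takes a last index, as in this series):

#include <linux/xarray.h>

static int example_store_range(struct xarray *xa, unsigned long first,
                               unsigned long count, void *item, gfp_t gfp)
{
        XA_STATE(xas, xa, first);
        unsigned long i;

        do {
                xas_lock_irq(&xas);
                xas_create_range(&xas, first + count - 1);
                if (xas_error(&xas))
                        goto unlock;
                for (i = 0; i < count; i++) {
                        xas_store(&xas, item);  /* the real code stores page + i */
                        xas_next(&xas);
                }
unlock:
                xas_unlock_irq(&xas);
        } while (xas_nomem(&xas, gfp));

        return xas_error(&xas);
}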

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 mm/swap_state.c | 93 ++---
 1 file changed, 29 insertions(+), 64 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3f95e8fc4cb2..a57b5ad4c503 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -107,14 +107,15 @@ void show_swap_cache_info(void)
 }
 
 /*
- * __add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
+ * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
  * but sets SwapCache flag and private instead of mapping and index.
  */
-int __add_to_swap_cache(struct page *page, swp_entry_t entry)
+int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
 {
-   int error, i, nr = hpage_nr_pages(page);
-   struct address_space *address_space;
+   struct address_space *address_space = swap_address_space(entry);
pgoff_t idx = swp_offset(entry);
+   XA_STATE(xas, _space->pages, idx);
+   unsigned long i, nr = 1UL << compound_order(page);
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapCache(page), page);
@@ -123,50 +124,30 @@ int __add_to_swap_cache(struct page *page, swp_entry_t 
entry)
page_ref_add(page, nr);
SetPageSwapCache(page);
 
-   address_space = swap_address_space(entry);
-   xa_lock_irq(_space->pages);
-   for (i = 0; i < nr; i++) {
-   set_page_private(page + i, entry.val + i);
-   error = radix_tree_insert(_space->pages,
- idx + i, page + i);
-   if (unlikely(error))
-   break;
-   }
-   if (likely(!error)) {
+   do {
+   xas_lock_irq();
+   xas_create_range(, idx + nr - 1);
+   if (xas_error())
+   goto unlock;
+   for (i = 0; i < nr; i++) {
+   VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
+   set_page_private(page + i, entry.val + i);
+   xas_store(, page + i);
+   xas_next();
+   }
address_space->nrpages += nr;
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
ADD_CACHE_INFO(add_total, nr);
-   } else {
-   /*
-* Only the context which have set SWAP_HAS_CACHE flag
-* would call add_to_swap_cache().
-* So add_to_swap_cache() doesn't returns -EEXIST.
-*/
-   VM_BUG_ON(error == -EEXIST);
-   set_page_private(page + i, 0UL);
-   while (i--) {
-   radix_tree_delete(_space->pages, idx + i);
-   set_page_private(page + i, 0UL);
-   }
-   ClearPageSwapCache(page);
-   page_ref_sub(page, nr);
-   }
-   xa_unlock_irq(_space->pages);
+unlock:
+   xas_unlock_irq();
+   } while (xas_nomem(, gfp));
 
-   return error;
-}
-
-
-int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask)
-{
-   int error;
+   if (!xas_error())
+   return 0;
 
-   error = radix_tree_maybe_preload_order(gfp_mask, compound_order(page));
-   if (!error) {
-   error = __add_to_swap_cache(page, entry);
-   radix_tree_preload_end();
-   }
-   return error;
+   ClearPageSwapCache(page);
+   page_ref_sub(page, nr);
+   return xas_error();
 }
 
 /*
@@ -220,7 +201,7 @@ int add_to_swap(struct page *page)
goto fail;
 
/*
-* Radix-tree node allocations from PF_MEMALLOC contexts could
+* XArray node allocations from PF_MEMALLOC contexts could
 * completely exhaust the page allocator. __GFP_NOMEMALLOC
 * stops emergency reserves from being allocated.
 *
@@ -232,7 +213,6 @@ int add_to_swap(struct page *page)
 */
err = add_to_swap_cache(page, entry,
__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
-   /* -ENOMEM radix-tree allocation failure */
if (err)
/*
 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
@@ -400,19 +380,11 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
break;  /* Out of memory */
}
 
-   /*
-* call radix_tree_preload() while we can wait.
-*/
-   err = radix_tree_maybe_preload(gfp_mask & GFP_KERNEL);
-   if (err)
-   break;
-
/*
 * Swap entry

[PATCH v6 58/99] dax: Convert lock_slot to XArray

2018-01-17 Thread Matthew Wilcox
From: Matthew Wilcox <mawil...@microsoft.com>

Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
---
 fs/dax.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index f3463d93a6ce..8eab0b56f7f9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -188,12 +188,11 @@ static void dax_wake_mapping_entry_waiter(struct 
address_space *mapping,
 /*
  * Mark the given slot as locked.  Must be called with xa_lock held.
  */
-static inline void *lock_slot(struct address_space *mapping, void **slot)
+static inline void *lock_slot(struct xa_state *xas)
 {
-   unsigned long v = xa_to_value(
-   radix_tree_deref_slot_protected(slot, >pages.xa_lock));
+   unsigned long v = xa_to_value(xas_load(xas));
void *entry = xa_mk_value(v | DAX_ENTRY_LOCK);
-   radix_tree_replace_slot(>pages, slot, entry);
+   xas_store(xas, entry);
return entry;
 }
 
@@ -244,7 +243,7 @@ static void dax_unlock_mapping_entry(struct address_space 
*mapping,
 
xas_lock_irq();
entry = xas_load();
-   if (WARN_ON_ONCE(!entry || !xa_is_value(entry) || !dax_locked(entry))) {
+   if (WARN_ON_ONCE(!xa_is_value(entry) || !dax_locked(entry))) {
xas_unlock_irq();
return;
}
@@ -303,6 +302,7 @@ static void put_unlocked_mapping_entry(struct address_space 
*mapping,
 static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
unsigned long size_flag)
 {
+   XA_STATE(xas, >pages, index);
bool pmd_downgrade = false; /* splitting 2MiB entry into 4k entries? */
void *entry, **slot;
 
@@ -341,7 +341,7 @@ static void *grab_mapping_entry(struct address_space 
*mapping, pgoff_t index,
 * Make sure 'entry' remains valid while we drop
 * xa_lock.
 */
-   entry = lock_slot(mapping, slot);
+   entry = lock_slot();
}
 
xa_unlock_irq(>pages);
@@ -408,7 +408,7 @@ static void *grab_mapping_entry(struct address_space 
*mapping, pgoff_t index,
xa_unlock_irq(>pages);
return entry;
}
-   entry = lock_slot(mapping, slot);
+   entry = lock_slot();
  out_unlock:
xa_unlock_irq(>pages);
return entry;
@@ -639,6 +639,7 @@ static int dax_writeback_one(struct block_device *bdev,
pgoff_t index, void *entry)
 {
struct radix_tree_root *pages = >pages;
+   XA_STATE(xas, pages, index);
void *entry2, **slot, *kaddr;
long ret = 0, id;
sector_t sector;
@@ -675,7 +676,7 @@ static int dax_writeback_one(struct block_device *bdev,
if (!radix_tree_tag_get(pages, index, PAGECACHE_TAG_TOWRITE))
goto put_unlocked;
/* Lock the entry to serialize with page faults */
-   entry = lock_slot(mapping, slot);
+   entry = lock_slot();
/*
 * We can clear the tag now but we have to be careful so that concurrent
 * dax_writeback_one() calls for the same index cannot finish before we
@@ -1500,8 +1501,9 @@ static int dax_insert_pfn_mkwrite(struct vm_fault *vmf,
  pfn_t pfn)
 {
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
-   void *entry, **slot;
pgoff_t index = vmf->pgoff;
+   XA_STATE(xas, >pages, index);
+   void *entry, **slot;
int vmf_ret, error;
 
xa_lock_irq(>pages);
@@ -1517,7 +1519,7 @@ static int dax_insert_pfn_mkwrite(struct vm_fault *vmf,
return VM_FAULT_NOPAGE;
}
radix_tree_tag_set(>pages, index, PAGECACHE_TAG_DIRTY);
-   entry = lock_slot(mapping, slot);
+   entry = lock_slot();
xa_unlock_irq(>pages);
switch (pe_size) {
case PE_SIZE_PTE:
-- 
2.15.1
