Re: [PATCH 4/6] mm/gup: track gup-pinned pages

2019-02-20 Thread John Hubbard

On 2/20/19 11:24 AM, Ira Weiny wrote:

On Sun, Feb 03, 2019 at 09:21:33PM -0800, john.hubb...@gmail.com wrote:

From: John Hubbard 

[snip]

+ *
+ * Locking: the lockless algorithm described in page_cache_gup_pin_speculative()
+ * and page_cache_gup_pin_speculative() provides safe operation for


Did you mean:

page_cache_gup_pin_speculative() and page_cache_get_speculative()?

Just found this while looking at your branch.

Sorry,
Ira



Hi Ira,

Yes, thanks for catching that. I've changed it in the git repo now, and it will
show up when the next spin of this patchset goes out.

thanks,
--
John Hubbard
NVIDIA


Re: [PATCH 4/6] mm/gup: track gup-pinned pages

2019-02-20 Thread Ira Weiny
On Sun, Feb 03, 2019 at 09:21:33PM -0800, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 

[snip]

>  
> +/*
> + * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload
> + * the page's refcount so that two separate items are tracked: the original page
> + * reference count, and also a new count of how many get_user_pages() calls were
> + * made against the page. ("gup-pinned" is another term for the latter).
> + *
> + * With this scheme, get_user_pages() becomes special: such pages are marked
> + * as distinct from normal pages. As such, the new put_user_page() call (and
> + * its variants) must be used in order to release gup-pinned pages.
> + *
> + * Choice of value:
> + *
> + * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference
> + * counts with respect to get_user_pages() and put_user_page() becomes simpler,
> + * due to the fact that adding an even power of two to the page refcount has
> + * the effect of using only the upper N bits, for the code that counts up using
> + * the bias value. This means that the lower bits are left for the exclusive
> + * use of the original code that increments and decrements by one (or at least,
> + * by much smaller values than the bias value).
> + *
> + * Of course, once the lower bits overflow into the upper bits (and this is
> + * OK, because subtraction recovers the original values), then visual inspection
> + * no longer suffices to directly view the separate counts. However, for normal
> + * applications that don't have huge page reference counts, this won't be an
> + * issue.
> + *
> + * This has to work on 32-bit as well as 64-bit systems. In the more constrained
> + * 32-bit systems, the 10 bit value of the bias value leaves 22 bits for the
> + * upper bits. Therefore, only about 4M calls to get_user_page() may occur for
> + * a page.
> + *
> + * Locking: the lockless algorithm described in page_cache_gup_pin_speculative()
> + * and page_cache_gup_pin_speculative() provides safe operation for

Did you mean:

page_cache_gup_pin_speculative() and page_cache_get_speculative()?

Just found this while looking at your branch.

Sorry,
Ira

> + * get_user_pages and page_mkclean and other calls that race to set up page
> + * table entries.
> + */
> +#define GUP_PIN_COUNTING_BIAS (1UL << 10)
> +
> +int get_gup_pin_page(struct page *page);
> +
> +void put_user_page(struct page *page);
> +void put_user_pages_dirty(struct page **pages, unsigned long npages);
> +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
> +void put_user_pages(struct page **pages, unsigned long npages);
> +
> +/**
> + * page_gup_pinned() - report if a page is gup-pinned (pinned by a call to
> + *   get_user_pages).
> + * @page: pointer to page to be queried.
> + * @Returns: True, if it is likely that the page has been "gup-pinned".
> + *   False, if the page is definitely not gup-pinned.
> + */
> +static inline bool page_gup_pinned(struct page *page)
> +{
> + return (page_ref_count(page)) > GUP_PIN_COUNTING_BIAS;
> +}
> +
>  static inline void get_page(struct page *page)
>  {
>   page = compound_head(page);
> @@ -993,30 +1050,6 @@ static inline void put_page(struct page *page)
>   __put_page(page);
>  }
>  
> -/**
> - * put_user_page() - release a gup-pinned page
> - * @page: pointer to page to be released
> - *
> - * Pages that were pinned via get_user_pages*() must be released via
> - * either put_user_page(), or one of the put_user_pages*() routines
> - * below. This is so that eventually, pages that are pinned via
> - * get_user_pages*() can be separately tracked and uniquely handled. In
> - * particular, interactions with RDMA and filesystems need special
> - * handling.
> - *
> - * put_user_page() and put_page() are not interchangeable, despite this early
> - * implementation that makes them look the same. put_user_page() calls must
> - * be perfectly matched up with get_user_page() calls.
> - */
> -static inline void put_user_page(struct page *page)
> -{
> - put_page(page);
> -}
> -
> -void put_user_pages_dirty(struct page **pages, unsigned long npages);
> -void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
> -void put_user_pages(struct page **pages, unsigned long npages);
> -
>  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>  #define SECTION_IN_PAGE_FLAGS
>  #endif
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 5c8a9b59cbdc..5f5b72ba595f 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -209,6 +209,11 @@ static inline int page_cache_add_speculative(struct page *page, int count)
>   return __page_cache_add_speculative(page, count);
>  }
>  
> +static inline int page_cache_gup_pin_speculative(struct page *page)
> +{
> + return __page_cache_add_speculative(page, GUP_PIN_COUNTING_BIAS);
> +}
> +
>  #ifdef 
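
To make the bias arithmetic described in the comment block above
concrete, here is a minimal, self-contained sketch. It is illustrative
only -- not the kernel implementation; the PIN_BIAS name and the
userspace framing are assumptions of this example:

#include <stdio.h>

#define PIN_BIAS (1UL << 10)    /* stands in for GUP_PIN_COUNTING_BIAS */

int main(void)
{
        unsigned long refcount = 1;     /* initial page reference */

        refcount += 1;                  /* an ordinary get_page() */
        refcount += PIN_BIAS;           /* gup pin #1 */
        refcount += PIN_BIAS;           /* gup pin #2 */

        /* Upper bits count pins; lower bits count ordinary references. */
        printf("pins: %lu\n", refcount / PIN_BIAS);       /* 2 */
        printf("plain refs: %lu\n", refcount % PIN_BIAS); /* 2 */

        /* The patch's page_gup_pinned() test: refcount above the bias. */
        printf("pinned? %d\n", refcount > PIN_BIAS ? 1 : 0);      /* 1 */

        refcount -= PIN_BIAS;           /* put_user_page() analogue */
        refcount -= PIN_BIAS;
        printf("pinned now? %d\n", refcount > PIN_BIAS ? 1 : 0);  /* 0 */
        return 0;
}

Note that the check is a heuristic, as the kerneldoc for
page_gup_pinned() says ("likely"): a page holding more than 1024
ordinary references would also appear pinned.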

Re: [PATCH 4/6] mm/gup: track gup-pinned pages

2019-02-04 Thread John Hubbard
On 2/4/19 10:19 AM, Matthew Wilcox wrote:
> On Sun, Feb 03, 2019 at 09:21:33PM -0800, john.hubb...@gmail.com wrote:
>> +/*
>> + * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload
>> + * the page's refcount so that two separate items are tracked: the original page
>> + * reference count, and also a new count of how many get_user_pages() calls were
>> + * made against the page. ("gup-pinned" is another term for the latter).
>> + *
>> + * With this scheme, get_user_pages() becomes special: such pages are marked
>> + * as distinct from normal pages. As such, the new put_user_page() call (and
>> + * its variants) must be used in order to release gup-pinned pages.
>> + *
>> + * Choice of value:
>> + *
>> + * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference
>> + * counts with respect to get_user_pages() and put_user_page() becomes simpler,
>> + * due to the fact that adding an even power of two to the page refcount has
>> + * the effect of using only the upper N bits, for the code that counts up using
>> + * the bias value. This means that the lower bits are left for the exclusive
>> + * use of the original code that increments and decrements by one (or at least,
>> + * by much smaller values than the bias value).
>> + *
>> + * Of course, once the lower bits overflow into the upper bits (and this is
>> + * OK, because subtraction recovers the original values), then visual inspection
>> + * no longer suffices to directly view the separate counts. However, for normal
>> + * applications that don't have huge page reference counts, this won't be an
>> + * issue.
>> + *
>> + * This has to work on 32-bit as well as 64-bit systems. In the more constrained
>> + * 32-bit systems, the 10 bit value of the bias value leaves 22 bits for the
>> + * upper bits. Therefore, only about 4M calls to get_user_page() may occur for
>> + * a page.
> 
> The refcount is 32-bit on both 64-bit and 32-bit systems. This limit
> exists on both sizes of system.
> 

Oh right, I'll just delete that last paragraph, then. Thanks for catching that.


thanks,
-- 
John Hubbard
NVIDIA


Re: [PATCH 4/6] mm/gup: track gup-pinned pages

2019-02-04 Thread Matthew Wilcox
On Sun, Feb 03, 2019 at 09:21:33PM -0800, john.hubb...@gmail.com wrote:
> +/*
> + * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload
> + * the page's refcount so that two separate items are tracked: the original page
> + * reference count, and also a new count of how many get_user_pages() calls were
> + * made against the page. ("gup-pinned" is another term for the latter).
> + *
> + * With this scheme, get_user_pages() becomes special: such pages are marked
> + * as distinct from normal pages. As such, the new put_user_page() call (and
> + * its variants) must be used in order to release gup-pinned pages.
> + *
> + * Choice of value:
> + *
> + * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference
> + * counts with respect to get_user_pages() and put_user_page() becomes simpler,
> + * due to the fact that adding an even power of two to the page refcount has
> + * the effect of using only the upper N bits, for the code that counts up using
> + * the bias value. This means that the lower bits are left for the exclusive
> + * use of the original code that increments and decrements by one (or at least,
> + * by much smaller values than the bias value).
> + *
> + * Of course, once the lower bits overflow into the upper bits (and this is
> + * OK, because subtraction recovers the original values), then visual inspection
> + * no longer suffices to directly view the separate counts. However, for normal
> + * applications that don't have huge page reference counts, this won't be an
> + * issue.
> + *
> + * This has to work on 32-bit as well as 64-bit systems. In the more constrained
> + * 32-bit systems, the 10 bit value of the bias value leaves 22 bits for the
> + * upper bits. Therefore, only about 4M calls to get_user_page() may occur for
> + * a page.

The refcount is 32-bit on both 64-bit and 32-bit systems. This limit
exists on both sizes of system.
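
To put numbers on this: the page refcount is an atomic_t, which is 32
bits wide on every architecture, so the headroom computed in the
comment is the same everywhere. A quick check of the arithmetic
(illustrative sketch, not kernel code):

#include <stdio.h>

int main(void)
{
        /* The page refcount is 32 bits on all architectures. */
        unsigned long long counter_range = 1ULL << 32;
        unsigned long long bias = 1ULL << 10;  /* GUP_PIN_COUNTING_BIAS */

        /* Each pin consumes one bias-sized step: 2^32 / 2^10 = 2^22. */
        printf("max pins per page: ~%llu\n", counter_range / bias);
        /* prints 4194304, i.e. about 4M, on 32-bit and 64-bit alike */
        return 0;
}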