Re: [RFC 1/2] page-flags: Make page lock operation atomic
On Tue, Feb 12, 2019 at 08:45:35AM +0100, Jan Kara wrote:
> On Mon 11-02-19 09:56:53, Matthew Wilcox wrote:
> > On Mon, Feb 11, 2019 at 06:48:46PM +0100, Jan Kara wrote:
> > > On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > > > >> Signed-off-by: Chintan Pandya
> > > > >
> > > > > NAK.
> > > > >
> > > > > This is bound to regress some stuff. Now agreed that using non-atomic
> > > > > ops is tricky, but many are in places where we 'know' there can't be
> > > > > concurrency.
> > > > >
> > > > > If you can show any single one is wrong, we can fix that one, but we're
> > > > > not going to blanket remove all this just because.
> > > >
> > > > Not quite familiar with the below stack, but from the crash dump, found
> > > > that this was another stack running on some other CPU at the same time
> > > > which also updates the page cache LRU and manipulates locks.
> > > >
> > > > [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> > > > [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> > > > [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> > > > [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> > > > [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> > > > [84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
> > > > [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> > > > [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> > > > [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> > > >
> > > > Not entirely sure if it's racing with the crashing stack or it simply
> > > > overwrites the bit set by case 2 (mentioned in 0/2).
> > >
> > > So this is interesting. Looking at __add_to_page_cache_locked(), nothing
> > > seems to prevent __SetPageLocked(page) in add_to_page_cache_lru() from
> > > getting reordered into __add_to_page_cache_locked() after the page is
> > > actually added to the xarray. So that one particular instance might
> > > benefit from an atomic SetPageLocked or a barrier somewhere between
> > > __SetPageLocked() and the actual addition of the entry into the xarray.
> >
> > There's a write barrier when you add something to the XArray, by virtue
> > of the call to rcu_assign_pointer().
>
> OK, I've missed rcu_assign_pointer(). Thanks for the correction... but...
> rcu_assign_pointer() is __smp_store_release(&p, v) and that on x86 seems
> to be:
>
>	barrier();					\
>	WRITE_ONCE(*p, v);				\
>
> which seems to provide a compiler barrier but not an SMP barrier? So is
> x86 store ordering strong enough to make writes appear in the right
> order? So far I didn't think so... What am I missing?

x86 is TSO, and that is strong enough for this. The only visible
reordering on TSO is due to the store buffer; that is, writes may happen
after later reads.
Re: [RFC 1/2] page-flags: Make page lock operation atomic
On Mon 11-02-19 09:56:53, Matthew Wilcox wrote:
> On Mon, Feb 11, 2019 at 06:48:46PM +0100, Jan Kara wrote:
> > On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > > > >> Signed-off-by: Chintan Pandya
> > > >
> > > > NAK.
> > > >
> > > > This is bound to regress some stuff. Now agreed that using non-atomic
> > > > ops is tricky, but many are in places where we 'know' there can't be
> > > > concurrency.
> > > >
> > > > If you can show any single one is wrong, we can fix that one, but we're
> > > > not going to blanket remove all this just because.
> > >
> > > Not quite familiar with the below stack, but from the crash dump, found
> > > that this was another stack running on some other CPU at the same time
> > > which also updates the page cache LRU and manipulates locks.
> > >
> > > [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> > > [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> > > [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> > > [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> > > [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> > > [84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
> > > [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> > > [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> > > [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> > >
> > > Not entirely sure if it's racing with the crashing stack or it simply
> > > overwrites the bit set by case 2 (mentioned in 0/2).
> >
> > So this is interesting. Looking at __add_to_page_cache_locked(), nothing
> > seems to prevent __SetPageLocked(page) in add_to_page_cache_lru() from
> > getting reordered into __add_to_page_cache_locked() after the page is
> > actually added to the xarray. So that one particular instance might
> > benefit from an atomic SetPageLocked or a barrier somewhere between
> > __SetPageLocked() and the actual addition of the entry into the xarray.
>
> There's a write barrier when you add something to the XArray, by virtue
> of the call to rcu_assign_pointer().

OK, I've missed rcu_assign_pointer(). Thanks for the correction... but...
rcu_assign_pointer() is __smp_store_release(&p, v) and that on x86 seems
to be:

	barrier();					\
	WRITE_ONCE(*p, v);				\

which seems to provide a compiler barrier but not an SMP barrier? So is
x86 store ordering strong enough to make writes appear in the right
order? So far I didn't think so... What am I missing?

								Honza
--
Jan Kara
SUSE Labs, CR
Re: [RFC 1/2] page-flags: Make page lock operation atomic
On Mon, Feb 11, 2019 at 06:48:46PM +0100, Jan Kara wrote:
> On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > > >> Signed-off-by: Chintan Pandya
> > >
> > > NAK.
> > >
> > > This is bound to regress some stuff. Now agreed that using non-atomic
> > > ops is tricky, but many are in places where we 'know' there can't be
> > > concurrency.
> > >
> > > If you can show any single one is wrong, we can fix that one, but we're
> > > not going to blanket remove all this just because.
> >
> > Not quite familiar with the below stack, but from the crash dump, found
> > that this was another stack running on some other CPU at the same time
> > which also updates the page cache LRU and manipulates locks.
> >
> > [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> > [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> > [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> > [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> > [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> > [84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
> > [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> > [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> > [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> >
> > Not entirely sure if it's racing with the crashing stack or it simply
> > overwrites the bit set by case 2 (mentioned in 0/2).
>
> So this is interesting. Looking at __add_to_page_cache_locked(), nothing
> seems to prevent __SetPageLocked(page) in add_to_page_cache_lru() from
> getting reordered into __add_to_page_cache_locked() after the page is
> actually added to the xarray. So that one particular instance might
> benefit from an atomic SetPageLocked or a barrier somewhere between
> __SetPageLocked() and the actual addition of the entry into the xarray.

There's a write barrier when you add something to the XArray, by virtue
of the call to rcu_assign_pointer().
Re: [RFC 1/2] page-flags: Make page lock operation atomic
On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > >> Signed-off-by: Chintan Pandya
> >
> > NAK.
> >
> > This is bound to regress some stuff. Now agreed that using non-atomic
> > ops is tricky, but many are in places where we 'know' there can't be
> > concurrency.
> >
> > If you can show any single one is wrong, we can fix that one, but we're
> > not going to blanket remove all this just because.
>
> Not quite familiar with the below stack, but from the crash dump, found
> that this was another stack running on some other CPU at the same time
> which also updates the page cache LRU and manipulates locks.
>
> [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> [84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
> [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
>
> Not entirely sure if it's racing with the crashing stack or it simply
> overwrites the bit set by case 2 (mentioned in 0/2).

So this is interesting. Looking at __add_to_page_cache_locked(), nothing
seems to prevent __SetPageLocked(page) in add_to_page_cache_lru() from
getting reordered into __add_to_page_cache_locked() after the page is
actually added to the xarray. So that one particular instance might
benefit from an atomic SetPageLocked or a barrier somewhere between
__SetPageLocked() and the actual addition of the entry into the xarray.

								Honza
--
Jan Kara
SUSE Labs, CR
Re: [RFC 1/2] page-flags: Make page lock operation atomic
On 11/02/19 7:16 PM, Peter Zijlstra wrote:
> On Mon, Feb 11, 2019 at 12:53:53PM +, Chintan Pandya wrote:
>> Currently, the page lock operation is non-atomic. This opens some
>> scope for a race condition. For example, if 2 threads are accessing
>> the same page flags, it may happen that our desired thread's page
>> lock bit (PG_locked) gets overwritten by the other thread, leaving
>> the page unlocked. This can cause issues later when some code
>> expects the page to be locked but it is not.
>>
>> Make the page lock/unlock operations use the atomic version of the
>> set_bit API. There are other flag set operations which still use the
>> non-atomic version of the set_bit API, but that might be a change
>> for the future.
>>
>> Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e
>
> That doesn't belong in patches.

Sure. That's a miss. Will fix this.

>
>> Signed-off-by: Chintan Pandya
>
> NAK.
>
> This is bound to regress some stuff. Now agreed that using non-atomic
> ops is tricky, but many are in places where we 'know' there can't be
> concurrency.
>
> If you can show any single one is wrong, we can fix that one, but we're
> not going to blanket remove all this just because.

Not quite familiar with the below stack, but from the crash dump, found
that this was another stack running on some other CPU at the same time
which also updates the page cache LRU and manipulates locks.

[84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
[84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
[84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
[84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
[84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
[84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
[84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
[84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
[84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88

Not entirely sure if it's racing with the crashing stack or it simply
overwrites the bit set by case 2 (mentioned in 0/2).
Re: [RFC 1/2] page-flags: Make page lock operation atomic
On Mon, Feb 11, 2019 at 12:53:53PM +, Chintan Pandya wrote:
> Currently, the page lock operation is non-atomic. This opens some
> scope for a race condition. For example, if 2 threads are accessing
> the same page flags, it may happen that our desired thread's page
> lock bit (PG_locked) gets overwritten by the other thread, leaving
> the page unlocked. This can cause issues later when some code
> expects the page to be locked but it is not.
>
> Make the page lock/unlock operations use the atomic version of the
> set_bit API. There are other flag set operations which still use the
> non-atomic version of the set_bit API, but that might be a change
> for the future.
>
> Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e

That doesn't belong in patches.

> Signed-off-by: Chintan Pandya

NAK.

This is bound to regress some stuff. Now agreed that using non-atomic
ops is tricky, but many are in places where we 'know' there can't be
concurrency.

If you can show any single one is wrong, we can fix that one, but we're
not going to blanket remove all this just because.
[RFC 1/2] page-flags: Make page lock operation atomic
Currently, the page lock operation is non-atomic. This opens some
scope for a race condition. For example, if 2 threads are accessing
the same page flags, it may happen that our desired thread's page
lock bit (PG_locked) gets overwritten by the other thread, leaving
the page unlocked. This can cause issues later when some code
expects the page to be locked but it is not.

Make the page lock/unlock operations use the atomic version of the
set_bit API. There are other flag set operations which still use the
non-atomic version of the set_bit API, but that might be a change
for the future.

Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e
Signed-off-by: Chintan Pandya
---
 fs/cifs/file.c             | 8 ++++----
 fs/pipe.c                  | 2 +-
 include/linux/page-flags.h | 2 +-
 include/linux/pagemap.h    | 6 +++---
 mm/filemap.c               | 4 ++--
 mm/khugepaged.c            | 2 +-
 mm/ksm.c                   | 2 +-
 mm/memory-failure.c        | 2 +-
 mm/memory.c                | 2 +-
 mm/migrate.c               | 2 +-
 mm/shmem.c                 | 6 +++---
 mm/swap_state.c            | 4 ++--
 mm/vmscan.c                | 2 +-
 13 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 7d6539a04fac..23bcdee37239 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3661,13 +3661,13 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
 	 * should have access to this page, we're safe to simply set
 	 * PG_locked without checking it first.
 	 */
-	__SetPageLocked(page);
+	SetPageLocked(page);
 	rc = add_to_page_cache_locked(page, mapping, page->index, gfp);
 
 	/* give up if we can't stick it in the cache */
 	if (rc) {
-		__ClearPageLocked(page);
+		ClearPageLocked(page);
 		return rc;
 	}
 
@@ -3688,9 +3688,9 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
 		if (*bytes + PAGE_SIZE > rsize)
 			break;
 
-		__SetPageLocked(page);
+		SetPageLocked(page);
 		if (add_to_page_cache_locked(page, mapping, page->index, gfp)) {
-			__ClearPageLocked(page);
+			ClearPageLocked(page);
 			break;
 		}
 		list_move_tail(&page->lru, tmplist);
diff --git a/fs/pipe.c b/fs/pipe.c
index 8ef7d7bef775..1bab40a2ca44 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -147,7 +147,7 @@ static int anon_pipe_buf_steal(struct pipe_inode_info *pipe,
 	if (page_count(page) == 1) {
 		if (memcg_kmem_enabled())
 			memcg_kmem_uncharge(page, 0);
-		__SetPageLocked(page);
+		SetPageLocked(page);
 		return 0;
 	}
 	return 1;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5af67406b9c9..a56a9bd4bc6b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -268,7 +268,7 @@ static inline int TestClearPage##uname(struct page *page) { return 0; }
 #define TESTSCFLAG_FALSE(uname)						\
 	TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)
 
-__PAGEFLAG(Locked, locked, PF_NO_TAIL)
+PAGEFLAG(Locked, locked, PF_NO_TAIL)
 PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
 PAGEFLAG(Error, error, PF_NO_COMPOUND) TESTCLEARFLAG(Error, error, PF_NO_COMPOUND)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 51a9a0af3281..87a0447cfbe0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -619,17 +619,17 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
- * the page is new, so we can just run __SetPageLocked() against it.
+ * the page is new, so we can just run SetPageLocked() against it.
  */
 static inline int add_to_page_cache(struct page *page,
 		struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
 {
 	int error;
 
-	__SetPageLocked(page);
+	SetPageLocked(page);
 	error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
 	if (unlikely(error))
-		__ClearPageLocked(page);
+		ClearPageLocked(page);
 	return error;
 }
diff --git a/mm/filemap.c b/mm/filemap.c
index 8e09304af1ec..14284726cf3a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -807,11 +807,11 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 	void *shadow = NULL;
 	int ret;
 
-	__SetPageLocked(page);
+	SetPageLocked(page);
 	ret = __add_to_page_cache_locked(page, mapping, offset,
 					 gfp_mask, &shadow);
 	if (unlikely(ret