Re: Changes to reduce pressure on uvm_pageqlock
On Thu, Dec 12, 2019 at 06:30:49PM +0900, Takashi YAMAMOTO wrote: > > I do want to add more fields. It turns out the first two fit into the > > unused lower bits of pg->phys_addr without too much inconvenience for > > anybody: > > > > - freelist: 'cos uvm_page_lookup_freelist() turns up prominently in traces > > > > how about having physseg id instead? > i think it isn't too expensive to get freelist and phys_addr using it. > i guess numa things have good uses of physseg-equivalent too. It looks doable, and still better than using the rbtree in uvm_pagefree(), but would require changes to the allocation of physsegs for UVM_HOTPLUG. I will go with freelist # to begin with, but it will be easy to change to seg later with some small changes to the allocation. I will include that in the comments. > > - bucket: which cpu->ci_package_id I am from, for L2/L3 locality or NUMA > > > > The third looks like it'll fit in the bottom bits of pg->offset but I need > > to investigate what dificulties it might cause: > > > > - waiter count for PG_BUSY, replacing PG_WANTED, to avoid thundering herd > > > > i guess sooner or later we will need a major surgery to allow concurrent > faults on a page. at least for easy/common cases. > (well, it has been "sooner or later" for a decade.) Yup a waiters count only treats the symptom. I looked at FreeBSD's vm_page out of interest and it seems they have turned PG_BUSY into a kind of reader/writer lock. Hmm. Andrew
Re: Changes to reduce pressure on uvm_pageqlock
hi, 2019年12月11日(水) 20:24 Andrew Doran : > On Sun, Dec 08, 2019 at 07:00:43AM -0800, Jason Thorpe wrote: > > > > On Dec 8, 2019, at 6:00 AM, Mindaugas Rasiukevicius > wrote: > > > > > > Just a small note: since vm_page::wire_count and vm_page::loan_count > are > > > kind of mutually exclusive, I think suggestion by yamt@ (long time > ago) to > > > merge them (positive value meaning wired count and negative -- loan) > is a > > > useful one. It would also free up those 16-bits, since Andrew seems > to be > > > in need of some. > > Thank you for the suggestion. I have added stuff into vm_page which I > don't > particularly like doing, so I'll look into the change you mentioned, maybe > not this week though and after I deliver this set of changes. > > I do want to add more fields. It turns out the first two fit into the > unused lower bits of pg->phys_addr without too much inconvenience for > anybody: > > - freelist: 'cos uvm_page_lookup_freelist() turns up prominently in traces > how about having physseg id instead? i think it isn't too expensive to get freelist and phys_addr using it. i guess numa things have good uses of physseg-equivalent too. > - bucket: which cpu->ci_package_id I am from, for L2/L3 locality or NUMA > > The third looks like it'll fit in the bottom bits of pg->offset but I need > to investigate what dificulties it might cause: > > - waiter count for PG_BUSY, replacing PG_WANTED, to avoid thundering herd > i guess sooner or later we will need a major surgery to allow concurrent faults on a page. at least for easy/common cases. (well, it has been "sooner or later" for a decade.) > > > That seems like a good change to make, but omg please wrap the > > manipulation in macros / inlines because it seems like otherwise it could > > be quite error-prone. > > Agreed. > > Andrew >
Re: Changes to reduce pressure on uvm_pageqlock
On Sun, Dec 08, 2019 at 07:00:43AM -0800, Jason Thorpe wrote: > > On Dec 8, 2019, at 6:00 AM, Mindaugas Rasiukevicius > > wrote: > > > > Just a small note: since vm_page::wire_count and vm_page::loan_count are > > kind of mutually exclusive, I think suggestion by yamt@ (long time ago) to > > merge them (positive value meaning wired count and negative -- loan) is a > > useful one. It would also free up those 16-bits, since Andrew seems to be > > in need of some. Thank you for the suggestion. I have added stuff into vm_page which I don't particularly like doing, so I'll look into the change you mentioned, maybe not this week though and after I deliver this set of changes. I do want to add more fields. It turns out the first two fit into the unused lower bits of pg->phys_addr without too much inconvenience for anybody: - freelist: 'cos uvm_page_lookup_freelist() turns up prominently in traces - bucket: which cpu->ci_package_id I am from, for L2/L3 locality or NUMA The third looks like it'll fit in the bottom bits of pg->offset but I need to investigate what dificulties it might cause: - waiter count for PG_BUSY, replacing PG_WANTED, to avoid thundering herd > That seems like a good change to make, but omg please wrap the > manipulation in macros / inlines because it seems like otherwise it could > be quite error-prone. Agreed. Andrew
Re: Changes to reduce pressure on uvm_pageqlock
Complete & compilable diff as promised. Having taken feedback into consideration and considering that with the last revision uvm_pageqlock wasn't doing a whole lot any more, this takes it further than before, about as far as I want to go right now. Summary: - uvm_pageqlock is gone. - instead we have vm_page::interlock protecting identity, and... - the pdpolicy has its own lock. could do multiple queues/locks later. - the ground is paved for lazy dequeueing of pages. - vm_page has grown (yamt's radixtree can eliminate rbtree node later). - i think it fixes this recent PR: http://gnats.netbsd.org/54727 - vm_page::pqflags becomes totally private to pdpolicy. it now needs to be 32-bits to stand on its own for architectures that can't do atomic sub-word writes, because it's locked by a different lock to the adjacent fields. i am using the spare bits for lazy activation. - i have bumped loan & wire counts to 32-bits. i don't think 16 is safe. - clockpro was already broken but I have tried not to make it worse. http://www.netbsd.org/~ad/2019/uvm_pageqlock.diff Cheers, Andrew On Sat, Dec 07, 2019 at 10:40:39PM +, Andrew Doran wrote: > On Sat, Dec 07, 2019 at 01:32:35PM -0800, Chuck Silvers wrote: > > > I like this new version much more than the previous version. > > The new diff looks incomplete though, it uses the "pdpol" field in > > vm_page that was new in the previous diff. It looks like there > > should be changes in uvm_pdaemon.c to go with the new diff too? > > Could you send the complete new diff again please? > > Yes it's incomplete. I'll split these pieces out of my tree, merge into > -current tomorrow and get a clean diff. > > I think the locking change is enough on its own, so I will exclude the > "pdpol" piece to avoid clouding things, and go again with another review for > that and other parts in the near future. > > > > With that done, we'd then only need one version of uvm_pageactivate() etc, > > > because those functions will no longer care about uvm_pageqlock. > > > > The pageq lock is named that because it currently protects the pageq's. :-) > > If it's not going to protect the pageq's anymore then it ought to be > > renamed. > > Probably better to just work on eliminating it though. > > Agreed. > > > In your new diff, the pageq lock is still used by the pagedaemon > > (for uvmpd_trylockowner() as you noted in the comment). > > It would be good not to have a global lock involved in changing page > > identity, > > and with the pageq's now protected by a separate lock, we should be able to > > eliminate the need for the pageq lock when freeing a page with an RCUy thing > > to prevent freeing uvm_object and uvm_anon structures while the pagedaemon > > might > > still be accessing them via the vm_page fields. We should be able to > > improve > > the locking for pg->loan_count with a similar RCUy thing... my recollection > > is > > that the only reason loan_count uses the pageq lock currently is to > > stabilize > > the page identity while locking the owning object. It might make sense to > > have > > On stablisation yes I agree. My reading has always been that it's safe to > test those properties with either the pageqlock or the object locked - but > on what happens next, your intent matters. > > > pg->wire_count protected by the pdpol lock too (though I have some thoughts > > about changing the nature of wire_count that would probaby make that not > > make sense, I haven't had time to think that through yet). > > > > But anyway, I'll stop rambling at this point so we can focus on the diff at > > hand. > > :-) > > That's an interesting point re stabilisation and a RCUy type thing I see > exactly what you're getting at and it's related to the few points of > connection between datasets I mentioned. I'll have a think about that. > > Thanks for taking the time to look this over Chuck. > > Cheers, > Andrew > > > > > -Chuck > > > > > > > I tried it out: the code looks more elegant and the scheme works in > > > practice > > > with contention on uvm_pageqlock now quite minor, and the sum of > > > contention > > > on the two locks lower than it was on uvm_pageqlock prior. > > > > > > With the locking for the pdpolicy code being private to the file I think > > > it > > > would then be easier to experiment with different strategies to reduce > > > contention further. > > > > > > http://www.netbsd.org/~ad/2019/pdpolicy.diff > > > > > > Andrew
Re: Changes to reduce pressure on uvm_pageqlock
> On Dec 8, 2019, at 6:00 AM, Mindaugas Rasiukevicius wrote: > > Just a small note: since vm_page::wire_count and vm_page::loan_count are > kind of mutually exclusive, I think suggestion by yamt@ (long time ago) to > merge them (positive value meaning wired count and negative -- loan) is a > useful one. It would also free up those 16-bits, since Andrew seems to be > in need of some. That seems like a good change to make, but omg please wrap the manipulation in macros / inlines because it seems like otherwise it could be quite error-prone. -- thorpej
Re: Changes to reduce pressure on uvm_pageqlock
Chuck Silvers wrote: > In your new diff, the pageq lock is still used by the pagedaemon > (for uvmpd_trylockowner() as you noted in the comment). > It would be good not to have a global lock involved in changing page > identity, and with the pageq's now protected by a separate lock, we > should be able to eliminate the need for the pageq lock when freeing a > page with an RCUy thing to prevent freeing uvm_object and uvm_anon > structures while the pagedaemon might still be accessing them via the > vm_page fields. We should be able to improve the locking for > pg->loan_count with a similar RCUy thing... my recollection is that the > only reason loan_count uses the pageq lock currently is to stabilize the > page identity while locking the owning object. It might make sense to > have pg->wire_count protected by the pdpol lock too (though I have some > thoughts about changing the nature of wire_count that would probaby make > that not make sense, I haven't had time to think that through yet). Just a small note: since vm_page::wire_count and vm_page::loan_count are kind of mutually exclusive, I think suggestion by yamt@ (long time ago) to merge them (positive value meaning wired count and negative -- loan) is a useful one. It would also free up those 16-bits, since Andrew seems to be in need of some. -- Mindaugas
Re: Changes to reduce pressure on uvm_pageqlock
On Sat, Dec 07, 2019 at 01:32:35PM -0800, Chuck Silvers wrote: > I like this new version much more than the previous version. > The new diff looks incomplete though, it uses the "pdpol" field in > vm_page that was new in the previous diff. It looks like there > should be changes in uvm_pdaemon.c to go with the new diff too? > Could you send the complete new diff again please? Yes it's incomplete. I'll split these pieces out of my tree, merge into -current tomorrow and get a clean diff. I think the locking change is enough on its own, so I will exclude the "pdpol" piece to avoid clouding things, and go again with another review for that and other parts in the near future. > > With that done, we'd then only need one version of uvm_pageactivate() etc, > > because those functions will no longer care about uvm_pageqlock. > > The pageq lock is named that because it currently protects the pageq's. :-) > If it's not going to protect the pageq's anymore then it ought to be renamed. > Probably better to just work on eliminating it though. Agreed. > In your new diff, the pageq lock is still used by the pagedaemon > (for uvmpd_trylockowner() as you noted in the comment). > It would be good not to have a global lock involved in changing page identity, > and with the pageq's now protected by a separate lock, we should be able to > eliminate the need for the pageq lock when freeing a page with an RCUy thing > to prevent freeing uvm_object and uvm_anon structures while the pagedaemon > might > still be accessing them via the vm_page fields. We should be able to improve > the locking for pg->loan_count with a similar RCUy thing... my recollection is > that the only reason loan_count uses the pageq lock currently is to stabilize > the page identity while locking the owning object. It might make sense to > have On stablisation yes I agree. My reading has always been that it's safe to test those properties with either the pageqlock or the object locked - but on what happens next, your intent matters. > pg->wire_count protected by the pdpol lock too (though I have some thoughts > about changing the nature of wire_count that would probaby make that not > make sense, I haven't had time to think that through yet). > > But anyway, I'll stop rambling at this point so we can focus on the diff at > hand. > :-) That's an interesting point re stabilisation and a RCUy type thing I see exactly what you're getting at and it's related to the few points of connection between datasets I mentioned. I'll have a think about that. Thanks for taking the time to look this over Chuck. Cheers, Andrew > > -Chuck > > > > I tried it out: the code looks more elegant and the scheme works in practice > > with contention on uvm_pageqlock now quite minor, and the sum of contention > > on the two locks lower than it was on uvm_pageqlock prior. > > > > With the locking for the pdpolicy code being private to the file I think it > > would then be easier to experiment with different strategies to reduce > > contention further. > > > > http://www.netbsd.org/~ad/2019/pdpolicy.diff > > > > Andrew
Re: Changes to reduce pressure on uvm_pageqlock
I like this new version much more than the previous version. The new diff looks incomplete though, it uses the "pdpol" field in vm_page that was new in the previous diff. It looks like there should be changes in uvm_pdaemon.c to go with the new diff too? Could you send the complete new diff again please? On Fri, Dec 06, 2019 at 04:34:50PM +, Andrew Doran wrote: > I ended up taking a disliking to these changes so I went back and took > another look at the problem yesterday evening. > > There are two distinct sets of data involved that need protection: page > identity (connected anon, uobj, wiring, loaning) and page replacement state. > > There are limited points of connection between those two sets, so it's easy > to do two locks. We can give the pdpolicy code a lock for its state and let > it manage that lock, which only it and the pagedaemon need to know about. > The lock can be private to the file (uvm_pdpolicy_clock / clockpro.c), with > a couple of hooks for the pagedaemon to acquire and release it. I like this. > With that done, we'd then only need one version of uvm_pageactivate() etc, > because those functions will no longer care about uvm_pageqlock. The pageq lock is named that because it currently protects the pageq's. :-) If it's not going to protect the pageq's anymore then it ought to be renamed. Probably better to just work on eliminating it though. In your new diff, the pageq lock is still used by the pagedaemon (for uvmpd_trylockowner() as you noted in the comment). It would be good not to have a global lock involved in changing page identity, and with the pageq's now protected by a separate lock, we should be able to eliminate the need for the pageq lock when freeing a page with an RCUy thing to prevent freeing uvm_object and uvm_anon structures while the pagedaemon might still be accessing them via the vm_page fields. We should be able to improve the locking for pg->loan_count with a similar RCUy thing... my recollection is that the only reason loan_count uses the pageq lock currently is to stabilize the page identity while locking the owning object. It might make sense to have pg->wire_count protected by the pdpol lock too (though I have some thoughts about changing the nature of wire_count that would probaby make that not make sense, I haven't had time to think that through yet). But anyway, I'll stop rambling at this point so we can focus on the diff at hand. :-) -Chuck > I tried it out: the code looks more elegant and the scheme works in practice > with contention on uvm_pageqlock now quite minor, and the sum of contention > on the two locks lower than it was on uvm_pageqlock prior. > > With the locking for the pdpolicy code being private to the file I think it > would then be easier to experiment with different strategies to reduce > contention further. > > http://www.netbsd.org/~ad/2019/pdpolicy.diff > > Andrew
Re: Changes to reduce pressure on uvm_pageqlock
On Fri, Dec 06, 2019 at 11:33:27PM +, Andrew Doran wrote: > [...] > As to the reason why: at the start of November on my machine system time for > a kernel build was 3200-3300s. With this plus the remaining changes it's > down to 850s so far and with !DIAGNOSTIC about 580s. good job ! thanks ! -- Manuel Bouyer NetBSD: 26 ans d'experience feront toujours la difference --
Re: Changes to reduce pressure on uvm_pageqlock
Ok here's is a more complete patch for the change to uvm_pageqlock. I have edited by hand to remove lots of unrelated changes so it's to get a picture of the changes - it won't apply or compile. http://www.netbsd.org/~ad/2019/partial.diff (It's time consuming to detangle this from the rest of my changes which are per-CPU stat collection, a topology-aware page allocator that dispenses with the "per-cpu" lists we have now, and yamt's radixtree). There are a few more things I want to look at tomorrow including the PQ_READAHEAD flag (it's not safe now) whether the pagedaemon needs to know about the pdpolicy lock at all, and a last go over it all checking everything. The earlier changes to vm_page I mentioned for alignment and the page replacement policy I am going to postpone because radixtree changes the picture bigtime and more testing is required. (That's a separate review.) As to the reason why: at the start of November on my machine system time for a kernel build was 3200-3300s. With this plus the remaining changes it's down to 850s so far and with !DIAGNOSTIC about 580s. Andrew On Fri, Dec 06, 2019 at 04:34:50PM +, Andrew Doran wrote: > I ended up taking a disliking to these changes so I went back and took > another look at the problem yesterday evening. > > There are two distinct sets of data involved that need protection: page > identity (connected anon, uobj, wiring, loaning) and page replacement state. > > There are limited points of connection between those two sets, so it's easy > to do two locks. We can give the pdpolicy code a lock for its state and let > it manage that lock, which only it and the pagedaemon need to know about. > The lock can be private to the file (uvm_pdpolicy_clock / clockpro.c), with > a couple of hooks for the pagedaemon to acquire and release it. > > With that done, we'd then only need one version of uvm_pageactivate() etc, > because those functions will no longer care about uvm_pageqlock. > > I tried it out: the code looks more elegant and the scheme works in practice > with contention on uvm_pageqlock now quite minor, and the sum of contention > on the two locks lower than it was on uvm_pageqlock prior. > > With the locking for the pdpolicy code being private to the file I think it > would then be easier to experiment with different strategies to reduce > contention further. > > http://www.netbsd.org/~ad/2019/pdpolicy.diff > > Andrew > > On Tue, Dec 03, 2019 at 12:41:25AM +, Andrew Doran wrote: > > Hi, > > > > I have some straightforward changes for uvm to improve the situation on > > many-CPU machines. I'm going to break them into pieces to make them easier > > to review (only this first piece and what's already in CVS is ready). > > > > I have carefuly measured the impact of these over hundreds of kernel builds, > > using lockstat, tprof and some custom instrumentation so I'm confident that > > for each, the effects at least are of value. > > > > Anyway I'd be grateful if someone could take a look. This one is about > > reducing pressure on uvm_pageqlock, and cache misses on struct vm_page. > > > > Cheers, > > Andrew > > > > > > http://www.netbsd.org/~ad/2019/uvm1.diff > > > > > > vm_page: cluster largely static fields used during page lookup in the first > > 64-bytes. Increase wire_count to 32-bits, and add a field for use of the > > page replacement policy. This leaves 2 bytes spare in a word locked by > > uvm_fpageqlock/uvm_pageqlock which I want to use later for changes to the > > page allocator. It also brings vm_page up to 128 bytes on amd64. > > > > New functions: > > > > => uvmpdpol_pageactivate_p() > > > > For page replacement policy. Returns true if pdpol thinks activation info > > would be useful enough to cause disruption to page queues, vm_page and > > uvm_fpageqlock. For CLOCK this returns true if page is not active, or was > > not activated within the last second. > > > > => uvm_pageenqueue1() > > > > Call without uvm_pageqlock. Acquires the lock and enqueues the page only > > if not already enqueued. > > > > => uvm_pageactivate1() > > > > Call without uvm_pageqlock. Acquires the lock and activates the page only > > if uvmpdpol_pageactivate_p() says yes. No similar change for deactivate nor > > dequeue as they are much more definite. > > > > => uvm_pagefree1() > > > > First part of uvm_pagefree() - strip page of identity. Requires > > uvm_pageqlock if associated with an object. > > > > => uvm_pagefree2() > > > > Second part of uvm_pagefree(). Send page to free list. No locks required.
Re: Changes to reduce pressure on uvm_pageqlock
I ended up taking a disliking to these changes so I went back and took another look at the problem yesterday evening. There are two distinct sets of data involved that need protection: page identity (connected anon, uobj, wiring, loaning) and page replacement state. There are limited points of connection between those two sets, so it's easy to do two locks. We can give the pdpolicy code a lock for its state and let it manage that lock, which only it and the pagedaemon need to know about. The lock can be private to the file (uvm_pdpolicy_clock / clockpro.c), with a couple of hooks for the pagedaemon to acquire and release it. With that done, we'd then only need one version of uvm_pageactivate() etc, because those functions will no longer care about uvm_pageqlock. I tried it out: the code looks more elegant and the scheme works in practice with contention on uvm_pageqlock now quite minor, and the sum of contention on the two locks lower than it was on uvm_pageqlock prior. With the locking for the pdpolicy code being private to the file I think it would then be easier to experiment with different strategies to reduce contention further. http://www.netbsd.org/~ad/2019/pdpolicy.diff Andrew On Tue, Dec 03, 2019 at 12:41:25AM +, Andrew Doran wrote: > Hi, > > I have some straightforward changes for uvm to improve the situation on > many-CPU machines. I'm going to break them into pieces to make them easier > to review (only this first piece and what's already in CVS is ready). > > I have carefuly measured the impact of these over hundreds of kernel builds, > using lockstat, tprof and some custom instrumentation so I'm confident that > for each, the effects at least are of value. > > Anyway I'd be grateful if someone could take a look. This one is about > reducing pressure on uvm_pageqlock, and cache misses on struct vm_page. > > Cheers, > Andrew > > > http://www.netbsd.org/~ad/2019/uvm1.diff > > > vm_page: cluster largely static fields used during page lookup in the first > 64-bytes. Increase wire_count to 32-bits, and add a field for use of the > page replacement policy. This leaves 2 bytes spare in a word locked by > uvm_fpageqlock/uvm_pageqlock which I want to use later for changes to the > page allocator. It also brings vm_page up to 128 bytes on amd64. > > New functions: > > => uvmpdpol_pageactivate_p() > > For page replacement policy. Returns true if pdpol thinks activation info > would be useful enough to cause disruption to page queues, vm_page and > uvm_fpageqlock. For CLOCK this returns true if page is not active, or was > not activated within the last second. > > => uvm_pageenqueue1() > > Call without uvm_pageqlock. Acquires the lock and enqueues the page only > if not already enqueued. > > => uvm_pageactivate1() > > Call without uvm_pageqlock. Acquires the lock and activates the page only > if uvmpdpol_pageactivate_p() says yes. No similar change for deactivate nor > dequeue as they are much more definite. > > => uvm_pagefree1() > > First part of uvm_pagefree() - strip page of identity. Requires > uvm_pageqlock if associated with an object. > > => uvm_pagefree2() > > Second part of uvm_pagefree(). Send page to free list. No locks required.
Changes to reduce pressure on uvm_pageqlock
Hi, I have some straightforward changes for uvm to improve the situation on many-CPU machines. I'm going to break them into pieces to make them easier to review (only this first piece and what's already in CVS is ready). I have carefuly measured the impact of these over hundreds of kernel builds, using lockstat, tprof and some custom instrumentation so I'm confident that for each, the effects at least are of value. Anyway I'd be grateful if someone could take a look. This one is about reducing pressure on uvm_pageqlock, and cache misses on struct vm_page. Cheers, Andrew http://www.netbsd.org/~ad/2019/uvm1.diff vm_page: cluster largely static fields used during page lookup in the first 64-bytes. Increase wire_count to 32-bits, and add a field for use of the page replacement policy. This leaves 2 bytes spare in a word locked by uvm_fpageqlock/uvm_pageqlock which I want to use later for changes to the page allocator. It also brings vm_page up to 128 bytes on amd64. New functions: => uvmpdpol_pageactivate_p() For page replacement policy. Returns true if pdpol thinks activation info would be useful enough to cause disruption to page queues, vm_page and uvm_fpageqlock. For CLOCK this returns true if page is not active, or was not activated within the last second. => uvm_pageenqueue1() Call without uvm_pageqlock. Acquires the lock and enqueues the page only if not already enqueued. => uvm_pageactivate1() Call without uvm_pageqlock. Acquires the lock and activates the page only if uvmpdpol_pageactivate_p() says yes. No similar change for deactivate nor dequeue as they are much more definite. => uvm_pagefree1() First part of uvm_pagefree() - strip page of identity. Requires uvm_pageqlock if associated with an object. => uvm_pagefree2() Second part of uvm_pagefree(). Send page to free list. No locks required.