Re: stable/13, vm page counts do not add up
On Tue, 13 Apr 2021 17:18:42 -0400 Mark Johnston wrote:
> > P.S.
> > I have not been running any virtual machines.
> > I do use the nvidia graphics driver.

In the past I filed a report about that:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238698
Now I have switched to AMD and see only ~1 GB of memory allocated by xorg.

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: stable/13, vm page counts do not add up
On 14/04/2021 16:32, Mark Johnston wrote:
> On Wed, Apr 14, 2021 at 02:21:44PM +0300, Andriy Gapon wrote:
> > On 14/04/2021 00:18, Mark Johnston wrote:
> > > fbt::vm_page_unwire:entry
> > > /args[0]->oflags & 0x4/
> > > {
> > >         @unwire[stack()] = count();
> > > }
> >
> > Unrelated report, dtrace complains about this probe on my stable/13 system:
> >   failed to resolve translated type for args[0]
> > And I do not have any idea why...
>
> There was a regression, see PR 253440.  I think you have the fix
> already, but perhaps not.  Could you show output from
> "dtrace -lv -n fbt::vm_page_unwire:entry"?

$ dtrace -lv -n fbt::vm_page_unwire:entry
   ID   PROVIDER            MODULE                          FUNCTION NAME
54323        fbt            kernel                    vm_page_unwire entry

        Probe Description Attributes
                Identifier Names: Private
                Data Semantics:   Private
                Dependency Class: Unknown

        Argument Attributes
                Identifier Names: Private
                Data Semantics:   Private
                Dependency Class: ISA

        Argument Types
                args[0]: (unknown)
                args[1]: (unknown)

It seems that I should have the fix, but somehow I still have the problem.
I've been doing NO_CLEAN builds for a long while, so maybe some stale file
didn't get re-created...  It looks like dt_lex.c under /usr/obj is rather
dated.
...
I've removed that file and rebuilt libdtrace and everything is okay now.
Thank you.
From ctfdump:
  [27290] FUNC (vm_page_unwire) returns: 38 args: (1463, 3)

  <1463> TYPEDEF vm_page_t refers to 778
  <778> POINTER (anon) refers to 3575
  <3575> STRUCT vm_page (104 bytes)
        plinks type=3563 off=0
        listq type=3558 off=128
        object type=3564 off=256
        pindex type=3565 off=320
        phys_addr type=42 off=384
        md type=3571 off=448
        ref_count type=31 off=640
        busy_lock type=31 off=672
        a type=3573 off=704
        order type=3 off=736
        pool type=3 off=744
        flags type=3 off=752
        oflags type=3 off=760
        psind type=2167 off=768
        segind type=2167 off=776
        valid type=3574 off=784
        dirty type=3574 off=792

-- 
Andriy Gapon
Re: stable/13, vm page counts do not add up
On Wed, Apr 14, 2021 at 02:21:44PM +0300, Andriy Gapon wrote:
> On 14/04/2021 00:18, Mark Johnston wrote:
> > fbt::vm_page_unwire:entry
> > /args[0]->oflags & 0x4/
> > {
> >         @unwire[stack()] = count();
> > }
>
> Unrelated report, dtrace complains about this probe on my stable/13 system:
>   failed to resolve translated type for args[0]
>
> And I do not have any idea why...

There was a regression, see PR 253440.  I think you have the fix
already, but perhaps not.  Could you show output from
"dtrace -lv -n fbt::vm_page_unwire:entry"?

> From ctfdump:
> [27290] FUNC (vm_page_unwire) returns: 38 args: (1463, 3)
>
> <1463> TYPEDEF vm_page_t refers to 778
> <778> POINTER (anon) refers to 3575
> <3575> STRUCT vm_page (104 bytes)
>         plinks type=3563 off=0
>         listq type=3558 off=128
>         object type=3564 off=256
>         pindex type=3565 off=320
>         phys_addr type=42 off=384
>         md type=3571 off=448
>         ref_count type=31 off=640
>         busy_lock type=31 off=672
>         a type=3573 off=704
>         order type=3 off=736
>         pool type=3 off=744
>         flags type=3 off=752
>         oflags type=3 off=760
>         psind type=2167 off=768
>         segind type=2167 off=776
>         valid type=3574 off=784
>         dirty type=3574 off=792
>
> --
> Andriy Gapon
Re: stable/13, vm page counts do not add up
On 14/04/2021 00:18, Mark Johnston wrote:
> fbt::vm_page_unwire:entry
> /args[0]->oflags & 0x4/
> {
>         @unwire[stack()] = count();
> }

Unrelated report, dtrace complains about this probe on my stable/13 system:
  failed to resolve translated type for args[0]

And I do not have any idea why...

From ctfdump:
  [27290] FUNC (vm_page_unwire) returns: 38 args: (1463, 3)

  <1463> TYPEDEF vm_page_t refers to 778
  <778> POINTER (anon) refers to 3575
  <3575> STRUCT vm_page (104 bytes)
        plinks type=3563 off=0
        listq type=3558 off=128
        object type=3564 off=256
        pindex type=3565 off=320
        phys_addr type=42 off=384
        md type=3571 off=448
        ref_count type=31 off=640
        busy_lock type=31 off=672
        a type=3573 off=704
        order type=3 off=736
        pool type=3 off=744
        flags type=3 off=752
        oflags type=3 off=760
        psind type=2167 off=768
        segind type=2167 off=776
        valid type=3574 off=784
        dirty type=3574 off=792

-- 
Andriy Gapon
Re: stable/13, vm page counts do not add up
On Tue, Apr 13, 2021 at 05:01:49PM +0300, Andriy Gapon wrote:
> On 07/04/2021 23:56, Mark Johnston wrote:
> > I don't know what might be causing it then.  It could be a page leak.
> > The kernel allocates wired pages without adjusting the v_wire_count
> > counter in some cases, but the ones I know about happen at boot and
> > should not account for such a large disparity.  I do not see it on a
> > few systems that I have access to.
>
> Mark or anyone,
> do you have a suggestion on how to approach hunting for the potential
> page leak?
> It's been a long while since I worked with that code and it has changed
> a lot.
>
> Here is some additional info.
> I had approximately 2 million unaccounted pages.
> I rebooted the system and that number became 20 thousand, which is more
> reasonable and could be explained by those boot-time allocations that
> you mentioned.
> After 30 hours of uptime the number became 60 thousand.
>
> I monitored the number and so far I could not correlate it with any
> activity.
>
> P.S.
> I have not been running any virtual machines.
> I do use the nvidia graphics driver.

My guess is that something is allocating pages without VM_ALLOC_WIRED and
either they're managed and something is failing to place them in page
queues, or they're unmanaged and should likely be counted as wired.  It
is also possible that something is allocating wired, unmanaged pages and
unwiring them without freeing them.  For managed pages, vm_page_unwire()
ensures they get placed in a queue.  vm_page_unwire_noq() does not, but
it is typically only used with unmanaged pages.

The nvidia drivers do not appear to call any vm_page_* functions, at
least based on the kld symbol tables.  So you might try using DTrace to
collect stacks for these functions, leaving it running for a while and
comparing stack counts with the number of pages leaked while the script
is running.
Something like:

fbt::vm_page_alloc_domain_after:entry
/(args[3] & 0x20) == 0/
{
        @alloc[stack()] = count();
}

fbt::vm_page_alloc_contig_domain:entry
/(args[3] & 0x20) == 0/
{
        @alloc[stack()] = count();
}

fbt::vm_page_unwire_noq:entry
{
        @unwire[stack()] = count();
}

fbt::vm_page_unwire:entry
/args[0]->oflags & 0x4/
{
        @unwire[stack()] = count();
}

It might be that the count of leaked pages does not relate directly to
the counts collected by the script, e.g., because there is some race
that results in a leak.  But we can try to rule out some easier cases
first.

I tried to look for possible causes of the KTLS page leak mentioned
elsewhere in this thread but can't see any obvious problems.  Does your
affected system use sendfile() at all?  I also wonder if you see much
mbuf usage on the system.
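[Editorial note: one way to compare the script's aggregations against the leak counter is to total the per-stack counts. The helper below is a hypothetical post-processing sketch, not part of this thread; it assumes DTrace's default aggregation dump, where each stack trace is followed by its count on a line of its own.]

```python
import re

def total_counts(dtrace_output):
    """Sum the per-stack counts in a DTrace stack() aggregation dump.

    Assumes the default output format: each stack trace is printed as
    indented frame lines, followed by a (possibly indented) line that
    contains only the count for that stack.
    """
    total = 0
    for line in dtrace_output.splitlines():
        # Frame lines contain module`function+offset text, so only a
        # line that is nothing but digits (and whitespace) is a count.
        if re.fullmatch(r"\s*\d+\s*", line):
            total += int(line)
    return total
```

Comparing this total (for @alloc vs. @unwire) against the growth of the unaccounted-page count over the same interval would show whether the leak is even visible through these probes.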
Re: stable/13, vm page counts do not add up
On 07/04/2021 23:56, Mark Johnston wrote:
> I don't know what might be causing it then.  It could be a page leak.
> The kernel allocates wired pages without adjusting the v_wire_count
> counter in some cases, but the ones I know about happen at boot and
> should not account for such a large disparity.  I do not see it on a
> few systems that I have access to.

Mark or anyone,
do you have a suggestion on how to approach hunting for the potential
page leak?
It's been a long while since I worked with that code and it has changed
a lot.

Here is some additional info.
I had approximately 2 million unaccounted pages.
I rebooted the system and that number became 20 thousand, which is more
reasonable and could be explained by those boot-time allocations that
you mentioned.
After 30 hours of uptime the number became 60 thousand.

I monitored the number and so far I could not correlate it with any
activity.

P.S.
I have not been running any virtual machines.
I do use the nvidia graphics driver.

-- 
Andriy Gapon
RE: stable/13, vm page counts do not add up
> -----Original Message-----
> From: owner-freebsd-curr...@freebsd.org curr...@freebsd.org> On Behalf
> Of Mark Johnston
> Sent: Wednesday, April 7, 2021 10:57 PM
> To: Andriy Gapon
> Cc: freebsd-stable List ; FreeBSD Current
> Subject: Re: stable/13, vm page counts do not add up
>
> On Wed, Apr 07, 2021 at 11:22:41PM +0300, Andriy Gapon wrote:
> > On 07/04/2021 22:54, Mark Johnston wrote:
> > > On Wed, Apr 07, 2021 at 10:42:57PM +0300, Andriy Gapon wrote:
> > >>
> > >> I regularly see that top's memory line does not add up (and by a
> > >> lot).  That can be seen with vm.stats as well.
> > >>
> > >> For example:
> > >> $ sysctl vm.stats | fgrep count
> > >> vm.stats.vm.v_cache_count: 0
> > >> vm.stats.vm.v_user_wire_count: 3231
> > >> vm.stats.vm.v_laundry_count: 262058
> > >> vm.stats.vm.v_inactive_count: 3054178
> > >> vm.stats.vm.v_active_count: 621131
> > >> vm.stats.vm.v_wire_count: 1871176
> > >> vm.stats.vm.v_free_count: 18
> > >> vm.stats.vm.v_page_count: 8134982
> > >>
> > >> $ bc
> > >> >>> 18 + 1871176 + 621131 + 3054178 + 262058
> > >> 5996320
> > >> >>> 8134982 - 5996320
> > >> 2138662
> > >>
> > >> As you can see, it's not a small number of pages either.
> > >> Approximately 2 million pages, 8 gigabytes or 25% of the whole
> > >> memory on this system.
> > >>
> > >> This is 47c00a9835926e96, 13.0-STABLE amd64.
> > >> I do not think that I saw anything like that when I used (much)
> > >> older FreeBSD.
> > >
> > > One relevant change is that vm_page_wire() no longer removes pages
> > > from LRU queues, so the count of pages in the queues can include
> > > wired pages.  If the page daemon runs, it will dequeue any wired
> > > pages that are encountered.
> >
> > Maybe I misunderstand how that works, but I would expect that the sum
> > of all counters could be greater than v_page_count at times.  But in
> > my case it's less.
>
> I misread, sorry.  You're right, what I described would cause double
> counting.
>
> I don't know what might be causing it then.  It could be a page leak.
> The kernel allocates wired pages without adjusting the v_wire_count
> counter in some cases, but the ones I know about happen at boot and
> should not account for such a large disparity.  I do not see it on a
> few systems that I have access to.
>
> > > This was done to reduce queue lock contention; operations like
> > > sendfile() which transiently wire pages would otherwise trigger two
> > > queue operations per page.  Now that queue operations are batched
> > > this might not be as important.
> > >
> > > We could perhaps add a new flavour of vm_page_wire() which is not
> > > lazy and would be suited for, e.g., the buffer cache.  What is the
> > > primary source of wired pages in this case?
> >
> > It should be ZFS, I guess.
> >
> > --
> > Andriy Gapon

I see kernel memory disappearing when enabling ktls:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253281

The last test was done with 13.0-RC1.  I'm a bit at a loss how to debug
this further.

Regards,

Juergen Weiss
Juergen Weiss | we...@uni-mainz.de |
Re: stable/13, vm page counts do not add up
On Wed, Apr 07, 2021 at 11:22:41PM +0300, Andriy Gapon wrote:
> On 07/04/2021 22:54, Mark Johnston wrote:
> > On Wed, Apr 07, 2021 at 10:42:57PM +0300, Andriy Gapon wrote:
> >>
> >> I regularly see that top's memory line does not add up (and by a
> >> lot).  That can be seen with vm.stats as well.
> >>
> >> For example:
> >> $ sysctl vm.stats | fgrep count
> >> vm.stats.vm.v_cache_count: 0
> >> vm.stats.vm.v_user_wire_count: 3231
> >> vm.stats.vm.v_laundry_count: 262058
> >> vm.stats.vm.v_inactive_count: 3054178
> >> vm.stats.vm.v_active_count: 621131
> >> vm.stats.vm.v_wire_count: 1871176
> >> vm.stats.vm.v_free_count: 18
> >> vm.stats.vm.v_page_count: 8134982
> >>
> >> $ bc
> >> >>> 18 + 1871176 + 621131 + 3054178 + 262058
> >> 5996320
> >> >>> 8134982 - 5996320
> >> 2138662
> >>
> >> As you can see, it's not a small number of pages either.
> >> Approximately 2 million pages, 8 gigabytes or 25% of the whole
> >> memory on this system.
> >>
> >> This is 47c00a9835926e96, 13.0-STABLE amd64.
> >> I do not think that I saw anything like that when I used (much)
> >> older FreeBSD.
> >
> > One relevant change is that vm_page_wire() no longer removes pages
> > from LRU queues, so the count of pages in the queues can include
> > wired pages.  If the page daemon runs, it will dequeue any wired
> > pages that are encountered.
>
> Maybe I misunderstand how that works, but I would expect that the sum
> of all counters could be greater than v_page_count at times.  But in
> my case it's less.

I misread, sorry.  You're right, what I described would cause double
counting.

I don't know what might be causing it then.  It could be a page leak.
The kernel allocates wired pages without adjusting the v_wire_count
counter in some cases, but the ones I know about happen at boot and
should not account for such a large disparity.  I do not see it on a few
systems that I have access to.

> > This was done to reduce queue lock contention; operations like
> > sendfile() which transiently wire pages would otherwise trigger two
> > queue operations per page.  Now that queue operations are batched
> > this might not be as important.
> >
> > We could perhaps add a new flavour of vm_page_wire() which is not
> > lazy and would be suited for, e.g., the buffer cache.  What is the
> > primary source of wired pages in this case?
>
> It should be ZFS, I guess.
>
> --
> Andriy Gapon
Re: stable/13, vm page counts do not add up
On 07/04/2021 22:54, Mark Johnston wrote:
> On Wed, Apr 07, 2021 at 10:42:57PM +0300, Andriy Gapon wrote:
>>
>> I regularly see that top's memory line does not add up (and by a lot).
>> That can be seen with vm.stats as well.
>>
>> For example:
>> $ sysctl vm.stats | fgrep count
>> vm.stats.vm.v_cache_count: 0
>> vm.stats.vm.v_user_wire_count: 3231
>> vm.stats.vm.v_laundry_count: 262058
>> vm.stats.vm.v_inactive_count: 3054178
>> vm.stats.vm.v_active_count: 621131
>> vm.stats.vm.v_wire_count: 1871176
>> vm.stats.vm.v_free_count: 18
>> vm.stats.vm.v_page_count: 8134982
>>
>> $ bc
>> >>> 18 + 1871176 + 621131 + 3054178 + 262058
>> 5996320
>> >>> 8134982 - 5996320
>> 2138662
>>
>> As you can see, it's not a small number of pages either.
>> Approximately 2 million pages, 8 gigabytes or 25% of the whole memory
>> on this system.
>>
>> This is 47c00a9835926e96, 13.0-STABLE amd64.
>> I do not think that I saw anything like that when I used (much) older
>> FreeBSD.
>
> One relevant change is that vm_page_wire() no longer removes pages from
> LRU queues, so the count of pages in the queues can include wired pages.
> If the page daemon runs, it will dequeue any wired pages that are
> encountered.

Maybe I misunderstand how that works, but I would expect that the sum of
all counters could be greater than v_page_count at times.  But in my
case it's less.

> This was done to reduce queue lock contention; operations like
> sendfile() which transiently wire pages would otherwise trigger two
> queue operations per page.  Now that queue operations are batched this
> might not be as important.
>
> We could perhaps add a new flavour of vm_page_wire() which is not lazy
> and would be suited for, e.g., the buffer cache.  What is the primary
> source of wired pages in this case?

It should be ZFS, I guess.

-- 
Andriy Gapon
Re: stable/13, vm page counts do not add up
On Wed, Apr 07, 2021 at 10:42:57PM +0300, Andriy Gapon wrote:
>
> I regularly see that top's memory line does not add up (and by a lot).
> That can be seen with vm.stats as well.
>
> For example:
> $ sysctl vm.stats | fgrep count
> vm.stats.vm.v_cache_count: 0
> vm.stats.vm.v_user_wire_count: 3231
> vm.stats.vm.v_laundry_count: 262058
> vm.stats.vm.v_inactive_count: 3054178
> vm.stats.vm.v_active_count: 621131
> vm.stats.vm.v_wire_count: 1871176
> vm.stats.vm.v_free_count: 18
> vm.stats.vm.v_page_count: 8134982
>
> $ bc
> >>> 18 + 1871176 + 621131 + 3054178 + 262058
> 5996320
> >>> 8134982 - 5996320
> 2138662
>
> As you can see, it's not a small number of pages either.
> Approximately 2 million pages, 8 gigabytes or 25% of the whole memory
> on this system.
>
> This is 47c00a9835926e96, 13.0-STABLE amd64.
> I do not think that I saw anything like that when I used (much) older
> FreeBSD.

One relevant change is that vm_page_wire() no longer removes pages from
LRU queues, so the count of pages in the queues can include wired pages.
If the page daemon runs, it will dequeue any wired pages that are
encountered.

This was done to reduce queue lock contention; operations like
sendfile() which transiently wire pages would otherwise trigger two
queue operations per page.  Now that queue operations are batched this
might not be as important.

We could perhaps add a new flavour of vm_page_wire() which is not lazy
and would be suited for, e.g., the buffer cache.  What is the primary
source of wired pages in this case?
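[Editorial note: the double counting implied above can be illustrated with a toy model (hypothetical code, not from the thread): with lazy dequeue, a wired page may still sit on an LRU queue, so the per-state counters can sum to more than the number of physical pages.]

```python
def counter_sum(pages):
    """pages: one set of accounting states per physical page,
    e.g. {"wire", "inactive"}.  Returns the total across the
    per-state counters, the analogue of summing the vm.stats counts."""
    counters = {}
    for states in pages:
        for s in states:
            counters[s] = counters.get(s, 0) + 1
    return sum(counters.values())

pages = [
    {"active"},
    {"wire", "inactive"},  # wired but not yet dequeued: counted twice
    {"free"},
]
```

Here three physical pages yield a counter sum of four.  The thread's observation is the opposite (the sum is *less* than v_page_count), which is why a leak, rather than double counting, is the suspect.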
stable/13, vm page counts do not add up
I regularly see that top's memory line does not add up (and by a lot).
That can be seen with vm.stats as well.

For example:
$ sysctl vm.stats | fgrep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 3231
vm.stats.vm.v_laundry_count: 262058
vm.stats.vm.v_inactive_count: 3054178
vm.stats.vm.v_active_count: 621131
vm.stats.vm.v_wire_count: 1871176
vm.stats.vm.v_free_count: 18
vm.stats.vm.v_page_count: 8134982

$ bc
>>> 18 + 1871176 + 621131 + 3054178 + 262058
5996320
>>> 8134982 - 5996320
2138662

As you can see, it's not a small number of pages either: approximately
2 million pages, 8 gigabytes, or 25% of the whole memory on this system.

This is 47c00a9835926e96, 13.0-STABLE amd64.
I do not think that I saw anything like that when I used (much) older
FreeBSD.

-- 
Andriy Gapon
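[Editorial note: the bookkeeping above can be automated. The helper below is a hypothetical sketch, not part of this thread; it parses `sysctl vm.stats` output and reports the pages not covered by the free/active/inactive/laundry/wire counters. It assumes v_user_wire_count is a subset of v_wire_count and so is not added separately, matching the sum in the message.]

```python
def unaccounted_pages(sysctl_output):
    """Return v_page_count minus the sum of the free, active, inactive,
    laundry, and wire counters parsed from `sysctl vm.stats` output."""
    vals = {}
    for line in sysctl_output.splitlines():
        line = line.strip()
        if ":" not in line:
            continue
        # Lines look like "vm.stats.vm.v_free_count: 18".
        key, _, num = line.rpartition(":")
        vals[key.strip()] = int(num)
    queued = sum(vals["vm.stats.vm." + k] for k in (
        "v_free_count", "v_active_count", "v_inactive_count",
        "v_laundry_count", "v_wire_count"))
    return vals["vm.stats.vm.v_page_count"] - queued
```

On a healthy system the result should stay near zero; a large value that grows with uptime would be consistent with the page leak discussed later in the thread.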