Re: Kernel memory leak with x11/nvidia-driver
On Fri, Feb 05, 2016 at 09:05:00AM -0600, Eric van Gyzen wrote:
> On 02/ 4/16 08:05 PM, Mark Johnston wrote:
> > On Thu, Feb 04, 2016 at 05:37:24PM -0600, Eric van Gyzen wrote:
> >> On 02/ 3/16 10:54 AM, Eric van Gyzen wrote:
> >>> I just set up a new desktop running head with x11/nvidia-driver. I've
> >>> discovered a memory leak where pages disappear from the queues, never to
> >>> return. Specifically, the total of
> >>> v_active_count
> >>> v_inactive_count
> >>> v_wire_count
> >>> v_cache_count
> >>> v_free_count
> >>> drops, eventually becoming /much/ less than v_page_count. After leaving
> >>> xscreensaver running overnight, cycling the saver every 10 minutes, the
> >>> system was unusable, because it only had a few MB of memory. (It has 8
> >>> GB physical.)
> >> In case anyone is curious, /usr/local/bin/xscreensaver-hacks/glmatrix
> >> triggers a fairly fast leak--around 600 pages per second.
> > I'm able to repro this on my workstation. With DTrace I can see that
> > glmatrix is allocating pages for an SG object at roughly the rate
> > they're being leaked. I took a look at r292373 (based on the history of
> > sg_pager.c) and noticed a vm_page_free() call was lost when
> > sg_pager_getpages() was simplified.
> >
> > The patch below seems to do the trick for me. Could you give it a try
> > and confirm that it fixes the problem? I run current+nvidia-driver on
> > multiple workstations but hadn't observed a leak until now, so maybe
> > there's something additional going on in your case. Then again, I just
> > use i3lock. :)
> >
> > diff --git a/sys/vm/sg_pager.c b/sys/vm/sg_pager.c
> > index 84bfa49..2cccb7ea 100644
> > --- a/sys/vm/sg_pager.c
> > +++ b/sys/vm/sg_pager.c
> > @@ -189,6 +189,9 @@ sg_pager_getpages(vm_object_t object, vm_page_t *m, int count, int *rbehind,
> >      VM_OBJECT_WLOCK(object);
> >      TAILQ_INSERT_TAIL(&object->un_pager.sgp.sgp_pglist, page, plinks.q);
> >      vm_page_replace_checked(page, object, offset, m[0]);
> > +    vm_page_lock(m[0]);
> > +    vm_page_free(m[0]);
> > +    vm_page_unlock(m[0]);
> >      m[0] = page;
> >      page->valid = VM_PAGE_BITS_ALL;
> >
> Your patch fixes the leak completely. Nice work, Mark!
>
> I didn't notice the leak until I unknowingly left the screensaver
> cycling all overnight. In my normal workflow, I open the windows I need
> and pretty much leave them there. This doesn't seem to trigger the
> leak, at least not at a noticeable rate.
>
> Thanks for your help!

Thank you. This was committed in r295330.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Kernel memory leak with x11/nvidia-driver
On 02/ 4/16 08:05 PM, Mark Johnston wrote:
> On Thu, Feb 04, 2016 at 05:37:24PM -0600, Eric van Gyzen wrote:
>> On 02/ 3/16 10:54 AM, Eric van Gyzen wrote:
>>> I just set up a new desktop running head with x11/nvidia-driver. I've
>>> discovered a memory leak where pages disappear from the queues, never to
>>> return. Specifically, the total of
>>> v_active_count
>>> v_inactive_count
>>> v_wire_count
>>> v_cache_count
>>> v_free_count
>>> drops, eventually becoming /much/ less than v_page_count. After leaving
>>> xscreensaver running overnight, cycling the saver every 10 minutes, the
>>> system was unusable, because it only had a few MB of memory. (It has 8
>>> GB physical.)
>> In case anyone is curious, /usr/local/bin/xscreensaver-hacks/glmatrix
>> triggers a fairly fast leak--around 600 pages per second.
> I'm able to repro this on my workstation. With DTrace I can see that
> glmatrix is allocating pages for an SG object at roughly the rate
> they're being leaked. I took a look at r292373 (based on the history of
> sg_pager.c) and noticed a vm_page_free() call was lost when
> sg_pager_getpages() was simplified.
>
> The patch below seems to do the trick for me. Could you give it a try
> and confirm that it fixes the problem? I run current+nvidia-driver on
> multiple workstations but hadn't observed a leak until now, so maybe
> there's something additional going on in your case. Then again, I just
> use i3lock. :)
>
> diff --git a/sys/vm/sg_pager.c b/sys/vm/sg_pager.c
> index 84bfa49..2cccb7ea 100644
> --- a/sys/vm/sg_pager.c
> +++ b/sys/vm/sg_pager.c
> @@ -189,6 +189,9 @@ sg_pager_getpages(vm_object_t object, vm_page_t *m, int count, int *rbehind,
>      VM_OBJECT_WLOCK(object);
>      TAILQ_INSERT_TAIL(&object->un_pager.sgp.sgp_pglist, page, plinks.q);
>      vm_page_replace_checked(page, object, offset, m[0]);
> +    vm_page_lock(m[0]);
> +    vm_page_free(m[0]);
> +    vm_page_unlock(m[0]);
>      m[0] = page;
>      page->valid = VM_PAGE_BITS_ALL;

Your patch fixes the leak completely. Nice work, Mark!

I didn't notice the leak until I unknowingly left the screensaver
cycling all overnight. In my normal workflow, I open the windows I need
and pretty much leave them there. This doesn't seem to trigger the
leak, at least not at a noticeable rate.

Thanks for your help!

Eric
Re: Kernel memory leak with x11/nvidia-driver
On Thu, 4 Feb 2016 18:05:43 -0800 Mark Johnston wrote:
> On Thu, Feb 04, 2016 at 05:37:24PM -0600, Eric van Gyzen wrote:
> > On 02/ 3/16 10:54 AM, Eric van Gyzen wrote:
> > > I just set up a new desktop running head with x11/nvidia-driver. I've
> > > discovered a memory leak where pages disappear from the queues, never to
> > > return. Specifically, the total of
> > > v_active_count
> > > v_inactive_count
> > > v_wire_count
> > > v_cache_count
> > > v_free_count
> > > drops, eventually becoming /much/ less than v_page_count. After leaving
> > > xscreensaver running overnight, cycling the saver every 10 minutes, the
> > > system was unusable, because it only had a few MB of memory. (It has 8
> > > GB physical.)
> >
> > In case anyone is curious, /usr/local/bin/xscreensaver-hacks/glmatrix
> > triggers a fairly fast leak--around 600 pages per second.
>
> I'm able to repro this on my workstation. With DTrace I can see that
> glmatrix is allocating pages for an SG object at roughly the rate
> they're being leaked. I took a look at r292373 (based on the history of
> sg_pager.c) and noticed a vm_page_free() call was lost when
> sg_pager_getpages() was simplified.
>
> The patch below seems to do the trick for me. Could you give it a try
> and confirm that it fixes the problem? I run current+nvidia-driver on
> multiple workstations but hadn't observed a leak until now, so maybe
> there's something additional going on in your case. Then again, I just
> use i3lock. :)
>
> diff --git a/sys/vm/sg_pager.c b/sys/vm/sg_pager.c
> index 84bfa49..2cccb7ea 100644
> --- a/sys/vm/sg_pager.c
> +++ b/sys/vm/sg_pager.c
> @@ -189,6 +189,9 @@ sg_pager_getpages(vm_object_t object, vm_page_t *m, int count, int *rbehind,
>      VM_OBJECT_WLOCK(object);
>      TAILQ_INSERT_TAIL(&object->un_pager.sgp.sgp_pglist, page, plinks.q);
>      vm_page_replace_checked(page, object, offset, m[0]);
> +    vm_page_lock(m[0]);
> +    vm_page_free(m[0]);
> +    vm_page_unlock(m[0]);
>      m[0] = page;
>      page->valid = VM_PAGE_BITS_ALL;
>

I started looking at this yesterday after seeing the OP and verified
that I was also losing pages. With this patch no more pages are lost.

Good work!

--
Gary Jennejohn
Re: Kernel memory leak with x11/nvidia-driver
On Thu, Feb 04, 2016 at 05:37:24PM -0600, Eric van Gyzen wrote:
> On 02/ 3/16 10:54 AM, Eric van Gyzen wrote:
> > I just set up a new desktop running head with x11/nvidia-driver. I've
> > discovered a memory leak where pages disappear from the queues, never to
> > return. Specifically, the total of
> > v_active_count
> > v_inactive_count
> > v_wire_count
> > v_cache_count
> > v_free_count
> > drops, eventually becoming /much/ less than v_page_count. After leaving
> > xscreensaver running overnight, cycling the saver every 10 minutes, the
> > system was unusable, because it only had a few MB of memory. (It has 8
> > GB physical.)
>
> In case anyone is curious, /usr/local/bin/xscreensaver-hacks/glmatrix
> triggers a fairly fast leak--around 600 pages per second.

I'm able to repro this on my workstation. With DTrace I can see that
glmatrix is allocating pages for an SG object at roughly the rate
they're being leaked. I took a look at r292373 (based on the history of
sg_pager.c) and noticed a vm_page_free() call was lost when
sg_pager_getpages() was simplified.

The patch below seems to do the trick for me. Could you give it a try
and confirm that it fixes the problem? I run current+nvidia-driver on
multiple workstations but hadn't observed a leak until now, so maybe
there's something additional going on in your case. Then again, I just
use i3lock. :)

diff --git a/sys/vm/sg_pager.c b/sys/vm/sg_pager.c
index 84bfa49..2cccb7ea 100644
--- a/sys/vm/sg_pager.c
+++ b/sys/vm/sg_pager.c
@@ -189,6 +189,9 @@ sg_pager_getpages(vm_object_t object, vm_page_t *m, int count, int *rbehind,
     VM_OBJECT_WLOCK(object);
     TAILQ_INSERT_TAIL(&object->un_pager.sgp.sgp_pglist, page, plinks.q);
     vm_page_replace_checked(page, object, offset, m[0]);
+    vm_page_lock(m[0]);
+    vm_page_free(m[0]);
+    vm_page_unlock(m[0]);
     m[0] = page;
     page->valid = VM_PAGE_BITS_ALL;
Re: Kernel memory leak with x11/nvidia-driver
Wow that is insane. I'm going to start looking for the revision this
behavior started. If you already found it, or find it before I report
back plz let me know so I don't waste my time =]

On Thu, Feb 4, 2016 at 6:37 PM, Eric van Gyzen wrote:
> On 02/ 3/16 10:54 AM, Eric van Gyzen wrote:
>
>> I just set up a new desktop running head with x11/nvidia-driver. I've
>> discovered a memory leak where pages disappear from the queues, never to
>> return. Specifically, the total of
>> v_active_count
>> v_inactive_count
>> v_wire_count
>> v_cache_count
>> v_free_count
>> drops, eventually becoming /much/ less than v_page_count. After leaving
>> xscreensaver running overnight, cycling the saver every 10 minutes, the
>> system was unusable, because it only had a few MB of memory. (It has 8
>> GB physical.)
>>
>
> In case anyone is curious, /usr/local/bin/xscreensaver-hacks/glmatrix
> triggers a fairly fast leak--around 600 pages per second.
>
> Eric
Re: Kernel memory leak with x11/nvidia-driver
On 02/ 3/16 10:54 AM, Eric van Gyzen wrote:
> I just set up a new desktop running head with x11/nvidia-driver. I've
> discovered a memory leak where pages disappear from the queues, never to
> return. Specifically, the total of
> v_active_count
> v_inactive_count
> v_wire_count
> v_cache_count
> v_free_count
> drops, eventually becoming /much/ less than v_page_count. After leaving
> xscreensaver running overnight, cycling the saver every 10 minutes, the
> system was unusable, because it only had a few MB of memory. (It has 8
> GB physical.)

In case anyone is curious, /usr/local/bin/xscreensaver-hacks/glmatrix
triggers a fairly fast leak--around 600 pages per second.

Eric
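To put the reported rate in perspective, here is a rough back-of-the-envelope sketch in sh. It assumes a 4096-byte page size (not stated in the thread) and that the leak runs at a constant 600 pages per second until all 8 GB is gone; the helper name seconds_to_exhaust is made up for illustration.

```shell
#!/bin/sh
# seconds_to_exhaust RAM_BYTES PAGES_PER_SEC PAGE_SIZE
# Print how many whole seconds a constant leak needs to consume RAM_BYTES.
# Hypothetical helper; the 4096-byte page size used below is an assumption.
seconds_to_exhaust() {
    echo $(( $1 / ($2 * $3) ))
}

# 8 GB of physical memory, 600 pages per second, 4096-byte pages.
seconds_to_exhaust $((8 * 1024 * 1024 * 1024)) 600 4096   # prints 3495
```

Under those assumptions the machine would run out in a little under an hour of continuous glmatrix, which is at least consistent with a system becoming unusable after a night of cycling screensavers.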
Re: Kernel memory leak with x11/nvidia-driver
Here are the results of the test you suggested on my system (r293722),
nvidia-driver-304-304.128 -- two runs with a break of 40 minutes
between them:

active  inactive  wire    cache  free    total
85441   282221    280649  0      100455  748766
85488   282235    280655  0      100391  748769
85500   282240    280657  0      100372  748769
83226   283338    280692  0      101513  748769
82816   282439    280687  0      102827  748769
[14:01 - 1.52]
[kostya@notebook2 9] ~ $ >sudo sh test.sh
active  inactive  wire    cache  free    total
82280   302769    304025  0      58081   747155
82273   302783    304021  0      58081   747158
82247   302809    304021  0      58081   747158
82239   302816    304009  0      58094   747158
82076   302995    304010  0      58077   747158
82080   303002    304010  0      58066   747158
[15:44 - 1.52]

Hope this helps and you can see some tendency you're after.

With kindest regards,
Kostya Berger

On Thursday, 4 February 2016, 3:56, Ultima wrote:

Just tested your script, there is definitely a memory leak. I also ran
into really weird behavior. Running your script in tmux after starting
and stopping an xorg session a few times, tmux completely froze in the
session. Creating a new window in the session was also completely
frozen; however, this was only visual, as commands still worked and the
window just showed a blank black screen.

Also unloading the kernel modules for nvidia and nvidia-modeset (new as
of 358.16ish) did not free the memory.

On Wed, Feb 3, 2016 at 8:24 PM, Ultima wrote:
> Apologies, this should have been in my initial reply.
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201340
> or here for attachment
> https://bz-attachments.freebsd.org/attachment.cgi?id=165694
>
> I haven't actually had a chance to do anything after upgrading
> from stable other than see the corrupted console for myself.
> Lack of time =/
>
> On Wed, Feb 3, 2016 at 2:41 PM, Eric van Gyzen wrote:
>
>> On 02/03/2016 10:54, Eric van Gyzen wrote:
>> > I just set up a new desktop running head with x11/nvidia-driver. I've
>> > discovered a memory leak where pages disappear from the queues, never to
>> > return. Specifically, the total of
>> > v_active_count
>> > v_inactive_count
>> > v_wire_count
>> > v_cache_count
>> > v_free_count
>> > drops, eventually becoming /much/ less than v_page_count.
>>
>> Here is a script to log the data:
>>
>> #!/bin/sh
>>
>> readonly QUEUES="active inactive wire cache free total"
>> readonly FORMAT="%s\t%s\t%s\t%s\t%s\t%s\n"
>>
>> vm_page_counts() {
>>     for queue in $QUEUES; do
>>         if [ "$queue" != "total" ]; then
>>             sysctl -n vm.stats.vm.v_${queue}_count
>>         fi
>>     done
>> }
>>
>> sum() {
>>     s=0
>>     while [ $# -gt 0 ]; do
>>         s=$((s + $1))
>>         shift
>>     done
>>     echo $s
>> }
>>
>> print_counts() {
>>     counts="`vm_page_counts`"
>>     printf "$FORMAT" $counts `sum $counts`
>> }
>>
>> printf "$FORMAT" $QUEUES
>> print_counts
>> while sleep 60; do
>>     print_counts
>> done
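When reading logs like the ones above, the interesting figure is how the "total" column moves between the first and last sample. A minimal sketch of that comparison follows; the function name total_drift is hypothetical, and it assumes the six-column layout printed by the script in this thread.

```shell
#!/bin/sh
# total_drift: read the logging script's output on stdin and print the
# change in the "total" column (6th field) between the first and the last
# data row. Header lines and shell prompts are skipped because their first
# field is not a plain number. Hypothetical helper, not part of the thread.
total_drift() {
    awk '
        NF == 6 && $1 ~ /^[0-9]+$/ {
            if (!seen) { first = $6; seen = 1 }
            last = $6
        }
        END { if (seen) print last - first }
    '
}

# Example usage with a saved log file (the file name is only an example):
#   total_drift < pagecounts.log
```

A negative result means the queues account for fewer pages at the end of the log than at the start. For the figures posted above, the total is 748769 at the end of the first run and 747158 at the end of the second, a difference of 1611 pages.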
Re: Kernel memory leak with x11/nvidia-driver
Just tested your script, there is definitely a memory leak. I also ran
into really weird behavior. Running your script in tmux after starting
and stopping an xorg session a few times, tmux completely froze in the
session. Creating a new window in the session was also completely
frozen; however, this was only visual, as commands still worked and the
window just showed a blank black screen.

Also unloading the kernel modules for nvidia and nvidia-modeset (new as
of 358.16ish) did not free the memory.

On Wed, Feb 3, 2016 at 8:24 PM, Ultima wrote:
> Apologies, this should have been in my initial reply.
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201340
> or here for attachment
> https://bz-attachments.freebsd.org/attachment.cgi?id=165694
>
> I haven't actually had a chance to do anything after upgrading
> from stable other than see the corrupted console for myself.
> Lack of time =/
>
> On Wed, Feb 3, 2016 at 2:41 PM, Eric van Gyzen wrote:
>
>> On 02/03/2016 10:54, Eric van Gyzen wrote:
>> > I just set up a new desktop running head with x11/nvidia-driver. I've
>> > discovered a memory leak where pages disappear from the queues, never to
>> > return. Specifically, the total of
>> > v_active_count
>> > v_inactive_count
>> > v_wire_count
>> > v_cache_count
>> > v_free_count
>> > drops, eventually becoming /much/ less than v_page_count.
>>
>> Here is a script to log the data:
>>
>> #!/bin/sh
>>
>> readonly QUEUES="active inactive wire cache free total"
>> readonly FORMAT="%s\t%s\t%s\t%s\t%s\t%s\n"
>>
>> vm_page_counts() {
>>     for queue in $QUEUES; do
>>         if [ "$queue" != "total" ]; then
>>             sysctl -n vm.stats.vm.v_${queue}_count
>>         fi
>>     done
>> }
>>
>> sum() {
>>     s=0
>>     while [ $# -gt 0 ]; do
>>         s=$((s + $1))
>>         shift
>>     done
>>     echo $s
>> }
>>
>> print_counts() {
>>     counts="`vm_page_counts`"
>>     printf "$FORMAT" $counts `sum $counts`
>> }
>>
>> printf "$FORMAT" $QUEUES
>> print_counts
>> while sleep 60; do
>>     print_counts
>> done
Re: Kernel memory leak with x11/nvidia-driver
Apologies, this should have been in my initial reply.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201340
or here for attachment
https://bz-attachments.freebsd.org/attachment.cgi?id=165694

I haven't actually had a chance to do anything after upgrading
from stable other than see the corrupted console for myself.
Lack of time =/

On Wed, Feb 3, 2016 at 2:41 PM, Eric van Gyzen wrote:
> On 02/03/2016 10:54, Eric van Gyzen wrote:
> > I just set up a new desktop running head with x11/nvidia-driver. I've
> > discovered a memory leak where pages disappear from the queues, never to
> > return. Specifically, the total of
> > v_active_count
> > v_inactive_count
> > v_wire_count
> > v_cache_count
> > v_free_count
> > drops, eventually becoming /much/ less than v_page_count.
>
> Here is a script to log the data:
>
> #!/bin/sh
>
> readonly QUEUES="active inactive wire cache free total"
> readonly FORMAT="%s\t%s\t%s\t%s\t%s\t%s\n"
>
> vm_page_counts() {
>     for queue in $QUEUES; do
>         if [ "$queue" != "total" ]; then
>             sysctl -n vm.stats.vm.v_${queue}_count
>         fi
>     done
> }
>
> sum() {
>     s=0
>     while [ $# -gt 0 ]; do
>         s=$((s + $1))
>         shift
>     done
>     echo $s
> }
>
> print_counts() {
>     counts="`vm_page_counts`"
>     printf "$FORMAT" $counts `sum $counts`
> }
>
> printf "$FORMAT" $QUEUES
> print_counts
> while sleep 60; do
>     print_counts
> done
Re: Kernel memory leak with x11/nvidia-driver
On 02/03/2016 10:54, Eric van Gyzen wrote:
> I just set up a new desktop running head with x11/nvidia-driver. I've
> discovered a memory leak where pages disappear from the queues, never to
> return. Specifically, the total of
> v_active_count
> v_inactive_count
> v_wire_count
> v_cache_count
> v_free_count
> drops, eventually becoming /much/ less than v_page_count.

Here is a script to log the data:

#!/bin/sh

readonly QUEUES="active inactive wire cache free total"
readonly FORMAT="%s\t%s\t%s\t%s\t%s\t%s\n"

vm_page_counts() {
    for queue in $QUEUES; do
        if [ "$queue" != "total" ]; then
            sysctl -n vm.stats.vm.v_${queue}_count
        fi
    done
}

sum() {
    s=0
    while [ $# -gt 0 ]; do
        s=$((s + $1))
        shift
    done
    echo $s
}

print_counts() {
    counts="`vm_page_counts`"
    printf "$FORMAT" $counts `sum $counts`
}

printf "$FORMAT" $QUEUES
print_counts
while sleep 60; do
    print_counts
done
Kernel memory leak with x11/nvidia-driver
I just set up a new desktop running head with x11/nvidia-driver. I've
discovered a memory leak where pages disappear from the queues, never to
return. Specifically, the total of
    v_active_count
    v_inactive_count
    v_wire_count
    v_cache_count
    v_free_count
drops, eventually becoming /much/ less than v_page_count. After leaving
xscreensaver running overnight, cycling the saver every 10 minutes, the
system was unusable, because it only had a few MB of memory. (It has 8
GB physical.)

I see this on head from a few days ago. I do /not/ see it on stable/10
from a few days ago.

Just starting and stopping Xorg eats pages. Starting and stopping X
apps also eats pages, some more than others. Some screensavers ate a
lot more memory than others. This is why I suspect the nvidia driver is
the trigger.

I rebuilt the x11/nvidia-driver port, but that didn't help.

I would love to bisect to find the offending commit, but that would take
a /lot/ of time, partly because I don't have a lower bound other than
the stable/10 branch point (since this is a new installation).

Does anyone know of any specific commits that could be suspicious?
There have been several big changes in VM (e.g. more NUMA support, cache
page elimination). Do any of those changes seem more likely? I'll take
even a hunch at this point. :) Are there other areas I should look at?

Is anyone running an older revision of head with the nvidia driver and
doesn't see this problem? That would narrow the range for bisection.

I should mention that I'm running with D3162* in my tree for NIC
support, but there is no relationship between the leak and network
activity, and I don't see the leak on stable/10+D3162.

Thanks in advance,

Eric

* https://reviews.freebsd.org/D3162
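The symptom described above, the five queue counters adding up to less than v_page_count, can be sketched as a small sh helper. The function name missing_pages is hypothetical. The sysctl names in the usage comment follow the vm.stats.vm.v_*_count pattern, and the usage is left as a comment because it only works on a FreeBSD host.

```shell
#!/bin/sh
# missing_pages TOTAL COUNT...
# Print TOTAL minus the sum of the remaining arguments, i.e. the number of
# pages that no queue accounts for. Hypothetical helper for illustration.
missing_pages() {
    total=$1
    shift
    sum=0
    for n in "$@"; do
        sum=$((sum + n))
    done
    echo $((total - sum))
}

# On a FreeBSD system the arguments would come from sysctl, for example:
#   missing_pages "$(sysctl -n vm.stats.vm.v_page_count)" \
#       $(for q in active inactive wire cache free; do
#             sysctl -n vm.stats.vm.v_${q}_count
#         done)
```

A single reading says little on its own. What matches the report is a difference that keeps growing between samples.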