Re: Kernel memory leak in ATAPI/CAM or ATAng?
> Date: Sun, 09 Nov 2003 22:43:47 -0700
> From: Scott Long [EMAIL PROTECTED]
>
> Kevin Oberman wrote:
> > Tested. It's much better, although ATA request keeps adding more memory all the time when mplayer is playing; it's now increasing at about 20K/minute, which is a huge improvement.
> >
> > Still, I don't understand why it should just continue to grow all of the time. The data rate is about constant. I would expect that it should grow to a size where the data being processed can be accommodated and then stop growing. I don't see it stopping.
> >
> > Thanks for the quick fix.
>
> Well, it sounds like there is still a memory leak somewhere. Make sure that you have rev 1.27 of atapi-cam.c to be sure. If so, please let me know which malloc type in vmstat -m is growing.

Oh, crap! I guess I pulled the new version too quickly yesterday when your message arrived. I had 1.26. And I don't have a DVD with me, so I was seeing a much slower leak because the CD transfers data so much more slowly.

After a kernel rebuild I see:

ATA request 0 0K 1K 7285 128

after reading some bulk data off of a CD.

Thanks!
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Kernel memory leak in ATAPI/CAM or ATAng?
Fixed. Please retest.

Scott Long wrote:
> Kevin Oberman wrote:
> > > Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
> > > From: Robert Watson [EMAIL PROTECTED]
> > >
> > > On Thu, 6 Nov 2003, Kevin Oberman wrote:
> > > > I have learned a bit more about the problems I have been having with the DVD drive on my T30 laptop. When I have run the drive for an extended time (like 2 or 3 hours), I invariably have my system lock up because it can't malloc kernel memory for the ATAPI/CAM or ATA device. (Usually it's both.) The only recovery seems to be to reboot the system.
> > >
> > > Is it possible to drop to DDB and generate a coredump at that point? If so, you can run vmstat on the core to look at memory use statistics in a post-mortem way. As to what to look for: big numbers is about the limit of what I can suggest, I'm afraid :-). Usually the activity of choice is to compare vmstat statistics (with -m and -z) during normal operation and when the leak has occurred, and look for any marked differences.
> > >
> > > It's worth observing that there are two failure modes here that appear almost identical: (1) a memory leak resulting in address space exhaustion for the kernel, and (2) a tunable maximum allocation being too high for the available address space. Note that (2) isn't a leak, simply a poorly tuned value. We've noticed a number of tuned memory limits were set when memory sizes on systems were much lower, and so we've had to readjust the tuning parameters for large memory systems. Likewise, a number of problems were observed when PAE was introduced, as some of the tuning parameters scaled with the amount of physical memory, not with the addressable space for the kernel. So we probably want to be on the lookout for both of these possibilities.
> >
> > Well, I have no details to this point, but 'vmstat -m' makes the problem obvious. The amount of kernel memory allocated to ATA request climbs forever and after enough data is transferred, it runs out of KVM. This is a continual leak, and monitoring it on the running system makes it pretty clear that something is leaking. I don't think (2) is the issue.
> >
> > Because the fields allocated in vmstat are not large enough, this is a bit hard to read. The fields all merge into some REALLY large numbers. After reboot, it is 5K. When running mencode I see this increasing at a rate of a bit under 1.9 MB per minute.
> >
> > It does not look like a tuning issue. No matter how big KVM is allowed to grow, it's only a matter of time until it is gone.
> >
> > I am going to do some testing to see what operations seem to cause this. I assume it does not happen all of the time or everyone would have seen it. I suspect it only happens with ATAPI/CAM activity, possibly only with simultaneous ATA and ATAPI/CAM activity.
>
> Does vmstat -m show which malloc type is growing? Knowing this will greatly speed up the debugging process.
>
> Thanks!
>
> Scott
Re: Kernel memory leak in ATAPI/CAM or ATAng?
On Thu, Nov 06, 2003 at 08:08:31AM -0800, Kevin Oberman wrote:
> Any ideas on where I can look for more information? I'm going to try doing some monitoring with vmstat while running to see if I can spot anything, but I am not sure just what I am looking for. The VM system is not something I know much about, but I did read Terry Lambert's excellent message to current on KVM tuning and I'm hoping that this might help, but, if there really is a memory leak, tuning will not fix it.

Got a link to "...Terry Lambert's excellent message to current on KVM tuning.."? Very keen to read it.

- aW
Re: Kernel memory leak in ATAPI/CAM or ATAng?
> Date: Mon, 10 Nov 2003 11:37:00 +1030
> From: Alex Wilkinson [EMAIL PROTECTED]
>
> On Thu, Nov 06, 2003 at 08:08:31AM -0800, Kevin Oberman wrote:
> > Any ideas on where I can look for more information? I'm going to try doing some monitoring with vmstat while running to see if I can spot anything, but I am not sure just what I am looking for. The VM system is not something I know much about, but I did read Terry Lambert's excellent message to current on KVM tuning and I'm hoping that this might help, but, if there really is a memory leak, tuning will not fix it.
>
> Got a link to "...Terry Lambert's excellent message to current on KVM tuning.."?

I found it in Google Groups. Search for mailing.freebsd.current Lambert kmem_map and you will find several good articles. It's really worth reading the entire threads to get a better understanding of how all of this works.

The one I was referring to was at:
http://groups.google.com/groups?q=mailing.freebsd.current+Lambert+kmem_mapstart=10hl=enlr=ie=UTF-8selm=bde8n6%244jb%241%40FreeBSD.csie.NCTU.edu.twrnum=12
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
Re: Kernel memory leak in ATAPI/CAM or ATAng?
Tested. It's much better, although ATA request keeps adding more memory all the time when mplayer is playing; it's now increasing at about 20K/minute, which is a huge improvement.

Still, I don't understand why it should just continue to grow all of the time. The data rate is about constant. I would expect that it should grow to a size where the data being processed can be accommodated and then stop growing. I don't see it stopping.

Thanks for the quick fix. Sorry to have taken so long to test it, but I am at SC2003 in Phoenix for the next two weeks building and running the show network. About 40 10Gig links and about 150 100K and 1Gig links this year. I have no idea how many miles of fiber are in the convention center, but we start installing it tomorrow morning. We also should be bringing up the OC-192s to the major research nets over the next two days.

If any of you are at the show, stop by the NOC and say Hi.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
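The expectation stated in this message, that usage should rise to a working size and then level off, can be checked mechanically from a series of readings. What follows is a minimal sketch in Python, assuming one reading of the ATA request memory figure (in K) is taken per minute. The function name, the window length and the sample values are hypothetical and do not come from the thread.

```python
def still_growing(samples_k, tail=5, tolerance_k=0):
    """Return True if the last `tail` readings keep rising instead of levelling off.

    samples_k   -- memory readings in K, oldest first (hypothetical input)
    tail        -- how many of the most recent readings to inspect
    tolerance_k -- growth within this many K still counts as a plateau
    """
    recent = samples_k[-tail:]
    if len(recent) < 2:
        return False  # not enough data to say anything
    never_drops = all(b >= a for a, b in zip(recent, recent[1:]))
    net_growth = recent[-1] - recent[0]
    return never_drops and net_growth > tolerance_k


if __name__ == "__main__":
    # Hypothetical readings: a steady 20K/minute climb versus a working set
    # that settles at 160K.
    leaking = [100, 120, 140, 160, 180, 200]
    settled = [100, 150, 160, 160, 160, 160, 160]
    print("leaking series still growing:", still_growing(leaking))
    print("settled series still growing:", still_growing(settled))
```

A constant data rate with a figure that never stops rising, as described above, is what the first series models; a cache or buffer pool that has reached its working size would look like the second.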
Re: Kernel memory leak in ATAPI/CAM or ATAng?
Kevin Oberman wrote:
> Tested. It's much better, although ATA request keeps adding more memory all the time when mplayer is playing; it's now increasing at about 20K/minute, which is a huge improvement.
>
> Still, I don't understand why it should just continue to grow all of the time. The data rate is about constant. I would expect that it should grow to a size where the data being processed can be accommodated and then stop growing. I don't see it stopping.
>
> Thanks for the quick fix.

Well, it sounds like there is still a memory leak somewhere. Make sure that you have rev 1.27 of atapi-cam.c to be sure. If so, please let me know which malloc type in vmstat -m is growing.

Scott
Re: Kernel memory leak in ATAPI/CAM or ATAng?
Kevin Oberman wrote:
> > Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
> > From: Robert Watson [EMAIL PROTECTED]
> >
> > On Thu, 6 Nov 2003, Kevin Oberman wrote:
> > > I have learned a bit more about the problems I have been having with the DVD drive on my T30 laptop. When I have run the drive for an extended time (like 2 or 3 hours), I invariably have my system lock up because it can't malloc kernel memory for the ATAPI/CAM or ATA device. (Usually it's both.) The only recovery seems to be to reboot the system.
> >
> > Is it possible to drop to DDB and generate a coredump at that point? If so, you can run vmstat on the core to look at memory use statistics in a post-mortem way. As to what to look for: big numbers is about the limit of what I can suggest, I'm afraid :-). Usually the activity of choice is to compare vmstat statistics (with -m and -z) during normal operation and when the leak has occurred, and look for any marked differences.
> >
> > It's worth observing that there are two failure modes here that appear almost identical: (1) a memory leak resulting in address space exhaustion for the kernel, and (2) a tunable maximum allocation being too high for the available address space. Note that (2) isn't a leak, simply a poorly tuned value. We've noticed a number of tuned memory limits were set when memory sizes on systems were much lower, and so we've had to readjust the tuning parameters for large memory systems. Likewise, a number of problems were observed when PAE was introduced, as some of the tuning parameters scaled with the amount of physical memory, not with the addressable space for the kernel. So we probably want to be on the lookout for both of these possibilities.
>
> Well, I have no details to this point, but 'vmstat -m' makes the problem obvious. The amount of kernel memory allocated to ATA request climbs forever and after enough data is transferred, it runs out of KVM. This is a continual leak, and monitoring it on the running system makes it pretty clear that something is leaking. I don't think (2) is the issue.
>
> Because the fields allocated in vmstat are not large enough, this is a bit hard to read. The fields all merge into some REALLY large numbers. After reboot, it is 5K. When running mencode I see this increasing at a rate of a bit under 1.9 MB per minute.
>
> It does not look like a tuning issue. No matter how big KVM is allowed to grow, it's only a matter of time until it is gone.
>
> I am going to do some testing to see what operations seem to cause this. I assume it does not happen all of the time or everyone would have seen it. I suspect it only happens with ATAPI/CAM activity, possibly only with simultaneous ATA and ATAPI/CAM activity.

Does vmstat -m show which malloc type is growing? Knowing this will greatly speed up the debugging process.

Thanks!

Scott
Re: Kernel memory leak in ATAPI/CAM or ATAng?
> Date: Fri, 07 Nov 2003 00:45:47 -0700
> From: Scott Long [EMAIL PROTECTED]
>
> Kevin Oberman wrote:
> > > Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
> > > From: Robert Watson [EMAIL PROTECTED]
> > >
> > > On Thu, 6 Nov 2003, Kevin Oberman wrote:
> > > > I have learned a bit more about the problems I have been having with the DVD drive on my T30 laptop. When I have run the drive for an extended time (like 2 or 3 hours), I invariably have my system lock up because it can't malloc kernel memory for the ATAPI/CAM or ATA device. (Usually it's both.) The only recovery seems to be to reboot the system.
> > >
> > > Is it possible to drop to DDB and generate a coredump at that point? If so, you can run vmstat on the core to look at memory use statistics in a post-mortem way. As to what to look for: big numbers is about the limit of what I can suggest, I'm afraid :-). Usually the activity of choice is to compare vmstat statistics (with -m and -z) during normal operation and when the leak has occurred, and look for any marked differences.
> > >
> > > It's worth observing that there are two failure modes here that appear almost identical: (1) a memory leak resulting in address space exhaustion for the kernel, and (2) a tunable maximum allocation being too high for the available address space. Note that (2) isn't a leak, simply a poorly tuned value. We've noticed a number of tuned memory limits were set when memory sizes on systems were much lower, and so we've had to readjust the tuning parameters for large memory systems. Likewise, a number of problems were observed when PAE was introduced, as some of the tuning parameters scaled with the amount of physical memory, not with the addressable space for the kernel. So we probably want to be on the lookout for both of these possibilities.
> >
> > Well, I have no details to this point, but 'vmstat -m' makes the problem obvious. The amount of kernel memory allocated to ATA request climbs forever and after enough data is transferred, it runs out of KVM. This is a continual leak, and monitoring it on the running system makes it pretty clear that something is leaking. I don't think (2) is the issue.
> >
> > Because the fields allocated in vmstat are not large enough, this is a bit hard to read. The fields all merge into some REALLY large numbers. After reboot, it is 5K. When running mencode I see this increasing at a rate of a bit under 1.9 MB per minute.
> >
> > It does not look like a tuning issue. No matter how big KVM is allowed to grow, it's only a matter of time until it is gone.
> >
> > I am going to do some testing to see what operations seem to cause this. I assume it does not happen all of the time or everyone would have seen it. I suspect it only happens with ATAPI/CAM activity, possibly only with simultaneous ATA and ATAPI/CAM activity.
>
> Does vmstat -m show which malloc type is growing? Knowing this will greatly speed up the debugging process.

I'm not sure I follow. The leak is in ATA request. Is there something more to be seen in vmstat -m?

I have confirmed that it seems to happen with any reads from the DVD device, but my testing has been done with mplayer. Makes it a bit tough to watch a full-length movie!

I have opened kern/59043 on the problem. Let me know if I can do further testing.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
Kernel memory leak in ATAPI/CAM or ATAng?
I have learned a bit more about the problems I have been having with the DVD drive on my T30 laptop. When I have run the drive for an extended time (like 2 or 3 hours), I invariably have my system lock up because it can't malloc kernel memory for the ATAPI/CAM or ATA device. (Usually it's both.) The only recovery seems to be to reboot the system.

I suspect a memory leak because it seems to be linked to the total amount of data transferred, even in multiple invocations of the program. Of course, once the kernel grabs VM, I guess it generally does not actually release it, but it should re-use the existing allocation and not keep allocating more.

I posted my config and dmesg files yesterday. I have tried tuning KVM_SIZE stuff with no real success, just the loss of the ability to run with APM loaded. (This is possibly due to mis-tuning.)

Any ideas on where I can look for more information? I'm going to try doing some monitoring with vmstat while running to see if I can spot anything, but I am not sure just what I am looking for. The VM system is not something I know much about, but I did read Terry Lambert's excellent message to current on KVM tuning and I'm hoping that this might help, but, if there really is a memory leak, tuning will not fix it.

FWIW, this problem did not exist a few months ago, prior to ATAng.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
Re: Kernel memory leak in ATAPI/CAM or ATAng?
On Thu, 6 Nov 2003, Kevin Oberman wrote:
> I have learned a bit more about the problems I have been having with the DVD drive on my T30 laptop. When I have run the drive for an extended time (like 2 or 3 hours), I invariably have my system lock up because it can't malloc kernel memory for the ATAPI/CAM or ATA device. (Usually it's both.) The only recovery seems to be to reboot the system.

Is it possible to drop to DDB and generate a coredump at that point? If so, you can run vmstat on the core to look at memory use statistics in a post-mortem way. As to what to look for: big numbers is about the limit of what I can suggest, I'm afraid :-). Usually the activity of choice is to compare vmstat statistics (with -m and -z) during normal operation and when the leak has occurred, and look for any marked differences.

It's worth observing that there are two failure modes here that appear almost identical: (1) a memory leak resulting in address space exhaustion for the kernel, and (2) a tunable maximum allocation being too high for the available address space. Note that (2) isn't a leak, simply a poorly tuned value. We've noticed a number of tuned memory limits were set when memory sizes on systems were much lower, and so we've had to readjust the tuning parameters for large memory systems. Likewise, a number of problems were observed when PAE was introduced, as some of the tuning parameters scaled with the amount of physical memory, not with the addressable space for the kernel. So we probably want to be on the lookout for both of these possibilities.

Robert N M Watson
FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]
Network Associates Laboratories
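The comparison suggested in this message, vmstat statistics during normal operation against the same statistics once the leak has occurred, can be sketched as a small script. This is only an illustration in Python: it assumes each saved `vmstat -m` line has the shape of the one line quoted elsewhere in this thread ("ATA request 0 0K 1K 7285 128"), with the type name first, then an in-use count, then memory in K. The helper names and the sample numbers are hypothetical, and other FreeBSD versions may lay the columns out differently.

```python
import re

# Assumed line shape, based on the single line quoted in this thread:
#   "ATA request 0 0K 1K 7285 128"
# i.e. a type name (may contain spaces), an in-use count, then memory in K.
LINE = re.compile(r"^\s*(?P<name>\S.*?)\s+(?P<inuse>\d+)\s+(?P<mem>\d+)K\b")


def parse_snapshot(text):
    """Map malloc type name -> memory in use (K) for one saved snapshot."""
    usage = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # header and blank lines simply do not match
            usage[match.group("name")] = int(match.group("mem"))
    return usage


def growth(before, after):
    """Return (name, delta_K) pairs for types that grew, largest first."""
    deltas = [(name, after[name] - before.get(name, 0)) for name in after]
    grown = [(name, delta) for name, delta in deltas if delta > 0]
    return sorted(grown, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical snapshots; only the "ATA request" type name is from the thread.
    normal = "ATA request 40 5K 6K 100 128\ndevbuf 200 300K 310K 500 16,32\n"
    later = "ATA request 9000 1905K 1905K 16000 128\ndevbuf 201 301K 310K 520 16,32\n"
    for name, delta in growth(parse_snapshot(normal), parse_snapshot(later)):
        print(f"{name}: +{delta}K")
```

Sorting by growth puts the "big numbers" at the top, so a leaking type stands out from types that merely drift by a few K between snapshots.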
Re: Kernel memory leak in ATAPI/CAM or ATAng?
> Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
> From: Robert Watson [EMAIL PROTECTED]
>
> On Thu, 6 Nov 2003, Kevin Oberman wrote:
> > I have learned a bit more about the problems I have been having with the DVD drive on my T30 laptop. When I have run the drive for an extended time (like 2 or 3 hours), I invariably have my system lock up because it can't malloc kernel memory for the ATAPI/CAM or ATA device. (Usually it's both.) The only recovery seems to be to reboot the system.
>
> Is it possible to drop to DDB and generate a coredump at that point? If so, you can run vmstat on the core to look at memory use statistics in a post-mortem way. As to what to look for: big numbers is about the limit of what I can suggest, I'm afraid :-). Usually the activity of choice is to compare vmstat statistics (with -m and -z) during normal operation and when the leak has occurred, and look for any marked differences.
>
> It's worth observing that there are two failure modes here that appear almost identical: (1) a memory leak resulting in address space exhaustion for the kernel, and (2) a tunable maximum allocation being too high for the available address space. Note that (2) isn't a leak, simply a poorly tuned value. We've noticed a number of tuned memory limits were set when memory sizes on systems were much lower, and so we've had to readjust the tuning parameters for large memory systems. Likewise, a number of problems were observed when PAE was introduced, as some of the tuning parameters scaled with the amount of physical memory, not with the addressable space for the kernel. So we probably want to be on the lookout for both of these possibilities.

Well, I have no details to this point, but 'vmstat -m' makes the problem obvious. The amount of kernel memory allocated to ATA request climbs forever and after enough data is transferred, it runs out of KVM. This is a continual leak, and monitoring it on the running system makes it pretty clear that something is leaking. I don't think (2) is the issue.

Because the fields allocated in vmstat are not large enough, this is a bit hard to read. The fields all merge into some REALLY large numbers. After reboot, it is 5K. When running mencode I see this increasing at a rate of a bit under 1.9 MB per minute.

It does not look like a tuning issue. No matter how big KVM is allowed to grow, it's only a matter of time until it is gone.

I am going to do some testing to see what operations seem to cause this. I assume it does not happen all of the time or everyone would have seen it. I suspect it only happens with ATAPI/CAM activity, possibly only with simultaneous ATA and ATAPI/CAM activity.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
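The figures in this message can be put together as a quick worked calculation. The sketch below, in Python, uses only the reported rate (a bit under 1.9 MB per minute) and the reported run time before lockup (2 or 3 hours). The function names and the 256 MB and 512 MB budgets are hypothetical, chosen only to show how time to failure scales with the limit.

```python
def leaked_mb(rate_mb_per_min, hours):
    """Memory leaked (MB) after `hours` at a constant leak rate."""
    return rate_mb_per_min * hours * 60


def minutes_until_exhausted(budget_mb, rate_mb_per_min):
    """Minutes until a kernel memory budget of `budget_mb` is used up."""
    return budget_mb / rate_mb_per_min


if __name__ == "__main__":
    rate = 1.9  # MB per minute, the upper bound reported in the thread
    for hours in (2, 3):
        print(f"{hours} h: about {round(leaked_mb(rate, hours))} MB leaked")
    # Hypothetical budgets: doubling the budget only doubles the time to failure.
    for budget_mb in (256, 512):
        minutes = minutes_until_exhausted(budget_mb, rate)
        print(f"{budget_mb} MB lasts about {minutes:.0f} min")
```

At 1.9 MB per minute, two to three hours of playback corresponds to roughly 228 to 342 MB, and because the time to exhaustion is simply the budget divided by the rate, a larger limit postpones the lockup without preventing it.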