Hello Greg. Thanks for writing. It is difficult to clone this
particular VM because it is a server which runs a number of tasks,
including DNS, mail, httpd, DHCP service, etc. I've tried adjusting the
amount of RAM available to the machine, and that doesn't seem to make
much difference to the frequency of hangs. It can hang mere hours after
booting, or it can run for weeks.

It's a good thought though and it might be useful in the future.

I've been trying to determine whether the machine gets many kernel
memory allocation failures without crashing as it runs. So, I wrote the
awk script included below to take the output of vmstat -m and show me
whether memory allocations are failing. There are failures, but they
are not continuous. Rather, they appear to be bursty.

For example, as I write this, the machine has been up:
 3:52PM  up 20 days,  3:50, 2 users, load averages: 0.24, 0.42, 0.30

I get the following output from my script:
vmstat -m | awk -f memfail.awk

biopl 272 2717692 2 2717629 39646 39641 5 428 0 inf 0
buf16k 16384 2429887 34 2406027 120741 111469 9272 13892 0 inf 0
pcgnormal 256 19114251 130625 19113738 365670 365637 33 1095 0 inf 0
pvpage 4096 777453 3 775913 352117 350577 1540 3701 0 inf 0
xnfrx 4096 1184808 10 1184517 87242 86951 291 435 0 inf 0
Totals 213633401 130674 210402708 2904574 2614430 290144

These totals are since boot time.


Any thoughts would be greatly appreciated.

-thanks
-Brian


<Cut here for awk script>
# $Id$
# NAME: Brian Buhrow
# DATE: June 30, 2025
# PURPOSE:
# Try and figure out if we're getting memory allocation failures from any pools.

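# Field 4 of each pool line in "vmstat -m" output is the Fail count.
# The Totals line has no Size column, so its field 4 is Releases,
# which is why that line is always printed.
{
        # Force a numeric value; non-numeric text becomes 0.
        $4 += 0
        if ($4 > 0) {
                printf "%s\n",$0
        }
}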
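
Because the counts reported by vmstat -m are cumulative since boot, a
single run cannot show when a burst of failures happened. One way to
narrow that down might be to save two snapshots some time apart and
print only the pools whose Fail count grew in between. The following is
a minimal sketch, not a tested tool: the function name pool_fail_deltas
and the snapshot file names are hypothetical, and it assumes pool lines
have 12 fields with the Fail count in field 4, as in the output quoted
above.

```sh
# Hypothetical helper: compare two saved "vmstat -m" snapshots and print
# each pool whose Fail count (field 4) increased, with the increase.
#   $1 = older snapshot file, $2 = newer snapshot file
pool_fail_deltas() {
    awk '
        # Pool lines in the quoted output have 12 fields; shorter lines
        # such as Totals are skipped.
        NF != 12 { next }
        # First file: remember the Fail count for each pool name.
        NR == FNR { old[$1] = $4 + 0; next }
        # Second file: report pools whose Fail count went up.
        # Non-numeric text in field 4 evaluates to 0 and is never printed.
        ($4 + 0) > (old[$1] + 0) {
            printf "%s %d\n", $1, ($4 + 0) - (old[$1] + 0)
        }
    ' "$1" "$2"
}

# Example use (file names and the 60 second interval are arbitrary):
#   vmstat -m > /tmp/pools.old
#   sleep 60
#   vmstat -m > /tmp/pools.new
#   date; pool_fail_deltas /tmp/pools.old /tmp/pools.new
```

Run from a loop or from cron with a timestamp, this would give a rough
timeline of the bursts that could be compared against the hangs.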
