Hello Greg. Thanks for writing. It is difficult to clone this particular VM because it is a server which runs a number of tasks, including DNS, mail, httpd, DHCP service, etc. I've tried adjusting the amount of RAM available to the machine, and that doesn't seem to make much difference to the frequency of hangs. It can hang mere hours after booting, or it can run for weeks.
It's a good thought, though, and it might be useful in the future.

One thing I've been trying to determine is whether I get many kernel memory allocation failures without crashes as the machine runs. So I wrote the awk script below, which takes the output of vmstat -m and shows me whether memory allocations are failing. There are failures, but they are not continuous. Rather, they appear to be bursty. For example, as I write this, the machine has been up:

 3:52PM up 20 days, 3:50, 2 users, load averages: 0.24, 0.42, 0.30

I get the following output from my script:

vmstat -m | awk -f memfail.awk
biopl 272 2717692 2 2717629 39646 39641 5 428 0 inf 0
buf16k 16384 2429887 34 2406027 120741 111469 9272 13892 0 inf 0
pcgnormal 256 19114251 130625 19113738 365670 365637 33 1095 0 inf 0
pvpage 4096 777453 3 775913 352117 350577 1540 3701 0 inf 0
xnfrx 4096 1184808 10 1184517 87242 86951 291 435 0 inf 0
Totals 213633401 130674 210402708 2904574 2614430 290144

These totals are since boot time. Any thoughts would be greatly appreciated.

-thanks
-Brian

<Cut here for awk script>
# $Id$
# NAME: Brian Buhrow
# DATE: June 30, 2025
# PURPOSE:
# Try and figure out if we're getting memory allocation failures from any pools.
BEGIN {
}

{
	# Field 4 is the failure count; adding 0 forces a numeric comparison.
	$4 += 0
	if ($4 > 0) {
		printf "%s\n",$0
	}
}

END {
}
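
Because the counts above are cumulative since boot, a single run cannot show when a burst happened. One possible way to narrow that down is to save two vmstat -m snapshots some time apart and print only the pools whose failure count grew in between. What follows is a minimal sketch, not something taken from the original script: it assumes field 4 is the failure count (the same assumption memfail.awk makes), and the function name memfail_delta and the snapshot file names are made up for illustration.

```sh
# memfail_delta OLD NEW
# Print pools whose failure count (field 4 of "vmstat -m" output)
# increased between two saved snapshots. Hypothetical helper.
memfail_delta() {
    awk '
        # The Totals line has no size column, so its field 4 is not
        # a failure count; skip it in both files.
        $1 == "Totals" { next }

        # First file: remember the failure count for each pool name.
        NR == FNR { old[$1] = $4 + 0; next }

        # Second file: report pools whose count went up. A pool that
        # is missing from the first snapshot is counted from zero.
        ($4 + 0) > (old[$1] + 0) {
            printf "%-12s %10d new failures\n", $1, ($4 + 0) - (old[$1] + 0)
        }
    ' "$1" "$2"
}
```

It could be used along these lines, with the interval chosen to taste:

    vmstat -m > /tmp/vm.old; sleep 300; vmstat -m > /tmp/vm.new
    memfail_delta /tmp/vm.old /tmp/vm.new

Run from cron with timestamps, this would show whether the bursts line up with the hangs or with any particular workload.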