Deadlocks (Was: Re: vmstat 'b' (disk busy?) field keeps climbing ...)
On Mon, 26 Jun 2006, Kostik Belousov wrote:

> Core dumps are somewhat inconvenient in this situation. Better, when sending a report to me, follow my advice in
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

'k, I'm working on getting a serial console working on one of the 3 servers that I'm getting these 'deadlocks' on ... I've upgraded the OS to the latest -STABLE (so that I'm not wasting people's time on a bug that might already be fixed), and am adding in the various options as detailed in the handbook URL above ...

First question: what should I expect? A bunch of messages to the console? A slow-as-a-dog server (i.e. the INVARIANTS stuff)? Or nothing until the deadlock actually occurs?

thx ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Mon, Jun 26, 2006 at 02:20:12AM -0300, Marc G. Fournier wrote:

> On Mon, 26 Jun 2006, Kostik Belousov wrote:
>> Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE?
> Yes, kernel sources, it seems, from May 25th, according to my /usr/src tree ...
>> BTW, do you use snapshots?
> Not that I've explicitly enabled ...
>> I think that without ddb access, diagnosing and debugging the problem would be quite hard.
> Would it be a simple matter of: CTL-ALT-ESC panic to get it to dump core? Or would more be involved? Would a core dump even work?

Core dumps are somewhat inconvenient in this situation. Better, when sending a report to me, follow my advice in
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Mon, 26 Jun 2006, Kostik Belousov wrote:

> On Mon, Jun 26, 2006 at 02:20:12AM -0300, Marc G. Fournier wrote:
>> On Mon, 26 Jun 2006, Kostik Belousov wrote:
>>> Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE?
>> Yes, kernel sources, it seems, from May 25th, according to my /usr/src tree ...
>>> BTW, do you use snapshots?
>> Not that I've explicitly enabled ...
>>> I think that without ddb access, diagnosing and debugging the problem would be quite hard.
>> Would it be a simple matter of: CTL-ALT-ESC panic to get it to dump core? Or would more be involved? Would a core dump even work?
> Core dumps are somewhat inconvenient in this situation. Better, when sending a report to me, follow my advice in
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

The problem is that I'm dealing with a remote server here, no serial console, and no guaranteed keyboard :(

I'm between a rock and a hard place on this one :(

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
I think I might have found *at least* one of the problems, namely the excessively high blocked counts while ps isn't finding anything ...

MySQL

We just recently started allowing clients to run a MySQL server *within* their vServer ... in a drastic move, I just shut them all down on pluto, and blocked dropped from ~86 down to 5 in a matter of moments ... restarting them all has it climbing once more, being up around 22 already ...

I'm going to go with that theory for now, and keep an eye on things ...

Just curious as to why, even with -H, it's not showing any blocked states within ps though ... ?

Thx

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Monday 26 June 2006 20:25, Marc G. Fournier wrote:

> I think I might have found *at least* one of the problems, and that being the excessively high blocked states while ps isn't finding anything ...
>
> MySQL
>
> We just recently started allowing clients to run a MySQL server *within* their vServer ... in a drastic move, I just shut them all down on pluto, and blocked drop'd from ~86 down to 5 in a matter of moments ... restarting them all has it climbing once more, being up around 22 already ...
>
> I'm going to go with that theory for now, and keep an eye on things ...
>
> Just curious as to why, even with -H, its not showing any blocked states within ps though ... ?

The blocked column also counts processes that have objects paging. Most likely you are *short* on memory. In order to relieve the pressure, program .text pages are freed and need to be refetched from disk whenever the respective code is executed. If you allow every vServer to run its own MySQL with all the libraries etc., it's clear what is killing you! Add more memory or make sure that .text pages can be reused by several processes. As far as I understand, the vServers will each see a different source and thus not share buffers or the like.

--
Best regards, Max Laier
http://pf4freebsd.love2party.net/
Re: vmstat 'b' (disk busy?) field keeps climbing ...
Hej there,

Marc G. Fournier wrote:

> I think I might have found *at least* one of the problems, and that being the excessively high blocked states while ps isn't finding anything ...
>
> MySQL
>
> We just recently started allowing clients to run a MySQL server *within* their vServer ... in a drastic move, I just shut them all down on pluto, and blocked drop'd from ~86 down to 5 in a matter of moments ... restarting them all has it climbing once more, being up around 22 already ...

I don't know whether it helps at all. I guess not... But I'm seeing blocked processes (mysqld) waiting for disk I/O all over the place when running heavy-duty MySQL servers. This is on Linux and FreeBSD. Linux would be either 2.4.31 or 2.6.14, both with MySQL 4.1.x.

./Marian
Re: vmstat 'b' (disk busy?) field keeps climbing ...
Marc G. Fournier wrote:

> On Mon, 26 Jun 2006, Max Laier wrote:
>> On Monday 26 June 2006 20:25, Marc G. Fournier wrote:
>>> I think I might have found *at least* one of the problems, and that being the excessively high blocked states while ps isn't finding anything ...
>>>
>>> MySQL
>>>
>>> We just recently started allowing clients to run a MySQL server *within* their vServer ... in a drastic move, I just shut them all down on pluto, and blocked drop'd from ~86 down to 5 in a matter of moments ... restarting them all has it climbing once more, being up around 22 already ...
>>>
>>> I'm going to go with that theory for now, and keep an eye on things ...
>>>
>>> Just curious as to why, even with -H, its not showing any blocked states within ps though ... ?
>> The blocked column shows also processes that have objects paging. Most likely you are *short* on memory. In order to relieve the pressure program .text pages are free'ed and need to be refetched from disc whenever the respective code is being executed.
> 'k, but shouldn't the OS be doing any swapping, if this was the case? I'm getting 1M of swappage when the blocked pages are really high ...

It makes sense when you think about it (as Matthew Fuller pointed out in this thread 2 days ago). There is no point in swapping out binary pages, as they are ALREADY stored on disk and can be re-fetched with ease (remember, the binary is marked in use, so we don't have to worry about it getting modified out from under us); why double disk usage by storing binaries on the swap partition?

In this case, binary pages are getting paged out under memory pressure and have to be paged back in when needed. This results in high vnode pager activity but little swap pager activity. Matthew pointed out that the vnode pager also handles mmap()'d files, which could come into play with MySQL.

-Jonathan
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Marc G. Fournier wrote:

MGF> >> 'k, stupid question then ... what am I searching for?
MGF> >>
MGF> >> # ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
MGF> >
MGF> > Well, try
MGF> >
MGF> > ps axlww | awk '$10 ~ /^D[^L]/'
MGF> >
MGF> > which should give you a list of blocked-in-uninterruptible-syscall processes excluding kernel threads...
MGF>
MGF> Nadda:
MGF>
MGF> pluto# ps axlww | awk '$10 ~ /^D[^L]/'

Errm... It seems I pointed you in the wrong direction... Normal disk-locked processes have DL (DL+) state... Well, then try something like

  ps ax -O ppid,flags,mwchan | awk '($6 ~ /^D/ || $6 == "STAT") && $3 !~ /^20.$/'

Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
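The filter suggested in this message can be wrapped in a small shell function so it can be rerun easily or fed a saved listing. This is only a sketch: the name blocked_procs is made up here, and it assumes the filter is meant to read `($6 ~ /^D/ || $6 == "STAT") && $3 !~ /^20.$/`, with F values of the form 20x (204 and 20c in the listings in this thread) marking kernel threads.

```shell
#!/bin/sh
# blocked_procs (hypothetical helper): read the output of
#   ps ax -O ppid,flags,mwchan
# on stdin and print the header line plus every process whose STAT
# (field 6) starts with D, skipping rows whose F column (field 3)
# looks like 20x. Treating 20x as "kernel thread" is an assumption
# taken from the listings in this thread, not a documented rule.
blocked_procs() {
    awk '($6 ~ /^D/ || $6 == "STAT") && $3 !~ /^20.$/'
}

# Typical use on a live system (not executed here):
#   ps ax -O ppid,flags,mwchan | blocked_procs
```

Because the function only reads stdin, the same filter can be applied to a listing captured earlier, which helps when the blocked state is too short-lived to catch interactively.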
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sun, 25 Jun 2006, Dmitry Morozovsky wrote:

> On Sun, 25 Jun 2006, Marc G. Fournier wrote:
>
> MGF> > Errm... It seems I turn you to the wrong side... Normal disk-locked processes have DL (DL+) state... Well, then try something like
> MGF> >
> MGF> > ps ax -O ppid,flags,mwchan | awk '($6 ~ /^D/ || $6 == "STAT") && $3 !~ /^20.$/'
> MGF>
> MGF> Still nothing to show 4 blocked processes (where it is sitting right now):
> MGF>
> MGF> pluto# ps ax -O ppid,flags,mwchan | awk '($6 ~ /^D/ || $6 == "STAT") && $3 !~ /^20.$/'
> MGF>   PID  PPID       F MWCHAN TT  STAT    TIME COMMAND
>
> Hmm, which processes are in D state at all, then?
>
> ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == "STAT"'

# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == "STAT"'
  PID  PPID       F MWCHAN TT  STAT    TIME COMMAND
    2     0     204 -      ??  DL   0:22.83 [g_event]
    3     0     204 -      ??  DL   3:13.81 [g_up]
    4     0     204 -      ??  DL   4:10.38 [g_down]
    5     0     204 -      ??  DL   0:00.43 [thread taskq]
    6     0     204 -      ??  DL   0:00.00 [kqueue taskq]
    7     0     204 -      ??  DL   0:00.00 [acpi_task0]
    8     0     204 -      ??  DL   0:00.00 [acpi_task1]
    9     0     204 -      ??  DL   0:00.00 [acpi_task2]
   10     0     204 ktrace ??  DL   0:00.00 [ktrace]
   15     0     204 -      ??  DL   0:37.74 [yarrow]
   25     0     204 psleep ??  DL   0:36.57 [pagedaemon]
   26     0     204 psleep ??  DL   0:00.00 [vmdaemon]
   27     0     20c pgzero ??  DL   6:20.67 [pagezero]
   28     0     204 psleep ??  DL   0:07.66 [bufdaemon]
   29     0     204 vlruwt ??  DL   0:22.29 [vnlru]
   30     0     204 syncer ??  DL   7:19.03 [syncer]
   31     0     204 sdflus ??  DL   1:02.41 [softdepflush]
   36     0     204 -      ??  DL   3:20.45 [schedcpu]
87945 30418    4002 vnread pa  DL+  0:00.00 awk $6 ~ /^D/ || $6 == STAT

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Mon, Jun 26, 2006 at 01:47:04AM -0300, Marc G. Fournier wrote:

> On Mon, 26 Jun 2006, Marc G. Fournier wrote:
>
>>  3416     1 1004100 ufs    ??  DsJ  0:13.01 /usr/local/libexec/postfix/master
>>  3418  3416 1004100 ufs    ??  DJ   0:04.16 qmgr -l -t fifo -u
>> 33561  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
>> 33562  3416 1004100 ufs    ??  DJ   0:00.06 local -t unix
>> 33565  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
>> 33566  3416 1004100 ufs    ??  DJ   0:00.05 local -t unix
>> 33567  3416 1004100 ufs    ??  DJ   0:00.02 lmtp -t unix -u
>> 33568  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
>> 33569  3416 1004100 ufs    ??  DJ   0:00.05 local -t unix
>> 33570  3416 1004100 ufs    ??  DJ   0:00.05 lmtp -t unix -u
>> 33572  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
>> 33573  3416 1004100 ufs    ??  DJ   0:00.05 local -t unix
>> 33574  3416 1004100 ufs    ??  DJ   0:00.02 bounce -z -n defer -t unix -u
>> 33577  3416 1004100 ufs    ??  DJ   0:00.05 lmtp -t unix -u
>> 33578  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
>> 33580  3416 1004100 ufs    ??  DJ   0:00.05 lmtp -t unix -u
>> 33581  3416 1004100 biowr  ??  DJ   0:00.01 bounce -z -n defer -t unix -u
>> 33584  3416 1004100 ufs    ??  DJ   0:00.01 bounce -z -n defer -t unix -u
>> 54491 48090    4000 ufs    ??  D    0:00.79 du -skx /vm/296/alambredelacalcompartido.com
>> 30418 30416    4002 ppwait pa  Ds   0:00.19 -csh (csh)
>
> The postfix/master process on PID 3416 is within the alambredelacalcompartido.com vServer/directory structure that is currently blocked for the du ...
>
> If that helps any ... and none of those processes I can appear to kill off ...

Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE?

BTW, do you use snapshots?

I think that without ddb access, diagnosing and debugging the problem would be quite hard.
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Mon, 26 Jun 2006, Marc G. Fournier wrote:

>  3416     1 1004100 ufs    ??  DsJ  0:13.01 /usr/local/libexec/postfix/master
>  3418  3416 1004100 ufs    ??  DJ   0:04.16 qmgr -l -t fifo -u
> 33561  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
> 33562  3416 1004100 ufs    ??  DJ   0:00.06 local -t unix
> 33565  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
> 33566  3416 1004100 ufs    ??  DJ   0:00.05 local -t unix
> 33567  3416 1004100 ufs    ??  DJ   0:00.02 lmtp -t unix -u
> 33568  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
> 33569  3416 1004100 ufs    ??  DJ   0:00.05 local -t unix
> 33570  3416 1004100 ufs    ??  DJ   0:00.05 lmtp -t unix -u
> 33572  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
> 33573  3416 1004100 ufs    ??  DJ   0:00.05 local -t unix
> 33574  3416 1004100 ufs    ??  DJ   0:00.02 bounce -z -n defer -t unix -u
> 33577  3416 1004100 ufs    ??  DJ   0:00.05 lmtp -t unix -u
> 33578  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
> 33580  3416 1004100 ufs    ??  DJ   0:00.05 lmtp -t unix -u
> 33581  3416 1004100 biowr  ??  DJ   0:00.01 bounce -z -n defer -t unix -u
> 33584  3416 1004100 ufs    ??  DJ   0:00.01 bounce -z -n defer -t unix -u
> 54491 48090    4000 ufs    ??  D    0:00.79 du -skx /vm/296/alambredelacalcompartido.com
> 30418 30416    4002 ppwait pa  Ds   0:00.19 -csh (csh)

The postfix/master process on PID 3416 is within the alambredelacalcompartido.com vServer/directory structure that is currently blocked for the du ...

If that helps any ... and none of those processes can I seem to kill off ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sun, 25 Jun 2006, Marc G. Fournier wrote:

> # ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == "STAT"'
>   PID  PPID       F MWCHAN TT  STAT    TIME COMMAND
>     2     0     204 -      ??  DL   0:22.83 [g_event]
>     3     0     204 -      ??  DL   3:13.81 [g_up]
>     4     0     204 -      ??  DL   4:10.38 [g_down]
>     5     0     204 -      ??  DL   0:00.43 [thread taskq]
>     6     0     204 -      ??  DL   0:00.00 [kqueue taskq]
>     7     0     204 -      ??  DL   0:00.00 [acpi_task0]
>     8     0     204 -      ??  DL   0:00.00 [acpi_task1]
>     9     0     204 -      ??  DL   0:00.00 [acpi_task2]
>    10     0     204 ktrace ??  DL   0:00.00 [ktrace]
>    15     0     204 -      ??  DL   0:37.74 [yarrow]
>    25     0     204 psleep ??  DL   0:36.57 [pagedaemon]
>    26     0     204 psleep ??  DL   0:00.00 [vmdaemon]
>    27     0     20c pgzero ??  DL   6:20.67 [pagezero]
>    28     0     204 psleep ??  DL   0:07.66 [bufdaemon]
>    29     0     204 vlruwt ??  DL   0:22.29 [vnlru]
>    30     0     204 syncer ??  DL   7:19.03 [syncer]
>    31     0     204 sdflus ??  DL   1:02.41 [softdepflush]
>    36     0     204 -      ??  DL   3:20.45 [schedcpu]
> 87945 30418    4002 vnread pa  DL+  0:00.00 awk $6 ~ /^D/ || $6 == STAT

*Now*, I have something:

 0 26 0 8008212 165648  198 0 0 6  280 0 26 0 326 2034 1322  7  5 87
 0 26 0 7956160 169000  177 0 0 0  253 0 10 0 286 1943 1048  8  3 89
 1 26 0 7947212 170768   40 1 0 0   86 0 35 0 358 1160 1199  1  3 96
 0 26 0 7957372 168476  375 5 1 1  294 0  7 0 299 2108 1334  5  5 89
 0 26 0 7934964 169540   76 0 0 0   91 0  1 0 263  999  842  1  2 97
 0 26 0 7920860 171672 1334 0 0 0 1205 0 25 0 334 4443 2003 14 12 74
 0 26 0 7928452 169656  399 3 1 1  333 0 12 0 334 2303 1893  5  7 88
 0 26 0 7957244 163572  329 3 0 0  169 0 10 0 347 2177 1896  4  5 91

 3416     1 1004100 ufs    ??  DsJ  0:13.01 /usr/local/libexec/postfix/master
 3418  3416 1004100 ufs    ??  DJ   0:04.16 qmgr -l -t fifo -u
33561  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
33562  3416 1004100 ufs    ??  DJ   0:00.06 local -t unix
33565  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
33566  3416 1004100 ufs    ??  DJ   0:00.05 local -t unix
33567  3416 1004100 ufs    ??  DJ   0:00.02 lmtp -t unix -u
33568  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
33569  3416 1004100 ufs    ??  DJ   0:00.05 local -t unix
33570  3416 1004100 ufs    ??  DJ   0:00.05 lmtp -t unix -u
33572  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
33573  3416 1004100 ufs    ??  DJ   0:00.05 local -t unix
33574  3416 1004100 ufs    ??  DJ   0:00.02 bounce -z -n defer -t unix -u
33577  3416 1004100 ufs    ??  DJ   0:00.05 lmtp -t unix -u
33578  3416 1004100 ufs    ??  DJ   0:00.02 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200
33580  3416 1004100 ufs    ??  DJ   0:00.05 lmtp -t unix -u
33581  3416 1004100 biowr  ??  DJ   0:00.01 bounce -z -n defer -t unix -u
33584  3416 1004100 ufs    ??  DJ   0:00.01 bounce -z -n defer -t unix -u
54491 48090    4000 ufs    ??  D    0:00.79 du -skx /vm/296/alambredelacalcompartido.com
30418 30416    4002 ppwait pa  Ds   0:00.19 -csh (csh)

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Mon, 26 Jun 2006, Kostik Belousov wrote:

> Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE ?

Yes, kernel sources, it seems, from May 25th, according to my /usr/src tree ...

> BTW, do you use snapshots ?

Not that I've explicitly enabled ...

> I think that without ddb access, diagnose and debug the problem would be quite hard.

Would it be a simple matter of: CTL-ALT-ESC panic to get it to dump core? Or would more be involved? Would a core dump even work?

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Marc G. Fournier wrote:

MGF> > 'b' stands for blocked, not busy. Judging by your page fault rate and the high number of frees and pages being scanned, you're probably swapping tasks in and out and are waiting on disk. Take a look at vmstat -s, and consider adding more RAM if this is correct...
MGF>
MGF> is there a way of finding out what processes are blocked?

Aren't they in 'D' status by ps?

Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, Jun 24, 2006 at 11:55:26AM +0400, Dmitry Morozovsky wrote:

> On Sat, 24 Jun 2006, Marc G. Fournier wrote:
>
> MGF> > 'b' stands for blocked, not busy. Judging by your page fault rate and the high number of frees and pages being scanned, you're probably swapping tasks in and out and are waiting on disk. Take a look at vmstat -s, and consider adding more RAM if this is correct...
> MGF>
> MGF> is there a way of finding out what processes are blocked?
>
> Aren't they in 'D' status by ps?

Use ps axlww. In this way, at least actual blocking points are shown.
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Kostik Belousov wrote:

KB> > MGF> > 'b' stands for blocked, not busy. Judging by your page fault rate and the high number of frees and pages being scanned, you're probably swapping tasks in and out and are waiting on disk. Take a look at vmstat -s, and consider adding more RAM if this is correct...
KB> > MGF>
KB> > MGF> is there a way of finding out what processes are blocked?
KB> >
KB> > Aren't they in 'D' status by ps?
KB> Use ps axlww. In this way, at least actual blocking points are shown.

/me knows ;-)

OTOH, I'm fairly sure scrappy@ should know this very well...

Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Fri, Jun 23, 2006 at 11:38:44PM -0400 I heard the voice of Chuck Swiger, and lo! it spake thus:

> Yeah -- it's more common for a system to need more RAM for dynamically allocated content, which would be placed into the swapfile, than for binary executable pages, but it's possible to go the other way, too.

Yeah, and it's WAY the other way.

        0 swap pager pageins
        0 swap pager pageouts
    31750 vnode pager pageins
    15954 vnode pager pageouts

That speaks of HUGE memory pressure in program text; plenty for the 'data' of the programs, but really really tight for the programs themselves. That'll also lead to a lot of disk thrashing. And there aren't even all that many fork() calls, relative to my box (of course, mine does things like ports builds that spawn off totally stupid numbers of processes, so that may be a quirk here rather than there).

Perhaps rebuilding a bunch of stuff with -Os will gain you some breathing room, but more memory or less load is probably the only real answer. And I think you already had 4 gig in an i386 box, so you're kinda in trouble on the memory side.

--
Matthew Fuller (MF4839) | [EMAIL PROTECTED]
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
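The comparison made in this message (swap pager versus vnode pager counters from vmstat -s) can be scripted. The following is a rough sketch only: the function name pager_summary is invented for illustration, and it assumes the four counter lines are labelled exactly as quoted above.

```shell
#!/bin/sh
# pager_summary (hypothetical helper): read `vmstat -s` output on stdin
# and report which pager accounts for more paging operations.
# Assumes the counter labels match the lines quoted in this thread.
pager_summary() {
    awk '
        /swap pager pageins/   { swap_in  = $1 }
        /swap pager pageouts/  { swap_out = $1 }
        /vnode pager pageins/  { vn_in    = $1 }
        /vnode pager pageouts/ { vn_out   = $1 }
        END {
            swap = swap_in + swap_out
            vn   = vn_in + vn_out
            print "swap pager: " swap ", vnode pager: " vn
            if (vn > swap)
                print "mostly vnode pager (file-backed pages)"
            else if (swap > vn)
                print "mostly swap pager (anonymous memory)"
            else
                print "no clear difference"
        }'
}

# Typical use on a live system (not executed here):
#   vmstat -s | pager_summary
```

Fed the four counters quoted above, the first output line would read "swap pager: 0, vnode pager: 47704", which is the imbalance being pointed out.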
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Fri, Jun 23, 2006 at 08:33:05PM -0500 I heard the voice of Matthew D. Fuller, and lo! it spake thus:

> It's the vnode pager, not the swap pager. AIUI, that's mostly paging in and out pages of running binaries (from the image on disk), not moving stuff in and out of swapspace.

Actually, as I think of it, I think the vnode pager would also be the part used faulting pages in and out of mmap()'d files, which could point at the database. Memory pressure from that wouldn't result in swapping, since clean pages would just get tossed and dirty pages would be synced and tossed. In that case it may not be so much a 'problem', as just 'normal' for your case, and your actual problem may be somewhat elsewhere.

--
Matthew Fuller (MF4839)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Matthew D. Fuller wrote:

> On Fri, Jun 23, 2006 at 11:38:44PM -0400 I heard the voice of Chuck Swiger, and lo! it spake thus:
>> Yeah-- it's more common for a system to need more RAM for dynamicly allocated content which would be placed into the swapfile then it uses binary executable pages, it's possible to go the other way, too.
>
> Yeah, and it's WAY the other way.
>
>         0 swap pager pageins
>         0 swap pager pageouts
>     31750 vnode pager pageins
>     15954 vnode pager pageouts
>
> That speaks of HUGE memory pressure in program text; plenty for the 'data' of the programs, but really really tight for the programs themselves. That'll also lead to a lot of disk thrashing. And there aren't even all that many fork() calls, relative to my box (of course, mine does things like ports builds that spawn of totally stupid numbers of processes, so that may be a quirk here rather than there).
>
> Perhaps rebuilding a bunch of stuff with -Os will gain you some breathing room, but more memory or less load is probably the only real answer. And I think you already had 4 gig in an i386 box, so you're kinda in trouble on the memory side.

Would having only 1 CPU (1 died, used to be two) cause this, or pure memory?

And, if things are *that* tight, shouldn't it be doing more swapping?

pluto# pstat -s
Device          1K-blocks     Used    Avail Capacity
/dev/da0s1b       8388608     7324  8381284     0%
pluto# uptime
 2:52PM  up 20:17, 5 users, load averages: 1.26, 4.08, 5.64
pluto#

From top:

last pid: 46611;  load averages:  1.09,  3.86,  5.53    up 0+20:17:38  14:52:16
1311 processes: 9 running, 1301 sleeping, 1 zombie
CPU states:  1.1% user,  0.0% nice,  3.0% system,  0.4% interrupt, 95.6% idle
Mem: 3088M Active, 349M Inact, 313M Wired, 165M Cache, 112M Buf, 27M Free
Swap: 8192M Total, 7268K Used, 8185M Free

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Kostik Belousov wrote:

> On Sat, Jun 24, 2006 at 11:55:26AM +0400, Dmitry Morozovsky wrote:
>> On Sat, 24 Jun 2006, Marc G. Fournier wrote:
>>
>> MGF> > 'b' stands for blocked, not busy. Judging by your page fault rate and the high number of frees and pages being scanned, you're probably swapping tasks in and out and are waiting on disk. Take a look at vmstat -s, and consider adding more RAM if this is correct...
>> MGF>
>> MGF> is there a way of finding out what processes are blocked?
>>
>> Aren't they in 'D' status by ps?
> Use ps axlww. In this way, at least actual blocking points are shown.

'k, stupid question then ... what am I searching for?

# ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
 654 select
 230 lockf
 166 wait
  85 -
  80 piperd
  71 nanslp
  33 kserel
  22 user
  10 pause
   9 ttyin
   5 sbwait
   3 psleep
   3 accept
   2 kqread
   2 Giant
   1 vlruwt
   1 syncer
   1 sdflus
   1 ppwait
   1 ktrace
   1 MWCHAN

According to vmstat, I'm holding at '4 blocked' for the most part ... sbwait is socket related, not disk ... and none of the others look right ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Dmitry Morozovsky wrote:

> On Sat, 24 Jun 2006, Kostik Belousov wrote:
>
> KB> > MGF> > 'b' stands for blocked, not busy. Judging by your page fault rate and the high number of frees and pages being scanned, you're probably swapping tasks in and out and are waiting on disk. Take a look at vmstat -s, and consider adding more RAM if this is correct...
> KB> > MGF>
> KB> > MGF> is there a way of finding out what processes are blocked?
> KB> >
> KB> > Aren't they in 'D' status by ps?
> KB> Use ps axlww. In this way, at least actual blocking points are shown.
>
> /me knows ;-)
>
> OTOH, I'm fairly sure scrappy@ should know this very well...

Considering I thought it was 'busy' and not 'blocked', I'm in new territory here :)

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Marc G. Fournier wrote:

MGF> >>> is there a way of finding out what processes are blocked?
MGF> >>
MGF> >> Aren't they in 'D' status by ps?
MGF> > Use ps axlww. In this way, at least actual blocking points are shown.
MGF>
MGF> 'k, stupid question then ... what am I searching for?
MGF>
MGF> # ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr

Well, try

  ps axlww | awk '$10 ~ /^D[^L]/'

which should give you a list of blocked-in-uninterruptible-syscall processes excluding kernel threads...

Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- [EMAIL PROTECTED] ***
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Dmitry Morozovsky wrote:

> On Sat, 24 Jun 2006, Marc G. Fournier wrote:
>
> MGF> [ ... ]
> MGF> 'k, stupid question then ... what am I searching for?
> MGF>
> MGF> # ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
>
> Well, try
>
> ps axlww | awk '$10 ~ /^D[^L]/'
>
> which should give you a list of blocked-in-uninterruptible-syscall
> processes excluding kernel threads...

Nadda:

pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto#

yet vmstat shows 4:

 0  4 0  7248532 229668  545  0  0  0  735    0   5   0  365 3308 1529 15  8 78
 0  4 0  7239388 222364  127  8  2  0  158    0  27   0  412 2324 1075  8  5 87

if I do it repeatedly, fairly often, I can occasionally get one or two:

pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
0 18886 1 0 96 0 10316 5968 proctr DsJ ?? 0:02.81 /usr/local/sbin/httpd
0 19673 1 0 96 0 13776 6688 proctr DsJ ?? 0:02.88 /usr/local/sbin/httpd -DSSL
0 46540 46538 1 8 0 5020 2396 ppwait Ds p4 0:00.07 -csh (csh)
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
0 55163 55160 6 96 0 0 8 proctr DE ?? 0:00.00 uptime
0 46540 46538 1 96 0 5020 2396 proctr Ds p4 0:00.07 -csh (csh)
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
0 46540 46538 40 8 0 5020 2396 ppwait Ds p4 0:00.08 -csh (csh)
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'

But it's definitely not 'a consistent 4' ...

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
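Since a single snapshot rarely catches anything, one option is to take many samples and tally which PID and wait channel show up in the D state, rather than eyeballing repeated runs. The following is only a sketch under assumptions: `tally_blocked` is a hypothetical name, and it expects each sample to begin with a header line whose first field is UID and which names the PID, MWCHAN and STAT columns, as in the listings in this thread.

```sh
# tally_blocked: read several concatenated `ps axlww` samples on stdin and
# print "count PID MWCHAN" for every process seen in a D (non-DL) state,
# most frequently seen first. Hypothetical helper, not a standard tool.
#
# Possible use (10 samples, one second apart):
#   for i in 1 2 3 4 5 6 7 8 9 10; do ps axlww; sleep 1; done | tally_blocked
tally_blocked() {
    awk '
        $1 == "UID" {
            # Header line of a sample: (re)locate the columns by name.
            for (i = 1; i <= NF; i++) {
                if ($i == "PID")    pid = i
                if ($i == "MWCHAN") wchan = i
                if ($i == "STAT")   stat = i
            }
            next
        }
        # Same state test as the one-liner quoted above.
        stat && $stat ~ /^D[^L]/ { seen[$pid " " $wchan]++ }
        END { for (k in seen) print seen[k], k }
    ' | sort -k1,1nr -k2,2n
}
```

A process that turns up in most samples on the same wait channel is a better candidate for "really blocked" than one that appears once in passing.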
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, Jun 24, 2006 at 02:57:27PM -0300, Marc G. Fournier wrote:

> [ ... ]
> 'k, stupid question then ... what am I searching for?
>
> # ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
>  654 select
>  230 lockf
>  166 wait
>   85 -
>   80 piperd
>   71 nanslp
>   33 kserel
>   22 user
>   10 pause
>    9 ttyin
>    5 sbwait
>    3 psleep
>    3 accept
>    2 kqread
>    2 Giant
>    1 vlruwt
>    1 syncer
>    1 sdflus
>    1 ppwait
>    1 ktrace
>    1 MWCHAN
>
> According to vmstat, I'm holding at '4 blocked' for the most part ...
> sbwait is socket related, not disk ... and none of the others look right ...

I would say, using a big magic crystal ball, that your problems are not kernel-related. I see only two suspicious points:

1. A high number of pipe readers and waiters for file locks. It may be normal for your load.
2. 2 Giant holders/lockers. Is it constant? Are the processes holding/waiting for Giant the same?

Anyway, being in your shoes, I would start looking at the applications.

Ah, and does dmesg show anything?
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, Jun 24, 2006 at 09:52:03PM +0300, Kostik Belousov wrote:

> [ ... ]
> I would say, using a big magic crystal ball, that your problems are not
> kernel-related. I see only two suspicious points:
>
> 1. A high number of pipe readers and waiters for file locks. It may be
>    normal for your load.
> 2. 2 Giant holders/lockers. Is it constant? Are the processes
>    holding/waiting for Giant the same?
>
> Anyway, being in your shoes, I would start looking at the applications.
>
> Ah, and does dmesg show anything?

And another question: what are the processes in the state 'user'? I have never seen that state. Moreover, a search through the sources does not show what this could be.
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Kostik Belousov wrote:

> 2. 2 Giant holders/lockers. Is it constant ? Are the processes
>    holding/waiting for Giant are the same ?

Mostly appears to be 'clock' ...

pluto# ps axlww | grep Giant | grep -v grep
0 12 0 0 -32 0 0 8 Giant LL ?? 3:07.03 [swi4: clock]
pluto# ps axlww | grep Giant | grep -v grep
0 12 0 0 -32 0 0 8 Giant LL ?? 3:07.03 [swi4: clock]
pluto# ps axlww | grep Giant | grep -v grep
0 12 0 0 -32 0 0 8 Giant LL ?? 3:07.03 [swi4: clock]
pluto# ps axlww | grep Giant | grep -v grep
pluto# ps axlww | grep Giant | grep -v grep
0 12 0 0 -32 0 0 8 Giant LL ?? 3:07.03 [swi4: clock]
pluto# ps axlww | grep Giant | grep -v grep
pluto# ps axlww | grep Giant | grep -v grep
0 12 0 0 -32 0 0 8 Giant LL ?? 3:07.03 [swi4: clock]
pluto# ps axlww | grep Giant | grep -v grep
pluto# ps axlww | grep Giant | grep -v grep
0 92517 46540 114 110 0 5032 2412 Giant LV+ p4 0:00.00 -csh (csh)
pluto# ps axlww | grep Giant | grep -v grep
0 12 0 0 -32 0 0 8 Giant LL ?? 3:07.04 [swi4: clock]
1001 14098 1 0 96 0 3308 1324 Giant LsJ ?? 0:04.98 /usr/local/libexec/postfix/master
pluto# ps axlww | grep Giant | grep -v grep
pluto# ps axlww | grep Giant | grep -v grep
0 12 0 0 -32 0 0 8 Giant LL ?? 3:07.04 [swi4: clock]
pluto# ps axlww | grep Giant | grep -v grep
pluto#

And I suspect that the master is disk i/o, since the iir driver is listed as still GIANT-LOCKED:

iir0: Intel Integrated RAID Controller mem 0xfc8f-0xfc8f3fff irq 30 at device 9.0 on pci1
iir0: [GIANT-LOCKED]

> Ah, and does dmesg show anything ?

Anything (other than the iir driver) that I'd be looking for in there? Or could that be what is compounding the problem? Too many things trying to acquire GIANT over the drive(s), creating a deadlock? Long shot there ... ?

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, 24 Jun 2006, Kostik Belousov wrote:

> [ ... ]
> And another question: what are the processes in the state user ? I never
> see that state. More, search thru the sources does not show what this
> could be.

Odd, I'm not finding any, but I did get a Giant on a grep of the ps listing:

pluto# ps axlww | grep user
0 93055 46540 0 96 0 348 212 Giant L+ p4 0:00.00 grep user

Not sure where those 'user' came from though ... just ran the above again:

# ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
 603 select
 231 lockf
  71 nanslp
  33 -
  30 kserel
  23 wait
   9 ttyin
   9 sbwait
   7 pause
   6 accept
   4 piperd
   3 psleep
   3 kqread
   3 Giant
   1 syncer
   1 sdflus
   1 ppwait
   1 pgzero
   1 ktrace
   1 MWCHAN

And nothing ... Got a Giant lock on sshd too?

pluto# ps axlww | grep Giant
0 693 556 1 96 0 6096 2080 Giant Ls ?? 0:02.18 sshd: [EMAIL PROTECTED] (sshd)
0 94334 46540 0 96 0 348 208 - R+ p4 0:00.00 grep Giant

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Sat, Jun 24, 2006 at 04:45:49PM -0300, Marc G. Fournier wrote:

> [ ... ]
> Not sure where those 'user' came from though ... just ran the above again:
> [ ... ]
> And nothing ... Got a Giant lock on sshd too?
>
> pluto# ps axlww | grep Giant
> 0 693 556 1 96 0 6096 2080 Giant Ls ?? 0:02.18 sshd: [EMAIL PROTECTED] (sshd)
> 0 94334 46540 0 96 0 348 208 - R+ p4 0:00.00 grep Giant

Everything looks normal. Transient Giant acquisition/contention is quite normal, especially when you have several Giant-locked kernel parts.

I strongly suggest moving the investigation to the application(s) themselves. The kernel seems to be innocent.

[A deadlock due to disk driver/Giant/fs shows up immediately as a HUGE number of processes in D state, with a completely different set of wait states. All your processes are doing select/waiting for a file lock/reading from a pipe/something threaded.]
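To put a rough number on the point about wait states, the wait-channel histogram from earlier in the thread can be split into the channels named above as ordinary and everything else. This is a sketch under loud assumptions: `summarize_wchans` is a hypothetical helper, and the default list (select, lockf, piperd, kserel) is only one reading of "select / file lock / pipe read / something threaded", not an authoritative classification of FreeBSD wait channels.

```sh
# summarize_wchans: read "count wchan" lines (the output of
#   ps axlww | awk '{print $9}' | sort | uniq -c
# ) on stdin and total them into two buckets.
# $1 (optional): space-separated wait channels to treat as ordinary idle
# waits. The default list is an assumption based on the message above.
summarize_wchans() {
    awk -v idle="${1:-select lockf piperd kserel}" '
        BEGIN {
            n = split(idle, names, " ")
            for (i = 1; i <= n; i++) ok[names[i]] = 1
        }
        $2 == "MWCHAN" { next }                 # the counted header line
        ($2 in ok)     { idle_total += $1; next }
                       { other_total += $1 }
        END { printf "idle-looking %d\nother %d\n", idle_total, other_total }
    '
}
```

For example, `ps axlww | awk '{print $9}' | sort | uniq -c | summarize_wchans`. A large "other" total alone proves nothing; the remaining channels still have to be looked at one by one, as was done for Giant above.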
Re: vmstat 'b' (disk busy?) field keeps climbing ...
'k, this has gotta be a leak somewhere ... I'm now up to 6 blocked:

 0  8 0  7449224 236552  213  2  1  0  109    0 101   0  475 2890 2143  2  6 92
 0  6 0  7481104 247704  578  0  0  0 1196    0 262   0  808 8901 3049  5 16 79
 0  6 0  7450820 253576 1385  3  2  3 1379    0  13   0  303 4742 1703 13 13 73
 0  6 0  7478168 248372  295  0  0  0  160    0  57   0  428 1900 2616  2  7 92
 0  6 0  7473064 249072    1  0  0  0   23    0   6   0  273  822  845  1  2 97
 1  6 0  7479000 243164  275 17  7  0  144    0  17   0  317 1572 2180  2  6 92

But there don't appear to be any processes reporting themselves as blocked:

pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
0 30418 30416 6 8 0 4916 2624 ppwait Ds pa 0:00.13 -csh (csh)
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
0 30418 30416 137 8 0 4916 2624 ppwait Ds pa 0:00.14 -csh (csh)
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'
pluto# ps axlww | awk '$10 ~ /^D[^L]/'

Or is there something else I should be looking at/for? In the case of this system, it's kernel sources from ~May 25th ...

On Sat, 24 Jun 2006, Marc G. Fournier wrote:

> On Sat, 24 Jun 2006, Kostik Belousov wrote:
>
> > 2. 2 Giant holders/lockers. Is it constant ? Are the processes
> >    holding/waiting for Giant are the same ?
>
> Mostly appears to be 'clock' ...
>
> [ ... ]

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Jun 23, 2006, at 4:44 PM, Marc G. Fournier wrote:

>  procs       memory             page             disks      faults      cpu
>  r  b w      avm    fre  flt re pi po   fr   sr da0 pa0   in   sy   cs us sy id
>  1 42 1 10249060 161668 1290 54 12  3 1409 2202 102   0  751 6416 3350 24 15 61
>  0 39 0 10148976 148104  654 10  5  2  660    0  49   0  615 4440 2584 18  9 73
>
> the last time it hung, it hit about 45 ... about 6 hours ago, it was at
> ~5-10 ... anything I should look at to figure out where those 39+ are
> 'busy'?

'b' stands for blocked, not busy. Judging by your page fault rate and the high number of frees and pages being scanned, you're probably swapping tasks in and out and are waiting on disk. Take a look at vmstat -s, and consider adding more RAM if this is correct...

> The system is running a May 25th kernel of FreeBSD 6-STABLE .. Dual-PIII ...

--
-Chuck
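To watch how the 'b' (blocked) count moves over time without reading whole vmstat rows, the column can be pulled out by name. A minimal sketch, assuming the two-line vmstat header shown in this thread, where the second header line contains a field named b; `blocked_column` is a hypothetical name.

```sh
# blocked_column: read vmstat output on stdin and print only the value of
# the "b" (blocked) column for each data row. Hypothetical helper.
blocked_column() {
    awk '
        {
            # A header line names the column; remember its position.
            for (i = 1; i <= NF; i++) if ($i == "b") { col = i; next }
        }
        # Data rows: print the b value; repeated header lines are skipped
        # because their field in that position is not a number.
        col && $col ~ /^[0-9]+$/ { print $col }
    '
}
```

For instance, `vmstat 5 | blocked_column` would print one number per interval, which is easier to log or graph than the full row.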
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Fri, 23 Jun 2006, Charles Swiger wrote:

> On Jun 23, 2006, at 4:44 PM, Marc G. Fournier wrote:
> > [ ... ]
> > the last time it hung, it hit about 45 ... about 6 hours ago, it was at
> > ~5-10 ... anything I should look at to figure out where those 39+ are
> > 'busy'?
>
> 'b' stands for blocked, not busy. Judging by your page fault rate and the
> high number of frees and pages being scanned, you're probably swapping
> tasks in and out and are waiting on disk. Take a look at vmstat -s, and
> consider adding more RAM if this is correct...

'k, I will keep an eye on things and check vmstat -s as those numbers grow higher ... thanks for the clarification on 'blocked' vs 'busy' :(

What specifically should I be looking at in vmstat -s? Note that the server just rebooted, 0 swap is used:

# pstat -s
Device          1K-blocks     Used    Avail Capacity
/dev/da0s1b       8388608        0  8388608     0%

and vmstat -s is showing:

# vmstat -s
  8434656 cpu context switches
  2554513 device interrupts
   430486 software interrupts
  4353484 traps
 21299255 system calls
       36 kernel threads created
    28399 fork() calls
     1708 vfork() calls
        0 rfork() calls
        0 swap pager pageins
        0 swap pager pages paged in
        0 swap pager pageouts
        0 swap pager pages paged out
    31750 vnode pager pageins
   209538 vnode pager pages paged in
    15954 vnode pager pageouts
   219494 vnode pager pages paged out
       20 page daemon wakeups
   648514 pages examined by the page daemon
    16508 pages reactivated
  1014412 copy-on-write faults
     5389 copy-on-write optimized faults
  1982109 zero fill pages zeroed
  1070481 zero fill pages prezeroed
     1626 intransit blocking page faults
  3786729 total VM faults taken
        0 pages affected by kernel thread creation
  2344822 pages affected by fork()
   299231 pages affected by vfork()
        0 pages affected by rfork()
  3360377 pages freed
        0 pages freed by daemon
  1672560 pages freed by exiting processes
   618892 pages active
   275063 pages inactive
    42967 pages in VM cache
    66898 pages wired down
     6398 pages free
     4096 bytes per page
 36972009 total name lookups
          cache hits (97% pos + 0% neg) system 0% per-directory
          deletions 0%, falsehits 0%, toolong 0%

thx ...

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Fri, Jun 23, 2006 at 07:12:54PM -0300 I heard the voice of Marc G. Fournier, and lo! it spake thus:

>     31750 vnode pager pageins
>    209538 vnode pager pages paged in
>     15954 vnode pager pageouts
>    219494 vnode pager pages paged out

This may be something to look at. My workstation (~3.5 day uptime) has a fraction of that:

     7204 vnode pager pageins
    37609 vnode pager pages paged in
        1 vnode pager pageouts
        1 vnode pager pages paged out

Compare to the number of processes spawned (I'm at 10x yours):

>     28399 fork() calls
>      1708 vfork() calls
>         0 rfork() calls

   282510 fork() calls
    22164 vfork() calls
        0 rfork() calls

That sounds like hefty memory pressure.

--
Matthew Fuller (MF4839) | [EMAIL PROTECTED]
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
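The comparison above boils down to vnode pager pageins relative to the number of processes spawned. As a rough sketch of that arithmetic (the ratio is just one way to express the comparison, not an established metric, and `pageins_per_fork` is a hypothetical name), it can be computed straight from `vmstat -s` output:

```sh
# pageins_per_fork: read `vmstat -s` output on stdin and print the number
# of vnode pager pageins per process creation (fork + vfork + rfork).
# Hypothetical helper; the ratio itself is only an illustrative yardstick.
pageins_per_fork() {
    awk '
        /vnode pager pageins$/ { pageins = $1 }
        /[vr]?fork\(\) calls$/ { forks += $1 }   # fork(), vfork(), rfork()
        END {
            if (forks > 0) printf "%.2f\n", pageins / forks
            else print "n/a"
        }
    '
}
```

With the numbers quoted above, the server comes out at 31750 / (28399 + 1708) pageins per spawned process and the workstation at 7204 / (282510 + 22164), which is the gap being pointed out.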
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Fri, 23 Jun 2006, Matthew D. Fuller wrote:

> [ ... ]
> This may be something to look at. My workstation (~3.5 day uptime) has a
> fraction of that:
> [ ... ]
> That sounds like hefty memory pressure.

Which is odd, no, if I'm hardly swapping?

# pstat -s
Device          1K-blocks     Used    Avail Capacity
/dev/da0s1b       8388608     3396  8385212     0%

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Fri, Jun 23, 2006 at 10:02:22PM -0300 I heard the voice of Marc G. Fournier, and lo! it spake thus:

> Which is odd, no, if I'm hardly swapping?

Well,

>     31750 vnode pager pageins
>     15954 vnode pager pageouts

It's the vnode pager, not the swap pager. AIUI, that's mostly paging in and out pages of running binaries (from the image on disk), not moving stuff in and out of swapspace.

--
Matthew Fuller (MF4839) | [EMAIL PROTECTED]
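To see at a glance which pager is doing the work, the four pagein/pageout counters can be pulled out of `vmstat -s` side by side. A small sketch, assuming the counter names are exactly as printed earlier in this thread; `pager_summary` is a hypothetical name.

```sh
# pager_summary: read `vmstat -s` output on stdin and print two lines,
# "swap <pageins> <pageouts>" and "vnode <pageins> <pageouts>".
# Hypothetical helper; missing counters are reported as 0.
pager_summary() {
    awk '
        /swap pager pageins$/   { swap_in = $1 }
        /swap pager pageouts$/  { swap_out = $1 }
        /vnode pager pageins$/  { vnode_in = $1 }
        /vnode pager pageouts$/ { vnode_out = $1 }
        END {
            printf "swap %d %d\n", swap_in, swap_out
            printf "vnode %d %d\n", vnode_in, vnode_out
        }
    '
}
```

Run as `vmstat -s | pager_summary`. Zero swap pager activity next to busy vnode pager counters is the situation described above: file-backed pages moving, with nothing going to swap space.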
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Fri, 23 Jun 2006, Matthew D. Fuller wrote:

> On Fri, Jun 23, 2006 at 10:02:22PM -0300 I heard the voice of
> Marc G. Fournier, and lo! it spake thus:
> > Which is odd, no, if I'm hardly swapping?
>
> Well,
>
> >     31750 vnode pager pageins
> >     15954 vnode pager pageouts
>
> It's the vnode pager, not the swap pager. AIUI, that's mostly paging in
> and out pages of running binaries (from the image on disk), not moving
> stuff in and out of swapspace.

ah, okay ... I've been talking with the on-site tech, and there are more issues than just what I'm seeing with vmstat ... the fact that one of the CPUs died yesterday being one of them :(

Just bothers me that it's been running 4.x for 3 years now, no problems, and as soon as I upgrade it to 6.x, all the headaches start :(

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Fri, Jun 23, 2006 at 11:02:35PM -0300, Marc G. Fournier wrote:

> Just bothers me that its been running 4.x for 3 years now, no problems
> and as soon as I upgrade it to 6.x, all the headaches start :(

I am well familiar with Mr. Murphy and all his works :-)

mcl
Re: vmstat 'b' (disk busy?) field keeps climbing ...
Marc G. Fournier wrote:

> [ ... ]
> > >     31750 vnode pager pageins
> > >     15954 vnode pager pageouts
> >
> > It's the vnode pager, not the swap pager. AIUI, that's mostly paging in
> > and out pages of running binaries (from the image on disk), not moving
> > stuff in and out of swapspace.
>
> ah, okay ...

Yeah, it's more common for a system to need more RAM for dynamically allocated content, which would be placed into the swapfile, than for binary executable pages, but it's possible to go the other way, too.

--
-Chuck
Re: vmstat 'b' (disk busy?) field keeps climbing ...
On Fri, 23 Jun 2006, Charles Swiger wrote:

> On Jun 23, 2006, at 4:44 PM, Marc G. Fournier wrote:
> > [ ... ]
> > the last time it hung, it hit about 45 ... about 6 hours ago, it was at
> > ~5-10 ... anything I should look at to figure out where those 39+ are
> > 'busy'?
>
> 'b' stands for blocked, not busy. Judging by your page fault rate and the
> high number of frees and pages being scanned, you're probably swapping
> tasks in and out and are waiting on disk. Take a look at vmstat -s, and
> consider adding more RAM if this is correct...

is there a way of finding out what processes are blocked?

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)