cron pile up! Lot's of cron: running job (cron)
http://lists.freebsd.org/pipermail/freebsd-questions/2007-December/164174.html ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
Re: cron pile up! Lot's of cron: running job (cron)
Rudy wrote:
> The thing is, sometimes it runs fine, other times it backlogs. (It may complete at a later date... PID 82253 is still waiting... Gonna see if it completes instead of killing all the stuck crons...) All the crons are cleared out right now... 'ps' shows only crond. Related to putting the other cron job in "" marks???

Well, I think I messed up in my suggestion by omitting the CRON at the end. My point/thought was: put the entire command, "/path/to/script.sh ARG", in quotes. Cron is pretty archaic, and I wondered if it was trying to run /path/to/script.sh and ARG as two jobs instead of one, and hanging on ARG since CRON is something of a reserved word.

IANAE, YMMV, and all that.

Kevin Kinsey
--
I like myself, but I won't say I'm as handsome as the bull that kidnapped Europa.
-- Marcus Tullius Cicero
Re: cron pile up! Lot's of cron: running job (cron)
On Sat, 15 Dec 2007 16:18:31 -0600 Kevin Kinsey [EMAIL PROTECTED] wrote:
> Rudy wrote:
> > The thing is, sometimes it runs fine, other times it backlogs. (It may complete at a later date... PID 82253 is still waiting... Gonna see if it completes instead of killing all the stuck crons...) All the crons are cleared out right now... 'ps' shows only crond. Related to putting the other cron job in "" marks???
>
> Well, I think I messed up in my suggestion by omitting the CRON at the end. My point/thought was: put the entire command, "/path/to/script.sh ARG", in quotes. Cron is pretty archaic, and I wondered if it was trying to run /path/to/script.sh and ARG as two jobs instead of one, and hanging on ARG since CRON is something of a reserved word.
>
> IANAE, YMMV, and all that.

MMV :)

The following has been merrily running on three boxes, the oldest of them for, um, 9.5 years:

  */5 * * * * root /root/bin/ipfwsnap cron

Yes, 'cron' is a checked and logged argument to ipfwsnap. Various other /etc/crontab entries demonstrate no need to enclose arguments in quotes, except where they'd be necessary anyway - as per examples in crontab(5).

Cheers, Ian
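To illustrate why a trailing word such as CRON is not treated as a second job: crontab(5) says the whole command portion of an entry is executed by /bin/sh (or the shell named in SHELL). The sketch below is not cron's code. It is a small stand-in, with the hypothetical helper name run_command_line, that hands one command string to "sh -c" and reports the exit status. The shell does the word splitting, so the extra word simply becomes an argument of the command.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from cron's source): run one crontab-style
 * command string as a single argument to "/bin/sh -c".
 * Returns the shell's exit status, or -1 if the job could not be run.
 */
int
run_command_line(const char *cmd)
{
    int status;
    pid_t pid;

    assert(cmd != NULL);        /* caller must pass a command string */

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* The whole string is ONE argv entry; sh splits it into words. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);             /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_command_line("true CRON") runs a single job whose command receives CRON as an ordinary argument, which matches Ian's experience with ipfwsnap.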
Re: cron pile up! Lot's of cron: running job (cron)
Dan Nelson wrote:
> In the last episode (Dec 03), Support (Rudy) said:
> > Below is part of the cron... Seems like any random cronjob can get clogged up... load varies from 0.2 to 1.0 on this dual-core box. I rebooted the box -- crons continue to slowly pile up.
> >
> > One of the cronjobs that is 'stuck' is this one:
> >   /root/bin/raid-status.sh
> > which can be found here:
> >   http://www.monkeybrains.net/~rudy/example/raid_status.html
> >
> > Forgot to mention, I am running:
> >   6.2-STABLE FreeBSD 6.2-STABLE #3: Thu May 31 01:18:15 PDT 2007
> >
> > OH, ps shows this:
> >   58383 ?? D   0:00.00 cron: running job (cron)
> >   58384 ?? IVs 0:00.00 cron: running job (cron)
>
> In general, when troubleshooting, ps axlw is a more useful command. It adds, among other columns, the MWCHAN one, which details exactly why a process is stuck in the D state.
>
> Anyway, cron does a fork and then a vfork, creating a child and a grandchild process. I'm sort of surprised at the amount of code between vfork and exec in the grandchild in /src/usr.sbin/cron/cron/do_command.c . Since process 3 is actually using process 2's address space, one must be extremely careful not to modify static variables or change other global state that would affect the parent once it resumes execution, and all the logging, environment-setting, and user-context calls are certain to mess with the parent's state, especially with nss modules in the mix. I'd personally recompile cron with all vforks replaced with fork and see what happens.
>
> It couldn't hurt to update to a newer kernel version along the RELENG_6 branch as a test, I guess. Note that your uname will change to 6.3-PRERELEASE, but apart from causing lsof to complain, you should be okay.
>
> > /var/log/cron has this entry:
> >   Dec 3 20:16:00 pita /usr/sbin/cron[58384]: (root) CMD (/root/bin/raid-status.sh CRON)
> >
> > BUT there is no 'raid-status.sh' stuck in the ps axw. Seems like the vfork set off the cronjob, it ran, but then cron didn't 'stop' executing. Any debugging tips?
>
> Can you tell if raid-status.sh ever ran? i.e. is process 2 stuck at the start of vfork or at the end?

I added this line to the top of my cronjob:

  logger -t DEBUG $0: $$

and cron seems stuck BEFORE the script is ever run. Whether it sticks or not appears random, as plenty of log lines are showing up with the output of the logger command in my /var/log/messages.

  # tail /var/log/messages
  Dec 13 11:16:00 pita DEBUG: /root/bin/raid-status.sh: 64414
  Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80115
  Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80119
  Dec 13 12:11:00 pita DEBUG: /root/bin/raid-status.sh: 84283

Here is the ps output:

  # ps axlw
  UID   PID  PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT     TIME COMMAND
    0 85939 82253   0   8  0 2148 1560 ppwait D    ??  0:00.00 cron: running job (cron)
    0 85940 85939   0   4  0 2148 1560 sbwait IVs  ??  0:00.00 cron: running job (cron)

  # grep 85940 /var/log/cron
  Dec 13 12:16:00 pita /usr/sbin/cron[85940]: (root) CMD (/root/bin/raid-status.sh CRON)

- Rudy
Re: cron pile up! Lot's of cron: running job (cron)
Rudy wrote:
> Dan Nelson wrote:
> > [...]
> > Can you tell if raid-status.sh ever ran? i.e. is process 2 stuck at the start of vfork or at the end?
>
> I added this line to the top of my cronjob:
>
>   logger -t DEBUG $0: $$
>
> and cron seems stuck BEFORE the script is ever run. Whether it sticks or not appears random, as plenty of log lines are showing up with the output of the logger command in my /var/log/messages.
>
>   # tail /var/log/messages
>   Dec 13 11:16:00 pita DEBUG: /root/bin/raid-status.sh: 64414
>   Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80115
>   Dec 13 12:00:00 pita DEBUG: /root/bin/raid-status.sh: 80119
>   Dec 13 12:11:00 pita DEBUG: /root/bin/raid-status.sh: 84283
>
> Here is the ps output:
>
>   # ps axlw
>   UID   PID  PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT     TIME COMMAND
>     0 85939 82253   0   8  0 2148 1560 ppwait D    ??  0:00.00 cron: running job (cron)
>     0 85940 85939   0   4  0 2148 1560 sbwait IVs  ??  0:00.00 cron: running job (cron)
>
>   # grep 85940 /var/log/cron
>   Dec 13 12:16:00 pita /usr/sbin/cron[85940]: (root) CMD (/root/bin/raid-status.sh CRON)
>
> - Rudy

Just as a favor to an old coot, could you change your crontab entry to read like this:

  */16 * * * * /root/bin/raid-status.sh

and see if it makes any difference?

Kevin Kinsey
--
There are never any bugs you haven't found yet.
cron pile up! Lot's of cron: running job (cron)
cron jobs seem to get stuck. Not always, but within a day, there are at least 20 stuck. It is not always the same cronjob that does the sticking. :)

When this occurs, I can run ps ax | grep cron and get a bunch of lines like this:

  51921 ?? D   0:00.00 cron: running job (cron)
  51922 ?? IVs 0:00.00 cron: running job (cron)
  52544 ?? D   0:00.00 cron: running job (cron)
  52545 ?? IVs 0:00.00 cron: running job (cron)
  54418 ?? D   0:00.00 cron: running job (cron)
  54419 ?? IVs 0:00.00 cron: running job (cron)
  54667 ?? D   0:00.00 cron: running job (cron)
  54668 ?? IVs 0:00.00 cron: running job (cron)
  55835 ?? D   0:00.00 cron: running job (cron)
  55836 ?? IVs 0:00.00 cron: running job (cron)

What is going on? Please help me remedy this situation.

The PID numbers next to cron's with a STATE of IVs show up in /var/log/cron, for example:

  # grep 54668 /var/log/cron
  Dec 2 22:32:00 pita /usr/sbin/cron[54668]: (root) CMD (/root/bin/raid-status.sh CRON)
  # grep 55836 /var/log/cron
  Dec 2 22:40:00 pita /usr/sbin/cron[55836]: (root) CMD (/root/bin/10minutes.mail.sh | mail -E -s [ERROR] mail.monkeybrains.net [EMAIL PROTECTED])

If I run 'lsof' I can find these open handles:

  cron 54668 root cwd VDIR       0,80    512 471040 /var/cron
  cron 54668 root rtd VDIR       0,77    512      2 /
  cron 54668 root txt VREG       0,82  32496 122864 /usr/sbin/cron
  cron 54668 root txt VREG       0,77 162712  49929 /libexec/ld-elf.so.1
  cron 54668 root txt VREG       0,77  44788  49922 /lib/libutil.so.5
  cron 54668 root txt VREG       0,77 941952  49923 /lib/libc.so.6
  cron 54668 root txt VREG       0,82  19277 826439 /usr/local/lib/nss_mysql.so.1
  cron 54668 root txt VREG       0,82 413626 826986 /usr/local/lib/mysql/libmysqlclient.so.15
  cron 54668 root txt VREG       0,77  64604  49928 /lib/libz.so.3
  cron 54668 root txt VREG       0,77 107432  49918 /lib/libm.so.4
  cron 54668 root txt VREG       0,77  28648  49916 /lib/libcrypt.so.3
  cron 54668 root  0u PIPE 0xca02c660  16384        ->0xca02c718
  cron 54668 root  1u PIPE 0xcc473250      0        ->0xcc473198
  cron 54668 root  2u PIPE 0xcc473250      0        ->0xcc473198
  cron 54668 root  5u unix 0xc6665858    0t0        ->0xc67e89bc

  cron 54667 root cwd VDIR       0,80    512 471040 /var/cron
  cron 54667 root rtd VDIR       0,77    512      2 /
  cron 54667 root txt VREG       0,82  32496 122864 /usr/sbin/cron
  cron 54667 root txt VREG       0,77 162712  49929 /libexec/ld-elf.so.1
  cron 54667 root txt VREG       0,77  44788  49922 /lib/libutil.so.5
  cron 54667 root txt VREG       0,77 941952  49923 /lib/libc.so.6
  cron 54667 root txt VREG       0,82  19277 826439 /usr/local/lib/nss_mysql.so.1
  cron 54667 root txt VREG       0,82 413626 826986 /usr/local/lib/mysql/libmysqlclient.so.15
  cron 54667 root txt VREG       0,77  64604  49928 /lib/libz.so.3
  cron 54667 root txt VREG       0,77 107432  49918 /lib/libm.so.4
  cron 54667 root txt VREG       0,77  28648  49916 /lib/libcrypt.so.3
  cron 54667 root  0u VCHR       0,26    0t0     26 /dev/null
  cron 54667 root  1u VCHR       0,26    0t0     26 /dev/null
  cron 54667 root  2u VCHR       0,26    0t0     26 /dev/null
  cron 54667 root  3u PIPE 0xca02c660  16384        ->0xca02c718
  cron 54667 root  4u PIPE 0xca02c718      0        ->0xca02c660
  cron 54667 root  5u unix 0xc6665858    0t0        ->0xc67e89bc
  cron 54667 root  6u PIPE 0xcc473198  16384        ->0xcc473250
  cron 54667 root  7u unix 0xc67e86f4    0t0        ->(none)
  cron 54667 root  8u PIPE 0xcc473250      0        ->0xcc473198

What is going on? Is my libnss_mysql acting up?

Rudy
Re: cron pile up! Lot's of cron: running job (cron)
Rudy wrote:
> cron jobs seem to get stuck. Not always, but within a day, there are at least 20 stuck. It is not always the same cronjob that does the sticking. :)
>
> When this occurs, I can run ps ax | grep cron and get a bunch of lines like this:
>
>   51921 ?? D   0:00.00 cron: running job (cron)
>   51922 ?? IVs 0:00.00 cron: running job (cron)
>   52544 ?? D   0:00.00 cron: running job (cron)
>   52545 ?? IVs 0:00.00 cron: running job (cron)
>   54418 ?? D   0:00.00 cron: running job (cron)
>   54419 ?? IVs 0:00.00 cron: running job (cron)
>   54667 ?? D   0:00.00 cron: running job (cron)
>   54668 ?? IVs 0:00.00 cron: running job (cron)
>   55835 ?? D   0:00.00 cron: running job (cron)
>   55836 ?? IVs 0:00.00 cron: running job (cron)
>
> What is going on? Please help me remedy this situation.
>
> The PID numbers next to cron's with a STATE of IVs show up in /var/log/cron, for example:
>
>   # grep 54668 /var/log/cron
>   Dec 2 22:32:00 pita /usr/sbin/cron[54668]: (root) CMD (/root/bin/raid-status.sh CRON)
>   # grep 55836 /var/log/cron
>   Dec 2 22:40:00 pita /usr/sbin/cron[55836]: (root) CMD (/root/bin/10minutes.mail.sh | mail -E -s [ERROR] mail.monkeybrains.net [EMAIL PROTECTED])
>
> If I run 'lsof' I can find these open handles:
>
>   cron 54668 root cwd VDIR       0,80    512 471040 /var/cron
>   cron 54668 root rtd VDIR       0,77    512      2 /
>   cron 54668 root txt VREG       0,82  32496 122864 /usr/sbin/cron
>   cron 54668 root txt VREG       0,77 162712  49929 /libexec/ld-elf.so.1
>   cron 54668 root txt VREG       0,77  44788  49922 /lib/libutil.so.5
>   cron 54668 root txt VREG       0,77 941952  49923 /lib/libc.so.6
>   cron 54668 root txt VREG       0,82  19277 826439 /usr/local/lib/nss_mysql.so.1
>   cron 54668 root txt VREG       0,82 413626 826986 /usr/local/lib/mysql/libmysqlclient.so.15
>   cron 54668 root txt VREG       0,77  64604  49928 /lib/libz.so.3
>   cron 54668 root txt VREG       0,77 107432  49918 /lib/libm.so.4
>   cron 54668 root txt VREG       0,77  28648  49916 /lib/libcrypt.so.3
>   cron 54668 root  0u PIPE 0xca02c660  16384        ->0xca02c718
>   cron 54668 root  1u PIPE 0xcc473250      0        ->0xcc473198
>   cron 54668 root  2u PIPE 0xcc473250      0        ->0xcc473198
>   cron 54668 root  5u unix 0xc6665858    0t0        ->0xc67e89bc
>
>   cron 54667 root cwd VDIR       0,80    512 471040 /var/cron
>   cron 54667 root rtd VDIR       0,77    512      2 /
>   cron 54667 root txt VREG       0,82  32496 122864 /usr/sbin/cron
>   cron 54667 root txt VREG       0,77 162712  49929 /libexec/ld-elf.so.1
>   cron 54667 root txt VREG       0,77  44788  49922 /lib/libutil.so.5
>   cron 54667 root txt VREG       0,77 941952  49923 /lib/libc.so.6
>   cron 54667 root txt VREG       0,82  19277 826439 /usr/local/lib/nss_mysql.so.1
>   cron 54667 root txt VREG       0,82 413626 826986 /usr/local/lib/mysql/libmysqlclient.so.15
>   cron 54667 root txt VREG       0,77  64604  49928 /lib/libz.so.3
>   cron 54667 root txt VREG       0,77 107432  49918 /lib/libm.so.4
>   cron 54667 root txt VREG       0,77  28648  49916 /lib/libcrypt.so.3
>   cron 54667 root  0u VCHR       0,26    0t0     26 /dev/null
>   cron 54667 root  1u VCHR       0,26    0t0     26 /dev/null
>   cron 54667 root  2u VCHR       0,26    0t0     26 /dev/null
>   cron 54667 root  3u PIPE 0xca02c660  16384        ->0xca02c718
>   cron 54667 root  4u PIPE 0xca02c718      0        ->0xca02c660
>   cron 54667 root  5u unix 0xc6665858    0t0        ->0xc67e89bc
>   cron 54667 root  6u PIPE 0xcc473198  16384        ->0xcc473250
>   cron 54667 root  7u unix 0xc67e86f4    0t0        ->(none)
>   cron 54667 root  8u PIPE 0xcc473250      0        ->0xcc473198
>
> What is going on? Is my libnss_mysql acting up?

What scripts are running? Care to sanitize the crontab file and show it as well? Barring hardware issues (disk errors, etc.), I'd suspect the scripts. What about server load averages?

KDK
--
Law of Continuity: Experiments should be reproducible. They should all fail the same way.
Re: cron pile up! Lot's of cron: running job (cron)
Below is part of the cron... Seems like any random cronjob can get clogged up... load varies from 0.2 to 1.0 on this dual-core box. I rebooted the box -- crons continue to slowly pile up.

One of the cronjobs that is 'stuck' is this one:
  /root/bin/raid-status.sh
which can be found here:
  http://www.monkeybrains.net/~rudy/example/raid_status.html

Forgot to mention, I am running:
  6.2-STABLE FreeBSD 6.2-STABLE #3: Thu May 31 01:18:15 PDT 2007

OH, ps shows this:
  58383 ?? D   0:00.00 cron: running job (cron)
  58384 ?? IVs 0:00.00 cron: running job (cron)

/var/log/cron has this entry:
  Dec 3 20:16:00 pita /usr/sbin/cron[58384]: (root) CMD (/root/bin/raid-status.sh CRON)

BUT there is no 'raid-status.sh' stuck in the ps axw. Seems like the vfork set off the cronjob, it ran, but then cron didn't 'stop' executing. Any debugging tips?

Rudy

---
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/bin:/usr/local/sbin:/usr/X11R6/bin:/root/bin
[EMAIL PROTECTED]

Root Cron for example.net

##
# check demons, limit sendmail, generate fwdmail aliases
##
*/10 * * * * /root/bin/10minutes.mail.sh | mail -E -s [ERROR] example.monkeybrains.net [EMAIL PROTECTED]
*/16 * * * * /root/bin/raid-status.sh CRON

##
# Anti-Spam measures
##
1 5 * * * /usr/local/etc/mail/blacklist2access.pl | /usr/bin/mail -E -s [INFO] mail: blacklist2access script [EMAIL PROTECTED]
## update the rules/blacklists list
40 5 * * * /usr/local/bin/sa-update --allowplugins --gpgkey D1C035168C1EBC08464946DA258CDB3ABDE9DC10 --channel saupdates.openprotect.com /usr/local/etc/rc.d/sa-spamd restart
48 5 * * * /usr/local/bin/sa-update --channel updates.spamassassin.org /usr/local/etc/rc.d/sa-spamd restart
## and anti-virus
49 */2 * * * su -m clamav -c '/usr/local/bin/freshclam --quiet'
@weekly /usr/bin/find /var/tmp/ -maxdepth 1 -and -path *clamav* -and -type d -and \! -newermt '2 days ago' -and -delete

###
# Clean stuff up
# old trash, viruses, old spam, and authdaemon cache
###
## squirrelmail attachments
45 3 * * * /usr/bin/find /var/spool/squirrelmail/attach \! -newermt '9 day ago' -delete
## stuff marked as Trash or in Trash folder
55 3 * * * /usr/bin/find /home /data/virtual/ -path */Maildir/* -and -name *:*T -and \! -newermt '2 day ago' -delete
35 3 * * * /usr/bin/find /home/ /data/virtual/ -path */Maildir/.Trash/* -name *net* -and \! -newermt '4 day ago' -delete

... etc ...
Re: cron pile up! Lot's of cron: running job (cron)
In the last episode (Dec 03), Support (Rudy) said:
> Below is part of the cron... Seems like any random cronjob can get clogged up... load varies from 0.2 to 1.0 on this dual-core box. I rebooted the box -- crons continue to slowly pile up.
>
> One of the cronjobs that is 'stuck' is this one:
>   /root/bin/raid-status.sh
> which can be found here:
>   http://www.monkeybrains.net/~rudy/example/raid_status.html
>
> Forgot to mention, I am running:
>   6.2-STABLE FreeBSD 6.2-STABLE #3: Thu May 31 01:18:15 PDT 2007
>
> OH, ps shows this:
>   58383 ?? D   0:00.00 cron: running job (cron)
>   58384 ?? IVs 0:00.00 cron: running job (cron)

In general, when troubleshooting, ps axlw is a more useful command. It adds, among other columns, the MWCHAN one, which details exactly why a process is stuck in the D state.

Anyway, cron does a fork and then a vfork, creating a child and a grandchild process. I'm sort of surprised at the amount of code between vfork and exec in the grandchild in /src/usr.sbin/cron/cron/do_command.c . Since process 3 is actually using process 2's address space, one must be extremely careful not to modify static variables or change other global state that would affect the parent once it resumes execution, and all the logging, environment-setting, and user-context calls are certain to mess with the parent's state, especially with nss modules in the mix. I'd personally recompile cron with all vforks replaced with fork and see what happens.

It couldn't hurt to update to a newer kernel version along the RELENG_6 branch as a test, I guess. Note that your uname will change to 6.3-PRERELEASE, but apart from causing lsof to complain, you should be okay.

> /var/log/cron has this entry:
>   Dec 3 20:16:00 pita /usr/sbin/cron[58384]: (root) CMD (/root/bin/raid-status.sh CRON)
>
> BUT there is no 'raid-status.sh' stuck in the ps axw. Seems like the vfork set off the cronjob, it ran, but then cron didn't 'stop' executing. Any debugging tips?

Can you tell if raid-status.sh ever ran? i.e. is process 2 stuck at the start of vfork or at the end?

BTW, here's a minimal example of the danger of putting code between vfork and exec:

  #include <err.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int i = 1;

      switch (vfork()) {
      case -1:
          err(1, "vfork failed");
          break;
      case 0: /* child: still running in the parent's address space */
          i = 2;  /* this write lands in the parent's variable */
          execl("/usr/bin/true", "true", (char *)NULL);
          _exit(0);   /* only reached if execl failed */
          break;
      default:    /* parent: resumes once the child has exec'd or exited */
          break;
      }
      printf("in parent, i is %d\n", i);
      return 0;
  }

--
Dan Nelson [EMAIL PROTECTED]
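For contrast with the vfork example, here is a rough sketch of the same experiment using fork(), in the spirit of the suggestion to rebuild cron with vfork replaced by fork. The helper name parent_value_after_fork is made up for illustration, and the /usr/bin/true path is simply carried over from the example above. With fork() the child gets its own copy of the address space, so its write should not be visible to the parent.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: the child modifies a local variable before exec,
 * just like the vfork example, but after a plain fork().
 * Returns the value the parent sees afterwards, or -1 on error.
 */
int
parent_value_after_fork(void)
{
    volatile int i = 1;
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        i = 2;      /* changes only the child's private copy */
        execl("/usr/bin/true", "true", (char *)NULL);
        _exit(0);   /* only reached if execl failed */
    }
    assert(pid > 0);    /* only the parent gets here */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return i;       /* the parent's copy, untouched by the child */
}
```

The cost is that fork() has to set up a separate address space for the child, which is the overhead vfork was meant to avoid, but in exchange the code between fork and exec cannot disturb the parent's state.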