I've got a RH 7.2 machine running as an SMB client, writing backup files
to a Win98 box used as backup storage.  Sometimes the Win98 machine goes
down and, obviously, the remote files are not accessible.

The problem is that the RH machine has some cron scripts that write to
the SMB shares.  When the Win98 machine goes down, the scripts hang
when they can't write to the shared file.  `cron` dutifully continues to
run the script every 15 minutes, each run trying to write to the same
file, and a "traffic jam" of these processes accumulates: they halt, but
don't die, when the write to the file fails.  The scripts continue to
exist as processes, which can be seen piling up with `ps`
(edited for clarity):


root      1695   937  0 01:00 ?        00:00:00 CROND
manager   1696  1695  0 01:00 ?        00:00:00 /bin/sh -c b_up.sh
manager   1700  1696  0 01:00 ?        00:00:00 cp LOCAL REMOTE
root      1711   937  0 01:15 ?        00:00:00 CROND
manager   1712  1711  0 01:15 ?        00:00:00 /bin/sh -c b_up.sh
manager   1716  1712  0 01:15 ?        00:00:00 cp LOCAL REMOTE
root      1721   937  0 01:30 ?        00:00:00 CROND
manager   1722  1721  0 01:30 ?        00:00:00 /bin/sh -c b_up.sh
manager   1726  1722  0 01:30 ?        00:00:00 cp LOCAL REMOTE
root      1727   937  0 01:45 ?        00:00:00 CROND
manager   1728  1727  0 01:45 ?        00:00:00 /bin/sh -c b_up.sh
manager   1732  1728  0 01:45 ?        00:00:00 cp LOCAL REMOTE

...etc, ad infinitum




But even after the Win98 machine comes back up, the scripts don't
complete and are still left in a state of suspension.


I tried adding some error handling, but in the following code `cp`
never returns an error; it just locks up forever after trying to access
the inaccessible REMOTE file:

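# copy the backup file to the SMB share, then check cp's exit status
cp "${LOCAL}" "${REMOTE}"
ERR=$?
if [ "$ERR" -ne 0 ]
then
        # never reached while the share is down, because cp does not return
        echo "error copying file" | mail -s ERROR root
        exit 1
fi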
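
Since `cp` never returns, the check above is never reached, and every
cron run adds another stuck process.  A possible stopgap for the pile-up
(not a fix for the hang itself) is to have the script refuse to start
while an earlier run still holds a lock.  This is only a sketch:
`run_exclusive`, `LOCKDIR` and the exit code 75 are names and values
made up for illustration and are not part of the original b_up.sh.

```shell
#!/bin/sh
# Sketch: let only one copy run at a time, so that cron cannot stack up
# new copies behind one that is hung.
# LOCKDIR and run_exclusive are hypothetical names, not from b_up.sh.

LOCKDIR="${LOCKDIR:-/tmp/b_up.lock}"

run_exclusive() {
    # mkdir either creates the directory or fails, so it can serve as a
    # simple lock: failure means another run still holds it.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi

    # Run the wrapped command and remember its exit status.
    status=0
    "$@" || status=$?

    # Release the lock only after the command has actually returned.
    rmdir "$LOCKDIR"
    return "$status"
}

# Example use inside the backup script:
#   run_exclusive cp "${LOCAL}" "${REMOTE}"
```

If the wrapped `cp` hangs, the lock stays in place and later runs exit
immediately instead of hanging too.  The lock directory then has to be
removed by hand once the stuck process is gone.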


Additionally, the process jam seems to lock up my email subsystem, so
that no error emails can get out, and I can't ftp to the RH box
either.  Logwatch reports the following:


--------------------- sendmail Begin ------------------------

264707 bytes transferred
7 messages sent

**Unmatched Entries**

rejecting connections on daemon MTA: load average: 169
rejecting connections on daemon MTA: load average: 169
rejecting connections on daemon MTA: load average: 169
rejecting connections on daemon MTA: load average: 169

...etc, ad infinitum
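
A load average of 169 alongside `cp` processes showing 00:00:00 of CPU
time would be consistent with those processes sitting in uninterruptible
sleep (state `D` in `ps`), which Linux counts toward the load average.
That is a guess, not something confirmed on this box.  As a rough
sketch, the following would count such processes; `count_dstate` is a
made-up helper name, and it assumes `ps` is asked to print the state
column first.

```shell
#!/bin/sh
# Sketch: count processes whose state starts with "D" (uninterruptible
# sleep).  count_dstate is a hypothetical helper name.
# It reads ps output on stdin and expects a header line followed by rows
# with the STAT column first.

count_dstate() {
    awk 'NR > 1 && $1 ~ /^D/ { n++ } END { print n + 0 }'
}

# Example use:
#   ps -eo stat,pid,args | count_dstate
```

If the count tracks the number of stuck `cp` processes, that would
explain why sendmail starts rejecting connections on load average even
though nothing is using the CPU.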


So... does anyone have ideas on how I can resolve this SMB lockup
problem, which seems to cascade into other problems?

Any assistance will be greatly appreciated.

Thanks!
Cosmo Lee




