I'm turning my logging up to lvl 3, it looks like I'm seeing the symptom, not the cause the more I look at this. Based on last night's failures I have the following pattern:
At 9:24pm ntbackup on ravenclaw begins writing to the existing file RAVECLAW.bkf, so far nothing logged at lvl 2 yet... at 7:04am log.ravenclaw shows: [2010/08/26 07:04:35.538218, 2] smbd/sesssetup.c:1390(setup_new_vc_session) setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all old resources. [2010/08/26 07:04:35.576723, 2] smbd/sesssetup.c:1390(setup_new_vc_session) setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all old resources. [2010/08/26 07:04:35.697538, 2] auth/auth.c:304(check_ntlm_password) check_ntlm_password: authentication for user [backup] -> [backup] -> [backup] succeeded [2010/08/26 07:04:35.700101, 1] smbd/service.c:1070(make_connection_snum) ravenclaw (192.168.1.8) connect to service backups initially as user backup (uid=34, gid=34) (pid 30350) [2010/08/26 07:04:35.701039, 2] smbd/open.c:631(open_file) backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=1) [2010/08/26 07:04:35.702556, 2] smbd/close.c:656(close_normal_file) backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK The open and closed chatter for that file repeats until 7:06:47am... [2010/08/26 07:06:36.254759, 2] smbd/close.c:656(close_normal_file) backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK [2010/08/26 07:06:47.640491, 1] smbd/service.c:1251(close_cnum) ravenclaw (192.168.1.8) closed connection to service backups [2010/08/26 07:08:12.795026, 0] lib/util_sock.c:675(write_data) [2010/08/26 07:08:12.795070, 0] lib/util_sock.c:1432(get_peer_addr_internal) getpeername failed. Error was Transport endpoint is not connected write_data: write failure in writing to client 0.0.0.0. Error Connection reset by peer [2010/08/26 07:08:12.814075, 0] smbd/process.c:79(srv_send_smb) Error writing 51 bytes to client. -1. (Transport endpoint is not connected) [2010/08/26 07:08:12.816315, 0] lib/util_sock.c:675(write_data) [2010/08/26 07:08:12.816325, 0] lib/util_sock.c:1432(get_peer_addr_internal) getpeername failed. Error was Transport endpoint is not connected write_data: write failure in writing to client 0.0.0.0. Error Broken pipe [2010/08/26 07:08:12.816333, 0] smbd/process.c:79(srv_send_smb) Error writing 55 bytes to client. -1. (Transport endpoint is not connected) [2010/08/26 07:08:12.817211, 0] lib/util_sock.c:675(write_data) [2010/08/26 07:08:12.817219, 0] lib/util_sock.c:1432(get_peer_addr_internal) getpeername failed. Error was Transport endpoint is not connected write_data: write failure in writing to client 0.0.0.0. Error Broken pipe [2010/08/26 07:08:12.817226, 2] smbd/process.c:2202(keepalive_fn) Keepalive failed - exiting. [2010/08/26 07:08:12.817339, 0] lib/util_sock.c:675(write_data) [2010/08/26 07:08:12.817348, 0] lib/util_sock.c:1432(get_peer_addr_internal) getpeername failed. Error was Transport endpoint is not connected write_data: write failure in writing to client 0.0.0.0. Error Broken pipe [2010/08/26 07:08:12.817355, 0] smbd/process.c:79(srv_send_smb) Error writing 51 bytes to client. -1. (Transport endpoint is not connected) [2010/08/26 07:08:12.941965, 2] lib/messages_local.c:289(message_notify) message to process 30350 failed - No such process [2010/08/26 07:08:12.941991, 2] lib/messages_local.c:379(messaging_tdb_send) pid 30350 doesn't exist - deleting messages record [2010/08/26 07:08:12.942010, 2] lib/messages_local.c:289(message_notify) message to process 30350 failed - No such process [2010/08/26 07:08:12.942020, 2] lib/messages_local.c:379(messaging_tdb_send) pid 30350 doesn't exist - deleting messages record [2010/08/26 07:08:12.942037, 2] lib/messages_local.c:289(message_notify) message to process 30350 failed - No such process [2010/08/26 07:08:12.942046, 2] lib/messages_local.c:379(messaging_tdb_send) pid 30350 doesn't exist - deleting messages record [2010/08/26 07:08:12.942065, 2] lib/messages_local.c:289(message_notify) message to process 30350 failed - No such process [2010/08/26 07:08:12.942075, 2] lib/messages_local.c:379(messaging_tdb_send) pid 30350 doesn't exist - deleting messages record [2010/08/26 07:08:12.942136, 2] smbd/close.c:656(close_normal_file) backup closed file Windows Servers/Ravenclaw/RAVENCLAW.bkf (numopen=0) NT_STATUS_OK [2010/08/26 07:08:12.942159, 1] smbd/service.c:1251(close_cnum) ravenclaw (192.168.1.8) closed connection to service backups This is followed by the cleanup messages in log.smbd: [2010/08/26 07:08:13.139151, 1] smbd/server.c:267(remove_child_pid) Scheduled cleanup of brl and lock database after unclean shutdown [2010/08/26 07:08:33.152490, 1] smbd/server.c:240(cleanup_timeout_fn) Cleaning up brl and lock database after unclean shutdown Ravenclaw reports the backup failed at 7:06am, error writing. So something happened between Samba and the 2k3 box at 7:04am that triggered the new session, but I don't have anything to indicate a reason. I have eventlog entries on Ravenclaw from after the fact noting Delayed Write Failed, but nothing leading up to the problem. Josh C -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba