Re: [Bacula-users] Random Backup Failures
On 11 Jul 2007 at 11:15, Chris Morris wrote:

> Since I've introduced FreeBSD snapshots into my Bacula plan, I've
> started getting random backup failures. A server will fail one day,
> and back up just fine the next. A server will back up fine one day
> and fail the next. ...all with no changes to the Bacula configuration.
> Below, I've posted pertinent portions of configuration files and
> message logs. Please let me know if I need to supply any further
> information to help troubleshoot this.

You say the jobs fail. What is the failure? Error message?

> bacula-dir.conf pertinent portions only...sensitive information
> removed with: *REMOVED*
>
> JobDefs {
>   Name = BSD
>   Type = Backup
>   FileSet = defaultBSD
>   Storage = storage01
>   Messages = Standard
>   Pool = Default
>   ClientRunBeforeJob = /usr/local/bin/sudo /usr/local/sbin/snapshot make -g1 /var:autogen_bkup
>   ClientRunBeforeJob = /usr/local/bin/sudo /usr/local/sbin/snapshot make -g1 /usr:autogen_bkup
>   ClientRunBeforeJob = /usr/local/bin/sudo /usr/local/sbin/snapshot mount /var:autogen_bkup /mnt/var
>   ClientRunBeforeJob = /usr/local/bin/sudo /usr/local/sbin/snapshot mount /usr:autogen_bkup /mnt/usr
>   ClientRunAfterJob = /usr/local/bin/sudo /usr/local/sbin/snapshot umount /mnt/var
>   ClientRunAfterJob = /usr/local/bin/sudo /usr/local/sbin/snapshot umount /mnt/usr

I suggest creating scripts on the client, and moving these commands
into those scripts. It makes the JobDefs easier to read. Sure, you have
to copy stuff to the client, but I think that's cleaner. YMMV.

>   Priority = 10
> }
>
> Typical job, as they are all nearly identical:
>
> Job {
>   Name = app06_BSD
>   Client = app06-fd
>   Schedule = MonCycle
>   JobDefs = BSD
>   Write Bootstrap = /var/db/bacula/app06.bsr
> }
>
> Typical client, as they are all nearly identical:
>
> Client {
>   Name = app06-fd
>   Address = app06
>   FDPort = 9102
>   Catalog = MyCatalog
>   Password = *REMOVED*        # password for FileDaemon
>   File Retention = 30 days    # 30 days
>   Job Retention = 6 months    # six months
>   AutoPrune = yes             # Prune expired Jobs/Files
> }
>
> My primary FileSet resource:
>
> FileSet {
>   Name = defaultBSD
>   Include {
>     Options {
>       signature = MD5
>       compression = GZIP
>     }
>     File = /
>     File = /mnt/usr
>     File = /mnt/var
>   }
>   Exclude {
>     File = /proc
>     File = /tmp
>     File = /.journal
>     File = /.fsck
>   }
> }
>
> Finally, I get the same message from my /var/log/messages file at
> every failure. The lines before and after this have nothing to do
> with the backup.
>
> Jul 11 08:19:07 app11 sudo: *REMOVED* : TTY=unknown ; PWD=/usr/local/etc/rc.d ; USER=root ; COMMAND=/usr/local/sbin/snapshot make -g1 /var:autogen_bkup
> Jul 11 08:21:34 app11 kernel: fsync: giving up on dirty
> Jul 11 08:21:34 app11 kernel: 0xff005b07cd90: tag devfs, type VCHR
> Jul 11 08:21:34 app11 kernel: usecount 1, writecount 0, refcount 604 mountedhere 0xff011f1ba200
> Jul 11 08:21:34 app11 kernel: flags ()
> Jul 11 08:21:34 app11 kernel: v_object 0xff005d3a ref 0 pages 8572
> Jul 11 08:21:34 app11 kernel: lock type devfs: EXCL (count 1) by thread 0xff00abc57980 (pid 46772)
> Jul 11 08:21:34 app11 kernel: dev da0s1d
> Jul 11 08:22:03 app11 sudo: *REMOVED* : TTY=unknown ; PWD=/usr/local/etc/rc.d ; USER=root ; COMMAND=/usr/local/sbin/snapshot make -g1 /usr:autogen_bkup
> Jul 11 08:24:13 app11 sudo: *REMOVED* : TTY=unknown ; PWD=/usr/local/etc/rc.d ; USER=root ; COMMAND=/usr/local/sbin/snapshot mount /var:autogen_bkup /mnt/var
> Jul 11 08:24:13 app11 kernel: g_vfs_done():md0[READ(offset=65536, length=8192)]error = 5

This looks like an OS issue, not a Bacula issue. I suggest following up
on the FreeBSD mailing lists.
-- Dan Langille - http://www.langille.org/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
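Dan's suggestion of moving the snapshot commands into a client-side script could look roughly like the sketch below. It is only an illustration: the function names, the `before`/`after` arguments, and the `SNAPSHOT` override variable are assumptions, not anything from Chris's setup. The snapshot commands themselves are copied from the JobDefs quoted above. Unlike six independent run directives, the script stops at the first failing step.

```shell
#!/bin/sh
# Hypothetical client-side wrapper for the snapshot steps (all names here are
# illustrative). SNAPSHOT defaults to the command used in the JobDefs; it can
# be overridden, e.g. SNAPSHOT=echo, for a dry run.
SNAPSHOT="${SNAPSHOT:-/usr/local/bin/sudo /usr/local/sbin/snapshot}"
TAG="autogen_bkup"

# Create, then mount, a snapshot of every filesystem given as an argument.
# Returns non-zero at the first failing step so the caller sees the failure.
snap_before() {
    for fs in "$@"; do
        $SNAPSHOT make -g1 "${fs}:${TAG}" || return 1
    done
    for fs in "$@"; do
        $SNAPSHOT mount "${fs}:${TAG}" "/mnt${fs}" || return 1
    done
}

# Unmount the snapshots; try all of them, but report failure if any failed.
snap_after() {
    rc=0
    for fs in "$@"; do
        $SNAPSHOT umount "/mnt${fs}" || rc=1
    done
    return $rc
}

# Dispatch on the first argument; does nothing when called without one.
case "${1:-}" in
    before) snap_before /var /usr ;;
    after)  snap_after /var /usr ;;
esac
```

If this were saved on each client as, say, /usr/local/sbin/bacula-snap.sh (a made-up path), the JobDefs would shrink to one ClientRunBeforeJob line calling it with `before` and one ClientRunAfterJob line calling it with `after`.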
Re: [Bacula-users] Random Backup Failures
Dan Langille wrote:

> You say the jobs fail. What is the failure? Error message?

Below, I've pasted in a failure notification email that Bacula automatically sends.

11-Jul 07:18 admin01-dir: Start Backup JobId 255, Job=app11_BSD.2007-07-10_23.35.04
11-Jul 01:19 app11-fd: DIR and FD clocks differ by -21547 seconds, FD automatically adjusting.
11-Jul 01:19 app11-fd: ClientRunBeforeJob: run command /usr/local/bin/sudo /usr/local/sbin/snapshot make -g1 /var:autogen_bkup
11-Jul 01:22 app11-fd: ClientRunBeforeJob: mount: /var/.snap/autogen_bkup.0: Resource temporarily unavailable
11-Jul 01:22 app11-fd: ClientRunBeforeJob: run command /usr/local/bin/sudo /usr/local/sbin/snapshot make -g1 /usr:autogen_bkup
11-Jul 01:24 app11-fd: ClientRunBeforeJob: run command /usr/local/bin/sudo /usr/local/sbin/snapshot mount /var:autogen_bkup /mnt/var
11-Jul 01:24 app11-fd: ClientRunBeforeJob: mount: /dev/md0: Input/output error
11-Jul 01:24 app11-fd: ClientRunBeforeJob: snapshot:ERROR: unable to mount /dev/md0 under /mnt/var
11-Jul 01:24 app11-fd: app11_BSD.2007-07-10_23.35.04 Error: Runscript: ClientRunBeforeJob returned non-zero status=1. ERR=Child exited with code 1
11-Jul 07:23 admin01-dir: app11_BSD.2007-07-10_23.35.04 Fatal error: Bad response to ClientRunBeforeJob command: wanted 2000 OK RunBefore , got 2905 Bad RunBeforeJob command.
11-Jul 07:23 admin01-dir: app11_BSD.2007-07-10_23.35.04 Error: Bacula 2.0.3 (06Mar07): 11-Jul-2007 07:23:21
  JobId:                  255
  Job:                    app11_BSD.2007-07-10_23.35.04
  Backup Level:           Differential, since=2007-07-10 07:32:06
  Client:                 app11-fd 2.0.3 (06Mar07) amd64-portbld-freebsd6.2,freebsd,6.2-RC1
  FileSet:                defaultBSD 2007-07-09 23:05:00
  Pool:                   Default (From Job resource)
  Storage:                storage01 (From Job resource)
  Scheduled time:         10-Jul-2007 23:35:03
  Start time:             11-Jul-2007 07:18:09
  End time:               11-Jul-2007 07:23:21
  Elapsed time:           5 mins 12 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Volume name(s):
  Volume Session Id:      95
  Volume Session Time:    1183747854
  Last Volume Bytes:      403,356,186,192 (403.3 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:
  SD termination status:  OK
  Termination:            *** Backup Error ***

--
S i x F e e t U p | Nowhere to go but open source
Silicon Valley: +1 (650) 401-8579 x609
Midwest: +1 (317) 861-5948 x609
Toll-Free: 1-866-SIX-FEET
mailto:[EMAIL PROTECTED]
http://www.sixfeetup.com | Zope/Plone Custom Development
Re: [Bacula-users] Random Backup Failures
On 7/11/07, Chris Morris [EMAIL PROTECTED] wrote:

> [...]
> 11-Jul 01:24 app11-fd: ClientRunBeforeJob: run command /usr/local/bin/sudo /usr/local/sbin/snapshot mount /var:autogen_bkup /mnt/var
> 11-Jul 01:24 app11-fd: ClientRunBeforeJob: mount: /dev/md0: Input/output error
> 11-Jul 01:24 app11-fd: ClientRunBeforeJob: snapshot:ERROR: unable to mount /dev/md0 under /mnt/var
> 11-Jul 01:24 app11-fd: app11_BSD.2007-07-10_23.35.04 Error: Runscript: ClientRunBeforeJob returned non-zero status=1. ERR=Child exited with code 1
> 11-Jul 07:23 admin01-dir: app11_BSD.2007-07-10_23.35.04 Fatal error: Bad response to ClientRunBeforeJob command: wanted 2000 OK RunBefore , got 2905 Bad RunBeforeJob command.

This says your ClientRunBeforeJob has failed because it could not mount /dev/md0 on /mnt/var. Have you checked into that? Is bacula-fd running as user bacula? Possibly this is a permissions issue.

John
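When reading a job log like the one in this thread, it can help to separate what the run-before commands printed from the lines that only record which command was started. The filter below is a small sketch of that idea. The function name is made up, and it assumes the log format shown in Chris's notification email, where command output is prefixed with "ClientRunBeforeJob: " and command starts with "ClientRunBeforeJob: run command ".

```shell
#!/bin/sh
# Hypothetical helper: read a Bacula job log on stdin and print only the
# output produced by ClientRunBeforeJob commands, dropping the
# "run command ..." lines that merely record what was started.
runbefore_output() {
    grep 'ClientRunBeforeJob: ' | grep -v 'ClientRunBeforeJob: run command '
}
```

With the log saved to a file (for example `runbefore_output < joblog.txt`, where joblog.txt is a placeholder name), this leaves three lines for job 255. The first is the "Resource temporarily unavailable" message from the initial `snapshot make` step, which comes before the mount error John points to.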
Re: [Bacula-users] Random Backup Failures
For the benefit of list readers and those who may still be trying to assist me with troubleshooting: I have finally been able to duplicate the errors on my own terms.

I could never duplicate this before, because I would sit at a terminal window and make the snapshot, mount the snapshot, browse the snapshot, and umount the snapshot, in that order. Finally, I decided to try to *behave* more like I expect a script to work. I opened two terminal windows. In one, I started the snapshot generation process. In the other, I tried to mount the snapshot before it was finished. It, of course, didn't work. More importantly, however, _I got the exact errors that I would randomly get in my automated overnight backups._

Now that I can reproduce the error, fixing it is trivial. Many thanks to those who provided assistance with this matter.

Thank you,
Chris Morris
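Chris does not say which fix he put in place. One possible approach, assuming the mount only needs to wait until the snapshot is usable, is to retry the step a bounded number of times instead of failing on the first attempt. The helper below is a generic sketch of that; the name `retry` and its argument order are illustrative and not part of Bacula or the snapshot tool.

```shell
#!/bin/sh
# retry MAX DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt failed.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        # Only wait if another attempt is still to come.
        if [ "$attempt" -lt "$max" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

A call such as `retry 10 30 /usr/local/bin/sudo /usr/local/sbin/snapshot mount /var:autogen_bkup /mnt/var` would then try the mount up to ten times, 30 seconds apart; both numbers are arbitrary. Retrying only hides a race. Since the log shows the `make` step itself printing an error, it is still worth checking that step's exit status before attempting the mount at all.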