Re: [Bacula-users] connections timing out
On 2011-08-11 at 22:15, Boudewijn Ector wrote:
> On 07/27/2011 10:31 AM, Pietro Bertera wrote:
>> 2011/7/26 Boudewijn Ector <boudew...@boudewijnector.nl>:
>>> Can someone please point out where I should start to investigate
>>> this problem? From the internet, I can reach the director and the
>>> SD @ the 'leiden' system. I can reach the FD's at all servers which
>>> are to be backed up.
>>
>> the command status client=xxx in bconsole returns everything
>> correctly ?
>>
>> Regards, Pietro
>
> Hi Pietro,
>
> Sorry for the late reply, since I've been on a holiday.
> Nothing has changed, and the problem can still be reproduced:
>
> *status client=www
> Connecting to Client www at www.boudewijnector.nl:9102
>
> www.boudewijnector.nl-fd Version: 5.0.2 (28 April 2010) x86_64-pc-linux-gnu debian squeeze/sid
> Daemon started 11-Aug-11 18:22, 1 Job run since started.
>  Heap: heap=1,597,440 smbytes=176,189 max_bytes=267,404 bufs=145 max_bufs=279
>  Sizeof: boffset_t=8 size_t=8 debug=0 trace=0
>
> Running Jobs:
> JobId 293 Job wwwjob.2011-08-11_21.45.14_07 is running.
>     Full Backup Job started: 11-Aug-11 21:45
>     Files=3,607 Bytes=11,011,930 Bytes/sec=1,101,193 Errors=1
>     Files Examined=3,608
>     Processing file: /root/home/boudewijn/IMG_9895.JPG
>     SDReadSeqNo=5 fd=5
> Director connected at: 11-Aug-11 21:45
>
> Terminated Jobs:
>  JobId  Level    Files      Bytes   Status   Finished         Name
> ====================================================================
>    292  Full    93,876    6.468 G   Error    11-Aug-11 20:23  wwwjob
> *
>
> So the director seems to be able to connect to the file daemon, am I
> correct?
>
> Cheers,
> Boudewijn Ector

See if adding a line like

    Heartbeat Interval = 60   # seconds (0 is off)

to both bacula-dir and bacula-fd helps.

Cheers
Geert Stappers
who in the past also had time-out errors on long backups

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
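As a rough illustration of what a 60-second keepalive means for a single
TCP connection, here is a small Python sketch. It is not Bacula code, and
this thread does not establish how Bacula implements Heartbeat Interval
internally. The function name make_keepalive_socket is made up for the
example, and TCP_KEEPIDLE only exists on some platforms (Linux among
them), so the sketch checks for it before using it.

```python
import socket


def make_keepalive_socket(idle_seconds=60):
    """Create a TCP socket that asks the kernel to send keepalive probes.

    idle_seconds is how long the connection may sit idle before the first
    probe is sent (only applied where TCP_KEEPIDLE is available).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Turn keepalive probing on for this socket only.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket idle time before the first probe (platform dependent).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return sock


if __name__ == "__main__":
    s = make_keepalive_socket(60)
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
    print("keepalive enabled:", bool(enabled))
    s.close()
```

On Linux, the sysctl net.ipv4.tcp_keepalive_time is the system-wide
default for the same idle timer, and it only affects sockets that have
SO_KEEPALIVE enabled.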
Re: [Bacula-users] connections timing out
On 07/27/2011 10:31 AM, Pietro Bertera wrote:
> 2011/7/26 Boudewijn Ector <boudew...@boudewijnector.nl>:
>> Can someone please point out where I should start to investigate
>> this problem? From the internet, I can reach the director and the
>> SD @ the 'leiden' system. I can reach the FD's at all servers which
>> are to be backed up.
>
> the command status client=xxx in bconsole returns everything
> correctly ?
>
> Regards, Pietro

Hi Pietro,

Sorry for the late reply, since I've been on a holiday.
Nothing has changed, and the problem can still be reproduced:

*status client=www
Connecting to Client www at www.boudewijnector.nl:9102

www.boudewijnector.nl-fd Version: 5.0.2 (28 April 2010) x86_64-pc-linux-gnu debian squeeze/sid
Daemon started 11-Aug-11 18:22, 1 Job run since started.
 Heap: heap=1,597,440 smbytes=176,189 max_bytes=267,404 bufs=145 max_bufs=279
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0

Running Jobs:
JobId 293 Job wwwjob.2011-08-11_21.45.14_07 is running.
    Full Backup Job started: 11-Aug-11 21:45
    Files=3,607 Bytes=11,011,930 Bytes/sec=1,101,193 Errors=1
    Files Examined=3,608
    Processing file: /root/home/boudewijn/IMG_9895.JPG
    SDReadSeqNo=5 fd=5
Director connected at: 11-Aug-11 21:45

Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished         Name
====================================================================
   292  Full    93,876    6.468 G   Error    11-Aug-11 20:23  wwwjob
*

So the director seems to be able to connect to the file daemon, am I
correct?

Cheers,
Boudewijn Ector
[Bacula-users] connections timing out
Hi guys,

For my company I've been trying to get bacula up and running properly.
My current situation:

Host 'leiden': located at my home, with multiple large (8TB) raid arrays
attached, and therefore running bacula-sd and bacula-dir. 100 Mbit
download bandwidth. Running debian testing, bacula version 5.0.3.

Multiple hosts to be backed up, on a 100/100 connection. Debian stable,
bacula 5.0.3, running bacula-fd with the default config.

The complete bacula-dir.conf is located at: http://pastebin.com/8JvCdmL9
Please note that I have substituted all passwords by an X. Relevant
parts are:

Director {                            # define myself
  Name = leiden-dir
  QueryFile = /etc/bacula/scripts/query.sql
  WorkingDirectory = /var/lib/bacula
  PidDirectory = /var/run/bacula
  Maximum Concurrent Jobs = 10
  Password = X                        # Console password
  Messages = Daemon
  DirAddresses = {
    ip = { addr = 192.168.1.44; port = 9101 }
    ip = { addr = 127.0.0.1; port = 9101 }
  }
}

JobDefs {
  Name = sql-weekly
  Type = Backup
  Level = Incremental
  Client = sql
  FileSet = "Full Set"
  Schedule = WeeklyCycle
  Storage = leiden-filestorage
  Messages = Standard
  Pool = LeidenPool
  Priority = 10
}

JobDefs {
  Name = mail-weekly
  Type = Backup
  Level = Incremental
  Client = mail
  FileSet = "Full Set"
  Schedule = WeeklyCycle
  Storage = leiden-filestorage
  Messages = Standard
  Pool = LeidenPool
  Priority = 10
}

Job {
  Name = sqljob
  JobDefs = sql-weekly
  Write Bootstrap = /var/lib/bacula/sql.bsr
}

Job {
  Name = mailjob
  JobDefs = mail-weekly
  Write Bootstrap = /var/lib/bacula/mail.bsr
}

# Client (File Services) to backup
Client {
  Name = sql
  Address = sql.boudewijnector.nl
  FDPort = 9102
  Catalog = MyCatalog
  Password = X                        # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
}

Client {
  Name = mail
  Address = mail.boudewijnector.nl
  FDPort = 9102
  Catalog = MyCatalog
  Password = X                        # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
}
The current problem is that I get errors on some hosts, such as:

17-Jul 02:52 leiden-dir JobId 94: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
17-Jul 02:52 leiden-dir JobId 94: Fatal error: No Job status returned from FD.
17-Jul 02:52 leiden-dir JobId 94: Error: Bacula leiden-dir 5.0.3 (04Aug10): 17-Jul-2011 02:52:30
  Build OS:               i486-pc-linux-gnu debian wheezy/sid
  JobId:                  94
  Job:                    BLAjob.2011-07-17_00.52.14_10
  Backup Level:           Full (upgraded from Incremental)
  Client:                 client4 5.0.2 (28Apr10) x86_64-pc-linux-gnu,debian,squeeze/sid
  FileSet:                Home Set 2011-07-16 23:49:43
  Pool:                   LeidenPool (From Job resource)
  Catalog:                MyCatalog (From Client resource)
  Storage:                leiden-filestorage (From Job resource)
  Scheduled time:         17-Jul-2011 00:52:13
  Start time:             17-Jul-2011 00:52:16
  End time:               17-Jul-2011 02:52:30
  Elapsed time:           2 hours 14 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       137,033
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       3,586,674,915 (3.586 GB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         LeidenVol0005
  Volume Session Id:      20
  Volume Session Time:    1310599400
  Last Volume Bytes:      12,025,925,394 (12.02 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  OK
  Termination:            *** Backup Error ***

When trying to rerun the job, it also fails after 2 hours.

I tried to fix it this way: in the Job @ bacula-dir, I added

    Max Run Time = 144000

because it seemed like bacula shut down the connection after 2 hours.
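To make the "2 hours" observation concrete, the timestamps in the job
report can be subtracted directly. The following Python sketch does only
that arithmetic; the names LOG_FORMAT and elapsed_seconds are made up for
this example, and the format string assumes English month abbreviations
as they appear in the report.

```python
from datetime import datetime

# Timestamp layout used in the job report, e.g. "17-Jul-2011 00:52:16".
LOG_FORMAT = "%d-%b-%Y %H:%M:%S"


def elapsed_seconds(start, end):
    """Return the whole number of seconds between two report timestamps."""
    started = datetime.strptime(start, LOG_FORMAT)
    ended = datetime.strptime(end, LOG_FORMAT)
    return int((ended - started).total_seconds())


# Start time and End time of JobId 94, copied from the report.
job94 = elapsed_seconds("17-Jul-2011 00:52:16", "17-Jul-2011 02:52:30")

# How far past the two-hour mark the job got before the reset.
over_two_hours = job94 - 2 * 3600

# The Max Run Time value from the message, converted from seconds to hours.
max_run_time_hours = 144000 / 3600

if __name__ == "__main__":
    print(job94, over_two_hours, max_run_time_hours)
```

The job ran for 7214 seconds, 14 seconds past the two-hour mark, and
Max Run Time = 144000 corresponds to 40 hours. For what it's worth, 7200
seconds is also the Linux default for net.ipv4.tcp_keepalive_time; the
thread does not establish whether that is related.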
I also changed the keep-alive time on the machine running bacula-dir:

    sysctl -w net.ipv4.tcp_keepalive_time=60

When I did so, it failed completely:

  Elapsed time:           15 hours 22 mins 58 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):
  Volume Session Id:      33
  Volume Session Time:    1310599400

That's really bad, my router did not detect any traffic at all except for some