Re: [Bacula-users] Restore Errors
Hello Sergio,

Yes, it has all the signs of being a hardware error. A critical target error with a sector address that is reasonable for Bacula's byte address sounds to me like the OS had a write error, so the disk could not be read correctly by Bacula. The time of the kernel error is after the Bacula read, so perhaps the disk started going bad sometime after it was written by Bacula.

About the only thing you can do now is run the restore (either with bextract or on the bacula-sd execution line) with -p, so that Bacula will try to ignore errors. You might be able to restore something.

Best regards,
Kern

On 07/06/2017 04:57 PM, Sergio Belkin wrote:
> Thanks Kern and Wanderlei,
>
> I've tried bls and I get:
>
> 06-Jul 11:48 bls JobId 0: Error: block.c:429 Read error on fd=3 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
> 06-Jul 11:48 bls JobId 0: Error: read_records.c:124 block.c:429 Read error on fd=3 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
>
> Also, in the kernel logs I've found:
>
> Jul 06 11:55:22 helsinki.infoestructura.local kernel: blk_update_request: critical target error, dev sda, sector 495437784
>
> So I guess it is a hardware error, isn't it?
>
> Greetings
>
> 2017-07-05 4:32 GMT-03:00 Kern Sibbald:
>> Yes, Bacula is telling you that it is a disk I/O error.
>>
>> I suggest you check your kernel log. This looks like a hardware I/O error. If you find something in your kernel log, you should check your disk drive very carefully; it may be going bad. If there are no problems noted in the kernel log, then there is some other problem -- such as an interface (or network) error with the external disk.
>>
>> Best regards,
>> Kern
>>
>> On 07/04/2017 06:50 PM, Sergio Belkin wrote:
>>> Hi,
>>>
>>> When I run restore, I have the following errors:
>>>
>>> 04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
>>> 04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
>>>
>>> I've found this documentation: http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111
>>> but I use a File Device (on an external disk), not a tape.
>>>
>>> Bacula version of director is 7.0
>>>
>>> Is it a disk error? Could it be a misconfiguration?
>>>
>>> Thanks in advance
>>> --
>>> Sergio Belkin
>>> LPIC-2 Certified - http://www.lpi.org
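To make the two addresses in the logs easier to compare, here is a rough sketch that converts each of them to a byte offset. It rests on two assumptions that are not stated in the thread: that the kernel reports sectors in 512-byte units, and that for a file volume Bacula's file:blk pair is the high and low 32 bits of a byte address. The function names are made up for illustration. Note that one offset is relative to the disk (sda) and the other to the volume file, so they are not expected to be equal.

```shell
#!/bin/sh
# Sketch only: turn the two addresses from the logs into byte offsets.

# Kernel sector number -> byte offset on the block device.
# Assumption: the kernel counts sectors in 512-byte units.
sector_to_bytes() {
    echo $(( $1 * 512 ))
}

# Bacula "file:blk" pair -> byte offset inside the volume file.
# Assumption: file is the high 32 bits, blk the low 32 bits.
fileblk_to_bytes() {
    echo $(( ($1 << 32) + $2 ))
}

sector_to_bytes 495437784      # failing sector reported for dev sda
fileblk_to_bytes 0 397845547   # failing read position in the volume
```

Under those assumptions the failing sector sits roughly 253 GB into the disk, and the failing read is roughly 398 MB into the volume file.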
Re: [Bacula-users] Restore Errors
> Thanks Kern and Wanderlei,

Hello, Sergio,

> I've tried bls and I get:
> 06-Jul 11:48 bls JobId 0: Error: block.c:429 Read error on fd=3 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
> 06-Jul 11:48 bls JobId 0: Error: read_records.c:124 block.c:429 Read error on fd=3 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
> Also, in the kernel logs I've found:
> Jul 06 11:55:22 helsinki.infoestructura.local kernel: blk_update_request: critical target error, dev sda, sector 495437784
> So I guess it is a hardware error, isn't it?

Sorry for the para jump. Did you try to fsck?

Regards,
--
Heitor Medrado de Faria | EB-1 Visa | LPIC-III | ITIL-F | EMC 05-001 | Bacula Systems Certified Administrator II
• Do you need Bacula training? http://bacula.us/video-classes/
+55 61 98268-4220 | http://bacula.us

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Restore Errors
Thanks Kern and Wanderlei,

I've tried bls and I get:

06-Jul 11:48 bls JobId 0: Error: block.c:429 Read error on fd=3 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
06-Jul 11:48 bls JobId 0: Error: read_records.c:124 block.c:429 Read error on fd=3 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.

Also, in the kernel logs I've found:

Jul 06 11:55:22 helsinki.infoestructura.local kernel: blk_update_request: critical target error, dev sda, sector 495437784

So I guess it is a hardware error, isn't it?

Greetings

2017-07-05 4:32 GMT-03:00 Kern Sibbald:
> Yes, Bacula is telling you that it is a disk I/O error.
>
> I suggest you check your kernel log. This looks like a hardware I/O error. If you find something in your kernel log, you should check your disk drive very carefully; it may be going bad. If there are no problems noted in the kernel log, then there is some other problem -- such as an interface (or network) error with the external disk.
>
> Best regards,
> Kern
>
> On 07/04/2017 06:50 PM, Sergio Belkin wrote:
>> Hi,
>>
>> When I run restore, I have the following errors:
>>
>> 04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
>> 04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
>>
>> I've found this documentation: http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111
>> but I use a File Device (on an external disk), not a tape.
>>
>> Bacula version of director is 7.0
>>
>> Is it a disk error? Could it be a misconfiguration?
>>
>> Thanks in advance
>> --
>> Sergio Belkin
>> LPIC-2 Certified - http://www.lpi.org

--
Sergio Belkin
LPIC-2 Certified - http://www.lpi.org
Re: [Bacula-users] Restore Errors
Yes, Bacula is telling you that it is a disk I/O error.

I suggest you check your kernel log. This looks like a hardware I/O error. If you find something in your kernel log, you should check your disk drive very carefully; it may be going bad. If there are no problems noted in the kernel log, then there is some other problem -- such as an interface (or network) error with the external disk.

Best regards,
Kern

On 07/04/2017 06:50 PM, Sergio Belkin wrote:
> Hi,
>
> When I run restore, I have the following errors:
>
> 04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
> 04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
>
> I've found this documentation: http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111
> but I use a File Device (on an external disk), not a tape.
>
> Bacula version of director is 7.0
>
> Is it a disk error? Could it be a misconfiguration?
>
> Thanks in advance
> --
> Sergio Belkin
> LPIC-2 Certified - http://www.lpi.org
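As a possible way to follow the "check your kernel log" advice, the sketch below filters log text for lines that look like block-layer I/O errors. The pattern list is an assumption based on the one kernel message quoted in this thread plus a couple of common wordings; it is not an exhaustive catalogue, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: keep kernel log lines that look like disk I/O errors.
# Reads log text on stdin, writes matching lines to stdout.
# The patterns are an assumption, not a complete list.
filter_io_errors() {
    grep -E 'blk_update_request|I/O error|critical (target|medium) error'
}

# Typical use on a live system (may need root to read the kernel log):
#   dmesg | filter_io_errors
#   journalctl -k | filter_io_errors
```

If this prints anything for the disk that holds the volume, the drive, cable or enclosure deserves a closer look before another restore attempt.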
Re: [Bacula-users] Restore Errors
Hello Sergio,

You probably have some error in the volume. You can use bls and bextract to try to restore some data from the volume.

Syntax, on the shell command line:

bls -c /etc/bacula/bacula-sd.conf -V Volume-0001 /path/for/storage
bextract -c /etc/bacula/bacula-sd.conf -V Volume-0001 /path/for/storage /path/for/restore

Best regards
Wanderlei Hüttel
http://www.huttel.com.br

2017-07-04 13:50 GMT-03:00 Sergio Belkin:
> Hi,
>
> When I run restore, I have the following errors:
>
> 04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
> 04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
>
> I've found this documentation: http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111
> but I use a File Device (on an external disk), not a tape.
>
> Bacula version of director is 7.0
>
> Is it a disk error? Could it be a misconfiguration?
>
> Thanks in advance
> --
> Sergio Belkin
> LPIC-2 Certified - http://www.lpi.org
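bextract also accepts -p, which asks it to proceed in spite of I/O errors. The sketch below only builds and prints such a command line so it can be reviewed before being run by hand. The helper name, the volume name and the restore directory are placeholders for illustration; the storage path is the one from the error messages in this thread.

```shell
#!/bin/sh
# Sketch only: print a bextract command that keeps going past read errors.
# SD_CONF and all argument values are placeholders, adjust before use.
SD_CONF=${SD_CONF:-/etc/bacula/bacula-sd.conf}

salvage_cmd() {
    # $1 = volume name, $2 = storage path, $3 = directory to restore into
    # -p tells bextract to proceed in spite of I/O errors.
    echo "bextract -p -c $SD_CONF -V $1 $2 $3"
}

# Print the command for review; nothing is executed here.
salvage_cmd Volume-0001 /backup/external/2week /tmp/bacula-restores
```

Printing rather than executing is a deliberate choice: on a disk that may be failing, each extra read pass is a risk, so it seems safer to check the arguments first.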
[Bacula-users] Restore Errors
Hi,

When I run restore, I have the following errors:

04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.
04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429 Read error on fd=7 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output error.

I've found this documentation: http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111
but I use a File Device (on an external disk), not a tape.

Bacula version of director is 7.0

Is it a disk error? Could it be a misconfiguration?

Thanks in advance
--
Sergio Belkin
LPIC-2 Certified - http://www.lpi.org
[Bacula-users] Restore Errors
Hello,

I am attempting to perform a test restore using Bacula; however, the job is failing to restore symlinks which are backed up to the volume. Here is the error from the console.

13-Feb 10:53 rt01.example.com JobId 8339: Error: create_file.c:305 Could not symlink /storage/home/55/etc/rc.d/rc4.d/S26apmd -> ../init.d/apmd: ERR=No such file or directory

Is there a way to make Bacula restore symlinks properly? I do not care if it is restored as a broken link.
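One possible reading of that error, offered as an assumption rather than a diagnosis: creating a symlink does not require the target to exist, but it does require the directory that will hold the link to exist, and a missing parent directory produces exactly "No such file or directory". The sketch below demonstrates this outside Bacula with a throwaway directory; the helper name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: a dangling symlink is fine, a missing parent directory is not.
make_link() {
    # $1 = link target (may not exist), $2 = path of the link to create
    mkdir -p "$(dirname "$2")" && ln -s "$1" "$2"
}

demo=$(mktemp -d)
# The target ../init.d/apmd does not exist here, yet the link is created
# because its parent directory is made first.
make_link ../init.d/apmd "$demo/etc/rc.d/rc4.d/S26apmd"
ls -l "$demo/etc/rc.d/rc4.d/S26apmd"
```

If the same holds for the restore, checking whether /storage/home/55/etc/rc.d/rc4.d exists after the job (and whether it was part of the restore selection) would be a reasonable first step.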
[Bacula-users] Restore Errors, terminates early
Below is the console log from my failing restore job. As you can see, the number of files restored is WAY low. I'm trying to figure out how to get what I can out of this restore. I've done small restores before: a file or two, even a small directory. This is the first time I've had to restore a lot of stuff. Unfortunately, this is not a test :-(

Any ideas?

TIA,
roland

Run Restore job
JobName:    RestoreFiles
Bootstrap:  /var/spool/bacula/archos-dir.restore.3.bsr
Where:      /tmp/bacula-restores
Replace:    always
FileSet:    System Set
Client:     aristarchus-fd
Storage:    File
When:       2008-11-06 14:06:05
Catalog:    MyCatalog
Priority:   10
OK to run? (yes/mod/no): yes
Job queued. JobId=289

06-Nov 14:06 archos-dir: Start Restore Job RestoreFiles.2008-11-06_14.06.08
06-Nov 14:06 archos-sd: Ready to read from volume Aristarchus-0004 on device FileStorage (/backup).
06-Nov 14:06 archos-sd: Forward spacing Volume Aristarchus-0004 to file:block 15:2571201622.
06-Nov 14:10 archos-sd: RestoreFiles.2008-11-06_14.06.08 Error: block.c:317 Volume data error at 18:63957951! Block checksum mismatch in block=1199365 len=64512: calc=90c6023e blk=c0deabb4
06-Nov 14:10 aristarchus-fd JobId 289: Error: attribs.c:421 File size of restored file /tmp/bacula-restores/home/roland/tmp/20080429-AstroTrac/img_3470.png not correct. Original 36831200, restored 26476544.
06-Nov 14:10 archos-dir: RestoreFiles.2008-11-06_14.06.08 Error: Bacula 2.0.3 (06Mar07): 06-Nov-2008 14:10:12
  JobId:                  289
  Job:                    RestoreFiles.2008-11-06_14.06.08
  Client:                 aristarchus-fd
  Start time:             06-Nov-2008 14:06:10
  End time:               06-Nov-2008 14:10:12
  Files Expected:         126,091
  Files Restored:         14,446
  Bytes Restored:         8,439,830,107
  Rate:                   34875.3 KB/s
  FD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Restore Error ***
06-Nov 14:10 archos-dir: Begin pruning Jobs.
06-Nov 14:10 archos-dir: No Jobs found to prune.
06-Nov 14:10 archos-dir: Begin pruning Files.
06-Nov 14:10 archos-dir: No Files found to prune.
06-Nov 14:10 archos-dir: End auto prune.

--
PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD    RL Enterprises
[EMAIL PROTECTED]         6818 Madeline Court
[EMAIL PROTECTED]         Brooklyn, NY 11220
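To put a number on "WAY low", the sketch below reads a job summary on stdin and prints how many files were expected but not restored. It assumes the summary has one field per line, with the "Files Expected:" and "Files Restored:" labels as they appear in the log above; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: print (Files Expected - Files Restored) from a job summary.
# Assumes one "Label: value" field per line, as in the Bacula job report.
restore_shortfall() {
    awk -F': *' '
        /Files Expected:/ { gsub(/[ ,]/, "", $2); expected = $2 }
        /Files Restored:/ { gsub(/[ ,]/, "", $2); restored = $2 }
        END { print expected - restored }
    '
}

# Values copied from the job report above.
printf '%s\n' '  Files Expected:  126,091' '  Files Restored:  14,446' | restore_shortfall
```

For this job the shortfall is 111,645 files, so the restore stopped long before the end of the selection.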
Re: [Bacula-users] Restore Errors, terminates early
Roland Roberts wrote:
> Below is the console log from my failing restore job. As you can see, the number of files restored is WAY low. I'm trying to figure out how to get what I can out of this restore. I've done small restores before: a file or two, even a small directory. This is the first time I've had to restore a lot of stuff. Unfortunately, this is not a test :-(

It would appear the problem is in the backend with quoting file names. I have some configuration files that were created via a Java webstart task. Who cares? Well, they are arguably misconfigured 'cause they create their config files as c:\jobwatch.properties which ends up in my home directory as /home/roland/c\:\\jobwatch.properties. That name doesn't get quoted correctly in the SQL query that goes to PostgreSQL, so the query fails (and I get an error in syslog from the postmaster).

I've just unmarked those files and will see how far I can get now. It is looking better (since it is still running).

I assume I should log this as a bug.

roland
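As an aid for finding such files before marking a restore, the sketch below lists names that contain a backslash. It only locates candidates on a live filesystem; it does not touch the catalog or address the quoting problem itself. The function name and the demo directory are made up for illustration.

```shell
#!/bin/sh
# Sketch only: list files whose names contain a literal backslash.
find_backslash_names() {
    # $1 = directory to search.
    # In a find -name pattern, "\\" matches one literal backslash.
    find "$1" -name '*\\*' -print
}

# Demo in a throwaway directory with one odd name and one normal name.
demo=$(mktemp -d)
touch "$demo"/'c:\jobwatch.properties' "$demo/normal.properties"
find_backslash_names "$demo"
```

Only the file with the backslash in its name is printed, which gives a short list of entries to unmark in the restore tree.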
Re: [Bacula-users] Restore Errors, terminates early
Roland Roberts wrote:
> It would appear the problem is in the backend with quoting file names. I have some configuration files that were created via a Java webstart task. Who cares? Well, they are arguably misconfigured 'cause they create their config files as c:\jobwatch.properties which ends up in my home directory as /home/roland/c\:\\jobwatch.properties. That name doesn't get quoted correctly in the SQL query that goes to PostgreSQL, so the query fails (and I get an error in syslog from the postmaster).
>
> I've just unmarked those files and will see how far I can get now. It is looking better (since it is still running).

Well, I spoke too soon. It's clear that this is not the whole story. I'm not getting any logs on the server side to help me with this. It's still quitting early, and syslog does show postgresql errors coincident with the job termination. They look like this:

Nov 6 17:14:06 archos postgres[31135]: [30-1] ERROR: table delcandidates does not exist
Nov 6 17:14:06 archos postgres[31135]: [30-2] STATEMENT: DROP TABLE DelCandidates
Nov 6 17:14:06 archos postgres[31135]: [31-1] ERROR: index delinx1 does not exist
Nov 6 17:14:06 archos postgres[31135]: [31-2] STATEMENT: DROP INDEX DelInx1
Nov 6 17:14:06 archos postgres[31135]: [32-1] ERROR: index delinx1 does not exist
Nov 6 17:14:06 archos postgres[31135]: [32-2] STATEMENT: DROP INDEX DelInx1

But that may be innocuous as I also seem to get this message when I *issue* the restore command:

Nov 6 17:22:09 archos postgres[31135]: [33-1] ERROR: table temp does not exist
Nov 6 17:22:09 archos postgres[31135]: [33-2] STATEMENT: DROP TABLE temp
Nov 6 17:22:09 archos postgres[31135]: [34-1] ERROR: table temp1 does not exist
Nov 6 17:22:09 archos postgres[31135]: [34-2] STATEMENT: DROP TABLE temp1

The restore seems to terminate when it gets any error, like a file size not matching. This isn't what I expected from the manual, where I expected it to continue on until all files were restored as best as possible.

I'm now picking directories, one at a time, and restoring them. Any better ideas on tracking this down?

roland
Re: [Bacula-users] Restore errors
I started the DB check yesterday morning. It is still running. How long does this usually take? Or am I doing something wrong?

Wolfgang

-----Original Message-----
From: Julien [mailto:[EMAIL PROTECTED]
Sent: Monday, July 30, 2007 10:22
To: Mair Wolfgang-awm013
Cc: Doytchin Spiridonov; bacula-users
Subject: Re: [Bacula-users] Restore errors

Hi Mair,

did you try with a dbcheck?
(dbcheck -c /path/to/bacula-dir.conf ; Toggle modify database flag ; 16) All (3-15))

I also had a huge difference between files expected and files restored, but the operation above fixed that ...

Regards,
Julien

On Mon, 2007-07-30 at 09:03 +0100, Mair Wolfgang-awm013 wrote:
> Hello,
>
> In my case spooling brought a remarkable improvement. Whereas I had hundreds of errors on one restore, I hardly see them now with spooling in place.
>
> Doytchin, I also saw the same behavior as you did. With concurrent jobs = 1, there were no errors, no matter whether spooling was on or off. Unfortunately, setting concurrent jobs = 1 is not an option in our environment.
>
> So my current setting is spooling = on and concurrent jobs = 5. With these settings, it looks like the Linux systems (OpenSuse and Red Hat) are OK but Solaris still has problems. For example, below is a restore I did on Friday. Of course I don't mind about the door files. But the difference between expected and restored files is way too much. And even worse, I have no idea what happened to the missing files. I don't know if this has something to do with the restore errors we saw. This could also be something different.
>
> During the weekend I moved bacula to a new and separate server. It now runs on OpenSuse 10.2 with the latest patches in place. The few restore jobs I've done so far with this went OK. All in all I still don't feel very comfortable with this; more tests need to be done. I will continue with testing and keep you updated.
>
> Wolfgang
>
> 27-Jul 11:07 porsche-dir: Start Restore Job RestoreFiles.2007-07-27_11.07.03
> 27-Jul 11:07 porsche-sd: Ready to read from volume full-27-7-2007.20 on device FileStorageFull (/export/bacula-dump).
> 27-Jul 11:07 porsche-sd: Forward spacing Volume full-27-7-2007.20 to file:block 0:3999802558.
> 27-Jul 11:08 porsche-sd: End of file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.20
> 27-Jul 11:08 porsche-sd: End of Volume at file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.20
> 27-Jul 11:08 porsche-sd: Ready to read from volume full-27-7-2007.21 on device FileStorageFull (/export/bacula-dump).
> 27-Jul 11:08 porsche-sd: Forward spacing Volume full-27-7-2007.21 to file:block 0:200.
> 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/dev/.zone_reg_door: ERR=Invalid argument
> 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/dev/.devfsadm_synch_door: ERR=Invalid argument
> 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/devfsadm_event_channel/reg_door: ERR=Invalid argument
> 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/devfsadm_event_channel/1: ERR=Invalid argument
> 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/syseventconfd_event_channel/reg_door: ERR=Invalid argument
> 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/sysevent_door: ERR=Invalid argument
> 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/piclevent_door: ERR=Invalid argument
> 27-Jul 11:15 porsche-sd: End of file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.21
> 27-Jul 11:15 porsche-sd: End of Volume at file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.21
> 27-Jul 11:15 porsche-sd: Ready to read from volume full-27-7-2007.22 on device FileStorageFull (/export/bacula-dump).
> 27-Jul 11:15 porsche-sd: Forward spacing Volume full-27-7-2007.22 to file:block 0:200.
> 27-Jul 11:21 porsche-sd: End of file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.22
> 27-Jul 11:21 porsche-sd: End of Volume at file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.22
> 27-Jul 11:21 porsche-sd: Ready to read from volume full-27-7-2007.23 on device FileStorageFull (/export/bacula-dump).
> 27-Jul 11:21 porsche-sd: Forward spacing Volume full-27-7-2007.23 to file:block 0:200.
> 27-Jul 11:28 porsche-sd: End of file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.23
> 27-Jul 11:28 porsche-sd: End of Volume at file 1 on device
Re: [Bacula-users] Restore errors
On 7/31/07, Mair Wolfgang-awm013 [EMAIL PROTECTED] wrote:
> I started the DB check yesterday morning. It is still running. How long does this usually take? Or am I doing something wrong?

This depends on how big your database is, what type of database it is, and whether it is properly indexed.
Re: [Bacula-users] Restore errors
Hello,

it will run forever if you don't create indexes. You had better stop it (also make sure the SQL query is stopped, as that was not the case here) and then do something that Frank Alpeter once posted here. I'll quote him:

---
I'm running dbcheck periodically every Sunday with the following script:

#!/bin/sh
PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin

# Temporary indexes so dbcheck's lookups on File are not full table scans.
echo "$(date) Creating temp indices for bacula database..."
mysql -ubacula <<EOFA
use bacula;
CREATE INDEX file_tmp_filenameid_idx ON File (FilenameId);
CREATE INDEX file_tmp_pathid_idx ON File (PathId);
EOFA

# -f fix inconsistencies, -b batch mode, -v verbose.
echo "$(date) Running dbcheck..."
dbcheck -c /usr/local/etc/bacula-dir.conf -f -b -v

# Drop the temporary indexes again and optimize the tables.
echo "$(date) Removing indices and optimizing bacula database..."
mysql -ubacula <<EOFB
use bacula;
DROP INDEX file_tmp_filenameid_idx ON File;
DROP INDEX file_tmp_pathid_idx ON File;
OPTIMIZE TABLE UnsavedFiles, Counters, CDImages, BaseFiles, Device, Version, Status, MediaType, Storage, FileSet, Client, Pool, Media, Job, JobMedia, File, Path, Filename;
EOFB

echo "$(date) Done..."

Due to the additional index entries, the script runs for about 90 minutes (eternally without them) with a database of 24 million file entries from 88 clients, mostly application servers. It helps a lot, but still it doesn't help against orphaned entries from previously removed clients.
---

Regards.

Tuesday, July 31, 2007, 3:08:22 PM:

MWa> I started the DB check yesterday morning. It is still running. How long
MWa> does this usually take? Or am I doing something wrong?

MWa> Wolfgang

MWa> -----Original Message-----
MWa> From: Julien [mailto:[EMAIL PROTECTED]
MWa> Sent: Monday, July 30, 2007 10:22
MWa> To: Mair Wolfgang-awm013
MWa> Cc: Doytchin Spiridonov; bacula-users
MWa> Subject: Re: [Bacula-users] Restore errors

MWa> Hi Mair,

MWa> did you try with a dbcheck?
MWa> (dbcheck -c /path/to/bacula-dir.conf ; Toggle modify database flag ; 16) All (3-15))

MWa> I also had a huge difference between files expected and files restored,
MWa> but the operation above fixed that ...

MWa> Regards,
MWa> Julien

MWa> On Mon, 2007-07-30 at 09:03 +0100, Mair Wolfgang-awm013 wrote:
MWa> > Hello,
MWa> >
MWa> > In my case spooling brought a remarkable improvement. Whereas I had hundreds of errors on one restore, I hardly see them now with spooling in place.
MWa> >
MWa> > Doytchin, I also saw the same behavior as you did. With concurrent jobs = 1, there were no errors, no matter whether spooling was on or off. Unfortunately, setting concurrent jobs = 1 is not an option in our environment.
MWa> >
MWa> > So my current setting is spooling = on and concurrent jobs = 5. With these settings, it looks like the Linux systems (OpenSuse and Red Hat) are OK but Solaris still has problems. For example, below is a restore I did on Friday. Of course I don't mind about the door files. But the difference between expected and restored files is way too much. And even worse, I have no idea what happened to the missing files. I don't know if this has something to do with the restore errors we saw. This could also be something different.
MWa> >
MWa> > During the weekend I moved bacula to a new and separate server. It now runs on OpenSuse 10.2 with the latest patches in place. The few restore jobs I've done so far with this went OK. All in all I still don't feel very comfortable with this; more tests need to be done. I will continue with testing and keep you updated.
MWa> >
MWa> > Wolfgang
MWa> >
MWa> > 27-Jul 11:07 porsche-dir: Start Restore Job RestoreFiles.2007-07-27_11.07.03
MWa> > 27-Jul 11:07 porsche-sd: Ready to read from volume full-27-7-2007.20 on device FileStorageFull (/export/bacula-dump).
MWa> > 27-Jul 11:07 porsche-sd: Forward spacing Volume full-27-7-2007.20 to file:block 0:3999802558.
MWa> > 27-Jul 11:08 porsche-sd: End of file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.20
MWa> > 27-Jul 11:08 porsche-sd: End of Volume at file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.20
MWa> > 27-Jul 11:08 porsche-sd: Ready to read from volume full-27-7-2007.21 on device FileStorageFull (/export/bacula-dump).
MWa> > 27-Jul 11:08 porsche-sd: Forward spacing Volume full-27-7-2007.21 to file:block 0:200.
MWa> > 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/dev/.zone_reg_door: ERR=Invalid argument
MWa> > 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/dev/.devfsadm_synch_door: ERR=Invalid argument
MWa> > 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/devfsadm_event_channel/reg_door: ERR=Invalid argument
MWa> > 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node
Re: [Bacula-users] Restore errors
porsche-dir: No Jobs found to prune.
27-Jul 11:28 porsche-dir: Begin pruning Files.
27-Jul 11:28 porsche-dir: No Files found to prune.
27-Jul 11:28 porsche-dir: End auto prune.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doytchin Spiridonov
Sent: Saturday, July 28, 2007 02:02
To: bacula-users
Subject: Re: [Bacula-users] Restore errors

Hello,

just to note that several days after a full backup and incremental backups, restores are OK, which again proves that the problem was caused by running concurrent jobs.

Wolfgang, do you have the same results?

Regards

Wednesday, July 25, 2007, 8:12:25 PM:

DS> Hello,

DS> 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for
DS> all clients.
DS> Restore OK of all jobs.

DS> Seems this (concurrent jobs) is the problem.

DS> Regards.

DS> Tuesday, July 24, 2007, 9:57:35 PM:

DS> I don't have any other ideas to check with to provide more cases.
DS> It's developers turn now...
Re: [Bacula-users] Restore errors
: 27-Jul-2007 11:07:05 End time: 27-Jul-2007 11:28:34 Files Expected: 303,761 Files Restored: 301,923 Bytes Restored: 27,412,500,483 Rate: 21266.5 KB/s FD Errors: 7 FD termination status: Error SD termination status: OK Termination:*** Restore Error *** 27-Jul 11:28 porsche-dir: Begin pruning Jobs. 27-Jul 11:28 porsche-dir: No Jobs found to prune. 27-Jul 11:28 porsche-dir: Begin pruning Files. 27-Jul 11:28 porsche-dir: No Files found to prune. 27-Jul 11:28 porsche-dir: End auto prune. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doytchin Spiridonov Sent: Saturday, July 28, 2007 02:02 To: bacula-users Subject: Re: [Bacula-users] Restore errors Hello, just to note that several days after a full backup and incremental bacpus, restores are OK, which again proves that the problem was caused by running concurrent jobs. Wolfgang do you have the same results? Regards Wednesday, July 25, 2007, 8:12:25 PM: DS Hello, DS 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for DS all clients. DS Restore OK of all jobs. DS Seems this (concurrent jobs) is the problem. DS Regards. DS Tuesday, July 24, 2007, 9:57:35 PM: DS I don't have any other ideas to check with to provide more cases. DS It's developers turn now... - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. 
Re: [Bacula-users] Restore errors
MWa on device FileStorageFull (/export/bacula-dump).
MWa 27-Jul 11:21 porsche-sd: Forward spacing Volume full-27-7-2007.23 to
MWa file:block 0:200.
MWa 27-Jul 11:28 porsche-sd: End of file 1 on device FileStorageFull
MWa (/export/bacula-dump), Volume full-27-7-2007.23
MWa 27-Jul 11:28 porsche-sd: End of Volume at file 1 on device
MWa FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.23
MWa 27-Jul 11:28 porsche-sd: End of all volumes.
MWa 27-Jul 11:28 porsche-dir: RestoreFiles.2007-07-27_11.07.03 Error: Bacula
MWa 2.0.3 (06Mar07): 27-Jul-2007 11:28:34
MWa JobId: 40
MWa Job: RestoreFiles.2007-07-27_11.07.03
MWa Client: prinz-fd
MWa Start time: 27-Jul-2007 11:07:05
MWa End time: 27-Jul-2007 11:28:34
MWa Files Expected: 303,761
MWa Files Restored: 301,923
MWa Bytes Restored: 27,412,500,483
MWa Rate: 21266.5 KB/s
MWa FD Errors: 7
MWa FD termination status: Error
MWa SD termination status: OK
MWa Termination: *** Restore Error ***
MWa 27-Jul 11:28 porsche-dir: Begin pruning Jobs.
MWa 27-Jul 11:28 porsche-dir: No Jobs found to prune.
MWa 27-Jul 11:28 porsche-dir: Begin pruning Files.
MWa 27-Jul 11:28 porsche-dir: No Files found to prune.
MWa 27-Jul 11:28 porsche-dir: End auto prune.
MWa
MWa -Original Message-
MWa From: [EMAIL PROTECTED]
MWa [mailto:[EMAIL PROTECTED] On Behalf Of
MWa Doytchin Spiridonov
MWa Sent: Saturday, July 28, 2007 02:02
MWa To: bacula-users
MWa Subject: Re: [Bacula-users] Restore errors
MWa Hello,
MWa just to note that several days after a full backup and incremental
MWa backups, restores are OK, which again proves that the problem was caused
MWa by running concurrent jobs.
MWa Wolfgang, do you have the same results?
MWa Regards
MWa Wednesday, July 25, 2007, 8:12:25 PM:
DS Hello,
DS 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for
DS all clients.
DS Restore OK of all jobs.
DS Seems this (concurrent jobs) is the problem.
DS Regards.
DS Tuesday, July 24, 2007, 9:57:35 PM:
DS I don't have any other ideas to check with to provide more cases.
DS It's the developers' turn now...
Re: [Bacula-users] Restore errors
of file 1 on device FileStorageFull
MWa (/export/bacula-dump), Volume full-27-7-2007.22
MWa 27-Jul 11:21 porsche-sd: End of Volume at file 1 on device
MWa FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.22
MWa 27-Jul 11:21 porsche-sd: Ready to read from volume full-27-7-2007.23
MWa on device FileStorageFull (/export/bacula-dump).
MWa 27-Jul 11:21 porsche-sd: Forward spacing Volume full-27-7-2007.23 to
MWa file:block 0:200.
MWa 27-Jul 11:28 porsche-sd: End of file 1 on device FileStorageFull
MWa (/export/bacula-dump), Volume full-27-7-2007.23
MWa 27-Jul 11:28 porsche-sd: End of Volume at file 1 on device
MWa FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.23
MWa 27-Jul 11:28 porsche-sd: End of all volumes.
MWa 27-Jul 11:28 porsche-dir: RestoreFiles.2007-07-27_11.07.03 Error: Bacula
MWa 2.0.3 (06Mar07): 27-Jul-2007 11:28:34
MWa JobId: 40
MWa Job: RestoreFiles.2007-07-27_11.07.03
MWa Client: prinz-fd
MWa Start time: 27-Jul-2007 11:07:05
MWa End time: 27-Jul-2007 11:28:34
MWa Files Expected: 303,761
MWa Files Restored: 301,923
MWa Bytes Restored: 27,412,500,483
MWa Rate: 21266.5 KB/s
MWa FD Errors: 7
MWa FD termination status: Error
MWa SD termination status: OK
MWa Termination: *** Restore Error ***
MWa 27-Jul 11:28 porsche-dir: Begin pruning Jobs.
MWa 27-Jul 11:28 porsche-dir: No Jobs found to prune.
MWa 27-Jul 11:28 porsche-dir: Begin pruning Files.
MWa 27-Jul 11:28 porsche-dir: No Files found to prune.
MWa 27-Jul 11:28 porsche-dir: End auto prune.
MWa
MWa -Original Message-
MWa From: [EMAIL PROTECTED]
MWa [mailto:[EMAIL PROTECTED] On Behalf Of
MWa Doytchin Spiridonov
MWa Sent: Saturday, July 28, 2007 02:02
MWa To: bacula-users
MWa Subject: Re: [Bacula-users] Restore errors
MWa Hello,
MWa just to note that several days after a full backup and incremental
MWa backups, restores are OK, which again proves that the problem was caused
MWa by running concurrent jobs.
MWa Wolfgang, do you have the same results?
MWa Regards
MWa Wednesday, July 25, 2007, 8:12:25 PM:
DS Hello,
DS 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for
DS all clients.
DS Restore OK of all jobs.
DS Seems this (concurrent jobs) is the problem.
DS Regards.
DS Tuesday, July 24, 2007, 9:57:35 PM:
DS I don't have any other ideas to check with to provide more cases.
DS It's the developers' turn now...
Re: [Bacula-users] Restore errors
Hello,

just to note that several days after a full backup and incremental backups, restores are OK, which again proves that the problem was caused by running concurrent jobs.

Wolfgang, do you have the same results?

Regards

Wednesday, July 25, 2007, 8:12:25 PM:

DS Hello,
DS 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for all
DS clients.
DS Restore OK of all jobs.
DS Seems this (concurrent jobs) is the problem.
DS Regards.

DS Tuesday, July 24, 2007, 9:57:35 PM:
DS I don't have any other ideas to check with to provide more cases. It's
DS the developers' turn now...
Re: [Bacula-users] Restore errors
Steen wrote:

Hello, just wondering from the sideline here:

On Tuesday 24 July 2007 07:28:48 Doytchin Spiridonov wrote:

1. some static files (i.e. not log files!) are restored with a wrong (always larger) size, while the first N bytes match, and the rest is filled with a part of another file (not sure if this is just a file with a wrong size and some old data on the disk appears at the end, or Bacula restores part of another file and appends it to the end). The file can be restored correctly if marked alone

doesn't this prove that the relevant catalog data for the file is OK?

Quite possibly; I suspect that one of the developers would have to look at the catalog contents to say one way or the other.

b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header file index 42452 not equal record index 0
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124 Error sending to File daemon. ERR=Connection reset by peer
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection reset by peer

This seems to point to an error on the fd side? How can that be related to the backing up part?

The error basically indicates that the FD got data from the SD it believed to be corrupt. This could theoretically be due to a problem on the FD side, but it certainly is not enough to rule out the director or SD.

--
Frank Sweetser fs at wpi.edu | For every problem, there is a solution that
WPI Senior Network Engineer | is simple, elegant, and wrong. - HL Mencken
GPG fingerprint = 6174 1257 129E 0D21 D8D4 E8A3 8E39 29E3 E2E8 8CEC
Re: [Bacula-users] Restore errors
Hello,

Thursday, July 26, 2007, 7:38:43 PM:

S Hello,
S just wondering from the sideline here:

The file can be restored correctly if marked alone

S doesn't this prove that the relevant catalog data for the file is OK?

This probably means that the problem is with positioning inside volumes?

but the error 3. below is generated (which seems to be just a bogus error).

b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header file index 42452 not equal record index 0
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124 Error sending to File daemon. ERR=Connection reset by peer
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection reset by peer

S This seems to point to an error on the fd side? How can that be related to the
S backing up part?

No, as the same error happens when restoring to several different fds. It happens at the same time and positions and with different versions (2.0.3, 2.1.26, 2.1.28). The message could come from the fd, but what if wrong data/packets are sent to the fd, or if the connection is closed by the director or sd? I think those errors could be explained only by the developer who wrote the software. However, as they couldn't reproduce the problem (it seems they don't have the right test case, as it is clear this is not related to the hardware or OS), they closed the bug report, and for us now the only solution is not to use concurrent jobs (and I guess for everyone, if you want to be sure that one day you could restore your backup if needed).

The tests so far show that when running jobs 1 by 1 (not concurrent) none of the errors happen. I'm happy that we found at least a workaround, otherwise Bacula would be useless. The bottom line is that with Bacula, if your backups are reported to be OK, this doesn't mean you could restore the files w/o a problem, and it's a good idea to periodically test full restores.

Regards.
Re: [Bacula-users] Restore errors
Hello, just wondering from the sideline here:

On Tuesday 24 July 2007 07:28:48 Doytchin Spiridonov wrote:

1. some static files (i.e. not log files!) are restored with a wrong (always larger) size, while the first N bytes match, and the rest is filled with a part of another file (not sure if this is just a file with a wrong size and some old data on the disk appears at the end, or Bacula restores part of another file and appends it to the end). The file can be restored correctly if marked alone

doesn't this prove that the relevant catalog data for the file is OK?

but the error 3. below is generated (which seems to be just a bogus error). An example error is:
---
b0: Restore_b0.d6.int.2007-07-23_22.37.34 Error: attribs.c:410 File size of restored file /home/bacula/res/b3.2/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm not correct. Original 3826291, restored 10620921.
---
When this error is present (always) the second error below (but w/o additional error messages) is present as well (missing files)

2. a large number of files are missing (while they are present in the catalog and selected) - tens of thousands (not sockets or anything else that Bacula ignores by default). When this happens, usually an error like this appears (if not the first one above):
---
b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header file index 42452 not equal record index 0
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124 Error sending to File daemon. ERR=Connection reset by peer
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection reset by peer

This seems to point to an error on the fd side? How can that be related to the backing up part?
---

3. when a file from error 1 is restored alone it is OK, but another bogus error is generated:
---
Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275 Volume data error at 0:3999743252! Wanted ID: BB02, got Иnлу. Buffer discarded.
---
Found that the above number (3999743252) is not present as a block address for any block in the volumes, but the same number appears as part of a JobMedia record in the database.

This is a summary of everything that popped up as a problem or fact in 2.1.28. (2.0.3 had another bug with bogus errors about sockets' attributes and 2.1.26 had bogus SQL error messages, but those are fixed in 2.1.28).

If anyone wants, feel free to reopen the bug in Mantis (903). I'm not going to do so as I am personally disappointed by the attitude of "this is not a bug - work it out yourself" and the suggestion to send you our servers as a gift to test with, plus support fees... nice. Now it's up to you to create better test cases to catch more bugs, if any.

We will start our backup again w/o concurrent jobs and we will continue to monitor restores on a daily basis, as the above tests are just 3 and I agree there is a possibility that it was just chance that the later two tests went OK. But it was my suggestion from the beginning that the problem is that Bacula damages either database numbers or volume records when concurrent jobs are running, and so far the facts have proved this.

(!) The workaround for the problem is to switch off concurrent jobs, as if not, the chance that you have invalid backups is high (some 90% in our own cases, at least with our servers/OS/configuration; and that is if one does not say that 100% of backups are wrong, since after diff/incremental backups Bacula restores files that are deleted, which is really bad behaviour in many cases/services).

Regards

Tuesday, July 24, 2007, 12:15:43 AM:

DL On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:

Hello, I've filed this as a bug, but as Kern couldn't reproduce it he gave up. So let us find here what could be the problem. There are actually two problems; they could be linked.

DL Please. If anyone can solve the issue given what you supplied, they
DL would. You were asked to supply a reproducible situation.
Hopefully
DL we can get to that position quickly without further unnecessary
DL distractions.

--
Regards
Steen
Re: [Bacula-users] Restore errors
Hello,

2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for all clients. Restore OK of all jobs.

Seems this (concurrent jobs) is the problem.

Regards.

Tuesday, July 24, 2007, 9:57:35 PM:

DS I don't have any other ideas to check with to provide more cases. It's
DS the developers' turn now...
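For readers who want to try the same workaround, running without concurrent jobs as reported above might look roughly like the fragment below. This is only a sketch under stated assumptions: it assumes the limit is set with the Maximum Concurrent Jobs directive in the Director resource of bacula-dir.conf and in the Storage resource of bacula-sd.conf, the resource names are hypothetical, and all other required directives are omitted.

```
# bacula-dir.conf (sketch; other required directives omitted)
Director {
  Name = example-dir                  # hypothetical name
  Maximum Concurrent Jobs = 1         # run one job at a time
}

# bacula-sd.conf (sketch; other required directives omitted)
Storage {
  Name = example-sd                   # hypothetical name
  Maximum Concurrent Jobs = 1
}
```

Both daemons would need to reload or restart before the new limit applies, and, as noted elsewhere in the thread, backups can take noticeably longer this way.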
Re: [Bacula-users] Restore errors
Hello,

This is exactly what I experienced last week. I submitted this under the subject: ' Restore Error of linux-install-fdFul'. However, I haven't yet had the time to track this down as far as Doytchin did. Great work!

This morning (before reading through all this) I also found that if I do a single backup the restore runs fine. My setting has concurrent jobs = 3. If I back up with this I get the same errors as already described.

In order to contribute something here, this is my setup and what I did so far:

Opensuse 10.2
Bacula 2.0.3

The first setup was in a vmware machine on an opensuse 10.2. Since I could not find anything wrong with the OS or file system, I suspected the virtual machine. I moved to a different (non-virtual) system, installed the OS from new and compiled bacula 2.0.3 from scratch (./configure --enable-smartalloc --with-mysql), copied the config files and created the needed mysql tables with the supplied scripts.

The first manual backup and restore I did went without problems. I tried two big machines. Then I let it run the usual backup with 3 concurrent jobs. Tried a restore and it failed with the known problems.

As Doytchin already tracked down, it is a matter of the concurrently running backup jobs. I fully agree with this.

Currently I've also set concurrent jobs = 1 and the backup is still running. I don't know if this would be usable in our environment, since it now takes quite a long time to complete. Hopefully the restore will work out fine now with this setting. I'll keep you updated.

Regards
Wolfgang

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doytchin Spiridonov
Sent: Tuesday, July 24, 2007 07:29
To: bacula-users
Subject: Re: [Bacula-users] Restore errors

Hello,

done. Found where the problem is after some more tests (and once again it is not in our hardware or OS or broken things). It is where I initially suggested - the concurrent jobs.
After the first (and native configuration) we used (concurrent jobs, with gzip) we tested the following:

1. concurrent jobs, w/o gzip - we got similar errors (1 wrong filesize from 4 jobs, but 3 of 4 jobs with fewer files than expected, the 4th usually is very small - 100 files - and never had errors, so I would say 100% of jobs were invalid)
2. no concurrent jobs (Maximum Concurrent Jobs = 1 at dir and sd), w/o gzip - good news, all restores are OK, no errors, Files Expected and Files Restored match!
3. no concurrent jobs WITH gzip - again OK, all restores are OK, no errors, Files Expected and Files Restored match!

So until now we have:
- the problem is not caused by a corrupted file system
- volumes are consistent and bls doesn't show errors
- MySQL is OK (initially 4.1.x now 5.0.37)
- when running concurrent jobs both 2.0.3 and 2.1.28 say backups are OK but restores fail with one of the 3 kinds of errors listed below
- when concurrent jobs are turned off everything is OK
- gzip on/off doesn't affect the errors

Once again the 3 types of errors are:

1. some static files (i.e. not log files!) are restored with a wrong (always larger) size, while the first N bytes match, and the rest is filled with a part of another file (not sure if this is just a file with a wrong size and some old data on the disk appears at the end, or Bacula restores part of another file and appends it to the end). The file can be restored correctly if marked alone but the error 3. below is generated (which seems to be just a bogus error). An example error is:
---
b0: Restore_b0.d6.int.2007-07-23_22.37.34 Error: attribs.c:410 File size of restored file /home/bacula/res/b3.2/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm not correct. Original 3826291, restored 10620921.
---
When this error is present (always) the second error below (but w/o additional error messages) is present as well (missing files)

2. a large number of files are missing (while they are present in the catalog and selected) - tens of thousands (not sockets or anything else that Bacula ignores by default). When this happens, usually an error like this appears (if not the first one above):
---
b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header file index 42452 not equal record index 0
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124 Error sending to File daemon. ERR=Connection reset by peer
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection reset by peer
---

3. when a file from error 1 is restored alone it is OK, but another bogus error is generated:
---
Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275 Volume data error at 0:3999743252! Wanted ID: BB02, got Иnлу. Buffer discarded.
---
Found that the above number (3999743252) is not present as a block address for any block in the volumes, but the same number appears as part of a JobMedia record in the database
Re: [Bacula-users] Restore errors
On 7/24/07, Mair Wolfgang-awm013 [EMAIL PROTECTED] wrote:

Hello, This is exactly what I experienced last week. I submitted this under the subject: ' Restore Error of linux-install-fdFul'. However, I haven't yet had the time to track this down as far as Doytchin did. Great work! This morning (before reading through all this) I also found that if I do a single backup the restore runs fine. My setting has concurrent jobs = 3. If I back up with this I get the same errors as already described.

Has anyone who has this problem tried turning spooling on?

John
Re: [Bacula-users] Restore errors
Doytchin Spiridonov wrote:

Hello, done. Found where the problem is after some more tests (and once again it is not in our hardware or OS or broken things). It is where I initially suggested - the concurrent jobs.

So you can reliably reproduce the problem now? Excellent!

After the first (and native configuration) we used (concurrent jobs, with gzip) we tested the following:

1. concurrent jobs, w/o gzip - we got similar errors (1 wrong filesize from 4 jobs, but 3 of 4 jobs with fewer files than expected, the 4th usually is very small - 100 files - and never had errors, so I would say 100% of jobs were invalid)
2. no concurrent jobs (Maximum Concurrent Jobs = 1 at dir and sd), w/o gzip - good news, all restores are OK, no errors, Files Expected and Files Restored match!
3. no concurrent jobs WITH gzip - again OK, all restores are OK, no errors, Files Expected and Files Restored match!

Okay, so it looks like you can reproduce the symptoms just with multiple concurrent jobs, regardless of the gzip settings.

So until now we have:
- the problem is not caused by a corrupted file system
- volumes are consistent and bls doesn't show errors
- MySQL is OK (initially 4.1.x now 5.0.37)
- when running concurrent jobs both 2.0.3 and 2.1.28 say backups are OK but restores fail with one of the 3 kinds of errors listed below
- when concurrent jobs are turned off everything is OK
- gzip on/off doesn't affect the errors

I realize that you mentioned in another email you're dumping the mysql tables nightly, but I would still strongly recommend that you run a repair tables on your catalog to be absolutely sure there isn't any subtle corruption that's snuck in. It pays to be painfully methodical when troubleshooting this kind of scenario, especially since you seem to be the first to knowingly run into this problem.

Another good thing to try would be to double check and make sure that your catalog schema exactly matches what bacula is expecting.
If, for example, the column type holding volume offsets somehow became a 16 bit int where bacula was expecting a 32 bit, the inserted values could become truncated or wrap around, causing the kind of corruption you're seeing.

Actually, that gives me another idea. While I've never used it myself, you may be able to get more details by running some jobs with strict mode turned on on your mysql catalog.

http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html

If your bacula installation is doing something that would cause the data stored to be wrong, such as storing a value that doesn't fit in the column type, I believe this should turn it from a silent warning into a fatal error, making it easier to track down.

Also, it's been suggested that you try turning on spooling. Have you done so?

Once again the 3 types of errors are:

1. some static files (i.e. not log files!) are restored with a wrong (always larger) size, while the first N bytes match, and the rest is filled with a part of another file (not sure if this is just a file with a wrong size and some old data on the disk appears at the end, or Bacula restores part of another file and appends it to the end). The file can be restored correctly if marked alone but the error 3. below is generated (which seems to be just a bogus error). An example error is:
---
b0: Restore_b0.d6.int.2007-07-23_22.37.34 Error: attribs.c:410 File size of restored file /home/bacula/res/b3.2/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm not correct. Original 3826291, restored 10620921.
---
When this error is present (always) the second error below (but w/o additional error messages) is present as well (missing files)

2. a large number of files are missing (while they are present in the catalog and selected) - tens of thousands (not sockets or anything else that Bacula ignores by default).
When this happens, usually an error like this appears (if not the first one above):
---
b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header file index 42452 not equal record index 0
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124 Error sending to File daemon. ERR=Connection reset by peer
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection reset by peer
---

3. when a file from error 1 is restored alone it is OK, but another bogus error is generated:
---
Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275 Volume data error at 0:3999743252! Wanted ID: BB02, got Иnлу. Buffer discarded.
---
Found that the above number (3999743252) is not present as a block address for any block in the volumes, but the same number appears as part of a JobMedia record in the database.

This is a summary of everything that popped up as a problem or fact in 2.1.28. (2.0.3 had another bug with bogus errors about sockets' attributes and 2.1.26 had bogus SQL error messages, but those are
Re: [Bacula-users] Restore errors
Spooling? Does this also apply if my backup goes directly to files?

Here is my setting:

sd:
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /export/bacula-dump
  LabelMedia = yes;                   # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}

dir:
#
Storage {
  Name = File
  Address = porsche                   # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = YnwWs7iqCDg1mMq1LG3rB4LX1mpC7PNgCn68Y52Iu
  Device = FileStorage
  Media Type = File
  Maximum Concurrent Jobs = 1
}

-Original Message-
From: John Drescher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 13:00
To: Mair Wolfgang-awm013
Cc: Doytchin Spiridonov; bacula-users
Subject: Re: [Bacula-users] Restore errors

On 7/24/07, Mair Wolfgang-awm013 [EMAIL PROTECTED] wrote:

Hello, This is exactly what I experienced last week. I submitted this under the subject: ' Restore Error of linux-install-fdFul'. However, I haven't yet had the time to track this down as far as Doytchin did. Great work! This morning (before reading through all this) I also found that if I do a single backup the restore runs fine. My setting has concurrent jobs = 3. If I back up with this I get the same errors as already described.

Has anyone who has this problem tried turning spooling on?

John
Re: [Bacula-users] Restore errors
Mair Wolfgang-awm013 wrote:

Spooling? Does this also apply if my backup goes directly to files?

It would in this case, yes. With spooling, the data goes to the spooling file first, and is then unspooled in chunks. Without spooling, all of the data from the multiple jobs goes straight to the volume as it comes in. If the problem is the data going as it comes in, spooling would make the symptoms go away, and narrow down where the underlying problem might be.

--
Frank Sweetser fs at wpi.edu | For every problem, there is a solution that
WPI Senior Network Engineer | is simple, elegant, and wrong. - HL Mencken
GPG fingerprint = 6174 1257 129E 0D21 D8D4 E8A3 8E39 29E3 E2E8 8CEC
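To make the spooling suggestion concrete, here is a minimal sketch of how data spooling might be switched on for a test. It assumes the Spool Data directive in the Director's Job resource and the Spool Directory and Maximum Spool Size directives in the Storage daemon's Device resource; the job name, spool path and size are hypothetical, and all other required directives are omitted.

```
# bacula-dir.conf: request data spooling for a job (sketch)
Job {
  Name = "example-backup"             # hypothetical job name
  Spool Data = yes
}

# bacula-sd.conf: tell the device where to spool (sketch)
Device {
  Name = FileStorage                  # device name as posted earlier in the thread
  Spool Directory = /var/spool/bacula # hypothetical path
  Maximum Spool Size = 10000000000    # bytes; illustrative value only
}
```

With a setup along these lines, each job's data would land in a spool file first and only then be copied to the volume, which is what would let the test distinguish a problem with data written to the volume as it comes in from a problem elsewhere.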
Re: [Bacula-users] Restore errors
Hello,

Tuesday, July 24, 2007, 2:00:43 PM:

FS Okay, so it looks like you can reproduce the symptoms just with multiple
FS concurrent jobs, regardless of the gzip settings.

I am sure the files/dirs backed up are important! I bet the developers have tested concurrent jobs enough, but if they didn't catch the problem then the dir structure/file numbers/size is important. To give a picture of our test environment - I am testing with 4 jobs, 4 separate servers, 50K - 350K files each and 2-7GB of data. One of the jobs is backing up the bacula server itself (not sure if this matters; I noted a possible problem with naming the daemons with the same names so that temp files are overwritten, but this is not our case).

So until now we have:
- the problem is not caused by a corrupted file system
- volumes are consistent and bls doesn't show errors
- MySQL is OK (initially 4.1.x now 5.0.37)
- when running concurrent jobs both 2.0.3 and 2.1.28 say backups are OK but restores fail with one of the 3 kinds of errors listed below
- when concurrent jobs are turned off everything is OK
- gzip on/off doesn't affect the errors

FS I realize that you mentioned in another email you're dumping the mysql tables
FS nightly, but I would still strongly recommend that you run a repair tables on
FS your catalog to be absolutely sure there isn't any subtle corruption that's
FS snuck in. It pays to be painfully methodical when troubleshooting this kind
FS of scenario, especially since you seem to be the first to knowingly run into
FS this problem.

FS Another good thing to try would be to double check and make sure that your
FS catalog schema exactly matches what bacula is expecting. If, for example, the
FS column type holding volume offsets somehow became a 16 bit int where bacula
FS was expecting a 32 bit, the inserted values could become truncated or wrap
FS around, causing the kind of corruption you're seeing.

FS Actually, that gives me another idea. While I've never used it myself, you
FS may be able to get more details by running some jobs with strict mode turned
FS on on your mysql catalog.

FS http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html

FS If your bacula installation is doing something that would cause the data
FS stored to be wrong, such as storing a value that doesn't fit in the column
FS type, I believe this should turn it from a silent warning into a fatal error,
FS making it easier to track down.

FS Also, it's been suggested that you try turning on spooling. Have you done so?

Nice suggestion. Will try it and spooling as well. This probably will cut the possibilities in half, as the problem is either with wrong database data or wrong data in volumes (or both).

Re mysql check - as we fixed the problem yesterday I don't have a DB now to check against, but I'll start a new backup the old way to reproduce the problem and to verify the DB, just to remove that possibility. (We have one spare server for Bacula tests and in fact there is no Xen and LVM there, but we were getting the problem there as well; this also proves that it is not Xen or LVM related. I forgot to mention this yesterday.)

Done, and more info: surprisingly, this time only one of 4 jobs had a problem and, strangely, at a similar place (I recall the filename that once was broken - it's from the same dir). Anyway - the file size was different (type of error 1). Checked the bacula tables - no problems, all had status OK.

BUT, I see that at this server it happened in 1 out of 4 jobs, while at the other in 4 of 4 (which was much better for testing). I think that if I enable spooling and get no errors, this wouldn't mean spooling solved the problem; it could be just good luck. As you see it doesn't happen always nor for all jobs. But I will run several more tests anyway.

Now running the same with spooling. First impression is that I noted for 4 jobs it is writing to 8 different files.
This is not so good for performance and it would be the same to define different pool for every job wouldn't it? If the spooling fixes the problem (i.e. separate write for every job) this would mean that separate pool will do the same, saving some time for data transfer between files? (!) The workaround for the problem is to switch off concurrent jobs... FS Obviously that's not a very good workaround in the long run, especially for FS those of us with multiple drives. This is why I also asked earlier yesterday about comparison w/ or w/o concurrent jobs or writing to separate volumes, as I was sure we will end with no concurrent jobs but as Wolfgang is sharing - it is slower. Regards. - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ Bacula-users mailing list
Re: [Bacula-users] Restore errors
Hello,

Tuesday, July 24, 2007, 2:00:43 PM:

FS Also, it's been suggested that you try turning on spooling. Have you done so?

Good news (or bad, who knows): I enabled spooling (Maximum Job Spool Size = 500m), ran the same jobs, and AGAIN, in the first job I test-restored, ~44K files are missing:

Files Expected: 348,120
Files Restored: 304,654

Another restore job is similar:

Files Expected: 190,741
Files Restored: 154,016

The case is slightly different, as this time NO other errors were generated (such as a file with wrong size or the Record header file index not equal).

Hope this helps in resolving the problem.

Regards.

P.S. Will try your suggestion for STRICT_TRANS_TABLES, but I guess it's more likely that a field was changed from 16 to 32 bit than from 32 to 16...
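
The gap between the two counters in the reports above can be computed directly. What follows is only a minimal Python sketch, assuming the report lines look exactly like the ones quoted in this message; the helper name `missing_files` is made up for illustration and is not part of Bacula.

```python
import re


def missing_files(report: str) -> int:
    """Return Files Expected minus Files Restored from a restore report snippet."""

    def field(name: str) -> int:
        # Counters are printed with thousands separators, e.g. "348,120".
        match = re.search(rf"{name}:\s*([\d,]+)", report)
        if match is None:
            raise ValueError(f"{name!r} not found in report")
        return int(match.group(1).replace(",", ""))

    return field("Files Expected") - field("Files Restored")


print(missing_files("Files Expected: 348,120 Files Restored: 304,654"))  # 43466
print(missing_files("Files Expected: 190,741 Files Restored: 154,016"))  # 36725
```

The first result matches the "~44K files are missing" figure given above.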
Re: [Bacula-users] Restore errors
Just to say that the difference I had between Files Expected and Files Restored has been resolved with a

$ dbcheck

I don't understand why my database had inconsistencies (as I never had any hard reboot, power cut or such)... Should dbcheck be executed at regular intervals?

On Tue, 2007-07-24 at 05:25 +0300, Doytchin Spiridonov wrote:
> Hello,
>
> Tuesday, July 24, 2007, 2:25:44 AM:
>
> >>> The first thing that I would try is unmounting the filesystem and performing a
> >>> full fsck on it, to rule out filesystem corruption.
> >>
> >> Not a problem with the FS or disks, checked that. Checked the logical
> >> content of the volumes as well (bls -k -v) - no errors.
> >>
> >> Out of curiosity, as I didn't know whether bls prints errors when the content is
> >> damaged, I changed one random byte of one volume to 0, ran bls and got
> >> an error:
> >>
> >> Block checksum mismatch in block=113 len=64512: calc=2a576dc5 blk=44a509f3
> >>
> >> So I would say the 3 problems (files with wrong size, missing files
> >> and the error about ID: BB02) are not hardware/fs/disk related; they are
> >> caused by a bug in Bacula related to wrong positioning in
> >> the volumes and mismatched numbers.
> >
> > Based on this and your other emails, I would next suspect a problem with the
> > catalog. Again, I'd start by making sure no errors have crept in by doing a
> > consistency check at the database level - a 'repair tables' in mysql, or the
> > equivalent in postgresql.
>
> Not the case - we are doing daily DB dumps before backups and they
> would crash if tables were damaged. No mysql error logs. The problem is
> not in MySQL or broken DBs.
>
> Regards.
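
The bls experiment quoted above (zeroing one byte of a volume and getting a "Block checksum mismatch") can be illustrated with a short Python sketch. It uses `zlib.crc32` purely as a stand-in: the thread does not say which checksum Bacula computes, and the block contents and the helper `checksum_ok` are invented for the example. Only the block length of 64512 bytes is taken from the quoted message.

```python
import zlib


def checksum_ok(block: bytes, stored_crc: int) -> bool:
    """Recompute the CRC-32 of a block and compare it with the stored value."""
    return zlib.crc32(block) == stored_crc


# A made-up 64512-byte block (same length as in the quoted bls message).
block = bytes(range(256)) * 252
stored = zlib.crc32(block)

# Overwrite one byte with 0, as in the experiment described above.
damaged = bytearray(block)
damaged[1000] = 0  # the original value at this offset is 232, so the block changes

print(checksum_ok(block, stored))           # True
print(checksum_ok(bytes(damaged), stored))  # False
```

A per-block checksum of this kind only shows that each block is intact. It would not notice intact blocks being read from the wrong position or attributed to the wrong file, which is consistent with volumes passing bls while restores still fail.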
Re: [Bacula-users] Restore errors
Hello,

the last 2 tests:

Tuesday, July 24, 2007, 2:00:43 PM:

FS Actually, that gives me another idea. While I've never used it myself, you
FS may be able to get more details by running some jobs with strict mode turned
FS on on your mysql catalog.
FS http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html
FS If your bacula installation is doing something that would cause the data
FS stored to be wrong, such as storing a value that doesn't fit in the column
FS type, I believe this should turn it from a silent warning into a fatal error,
FS making it easier to track down.

1. I've set sql_mode=TRADITIONAL for MySQL and ran all the jobs again. No errors were reported, same as before. Tested the restore jobs. Again 1 of them had errors (one RPM file with wrong size, and fewer files restored).

2. So it was time to check Julien's idea and try dbcheck. Ran:

dbcheck -c /etc/bacula/bacula-dir.conf -v -f

(I first created 2 indexes, as dbcheck otherwise ran forever.) 0 problems found for all the steps! Anyway, I tried a restore again - same problem. So it is not one that can be fixed by dbcheck.

I don't have any other ideas for tests that would provide more cases. It's the developers' turn now...

Regards.
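
The idea behind the strict-mode test quoted above can be sketched in Python. This is only a simulation under an assumption about MySQL's behaviour: in non-strict mode an out-of-range integer is clipped to the end of the column's range with a warning, while a strict mode such as TRADITIONAL rejects the insert. The function `store_unsigned` is hypothetical and does not talk to any database.

```python
def store_unsigned(value: int, bits: int, strict: bool) -> int:
    """Simulate storing `value` in an unsigned integer column `bits` wide."""
    max_value = (1 << bits) - 1
    if 0 <= value <= max_value:
        return value
    if strict:
        # Strict mode: the statement fails instead of storing a wrong value.
        raise OverflowError(f"{value} is out of range for a {bits}-bit column")
    # Non-strict mode (assumed behaviour): silently clip to the nearest limit.
    return max(0, min(value, max_value))


# The original file size from the first post fits in 32 bits but not in 16.
print(store_unsigned(3826291, 32, strict=False))  # 3826291
print(store_unsigned(3826291, 16, strict=False))  # 65535

try:
    store_unsigned(3826291, 16, strict=True)
except OverflowError as exc:
    print("rejected:", exc)
```

Since the jobs ran under sql_mode=TRADITIONAL without any reported error, a column that is too narrow looks unlikely as the cause here.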
[Bacula-users] Restore errors
Hello,

trying to identify a bug in Bacula and/or our system setup. Has anyone had errors like this on restore:

Error: attribs.c:410 File size of restored file /home/bacula/res/b3/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm not correct. Original 3826291, restored 10620921.

- the file is not a log file or any file that changed during the backup (in which case an error like the one above would be normal)
- the wrong file size is always larger than the original; if we take only the first N bytes, where N is the correct file size, the original and restored files match; we noted that the appended data is part of another file from the backup, not garbage.

Note that this other file (part of which has been appended to the file with the wrong size) is restored correctly, so the only problem is that Bacula decides on a wrong file size and reads past the end of the file (it seems this comes from some internal buffer of Bacula: the data is stored in the volumes using GZIP, so simply reading further in the volume would break everything and the appended data would be garbage, not unzipped data).
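
The check described above (the first N bytes of the restored file equal the original, where N is the original size) can be scripted. This is a minimal Python sketch; the function name `restored_extends_original` and the chunk size are illustrative choices, not something from the thread.

```python
import os


def restored_extends_original(original_path: str, restored_path: str,
                              chunk_size: int = 1 << 20) -> bool:
    """True if the restored file is longer than the original and its
    first N bytes (N = original size) are identical to the original."""
    n = os.path.getsize(original_path)
    if os.path.getsize(restored_path) <= n:
        return False
    with open(original_path, "rb") as orig, open(restored_path, "rb") as rest:
        remaining = n
        while remaining > 0:
            a = orig.read(min(chunk_size, remaining))
            b = rest.read(len(a))
            if not a or a != b:
                return False  # content differs (or the original shrank meanwhile)
            remaining -= len(a)
    return True
```

It would be called with the path of the original file and the path of the same file under the restore location, for example the glibc RPM from the error message and its copy under /home/bacula/res/b3/.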
Re: [Bacula-users] Restore errors
Doytchin Spiridonov wrote:
> Hello,
>
> trying to identify a bug in bacula and/or our system setup. Is there
> anyone that on restore had errors like this:
>
> Error: attribs.c:410 File size of restored file /home/bacula/res/b3/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm not correct. Original 3826291, restored 10620921.
>
> - the file is not a log file or any file that has changed during the backup (in which cases an error like the one above should be normal)
> - the wrong file size is always larger than the original; if we cut the first N bytes, where the N is the correct file size, the original and restored files match; we noted that the appended data is part of another file from the backup, not a garbage data.
>
> Note that this other file (from which some part has been appended to the file with wrong size) is restored correctly, so the only problem is wrong file size decision by bacula and reading further than its end (seems this is some internal buffer of Bacula as the data is stored in the volumes using GZIP and just reading further would break everything and the appended data should be garbage, not unzipped data).

The first thing that I would try is unmounting the filesystem and performing a full fsck on it, to rule out filesystem corruption.

--
Frank Sweetser fs at wpi.edu | For every problem, there is a solution that
WPI Senior Network Engineer | is simple, elegant, and wrong. - HL Mencken
GPG fingerprint = 6174 1257 129E 0D21 D8D4 E8A3 8E39 29E3 E2E8 8CEC
Re: [Bacula-users] Restore errors
Doytchin Spiridonov wrote:
> Hello,
>
> trying to identify a bug in bacula and/or our system setup. Is there
> anyone that on restore had errors like this:
>
> Error: attribs.c:410 File size of restored file /home/bacula/res/b3/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm not correct. Original 3826291, restored 10620921.
>
> - the file is not a log file or any file that has changed during the backup (in which cases an error like the one above should be normal)
> - the wrong file size is always larger than the original; if we cut the first N bytes, where the N is the correct file size, the original and restored files match; we noted that the appended data is part of another file from the backup, not a garbage data.
>
> Note that this other file (from which some part has been appended to the file with wrong size) is restored correctly, so the only problem is wrong file size decision by bacula and reading further than its end (seems this is some internal buffer of Bacula as the data is stored in the volumes using GZIP and just reading further would break everything and the appended data should be garbage, not unzipped data).

This has been brought up several times within the last week, but never with this level of explanation and examination. I wonder if some of the others who have experienced it (I do not know their names -- hopefully they can chime in) can do the same thing for us. This is potentially serious, it seems, if it is a widespread problem.

I think if the others can verify it, this should also be copied to Bacula devel. I think I will try a large restore of my own today to see what happens.

Please give the rest of the details of your setup, however -- you don't even include the Bacula version, and that is a very basic piece of information. Operating system (presumably RedHat Linux from the file you backed up, but who knows), architecture... all would be useful.

--
Ryan Novosielski - Systems Programmer II
[EMAIL PROTECTED] - 973/972.0922 (2-0922)
Univ. of Med. and Dent. | IST/AST - NJMS Medical Science Bldg - C630
Re: [Bacula-users] Restore errors
On 23 Jul 2007 at 14:03, Ryan Novosielski wrote:

> This has been brought up several times within the last week, but never
> with the explanation and examination. I wonder if some of the other who
> have experienced it (I do not know their names -- hopefully they can
> chime in) can do the same thing for us. This is potentially serious,
> seems like, if it is a widespread problem.

It may be the same case, raised by different people.

> I think if the others can verify it, this should also be copied to
> Bacula devel. I think I will try a large restore of my own today to see
> what happens.

Devel is aware of the issue as it was originally raised in the bug tracking system. The consensus was it is not a bug, or more correctly, there was no information supplied which permitted reproduction of the bug. If we can't reproduce the bug, we sure can't test for it, and we sure can't confirm it's been fixed.

> Please give the rest of the details of your setup, however -- you don't
> even include the Bacula version, and that is a very basic piece of
> information. Operating system (presumably RedHat Linux from the file you
> backed up, but who knows), architecture... all would be useful.

This might help: http://bugs.bacula.org/view.php?id=903

--
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php
Re: [Bacula-users] Restore errors
Hello,

I've filed this as a bug, but since Kern couldn't reproduce it he gave up. So let us find out here what the problem could be. There are actually two problems; they could be linked.

Here is the history:

Initially we were using 2.0.3. After running backups for several weeks I wanted to restore a file and was surprised that I couldn't. It was listed in the catalog, I could select it and run a restore job, but the file didn't come up. Investigating what happened, I ran a full restore job and was surprised that several files were missing in that directory (where the file is). Error messages similar to the one in my first post here were present as well. In addition, there was a big difference between marked files and actually restored files (certainly not hard links, sockets or anything else that is ignored by Bacula - in one of the tests the whole /home/ directory was missing).

After that we started a week of tests (backup full/diff/inc, restore etc.). Every time (but at random places/files) similar errors happened. Sometimes there are errors, sometimes not. We haven't run enough tests to conclude when exactly this happens. But IT HAPPENS, and as a result we don't have a reliable backup. I know a lot of people run backups without testing restores, and that's why (if this is not related to our specific setup) the problem would only show up in an emergency, which doesn't happen often.

Anyway, here are the hardware and setup details:

*** Bacula: 2.1.28 on all servers. Yesterday we cleaned everything (Bacula DB and volumes) and installed the latest beta *2.1.28* everywhere (note this is not a problem of the beta, as we discovered it when we had 2.0.3). 2.1.28 fixed 2 other problems we discovered with 2.0.3, but this one is still there. Director and most of the servers are 64 bit, two of the servers are 32 bit.
*** OS: Linux CentOS 4.5
*** MySQL: 5.0.37
*** Servers (all are almost identical): Supermicro, PDSME - Intel E7230 (Mukilteo) chipset, Intel Pentium D 930 Dual Core 3.0GHz, 3Ware IDE RAID Controller Escalade 9550SX. Servers have 4 disks each in RAID 1+0, only the Bacula server has many disks in RAID 5.
*** Some servers are plain CentOS, some have Xen with virtual servers. The Bacula server itself also has Xen, but Bacula is running in Dom0 and no other virtual machines are running on it at this time.
*** Those servers with Xen also have LVM.
*** We run concurrent jobs (and I guess this is where the Bacula problem lies).
*** GZIP compression is enabled.
*** We save volumes on hard disk; their size is set to 4480MB.

--- How to get an error:

As we initially discovered the error after several weeks of backups, we guessed that we could have caused it ourselves by a wrong setting of Volume Retention or some other retention time, so that some files were purged.

We started everything from zero again, and after 3 days (it happened that the first backup was Full, the next Differential and the last Incremental) we performed a test and the error happened again! So we were sure this is not caused by some files being purged accidentally. After that we could get the error even after just a full backup, trying to restore immediately after it finished.

Yesterday we cleaned everything again and compiled (from SRPMs) the latest 2.1.28. We ran a full backup again (again all jobs concurrent), and the errors described here happen when we try to restore files from every job (except one where there are just 150 files).

So the problems are two:
- sometimes some files are restored with a larger size, while the first part of the file matches the original file exactly (not log files or dynamic files). This happens rarely (~one case per 5 jobs)
- sometimes not all files are restored, and tens of thousands are missing, for example:
Files Expected: 190,718
Files Restored: 166,097
This happens more often (~one case per 2 jobs).

Note that once the error happens we can reproduce it on every restore, at the same place, for the same file and with the same number of missing files (i.e. this is not a problem of the restore; it is most likely a problem of the volumes).

Our future tests:
1. we will do the same (concurrent jobs) but without GZIP
2. if it happens again, we will set max jobs to 1 so every job runs alone. AFAIR we didn't get errors during testing when we ran just one full backup job; it always happens when we run several at once (but I am not 100% sure, that's why we will test this)
3. if it still happens, we will run with a normal kernel (to exclude the influence of Xen)
4. last, we will try without LVM (which would be harder)

Regards

P.S. sorry for my English :)
Re: [Bacula-users] Restore errors
Hello Dan,

Monday, July 23, 2007, 9:35:05 PM:

DL Devel is aware of the issue as it was originally raised in the bug
DL tracking system. The consensus was it is not a bug, or more
DL correctly, there was no information supplied which permitted
DL reproduction of the bug. If we can't reproduce the bug, we sure
DL can't test for it, and we sure can't confirm it's been fixed.

Can you provide some guidance on how and what to test? I personally have no idea what data is stored in the Catalog and what is in the Volumes. Regarding the problem of the wrong file size:
- does Bacula store the file size somewhere, and if so where? How can I extract that number (or those numbers, if it is stored in more than one place)?
- how could the file then end up with a larger size? (For example, there is a GZIP stream, you unpack it and you get a larger result, or what? Or is the second (wrong) number the one stored in the volume?)
- do you agree that it is strange that, while the stream is gzipped, the appended data is part of a real file? If the volume were damaged, I guess the additional data would be garbage?

Also, is it possible that the file is restored OK, but only its size is set to a wrong number, and what we see is data that previously existed on the disk? Or does Bacula have some internal buffer, so that this additional part came from a file restored just before the wrong one?

Regards.
Re: [Bacula-users] Restore errors
Hello,

I forgot to mention something very IMPORTANT: I discovered that in *all* such cases (restored files with larger size), if we don't perform a full restore but restore a SINGLE file, it is restored OK with *correct* size and content. It is OK even if we restore the directory it is in (with the other files in it). This proves it is not a problem with the FS, kernel, Xen, LVM, hardware, etc., but a problem with Bacula.

Regards
Re: [Bacula-users] Restore errors
> sometimes not all files are restored, but tens of thousands are
> missing, an example:
> Files Expected: 190,718
> Files Restored: 166,097
> This happens more often (~one case per 2 jobs).

Just to say that I see this too every time I restore, but I think you can ignore it. From my observations and the little tests I made, the difference between files expected and files restored is the number of directories. It seems that Bacula doesn't handle this properly (a bug in the counter?).

Regards,
Julien
Re: [Bacula-users] Restore errors
Hello,

no, probably you didn't find out which files are missing. After we restore, we compare the restored files with the originals. The conclusion is that there really are missing files! (As I mentioned, those are not hardlinks, sockets, etc.: in one test the whole /home/ directory and all files in it were missing!) Bacula's counter is OK, and from our tests I can say that the only good restore is one where those numbers match. If you see a difference like the one below, you can be sure your restored file set is really wrong.

Could you please check if the above is true for your restores? It would be helpful to know we are not alone.

P.S. And please, anyone who has never done a restore, please do some tests. That way you will be sure you have a valid backup OR you will know you have an invalid one ;) The easiest test is to do a full restore somewhere and check whether Files Expected and Files Restored differ much (and/or whether the error about bad file size appears).

Regards.

Monday, July 23, 2007, 11:18:00 PM:

sometimes not all files are restored, but tens of thousands are missing, an example:
Files Expected: 190,718
Files Restored: 166,097
This happens more often (~one case per 2 jobs).

J Just to say that I have the case too every time I restore, but I think
J you can ignore it. Following my observations and the little tests I
J made, the difference between files expected and files restored is
J the number of directories. It seems that bacula doesn't handle this
J properly (a bug in the counter .. ?).
J Regards,
J Julien
J On Mon, 2007-07-23 at 23:01 +0300, Doytchin Spiridonov wrote:

Hello,

I forgot to mention something very IMPORTANT:

I discovered that in *all* such cases (restored files with larger size), if we don't perform a full restore but restore a SINGLE file, it is restored OK with *correct* size and content. It is OK even if we restore the directory where it is (with the other files in it). This proves it is not a problem with the FS, kernel, xen, lvm, hardware, etc., but a problem with Bacula.

Regards

Monday, July 23, 2007, 9:57:40 PM:

DS Hello,
DS I've filed this as a bug, but as Kern couldn't reproduce it he gave
DS up. So let us find here what could be the problem. There are actually
DS two problems, they could be linked.
DS Here is the history:
DS Initially we were using 2.0.3. After running backups for several weeks I
DS wanted to restore a file and was surprised that I couldn't restore it. It
DS was listed in the catalog, I could select it and run a restore job,
DS but the file didn't come up. Investigating what happened, I ran a full
DS restore job and was surprised that in that directory (where the file
DS is) several files were missing. Also, error messages similar to the
DS one in my first post here were present. In addition to that there was a
DS big difference between marked files and actually restored files (surely
DS not hard links, sockets or anything else that is ignored by Bacula -
DS in one of the tests the whole /home/ directory was missing).
DS After that we started with tests (backup full/diff/inc, restore etc)
DS for a week. Every time (but at random places/files) similar errors
DS happen. Sometimes there are errors, sometimes not. Haven't run enough
DS tests to come up with a conclusion about when this happens. But IT
DS HAPPENS and as a result we don't have a reliable backup. I know a lot
DS of people run backups w/o testing restores and that's why (if this is
DS not related to our specific setup) those problems would appear only if
DS they have an emergency, which actually doesn't happen often. Anyway, here
DS are the hardware and setup details:
DS *** Bacula: 2.1.28 on all servers. Yesterday we cleaned everything (bacula DB and volumes) and
DS installed everywhere the latest beta *2.1.28* (note this is not a
DS problem of the beta, as we discovered it when we had 2.0.3). 2.1.28 fixed
DS 2 other problems we discovered with 2.0.3, but this one is still
DS there.
DS Director and most of the servers are 64 bit, two of the servers are 32
DS bit.
DS *** OS: Linux CentOS 4.5
DS *** MySQL: 5.0.37
DS *** Servers (all are almost identical): Supermicro, PDSME - Intel
DS E7230 (Mukilteo) chipset, Intel Pentium D 930 Dual Core 3.0GHz, 3Ware
DS IDE RAID Controller Escalade 9550SX. Servers have 4 disks each in RAID
DS 1+0, only the Bacula server has many disks in RAID 5.
DS *** Some servers are plain CentOS, some have Xen with virtual servers,
DS the Bacula server itself also has Xen, but Bacula is running in
DS Dom0, no other virtual machines are running on it at this time.
DS *** Those servers with Xen also have LVM.
DS *** We run (and I guess here is the problem of Bacula) concurrent
DS jobs.
DS *** GZIP compression is enabled.
DS *** we save volumes on harddisk, their size is set to 4480MB
DS --- How to get an error:
DS As initially we discovered the error after several weeks of backups,
DS we guessed that
Re: [Bacula-users] Restore errors
On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote: Hello Dan, Monday, July 23, 2007, 9:35:05 PM: DL Devel is aware of the issue as it was originally raised in the bug DL tracking system. The consensus was it is not a bug, or more DL correctly, there was no information supplied which permitted DL reproduction of the bug. If we can't reproduce the bug, we sure DL can't test for it, and we sure can't confirm it's been fixed. Can you provide some guidance on how and what to test? Test the backups. Backup one file. Restore. Backup N files. restore. Backup N directories, restore. Find a simple and reproducible situation which demonstrates the problem. As I personally don't have any idea what data is stored in the Catalog, what is in the Volumes, regarding the problem of wrong file size: - does Bacula store the file size somewhere and if so where? how to extract that/those numbers if they are at more than one place? There is a section in the manual regarding database tables. I suspect you want the File Job table, but I cannot recall from memory. - how then the file could have larger size? (for example there is a GZIP stream, you unpack it and you get larger result or what? or the second (wrong) number is the number stored in the volume?) I don't know. - do you agree that it is strange that while the stream is gzipped, the appended data is part of a real file? If the volume were damaged, I guess the additional data should be some garbage? Also is it possible that the file is restored OK, but only its size is set to a wrong mumber and we've seen some data previously existed in the disk? Or bacula has some internal buffer this additional part was from a file restored just before the wrong one? Someone sugested you verify your filesystem (e.g. fsck). Have you done that? I suggest avoiding trying to answer your above questions. Concentrate on the following. My suggestion, as was made by others, is to find something which can be reproduced. 
Developers cannot solve the problem if they cannot reproduce it. -- Dan Langille - http://www.langille.org/ Available for hire: http://www.freebsddiary.org/dan_langille.php - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Restore errors
On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote: Hello, I've filed this as a bug, but while Kern couldn't reproduce it he gave up. So let us find here what could be the problem. There are actually two problems, they could be linked. Please. If anyone can solve the issue given what you supplied, they would. You were asked to supply a reproducible situation. Hopefully we can get to that position quickly without further unnecessary distractions. -- Dan Langille - http://www.langille.org/ Available for hire: http://www.freebsddiary.org/dan_langille.php
Re: [Bacula-users] Restore errors
Hello,

don't get me wrong, I know pretty well that nothing can be done there until you have a clean example. But bearing in mind that these errors happen in 1 per 4 million files, it would also be hard to isolate the case where this happens. The fact is that it happens pretty often and, as I said, we are continuing to try different ways to see when it will stop happening. If we find that out I will provide the info here.

Regards.

Tuesday, July 24, 2007, 12:15:43 AM:

DL On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:

Hello, I've filed this as a bug, but while Kern couldn't reproduce it he gave up. So let us find here what could be the problem. There are actually two problems, they could be linked.

DL Please. If anyone can solve the issue given what you supplied, they
DL would. You were asked to supply a reproducible situation. Hopefully
DL we can get to that position quickly without further unnecessary
DL distractions.
Re: [Bacula-users] Restore errors
Hello,

Tuesday, July 24, 2007, 12:15:37 AM:

DL Someone suggested you verify your filesystem (e.g. fsck). Have you
DL done that?

Yes:

FS The first thing that I would try is unmounting the filesystem and performing a
FS full fsck on it, to rule out filesystem corruption.

1. Unmounted the partition holding the volumes and checked it for bad blocks and fs problems:
fsck.ext3 -n -c -v /dev/
Got NO errors.

2. For the volumes used by the jobs with problems, I did the following:
bls -k -v /home/bacula/FILE0001
and got NO errors (I'm not sure whether this should produce errors if there is something wrong with the volume, but it listed all blocks and they were OK).

3. Same for:
bls -j /home/bacula/FILE0001

!NEW: Here is one additional error and note:

I mentioned that a file that is restored with a wrong size during a full restore is OK when restored alone. However, in that case another error is reported:

Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275 Volume data error at 0:3999743252! Wanted ID: BB02, got Иnлу. Buffer discarded.

Note that this error didn't affect the file restore: it was OK, both size and content.

After we verified everything above and the volumes have correct blocks, this error is strange. Does this mean that Bacula is positioning at a wrong place, rather than there being a problem with the volume itself? I reported that once, and now (different jobs and version of Bacula: before it was 2.0.3, now 2.1.28) it is the same.

The strange thing is (as before):
- we listed all blocks in the volumes with bls -k
- there is no such File:blk as 0:3999743252
- the number 3999743252 however appears in the database, in one entry in the table JobMedia:
(18,2,2,134546,159167,0,1,3999743252,400989292,5,0,0)

While I don't know what it is used for, I can see that:
- either this number in the database is wrong
- or Bacula should not try to position at that number, as there is no valid File:blk at that address.

Regards
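One way to read the JobMedia tuple quoted above is sketched below. It rests on two assumptions that nobody in this thread confirms: that the twelve values follow the column order of the JobMedia table in the stock 2.x MySQL schema (JobMediaId, JobId, MediaId, FirstIndex, LastIndex, StartFile, EndFile, StartBlock, EndBlock, VolIndex, Copy, Stripe), and that for disk volumes the File:blk pair is the high and low 32 bits of a byte offset. The helper name byte_offset is made up for this illustration.

```python
from typing import NamedTuple


class JobMediaRow(NamedTuple):
    """One JobMedia row; the column order is an ASSUMPTION (2.x MySQL schema)."""
    JobMediaId: int
    JobId: int
    MediaId: int
    FirstIndex: int
    LastIndex: int
    StartFile: int
    EndFile: int
    StartBlock: int
    EndBlock: int
    VolIndex: int
    Copy: int
    Stripe: int


def byte_offset(file_no: int, block_no: int) -> int:
    """Combine File:blk into one offset (ASSUMPTION: file is the high 32 bits)."""
    return (file_no << 32) | block_no


# The tuple exactly as quoted in the message above.
row = JobMediaRow(18, 2, 2, 134546, 159167, 0, 1, 3999743252, 400989292, 5, 0, 0)

start = byte_offset(row.StartFile, row.StartBlock)
end = byte_offset(row.EndFile, row.EndBlock)

print(f"start {row.StartFile}:{row.StartBlock} -> byte {start}")
print(f"end   {row.EndFile}:{row.EndBlock} -> byte {end}")
print("span crosses the 4 GiB mark:", start < 2**32 <= end)
```

If both assumptions hold, 3999743252 would be the StartBlock of a span that ends in file 1, that is, past the 4 GiB mark, which is possible with volumes of 4480MB. Whether that has anything to do with the error is not established here.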
Re: [Bacula-users] Restore errors
Hello,

Monday, July 23, 2007, 9:02:21 PM:

FS The first thing that I would try is unmounting the filesystem and performing a
FS full fsck on it, to rule out filesystem corruption.

It is not a problem with the FS or disks; I checked that. I checked the logical content of the volumes as well (bls -k -v): no errors. Out of curiosity, as I didn't know whether bls prints errors when the content is damaged, I changed one random byte of one volume to 0, ran bls and got an error:

Block checksum mismatch in block=113 len=64512: calc=2a576dc5 blk=44a509f3

So I would say the 3 problems (files with wrong size, missing files and the error about ID: BB02) are not hardware/fs/disk related but are caused by a bug in Bacula, one related to wrong positioning in the volumes and mismatched numbers.
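The byte-zeroing experiment above can be illustrated in general terms. The sketch below does not read a Bacula volume and does not claim to reproduce Bacula's block layout or checksum routine; it uses zlib.crc32 as a stand-in 32-bit checksum over a made-up block of the same length (64512 bytes) to show why a single zeroed byte makes the recomputed value differ from the stored one.

```python
import zlib


def block_checksum(block: bytes) -> int:
    """CRC-32 of a block (a stand-in, not Bacula's own checksum code)."""
    return zlib.crc32(block) & 0xFFFFFFFF


def zero_byte(block: bytes, position: int) -> bytes:
    """Return a copy of block with the byte at position set to 0."""
    damaged = bytearray(block)
    damaged[position] = 0
    return bytes(damaged)


# Hypothetical block of non-zero data, 64512 bytes long as in the bls message.
original = bytes((i % 251) + 1 for i in range(64512))
stored = block_checksum(original)      # what would be kept with the block

damaged = zero_byte(original, 1000)    # simulate the single changed byte
calc = block_checksum(damaged)         # what a reader would recompute

print(f"calc={calc:08x} blk={stored:08x} mismatch={calc != stored}")
```

A check of this kind only shows that the bytes of a block are intact. It says nothing about whether a reader seeks to the right block in the first place, which is the question raised in the thread.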
Re: [Bacula-users] Restore errors
Doytchin Spiridonov wrote: Hello, Monday, July 23, 2007, 9:02:21 PM: FS The first thing that I would try is unmounting the filesystem and performing a FS full fsck on it, to rule out filesystem corruption. not a problem with the FS or disks, checked that. Checked the logical content of the volumes as well (bls -k -v) - no errors. For curiosity as I don't know if bls should print errors if content is damaged, I changed one random byte of one volume to 0, run the bls and got an error: Block checksum mismatch in block=113 len=64512: calc=2a576dc5 blk=44a509f3 So I would say the 3 problems (files with wrong size, missing files and error about ID: BB02) are not hardware/fs/disks related and are caused by a bug in Bacula and it is related to wrong positioning in the volumes and mismatched numbers. Based on this and your other emails, I would next suspect a problem with the catalog. Again, I'd start by making sure no errors have crept in by doing a consistency check at the database level - a 'repair tables' in mysql, or the equivalent in postgresql. -- Frank Sweetser fs at wpi.edu | For every problem, there is a solution that WPI Network Engineer | is simple, elegant, and wrong. - HL Mencken GPG fingerprint = 6174 1257 129E 0D21 D8D4 E8A3 8E39 29E3 E2E8 8CEC
Re: [Bacula-users] Restore errors
Hello,

Tuesday, July 24, 2007, 12:15:37 AM:

DL On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:
DL Test the backups. Backup one file. Restore. Backup N files.
DL restore. Backup N directories, restore. Find a simple and
DL reproducible situation which demonstrates the problem.

We tested the disks and the backups (volumes): they are OK and not damaged. From the small number of tests so far (40-50) I can say (though not 100% sure, as this is not planned QA; we are just trying different combinations to achieve a backup/restore w/o errors) that backing up a single job (even with 4M files) and restoring it is OK, with no errors. All of the cases with errors happen when we run jobs concurrently.

DL My suggestion, as was made by others, is to find something which can
DL be reproduced. Developers cannot solve the problem if they cannot
DL reproduce it.

Yes, we have it: every time we do a concurrent full backup of the 4-5 servers we are testing with, we have at least one job that has a problem. We will continue with the tests, as I noted before, to see when (if) we get a backup/restore w/o an error.

Regards.
Re: [Bacula-users] Restore errors
Hello, Tuesday, July 24, 2007, 2:25:44 AM: FS Doytchin Spiridonov wrote: Hello, Monday, July 23, 2007, 9:02:21 PM: FS The first thing that I would try is unmounting the filesystem and performing a FS full fsck on it, to rule out filesystem corruption. not a problem with the FS or disks, checked that. Checked the logical content of the volumes as well (bls -k -v) - no errors. For curiosity as I don't know if bls should print errors if content is damaged, I changed one random byte of one volume to 0, run the bls and got an error: Block checksum mismatch in block=113 len=64512: calc=2a576dc5 blk=44a509f3 So I would say the 3 problems (files with wrong size, missing files and error about ID: BB02) are not hardware/fs/disks related and are caused by a bug in Bacula and it is related to wrong positioning in the volumes and mismatched numbers. FS Based on this and your other emails, I would next suspect a problem with the FS catalog. Again, I'd start by making sure no errors have crept in by doing a FS consistency check at the database level - a 'repair tables' in mysql, or the FS equivalent in postgresql. Not the case - we are doing daily DB dumps before backups and they would crash if tables are damaged. No mysql error logs. The problem is not in the MySQL or broken dbs. Regards.
Re: [Bacula-users] Restore errors
Hello,

done. After some more tests I found where the problem is (and once again it is not in our hardware or OS or anything broken). It is where I initially suggested: the concurrent jobs.

After the first (and native) configuration we used (concurrent jobs, with gzip) we tested the following:

1. concurrent jobs, w/o gzip - we got similar errors (1 wrong file size from 4 jobs, but 3 of 4 jobs with fewer files than expected; the 4th is usually very small - 100 files - and never had errors, so I would say 100% of jobs were invalid)
2. no concurrent jobs (Maximum Concurrent Jobs = 1 at dir and sd), w/o gzip - good news, all restores are OK, no errors, Files Expected and Files Restored match!
3. no concurrent jobs WITH gzip - again OK, all restores are OK, no errors, Files Expected and Files Restored match!

So until now we have:

- the problem is not caused by a corrupted file system
- volumes are consistent and bls doesn't show errors
- MySQL is OK (initially 4.1.x, now 5.0.37)
- when running concurrent jobs both 2.0.3 and 2.1.28 say backups are OK but restores fail with one of the 3 kinds of errors listed below
- when concurrent jobs are turned off everything is OK
- gzip on/off doesn't affect the errors

Once again the 3 types of errors are:

1. some static files (i.e. not log files!) are restored with a wrong (always larger) size, while the first N bytes match and the rest is filled with a part of another file (not sure if this is just a file with a wrong size and some old data on the disk appears at the end, or bacula restores part of another file and appends it to the end). The file can be restored correctly if marked alone, but then error 3 below is generated (which seems to be just a bogus error). An example error is:
---
b0: Restore_b0.d6.int.2007-07-23_22.37.34 Error: attribs.c:410 File size of restored file /home/bacula/res/b3.2/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm not correct. Original 3826291, restored 10620921.
---
When this error is present, the second error below (missing files) is always present as well, but w/o additional error messages.

2. a large number of files are missing (while they are present in the catalog and selected) - tens of thousands (not sockets or anything else that Bacula ignores by default). When this happens, usually an error like this appears (if not the first one above):
---
b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header file index 42452 not equal record index 0
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124 Error sending to File daemon. ERR=Connection reset by peer
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection reset by peer
---

3. when a file from error 1 is restored alone it is OK, but another bogus error is generated:
---
Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275 Volume data error at 0:3999743252! Wanted ID: BB02, got Иnлу. Buffer discarded.
---
I found that the above number (3999743252) is not present as a block address for any block in the volumes, but the same number appears as part of a JobMedia record in the database.

This is a summary of everything that popped up as a problem or fact in 2.1.28. (2.0.3 had another bug with bogus errors about sockets' attributes and 2.1.26 had bogus SQL error messages, but those are fixed OK in 2.1.28.)

If anyone wants, feel free to reopen the bug in Mantis (903). I'm not going to do so, as I am personally disappointed by the attitude "this is not a bug - work it out yourself" and by the suggestion to send you our servers as a gift to test with, plus support fees... nice. Now it's up to you to create better test cases to catch more bugs, if any.

We will start our backup again w/o concurrent jobs and we will continue to monitor restores on a daily basis, as the above tests are just 3 and I agree there is a possibility that it was just chance that the latter two tests went OK. But it was my suggestion from the beginning that the problem is that Bacula damages either database numbers or volume records when concurrent jobs are running, and so far the facts support this.

(!) The workaround for the problem is to switch off concurrent jobs; if you don't, the chance that you have invalid backups is high (some 90% in our own cases, at least with our servers/os/configuration; and that is without counting that, arguably, 100% of backups are wrong, since after diff/incremental backups Bacula restores files that had been deleted, which is really bad behaviour for many cases/services).

Regards

Tuesday, July 24, 2007, 12:15:43 AM:

DL On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:

Hello, I've filed this as a bug, but while Kern couldn't reproduce it he gave up. So let us find here what could be the problem. There are actually two problems, they could be linked.

DL Please. If anyone can solve the issue given what you supplied, they
DL would. You were asked to supply a reproducible situation. Hopefully
DL
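The first two error types in the summary above (files restored with a larger size, and files missing from the restore) are the kind of thing a comparison of the restored tree against the original can surface. What follows is only a rough sketch of such a check, not a tool used in this thread: the function names are invented, it compares relative paths and sizes only, and files that changed on the original after the backup would show up as false differences.

```python
import os
import tempfile


def file_sizes(root: str) -> dict:
    """Map each regular file's path relative to root to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only plain files are compared.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes[os.path.relpath(path, root)] = os.path.getsize(path)
    return sizes


def compare_trees(original_root: str, restored_root: str):
    """Return (missing, wrong_size) for a restored tree vs. the original."""
    original = file_sizes(original_root)
    restored = file_sizes(restored_root)
    missing = sorted(set(original) - set(restored))
    wrong_size = sorted(
        (rel, original[rel], restored[rel])
        for rel in original.keys() & restored.keys()
        if original[rel] != restored[rel]
    )
    return missing, wrong_size


def _write(root: str, rel: str, data: bytes) -> None:
    """Create a small demo file under root."""
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)


# Tiny demo with made-up files: one is missing, one is restored too large.
with tempfile.TemporaryDirectory() as orig, tempfile.TemporaryDirectory() as rest:
    _write(orig, os.path.join("home", "a.txt"), b"abc")
    _write(orig, os.path.join("usr", "b.rpm"), b"12345")
    _write(rest, os.path.join("usr", "b.rpm"), b"12345 plus extra bytes")
    missing, wrong_size = compare_trees(orig, rest)

print("missing:", missing)
print("wrong size:", wrong_size)
```

For a real check, compare_trees would be pointed at the client's original directory and at the restore target (for example the /home/bacula/res/... location seen in the error message), ideally right after a backup so the original has not changed in between.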
Re: [Bacula-users] Restore errors: Permission Denied
On Friday 21 October 2005 21:11, Martin Simmons wrote: On Thu, 20 Oct 2005 16:29:17 +1000, Craig Holyoak [EMAIL PROTECTED] said: Craig On Wed, 2005-10-19 at 11:17 +0100, Martin Simmons wrote:=20 On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak [EMAIL PROTECTED] said: Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a restore Craig job, it fails with: Craig 19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50 Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=3DPermission denied Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=3DConnection reset by peer Craig 19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53 Craig JobId: 878 Craig Job:Restore.2005-10-19_11.29.50 Craig Client: newman Craig Start time: 19-Oct-2005 11:29:52 Craig End time: 19-Oct-2005 11:29:53 Craig Files Expected: 1 Craig Files Restored: 0 Craig Bytes Restored: 0 Craig Rate: 0.0 KB/s Craig FD Errors: 0 Craig FD termination status: Error Craig SD termination status: Error Craig Termination:*** Restore Error *** Craig This machine runs all bacula daemons. The director and sd run as the bacula Craig user, and the sd runs as root. /var/lib/bacula is writable by the bacula user Craig (and root, obviously :-). Craig I've tried modifying the job and redirecting the bootstrap file elsewhere (eg Craig /tmp), but I keep getting the same errors. I have never run a successful Craig restore using bconsole. I'm forced to use bextract to do all my restores, Craig which works fine. Craig Any ideas? The error comes from the SD. So helmsdeep and newman are the same machine? Craig helmsdeep is the name of my bacula director, which runs on newman, Craig so yes, these are the same machine. 
Does getting the same errors mean that it always says /var/lib/bacula even when the bootstrap should be in /tmp? Craig Perhaps I'm a little unclear on what bootstraps are involved. By Craig default, a bootstrap for the job is placed Craig in /var/lib/bacula/restore.bsr. If I change this elsewhere by modifying Craig the job before it is run, ie, /tmp/restore.bsr, it fails because it Craig can't find the file - there is no /tmp/restore.bsr. But even though it Craig creates the bootstrap successfully to /var/lib/bacula/restore.bsr, it Craig still wants to Craig create /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap, at Craig which point if fails with permission denied. Ah, I see. From the output, it looks like your File Daemon and Store Daemon are both called newman (in the conf files)? What happens if you rename one of the them to something else? I'm wondering if there is a filename clash somehere. Yes, if two daemons are running on the same machine sharing the same working directory they *MUST* have unique names. __Martin --- This SF.Net email is sponsored by: Power Architecture Resource Center: Free content, downloads, discussions, and more. http://solutions.newsforge.com/ibmarch.tmpl ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users --- This SF.Net email is sponsored by the JBoss Inc. Get Certified Today * Register for a JBoss Training Course Free Certification Exam for All Training Attendees Through End of 2005 Visit http://www.jboss.com/services/certification for more information ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Restore errors: Permission Denied
On Thu, 20 Oct 2005 16:29:17 +1000, Craig Holyoak [EMAIL PROTECTED] said: Craig On Wed, 2005-10-19 at 11:17 +0100, Martin Simmons wrote:=20 On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak [EMAIL PROTECTED] said: Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a restore Craig job, it fails with: Craig 19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50 Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=3DPermission denied Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=3DConnection reset by peer Craig 19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53 Craig JobId: 878 Craig Job:Restore.2005-10-19_11.29.50 Craig Client: newman Craig Start time: 19-Oct-2005 11:29:52 Craig End time: 19-Oct-2005 11:29:53 Craig Files Expected: 1 Craig Files Restored: 0 Craig Bytes Restored: 0 Craig Rate: 0.0 KB/s Craig FD Errors: 0 Craig FD termination status: Error Craig SD termination status: Error Craig Termination:*** Restore Error *** Craig This machine runs all bacula daemons. The director and sd run as the bacula Craig user, and the sd runs as root. /var/lib/bacula is writable by the bacula user Craig (and root, obviously :-). Craig I've tried modifying the job and redirecting the bootstrap file elsewhere (eg Craig /tmp), but I keep getting the same errors. I have never run a successful Craig restore using bconsole. I'm forced to use bextract to do all my restores, Craig which works fine. Craig Any ideas? The error comes from the SD. So helmsdeep and newman are the same machine? Craig helmsdeep is the name of my bacula director, which runs on newman, Craig so yes, these are the same machine. 
Does getting the same errors mean that it always says /var/lib/bacula even when the bootstrap should be in /tmp? Craig Perhaps I'm a little unclear on what bootstraps are involved. By Craig default, a bootstrap for the job is placed Craig in /var/lib/bacula/restore.bsr. If I change this elsewhere by modifying Craig the job before it is run, ie, /tmp/restore.bsr, it fails because it Craig can't find the file - there is no /tmp/restore.bsr. But even though it Craig creates the bootstrap successfully to /var/lib/bacula/restore.bsr, it Craig still wants to Craig create /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap, at Craig which point if fails with permission denied. Ah, I see. From the output, it looks like your File Daemon and Store Daemon are both called newman (in the conf files)? What happens if you rename one of the them to something else? I'm wondering if there is a filename clash somehere. __Martin --- This SF.Net email is sponsored by: Power Architecture Resource Center: Free content, downloads, discussions, and more. http://solutions.newsforge.com/ibmarch.tmpl ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Restore errors: Permission Denied [SOLVED]
On Fri, 2005-10-21 at 20:11 +0100, Martin Simmons wrote:

On Thu, 20 Oct 2005 16:29:17 +1000, Craig Holyoak [EMAIL PROTECTED] said:

Craig On Wed, 2005-10-19 at 11:17 +0100, Martin Simmons wrote:

On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak [EMAIL PROTECTED] said:

Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a restore
Craig job, it fails with:
Craig 19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50
Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission denied
Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection reset by peer
Craig 19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53
Craig JobId: 878
Craig Job: Restore.2005-10-19_11.29.50
Craig Client: newman
Craig Start time: 19-Oct-2005 11:29:52
Craig End time: 19-Oct-2005 11:29:53
Craig Files Expected: 1
Craig Files Restored: 0
Craig Bytes Restored: 0
Craig Rate: 0.0 KB/s
Craig FD Errors: 0
Craig FD termination status: Error
Craig SD termination status: Error
Craig Termination: *** Restore Error ***
Craig This machine runs all bacula daemons. The director and sd run as the bacula
Craig user, and the sd runs as root. /var/lib/bacula is writable by the bacula user
Craig (and root, obviously :-).
Craig I've tried modifying the job and redirecting the bootstrap file elsewhere (eg
Craig /tmp), but I keep getting the same errors. I have never run a successful
Craig restore using bconsole. I'm forced to use bextract to do all my restores,
Craig which works fine.
Craig Any ideas?

The error comes from the SD. So helmsdeep and newman are the same machine?

Craig helmsdeep is the name of my bacula director, which runs on newman,
Craig so yes, these are the same machine.

Does getting the same errors mean that it always says /var/lib/bacula even when the bootstrap should be in /tmp?

Craig Perhaps I'm a little unclear on what bootstraps are involved. By
Craig default, a bootstrap for the job is placed
Craig in /var/lib/bacula/restore.bsr. If I change this elsewhere by modifying
Craig the job before it is run, ie, /tmp/restore.bsr, it fails because it
Craig can't find the file - there is no /tmp/restore.bsr. But even though it
Craig creates the bootstrap successfully to /var/lib/bacula/restore.bsr, it
Craig still wants to
Craig create /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap, at
Craig which point it fails with permission denied.

Ah, I see. From the output, it looks like your File Daemon and Storage Daemon are both called newman (in the conf files)? What happens if you rename one of them to something else? I'm wondering if there is a filename clash somewhere.

Renaming the SD to newman-sd worked perfectly. Thanks!

Craig

--
Craig Holyoak [EMAIL PROTECTED] http://www.helmsdeep.org/
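The fix above came down to the File Daemon and Storage Daemon sharing one name. A rough way to check for that is sketched below. It is only a sketch under assumptions: the function names are made up for illustration, the config paths are whatever your installation uses, and the parsing assumes each daemon's own resource opens with "FileDaemon {" or "Storage {" on one line and contains a plain "Name = value" line, which simplifies the real config grammar.

```shell
#!/bin/sh
# Print the Name of the first resource of type $1 found in config file $2.
# Assumes "<Type> {" opens the resource on its own line and that the
# resource contains a line of the form "Name = value".
daemon_name() {
  awk -v res="$1" '
    !inside && $1 == res && index($0, "{") { inside = 1; next }
    inside && (tolower($1) == "name" || tolower($1) ~ /^name=/) {
      line = $0
      sub(/^[^=]*=[ \t]*/, "", line)   # drop everything up to the equals sign
      sub(/[ \t]*#.*$/, "", line)      # drop a trailing comment
      sub(/[ \t]+$/, "", line)         # drop trailing blanks
      gsub(/"/, "", line)              # drop optional quotes
      print line
      exit
    }
    inside && index($0, "}") { inside = 0 }
  ' "$2"
}

# Compare the File Daemon name (from file $1) with the Storage Daemon
# name (from file $2). Returns 1 when both daemons share a name.
check_name_clash() {
  fd_name=$(daemon_name FileDaemon "$1")
  sd_name=$(daemon_name Storage "$2")
  if [ -n "$fd_name" ] && [ "$fd_name" = "$sd_name" ]; then
    echo "clash: FD and SD are both named $fd_name"
    return 1
  fi
  echo "ok: FD=$fd_name SD=$sd_name"
}
```

Called as check_name_clash with the paths of the FD and SD config files, it would have reported a clash on "newman" here. Why the clash ends in "Permission denied" is not spelled out in the thread; Martin's guess was that both daemons try to use the same bootstrap filename in the shared working directory.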
Re: [Bacula-users] Restore errors: Permission Denied
On Wed, 2005-10-19 at 11:17 +0100, Martin Simmons wrote:

On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak [EMAIL PROTECTED] said:

Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a restore
Craig job, it fails with:
Craig 19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50
Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission denied
Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection reset by peer
Craig 19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53
Craig JobId: 878
Craig Job: Restore.2005-10-19_11.29.50
Craig Client: newman
Craig Start time: 19-Oct-2005 11:29:52
Craig End time: 19-Oct-2005 11:29:53
Craig Files Expected: 1
Craig Files Restored: 0
Craig Bytes Restored: 0
Craig Rate: 0.0 KB/s
Craig FD Errors: 0
Craig FD termination status: Error
Craig SD termination status: Error
Craig Termination: *** Restore Error ***
Craig This machine runs all bacula daemons. The director and sd run as the bacula
Craig user, and the sd runs as root. /var/lib/bacula is writable by the bacula user
Craig (and root, obviously :-).
Craig I've tried modifying the job and redirecting the bootstrap file elsewhere (eg
Craig /tmp), but I keep getting the same errors. I have never run a successful
Craig restore using bconsole. I'm forced to use bextract to do all my restores,
Craig which works fine.
Craig Any ideas?

The error comes from the SD. So helmsdeep and newman are the same machine?

helmsdeep is the name of my bacula director, which runs on newman, so yes, these are the same machine.

Does getting the same errors mean that it always says /var/lib/bacula even when the bootstrap should be in /tmp?

Perhaps I'm a little unclear on what bootstraps are involved. By default, a bootstrap for the job is placed in /var/lib/bacula/restore.bsr. If I change this elsewhere by modifying the job before it is run, ie, /tmp/restore.bsr, it fails because it can't find the file - there is no /tmp/restore.bsr. But even though it creates the bootstrap successfully to /var/lib/bacula/restore.bsr, it still wants to create /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap, at which point it fails with permission denied.

What is the output of

ls -la /var/lib/bacula
ls -la /var/lib
ls -la /var

on the SD?

[newman:~]# ls -la /var/lib/bacula
total 528848
drwxrwxr-x 2 bacula backups 4096 2005-10-20 07:21 .
drwxr-xr-x 36 root root 4096 2005-09-17 14:03 ..
-rw-r- 1 bacula backups 516 2005-10-20 01:18 BackupCatalog.bsr
-rw-r- 1 bacula backups 539603968 2005-06-28 09:52 bacula.db.old
-rw-r- 1 bacula backups 2032 2005-10-19 11:28 bacula-dir.9101.state
-rw-r- 1 root root 2032 2005-10-19 11:28 bacula-fd.9102.state
-rw-r- 1 bacula tape 2032 2005-10-19 11:28 bacula-sd.9103.state
-rw-r- 1 bacula backups 103 2004-11-28 12:39 CVS.bsr
-rw-r- 1 bacula backups 2779 2004-09-19 16:13 Data.bsr
-rw--- 1 bacula backups 0 2005-10-19 11:31 helmsdeep.conmsg
-rw--- 1 bacula backups 0 2004-09-17 17:45 helmsdeep-dir.conmsg
-rw-r- 1 bacula backups 1303 2005-10-20 01:11 Home.bsr
-rwxrwx--- 1 bacula backups 1319129 2005-10-20 01:18 log
-rw-r- 1 bacula backups 523 2005-10-20 01:12 Mail.bsr
-rw-r- 1 bacula backups 430 2004-11-28 12:55 Music.bsr
-rw--- 1 bacula backups 0 2004-09-16 08:27 newman-dir.conmsg
-rw-r- 1 bacula backups 633 2005-10-20 01:05 NewmanRoot.bsr
-rw-r- 1 bacula backups 745 2005-10-20 01:07 PenfoldRoot.bsr
-rw-r- 1 bacula backups 107 2005-10-16 03:00 Public.bsr
-rw-r- 1 bacula backups 107 2005-10-19 11:29 restore.bsr
-rw-r- 1 root root 73 2005-07-08 10:11 root-exclude
-rw-r- 1 root root 2 2004-11-15 09:20 root-include
-rw-r- 1 bacula backups 104 2005-10-16 03:00 Source.bsr

[newman:~]# ls -la /var/lib
total 144
drwxr-xr-x 36 root root 4096 2005-09-17 14:03 .
drwxr-xr-x 14 root root 4096 2005-07-12 16:57 ..
drwxr-xr-x 2 root root 4096 2005-05-12 15:35 apache2
drwxr-xr-x 3 root root 4096 2005-07-18 13:21 apt
drwxr-xr-x 2 root root 4096 2004-07-31 17:51 aptitude
drwxr-xr-x 2 root root 4096 2005-03-04 02:21 apt-proxy
drwxrwxr-x 2 bacula backups 4096 2005-10-20 07:21 bacula
drwxr-xr-x 2 clamav
Re: [Bacula-users] Restore errors: Permission Denied
On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak [EMAIL PROTECTED] said:

Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a restore
Craig job, it fails with:
Craig 19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50
Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission denied
Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection reset by peer
Craig 19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53
Craig JobId: 878
Craig Job: Restore.2005-10-19_11.29.50
Craig Client: newman
Craig Start time: 19-Oct-2005 11:29:52
Craig End time: 19-Oct-2005 11:29:53
Craig Files Expected: 1
Craig Files Restored: 0
Craig Bytes Restored: 0
Craig Rate: 0.0 KB/s
Craig FD Errors: 0
Craig FD termination status: Error
Craig SD termination status: Error
Craig Termination: *** Restore Error ***
Craig This machine runs all bacula daemons. The director and sd run as the bacula
Craig user, and the sd runs as root. /var/lib/bacula is writable by the bacula user
Craig (and root, obviously :-).
Craig I've tried modifying the job and redirecting the bootstrap file elsewhere (eg
Craig /tmp), but I keep getting the same errors. I have never run a successful
Craig restore using bconsole. I'm forced to use bextract to do all my restores,
Craig which works fine.
Craig Any ideas?

The error comes from the SD. So helmsdeep and newman are the same machine?

Does getting the same errors mean that it always says /var/lib/bacula even when the bootstrap should be in /tmp?

What is the output of

ls -la /var/lib/bacula
ls -la /var/lib
ls -la /var

on the SD?

Does

touch /var/lib/bacula/touch-test

work as the bacula user on the SD?
__Martin

Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
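The touch test Martin asks for can be wrapped in a small helper so that it cleans up after itself. This is a minimal sketch, not anything shipped with Bacula: the function name and the probe filename are made up for illustration, and it only tests whether the user running it can create a new file in the given directory.

```shell
#!/bin/sh
# Try to create, then remove, a probe file in directory $1 as the current user.
# Prints the outcome; returns 0 on success and 1 on failure.
can_create_in() {
  probe="$1/touch-test.$$"
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    echo "ok: can create files in $1"
  else
    echo "failed: cannot create files in $1"
    return 1
  fi
}
```

It needs to run from a shell started as the bacula user (for example via su or sudo), since a result obtained as root says nothing about the daemon's permissions. A passing result only shows that new files can be created; it would not reveal an existing file in the directory that belongs to another user.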
[Bacula-users] Restore errors: Permission Denied
I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a restore job, it fails with:

19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50
19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission denied
19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection reset by peer
19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53
JobId: 878
Job: Restore.2005-10-19_11.29.50
Client: newman
Start time: 19-Oct-2005 11:29:52
End time: 19-Oct-2005 11:29:53
Files Expected: 1
Files Restored: 0
Bytes Restored: 0
Rate: 0.0 KB/s
FD Errors: 0
FD termination status: Error
SD termination status: Error
Termination: *** Restore Error ***

This machine runs all bacula daemons. The director and sd run as the bacula user, and the sd runs as root. /var/lib/bacula is writable by the bacula user (and root, obviously :-).

I've tried modifying the job and redirecting the bootstrap file elsewhere (eg /tmp), but I keep getting the same errors. I have never run a successful restore using bconsole. I'm forced to use bextract to do all my restores, which works fine.

Any ideas?

Thanks,
Craig

--
Craig Holyoak [EMAIL PROTECTED] http://www.helmsdeep.org/