Re: [Bacula-users] Restore Errors

2017-07-06 Thread Kern Sibbald

  
  
Hello Sergio,
Yes, it has all signs of being a hardware error.

A critical target error with a sector address that is reasonable
  for Bacula's byte address sounds to me like the OS had a write
  error so the disk could not be read correctly by Bacula.  The time
  of the kernel error is after the Bacula read, so perhaps sometime
  after the disk was written by Bacula it started going bad.
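As a rough plausibility check of the two addresses, they can be turned into byte offsets. This is only a sketch: it assumes that for a file device Bacula's file:blk pair is the high and low 32 bits of a 64-bit byte offset within the volume, and it relies on the kernel block layer counting sectors in 512-byte units. The function names are made up for illustration.

```shell
#!/bin/sh
# Byte offset inside the volume file, ASSUMING file:blk = high:low 32 bits.
bacula_offset() { echo $(( ($1 << 32) | $2 )); }

# Byte offset from the start of the block device (512-byte kernel sectors).
sector_offset() { echo $(( $1 * 512 )); }

bacula_offset 0 397845547     # address from the bls/bacula-sd error
sector_offset 495437784       # sector from the blk_update_request line
```

The two results are not directly comparable, because the first is relative to the volume file and the second to the whole of sda; the check only shows whether the failing sector can plausibly lie inside the volume.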
About the only thing you can do now is run the restore
  (either with bextract or on the bacula-sd execution line) with -p,
  so that Bacula will try to ignore errors.  You might be able to
  restore something.
Best regards,
Kern
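A minimal sketch of what such a bextract run could look like. The config path follows the example given elsewhere in this thread, the archive directory is the one from the error messages, and the volume name and restore directory are placeholders, not values from the thread. The helper only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Print a bextract command line that uses -p (proceed in spite of I/O errors).
# Arguments: config file, volume name, archive device/directory, restore dir.
# build_bextract_cmd is a hypothetical helper, not part of Bacula.
build_bextract_cmd() {
    printf 'bextract -p -c %s -V %s %s %s\n' "$1" "$2" "$3" "$4"
}

# Volume-0001 and /tmp/bacula-restores are placeholders.
build_bextract_cmd /etc/bacula/bacula-sd.conf Volume-0001 \
    /backup/external/2week /tmp/bacula-restores
```

If the printed line looks right, it can be pasted into a root shell; restoring to a different disk than the failing one seems prudent.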


On 07/06/2017 04:57 PM, Sergio Belkin wrote:

> Thanks Kern and Wanderlei,
>
> I've tried bls and I get:
>
> 06-Jul 11:48 bls JobId 0: Error: block.c:429 Read error on fd=3 at file:blk
> 0:397845547 on device "WStorage2" (/backup/external/2week).
> ERR=Input/output error.
> 06-Jul 11:48 bls JobId 0: Error: read_records.c:124 block.c:429 Read error
> on fd=3 at file:blk 0:397845547 on device "WStorage2"
> (/backup/external/2week). ERR=Input/output error.
>
> Also, in the kernel logs I've found:
>
> Jul 06 11:55:22 helsinki.infoestructura.local kernel: blk_update_request:
> critical target error, dev sda, sector 495437784
>
> So I guess it is a hardware error, isn't it?
>
> Greetings
>
> 2017-07-05 4:32 GMT-03:00 Kern Sibbald:
>
>> Yes, Bacula is telling you that it is a disk I/O error.
>>
>> I suggest you check your kernel log.  This looks like a hardware I/O
>> error.  If you find something in your kernel log, you should check your
>> disk drive very carefully; it may be going bad.  If there are no problems
>> noted in the kernel log, then there is some other problem -- such as an
>> interface (or network) error with the external disk.
>>
>> Best regards,
>> Kern
>>
>> On 07/04/2017 06:50 PM, Sergio Belkin wrote:
>>
>>> Hi,
>>>
>>> When I run restore, I have the following errors:
>>>
>>> 04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7
>>> at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week).
>>> ERR=Input/output error.
>>> 04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429
>>> Read error on fd=7 at file:blk 0:397845547 on device "WStorage2"
>>> (/backup/external/2week). ERR=Input/output error.
>>>
>>> I've found this documentation:
>>> http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111
>>>
>>> but I use a File Device (on an external disk), not a tape
>>>
>>> Bacula version of director is 7.0
>>>
>>> Is it a disk error? Could it be a misconfiguration?
>>>
>>> Thanks in advance
>>> --
>>> Sergio Belkin
>>> LPIC-2 Certified - http://www.lpi.org


Re: [Bacula-users] Restore Errors

2017-07-06 Thread Heitor Faria
> Thanks Kern and Wanderlei,
Hello, Sergio, 

> I've tried bls and I get:
> 06-Jul 11:48 bls JobId 0: Error: block.c:429 Read error on fd=3 at file:blk
> 0:397845547 on device "WStorage2" (/backup/external/2week). ERR=Input/output
> error.
> 06-Jul 11:48 bls JobId 0: Error: read_records.c:124 block.c:429 Read error on
> fd=3 at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week).
> ERR=Input/output error.

> Also, in kernel logs:

> I've found:

> Jul 06 11:55:22 helsinki.infoestructura.local kernel: blk_update_request:
> critical target error, dev sda, sector 495437784

> So I guess is a hardware error, isn't it?

Sorry for jumping in. Did you try running fsck?

Regards, 
-- 
=== 
Heitor Medrado de Faria | EB-1 Visa | LPIC-III | ITIL-F | EMC 05-001| Bacula 
Systems Certified Administrator II 
• Do you need Bacula training? http://bacula.us/video-classes/ 
+55 61 98268-4220 | http://bacula.us 
=== 
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Restore Errors

2017-07-06 Thread Sergio Belkin
Thanks Kern and Wanderlei,

I've tried bls and I get:

06-Jul 11:48 bls JobId 0: Error: block.c:429 Read error on fd=3 at file:blk
0:397845547 on device "WStorage2" (/backup/external/2week).
ERR=Input/output error.
06-Jul 11:48 bls JobId 0: Error: read_records.c:124 block.c:429 Read error
on fd=3 at file:blk 0:397845547 on device "WStorage2"
(/backup/external/2week). ERR=Input/output error.

Also, in the kernel logs I've found:

Jul 06 11:55:22 helsinki.infoestructura.local kernel: blk_update_request:
critical target error, dev sda, sector 495437784

So I guess it is a hardware error, isn't it?

Greetings


2017-07-05 4:32 GMT-03:00 Kern Sibbald :

> Yes, Bacula is telling you that it is a disk I/O error.
>
> I suggest you check your kernel log.  This looks like a hardware I/O
> error.  If you find something in your kernel log, you should check your
> disk drive very carefully, it may be going bad.  If there are no problems
> noted in the kernel log, then there is some other problem -- such as an
> interface (or network) error with the external disk).
>
> Best regards,
>
> Kern
>
> On 07/04/2017 06:50 PM, Sergio Belkin wrote:
>
> Hi,
>
> When  I run restore, I have the following errors:
>
> 04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7
> at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week).
> ERR=Input/output error.
> 04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429
> Read error on fd=7 at file:blk 0:397845547 on device "WStorage2"
> (/backup/external/2week). ERR=Input/output error.
>
>
> I've found this documentation:
> http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111
>
> but I use a File Device (on an external disk), not a tape
>
> Bacula version of director is 7.0
>
> Is it a disk error? Could it be a misconfiguration?
>
> Thanks in advance
> --
> --
> Sergio Belkin
> LPIC-2 Certified - http://www.lpi.org
>
>


-- 
--
Sergio Belkin
LPIC-2 Certified - http://www.lpi.org


Re: [Bacula-users] Restore Errors

2017-07-05 Thread Kern Sibbald

  
  
Yes, Bacula is telling you that it is a disk I/O error.

I suggest you check your kernel log.  This looks like a hardware
  I/O error.  If you find something in your kernel log, you should
  check your disk drive very carefully; it may be going bad.  If
  there are no problems noted in the kernel log, then there is some
  other problem -- such as an interface (or network) error with the
  external disk.
Best regards,
Kern
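A small sketch of how the kernel log could be searched for this kind of error. The pattern list is an assumption and certainly not exhaustive, and find_disk_errors is a made-up helper name; journalctl, dmesg and smartctl are the standard tools, shown only as commented usage hints.

```shell
#!/bin/sh
# Keep only kernel messages that usually point at a failing block device.
# Reads log lines on stdin; the patterns are illustrative assumptions.
find_disk_errors() {
    grep -E 'blk_update_request|I/O error|critical (target|medium) error'
}

# Typical use (run as root):
#   journalctl -k | find_disk_errors      # systemd hosts
#   dmesg | find_disk_errors              # any Linux host
#   smartctl -H /dev/sdX                  # SMART health summary of the suspect disk
```

Any hit for the device that holds the volume is a good reason to copy the data off that disk before doing anything else.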


On 07/04/2017 06:50 PM, Sergio Belkin wrote:

> Hi,
>
> When I run restore, I have the following errors:
>
> 04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7
> at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week).
> ERR=Input/output error.
> 04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429
> Read error on fd=7 at file:blk 0:397845547 on device "WStorage2"
> (/backup/external/2week). ERR=Input/output error.
>
> I've found this documentation:
> http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111
>
> but I use a File Device (on an external disk), not a tape
>
> Bacula version of director is 7.0
>
> Is it a disk error? Could it be a misconfiguration?
>
> Thanks in advance
> --
> Sergio Belkin
> LPIC-2 Certified - http://www.lpi.org


Re: [Bacula-users] Restore Errors

2017-07-04 Thread Wanderlei Huttel
Hello Sergio

The volume probably has an error.

You can use bls and bextract to try to recover some data from the volume.

From the shell command line, the syntax is:

bls -c /etc/bacula/bacula-sd.conf -V Volume-0001 /path/for/storage

bextract -c /etc/bacula/bacula-sd.conf -V Volume-0001 /path/for/storage /path/for/restore



Best regards

*Wanderlei Hüttel*
http://www.huttel.com.br

2017-07-04 13:50 GMT-03:00 Sergio Belkin :

> Hi,
>
> When  I run restore, I have the following errors:
>
> 04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7
> at file:blk 0:397845547 on device "WStorage2" (/backup/external/2week).
> ERR=Input/output error.
> 04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429
> Read error on fd=7 at file:blk 0:397845547 on device "WStorage2"
> (/backup/external/2week). ERR=Input/output error.
>
>
> I've found this documentation:
> http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111
>
> but I use a File Device (on an external disk), not a tape
>
> Bacula version of director is 7.0
>
> Is it a disk error? Could it be a misconfiguration?
>
> Thanks in advance
> --
> --
> Sergio Belkin
> LPIC-2 Certified - http://www.lpi.org
>
> 


[Bacula-users] Restore Errors

2017-07-04 Thread Sergio Belkin
Hi,

When I run restore, I have the following errors:

04-Jul 13:39 bacula-sd JobId 2094: Error: block.c:429 Read error on fd=7 at
file:blk 0:397845547 on device "WStorage2" (/backup/external/2week).
ERR=Input/output error.
04-Jul 13:39 bacula-sd JobId 2094: Error: read_records.c:124 block.c:429
Read error on fd=7 at file:blk 0:397845547 on device "WStorage2"
(/backup/external/2week). ERR=Input/output error.


I've found this documentation:
http://www.bacula.org/5.2.x-manuals/en/main/main/Restore_Command.html#SECTION002111

but I use a File Device (on an external disk), not a tape

Bacula version of director is 7.0

Is it a disk error? Could it be a misconfiguration?

Thanks in advance
-- 
--
Sergio Belkin
LPIC-2 Certified - http://www.lpi.org


[Bacula-users] Restore Errors

2017-02-13 Thread Michael Watters
Hello,

I am attempting to perform a test restore using bacula; however, the job
is failing to restore symlinks which are backed up on the volume.

Here is the error from the console.

13-Feb 10:53 rt01.example.com JobId 8339: Error: create_file.c:305 Could
not symlink /storage/home/55/etc/rc.d/rc4.d/S26apmd -> ../init.d/apmd:
ERR=No such file or directory

Is there a way to make bacula restore symlinks properly?  I do not care
if it is restored as a broken link.
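One possible reading of the error, offered as a sketch rather than a diagnosis: symlink(2) does not require the link target to exist, and it reports "No such file or directory" when a directory component of the link path itself is missing. If that is the case here, the directory /storage/home/55/etc/rc.d/rc4.d was not present at restore time. The demo below reproduces the idea in a scratch directory; make_link is a hypothetical helper, not part of Bacula.

```shell
#!/bin/sh
# Create a symlink whose target does not exist, after making sure the
# directory that will hold the link does exist.
make_link() {
    target=$1
    link=$2
    mkdir -p "$(dirname "$link")" && ln -s "$target" "$link"
}

demo=$(mktemp -d)
make_link ../init.d/apmd "$demo/etc/rc.d/rc4.d/S26apmd"
ls -l "$demo/etc/rc.d/rc4.d/S26apmd"   # shows a dangling link, which is legal
```

If the parent directory turns out to be the issue, including it in the restore selection (or creating it beforehand) may be enough.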




[Bacula-users] Restore Errors, terminates early

2008-11-06 Thread Roland Roberts
Below is the console log from my failing restore job.  As you can see, the
number of files restored is WAY low.  I'm trying to figure out how to get
what I can out of this restore.

I've done small restores before: a file or two, or even a small directory.
This is the first time I've had to restore a lot of stuff.

Unfortunately, this is not a test :-(

Any ideas?

TIA,

roland


Run Restore job
JobName:    RestoreFiles
Bootstrap:  /var/spool/bacula/archos-dir.restore.3.bsr
Where:      /tmp/bacula-restores
Replace:    always
FileSet:    System Set
Client:     aristarchus-fd
Storage:    File
When:       2008-11-06 14:06:05
Catalog:    MyCatalog
Priority:   10
OK to run? (yes/mod/no): yes
Job queued. JobId=289
06-Nov 14:06 archos-dir: Start Restore Job RestoreFiles.2008-11-06_14.06.08
06-Nov 14:06 archos-sd: Ready to read from volume Aristarchus-0004 on
device FileStorage (/backup).
06-Nov 14:06 archos-sd: Forward spacing Volume Aristarchus-0004 to
file:block 15:2571201622.
06-Nov 14:10 archos-sd: RestoreFiles.2008-11-06_14.06.08 Error: block.c:317
Volume data error at 18:63957951!
Block checksum mismatch in block=1199365 len=64512: calc=90c6023e blk=c0deabb4
06-Nov 14:10 aristarchus-fd JobId 289: Error: attribs.c:421 File size of
restored file
/tmp/bacula-restores/home/roland/tmp/20080429-AstroTrac/img_3470.png not
correct. Original 36831200, restored 26476544.
06-Nov 14:10 archos-dir: RestoreFiles.2008-11-06_14.06.08 Error: Bacula
2.0.3 (06Mar07): 06-Nov-2008 14:10:12
  JobId:                  289
  Job:                    RestoreFiles.2008-11-06_14.06.08
  Client:                 aristarchus-fd
  Start time:             06-Nov-2008 14:06:10
  End time:               06-Nov-2008 14:10:12
  Files Expected:         126,091
  Files Restored:         14,446
  Bytes Restored:         8,439,830,107
  Rate:                   34875.3 KB/s
  FD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Restore Error ***

06-Nov 14:10 archos-dir: Begin pruning Jobs.
06-Nov 14:10 archos-dir: No Jobs found to prune.
06-Nov 14:10 archos-dir: Begin pruning Files.
06-Nov 14:10 archos-dir: No Files found to prune.
06-Nov 14:10 archos-dir: End auto prune.
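To put a number on "WAY low", the two file counts can be pulled out of a job summary like the one above. This is just a sketch; restore_shortfall is a made-up helper that reads the summary text on stdin and expects the "Files Expected" and "Files Restored" lines in the format shown.

```shell
#!/bin/sh
# Report how many expected files are missing from a Bacula restore summary.
restore_shortfall() {
    awk -F: '
        /Files Expected/ { gsub(/[ ,]/, "", $2); expected = $2 }
        /Files Restored/ { gsub(/[ ,]/, "", $2); restored = $2 }
        END { printf "%d of %d files missing\n", expected - restored, expected }
    '
}

# Example with the two lines from the summary above.
printf '  Files Expected: 126,091\n  Files Restored: 14,446\n' | restore_shortfall
```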




-- 
                       PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD             RL Enterprises
[EMAIL PROTECTED]                  6818 Madeline Court
[EMAIL PROTECTED]                  Brooklyn, NY 11220


-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Restore Errors, terminates early

2008-11-06 Thread Roland Roberts

Roland Roberts wrote:
 Below is the console log from my failing restore job.  As you can see, the
 number of files restored is WAY low.  I'm trying to figure out how to get
 what I can out of this restore.

 I've done small restores before: a file or two, or even a small directory.
 This is the first time I've had to restore a lot of stuff.

 Unfortunately, this is not a test :-(

It would appear the problem is in the backend with quoting file names.  I
have some configuration files that were created via a Java webstart task. 
Who cares?  Well, they are arguably misconfigured 'cause they create their
config files as c:\jobwatch.properties which ends up in my home directory
as /home/roland/c\:\\jobwatch.properties.  That name doesn't get quoted
correctly in the SQL query that goes to PostgreSQL, so the query fails (and
I get an error in syslog from the postmaster).
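For illustration only, this is roughly what escaping such a name for a PostgreSQL string literal involves when backslashes are treated as escape characters (the behaviour of older servers, and of newer ones with standard_conforming_strings off): backslashes and single quotes both have to be doubled. The helper name is hypothetical and this is not how Bacula builds its queries.

```shell
#!/bin/sh
# Double backslashes and single quotes so the value can sit inside '...'
# in a SQL statement on a server that treats backslash as an escape.
sql_escape_literal() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g"
}

# The file name from this message, as created by the webstart task.
sql_escape_literal 'c:\jobwatch.properties'
echo
```

In application code a parameterized query or the client library's own escaping function would be the safer route.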

I've just unmarked those files and will see how far I can get now.  It is
looking better (since it is still running).

I assume I should log this as a bug

roland
-- 
                       PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD             RL Enterprises
[EMAIL PROTECTED]                  6818 Madeline Court
[EMAIL PROTECTED]                  Brooklyn, NY 11220




Re: [Bacula-users] Restore Errors, terminates early

2008-11-06 Thread Roland Roberts

Roland Roberts wrote:

 It would appear the problem is in the backend with quoting file names.  I
 have some configuration files that were created via a Java webstart task.
 Who cares?  Well, they are arguably misconfigured 'cause they create their
 config files as c:\jobwatch.properties which ends up in my home directory
 as /home/roland/c\:\\jobwatch.properties.  That name doesn't get quoted
 correctly in the SQL query that goes to PostgreSQL, so the query fails (and
 I get an error in syslog from the postmaster).

 I've just unmarked those files and will see how far I can get now.  It is
 looking better (since it is still running).

Well, I spoke too soon.

It's clear that this is not the whole story.  I'm not getting any logs on
the server side to help me with this.  It's still quitting early, and syslog
does show postgresql errors coincident with the job termination.  They look
like this:

Nov  6 17:14:06 archos postgres[31135]: [30-1] ERROR:  table delcandidates
does not exist
Nov  6 17:14:06 archos postgres[31135]: [30-2] STATEMENT:  DROP TABLE
DelCandidates
Nov  6 17:14:06 archos postgres[31135]: [31-1] ERROR:  index delinx1 does
not exist
Nov  6 17:14:06 archos postgres[31135]: [31-2] STATEMENT:  DROP INDEX DelInx1
Nov  6 17:14:06 archos postgres[31135]: [32-1] ERROR:  index delinx1 does
not exist
Nov  6 17:14:06 archos postgres[31135]: [32-2] STATEMENT:  DROP INDEX DelInx1

But that may be innocuous as I also seem to get this message when I *issue*
the restore command:

Nov  6 17:22:09 archos postgres[31135]: [33-1] ERROR:  table temp does not
exist
Nov  6 17:22:09 archos postgres[31135]: [33-2] STATEMENT:  DROP TABLE temp
Nov  6 17:22:09 archos postgres[31135]: [34-1] ERROR:  table temp1 does
not exist
Nov  6 17:22:09 archos postgres[31135]: [34-2] STATEMENT:  DROP TABLE temp1

The restore seems to terminate when it gets any error like a file size not
matching.  This isn't what I expected from the manual where I expected it to
continue on until all files were restored as best as possible.

I'm now picking directories, one at a time, and restoring them.

Any better ideas on tracking this down?

roland

-- 
                       PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD             RL Enterprises
[EMAIL PROTECTED]                  6818 Madeline Court
[EMAIL PROTECTED]                  Brooklyn, NY 11220




Re: [Bacula-users] Restore errors

2007-07-31 Thread Mair Wolfgang-awm013
I started the DB check yesterday morning. It is still running. How long
does this usually take? Or am I doing something wrong?

Wolfgang 

-Original Message-
From: Julien [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 30, 2007 10:22
To: Mair Wolfgang-awm013
Cc: Doytchin Spiridonov; bacula-users
Subject: Re: [Bacula-users] Restore errors

Hi Mair,

Did you try dbcheck?

(dbcheck -c /path/to/bacula-dir.conf ; Toggle modify database flag ; 16) All (3-15))

I also had a huge difference between files expected and files restored,
but the operation above fixed that.

Regards,
Julien


On Mon, 2007-07-30 at 09:03 +0100, Mair Wolfgang-awm013 wrote:
 Hello,

 In my case spooling brought a remarkable improvement. Whereas I had
 hundreds of errors on one restore, I hardly see them any more with
 spooling in place.

 Doytchin,
 I also saw the same behavior as you did. With concurrent jobs = 1,
 there were no errors, with or without spooling.

 Unfortunately, setting concurrent jobs = 1 is not an option in our
 environment. So my current setting is spooling = on and concurrent
 jobs = 5.

 With these settings, it looks like the Linux systems (OpenSuse and
 Red Hat) are OK, but Solaris still has problems.

 For example, below is a restore I did on Friday. Of course I don't mind
 about the door files. But the difference between expected and restored
 files is way too much. And even worse, I have no idea what happened to
 the missing files. I don't know if this has something to do with the
 restore errors we saw. It could also be something different.

 During the weekend I moved bacula to a new and separate server. It
 runs on OpenSuse 10.2 with the latest patches in place now. The few
 restore jobs I've done so far with this went OK.

 All in all I still don't feel very comfortable with this; it needs
 more testing. I will continue testing and keep you updated.

 Wolfgang
 
 
 27-Jul 11:07 porsche-dir: Start Restore Job RestoreFiles.2007-07-27_11.07.03
 27-Jul 11:07 porsche-sd: Ready to read from volume full-27-7-2007.20 on device FileStorageFull (/export/bacula-dump).
 27-Jul 11:07 porsche-sd: Forward spacing Volume full-27-7-2007.20 to file:block 0:3999802558.
 27-Jul 11:08 porsche-sd: End of file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.20
 27-Jul 11:08 porsche-sd: End of Volume at file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.20
 27-Jul 11:08 porsche-sd: Ready to read from volume full-27-7-2007.21 on device FileStorageFull (/export/bacula-dump).
 27-Jul 11:08 porsche-sd: Forward spacing Volume full-27-7-2007.21 to file:block 0:200.
 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/dev/.zone_reg_door: ERR=Invalid argument
 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/dev/.devfsadm_synch_door: ERR=Invalid argument
 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/devfsadm_event_channel/reg_door: ERR=Invalid argument
 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/devfsadm_event_channel/1: ERR=Invalid argument
 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/syseventconfd_event_channel/reg_door: ERR=Invalid argument
 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/sysevent_door: ERR=Invalid argument
 27-Jul 11:33 prinz-fd: RestoreFiles.2007-07-27_11.07.03 Error: create_file.c:245 Cannot make node /export/xxx/etc/sysevent/piclevent_door: ERR=Invalid argument
 27-Jul 11:15 porsche-sd: End of file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.21
 27-Jul 11:15 porsche-sd: End of Volume at file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.21
 27-Jul 11:15 porsche-sd: Ready to read from volume full-27-7-2007.22 on device FileStorageFull (/export/bacula-dump).
 27-Jul 11:15 porsche-sd: Forward spacing Volume full-27-7-2007.22 to file:block 0:200.
 27-Jul 11:21 porsche-sd: End of file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.22
 27-Jul 11:21 porsche-sd: End of Volume at file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.22
 27-Jul 11:21 porsche-sd: Ready to read from volume full-27-7-2007.23 on device FileStorageFull (/export/bacula-dump).
 27-Jul 11:21 porsche-sd: Forward spacing Volume full-27-7-2007.23 to file:block 0:200.
 27-Jul 11:28 porsche-sd: End of file 1 on device FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.23
 27-Jul 11:28 porsche-sd: End of Volume at file 1 on device

Re: [Bacula-users] Restore errors

2007-07-31 Thread John Drescher
On 7/31/07, Mair Wolfgang-awm013 [EMAIL PROTECTED] wrote:
 I started the DB check yesterday morning. It is still running. How long
 does this usually take? Or am I doing something wrong?

This depends on how big your database is, what type of database it is,
and whether it is properly indexed.

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Restore errors

2007-07-31 Thread Doytchin Spiridonov
Hello,

It will run forever if you don't create indexes. You had better stop it
(and make sure the SQL query has really stopped, as it did not in our case)
and then do what Frank Alpeter once posted here. I'll quote
him:

---
I'm running dbcheck periodically every sunday with the following script:


#!/bin/sh

PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin

# Temporary indexes on the File table so dbcheck's lookups are not full scans.
echo "$(date) Creating temp indices for bacula database..."

mysql -ubacula <<EOFA
use bacula
CREATE INDEX file_tmp_filenameid_idx ON File (FilenameId);
CREATE INDEX file_tmp_pathid_idx ON File (PathId);
EOFA

# -f fixes inconsistencies, -b runs in batch mode, -v is verbose.
echo "$(date) Running dbcheck..."
dbcheck -c /usr/local/etc/bacula-dir.conf -f -b -v

# Drop the temporary indexes again and optimize all tables.
echo "$(date) Removing indices and optimizing bacula database..."

mysql -ubacula <<EOFB
use bacula
DROP INDEX file_tmp_filenameid_idx ON File;
DROP INDEX file_tmp_pathid_idx ON File;
OPTIMIZE TABLE UnsavedFiles, Counters, CDImages, BaseFiles, Device,
Version, Status, MediaType, Storage, FileSet, Client, Pool, Media,
Job, JobMedia, File, Path, Filename;
EOFB

echo "$(date) Done..."


Due to the additional index entries, the script runs for about 90
minutes (eternally without them) with a database of 24 million file
entries from 88 clients, mostly application servers.

It helps a lot, but still it doesn't help against orphaned entries
from previously removed clients.
---
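A cautious way to try the index part of the quoted script is to print the SQL first and only pipe it into mysql once it looks right. The sketch below does just that; the table, column and index names are the ones from the quoted script, while temp_index_sql is a made-up helper name.

```shell
#!/bin/sh
# Emit the CREATE or DROP statements for the temporary dbcheck indexes.
# Usage: temp_index_sql create|drop
temp_index_sql() {
    action=$1
    echo "use bacula"
    for col in FilenameId PathId; do
        # Index name: file_tmp_<lowercased column>_idx, as in the quoted script.
        idx="file_tmp_$(printf '%s' "$col" | tr 'A-Z' 'a-z')_idx"
        case $action in
            create) echo "CREATE INDEX $idx ON File ($col);" ;;
            drop)   echo "DROP INDEX $idx ON File;" ;;
        esac
    done
}

# Dry run: print the statements. To apply: temp_index_sql create | mysql -ubacula
temp_index_sql create
```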



Regards.




Tuesday, July 31, 2007, 3:08:22 PM:

MWa I started the DB check yesterday morning. It is still running. How long
MWa does this usually take? Or am I doing something wrong?


Re: [Bacula-users] Restore errors

2007-07-30 Thread Mair Wolfgang-awm013
 porsche-dir: No Jobs found to prune.
27-Jul 11:28 porsche-dir: Begin pruning Files.
27-Jul 11:28 porsche-dir: No Files found to prune.
27-Jul 11:28 porsche-dir: End auto prune.


 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Doytchin Spiridonov
Sent: Saturday, July 28, 2007 02:02
To: bacula-users
Subject: Re: [Bacula-users] Restore errors

Hello,

just to note that several days after a full backup and incremental
backups, restores are OK, which again suggests that the problem was caused
by running concurrent jobs.

Wolfgang do you have the same results?

Regards


Wednesday, July 25, 2007, 8:12:25 PM:

DS Hello,

DS 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for 
DS all clients.

DS Restore OK of all jobs.

DS Seems this (concurrent jobs) is the problem.

DS Regards.


DS Tuesday, July 24, 2007, 9:57:35 PM:

DS I don't have any other ideas to check with to provide more cases. 
DS It's developers turn now...






Re: [Bacula-users] Restore errors

2007-07-30 Thread Julien
: 27-Jul-2007 11:07:05
   End time:               27-Jul-2007 11:28:34
   Files Expected:         303,761
   Files Restored:         301,923
   Bytes Restored:         27,412,500,483
   Rate:                   21266.5 KB/s
   FD Errors:              7
   FD termination status:  Error
   SD termination status:  OK
   Termination:            *** Restore Error ***
 
 27-Jul 11:28 porsche-dir: Begin pruning Jobs.
 27-Jul 11:28 porsche-dir: No Jobs found to prune.
 27-Jul 11:28 porsche-dir: Begin pruning Files.
 27-Jul 11:28 porsche-dir: No Files found to prune.
 27-Jul 11:28 porsche-dir: End auto prune.
 
 
  
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 Doytchin Spiridonov
 Sent: Saturday, July 28, 2007 02:02
 To: bacula-users
 Subject: Re: [Bacula-users] Restore errors
 
 Hello,
 
 just to note that several days after a full backup and incremental
 backups, restores are OK, which again proves that the problem was caused
 by running concurrent jobs.
 
 Wolfgang do you have the same results?
 
 Regards
 
 
 Wednesday, July 25, 2007, 8:12:25 PM:
 
 DS Hello,
 
 DS 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for 
 DS all clients.
 
 DS Restore OK of all jobs.
 
 DS Seems this (concurrent jobs) is the problem.
 
 DS Regards.
 
 
 DS Tuesday, July 24, 2007, 9:57:35 PM:
 
 DS I don't have any other ideas to check with to provide more cases. 
 DS It's developers turn now...
 
 
 
 




Re: [Bacula-users] Restore errors

2007-07-30 Thread Doytchin Spiridonov
MWa on device FileStorageFull (/export/bacula-dump).
MWa 27-Jul 11:21 porsche-sd: Forward spacing Volume full-27-7-2007.23 to
MWa file:block 0:200.
MWa 27-Jul 11:28 porsche-sd: End of file 1 on device FileStorageFull
MWa (/export/bacula-dump), Volume full-27-7-2007.23
MWa 27-Jul 11:28 porsche-sd: End of Volume at file 1 on device
MWa FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.23
MWa 27-Jul 11:28 porsche-sd: End of all volumes.
MWa 27-Jul 11:28 porsche-dir: RestoreFiles.2007-07-27_11.07.03 Error: Bacula
MWa 2.0.3 (06Mar07): 27-Jul-2007 11:28:34
MWa   JobId:  40
MWa   Job:RestoreFiles.2007-07-27_11.07.03
MWa   Client: prinz-fd
MWa   Start time: 27-Jul-2007 11:07:05
MWa   End time:   27-Jul-2007 11:28:34
MWa   Files Expected: 303,761
MWa   Files Restored: 301,923
MWa   Bytes Restored: 27,412,500,483
MWa   Rate:   21266.5 KB/s
MWa   FD Errors:  7
MWa   FD termination status:  Error
MWa   SD termination status:  OK
MWa   Termination:*** Restore Error ***

MWa 27-Jul 11:28 porsche-dir: Begin pruning Jobs.
MWa 27-Jul 11:28 porsche-dir: No Jobs found to prune.
MWa 27-Jul 11:28 porsche-dir: Begin pruning Files.
MWa 27-Jul 11:28 porsche-dir: No Files found to prune.
MWa 27-Jul 11:28 porsche-dir: End auto prune.


MWa  

MWa -Original Message-
MWa From: [EMAIL PROTECTED]
MWa [mailto:[EMAIL PROTECTED] On Behalf Of
MWa Doytchin Spiridonov
MWa Sent: Saturday, July 28, 2007 02:02
MWa To: bacula-users
MWa Subject: Re: [Bacula-users] Restore errors

MWa Hello,

MWa just to note that several days after a full backup and incremental
MWa backups, restores are OK, which again proves that the problem was caused
MWa by running concurrent jobs.

MWa Wolfgang do you have the same results?

MWa Regards


MWa Wednesday, July 25, 2007, 8:12:25 PM:

DS Hello,

DS 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for 
DS all clients.

DS Restore OK of all jobs.

DS Seems this (concurrent jobs) is the problem.

DS Regards.


DS Tuesday, July 24, 2007, 9:57:35 PM:

DS I don't have any other ideas to check with to provide more cases. 
DS It's developers turn now...







Re: [Bacula-users] Restore errors

2007-07-30 Thread Julien
 of file 1 on device FileStorageFull
 MWa (/export/bacula-dump), Volume full-27-7-2007.22
 MWa 27-Jul 11:21 porsche-sd: End of Volume at file 1 on device
 MWa FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.22
 MWa 27-Jul 11:21 porsche-sd: Ready to read from volume full-27-7-2007.23
 MWa on device FileStorageFull (/export/bacula-dump).
 MWa 27-Jul 11:21 porsche-sd: Forward spacing Volume full-27-7-2007.23 to
 MWa file:block 0:200.
 MWa 27-Jul 11:28 porsche-sd: End of file 1 on device FileStorageFull
 MWa (/export/bacula-dump), Volume full-27-7-2007.23
 MWa 27-Jul 11:28 porsche-sd: End of Volume at file 1 on device
 MWa FileStorageFull (/export/bacula-dump), Volume full-27-7-2007.23
 MWa 27-Jul 11:28 porsche-sd: End of all volumes.
 MWa 27-Jul 11:28 porsche-dir: RestoreFiles.2007-07-27_11.07.03 Error: Bacula
 MWa 2.0.3 (06Mar07): 27-Jul-2007 11:28:34
 MWa   JobId:  40
 MWa   Job:RestoreFiles.2007-07-27_11.07.03
 MWa   Client: prinz-fd
 MWa   Start time: 27-Jul-2007 11:07:05
 MWa   End time:   27-Jul-2007 11:28:34
 MWa   Files Expected: 303,761
 MWa   Files Restored: 301,923
 MWa   Bytes Restored: 27,412,500,483
 MWa   Rate:   21266.5 KB/s
 MWa   FD Errors:  7
 MWa   FD termination status:  Error
 MWa   SD termination status:  OK
 MWa   Termination:*** Restore Error ***
 
 MWa 27-Jul 11:28 porsche-dir: Begin pruning Jobs.
 MWa 27-Jul 11:28 porsche-dir: No Jobs found to prune.
 MWa 27-Jul 11:28 porsche-dir: Begin pruning Files.
 MWa 27-Jul 11:28 porsche-dir: No Files found to prune.
 MWa 27-Jul 11:28 porsche-dir: End auto prune.
 
 
 MWa  
 
 MWa -Original Message-
 MWa From: [EMAIL PROTECTED]
 MWa [mailto:[EMAIL PROTECTED] On Behalf Of
 MWa Doytchin Spiridonov
 MWa Sent: Saturday, July 28, 2007 02:02
 MWa To: bacula-users
 MWa Subject: Re: [Bacula-users] Restore errors
 
 MWa Hello,
 
 MWa just to note that several days after a full backup and incremental
 MWa backups, restores are OK, which again proves that the problem was caused
 MWa by running concurrent jobs.
 
 MWa Wolfgang do you have the same results?
 
 MWa Regards
 
 
 MWa Wednesday, July 25, 2007, 8:12:25 PM:
 
 DS Hello,
 
 DS 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for 
 DS all clients.
 
 DS Restore OK of all jobs.
 
 DS Seems this (concurrent jobs) is the problem.
 
 DS Regards.
 
 
 DS Tuesday, July 24, 2007, 9:57:35 PM:
 
 DS I don't have any other ideas to check with to provide more cases. 
 DS It's developers turn now...
 
 
 
 
 




Re: [Bacula-users] Restore errors

2007-07-27 Thread Doytchin Spiridonov
Hello,

just to note that several days after a full backup and incremental
backups, restores are OK, which again proves that the problem was
caused by running concurrent jobs.

Wolfgang do you have the same results?

Regards


Wednesday, July 25, 2007, 8:12:25 PM:

DS Hello,

DS 2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for all
DS clients.

DS Restore OK of all jobs.

DS Seems this (concurrent jobs) is the problem.

DS Regards.


DS Tuesday, July 24, 2007, 9:57:35 PM:

DS I don't have any other ideas to check with to provide more cases. It's
DS developers turn now...





Re: [Bacula-users] Restore errors

2007-07-26 Thread Frank Sweetser
Steen wrote:
 Hello,
 just wondering from the sideline here:
 
 On Tuesday 24 July 2007 07:28:48 Doytchin Spiridonov wrote:
 
 1. some static files (i.e. not log files!) are restored with wrong
 (always larger) size, while first N bytes match, and the rest is
 filled with a part of another file (not sure if this is just file with
 a wrong size and some old data at the disk appears at the end, or
 bacula restores part of another file and append it to the end). The
 file can be restored correctly if marked alone

 doesn't this prove that the relevant catalog data for the file is OK?

Quite possibly; I suspect that one of the developers would have to look at the
catalog contents to say one way or the other.

 b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header
 file index 42452 not equal record index 0
 Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124
 Error sending to File daemon. ERR=Connection reset by peer
 Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306
 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection
 reset by peer
 This seems to point to an error on the fd side? How can that be related to the
 backing up part?

The error basically indicates that the FD got data from the SD it believed to
be corrupt.  This could theoretically be due to a problem on the FD side, but
it certainly is not enough rule out the director or SD.

-- 
Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
WPI Senior Network Engineer   |  is simple, elegant, and wrong. - HL Mencken
GPG fingerprint = 6174 1257 129E 0D21 D8D4  E8A3 8E39 29E3 E2E8 8CEC



Re: [Bacula-users] Restore errors

2007-07-26 Thread Doytchin Spiridonov
Hello,

Thursday, July 26, 2007, 7:38:43 PM:

S Hello,
S just wondering from the sideline here:

 The file can be restored correctly if marked alone
S doesn't this prove that the relevant catalog data for the file is OK?

This probably means that the problem is with positioning inside
volumes?

 but the error 3. below 
 is generated (which seems to be just a bogus error).

 b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header
 file index 42452 not equal record index 0
 Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124
 Error sending to File daemon. ERR=Connection reset by peer
 Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306
 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection
 reset by peer
S This seems to point to an error on the fd side? How can that be related to the
S backing up part?

No, as the same error happens when restoring to several different fds.
It happens at the same time and positions, and with different versions
(2.0.3, 2.1.26, 2.1.28). The message could come from the fd, but what if
wrong data/packets are sent to the fd, or the connection is closed by
the director or sd? I think those errors could be explained only by
the developer who wrote the software. However, as they couldn't
reproduce the problem (it seems they don't have the right test case, as
this is clearly not related to the hardware or OS) they closed the
bug report, and for us the only solution now is not to use concurrent
jobs (and I guess for everyone, if you want to be sure that one day
you can restore your backup if needed).

The tests so far show that when running jobs 1 by 1 (not concurrent)
none of the errors happen. I'm happy that we found at least a
workaround, otherwise Bacula would be useless.

The bottom line is that with Bacula, if your backups are reported to be
OK, this doesn't mean you can restore the files w/o a problem, and
it's a good idea to run full test restores periodically.
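
A periodic restore check like this can be partly automated by comparing the
"Files Expected" and "Files Restored" lines of the restore job summary. The
following is only a minimal sketch: the helper names are hypothetical, and it
assumes the summary text has the same layout as the job reports quoted in this
thread.

```python
import re


def restore_counts(summary: str) -> tuple[int, int]:
    """Return (files_expected, files_restored) parsed from a restore job summary."""

    def field(name: str) -> int:
        # Counts are printed with thousands separators, e.g. "303,761".
        match = re.search(rf"{name}:\s*([\d,]+)", summary)
        if match is None:
            raise ValueError(f"field {name!r} not found in summary")
        return int(match.group(1).replace(",", ""))

    return field("Files Expected"), field("Files Restored")


def missing_files(summary: str) -> int:
    """Number of files the catalog expected but the restore did not deliver."""
    expected, restored = restore_counts(summary)
    return expected - restored


# Excerpt of the summary posted earlier in this thread (JobId 40).
SAMPLE = """\
  Files Expected: 303,761
  Files Restored: 301,923
  FD Errors:  7
"""

if __name__ == "__main__":
    print(missing_files(SAMPLE), "files missing from the restore")
```

A non-zero result only says that the restore is incomplete; it does not say
which of the errors described in this thread caused it.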

Regards.




Re: [Bacula-users] Restore errors

2007-07-26 Thread Steen
Hello,
just wondering from the sideline here:

On Tuesday 24 July 2007 07:28:48 Doytchin Spiridonov wrote:

 1. some static files (i.e. not log files!) are restored with wrong
 (always larger) size, while first N bytes match, and the rest is
 filled with a part of another file (not sure if this is just file with
 a wrong size and some old data at the disk appears at the end, or
 bacula restores part of another file and append it to the end). The
 file can be restored correctly if marked alone
Doesn't this prove that the relevant catalog data for the file is OK?
 but the error 3. below 
 is generated (which seems to be just a bogus error). An example error is:
 ---
 b0: Restore_b0.d6.int.2007-07-23_22.37.34 Error: attribs.c:410 File size
 of restored file
 /home/bacula/res/b3.2/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm
 not correct. Original 3826291, restored 10620921.
 ---
 When this error is present (always) the second error below (but w/o
 additional error messages) is present as well (missing files)

 2. large amount of files are missing (while they are present in the
 catalog and selected) - tens of thousands (not sockets or anything
 else that Bacula ignores by default). When this happens usually an
 error like this appear (if not the first one above):
 ---
 b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header
 file index 42452 not equal record index 0
 Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124
 Error sending to File daemon. ERR=Connection reset by peer
 Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306
 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection
 reset by peer
This seems to point to an error on the fd side? How can that be related to the 
backing up part?

 ---

 3. when a file from error 1 is restored alone it is OK, but another
 bogus error is generated:
 ---
 Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275
 Volume data error at 0:3999743252! Wanted ID: BB02, got Иnлу.
 Buffer discarded.
 ---
 Found that the above number (3999743252) is not present as block
 address for any block in the volumes, but the same number appears as
 part of JobMedia record in the database.


 This is everything in 2.1.28 summarized that popped up as a problem or
 fact.
 (2.0.3 had another bug with bogus errors about sockets' attributes and
 2.1.26 had a bogus SQL error messages but those are fixed OK in
 2.1.28).

 If anyone wants, feel free to reopen the bug in Mantis (903). I'm not
 going to do so as I am personally disappointed by the attitude this
 is not a bug - work it out yourself and the suggestion to send you
 our servers as a gift to test with, plus support fees... nice. Now
 it's up to you to create better test cases to catch more bugs if any.

 We will start our backup again w/o concurrent jobs and we will
 continue to monitor restores on a daily basis as the above tests are
 just 3 and I agree there is a possibility that it was just a chance
 that the later two tests went OK. But it was my suggestion from the
 beginning that the problem is Bacula damages either database numbers
 or volume records when concurrent jobs are running and so far the
 facts proved this.


 (!) The workaround for the problem is to switch off concurrent jobs as
 if not - the chance you have invalid backups are high (some 90% from
 our own cases and at least with our servers/os/configuration; this is
 so if it is not said that 100% of backups are wrong as after
 diff/incremental backups Bacula restores files that are deleted which
 is really a bad behaviour in many cases/services).


 Regards



 Tuesday, July 24, 2007, 12:15:43 AM:

 DL On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:
  Hello,
 
  I've filed this as a bug, but while Kern couldn't reproduce it he gave
  up. So let us find here what could be the problem. There are actually
  two problems, they could be linked.

 DL Please.  If anyone can solve the issue given what you supplied, they
 DL would.  You were asked to supply a reproducible situation.  Hopefully
 DL we can get to that position quickly without further unnecessary
 DL distractions.






-- 
Regards

Steen


Re: [Bacula-users] Restore errors

2007-07-25 Thread Doytchin Spiridonov
Hello,

2nd day w/o concurrent jobs: we have 1xFULL and 1xINCREMENTAL for all
clients.

Restore OK of all jobs.

Seems this (concurrent jobs) is the problem.

Regards.


Tuesday, July 24, 2007, 9:57:35 PM:

DS I don't have any other ideas to check with to provide more cases. It's
DS developers turn now...






Re: [Bacula-users] Restore errors

2007-07-24 Thread Mair Wolfgang-awm013
Hello,

This is exactly what I experienced last week. I submitted this under the
subject: ' Restore Error of linux-install-fdFul'.

However, I haven't yet had the time to track this down as far as Doytchin did.
Great work!
This morning (before reading through all this) I also found that if I run a
single backup, the restore runs fine. My setting has concurrent jobs = 3. If I
back up with this setting I get the same errors as already described.

In order to contribute something to this here, this is my setup and what I did 
so far:

Opensuse 10.2 
Bacula 2.0.3

The first setup was in a VMware machine on openSUSE 10.2. Since I could not find
anything wrong with the OS or file system, I suspected the virtual machine.
I moved to a different (non-virtual) system, installed the OS fresh and compiled
Bacula 2.0.3 from scratch (./configure --enable-smartalloc --with-mysql),
copied the config files and created the needed MySQL tables with the supplied
scripts.
The first manual backup and restore I did went without problems. I tried two
big machines.
Then I let it run the usual backup with 3 concurrent jobs. I tried a restore and
it failed with the known problems.

As Doytchin already tracked down, it is a matter of the concurrent running 
backup jobs. I fully agree with this. 

Currently I've also set concurrent jobs = 1 and the backup is still
running. I don't know if this will be usable in our environment, since it
now takes quite a long time to complete.
Hopefully the restore will work out fine with this setting. I'll keep you
updated.

Regards
Wolfgang




-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doytchin 
Spiridonov
Sent: Tuesday, July 24, 2007 07:29
To: bacula-users
Subject: Re: [Bacula-users] Restore errors

Hello,

done. I found where the problem is after some more tests (and once again it is
not in our hardware or OS or broken things). It is where I initially suggested:
the concurrent jobs.

After the first (and native configuration) we used (concurrent jobs, with gzip) 
we tested the following:

1. concurrent jobs, w/o gzip
- we got similar errors (1 wrong filesize from 4 jobs, but 3 of 4 jobs with 
less files than expected, the 4th usually is very small - 100 files - and never 
had errors, so I would say 100% of jobs was invalid)

2. no concurrent jobs (Maximum Concurrent Jobs = 1 at dir and sd), w/o gzip
- good news, all restores are OK, no errors, Files Expected and Files Restored 
match!

3. no concurrent jobs WITH gzip
- again OK, all restores are OK, no errors, Files Expected and Files Restored 
match!

So until now we have:
- the problem is not caused by a corrupted file system
- volumes are consistent and bls doesn't show errors
- MySQL is OK (initially 4.1.x now 5.0.37)
- when running concurrent jobs both 2.0.3 and 2.1.28 say backups are OK but 
restores fail with one of the 3 kinds of errors listed below
- when concurrent jobs are turned off everything is OK
- gzip on/off doesn't affect the errors

Once again the 3 types of errors are:

1. some static files (i.e. not log files!) are restored with wrong (always 
larger) size, while first N bytes match, and the rest is filled with a part of 
another file (not sure if this is just file with a wrong size and some old data 
at the disk appears at the end, or bacula restores part of another file and 
append it to the end). The file can be restored correctly if marked alone but 
the error 3. below is generated (which seems to be just a bogus error). An 
example error is:
---
b0: Restore_b0.d6.int.2007-07-23_22.37.34 Error: attribs.c:410 File size of 
restored file 
/home/bacula/res/b3.2/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm
not correct. Original 3826291, restored 10620921.
---
When this error is present (always) the second error below (but w/o additional 
error messages) is present as well (missing files)

2. large amount of files are missing (while they are present in the catalog and 
selected) - tens of thousands (not sockets or anything else that Bacula ignores 
by default). When this happens usually an error like this appear (if not the 
first one above):
---
b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header file index 
42452 not equal record index 0
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124 Error 
sending to File daemon. ERR=Connection reset by peer
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306 Write error 
sending 30 bytes to client:10.2.1.13:36643: ERR=Connection reset by peer
---

3. when a file from error 1 is restored alone it is OK, but another bogus error 
is generated:
---
Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275 Volume data 
error at 0:3999743252! Wanted ID: BB02, got Иnлу.
Buffer discarded.
---
Found that the above number (3999743252) is not present as block address for 
any block in the volumes, but the same number appears as part of JobMedia 
record in the database
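
For error type 1 above (restored file larger than the original, with the first
N bytes matching), the observation can be checked mechanically. This is a small
sketch with hypothetical helper names: it tests whether the restored file
starts with the complete original and, if so, returns the extra bytes that were
appended, so they can be inspected or compared against other files.

```python
from pathlib import Path
from typing import Optional


def trailing_garbage(original: bytes, restored: bytes) -> Optional[bytes]:
    """Return the bytes appended after `original`, or None if the prefix differs."""
    if not restored.startswith(original):
        return None
    return restored[len(original):]


def check_files(original_path: str, restored_path: str) -> Optional[bytes]:
    """Same check, reading both files fully into memory."""
    return trailing_garbage(
        Path(original_path).read_bytes(),
        Path(restored_path).read_bytes(),
    )


if __name__ == "__main__":
    # Sizes from the attribs.c:410 message quoted above.
    print("extra bytes reported:", 10620921 - 3826291)
```

An empty result means both files are identical, and None means the restored
file already differs inside the range covered by the original.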

Re: [Bacula-users] Restore errors

2007-07-24 Thread John Drescher
On 7/24/07, Mair Wolfgang-awm013 [EMAIL PROTECTED] wrote:
 Hello,

 This is exactly what I experienced last week. I submitted this under the 
 subject: ' Restore Error of linux-install-fdFul'.

 However, I didn't had the time yet to track this as much down as Doytchin 
 did. Great work!
 This morning (before reading through all this) I also found that if I do a 
 single backup the restore runs fine. My setting has concurrent jobs = 3. If I 
 backup with this I get the same errors as already described.

Has anyone who has this problem tried turning spooling on?

John



Re: [Bacula-users] Restore errors

2007-07-24 Thread Frank Sweetser
Doytchin Spiridonov wrote:
 Hello,
 
 done. I found where the problem is after some more tests (and once again
 it is not in our hardware or OS or broken things). It is where I
 initially suggested: the concurrent jobs.

So you can reliably reproduce the problem now?  Excellent!

 After the first (and native configuration) we used (concurrent jobs,
 with gzip) we tested the following:
 
 1. concurrent jobs, w/o gzip
 - we got similar errors (1 wrong filesize from 4 jobs, but 3 of 4 jobs
 with less files than expected, the 4th usually is very small - 100
 files - and never had errors, so I would say 100% of jobs was invalid)
 
 2. no concurrent jobs (Maximum Concurrent Jobs = 1 at dir and sd), w/o
 gzip
 - good news, all restores are OK, no errors, Files Expected and Files
 Restored match!
 
 3. no concurrent jobs WITH gzip
 - again OK, all restores are OK, no errors, Files Expected and Files
 Restored match!

Okay, so it looks like you can reproduce the symptoms just with multiple
concurrent jobs, regardless of the gzip settings.

 So until now we have:
 - the problem is not caused by a corrupted file system
 - volumes are consistent and bls doesn't show errors
 - MySQL is OK (initially 4.1.x now 5.0.37)
 - when running concurrent jobs both 2.0.3 and 2.1.28 say backups are
 OK but restores fail with one of the 3 kinds of errors listed below
 - when concurrent jobs are turned off everything is OK
 - gzip on/off doesn't affect the errors

I realize that you mentioned in another email you're dumping the mysql tables
nightly, but I would still strongly recommend that you run a repair tables on
your catalog to be absolutely sure there isn't any subtle corruption that's
snuck in.  It pays to be painfully methodical when troubleshooting this kind
of scenario, especially since you seem to be the first to knowingly run into
this problem.

Another good thing to try would be to double check and make sure that your
catalog schema exactly matches what bacula is expecting.  If, for example, the
column type holding volume offsets somehow became a 16 bit int where bacula
was expecting a 32 bit, the inserted values could become truncated or wrap
around, causing the kind of corruption you're seeing.
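
To illustrate what a too-narrow column could do to a stored offset, here is a
small sketch. It is not taken from Bacula or MySQL; the function names are made
up for the example, and the only real value used is the address from the
block.c:275 error quoted in this thread. Whether a given database wraps or
clips an out-of-range value depends on the server and its SQL mode, so both
behaviours are shown.

```python
def fits_unsigned(value: int, bits: int) -> bool:
    """True if `value` fits in an unsigned integer column of `bits` bits."""
    return 0 <= value < (1 << bits)


def fits_signed(value: int, bits: int) -> bool:
    """True if `value` fits in a signed integer column of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def wrap_unsigned(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, as a column that wraps around would."""
    return value & ((1 << bits) - 1)


def clamp_unsigned(value: int, bits: int) -> int:
    """Clip to the largest value an unsigned column of `bits` bits can hold."""
    return min(value, (1 << bits) - 1)


ADDRESS = 3999743252  # address reported in the "Volume data error" message

if __name__ == "__main__":
    for bits in (16, 32):
        print(
            bits,
            fits_unsigned(ADDRESS, bits),
            fits_signed(ADDRESS, bits),
            wrap_unsigned(ADDRESS, bits),
            clamp_unsigned(ADDRESS, bits),
        )
```

The reported address fits an unsigned 32 bit column but not a signed one, which
is one more reason to compare the actual column types against the schema
Bacula ships.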

Actually, that gives me another idea.  While I've never used it myself, you
may be able to get more details by running some jobs with strict mode turned
on on your mysql catalog.

http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html

If your bacula installation is doing something that would cause the data
stored to be wrong, such as storing a value that doesn't fit in the column
type, I believe this should turn it from a silent warning into a fatal error,
making it easier to track down.

Also, it's been suggested that you try turning on spooling.  Have you done so?

 Once again the 3 types of errors are:
 
 1. some static files (i.e. not log files!) are restored with wrong
 (always larger) size, while first N bytes match, and the rest is
 filled with a part of another file (not sure if this is just file with
 a wrong size and some old data at the disk appears at the end, or
 bacula restores part of another file and append it to the end). The
 file can be restored correctly if marked alone but the error 3. below
 is generated (which seems to be just a bogus error). An example error is:
 ---
 b0: Restore_b0.d6.int.2007-07-23_22.37.34 Error: attribs.c:410 File size
 of restored file
 /home/bacula/res/b3.2/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm
 not correct. Original 3826291, restored 10620921.
 ---
 When this error is present (always) the second error below (but w/o
 additional error messages) is present as well (missing files)
 
 2. large amount of files are missing (while they are present in the
 catalog and selected) - tens of thousands (not sockets or anything
 else that Bacula ignores by default). When this happens usually an
 error like this appear (if not the first one above):
 ---
 b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header
 file index 42452 not equal record index 0
 Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124
 Error sending to File daemon. ERR=Connection reset by peer
 Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306
 Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection
 reset by peer
 ---
 
 3. when a file from error 1 is restored alone it is OK, but another
 bogus error is generated:
 ---
 Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275
 Volume data error at 0:3999743252! Wanted ID: BB02, got Иnлу.
 Buffer discarded.
 ---
 Found that the above number (3999743252) is not present as block
 address for any block in the volumes, but the same number appears as
 part of JobMedia record in the database.
 
 
 This is everything in 2.1.28 summarized that popped up as a problem or
 fact.
 (2.0.3 had another bug with bogus errors about sockets' attributes and
 2.1.26 had a bogus SQL error messages but those are 

Re: [Bacula-users] Restore errors

2007-07-24 Thread Mair Wolfgang-awm013
Spooling? Does this also apply if my backup goes directly to files? 

Here is my setting:
sd:
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /export/bacula-dump
  LabelMedia = yes;   # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;   # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}

dir:
#
Storage {
  Name = File
  Address = porsche   # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = YnwWs7iqCDg1mMq1LG3rB4LX1mpC7PNgCn68Y52Iu
  Device = FileStorage
  Media Type = File
  Maximum Concurrent Jobs = 1
}



-Original Message-
From: John Drescher [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 24, 2007 13:00
To: Mair Wolfgang-awm013
Cc: Doytchin Spiridonov; bacula-users
Subject: Re: [Bacula-users] Restore errors

On 7/24/07, Mair Wolfgang-awm013 [EMAIL PROTECTED] wrote:
 Hello,

 This is exactly what I experienced last week. I submitted this under
the subject: ' Restore Error of linux-install-fdFul'.

 However, I didn't have the time yet to track this down as far as
Doytchin did. Great work!
 This morning (before reading through all this) I also found that if I
do a single backup the restore runs fine. My setting has concurrent jobs
= 3. If I back up with this I get the same errors as already described.

Has anyone who has this problem tried turning spooling on?

John

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Restore errors

2007-07-24 Thread Frank Sweetser
Mair Wolfgang-awm013 wrote:
 Spooling? Does this also apply if my backup goes directly to files? 

It would in this case, yes.  With spooling, the data goes to the spooling file
first, and is then unspooled in chunks.  Without spooling, all of the data
from the multiple jobs goes straight to the volume as it comes in.  If the
problem is the data going as it comes in, spooling would make the symptoms go
away, and narrow down where the underlying problem might be.
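
The difference described above can be illustrated with a toy model. This is only a sketch of the write ordering, not Bacula code: the job names, block numbers and the `chunk_blocks` parameter are made up for the illustration.

```python
def write_direct(arrivals):
    """No spooling: every block is appended to the volume as it arrives."""
    return list(arrivals)


def write_spooled(arrivals, chunk_blocks):
    """Spooling: blocks are held per job and despooled in contiguous chunks."""
    volume, spools = [], {}
    for job, block in arrivals:
        spool = spools.setdefault(job, [])
        spool.append((job, block))
        if len(spool) == chunk_blocks:
            volume.extend(spool)      # despool one full chunk in one go
            spool.clear()
    for spool in spools.values():     # flush whatever is left at end of job
        volume.extend(spool)
    return volume


# Two hypothetical jobs, A and B, whose blocks arrive alternately.
arrivals = [(job, n) for n in range(4) for job in ("A", "B")]
direct = write_direct(arrivals)
spooled = write_spooled(arrivals, chunk_blocks=2)
print("direct :", " ".join(f"{j}{n}" for j, n in direct))
print("spooled:", " ".join(f"{j}{n}" for j, n in spooled))
```

In the direct case the blocks of the two jobs alternate on the volume; in the spooled case each job's blocks land in contiguous runs, which is why spooling would hide a bug that depends on fine-grained interleaving.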

-- 
Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
WPI Senior Network Engineer   |  is simple, elegant, and wrong. - HL Mencken
GPG fingerprint = 6174 1257 129E 0D21 D8D4  E8A3 8E39 29E3 E2E8 8CEC



Re: [Bacula-users] Restore errors

2007-07-24 Thread Doytchin Spiridonov
Hello,


Tuesday, July 24, 2007, 2:00:43 PM:

FS Okay, so it looks like you can reproduce the symptoms just with multiple
FS concurrent jobs, regardless of the gzip settings.

I am sure the files/dirs backed up are important! I bet the developers
have tested concurrent jobs enough, but if they didn't catch the problem
then the dir structure/number of files/size must matter. To give a
picture of our test environment - I am testing with 4 jobs, 4
separate servers, 50K - 350K files each and 2-7GB of data. One of the
jobs is backing up the bacula server itself (not sure if this matters;
I noted a possible problem with naming the daemons with the same names and
so temp files being overwritten, but this is not our case).

 So until now we have:
 - the problem is not caused by a corrupted file system
 - volumes are consistent and bls doesn't show errors
 - MySQL is OK (initially 4.1.x now 5.0.37)
 - when running concurrent jobs both 2.0.3 and 2.1.28 say backups are
 OK but restores fail with one of the 3 kinds of errors listed below
 - when concurrent jobs are turned off everything is OK
 - gzip on/off doesn't affect the errors

FS I realize that you mentioned in another email you're dumping the mysql tables
FS nightly, but I would still strongly recommend that you run a repair tables on
FS your catalog to be absolutely sure there isn't any subtle corruption that's
FS snuck in.  It pays to be painfully methodical when troubleshooting this kind
FS of scenario, especially since you seem to be the first to knowingly run into
FS this problem.

FS Another good thing to try would be to double check and make sure that your
FS catalog schema exactly matches what bacula is expecting.  If, for example, the
FS column type holding volume offsets somehow became a 16 bit int where bacula
FS was expecting a 32 bit, the inserted values could become truncated or wrap
FS around, causing the kind of corruption you're seeing.
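
To make the concern above concrete, here is a small sketch of what happens to a large offset in a too-narrow integer column. It uses the address 3999743252 from the "Volume data error" message reported in this thread; the helper names are made up, and nothing here is taken from the actual Bacula schema. Both clamping to the range limit and wrapping around are shown, since either would corrupt the stored value.

```python
def column_range(bits, signed):
    """Smallest and largest value an integer column of this width can hold."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits(value, bits, signed):
    low, high = column_range(bits, signed)
    return low <= value <= high


def clamp(value, bits, signed):
    """Out-of-range value clipped to the nearest end of the column range."""
    low, high = column_range(bits, signed)
    return max(low, min(high, value))


def wrap(value, bits, signed):
    """Out-of-range value reduced modulo 2**bits (two's complement wrap)."""
    wrapped = value % (1 << bits)
    if signed and wrapped >= 1 << (bits - 1):
        wrapped -= 1 << bits
    return wrapped


offset = 3999743252  # address from the error message quoted in this thread
for bits in (16, 32, 64):
    for signed in (True, False):
        print(bits, "signed" if signed else "unsigned",
              "fits" if fits(offset, bits, signed) else "does not fit",
              clamp(offset, bits, signed), wrap(offset, bits, signed))
```

The point of the sketch is only that this particular offset is larger than a signed 32-bit maximum, so the signedness and width of the relevant catalog columns are worth checking.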

FS Actually, that gives me another idea.  While I've never used it myself, you
FS may be able to get more details by running some jobs with strict mode turned
FS on on your mysql catalog.

FS http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html

FS If your bacula installation is doing something that would cause the data
FS stored to be wrong, such as storing a value that doesn't fit in the column
FS type, I believe this should turn it from a silent warning into a fatal error,
FS making it easier to track down.

FS Also, it's been suggested that you try turning on spooling.  Have you done so?

Nice suggestion. I will try it, and spooling as well. This will probably
cut the possibilities in half, as the problem is either wrong
database data or wrong data in the volumes (or both).

Re mysql check - as we fixed the problem yesterday I don't have a DB
now to check against, but I'll start a new backup the old way to reproduce
the problem and verify the DB, just to rule out that possibility. (We
have one spare server for Bacula tests; it has no Xen and no LVM,
but we were getting the problem there as well, which also shows
that it is not Xen or LVM related. I forgot to mention this
yesterday.)

Done, and more info: surprisingly, this time only one of the 4 jobs had a
problem and, strangely, at a similar place (I recall a file from the
same dir was broken once before). Anyway, the file size was
different (error type 1).

Checked the bacula tables - no problems, all had status OK.

BUT, on this server it happened for 1 out of 4 jobs, while on the
other it was 4 of 4 (which was much better for testing). So if I enable
spooling and get no errors, that would not prove spooling solved the
problem; it could just be luck. As you see, it doesn't happen
every time, nor for all jobs. But I will run several more tests anyway.

Now running the same with spooling. My first impression is that
for 4 jobs it is writing to 8 different files. This is not so good for
performance, and wouldn't it be the same as defining a different pool for
every job? If spooling fixes the problem (i.e. through a
separate write for every job), this would mean that a separate pool will
do the same while saving the time spent transferring data between files?

 (!) The workaround for the problem is to switch off concurrent
 jobs...

FS Obviously that's not a very good workaround in the long run, especially for
FS those of us with multiple drives.

This is why I also asked earlier yesterday about a comparison with and without
concurrent jobs or writing to separate volumes, as I was sure we would
end up with no concurrent jobs, but as Wolfgang reports, it is slower.

Regards.



Re: [Bacula-users] Restore errors

2007-07-24 Thread Doytchin Spiridonov
Hello,


Tuesday, July 24, 2007, 2:00:43 PM:


FS Also, it's been suggested that you try turning on spooling.  Have you done so?

Good news (or bad, who knows): I enabled spooling (Maximum Job Spool Size
= 500m), ran the same jobs, and AGAIN in the first job I tried to restore,
~44K files are missing:

  Files Expected: 348,120
  Files Restored: 304,654

Another restore job is similar:
  Files Expected: 190,741
  Files Restored: 154,016

The case is slightly different, as this time there are NO other errors
generated (like file with wrong size or the Record header file index
not equal).
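
For anyone comparing their own job reports, a small sketch that pulls the two counters out of a restore summary and computes the shortfall. The report text is copied from the figures above; the function name and the regular expression are only an illustration and assume the labels appear exactly as shown.

```python
import re

REPORT = """\
  Files Expected: 348,120
  Files Restored: 304,654
"""


def missing_files(report):
    """Return (expected, restored, missing) parsed from a restore job summary."""
    counts = {}
    for label, number in re.findall(r"Files (Expected|Restored):\s*([\d,]+)", report):
        counts[label] = int(number.replace(",", ""))
    expected, restored = counts["Expected"], counts["Restored"]
    return expected, restored, expected - restored


expected, restored, missing = missing_files(REPORT)
print(f"{missing} of {expected} files missing ({missing / expected:.1%})")
```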

Hope this helps in resolution of the problem.

Regards.

P.S. Will try your suggestion for STRICT_TRANS_TABLES but I guess it's more
likely that a field was changed from 16 to 32 bit than from 32 to
16...





Re: [Bacula-users] Restore errors

2007-07-24 Thread Julien
Just to say that the difference I had between Files Expected
and Files Restored has been resolved by running dbcheck.

I don't understand why my database had inconsistencies (as I never had
any hard reboot, power cut or such) ... should dbcheck be executed at
regular intervals?

On Tue, 2007-07-24 at 05:25 +0300, Doytchin Spiridonov wrote:
 Hello,
 
 Tuesday, July 24, 2007, 2:25:44 AM:
 
 FS Doytchin Spiridonov wrote:
  Hello,
  
  Monday, July 23, 2007, 9:02:21 PM:
  
  
  FS The first thing that I would try is unmounting the filesystem and performing a
  FS full fsck on it, to rule out filesystem corruption.
  
  not a problem with the FS or disks, checked that.
  
  Checked the logical content of the volumes as well (bls -k -v) - no
  errors. For curiosity as I don't know if bls should print errors if
  content is damaged, I changed one random byte of one volume to 0,
  run the bls and got an error: Block checksum mismatch in block=113
  len=64512: calc=2a576dc5 blk=44a509f3
  
  So I would say the 3 problems (files with wrong size, missing files
  and error about ID: BB02) are not hardware/fs/disks related and are
  caused by a bug in Bacula and it is related to wrong positioning in
  the volumes and mismatched numbers.
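
The byte-flip experiment described above can be reproduced in miniature. This sketch assumes a CRC-32 style block checksum and uses Python's `zlib.crc32` as a stand-in; it does not read a real volume, and the block content is made up (only the 64512 byte length is taken from the message).

```python
import zlib


def corrupt_byte(block: bytes, index: int) -> bytes:
    """Return a copy of block with the byte at index inverted, so it always changes."""
    damaged = bytearray(block)
    damaged[index] ^= 0xFF
    return bytes(damaged)


block = bytes(range(256)) * 252            # 64512 bytes of made-up block data
stored = zlib.crc32(block)                 # checksum as recorded at write time
calc = zlib.crc32(corrupt_byte(block, 1000))
print(f"calc={calc:08x} blk={stored:08x} mismatch={calc != stored}")
```

A checksum of this kind catches damage inside a block, but it says nothing about whether a reader is pointed at the right block in the first place, which fits the observation that the volumes check out clean while restores still go wrong.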
 
 FS Based on this and your other emails, I would next suspect a problem with the
 FS catalog.  Again, I'd start by making sure no errors have crept in by doing a
 FS consistency check at the database level - a 'repair tables' in mysql, or the
 FS equivalent in postgresql.
 
 
 Not the case - we are doing daily DB dumps before backups and they
 would crash if tables are damaged. No mysql error logs. The problem is
 not in the MySQL or broken dbs.
 
 Regards.
 
 




Re: [Bacula-users] Restore errors

2007-07-24 Thread Doytchin Spiridonov
Hello,

the last 2 tests:

Tuesday, July 24, 2007, 2:00:43 PM:


FS Actually, that gives me another idea.  While I've never used it myself, you
FS may be able to get more details by running some jobs with strict mode turned
FS on on your mysql catalog.

FS http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html

FS If your bacula installation is doing something that would cause the data
FS stored to be wrong, such as storing a value that doesn't fit in the column
FS type, I believe this should turn it from a silent warning into a fatal error,
FS making it easier to track down.

1. I've set sql_mode=TRADITIONAL for MySQL and ran all the jobs
again. No errors were reported, same as before. Tested restore jobs.
Again 1 of them had errors (one RPM file with the wrong size, and fewer
files restored than expected).

2. So this was the time to check Julien's idea and try dbcheck.
Ran: dbcheck -c /etc/bacula/bacula-dir.conf -v -f
(I first created 2 indexes, as dbcheck otherwise runs forever)
0 problems found for all the steps!
Anyway, I tried a restore again - same problem. So it is not one
that can be fixed by dbcheck.

I don't have any other ideas for checks that would provide more cases. It's
the developers' turn now...

Regards.




[Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

trying to identify a bug in bacula and/or our system setup.

Is there anyone that on restore had errors like this:

Error: attribs.c:410 File size of restored file
/home/bacula/res/b3/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm
not correct. Original 3826291, restored 10620921.

- the file is not a log file or any file that has changed during the
backup (in which cases an error like the one above should be normal)

- the wrong file size is always larger than the original; if we keep only
the first N bytes, where N is the correct file size, the original
and restored files match; we noted that the appended data is part of
another file from the backup, not garbage data. Note that this other
file (from which some part has been appended to the file with wrong
size) is restored correctly, so the only problem is wrong file size
decision by bacula and reading further than its end (seems this is
some internal buffer of Bacula as the data is stored in the volumes
using GZIP and just reading further would break everything and the
appended data should be garbage, not unzipped data).
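
The check described above (the first N bytes match, the rest is surplus) can be written down as a short sketch. The function names are made up for illustration, and the sample data merely stands in for the RPM and for the file whose contents were appended.

```python
def split_restored(original: bytes, restored: bytes):
    """Return (prefix_ok, surplus): whether the restored data starts with the
    original, and the bytes that follow after the original's length."""
    n = len(original)
    return restored[:n] == original, restored[n:]


def check_files(original_path, restored_path):
    """Same check on two files on disk (reads both fully into memory)."""
    with open(original_path, "rb") as f_orig, open(restored_path, "rb") as f_rest:
        return split_restored(f_orig.read(), f_rest.read())


# Made-up data standing in for the original file and the oversized restore.
prefix_ok, surplus = split_restored(b"RPM-PAYLOAD", b"RPM-PAYLOAD" + b"other-file")
print(prefix_ok, len(surplus))
```

The surplus bytes returned by `check_files` can then be searched for in the other restored files to confirm which file they came from.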






Re: [Bacula-users] Restore errors

2007-07-23 Thread Frank Sweetser
Doytchin Spiridonov wrote:
 Hello,
 
 trying to identify a bug in bacula and/or our system setup.
 
 Is there anyone that on restore had errors like this:
 
 Error: attribs.c:410 File size of restored file
 /home/bacula/res/b3/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm
 not correct. Original 3826291, restored 10620921.
 
 - the file is not a log file or any file that has changed during the
 backup (in which cases an error like the one above should be normal)
 
 - the wrong file size is always larger than the original; if we keep only
 the first N bytes, where N is the correct file size, the original
 and restored files match; we noted that the appended data is part of
 another file from the backup, not garbage data. Note that this other
 file (from which some part has been appended to the file with wrong
 size) is restored correctly, so the only problem is wrong file size
 decision by bacula and reading further than its end (seems this is
 some internal buffer of Bacula as the data is stored in the volumes
 using GZIP and just reading further would break everything and the
 appended data should be garbage, not unzipped data).

The first thing that I would try is unmounting the filesystem and performing a
full fsck on it, to rule out filesystem corruption.

-- 
Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
WPI Senior Network Engineer   |  is simple, elegant, and wrong. - HL Mencken
GPG fingerprint = 6174 1257 129E 0D21 D8D4  E8A3 8E39 29E3 E2E8 8CEC



Re: [Bacula-users] Restore errors

2007-07-23 Thread Ryan Novosielski

Doytchin Spiridonov wrote:
 Hello,
 
 trying to identify a bug in bacula and/or our system setup.
 
 Is there anyone that on restore had errors like this:
 
 Error: attribs.c:410 File size of restored file
 /home/bacula/res/b3/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm
 not correct. Original 3826291, restored 10620921.
 
 - the file is not a log file or any file that has changed during the
 backup (in which cases an error like the one above should be normal)
 
 - the wrong file size is always larger than the original; if we keep only
 the first N bytes, where N is the correct file size, the original
 and restored files match; we noted that the appended data is part of
 another file from the backup, not garbage data. Note that this other
 file (from which some part has been appended to the file with wrong
 size) is restored correctly, so the only problem is wrong file size
 decision by bacula and reading further than its end (seems this is
 some internal buffer of Bacula as the data is stored in the volumes
 using GZIP and just reading further would break everything and the
 appended data should be garbage, not unzipped data).

This has been brought up several times within the last week, but never
with this level of explanation and examination. I wonder if some of the
others who have experienced it (I do not know their names -- hopefully they can
chime in) can do the same thing for us. This seems potentially serious
if it is a widespread problem.

I think if the others can verify it, this should also be copied to
Bacula devel. I think I will try a large restore of my own today to see
what happens.

Please give the rest of the details of your setup, however -- you don't
even include the Bacula version, and that is a very basic piece of
information. Operating system (presumably RedHat Linux from the file you
backed up, but who knows), architecture... all would be useful.

- --
  _  _ _  _ ___  _  _  _
 |Y#| |  | |\/| |  \ |\ |  | |Ryan Novosielski - Systems Programmer II
 |$| |__| |  | |__/ | \| _| |[EMAIL PROTECTED] - 973/972.0922 (2-0922)
 \__/ Univ. of Med. and Dent.|IST/AST - NJMS Medical Science Bldg - C630



Re: [Bacula-users] Restore errors

2007-07-23 Thread Dan Langille
On 23 Jul 2007 at 14:03, Ryan Novosielski wrote:

 This has been brought up several times within the last week, but never
 with the explanation and examination. I wonder if some of the other
 who have experienced it (I do not know their names -- hopefully they
 can chime in) can do the same thing for us. This is potentially
 serious, seems like, if it is a widespread problem.

It may be the same case, raised by different people.

 I think if the others can verify it, this should also be copied to
 Bacula devel. I think I will try a large restore of my own today to
 see what happens.

Devel is aware of the issue as it was originally raised in the bug 
tracking system.  The consensus was it is not a bug, or more 
correctly, there was no information supplied which permitted 
reproduction of the bug.  If we can't reproduce the bug, we sure 
can't test for it, and we sure can't confirm it's been fixed.

 Please give the rest of the details of your setup, however -- you
 don't even include the Bacula version, and that is a very basic piece
 of information. Operating system (presumably RedHat Linux from the
 file you backed up, but who knows), architecture... all would be
 useful.

This might help: http://bugs.bacula.org/view.php?id=903

-- 
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php





Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

I've filed this as a bug, but as Kern couldn't reproduce it he gave
up. So let us find out here what the problem could be. There are actually
two problems, and they could be linked.

Here is the history:
Initially we were using 2.0.3. After running backups for several weeks I
wanted to restore a file and was surprised that I couldn't. It
was listed in the catalog, I could select it and run a restore job,
but the file didn't come up. Investigating what happened, I ran a full
restore job and was surprised that in that directory (where the file
is) several files were missing. An error message similar to the
one in my first post here was also present. In addition, there was a
big difference between marked files and actually restored files (certainly
not hard links, sockets or anything else that is ignored by Bacula -
in one of the tests the whole /home/ directory was missing).
After that we started running tests (backup full/diff/inc, restore etc)
for a week. Every time (but at random places/files) similar errors
happen. Sometimes there are errors, sometimes not. We haven't run enough
tests to conclude exactly when this happens. But IT
HAPPENS and as a result we don't have a reliable backup. I know a lot
of people run backups w/o testing restores, and that's why (if this is
not related to our specific setup) the problem would show up only when
they have an emergency, which doesn't happen often. Anyway, here
are the hardware and setup details:

*** Bacula: 2.1.28 on all servers.
Yesterday we cleaned everything (bacula DB and volumes) and
installed the latest beta *2.1.28* everywhere (note this is not a
problem of the beta, as we discovered it while running 2.0.3). 2.1.28 fixed
2 other problems we discovered with 2.0.3, but this one is still
there.
Director and most of the servers are 64 bit, two of the servers are 32
bit.
*** OS: Linux CentOS 4.5
*** MySQL: 5.0.37
*** Servers (all are almost identical): Supermicro, PDSME - Intel
E7230 (Mukilteo) chipset, Intel Pentium D 930 Dual Core 3.0GHz, 3Ware
IDE RAID Controller Escalade 9550SX. Servers have 4 disks each in RAID
1+0, only the Bacula server has many disks in RAID 5.
*** Some servers are plain CentOS, some have Xen with virtual servers,
the Bacula server itself also has Xen, but Bacula is running in
Dom0, and no other virtual machines are running on it at this time.
*** Those servers with Xen also have LVM.
*** We run (and I guess here is the problem of Bacula) concurrent
jobs.
*** GZIP compression is enabled.
*** we save volumes on harddisk, their size is set to 4480MB

--- How to get an error:
As we initially discovered the error after several weeks of backups,
we guessed that it could have been caused by a wrong setting of
Volume Retention or some other Retention time, so that some files were
purged.

We started everything from zero again, and after 3 days (it happened
that the first was Full, the next Differential and the last
Incremental) we performed a test and that error happened again! So we
were sure this is not caused by purge of some files accidentally.

After that we could get that error even after just a full backup,
trying to restore immediately after it is finished.

Yesterday we cleaned everything again and compiled (from SRPMs) the
latest 2.1.28.

We ran a full backup again (again all jobs concurrent) and the errors
described here happen when we try to restore files from every job
(except one where there are just 150 files).

So there are two problems:
- sometimes some files are restored with a larger size, while the first
part of the file matches the original file exactly (not log files or
dynamic files). This happens in rare cases (~one case per 5 jobs)
- sometimes not all files are restored, but tens of thousands are
missing, an example:
  Files Expected: 190,718
  Files Restored: 166,097
This happens more often (~one case per 2 jobs).

Note that once the error happens we can reproduce it on every restore
at the same place for the same file and the same number of missing
files (i.e. this is not a problem of the restore, it is most likely a
problem with the volumes).

Our planned tests:
1. we will do the same (concurrent jobs) but w/o using GZIP
2. if it happens again we will set max jobs to 1 so every job is run
alone. Because when testing, AFAIR, we didn't get errors when we ran
just one full backup job. This always happens when we do several at
once (but I am not 100% sure, that's why we will test this)
3. if it still happens we will run it with a normal kernel (to exclude
the Xen influence)
4. last, we will try w/o LVM (which would be harder)

Regards
P.S. sorry for my English :)


Monday, July 23, 2007, 9:03:45 PM:

RN -BEGIN PGP SIGNED MESSAGE-
RN Hash: SHA1

RN Doytchin Spiridonov wrote:
 Hello,
 
 trying to identify a bug in bacula and/or our system setup.
 
 Is there anyone that on restore had errors like this:
 
 Error: attribs.c:410 File size of restored file
 

Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello Dan,

Monday, July 23, 2007, 9:35:05 PM:


DL Devel is aware of the issue as it was originally raised in the bug 
DL tracking system.  The consensus was it is not a bug, or more 
DL correctly, there was no information supplied which permitted 
DL reproduction of the bug.  If we can't reproduce the bug, we sure 
DL can't test for it, and we sure can't confirm it's been fixed.

Can you provide some guidance on how and what to test? I personally
don't have any idea what data is stored in the Catalog and what is in the
Volumes. Regarding the problem of the wrong file size:
- does Bacula store the file size somewhere and if so where? how to extract
that number (or those numbers, if they are in more than one place)?
- how could the file then end up with a larger size? (for example there is a
GZIP stream, you unpack it and you get a larger result, or what? or is the second
(wrong) number the number stored in the volume?)
- do you agree that it is strange that while the stream is gzipped, the
appended data is part of a real file? If the volume were damaged, I
guess the additional data should be garbage? Also, is it possible
that the file is restored OK, but only its size is set to a wrong
number and we've seen some data that previously existed on the disk? Or
does bacula have some internal buffer, so that this additional part came from a file
restored just before the wrong one?

Regards.




Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

I forgot to mention something very IMPORTANT: I discovered that in
*all* of such cases (restored files with larger size), if we don't
perform full restore, but restore a SINGLE file, it is restored OK
with *correct* size and content. It is OK even if we restore the
directory where it is (with the other files in it).

Which proves it is not a problem with the FS, kernel, xen, lvm,
hardware, etc, but a problem with Bacula.

Regards



Re: [Bacula-users] Restore errors

2007-07-23 Thread Julien
 sometimes not all files are restored, but tens of thousands are 
 missing, an example:
 Files Expected: 190,718
 Files Restored: 166,097
 This happens more often (~one case per 2 jobs).

Just to say that I have the case too every time I restore, but I think
you can ignore it. Following my observations and the little tests I
made, the difference between files expected and files restored are
the number of directories. It seems that bacula doesn't handle this
properly (a bug in the counter .. ?).

Regards,
Julien


On Mon, 2007-07-23 at 23:01 +0300, Doytchin Spiridonov wrote:
 Hello,
 
 I forgot to mention something very IMPORTANT: I discovered that in
 *all* of such cases (restored files with larger size), if we don't
 perform full restore, but restore a SINGLE file, it is restored OK
 with *correct* size and content. It is OK even if we restore the
 directory where it is (with the other files in it).
 
 Which proves it is not a problem with the FS, kernel, xen, lvm,
 hardware, etc, but it is a problem with Bacula.
 
 Regards
 
 
 Monday, July 23, 2007, 9:57:40 PM:
 
 DS Hello,
 
 DS I've filed this as a bug, but while Kern couldn't reproduce it he gave
 DS up. So let us find here what could be the problem. There are actually
 DS two problems, they could be linked.
 
 DS Here is the history:
 DS Initially we were using 2.0.3. Running backups for several weeks I
 DS wanted to restore a file and was surprised that I can't restore it. It
 DS was listed in the catalog, I could select it and run a restore job,
 DS but the file didn't come up. Investigating what happened I run a full
 DS restore job and was surprised that in that directory (where the file
 DS is) several files are missing. Also the error message similar to the
 DS one in my first post here were present. In addition to it there was a
 DS big difference between marked files and actually restored files (sure
 DS not hard links, sockets or anything else that is ignored by Bacula -
 DS at one of the tests the whole /home/ directory was missing).
 DS After that we started with tests (backup full/diff/inc, restore etc)
 DS for a week. Every time (but at random places/files) similar errors
 DS happen. Sometimes there are errors, sometimes not. We haven't run
 DS enough tests to determine when exactly this happens. But IT
 DS HAPPENS and as a result we don't have a reliable backup. I know a lot
 DS of people run backups w/o testing restores, and that's why (if this is
 DS not related to our specific setup) those problems would show up only
 DS when they have an emergency, which doesn't happen often. Anyway, here
 DS are the hardware and setup details:
 
 DS *** Bacula: 2.1.28 on all servers.
 From yesterday we cleaned everything (bacula DB and volumes) and
 DS installed everywhere the latest beta *2.1.28* (note this is not the
 DS problem of the beta as we discovered when we had 2.0.3). 2.1.28 fixed
 DS 2 other problems we discovered with 2.0.3, but this one is still
 DS there.
 DS Director and most of the servers are 64 bit, two of the servers are 32
 DS bit.
 DS *** OS: Linux CentOS 4.5
 DS *** MySQL: 5.0.37
 DS *** Servers (all are almost identical): Supermicro, PDSME - Intel
 DS E7230 (Mukilteo) chipset, Intel Pentium D 930 Dual Core 3.0GHz, 3Ware
 DS IDE RAID Controller Escalade 9550SX. Servers have 4 disks each in RAID
 DS 1+0, only the Bacula server has many disks in RAID 5.
 DS *** Some servers are plain CentOS, some have Xen with virtual servers,
 DS the Bacula server itself also has Xen, but Bacula is running in
 DS Dom0, no other virtual machines are running on it at this time.
 DS *** Those servers with Xen also have LVM.
 DS *** We run (and I guess here is the problem of Bacula) concurrent
 DS jobs.
 DS *** GZIP compression is enabled.
 DS *** we save volumes on harddisk, their size is set to 4480MB
 
 DS --- How to get an error:
 DS As we initially discovered the error after several weeks of backups,
 DS we guessed that it could have been caused by a wrong setting of
 DS Volume Retention or another retention time, so that some files were
 DS purged.
 
 DS We started everything from zero again, and after 3 days (it happened
 DS that the first was Full, the next Differential and the last
 DS Incremental) we performed a test and that error happened again! So we
 DS were sure this is not caused by purge of some files accidentally.
 
 DS After that we could get that error even after just a full backup,
 DS trying to restore immediately after it is finished.
 
 DS Yesterday we cleaned everything again and compiled (from SRPMs) the
 DS latest 2.1.28.
 
 DS We ran a full backup again (again all concurrent jobs) and the errors
 DS described here happen when we try to restore files from every job
 DS (except one where there are just 150 files).
 
 DS So the problems are two:
 DS - sometimes some files are restored with higher size, while the first
 DS part of the file matches exactly the original file (not log files or
 DS 

Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

No, you probably haven't found out which files are missing. After we
restore, we compare the restored files with the originals. The conclusion
is that files really are missing! (As I mentioned, those are not
hardlinks, sockets, etc. In one test the /home/ directory and all
files in it were missing!)
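
For anyone who wants to repeat the comparison of restored files against the originals, here is a minimal sketch. It assumes both trees are reachable as local directories; the function name compare_trees is made up for illustration and is not part of Bacula. It reports files that are absent from the restore and regular files whose size differs.

```python
import os

def compare_trees(original, restored):
    """Return (missing, size_mismatch) for files found under `original`.

    missing: relative paths present in `original` but absent in `restored`.
    size_mismatch: (relative path, original size, restored size) tuples.
    """
    missing = []
    size_mismatch = []
    for dirpath, _dirnames, filenames in os.walk(original):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, original)
            dst = os.path.join(restored, rel)
            if not os.path.lexists(dst):
                missing.append(rel)
                continue
            # Only regular files have a size worth comparing.
            if os.path.isfile(src) and not os.path.islink(src):
                a = os.lstat(src).st_size
                b = os.lstat(dst).st_size
                if a != b:
                    size_mismatch.append((rel, a, b))
    return sorted(missing), sorted(size_mismatch)
```

A call such as compare_trees("/", "/home/bacula/res/b3.2") would be one way to use it, with the usual caveat that files which legitimately changed since the backup will also show up.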

Bacula's counter is OK, and from our tests I can say that the only good
restore is one where those numbers match. If you see a difference like
the one below, you can be sure your restored file set is really wrong.

Could you please check if the above is true for your restores? Would
be helpful to know we are not alone.

P.S. And please, anyone who has never done a restore, please do some tests.
That way you will be sure you have a valid backup OR you will know you
have an invalid one ;) The easiest test is to do a full restore somewhere
and to check whether Files Expected and Files Restored differ by much
(and/or whether the error regarding bad file size appears).
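
The check suggested above can be automated roughly like this. It is only a sketch: it assumes the job report contains "Files Expected:" and "Files Restored:" lines as in the reports quoted in this thread, and the function names are made up for illustration.

```python
import re

def restore_counts(report):
    """Extract (expected, restored) file counts from a restore job report."""
    counts = {}
    for key in ("Expected", "Restored"):
        m = re.search(r"Files\s+%s:\s*([\d,]+)" % key, report)
        if m is None:
            raise ValueError("no 'Files %s:' line found" % key)
        # Counts are printed with thousands separators, e.g. 190,718.
        counts[key] = int(m.group(1).replace(",", ""))
    return counts["Expected"], counts["Restored"]

def restore_looks_complete(report):
    """True only when both counters match exactly."""
    expected, restored = restore_counts(report)
    return expected == restored
```

Whether a small difference can be explained by directories, as suggested earlier in the thread, is a separate question; this sketch only flags that the two numbers differ.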

Regards.


Monday, July 23, 2007, 11:18:00 PM:

 sometimes not all files are restored, but tens of thousands are 
 missing, an example:
 Files Expected: 190,718
 Files Restored: 166,097
 This happens more often (~one case per 2 jobs).

J Just to say that I have the case too every time I restore, but I think
J you can ignore it. Following my observations and the little tests I
J made, the difference between files expected and files restored are
J the number of directories. It seems that bacula doesn't handle this
J properly (a bug in the counter .. ?).

J Regards,
J Julien



Re: [Bacula-users] Restore errors

2007-07-23 Thread Dan Langille
On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:

 Hello Dan,
 
 Monday, July 23, 2007, 9:35:05 PM:
 
 
 DL Devel is aware of the issue as it was originally raised in the bug 
 DL tracking system.  The consensus was it is not a bug, or more 
 DL correctly, there was no information supplied which permitted 
 DL reproduction of the bug.  If we can't reproduce the bug, we sure 
 DL can't test for it, and we sure can't confirm it's been fixed.
 
 Can you provide some guidance on how and what to test?

Test the backups.  Backup one file. Restore.  Backup N files.  
restore.  Backup N directories, restore.  Find a simple and 
reproducible situation which demonstrates the problem.

 As I personally
 don't have any idea what data is stored in the Catalog, what is in the
 Volumes, regarding the problem of wrong file size:
 - does Bacula store the file size somewhere and if so where? how to extract
 that/those numbers if they are at more than one place?

There is a section in the manual regarding database tables.  I 
suspect you want the File Job table, but I cannot recall from memory.

 - how then the file could have larger size? (for example there is a
 GZIP stream, you unpack it and you get larger result or what? or the
 second (wrong) number is the number stored in the volume?) 

I don't know.

 - do you agree that it is strange that while the stream is gzipped, the
 appended data is part of a real file? If the volume were damaged, I
 guess the additional data should be some garbage? Also is it possible
 that the file is restored OK, but only its size is set to a wrong
 number and we've seen some data that previously existed on the disk? Or
 bacula has some internal buffer this additional part was from a file
 restored just before the wrong one?

Someone suggested you verify your filesystem (e.g. fsck).  Have you 
done that?

I suggest avoiding trying to answer your above questions.  
Concentrate on the following.

My suggestion, as was made by others, is to find something which can 
be reproduced.  Developers cannot solve the problem if they cannot 
reproduce it.

-- 
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php



-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Restore errors

2007-07-23 Thread Dan Langille
On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:

 Hello,
 
 I've filed this as a bug, but while Kern couldn't reproduce it he gave
 up. So let us find here what could be the problem. There are actually
 two problems, they could be linked.

Please.  If anyone can solve the issue given what you supplied, they 
would.  You were asked to supply a reproducible situation.  Hopefully 
we can get to that position quickly without further unnecessary 
distractions.

-- 
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php





Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

Don't get me wrong, I know pretty well that nothing can be done there
until you have a clean example. But bearing in mind that these errors
happen in 1 per 4 million files, it will also be hard to isolate the
case where this happens. The fact is that it happens pretty often, and
as I said we are continuing to try different ways to see when it will
stop happening. If we find that, I will provide the info here.

Regards.

Tuesday, July 24, 2007, 12:15:43 AM:

DL On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:

 Hello,
 
 I've filed this as a bug, but while Kern couldn't reproduce it he gave
 up. So let us find here what could be the problem. There are actually
 two problems, they could be linked.

DL Please.  If anyone can solve the issue given what you supplied, they 
DL would.  You were asked to supply a reproducible situation.  Hopefully 
DL we can get to that position quickly without further unnecessary 
DL distractions.





Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

Tuesday, July 24, 2007, 12:15:37 AM:

DL Someone suggested you verify your filesystem (e.g. fsck).  Have you 
DL done that?

Yes:

FS The first thing that I would try is unmounting the filesystem and
FS performing a full fsck on it, to rule out filesystem corruption.


1. unmounted and checked the partition with volumes for bad blocks and
fs problems:
   fsck.ext3 -n -c -v /dev/
Got NO errors.

2. for the volumes used by the jobs with problems, I did the following:
   bls -k -v /home/bacula/FILE0001
and got NO errors (I'm not sure if there is something wrong with the
volume the above should produce errors, but it listed all blocks and they
were OK)

3. same for:
   bls -j /home/bacula/FILE0001

   
!NEW: Here is one additional error and note:

I mentioned that a file that is restored with a wrong size during
a full restore is OK when restored alone. However, in that case another
error is reported:

Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275 Volume data 
error at 0:3999743252!
Wanted ID: BB02, got Иnлу. Buffer discarded.

Note that this error didn't affect the file restore - the file was OK,
both its size and its content.


Since we verified everything above and the volumes have correct
blocks, this error is strange. Does this mean that Bacula is
positioning at the wrong place, rather than there being a problem with
the volume itself?

I reported that once and now (different jobs and version of Bacula -
before it was 2.0.3, now 2.1.28) it is the same.

The strange thing is (as before):
- we listed all blocks in the volumes with bls -k
- there is no such File:blk as 0:3999743252
- the number 3999743252 however appears in the database, one entry in
the table JobMedia:
(18,2,2,134546,159167,0,1,3999743252,400989292,5,0,0)

While I don't know what it is used for, I can see that:
- either this number in the database is wrong,
- or Bacula should not try to position at that number, as there is
no valid File:blk at that address.
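
To make the JobMedia row easier to read, here is a small sketch that gives the twelve values names. The column order used below is an assumption on my side and should be checked against the catalog schema in the manual. The idea that a File:blk pair on a disk volume is one 64-bit offset split into a high and a low 32-bit half is also an assumption, not something confirmed in this thread.

```python
from collections import namedtuple

# Assumed column order of the JobMedia table; verify against your schema.
JobMedia = namedtuple(
    "JobMedia",
    "JobMediaId JobId MediaId FirstIndex LastIndex "
    "StartFile EndFile StartBlock EndBlock VolIndex Copy Stripe",
)

def byte_address(file_no, block_no):
    """Combine a File:blk pair into one 64-bit offset (assumed layout)."""
    return (file_no << 32) | block_no

# The row quoted above, copied verbatim.
row = JobMedia(18, 2, 2, 134546, 159167, 0, 1, 3999743252, 400989292, 5, 0, 0)
start = byte_address(row.StartFile, row.StartBlock)
end = byte_address(row.EndFile, row.EndBlock)
print("start offset:", start)
print("end offset:  ", end)
```

If both assumptions hold, the row describes a range from 0:3999743252 to 1:400989292, i.e. an end offset of 4695956588 bytes, which would sit just under the 4480MB volume size mentioned earlier. That would at least explain where the number 3999743252 comes from, but not why no block starts there.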



Regards





Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

Monday, July 23, 2007, 9:02:21 PM:


FS The first thing that I would try is unmounting the filesystem and
FS performing a full fsck on it, to rule out filesystem corruption.

not a problem with the FS or disks, checked that.

Checked the logical content of the volumes as well (bls -k -v) - no
errors. Out of curiosity, as I didn't know whether bls prints errors if
the content is damaged, I changed one random byte of one volume to 0,
ran bls and got an error: Block checksum mismatch in block=113
len=64512: calc=2a576dc5 blk=44a509f3
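
The single-byte experiment can be illustrated with a generic checksum. The sketch below uses zlib's CRC32 on a synthetic 64512-byte block; that Bacula's block checksum behaves the same way is an assumption here, and the data is made up, so the printed values will not match the ones in the bls output.

```python
import zlib

BLOCK_LEN = 64512  # block length reported in the bls error above

def checksum(block):
    """CRC32 of a block, as an unsigned 32-bit value."""
    return zlib.crc32(block) & 0xFFFFFFFF

# Synthetic block with no zero bytes, so zeroing any byte changes it.
block = bytes((i % 251) + 1 for i in range(BLOCK_LEN))
damaged = bytearray(block)
damaged[12345] = 0  # same kind of damage as in the test: one byte set to 0

stored = checksum(block)            # what would be kept in the block header
calc = checksum(bytes(damaged))     # what a reader computes from the data
print("blk=%08x calc=%08x match=%s" % (stored, calc, stored == calc))
```

A CRC32 always detects a change confined to a single byte, which is consistent with bls reporting the mismatch after the deliberate damage and reporting nothing on the untouched volumes.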

So I would say the 3 problems (files with wrong size, missing files
and error about ID: BB02) are not hardware/fs/disks related and are
caused by a bug in Bacula and it is related to wrong positioning in
the volumes and mismatched numbers.




Re: [Bacula-users] Restore errors

2007-07-23 Thread Frank Sweetser
Doytchin Spiridonov wrote:
 Hello,
 
 Monday, July 23, 2007, 9:02:21 PM:
 
 
 FS The first thing that I would try is unmounting the filesystem and
 FS performing a full fsck on it, to rule out filesystem corruption.
 
 not a problem with the FS or disks, checked that.
 
 Checked the logical content of the volumes as well (bls -k -v) - no
 errors. For curiosity as I don't know if bls should print errors if
 content is damaged, I changed one random byte of one volume to 0,
 run the bls and got an error: Block checksum mismatch in block=113
 len=64512: calc=2a576dc5 blk=44a509f3
 
 So I would say the 3 problems (files with wrong size, missing files
 and error about ID: BB02) are not hardware/fs/disks related and are
 caused by a bug in Bacula and it is related to wrong positioning in
 the volumes and mismatched numbers.

Based on this and your other emails, I would next suspect a problem with the
catalog.  Again, I'd start by making sure no errors have crept in by doing a
consistency check at the database level - a 'repair tables' in mysql, or the
equivalent in postgresql.

-- 
Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
WPI Network Engineer  |  is simple, elegant, and wrong. - HL Mencken
GPG fingerprint = 6174 1257 129E 0D21 D8D4  E8A3 8E39 29E3 E2E8 8CEC



Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

Tuesday, July 24, 2007, 12:15:37 AM:

DL On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:


DL Test the backups.  Backup one file. Restore.  Backup N files.  
DL restore.  Backup N directories, restore.  Find a simple and 
DL reproducible situation which demonstrates the problem.

Tested disks and backups (volumes) they are OK and not damaged. From
the small amount of tests so far (40-50) I can say (but not 100% sure
as this is not planned QA we are just doing different combinations to
achieve backup/restore w/o errors) that backing up a single job (even
with 4M of files) and restore is OK and there are no errors. All of
the cases with errors happen when we run jobs concurrently.

DL My suggestion, as was made by others, is to find something which can 
DL be reproduced.  Developers cannot solve the problem if they cannot 
DL reproduce it.

Yes, we have it - every time we do a concurrent full backup of the 4-5
servers we are testing with, we have at least one job that has a
problem.

We will continue with the tests as I noted before to see when (if) we
will get backup/restore w/o an error.

Regards.





Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

Tuesday, July 24, 2007, 2:25:44 AM:

FS Doytchin Spiridonov wrote:
 Hello,
 
 Monday, July 23, 2007, 9:02:21 PM:
 
 
 FS The first thing that I would try is unmounting the filesystem and
 FS performing a full fsck on it, to rule out filesystem corruption.
 
 not a problem with the FS or disks, checked that.
 
 Checked the logical content of the volumes as well (bls -k -v) - no
 errors. For curiosity as I don't know if bls should print errors if
 content is damaged, I changed one random byte of one volume to 0,
 run the bls and got an error: Block checksum mismatch in block=113
 len=64512: calc=2a576dc5 blk=44a509f3
 
 So I would say the 3 problems (files with wrong size, missing files
 and error about ID: BB02) are not hardware/fs/disks related and are
 caused by a bug in Bacula and it is related to wrong positioning in
 the volumes and mismatched numbers.

FS Based on this and your other emails, I would next suspect a problem with the
FS catalog.  Again, I'd start by making sure no errors have crept in by doing a
FS consistency check at the database level - a 'repair tables' in mysql, or the
FS equivalent in postgresql.


Not the case - we are doing daily DB dumps before backups and they
would crash if tables are damaged. No mysql error logs. The problem is
not in the MySQL or broken dbs.

Regards.




Re: [Bacula-users] Restore errors

2007-07-23 Thread Doytchin Spiridonov
Hello,

Done. Found where the problem is after some more tests (and once again
it is not in our hardware or OS or broken things). It is where I
initially suggested - the concurrent jobs.

After the first (and native configuration) we used (concurrent jobs,
with gzip) we tested the following:

1. concurrent jobs, w/o gzip
- we got similar errors (1 wrong filesize from 4 jobs, but 3 of 4 jobs
with less files than expected, the 4th usually is very small - 100
files - and never had errors, so I would say 100% of jobs was invalid)

2. no concurrent jobs (Maximum Concurrent Jobs = 1 at dir and sd), w/o
gzip
- good news, all restores are OK, no errors, Files Expected and Files
Restored match!

3. no concurrent jobs WITH gzip
- again OK, all restores are OK, no errors, Files Expected and Files
Restored match!

So until now we have:
- the problem is not caused by a corrupted file system
- volumes are consistent and bls doesn't show errors
- MySQL is OK (initially 4.1.x now 5.0.37)
- when running concurrent jobs both 2.0.3 and 2.1.28 say backups are
OK but restores fail with one of the 3 kinds of errors listed below
- when concurrent jobs are turned off everything is OK
- gzip on/off doesn't affect the errors

Once again the 3 types of errors are:

1. some static files (i.e. not log files!) are restored with a wrong
(always larger) size, while the first N bytes match and the rest is
filled with part of another file (not sure if this is just a file with
a wrong size and some old data on the disk appears at the end, or
Bacula restores part of another file and appends it to the end). The
file can be restored correctly if marked alone, but error 3 below
is then generated (which seems to be just a bogus error). An example error is:
---
b0: Restore_b0.d6.int.2007-07-23_22.37.34 Error: attribs.c:410 File size
of restored file
/home/bacula/res/b3.2/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm
not correct. Original 3826291, restored 10620921.
---
When this error is present (always) the second error below (but w/o
additional error messages) is present as well (missing files)

2. large amount of files are missing (while they are present in the
catalog and selected) - tens of thousands (not sockets or anything
else that Bacula ignores by default). When this happens usually an
error like this appear (if not the first one above):
---
b3: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: Record header
file index 42452 not equal record index 0
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Fatal error: read.c:124
Error sending to File daemon. ERR=Connection reset by peer
Storage: Restore_b3.d6.int.2007-07-23_17.31.47 Error: bsock.c:306
Write error sending 30 bytes to client:10.2.1.13:36643: ERR=Connection
reset by peer
---

3. when a file from error 1 is restored alone it is OK, but another
bogus error is generated:
---
Storage: Restore_b0.d6.int.2007-07-23_22.57.42 Error: block.c:275
Volume data error at 0:3999743252! Wanted ID: BB02, got Иnлу.
Buffer discarded.
---
Found that the above number (3999743252) is not present as block
address for any block in the volumes, but the same number appears as
part of JobMedia record in the database.
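
For collecting cases of error type 1 from job logs, something like the following could help. It is a sketch that assumes the message wording shown in the example above ("File size of restored file ... not correct. Original N, restored M."); the names SIZE_ERROR and parse_size_error are made up for illustration.

```python
import re

# Wording taken from the example error above; line wraps are tolerated.
SIZE_ERROR = re.compile(
    r"File size\s+of restored file\s+(?P<path>\S+)\s+not correct\.\s+"
    r"Original (?P<original>\d+), restored (?P<restored>\d+)\."
)

def parse_size_error(message):
    """Return (path, original, restored, extra_bytes) or None if no match."""
    m = SIZE_ERROR.search(message)
    if m is None:
        return None
    original = int(m.group("original"))
    restored = int(m.group("restored"))
    return m.group("path"), original, restored, restored - original
```

For the glibc example above this gives 6794630 extra bytes appended after the original 3826291.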


This is everything in 2.1.28 summarized that popped up as a problem or
fact.
(2.0.3 had another bug with bogus errors about sockets' attributes and
2.1.26 had bogus SQL error messages, but those are fixed in
2.1.28).

If anyone wants, feel free to reopen the bug in Mantis (903). I'm not
going to do so as I am personally disappointed by the attitude "this
is not a bug - work it out yourself" and the suggestion to send you
our servers as a gift to test with, plus support fees... nice. Now
it's up to you to create better test cases to catch more bugs if any.

We will start our backups again w/o concurrent jobs and we will
continue to monitor restores on a daily basis, as the above are
just 3 tests and I agree there is a possibility that it was just chance
that the latter two tests went OK. But it was my suggestion from the
beginning that the problem is that Bacula damages either database numbers
or volume records when concurrent jobs are running, and so far the
facts have confirmed this.


(!) The workaround for the problem is to switch off concurrent jobs;
otherwise the chance that you have invalid backups is high (some 90% in
our own cases, at least with our servers/OS/configuration; and that is
without arguing that 100% of backups are wrong, since after
diff/incremental backups Bacula restores files that were deleted, which
is really bad behaviour for many cases/services).


Regards



Tuesday, July 24, 2007, 12:15:43 AM:

DL On 23 Jul 2007 at 21:57, Doytchin Spiridonov wrote:

 Hello,
 
 I've filed this as a bug, but while Kern couldn't reproduce it he gave
 up. So let us find here what could be the problem. There are actually
 two problems, they could be linked.

DL Please.  If anyone can solve the issue given what you supplied, they 
DL would.  You were asked to supply a reproducible situation.  Hopefully 
DL 

Re: [Bacula-users] Restore errors: Permission Denied

2005-10-25 Thread Kern Sibbald
On Friday 21 October 2005 21:11, Martin Simmons wrote:
  On Thu, 20 Oct 2005 16:29:17 +1000, Craig Holyoak
  [EMAIL PROTECTED] said:

   Craig On Wed, 2005-10-19 at 11:17 +0100, Martin Simmons wrote:

 On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak
 [EMAIL PROTECTED] said:

   Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run
   Craig a restore job, it fails with:

   Craig 19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50
   Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission denied
   Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection reset by peer
   Craig 19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53
   Craig JobId:  878
   Craig Job:Restore.2005-10-19_11.29.50
   Craig Client: newman
   Craig Start time: 19-Oct-2005 11:29:52
   Craig End time:   19-Oct-2005 11:29:53
   Craig Files Expected: 1
   Craig Files Restored: 0
   Craig Bytes Restored: 0
   Craig Rate:   0.0 KB/s
   Craig FD Errors:  0
   Craig FD termination status:  Error
   Craig SD termination status:  Error
   Craig Termination:*** Restore Error ***

   Craig This machine runs all bacula daemons. The director and sd run as
   Craig the bacula user, and the sd runs as root. /var/lib/bacula is
   Craig writable by the bacula user (and root, obviously :-).

   Craig I've tried modifying the job and redirecting the bootstrap file
   Craig elsewhere (eg /tmp), but I keep getting the same errors. I have
   Craig never run a successful restore using bconsole. I'm forced to use
   Craig bextract to do all my restores, which works fine.

   Craig Any ideas?

The error comes from the SD.
   
So helmsdeep and newman are the same machine?

   Craig helmsdeep is the name of my bacula director, which runs on
   Craig newman, so yes, these are the same machine.

Does getting the same errors mean that it always says
/var/lib/bacula even when the bootstrap should be in /tmp?

   Craig Perhaps I'm a little unclear on what bootstraps are involved. By
   Craig default, a bootstrap for the job is placed
   Craig in /var/lib/bacula/restore.bsr. If I change this elsewhere by
   Craig modifying the job before it is run, ie, /tmp/restore.bsr, it fails
   Craig because it can't find the file - there is no /tmp/restore.bsr. But
   Craig even though it creates the bootstrap successfully to
   Craig /var/lib/bacula/restore.bsr, it still wants to create
   Craig /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap, at
   Craig which point it fails with permission denied.

 Ah, I see.

 From the output, it looks like your File Daemon and Storage Daemon are both
 called newman (in the conf files)?  What happens if you rename one of
 them to something else?

 I'm wondering if there is a filename clash somewhere.

Yes, if two daemons are running on the same machine sharing the same working 
directory they *MUST* have unique names.
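
A rough way to check for this kind of clash is to compare the daemon names directly. The sketch below is only an illustration, not part of Bacula: the function names are made up, and it assumes the FD's own name sits in a `FileDaemon { ... }` resource and the SD's in a `Storage { ... }` resource, one directive per line. If your FD config spells the resource `Client`, pass that instead.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Bacula) for spotting two daemons
# that were given the same Name while sharing one working directory.

# resource_name FILE RESOURCE
# Print the Name value found inside the first "RESOURCE { ... }" block.
resource_name() {
    awk -v res="$2" '
        { line = $0; sub(/^[ \t]+/, "", line) }          # strip indentation
        !inside && tolower(line) ~ ("^" tolower(res) "[ \t]*[{]") { inside = 1 }
        inside && line ~ /^[Nn]ame[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "", line)               # drop "Name ="
            sub(/[ \t]*(#.*)?$/, "", line)               # drop trailing comment
            gsub(/"/, "", line)                          # drop quotes
            print line
            exit
        }
        inside && line ~ /[}]/ { inside = 0 }            # end of the block
    ' "$1"
}

# names_clash FD_CONF SD_CONF
# Succeed (exit 0) when both daemons carry the same non-empty name.
names_clash() {
    fd_name=$(resource_name "$1" FileDaemon)
    sd_name=$(resource_name "$2" Storage)
    [ -n "$fd_name" ] && [ "$fd_name" = "$sd_name" ]
}
```

With the functions loaded, something like `names_clash /etc/bacula/bacula-fd.conf /etc/bacula/bacula-sd.conf && echo "FD and SD share a name"` would flag the situation described in this thread. The paths are an assumption about a Debian-style layout, so adjust them to wherever your conf files live.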


 __Martin




Re: [Bacula-users] Restore errors: Permission Denied

2005-10-21 Thread Martin Simmons
 On Thu, 20 Oct 2005 16:29:17 +1000, Craig Holyoak [EMAIL PROTECTED] 
 said:

  Craig On Wed, 2005-10-19 at 11:17 +0100, Martin Simmons wrote:
On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak [EMAIL PROTECTED] 
said:
   
  Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a 
restore
  Craig job, it fails with:
   
  Craig 19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50
  Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission denied
  Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection reset by peer
  Craig 19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53
  Craig JobId:  878
  Craig Job:Restore.2005-10-19_11.29.50
  Craig Client: newman
  Craig Start time: 19-Oct-2005 11:29:52
  Craig End time:   19-Oct-2005 11:29:53
  Craig Files Expected: 1
  Craig Files Restored: 0
  Craig Bytes Restored: 0
  Craig Rate:   0.0 KB/s
  Craig FD Errors:  0
  Craig FD termination status:  Error
  Craig SD termination status:  Error
  Craig Termination:*** Restore Error ***
   
  Craig This machine runs all bacula daemons. The director and sd run as the 
bacula
  Craig user, and the sd runs as root. /var/lib/bacula is writable by the 
bacula user
  Craig (and root, obviously :-).
   
  Craig I've tried modifying the job and redirecting the bootstrap file 
elsewhere (eg
  Craig /tmp), but I keep getting the same errors. I have never run a 
successful
  Craig restore using bconsole. I'm forced to use bextract to do all my 
restores,
  Craig which works fine.
   
  Craig Any ideas?
   
   The error comes from the SD.
   
   So helmsdeep and newman are the same machine?

  Craig helmsdeep is the name of my bacula director, which runs on newman,
  Craig so yes, these are the same machine.

   Does getting the same errors mean that it always says /var/lib/bacula 
even
   when the bootstrap should be in /tmp?

  Craig Perhaps I'm a little unclear on what bootstraps are involved. By
  Craig default, a bootstrap for the job is placed
  Craig in /var/lib/bacula/restore.bsr. If I change this elsewhere by modifying
  Craig the job before it is run, ie, /tmp/restore.bsr, it fails because it
  Craig can't find the file - there is no /tmp/restore.bsr. But even though it
  Craig creates the bootstrap successfully to /var/lib/bacula/restore.bsr, it
  Craig still wants to
  Craig create /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap, at
  Craig which point it fails with permission denied.

Ah, I see.

From the output, it looks like your File Daemon and Storage Daemon are both
called newman (in the conf files)?  What happens if you rename one of
them to something else?

I'm wondering if there is a filename clash somewhere.

__Martin




Re: [Bacula-users] Restore errors: Permission Denied [SOLVED]

2005-10-21 Thread Craig Holyoak
On Fri, 2005-10-21 at 20:11 +0100, Martin Simmons wrote:
  On Thu, 20 Oct 2005 16:29:17 +1000, Craig Holyoak [EMAIL PROTECTED] 
  said:
 
   Craig On Wed, 2005-10-19 at 11:17 +0100, Martin Simmons wrote:
 On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak [EMAIL PROTECTED] said:

   Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a 
 restore
   Craig job, it fails with:

   Craig 19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50
   Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission denied
   Craig 19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection reset by peer
   Craig 19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53
   Craig JobId:  878
   Craig Job:Restore.2005-10-19_11.29.50
   Craig Client: newman
   Craig Start time: 19-Oct-2005 11:29:52
   Craig End time:   19-Oct-2005 11:29:53
   Craig Files Expected: 1
   Craig Files Restored: 0
   Craig Bytes Restored: 0
   Craig Rate:   0.0 KB/s
   Craig FD Errors:  0
   Craig FD termination status:  Error
   Craig SD termination status:  Error
   Craig Termination:*** Restore Error ***

   Craig This machine runs all bacula daemons. The director and sd run as the 
 bacula
   Craig user, and the sd runs as root. /var/lib/bacula is writable by the 
 bacula user
   Craig (and root, obviously :-).

   Craig I've tried modifying the job and redirecting the bootstrap file 
 elsewhere (eg
   Craig /tmp), but I keep getting the same errors. I have never run a 
 successful
   Craig restore using bconsole. I'm forced to use bextract to do all my 
 restores,
   Craig which works fine.

   Craig Any ideas?

The error comes from the SD.

So helmsdeep and newman are the same machine?
 
   Craig helmsdeep is the name of my bacula director, which runs on 
 newman,
   Craig so yes, these are the same machine.
 
Does getting the same errors mean that it always says 
 /var/lib/bacula even
when the bootstrap should be in /tmp?
 
   Craig Perhaps I'm a little unclear on what bootstraps are involved. By
   Craig default, a bootstrap for the job is placed
   Craig in /var/lib/bacula/restore.bsr. If I change this elsewhere by 
 modifying
   Craig the job before it is run, ie, /tmp/restore.bsr, it fails because it
   Craig can't find the file - there is no /tmp/restore.bsr. But even though 
 it
   Craig creates the bootstrap successfully to /var/lib/bacula/restore.bsr, it
   Craig still wants to
   Craig create /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap, 
 at
   Craig which point it fails with permission denied.
 
 Ah, I see.
 
 From the output, it looks like your File Daemon and Storage Daemon are both
 called newman (in the conf files)?  What happens if you rename one of
 them to something else?
 
 I'm wondering if there is a filename clash somewhere.

Renaming the SD to newman-sd worked perfectly. Thanks!

Craig

-- 
Craig Holyoak
[EMAIL PROTECTED]
http://www.helmsdeep.org/




Re: [Bacula-users] Restore errors: Permission Denied

2005-10-20 Thread Craig Holyoak
On Wed, 2005-10-19 at 11:17 +0100, Martin Simmons wrote: 
  On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak [EMAIL PROTECTED] 
  said:
 
   Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a 
 restore
   Craig job, it fails with:
 
   Craig   19-Oct 11:29 helmsdeep: Start Restore Job 
 Restore.2005-10-19_11.29.50
   Craig   19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: 
 Could not create bootstrap file 
 /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission 
 denied
   Craig   19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: 
 job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection 
 reset by peer
   Craig   19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 
 1.36.2 (28Feb05): 19-Oct-2005 11:29:53
   Craig JobId:  878
   Craig Job:Restore.2005-10-19_11.29.50
   Craig Client: newman
   Craig Start time: 19-Oct-2005 11:29:52
   Craig End time:   19-Oct-2005 11:29:53
   Craig Files Expected: 1
   Craig Files Restored: 0
   Craig Bytes Restored: 0
   Craig Rate:   0.0 KB/s
   Craig FD Errors:  0
   Craig FD termination status:  Error
   Craig SD termination status:  Error
   Craig Termination:*** Restore Error ***
 
   Craig This machine runs all bacula daemons. The director and sd run as the 
 bacula
   Craig user, and the sd runs as root. /var/lib/bacula is writable by the 
 bacula user
   Craig (and root, obviously :-).
 
   Craig I've tried modifying the job and redirecting the bootstrap file 
 elsewhere (eg
   Craig /tmp), but I keep getting the same errors. I have never run a 
 successful
   Craig restore using bconsole. I'm forced to use bextract to do all my 
 restores,
   Craig which works fine.
 
   Craig Any ideas?
 
 The error comes from the SD.
 
 So helmsdeep and newman are the same machine?

helmsdeep is the name of my bacula director, which runs on newman,
so yes, these are the same machine.

 Does getting the same errors mean that it always says /var/lib/bacula even
 when the bootstrap should be in /tmp?

Perhaps I'm a little unclear on what bootstraps are involved. By
default, a bootstrap for the job is placed
in /var/lib/bacula/restore.bsr. If I change this elsewhere by modifying
the job before it is run, ie, /tmp/restore.bsr, it fails because it
can't find the file - there is no /tmp/restore.bsr. But even though it
creates the bootstrap successfully to /var/lib/bacula/restore.bsr, it
still wants to
create /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap, at
which point it fails with permission denied.

 What is the output of
 
 ls -la /var/lib/bacula
 ls -la /var/lib
 ls -la /var
 
 on the SD? 

[newman:~]# ls -la /var/lib/bacula
total 528848
drwxrwxr-x   2 bacula backups      4096 2005-10-20 07:21 .
drwxr-xr-x  36 root   root         4096 2005-09-17 14:03 ..
-rw-r-----   1 bacula backups       516 2005-10-20 01:18 BackupCatalog.bsr
-rw-r-----   1 bacula backups 539603968 2005-06-28 09:52 bacula.db.old
-rw-r-----   1 bacula backups      2032 2005-10-19 11:28 bacula-dir.9101.state
-rw-r-----   1 root   root         2032 2005-10-19 11:28 bacula-fd.9102.state
-rw-r-----   1 bacula tape         2032 2005-10-19 11:28 bacula-sd.9103.state
-rw-r-----   1 bacula backups       103 2004-11-28 12:39 CVS.bsr
-rw-r-----   1 bacula backups      2779 2004-09-19 16:13 Data.bsr
-rw-------   1 bacula backups         0 2005-10-19 11:31 helmsdeep.conmsg
-rw-------   1 bacula backups         0 2004-09-17 17:45 helmsdeep-dir.conmsg
-rw-r-----   1 bacula backups      1303 2005-10-20 01:11 Home.bsr
-rwxrwx---   1 bacula backups   1319129 2005-10-20 01:18 log
-rw-r-----   1 bacula backups       523 2005-10-20 01:12 Mail.bsr
-rw-r-----   1 bacula backups       430 2004-11-28 12:55 Music.bsr
-rw-------   1 bacula backups         0 2004-09-16 08:27 newman-dir.conmsg
-rw-r-----   1 bacula backups       633 2005-10-20 01:05 NewmanRoot.bsr
-rw-r-----   1 bacula backups       745 2005-10-20 01:07 PenfoldRoot.bsr
-rw-r-----   1 bacula backups       107 2005-10-16 03:00 Public.bsr
-rw-r-----   1 bacula backups       107 2005-10-19 11:29 restore.bsr
-rw-r-----   1 root   root           73 2005-07-08 10:11 root-exclude
-rw-r-----   1 root   root            2 2004-11-15 09:20 root-include
-rw-r-----   1 bacula backups       104 2005-10-16 03:00 Source.bsr
[newman:~]# ls -la /var/lib
total 144
drwxr-xr-x  36 root   root4096 2005-09-17 14:03 .
drwxr-xr-x  14 root   root4096 2005-07-12 16:57 ..
drwxr-xr-x   2 root   root4096 2005-05-12 15:35 apache2
drwxr-xr-x   3 root   root4096 2005-07-18 13:21 apt
drwxr-xr-x   2 root   root4096 2004-07-31 17:51 aptitude
drwxr-xr-x   2 root   root4096 2005-03-04 02:21 apt-proxy
drwxrwxr-x   2 bacula backups 4096 2005-10-20 07:21 bacula
drwxr-xr-x   2 clamav 

Re: [Bacula-users] Restore errors: Permission Denied

2005-10-19 Thread Martin Simmons
 On Wed, 19 Oct 2005 12:08:24 +1000, Craig Holyoak [EMAIL PROTECTED] 
 said:

  Craig I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a 
restore
  Craig job, it fails with:

  Craig   19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50
  Craig   19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could 
not create bootstrap file 
/var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission 
denied
  Craig   19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: 
job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection reset 
by peer
  Craig   19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 
1.36.2 (28Feb05): 19-Oct-2005 11:29:53
  Craig JobId:  878
  Craig Job:Restore.2005-10-19_11.29.50
  Craig Client: newman
  Craig Start time: 19-Oct-2005 11:29:52
  Craig End time:   19-Oct-2005 11:29:53
  Craig Files Expected: 1
  Craig Files Restored: 0
  Craig Bytes Restored: 0
  Craig Rate:   0.0 KB/s
  Craig FD Errors:  0
  Craig FD termination status:  Error
  Craig SD termination status:  Error
  Craig Termination:*** Restore Error ***

  Craig This machine runs all bacula daemons. The director and sd run as the 
bacula
  Craig user, and the sd runs as root. /var/lib/bacula is writable by the 
bacula user
  Craig (and root, obviously :-).

  Craig I've tried modifying the job and redirecting the bootstrap file 
elsewhere (eg
  Craig /tmp), but I keep getting the same errors. I have never run a 
successful
  Craig restore using bconsole. I'm forced to use bextract to do all my 
restores,
  Craig which works fine.

  Craig Any ideas?

The error comes from the SD.

So helmsdeep and newman are the same machine?

Does getting the same errors mean that it always says /var/lib/bacula even
when the bootstrap should be in /tmp?

What is the output of

ls -la /var/lib/bacula
ls -la /var/lib
ls -la /var

on the SD?
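
The three listings above can be collected in one pass. As a minimal sketch (the function name `show_path_perms` is made up for illustration), this prints the owner and mode of a directory and of every parent up to `/`, which is usually enough to see which path component blocks a write:

```shell
#!/bin/sh
# show_path_perms DIR
# Print "ls -ld" for DIR and each of its parent directories, deepest first.
show_path_perms() {
    p=$1
    while :; do
        ls -ld "$p"
        # Stop at the filesystem root (or "." for a relative path).
        case "$p" in
            /|.) break ;;
        esac
        p=$(dirname "$p")
    done
}
```

For example, `show_path_perms /var/lib/bacula` lists `/var/lib/bacula`, `/var/lib`, `/var` and `/` in that order. It shows only the directories themselves, not their contents, so the full `ls -la /var/lib/bacula` is still worth posting.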

Does

touch /var/lib/bacula/touch-test

work as the bacula user on the SD?
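
If you want the touch test to clean up after itself, it can be wrapped as below. This is just a sketch with a made-up function name (`can_create_in`). It reports on whichever user runs it, so it has to be run from a shell that is actually running as the bacula user.

```shell
#!/bin/sh
# can_create_in DIR
# Succeed (exit 0) if the current user can create a file in DIR.
# The probe file is removed again, so nothing is left behind.
can_create_in() {
    probe="$1/touch-test.$$"      # $$ keeps the probe name unique per shell
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}
```

Usage would be along the lines of `can_create_in /var/lib/bacula && echo writable || echo "not writable"`. On Debian a shell as that user can usually be started with `su -s /bin/sh bacula`, assuming the account is named bacula.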

__Martin




[Bacula-users] Restore errors: Permission Denied

2005-10-18 Thread Craig Holyoak
I'm running bacula 1.36.2 on Debian stable. Whenever I try to run a restore
job, it fails with:

  19-Oct 11:29 helmsdeep: Start Restore Job Restore.2005-10-19_11.29.50
  19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: Could not create bootstrap file /var/lib/bacula/newman.Restore.2005-10-19_11.29.50.bootstrap: ERR=Permission denied
  19-Oct 11:29 newman: Restore.2005-10-19_11.29.50 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=Connection reset by peer
  19-Oct 11:29 helmsdeep: Restore.2005-10-19_11.29.50 Error: Bacula 1.36.2 (28Feb05): 19-Oct-2005 11:29:53
  JobId:                  878
  Job:                    Restore.2005-10-19_11.29.50
  Client:                 newman
  Start time:             19-Oct-2005 11:29:52
  End time:               19-Oct-2005 11:29:53
  Files Expected:         1
  Files Restored:         0
  Bytes Restored:         0
  Rate:                   0.0 KB/s
  FD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Restore Error ***

This machine runs all bacula daemons. The director and sd run as the bacula
user, and the fd runs as root. /var/lib/bacula is writable by the bacula user
(and root, obviously :-).

I've tried modifying the job and redirecting the bootstrap file elsewhere (eg
/tmp), but I keep getting the same errors. I have never run a successful
restore using bconsole. I'm forced to use bextract to do all my restores,
which works fine.

Any ideas?

Thanks,

Craig

-- 
Craig Holyoak
[EMAIL PROTECTED]
http://www.helmsdeep.org/

