Re: [Bacula-users] Big job keeps failing after server replacement

2019-05-03 Kern Sibbald
Hello,

Note: for Windows, the FD can probably back up the files, but it may not
be backing up everything, so you might have restore problems later. There
have been a *lot* of fixes for the Windows FD since version 5.2.x.

It seems that on 6 April, you were running the SD as bacula:bacula, and
now you are running it as bacula:tape.
That could be the source of your problem.
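Before changing ownership, it may help to confirm which identity the running SD actually has, rather than relying on the RPM defaults. Below is a minimal sketch. It assumes Linux with /proc mounted and a process named bacula-sd, and the helper names (parse_status, find_pids, describe) are hypothetical. It reads the Uid:, Gid: and Groups: lines of /proc/<pid>/status and maps the IDs to names.

```python
from __future__ import annotations

import grp
import os
import pwd


def parse_status(text: str) -> tuple[int, int, list[int]]:
    """Extract (uid, gid, supplementary gids) from /proc/<pid>/status text.

    The Uid:/Gid: lines hold real, effective, saved and filesystem IDs.
    The last one (filesystem ID) is what Linux uses for file permission
    checks, and it normally equals the effective ID.
    """
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.split()
    uid = int(fields["Uid"][3])
    gid = int(fields["Gid"][3])
    groups = [int(g) for g in fields.get("Groups", [])]
    return uid, gid, groups


def find_pids(name: str) -> list[int]:
    """Return PIDs whose /proc/<pid>/comm equals name (Linux only)."""
    if not os.path.isdir("/proc"):
        return []
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            pass  # the process exited or its entry is not readable
    return pids


def name_of_user(uid: int) -> str:
    """User name for uid, or the number itself if it is unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def name_of_group(gid: int) -> str:
    """Group name for gid, or the number itself if it is unknown."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def describe(uid: int, gid: int, groups: list[int]) -> str:
    """Format IDs as 'user:group groups=a,b' via the system user/group databases."""
    extra = ",".join(name_of_group(g) for g in groups)
    return f"{name_of_user(uid)}:{name_of_group(gid)} groups={extra}"


if __name__ == "__main__":
    for pid in find_pids("bacula-sd"):
        with open(f"/proc/{pid}/status") as f:
            print(pid, describe(*parse_status(f.read())))
```

Run it as root on the storage server. Each matching process prints its PID followed by user:group and its supplementary groups, which can then be compared with the owner and group that `ls -l` shows on the volume files.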

On 5/2/19 5:14 PM, Gary Stainburn wrote:
> [snip]



___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Big job keeps failing after server replacement

2019-05-02 Gary Stainburn
Hi Kern,

Thank you for your response.

On Thursday 02 May 2019 14:23:54 Kern Sibbald wrote:
> Hello,
> 
> Sorry you are having problems.  I note the following things:
> 
> 1. You are running on a *very* old Bacula version.

I am aware of this, but it is the latest version that installs from the CentOS
repositories and, more importantly, it is only a few revisions newer than the
version on the old (dead) box, which ensured that the database format etc.
would be compatible.

> 2. I am not sure that version of Bacula supports Windows 7, where you
> are getting failures.

This version has successfully worked on WinXP, Win7, Win8 and Win10. I have
approx. 150 workstations, all with one of these versions. I also have Linux
servers of various ages, all of which also work.

> 3. As for the errors, it looks like the SD does not have permission to
> open /var/bacula/crownest

When this was first set up I had forgotten to set the owner:group on the 
directory. I did fix this, and the files that are now in that directory have 
the correct owner:group, and were written by the SD (see OP). 

I am aware that this is what the errors are saying, but that does not explain 
how the files are still being created.


> 4. Is /var/bacula/crownest a network mount (this could explain the
> failures).

This is a file storage device configured in the SD on the same box as the
Director. The directory itself is local to that box.
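One way to confirm that a volume directory really sits on a local filesystem is to find the longest mount point in /proc/mounts that contains it and check that mount's filesystem type. Below is a minimal, Linux-only sketch. NETWORK_FS is an illustrative set of network filesystem types, not an exhaustive one, and mount_for is a hypothetical helper.

```python
from __future__ import annotations

import os

# Illustrative, not exhaustive, set of network filesystem types.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "smbfs", "fuse.sshfs", "glusterfs", "ceph"}


def mount_for(path: str, mounts_text: str) -> tuple[str, str]:
    """Return (mount point, fs type) of the longest mount point containing path.

    mounts_text uses the /proc/mounts layout: device, mount point, type, ...
    """
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt = fields[1].replace("\\040", " ")  # /proc/mounts escapes spaces
        fstype = fields[2]
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        if inside and len(mnt) > len(best[0]):
            best = (mnt, fstype)
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    target = os.path.realpath("/var/bacula/crownest")  # follow any symlinks
    with open("/proc/mounts") as f:
        mnt, fstype = mount_for(target, f.read())
    kind = "network" if fstype in NETWORK_FS else "local (by this heuristic)"
    print(f"{target} is on {mnt} ({fstype}): {kind}")
```

Matching on the longest containing mount point means a local / or /var mount is not mistaken for the answer when a network share is mounted further down the tree.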

> 5. It looks like it is taking a bit over 2 hours for the SD to mount the
>    volume for the backup, and this is approximately the network inactivity
>    timeout period. So possibly the FD<->SD connection is timing out.

The client device and the storage device are on different sites, with a 30Mb/s
connection, doing an 18GB full backup. This would explain the 2-hour run time.
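As a rough sanity check on that run time, transfer time is size × 8 / bandwidth. The sketch below assumes decimal gigabytes and takes the link to be about 30 Mbit/s. That reading is an assumption: at 30 Gbit/s an 18 GB job would take seconds, not hours. The efficiency factor is a hypothetical allowance for protocol overhead and other traffic.

```python
def transfer_hours(size_gb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Hours to move size_gb decimal gigabytes over a link_mbit_s megabit/s link.

    efficiency is the fraction of the nominal line rate actually achieved
    (protocol overhead, other traffic); 1.0 means the full line rate.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 3600


print(f"{transfer_hours(18, 30):.2f} h at the full line rate")           # 1.33 h
print(f"{transfer_hours(18, 30, 0.7):.2f} h at 70% of the line rate")    # 1.90 h
print(f"{transfer_hours(18, 30_000) * 3600:.1f} s if it were 30 Gbit/s")  # 4.8 s
```

At the full line rate this gives about 80 minutes, and at roughly 70% efficiency close to two hours. The run time on its own is therefore plausible for a link of that size.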

> 6. Try adding HeartBeatInterval = 300 to the Dir, FD, and SD. I think
>    there are 5 places where it must be done (note this is the default in
>    more recent Baculas).

I will investigate where this needs to go and will apply it.

> 
> Recommendations:
> 1. Make sure the SD either has full permissions on all disk files or
> runs as root.

The SD runs as bacula:tape, which is the default configuration from the RPMs
and matches the old box.

As you can see from the directory structure, this should be correct.
[root@lou bacula]# ls -ld / /var/ /var/bacula/ /var/bacula/crownest/ /var/bacula/crownest/* /var/bacula/hales/ /var/bacula/hales/*
dr-xr-xr-x. 18 root   root  281 Apr  5 11:13 /
drwxr-xr-x. 27 root   root 4096 Mar  8 13:59 /var/
drwxr-xr-x. 14 bacula bacula   8192 Apr 30 11:28 /var/bacula/
drwxr-xr-x.  2 bacula bacula   4096 May  2 15:43 /var/bacula/crownest/
-rw-r-----.  1 bacula tape   5368688828 Apr 30 17:20 /var/bacula/crownest/crownest72930
-rw-r-----.  1 bacula tape   5368688851 Apr 30 17:21 /var/bacula/crownest/crownest72931
-rw-r-----.  1 bacula tape   5368688789 Apr 30 17:21 /var/bacula/crownest/crownest72932
[snip]
-rw-r-----.  1 bacula tape   1745617753 May  2 15:43 /var/bacula/crownest/crownest72965
drwxr-xr-x.  3 bacula bacula  12288 May  1 23:41 /var/bacula/hales/
-rw-r-----.  1 bacula bacula 5368705600 Apr  6 03:29 /var/bacula/hales/hales68076
-rw-r-----.  1 bacula bacula 5368688843 Apr  6 03:29 /var/bacula/hales/hales68078
-rw-r-----.  1 bacula bacula 5368670610 Apr  6 03:51 /var/bacula/hales/hales68082

(Hales is the storage device that was used previously)
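The directory structure can also be checked mechanically against the standard POSIX permission rules. The sketch below is a simplified model that looks only at mode bits: no ACLs, no SELinux and no root override. It assumes the volumes are -rw-r----- (0640), and its helper names are hypothetical. Under these rules the owner class is consulted first, so a process running as user bacula gets the owner's bits on anything owned by bacula, whatever its primary group.

```python
from __future__ import annotations

from typing import Iterable


def parse_ls_line(line: str) -> tuple[str, str, str, str]:
    """Split one `ls -l` line into (mode, owner, group, path).

    Assumes the usual layout: mode, links, owner, group, size, month,
    day, time, name. A trailing '.' (SELinux context) or '+' (ACL)
    marker on the mode is dropped, so those mechanisms are NOT modelled.
    """
    fields = line.split()
    mode = fields[0].rstrip(".+")
    return mode, fields[2], fields[3], " ".join(fields[8:])


def can_access(mode: str, owner: str, group: str,
               user: str, groups: Iterable[str], want: str = "rw") -> bool:
    """Plain mode-bit check: no ACLs, no SELinux, no root override."""
    if user == owner:
        bits = mode[1:4]   # owner class applies whenever the user matches
    elif group in set(groups):
        bits = mode[4:7]   # group class: primary or supplementary group
    else:
        bits = mode[7:10]  # everyone else
    return all(flag in bits for flag in want)


listing = [
    "drwxr-xr-x.  2 bacula bacula   4096 May  2 15:43 /var/bacula/crownest/",
    "-rw-r-----.  1 bacula tape   5368688828 Apr 30 17:20 /var/bacula/crownest/crownest72930",
    "-rw-r-----.  1 bacula bacula 5368705600 Apr  6 03:29 /var/bacula/hales/hales68076",
]
for line in listing:
    mode, owner, group, path = parse_ls_line(line)
    # Creating volumes needs write+search on the directory; files need read+write.
    need = "wx" if mode.startswith("d") else "rw"
    print(path, can_access(mode, owner, group, "bacula", ["tape"], need))
```

With user bacula in group tape, all three sample lines print True. If the SD really runs as user bacula, the mode bits alone would permit access. That would suggest looking at something this model leaves out, such as the SELinux context signalled by the trailing '.' on each mode string, or the daemon running under a different identity than expected.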

> 2. Make sure any network mounts (NFS, CIFS) are mounted prior to running
> a job

I do not use network mounts.

> 3. Add the Heart Beat Interval = 300 in all the required resources --
>    this may be documented in one of the white papers, but is surely
>    documented in the manual, look in the index ...

I have added this line to the Director, client and storage resources. I cannot
see where else it would be needed.
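Bacula ignores case and spaces in directive keywords, so HeartBeatInterval, Heartbeat Interval and Heart Beat Interval are the same directive. The Bacula manual documents it in three places in bacula-dir.conf (the Director, Client and Storage resources), in the FileDaemon resource of bacula-fd.conf and in the Storage resource of bacula-sd.conf. Those may be the five places Kern has in mind, though that is worth confirming against the 5.2.x manual. A rough way to see which resources already set it is to scan each config file.

The sketch below is a naive parser. It strips # comments and assumes no braces or # characters inside quoted strings, so a password could break it. The sample configuration, including example-dir and the Storage name, is hypothetical.

```python
from __future__ import annotations

import re

# Bacula ignores case and spaces in directive keywords, so "HeartbeatInterval",
# "Heart Beat Interval" and "heartbeat interval" all name the same directive.
HEARTBEAT = re.compile(r"(?:^|;)\s*heart\s*beat\s*interval\s*=", re.I | re.M)
NAME = re.compile(r"(?:^|;)\s*name\s*=\s*\"?([^\";\n]+)", re.I | re.M)


def top_level_resources(text: str) -> list[tuple[str, str]]:
    """Return (resource type, body) for each top-level `Type { ... }` block.

    Naive sketch: strips '#' comments to end of line and assumes there are
    no braces or '#' characters inside quoted strings (e.g. passwords).
    """
    text = re.sub(r"#.*", "", text)
    resources: list[tuple[str, str]] = []
    depth, start, rtype = 0, 0, ""
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                # The resource type is the last word before the opening brace.
                words = text[:i].split()
                rtype = words[-1].lstrip("}") if words else ""
                start = i + 1
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                resources.append((rtype, text[start:i]))
    return resources


def heartbeat_report(text: str) -> dict[str, bool]:
    """Map 'Type Name' to whether that resource sets Heartbeat Interval."""
    report: dict[str, bool] = {}
    for rtype, body in top_level_resources(text):
        m = NAME.search(body)
        label = f"{rtype} {m.group(1).strip()}" if m else rtype
        report[label] = bool(HEARTBEAT.search(body))
    return report


sample = """
Director {                      # hypothetical bacula-dir.conf excerpt
  Name = example-dir
  Heartbeat Interval = 300
}
Client {
  Name = lsaless-fd
  Address = 10.3.3.3
}
Storage { Name = crownest; HeartBeatInterval = 300 }
"""
for label, ok in heartbeat_report(sample).items():
    print(f"{label:24} {'set' if ok else 'MISSING'}")
```

Running the same report over bacula-dir.conf, bacula-fd.conf on each client, and bacula-sd.conf shows at a glance which resources still lack the directive.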

> 4. When you get it running think seriously about upgrading. Though
>    RedHat releases an older version of Bacula the builders do create
>    packages for newer versions -- or look on www.bacula.org for binaries.

I am aware that I am running an old version. Is there a documented upgrade
path? I believe CentOS 8 is imminent, and I would prefer to stick to standard
RPMs where possible.




Re: [Bacula-users] Big job keeps failing after server replacement

2019-05-02 Kern Sibbald
Hello,

Sorry you are having problems.  I note the following things:

1. You are running on a *very* old Bacula version.
2. I am not sure that version of Bacula supports Windows 7, where you
   are getting failures.
3. As for the errors, it looks like the SD does not have permission to
   open /var/bacula/crownest.
4. Is /var/bacula/crownest a network mount? (This could explain the
   failures.)
5. It looks like it is taking a bit over 2 hours for the SD to mount the
   volume for the backup, and this is approximately the network inactivity
   timeout period. So possibly the FD<->SD connection is timing out.
6. Try adding HeartBeatInterval = 300 to the Dir, FD, and SD. I think
   there are 5 places where it must be done (note this is the default in
   more recent Baculas).
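Points 5 and 6 concern an otherwise idle connection being dropped while the SD is busy, for example by a firewall or NAT device between the sites. A heartbeat keeps some traffic flowing on the connection, and TCP keepalive is the kernel-level form of the same idea. The sketch below is a generic Python illustration only, not Bacula's code. It enables keepalive on a socket with a 300-second idle time, in the spirit of HeartBeatInterval = 300. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are platform-specific options, so each is set only if the socket module exposes it. The interval and probe-count defaults are arbitrary choices.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 300,
                     interval: int = 60, probes: int = 5) -> None:
    """Enable TCP keepalive and tune its timers where the OS allows it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    )
    for option, value in tuning:
        if hasattr(socket, option):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, option), value)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s, idle=300)
    print("keepalive enabled:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
```

Keepalive probes only help if they are sent more often than the middlebox's idle timeout. That is why the interval is set well below the roughly 2-hour window mentioned in point 5.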

Recommendations:
1. Make sure the SD either has full permissions on all disk files or
   runs as root.
2. Make sure any network mounts (NFS, CIFS) are mounted prior to running
   a job.
3. Add the Heart Beat Interval = 300 in all the required resources --
   this may be documented in one of the white papers, but is surely
   documented in the manual; look in the index ...
4. When you get it running, think seriously about upgrading. Though
   Red Hat releases an older version of Bacula, the builders do create
   packages for newer versions -- or look on www.bacula.org for binaries.

Best regards,
Kern

On 5/2/19 1:14 PM, Gary Stainburn wrote:
> A few months back, the server running my Director and main storage died. I
> managed to boot using a live CD and successfully copied everything onto a new
> CentOS 7 box. I restored the latest database backup, copied the config files
> and rsynced the storage.
>
> Amazingly, it all just continued to work, which is all the more surprising
> given that the failed box was a Fedora 19 setup and was quite old.
>
> The only problem is I have a small number of large jobs that are constantly
> failing. Everything appears to work fine, and the client status shows the
> job as completed OK. See below.
>
> However, the job itself fails and gets rescheduled. In the case I'm looking
> at, it keeps running an 18GB backup.
>
> In order to try to narrow down the problem, as well as to aid job scheduling,
> I have set up a new storage device and pointed this job at that, but it has
> not made any difference. Looking at the log file, I see two things.
>
> Firstly, error messages about accessing the new storage folder and backup
> volumes. This is odd as the backup is being successfully written to this
> device. See below.
>
> Secondly, the job appears to be trying to connect to the FD once the job is
> complete and is then failing. I don't know why this is the case.
>
> Does anyone have any suggestions of what I can do to fix this?
>
> The full job log is available at https://www1.ringways.co.uk/bacula.log
>
> I'm running standard CentOS 7 RPMs.
>
> bacula-console-bat-5.2.13-23.1.el7.x86_64
> bacula-libs-5.2.13-23.1.el7.x86_64
> bacula-console-5.2.13-23.1.el7.x86_64
> bacula-common-5.2.13-23.1.el7.x86_64
> bacula-storage-5.2.13-23.1.el7.x86_64
> bacula-libs-sql-5.2.13-23.1.el7.x86_64
> bacula-client-5.2.13-23.1.el7.x86_64
> bacula-director-5.2.13-23.1.el7.x86_64
> postgresql-docs-9.2.24-1.el7_5.x86_64
> postgresql-upgrade-9.2.24-1.el7_5.x86_64
> postgresql-contrib-9.2.24-1.el7_5.x86_64
> postgresql-devel-9.2.24-1.el7_5.x86_64
> postgresql-9.2.24-1.el7_5.x86_64
> postgresql-server-9.2.24-1.el7_5.x86_64
> postgresql-plperl-9.2.24-1.el7_5.x86_64
> postgresql-libs-9.2.24-1.el7_5.x86_64
>
>
> storage directory
> ***
> [root@lou bacula]# ls -ld crownest/ crownest/*
> drwxr-xr-x. 2 bacula bacula   4096 May  2 10:38 crownest/
> -rw-r-----. 1 bacula tape   5368688828 Apr 30 17:20 crownest/crownest72930
> -rw-r-----. 1 bacula tape   5368688851 Apr 30 17:21 crownest/crownest72931
> -rw-r-----. 1 bacula tape   5368688789 Apr 30 17:21 crownest/crownest72932
> -rw-r-----. 1 bacula tape   5368669086 Apr 30 21:03 crownest/crownest72933
> -rw-r-----. 1 bacula tape   5368676895 Apr 30 22:59 crownest/crownest72934
> -rw-r-----. 1 bacula tape   5368688828 Apr 30 23:00 crownest/crownest72935
> -rw-r-----. 1 bacula tape   5368688851 Apr 30 23:01 crownest/crownest72936
> -rw-r-----. 1 bacula tape   5368688795 Apr 30 23:01 crownest/crownest72937
> -rw-r-----. 1 bacula tape   5368696610 May  1 04:24 crownest/crownest72938
> -rw-r-----. 1 bacula tape   5368688851 May  1 04:25 crownest/crownest72939
> -rw-r-----. 1 bacula tape   5368688851 May  1 04:26 crownest/crownest72940
> -rw-r-----. 1 bacula tape   5368696566 May  1 14:03 crownest/crownest72941
> -rw-r-----. 1 bacula tape   5368688838 May  1 14:04 crownest/crownest72942
> -rw-r-----. 1 bacula tape   5368688851 May  1 14:04 crownest/crownest72943
> -rw-r-----. 1 bacula tape   5368688801 May  1 14:05 crownest/crownest72944
> -rw-r-----. 1 bacula tape   5368701109 May  1 16:13 crownest/crownest72945
> -rw-r-----. 1 bacula tape   5368688850 May  1 16:14 crownest/crownest72947
> -rw-r-----. 1 bacula 

Re: [Bacula-users] Big job keeps failing after server replacement

2019-05-02 Josh Fisher



On 5/2/2019 7:14 AM, Gary Stainburn wrote:

> A few months back, the server running my Director and main storage died. I
> managed to boot using a live CD and successfully copied everything onto a new
> CentOS 7 box. I restored the latest database backup, copied the config files
> and rsynced the storage.
>
> Amazingly, it all just continued to work, which is all the more surprising
> given that the failed box was a Fedora 19 setup and was quite old.
>
> The only problem is I have a small number of large jobs that are constantly
> failing. Everything appears to work fine, and the client status shows the job
> as completed OK. See below.
>
> However, the job itself fails and gets rescheduled. In the case I'm looking
> at, it keeps running an 18GB backup.
>
> In order to try to narrow down the problem, as well as to aid job scheduling,
> I have set up a new storage device and pointed this job at that, but it has
> not made any difference. Looking at the log file, I see two things.
>
> Firstly, error messages about accessing the new storage folder and backup
> volumes. This is odd as the backup is being successfully written to this
> device. See below.
>
> Secondly, the job appears to be trying to connect to the FD once the job is
> complete and is then failing. I don't know why this is the case.
>
> Does anyone have any suggestions of what I can do to fix this?
>
> The full job log is available at https://www1.ringways.co.uk/bacula.log
>
> I'm running standard CentOS 7 RPMs.
>
> bacula-console-bat-5.2.13-23.1.el7.x86_64
> bacula-libs-5.2.13-23.1.el7.x86_64
> bacula-console-5.2.13-23.1.el7.x86_64
> bacula-common-5.2.13-23.1.el7.x86_64
> bacula-storage-5.2.13-23.1.el7.x86_64
> bacula-libs-sql-5.2.13-23.1.el7.x86_64
> bacula-client-5.2.13-23.1.el7.x86_64
> bacula-director-5.2.13-23.1.el7.x86_64
> postgresql-docs-9.2.24-1.el7_5.x86_64
> postgresql-upgrade-9.2.24-1.el7_5.x86_64
> postgresql-contrib-9.2.24-1.el7_5.x86_64
> postgresql-devel-9.2.24-1.el7_5.x86_64
> postgresql-9.2.24-1.el7_5.x86_64
> postgresql-server-9.2.24-1.el7_5.x86_64
> postgresql-plperl-9.2.24-1.el7_5.x86_64
> postgresql-libs-9.2.24-1.el7_5.x86_64
>
> storage directory
> ***

According to the job log, the permissions on these volume files do not
allow the bacula-sd daemon r/w access. Determine what user:group
bacula-sd is running as and either change the volume file permissions or
else change the user:group that bacula-sd runs as.

> [root@lou bacula]# ls -ld crownest/ crownest/*
> drwxr-xr-x. 2 bacula bacula   4096 May  2 10:38 crownest/
> -rw-r-----. 1 bacula tape   5368688828 Apr 30 17:20 crownest/crownest72930
> -rw-r-----. 1 bacula tape   5368688851 Apr 30 17:21 crownest/crownest72931
> -rw-r-----. 1 bacula tape   5368688789 Apr 30 17:21 crownest/crownest72932
> -rw-r-----. 1 bacula tape   5368669086 Apr 30 21:03 crownest/crownest72933
> -rw-r-----. 1 bacula tape   5368676895 Apr 30 22:59 crownest/crownest72934
> -rw-r-----. 1 bacula tape   5368688828 Apr 30 23:00 crownest/crownest72935
> -rw-r-----. 1 bacula tape   5368688851 Apr 30 23:01 crownest/crownest72936
> -rw-r-----. 1 bacula tape   5368688795 Apr 30 23:01 crownest/crownest72937
> -rw-r-----. 1 bacula tape   5368696610 May  1 04:24 crownest/crownest72938
> -rw-r-----. 1 bacula tape   5368688851 May  1 04:25 crownest/crownest72939
> -rw-r-----. 1 bacula tape   5368688851 May  1 04:26 crownest/crownest72940
> -rw-r-----. 1 bacula tape   5368696566 May  1 14:03 crownest/crownest72941
> -rw-r-----. 1 bacula tape   5368688838 May  1 14:04 crownest/crownest72942
> -rw-r-----. 1 bacula tape   5368688851 May  1 14:04 crownest/crownest72943
> -rw-r-----. 1 bacula tape   5368688801 May  1 14:05 crownest/crownest72944
> -rw-r-----. 1 bacula tape   5368701109 May  1 16:13 crownest/crownest72945
> -rw-r-----. 1 bacula tape   5368688850 May  1 16:14 crownest/crownest72947
> -rw-r-----. 1 bacula tape   5368688851 May  1 16:14 crownest/crownest72948
> -rw-r-----. 1 bacula tape   5368647060 May  1 23:38 crownest/crownest72949
> -rw-r-----. 1 bacula tape   5368654487 May  2 00:29 crownest/crownest72950
> -rw-r-----. 1 bacula tape   5368659375 May  2 08:28 crownest/crownest72951
> -rw-r-----. 1 bacula tape   5368688850 May  2 08:29 crownest/crownest72952
> -rw-r-----. 1 bacula tape   5368688845 May  2 08:30 crownest/crownest72953
> -rw-r-----. 1 bacula tape   5368662716 May  2 08:52 crownest/crownest72954
> -rw-r-----. 1 bacula tape   5368671066 May  2 10:37 crownest/crownest72955
> -rw-r-----. 1 bacula tape   5368688850 May  2 10:37 crownest/crownest72956
> -rw-r-----. 1 bacula tape   5368688851 May  2 10:38 crownest/crownest72957
> -rw-r-----. 1 bacula tape   2391406983 May  2 10:38 crownest/crownest72958
> [root@lou bacula]# pwd
> /var/bacula
> [root@lou bacula]#
>
> Client Status
> ***
>
> *status client=lsaless-fd
> Connecting to Client lsaless-fd at 10.3.3.3:9102
>
> esales4-fd Version: 5.2.10 (28 June 2012)  VSS Linux Cross-compile Win64
> Daemon started 02-May-19 10:38. Jobs: run=0 running=0.
> Microsoft Windows 7 Professional Service Pack 1 (build 7601), 64-bit
>   Heap: heap=0 smbytes=239,813 max_bytes=262,885 bufs=409 max_bufs=438