Re: [Bacula-users] Big job keeps failing after server replacement
Hello,

Note: for Windows, the FD can probably back up the files, but it may not be backing up everything, so you might have restore problems later. There have been a *lot* of fixes for the Windows FD since version 5.2.x.

It seems that on 6 April you were running the SD as bacula:bacula, and now you are running it as bacula:tape. That could be the source of your problem.

On 5/2/19 5:14 PM, Gary Stainburn wrote:
> [snip]

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
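The ownership difference mentioned in this thread (bacula:bacula on the older hales volumes, bacula:tape on the newer crownest ones) can be checked with a short shell sketch. The function name `files_not_in_group` is made up for this illustration, and GNU findutils (as shipped with CentOS 7) is assumed for `-maxdepth`.

```shell
# List regular files directly under a directory whose group differs from
# the expected one. Hypothetical helper, not part of Bacula.
#   $1 = directory holding the volume files
#   $2 = expected group name or numeric gid
files_not_in_group() {
    find "$1" -maxdepth 1 -type f ! -group "$2" -print
}

# Example (paths taken from the thread, run on the SD host):
#   files_not_in_group /var/bacula/hales tape
#   files_not_in_group /var/bacula/crownest tape
```

An empty result means every volume in that directory already belongs to the expected group.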
Re: [Bacula-users] Big job keeps failing after server replacement
Hi Kern,

Thank you for your response.

On Thursday 02 May 2019 14:23:54 Kern Sibbald wrote:
> Hello,
>
> Sorry you are having problems. I note the following things:
>
> 1. You are running on a *very* old Bacula version.

I am aware of this, but it is the latest version that installs from the CentOS repositories and, more importantly, it is only a few revisions higher than what was on the old (dead) box, ensuring that the database format etc. would be compatible.

> 2. I am not sure that version of Bacula supports Windows 7, where you
> are getting failures.

This version has worked successfully on Windows XP, 7, 8 and 10. I have approx. 150 workstations, each running one of these versions. I also have Linux servers of various ages, which all work as well.

> 3. As for the errors, it looks like the SD does not have permission to
> open /var/bacula/crownest

When this was first set up I had forgotten to set the owner:group on the directory. I did fix this, and the files that are now in that directory have the correct owner:group and were written by the SD (see OP).

I am aware that this is what the errors are saying, but that does not explain how the files are still being created.

> 4. Is /var/bacula/crownest a network mount (this could explain the
> failures).

This is a file storage device configured in the SD on the same box as the Director. The directory itself is local to that box.

> 5. It looks like it is taking a bit over 2 hours for the SD to mount the
> volume for the backup, and this is approximately the network inactivity
> timeout period. So possibly the FD<->SD connection is timing out

The client device and the storage device are on different sites, with a 30GB connection, doing an 18GB full backup. This would explain the 2 hour run time.

> 6. Try adding HeartBeatInterval = 300 to the Dir, FD, and SD. I think
> there are 5 places where it must be done (note this is the default in
> more recent Baculas).

I will investigate where this needs to go and will apply it.

> Recommendations:
> 1. Make sure the SD either has full permissions on all disk files or
> runs as root.

The SD runs as bacula:tape, which is the default configuration from the RPMs and matches the old box.

As you can see from the directory structure, this should be correct.

[root@lou bacula]# ls -ld / /var/ /var/bacula/ /var/bacula/crownest/ /var/bacula/crownest/* /var/bacula/hales/ /var/bacula/hales/*
dr-xr-xr-x. 18 root root 281 Apr 5 11:13 /
drwxr-xr-x. 27 root root 4096 Mar 8 13:59 /var/
drwxr-xr-x. 14 bacula bacula 8192 Apr 30 11:28 /var/bacula/
drwxr-xr-x. 2 bacula bacula 4096 May 2 15:43 /var/bacula/crownest/
-rw-r-----. 1 bacula tape 5368688828 Apr 30 17:20 /var/bacula/crownest/crownest72930
-rw-r-----. 1 bacula tape 5368688851 Apr 30 17:21 /var/bacula/crownest/crownest72931
-rw-r-----. 1 bacula tape 5368688789 Apr 30 17:21 /var/bacula/crownest/crownest72932
[snip]
-rw-r-----. 1 bacula tape 1745617753 May 2 15:43 /var/bacula/crownest/crownest72965
drwxr-xr-x. 3 bacula bacula 12288 May 1 23:41 /var/bacula/hales/
-rw-r-----. 1 bacula bacula 5368705600 Apr 6 03:29 /var/bacula/hales/hales68076
-rw-r-----. 1 bacula bacula 5368688843 Apr 6 03:29 /var/bacula/hales/hales68078
-rw-r-----. 1 bacula bacula 5368670610 Apr 6 03:51 /var/bacula/hales/hales68082

(Hales is the storage device that was used previously)

> 2. Make sure any network mounts (NFS, CIFS) are mounted prior to running
> a job

I do not use network mounts.

> 3. Add the Heart Beat Interval = 300 in all the required resources --
> this may be documented in one of the white papers, but is surely documented
> in the manual, look in the index ...

I have added this line to the Director, client and storage resources. I cannot see where else it would be needed.

> 4. When you get it running think seriously about upgrading. Though
> RedHat releases an older version of Bacula the builders do create
> packages for newer version -- or look on www.bacula.org for binaries.

I am aware that I am running an old version. Is there a documented upgrade path? I believe CentOS 8 is imminent, and I would prefer to stick to standard RPMs where possible.
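As a rough sanity check on the figures quoted in this message, the average rate implied by moving 18GB in about 2 hours can be worked out with shell integer arithmetic. `avg_mbit` is a hypothetical helper; decimal units (1 GB = 1000 MB) and 8 bits per byte are assumed, and compression and protocol overhead are ignored.

```shell
# Average throughput in Mbit/s for a transfer of $1 gigabytes in $2 seconds.
# Integer arithmetic only, so the result is rounded down.
avg_mbit() {
    echo $(( $1 * 8 * 1000 / $2 ))
}

# 18 GB in 2 hours (7200 seconds):
#   avg_mbit 18 7200
```

For the job discussed here this works out to about 20 Mbit/s sustained between the two sites.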
Re: [Bacula-users] Big job keeps failing after server replacement
Hello,

Sorry you are having problems. I note the following things:

1. You are running on a *very* old Bacula version.
2. I am not sure that version of Bacula supports Windows 7, where you are getting failures.
3. As for the errors, it looks like the SD does not have permission to open /var/bacula/crownest
4. Is /var/bacula/crownest a network mount? (This could explain the failures.)
5. It looks like it is taking a bit over 2 hours for the SD to mount the volume for the backup, and this is approximately the network inactivity timeout period. So possibly the FD<->SD connection is timing out.
6. Try adding HeartBeatInterval = 300 to the Dir, FD, and SD. I think there are 5 places where it must be done (note this is the default in more recent Baculas).

Recommendations:

1. Make sure the SD either has full permissions on all disk files or runs as root.
2. Make sure any network mounts (NFS, CIFS) are mounted prior to running a job.
3. Add the Heart Beat Interval = 300 in all the required resources -- this may be documented in one of the white papers, but is surely documented in the manual, look in the index ...
4. When you get it running, think seriously about upgrading. Though RedHat releases an older version of Bacula, the builders do create packages for newer versions -- or look on www.bacula.org for binaries.

Best regards,
Kern

On 5/2/19 1:14 PM, Gary Stainburn wrote:
> [snip]
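To see which configuration files already carry the heartbeat directive recommended above, a small grep wrapper can help. This is only a sketch: `find_heartbeat` is a made-up name, and the `/etc/bacula/` paths in the usage comment are an assumption about where the CentOS RPMs put the files. Bacula ignores case and spaces in directive names, so the pattern accepts `HeartBeatInterval`, `Heartbeat Interval` and `Heart Beat Interval` alike.

```shell
# Print file name, line number and text of every "Heartbeat Interval"
# directive that starts a line in the given Bacula config files.
# Returns non-zero (grep's exit status) when nothing matches.
find_heartbeat() {
    grep -HniE '^[[:space:]]*heart *beat *interval[[:space:]]*=' "$@"
}

# Example (assumed default locations, adjust to the actual install):
#   find_heartbeat /etc/bacula/bacula-dir.conf \
#                  /etc/bacula/bacula-sd.conf \
#                  /etc/bacula/bacula-fd.conf
```

Each match still has to be compared by hand against the resources it sits in, since the sketch does not parse resource blocks.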
Re: [Bacula-users] Big job keeps failing after server replacement
On 5/2/2019 7:14 AM, Gary Stainburn wrote:
> A few months back, the server running my Director and main storage died. I managed to boot using a live CD and successfully copied everything onto a new CentOS 7 box. I restored the latest database backup, copied the config files and rsynced the storage.
>
> Amazingly all just continued to work. More amazing seeing as the failed box was a Fedora 19 setup and was quite old.
>
> The only problem is I have a small number of large jobs that are constantly failing. Everything appears to work fine, and the client status shows the job as completed OK. See below.
>
> However, the job itself fails, and gets rescheduled. In the case I'm looking at it keeps running an 18GB backup.
>
> In order to try to narrow down the problem, as well as aid job scheduling, I have set up a new storage device and pointed this job at that, but it has not made any difference. Looking at the log file, I see two things.
>
> Firstly, error messages about accessing the new storage folder and backup volumes. This is odd as the backup is being successfully written to this device. See below.
>
> Secondly, the job appears to be trying to connect to the FD once the job is complete and is then failing. I don't know why this is the case.
>
> Does anyone have any suggestions of what I can do to fix this?
>
> The full job log is available at https://www1.ringways.co.uk/bacula.log
>
> I'm running standard CentOS 7 RPMs.
>
> bacula-console-bat-5.2.13-23.1.el7.x86_64
> bacula-libs-5.2.13-23.1.el7.x86_64
> bacula-console-5.2.13-23.1.el7.x86_64
> bacula-common-5.2.13-23.1.el7.x86_64
> bacula-storage-5.2.13-23.1.el7.x86_64
> bacula-libs-sql-5.2.13-23.1.el7.x86_64
> bacula-client-5.2.13-23.1.el7.x86_64
> bacula-director-5.2.13-23.1.el7.x86_64
> postgresql-docs-9.2.24-1.el7_5.x86_64
> postgresql-upgrade-9.2.24-1.el7_5.x86_64
> postgresql-contrib-9.2.24-1.el7_5.x86_64
> postgresql-devel-9.2.24-1.el7_5.x86_64
> postgresql-9.2.24-1.el7_5.x86_64
> postgresql-server-9.2.24-1.el7_5.x86_64
> postgresql-plperl-9.2.24-1.el7_5.x86_64
> postgresql-libs-9.2.24-1.el7_5.x86_64
>
> storage directory
> ***

According to the job log, the permissions on these volume files do not allow the bacula-sd daemon r/w access. Determine what user:group bacula-sd is running as, and then either change the volume file permissions or change the user:group that bacula-sd runs as.

> [root@lou bacula]# ls -ld crownest/ crownest/*
> drwxr-xr-x. 2 bacula bacula 4096 May 2 10:38 crownest/
> -rw-r-----. 1 bacula tape 5368688828 Apr 30 17:20 crownest/crownest72930
> -rw-r-----. 1 bacula tape 5368688851 Apr 30 17:21 crownest/crownest72931
> -rw-r-----. 1 bacula tape 5368688789 Apr 30 17:21 crownest/crownest72932
> -rw-r-----. 1 bacula tape 5368669086 Apr 30 21:03 crownest/crownest72933
> -rw-r-----. 1 bacula tape 5368676895 Apr 30 22:59 crownest/crownest72934
> -rw-r-----. 1 bacula tape 5368688828 Apr 30 23:00 crownest/crownest72935
> -rw-r-----. 1 bacula tape 5368688851 Apr 30 23:01 crownest/crownest72936
> -rw-r-----. 1 bacula tape 5368688795 Apr 30 23:01 crownest/crownest72937
> -rw-r-----. 1 bacula tape 5368696610 May 1 04:24 crownest/crownest72938
> -rw-r-----. 1 bacula tape 5368688851 May 1 04:25 crownest/crownest72939
> -rw-r-----. 1 bacula tape 5368688851 May 1 04:26 crownest/crownest72940
> -rw-r-----. 1 bacula tape 5368696566 May 1 14:03 crownest/crownest72941
> -rw-r-----. 1 bacula tape 5368688838 May 1 14:04 crownest/crownest72942
> -rw-r-----. 1 bacula tape 5368688851 May 1 14:04 crownest/crownest72943
> -rw-r-----. 1 bacula tape 5368688801 May 1 14:05 crownest/crownest72944
> -rw-r-----. 1 bacula tape 5368701109 May 1 16:13 crownest/crownest72945
> -rw-r-----. 1 bacula tape 5368688850 May 1 16:14 crownest/crownest72947
> -rw-r-----. 1 bacula tape 5368688851 May 1 16:14 crownest/crownest72948
> -rw-r-----. 1 bacula tape 5368647060 May 1 23:38 crownest/crownest72949
> -rw-r-----. 1 bacula tape 5368654487 May 2 00:29 crownest/crownest72950
> -rw-r-----. 1 bacula tape 5368659375 May 2 08:28 crownest/crownest72951
> -rw-r-----. 1 bacula tape 5368688850 May 2 08:29 crownest/crownest72952
> -rw-r-----. 1 bacula tape 5368688845 May 2 08:30 crownest/crownest72953
> -rw-r-----. 1 bacula tape 5368662716 May 2 08:52 crownest/crownest72954
> -rw-r-----. 1 bacula tape 5368671066 May 2 10:37 crownest/crownest72955
> -rw-r-----. 1 bacula tape 5368688850 May 2 10:37 crownest/crownest72956
> -rw-r-----. 1 bacula tape 5368688851 May 2 10:38 crownest/crownest72957
> -rw-r-----. 1 bacula tape 2391406983 May 2 10:38 crownest/crownest72958
> [root@lou bacula]# pwd
> /var/bacula
> [root@lou bacula]#
>
> Client Status
> ***
> *status client=lsaless-fd
> Connecting to Client lsaless-fd at 10.3.3.3:9102
>
> esales4-fd Version: 5.2.10 (28 June 2012) VSS Linux Cross-compile Win64
> Daemon started 02-May-19 10:38. Jobs: run=0 running=0.
> Microsoft Windows 7 Professional Service Pack 1 (build 7601), 64-bit
> Heap: heap=0 smbytes=239,813 max_bytes=262,885 bufs=409 max_bufs=438
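The advice above (find out which user:group bacula-sd runs as, then compare it with the volume files) can be sketched in shell. `volume_perms` is a hypothetical helper, and GNU coreutils `stat` is assumed for the `-c` format option.

```shell
# Print "owner:group octal-mode path" for every entry in a directory,
# which is easier to compare against the daemon's user:group than the
# full ls -l output. Prints nothing for an empty directory.
volume_perms() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue
        stat -c '%U:%G %a %n' "$f"
    done
}

# Example (path taken from the thread):
#   volume_perms /var/bacula/crownest
```

On a procps-based system the daemon's own identity can be read with something like `ps -o user,group,args -C bacula-sd`. The daemon needs both read and write permission on each volume, through either the owner bits or the group bits, as well as write and search permission on the directory itself.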